paperperday

daily papers in ML


today's paper:

2025/03/02

Attention Is All You Need

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin

The dominant sequence transduction models are based on complex recurrent or convolutional neural networks that include an encoder and a decoder. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles, by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of ...
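
The abstract names attention as the Transformer's only building block but does not spell out the operation. As a rough illustration, here is a minimal sketch of single-head scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V, as defined in the body of the paper. The function names and the plain-list representation are assumptions made for this example; masking, multiple heads, and learned projections are left out.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Single-head attention over lists of vectors (illustrative names).

    queries: list of vectors of length d_k
    keys:    list of vectors of length d_k
    values:  list of vectors, one per key
    Returns one output vector per query.
    """
    d_k = len(keys[0])
    scale = math.sqrt(d_k)
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        # Attention weights are non-negative and sum to 1.
        weights = softmax(scores)
        # Output is the weighted average of the value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs


if __name__ == "__main__":
    q = [[1.0, 0.0]]
    k = [[1.0, 0.0], [0.0, 1.0]]
    v = [[1.0, 2.0], [3.0, 4.0]]
    print(scaled_dot_product_attention(q, k, v))
```

Each query's output depends only on dot products with all keys, so every position can be computed independently of the others, which is the property behind the abstract's claim about parallelizability.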

previously featured:

  • ImageNet Classification with Deep Convolutional Neural Networks
    arxiv, 2025/02/08
  • Deep Residual Learning for Image Recognition
    arxiv, 2025/02/08
  • Neural Machine Translation by Jointly Learning to Align and Translate
    arxiv, 2025/02/08
  • Adam: A Method for Stochastic Optimization
    arxiv, 2025/02/08
  • Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks
    arxiv, 2025/02/08
  • BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
    arxiv, 2025/02/08
  • Mask R-CNN
    arxiv, 2025/02/08

join the discussion:

Discord, Reddit, Twitter