What is the Transformer architecture?

Answer

The Transformer (introduced in "Attention Is All You Need", 2017) is an architecture based entirely on self-attention mechanisms, replacing recurrence and convolution. It consists of an encoder (maps input to context representations) and decoder (generates output). Each encoder/decoder block has multi-head self-attention and feed-forward layers with residual connections and layer normalization. Transformers process all tokens in parallel (unlike RNNs) and capture long-range dependencies efficiently. They are the foundation of BERT, GPT, T5, and virtually all modern NLP models.

What is the attention mechanism in neural networks?

What is BERT and how is it pre-trained?

More Machine Learning / AI Questions

View all →

Intermediate What is a convolutional neural network (CNN)?
Intermediate What is a Recurrent Neural Network (RNN)?
Intermediate What is an LSTM and how does it solve the vanishing gradient problem?
Intermediate What is the attention mechanism in neural networks?
Intermediate What is BERT and how is it pre-trained?

All Machine Learning / AI Questions Browse All Topics