🤖 Machine Learning / AI
Intermediate
What is the attention mechanism in neural networks?
Answer
The attention mechanism allows a model to focus on the most relevant parts of the input when producing each output token, rather than compressing the entire input into a fixed-size vector. Given a query and a set of key-value pairs, attention computes a weighted sum of values, where weights are determined by the similarity between the query and each key. Self-attention computes attention between elements in the same sequence, enabling each position to attend to all other positions. Attention is the core building block of the Transformer architecture.
Previous
What is an LSTM and how does it solve the vanishing gradient problem?
Next
What is the Transformer architecture?
More Machine Learning / AI Questions
View all →- Intermediate What is a convolutional neural network (CNN)?
- Intermediate What is a Recurrent Neural Network (RNN)?
- Intermediate What is an LSTM and how does it solve the vanishing gradient problem?
- Intermediate What is the Transformer architecture?
- Intermediate What is BERT and how is it pre-trained?