🤖 Machine Learning / AI
Intermediate
What is BERT and how is it pre-trained?
Answer
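BERT (Bidirectional Encoder Representations from Transformers) is a Transformer encoder that is pre-trained to produce deep bidirectional representations: each token's representation is conditioned on both its left and right context at once. Pre-training uses two self-supervised tasks. In Masked Language Modeling (MLM), about 15% of input tokens are selected for prediction; in the original setup 80% of those are replaced with [MASK], 10% with a random token, and 10% are left unchanged, and the model predicts the original token at each selected position. In Next Sentence Prediction (NSP), the model receives a pair of sentences and predicts whether the second actually follows the first in the source text. After pre-training on large unlabeled corpora (BooksCorpus and English Wikipedia in the original work), BERT is fine-tuned on specific NLP tasks such as question answering, classification, and NER by adding a small task-specific output layer and updating all weights end to end.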
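To make the two pre-training tasks concrete, here is a minimal sketch in plain Python of how training examples could be built. It is an illustration under stated assumptions, not BERT's actual data pipeline: the function names `mask_tokens` and `make_nsp_example` are hypothetical, tokens are plain strings rather than WordPiece ids, the 80/10/10 replacement split follows the original BERT paper, and the NSP negative is drawn from the same sentence list (the paper samples it from a different document).

```python
import random

MASK_TOKEN = "[MASK]"
SPECIAL_TOKENS = ("[CLS]", "[SEP]")


def mask_tokens(tokens, vocab, mask_prob=0.15, seed=None):
    """Corrupt a token list for MLM (hypothetical helper).

    Returns (corrupted, labels). labels[i] is the original token at
    positions the model must predict, and None everywhere else.
    """
    rng = random.Random(seed)
    corrupted = list(tokens)
    labels = [None] * len(tokens)
    for i, token in enumerate(tokens):
        if token in SPECIAL_TOKENS:
            continue  # special tokens are never selected
        if rng.random() >= mask_prob:
            continue  # position not selected for prediction
        labels[i] = token
        roll = rng.random()
        if roll < 0.8:
            corrupted[i] = MASK_TOKEN          # 80%: replace with [MASK]
        elif roll < 0.9:
            corrupted[i] = rng.choice(vocab)   # 10%: replace with a random token
        # remaining 10%: leave the token unchanged
    return corrupted, labels


def make_nsp_example(sentences, index, rng):
    """Build one NSP example from tokenized sentences (hypothetical helper).

    With probability 0.5 the second segment is the true next sentence
    (is_next=True); otherwise it is another sentence from the list.
    Assumption: negatives come from the same list, a simplification.
    """
    if len(sentences) < 3:
        raise ValueError("need at least 3 sentences to sample a negative")
    first = sentences[index]
    if rng.random() < 0.5 and index + 1 < len(sentences):
        second, is_next = sentences[index + 1], True
    else:
        candidates = [j for j in range(len(sentences))
                      if j not in (index, index + 1)]
        second, is_next = sentences[rng.choice(candidates)], False
    tokens = ["[CLS]"] + first + ["[SEP]"] + second + ["[SEP]"]
    return tokens, is_next
```

In a real pipeline the two steps are combined: a sentence pair is assembled first, then masking is applied to the packed sequence, and the model is trained on the sum of the MLM loss (over positions where the label is not None) and the NSP classification loss.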
More Machine Learning / AI Questions
- Intermediate What is a convolutional neural network (CNN)?
- Intermediate What is a Recurrent Neural Network (RNN)?
- Intermediate What is an LSTM and how does it solve the vanishing gradient problem?
- Intermediate What is the attention mechanism in neural networks?
- Intermediate What is the Transformer architecture?