🤖 Machine Learning / AI
Intermediate
What is an LSTM and how does it solve the vanishing gradient problem?
Answer
An LSTM (Long Short-Term Memory) network uses gating mechanisms to selectively remember or forget information over long sequences. It has three gates: the forget gate (decides what to erase from cell state), the input gate (decides what new information to add), and the output gate (decides what to output). The cell state acts as a highway that allows gradients to flow with minimal modification, addressing the vanishing gradient problem of vanilla RNNs. LSTMs are used in language modeling, machine translation, speech recognition, and time series forecasting.
Previous
What is a Recurrent Neural Network (RNN)?
Next
What is the attention mechanism in neural networks?