🤖 Machine Learning / AI
Intermediate
What is the softmax function?
Answer
The softmax function converts a vector of raw scores (logits) into a probability distribution: every output lies between 0 and 1, and the outputs sum to 1. For an input vector z, softmax(z)ᵢ = e^(zᵢ) / Σⱼ e^(zⱼ). The exponential amplifies differences between logits while preserving their order, so the largest logit always receives the highest probability. Softmax is typically used in the output layer of multi-class classifiers, paired with categorical cross-entropy loss. A temperature parameter τ, applied as softmax(z/τ), controls how sharp the distribution is: low τ makes it more peaked, high τ makes it more uniform. Temperature scaling is used in knowledge distillation and in language model sampling.
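The formula and the effect of temperature can be illustrated with a minimal sketch in plain Python. The function name `softmax`, its `temperature` argument, and the sample logits are illustrative choices, not part of any particular library. The sketch also subtracts the maximum scaled logit before exponentiating; this is a common numerical-stability step that leaves the result unchanged, and is an addition to the formula as stated above.

```python
import math

def softmax(logits, temperature=1.0):
    """Return softmax(z / temperature) as a list of probabilities."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Divide each logit by the temperature (tau in the text).
    scaled = [z / temperature for z in logits]
    # Shifting by the max does not change the output but avoids overflow in exp.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits, chosen only for illustration.
logits = [2.0, 1.0, 0.1]
for tau in (0.5, 1.0, 5.0):
    probs = softmax(logits, tau)
    print(f"tau={tau}: {[round(p, 3) for p in probs]}")
```

Running this should show the probability of the first (largest) logit shrinking as τ grows, while the three probabilities move closer together.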