Advanced
Artificial Intelligence & Machine Learning
Q84 / 100
What is the mixture of experts (MoE) architecture?
B
Correct Answer
A neural network architecture routing inputs to a subset of "expert" feed-forward networks, enabling sparse activation of a much larger total parameter count
Explanation
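In an MoE layer, a learned router sends each token to its top-k experts, so only those experts run for that token (sparse activation). Switch Transformer is a well-known example and uses k = 1; GPT-4 is widely reported to use MoE, but this has not been officially confirmed. Because only k experts fire per token, total parameters far exceed the parameters active per token. This lets models scale to hundreds of billions of parameters or more while keeping per-token compute manageable.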
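The routing step described above can be illustrated with a minimal, standard-library-only sketch. Everything here is an assumption made for illustration: the class name `ToyMoELayer`, the tiny dimensions, the random weights, and the choice to renormalize gate weights with a softmax over only the selected experts (one common convention, not the only one). It is not the implementation of any particular model.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


class ToyMoELayer:
    """Hypothetical toy MoE layer: a linear router plus small ReLU experts."""

    def __init__(self, dim, hidden, num_experts, top_k, seed=0):
        rng = random.Random(seed)

        def rand_matrix(rows, cols):
            return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.top_k = top_k
        # Router: one score per expert for each token.
        self.router = rand_matrix(num_experts, dim)
        # Each expert is a two-layer feed-forward network (dim -> hidden -> dim).
        self.experts = [
            (rand_matrix(hidden, dim), rand_matrix(dim, hidden))
            for _ in range(num_experts)
        ]

    def forward(self, token):
        """Route one token to its top-k experts and mix their outputs."""
        logits = matvec(self.router, token)
        # Indices of the k highest-scoring experts.
        chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
        chosen = chosen[: self.top_k]
        # Assumption: gate weights are a softmax over the selected logits only.
        gates = softmax([logits[i] for i in chosen])
        out = [0.0] * len(token)
        for gate, idx in zip(gates, chosen):
            w_in, w_out = self.experts[idx]
            hidden = [max(0.0, h) for h in matvec(w_in, token)]  # ReLU
            expert_out = matvec(w_out, hidden)
            out = [o + gate * e for o, e in zip(out, expert_out)]
        return out, chosen

    def param_counts(self):
        """Return (total parameters, parameters used for a single token)."""
        per_expert = sum(len(row) for m in self.experts[0] for row in m)
        router = sum(len(row) for row in self.router)
        total = router + per_expert * len(self.experts)
        active = router + per_expert * self.top_k
        return total, active


layer = ToyMoELayer(dim=4, hidden=8, num_experts=8, top_k=2)
output, chosen_experts = layer.forward([1.0, 0.5, -0.5, 2.0])
total_params, active_params = layer.param_counts()
print(chosen_experts, total_params, active_params)
```

With these toy settings the layer holds 544 parameters in total but touches only 160 for any single token, which is the "total params >> active params" gap in miniature. Production systems add pieces this sketch omits, such as load-balancing losses and expert capacity limits.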