What is the mixture of experts (MoE) architecture?

Correct Answer

A neural network architecture routing inputs to a subset of "expert" feed-forward networks, enabling sparse activation of a much larger total parameter count

Explanation

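In an MoE layer (used in the Switch Transformer and, reportedly, GPT-4), a learned router sends each token to only its top-k experts, so activation is sparse: the total parameter count is far larger than the number of parameters active for any one token. This lets models scale to hundreds of billions of parameters while keeping per-token compute manageable.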
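To make the gap between total and active parameters concrete, here is a minimal pure-Python sketch of top-k routing for a single token. It is an illustration under stated assumptions, not the implementation of any particular model: the class `ToyMoELayer`, its sizes, the random weights, and the choice to renormalize the gate weights with a softmax over only the selected experts are all hypothetical.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a (rows x cols) nested-list matrix by a length-cols vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


class ToyMoELayer:
    """Hypothetical MoE layer: a linear router plus two-layer ReLU experts."""

    def __init__(self, dim, hidden, num_experts, top_k, seed=0):
        rng = random.Random(seed)

        def rand_matrix(rows, cols):
            return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.top_k = top_k
        self.router = rand_matrix(num_experts, dim)  # one score row per expert
        self.experts = [
            (rand_matrix(hidden, dim), rand_matrix(dim, hidden))
            for _ in range(num_experts)
        ]

    def forward(self, token):
        # Score every expert, but evaluate only the top_k highest-scoring ones.
        logits = matvec(self.router, token)
        ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
        chosen = ranked[: self.top_k]
        # Assumption: gate weights are a softmax over the chosen experts only.
        gates = softmax([logits[i] for i in chosen])
        out = [0.0] * len(token)
        for gate, idx in zip(gates, chosen):
            w_in, w_out = self.experts[idx]
            hidden = [max(0.0, h) for h in matvec(w_in, token)]  # ReLU
            expert_out = matvec(w_out, hidden)
            out = [o + gate * y for o, y in zip(out, expert_out)]
        return out, chosen

    def param_counts(self):
        """Return (total parameters, parameters used for one token)."""
        per_expert = sum(len(m) * len(m[0]) for m in self.experts[0])
        router = len(self.router) * len(self.router[0])
        total = router + per_expert * len(self.experts)
        active = router + per_expert * self.top_k
        return total, active


layer = ToyMoELayer(dim=8, hidden=32, num_experts=16, top_k=2)
output, chosen = layer.forward([0.5] * 8)
total, active = layer.param_counts()
print(f"experts used: {chosen}, total params: {total}, active params: {active}")
```

With 16 experts and k = 2, this toy layer holds 8,320 parameters but touches only 1,152 of them per token. Raising `num_experts` grows the total while the per-token cost stays fixed, which is the scaling property the explanation describes.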
