Advanced
Artificial Intelligence & Machine Learning
Q89 / 100
What is the difference between model parallelism and data parallelism in distributed training?
Correct Answer: B
Data parallelism: each device holds a replica of the model and trains on a different batch of data. Model parallelism: the model itself is split across devices, for models too large to fit on one GPU.
Explanation
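Data parallelism (e.g., PyTorch DDP) keeps a full copy of the model on every device, gives each device a different slice of the batch, and averages the gradients across devices (an all-reduce) so that all replicas stay in sync. It scales throughput over large datasets. FSDP is a data-parallel variant that additionally shards parameters, gradients, and optimizer state across devices to save memory. Model parallelism is required when a model does not fit in a single GPU's memory: at 2 bytes per parameter (fp16/bf16), a 70B-parameter model needs about 140 GB for the weights alone, more than an 80 GB GPU holds. It comes in two main forms: tensor parallelism splits the weight matrices inside a layer across devices, and pipeline parallelism assigns consecutive groups of layers to different devices. Megatron-LM combines data, tensor, and pipeline parallelism.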
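The distinction can be illustrated with a minimal NumPy sketch that simulates the "devices" as array shards on one machine. This is a toy under stated assumptions, not how DDP or Megatron-LM are implemented: the linear model, the mean-squared-error loss, and the function names (`data_parallel_grad`, `full_batch_grad`, `tensor_parallel_forward`) are all hypothetical and chosen only for illustration.

```python
import numpy as np

def full_batch_grad(W, X, Y):
    """Gradient of mean squared error for the linear model X @ W on one device."""
    err = X @ W - Y
    return 2.0 * X.T @ err / len(X)

def data_parallel_grad(W, X, Y, n_devices):
    """Data parallelism: every simulated device holds the full W and sees
    a different, equally sized shard of the batch."""
    grads = []
    for X_shard, Y_shard in zip(np.split(X, n_devices), np.split(Y, n_devices)):
        err = X_shard @ W - Y_shard
        grads.append(2.0 * X_shard.T @ err / len(X_shard))
    # Averaging the local gradients stands in for the all-reduce step.
    return sum(grads) / n_devices

def tensor_parallel_forward(W, X, n_devices):
    """Tensor (model) parallelism: every simulated device holds only a
    column slice of W and sees the full input X."""
    W_shards = np.split(W, n_devices, axis=1)
    partial_outputs = [X @ W_shard for W_shard in W_shards]
    # Concatenating the slices stands in for an all-gather of the outputs.
    return np.concatenate(partial_outputs, axis=1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(8, 6))   # batch of 8 samples, 6 features
    Y = rng.normal(size=(8, 4))   # 4 targets per sample
    W = rng.normal(size=(6, 4))   # weight matrix of the linear model

    same_grad = np.allclose(data_parallel_grad(W, X, Y, 4), full_batch_grad(W, X, Y))
    same_out = np.allclose(tensor_parallel_forward(W, X, 2), X @ W)
    print("data-parallel gradient matches full batch:", same_grad)
    print("tensor-parallel output matches full matmul:", same_out)
```

In the data-parallel function each device needs memory for all of `W` but only part of the batch; in the tensor-parallel function each device stores only a slice of `W`, which is the property that lets a model exceed a single device's memory.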