Advanced Artificial Intelligence & Machine Learning
Q89 / 100

What is the difference between model parallelism and data parallelism in distributed training?

B

Correct Answer

Data parallelism: each device holds a full replica of the model and trains on a different batch of data; model parallelism: the model itself is split across devices, for models too large to fit on one GPU

Explanation

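Data parallelism (e.g., DDP) replicates the model on every device, gives each device a different slice of the batch, and averages the gradients so the replicas stay in sync; it scales training throughput over large datasets. FSDP is a data-parallel variant that also shards parameters, gradients, and optimizer state across devices to cut per-device memory. Model parallelism is required when a model (e.g., 70B params) doesn't fit in a single GPU's memory: at 2 bytes per parameter in 16-bit precision, 70B parameters need roughly 140 GB for the weights alone. It comes in two main forms: tensor parallelism, which splits the weight matrices of individual layers across devices, and pipeline parallelism, which places consecutive groups of layers on different devices. Megatron-LM combines data and model parallelism.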
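To make the distinction concrete, here is a minimal single-process sketch in plain Python. It is only an illustration under stated assumptions: the "devices" are simulated as list slices, the model is a toy linear layer, and the helper names (`grad_mse`, `data_parallel_grad`, `tensor_parallel_matvec`) are hypothetical, not part of DDP, FSDP, or Megatron-LM. It shows that averaging per-shard gradients reproduces the full-batch gradient (data parallelism), and that splitting a weight matrix by rows and concatenating the partial outputs reproduces the full matrix-vector product (a simple form of tensor parallelism).

```python
# Toy simulation: each "device" is just a slice of a Python list.
# All function names here are hypothetical and for illustration only.

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def grad_mse(w, batch):
    """Gradient of the mean squared error of a linear model y = w . x."""
    n = len(batch)
    g = [0.0] * len(w)
    for x, y in batch:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for j, xj in enumerate(x):
            g[j] += 2.0 * err * xj / n
    return g

def data_parallel_grad(w, batch, num_devices):
    """Data parallelism: every device has the same w but a different shard.

    Assumes len(batch) is divisible by num_devices. The final average
    plays the role of an all-reduce over the per-device gradients.
    """
    shard = len(batch) // num_devices
    grads = [grad_mse(w, batch[d * shard:(d + 1) * shard])
             for d in range(num_devices)]
    return [sum(g[j] for g in grads) / num_devices for j in range(len(w))]

def tensor_parallel_matvec(W, x, num_devices):
    """Model (tensor) parallelism: every device owns a block of W's rows.

    Assumes len(W) is divisible by num_devices. Each device sees the
    whole input x but computes only its slice of the output; the
    concatenation plays the role of an all-gather.
    """
    rows = len(W) // num_devices
    out = []
    for d in range(num_devices):
        out.extend(matvec(W[d * rows:(d + 1) * rows], x))
    return out

# Small demo with made-up numbers.
w = [0.5, -1.0]
batch = [([1.0, 2.0], 1.0), ([0.0, 1.0], -1.0),
         ([2.0, 1.0], 0.0), ([1.0, 1.0], 2.0)]
print(grad_mse(w, batch), data_parallel_grad(w, batch, 2))

W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
x = [1.0, -1.0]
print(matvec(W, x), tensor_parallel_matvec(W, x, 2))
```

In the data-parallel case the full set of weights lives on every device and only the data is divided; in the tensor-parallel case each device stores only part of `W`, which is what lets a model larger than one device's memory be trained at all. Real systems replace the list slicing with collective communication between GPUs.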
