Advanced Big Data & Data Engineering
Q100 / 100

Why might increasing the number of Spark shuffle partitions help with out-of-memory errors during a large aggregation, but also potentially hurt performance if set too high?

The correct answer is A) More partitions mean smaller per-task data, reducing memory pressure, but too many small partitions add scheduling and task-overhead costs that can slow the job down.

Explanation

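Tuning the shuffle partition count (`spark.sql.shuffle.partitions` for DataFrame and SQL jobs, default 200) trades memory per task against per-task overhead. Too few partitions produce large tasks that may spill to disk or fail with out-of-memory errors. Too many produce tiny tasks, and the cost of scheduling and launching each one starts to outweigh the work it does.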
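To make the trade-off concrete, here is a minimal sizing sketch in plain Python. It is only an illustration: the helper names `per_task_bytes` and `suggest_shuffle_partitions` are hypothetical, the 128 MiB target per partition is an assumed rule of thumb rather than a Spark default, and the arithmetic assumes shuffle data is spread evenly across partitions (skewed keys break that assumption).

```python
import math

MIB = 1024 ** 2
GIB = 1024 ** 3


def per_task_bytes(shuffle_bytes: int, num_partitions: int) -> float:
    """Average data handled by one shuffle task, assuming no key skew."""
    return shuffle_bytes / num_partitions


def suggest_shuffle_partitions(
    shuffle_bytes: int,
    target_partition_bytes: int = 128 * MIB,  # assumed target, not a Spark default
    total_cores: int = 1,
) -> int:
    """Hypothetical heuristic: enough partitions to hit the target size,
    rounded up to a multiple of the core count so every wave of tasks is full."""
    if shuffle_bytes <= 0:
        return total_cores
    by_size = math.ceil(shuffle_bytes / target_partition_bytes)
    waves = math.ceil(by_size / total_cores)
    return max(total_cores, waves * total_cores)


shuffle = 100 * GIB

# With 200 partitions, each task averages 512 MiB: a plausible OOM risk.
print(per_task_bytes(shuffle, 200) / MIB)

# Sizing for ~128 MiB per task on 16 cores gives 800 partitions.
print(suggest_shuffle_partitions(shuffle, total_cores=16))

# Going far higher shrinks each task to about 1 MiB, where per-task
# overhead is likely to dominate the useful work.
print(per_task_bytes(shuffle, 100_000) / MIB)
```

In a PySpark session the chosen value could then be applied with `spark.conf.set("spark.sql.shuffle.partitions", n)` before the aggregation runs. The right target size depends on executor memory and the aggregation itself, so treat the number as a starting point to measure against, not a fixed answer.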
