Advanced
Big Data & Data Engineering
Q100 / 100
Why might increasing the number of Spark shuffle partitions help with out-of-memory errors during a large aggregation, but also potentially hurt performance if set too high?
Correct Answer
A) More partitions mean smaller per-task data, reducing memory pressure, but too many small partitions add scheduling and task-overhead costs that can slow the job down
Explanation
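Each shuffle partition is processed by one task, so raising the partition count (spark.sql.shuffle.partitions, default 200, for DataFrame and SQL jobs) shrinks the data each task must hold in memory while aggregating. Too few partitions produce large tasks that spill to disk or fail with OOM; too many produce a flood of tiny tasks whose scheduling, launch, and shuffle-file overhead outweighs the useful work. Tuning the setting balances memory per task against that per-task overhead.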
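The trade-off can be illustrated with a toy cost model. This is a minimal sketch, not Spark's actual scheduler: the function names, the 50 MiB/s per-core throughput, the 0.05 s fixed per-task overhead, and the 128 MiB target partition size are all assumptions chosen for illustration, and the model assumes evenly distributed keys (no skew).

```python
import math

MIB = 1024 ** 2
GIB = 1024 ** 3


def per_task_bytes(shuffle_bytes: int, num_partitions: int) -> float:
    """Average shuffle data handled by one task, assuming no key skew."""
    return shuffle_bytes / num_partitions


def estimated_wall_time(
    shuffle_bytes: int,
    num_partitions: int,
    total_cores: int,
    bytes_per_sec_per_core: float = 50 * MIB,  # assumed throughput
    per_task_overhead_sec: float = 0.05,       # assumed fixed cost per task
) -> float:
    """Toy estimate: tasks run in waves of `total_cores`, and every task
    pays a fixed overhead on top of the time spent processing its data."""
    waves = math.ceil(num_partitions / total_cores)
    data_time = per_task_bytes(shuffle_bytes, num_partitions) / bytes_per_sec_per_core
    return waves * (data_time + per_task_overhead_sec)


def suggest_partitions(shuffle_bytes: int, target_partition_bytes: int = 128 * MIB) -> int:
    """Hypothetical helper: pick a count that keeps partitions near a target size."""
    return max(1, math.ceil(shuffle_bytes / target_partition_bytes))


if __name__ == "__main__":
    shuffle = 200 * GIB
    for n in (200, 2_000, 200_000):
        size_mib = per_task_bytes(shuffle, n) / MIB
        secs = estimated_wall_time(shuffle, n, total_cores=100)
        print(f"{n:>7} partitions: {size_mib:9.1f} MiB/task, ~{secs:6.1f} s (toy model)")
    print("suggested:", suggest_partitions(shuffle))
    # In a real job the chosen value would be applied with, for example:
    #   spark.conf.set("spark.sql.shuffle.partitions", str(n))
```

Under these assumed constants, going from 200 to 2,000 partitions cuts the per-task data tenfold at almost no cost in estimated time, while 200,000 partitions makes the fixed per-task overhead dominate. Real numbers depend on the cluster, the data, and whether adaptive query execution is coalescing partitions.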