Advanced
Big Data & Data Engineering
Q91 / 100
When does a distributed query engine typically choose a "shuffle hash join" vs. a "sort-merge join"?
Correct Answer
A) Shuffle hash join can be used when, after both sides are shuffled on the join key, each partition of the smaller side is small enough to build an in-memory hash table; sort-merge join handles larger datasets by sorting both sides on the join key and merging them.
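A minimal single-process sketch of a shuffle hash join, assuming rows are Python dicts and that a list of lists stands in for the partitions a real engine would spread across executors. The function names `hash_partition` and `shuffle_hash_join` are hypothetical, chosen for illustration only. Both inputs are routed by the hash of the join key, then each partition builds a hash table from the smaller (build) side and probes it with the other side. No sort is needed, but the build side of each partition must fit in memory.

```python
from collections import defaultdict


def hash_partition(rows, key, num_partitions):
    """Shuffle step: route each row to a partition by hashing its join key."""
    parts = [[] for _ in range(num_partitions)]
    for row in rows:
        parts[hash(row[key]) % num_partitions].append(row)
    return parts


def shuffle_hash_join(build_rows, probe_rows, key, num_partitions=4):
    """Inner join: shuffle both sides, then hash-build/probe within each partition.

    Assumes build_rows is the smaller side, so each partition's hash table fits in memory.
    """
    build_parts = hash_partition(build_rows, key, num_partitions)
    probe_parts = hash_partition(probe_rows, key, num_partitions)
    out = []
    # Matching keys always land in the same partition index on both sides.
    for b_part, p_part in zip(build_parts, probe_parts):
        table = defaultdict(list)  # in-memory hash table on the build side
        for b in b_part:
            table[b[key]].append(b)
        for p in p_part:  # stream the probe side through the table
            for b in table.get(p[key], []):
                out.append({**b, **p})
    return out


customers = [{"cid": 1, "name": "a"}, {"cid": 2, "name": "b"}]
orders = [{"cid": 1, "oid": 10}, {"cid": 1, "oid": 11}, {"cid": 3, "oid": 12}]
print(shuffle_hash_join(customers, orders, "cid"))
```

Only the two orders with `cid` 1 find a match; the order with `cid` 3 has no build-side row and is dropped, as in an inner join.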
Explanation
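Both strategies first shuffle the two inputs so that rows with the same join key land in the same partition. Sort-merge join then sorts each partition on the join key and merges the two sorted streams in a single pass; because sorting can spill to disk, it scales to very large inputs. Shuffle hash join skips the sort step by building a hash table from the smaller side of each partition and probing it with the other side, which can be faster, but only when that build side fits in memory.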
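For contrast, here is a minimal sketch of the merge phase of a sort-merge join on one co-partitioned pair of inputs, again assuming dict rows. The name `sort_merge_join` is hypothetical, and Python's in-memory `sorted` stands in for the external, spill-to-disk sort a real engine would use. The key detail is that runs of equal keys on both sides must be joined as a cross product before either cursor advances past them.

```python
def sort_merge_join(left, right, key):
    """Inner join of one co-partitioned pair: sort both sides on the key, then merge."""
    # Stand-in for an external sort; a real engine can spill sorted runs to disk here.
    left = sorted(left, key=lambda r: r[key])
    right = sorted(right, key=lambda r: r[key])
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on each side.
            i_end = i
            while i_end < len(left) and left[i_end][key] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][key] == rk:
                j_end += 1
            # Emit the cross product of the two runs, then skip past both.
            for lrow in left[i:i_end]:
                for rrow in right[j:j_end]:
                    out.append({**lrow, **rrow})
            i, j = i_end, j_end
    return out


left = [{"k": 2, "a": "x"}, {"k": 1, "a": "y"}, {"k": 2, "a": "z"}]
right = [{"k": 2, "b": 1}, {"k": 3, "b": 2}, {"k": 2, "b": 3}]
print(sort_merge_join(left, right, "k"))
```

Apart from the sort itself, the merge only moves two cursors forward, which is why this strategy keeps working when neither side fits in memory.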