Advanced Big Data & Data Engineering
Q91 / 100

When does a distributed query engine typically choose a "shuffle hash join" vs. a "sort-merge join"?


The correct answer is A) Shuffle hash join can be used when one side is small enough to build an in-memory hash table after shuffling; sort-merge join handles larger datasets by sorting both sides on the join key.

Explanation

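Both strategies first shuffle (repartition) the two inputs by the join key so that matching rows land in the same partition. Shuffle hash join then builds an in-memory hash table from the smaller side of each partition and probes it with the other side; this avoids the sort step, so it can be faster when the build side fits in memory after shuffling. Sort-merge join instead sorts both sides on the join key and merges them in a single pass; because sorting can spill to disk, it scales well to large datasets.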
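
The difference between the two strategies can be sketched for a single partition, that is, after the shuffle has already co-located rows with the same key. This is a minimal, single-process illustration and not how any particular engine implements its joins; the function names `hash_join` and `sort_merge_join` and the `(key, value)` row layout are assumptions made for the example.

```python
from collections import defaultdict


def hash_join(build, probe):
    """Inner join of (key, value) rows; `build` should be the smaller side."""
    # Build phase: the whole build side must fit in memory as a hash table.
    table = defaultdict(list)
    for key, value in build:
        table[key].append(value)
    # Probe phase: stream the other side and look up each key. No sorting.
    return [(key, b, p) for key, p in probe for b in table.get(key, [])]


def sort_merge_join(left, right):
    """Inner join of (key, value) rows by sorting both sides, then merging."""
    left = sorted(left, key=lambda row: row[0])
    right = sorted(right, key=lambda row: row[0])
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lkey, rkey = left[i][0], right[j][0]
        if lkey < rkey:
            i += 1
        elif lkey > rkey:
            j += 1
        else:
            # Find the run of rows on the right that share this key.
            j_end = j
            while j_end < len(right) and right[j_end][0] == lkey:
                j_end += 1
            # Pair every left row with this key against that run.
            while i < len(left) and left[i][0] == lkey:
                for jj in range(j, j_end):
                    out.append((lkey, left[i][1], right[jj][1]))
                i += 1
            j = j_end
    return out


small = [(1, "a"), (2, "b")]
large = [(2, "x"), (1, "y"), (2, "z"), (3, "w")]

# Both strategies return the same rows; only the row order may differ.
assert sorted(hash_join(small, large)) == sorted(sort_merge_join(small, large))
```

The hash join needs memory proportional to the build side but touches each row once, while the sort-merge join pays for two sorts and in exchange only needs sequential access to sorted runs, which is what allows a real engine to spill them to disk.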
