When does a distributed query engine typically choose a "shuffle hash join" vs. a "sort-merge join"?

Correct Answer

A shuffle hash join can be used when, after shuffling, one side is small enough to build an in-memory hash table for each partition; a sort-merge join handles larger datasets by sorting both sides on the join key and merging them.

Explanation

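Both strategies first shuffle the two inputs so that rows with the same join key land in the same partition. Sort-merge join then sorts each partition on the join key and merges the two sorted streams; it scales well for large datasets because the sort can spill to disk. Shuffle hash join instead builds an in-memory hash table from the smaller side of each partition and probes it with the other side, so it skips the sort step and can be faster when that side fits in memory.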
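
To make the difference concrete, here is a minimal single-partition sketch in Python of the two per-partition algorithms, assuming the shuffle has already placed matching keys in the same partition. The function names `hash_join` and `sort_merge_join` and the `(key, value)` tuple layout are illustrative assumptions, not any engine's actual API, and a real engine would also handle spilling, skew, and other join types.

```python
from collections import defaultdict


def hash_join(build, probe):
    """Inner join of (key, value) pairs; `build` should be the smaller side."""
    # Build phase: the whole build side must fit in memory as a hash table.
    table = defaultdict(list)
    for key, value in build:
        table[key].append(value)

    # Probe phase: stream the other side and look up matches; no sorting needed.
    out = []
    for key, probe_value in probe:
        for build_value in table.get(key, []):
            out.append((key, build_value, probe_value))
    return out


def sort_merge_join(left, right):
    """Inner join of (key, value) pairs by sorting both sides and merging."""
    # Sort phase: in a real engine this sort may spill to disk.
    left = sorted(left, key=lambda kv: kv[0])
    right = sorted(right, key=lambda kv: kv[0])

    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on each side, then emit all pairs.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            for _, lv in left[i:i_end]:
                for _, rv in right[j:j_end]:
                    out.append((lk, lv, rv))
            i, j = i_end, j_end
    return out
```

Both functions produce the same set of joined rows; they differ in what they need. The hash join holds the entire build side in memory but makes a single pass over each input, while the sort-merge join pays for two sorts but only ever needs the current run of equal keys at hand during the merge.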
