Intermediate Big Data & Data Engineering
Q62 / 100

What is "data skew" in a distributed join or aggregation, and why is it a problem?

The correct answer is A) When data is unevenly distributed across partitions (e.g., one key has far more rows), causing some tasks to take much longer than others

Explanation

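Skew occurs when a few keys account for a disproportionate share of the rows. In a shuffle-based join or aggregation, all rows with the same key are sent to the same partition, so the tasks handling those keys process far more data than the rest, and the stage cannot finish until these stragglers do. Common mitigations include salting the hot keys or repartitioning the data.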
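
To make the effect concrete, here is a minimal pure-Python sketch of skew and of salting, not tied to any particular engine. Everything in it is an illustrative assumption: the synthetic data, the key name hot_customer, the constants NUM_PARTITIONS and NUM_SALTS, and the helper functions, with zlib.crc32 standing in for an engine's hash partitioner.

```python
import zlib
from collections import Counter

NUM_PARTITIONS = 8   # hypothetical number of shuffle partitions
NUM_SALTS = 16       # hypothetical number of salt values for hot keys


def partition_of(key: str) -> int:
    """Stable stand-in for a hash partitioner (an assumption, not a real engine's)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def partition_loads(keys):
    """Return the number of rows each partition would receive."""
    loads = Counter(partition_of(k) for k in keys)
    return [loads.get(p, 0) for p in range(NUM_PARTITIONS)]


def salt_keys(keys, hot_keys):
    """Append a salt suffix to hot keys so their rows map to several shuffle keys.

    Salts are assigned round-robin here to keep the sketch reproducible;
    a random salt per row is the more common choice.
    """
    return [f"{k}#{i % NUM_SALTS}" if k in hot_keys else k
            for i, k in enumerate(keys)]


def two_stage_count(salted_keys):
    """Aggregate per salted key first, then strip the salt and merge."""
    partial = Counter(salted_keys)            # stage 1: partial counts
    final = Counter()
    for salted_key, n in partial.items():     # stage 2: merge by original key
        final[salted_key.split("#", 1)[0]] += n
    return final


# Synthetic data: one hot key holds 90% of the rows.
keys = ["hot_customer"] * 9000 + [f"customer_{i % 100}" for i in range(1000)]
salted = salt_keys(keys, {"hot_customer"})

print("rows per partition, unsalted:", partition_loads(keys))
print("rows per partition, salted:  ", partition_loads(salted))
print("hot key count after merge:   ", two_stage_count(salted)["hot_customer"])
```

Without salting, every hot_customer row hashes to the same partition, so one task receives at least 9,000 of the 10,000 rows. With salting, those rows are split across 16 distinct shuffle keys, and the second aggregation stage recovers the original per-key counts. For a join, the other side would also need its matching rows replicated once per salt value, which this sketch does not show.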
