Intermediate
Big Data & Data Engineering
Q62 / 100
What is "data skew" in a distributed join or aggregation, and why is it a problem?
The correct answer is A) When data is unevenly distributed across partitions (e.g., one key has far more rows), causing some tasks to take much longer than others
Explanation
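Skew occurs when a few keys account for a disproportionate share of the rows. A hash-partitioned join or aggregation sends every row with the same key to the same task, so the tasks (and nodes) handling those hot keys are overloaded while the rest finish early, and the whole stage waits for the slowest task. Skew can be mitigated with techniques such as salting the hot keys or repartitioning the data.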
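To make the effect concrete, here is a minimal pure-Python sketch of skew and of salting for a count aggregation. It is an illustration under stated assumptions, not how any particular engine implements its shuffle: the partition and salt counts, the sample data, and the helper names (`key_hash`, `partition_sizes`) are all hypothetical, and the partitioner is a simplified stand-in that hashes the key and adds the salt.

```python
import zlib
from collections import Counter

NUM_PARTITIONS = 8  # hypothetical number of parallel tasks
NUM_SALTS = 8       # hypothetical number of salt values per key


def key_hash(key: str) -> int:
    """Deterministic hash of a key (crc32 instead of Python's randomized hash())."""
    return zlib.crc32(key.encode("utf-8"))


def partition_sizes(pairs):
    """Count rows per partition when (key, salt) pairs are hash-partitioned.

    Simplified partitioner (an assumption of this sketch): hash the key,
    add the salt, and take the result modulo the number of partitions.
    """
    sizes = [0] * NUM_PARTITIONS
    for key, salt in pairs:
        sizes[(key_hash(key) + salt) % NUM_PARTITIONS] += 1
    return sizes


# Skewed input: one hot key with 10,000 rows plus 100 keys with 10 rows each.
rows = ["hot"] * 10_000 + [f"k{i}" for i in range(100) for _ in range(10)]

# Without salting (salt fixed at 0), every "hot" row lands on one partition.
unsalted = partition_sizes((key, 0) for key in rows)

# With salting, each row gets a round-robin salt, so each key's rows are
# spread over up to NUM_SALTS sub-keys and therefore over several partitions.
salted_pairs = [(key, i % NUM_SALTS) for i, key in enumerate(rows)]
salted = partition_sizes(salted_pairs)

# Two-stage aggregation: count per (key, salt) first, then merge per key.
partial_counts = Counter(salted_pairs)
final_counts = Counter()
for (key, _salt), n in partial_counts.items():
    final_counts[key] += n

print("largest partition without salting:", max(unsalted))
print("largest partition with salting:   ", max(salted))
print("count for hot key after merge:    ", final_counts["hot"])
```

The second stage is what keeps the result correct: salting splits one key into several sub-keys, so the partial results have to be merged back per original key. For a join, the same idea requires replicating the matching rows on the other side once per salt value.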