What is data augmentation?

Answer

Data augmentation is a technique for artificially expanding a training dataset by applying label-preserving transformations to existing samples. Typical transformations include random flips, rotations, crops, color jitter, and added noise for images; synonym replacement, back-translation, and random insertion or deletion of words for text; and pitch shifting, time stretching, and mixed-in background noise for audio. Because the model sees many varied versions of each input, augmentation acts as a regularizer and helps prevent overfitting. It is especially valuable when labeled data is scarce.
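As an illustration, here is a minimal sketch in pure Python of a few of the image transformations listed above (horizontal flip, random crop, additive noise), plus random word deletion for text. The function names, the list-of-lists grayscale image representation, and the default probabilities are assumptions chosen for clarity. In practice these operations usually come from a library such as torchvision or albumentations and are applied on the fly each epoch.

```python
import random

Image = list[list[float]]  # hypothetical representation: grayscale rows, pixels in [0, 1]


def horizontal_flip(img: Image) -> Image:
    """Mirror each row left-to-right."""
    return [row[::-1] for row in img]


def random_crop(img: Image, size: int, rng: random.Random) -> Image:
    """Cut a random size x size patch out of the image (size must fit)."""
    h, w = len(img), len(img[0])
    top = rng.randint(0, h - size)
    left = rng.randint(0, w - size)
    return [row[left:left + size] for row in img[top:top + size]]


def add_gaussian_noise(img: Image, std: float, rng: random.Random) -> Image:
    """Add zero-mean Gaussian noise to every pixel, then clip back into [0, 1]."""
    return [[min(1.0, max(0.0, p + rng.gauss(0.0, std))) for p in row] for row in img]


def augment_image(img: Image, crop_size: int, rng: random.Random,
                  flip_prob: float = 0.5, noise_std: float = 0.05) -> Image:
    """Return one randomly transformed copy; the sample's label is left unchanged."""
    if rng.random() < flip_prob:
        img = horizontal_flip(img)
    img = random_crop(img, crop_size, rng)
    return add_gaussian_noise(img, noise_std, rng)


def random_deletion(tokens: list[str], p: float, rng: random.Random) -> list[str]:
    """Drop each token with probability p, keeping at least one token."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]


# Usage: several augmented copies of one labeled sample, all sharing its label.
rng = random.Random(0)
image = [[(r * 4 + c) / 15 for c in range(4)] for r in range(4)]  # toy 4x4 image
label = "cat"
augmented = [(augment_image(image, crop_size=3, rng=rng), label) for _ in range(3)]
print(random_deletion("the cat sat on the mat".split(), p=0.2, rng=rng))
```

Each call yields a different variant, so across epochs the model rarely sees exactly the same input twice. Which transformations are safe depends on the task: a horizontal flip is harmless for most natural photos but can change the meaning of an image that contains text.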