What is Git's garbage collection and how does it work?

Why Interviewers Ask This

Senior engineers who work with Git and GitHub are expected to reason about architecture, performance, and edge cases. This question separates mid-level from senior candidates by testing system-level understanding of how Git stores and reclaims objects.

Answer

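Git garbage collection (git gc) cleans up unnecessary files and optimizes the local repository. It performs four main operations:

(1) Pack loose objects: consolidate many small loose object files into packfiles.
(2) Expire reflog entries: remove reflog entries older than the expiry time (by default 90 days for entries reachable from the current branch tip, gc.reflogExpire, and 30 days for unreachable ones, gc.reflogExpireUnreachable).
(3) Prune unreachable objects: remove loose objects that are not reachable from any branch, tag, or reflog entry and are older than the prune grace period (gc.pruneExpire, default 2 weeks). These are typically the "dangling" commits left behind by hard resets, deleted branches, and similar operations, once their reflog entries have expired.
(4) Repack packfiles: consolidate multiple packfiles into fewer, larger ones.

Git runs gc automatically (git gc --auto) after certain commands when thresholds are exceeded: by default, roughly 6,700 loose objects (gc.auto) or 50 packfiles (gc.autoPackLimit). To run it manually, use git gc. Aggressive GC (git gc --aggressive) recomputes deltas for better compression; it is thorough but slow and only worth running occasionally. To prune without waiting for the grace period, use git gc --prune=now. Objects still referenced by the reflog survive even then; to drop them as well, expire the reflog first with git reflog expire --expire=now --all.

Why unreachable objects are kept temporarily: the 14-day grace period prevents data loss when gc runs while another Git process is in the middle of an operation and has written objects that no ref points to yet. It also leaves a window in which to recover work that was orphaned by mistake.

Server-side: hosting platforms run gc on their own infrastructure and schedule, independently of any local gc.

Related but separate from gc: git remote prune origin removes stale remote-tracking references (branches that were deleted on the remote). It does not delete objects.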
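The behavior described above can be observed in a throwaway repository. The following is a minimal sketch, assuming a POSIX shell with git, mktemp, and awk available; the file name, commit messages, and variable names are made up for illustration. It orphans a commit with a hard reset, shows that a plain git gc keeps it (the reflog still references it), and then removes it by expiring the reflog and pruning immediately.

```shell
#!/bin/sh
set -eu

# Work in a temporary repository so no real data is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "demo@example.com"   # illustrative identity
git config user.name "Demo"
git config gc.auto 0                       # keep automatic gc out of the way

# Three commits; each writes a blob, a tree, and a commit as loose objects.
for i in 1 2 3; do
    echo "version $i" > file.txt
    git add file.txt
    git commit -q -m "commit $i"
done
loose_before=$(git count-objects -v | awk '/^count:/ {print $2}')

# Orphan the newest commit: no branch points to it, but the reflog still does.
orphan=$(git rev-parse HEAD)
git reset -q --hard HEAD~1

# A plain gc packs the loose objects and keeps the reflog-reachable orphan.
git gc -q
loose_after=$(git count-objects -v | awk '/^count:/ {print $2}')
packs_after=$(git count-objects -v | awk '/^packs:/ {print $2}')
if git cat-file -e "$orphan" 2>/dev/null; then survived_gc=yes; else survived_gc=no; fi

# Drop every reflog entry, then prune with no grace period.
git reflog expire --expire=now --all
git gc -q --prune=now
if git cat-file -e "$orphan" 2>/dev/null; then survived_prune=yes; else survived_prune=no; fi

echo "loose objects before gc: $loose_before"
echo "loose objects after gc:  $loose_after (packs: $packs_after)"
echo "orphan survives plain gc:            $survived_gc"
echo "orphan survives expire + prune=now:  $survived_prune"
```

Expiring the reflog and pruning with --prune=now is destructive and bypasses the grace period, so it is best reserved for repositories where no other Git process is running and nothing orphaned needs to be recovered.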

Common Mistake

Candidates often give textbook answers here. Interviewers are more impressed when you relate the concept to a specific problem you solved in a real Git & GitHub project.