How does Git store objects internally?
Why Interviewers Ask This
This is a differentiating question used for senior and lead roles. Interviewers want to see if you can explain not just what happens, but why — and what the trade-offs are in different approaches.
Answer
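Git is a content-addressable object store: every object is named by a hash of its contents. Git computes that hash (SHA-1 by default, or SHA-256 in repositories created with the newer object format) over a short header, giving the object type and size, followed by the object's bytes. The result is the object's ID. The .git/objects/ directory holds four types of objects:
(1) Blob: stores raw file content. No filename, no metadata, just bytes. Identical content produces the same hash, so it is stored once no matter how many files or commits reference it.
(2) Tree: represents one directory listing. Each entry holds a name, a file mode (regular file, executable, symlink, or subdirectory), and the hash of a blob (file) or another tree (subdirectory). A commit's snapshot is therefore a tree of trees.
(3) Commit: points to one top-level tree (the snapshot), references its parent commit(s) (none for a root commit, two or more for a merge), and records the author and committer, each with a timestamp, plus the message. The commit hash covers all of this, and because it includes the parent hashes, any change to content or to earlier history produces a different hash.
(4) Tag: annotated tags only (lightweight tags are plain refs and create no object). Points to another object, usually a commit, and records the tag name, message, and tagger.
Storage on disk: each loose object is zlib-compressed and written to .git/objects/ab/cdef..., where the first 2 hex characters of the hash name the directory and the remaining characters name the file. Packfiles (.git/objects/pack/) bundle many objects into a single file, delta-compressing similar objects against each other, with an accompanying .idx file for fast lookup.
Understand this and you understand why Git calls itself "the stupid content tracker": at its core it is a hash-to-content mapping with a commit graph layered on top.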
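To make the hashing concrete, here is a minimal Python sketch that builds a blob, a one-entry tree, and a root commit in memory and computes their IDs from the "type size\0 + content" layout described above. It assumes the default SHA-1 object format and writes nothing to disk. The helper names (git_object_id, loose_path, tree_entry) are hypothetical, and the author name, email, and timestamp in the commit are made-up example values.

```python
import hashlib
import zlib


def git_object_id(obj_type: str, body: bytes) -> tuple[str, bytes]:
    """Return (hex SHA-1 ID, uncompressed stored bytes) for a Git object.

    The hash covers a header "<type> <size>\\0" plus the body, so the
    ID depends on the object type and length as well as the content.
    """
    stored = f"{obj_type} {len(body)}".encode() + b"\0" + body
    return hashlib.sha1(stored).hexdigest(), stored


def loose_path(oid: str) -> str:
    # First 2 hex chars name the directory; the remaining 38 name the file.
    return f".git/objects/{oid[:2]}/{oid[2:]}"


def tree_entry(mode: str, name: str, oid: str) -> bytes:
    # Tree entry layout: "<mode> <name>\0" followed by the 20-byte binary SHA-1.
    return f"{mode} {name}".encode() + b"\0" + bytes.fromhex(oid)


# (1) Blob: just the file bytes; the filename lives in the tree, not here.
blob_id, blob_raw = git_object_id("blob", b"hello\n")

# (2) Tree: a directory with one regular file (mode 100644) named hello.txt.
tree_id, tree_raw = git_object_id("tree", tree_entry("100644", "hello.txt", blob_id))

# (3) Commit: points at the tree. There is no "parent" line because this is
# a root commit. The identity and timestamp are made-up example values.
commit_body = (
    f"tree {tree_id}\n"
    "author A U Thor <author@example.com> 1700000000 +0000\n"
    "committer A U Thor <author@example.com> 1700000000 +0000\n"
    "\n"
    "Initial commit\n"
).encode()
commit_id, commit_raw = git_object_id("commit", commit_body)

# A loose object on disk is the zlib-compressed header + body.
compressed_blob = zlib.compress(blob_raw)

for label, oid in [("blob", blob_id), ("tree", tree_id), ("commit", commit_id)]:
    print(label, oid, loose_path(oid))
print("compressed blob size:", len(compressed_blob), "bytes")
```

The blob ID should match what `git hash-object` prints for a file containing "hello" plus a newline. The tree and commit IDs depend on every byte, including the made-up identity and timestamp, so they will differ from any real repository. That sensitivity is exactly why rewriting history changes every downstream commit hash.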
Pro Tip
Before answering, structure your response: one-line definition → real-world analogy → concrete example from a project. This makes even complex Git & GitHub answers easy to follow.