How does Git store objects internally?

Answer

Git is a content-addressable file system: every object is stored under a hash of its contents, and that hash is the object's ID. The default hash is SHA-1; newer Git versions can also create repositories that use SHA-256. The hash is computed over a short header (the object type and size) followed by the content itself. The .git/objects/ directory stores four types of objects:

(1) Blob: stores raw file content. No filename, no metadata, just bytes. Identical content always produces the same hash, so it is stored once regardless of how many paths or commits refer to it.
(2) Tree: represents a directory listing. Each entry holds a name, a file mode, and the hash of a blob (a file) or another tree (a subdirectory). A commit's directory structure is therefore a tree of trees.
(3) Commit: points to one tree (the snapshot), references its parent commit(s), and records the author, committer, timestamps, and message. The commit hash covers all of this, so any change to content or to earlier history produces a different hash.
(4) Tag: (annotated tags only) points to another object, usually a commit, and records the tag name, tagger, and message. Lightweight tags are plain refs and create no object.

Storage on disk: loose objects are zlib-compressed files at .git/objects/ab/cdef... (the first 2 hex characters of the hash name the directory, the rest name the file). Packfiles (.git/objects/pack/) bundle many objects together with delta compression for efficiency. Once you understand this, you understand why Git calls itself "the stupid content tracker": a hash → content mapping with a commit graph on top.
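To make the hash → content mapping concrete, here is a minimal Python sketch of how a SHA-1 object ID and its loose-object path can be derived. It assumes the default SHA-1 object format and the "<type> <size>\0" header that Git prepends before hashing. The function names (git_object_id, loose_object_path, loose_object_bytes) are hypothetical helpers for illustration, not part of Git or any library.

```python
import hashlib
import zlib


def _with_header(obj_type: str, content: bytes) -> bytes:
    # Git prepends "<type> <size>\0" to the raw content before hashing/storing.
    return f"{obj_type} {len(content)}".encode() + b"\0" + content


def git_object_id(obj_type: str, content: bytes) -> str:
    """Hypothetical helper: SHA-1 object ID for the given type and content."""
    return hashlib.sha1(_with_header(obj_type, content)).hexdigest()


def loose_object_path(object_id: str) -> str:
    """Hypothetical helper: first 2 hex chars = directory, rest = filename."""
    return f".git/objects/{object_id[:2]}/{object_id[2:]}"


def loose_object_bytes(obj_type: str, content: bytes) -> bytes:
    """Hypothetical helper: a loose object is the zlib-compressed header + content."""
    return zlib.compress(_with_header(obj_type, content))


if __name__ == "__main__":
    empty_blob = git_object_id("blob", b"")
    print(empty_blob)                     # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
    print(loose_object_path(empty_blob))  # .git/objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

In a real repository, the same ID can be checked with git hash-object, for example by hashing an empty file. The sketch covers loose objects only; packfiles use a different on-disk layout.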

