Overview
The Git index (also called the staging area or cache) is a binary file stored at $GIT_DIR/index. It records the object ID and cached stat(2) data for every tracked path and acts as the bridge between the working tree and the repository's object database: git add updates it, and git commit builds tree objects from it.
All binary numbers in the index are stored in network byte order (big-endian), and checksums and object IDs are computed with the repository's configured hash algorithm (SHA-1 or SHA-256).
File Structure
The index file consists of four parts:
Header - Contains signature and metadata
Index Entries - Sorted list of tracked files
Extensions - Optional data structures, many of them for performance optimization
Checksum - Hash over all preceding content
The index begins with a 12-byte header:
4 bytes - Signature: 'D', 'I', 'R', 'C' ("dircache")
4 bytes - Version number (2, 3, or 4)
4 bytes - Number of index entries
```python
# Reading index header
import struct

with open('.git/index', 'rb') as f:
    signature = f.read(4)
    version = struct.unpack('>I', f.read(4))[0]
    num_entries = struct.unpack('>I', f.read(4))[0]

print(f"Signature: {signature.decode()}")
print(f"Version: {version}")
print(f"Entries: {num_entries}")
```
Index Entry Format
Each index entry represents a file and contains metadata from stat(2) plus Git-specific information:
Entry Structure
32-bit ctime seconds
32-bit ctime nanosecond fractions
32-bit mtime seconds
32-bit mtime nanosecond fractions
32-bit dev
32-bit ino
32-bit mode
32-bit uid
32-bit gid
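32-bit file size (the stat(2) size, truncated to 32 bits)
Object ID (20 or 32 bytes depending on hash)
16-bit flags
16-bit extended flags (version 3 and later, present only when the extended flag is set)
Variable-length path name, relative to the top of the working tree (prefix-compressed in version 4)
1-8 NUL bytes (versions 2-3) that terminate the name and pad the entry to a multiple of 8 bytes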
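For versions 2 and 3, the padded size of an entry follows directly from this layout. A minimal sketch, assuming the fixed part is 40 bytes of stat fields plus the object ID and the 16-bit flags (62 bytes for SHA-1), with 2 more bytes when the extended flags are present; `entry_size_v2_v3` is a hypothetical helper name:

```python
def entry_size_v2_v3(name_len, hash_len=20, extended=False):
    """Return the on-disk size of a version 2/3 index entry (hypothetical helper).

    Fixed part: ten 32-bit stat fields (40 bytes) + object ID + 16-bit flags,
    plus 2 bytes for the version 3 extended flags when present. The path is
    followed by 1-8 NUL bytes so that the entry ends on an 8-byte boundary.
    """
    fixed = 40 + hash_len + 2 + (2 if extended else 0)
    # Adding 8 before rounding down to a multiple of 8 always leaves room
    # for at least one NUL terminator and at most eight NUL bytes.
    return (fixed + name_len + 8) & ~7


for name in ["README", "a", "src/main/app.js"]:
    total = entry_size_v2_v3(len(name))
    print(f"{name!r}: {total} bytes, {total - 62 - len(name)} NUL bytes")
```

For "README" (6 bytes) the entry occupies 72 bytes: the 62-byte fixed part, the name, and 4 NUL bytes.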
Mode Field Breakdown
The 32-bit mode field is structured as (high to low bits):
16 bits - Unused (must be zero)
4 bits - Object type:
1000 (0x8) = Regular file
1010 (0xA) = Symbolic link
1110 (0xE) = Gitlink (submodule)
3 bits - Unused (must be zero)
9 bits - Unix permissions (only 0755 and 0644 are valid for regular files; symbolic links and gitlinks store 0)
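A small decoder makes the bit layout concrete. This sketch rejects anything outside the rules listed above, except that it also accepts the 0100 directory type used by sparse directory entries (covered in the sparse index section); `decode_mode` and `OBJECT_TYPES` are illustrative names:

```python
OBJECT_TYPES = {
    0b1000: "regular file",
    0b1010: "symbolic link",
    0b1110: "gitlink",
    0b0100: "directory",  # only in sparse directory entries
}


def decode_mode(mode):
    """Split a 32-bit index mode into (object type, permission bits)."""
    if mode >> 16:
        raise ValueError("upper 16 bits must be zero")
    obj_type = (mode >> 12) & 0xF  # 4-bit object type
    if (mode >> 9) & 0b111:
        raise ValueError("3 unused bits must be zero")
    perms = mode & 0o777  # 9-bit Unix permissions
    if obj_type not in OBJECT_TYPES:
        raise ValueError(f"invalid object type {obj_type:#06b}")
    # Regular files allow 0755 or 0644; every other type stores 0
    allowed = {0o755, 0o644} if obj_type == 0b1000 else {0}
    if perms not in allowed:
        raise ValueError(f"invalid permissions {perms:o}")
    return OBJECT_TYPES[obj_type], perms


for mode in (0o100644, 0o100755, 0o120000, 0o160000):
    kind, perms = decode_mode(mode)
    print(f"{mode:06o} -> {kind}, permissions {perms:03o}")
```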
Flags Field
The 16-bit flags field contains (high to low bits):
1 bit - assume-valid flag
1 bit - extended flag (must be 0 in version 2)
2 bits - stage (0 = normal; during a conflicted merge, 1 = common ancestor, 2 = ours, 3 = theirs)
12 bits - name length (0xFFF if the length is 0xFFF or more)
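Version 3 and later: when the extended flag is set, an additional 16-bit field follows, split into (high to low bits) 1 reserved bit, the skip-worktree flag (used by sparse checkout), the intent-to-add flag (used by git add -N), and 13 unused bits that must be zero.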
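Both flag fields decode with plain masks and shifts. A sketch, assuming the two 16-bit values have already been read from the entry; `decode_flags` is a hypothetical name, and the masks follow the high-to-low bit order described above:

```python
def decode_flags(flags, extended_flags=0):
    """Decode the 16-bit entry flags and the optional version 3+ extended flags."""
    name_len = flags & 0x0FFF
    return {
        "assume_valid": bool(flags & 0x8000),
        "extended": bool(flags & 0x4000),
        "stage": (flags >> 12) & 0b11,
        # 0xFFF means "0xFFF or longer": the reader must scan for the NUL instead
        "name_len": None if name_len == 0xFFF else name_len,
        # Extended flags: bit 15 reserved, bit 14 skip-worktree, bit 13 intent-to-add
        "skip_worktree": bool(extended_flags & 0x4000),
        "intent_to_add": bool(extended_flags & 0x2000),
    }


print(decode_flags(0x2000 | 9))          # stage 2 ("ours"), 9-byte name
print(decode_flags(0x4000 | 5, 0x4000))  # extended entry with skip-worktree set
```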
Version 4 Optimizations
Version 4 introduces two key optimizations:
Path Compression
Path names are prefix-compressed relative to the previous entry:
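Entry 1: [N=0] "src/main/app.js" → "src/main/app.js"
Entry 2: [N=6] "utils/helper.js" → "src/main/utils/helper.js"
Each entry starts with an integer N, stored in the same variable-width encoding as OFS_DELTA offsets in pack files, followed by a NUL-terminated string. N is the number of bytes to remove from the end of the previous entry's path before appending the string; the first entry is encoded as if the previous path were empty.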
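The reconstruction can be sketched in Python. The varint helpers follow the OFS_DELTA-style scheme, in which each continuation step adds 1 before shifting; `read_v4_path` and the other names are hypothetical, and the demo lays the compressed names out back to back, whereas a real version 4 index interleaves them with the fixed-size entry fields:

```python
def decode_varint(buf, pos):
    """Decode Git's variable-width integer at buf[pos]; return (value, next_pos).

    Each continuation byte adds 1 before shifting, so every value has exactly
    one encoding (the OFS_DELTA scheme).
    """
    byte = buf[pos]
    pos += 1
    value = byte & 0x7F
    while byte & 0x80:
        byte = buf[pos]
        pos += 1
        value = ((value + 1) << 7) | (byte & 0x7F)
    return value, pos


def encode_varint(value):
    """Inverse of decode_varint, most significant group first."""
    out = [value & 0x7F]
    value >>= 7
    while value:
        value -= 1
        out.append(0x80 | (value & 0x7F))
        value >>= 7
    return bytes(reversed(out))


def read_v4_path(prev_path, buf, pos):
    """Rebuild one version 4 path name; return (path, next_pos)."""
    strip, pos = decode_varint(buf, pos)
    if strip > len(prev_path):
        raise ValueError("strip count longer than previous path")
    end = buf.index(b"\x00", pos)
    # Drop `strip` bytes from the end of the previous path, append the suffix
    return prev_path[:len(prev_path) - strip] + buf[pos:end], end + 1


# Names only; in a real index the fixed-size entry fields sit in between.
encoded = (encode_varint(0) + b"src/main/app.js\x00"
           + encode_varint(6) + b"utils/helper.js\x00")
path, pos = b"", 0
for _ in range(2):
    path, pos = read_v4_path(path, encoded, pos)
    print(path.decode())
```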
No Padding
Unlike versions 2-3, version 4 does not pad entries to 8-byte boundaries, resulting in smaller index files.
Index Extensions
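Extensions enable additional functionality without breaking compatibility. Each extension is laid out as:
4 bytes - Extension signature (if the first byte is 'A'-'Z', the extension is optional and readers that do not understand it may ignore it)
4 bytes - Extension size (32-bit, counting only the data that follows)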
N bytes - Extension data
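Walking the extension list needs only the offset where the entries end (found by parsing every entry, or from the EOIE extension) and the hash length. A sketch, assuming a SHA-1 index held in memory as bytes; `iter_extensions` is a hypothetical name and the sample data is synthetic:

```python
import struct


def iter_extensions(data, offset, hash_len=20):
    """Yield (signature, payload, is_optional) for each extension.

    `offset` is where the index entries end; extensions continue up to the
    trailing checksum, which occupies the last `hash_len` bytes.
    """
    end = len(data) - hash_len
    while offset < end:
        if offset + 8 > end:
            raise ValueError("truncated extension header")
        signature = data[offset:offset + 4]
        (size,) = struct.unpack_from(">I", data, offset + 4)
        if offset + 8 + size > end:
            raise ValueError(f"extension {signature!r} overruns the checksum")
        payload = data[offset + 8:offset + 8 + size]
        # A first byte in 'A'..'Z' marks an extension readers may skip
        yield signature, payload, 0x41 <= signature[0] <= 0x5A
        offset += 8 + size


blob = (b"TREE" + struct.pack(">I", 3) + b"abc"
        + b"link" + struct.pack(">I", 0)
        + bytes(20))  # placeholder for the trailing checksum
for sig, payload, optional in iter_extensions(blob, 0):
    print(sig.decode(), len(payload), "optional" if optional else "required")
```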
Cache Tree (TREE)
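Stores the IDs of tree objects that already exist for directories whose index entries are unchanged. Entries are written top-down and depth-first, each consisting of: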
NUL-terminated path component
ASCII decimal entry count
Space (0x20)
ASCII decimal subtree count
Newline (0x0A)
Object ID of the tree (present only for valid nodes; an invalidated node has an entry count of -1 and no object ID)
Benefits:
Speeds up git commit by reusing existing tree objects
Improves git status performance when comparing against HEAD
Reduces object database writes for incremental commits
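A parsing sketch for the TREE payload, assuming a SHA-1 repository. It returns nodes in stored order (top-down, depth-first) without rebuilding the hierarchy; `parse_cache_tree` is a hypothetical name and the sample bytes are made up:

```python
def parse_cache_tree(payload, hash_len=20):
    """Parse TREE extension data into a depth-first list of nodes."""
    nodes, pos = [], 0
    while pos < len(payload):
        nul = payload.index(b"\x00", pos)
        path = payload[pos:nul].decode()  # component relative to its parent
        newline = payload.index(b"\n", nul + 1)
        entry_count, subtrees = (int(x) for x in payload[nul + 1:newline].split(b" "))
        pos = newline + 1
        oid = None
        if entry_count >= 0:
            # Only valid nodes carry the object ID of the tree they represent
            oid = payload[pos:pos + hash_len].hex()
            pos += hash_len
        nodes.append({"path": path, "entry_count": entry_count,
                      "subtrees": subtrees, "oid": oid})
    return nodes


# Invalidated root (-1, one subtree) with a still-valid "src" subtree of 2 entries
src_oid = bytes(range(20))
sample = b"\x00-1 1\n" + b"src\x002 0\n" + src_oid
for node in parse_cache_tree(sample):
    print(node)
```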
Resolve Undo (REUC)
Preserves pre-resolution merge conflict state:
For each conflict:
- NUL-terminated pathname
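- Three NUL-terminated ASCII octal mode strings for stages 1-3 ("0" for a missing stage)
- Up to three object IDs (nothing is written for a missing stage)
When a conflict is resolved (for example with git add), the higher-stage entries are removed from the index and saved here, so that git checkout -m can recreate the conflict.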
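The record layout can be parsed as sketched below, assuming a SHA-1 repository; the mode strings decide how many object IDs follow. `parse_resolve_undo` is a hypothetical name and the sample conflict is synthetic:

```python
def parse_resolve_undo(payload, hash_len=20):
    """Parse REUC extension data into per-path records for stages 1-3."""
    records, pos = [], 0
    while pos < len(payload):
        nul = payload.index(b"\x00", pos)
        path = payload[pos:nul].decode()
        pos = nul + 1
        modes = []
        for _ in range(3):  # octal mode strings for stages 1, 2, 3
            nul = payload.index(b"\x00", pos)
            modes.append(int(payload[pos:nul], 8))  # "0" = stage missing
            pos = nul + 1
        oids = []
        for mode in modes:  # object IDs only for stages that exist
            if mode:
                oids.append(payload[pos:pos + hash_len].hex())
                pos += hash_len
            else:
                oids.append(None)
        records.append({"path": path, "modes": modes, "oids": oids})
    return records


# A both-added conflict: no common ancestor (stage 1), ours and theirs present
ours, theirs = b"\x11" * 20, b"\x22" * 20
sample = b"hello.txt\x00" + b"0\x00100644\x00100644\x00" + ours + theirs
print(parse_resolve_undo(sample))
```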
Split Index (link)
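Stores most entries in a shared base index and records only the changes on top of it. The extension contains:
Hash of the shared index, which is stored at $GIT_DIR/sharedindex.<hash> (an all-zero hash means no shared index is needed)
EWAH-encoded delete bitmap: a set bit removes the corresponding shared-index entry
EWAH-encoded replace bitmap: a set bit replaces the corresponding shared-index entry
The split index's own entries supply the replacements, one per set bit in the replace bitmap and in the same order, followed by any added entries sorted by name and then stage.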
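How the final index is assembled can be sketched as follows. EWAH decoding is out of scope, so the sketch assumes both bitmaps have already been decoded into sets of set-bit positions; `apply_split_index` and the (path, stage, oid) tuple layout are hypothetical. Replacing before deleting keeps the original positions valid, and an empty replacement path reuses the shared entry's name, which the format allows to save space:

```python
def apply_split_index(shared, delete_bits, replace_bits, own_entries):
    """Rebuild the full entry list from a shared index and a split index.

    shared       -- entries of $GIT_DIR/sharedindex.<hash>, in order
    delete_bits  -- set positions of the (already decoded) delete bitmap
    replace_bits -- set positions of the (already decoded) replace bitmap
    own_entries  -- entries stored in the split index: one replacement per
                    replace bit, in bit order, followed by added entries
    Entries are (path, stage, oid) tuples in this sketch.
    """
    result = list(shared)
    own = iter(own_entries)
    # Replace first, while positions still match the shared index.
    for pos in sorted(replace_bits):
        path, stage, oid = next(own)
        # An empty path means "same name as the entry being replaced".
        result[pos] = (path or shared[pos][0], stage, oid)
    # Then delete in bulk, so no position shifts during replacement.
    result = [e for i, e in enumerate(result) if i not in delete_bits]
    # Whatever is left in own_entries is new; restore index order.
    result.extend(own)
    result.sort(key=lambda e: (e[0].encode(), e[1]))
    return result


shared = [("a.txt", 0, "oid-a"), ("b.txt", 0, "oid-b"), ("c.txt", 0, "oid-c")]
own = [("", 0, "oid-b2"), ("d.txt", 0, "oid-d")]
print(apply_split_index(shared, {0}, {1}, own))
```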
Untracked Cache (UNTR)
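Caches the results of scanning the working tree for untracked files. The extension starts with:
Environment strings (NUL-terminated, preceded by their total size in variable-width encoding) describing where the cache may be used
Stat data for $GIT_DIR/info/exclude
Stat data for core.excludesFile
32-bit dir_flags
Hash of $GIT_DIR/info/exclude and hash of core.excludesFile (a null hash means the file does not exist)
NUL-terminated name of the per-directory exclude file (usually .gitignore)
Directory blocks in depth-first order listing untracked entries, followed by EWAH bitmaps, stat data, and hashes used to validate each directory
Git ignores the cache when the recorded environment no longer matches, and invalidates it when the exclude files change.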
File System Monitor (FSMN)
Records which entries the file system monitor (configured with core.fsmonitor) has reported as changed:
Version 1:
32-bit version (1)
64-bit nanoseconds since epoch
32-bit bitmap size
EWAH bitmap of non-valid entries
Version 2:
32-bit version (2)
NUL-terminated opaque token
32-bit bitmap size
EWAH bitmap of non-valid entries
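The two versions differ only in the field after the version number. A sketch that parses both, leaving the EWAH bitmap as raw bytes; `parse_fsmonitor` is a hypothetical name, the 32-bit bitmap size is treated as a byte count, and the token value is made up:

```python
import struct


def parse_fsmonitor(payload):
    """Parse FSMN extension data (versions 1 and 2) without decoding the EWAH bitmap."""
    (version,) = struct.unpack_from(">I", payload, 0)
    pos = 4
    if version == 1:
        # 64-bit nanoseconds since the Unix epoch
        (token,) = struct.unpack_from(">Q", payload, pos)
        pos += 8
    elif version == 2:
        # Opaque, NUL-terminated token chosen by the monitor
        nul = payload.index(b"\x00", pos)
        token = payload[pos:nul].decode()
        pos = nul + 1
    else:
        raise ValueError(f"unknown FSMN version {version}")
    (bitmap_size,) = struct.unpack_from(">I", payload, pos)
    pos += 4
    bitmap = payload[pos:pos + bitmap_size]  # raw EWAH bytes
    return {"version": version, "token": token, "ewah": bitmap}


v2 = struct.pack(">I", 2) + b"token-123\x00" + struct.pack(">I", 0)
print(parse_fsmonitor(v2))
```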
End of Index Entry (EOIE)
Enables fast extension location:
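32-bit offset to the end of the index entries (where the extensions begin)
Hash over the signatures and sizes of the preceding extensions, not their contents
EOIE must be written last: readers find it at a fixed distance from the end of the file, just before the trailing checksum, which lets them locate the extensions before parsing any variable-length entries.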
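Locating and checking EOIE can be sketched as below, assuming the whole index is in memory. The extension hash is recomputed over each preceding extension's 8-byte header (4-byte signature plus 4-byte big-endian size), exactly as those bytes appear on disk; `read_eoie` and the `build_index` helper that fabricates a tiny test index are hypothetical:

```python
import hashlib
import struct


def read_eoie(data, hash_name="sha1"):
    """Return the end-of-entries offset recorded by EOIE, or None if unusable."""
    hash_len = hashlib.new(hash_name).digest_size
    ext_size = 4 + hash_len
    # EOIE is the last extension, so its header sits right before the checksum.
    start = len(data) - hash_len - 8 - ext_size
    if start < 12 or data[start:start + 4] != b"EOIE":
        return None
    if struct.unpack_from(">I", data, start + 4)[0] != ext_size:
        return None
    (offset,) = struct.unpack_from(">I", data, start + 8)
    expected = data[start + 12:start + 12 + hash_len]
    # Re-hash each preceding extension's signature and size (not contents).
    h = hashlib.new(hash_name)
    pos = offset
    while pos < start:
        h.update(data[pos:pos + 8])  # 4-byte signature + 4-byte big-endian size
        pos += 8 + struct.unpack_from(">I", data, pos + 4)[0]
    if pos != start or h.digest() != expected:
        return None
    return offset


def build_index(extensions):
    """Synthetic SHA-1 index with zero entries, the given extensions, and EOIE."""
    body = b"DIRC" + struct.pack(">II", 2, 0)
    offset = len(body)
    eoie_hash = hashlib.sha1()
    for sig, payload in extensions:
        body += sig + struct.pack(">I", len(payload)) + payload
        eoie_hash.update(sig + struct.pack(">I", len(payload)))
    body += b"EOIE" + struct.pack(">II", 24, offset) + eoie_hash.digest()
    return body + hashlib.sha1(body).digest()


sample = build_index([(b"TREE", b"\x00-1 0\n")])
print(read_eoie(sample))  # 12: entries (none here) end right after the header
```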
Index Entry Offset Table (IEOT)
Enables multi-threaded index loading:
32-bit version (1)
For each block:
- 32-bit offset from file start
- 32-bit count of entries in block
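Because every block's starting offset is known up front, a reader can hand blocks to separate threads. A parsing sketch with made-up offsets; `parse_ieot` is a hypothetical name:

```python
import struct


def parse_ieot(payload):
    """Return [(offset, count), ...] blocks from IEOT extension data."""
    (version,) = struct.unpack_from(">I", payload, 0)
    if version != 1:
        raise ValueError(f"unsupported IEOT version {version}")
    if (len(payload) - 4) % 8:
        raise ValueError("IEOT data is not a whole number of blocks")
    # Each block: 32-bit offset from file start, 32-bit entry count
    return [struct.unpack_from(">II", payload, pos)
            for pos in range(4, len(payload), 8)]


payload = struct.pack(">IIIII", 1, 12, 500, 40960, 500)
for offset, count in parse_ieot(payload):
    print(f"{count} entries starting at byte {offset}")
```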
Sparse Directory Entries
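When sparse-checkout runs in cone mode with a sparse index enabled (the index.sparse setting), a directory outside the sparse cone can be stored as a single entry pointing to its tree object:
Mode: 040000 (directory)
Flags: SKIP_WORKTREE bit set
Path: Ends with directory separator '/'
Index format versions 4 and earlier were not designed for such entries, so an index that contains them also includes the sdir extension. Because 's' is not in 'A'-'Z', that extension is not optional, and tools that do not understand it should not operate on the index.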
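Recognizing a sparse directory entry is a direct translation of the three conditions. A sketch, assuming the mode, the version 3+ extended flags, and the raw path bytes have already been parsed; the function names are hypothetical:

```python
SKIP_WORKTREE = 0x4000  # skip-worktree bit of the version 3+ extended flags
SPARSE_DIR_MODE = 0o040000


def is_sparse_directory(mode, extended_flags, path):
    """True if an entry has all three markers of a sparse directory entry."""
    return (mode == SPARSE_DIR_MODE
            and bool(extended_flags & SKIP_WORKTREE)
            and path.endswith(b"/"))


def requires_sparse_support(extension_signatures):
    """An index containing sparse directory entries also carries 'sdir'."""
    return b"sdir" in extension_signatures


print(is_sparse_directory(0o040000, SKIP_WORKTREE, b"docs/"))   # True
print(is_sparse_directory(0o100644, SKIP_WORKTREE, b"docs/a"))  # False
```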
Checksum
The index file ends with a hash checksum:
20 bytes (SHA-1) or 32 bytes (SHA-256)
The checksum covers all content before it, ensuring index integrity.
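Verifying the checksum means hashing everything except the trailer and comparing. A sketch, assuming the file's bytes are already in memory (for a real repository, the contents of .git/index); `verify_index_checksum` is a hypothetical name. When index.skipHash is enabled, Git writes zero bytes instead of a checksum, which this sketch would report as a mismatch:

```python
import hashlib
import struct


def verify_index_checksum(data, hash_name="sha1"):
    """Check the trailing hash of an in-memory index file."""
    h = hashlib.new(hash_name)
    n = h.digest_size  # 20 for SHA-1, 32 for SHA-256
    if len(data) < 12 + n:
        return False
    h.update(data[:-n])  # everything before the checksum
    return h.digest() == data[-n:]


# Synthetic empty index: header with zero entries, then its SHA-1
body = b"DIRC" + struct.pack(">II", 2, 0)
good = body + hashlib.sha1(body).digest()
print(verify_index_checksum(good))                                # True
print(verify_index_checksum(good[:-1] + bytes([good[-1] ^ 1])))   # False
```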
Sorting and Ordering
Index entries are sorted by:
Primary: Path name as unsigned bytes (memcmp order, with the shorter name first when one is a prefix of the other, and no special handling of '/')
Secondary: Stage field (for merge conflicts)
```c
#include <string.h>

int index_name_cmp(const char *name1, int len1,
                   const char *name2, int len2,
                   int stage1, int stage2)
{
    int cmp = memcmp(name1, name2, len1 < len2 ? len1 : len2);
    if (cmp)
        return cmp;
    if (len1 < len2)
        return -1;
    if (len1 > len2)
        return 1;
    return stage1 - stage2;
}
```
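The same ordering is easy to reproduce in Python, because comparing bytes objects already behaves like memcmp() followed by the length tie-break. A sketch with hypothetical helper names, useful for checking that a hand-built list of entries is in index order:

```python
def index_order_key(path, stage):
    """Sort key matching index order: raw path bytes, then stage.

    Python compares bytes objects byte by byte as unsigned values, with a
    shorter prefix sorting first, which matches memcmp() plus the length
    tie-break.
    """
    return (path, stage)


def is_index_sorted(entries):
    """True if (path, stage) pairs are strictly ascending (no duplicates)."""
    keys = [index_order_key(p, s) for p, s in entries]
    return all(a < b for a, b in zip(keys, keys[1:]))


entries = [(b"ab", 0), (b"a/b", 0), (b"a.c", 0), (b"a", 3), (b"a", 1)]
entries.sort(key=lambda e: index_order_key(*e))
print(entries)
```

Note that "a.c" sorts before "a/b" because '.' (0x2E) is smaller than '/' (0x2F).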
Working with the Index
Reading Index Entries
```python
import struct


def read_varint(f):
    """Decode Git's variable-width integer (the OFS_DELTA encoding used by index v4)."""
    byte = f.read(1)[0]
    value = byte & 0x7F
    while byte & 0x80:
        byte = f.read(1)[0]
        value = ((value + 1) << 7) | (byte & 0x7F)
    return value


def read_cstring(f):
    """Read up to and including a NUL terminator; return the bytes without the NUL."""
    out = bytearray()
    while (byte := f.read(1)) != b'\x00':
        if not byte:
            raise EOFError('unterminated path name')
        out += byte
    return bytes(out)


def read_index_entry(f, version, prev_path=b''):
    # Read fixed-size portion (62 bytes for SHA-1): ten 32-bit stat fields
    # (40 bytes) + 20-byte object ID + 16-bit flags
    (ctime_s, ctime_n, mtime_s, mtime_n, dev, ino,
     mode, uid, gid, size) = struct.unpack('>10I', f.read(40))
    oid = f.read(20).hex()
    flags = struct.unpack('>H', f.read(2))[0]
    fixed_len = 62
    stage = (flags >> 12) & 0x3
    # flags & 0xFFF holds the name length, capped at 0xFFF; the NUL
    # terminator is used below instead, so long names also work
    extended_flags = 0
    if version >= 3 and flags & 0x4000:
        # Extended flags (skip-worktree, intent-to-add) follow the flags
        extended_flags = struct.unpack('>H', f.read(2))[0]
        fixed_len += 2
    if version == 4:
        # Prefix compression: drop N bytes from the end of the previous
        # path, then append the NUL-terminated suffix; no padding follows
        strip = read_varint(f)
        path_bytes = prev_path[:len(prev_path) - strip] + read_cstring(f)
    else:
        # Versions 2/3: NUL-terminated name, then NUL padding so the whole
        # entry (fixed part + name + NUL) is a multiple of 8 bytes
        path_bytes = read_cstring(f)
        entry_len = fixed_len + len(path_bytes) + 1
        f.read((8 - entry_len % 8) % 8)
    return {
        'path': path_bytes.decode('utf-8', errors='surrogateescape'),
        'raw_path': path_bytes,  # pass as prev_path when reading the next entry
        'oid': oid,
        'mode': mode,
        'size': size,
        'stage': stage,
        'extended_flags': extended_flags,
        'mtime': (mtime_s, mtime_n)
    }
```
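Version 4: Use index version 4 (the index.version setting, or git update-index --index-version 4) in large repositories to shrink the index through path compression and the removal of padding.
Split Index: Enable split index mode (core.splitIndex, or git update-index --split-index) in repositories whose index changes often, so Git usually rewrites only the small split file rather than the entire index.
Untracked Cache: Enable the untracked cache (core.untrackedCache) to speed up git status by not re-reading directories that have not changed.
FS Monitor: Set core.fsmonitor in very large working trees so Git asks a file system monitor which paths changed instead of checking every file.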