Git repository internals

Part 1. What’s inside

A Git repository consists of the following objects:
A blob is a piece of content without metadata (not even a file name).
A tree is a list of file/directory names pointing at blobs or other trees /* not considering subprojects */.
A commit is an object that 1. points to its parent commit[s], 2. points to a tree (the project root), 3. contains metadata (commit message, dates, authors, etc.).
A ref points to a commit, usually the last one in some branch (either yours or a remote one).
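To see these object kinds for yourself, here is a minimal sketch that builds a throwaway repository and asks Git for the type of each object. The scratch directory, the file name src/file.txt and the "Demo" identity are made up for the illustration.

```shell
set -e
demo=$(mktemp -d)                 # hypothetical scratch directory
cd "$demo"
git init -q .
git config user.name "Demo"
git config user.email "demo@example.com"
mkdir src
echo 'hello' > src/file.txt
git add .
git commit -q -m "Initial commit"

commit_type=$(git cat-file -t HEAD)             # commit
root_type=$(git cat-file -t 'HEAD^{tree}')      # tree (project root)
src_type=$(git cat-file -t HEAD:src)            # tree (the "src" directory)
blob_type=$(git cat-file -t HEAD:src/file.txt)  # blob
echo "$commit_type $root_type $src_type $blob_type"

git cat-file -p HEAD            # tree pointer, author, committer, message
git cat-file -p 'HEAD^{tree}'   # one line per entry: mode, type, id, name
git show-ref                    # every ref and the commit it points to
```

"git cat-file -t" prints an object's type and "git cat-file -p" pretty-prints its content, so the two together are enough to walk the whole structure by hand.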

Part 2. A short story

1 Adding files

Let’s create a simple project:
mkdir qqq && cd qqq
mkdir src
echo 'int main(){printf("Helo wrold\r");}' > src/main.c
cat > Makefile << \EOF
all:
	gcc src/*.c -o hello
EOF
git init
git add .
“git add .” added src/main.c and Makefile to the index, creating two blobs (main.c and Makefile). Two trees (the root tree and the “src” tree) describe this layout; Git writes them as objects when the index is committed.
figure git1.png
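What “git add” stores can be checked directly. The sketch below uses a scratch repository with made-up content: the blob exists right after staging, while the tree objects are written from the index by “git write-tree”, which is the same step “git commit” performs internally.

```shell
set -e
demo=$(mktemp -d)                 # hypothetical scratch directory
cd "$demo"
git init -q .
mkdir src
echo 'int main(){return 0;}' > src/main.c
git add .

# The id Git computes for the file content...
blob_id=$(git hash-object src/main.c)
# ...already names a stored blob object after "git add".
blob_type=$(git cat-file -t "$blob_id")

# The index lists each staged path together with its blob id.
git ls-files --stage

# Write the index out as tree objects and look at the root tree.
root_tree=$(git write-tree)
root_type=$(git cat-file -t "$root_tree")
git cat-file -p "$root_tree"
```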

2 Committing

git commit -m "Add files to project"
We have just created a commit. The ref (the branch name) points to the commit, the commit points to the root tree, and trees point to blobs.
figure git2.png
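The chain of pointers can be followed step by step. This is a sketch on a scratch repository with a one-line Makefile; the variable names are only for illustration.

```shell
set -e
demo=$(mktemp -d)                 # hypothetical scratch directory
cd "$demo"
git init -q .
git config user.name "Demo"
git config user.email "demo@example.com"
echo 'all:' > Makefile
git add Makefile
git commit -q -m "Add files to project"

branch_ref=$(git symbolic-ref HEAD)            # full ref name, refs/heads/<branch>
commit_id=$(git rev-parse "$branch_ref")       # ref    -> commit
tree_id=$(git rev-parse "$commit_id^{tree}")   # commit -> root tree
blob_id=$(git rev-parse "$tree_id:Makefile")   # tree   -> blob
echo "$branch_ref -> $commit_id -> $tree_id -> $blob_id"

git cat-file -p "$blob_id"                     # prints the file content
```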

3 More commits

echo 'int main(){ printf("%s", "Hello world\n"); return 0;}' > src/main.c
git commit -am "Typo fix"
We changed main.c, so it gets a new blob. That requires a new “src” tree (with the updated blob pointer), which in turn requires a new root tree (with the updated “src” tree pointer). The new commit points to the new root tree.
figure git3.png
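The ripple effect of one changed file can be verified by comparing object ids between the two commits. A sketch on a scratch repository with placeholder file contents:

```shell
set -e
demo=$(mktemp -d)                 # hypothetical scratch directory
cd "$demo"
git init -q .
git config user.name "Demo"
git config user.email "demo@example.com"
mkdir src
echo 'first version' > src/main.c
echo 'all:' > Makefile
git add .
git commit -q -m "Add files to project"
echo 'second version' > src/main.c
git commit -q -am "Typo fix"

old_src=$(git rev-parse HEAD~1:src)
new_src=$(git rev-parse HEAD:src)
old_root=$(git rev-parse 'HEAD~1^{tree}')
new_root=$(git rev-parse 'HEAD^{tree}')
old_makefile=$(git rev-parse HEAD~1:Makefile)
new_makefile=$(git rev-parse HEAD:Makefile)

# The "src" tree and the root tree changed; the Makefile blob is shared.
[ "$old_src" != "$new_src" ] && echo "src tree: new"
[ "$old_root" != "$new_root" ] && echo "root tree: new"
[ "$old_makefile" = "$new_makefile" ] && echo "Makefile blob: reused"
```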
mkdir src/plugins
echo 'void func(){}' > src/plugins/extra.c
git add src/plugins
git commit -m "Add a plugin"
figure git4.png
cat > Makefile << \EOF
all:
	gcc src/*.c src/plugins/*.c -o hello
EOF
git add Makefile
git commit -m "Oh, forgot about Makefile"
figure git5.png
Note that the “src” tree is now reused from the previous commit (as nothing has changed there).
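The reuse shows up in “git ls-tree”: both root trees list the very same id for “src”. A sketch with a scratch repository and placeholder file contents:

```shell
set -e
demo=$(mktemp -d)                 # hypothetical scratch directory
cd "$demo"
git init -q .
git config user.name "Demo"
git config user.email "demo@example.com"
mkdir src
echo 'void func(){}' > src/extra.c
echo 'all:' > Makefile
git add .
git commit -q -m "Add a plugin"
echo 'all: hello' > Makefile
git add Makefile
git commit -q -m "Oh, forgot about Makefile"

git ls-tree HEAD~1    # root tree of the previous commit
git ls-tree HEAD      # root tree of the new commit

# "mode type id" of the src entry in each root tree (the part before the tab)
src_before=$(git ls-tree HEAD~1 src | cut -f1)
src_after=$(git ls-tree HEAD src | cut -f1)
[ "$src_before" = "$src_after" ] && echo "src tree reused"
```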

Part 3. Rewriting history

Commit “c54c” is not useful: the very next commit fixes an obvious omission. We want to merge those two commits into one, eliminating the “wrong” intermediate c54c.
git rebase -i HEAD~2  # ~2 means: process the last 2 commits
# in editor:
pick c54c
squash ff5f
This converts commits c54c and ff5f to patches, rewinds to commit 58cd, and then applies those two patches while producing only one commit (you can also split and edit them). /* Note: this case can be done more simply: git reset --soft HEAD~2 && git commit -m "Add a plugin" */
The stale commits “ff5f” and “c54c” still remain in the repository and can be used for recovery (until they are finally cleaned up by “git gc”).
figure git6.png
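Recovery from a rewrite can be sketched as follows. Because an interactive rebase needs an editor, this example squashes with “git reset --soft” instead; the file names and the branch name “recovered” are made up.

```shell
set -e
demo=$(mktemp -d)                 # hypothetical scratch directory
cd "$demo"
git init -q .
git config user.name "Demo"
git config user.email "demo@example.com"
echo a > a.txt && git add a.txt && git commit -q -m "Base"
echo b > b.txt && git add b.txt && git commit -q -m "Add a plugin"
echo c > c.txt && git add c.txt && git commit -q -m "Oh, forgot about Makefile"

old_tip=$(git rev-parse HEAD)     # remember the commit we are about to orphan

# Squash the last two commits into one.
git reset -q --soft HEAD~2
git commit -q -m "Add a plugin"

# The old commit is no longer on the branch, but the object still exists...
old_type=$(git cat-file -t "$old_tip")
# ...and the reflog remembers where HEAD used to be.
git reflog

# Recovery: put a branch on the stale commit.
git branch recovered "$old_tip"
```

If the old id was not saved beforehand, it can be read from the “git reflog” output.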
Once the stale commits get cleaned up (they are kept for a certain time first), the repository will look like this:
figure git7.png
“ff5f” and “c54c” are gone, tree “aa54” is gone too (as it is not referenced by anything anymore).
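The cleanup can be forced for demonstration purposes. Normally Git keeps unreachable commits for a grace period; the sketch below expires the reflog and prunes immediately, which you would rarely want in a real repository. It uses “git commit --amend” on a scratch repository to produce the stale commit.

```shell
set -e
demo=$(mktemp -d)                 # hypothetical scratch directory
cd "$demo"
git init -q .
git config user.name "Demo"
git config user.email "demo@example.com"
echo one > file.txt
git add file.txt
git commit -q -m "Base"
stale=$(git rev-parse HEAD)

# Rewrite the commit; the original is now reachable only through the reflog.
echo two > file.txt
git commit -q -a --amend -m "Base, rewritten"

if git cat-file -e "$stale" 2>/dev/null; then before=present; else before=gone; fi

# Drop the reflog entries, then prune unreachable objects right away.
git reflog expire --expire=now --all
git gc -q --prune=now

if git cat-file -e "$stale" 2>/dev/null; then after=present; else after=gone; fi
echo "before gc: $before, after gc: $after"
```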

Part 4. Interaction with other repositories

We can fetch commits from other repositories or push commits to them. Obtaining (or publishing) a commit implies transferring its trees and blobs as well. For example,
git remote add origin git://github.com/hello/exaple.git
git fetch origin +refs/heads/master:refs/remotes/origin/master
This registers the URL of the remote repository under the name “origin” and then fetches origin’s ref “refs/heads/master” (or just “master”) into our local ref “refs/remotes/origin/master”, replacing the old “refs/remotes/origin/master” even if the new commit is not its descendant (this happens if, for example, somebody fetched our commit ff5f or c54c and we have since rewritten it into the new commit 7747).
Note: this was the full form of the fetch command; usually “git fetch origin” or just “git fetch” does what is required.
An example push command is “git push origin 4112:refs/heads/master”. The refspec can also carry a forcing “+”. Instead of the direct commit id “4112”, the source can be “master” or “HEAD” (or even nothing, which means “delete the remote ref”), and push can be configured so that a plain “git push” does what you expect.
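The refspec forms above can be tried without a network by using a local bare repository as the “remote”. This is a sketch: the directory layout and the branch name “scratch” are assumptions made for the example.

```shell
set -e
work=$(mktemp -d)                         # hypothetical scratch directory
git init -q --bare "$work/remote.git"     # stands in for the remote repository
git init -q "$work/local"
cd "$work/local"
git config user.name "Demo"
git config user.email "demo@example.com"
echo hi > file.txt
git add file.txt
git commit -q -m "First"

git remote add origin "$work/remote.git"

# Push a commit by id to an explicitly named remote ref.
tip=$(git rev-parse HEAD)
git push -q origin "$tip:refs/heads/master"

# Full-form fetch; the leading "+" allows a non-fast-forward update.
git fetch -q origin +refs/heads/master:refs/remotes/origin/master
fetched=$(git rev-parse refs/remotes/origin/master)

# Publish a second ref from HEAD, then delete it with an empty source.
git push -q origin HEAD:refs/heads/scratch
git push -q origin :refs/heads/scratch
leftover=$(git ls-remote origin refs/heads/scratch)   # empty once deleted
```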