Basics like initializing a repository, staging and committing files aren’t explained here; they simply make sense, with no ‘Aha!’ moments there. Moving references, branching and merging, coupled with Git’s arcane command names [1], are the confusing parts.
Basics
- Git is a distributed VCS; each repo can act as both server and client
- Honestly, git (sub)commands are just graph-manipulating commands
- Every codebase is made of a graph; each commit is a node with edges to its parent(s) [2]
- Git diagrams often have arrows backwards (←) for this reason
- Git stores snapshots, not differences: the entire contents of each file are stored as a blob
- Every commit is a complete snapshot of the tracked repo contents + (0 or more) parent ID(s), identified by a 40-character (hexadecimal) SHA-1 hash
- This way, the exact state of your project can be referred to, copied, or restored at any time
“finally figuring out that git commands are strangely named graph manipulation commands – creating/deleting nodes, moving around pointers” – Kent Beck
- Nodes of the graph are created by your commits
- Nodes are never really deleted in the traditional sense; they’re made unreachable (see below)
- These unreachable nodes eventually get garbage collected by Git
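To see the "snapshot plus parent IDs" idea for yourself, here is a minimal sketch. It assumes a throwaway repository made with mktemp; the file name, commit messages and the demo identity are made up for illustration.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"          # throwaway repository
git init -q
# Hypothetical identity so that commits work on an unconfigured machine
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo one > file.txt
git add file.txt
git commit -q -m "first"                 # root commit: no parent
echo two > file.txt
git commit -q -am "second"               # its parent is "first"

# A commit object names one tree (the snapshot) and zero or more parents
git cat-file -p HEAD
git cat-file -p HEAD~1

# The snapshot holds the whole file as a blob, not a diff against "first"
blob=$(git cat-file -p HEAD:file.txt)

# Count the "parent" header lines of each commit
parents_of_second=$(git cat-file -p HEAD | grep -c '^parent ')
parents_of_root=$(git cat-file -p HEAD~1 | grep -c '^parent ' || true)
echo "second: $parents_of_second parent(s); root: $parents_of_root parent(s)"
```

A merge commit would show two or more parent lines in the same place.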
Reachability
      A---B---C
     /
D---E---F---G
     \
      H---I
An important linked-list concept applies to Git too:
If the first node is lost, the list, too, is lost.
- Since a commit also has parent commit(s) (except root), following the chain of parents will eventually take you back to the beginning of the project
- In a well-branched graph, depending on the leaf node you start from, different parts of the graph will be reachable
- Commit X is “reachable” from commit Y if commit X is an ancestor of commit Y
- In the above example, A, B and C are unreachable from G; so are F and G when starting from C, B or A
- The gc subcommand [3] walks the graph, building a list of every commit it can reach, and removes the unreachable ones [4]
- It frees up disk space; there’s no good reason to run it often
- Some Git subcommands may run it automatically too!
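Reachability can be checked mechanically with git merge-base --is-ancestor. The sketch below rebuilds only the D to G line and the A to C branch of the example, assuming A branches off at E; the tags named after the letters, the branch names lower and upper, and the helper functions are hypothetical.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/lower    # name the initial branch ourselves
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper: make an empty commit and tag it with its letter
commit() { git commit -q --allow-empty -m "$1"; git tag "$1"; }

commit D; commit E
git checkout -q -b upper                  # branch off at E
commit A; commit B; commit C
git checkout -q lower
commit F; commit G

# "X is reachable from Y" means "X is an ancestor of Y"
reachable() { git merge-base --is-ancestor "$1" "$2" && echo yes || echo no; }

a_from_g=$(reachable A G)
f_from_c=$(reachable F C)
e_from_c=$(reachable E C)
d_from_g=$(reachable D G)
echo "A from G: $a_from_g, F from C: $f_from_c, E from C: $e_from_c, D from G: $d_from_g"

# Following parents from G walks back to the root of the project
walk_from_g=$(git log --format=%s G | tr '\n' ' ')
echo "$walk_from_g"
```

The tags double as references here, so none of these commits would be collected by gc.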
References
“References make commits reachable” – Think like a Git
- Plainly, references are “meaningful” names for some commits
- They facilitate easy git-speak with your friends/colleagues 😜
- Branches and tags are references too
- Creating a branch reference is a way to “nail down” part of the graph that you want to return to later (reachability)
- References are just files, named after the reference, containing a 40-character commit ID
- They’re specific to a single repository
- Remote references are local, remote-tracking references to a commit in a remote repository [5]
- There are many more ways of referring to commits: man gitrevisions is your friend
- Collectively called commit-ish
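A quick way to convince yourself that references are just files: peek inside .git. This is a sketch that assumes Git's default files-based ref storage and freshly created (loose) refs; Git may later pack them into .git/packed-refs, in which case the individual files disappear. The branch and tag names are made up.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git commit -q --allow-empty -m "first"
git branch topic                          # a second name for the same commit
git tag v0                                # tags are references too

# Each reference is a small file holding a commit ID
cat .git/refs/heads/master
cat .git/refs/heads/topic
cat .git/refs/tags/v0

file_content=$(cat .git/refs/heads/topic)
commit_id=$(git rev-parse topic)

# HEAD is a file as well; when attached it names a branch instead of a commit
head_content=$(cat .git/HEAD)
echo "$head_content"
```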
Commands Affecting Refs
These are the primary subcommands that allow you to move refs directly:
commit
merge
rebase
reset
Subcommands that affect moving remote refs:
fetch
push
Commands like pull, cherry-pick, … work atop these.
Checkout vs Reset
Before getting into the details, here’s the gist:
checkout mostly operates on the working tree, while reset operates on the index.
To understand both commands, you first need to understand HEAD [6]. Most people know about the working tree and staging area but not HEAD.
HEAD references the currently checked-out commit; your working tree will mostly be from this snapshot, the commit pointed to by HEAD. Pro Git summarizes this nicely:
HEAD will be the parent of the next commit that is created.
Checkout
git checkout HEAD -- file
When you checkout a file from HEAD, you get a clean copy of file from the commit HEAD is pointing to; this replaces your working tree copy. HEAD is just a convenient choice here; you can replace it with any ref. If the ref is omitted, the copy comes from the index (the stage).
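The difference between the two sources (index versus a commit) is easy to try out. A minimal sketch, assuming a throwaway repository; the file contents are made-up markers showing where each copy came from.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo committed > file
git add file
git commit -q -m "first"      # HEAD's snapshot holds "committed"

echo staged > file
git add file                  # the index now holds "staged"
echo scratch > file           # the working tree holds "scratch"

git checkout -- file          # no ref given: the copy comes from the index
from_index=$(cat file)

git checkout HEAD -- file     # ref given: the copy comes from that commit
from_head=$(cat file)

echo "from index: $from_index, from HEAD: $from_head"
```

Note that the second form also overwrites the staged copy in the index, so the "staged" version is gone afterwards.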
git checkout topic
When you checkout a branch (a reference to a commit/node), e.g. topic, HEAD will be set to its tip commit and hence the entire working tree, not just a file, will be from the commit that branch is pointing to.
Reset
Plainly, reset moves HEAD around; it’s used to move HEAD to a given commit. There are different flavours of doing this, depending on what happens to the index [7] and working tree (--hard, --soft, --mixed, …), but the crux is to move HEAD.
But isn’t that what checkout does too? Yes, but with a difference. Quoting Pro Git, with my emphasis:
reset will […] move what HEAD points to. This isn’t the same as changing HEAD itself (which is what checkout does); reset moves the branch that HEAD is pointing to [8].
Caveat: with reset, HEAD moves the branch reference along with it only if it’s attached.
Detached HEAD
Whoa! Slow down there, cowboy. Before talking about detached, what’s the attached state of HEAD? We already know that HEAD is just a reference to a commit. Say this commit also has another reference pointing to it: a branch name.
When HEAD is moved by reset, if it’s attached to a branch, that reference too will move with HEAD.
C1 <-- C2 <-- C3 <-- C4 <-- C5 <-- master
                                     ^
                                     |
                                    HEAD
git reset --hard C3
This would move both HEAD and master to C3 [9]; HEAD would continue to be attached. If it weren’t attached, only HEAD would move, leaving master behind; hence the detached HEAD state [10].
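Both cases can be replayed in a scratch repository. In this sketch the tags C1 to C5 are hypothetical stand-ins for commit IDs, mirroring the diagram; the empty commits and the demo identity are likewise just for illustration.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Five commits, each tagged so it can be referred to by name
for n in 1 2 3 4 5; do
  git commit -q --allow-empty -m "C$n"
  git tag "C$n"
done

# Attached: HEAD points to master, so reset drags master along
git reset -q --hard C3
master_when_attached=$(git log -1 --format=%s master)
head_when_attached=$(git log -1 --format=%s HEAD)

# Detached: HEAD points straight at a commit, so reset leaves master alone
git checkout -q --detach master
git reset -q --hard C1
head_when_detached=$(git log -1 --format=%s HEAD)
master_when_detached=$(git log -1 --format=%s master)

echo "attached: HEAD=$head_when_attached master=$master_when_attached"
echo "detached: HEAD=$head_when_detached master=$master_when_detached"
```

The tags also keep C4 and C5 reachable after master has moved back.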
In its detached state, HEAD refers to a specific commit as opposed to referring to a named branch. Like Git’s diagnostic message says, it’s useful to poke around and inspect the code base at a particular commit. Making a new commit now would mean a commit only pointed to by HEAD.
There are a couple of ways to identify if HEAD is detached. The very first line of git status will tell you:
> git status
On branch master
…
> git status
# HEAD detached at 847fe59
Another way is to use git log; this is actually how I learnt it.
> git log --oneline -5
847fe59 (HEAD -> master) Initial commit
…
> git log --oneline -5
847fe59 (HEAD, master) Initial commit
Notice that when HEAD is attached, you see an arrow (->) pointing to the branch it’s attached to. However, in the detached state they’re listed as independent items.
Attach/Detaching HEAD
How do we attach HEAD to, or detach it from, a reference? Both are done with checkout, but with a subtle difference. To attach HEAD, you’d checkout a branch by its name:
> git checkout master
When you checkout a commit using anything other than a branch name (e.g. a commit ID, HEAD~1, branch~3, HEAD@{5}, HEAD^^, etc.), you detach HEAD. Since it wouldn’t know which branch to associate HEAD with, Git detaches it. This is what you normally do when you want to inspect the code base at a particular commit that has no name other than its commit ID.
> git checkout 847fe59
Here, it doesn’t matter if this commit has branch references pointing to it. Since you referred to it using the raw commit ID, Git takes it as a cue to detach HEAD.
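If you’d rather ask Git directly than read status or log output, git symbolic-ref can tell whether HEAD is attached. A small sketch; the state helper is a hypothetical convenience function, and the repository is a throwaway one.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git commit -q --allow-empty -m "Initial commit"
id=$(git rev-parse HEAD)

# Hypothetical helper: print the attached branch name, or "detached"
state() { git symbolic-ref -q --short HEAD || echo detached; }

git checkout -q master
by_name=$(state)        # checked out by branch name

git checkout -q "$id"
by_id=$(state)          # same commit, but referred to by its raw ID

git checkout -q master
reattached=$(state)     # back on the branch

echo "by name: $by_name, by ID: $by_id, again by name: $reattached"
```

The commit never changes across the three checkouts; only what HEAD points through does.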
Practise
I highly recommend playing around in Visualizing Git with checkout and reset; also get your hands dirty with the whole attach/detach business. Here’s a small snippet to get you started; see what happens as each command gets executed:
git commit
git commit
git commit
git commit
git commit
# create topic branch and checkout; HEAD now attached to topic
git checkout -b topic
# move HEAD one commit behind topic; this will also move topic with HEAD
git reset topic~1
# detach HEAD!
git checkout HEAD~2
# attach to master
git checkout master
# move back master by 3
git reset master~3
# move master forward/backward with commit ID
git reset f08ad6
Rebase
rebase seems to have a scary reputation on the web, with good reason of course. It’s infamous for rewriting history, something your teammates mightn’t take kindly to. However, when you’re doing this only locally, within your repo, before pushing, it’s a great tool.
The crux of a rebase: given a subgraph’s root node, rebase changes its parent pointer from one node to another, thereby rebasing the entire subgraph onto a new parent.
Take note, a commit is not just its contents but also includes its parent(s). So any kind of rebase entails — since the parent/lineage is changed — a change of commit ID for the same commit contents.
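The change of commit ID is easy to observe. Here is a minimal sketch, assuming a throwaway repository with a one-commit topic branch; the file names and commit messages are made up.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo base > base.txt; git add base.txt; git commit -q -m "base"

git checkout -q -b topic
echo feature > feature.txt; git add feature.txt; git commit -q -m "feature"
old_id=$(git rev-parse topic)

git checkout -q master
echo more > more.txt; git add more.txt; git commit -q -m "more on master"

git checkout -q topic
git rebase -q master                      # give topic's commit a new parent: master's tip

new_id=$(git rev-parse topic)
new_parent=$(git rev-parse topic~1)
master_tip=$(git rev-parse master)
subject=$(git log -1 --format=%s topic)

echo "before: $old_id"
echo "after:  $new_id ($subject)"
```

The message and the change to feature.txt are the same as before, yet the ID differs because the parent does.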
Interactive rebase (rebase -i) is quite useful. I frequently use it to amend (not just the most recent commit), fix, reword, edit, drop or squash commits. During an interactive rebase, one can even create multiple commits as usual and continue with the rebase; things will be taken care of! This is normal when dividing a commit into smaller parts.
pull = fetch + merge rebase? 🤔
When pulling from a remote branch, you might know that your changes are unrelated to the ones coming down. In this case, to avoid a merge commit and have a linear commit history, you’d pass --rebase to override pull’s default way of integrating the fetched changes: merge.
git pull --rebase origin master
git pull is just git fetch followed by git merge, which creates a new merge commit when the two histories have diverged. git pull --rebase, however, is git fetch followed by git rebase: it fetches the remote’s commits and then replays your local commits atop the tip of the fetched branch. This works as-is if there are no conflicts; otherwise you have to resolve them as you normally would. The resolution becomes part of whichever of your commits the rebase halted at, so you end up rewriting that commit. However, you don’t have to force-push your changes to the remote, since only your local, unpushed commits were rewritten. Rewriting (commit) history, as long as it is not public, is OK 😉
A counterpoint to pull-with-rebase: if you want logical separation of a set of commits, say for a completely new feature, then rebase, which makes them inline and muddled with unrelated history, isn’t the right tool; use merge instead.
Use git pull --rebase when your changes do not deserve a separate branch.
seems to be the appropriate answer to “when should I git pull --rebase?”
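To watch pull-with-rebase keep history linear, here is a sketch using two local clones of a bare repository standing in for the remote. The directory names alice, bob and origin.git, the file names and the commit messages are all hypothetical.

```shell
set -e
work=$(mktemp -d) && cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# alice starts the project; origin.git plays the shared remote
git init -q alice
cd alice
git symbolic-ref HEAD refs/heads/master
echo base > base.txt; git add base.txt; git commit -q -m "base"
cd "$work"
git clone -q --bare alice origin.git
git -C alice remote add origin "$work/origin.git"

# bob clones, commits and pushes first
git clone -q origin.git bob
( cd bob && echo theirs > theirs.txt && git add theirs.txt \
    && git commit -q -m "theirs" && git push -q origin master )

# alice has an unrelated local commit, so the histories have diverged
cd alice
echo mine > mine.txt; git add mine.txt; git commit -q -m "mine"

# fetch + rebase: alice's commit is replayed atop what bob pushed
git pull -q --rebase origin master

history=$(git log --format=%s | tr '\n' ' ')
merges=$(git rev-list --merges --count HEAD)
echo "history: $history(merge commits: $merges)"
```

Running the same steps with a plain git pull would instead add a merge commit on top of "mine" and "theirs".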
See Also
I get surprised by Git commands every now and then; I document the obscure but useful ones!
Learn by Doing
try.github.io for good DIY resources.
- Visualizing Git – lets you visualize your git commands
- Visualizing Git Concepts with D3 – explains commands with interactive images
- A Visual Git Reference – explains commands with images
[1] Magit – Git porcelain for Emacs – shields me mostly but knowing them helps.
[2] Merge commits have more than one parent.
[3] Not to be confused with git clean, which removes untracked files from the working tree.
[4] git reflog shows these otherwise unreachable commits. You’ve time until git gc is run to make a commit reachable by adding a reference to it.
[5] Remote-tracking branches (origin/master) are different from remote branches (origin master); the former is local, updated by fetching from the latter.
[6] Case-sensitive! HEAD will be the parent of a new commit in the working tree, while a branch’s head means its tip; see glossary.
[7] I thought the index is empty until something’s staged. However, Pro Git clarifies that the index actually has “all the file contents that were last checked out into your working directory!” Don’t believe me? Try git ls-files -s. You’ve to grok this to get why git reset --mixed works the way it does.
[8] Pro Git is explaining reset’s internals here, so it may sound like it won’t move HEAD but only the branch, but rest assured that it moves both.
[9] Using C3 for readability; substitute with a proper commit ID.
[10] Refer man git-checkout; §DETACHED HEAD details it with nice ASCII art ✨.