Common Git Problems and How to Fix Them (citizen428.net)
228 points by mzehrer on July 28, 2018 | 91 comments



I think the most powerful tool for fixing mistakes is "git reflog". It doesn't fix everything, but it works very well as long as you view git with the right mental model: a git repo is an ever-growing tree of immutable commits, with branches and tags pointing to the most interesting ones. As long as code has ever made it into that tree (by being in any commit at any time), it's recoverable, and reflog lets you trace your steps back to any point in the past on that tree. Supposedly-destructive operations like amend and rebase actually just build the tree at a different point and move branches to other places in the tree, but (pretty much) nothing in the tree is ever destroyed.
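A minimal sketch of that recovery, assuming a throwaway repository created with mktemp (the file name and the demo identity are made up for illustration). It "loses" a commit with a hard reset and then finds it again through the reflog:

```shell
set -e
# Hypothetical identity so commits work on a machine without git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Throwaway repo so nothing real is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q

echo "first" > file.txt
git add file.txt
git commit -q -m "first"

echo "second" > file.txt
git commit -q -am "second"
lost=$(git rev-parse HEAD)        # the commit we are about to "lose"

# Move the branch back one commit; "second" is no longer on any branch.
git reset -q --hard HEAD~1

# The reflog still records where HEAD pointed before the reset.
git reflog -n 3
recovered=$(git rev-parse 'HEAD@{1}')

# Point the branch back at the commit that seemed to be gone.
git reset -q --hard "$recovered"
cat file.txt
```

In real use you would read the `git reflog` output and pick the entry you want rather than assuming it is `HEAD@{1}`.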

For the actually-destructive git commands like checkout and reset, another tool that I'd highly recommend is a "local history" feature in your editor. JetBrains IDEs do this by default, and other editors have plugins for it. It automatically records changes as you make them so you can go back to previous versions of any file. Usually git is enough to save me, but I've also had plenty of times where I make some mistake outside of git commit history and am saved by digging through local history.


SmartGit has a brilliant implementation of the reflog.

The Git command line reflog is just a list of hashes and messages. If you're not sure which of those commits is the one you want, it's a fairly laborious process to dig through them. What if you have several commits with the same message, as will happen if you've rebased or amended any commits?

In SmartGit, you simply click the Recyclable Commits checkbox in the Log view, and now everything in the reflog shows up in the log tree, just like any other commit. You can see immediately the parent of each reflog commit, and to see what you changed, just click one of them as you would any other commit in the log. SmartGit shows the differences immediately.

Same thing for stashes. After all, stashes are really just commits by another name. Click the Stashes checkbox and they show up in the log too.

SmartGit is full of features like this where something cumbersome on the command line is straightforward and easy. I've used it for years and recommend it highly.

https://www.syntevo.com/smartgit/


It's also commercial, expensive software. I agree that it sounds useful, but are there any equivalent open source solutions?


It's only commercial software if you use it for commercial purposes. They have a pretty liberal non-commercial license as well. SmartGit is free to use for any of these purposes:

* to actively work on open-source projects,

* for learning or teaching on a public academic institution,

* in the spare time to manage projects where you don't get financial compensation for (hobby usage),

* by public charitable organizations primarily targeting philanthropy, health research, education or social well-being.

(Wording is from their license, English is not their first language.)

https://www.syntevo.com/documents/smartgit-license.html

For open source alternatives, a few people I've worked with like SourceTree. I don't think it has the integrated reflog though. I'd be curious about other recommendations too.


I tend to work on both Open Source and Commercial software.

The cost of context switching between different tools because of their licensing makes it a non-starter to use one for open source and another for commercial work.

Granted, the cost of a tool is minuscule compared to the potential gains in development speed, and any corporation that balks at paying for tools for its developers won't be in business very long. Still, I find myself penny-pinching and using free and/or open source tools, even if they are inferior or lacking one or two bits of functionality.


$99 is expensive? That’s, like, a half hour of my time


Not everyone is on your hourly rate.

Hardly anyone is on your hourly rate...


> For the actually-destructive git commands like checkout and reset, another tool that I'd highly recommend is a "local history" feature in your editor.

Amen. Both vim and emacs do this by default, too. They keep all changes under a tree, so you can even undo your undos.

Another thing that's saved me from destructive changes is the terminal scrollback buffer, which typically represents a timeline of my work. I have a habit of looking at `git diff` frequently. If I lose any of those changes because of an accidental `y` in `git checkout -p`, I can just execute `git apply`, copy the hunks that I want (with the initial file diff header), paste them into the same terminal, and press Ctrl-D to finish.


I believe you need the undo-tree package for Emacs to do that. :)


undo-tree displays it nicely, but you can still navigate it on stock Emacs with C-/ alone. You can insert "foo", undo the insert with C-/, insert "bar", undo the insert with C-/, and undo the undo with C-/ to get back to "foo".

This is also the case with vim. There is the gundo plugin to display a nice tree to navigate, but you can navigate it on stock vim with g- and g+. Emacs C-/ is actually equivalent to vim's g- and not the u command.

EDIT: I wish emacs had something like vim's g+ to move forward in that history. If I wanted to move back to "bar" after the undos that I did, I'd have to undo the undo of the undo with C-/, and undo the undo of the insert with C-/. Then, if I wanted to get back to "foo" again, I'd have to undo the undo of the undo of the insert with C-/, and then undo the undo of the undo of the undo with C-/. This can get confusing quickly without the visual display of the tree, but it's just a matter of hitting C-/ enough times to get to where I wanted to be.

EDIT 2: Now that I think about it, C-/ is not equivalent to g-, but it's the closest thing. It's just different models of undo that avoid any loss of state. While vim's model is an actual tree and undos are movements in it, emacs's is a ring and undos are inserted changes at the end of the ring. In vim you can be at any point of the tree, but in emacs you're always at the end.


In VIM one can also specify a point in wall time to return to, i.e. `:earlier 10m`. I'm not sure how granular it gets, but I've saved work that I was sure was gone with that tool.


Even more useful, you can use `:earlier 1f`, `:earlier 2f`, etc. (and `:later 1f`, etc.) to go back and forth to however many times the buffer was saved until the moment it was opened. So, for example, if I edit a source file that works, save it, see I made a mistake when I run it, try to fix it, save it again, and see another problem, I can go back to when it was working by doing `:earlier 2f` no matter how many changes and undos I've made.

EDIT: As to how granular it is, besides :write units, you can specify days, hours, minutes, seconds, or individual change units (the kind that u, g-, and g+ work with). So, `:earlier 10` is the same as `10g-`.


Thank you!


> As long as code has ever made it into that tree (by being in any commit at any time), it's recoverable, and reflog lets you trace your steps back to any point in the past on that tree.

Actually it's enough for the file to hit index (staging area) to be recoverable.


True, but finding the right sha1 is a bit more difficult in that case (given that blobs and trees don't have much metadata associated with them when compared to commits and annotated tags).


Do you mean it as recovering it while it's still in the index, or is it possible to add a file to the index, remove it from the index, let days of commits pass and still recover it?

If the latter, how do you do that?


Once you add it to index it's in .git/objects. You can find it using `find` with modification date parameter. The name will be a hash but you can look at the contents. (there will also be a tree object with file names).
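A rough sketch of that kind of recovery, under assumed file names. Note that it swaps in `git fsck --unreachable` to list candidate objects instead of running `find` over .git/objects, and it cheats by remembering the blob ID up front so the result can be checked:

```shell
set -e
# Hypothetical identity so commits work on a machine without git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
echo "tracked" > readme.txt
git add readme.txt
git commit -q -m "initial"

echo "precious work" > notes.txt
git add notes.txt                  # the blob is written to .git/objects here
blob=$(git hash-object notes.txt)  # same ID that git computed during the add

# Lose the file from both the index and the working tree.
git rm -q --cached notes.txt
rm notes.txt

# The blob is unreachable now, but it is still on disk and fsck lists it.
git fsck --unreachable 2>/dev/null | grep "$blob"

# Read the content back by ID.
restored=$(git cat-file -p "$blob")
echo "$restored"
```

Without the remembered ID you would have to run `git cat-file -p` on each unreachable blob until you find the right one, and the objects only survive until garbage collection prunes them.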


I would recommend never using git checkout to clear local changes, and instead recommend either git stash or git stash -p depending on whether you want to save the whole work tree or just part of it. The stash subcommand internally creates commits for the stashed content, so you can probably also get to them via the reflog, though I haven't personally needed to do this.
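A small sketch of the stash approach, assuming a scratch repository and a made-up file name. The local edit is set aside rather than thrown away, and can be brought back later:

```shell
set -e
# Hypothetical identity so commits work on a machine without git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
echo "v1" > app.txt
git add app.txt
git commit -q -m "v1"

echo "experimental" > app.txt      # uncommitted local change

# Clear the working tree without destroying the change.
git stash push -q -m "wip"
clean=$(cat app.txt)               # content as of the last commit

git stash list                     # the change is parked here

# Changed your mind? Bring it back.
git stash pop -q
back=$(cat app.txt)
```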


If I have local "dirty" state that I need to hold on to, I just hide it in a temp directory like this:

  mkdir temp
  echo '*' >temp/.gitignore

Quick, dirty, effective. :)


Local History in my IDE has saved me many times!


Can the main article link to https://www.codementor.io/citizen428/git-tutorial-10-common-... instead? It is vastly more readable.


Oh thanks, those snippet links make it really hard to read the post otherwise


Thank you, that’s so much better


Mods?


Why aren't code snippets part of the article? Why are they hosted somewhere else? Especially when they are 2-3 line snippets.


Agreed. I stopped reading after two (which I knew anyway) due to the annoyance factor. Hopefully the author can fix and republish the article.


> Originally published at gist.github.com

Maybe because the article was originally a gist[0] itself.

[0] https://gist.github.com/citizen428/16fb925fcca59ddfb652c7cb2...


That gist links back to the actual original article, which shows code inline:

https://www.codementor.io/citizen428/git-tutorial-10-common-...


This should be the posted article.

A lot of useful stuff here but not readable until I found this.


Better, but still too many typos or wrong syntax to be useful enough.


I'm wondering if they meant to embed them.


It looks like the author used Medium's import tool[1] but didn't change the gist links to embeds. The source for the import is a gist itself[2], which just has the raw links.

[1]: https://help.medium.com/hc/en-us/articles/214550207-Import-p...

[2]: https://gist.github.com/citizen428/16fb925fcca59ddfb652c7cb2...


I'm sorry, I hate "recipe style" Git articles.

Understanding reset and checkout is not hard if you understand the underlying data model. If you don't understand the data model, it's all black magic.

Interestingly, I feel the same way about actual recipe books for cooking. It's one thing to keep around as a reference. But if you don't know what a bay leaf tastes like and what it does to a dish, you won't learn anything from someone telling you to use it in a particular recipe.


I don't understand this fad that programmers should know only first principles, and shun checklists in favor of deriving everything from scratch every time.

I know how my car works but I still pull out the service manual when I need to change an air filter. Nothing under the hood is 'black magic' but seeing the procedure written out saves me a ton of time.

And: git commands are hard, even if you understand the underlying data model. (I've only been using it for 8 or 9 years. Maybe I'm just too dumb?) Understanding the data model won't help me remember that the "--amend" flag is how I edit the commit message (though I could probably build it myself from reset + commit, if I really wanted to), or what folder I should put global pre-commit hooks in.


> I don't understand this fad that programmers should know only first principles, and shun checklists in favor of deriving everything from scratch every time. I know how my car works but I still pull out the service manual when I need to change an air filter. Nothing under the hood is 'black magic' but seeing the procedure written out saves me a ton of time.

The problem is that the explanations of the commands are wrong.

Reset does not undo a commit. It moves HEAD and updates the work tree. I have no problem with cheat sheets and quick references. Memorizing the commandline interface is not important. Use a cheat sheet. Handwaving over what the commands you are running actually do is asking for trouble.

Edit:

I was witness to a similar debate on IRC the other day. Someone insisted that you can't learn Python without understanding the underlying data model of strings in C. IMO that is ridiculous because the abstraction in Python is close to airtight; you just about never need to think about the implementation of strings when using strings.

Git, on the other hand, frequently exposes implementation details to the user. Casual users are likely to run into edge cases that they don't understand. You basically can't resolve rebase conflicts effectively if you don't know what "ours" and "theirs" mean. The man pages are indecipherable if you don't know what blobs and trees are. You don't need to teach people that stuff on day one, when they first learn about pull, add, commit, etc. But you had better teach them soon. I see no reason why a discussion of rebase couldn't at least casually explain how it's implemented: "check out base branch, cherry-pick commits from rebased branch", is that so hard? If they don't understand what a common ancestor is, how are they expected to know when to use --onto?
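That casual explanation of rebase can be tried out directly. The sketch below (branch and file names are invented, and it is a simplification of what rebase really does) replays a topic branch by hand with cherry-pick, then runs a real rebase and compares the resulting trees:

```shell
set -e
# Hypothetical identity so commits work on a machine without git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
echo "base" > base.txt
git add base.txt
git commit -q -m "base"
main=$(git symbolic-ref --short HEAD)   # whatever the default branch is called

# Two commits on a topic branch.
git checkout -q -b topic
echo "t1" > t1.txt; git add t1.txt; git commit -q -m "topic 1"
echo "t2" > t2.txt; git add t2.txt; git commit -q -m "topic 2"

# Meanwhile the base branch moves on.
git checkout -q "$main"
echo "m" > main.txt; git add main.txt; git commit -q -m "main moves on"

# "Rebase" by hand: start at the base tip, replay the topic commits.
git checkout -q -b topic-manual "$main"
git cherry-pick "$main..topic" >/dev/null

# The real thing.
git checkout -q topic
git rebase -q "$main"

manual_tree=$(git rev-parse 'topic-manual^{tree}')
rebased_tree=$(git rev-parse 'topic^{tree}')
echo "manual: $manual_tree"
echo "rebase: $rebased_tree"
```

Both branches end up with the same file contents; the commit IDs themselves are not guaranteed to match, since each replayed commit gets its own committer timestamp.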


I would love to read a cooking book which talked about the effect of ingredients and process. Anybody know if one exists?

edit: well this one looks interesting https://www.amazon.com/Ingredient-Unveiling-Essential-Elemen...


I'll strongly second the Cook's Illustrated recommendation. If you want to know the reasons behind the recipes, there's no better place to start. I've used things I've learned from their articles many, many times to help with other recipes.

And here are some books that spend at least as much time talking about how to cook as they do giving lists of ingredients.

_The Zuni Cafe Cookbook_ by Judy Rodgers

_Cooking by Hand_ by Paul Bertolli

The French Laundry book by Thomas Keller is worth a read.

If you're into charcuterie, someone else mentioned Michael Ruhlman; his book _Charcuterie_ with chef Brian Polcyn is excellent. _The River Cottage Meat Cookbook_ is also good.

If you want to go deep into ingredients, try _The Elements of Taste_ by Gray Kunz and Peter Kaminsky, and _The Flavor Bible_ by Karen Page and Andrew Dornenburg (I haven't personally read that one all the way through, though).

And you can always just pick up a culinary school textbook.


The Flavor Bible is more of an ingredient reference. It's also very Western- and Northern- (as in the hemisphere) centric.


I have a few I can personally recommend:

- How to Cook Everything by Mark Bittman (also How to Cook Everything Vegetarian)

- Ratio by Michael Ruhlman

- The Food Lab by J. Kenji Lopez-Alt

- Salt, Fat, Acid, Heat by Samin Nosrat


I can also vouch for Ratio, and in general any book by Ruhlman I've read has been excellent (give Egg a go)


Have a look at 'On Food and Cooking', Harold McGee. It isn't a cookbook, but McGee talks extensively about ingredients, process, and their interactions.


Take a look at Chef John’s recipes on YouTube. He knows his stuff, and he makes sure to explain the thought behind the process. It’s illuminating and entertaining.

https://youtube.com/user/foodwishes


I found such a thing in a textbook belonging to a friend's mother who went to some kind of household school in the 70s or so. It was a revelation.

When I later wrote some teaching material, I realized that there are two bad kinds of educational texts: Cookbooks and math textbooks. One is a simplistic series of steps without explanation, the other is a facts dump with no motivation, context or intuition.

Getting back to git, its documentation manages to combine the disadvantages of both styles: it is a disjointed cookbook where steps are explained in confusing technical terms that only make sense if you already know the theory, which isn't coherently explained.


Cook's Illustrated magazine doesn't get so far as explaining the chemistry, but they do document the lengthy trial, error and tuning process to get their recipes right, as well as what decisions they made and why.



Nathan Myhrvold

Edit: And many others too


I found the list helpful, with just about everything I would expect in a list such as this.

If I could make one suggestion: I found it to be quite annoying to have to click through to a two-line gist for every single one command. Having those commands be in the article itself would be considerably easier to read.


Linking to this comment instead of the actual link you want so that the right person gets credit :)

https://news.ycombinator.com/item?id=17634075


Can’t tell if joking.


Don't use push --force, use push --force-with-lease. Then you avoid accidentally undoing unexpected new commits on the remote.
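A sketch of the difference, using a local bare repository as the "remote" and two hypothetical clones named alice and bob. Alice rewrites history without having fetched Bob's commit, and the lease check refuses her push:

```shell
set -e
# Hypothetical identity so commits work on a machine without git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"
git init -q --bare remote.git

# Alice publishes the first commit.
git clone -q remote.git alice 2>/dev/null   # hides the "empty repository" warning
cd alice
echo "base" > file.txt
git add file.txt
git commit -q -m "base"
branch=$(git symbolic-ref --short HEAD)
git push -q origin "$branch"
cd ..

# Bob clones, adds a commit, and pushes it.
git clone -q -b "$branch" remote.git bob
cd bob
echo "bob's work" >> file.txt
git commit -q -am "bob's work"
git push -q origin "$branch"
bob_tip=$(git rev-parse HEAD)
cd ..

# Alice, who has not fetched, amends and tries to overwrite the branch.
cd alice
git commit -q --amend -m "base (reworded)"
if git push -q --force-with-lease origin "$branch" 2>/dev/null; then
    lease_result=accepted
else
    lease_result=rejected
fi
cd ..

remote_tip=$(git --git-dir=remote.git rev-parse "$branch")
echo "lease push: $lease_result"
```

A plain `git push --force` at the same point would have replaced Bob's commit on the remote.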


Better yet, don’t use push --force at all unless you specifically intend to undo commits on the remote. Resolve conflicts locally and never force push to a shared branch.


This forces you to merge master into your shared topic branches instead of rebasing them on master, which makes the history harder to follow.


You can always run something like:

    git diff origin/your-branch..your-branch

to check whether you have made any unintentional changes to the code. For the commits themselves, you can do something like:

    git log origin/master..origin/your-branch

and

    git log origin/master..your-branch

to see if the commits differ. You can use the -p switch on the git log commands to see if the diffs have changed. If you do this before pushing up to the remote, then it's much easier to see what you're going to change before you run git push -f.


Could you explain this and why it requires --force?

(One thing I struggle to understand is why the lack of repo access control isn't a deal-killer for git's adoption; if you want it, you have to implement it manually through PRs.)


More information on this option, as well as caveats: https://developer.atlassian.com/blog/2015/04/force-with-leas...


I really wish they'd switch these around so --force used the lease logic by default and you had to turn it off. I don't think backward compatibility is anywhere near worth the problems this default causes. :/


Oh, that's pretty neat! Didn't realize that was an option with git!


Another reference site that has come in handy for me for quick reminders, and has a very memorable name -- http://ohshitgit.com/


More comprehensive (enough so that I haven't read every entry): https://github.com/k88hudson/git-flight-rules


I advocate:

git add -p

for interactive staging. It’ll present each hunk of code with a y/n prompt. This is a good habit to prevent committing any debug code or stray marks.


That's one of the reasons why I've always preferred a GUI for most of my day-to-day operations. It's a lot easier to click "Add Hunk", or CTRL-click a couple lines and click "Unstage Lines", than it is to go through the CLI options for dozens of hunks. Similarly, SourceTree's interactive rebase UI is great, and when I briefly played with Tower's latest beta, they made it as simple as drag-and-drop for individual commits.

On the flip side, it's a lot easier to do "git add -u" or "git add src/some/folder" for those use cases.


'tig' is a TUI for git, once you learn the shortcuts it's easy to stage specific lines/hunks.


I've gone further with this when staging parts of the diff while viewing the output of git diff in vim. You can visually highlight the part of the diff along with the diff header lines and run

    :'<,'>!git apply --cached -

git apply is one of the git "plumbing" commands that can be used to apply patches to the working directory or the git index.
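The same mechanism works outside vim. A minimal sketch, assuming a scratch repository with two invented files: only the diff for a.txt is piped into `git apply --cached`, so only that change ends up staged.

```shell
set -e
# Hypothetical identity so commits work on a machine without git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
printf 'a1\n' > a.txt
printf 'b1\n' > b.txt
git add a.txt b.txt
git commit -q -m "initial"

# Change both files in the working tree.
printf 'a2\n' >> a.txt
printf 'b2\n' >> b.txt

# Feed only a.txt's patch to the index; "-" means read the patch from stdin.
git diff -- a.txt | git apply --cached -

staged=$(git diff --cached --name-only)
unstaged=$(git diff --name-only)
echo "staged: $staged, still unstaged: $unstaged"
```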


git checkout -p and git reset -p work too :)


In addition to what nerdponx said about recipe style articles being poor, there are some moderate issues with this article:

1. it doesn't clearly say that git checkout with a path is irreversible.

2. it says that git reset --hard is irreversible, which is not correct. (see 3.)

3. it doesn't mention one of the most powerful git features for fixing mistakes, the reflog.

4. git remove is not a git command.

5. gitingore is not a git file.

6. git-amend is not a git command.

7. It doesn't adequately explain why force-pushing causes problems.

It looks like clicking through to the gists was such a pain that even the author didn't proofread them.


Oh god, I missed this one:

> git checkout with a path is irreversible

Not only is it irreversible, it is destructive. It will overwrite untracked changes and even untracked files, without so much as a warning. I personally consider this a critical bug in Git, but apparently the mailing list denizens do not agree with me.


And this is exactly the sort of behaviour that led me to call git "The Swiss Army Chainsaw of Version Control". It can do anything, but if you wield it wrong it'll cut your leg off without hesitation.

I consider this a UX failure. Give me hg anytime.


That is, broadly speaking, not correct. The reflog makes it very difficult to permanently lose committed changes, and for the most part, commands that change the working directory are generally self-evident. git checkout is really the only common command that has this non-obvious behavior, and even then, only when checking out a path; checking out a commit will warn before overwriting working directory changes. In fact, from what I read, git is far better than mercurial in this respect, as reflog is always on, whereas the journal extension must be manually enabled.


What you heard about mercurial being worse is wrong. There is no need for a journal extension to prevent mercurial from permanently deleting commits. Operations that might delete commits, such as strip and rebase, either save the removed commits as a backup bundle or mark the commits as obsolete, and never automatically garbage collect them.


Git is based on a one-two of UX responsibility evasion:

1) there's a distinction between the "plumbing" and the "porcelain", and all problems are blamed on the (replaceable) porcelain. So if you don't like it, use different porcelain.

2) Canonical 'git' is the only viable porcelain and we will never fix it.


It's important to train people to be aware of the difference between clean state and dirty state.

As long as you avoid dirty state you are guaranteed not to lose work.

You avoid dirty state in three ways.

1. Commit your work frequently in your working branch. You can always squash later using "git reset --soft".

2. Use "git stash".

3. Create throwaway directories if all you want to do is keep your experimental files handy:

  mkdir temp
  echo '*' >temp/.gitignore

Just like C forces you to be aware of how data structures are allocated internally, git forces you to have hygiene about your local file state.

The only "real" problem is when your git admin fails to pay attention and make sure the top-level .gitignore and .gitattributes are set up correctly.
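The squash mentioned in step 1 can be sketched like this, assuming a scratch repository and invented commit messages. Three work-in-progress commits are collapsed into one without touching the files:

```shell
set -e
# Hypothetical identity so commits work on a machine without git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
echo "base" > base.txt
git add base.txt
git commit -q -m "base"
base=$(git rev-parse HEAD)

# Commit early and often while working.
for i in 1 2 3; do
    echo "step $i" >> work.txt
    git add work.txt
    git commit -q -m "wip $i"
done

# Move the branch back to the starting point; the index and working
# tree keep the final state, so everything is staged for one commit.
git reset -q --soft "$base"
git commit -q -m "add work.txt"

count=$(git rev-list --count "$base..HEAD")
lines=$(wc -l < work.txt | tr -d ' ')
echo "$count commit(s) on top of base, $lines lines in work.txt"
```

The three original commits are still reachable through the reflog if the squash turns out to be a mistake.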


Mods - please link the original, which has the code fragments inline:

https://www.codementor.io/citizen428/git-tutorial-10-common-...


Can someone explain to me the fascination for git, despite the constant stream of "how to fix [insert various git issues]" articles showing it's very difficult to use, at least compared to the alternatives? Most people's work streams only need something as simple as hg, which would also save them a lot of time and hassle.

My personal impression is that it's a fashion thing (people think it's cool to waste a lot of time on git, because git is cool and fixing git problems feels like real work, because you're using your keyboard and all that), but maybe someone has a different perspective.


I'm seeing only github gist URLs instead of code snippets. What became of progressive enhancement?


Nothing. They actually are links, not embedded gists.


Out of all these, I wish I could use git bisect the most. Working on large codebases with large teams, the command itself is a godsend.

Unfortunately, a PC with Symantec Endpoint Protection makes the filesystem dog slow, to the point that the one time I needed it, when it should've taken about 13 steps to find where the bug was introduced, it was faster to redo the feature without accessing the previous code.


Intrusive anti-virus products imposed by people who don't know better than to click on dodgy porn sites and download and run untrusted binaries are a great way to waste tens of thousands of dollars of developer productivity.

God help you if you're doing front-end development, with deep node_modules and build processes creating and destroying files.


Did you try to add your development folder to the AV's exception list?


You mean Symantec?


Ah yes. Fixed :)


So why do I have to open 10 more tabs to see the actual commands? Or am I supposed to blindly run them without knowing what they are?


As much as I like rerere, I have had it result in silently[1] producing the wrong end result. Especially since I merge / rebase frequently, and end up recording lots of small conflict resolutions. Always always always check non-clean merges by hand or you'll eventually be surprised.

And sometimes I just resolved it incorrectly once, and now it always does it wrong. Is there a way to make rerere forget a resolution?

[1]: relatively speaking. it doesn't fail / pause the operation, so it's "silent", even though it prints out that it applied a conflict-fix.


Was I the only one who was expecting something along THOSE lines? https://i.pinimg.com/originals/4d/ca/1c/4dca1ce93db819647f2f...


    function fuckgit
        rm -rf /tmp/fuckgit/
        git clone --no-checkout (git config --get remote.origin.url) /tmp/fuckgit/
        rm -rf .git/
        mv /tmp/fuckgit/.git .
        rm -rf /tmp/fuckgit/
    end


What shell is this?

You use /tmp in an insecure way. Please use "mktemp -d" for creating temporary directories.


It's fish. What's insecure about the way I'm using it?


/tmp is world-writable. Any local user could create /tmp/fuckgit and stuff it with malicious code.


Or create /tmp/fuckgit as a symbolic link beforehand, pointing to a directory which will be deleted by the first command in your function.
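A rough bash port of the function above that uses `mktemp -d`, as suggested. The original is fish; the name `refresh_git_dir` and the error handling are made up for this sketch:

```shell
# Replace the current repository's .git directory with a fresh clone of
# origin, using a private temporary directory instead of a fixed /tmp path.
refresh_git_dir() {
    local url tmp
    url=$(git config --get remote.origin.url) || return 1

    # mktemp -d creates a new, uniquely named directory that only the
    # current user can access, so nobody can pre-create or symlink it.
    tmp=$(mktemp -d) || return 1

    if ! git clone -q --no-checkout "$url" "$tmp/clone"; then
        rm -rf "$tmp"
        return 1
    fi

    # Only discard the old .git once the new clone exists.
    rm -rf .git
    mv "$tmp/clone/.git" .
    rm -rf "$tmp"
}
```

As in the original, nothing is checked out, so the working tree files stay as they were and any commits that were never pushed are gone along with the old .git directory.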


I see, thank you.


Need "missing blob" (corrupted git repo).

That problem is a depressing one and not uncommon [1]

[1] https://stackoverflow.com/q/18678853/1212596


incredibly annoying how the examples are links to another site. had to stop at the 3rd example.



