Ask HN: Do you ever truly use your revision history?
232 points by _bxg1 on March 8, 2020 | 282 comments
Source control gives you a full history of every change that's ever been made to your codebase since its beginning. At my current company they place a huge value on that history, so much so that they haven't transitioned from SVN to git solely because of the logistical challenge of migrating 30 years of commits.

Obviously the write-only paradigm is useful when reconciling changes with others and when reverting recent, broken changes or recovering accidentally-deleted work. But to me, it seems like there's diminishing value the further back you go. I can't imagine getting much value from trawling through two-year-old commits, much less twenty-year-old commits.

So I ask: at your company and in your experience, do you get value from source-control archaeology? And if so, what does that look like in your case?




Hell yes, I use it daily. I'm in AAA gamedev and the codebase I deal with goes back 20+ years. The last 10 years are readily accessible in Perforce and the rest can be found in another version control system. I am forever grateful to past engineers for outlining WHY they made their changes, and not WHAT the changes were per se. With thousands of engineers that have come and gone, this is incredibly useful information in addition to the code itself.

IMHO revision history is just as valuable to a company as the code itself.


> I am forever grateful to past engineers for outlining WHY they made their changes, and not WHAT the changes were per se.

So true. We have this one senior developer who gets mad if someone's algorithm isn't as efficient as it could be (fair enough, I suppose), but we can't get him to write commit messages that are more than 1-3 words, simply mentioning the area of code that was changed. Years later, he also can't remember WHY he made those changes, so I'd much rather work with someone who writes inefficient algorithms that are easily improved at any time than someone whose commit comments are forever useless.

What was changed is easily seen in the commit itself, why needs to be in the commit message.


I wonder why it's so hard for people to write good commit messages. At my company I've actually tried talking to people in person, and even wrote a doc explaining the benefits and how to do it, pointing people to good resources like this one: https://chris.beams.io/posts/git-commit/

And still, I can't get people to do it. I find it so valuable to look at well-written commit messages that explain the why behind the changes, but I can't get people to see the same value in them that I do. Any tips on that? Would be really appreciated. :)


"What was changed is easily seen in the commit itself, why needs to be in the commit message."

A commit message is like the subject of an email. Isn't it faster to look at the commit message and get an idea of the change, rather than go through the commit and understand it?


The first line of the commit message should be the "subject line". The rest of the commit message should contain a summary of the high-level stuff such as the "why".
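For example, something along these lines (contents hypothetical):

  Fix crash when the config file is empty

  The parser assumed at least one section and dereferenced the first
  entry unconditionally. An empty file is valid (it means "use the
  defaults"), so return the default config instead of erroring.

The subject line carries the what; the body carries the why.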


Yes, fair enough... having both in the commit message is ideal. I just mean that the absolute minimum is the why, because at least the what can be reconstructed from the commit.


> I am forever grateful to past engineers for outlining WHY they made their changes, and not WHAT the changes were per se.

Oh, so much this! The same applies to in-code comments. If your revision notes (or your code comments) are only telling me what I can plainly read in the code, then they're utterly pointless. Tell me what I can't read in the code: the "why"s, as well as potential consequences and "gotchas" the changes may present.


> I'm in AAA gamedev and the codebase I deal with goes back 20+ years.

I'm curious what parts you work on (engine/tooling?). I've always had the impression that games usually have more throwaway code than other types of applications.


Until the OP mentioned Perforce, I was 100% sure they were a co-worker of mine. I also work on a 20+ year old project, mainly on the core graphics/game engine. It’s mainly used to power a single game franchise, although I’d say that any commits from more than 2 major versions ago aren’t all that useful anymore. Too many things keep changing especially around the area that I work on.


Well, the Unreal Engine is 22 years old. I doubt the current incarnation is free of legacy code.


The current version was developed from scratch.


Could easily be a sports game. That was my guess; then I remembered that the Quake engine powered Half-Life 1 forever. Could they possibly be working with the id engine?


How's the AAA industry nowadays? I got out a while ago (though HoN never really counted as AAA) but it was a fun ride at the time.

Is work life balance a bit better now, or does everyone still push themselves pretty hard?


It's both better and worse than it's ever been.

There's an awareness and discussion about "sustainable" development practices, but a large portion of our workforce had to leave for stress reasons last year, on a project that is saying "sustainable development" the loudest... so while it feels like lip service, at least there's an awareness at some level.

(Also, gamers are more entitled than ever, so we're always running; which causes our games to be buggy as hell, which slows us down later... horrible and completely unsustainable.)


I’m making a game of my own right now and am curious about the larger industry

- is it common for AAA companies to claim ownership over all IP you create, even outside work? (My last job did this.)

- How would one find part time or short term contract work in the games industry?


> - is it common for AAA companies to claim ownership over all IP you create, even outside work?

Yes. This is super common. Depending on the company you can make some kind of agreement. Most agreements are based on income (so if you start making a lot of income from it, something like 10% of your yearly salary, you need to renegotiate).

> - How would one find part time or short term contract work in the games industry?

If you’re an artist, I guess this is easier, because those are contracted-out positions. But for others it’s unlikely the company will hire you for part-time work. They seem to want everyone giving 111%; a part-timer might be useful but could cause blocking issues.


But that sounds like you value documentation, not necessarily the code history


Code history is documentation. There are lots of different kinds of documentation: code API level, module level, system level, tutorials, even books in some cases. Revision history is just another one of those levels, and I believe it is the best at capturing the "why"s of systems rather than just the "what"s.


I agree code history can be used as a form of documentation, but in cases like this looking through years of code to find the decisions/reasons leading to a particular design seems like inefficient communication. It seems like "real" documentation with a few sentences explaining directly would be more suitable.


I disagree. In some cases, code history can be much more efficient. You really need a mix of both.

There will be things that are much better captured as part of a revision/commit, especially if your commits are well-designed, grouped into logical chunks, and include messages themselves (and maybe are linked to a project management tool).

You will need information like "this code was added by X as part of work they were doing on Y, and they also made changes in other parts of the code as part of that". That context is really valuable.

You can think of it like event sourcing, which captures a lot more information than traditional mutation of data, and as a pattern, is a lot more rock solid... except event sourcing for your data is (usually) much more difficult to implement in practice, and code revisions are already an almost completely solved problem.


Capturing most of your data in your commits/revisions seems to suffer from a lot of the unsolved event sourcing issues:

- How do I quickly find the info that I want? How do I "query" the commit log? Often, we want a "view" of the history which tells us specific info. If I need to scan through half of the commits just to get a good understanding of the architecture of the code, then that's more wasteful than just having a design doc. If I'm troubleshooting a production bug, then the granularity the commit log offers becomes important enough to offset its slow "query" speed, so I'd want enough "why" commits in the commit log and outside of people's heads.

- People write to the commit log without a well-defined "schema". If you use something like tags, how do you handle changes to the tags ("schema evolution")?

This is my train of thought for why I lean towards "why" comments near the code or in a design doc over commit messages, which I allow to be a bit more sloppy.

A higher-level thought: The attractiveness of the event sourcing analogy often comes from assuming that the commit log should be a strongly consistent source of truth. However, it's good to remember that a huge amount of info about the code is stored in the team members' heads. In particular, the code writer knows a huge amount that can't be easily documented. So an alternative analogy would be to think of each member as a VM attached to block storage. If a VM fails (the person gets sick) or leaves the cluster (they leave the job), then you lose all of the data in block storage. So, the team wants to facilitate just enough overhead/admin work to transfer important data from individual team members to shared but slower storage (like the commit log, design doc, comments, etc.)


I recommend checking out Peter Naur's essay "Programming as Theory Building"[0], as it touches on the subject of a program being more than the code + documentation; it lives in its designers' and developers' heads, their intents, visions, etc.

[0] http://pages.cs.wisc.edu/~remzi/Naur.pdf


I personally think of version control history as like a sedimentation layering of documentation that "updates" itself in the process of doing the work -- like the desk that looks messy, but by picking up and using papers, the most important stuff is on top. "Real" documentation can be clearer, but it must be maintained manually, and cleared out regularly. VCS kinda handles this with less process weight.

The right tactic is def a mix of both though, so I think I'm in agreement with you :)


It's because "your commit needs to link to something in the bug tracker" is pretty easy for code review to enforce, but I have never seen an org manage to indefinitely keep an accurate as-built design doc beyond the code itself. You can convince people to write new aspirational design docs for intended major changes, but after approval those never get updated to reflect what really got built, and lots of small bugfixes don't get one at all.


You might find some weird hack in the codebase whose purpose isn't obvious at first glance. This isn't something that people document in official documentation, but even finding a JIRA ticket that is linked to that specific commit can help tremendously.


This is a great comment, except for the last line, which is extreme hyperbole.


Parent clearly stated it as opinion.


Isn't it a tautology? Revision history includes the code itself.


Frequently. Just this evening I was looking in the HN repository for the last version of the code that pg wrote, to remind myself how he used to do something.

One of my favorite tricks is to make a file out of all the changes in the history:

  git log -p > bigass
and then grep through the file (edit: which I like to do in Emacs—hence the file) to see every appearance of some construct. There's a lot of knowledge in there. It's particularly useful when you remember that you did something, but forget how you did it.

In fact, I use git proactively this way, to store things in the version history that I might want to remember later. For example, if I write exploratory code to test out a feature or throwaway code to do some analysis—anything I might want to use again, but don't want to commit to the codebase—I'll add it as a commit and then immediately revert the commit (i.e. make a new commit that deletes what I just added). The codebase remains unchanged, but what I just did is now there forever for future me to recover.

Such an approach only works if your system is small, but I like to work on small systems and prevent them from becoming large systems. There's a beneficial feedback loop here: as you get comfortable working with history, it gives you more confidence to delete things, helping to keep the system small.

I've also found this technique useful for solving the chronic problem with documentation: that it inevitably fails to get updated. When I write something about the code, I commit it and, as above, immediately revert the commit. Now it's permanently glued to the state of the code when I wrote it. When I read it in the future, I can do so alongside a diff of the code from then to now. This makes it easy to see what has changed in the meantime, in which case I can update the document and commit/revert it again.
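Mechanically, the commit-and-revert trick is just this (file and message hypothetical):

  git add depth-analysis.lisp
  git commit -m "Throwaway: script to measure comment tree depth"
  git revert --no-edit HEAD    # undo it immediately; both commits stay in history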


This is good. I'm stealing it. It's using git as sort of a super-brain, helping you remember stuff you wouldn't otherwise. It's how I use gmail. Instead of shooting for inbox zero, I just leave anything in there that I might want to remember later and delete the rest. Then, many times years later, I can search through looking for important correspondence. Both of these stories are good examples of how simple but flexible tools can be used perhaps in ways not anticipated by their creators or the majority of their users.


I use gmail the same way. I do feel that the search functionality could be improved, though, which I find ironic.


I use gmail that way too! Never made the connection to the git thing though.


Gmail search used to be the best. Now it fails to find anything useful most of the time


Inbox zero isn’t deleting everything, it’s archiving those that are done so they aren’t in your immediate inbox.

What you’re doing sounds compatible with that.


> git log -p > bigass

> and then grep through the file (edit: which I like to do in Emacs—hence the file)

Since you already have Emacs, I would suggest you do M-& git log -p RET instead. No need to round-trip the data through a file.


I had a feeling someone would come up with a suggestion like this! Cunningham's Law and all that. Posting wrong or insufficient things on the internet is a great learning strategy.

However, I get:

  WARNING: terminal is not fully functional
  -  (press RETURN)
and on pressing return, I only get enough data to fill the buffer.


I always set PAGER=cat in Emacs. If you're running a VT100 emulator in Emacs this may be bad, but otherwise it avoids the problem you describe.


I find the temporarily localized documentation idea quite interesting – anyone else here tried that?

> Such an approach only works if your system is small,

I see no reason why this wouldn't scale if you (and other people working on the same codebase) just maintain individual branches (say "dang-braindump"). Just do exploratory/documentation work on that and cherry-pick, rebase -i or git merge --ff --squash --no-commit to skip the reverts.


> I find the temporarily localized documentation idea quite interesting – anyone else here tried that?

I've done this a bit.

For several of my $DAYJOB projects I maintain documentation as Markdown files in the project, which get rendered into a website at build time. The project changelog is kept in version control the same way (https://keepachangelog.com).

Several times I've tried an experiment or alternate approach that didn't work out and done exactly this, complete with documentation, so the state is preserved. It's easy to document a prototype if it just means throwing a few paragraphs into a text file with the prototype.

Add a note in the changelog about adding the prototype then commit.

Make the revert commit and add a note in the changelog that you've removed the prototype.

Keeping a good, human-readable changelog as described above makes this strategy more scalable. You can document failed experiments you backed out, features that have been removed, and the like, in a few quick summary lines, keeping devs aware of them across the project's lifespan.

New devs can review it for the big-picture history and use the changelog's revision history to see what commits they need to see for the details.

Using a changelog this way also makes it reasonably possible to delete code but still remember it existed years later, something that's otherwise a surprisingly hard problem (especially if the project has high turnover).
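For illustration, the pair of entries can be as small as this (keepachangelog-style headings, details hypothetical):

  ## [Unreleased]
  ### Added
  - Prototype lock-free job queue (experiment)
  ### Removed
  - Lock-free job queue prototype; benchmarks showed no win. Kept in
    history for reference.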


Thanks for sharing! I'm not a big fan of changelogs, personally, except for publicly released projects. I prefer to have workflows that use version control for this. Reverting on master while tracking the prototype in the changelog is a nice hack, but it has a cost in terms of clutter, ability to bisect, etc. I think I would prefer something branch-based, but I haven't yet properly thought about the best way of doing it. If you have a special prototype branch from which you merge in master and do your prototyping, reverting and optionally a notes.md there, you might effectively be able to replace the changelog-based flow with just doing git log --no-merges --first-parent. What do you think?


I think as the number of branches grows, people will just start to drown in the huge list of unmerged branches and will start to work as if only trunk/master exists.

With a text file that summarizes all the project's main line of history in a structured, linear way, it's very simple to extract just what you're interested in or find all the entries for a particular thread of research across history, quickly.

I don't think the tooling around git branches really enables that kind of operation in a quick, easy way, partly due to the inherent nature of branches.

I'd be interested to see what your hypothetical workflow turns out like in practice, though. It's certainly possible I'm just not seeing the potential.


I’m stealing this too! I often try to persist exploratory history as separate remote branches, but for some reason a lot of the devs I’ve worked with are very adamant about deleting any and all “unneeded” branches, and I end up appending “-dont-delete” to such branch names, which starts to get a bit passive-aggressive. Now I can just do this and nobody needs to know.


Or at least by the time they know, it will be too late :)

There are some disadvantages. It can mess with git bisect, for example.


What's the advantage of doing this over just making an experiment branch? A branch would be easier to find than a reverted commit somewhere in the mainline branch.


For me, I end up having a bunch of “exper-xyz” (experimental) branches that are often small changes that I may never merge into master. The problem is I lose context of that branch, so I kind of like the idea of the reverted commits; will need to try it, for small changes at least.


Then I'd have too many branches and no longer a single place to search.


> For example, if I write exploratory code to test out a feature or throwaway code to do some analysis—anything I might want to use again, but don't want to commit to the codebase—I'll add it as a commit and then immediately revert the commit (i.e. make a new commit that deletes what I just added).

Why not use a named stash for something like this? It'll keep your history cleaner, and you can always find the stash by name later.
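For reference, that's just (message hypothetical):

  git stash push -m "spike: alternative cache eviction"
  git stash list               # stash@{0}: On master: spike: alternative cache eviction
  git stash apply stash@{0}    # bring it back later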


Too complicated for me. I didn't even know that named stashes were a thing, and I wouldn't remember the names.

The nice thing about the approach is that there's only one place to look. I don't think it makes the history less clean, since it only contains things that were at some point, if only briefly, in the system.


Can stashes be pushed to a remote repo?


Along these lines: git-log has a --grep option which returns only commit messages that match a regex, and git-grep searches over revision controlled files like above.

>I like to work on small systems and prevent them from becoming large systems.

This is very wise!


Similarly, ‘git log -S needle’ will surface all commits with ‘needle’ in the diff.


Yeah, the grep option is on the log messages. Use -G if you want the regex in the diff.
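Side by side, with hypothetical search strings:

  git log --grep='timeout'     # regex over commit messages
  git log -S'round_to_ticks'   # commits that add or remove the string (pickaxe)
  git log -G'round.*ticks'     # commits whose diff matches the regex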


You can pipe through `less` and then search using `/`, no need to redirect into a file (and this way you don't have to wait for the history to be done dumping before you start your search, which can be a pain for very huge repos).


I do that when I know just what I'm searching for, but for this sort of meandering around, I prefer to explore a file in Emacs. I can quickly get a list of everything that matches a certain regex and hop between the locations, etc. It's much easier and faster (for me—a lot of this is just whatever one's used to). You're right that for a much larger repo, waiting for a file to dump, not to mention Emacs to open it, would make this impractical.


can anyone truly ever gitlog bigass?


All the time!

Two weeks ago I found something in a critical library at work (one that ~every single C++ binary we run depends on: our main implementation of our custom thread executor API) that made no sense. I couldn't understand why a variable was being rounded before being passed down to a lower layer, in a way that introduced an average 0.5 ms of latency to many operations (I estimate that at peak, just one of the binaries that I maintain, a caching system, runs this code at least 200 million times per second), for no gain that I could see. There even was a comment attempting to explain why the rounding logic was added, but it was factually incorrect. As far as I could tell, I could just delete the rounding logic and everything would just work. I was baffled.

... until I looked at the code history! It explained it immediately (well, in like 5 to 10 minutes): the code from 2013, when the rounding was introduced, was calling into some lower level API that received parameters in a way that had limitations that ... Well, let's just say made it very clear to me why the rounding had been added.

Someone cleaned up the lower level library in 2016 or so, but the rounding remained in the upper layer.

This is just one example of many. I do this all the time.

Just two days ago, I was running scripts to extract lines-of-code by author and reviewer over different directories to get a sense of the size of the contributions of different team members, as part of the employee performance evaluation process (obviously, LOC is just one of many many many signals, and has to be taken in context). "Interesting, this person has already contributed 4k LOC to this particular directory, I didn't realize that!" Or "Source code files in the directories of the components that this person is a Tech Lead for had contributions from 131 engineers in 2019; of these, at least 56 engineers contributed more than 100 LOC."

I guess I'll call out also that when I find a reproducible bug that I can't explain, being able to binary search the code history until I find the first change that exhibits the bug can be a life saver. I don't do this very often, but I estimate that, when I've done it, it has saved me days, possibly even weeks, of work.
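In git, that binary search is automated by git bisect; a minimal session, with hypothetical refs and test script, looks like:

  git bisect start
  git bisect bad HEAD          # the current revision exhibits the bug
  git bisect good v2.3         # a release known to be clean
  git bisect run ./repro.sh    # script exits non-zero when the bug reproduces
  git bisect reset             # return to where you started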


Chesterton's Fence, to put a name to the phenomenon.

https://en.wikipedia.org/wiki/Wikipedia:Chesterton%27s_fence


How so? In GP's case, they found that the "fence" really was no longer needed:

> Someone cleaned up the lower level library in 2016 or so, but the rounding remained in the upper layer.

This is like the opposite of a Chesterton fence, where the reason it was put up no longer exists, so it's totally safe to remove it.


Chesterton's fence is not about never changing things.

They followed Chesterton's fence to the letter. They saw something that didn't make sense, and then tracked down why it worked the way it did.

Once they understood the root cause, they examined the environment and discovered that the underlying issue had been fixed. That allowed them to confidently rework the inconvenient behavior into something better.


The point of Chesterton's fence isn't that the fence is necessarily still useful, it's that being ignorant of the usefulness of a thing isn't equivalent to knowing that it's useless.


> Go away and think. Then, when you can come back and tell me that you do see the use of it, I may allow you to destroy it.

This is the bit that applies.


Your first example hits the nail on the head. I use code history all the time to understand the context in which a bug or weird-looking piece of code was introduced. If the developer who wrote it isn't around anymore, or can't remember the details, then this information is really crucial for gaining confidence that changing or removing the code won't just introduce another obscure bug because you didn't understand the reasoning behind it.


Seconded heartily: git bisect is one of those tools I use very infrequently but when I do, it is a lifesaver.


I think there's some indirect psychological value that shouldn't be underestimated.

People have a tendency to comment out unused code "in case they still need it". Or not delete unused stuff, because who knows what.

I have the feeling that I'm much more inclined to just delete a bunch of code lines that "I might still need in some situation" if I know there's version control. Because even if it's unlikely, "I can get it back if I want to" is a good feeling.

I think this leads to less cluttered code overall.

Also something that came to mind: when the Shellshock vuln was discovered in bash, no one really knew when and how it got introduced, because it was so old (literally decades) and there was no version control at that time. I don't think anyone suspects any malpractice with Shellshock, but think about it: if you find a really strange bug that looks like a backdoor, and it's 10 or 20 years old, wouldn't you want to know who committed that code?


At an old employer, I wrote a neat visualization that drew equal-potential curves through a sampled grid of points. Somewhat complicated, targeted at one specific feature.

I leave, but stay in touch with friends there. They completely turned the ship and that whole module was no longer part of the system. A few years later, I get a message from a buddy: “dude, we dug up your equal-potential curves thing from SVN and damn is it nice to work with. Good job! That just saved me a ton of time!”


One thing CVS (yes, that one) taught me was to create versioned "attic.<langext>" files and throw all aging code in there, separated by enough newlines, instead of commenting it out or just deleting it. While you can search through a history, it is much easier to have an inactive source file with all the remnants you may need to revive in a week. Once things are done, delete it from there, and then you always know where it was or what it looked like.


Version control is like an out-of-the-money option, or like a home insurance policy. And just like with an OTM option in capital markets or with a home insurance - the point isn't that you know you'll use it. The point is that you can, if need be, which allows you to take more risk (in your example - delete some code). It's not just psychological, it can be explained quite easily in terms of risk management.


We use reverts frequently enough that it's not an insurance option for us, it's just our normal workflow.

http://thecodelesscode.com/case/118


Yes, but maybe not in the way that you're thinking.

The scenario isn't "I'm gonna go browse the changes that were made in March of 1994", instead it's trying to solve a specific mystery.

You see some code that doesn't make much sense, so you look at git blame to find the commit where it was written. Look at the full change, read the commit message, and now you've got some more context. Often this is enough to understand, but if not, you can check out the code at that time and read the implementation of related systems. Soon things are starting to make sense! Certainly they make much more sense than they did when you started.


I love reading through revision history but it's so, so easy to mess up given people and time.

We have a long history spread across several different (generations of) SCMs. I see each of the following quite often:

Most recent revisions (version A, code comments date code to the 90s):

- 2011-02-03: Migrate to git

- 2008-01-12: Migrate to SVN

- ~Fin~

Most recent revisions (version B):

- 2018-12-12: Split <X> out into own file

- ~Fin~

Most recent revisions (version C, sweet monorepo blogpost edition):

- 2019-10-01: Create monorepo for <Team/Superteam/Division>

- ~Fin~

---

It's hard to do transitions between SCMs (or between repos) right. For instance, when you follow the recommended steps for moving to a monorepo in git, the history is maintained but not shown in GitHub. It's so easy for well-meaning people to destroy history or make it inaccessible for practical purposes when cleaning up dead code, reorganizing code, etc. Even when the history is technically still maintained (e.g. if there was a file rename in the same repository, you could use git log --follow; if you're trying to find when a particular snippet first came into existence, you can use git log -S<snippet>), practically speaking, as soon as I run into one of the above, that's the end of the line.


I suspect that if the industry ever moves away from git in the next 5-20 years, it will be because there's another SCM that has this capability.

Just like DVCS "solved" branching and merging, following history through renames, splits, and other structural changes will make SCMs 2x better.


I guess this is the place to ask: Anyone seen tools that make those jumps/skipping less important commits/... easier? Would be a good feature for a code exploration tool.


I once worked on a codebase where almost every mystery in the code could be traced back to a commit "Moving TIM one level down", and no further.


As in there was a terrible developer named TIM that caused all the problems in the code and the solution was to move him to the basement and assign him to non-coding tasks, so no one ever heard from him again?


It could be. I would have said that the reason was that the name of the software was the TIM, and at some point, the repo had been restructured so that the code for the main app was in a subdirectory rather than in the repo root, but that could easily have been a cover story.

EDIT: Ah no, it couldn't have been that, because I met several of the developers who had caused all the problems in the code, and none of them were called Tim or in a basement.


Oh boy, library levels.

"How are we almost level 100 now? We're going to need an exemption soon!"

"We'd better go move around some orphaned, 20 year old C code to avoid triggering this well-meaning organizational policy!"

I was sure you and I must have worked at the same company. Alas, TIM isn't named TIM here though.


What's TIM?


Telecom Italia Mobile is a customer of ours, that's what I know as "TIM"


Indeed. I'm guessing I use the revision history several times a week. We have a complex code-base, often you'll want to check why something was written a certain way, so annotating and checking commits, and possibly the referenced JIRA issues gives that context.


You hit the nail on the head. The code review tool at my company links together all the commits that were in that code review, when there’s more than one package, I mean. It’s super helpful to look at what all was changed across packages and understand how the code base evolved.


Depends on your project/business.

On my current job, I very rarely go back to see when something was changed, because the business requirements are very straightforward. A change needs to happen, and the implications are clear. Also, no one really documents discussions systematically, commit messages are rather short, etc. Not much value can be extracted.

On my last job, a system with over 15 years of history, my team was often puzzled by the existing codebase and the seemingly weird things it did. "Who wants this?", "Is there a use case for this?" and "Do any of our customers actually expect this functionality if we remove this?" were frequent questions.

Then we'd check the commit history and get the 3-4 tickets involved in the functionality's history: long discussions and back-and-forth with the client, explanations of why the functionality was being added, etc.

This archaeology was so frequently fruitful that the whole team engaged in it.


We just switched from Perforce to git at work, and about the first 2/3 of a project I work on got squashed together. It took me less than a week to bump into that "initial commit" when trying to figure out why a bit of code is the way it is.

"git blame" (or the p4 equivalent) is my usual archaeologic tool in this context, but "git bisect" has been very helpful in others. For the first, it should be easy to look at your current codebase in SVN and see how far back the history goes in any particular area. I've found that bisection is most useful for relatively recent history, because I usually have wanted to build or run the software to test for a bug or something - beyond some point in history that becomes impractical.

Moving from SVN to git shouldn't require losing history though...


I spent a couple months experimenting with p4git to make sure we got a real history imported. They didn’t get every repository right and I made them redo a few, did a few others over again myself.

But lately I’ve been delving into some of the early architectural changes, trying to figure out why a bunch of things get loaded and seemingly never used/only used once... and sure enough, someone did some sort of single-commit repository split or bulk transfer into p4 in the first place... so I have no clue why.

And yes, I use commit history all the time. Fewer secondary bugs from bug fixes. Fewer lines changed per fix, and fixing bugs nobody else can figure out. Started when someone was furious that a bug had been reintroduced twice. Two devs were alternating fixes for two bugs that caused each other. Good times.

If anything I want more commit history. I want to be able to go back and add notes to commits for posterity, or at least for myself in six months when I have forgotten because I haven’t looked at this module once in that time.


> I want to be able to go back and add notes to commits

Interesting idea - how about using tags for this?


Possible. There’s also another feature that several people have pointed out could be used for this purpose but as with so many things in git, it’s not seamless. Might still be worth using though, even if it only helps a few of us.
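If the feature in question is git notes, the basic usage is (message and hash hypothetical):

  git notes add -m "Loads the config twice; see the 2019 incident doc" abc1234
  git log --notes              # notes are shown alongside their commits

The non-seamless part is probably sharing: notes live under refs/notes/ and have to be pushed and fetched explicitly.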


+1 for git bisect, criminally underrated!


Reverts that get to master can royally screw up git bisect.


I'm in a similar boat with you, so I have a few questions:

1. Why did you switch? Are you satisfied with the new experience?

2. Why did some of the history need to be squashed? Were there any technical concerns?


1a) I wasn't involved in the initial decision to switch, but gather it came down to expense and some unsupported perforce-related tooling.

1b) Absolutely! The switch from p4 could've been smoother, but git and access to the associated ecosystem is far ahead of Perforce for everything except storing binaries. We've been abusing p4 by storing electronics CAD files and similar, and do need to put some effort in to a new solution there. For source code, git (and the ecosystem of modern tooling it gives access to) is a huge improvement over p4.

A particular improvement has to do with network latency. Working from halfway around the world, I notice that p4 operations involving many files are very much slower than the similar git operations.

2a) I wasn't involved, but the particular project mentioned above had moved within the depot at some point, so I guess the conversion tool that was used couldn't manage that move. I intend to rebuild the older history in git, then see how viable it is to use git-replace to stick it on the beginning of what we have now.

2b) Yes. However, nontechnical concerns dominated conversation by far - to the detriment of the technical concerns.


In 2013-2014, I was tracking a strange bug in a legacy accounting software.

I was reading the business logic that triggered the bug and it made no sense.

I activated the blame view of the code, and I realized most of the code had been written in ~1998 but a couple lines had been updated in ~2007, by someone who probably never even met the original author.

Realizing that made it a lot easier to understand the context of the bug and fixing it.

There is a lot of value in knowing that two lines of code next to each other have been written decades apart by people that did not coordinate with each other. Never erase that history voluntarily.


Yes, yes, absolutely. I've looked at history going back 10+ years (at least). Many times. Two-year-old commits I consider to be fairly recent.

I can't remember a specific reason why off the top of my head, but it was usually something to do with looking at the context around why some piece of code existed. The companies I've worked for also require commit messages to contain bug tracking IDs, which can provide further context.

There's also really not much of a reason to migrate from svn to git if svn is still working for your organization. Whenever the topic has come up previously in my workplaces it ended with "nah, svn is still working fine for us." OTOH I was involved in a migration from CVS to SVN because of limitations/problems with CVS.


> There's also really not much of a reason to migrate from svn to git if svn is still working for your organization.

Attraction to new devs.

I'm new enough (past decade) to software development that I've only used git/mercurial, and joining a company that required using svn would give me serious pause.


It shouldn't. I've switched version control systems probably a dozen times over the years. There are common features, and of course, unique syntax to each of them. But once you understand how one works, it's not usually too difficult to use others at least at a basic level of getting changes and committing changes.


> There's also really not much of a reason to migrate from svn to git if svn is still working for your organization.

But also not much reason against, given the quality of migration tools. The toughest nut to crack is the absence of that handy incrementing version counter.


The issue with the migration is the references (commit IDs) and the fact that branching best practices differ dramatically.


But merging is not a half-baked shitshow.


> they haven't transitioned from SVN to git solely because of the logistical challenge of migrating 30 years of commits

lol, I don't think that's the reason. At the only place I worked that used SVN the real reason was that the old guys didn't want to learn something new.


It may well be the reason. I don't know the parent commenter but speaking from my own experience I recently tried to migrate a legacy code base from svn to git and the common tools all failed for me. One of them ran for ~72 hours before falling over for "reasons".

That was only ~6 years of commit history too. I could imagine 30 years is a whole extra logistical challenge. Not to mention making sure the whole developer base is aware of the change, prepared for it and willing to either update documentation where it refers to revision numbers or write some instructions on how to find a new commit reference from an old revision number.

The tag model is also different between svn and git.

It's not out of ignorance that the switch can be difficult, we've had some projects on git for years and we're comfortable with it.

Just because you've switched easily from svn to git doesn't mean everyone has the same experience.


Same. We have a 20 year old code base we migrated from Perforce to svn about 10 years ago. Now they want to migrate to git. Unfortunately, we can't get it to work either. Same thing - after many hours of running it just fails with typical useless git messages that nobody comprehends. It's quite frustrating.

(And personally, I hate git. I've used it professionally, and while it works, it's very difficult to use compared to svn or p4. Between its utterly incomprehensible made-up terminology and horrid syntax, and hashes for commit numbers instead of just an incrementing integer, it's quite a bit less useful than svn was for us. But it's being forced on us, unfortunately.)


Maybe the old guys didn't think there was a good reason to switch from SVN to git, if SVN had been working for them. It sounds like switching just because git became more popular.

Of course if you're young, it's no big deal since you haven't been using SVN for decades, and what's the big deal with learning something new? But when you've been around the block a few times, sometimes there needs to be a better reason than the new and shiny.


I used to be one of the "old guys" fighting against git because subversion is "good enough" and I didn't want to relearn a new tool. I finally had to learn git because a project I contributed to made the migration and I could no longer postpone it. I'd never, ever, go back.

At this point I'd go as far as saying that git is objectively superior to SVN because it does everything SVN can and then more. One caveat is potentially very large repositories, and especially repositories containing sub-repositories; git was terrible at that, and while it's improved over the past decade it's still a bit messy. Unfortunately, in my experience these types of repositories are fairly common in proprietary codebases, where people often don't hesitate to commit big binary files alongside the source code.

Still, I'd say that as a rule of thumb if a codebase is still in active use it's probably worth taking a week or so to migrate it to Git unless there's a very good reason not to.


SVN has its limitations, but having used Perforce in my last job and now using Git, I wouldn't mind going back. It's a judgment call, as the two systems have different strengths and weaknesses, but I don't find Git to be an overwhelming improvement. For instance, in Git, if you merge from branch A to branch B while work is underway on both, then merge A to master — and assuming you're using squash merges, as is pretty much essential to get a readable commit history — and then merge from master to B, you're in merge hell, because Git doesn't remember that the squash-merge commit contains some of the changes that were already merged to B. Perforce gets this right: it gives you an equally clear master history, with a single merge commit each time you merge from a branch, but doesn't lose the relationship of that to the individual commits in the branch.


Write down the nicknames. In 15 years git will be ripe for replacement by the next generation of source control, and then the same people who are now advocating for it will be the ones complaining about how hard migrating is.


> But when you've been around the block a few times, sometimes there's needs to be a better reason than the new and shiny.

I went through periods where I had to use CVS, then SVN, and then git. The transition period between each one was a little problematic since I had to constantly refer to the documentation in order to learn the new source control commands, but, in my opinion, one should be capable of learning new technologies.


SVN? What's that? All you young whippersnappers keep harping on the new and shiny.

Real men use cvs.

(I kid.)


Yes, absolutely, and failing to check the history of some changes can easily lead to re-opening closed bugs!

For example, we once had a customer-reported issue in an older version of our product (customers were complaining that an automation script for our product started taking minutes where it had previously taken a few seconds). After some investigation, it turned out someone had deleted some code which exempted the scenario in the customer's script from a timeout.

The commit removing the exemption had a bug attached: the QA team had been complaining about the expected timeout not applying in some cases, and someone found the exemption in the code and deleted it. They had no idea why the exemption was there in the first place (according to the bug chat logs) and didn't bother to look back in history to see.

Funnily enough, looking back even further in history, we found that the exemption had been introduced a few years prior, after a customer had complained that... some automation scripts were taking too long... the same automation scripts that we received in the new complaint, give or take a few years' worth of additions.


Besides piling on with the others saying I use it a lot, I'll remind you that the old history can still be available, in a separate read-only repository, after you've moved on to a new tool. Not as convenient, but still available.

Some use cases are (rough git equivalents are sketched after the list):

a) "Blame" tool, which produces a file version annotated line-by-line with who and when last changed that line, along with the commit message. "Who the f*&# did that s%$@#? Oh, it was me again..."

b) Searching the history of a file by keyword. Especially useful when something was deleted, and as such no longer exists in the source code, but you can find it by searching for the commit message. (knowing you can later do this gives you more confidence to actually delete things, instead of commenting them out or leaving them there in case they become necessary again)

c) "All I know about that feature is that Jenny implemented it before she left the company." Filter for Jenny's user tag.

d) "All I can find about that change is this old email saying it had just been done." Look at logs around email date.


We switched from SVN to Git about a year and a half ago. I often use the "annotate" feature in my editor to see the history of lines of code (to figure out who to talk to when I have questions), and I routinely run into the "initial commit" wall from when everything was squashed.

I wish that when the team had migrated from SVN to git, they had used a tool that would have preserved the history. It's very easy to do! I don't know why they didn't. They did it right before I joined the company so I never had an opportunity to show them how.


It may not be too late. You could rewrite the history (but good luck syncing it to everyone).


You don't even need to rewrite the history. Create a separate repository with the converted SVN history, and tell everyone who needs it how to virtually combine both repositories through the .git/info/grafts mechanism.
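The grafts file is just "commit parent" pairs, one per line, so after fetching the converted history into the same repository it boils down to something like this (placeholder hashes; newer git can do the same with git replace --graft):

  # pretend git's rootless "initial commit" has the old SVN tip as its parent
  echo "<full-sha1-of-initial-git-commit> <full-sha1-of-svn-history-tip>" >> .git/info/grafts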


Git annotate is a fantastic tool, even if the developers aren't great contributors.

A lot of developers do not write good commit messages. They don't even link to a ticket. But you get to know those developers real quick when you're spelunking with annotate. And developers who don't write good commits probably didn't leave any other documentation of use behind, either.

I've used this to illuminate "technical debt" from a different perspective. If you take a critical code path, find the important commits for critical logic, and then just show the "context" you're left with, you'll often be able to say "this is why your quality sucks" in a real concrete way.

Managers love proof, and showing them what little context you have for critical areas can be a very different way of looking at the quality of their systems. Otherwise, I've often seen a LOT of overconfidence largely because "we have automation in place".


Frequently. Whenever I run into something that makes little sense to me, I go back and look at the commit messages, the related bugs, the evolution of the code, and who wrote it.

It's called software archeology. It's not important if you keep exactly the same people working on the same project and they have perfect memory. But if you, say, move people between different teams, or lose people, or hire people, it's a gold mine.


Yes! My company has source that is 20-30 years old. Sometimes just going through the history to see who has committed to it, and checking whether that person is still at the company, is already a win; other times there's a code review link or a ticket system link that gives possibly more context on why something was made.

(Granted, code review systems and ticket systems change over time.)

Git blame on gitlab is also a good way of getting context of why something is there to begin with.


If migrating all old commits is such a challenge (it shouldn't be), then a workaround could be to keep your SVN online but in read-only mode. If your git blame shows "Initial migration commit", you just move over to blame in the old svn.
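In practice that just means pointing the old svn tooling at the frozen repository (URL and paths hypothetical):

  svn blame https://svn.example.com/old-repo/trunk/src/parser.c
  svn log -r {2005-01-01}:{2005-12-31} https://svn.example.com/old-repo/trunk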


Every bug ends up blamed on the whitespace guy who goes around correcting tabs to spaces.


That guy's going to the special hell. DO NOT vandalize code you aren't actually changing.


Source control history is absolutely essential for me when debugging hard-to-reproduce bugs. The first step is a blame (both git and svn have this) to find out when the last changes to a suspicious piece of code happened and what changed. Also who changed it, so I know who to talk to (if they are still in the company). If the commit comment gives additional info on why the change happened, that's great.

Sometimes those changes are over a decade old (of course such old changes make it more unlikely that they are still buggy, but new changes may interact with those old changes in unexpected ways).

So yes, the older a code base, the more important a complete change history becomes.


The last time I moved a large codebase from svn to git, the surprising thing for me was that the git clone was _smaller_, even though it had all of the history, than the svn checkout holding only truncated history. Everything runs faster, and the tooling is better. They will feel like they traded their go-kart for a Ferrari.


Revision history is essential for traceability in safety critical work. The entire history with name and timestamps can show both who introduced a problem, but also gives context for how a more subtle architectural issue got baked in slowly over time by multiple people. It can then be fixed, and possibly the process can be updated to help avoid such hard to spot issues in the future.


Yes, we frequently review revision history, and it’s not unusual for us to go back about a decade. Unfortunately, prior to that, source control wasn’t used, and that has made some tasks fairly difficult.

A great example is data migration. Infrastructure changes over time, even if only gradually. Databases get upgraded and moved around. Recently we realized that some data we migrated nearly a decade ago had significant inconsistencies. We didn’t have full revision history, but what we did have was enough to piece together the puzzle over a period of several weeks. If we had full revision history—which would’ve gone back about two decades—the job would’ve been much easier.


If you find something wrong in the code, you can fix it with or without the context with which it was written.

If you fix it without context, you may not actually fix anything, and actually create a broken state. This may be a new bug or a regression.

If there's more context, you're less likely to fall into that trap.

Of course, this is all moot if there's decent documentation, but I've never been employed at a place that has it. Everywhere requires reverse engineering / archaeological expeditions to understand the mistakes of the past, before accepting them as necessary evils, or fixing them without breaking the side effects of the mistakes.


One of the more frequent uses of source history where I work is to see when an issue was introduced to help gauge the priority of a fix. When something is discovered and someone wants to make it a stop-the-presses fix-it-now type thing, we look back in history to see when the change was introduced. If it's been there a while, and especially if an external customer has never created a ticket on it, we come back and say, "Well, it's been that way for five years now, so why don't we just put it into the next scheduled release instead of an off-cycle emergency fix."


Reason 1: keeping codebases cleaner.

Keeping commits larger-grained (especially if I'm deleting a functional component) supports deleting with abandon and following the "you aren't going to need it" (YAGNI) principle, rather than having large commented-out sections (or worse, large sections of deadwood in your tree). It also allows you to restore the code later, if you need it again, by reverting just one commit.

Reason 2: finding out WTH went wrong.

By having a master/stable branch and a development branch, if anything goes wrong, I can always diff between the branches to see what/how things broke. Sometimes it's a change to a dependency. Sometimes (ok, most of the time) it's a change I made.

This said, I think it's useful to me because I know what's in the history already. I think looking through commits from someone else with a tree that I'm not familiar with is going to be of very limited use, especially because people don't generally provide the critical answer of "why" a change was done in the commit log.

Related #protip: always try to describe why you're making a change in the commit log.

The how of the change is already there: it's the diff. Why you're either making the change or choosing a specific method over another can be invaluable to the Engineers of Tomorrow, and prevent them from a regression due to context loss/tribal knowledge loss.


Yes, every day. Well-written commit messages are like comments that never go out of sync with the code they’re attached to. Often it’s the stuff from years (vs months) ago that’s the most valuable, because that’s the information that nobody who still works at the company can remember.


Not frequently, as some other commenters have claimed to do. But when I do, it's invaluable.

I build custom automation equipment, which involves individual 100-400 hour projects. They're developed in a continuous scratch-to-complete flurry, with a few days of revision after customer review and installation, a year's warranty that typically involves 1-5 on-site days, and an annual "we never read the manual, please remind us how to calibrate it again" request for the next decade. Very little maintenance coding, lots of fresh feature development.

I disagree that there's diminishing value to older commits. You're more likely to forget what you did the farther back you go!

I'd estimate that I use revision history maybe 0-2 times in a typical project. But that's an easy way to recover a couple days of work that would otherwise need to be rewritten from memory, or worse, re-engineered from scratch! You can write a lot of commit messages in 16 hours, so one incident where you recover two days of work pays for two months of using version control without ever referencing the history. Plus, it's a nice security blanket for me: I don't worry about commenting around old code or making changes to a reference implementation I'm modifying, because I know it will be in version control.

I do think it's exceedingly unlikely that you'll suddenly decide to revert to the state of your codebase from 20 years ago. If you transitioned to Git and kept the SVN repository around for the rare occasion when you need to reference it, at least in my projects, you'd be able to do so without much trouble.


At my current company they place a huge value on that history, so much so that they haven't transitioned from SVN to git solely because of the logistical challenge of migrating 30 years of commits.

You can convert a repo from SVN to Git with history intact!

There's a tool called cvs2svn that I have used to upgrade really old CVS projects to SVN (it can do git too), and there is also an svn2git. And I believe there is git-svn, which provides a git interface to an svn repo.
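For the simple case, a one-shot git-svn import looks roughly like this (URL and authors file hypothetical):

  # authors.txt maps svn usernames to git identities, one per line:
  #   jdoe = John Doe <jdoe@example.com>
  git svn clone --stdlayout --authors-file=authors.txt \
      https://svn.example.com/repo project-git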


This is the best answer, but even if you didn't want to go through this trouble... just create a new repo in git, lock down svn, and assume from that day on that if you need to see older history, you go look in svn. Over time the need to look at svn for history will dwindle.


That sounds like a really bad idea. Eventually you'll consider the SVN stuff to be lost, and it'll be impractical to import it later. As all the responses show, that history has real value.. you don't want to lose it.


When you're trying to solve a mystery, one of the first questions you ask about a piece of code you suspect might be involved in the problem is: "what does the commit message for this say?"

It often contains valuable clues.

Less often you will want to know "how has this code changed over time?" or "was the code like this originally, or did it used to look different at some point in the past?"

Commit messages often say why something was changed. Well, good ones do.


Depending on use case, anything from weekly to very occasionally, but in all these cases it's invaluable. E.g., selectively reverting commits that are known to have caused bugs, checking something whilst preparing release notes, working out how we did X in the past, etc.

> At my current company they place a huge value on that history, so much so that they haven't transitioned from SVN to git solely because of the logistical challenge of migrating 30 years of commits.

I assume they've actually tried to do it? I ask because there's a bunch of tooling and at least one reasonably well understood process for achieving this and preserving history so it's pretty low investment to try it out and see if it works.

Here's Atlassian's version, for example:

https://www.atlassian.com/git/tutorials/migrating-overview

(I will grant you that figuring out how to navigate to the next page of the current tutorial at the bottom of the page is unnecessarily complex.)

I suspect with 30 years of history it's going to take a very long time to do the conversion (days to weeks), but you can set it off and leave it running. Once you have your initial migrate done you can set up syncing to git, and then you need to pick a time when everyone will stop committing to svn, allow a sync and verification window of a few days, and then everyone starts using git.

It gets more complex with multiple projects ongoing, and scheduling around releases, but making this happen is more a matter of will than battling complexity.


Yes, regularly. Some archaic code has surprising longevity. Personally, at least once per month I end up with a case where I wonder why some code was implemented or for what purpose. That context is rarely documented in the source code, but it is often exposed at least implicitly through commits, commit messages, dates, or authors.

I strongly advise against abandoning revision history just because it is easier to start fresh from a single git commit of the current state of the code. Especially so for code that has been in use for more than a couple of years, where the developers may have forgotten the purpose or who did what.

Surely you can convert the svn repository to git with history intact? We did that when we migrated from cvs to mercurial. If it is too complicated to do directly from svn to git, maybe it is easier to convert via mercurial, i.e first from svn to hg, then from hg to git?

https://www.mercurial-scm.org/wiki/ConvertExtension
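
Roughly, the two-step version would be (an untested sketch; the URLs are placeholders, and the hg-to-git step uses the separate hg-fast-export tool rather than anything built in):

    # step 1: SVN -> Mercurial, with the convert extension enabled
    hg convert https://svn.example.com/repo converted-hg

    # step 2: Mercurial -> Git via hg-fast-export
    git init converted-git
    cd converted-git
    hg-fast-export.sh -r ../converted-hg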


> they haven't transitioned from SVN to git solely because of the logistical challenge of migrating 30 years of commits

The choice between migrating the repository and losing the commits is a false dichotomy. One can deal with two repositories without much of a problem; it's only a small slowdown on the rare occasion you have to look at the old one.

Anyway, that applies only if you do have a reason to migrate.


There is less value in old commits, but you don't know which one will prove valuable, so all of them have to be there.

2019 fix, of a 2011 breakage:

http://www.kylheku.com/cgit/txr/commit/?id=3a91828748385d8d6...

2020 removal of 2009 misfeature:

http://www.kylheku.com/cgit/txr/commit/?id=24bd936a9fa671599...

The TXR project only goes back to 2009.

We can fix these kinds of things without reference to the past, but the process would feel uninformed and impaired.

Not everything is in the code; there are sometimes questions of requirements, which are not always properly captured in documentation.

We need all the historic record to be able to figure out the whole situation: what happened to the requirements as well as the code, and how it all relates.


Almost every day, often several times per day.

Changes made more than 10 years ago help to fix bugs still present today, even if the codebase has changed a lot (they have commit messages and links to old tickets with more discussion; sometimes just the name of the committer tells a lot about what to expect from a change).

I spent a lot of effort, when we started migrating from SVN to git, on not losing this, knowing the pain of not having the history go far enough (some of our projects were already migrated from CVS to SVN a long time ago, and histories were lost then). The effort has been more human than technical, BTW, since not everyone was aware of the value of the history — usually bugs in the oldest parts of the codebase go through only a handful of people who have been there for a long time, and other people tend to take for granted that we understand why something is the way it is.


Our codebase is around 20 years old and was in CVS, then SVN, then git. Then several years after git, a new git repo without any history due to poor use of the first git repo (someone added binaries, bloated the repo to GBs instead of maybe 200-300MB, which made git export horridly slow).

In all the steps we preserved the commit history, except for the final git->git. However also when we moved from SVN to git we kept the old svn server running as a historical archive for several years, as we didn’t carry across all projects (some were already EOL’d years ago).

During that time I looked at it maybe twice, and ultimately we decommissioned it.

Likewise with the new/old git repos, we still have the old git repo if we need the history.

One final thought: git blame was nice, until someone reformatted the entire codebase and committed it back in (we’ve since adopted better git workflow and code review practices!)


>Then several years after git, a new git repo without any history due to poor use of the first git repo (someone added binaries, bloated the repo to GBs instead of maybe 200-300MB, which made git export horridly slow).

Considering you already needed to freeze work, couldn't you have removed/amended the commits affecting those files and then force pushed, keeping the history but with a smaller repo?
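
These days git-filter-repo makes that kind of surgery fairly painless (a sketch; the 10M threshold is arbitrary):

    # rewrite history, dropping every blob larger than 10 MB
    git filter-repo --strip-blobs-bigger-than 10M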


Thought about it, but for external security audit purposes we aren’t allowed to rewrite history.


Rewriting history by squashing it all into one new git repo with an "initial commit" was bureaucratically acceptable, I assume.

Since working in a big company this is something I've been thinking about a lot: how heavy traceability and quality enforcement often leads to lower traceability and lower quality.

Quality enforcement at large often results in so much friction for changes that the company can't make necessary changes anymore and just piles on technical debt until it becomes unmaintainable. An example is the recertification process for airplane pilots that directly led to "low friction" workarounds like MCAS. After many iterations of this, modern 737s are now a pile of technical debt. The MAX was only the straw that broke the camel's back.

With traceability like in your case you have the same. The "low friction" version is to throw away potentially pertinent historical information, because it wouldn't accurately reflect history.

I don't have a solution for this, but it's been bothering me for a while.


You can use git log -S with a string (or -G with a regex) to see how that line changed over time. This is also called the git pickaxe. Also, git diff has a -w/--ignore-all-space option that basically ignores formatting-only changes, and git log accepts it too.
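
For example (the function name and path here are made up):

    # pickaxe: commits that added or removed occurrences of a string
    git log -S 'parseConfig' -- src/config.c

    # -G takes a regex instead of a literal string
    git log -G 'parse.*Config'

    # ignore whitespace-only changes when showing diffs
    git log -p -w -- src/config.c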


> git blame was nice, until someone reformatted the entire codebase

We have a few "blame walls" in our codebase like this due to formatting changes or badly-implemented restructuring/renames. When I'm in bitbucket tracing back through the history of a particular line, I end up having to manually jump back a few commits at certain points in time, and then if the file was renamed I have to also navigate to its old location/name.
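
Newer versions of git can at least skip known reformatting commits when blaming (a sketch; the revs file name is just a common convention):

    # list the big-reformat commit hashes in a file, one per line, then:
    git blame --ignore-revs-file .git-blame-ignore-revs -- src/foo.c

    # and --follow traces a file's log across renames
    git log --follow -- src/new-location/foo.c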


You should have rewritten history. It’s less invasive than doing it in svn.


GitLens[1] is my favorite VS Code extension. This extension shows relevant revision history directly in the code editor. These features are called current line blame and authorship code lens.

[1] https://gitlens.amod.io/#features


Yeah I've used it before.

Once I did a git blame on a file, found that the offending code had been committed nine years previous, and was able to figure out why the code was the way it was by looking at that nine year old commit and all the other code that had changed with that commit.

The nine year old context was super useful.


I'm working on a 15-year old codebase, and I was here from the beginning. I use 'blame' daily to make sense of code - why it was added, for what project, for what feature, to fix what bug, and who added it. It's priceless, and I'm putting off stuff like getting rid of stupid homegrown types in favor of standard ones, using clang-format consistently, and migrating to a smaller, newer, probably faster repository format, because I don't want to lose history.

I realize the value of this history is smaller for newcomers to the team.

(And Subversion and its bigger, expensive brother Perforce still make sense for game development - when you don't really want to go wild with branches or remote work, and when you need multi-terabyte-sized repositories and multi-gigabyte single commits.)


Yea, like many people in this thread. I'd just like to add, you don't have to block on migrating SVN history. It's actually enough to make sure that the SVN repo remains accessible indefinitely (if there's nothing secret in it, making a tarball of the SVN repo is a fine way to avoid running a server). I don't go through historic code often enough that it would be annoying to find a different repo when I need it, and half the time people's git conversions don't let me make sense of "This fixes a regression in r1234."

Of course, there are other options too, like migrating and then using replace, migrating and then rebasing, etc. I just want to point out that even the lowest effort option is valuable enough compared to throwing away history.
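
For the replace approach, the gist is something like this (commit hashes are placeholders):

    # graft the imported old history underneath the new repo's first commit,
    # so blame and log walk straight through the cutover
    git replace --graft <first-new-commit> <tip-of-imported-old-history>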


Just a few weeks ago, I had to dig back to try and root-cause a very old bug. It ended up being in code from back in early 2012, but the root problem was drift in surrounding functionality that this code didn't keep up with, until it eventually broke in an unexpected way. I would never have known the context around why that code existed in the form that it did without being able to go back through the history.

Granted, the context didn't really change what the fix needed to be, but it did provide a useful moment of reflection on the ways in which software can break through subtle changes over time that stack up, and it helped to know that the section of code that broke was indeed originally intended to work the way it did (and not that it was a bug from the very beginning).


Absolutely. Here's some cases where I've used older history:

1) We found a nasty heisenbug that crashed with a useless unrelated stack trace, but only if you didn't have a debugger attached, and only if you held the Windows 8 "charm bar" open for more than 10 seconds, and only on the main menu screen. After wasting a couple weeks trying to root cause it with logic, I eventually resorted to brute force bisecting Perforce history by hand - and then the changes within the changelist to blame, as it was a large one. This let me figure out it was a bug in a seemingly completely unrelated, closed source system API, that we were calling to check internet connectivity. I had to write a standalone repro case to prove to myself it was the cause, it seemed so nonsensical. I wrote a workaround. This bug was only a few months old though, because QA was able to catch it early enough. The bug likely would've gone unfixed forever without Perforce history.

2) I went to upgrade a 3rd party dependency that we checked in, that hadn't been upgraded in years - maybe even a decade - for bugfixes and such. Except we'd made changes to said 3rd party dependency, so I needed to separate out and understand our changes to the baseline SDK so I could decide if I should re-apply them to the updated SDK (in some cases yes! I was able to drop others.) We had a web interface to an archived SVN repository containing the commits before our years-old Perforce transition - and before my employment there - which I used to help me grok it all. I might have reached as far back as a decade in this case - very low frequency of commits to that part of the code, however, so "a decade" might have meant "the past 10 or 20 commits", if that. I had to reach out to IT to even get credentials to see said history. Helped turn a nerve-wracking upgrade into a tame one.

3) We decided to port an archived, years-old project to a new platform. Just seeing the last change made to sanity check if the weird logic I'm seeing might be a "new" bug or not means looking at years old history. This has actually happened to me a couple of times.


It is invaluable when working on a long-lived codebase. There have been many occurrences where a current bug involves code (sometimes that I wrote) from a decade ago. You will find high correlation with the value derived from meaningful code comments :)


At companies I worked at, it was used for internal political games: use it to blame bad code, use that blame later in performance reviews, deny bonuses and/or salary increases.

After transitioning to freelancing and being the sole user of it, I can finally use it for its true purpose, namely refreshing my memory on some techniques. Sometimes I copy/paste code from older revisions into a different project because that's the code I need for the current project (while the current code of the original project changed due to client requirement changes). Also, clients sometimes use it to see how the status of the project evolved over time, so it serves a metrics purpose as well.


With the exception of rare uses of git blame (hate that name) to figure out who made a change so I can ask them if they happen to remember about it (and if it was more than a year ago my expectation is "I don't remember", which is fair enough)... virtually never. I don't care if my git history is "clean" or "dirty" (if you never use a tool to visualize the branches then you'll probably never notice or care).

We put Jira ticket IDs in our commits and sometimes that’s useful. But the value tends to be in the content of the tickets as much as the commits.

If we decided to squash everything more than a year or two old into a single commit I doubt it would affect us very much in practice.


> With the exception of rare uses of git blame (hate that name)

I do too, so it's worth noting that you don't have to call it that. I think it was a mistake for Subversion to introduce blame as a cute alias for annotate, but its successors (at least Mercurial and Git) have at least retained annotate as an alternative. So you can git annotate to your heart's content and never blame anyone at all.


I'm enjoying all the comments here!

Thank you OP for demonstrating the effective use of Cunningham's law: "the best way to get the right answer on the internet is not to ask a question; it's to post the wrong answer."


I do, I think. I often blame individual lines to see if there's information in the commit about the reason for the change. It's often straightforward to understand what's happening when reading code, the information that's often missing is the why.

Sometimes there are clues sometimes not, but you can often see the line change in the context of changes to other files and that can help.

For this reason I tend to make quite verbose commits with the context of why I'm making the change. A comment in the code would go out of date, and pollute readability, but a well written commit can be very useful.


That's indeed useful, some IDEs/editors are able to show the commit message just above the line of code you're looking at, providing a lot of useful context when troubleshooting. Of course it does require discipline on our part, as it can fall over when a dev has been through hours of messages like "trying something..."


I’ve inherited code bases that migrated from SVN to Git without converting and bringing over the old revisions. This has resulted in a number of times where I hit a cliff when trying to identify when a piece of code was changed as everything points back to the initial “SVN import” commit. Couple that with the original SVN repository being unavailable (either lost or just not worth the trouble of finding in 10 years worth of offline backups) and I would say yes, having your source control under one system with all of its history is ideal.


The more public a project is and the more developers it has, the more I use it. The project where I use it the most is the Linux kernel. That's also probably because there's a strong requirement to actually write useful commit descriptions. But if changes come from so many developers, it's very useful to catch up on news, what's coming to the next kernel release, what changed in what driver or subsystem, what might have caused the regression I'm seeing, who to contact, etc.

On personal projects I don't use it that much.


Yea, the only time I don't use it is when I'm working solo on small projects. Then dirty tricks like duplicating files into `.bak` or `-ver3`, commenting out unused code for potential reuse later, etc., are actually more time-efficient, at least for a non-git wizard like me.

It's invaluable in larger projects, even as small as 2 persons.


I mean, I still use version control for pretty much everything. I just don't look at history all that much or try to write very careful commit messages for the small personal projects.


Constantly. Development is a process, and artifacts fall out of it. The obvious artifact is the source code, but others include documentation, revision history, tickets and infrastructure configuration. We should be taking as much care of these as we do of source code.

One example I don't think I've seen mentioned in this thread is that sometimes a change touches two widely-separated parts of the code. The commit message may be your only opportunity to comment both parts at the same time - to tie them together.


Yes, but not for the distant past (which is relative). Sometimes it is a revert-to-revision thing, sometimes I just remember a revision as a base for an ongoing refactoring. It must be done in a branch, but when I work alone on my thing, I just break trunk and commit a broken tree every evening (or at logical points, whichever comes first). Besides refactoring, nope, write-only style. All the variants of "knowledge base" code I need to reference are in a separate dir/repo, ready to copy, re-experiment, revert back or commit. VCS is spacetime - you can use time and you can use space.

I also must confess that I've never seen much profit for myself from commit messages beyond one-liners like "broken", "savepoint" and "fixes to bar, uploaded foo". If trunk has a problem, you can just blame and get an exact revision. If you search through history, use a GUI tool / IDE that can fetch it quickly and compare to head, then bisect manually. I don't make hundred-pagedown commits, so that's easy enough.

For a future employer: that doesn’t mean I’m against or unable to make branches and write good commit messages. All above are just obvious shortcuts that my own “garage” projects tolerate with no downsides. Personally, I don’t get why some guys freely decide to break project rules when at work - and it was frustrating when they did it to me.


Yes, while maintaining code which other people wrote who are no longer around I often have to figure out what they might have tried to do. Being able to see how code looked before someone rewrote it can often give you an idea what it's about. Even better if the people even used good commit messages which explain why they fixed something and why they did it the way they did it.

Sometimes code only makes sense if you can see its evolution.

Also knowing who wrote it you can ask those people sometimes about it.


It's not too hard to migrate from SVN to Git.

I did this for a 15+ year old very large code base. I tried various recommended techniques but everything would fail at some point or another (usually after many, many hours) and I'd have to start over.

Finally I wrote a very simple program with logging where the logging also records the current state and on any failure I could start from any point in the log.

The idea was simple.

1. Check out SVN version N

2. Parse the changed-files list between commits N-1 and N

3. Copy those files to the Git folder

4. Parse the commit log for version N to get the commit date, committer name, and commit message

5. Commit in Git using "[<name>@<date>] <message>" (thus preserving the original commit information)

6. Repeat with N + 1.

If it fails at any time, simply reset both SVN and Git to the last successful commit and restart. I also did a binary compare of the entire directory tree every 100 commits to ensure the copies were identical.
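
In today's terms the core loop might look something like this (a very rough sketch of the idea above, not the original program; the URL, revision range, and working-copy paths are placeholders, and all the logging, restart, and verification machinery is omitted):

    #!/bin/sh
    SVN_URL=https://svn.example.com/repo

    for REV in $(seq 1 30000); do
        (cd svn-wc && svn update -q -r "$REV")

        # mirror the SVN working copy into the git tree (handles deletions too)
        rsync -a --delete --exclude=.svn --exclude=.git svn-wc/ git-wc/

        # pull author, date, and message out of the svn log for this revision
        AUTHOR=$(svn log "$SVN_URL" -r "$REV" --xml | xmllint --xpath 'string(//author)' -)
        DATE=$(svn log "$SVN_URL" -r "$REV" --xml | xmllint --xpath 'string(//date)' -)
        MSG=$(svn log "$SVN_URL" -r "$REV" --xml | xmllint --xpath 'string(//msg)' -)

        (cd git-wc && git add -A && git commit -q --allow-empty -m "[$AUTHOR@$DATE] $MSG")
    done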

The process took about two weeks running all day and night (since one commit at a time is very slow) but it was very robust and left a perfect version history.

To deal with the fact that the SVN repo was still live, I believe I mirrored (or something like that) the repo and would sync between my local mirror and the live repo every couple of days. When my program caught up with the live repo we just stopped commits for a few hours while I wrapped everything up and then archived the SVN repo.


For recent commits, it is useful for figuring out the causes of recently-reported bugs by checking out earlier commits until the bug goes away.

As for commits that are over 2 years old, they still serve a purpose. For a legacy app that I worked on, I had `git blame` run on every line (vim and vscode both have support), and I was able to see who worked on a block of code last. Sometimes those developers are still there and available to ask questions, which has helped me greatly.


This kind of depends. Many years ago we migrated an SVN project to git (largely because of my constant complaining about SVN, which I truly hate) but we didn't bother with keeping the history. We simply kept the SVN repo for another ~2 years in case we ever wanted to check something. At a certain point, when we felt the project was stable, we got rid of it. I've gone looking back at where something came from on a number of occasions, so I would say it is valuable.

Which applies to any project that a number of people are working on. Especially when there is a bug, git blame is a life savior. Which potentially has a lot to do with me being annoyingly pedantic about commit messages and branches. I did, however, have the "pleasure" of working with a guy whose branches were commonly called "bugfix102015" and commit messages along the lines of "fix some bug". In such cases there is not much you can do when shit hits the fan.

For my very personal projects - hardly. Much like you, if something has been done 2 years ago, chances are it's working fine as it is, or you are not using it at all. So for personal projects, digging years back is something I don't ever recall doing.


all.. the... time

if you have never experienced the raw power of "git bisect" when trying to hunt down a bug, you're missing out.

using git bisect can literally save your life in terms of stress. I think it's one of THE most important tools in git that developers can learn. It shows exactly why we should commit small and commit often.

https://www.youtube.com/watch?v=REaowJ8JSfw
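
For anyone who hasn't tried it, the automated form is where it really shines (a sketch; the tag and test script are made up):

    git bisect start
    git bisect bad HEAD              # current state is broken
    git bisect good v1.2.0           # last known-good release
    git bisect run ./test-for-bug.sh # script must exit non-zero on the bug
    git bisect reset                 # return to where you started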


I did work for a company where they had a few important vendors who would run legacy versions of their software and demand a bugfix for that version. They did this only once or twice a year, but being able to reproduce and patch old versions was enough to get some clients to shell out absurd amounts of money. These companies were swimming in cash and didn't like to learn anything new. Needing old commits is rare, but important.


As a DBA, I use git blame to see who wrote slow queries. Then I assign a jira to them. :)


If your ticket explains why the query is slow and suggests an alternative that would be faster I'd welcome that. If anything, I would expect a good DBA to do that.


Well, that level of TLC isn't going to happen when ...

- I'm doing batches of 50 at a time (quarterly review)

- the SQL doesn't make any sense in the first place.


Sometimes your name is next to that code. It’s okay, you can admit it. We’ve all done it.


Yes, especially with projects with long history. You want to know who did what at a particular code block, and what it was like before. Since we tag all commits with a ticket name/number, we can go back to Jira and see the ticket that was responsible for that change.

This helps a ton when refactoring code in a project with a lot of history. I know that this bit was done this way for a reason, but that reason could be anything.


> So I ask: at your company and in your experience, do you get value from source-control-arachaeology? And if so, what does that look like in your case?

I work with different code bases, some have 20+ years of history (migrated from RCS to CVS to git).

There's no week where I don't go back to look at some kind of history, usually to find out why or when something was done. Often the issue keys / ticket numbers referenced in the commit messages help me when the commit message itself is too opaque to understand.

I also like to get a sense of how often a file changes. This gives me a sense of whether the code is likely to be fragile and/or touches often-changing requirements.

There is a diminishing return for very old commits, partly because our team was much smaller back then, and communicated less in writing, partly because too much of the context has changed. But two years doesn't qualify as "very old" here, in our case the diminishing returns start more at 5 to 8 years.

That said, if I were working with SVN again, I'd likely look at the history much less, because it's that much slower and more painful.


Unequivocally yes.

I switched companies (FANG) to my family software company, which had been using Microsoft Visual SourceSafe. The company was only casually using it: one computer was used to compile customer executables (and fix compile-time linker errors), and that work oftentimes was never checked into VSS. Needless to say, no one on the SWE side knew if any code actually worked or when anyone did anything.

Part of this was lack of management, part of this was inadequate tools. After I joined and learned about the horrors of VSS, I switched our company immediately to git (there was some initial resistance). While there was around 10-20 years of VSS commit history to migrate over, having git blame immediately in VisualStudio and any git client makes a world of difference. While legacy code can’t be cleaned up immediately, the team’s mindset has changed so that there’s no more commented out code (“in case I need it later”), no more new duplicate implementations of the same business logic, and a person to blame for software bugs :)


> I switched companies (FANG) to my family software company

> After I joined and learned about the horrors of VSS, I switched our company immediately to git (there was some initial resistance)

Be careful with changes in these situations. "Some hotshot coming from FAANG and telling us how we should do things" doesn't always play well.


Of course. At that point it became a matter of internal politics rather than the actual tool or tech itself. Most of it was showing industry best practices and where things were headed.


Very rarely, beyond a couple weeks of commits. In those cases it is more a question of "did some particular commit hit branch XYZ?". I wouldn't call that source-control-arachaeology.

I'm at the stage where if someone suggests that we try to keep a linear history in git I push back and argue that it isn't worth the extra effort compared to the gains.


The most valuable part for me has been ticket numbers. We prefix every change with a ticket number, and it provides an easy way to answer "why" for any line of code. I wish I could use the comments as well, but that's harder to enforce quality in.

I use git blame and history at least once a week when bug sleuthing, and value it very highly.


Yes. I do git blame/annotate every day on a code base that is over 20 years old, with 100k commits. I migrated it myself to svn in 2007 without history, which I regret, because a lot of changes now look like they are from the initial svn commit in 2007 when in reality they had an older history. For that reason I took a lot of care when migrating to git to include all history, but also to trim away some dead parts and mistaken commits from the past. Getting a chance to clean up history is great. Migrating an svn repo to git was definitely worth it. It was not a huge logistical challenge, as there are great tools for it. The hardest part was finding which tool to use.

The most common use case for archaeology is to find who made a particular source line change 10 years ago and just ask them something. I often find it’s my own code...

People usually remember at least vaguely why they wrote the code even 10 years ago.


I normally do not comment code unless it is a dangerous hack or a todo. Otherwise I make a small commit with a commit message explaining the why of the change. The git blame of our code base documents the why of almost every line, and reviews are failed if they only describe the what in a one-liner.

When we need to dig into the history of a line, we use git log -S.


It only took a day (actually a few partial days) to make a script that could migrate projects from CVS to Git. The importer we used kept all the comment history and even converted multiple commits at the same time with the same comment into multi-file commits. So no need to go looking through the old VCS for the old stuff.


Yes, all the time. I frequently need to know who made a change, what ticket was associated with that change, when was it made, what did the code do prior to that change, etc. We did make the change to git almost 10 years ago. I wasn't directly involved, but we managed to do it in a way that largely preserved history.


It's crazy how in agreement we are that keeping our records is important for legal defense, debugging, auditing, finding out how things came to be, etc.

Then for our users it's like: you can have one name and one name only; if you change it, it's going to be that name forever, even in the past.

Tables should have revisioning on as a default


I have it set up (and many do) so I can highlight a bit of code and see when it was created or touched last, who touched it, and usually what ticket the work was done against -- the context of the change. This gives a whole third dimension of understanding a codebase:

* Dimension 1: Code layout (organization) structure

* Dimension 2: Execution (data-flow) structure

* Dimension 3: Evolution (change over time) structure

All three dimensions try to capture some of the intent of the implementer, and understanding that intent is very important when improving upon the work. Along with that comes a perspective on what assumptions the author had. Code last modified 5 years ago very likely had different assumptions than code written last week -- Being able to see which lines in a function came from which era can illuminate things nicely, and that is only scratching the surface of this evolution-dimension.


Yes, we get a lot of value out of version history. It is a good tool for evaluating existing code. We also have experience with leaving history behind when migrating to git.

Our case was a tooling policy issue but it should be possible to migrate from SVN to git and keep the change history. You should investigate this option.


I very rarely go back more than a month or so. Very occasionally I have gone back 10 years but only for the weirdest problems and to get a sense of what the person was thinking when they added that code. I think I could live with just 6 months of history, and in most cases 30 days and never have a problem.


I use it to find more detail about a particular line. Often knowing when something was added, who added it and what was the commit message helps me understand the reasoning behind some particularly weird or legacy code. In some projects this can lead you to a pull request link, with even more context.


I'd second that. I've also used old revisions to try and figure out when a particular bug I've found was introduced too.

Knowing how long it's been there can help you assign a priority; e.g., a minor bug that has existed for 10 years without anyone noticing is probably not something you have to fix right now. The opposite case is a nasty bug that has existed for 10 years until some data structures or control flow changed and exposed it, so it's not a hard and fast rule.

Once or twice I've seen bugs be introduced by merging huge commits from another branch, so there's lessons to be learned from history too. In this case, avoid large merges with conflicts, especially something you cherry-picked.


First git project, had a guy who loved deflecting, hated git, and had a habit of fucking up merges. Let’s call him Steve.

Some code got broken and Steve immediately throws Jim under the bus. I look at the commit history, and sure enough git blame shows Jim broke it. Except... there was one little problem: I was the one who signed off on that code. I was specifically looking for this exact class of bug before I approved it, and was pleasantly surprised to see that the author had already anticipated this problem.

So what the hell happened?

Steve happened. Steve fucked up (another) three way merge and I didn’t catch it this time. I found the original commit hash and sure enough the code was correct. So then I showed Steve, again, what I call a Five Way Merge (3 way merge, then resolve again against both parents) and made it pretty clear this was no longer a strong suggestion and now a demand that he use this method to do merges.


For sure. And unlike comments or documentation, it never lies.


Yes, responding to the title. In particular in my rich web application if I am in the process of merging & deploying the branch to the production environment, sometimes the automated tests fail and I use a bisect to narrow down the offending commit. Having good commit messages in general helps with debugging in this scenario.

I’m surprised you haven’t found excellent tools to migrate from SVN to Git, considering how popular these VCS systems are.

I once worked in a company whose source tree originally predated version control. There I found an entire module that appeared to be dead code, and I wanted to determine how it became dead. I did a bisect and landed on an 8-year-old commit that was the very first commit in the version-controlled tree. Yikes. So I guess I'll never know how that module became dead.


I've done it both ways. I've had to partition a years old code base, keeping an old SVN repo up to a certain date, then using my own limited git-svn skill to promote some recent months to git, to establish a git repo for use from there forward. At another company we had a magician who ported maybe 10 years of SVN history to git over a weekend, and we were able to abandon the SVN repo.

In the first case, I don't recall ever needing to go back into the old SVN repo, spelunking for "how we used to do it". But the capability was there, with the minor hassle of not having a single repository to search. The git repo, with some minimal recentish history, soon became the authoritative source, and we never looked back.

[edited to clarify the partitioning of the first codebase]


It’s not every day but it’s extremely useful when you need it. I’ve used that history to find context for when someone made an otherwise unexplained change - tickets; names of people, projects or departments; the commits immediately before or after; etc. can all be really handy for learning why something works a specific way. (“Why are we pinned on this ancient version? Oh, that server was decommissioned years ago - we can drop it “)

In your specific example, git-svn works really well for maintaining that history, including authorship. I have a few projects which predate Git's existence and it's been quite usable for history. You can't direct-link an old SVN revision to a commit ID, but Git searches are very fast (we're not on 20 year old hardware) and you shouldn't be doing this many times a day.


Every day. My team owns a 10+ year old system. None of the original authors are around any more. We're doing a massive migration of the system (several of them, actually), and being able to go back and understand why the code was written the way it was is great. Earlier this week I needed to understand a very odd piece of code. It was making an RPC, and if that threw an exception, it retried the same RPC in the exception handler with different parameters. Looking back through the history, it turns out this was because the code used to do something different, but during a migration it was changed such that the second set of parameters made no sense. Once I understood that context, the fix was trivial and I felt safe making it.


Using them all the time:

- Looking at root causes of undocumented weird hacks and technical decisions

- Finding the culprit who caused a bug, in order to remind them to do better.

- Finding an underappreciated contributor.

- Reverting changes.

- Cherry-picking changes.

A REMINDER: Revision history is great, but so are flexibility and velocity. You can always cut off history and use another tree (e.g. when moving from SVN to git), document the cutover, keep the SVN history as an archive, and use git.

If a decision will boost velocity, flexibility, and sacrifice less valuable thing, you should do it, but make sure you will have a fallback.

In the end, flexibility is what you will need at every level (code, product, company), because the world around you (and the requirements) always changes, and you'll need flexibility to be adaptive.


There is value in keeping track of who wrote each line of code. Even if it's quite old, you can still figure out the persons that originally worked on a particular project/module and ask them about it. If you reset history, this knowledge is lost.


I use it as a reminder of what I did over the last year for performance reviews. I use a date range on the 'svn log' command to see all of my commit messages and which files I changed. I frequently find smaller things I did that I forgot about and which may have had an outsized impact on our work as a team. I can find refactorings pretty easily with it, too.
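
Something like this (the dates and username are examples; --search needs SVN 1.8+):

    # everything I committed last year, with the changed paths listed
    svn log -v -r {2019-01-01}:{2019-12-31} --search jdoe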

I've also used it to do some deep archeology. I had a piece of code I inherited that was always problematic. Eventually, I went through its history to figure out what it was originally intended to do and why it changed over the years. This was invaluable for finally figuring out how to fix the damn thing once and for all.


Constantly. We migrated from SVN to Git at the beginning of 2018, and spent a lot of time getting the revision history migrated. I routinely check the history and field questions from coworkers who saw that I committed code before the transition.


From 2007 forward, across the 10 companies I've gone through, developers using git looked at the history infrequently. A couple of times at company #6, where we spent 3 years creating a JS framework from scratch and a mobile web offering of the HUGE office toolsuite. Most of the time it was to back out changes or bisect, which requires a revision history. You almost never need it, until you do, mostly for local work and your own branches. Features tagged in commits with JIRA support cover the vast majority of needs. Extended comments are useful to explain the individual goals of the commits that make up a feature.


I don’t think I could ever work in a place that didn’t place some value on revision history...

`git bisect` to find when and how a bug was introduced, digging up details when it’s time to land multi-month merges, getting stats about previous projects.


Very rarely past a few weeks.

But on the rare occasions I need it, I often really need it. Especially because code that has survived sufficiently unchanged for that long often has done so for important reasons.

The amortized value per commit for really old code is likely low, but you get them 'for free' because you want to do them to have them for recent code, and the overall value of having them for older code to the codebase as a whole can be significant.

I'd say the SVN history challenge is an excuse - firstly there are tools that can do it.

Alternatively you can easily enough keep the SVN repo around for those rare occasions people really need to dig.


Yes! I performed a major refactor that lasted about 6 months. With the commit history (that linked to tickets with discussions and even mockups), I could understand the process of people who had left the company two or three years ago. It allowed me to tell obscure business logic from leftover code and bugs. I could trace each line of code to its requirement.

I also used it to pinpoint the cause of a bug after updating a docker image, knowing that the bug was introduced in a certain file between certain dates.

Now I try to strictly enforce detailed tickets and ticket numbers in commit messages.


Yes, I use it routinely to do something like

    git blame > who changed that line the last time
    git log   > why was it changed

You can quickly find out if this was some trivial typo fix, or an important feature was introduced.

Implicitly, it means that to get some sort of value from that kind of archaeology you need either very detailed git commit messages, or a really clear bug tracker with all the whys, regression tests, etc. that were done at that time.

That being said, I think this is a bad argument for not changing/updating your VCS.

You can absolutely move to git, and keep a dump of the SVN base you can still expose and review at will.


I do. Most often I get that value when looking through the repeated blame output of a particular piece of code. I facilitated that through this tool I wrote:

https://github.com/jolmg/git-reblame

Last time I used it was last week to see how a particular piece of code was developed throughout the years. There was a comment that didn't explain some puzzling details, and it helped to make sense of it by seeing how the code changed from the time the comment was written.


It varies. There was a mostly complete PKIX path resolver I worked on a year or so back that turned out at the time to be unnecessary, so I ditched it. Fast forward a year and it became useful and saved a week or more of work. But that was only useful because I knew it was there, and it was still relatively current.

What benefit you can get from 30 years of commits I'm not sure.

By the way - it looks to be possible to migrate history from SVN to Git, so if your company needs that, maybe start there, by creating a local git repo with intact history and showing it to them.


I've certainly gotten some mileage out of `git bisect` over the years.


All the time. If I'm reading code and think "why on earth is this here?" the first thing I do is hit the git blame page on GitHub.

If a project has a clean commit history, this instantly gives me extra context and hopefully even links me to an issue thread explaining what was being solved.

In older code bases this is invaluable - I often find myself looking at history from five years ago or more.

It's also great for my own projects. Even if I wrote the code six months ago there's still a strong chance I won't fully remember the context for the change.


Game industry uses source control _a lot_.

It’s an important communication tool. Also game companies tend not to have unit tests, but the culture is very much “don’t break _anything_, and don’t make the game worse,” so devs have to triple-check the intentions & effects of any code/script they touch, to be sure they understand what they’re changing and know it won’t introduce any unexpected changes. Timelapse view (Perforce’s version of git blame) is an essential tool for all departments, especially for anyone trying to figure out a bug.


I actually read the description (we call it the “take message”) of every changelist submitted to our main dev branch. It’s a great way to learn, and maintain awareness of anything that may be relevant to my duties. Unfortunately it’s now in the hundreds per day (team is growing), so it becomes hard to actually process the info.


I regularly find myself trawling back through code that's 5+ or 10+ years old, and the best way to understand a detail of code can be to look back to the commit message where it was documented.


When I worked at a big tech company that was over a decade old, there were several times I looked at running code that was over a decade old. I was working in a section of the business that hadn't existed for more than a decade, so that somewhat bounded how old code could be in my domain, but somehow my team ended up responsible for one thing that was much older, and understanding it took some reading.

There were other core systems that I also read sometimes that were older, and it was extremely useful to understand their construction and function.


No, not in 20 years, except maybe 3-4 times.

A shitty codebase and a lack of tests defining business needs are what require it.

Also, devs with headphones on, not stating decisions... that way, when people leave, no one knows why it is the way it is.


Yes, I use it heavily. The further back in time I go, the more valuable the revision history is.

Where I work, we use it primarily as part of maintenance. Looking through what changes have been made to a section of code over time very often gives insight into what is causing a current malfunction -- sometimes it even lets you spot the problem almost immediately.

We also use it as part of development and bug tracking. All code changes are tracked by revision number. Even there, being able to look up even antique history can be very useful.


When Mozilla moved from CVS to Mercurial in 2007, the CVS revision history regrettably wasn't imported. A git repository has since been created with the combined CVS and Mercurial history, but the Mercurial repository is still the official source of truth.

https://gregoryszorc.com/blog/2015/05/18/firefox-mercurial-r...


Yes! From my experience it's invaluable for seeing why things were done like that. My best use of it so far was to add a #number at the end of the commit message, where number is a Trello card id. Then you can go back to the discussion and card that originated that change. You can see all the context of why something was done like that.

This becomes very valuable when maintaining projects that will be running for years, and prevents you from undoing things or going back to doing the same mistakes


At least once a month, when modifying some critical piece of code - git blame is the easiest way to find reasons for non-obvious lines of code.

BTW: if you sometimes move code around between two git repos (from multirepo to monorepo for example), I wrote a script to move a subfolder between the two and keep history:

https://github.com/jakub-g/git-move-folder-between-repos-kee...
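
For simple cases, git-filter-repo can do something similar (a sketch, not the linked script; repo and folder names are placeholders):

    # carve the subfolder (with its history) out of a throwaway clone
    git clone source-repo /tmp/carved && cd /tmp/carved
    git filter-repo --path libs/widget --path-rename libs/widget:widget

    # then merge that history into the monorepo
    cd ~/monorepo
    git remote add widget /tmp/carved
    git fetch widget
    git merge --allow-unrelated-histories widget/master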


Oh that's a cool tool. I'll try to remember it next time.


Like everyone here, “Yes, the history is super useful”. So, I’m curious about fixing the actual problem.

What happened in the CVS => SVN migration? You don’t have 30 years of SVN history. Do you have an SVN mirror / backup with which you can try out git-svn to import the codebase?

Besides, it may be 30 years' worth of commits, but I’d guess it’s smaller than the LLVM SVN repo was at the time of the first git mirroring. How many commits are you talking about (including all branches, etc.)?


I navigate through unfamiliar code almost daily, completely different projects and different authors. I use the blame feature a ton to understand in what context a method was added: by blaming it, I can usually see what other code was added at the same time, and its evolution.

I've used git-svn[0] to use git within svn, it's been working flawlessly in my case.

[0] https://git-scm.com/docs/git-svn


Not archaeology, but I use git reflog constantly as my workflow:

I push to master infrequently. I keep a series of topic branches off of master, one per project phase. For changes that affect other developers, I pull those out and PR them to master, then rebase the topic branch chain once the PR completes. When I switch projects I use git reflog to remember where I was working.

Basically I take advantage of git rebase and use it like time travel constantly. Somehow I stay sane..
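
E.g. (a minimal sketch):

    git reflog             # every position HEAD has recently been at
    git checkout HEAD@{5}  # jump back five HEAD-movements ago
    git checkout -         # and return to where you were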


Who wrote this piece of junk code? ... git blame ... oh ...


But seriously. I have seen many places just move the files from their old VCS into git.

It is wrong to say you lose the history; you just have to go into the old system to access it.

Also, sometimes you change from a monorepo approach to a repository-per-project approach, and you want to tidy up dead projects and irrelevant history while you do that.


Git importation tools are perfectly well equipped to preserve the code deltas in individual commits. Some metadata might get dropped, but you can and should have your commit history.


Lots of the value for me is in well written commit message bodies.

Especially if I have written them: when someone asks me how something works or why it's written that way, I can read the commit body and explain it to them / refer them to the message (otherwise my answer is: I don't remember!).

Additionally if you choose your changes well and don't squash commits it can be a good guide to what else touches the thing I'm looking at.


I think you do get value from it. You won't know it till you eventually do have to go digging in commit history, though I don't find myself doing this regularly. I have found it useful when attempting to understand why code was changed or written the way it was some years before. I have also used it to understand when/where a bug was introduced. A tool I like for exploring git history is DeepGit.


Before using git, my code/comment ratio was ~1/2, as there were many things that "may be useful in the future" (usually they weren't).

And yes, once in a blue moon a change breaks something and I need to go back and recover some old code. Much more often, I need to see how it worked before.

Plus, let's second the psychological benefit. I don't need to worry or think twice before changing code.


Yes, quite frequently. You are correct that it is often only the most recent history that really matters, but sometimes the most recent change happened years ago, so date-based cutoffs don't work.

When you do want to convert that SVN repository, use Reposurgeon (http://www.catb.org/~esr/reposurgeon/).


Yes: I used it last week to source a problem we've had since 2012. It helped rephrase our discussion about how it was missed, why it was introduced, and why it sat in waiting for 8 years.

More importantly: in this case we have a clear date telling us which versions we need to consider releasing patch fixes for.

In some cases this can be useful: even when the functional problem it causes is not easily evident in previous versions.


We got sued about an old feature and I used revision history to recreate a snapshot of how our SaaS used to look at the time of the alleged event.


> they haven't transitioned from SVN to git solely because of the logistical challenge of migrating 30 years of commits.

I did such a migration 3 years ago at a company that had a 10 year history and it was fine using the standard tool. Is there a particular problem your company has with it, or have they just not tried?

(Also, if they are really worried, nothing to stop them keeping a read only SVN server somewhere.)


PR descriptions/discussions, ADRs, and READMEs are often germane and useful, but the revision history itself? No, almost never. Any rationale motivating a piece of code as it exists in the repo is, in my experience, best provided as a comment in-situ with the code. Information that's in the revision history but not in that kind of comment is, in my experience, historical noise.


The p4 import tool can do incrementals, so I would be surprised if the svn one doesn’t as well.

You fiddle with it until you get it working the way you like, then you do an import in the background or overnight. It takes as long as it takes but you don’t care. When it’s time to make the transition you aren’t importing the whole thing, just the past week. The older stuff has already been transferred over.


"We won't switch from SVN to git due to the logistical challenge of migration".

It would be worth switching to git if the current technical costs outweighed the costs of the migration, yes.

I think there's more to it than just "we don't need all the history, just squash it and starting with git would be better" (or even "setup authors file, git svn fetch"), though.


I was recently trying to link our source code against a newer version of LLVM that had introduced a few API changes that I couldn't make heads or tails of. Without the git log I would've had to manually compare the changes and figure out what they meant, or just guess. But luckily the commit messages quite clearly explained what changed in most cases.


Yes. Sometimes you need to know why a particular thing was done.

Recently I had to go back and find out why a particular conditional was added to the code. 10 years prior, someone had added that conditional for a bug in IE8 (which we no longer support). There was a Jira ticket associated with it. I then knew I could remove the odd logic, as it was no longer relevant.


I often use the VCS feature of PHPStorm where I can select a piece of code and immediately get a nice list of all the commits that changed it. I can then double-click a commit and get a list of the other files that were modified with it. That way I can figure out why a certain piece of code does what it does, and why it was added.


Heads up to anyone using Jetbrains IDEs, they have some great VCS history features like annotate lines: https://www.jetbrains.com/help/idea/viewing-changes-informat...


It's very simple to migrate even really old Subversion repos to git. If they claim it is an insurmountable logistical challenge, then they don't know what they are talking about.

Yes, there is diminishing value in old commits, but they are far from worthless! Never ever destroy the commit history. Doing that is imho, a cardinal sin.


I just want to point out that it's possible to migrate an SVN repo to git while preserving history. I used svn2git (https://github.com/nirvdrum/svn2git) to migrate many repositories, although not very large ones.


It's very nice when your dependencies have a thorough history. I updated one of my dependencies and found one of my tests started failing. Very nice to pinpoint the exact modifications that caused the break, and I got a much speedier patch because of it. Depends how often you update your dependencies though of course


> so much so that they haven't transitioned from SVN to git solely because of the logistical challenge of migrating 30 years of commits.

Just bite the bullet and convert the repo from SVN to Git?

Guessing the primary issue is the time it takes to convert the repo, maybe a job to do over the holidays when most people are off?


I’m a QA engineer and I go back pretty frequently so that I can see when an issue popped up and what might have caused it. It’s super useful for root cause analysis on bugs. Also, I use it to troubleshoot legacy versions for customers, although I see less of that since I work on a SaaS product now.


I use it quite a bit for various reasons. But I think I get the most value from just knowing it's there. I can plow ahead with anything, try anything, experiment, it doesn't matter. I know that my previous history is there waiting for me if the plunge I just took doesn't pay off.


Git blame shows you when and why a given line of code was last touched. That alone is worth all of the overhead of revision control. Commit messages can contain more context about changes (what story were they for, what bug were they trying to fix). I probably use it at least once per day.


Yes, I often have to go back into commits from a year or two ago to discover the motivation for certain decisions.

https://en.m.wikipedia.org/wiki/Wikipedia:Chesterton%27s_fen...


Often, but not to the point of more than 3 months back. To me, it's more so I can delete obsolete code instead of commenting it out, without worrying that it'll be irreversible. Or sometimes if a feature is suddenly broken we can trace if any changes were made at the time.


Yes, sometimes you see some weird code and have no idea why it was added; searching the history, I can find the commit and the ticket related to that piece of code.

Also, sometimes you want to find the author of a piece of code to ask why something was done a certain way.


An underrated feature is “blame” in the IDE. IntelliJ and Eclipse both support showing, in the code editor's “gutter”, the last commit that changed each line.

Makes it easier to figure out how old a line of code is and (if the commit messages are any good) why it was introduced or changed.


>why it was introduced or changed.

"Fixed formatting"


That one’s easy enough. Show the diff, then run annotate again on the left-hand side.

Which is also why I always separate my formatting and structural changes into two sequential commits. The last interesting change is easy to read, and getting to the previous one is the work of a few extra seconds.
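
Newer git (2.23+) can also skip pure-formatting commits during blame, assuming you keep their full hashes in a file, one per line (the filename below is just the common convention):

  git blame --ignore-revs-file .git-blame-ignore-revs src/main.c
  # or set it once, so plain `git blame` always skips them:
  git config blame.ignoreRevsFile .git-blame-ignore-revs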


A good UI is key. If your editor can navigate blame and commit history with just a key press, it actually speeds up figuring things out, especially in old codebases with bit rot. Benefiting from 10+ years of history might be rare, but two years feel like yesterday.


I used to get irritated that peers made me jump through hoops to collapse commits into single, meaningful commits. Why bother? Nobody looks at history.

And then I started looking at history, and it's invaluable to have, particularly for understanding rationale or debugging issues.


There's kind of a vicious cycle here. History is often garbage, because nobody looks at it, and nobody looks at it because it's garbage.

I find one thousand line commits with changes all over the place evil, but have a hard time convincing people that atomic commits are worth the effort. You know, it's not Real Work™.


There are numerous comments showing why git history is useful, but I think nobody has mentioned this:

In GoLand/IDEA you can look at the git history of only the selected code. I use this constantly to see how the code has been previously modified before I make my own changes.


There's another case: reverting the codebase to an earlier version just to run it.
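
A tidy way to do that without disturbing your working copy is a throwaway worktree (the directory and tag here are made up):

  git worktree add ../app-v2.3 v2.3.0   # check out the old version elsewhere
  (cd ../app-v2.3 && make)              # build and run it there
  git worktree remove ../app-v2.3       # clean up when done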


Of course we do! Using blame/annotate, every team member can see who made the changes in question and why they did it. With a 9-year-old codebase of 400+ kLOC (not counting blank lines and comments), this perk is invaluable.


I often wonder:

- What is this trying to accomplish?

- Why are you doing it this way?

- Why not this other way?

Ideally these things would be answered in comments, but they often aren't. The commit message hopefully answers #1, and it links to the code review tool which may shed light on the others.


Yes, I have a tool called automigrate that uses git history to generate and apply DB migrations.

I also use history for git blame to understand when a change was introduced for debugging / intent purposes; this can go back months if not years


Besides the obvious benefits of git annotate and git log, the ability to do a git bisect to isolate the exact commit that caused a regression is invaluable. So, to answer your question, yes we do get immense value out of it.
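
For anyone who hasn't tried it, a session looks roughly like this (the tag and test script are made up; with `run` it's fully automatic):

  git bisect start
  git bisect bad HEAD             # current tip is broken
  git bisect good v1.4            # last release known to work
  git bisect run ./test.sh        # script exits 0 for good, non-zero for bad
  git bisect reset                # back to where you started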


Yes. Not frequently, but sometimes git bisect saves my day. That's why I don't allow rebase or squash in my repos – code's history is valuable information, and I don't want to lose it.


Git bisect solved a problem for me before. I had exhausted debugging, even println debugging; in the end I had to find out which commit introduced the bug. Once I found the change, it was very easy to fix.


Sure, history has definitely been useful for me. But I've been involved in a few version control system changes, and as long as the old system continues to be available for reading, cutovers are fine.


I most often use VCS history to figure out why some piece of code exists. This probably wouldn't be necessary if it were reasonably readable or properly commented by whoever wrote it.


Just recently I found a 7-year-old regression in our codebase and could pinpoint the exact commit where it was introduced. We use Mercurial; I find hg grep and bisect really useful.
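
The Mercurial equivalents, for reference (the pattern and revision are made up; --diff needs a reasonably recent hg, it replaced the old --all flag):

  hg grep --diff 'frob_limit'   # revisions that added or removed matching lines
  hg bisect --reset
  hg bisect --bad .
  hg bisect --good 1.2          # then test and mark revisions until it converges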


I have a tool I built that will link any line directly to the pull request it was introduced in. That has helped me so much with "why the hell is this like this?"
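
In case anyone wants to approximate this with plain git: assuming a merge-based (GitHub-style PR) workflow, the oldest merge on the ancestry path from the blamed commit to master is usually the PR's merge commit. The file, line, and branch names here are hypothetical:

  commit=$(git blame -L 42,42 --porcelain src/app.c | head -1 | awk '{print $1}')
  git log --merges --ancestry-path --oneline "$commit"..origin/master | tail -1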


Absolutely. It's shocking to me that any experienced software developer could even pose this question.

The sequence of diffs is much, much more informative than the current state of the software.


Yes, nearly every day. And the overhead of migration is nearly fixed no matter the size of the history; the expensive part is writing the script to do the export/import.


Practically daily. Every bugfix or feature starts with looking at the mistakes already engraved in the tree.

Though we usually stand on the shoulders of midgets, you still get a better view.


Just yesterday I had to provide dates on a proposal for different projects that I had worked on over the past 12 years.

Being able to pop into our SVN history made this a trivial ask.


Yes, absolutely. In VS Code I use the feature that shows git blame on the current line, which can be quite helpful in understanding code history and responsibility.


Heck yes! In fact, just a couple weeks ago, I used the commit history with git bisect to find the commit where a regression was introduced. Felt like wizardry!


I use `git log -S` very often to dig up the commit message explaining why a change happened.

A wonderful reason to create atomic commits with good commit messages.
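
For anyone unfamiliar, -S (the "pickaxe") finds commits whose diffs add or remove a given string; the identifier and path here are made up:

  git log -S 'max_retries' --oneline        # which commits touched this string
  git log -S 'max_retries' -p -- src/net/   # same, with diffs, scoped to a path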


I have never needed it so far, although I have it in case it is useful to me or someone else in the future.


No, nothing after 30-60 days. A very rare exception is always possible, but there's been nothing in the last 10 years.


Don’t ask the lawyers. They’ll want your document retention policy to apply to source code...


It's like a seat belt. You are only happy to have it when shit hits the fan. Lol.


Revision history is something you don't need... until you very badly need it.

Call it insurance.


At my current place they use SVN like they would use stones and sticks.

Most commit messages are only the Jira issue key and maybe its title, but almost never what they actually did or why. Frequently, they will have half a dozen commits with the same message (sometimes even unrelated commits, because they got a bit too lazy). Most Jira tasks don't have a description. If it's a new development, the documentation is generally elsewhere and the Jira task has no description at all. If it's a bug, it may have a screenshot attached, and it sometimes has an explanation, but generally the explanation is given verbally to the developer.

A handful of developers once heard The Architect say that it's preferable to submit one commit for each changed file rather than put two unrelated changes in the same commit, and so they do. They change 12 different files for a certain feature and make 12 separate commits, one file each, not always one after the other but sometimes dispersed through the day. One or two developers obsessively commit every single change they make: they write a couple of lines of code, commit it, try it, see it wasn't correct (there was a typo, it wasn't the correct field they needed, whatever), edit again, commit again, etc.

They have a certain backup process which stores a handful of XML log files from some processes; they store them by committing them to the SVN repo. A commit every hour, in the development branch.

They have a flow with two branches, trunk and development, and a 6-month release cycle... It sort of works this way:

Start (theoretical): People develop on development. Two (or two and a half) months before release, they make "the switch". Everybody commits whatever they are doing at the moment and stops for a day. They merge development into trunk, and then they all start working on trunk for the rest of the cycle until release.

In that final period, trunk is mostly "open" (more on this later) and people just commit to it and that's it. The development branch is abandoned and deleted. A new development branch is taken from trunk but is generally not used during this period.

When release time comes, trunk is tagged with the version. Everybody switches back to the new development branch and development is done there. But this is not what happens, because there's another period of maybe one or two months where trunk (the released version) has a number of (a) bugs, (b) stuff that was unfinished, and (c) smaller things which "well, we could do on trunk because it's just a small thing". So what happens is they go on working on trunk for that month or two, and only gradually do people start working on development.

Also, they don't really tag trunk at release time because it's not "done" yet. When the bug-hunting season is over (or when they are just tired of it), they tag and freeze trunk with the version and move it into storage. Nothing in this is really planned. They just decide one day and then tell people, who rush whatever they were doing on trunk and commit it, or abandon it and move to development.

During both pre-release and post-release periods, merges are done about once or twice a week from trunk to development. If you use SVN you'll know that these merges are seen as a single commit in the receiving branch. You can see the full history if you query the merge info, but it's not shown directly in the main "svn log".
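
(SVN can expand those collapsed merges, assuming mergeinfo was recorded properly; the branch paths below are just illustrative:

  svn log -g ^/branches/development   # -g / --use-merge-history expands merged revisions
  svn mergeinfo --show-revs merged ^/trunk ^/branches/development

but it's extra friction every single time.)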

All this means they have:

- About 40% automated commits from some backup process.

- Most changes happening in the other branch, so you need to go through mergeinfo several times.

- Main development branches deleted and created new every so often.

- Most people not explaining what they did in commit messages.

- About half of the bugs in Jira not describing the problem and almost all of the tasks not explaining the work to be done.

So... do we ever truly use the revision history?

Yes.

A few people (particularly Karen) use it to drop the blame on whoever they want. They get a bug, open the svn log for something related, see a name they don't like much, and say "Ok, just assign this to X, because they did something on that file 4 months ago".

I am using it, sometimes (with some effort and some success), to try to understand just where some heavily copy-pasted snippets come from, so that I can wipe them out for good. Also, sometimes I use it just to write in my diary and laugh a bit about it so I don't cry so much when I get up in the morning. This is probably the most valuable thing we get out of it, because it keeps me... well, insane, but at least not murderously insane.


No, but I find revisionist history to serve me quite well.


Of course. It usually goes like this:

- Why the heck is this here?!

- git blame, git log, git show

- Ahh...


There are tools which migrate history from SVN to git.


At my previous job (a consultancy) we provided a detailed bill that literally included the (slightly edited) git log for the month.


Postmortems.



