Related: Fossil SCM is a distributed source code manager (just like Git), but with the added benefit that documentation and issues also go in the repository (so it's like GitHub, but packed into the repository itself). When you clone a Fossil repository, you get everything, not just the code.
Just don't forget that Fossil does not support rebase/amend, on principle.
So if you want both "commit very often" and "have nice, readable commits with clear descriptions", it is not for you.
I read this document every once in a while, and I envy Fossil's author. He clearly is a much better programmer than I am. For example, he says things like:
> Git lets a developer write a feature in ten check-ins but collapse it down to an eleventh check-in and then deliberately push only that final collapsed check-in to the parent repo
> Fossil pushes all 11 check-ins to the parent repository by default, so that someone doing that bisect sees the complete check-in history, so the bisect will point them at the single original check-in that caused the problem.
And I think: Well, that works if you are drh. I know that if it were me, commit 3 would be broken because I forgot to run tests, commit 5 would not even compile because I made it just before going home, and the main app would not even work until commit 8 because I focused on the unit tests.
My non-final commits are messy. I don't want to inflict them on people.
This sums up my thoughts as well. Fossil breaks several flows:
- committing code at the end of the day
- making a temporary commit in order to check out another branch to work on something else, etc.
- committing code in order to pull onto another computer (admittedly a bit lazy, but I do this for small personal projects)
- fixing a commit that accidentally checked in binary files, secrets, etc. It happens!
I find myself switching to different branches frequently to look at something for a colleague. I would much rather make a temporary commit to rebase later than use git stash.
Reading about Fossil evokes a little bit of "you're holding the phone wrong", in my opinion. Git does lack things like an issue tracker, but I would rather not be limited by my own tooling for little gain.
I don’t really understand the issue with doing this. After a context switch it sometimes takes a while to get back to what you were working on, and I’d rather not rely on local git stash state to get back to it. And if I do end up working on something else for a while, someone else might pick it up, so they might as well have access to my WIP. Also, the cost of a CI run is really low; there’s no practical downside to pushing it and letting the tests run, so I have some more context about its status when I or someone else picks it back up.
Stash is not reliable because it's not pushed to the server. If your computer breaks or you just want to move to a different computer, your stash won't follow you, but a branch will.
I was responding to the comment that "making a temporary commit in order to checkout to another branch to work on things" is "a Git bug."
But sure, I'll grant you that it has issues. Heck, Git is distributed, so if you're off-network, committing a bunch of code, and your computer breaks, it doesn't matter if you're using stash or commit. It all depends upon your use case.
You're not OP, but if they, needing to switch to a different branch for a bit, "would much rather make a temporary commit to later rebase instead of using git stash", I'm curious why. I know I used to do that because I didn't understand how stash worked; then I read up on it, noted the commands in my notes (since I don't need them daily), and now refresh my memory from those notes when I need to use it.
But I also prefer to do smaller commits, more often, and find history to be important. Likely because I do a lot of brownfield development and maintenance. And if I need to switch to another branch to help a colleague, and my code is such that I can't commit it yet, I'll either git clone to a new directory (if they're a new employee or someone just learning a project, and I know I'm going to be helping them a lot) or leverage git stash.
The main reason I prefer commits is because I frequently switch between computers, so I push up those commits. I also write detailed commit messages so I can come back a week or a month later.
Admittedly I haven't read that far into git stash's documentation, but commits + rebasing is far simpler in terms of cognitive load and doesn't require me to remember what I have in my stash on what computer, etc. All I need to do is push up the branch, and I can just rebase master onto the branch to pull in updates, etc.
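The temporary-commit flow described above can be sketched like this. This is a minimal demo in a throwaway repo; all branch, file, and identity names are hypothetical:

```shell
# Demo of "temporary commit instead of stash", in a throwaway repo.
set -e
cd "$(mktemp -d)"
G() { git -c user.email=demo@example.com -c user.name=demo "$@"; }
git init -q .
git checkout -q -b main
G commit -q --allow-empty -m "initial"
git checkout -q -b feature
echo "half-finished change" > work.txt
G add work.txt
G commit -q -m "WIP: snapshot before switching branches"
# (push the branch here if you want it to survive a dead laptop -- unlike stash)
git checkout -q main        # go look at the colleague's problem...
git checkout -q feature     # ...then come back
git reset -q --soft HEAD~1  # drop the WIP commit; the changes stay staged
git status --short          # work.txt is back, staged as a new file
```

Unlike a stash entry, the WIP commit can be pushed, so it follows you across machines.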
I've been working primarily in public, shared repositories for almost 25 years with open source projects that have used cvs, svn, and git. A big step is to let go of the fear that you might be embarrassed by what you commit. That is partly about managing your own anxieties and partly about developing a culture that supports people and recognizes nobody is perfect. Still, there is a real part where you learn from mistakes and get better at making low-embarrassment commits over time. A team has to expect this when onboarding newer participants and mentor and support appropriately.
A second, equally important step is to adopt tools and methods which allow the appropriate amount of build and test for your team. If you insist on full continuous-integration and an always-working master branch, you had better make sure people can run build-and-test cycles from a working copy, prior to any commit. On the other hand, if you have focused on automated build-and-test being driven by commits, you had better adopt an appropriate branch and merge strategy to tolerate the fact that untested things are going to be committed as a regular event!
With git, I think the disconnect people have is that you should not be rebasing or amending history that you have already shared. Someone following along in a shared repo should never have the rug pulled out from them, putting their own tools into a broken state. I would much rather see an uglier commit history than feel like I am being gaslit by transient errors and here-today, gone-tomorrow commits. Depending on your exact tools, sharing might mean pushes to the central repo or everything in your own self-hosted repo that you allow others to see, even read-only.
If using a sharing model like github, that means you could feel free to rebase or amend only the batch of changes since your last push to upstream. Nobody looks at your own local clone of the repo. Every push should monotonically move the history forward in a way that all observers will see the same way. I wouldn't care if you amended commits before you pushed them, since I could never see the earlier drafts. I also would not care if there are some buggy commits in your personal working branch, as long as you address them with further commits before attempting a pull request or other merge to a shared integration branch. But, I should never be able to pull twice in any available branch and face a bizarre merge-conflict because you amended something in between my pulls.
Also, many teams can benefit from earlier sharing to avoid different private efforts from diverging too far and creating larger integration headaches or dead ends. Seeing individual commits that expose developer thought process can also help with communicating new ideas or identifying potential problems. Large rebases can interfere with this. Some organizations also will find it easier to do disaster mitigation if they know that all work products are pushed regularly (e.g. daily) to a shared repository, rather than having to worry about many distributed repos holding sole copies of new work. Such shared, periodic snapshots put an upper bound on how much work product you might be able to amend/rebase without sharing.
This is not about "embarrassment"; I was talking specifically about bisect. Bisect needs every commit to be usable, and this, in turn, needs either mutable history, superhuman programmers, or manual file copies. So do many other things -- for example, cherry-picking is almost impossible if a feature is spread over tens of commits.
I agree about not changing published history. But you don't need Fossil for that -- GitLab, GitHub and even vanilla git all have ways to make the master branch "fast-forward-only", which eliminates this problem. And if you are worried about "here-today, gone-tomorrow commits", git has a per-branch flag to prevent rebases and warn you loudly about them.
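For instance, vanilla git can enforce "history only moves forward" on the server side via receive options (sketched here on a throwaway bare repo; on GitHub/GitLab the equivalent lives in branch protection settings):

```shell
# Server-side "history is append-only" enforcement in plain git.
set -e
cd "$(mktemp -d)"
git init -q --bare central.git
git -C central.git config receive.denyNonFastForwards true  # reject force-pushes
git -C central.git config receive.denyDeletes true          # refuse branch deletion
git -C central.git config --get receive.denyNonFastForwards # prints "true"
```

With these set, any push that would rewrite or delete published history is rejected at the central repo, regardless of what contributors do locally.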
I agree that working in the open is good, but again, there is no need for Fossil for that -- that's what "sandbox" branches are for. As long as you don't push incomplete work to master, you can share it easily.
It would be cool to have something that keeps the individual commit history and, without removing or changing anything, adds a default aggregate view on top of it.
I absolutely agree. I always feel like commits are trying to serve two masters:
- Describe the immutable history of work done on the project by individual programmers. ("Then I fixed X but broke Y", "then I fixed Y", "then I committed before going home")
- Describe the semantic history of features / bug fixes. ("Merge feature X", "Fix bug Y", "Refactor feature Z")
Both of these logs of history are valuable, and git commits conflate both of these ideas to the detriment of users. The natural affordances of git seem to push you toward seeing commits as simply logging history, but having a history of features and fixes is more useful (because of git bisect, because it makes git log more useful, and so on). But throwing away all that information about what actually happened seems unnecessarily lossy to me. It's like we're trying to paint a revisionist history of our work. And retroactively modifying commits with rebase and so on works against a lot of git's other design assumptions (like commits being content-addressable).
I wish there were a way of flagging commits as "oh, and by the way, this commit isn't just me committing for convenience. This commit merges feature X into the project. We've run all the unit tests. If you run git bisect, use this sha but not the others. Whitelist this sha in git log by default."
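Git's merge commits plus the `--first-parent` option get part of the way there: if every feature lands as an explicit merge, the mainline view hides the messy intermediate commits. A small demo in a throwaway repo (branch names and messages are hypothetical):

```shell
set -e
cd "$(mktemp -d)"
G() { git -c user.email=demo@example.com -c user.name=demo "$@"; }
git init -q .
git checkout -q -b main
G commit -q --allow-empty -m "initial"
git checkout -q -b feature-x
G commit -q --allow-empty -m "WIP: broke Y"
G commit -q --allow-empty -m "WIP: fixed Y"
git checkout -q main
G merge -q --no-ff feature-x -m "Merge feature X (all tests pass)"
git log --oneline --first-parent   # mainline only: the merge and "initial"
git log --oneline | wc -l          # full history still keeps all 4 commits
```

Recent git (2.29+) also accepts `git bisect start --first-parent`, which bisects over only these mainline commits while the WIP history stays available for archaeology.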
Though I personally rebase and rewrite history fairly frequently, so that each step is friendlier to reviewers and works independently.
Couldn't you make a new history branch, copy/paste as needed from the original one, and just abandon the old history?
That's effectively what git does under the hood anyway - all your shas are changed, but the old ones are left around. If you haven't named (branch/tag) them, you just don't see them normally, and they may eventually get garbage collected.
Well, I need somewhere to store temporary files. Right now, for example, I am in the middle of prototyping major new functionality. It is huge, it is (still) ugly, and it touches half a dozen major subsystems.
It is a huge blob of changes. When it's ready, I'll have to split it into maybe 6 to 8 commits with well-written commit messages. But I cannot do this yet -- for all I know, I'll find out my design was bad and have to change those things tomorrow.
So the question is: what do I do with it?
In the old days, I had a VCS which could not change history (SVN). So I'd keep multiple worktrees and back them up manually ("project-2009-12-03.tar.gz", "file1.cc.before-refactor", and so on). The data would sometimes be lost. It was generally a pain.
Now, I have a git branch. It has commit messages like "fix crash in Foo()", "what if we used trie?", "oops forgot to update Bar to match" and so on. They should never see the light of day, and they won't -- they'll get re-made into different commits later. And git will help me to make sure the changes are not lost.
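When the time comes, commits like those get squashed and reworded with an interactive rebase. Here is a non-interactive demo in a throwaway repo: `GIT_SEQUENCE_EDITOR` scripts the todo list you would normally edit by hand, and all names and messages are hypothetical:

```shell
set -e
cd "$(mktemp -d)"
G() { git -c user.email=demo@example.com -c user.name=demo "$@"; }
git init -q .
git checkout -q -b master
echo "base" > app.txt; G add app.txt; G commit -q -m "initial"
git checkout -q -b wip
echo "v1" > foo.txt; G add foo.txt; G commit -q -m "fix crash in Foo()"
echo "v2" > foo.txt; G commit -qam "oops forgot to update Bar to match"
# Squash commit 2 into commit 1, then give the result a real message:
export GIT_SEQUENCE_EDITOR='sed -i "2s/^pick/fixup/"'
G rebase -qi master
unset GIT_SEQUENCE_EDITOR
G commit -q --amend -m "Foo: handle empty input without crashing"
git log --oneline master..wip   # one presentable commit instead of two WIPs
```

The messy intermediate states are preserved by git's reflog for a while even after the rewrite, so nothing is lost locally until you garbage-collect.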
It looks like with Fossil, I'd be going back to multiple worktrees and manual backups. I don't understand why one would want to do that.
With Fossil you could create many private branches off your feature branch with your experiments, and merge them into your feature branch when you're satisfied with that experiment.
That's the thing -- I'll be making 6-8 different branches out of that experimental branch.
In rebase world, I have a cycle of "cherry-pick a few changes, get them into master, rebase experimental so it gets smaller; repeat until nothing is left of experimental"
In permanent world, I guess I have two worktrees and copy files one-by-one, manually?
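The rebase-world cycle, as a minimal demo in a throwaway repo (branch, file, and message names are hypothetical). The key detail is that after the cherry-pick lands on master, rebasing the experimental branch skips the now-duplicated commit automatically, so experimental really does shrink:

```shell
set -e
cd "$(mktemp -d)"
G() { git -c user.email=demo@example.com -c user.name=demo "$@"; }
git init -q .
git checkout -q -b master
echo "base" > app.txt; G add app.txt; G commit -q -m "initial"
git checkout -q -b experimental
echo "finished piece" > lib.txt; G add lib.txt; G commit -q -m "lib: extract helper (ready)"
echo "messy wip" > wip.txt;      G add wip.txt; G commit -q -m "what if we used trie?"
sha=$(git log --format=%H --grep="ready" experimental)
git checkout -q master
G cherry-pick "$sha" >/dev/null          # land the finished piece on master
git checkout -q experimental
G rebase -q master                       # the already-landed commit is dropped
git log --oneline master..experimental   # only the messy WIP commit remains
```

Repeat until `master..experimental` is empty and the experimental branch can be deleted.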
It seems like both "an immutable view of what happened" and "a crafted narrative of feature changes, bug fixes and releases" are useful. It's ridiculous that I'm expected to pick one. Git and Fossil should both raise their game.
(Yes; git's tooling supports either approach, but not both at the same time. A better tool would let me avoid making this tradeoff at all.)
You can. That is certainly a possible axiom as evidenced by fossil and some others, but as evidenced by git and common usage it is not necessarily "axiom 0" for all possible VCS's.
Unless you want to get really semantic in which case, sure "history" can't be changed, but "recorded history" certainly can.
I honestly think that "commit very often" is overrated. It really ought to be "save and back up your files very often."
When I implement something, I don't bother making commits until I have completed the feature. I then take the diff against the existing code, selectively stage parts of it, and make commits out of them.
If someone else wants to see what was done so far, I just send them the diff as it is.
git is really meant to show a meaningful log of changes to the codebase. If one wanted to use it as a backup, there's no reason why one couldn't configure their editor to just run git commit -a -m "<current timestamp>" every time they saved, but that wouldn't result in a history that's useful in terms of being able to tell what actually changed and why.
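For illustration, such an on-save hook would boil down to running a command like the one below (demoed in a throwaway repo; wiring it into an editor's save event is editor-specific and not shown, and all names are hypothetical):

```shell
set -e
cd "$(mktemp -d)"
G() { git -c user.email=demo@example.com -c user.name=demo "$@"; }
git init -q .
echo "draft" > notes.txt
G add notes.txt
# the hypothetical on-save hook:
G commit -q -m "autosave $(date -u +%Y-%m-%dT%H:%M:%SZ)"
git log -1 --format=%s    # e.g. "autosave 2020-03-01T12:00:00Z"
```

As the comment above says, the resulting log is a backup trail, not a readable history.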
I worked in online IDEs which did automatic backups, and it was useless. Those timestamps didn’t mean anything to me, and the diff interface was very bad.
But semi-automatic backups, a.k.a. frequent git commits, are very helpful. For me, three-word commit messages like “no longer segfaults” or “will try sets” are exactly what I need: they easily let me find out exactly what I did if it starts segfaulting again, or revert some code if it turns out the sets were not a great idea.
Of course no one else, including me, would care about this once the feature is done, and that’s where rebase comes in.
> For me, three-word commit messages like “no longer segfaults” or “will try sets” are exactly what I need: they easily let me find out exactly what I did if it starts segfaulting again, or revert some code if it turns out the sets were not a great idea.
If you're going to end up squashing all those in progress commits down to a single commit, then that will work. If you're going to try to make several logical commits, then it becomes a bit more difficult (unless you just take the diff from the original branch and start over on a new branch).
I actually find the multi-level undo feature in my editor sufficient for trying out small changes to see if they work or not. That is, if I get to a point where I can't figure out what I messed up, I can simply run something like :earlier 5m to go back to the state from 5 minutes ago and get back to a working state.
Not just the Wiki and Issues: other things like release artifacts can also (optionally) be synced and are part of the repository (as "unversioned content"). There's also a Forum included, and as with the rest, those messages are distributed.
Something very nice is that the issue tracker is nothing but an SQLite database. You can customize it any way you like, run arbitrary SQL queries on it, and generate any HTML you want. I use it for notes, among other things. In addition, you can extend it in any language using a CGI program.
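For example, with the repository in hand you can hit the ticket table with ordinary SQL (via `fossil sql`, or sqlite3 directly). Below is a toy stand-in: a throwaway SQLite database with a table shaped roughly like Fossil's stock ticket schema -- the column names here are illustrative assumptions, not taken from a real repo:

```shell
set -e
db="$(mktemp -d)/toy-tickets.db"
# Toy table loosely modeled on Fossil's default "ticket" table:
sqlite3 "$db" "
  CREATE TABLE ticket (tkt_id INTEGER PRIMARY KEY, title TEXT, status TEXT);
  INSERT INTO ticket (title, status) VALUES
    ('crash on startup', 'Open'),
    ('typo in docs',     'Closed');
"
# The kind of ad-hoc query you could run against a real repo with 'fossil sql':
sqlite3 "$db" "SELECT title FROM ticket WHERE status = 'Open';"
```

Because it is just a database, the same query can feed a custom report page, a script, or a notes workflow.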
I've been using this since I asked HN why BugsEverywhere failed (https://news.ycombinator.com/item?id=20963039) about six months ago and have to say it's far superior to BE. So far, I haven't found a case where the synchronization to Github didn't work as expected. I also have sensitive repositories that are hosted on Keybase and, since it doesn't include integrated bug tracking, we use Git-bug as a distributed team.
I think BugsEverywhere might have failed because of suboptimal design and lack of features. I have not used BE or git-bug before, but from looking at the doc pages, git-bug looks way superior.
In particular, it looks like, by default, BE stores bugs in the main repo. Which probably means every comment is a commit, which puts lots of useless entries into the main commit log.
(They do mention workflows that store bugs in separate branches, but this seems to be an undocumented, non-default option.)
Ironically, I am also not sure how well BE works in a distributed fashion. git-bug has a nice section on how they implemented CRDTs using git, so everyone can comment at once and there are no conflicts.
BE, on the other hand, seems to have no mention of such a thing. In fact, I think they assume a single central bug repository per project, on some server which handles email submission (how else could one implement a Debian-bugtracker-style email interface?).
> Bugs Everywhere (BE) is a bugtracker built on distributed revision control. The idea is to package the bug information with the source code, so that developers working on the code can make appropriate changes to the bug repository as they go. For example, by marking a bug as “fixed” and applying the fixing changes in the same commit. This makes it easy to see what’s been going on in a particular branch and helps keep the bug repository in sync with the code.
You have snipped the first part of my message. This is what I said:
> git-bug has a nice section on how they implemented CRDTs using git, so everyone can comment at once and there are no conflicts. BE, on the other hand, seems to have no mention of such a thing. In fact, I think they assume a single central bug repository per project, on some server which handles email submission (how else could one implement a Debian-bugtracker-style email interface?).
So, here is git-bug talking about multiple submissions: [0]. It talks about how to order messages and resolve conflicts in a decentralized fashion.
Does BE have anything like this? What happens if two people try to claim a bug or change the severity to different values? The manual certainly does not say. When I looked at the bug tracker for BE itself, it looks like they don't have any special handling of this, and concurrent PRs will conflict; for comments, they just use timestamps and don't try to maintain a consistent ordering.
In fact, I think the very next paragraph in your link nicely illustrates how "decentralized" they are. This is what they say:
> If you have any problems with BE, you can look for matching bugs:
> $ be --repo http://bugs.bugseverywhere.org/ list
You know what this URL is? It is the primary BE bug server. It is not using git, hg, or any other decentralized protocol -- it is just good old centralized file storage.
Even their model -- storing bugs in the main branch -- prevents decentralization, as the ability to file bugs (usually granted to everyone) requires the ability to commit to master (usually restricted).
So yes, they claim things on the page, but they don't expect truly decentralized operation, just a one-way mirror.
> So, here is git-bug talking about multiple submissions: [0]. It talks about how to order messages and resolve conflicts in a decentralized fashion.
This is the part that seems to sum that whole page up:
> When pulling bug updates from a remote, we will simply add our new operations (that is, new Commits), if any, at the end of the chain.
Which indicates it's just changing a value without any indication to you that you changed what someone else wrote.
I'm not so arrogant as to claim that is what happens just from reading some text on a page, so I'd love for someone more experienced with this tool to clarify how a "multiple edits of the same attribute" event flow plays out.
> Does BE have anything like this? What happens if two people try to claim a bug or change the severity to different values?
They're just files in the repo, so conflicting changes behave exactly the same way changes in git or hg or whatever do: you have to resolve the conflict.
> In fact, I think the very next paragraph in your link nicely illustrates how "decentralized" they are. This is what they say:
> If you have any problems with BE, you can look for matching bugs:
> $ be --repo http://bugs.bugseverywhere.org/ list
> You know what this URL is? It is the primary BE bug server. It is not using git, hg, or any other decentralized protocol -- it is just good old centralized file storage.
BE has the ability to query a remote bug repo without the need for cloning the entire git/hg/whatever-vcs/no-vcs repo it's hosted in. That's what you're seeing there. But hey good for you on a ridiculous rant based on misunderstanding something. Have a gold star.
> Even their model -- storing bugs in the main branch
... bugs are stored in a directory, and versioned like regular files. The entire point being that as you make a change to the code in a branch, when finished you can also close the bug in that branch. When merged, that bug is now closed.
I don't know whether you just didn't read anything about BE, or you're deliberately making shit up to troll, but your claims are wildly inaccurate.
I won't comment on BE because I simply don't know enough about it, but I'll try to clarify some things.
Instead of storing bug state, git-bug stores the intent of the users in a chain of commits. When read, those intents are interpreted and compiled into the state of the bug at that point. When concurrent edits happen and a merge is needed, the concurrent operations are reordered as well as possible and a new state is computed. This state might not be the perfect state, but:
- it is guaranteed that the bug is in a legal state (no actual merge conflict making the data unreadable)
- even if the reordering is not the one you would have wanted, the intent of the user is preserved. It is clear who did what, and you can "fix" the state manually by adding more operations if needed.
I'm not 100% sure that it's the optimal solution, but it works. It might break in some full P2P scenarios. To be honest, I'd like to eventually move to a full DAG of operations instead of this purely linear chain, so conflicts can be merged in a more optimal way. See [1] and [2] for details.
What I'm sure of, though, is that storing the state directly in git, and especially alongside the code, is the wrong way to do it. It leads to unreadable and badly merged state, and has been stated as a reason for failure by at least one similar project.
> The entire point being that as you make a change to the code in a branch, when finished you can also close the bug in that branch. When merged, that bug is now closed.
It's a nice feature, but implementing it that way implies the shortcomings described above. I believe the best way to implement it is to have git-bug be "aware" of branches and react to those changes, either changing the state or recomputing it on the fly, so the conflict-resolution characteristics can be preserved. See [3].
"git" is in the name, but it's actually not that tied to it. If you can implement these interfaces [1], you can port it to another VCS and everything else will work. It's not trivial, but entirely possible.
And perhaps ditz¹ [+ pyditz²], BugsEverywhere³, git-issue⁴.
As someone who -- I guess obviously -- cares about distributed issue tracking, I really like the direction and implementation of git-bug. The termui subcommand is really cool, and it appears to be enough to push co-workers to try it when they normally push back on "weird" tools.
I think the closest we had to user friendly before was the read-only web interfaces in ditz and BE, but user friendly isn't all that friendly if it is read-only.
http://fossil-scm.org/
Also been discussed on HN before, a selection:
- Fossil VS Git (https://fossil-scm.org/fossil/doc/trunk/www/fossil-v-git.wik...) - https://news.ycombinator.com/item?id=19006036
- https://news.ycombinator.com/item?id=12673229
I haven't personally used it for more than toy projects, but it looks very interesting and I would love to use it more.