The other day I was trying to work with Git LFS. I was very surprised to find out that git-lfs, the CLI binary, is the only (open) implementation in existence. There is nothing else. And even it does not offer itself up as a library, so even native Go code (its implementation language) has to fall back to shelling out to the CLI git extension! Not even bindings are possible. Such a painful loss of interoperability: IPC via return codes and parsing stdout/stderr.
It seems a similar story with the rest of git. I have hopes for gitoxide aka gix, and think the approach of library-first is correct going into the future. A CLI is then simply a thin wrapper around it, mapping argv to library operations basically.
> It seems a similar story with the rest of git. I have hopes for gitoxide aka gix, and think the approach of library-first is correct going into the future. A CLI is then simply a thin wrapper around it, mapping argv to library operations basically.
It's worth noting that there is currently a push to "lib-ify" git internals, and it's a gradual process. I'm not sure how much of this work has actually made it into the tree yet, but I've been seeing patchsets toward that goal on the mailing list since at least January.
Dulwich[1] is a pure-Python Git implementation that's been around for many years, meant to be used as a library. I used it a long time ago to make a git-backed wiki. There's also libgit2, which is exactly what it sounds like, and it has mature Go bindings[2]. I'm sure there are more implementations.
I always respected the fact that the authors of Subversion, right from the start, structured their software as a library, with the CLI being a user of that library.
The way IDEs and GUIs interacted with CVS was to shell out to the CLI, which inevitably had problems with filenames containing spaces, parsing of error messages, etc. Subversion understood in 2000 that things were changing, and that the CLI was only one way you'd use a VCS. People were more and more interacting with the VCS via IDEs, or via right-click menus in Windows Explorer, etc.
I felt happy knowing I'd never again have to deal with VCSs via tools just shelling out to their CLI. How wrong I was...
JetBrains forgot this memo: their IDEs require you to configure svn.exe in order for Subversion support to work. And usually TortoiseSVN is the way to go for Subversion...
Isomorphic Git is a Git implementation purely in JS (no WASM). I wrote a minimal library to handle LFS with it, it’s not that hard, the spec is pretty small.
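The LFS pointer spec really is tiny: a pointer file is just a few "key value" lines (version, oid, size). Here's a minimal sketch in Go (the language most of this thread is about) of parsing one; the type and function names are my own invention, not from any existing library.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// LFSPointer holds the fields of a Git LFS pointer file.
type LFSPointer struct {
	Version string
	Oid     string // e.g. "sha256:<hex digest>"
	Size    int64
}

// parseLFSPointer parses the small key-value format defined by the
// LFS pointer spec: one "key value" pair per line, "version" first.
func parseLFSPointer(data string) (LFSPointer, error) {
	var p LFSPointer
	for _, line := range strings.Split(strings.TrimSpace(data), "\n") {
		key, value, ok := strings.Cut(line, " ")
		if !ok {
			return p, fmt.Errorf("malformed pointer line: %q", line)
		}
		switch key {
		case "version":
			p.Version = value
		case "oid":
			p.Oid = value
		case "size":
			n, err := strconv.ParseInt(value, 10, 64)
			if err != nil {
				return p, fmt.Errorf("bad size: %w", err)
			}
			p.Size = n
		}
	}
	if p.Version == "" || p.Oid == "" {
		return p, fmt.Errorf("missing required pointer fields")
	}
	return p, nil
}

func main() {
	pointer := "version https://git-lfs.github.com/spec/v1\n" +
		"oid sha256:deadbeef\n" +
		"size 128\n"
	p, err := parseLFSPointer(pointer)
	if err != nil {
		panic(err)
	}
	fmt.Println(p.Oid, p.Size) // prints "sha256:deadbeef 128"
}
```

The real client also has to download/upload blobs over the LFS batch API, but the pointer files themselves are this simple.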
I'm guessing you're being sarcastic. If so, it's not really a fair criticism. Go has properly-typed return values (not just parsing text from stdout) and any type can implement the error interface, allowing for error types with any custom, properly-typed attributes. You use Go's type assertion or errors.Is / errors.As to handle the latter.
Maybe it's my job, but I don't see a lot of Go code treating errors as anything but strings. Errors I get from Go programs are similarly vague and redundant and lacking context. ("error from RPC: failure making from call: couldn't complete operation: 7 is not a valid flag" ah thanks, so much better than stack trace)
Yes, but don’t you value the fact that this error message was painstakingly crafted by hand under the care of skilled golang artisans instead of being mass-produced by an automated exception handler?
Yeah, because that's the error interface? It's literally a method that returns a string.
Unfortunately, I guess they decided after the fact that pattern matching was useful, so they did it via reflection (errors.As). And they don't warn you that errors.Is might return false where errors.As would match. It is certainly a mess, and the pedagogy could be improved.
I'd like to join the weekend bitching. I do think Go's errors are often shitty. Stack traces are easy to get with [0], and you can always do fmt.Errorf("%s", debug.Stack()) to get one into an error. I've not profiled them in Go, but in native code they're horribly inefficient to gather, so I'd bet you don't want to be using them unless you really need to.
That said, other applications have exceptions and people still manage to write functions that return a Boolean to indicate success, and log an equally unhelpful message rather than returning anything I can work with. Maybe we need new programmers.
You jump to a critical interpretation instead of allowing for, for instance, "It's the weekend, when I'm off the clock and don't police myself so much"? Interesting.
I think that's a fair critique. You have to really put thought into good error messages and error handling. When you do, they're nicer and more informative than an ugly stack trace. But a poor error message chain like the one you've shown is significantly less helpful than a stack trace.
The problem with the IPC approach isn't error handling; in the error case a good string is usually fine. The problem is parsing the output of the subprocess. A library can return structured data, whereas parsing always carries the risk of format changes or quickly gets very heavyweight, both in the syntax of the output and in the parsing routines. SVN sets another good example here: its commands usually have an "--xml" option that produces structured output in XML format, so you can use standard parsers to consume it robustly.
This is also one thing I like about powershell, that commands can produce and pass objects instead of just writing to stdout.
I don’t know what that is, but their docs very prominently and strongly say this:
> However, we do not maintain a stable Go language API or ABI, as Git LFS is intended to be used solely as a compiled binary utility. Please do not import the git-lfs module into other Go code and do not rely on it as a source code dependency.
I made Kubernetes into a library once. Like the apiserver and default controllers, running in-process. Copying and pasting were involved. I was not expecting support.
If code is out there under a compatible license, you can do whatever you want with it. If it breaks, you get to keep both pieces.
That you technically have the ability to call into an internal module does not in any way constitute it "offering itself up as a library", and doesn't make it effectively useful in that way.
A library and a module are the same. Having an open and available module does not make it "offered up as a library," was the point I was trying (and failing, evidently), to make.
You made the point perfectly. So did the docs in the first place.
It's not on you that someone decides not to agree the sky is blue because they use their own definition of blue.
I have been wanting something like this, but with a few more features such as "git diff". I took a crack at it, but the popular (and maybe only) Go Git implementation has some issues:
In my opinion github.com/go-git/go-git is a very high-quality project. Just because it doesn't solve some super-specific use-case that you have, doesn't mean the project isn't good. It's open source, have you tried opening a pull request to solve your own issue?
Sorry to hear that, but go-git is good enough to be useful to Vault, Pulumi, kubernetes/test-infra, and many other projects which directly import it.
Some folks seem to expect it to be a feature-complete library-usable Go re-implementation of the entirety of Git, despite it being (afaik) largely unfunded / volunteer-driven in recent years. I don't think that's realistic. And yes in many use-cases it may indeed make more sense to shell out to `git`, there's nothing wrong with that.
In any case, in my direct experience, the go-git maintainers do happily accept helpful PRs.
It just seems really weird to me to see folks trash go-git, while simultaneously cheering this new gogit project which is just 400 lines and intentionally by design/scope contains only a tiny fraction of go-git's functionality.
> go-git aims to be fully compatible with git, all the porcelain operations are implemented to work exactly as git does.
It literally says this on the tin, yes. But it's not. I'm simply providing my own experience and a disclaimer.
Sure, they "happily accept helpful PRs", but the code is so complex, and so many paths have such horrible performance, that it's an incredible uphill battle. To be clear, I'm not simply shitting on this library; I'm saying it's not a reasonable choice over just using git directly, and is unlikely to ever be.
I haven't made any comment on TFA so while I'm not cheering it on, I do think the spirit is correct: just do the thing you need to do, don't try to be feature complete, just solve the task at hand.
I interpret "aims to be fully compatible" as meaning the operations it implements are intended to be compatible with how Git implements those operations. I do not interpret this statement as saying they implement all features of Git.
The godoc also says right upfront it "nowadays covers the majority of the plumbing read operations and some of the main write operations, but lacks the main porcelain operations such as merges." - https://pkg.go.dev/github.com/go-git/go-git/v5#pkg-overview
> I'm saying it's not a reasonable choice over just using git directly, and is unlikely to ever be.
OK, that's apparently true for your use-case. But again, what go-git implements is directly useful to a number of very popular projects, as well as literally two thousand less popular ones.
I find the exported functionality to be high quality, at least for my own use-case. I'm not commenting on the code quality. If I need a shed for bikes, and someone is giving out free but ugly bikesheds, I'm thankful. I don't complain about the color of the bikeshed.
It's useful for toy projects or narrow stateless use cases, like testing or fetching something one time, like in Terraform, but as soon as you have to deal with real world scenarios related to actual distributed version control, the wheels completely fall off.
Can you use it to implement a git remote? No. What about to write some commits? Also no. What about a read-only client? Nope.
It only works well if you have a narrow use case, and only works predictably if you are in full control of the repository being interacted with. Otherwise it is simply going to cause more pain than just using git directly. Look at the open issues, the open issues related to it in projects that depend on it.
You clearly have a horse in this race and that's fine, if it works for you, great. But I don't recommend it. And no amount of lobbying on your part is going to alter that reality. If you're doing serious things, use git directly, and if you're not, it's probably simpler to write it yourself.
And lastly, the exported functionality is not high quality. It performs poorly in many scenarios where shelling out to git does not, and it breaks with any sort of complicated set up.
Not really. It's littered with interfaces, in some cases many levels deep, which in Go is an anti-pattern. The original author clearly came from Java or some other deeply OOP language. Also:
If I understand your issue correctly, it's `git diff --cached` that you're specifically looking for, not just `git diff` in general?
Your expectation seems to be that someone has already implemented this in Go for you, for free, but this is not the case. Why is this your expectation, and what does complaining about it accomplish?
> If I understand your issue correctly, it's `git diff --cached` that you're specifically looking for, not just `git diff` in general?
Incorrect. I am simply looking for a normal "git diff", which compares the index with the working directory. It's shocking this is not available out of the box, hence my issue.
object.Tree has a Diff() method. You can get an object.Tree of any commit hash from a Repository with its TreeObject() method. I don't recall offhand how to get an object.Tree of the working directory or index (perhaps from the Repository.Storer?) but worst-case you could just create a new commit in order to get a hash and then run the diff.
The package is more read-oriented than write-oriented; the docs specifically say it "covers the majority of the plumbing read operations and some of the main write operations". If you're trying to diff working directory modifications, that's a write-path use-case since it implies files are being changed / you're not trying to diff two pre-existing commits.
(edit: removed some text based on an initial misreading of your statement.)
> You can get an object.Tree of any commit hash from a Repository with its TreeObject() method.
OK, but I am not dealing with a commit, as mentioned in the issue and my previous comment, I am dealing with THE INDEX and working directory, not a commit.
> I don't recall offhand how to get an object.Tree of the working directory or index (perhaps from the Repository.Storer?)
Right, so you can see it's not as easy to hand-wave away the problem as you initially thought.
> worst-case you could just create a new commit in order to get a hash and then run the diff.
no, I am not making a commit just to diff the working directory. would you tell someone to do that with the command line tool as well?
> The package is more read-oriented than write-oriented; the docs specifically say it "covers the majority of the plumbing read operations and some of the main write operations".
cool, we are talking about a read operation, no writing is being done.
> If you're trying to diff working directory modifications, that's a write-path use-case since it implies files are being changed / you're not trying to diff two pre-existing commits.
No, it's not. I am not writing to anything, only reading.
> no, I am not making a commit just to diff the working directory.
OK, you do you. I say it depends on the situation: if this is a throwaway clone (especially an in-memory one), creating a commit is harmless. I mean it's certainly not an ideal solution, but at the end of the day it solves the problem at hand.
> would you tell someone to do that with the command line tool as well?
It's not my library, I didn't design it, I'm just trying to provide a solution to the problem you posed. If you don't like that solution then, well, OK? Shell out to `git` and call it a day, or write your own `git` implementation from scratch, or send go-git a PR. Any of these would be more productive than complaining about how a free open source library doesn't provide a solution to your use-case.
> I am not writing to anything
Clearly you've written to the files in the working directory, otherwise your diff would be blank.
Again, it's a read-path oriented library. If you're writing to working directory files, your use-case may not be aligned with that of the package authors.
> creating a commit is harmless. I mean it's certainly not an ideal solution, but at the end of the day it solves the problem at hand.
You're making my arguments for me here. You're essentially saying the library is so poorly designed that the "easiest" solution is to create a commit, rather than actually just diffing the worktree directly.
> Shell out to `git` and call it a day, or write your own `git` implementation for scratch
Again, making my arguments for me. You're essentially saying the library is so poor that it can't support simple use cases, and it would be easier to shell out to Git than to write a program that diffs the worktree.
> a solution to your use-case.
I would say it's a solution for essentially every command-line user. How many people DON'T use git diff?
> Clearly you've written to the files in the working directory, otherwise your diff would be blank.
The diff is not writing anything. The issue is not "how do I write to a file"; that can already be done with the standard library.
> If you're writing to working directory files
Writing to files is done outside the scope of the Git package. The Git package is only needed to handle diffs of changes that were made with some other tool.
> your use-case may not be aligned with that of the package authors.
I didn't say that was the "easiest" solution, or ever imply it. Don't put quotes around words I didn't say. That's really not cool.
The library is designed for use-cases like getting git metadata or contents from git repos. IIRC, the previous main sponsor (creator?) was a product that let you run SQL SELECT queries against git repos. There's no need to interact with the working tree at all in that type of use-case, so why would they spend a bunch of time implementing diffs for it?
If you're trying to use this library for an IDE, or something else like that where arbitrary modifications are made to working dir files, you're going to have a bad time. The library simply wasn't created to do what you want it to do. That doesn't mean it's bad or useless. It's directly imported by Vault, Pulumi, k8s test-infra, etc because its use-case is aligned with what these projects need.
Personally I think it's cool that I can use go-git to clone a git repo to memory and then perform programmatic read operations on the repo's contents. That's useful to me. It's not useful to you, and that's fine, but clearly we have different opinions and expectations around community-driven FOSS software projects.
>The verbosity of Go’s error handling has been much-maligned. It’s simple and explicit, but every call to a function that may fail takes an additional three lines of code to handle the error
Putting error nil checks into a function is an anti-pattern in Go. There is no need to worry about the LOC count of your error checking code.
> Putting error nil checks into a function is an anti-pattern in Go.
I assume you mean into a helper function like I've done with check()? If so, I agree with you for normal "production" Go code. But for simple throw-away scripts you don't want half your code littered with error handling, when you could just throw a stack trace.
> There is no need to worry about the LOC count of your error checking code.
Well, it means some functions are more than half error handling, obscuring the guts of what a function actually does. Even the Go language designers agree that Go's error handling is too verbose, hence proposals like this from Russ Cox: https://go.googlesource.com/proposal/+/master/design/go2draf... (there are many other proposals, some from the Go team)
Agreed. When I see people talking about LOC, my eyes roll. It's verbose for a reason: the language designers WANT YOU to pay attention to the errors, not ignore them.
I think the ultimate question for me is, does increased tedium demand/imply/require "attention", or does it just create a new opportunity for mistake? If devs are so frequently writing these "check()" style functions, are they paying attention to or ignoring errors?
That's exactly why every other language took the wiser decision of actually having runtime errors.
To force you to pay attention to them before your application state went into an unknown configuration, thus making it nearly impossible to troubleshoot or even pretend to be deterministic.
I still have no idea how any programmer thinks this is OK. Nondeterminism and unknown/un-considered application state are literally the source of all bugs. I much prefer (and honestly believe it makes a ton more sense) to do what Erlang/Elixir does, which is to fail, log, and immediately restart the process (which only takes a few cycles due to the immutability-all-the-way-down design).
If you hit my Phoenix application with a million requests in 2 seconds and each throws a 500 error, my webserver will keep chugging along, while every other technology's webserver will quickly exhaust its pool of ready-to-go webserver processes and fall over like a nun on a bender.
I don't really know how to take comments like this.
I work on a pipeline that processes millions of events per day in a pipeline that contains two pretty busy Go programs. One has an embedded Javascript engine and does all kinds of pattern matching and string manipulation. The other does a ton of database read/write and some crypto-calculations to do data integrity for us.
Millions of events. Per day. Never a panic in production.
It is easily possible to write deterministic, highly performant, and durable applications in Go.
I was talking about millions of events per every couple of seconds, not per day. A million events per day can be accomplished with just 11 per second, btw; a very... achievable number.
WhatsApp, which is built on OTP, handled 64 billion messages in a day. Averaging out to 744,000 a second. Almost 10 years ago. (Granted, also on 8000 cores. But thanks to the concurrency...)
> every other technology's webserver will quickly exhaust its pool of ready-to-go webserver processes and fall over like a nun on a bender
Most other technologies don't use one process per request, but instead have a single server process which handles all exceptions that come out of a request by returning 500 without crashing the server.
You can absolutely do this in Go. In a webserver you panic and recover your goroutines for 500’s or other failures that you deem invalid states.
There’s a qualitative difference between what we call 500 errors or 400 errors and it’s a good thing to handle the former by throwing exceptions or panicing and the latter with normal program flow and error values.
Errors that you can sensibly handle and display are part of your domain logic, just values and functions. They should be handled and contextualised right there at the site they come up.
Errors that represent invalid program states should be thrown as far as possible and handled at the edge.
You may have misread this. The OP means "extracting error nil checks into a function is an anti-pattern", not "your functions should not contain error handling".
`git pull` is not easy! It implies implementing a merge algorithm, for example. (One could half-ass this by only implementing fast-forward merge, I suppose.)
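The fast-forward special case amounts to an ancestor check: a pull can fast-forward exactly when the local head is an ancestor of the remote head. A toy sketch over an in-memory commit DAG (a stand-in for a real object store, not how git stores anything):

```go
package main

import "fmt"

// dag maps each commit ID to its parent IDs; a toy stand-in for a
// real commit graph, just to illustrate the fast-forward check.
type dag map[string][]string

// isAncestor reports whether a is reachable from b by walking parents.
func (d dag) isAncestor(a, b string) bool {
	seen := map[string]bool{}
	stack := []string{b}
	for len(stack) > 0 {
		c := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if c == a {
			return true
		}
		if seen[c] {
			continue
		}
		seen[c] = true
		stack = append(stack, d[c]...)
	}
	return false
}

// canFastForward: a pull fast-forwards when the local head is an
// ancestor of the remote head, so no merge commit is needed.
func canFastForward(d dag, local, remote string) bool {
	return d.isAncestor(local, remote)
}

func main() {
	d := dag{
		"c3": {"c2"},
		"c2": {"c1"},
		"c1": nil,
	}
	fmt.Println(canFastForward(d, "c1", "c3")) // prints "true": local is behind remote
	fmt.Println(canFastForward(d, "c3", "c1")) // prints "false": local is ahead
}
```

Anything beyond this (diverged histories) needs a real three-way merge, which is where the hard work starts.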
Feels like you're missing the spirit of the article. Nobody's advocating it as a git replacement -- the author is just posting thoughts about something they built.
The second paragraph explains why this exists, and it's not to provide a useful implementation of Git.
> I wanted to compare what it would look like in Go, to see if it was reasonable to write small scripts in Go – quick ’n’ dirty code where performance isn’t a big deal, and stack traces are all you need for error handling.
It's a toy problem that's just big enough to be interesting. Comparing it to Hoyt's earlier Python implementation of the same problem lets him evaluate how Go would fit into a certain place in his development workflow.