Fun project to hack on, but strikes me as kind of pointless beyond the experience/academic value of making it.
There are already a number of read-only FUSE-git bridges, and we already have production-ready FTP daemons galore if that's the protocol you prefer.
If you don't like the existing FUSE-git bridges make one of those instead. It's more composable; you could also point apache at the read-only FUSE mount for example.
You're unnecessarily tightly-coupling libgit2 and the FTP protocol, and probably creating a bunch of bugs (potentially dangerous RCE ones being a network service) in the process.
I would love to have some kind of protocol standardized for this. Having done some work on build/packaging tooling for the ROS ecosystem, there are a bunch of cases where we need to access a metadata file (for dependency resolution) or grab a tarball from a remote repo and end up having to have multiple implementations to handle the slight differences between Github, Gitlab, and Bitbucket, plus a fallback option that actually does a local clone, see:
I have hit similar problems multiple times in the past. So I'd be happy if there was a standard git server worthy of the name. SVN did that part right by using WebDAV, so any HTTP client is enough to grab the contents of a file from there. It is a shame that git doesn't have anything similar out of the box almost 20 years later. Even hg has an embedded http server that can provide this.
I feel like something went wrong a few steps ago if you find yourself needing this.
Happy to be enlightened otherwise?
I find it hard to criticize these sorts of projects. I think people should be questioned about strange ideas, but over the internet it's hard to convey tone, and hard to explain that I still want to encourage them to continue working on the weird, strange and unique.
In principle, the decentralized nature of git means that you don't really have to do this sort of thing. Just clone and be done.
But in practice, I do often find myself browsing my own code on GitHub for a variety of reasons:
- I'm at someone else's office, at their machine, but didn't bring my laptop.
- I'm traveling and want to talk about something with someone (I travel with a separate laptop).
- I'm halfway through a big refactor and want to go back and see what master looks like. I know how to do this without GH/GL, but the simplest way is to just pull up GH.
In all of these cases, and others, it's far easier to just pull up the file on github than it is to clone the repo locally.
So, given that we do sometimes want to browse our code remotely, the author then provides the (rather niche) criteria that make this a better solution than GH/GL:
"By serving a repo behind an FTP interface, we get these benefits:
- Web browser supported but not required
- Minimized network traffic, sending just the file data itself
- Supported on all platforms, with dozens of clients already written
- Support for both the command line and GUI"
Agree with you that this is a fairly niche set of criteria, but I could at least imagine myself in an environment where this could be a useful tool.
Tip for looking up other versions of a file in a local repository when you don't want to update to it: TortoiseGit on Windows can open the repo browser showing the state at a certain commit from the repository history window. Not a lot of people seem to know this. And to date, I have found no other client with that capability.
I think you are a bit quick to judge in declaring this "strange" and "weird".
What I find strange for myself is the requirement of having a git web interface installed on the server and the need to use a web browser just to analyze a few parts of a remote repository.
Sure, it works, looks great, and there already are mainstream solutions available; but a web interface is certainly overkill if all you want is to display some remote files.
I'm definitely not saying that web interfaces are wrong, but for people like me, simpler alternatives are very welcome, and I feel that FTP is a perfect fit for this. Git repositories are file trees after all and a "file transfer protocol" sounds like a better fit for this task than a "hypertext transfer protocol".
Apart from simplicity, this approach would have better composability (with external tools) as well, as the author put it:
> However these interfaces are fairly rigid, and don’t connect well with external tools.
Looking for simpler solutions to do things that we do everyday is never "strange" in my eyes.
FTP needs to die, immediately. It's an archaic protocol built around the limitations of early-1980s systems. In addition to the (by modern standards) insane protocol "feature" of having server responses be human-readable (and not machine-parseable) text, there's the batshit crazy mechanism of control and data connections it uses, which, in addition to having caused something like 2 decades of security problems, also makes it a nightmare to provision in modern networks --- and for nothing, for no benefit at all.
If in 2019 a developer built a file transfer system with the same service model as FTP, using the same design, you'd dismiss them for incompetence.
There is nothing FTP does that HTTP can't already do. The normal response to that is that there's no standard set of HTTP endpoints that delivers the file-upload capability that FTP exposes by default. But then, FTP doesn't even have a "standard" directory listing.
If we didn't have HTTP file upload, SFTP would be a reasonable thing to want. But we do, and SFTP shares with FTP the property of essentially being a restricted-purpose shell (with an attendant user authentication model), which, again, is of basically no benefit in modern systems.
To be fair to FTP, it was written for NCP in the 70s. Network protocol design was a very different problem back then and NCP required incoming data on a different port to outgoing (usually with adjacent, odd/even, ports though).
However I do completely and emphatically agree that FTP needs to die. There is no need for it in the modern era. We’ve had a whole plethora of better-suited protocols since (I don’t agree HTTP is a good replacement, but there isn’t a shortage of options out there), and in many cases FTP - even once you’ve applied the mountain of kludges and workarounds just to get it working with SSL behind two NAT’ed firewalls - still falls short of what a great many alternatives can already do.
FTP not only needs to die; it should have already died 10 or 20 years ago.
You have strong opinions about this, which makes me think maybe you can tell me details I didn't know about FTP.
> It's an archaic protocol built around the limitations of early-1980s systems.
What were the limitations, and how did they shape the protocol?
> protocol "feature" of having server responses be human-readable (and not machine-parseable)
The responses are successfully parsed by quite a few clients, no? The fact that I as a person can easily read the output would seem in and of itself to be a positive quality.
> control and data connections ... caused something like 2 decades of security problems, also makes it a nightmare to provision in modern networks
I know little about configuring network security, can you tell me more here? The idea is that in passive mode the server has to pick and listen on a bunch of random ports and so the firewall can't unconditionally block them?
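For concreteness, my (possibly wrong) mental model is that each transfer gets its own freshly chosen high port, advertised in the 227 reply, roughly like this sketch of the arithmetic (the reply text varies by server):

```c
#include <stdio.h>

/* Sketch: pull the data port out of an RFC 959 passive-mode reply,
 * "227 Entering Passive Mode (h1,h2,h3,h4,p1,p2)". The server picks a
 * new high port for every transfer, which is what makes static
 * firewall rules awkward. */
int pasv_port(const char *reply)
{
    int h1, h2, h3, h4, p1, p2;
    if (sscanf(reply, "227 Entering Passive Mode (%d,%d,%d,%d,%d,%d)",
               &h1, &h2, &h3, &h4, &p1, &p2) != 6)
        return -1;            /* reply text isn't required to look like this */
    return p1 * 256 + p2;     /* e.g. (195,149) -> port 50069 */
}

int main(void)
{
    printf("%d\n", pasv_port("227 Entering Passive Mode (192,168,1,9,195,149)"));
    return 0;
}
```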
Also you mention "modern networks" -- did it used to be easier to provision networks in the past and something has changed recently?
Someone else on HN probably has firsthand experience with the systems that birthed FTP, and I will be speculating a bit. But here's an example, and it's an interesting one because it infects TCP to this day: presumably because systems at the time didn't have workable socket multiplexing, FTP (and TCP) supports an "URGent pointer" that allows one TCP endpoint to flag to another that important command-and-control data needs to be read during a file transfer --- this despite the fact that FTP is already (pointlessly) allocating an additional socket connection for each data transfer. The URG wart lives on in TCP to this day, unused by any modern protocol.
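If you want to see the wart from userland, it's still right there in the sockets API; a minimal sketch (the function name is mine, and 'fd' is assumed to be an already-connected TCP socket):

```c
#include <sys/socket.h>

/* Sketch: the URG mechanism is still exposed by the sockets API as
 * "out-of-band" data. RFC 959 leaned on it (via the Telnet Synch signal)
 * so a client could get ABOR noticed during a transfer. */
int send_urgent_byte(int fd, unsigned char b)
{
    /* MSG_OOB sets the URG flag; the peer sees it as an exceptional
     * condition (SIGURG, or the exceptfds set in select()). */
    return send(fd, &b, 1, MSG_OOB) == 1 ? 0 : -1;
}
```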
FTP LIST responses are "successfully parsed" by predicting that servers will return a circa-1991 ftpd "ls" listing. Which means that to be parsed by those clients, you need to be bug-compatible with those servers. That was the point DJB was making with his (parsable) publicfile output.
For a good starting point on FTP's security design, read up on [FTP bounce attacks]. But the key thing to remember is: this design is pointless. There is no reason for a file transfer protocol to be allocating new connections like this.
> There is nothing FTP does that HTTP can't already do.
In my experience, HTTP regularly chokes on multi-GB file transfers. I am not endorsing FTP, and when we have 100GB+ transfers we have in the past resorted to sneakernet.
FTP is not a better fit other than by sounding like it is. It barely has a reliable way to list directories whereas hypertext can actually structurally represent a tree.
Pretty much. It just gives you an unstructured blob of text. From the RFC:
> Since the information on a file may vary widely from system to system, this information may be hard to use automatically in a program, but may be quite useful to a human user.
It works by convention and people writing piles of hairy code over the years to make it behave. FTP is a really ancient protocol with many problems besides that. Much like short people, it got no reason to live.
I understand, but since we are comparing it to HTML: doesn't the same argument hold for it as well, and even more strongly?
FTP may have its quirks but I'd argue that nobody can say that it's simpler/easier to parse HTML than to parse FTP payloads.
In any case, can't this be solved by configuring the server to conform to the most standard/clean/mainstream conventions? We are writing a server after all in this case, not a client.
I'm not convinced that FTP's shortcomings can't be easily worked around in this case; but I understand that you are trying to say that the ancient FTP protocol should die. Maybe you are right. And maybe there are very good reasons for that.
And then I'd say we need a modern file transfer protocol. HTML over HTTP still isn't optimal for this task at least some of the time, for some people. And I, with my limited knowledge, still believe that FTP can't be worse than HTML over HTTP for the task in question, even if not better.
To return to the matter at hand, if you wanted to know all the subdirectories in html, you could look for <a> tags. With ftp, you feed a string through strsep() and hope there's a 'd' where you're looking for it.
There are actually standards documents you can read that specify how to parse html. (Mind boggling length and complexity, aside.) But there's nothing that tells you how to interpret an ftp list response.
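To make that concrete, a client-side parse of a unix-style LIST line ends up looking roughly like this sketch (the example line and the nine-column layout are assumptions borrowed from ls -l output, because nothing in the protocol pins them down):

```c
#define _DEFAULT_SOURCE   /* for strsep() on glibc */
#include <stdio.h>
#include <string.h>

/* Sketch: pick apart one unix-style LIST line, e.g.
 *   "drwxr-xr-x  2 user group  4096 Aug  9 13:37 src"
 * This only works if the server happens to print ls -l style output;
 * no RFC promises these columns, and filenames with spaces already
 * break the last field. */
int main(void)
{
    char line[] = "drwxr-xr-x  2 user group  4096 Aug  9 13:37 src";
    char *p = line, *tok, *fields[9];
    int n = 0;

    while (n < 9 && (tok = strsep(&p, " ")) != NULL)
        if (*tok != '\0')          /* collapse runs of spaces */
            fields[n++] = tok;

    if (n < 9) {
        fprintf(stderr, "didn't look like ls -l output\n");
        return 1;
    }
    /* "hope there's a 'd' where you're looking for it" */
    printf("%s is a %s\n", fields[8],
           fields[0][0] == 'd' ? "directory" : "file");
    return 0;
}
```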
No, that argument doesn't hold. If HTTP servers gave file listings in a single PRE tag with newline-separated OS-specific impossible-to-parse entries, they'd be comparable, but even the HTML file listings designed for human consumption are straightforwardly scrapable, and in a modern system you'd just honor a .json (or Accept header) that would give you the result as a trivially parsed array.
You surely seem to know better than me. And I've never parsed FTP output directly, so maybe I shouldn't have come this far in this thread, but what I'm really having difficulty understanding is: come on, is it really THAT bad? It's just plain text with columns after all.
In my limited understanding, the only parameters that can change are the separator and the number of columns. How hard can it be to establish a straightforward convention? And why are we being judged so hard, despite advocating to use the protocol in a consistent, straightforward way, just because some people apparently made very bad technical decisions in the past?
In the web you can build a universe if you want. The API surface of it is gigantic compared to FTP. I like to use the simplest technology I can for the purpose at hand. And if the servers I use are outputting straightforward payloads, I don't have to care about what stupid things people did in the past.
I realized that I'm making the mistake of thinking that all we can serve over HTTP is HTML+Javascript+CSS. But anybody can easily serve structured JSON over HTTP, or any other clear format, like you pointed out:
> and in a modern system you'd just honor a .json (or Accept header) that would give you the result as a trivially parsed array
HTTP is not tightly coupled to HTML+Javascript+CSS. I guess I acted a bit defensive on this one because I hate the usual bloat that the web brings and suffered much from it.
Anyway.
Your approach would be to serve JSON over HTTP then, I guess. Would you have any other recommendations for this apart from JSON over HTTP? Are there other simple and viable alternatives?
My preferred approach for command/control and metadata (like directory listings) would be JSON. HTTP already does file upload/download just fine, and there's no JSON involved there.
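As a sketch of what that might look like server-side (hypothetical, nothing gitftp does; a real endpoint would also escape names and report size, mtime and type), the listing itself is just a trivially parsed array:

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: emit a directory listing as a JSON array, the kind
 * of machine-parseable metadata response being argued for above. */
void list_as_json(const char *path, FILE *out)
{
    DIR *d = opendir(path);
    struct dirent *e;
    int first = 1;

    if (d == NULL) {
        fprintf(out, "null\n");
        return;
    }
    fputc('[', out);
    while ((e = readdir(d)) != NULL) {
        if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
            continue;
        fprintf(out, "%s{\"name\":\"%s\"}", first ? "" : ",", e->d_name);
        first = 0;
    }
    fputs("]\n", out);
    closedir(d);
}

int main(void)
{
    list_as_json(".", stdout);
    return 0;
}
```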
The output of ls can be whatever the server wants it to be. djb's publicfile infamously returns a list of files that doesn't look like other servers, which confuses various clients. (The irony is it was supposed to be easier for machines to parse. But nobody was expecting it.)
One factor that mitigates this a little is the "SYST" command. When a client sends it to gitftp, I have it reply with "215 UNIX" to give them a clue about the format.
I also browsed a few of the FTP mirrors for OpenBSD and took note of how they formatted their response, and tried to make my format match what I observed to be the most common response. It's worked with all the clients I've tried so far: ncftp on mac, ftp on openbsd, Firefox, mac's ftp-finder integration, and Filezilla.
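Roughly the shape I'm aiming for, heavily simplified here rather than copied from gitftp (owner, group and permission bits are stand-ins):

```c
#include <stdio.h>
#include <time.h>

/* Simplified sketch, not gitftp's actual code: print one LIST entry in
 * the "ls -l"-ish shape most clients expect, e.g.
 *   -rw-r--r-- 1 git git     1234 Aug  9 13:37 README.md
 */
void print_list_line(FILE *conn, const char *name, long size,
                     int is_dir, time_t mtime)
{
    char when[32];

    /* classic ftpd listings show "Mon DD HH:MM" for recent files */
    strftime(when, sizeof(when), "%b %e %H:%M", localtime(&mtime));
    fprintf(conn, "%s 1 git git %8ld %s %s\r\n",
            is_dir ? "drwxr-xr-x" : "-rw-r--r--", size, when, name);
}

int main(void)
{
    print_list_line(stdout, "README.md", 1234, 0, time(NULL));
    return 0;
}
```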
So while there is admittedly ambiguity in FTP servers on the whole, I think my server in particular should work OK. Keeping an open mind though if you know about a problem I haven't foreseen.
So if you're looking at the output from a BSD derived ftpd, a few notes. That used to be just fork and exec "ls -l", and that was slow, so ls.c was copied into src/ftpd. But it's literally the same code. Whatever ls prints, that's what ftpd prints.
But the client parsing code is mostly strsep() and cross your fingers and hope to die. It's not terribly robust.
Like if you were writing a program, you would not prefer parsing the output of ls to running stat yourself.
I do this very often. Usually, I'm looking at a very large project (think glibc, WebKit, etc.) trying to find a certain thing (where is this output coming from, what code path leads to this behavior?); so I really only need to look at a couple of files. It's really not worth cloning the entire project for one little thing.
That's a big leap at best, misrepresentation at worst.
Git is specifically a decentralised version control system. Cloning the remote branches is how you view and browse remote repos.
Viewing remote non-bare repos via FTP is fine, but I'm not sure I understand the problem.
The problem is network traffic? Cool, check out a shallow clone with `--depth=1`.
Browsing? I'm confused how ftp is worse than say GitHub? Or GitHub with some of the file browser add-ons/greasemonkey scripts.
I'm also pretty confused by the use case that ftp beats keeping a local working copy. Presumably you want to edit these files eventually right? Are you also editing remotely? What about a remote X session or SSH?
I'm just confused by the use case not saying you're holding it wrong.
Try answering this: in what world or under which conditions does this solution make sense?
I'm not saying it's wrong I'm saying I can't see what conditions make this make sense.
Well, yes, that will work, but many larger Git repos take many minutes to download, not to mention the fact that you now have random repos scattered around your system.
Yeah, it might just be best to use a script to run `git clone --depth=1` into a temporary directory and run the Git GUI on that checkout. I think I'll try writing that up.
Yeah, that's mostly why I've learned to find my way around GitHub and GitWeb's UI…I'm not sure if there's any way to implement this without downloading the whole thing.
I don't know much about the <video> tag, but in Safari I just see a very fuzzy screenshot, leaking off the right edge of the page. Clicking it does nothing, either.
In a real scenario gitftp would be run remotely. If you're hosting your own git server then you could host the ftp interface too. It won't work with e.g. Github.
I would be much more comfortable using this if it were written in Go or Rust. Why are we still writing C programs to expose sensitive information when much safer languages exist for the job?
Author here. I did aim to write the code carefully, for instance doing bounded reads like `fgets(cmd, CLIENT_BUFSZ, conn)` to prevent overflowing my buffer. Also ran the program in Valgrind and exercised different code paths to detect memory leaks and management errors such as use-after-free or double-freeing. Because libgit2 and its memory conventions were new to me, I did actually make mistakes and Valgrind helped me find them. The fix for those mistakes is in commit 59bb39b if you're curious to check it out.
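Boiled down to a sketch (this isn't the actual gitftp code, and CLIENT_BUFSZ's value here is made up), the command-reading pattern looks like this:

```c
#include <stdio.h>
#include <string.h>

#define CLIENT_BUFSZ 1024   /* value assumed for this sketch */

/* Boiled-down sketch of the bounded-read pattern: fgets() never writes
 * more than CLIENT_BUFSZ bytes, and the trailing CRLF is stripped
 * before the command is interpreted. */
int read_command(FILE *conn, char cmd[CLIENT_BUFSZ])
{
    if (fgets(cmd, CLIENT_BUFSZ, conn) == NULL)
        return -1;                      /* EOF or error: client went away */
    cmd[strcspn(cmd, "\r\n")] = '\0';   /* drop the line terminator */
    return 0;
}
```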
Writing in C requires care, but the modern day "rewrite it in rust" crusade does feel overblown. C programs are nice -- small and fast.
Anyway, I don't mean to be cocky. A code review is welcome, I'd be curious if I did indeed overlook a security issue.