This looks like a very interesting project to make a lightweight sftp server, though the dependency on an SQL db makes it a bit harder to install.
Go has a really good set of crypto primitives, an excellent ssh implementation maintained by the core team, and a great sftp server and client library which makes this project possible.
I recently added an sftp server to rclone: https://rclone.org/commands/rclone_serve_sftp/ - this can serve any of the cloud providers rclone supports as sftp (or local disk). This runs on windows/macOS/linux too.
It was a joy to add this as the Go libraries are very well thought out and easy to use.
Not sure if that's the case in this context, but usually the SQLite dependency can almost be seen as no dependency at all - it's just a file on disk without any external dependency.
Serious question, why hasn't anyone rewritten it in Go? It's been a while since I've looked at it, but C to Go doesn't seem that difficult to translate.
Edit: just looked at the amalgamation again, it's huge... Still surprised no one has tried it yet that I know of. Pre-processor directives would probably make it even more difficult though.
Edit2: and it already works so what's the point, other than an extremely challenging problem
It's really not that hard. The code is clearly documented and everything works in roughly the same way. Easily achievable by a single developer in a few months.
The SQLite code emulates classes with structs and methods (they're even called classes in the code). This is pretty easy to translate into languages with reference and struct support.
Memory allocation is separated into an allocation provider, which is easy to implement in GC languages (just create the struct requested).
The parser and VM opcode file is tricky. The opcodes are parsed from comments in one big file. I'm not a fan of that. But once generated for a particular version, pretty easy to translate.
Testing is another can of worms. Translating the exact version of Tcl too is probably the best course of action; it's required to run and pass the (hundreds of thousands of) baseline tests. The other correctness tests (which SQLite has because of branch-coverage requirements for aircraft software) are harder to translate. The pre-release burn-in test is also not publicly available.
Preprocessor directives are not used as macros. Rather they are used to enable features. In a port, you can for example choose to exclude WAL or specific optimizations. FTS can be left out if not needed. The different lock implementations can be deferred to the stdlib.
If you're interested in the performance overhead, here's a small benchmark of a C# implementation I maintain:
Can you point to the pure Go re-implementation of Sqlite? I was looking for such a thing but never found it. Instead I had to go with a k/v store like Bolt.
Sqlite databases are often not shared between applications. Do you know of a similar SQL, in-process database in pure Go (even if not completely similar feature-wise)?
What would the real-life use case be for running this server, whose purpose is to facilitate transferring files from one computer's filesystem to another, on an ephemeral filesystem?
If what's meant here is that it would be hard to run this on a container platform because of the ephemeral nature of containers, presumably you would be using persistent volumes to keep your state safe across the container lifecycle.
That being said if you really wanted to use sqlite for the database, you'd do the exact same thing, use a persistent volume.
I wrote an interface that would allow SFTP transfer to Azure Blob Storage (which doesn't support SFTP natively) without storing anything on the machine running the service. It was really just supposed to be a frontend.
Then you might also want to check out BadgerDB. I can't say which is better, but I know that both are popular and in production use, so probably each with their pros and cons (e.g. read vs. write speed).
A lot of the world runs on sftp. The flexible user management in this code would have saved my team about a month of time implementing our own solution.
OpenSSH is very much tied to system users, but sometimes you might want to give access to external users that don’t need to exist as actual unix users.
Because SSH and SFTP are so closely tied together, configuration via PAM is pretty hard and inconvenient: creating fake users via PAM for SFTP will also create them for SSH, and there's no easy way to map all such virtual users to the same user-id. Also, because OpenSSH has zero support for virtual users, aside from a PAM configuration you also need an NSS configuration, and now all your virtual users in some database have suddenly become system users on your box.
SFTP as a protocol, on the other hand, is very convenient compared to, say, FTP over TLS, because it uses a single TCP port and it was created this century.
So having this self-contained project is useful when you need to allow third parties access to files but you also don’t want to create system users for them or risk f’ing something up with PAM
I'd second this. I've long struggled to come up with an easy-to-deploy sshd/sftp/chroot configuration that permitted easy database-driven configuration w/o extra shell access. You have to fight a lot of defaults to get this just right.
Would the OpenSSH upstream accept patches for an unprivileged sshd/sftp-subsystem to make this easier to use their battle tested code?
No, it is a static unchangeable password. Graft can't "manage" user accounts, it just has one user with password. It does not support keyfiles or other authentication mechanisms.
I wrote graft to have a simple portable tool for transferring files in a network without shares - the main idea behind it was to run:
graft serve myfiles/*.txt
on the server side and then
graft receive
on the client side without having to remember the ip or hostname - because zeroconf / mdns is used, it will find the server automatically, if the network is not too big. If there is more than one server, it will prompt you to choose the right one.
I only used SFTP, because it is a secure way to transfer files over the network.
On the other hand, its SQL backend modules insist on fetching a password and comparing them using their own functionality, and there's no way to get the user's typed password into a custom query.
That means that there's no way to authenticate users if you use a password encryption scheme not supported by the bundled modules (like bcrypt for example)
Seems like a use-case for just wrapping regular SFTP in a virtualized userland with definable hooks for things like PAM’s user data lookups. I believe that gVisor does this?
To be fair, it is written in Go - a language where a lot of projects built can just be single, portable binaries with no dependencies (as this project appears to be).
With this solution you can easily create fake account just for sftp, and the installation is very simple. In other words, it solves the biggest security problem with sftp accounts - what you usually want is only to give people access to a few files, and not to give them full system accounts.
I actually had to implement something very similar. The use case was an SFTP upload interface where clients would use the same login credentials as the web portal to upload to unique subfolders in Azure Storage.
Yeah, I'm hoping to see use of Go routines in useful rewrites of projects like rsync with parallel file uploads. It can lead to clean, extensible, and maintainable code with still low overhead and high throughput.
I've been avoiding starting a project myself which will require SFTP. Passing text files around via SFTP is how a lot of the world works. It's still painful to implement this, and makes the transfer of data slow (polling for new files to process etc). This introduces a big delay in any data feedback cycles. Managing users with system accounts is painful too.
This project looks to alleviate a lot of these problems.
It'd be great to be able to register webhooks so that it could send events to external systems. Ideally I'd like to know when a file is created/deleted etc without having to walk directories on the sftp server on a regular basis.
Yes, if you can avoid polling by getting events it can be a good system. But really you want to be notified when a file is closed successfully, not when it's closed after a timeout or other error.
For me, in a previous system, I uploaded as a .tmp file, then the uploader renames it to .zip at the end of the upload. The rename is atomic so there's no chance of the consumer reading a partial file.
I was dealing in CSV, TSV and XML files. I chose to upload them all in a zip wrapper not only because they compress well, but also because each file has a CRC and the file won’t be extracted if the file is damaged or truncated. It’s hard to tell if a raw CSV file has been truncated.
Just drop files with GUIDs into staging dirs and send all staged files over to another staging dir on the other side. When the transfer succeeds, archive on both ends.
Can this project be used as a library and integrated into an existing program?
I am asking, b/c I am currently working on a little document management system for home-use.
It looks like the only sane way to integrate with document scanners is (S)FTP upload or Email (SMTP).
I have tried to use off-the-shelf FTP servers for this and inotify to get notified of new uploads.
The solutions work OK, but are hard to set up and rather brittle (and limited to Linux).
Go seems to be the ideal language for this project, because of its concurrency model and the ease of distribution (as a single binary).
- Has anyone here experience with building systems like this (integrating via FTP/SMTP)?
- Any recommendations for languages and libraries?
You might need to double check your requirements — SFTP is a very different protocol from FTP/FTPS. Using SFTP for something like document scanners without strong Unix roots would be surprising to me, at least.
I’m currently architecting a transport administration system written fully in Go which includes XMLEDI over SFTP; this project is perfect as we need to integrate with the current IAM.