From experience: before you sprinkle big-ass string literals throughout your Go code that communicate with Sqlite, toss all of your persistence into a separate package if you can. I'm not saying some crazy abstraction; just put it some place common if you don't have a lot of advanced SQL needs, and put in the minor extra work of serializing to and from structs. You'll be happy when you want to add query logging, find certain usages, reuse code, or ever change your persistence approach.
Also, devs should be aware of some of the concurrency limitations of Sqlite. Devs need to essentially share a single connection across the app, and serialize the writes.
Also, side note, I've found one really neat use case for Sqlite over other DBs: simple disk encryption. I wrote a Go wrapper for one of the popular ones [0], but in general if you distribute your app in other ways you may want to offer your users encryption that you can't easily with other DBs. Sure you lose some performance, but sometimes it's worth it (again, less in the server case where you control it and/or can do fs-level encryption, but definitely in the distributed app case).
FYI, the author talks about using multiple readers and serializing all writes through a single connection. This is how you get really stellar update/write performance out of sqlite. In my test I saw 3x the inserts per second compared to Postgres, for example.
"Most problems do fit in one computer, up to a point. Spend some time determining where that point is. If it is years away there is a good chance one computer will do."
I've been following this principle for a decade, and 98% of the services I've built have never needed to grow beyond one instance. The ones that did were solved by breaking out specific functions into their own managed services, not by applying a wide horizontal scaling solution.
If you build for scaling before you have anything running that can be tested against, it's highly likely that you're actually scaling the wrong thing. It's like optimisation: to get any meaningful benefit, you'll want to surgically solve the actual bottleneck rather than sprinkle hope-based gestures all over the place. (I'm reminded of the article that was posted the other day, where the writer experienced a Firebase slowdown and tried to solve it by upgrading hundreds of npm packages to their latest versions. That's what I mean by hope-based gestures. Premature "web scale" architecture is essentially the same kind of shotgun approach.)
Sure, I'd say this applies to the vast majority of websites, and technologies like docker, kubernetes or multi-region deploys simply are not required for most projects. There is no need for massive scale and multiple nodes for most services/websites and if you find there is, well you'll be rewriting a bunch of things anyway unless you've done a lot of work on very large websites before.
I do think there's something to be said for habitually separating the database though - if you store data elsewhere, you can have 1 instance or 2 or 3 pretty easily, and sometimes having more than one is nice for redundancy, handling big spikes or seasonal traffic etc. It is pretty easy to scale horizontally nowadays on cloud services where they take care of load balancing and sometimes the data store too. You also don't have to worry about wiping a server and starting again, as your data is elsewhere.
Embedding a db like sqlite introduces other problems (the cgo requirement, for one). It is really handy for things like mobile development, though, as then you don't depend on a separate api or db for data. It's also handy if you don't want potential users to have to worry about setting up a db when trying out your project, so an embedded db is really nice for that. I wish golang had a pure-Go clone of sqlite, as it would make this approach even more attractive for a broader range of uses.
"One process programming" is a really under-rated philosophy. Communication overhead can be substantial and concurrency issues are harder to debug. A single process that you can just stop and inspect in the debugger is easier to work on. Especially if you're only one developer as well.
Two big pieces of software I worked on used this philosophy. One was https://en.wikipedia.org/wiki/Zeus_Web_Server - nowadays its niche is occupied by nginx, but back then its advantage was being a one-process-per-core select()-based solution, compared to Apache's process-per-concurrent-request solution. It was also absurdly, unnecessarily portable, being written in vanilla POSIX. It ran on a dozen different flavours and architectures of UNIX, almost all of which are now dead.
Another was chip design software for a startup. Eventually it would have been useful to parallelise computation, but in the meantime it was so much faster to develop it and just buy big machines and leave it running overnight.
Unsurprisingly, comments start pointing the way to added complexity like Postgres and containers and whatnot, but this is exactly the right approach for probably the large majority of webapps out there. One executable, one db-file, up and running anywhere within a very few minutes. With a modicum of intelligence thrown into the design, you're never ever going to see SQLite acting as a bottleneck. Few sites have request rates on the scale of tens of thousands per second.
Developers of would-be federated services to decentralize the web, please take note. This is how you go about ensuring wide deployment. Simplicity, if anyone remembers the word.
If you're using AWS anyway, I think using their managed Postgres / RDS could make this a lot easier operationally. No need for custom syncing/backup scripts, easier to scale the storage up.
To the larger point, while I totally agree with the premise -- most apps will never need to scale beyond one large instance -- I'm not exactly sure what the actual tradeoff is. If you're writing a simple CRUD app, it is not really controversial that it shouldn't need any complex distributed stuff anyway; it is just a simple python/Go app and a data store.
Most "fancy" things outside that single process paradigm, such as avoiding using local disk and using S3/RDS/document stores, using containers for deployment, etc. usually have more to do with making the app operationally simpler and easy to recover from instance failures than scaling per se.
> operationally simpler and easy to recover from instance failures than scaling per se.
I'd say architecturally making the app easier to perform this function, rather than rely on infrastructure to do it for you (and hoping doing so will make the app "easier" to code).
For example, use an append only data structure (like event sourcing), with snapshotting. Then your app recovery process is just "restart".
"Typical concerns about the impact on software quality of adding C code to Go do not apply to SQLite as it has an extraordinary degree of testing. The quality of the code is exceptional."
It could be worth noting the thread concurrency limit imposed by cgo calls locking the OS thread to the calling goroutine. For example, 128 concurrent requests running long statements would take up 128 OS threads.
Of course, any language without a lightweight threading model has this issue anyway if each request is mapped to one thread, but it seems worth pointing out that you may want to limit your own concurrency at the request level, or by properly sizing a connection pool (like the one present in the sqlite driver).
Edit: whatever the solution to concurrency is, benchmarking it is a good idea for any sized product going into a production environment. There's a good article about this by the people at Cockroach labs: https://www.cockroachlabs.com/blog/the-cost-and-complexity-o... (see Cgoroutines != Goroutines)
This is a good article, but I feel like using Postgres instead of SQLite on the server side would achieve his goals even better:
- No need to link with C code. There's a pure go driver for Postgres.
- You get the 'psql' command to inspect and change the database, even while the web app is running.
- You get a lot more database features.
- You can keep the deployment simple by running Postgres on the web app server until you need to scale up.
- Conversion to a full blown cloud database and horizontal scaling is just a matter of configuration.
I understand the desire to keep the stack as simple as possible, but it looks to me like Postgres would actually be simpler than SQLite for the server side.
> You can keep the deployment simple by running Postgres on the web app server until you need to scale up.
This doesn't simplify deployment. Deployment includes setting up, upgrading, and potentially backing up Postgres, all of which are easier with Sqlite. While in general I agree with you, putting one file on a server and running it has real value over installing an entirely separate software product.
> Deployment includes setting up and upgrading and potentially backing up Postgres.
But I find this rather trivial to do with cloud database services like RDS. Why not just use that? At the end of the day it gives you tons of benefits (e.g. easy backups) with no real lock-in because you'd still just be using a normal postgres driver.
No particular reason; it depends on the use case. Every choice has tradeoffs, and there are cloud-specific ones that can hinder mobility even if the service uses a common tech. Lock-in is about more than code reuse, especially as dependence grows. Granted, for many use cases it might be the best option, but we shouldn't pretend it has the exact same mobility profile as copying a DB file.
the desire to keep the stack as simple as possible
It's a somewhat deceptive simplicity, to boot. The fact that you can use SQLite like this is a testament to its versatility and quality but it's not really for this. There's a conceptual overhead in trying to wedge it into this role and it can also easily become a practical one.
No you can't. But you can deploy the PostgreSQL executable alongside your app executable and run it as a "sidecar" process, using a Unix-domain socket to let your app communicate with the PostgreSQL server, and set up PostgreSQL to store the database files in a location specific to your app instead of a system-global location. It's a bit unusual, but it's possible, and PostgreSQL's command-line tools make this relatively easy.
As far as I know, there is no way to link PostgreSQL as a library in another executable. At least, nothing supported by the official PostgreSQL project. Even MySQL, which used to be available as an embedded library, got rid of this feature a long time ago. Do you have any link to this "underdocumented" feature?
While reading this, please keep in mind this is for indie developers, who need to do all kinds of jobs. Chances are SQLite is the one thing that everyone must learn to use if your business involves building apps. So if you can use it on the server side, it saves a lot of needless learning. Same reason nodejs got started.
Some may say, well, sql servers are not that hard to learn. Well, that is because you live with them day and night. Try going back to one after a year-long break.
I've done something similar to keep it small with potential to go multi region for one of my new projects using Bitbucket and Google Cloud.
Bitbucket pipelines that create an image of the app, create an instance template with a container ("gcloud beta compute instance-templates create-with-container"), create the instance group, DNS, a global load balancer, and SSL certs, and do a rolling update of the instance groups with a new version (if applicable).
Using preemptible g1-small to keep costs down but deployed in at least two zones per region.
Just today, I finished writing my own low level go sqlite driver. I haven't published it yet. I really thought there was a need for it. I hate the database/sql integration that the other drivers have. Now I'll have to see if my driver has any need to exist.
This always seems to happen when I write something.
Your sqlite package totally fits my needs. My biggest problem is that I hate the database/sql layer. Connection pooling is completely unnecessary and surprising in a sqlite context. I honestly have no idea how anyone is using it with sqlite. The concept of a sqlite "connection" goes out the window. You can no longer use sqlite transactions in any normal way. You get locking issues if you don't force it into shared cache mode.
I've always wanted just a pure sqlite package with a one-to-one mapping of functions. Your driver is totally what I've always wanted to exist.
I've put up a preview version of my package if you wanted to look at it.
I ended up removing a ton of features, removing the database/sql driver, getting rid of all the "caching" of information that it does, removing callbacks (which I plan to re-add later), and renaming the methods to match sqlite function names.
It still has some niceties that I plan to keep, such as the Scan method to extract several columns with a single method call, and a RowData type to extract rows into a go map.
I found your project on GitHub and I'm going to suggest some things. There's a particularly nasty footgun you can hit if HAVE_USLEEP isn't defined on UNIX. I'll open an issue.
Last I checked mattn's wrapper was for an older version and was missing some things, like registering an update hook (I could be wrong, been a while). I welcome a current, lower level wrapper. If you publish your code, I'd take a look.
And most people discount the learning experience. You don't really know how to do something until you've done it, with all its warts and edge cases and all.
0 - https://github.com/cretz/go-sqleet