This sentiment pops up regularly on HN, and I've seen at least one article per month for the past few months, but the trouble is, none of them seem to help you actually deploy it. They assume you're comfortable spinning up public web servers.
If you want to use a PaaS to deploy an app, because you don't want to spend your time learning to be a sysadmin, then all the tutorials are going to put you on the Postgres path, because that's what's supported. (Of course, you'll then end up paying $15+/mo for Postgres, which is hilarious for most hobby projects storing 50MB of data.) But in reality, you could just scale vertically on one machine and be completely fine. No need for "distributed" anything, in theory.
I took a shot at productionizing SQLite here as an experiment: https://cheapo.onrender.com/
But I'm not sure I did it right, because I don't have much experience working below the level of a PaaS. I'm an application developer and I do not want to become a release engineer. I resent even having to learn Docker. :-)
Anyway, if somebody wants to nitpick my Flask+SQLite deployment, I'd really appreciate it, because it seems really silly that people have to keep writing this stuff from scratch when 90% of hobby sites have the exact same needs. And the Fly.io/Render configs would apply just as well to Node, Ruby, etc. https://github.com/irskep/cheapo_website
Edit: Somebody got mad at the Docker bit. I tossed it in for effect, but I promise you I don't hate learning new skills. I wish people would recognize that weekend coding projects should be fun, and sometimes that means avoiding certain kinds of things that a person experiences as difficult or frustrating. Arguing on the internet sucks, who knew?
> I'm an application developer and I do not want to become a release engineer. I resent even having to learn Docker. :-)
Sorry, but that's a terrible attitude to have. You don't need to be a full-blown sysadmin to know how to do basic deployments, and learning these things will make you a better developer.
Here is the caveat - you have to learn it and be interested in it.
If you just want to do the minimum to run the app, it will make you a worse, usually utterly terrible, sysadmin, as it usually turns into "do the minimum required amount of copy-pasting from tutorials to make the stuff run".
Then they inevitably don't update anything, or even set up the unattended-upgrades package.
And if the server doesn't get hacked thanks to a bad firewall and no updates, it will run out of space because the dev of course didn't think about data management and some log file in the wrong place, without rotation, overflowed.
That's from a decade+ of fixing shit from developers who think a tutorial on the internet made them sysadmins.
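(For what it's worth, both of those failure modes are cheap to prevent. A minimal sketch for Debian/Ubuntu; the app log path and name are made up:)

    # enable automatic security updates
    sudo apt-get install unattended-upgrades
    sudo dpkg-reconfigure -plow unattended-upgrades

    # rotate the app log so it can't quietly fill the disk
    # contents of /etc/logrotate.d/myapp (hypothetical app):
    /var/log/myapp.log {
        weekly
        rotate 8
        compress
        missingok
        notifempty
    }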
At the same time, I kind of wonder at the direction of things when pretty much every package and application has decided to embrace becoming more complex over time.
It's only going to increase cognitive load, potentially without end (?) which doesn't sound wise long term. :(
I don't know why this is downvoted, it's probably the best advice on this page. Learning a few things about deployment definitely helps to make better software.
Well, he did cherry-pick one joking comment out of a well-intentioned post in order to say I have a bad attitude, so that's not exactly welcoming. And very patronizing, considering I have actually done that kind of work with Salt Stack a few years ago, and just don't want to bother with it at the moment because I have other priorities.
I learn new things every day, and one day, Docker was that thing. I am now better at release engineering, cool! But I also lost a day of working on shipping value to users and actually having fun, which is less cool. That's why PaaS has been helpful to certain kinds of people; it makes the curve of progressive disclosure necessary to deploy stuff a little flatter. My hope is that we can make more kinds of things really easy to do the right way, and not yell at people who just want to have some fun.
Sure, that comment came across a bit rough. But I'm not even sure it was triggered by the Docker comment, rather than the developer/release engineer opposition. At least that's how I see it. And maybe I just don't see the added value of PaaS because I've been spinning up public web servers for too long now ;-)
Yeah, that's fair, and it was bad phrasing on my part. The engineer type distinctions are completely imaginary. It just really threw me off to have people respond to me sharing something I thought was helpful with claims that I'm somehow opposed to learning, when this project is nothing but learning.
I do think people who have experience deploying software have a blind spot here, just like I have a blind spot for people who have never learned how to center a div. jUsT uSe FlExBoX
Hey there, I've never deployed anything, but will be doing so in the (very) near future. I'm self-taught and really haven't fooled around with servers, ever. So instead of just going down the PaaS way, I was hoping to turn my computer/Raspberry Pi into a server as a fun learning project.
Any cool, beginner friendly resources you can share? Many thanks
Agreed. Too much reliance on AWS, Heroku etc. can leave you without vital Linux skills. Linux CLI tools are not just for getting a job done, though they're great at what they do. No, they're part of the *nix culture which is what open source software is based on. Only recently have I become aware that it was better to have entered software development in or before the early 2000s, i.e. before The Cloud took off. You were forced to learn Linux/BSD CLI tools just to swim in the water. I can't imagine, back then, knowing Perl and not knowing Linux or FreeBSD.
I used slackware linux. I wrote assembly code when I was a teenager. I use 'perl -i -pe' constantly. I still don't want to learn a bunch of arcane flags from a bunch of tools with a million gotchas. Sorry. I'd rather focus on the code craft that gives me joy these days.
Sorry, I detest this attitude of: "Well I was hazed so why aren't you hazed too?"
Did I mention that Russ Cox and I used to write C code with pen and paper? https://news.ycombinator.com/item?id=32874759 If you haven't, are you even qualified to discuss what vital skills others are missing?
Or maybe because I learned perl because it was simpler than tr and awk, I should STFU? Give me a break.
> I used slackware linux. I wrote assembly code when I was a teenager.
Lol are you me?
I grew up with Slack/assembler/softice/etc and I used to host everything on Slack boxes on Linode (bless them for having a pre-built image). But I left it a while ago because I got sick of dealing with hand upgrades and compiling everything from scratch. I feel the same way about other things. I could probably out-CLI most of the people here saying "you should learn linux!" but sometimes I just need to get something done.
If I'm trying to get something out the door, I don't give a rat's ass about the details. It's fun and interesting when you're a teenager and it's 4am on a summer night and you've got nothing better to do, but these days my time is strict, my energy limited, and I'm not going to spend more time fiddling with things than I need to.
I mean, you can just put it into automation and never have to repeat the same linux cmdline dance to do the same thing again. There are plenty of configuration management tools to do so. Hell, I outright forgot some skills because I encoded whatever I'd ever need from a given tool in a Puppet manifest and never had to touch the internals again.
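For example, a minimal Puppet sketch of that "encode it once" idea (the nginx resource is just an illustration):

    # install a package and keep its service running; the manifest is the documentation
    package { 'nginx':
      ensure => installed,
    }

    service { 'nginx':
      ensure  => running,
      enable  => true,
      require => Package['nginx'],
    }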
Docker, k8s and friends are essentially just a level above "just CM"; they still use basic linux primitives, but with some coordination.
The "new toys" are essentially abstractions over what a sysadmin would do (anyone remember building chroots? Yeah, I didn't like it either, even if it felt "cool" the first time I did it), and every abstraction will leak, which means once you are big and complex enough, you will need to debug it. It's just that a random hobby project probably won't get there, and even at work it might be someone else's job to debug that.
> I should STFU? Give me a break
...you probably should; your whining and bragging doesn't add anything to the discussion. "Oh I played with the cool kids a few decades ago, look at me, Mr. Important"
I didn't spend six years on a PhD to be called Mr. Important.
I'm joking.
Okay, bringing this conversation back to a normal tone: I have specific needs and want to get specific things done. I agree everyone should learn basic unix. I also think people should get as low to the metal as possible so they understand performance.
But can you acknowledge that there is a near infinite number of things people should learn, and that the priority of that list might be different for other people? That for some people, and their technical goals, they might not have the time to prioritize multi-server web app dev when they're focusing on figuring out how to get GPUs to do DDP correctly? I never learned k8s because all my web apps have super low usage, e.g. data annotation frameworks. I don't know JS and I guess maybe I'd love to, but I'd also like to wait until the field stops moving so quickly so that I could just pick a great framework and not have to relearn everything constantly. HTMX is cool and gives me joy.
Can you acknowledge that building tooling that allows devs to focus on what gives them joy is actually a nice thing? We have enough problems in the work we like that we end up spending endless hours debugging. If someone refuses to learn what they need to solve their main technical pursuits, yeah, that's a moral failing, but nonetheless there's a finite surface area we can cover, and I prefer to focus on learning the things that come up in the line of duty.
p.s. what are you working on? Maybe we can play together. I am not being snarky. I'm collaborative and maybe you're cool too. :)
>I don't know JS and I guess maybe I'd love to, but I'd also like to wait until the field stops moving so quickly so that I could just pick a great framework and not have to relearn everything constantly.
/js rant start
I think when that happens it won't be because JS stopped moving, it will be because people will just go "fuck it" and replace it wholly with <favourite language> + WASM.
At least that's how I've managed not to learn JS properly for years. Web frontend work is just drudgery, and I only did it a few times, when we decided that trying to explain ops stuff to a frontend developer so they could make an admin page out of it was more complex than stitching some shitty code together ourselves.
And every single time it was a miserable experience, and I think the only thing I made that is still on a supported framework is fucking jQuery, because every other one seems to shit on backward compatibility and decide to just make a new framework, then call it the same name but increase the number after it.
/js rant end
> Can you acknowledge that building tooling that allows devs to focus on what gives them joy is actually a nice thing?
I did just say that yes, we have those tools, and it is fine that a dev might not care about the internals. I just want to highlight that the whole "I don't care what's underneath as long as it works" might eventually bite, so doing the boring part of understanding something about the underlying system might save a lot of effort going forward.
Especially if that bit of extra effort allows for an overall simpler architecture, as easier debugging usually comes with it. "Just look in the logs" or seeing what a process is doing directly in the system using installed tools is infinitely easier than fucking with docker/k8s CLIs, sidecars and other methods to observe the app in a container.
Recent example: a bunch of devs in a company we provide infrastructure for got starry-eyed for k8s, and we just recently had to re-explain that no, the "just give us a big POSIX filesystem mounted everywhere" on dozens of nodes isn't gonna work well, and the fact that you can tell k8s "give me storage with ReadWriteMany" doesn't mean it will just work. That was after they tried to make a shared block device and run XFS on it, which fell apart for obvious reasons.
That was after the same company had a year+ long migration from "a big GFS2 volume that was slow as fuck, because GFS2 is slow as fuck when you have dozens of nodes on it" to plain S3.
One meeting later they figured out they didn't even need shared storage in the first place, but nobody researched that before, and their solution "worked" on a 1-node test k8s cluster (for obvious reasons). A bit of research would have saved the whole ordeal.
A less communicative company might just go "right on, we will set up CephFS for you" and go ahead, then get more maintenance and the inevitable "this clustered filesystem doesn't work EXACTLY like my XFS partition on ubuntu" problems. It would probably be better for billable hours tho...
A more low-scale example: my VPS just has a few systemd services for the few apps I put there, plus unattended-upgrades, and it has been near-zero maintenance for years. Granted, I knew how to do it because that's my day job, but the setup on my VPS is 100x less complex. Maybe I should write a blog about it...
> p.s. what are you working on? Maybe we can play together. I am not being snarky. I'm collaborative and maybe you're cool too. :)
Well, the pandemic/nearby war kinda fucked up every project I started.
I was working on a midi2sid chip eurorack module, which was delayed for a year+ just waiting to get parts. I think they might be available again; maybe I should resurrect it. It did run Rust and actually played notes, I just ran out of pins on the MCU I used, so I had a nice big board with a bigger MCU and some niceties added, made it ready to be assembled... right as the chipageddon started.
I was also working on a car datalogger, and even got a prototype working, except recent economic fuckery put any track days on hold. The plan was to finally get a house, but inflation put any sensible mortgage out of the question for at least the near future, so all the fun money is going into savings now.
The other plan was to replace the car's radio with something newer and integrate that with it (probably via a dumb "just send the rPi HDMI image thru the infotainment screen" approach), but yeah, can't really do that without a garage in the first place...
I experimented with home automation a bit and had some ideas (involving embedded Rust, it is surprisingly palatable) but, well, again, the plans involved having a home, and there isn't that much you can do in a rented apartment.
In the meantime there are like 30 projects that scratch an itch nobody but me has, which were never touched again after the scratching part started to work fine.
I did decide to implement the CPU from The Art of Computer Programming. But it looked annoyingly weird, so I just made a Go Z80 "emulator". Then I rewrote it in Rust as an exercise (both with ImGui as the frontend). Then I realized that getting the chips other than the Z80 "right" is a ton of work and... not that fun work to boot, so I called it a learning experience for Rust and ImGui and left it alone. It did run enough assembly to do something tho. It no longer runs after a Rust version upgrade tho. So much for backward compat...
But currently nothing really interesting. I have these cycles where, if work is interesting, I spend most of my free time in the brainless/semi-brainless fun bracket, and if work is boring I do the interesting hobby stuff. And, well, a recent company buyout and some other changes made sure I'm having all the "fun" at work I can take, so I haven't been touching hobbies much.
So many things (deployment and productionizing) are easily possible without Docker, so why add that layer of complexity unless you know you have a good reason?
Why make Docker an essential part of a system before you actually need it? There's a reasonable chance that you never will. I know this because a lot of business and hobby systems have been in operation since before Docker existed.
Docker has value in many cases, but it certainly doesn’t have to be a necessity.
A Go app which is one blob and some static files? Sure. A Java app that just needs a JVM on the system? Go ahead; a .service systemd file with some limits is all you need.
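A minimal sketch of such a unit (names and paths are made up; the limits are just examples):

    # /etc/systemd/system/myapp.service
    [Unit]
    Description=My single-binary app
    After=network.target

    [Service]
    ExecStart=/opt/myapp/myapp
    User=myapp
    Restart=on-failure
    # some basic limits
    MemoryMax=512M
    LimitNOFILE=8192

    [Install]
    WantedBy=multi-user.target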
Ruby on Rails? Put that radioactive shit in a container, or else your server will need a bunch of -dev packages just to compile gems and get it running (alternatively, you'd have to compile the gems in the same environment the server is running, which is more work than just making a container).
I've been deploying Rails since v 2, and never with Docker. The only time I ever had trouble was setting up Rails dev on M1 Mac when M1 was new. In Linux production, it just worked; and pulling/building gems was less a drag than what people regularly experience in the NPM world.
To be fair, I don't use a ton of gems because I generally don't like additional dependencies unless the value is really there. So maybe my use cases have been too basic to experience the pain.
I mean, it isn't a big deal, just a few packages need to be installed; it's just that compared to "put a blob on the server, run it" it's more complex. It's all fine if you use any kind of configuration management, as it's written once in a manifest and done.
But if you just install it directly via the package manager, leave it alone, and then need to, say, reinstall the server or something, the knowledge is lost.
I remember a lot of ping-pong with ruby devs who, well, didn't note down which system libs and dev packages they needed, just had them installed locally at some point and forgot about them, then surprise when the app wouldn't compile on the server.
Most of us are perfectly capable of learning new technologies, and there is no shortage of software and disciplines begging for our attention and adoption. For me it's not a matter of "can I learn this" but "do I want to spend my most limited resource--time--poring through docs and tutorials to become a few levels below competent at x, y, z, etc, etc, etc interesting systems." Some systems are intrinsically rewarding because they yield thinking tools that can be applied to other systems, while others, like Docker, are pretty much never fun and are just a tool.
One consideration for small vendors who target mid-market or larger customer segments is Vendor Risk Management assessments. These typically dive into resilience (among other things) along with roles and responsibilities. If it's running on a PaaS a good chunk of responsibility can be delegated under the shared responsibility model.
> You don't need to be a full-blown sysadmin to know how to do basic deployments, and learning these things will make you a better developer.
Yes. True. However, it is also true that we have a limited amount of time per week and a limited number of weeks on Earth. Time spent learning sysadmin-y stuff is less time spent mastering developer-y stuff. Think about what it means to be a "full stack" engineer in 2023:
- Unix literacy including common tools (awk, sed, grep, etc)
- Docker, etc.
- Relational data stores
- Key/value and document stores
- Base frontend technologies (HTML, CSS)
- Frontend frameworks and the associated language
- Build pipelines to tie it all together
I've never worked with anybody who's deeply knowledgeable at all of them, or even most of them. I would submit that it is practically impossible to master all of them while also holding a full-time job and writing and shipping production code, and staying "current" on all of the above, unless you're willing to go way past 40-hour weeks.
I've grown extremely disenchanted with "modern" software development. Your typical full stack engineer is frankly bad at most of the damn stack and they're not great at any of it. Nobody masters anything anymore. You get people with IQs three standard deviations above normal thinking they need to spin up an armada of AWS crap just to render "Hello World" in 2023. And they still manage to make a mess of it. I can't tell how much of this is brain damage and how much of it is a sociopathic commitment to resume-driven development.
In practicality, I'm a fan of "T-shaped" engineers. Master a piece or two of the stack, and have light knowledge of the adjacent layers and what they do. e.g. If you are an application-level coder, you should know what Docker/K8s/etc are and when/why you might use them and when you should eschew them. And honestly, I think they should be eschewed fairly often.
The places you can go with a bit of good old vertical scaling are pretty impressive in 2023. A single box with e.g. 24 cores, 64GB of RAM, and fast solid-state storage is a god damn supercomputer when it comes to spitting out web pages. Think about how far Stack Overflow got with that model and at which point you might start to outgrow them.
---
edit: By "mastery" I mean "fairly comprehensive competency" and not "be a recognized global authority." You can fluently use 50-75% of what the tool offers, and you know what the other 25-50% is so you can learn it and utilize it when needed. You can ship code using this tool that is maintainable and performant. For example, perhaps you've never utilized partitioning in Postgres, but you know what it is and you understand the appropriate use case for it, so you can learn and employ it when needed.
Fwiw, I've used all of those in my career, extensively here and there. I've written many thousands of lines of bash scripts. I've even written my own php-for-bash-script style code tags that support arbitrary shells. I've written my own log-based distributed KV store, gone down the YouTube trail of writing my own DB, gone through the Angular and React iterations, dived deep into Docker, Docker Compose, k8s, CRDs and custom handlers, and handled infrastructure via infra-as-code and with a control plane.
I also have a wife and a 2.5 year old.
It’s certainly possible if you don’t stop yourself.
Sure, you can do all that, but have you also written video games, UI frameworks, compilers, Twitter bots, localization tools, and web-based whiteboards? Have you ever made a music video or gotten a novel published? If not, why not? I'd say it's because you had different interests and spent your time elsewhere, which is completely fine and not a problem at all. People are allowed to have their own interests and explore the amazing world of technology in their own ways.
It's weird to say that people aren't learning these things because they're "stopping themselves." They're just going in other directions. No one in my life has ever accused me of learning too few things and failing to learn by building things.
Edit: I think I misunderstood the parent comment, sorry for the defensiveness here
As the sibling comment notes, yes -- my point was that the 2023 version of a "full stack developer" is distressingly shallow in most of the individual skills.
I'll use myself as an example. I am a "full stack developer." I have fairly deep backend knowledge. But my React skills basically amount to the ability to make small JSX tweaks, and my sysadmin-y skills are also at roughly the same level.
Conversely, the folks who are really sharp at front end technologies tend to write some psychedelically bad backend code in terms of scalability and maintainability.
For some projects this admittedly doesn't matter. Some projects are simple and don't need to be wonders of engineering.
No offense and not here to toot my own horn, but just because you are better on one side of the stack doesn't mean that true full stack developers are unicorns.
Like others upthread, I am one of them, and probably because I've been paid to do this job for 16+ years.
Honestly my challenge is making potential employers understand that yes, I know my way around Elixir or React, as I do around C or Rust, as I do around sysadmin (not only DevOps) or DBA work, as I do around low-level system code. I'm no world expert at any of them, but most companies need a person that can wear many hats when things go crazy and specialisation is for insects anyway.
Sadly, generalist engineers like me have lost: the term "full stack developer" has been stolen from us by recruiters and less skilled devs and now means "can use Express and React and maybe deploy on Heroku".
> I know my way around Elixir or React, as I do around C or Rust, as I do around sysadmin (not only DevOps) or DBA work, as I do around low-level system code
To what level do you know these tools?
This is a challenging discussion because the idea of "knowing" or "being good at" a tool is so nebulous.
As I mentioned elsewhere I'm using a definition that is essentially, "the level of skill a solid engineer would acquire after working full-time with a technology for 1-3 years." Not world-class expertise necessarily, but enough time to surpass basic literacy and achieve a real fluency. To encounter edge cases and pitfalls and develop well-supported opinions about best practices.
If you have that level of fluency in all those tools, great! Sounds like you're pretty awesome. In my experience that level of mastery of the entire stack at once is exceedingly rare.
I've written C since I was 14 in 2001. I wrote a small operating system (up to reading and running a binary from ext2) around 2004 in C, so I know how a computer works at low level. I might still be able to write x86 assembly.
I've been a MySQL DBA full time for 3 years.
I've administered Linux systems since I was 14 in 2001, and for all my professional career.
Then there's Python, Go, etc.
Probably the one I know the least is React, which means I was PM on a React codebase for a short while, and spent way too much time debugging weird issues with Next.js. I stopped paying attention post-hooks, since the frontend world changes too fast, and I'm getting old.
--
Again, this is not to toot my own horn, it's just that if you live and breathe computers, and hate doing the same thing for long (I blame my ADHD), over a long enough time you tend to have quite the repertoire. I think I'm quite average compared to other people that have been around as long.
You'll soon notice there doesn't tend to be anything revolutionary in computing. After your third framework and language, you'll keep finding the same ideas and concepts with minor variations.
I'm also fairly generalist although my title is cloud engineer, and have all the skills listed by the OP + IaC infrastructure deployment and cloud stuff, as does everyone on my team. I think that's the trick; everyone on my team is responsible for everything, so we're constantly taking tickets in whatever we're weakest at and consulting each other, which causes really fast skill growth. I can see how it would seem impossible in a more siloed environment though.
> I know my way around Elixir or React, as I do around C or Rust
Interestingly, I find that I've never been asked to write anything in a compiled language yet. I've been wondering how I could work those skills into my job, but it seems like it doesn't come up much in the things I end up working on
Good for you guys, but there is always someone doing more, always someone better, so why start a pissing contest? You turned someone's comment on why deploying SQLite is not trivial into an opportunity to pat yourself on the back so we can all say "wow look at berdon", but in the end, it doesn't really matter.
I've dabbled in the whole stack as you've said. I've written a web framework, done some Docker, done some fun Javascript demos, shipped web and desktop code in a bunch of languages. Built and maintained my own physical servers back when that was a thing people did. Windows, Linux. Currently relearning C and 68K game development (at a glacial pace) as a side project. Did assembly language in college.
Ran a business. Did everything from building the servers, the code, the marketing, the design, community management, merch, and event planning. Call it "ultra full stack," maybe?
So yeah, it's extremely possible.
But I'll also tell you bluntly I was only really good at a few of those things. Most of those things, I was bad at or perhaps more accurately I just acquired the bare minimal knowledge to get by in my specific and limited use cases.
And many of those skills decayed quickly. I knew a thing or two about SEO and front end development in let's say 2000-2012 but those fields change fast and little I knew back then is relevant now.
That's why I say that in 2023, I think the notion of "full stack developer" has grown untenable. You cannot be good (in the sense of shipping actual maintainable and performant production level stuff) at the entire stack simultaneously now that complexity at each level of the stack has multiplied relative to 2005 or 2015. For example, your own db -- is that a fun (and impressive!) hobby project or is that really production level stuff?
To be ultra clear, I am not knocking what you have achieved. It sounds awesome and I suspect you have dived deeper into more things than I. Kudos, and I mean that sincerely.
I heard a story that the inventor of Python was once asked in an interview at Amazon to rate himself from 1-10 in Python. The interviewer didn't know who he was and it was just a standard question. He said 7.
I don't think you have to be a 10 to be good. But you might need to be a 10 to be exceptional. And I will agree that it would be very challenging to be a 7, let alone a 10, across multiple domains at the same time. Certainly, across months/years one's focus might wander and thus their exceptionality might as well. Being able to adapt and pick up new things at a 7 level though - is the real FSD.
But I see your point and agree with you about the shallowing of the FSD term.
I think it highlights how difficult it is to even discuss these things. We're all using different definitions of "good" and "mastery" and "7/10" and "10/10".
> Being able to adapt and pick up new things at a 7 level though - is the real FSD
This is a really interesting line of thought.
By my definition, I actually don't think this is possible, at least not for larger languages/tools/frameworks.
My reasoning is this. To be a 7 (my definition) requires time. You need to not just understand the basic premise and syntax of a language, but you need battle scars. You need to have shipped some code in that language, gotten familiar with the ecosystem of libraries, you've troubleshot production issues, and become familiar with common pitfalls and how to avoid them. You've checked out some large codebases in that language and gleaned best practices and things to avoid. You have probably also spent some time in that community, watched/attended presentations from recognized leaders, and have a sense of which way the wind is blowing.
Everything I just described takes time. I don't think even the smartest person in the world can drop in and achieve that immediately unless we're talking about a relatively simple tool.
I mean, could somebody who already knows CSS pick up TailwindCSS and be a 7 quickly? Absolutely.
Could an engineer who is new to Ruby/Kotlin/Python be a 7 quickly? Not by my definition, not by a longshot. In my experience, seasoned developers drop into existing codebases in these sorts of languages and make a mess of things at first until they get used to the ecosystem.
You did all of these in the last 2.5 years? If not, that part seems a bit irrelevant.
But regardless of the response, some kids are way easier than others (easy kids are those who like to eat and like to sleep, imho), and some parents have more help or stricter separation of parental duties.
I really appreciate you taking the time to respond with these details. I do go deep on application-level tech (CSS, HTML, JS with and without frameworks, iOS dev, etc) and it's always disheartening for people to claim that choosing not to focus on specific technologies makes you a worse human being.
The other thing that's going completely over people's heads here is that I made this repo with the intent of helping other people who know less than me. I know I can figure out all the deployment bullshit, but I mentor folks who can't, and also don't have much money, and yet still need to deploy their portfolio projects to the Internet in order to find jobs. I was trying to find a balance of ease-of-maintenance vs monetary cost. If there's an even simpler solution with an even lower cost, that a random bootcamp grad could maintain, I'd adopt it in my template in a heartbeat.
If any of these people claiming it's easy want to work with me on a reboot of the "deploy a web site easy and cheap" template, I'd absolutely collaborate with them on it. I think it's important to democratize all the cheap computing power lying around these days.
> and it's always disheartening for people to claim that choosing not to focus on specific technologies makes you a worse human being.
Hey, I'm the guy who wrote that original response to you. I apologize if that's how you interpreted my comment, but that's not what I meant by it. I wasn't even trying to say that you're a bad engineer (and certainly not a bad human being), I just wanted to say that I disagree with the attitude of that comment. I'm sure if I said it to you in person, you would have picked up on the non-hostile intent and narrow scope of my criticism, but the internet tends to twist things in very negative ways.
Nobody can know everything, and not knowing something doesn't make someone a bad engineer. Software development in particular is more about learning than about knowing, which is why when I see someone saying that they don't need to learn something because <X>, it bothers me. In your comment, you said that you don't want to be a release engineer, but IMO that's a poor excuse since "release engineer" isn't even a role that exists at many places.
And for the record, I didn't even look at the project you linked to. So if you also thought I was criticizing it, then rest assured that I'm much too lazy for that. When I was still learning, online tutorials and resources like yours were extremely valuable to me, so I know how helpful they can be to people even if they're not "perfect" by some snobby asshole's definition...so keep at it!
And thank you so much for doing what you do. That use case of, "how the heck do I host my portfolio projects?" is really unreasonably confusing in 2023. You are working to remove a significant barrier to entry in our industry.
Unless you're one of the very few known as top in a field, mastery of a specific technology adds very little value and is always in danger of becoming more legacy than relevant. That's in comparison to someone with enough knowledge to assist in or handle any phase of getting a product into the hands of customers and making money. Most things don't need master-level work or knowledge to be successfully implemented.
> mastery of a specific technology adds very little value
I disagree in the strongest possible terms but it's certainly possible that we're working with different definitions of "mastery" so I'll give you mine. My version is perhaps actually closer to "competency" rather than "deep under-the-hood knowledge." I'm talking about somebody who knows how to use a screwdriver without stabbing their own eyeballs out, and when to use a screwdriver, but isn't necessarily like... inventing new types of screwdrivers or innovating in the field.
Somebody with Postgres "mastery" should have a strong command of basic normalized table structures, they should know how to construct useful indexes, they should know the downsides of over-indexing, they should know how to find and optimize slow queries, and they should at least know of somewhat advanced topics such as partitioning, materialized views, foreign data wrappers, and so on. It is okay not to have a clue how to set up e.g. partitioning but you should know what it is so that you can learn about it and employ it when needed.
Somebody with "mastery" of an application framework such as Rails should be competent at basic OO design, they should be familiar with Ruby structures and idioms, they should understand what goes where in Rails' MVC paradigm, they should be able to write comprehensive and performant tests, and they should know how Rails' asset pipeline builds and emits front end code. I would not expect them to have mastery of Ruby's metaprogramming, but I would expect them to know what it is and have some idea of how Rails uses it to extend Ruby. (I realize Rails is controversial, but I'm not talking about anything Rails-specific. Feel free to substitute your language+framework of choice)
In short, the sort of working knowledge you might commonly gain with 1-3 years of real-world production experience with a given piece of the stack. Or perhaps less if you were really focused on it and just learning it full time.
Without guidance, folks with less knowledge than this tend to produce absolutely unscalable, unmaintainable spaghetti code for anything larger than a toy project. To return to the screwdriver analogy, they can kind of use a screwdriver, but they're constantly stripping screw heads, at which point either the project is ruined or they need some help from a more practiced craftsperson.
> always in danger of becoming more legacy than relevant
I would certainly agree that you can go too deep to be useful as a "full-stack" developer. For example, if you really understand Postgres internals and can write your own extensions and have a bunch of commits in Postgres itself, cool, but what happens when our next client wants us to work on their MySQL-based app? And while Postgres is unlikely to die anytime soon, what happens if you've achieved that level of expertise in a tool that does fall by the wayside? I've certainly "mastered" a few tools that are distant memories.
>learning these things will make you a better developer.
Of webapps. It will make you a better developer of webapps. It may surprise you to learn that there are more kinds of software than that. Sometimes the final product is something like a binary that runs outside the browser, and deployment is no more complex than putting a tar.gz'd executable on a website.
I'm not sure what that has to do with sqlite. I'm very much a non-web-developer, I never use Docker and "deployments" and stuff, but sqlite is absolutely trivial to install and run, and use for non-webapp stuff? Even operating system components use it.
If you’re actually busy then everything you say yes to necessarily means saying no to something else. If I start a hobby project to learn about and play with technology X, and I end up yak-shaving technology Y, I’m wasting my time and not achieving my goal.
Hey that's really cool, thanks! I'll consider adding a link to it in the repo.
I'm a little skeptical that any given PostgreSQL free tier will stick around indefinitely, after what happened with Heroku. And once you hit 500MB, you jump immediately to $25/mo, so if you're running a hobby project, your choice is either to delete data or start paying $300/year. On the other hand, I'd expect a well-optimized read-heavy SQLite app to scale to 10GB+ without breaking a sweat (speculating wildly) and costing more like $3/mo in storage.
I speak from experience here—until recently, I ran a site that would have been 10x cheaper if it had used pure SQLite instead of managed Postgres.
PaaS my ass. What's so difficult about setting up a VPS and installing PostgreSQL, MySQL or whatever floats your boat? At Hetzner.com (no, I don't work for them) you can get a dual-CPU VPS with 4GB RAM, a 40GB SSD and 20TB of traffic. That'll get you off to the races with Spring Boot and PostgreSQL if you limit the JVM to half the available RAM. For something like Rails, Laravel or Express it's even easier.
For that price you can get a $10 dedicated server with similar specs. Installing and configuring Postgres isn't a big deal. There are several helpful guides available. For small projects the config isn't important at all. For everything else it is just editing a few lines in a text file.
I'm not sure where the impression that this is some arcane art comes from. For a typical Debian system it is just an apt-get. Yes, the default config has tiny limits for work_mem. Just edit the .conf
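Concretely, something like this; the config path varies with the Postgres version, and the 64MB value is just an example:

    sudo apt-get install postgresql
    # raise the tiny default work_mem (commented out in the stock config)
    sudo sed -i 's/#work_mem = 4MB/work_mem = 64MB/' \
        /etc/postgresql/15/main/postgresql.conf
    sudo systemctl restart postgresql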
Part of this might be that AWS or whichever other proprietary environment has captured developer knowledge at this point. Instead of sane INSTALL files, projects tell users to use a prepacked Docker image.
By doing so, I am a failure, and whatever I'm doing is not worthwhile because installing PostgreSQL on my $20 VPS means I cannot possibly achieve /Google/Facebook/Amazon/ scale.
One significant caveat: "Free projects are paused after 1 week of inactivity." I'm not complaining, but I reckon a lot of hobby projects would see sporadic activity. For example, I used to upload a list of my CDs to a VPS, so that when I found something interesting at a music store, I could check to see whether I already own it (#FirstWorldProblems).
I think your message implies that performance is the axis that matters and if you don’t need performance at all you should use something other than SQLite (like a file system with json)?
> They assume you're comfortable spinning up public web servers
I'd argue you should be.
I'm currently learning a low level language. It has nothing to do with my job, will not directly be a benefit to my work, and won't look good on my C.V.
It will (and has) _indirectly_ improved my work, because I have a better understanding of why certain things higher up are the way they are, which allows me to make better decisions.
You can ignore everything that's not "your area", whatever that might be, but in my opinion it means you'll be stuck at a certain invisible level because of it.
The sarcasm and intentionally-missing-the-point here is not really in the spirit of HN, but I'll try to address what you seem to be trying to say, which is that "it's obvious and I'm an idiot for not seeing that it's obvious":
- Will multithreading make it break? (Not with WAL mode, but you have to set it manually, as this article suggests; see the sketch below.)
- When using a PaaS, you need to explicitly add a volume and mount it on your server machine, which you might not think to do if you're a brand new bootcamp grad
- You can't(?) run migrations from another process (or at least, people don't seem to talk about doing this), so you need to prevent your server from writing while you run the migration. Even if I'm wrong, and I would love to be, it's frustrating that people don't talk about the completely ordinary need to run migrations on a database.
- Backups just mean copying the file somewhere, which is nice, but you might need to configure that yourself instead of just using somebody's managed Postgres backup feature
So, there is at least some complexity that comes with managing a "real" SQLite web server.
(I'm probably at least 25% wrong on some details above, but that's kind of the point, it's not always easy to figure this stuff out.)
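For the WAL bullet above, the manual step is a pragma at connection time. A sketch with Python's sqlite3 (journal_mode=WAL persists in the database file once set; busy_timeout is per connection):

    import sqlite3

    conn = sqlite3.connect("app.db")
    # let readers coexist with a writer instead of erroring out
    conn.execute("PRAGMA journal_mode=WAL")
    # wait up to 5s on a locked database instead of failing immediately
    conn.execute("PRAGMA busy_timeout=5000")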
It's not sarcasm. The fact that it's a library and you point it at a file matters, and should be thought about. It implies that it's not built for distributed systems. It's not supposed to be a managed service. It's not a good option for what you appear to want. You deploy it as part of your application... it's a library.
> It's not a good option for what you appear to want.
Ok but the article you're replying in the comments to says "SQLite is all you need for nearly everything", and what the comment you're replying to is describing is, to use their very apt word choice, entirely ordinary.
So how do we square this circle of somebody being told both "SQLite is all you need for everything" and "it is not a good option for your totally ordinary use case"?
I think the answer is that you are a different person than the author of the article and you don't agree with the article's claim. Which, great, that was irskep's entire point!
I tend to agree with irskep that this SQLite architecture is really interesting, but I also don't quite get it. There seem to be a lot of missing details for totally bog standard applications, that people frustratingly seem to just not even mention - like the migrations thing, or how to do zero downtime deploys in general, or a few other things - as if those concepts are irrelevant or uncommon. But they aren't, they're important and typical things to think about.
"In contrast to many other database management systems, SQLite is not a client-server database engine, but you actually very rarely need that. If your application software runs on the same physical machine as the database, which is what most small to medium sized web applications does, then you probably only need SQLite."
That's how we square it. It's right there in the article.
That is a claim, not an explanation. Satisfactory answers need to be provided for the typical requirements that a small to medium web app might have, or the claim is unjustified.
(Nearly) zero down time is a common requirement. Can a live backup be made while transactions are in progress? That appears to be the case. What about schema changes? Can columns or indexes be added without interfering with access to the tables in question? And so on.
> (Nearly) zero down time is a common requirement.
Are we still talking about small to medium web apps? I'm sorry, but if HN goes down, things will be OK. In fact, there is a very large majority of services that can go down and things will still be OK.
I've realized that this whole thing comes down to the word "most" doing too much duty here. I don't think it's true that most applications just run a single db-and-application node. I've never worked on such an application. You and the author do seem to think this is true. It would be difficult for either of us to support our intuition empirically, so this is where the divide comes from.
> I don't think it's true that most applications just run a single db-and-application node. I've never worked on such an application.
Rather than applications you've worked on (many of us spend years on end working on a narrow range of applications), consider software you use (most of us flip between multiple applications every day spanning the gamut of uses). Ignore, for a moment, whatever you know about their implementation, and focus on the user-facing functionality.
Do even half of them logically require communication between application nodes?
(Consider that there is still a rich class of software that is fully capable of running locally on the "client"'s computer!)
Yes I think pretty much all the applications I interact with require fault tolerance and uptime that (to me) seems simpler to implement with separation between application and database nodes.
Most things don't need zero downtime deployment. Now it's nice to have and might be an interesting technical challenge to solve, but it's usually not strictly needed.
edit: when I worked as a sysadmin we would have to schedule outages for updating apps that were designed for zero downtime anyways because those were the processes of the organisation.
What? I've never worked for a company that would tolerate downtime during deployments. Downtime is ok for personal projects, but not for most business applications.
Most businesses have daily downtime, in that the entire business is closed outside business hours. Being sensitive to downtime is more common for companies that have some kind of online service as their primary product, but most companies are not online companies, or even global companies with offices across all the time zones.
Even for online businesses, some downtime might be acceptable or even preferred. I used to work for a company that made most of its money through a website, and it would take the site offline for 8 hours a few weeks before important events to run a full-scale load test on the production infrastructure.
Another place I worked we had to schedule downtime even when updating applications that were designed for zero downtime.
I have never worked at a place that didn't allow for downtime, or even heard of such a place other than the big tech companies.
Correct, but in the context of deploys I would hope they are not random failures, but rather planned events that you do when you have updates to... deploy.
Yes, but you also don't want to have to deploy in the middle of the night in order to avoid downtime during peak hours.
Not only is that a pain for whoever is monitoring the deployment, but now if the deployment breaks something you're going to have to go wake up all of the relevant stakeholders, if you even know who they are.
Not to say late night deployments are never justified, but definitely not something devs want to be doing regularly.
Yeah I've realized this is just a disagreement over the word "most" which won't really be resolvable empirically. It doesn't fit my experience that most applications are single node with lax uptime and failover requirements.
It is honestly difficult for me to imagine what kind of applications people are working on that have these - to me - very lax reliability constraints. But we're just disagreeing over what "most" applications are like, based on our experiences, without any empirical data to say either way, so :shrug:.
When I read an article like this that claims some architectural technique is broadly applicable, I don't think it is talking about hobby or just-for-learning applications. Certainly you can do whatever you want with those, but that's not very illuminating.
I'm not sure exactly what you have in mind for annotation apps for academia - things like zotero that run client side? If so, sure, there is a big world of client side software where a database is useful, and I think SQLite (or DuckDB) seems like a no brainer there.
I don't really agree about intranet apps, which are often even more critical to the people using them than an arbitrary consumer app. But I'll grant that for a company that spans a small number of time zones, you can at least have downtime windows outside work hours.
In any case, as I've said all over this thread, the only disagreement here is over what kinds of software is "most". And my intuition for "most" is based on my experience working on and using applications where a lot of effort is made to keep the thing running all the time while still evolving it. Maybe you're all right that "most" software isn't like that.
I haven't looked into how everyone else does migrations in SQLite.
The way I do it is to check the DB version `PRAGMA user_version;` on application start and run migrations if required, before the app starts taking connections. Yes, this means my app will be unavailable for a few seconds when upgrading. If you want a zero downtime solution then you will have to do it in a different way.
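A sketch of that approach in Python (the migration statements are placeholders):

    import sqlite3

    # MIGRATIONS[i] upgrades the schema from version i to i+1
    MIGRATIONS = [
        "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)",
        "ALTER TABLE users ADD COLUMN email TEXT",
    ]

    def migrate(conn):
        version = conn.execute("PRAGMA user_version").fetchone()[0]
        for statement in MIGRATIONS[version:]:
            with conn:  # each migration commits atomically
                conn.execute(statement)
                version += 1
                conn.execute(f"PRAGMA user_version = {version}")

    conn = sqlite3.connect("app.db")
    migrate(conn)  # run before the app starts taking connections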
I think the point to be learned here is that SQLite fundamentally does not fit the PaaS model. They are even transparent about this limited use case[0]. I work with embedded systems so I use it a lot, and all the web work I do nowadays is one-off project sites and small utilities that are usually a single process, so I end up using SQLite 90% of the time I'm reaching for a solution.
I think it's more that PaaS vendors aren't interested in first-class SQLite support when they can sell overpriced managed Postgres instead. Sure, it doesn't scale the same way, so it's hard to move upmarket and sell to Enterprise, but it's a shame that there's no one-click solution like there is for a managed database.
* Someone who knows Django but has zero devops skills can deploy my app with a few simple commands, add a fix, and demo it to me. Crucial: They must ask me zero questions.
* The backups happen to a non-you service, I one-click auth my google drive and/or dropbox.
* There are instructions on how to stand-up the web app on another service if you shut down. Those instructions might require two hours of my time, but should be complete.
Interesting question. I think you could probably get those features by deploying Piku or Dokku for like $4/mo on a VPS, and the equivalent managed Postgres on Render would be like $20/mo (16GB SSD), so something like $6-10 seems like the right range to me. Maybe not a booming business, but a nice margin percentage and worth paying for good UX. And the cheap version doesn't need to be the only version if people want more RAM, etc.
I'd find such a product really appealing, especially with a dedicated backup strategy that uses the SQLite backup API to do hourly snapshots or uses Litestream to S3 by default.
Let's say the price was $12. If this idea is compelling enough for you at that price point to venmo me an advance for your first month, I have some ideas to explore that could actually make this viable. FWIW, I've worked professionally on a PaaS and a managed database offering.
If you are trying to replicate it or make it available to a bunch of machines or similar - it's likely the wrong thing (although there are tools to do this, I've never used them). If you are just trying to back it up, there is a pretty simple backup command.
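(For reference, the sqlite3 shell exposes it as a dot-command, which uses the online backup API and is therefore safe to run against a live database; filenames here are made up:)

    sqlite3 app.db ".backup 'app-backup.db'"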
I think we disagree on the "most cases" aspect, then.
For most of the things I touch, having a single-point-of-failure data store is typically not acceptable for production environments, except perhaps caching.
If you are a good software engineer, you probably work at a place that needs good software engineers, which by definition tends to be a place with a system that needs high availability or low latency or both or something else, while serving a lot of customers or load or something, all of which basically results in a pretty complicated distributed system...
So in your day to day job as a good software engineer, yeah sqlite isn't going to be a good choice. It doesn't work well for the sorts of systems you work on. But if you were going to build a new social network for your college or something trivial like that, and you just want to move fast and have a crazy simple deployment setup, you might well use sqlite.
"It's a file" is relevant. It allows you to think of it like other files:
- how do you back up a JPG? You copy it, cause it's a file
- how do you replicate it? You copy it, cause it's a file. Unless you're talking about fancy DB replication, in which case, well, that's not really a thing we do with files much. You'll have to do more research. But that's cause you're trying to do non-file things to a file.
- how do you handle two different processes accessing a JPG? They both open and read the JPG. Same for SQLite. For other concerns, you're trying to do things that don't map well to files, so once again more research needed.
Only you can't copy it if it's in the middle of a transaction, or you corrupt it and have to roll back the journal. And copying the journal, WAL, and DB itself directly and trying to back up from that isn't recommended. In fact, SQLite itself has a purpose-built backup API.
Yes, technically a SQLite database is just a file on disk. And for the most part, you can treat it like other files on disk. Except for when you can't. The GP's questions are valid.
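For reference, that backup API is exposed in Python's sqlite3 module, so a safe copy of a live database is only a few lines (filenames are made up):

    import sqlite3

    src = sqlite3.connect("app.db")
    dst = sqlite3.connect("app-backup.db")
    with dst:
        # takes a consistent snapshot, even with a transaction in flight
        src.backup(dst)
    dst.close()
    src.close()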
Just this morning I was wondering about where to host the next iteration of my side project (GiftyWeddings.com). It currently uses Go + SQLite, and I deploy it to a small $9/month instance on AWS, but the setup involves a bunch more Ansible and messing around than I want, and I'm not even sure the Ansible scripts I wrote a few years ago would work anymore (or work a second time). A simpler PaaS-like system that works with SQLite at a few bucks a month would be great. I'll check out your repo and fly.io in more depth -- thanks!
I don’t get this. Go can be compiled to a single binary containing SQLite. Scp the executable and run it. This will work on any Linux vm without any setup…
That's what I do now. But there's a bunch more setup:
- Install and configure Caddy to terminate the SSL. Caddy is great, but still stuff to think about and 20 lines of config to figure out (see the sketch below).
- Configure systemd to run Caddy and my Go server. Not rocket science, but required me figuring out systemd for the first time and the appropriate 25-line config file for each server.
- Scripts to upgrade Caddy when a new version comes out (it wasn't in the apt repos when I did this).
- Ansible setup scripts to clone the repo, create users and groups for Caddy and my Go server, copy config files, add cron jobs for backups (150 lines of Ansible YAML).
It looks like you don't need most of this, or get it without additional configuration with Fly.io and Render.
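For what it's worth, the Caddy part can be quite small. A sketch of a Caddyfile that terminates TLS and proxies to a local server (domain and port are placeholders):

    # /etc/caddy/Caddyfile
    example.com {
        # Caddy obtains and renews the TLS certificate automatically
        reverse_proxy localhost:8080
    }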
It's kind of like the difference between Dropbox and "you can already build such a system yourself quite trivially by getting an FTP account, mounting it locally with curlftpfs, and then using SVN or CVS on the mounted filesystem. From Windows or Mac, this FTP account could be accessed through built-in software." (https://news.ycombinator.com/item?id=28153080)
That's true, but previously the PaaSs I've looked at didn't seem to have the same concept of persistent volumes, or maybe they were costly, I forget. In any case, I tried on Heroku before, and one other provider, and they didn't really support this use case, or pushed you to use their hosted PostgreSQL offerings, which were expensive (for my budget). Fly.io looks a lot simpler and cheaper!
Your backup script is basically what I do, but I use a little Python script that also uploads it to S3, keeps the last 10 days' worth of backups, and so on.
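In case it helps anyone, a rough sketch of such a script (the bucket name and paths are made up; assumes boto3 is installed and AWS credentials are configured):

import sqlite3
import time
import boto3

BUCKET = "my-backups"  # hypothetical bucket
PREFIX = "giftyweddings/"
KEEP = 10  # keep the 10 most recent backups

# Snapshot with the online backup API (safe during writes)
src = sqlite3.connect("app.db")
dst = sqlite3.connect("/tmp/backup.db")
with dst:
    src.backup(dst)
dst.close()
src.close()

# Upload, keyed by date
s3 = boto3.client("s3")
s3.upload_file("/tmp/backup.db", BUCKET, PREFIX + time.strftime("%Y-%m-%d") + ".db")

# Prune everything but the newest KEEP backups
objs = s3.list_objects_v2(Bucket=BUCKET, Prefix=PREFIX).get("Contents", [])
for obj in sorted(objs, key=lambda o: o["Key"])[:-KEEP]:
    s3.delete_object(Bucket=BUCKET, Key=obj["Key"])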
Company offering compute and expensive hosted DBs, thus making SQLite difficult to use? Color me surprised :-)
For use cases similar to yours, I'd probably bite the bullet with a raw VM from somewhere and manually configure systemd and Caddy on it and manually upgrade Caddy on it from time to time. We would probably get a re-usable Caddyfile and systemd unit from this initial setup, making it even easier to change VMs later if needed.
It's so simple, it doesn't need a PaaS.
I really appreciate all your replies and sharing your experiences. It's helping me think through deploying my own side projects which, you guessed it, will use Caddy and systemd on a VM somewhere.
The problem is when you start to approach OLTP workloads. That is not a space for SQLite.
I have thought about jacking inotify into sqlite3_busy_handler() to get exclusive writers doing better than random waits, but this full API isn't exposed to PHP (where I need it), and doing it in C would suggest alternate approaches, maybe even with xargs at the shell.
Oracle has a DBWR process that manages itself, but a write-heavy app on SQLite must explicitly declare one, and there are many further traps down this path that will trip the unwary.
Respect. I’m curious what use case for >500k writes/second or 5k write txns/second you’re implementing? I know nothing about inotify and random waits. What else can you tell me?
The normal operation of the busy handler is to accept a maximum timeout (also implemented at the SQL level with a PRAGMA). Setting this at the C API level also installs a default busy handler.
The inotify interface (which is specific to Linux) allows the kernel to alert a process when writes or close/writes occur (among other filesystem activities). The arrival of such events is a more efficient way to check if the lock has been released, rather than a wait/retry of limited duration.
Let's look at this from the shell:
# session 1:
$ sqlite3 test.db
SQLite version 3.34.1 2021-01-20 14:10:07
Enter ".help" for usage hints.
sqlite> create table foo(bar);
sqlite> .quit
# session 2:
$ inotifywait -m test.db
Setting up watches.
Watches established.
# session 1:
$ sqlite3 test.db
SQLite version 3.34.1 2021-01-20 14:10:07
Enter ".help" for usage hints.
sqlite> insert into foo values('hello, world!');
sqlite> begin transaction;
sqlite> insert into foo values('so long, world!');
sqlite> delete from foo;
sqlite> commit;
sqlite> .quit
# session 2 output:
Setting up watches.
Watches established.
test.db OPEN
test.db ACCESS
test.db CLOSE_NOWRITE,CLOSE
test.db OPEN
test.db ACCESS
test.db ACCESS
test.db ACCESS
test.db ACCESS
test.db MODIFY
test.db ACCESS
test.db MODIFY
test.db CLOSE_WRITE,CLOSE
"Path units" under systemd also use this interface.
Note that with Neon you don't need to spend $15+/mo for Postgres, because it separates compute from storage. Compute can scale down to zero, and storage is cheap.
Congrats on the upcoming launch! I'm definitely very strongly considering using Neon in my upcoming project. Mind if I ask a couple questions?
Would you be able to comment on whether Neon is a good fit for having one Postgres database per user, and how well that would scale? E.g. what if there are millions of users?
Also with the managed service, is there help with applying migrations or helping manage migrations for such a multitude of postgres databases?
Lastly, is there any way of querying across Neon databases, e.g. aggregating data from many users' databases? I saw you partnered with Hasura a bit. Hasura would be a great fit here for aggregating and "federating" each user's Postgres database, but with their most recent price change I'm scared of integrating too much with them. Wondering how you think one can best query, say, a million users' individual Postgres databases?
Thank you, and feel free to ignore this; I know it's kind of asking a lot.
One database per user is something many of our customers are already using today.
I think if there are millions of users it's a bit of overkill, because you will likely have some very light users that you can still colocate on one database, while giving heavy users a dedicated one.
With Neon you can do either. Our minimum configuration is 1/2 core which may still be too much for one light user.
Thank you! That makes a lot of sense. Yes, it's probably overkill to do one DB per user; better to split them off when they become heavier users. Thanks again for the awesome tech, eh, and good luck on the formal launch!!
SSHing into a new Linux VPS can be intimidating, but there are a few good DigitalOcean tutorials that'll take you from a fresh VPS to Dokku, and once you're there it's very much a PaaS.
This is cool to see! I’ve been working on a website for online ordering, and a key requirement is free hosting [0]. After some searching I figured fly.io + SQLite would be enough for my use case but I can’t be 100% sure until I actually deploy, which I haven’t yet. It’s nice to know someone else has done it this way successfully!
[0] - It’s for a small business not in the US with no serious requirements on availability etc so free, or close to it, seems achievable.
Not the poster you replied to, but from my anecdotal experience across most of my work:
I can only remember 3 companies out of the 40+ I consulted and contracted for that actually needed a separate machine for the app and the DB, let alone ones that truly needed several of each.
It's a very non-romantic truth but most projects out there can easily fit in a VPS with 4 CPU cores, 8GB ram and 100-200GB SSD space. App + DB + self-hosted telemetry included.
Is $15/month really that much of an expense? I share my PostgreSQL instance across projects; it really does not seem like that big of a deal to me, considering it's such a big hobby of mine.
Specifically: setting up database replication sucks, setting up failover sucks, setting up backups sucks (even with a PaaS you need to do this, but you can use it as your first layer), migrating database clusters to new versions or hosts sucks, keeping your host's software up to date sucks, and setting up alerting and monitoring sucks.
It's easy to get an open port to connect to, but hard to keep it there for five years.
I've done self-hosted databases. It's easy to take it for granted if you've done it before, because once you've done it, you know that it is easy. But even though it is easy, it requires work. You have to spend time learning how to do it well, and you have to go through all the steps of setting things up and configuring them. There's all the backups and monitoring that you want. Death by a thousand cuts, and all that.
I still self-host databases for personal projects, but I can see why so many people don't want to bother. When I've done it professionally, I was basically working as a sysadmin and managing stuff like database servers was a core job function, not something tacked on to a development job.
If you install your own copy, you are on call for it, and the colo is unlikely to offer much help. You also have to set up monitoring or you won’t even know when it fails. Then there’s replication. Backups. All this stuff is work that PaaS vendors are ready to automate away, if my time is expensive for the org.
For fun, sure, dink around and learn as long as there are no customers to affect.
You are not wrong, but for each personal project this upfront cost is 2 hours maximum, with maybe 5-6 more accumulated over the course of the next 3-6 months.
Not a huge sacrifice. Though I do get the argument of "I want to pay $5 and have it just work", and I've done so some of the time. Just pointing out that the upfront investment in doing it on your own is not that big.
I suppose Terraform could do it (I’m pleasantly surprised to see it has a PagerDuty plugin), but it’s a lot more to write than a PaaS API, and you probably can’t borrow a specialist SRE/DBA for a small fraction of his salary.
>The only time you need to consider a client-server setup is: Where you have multiple physical machines accessing the same database server over a network. In this setup you have a shared database between multiple clients.
This caveat covers "most cases". If there's only a single machine, then any data stored is not durable.
Additionally, to my knowledge SQLite doesn't have a solution for durability other than asynchronous replication. Arguably, most applications can tolerate this, but I'd rather just use MySQL with semi-sync replication, and not have to think through all of the edge cases about data loss.
That's not what the parent means by durability, they mean having your data survive any one of your machines being instantly nuked from orbit at the most inconvenient possible time.
Just having sync replication is enough, doesn't have to be fancy like semi-sync.
'durability' already has a well-established, rigorously-defined meaning in this context, which is confusingly similar to pitr but definitely not the same thing
the downside of sync replication, as i understand it, is that although your data will survive any one of your machines being instantly nuked from orbit, your entire service will go down; semi-sync avoids this problem
But they’re using the other well-established meaning of durability a la how AWS and others describe their storage platforms. It’s pretty much the same thing but taken at whole system level. On that level an ACID database is as durable as the underlying storage medium which is sadly not very durable.
well, it's sort of arbitrary that the standard definition of durability requires your data to survive machine checks and kernel panics and power outages but not disk failures, isn't it
especially since nowadays in many data centers disk failures are far more common
The OP's point is that the single-process ACID semantics of SQLite don't provide a durability guarantee that includes replication.
Other databases have a commit level that makes sure logs have been shipped.
For me this is an edge case in just about everything except financial transactions (the performance penalty of distributed transactions is pretty high!) but it is correct to note it.
Sounds like IMS runs on Z system mainframes with redundant hot-swappable CPUs and memory. They pay IBM a lot of money for the illusion of a single reliable machine, when a different OS would manage it as a small cluster.
We economize by using racks of cheap, flaky commodity hardware, but we have to be ready for one computer to die by failing each application over to another.
Does this practically improve the situation? The odds of two servers breaking at the same time for the same reasons seem very high. I actually can't think of a single example where the secondary server would keep running.
Regression via a code or dependency update? Full disk? DNS is down? Too much load? All of these would bring down both servers in quick succession.
I guess something like a "once every 2 days" race condition could buy you some time if you had a 2nd server. But that's not a common error
Zero downtime upgrades, hardware fault, aws decides that specific instance needs to die. It also doesn't let you cheat statelessness very easily, so it's easier to scale horizontally.
Fair enough I guess. I don’t think you need two servers to do zero downtime upgrades. And the other issues are, imo, beyond the 0.99 uptime threshold that most services realistically have when you add in breakage due to upgrades.
I like your statelessness point. I suppose in your view it’s better to have the concentrated stateful core with stateless servers as opposed to just one stateful instance. Two instances mean you can’t easily store foo in memory and hope the server doesn’t die until it’s not needed there anymore. Counterpoint is that the extra layer of indirection is 10x slower and horizontal scaling won’t be needed as much if you don’t pay that price in the first place, but you are right, the temptation to store foo in memory would still be in its prime. The thing is, if one machine can scale, putting foo in memory isn’t actually bad. It’s only when things don’t scale that it’s bad.
> I don’t think you need two servers to do zero downtime upgrades
Absolutely not, and I can't understand why I keep hearing this argument. Doing zero-downtime upgrades on a single server has been simple since basically forever: run another process on another port, change the config, restart the front balancer gracefully, and there you go.
We use a 3-node MSSQL setup, and it happens all the time that the primary gets into a bad state (100% CPU, high latency, etc.) and simply failing over to another instance fully recovers.
It could be bad hardware, it could be a bad query (left dangling/canceled on the old instance), it could be bad statistics or disk fragmentation, etc.
I'm with you, but you could also make the case that most small web service businesses still run a single Postgres instance with no redundancy, just backups. So you have a single point of failure. You can get quite decent uptime out of a single VPS.
This project comes to mind https://github.com/rqlite/rqlite but I've never used it, and I'm not sure if it would count as "pure sqlite" like the op advocated anymore.
https://litestream.io/ does streaming replication to S3 (or similar service). With this, you probably have better data durability than a small database cluster.
My understanding is that it provides asynchronous replication, so you'd still lose some data if you lose a machine. This is documented here https://litestream.io/tips/#data-loss-window
I guess this is an interview level question.
1) Drain connections from your instance: stop taking new connections and let all existing requests time out. This could be done by removing it from a load balancer or DNS. This ensures your Litestream backup is up to date.
2) Bring up the new deployment; it restores from Litestream. When the restore is complete, register it with the load balancer (if you are using one) or DNS.
3) Delete the old instance.
Yes... and all I see here is downtime. How do we do this without service failure? With a Postgres DB, you can spin EC2 instances up and down to your heart's content. Every 1-4 years, you can upgrade your DB by using a replica, with no downtime.
Depends exactly what you mean with "durable". One machine with RAID10 can be pretty durable and solves the most common problems with disk issues, other risks can be managed too.
Ah, that brings back memories. Had 2 RAID 10 MySQL servers run for a decade without rebooting. One had an app db, the other a stats db, and the two replicated to each other.
Spinning disks and all, I was terrified to reboot them and have the boot disk fail (which was not on RAID).
The main disks failed once or twice which slowed the servers down considerably until rebuild of the raid finished. Very nervous time.
Durable in the database context refers to durability of transactions, i.e. your database does not lose a record of committed transactions. A good example is an ATM withdrawal.
I like SQLite as much as the next guy, but its built-in datatypes are limited.
Things like arrays, UUIDs, geometry stuff, JSON, etc.
Sure you can store more advanced stuff as blobs or text but then you have to mess around with deserializing it in the host language and you lose the ability to query it directly in the db engine.
The biggest one missing is date and/or time. The workarounds all suck:
- Store the date as a huge, wasteful string in ISO8601 format
- Store it as Unix epoch seconds
- Store it as a fractional Julian day
Besides the first one, you have to remember how the date is stored and ensure all client libraries handle the conversion. If you want to view or manipulate the latter two formats in SQL, you need to chain a bunch of conversion functions.
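For example, converting either numeric format back into something readable means going through SQLite's date functions; a quick sketch via Python's sqlite3:

import sqlite3

conn = sqlite3.connect(":memory:")
# Unix epoch seconds -> readable timestamp
print(conn.execute("SELECT datetime(1700000000, 'unixepoch')").fetchone())
# Fractional Julian day -> readable timestamp
print(conn.execute("SELECT datetime(2460000.5)").fetchone())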
I meant hardware caches like L1, L2, and L3 on the CPU.
SQLite is used in some HPC work.
ISO 8601 datetime strings can easily wreck your L1 cache: instead of filling an eighth of a 64-byte cache line with a 64-bit value, you wind up filling almost half of it with a 27-character string.
This is what we do. Have been storing 100% of our timestamps this way in SQLite for ~8 years now. Using .NET to handle the actual conversion to/from long.
var myTimeUtc = DateTime.UtcNow;
var myTimeUnix = new DateTimeOffset(myTimeUtc).ToUnixTimeSeconds();
var myTimeUtc2 = DateTimeOffset.FromUnixTimeSeconds(myTimeUnix).UtcDateTime;
No drama at all. No weird libraries or utility methods. It's all simple built-ins these days.
Agree, this is a giant PITA, but I think it could be easily fixed. The main sticking point with Unix epoch is the SQLite CLI interface, but some kind of marker could easily (?) be added to columns to say "this is an epoch timestamp", along with a client feature that parses those and formats the date. Done, problem solved.
Any code (usually one codebase) which looks at dates in a SQLite DB can already easily do these conversions, even at the application-level.
That's true - but I think it goes back to "you will need." It's nice to query these things in the DB, but for most users you can just load everything based on associations and sort it out in memory. It's less efficient, but most of the time you will be ok.
It's OK until you have to deal with loading a bunch of point-cloud or geometry data based on associations and sorting it out in memory. Then Postgres becomes your friend.
I mean, yes - "you only ever need C in most situations" is true. You might like to use something else. You might be really glad to use something instead of C. And also...you could generally get by with C. Just like you can mostly get by using SQLite.
And I'm not sure what sort of support you'd expect for a UUID type. Depending on how you represent the UUID, it's either a string or a blob -- I can't think of any meaningful operations to perform on a UUID which go beyond basic comparisons.
I agree. For my little applications I've looked at Postgres because it has much richer data types, but I can't justify the huge complexity increase of Postgres. So SQLite it is.
Add partitioning and sharding; basic search in other DBs is pretty nice as well. I've come in on a project where the team did custom trigonometry for distance queries that took 30 seconds to run once the data grew beyond their original data set. All so they could use SQLite for local development. :facepalm:
Also, the moment you need two databases for high availability, or access to it over the network, you end up reinventing MySQL, but using SQLite rather than InnoDB.
From a technical perspective, almost every tool is more than what you need. There is a lot of mature software that does amazing things.
Whether the tool fits with your architecture and specific use case is much more important than whether it does the job. You can make most tools work for most use cases, but it might not be a natural fit.
For example, an in-memory database is probably not conducive to a serverless environment, and you would prefer to either host your own DB server or use a serverless DB.
Or perhaps there are specific Postgres plugins that enable your use case, or a specific Postgres feature like n-gram search (I don't know if SQLite supports that), etc.
Technical maximalism ("it does all the things!") is great for marketing, but a poor way to choose the appropriate technology for your application.
> If your application software runs on the same physical machine as the database, which is what most small to medium sized web applications does, then you probably only need SQLite.
That's the adoption problem. It has a hard ceiling. Besides embedded, most engineers use a different DB professionally - dare I say Postgres or MySQL? And because of that, they'll reach for the tool they already know.
Sure, I get the argument that SQLite fits "more exactly" for many, many applications because of the sheer number that never move off of one machine, but it has this hard ceiling of "what if I need more", and, well, "I could just use this other tool that goes the full distance, in case one day... oh, and I already use it at work."
That's why SQLite excels for embedded applications. It fits perfectly. There is no "what if", and the performance is astounding, especially in low-power environments.
The question is what kind of "more" is needed. Out of disk space or hitting CPU/mem limits? Splitting off a 2nd service which handles isolated functionality could help, migrating to larger storage could help, migrating some data to cold/external storage could help, eliminating low value high storage cost features could help.
The only "more" which pushes a move to client-server DB is a CPU- or memory-limited, not easily sharded, persistent service. But to have such a problem is highly uncommon. Usually optimization will solve such problems easily. I've seen Python services which use 1GB RAM/process, each process can handle 1 concurrent request, and there's only 20GB of RAM per instance. The solution there is to use one of many sane async frameworks to handle more than 1 concurrent request/process. Some problems, such as latency of DB queries/N+1s and the query complexity which ensues, will not even arise in the first place if you were using SQLite.
> The only time you need to consider a client-server setup is: Where you have multiple physical machines accessing the same database server over a network. In this setup you have a shared database between multiple clients.
Am I misunderstanding this or is this not the vast, vast majority of all cases?
And so it is, but I have never gotten that as an answer when asking.
edit: I feel like I should be more specific here. I would only call it a considered technical reason if it was actually considered. The fact that it is possible to come up with good reasons is not relevant if no thought went into it at the time of design/development.
The technical reasons are limiting blast radius, easier permission management, and easier scaling.
For blast radius, you have things like patching (you don't want to botch the database by accidentally updating a shared dependency with your app) and resource management (you don't want your DB to eat all the RAM or I/O and kill your app); and if someone botches maintenance, there's less to break at once.
You don't have to use VMs but they're one way to do it. Container orchestrators achieve many of the same goals and automatically restart workloads if a physical machine fails, too
And yet it has been done just about everywhere I have worked in the past. The justification is usually that VMware will move the machines to another host if one goes down.
I would love to use SQLite for all my Django webapps that have only several simultaneous users, but this article suggests there are too many footguns for me to be able to do that.
Is there a "using SQLite for a multi-threaded webapp for dummies" package that does all the config I need so I can just drop it in and go and not tune anything?
Paging fly.io founders etc! If I have a persistent volume can my fly.io apps use SQLite? What are other good micro-hosting options?
2) When you want to do a transaction that does writes, use `BEGIN IMMEDIATE`, not `BEGIN`.
3) Don't have long-running transactions.
4) Have some process to do backups.
(3) might be a big ask for some systems. Long-running transactions should be avoided even in systems like Postgres, but on a SQLite system with writers, they're the difference between an amazing experience and a garbage one.
I'm hopeful that Fly can eventually make (4) painless by having super-easy out-of-the-box litestream and S3 backups. Until then, roll your own cron scripts or what-have-you.
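To make (2) concrete, here's roughly what it looks like from Python (a sketch; the counters table is made up):

import sqlite3

conn = sqlite3.connect("app.db", isolation_level=None)  # manage transactions manually
conn.execute("PRAGMA journal_mode=WAL")
conn.execute("PRAGMA busy_timeout=5000")  # wait up to 5s for the write lock

# BEGIN IMMEDIATE takes the write lock up front, instead of failing
# with SQLITE_BUSY halfway through the transaction.
conn.execute("BEGIN IMMEDIATE")
try:
    conn.execute("UPDATE counters SET n = n + 1 WHERE id = ?", (1,))
    conn.execute("COMMIT")
except Exception:
    conn.execute("ROLLBACK")
    raise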
I was looking at https://github.com/irskep/cheapo_website from commenter irskep above, and they make a nice point that render.com has automatic daily backups, solving 4)
However, in another comment they mention "You can't(?) run migrations from another process" and that "people don't talk about the completely ordinary need to run migrations on a database".
I guess this is also the piece that I'm missing. How do I run migrations? Do I deploy a new version with the migration and temporarily take down the server? I'm glad to do that.
I guess I'm also walking through this because---as I said---I'd love just to switch to SQLite but I'm still not sure how many simple non-esoteric gotchas will pop up.
You can run migrations from another process. Migrations are just writes, and SQLite supports writes from multiple processes.
A single transaction that does a very large write will likely impact the reader -- the reader will be blocked while the write finishes.
I use SQLite in a web scraper on my laptop. The scraper runs as 16 processes hammering the database, doing about 5,000 write transactions/sec. Occasionally, simple SELECT queries experience high latency (what would be a 1ms query takes 100-200ms), because they're blocked while the WAL gets checkpointed into the main database.
If you have much lower volume of writes, the checkpoint is smaller and so completes much faster, and so the worst case latency is much better. Since crawling is a non-interactive process, I don't care about the worst case latency. If I was writing a website, I'd feel differently--but most websites won't do 5,000 writes/second.
> You can run migrations from another process. Migrations are just writes, and SQLite supports writes from multiple processes.
Thank you for explicitly saying this! I'll see if I can update my repo to allow for online migrations. I was trying to be as conservative as possible based on what I knew at the time. It'll be nice to update the migration steps to be simpler.
I think it's worth noting that this combination loses committed transactions on kernel crash / power failure / etc. That is, your app replied "OK" to the client, but the data still gets lost.
> A transaction committed in WAL mode with synchronous=NORMAL might roll back following a power loss or system crash.
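If that data-loss window matters for your app, my understanding is you can trade write throughput for durability via the synchronous pragma (a sketch):

import sqlite3

conn = sqlite3.connect("app.db")
conn.execute("PRAGMA journal_mode=WAL")
# In WAL mode, NORMAL only fsyncs at checkpoints, so a power cut can drop
# the most recent commits; FULL fsyncs the WAL on every commit.
conn.execute("PRAGMA synchronous=FULL")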
You can certainly put a SQLite database on a Fly.io persistent volume. You can use Litestream to keep that SQLite database backed up, transaction-by-transaction, on other volumes or in S3; soon, you'll be able to use LiteFS to replicate that SQLite database as well.
I'm extremely excited about LiteFS. Looking forward to trying it out once write forwarding lands. I think it has the potential to make hobby projects scale to zero but also serve reasonable traffic, which is exactly what I'm looking for. Thanks so much for hiring Ben!
It would probably be harder to bend Django to use SQLite as a backend than it would be to just set up MySQL or Postgres and use the existing Django tooling for it.
Ignore this, I was wrong. Django supports SQLite just fine. The last time I did anything with Django I was an intern and just copied the setup from the README, which happened to include a MySQL setup. I haven't touched Django since.
I'm not convinced this is true. Django supports SQLite out of the box, and otherwise database management is an exercise for the reader. The migration tools, for example, work fine with SQLite.
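Indeed, SQLite is what a freshly generated Django project uses by default; its settings.py ships with something like:

# settings.py
from pathlib import Path
BASE_DIR = Path(__file__).resolve().parent.parent

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.sqlite3",
        "NAME": BASE_DIR / "db.sqlite3",
    }
}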
Agreed perhaps, but I still don't want to devops Postgres and I'd rather having tiny hosting costs for tiny apps. So I think there's a definite need here for this tooling.
SQLite does it all if you look closely enough. Even for performance it can turn out to be the best option.
If you dare combine (properly-configured) SQLite with a local NVMe disk, you will find yourself well beyond what many hosted solutions can provide (clustered or otherwise). To be clear - the biggest reason for this is the incredible latency reduction, not raw IO bandwidth or disk IOPS (although this helps massively too).
Millions of transactions per second don't mean much when there are no dependencies between them. The figure I am more concerned with is serial transactions per second: 1 logical thread blocking on every command.
SQLite is the only database engine I have ever used that can reliably satisfy queries in timeframes that are more conveniently measured in microseconds rather than milliseconds.
I agree. Most uses of databases definitely don't need to grow larger than, say, a single filesystem, or a single application, or a single host, or a single network, or a single geographical region, or a single customer, or a single organization, or a single global network of customers in organizations in regions on networks on hosts on applications on filesystems.
There could not be any features of any other database that SQLite might not have, or that an application might need, or want.
We definitely should not, like, read a book on databases, or read the manual of another database, or something else crazy like that. There's no reason we might learn about other databases. They are just "shiny stuff", meaning, there's something going on with them that I can't see, because of all the glare.
Honestly, the existence of all those other databases, and database models, and the billions of dollars spent on them, is a fluke, probably. It's unlikely you will ever in your life see or work on an application that needs a database other than SQLite. Because the only applications you will ever work on won't ever run on more than one virtual host, or be used by more than one application. And definitely you will probably not need a feature that isn't in SQLite.
SQLite is very fast, it is very simple, it is very well written, and it has a lot of tests. Therefore we can conclude that you should never look at or learn about another database, because, considering the previously stated information, we know that no other database could possibly be desired or needed.
I don't know a lot about databases. And, granted, I only just found out about SQLite. But I am quite sure I am correct that SQLite is the only one you'll ever need in most cases.
I don't think this characterization of SQLite advocates is fair. I've written pet database systems from scratch, studied CS theory at university (including databases), and worked for big, famous tech companies.
Looking back on my long career, I can say that literally every application I've ever worked on, with one exception, could have run on a single modern machine, probably with SQLite as a backing store. I'm not sure, but there's a pretty good case that such a design would have improved things a fair bit.
The network was never our problem. The hardware turned out to not be the cause of any of our outages. Outages were always caused by a software bug or a misconfiguration of our complex applications. These misconfigurations were in large part caused by avoidable complexity. The complexity was often due to us trying to solve for scale and reliability problems that we didn't really have.
You can easily host multiple customers or organizations in any database. That's not something that SQLite or SQL Server or Postgres helps or hinders.
Very few of us ever work on a Google or Facebook scale-- even though we think we do.
Anyway, in my career, the one exception I can think of was a big-data application that really did need many servers just to store the actual petabytes of data. Even there, though, we could probably have partitioned the data across SQLite instead of Elastic and probably done alright.
All of that said, I run Postgres in production like a sane person, because I don't want to have to deal with managing a database. Render manages it for me so that I can focus on reading Hacker News. So, this entire rant is moot.
I bet most non-FAANG programmers have indeed never worked on an application which could not be built on a single host with SQLite. And SQLite is in fact more fully featured in some ways than some client-server DBs (when will Postgres add support, even via a plugin, for primary indexes?)
I agree with you that it is important to recognize where a client-server DB may be needed, but it really almost never is.
For perspective, same conversation, but different subject:
"A bicycle is all the transportation you will ever need in most cases.
I bet most non-bicycle drivers have indeed never been anywhere which couldn't have been gotten to by bike. And the bike is in fact more fully featured in some ways than some cars (when will cars shrink down? not require fuel? not pollute? cause less harm?)
I agree it's important to realize when a car may be needed, but in reality it almost never is."
Even if all of that is true... I still want to use a car, for good reasons. I may not need it, and it's more complex to use and maintain. But I get enough value from it that it's worth it.
Yeah, it works when it works. It's just that in many cases you'll run into a scenario where it completely doesn't, or where it's missing something crucial, and we'll get another prodigal-son story about going back to Postgres.
Having learned SQLite before PostgreSQL, I get that experience every so often with PostgreSQL... Both have nice features that the other does not, and what you got used to is going to determine what you miss when picking up the other.
SQLite does have strict typing and check constraints, I suspect the R*Tree module with check constraints would provide rectangular exclusion, though not circles.
Have any applications you've built needed both of these features?
Also worth mentioning is Fly.io's work on Litestream [1] and LiteFS [2], which give SQLite S3-based DR/reliability and multi-node replication and scalability, opening SQLite up to even more use cases.
We're making use of this ourselves in https://blazordiffusion.com, which runs entirely on SQLite on a single Hetzner US Cloud VM at €13/mo, using Litestream to replicate it to Cloudflare's R2 object storage.
As we believe SQLite + Litestream is a very cost-effective solution that can support a large number of apps' data requirements, we've added first-class SQLite + Litestream support to our .NET project templates [3]. These use GitHub Actions to run Docker Compose app deployments and set up Litestream replication to AWS S3, Azure Blob Storage, or SFTP in a sidecar container, which also supports running DB migrations on the server with rollback on failure. If anyone's looking to do something similar, the GitHub Actions + Docker Compose configurations that enable this are maintained at https://github.com/ServiceStack/mix/tree/master/actions
I have similar arguments for "Firebase is the only database you will ever need in most cases" for web apps, be it that you need real-time capabilities or not.
I can also confidently say: a static HTML landing page is the only website you will ever need in most cases. I suffer every time I see a one-page site, hardly ever updated, built with Wordpress. Why hire an 18-wheeler to deliver a pizza?
> I have run SQLite as a web application database with thousands concurrent writes every second, coming from different HTTP requests, without any delays or issues.
Is this with nodejs or something single threaded as the webserver?
I would kind of assume you'd run into issues with something like PHP.
In production, Node is generally run in clustered mode (multiple processes per instance). SQLite works fine with multi-process and multithreaded applications if you set a longish timeout for acquiring the file lock.
It does work better when all of your writes are on a single thread, though. I used Node’s IPC to accomplish this, and was able to get it up to 10k or so writes per second while still having it do tens of thousands of simple reads per second.
Blocking can become an issue pretty quickly with SQLite unless you do what this guy does and make a different database for essentially every process writing to the db. This can crop up in all sorts of situations.
For example, today I was writing tests for a Python application I'm working on that uses SQLite. SQLite is the only option I have for this program, as I have literally no way to set up a client-server DB in its environment.
I had the tests all configured to write to the same test db file. The program I wrote the tests for had no issues with locking because it doesn't run anything concurrently.
However, when I did the same in my test suite, they ran in parallel, which caused my tests to hang. Of course, I figured this out quickly and made the tests write to their own individual DB. This is a really small program, so cleaning up afterwards is no big deal. (I could also just use an in memory db instance instead of even dealing with files). But, for a program with a LOT of tests, I could see this being an issue.
The answer here is non-trivial. I stopped developing with PHP long ago, but PHP has several run modes (and multiple runtimes, like HHVM) that may bring different performance and concurrency characteristics.
Last I remember, PHP still runs everything inside a single process, so it all shares the same memory space, and it no longer has the overhead of starting a new process with every request.
The piece of engineering that made node.js so fast back in the day was Libuv, which allowed for non-blocking IO, greatly reducing the number of system calls/context switches.
But I am also going to guess that PHP developers have since caught on to the performance benefits of non-blocking IO and integrated the improvements into the runtime. Doing a quick search for "php vs nodejs benchmark" on Google [1], it seems the performance of Node.js is comparable to that of HHVM.
So, as usual, use the right tool for the job. This is more a problem of using your tooling correctly than of choosing the correct language.
Node.js uses non-blocking IO to achieve massive amounts of concurrency on a single thread, while one-thread-per-request models such as PHP rely on OS threading to do the exact same thing.
If anything, the node.js model is often capable of more concurrency because of the inefficiencies involved in OS threading and context switching.
EDIT: of course, event-driven PHP exists. I don't know much about its current state.
And if you need to migrate your database schema in any non-trivial way... well, then you're on your own. For anything beyond adding a column to a table, you'll have to copy the whole table to a new one with the structure you want, drop the old table, then rename the new table, carrying along all the foreign key constraints and other constraints as you do so. Or use a tool that does this (I wrote one such tool, and it's not fun to maintain).
If SQLite allowed for custom commands, at least there could be ALTER commands that run this process behind the scenes, which the SQLite developers wouldn't have to maintain.
SQLite has commands to rename columns (this is somewhat new). Which other migration is not supported without new table/copy/drop old table process?
Also, MySQL can't run a migration on a FK-constrant table without downtime. To do this you need an online schema migration tool which generally requires the absence of foreign keys.
> SQLite has commands to rename columns (this is somewhat new). Which other migration is not supported without new table/copy/drop old table process?
Adding or dropping any constraint, including nullability, foreign keys, and check constraints; changing the structure of the primary key; and changing a column's type, even given SQLite's squishy typing model.
Pretty much nothing is allowed except adding and renaming columns.
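For the record, the copy/rename dance looks something like this (a sketch for adding a NOT NULL constraint; the table and columns are made up):

import sqlite3

conn = sqlite3.connect("app.db", isolation_level=None)
conn.executescript("""
    PRAGMA foreign_keys=OFF;
    BEGIN;
    CREATE TABLE users_new (
        id    INTEGER PRIMARY KEY,
        email TEXT NOT NULL  -- the constraint we couldn't ALTER in
    );
    INSERT INTO users_new SELECT id, email FROM users;
    DROP TABLE users;
    ALTER TABLE users_new RENAME TO users;
    COMMIT;
    PRAGMA foreign_keys=ON;
""")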
> Also, MySQL can't run a migration on a FK-constrant table without downtime.
That's not an issue for the overwhelming majority of MySQL databases in production, which, to be clear, are not running or aspiring to run at Facebook/OLTP-level scale. A table with a few million rows can be migrated in seconds: the table gets locked for a few seconds, and everything keeps running after a brief pause. This is not a problem for the "most cases" use case the article refers to.
> To do this you need an online schema migration tool which generally requires the absence of foreign keys.
If you are running at Facebook/OLTP scale, or aspiring to, then yes. Otherwise, not usually. The article here is referring to SQLite being used for "most cases", not "running at Facebook/OLTP-level scale". Running at that scale is still one of the places where you most certainly would *not* be using SQLite for your primary database.
Faire used gh-ost for MySQL migrations in 2018. We'd run some migrations in a blocking fashion if they were on a smaller table, but even the smaller ones still took a few seconds. I suspect the delay would be very noticeable (15s+) for data sizes far smaller than FB's.
See I’ve been using Vitess on Kubernetes for even personal projects and I gotta say I love that I can run, for 10 bucks a month on Linode, the same tools that I know by experience I can scale to a multi-billion dollar valuation worth of customers. Heck I even run it in development on my laptop thanks to Skaffold.
Sure it’s all insane overkill - but I use Linux for the same reasons - I want one API that I can use everywhere, from my toaster to my spaceship, from hobby to enterprise.
The simplicity is not the API. The simplicity is having one API.
You also run a company that offers Kubernetes self-hosting as a business, which is quite the bias.
Anyway, the question isn't whether client-server databases scale, it's whether SQLite doesn't. What were the database size, resource constraints, and architecture for the multi-billion-dollar company? How did this scale over time? Do you have any reason to believe that a SQLite architecture couldn't scale to support the service offerings you saw?
If you're exceptionally careful and skilled about writing smart queries, SQLite can for sure scale to the point where you can afford to redesign anything you want. The idea behind Vitess and sharded MySQL (and, for that matter, Kubernetes too) is that you can move fast and not make a terrible mess of things. I can split off one poorly designed table with Vitess; with SQLite, that would be an application redesign.
But in general I compare the intended scale to successful startups I’ve been at - where several TB of data are being read by hundreds of thousands of users per second. Typically this requires a fleet of the largest instance types most cloud providers offer.
Anyways - you’re not wrong and neither is the author - but if I’m going to choose one tool - I’d rather it work for all intentions and be a bit more complex than the other way around.
Well, if you're talking about a single query, virtually no one hits that scale. But in terms of overall QPS in quickly scaled apps, six-digit QPS is super common. All it takes is a poorly programmed notification or messaging feature to hit that number. Also, that was for sure a high-water mark, not the typical usage!
I have wondered why Synapse, the most feature-complete Matrix homeserver, so vehemently recommends against using SQLite as its backing DB. They say the performance is insufficient and that it's only appropriate for testing purposes.
That would make sense if you assume that Synapse is only going to be used in instances with hundreds/thousands of users, but plenty of people host their own instances for themselves only. Surely SQLite would be plenty for single-user instances, or family instances?
The problem is that even a single-user Matrix server can be very resource-intensive if that user joins big rooms with thousands of users spread over thousands of servers. Synapse is very database-heavy, so the parallelism in Postgres helps a lot; plus, some of the hot DB paths have special-cased queries for Postgres that use some of its more obscure features that SQLite lacks. Finally, we don't dogfood or optimise Synapse with SQLite, so there's a risk of perf regressions.
Which obscure Postgres optimization features does SQLite lack?
Can you share any extra info about the table layouts or queries which are slower in SQLite vs Postgres? In particular which postgres-specific optimizations have been made?
Thank you, Matthew! This would be fantastic information to include in the Synapse documentation, for those of us being seduced by the operational simplicity of SQLite, and posts like this. :)
Adding my voice to this - I was actually looking at Synapse a few days ago and was wondering why SQLite was not recommended. This would have answered a question I didn't know the answer to until just now.
I guess it depends on how SQLite is configured and what storage it's deployed on. If it's set up with WAL mode on SSD storage, then it should handle thousands of writes a second fairly easily.
To be fair, if you consider yourself an engineer (which I imagine many HNers do), that's essentially your whole job: getting the requirements fulfilled with the least complexity, cost, time, etc. If deployment difficulty or hardware usage is part of the requirements, then it makes sense to try a lighter-weight "serverless" database (SQLite doesn't use a client-server model, so it's serverless, got it??? I'll see myself out).
Definitely not easier to deploy, at least if you're talking about how most people build software these days (what SQLite managed service are you familiar with? For MySQL or Postgres there are hundreds of companies offering this).
The difference between installation for sqlite and postgres is the difference between typing `apt install sqlite3` and `apt install postgres`.
The actual learning curve for SQLite is more complex, because picking up random SQL queries from the internet will end in misunderstandings about SQLite's underlying types, and in broken data. The Postgres tooling ecosystem is much more expressive, because more people think of Postgres as a "real" database.
I'd go further and just point out the obvious: all DBs have incredibly well-supported container run options in 2023, which means you don't have to install the DB at all. I know of no DB that cannot be easily learned or deployed this way; even Microsoft SQL Server has a Linux image these days.
Since the advent of containers for running software, deploying any DB software is largely as easy as any other with a one-line run statement, and if the container image is cached you get a new DB in seconds.
If just starting out, running PG from a container is exactly how I would tell someone to learn today, and the process is largely identical whether they learn on windows, linux or a mac.
I haven't natively installed a DB since probably 2015 for any kind of local development use; it's just a waste of your time more often than not now.
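For example, something like this one-liner gets you a throwaway Postgres for local development (the tag and password are arbitrary):

$ docker run --rm -d -p 5432:5432 -e POSTGRES_PASSWORD=secret postgres:16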
Yeah, Docker is great, and I've done exactly the same as you for a few years. But I don't think I even knew what a Docker container was until 2018, and again, for an intern or college kid with a school/work Windows workstation, telling them to run it in a container is a non-starter.
Ever try to install Postgres on Windows? On a school or work computer you may or may not have admin access on? That's the kind of environment a lot of people (myself included) started in, and in that sort of environment it's easier to run sqlite3.exe database.db. That's how a lot of us got started with databases.
Depends upon what you are optimizing. If you are optimizing for learning as few tools as possible then yes, learning PostgreSQL as your one database is a good idea.
Personally I use both PostgreSQL and SQLite. I like SQLite because it's a lot nicer to work with, it's easier to quickly develop for and the ease of deployment and operations can't be beat. I use PostgreSQL when working with other people mostly because it's what everyone else knows and it's great for large systems.
Why learn Postgres when you could just learn SQLite and have a database that is virtually guaranteed to be enough in almost all cases (and as sister comment says, is more efficient and more secure)?
Yeah, that is my thought as well. With how easy it is these days to spin up managed Postgres (or compatible) DB services, it really seems unnecessary not to just start with that.
I am ashamed to admit I've run into a case where SQLite was the wrong choice.
I have a cluster of 20 Windows 7/8/10/11 machines spread across multiple sites on which I wanted to run some log analytics.
This was for some custom software running on all 20 machines, which had a SQLite API.
So I did the simplest and cheapest thing that would work:
I set up a sync.com folder on all machines (think Dropbox/OneDrive) to write logs to the same log.db file across all machines.
This worked great at first (I could analyze the db remotely).
However you see the potential problem here...
Logging was maybe a few writes an hour PER MACHINE, but inevitably you start getting conflicts (since sync takes 5-30 seconds to actually propagate).
Now I am faced with merging 20 conflicting log.db files.
In theory I should have used a server-based SQL database.
Or perhaps I should have just lived with 20 different log.db files.
In my defense, there was only a SQLite API, so I would have had to write some middleware to transfer the logs to another DB.
I’ve argued this before and I’ll argue it here now:
Modern computers are fast enough that in many cases “the only database you will ever need” can be files on the filesystem. For example “1 row = 1 file”.
It brings additional benefits as well: for low-write applications you can use git to get history (plus transactions, if you store them in the log), backups are super easy, and replication is trivial. For higher-write applications it gets more complex, but you can still plan and implement most of the traditional DB scaling techniques (and even add them one at a time as you grow).
Computers are “stupid fast” now that we’ve gotten off platters.
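A minimal sketch of the "1 row = 1 file" idea, with atomic writes so a crash never leaves a half-written record (the paths and schema are made up):

import json
import os
from pathlib import Path

ROOT = Path("data/users")  # one directory per "table"

def save(user_id: str, record: dict) -> None:
    ROOT.mkdir(parents=True, exist_ok=True)
    tmp = ROOT / f"{user_id}.json.tmp"
    tmp.write_text(json.dumps(record))
    os.replace(tmp, ROOT / f"{user_id}.json")  # atomic rename on POSIX

def load(user_id: str) -> dict:
    return json.loads((ROOT / f"{user_id}.json").read_text())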
> Name: Edward
I'm using grep and find and the Unix file system
> Name: Toni
Why do you need a database? I'm using CSV files!
> Name: Carlos
I came after a long journey to your website to seek enlightenment : am I fucked? And it didn't answer that question clearly. Let me re-phrase it: I first search an LDAP directory, then I remotely execute a quota status, after that I query a PostgreSQL database, and then I generate a .txt file with the timestamp as its name or a .csv file with an hash as its name, and then I look up the files from a web page, load it all to a multi-dimensional array, and generate a nice report, re-loading the entire file every time the user wants to, say, sort by another field. Something this complex can't ever possibly fuck up, can it?
Transactions, really? How do you lock rows? How do you have relations? How do you do joins? In fact, ext3 can only handle about 50,000 files in one directory, so you'll have to split up your "primary key" into letters like abc/def/foo like we do.
The bad-but-valid answer to locking rows and doing relations is that you write the logic into your application. The better answer is, of course, that if you're doing specific types of complex things, a DB is a better fit.
Honestly, splitting into nested subfolders is no big deal anymore; it's a single function you can write even if you're having a ".10X" day.
Store the index on the filesystem and populate it on write.
Not satire, though it is a bit sensationalist to argue it's a solid solution that's usually overlooked because it's "too slow". I'm just pointing out that it's not actually slow anymore.
Back in the days of scaled applications running on MySQL, DDR2 was 3200MB/s, and people were so happy when their DB was small enough to fit in RAM.
I feel like wasting disk space is not a real issue anymore: if you're working with more than 1TB of data, you have (or at least you'd better have) the resources to pay for bigger drives, which come at an almost trivial cost per TB.
Files in a single directory: It was discussed in another comment, but there’s a tried and true solution to that: simply nest your items in folders. For example with UUID as primary key you could have a folder structure of ‘(first 4 bytes)/(second four bytes)/(...so on)/(full uuid)’ where your nesting level is enough that you have no more than 50,000 files in each directory. For smaller pools you can reduce the layers (‘(first two bytes)/(full uuid)’ for example still gives you quite a few entries before any one folder gets to 50,000)
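That nesting scheme really is just a few lines in any language; for instance (a sketch, with width and depth tuned to taste):

from pathlib import Path

def shard_path(root: Path, key: str, depth: int = 2, width: int = 2) -> Path:
    # e.g. "deadbeef..." -> root/de/ad/deadbeef...
    parts = [key[i * width:(i + 1) * width] for i in range(depth)]
    return root.joinpath(*parts, key)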
I used to use SQL Server on a PC to deal with tables of over 1 million rows. I'm on a Mac now... can I use SQLite? What is my best option? I am not a coder; I just know enough to query SQL databases.
Are you coming from using SQL Server Management Studio on a PC?
If yes, you may be better off using something like PostgreSQL on Mac OS X, and a PostgreSQL GUI browser tool, because it may be harder to use SQLite in this situation as you don't have the benefit of using a computer language to make up for the datatypes that SQLite does not have.
For example, SQLite does not have a DateTime datatype. If you need to do a lot of date/time manipulation in SQLite, this can be made easier with a programming language like C#, because you can use a SQLite integer datatype to store the DateTime data, then convert the integer to a C# DateTime and do the manipulation in C#. But if you aren't a coder, this option isn't available to you, and you will have to find other ways to manipulate DateTime data in SQLite, such as storing the values as strings and writing the corresponding SQL queries to handle the "DateTime stored as a string" situation.
The Mac OS download is also available from https://www.sqlite.org/download.html - and in any case SQLite's source is distributed as a single C file (the amalgamation), so you could easily compile your own (though there's no need on the Mac).
This post inspired me to switch one of my projects to SQLite, but then I remembered that it uses PostGIS, and the ORM I use does not support the SQLite GIS types :(
Everyone thinks they have big data, when 99.99% of the time their entire DB could be served from a single 100GB SQLite file on a single SSD on a random dev's laptop, with better performance than the expensive AWS deployment they've rented. They think they have super-concurrent workloads where locks matter, but modern machines are so fast that data can be written faster than their users can input it.
People don't realize just how big 1MB is if it's not multimedia. If you had 100,000 users consistently writing 1kB to your DB every single day, it would still take 10,000 days, or over 27 YEARS, to fill a 1TB hard drive.
Most companies have 1,000 users per day updating mostly existing data or adding small snippets here and there. That 1TB drive would die of old age long before your average company could come close to filling even 10% of the capacity.
Nope. SQLite is already available in the same process and using the same file system as your server. In some cases (ex Python) without adding any new dependencies. It's downright silly to try to think of a way to make it easier.
Here's your easy, cheap, and lightweight relational datastore API:
import sqlite3
conn = sqlite3.connect("db.sqlite3")
cur = conn.cursor()
cur.execute("SELECT * FROM products LIMIT 25")
print(cur.fetchall())
We have a bunch of serverless functions that are generating records that fit a relational data structure. (AWS Lambda)
We would like to write these records to a persistent relational data store so another downstream process can read it.
Amazon RDS seems like overkill.
Amazon DynamoDB (NoSQL) seems like a misfit because we want to execute relational queries and some joins against this data.
We could write to CSV on S3 and query using Athena/Presto. Seems clunky and slow.
Am I missing an obvious solution here, or is there space for a service that offers a lightweight relational datastore that multiple loosely coupled readers and writers can use?
Use RDS. SQLite only works if you have one persistent machine with a disk and file system.
The problems with AWS Lambda and SQLite are: 1) network file systems often don't support the APIs that SQLite needs to implement concurrent access (SQLite is not a database server); 2) local storage for AWS Lambda is ephemeral, so your DB will be deleted, and if there are two Lambdas, they won't be using the same DB.
Just use one EC2 instance and SQLite, or lambda with RDS.
Every time I try to use SQLite I run into DB locking issues where I seemingly have to run my query in a retry loop. Am I doing something wrong, or does SQLite just not play nice in multi-threaded contexts?
SQLite is great but complex software. To use it properly, you do need to read the documentation, guides, and resources available on the official website. If you don't, you will shoot yourself in the foot.
In that sense, I have the impression SQLite is different from other database software. You can usually get by with Postgres or MySQL (after they are set up) without looking at their docs.
I spent several hours reading the SQLite docs. It wasn't wasted time: actually learning SQLite made me a better professional. But for those thinking of using it on their side (or main) project, definitely understand how it works and what are the trade-offs involved.
Now, to answer your question specifically: it depends on a few factors. You can have concurrent readers (WAL). With newer developments (wal2 + BEGIN CONCURRENT), you can also have concurrent writes, as long as they touch different pages.
If you are blindly doing multi-threaded connections without understanding the implications, you also risk corrupting the database entirely.
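As a concrete starting point, here's a minimal sketch of the settings usually recommended for multi-threaded use, assuming Python's sqlite3 (WAL is long-standing; wal2 and BEGIN CONCURRENT still live on separate branches):

import sqlite3

# Wait up to 30s for a lock instead of failing immediately
conn = sqlite3.connect("db.sqlite3", timeout=30)
conn.execute("PRAGMA journal_mode=WAL")    # readers no longer block the writer
conn.execute("PRAGMA synchronous=NORMAL")  # common WAL pairing; tune to taste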
> SQLite has always been able to handle multiple processes and threads
This is true for the vast majority of cases, although there's at least one documented scenario where using modern SQLite coupled with an old threading implementation (the one that predates NPTL in Linux) may lead to database corruption.
I guess that's why I said "blindly": as a way to incentivize OP to look this up. It's not that the SQLite database is fragile, but rather that it expects to be used in a certain way, and if you don't, you risk corrupting it.
This is true for every SQL database. Push conflicting transactions hard enough and you'll need a retry loop. In Postgres you'll see row-level MVCC detect a write conflict, with the exact same end result. SQLite's locking is just coarser-grained and tends to trigger under less load.
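If you do hit the lock anyway, the retry loop is small. A sketch, assuming Python's sqlite3 and a caller-supplied fn that does the actual write:

import sqlite3, time

def with_retry(fn, attempts=5):
    # Retry on "database is locked"; any other error propagates immediately
    for i in range(attempts):
        try:
            return fn()
        except sqlite3.OperationalError as e:
            if "locked" not in str(e) or i == attempts - 1:
                raise
            time.sleep(0.05 * (i + 1))  # simple linear backoff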
As someone who uses free hosting for personal projects, I thought I couldn't use SQLite because it wasn't advertised as being available. I'm beginning to find this isn't true.
SQLite added strict mode recently - check it out. Agreed that time handling is subpar; built-in date formatting from epoch to ISO would fix all the problems IMO.
I know about STRICT tables [0], but they still follow the quirky coercion rules. The reasoning seems to be that other DBMSs behave similarly. However, I want _errors_ if I insert '123' into an INT column, so it's easier to find problems in my code.
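To illustrate the complaint (a sketch; STRICT needs SQLite 3.37+ under the hood): even in a STRICT table, a numeric-looking string is still coerced, though genuinely non-numeric text finally raises:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (n INTEGER) STRICT")
conn.execute("INSERT INTO t VALUES ('123')")  # accepted: silently coerced to the integer 123
conn.execute("INSERT INTO t VALUES ('abc')")  # raises: cannot store TEXT value in INTEGER column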
The quirky coercion rules that PG, MySQL, SQL Server, and Oracle all follow as well? Let's be clear: if this is a problem, it's a problem with all SQL DBs, not just SQLite.
I'm curious why '123' in an INT column is so bad? I suspect the conversion rules are in place because they shouldn't ever cause logical errors.
I personally appreciate being able to write created_at < '2021-05-23' in Postgres queries. The query would only be more verbose if I had to explicitly construct a date object for it.
The reason that's not so good is basically the same reason coercion rules are a problem in weakly typed languages like JavaScript.
If I'm passing a string to an int column, there is most likely an issue in my application code. If there isn't one yet, there probably will be.
For example, I might have forgotten to parse a string properly in my application code. If I do '10' * 2 in JS, it returns 20. If I later change it to '10' + 10, it will be '1010'.
Say I then save the result of that computation in the DB. Raising on '1010' would have prevented me from persisting the error and given me an opportunity to investigate. Without an error, I'm in for a much harder debugging session.
These coercions were popular back in the 90s/00s, which is why I think most DBMSs have them. At least, that's why JS has them.
> In contrast to many other database management systems, SQLite is not a client-server database engine, but you actually very rarely need that. If your application software runs on the same physical machine as the database, as most small to medium sized web applications do, then you probably only need SQLite.
Disagree.
If you think about it from an attack-surface perspective, there are numerous advantages to isolating the database. There are also performance, availability, sharding, and columnar options out there that may better fit the use case (just to name a few). I have run Postgres on endpoints during development with performance akin to SQLite's. Further, there are numerous ways to increase performance or availability, or to pursue some of the more customized versions of Postgres, depending on the use case. One of the times I used Postgres was with Oracle DBAs, and they found the transition pretty simple.
Various customizations / extensions / versions of PG
> If you think about it from an attack surface perspective, there are numerous advantages to isolating the database.
The attack surface of PG or MySQL is a lot larger, and there are a lot more moving parts than SQLite (which is just a file). Notably, there is no service exposed to the network for someone to attack - a huge attack vector, with whole classes of vulnerabilities that simply don't exist in SQLite.
Back in 1999 or so, before SQLite was a thing, I used to throw little ASP websites together with an MS access .mdb file as the backend connected up through ODBC.
It was neat and quick and easy to get running right on your regular windows desktop.
By the criteria of this article, that was apparently the only database I ever needed. It could handle the multiple reads and occasional write of a small scale website. Backing it up consisted of copying the file. It was accessed through a simple standard library (ODBC) and supported SQL.
Since that was the only database we needed, what does SQLite bring to the table?
Saying the obvious for the newcomers is sometimes really necessary in this generation of master over-engineers - and not only over-engineers, but ones with pretty useless and unnecessary abstractions.
The output is apples and oranges, though. Since I was downvoted by someone, I'll add a simple example to show the difference between the two interfaces. I shouldn't have assumed everyone here was familiar with the respective representations.
Sample data:
CREATE TABLE users (user_id serial, name text);
CREATE TABLE comments (comment_id serial, user_id int, comment text unique);
CREATE VIEW user_comment_view as select u.user_id, u.name, c.comment from users u, comments c where u.user_id = c.user_id;
INSERT INTO users VALUES (1, 'Bob');
INSERT INTO users VALUES (2, 'Sally');
SQLITE3 OUTPUT
sqlite> .schema
CREATE TABLE users (user_id serial, name text);
CREATE TABLE comments (comment_id serial, user_id int, comment text unique);
CREATE VIEW user_comment_view as select u.user_id, u.name, c.comment from users u, comments c where u.user_id = c.user_id
/* user_comment_view(user_id,name,comment) */;
sqlite> .schema users
CREATE TABLE users (user_id serial, name text);
sqlite> select * from users;
1|Bob
2|Sally
POSTGRESQL OUTPUT
test=# \d
List of relations
Schema | Name | Type | Owner
--------+-------------------------+----------+----------
public | comments | table | postgres
public | comments_comment_id_seq | sequence | postgres
public | user_comment_view | view | postgres
public | users | table | postgres
public | users_user_id_seq | sequence | postgres
(5 rows)
test=# \d users
Table "public.users"
Column | Type | Collation | Nullable | Default
---------+---------+-----------+----------+----------------------------------------
user_id | integer | | not null | nextval('users_user_id_seq'::regclass)
name | text | | |
test=# select * from users;
user_id | name
---------+-------
1 | Bob
2 | Sally
(2 rows)
Postgres also supports adding + to commands to get extended information, e.g. \d+. You can also filter by tables (\dt), views (\dv), functions (\df), etc. It allows much more natural enumeration of the DB, which I wish sqlite had as well.
I partially agree, but then I remember my high school IT lessons, where people in my class (our profile was math and IT, mind you) struggled with Excel and very basic programming. Scripting may sound trivial to people reading HN, but it certainly is not for everyone.
Not to mention that to really benefit from scripting, you need programs you can actually execute. As far as I know, Windows (which most people use) is not very friendly in that regard. PowerShell improves things a bit, but I'm pretty sure you can't just manipulate .xlsx files with a simple shell script on Windows, and this is one of the lowest-hanging fruits I can imagine for white-collar workers.
PowerShell, and anything else that can instantiate COM objects, can edit Excel files.
Windows is a lot more friendly than you realize, and PowerShell is probably a lot more powerful than you realize.
I'll also make the point that people who are doing OK in Excel can probably model things in their head well enough to get into scripting. PowerShell also has a one-liner "export to *.csv" cmdlet, which is pretty amazingly handy.
Excuse me, are you disparaging our lord and saviour SQLite? I'll have you know that SQLite works perfectly for all use cases. If you use it in production you don't even need SLAs because it literally works 100% of the time.