Hacker News new | past | comments | ask | show | jobs | submit login
Serverless SQLite (workers.dev)
396 points by sfeng on Jan 20, 2021 | hide | past | favorite | 163 comments



This is really clever. I've been wanting to try the WASM version of SQLite for something - this is a really smart usage of it.

My https://datasette.io/ project is built around a similar idea to this: the original inspiration for it was Zeit Now (now Vercel) and my realization that SQLite read-only workloads are an amazing fit for serverless providers, since you don't need to pay for an additional database server - you can literally bundle your data with the rest of your application code (the SQLite database is just a binary file).

If you want to serve a SQLite API in an edge-closest-to-the-user manner for much larger database files (100MB+) I've had some success running Datasette on https://fly.io/ - which runs Docker containers in a geo-load-balanced manner. I have a plugin for Datasette that can publish database files directly to Fly: https://docs.datasette.io/en/stable/publish.html#publishing-...


Thanks for making Datasette! I've been using it to build a project to make US ranked choice election results more accessible. It's very intuitive and easy to use. The main difficulty for me is how to get around the limitation that the sqlite files have to be colocated on the same server. These datasets can get pretty large, and I can't host them on Github, and since I can't put the datasette db on an S3 bucket, I've been exploring mounting AWS Elastic File System to the Docker container. Is there a better way?


Google Cloud Run works well with database files up to a couple of GBs. Beyond that the best solution at the moment is to run a VPS - I've had good results with DigitalOcean droplets.


Why don't you place them onto an AWS disk and attach it to the VM instance running Datasette?


Yep, that's probably what I'll end up doing.


Hi Picardo!

We are toying around with the idea of launching a Cloud SQLite product targeting this exact use case.

How large are your data sets?

Would you be interested in a quick call to discuss your needs in more detail?


FYI, I'm using Google Cloud Run for this at the moment. Scales to zero, issues the SSL cert and provides the HTTPS endpoint. That's what I'd be benchmarking your service against. Let me know if you actually do something in this space.


hey, thanks. That sounds interesting, but this is a volunteer project. I'm donating my time, so I don't think I can justify a service like that right now. The datasets are around 200MB each, just over the 100MB limit of Github.


Thanks for the details!

We would price such a service to be competitive with the obvious alternative of using a long-running VPS or ec2 instance.


I’m using the WASM SQLite (in the browser) for a BI product [1] I’m working on.

It’s been working extremely well and it runs fully sandboxed in a separate worker thread with zero network traffic after the source data has been transferred across. With computational notebooks running alongside it, you can do some pretty powerful analytics with very low latency and it’s great to be able to do inserts and complex queries entirely client-side.

SQLite is one of the greatest software projects I know.

1. https://www.rationalbi.com


https://www.sqlite.org/serverless.html

(first published 2007, before the more recent misuse of the term began)


I think I will always struggle to understand the popularity of the term serverless.

From the two definitions found in https://www.sqlite.org/serverless.html

classic serverless --> "embedded database" exists and is often used

neo-serverless --> I suspect that this is a marketing term used to attract cool people to new cloud offerings (like "jamstack" for services such as netlify). There is no good replacement here because the term was tied to this new kind of infrastructure really early. But anytime I hear someone repeat "of course serverless does not mean there is no server!" I die a little more.

Naming things is hard but at least we ought to try to come up with terms that are not blatantly misleading.


Look, the industry has settled on the term "serverless" to mean: "You just upload some application code and the system automatically provisions servers to run it."

No one has the power to change this. It's like saying "I think the word 'spoon' is stupid, let's all call it 'scooper' instead." This is our language now, it is what it is. Vendors offering serverless solutions have to call it "serverless" whether they like it or not because that's the word customers understand. You can keep complaining about it, but it won't change. You might as well accept it and move on.

"Cloud" is a stupid term too, your servers aren't really floating in the sky. But here we are.


If only somebody would come up with a Common Interface for application Gateways that would let the webserver dynamically select based on URL routing which executable to provision temporarily to generate that page.


CGI is in no way more understandable than serverless as a concept, especially when it shares its acronym with the very popular computer generated imagery.


At least it's not made out of an English language construction which is actually just wrong.


The english language has a bazillion "wrong" constructions. What is this, kindergarten? Serverless clearly has gotten a specific meaning in the industry, which means "without manually spinning up servers".


And this is how the English language has become what it is. If someone politely points out that the use of a word is incorrect, instead of thanking and correcting themselves, people just shrug and say: "it is what it is" and continue to spread the incorrect usage. I even wonder why bother with grammar in a language like this.


It's jargon used by part of the industry. It's not like the usual irregularities of natural language. There's no need to encourage its wider adoption.


> The english language has a bazillion "wrong" constructions.

Sometimes I complain about some of those too.

Personally, I'm looking forward to the next loop around, when the concept is rediscovered again. It will need a new name. Unfortunately, that's probably going to be 20 or 30 years.


I think the idea is that if enough of us push back, then people will stop using that term. After all, "the industry" is made up of people whose minds we can attempt to change.


The "industry" is comprised of much more than techies -- there are millions of marketing people out there who get hyped up at the shallowest sounding buzzword and we absolutely never will out-shout them. Nor do customers who love buzzwords.


Buzzwords can be fickle things. “Wireless” used to mean radio. Who calls the internet “the information superhighway” anymore? “Computer” used to be a job title. How many job ads capitalise “PC/MAC” even though it’s a contraction of Macintosh and not an initialism like Personal Computer is? The first “tablet” I bought was a Wacom — an accessory, not a computer in its own right. When did we stop calling pocket computers “PDAs”?


"Digital" used to mean something that related to fingers or toes. Then it meant "electronic". Now it means "distributed via network instead of physical medium", even though it's the first "D" in "DVD".


It's not just marketers. The vast majority of engineers are just fine with this term. The set of people complaining about it are a tiny minority.

I'm the lead engineer on Cloudflare Workers. We didn't originally call it "serverless", but after talking to lots of engineers who said "Oh so it's serverless?" we decided to go with that term. The decision was not made by marketers.


So network effects. And the other engineers already bought the marketing term and are implicitly pressuring you into using it. Ouch.


There's nothing "ouch" about it. People understand a term to mean a thing, we are doing that thing, we adopt the same term, now everyone understands what we're doing. Communication is good.


Well, I agree on that. But I do get sad sometimes about how easily influenced the people at large are.


And I'm telling you that idea is not remotely realistic. Sorry.


Then it's similarly unrealistic to expect that some fraction of people who are familiar with English won't continue to complain about it.


<something>“less” has been in use for a long time and typically means a paradigm shift in how the <something> is performed. Horseless carriages still had means of locomotion; driverless cars will still have a means of effecting driver functions; eggless recipes typically have some other substance to perform the job of eggs. In my mind <something>”less” is not as objectionable as <something>”-free” like “sugar-free” which isn’t to say that something isn’t sweet, but that the sweetening comes from something else.


whereas "sugarless" means its not sweet :)


but "sugar-free" means it's sweet again


I really thought we'd have grid computing, aka utility computing. Running our self-contained software agents, persisting data and state in ubiquitous tuplespaces.

That's my excuse for being so slow to grasp "cloud". When AWS made EC2 public, I really didn't understand why anyone would want to use it.

Cloud era "serverless" just pisses me off. The applicable cliche is "Every sufficiently complex C project recreates LISP, poorly."

It's seems like we're doomed to repeatedly reinvent the past. Championed by youngsters who never read the book.

Maybe this explains "worse is better".

I know, I know. I've become the old crank. All you noisy kids get off my lawn.


That's Greenspun's 10th rule. The full version of which is, "Any sufficiently complicated C or Fortran program contains an ad-hoc, informally-specified, bug-ridden, slow implementation of half of CommonLisp."

Robert Morris' addendum is, "Including Common Lisp."

If you don't believe Robert Morris, tell me what LOOP does. :-)

(Note, this is the Robert Morris who wrote the first major internet worm, cofounded Viaweb with Paul Graham, and cofounded YCombinator, again with Paul Graham.)


I wouldn't be surprised if the term originated as a marketing tactic (e.g., AWS) to seed in people's heads the idea of not having to care about servers


You hit the nail on the head. It is all about marketing for the new front-end developers who don't want to learn "server side" manipulation db, files, etc.

Personally the whole indexdb, localstorage really break the web page as a stateless model. Why do you need save so much local data to just maintain the session? Stop putting everything in the web browser.


I can see the value of serverless in general (not this hack necessarily) in small startups that don't have sysadmins on payroll but still want to rapidly deploy products with confidence that it will just work - all without worrying about infrastructure, scaling, etc.

What I'm curious is if (1) serverless is cheaper than hiring competent sysadmins who can maintain the infrastructure instead and (2) are these savings worth being locked into a chaotic architecture and boring proprietary tooling that is forced on you for the profit of the Google, Microsoft, and Amazon monopolies?

I've recently entered the job market and personally find no joy in working in serverless environments because of the latter.


Personally I suspect that the chaotic tooling will waste more dev time than what the sysadmin time will cost you. So that it will be a net loss even before you consider lock-in and fees.


While a good argument in favor of the vendor locked-in serverless services can be made at the start of a project, it looks like a net loss during any other phase of the project.

It comes down to if you want to have generic and reusable sysadmins vs. cloud serverless specialists.


Heroku fits the bill for me. Very easy to manage and scale, still cheaper than an SRE or sysadmin.


You don't need local state to maintain a session other than possibly storing a Token, but that's very small.

Storing data locally is often out of convenience or possibly for cached data for offline support.

There are valid reasons for maintaining local state.


> Why do you need save so much local data to just maintain the session? Stop putting everything in the web browser.

I guess that's to replace the traditional desktop application with something that can be deployed over the web?


Stateless was gone with the invention of cookies and authentication.


I think I heard it first wrt GAE, but I think it was more "wry observation" then than actual "thing".


> I think I will always struggle to understand the popularity of the term serverless.

AWS marketing is very good.

Serverless was a creation of the AWS Lambda team / AWS marketing department "I helped start the serverless movement..." from the LinkedIn of Tim Wagner (https://www.linkedin.com/in/timawagner/)


the difference between serverless and managed service is actually that when there are no requests the serverless is offline and not costing you much.


All this serverless talk is just marketing speak for limited shared-hosting environment.


I think the term is just fine. For me it doesn't mean that there is no server involved to run the app, it means I'm not involved to run the server.


Serverless is an adjective so you can slap it everywhere.

Worker-based or FaaS (function as a service) don't roll off the tongue that well.

I'm not advocating for the term, just trying to explain why it is so popular.


Also, serverless solutions like DynamoDB or S3 aren't compute, so worker based and FaaS would simply be wrong.


They probably call them serverless because you are mainly billed by the number of requests instead of the number of hours it runs for (although for Dynamo, there is a cost per table and r/w)


Would we lose any meaning by just calling DynamoDB and S3 managed solutions, like in the old times?


EC2 is a managed solution but you still need to manage servers.


Elasticsearch and RDS are also managed, but not serverless.


That’s kinda unrelated. The title here could have been “SQLite in serverless apps” or something to avoid the confusion.


And https://en.m.wikipedia.org/wiki/Web_SQL_Database which was truly serverless SQLite for a web browser.


How is the OP not "truly serverless SQLite for a web browser"?


SQLite has a stricter definition of serverless than conventional.


Words can have multiple meanings and meanings of words can change.


Almost like how the English language has been literally hijacked and destroyed.

Language does change and evolve. The misuse and subsequent additional meaning for “literally” to now also be a simile for “figuratively” is a good example of how this isn’t necessarily always a good thing.


To be fair, we wouldn't have "literally" without a metaphoric interpretation of "literalis," "of or relating to letters."

I always find this particular example to be somewhat self-contradicting for this reason.


The 'misuse' of literally has been part of the English language for almost 300 years at this point [1]. It is older than the USA.

[1] https://www.merriam-webster.com/words-at-play/misuse-of-lite...


The changing and evolution of language is neither a good nor bad thing, it just is.


Disagree.

If a change diminishes the clarity, efficiency or even the beauty of the language for no good reason, then it's a bad change, at least for the purposes of communication.


How is "literally" used as a simile for "figuratively"?

A simile would require a sentence in which "literally" is compared to "figuratively", wouldn't it?

My impression, as a non-native speaker, is that the misused "literally" is to indicate exaggeration, not the quality of being figurative.


> the misused "literally" is to indicate exaggeration

Used to be. Now "literally" literally means "figuratively". [1]

[1] https://www.merriam-webster.com/dictionary/literally


From your link:

> Can literally mean figuratively?

> One of the definitions of literally that we provide is "in effect, virtually—used in an exaggerated way to emphasize a statement or description that is not literally true or possible." Some find this objectionable on the grounds that it is not the primary meaning of the word, "with the meaning of each individual word given exactly." However, this extended definition of literally is commonly used and is not quite the same meaning as figuratively ("with a meaning that is metaphorical rather than literal").

That seem to confirm my impression that "literally" is used to indicate exaggeration, rather than being figurative.

Edit: formatting the quote


You are correct. Literally is commonly used as an intensifier and has been for ages.

Complaining that modern people are literally ruining the word literally is a popular internet gripe.


Simple: "Literally" is being used figuratively.


Do you mean like "incredible" and the like?

It's just language, you might disagree but usage determines correctness (eventually).


I've also experimented with this, unfortunately the 50ms CPU time isn't enough for datasets larger than 1.5MB. And the wasm init add at least 100ms to each request even when "hot" in cache.

Also, out of a cost perspective, running a VM with SSD for cheap with SQlite will give much more requests than a CF worker for much less.

Adding writes to this is also very limited due to the max 1sec write per KV key limit.


Cloudflare's "Workers Unbound" don't have that limit.


It doesn't but it will have latency when loading in the KV keys for every non-cached request.


Don't KV loads not count towards the CPU time of the request?


Loading a KV does not take CPU time, however, loading a KV into a variable does parse the content of the KV which does consume CPU time.

Which is why it takes a lot less CPU time to do streaming and even to some degree, a streaming parsing of the content.


Mmm, the only use case I can see is equivalent to an in-memory database, which would be faster than embedding sqlite.


I've also tried full-text-search in worker by pre-indexing the content, works very fast even with a JS engine - less than 5ms to make a search in 5MB of text.

It runs out of CPU-time at 6MB of index though.

There's someone that made a WASM for full-text search too, it's definitely faster and can handle quite a lot more text.

https://github.com/wilsonzlin/edgesearch


I was thinking of making it possible for SQLite to be used with static pages.

My idea is to modify SQLite to use ajax with the HTTP Range header to fetch B+ pages from the server as they are needed. SQLite already has a VFS (virtual file system) so this shouldn't be too hard.

I am not sure how fast it would be and it would waste a lot of bandwidth. That's why I haven't made it yet.

This would only be useful for using it with Github pages.


Hm, interesting. I made this, if you want to take a look: https://github.com/TomasHubelbauer/sqlite-javascript. You can test it using the Demo link in the readme and then use the prefilled value in the Load from URL prompt. Based on this I find your idea to be very doable. I didn't do it yet, but while working on this I had async fetching of pages using Range requests on my mind. Shame I didn't get to it when I was working on this project, could be easy to put together a demo. Similar to this I made https://github.com/TomasHubelbauer/fatcow-icons which parses a big ZIP file piece-wise on demand. As you scroll, new parts of the archive are fetched and icons extracted so you don't need to download the whole thing.



Oracle had this more than 20 years ago, but mostly used stored procedures (packages) to generate the html pages...


This is interesting. The polar opposite of what I was thinking.


I have a sqlite 'on top' of a chrome cache filesystem. My usecase is to torrent the database as i also have support for files. The caveat is that i primarily use as a key-value store (the btree storage part of sqlite).

They are used to share structured and unstructured data in a p2p way where a torrent could be a database(structured) or a fileset(unstructured) .

But all of this is part of a bigger project that is basically a new sort of p2p "app browser" with a local-first emphasis but where every app can share its functionality though RPC (locally or remotely) with distribution going over DHT's and torrents.


That could be a lot of round trips.


Yes. Maybe you could increase the sizes of B+ pages ? The only reason that databases use B+ trees rather than a red-black tree or avl trees is because of the overhead of reading data from the hard drive. This would be a interesting hack.


You can increase the page size [1]. That will increase the size of the B tree nodes [2] (and other pages too but that's probably what you want):

> The upper bound on [the number of keys on an interior b-tree page] is as many keys as will fit on the page.

I think a tricky part of this idea would be the locking. Usually SQLite relies on the locking of the underlying filesystem. You could add your own mechanism that causes a lock to be assigned to a single client connection, but what if it never unlocks? (On a single machine you can tell if the client process has crashed.) You could add a timeout but what if the client process then does respond?

[1] https://www.sqlite.org/pragma.html#pragma_page_size

[2] https://www.sqlite.org/fileformat2.html#b_tree_pages


The clients would just send ajax queries with range headers to get the parts of the database that they want.

It is going to be read only.

Locking wouldn't be problem because in the eyes of the sqlite running in the browser it would have it's own read only copy of the database.


Someone has built it already for bittorrent, maybe you can reuse some parts: https://github.com/lmatteis/torrent-net

Even with an artificially slow connection it seems to be reactive enough. Actually fetching pages on-demand can only be better for the bandwidth, the real issue is going to be latency


I haven't looked at the code but this sounds cool. There is also something called Kademlia which is a distributed hash table for keeping track of seeds for torrents. https://en.wikipedia.org/wiki/Kademlia


Dqlite might be interesting to look at. They used the Sqlite VFS to front their raft cluster backend. https://github.com/canonical/dqlite


So we start adding sqlite. Then comes some sort of small framework for templating. Then we add session storage. Then we add an api endpoint so we can serve a SPA. I think after that it's really time to create a VM in sqlite so we can emulate linux and run somethings else in the 'serverless' endpoint


Lol I did this (sqlite via rest, custom urls, static file hosting) plus authentication/authorization and smtp with email templates. Everything needed for SPAs, no one was interested.


Well, now I'm interested. Link it, please? I'd love to take a look. Sounds really good!


I took it down, let the domain name lapse. It was a depressing, eye opening experience to say the least.



It's the future of the web they said!

Seriously though, if it's Truing complete, you know it'll be abused to do something completely unintended, like run a ray tracer or something.


Ray tracing in lambda could be cool... perfect candidate for horizontal scale... lambda per pixel?


That's a very expensive way of rendering, not sure I could afford more tha 30 frames per hour.


But think of all the money you'll save by not having to install and manage the server!


Using SQLite compiled to Wasm in order to push computation closer to the user is a powerful idea. I'm partial to the method of serving up the SQLite files directly and building applications around the SQL.js library [1], which includes math extensions and the ability to embed Javascript udfs. I wrote a data visualization using SQLite as the data store [2] and can attest that it's refreshing to use SQL inside of a static website.

[1] https://github.com/sql-js/sql.js [2] https://ml-ranking.geospiza.me


I did exactly that for a minor app I built for a relative (a vocabulary) - a single html page, loading a remote sqlite file, allowing for indexed user searches with no perceivable network latency. the downside is having to transfer 4mb to the client upfront, of course, but gzip shrinks that on the wire down to 1.5mb, which while heavy is acceptable nowadays.


SQLite was already serverless.

https://www.sqlite.org/serverless.html


> SQLite was already serverless.

Yes but now it runs on someone else server! Wait a minute, this makes no sense...

'serverless' is perhaps the stupidest marketing buzzword developers have come up with.


Indeed it's quite stupid. Any alternative catchy name for the concept?


We already had terms that covers about 90% of the different use cases: “managed”, “SaaS”, “shared hosting”, “multi-tendency”, “timesharing”.


tendency > tenancy (or tenant)


Yeah sorry, that must have been autocorrected on my phone. The words are visually similar enough that it’s easy to miss.


Dunno about the catchy, but Runtime as a Service could fit the *aaS pattern.


I thought the term was FaaS?


Homeless - of no fixed abode.

Catchy like vagrant but perhaps a bit insensitive (unless it was used to highlight the plight of the homeless).

Less catchy - Poste Restante or General Delivery

https://en.m.wikipedia.org/wiki/Poste_restante


"Shared hosting".


I would agree and here's why: when compared to PHP scripts of old, the concept is the same. You have some bit of executable code that gets called when a specific URL is hit.


Shared hosting is better than this. You get to use the filesystem.


sysadmin-less, or "someone else server"


Isn't the modern serverless just a "load on demand"-concept?


My intuition is:

"Don't care where it runs, don't want to manage it, I want to pay only for when I actually use it, thanks for managing it, here's some extra cash for the effort"


(Neo-)Serverless just means billing per-request instead of per-day, right?


More or less yes.

But you can only apply that metric if you remove the end users from actually touching the machines (virtual or real).

This is the "less" in serverless.

Turns out that this is also something people want: not having to be on the hook for actually administering machines, installing security patches etc etc.

So managed (platform as a) service + fine grained billing


I would have preferred something like daemonless, processless, or something that more closely describes what's being offered. My suggestions also are flawed but less flawed than serverless, imho.


How about "inetd"?


Managed infrastructure

Managed service


NoSrv


Cloudy


Cloudy McCloudface


computer-free software


well, you don't need to manage a server to run the process. So, technically, it is less server. It's not no-server, just less.


According to your link:

> Recently, folks have begun to use the word "serverless" to mean something subtly different from its intended meaning in this document. Here are two possible definitions of "serverless": Classic Serverless: [...] Neo-Serverless: [...]

SQLite was already "Classic Serverless". Now it's "Neo Serverless" as well.


Nice, you can actually do something a bit more flexible using SQLite on AWS Lambda. "Just" mount an elastic file system (EFS) onto your functions. I wrote about it here: https://franz.hamburg/writing/shared-storage-for-lambda-func...


Thanks for sharing your writeup and the potential use cases therein.

We've been building an experimental shared file system that specifically targets the FaaS setting. It supports SQLite and gets big performance gains over NFS/EFS, especially on read-mostly workloads, due to improved local caching and lock elision. See: https://arxiv.org/abs/2009.09845


Really interesting.

How do you deal with ephemeral file storage? I guess you actually need the provider (say AWS) to build this into the platform?


Yes, the idea is that the cloud provider should offer something like this. We intercepted system calls to simulate what it would look like.


Did I read it right you get 1 second responses? Why even bother with lambdas? A t3 nano reserved instance costs 2 bucks a month or something! Surely most folks don't anticipate 1000x scaling in minutes? Coupled to elastic beanstalk you can get reaonable scaling as well!


Might be useful for services that aren't continually used? For example, month or year end processes. Not convinced I would use SQLite for a service like that though, seeing as AWS has serverless RDS


This was just a proof-of-concept using the smallest lambdas available. I definitely would opt for a EC2 instance. Though lambda would be enough for something like a configuration service.


Is a lambda easier to configure and set up? With Elastic Beanstalk and flask (and the sample code they provide) I can get an api running in a few minutes..


Both are trivial and take only minutes if you're already familiar with them, but Lambda doesn't involve managing instances, their OSes, or making sure they scale up / down (at the instance level) the way you want.


I was thinking a bit about an "on-edge DB" recently, perhaps something as simple as some way to persist and sync IndexedDB data between user sessions. Not usable for everything but probably great for a small CMS or a blog.

For running a write-enabled DB on a CF worker a major problem is that the only storage option that I could find has really low write limits [0], 1000 writes a day for the free option. "Durable Objects" is a beta API, perhaps its transactional-storage-api [1] has better limits?

--

0: https://developers.cloudflare.com/workers/platform/limits#kv...

1: https://developers.cloudflare.com/workers/runtime-apis/durab...


Honestly, I think the page's "edge-sql" title is more descriptive. This is running SQLite within "edge" workers, as a nifty little proof-of-concept.


edge-sql [1] allows arbitrary SQLite queries to be executed over an immutable external data set. The demo uses a Forex data set stored in Workers KV.

Client issued arbitrary queries is one of the use cases for GraphQL and publishing immutable data sets on the web is the main use case for Simon Wilson’s Datasette [2].

In-memory SQLite compiled to WASM works in the browser and Node.js too. In the future, we can expect proper ACID operations on any WASM runtime that supports fsync in WASI [3], a POSIX-like API.

[1] https://github.com/lspgn/edge-sql

[2] https://datasette.io/

[3] https://wasi.dev/


I have a Google Cloud Function that uses SQLite to store a user's previous day tweets and do some minor sorting and filtering of data. Works very well and saves me the cost of a "real" SQL instance.


Where's the SQLite database written? How do you ensure only one writer is modifying it?


Two-phase commit[0] could do the job

[0] https://en.m.wikipedia.org/wiki/Two-phase_commit_protocol


How do you do two phase commit on a sqlite database file?


I don't know SQLite specifically, but I think that it can be implemented in the application logic using the primitives of any RDBMS. This old thread suggests that http://sqlite.1065341.n5.nabble.com/Re-Two-Phase-commit-usin...


Serverless SQLite … for read-only use cases


Can someone ELI5 as to the significance of this?


CDNs allow serving files closest to the user cutting out 100s of ms of latency.

But they are limited to read only files.

Cloud flare workers allow running simple programs at the CDN node closest to the user.

SQLite is embeddable C code that is like a normal SQL server just without the network parts - just the sync functions embedded into your process.

This wraps SQLite with the networking and CDN abilities of cloud flare, allowing you to http-get a subset of data dynamically using SQL (that would not be possible with the simple “download 100% of file X”).


Now THAT makes sense! (And, i can now see the benefits of this approach.) Thanks for the explanation!


I don't get it. Does SQLite run now on the Cloudflare servers, or is it running in the browser via WASM?


On the cloudflare servers, which run server-side JavaScript.


In AWS Lambda you can run arbitrary processes in their serverless service e.g. my company runs ffmpeg in Lambda. Why does this have to be WASM in javascript engine? Why can't they just run sqlite by javascript forking to a C++ process (or using sqlite binding of javascript)? Is this a POC for wasm?


Very cool. I had similar idea and glad to see it implemented and working well.

Workers KV imposes 25MB limit per key. Worker memory limit is 128MB. Concatenating several values from the store or using sqlite's ATTACH DATABASE should make possible querying of about 100MB large databases, would be my guess.


What would this be best used for?


This is great for any time you have a mostly read-only database (which is true for most publishing use-cases - anything you might consider a static site generator for) which you want to provide API access to with lightning-fast speed because replies will be served from a CDN edge.


The data served from the edge can be modified without touching code, just produce a new version of your SQLite database.

A small DB to search in a list of storefronts which typical retail websites have, a database of flags or promotion codes which you don't want to send to the browser are some examples I came up with.

Personally I would write a code generator to embed the data in the code but to each their own.


Cloudflare workers run at the edge along with a simple but fast KV store.

We use KV and workers to serve static data (json) to browser based code.

This could be used to filter, sort or aggregate that data before sending the client.

Perhaps for inventory / product lookup on a e-commerce site.


I use something like this for a metadata search tool serving static data. Its high performance and easy to set up which is the main use case for me.


With this, suppose you want faster response than Disk, then every serverless function can have its own replica of data & power responses off of that. It is like a cache through sqlclient. But wait, can't we do that already running in memory SQLLite? why is it a big deal now?


Nothing. Demoing novel wasm usage maybe?


Reinvent http proxies, but now with additional code and data that gets out-of-sync and is harder to debug.


Hmm. Are you checking for SQL injections?


Mu.

There's no ability to change the anything through the SQL. It spins up a new sqlite db every time, and builds the table in memory.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: