Offline-First Database Comparison (github.com/pubkey)
415 points by typingmonkey on Oct 26, 2021 | 155 comments



So glad to see PouchDB included. We use it and have generally had a great experience! We use it with CouchDB on the backend, and Couch seems like a fantastic way to go for use cases involving syncing data between devices with an offline mode and syncing between clients. It was built from the ground up with replication in mind.

Biggest bummer of CouchDB? If you’re not hosting it yourself, there’s only one major player in the market that I know of: IBM Cloudant. They contribute much to Apache CouchDB though, and hosting it yourself doesn’t seem too difficult, especially for small, simple use cases.

Anyone else using CouchDB?


I use the PouchDB/CouchDB combo for exactly the use case you're describing. The query language for CouchDB leaves a little to be desired, but ultimately it has worked well for me. I'm self-hosting on a DigitalOcean droplet.


Is DO offering anything close to enterprise grade?


You probably need to define your actual requirements rather than this nebulous term.

The "DO offering" is a basic VPS and you host/manage the couchdb process yourself. DO does not offer a managed couchdb service. I've found couchdb to be reasonably stable and worry free.

The nature of PouchDB/CouchDB and the design philosophy behind it make it relatively easy to scale to additional servers. CouchDB is master-master, which is a good eventually consistent model that should help with any horizontal scaling. The base process runs on Erlang, which should scale vertically well.

I've been using it a couple of years, but not on any sites that have significant traffic.


"enterprise grade" covers a a much larger spectrum of uses and needs with regard to features and stability than non-enterprise grade, making it kinda hard to answer that generally.

They offer VMs, containers, managed DB instance offerings, block storage, multiple regions and datacenters, load balancing, etc. They have an API to control all those things, and modules available in many popular languages. But that's basically what many people consider table stakes for a service like that, and indeed there are competitors like Vultr and Upcloud that offer the same.

Will any of them have quite the same level of offerings that AWS or GCE or Azure offer? Probably not. But for a great many people what they have is all the enterprise-level stuff they'll need or use, and it is decidedly easier to just start up a cheap Linux VM and take care of it on one of these services compared to AWS or GCE or Azure, so if what you want are somewhat manually managed cloud VMs, I highly recommend one of these services over one of the big names.

I've used all the services named, and I still prefer DO for just throwing up a cheap $5-$10 VM for personal stuff, or to spin up a temporary VM for testing something out. On DO that's a couple second process when you do it manually by clicking around.


Five 9s, or nine 5s?

Can one run a professional, mid-pack to top-tier SaaS on it? Is it consistent, stable, etc.?


I've used AWS, Digital Ocean, and Vultr each for multiple years. I've had one major outage on AWS and another on Vultr. I'm currently feeling out Oracle cloud.

I wouldn't hesitate to run significant sites on any of these providers. Problems and outages happen; I don't think anyone is really providing five 9s, whatever they say.


One can definitely run a top-tier service on nine fives metal. It's the basics of SRE.
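To put rough numbers on the joke, here is a minimal sketch of the arithmetic. It assumes that "nine 5s" means 55.5555555% availability per machine and that machines fail independently, neither of which the thread states; the function names are made up for illustration.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def downtime_hours_per_year(availability):
    """Hours of downtime per year implied by an availability between 0 and 1."""
    return (1.0 - availability) * HOURS_PER_YEAR


def combined_availability(single, replicas):
    """Availability of a group of independent replicas where any one suffices."""
    return 1.0 - (1.0 - single) ** replicas


def replicas_needed(single, target):
    """Smallest number of independent replicas that reaches the target."""
    n = 1
    while combined_availability(single, n) < target:
        n += 1
    return n


five_nines = 0.99999
nine_fives = 0.555555555  # assumed reading of "nine 5s"

print(f"five 9s allows {downtime_hours_per_year(five_nines) * 60:.1f} minutes of downtime per year")
print(f"nine 5s allows {downtime_hours_per_year(nine_fives) / 24:.0f} days of downtime per year")
print(f"replicas of nine-5s machines needed for five 9s: {replicas_needed(nine_fives, five_nines)}")
```

Real failures are rarely independent (shared network, shared datacenter, shared bugs), so the replica count is a lower bound at best.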


Not really, at least in the negative sense of the term. Try talking to Oracle or SAP sales for that :)


"Enterprise grade" = "you can do better, but you can't pay more"

(with regard to systems from Oracle and SAP)


lol!


There's IBM's Cloudant services if you need that. And you can build your own cluster and connect as many servers as you want.

But DO doesn't have any pre-configured droplets to start off with. Be nice if they did.


I tried to use couchdb with pouchdb. It was a mess to add the proper authentication layer over it and the fact that even the couchdb team has changed their opinions on the right way to do it was not impressive.

I love RxDB with Hasura behind it. It's incredible and you get a great postgres front end to boot.


>>the fact that even the couchdb team has changed their opinions on the right way to do it was not impressive

I don't get that. If they're working on improving something I didn't like that's something I'd appreciate.

CouchDB/PouchDB works great for how I'm using them but over the years I've observed that those coming from using SQL DBs can have a hard time with it.

I get that. But CouchDB is not designed to compete or replace SQL.

To me, it feels like CouchDB was not the right tool for the job you were doing. That's not a reason to dismiss it though.


Do you use «one database per user» model or «all users data in one db» model?


One database per user. And even disregarding the obvious inability to natively join information across tables, it still was a mess to subscribe to changes across all of them and all the other things you take for granted with a relational database. And to me, it looks like couchdb is on life support as a technology. It's a great idea and revolutionary in its time, putting everything as documents with views, etc. But, too many gaps and unclear direction.


CouchDB dev here, I can confirm that CouchDB is very much not on life support. Active work goes into PouchDB and CouchDB, some of which is addressing the issues brought up here. Nothing shipping yet (aside from maybe native JWT auth in this month's 3.2.0 release), but definitely ongoing :)


Glad to hear a new release is coming. I run CouchDB in production. I have actually checked the couchdb repo to see if it was being maintained.

Some of the things that trouble me. The ubuntu upstream package manager dropped off the radar for a while, not sure if it's currently running or not. Also, the last release was a looong time ago. I understand it's pretty stable, but there are rough spots that could use some shoring up as noted.

These issues aren't enough for me to drop its active use in production, but I'm eyeing reworking how I use it if these kinds of issues continue. This kind of dropping the ball doesn't instill confidence. I don't want to have to maintain my own installer so I can predictably perform new installations.

Also, no Linux ARM package. I gave a go at compiling it myself for ARM, but that failed due to being unable to find/use a compatible SpiderMonkey.

It's great tech, and I'd love to carry it with me into bigger and better projects. Here's hoping :)


The ASF switched binary hosting providers; the CouchDB docs have the up-to-date links: https://docs.couchdb.org/en/stable/install/unix.html


That's great to hear. I love PouchDB for the record. It's amazing.

That's really great to hear about native JWT auth support, that is a huge gap for me to not have a good story of how to do auth. I totally understand the reasons for moving it outside of the database, but it was too much work to create and manage my own proxy on top of couchdb.

Thanks a lot for the clarification, and I'll be watching.


Jan has actually been working on per-document access control and according to the discussion it is expected to land in CouchDB 4 [1].

CouchDB doesn't move extremely fast. It's kinda boring and reliable once everything is set up - which I consider a good thing.

But I agree, it's annoying to not be able to analyse data across multiple user DBs with a single query or to build hacky solutions when listening to changes across databases.

[1] https://github.com/apache/couchdb/issues/1524

E: typo


IMHO, user databases (both CouchDB and PouchDB) are just a front cache. A user can mess with their data in CouchDB/PouchDB through the API, so it's important to keep the data in an inner database that is inaccessible to the user, and to copy data between the databases. The inner database can be a SQL database, e.g. PG with jsonb.

  +--+     +-------+    +-------+    +--+
  |PG+<-+->+CouchDB+<-->+PouchDB+<-->+UI|
  +--+  |  +-------+    +-------+    +--+
        |
        +->...
        |
        +->...

[0]: https://kroki.io/ditaa/svg/eNpTUNDW1dVWAAEgAwy0sXG0uRQUagLct...


Really depends on your use case. If that's the user's data anyway, no need to put it in an inner database.


I've actually been using Couch DB with Pouch directly to authenticate users on a small app, with a DB per user. Tbf, my app isn't all that large or complex at this stage, so I don't know if there's any overlap in what we're doing. But from your post, relational joins aside, I can't understand what's breaking for you specifically. You're offering nebulous gripes tbf, not discrete problems.


That's interesting. What problems did you encounter?


>>It was built from the ground up with replication in mind.

Because it was inspired by one of the first document-based, nosql, replicated databases - Lotus Notes.

https://www.wired.com/2012/12/couchdb/


> Couch seems like a fantastic way to go for use cases involving syncing data between devices with an offline mode and syncing between clients. It was built from the ground up with replication in mind.

Have you come across any simple examples that show offline mode and syncing between clients with replication?


Syncing between clients requires a network path between the clients, and normally clients only have connections to servers. But if you are in a situation where clients can open TCP connections to other clients, CouchDB can sync over that.


I use CouchDB. I love its multi-master replication capability, HTTP API and its ability to monitor for changes easily.


I use couch and pouch frequently. For replication mostly.

I’ve copied airtable data to it in the past.

Recently I implemented an event store CQRS system designed to be usable offline. I considered syncing events to the client via sockets, but I would have needed to implement diffs. So I use Couch and Pouch as the read-only side of CQRS, with an append-only event database in CouchDB. The actual data is in Postgres.

Authorization is tricky. I do not recommend trying to do document level access control. I simply added an express endpoint that allows only reads and checks the session user for which table they can access. Then pass the request to couch.
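The decision logic of such a read-only gate might look roughly like the sketch below. The commenter used Express; this is Python and covers only the allow/deny check, not the forwarding to Couch. The function name, the `allowed_dbs` set, and the rule that only GET and HEAD count as reads are assumptions for illustration. A real replication client may also need some POST endpoints, which this sketch simply rejects.

```python
from urllib.parse import urlsplit, unquote

# Assumption: only these methods are treated as reads.
READ_METHODS = {"GET", "HEAD"}


def is_request_allowed(method, url, allowed_dbs):
    """Return True if a proxied request is a read against a database
    that the session user may access.

    allowed_dbs is a set of database names looked up from the session
    (hypothetical; how it is populated is up to the application).
    """
    if method.upper() not in READ_METHODS:
        return False  # writes never reach the database through this endpoint
    segments = [unquote(s) for s in urlsplit(url).path.split("/") if s]
    if not segments:
        return False  # server root, not a database
    return segments[0] in allowed_dbs


user_dbs = {"events_team1"}
print(is_request_allowed("GET", "/events_team1/_changes?since=0", user_dbs))
print(is_request_allowed("PUT", "/events_team1/doc1", user_dbs))
print(is_request_allowed("GET", "/events_team2/_all_docs", user_dbs))
```

Only requests that pass the check would be handed on to Couch; everything else gets rejected before it touches the database.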

Overall works really well. I recently turned off live sync for web and react native and loop over all of my databases on a set timeout interval. I had trouble with many connections at once.


> Authorization is tricky. I do not recommend trying to do document level access control.

If you're trying to do that, true. But if you simply let the user sync his data as he pleases, auth is quite easy with the library I maintain (link in my profile).

> I had trouble with many connections at once.

Can you elaborate on how many? And did you increase max_db_open and the necessary OS limits? I'm currently planning to go the other way, but also a bit worried that too many open connections can cause trouble.


Neat project. If the data belongs to only that user it works pretty well. I run into trouble with sharing across users on a team and individual user sharing. In that case, I see most solutions use express as middleware for the couchdb connection.

> Can you elaborate on how many?

For web, I can have as many as I want syncing. I haven’t stress tested it yet tho. I have a CouchDB in prod that throws an error message about connection limit a few times a day. This one writes and reads. It restarts the docker container to recover since I haven’t had time to investigate. You may have given me the answer :)

On React Native, I’ve had odd behavior around 5 connections. That’s where I need to periodically poll for syncing. It works out best since the user is offline most of the time and downloads infrequently.


When I used it for web I hit a max of 6 simultaneous connections in Chrome. I seem to recall that it was an (intended) browser limitation, but it doesn't throw any errors, so it's not very obvious. I forked the socket-pouch library and changed it a bit to sync unlimited DBs over a single websocket connection. It worked like a charm (despite my messy code).

If you're hitting the same issues, this might be it. A non-throwing limitation in the browser runtime.


> I run into trouble with sharing across users on a team and individual user sharing. In that case, I see most solutions use express as middleware for the couchdb connection.

A shared database where everyone can read but only some can write is very much possible with design docs. And when being flexible with assigning/revoking roles + creating/replicating data, a lot can be modelled without a proxy.

But yeah, at some point it's probably easier to use a proxy than to do weird stuff with databases and roles.


> Biggest bummer of CouchDB? If you’re not hosting it yourself, there’s only one major player in the market that I know of: IBM Cloudant.

That's the biggest problem my projects using Pouch/Couch are facing. The tech choice was made when Cloudant still had Azure datacenter support, and IBM's multiple confusing changes to their Cloud brands have put it in a situation we aren't entirely happy with. I keep getting asked/pressured about whether I can move things back to Azure datacenters.

I don't know what I'm going to replace it with and I still wish Azure CosmosDB was more friendly to Couch replication. (It's so close, especially its Changes feed. I feel that the proxy I need probably doesn't need to do all that much; I just don't think I have the budget/time to build and test such a proxy.)


I didn't realize Azure and CouchDB/Cloudant got along together for awhile! No more though?

> I just don't think I have the budget/time to build and test such a proxy

Same here, including with the authentication shortcomings we hope get addressed, like per-document security or other improvements.


Cloudant was a startup that targeted multi-cloud. At one point they supported cluster deployments to AWS and Azure. They were bought by IBM and dropped AWS support but kept Azure support for a bit longer as they built out more of "BlueMix" (early IBM Cloud brand name), and then IBM did its dance of Cloud brand names and datacenters supported and Cloudant dropped Azure support too.


Thanks for the info! I knew Cloudant only after the IBM purchase, so this is news to me.


I use it every time I can and I freaking love it.


Last write wins is not a strategy for conflict resolution, it's a surrender.

So I'm glad to hear something else apart from Pouch actually handles it. Anyone familiar with rxdb and can chime in on how they do it?


It sounds like rxdb is built on top of pouch, so probably the same set of options with the possibility of some opinionated design or sensible defaults, though I can't find anything obvious.


Yes, RxDB conflict resolution is the same as PouchDB's, at least for now. There are plans to improve on that with a global resolution function instead of listening for conflicts in the changestream.


Not directly related to the post (which is focused on which database to host for your app), but I'm writing desktop apps (think DAWs) in C++ and Rust (not JS), and want to synchronize settings through a Dropbox or Google Drive (so I don't have to host my own cloud sync servers). What's a good library or schema to achieve this?

Personally I usually don't have multiple instances of the same app open on multiple machines, but other people might open the same or different files on their desktop and laptop.

- Not all settings should be synchronized (don't include machine-specific "recent files" paths).

- How should settings be stored locally (if I may have multiple instances of my app open on a single machine)? Registry (Windows-only)? INI with atomic saving (requires care and locking to prevent multiple instances from trampling or racing with each other)? SQLite?

IMO Stylus is a pretty good implementation of offline-first cloud settings sync over Dropbox/etc. It's currently based around one JSON file per CSS file (Dropbox/Apps/Stylus - Userstyles Manager/docs/uuid.json), and what appears to be a transaction log (Dropbox/Apps/Stylus - Userstyles Manager/changes/number.json). Cloud sync has been 100% reliable in my experience, though I do notice temporary file lock errors when switching between different machines in my dual-boot setup (but sync seems to be eventually consistent nonetheless).

uBlock Origin is worse. Instead of merging settings, it expects the user to upload and download the entire settings blob at once (and pulling an old blob can erase changes you've made locally). And in the past it's entirely failed to sync because the blob was too big to upload to Mozilla's servers. (Right now it "works" but takes several minutes for one computer to see a config uploaded from another computer.)


I've done something pretty ridiculous to solve this problem and I'm not sure I'd recommend it, but here it is:

- A directory is synced with the server with no conflict resolution.

- The application creates its config file in that directory, named by a random UUID, which is stored outside the synced folder.

- The config file stores the setting overrides (defaults were compiled in) in any format (I used YAML).

- Each setting override includes a "locked" and a "lastModified" field.

- On startup (sync was external), all files in the directory are read and merged, starting with the local one, then skipping any settings that are locked (locally or remotely); last modified wins.
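The merge-on-startup step described above could be sketched as follows. This is not the commenter's code. It assumes one reading of "locked": a locally locked setting is never overwritten, and a remotely locked setting is never imported. The dictionary layout and the function name are hypothetical, and parsing the YAML files is left out.

```python
def merge_settings(local, remotes):
    """Merge setting overrides from other machines into the local ones.

    local and each entry of remotes map a setting name to a dict with
    "value", "locked" and "lastModified" keys (assumed layout).
    """
    merged = {key: dict(entry) for key, entry in local.items()}
    for remote in remotes:
        for key, incoming in remote.items():
            if incoming.get("locked"):
                continue  # locked remotely: stays private to that machine
            current = merged.get(key)
            if current is not None and current.get("locked"):
                continue  # locked locally: never overwritten
            if current is None or incoming["lastModified"] > current["lastModified"]:
                merged[key] = dict(incoming)  # last modified wins
    return merged


local = {
    "theme": {"value": "dark", "locked": False, "lastModified": 100},
    "audio_device": {"value": "ASIO", "locked": True, "lastModified": 50},
}
laptop = {
    "theme": {"value": "light", "locked": False, "lastModified": 200},
    "audio_device": {"value": "CoreAudio", "locked": False, "lastModified": 300},
    "recent_dir": {"value": "/home/a", "locked": True, "lastModified": 400},
}
desktop = {
    "theme": {"value": "solarized", "locked": False, "lastModified": 150},
    "font_size": {"value": 12, "locked": False, "lastModified": 10},
}

merged = merge_settings(local, [laptop, desktop])
print({key: entry["value"] for key, entry in merged.items()})
```

Because only timestamps are compared, a machine whose clock runs ahead will keep winning, which is the weakness inherent in any last-modified-wins scheme.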

Some deployments used a daily rsync cronjob, some had a mounted network share (with hilariously broken file locking) and of course it worked with direct bind mounts as well.

I also briefly experimented turning the "locked" field into a "group" field to enable multiple "sync groups" with some keys shared globally and some only with other group members (even different groups for different settings), but it ended up not being useful for my use case, although it did work.


Sounds interesting. I suppose it would break if a computer's clock was set in the future (its lastModified would always win), but IDK what wouldn't break in that scenario.

What does the locked field do?

Do you have a link to your implementation, or is this proprietary?


I would imagine a CRDT would be appropriate for the data you want to sync; it would let you sync state between multiple open clients.


My old company used S3 to sync config files in JSON. S3 is strongly consistent now.


I really enjoy examples like this, thanks, I’ll be exploring it.

As an aside… I would truly love to explore a collection of interesting ways to use SQLite. It’s such an impressive piece of technology that I’d like to use more often. Please share if you have something similar!


I had a really great experience building an EAV store with Datalog as the query interface on top of SQLite for embedding in native mobile apps.

Pros: querying complex data hierarchies was easy, and was able to skip the pain typically associated with managing a SQL schema.


What’s the application for this? EAV is often an anti-pattern when a schema could be defined, but I’m actually using it as well. Our application is an end-user-defined database for mobile data collection. The EAV model in SQLite is a bit of a cognitive burden but makes offline sync and conflict resolution pretty straightforward. It’s almost a crude CRDT implementation.


It was a scheduling, work tracking and invoicing app for service providers that have spotty-at-best network connections, think long periods out of cell service but still need to do complex data entry and querying.

> EAV is often an anti-pattern when a schema could be defined

Super interesting, I wasn't aware that EAV is an anti-pattern in that case. Is it an efficiency thing?

For clarity, my design wasn't schemaless, values (can) have defined datatypes and relationships are first-class. I meant that I found adding to or modifying the schema was less cumbersome and error prone than traditional SQL schema additions or changes. I feel like SQL schema management is more suited to server-based dbs where you have tight control over the db lifecycle, which you don't when it lives on a bunch of mobile devices.

Totally agree with the ease of sync and conflict resolution, another strong pro.

Love to hear more about your approach! Also feel free to reach out (email in bio) if you'd like to compare notes some time.


Are you able to elaborate on why you chose EAV over using something like the json1 extension of SQLite?


Honestly I didn't look at json1.

It was built on a single table that held the entity-attribute-value tuple along with some additional metadata like type information, whether or not the attribute was a pointer to another entity, and the cardinality of that relationship (one or many).

Relationships were walked via self joins and the eav columns were all indexed.
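A minimal sketch of what such a single-table layout and a self-join walk might look like, using Python's built-in sqlite3 module. The column names, attribute names and sample data are invented for illustration, and the Datalog layer is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE eav (
    entity      INTEGER NOT NULL,
    attribute   TEXT    NOT NULL,
    value,                                    -- untyped column, SQLite keeps the stored type
    value_type  TEXT    NOT NULL,             -- type information, e.g. 'text' or 'ref'
    is_ref      INTEGER NOT NULL DEFAULT 0,   -- 1 if value points at another entity
    cardinality TEXT    NOT NULL DEFAULT 'one' -- 'one' or 'many'
);
CREATE INDEX eav_entity    ON eav(entity);
CREATE INDEX eav_attribute ON eav(attribute);
CREATE INDEX eav_value     ON eav(value);
""")

rows = [
    (1, "customer/name", "Acme", "text", 0, "one"),
    (10, "job/title", "Fix pump", "text", 0, "one"),
    (10, "job/customer", 1, "ref", 1, "one"),
    (11, "job/title", "Inspect boiler", "text", 0, "one"),
    (11, "job/customer", 1, "ref", 1, "one"),
]
conn.executemany("INSERT INTO eav VALUES (?, ?, ?, ?, ?, ?)", rows)

# Walk customer -> jobs -> titles with two self joins on the same table.
query = """
SELECT title.value
FROM eav AS name
JOIN eav AS link
  ON link.attribute = 'job/customer' AND link.is_ref = 1 AND link.value = name.entity
JOIN eav AS title
  ON title.entity = link.entity AND title.attribute = 'job/title'
WHERE name.attribute = 'customer/name' AND name.value = ?
ORDER BY title.value
"""
titles = [row[0] for row in conn.execute(query, ("Acme",))]
print(titles)
```

Every extra attribute or hop in a query adds another join against the same table, which is where an EAV layout starts to cost more than a conventional schema.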


This sounds very similar to what we’re doing and we are in the process of migrating most of the EAV models (other than the relationships) to a json1 column (Which I’d argue is still EAV just in document format). Keeping the relationship foreign keys outside of the json allows the database itself to enforce referential integrity.

The difficulty with having all attributes be EAV becomes apparent when having to do multiple joins to fetch a single record type (what would be a “table” traditionally). Although this is manageable, the bigger difficulty I’ve found is synchronizing deletions of records, especially if deletions/insertions are done in bulk. Rather than just 1 transaction you have to do multiple delete/insert queries to also delete/insert the attributes and the values and they should be done in a way that doesn’t break key constraints.


I would love to hear more! Did you write the datalog layer or is there one somewhere? Is there any code available I could see?


You might be interested in the now defunct Mentat project from Mozilla. They made an EAV store with syncing on top of sqlite. It ran datalog queries by translating them into sql.

https://github.com/mozilla/mentat


I’m aware of mentat and eternally disappointed that it got shelved :(


I'm doing something like this with CRDTs + RDBMS (POC with sqlite).

The closest thing is this:

https://munin.uit.no/bitstream/handle/10037/22344/thesis.pdf

There, each record is split into a stream of CRDT values. I found (quickly!) that this could cause serious violations of business logic if done as-is. Now I'm trying to treat the record as a whole. That could still have issues with multi-record/table logical integrity, so I have thought about building in "transaction markers", so your stream of changes is:

  Start
   ADD: T1.Row1...
   ADD: T2.Row1...  
  End
So you don't partially apply a change.
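A minimal sketch of how a receiver could honour such markers: changes are buffered after Start and only applied once the matching End arrives. This is an illustration in Python, not the commenter's Rust implementation; the tuple format of the stream and the function name are assumptions.

```python
def apply_complete_transactions(stream, state):
    """Apply ADD changes to state, but only for transactions that were
    received in full (a Start followed by its End).

    stream yields tuples: ("Start",), ("ADD", table, row_id, row), ("End",).
    Returns the number of transactions applied.
    """
    pending = None  # changes buffered since the last Start; None outside a transaction
    applied = 0
    for op in stream:
        kind = op[0]
        if kind == "Start":
            pending = []  # any unfinished earlier transaction is dropped
        elif kind == "ADD" and pending is not None:
            pending.append(op)
        elif kind == "End" and pending is not None:
            for _, table, row_id, row in pending:
                state.setdefault(table, {})[row_id] = row
            applied += 1
            pending = None
    return applied


state = {}
stream = [
    ("Start",),
    ("ADD", "T1", "Row1", {"total": 10}),
    ("ADD", "T2", "Row1", {"invoice": "T1.Row1"}),
    ("End",),
    ("Start",),
    ("ADD", "T1", "Row2", {"total": 99}),
    # the stream was cut off here, so no End arrives for the second transaction
]
applied = apply_complete_transactions(stream, state)
print(applied, state)
```

The second transaction never reaches the state because its End marker is missing, so a dropped connection cannot leave T1 and T2 half-updated.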

P.S.: If you're interested and know Rust, we can talk!


Thanks for sharing! Unfortunately, I don't work with Rust :/


Maybe not very interesting but I'm using SQLite as a dataframe replacement in languages where the support isn't great, for a project where I want to transition from a CLI tool to a web application (and probably from SQLite to Postgres).


Huh, I'm unfamiliar with Dataframe. Thanks for mentioning it!


SQL in the browser again: https://github.com/jlongster/absurd-sql


> It basically stores a whole database into another database. Which is absurd.

The project is both interesting and amusing, thanks!


From the perspective of someone less familiar with this kind of thing, this comparison would be easier to understand with a bit of an introductory explanation about what job we're trying to do or problem we're trying to solve, and the assumed context or constraints.


The key piece of missing info is this is about web development. Some web devs don't seem to know that there are other engineers out there who don't work in web dev. More specifically, we're talking about frontend web dev, which means writing code that runs in a web browser. Browsers traditionally are stateless and get all their data from a server. Any frontend code would also need to communicate with some backend if it wants to store any persistent state. But that requires an internet connection. Nowadays browsers have some ability to store persistent state themselves so this is about supporting code that works offline, ie. without a connection to the backend.


Offline-first is a well-known term. You can search for it elsewhere. When writing, it's important to choose who you're speaking to exactly so that you can avoid sharing context, since that takes up time and bandwidth. The truth is, communication is much more efficient if you don't re-explain every single concept that you are talking about.


leaving aside the definition of "offline first", it'd be clearer to understand if there was a brief problem statement pinning down that yes, we're specifically investigating offline first databases that are easy to integrate with javascript apps -- or perhaps comparisons of other offline first approaches for native mobile apps or whatnot would make sense and be welcome.

naive question: arguably git and mercurial and subversion could be thought of as offline first databases -- albeit targeted at a domain-specific use case. does it make any sense to compare them too?

academia has many things to learn from the world of software development, particularly around testing to ensure quality and reproducibility of work, but perhaps software development could benefit from a few ideas from academia: giving a brief introduction to contextualise the work -- not an extensive glossary, but at least a few links to relevant work others have already done - ideally with at least one link to something that introduced the idea or is an extensive survey of the subject.

I'm reading the book "designing data-intensive applications" at the moment and looked up offline-first applications in the index, which references http://blog.hood.ie/2013/11/say-hello-to-offline-first/ , which no longer exists, but is still mirrored by https://web.archive.org/web/20200222150347/http://hood.ie/bl...


A compromise solution would be linking to a glossary or introductory material if someone is generally versed but not in a particular subject, we get a lot of, say, pure math people on here who find software engineering interesting.


you'd still have to draw a line at some point as to which terms you'd explain, as otherwise your article would only consist of glossary information.

i'm pretty sure the term offline-first wouldn't have met the cutoff point, as its _really_ well known from my experience.


I have written so many things and posted them on HN over the years. There is just no way to make everyone happy; someone always complains that some info is missing or that there is too much.

I now leave things out that can be googled and are already known by "most" readers.


This has drastically improved my writing too. It's ok to target a specific group of people when writing. Writing for the lowest common denominator hurts everyone sometimes - it might be too high level for beginners, but too much background info for experts.


Am I wrong to think that pouchdb uses indexeddb?

Over the last 8 months, I've been working on a react/capacitor based android app (potentially ios later on) and was originally using idb-keyval, which uses indexeddb for key val storage, and things were great since it's a local storage solution compatible with react/capacitor, and one of my goals is to not rely on a remote data storage solution.

As said, things were going great, but then a couple weeks ago things went to shit when all the data that was stored in the prototype on my android device was wiped. Apparently, both android and ios tend to wipe browser/web-view local storage at random/when space is needed(?).

Dug around since looking for an alternative solution (would love a capacitor compatible mongodb solution), came across pouchdb via rxdb, but could've sworn there was mention that it relies on indexeddb. So just to be safe switched to sqlite and been rewriting components since.

Lesson of the story, even if it isn't dependent on indexeddb, if you're looking for a local storage option for a mobile app and happen to be using a js framework with capacitor or anything that utilizes a web-view, stay away from anything that uses indexeddb. If a wipe like this were to happen post release, the chance that your app succeeds afterwards would be near 0%

Edit: so yeah, just double checked/was reading through the readme of this project, and pouchdb via rxdb is reliant on indexeddb


Author here.

I am using RxDB with Capacitor (iOS and Android app). You can use the SQLite based pouchdb adapter with capacitor. It keeps your data and is (sometimes) faster.

Here [1] I have documented a whole section about how to use RxDB+SQLite in Capacitor.

[1] https://rxdb.info/adapters.html


Ah okay cool. Thank you for pointing this out.

Now would I use the adapter for react native or for cordova?

If react native, I've been writing with react js and have been under the impression that react native specific plugins aren't compatible with react js. Is that wrong?

If cordova, it's totally compatible with capacitor?

Sorry just want to make sure before jumping in


cordova-sqlite adapter can be used with cordova and capacitor. The article says so.


Just sought further confirmation as I've had issues in the past with supposed cross compatible cordova/capacitor plugins.

Also wasn't sure whether it was being suggested to use the react (native) plugin or the cordova plugin, as I didn't clarify (I should have) in the parent that I'm using react js and not native.


How fast is PouchDB with SQLite on Android? I don't really need any replication in my application, should I just use the normal cordova sqlite instead and switch to dexie.js when using in web browser?


> Apparently, both android and ios tend to wipe browser/web-view local storage at random/when space is needed(?).

"Apparently"? Mozilla docs say so much in the introduction[0], even directing you to a dedicated page[1].

[0]: https://developer.mozilla.org/en-US/docs/Web/API/IndexedDB_A...

[1]: https://developer.mozilla.org/en-US/docs/Web/API/IndexedDB_A...


Can confirm: don’t use IndexedDB in a mobile app made with Capacitor or Cordova. I never had the IndexedDB wiped by the OS, but there are occasional browser bugs that affect IndexedDB.

One time, users could not load their data if their Android device had less than 1 GB of free disk space, because Chrome had a bug in calculating the quota for IndexedDB.

SQLite is fast and it gives me peace at night, knowing that user data is safe.


> one of my goals is to not rely on a remote data storage solution

Yeah, that's going to be a hard goal to meet in this case. A benefit to PouchDB over raw IndexedDB is the great replication support and replicating everything back down in the case of one of those worst case wipes is an alright solution in some cases (but that does require managing remote data storage solutions, unfortunately).

Android and iOS are supposed to manage IndexedDB as Application Storage when installed as an app via something like Capacitor, but it's still not great. Supposedly Android is getting a lot better about it for apps installed as PWAs instead and Capacitor should give you a good PWA path for Android at least. (iOS is unfortunately still lagging far behind on PWA support.)


My biggest gripe with the offline-first capability of PWAs is speed. A network roundtrip is generally faster than fetching the same information from IndexedDB, and you still have to sync first because IndexedDB tends to get wiped at inopportune times.

It's great if you're writing a traditional app though, but really unfortunate wrt PWAs.


IndexedDB is a _bad_ API, and making many small read/write operations on it is absurdly slow. This is one of the reasons why WatermelonDB on web only uses IDB as a dumb storage medium but actually does all the database'y things in memory. You won't reasonably scale to gigabytes this way, but for most PWAs this is plenty enough and MUCH faster in practice. Certainly far faster than network.


Why is that? Is it because you're on a fibre link? Or is IndexedDB being used with several layers on top to add a comfy SQL and ORM layer for you?

I've used IndexedDB on a couple of projects at work. While there are definitely downsides with its indexing design, limited querying options and the menagerie of fuckups by Team Fruit(TM), it works well as a local cache when our clients are out on site with their customers and all they've got is a crappy intermittent 3G/4G signal.


In my tests I actually went with plain IndexedDB in the end, because I wanted to make sure it's as fast as it could go.

It's true that IndexedDB access is faster than a 3G signal with timeouts, but that wasn't happening often enough to warrant measurably slowing down all other requests just to speed up the rare case when this occurs.

Loading from IndexedDB generally took about 100-200ms; loading data over WLAN/4G from a remote server (~500km real-life distance) over a socket took < 50ms overall for multiple JSON payloads with about 200 serialized entities altogether.

And doing both at the same time to serve whatever finished first wasn't worth the trade for me either, as IndexedDB access isn't cheap from an energy drain perspective.

Other people might come to different conclusions depending on their challenges.


Curious! Even with machines on the same premises, IndexedDB has been orders of magnitude faster for us. In the beginning we were storing carbon copies of MariaDB tables and initial load times went down drastically.


Really? I've never used IndexedDB, but to confirm: you're saying that fetching data from local memory is slower than a network round trip?


The complaint was about IndexedDB access, not local memory. Folk mentioning this are usually referring to the disastrous implementation in Chrome: see https://dev.to/skhmt/why-are-indexeddb-operations-significan... and https://jlongster.com/future-sql-web


Wow. Orders of magnitude slower than ff.


IndexedDB is quite low-level in what its interface gives you.

This includes what options you have available for querying: just accessing a key range with an upper/lower bound, forwards or backwards (backwards can be much slower), rather than a full query language like SQL.

So you need to design your app / data format and pre-plan any queries so you'll have the indexes you need, or do some glue logic to combine indexes as needed.
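As a rough illustration of what "pre-planning your indexes" means, here is a plain in-memory model of a key-range lookup over a compound key. This is not the IndexedDB API; the names (`Message`, `SortedIndex`, `range`) are hypothetical. The point is that the query "messages in one channel between two timestamps" only works as a single range scan because the index key was chosen as [channel, sentAt] up front.

```typescript
type Message = { id: number; channel: string; sentAt: number };

// Compound key, compared element by element.
type Key = [string, number];

function compareKeys(a: Key, b: Key): number {
  if (a[0] !== b[0]) return a[0] < b[0] ? -1 : 1;
  return a[1] - b[1];
}

class SortedIndex {
  private entries: { key: Key; value: Message }[];

  constructor(rows: Message[]) {
    // Build the index once, sorted by the compound key.
    this.entries = rows
      .map((value) => ({ key: [value.channel, value.sentAt] as Key, value }))
      .sort((x, y) => compareKeys(x.key, y.key));
  }

  // First position whose key is >= the given key (binary search).
  private lowerBound(key: Key): number {
    let lo = 0;
    let hi = this.entries.length;
    while (lo < hi) {
      const mid = (lo + hi) >>> 1;
      if (compareKeys(this.entries[mid].key, key) < 0) lo = mid + 1;
      else hi = mid;
    }
    return lo;
  }

  // All rows with lower <= key <= upper, in ascending key order.
  range(lower: Key, upper: Key): Message[] {
    const out: Message[] = [];
    for (let i = this.lowerBound(lower); i < this.entries.length; i++) {
      if (compareKeys(this.entries[i].key, upper) > 0) break;
      out.push(this.entries[i].value);
    }
    return out;
  }
}

const index = new SortedIndex([
  { id: 1, channel: "general", sentAt: 50 },
  { id: 2, channel: "general", sentAt: 120 },
  { id: 3, channel: "general", sentAt: 180 },
  { id: 4, channel: "random", sentAt: 150 },
  { id: 5, channel: "general", sentAt: 250 },
]);

// One contiguous scan: "general" messages sent between 100 and 200.
const recent = index.range(["general", 100], ["general", 200]);
```

A query the key was not designed for (say, "all messages between 100 and 200 in any channel") cannot be answered with one range here; it needs a second index or the glue logic mentioned above.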


You can use CouchDB installed on a desktop PC and tell PouchDB to use that instead of the web browser's IndexedDB for offline-first apps.

This approach gets you pretty close to native app speed.


Any information on how scalable these databases are compared to traditional SQL databases? Or specifically, when should you use this (in prototyping or production)?


IMO the principal consideration here is that these are offline/local databases for browsers and (probably?) Electron, so they are not intended to be scalable at all. If you have a progressive web app (PWA) and you need a queryable database for some reason, then you would use these. Otherwise, stick your queries behind API endpoints.


CouchDB is a lot more scalable than SQL databases because it has a distributed scaling model built in (just add nodes), no need to mess with read-replicate and finicky hot-failover, it all just works out of the box (Dynamo style).


It’s more scalable in theory, and I sang its praises in a different comment, but our team has hit scaling issues with our one-user-per-database approach. It was a mess to sort out, but Cloudant support was very helpful.

Our major issue: we write many small documents, and we write them over every user’s database fairly frequently. And Cloudant’s default settings don’t like that with a one-user-per-database approach. In fact, they discourage anyone from the one-db-per-user approach these days: https://www.ibm.com/cloud/blog/cloudant-best-and-worst-pract...

That blog post calls it an anti-pattern, but I would respectfully disagree. It is an absolutely great pattern to keep a native app and a web app in sync across multiple devices with intelligent conflict resolution.

A solution was to reduce the number of shards that a database was split out over, since our database’s data is pretty small overall and we didn’t need each database split out so much across our cluster.


Sure, changing defaults for different use cases is totally normal for any database. Best thing: CouchDB 3.x comes with shard splitting and a default shard factor of 2 (previously 8), so you get the best of both worlds: a low shard count when getting started, and if you end up with larger dbs you can split their shards on the fly.


What kind of consistency models [0] do Offline-first databases like RxDB and PouchDB have?

I was thinking read uncommitted, but they might allow dirty writes. Maybe there’s some CRDTs under the hood… I can’t find any documentation on consistency though, anyone here know?

[0] https://jepsen.io/consistency



Thanks! Very cool. I haven’t used couchdb in prod. I’ll read into this more.


Is the reason for all of these sorts of comments some sort of "never lose data" quality assurance?

There's consistency, write first, latency, replication, and then acronyms.

I just implement hardware and OS stuff, but I like the DB people to be happy. What am I missing?


Wait, this is benchmarking the Firestore emulator. How is that relevant?


It is benchmarking the firestore JavaScript library that runs at the client.


Maybe, but I don't see where the browser is brought offline to enforce that. It looks like the emulator is still running and there's still a connection to it.

Can you show where you force the browser offline?


I imagine that Supabase would be a perfect candidate for this project! I would love to get the author's opinion on it too.


I considered adding Supabase, RethinkDB and Meteor. But they are not really client-side databases. They stream query results in realtime from the server to the client.


Did you look into https://ditto.live/? It's an offline-first db too. I've never used it, but it looks very interesting and I would be interested in how it compares to the technologies you picked.


I love RethinkDB. I really believe it is a database that deserves more attention. We've been running on it for six years and it is flawless, easy to use. Like any other database you need to understand how it performs and how to query efficiently.

But it does have everything required for streaming and syncing data in a modern way. You can open streaming queries, even with a backlog of data, so syncing should be pretty easy. Maybe it would be feasible to fork PouchDB to use RethinkDB as a backend?


Firestore isn't really a client side database then either. It is certainly also querying results from a server.


I second that! Would like to see supabase since that’s what I’ve been using a lot recently and really liking!


I have a question, how do you manage the fact that Supabase is based on a relational db[1]? Do you just put all fields (except the primary key) in a JSON column[2]? Does it work well if you use it that way? We were investigating it as a way to escape Firebase's vendor lock-in but the fact that we would have to manage schema migrations was a bit of a deal breaker based on our use case. I'm also interested in hearing about any alternatives to Supabase that use a document-based nosql db.

[1] https://supabase.io/database

[2] https://www.postgresql.org/docs/13/datatype-json.html


Personally I kinda like the fact that it’s a relational DB. The stuff you gain by having users only allowed to do certain things based on certain fields using Policies makes life a lot easier and more relaxing.


Firestore can do that, a DB engine doesn't need to be relational to offer that feature.

https://firebase.google.com/docs/firestore/security/get-star...


Some charts for that table would be so awesome.


I'm not the OP or author. This sheet[0] has the Metrics and Feature Map data for charting, sorting, etc. directly from their GH page.

[0] https://docs.google.com/spreadsheets/d/12ReO-4_bZ2BaLj9P6oJT...


I've used WatermelonDB and found it quite nice to use. I mainly chose it because it has observable queries. I can live without conflict resolution and transactions on the client side because it's an Electron app. Having a SQLite backend (via the Electron process) would overcome some of the shortcomings of the in-memory database (and still allow in-memory via SQLite if speed is desired). I wrote a backend sync for it, which was quite easy, although I haven't fully implemented support for everything because I don't need it (notably deletes).


For offline-first use PouchDB can connect directly to a CouchDB installed on your desktop PC as opposed to storing user data in your browser's built-in IndexedDB and syncing that with a Cloud based CouchDB.

Syncing that local CouchDB to your web based CouchDB is very fast and it's done in the background so it doesn't affect the performance of actually using the app. You can click "Save" and move on with no waiting at all.

So in this scenario the speed of the DB is not necessarily a reflection of the speed of the app for the user.


> Syncing that local CouchDB to your web based CouchDB is very fast

This is not true. When you compare the CouchDB replication with other replication protocols, it is slow. The reason is that CouchDB supports replication with many instances at the same time. This creates a big overhead in handling revision trees. Many requests have to be made all the time. You can observe that by starting the PouchDB subproject in the comparison repo. Watch the network tab in dev tools. Another problem is that CouchDB does not support WebSocket replication; everything is long polling and plain HTTP requests.

Other replications that only support many-clients-to-one-server are way faster, both on the initial load and on ongoing changes. This was the main reason why I built GraphQL replication for RxDB.
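For readers who haven't dealt with the revision trees mentioned above: when several instances edit the same document, each replica ends up with multiple leaf revisions and has to agree on one winner without talking to the others. Below is a simplified sketch of the deterministic rule as I understand it from the CouchDB documentation (live revisions beat deleted ones, then the higher revision generation wins, then the higher revision hash in sort order). It uses the generation prefix of `_rev` as a proxy for history length; the `Leaf` type and `pickWinner` function are made up for illustration and this is not CouchDB's or PouchDB's code.

```typescript
// A leaf of the revision tree; rev looks like "<generation>-<hash>".
type Leaf = { rev: string; deleted: boolean };

function generation(rev: string): number {
  return parseInt(rev.split("-")[0], 10);
}

function revHash(rev: string): string {
  return rev.slice(rev.indexOf("-") + 1);
}

// Every replica that sees the same leaves picks the same winner,
// regardless of the order in which the revisions arrived.
function pickWinner(leaves: Leaf[]): Leaf {
  if (leaves.length === 0) throw new Error("document has no leaf revisions");
  return leaves.slice().sort((a, b) => {
    // 1. Non-deleted revisions come before deleted ones.
    if (a.deleted !== b.deleted) return a.deleted ? 1 : -1;
    // 2. Higher generation first.
    const byGeneration = generation(b.rev) - generation(a.rev);
    if (byGeneration !== 0) return byGeneration;
    // 3. Tie-break on the hash, highest first.
    const ha = revHash(a.rev);
    const hb = revHash(b.rev);
    return ha < hb ? 1 : ha > hb ? -1 : 0;
  })[0];
}

const winner = pickWinner([
  { rev: "2-aaa", deleted: false },
  { rev: "2-bbb", deleted: false },
  { rev: "3-ccc", deleted: true },
]);
```

Keeping and exchanging all of those leaves is what makes multi-master replication possible, and it is also the bookkeeping overhead that a many-clients-to-one-server protocol can skip.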


>>This is not true. When you compare the CouchDB replication with other replication protocols, it is slow.

That may be true but when a single user is working with an offline-first app connected to a CouchDB installed on their desktop pc that happens entirely in the background so they don't experience any lag.

And while I've not done benchmark studies with CouchDB I have monitored the logs to watch those syncs and we're not talking painfully "slow" in real world use. It is reliable though. I've been using it for about 5 years now and it's been solid. And so has the work they've done to improve it.

And I was not aware of RxDB, which is certainly interesting, so thank you for sharing that!


See also Google's Lovefield SQL browser database:

https://google.github.io/lovefield


Second this. Oh and it’s not reaaaaly abandoned, they just called it done.

I looked into using this as well but eventually decided against it due to the lack of active development.


No commits in 2 years. Seems Google lost interest like usual.


I've used CouchDB and PouchDB on a previous project, and it was a blast. The built-in features like replication and HTTP API are great. My only regret is the limited support (at the time) for full text search and complex queries. I suppose I like joins a bit too much.. :)

I deployed CouchDB in a Kubernetes cluster (not with HA as I didn't have high availability requirements), and it was working great.


Yeah although the docs claim to support things like regex-based searches, it's so horrendously slow it shouldn't even be listed as a feature.


I often feel I'm on another planet when I see timings for browser things people are recommending - Insert one message - best 9ms, insert 20 messages - best 33ms, worse - >8 Seconds! Why bother? just write a native client.


When you insert 20 messages, there is way more happening than 20 db writes. Each insert will affect the UI, which will cause a layout change. Also the changes will be replayed in different browser tabs and might also be replicated to the backend. Then there are things like conflict resolution, validation and encryption. You just cannot compare these speeds with your MySQL insert operation.


You see, the fact that you do all that for a write is what's wrong. The interface should be decoupled from the message write, and the GUI notified that a change has happened so it can update if needed.
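A minimal sketch of that kind of decoupling, with hypothetical names (`ObservableStore`, `flushNotifications`) and no claim that any of the compared libraries work exactly this way: writes only record what changed, and subscribers get one batched notification when a scheduler decides to flush.

```typescript
type Listener = (changedIds: string[]) => void;

class ObservableStore {
  private docs = new Map<string, string>();
  private pending = new Set<string>();
  private listeners: Listener[] = [];

  subscribe(listener: Listener): void {
    this.listeners.push(listener);
  }

  // Write path: touches only the data and the pending set, never the UI.
  write(id: string, body: string): void {
    this.docs.set(id, body);
    this.pending.add(id);
  }

  read(id: string): string | undefined {
    return this.docs.get(id);
  }

  // Called by a scheduler (in a browser, for example once per animation frame).
  flushNotifications(): void {
    if (this.pending.size === 0) return;
    const changed = Array.from(this.pending);
    this.pending.clear();
    for (const listener of this.listeners) listener(changed);
  }
}

// Usage: 20 writes, but the "GUI" is notified only once.
const store = new ObservableStore();
let renderCount = 0;
let lastBatchSize = 0;
store.subscribe((ids) => {
  renderCount += 1;
  lastBatchSize = ids.length;
});
for (let i = 0; i < 20; i++) store.write("msg-" + i, "hello " + i);
store.flushNotifications();
```

Batching reduces the number of re-renders, but on a single JavaScript thread the render still competes with database work for the same CPU time.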


Yes, this is exactly what is happening. But still you only have one CPU in JavaScript, so the GUI part will take the resources from the database part.


So the browser isn't really suited to this sort of thing


The browser is an abstraction layer. It will always be slower. But all this does not matter because your user will not write 20 chat messages / second.


I hate to say slippery slope but: it's bad, so why not make more bad stuff on top of it. I feel the bad parts of the browser are glossed over in software development decisions, and this is a very bad approach for any engineering discipline. The surprising thing is how few software devs know how to do anything non-browser; how can proper technology evaluations be performed without this? This is just the latest in a long line of such.


I just described the world as it is. I did not create it, I did not make the browser. I do not understand your point. You want to complain about what exactly?


I thought it was fairly clear: using the browser for an offline database is dumb. Even worse, propagating this sort of nonsense encourages new devs to think that's how things are done, when you can do the same thing with a native app, 10 or 100 times faster, and take perhaps a tenth or even less of the time to develop it.

Edit: and as I said at the beginning - "I often feel I'm on another planet" - having this discussion with web developers trying to justify something that is not just bad but dumb in so many ways always gives me a chuckle, and I try to educate on why there are other things besides browsers, and why it's not a very good way to write software.


Browser-based apps have use cases even when you reject PWAs generally as a replacement for native apps. Trying out a new tool quickly, or short-term when using a tool with a customer, in enterprise where you can't install native tools, etc...


Well, make a binary that doesn't require installation. This is a very narrow use case for a technology that's being advocated for more than that use case.


Most enterprise users can’t download and run an arbitrary binary due to numerous security controls and regulations. This is true even if that binary is signed - if it isn’t on an allow-list maintained by high priests it can’t be used.

The real world demands web apps in the enterprise even in 2021.


This is a poor reason in 2021 - app stores exist. Software developers have an obligation, as professionals, to present the costs of web apps vs binaries when developing new projects, the browser is not an operating system and has real limitations - as this project demonstrates.


Surprised not to see Realm included (www.realm.io)


I wonder why multi-tab isn't supported in Watermelon; cross-tab updates through the localStorage API work great.


You can use multiple tabs but to have more than one write to the DB and not be overwritten relies on (online) sync.

The problem is that Watermelon assumes a consistent view of the entire database, so you can't have multiple writers - at least not without synchronous notifications from IndexedDB (not a thing), leader election (cannot be made reliable), or some design sacrifices. To be reconsidered in the future...


> WatermelonDB uses the LokiJS adapter which is an in memory database that regularly persists the data to IndexedDB either on interval, or when the browser tab is closed


What are the advantages of using one of these client-side databases vs using IndexedDB directly?


IndexedDB is a joke of a database. Yes, it can store data, and you can create a very simple index, so it's _technically_ a database… But its ability to express queries is borderline useless for all but the simplest use cases, it's slow, and it's very inconvenient to use. So solutions exist that range from giving IDB a simpler, more modern API, all the way to using IDB as a dumb storage medium for a fast in-memory database.


Replication, Encryption, Conflict Handling, Multi-Tab-Support, Compression, Observability and many more..


I never see Mozilla's Kinto mentioned.


gunjs?


I've tried to read the code 3-4 times now. Every time I have to give up. The "chain" thing is really bad code, even admitted by the author.

If you're considering GunJS, take a look at the code and verify it's something you'd want to debug, before going all-in.


I did go all in and it worked fine, great even. The devs are responsive on their Gitter for sorting things out and it was generally simple enough that I wouldn't need to mess with the internals


I saw the author present once, I couldn't follow anything they were talking about or the code samples presented.


How far the world has fallen from Linux, MySQL, SpringBoot, and HTML/CSS/JS


The world has moved on from a beige box on a desk with a fat physical link made of copper and fibre.

Allowing users to work when their mobile signal is spotty is pretty good UX for those who don't work in an office cubicle or from home.

Yes you could argue that native mobile apps have that covered...

But should it be that you have two proprietary "standards", and only those two? For "trivial" apps that are just data entry/query?

Why do I need 124MB for the Twitter app code alone, when the web version is closer to ~7MB? (And don't start the argument about 7MB being obscene for a website, please. We're discussing a web APP, not grandma's cooking blog.)

Plenty of sites are still on the "Linux, MySQL, SpringBoot, and HTML/CSS/JS" stack or equiv. We just have tools now that let us add "works while you're on a train where the internet is unreliable" to the featureset.


Wait, are you saying that the stack I mentioned doesn't work well on mobile? Because I've coded and tested everything myself-- my sites are infinitely better on mobile compared to anything written in react


You can only live in the past for so long. To bystanders, you're falling behind by being cynical and ignorant of newer solutions.

Don't get me wrong. The world still uses the technology you mentioned at large, but the industry has in fact built upon and grown alongside older technology.


I don't want to seem like I'm defending SpringBoot, ferchrissakes, but I still fail to see a big impact of PWAs. Especially a positive one for the user, not mere ad clicks.

In general, the whole rich web app space reminds me of the "You're not making Christianity better, you're making Rock'n Roll worse" meme.


I am genuinely shocked that people don't like spring boot? It is very performant, open, and ergonomic in my experience.


2021 Spring and Maven could buy me chocolates and flowers all day long, I wouldn't forget what they did to me in the past...

But seriously, I'm okay with working with SB when I have to, but quite often in those situations Java wouldn't be my first choice, and it's still all the putrescence of proper Spring underneath. A framework for a framework, with a bit too much magic for me. Explicit is better than implicit.


While I certainly can fathom that other newer technologies are capable of solving interesting problems, I think my assertion is that the additional value they provide is not worth the risks involved in trying them.



