
As someone who has been doing local-first for the last 3 years with Notesnook, let me tell you: it's not all gardens and roses. Local first has its own very unique set of problems:

1. What to do with stale user data? What happens if a user doesn't open the app for a year? How do you handle migrations?

2. What about data corruption? What happens if the user has a network interruption during a sync? How do you handle partial states?

3. What happens when you have merge conflicts during a sync? CRDT structures are not even close to enough for this.

4. What happens when the user has millions of items? How do you handle sync and storage for that? How do you handle backups? How do you handle exports?

One would imagine that having all your data locally would make things fast and easy, but oh boy! Not everyone has a high-end machine. Mobiles are really bad with memory. iOS and Android have an insane level of restrictions on how much memory an app can consume, and for good reason, because most consumer mobile phones have 4-6 GB of RAM.

All these problems do not exist in a non-local-first situation (but other problems do). Things are actually simpler in a server-first environment because all the heavy lifting is done by you instead of the user.




> 1. What to do with stale user data? What happens if a user doesn't open the app for a year? How do you handle migrations?

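    version = db.query("select value from config where key='version'").fetch_one()
    switch (version) {
    case 1:
        db.migrate_to_version_2()
        fallthrough
    case 2:
        db.migrate_to_version_3()
        fallthrough
        // ... and so on, one case per schema version
    case 3:
        // already on the latest schema, nothing to do
    }
    // re-read the version: each migration is assumed to bump the stored value
    version = db.query("select value from config where key='version'").fetch_one()
    assert(version == 3)
    start_sync()

Just don't delete the old cases. Refuse to run sync if the device is not on the latest schema version.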

One of my Django projects started in 2018 and has over 150 migration files, some involving major schema refactors (including introducing multi-tenancy). I can take a DB dump from 2018, migrate it, and have the app run against master, without any manual fixes. I don't think it's an unsolved problem.
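
As a more concrete version of the pseudocode above, here is a minimal sketch in Python with the standard `sqlite3` module. The `notes` table, the three migrations, and the helper names (`open_db`, `migrate`, `ready_to_sync`) are made up for illustration, and storing the version in SQLite's `PRAGMA user_version` is an assumption rather than what any particular app does.

```python
import sqlite3

# Hypothetical schema history: MIGRATIONS[i] upgrades version i to i + 1.
# Old entries are never deleted, so a database of any age can catch up.
MIGRATIONS = [
    "CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)",
    "ALTER TABLE notes ADD COLUMN title TEXT DEFAULT ''",
    "ALTER TABLE notes ADD COLUMN updated_at INTEGER DEFAULT 0",
]
LATEST = len(MIGRATIONS)


def open_db(path: str) -> sqlite3.Connection:
    # isolation_level=None disables implicit transactions, so the explicit
    # BEGIN/COMMIT in migrate() also covers the DDL statements.
    return sqlite3.connect(path, isolation_level=None)


def migrate(conn: sqlite3.Connection) -> int:
    """Apply every pending migration in order; return the resulting version."""
    version = conn.execute("PRAGMA user_version").fetchone()[0]
    if version > LATEST:
        raise RuntimeError("database was written by a newer app version")
    for target in range(version + 1, LATEST + 1):
        conn.execute("BEGIN")
        try:
            conn.execute(MIGRATIONS[target - 1])
            conn.execute(f"PRAGMA user_version = {target}")
            conn.execute("COMMIT")
        except Exception:
            # A failed step leaves the database at the previous version.
            conn.execute("ROLLBACK")
            raise
    return conn.execute("PRAGMA user_version").fetchone()[0]


def ready_to_sync(conn: sqlite3.Connection) -> bool:
    # Refuse to sync unless the local schema is fully up to date.
    return migrate(conn) == LATEST
```

Each step commits together with its version bump, so a crash in the middle of an upgrade resumes from the last completed step on the next launch.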

> 2. What about data corruption? What happens if the user has a network interruption during a sync? How do you handle partial states?

Run the sync in a transaction.
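
A minimal sketch of what that can look like on the device, assuming a local SQLite store. The `items` and `sync_state` tables and the function names are hypothetical. The point is that the downloaded changes and the sync cursor are written in one transaction, so an interruption leaves neither behind.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create a throwaway in-memory store with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    with conn:
        conn.execute("CREATE TABLE items (id TEXT PRIMARY KEY, body TEXT)")
        conn.execute("CREATE TABLE sync_state (cursor INTEGER NOT NULL)")
        conn.execute("INSERT INTO sync_state (cursor) VALUES (0)")
    return conn


def apply_batch(conn, changes, new_cursor):
    """Write a batch of remote changes and advance the cursor atomically."""
    with conn:  # commits on success, rolls back if anything raises
        for item_id, body in changes:
            conn.execute(
                "INSERT OR REPLACE INTO items (id, body) VALUES (?, ?)",
                (item_id, body),
            )
        conn.execute("UPDATE sync_state SET cursor = ?", (new_cursor,))


def flaky_download():
    """Simulates a network drop after the first change has arrived."""
    yield ("a", "first")
    raise ConnectionError("network dropped mid-batch")
```

If `apply_batch(conn, flaky_download(), 10)` raises, the store still has zero items and the cursor is still 0, so the next attempt simply asks the server for the same batch again.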

> 3. What happens when you have merge conflicts during a sync? CRDT structures are not even close to enough for this.

CRDTs are probably the best we have so far, but what you should do depends on the application. You may have to ask the user to pick one of the possible resolutions. "Keep version A, version B, or both?"
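
To illustrate the "depends on the application" part, here is a sketch of a field-level three-way merge for flat records. It is not a CRDT, and the function name and record shape are assumptions. Changes that touch different fields merge automatically, and only fields edited differently on both sides are left for the user to decide.

```python
_MISSING = object()  # marks a field that is absent (e.g. deleted) in a version


def three_way_merge(base: dict, a: dict, b: dict):
    """Merge two edited copies of a record against their common ancestor.

    Returns (merged, conflicts). conflicts maps a field name to the pair of
    competing values (a_value, b_value) that need a human decision; a side
    that deleted the field shows up as the _MISSING sentinel.
    """
    merged, conflicts = {}, {}
    for key in sorted(base.keys() | a.keys() | b.keys()):
        base_v = base.get(key, _MISSING)
        a_v = a.get(key, _MISSING)
        b_v = b.get(key, _MISSING)
        if a_v == b_v:
            chosen = a_v            # both sides agree
        elif a_v == base_v:
            chosen = b_v            # only B changed this field
        elif b_v == base_v:
            chosen = a_v            # only A changed this field
        else:
            conflicts[key] = (a_v, b_v)  # both changed it differently
            continue
        if chosen is not _MISSING:
            merged[key] = chosen
    return merged, conflicts
```

For example, if one device renames a note and another edits its body, the merge succeeds with no conflicts. If both devices rename it differently, `conflicts` contains the two titles and the app can show the "A, B, or both?" prompt for just that field.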

> 4. What happens when the user has millions of items? How do you handle sync and storage for that?

Every system has practical limits. Set a soft limit to warn people about pushing the app too far. Find out who your user with a million items is. Talk to them about their use cases. Figure out if you can improve the product, maybe offer a pro/higher-priced tier.

> Mobiles are really bad with memory. iOS and Android have an insane level of restrictions on how much memory an app can consume, and for good reason, because most consumer mobile phones have 4-6 GB of RAM.

You don't load up your entire DB into memory on the backend either. (Well your database server is probably powerful enough to keep the entire working set in memory, but you don't start your request handler with "select * from users".)
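
The same habit carries over to the device. As a rough sketch, assuming a local SQLite table called `notes` with an integer `id` (both hypothetical), you can walk the data one bounded page at a time instead of materializing everything:

```python
import sqlite3


def iter_pages(conn: sqlite3.Connection, page_size: int = 100):
    """Yield rows from the hypothetical notes table one page at a time.

    Uses keyset pagination (WHERE id > last seen id), so memory use is
    bounded by page_size no matter how many rows the table holds.
    """
    last_id = 0
    while True:
        rows = conn.execute(
            "SELECT id, title FROM notes WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, page_size),
        ).fetchall()
        if not rows:
            return
        yield rows
        last_id = rows[-1][0]
```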

You're asking very broad questions, and I know these are very simplistic answers - every product will be slightly different and face unique trade-offs. But I don't think the solutions are outside of reach for an average semi-competent engineer.


> You may have to ask the user to pick one of the possible resolutions. "Keep version A, version B, or both?"

For structured data, with compound entities, linked entities, both, or even both in the same entity, that can be a lot more complicated.

If a user has updated an object and some of its children, is that an atomic change or might they want the child/descendant/parent/ancestor/linked updates to go through even if the others don't? All of them or some? If you can't automatically decide this (which you possibly can't in a way that will satisfy a large enough majority of use cases) how do you present the question to the user (bearing in mind this might be a very non-technical user)?

Also what if another user wants to override an update that invalidates part/all of their own? Or try to merge them? Depending on your app this might not matter (the user might always be me on different devices, likely using one at a time, which is easier to reason about than a user interacting with others who may be making many overlapping updates).


I think you misunderstand. My intention was not to say local-first is bad or impossible; it's not. We have been local-first at Notesnook since the beginning and it has been going alright so far.

But anyone looking to go local-first or build a local-first solution should have a clear idea of what problems can arise. As I said in the original comment: it's not all gardens and roses.


> As I said in the original comment: it's not all gardens and roses.

Which is why I bit ;) You're raising valid but also quite broad concerns. Share some war stories, if you can <3


Ah war stories.

Just a few weeks back a user came to us after failing to migrate GBs of their data off of Evernote. This, of course, included attachments. They had successfully imported & synced 80K items, but when they tried to log in on their iPhone, the sync was really, really slow. They had to wait 5 hours just to get the count up to 20K items. And that's when the app crashed, resetting the whole sync progress to 0.

In short, we had not considered someone syncing 80K items. To be clear, 80K is not a lot of items even for a local-first sync system, but you do have to optimize for it. The solution consisted of extensively utilizing batching & parallelization on both the backend & the user's device.

The result? Now their 80K items sync within 2 minutes.
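
For readers wondering what batching and parallelization can look like in general, here is a generic sketch. It is not Notesnook's actual implementation, and `fetch_page`, `store_page`, and the `done` set are hypothetical names. Pages are downloaded concurrently, stored in order, and checkpointed, so a crash does not reset progress to 0.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Set


def sync_pages(
    fetch_page: Callable[[int], List],
    store_page: Callable[[int, List], None],
    total_pages: int,
    done: Set[int],
    workers: int = 4,
) -> Set[int]:
    """Download all pages not yet in `done`, several at a time.

    `done` is the checkpoint. A real app would persist it (ideally in the
    same transaction as store_page) so it survives a crash.
    """
    pending = [n for n in range(total_pages) if n not in done]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map fetches concurrently but yields results in input order,
        # so storing stays sequential and simple.
        for page_no, items in zip(pending, pool.map(fetch_page, pending)):
            store_page(page_no, items)
            done.add(page_no)  # completed pages are never fetched again
    return done
```

If a fetch fails partway through, the exception propagates, but `done` still lists every page that was stored, and calling `sync_pages` again only requests the remainder.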


In this case, would the optimizations/fix be easier if you were using traditional client-server setup vs local-first?


The problem wouldn't exist. This was about the phone fetching 80k new items from the server. If the phone just shows the item you're looking at, one at a time, and doesn't try to sync everything, there's no such problem.


There's no restriction inherent to CRDTs/local-first around a partial sync. You are not required to sync everything.


I've been working on mobile apps developed for education in Afghanistan, rural Rwanda, etc for the last 9 years. I used to think that sync was the way, but I have learned from painful experience.

4 (extended): What happens when the user has access to millions of items, but they probably only want a few (e.g. an ebook library catalog)? Do you waste huge amounts of bandwidth and storage to transfer data, of which 99.9% will be useless? We had a situation where the admin user logging in on a project that had been running for years resulted in a sync of 500MB+, 99.99% of that data the admin would never directly use.

Also: do you make users wait for all that old data to sync before they can do anything when they log in?

Relying on sync is basically relying on push-only data access. I think in reality most offline/local first applications will work best when they push what a user probably wants (e.g. because they selected it for offline sync, starred it, etc) and can pull any other data on demand (in a way that it can be accessed later offline).
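
A minimal sketch of that push-plus-pull idea, with a hypothetical `OfflineCache` class and a `fetch_remote` callable standing in for whatever the server API is. Pinned items are pushed ahead of time, everything else is pulled on first access and kept for later offline use.

```python
class OfflineCache:
    """Hypothetical local store: sync what is pinned, pull the rest on demand."""

    def __init__(self, fetch_remote):
        self._fetch_remote = fetch_remote  # callable: item_id -> item
        self._local = {}                   # items available offline
        self.pinned = set()                # ids the user starred / selected

    def pin(self, item_id):
        self.pinned.add(item_id)

    def sync_pinned(self):
        """The 'push' half: refresh only what the user asked to keep."""
        for item_id in self.pinned:
            self._local[item_id] = self._fetch_remote(item_id)

    def get(self, item_id, online=True):
        """The 'pull' half: fetch on first access, then serve from disk."""
        if item_id in self._local:
            return self._local[item_id]
        if not online:
            raise LookupError(f"{item_id} is not available offline")
        item = self._fetch_remote(item_id)
        self._local[item_id] = item  # keep it for later offline access
        return item
```

With a catalog of millions of ebooks, only the pinned ones cost bandwidth up front, and anything a user has opened once keeps working offline afterwards.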

I've outlined that here: https://medium.com/@mike_21858/auto-generating-an-http-serve...


Query-based sync to partially replicate is an absolute must. This was a key feature with Ditto: https://www.ditto.live


Query-based replication works when you know what the user probably wants to have in advance (e.g. a device in a warehouse needs records for that stock in that warehouse, not others). But that's still push.

You still need pull on demand access when a user opens any random item where we don't know in advance what they probably want (e.g. a discussion board scenario).


I'd say you're spot on except for point (3). There are a number of CRDT and event log approaches that, when combined properly in order to preserve user intent, can solve almost all merge issues of applications that do not require strong consistency.

> 4. What happens when the user has millions of items?

Partial replication is a problem I haven't seen many people solving but it is definitely the next frontier in this space.


I am the developer of RxDB, a local-first JavaScript database, and I made multiple apps with it and worked with many people creating local-first apps. The problems you describe are basically solved.

> What to do with stale user data?

Version/schema migration in RxDB (same for IndexedDB and WatermelonDB) can be done in simple JavaScript functions. This just works. https://rxdb.info/data-migration.html

> What about data corruption?

Offline-first apps are built on the premise that internet connections drop or do not exist at all. The replication protocols are built for exactly that purpose, so they do not have any problems with that. https://rxdb.info/replication.html

> What happens when you have merge conflicts during a sync?

You are right, CRDTs are not the answer for most use cases. Therefore RxDB has custom conflict handlers, which are plain JavaScript functions. https://rxdb.info/replication.html#conflict-handling

> What happens when the user has millions of items?

There is a limit on how much you can store on a client device. If you need to store gigabytes of data, it will just not work. You are right on that point.

> How do you handle backups? How do you handle exports?

- live backups: https://rxdb.info/backup.html

- json export/import https://rxdb.info/rx-collection.html#exportjson


Some domains just don't have these issues, like note taking where data is small and there are many acceptable ways to handle conflicts, like backup files and three way merges.

Maybe the question is less "How to make this work local first" and more "How to squash down the problem domain into something that just naturally fits with local first"?

I wish we had something like an embeddable version of SyncThing, that had incoming file change notifications, conflict handler plugins, and partial sync capabilities, and an easy cloud storage option.

I think most everything I've ever wanted to p2pify could probably be built on top of that, except social media which is a separate really hard thing.


Data migrations don't exist on non-local-first software? Interrupted requests don't exist?

Using 4 GB per user on your backend works?

I'm very surprised by this list...


I don't think even you understand what you just said.

Consumer devices are notorious for their reliability problems, compared to a full-blown server that you have 100% control over, with almost insane amounts of RAM & CPU power & a lot of guarantees.

Running migrations on a server is far, far different from running them on every user's device. The sheer scale of it is different.

> Using 4 GB per user on your backend works?

That was a comment on the average RAM on a consumer device - not the total RAM required per user.


> Running migrations on a server is far, far different from running them on every user's device. The sheer scale of it is different.

Well, it’s not just that. Among other things, some of the application instances would be outdated but still need to work, so you would need to support _all_ the DB schemas you have ever had for your app.


I know I understand what I said, and I am not convinced by anything you said.

What reliability problem would prevent you from running local-first software but doesn't interfere with running a thin client?

Why would the business logic part of your app require more RAM on the end-user device than it requires per-user (or per-document, etc) on a server?

Why do you claim that running migrations is so fundamentally different here and there?

If you want to argue I would appreciate you doing it with real arguments and experiences rather than even more unsubstantiated claims and statements like "you don't understand what you just said".



