LokiJS – Lightweight JavaScript in-memory database (lokijs.org)
130 points by joeminichino on Nov 4, 2014 | 37 comments



Great!

I'm really no expert, but I think that language-specific persistent in-memory databases are the way to go.

Deep inside, what I really want on my backends is just a data structure. A single big tree (or graph) of objects. Some primitives to make it thread-safe somehow (immutable collections, or mutexes, or just old-fashioned single-threadedness like Node, whatever), and that's it. Requests from the client query and mutate that data. In many situations, I wouldn't need anything special to make this work fine for an application with up to a pretty large number of users. Not all apps do big-data analyses. RAM is gigantic these days, bigger than the SQL data files of swathes of web apps.

However, there's that pesky problem of code upgrades and crashing servers. If my server never crashed and I could hot-swap the code, I wouldn't need persistence at all. But I do, so I don't just want an in-memory data structure, I want a persistent in-memory data structure. I want to use this data structure as easily as the programming language can possibly make it, while still guaranteeing some sane level of persistence and fault tolerance.

Redis is a nice idea, but the fact that it's a separate server, with a client, a protocol, and data structures that don't map precisely onto my programming language's, forces me to write a whole bunch of boilerplate anyway. Even more so with other databases like Mongo or Postgres.

If I understand it well, Loki doesn't entirely do what I'd want: it does not save the data to disk on every change, but less often, if I get it right. That might be good enough if my problem allows for many little independent Loki databases, but if the dataset is a gigabyte and persistence means flushing that whole gigabyte to disk after every data change, it probably won't work very well.

I'm really curious if other people here have similar ideas, maybe implementations, of these concepts. Maybe Loki could be extended with a snapshot+operation_log type of data storage like Redis has and then it'd be pretty close.
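To make the snapshot+operation_log idea concrete, here's a minimal sketch of the recovery half, independent of Loki. All names (`applyOp`, `recover`, the op shapes) are illustrative, not any real API: each mutation is a tiny record appended to a log, and after a crash you replay the log on top of the last snapshot.

```javascript
// Redis-AOF-style recovery sketch: snapshot + replayed operation log.
// Op records are tiny per change, so they can be flushed on every
// mutation while full snapshots happen rarely.

function applyOp(state, op) {
  if (op.type === 'set') state[op.key] = op.value;
  else if (op.type === 'del') delete state[op.key];
  return state;
}

// Recover: start from the last snapshot, then re-apply logged operations.
function recover(snapshot, opLog) {
  var state = JSON.parse(JSON.stringify(snapshot)); // cheap deep copy
  return opLog.reduce(applyOp, state);
}

var snapshot = { users: 1 };
var opLog = [
  { type: 'set', key: 'users', value: 2 },
  { type: 'set', key: 'posts', value: 10 },
  { type: 'del', key: 'users' }
];
console.log(recover(snapshot, opLog)); // { posts: 10 }
```

The point is that persistence cost scales with the size of each change, not with the size of the whole dataset.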


> If I understand it well, Loki doesn't entirely do what I'd want: it does not save the data to disk on every change, but less often, if I get it right.

From a cursory glance at the source, it doesn't look like anything internal ever calls .save()/.saveToDisk(), so it is up to you to decide how often you want to be creating your on-disk checkpoints.

EDIT: Seems I am wrong: "In a nodejs/node-webkit environment LokiJS also persists to disk whenever an insert, update or remove is performed." [1]. But I still can't spot where .save[/ToDisk] is called in the source [2]. Anyone?

> That might be good enough if my problem allows for many little independent Loki databases, but if the dataset is a gigabyte and persistence means flushing that whole gigabyte to disk after every data change, it probably won't work very well.

If you're looking at gigabyte+ datasets, then something designed to be in-memory is probably not your best bet. Aside from saving, (one imagines) this would also affect load times (reading->parsing->importing a multi-GB JSON file at launch can't be quick).

From [1]:

> LokiJS is ideal for the following scenarios:

* where a lightweight in-memory db is ideal

* cross-platform mobile apps where you can leverage the power of javascript and avoid interacting with native databases

* data sets are not so large that it wouldn't be a problem loading the entire db from a server and synchronising at the end of the work session

[1] https://github.com/techfort/LokiJS

[2] https://github.com/techfort/LokiJS/blob/master/src/lokijs.js


> If you're looking at gigabyte+ datasets, then something designed to be in-memory is probably not your best bet. Aside from saving, (one imagines) this would also affect load times (reading->parsing->importing a multi-GB JSON file at launch can't be quick).

Why not? I only need to load when my program starts, i.e. after a crash or maybe an upgrade when hot code swapping is not available. Sounds like there may be plenty of persistence schemes that make this not entirely painful (load latest data first, or fallback to disk reads if the data isn't entirely loaded yet, which makes the service slower but still available right after a crash).

Note, I'm really just dreaming here, and I appreciate you dreaming along. Dreaming is good! Some frontend dev dreamed "i just want to rebuild the entire page whenever data changes!" and then React happened.

Thanks for your dig into the code btw, nice findings! I can't find the save() calls either, so I suspect that they used to be there but aren't now, or the other way around. It's alpha, after all.


My (unseen) emphasis was on best bet.

It would work, but I think it would be dangerous and/or inconvenient.

The danger part comes from having a sync-to-disk operation that lasts any considerable amount of time: the longer it lasts, the greater the likelihood that an inopportune crash leaves you with an incomplete (read: corrupted) JSON file. A DB built for fast disk persistence would only update the relevant records, keeping the disk writes as small as possible. I don't think Loki has any option other than writing the entire thing each time (with the current JSON save files, that is). Since save() is an expensive op, you also can't call it on every update (it wouldn't even work: the first call would lock the file for writing, and every subsequent save() would fail until the first one finished), so it would be inherently unsafe unless you committed GBs to disk for each update.

So, this might be somewhat dangerous, but we can mitigate that, right? We'll save frequently, but not too frequently, and somehow version-control our JSON file. Which is GB+ in size. So we'll also compress the copies? And when we need to restore, we'd... hmm... start reading from the latest until a valid one is found? (There's the inconvenience.)

> Note, I'm really just dreaming here, and I appreciate you dreaming along.

Likewise!

> Sounds like there may be plenty of persistence schemes that make this not entirely painful (load latest data first, or fallback to disk reads if the data isn't entirely loaded yet, which makes the service slower but still available right after a crash).

While those certainly exist, the one currently chosen by Loki can't do any of these things. Since you can't parse half a JSON file (especially in this format, which appends index info etc at the end of the file), you have to read the entire thing from disk, JSON.parse it, and then feed it into Loki. Only after all of that will you know that you are actually restoring from meaningful (and complete) data files and not junk.

The way I see this, it would be great for throwaway prototypes on the server side (or for small/unimportant data), but its real value would show on the client side. You basically get a mini MongoDB in 2K LOC that you can embed in anything that speaks ECMA3.

What's even cooler, IMHO, is the future: in their SlideShare presentation they list replication and horizontal scaling on the roadmap. While I have no idea how they envisage implementing those, I would love to experiment with the Meteor.js concepts and this: providing a full, fast cache of the user's data on the client side and then throwing differential updates around.

edit: spelling


I downloaded Loki and ran some tests, and the README is wrong: it does not do any automatic persisting to disk at any time. That might be a planned future feature.

However it does throw events, so it would not be hard to implement this behavior yourself with a simple event listener.


I was looking for that too, it seems to be missing indeed. Opened an issue[1].

[1] https://github.com/techfort/LokiJS/issues/14


>I'm really no expert, but I think that language-specific persistent in-memory databases are the way to go.

Yeah, let's go back to the seventies, and drop everything we learned about having an abstract, sound data model, and multiple access from different services.


While I appreciate your sentiment here, the fact is, there are situations where the most bulletproof solution isn't the best solution.

A perfect example are, in my mind, some of the reasons why node.js is taking off as opposed to say Java or C#... One is that the latter tend to follow "enterprise" development practices. That is, they will follow "architect" tooling that implements "design patterns"... the problem is that this adds its own kinds of complexity.

In C# (or Java), there are frameworks and tooling that tend to follow design patterns, which is nice. There are abstractions that allow for the use of multiple injection/ioc systems to make code more modular and testable. Which is awesome. It also makes code more complex, and the abstractions are often harder to follow.

In JavaScript, since you can replace just about anything, and even call into functions with differing contexts, this makes testing even easier. Because of this simplicity, with a module system (like commonjs/node/npm), you can write testable code without the need for interface abstractions... it becomes easier to test, write for, and use.

There are downsides to using JS over C#, etc., but the same arguments apply to data usage. Most "NoSQL" databases give up some safety, usually for big performance gains. It really depends on the context of the data, which I think is often lost on DBAs. I've seen some horribly normalized datasets used in SQL to create "flexible" systems where a NoSQL solution would have been a much better fit.

It depends on your needs. There are times when I'd lean more towards ElasticSearch, Mongo, Redis, Couch, Cassandra, RethinkDB, and others over the likes of MS-SQL, PostgreSQL, Oracle, DB2, Firebird, etc. All have advantages and disadvantages over each other. An embeddable db abstraction is often a very good thing. Depends on what you need.

I specifically added Firebird above because I've used it as the core for an embedded/distributed data layer before... this was long before node.js was an option. Now, I might favor something like TFA's solution, with a central MongoDB.


>A perfect example are, in my mind, some of the reasons why node.js is taking off as opposed to say Java or C#...

New programmers rediscovering the (square) wheel?

>In JavaScript, since you can replace just about anything, and even call into functions with differing contexts, this makes testing even easier.

One should rather draw the opposite conclusion: that the above makes testing even more necessary (and code even less robust).


I've been thinking about this (JavaScript object persistent storage) for a while: The dilemma is circular objects and the prototype chain. What should be taken from the run-time and what should be taken from the persistent object?
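Both horns of that dilemma show up in plain `JSON` round-tripping, which is presumably what any naive persistence layer would use. A small demonstration (the `User` constructor is just an example):

```javascript
// 1. The prototype chain is lost on round-trip.
function User(name) { this.name = name; }
User.prototype.greet = function () { return 'hi ' + this.name; };

var u = new User('ada');
var thawed = JSON.parse(JSON.stringify(u));
console.log(thawed.name);         // 'ada' -- own properties survive
console.log(typeof thawed.greet); // 'undefined' -- prototype methods are gone

// 2. Circular objects can't be serialized at all.
var node = {};
node.parent = node; // circular reference
try {
  JSON.stringify(node);
} catch (e) {
  console.log(e instanceof TypeError); // true -- "Converting circular structure"
}
```

So a persistent object store has to decide how to re-attach prototypes on load and how to encode cycles (e.g. by reference IDs), which is exactly the question above.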


Related: SQL.js, which is SQLite compiled with Emscripten

https://github.com/kripken/sql.js


Except that file is 10x bigger than Loki…


So? It's also full-blown sqlite. Let's not pretend file sizes matter much in modern times.


It does if you're going to send that to a browser.


From skimming the presentation it looks like a project similar to Meteor's Minimongo and Miniredis (https://www.meteor.com/mini-databases) used for browser caches. Minimongo, for example, implements a great majority of Mongo selectors (there are no secondary indexes, though).

(I contributed to both Minimongo and Miniredis)


I'm curious why NeDB wouldn't suffice.

https://github.com/louischatriot/nedb


From the SlideShare presentation on the homepage (p. 4) [1]:

> Performs better than similar products (NeDB, TaffyDB, PouchDB etc.) and it's much smaller

It would be interesting to see benchmarks.

[1] http://www.slideshare.net/techfort/lokijs/4


OK, TaffyDB is a piece of crap. You call `.fetch` to get a document at some id, but if you modify your fetched copy, the copy in the database is modified too. With a database like this, I'm better off with a hashmap.
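For anyone unfamiliar with the hazard being described: handing back the stored object itself means callers can mutate the database by accident, whereas returning a copy keeps the store isolated. A generic sketch (not TaffyDB's actual API):

```javascript
// A toy store illustrating leaky vs. defensive fetch.
var store = { 1: { id: 1, name: 'ada' } };

function fetchLive(id) { return store[id]; }                      // leaks the internal object
function fetchCopy(id) { return JSON.parse(JSON.stringify(store[id])); } // defensive copy

var live = fetchLive(1);
live.name = 'mutated';
console.log(store[1].name); // 'mutated' -- the stored record changed too

var copy = fetchCopy(1);
copy.name = 'safe';
console.log(store[1].name); // still 'mutated' -- the copy didn't touch the store
```

The defensive copy costs a serialization per fetch, which is the trade-off a "fast in-memory db" may be dodging.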


There is this benchmark: https://github.com/danstocker/jorder/wiki/Benchmarks ... I maintain db.js (https://github.com/kristopolous/db.js), which is part of it.

From my perspective, with a small collection size, I'll sacrifice performance for expressiveness. I have a number of expensive features that I think do nice things.


Thanks for sharing that. I jumped the gun before commenting.


No worries. I just noticed they include the benchmarks here [1]

[1] https://github.com/techfort/LokiJS/tree/master/benchmark


How does this compare to DataCollection.js?

https://www.npmjs.org/package/data-collection http://thestorefront.github.io/DataCollection.js/

They look fairly similar. What's the test coverage and do you have performance benchmarks?


Out of pure curiosity, what are the common use cases for a fast in-memory database? Are they exclusive to server-side applications (e.g. caching)?


Could be useful for a browser app which handles 'large' datasets, or for a mobile/desktop app as an SQLite replacement. Think of a messaging app which lets you search conversations client side with decent performance. It has a small footprint and some interesting features, so it might be a convenient DB for some apps even without strong performance requirements.


Cordova is a big one. The ability to have rich queries into some persistent data set is something that's difficult to achieve atm.


It's precisely why I created LokiJS in the first place, then it grew much bigger than that but yeah - that is where it all came from.


Well, it could be very useful as a more advanced "model" layer in browser-side MVC apps. I'm currently working on something that has a pretty complex data structure and simple JS objects/arrays simply don't cut it. LokiJS certainly seems worth looking into.

The feature that looks especially promising is the ability to create Dynamic Views and to listen on notifications on these views.


You can also use them in the browser to cache data to reduce server calls or improve client UI performance (a la meteor.js).


Excellent project, there needs to be more focus in this space. I've been feeling a general trend of developers going back to basic embedded databases.

I'm working on a similar project that is focused on ease of use combined with distributed/decentralized (concurrency-safe) behavior: http://github.com/amark/gun


Has anyone used this in lieu of, or in comparison to, Redis? Am I correct in assuming that with LokiJS, one could move in-memory data storage from the server (with Redis) to the browser (with LokiJS)?

This project looks quite interesting. Thanks for sharing. Could be a great solution for single-page web apps (and mobile, Cordova, etc.).


If you want a simple key-value-store in the browser, why don't you directly use localStorage(+JSON)? It's supported since IE8. My guess is that this thing is using localStorage under the hood.
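If all you need really is a key-value store, the wrapper is a few lines. A sketch taking the storage object as a parameter (in the browser you'd pass `localStorage`; the function names are mine):

```javascript
// Minimal JSON-over-Web-Storage key-value wrapper.
function put(storage, key, value) {
  storage.setItem(key, JSON.stringify(value));
}

function get(storage, key) {
  var raw = storage.getItem(key); // getItem returns null for missing keys
  return raw === null ? undefined : JSON.parse(raw);
}

// In the browser:
//   put(localStorage, 'user', { id: 1, name: 'ada' });
//   get(localStorage, 'user'); // { id: 1, name: 'ada' }
```

What this doesn't give you, and what something like Loki adds on top, is querying and indexing over the stored values.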


Presumably because LokiJS is faster. (I don't know if that's actually true.) And no, I just looked at the source code and Loki does not use localStorage under the hood. It's an in-memory database without a persistence layer, so you lose the data when you leave the page (in the browser).

Edit: Actually, reading more about Loki, it looks like the whole point of it is to provide querying/indexing capabilities to data in memory. So I don't know if there would be any value in using it as a replacement for key/value stores.


This looks interesting. Coincidentally I was just searching around for something like this to run in the browser.

I'll play around with it myself, but is there anyone who has used this that knows how well it does with memory usage?


+1 for interesting.

The ToDo demo shows the size of the collection [1]. Seems rather efficient.

[1] http://lokijs.org/#/demo


Uhm, I'm pretty sure it can use all your RAM if you want. It's an in-memory database, so all your RAM will be used for data...


I think the question was more about how compactly it stores the data in memory.


I want a very fast, client-side, read-only database with rich query support. Is this the best choice?



