Sophia: A modern embeddable key-value database – v1.2.2 released

NhanH · on April 12, 2015

There seem to be a lot of embedded KV stores already, and SQLite is pretty much the de facto embedded relational DB. Is there a good embedded graph DB around? Specifically, an embedded property graph DB. HyperGraphDB is embeddable, but it's not a property graph, and while Neo4j has an embedded version, I don't think it works for non-Java use cases.

RussianCow · on April 12, 2015

There are a lot of platform-specific solutions (neo4j, networkx, Core Data, etc.) but I'm not aware of a generalized solution. I would like to know this too, because I'm often constrained to certain languages/platforms but would like to use something like neo4j.

marknadal · on April 13, 2015

If you are using NodeJS then http://github.com/amark/gun might do the trick. It's an embedded graph database with realtime push notifications. Properties are just regular ol' JSON.

mdcox · on April 13, 2015

Working on one! Give us a few more weeks though, we don't wanna release something we aren't 100% happy with, even as a 0.1.

sophacles · on April 13, 2015

There's Sparksee/Dex. Not sure it has everything you're looking for, and it's proprietary and a bit pricey.

taterbase · on April 12, 2015

I believe CoreData should count as an embedded graph DB. It's backed by SQLite but is Cocoa-specific.

fasteo · on April 12, 2015

Be sure to check out Tarantool[1]; it uses Sophia for on-disk databases.

[1] http://www.tarantool.org

slapresta · on April 12, 2015

Never heard about it before, it looks interesting.

  > Tarantool combines the network programming power of Node.JS with data persistence capabilities of Redis.

Is that sarcasm? I can't tell.

vbit · on April 13, 2015

I don't think so.

Tarantool uses an async evented IO model, but with Lua coroutines rather than JavaScript. There are no callbacks, just 'yield points'.

Also, the primary data store backend is an in-memory database with optional 'snapshotting' to disk. An alternative backend uses Sophia, so it's not 100% in memory.
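To illustrate the "yield points instead of callbacks" idea, here is a minimal sketch. It is not Tarantool code and not Lua: it uses Python's standard asyncio as a stand-in, and the names `handle_request` and `main` are made up for the example. Each handler reads top to bottom like blocking code, but suspends at `await` so the event loop can run another handler in the meantime.

```python
import asyncio


async def handle_request(name, delay, log):
    """Pretend to serve one request; 'delay' stands in for a slow I/O call."""
    log.append(f"{name}: start")
    # Yield point: this coroutine is suspended here and the event loop is
    # free to run other coroutines. No callback is registered by hand.
    await asyncio.sleep(delay)
    log.append(f"{name}: done")
    return name.upper()


async def main():
    log = []
    # Both handlers run concurrently on a single thread.
    results = await asyncio.gather(
        handle_request("a", 0.05, log),
        handle_request("b", 0.01, log),
    )
    return results, log


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The log shows "b" finishing before "a" even though "a" started first, which is the interleaving an evented model gives you without splitting the handler into callbacks.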

fasteo · on April 12, 2015

I tend to see Tarantool as a Lua-powered database. In this sense, you could easily implement a Redis-like system on top of it using Lua.

vbit · on April 12, 2015

Do you know which version of Sophia it uses?

kostja · on April 13, 2015

The latest. Sophia was created as a disk-based engine for Tarantool, and is also available as a standalone embeddable library.

vbit · on April 14, 2015

Ah, thanks - didn't know that. Tarantool is very interesting, btw.

MichaelGG · on April 12, 2015

Does anyone have recommendations on a constant DB optimized for sequential integer keys? Running LZ4 over things is cool, but using delta encoding or more clever schemes, you can work right on the compressed key data. (And even more fun if the value is also just a restricted set of integers, like an inverted index.)
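For readers unfamiliar with the delta-encoding idea in the question, here is a rough sketch of one common scheme: store each sorted key as the gap from the previous key, written as a variable-length integer. All function names are made up for the example, and this is not how any of the databases in this thread store keys. The `contains` helper shows a lookup that walks the compressed bytes directly and stops early, without materializing the full key list.

```python
def encode_varint(n):
    """Encode a non-negative int, 7 bits per byte, high bit = 'more follows'."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def delta_encode(keys):
    """Encode ascending non-negative integer keys as varint-coded gaps."""
    out = bytearray()
    prev = 0
    for key in keys:
        if key < prev:
            raise ValueError("keys must be non-negative and ascending")
        out += encode_varint(key - prev)
        prev = key
    return bytes(out)


def iter_keys(data):
    """Yield the original keys one at a time from the encoded bytes."""
    prev = cur = shift = 0
    for byte in data:
        cur |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7          # more bytes belong to this gap
        else:
            prev += cur         # gap complete: rebuild the absolute key
            yield prev
            cur = shift = 0


def contains(data, target):
    """Membership test over the compressed form, stopping once past target."""
    for key in iter_keys(data):
        if key == target:
            return True
        if key > target:
            return False
    return False
```

With keys `[1000, 1001, 1002, 1005, 2000]` this takes 7 bytes, because small gaps fit in a single byte each.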

hyc_symas · on April 15, 2015

LMDB has optimized support for integer keys, as well as for sequentially sorted data. http://symas.com/mdb/doc/

gfodor · on April 12, 2015

comparisons to kyoto cabinet, leveldb, and rocksdb (on features, maturity, and performance) would be great if anyone has any to share.

donpdonp · on April 12, 2015

also add lmdb/boltdb to that list. there seems to be a convergence around a certain feature-set for embeddable key/value stores: MVCC semantics and ordered keys.
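As a toy illustration of those two features (ordered keys and MVCC-style snapshots), here is a small in-memory sketch. `TinyMVCCStore` and its methods are invented for this example and do not reflect the API or internals of LMDB, BoltDB, or Sophia. Keys are kept sorted so range scans are cheap, and every write gets a version number so a reader holding an older snapshot keeps seeing the old values.

```python
import bisect


class TinyMVCCStore:
    """Toy ordered key-value store with versioned values (None is not storable)."""

    def __init__(self):
        self._keys = []        # all keys, kept sorted for range scans
        self._versions = {}    # key -> list of (version, value), oldest first
        self._version = 0      # global write counter

    def put(self, key, value):
        self._version += 1
        if key not in self._versions:
            bisect.insort(self._keys, key)
            self._versions[key] = []
        self._versions[key].append((self._version, value))

    def snapshot(self):
        """Return a token that pins the current state for later reads."""
        return self._version

    def get(self, key, snapshot=None):
        limit = self._version if snapshot is None else snapshot
        # Newest version that is not newer than the snapshot wins.
        for version, value in reversed(self._versions.get(key, [])):
            if version <= limit:
                return value
        return None

    def scan(self, start, end, snapshot=None):
        """Yield (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        for key in self._keys[lo:hi]:
            value = self.get(key, snapshot)
            if value is not None:
                yield key, value
```

A reader that took `snap = store.snapshot()` before later writes can keep calling `store.scan("a", "z", snap)` and get a consistent view, while readers without a snapshot see the latest values.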

jonatanheyman · on April 12, 2015

As well as comparison to LMDB.

retrodict · on April 16, 2015

https://charlesleifer.com/blog/completely-un-scientific-benc... does some basic performance comparisons for unqlite, vedis, dbm and kyotocabinet, leveldb, rocksdb, sqlite and redis.

diydsp · on April 13, 2015

Just to be clear, does embeddable have a specific meaning here? I'm a firmware developer, often writing code for small CPUs and microcontrollers. Does this apply? It seems like here "embeddable" means it can be compiled into an app, as opposed to getting accessed through a server. Is that correct? Thank you.

ximus · on April 13, 2015

  > It seems like here "embeddable" means it can be compiled into an app, as opposed to getting accessed through a server. Is that correct?

Yes.

Whether or not it is a good fit for small CPUs and microcontrollers is a separate question, and one I can't comment on.

hyc_symas · on April 14, 2015

(since LMDB has been mentioned in this thread... LMDB is embeddable in every sense of the word. It can work with as little as 64KB of memory and is already deployed in a number of MCU-based products. Unfortunately I don't have permission to name names.)

justin66 · on April 16, 2015

Is LMDB running on any operating systems that do not offer mmap support?

hyc_symas · on April 17, 2015

Not currently. That's kind of a fundamental component of LMDB's design.

justin66 · on April 17, 2015

Thanks. That makes your comment about MCU-based systems that much more intriguing. :)

notduncansmith · on April 13, 2015

I believe embeddable is referring to the definition you inferred - that is, you can compile it into your application rather than running it as a separate service.
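To make that definition concrete, here is a small sketch using SQLite (mentioned at the top of the thread) through Python's standard `sqlite3` module. The table name `kv` and the helpers `open_store`, `put`, and `get` are made up for the example. The point is only that the database engine runs inside the application's own process via library calls, with no server to start and no socket to connect to.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a database inside this process; no server involved."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value TEXT)"
    )
    return conn


def put(conn, key, value):
    # 'with conn' wraps the statement in a transaction and commits on success.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)",
            (key, value),
        )


def get(conn, key):
    row = conn.execute(
        "SELECT value FROM kv WHERE key = ?", (key,)
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    store = open_store()
    put(store, "greeting", "hello")
    print(get(store, "greeting"))
```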

vezzy-fnord · on April 12, 2015

Since we're throwing around names here, depending on your use case something as simple as cdb can be amazing: http://cr.yp.to/cdb.html

eternalban · on April 12, 2015

@pmwkee: http://sphia.org/pv12.html doesn't tell us the scaling characteristics. The cited performance page shows the DB at a steady state of 6.0M keys. How does it behave under dynamic load? Various scenarios that help your potential users determine whether the software is a good fit for their use case would be helpful.

Glanced at the code and the arch doc. Looks promising and shows careful crafting. Well done!

johncmouser · on April 12, 2015

Cool! I was looking for a simple key-value alternative for SQLite3 and was going to use redislite[1]. But this is perfect, I think it has the potential to replace SQLite3.

[1] https://github.com/seppo0010/redislite

otoburb · on April 13, 2015

SQLite4 is being designed as a key-value alternative to SQLite3[1]. SQLite3 and SQLite4 are meant for different use-cases and are expected to co-exist. Unfortunately, SQLite4 hasn't yet been released, but wanted to let you know that the SQLite developers are actively working on addressing the need for an embedded key-value store with SQLite4.

[1] https://sqlite.org/src4/doc/trunk/www/design.wiki

aswanson · on April 12, 2015

BSD licensed and implemented as small C-written library with zero dependencies.

What's not to like...

virmundi · on April 12, 2015

So why not BerkeleyDB? I couldn't find a comparison to the old standard on the site (granted, I took only a cursory glance).

beagle3 · on April 12, 2015

Why BerkeleyDB?

BerkeleyDB is now AGPL3[0], which some projects have a problem with. (Of course, you can buy a commercial license. Some projects have a problem with that too.)

But the main reason, almost regardless of context, as to "why not BerkeleyDB" is LMDB[1]. It works way better than everything else in just about every practical use that has more reads than writes.

The only downside, as far as I can tell, is that right now it relies on memory mapping the entire database, so you're limited to ~1GB overall database size on 32-bit systems. There is no practical limit on 64-bit systems. Also, I recall Howard Chu (main LMDB developer) mentioning that in the near future LMDB will gain the ability to manage memory manually, thus removing this restriction as well (for a performance price if used this way).

[0] http://www.oracle.com/technetwork/database/database-technolo...

[1] http://symas.com/mdb/
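For anyone who hasn't used memory mapping, here is a minimal sketch of the general technique using Python's standard `mmap` module. It is not LMDB's code or file format; the fixed 16-byte record layout and the helper names are assumptions made for the example. The file is mapped into the process's address space and read like a byte array, which is also why the size of the address space bounds how much can be mapped at once.

```python
import mmap

RECORD_SIZE = 16  # assumed fixed record width for this sketch


def write_records(path, records):
    """Write each record padded with NUL bytes to RECORD_SIZE."""
    with open(path, "wb") as f:
        for rec in records:
            if len(rec) > RECORD_SIZE:
                raise ValueError("record too long for this sketch")
            f.write(rec.ljust(RECORD_SIZE, b"\0"))


def read_record(path, index):
    """Map the whole file read-only and slice out one record."""
    with open(path, "rb") as f:
        # Length 0 maps the entire file; the OS pages data in on access.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            start = index * RECORD_SIZE
            if start + RECORD_SIZE > len(view):
                raise IndexError("record index out of range")
            return view[start:start + RECORD_SIZE].rstrip(b"\0")
```

Reading record `i` touches only the pages that hold it, so no explicit read buffer or cache is needed in the application.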

hyc_symas · on April 13, 2015

The feature for re-mapping on 32bit systems is available here https://gitorious.org/mdb/mdb/source/69d7cb8d44e04f02d8d0c92...:

beagle3 · on April 13, 2015

When will it be merged to the mainline? (or is it already there?) Will it eventually be a run-time option, or always a compile time option?

The latest reference I can find is http://www.openldap.org/lists/openldap-devel/201410/msg00001... - were the problems solved?

Thanks for LMDB. It is awesome.

hyc_symas · on April 14, 2015

It still needs heavier testing before going to mainline. I expect it will only ever be a compile-time option. On 64-bit it's pointless, and 32-bit is going the way of the dodo.

pron · on April 12, 2015

BerkeleyDB is awesome, but its license may not be suitable for many organizations. Also, the different design would probably result in different behaviors under different scenarios. It seems like Sophia is optimized for inserts.

halayli · on April 12, 2015

The BerkeleyDB API is ugly, and its extra features probably impact performance.

tremols · on April 13, 2015

Xodus from JetBrains (Java only / Apache 2 License) looks promising.

halayli · on April 12, 2015

This looks very promising. The code is very clean and optimization is taken into consideration.

raghavsethi · on April 13, 2015

Also, what exactly is a database traversal? Is it a random read benchmark? If so, what is the distribution - uniform, zipf, or something else?

skeoh · on April 12, 2015

What's going on with that logo? It is completely illegible.