Hacker News
A proof of concept MongoDB clone built on Postgres (github.com/jerrysievert)
142 points by pablobaz on April 20, 2015 | 49 comments



You can turn a Postgres db into an API with PostgREST, and it supports jsonb to retrieve json properties.

https://github.com/begriffs/postgrest
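
For reference, a minimal sketch of what that jsonb property retrieval looks like at the SQL level underneath (table and column names are made up for illustration):

    CREATE TABLE articles (id serial PRIMARY KEY, doc jsonb);

    INSERT INTO articles (doc)
    VALUES ('{"title": "hello", "tags": ["pg", "json"], "author": {"name": "ann"}}');

    SELECT doc->>'title'         AS title,       -- property as text
           doc->'tags'->0        AS first_tag,   -- array element as jsonb
           doc#>>'{author,name}' AS author_name  -- path extraction, as text
      FROM articles;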


This is a really cool project. Thanks for pointing it out.


How are JSON updates handled?

For those who aren't aware, Postgres currently lacks support for doing updates to JSON fields via SQL[1]. For many this isn't a problem, but I'd imagine that people expecting a MongoDB clone would need it.

[1] But even though you can individually address the various fields within the JSON document, you can’t update a single field. Well, actually you can, but by extracting the entire JSON document out, appending the new values and writing it back, letting the JSON parser sort out the duplicates. https://blog.compose.io/is-postgresql-your-next-json-databas...
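
A hedged sketch of that workaround on 9.4 (table and column names are illustrative): the whole document is read out, modified by the application, and written back in full.

    BEGIN;
    -- pull the full document out, locking the row so the read-modify-write
    -- round trip doesn't lose concurrent updates
    SELECT doc FROM articles WHERE id = 1 FOR UPDATE;

    -- ...application code changes a single field in the fetched JSON...

    -- write the entire document back, not just the changed field
    UPDATE articles
       SET doc = '{"title": "new title", "tags": ["pg", "json"]}'
     WHERE id = 1;
    COMMIT;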


I don't know the answer to your question but JSON update is being worked on by the PG community. I think it is unlikely to make 9.5, but we will see.

http://www.postgresql.org/message-id/flat/54E00D1F.9060101@d...


Write a function in PLV8?


Or Python, or any language that has a PL in PostgreSQL. Unfortunately a lot of people consider that a hack, and they're not wrong.

If we can address fields with operators (->, ->>, etc.) then why can't those operators modify, too? It works for everything else, after all. It's fairly counter-intuitive from a dev perspective.

That, and PLV8 still doesn't support JSONB. With JSONB being so much more efficient, faster, and so on, not being able to interact with it using the most logical procedural language is an EPIC oversight.


> If we can address fields with operators (->, ->>, etc.) then why can't those operators modify, too? It works for everything else, after all. It's fairly counter-intuitive from a dev perspective.

It certainly doesn't work for "everything else". You can update individual "parts" of arrays and record types, that's it. And those exceptions are hard-coded, so someone would either have to hard-code similar exceptions for JSON values (ugly, inflexible, generally a bad idea) or generalize support for custom operators on the left-hand side of assignments in UPDATE. All this while trying to push out a release that was already months late.
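
For comparison, here's roughly what the hard-coded array case looks like, with a made-up table; nothing equivalent exists for json/jsonb columns:

    CREATE TABLE t (id int, sizes int[], doc jsonb);

    -- updating a single array element is supported in UPDATE
    UPDATE t SET sizes[2] = 34 WHERE id = 1;

    -- but there is no analogous syntax for a single jsonb field, e.g.
    -- UPDATE t SET doc->'size' = '34' ...   -- not valid SQL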

You can call everything an oversight, but with very limited resources available and the need to cut a release at some point, this is far from the truth in this case.


Why is special-casing json or jsonb worse than special-casing arrays?


There have been calls to move toward jsonb, as well as a pull request for newer v8, but development seems to have stalled.


You can also write an update function in plain SQL. See my example in this comment on a recent Planet PostgreSQL post: http://bonesmoses.org/2015/04/10/pg-phriday-functions-and-ad...
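
Not the function from the linked post, but a rough sketch of the idea in plain SQL on 9.4: rebuild the document with one top-level key replaced (the name jsonb_set_key is made up):

    CREATE OR REPLACE FUNCTION jsonb_set_key(doc jsonb, key text, val jsonb)
    RETURNS jsonb
    LANGUAGE sql
    IMMUTABLE
    AS $$
      SELECT json_object_agg(k, v)::jsonb
        FROM (
          -- keep every existing pair except the one being replaced
          SELECT t.key, t.value::json
            FROM jsonb_each(doc) AS t
           WHERE t.key <> jsonb_set_key.key
          UNION ALL
          -- add the new value for that key
          SELECT jsonb_set_key.key, val::json
        ) AS pairs(k, v);
    $$;

    -- usage:
    -- UPDATE articles SET doc = jsonb_set_key(doc, 'title', '"new title"');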


currently, that is exactly how they are handled. the JSON structure is brought into memory, parsed, modified, and then written again to the database.

this is non-optimal, but currently the only method available.


That wouldn't change even if there was a specialized syntax, because of MVCC. So the only problem is that syntactic sugar is lacking, not that the performance isn't as good as it could be.


Use a transaction maybe?



FWIW, I've recently been implementing the MongoDB query language as well:

https://github.com/zumero/Elmo

In F#. Not even remotely close to usable or production-ready.

The approach here is somewhat different, as this implementation is built on SQLite, which it treats as a simple key-value storage layer.


Thanks for posting that, it's interesting and instructive to see examples of F# code 'out in the wild'. Much appreciated!


Note also that the Meteor project has "minimongo", which is yet another implementation of the MongoDB query language.


I'd use it if it supported $near ...


That's excellent news for CouchDB, frankly writing map/reduce functions was a bit hard. CouchDB has excellent features (multi-master sync, ...), and I look forward to using CouchDB again where it makes sense.


Álvaro here from ToroDB.

Check out ToroDB (github.com/torodb/torodb). It is a Mongo implementation based on PostgreSQL that transforms JSON documents into relational tables. This has many advantages, like significant storage reduction, less I/O, and support for updates (a concern raised in some comments below). Please check it out! :)
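
(Purely illustrative, not ToroDB's actual schema: the general idea of mapping a document onto tables looks something like this, with the nested object split out into its own table of typed columns.)

    -- e.g. {"name": "ann", "address": {"city": "Madrid", "zip": "28001"}}
    CREATE TABLE customer (
        did   serial PRIMARY KEY,   -- document id
        name  text
    );

    CREATE TABLE customer_address (
        did   int REFERENCES customer(did),
        city  text,
        zip   text
    );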


I was going to post that this would have been cool a year ago, but now that MongoDB has interchangeable backends there's not much to gain from a high level project like this.

But then I saw that this project started two years ago, and the last commit was a year ago.


From my experience with MySQL's pluggable engines, I learned that this actually complicates things: the user interface ends up being awkward, optimization is harder, and adding features becomes much harder.

Investigating standalone alternatives therefore seems like a good thing, even for projects starting today.


| MongoDB has interchangeable backends

Do you have a source on this? I can't find anything at the moment but it sounds familiar.


The feature is called "pluggable storage engine", if you want something to Google.



I've been working on something similar for the last while, but using jsonb and plpgsql instead of plv8.

So far, the basic crud stuff works and there's a python driver with decent test coverage. Progress is slow, but it's fun!

http://bedquiltdb.github.io
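
If you're curious what the jsonb side of that looks like, here's a rough sketch (not BedquiltDB's actual API; names are made up): a table per collection, documents in a jsonb column, and find-by-example via the @> containment operator, backed by a GIN index.

    CREATE TABLE things (
        id  serial PRIMARY KEY,
        doc jsonb NOT NULL
    );

    -- insert a document
    INSERT INTO things (doc) VALUES ('{"kind": "widget", "size": 3}');

    -- find documents matching a query document, MongoDB-style
    SELECT doc FROM things WHERE doc @> '{"kind": "widget"}';

    -- a GIN index makes the containment query cheap
    CREATE INDEX things_doc_idx ON things USING gin (doc);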


Something else that provides a simple API for working with JSON in Postgres: https://github.com/robconery/massive-js (node.js)


That looks pretty awesome, although fairly orthogonal to this discussion. I submitted it as a story on its own. https://news.ycombinator.com/item?id=9407782


At the bottom of the README it says "Follow along at http://legitimatesounding.com/blog/", which 404's. I want to follow along! Please fix. Thanks.


Fixed.


Interesting experiment! How does it compare in performance, cpu and memory footprint?


There's also Mongres which was inspired by this project.

https://github.com/umitanuki/mongres


I thought MongoDB's main selling point was the simple scale-out model? At least that's what stuck with me. I'm well aware of the non-robustness properties of Mongo, but to me it seems like calling this a clone without the scale-out capabilities would be missing the point.


It's a proof of concept. The author's essentially saying, "here, it can be done, this is one way of doing it." It would need more bits and pieces and more tuning to be useful in a production case, but this shows that it at least can work.


I think you missed my point. The USP of MongoDB in my view is the simplicity and its scalability model. At least these are the things which are interesting to me and which are hard to implement. They have cloned a subset of the functionality - the subset which is easy - and left out the hard parts. I don't agree that this proves anything.


Python has a nice non-SQL style interface to SQL engines too, called Dataset: http://dataset.readthedocs.org/en/latest/


Benchmarks please! Response times, mem usage, disk usage.

Does it use HSTORE?

Nice project!


ELI5: does this essentially mean I can use the Mongo API to work with Postgres?


Makes sense they're porting MongoDB to Postgres, given that:

"As of version 9.4, PostgreSQL benchmarks faster than MongoDB for both inserting and querying JSON data." :P


Why?


Developer here (woke up surprised it was on hn again), why not?

The blog posts go into more details: http://legitimatesounding.com/blog/building_a_mongodb_clone_... and http://legitimatesounding.com/blog/building_a_mongodb_clone_...


I suspect there are lots of people who built applications on top of MongoDB who would like to transition away from Mongo onto a different database, or would at least like the option to. This may provide a good mechanism for a transition.

Which leads to a question for the OP: does this work with mongoose.js?


i started a mongo client implementation in node.js, but it fell off my radar - the goal was to get it to work with mongoose, etc: https://github.com/JerrySievert/mongolike-client (it is not yet complete).

in the end, i just wrote my own database system, with a query language that could act as an intermediary from multiple sources (SQL, mongo, etc).

but, this stuff is mostly just for fun, my day job keeps me plenty busy.


Could be interesting for meteor.com; the MongoDB-only requirement prevents me from using it.


Problem: Meteor relies on a specific MongoDB feature, i.e. polling the db for any change. I've heard there is experimental support for PG, but it seems to rely on a complex hack using triggers... So the issue isn't really about Mongo queries, but whether it is possible to track db edits from third parties or not.


There's infrastructure for that in postgres, since 9.4. See

http://www.postgresql.org/docs/devel/static/logicaldecoding.... for the (somewhat low level) description of the feature. You'd have to write an output plugin that formats the output as json, but that should be pretty easy.

Disclaimer: I'm the author of the feature ;)
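
A minimal sketch of the SQL-level interface, using the built-in test_decoding example plugin (a real integration would use a custom output plugin that emits JSON instead); it assumes wal_level = logical and max_replication_slots > 0 in postgresql.conf:

    -- create a logical replication slot that decodes WAL with test_decoding
    SELECT * FROM pg_create_logical_replication_slot('demo_slot', 'test_decoding');

    -- ...make some changes in another session...

    -- read (and consume) the decoded change stream
    SELECT * FROM pg_logical_slot_get_changes('demo_slot', NULL, NULL);

    -- clean up
    SELECT pg_drop_replication_slot('demo_slot');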


That specific MongoDB feature is "tailing the oplog", where the oplog is the capped collection with the idempotent commands that represent the changes in the database. This is what Meteor uses to receive (be pushed) changes (not polling).

Certainly, logical decoding provides the necessary infrastructure to implement it (and also normal tailable cursors): thank you Andres, really nice work! :) But some work is also needed to transform the representation you are using in PostgreSQL into MongoDB's oplog entries.

This is definitely what we are using in ToroDB to emulate it (currently under development).


> I suspect there are lots of people who built applications on top of MongoDB who would like to transition away from Mongo onto a different database

What happened to separating the data access layer from the business logic layer? Oh yeah, "good architectural practices are so JEE ..." /s


Taking full advantage of the capabilities of the data layer has effects on the business logic layer. Sometimes it's worth it, sometimes it isn't.



