A JSON field type for Django (metric.io)
96 points by aychedee on March 13, 2014 | hide | past | favorite | 43 comments



Hopefully this will soon be added to django core - it's one of the goals as part of Marc Tamlyn's kickstarter to improve postgres support in django.

https://www.kickstarter.com/projects/mjtamlyn/improved-postg...


Really not trying to troll here, but I want to know why anyone uses a Python ORM other than SQLAlchemy. It's been a while since I used Django's ORM, but I recall the comparison between the two being very heavily in SQLAlchemy's favor.


The Django community is focused on web apps, not e.g. off-line reporting, so a large percentage of the things which the ORM doesn't help with simply aren't a priority for most users. For something like 90% of the queries most people write, Django's ORM is easier to work with and the tight integration with model forms, the admin, etc. is a big selling point for the typical project on a deadline.

When you do need to do complex joins or use database-specific features, the .raw() / .extra() queryset methods handle enough to avoid it being a deal-breaker, particularly since the crazier your performance / feature requirements the more likely it is that you're going to need to ditch an ORM altogether.

EDIT: in case it wasn't clear, I have a ton of respect for SQLAlchemy. Nothing above should be seen as saying SQLA isn't good, merely that the choice of ORM usually isn't the most important factor in a project's decision.


The Django ORM is developed as part of the framework, so it benefits from tight integration with Django's other features. The Django pitch is "The web framework for perfectionists with deadlines," meaning everything is designed to get you up and running quickly. SQLAlchemy, being a standalone library, can't offer that level of integration. Even though SQLAlchemy is much more powerful, the benefits of the Django ORM outweigh the additional features SQLAlchemy provides. Plus, most web apps are simple CRUD applications that will never need the level of fine-tuning SQLAlchemy offers.

Now, in case of APIs, I prefer Flask (if using Python) with SQLAlchemy (if not flat SQL). It is the inverse of Django. No additional features than the ones you need as a base. Build on top of it rather than build with.


I'm fairly new to Python (~6 months) and started with SQLAlchemy, but recently switched our app to Peewee (http://peewee.readthedocs.org) because of frustrations with the SQLAlchemy "session".

The standard method of scoped_session(sessionmaker(engine)) will return a thread-local session - so all data manipulation effectively shares a global session.

- Every change needs to be followed with a commit() or rollback(), or you'll end up with rubbish in the session that some later call will inadvertently commit.

- If you want to do a general "update where" call, make sure you use "synchronize_session=False" and then session.expire_all(), otherwise the session will be out of date.

It felt like every time I had code working with data, I also had a non-trivial amount of session management. It wasn't abstracting the database access away, it was making everything very SQLAlchemy session specific.
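The thread-local behaviour described above can be sketched without SQLAlchemy at all (a plain dict stands in for a real Session, so this is an illustration of the registry pattern, not SQLAlchemy's actual code):

```python
import threading

# Stand-in for scoped_session(sessionmaker(engine)): one session per
# thread, silently shared by every caller on that thread.
_registry = threading.local()

def current_session():
    if not hasattr(_registry, "session"):
        _registry.session = {}  # stand-in for a real Session()
    return _registry.session
```

Two calls on the same thread return the same object, which is exactly why leftover uncommitted changes from one function can get committed later by a completely unrelated one.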

To be honest I'm somewhat second guessing my decision, because Peewee is far from the Python standard. But it's simple, understandable and clear, and doesn't force a strange "don't forget to manage the global state" mindset onto everything.


FWIW I use Pyramid with SQLAlchemy and it comes with a transaction manager plugin which automates the begin, commit and rollback on error of transactions. 99% of the time I don't need to think about transactions, only during CLI tool development and only barely. YMMV


Yeah, sounds like your mileage does vary.


I think Flask-SQLAlchemy does something similar.


The documentation has a pretty good example where they bundle session related functionality into a reusable context manager (Python's `with` statement). I took this approach in one of my projects, and it worked out rather well.
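For reference, the pattern in question looks roughly like this. The Session factory argument stands in for sqlalchemy.orm.sessionmaker(engine), so the sketch runs without SQLAlchemy installed:

```python
from contextlib import contextmanager

@contextmanager
def session_scope(Session):
    """Commit on success, roll back on error, always close."""
    session = Session()
    try:
        yield session
        session.commit()
    except Exception:
        session.rollback()
        raise
    finally:
        session.close()
```

The nice part is that the commit/rollback bookkeeping lives in one place instead of being sprinkled after every data operation.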


I assume you mean the example towards the bottom of http://docs.sqlalchemy.org/en/rel_0_9/orm/session.html#when-...

So where do you use this "with session_scope() as session:" context wrapper? Do you wrap your entry point in it, so the entire app is implicitly using the session (as the example shows)? Or do you call it in each separate function that needs data access?

If the former, how do you handle commit() and rollback() for "local" (per-function) changes? It sounds like either "never rollback" or "always commit" is necessary. And I think "always commit" is more likely, because usually an entire "request" (whether we're talking web apps or not) isn't atomic.

So you've got this SQLAlchemy session context wrapper around the entry point, nice and separated from all the internal data functions. Except every data function actually needs that session to commit(), so it's either pulled from some global state, or passed around like a stinky baby.

But this can't be the case, as this inelegance is not pythonic, and SQLAlchemy is the Python ORM. What am I missing?


It has been some time since I used either one really intensively (I still use Django's off and on in maintenance), but for a long period of time, despite the advantages of SQLAlchemy, Django was still a lot more newbie friendly.

Not much fiddling with engines, metadata, etc. Not to mention knowing the form stuff was going to just work, compared to third-party solutions (as good as they are) with the other.


Well...

I use Django... The Django ORM is built in and does everything i need. I use Salt... Which has native support for using the Django ORM which I'm already using.

And, for some extra icing on the cake, I like doing things in an SOA, which has a wonderful recipe in Django + Django REST Framework + Django ROA (the original https://bitbucket.org/david/django-roa/overview and one of its most promising forks https://github.com/charles-vdulac/django-roa). With this, I can take a codebase, split off a particular chunk of the code, wrap it as an API with Django REST Framework, then hook the rest of the service up to the newly separated 'service' via Django ROA, which lets Django work with the REST API via the native ORM. REST as a database!


SQLAlchemy is great for certain classes of applications, but its advanced features are rarely needed for web applications. It's one of the few things I dislike about Flask development in general: it's the standard ORM to roll with Flask, and compared to Django's ORM it's WAY overcomplicated. Flask and a NoSQL db is a dream though.


SQLAlchemy's documentation makes the ORM seem a bit unapproachable, but Flask-SQLAlchemy does a good job of simplifying things: http://pythonhosted.org/Flask-SQLAlchemy/. With that documentation I don't think SQLAlchemy is much harder to use than Django's ORM.


I think SQLAlchemy's documentation is actually pretty good; it's extremely thorough. SQLAlchemy is also extremely well designed/written (as is Django), though it tends not to hide its complexity. SQLAlchemy's sessions are more complex than Django's equivalent. This is because SQLAlchemy assumes a more general case, whereas Django assumes you're making a standard CRUD web application. They're both good at what they do.


Have you looked at https://pypi.python.org/pypi/django-jsonfield/ before building your json field? If so, can you elaborate about how those implementations compare with each other?


My understanding is that the field, when working with a PostgreSQL database, is stored as a JSON field in PostgreSQL, rather than just a text field. I don't think django-jsonfield can do that.
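The difference comes down to which column type the field emits. A simplified, hypothetical stand-in for Django's Field.db_type() hook (not the actual field's code) shows the idea: the hook decides the DDL type per backend, so a field can emit a native json column on Postgres and fall back to text elsewhere:

```python
class NativeJSONField:
    """Simplified stand-in for a Django field's db_type() hook."""
    def db_type(self, vendor):
        # Native json column on Postgres, plain text everywhere else
        return "json" if vendor == "postgresql" else "text"
```

A plain-text field never gets the server-side validation or JSON operators that the native type brings.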


There are a number of existing jsonfield implementations. This one, probably the best of the lot, DOES make use of the PostgreSQL JSON field: https://github.com/bradjasper/django-jsonfield/blob/master/j...

It looks like your app does more in terms of casting data that comes from the field. Is this the major improvement?


It's not Django, but I've built something quite similar for Rails, and have really loved using it. The combination of RDBMS stability and the flexibility of schemaless content is really powerful -- it is (IMHO) the best of both worlds.

Would love to hear people's experience with this kind of setup (no matter what software you're using)...I think it's a very interesting, and useful, direction to go, especially for new projects where you may not want the overhead of multiple storage systems.

https://github.com/ageweke/flex_columns


Thanks for this! I use some of the various json fields that exist already but have been feeling like it is about time to start using the built in Postgres JSON support instead of just TextFields.


This looks great, but how does it compare to already existing options like django-jsonfield or jsonfield?


I looked at lots of alternatives. But I needed something that supported the actual underlying Postgresql json type. Psycopg2 converts those automatically into Python types. So we needed a JSON type that would accept: Python dicts, lists, strings, and JSON encoded objects, lists, and strings. None of them support that because they are all just storing the data as text in the backend.
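The two directions described above can be sketched in plain Python (hypothetical helper names; with a real json column, psycopg2 does the parsing side automatically):

```python
import json

def prep_for_db(value):
    # Dicts, lists, and strings all serialize to JSON text for the column
    return json.dumps(value)

def from_db(raw):
    # psycopg2 hands back parsed Python objects for native json columns;
    # a text-backed field hands back a string, so parse defensively
    return json.loads(raw) if isinstance(raw, str) else raw
```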


I run django-jsonfield, which has partial support for the native PostgreSQL JSON type, and more will be added as it's incorporated into Django: https://github.com/bradjasper/django-jsonfield/issues/55

Happy to accept pull requests for stuff like this if you're interested in contributing.


Great, I'll definitely do that. Maybe we should have a DM twitter chat on how to generalise it before I submit anything? I'm hanseldunlop there.


I use your lib all the time, works great! Just wanted to say thanks.



I haven't looked into the postgres json field in depth yet, but it seems like you may know the answer. Is there any way of ensuring the structure / integrity of the data stored in it? Or is it currently considered to be totally freeform?


Postgresql validates the field. From the docs:

"the json data type has the advantage of checking that each stored value is a valid JSON value" - http://www.postgresql.org/docs/9.3/static/datatype-json.html

There are also a bunch of functions available to operate on a JSON field. Full info here: http://www.postgresql.org/docs/9.3/static/functions-json.htm.... What this means is that you can do pretty fast searches on the value of a key in a JSON object.
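To make the key-lookup idea concrete, here is a runnable sketch using SQLite's json_extract() as a stand-in for Postgres's ->> operator (the table and keys are made up for illustration):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE document (data TEXT)")
docs = [{"status": "active", "n": 1}, {"status": "archived", "n": 2}]
conn.executemany("INSERT INTO document VALUES (?)",
                 [(json.dumps(d),) for d in docs])

# In Postgres this would be: WHERE data->>'status' = 'active'
# (optionally backed by an expression index for speed).
active = conn.execute(
    "SELECT data FROM document WHERE json_extract(data, '$.status') = ?",
    ("active",),
).fetchall()
```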


Thanks for the info - I'm thinking more along the lines of constraining the data within the valid json.

Just found this post which shows how you can do it.

    CREATE TABLE products (
        data JSON,
        CONSTRAINT validate_id CHECK ((data->>'id')::integer >= 1 AND (data->>'id') IS NOT NULL ),
        CONSTRAINT validate_name CHECK (length(data->>'name') > 0 AND (data->>'name') IS NOT NULL ),
        CONSTRAINT validate_description CHECK (length(data->>'description') > 0  AND (data->>'description') IS NOT NULL ),
        CONSTRAINT validate_price CHECK ((data->>'price')::decimal >= 0.0 AND (data->>'price') IS NOT NULL),
        CONSTRAINT validate_currency CHECK (data->>'currency' = 'dollars' AND (data->>'currency') IS NOT NULL),
        CONSTRAINT validate_in_stock CHECK ((data->>'in_stock')::integer >= 0 AND (data->>'in_stock') IS NOT NULL )
    );

http://blog.endpoint.com/2013/06/postgresql-as-nosql-with-da...


Why would you ever want to do this? Surely at this point you are better with a table with id, name, description, price, currency, in_stock and an extra data field for anything else - given you require the other fields.


Maybe - I need to test the performance, but for my application breaking things out into a table is a little slow. In all probability the constraints will be slower, but it's something to try.

At the moment I have all my data in Mongo, but over time it's fallen out of shape. There's a large chunk of it that just needs to be stored (a JSON field is fine for that), but it would be nice to constrain some of it.


Depends on how much the data is going to be altered. Changing columns is not a free operation in relational data stores; in many cases it requires rewriting every row.

This can be more dynamic short term.


Oh cool, that's very interesting! Thanks for sharing.


Thanks for the answer. Would it be possible to query against this field?


Not properly, using Django's ORM. That's something I'll write when we actually start to need it. At the moment it's more of a wholesale document store. Right now you would have to write a custom SQL query, using the Postgresql JSON functions, and the Model.objects.raw(...) interface provided by Django.
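Such a raw query might look roughly like this (table and key names are made up; with Django available, the string would be passed to Model.objects.raw()):

```python
# ->> extracts a JSON key as text in Postgres
RAW_SQL = (
    "SELECT * FROM app_document "
    "WHERE data->>'kind' = %s"
)

# With Django this would run as, e.g.:
#   Document.objects.raw(RAW_SQL, ["invoice"])
```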


The second you make it easy to query this, I'm dumping MongoDB as my go-to quick-hack JSON store. Having one datastore > multiple datastores.


Remember to add it here when it's released: https://www.djangopackages.com/grids/g/json-fields/


What are the engineering trade-offs here? Metric were migrating from CouchDB, so using a JSON field is the obvious solution, but what about for new projects?

Compare it to, say, Django-CMS, which achieves pretty much the same thing by storing the tree data structure directly in SQL. The `django-mptt` library handles the details of efficiently implementing a tree structure in SQL, and `south` simplifies database schema migration. As a programmer, which system would you rather work with?


For myself? I don't want to do a migration whenever we add a new widget to our promotion or email builder, I certainly don't want to think about doing a data migration on every existing document. In this particular case we do not care about the contents of this field. There is already a lot of related information stored in the same row.


You should use hstore to store json in postgres also - no?

https://github.com/djangonauts/django-hstore

I think the issue with that is it converts everything to string so you need to use json.loads().


Not quite, hstore doesn't support nesting.


You should also get this up on Github quick :)


It's probably more appropriate to submit a pull to one of the existing projects TBH.



