Hacker News new | past | comments | ask | show | jobs | submit login

Can you give some examples where a relational schema isn’t suitable?



Storing historical webhook notifications


The bonus with this is that at the start you are probably only interested in a few of the fields in the webhook, but as your application develops, you may need to extract more. If you've stored the JSON of previous webhooks, you have an easy migration path.


Sparse data.

For example, if there are 1000 possible attributes, only 5% are populated for any given row. If you have 1000 columns you are going to have a ridiculously wide and mostly empty table.


In Postgres, NULL values take one bit of storage as far as I understand. So a table has to be very, very wide before this becomes a benefit. Of course if you have attributes that you don't control but are e.g. user-specified, JSONB column are a good choice.


The traditional approach to efficiently modelling data like that with relational databases is to use the EAV[1] model, which works quite well in practice.

[1]: https://en.wikipedia.org/wiki/Entity%E2%80%93attribute%E2%80...


I use the JSONP column type in Postgres for data that is structured but user-supplied.


The user is supplying json?


There is a key-value interface so not directly.


Why not store this data into a relational database?


Because values can be different types (string, int, boolean).


Can't you store them all as varchar? You could even have a type column if you wanted to enforce the types in some way at the application level.


Product data on a webshop. If you have a wide selection of product categories you’ll have a huge number of attributes that aren’t relevant across your categories.

There’s a number of ways to deal with the issue, e.g. a model per category or an EAV database pattern (this is what platforms like Magento does). None are really ideal, but storing the product attributes as JSON works pretty well. Even more so when your database supports querying the JSON blob.


Storing arbitrarily complex Boolean queries (think Elasticsearch's query syntqx). While this could be done in SQL tables, there are no gains with referential intregity and while there would be some in consistency, I don't think they're worth it.


Speaking strictly of schema...

Relational schema are flexible and allow for an optimizer to figure out how best to fetch your data based on the data are querying and filtering for and stored statistics about your data set.

Document oriented storage is great when you don't need the optimizer because you already know how the data is written and read. This means you can bundle it up into small, single fetch documents. No statistics or optimizer necessary. This is great if you understand your use cases really well, and they never change (good luck with that) or you have a large distributed data set that would be tough on an analyzer.


There are scenarios when you can't avoid EAV and then JSON is great even when it's for denormalized data.


Data that tends to change after the application is deployed.

More importantly, JSONFields can be used as a substitute for EAV pattern.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: