
If, like me a moment ago, you have no idea what jsonb is, see here for a full explanation: http://www.postgresql.org/message-id/E1WRpmB-0002et-MT@gemul...

tl;dr: storing JSON in a way that doesn't require repeatedly parsing it to make updates




Thanks for that link: It explains that insignificant whitespace, duplicate keys and key order are not preserved.
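To make that concrete, here is a rough sketch using Python's json module as a stand-in (this is not PostgreSQL's code). A parse-and-reserialize round trip drops the same three things. The sorted key order is only an assumption for the demo; jsonb uses its own internal ordering.

```python
import json

# Input with extra whitespace, a duplicate key "b", and keys out of order.
raw = '{"b": 1,    "a": 2,  "b": 3}'

# Python's json module keeps only the last value for a duplicate key.
parsed = json.loads(raw)

# Re-serialize compactly with sorted keys to get one canonical text form.
normalized = json.dumps(parsed, sort_keys=True, separators=(",", ":"))

print(normalized)  # {"a":2,"b":3}
```

Once the text is parsed into a structure, the original spacing, the first "b", and the original key order are gone for good.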

Does someone have a link to the storage format? I'm always curious and want to learn efficient encodings. Thanks!


It's sad, but many developers actually count on JSON key ordering. Why, I don't know, but they do. Why people like to code to the implementation, not the spec, I do not know.


I've seen cases where the actual API is XML-based, and there's a simple mechanical translation from JSON to XML.

Standards People.


A prominent example is Solr's JSON update endpoint [1], which is just the XML update endpoint in disguise.

It expects input of the form:

    {
       "add": {document 1 goes here},
       "add": {document 2 goes here},
       ...
       "commit": {}
    }
And of course all the "add" values are different and the "commit" has to come at the end.

[1] https://wiki.apache.org/solr/UpdateJSON
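A quick sketch of why this shape is awkward for ordinary JSON tooling, using Python's json module. The payload and document bodies here are made up for illustration; only the repeated "add" keys and the trailing "commit" come from the description above.

```python
import json

# Hypothetical payload in the shape described above; the documents are invented.
payload = '{"add": {"doc": {"id": "1"}}, "add": {"doc": {"id": "2"}}, "commit": {}}'

# A plain dict-based parse silently keeps only the last "add".
as_dict = json.loads(payload)

# object_pairs_hook receives each object's (key, value) pairs in document order,
# so returning them as a list keeps both the duplicates and the ordering.
as_pairs = json.loads(payload, object_pairs_hook=list)

print(list(as_dict))                 # ['add', 'commit']
print([key for key, _ in as_pairs])  # ['add', 'add', 'commit']
```

So anything that round-trips this through a dict or map loses documents, and a client has to build the text by hand or use a pair-preserving representation.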


I've dealt with one of these. EAN, the Expedia Affiliate Network[0], has an API with what they call an "XML" mode and a "REST" mode, where "REST" means JSON. The "REST" mode is very clearly translated directly from the XML mode. I'm simplifying and doing this from memory so bear with me, but here is how translation works:

    <Hotel id="1">
      <Amenity>Bacon</Amenity>
    </Hotel>
turns into

    {'Hotel': {'@id': '1',
               'Amenity': 'Bacon'}}
Straightforward enough. But here's what happens when you have two <Amenity> elements:

    <Hotel id="1">
      <Amenity>Bacon</Amenity>
      <Amenity>Chocolate</Amenity>
    </Hotel>
Now turns into:

    {'Hotel': {'@id': '1',
               'AmenitysList': {'@size': 2,
                                'Amenitys':
                                    [{'Amenity': 'Bacon'},
                                     {'Amenity': 'Chocolate'}]}}}
The translation engine appears to have no knowledge of the schema; it just adds ___List and ___s entries. So the schema of the JSON differs depending on whether an element is repeated.

Also, since it doesn't have any knowledge of the schema, all elements are textual except the special @size element.

@size is a special element for these lists (which of course you have no need for in JSON), but the engine also turns XML attributes into @____ entries, so there is no way to get at an actual "size" attribute if one exists.

[0] http://developer.ean.com/docs/hotel-list/
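For illustration, here is a minimal sketch of a schema-less translator that reproduces the behavior described above. It is my guess at the logic, not EAN's actual code; the function names translate and to_json_shape are hypothetical, and it ignores text on elements that also carry attributes.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def translate(elem):
    """Translate one XML element without any schema (a guess, not EAN's code)."""
    children = list(elem)
    if not children and not elem.attrib:
        return elem.text or ""  # leaf: always text, since nothing says otherwise

    # Attributes become "@name" keys.
    out = {"@" + name: value for name, value in elem.attrib.items()}

    # Group children by tag so repeats can be detected.
    by_tag = defaultdict(list)
    for child in children:
        by_tag[child.tag].append(child)

    for tag, group in by_tag.items():
        if len(group) == 1:
            out[tag] = translate(group[0])
        else:
            # Repeats get wrapped in <tag>sList / <tag>s, plus a synthetic @size.
            out[tag + "sList"] = {
                "@size": len(group),
                tag + "s": [{tag: translate(item)} for item in group],
            }
    return out


def to_json_shape(xml_text):
    root = ET.fromstring(xml_text)
    return {root.tag: translate(root)}


one = to_json_shape('<Hotel id="1"><Amenity>Bacon</Amenity></Hotel>')
two = to_json_shape(
    '<Hotel id="1"><Amenity>Bacon</Amenity><Amenity>Chocolate</Amenity></Hotel>'
)
print(one)
print(two)
```

The consumer has to check for both 'Amenity' and 'AmenitysList' on every response, which is exactly the annoyance: the output shape depends on the data, not on the schema.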


Because in the software development industry, "Because it works!" is considered an acceptable answer, that's why.


"Because it works for me" is also prevalent.


http://www.postgresql.org/message-id/E1WRpmB-0002et-MT@gemul... links to the git commit at http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitd.... The obviously interesting file is src/include/utils/jsonb.h (a new header file named jsonb).

It has a reasonably decent (highly decent for a header file) description.

One thing I had to take a guess at is that, in JEntry, the header is a combination of a bit mask and an offset. I would also guess that offsets for identical keys (which can happen in nested JSON), and maybe offsets for other identical data, will be identical. I didn't check either.
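To show what "a bit mask and an offset" in one header word could look like, here is a generic sketch. The 4/28 bit split, the constant names, and the flag value are all assumptions for illustration; they are not copied from jsonb.h.

```python
# Assumed layout: a 32-bit word with 4 high flag bits and a 28-bit offset.
OFFSET_BITS = 28
OFFSET_MASK = (1 << OFFSET_BITS) - 1      # 0x0FFFFFFF
FLAG_MASK = 0xFFFFFFFF & ~OFFSET_MASK     # 0xF0000000


def pack_header(flags, offset):
    """Combine flag bits and an offset into one 32-bit header word."""
    if not 0 <= flags < 16:
        raise ValueError("flags must fit in 4 bits")
    if not 0 <= offset <= OFFSET_MASK:
        raise ValueError("offset must fit in 28 bits")
    return (flags << OFFSET_BITS) | offset


def unpack_header(header):
    """Split a header word back into (flags, offset)."""
    return (header & FLAG_MASK) >> OFFSET_BITS, header & OFFSET_MASK


FLAG_IS_NUMERIC = 0b0001  # hypothetical flag value
header = pack_header(FLAG_IS_NUMERIC, 42)
print(hex(header))            # 0x1000002a
print(unpack_header(header))  # (1, 42)
```

Packing both into one word keeps each entry at a fixed size, which is what makes jumping to the Nth element cheap.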




