
This looks like a really useful library. A few questions:

1. What size limits/practical constraints are there on freezefiles (and accompanying JSON files)?

2. Are there any code samples for consuming freezefiles, or should I just assume it's simple JSON/YML parsing?

3. Has any thought been given to using this to expose database contents via a static REST API?

Final thought: this seems like a great step towards solving the age-old problem of version-controlling data.




Author here.

As for 1.: CSV files are encoded as a stream, so they can be as large as needed. JSON is dumped as a whole from memory, so it is limited by available RAM. I'd be keen to see if someone has written a streaming JSON encoder.

2.: Consuming, no. I normally load them in a browser with D3 or jQuery to feed them into a graphic or other interface.

3.: I'd argue this is out of scope for dataset, but simpler REST API makers would definitely be cool. Check https://github.com/okfn/webstore - this is what dataset came out of, and it makes somewhat RESTish APIs.
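On question 2, consuming a CSV export from Python needs nothing beyond the standard library. The sketch below assumes the file is an ordinary CSV with a header row; the sample contents and the `iter_rows` helper are hypothetical, not part of dataset.

```python
import csv
import io

# Hypothetical contents of a CSV export. A real file would be opened
# with open(path, newline="") instead of wrapping a string.
sample = io.StringIO("id,name,age\n1,Ada,36\n2,Bob,41\n")


def iter_rows(handle):
    """Yield one dict per CSV row, keyed by the header line."""
    for row in csv.DictReader(handle):
        yield row


rows = list(iter_rows(sample))
for row in rows:
    # Every value arrives as a string; convert types where needed.
    print(row["name"], int(row["age"]))
```

Because `csv.DictReader` reads lazily, the same loop works on files too large to fit in memory, as long as the rows are not collected into a list first.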


> I'd be keen to see if someone has written a streaming JSON encoder.

This looks interesting: https://gist.github.com/akaihola/1415730

Edit: dataset looks like a really interesting library!
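For a flat list of rows, a streaming encoder of the kind asked about can also be sketched with only the standard library: write the array brackets by hand and encode one row at a time. This is not how dataset or the linked gist does it; `stream_json_array` and `fake_rows` are hypothetical names used for illustration.

```python
import io
import json


def stream_json_array(rows, out):
    """Write an iterable of JSON-serializable rows to `out` as one JSON array."""
    out.write("[")
    for index, row in enumerate(rows):
        if index:
            out.write(",")
        # Each row is encoded and written on its own, so only one row
        # has to be held in memory at a time.
        out.write(json.dumps(row))
    out.write("]")


def fake_rows(n):
    # Hypothetical stand-in for a database cursor that yields rows lazily.
    for i in range(n):
        yield {"id": i, "name": f"row-{i}"}


buffer = io.StringIO()
stream_json_array(fake_rows(3), buffer)
print(buffer.getvalue())
```

In real use `out` would be an open file rather than a `StringIO`, and the result is still a single valid JSON document that any parser can load.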


I'm curious about the relative advantages/disadvantages compared to something like SQLAlchemy.


From an end-user point of view, SQLAlchemy relies on you first defining your models in the ORM (object-relational mapper). SQLAlchemy then takes control of issuing the SQL to create, update, and drop tables, depending on how you interact with your Python ORM models.

From what I can read, this cool-looking tool lets you use a SQL database as a kind of object-free data store, maybe not unlike a Python wrapper for a NoSQL DB. That frees you from first defining your models and then making sure that SQLAlchemy has updated your DB to match.


Since tables are created and modified on insert commands, there doesn't seem to be any possibility of maintaining integrity at the DB level. That would seem to be the main disadvantage compared to any approach that uses a schema defined in advance. You still get an RDBMS's advantages for ad hoc queries, but not integrity.
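To make the integrity point concrete, here is a minimal sketch of insert-driven schema creation using the standard library's `sqlite3`. It is an assumption about the general technique, not dataset's actual implementation, and `insert_row` is a hypothetical helper.

```python
import sqlite3


def insert_row(conn, table, row):
    """Insert a dict, creating the table and any missing columns first."""
    # Identifiers are interpolated directly, so this sketch trusts its inputs.
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" (id INTEGER PRIMARY KEY)')
    existing = {info[1] for info in conn.execute(f'PRAGMA table_info("{table}")')}
    for column in row:
        if column not in existing:
            # No type, NOT NULL, UNIQUE, or foreign-key constraint is declared.
            conn.execute(f'ALTER TABLE "{table}" ADD COLUMN "{column}"')
    names = ", ".join(f'"{c}"' for c in row)
    marks = ", ".join("?" for _ in row)
    conn.execute(
        f'INSERT INTO "{table}" ({names}) VALUES ({marks})', list(row.values())
    )


conn = sqlite3.connect(":memory:")
insert_row(conn, "people", {"name": "Ada", "age": 36})
# A string in the "age" column and a brand-new "email" column are both accepted.
insert_row(conn, "people", {"name": "Bob", "age": "unknown", "email": "bob@example.com"})
rows = conn.execute("SELECT name, age, email FROM people ORDER BY id").fetchall()
print(rows)
```

The second insert succeeds even though `age` is not a number, which is the trade-off described above: the database accepts whatever arrives, and any validation has to happen in application code.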


Well, judging by the "shoulders of giants" comment at the bottom of the page, this seems to be heavily based on SQLAlchemy. Whether that is in a philosophical or a technical sense, I haven't looked into it enough to find out, but it would be nice to know the differences and similarities.



