Hacker News

How can we import a CSV dataset? The HTTP insert API returns this error message: expected JSON_OBJECT_BEGIN, got: JSON_OBJECT_BEGIN. The GROUP BY query on the homepage scans 1.8B rows in only 1.5 seconds, which is great, but how many nodes were used in that setup?



We do have a CSV import utility (the HTTP API expects JSON), but it's not in the current distribution/release build. I'll add it and update this comment once it's live.
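In the meantime, a workaround is to convert each CSV row into a JSON object client-side and send those to the insert API. This is a minimal sketch using only the Python standard library; the exact endpoint, field names, and expected JSON shape are assumptions, not the actual API contract:

```python
import csv
import io
import json

def csv_to_json_records(csv_text):
    """Convert CSV text (first row = header) into one JSON object per row.

    Each returned string is a standalone JSON object, which is the shape
    an insert API expecting JSON objects would typically consume.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [json.dumps(row) for row in reader]

# Hypothetical example input; real data would come from a file.
records = csv_to_json_records("id,name\n1,alice\n2,bob\n")
# Each record could then be POSTed to the HTTP insert endpoint.
```

Note that this treats every value as a string; a real importer would also need per-column type conversion before insert.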

Queries are mostly I/O-bound when running on regular hard disks. The number of rows per second mainly depends on the number and types of columns that are accessed. For example, if we scan 1.8B rows and only load a single integer column from disk (and the integers are small), we only have to load about 1 byte per row (an idealized model that ignores some overheads, for illustration). Completing the query in 1.5 seconds then works out to a total I/O load of about 1144MB/s, so (depending on disk speed) around ~15 machines would suffice.
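The back-of-envelope arithmetic above can be checked directly. The 80 MB/s per-disk sequential read speed is my assumption (the comment only says "depending on disk speed"); the other numbers come from the comment itself:

```python
import math

rows = 1.8e9            # rows scanned by the GROUP BY query
bytes_per_row = 1       # single small integer column, idealized model
target_seconds = 1.5    # query latency from the homepage benchmark
per_disk_mb_s = 80      # assumed sequential read speed of one HDD

# Required aggregate read bandwidth, in MiB/s (matching the 1144 figure).
total_mib_s = rows * bytes_per_row / target_seconds / (1024 ** 2)

# Machines needed if each contributes one disk at the assumed speed.
machines = math.ceil(total_mib_s / per_disk_mb_s)

print(round(total_mib_s), machines)
```

With these assumptions the script reproduces both figures from the comment: roughly 1144 MB/s of aggregate bandwidth, spread over about 15 machines.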




