Hacker News new | past | comments | ask | show | jobs | submit login

Nice work! I am a fan of tools like this.

However, in my first attempted query (version 3.1.6 on MacOS), I ran into significant performance limitations and more importantly, it did not give correct output.

In particular, running on a narrow table with 1mm rows (the same one used in the xsv examples) using the command "select country, count(1) from worldcitiespop_mil.csv group by country" takes 12 seconds just to get an incorrect error 'no such column: country'.

using sqlite3, it takes two seconds or so to load, and less than a second to run, and gives me the correct result.

Using https://github.com/liquidaty/zsv (disclaimer, I'm one of its authors), I get the correct results in 0.95 seconds with the one-liner `zsv sql 'select country, count(1) from data group by country' worldcitiespop_mil.csv`.




Hi, author of q here.

Regarding the error you got, q currently does not autodetect headers, so you'd need to add -H as a flag in order to use the "country" column name. You're absolutely correct on failing-fast here - It's a bug which i'll fix.

In general regarding speed - q supports automatic caching of the CSV files (through the "-C readwrite" flag). Once it's activated, it will write the data into another file (with a .qsql extension), and will use it automatically in further queries in order to speed things considerably.

Effectively, the .qsql files are regular sqlite3 files (with some metadata), and q can be used to query them directly (or any regular sqlite3 file), including the ability to seamlessly join between multiple sqlite3 files.

http://harelba.github.io/q/#auto-caching-examples


Ah, got it, thank you!

Just one minor suggestions/feedback point, in case you find helpful, which is that I had to also add the `-d` flag with a comma value. Otherwise with just -H, I get the error "Bad header row" even though my header was simply "Country,City,AccentCity,Region,Population,Latitude,Longitude".

This suggests to me that `q` is not assuming the input to be a CSV file, but that seems at odds with the first example in the manual, which is `q "select * from myfile.csv"`, with no `-d` flag. Or perhaps the first example also isn't using a csv delimiter, but it doesn't matter because no specific column is being selected?

In addition, given that, from what I gather, a significant convenience of `q` is its auto-detection, then I think it would make sense for it to notice when the input table name ends in ".csv" and based on that, to assume a comma delimiter.

Just my 2 cents. Great job!


Hi again, thanks a lot for the suggestions!

You're absolutely right about the auto-detection (and documentation) of both the header row and the delimiter, I was busy with the auto-caching ability in the last few months in order to provide generic sqlite3 querying, so never got around to it.

I will update the docs and also add the auto-detection capability soon.

Harel




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: