
Hi, author of q here.

Regarding the error you got, q currently does not auto-detect headers, so you'd need to add the -H flag in order to use the "country" column name. You're absolutely right about failing fast here - it's a bug which I'll fix.

In general, regarding speed - q supports automatic caching of the CSV files (through the "-C readwrite" flag). Once it's activated, q will write the data into another file (with a .qsql extension) and will use it automatically in further queries to speed things up considerably.

Effectively, the .qsql files are regular sqlite3 files (with some metadata), and q can be used to query them directly (or any regular sqlite3 file), including the ability to seamlessly join between multiple sqlite3 files.
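
Something along these lines (with a placeholder file name):

    q -H -d , -C readwrite "select country,count(*) from cities.csv group by country"
    q "select * from cities.csv.qsql limit 10"

The first run writes the .qsql cache next to the original file and reuses it on subsequent runs; the second query reads the cache file directly.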

http://harelba.github.io/q/#auto-caching-examples


Ah, got it, thank you!

Just one minor suggestion/feedback point, in case you find it helpful: I had to also add the `-d` flag with a comma value. Otherwise, with just -H, I get the error "Bad header row" even though my header was simply "Country,City,AccentCity,Region,Population,Latitude,Longitude".
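
For reference, what ended up working for me looked roughly like this (file name changed):

    q -H -d , "select Country,City from cities.csv where Population > 1000000"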

This suggests to me that `q` is not assuming the input to be a CSV file, but that seems at odds with the first example in the manual, which is `q "select * from myfile.csv"`, with no `-d` flag. Or perhaps the first example also isn't parsing on a comma delimiter, but it doesn't matter there because no specific column is being selected?

In addition, given that (from what I gather) a significant convenience of `q` is its auto-detection, I think it would make sense for it to notice when the input table name ends in ".csv" and assume a comma delimiter based on that.

Just my 2 cents. Great job!


Hi again, thanks a lot for the suggestions!

You're absolutely right about the auto-detection (and documentation) of both the header row and the delimiter. I was busy over the last few months with the auto-caching ability, in order to provide generic sqlite3 querying, so I never got around to it.

I will update the docs and also add the auto-detection capability soon.

Harel


pyoxidizer is amazing indeed.

Just repackaged my own open source project for multiple platforms using pyoxidizer.

I wish it would be merged into the Python ecosystem itself.

Harel https://github.com/harelba


Thanks, I'll take your .bzl as inspiration [0]. How did you go about developing against the Starlark API? Any IDE support?

[0] https://github.com/harelba/q/blob/master/pyoxidizer.bzl


Hi, q's creator here.

Any kind of input/output delimiter is supported (-d <delim> and -D <delim> respectively), as well as multiple encodings (-e <encoding>). q also performs automatic type inference over the actual data.
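
For example, something like this works (file name, column names and the encoding are only illustrative):

    q -H -d ';' -D ',' -e windows-1252 "select name,sum(amount) from sales.csv group by name"

The sum works because the amount column is inferred to be numeric automatically.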

Encoding auto-detection and fixed-width files are not supported, though.


The company I work for also does delimiter auto-detection, quote character inference (from a limited set), and encoding inference (which is mostly limited to utf8 / windows-1252 / iso-8859-15, but it can't reliably differentiate the latter two).


q supports any kind of input and output delimiter (-d <input-delim> and -D <output-delim> respectively).
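
For example, reading a pipe-delimited file and writing comma-separated output (file name is just an example):

    q -d '|' -D ',' "select * from mydata.txt"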

harelba (creator of q)


q has a Windows installer as well.

http://harelba.github.io/q/

Harel


Hi. q's developer here. Thanks for the mention and kind words everyone.

I considered the searchability issue when deciding on a name for it, but eventually favored a short, minimum-typing name for day-to-day use over better searchability.

Anyway, you can search for "harelba q" in order to find it if needed.

Harel @harelba


There's a command-line tool called q, which allows performing SQL-like queries directly on text files, basically treating text as data and auto-detecting column types.
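
For example, something along these lines (going from memory of its docs):

    ps -ef | q -H "select UID,count(*) cnt from - group by UID order by cnt desc"

where "-" means the table is read from stdin.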

http://harelba.github.io/q/


Neat, but auto-detection is exactly what I don't want. We have structure on one side. Why round-trip it through an unstructured format and attempt to guess the exact same structure on the other side? If I guess wrong, it's a security hole.


Hi, I'm q's creator, Harel.

There are obviously lots of other programs which can provide a similar capability, and while I haven't checked all of them out, I really believe that most of them do a great job. However, my rationale for creating this tool was to provide a seamless addition to the Linux command-line toolset: a tool, as most Linux commands are, and not just a capability. The distinction I'm making here is that tools are reusable and composable, whereas a capability is usually less reusable in different contexts. I'm sure that some of the above are definitely tools. I just hope that the tool I have created provides value to people and helps them with their tasks.

As I posted here elsewhere, my complete rationale for creating the tool is available in the README of the GitHub project. Comments and issues are most welcome.

Harel Ben-Attia


Hi, I'm q's creator, Harel Ben-Attia.

The Linux toolset is really great, and I use it extensively. The whole idea of the tool is not to replace any of the existing tools, but to extend the toolset to concepts which treat text as data. In a way, it's a metatool which provides an easy and familiar way to add more data-processing concepts to the Linux toolset.

There are many cases where I use 'wc -l' to count rows in a file, but if I need to count only the rows where a specific column is larger than some value X, or get the sum of some column per group, then q is a simple and readable way to do it properly, without any need for "tricks".
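
For example (file name and column positions are placeholders; c1, c2 and so on is how q refers to columns when the file has no header line):

    q "select count(*) from myfile.txt where c3 > 100"
    q "select c1,sum(c2) from myfile.txt group by c1"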

My rationale for creating it is also explained in the README of the GitHub project.

Any more comments are most welcome.

Harel


Hi, I'm q's creator (HN made the name q uppercase, but it's actually a lowercase q). The reasoning was that it's used as a command-line tool, and used often. So "q" and not "Q" :)

I'm currently preparing the Debian package, and one-letter names are not allowed, so it's going to be named "qsql" there.

