
Hi, author of q here.

Regarding the error you got, q currently does not auto-detect headers, so you'd need to add the -H flag in order to use the "country" column name. You're absolutely right about failing fast here - it's a bug which I'll fix.

In general, regarding speed - q supports automatic caching of the CSV files (through the "-C readwrite" flag). Once it's activated, q will write the data into another file (with a .qsql extension) and will use it automatically in further queries to speed things up considerably.

Effectively, the .qsql files are regular sqlite3 files (with some metadata), and q can be used to query them directly (or any regular sqlite3 file), including the ability to seamlessly join between multiple sqlite3 files.
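
Something along these lines (with a placeholder file name):

    q -H -d , -C readwrite "select country,count(*) from cities.csv group by country"
    q "select * from cities.csv.qsql limit 10"

The first run writes the .qsql cache next to the original file and reuses it on subsequent runs; the second query reads the cache file directly.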

http://harelba.github.io/q/#auto-caching-examples


Ah, got it, thank you!

Just one minor suggestion/feedback point, in case you find it helpful: I had to also add the `-d` flag with a comma value. Otherwise, with just -H, I get the error "Bad header row" even though my header was simply "Country,City,AccentCity,Region,Population,Latitude,Longitude".
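
For reference, what ended up working for me looked roughly like this (file name changed):

    q -H -d , "select Country,City from cities.csv where Population > 1000000"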

This suggests to me that `q` is not assuming the input to be a CSV file, but that seems at odds with the first example in the manual, which is `q "select * from myfile.csv"`, with no `-d` flag. Or perhaps the first example also isn't parsing on a comma delimiter, but it doesn't matter there because no specific column is being selected?

In addition, given that (from what I gather) a significant convenience of `q` is its auto-detection, I think it would make sense for it to notice when the input table name ends in ".csv" and assume a comma delimiter based on that.

Just my 2 cents. Great job!


Hi again, thanks a lot for the suggestions!

You're absolutely right about the auto-detection (and documentation) of both the header row and the delimiter. I was busy over the last few months with the auto-caching ability, in order to provide generic sqlite3 querying, so I never got around to it.

I will update the docs and also add the auto-detection capability soon.

Harel


pyoxidizer is amazing indeed.

Just repackaged my own open source project for multiple platforms using pyoxidizer.

I wish it would be merged into the Python ecosystem itself.

Harel https://github.com/harelba


Thanks, I'll take your .bzl as inspiration [0]. How did you go about developing against the Starlark API? Any IDE support?

[0] https://github.com/harelba/q/blob/master/pyoxidizer.bzl


Hi, q's creator here.

Any kind of input/output delimiter is supported (-d <delim> and -D <delim> respectively), as well as multiple encodings (-e <encoding>). q also performs automatic type inference over the actual data.
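
For example, something like this works (file name, column names and the encoding are only illustrative):

    q -H -d ';' -D ',' -e windows-1252 "select name,sum(amount) from sales.csv group by name"

The sum works because the amount column is inferred to be numeric automatically.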

Encoding auto-detection and fixed-width files are not supported, though.


The company I work for also does delimiter auto-detection, quote character inference (from a limited set), and encoding inference (which is mostly limited to utf8 / windows-1252 / iso-8859-15, but it can't reliably differentiate the latter two).


q supports any kind of input and output delimiter (-d <input-delim> and -D <output-delim> respectively).
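
For example, reading a pipe-delimited file and writing comma-separated output (file name is just an example):

    q -d '|' -D ',' "select * from mydata.txt"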

harelba (creator of q)


q has a Windows installer as well.

http://harelba.github.io/q/

Harel


Hi. q's developer here. Thanks for the mention and kind words everyone.

I considered the searchability issue when deciding on a name for it, but eventually favored a short, minimum-typing name for day-to-day use over better searchability.

Anyway, you can search for "harelba q" in order to find it if needed.

Harel @harelba


There's a command-line tool called q, which allows performing SQL-like queries directly on text files, basically treating text as data and auto-detecting column types.
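
For example, something along these lines (going from memory of its docs):

    ps -ef | q -H "select UID,count(*) cnt from - group by UID order by cnt desc"

where "-" means the table is read from stdin.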

http://harelba.github.io/q/


Neat, but auto-detection is exactly what I don't want. We have structure on one side. Why round-trip it through an unstructured format and attempt to guess the exact same structure on the other side? If I guess wrong, it's a security hole.


Hi, I'm q's creator, Harel.

There are obviously lots of other programs which can provide a similar capability, and while I haven't checked all of them out, I really believe that most of them do a great job. However, my rationale for creating this tool was to provide a seamless addition to the Linux command-line toolset: a tool, as most Linux commands are, and not just a capability. The distinction I'm making here is that tools are reusable and composable, whereas a capability is usually less reusable in different contexts. I'm sure that some of the above are definitely tools. I just hope that the tool I have created provides value to people and helps them with their tasks.

As I posted here elsewhere, my complete rationale for creating the tool is available in the README of the GitHub project. Comments and issues are most welcome.

Harel Ben-Attia


Hi, I'm q's creator, Harel Ben-Attia.

The Linux toolset is really great, and I use it extensively. The whole idea of the tool is not to replace any of the existing tools, but to extend the toolset to concepts which treat text as data. In a way, it's a metatool which provides an easy and familiar way to add more data-processing concepts to the Linux toolset.

There are many cases where I use 'wc -l' to count rows in a file, but if I need to count only the rows where a specific column is larger than some value X, or get the sum of some column per group, then q is a simple and readable way to do it properly, without any need for "tricks".
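
For example (file name and column positions are placeholders; c1, c2 and so on is how q refers to columns when the file has no header line):

    q "select count(*) from myfile.txt where c3 > 100"
    q "select c1,sum(c2) from myfile.txt group by c1"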

My rationale for creating it is also explained in the README of the GitHub project.

Any more comments are most welcome.

Harel


Hi, I'm q's creator (HN made the name q uppercase, but it's actually a lowercase q). The reasoning was that it's used as a command-line tool, and used often. So "q" and not "Q" :)

I'm currently preparing the Debian package, and one-letter names are not allowed, so it's going to be named "qsql" there.

