> Just try these in Bash or PowerShell!
> select fullpath from files where fullpath like '%sublime%' and fullpath like '%settings%' and fullpath not like '%backup%';
This isn't a very good example, because it's trivial to do in bash:
locate sublime | grep settings | grep -v backup
(Replace `locate sublime` with `find / | grep sublime` if locate's results are too old.)
> select fullpath, bytes from files order by bytes desc limit 5;
This is better. Here it is in bash:
find / -type f -exec stat -c '%s %n' {} \; | sort -nr | head -n 5
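If you're on GNU find, you can skip spawning `stat` once per file entirely, since find already has the metadata from its own stat() calls. A small self-contained sketch (the `/tmp/findsize_demo` directory and its files are made up for the demo):

```shell
# Create a throwaway directory with two files of known sizes
mkdir -p /tmp/findsize_demo
printf 'aaaaa' > /tmp/findsize_demo/big.txt    # 5 bytes
printf 'aa'    > /tmp/findsize_demo/small.txt  # 2 bytes

# -printf '%s %p\n' emits "size path" directly, no per-file stat(1) process
find /tmp/findsize_demo -type f -printf '%s %p\n' | sort -nr | head -n 5
```

On large trees this is dramatically faster than `-exec stat -c '%s %n' {} \;`, which forks a process per file.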
Cherry picking another one that stood out to me.
> select writeln('/Users/SJohnson/dictionary2.txt', data) from fileslines where fullpath = '/Users/SJohnson/dictionary.txt' order by data;
cd /Users/SJohnson/; sort dictionary.txt > dictionary2.txt
Some of the rest of the examples are trivial in bash, and others look potentially useful. Of course, they are trying to demonstrate its capabilities, so the examples are contrived. I can see how this would be useful for someone who doesn't know the command line, but as someone who is proficient in both, I find SQL pretty verbose.
In the real world I'd switch to a scripting language for some of the more complex cases, since they'd be rare.
As pointed out a lot of the verbosity in PowerShell is optional these days as there are a good number of aliases for common commands.
The nice thing about the verbosity is kind of similar to the point of SQL in that it too has something of a natural language DSL intent (in PowerShell commands are expected to be "Verb-Noun" and verbs for the most part are encouraged to be from a relatively small set of common verbs). This can make it a bear to fully write out without an IDE (and there are multiple choices these days) or tab completion (which gets better with each release).
The benefit however, is that typically you can very easily read someone's PS1 script if it has been written at full verbosity and know what it is doing. It's often very close to self-documenting at that point.
Oh, it can be. The confusion lies in the fact that you can do basic things half a dozen ways, and I also used some shorter aliases (e.g. Get-ChildItem (or gci, or ls), Sort-Object (sort), Select-Object (select)).
You can sort of choose what level of verbosity you want to encode in your scripts for your sanity at a later date.
Not really. If you write SQL statements for a living, the former is more readable, if you sysadmin for a living, then the latter wins. A sysadmin will use tools like find, sort and head very frequently, whilst SQL statements will be fairly rare.
They are both technical, but 'order by bytes desc' has got to be more expressive than 'sort -nr'. It's almost natural human English, whereas the latter doesn't express anything.
That said, I don't know how much time it would genuinely save. As with most of these tools, you shouldn't be installing them on production servers, so you still have to know Bash anyway.
> > select fullpath, bytes from files order by bytes desc limit 5;
> This is better. Here it is in bash:
> find / -type f -exec stat -c '%s %n' {} \; | sort -nr | head -n 5
This is also a nice example of the biggest problem with piped shell commands: each command is executed in isolation. Because of that, you miss out on many optimizations that are possible when you know the full query.
This is pretty cool; I have found myself wanting a tool like this for ages. However, does anyone know of a pure open-source alternative (just for Linux)?
A more unixy alternative to osquery is termsql (https://github.com/tobimensch/termsql): it works with anything on the input, so it's a matter of "ls"-ing the correct folder and then using SQL to output what you want.
It would not be terribly difficult to create a workalike on other *nixes, as the task is effectively just mapping your filesystem's stat(2) metadata onto an SQLite db, with a virtual table representing file content and a user-defined function that calls exec().* SQLite gives you most of what you need here for free and makes the rest relatively easy. In fact, a module containing the implementations of the latter can even be loaded directly into the existing sqlite3 CLI.
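Even without writing a virtual table module, the stat-metadata-into-SQLite half of this can be sketched in a few lines of shell with GNU find and the stock sqlite3 CLI (the `/tmp/sqlfs_demo` paths and `files` table are made up for the demo):

```shell
# Build a small tree with a file of known size
mkdir -p /tmp/sqlfs_demo
printf '12345' > /tmp/sqlfs_demo/f.txt   # 5 bytes

# Dump path and size as tab-separated rows (written outside the scanned tree)
find /tmp/sqlfs_demo -type f -printf '%p\t%s\n' > /tmp/sqlfs_demo_files.tsv

# Load it into SQLite and query it like the 'files' table in the article
rm -f /tmp/sqlfs_demo.db
sqlite3 /tmp/sqlfs_demo.db <<'SQL'
CREATE TABLE files(fullpath TEXT, bytes INTEGER);
.mode tabs
.import /tmp/sqlfs_demo_files.tsv files
SELECT fullpath, bytes FROM files ORDER BY bytes DESC LIMIT 5;
SQL
```

From there the "largest five files" query really is just SQL, at the cost of the snapshot going stale the moment the filesystem changes.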
If you added an inotify daemon and FTS indexes you would have essentially a clone of Spotlight and other indexed filesystem search engines.
* piping the filenames somewhat obviates this though
You can do most of the same kinds of things via find and grep and some shell foo, but honestly who can remember all of that? Maybe someone smarter than me, but every time I need it I am reading man pages.
The find syntax to get files modified more than 20 minutes ago? How can you remember that? But `modified > now() - interval '5 minutes'` (well, that's Postgres, but still) I can remember, and I haven't used it in 2 or 3 years, because it's slightly less arbitrary and doesn't have 8 different gotchas.
EDIT:
find . -mmin -5 # that gets you files modified in the last 5 minutes. The part I can't ever remember:
find . -mmin +5 # that gives you files last modified more than 5 minutes ago
find . -mmin 5 # apparently this is files modified exactly 5 minutes ago? The fact that this syntax exists (and is different from +5) seems absurd to me. What is the resolution? It must be minutes. This option exists only to confuse people.
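The `-5` / `+5` distinction is easy to verify with GNU touch's `-d` flag, which sets mtimes in the past (the `/tmp/mmin_demo` directory and file names are made up for the demo):

```shell
mkdir -p /tmp/mmin_demo
touch -d '2 minutes ago'  /tmp/mmin_demo/recent.txt
touch -d '10 minutes ago' /tmp/mmin_demo/old.txt

find /tmp/mmin_demo -mmin -5 -type f   # less than 5 minutes old: recent.txt
find /tmp/mmin_demo -mmin +5 -type f   # more than 5 minutes old: old.txt
```

(The bare `-mmin 5` form matches files whose age, rounded to whole minutes, is exactly 5, so in practice it only hits a 60-second window.)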
We looked at Log Parser before building Crab. The syntax was far from standard SQL, it didn't have joins etc.
It didn't use string matching to identify subsets of files to query.
And maybe the most important thing, it doesn't have the exec function to run operating system commands on the files you get back in your query results.
I've been looking for something like this (and thinking about developing something if I can't find a satisfactory solution) to use across many different hosts to identify duplicate files, etc. I've got media spread across many different linux and os x machines. Can crab handle this?
Be careful: the example finds duplicate file names (with equal size), not duplicate files! Finding those would require checking the contents of the files.
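For true duplicate detection you have to group by a content hash rather than name and size. A minimal single-host sketch using GNU coreutils (the `/tmp/dup_demo` files are made up for the demo; `uniq -w64` compares only the 64-hex-character SHA-256 prefix of each line):

```shell
# Two files with identical content, one with different content
mkdir -p /tmp/dup_demo
printf 'same' > /tmp/dup_demo/a.txt
printf 'same' > /tmp/dup_demo/b.txt
printf 'diff' > /tmp/dup_demo/c.txt

# Hash everything, sort so equal hashes are adjacent,
# then print only the groups that repeat
find /tmp/dup_demo -type f -exec sha256sum {} + |
    sort | uniq -w64 --all-repeated=separate
```

In practice you'd compare sizes first and only hash the size-collisions, since hashing every file is the expensive part.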
Also there are tools like fdupes, though they have the same problem regarding remote hosts. Some tools use xattrs to store hashes; some might use DBs. (With the xattr tools, you just run the tool on the remote host first, then on the local host, if you need to save bandwidth.)
Yeah, ideally what I want is a daemon which hooks into libevent or something similar, and each time a file changes or is created, calculates a checksum and updates other metadata, and then finally provides this information back to a central queryable database.
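One pass of that pipeline (minus the event hook) is simple enough to sketch in shell with sqlite3: hash each file and upsert the result into a central table. The `/tmp/hashdb_demo` paths and the `files` schema are made up for the demo:

```shell
# A file with known content, and a fresh SQLite "central" db
mkdir -p /tmp/hashdb_demo
printf 'hello' > /tmp/hashdb_demo/a.txt
rm -f /tmp/hashdb_demo.db
sqlite3 /tmp/hashdb_demo.db \
    'CREATE TABLE files(path TEXT PRIMARY KEY, sha256 TEXT);'

# Hash every file and upsert it; PRIMARY KEY on path makes re-runs idempotent
find /tmp/hashdb_demo -type f | while read -r f; do
    h=$(sha256sum "$f" | cut -d' ' -f1)
    sqlite3 /tmp/hashdb_demo.db \
        "INSERT OR REPLACE INTO files VALUES('$f','$h');"
done

sqlite3 /tmp/hashdb_demo.db 'SELECT path, sha256 FROM files;'
```

The missing piece is exactly the daemon part: wiring inotify (or FSEvents on macOS) to re-run the upsert for just the changed paths instead of rescanning.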
We don't have any linux builds at the moment, if there's demand we'll make some.
We can scan mounted drives, and Crab has a command line switch to treat names as case sensitive or not. SHA-1 calls are probably not practical across a network to compare file contents, even scans can be a bit slow, but you might do something by comparing file sizes.
I think it was abandoned when Microsoft realized they could achieve similar object orientation built on the same old file system (so no incompatibilities), just with the .NET layer in between. See also PowerShell and the example above. :)
Crab doesn't pick up metadata during the scan, because this would slow the scan too much.
There is a wrapper function 'metadata' which returns a specific metadata item from a file at a given path at query run time, but this basically runs mdls under the hood.
Crab can handle extended attributes using the EVAL function, which runs an OS command and returns the result as a string. But you have to parse the string, for example, to return the size of a resource fork.
It doesn't look like it's open source - you download a time-limited trial version, after which you need to buy a license. And there's no link to the source.