Hacker News
The Command Line Murders (github.com/veltman)
364 points by coolvoltage on Jan 29, 2016 | 60 comments



Good fun. I'm going to hope that using ipython still qualifies; it made things much easier by letting you define functions that join ad-hoc datasets to look for matches, or read particular files.

A similar approach to different puzzles by Peter Norvig[1]

Other useful (and perhaps less common) utilities I used were 'q'[2], and the standard unix 'comm(1)'[3]

[1] http://nbviewer.jupyter.org/url/norvig.com/ipython/Fred%20Bu...

http://norvig.com/sudoku.html

[2] https://harelba.github.io/q/ - sqlite text munging ended up being a bit too clunky though, and I couldn't remember/fix the join syntax to make it worthwhile in the end.

[3] $ comm -12 <(sort memberships/AAA) <(sort memberships/Delta_SkyMiles) > aaa_delta_comm # intersect names in 2 files. Sadly can't handle more than 2 inputs directly though, and assumes pre-sorting.
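Since comm only takes two inputs, the calls can be nested with bash process substitution. A sketch with made-up membership files:

```shell
# comm(1) takes exactly two inputs, but the calls nest: intersect
# three (hypothetical) membership lists via process substitution
printf 'alice\nbob\ncarol\n' > list_a
printf 'bob\ncarol\ndave\n'  > list_b
printf 'bob\ncarol\nerin\n'  > list_c

# inner comm: names in both list_b and list_c; outer: also in list_a
comm -12 <(sort list_a) <(comm -12 <(sort list_b) <(sort list_c))
# prints: bob, carol (one per line)
```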


I think the goal is teaching you unix text commands. So if you use Python it's missing the point. Thoughts?


Do what is fun, there is no point


Seems equally valid to use it as a tool for learning how to manipulate data in ipython.


Here is my solution for finding the tall males driving the blue Honda with the partially known license plate. I found awk quite suitable.

  awk 'BEGIN {RS="\n\n"; FS="\n"} $1 ~ /L337.*9$/ && $2 ~ /Honda/ && $3 ~ /Blue/ && $5 ~ /6'\''/ {print $4}' vehicles | cut -d' ' -f2,3
Edit: And to find the intersection between the SkyMiles members and the members of the Museum of Bash History, I used fgrep:

  grep -Fxf Delta_SkyMiles Museum_of_Bash_History
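The three flags do a lot of work in that one-liner. A minimal sketch with invented names:

```shell
# -f FILE reads patterns from FILE, -F treats them as fixed strings
# (no regex metacharacters), -x requires a whole-line match --
# together: the set intersection of two line-oriented files
printf 'Ann Droid\nBash Gordon\n' > skymiles
printf 'Bash Gordon\nTed Tar\n'   > museum
grep -Fxf skymiles museum
# prints: Bash Gordon
```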


Your first command uses 13 distinct non-alphanumeric characters: '{="\;}$~/.*&

It looks more like black magic than code to someone like me who doesn't know awk. It reminds me of these APL/J/K solutions I see on projecteuler.net. I imagine this is what happens when people completely ignore learning curve steepness and optimize for maximum productivity. Very impressive!


It would look better if it wasn't all on one line (could be just the way HN is displaying it).

Essentially, awk has a C-like syntax with a bit of built-in parsing and a built-in loop. It breaks up each input line into fields (by default using whitespace as the delimiter). The fields are referenced using the '$' character ($1, $2, etc.).

Then, for each line of the input, the entire program gets run. An awk program consists of a conditional (basically the body of an "if" statement -- the if is implied). If that conditional matches, the C-like program segment following it (enclosed in curly braces) gets executed. The contents of the curly braces are basically interpreted/scripted C.

Two special conditions exist in awk -- BEGIN and END. They are evaluated (and the contents of their matching curly braces are executed) before, and after (respectively) any lines of input are read.

Hope that helps give you a start on awk -- it is a really powerful tool.

Edit: so as a quick walkthrough: Before any lines of the input text are read in, RS and FS variables get set. Then, for each line where field 1 matches the regular expression L337.*9$, and field 2 contains the word Honda, etc... it prints (outputs) the contents of field 4 (print $4).

OK, so in addition to normal C-like syntax, this example contains regular expression matching too. But other than that (and some other variables that automatically get set, like NR for the number of records, or NF for the number of fields in the current line), most awk programs look like C.
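A toy program (invented input) that exercises all three pieces -- BEGIN, a condition with its action, and END -- at once:

```shell
# the whole awk shape in one toy program: BEGIN runs first, the
# pattern { action } pair runs once per input line, END runs last
printf 'Honda Blue\nFord Red\nHonda Teal\n' |
awk '
  BEGIN        { n = 0 }               # before any input is read
  $1 ~ /Honda/ { n++; print $2 }       # condition, then C-like body
  END          { print n " matches" }  # after the last line
'
# prints: Blue, Teal, then "2 matches"
```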


Thanks for explaining. awk is actually quite simple, once you get the hang of it.

It’s all on one line, because I used it as “one” command in bash. After all, this is the command line murders.


Most of it is basically just an SQL-type query on a text file.

Awk actually has a fairly gentle learning curve for queries like this; one can do similar stuff after a couple of hours of playing around.

EDIT: I couldn't find the tutorial I used but the digital ocean one seems like a good brief intro that would get you 90% of the way to understanding that awk command. https://www.digitalocean.com/community/tutorials/how-to-use-...

I agree that awk, sed etc look like line noise at first!


Using more characters/fewer distinct characters wouldn't make it any more readable.

Similarly, COBOL is no more readable than Perl. Readability of texts written in programming languages comes from organization and knowing the definitions of terms and idioms, which is true of texts written in every language.

Well-written software is difficult to read because terms have precise definitions which demand precise thought; philosophical texts approach this, but only mathematics replicates it. Therefore, shifting notation isn't going to help much.


That's great, and quite concise. I didn't get further than a sed query to filter the correct paragraphs, and then replace .* with a group filter (WR|DV|P8) to narrow the search:

$ sed -rne '/L337.*9$/,/^$/p' vehicles
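The two-address form /start/,/end/ is what does the record extraction there; a sketch on invented data:

```shell
# /start/,/end/ selects every line from a match of the first regex
# through the next match of the second -- here one whole record,
# from the plate line down to the following blank line
printf 'License Plate L337QE9\nMake: Honda\n\nLicense Plate X229\nMake: Ford\n\n' |
sed -ne '/L337.*9$/,/^$/p'
# prints only the first record (the L337QE9 Honda)
```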


I don’t understand what you gain in replacing .* with (WR|DV|P8) in a second query. Could you elaborate on that?


Well, listing all the number plates that match /L337.*9/ will also return a number of other models and colours than the one we're looking for. So you could make the list smaller by only filtering on the license plates of correct models.

But you're right that there's very little gained. The effort of reducing a list of 9 to a list of 4 is better spent on advancing the puzzle :)


That was fun.

Hint: The file "vehicles" will be easier to deal with using the standard POSIX tools if you translate it into a line-oriented format. Here is a way to do it (in rot13):

    <iruvpyrf ge '\a' '|' | frq -r 'f/Yvprafr/\aYvprafr/t' > iruvpyrf.ersbeznggrq
An alternative solution is to notice that "vehicles" is almost a valid Recutils [1] file. The fix is as simple as

    frq -r '1,4q;f/Yvprafr Cyngr/YvprafrCyngr:/' iruvpyrf > iruvpyrf.erp
You can query the result with `recsel`. (Admittedly, I didn't try this until after solving the mystery.)

[1] https://www.gnu.org/software/recutils/#content
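To read the rot13 hints above in the terminal, note that rot13 is its own inverse, so a single tr call both encodes and decodes:

```shell
# rot13 shifts each letter 13 places; applying it twice is a no-op,
# so the same function decodes the hints above
rot13() { tr 'A-Za-z' 'N-ZA-Mn-za-m'; }
echo 'iruvpyrf' | rot13   # prints: vehicles
```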


`comm -12` (http://pubs.opengroup.org/onlinepubs/9699919799/utilities/co...) is practically made for quickly finding who is a member of all involved groups.


This is just like SQL joins!


Nope, that would be join.


Indeed you're right. Jonathan Leffler's comment on Stack Overflow discusses the differences

---------

There are a couple of differences between comm and join:

    comm compares whole lines; join compares fields within lines.
    comm prints whole lines; join can print selected parts of lines.
---------

http://stackoverflow.com/questions/7234028/bash-difference-b...
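The difference is easy to see on a pair of toy files (invented data):

```shell
# toy data: same key in field 1, different payloads in field 2
printf '1 alice\n2 bob\n'    > left
printf '2 london\n3 paris\n' > right

comm -12 left right   # whole-line intersection: no line is
                      # identical in both files, so prints nothing
join left right       # field-wise join on field 1
# prints: 2 bob london
```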


Great idea!

My path to solution:

Get a list of CLUEs:

   grep -C 3 CLUE crimescene
Get a list of the 3 possible suspects based on CLUEs:

   grep -f <(grep -f memberships/AAA  memberships/Delta_SkyMiles | grep -f memberships/Terminal_City_Library  | grep -f memberships/Museum_of_Bash_History) -C 3 vehicles  | grep -A 6 L337..9 | grep -B 1 -A 4 Honda 
Then read their interviews:

   grep <Owner> people 
   tail -n +<line> streets/<street> | head -n 1
   less interviews/interview-<interview number>
Take your guess :)


Here's my take on that part (no guessing was necessary):

sort AAA Delta_SkyMiles Terminal_City_Library Museum_of_Bash_History | uniq -c -d | sort | grep '^ 4' | cut -d' ' -f8- > ../suspects


don't forget to check the interview of the witness, so you get the details of the getaway car. (after checking the clues)

  grep <Witness> people
compare to

  grep "SEE INTERVIEW" streets/*
 
also, you can add

  grep -B2 -A3 Blue
based on that interview to your second line. Once you check the interviews of the two suspects left, you won't have to guess anymore :)


I left out the grep by Blue because Teal is close to Blue.


Great idea doesn't cut it -- this is freaking incredible.


This was great fun! However, it could be called grep murder, as that's the only tool you really need!


Murder She Grepped is the name I would've gone with.


Is it considered cheating to use files to "save state"?

SPOILER:

One thing I noticed doing this was that most of the interviews that weren't Alice in Wonderland snippets were less than 3 or 4 lines long. Rather than typing the same long filtering for loop again and again, I basically output the list of "good interviews" to a file, so the rest was a `for i in $(cat goodinterviews);...` away. I wasn't sure if using the disk qualifies as just using the command line though...

Also, I think more or less (the commands) are cheating just as much as an editor is, but many think of less as more of a "command line" tool, so the author should specifically forbid it.

Also, grep -rn is a godsend in this case, might have made it too easy :) This was quite fun!


This was fun and I learned comm to intersect text files.

A nicely paced game with a very unusual and fun mechanic that is plausibly part of a detective's skillset.


I had fun with this, and even though I've been using the terminal for ages, I still learned something new. Specifically:

    $(command -v md5 || command -v md5sum)
Does anybody know anything like this, but for sql?


Try this:

    $ type command
    command is a shell builtin

Respectfully-RTM (assuming bash): $ man bash

It's a big manpage, so I'll spare you.

       command [-pVv] command [arg ...]
              Run command with args suppressing the normal shell function lookup.
              Only builtin commands or commands found in the PATH are executed.
              If the -p option is given, the search for command is performed
              using a default value for PATH that is guaranteed to find all of
              the standard utilities. If either the -V or -v option is supplied,
              a description of command is printed. The -v option causes a single
              word indicating the command or file name used to invoke command to
              be displayed; the -V option produces a more verbose description.
              If the -V or -v option is supplied, the exit status is 0 if
              command was found, and 1 if not. If neither option is supplied and
              an error occurred or command cannot be found, the exit status is
              127. Otherwise, the exit status of the command builtin is the exit
              status of command.

'command -v something' prints the name or path of the 'something' command (without running it) and fails if it is not found. So, in this instance: use 'md5' if it exists, otherwise fall back to 'md5sum'.
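The idiom can be sketched as a tiny fallback (the $MD5 variable name is my own; this assumes one of the two tools is installed):

```shell
# pick whichever checksum tool this system has: md5 on BSD/macOS,
# md5sum on GNU/Linux; command -v exits non-zero when a tool is
# absent, so || falls through to the next candidate
MD5=$(command -v md5 || command -v md5sum)
printf 'hello' | "$MD5"
```

Note that md5sum appends "  -" when reading stdin, while BSD md5 prints the bare digest; the digest itself is the same either way.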


This little trick came in very handy

    cat <file1> <file2> ... | sort | uniq -c | sort -n
I have a variant of that which I use to count how often individual words appear in a body of text:

    cat ${FILES} | tr '[:upper:]' '[:lower:]' | sed -r 's/\t/ /g' | sed -r "s/'s//g" | sed -r 's/ /\n/g' | tr -d '[:punct:]' | sort | uniq -c | sort -n


Wasn't it a bit too short, though? I was kinda surprised to find out I had neither a suspect in custody nor any hard evidence except what I received initially.


This was fun. This was my path to the solution:

http://pastebin.com/Px4cLTJV


Props to the author! This is a great, fun, educational game. I'd love to find more like this. Anyone have any recommendations?


CORRECT! GREAT WORK, GUMSHOE.


First person to solve this on their phone wins.


FYI. This whole comment is just spoilers.

I feel like a homer because I started with the second clue, then the first, and, finally, the (scientifically-proven-unreliable) witness testimony.

Starting with Clue #2, get a list of everyone with the set of membership cards. Do this by combining all the lists, sort them together, use uniq to group them ' N FNAME LASTNAME', keep only those with N=4, cut off the cruft, pull each individual's info from the people file, and keep only the males.

  cat mystery/memberships/{AAA,Delta_SkyMiles,Museum_of_Bash_History,\
  Terminal_City_Library} | sort | uniq -c | grep 4 | cut -c 6- \
  | xargs -I'{}' grep '{}' mystery/people | sed -ne '/\bM\b/p'
With this list of 13 names and addresses (from 5029), I tried to find each one in the street files. HOWEVER, only four of them have existent streets.

  cat mystery/memberships/{AAA,Delta_SkyMiles,Museum_of_Bash_History,\
  Terminal_City_Library} | sort | uniq -c | grep 4 | cut -c 6- \
  | xargs -I'{}' grep '{}' mystery/people | sed -ne '/\bM\b/p' \
  | cut -f4 | cut -d, -f1 | tr ' ' '_' | xargs -I'{}' \
  ls mystery/streets/'{}' 2>/dev/null
They are Brian Boyer, Jeremy Bowers, Matt Waite, and Mike Bostock. Following this flaw, I was able to get the answer with four interviews and without consulting clues #1 nor #3.

If I were to pursue this train of thought (without the flaw), I would review clue #1 to find suspects over 6' tall (from mystery/vehicles). The list of 13 becomes just six: the four suspects mentioned above and two others, 'Augustin Lozano' and 'Nikolaus Milatz'.

Finally, I review Clue #3. I try the following commands, because baristas are TERRIBLE at getting people's names correct.

  grep Annabel mystery/people | sed -ne '/\bF\b/p'

  grep Anabel mystery/people | sed -ne '/\bF\b/p'
'Annabel' pulls up only two names, 'Anabel' pulls up four. 'Annabel Church' ends up being the eyewitness. The crucial piece of info is the partial license plate number.

  sed -ne '/L337..9/,+6p' mystery/vehicles
I'm not command-fu enough to write a better sed command so I do a manual inspection. There are five cars that match the description of a 'Blue Honda'; six if you include 'Teal'. Mr. Bowers, one of my original four suspects, owns a 'Blue Honda', and Mr. Bostock owns the 'Teal'.

Anyway. This is just to say that there's more than the prescribed way to solve the game. You could, potentially, solve it with just three interviews 'Annabel', 'Bowers', and 'Bostock'. Or, like me, you could do things ass-backwards.


Wow. Lots of detail.

Zip? Seriously?


If it's a mystery to solve, it makes sense not to include too many details :) You should just clone the repo. The zip option is always present on GitHub repos.


If it bothers you that much then use git instead:

    git clone https://github.com/veltman/clmystery.git
But what you're complaining about has been an option on GitHub for years.


What's wrong with zip?


Absolutely nothing. Just a little discordant to see grep et al being discussed in a project packaged as a zip. I suppose people might want to play with this in Windows, although to be honest if you have gone to the trouble to install grep et al, you will have no trouble with tar either.


You're still missing the point. GitHub repos can be downloaded as a zip because zip is supported out of the box on all major operating systems.


I'm not usually one to post these, but I can't help it in this case:

https://xkcd.com/1168/


tar zxvf! For tar.gz. I've never understood why the tar command can't just read the extension and determine the required long list of options needed for successful extraction. Whether I can remember tar xvjpf (for tar.bz2 files) depends on many things, I'm not sure which. To be honest, I like it when I hit a .zip; I can just type "unzip <file>"...

Nice xkcd :)


All I ever need to remember:

    atool -x archive.{zip,jar,rar,cab,deb,rpm,tar,tar.{gz,Z,bz2,xz}}
(list of supported formats is not complete)

With a simple alias, it isn't even necessary to remember the -x

    alias xx="atool -x"
This also protects you from badly made archives that explode hundreds of files into the current directory. All decompression is done in a temporary subdir, which is removed if there was only one file/dir at the top level.

For xkcd, see atool's --explain or --simulate options that show you the generated tar/etc commands.

http://www.nongnu.org/atool/


Atool already comes with appropriate aliases for all the common tasks:

    $ wajig list-files atool | grep bin/
    /usr/bin/atool
    /usr/bin/arepack
    /usr/bin/aunpack
    /usr/bin/apack
    /usr/bin/als
    /usr/bin/acat
    /usr/bin/adiff


Newer versions of tar do automatically use the correct procedure based on the extension, just type tar xvf.


Whoa, I was stuck in the past! Thanks for the tips!


Just do "tar -xf" for all your file extracting needs. The other command-line parameters are pretty much only needed if you want to override the detected compression format.


It can.

    $ ls
    foo.tar.gz
    $ tar -xf foo.tar.gz
    $ ls
    foo.tar.gz
    my_file.md


It does, I always extract files with:

tar xf <file>

(as in eXtract File)

The xkcd answer could be: tar t

I always see people using tar -zxvf <file>; I'm guessing they tried that once, it worked, and they never needed to look in the manpage. I feel that.


lame@user:~/$ tar --help



Zip doesn't mean Windows; lots of us Linux people don't like tar.


Lots of us Linux people don't like Zip, so?

To point out the obvious: zip compresses files individually and then bundles them, while a .tar.{Z,gz,bz2,xz,...} file bundles the files and then compresses them. The latter is better for entropy reduction, the former is better for random access. It all depends on your objectives.
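The entropy-reduction point can be demonstrated with gzip alone standing in for zip's per-file compression (file names invented):

```shell
# two identical 8 KB chunks of incompressible (random) data
head -c 8192 /dev/urandom > f1
cp f1 f2

# zip-style: each file compressed on its own
gzip -c f1 > f1.gz
gzip -c f2 > f2.gz

# tar-style: bundle first, then compress the whole stream; gzip's
# 32 KB window now spans both copies, so the duplicate nearly
# vanishes and both.tar.gz comes out well under f1.gz + f2.gz
tar cf - f1 f2 | gzip -c > both.tar.gz

wc -c f1.gz f2.gz both.tar.gz
```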


Of course, but that's entirely beside the point; my comment was about the author assuming zip implied a Windows user, which it does not. Zip is ubiquitous on all OSes, which is why GitHub offers zip.


What always leaves a puzzled look on my face (as a Windows user) is .tar.gz. What does .tar do in the case where there's nothing but a single gzipped file being archived?


It preserves the permissions, owner, group, time-stamps, and complete original name and path. Granted, there's a small overlap with what gzip also provides, but a tar file is as close to the original as you generally get. Oh, and an exotic one: support for sparse files (if enabled).
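A quick way to see what the tar layer records (hypothetical paths):

```shell
# a gzipped tar remembers paths, permissions, owner and timestamps;
# gzip on a bare file keeps little more than one original file name
mkdir -p demo/sub
echo data > demo/sub/file.txt
chmod 640 demo/sub/file.txt

tar czf demo.tar.gz demo
tar tvf demo.tar.gz   # listing shows mode, owner, and full path
```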



Would have been a great opportunity for gunzip.



