look(1): lines beginning with given string (die.net)
47 points by mtdewcmu on Dec 13, 2015 | 45 comments



The universe of Unix filters never ceases to turn up things I haven't seen before. Other examples for those inclined: tac, sponge, pv, join, paste.
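For anyone who hasn't met them (file names hypothetical):

    tac access.log                    # cat, but last line first
    sort todo.txt | sponge todo.txt   # soak up input, then write back to the same file (moreutils)
    pv big.tar | tar xf -             # progress meter for data flowing through a pipe
    paste names.txt ages.txt          # merge files line by line, tab-separated
    join ids.txt emails.txt           # relational join; both files sorted on the key field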


Some others I like:

rev, vipe, comm, bfr, xxd


Do you know of a good resource listing all of these?


The standardized ones are described in the Shell & Utilities section of POSIX:2008/SUSv4 online at http://pubs.opengroup.org/onlinepubs/9699919799/


Yes, Num command has a quick list: http://www.numcommand.com/doc/helpers.html


`ls /usr/bin/?? /usr/bin/???` can be interesting to skim through.


In the same vein of 'improvements over things everyone assumes you have to do with the more famous tool', we could say grep '^string' : look :: sort -R : shuf


Thanks to @climagic on Twitter, I recently learned that sort -R and shuf are not in fact the same thing. (sort -R doesn't actually shuffle: it sorts by a hash of each line, so duplicate lines end up adjacent.)


My point there was that besides being a true shuffle, 'shuf' also has an algorithmic advantage when you only need a few lines (similar to how 'look' has an advantage when you only need a few matches).
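For instance, pulling 3 random lines (file name hypothetical):

    shuf -n 3 big.txt             # only needs to pick 3 lines, no full sort
    sort -R big.txt | head -n 3   # hashes and sorts the entire file first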


look does a binary search and only works on sorted files, while grep '^string' works on any text file but would be slower.


On my machine, (Ubuntu 14.04) look only performs a binary search if the "-b" option is passed (unless no file is specified, in which case it behaves as described in the linked manpage). My manpage notes in the compatibility section:

> look uses a linear search by default instead of a binary search, which is what most other implementations use by default.
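So on such systems you'd write, e.g.:

    look -b dog /usr/share/dict/words   # force the binary search explicitly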


Is look of any use? Isn't grep fast enough?


A better one-line description would be

"look(1): binary search for lines with a given prefix in a sorted file"

Which tells you exactly what the pros/cons are compared to plain grep.
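E.g. (file name hypothetical):

    sort words.txt -o words.txt   # look's precondition: input must be sorted
    look dog words.txt            # binary search for lines with the "dog" prefix
    grep '^dog' words.txt         # same output, but a full linear scan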


I created a 1.1G file with dictionary words and timed...

  % du -h a  
  1.1G	a
  % time look 'dog' a | wc -l
  53856

  real	0m0.021s
  user	0m0.020s
  sys	0m0.003s
  % time grep '^dog' a | wc -l
  53856

  real	0m28.593s
  user	0m0.977s
  sys	0m2.223s
OK, grep performed worse than I expected (sorting the file beforehand took a long time, though).


'look' can stop as soon as it passes the sorted block of 'dog' lines, whereas grep must continue to the end of the file.

Ignoring memory hierarchy, naturally 'look' would be O(log n) while grep would be O(n).

Special input (sorted, in this case) often calls for a matching algorithm when speed matters.
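Back-of-the-envelope, assuming the 1.1G file above averages ~10 bytes per line:

    1.1e9 bytes / ~10 bytes per line ≈ 1e8 lines
    binary search: log2(1e8) ≈ 27 probes
    linear scan:   up to 1e8 lines read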


That seems abysmally slow for grep. On my machine, with a 1.1G file made of 1200 copies of /usr/share/dict/words, GNU grep 2.16 takes about 1.5 seconds. Try with LC_ALL=C?
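i.e. something like:

    time LC_ALL=C grep '^dog' a | wc -l   # C locale skips locale-aware multibyte handling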


Is grep optimized to be fast, though? There seem to be a lot of "better at grep than grep" tools out there:

http://beyondgrep.com/


There's real effort that went into it: https://lists.freebsd.org/pipermail/freebsd-current/2010-Aug.... I don't know how it compares to all the alternatives.


Yeah, there's a lot of that, but using grep for everything is like using duct tape to hold something together when you really should be using screws. grep can do everything, but it's not always the best tool for the job.


The "ack" you reference is apparently faster only in that it's easier to specify exactly which files to search or not to search, which obviously can also be done with grep, e.g. combined with 'find'.

As for grep itself, it has had many incarnations over the decades with various pluses and minuses.

The current GNU grep allows 3 kinds of regex, basic/extended/perl -- the latter being what ack supports.

Note that Perl regexes have extensions beyond regular languages, and those extensions are inherently slower to match than the finite automata that basic regexes compile to. Power versus speed.

E.g. grep(1): "Back-references are very slow, and may require exponential time."
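To make the distinction concrete (file name hypothetical; GNU grep uses a DFA for patterns without back-references):

    grep -E '(ab)+c' notes.txt    # still a regular language: runs on the fast DFA matcher
    grep '\(ab*\)x\1' notes.txt   # back-reference: forces the backtracking matcher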

For further info:

> why GNU grep is fast

> Mike Haertel, mike at ducky.net

> Sat Aug 21 03:00:30 UTC 2010

> Here's a blog post from 2006 about a developer trying to "beat grep" and looking at the algorithms it uses; it goes into a little more detail about the "doesn't need to do the loop exit test at every step" optimization mentioned in this email.

http://ridiculousfish.com/blog/posts/old-age-and-treachery.h...

via

https://lists.freebsd.org/pipermail/freebsd-current/2010-Aug...

The best writeup is surely by the inimitable Russ Cox, who explains very clearly why grep, as of 2007, was one of the only fast regex implementations:

Regular Expression Matching Can Be Simple And Fast (but is slow in Java, Perl, PHP, Python, Ruby, ...)

Russ Cox, January 2007

http://swtch.com/~rsc/regexp/regexp1.html

(This is a 4 part series but IIRC part 1 has the highlights)

I'm sure that various other tools have been strongly influenced by this famous essay, and so many more things may be as fast as grep by now, but still...

P.S. one of the other high-profile "ack"-like search tools would be "ag", aka "The Silver Searcher".

"The Silver Searcher is a 3-5x faster drop in replacement for ack (which itself is better than grep)."

2013

https://www.reddit.com/r/programming/comments/16bvah/the_sil...


It seems most of grep's time was spent waiting on disk. This underlines why look can be significantly faster: just reading the non-matching lines takes time.


When running 'look' to get all comments in a file, I only get the first matching line in the output. Can someone explain this?

look "#" .zshrc

There are really a lot more comments (lines starting with '#') in my .zshrc than just that first line.


From the manpage:

> As look performs a binary search, the lines in file must be sorted...

I assume you didn't sort your source file first.


When sorting the file first I get no output anymore:

    sort ~/.zshrc -o /tmp/.zshrc
    look "#" /tmp/.zshrc

Sorry for the noise but I am confused... :|


sort(1) might not use the same ordering as look(1)

Try setting and exporting LC_COLLATE=C, then retry.
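i.e.:

    export LC_COLLATE=C
    sort ~/.zshrc -o /tmp/.zshrc
    look "#" /tmp/.zshrc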


Uh, wow! That was it!

Thanks for this answer...


It doesn't make sense to sort a configuration file; order and context matter. It's easier to just use grep '^#' (or even better, ag).


Yes, I'm aware of that, but at this point I just wanted to understand how 'look' is used. It was a mistake to try it out on my .zshrc first; that's simply the first file that came to mind, and it led me to misunderstand everything. Since sorting a source file and then "looking" for comments makes no sense anyway, grep really is the better choice here.


Your .zshrc is almost certainly not a sorted file.


One of the useful things about look is that it defaults to the dictionary. So if you weren't sure how to spell accommodate, you could type

    look ac | grep date


Only 12 hours after looking at this did I realize that if your log files have one record per line, you can easily use this to search for all logs that happened in a particular ten minute period.
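e.g., assuming timestamp-prefixed (and therefore sorted) log lines, something like:

    look '2015-12-13T14:2' app.log   # every record from 14:20:00 through 14:29:59

(file name and timestamp format hypothetical)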


    look a | wc -l      => 5985
    look -b a | wc -l   => 1228
    look -bf a | wc -l  => 1228

You must sort the /usr/share/dict/words file case-insensitively for the -bf options to work properly.

Edit: I was expecting

    sort -f /usr/share/dict/words | look -bf a | wc -l

to be 5985, but the result is 1228; I don't know why.
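One guess at the cause (an assumption, not verified against every implementation): look(1) reads from the named file, or from the default word list when no file is given, and ignores stdin entirely, so the sorted output of that pipe never reaches it. Writing the sorted copy to a file first should behave as expected:

    sort -f /usr/share/dict/words -o /tmp/words.sorted   # path hypothetical
    look -bf a /tmp/words.sorted | wc -l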


If you don't have this basic linux utility, you can use

    grep '^string' file.txt
but there's a greater chance of having grep installed than look.


As long as I have been using *nixes, I have never met a machine that didn't have grep. It was written by Ken Thompson himself in 1974, or 41 years ago, and first came out in Unix 4th edition.


grep is also specified by POSIX, whereas look isn't.


The interesting thing about look is that it uses binary search, so it is much faster.


Binary search is useless unless the file is sorted. Most often my files are not sorted, so egrep '^line' works just fine.


It's obvious that look was written as an optimization for this specific case, though. The default file being the user dictionary is interesting. Do you think they didn't know about grep in 1979?


The interesting use case for me is that you can make a sorted index and look up records efficiently, i.e. within a script. It works with ctags indexes, for instance.
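e.g. ctags writes a sorted tags file, so a script can pull a symbol's entries directly (tag name hypothetical):

    look 'parse_args' tags | cut -f1,2   # tag name and the file that defines it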


Binary search is only faster if access to the elements is constant-time, which is not true for a file with varying-length records (lines). Line 1000 (counting from 0) could be at offset 2000 if each line is a single character + newline, or it could be at offset 2000000 if each line is 999 characters and a newline, or somewhere in between.


You don't need to know the number of the line you're looking at. You jump to the middle of the file, and you compare the nearest line to the search pattern. Then you recurse on the top or bottom half.
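A rough bash sketch of that idea (an illustration of the strategy, not how look is actually implemented; assumes GNU stat/tail and a file sorted under LC_ALL=C):

    # print every line of sorted file $2 that starts with prefix $1
    bisect_look() {
      local key=$1 file=$2 lo=0 hi mid line
      hi=$(stat -c %s "$file")    # total size in bytes (GNU stat)
      while (( hi - lo > 1 )); do
        mid=$(( (lo + hi) / 2 ))
        # skip the (possibly partial) line at the offset, read the next full one
        line=$(tail -c +"$((mid + 1))" "$file" | sed -n '2{p;q}')
        if [[ -n $line && $line < $key ]]; then lo=$mid; else hi=$mid; fi
      done
      # scan forward from the converged offset; quit once past the prefix block
      if (( lo == 0 )); then cat "$file"
      else tail -c +"$((lo + 1))" "$file" | tail -n +2
      fi | awk -v k="$key" 'index($0, k) == 1 { print; next } $0 > k { exit }'
    }

Each probe costs one seek plus one short read, so a gigabyte file takes ~30 probes instead of a full scan.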


That still requires random seeking; it's not so bad on SSDs, but abysmal on rotating disks, and most of the filesystem and OS caching code is optimised for linear forward reads, so things like prefetching are not going to help at all.


At some point the cost of scanning a long file will overcome the cost of doing log(n) seeks. On short files, it really doesn't matter what you do, searching will be fast regardless.


It's probably much faster in most cases of text files though.

I also think your constant-time claim sounds too strong. You can absorb a lot of end-of-line scanning after the binary search and still beat a linear/regex search.

As someone else pointed out, look can also exit sooner.


According to Wikipedia, grep was first used in "Version 4 AT&T UNIX". According to my 1984 Eunice manual, look was part of the 7th edition of Unix.



