Hacker News new | past | comments | ask | show | jobs | submit login

I created a 1.1G file with dictionary words and timed...

  % du -h a  
  1.1G	a
  % time look 'dog' a | wc -l
  53856

  real	0m0.021s
  user	0m0.020s
  sys	0m0.003s
  % time grep '^dog' a | wc -l
  53856

  real	0m28.593s
  user	0m0.977s
  sys	0m2.223s
Ok grep performed worse than what I expected (sort took a long time though).



'look' can stop working when it finishes the sorted section of 'dog', whereas grep must continue to the end of the file.

Ignoring memory hierarchy, naturally 'look' would be O(log n) while grep would be O(n).

Special input (sorted in this case) of course often calls for matching algorithms, when speed matters.


That seem abysmally slow for grep. On my machine, with a 1.1G file made of 1200 copies of /usr/share/dict/words, GNU grep 2.16 takes about 1.5 seconds. Try with LC_ALL=C?


Is grep optimized to be fast, though? There seems to be a lot of, "better at grep than grep" tools out there,

http://beyondgrep.com/


There's some real effort that was put into it: https://lists.freebsd.org/pipermail/freebsd-current/2010-Aug.... I don't know how it compares to all the alternatives.


Yeah there's a lot of that, but grep is like using duct tape to hold something together when you really should be using screws or so. grep can do everything, but its not always the best tool for the job.


The "ack" you reference is apparently faster only in that it's easier to specify exactly which files to search or not to search, which obviously can also be done with grep, e.g. combined with 'find'.

As for grep itself, it has had many incarnations over the decades with various pluses and minuses.

The current GNU grep allows 3 kinds of regex, basic/extended/perl -- the latter being what ack supports.

Note that Perl regexes have extensions beyond regular languages that are inherently slower than the automata specified by basic regexes. Power versus speed.

E.g. grep(1): "Back-references are very slow, and may require exponential time."

For further info:

> why GNU grep is fast > Mike Haertel mike at ducky.net > Sat Aug 21 03:00:30 UTC 2010

> Here's a blog post from 2006 about a developer trying to "beat grep" and looking at the algorithms it uses; it goes into a little more detail about the "doesn't need to do the loop exit test at every step" optimization mentioned in this email.

http://ridiculousfish.com/blog/posts/old-age-and-treachery.h...

via

https://lists.freebsd.org/pipermail/freebsd-current/2010-Aug...

The best writeup is surely by the inimitable Russ Cox, who really really explains clearly when grep as of 2007 was one of the only fast regex implementations:

Regular Expression Matching Can Be Simple And Fast [#1] (but is slow in Java, Perl, PHP, Python, Ruby, ...)

Russ Cox 2007 jan

http://swtch.com/~rsc/regexp/regexp1.html

(This is a 4 part series but IIRC part 1 has the highlights)

I'm sure that various other tools have been strongly influenced by this famous essay, and so many more things may be as fast as grep by now, but still...

P.S. one of the other high profile "ack"-like search tools would be "ag", aka "The Silver Surfer".

"The Silver Searcher is a 3-5x faster drop in replacement for ack (which itself is better than grep)."

2013

https://www.reddit.com/r/programming/comments/16bvah/the_sil...


Seems that most time for grep was spent waiting on disk. This underlines the reason look may be significantly faster: actually reading the bad lines may take time.




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: