A faster string search would be nice. I used to (and still sometimes do) use less to analyze large trace files. At a few hundred MB, searching becomes a real bottleneck.
Couldn't be that hard to do Boyer-Moore for non-regex substrings.
One of the lesser-known less features is filtering to show only matching lines using &pattern. This is also very cool in combination with F, i.e. tail -f mode. Unfortunately it tends to be extremely slow on large files, even though grep has no problem with them. I suspect it's related to search performance.
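A minimal sketch of that combination (the file path and pattern are just illustrations; everything after the first line is typed inside less):
$ less +F /var/log/app.log   # open and follow, like tail -f
# Ctrl-C stops following; then:
&ERROR    # show only lines matching ERROR
&         # an empty pattern clears the filter again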
Overall I think less is one of those tools where it's really valuable to spend 10 minutes a day in the man page for a week, which should be enough to learn essentially all of its functionality.
Markers are also very useful, particularly paired with the functionality to pipe data to another file or shell command, e.g. to extract an instance of a server error plus some lines of context from an otherwise unwieldy log file. I use markers rarely enough that I invariably need to reread the man/help page, but being aware of the functionality is half the battle. :)
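Roughly, per the man page's m and | commands (the mark letter, search pattern, and output path are arbitrary examples; all keystrokes typed inside less):
ma                        # set mark "a" at the top of the interesting region
/Internal Server Error    # move forward to the end of the region
|a cat > /tmp/excerpt.log # pipe the section between mark "a" and the
                          # current screen through a shell command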
Another tip: within less, type -S to toggle line wrapping. (This works for most other command-line options, too.)
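For instance (file name illustrative):
$ less -S huge.log   # start with long lines chopped instead of wrapped
# or toggle it at any time from inside less by typing:
-S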
Not for string search, but I got fed up with the extremely long time it takes less to precalculate line counts on large files when I was working with log files on the order of gigabytes. The result is here: https://github.com/nhaehnle/less
Since it was only a small hack to scratch the itch I was having at the time, I never really completed that project. For example, backwards line counting is not sped up, which can sometimes be noticeable.
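As a rough point of reference, a plain sequential newline count over the same data shows how fast the underlying scan can be (file name illustrative):
$ time wc -l big-trace.log   # a straight newline count; roughly the lower
                             # bound a fast line-numbering pass could aim for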
If you feel like working on less-speedup issues, feel free to drop me a line.
Sure, but that's not an option if you actually want the line numbers. Given that it was possible to speed up the line number calculation by more than an order of magnitude, I do believe that fixing the code was the right way to go :)
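(The workaround being dismissed here is presumably less's -n switch:)
$ less -n huge.log   # -n suppresses line numbers, skipping the expensive
                     # calculation -- at the cost of the numbers themselves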
I wholeheartedly agree, and I'm honestly surprised nobody has done this already. I often find that grep is ridiculously fast compared to less. It seems like a huge shame that a tool used by so many people on such a regular basis is so slow at such a simple, commonplace task.
Honestly, I'd take a stab at it myself if I had the time. Maybe I should start a Kickstarter or something like that.
As far as I know, it still holds that ag isn't fast so much as ack is slow ;-) That is to say: grep is pretty fast too. In other words, ack improved the user interface and API for searching across files, with an eye towards programming and editors, but was relatively slow -- ag aims to keep the improved UI/API/output while bringing speed back up to GNU grep-like levels.
You might be thinking of -F/--fixed-strings. -s is silent (long option --no-messages). For GNU grep 2.12, anyway. Or you might be thinking of BSD grep:
:~/tmp/riak/riak-2.0.0pre5/deps $ time (find . -type f -exec cat '{}' \; |wc -l)
2765699
real 0m5.021s
user 0m0.144s
sys 0m0.792s
:~/tmp/riak/riak-2.0.0pre5/deps $ time (find . -type f -exec cat '{}' \; |grep -E 'Some pattern' -v -c)
2765700
real 0m5.133s
user 0m0.264s
sys 0m0.852s
:~/tmp/riak/riak-2.0.0pre5/deps $ time (find . -type f -exec cat '{}' \; |grep -E 'Some..pattern' -v -c)
2765700
real 0m5.144s
user 0m0.400s
sys 0m0.768s
# "%% " used for leading comment lines in some of this code:
:~/tmp/riak/riak-2.0.0pre5/deps $ time (find . -type f -exec cat '{}' \; |grep -E '^%% ' -c)
27535
real 0m5.597s
user 0m0.520s
sys 0m0.788s
:~/tmp/riak/riak-2.0.0pre5/deps $ du -hcs .
405M .
405M total
:~/tmp/riak/riak-2.0.0pre5/deps $ time (find . -type f -exec cat '{}' \; |ag '^%% ' >/dev/null)
real 0m5.735s
user 0m1.480s
sys 0m0.876s
# actually find/cat is pretty slow -- I guess both GNU grep and ag
# use mmap to good effect:
$ time rgrep '^%% ' . > /dev/null
real 0m0.539s
user 0m0.404s
sys 0m0.128s
:~/tmp/riak/riak-2.0.0pre5/deps $ time ag '^%% ' . |wc -l
27500
real 0m0.252s
user 0m0.284s
sys 0m0.068s
:~/tmp/riak/riak-2.0.0pre5/deps $ time rgrep -E '^%% ' . |wc -l
27553
real 0m0.535s
user 0m0.396s
sys 0m0.140s
Note that grep clearly goes looking in more files here (more matching lines). Still, I guess ag is indeed faster than grep in some cases (even if it might not be apples to apples depending on how you count -- of course the whole point of ag is to help search just the right files).
:~/tmp/riak/riak-2.0.0pre5/deps $ time rgrep -E 'Some pattern' . |wc -l
0
real 0m0.266s
user 0m0.128s
sys 0m0.132s
:~/tmp/riak/riak-2.0.0pre5/deps $ time rgrep -E 'Some..pattern' . |wc -l
0
real 0m0.338s
user 0m0.212s
sys 0m0.120s
:~/tmp/riak/riak-2.0.0pre5/deps $ time ag 'Some..pattern' . |wc -l
0
real 0m0.111s
user 0m0.100s
sys 0m0.076s
I guess ag is indeed faster, even if it might not be due to fixed-string search...
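For what it's worth, the fixed-string question is easy to probe directly; same corpus, literal matcher vs. regex engine (timings omitted here since they vary by machine):
$ time grep -rF 'Some pattern' . | wc -l    # -F: literal fixed-string search
$ time grep -rE 'Some..pattern' . | wc -l   # -E: extended regex search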
[edit2: For those wondering, that's an (old) SSD in an old machine -- but with ~4 GB RAM the working set should fit, as soon as some of my open Firefox tabs are paged to disk...]
Thanks for benchmarking ag against grep. You're right that it's not exactly apples to apples. Ag doesn't search as many files, but it does parse and match against rules in .agignore/.gitignore/.hgignore files. Also, ag prints line numbers by default, which can be an expensive operation on larger files.
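(For anyone unfamiliar with those ignore files, they use gitignore-style patterns; contents here purely illustrative:)
$ cat .agignore
*.min.js
node_modules/
build/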
I think most of the slowdown you're seeing with the find -exec cat pipeline is forking a new process (cat) for each file. Also, each invocation has to run sequentially (to prevent garbled output), which keeps only one CPU core busy most of the time. I've tried to keep ag's startup time fast so that classic find-style commands still run quickly. (This is why ag doesn't support a ~/.agrc or similar.)
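One way to see the per-file fork overhead is find's + terminator, which batches many files into each cat invocation (same corpus as above):
$ time (find . -type f -exec cat '{}' \; | wc -l)   # one cat fork per file
$ time (find . -type f -exec cat '{}' + | wc -l)    # cat invoked in batches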
Just FYI, you can use ag --stats to see how many files/bytes were searched, how long it took, etc. I think I'll add some stats about obeying ignore rules, since some of those can be outright pathological in terms of runtime cost. In many cases, ag spends more time figuring out what to search than actually searching.
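For example:
$ ag --stats '^%% ' .   # appends a summary after the results: files and
                        # bytes searched, match counts, elapsed time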
I tried to gauge the CPU usage (just looking at the percentage as listed in xmobar) -- but both grep and ag are too fast on the ~400 MB set of files for that to work... As I have two cores on this machine, the difference between ag and rgrep could indeed be ag's use of threads.
Many thanks for not just writing and sharing ag as free software, but also for the nice articles describing the design and optimizations!
At least this brief benchmarking run convinced me that I should probably try to integrate ag into my workflow :-)
Quickly reviewing some of the posts on the ag blog/page[1], I'm guessing the speedup is mainly from a custom dir scanning algorithm and possibly from running two threads.
In the course of checking out ag (again) I also learned about GNU id-utils[2].
There should be a scrollbar at the bottom (I kept the commands on one line rather than splitting them with "\"). It might not show on mobile, though? In other words, the code boxes should have overflow:scroll or something to that effect.
I think Chrome on OS X hides scrollbars by default unless you're scrolling. Regardless, the box is tall enough that it doesn't fit in my viewport, so I wouldn't see the bottom scrollbar anyway.