
A faster string search would be nice. I used to (and still sometimes do) use less to analyze large trace files. With a few hundred MB, the searching becomes a real bottleneck.

Couldn't be that hard to do Boyer-Moore for non-regex substrings.
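For what it's worth, GNU grep already takes this approach: literal patterns get a Boyer-Moore-style search, which is a big part of why it's fast. A quick way to see fixed-string matching from the shell (the file path here is just illustrative):

```shell
# -F (--fixed-strings) forces literal matching, so regex
# metacharacters like "." lose their special meaning:
printf 'a.b\naxb\n' > /tmp/bm.txt
grep -F 'a.b' /tmp/bm.txt    # matches only the literal "a.b" line
grep -c 'a.b' /tmp/bm.txt    # as a regex, "." matches both lines: prints 2
```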

One of the lesser-known less features is filtering to only matching lines using &pattern. This is also very cool in combination with F, i.e. tail -f mode. Unfortunately it tends to be extremely slow on large files, even though grep seemingly has no problem with them. I suspect it's related to search performance.
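For anyone who hasn't tried it: inside less you type & followed by a pattern, and only matching lines are shown; & with an empty pattern clears the filter. A minimal non-interactive equivalent of that filtered view (log file and pattern made up for illustration):

```shell
# Inside less: "&timeout" shows only matching lines; "&" alone clears it.
# The batch equivalent is a plain grep:
printf 'ok\nERROR: timeout\nok\n' > /tmp/app.log
grep 'timeout' /tmp/app.log    # prints: ERROR: timeout
```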

Overall I think less is one of those tools where it's really valuable to spend 10 minutes a day in the man page for a week, which should be enough to learn essentially all of its functionality.

Markers are also very useful, particularly paired with the functionality to pipe data to another file or shell command. E.g. to extract the instance of a server error plus some lines of context from an otherwise unwieldy log file. :) I use markers rarely enough that I invariably need to reread the man-/help-page, but being aware of the functionality is half the battle. :)

Another tip: within less, type -S to toggle line wrapping. (Works for most other command-line options, too.)


> Markers are also very useful, particularly paired with the functionality to pipe data to another file or shell command. E.g. to extract the instance of a server error plus some lines of context from an otherwise unwieldy log file. :) I use markers rarely enough that I invariably need to reread the man-/help-page, but being aware of the functionality is half the battle. :)

Isn't that just grep -C?


Sort of an interactive version of grep -C.
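For reference, a minimal sketch of the non-interactive version (log contents invented for illustration):

```shell
# grep -C N prints N lines of context before and after each match:
printf 'setup\nconnect\nERROR: timeout\nretry\ncleanup\n' > /tmp/srv.log
grep -C 1 'ERROR' /tmp/srv.log    # prints: connect, ERROR: timeout, retry
```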


Not for string search, but I got fed up with the extremely long time it takes less to precalculate line counts on large files when I was working with log files on the order of GBs. The result is here: https://github.com/nhaehnle/less

Since it was only a small hack to scratch the itch I was having at the time, I never really completed that project. For example, backwards line counting is not sped up, which can sometimes be noticeable.

If you feel like working on less-speedup issues, feel free to drop me a line.


"... I got fed up with the extremely long time it takes less to precalculate line counts on large files..."

When you were experiencing long wait times, did you turn off line numbering?

  less -n logfile


Sure, but that's not an option if you actually want the line numbers. Given that it was possible to speed up the line number calculation by more than an order of magnitude, I do believe that fixing the code was the right way to go :)


You can also turn off line numbering _during_ the long "calculating line numbers" phase. Hit Ctrl-C and it says "line numbering off" or somesuch.


I wholeheartedly agree, and I'm honestly surprised nobody has done this already. I often find that grep is ridiculously fast compared to less. It seems like a huge shame that a tool used by so many people on such a regular basis is so slow at such a simple, commonplace task.

Honestly, I'd take a stab at it myself if I had the time. Maybe I should start a Kickstarter or something like that.


Would using ag help? No longer maintained AFAIK, but fast for me.


Where did you get that idea? The last commit was 7 days ago, and there's no mention of abandonment.


From the status of the Ubuntu ppa, and (I thought) comments on his site. Thanks for pointing out that's not the case.


As far as I know, it still holds that ag isn't fast, ack is slow ;-) That is to say: grep is pretty fast (too). In other words, ack improved the user interface and API for search across files, with an eye towards programming and editors, but was relatively slow -- ag aims to keep the improved UI/API/output while bringing speed back up to GNU grep-like levels.


I think if your grep has the -s flag (the GNU 'treat as string' option) you're correct. If not, it may be faster :)


You might be thinking of -F/--fixed-strings. -s is silent (long option --no-messages). For GNU grep 2.12, anyway. Or you might be thinking of BSD grep:

http://lists.freebsd.org/pipermail/freebsd-current/2010-Augu...
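To illustrate the point about -s (the path below is intentionally nonexistent): it only suppresses the error message, and the exit status still reports the failure.

```shell
# -s (--no-messages) hides errors about unreadable or missing files;
# the exit status (2 = error, vs 0 = match, 1 = no match) is unchanged:
grep -s 'pattern' /no/such/file || echo "exit status: $?"   # prints: exit status: 2
```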

edit: Eg, with warm cache:

    :~/tmp/riak/riak-2.0.0pre5/deps $ time (find . -type f -exec cat '{}' \; |wc -l)
    2765699

    real    0m5.021s
    user    0m0.144s
    sys     0m0.792s
    :~/tmp/riak/riak-2.0.0pre5/deps $ time (find . -type f -exec cat '{}' \; |grep -E 'Some  pattern' -v -c)
    2765700

    real    0m5.133s
    user    0m0.264s
    sys     0m0.852s
    :~/tmp/riak/riak-2.0.0pre5/deps $ time (find . -type f -exec cat '{}' \; |grep -E 'Some..pattern' -v -c)
    2765700

    real    0m5.144s
    user    0m0.400s
    sys     0m0.768s

    # "%% " used for leading comment lines in some of this code:
    :~/tmp/riak/riak-2.0.0pre5/deps $ time (find . -type f -exec cat '{}' \; |grep -E '^%% ' -c)
    27535

    real    0m5.597s
    user    0m0.520s
    sys     0m0.788s

    :~/tmp/riak/riak-2.0.0pre5/deps $ du -hcs .
    405M    .
    405M    total

    :~/tmp/riak/riak-2.0.0pre5/deps $ time (find . -type f -exec cat '{}' \; |ag '^%% ' >/dev/null)

    real    0m5.735s
    user    0m1.480s
    sys     0m0.876s
   
    # actually find/cat is pretty slow -- I guess both GNU grep and ag
    # use mmap to good effect:

    $ time rgrep '^%% ' . > /dev/null

    real    0m0.539s
    user    0m0.404s
    sys     0m0.128s
    
    :~/tmp/riak/riak-2.0.0pre5/deps $ time ag '^%% ' . |wc -l
    27500

    real    0m0.252s
    user    0m0.284s
    sys     0m0.068s

    :~/tmp/riak/riak-2.0.0pre5/deps $ time rgrep -E '^%% ' . |wc -l
    27553

    real    0m0.535s
    user    0m0.396s
    sys     0m0.140s

Note that grep clearly goes looking in more files here (more matching lines). Still, I guess ag is indeed faster than grep in some cases (even if it might not be apples to apples depending how you count -- of course the whole point of ag is to help search just the right files).

    :~/tmp/riak/riak-2.0.0pre5/deps $ time rgrep -E 'Some  pattern' . |wc -l
    0

    real    0m0.266s
    user    0m0.128s
    sys     0m0.132s
    :~/tmp/riak/riak-2.0.0pre5/deps $ time rgrep -E 'Some..pattern' . |wc -l
    0

    real    0m0.338s
    user    0m0.212s
    sys     0m0.120s
    :~/tmp/riak/riak-2.0.0pre5/deps $ time ag 'Some..pattern' . |wc -l
    0

    real    0m0.111s
    user    0m0.100s
    sys     0m0.076s
I guess ag is indeed faster, even if it might not be due to fixed string search...

[edit2: For those wondering that's an (old) ssd, on an old machine -- but with ~4G ram the working set should fit, as soon as some of my open tabs in ff are paged to disk...]


Thanks for benchmarking ag against grep. You're right that it's not exactly apples to apples. Ag doesn't search as many files, but it does parse and match against rules in .ag/.git/.hgignore. Also, ag prints line numbers by default, which can be an expensive operation on larger files.

I think most of the slowdown you're seeing with "find -exec cat" is forking at least two processes (ag and cat) for each file. Also, each process has to be run sequentially (to prevent garbled output), which makes use of only one CPU core most of the time. I've tried to keep ag's startup time fast so that classic find-style commands still run quickly. (This is why ag doesn't support a ~/.agrc or similar.)
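The per-file forking is visible in find itself: -exec ... \; runs the command once per file, while -exec ... + batches many filenames into one invocation, much like xargs. A small sketch (demo directory invented here):

```shell
# "\;" forks the command once per file; "+" passes many files at once:
mkdir -p /tmp/forkdemo
printf 'x\n' > /tmp/forkdemo/a
printf 'y\n' > /tmp/forkdemo/b
find /tmp/forkdemo -type f -exec cat '{}' + | wc -l   # 2 lines, one cat fork
```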

Just FYI, you can use ag --stats to see how many files/bytes were searched, how long it took, etc. I think I'll add some stats about obeying ignore rules, since some of those can be outright pathological in terms of runtime cost. In many cases, ag spends more time figuring out what to search than actually searching.


I tried to gauge the CPU usage (just looking at the percentage as listed in xmobar), but both grep and ag are too fast on the ~400 MB set of files for that to work... As I have two cores on this machine, the difference between ag and rgrep could indeed be ag's use of threads.

Many thanks for not just writing and sharing ag as free software, but for the nice articles describing the design and optimizations!

At least this brief benchmarking run convinced me that I should probably try to integrate ag in my work flow :-)


Quickly reviewing some of the posts on the ag blog/page[1], I'm guessing the speedup is mainly from a custom dir scanning algorithm and possibly from running two threads.

In the course of checking out ag (again) I also learned about GNU id-utils[2].

[1] http://geoff.greer.fm/ag/
[2] http://www.delorie.com/gnu/docs/id-utils/id-utils_1.html


This was very confusing until I realized HN had invisible code boxes with a fixed width.


There should be a scrollbar on the bottom (I kept the commands on one line, rather than splitting with "\"). Might not be on mobile, though? In other words, the code-boxes should have overflow:scroll or something to that effect.


I think Chrome on OSX hides scroll bars by default unless you're scrolling. Regardless, the box is tall enough that it doesn't fit in my viewport so I wouldn't see the bottom scrollbar anyway.

