Hacker News new | past | comments | ask | show | jobs | submit login

Yup, that's exactly right. '\A' or '(?-m)^' should work, but don't, because of an incorrectly applied optimization.

The bug is fixed on master. Thanks for calling this to my attention! https://github.com/BurntSushi/ripgrep/issues/1878




Hmm. Thanks for fixing this, but, two things about the tests

1. These seem like they're effectively integration tests. They check the entire ripgrep command line app works as intended. Is this because the bug was not where it looks like it is, in the regex crate, but elsewhere? If not, it seems like they'd be better as unit tests closer to where the bugs they're likely to detect would lie?

2. While repeating yourself exactly once isn't automatically a bad sign, it smells suspicious. It seems like there would be a lot of tests that ought to behave exactly the same with or without --mmap and so maybe that's a pattern worth extracting.


You're right, I probably should have added a unit test. I added one in a branch of ongoing work. I don't always add unit tests, but in this case, it was pretty easy to. (FWIW, this is in the grep-regex crate, not the regex crate. The grep-regex crate is a glue layer between regex and ripgrep that adds in a bunch of line-oriented optimizations.)

> While repeating yourself exactly once isn't automatically a bad sign, it smells suspicious. It seems like there would be a lot of tests that ought to behave exactly the same with or without --mmap and so maybe that's a pattern worth extracting.

Quite possibly. Not a bad idea. I already do that in the tests for PCRE2. Most tests are run with both the default regex engine and with PCRE2. (There are some tests that have intended behavioral differences with mmap enabled, but those can be handled on a case-by-case basis.)


> That's not correct because the `m` flag gets enabled by the multiline option.

Is this documented? RG(1) only says

            -m, --max-count <NUM>
            Limit the number of matching lines per files searched to NUM.
Is this a different option or is there an implied <NUM> that prevents 'rg -U ^' from searching from the beginning of the file?


In this context the "m flag" refers to a flag inside the regex syntax. That is, when you use ripgrep's regex library as a standalone (as Rust programmers do), then '^' only matches at the beginning of text, where as '(?m)^' enables multi-line mode and thus permits to match at either the beginning of text or at the beginning of a line.

ripgrep also has a -U/--multiline flag, but it's orthogonal to the regex mode called multiline. It's an unfortunate naming clash, but they are names in otherwise distinct namespaces.

ripgrep always enables the regex flag 'm', regardless of whether -U/--multiline is enabled or not.


Thank you for taking the time to explain!




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: