Python isn't really something I would even think as possible example, Common Lis...

burntsushi · on March 13, 2021

So? I said, "higher level language." I didn't say, "Python specifically."

I would guess D could do it.

I don't know enough about Nim or Swift.

I would learn something if Common Lisp did it. I'd also learn something if Haskell or Go did it.

cb321 · on March 13, 2021

I am not trying to contradict anyone here, but any language mature enough to have an impl/way to not have arbitrary performance ceilings needs access to inline assembly/SIMD. Cython/Nim/SBCL can all do that..probably Haskell..Not so sure about Go or Swift. Anyway, many languages can respond well to optimization effort. I doubt anyone disagrees.

At the point of realizing the above no ceiling bit, the argument devolves to more one about (fairly subjective) high/low levelness of the code itself/the effort applied to optimizing, not about the language the code is written in. So, it's not very informative and tends to go nowhere (EDIT: especially when the focus is on a single, highly optimized tool like `rg` as opposed to "broad demographic traits" of pools of developers, and "levelness" is often somewhat subjective, too).

burntsushi · on March 13, 2021

You're missing the context I think. Look at what I was responding to in my initial message in this thread:

> If you have an architecture where you can afford real parallelism you can afford higher level languages anyway.

My response is, "no you can't, and here's an example."

> but any language mature enough to have an impl/way to not have arbitrary performance ceilings needs access to inline assembly/SIMD

If you ported ripgrep to Python and the vast majority of it was in C or Assembly, then I would say, "that's consistent with my claim: your port isn't in Python."

My claim is likely more subtle than you might imagine. ripgrep has many performance sensitive areas. It isn't enough to, say, implement the regex engine in C and write some glue code around that. It won't be good enough. (Or at least, that's my claim. If I'm proven wrong, then as I said, I'd learn something.)

> At the point of realizing the above no ceiling bit, the argument devolves to more one about (fairly subjective) high/low levelness of the code itself/the effort applied to optimizing, not about the language the code is written in. So, it's not very informative and tends to go nowhere.

I agree that it's pretty subjective and wishy washy. But when someone goes around talking nonsense like "if parallelism is a benefit then you're fine with a higher level language," you kind of have to work with what you got. A good counter example to that nonsense is to show a program that is written is a "lower" level language that simultaneously benefits from parallelism and wouldn't be appropriate to do in a higher level language. I happen to have one of those in my back-pocket. :-) (xsv is another example. Compare it with csvkit, even though csvkit's CSV parser is written in C, it's still dog slow, because the code around the CSV parser matters.)

cb321 · on March 13, 2021

Ok. "Afford parallelism => afford high level" with the implication of HL=slow does sound pretty off base. So, fair enough.

FWIW, as per your subtle claim, it all seems pretty hot spot optimizable to me, at least if you include the memchr/utf8-regex engine in "hot spot". I do think the entire framing has much measurement vagueness ("hot", "vast majority", "levelness", and others) & is unlikely to be helpful, as explained. In terms of "evidence", I do not know of a competitor who has put the care into such a tool to even try to measure, though. { And I love rg. Many thanks and no offense at all was intended! }

burntsushi · on March 13, 2021

ack might be an example. It's Perl, not Python, and its author is on record as saying that performance isn't his goal. So it's a bit of a strained one. But yes, it's true, I don't know any other serious grep clone in a language like Python. This is why I hedged everything initially by saying that I know that absence of evidence isn't evidence of absence. :-) And in particular, I framed this as, "I would learn something," rather than, "this is objective fact." So long as my standard is my own experience, the hand wavy aspect of this works a bit better IMO.

> I do not know of a competitor who has put the care into such a tool to even try to measure, though.

Right. Like for example, I am certain enough about my claim that I would never even attempt to do it in the first place. I would guess that others think the same. With that said, people have written grep's in Python and the like, and last time I checked, they were very slow. But yeah, the "development effort" angle of this likely makes such tools inappropriate for a serious comparison to support my claim. But then again, if I'm right, the development effort required to make a Python grep be as fast as ripgrep is insurmountable.

> it all seems pretty hot spot optimizable to me

As long as we're okay with being hand wavy, then I would say that it's unlikely. Many of the optimizations in ripgrep have to do with amortizing allocation, and that kind of optimization is just nearly completely absent in a language like Python unless you drop down into C. This amortization principle is pervasive and applies as deep as regex internals to the code the simply prints ripgrep's output (which is in and of itself a complex beast and quite performance sensitive in workloads with lots of matches), and oodles of stuff inbetween.

> { And I love rg. Many thanks and no offense at all was intended! }

:-) No offense taken. This is by far the best convo I'm having in this HN post. Lol.

Note that I've made similar claims before. In the last one, there is a lot more data: https://news.ycombinator.com/item?id=17943509

cb321 · on March 13, 2021

When I used to write in Cython + NumPy I would pre-allocate numpy arrays written into by Cython. It's C-like, but because of the gradual typing I think firmly in the higher level (for some value of "er"). One can certainly do that stuff in Nim/SBCL/etc. (and one sees it done).

While allocation is pretty pervasive, I'm skeptical that everywhere or even most places you do it is an important perf bottleneck. Without a count of these 20 times it matters and these 40 it doesn't, it's just kind guesswork from an all too often frail human memory/attention that "ignores the noise" by its very nature. You might be right. Just trying to add some color. :-)

Another way to think of this is to imagine your own codebase "in reverse". "If I drop this optim, would I see it on that profile?" Or look at the biggest piles of code in your repo and ask "Is this in the critical path/really perf necessary?" and the like. Under the assumption that higher level things would be a lot shorter that kind of thought experiment can inform. Maybe an approach toward more objectivity, anyway. Little strewn about tidbits in every module don't really count { to me :-) } - that speaks more to abstraction problems.

But I don't think there is a lot of value in all the above gendankenizing. While I realize some bad "just throw money at it" kicked this off, one of my big objections to the entire framing is that I think people and their APIs really "alter the level" of a language. Indeed their experience with the language has big impact there. Every one reading this knows C's `printf(fmt, arg1, arg2,..)`. Yet, I'd bet under 1% have heard of/thought to do an allocating (or preallocated) string builder variant like `str(sub1, sub2, ..., NULL)` or using/acquiring something like glibc's `asprintf`. People will say "C is low level - It has no string concatenation!". Yet, inside of an hour or two most medium-skill devs could write my above variadic string builder or learn about vasprintf. Or learn about Boehm-Wiser for garbage collected C or 100 other things like that CTL I mentioned elsewhere in this thread.

So what "level" is C, the language? Beats me. Does it have concatenation? Well, not spelled "a+b" but maybe spelled not much worse "str(a,b,NULL)". Level all just depends so much on how you use it. Performance is similar. Much C++ (and Rust for that matter) is terribly inefficient because of reputations for being "fast languages" leading to less care (or maybe just being done by junior devs..). These "depends" carry over to almost anything..not just Rust or C, but sometimes even English. I am usually told I write in much too detailed a way and a trimmer way might have higher persuasion/communication performance! { How's that for "meta"? ;-) }

> This is by far the best convo I'm having in this HN post. Lol.

Cool, cool. There can be a lot of "Rust Rage" out there (in both directions, probably). :)

Anyway, I don't think we'll resolve anything objective here, but don't take a lack of response as indicating anything other than that. You aren't making any strong objective claims to really rebut and I'm glad that you personally undertook the challenge to do ripgrep in any language. I do think many might have done..maybe Ada, too, and probably many more, but maybe all at the same "realized levelness". You just did not know them/feel confident about getting peformance in them. Which is fine. A.Ok, even! I guess your other biggy is Go and that might actually not have worked of all the alternatives bandied about by pjmlp and myself so far.

burntsushi · on March 13, 2021

> While allocation is pretty pervasive, I'm skeptical that everywhere or even most places you do it is an important perf bottleneck. Without a count of these 20 times it matters and these 40 it doesn't, it's just kind guesswork from an all too often frail human memory/attention that "ignores the noise" by its very nature. You might be right. Just trying to add some color. :-)

In general I agree. But I'm saying what I'm saying because of all the times I've had to change my code to amortize allocation rather than not do it. It's just pervasive because there are all sorts of little buffers everywhere in different parts of the code. And those were put there because of experimenting that said the program benefited from them.

The key here is that the loops inside of ripgrep can grow quite large pretty quickly. There's the obvious "loop over all files," and then there's "loop over all lines" and then "loop over all matches." ripgrep has to do work in each of those loops and sometimes the work requires allocation. Even allocations at the outermost loop (looping over all files) can cause noticeable degradations in speed for some workloads.

This is why I'm so certain.

The numpy example is a good one where a substantial amount of code has been written to cater to one very specific domain. And in that domain, it's true, you can write programs that are very fast.

> So what "level" is C, the language?

Oh I see, I don't think I realized you wanted to go in this direction. I think I would just say that I absolutely agree that describing languages as "levels" is problematic. There's lots of good counter examples and what not. For example, one could say that Rust is both high level and low level and still be correct.

But like, for example, I would say that "Python is high level" is correct and "Python is low level" is probably not. But they are exceptionally general statements and I'm sure counter-examples exist. They are, after all, inherently relativistic statements, so your baseline matters.

That's kind of why I've stayed in "hand wavy" territory here. If we wanted to come down to Earth, we could, for example, replace "high level languages" in the poster's original statement with something more precise but also more verbose that this discussion still largely fits.

> I am usually told I write in much too detailed a way and a trimmer way might have higher persuasion/communication performance! { How's that for "meta"? ;-) }

Yeah, it's hard to be both pithy and precise. So usually when one is pithy, it's good to take the charitable interpretation of it. But we are technical folks, and chiming in with clarifications is to be expected.

> I don't think we'll resolve anything objective here

Most definitely. At the end of the day, I have a prior about what's possible in certain languages, and if that prior is proven wrong, then invariably, my mental model gets updated. Some priors are stronger than others. :-)

> You aren't making any strong objective claims to really rebut

Right. Or rather, my claims are rooted in my own experience. If we were going to test this, we'd probably want to build a smaller model of ripgrep in Rust, then try implementing that in various languages and see how far we can get. The problem with that is that the model has to be complex enough to model some reasonable real world usage. As you remove features from ripgrep, so to do you remove the need for different kinds of optimizations. For example, if ripgrep didn't have replacements or didn't do anything other than memory map files, then that's two sources of alloc amortization that aren't needed. So ultimately, doing this test would be expensive. And that's ignoring the quibbling folks will ultimately have about whether or not it's fair.

> I guess your other biggy is Go and that might actually not have worked of all the alternatives bandied about by pjmlp and myself so far.

I would guess Go would have a much better shot than Python. But even Go might be tricky. Someone tried to write a source code line counter in Go, put quite a bit of effort into it, and couldn't get over the GC hurdle: https://boyter.org/posts/sloc-cloc-code/ (subsequent blog posts on the topic discuss GC as well).

cb321 · on March 14, 2021

I feel we've talked past each other about what is/is not Python a few times. There is Cython and Pythran and Pypy and ShedSkin and Numba and others that are targeting, for lack of a more precise term, "extreme compatibility with" CPython, but also trying to provide an escape hatch for performance which includes in-language low levelness including allocation tricks that are not "mainstream CPython" (well, Pypy may not have those...).

My first reply was questioning "what counts" as "Python". Cython is its own language, not just "C", nor just "Python", but can do "low level things" such as using C's alloca. Maybe the only prior update here is on the diversity of "Python" impls. There are a lot. This is another reason why language levelness is hard to pin down which was always my main point, upon which we do not disagree. Maybe this is what you meant by "exceptionally general", but I kinda feel like "there isn't just one 'Python'" got lost. { There used to be a joke.."Linux isn't" related to the variety of distros/default configs/etc. :-) }

Advice-wise, I would say that your claim can be closer to easily true if you adjust it to say "ripgrep needs 'low level tricks' to be fast and a language that allows them, such as Rust". That phrasing side-steps worrying about levelnesses in the large of programming languages, re-assigns it to techniques which is more concrete and begs the question of technique enumeration. That is the right question to beg, though, if not in this conversation then in others. You might learn how each and every technique has representation in various other programming languages. It's late for me, though. So, good night!

burntsushi · on March 14, 2021

Ah I see. You are right. I missed that you were going after that. I'm personally only really familiar with CPython, so that is indeed what I had in mind. To be honest, I don't really know what a ripgrep in Cython would look like. Is there a separate Cython standard library, for example? Or do you still use Python's main standard library?

We don't have to tumble down that rabbit hole though. If someone wrote a ripgrep in Cython and matched performance, then I would definitely learn something.

> "ripgrep needs 'low level tricks' to be fast and a language that allows them, such as Rust"

I might use that, sure. I think my point above was that I had to riff off of someone else's language. But I think we covered that. :-) In any case, yes, that phrasing sounds better.

Anyway, good chatting with you, good night!

petdance · on March 15, 2021

> I do not know of a competitor who has put the care into such a tool to even try to measure, though.

As an aside, I'm the author of ack, and I would ask that folks not use the word "competitor" to describe different projects in the same space. Speaking for me/ack and burntsushi/ripgrep, there is absolutely no competition between us. We have different projects that do similar things, and neither of us is trying to best the other. We are each trying to make the best project we can for the needs we are looking to fill. ripgrep won't replace ack, ack won't replace ripgrep, and neither one will replace plain ol' grep. Each has its place.

I believe this so strongly that I created a [feature comparison chart](https://beyondgrep.com/feature-comparison/) comparing various greplike tools, and a related blog post I wrote on this: [The best open source project for someone might not be yours, and that's OK](https://blog.petdance.com/2018/01/02/the-best-open-source-pr...)

eru · on March 14, 2021

I suspect Don Stewart might be able to do it in Haskell.

That's basically by knowing enough about GHC to carefully trigger all the relevant optimizations.

burntsushi · on March 14, 2021

Hah. I'm highly skeptical. But I suppose if anyone could do it, it'd be him. I would certainly learn something. :-)

I've tried optimizing Haskell code myself before. It did not go well. It was an implementation of the Viterbi algorithm actually. We ported it to Standard ML and C and measured performance. mlton did quite well at least.

We published a paper about the process of writing Viterbi in Haskell in ICFP a few years back: https://dl.acm.org/doi/pdf/10.1145/2364527.2364560?casa_toke...

Unfortunately, the performance aspect of it was only a small part, and we didn't talk about the C or mlton ports in the paper.

eru · on March 15, 2021

Very interesting!

I suspect you could make a very Haskell-like language that's also really fast, but you'd have to base it on linear types from the ground up, and make everything total by default. (Hide non-total parts behind some type 'tag' like we do with IO in current Haskell (and have something like unsafePerformPartial when you know your code is total, but can't convince the compiler).)

That way the compiler can be much more aggressive about making things strict.

cb321 · on March 13, 2021

Cython with all the appropriate cdef type declarations can match C and so might also do it. Not sure Cython exactly counts as "Python"..it's more a superset/dialect { and I also doubt such a port would hold many lessons for @burntsushi, but it bore noting. }