This article is disingenuous with its Vec benchmark. Each call to `validate` creates a new Vec, which means you allocate + free the vec for each validation. Why not store the vec on the validator to reuse the allocation? And why not mention this in the article? I had to dig in the git history to find out whether the vec was getting reallocated. This feels like you had a cool conclusion for your article, 'linked lists faster than vec', but you had to engineer the vec example to be worse. Maybe I'm being cynical.
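
For what it's worth, what I mean is nothing fancy; a minimal sketch, with a hypothetical validator struct that isn't from the article:

    // Hypothetical sketch: keep the scratch Vec on the validator so its
    // allocation is reused across calls instead of being freed every time.
    struct Validator {
        path: Vec<String>,
    }

    impl Validator {
        fn validate(&mut self, key: &str) -> bool {
            self.path.clear();                   // keeps the capacity from previous calls
            self.check(key)
        }

        fn check(&mut self, segment: &str) -> bool {
            self.path.push(segment.to_string()); // grows like a stack while descending
            let ok = !segment.is_empty();        // stand-in for the real validation
            self.path.pop();                     // shrinks on the way back out
            ok
        }
    }

    fn main() {
        let mut v = Validator { path: Vec::new() };
        assert!(v.validate("inner"));
        assert!(v.validate("another"));          // second call reuses the same buffer
    }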

It would be interesting to see the performance of a `Vec<&str>` where you reuse the vector, but also a `Vec<u8>` where you copy the path bytes directly into the vector and don't bother doing any pointer traversals. The example path sections are all very small - 'inner', 'another', 5 bytes, 7 bytes - less than the length of a pointer! Storing a whole `&str` is 16 bytes per element, and then you have to rebuild it again anyway in the invalid case.
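
Roughly what I have in mind for the byte-buffer variant (again a hypothetical sketch, not code from the article):

    // Hypothetical sketch: copy the path bytes into a single Vec<u8> and
    // truncate on the way out, instead of storing 16-byte &str entries.
    struct PathBuffer {
        bytes: Vec<u8>,
    }

    impl PathBuffer {
        fn push_segment(&mut self, segment: &str) -> usize {
            let mark = self.bytes.len();         // remember where this segment starts
            self.bytes.push(b'/');
            self.bytes.extend_from_slice(segment.as_bytes());
            mark
        }

        fn pop_segment(&mut self, mark: usize) {
            self.bytes.truncate(mark);           // O(1), no per-segment allocation
        }

        fn as_str(&self) -> &str {
            // Only needed in the invalid case, when the path gets reported.
            std::str::from_utf8(&self.bytes).unwrap()
        }
    }

    fn main() {
        let mut path = PathBuffer { bytes: Vec::new() };
        let a = path.push_segment("inner");
        let b = path.push_segment("another");
        assert_eq!(path.as_str(), "/inner/another");
        path.pop_segment(b);
        assert_eq!(path.as_str(), "/inner");
        path.pop_segment(a);
        assert_eq!(path.as_str(), "");
    }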

---

This whole article is kinda bad, it's titled 'blazingly fast linked lists' which gives it some authority but the approach is all wrong. Man, be responsible if you're choosing titles like this. Someone's going to read this and assume it's a reasonable approach, but the entire section with Vec is bonkers.

Why are we designing 'blazingly fast' algorithms with rust primitives rather than thinking about where the data needs to go first? Why are we even considering vector clones or other crazy stuff? The thought process behind the naive approach and step 1 is insane to me:

1. I need to track some data that will grow and shrink like a stack, so my solution is to copy around an immutable Vec (???); see the sketch after this list

2. This is really slow for obvious reasons, how about we: pull in a whole new dependency ('imbl') that attempts to optimize for the general case using complex trees (???????????????)
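
To make item 1 concrete, this is roughly the pattern I'm objecting to (a hypothetical sketch, not the article's actual code): every level of nesting copies the entire path just to append one segment.

    // Hypothetical sketch of the clone-per-level pattern: each nested call
    // allocates a brand-new Vec and copies every existing segment into it.
    fn validate(path: &[String], key: &str) -> bool {
        let mut child_path = path.to_vec();      // full copy of the path so far
        child_path.push(key.to_string());
        // ... recurse into nested objects with `child_path` ...
        !child_path.is_empty()                   // stand-in for the real checks
    }

    fn main() {
        let root: Vec<String> = Vec::new();
        assert!(validate(&root, "inner"));       // every call pays an allocation
    }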

You also mention:

> In some scenarios, where modifications occur way less often than clones, you can consider using Arc as explained in this video

I understand you're trying to be complete, but 'some scenarios' is doing a lot of work here. An Arc<[T]> approach is literally just the same as the naive approach but with extra atomic refcounts! Why mention it in this context?

You finally get around to mutating the vector + using it like a stack, but then comment:

> However, this approach requires more bookkeeping and somewhat more lifetime annotations, which can increase code complexity.

I have no idea why you mention 'code complexity' here (complexity introduced by rust and its lifetimes), but fail to mention how adding a dependency on 'imbl' is a negative.




> how about we: pull in a whole new dependency ('imbl') that attempts to optimize for the general case using complex trees (???????????????)

To me, this is a self-answering question.


Yes, when I saw that the immutable path gets cloned with a key appended, I knew the benchmark was a strawman.


> This article is disingenuous with its Vec benchmark. Each call to `validate` creates a new Vec, which means you allocate + free the vec for each validation. Why not store the vec on the validator to reuse the allocation? And why not mention this in the article? I had to dig in the git history to find out whether the vec was getting reallocated.

The idea goes back to [0], which is similar to one of the steps in the article; before adding `push` & `pop` I just cloned it to make things work. That's what Rust beginners do.

> This feels like you had a cool conclusion for your article, 'linked lists faster than vec', but you had to engineer the vec example to be worse. Maybe I'm being cynical.

From today's point of view, maybe I'd think the same.

> It would be interesting to see the performance of a `Vec<&str>` where you reuse the vector, but also a `Vec<u8>` where you copy the path bytes directly into the vector and don't bother doing any pointer traversals. The example path sections are all very small - 'inner', 'another', 5 bytes, 7 bytes - less than the length of a pointer! Storing a whole `&str` is 16 bytes per element, and then you have to rebuild it again anyway in the invalid case.

Yeah, that makes sense to try!

> This whole article is kinda bad, it's titled 'blazingly fast linked lists' which gives it some authority but the approach is all wrong. Man, be responsible if you're choosing titles like this. Someone's going to read this and assume it's a reasonable approach, but the entire section with Vec is bonkers.

> Why are we designing 'blazingly fast' algorithms with rust primitives rather than thinking about where the data needs to go first? Why are we even considering vector clones or other crazy stuff? The thought process behind the naive approach and step 1 is insane to me:

> 1. I need to track some data that will grow and shrink like a stack, so my solution is to copy around an immutable Vec (???)

> 2. This is really slow for obvious reasons, how about we: pull in a whole new dependency ('imbl') that attempts to optimize for the general case using complex trees (???????????????)

That's clickbait-y, though none of the article's ideas claim to be a silver bullet. I mean, there are admittedly dumb ideas in the article, but I don't believe that somebody would come up with a reasonable solution without trying something stupid first. However, I could have used better wording to highlight that, and mentioned that I came up with some of these ideas when I was working on `jsonschema` in the past.

> I understand you're trying to be complete, but 'some scenarios' is doing a lot of work here. An Arc<[T]> approach is literally just the same as the naive approach but with extra atomic refcounts! Why mention it in this context?

If you don't need to mutate the data but need to store it in some other struct, it might be useful, i.e. just to have cheap clones. But dang, that is indeed a whole different story.
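
A small sketch of the cheap-clone case I mean (hypothetical, not from the article): cloning an `Arc<[T]>` only bumps a refcount, while any "modification" means rebuilding the whole slice, which is exactly your point.

    use std::sync::Arc;

    fn main() {
        // Build the path once; the slice behind the Arc is immutable.
        let path: Arc<[String]> = vec!["inner".to_string(), "another".to_string()].into();

        // Cheap: cloning is a pointer copy plus an atomic refcount increment,
        // handy when the path has to be stored in several structs.
        let shared = Arc::clone(&path);

        // Expensive: "appending" means copying every element into a new slice,
        // i.e. the naive clone-the-Vec approach plus the refcount overhead.
        let extended: Arc<[String]> = path
            .iter()
            .cloned()
            .chain(std::iter::once("more".to_string()))
            .collect();

        assert_eq!(shared.len(), 2);
        assert_eq!(extended.len(), 3);
    }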

> I have no idea why you mention 'code complexity' here (complexity introduced by rust and its lifetimes), but fail to mention how adding a dependency on 'imbl' is a negative.

Fair. Adding `imbl` wasn't a really good idea for this context at all.

Overall I think what you say is kind of fair, but our perspectives on the goals of the article seem quite different (which does not invalidate the criticism).

Thank you for taking the time to write all of this!

- [0] - https://github.com/Stranger6667/jsonschema-rs/commit/1a1c6c3...



