Something I posted elsewhere:

All of these articles are frustrating because they use different environments and test sets, and none of the ones I've read have posted their test sets. Some people use random characters, some use existing files. Some use files of 1 MiB, some 100 MiB, some several GiB. On top of that, the people writing the replacements don't even normalize for differences in machine/processor capability by compiling both their competitor and GNU wc from scratch. The system wc is likely to be compiled differently depending on your machine, and the multithreaded implementations will perform differently depending on whether you're running Chrome while you test the app, etc.

This would easily be solved by booting the same distribution from a live USB, sharing test sets, and compiling everything from scratch with predefined options, but nobody seems to want to go to that much effort to get coherent comparisons.
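
To illustrate, a minimal shared harness along these lines could look something like the sketch below (in Python; the paths, binary names, and run counts are placeholders, not anyone's actual setup):

    #!/usr/bin/env python3
    """Minimal shared-harness sketch: checksum the test set so everyone
    provably benchmarks the same bytes, then time each candidate binary.
    Paths and binary names are hypothetical; the binaries are assumed
    to be pre-built locally with agreed-upon compiler flags."""
    import hashlib
    import subprocess
    import time

    TEST_SET = "data/big.txt"              # shared test input
    CANDIDATES = ["./wc-gnu", "./wc-new"]  # locally compiled binaries

    def sha256(path):
        """Hash the test set so results can be tied to exact input bytes."""
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    def bench(binary, runs=5):
        """Best-of-N wall-clock time; taking the minimum damps background
        noise (e.g. a browser running during the test)."""
        best = float("inf")
        for _ in range(runs):
            start = time.perf_counter()
            subprocess.run([binary, TEST_SET], check=True,
                           stdout=subprocess.DEVNULL)
            best = min(best, time.perf_counter() - start)
        return best

    if __name__ == "__main__":
        print("test set sha256:", sha256(TEST_SET))
        for binary in CANDIDATES:
            print(f"{binary}: {bench(binary):.3f} s")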




>none of the ones I've read have posted their test sets

I think the earliest(?) entry[1] in this "series" (the one done in Haskell) shared its test input alongside the source code, and that input has been reused in some of the others (often concatenated with itself several times, as the original isn't very big). Beyond it being ASCII-only, the contents matter little.

>nobody seems to want to go to that much effort to get coherent comparisons

I don't think anybody really wants coherent comparisons, because this isn't really a competition between the entries themselves.

[1]: https://github.com/ChrisPenner/wc (see data/big.txt)


I tested this on a fresh install of Fedora 31, so I didn't see much benefit to running it from a live USB. As I mentioned in the article, the wc implementation I used for comparison was compiled locally with gcc 9.2.1 and -O3 optimizations. I've also listed my exact system specifications there. I used the unprocessed enwik9 dataset (a Wikipedia dump), truncated to 100 MB and 1 GB.
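
In case it helps anyone reproduce this, the truncation step amounts to something like the following sketch (assuming decimal units, i.e. 100 MB = 10^8 bytes; the output file names are just placeholders):

    # Rough sketch of truncating enwik9 to fixed-size prefixes.
    # Assumes decimal units (100 MB = 10**8 bytes); output names
    # are made up. Reads the largest prefix (1 GB) into memory,
    # which is fine for a one-off preparation step.
    SIZES = {"enwik9-100mb": 10**8, "enwik9-1gb": 10**9}

    with open("enwik9", "rb") as src:
        data = src.read(max(SIZES.values()))
    for name, size in SIZES.items():
        with open(name, "wb") as dst:
            dst.write(data[:size])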

I understand your frustrations with the previous posts, but I've tried to make my article as unambiguous as possible. Do give it a read; if you have any further suggestions or comments, I'd be happy to hear them!


> This would easily be solved by booting the same distribution from a live USB, sharing test sets, and compiling everything from scratch with predefined options, but nobody seems to want to go to that much effort to get coherent comparisons.

Alternatively, you could have a sort of "shootout CI server" where people upload their compiled binaries as Docker images and the CI server runs them against a random subset of a set of (hidden) fixed test datasets, averaging the results. (Random and hidden so that you can't just overfit to the test set; fixed so that it's still mostly measuring the same thing.)

I think the Netflix Prize sort of worked like this?
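
A rough sketch of what the scoring loop on such a server might look like (the Docker invocation details and corpus paths are placeholder assumptions):

    import os
    import random
    import statistics
    import subprocess
    import time

    # Hypothetical hidden corpus; only the server knows the real files.
    HIDDEN_DATASETS = ["corpus/a.txt", "corpus/b.txt", "corpus/c.txt",
                       "corpus/d.txt", "corpus/e.txt"]

    def score(image, subset_size=3, runs=3):
        """Time a submitted Docker image on a random subset of the
        hidden datasets; return the mean wall-clock time across runs."""
        times = []
        for path in random.sample(HIDDEN_DATASETS, subset_size):
            host_path = os.path.abspath(path)  # docker -v needs absolute paths
            for _ in range(runs):
                start = time.perf_counter()
                # Mount the dataset read-only so submissions can't tamper with it.
                subprocess.run(
                    ["docker", "run", "--rm",
                     "-v", f"{host_path}:/input.txt:ro",
                     image, "/input.txt"],
                    check=True, stdout=subprocess.DEVNULL)
                times.append(time.perf_counter() - start)
        return statistics.mean(times)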



