
I had a side job as a tech reviewer ~15 years ago and I can tell you why this is never going to happen. It's a classic case of "works in theory, but impossible in practice." What you're actually asking for is a database in which every permutation of hardware, hardware revision, firmware/BIOS, benchmarking software, benchmarking software version, driver version, OS, OS patch level, user-selectable feature state, etc. is tracked and tested. The data set is effectively infinite, and you'll still miss things like ambient room temperature/humidity affecting CPU throttling, that one finicky USB device you spilled water on that sometimes disconnects randomly, and specific workflow quirks.
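For a sense of scale, here's a back-of-the-envelope sketch (the counts are made up for illustration, not real market data) of how fast that matrix explodes:

  from math import prod

  # Illustrative counts only -- the point is the multiplication, not the numbers.
  factors = {
      "GPU models / board revisions": 50,
      "CPU models": 40,
      "motherboards + BIOS versions": 100,
      "driver versions": 20,
      "OS + patch levels": 15,
      "benchmark suites + versions": 10,
      "user-selectable feature states": 32,  # e.g. 5 binary toggles
  }

  configs = prod(factors.values())
  runs_per_config, minutes_per_run = 5, 10

  print(f"{configs:,} configurations")
  print(f"~{configs * runs_per_config * minutes_per_run / 60 / 24 / 365:,.0f} machine-years to test them once")

That's roughly 19 billion configurations and on the order of a couple million machine-years of test time, before you account for any of the environmental variables above.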

"Objective" benchmarks were an almost tractable problem 15-20 years ago, but with the way modern OSes run background tasks, access network resources, and perform self-maintenance, it's even more difficult to bridge the synthetic-to-real world divide.

So, you select a point of reference (e.g. - new GPU), you choose the most common system components and a selection of components that tell a story (usually "if you're building a new PC now, here are the options"), you assemble the current versions of all software/drivers, and run your tests x times. The good sites will have "sub-stories," like specific workflows or new/changed features, but once you get to around 4 of these, you start hitting too many permutations to clearly communicate the significance of your choices.
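In practice the scoped version looks something like this (a hypothetical sketch of the scoping logic, not any particular site's workflow):

  # Hypothetical review scope: one reference rig, a few "story" builds,
  # a few sub-stories, repeated runs. Already plenty to present clearly.
  reference_rig = {"cpu": "current mainstream", "ram": "common kit",
                   "os": "latest stable", "driver": "launch driver"}

  story_builds  = ["budget build", "mainstream build", "high-end build"]
  sub_stories   = ["1080p gaming", "4K gaming", "content creation", "power/thermals"]
  runs_per_test = 3

  total_runs = len(story_builds) * len(sub_stories) * runs_per_test
  print(total_runs, "runs to execute, chart, and explain")  # 36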

Even if you crowdsourced it, it's A) extremely difficult to verify the integrity of results and prevent manipulation by marketing departments or brigading, and B) an unrewarding, tedious process where best practice is to leave the machine untouched for hours. Most of the crowdsourced benchmarking sites are set up as little competitions and sanity checks for overclocking. For a pure review of shipping hardware/software, you run the benchmark and you're done. It isn't particularly fun, requires invasive cataloguing of system specs, and isn't very friendly to new users or first-time builders.

Dynamic graphs would be nice (I love them), but I suspect many review sites have run the numbers and found they reduce overall web metrics. Many (I would argue most) people don't notice interactive page components, aren't interested enough to turn their quick article skim into a deep dive, or are reading in less optimal settings (e.g. - on phone on the toilet). Instead of getting 7 page views for 1 minute each with separate graphs, you're getting 1 page view for either 1 minute (probably 70%) or 10 minutes (probably 30%) with a dynamic graph.
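To put rough numbers on that tradeoff, using the same guessed percentages:

  # Rough engagement arithmetic; the 70/30 split above is a guess, not measured data.
  paginated_views   = 7
  paginated_minutes = 7 * 1.0                    # 7 pages, ~1 minute each

  dynamic_views   = 1
  dynamic_minutes = 0.7 * 1.0 + 0.3 * 10.0       # skimmers vs. deep-divers

  print(paginated_views, "views,", paginated_minutes, "minutes")  # 7 views, 7.0 minutes
  print(dynamic_views, "view,", dynamic_minutes, "minutes")       # 1 view, 3.7 minutes

So the dynamic-graph article loses both page views and, on average, total time on site.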

I'm a total data junkie and I WISH there were a good solution to these problems, but every practical implementation I've seen or dreamt up has enough variance to render the point of having such fine-grained data moot.
