The entire problem is that there isn't a single linear measure of performance. It is the same problem one has comparing different CPU designs, but it now applies to every single chip you make.
They can certainly be clustered. But what kind of performance are you buying when you get a $model? Yes, you have another $model on your desk to compare, but the new one will not have the same performance at all. What if the one you already have is a fast one? Then you cannot expect the new one to be as fast, and you may prefer buying from some other manufacturer.
EDIT: To put it shorter: How do I promise you that the chip I'm selling has at least some performance X when I don't have any chip with performance X for you to measure and see what it means?
To respond to your edit: tell me something I know about. Like I posted before, how many AES blocks can it decrypt per second? How many NxM matrices can it multiply per second? How quickly does it match in a kd-tree? Even tell me that it runs Quake at x fps, etc. An abstract number (overall performance is 9001) is actually what the customer cares about the least.
I'm going to know what my workload is, or what to compare it to. If you don't know what your workload is, then you're likely a general computer use customer and don't have specific requirements.
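To make "tell me something I know about" concrete, here is a rough sketch of the kind of workload-specific number I mean: how many NxM matrix products a machine manages per second. The helper names (`matmul`, `ops_per_second`, `make_matrix`) and the 16x16 size are made up for illustration, and a pure-Python loop says nothing about what real silicon can do. It only shows the shape of the measurement.

```python
import time


def matmul(a, b):
    """Naive product of two matrices given as lists of rows."""
    cols_b = list(zip(*b))  # transpose b so we can walk its columns
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def ops_per_second(workload, min_seconds=0.2):
    """Call workload() repeatedly for at least min_seconds; return calls per second."""
    calls = 0
    elapsed = 0.0
    start = time.perf_counter()
    while elapsed < min_seconds:
        workload()
        calls += 1
        elapsed = time.perf_counter() - start
    return calls / elapsed


def make_matrix(rows, cols, seed):
    """Deterministic filler data (hypothetical; any values would do)."""
    return [[float((seed + r * cols + c) % 7) for c in range(cols)] for r in range(rows)]


if __name__ == "__main__":
    n, m = 16, 16  # assumed sizes, pick whatever matches your workload
    a = make_matrix(n, m, seed=1)
    b = make_matrix(m, n, seed=2)
    rate = ops_per_second(lambda: matmul(a, b))
    print(f"{n}x{m} matrix products per second: {rate:.0f}")
```

Run the same script on two chips and you get two numbers in units you already understand, which is the whole point: the figure is tied to a workload rather than to an abstract overall score.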
I don't think that's so much different than what we have right now. Processors have different core counts, different buses, different feature sets, different speeds on those features, and that's still before we get into patching the microcode. Sure, the differences may be in more basic operations, but then we'll just have benchmarks which expose those numbers instead.
When you go to Amazon and order an i3 $generation, you know exactly how many cores it will have, and what performance each core has. Every single one of those chips with the same tag has the same performance.