I think that the CPU cost could be reduced if the CPU had some kind of instruction for array access:
if you 'hardcode' that an 'array' has its address stored in register Rx and its length stored in Rx+1, you can easily have an instruction "checked_load Ry <- Rx, <index>" this instruction can have a latency similar to a normal load instruction as it could check the value of the index with Rx+1 in parallel to the load (1).
And if you have a 'load aligned double word' instruction which load two consecutive word in Rx and Rx+1 (aligned to avoid the thorny issue of page missing in the middle of an instruction), you can reduce the cost of having to manage both the array address and its length.
Sure this isn't free: more registers used, more cache used, still it doesn't look THAT bad from a performance POV no? Or very complex to implement. Is-there something I'm not understanding?
1: and if the checks fails there is TRAP executed, which is compatible with C/C++ semantic.
I think the problem a lot of people would have is that introducing hardware, which costs millions to design and bring to market, when it's possible to write code that isn't prone to overflow....just doesn't seem like a good idea.
When I was in school, my professor for CPU Architecture had a good saying: "If you're going down to NAND gates to debug your software you're doing something wrong".
I think another issue is that arrays, trees, and other structures are a complete abstraction. The CPU doesn't need to know a thing about them for them to work, because all it actually sees is addresses and instructions (I get that I'm grossly oversimplifying things), and so introducing that type of "check" at the hardware level is pushing the abstraction down, and I don't think that's a good idea either.
But I do think that's one of the great reasons for using a higher level language, where there are array bounds checks, etc.
To me, C/C++ is like "pretty" assemby language (Now with new features like Undefined Behavior!!!), and I like that about it. But I'm not sure how other people feel.
Anyways, that's just my take on it, take it with a grain of salt.
And if you have a 'load aligned double word' instruction which load two consecutive word in Rx and Rx+1 (aligned to avoid the thorny issue of page missing in the middle of an instruction), you can reduce the cost of having to manage both the array address and its length.
Sure this isn't free: more registers used, more cache used, still it doesn't look THAT bad from a performance POV no? Or very complex to implement. Is-there something I'm not understanding?
1: and if the checks fails there is TRAP executed, which is compatible with C/C++ semantic.