The cost is higher because of all the branches that are scattered everywhere to check return codes. With exceptions there's a check only at the place the error is detected and thrown, which is unavoidable. There aren't checks scattered throughout the rest of the code, which would otherwise hurt icache utilization.
Arguing about icache utilization is a little silly here - the compiler will lay the code out for you as though the error branches are not taken (or you can force it to do so). In that case, the only "waste" of icache is the CMP and JMP, an additional 4-8 bytes per return, and effectively zero cycles when the branch is predicted correctly.
When you do take an error, each RET on the way up costs you about 1 cycle, because the CPU's return stack predictor already knows the address to return to; the real cost is the 10-15 cycle mispredict on the CMP+JMP. It's counterintuitive that doing "a lot" of things is cheaper than doing fewer things, but it's true.
In comparison, an exception involves taking the one control-flow break out to some cold runtime code (possibly page-faulting it in), figuring out where to go using a jump table (slow), unwinding and restoring the old state from that context (slow), figuring out the type of the thrown object (in many languages, also slow), and then dispatching to the handler. Each of these steps can easily take 100+ cycles, and often more.
The math does not work out in favor of exceptions, and neither do the benchmarks in most cases. You do one slow thing to avoid doing twenty things that are trivially fast.
The checks you're talking about are duplicated more or less per statement in some kinds of code. Every single call site ends up with an `if err != nil` or its moral equivalent. It can add up; consider also the extra register pressure. The return value isn't available for useful data anymore - it's consumed by error signalling.
The compiler doesn't necessarily know what your error types are; it can use heuristics to move those blocks around, but it's not like exceptions, where the types are part of the language and the compiler can know the throw path is cold. And we're talking about startup code here - nobody is going to annotate their error branches with manual branch-probability hints, so we're limited to what the compiler can infer.
Yes, the act of throwing an exception is more work, but it's exceptional, so it doesn't matter. The slowest part is capturing the stack trace anyway, and that's of huge value - something you don't get with error codes at all.
There's no register pressure - TEST EAX, EAX (or CMP EAX, 0) followed by JNZ to the error handler is the instruction sequence we're talking about. Most error types are enums where 0 = "good" and any nonzero value is not good. This is the inverse of the "null pointer check" in C. It consumes no extra registers and a negligible number of code bytes.
There is obviously some sparse set of truly exceptional cases where error-handling code like this is worse than using exceptions. I would claim that set is a lot more sparse than you think. Many people use exceptions for things like "file not found," "function failed for whatever reason," "timeout" (my favorite), or "bad input from the user." These cases are often not that exceptional!