
No, I don't assume people are against register allocation, but every concrete proposal I have seen more or less implies that conclusion. I am trying to understand what people actually want, since it seems clearly different from what they say they want.

Okay let's discuss a concrete example.

    *x = 12345678;
    f();
    return *x; // can you copy propagate 12345678 to here?
f() does this:

    for (int *p = 0; p < (int *)MEMSIZE; p++)
        if (*p == 12345678)
            *p = 12345679;
That is, f scans memory for 12345678 and replaces every instance with 12345679. There is no doubt this works that way in assembly. Things like cheat engines do this! C compilers assume this doesn't happen, because it is UB.

Hence, a "portable assembly" C compiler can't omit any load. I understand there is a minority of people who will answer "that's what I want!", but as with register allocation, I think people generally want this to be optimized. That necessarily implies that a memory search-and-replace can't be compiled in a portable-assembly manner.




I can't really speak to the "portable assembler" point of view here, but if I was trying to make UB less dangerous I would say that the code had better return either 12345678 or 12345679, as long as no other memory addresses have 12345678 stored in them. Or it could trap.


> I can't really speak to the "portable assembler" point of view here, but if I was trying to make UB less dangerous I would say that the code had better return either 12345678 or 12345679

If 12345678 is acceptable to you then the language specification is already doing what you want. The alternative is to require arbitrary havocs on every memory address upon any function call to a different translation unit. Nightmare.

> Or it could trap.

UBSan exists and works very well. But introducing runtime checks for all of this stuff is not acceptable in the C or C++ communities, outside of very small niches regarding cryptography, because of the incredible overhead required. Imagine if your hardware doesn't support signed integer overflow detection. Now suddenly every single arithmetic operation is followed by a branch to check for overflow in software. Slow.
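
To make the cost concrete, here is a minimal sketch of a software overflow check for signed addition. The helper name `checked_add` is hypothetical and this is not how UBSan is implemented; it only illustrates the extra compare-and-branch that every `+` would carry on a target with no hardware overflow detection.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: stores a + b in *out and returns true, or returns
   false (leaving *out untouched) if the mathematical sum does not fit in
   an int. The range test is done before the addition, so the signed
   overflow itself is never executed. */
static bool checked_add(int a, int b, int *out)
{
    assert(out != NULL);

    if ((b > 0 && a > INT_MAX - b) ||   /* sum would exceed INT_MAX */
        (b < 0 && a < INT_MIN - b))     /* sum would fall below INT_MIN */
        return false;

    *out = a + b;
    return true;
}
```

A plain `a + b` is one instruction on most targets; the checked version adds comparisons and a branch on every call, which is the overhead being objected to.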


> If 12345678 is acceptable to you then the language specification is already doing what you want.

No it's not.

The compiler is allowed to look at this code and make it print 5. Or make it delete my files. This code is undefined and the compiler could do anything without violating the C standard.


> The compiler is allowed to look at this code and make it print 5. Or make it delete my files.

It is allowed to do this but it won't. "The compiler will maximally abuse me if I put any undefined behavior in my program" is not a concern that is actually based in any reality. In the above program the compiler cannot meaningfully prove that undefined behavior exists and if it could it would yell at you and raise an error rather than filling your hard drives with pictures of cats.

This meme has done so much damage to the discussion around UB. The gcc and clang maintainers aren't licking their lips just waiting to delete hard drives whenever people dereference null.

Go compile that program. You can stick it in compiler explorer. It is going to print 12345678.


> It is allowed to do this but it won't.

It is very possible for a non-malicious compiler to end up eliminating this code as dead.

That's the biggest risk. I only mentioned "delete my files" to demonstrate how big the gap in the spec is, because you were saying the spec is already "doing what I want", which happens long before we get to what compilers will or won't do.


A programmer wrote things in a specific order for a specific reason.

Let's instead assume that the variable assignments above are to some global configuration variables, that f() also references them, and that the behavior of f() changes based on the previously written code.

The objections from the 'C as portable assembler' camp are:

* Re-ordering operations across context switch bounds (curly braces and function calls). -- Re-ordering non-volatile stores and loads within a context is fine, and shouldn't generate warnings.

* Eliminating written instructions (not calling f()) based on optimizations. -- Modification to computed work should always generate a warning so the optimization can be applied to the source code, or bugs corrected.


> A programmer wrote things in a specific order for a specific reason.

Is it not possible that the programmer introduced a bug?

Consider the bug that caused the 911 glitch in Android phones recently. An unstable comparator was defined in a type, violating the contract that Comparable has with Java's sorting algorithms. When Java detects that this implementation violates the assumptions its sorting algorithms make, it throws an exception. Should it not do this and instead say that the programmer wrote that specific comparator on purpose and it should loop forever or produce an incorrect sort? I think most people would say "no". So why is the contract around pointer dereferencing meaningfully different?


> Modification to computed work should always generate a warning so the optimization can be applied to the source code, or bugs corrected.

This only works out in very, very limited cases. What if this opportunity only presents itself after inlining? What if it's the result of a macro? Or a C++ template?

Just because the compiler can optimize something out in one case doesn't mean you can just delete it in the code...


Globals and locals are different. All compilers will give a global a specific memory location and load and store from it. Locals by contrast can be escape analyzed.


The example didn't show where X was defined; it could be anything.


No it could not have been for copy propagation to be valid. It had to be a local except under some very special conditions.
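
The local-versus-escaped distinction can be sketched as follows. All names here (`leaked`, `not_escaped`, `escaped`) are hypothetical, and `f` is a stand-in that only touches memory it was legitimately handed, unlike the whole-memory scan in the original example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical global through which a local's address can leak. */
static int *leaked = NULL;

/* Stand-in for f(): it can only reach memory it was given a pointer to. */
static void f(void)
{
    if (leaked != NULL && *leaked == 12345678)
        *leaked = 12345679;
}

/* The address of x never leaves this function, so no valid C code in f()
   can modify it. A compiler may fold the return value to the constant. */
static int not_escaped(void)
{
    int x = 12345678;
    f();
    return x;
}

/* Here the address of x is published before the call, so f() may legally
   write to it and the compiler has to reload x after the call. */
static int escaped(void)
{
    int x = 12345678;
    assert(leaked == NULL);   /* nothing published yet */
    leaked = &x;
    f();
    leaked = NULL;            /* do not leave a dangling pointer behind */
    return x;
}
```

Under these assumptions `not_escaped()` yields 12345678 and `escaped()` yields 12345679, with or without optimization, because both functions stay within defined behavior.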


How about circumstances such as opting in to a semantically new version of C?

  #ifndef __CC_VOLATILE_FUNCTIONS
  /* No volatile function support, remove code */
  #define VOLFUNC
  #else
  /* This compiler supports volatile functions. Only volatile functions
     may cause external side effects without likely bugs. */
  #define VOLFUNC volatile
  #endif
https://www.postgresql.org/docs/current/xfunc-volatility.htm...

Similarly stable functions could have results cached. You might also note that PostgreSQL assumes any undeclared function is volatile.


> any concrete proposal I have seen kind of implies such conclusion.

No it does not. In your example, I personally would prefer it did not propagate the 12345678. Good grief, I wrote the deref there.

> C compilers assume this doesn't happen, because it is UB.

Incorrectly. IMHO.

> but like register allocation,

You are silently equating int x; with a memory deref. There is no need to do this.

Anyway, here is part of the rationale for C:

"Although it strove to give programmers the opportunity to write truly portable programs, the Committee did not want to force programmers into writing portably, to preclude the use of C as a ``high-level assembler'': the ability to write machine-specific code is one of the strengths of C. It is this principle which largely motivates drawing the distinction between strictly conforming program and conforming program"

http://port70.net/~nsz/c/c89/rationale/a.html#1-5

So the idea of C as a portable assembler is not some strange idea proposed by ignorant people who don't understand C, but an idea that was fundamental to C and fundamental to the people who created the ANSI/ISO C standard.

But hey, what do they know?


Thanks for making an example!

I'm against compiling this particular example to return a constant. I presumably wrote the awkward and unnatural construction return *x because I want to force a load from x at return. If I wanted to return a constant, I'd have written it differently! I'm odd though in that I occasionally do optimizations to the level where I intentionally need to force a reload to get the assembly that I want.
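
For reference, one way to spell "force a load here" in today's C is to read through a volatile-qualified lvalue. This is a sketch with hypothetical names (`store_then_reload`, and an empty `f` as a stand-in for the call in the example). GCC and Clang emit a real load for such an access; how strictly older editions of the standard require that for an object not itself declared volatile is debated, so treat it as common compiler practice, not a guarantee.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the f() in the example above. */
static void f(void)
{
}

/* Stores the constant, calls f(), then reads *x back through a
   volatile-qualified lvalue so the final read is an explicit request
   for a memory access. */
static int store_then_reload(int *x)
{
    assert(x != NULL);
    *x = 12345678;
    f();
    return *(volatile int *)x;
}
```

The cast applies to that one access only; other reads of `*x` elsewhere remain open to the usual optimizations.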

Philosophically, I think our difference might be that you want to conclude that one answer to this question directly implies that the "compiler can't omit any load", while I'd probably argue that it's actually OK for the compiler to treat cases differently based on the apparent intent of the programmer. Or maybe it's OK to treat things differently if f() can be analyzed by the compiler than if it's in a shared library.

It would be interesting to see whether your prediction holds: do a majority actually want to return a constant here? My instinct is that C programmers who complain about the compiler's treatment of UB will agree with me, but that C++ programmers dependent on optimizations of third-party templates might be more likely to agree with you.


Oh, so you are in the "that's what I want!" camp. But I am pretty sure you are in the minority, or at the very least the economic minority. The slowdown implied by these semantics is large, and easily costs millions of dollars.

> while I'd probably argue that it's actually OK for the compiler to treat cases differently based on the apparent intent of the programmer.

This is actually what I am looking for, i.e. an answer to "then what do you mean?". The standard should define how to divine the apparent intent of the programmer, so that compilers can do the divination consistently. So far, proposals have lacked detailed instructions for how to do this divination.


> and easily costs millions of dollars

Looks like UB bugs can cost more. It's a new age of UB sanitizers as a reaction to a clear problem with UB.


Bugs based on optimizations that compilers make based on assumptions enabled by undefined behavior (like the null check issue from 2009 in the Linux kernel) actually don't cost very much. They get a disproportionate amount of scrutiny relative to their importance.
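
For readers who haven't seen it, the null check issue had roughly the following shape. This is a simplified sketch with hypothetical names (`struct device`, `poll_flags`), not the kernel code itself.

```c
#include <assert.h>
#include <stddef.h>

struct device {
    int flags;
};

/* Problematic shape: the dereference comes before the check. Since
   dereferencing a null pointer is UB, the compiler may assume dev is
   non-null from this point on and delete the check as dead code. */
static int poll_flags(struct device *dev)
{
    int flags = dev->flags;
    if (dev == NULL)
        return -1;
    return flags;
}

/* Corrected shape: check first, dereference afterwards. */
static int poll_flags_fixed(struct device *dev)
{
    if (dev == NULL)
        return -1;
    return dev->flags;
}
```

Both functions agree for non-null input; only the second has defined behavior when passed NULL.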



