Even in C-land, you're technically not supposed to fabricate arbitrary heap pointers; you're supposed to offset something you get from malloc(). The Standard says (if memory serves) that, when it comes to the heap, only pointers from the beginning of a malloc() allocation to one past its end have defined behavior.
Of course, there are probably lots of in-practice exceptions when it comes to embedded, kernel code, mmap() shenanigans, etc.
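To make the "beginning to 1-past-the-end" rule concrete, here is a minimal sketch. The function name sum_first_n is made up for illustration; the point is only that every pointer is derived from what malloc() returned, and that the one-past-the-end pointer is formed and compared but never dereferenced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: fills a malloc'd block with 0..n-1 and sums it.
 * Every pointer used is derived from the pointer malloc returned. */
int sum_first_n(size_t n)
{
    int *base = malloc(n * sizeof *base);
    if (base == NULL)
        return -1;              /* allocation failed (or n == 0 on some libcs) */

    for (size_t i = 0; i < n; i++)
        base[i] = (int)i;

    int *end = base + n;        /* one past the end: may be formed and compared */
    int total = 0;
    for (int *p = base; p != end; p++)
        total += *p;            /* only [base, end) is ever dereferenced */

    /* Forming base - 1 or base + n + 1 would be undefined behavior,
     * even without dereferencing the result. */
    free(base);
    return total;
}
```

Calling sum_first_n(5) adds 0 + 1 + 2 + 3 + 4.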
Oh boy, welcome to the exciting world of C pointer provenance :) What you just described is what compiler people call pointer provenance, where each pointer has, in addition to the address, a second piece of info attached to it that describes all the places to which the pointer can point.
This is an extremely simplified and probably incorrect view of https://open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1434r0.... This is complicated because nobody agrees on what the correct behaviour should be, and we have mountains of legacy codebases that all rely on something slightly different when pointers get converted to ints and back.
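For what it's worth, the one pointer-to-int-and-back case that I'm confident is well defined is the plain round trip through uintptr_t (an optional type, but present on every mainstream implementation). A small sketch, with round_trip_ok being a made-up name:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical demo: converts a pointer to an integer and back.
 * The standard guarantees that a void * converted to uintptr_t and back
 * compares equal to the original pointer. */
int round_trip_ok(void)
{
    int x = 42;
    uintptr_t bits = (uintptr_t)(void *)&x;   /* pointer -> integer */
    int *q = (int *)(void *)bits;             /* integer -> pointer */

    /* The contested cases are the ones where the integer is computed some
     * other way (arithmetic, guessing, reading it from a file) and merely
     * happens to equal a valid address. This sketch does not do that. */
    return q == &x && *q == 42;
}
```

Anything beyond this straight round trip is where the provenance proposals start to disagree.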
That's what makes C programming fun: doing what the compiler/language people say you shouldn't be doing and seeing if it works anyways. At least, it works on my machine...
Rational built a truly semantic IDE, and the result of that is that if you change a line of source-code somewhere in a huge complicated project, it will know precisely what the consequences of that change are.
Ultimately, this often allows it to recompile just that single line.
The part of this that I most enjoy is that you cannot even obtain some memory from your own allocator implemented on top of mmap or on top of malloc, store ints in it, return it to the allocator, ask the allocator for memory again and receive the same memory back, and store doubles in it.
I've implemented a pool allocator for embedded devices a couple of times, allocating memory for callers out of a block of static memory. I thought it was portable C. Which detail of the standard did I not realise I was running into?
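As far as I understand it, the detail in question is the effective type rule (C11 6.5, paragraphs 6 and 7): a static array has a declared type, so under a strict reading, accessing its bytes through an int * or double * is not covered by the standard. Below is a rough sketch of a bump-style pool that sidesteps the question by moving values in and out with memcpy. All names (pool, pool_alloc, pool_reset, reuse_as_double) and the pool size are hypothetical, and this is not a claim about how any real allocator is written.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define POOL_SIZE 256   /* hypothetical pool size */

/* Static backing store, aligned for any object type. */
static _Alignas(max_align_t) unsigned char pool[POOL_SIZE];
static size_t pool_used;

/* Bump allocator: hands out max_align_t-aligned chunks of the static pool. */
void *pool_alloc(size_t n)
{
    size_t align = _Alignof(max_align_t);
    size_t rounded = (n + align - 1) / align * align;
    if (rounded < n || rounded > POOL_SIZE - pool_used)
        return NULL;            /* overflow or out of space */
    void *p = pool + pool_used;
    pool_used += rounded;
    return p;
}

/* Returns all memory to the pool at once. */
void pool_reset(void)
{
    pool_used = 0;
}

/* Stores an int, releases the memory, then reuses the same bytes for a
 * double. memcpy copies the object representation, so the pool is only
 * ever accessed as bytes. */
double reuse_as_double(void)
{
    pool_reset();
    void *a = pool_alloc(sizeof(int));
    int i = 7;
    memcpy(a, &i, sizeof i);

    pool_reset();
    void *b = pool_alloc(sizeof(double));   /* same address as a */
    double d = 1.5, out;
    memcpy(b, &d, sizeof d);
    memcpy(&out, b, sizeof out);
    return out;
}
```

In practice most embedded code just casts the returned pointer and relies on the compiler (or on -fno-strict-aliasing) to do the expected thing; the memcpy form is the version I'd expect to survive a strict reading.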
To be clear, that metadata only (generally) exists in principle - it is not actually materialized and stored anywhere, even at compile time (definitely not at runtime). There may be cases where the compiler actually tracks it, but most often it is only tracked "best effort" - the compiler only needs to prove that two pointers can't be aliased based on any possible value of that metadata, not actually compute that metadata and use it.
Standard C provides only a few ways to obtain valid pointers. Implementations can define behaviors in cases that the standard leaves undefined, such as allowing more cases of casts of integers to pointers than the standard defines (common in embedded-land), or functions like mmap or sbrk. So you or your chip vendor could define FOO_REG as *(uint32_t *)0x80001234 and use it as if it were a variable.
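A sketch of that register idiom, so it can be compiled on a hosted system: the "register" here is a stand-in static variable, FOO_REG_ADDR and set_enable_bit are names I made up, and on real hardware the address would be the vendor-documented constant instead. I've added volatile, which real register definitions normally carry so the compiler doesn't cache or drop accesses.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a hardware register so the sketch runs on a desktop. */
static volatile uint32_t fake_reg;

/* Assumption: on a real part this would be a fixed address such as
 * 0x80001234u, and the cast would be implementation-defined behavior
 * that the toolchain documents. */
#define FOO_REG_ADDR ((uintptr_t)&fake_reg)
#define FOO_REG      (*(volatile uint32_t *)FOO_REG_ADDR)

/* Uses FOO_REG as if it were an ordinary variable. */
uint32_t set_enable_bit(void)
{
    FOO_REG = 0;            /* write the whole register */
    FOO_REG |= 1u << 0;     /* read-modify-write to set bit 0 */
    return FOO_REG;         /* read it back */
}
```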
Olde C used to just let you use integers as struct pointers, and there was only one struct member namespace. So code like this was valid and did an integer-size write to address 0177770. Old Unix did this for device register access; see the Lions book.
Although on a modern system you shouldn't be using sbrk; you'd instead be implementing malloc on top of mmap. So remove sbrk, and make mmap the "object allocator", and tada! You don't really need linear virtual address spaces anymore.
Yes, this is why malloc() can't actually be implemented in C. Actual implementations of it exist because of special compiler dispensation or, more often, because the callers and the implementation live in separate libraries, so the implementation isn't visible to the caller.
> Attribute malloc indicates that a function is malloc-like, i.e., that the pointer P returned by the function cannot alias any other pointer valid when the function returns, and moreover no pointers to valid objects occur in any storage addressed by P. In addition, the GCC predicts that a function with the attribute returns non-null in most cases.
But it doesn't provide any operations to do the things in this paragraph (create new pointers). The operation that does this is malloc itself.
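For reference, this is roughly what using the attribute looks like on GCC or Clang. my_alloc and no_alias_demo are hypothetical names, and the wrapper simply forwards to malloc, which is exactly the point made above: the attribute only promises the compiler something about the result, it doesn't create the storage.

```c
#include <assert.h>
#include <stdlib.h>

/* GCC and Clang both define __GNUC__; elsewhere the attribute is dropped. */
#if defined(__GNUC__)
#define MALLOC_LIKE __attribute__((malloc))
#else
#define MALLOC_LIKE
#endif

/* Hypothetical wrapper: promises the compiler that the returned pointer
 * does not alias any other pointer valid at the time of return. */
MALLOC_LIKE void *my_alloc(size_t n);

void *my_alloc(size_t n)
{
    return malloc(n);       /* the real allocation still comes from malloc */
}

int no_alias_demo(void)
{
    int *a = my_alloc(sizeof *a);
    int *b = my_alloc(sizeof *b);
    if (a == NULL || b == NULL) {
        free(a);
        free(b);
        return -1;
    }
    *a = 1;
    *b = 2;                 /* the compiler may assume this cannot modify *a */
    int result = *a;        /* so this load may be folded to the constant 1 */
    free(a);
    free(b);
    return result;
}
```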
This will typically not cause problems, but it would if LTO got so good you could include libc in it.