
Even in C-land, you're technically not supposed to fabricate arbitrary heap pointers, you're supposed to offset something you get from malloc(). The Standard says (if memory serves) that only pointers from the beginning of a malloc() to 1-past-the-end have defined behavior (when it comes to the heap).

Of course, there are probably lots of in-practice exceptions when it comes to embedded, kernel code, mmap() shenanigans, etc.
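A minimal sketch of the in-bounds rule described above, assuming a hosted C implementation. The function name `sum_block` is made up for illustration. Every pointer is derived from the `malloc()` result and stays between the start of the block and one past its end:

```c
#include <stddef.h>
#include <stdlib.h>

/* Sums n ints stored in a block obtained from malloc(). Every pointer
   used here is derived from the malloc() result and stays within
   [buf, buf + n], where buf + n is the one-past-the-end pointer. */
long sum_block(size_t n)
{
    int *buf = malloc(n * sizeof *buf);
    if (buf == NULL)
        return -1;

    for (size_t i = 0; i < n; i++)
        buf[i] = (int)i;

    long total = 0;
    int *end = buf + n;              /* one past the end: fine to form and compare */
    for (int *p = buf; p != end; p++)
        total += *p;                 /* p is inside the block whenever it is dereferenced */

    /* Forming buf - 1 or buf + n + 1 would be undefined behavior,
       even if the result were never dereferenced. */
    free(buf);
    return total;
}
```

Dereferencing `end` itself would also be undefined; it may only be compared against or used as a loop bound.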




Oh boy, welcome to the exciting world of C pointer provenance :) What you just described is what compiler people call pointer provenance, where each pointer has, in addition to the address, a second piece of info attached to it that describes all the places to which the pointer can point.

This is an extremely simplified and probably incorrect view of https://open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1434r0.... This is complicated because nobody agrees on what the correct behaviour should be, and we have mountains of legacy codebases that all rely on something slightly different when pointers get converted to ints and back.
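To make the pointer-to-int-and-back case concrete, here is a small sketch. It assumes an implementation that provides `uintptr_t` (optional in the standard, but present on mainstream platforms); the function name `roundtrip_demo` is made up for illustration. This simple round trip is the uncontroversial case; the disagreements start once the integer is modified or recomputed from somewhere else before being converted back.

```c
#include <stdint.h>

/* Converts a pointer to an integer and back, then writes through the
   result. The standard guarantees that a void * converted to uintptr_t
   and back compares equal to the original pointer. */
int roundtrip_demo(void)
{
    int x = 1;
    void *p = &x;

    uintptr_t bits = (uintptr_t)p;   /* pointer -> integer */
    int *q = (int *)(void *)bits;    /* integer -> pointer */

    *q = 42;                         /* q refers to x again */
    return x;
}
```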


That's what makes C programming fun: doing what the compiler/language people say you shouldn't be doing and seeing if it works anyways. At least, it works on my machine...


Until the compiler gets upgraded....


Until changes in seemingly far away code change the optimization decisions the compiler makes...


That is actually the R1000's raison d'etre:

Rational built a truly semantic IDE, and the result of that is that if you change a line of source-code somewhere in a huge complicated project, it will know precisely what the consequences of that change are.

Ultimately, this often allows it to recompile just that single line.


That sounds like the "generation" tag in Vale's generational references [0] or like how memory tagging works on ARM CPUs [1].

I love the approach, it's a way to get a lot more memory safety while not giving up a program's flexibility, especially in C's case.

Some languages are opting to disallow pointer arithmetic and conversion between integers and pointers. We'll see how it works out!

[0] https://verdagon.dev/blog/generational-references

[1] https://developer.arm.com/-/media/Arm%20Developer%20Communit...


The part of this that I most enjoy is that you cannot even obtain some memory from your own allocator implemented on top of mmap or on top of malloc, store ints in it, return it to the allocator, ask the allocator for memory again and receive the same memory back, and store doubles in it.


Yes, notoriously you can't implement malloc in portable C.

In C++, I think they added enough magic that you should now be able to do it (placement new, std::launder and numerous other hacks).


I've implemented a pool allocator for embedded devices a couple of times, allocating memory for callers out of a block of static memory. I thought it was portable C. Which detail of the standard did I not realise I was running into?


I assume the block of static memory was a large static char array.

My understanding is that reusing the storage would violate the aliasing rules[1] and the rules against overlapping object lifetimes.

[1] while char ptrs can be used to access everything, the reverse is not allowed.
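A hedged sketch of the asymmetry in that footnote. The buffer `pool` and the function `store_and_load` are hypothetical names. Copying bytes in and out of a char array with `memcpy` is always defined; it is the direct typed access through a cast pointer that the aliasing rules object to, which is what a typical pool allocator's callers end up doing.

```c
#include <string.h>

/* A static byte buffer standing in for an allocator's backing store.
   Its declared type is unsigned char[64]. */
static unsigned char pool[64];

/* Stores a double into the buffer and reads it back using only
   byte-wise copies, which is defined regardless of aliasing rules. */
double store_and_load(double value)
{
    double out;
    memcpy(pool, &value, sizeof value);   /* bytes in */
    memcpy(&out, pool, sizeof out);       /* bytes out */
    return out;
}

/* The pattern the aliasing rules do NOT allow on this buffer would be:
 *
 *     *(double *)pool = value;
 *
 * because pool has a declared type of unsigned char, not double.
 */
```

Most compilers accept the cast version in practice, which is why pool allocators like this usually work anyway.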


Reportedly you can implement malloc in C, using mmap().


Well, mmap is POSIX, not standard portable C.


So instead of giving that metadata to the programmer the C language says F U and tells me only knowing where to put it is enough, right?

"hey you allocated it you should know about it right?" right yeah


To be clear, that metadata only (generally) exists in principle - it is not actually materialized and stored anywhere, even at compile time (definitely not at runtime). There may be cases where the compiler actually tracks it, but most often it is only tracked "best effort" - the compiler only needs to prove that two pointers can't be aliased based on any possible value of that metadata, not actually compute that metadata and use it.


Standard C provides only a few ways to obtain valid pointers. Implementations can define behaviors in cases that the standard leaves undefined, such as allowing more cases of casts of integers to pointers than the standard defines (common in embedded-land), or functions like mmap or sbrk. So you or your chip vendor could define FOO_REG as *(uint32_t *)0x80001234 and use it as if it were a variable.
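A sketch of that register-macro idiom. The address 0x80001234 is the one from the comment; the helper macro `REG32`, the `volatile` qualifier, and the function `reg_demo` are assumptions added here. Since dereferencing that address would crash on a hosted system, the demo function points the same macro at an ordinary variable instead.

```c
#include <stdint.h>

/* Treats an integer address as a 32-bit register. Whether this cast is
   meaningful is implementation-defined; embedded toolchains document it. */
#define REG32(addr) (*(volatile uint32_t *)(uintptr_t)(addr))

/* What a vendor header might contain (address taken from the comment). */
#define FOO_REG REG32(0x80001234u)

/* Hosted stand-in: use the address of a real object so the access is
   valid here. On the target you would write FOO_REG = ...; directly. */
uint32_t reg_demo(void)
{
    static uint32_t fake_reg;
    uintptr_t addr = (uintptr_t)&fake_reg;

    REG32(addr) = 0x12345u;   /* reads and writes like a variable */
    return REG32(addr);
}
```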

Olde C used to just let you use integers as struct pointers, and there was only one struct member namespace. So code like this was valid and did an integer-size write to address 0177770. Old Unix did this for device register access; see the Lions book.

struct { int integ; };

f() { 0177770->integ = 012345; }


Well, unless you're implementing malloc using e.g. sbrk().


Although on a modern system you shouldn't be; you'd instead be implementing malloc on top of mmap. So remove sbrk, and make mmap the "object allocator", and tada! You don't really need linear virtual address spaces anymore.
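For illustration, a minimal bump allocator on top of mmap. This is a sketch, not how any real malloc works: it assumes a POSIX system that supports `MAP_ANONYMOUS`, never frees anything, and the names `bump_alloc` and `ARENA_SIZE` are made up.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE          /* exposes MAP_ANONYMOUS on glibc */
#endif
#include <stddef.h>
#include <sys/mman.h>

#define ARENA_SIZE ((size_t)1 << 16)

static unsigned char *arena;     /* start of the mapped region */
static size_t arena_used;        /* bytes handed out so far */

/* Returns n bytes (rounded up to a multiple of 16) from a single
   anonymous mapping, or NULL when the arena is exhausted. */
void *bump_alloc(size_t n)
{
    if (arena == NULL) {
        void *p = mmap(NULL, ARENA_SIZE, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
            return NULL;
        arena = p;
    }

    size_t aligned = (n + 15u) & ~(size_t)15u;   /* round up to 16 */
    if (aligned < n || aligned > ARENA_SIZE - arena_used)
        return NULL;                             /* overflow or out of space */

    void *result = arena + arena_used;
    arena_used += aligned;
    return result;
}
```

A real allocator would map more regions on demand and return them with munmap; the point here is only that the operating system, not sbrk, supplies the memory.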


Newer allocators like mimalloc[0] don't even support sbrk (I think jemalloc still does). Mimalloc seems to have some interesting features.

[0] https://github.com/microsoft/mimalloc


It still supports sbrk since it's available on WebAssembly but mmap is not.

https://webassembly.org/docs/faq/#what-about-mmap


Newer ports like FreeBSD on arm64 don't even support sbrk(2).


Yes, this is why malloc() can't actually be implemented in C. Actual implementations of it exist because of special compiler dispensation or, more often, because the callers and the implementation are in separate libraries, so the implementation isn't visible to the compiler when it compiles the caller.


malloc perhaps can't be implemented in the abstract dialect called ISO C. It can be implemented in GNU C, and others.


It can't. GNU C adds a malloc attribute:

> Attribute malloc indicates that a function is malloc-like, i.e., that the pointer P returned by the function cannot alias any other pointer valid when the function returns, and moreover no pointers to valid objects occur in any storage addressed by P. In addition, the GCC predicts that a function with the attribute returns non-null in most cases.

But it doesn't provide any operations to do the things in this paragraph (create new pointers). The operation that does this is malloc itself.

This will typically not cause problems, but it would if LTO got so good you could include libc in it.
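For reference, this is roughly how the attribute quoted above is applied. A sketch assuming GCC or Clang; `xalloc` and `attr_demo` are hypothetical names. Note that the attribute is only a promise to the optimizer about the returned pointer. The wrapper still has to get its memory from the real malloc, which is the point being made.

```c
#include <stdlib.h>

/* GNU C extension: promises that the returned pointer does not alias
   any other pointer that is valid when the function returns. */
__attribute__((malloc))
void *xalloc(size_t n)
{
    return malloc(n ? n : 1);    /* the new object still comes from malloc */
}

int attr_demo(void)
{
    int existing = 7;
    int *fresh = xalloc(sizeof *fresh);
    if (fresh == NULL)
        return -1;

    *fresh = 99;                 /* compiler may assume this cannot touch 'existing' */
    int result = existing;

    free(fresh);
    return result;
}
```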



