
It's way worse than an extra byte, or pointers offset by one byte past the length: it also means you need a whole copy of every substring, each with its own length field, and you can't tokenize in place.

C code gets into just as much trouble with length-delimited data structures as it does with ASCIIZ; ASCIIZ is a red herring. People have declared over and over again that it's the single worst decision in C and the cause of every buffer overflow. But if you look over the past 10 years or so, memcpy() has caused just as much chaos, and we're just as likely to find an overflow or integer mistake in binary protocols (where NUL-termination is nonsensical) as we are in ASCII protocols.

"Leaving the cleanup to the OS" works everywhere, on every modern system, and lots of programs would benefit from shedding a lot of useless bookkeeping and just treating the process address space as an arena to be torn down all at once. But I think the point the author was trying to make is, when you code that way, you make it impossible to hoist your code into another program as a module. Which is true; if it's likely you're writing a library as well as a program, you don't get to take the easy way out.

You can still write a 100 line arena allocator to pretend like you can, though. :)
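
For the curious, a minimal sketch of what such an arena might look like, assuming C11 for max_align_t; all the names (arena, arena_init, arena_alloc, arena_free) are made up for illustration, not taken from any particular library:

    /* Bump-pointer arena: one malloc up front, one free at the end. */
    #include <stddef.h>
    #include <stdlib.h>

    typedef struct {
        char  *base;   /* backing block */
        size_t used;   /* bytes handed out so far */
        size_t cap;    /* size of the backing block */
    } arena;

    static int arena_init(arena *a, size_t cap) {
        a->base = malloc(cap);
        a->used = 0;
        a->cap  = cap;
        return a->base ? 0 : -1;
    }

    static void *arena_alloc(arena *a, size_t n) {
        size_t align = _Alignof(max_align_t);
        size_t off = (a->used + align - 1) & ~(align - 1);
        if (off > a->cap || n > a->cap - off)
            return NULL;                /* arena exhausted */
        a->used = off + n;
        return a->base + off;           /* note: no per-object free */
    }

    static void arena_free(arena *a) {
        free(a->base);                  /* everything dies at once */
        a->base = NULL;
        a->used = a->cap = 0;
    }

    int main(void) {
        arena a;
        if (arena_init(&a, 1 << 20) != 0) return 1;
        char *buf = arena_alloc(&a, 64);  /* lives as long as the arena */
        (void)buf;
        arena_free(&a);                   /* teardown all at once, on demand */
        return 0;
    }

Everything allocated from it shares the arena's lifetime, which is the "address space as an arena" idea scaled down to something a library can own.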




I partially agree with you, but in a different way. I feel that the real problem is the OS doesn't give code access to its own internal accounting of allocated memory. It already knows the size of any heap chunk you make, so why can't we ask it? In most C code we're carrying around either a null terminator (which can get clobbered) or a whole integer for the size.

Instead, there should be a way to ask the OS "how big is the crap this pointer is pointing at?" and get a valid answer. Other useful questions would be "how far inside the chunk pointed at by X is the pointer Y?" or "will pointing Y at J inside X cause an error?"

And it wouldn't even need to be the OS, just the allocator, probably with a few macros on top. But for now I have to show people how to write bug-resistant C code, and this is the best way so far.
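
Some allocators do expose exactly this, as a nonstandard extension. A sketch against glibc's malloc_usable_size() (Windows has _msize(), macOS has malloc_size(); none of this is portable C), with the caveat that the answer is the size of the chunk, not of your object:

    #include <malloc.h>   /* malloc_usable_size(), glibc-specific */
    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        char *p = malloc(10);
        if (!p) return 1;
        /* Typically prints a number larger than 10: the allocator
           rounds requests up to its own chunk sizes. */
        printf("asked for 10, chunk holds %zu bytes\n",
               malloc_usable_size(p));
        free(p);
        return 0;
    }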


Part of the problem here is that the allocator doesn't need to know how big the crap the pointer is pointing at is; it only needs to know that the crap is smaller than the chunk it allocated.

If you're going to teach people something unorthodox about C programming, writing custom allocators would probably be a great one. In more than one job I've crushed optimization problems on code written by people way smarter than me simply by knowing to profile allocation and replace malloc.
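
In that spirit, the crudest useful first step is a counting wrapper around malloc, before reaching for a real profiler; the names xmalloc and alloc_report are hypothetical:

    #include <stdio.h>
    #include <stdlib.h>

    static unsigned long n_allocs, n_bytes;

    /* Funnel allocations through one choke point and count them. */
    void *xmalloc(size_t n) {
        n_allocs++;
        n_bytes += n;
        return malloc(n);
    }

    void alloc_report(void) {
        fprintf(stderr, "%lu allocations, %lu bytes\n", n_allocs, n_bytes);
    }

    int main(void) {
        for (int i = 0; i < 1000; i++)
            free(xmalloc(32));
        alloc_report();   /* "1000 allocations, 32000 bytes" */
        return 0;
    }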


Hmm. The question "what is the size of memory that x points to?" is cheap to figure out, because free needs to do it anyway. You couldn't use a macro to do it - it would need to access the internal data structures of the allocator - but it's easy to do. The other questions could be macros that called the first function.

What are the use cases for these functions? What bugs would they prevent?


Worth pointing out again: the size of the chunk allocated for a particular data structure does not give you the precise bounds of the data structure; odds are, the chunk is slightly larger than the structure.


I wrote a response explaining why, if you know x then you must know y, but then I realized you were talking about knowing y and learning x. Yes, I agree. I'm not sure which context (knowing the actual size of the memory chunk, which is easy, or knowing the used size of the memory chunk, which is not easy) Zed was talking about.


That's a hell of a good idea. As somebody pointed out, it might not stop you from corrupting neighboring items in an array or structure, but it would let you find the size of an array that was allocated on its own, AND it would stop you from corrupting the heap itself!


If you're concerned about corrupting the heap, use an allocator hardened against heap corruption. The default WinAPI allocator, even for optimized production code, is hardened that way. Userland code doesn't need to do anything to get the feature, which is as it should be, because people who write userland code don't know enough to defend the heap against memory corruption.


I would happily trade C ASCIIZ strings for Pascal/Perl/Java-style strings with an out-of-band length, even at the cost of those edge cases. Especially if there were a way to intern immutable string data and share the bytes of common fragments. (This of course doesn't work well if you plan on modifying the string data.)


So make the trade. I'm sorry, I can see I'm communicating some kind of disdain for alternate string representations, but every C programmer I know --- every single one of them --- has used some form of counted string at some point.

I'm just saying there's a reason the default in C is ASCIIZ. Most of what you do with strings is lightweight; compare 'em, search 'em, tokenize 'em, copy 'em. For that 80% of use cases, ASCIIZ is superior.
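
Tokenizing is the clearest case: POSIX strtok_r() works in place, overwriting each delimiter with a NUL and handing back pointers into the original buffer, with no copies and no per-token length bookkeeping:

    #include <stdio.h>
    #include <string.h>

    int main(void) {
        char line[] = "GET /index.html HTTP/1.0";
        char *save = NULL, *tok;
        /* Each token is a pointer into line[] itself; a counted-string
           representation would need a copy, or a (pointer, length)
           pair, for every token. */
        for (tok = strtok_r(line, " ", &save); tok != NULL;
             tok = strtok_r(NULL, " ", &save))
            printf("token: %s\n", tok);
        return 0;
    }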

Should ANSI C libc provide a heavyweight counted-string alternative? Sure, I think so; in fact, it's possible that the only reason it doesn't is that it would take 300 years to resolve all the disputes about exactly what such a library should look like, since every professional C programmer has their own by now.
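
For what it's worth, here is roughly the shape most of those private libraries converge on: length up front, bytes inline via a C99 flexible array member, and a trailing NUL kept so the result still interoperates with plain libc. The names cstr and cstr_new are made up:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    typedef struct {
        size_t len;
        char   data[];   /* flexible array member: bytes live inline */
    } cstr;

    cstr *cstr_new(const char *s) {
        size_t len = strlen(s);
        cstr *c = malloc(sizeof *c + len + 1);
        if (c == NULL) return NULL;
        c->len = len;
        memcpy(c->data, s, len + 1);   /* copy the trailing NUL too */
        return c;
    }

    int main(void) {
        cstr *c = cstr_new("hello");
        if (!c) return 1;
        printf("%zu bytes: %s\n", c->len, c->data);
        free(c);
        return 0;
    }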


Fair enough. Started something like that, not sure I'll ever finish it, though :-(

https://github.com/roboprog/buzzard/blob/master/bzrt/src/bzr...



