
It's way worse than an extra byte, or pointers offset by one byte past the length: it also means you need a whole copy of every substring, each with its own length field, and you can't tokenize in place.

C code gets into just as much trouble with length-delimited data structures as it does with ASCIIZ; ASCIIZ is a red herring. People have declared over and over again that it's the single worst decision in C and the cause of every buffer overflow. But if you look over the past 10 years or so, memcpy() has caused just as much chaos, and we're just as likely to find an overflow or integer mistake in binary protocols (where NUL-termination is nonsensical) as we are in ASCII protocols.

"Leaving the cleanup to the OS" works everywhere, on every modern system, and lots of programs would benefit from shedding a lot of useless bookkeeping and just treating the process address space as an arena to be torn down all at once. But I think the point the author was trying to make is, when you code that way, you make it impossible to hoist your code into another program as a module. Which is true; if it's likely you're writing a library as well as a program, you don't get to take the easy way out.

You can still write a 100 line arena allocator to pretend like you can, though. :)
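
For the curious, a minimal sketch of what such an arena might look like, assuming C11 for max_align_t; all the names (arena, arena_init, arena_alloc, arena_free) are made up for illustration, not taken from any particular library:

    /* Bump-pointer arena: one malloc up front, one free at the end. */
    #include <stddef.h>
    #include <stdlib.h>

    typedef struct {
        char  *base;   /* backing block */
        size_t used;   /* bytes handed out so far */
        size_t cap;    /* size of the backing block */
    } arena;

    static int arena_init(arena *a, size_t cap) {
        a->base = malloc(cap);
        a->used = 0;
        a->cap  = cap;
        return a->base ? 0 : -1;
    }

    static void *arena_alloc(arena *a, size_t n) {
        size_t align = _Alignof(max_align_t);
        size_t off = (a->used + align - 1) & ~(align - 1);
        if (off > a->cap || n > a->cap - off)
            return NULL;                /* arena exhausted */
        a->used = off + n;
        return a->base + off;           /* note: no per-object free */
    }

    static void arena_free(arena *a) {
        free(a->base);                  /* everything dies at once */
        a->base = NULL;
        a->used = a->cap = 0;
    }

    int main(void) {
        arena a;
        if (arena_init(&a, 1 << 20) != 0) return 1;
        char *buf = arena_alloc(&a, 64);  /* lives as long as the arena */
        (void)buf;
        arena_free(&a);                   /* teardown all at once, on demand */
        return 0;
    }

Everything allocated from it shares the arena's lifetime, which is the "address space as an arena" idea scaled down to something a library can own.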




I partially agree with you, but in a different way. I feel that the real problem is the OS doesn't give code access to its own internal accounting of allocated memory. It already knows the size of any heap chunk you make, so why can't we ask it? In most C code we're carrying around either a null terminator (which can get clobbered) or a whole integer for the size.

Instead, there should be a way to ask the OS "how big is the crap this pointer is pointing at?" and get a valid answer. Other useful questions would be "how far inside the chunk pointed at by X is the pointer Y?" or "will pointing Y at J inside X cause an error?"

And it wouldn't even need to be the OS, just the allocator, probably with a few macros on top. But for now I have to show people how to write bug-resistant C code, and this is the best way so far.
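
Some allocators do expose exactly this, as a nonstandard extension. A sketch against glibc's malloc_usable_size() (Windows has _msize(), macOS has malloc_size(); none of this is portable C), with the caveat that the answer is the size of the chunk, not of your object:

    #include <malloc.h>   /* malloc_usable_size(), glibc-specific */
    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        char *p = malloc(10);
        if (!p) return 1;
        /* Typically prints a number larger than 10: the allocator
           rounds requests up to its own chunk sizes. */
        printf("asked for 10, chunk holds %zu bytes\n",
               malloc_usable_size(p));
        free(p);
        return 0;
    }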


Part of the problem here is that the allocator doesn't need to know how big the crap the pointer is pointing at is; it only needs to know that the crap is smaller than the chunk it allocated.

If you're going to teach people something unorthodox about C programming, writing custom allocators would probably be a great one. In more than one job I've crushed optimization problems on code written by people way smarter than me simply by knowing to profile allocation and replace malloc.
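
In that spirit, the crudest useful first step is a counting wrapper around malloc, before reaching for a real profiler; the names xmalloc and alloc_report are hypothetical:

    #include <stdio.h>
    #include <stdlib.h>

    static unsigned long n_allocs, n_bytes;

    /* Funnel allocations through one choke point and count them. */
    void *xmalloc(size_t n) {
        n_allocs++;
        n_bytes += n;
        return malloc(n);
    }

    void alloc_report(void) {
        fprintf(stderr, "%lu allocations, %lu bytes\n", n_allocs, n_bytes);
    }

    int main(void) {
        for (int i = 0; i < 1000; i++)
            free(xmalloc(32));
        alloc_report();   /* "1000 allocations, 32000 bytes" */
        return 0;
    }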


Hmm. The question "what is the size of memory that x points to?" is cheap to figure out, because free needs to do it anyway. You couldn't use a macro to do it - it would need to access the internal data structures of the allocator - but it's easy to do. The other questions could be macros that called the first function.

What are the use cases for these functions? What bugs would they prevent?


Worth pointing out again: the size of the chunk allocated for a particular data structure does not give you the precise bounds of the data structure; odds are, the chunk is slightly larger than the structure.


I wrote a response explaining why, if you know x then you must know y, but then I realized you were talking about knowing y and learning x. Yes, I agree. I'm not sure which context (knowing the actual size of the memory chunk, which is easy, or knowing the used size of the memory chunk, which is not easy) Zed was talking about.


That's a hell of a good idea. As somebody pointed out, it might not stop you from corrupting neighboring items in an array or structure, but it would let you find the size of an array that was allocated on its own, AND it would stop you from corrupting the heap itself!


If you're concerned about corrupting the heap, use an allocator hardened against heap corruption. The default WinAPI allocator, even for optimized production code, is hardened that way. Userland code doesn't need to do anything to get the feature, which is as it should be, because people who write userland code don't know enough to defend the heap against memory corruption.


I would happily trade C ASCIIZ strings for Pascal/Perl/Java-style strings with an out-of-band length, even at the cost of those edge cases. Especially if there were a way to intern immutable string data and share the bytes of common fragments. (This of course doesn't work well if you plan on modifying the string data.)


So make the trade. I'm sorry, I can see I'm communicating some kind of disdain for alternate string representations, but every C programmer I know --- every single one of them --- has used some form of counted string at some point.

I'm just saying there's a reason the default in C is ASCIIZ. Most of what you do with strings is lightweight; compare 'em, search 'em, tokenize 'em, copy 'em. For that 80% of use cases, ASCIIZ is superior.
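
Tokenizing is the clearest case: POSIX strtok_r() works in place, overwriting each delimiter with a NUL and handing back pointers into the original buffer, with no copies and no per-token length bookkeeping:

    #include <stdio.h>
    #include <string.h>

    int main(void) {
        char line[] = "GET /index.html HTTP/1.0";
        char *save = NULL, *tok;
        /* Each token is a pointer into line[] itself; a counted-string
           representation would need a copy, or a (pointer, length)
           pair, for every token. */
        for (tok = strtok_r(line, " ", &save); tok != NULL;
             tok = strtok_r(NULL, " ", &save))
            printf("token: %s\n", tok);
        return 0;
    }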

Should ANSI C libc provide a heavyweight counted-string alternative? Sure, I think so; in fact, it's possible that the only reason it doesn't is that it would take 300 years to resolve all the disputes about exactly what such a library should look like, since every professional C programmer has their own by now.
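
For what it's worth, here is roughly the shape most of those private libraries converge on: length up front, bytes inline via a C99 flexible array member, and a trailing NUL kept so the result still interoperates with plain libc. The names cstr and cstr_new are made up:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    typedef struct {
        size_t len;
        char   data[];   /* flexible array member: bytes live inline */
    } cstr;

    cstr *cstr_new(const char *s) {
        size_t len = strlen(s);
        cstr *c = malloc(sizeof *c + len + 1);
        if (c == NULL) return NULL;
        c->len = len;
        memcpy(c->data, s, len + 1);   /* copy the trailing NUL too */
        return c;
    }

    int main(void) {
        cstr *c = cstr_new("hello");
        if (!c) return 1;
        printf("%zu bytes: %s\n", c->len, c->data);
        free(c);
        return 0;
    }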


Fair enough. Started something like that, not sure I'll ever finish it, though :-(

https://github.com/roboprog/buzzard/blob/master/bzrt/src/bzr...



