
I think that structure will be helpful, and you can also easily pass substrings. (With null-terminated strings, you can easily cut off some characters from the beginning of the string, but not from the end. Passing the length and the pointer together lets you easily do both, and also allows passing strings that contain null characters.) I think that this is better than Pascal strings, because you can make substrings like this.

(Also, the structure can be passed by value (without a pointer to the structure), which makes it easy to use in other functions, e.g. to convert a null-terminated string or Pascal string into this structure without allocating additional memory.)
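To make the pointer-plus-length idea concrete, here is a minimal sketch in C. The names `struct bytes`, `bytes_from_cstr`, and `bytes_sub` are hypothetical and chosen only for illustration; the thread does not define an API, so treat this as one possible shape rather than the structure under discussion.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical type: a non-owning view of bytes (pointer plus length). */
struct bytes {
    const char *ptr;
    size_t len;
};

/* Wrap a null-terminated string; nothing is copied or allocated. */
static inline struct bytes bytes_from_cstr(const char *s) {
    assert(s != NULL);
    struct bytes b = { s, strlen(s) };
    return b;
}

/* View of up to `count` bytes starting at `start`, clamped to the source.
 * Trimming the end is just a smaller length; the buffer is not modified. */
static inline struct bytes bytes_sub(struct bytes b, size_t start, size_t count) {
    if (start > b.len) start = b.len;
    if (count > b.len - start) count = b.len - start;
    struct bytes r = { b.ptr + start, count };
    return r;
}
```

With this sketch, `bytes_sub(bytes_from_cstr("hello, world"), 3, 5)` is a view of the five bytes `lo, w`, and it can be printed with `printf("%.*s", (int)v.len, v.ptr)`. The view is passed and returned by value, and it is only valid while the underlying buffer lives.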




It boggles my mind that the C standard hasn’t officially added support for this. It seems like such a small change that would dramatically improve the quality of C code.


It would not be a small change if you wanted to actually make them usable for the standard library, i.e. add a second 'struct bytes' variant of EVERY function that currently accepts a NUL-terminated string.

No, just calling a conversion function before (that would need heap allocation) would never be accepted by C programmers (for the overhead).

Then there is inertia - how many really would want to port their application to a different string type? Not to mention, all the libraries you're using would also have to be converted.


You can make the read-only conversion "free" by storing the size of the string and also terminating the actual string contents with a NUL. So the string "bar" would be { length: 3, contents: ['b', 'a', 'r', '\0'] }. All your functions dealing with the "rich" string type work the same using the length, except they have to be aware of the need to preserve the terminating 0, and if you want to pass a read-only string to a legacy function you can just pass its contents.

Of course that also generates a giant foot gun because you might manage to get the old and new strlen to disagree on the size, because one reads the length field and the other searches the \0.
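A small sketch of that layout and of the foot gun, in C. Everything here (`struct rich_str`, the fixed 16-byte capacity, the helper names) is an assumption made for illustration; a real implementation would more likely use a heap buffer or a flexible array member.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical "rich" string: explicit length plus NUL-terminated contents. */
struct rich_str {
    size_t len;        /* number of bytes before the terminator */
    char contents[16]; /* fixed capacity keeps the sketch allocation-free */
};

static inline struct rich_str rich_from_cstr(const char *s) {
    struct rich_str r = {0};
    size_t n = strlen(s);
    assert(n < sizeof r.contents);   /* leave room for the terminator */
    r.len = n;
    memcpy(r.contents, s, n + 1);    /* copy the bytes and the '\0' */
    return r;
}

/* Shorten the string while preserving the terminator invariant. */
static inline void rich_truncate(struct rich_str *r, size_t n) {
    if (n < r->len) {
        r->len = n;
        r->contents[n] = '\0';
    }
}

/* The foot gun: do the length field and strlen() still agree? */
static inline int rich_lengths_agree(const struct rich_str *r) {
    return strlen(r->contents) == r->len;
}
```

Passing `r.contents` to a legacy function costs nothing, since it is already NUL-terminated. But as soon as a NUL byte ends up inside the contents (for example `r.contents[1] = '\0'`), `strlen` reports 1 while the length field still says 3, and `rich_lengths_agree` returns 0.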


Inertia is no excuse. If something like SDS were adopted by WG14, C applications could eventually be migrated to it, slowly.

As it is, it will never happen.


SDS is cool, I use it in my projects and have also extended it further with some new features... but I worry about the amount of dynamic allocation going on, it would be nice to have an alternative solution such as using stack or pool allocations, or being able to declare SDS string literals at compile-time (probably only possible in C++). Every time I notice that I need to replace a const char* with an SDS I just know I'm slowing things down and adding more complexity.


It doesn't have to be 1:1 equal to SDS. The point is to at least have WG14 do something, anything at all, instead of continuing to push the agenda of C being a Swiss cheese of security exploits, to the point that everyone is adopting hardware memory tagging as the ultimate mitigation.


For some reason C seems to be going through standard updates as often as C++ now. Not only that, they've added absolutely huge stuff like threading primitives as well as an insane generic macro thing.

Surely they could add better designed structures and functions for dealing with memory.


Is that really so big a deal, considering they've done that multiple times already anyway? How many versions of strcpy, printf, and other string handling functions are there already?


"I mean it's one mutable string type Michael, what could it take to standardize, 48 pages not including the allocator model?"


Copy C++. The underlying string buffer must be null terminated, but still carry its length.


They were afraid of the C++ situation where there are 3-4 different string implementations in every large project.


Hence why something like SDS should be part of the standard.


Nul termination is not only in C but also in some file formats. It has some advantages - apart from modest space savings for short strings, it means that you can read a string from some given location to the end - without any out of band data (length) that necessarily has to be stored in a different, agreed on location. This is a very valuable property.

I'm not saying don't use length fields, I'm saying use nul terminators where possible and use length fields where needed. And, they are not mutually exclusive.

And C doesn't need a standardized length delineated string structure in my opinion. Nul terminators serve the job fine for most Standard APIs (which take only short strings), and can receive length fields as separate function parameters where required.


A case against:

• On the most widely used architectures, reading a string is much easier if the string is a known length. x86 has its string instructions, ARM has its Load Multiple instructions.

• Even with length-prefixed strings, many uses of short strings are with string literals and so the length does not need to be stored anywhere.
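The second bullet can be illustrated in C: for a string literal the compiler already knows the size, so a pointer-plus-length value can be built without calling `strlen` at run time. The `struct bytes` type and the `BYTES_LIT` macro below are hypothetical names used only for this sketch, which assumes C11 for `static_assert`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type: a non-owning view of bytes (pointer plus length). */
struct bytes {
    const char *ptr;
    size_t len;
};

/* Hypothetical macro. sizeof on a string literal counts the terminating
 * '\0', so subtract 1. The "" s "" concatenation makes the macro fail to
 * compile for anything that is not a string literal (e.g. a char pointer). */
#define BYTES_LIT(s) ((struct bytes){ "" s "", sizeof(s) - 1 })

/* The length is a compile-time constant; nothing is scanned at run time. */
static_assert(sizeof("bar") - 1 == 3, "literal length known at compile time");
```

With this sketch, `BYTES_LIT("bar").len` is 3, and `BYTES_LIT("a\0b").len` is also 3, because `sizeof` counts the embedded NUL rather than stopping at it.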


I agree. We're not disagreeing :)



