> It's inherent to the language. Writing "string" gives you a NUL-terminated str...

nine_k · on Aug 27, 2022

Pascal strings can be improved to support strings of any reasonable length.

Say the two highest bits of the counter's first byte set the size of the counter field: 00 = 6 remaining bits (1 byte), 01 = 14 bits (2 bytes), 10 = 30 bits (4 bytes), 11 = 62 bits (8 bytes).

A simple `*counter & 0x3f` on the first byte would remove the width-setting bits, without any shifts, additions, etc.

This allows small strings to use only 1 byte for the counter, while allowing huge strings that span the entire RAM.
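
As a rough sketch of that scheme in C: the names `lp_encode` and `lp_decode` are made up for illustration, and the big-endian byte order (tag bits in the first byte) is an assumption, since the layout of the wider counters isn't pinned down above. The `0x3f` mask alone recovers the length only in the 1-byte form; the wider forms still have to fold in the remaining bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode the length prefix at p; store the prefix width (1, 2, 4 or 8) in *width. */
static uint64_t lp_decode(const unsigned char *p, size_t *width)
{
    assert(p != NULL && width != NULL);
    size_t n = (size_t)1 << (p[0] >> 6);   /* tag 00,01,10,11 -> 1,2,4,8 bytes */
    uint64_t len = p[0] & 0x3f;            /* mask off the two width-setting bits */
    for (size_t i = 1; i < n; i++)
        len = (len << 8) | p[i];           /* remaining bytes, big-endian (assumed) */
    *width = n;
    return len;
}

/* Encode len into out (room for 8 bytes); return the number of bytes written,
 * or 0 if len does not fit in 62 bits. */
static size_t lp_encode(uint64_t len, unsigned char *out)
{
    assert(out != NULL);
    unsigned tag;
    if (len < (1ull << 6))       tag = 0;
    else if (len < (1ull << 14)) tag = 1;
    else if (len < (1ull << 30)) tag = 2;
    else if (len < (1ull << 62)) tag = 3;
    else return 0;

    size_t n = (size_t)1 << tag;
    for (size_t i = 0; i < n; i++)
        out[i] = (unsigned char)(len >> (8 * (n - 1 - i)));
    out[0] |= (unsigned char)(tag << 6);   /* top two bits are free, so OR in the tag */
    return n;
}
```

For example, a length of 300 would encode as the two bytes `0x41 0x2c` under these assumptions.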

Phelinofist · on Aug 27, 2022

How would that work with untrusted strings? As I understood TFA, strlen() is an issue if the string is not NUL-terminated.

matheusmoreira · on Aug 27, 2022

That's correct. Using strlen on anything but C string literals is just asking for bugs. The thing is untrusted strings don't come from C itself, they come from I/O.

The kernel has perfectly reasonable I/O interfaces.

  ssize_t bytes_read    =  read(file_descriptor, buffer, size);
  ssize_t bytes_written = write(file_descriptor, buffer, size);

You always know the length.

Well... At least you would always know the length if the standard C library didn't abstract that perfectly good interface away behind stdio just so it could do buffering and return NUL-terminated strings.

It's just like errno. The kernel simply returns a negated error constant on failure. The C standard library takes that sane interface and turns it into a thread-local global variable.
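
To make both points concrete, here is a minimal sketch on top of libc's `read()`. The `struct span` type and the `read_span` helper are hypothetical names, not an existing API: the wrapper keeps the length next to the bytes instead of relying on a terminator, and folds `errno` back into a negated return value in the kernel's style.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical length-carrying view of the bytes just read; no NUL needed. */
struct span {
    const unsigned char *data;
    size_t len;
};

/* Hypothetical wrapper: returns the number of bytes read (>= 0),
 * or a negated errno value on failure, kernel-style. */
static ssize_t read_span(int fd, unsigned char *buf, size_t cap, struct span *out)
{
    assert(buf != NULL && out != NULL);
    ssize_t n;
    do {
        n = read(fd, buf, cap);
    } while (n < 0 && errno == EINTR);   /* retry if interrupted by a signal */
    if (n < 0)
        return -errno;                   /* fold errno back into the return value */
    out->data = buf;
    out->len = (size_t)n;                /* the length is known; embedded NULs are fine */
    return n;
}
```

The caller checks a single return value and never has to call `strlen` on the buffer.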

Gibbon1 · on Aug 27, 2022

You could add real strings and arrays to C and kill POSIX and you'd have two fewer problems.

JdeBP · on Aug 27, 2022

No. You would have extra problems, and not just the maintenance issues along the migration path that Linus Torvalds pointed out. One of them is the poor thinking generally involved, such as people thinking that the way to address a problem, in a discussion where the POSIX API for I/O has just been called reasonable, is to "kill POSIX".

astrobe_ · on Aug 27, 2022

Apparently at some point people forgot that if you don't like an API, library or interface, you can just put a wrapper on top of it. Many libraries are just "toolkits" after all; you are not supposed to use the raw API everywhere; e.g. if you don't quickly stop doing that with the BSD sockets API, you are part of the problem.

matheusmoreira · on Aug 27, 2022

> if you don't like an API, library or interface, you can just put a wrapper on top of it

We can also simply get rid of all that bloat and just use the system calls directly.

> you are not supposed to use the raw API everywhere

Linux system calls are a stable interface and the entry points are even programming language agnostic. It's okay to use them directly.

> e.g. if you don't quickly stop doing that with the BSD sockets API, you are part of the problem

Yeah it's not a good idea on other operating systems since the system call interfaces are unstable. We have to use their C libraries on those platforms.

morelisp · on Aug 27, 2022

> Apparently at some point people forgot that if you don't like an API, library or interface, you can just put a wrapper on top of it.

If I were to survey the state of modern software development and try to characterize the skills lost compared to decades past, "not enough wrappers" would be nowhere on my list.

astrobe_ · on Aug 28, 2022

Ok, that was a bit overstated: s/some point people forgot/some people forget/. I don't know what the common practice was a few decades ago, because source code was not as visible as it is today, but in my experience what you see on GitHub is not what the average developer does. GitHub advertises itself as a social coding network; just like with other social networks, there is a selection bias regarding what is posted.

astrobe_ · on Aug 27, 2022

> That's correct. Using strlen on anything but C string literals is just asking for bugs. The thing is untrusted strings don't come from C itself, they come from I/O.

Indeed, it is just an input check/"sanitization" issue - just like one carefully checks that a JSON or XML input is well formed, if a protocol spec says that some part is an ASCIIZ string, one has to check that there's indeed a zero byte before the end of the data packet.
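
A minimal sketch of that check in C, assuming the field starts at `field` and `avail` bytes of the packet remain from that point (`asciiz_len` is a made-up helper name). `memchr` never looks past `avail`, unlike `strlen`; POSIX `strnlen` would be another way to bound the scan.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the string length if a NUL byte occurs within the first `avail`
 * bytes of `field`, or -1 if the field is unterminated. */
static long asciiz_len(const unsigned char *field, size_t avail)
{
    assert(field != NULL || avail == 0);
    if (avail == 0)
        return -1;                        /* nothing left, so no terminator */
    const unsigned char *nul = memchr(field, 0, avail);
    return nul ? (long)(nul - field) : -1;
}
```

Only after this returns a non-negative value is it safe to hand the field to code that expects a C string.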

silon42 · on Aug 27, 2022

The 'discarding const' is quite a problem if you try to write real code like this.

matheusmoreira · on Aug 27, 2022

I know. That string literal is likely to be located in a read only page. In real code, I'd have to allocate some memory and copy the text to the new location if I want the resulting structure to be writable. For clarity's sake I omitted these details.

This isn't unique to my example though. Traditional C strings have the exact same problem and they do get copied all the time.
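
A minimal sketch of that copy step, assuming a hypothetical pointer-plus-length `struct str` (the names `str_from_bytes` and `STR_LIT` are likewise made up for illustration). The literal stays `const`; only the heap copy is writable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical length-carrying string that owns writable heap memory. */
struct str {
    char *data;
    size_t len;
};

/* Copy `len` bytes from read-only text into fresh heap memory.
 * Returns 0 on success, -1 if allocation fails. */
static int str_from_bytes(struct str *s, const char *text, size_t len)
{
    assert(s != NULL && text != NULL);
    char *p = malloc(len ? len : 1);      /* avoid a zero-size allocation */
    if (p == NULL)
        return -1;
    memcpy(p, text, len);                 /* no NUL terminator is stored */
    s->data = p;
    s->len = len;
    return 0;
}

/* Only valid for string literals or arrays: sizeof counts the trailing NUL. */
#define STR_LIT(s, lit) str_from_bytes((s), (lit), sizeof(lit) - 1)
```

The caller owns `data` and releases it with `free` when done.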