
The #1 problem with C is buffer overflows. The solution is pretty simple:

https://www.digitalmars.com/articles/C-biggest-mistake.html

and does not break existing code.




For strings, you need more than the allocated length: you need the current length too. Otherwise you end up with:

- O(n) algorithms that are constantly scanning strings looking for a zero byte

- strings can't contain zero bytes

- strings have to contain one zero byte

- putting a zero byte in the middle of a string chops it off

- probably other nonsense I haven't thought of

Please don't adopt another half-assed solution just because it fits more easily into C's existing set of crap. That's how we got fake strings in the first place.
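
To make the list above concrete, here is a minimal sketch in today's C (the struct and names are illustrative, not taken from the linked proposal) of a pointer-plus-length string: the current length is known in O(1), and an embedded zero byte is just another byte.

    #include <stddef.h>

    /* Illustrative pointer+length string; not any particular proposal. */
    typedef struct {
        const char *ptr;
        size_t      len;   /* current length, available in O(1) */
    } str_t;

    int main(void) {
        /* An embedded zero byte does not chop the string off. */
        static const char data[] = { 'a', 'b', '\0', 'c' };
        str_t s = { data, sizeof data };   /* len == 4, no scanning */
        (void)s;
        return 0;
    }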


This solution has been in D for 20 years. It works very well. It is fully assed.

It is very, very rare to see a buffer overflow in D because the use of these arrays is so easy, convenient, and robust.

Not only does it virtually eliminate buffer overflows (when used), it is more efficient than 0 terminated strings. It does not need to scan the strings, nor does it need to load the string into memory to determine its length.

I understand your concerns about mixing it up with 0 terminated strings. They are real, but have not been a particular problem in practice. What happens is one simply moves away from using 0 terminated strings. A zero terminated string can be converted to a length-based one with:

    a = s[0 .. strlen(s)];
Going the other way requires a memory allocation similar to what strdup() does.
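
In plain C terms, going the other way might look like this sketch (a hypothetical helper working on a pointer/length pair rather than the proposed syntax):

    #include <stdlib.h>
    #include <string.h>

    /* Hypothetical: make a 0-terminated copy of a pointer+length string. */
    char *to_cstring(const char *ptr, size_t len) {
        char *p = malloc(len + 1);   /* one extra byte for the terminator */
        if (p) {
            memcpy(p, ptr, len);
            p[len] = '\0';
        }
        return p;                    /* caller frees, as with strdup() */
    }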


I read your article before posting, and it says:

-----

void foo(char a[..])

meaning an array is passed as a so-called “fat pointer”, i.e. a pair consisting of a pointer to the start of the array, and a size_t of the array dimension.

-----

I didn't see a "current length" mentioned. Is it there? Can I have a string with an allocated length of 20 bytes and a current length of 10 bytes, without looking for a zero byte?


The capacity value is not part of it.

This proposal is not about memory management any more than 0 termination is about memory management. It is just about finding the end of the array.


The CHERI extensions for the ARM architecture allow for compilers that achieve this effect by making all pointers "fat", with bounds, and doing pervasive hardware bounds checks. They've been playing with FPGA versions and emulators for a while, but the first actual SoCs just got shipped: https://www.theregister.com/2022/01/21/arm_morello_testing/ -- software for it includes tweaked versions of BSD and I think Linux which use bounds-checked pointers throughout, including in the kernel.


Making them all fat doesn't fix code that uses strlen().


Not quite sure what you mean. All it can do is turn undefined behavior on an out-of-bounds reference into a segfault, but if that ends up turning an RCE vulnerability into something less severe, it's still an improvement -- at least from that perspective. The intent is for the hardware-checked bounds to be very close to the declared bounds for arrays in source code, at least in cases most typically subject to buffer overflow; see https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-947.pdf

(And I'm not sure how the alternative fat-pointer proposal you mentioned does better. Searched that page for references to strlen, but didn't find much.)


Most arrays do not have declared bounds; that's why a runtime check is necessary.


Bounds for storage are known when it is allocated. CHERI puts those bounds into the pointers (which double in size so they can fit), and has the processor do the runtime checks at every dereference. So, same effect as the proposal you reference, AFAICS -- just without altered syntax.


I'm fairly new to C, am I understanding correctly that the new syntax is just sugar for a struct containing two values: a pointer to the start of the array, and its length? This can of course be done without the new syntax, and it seems exceedingly useful. Are such structs commonly used in C projects?

What is the actual source of bugs in this regard? How does passing the length as a separate parameter lead to more bugs than having it bundled -- is the main source of error passing the wrong variable?
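
Something like this sketch is what I picture (the names are just illustrative):

    #include <stddef.h>

    /* Hand-rolled "fat pointer": a pointer plus the number of elements. */
    typedef struct {
        int    *ptr;
        size_t  len;
    } int_slice;

    /* The length travels with the pointer instead of as a separate parameter. */
    int sum(int_slice a) {
        int total = 0;
        for (size_t i = 0; i < a.len; i++)
            total += a.ptr[i];
        return total;
    }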


If the length is bundled with the pointer as syntax, then the compiler and maybe even the runtime can provide checking on behalf of the programmer. Passing it by hand means it's the programmer's responsibility to (remember to) validate it.


> How does passing the length as a separate parameter lead to more bugs than having it bundled

It boils down to being inconvenient, unreliable, error prone, and difficult to audit. That's why it isn't used and C's #1 problem remains buffer overflows. And so it goes for all the other solutions for C for this problem, except my proposal.

My proposal is how D works, and it's been convenient, reliable, robust and auditable for 20 years. You can still use raw pointers in D, they are fully supported, but the use of the arrays make use of raw pointers rare.


>This can of course be done without the new syntax, and it seems exceedingly useful. Are such structs commonly used in C projects?

There are string libraries that work that way in C.

If you do it manually, without a lib, then you need to check that the length is valid yourself (after every operation), so it's not as useful as a language with first class support for it.


Presumably you can add an inlined array access function or macro in the header to do this for you.
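
For example, a header-only checked accessor might be a sketch like this (assuming a hand-rolled pointer+length struct; aborting on an out-of-range index is just one possible policy):

    #include <stdio.h>
    #include <stdlib.h>
    #include <stddef.h>

    /* A hand-rolled slice, as discussed upthread. */
    typedef struct { int *ptr; size_t len; } int_slice;

    /* Bounds-checked element access; an out-of-range index aborts. */
    static inline int slice_get(int_slice a, size_t i) {
        if (i >= a.len) {
            fprintf(stderr, "index %zu out of range (len %zu)\n", i, a.len);
            abort();
        }
        return a.ptr[i];
    }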


If that worked, people would have done it, and C buffer overflows would be a thing of the past.


The reason this hasn't been done is mostly because C programmers have an allergy to runtime checks that might slow their programs by even single digit percentages.


People don't use these other schemes because they are clumsy, inconvenient, look bad, and have never caught on.

With it as part of the syntax, it becomes natural to use them. I'm not making this up, it is based on extensive experience.

The runtime overflow checks can be turned on and off with a compiler switch, so it becomes trivial to see what performance effect it actually has. Critical loops can be coded with ordinary pointers as necessary. For the rest, the performance effect is not measurable.

Again, this is from experience, not supposition.



Neat idea. What does implementing new syntax in one of the established C compilers involve? Is it the kind of thing that could be reasonably tackled in a small patch just to play with?


It wouldn't be hard. The semantics are straightforward and don't interfere with the way a C compiler already works.


> The #1 problem with C is buffer overflows.

There is also the null hypothesis that buffer overflows are simply a category of bugs that is easy to identify, and thus, apparently prevalent.


Would it be reasonable to assert that bugs with consequences so rare and uninteresting that nobody has either noticed them (so not widely prevalent) or identified them (so not severe enough to be worth investigating) could be the number one problem?


The cold is one of the most common human infections; does that make it the most severe health problem facing humanity as well?


You started off suggesting there may be more serious problems in C which haven't been identified or are not prevalent. In this comparison we have identified car crashes, heart disease and lung cancer as health problems, and they are prevalent.


I've never heard of anyone dying of the cold. I've heard of many major security breaches caused by buffer overflows.


Buffer overflows aren't easy to identify. That's why they lurk for years in shipping software, and then there's a panic when they're discovered.



> C is probably the patriarch of the longest list of languages. Notable among these are C++, the D programming language, and most recently, Go. There are endless discussion threads on how to fix C, going back to the 80’s.

Why is Java missing in that list?


Maybe because Java is more of a direct descendant of C++ than pure C (though obviously C++ is an iteration of C).

Or maybe they omitted it because there are literally hundreds of languages that were inspired by C, and listing them all would have been boring for the reader.


Java has a C++-like syntax to sell it to the C++ devs of the time; however, its major influence was Objective-C.

https://cs.gmu.edu/~sean/stuff/java-objc.html


Or so somebody said in an interview, or thought they did, but otherwise Java is nothing like Objective-C with respect to messages (and of course with respect to syntax, but that's not that important).


If you want to do dynamic dispatch, there are ways to achieve it via reflection and dynamic proxies.

Interfaces, dynamic code loading, JAR bundles, lightweight class type reflection, all trace back to Objective-C, or Smalltalk, if one wants to be pedantic.

In case you missed it, even JEE started as an Objective-C framework for the Spring distributed OS, Distributed Objects Everywhere.

https://en.m.wikipedia.org/wiki/Distributed_Objects_Everywhe...


That's my posting that was linked to.

Of course, Java doesn't have Objective-C-style message dispatch. And it's also true that Gosling's team originally considered, then rejected, C++ in favor of building Oak. But Oak borrowed an awful lot directly from Obj-C, and only later underwent a lot of syntactical surgery (turning into Java) in order to "look" like C++ specifically to attract C++ programmers, even though it didn't feel like C++ at all. This is pretty well documented.


Yet it didn't have any of the dynamic nature of Objective-C which made the attempt to replace Objective-C in OS X with Java a failure.


What?

There was no attempt to replace Objective-C with Java on OS X.

Apple was unsure whether Objective-C's strange look would ever appeal to the Object Pascal/C++ communities of Apple developers, thus they used the Java wave as plan B, in case Objective-C was rejected by them.

As this did not happen, there was no reason to keep plan B around.


> There was no attempt to replace Objective-C with Java on OS X.

I would say there was a heck of an attempt with the Java-Cocoa bridge, which didn't do well because Java didn't have a lot of the dynamic nature of Objective-C. To my eyes as a developer, they certainly tried to push Java.


As you wish; the Java-Cocoa bridge could never have replaced Objective-C when writing Objective-C was still part of the game to actually use it.

Do you actually believe that Jobs liked Java, when Apple was created on top of Object Pascal and C++, and then he was responsible for bringing Brad Cox to NeXT?


Yeah, given the crap they were sending in the monthly discs to developers at the time, they certainly made it seem like we should take Java seriously. I was rather annoyed, given I had learned Objective-C on NeXTSTEP. I'm glad that someone realized it was no substitute, but they did push it. Jobs hated it later, but he changed his mind on things fairly often, judging just from iPod features.


How could Java be a substitute when Cocoa bridge only allowed for a subset of OS X frameworks to be called from it?

Java was already available on System 7.


Well, it sure didn't, but the messaging on those CD-ROMs was pretty obvious, and I suppose they would have expanded it had someone not realized it was a lost cause. I didn't develop for Macs until Apple bought NeXT, so I don't know what was available for System 7.


Sometimes one understands what they think they want to understand.

Java was definitely not in the picture when Apple went to CERN doing their OS X marketing sessions.

In fact, you now made me dig out some stuff.

https://developer.apple.com/library/archive/documentation/Ja...

https://developer.apple.com/library/archive/documentation/Co...

> This document discusses issues that arise when writing Java applications with Cocoa, which is implemented in Objective-C.

No Java here on the OS X announcement:

https://youtu.be/SjlLG1EzJ2k?t=4450

The only message was Java being first party on OS X; in the System 7 days, the JVM was not from Apple but rather from a third party. Thus the announcement at JavaOne 2000.

https://www.javacoffeebreak.com/articles/javaone00/index.htm...

You will not find in those CDs anything like this:

> Swift is a successor to both the C and Objective-C languages. It includes low-level primitives such as types, flow control, and operators. It also provides object-oriented features such as classes, protocols, and generics, giving Cocoa and Cocoa Touch developers the performance and power they demand.

https://developer.apple.com/swift/


Why would you think Java is a descendant of C++? There may be overlap in some syntax, mainly from C. C++ is not, and was not, the only OOP language, and I have heard no such claim that it should be a descendant of C++.


I was once at a talk given by James Gosling. He said that Java-the-language was «a trick to get C++ programmers to use the Java Virtual Machine». He deliberately made Java very similar to C++ but removed what he saw as the hard and risky parts (memory management, operator overloading, etc) that are typically not required for standard applications.

Well, that was my interpretation of what he said, errors are my own etc. But this would make Java a direct descendant of C++, in my mind.


I have never heard that quote from James before. Are you sure about the JVM? The JVM was quite controversial back then, Java first had to prove that you could make a performant virtual machine.

But Guy Steele claimed "We were not out to win over the Lisp programmers; we were after the C++ programmers. We managed to drag a lot of them about halfway to Lisp."


Well, it’s many years ago and memory corruption is real. I got the impression that their goal was to get adoption of (what at some point became) the JVM, or the «compile once run anywhere» vision. They envisioned many languages coexisting on the JVM, which kinda happened, but maybe not as much as they thought. So they designed a language to get started, Java, and made it familiar-looking to get people on board.


I was around at the time.

Java's object semantics are explicitly intended as a streamlining of C++, the keywords are the same for the most part, and it was sold as a C++ which runs anywhere with no memory leaks.

Note that I mentioned the semantics: the object semantics of Java and C++ are so similar as to have corrupted the entire concept of objects in their favor.

This wasn't an accident, and it wasn't malice, it just feels like it sometimes.


I'm not a Java programmer but as far as I can tell Java object semantics, far from being corrupted, do indeed come from Simula via C++.

Thanks to reflection and a featureful VM, Java does have a significant amount of dynamic behaviour that can be used to implement a lot of features of the Smalltalk side of the OO family tree.


Because at the time it was pretty clear.

'At that stage, C and C++ "absolutely owned the universe"' - https://www.zdnet.com/article/programming-languages-java-fou...

They took a lot of inspiration from C/C++'s syntax and seemed to be pretty concerned with improving memory management, security and developer velocity.


Another programming language being popular by no means means that it is a derived language of any sort. Any development is of course retrospective, but it is sorta like saying all music is descended from pop.


I understand the point you’re trying to make but writing music is a creative process whereas marketing programming languages isn’t.

I was around at the time and C++ was trendy, so Sun were marketing Java as the future for C++ developers. It was definitely influenced by what was in vogue at the time even if it doesn’t adopt all of the traits of C++.

I remember this because I wasn’t a fan of C++ back then as I’d come from the ALGOL family of languages so found C-style syntax a little alien (and tbh I still don’t like C++ now even though I’ve since warmed to C’s syntax) so it took me years before I warmed to Java.


Define "derived".

In particular, if Java kept (almost?) all the keywords, and the operators, and the statement terminators, and the block delimiters, and the same approach to object-oriented... how is it not derived from C++?


>C++ is not, and was not, the only OOP language

No, but it was the only one that mattered at the time, as far as adoption was concerned, and as far as marketing Java as something familiar to existing programmers and their managers...

That's also how it was hyped at the time and the kind of people it was sold to (I was -barely- there).


I suspect it’s because manual memory management in Java isn’t built into the language. Is it even possible? I’m not a Java programmer and I don’t know. My understanding has been that the runtime doesn’t expose the memory model to you.


In a way it does. Java just likes to push most features to methods on special objects, instead of exposing them as native functionality (to avoid backwards incompatible changes).

So it would look something like MemorySegment.allocateNative(100, someScope). This new API has a runtime ownership model, so by default only a single thread can access this memory address, and it can be freed at will.



