Show HN: Modifying Clang for a Safer, More Explicit C++ (github.com/compiler-devel)
76 points by compiler-devel on Aug 19, 2022 | 87 comments
Modified C++

Inspired by the paper "Some Were Meant for C" by Stephen Kell, I decided to show that it's possible to iterate C++ to be safer, more explicit, and less error-prone.

Here's a possible starting point: I didn't invent a new language or compiler, but took the world's best compiler, clang, and modified it to begin iterating towards a new future of C++. Naming things is hard, so I call this 'Modified C++'. Some of the following could be implemented as tooling in a linter or checker, but the idea is to update the compiler directly. I also wanted to learn more about clang. This compiler needs a flag to enable/disable this functionality so that existing library code can be used with a 'diagnostic ignored' pragma.

You can build clang using the normal non-bootstrap process and you'll be left with a clang that compiles C++ but with the following modifications:

     - All basic types (excluding pointers and references) are const by
     default and may be marked 'mutable' to allow them to be changed after
     declaration
     - Lambda capture lists must be explicit (no [&] or [=], by themselves)
     - Braces are required for conditional statements, case and default
     statements within switches, and loops
     - Implicit conversions to bool are prohibited (e.g., pointers must be
     compared against nullptr/NULL)
     - No goto support
     - Explicit 'rule of six' for classes must be programmer-implemented
     (default, copy, and move c'tors, copy and move assignment, d'tor)
     - No C style casts

Here's an example program that's valid in Modified C++:

    mutable int main(int, char**)
    {
      mutable int x = 0;
      return x;
    }

Here's another that will fail to compile:

    mutable int main(int, char**)
    {
      int x = 1;
      x = 0;  // x is constant
      return x;
    }

I'd like your feedback. Future changes I'm thinking about are:

     - feature flag for modified c++ to enable/disable with 'diagnostic ignored'
     pragma, to support existing headers and libraries
     - support enum classes only
     - constructor declarations are explicit by default
     - namespaces within classes
     - normalize lambda and free function syntax
     - your ideas here



> - All basic types (excluding pointers and references) are const by default and may be marked 'mutable' to allow them to be changed after declaration

If you're not changing how const works, then this has limited utility in C++ because C++ const has all sorts of problems (e.g. not transitive). Also, what does the "mutable" annotation for a free function (i.e. main) mean? That just seems weird.
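To make the "not transitive" point concrete, here is a minimal sketch in standard C++ (not Modified C++). The `Counter` type and `bump_through_const` helper are made up for illustration; the only claim is that `const` on an object does not extend to what its pointer members point at.

```cpp
#include <cassert>

// Hypothetical type for illustration: const applies to the pointer member
// itself, not to the int it points at.
struct Counter {
    int* value;  // inside a const Counter this is `int* const`, not `const int*`

    void bump() const { ++*value; }  // compiles: the pointee is still mutable
};

// Mutates state reachable through a const object and returns the new value.
inline int bump_through_const(int start) {
    int storage = start;
    const Counter c{&storage};
    c.bump();
    assert(storage == start + 1);  // changed even though `c` is const
    return storage;
}
```

Calling `bump_through_const(41)` yields 42, so flipping the default to const would not by itself stop this kind of mutation.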

> - Lambda capture lists must be explicit (no [&] or [=], by themselves)

[&] is pretty valuable in cases where you do something like invokeSynchronously([&] {...})

I don't know that your changes will ever see much adoption because it won't be able to compile anything more complex than a "hello world" program as all the things you disallow are used & the porting effort is not cosmetic. Additionally, you're not actually fixing any of the problems people have with C++. So:

1. Consider fixing const semantics if you're going down the path of defining a new language

2. Think about how to fix memory safety and issues around UB which are the #1 sharp edges for C++

I don't know if you're achieving the goal of a safer, less error-prone language with the changes outlined. Have you looked at the things Carbon [1] is doing? I'd say that's an attempt to define a spiritual successor to C++, one that can easily interoperate with C++ but when you stay within the language it's safer.

[1] https://github.com/carbon-language/carbon-lang


Thank you for this great feedback. I'll do my best to respond to each of your points:

WRT const, you're correct and I'd need to go further in updating const behaviors in the language. I stole this idea from Rust (sort of) in that variable declarations in that language are const by default. Essentially, I wanted to 'flip' the semantics in C++ to match, and use mutable to allow variables to change after their declaration. I could go further to enforce transitivity (e.g. so you can't do something like mutable x = y; where y is const).

[&] is handy indeed yet this was motivated by my experience in legacy heavy codebases where there are often many variables in scope and some with external consequences (e.g. file descriptors, sockets). I don't want these accidentally captured if the lambda invocation site has lifetime implications beyond those resources.

I think I'm achieving the goal of a safer, less error-prone language because these changes could've prevented the 2014 'goto fail' from happening (and not just because the keyword goto would be omitted but because there was a conditional without braces in the affected source making the code less explicit and less clear).


> [&] is handy indeed yet this was motivated by my experience in legacy heavy codebases where there are often many variables in scope and some with external consequences (e.g. file descriptors, sockets). I don't want these accidentally captured if the lambda invocation site has lifetime implications beyond those resources.

I think your solution to that problem goes the wrong way, though. The problem is whether or not a lambda can survive past the immediate usage, not what it captures. Listing those resources explicitly still gives you the same bug, banning [&] didn't avoid it.

I'd suggest instead an approach where a template taking a callable annotates whether or not it's "inline". If it is inline, then [&] should just be the default even. If it's not, then ban [&]. Possibly ban taking anything by reference if it's not a synchronously-used lambda even.

(inline / non-inline terms here cribbed from Kotlin https://kotlinlang.org/docs/inline-functions.html - probably there's a better word for it, but whatever)
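A rough sketch of the distinction being drawn here, in plain C++17. The function names are invented for the example; it only contrasts a lambda that is used synchronously with one that escapes its scope.

```cpp
#include <cassert>
#include <functional>

// Synchronous use: the lambda cannot outlive `total`, so [&] is harmless here.
inline int sum_to(int n) {
    assert(n >= 0);
    int total = 0;
    auto add = [&](int i) { total += i; };
    for (int i = 1; i <= n; ++i) {
        add(i);
    }
    return total;
}

// Escaping use: the callable outlives this frame, so it has to own its data.
// Writing [&base] instead would name the capture explicitly and still dangle.
inline std::function<int(int)> make_adder(int base) {
    return [base](int x) { return base + x; };
}
```

What matters in `make_adder` is by-value versus by-reference capture, not whether the capture list is spelled out.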




mutable x = y in my opinion should be allowed. After all the new variable is just a copy of old one. Why shouldn't you be able to modify it?


Agreed, just like mutable x = 42 should be allowed.


Yes, good point, I retract what I said before. Thanks!


Just don't fall into the trap of believing that only making constness transitive would make it fool-proof.

A non-"pure" const member function with const parameters could still call some other function with access to a "mutable" alias to what you have a const reference to. You would need something more, such as an ownership system to make the compiler make that impossible (as in Rust) or to detect it (the topic of my Master's thesis, BTW) ... but then it would no longer be something resembling C++.
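A small sketch of that aliasing problem, using a free function and a global rather than a member function to keep it short. `g_balance`, `withdraw` and `observed_change` are hypothetical names for this example.

```cpp
#include <cassert>

int g_balance = 100;  // hypothetical shared state

// Has a mutable route to g_balance that does not go through any const reference.
inline void withdraw(int amount) {
    assert(amount >= 0);
    g_balance -= amount;
}

// `seen` is a const reference, yet the value behind it changes during the call
// whenever the caller passes g_balance itself.
inline int observed_change(const int& seen) {
    const int before = seen;
    withdraw(30);
    return seen - before;
}
```

`observed_change(g_balance)` reports a change of -30, while passing an unrelated local reports 0. The const reference only restricts writes through that one name.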


Generally speaking a const member function can't call a member function that is non-const or a member function that modifies a member variable that's not declared mutable. If the present const member function has a const reference to something that's marked mutable, then it's effectively a no-op as far as const value qualifications are concerned.


I think you might be onto something with regards to the general idea, but most of your particular rules I disagree with. vector<mutable int> for example is very strange; there's no reason vector<int> shouldn't work. With respect to lambda captures always being explicit, it's a far heavier restriction than you (and many people) realize: sometimes you literally cannot know what's inside the lambda to be able to capture it (look up the SCOPE_EXIT macro as just one example), and even when you can, listing all of them is sometimes far more harmful to readability than helpful; it depends strongly on the situation. Goto is absolutely necessary in certain rare but practical cases too, like when converting a recursive algorithm to an iterative one without breaking git blame. C-style casts to (void) are pretty useful, so you'd need at least an exception for that.

Constructors being explicit by default I 100% agree with, and there are other rules I could come up with too, but in general, you need to realize that a lot of the features in the language have legitimate use cases that you might simply have a hard time imagining. Therefore, coming up with useful rules without hampering useful functionality requires both (a) experience & playing around with the language to a greater extent than you might at your job, and (b) a great deal of thought on top of that.
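For readers who haven't met the scope-exit idiom mentioned above, here is a minimal sketch of the idea. `ScopeExit` is a stand-in written for this example, not any particular library's macro or class.

```cpp
#include <cassert>
#include <utility>

// Minimal scope guard (hypothetical name): runs `fn` when the scope ends.
template <class F>
class ScopeExit {
public:
    explicit ScopeExit(F fn) : fn_(std::move(fn)) {}
    ~ScopeExit() { fn_(); }
    ScopeExit(const ScopeExit&) = delete;
    ScopeExit& operator=(const ScopeExit&) = delete;

private:
    F fn_;
};

// The guard body is supplied by the caller, so a generic wrapper or macro
// cannot spell out a capture list for it; [&] is what makes this usable.
inline int run_with_cleanup() {
    int depth = 0;
    {
        ++depth;
        ScopeExit guard([&] { --depth; });  // class template argument deduction, C++17
        assert(depth == 1);
    }
    return depth;  // the guard has already run here
}
```

A macro wrapping this pattern has to emit the `[&]` itself, which is why a blanket ban on default captures hits it.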


Thank you for your thoughtful response. vector<int> wouldn't work because copy semantics wouldn't apply for a constant type, so mutable would be needed (as you rightly pointed out). I'm not sure that vector<int> should work unless the vector container was updated to move its elements by default (another commenter suggested move-by-default rather than copy-by-default as well). I've used RxCpp in the past and know what nightmare awaits should you have to explicitly state lambda captures, yet I've seen too many devs over-capture with subtle bugs as a result. Is there a compromise here? I'm not sure that goto is required when one could use do { ... } while(false); with break statements for cases where goto would've been used (not ideal, but again this is an iterative approach). C style casts to void are useful for some memory operations but I'm not sure there's a case where they're required.

If you would, I'd love to hear some of your rules as it's clear you have a lot of C++ experience. Can you send some along? Thanks!


> I'm not sure that vector<int> should work

Well, I think it "should" work in the sense that I shouldn't have to type "vector<mutable int>" just to get a vector of mutable ints. It's just too much typing for zero benefit. How exactly you make that work is a separate question; you can do it at both the language and library level. A compromise might be to make 'mutable' be a storage class (like 'register', or like how it already is for class members) rather than a type qualifier. Note that even making it a storage class has a downside: 'return v;' will now copy-construct its output instead of moving it. You'd have to mess with the const rules to get around that. It might be possible but I'd need to think through the implications and actually play around with it for a while before I could suggest that it would actually work well.
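The copy-instead-of-move downside comes from a rule that already exists in standard C++: a const object cannot bind to a move constructor. Here is a sketch with a made-up `Tracked` type; it uses an explicit `std::move` rather than `return v;` so that copy elision doesn't hide the effect.

```cpp
#include <cassert>
#include <utility>

// Hypothetical type that records how it was constructed.
struct Tracked {
    bool copied = false;
    bool moved = false;
    Tracked() = default;
    Tracked(const Tracked&) : copied(true) {}
    Tracked(Tracked&&) noexcept : moved(true) {}
};

inline bool moving_mutable_moves() {
    Tracked source;
    Tracked dest(std::move(source));  // Tracked&& picks the move constructor
    return dest.moved && !dest.copied;
}

inline bool moving_const_copies() {
    const Tracked source{};
    assert(!source.copied && !source.moved);  // freshly value-initialized
    Tracked dest(std::move(source));  // const Tracked&& only fits the copy constructor
    return dest.copied && !dest.moved;
}
```

Both functions return true, so const-by-default locals would quietly turn moves into copies unless the const rules changed as well.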

> lambda captures [...] Is there a compromise here?

I don't know honestly. One idea could be to see if some dataflow analysis could tell you if the lambda might leak from the scope it's declared in, and you could warn on that. I think there are already tools (like clang-tidy, cppcheck, etc.) that give you warnings of this sort; I'm not sure if they fully handle this case though, you'll have to check and see if those handle the cases you want. It almost certainly won't be something you could whip up in a few hours, in case that's what you were hoping for.

> I'm not sure that goto is required when one could use do { ... } while(false); with break statements for cases where goto would've been used (not ideal, but again this is an iterative approach).

That's in no way a substitute for a goto. Sometimes you really do need the ability to jump in, not just jump out. Imagine a state machine/coroutine/etc.—it's not impossible to write them without goto, but sometimes you'd have to go through contortions and write unnatural/unmaintainable logic to write them without goto. Yes C++20 has coroutine support now but it's mediocre at best and isn't suitable for every use case.

> C style casts to void are useful for some memory operations but I'm not sure there's a case where they're required.

Edit: (void) isn't required anywhere I know of, but there are lots of places where it's helpful to have, and completely unhelpful not to have. Here's one:

  void foo(void *p)
  {
  #ifdef NDEBUG
    bar(p);
  #else
    (void)p;  // suppress "unused parameter" warning
  #endif
  }

Sure you can do static_cast<void>(p) but that's not buying you anything. It's not the end of the world, but it's just wasting your time and making your code more verbose to read. I don't have a problem with more verbose typing when it actually buys you something, but there are cases where it doesn't, and this is one of them.
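For comparison, standard C++ has two other spellings that need no cast at all: leaving the parameter unnamed, or the C++17 `[[maybe_unused]]` attribute. This is only a sketch; the function names and the `ENABLE_CONTEXT_CHECK` flag are invented for the example.

```cpp
#include <cassert>

// Unnamed parameter: there is no name left to warn about.
inline int on_event(int code, void* /*context*/) {
    return code;
}

// C++17 attribute: keeps the name for the builds that do use it.
// ENABLE_CONTEXT_CHECK is a made-up build flag for this sketch.
inline int on_event_checked(int code, [[maybe_unused]] void* context) {
#ifdef ENABLE_CONTEXT_CHECK
    if (context == nullptr) {
        return -1;
    }
#endif
    return code;
}
```

Whether either reads better than `(void)p;` is a matter of taste, which is rather the point being made above.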

In fact, a better rule might be #5 below. I'm not sure there's a reason to ban the C-style cast entirely; it could be much more useful and safer than it is now.

Meta-rule of thumb: you need to make sure you're familiar with the vast array of use cases and scenarios people encounter in real-world C++ before you can come up with rules for other C++ devs to follow. The committee itself has a hard enough time doing this for a good reason—because it's hard! If you are going to propose that some feature is unnecessary, it should be a conclusion you draw after you've already used that feature in its "most useful" context (and found a good alternative)—not before that. Most features have some very compelling use cases, so if you haven't found a compelling use case for a feature ("compelling" assuming you disregard any downsides it might have in other contexts) then there's a good chance you simply haven't come across it yet, rather than it having been unnecessary to begin with. It's usually enlightening (and honestly kind of fun) to try to figure that out before rushing to get rid of it.

> I'd love to hear some of your rules

Sorry I wrote this comment but forgot to respond to this part. I'd have to sit down and think through a lot of them before I can share them with any confidence honestly. But just going off the top of my head, here might be a few:

(1) Conversion operators (like constructors as you mentioned) should probably be explicit by default too

(2) Shadowing local variables (or parameters) in a surrounding scope should probably require something like [[shadow]] somewhere to make it abundantly obvious it's intentional (and its use cases would be incredibly rare)

(3) Initializing a variable by passing itself as an argument should be disallowed (so struct MyClass { int x; MyClass() : x(x) { } }; should be illegal, i.e. the equivalent of -Werror=init-self should be mandatory)

(4) value-initialization should probably be the default, but with a way to override it and perform default-initialization when there's actually a reason to (but perhaps -Wuninitialized should still treat the variable as uninitialized regardless)

(5) Perhaps the C-style cast should really be equivalent to a static_cast except in cases where a dynamic_cast/reinterpret_cast/const_cast would also be legal, in which case it should be an error? That would make it safer than static_cast (since it's more restrictive in where it's allowed), rather than more dangerous, and it would require less typing as well.
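Rule (1) can already be opted into by hand in today's C++. Here is a sketch of what "explicit by default" would give you; the `Meters` type is made up for the example.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical unit type with every conversion marked explicit.
struct Meters {
    explicit Meters(double v) : value(v) { assert(v >= 0.0); }
    explicit operator double() const { return value; }
    double value;
};

static_assert(!std::is_convertible_v<double, Meters>, "no implicit double -> Meters");
static_assert(!std::is_convertible_v<Meters, double>, "no implicit Meters -> double");
static_assert(std::is_constructible_v<Meters, double>, "explicit construction still works");

inline double doubled(const Meters& m) {
    return 2.0 * static_cast<double>(m);  // the conversion has to be spelled out
}
```

With the rule in place, the two `explicit` keywords would be the default and the implicit behaviour would need its own annotation instead.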


Wow, thank you for this rich and thorough follow up. Interestingly, in the present C++ standard, 'mutable' indeed is a storage class like 'register' while 'const' is a CV-qualifier. I thought it quite odd that 'mutable' isn't a CV-qualifier (the standard leaves this gap open) and for nonmember variables, my patch makes it a CV-qualifier; otherwise it remains a storage class specifier.

WRT lambda changes, I have no particular timeline for this project as it's something I took up on the side. Pointers kinda break any data flow analysis that could be done. For example, imagine an object that serializes itself in one translation unit and is deserialized in another using a different class (this is somewhat common in telecommunications code. Imagine a struct Header { ..., void end[0]; }; which is used to handle messages of variable length but with the same Header types).

Java does just fine without goto (or have they added that since 2010?).

I think code that is intentional is more effective than code that is accidental. That said, I'd rather suppress the unused parameter warning with the '#pragma diagnostic ignored' mechanism than use a cast mechanic that just happens to address a compiler issue.

Thank you for the great list of rules! I agree with them all. Minor nit: you can't use static_cast to cast away constness.


> Java does just fine without goto (or have they added that since 2010?).

C++ is quite literally meant for use cases where Java (or Python or C# or Go or pretty much any other language) is not "just fine". And as I mentioned above, you CAN get by without goto. You just have to go through (go to?) contortions in certain cases without it that make the situation worse rather than better. (And there is no reason to believe such use cases are equally common across all languages, so keep that in mind. For example hardware contexts require dealing with explicit state machines a lot more than software contexts do, and C/C++ are used more in those contexts—to name just one example.) Remember Java was doing "just fine" without lambdas, and it's still doing "just fine" without templates, value types, manual memory management, and a million other things you find in C++. Even C was also doing "just fine" without generics and destructors, but then they realized they're missing out and finally added it.

You have to realize, goto is basically a religion nowadays. People want to believe goto has no legitimate use cases, because (I can only assume) they're scared someone will use it as an excuse to utilize it irresponsibly outside those contexts. Kind of like why some drugs require prescriptions, I guess. I can't stop people from believing what they want, but as far as facts go, it does have use cases that many people simply don't encounter, and I tried to list some of them in my comments above.

> Minor nit: you can't use static_cast to cast away constness.

You actually can! Check this out:

  int const b = 1;
  int const *p = &b;
  **static_cast<int **>(static_cast<void *>(static_cast<int const **>(&p))) = 2;
  assert(p == &b && *p == 2);

If this is surprising... I would take it as an indication that it's difficult to foresee what can be done even with the commonplace features in the language (in both good and bad directions), let alone the rare ones (like goto).


Can you cast away const with a single static_cast? Your static_cast chain is analogous to my example of serializing/deserializing across TUs because the memory is treated as a pointer to storage (your second static_cast<void*>). As I understand, the original question you asked pertained to using one of the C++ cast operators in place of a single C style cast.
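The single-cast question can be checked at compile time. The `can_static_cast` trait below is a helper written for this sketch, not a standard facility; it asks whether one `static_cast` between two types is well-formed.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Detects whether a single static_cast<To>(From) compiles (helper for this sketch).
template <class To, class From, class = void>
struct can_static_cast : std::false_type {};

template <class To, class From>
struct can_static_cast<To, From,
                       std::void_t<decltype(static_cast<To>(std::declval<From>()))>>
    : std::true_type {};

static_assert(can_static_cast<const int*, int*>::value, "adding const is fine");
static_assert(!can_static_cast<int*, const int*>::value, "one static_cast cannot drop const");
static_assert(can_static_cast<int*, void*>::value, "void* -> int* is allowed, hence the chain");

// const_cast is the single cast that does drop const. Writing through the
// result is only valid because callers pass a pointer to a non-const int.
inline void set_via_const_ptr(const int* p, int v) {
    assert(p != nullptr);
    *const_cast<int*>(p) = v;
}
```

So the chain above works by laundering the pointer through `void*`, not because any one `static_cast` removes the qualifier.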


Oh shoot I'm sorry, I misunderstood your comment. Yeah you're right, the const_cast isn't important for casting away const, so it's not important for that rule as far as that goes, good point! This is why I also need to sit down and think about these before I can be confident in them :)


> All basic types (excluding pointers and references) are const by default and may be marked 'mutable' to allow them to be changed after declaration

FWIW, for me, this is an anti-feature, and I would not use this language because of it. The net effect of this would be that I type "mutable" all over the place and get very little for my effort.

I've spent a significant amount of time understanding what the high-consequence programming errors that I make are, and "oops, I mutated that thing that I could have marked const" is a class of error that consumes a vanishingly small amount of my debugging time.

The errors I make that account for a large portion of my debugging time are errors related to semantics of my program. Things that, in C++, are typically only detectable at runtime, but with a better type system could be detected at compile time. The first step for this might be type annotations that specify valid values. For example, being able to annotate whether an argument is or is not allowed to be null, and having that enforced at call sites.

(NOTE: I also don't spend a meaningful amount of time debugging accidental nullptr values, but that's a good first step towards the type annotations I _do_ want)
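A library-level approximation of the "not allowed to be null" annotation, as a sketch. `NonNull` is a hypothetical wrapper written for this example, similar in spirit to `gsl::not_null` from the C++ Core Guidelines support library. It rejects a literal `nullptr` at compile time and checks everything else at run time.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical wrapper: a pointer that has been checked for null once, up front.
template <class T>
class NonNull {
public:
    explicit NonNull(T* p) : ptr_(p) { assert(ptr_ != nullptr); }
    NonNull(std::nullptr_t) = delete;  // a literal nullptr fails to compile

    T& operator*() const { return *ptr_; }
    T* get() const { return ptr_; }

private:
    T* ptr_;
};

static_assert(!std::is_constructible_v<NonNull<int>, std::nullptr_t>,
              "nullptr is rejected at the call site");

// The signature documents the precondition; the body needs no null check.
inline int read(NonNull<int> p) {
    return *p;
}
```

This only catches the literal case statically; a pointer that happens to be null at run time still needs the assert, which is where language-level annotations could do better.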


> The net effect of this would be that I type "mutable" all over the place

You might want to consider adopting a more modern programming style for the benefit of your coworkers (and possibly yourself). Mutability all over the place is a nightmare, speaking as someone who currently has to work in a large codebase written like that. It's hard to predict what value a variable is going to have at any particular point in your code, since instead of only having to check where the variable is defined, you have to audit all the code between the definition and the use. For the same reason, it's hard to guarantee that your invariants are maintained, since there is a much larger surface area to check.


> You might want to consider adopting a more modern programming style

Just because some people think a particular style is useful doesn't mean everyone does. You might want to consider checking your biases before making comments like this. I understand the (many) arguments for using `const`, and I've concluded (for myself) that it's not a useful construct. Read and internalize the rest of my previous comment for more information.


It's not about biases. Using immutability is not just some personal preference. It's common knowledge that mutability everywhere is a bad practice. That's why there's a trend toward immutability by default in newer languages.


You would only type 'mutable' at the handful places that really need to be mutable. You would also likely delete the 'const' that are already all over the place. If you really use mutation all over the place, then either you work on something unusual or you should learn to do better.


> You would also likely delete the 'const'

I never type `const` in the first place. I don't find it useful, which was the point of my original comment.

> then either you work on something unusual

In C++, I've been working on game engines, compilers, interpreters and occasionally some reverse engineering. Not sure if that counts as unusual. I'm always thinking about perf, so if I can safely modify something in-place, I usually do.

> or you should learn to do better

I do just fine without `const`, thank you. Maybe you should learn to have a more open mind.


const can serve to mark invariants in your code as well as provide the compiler with information that it can use for optimizations. I highly recommend Scott Meyers' "Effective C++" wherein Item #3 is to 'Use const whenever possible'.


Apart from the mutable keyword, can't these be implemented as a clang diagnostic plugin? Then it can be used to enforce a stricter style guide. As another commenter pointed out, mutable will probably be of limited use anyway.


Yes, some (maybe most) could be implemented in a plugin. I wanted to make these changes in part to better understand the clang internals and also show that rather than use external tooling, the language itself can (should?) be changed.


I would submit that C++ has enough inertia (as a language and as an ecosystem) that changing the language itself would be difficult.

However, C++ is a huge language and if there's a way to enforce safety by using only a subset of the language + tooling to help you do that, your improvements could be adopted piecemeal by teams looking to level up their codebase a bit.

Many languages have a way to opt in to e.g. strict type checking on a per-file basis. It would be really cool to see these improvements implemented in such a way that existing codebases could gradually adopt them.


Thank you, this indeed is the approach that I'd like to take. Like I mentioned in the commit message on the patch, one of my goals is to show that we can iterate the language (and our codebases) gradually. I'll add a feature flag to my patch to selectively enable and disable these changes.


I really do not understand the Rust-esque love of the "mutable" keyword in rebellion of "const". They are most often attached to variables. The definition of the word variable is "subject to variation or changes". By definition, variables change. Constants do not change. I understand that the semantics here are historical, but it's very much like "Automated ATM Machine". Maybe I just don't like the word mutable, and would prefer "var" or "varying".


"const" historically means "compile-time constant".

A "mutable" variable is contrasted to an "immutable" variable, not to a "constant".

You may not like the name "variable" for something that cannot be changed within a given scope, but it's still something that can take multiple values during the execution of a program.


I think you missed the point I was trying to make. C/C++ currently: 1.) "variable" -> something that is subject to change 2.) "const variable" -> an unchanging thing that is subject to change (I guess you could say it only changes once).

This thing and Rust: "constant" -> something unable to change "mutable constant" -> a changeable constant...what? Even "mutable variable" -> a changeable thing that is a thing that is subject to change doesn't make much more sense.

It is fine for "things" to be immutable by default, and in fact I think they should be. I just think "mutable" is keyword smell similar to "decltype" because type wasn't keyworded from the start.


> This thing and Rust: "constant" -> something unable to change "mutable constant" -> a changeable constant...what?

"What?" indeed. Rust doesn't have "mutable constant". Rust's "const" is actually a constant, unlike in C where "const" means "Sort of immutable, although maybe not".

I guess maybe you've been told something about Rust like "let is kinda like C++ const" and so you've come to the erroneous conclusion that somehow "let mut" means "mutable constant" but that's just because you didn't really understand, blame either your attention or the poor explanation, it's surely nothing to do with Rust which has never said this is "mutable constant" since that's nonsense.


All non-void return values should be [[nodiscard]] by default. Of course then you will need something else ([[discardable]]?) to indicate the ones that may safely be ignored.
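For reference, this is what the opt-in version looks like in C++17 today, as a sketch with invented function names. Under the proposal the attribute would be implied and the discard would need its own marker.

```cpp
#include <cassert>

// Returns the digit value, or -1 if `c` is not a decimal digit.
[[nodiscard]] inline int parse_digit(char c) {
    return (c >= '0' && c <= '9') ? c - '0' : -1;
}

inline int use_results() {
    // parse_digit('7');                  // ignoring the result draws a compiler warning
    static_cast<void>(parse_digit('7'));  // explicit, intentional discard
    const int d = parse_digit('4');
    assert(d >= 0);
    return d;
}
```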


Couldn't [[maybe_unused]] fill the role of [[discardable]]?


Interesting idea!


This is a really great idea, especially if you can write a transpiler from general C++ into modified C++ (it can error out on corner cases and ask for manual intervention, but trivial stuff like adding missing braces can and should be done by an automatic tool, like rustfix is used to migrate between Rust editions https://github.com/rust-lang/rustfix)

But here you didn't tackle the main thing: a plan to make simple business logic not corrupt memory and cause havoc with UB! Dereferencing arbitrary pointers is a dangerous operation that shouldn't be done in everyday code. If I'm writing data structures I'm willing to think about UB, but if I'm choosing the color of a widget I'm less so. I'm not expecting you to solve this hard problem, but at least a general direction or a half solution that works for a % of the cases would be cool (or at least state this is a long term goal).

And of course there's the comparison to Rust, but Rust is actually just a data point in this solution space and perhaps new languages can afford to try new approaches


Clang-tidy can detect some of the "Modified C++" constraints like no implicit conversions to bool and can even suggest automatic fixes and also apply them with clang-apply-replacements. Clang-tidy is already a transpiler for "Modified C++".


Indeed. I conceded in my commit message that a linter or checker (such as clang-tidy) could be used to implement some or most of what I suggested (but not the mutable/const change of course). Aside from learning more about how clang works, I wanted to show that we can (and should?) modify the language. There appears to be inertia to drop legacy at the committee level. C++ needs a Snow Leopard release to do some of what I've done here.


Great suggestions all around. UB is a beast of a problem and compilers take advantage of UB for optimizations (some going so far as to break programmers' code).


Ok some more suggestions:

- Pointers aren't arrays.

- no implicit conversions at all.

- require fields to be initialized before use/end of constructor


Some implicit conversions are okay, like type promotion from int to double. Some type coercions are fraught, like char to int or back again. I agree that array decay to pointer could be explicit, and pointers shouldn't cast to arrays.


implicit int to double is really, really bad! it can silently truncate - double can only store 53-bit integers, so for large integers the result will not be the same integer!

in general, lossy conversions should never, ever be implicit
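The 53-bit limit is easy to demonstrate. This sketch assumes IEEE 754 doubles with the default rounding mode on a typical x86-64 or ARM64 target; `survives_round_trip` is a made-up helper.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

static_assert(std::numeric_limits<double>::digits == 53,
              "this sketch assumes IEEE 754 binary64");
static_assert(std::numeric_limits<std::int32_t>::digits <= 53,
              "every 32-bit int is exactly representable");
static_assert(std::numeric_limits<std::int64_t>::digits > 53,
              "not every 64-bit int is");

// True if `v` comes back unchanged after an implicit trip through double.
// Only non-negative inputs are considered in this sketch.
inline bool survives_round_trip(std::int64_t v) {
    assert(v >= 0);
    const double d = v;  // implicit and silent
    return static_cast<std::int64_t>(d) == v;
}
```

2^53 survives the round trip and 2^53 + 1 does not, with no diagnostic at default warning levels.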


Great point, I was thinking of ints as 32 bits. You're absolutely correct for 64 bit ints!


I would instead say int should be guaranteed to fit in a double. I feel like there's no reason to introduce a pitfall in > 99.999% of use cases just because there might be some obscure architecture where int is 64-bit and its programmers cannot be bothered with the extra keystroke for 'long'.


I'd love to see integer promotion die in a fire, to prevent this: https://twitter.com/stephentyrone/status/1410636445593837569
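The linked tweet isn't reproduced here. As a generic illustration of integer promotion (not necessarily the case it shows), small unsigned operands are widened to `int` before arithmetic, so the result type differs from the operand type.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

static_assert(std::is_same_v<decltype(std::uint8_t{1} + std::uint8_t{1}), int>,
              "uint8_t + uint8_t has type int");

// The sum is computed in int, so it does not wrap at 255.
inline int sum_promoted(std::uint8_t a, std::uint8_t b) {
    return a + b;
}

// Wrapping only happens when the result is narrowed back, here explicitly.
inline std::uint8_t sum_narrowed(std::uint8_t a, std::uint8_t b) {
    const int wide = a + b;
    assert(wide >= 0 && wide <= 510);  // int arithmetic, so no wraparound yet
    return static_cast<std::uint8_t>(wide);
}
```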


> require fields to be initialized before use

Good idea. That would catch annoying bugs I rarely, but occasionally, have.


I think it's a great experiment, keep doing what you're doing. Removing the footguns from C++ without wildly changing the syntax up is a solid idea.


Thanks! Aside from the default const/mutable change, this was my approach. To improve adoption, it would be easy to add a feature flag for this set of changes which could be applied on a per file basis.


Rather than build a new compiler, I wonder if this might be easier to integrate as a static checker. IMO clang static checks are not that difficult to write. The hardest thing can be the query to find the interesting elements. But you're banning/requiring fairly high-level language elements so they should be pretty easy queries to write.


Agreed, that's why I started by modifying clang. I think we can start dropping some of the crufty legacy in the C++ language without throwing it all out and starting again. While clang tidy could be used to check for a lot of these, I wanted to show that we could change the language directly and what that could look like.


> I think we can start dropping some of the crufty legacy in the C++ language (...)

Do you have any concrete example of what you perceive as being "crufty legacy"?


C style casts, already prohibited by this patch, seem to be a good example.


> C style casts (...)

Those are pretty much irrelevant since at least C++98, especially as not only are they used voluntarily but also under the hood they are already handled with explicit casts.

Is this the best argument there is to break backwards compatibility?


I hate writing proper C++ casts because C style is just shorter (e.g. (T)(foobar)), and have to force myself to write the whole invocation. So yeah, not even having the lazy option wouldn't be too bad :)


what's the motivation for removing `goto`, is this something that you find being abused? I code in c++ for work, and I almost never see anyone using it without a good reason.


My personal opinion is that a programming language should instead of 'goto', have explicit constructs for those things that 'goto' is most often used to emulate:

• Breaking out of nested loops

• Clause after loop that has run to its end-condition without a break, return or throw. Python allows an 'else'-clause after a loop, but IMHO "default" would be a better keyword.

• Error handling (C++ has exception handling already, but there are alternatives)
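The first bullet, sketched in today's C++: the same nested search written once with `goto` and once with the loops moved into a function so that `return` does the breaking. All names are invented for the example.

```cpp
#include <cassert>
#include <vector>

using Grid = std::vector<std::vector<int>>;

// goto version: one jump leaves both loops.
inline bool contains_goto(const Grid& grid, int target) {
    bool found = false;
    for (const auto& row : grid) {
        for (int cell : row) {
            if (cell == target) {
                found = true;
                goto done;
            }
        }
    }
done:
    return found;
}

// goto-free version: `return` exits both loops at once.
inline bool contains_return(const Grid& grid, int target) {
    for (const auto& row : grid) {
        for (int cell : row) {
            if (cell == target) {
                return true;
            }
        }
    }
    return false;
}

// Sanity helper: both spellings should always agree.
inline bool contains_checked(const Grid& grid, int target) {
    const bool a = contains_goto(grid, target);
    const bool b = contains_return(grid, target);
    assert(a == b);
    return a;
}
```

The rewrite works here because the loops are the whole function; when they sit in the middle of a longer one, it means extracting a helper or a lambda. A labelled break would remove that step.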


This is interesting, are there languages in use today with these constructs?


Yeah, I'm curious about this as well. I know the Go and Lua creators explicitly included goto in their languages, saying it can be useful if used carefully.


> I almost never see anyone using it without a good reason

In any of those cases, could the code have been rewritten without goto?


- define the evaluation order for function parameters, e.g. f(a(), b()) [or is that well defined in modern C++?]

- allow for named arguments. E.g. let's say for the definition f(int a = 12, int b = 42), one might call f(b: 1337) or f(12, 1337). Not allowing mixing of named & positional is probably a good idea.

- take a look at static verification and remove language features that make static verification more difficult and think about how you could replace them (or remove; but e.g. function pointers & similar stuff fall in that category and are probably too powerful to be sacrificed this way)

//Edit: as others have said, try to whack as much undefined behaviour as possible (and in case you can't, don't accept the input program).


Btw, a lot of bad things are already flagged as compile time warnings. Using -Wall -Werror (or similar) should already have been the default for new projects for well over a decade.

You probably know that, but "stealing" stuff from there is also a good idea. Plus, iirc, there are several additional warning options that are not contained in -Wall. Maybe you want to add some of these, too.


Great suggestion.


Completely agree! Thank you for these suggestions.


Couldn't most of these be covered by a linter? I'm not sure you really need a new language for this. Even right now in Visual Studio resharper is constantly telling me about things that can be constexpr or const and a lot of the other things you mention here.


You don't even need a linter for most of this. Just turn on modern compiler diagnostics and you'll have common footguns flagged.


I think this is an interesting idea but I also think it will never gain adoption.

Move constructors/Move assignment should be noexcept by default. It's not entirely clear to me what a program ought to do if a move constructor/assignment operator throws an exception. In a general sense you cannot 'trust' the old object to not have been modified.

"All basic types (excluding pointers and references) are const by default" -- why the exception?

The rule of zero should be acceptable in addition to the rule of 6. Also, the rule of 5 is acceptable in many circumstances; lots of classes should not have default constructors. I agree that having 1,2,3, or 4 are bad, but 0,5,6 are acceptable.


Many standard containers don't have noexcept move operations in Microsoft's implementation. It is conforming. Having a throwing move doesn't mean that you can't have exception guarantees. Just do the throwing operations before you modify the source or target objects.
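
One practical consequence of leaving a move constructor without noexcept: when std::vector reallocates, it copies the elements instead of moving them if the move constructor may throw and a copy constructor is available, so that it can keep its strong exception guarantee. A sketch that counts what happens (the type and function names are hypothetical):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Counts { int copies = 0; int moves = 0; };

struct NoexceptMove {
    static Counts counts;
    NoexceptMove() = default;
    NoexceptMove(const NoexceptMove&) { ++counts.copies; }
    NoexceptMove(NoexceptMove&&) noexcept { ++counts.moves; }
};
Counts NoexceptMove::counts;

struct ThrowingMove {
    static Counts counts;
    ThrowingMove() = default;
    ThrowingMove(const ThrowingMove&) { ++counts.copies; }
    ThrowingMove(ThrowingMove&&) { ++counts.moves; }  // may throw, as far as the compiler knows
};
Counts ThrowingMove::counts;

// Builds a two-element vector, forces one reallocation, and reports
// how the two existing elements were transferred to the new storage.
template <class T>
Counts relocate_two() {
    T::counts = Counts{};
    std::vector<T> v(2);          // two default-constructed elements
    v.reserve(v.capacity() + 1);  // forces a reallocation
    return T::counts;
}
```

With this setup, relocate_two<NoexceptMove>() reports two moves and relocate_two<ThrowingMove>() reports two copies.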


One suggestion: In documentation and comments, distinguish clearly between "const" and "constant".

"const" means "read-only", and probably should have been spelled "readonly".

"constant", as in "constant expression", means evaluated at compile time.

For example: `const int r = rand();` is perfectly valid: r can't be computed until run time (it's not constant), but it can't be modified after its initialization (it is const/readonly).
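
The distinction can be spelled out in standard C++ like this (kAnswer and roll are names invented for the sketch):

```cpp
#include <cassert>
#include <cstdlib>

// "constant": evaluated at compile time, so it is usable in static_assert.
constexpr int kAnswer = 6 * 7;
static_assert(kAnswer == 42, "kAnswer is a constant expression");

// "const": read-only after initialization, but computed at run time.
int roll() {
    const int r = std::rand();     // const, yet not a constant expression
    // r = 0;                      // error: r is read-only
    // static_assert(r >= 0, "");  // error: r is not known at compile time
    return r;
}
```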


How hard would it be to automatically convert some existing C++ into the new language? It seems like your compiler can diagnose the errors, so inserting `mutable` and `bool(...)` should be possible.

It might be interesting to do this on an existing codebase just to see where mutable is needed.


No C style casts allowed, so maybe static_cast<bool>(...) ;-)
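
For reference, these are the explicit spellings in standard C++ that an automatic rewrite could emit in place of an implicit conversion to bool. Whether the patched compiler accepts exactly these forms is an assumption on my part, and the function names are hypothetical:

```cpp
#include <cassert>

// Pointer test written as an explicit comparison instead of `if (p)`.
bool has_target(const int* p) { return p != nullptr; }

// Integer test written as an explicit comparison instead of `if (n)`.
bool is_nonzero(int n) { return n != 0; }

// Named cast, for places where a comparison would read awkwardly.
bool any_flags(unsigned flags) { return static_cast<bool>(flags); }
```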


I'm not sure about explicit braces for cases in `switch`. I think what Swift does is pretty neat: each case breaks by default, so you don't have to write `break;`, instead you have to write `fallthrough` to explicitly allow them falling through.
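
Standard C++17 already has half of this: the [[fallthrough]] attribute marks an intentional fall through, though the default is still the opposite of Swift's. A sketch of how it combines with braced cases (steps_remaining is a made-up example):

```cpp
#include <cassert>

// Counts how many steps run when entering the switch at a given stage.
int steps_remaining(int stage) {
    int steps = 0;
    switch (stage) {
        case 1: {
            ++steps;
        }
        [[fallthrough]];  // explicit: continue into case 2
        case 2: {
            ++steps;
        }
        [[fallthrough]];  // explicit: continue into case 3
        case 3: {
            ++steps;
            break;
        }
        default: {
            break;
        }
    }
    return steps;
}
```

Compilers can then warn about every fall through that is not annotated (for example with -Wimplicit-fallthrough in GCC and Clang).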


That is neat, I'll look further into it.


In the LibreOffice project, we have implemented some of this (e.g. no C style casts) using clang plugins, which avoids needing a custom build of clang. But always good to see people experimenting in this space!


That's great to hear! I'm sure I could find it, but are the libreoffice coding standards written down (and could you send them my way)?

On the topic of coding standards, there's an excellent github repo https://github.com/isocpp/CppCoreGuidelines which even quotes Bjarne Stroustrup as saying "Within C++ is a smaller, simpler, safer language struggling to get out." There are hundreds of recommendations, like 'ES.31: Don't use macros for constants or "functions"'.


So we don't really have strong coding standards (kind of tricky when you inherit a 10 million LOC codebase written over ~20 years), we just try to be pragmatic and improve the code where we can.

So we have a collection of plugins, see here: https://cgit.freedesktop.org/libreoffice/core/tree/compilerp...

Which verify a variety of things.

We focus on 2 things: finding dodgy code and using APIs correctly. We don't try to modify the C++ language, just restrict accidentally straying into some of the really nasty corners.

But I like to keep an eye on experiments like yours for ideas :-)

e.g.

no c-style casts: https://cgit.freedesktop.org/libreoffice/core/tree/compilerp...

use the comma-operator sparingly: https://cgit.freedesktop.org/libreoffice/core/tree/compilerp...

is your loop variable really big enough: https://cgit.freedesktop.org/libreoffice/core/tree/compilerp...

calling virtual methods from destructors is dodgy: https://cgit.freedesktop.org/libreoffice/core/tree/compilerp...


How about making atomics mutable through const&, adding move-by-default, and marking all constructors (value, conversion, and copy) as explicit aside from move, and probably add explicit copy assignment as well?


Without an Issue tracker on the GH fork it will be hard to add ideas.

I would add: removing Unicode identifiers, because identifiers are meant to be identifiable.


Great suggestion, I'll set up an issue tracker. A few people have ventured into the code to leave some comments inline, I welcome that.


const-by-default is definitely nice. Does this extend to both sides of a pointer type? Does int * refer to int const * const?

There is nothing wrong with [&] for short-lifetime lambdas. Lambdas passed to std algorithms or immediately invoked lambdas come to mind.

edit:

Are data members also const by default? How do I declare a non-const data member that is const when accessed within a const member function? (so non-const non-mutable in original c++)


In my first release I didn't address pointers or references and need to think on them further. I'll release updates over time that address these as well as many of the great ideas that I've received from other commenters.


Add something like a break_n/break_label that allows for leaving a loop from a switch statement. Or allow goto in that limited case.


IMO it would help adoption if you supply a clang-powered rewriter into and out of your language variant. It allays the fear of losing your codebase if the compiler project dies.

Reverse the default for typename. Currently some_class<T>::thing is assumed to be an expression where 'thing' is a variable, when we don't know which template pattern to use because there may be an explicit specialization on the T that the user chooses. Hence, we have to say "typename std::vector<T>::iterator it;" instead of just saying "std::vector<T>::iterator it;". Instead, reverse that and assume it's a type by default unless shown that it's an expression. You'll need a new keyword for that, replacing "typename".

Remove the promotion-to-int rules. Currently in C (and in C++)

  unsigned short test(unsigned short a, unsigned short b, unsigned short c) {
    unsigned short x = a * b * c;
    return x;
  }
can have UB from signed integer overflow, because any arithmetic done on operands smaller than int gets promoted to int. (No, you can't fix this with "(((unsigned short)x) * ((unsigned short)y))"; the promotion happens on *'s LHS and RHS, if those have types smaller than int.) Beyond this, people seem to expect that the type of the variable declaration will apply to the calculation on the right, but it doesn't. For instance, people seem to think "float f = a + b;" can't overflow where 'a' and 'b' are ints, because the assignment is going into a float.
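
In today's C++ the usual workaround is to force the arithmetic into unsigned int before multiplying. A sketch, assuming a platform where int is wider than unsigned short (multiply_wrapping is a hypothetical name):

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Assumption: int is wider than unsigned short, so the operands promote to int.
static_assert(std::is_same<decltype(std::declval<unsigned short>() *
                                    std::declval<unsigned short>()),
                           int>::value,
              "unsigned short operands are promoted to int");

unsigned short multiply_wrapping(unsigned short a, unsigned short b, unsigned short c) {
    // Starting the chain with an unsigned int keeps every multiplication
    // unsigned, so overflow wraps around instead of being undefined.
    const unsigned int product = 1u * a * b * c;
    return static_cast<unsigned short>(product);
}
```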

I haven't thought this idea through completely yet. Extend pointer types to include a static allocation identity as part of the type. Address-of local variable or global variable should produce one of these pointers. A "static allocation identity" is a special-typed zero-size variable, so you can stick it in code or as a class member. You could have pointers that were guaranteed to be allocated by THIS allocation point, instead of pointing to every possible T in the program. I'll fake up a syntax, "tree_node ^ tree::node_alloc ". It's known not to alias any other TreeNode the program might have, it has to be attached to the allocation point owned by that specific "node_alloc" in that object. (Let me phrase it differently. A tree in C or C++ has pointers which can point anywhere as long as it's another tree node type. That could be pointing to a different tree, it could be a self-pointer, it could be pointing up the tree, and so on. If your tree_node class has an allocation root, you can say that the pointers are things allocated through this allocation root. They can not outlive the allocation root. They are distinct from the things allocated by other allocation roots, which are the same tree_node types, but different tree_node objects. The node's list of children is std::vector<std::unique_ptr<tree_node ^ node_alloc >> so it clearly only holds pointers it allocated itself.)

There's another problem with pointers related to the above. Some code I saw used a "T &get_or_default<K, V>(Container &c, K key, V &default);" and the problem was that people would call it with a temporary for the default, like "Value &x = get_or_default(mymap, key, Value());" and they'd be holding a dangling reference. If you could make that an error, that'd be great. Maybe we use a trick like the "allocation root" above and treat pointers or references to temporaries as having a different type from those to local variables. Then get_or_default takes and returns a reference-to-temporary, and attempting to assign that to a reference in a variable declaration fails. Unlike the previous "allocation root" idea where you indicate the only thing you accept, this would be a case where you accept all allocation roots except one, the "temporaries" allocation root.
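
A partial fix exists in current C++: add a deleted overload that wins when the default is a temporary. This is only a sketch, assuming the container is a std::map and the default is taken by const reference; it is not the original code:

```cpp
#include <cassert>
#include <map>
#include <string>

// Returns a reference to the mapped value, or to `fallback` if the key is absent.
template <class K, class V>
const V& get_or_default(const std::map<K, V>& m, const K& key, const V& fallback) {
    const auto it = m.find(key);
    return it != m.end() ? it->second : fallback;
}

// Deleted overload: preferred by overload resolution when the fallback is a
// temporary, so get_or_default(m, key, V()) becomes a compile error instead
// of returning a reference that dangles.
template <class K, class V>
const V& get_or_default(const std::map<K, V>& m, const K& key, const V&& fallback) = delete;
```

This only catches temporaries passed directly at the call site; it does nothing for a named local that goes out of scope before the returned reference does.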

As far as I know, no compiler takes advantage of the freedom of the order of operations except in the most trivial ways. Everyone knows that in "f() * g() + h()" the * must happen before the +, but people think this means that f() and g() must happen before h(). No, they may happen in any order at all. I had to fix a lot of code that did "Print(stream.read(), stream.size())" where "read" updates the pointer and leaves size == 0: gcc ran stream.size() first and clang ran stream.read() first, setting the subsequent size to zero. Similar issue with "expr1() = expr2();" expressions.
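
Until the language pins the order down, the fix for that kind of code is to name the intermediate results so each call sits in its own statement. A sketch with a hypothetical Stream type standing in for the real one:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical stand-in for the stream described above: read() drains the buffer.
struct Stream {
    std::string buffer;
    std::size_t size() const { return buffer.size(); }
    std::string read() {
        std::string out;
        out.swap(buffer);  // leaves buffer empty, so size() becomes 0
        return out;
    }
};

// Order-dependent call whose result differs between compilers:
//   print(stream.read(), stream.size());
// Separate statements make the order explicit.
std::pair<std::string, std::size_t> read_with_size(Stream& stream) {
    const std::size_t size_before = stream.size();  // runs first
    std::string data = stream.read();               // runs second
    return std::make_pair(std::move(data), size_before);
}
```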

Extend switch() and case to work on objects with any operator== defined. Add a statement for fallthrough and default to 'break;' before the start of the next case-label. Give each case label its own scope so I can declare variables in there without adding my own curly-braces. (Bonus 1: can you design a way to ensure that case labels are not overlapping? May require something other than operator==. Bonus 2: can you allow cases to be structured binding matches, similar to Rust?)

Speaking of structured binding, it's great but doesn't allow nesting. This

  std::vector<std::unordered_map<std::string, std::pair<int, int>>> v;
  for (auto [name, [lhsid, rhsid]] : v) {
is code I actually wanted to write in the past week yet that's a syntax error.
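
The C++17 workaround is a second binding inside the loop body. A sketch that iterates over a single map rather than the vector of maps above (IdPairs and sum_ids are names made up here):

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>

using IdPairs = std::unordered_map<std::string, std::pair<int, int>>;

// Sums every id in the map, unpacking each entry in two steps.
int sum_ids(const IdPairs& ids_by_name) {
    int total = 0;
    for (const auto& [name, ids] : ids_by_name) {
        const auto& [lhsid, rhsid] = ids;  // second binding unpacks the inner pair
        total += lhsid + rhsid;
    }
    return total;
}
```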

Add the ability to declare object inheritance ("class Derived : Base;") so that I can cast between them before writing out the body of the derived class. Also allow me to write out the entire class tree with no possibility for extension in another translation unit. The "final" keyword states that a class may not be derived from, but I usually have a Base class which does have subclasses, but a known list of subclasses that will never grow without recompiling the whole project. Currently the compiler has to assume I could write a new subclass and compile it into a shared object that the existing program dlopen's and the existing program will work. It's crazy. No, I have the final tree not just some leaf classes, please devirtualize the whole thing for me.

Are ABI changes on the table? Explicit template instantiations and explicit specializations should mangle differently. See my comment elsewhere: https://github.com/dealii/dealii/issues/3705#issuecomment-11...

If I think of some more, I'll reply to myself.


I'd like to be able to redeclare a member variable from the base in my derived with a covariant type.

  class FontInfo { /* ... */ };
  class TTFFontInfo : public FontInfo { /* ... */ };
  
  class Font { protected: FontInfo *info; /* ... */ };
  class TTFFont : public Font { public: using TTFFontInfo *info = Font::info; /* ... */ };
making TTFFont::info an alias for Font::info but with the derived type.
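
The closest thing in current C++ is a typed accessor in the derived class. A sketch under the assumption that TTFFont only ever stores a TTFFontInfo, which the constructor enforces here (ttf_info and table_count are hypothetical names):

```cpp
#include <cassert>

class FontInfo {
public:
    virtual ~FontInfo() = default;
};

class TTFFontInfo : public FontInfo {
public:
    int table_count = 0;  // hypothetical TTF-specific field
};

class Font {
public:
    explicit Font(FontInfo* i) : info(i) {}
    virtual ~Font() = default;
protected:
    FontInfo* info;
};

class TTFFont : public Font {
public:
    // Accepting only TTFFontInfo* is what makes the downcast below safe.
    explicit TTFFont(TTFFontInfo* i) : Font(i) {}
    TTFFontInfo* ttf_info() const { return static_cast<TTFFontInfo*>(info); }
};
```

It works, but every derived class has to repeat the accessor by hand, which is the boilerplate the proposed alias syntax would remove.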


break and continue with an integer argument, how many times to break. "break" breaks out of nearest loop, "break 2" breaks out of two. "continue 2" breaks out of one loop and then performs "continue" on the loop outside that.


Labels would be more readable for that IMO:

    outer:
    for (...)
      for (...)
        if (...)
          break outer;


This is a neat idea but I'm afraid it's fraught in that the resulting code could be as hard to maintain as the same written with goto.


Could you please write a translator, that would convert into the modified syntax instead of emitting an error?



