C++ Modules Might Be Dead-On-Arrival (vector-of-bool.github.io)
272 points by pplonski86 on Jan 28, 2019 | 193 comments



D made the decision at the beginning to have filename==modulename specifically to avoid this problem. So:

    import std.stdio;
means look for:

    std/stdio.d


Have there been any downsides or annoyances with that approach?


1. The filename characters have to be valid D identifier characters. This annoys some people.

2. Because Windows has case-insensitive filenames, and Linux, etc., have case sensitive filenames, we recommend that path/filenames be in lower case for portability. This annoys some people.

3. There are command line switches to map from module names to filenames for special purposes. They're very rarely needed, but invaluable when they are.

Overall, it's been very successful.


> 2. Because Windows has case-insensitive filenames, and Linux, etc., have case sensitive filenames, we recommend that path/filenames be in lower case for portability. This annoys some people.

That's also a problem in most other languages; just a couple of days ago, someone else's C++ code didn't compile on my machine because they had accidentally included <something/whatever.h> when the file was actually named <something/Whatever.h>, because macOS is case insensitive. I had the same experience with JavaScript some months ago, that time because they were running Windows.

On the "filenames must be valid identifiers" thing; I really wish more languages would start allowing kebab-case in identifiers. That's also absolutely not a D thing, more of a common complaint about most languages.


Kebab-case is, in my opinion, the most beautiful of identifier cases, but how would you get around the ambiguity with subtraction, short of going full Lisp or adding whitespace sensitivity?
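To make the ambiguity concrete, here is a tiny sketch in today's C++ (where '-' cannot appear in identifiers, so the second reading does not exist yet; the names are made up):

    #include <cassert>

    int main() {
        int foo = 3, bar = 1;
        int x = foo-bar;   // today this can only lex as foo - bar, i.e. 2
        assert(x == 2);
        // If '-' were a legal identifier character, "foo-bar" could just as
        // well be a single kebab-case name, and the lexer could no longer
        // tell the two readings apart without some whitespace rule.
        return 0;
    }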


I don't think requiring whitespace around operators is such a bad thing. My personal coding style generally looks like `int foo = 10 + (something - 20);` anyways, and I think that style is a lot more readable than `int foo=10+(something-20);`. If you require whitespace around almost all operators, you open up the possibility of naming identifiers basically anything, which lets you have conventions like naming predicates or boolean members with a question mark at the end. In my opinion, `myObject.whatever?` looks a lot better than `myObject.isWhatever`.

Exactly which operators should require whitespace and which don't is up for debate, but in my personal opinion, requiring space around infix operators and letting prefix/postfix operators not require a space would be appropriate. Nobody wants to have to write `myArray [i]`, but I think most people would be willing to give up `i-1` and instead write `i - 1`.


I like to be able to remove spaces in complex expressions, for readability. For instance, this:

    t[x] = t[x-1] + t[x-2]
looks more readable to me than this:

    t[x] = t[x - 1] + t[x - 2]
Another example:

    y = a*x1 + b*x2 + c


> adding whitespace sensitivity?

Would adding whitespace sensitivity really be a problem? You already need whitespace to separate identifiers, so it's not a totally foreign concept in mainstream languages.

It seems like we've been making a weird tradeoff by disallowing kebab-case just so we can smash our operators together with our operands.


>> It seems like we've been making a weird tradeoff by disallowing kebab-case just so we can smash our operators together with our operands.

Some people don't want to bother putting a space between operators and operands, and proponents of kebab-case just don't want to push the shift key to get an underscore.


To get kebab-case, the restriction that identifiers cannot start or end with '-' gets you pretty far. The only whitespace change is that you sometimes need whitespace around an infix '-'. Other operators are still fine, and it still works fine for prefix and postfix operators.

Also, the reason to prefer kebab-case for me has nothing to do with avoiding a keypress. It's that I find kebab-case easier to read.


As the other poster mentioned, you gain more than just the dash when forcing white space; you get forced readability and characters like / ? and ^ in identifiers. Then you can name things like foo/bar or e^x


I suppose I could get used to that, but it sounds like a nightmare at first blush.


I don't think I could handle i ++ instead of i++


You wouldn't have to require whitespace around all operators; you could, for example, decide that infix math/bitwise/logic operators must have a space around them, while other operators (like the infix operators `.` and `->`, and the prefix/suffix `++`, `--`, `[...]`, and `!`) wouldn't require whitespace.

I agree that nobody would want to write `i ++` or `foo [10]` or `myvar . mymember`, but I think a lot of people could get behind `10 - 20` and `foo && bar` instead of `10-20` and `foo&&bar`.


Does one actually need whitespace to separate identifiers? This is something that's always bugged me a bit.

Aside from C-style type declarations ("unsigned int x;"), C-style syntaxes seem to always have ways other than whitespace to separate identifiers.

Like (using some JavaScript in a hypothetical example) I can't think of many concrete reasons why this is easier to parse:

  let first_number=2, second_number=2, answer=first_number-second_number;
...than this:

  let first number=2, second number=2, answer=first number-second number;
Although, of course, some languages—most Lisps, Tcl, and Red/REBOL come to mind—actually do rely on whitespace and whitespace alone to separate identifiers in many situations, and something like this would likely be unworkable there.


Though one can use whitespace in Common Lisp identifiers, by quoting symbols:

  CL-USER 115 > (let ((first| |number 10)
                      (second\ number 20))
                  (+ first\ number |SECOND NUMBER|))
  30


Oh, I never noticed that. That's pretty interesting—although unfortunately, that syntax does not look terribly convenient to write, which is the main thing I'm after here.

I think the best way to get identifiers with whitespace to work in a Lisp would be to contrive a syntax for S-expressions that uses something other than whitespace to separate things. Perhaps letting (first rest-1 rest-2 ...) be written as (first: rest 1, rest 2, ...) or (first, rest 1, rest 2, ...), so that example could be written as:

  (let: ((first number: 10), 
         (second number: 20)), 
        (+: first number, second number))
I imagine it would be possible to write a macro in Common Lisp to transform this into runnable code, or a language in Racket to do so—although, I'm not sure how many people would actually want to make or use something like this.


Not all whitespace you see in Lisp code is strictly necessary:

TXR Lisp:

  1> (list 1"a"'(b(c)d(e)))
  (1 "a" (b (c) d (e)))
Here we just have one space that prevents list 1 from being list1.


TikZ allows whitespace in identifiers. At first it was pretty strange, but I actually really like it now. I don't think parsing it is much of a problem, and I would quite like to be able to use it in other languages.


Well, thinking about it more, I did realize there's a pretty nasty edge case with my hypothetical JavaScript syntax:

  let let x = 5;
  let x = 6; 
  // should this set the variable "let x"?
  // or define a variable named "x"?
One could potentially design around situations like this, but allowing whitespace in identifiers likely does require being much more meticulous about treatment of reserved words, identifiers, and whitespace than more traditional syntaxes, and this is likely why not many people attempt this.

I think the idea is worth experimenting with, though, and that a good implementation of it could be convenient enough for end-users to outweigh the implementation inconvenience.


I generally separate operands, but in dense math expressions it's not always the best for readability.

   x = (-b + sqrt(b**2 + 4*a*c) / (2*a)
   
   x = (- b + sqrt(b ** 2 + 4 * a * c) / (2 * a)
... Although, one could argue that allowing tightened multiplication and division are enough.


Not to be that guy, but...

  x = (-b + sqrt(b**2 - 4*a*c) / (2*a)


lol, thank you.


Well, if you're only interested in kebab-case, then specifying that identifiers can't start or end with hyphens solves most of the problem. The only restriction would be that you'd need whitespace around '-' only when used as an infix operator.


Use +- for subtraction, where + is binary plus and - is unary minus.


Not sure if it's a good idea, but maybe try:

  foo-bar # kebab-case
  foo−bar # subtraction
  foo minus bar # subtraction? (infix identifier "minus")
  foo - bar # subtraction (infix identifier "-")
  foo − bar # subtraction (operator symbol)
using \u2212 as an explicit subtraction operator for people who really can't stand having 'extra' whitespace?


Perl6 allows it, and it's the preferred convention to boot.

Small Intro: https://perl6advent.wordpress.com/2015/12/05/day-5-identifie...


Use ‘−’ for subtraction, allow ‘–’ and ‘‐’ in identifiers, and report an error for ‘-’.


Sure, just make everyone replace their keyboard with one which has two `-` buttons and make everyone understand why they need two buttons for something which looks like the same letter and you can safely use – for one thing and — for the other.


Those people are easily annoyed.


> 1. The filename characters have to be valid D identifier characters. This annoys some people.

this could be solved by allowing strings in qualified imports

  import "illegal identifier"."some more weird unicode" as someLib;


Which, incidentally, is how Zig does it:

  const thing = @import("relative/path/to/thing.zig");
  const package = @import("packagename");


And rust

`extern crate foo`

`extern crate "foo-bar" as foo_bar`

This is all legal identifiers, but:

`use std::path::Path;`

`use std::path::Path as int; // For maximum confusion`

And not that anyone uses this part but:

`mod bazz;`

`#[path = "bazz-bar.rs"] mod bazz_bar;`


>The filename characters have to be valid D identifier characters. This annoys some people.

I'd say "valid X language identifier characters" should always be ASCII.

I never understood the BS fad for unicode identifiers.

Wanna allow some math symbols? Maybe. The full unicode gamut, so that you can have a variable named shit emoji? Yeah, no.


But that's not only about Unicode. You can't start your filename with a digit or a dash.


This stops most of the world naming things in their native language.


As a non-native English speaker, I understand the desire to name things in my native language (German), but for all but linguae francae, naming things in a native language presents an obstacle to sharing these things with others.

Compare итератор and 迭代器 (produced by Google Translate), which are complete mysteries to me. If my intention were to reach as many people as possible, I'd use "iterator" (which, coincidentally, works for English and my native German).


There are technical terms and there is domain terminology.

I once worked in finance, and there is a difference between GAAP accounting and German accounting rules. If my algorithms had used English terminology to be consistent with technical terms, it would have been confusing in each review. Using German terms there (even combined with English "get" or "set", like "getBetriebsertrag") was beneficial, even though it always confused new members of the team.


Funnily enough, Russian also spells it "iterator", just in Cyrillic. But your point stands nonetheless.


Speaking as someone who doesn't have English as their first language I think programming should be in English and ASCII. This includes identifiers and filenames. Strings on the other hand should be 100% valid Unicode, never ASCII.

This means the code can be read by anyone anywhere in the world on any operating system and that string payloads can similarly be read by anyone anywhere in the world.


Since keywords and APIs are usually in English, continuing to follow that convention in your variable names is often the most natural option.

But in cases where the program will be dealing with some concept that doesn't exist in English, being able to refer to things by their actual name, in the native language (assuming that's also the native language of the customer and development team), is much better than inventing a confusing and unnatural English translation.


> This means the code can be read by anyone anywhere in the world on any operating system and that string payloads can similarly be read by anyone anywhere in the world.

Uh, no? You are not supposed to be able to read this valid Unicode string literal in Korean: `"그뤼고 이 문좌열은 일부려 기ㅖ버역을 어럽게하러고 오타비문이 산개해 있구먼요."`

Also, a significant portion (and possibly the majority) of code will only ever be read and written by a small group of people, often sharing a common language other than English, so non-English code is just fine for them. If you are saying that a public library should be written in English, I almost agree---there would be some exceptions though.


Do you mean you want something like C#?

    public class 그뤼고
    {
      private 이 이 {get;set;}

      public 그뤼고()
      {
        var 문좌열은 = 이.문좌열은;
        var 기ㅖ버역을 = Enumerable.Range(0, 10);
        var 오타비문이 = 기ㅖ버역을.Select(요 => 요);
      }
    }

    public class 이
    {
      public int 문좌열은 {get;set;}
    }


Sometimes, though my example was intentionally obscured to prevent machine translation and you were not aware of Korean conjugations (usually omitted in identifiers) ;-)

I have seen numerous instances of pseudo-English when it comes to naming. It is hard to name things in non-native tongues. When reasonable, reducing that overhead can indeed be beneficial.


Oops (yes, I don't know what it meant). I mean I speak Ukrainian and Russian. If I saw the code like that in my native tongue, I would be upset.

Plus it is a pain to Alt+Shift between languages all the time.


That's the main benefit.

A programming language's identifiers are not the place to express one's national identity. They should be utilitarian, and easily understood by programmers across countries.

Since you're already supposed to understand the syntax of every major programming language (which is based on english) you can make do with english keywords too. Nothing worse than opening some code to find bizarro foreign language identifiers.

(And I'm no native english speaker, so I'm not speaking as someone who's ok with this because english is their language or ASCII fits their default keyboard layout: it just makes sense).


I see what you're saying, but then again, this is exactly how many chinese programmers feel all the time.

> Nothing worse than opening some code to find bizarro foreign language identifiers.


So we should live in a world where Chinese programmers share knowledge only with Chinese programmes, Americans share among them and with Brits, Aussies and Canadians, and Mexicans share with Spaniards?

I'm not a native English speaker, and I only stand to lose in this scenario.


Look, I worked for years in localization and I'm actually a proponent of English as a worldwide-spoken language. But I'm pointing out the irony in saying "Nothing worse than opening some code to find bizarro foreign language identifiers." when that is exactly how CJKHT(...) programmers, many of whom do not speak a word of any western language, feel. Closer to the west, even people in countries that use Cyrillic or Greek alphabets are not necessarily familiar with Latin transliteration and forcing it on them is dubious.

I mean, yeah, this can be made a requirement of programming languages: After all, it was such a requirement for a long, long time. But it doesn't have to be anymore.

BTW, full disclosure, I'm French and I find code written in french completely fucking unreadable. And that's 100% ASCII. As I said I believe code should be written in English, but I also don't think we should have essentially-artificial barriers for people to enter something as important as programming; those barriers only end up eroding the culture in question.


No one is claiming Chinese programmers should only code in Chinese, but to assert that they or anyone must code in American English for the convenience of Western programmers seems absurd. Should they also be forced to write all of their novels and perform all of their movies and music in English as well?

We live in a multicultural world, one for which English as a default doesn't make sense in every context. Yes, it may be the case that Chinese, Spanish, Greek, German, Arabic, or other non-English-speaking programmers write code primarily meant to be used and understood within their own culture. I see nothing wrong with that.


Chinese programmers using ASCII and English names for things would also be of great help to all other East Asian programmers that aren't Chinese.

Also what about the fact that while Mandarin is the largest language/dialect it's far from the only one in China? Using English/ASCII means all the programmers in China can understand each other's code...


There are languages that allow more than ASCII, and that's not what happens in them.


For that matter, there are human languages besides English that can be written using only ASCII (sometimes by transliterating) and that's not what happens in them.

Just like people know they have to write in English on HN to communicate, they also write their code in English when they plan to open-source it and share with the rest of the world. As for closed-source projects ... if your company doesn't conduct its business in English, why force the code to be in English? The only people whom that'd benefit are never going to see it.


This is something that only humans care about, not computers. And humans can be accommodated by prettyprinting - or, in a pinch, by roundtrip conversion from an ASCII-only format to a "rich", Unicode-based one, and back. But let computers have their simple, ASCII-based identifier names. E.g. https://en.wikipedia.org/wiki/Punycode is a thing, and is routinely used for "native-language" domain names. But guess what, these domain names are still ASCII under the hood!

(Indeed, we should arguably move away from the notion of a single character string as the only human-facing semantics that an identifier is associated with-- there should be a higher layer, perhaps with multiple choices of e.g. native language, formatting and the like. Human facing semantics are closer to "literate" documentation than to anything that compilers should have to deal with. Yes, the "native", underlying representation should still be something that we can somehow make sense of - I'm not saying that our identifiers should be GUIDs or anything like that! But it will only be resorted to in a pinch.)


That's a false dichotomy. If you were talking about restricting identifiers to codepoints in the Unicode "letter" categories, you might have a point. (Nb. there are approximately 160,000 "letter" codepoints in Unicode, the vast majority of which being CJK ideograms.)


D allows the same Unicode codepoints as identifier characters as does C and C++.


No, I'm saying restricting identifiers to AZaz09_.

I don't see a dichotomy (much less a false one). What are the two options I separate artificially?

I'm saying just don't impose regional alphabets (other than A-Za-z, which is already par for the course with the syntax of all major programming languages anyway) and regional words into source code.


You disregarded 99.99% of Unicode as "math symbols and poop emoji". That's just a ridiculously biased Anglocentric viewpoint. It's 2019; there's zero reason to force people to stick to either a subset of their native script, or a completely foreign one, when naming identifiers in a programming language.


Valid characters are usually not the full ASCII set at all, so why not pick and choose from Unicode as well?

Or why not full unicode text support? Is there any real reason besides "some people might want emoji and I don't like that"?


They might use their native language too!

But no, the totality of the argument always reduces to: "I'm not used to this and would find it inconvenient"


I haven't used D, but in Haskell we have the same module name == file name thing. The only time I don't like it is when we have nested modules, the parent and children modules are not in the same directory:

    import A      -- compiler reads A.hs
    import A.B    -- compiler reads A/B.hs
Thus, two semantically related modules are now in different directories.

Python, IMO, handles this correctly by having __init__.py support inside directories. It's theoretically less elegant because of the special name, but in practice leads to better file organization.

Same for Rust, but even better, because one can define nested modules in the same file. So you can either define a new module in the same file, put it in a different file named by the module, or put it in the file `mod.rs` inside the directory named by the module.


D also has package.d for the same effect as python. It works quite well as package is a keyword and thus can't be used as a regular module anyway.


But package.d could lead you to overdepend on too many modules too :) it's mostly great for end applications rather than internal to libraries.


Sure, but that's not specific to package.d. See the earlier convention of all.d or d.d. That's the trade-off with having public imports. I don't see how this is related to my comment though?


> The only time I don't like it is when we have nested modules, the parent and children modules are not in the same directory:

> import A -- compiler reads A.hs

> import A.B -- compiler reads A/B.hs

That has to happen at some level, assuming subdirectories; otherwise, what would "import A.B.C" refer to?


For readers, Lua does this in the form:

    require( "engine.shared.entities" )
or

    require "engine.shared.entities"
Which means look for:

    engine/shared/entities.lua
Additionally, with the `package` module, `require` can also be modified to look for:

    engine/shared/entities/init.lua
which is a common Lua practice.


Lua can also be modified to have "engine.shared.entities" look for a file named "/engine-shared-entities.lua" and similar things. The creators have cited that Lua is sometimes used in environments where directories do not exist, only files.


What were your inspirations? Python? Others?


As someone who programmed professionally in C++ for eight years but hasn't touched it since 2011, all I can think of is that it would be easier to learn Rust than to catch up to current C++.

Not that my memories of C++ are bad or that I'd avoid using it again! It's just that it would be like trying to reconnect with someone I haven't seen since college. I'm curious, but I don't know if it would be worth the awkwardness.


As has been true throughout the history of C++, nobody uses every single feature. You just pick what you want (foreach loops, yay!) and write code. All your old code still works.

It's the best time to be a C++ programmer, because if there's something that annoyed you about C++03, there's probably a better way to do it in C++11/14/17.


Just an opinion: Rust and C++ are going down different paths and selecting different subsets of features. Rust generally lives at a higher layer, while when it comes to tough situations - ABI compatibility, a machine where CHAR_BIT is not 8, weird platforms with very weak consistency models, recovering your program when any bit might be flipped at random, dealing with platform-related half-broken OS APIs, and manipulating stacks to squeeze the last drop of performance out of your program - C++ is still the only choice.

In a word, Rust is becoming "C++ for the 80% case", but the remaining 20% is inherently difficult (much harder than most people imagine - just try to implement a file system library, make it work on the last two versions of Windows, Mac, and the major Linux distributions, and you'll understand what it means).


As someone who programmed professionally in C++ for almost 20 years, I'd say you're wrong. Just look at "A Tour of C++" by Stroustrup and you are 80% of the way there. It's a very simple book.


And the remaining 20% will take 200% of the time? :P


Isn't this true with Rust too?


Mastery always takes forever.


Is there any successful product you made using Rust?


In addition, C++ doesn’t come with Rust’s awesome community and is missing all of Cargo’s benefits. Rust’s genesis as a brand new language acted as a crucible that melted away all kinds of molds.


Rust’s ‘awesome community’ comes across to me as pushy and arrogant more often than not.


You're certainly entitled to your opinion.


The idea of forcing modules to be defined in files with a deterministic way to pair module names and file names seems pretty reasonable. Two examples come to my mind:

- Go doesn't place requirements on the names of the files defining a package [1]. However, it doesn't have a preprocessor either, so the problems described in this article (specifying the module name within an #ifdef) are impossible.

- FreePascal has a preprocessor [2], but it defines a deterministic algorithm [3] to find the files containing a unit (the FPC equivalent of a module). Moreover, the compiler creates two files for every unit: a .o object file, and a "unit description file" [4], much like the C++ proposal.

It seems that FPC's case is the most similar. I think the author is right; the C++ committee should adopt a deterministic way to find the name of the files defining a module.

[1] https://golang.org/ref/spec#Packages

[2] https://www.freepascal.org/docs-html/current/prog/progse4.ht...

[3] https://www.freepascal.org/docs-html-3.0.0/user/usersu7.html

[4] https://www.freepascal.org/docs-html/current/prog/progse13.h...


> - Go doesn't place requirements to the name of files defining a package [1].

In Go, the import path and package name are two distinct things: the import path locates and identifies the package, while the package name acts as the default name for scoping qualified names exported from that package.

Furthermore in Go the file names are not part of the import path: the name of the directory containing the files that together define a single package is part of the import path.

An imperfect analogy with C++ would be:

- Go import paths <-> path to the included file (header)

- Go package names <-> a namespace inside that included file.

- Go package filenames <-> sections within the included file


What really scares me is that the use of the preprocessor is still allowed inside modules. This might have been a chance to define a clean break, at least with #include and its shortcomings. A syntactically and semantically saner replacement for #define and #ifdef could have laid the groundwork for much improved tool support. But if the preprocessor is dragged into modules as a whole (sans interaction between modules), the only gain is in language complexity.

I generally dislike the need to maintain separate header and implementation files. Maintaining both is time consuming and putting everything in headers is no panacea, either. Now modules seem to add another type of interface definition to the mix that would need to be maintained after a project adopts modules.


How would you write different code for different architectures or based on compilation flags without preprocessor directives? Rust has directives for specifying which version to compile, but C++ currently doesn't. However, I find #if __AVX512BW__ / #elif __AVX2__ / #elif __SSE2__ / #else / #endif to be easy and flexible, allowing only a subset of code to vary by architecture. I also find that macros allow me to write much more concise, maintainable code.

It’s archaic and low level, but it’s also powerful and expressive. Replacing the CPP would probably just require a new language.


What about using `if constexpr`? Sure, there currently are no equivalent constant variables to use, but adding those should be very easy.


if constexpr only works for procedural code, not data members. (e.g., use __m128i v[4] for SSE2, __m256 v[2] for __AVX2__, etc.) It could be templated, as long as there was a compile-time method besides the CPP to get architectural information, so that would be a way (if much more verbose) forward.

That being said, if constexpr is great; iterating over either sparse or dense arrays in Blaze without wrapping the code in two functions was mind-blowing when I discovered I could do it so easily.
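A rough sketch of that distinction, assuming a hypothetical compiler-provided constant existed instead of the preprocessor macros (simd_width below is a stand-in, not a real facility):

    // Hypothetical: pretend the implementation exposed this as a constexpr
    // value instead of a macro. The value is just a stand-in for the example.
    constexpr int simd_width = 4;

    // `if constexpr` handles procedural differences fine...
    template <class T>
    T sum4(const T* p) {
        if constexpr (simd_width >= 4) {
            return p[0] + p[1] + p[2] + p[3];   // imagine a wide/SIMD path here
        } else {
            T s{};
            for (int i = 0; i < 4; ++i) s += p[i];
            return s;
        }
    }

    // ...but architecture-dependent data members still need a template plus
    // specialisations, because `if constexpr` cannot change a class's layout.
    template <int W> struct vec_state;                 // primary, left undefined
    template <> struct vec_state<4> { float v[4]; };   // "SSE-like" storage
    template <> struct vec_state<8> { float v[8]; };   // "AVX-like" storage

    using active_state = vec_state<simd_width>;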


That is why static if and version in D work very differently. Andrei Alexandrescu criticized if constexpr in C++ for being a poor copy of static if that misses the point, but that was dismissed. Still, it is powerful enough to handle almost all cases of conditional compilation.


Isn't C++ supposed to be object-oriented? Why couldn't you have a specific class optimized for each architecture?


Like when one does proper clean C or C++ coding without pre-processor spaghetti: an implementation file for each piece of architecture- or OS-specific code.


So, copy and pasted code spread out everywhere rather than centralized?


Not at all: fine-tuned small functions or architecture-specific headers.


Yes thank you. Not to mention that the parent comment neglects the fact that splitting out files just means you have to now worry about conditional inclusion or tighter coupling with your build system.


That approach has successfully turned a pile of spaghetti C pre-processor code into a manageable, secure server application deployed across AIX, HP-UX, Solaris, Linux and Windows NT/2000.

Yes, it was tightly coupled with Make, which took care of selecting the proper set of files to compile and link based on the platform.

The anti-module crowd seems to want to have it all, which to me is the same as the anti-exceptions and anti-RTTI crowd, and in that case modules are indeed dead-on-arrival.


> That approach has successfully turned a pile of spaghetti C pre-processor code into a manageable, secure server application deployed across AIX, HP-UX, Solaris, Linux and Windows NT/2000.

I have also written and deployed what you’re denigrating as “preprocessor spaghetti code” across the above exact platforms, with a lot of success.

Perhaps we work(ed) at the same company.


Couldn't you just link against a different version? e.g. mymodule-avx2.o vs mymodule-avx.o


What's your story for migrating billions of lines of code that are currently using the preprocessor to use modules?

One of the design goals of the module proposals is that there is a migration path from pure include to pure modules. The transition must not require a flag day.

In particular a program must be able to handle a mix of modularized libraries and old-school libraries for the rest of eternity (it is not likely that C is going to move to modules anytime soon).


That would mean that it's impossible to mix old and new C++ codebases, which would make it very, very hard to port larger projects. It may even make it impossible if one uses a header-only 3rd party library.


It would certainly restrict the ways in which code could be mixed or updated. But I do not think that it would be as hard as you make it out to be. Let's say that you can have either modules without includes or more traditional translation units that use a preprocessor and can also import modules. Then you could port the code over one module at a time, couldn't you?


Not really. You can't wrap a translation unit that uses a 3rd party library into a module. That means every other translation unit that uses this unit also can't be a module and so forth.


The article actually says c++ would be best to follow Python’s import model, which makes no sense to me. It doesn’t work to try to get the compiler to go compile other modules when you import them, because it has no way to know how you want them compiled (build switches, #defines, pre-compilation steps, etc). The eventual answer has to leave the dependency-relationship of modules to the build system; I don’t understand how that can even be up for debate in a c++ context. If the proposals include pulling a standard build system into the compiler, then they have very little chance of gaining traction.


Say if modulename == filename, then you can create a build system that creates a DAG and waits on compiling this module until the BMI files are there.

If you have the other module as a build step it will compile. If not, you have a linker error.

I don't see the issue here?

Especially if you force module imports to be at the top of the file, you could scan just the start of the file to figure out whether you can continue, and pause until that becomes possible or until all other modules are finished and you still didn't get your BMI.


I might've misunderstood your comment, but I think you're referring to a different meaning of "Python’s import model" than the parent comment. Here "Python's import model" refers to the fact that when Python comes across an import statement it will potentially pause compilation of the current translation unit to go and compile something else. It does not refer to the fact that Python maps import statements to directory names and file names. Here is the relevant quote from the article:

When a new import statement is encountered, Python will first find a source file corresponding to that module, and then look for a pre-compiled version in a deterministic fashion. If the pre-compiled version already exists and is up-to-date, it will be used. If no pre-compiled version exists, the source file will be compiled and the resulting bytecode will be written to disk. The bytecode is then loaded.

The article suggests using this idea in C++ and the parent comment objects, but then it sounds like you're saying it wouldn't be needed anyway (so you're disagreeing with the article too?).


Not deeply familiar with C++ modules, but I've built and maintained fairly large build systems for other languages (and written my fair share of C++), and from this article I'm not quite sure where the intractable problems lie. It seems like the .bmi files are effectively an optimization that allows for fast incremental compilation, but a compiler doesn't actually need them to run compilation from scratch: it knows how to generate them, so if they're missing it can fall back to the old, slower #include-style compile-every-file behavior, generating the .bmi files as it goes. It doesn't seem like they add new slow paths that you can't already construct today with macros and #include, so it's hard to see why they'd be DOA: first time compilation should be no slower, but incremental compilation should be much faster thanks to interface stability.

Maybe I'm missing something?

It's not like C++ modules were designed by random nobodies, though; this has been worked on by build infra engineers at major companies with enormous C++ codebases like Facebook, and compiler maintainers e.g. the Clang maintainers. It's possible they completely forgot to think about parallel builds, but that seems at least a little unlikely.


But you can't just compile-every-file. Each file can depend on the outputs of compiling some unknown set of other files. The compiler needs to become a build system, or the build system needs to become a compiler.

The clang modules proposal had the concept of mapping files, mapping module names to file names.

Companies like Facebook will presumably use proper build systems that already encode the dependency information in the build files rather than try to autodetect it. In that kind of an environment this proposal probably isn't particularly painful.


The compiler will not become a build system because this is out of scope for C++. With or without modules, C++ will continue to rely on an external dependency management tool, such as a Makefile. The introduction of modules will not change anything in this respect.


Indeed. You've now taken one solution off the table. The other one is for the build system to become a compiler, which is equally unacceptable. That leaves you with manually encoding all dependency information in the build files. Which most people aren't doing (the exception being Bazel-like build systems which enforce that).

That seems to leave us with just one conclusion: the article is right, and most of the ecosystem will never migrate to modules, leaving us with the worst of both worlds.


The build system doesn't need to become a compiler. It just needs to parse the source files for import statements in order to create a DAG.


> The build system doesn't need to become a compiler. It just needs to parse

parsing C++ is mostly equivalent to becoming a C++ compiler.


The module preamble has been explicitly designed to be parseable without requiring a full blown C++ parser.


> parsing C++ is mostly equivalent to becoming a C++ compiler.

It really isn't. Parsing a language just means validating its correctness w.r.t. a grammar and extracting some information in the process. Parsing is just the first of many stages required to map C++ source code to valid binaries.


The presence of template specialisations and constexpr functions means that the GP is right here; you cannot decide whether an arbitrary piece of C++ is syntactically valid without instantiating templates and/or interpreting constexpr functions. Consider

    template <int>
    struct foo {
        template <int>
        static int bar(int);
    };

    template <>
    struct foo<8> {
        static const int bar = 99;
    };

    constexpr int some_function() { return (int) sizeof(void*); };
Now given the snippet

    foo<some_function()>::bar<0>(1);
then if some_function() returns something other than 8, we use the primary template and foo<N>::bar<0>(1) is a call to a static template member function.

But if some_function() does return 8, we use the specialisation and foo<8>::bar is an int with value 99; the expression then parses as (99 < 0) > (1), i.e. the result of 99 < 0 (false, promoted to the int 0) compared against 1.

That is, there are two entirely different but valid parses depending on whether we are compiling on a 32- or 64-bit system.

Parsing C++ is hard.

EDIT: Godbolt example: https://godbolt.org/z/yR3YHW


You only need to parse the "module <module name>" and "import <module name>" statements. No need to parse all of C++ for that. You could probably even do that with a regex.
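Roughly something like this toy scanner (only a sketch: it assumes one declaration per line, ignores comments, and the preprocessor can still change which imports are actually live):

    #include <fstream>
    #include <iostream>
    #include <regex>
    #include <string>

    // Print the module/import declarations found in a source file, e.g.
    // "export module foo.bar;" or "import foo.bar;".
    int main(int argc, char** argv) {
        if (argc < 2) return 1;
        std::ifstream in(argv[1]);
        std::regex decl(R"(^\s*(?:export\s+)?(module|import)\s+([\w.:]+)\s*;)");
        std::string line;
        while (std::getline(in, line)) {
            std::smatch m;
            if (std::regex_search(line, m, decl))
                std::cout << m[1] << ' ' << m[2] << '\n';
        }
        return 0;
    }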


It also has to do all the preprocessing to see which import statements get hit. I don't think templates could control at compile time which module to import, at least I hope not.


Parsing C++ is literally undecidable. You can encode arbitrary programs which will emit a syntax error depending on their result.

Here's an example of a program which compiles only if the constant N is prime, and otherwise emits a syntax error: https://stackoverflow.com/questions/14589346/is-c-context-fr....


You are misrepresenting the concept of undecidable. If the compiler can say if the program compiles or not, then it is most certainly decidable. What you want to say is that it cannot be determined without full parsing, so no preprocessing is possible.


No, it's actually undecidable. C++ templates have been shown to be Turing complete, which means that template instantiations can encode the halting problem. Determining whether a program compiles or not therefore requires solving the halting problem.

In practice, compilers work around this by limiting template instantiation depth.


I gave an example of a template program to show the general method. Obviously, primality is decidable, but there exist candidate C++ programs whose parse tree is undecidable. The trick would be to encode your parser in a template, run it on the undecidable program (i.e., itself), and create a contrary result. Does this have any effect on practical C++ builds? I honestly have no idea.


They could just specify that the module/import statements need to be at the top of the file (excluding comments). Most people will do this anyway. Then the build system only needs to parse comments and module statements, which should be fast and easy.


Except that, as mentioned in TFA, as currently proposed it's possible to say

    #if SOME_PREPROCESSOR_JUNK
    import foo;
    #else
    import bar;
    #endif
This has legitimate use cases, say

    #ifdef WIN32
    import mymodule.windows;
    #else
    import mymodule.posix;
    #endif
So in reality build systems will be required to invoke at least the preprocessor to extract dependency information.

AFAIK the modules support in the Build2 build system does exactly this, and in fact caches the entire preprocessed file to pass to the compiler proper later.


That's exactly how it has been specified.


Or do the modules equivalent of 'g++ -MM'. Or is there some reason we wouldn't want that approach?


That's the problem: you can't do that.

Having the compiler produce header dependency information is possible, since the dependencies are just an optimization. If there's no dependency information available, you can just compile all of the files in an arbitrary order, and you get both the object file and a dep file. And then on further runs you use the old dep files to skip unnecessary recompilations.

With modules, you can't compile the files in an arbitrary order: if A uses a module defined in B, B must be compiled first. So you need to have the dependency information available up front even for the first build. And since it needs to be available up front, it can't be generated by the compiler. It must either be produced by the build tool which becomes vastly more complicated, or manually by humans.
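To make the ordering problem concrete, a minimal two-file sketch (the file names, extensions and exact syntax follow common conventions around the proposals and are not meant to be authoritative):

    // speech.cppm - a module interface unit. It has to be compiled first so
    // that its BMI exists before anything importing it is compiled.
    export module speech;
    export const char* greeting() { return "hello"; }

    // main.cpp - an ordinary TU. Compiling this before speech.cppm fails,
    // which is why the build tool needs that dependency edge up front instead
    // of discovering it as a by-product of compilation, as with header deps.
    import speech;
    int main() { return greeting()[0] == 'h' ? 0 : 1; }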


This is no different from a situation where a C++ compilation has a binary dependency on other modules. The best-known case is a static library (.a file): the project cannot be built if a static library is missing. With modules, one cannot compile the project with missing modules, so the build system will have to provide this information.


But you also need compatibility with #include, so your build system also needs to become a preprocessor.


The preprocessor is a self-contained binary. The build system can just pipe through it.


The C and, I presume, C++ standards have been very carefully avoiding the idea of a preprocessor being separate at all. The standard was very carefully worded so that a separate preprocessor is never necessary, because most C compilers do not have one. It is only Unix-heritage compilers that really have one, and even they're not consistent about it.


I'm curious how you plan to detect dependencies from a file by piping it through a command that removes those dependencies.


Your conclusion is incorrect. Most people with simple projects will use simple techniques to make modules work without worrying about the preprocessor. Large companies will create their own tooling to use modules in their own way. My point is that this is how C++ has been used since its inception. C++ users are already aware that the language needs external building support and modules cannot change this reality. But modules will certainly improve how the language is used.


The compiler won't be a build system? I'm not quite so sure. We already have -MD in gcc to emit Makefile rules for the dependencies of the current file. It's not much of a stretch to propose a similar flag to emit a list of required modules. In fact the very same flag could emit a foo.bmi target requirement when you "import foo" and your Makefile should have foo.bmi as one of the products of compiling the foo module. You could also have a similar flag that tells you what modules are built from the current cpp file given some compiler options.


What I gathered is that module compilation is intended to be safe from preprocessor actions defined outside of the module. So the code you would generate with #include-style compilation and the code you would generate by compiling modules in the intended fashion aren't guaranteed to be the same. It seems as though this would mean that projects involving modules simply couldn't be compiled in the previous fashion.


First-time compilation will be slower if before you could compile 8 files at a time but now you can only do 3 at first, because the others depend on those 3. Then maybe you can compile 6 at the same time, because the rest of the code depends on those 6 modules, etc.


> In this respect, C++ would be best to mimic Python’s import implementation: When a new import statement is encountered, Python will first find a source file corresponding to that module, and then look for a pre-compiled version in a deterministic fashion. If the pre-compiled version already exists and is up-to-date, it will be used. If no pre-compiled version exists, the source file will be compiled and the resulting bytecode will be written to disk.

What??? How would that happen? Are modules always compiled with zero flags? In non-module C++, how the dependent module gets compiled is defined in the build system, so in order for the compiler to build a missing .bmi it would have to ask the build system how to build it.

That seems to answer the question. What happens if foo.bmi does not exist? Answer: you get a compilation error (missing foo.bmi, or foo.bmi out of date). You then need to go fix the dependencies in your build system to make sure foo.cpp gets compiled before bar.cpp.

Right?

I get that might suck but it's not unprecedented. Lots of builds have dependent steps. Maybe in order to implement C++ modules, build systems will need an easier way to declare lots of dependencies, whereas now dependencies are an exception?


> What??? How would that happen? Are modules always compiled with zero flags? In non-module C++, how the dependent module gets compiled is defined in the build system, so in order for the compiler to build a missing .bmi it would have to ask the build system how to build it.

The build system just figures out how to invoke the compiler. The compiler does the actual building. When the compiler runs, it has all the flags.

Remember, headers in C / C++ are basically a file level construct. They happen before you even split the file into tokens. #include just means "do the equivalent of opening that file in a text editor and copy and paste it in place of this #include line."

The compiler is already compiling header files, as part of compiling cpp files.


But we're not talking about header files, we're talking about modules. Modules are a new concept so how they work is up for definition. To say that if bar.cpp uses module foo that foo.bmi must already exist is not an unreasonable rule.

modules work with the import statement (new) not the #include statement. They are not the same as include at all.

In fact this is spelled out in the article in the first goal

> The “importer” of a module cannot affect the content of the module being imported. The state of the compiler (preprocessor) in the importing source has no bearing on the processing of the imported code.

In other words, the flags passed in when compiling bar.cpp have no effect on foo.bmi. foo.bmi is the result of the flags passed in when foo.cpp was compiled and those flags can only be gotten from the build system if foo.bmi does not exist.


Just for comparison, in Rust this is solved in a very easy way: If you are in a module foo and have a mod bar; statement, then the compiler will go search for bar.rs and for bar/mod.rs. If neither are found, it reports an error. There is only one path where the compiler starts the search from: the foo/ directory (note that the foo module itself can be declared in the foo directory or outside as foo.rs).

Sometimes C++ can use its age as an excuse to be super complicated but here, the modules implementation of C++ is younger than Rust's.


Unlike C++ modules, Rust modules are not separately compiled. Better comparison is with Rust crates.


In Rust's crate compilation model, there's certainly some unexploited parallelism. Often as much as half of the compilation is spent in the LLVM phases. By then, all the MIR is already around and only sitting there, waiting on LLVM to finish. Downstream crates could already start their compilation with the MIR data only. Only the LLVM phases of the downstream crates need the LLVM data of the upstream crates. Assuming that half of the time is spent in MIR, half in LLVM IR, you would be able to double your parallelism, or halve the length of the critical path through the compilation graph.


Isn't this pipelining as opposed to parallelism? I assume that multiple downstream crates are already compiled in parallel whenever possible.


Yes, often things are compiled in parallel but often there are tight spots in the compilation graph where only one crate is compiling because all later crates are relying on it.


I don't get it. What about modules inherently forces them to be imported via this newfangled module namespace (eg. import std.iostream) instead of being imported by their source filenames (eg. import "iostream.h")?

If I'm understanding the post correctly, the entire problem they are facing is that you have to scan all source files to build this module->filename mapping.

None of the "essential goals" listed at the top of the blog post requires that modules be imported by namespace instead of filename, as far as I can see. So why was this design chosen when it causes these problems?


C++, as a language, has never cared about the notion of "files". The entire standard is defined as a function of a "Translation Unit", which is an abstract notion that we tend to associate with "a single .cpp file" by nothing but convention.

Since modules operate at the language level, they need to operate on this notion, which precludes importing by file.


But a header is a file, no? And it is referenced explicitly by its file name.


No, the language of the standard carefully omits talking about files. This is because there are still ancient mainframe operating systems around that do not have typical hierarchical filesystems, but it is still technically possible to provide a C++ implementation for them. Prescribing a module-name-to-file-name mapping would not work in these environments either. This is also why #pragma once was rejected and the replacement #once [unique ID] was invented instead: just deciding what is and isn't the same file for #pragma once turned out to be too difficult to define.


What I don't get though is why these ancient mainframes need the latest version of the standard. I can't imagine the compiler writers for these OSs being too eager to implement any change at all. You said "technically possible"; are you implying that nobody actually does? What are these OSs?

To me this seems like a weird take on accessibility. In order to accommodate that one OS that has some serious disabilities, everyone else has to suffer the consequences. Why not build a ramp for that one OS, and build stairs for everyone else?


IBM has multiple people on the standards committee and they care a lot about both backward compatibility and new standards. They alone were strongly opposed to removing trigraphs from the standard.

Still trigraphs were removed in the end; if there is enough support the committee is willing to break backward compatibility.


>just defining what is and isn't the same file for #pragma once turned out too difficult to define.

Admittedly that's not just a problem with old mainframes. Any system supporting file aliases (be it hardlinks, symlinks or the same FS mounted at several locations for instance) would be tricky to handle.

I always thought #pragma once was a bad idea for that reason, header guards with unique IDs don't require any compiler magic and are simple to reason about without having to read the standard or compiler's docs to figure out how it operates.


That's handled by the preprocessor. It's literally just a "insert the contents of that file here" copy-paste.


But the preprocessor is part of the C++ standard, no? I'm really not seeing why it's ok for the preprocessor to refer to files but not the language.

Also, going to this level of trouble to support systems that don't have files seems... odd. Targets that don't have files, that I can totally understand. But compiler toolchains that don't have the notion of a file? That sounds obscure beyond obscure. I'm surprised such a system would be a compilation host instead of a cross-compile target.


We are talking about a language that goes so far as to make sure it functions on systems where the size of a byte is not 8, or where memory is not necessarily linearly addressed. People tend to forget how shockingly flexible standard-compliant C++ code actually is.


I get that, but there's also precedent for cutting ancient things loose. Both C and C++ have finally decided to specify that signed integers are two's complement: https://twitter.com/jfbastien/status/989242576598327296?lang... Also trigraphs are gone in C++17.


This C++ code actually compiles with clang++. Incredible!

    int main(int argc, char *argv<::>)
    <%
        if (argc not_eq 1 and argc not_eq 2) <% return 1; %>
        return 0;
    %>
https://en.wikipedia.org/wiki/Digraphs_and_trigraphs#C


Digraphs are still a part of the language. I would be more surprised if a conformant piece of code did not compile with a conformant compiler.


Trigraphs are gone, but it took a while to win the IBM representatives over.


I don't think they were ever really won over, I think their concerns were heard and they begrudgingly acquiesced rather than vote down C++17.

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n421...


From prior discussions, there is the issue that they want the existing standard headers to work with modules. They weren't written with modules in mind, so that requires some contortions.


So you need to organize your project in a sane way regarding module dependencies. Bummer.

OCaml and Haskell do just fine with proper modules and parallel builds. I presume the same holds for Go, Rust, et al.


So far, from what I have understood, the anti-module movement seems to be all about wanting to use modules as if they were header files, while magically winning the compilation speedup of modules.


What movement? You got infected by politics to think everything is a "movement".


Any language standardisation process is full of politics.


The C++ modules folks seem determined to learn nothing from Clang modules.

Clang’s module maps solve a lot of these problems. There is a known location to find the module map and all headers from the module must be reachable from the map.
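For reference, a module map is just a small description file that sits next to the library's headers; a minimal (hypothetical) one looks roughly like:

    // module.modulemap
    module MyLib {
      header "MyLib.h"   // every header of the module is reachable from here
      export *
    }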


Clang’s module maps aren't without issues, which is why they have been mostly used by Google and no other C++ compiler vendor bothered implementing Clang's design.

Even at Apple, which originally designed them, they just focused on making it good enough for C and Objective-C system headers.


Interesting talk with a critical flavour about C++ modules by John Lakos (CppCon 2018):

https://www.youtube.com/watch?v=K_fTl_hIEGY&feature=youtu.be...


> The compilation ordering between TUs will kill module adoption dead in its tracks.

I agree, but that's because C++ TUs are too small. Rust works like this "dead-on-arrival" way, but survives because TUs are larger.


Granted I haven't looked too deeply into this in a while, but this article seems to imply usage of modules in a way that is fundamentally incompatible with the actual goals of modules.

For example, according to the clang documentation for modules [1], they are meant to obviate the necessity of #including headers of libraries you are linking to.

When you link with libraries, you expect them to be compiled already. If the library is a dependency of your current project (in Visual Studio solution parlance), then it will be recompiled if needed before your project is compiled, if your build system is set up correctly. You will also take care to point your build system to the correct version of the library you want to link to, including versions that change depending on CPP flags etc.

I don't see how modules are any different.

Please enlighten me if I'm not fully grasping the point of the article.

[1] https://clang.llvm.org/docs/Modules.html


The "modules" feature documented on that page is Clang-specific functionality. The article is talking about the newer Modules TS, which is currently in the standardization process, and works completely differently. However, as to your specific question, both Clang modules and the Modules TS aim to support replacing #include entirely with module imports, including within a single project.


Interesting. Will have to read up on it. Thanks.


I think Visual Studio might require a library to wait for another library it depends on to finish compiling before it compiles, but that's not necessary. In general you can compile all C++ files at the same time - the only stage that needs to wait is linking.


Yes, that is really annoying behavior. I once built a tool to set up custom compilation ordering to work around this.


I haven't read much at all about this, but why couldn't modules work like this?

Header, source like before. Allow putting all implementations in the source (including templates). Allow private member functions to be defined in the source file even though they were not declared in the class. Every function that is not declared in the header is no-external-linkage (including these adhoc private members). Defines don't leak into header file from including files unless explicitly passed into it (using some new syntax). Defines can leak out of header files. If the define already exists it's an error (the point of this is to make sure that order doesn't matter), unless the origin of the define is the same (so if x includes y and z, and y includes z, then a define in z would go into x directly and also through y which would not be an error).

This seems like it should be a strict improvement.
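
A minimal sketch of what the proposed split might look like (names are invented; defining the template in the .cpp is legal C++ today, but other TUs couldn't use it without explicit instantiation, which is exactly what this proposal would change):

  // widget.hpp - declarations only
  class Widget {
  public:
      template <typename T>
      void store(T value);   // declared here, defined in widget.cpp
  private:
      int count_ = 0;
  };

  // widget.cpp - all definitions, including the template body
  #include "widget.hpp"

  // Not declared in widget.hpp, so under the proposal it would get
  // internal linkage automatically (today you mark it static yourself).
  static int clamp_non_negative(int n) { return n < 0 ? 0 : n; }

  template <typename T>
  void Widget::store(T) { count_ = clamp_non_negative(count_ + 1); }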


> Header, source like before. Allow putting all implementations in the source (including templates).

Do you realize how hard this is to implement? This was actually part of the original C++ standard (the export keyword), but no compiler implemented it in a way that was compatible with other compilers, and it was eventually removed in C++11.

This (and C++ modules) requires a standard intermediate representation of the language that all compilers share, otherwise compiler A can't use a module generated by compiler B.

This is why I've never really expected C++ modules to ever exist, or if they do, it will be in a form that is much more limited than most people want or expect. Either they'll only allow a subset of the language to exist in a module, or the feature won't be much different from the "pre-compiled header" feature offered by most C++ compilers, or modules won't be portable across compilers (and maybe not even across versions of the same compiler).
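
For reference, the removed feature looked roughly like this (as far as I know, only the EDG front end - e.g. Comeau - ever implemented it):

  // twice.hpp
  export template <typename T> T twice(T x);

  // twice.cpp - the definition lives in its own translation unit
  export template <typename T> T twice(T x) { return x + x; }

  // user.cpp - could instantiate twice<int> without ever seeing the body
  #include "twice.hpp"
  int y = twice(21);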


This has always been a problem in talking about modules in C++. Everyone has a different idea of what a module is.

Currently every compiler except MSVC plans to make module files version locked. Different versions of the same compiler will use different module files.

Personally I find little value in portable module files, as you need to be able to rebuild modules anyway to handle pretty much any change to compile flags.


I have a lot of respect for C++ because it's still holding together (thriving even) after changing and adding so much.

But I have to wonder, is adding one more big feature wise? Will modules be the feature where it becomes impossible to create a compiler that's both useful and conforming?


From my experience working on a commercial game engine, my guess is that almost anyone working on a huge C++ code base (e.g. a game engine, browser, OS or the like) would rank compilation speed as the #1 issue they would like to see addressed. Modules seem like the best candidate to improve this issue, primarily by limiting the scope of how much needs to be recompiled whenever a change is made. So I think it's a feature that is worth quite a lot of compiler-writer pain to get shipped, seeing as how many man-years it could save. That said, I don't really see why this would be all that tricky for compilers to implement compared to other recent additions in modern C++ - do you have any particular reason to think it would be? (other than apparently some communication issues within the body making the spec as per the linked article)


My worry comes mostly from the article. I'm not informed enough on this topic to say how feasible it is to resolve these concerns.


Don't know who this Thomas Rodgers is, but telling someone to STFU is hardly going to be helpful.


I'm hopeful that modules can be salvaged. But I must admit - I find it hard to follow along with what the current proposal even is. I've assumed that this is the best and most complete summary of it. http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p110...


I think the author is trying to map the way Python works onto C++, which doesn't make sense. The types of problems he mentions with the preprocessor are ALREADY present in the compilation model used by C++. The solution is that each project needs to manage its dependencies using a Makefile or a similar tool. The process is not automatic as it is with Python, but instead is managed according to the needs of each individual project.


I'll admit right away to not having read the module design documents. I'm basing my comments on the description given...

It seems to me that the module-interface unit (MIU) would be pretty much the equivalent of present-day header files. So for two modules foo and bar, there is no dependency on the order in which foo.cpp and bar.cpp are compiled, because only the MIU needs to be compiled for a given module, and the design ensures two MIUs are isolated. If they are mutually dependent it's a bad design, but the solution is the same as in Java: you need to build the MIUs in the same compiler invocation. (In fact, that would probably happen automatically, using the equivalent of today's -I include-directory directive to find MIUs.)

Yes, that means you need to split your module into a clean MIU and the actual implementation file, just as you now split them into a header file and a cpp file.

Yes, you need the MIUs to be "available" globally so they can be translated to BMIs, just as you need header files to be available globally when compiling.
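
A minimal sketch of that split in Modules TS-style syntax (module name invented), mirroring today's header/.cpp pair:

  // foo interface unit (the MIU) - the moral equivalent of foo.hpp
  export module foo;
  export int answer();

  // foo implementation unit - the moral equivalent of foo.cpp
  module foo;
  int answer() { return 42; }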


WRT:

> Module interface unit locations must be deterministic.

My immediate thought was: create an SQLite database per some collection of modules (a crate of modules?).

Each row in that table would have a unique business ID and a unique row_id. The business ID is a composite key on

module_name + exported_function_names_with_attribute_and_result_signature

Every business ID, as defined above, can point to the source location, compiled object location, last compilation time, and compilation state.

Moving crates to other locations will not change the business ID. Moving the compiled object location will not change the business ID.

The SQLite database can have a network interface for multi-machine compilation, and can be backed up/restored into a new environment unchanged, as long as the compilation flags for all the modules remain the same and the compiler versions are the same.


Except that several large companies already use Clang's implementation of modules internally.


Python (mentioned in the article) usually works nicely without a build system.

C++ does not. The build system will also take care of this problem, just as we currently define libraries in e.g. CMake.


> Ahead-of-time BMI compilation is a non-starter.

I didn't get why - one could imagine parallelizing BMI generation, then parallelizing "normal" compilation.

The only issue I see is that you wouldn't know which BMIs you'd need, so you would have to generate all of them (or regenerate those that are out of date), or specifically list the ones that need to be generated in a build tool. Given how the rest of the compilation pipeline works, is that undesirable?


"As far as I can tell, the paper was not discussed nor reviewed by any relevant eyes."

I'll keep my irrelevant eyes to myself then, thanks.


Surely the binary module interface is just a binary version of the header file? Are you sure it has to be generated from `foo.cpp` and not just `foo.h`?

It is a binary module interface, not a binary module implementation.


From TFA:

> As this depth increases, modules grow slower and slower, while headers remain fairly constant for even “extreme” depths approaching 300.

Well before reaching 300 I'd have to ask WTF are you doing? I mean really, seriously, if your dependency chain is 1/10 that deep I'd look for something wrong.

I've often thought that including headers from inside headers in C or C++ is a mistake. And that thinking is probably wrong. It makes sense when using a library that may itself have a lot of components - I just include the top-level header for the library. But even that is different from having a really deep dependency chain.

Maybe - just maybe - people have shifted the spaghetti out of their code and into the file structure.


If foo.hpp is:

  class Foo {

  };
Then, if I want to write bar.hpp as:

  #include "foo.hpp"

  class Bar {
     Foo f;
  };

I cannot forward declare Foo because in order to size Bar, I must know the size of Foo. I must therefore include `foo.hpp` in `bar.hpp`, thus I must include headers in headers, unless my headers are not allowed to contain class definitions.


Which is fine, but if the chain is getting too deep you probably have excessive granularity and/or complexity. Foo could be defined in the same header as Bar if they are always used together. I still can't see getting anywhere near 300 levels deep in this stuff. You can also forward declare Foo in the header if it's just referenced via pointers in Bar.

This is the type of complexity that a good software "architect" should be trying to reduce rather than manage.
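
For instance, a minimal sketch of the forward-declaration variant (assuming Bar only needs Foo by pointer):

  // bar.hpp - no #include "foo.hpp" required
  class Foo;        // a forward declaration is enough for a pointer member

  class Bar {
     Foo* f;
  };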


Sure, but once you stray from blanket rules, it's harder to state categorically that a depth of 300 is something that must be fixed, and harder to prevent it from occurring.

E.g. a blanket rule that a header file isn't allowed to include another header file is trivial to enforce; one which says the chain can't be more than n levels deep is subject to boundary pushing.


It's a natural problem in any ecosystem that has ubiquitous code sharing. Most folks see this in the JS/npm universe, but I've also seen it happen in a big C++ monorepo.


Ah, crap!


Half of what came out of the C++ standard bodies in the past five years makes me only half-jokingly wonder if the committees haven't been infiltrated by deep-cover Rustaceans trying to run the language into the ground in a plausibly deniable manner...


It appears to me that C++ will slowly fade away. Languages like Rust are gaining traction fast, and most importantly, they're building a huge, passionate, and dedicated community - especially Rust itself. C++ people are slowly getting curious: what's up with this Rust thing? Maybe I could give it a try... That's how it happens, I guess.


Wow, that is the most frequently made generic comment.

Are there other languages _like Rust_? Have you heard of large, production code bases maintained at large successful companies that are written in Rust?


> Have you heard of large, production code bases maintained at large successful companies that are written in Rust?

Google, Amazon, Facebook, Microsoft, Dropbox...



