The discussion between Andrei Alexandrescu (Facebook) and Ian Lance Taylor (Google) is highly intellectual, especially the part about how each language addresses concurrency.
"D has resolutely exited C++'s shade because it is good
at things C++ is not good at, in addition to being good at things C++
is good at. Also, Go would be tenuous to frame as a better C because
it cannot do everything C does - e.g. unsafe memory access and manual
memory management, which are needed in certain systems - and it
interoperates poorly with C."
In short, he states that D can do unsafe memory access and manual memory management. Go can't.
Note that the discussion in question dates back to 2010, before the existence of cgo. Go now has quite good C interoperability. And Go can now do unsafe memory access and manual management using the "unsafe" package.
Yes, if I were to learn a systems programming language, I would only pick from those that allow manual memory manipulation. Either that or the ability to trivially integrate with C functions that can do manual memory manipulation.
I fear you are confusing systems programming language and operating systems programming language (or conflating).
Go is a great systems programming language especially for modern concurrent systems — http servers etc. It talks to C easily so you can integrate all kinds of system-level code into your apps. Having a garbage collector makes it trivial to write long-running daemons.
Modern operating systems have traditionally been written in C (and C++) plus some assembly language code. D can do everything C & C++ can do but it would still no doubt need the assembly language code.
Go lacks manual memory management. Some say that this would be a barrier for writing an operating system while others don't. You would have to use some assembly language code to talk to the hardware, and you might need to add some manual memory management via that assembly layer. After that I'm sure the garbage collector would make the OS more reliable and potentially a little quicker in places.
Either way I don't see why Go gets criticised for needing a bootstrap layer when operating systems written in C and C++ also need this.
There are tons of user-level applications that need manual memory management and that don't have anything to do with operating systems, like for instance video encoding/decoding/streaming or high-frequency trading.
What really bothers me about Go is not really that it's garbage collected, but rather that its garbage collector sucks so badly. Really, not all garbage collectors are created equal.
For example I'm working on a startup and we've been integrating with various bidding exchanges for serving targeted ads. All the bidding exchanges want the response to be generated in under 100ms, which includes the network roundtrip. This means on the server-side, the average must not be higher than 10ms per request, preferably lower. Scala on the JVM can handle it, but when I tried out Go, it was a disaster ... as that garbage collector stops the world and it's totally unpredictable, so you end up with spikes of latency that can upset your partners and given enough incoming requests, it can also blow up your buffers/queues, crashing your servers. It's also non-compacting, but that's a given, as it's not even fully precise yet.
Which is why integrations with bidding exchanges are usually written in C++ too (in case it's not clear, we are talking about B2B web services). We've gone with the JVM because it provides a good productivity/performance balance, but eternal vigilance is needed in profiling the memory allocation patterns and tuning the garbage collector to handle the load. And Go requires even more tuning. Which is why sometimes I fantasise about a high-level language that allows for manual memory management, as things would be so much easier ... although I'm rooting more for Mozilla's Rust, than I am for D.
Sociomantic is a Germany-based company that does online real-time bidding and is using D to do it: you can find more info in their blogpost about Dconf 2013, with links to two talks about their company and how they dealt with the GC issues you mention.
Interesting that you chose Scala. I have built and managed Jetty-based adservers to talk to bidding exchanges and wondered how the system would perform if the adserver were written in Scala instead. Going with Jetty initially was the right choice, as no one on the team was an expert in Scala, and the base assumption was that Jetty is battle tested, has good documentation and is already optimized to perform great from the get-go. So, we simply chose Jetty. I didn't get a chance to build an equivalent system in Scala to compare performance. It would be great if you could shed some light on how you went about building highly performant adservers in Scala and scaling them. Does coding in Scala make it a lot easier to deal with concurrency (for development and debugging), keeping a small codebase, help in rapid iteration, and in general make it fun to program and manage the servers? Do you have a blog post or a write-up on this topic? I encourage you to make one, if you don't have one already. Good karma, and an opportunity to showcase your engineering chops to hire other great engineers.
At my previous job I wrote simulation software to test a product; it mostly used Akka, but also used AHC, a plain Java HTTP client library, without problems. Size-wise the code would periodically get more features, but by refactoring often it never got beyond 2500~3000 LOC. Readability was never harmed; Scala - contrary to what some people think - allows for very clear code. The people who inherited the code are working on adding features. Obfuscated code can be written in PHP if you want (or are unable to do otherwise). Another piece of Scala software was an internal web service written using Unfiltered on Jetty. It's so stable sometimes I just forget it's there.
I think that Objective-C would qualify as a reasonably high-level language and offers a choice between manual and automatic, but very predictable memory management.
> Modern operating systems have traditionally been written in C (and C++) plus some assembly language code. D can do everything C & C++ can do but it would still no doubt need the assembly language code.
Actually, unlike the C and C++ language specifications, in D support for inline assembly is part of the language specification.
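For anyone who hasn't seen it, here is a small sketch of what that looks like (DMD's x86 inline assembler syntax; `addSeven` is just a made-up example - GDC/LDC have historically favoured GCC-style asm instead):

    int addSeven(int x)
    {
        int result;
        asm
        {
            mov EAX, x;       // the inline assembler can reference locals by name
            add EAX, 7;
            mov result, EAX;
        }
        return result;
    }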
While I am in the D and Rust camp, I support Go's ability to do this (write an OS).
Go is no different from Oberon in system capabilities. And Oberon was used to write quite a few desktop systems used at ETH Zurich during the mid-to-late '90s.
The OS bootloader and the kernel package for hardware interactions were written in Assembly, with the remaining parts in Oberon.
>I fear you are confusing systems programming language and operating systems programming language (or conflating).
Systems programming doesn't seem to be a very well defined term. My understanding is that it is certainly not application programming and it requires pretty tight management of hardware resources. That includes things like operating systems, database systems, embedded systems, networking software like firewalls, etc.
I guess it's a difference in terminology. I've never heard anyone refer to writing an HTTP server as "systems programming"; I'm used to "systems programming" meaning the same thing as "operating systems programming". Sometimes people refer to writing higher level OS components (like init or libc) as "systems programming" as well, but I would think that a server that isn't a core part of an OS would just fall under the category of "server programming".
Go can do unsafe memory access, but doing things like pointer arithmetic or casting of a memory blob to an arbitrary struct type is way more painful than it is in C/C++.
In a lot of contexts, this is a feature (makes terrible code smells in normal code easier to see), but depending upon the type of coding you do, you do sometimes find yourself wishing it were a bit easier when dealing with things like graphics APIs where you just want to lock a texture, update some bits in-place, and then unlock it. Sometimes what might be a couple lines of C code are 10s of lines of Go code where you either do crazy gymnastics with the unsafe package, or juggle things in and out of byte buffers.
Rust has safe manual memory management, at the cost of a learning curve. In D you must use the garbage collector if you want memory safety, but it avoids all the complexity of lifetimes and uniqueness. This difference makes the two languages feel pretty different, even if at their core they're pretty similar.
D is more like C++ without macros and with a GC, plus some engineering features (unit testing as a language feature? weird). Rust looks different and behaves differently. For example, an idiomatic foreach block in Rust is syntax sugar for passing a lambda, which is friendlier to parallelism, while in C++/D it is syntax sugar for a classic for-loop with an iterator, which is more efficient in a non-parallel environment.
You might think that unit testing as a language feature is weird. But our experience with it is that this minor-seeming, weird feature has been an enormous success.
It has literally transformed writing D code, and for the better. There's another little feature, -cov, which will tell you which lines of code are covered by the unit tests.
It's hard to overstate the improvement these engender.
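For anyone curious, a unittest block sits right next to the code it tests (a toy sketch; `countWords` is made up for illustration). Something like `dmd -unittest -main -run file.d` runs the tests, and adding `-cov` reports per-line coverage:

    import std.array : split;

    // Counts whitespace-separated words; purely illustrative.
    size_t countWords(string s)
    {
        return s.split.length;
    }

    unittest
    {
        assert(countWords("") == 0);
        assert(countWords("one two  three") == 3);
    }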
> For example, idiomatic foreach block in Rust is a syntax sugar of passing lambda, which is more friendly to parallelism
This changed with the recent 0.8: `for` is now syntax sugar for the Iterator trait, which optimises exactly like C++ (the vector iterator even vectorises when LLVM can do it).
Rust has actually moved to have iterators that are very similar to (a subset of) D's ranges.
I actually think I do. Go does offer a couple of primitives for poking into memory, but it is my opinion that a systems language needs much more refined control over memory layout and allocation.
Still, I think you raised a valid question in the first part of your message, though the second is a bit offensive, and I also thank Andrei for his clarification and use the opportunity to congratulate him on this milestone for the D language. I like D and even if I use C++ at work, I know I'd enjoy using D, especially for the features that let my code "see" and "build" the other parts of the code.
I think that a lot of confusion has been caused by Go using the term "systems language," because (as I understand it) Go doesn't mean it in the same way that C++ and D do. The Go folks seem to be thinking systems in the sense of large networks of computers and the like (the kind of stuff that Google typically does), whereas the C++ and D folks are thinking systems in terms of stuff like operating systems. What Go is trying to do does not necessarily require low-level primitives (though it can benefit from them), whereas what C++ and D are trying to do does require such primitives.
The problem is that deciding they can use "systems language" (a term they have since dropped) because "system" means "a set of connected things or parts forming a complex whole" results in every language falling under "systems language."
Others have been using it to distinguish languages suitable for writing operating systems/drivers years before Go introduced this confusion.
When I hear "systems programmer" I think computer-to-computer. Therefore when I hear "systems language" I think computer-to-computer even absent Go's use of the term. I not be disappointed if a systems language were not appropriate for creating operating systems.
Now does it allow unsafe memory access, or does it not?
And does the unsafe package allow you to build a C-style manual allocator or not (regardless of whether it integrates with Go's new or make operators)?
However, like many flaws with HN's underdeveloped software, you can't actually do that on HN because the parser includes the trailing > in the URL. So your best bet here is with whitespace.
I'd argue that the world has moved on since 1998. We live in a world full of URL parsers of various abilities, and doggedly enclosing URLs in angle brackets because of a memo written so long ago it also references gopher seems stubborn to no obvious gain. Given the state of the modern internet, I definitely wouldn't consider angle brackets to be the best way to delimit URLs.
> What is certain is that D's type system is expressive enough to allow libraries to reject during compilation embarrassments such as transporting pointers over the network.
How does the language do this? Doesn't that have more to do with the design of the program itself? It certainly is an interesting security idea, much more strict than sending remote procedure calls or serializing objects and sending them across the network.
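One plausible way to do it (a sketch, not necessarily what Andrei's libraries actually do): the send routine can introspect the message type at compile time and refuse anything containing pointers, e.g. via std.traits.hasIndirections:

    import std.traits : hasIndirections;

    // Hypothetical send routine: rejects, at compile time, any type that
    // contains pointers, class references, or other indirections.
    void send(T)(T message)
    {
        static assert(!hasIndirections!T,
            T.stringof ~ " contains pointers and can't be sent over the network");
        // ... serialize the raw bytes of `message` ...
    }

    struct Point { int x, y; }    // plain data: fine
    struct Node  { Node* next; }  // would be rejected

    void demo()
    {
        send(Point(1, 2));   // compiles
        // send(Node());     // compile error from the static assert
    }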
For one example of a D program in production use, check out http://forum.dlang.org. The forum software is entirely written in D (written by Vladimir Panteleev).
(Although due to this announcement, the site is being slammed at the moment. It's a single server hosting dlang.org, not a farm.)
The responsiveness of this forum is amazing. Is the client side doing some pre-fetching?
I remember finding the website disturbingly fast on my first visit. I wonder if it is just me being so used to loading delays, or if there is a point where you can be so fast that it is detrimental to the user experience. The latter doesn't sound like it makes sense, so it must be a matter of getting used to it.
Well, a forum is essentially the same thing as BBSs were in the 80s but now we have computers that are several thousand times faster. I find it much more "amazing" that the other forums are so slow.
BBSs in the 80s / 90s were operated by users in their own homes and usually didn't allow multiple connections. The software itself was pretty dumb and slow.
Forums these days need to allow multiple users to log in, read content and post messages. I'm pretty sure that the link above received several hundred or thousand users in a single hour. Forums these days also use database systems (or other services external to the process) that represent a big bottleneck. Then you've got the issue of static assets, like CSS, JavaScript, images and so on.
As a general rule of thumb, if you want scalability, you have to trade some throughput for it, as scalability concerns hurt performance.
That said, popular open-source web software, in general is awful in terms of everything.
The forum sends back a simple preformatted chunk of HTML which is then inserted at a certain spot.
It does not send a complex JSON object that is then marshalled into some kind of Angular-Knockout-Backbone framework of the day where complex magic creates views, links and templating language formats objects into HTML created via DOM.
As JS in the browser speeds up, all those frameworks find some way of using it. Sure, your 5-item TODO list app looks good...
"Normally" such an app (avrider) would be set up with routing such that the url became /forum/thread/<id> -- and with a simple 6-digit id, that's much more managable to type/communicate manually than an uuid or something.
All that said, I basically agree -- exposing some "magic" constant/serial number to the user usually doesn't do much to help with the ux. It's actually an interesting problem: what would be the better, more "true" web/REST-like url-scheme for a forum? Perhaps forum/topic/sub-topic/thread-topic ? How limiting would that be wrt. reorganizing the forum threads (and not breaking links) -- this is especially an issue if there's a chance of re-using uri's -- it's annoying to have a link to one discussion morph into a link to another one. I don't think there are any solutions that are significantly better than a (symbolic/numerical) thread/topic id...
edit: Actually I see the sense in the email-id-like urls of the d-mars forum (or usenet-like).
Facebook seems to use any language they want to at any time. I don't consider it a major endorsement that it appears in production, it just lets me know it's on their radar.
If Google picked up D I would be much more interested as to why.
Having D used in production in a big company like Facebook is a big deal. No, it doesn't mean that D has taken over the programming world or anything like that, but it shows that the language is maturing and that real companies are now willing to use it in the real world - and it's a pretty major company at that.
So, this is just one step forward, but it's a big one and exciting to the folks who are fans of D - especially those who have put a lot of time and effort into it.
Seeing D in production is a good thing. Google picking up a language doesn't prove anything. Seeing a language in production in a high volume setting is impressive.
I kind of bet hard on D, and I know people that bet even more. Last time I used it was 4 years ago.
I was using the other standard library (yeah, there were two, kind of), there were bugs which weren't resolved for years, there were issues with people camping on both sides of standard libraries, there was development stagnation because D2 was being pushed hard... but it all boils down to preference.
D1 (with Tango) to me was like a really really good blend of C with classes and modern stuff in it (mixins for example). I was happy with it, but then D2 started to emerge which was more templatey than I was comfortable with. I have accepted, long ago, that I am a C programmer, not C++, by heart.
Ultimately, the following factors drove me away from D altogether:
- my code isn't an island, I leverage a lot of libraries, and for each and every one I had to use, I had to either make my own bindings or rely on other people's somewhat unstable bindings. I accepted the reality of it, but it wasn't something I wanted to waste my time on.
- D2 - it just didn't feel the same
- Lack of tools. The DMD compiler, while somewhat working, was lackluster (to be polite) compared to what I was used to in the C world. There was a GDC frontend for gcc which, at the time, wasn't actively developed, and there was an attempt to make an LLVM thing... but if you wanted to work with D at that time, you pretty much had to use DMD
- several bugs which were ignored in core development and which were uncomfortable for my use. I didn't have the expertise at the time to help with resolving that. Coincidentally, just yesterday I received an update from the bug tracker that the bug I reported four years ago is still open.
- Development of D pretty much relied on one man alone
- out of frustration and a tight deadline, I developed something in C at that time and came to a realization that "it just works". I was able to concentrate on my task at hand instead of tackling various gotchas and doing extra amount of work that I had to with bindings etc.
D is more than 10 years old. Honestly, I think it has good ideas, I really liked the approach of D1, but I think it will never catch on en masse.
A few updates on the current state of things today.
The Tango/Phobos split has been resolved for a while. You can now use Phobos, Tango, or both in the same D2 program.
The number of libraries and bindings is growing rapidly. There is the new dub package manager that makes it easy to include dependencies < http://code.dlang.org/ > and Deimos which is a collection of C library bindings < https://github.com/D-Programming-Deimos/ >.
GDC and LDC are both actively developed and kept up to date with the state of DMD.
It used to be you'd usually hit a bug while working on something in D but the community is really starting to notice they aren't hitting many bugs these days (especially over the last year or so). < http://dlang.org/bugstats.php >
Everything is up on Github now < https://github.com/D-Programming-Language > and Walter is just one of 21 committers. Github says dmd has 72 contributors and Phobos has 116.
If you don't like D2's template approach though you probably still won't be happy with it even with almost everything being better. Personally I love the power it gives you without sacrificing speed.
My friends and I implemented an operating system in D back in college, and I just want to say that I agree entirely with this post. It's mostly why I went into Ruby instead of staying in D, and why I'm doing Rust now instead of D today.
That said, D has a lot of promise, and a lot of these issues have been worked out, but I feel like there are many people like you and I...
> D is more than 10 years old. Honestly, I think it has good ideas, I really liked the approach of D1, but I think it will never catch on en masse.
History shows that the only systems programming languages that succeed in industry have, at some point, been adopted by OS vendors as their main language.
It hasn't happened so far to D, but it might still happen.
I would argue that java application servers fulfill the same role as that of an Operating System (especially a virtualized one) and exist almost entirely to side-step the sysadmin priests.
I don't follow Java at all (I wish I had time to take a few months to see what's happening), so I might be talking out of my ass. I think Java got a big boost when it became a de facto standard for communications platforms, especially considering the rise of mobile.
"Aside from speaking whether 5112 lines of code is really a good sign, there is separate issue regarding quality. When you will look at claim that some language (lets take for example C# or Java) "supports feature X", that really means that the feature is supported. In D this for sure means that the feature is either broken or misdesigned (shared libraries, routine code breakages, obsolete ms32 object format, AA arrays, shared, const postblits, odd template crosstalk bugs, type system holes, segfaulting lambdas, unstable stdlib, absent of third-party libraries). Untill this stuff is fixed this is a huge barrier irrespective of whether D is used in Facebook or not."
I think there is a lot of truth to the statement, but not to the extreme. There are implementation issues for many claimed features; some are large and in your face, some are edge cases, others are due to language design.
The situation is better and greatly improving. Many things have a workaround. It is just one of the problems with a language full of tools and little full-time manpower. Java and Go are rather simple languages so there aren't as many features to boast (e.g. not supporting shared libraries is a feature of Go, yet "broken" in D [lots of good work in git HEAD]).
I was mostly not interested by it because the reference implementation is not open source.
I checked, to get the facts straight: DMD is the reference implementation. The frontend is under GPL but the backend (optimizer and codegen) is not. The source code is available but you are not allowed to redistribute it.
Yeah, but there are the GDC [1] and LDC [2] implementations, which AFAIK use their respective (open source) backends, so that should really not be a problem IMO ...
(In my tests, they are also mostly the fastest ones, especially LDC ... see for example my comparisons in http://saml.rilspace.org/moar-languagez-gc-content-in-python... ... D with LDC is among the top contenders, together with various optimized C versions ...)
Looking at the commit history, GDC has been in steady development for about 4 years now after a pause, and LDC has been steadily developed since its beginnings (with changing main authors). Both have very active maintainers these days.
Things might've changed now, but Steve is right. There used to be a sense of dread with other compiler directions. Last time I was in, GDC was about a year without any serious updates. DMD, on the other hand, didn't have support for 64-bit, which was imperative for my usage at the time.
All this talk about D got me nostalgic about great group of people and a sense of accomplishment being made (with language and libraries) when I was using it. It makes me want to have another go, almost.
- C++11 is good enough (and a little better than that)
- The ecosystem. Yes, I can use C libs from dlang but managing extra bridges/headers is too much overhead if the alternative is just to throw them into my C++ project and use them without writing any glue code.
- The garbage collector. (RAII fan here)
- Tooling. Are there any good dlang IDEs with mature refactoring support and code completion?
For the tasks I'm using C++ for, C++11 is more than sufficient. For more abstract/high level stuff where I don't particularly care about top performance/memory layout I don't need another C++ contender but can use higher level languages/scripting languages.
D would look more tempting if there was no C++11 but for now for me I'm perfectly happy with C++.
- dub + VisualD makes building code, managing dependencies and debugging painless for me.
- being part of building the ecosystem is fun
- I'm not comfortable with the high cost of developing in C++ (even with the last revision) and never chose to use it in the first place. C++ has basically been forced on me for years.
- I feel like I can be more productive and write higher quality programs when using D. I can use RAII in D like I would in C++.
- I feel like Go and Rust do not address C++ shortcomings while introducing problems of their own.
- D programs are short. I don't need nor want IDE refactorings.
- compilation speed is not as impressive. To me this is the biggest pain point with C++, bigger than memory unsafety.
- I missed templates based on any value/symbol, it forced macros on me in my very first try. I don't think it's a problem once you know Rust, but having to write macros was too intimidating for a newb.
> compilation speed is not as impressive. To me this is the biggest pain point with C++, bigger than memory unsafety
Compilation speed remains a work in progress; it's improved a lot in recent versions. Note that 80% of the compilation time in rustc is actually in LLVM passes (the same used by clang), which are mostly optimizations, so turning off optimization helps Rust compile times a lot. LLVM as a backend does many more optimizations than DMD (and 6g/8g for that matter) and runs slower as a result.
One issue is that Rust currently does not do incremental compilation below the crate level, so you will need to use many crates if you want incremental compilation. This will possibly change at some point via a ccache-like mechanism.
If memory safety is not important to your applications, though, then Rust may indeed not be the right choice. Rust requires you to make sacrifices (a learning curve and less flexible unique pointers) to ensure memory safety, and if you just want to write "new" and "delete" then Rust isn't going to be the best language for you. (That said, I think there is a strong argument to be made that for many classes of programs it is worth some pain to eliminate memory safety problems, because memory safety issues frequently lead to security vulnerabilities. For us as browser manufacturers, memory safety is so important it is worth sacrificing literally everything else except maybe performance in some cases.)
> - I missed templates based on any value/symbol, it forced macros on me in my very first try. I don't think it's a problem once you know Rust, but having to write macros was too intimidating for a newb.
In general you want to use traits instead of macros to write generic functions. There are some current limitations with traits but I think they're a more scalable approach than ad-hoc templates. It'd be helpful to know exactly what limitations you were hitting.
Ok I think I was unfair to Rust in my top message. "Creating problems of their own" was not accurate, but I am under the impression I would miss some D things if using Rust.
> Note that 80% of the compilation time in rustc is actually in LLVM passes (the same used by clang), which are mostly optimizations.
OK I had a flawed perception probably because of this.
> That said, I think there is a strong argument to be made that for many classes of programs it is worth some pain to eliminate memory safety problems
I agree 100%; especially on teams, memory safety becomes incredibly important. Also, I have to thank you for fixing C integer promotion, which I've seen create many bugs.
> It'd be helpful to know exactly what limitations you were hitting.
Ah, OK. From a glance I think some combination of higher-kinded type parameters, functional dependencies/associated types, and associated constants could fix the issues you were hitting that forced you to use macros. But I'll talk to bjz. Thanks for the pointers!
C++11 borrowed a few good ideas from D and other languages so that certainly helped close the gap between D and C++ but D is still significantly nicer to work in than C++ (which is what I spend all day at work writing).
You can use RAII all you want in D. The D standard library uses it for things like Files, containers, and smart pointers. I feel like Garbage Collectors have become something of a bogeyman for native languages. There is a lot of value a garbage collector can add. In D it's rarely hard to switch to managing your own memory if your profiler shows the garbage collector taking up more time than you'd like. For most programs avoiding the garbage collector is a premature optimization though.
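For instance, a deterministic wrapper over C allocation looks much like it would in C++ (a sketch; `Buffer` is made up for illustration):

    import core.stdc.stdlib : malloc, free;

    struct Buffer
    {
        void* ptr;
        this(size_t n) { ptr = malloc(n); }
        ~this()        { free(ptr); }     // runs deterministically at scope exit
        @disable this(this);              // no copies, so ownership stays unique
    }

    void demo()
    {
        auto buf = Buffer(4096);
        // ... use buf.ptr, no GC involved ...
    }   // free() is called here, just like a C++ destructor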
What exactly is the argument for garbage collection over RAII-style freeing? Even in garbage-collected languages, you still need to do things like close file objects; the runtime doesn't do so automatically when it detects that there are no stack references to the file. Same goes for unlocking mutexes, or any of the other things that RAII tends to address.
Is it just that maintaining ownership semantics for heap memory tends to be more complicated than doing the same for file objects (much more rarely shared, I'd guess) or mutexes (not sure if "sharing" is even sensibly defined here)?
There are a number of reasons why you may want garbage collection/automatic memory management over RAII:
- RAII-style freeing means a lot of reference counting. Reference counting and multiple threads don't mix well. You can circumvent that by having multiple separate heaps, but that's not always a practical solution.
- Reference counting is slow; if you want GC-like speed and RC-like pause time guarantees, you're probably better off with deferred reference counting, but that's not easy to engineer into C++; it's essentially a form of garbage collection.
- Lack of fully automatic memory management may affect modularity adversely; see Jones/Lins for the gory details. In brief, wholly or partly manual memory management creates leaky abstractions (this is if you eschew RC for some of your code/types for speed reasons or because you need to deal with cyclic references).
- It is not difficult to have scoped deallocation for resources (scope statement in D, using statement in C#, etc., destructor pragma in Nimrod). Resource lifetime, in any event, does not always coincide with the lifetime of a variable, so this is an incomplete solution. Using GC does not mean not using RAII where its downsides do not matter.
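For reference, the scope statement mentioned in the last point looks roughly like this in D (a small sketch):

    import std.stdio : File, writeln;

    void process(string path)
    {
        auto f = File(path, "r");
        scope(exit) f.close();                            // runs when the scope ends, even on throw
        scope(failure) writeln("failed reading ", path);

        foreach (line; f.byLine)
        {
            // ... handle line ...
        }
    }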
RAII does not mean a lot of reference counting. In the vast majority of cases in typical programs, objects have exactly one owner, and managing that ownership is very easy if your language provides good facilities for move semantics (as C++11 and Rust do).
You can't say reference counting is bad for multi-threaded programs without mentioning that GC is also bad for multi-threaded programs, since the garbage collector has to pause all other threads, either for the entirety of the GC run or (if you're lucky and using a good GC) for parts of it.
The modularity argument is reasonable, and I've been annoyed by this aspect several times, though in the majority of cases it doesn't seem to be an issue.
If you mix up reference counting with manual memory management, you get the aforementioned modularity issue (as I mentioned as one of my points). You get rid of (some) of the overhead, at the cost of simplicity (you also may incur other overhead; manual memory management often introduces unnecessary copying in order to keep ownership simple).
The problem with reference counting in concurrent programs is one of correctness, not one of speed [1] because every change of a reference count is a potential race condition. That's quite different from the challenges garbage collection faces in a multi-threaded environment.
Whether a garbage collector has to pause all threads is an implementation issue. Modern garbage collectors can limit that pause effectively to the root scanning phase, then let the collector run concurrently with the mutator. You can also work around pausing threads entirely, though the tradeoffs are rarely worth it outside of hard realtime environments.
Note also that this is an issue of pause time, not performance. While HN can get obsessed with pause times, not everyone programs video games, embedded systems, or kernel drivers where that matters. High-performance computing, for example, is a huge field where pause times are all but irrelevant and where amortized cost is what matters. (HPC is also where correctness can become easily more important than squeezing out a percent or more of performance through low-level hacks; if a job that takes several days to run crashes halfway through, that's an enormous loss.)
[1] Technically, you can make reference count updates atomic, but the overhead is absurd, especially under contention. Hence why SNZIs [2] exist, which reduce the speed overhead at the expense of additional memory overhead.
[2] http://dl.acm.org/citation.cfm?id=1281106
>While HN can get obsessed with pause times, not everyone programs video games, embedded systems, or kernel drivers where that matters.
The (my) problem is that it also matters whenever a program holds lots of data in memory (databases, data analysis, caching, etc). As the cost of RAM decreases, the importance of this problem increases, i.e. fast.
Large heaps and large pause times do not necessarily go hand in hand. That's a question of GC technology. For example, the Azul GC does <10 ms pauses for heaps that are hundreds of GB in size. Granted, Azul/Zing pricing doesn't exactly make it the most accessible technology (and it relies on kernel hackery for its compaction), but it demonstrates that multi-second pauses are hardly a necessity.
Incremental garbage collection isn't a new technology [1] or one that's particularly difficult to implement by itself (a grad student could probably implement Baker's treadmill in a couple of days); what makes it hard is primarily compaction and multiple threads [1]. You can't effectively use (naive) RC with multiple threads and you don't get compaction with RC, either. Multi-threading is in practice a major driver for the need of GC, since (naive) RC isn't thread-safe and unpredictable object lifetimes don't go well with manual memory management.
Also, deferred RC strategies can under certain circumstances contain the cost for cycle collection. Trial deletion is already limited to nodes reachable from other nodes whose reference count has been decremented; type information can be leveraged further to exclude nodes that cannot possibly be parts of cycles (this is particularly easy in ML-style languages, which make mutually recursive types explicit in the source code, but is not limited to those [2]).
Finally, you can also use multiple disjoint heaps to simplify implementation and cap GC cost (one per thread and zero or more shared heaps). This can also bring naive RC back into play as a viable strategy, though you'd still lose compaction. Multiple heaps are particularly attractive for NUMA architectures.
[1] I note that hard realtime guarantees are difficult to make with a basic incremental collector due to the potential of pathological behavior, but we are not talking about hard realtime guarantees here.
[2] Obviously, JITs and dynamically or weakly typed languages have a harder time with this approach.
>Large heaps and large pause times do not necessarily go hand in hand
Not necessarily, but in practice they do go hand in hand. I didn't say the problem was impossible to solve, just that it is an important problem that needs solving. Azul solved the technology issues (or so they claim) but not the economics of it, and they solved it for a language that isn't a good fit for in-memory computing in the first place (to put it politely).
If I have to write software today that keeps a lot of data in-memory and requires reasonable latency (and I do) my only realistic option is C++.
I know all the drawbacks of naive reference counting. C++ shared pointers are a horrible kludge. Fortunately they are only needed in very few places, thanks to RAII and unique_ptr. The problem is that C++ has tons of other issues that will never be solved (antiquated modularity, header files, crazy compile times, excessive complexity and generally lots of baggage from the past).
> If I have to write software today that keeps a lot of data in-memory and requires reasonable latency (and I do) my only realistic option is C++.
I don't necessarily have a fundamental disagreement here, but I offer two caveats (one of which you may consider disagreement at a certain level).
One is that a lot of the discussion in this thread is not about what is possible today, but about where language implementations can realistically go.
The second is that there are language implementations that do allow you to keep lots of data in memory and still give you low latency; the catch is that most of them are functional programming languages and do not tie into the ecosystems of C++, Java, etc. which limits their applicability. But consider for example, that Jane Street Capital, which has pretty significant requirements for latency, is using OCaml. This is in part because OCaml has an incremental garbage collector for major collections (in addition to generational garbage collection). As I said, it's not rocket science.
The same goes for Erlang, which uses a model of thread-local heaps. Erlang heaps tend to be small (lots of threads [1] with little data each, even though the total memory footprint may be in the gigabytes), so Erlang can use a straightforward garbage collector that can run concurrently with other threads, does a complete collection fast (because the entire heap fits inside the L3 or even L2 cache) or can avoid collection entirely (if you specify the heap size explicitly with spawn_opt and no collection is needed before thread termination). As a result, Erlang can easily satisfy the low latency requirements for telecommunications (where it's primarily used).
Functional programming languages have simply had an unavoidable need for garbage collection for half a century now, and so implementations of functional languages have seen a lot more GC tuning than imperative languages. Thus, you do at least in theory have other options, but I expect "realistic" in your case also implies being able to tap into certain existing software ecosystems.
Let me finally note that a big problem has arguably been too much focus on the JVM; not that there's anything wrong with supporting the JVM, but it has too often come at the cost of alternative execution models. JIT compilation, lack of value types, a very general threading model, loss of static type information, etc. all can very much get in the way of tuning automatic memory management. Luckily, some of the newer emergent languages target alternative backends, in part to avoid these problems.
[1] Technically, Erlang uses the term "process" in lieu of "thread"; I'm using "thread" to avoid confusion with OS-level processes.
Modern GCs use multiple threads to collect garbage. Also, it is possible to design a GC that does not pause mutator (e.g. C4) or does most of the things concurrently (CMS) or does incremental sweeps (G1 and other evacuating collectors), or uses separate isolated heaps for each thread (Erlang GC).
There are some non-deferred reference counting schemes that can offer pretty good performance (within 19% of deferred reference counting), while still offering determinism [1].
Yes, but they have the same problem that they require compiler smarts; i.e., they require more than a template to implement shared_ptr<T>. When you're doing that, you may just as well implement something more sophisticated for automated memory management.
The attraction of RAII is that it's a pretty simple mechanism that doesn't require compiler intervention outside of basic, already existing optimizations such as inlining.
Lack of GC makes concurrent lockless programming extremely hard (and most of the examples on the internets are subtly broken). RAII doesn't work at all besides trivial cases when there is no sharing at all. Reference counting incurs additional contention, due to interlocked increments/decrements and is also very hard to do properly. Where in a GCed language a single CAS is often enough, in a non-GCed language you need to resort to less efficient and often completely unsupported things like DCAS or CAS2.
There have been papers on how GCs can be more efficient than manual memory management in some situations, so it's not always the case that RAII is better from a performance standpoint (though I think that it frequently is). But regardless, a big reason for the GC in D is memory safety. Using the GC, you can make memory safety guarantees that cannot be made with malloc and free (e.g. you can guarantee that a pointer won't be invalid with a GC but can't with manual memory management), and D is big on making memory safety guarantees. Also, some features (such as array slices or closures) work very well when you have a GC and become much more difficult to pull off without one due to a lack of automatic memory management that the compiler can use and a lack of clear ownership in some cases (e.g. the runtime owns the memory for array slices rather than there being a particular array which owns the memory).
That being said, D does not require the GC. It's easier if you use it for at least some of the features, but D makes it very easy to use RAII and manual memory management, which also helps a lot in making the GC work better, because it's much easier to avoid making a lot of garbage for it to collect when you don't need to. A lot of stuff in D ends up on the stack rather than the heap, and D's GC ends up having a lot less work to do than the GC does in languages like C# or Java. That being said, the current GC implementation needs some work (and effort is definitely put forth in that area), but if you don't want to use the GC, you can minimize its use or even outright avoid it (though _completely_ avoiding it can be a bit annoying since that means avoiding a few language features that require it; that list is short though).
So, while D definitely uses the GC and promotes its use where appropriate, you have full control over memory just like you would in C++. And the few features that are hampered by avoiding the GC don't even exist in C++ in the first place.
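As a rough illustration of what minimizing GC use can look like in practice (a sketch, nothing beyond core.memory and slices over malloc'd memory):

    import core.memory : GC;
    import core.stdc.stdlib : malloc, free;

    void hotPath()
    {
        GC.disable();                 // no collections during this section
        scope(exit) GC.enable();

        auto p = cast(int*) malloc(1024 * int.sizeof);
        scope(exit) free(p);

        int[] view = p[0 .. 1024];    // a D slice over non-GC memory
        foreach (ref x; view)
            x = 0;
        // ... work on `view` without creating garbage ...
    }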
> A lot of stuff in D ends up on the stack rather than the heap, and D's GC ends up having a lot less work to do than the GC does in languages like C# or Java
This is true, however there's a compromise involved. Because D also allows for manual memory management and unsafe memory access, it means that the GC is not free to move stuff in memory at will ... which really means that garbage collectors, like the ones available for Java (precise, generational, non-blocking, fairly predictable and compacting) are very hard to develop, probably next to impossible. This is the price you pay for using a lower level language and it's not a price that will go away easily.
I've been using Scala a lot for the past two years, using it to build high-traffic web services and while Scala is pretty wasteful in terms of allocating short-lived objects, I've been surprised at how well the JVM handles it.
In terms of throughput, for example, Java GCs are much better than manual memory management. Allocating memory usually involves just incrementing a pointer, so it's basically as cheap as stack allocation. Deallocating short-lived objects is also fairly cheap, since it happens in bulk and so the amortized cost is pretty similar to deallocating stuff on the stack (!!!). The JVM can do some pretty neat things, like for example if it detects that certain references do not escape their local context, it can decide to allocate those objects straight on the stack.
What really sucks about garbage collection is the unpredictability. Java's CMS for example, awesome as it is, still blocks the world from time to time. And when it does, you have no real control over how much time it keeps the process hostage. The new G1 in JDK7 is much better and if you want the state of the art in terms of near-real-time GCs, you can buy into Azul's pauseless GC. But they still suck for certain apps. Allocating objects on the heap also means you have to pay the price of boxing/unboxing the references involved. This sucks too.
On the other hand, by having a good GC at disposal, it's much easier to build multi-threaded architectures. In C++ for example, it's so freaking painful to deal with non-blocking concurrent algorithms, or really, multi-threading of any kind.
> Because D also allows for manual memory management and unsafe memory access, it means that the GC is not free to move stuff in memory at will
Since I wrote a moving GC for Java years ago, and hence know how they work, I set the D semantics so it allows a moving GC. It's just that nobody has written one for D.
It is hard to manage ownership when allocations are passed around between concurrent threads. Garbage collection in Go makes goroutines and channels easy.
Though arguably you shouldn't be accessing the same allocation from concurrent threads at the same time anyways, because that requires mutexes, which are ugly. Rust requires you to transfer ownership of objects when passing them between threads, which both enforces mutex-free programming and makes RAII work. Only one thread owns an object at a time, and that's the only thread that can use the object, and is also the thread that is responsible for freeing it when it goes out of scope.
Mutexes are only ugly because few languages properly support them. It is not difficult to ensure at the language level that no object is accessed without owning the lock associated with that object (read/write locks can be a bit trickier), but for some reason hardly any language implements something along these lines.
It's not fundamentally different from having bound checks for array indices. You can either do that pretty easily at runtime or use a simple type system to do it at compile time (e.g., Boyapati's [1] or Flanagan's [2] work). It's been done, e.g. for Cyclone [3]. This is decades old stuff, not rocket science. Honestly, Monitors had the basic idea right in the 1970s, except as Per Brinch Hansen lamented [4], language designers keep screwing it up.
This implies that an object will be unavailable while it is being passed between threads. How fast is that? With a mutex, it's as fast as shared memory.
Message passing is a couple of atomic operations, and there are also L4-esque optimizations that allow a message send to be optimized into a direct context switch without a pass through the scheduler.
Also, you can use shared memory in Rust. If you do, the type system ensures there can be no data races: either the data is immutable or you must take the lock before you mutate the data.
Under the hood, you still use shared memory in this style of programming. But the programmer never uses shared memory directly. Instead he passes references to the shared memory to other threads, and the act of passing a reference clears the reference in the sending thread so the sending thread can no longer access it.
In Rust, this style is enforced by the language. In C++11 you can enforce it by the convention of using std::unique_ptr and passing those between threads via a shared data structure like a blocking queue.
> Even in garbage-collected languages, you still need to do things like close file objects; the runtime doesn't do so automatically when it detects that there are no stack references to the file.
I'd just like to say there are sane solutions out there already to things like this. With the proper abstractions you get things like custodians[0] in Racket and several alternatives in terms of "call-with-..."[1] functions that do handling for you.
These might not be the languages that are being discussed here, but they offer things like these precisely because they are not the languages discussed here right now.
For reference, a lot of embedded programmers still use static allocations only, not even trusting heap allocations at all.
Took a while to get used to, but it is a damned fine way to avoid memory leaks of any kind!
But even that aside, if you are writing something like a render loop with hard timing guarantees, (or anything else with timing needs!) having non-deterministic allocation costs can easily blow your cycle budget.
"defer" has really weird semantics; it's based on imperatively pushing cleanups onto implicit per-function mutable state. (This shows up if you call "defer" in a loop, for example.) It's also possible to screw up and forget to use "defer", causing things like leaks of file handles. RAII, on the other hand, is based on lexical scope, without hidden mutation, and also makes it harder to forget to do things like close files.
It's more complex to specify and implement than "this code statically runs at the end of the block". This can lead to surprises. (I would not have guessed what "defer" in a loop does until I learned.)
Because "defer" is tied to function definitions, it means that pulling code out into a new function or inlining a function will silently change its behavior in subtle ways.
Like most complex language semantics, "defer" is less easily optimizable than the simple semantics of RAII—with RAII the compiler statically knows what it is supposed to call, which helps exceptions (panicking goroutines in Go) because it allows their cleanups to be optimized into table-driven unwinding. But because Go chose "defer", the compiler must insert code that dynamically keeps track of cleanups to handle the general case. This results in indirect function calls, and bookkeeping overhead, whereas RAII is purely static function calls. The bookkeeping overhead is significant: it allows RAII to be zero-cost in the no-panic path (very important in C++'s case!), while I don't see any way to implement "defer" in a zero-cost way.
defer is great for "one off" cleanups, but in so many cases (closing file handles, releasing locks, etc.) cleanup is not one off but is repeated over and over again. So the price of explicitness is redundant code that is easy to forget or get wrong.
Note that if your language supports RAII and lambdas (e.g. C++11), it's trivial to implement defer using a class that takes a function as an argument to its constructor and calls that function in its destructor. So you can still have the flexibility of defer in a language with RAII.
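That comment describes the C++11 version; the same trick reads like this in D terms (purely illustrative - D already has scope(exit) built in):

    import std.stdio : writeln;

    struct Defer
    {
        void delegate() fn;
        ~this() { if (fn !is null) fn(); }   // run the cleanup when the guard dies
        @disable this(this);                 // guards shouldn't be copied
    }

    void demo()
    {
        auto guard = Defer(delegate { writeln("cleaned up"); });
        writeln("working...");
    }   // prints "working..." then "cleaned up"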
My preferred method of handling this is how Ruby does it with, e.g., File.open. The function yields the file handle and then closes it once the supplied block (think lambda function) ends. This is similar to what you refer to, but without the class. The class is useful for heap objects, of course.
I knew when I was a C++ dev that any "other language" was just not serious. Lots of friends who are masters in C++ call everything other than C and C++ "scripting languages". I was in that closed-minded ecosystem of C++ elitists but then, luckily, I learned about Python and it caught my interest... and then from there I've learned Lisp, Ruby, Javascript, etc. And I haven't touched C++ after that.
Note that I live in a startup environment all the time and the time and cost to develop in C++ is just too damn high for me. Also, the verbosity and its philosophy kills me.
What was it?
std::map<int, std::vector<int>> whatever = new std::map<int, std::vector<int>>();
C++11 brought type inference so these days it would be either
auto whatever = new std::map<int, std::vector<int>>();
or
std::map<int, std::vector<int>> whatever;
depending on if you want heap or stack allocated variable, stack allocation usually being preferred as the default solution (and thus being the tersest form).
But let's look a bit deeper at why C++ is more verbose in this case. One central reason is the idea that the standard library should not be special/privileged in the language syntax. The upside of that is that almost all syntax sugar can be applied to user-defined types. A relevant example would be uniform initializer lists in C++11, which allow you to write e.g.
std::vector<int> foo {1,2,3};
but also
MyFunkyDatastructure bar {1,2,3};
as long as suitable constructors are defined. In comparison afaik in Python you can not define a class that can be initialized with sugared syntax.
The theme of a non-privileged standard library also extends to global namespace pollution. In C++ the stdlib does not get to reserve names at global namespace, which leads to slightly increased verbosity (std:: prefixes or using-statements).
So when you ask why you can't make a map/dict in C++ with just 'd = {}', know that there are actually good reasons for (some/most of) the verbosity. The tradeoffs might make it less suitable in your use cases, but there is no absolute/general superiority either way.
That is absolutely not equivalent to a map of int to a vector of ints. Sure, in JavaScript you can indeed use an object as a map, but you are lacking the type safety and all the nice API that the C++ version provides. Also, in C++11, you can shorten it with auto:
auto whatever = new std::map<int, std::vector<int>>();
and to be fair to the JS, if you want it as short as possible, you could use a using namespace statement for std (although not really recommended):
auto whatever = new map<int, vector<int>>();
or maybe not use new :
map<int, vector<int>> whatever;
I don't think it's that bad. But sure, any dynamic language will be less verbose than a static language because, well, you are removing information...
1. You picked probably the most verbose way to do it.
2. It's not valid C++ because you're assigning a pointer to a value.
3. Idiomatic C++ doesn't use `new` explicitly anyway
4. The `d = {}` of Python is not really equivalent.
My feeling is that python gains brevity in exchange for putting a heavier mental load on the programmer, which is manageable for small programs but becomes unwieldy for larger ones.
Idiomatic C++ would allocate everything possible on the stack, and if that wasn't possible, use `make_shared`/`make_unique`. So maybe I will change it to say "doesn't use `new` directly".
Why not:
using namespace std;
map<int, vector<int>> whatever;
It's a lot shorter, actually correct, and it's totally cool to import whole namespaces in a .cpp implementation file where you're likely to actually declare something like that.
If you're going to critique the language you should try to learn it first.
Making a long story short: because the first time it took off, it ended up having two incompatible "standard" libraries that were so competitive that even bad blood happened.
I think that D2 is really interesting. But I think it doesn't get as much traction because it was an improvement over C++03, but there was no fundamental change in how one developed. I think people switch languages when there's a huge change in how they develop programs - such as a new abstraction that opens up a new programming model. Many, smaller improvements don't seem to add up quite the same way.
Now, I'm less clear on how D2 compares to C++11. Certainly some of the smaller improvements matter less, as C++11 has them.
My feeling is that you really have to try it out for some time to see why it really makes a difference*. And although C++11 has indeed caught up in some areas, there are in fact many smaller and some larger things that still make a big difference in practice. But in general, the fact that it's not just one or a few headline features probably does make it more difficult to judge the actual benefits from the outside.
The drawbacks today are, in my eyes, mostly the relatively minimal support for non-PC platforms, and the fact that a number of language features are not yet completely finished. But if the main targets are x86-based systems and the use of cutting-edge features is not the primary concern, D2 is a great development tool today.
* I can only really talk about my experience going from C++ -> D, but it was a long process where my way of designing systems and solving problems was gradually altered in many areas - much for the better. And the immediate reward, of course, is the extreme gain in productivity due to concise syntax, fast compilation times and, last but not least, the lack of a need for header files.
Much of the initial design came from improving the language's grammar and syntax. That limited improvement didn't remain the focus for long, but it was an initial motivator (Walter being a compiler writer and C++ being a horrible language to parse).
The D community has pushed D far beyond this initial improvement. In C++ there is a separation between developer tools and library-writer tools. I think that separation still exists in D, but the library-writer side reaches into areas that even library writers don't/can't reach in C++.
Ultimately D doesn't force a fundamental change, but as one learns to use the language, the result is a fundamental change.
Yeah. The "almost entire" applies to std.algorithm. How much allocation goes on elsewhere depends on what module you're talking about. Some definitely avoid it, whereas others definitely require it. That will be (and has been) changing though. The main thing is probably going to be making much heavier use of output ranges rather than automatically allocating arrays/strings. We'll get there though, because there's clearly a lot of interest in doing so, and there's no fundamental reason why we can't.
I'm not sure where you got that idea, but it's not true. The standard library makes heavy use of the GC. There is talk of implementing ARC to replace or supplement the GC, and the standard library will most likely soon start using output ranges to avoid forced GC allocations in many cases, but that hasn't happened yet.
"Memory is usually managed with garbage collection, but specific objects can be finalized immediately when they go out of scope. Explicit memory management is possible using the overloaded operators new and delete, and by simply calling C's malloc and free directly. Garbage collection can be controlled: programmers can add and exclude memory ranges from being observed by the collector, can disable and enable the collector and force a generational or a full collection cycle. The manual gives many examples of how to implement different highly optimized memory management schemes for when garbage collection is inadequate in a program."
Not as far as I know. While having garbage collection was an asset when Java was created, these days, with ubiquitous RAII usage in C++, it's just a hindrance.
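For context, a minimal sketch of the RAII idiom being referred to, using a hypothetical file-handle wrapper: the resource is released deterministically when the owning object goes out of scope, with no collector involved.

#include <cstdio>
#include <stdexcept>

// Hypothetical wrapper that owns a FILE* and closes it in its destructor,
// so cleanup happens on every path out of the scope (including exceptions).
class LogFile {
public:
    explicit LogFile(const char* path) : f_(std::fopen(path, "w")) {
        if (!f_) throw std::runtime_error("could not open file");
    }
    ~LogFile() { std::fclose(f_); }               // deterministic cleanup
    LogFile(const LogFile&) = delete;             // prevent double-close
    LogFile& operator=(const LogFile&) = delete;
    void write(const char* msg) { std::fputs(msg, f_); }
private:
    std::FILE* f_;
};

int main() {
    LogFile log("example.log");   // hypothetical path
    log.write("hello\n");
    return 0;
}   // log's destructor closes the file here, no GC needed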
Really interested to know what other compile-to-native-code languages they evaluated and why they chose D over the other alternatives. Does anyone know if such a comparison is available?
There is another project at Facebook which uses Haskell [1]. It's a DSL that replaces an interpreter which was written in C++. Performance was one of the reasons for replacing it, and the talk includes a comparison of the performance of the two approaches.
I remember having a chat with Andrei, one of D's creators, almost one year ago and saying: "If you want D to be more popular, maybe you should make an announcement about Facebook using D internally". Seems that day is here :)
I'm not sure 'targeted' would be the correct term, at least for Go. Go was initially targeting C++ devs (however ambitious and failed that attempt may have been) by offering this stripped-down new language that more resembles C.
I've always much preferred C style to C++/D/Rust based mainly on readability and how small the language felt. I completely agree though that Go has given me this similar feeling - to me it really is a modern C. That said, I'm not sure how many C devs are going to like the bounds put in place by Go, or the mandatory GC.
I am, admittedly, a big fan of Go also. For me, it has taken over use cases from both C and Python, which I think says a lot about its versatility.
The D statement may be accurate; I've never done anything with D and have had only minimal brushes with C++ - but to my untrained eye, they seem pretty similar.
Except that the go developers explicitly stated their design goal was to replace C++ for their needs, not C. And that completely failed, and now the focus is on pulling in python devs.
I jumped onto the Go bandwagon early on, and even tried to make some initial contributions, mostly because Go has an Oberon feel to it - a GC-enabled systems programming language I used for a while in the mid-90s.
However, I quickly became disappointed with Go's spartan design, given my broad experience across languages and paradigms.
That is why I somehow feel it is better suited to C developers who could live with a GC-enabled language.
They would mostly be gaining type safety and a few more features, whereas developers coming from other languages are mostly giving up features.
Yes, but from what I've learned from the original developers, "replace C++ for their needs" means "we are using C++ currently, we don't want to, so we're making a language which incorporates a core which helps our tasks (goroutines). To do this we will fix up some issues we have with C."
Rust has much less emphasis on CTFE than D, and instead has a very extreme focus on zero-cost memory safety (and, following from that, a concomitant focus on easy concurrency).
D also owes more of its ancestry to C++, while Rust takes a lot of its inspiration from the ML family of languages.
Both languages are gunning to appeal to C++ programmers, and they're both worthy in this regard depending on your use case. The C++ pie is more than big enough to let both languages thrive independently (and their underlying philosophies are different enough that I expect very little overlap between their communities).
D is mature and actually usable; Rust is in alpha stages at best. The current outlook seems to be that they should be fairly comparable one day, but right now making such comparisons wouldn't really be fair. We don't even know yet what the final form of Rust will be like.
I'd say the biggest differences are (a) Rust is more functional (preferring algebraic data types over OO, pattern matching, immutability by default, typeclasses over templates); (b) Rust is newer and younger; (c) Rust has safe manual memory management, enforcing memory safety even if the GC is not used. Of these, I think (c) is the most fundamental difference: it gives the language a very different feel and niche, at the cost of a learning curve.
All available D compilers generate native code directly, there is no JIT involved.
Sometimes I wonder what computing would look like if the default implementations of Java and C#[1] were available from the start as native compilers, instead of VM environments.
[1] Funny enough, the precursor of .NET was native and COM based, similar to what is now WinRT, more info here,
All available D compilers generate native code directly, there is no JIT involved.
I think that's why the GP has the JIT in parentheses (to account for Julia's implementation, which does use a JIT-compiler, as opposed to D and Nimrod)
Sometimes I wonder what computing would look like if the default implementations of Java and C#[1] were available from the start as native compilers, instead of VM environments.
At least with Java that would defeat one of the main intents of its development. What would you think would be gained by having non-VM based implementations of these by default?
What would you think would be gained by having non-VM based implementations of these by default?
For the JVM, considerably lower startup/JIT warm-up times. You can really observe this nicely if you're using Zinc (the incremental Scala compiler), where you not only cut the startup overhead down to pretty much zero, but compilation really speeds up once the JIT has warmed up a bit. (Hence why Nailgun exists.)
I only have a passing familiarity with Scala so I had to look up Zinc, and my hobby languages are erlang/lisp and work is C# so Nailgun is also new to me.
Nailgun seems to be a way to keep the JVM running so you don't incur the startup overhead. Straightforward enough, and makes sense when your execution time is dominated by the JVM init.
Zinc seems to be a compiler based on Nailgun so that performance improves over time (particularly useful for long compiles or frequent compilation).
Neither of those, though, moves Java off a VM/JIT basis. However, as someone who hasn't seen them before, it was interesting reading and neat to see what's going on in the rest of the world.
Neither of those, though, moves Java off a VM/JIT basis.
That's the thing; the Nailgun workaround to an extent avoids the startup overhead, but can create other problems. E.g., when you're starting multiple builds on the same server (because the Nailgun server listens on a port, so you may have to sort out conflicts), if you have multiple users on the same machine (because anybody can connect to the socket, so you have a potential security problem), etc.
With a native option, this can be avoided (note that IBM offers an AOT compiler for Java).
Note: I'm not saying a native option doesn't make sense, my original question is what would be the benefit for Java/C# (though we've focused on Java) of native first.
Re: Nailgun - that seems like an implementation issue, but yes, if native compilation were present then Nailgun wouldn't exist (or need to) so the technical issues causing those problems wouldn't exist.
I'll admit, I was only in HS in the 90s, but by the time I got to college and was studying CS that confusion never occurred to me. Was there really a period when strong typing was specifically associated with VMs and vice versa?
You studied CS, and probably even had a nice CS degree.
Many people discussing languages on HN and Reddit tend, first of all, to mix up languages with implementations: language X is compiled, interpreted, ..., sometimes without knowing that multiple implementations are available.
Then many have never used strongly typed languages with native compilers, like Pascal, Modula-2 and so on. Their experience tends to be limited to JVM/.NET languages.
Outside of this world, they kind of know that C and C++ are not as strongly typed as those languages, don't have a GC, and have native code compilers available.
Hence, in their view, strongly typed languages are managed and require a VM.
This is a pattern I observed in many young developers without CS background.
That's a fair issue, but also a particular use case. Java seems to be suitable for long-running apps where that 100+ms startup time is easily covered as an amortized cost over the app's uptime. It still doesn't answer what benefit Java would have had if it had been natively compiled from the start instead of running on a VM (and later a VM with JIT compilation). It would not have been able to run on nearly the number of platforms it initially supported if, instead of porting a VM, they'd had to support many OS/ISA pairs. The VM -> VM w/ JIT approach provided a deployment path based on incremental improvements: initially support dozens of platforms with good-enough performance, then improve performance on each platform over time. Versus initially supporting a handful of platforms with good performance and spending years getting the breadth of platform support.
They "solve" it by being long running applications. When GP says CLI they likely mean things like grep or cat or other command line tools, not just interfaces, that tend to run in a very short time. A Java implementation would see a lot of extra time spent on just the startup. Consider writing a bash script that calls out to grep for each of a bunch of different files (an example, better ways, but go with it). Say it ran grep 100 times, jgrep at 100ms startup time would take 10s longer just on the java startup time, ignoring any other performance differences.
The crucial combination appears to be static typing + automatic memory management + imperative programming. This is a niche that has not been served well by languages outside the JVM/.NET families; and the JVM has the problem of having pretty high per-process overheads while the problem with .NET languages has always been portability/deployment outside the Windows ecosystem.
Apple has Objective-C, Mozilla has Rust, Google has Go, Microsoft has C#. I wonder if Facebook was feeling a little left out of the post-C++ language party?
Each language is like the others in that it has become associated primarily with a single organization.
That's what I was attempting to convey, not any similarity between the languages themselves (the erroneous and unfortunate mention of C++ notwithstanding).
Fair enough. I should have specified CLR instead of C# in general.
That said, I stand by my argument that it is better compared to Java, though the later revisions have been adding more C++-esque low level features. I find C# a good compromise.
> It should be compulsory for everyone to learn about compiler design, before comparing programming languages.
My oversight, and difference of opinion with you, does not mean I'm an uneducated idiot. I wrote a (limited) C compiler in my undergrad, and programming languages was a large mandatory portion of 4th year. If you rear-ended somebody on your morning commute, I'm not going to assume you're a dangerous driver who should be pulled off the road.
This is Hacker News - not Reddit. The least we can do is act like that means something.
Objective-C isn't exactly a "post-C++ language". The two languages showed up around the same time in the 80's and have orthogonal object models (in a very concrete sense given the existence of Objective-C++).
EDIT: Removed dot at the end of the URL