> Apologies. I was thinking of HiPE, not BEAM. Other than the fact that it makes...

MichaelGG · on July 18, 2015

So HiPE is stable now? I haven't used any Erlang for a few years, so I'm glad to hear it's fixed.

I understood that Erlang hot patching was only at a function level, or some other limitation that required much care? No guarantees when the old code will be unloaded? It's been years and I only skimmed things, so perhaps I'm wrong and should shut up. But anyway, nothing stops you from doing the same in other languages. Indeed, ASP.NET does this. But it's just a light perf hack; in general I'm unsure of its usefulness versus more robust approaches. Where do you find it beneficial? (The only time I want such stuff is to hand off control of a socket to a new server version, and there are non-intrusive ways of doing that.)

I remember seeing someone porting OTP to Java or .Net. Again, what language-level features prevent this? Or is it more that no commercial entity has backed such a project in its early stages? Personally, having to serialize everything and getting "transparent" scalability by moving stuff over the network doesn't appeal to me. I'm guessing the rest of the world just runs a cluster of HTTP servers or other middleware and leaves it at that.

Maybe I've just had a terrible exposure to Erlang. Its users seem to enjoy it, but so do the users of many languages. The main point seems to be that Ericsson made a good switch with it, which seems as fallacious as pointing out that Facebook used PHP.

simoncion · on July 19, 2015

> So HiPE is stable now?

I mean, it looks like HiPE has been shipped with OTP since 2001 (three years before work started on Dialyzer), and it certainly is enabled by default on all supported platforms (of which x86 and amd64 appear to be two). Now, you do have to pass the "native" option to the code compiler to make it compile to native code, but you don't need to jump through any more hoops than that.

For the software I've written, bytecode-compiled Erlang was fast enough for me, so I haven't much experience with HiPE. I'm pretty sure that none of the documentation on erlang.org indicates that HiPE is experimental or unstable. What was wrong with HiPE when you used it, and when did you use it?

> I understood that Erlang hot patching was only at a function level... No guarantees when the old code will be unloaded?

Yeah, it's hot patching at a whole function level. This level of granularity almost doesn't require any care at all, actually... far less than if you could patch at a statement level [0]. Read the first several paragraphs of [1] to get a high-level overview of how hot code swapping (and code unloading) works. (The prose from "There are ways to bind yourself" onwards is not relevant to your interests, so you can stop there.)
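For illustration, here is a toy Python model of the behavior that [1] describes. It assumes the documented Erlang rules: the VM keeps at most two versions of a module ("current" and "old"), and a fully qualified call such as `Module:function()` always enters the current version. Loading a newer version purges the old one and terminates any process still running it. The `CodeServer` class and its methods are hypothetical. This is a sketch of the semantics, not of how the BEAM implements them.

```python
# Toy model (not Erlang's implementation) of two-version code loading.

class CodeServer:
    def __init__(self):
        self.current = None   # version entered by fully qualified calls
        self.old = None       # version still run by processes that haven't switched
        self.running = {}     # process id -> version it is executing

    def load(self, version):
        """Load a new version; processes still on the old version are killed."""
        killed = []
        if self.old is not None:
            killed = [pid for pid, v in self.running.items() if v == self.old]
            for pid in killed:
                del self.running[pid]
        # current becomes old, the new version becomes current
        self.old, self.current = self.current, version
        return killed

    def spawn(self, pid):
        self.running[pid] = self.current

    def external_call(self, pid):
        """A fully qualified call (Module:function) moves the process to current."""
        self.running[pid] = self.current


cs = CodeServer()
cs.load("v1")
cs.spawn("p1")          # p1 runs v1
cs.load("v2")           # v1 becomes old; p1 keeps running it
cs.spawn("p2")          # p2 runs v2
print(cs.load("v3"))    # v1 is purged, so p1 is killed: ['p1']
```

A process that makes a fully qualified call before the next load (as `external_call` models) survives the upgrade. This is why long-running Erlang loops usually recurse through `?MODULE:loop(...)`.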

> Where do you find [hot code loading] beneficial?

Whenever I have a service whose code I need to upgrade, and won't need any complicated data migration as a result of the upgrade. Hot code loading is not absolutely critical, but it's another useful tool in Erlang's high-availability toolbox.

> I remember seeing someone porting OTP to Java or .Net.

That's cool! :D What parts of OTP did they not port? Mnesia? The Erlang stdlib? (There's lots more to OTP than gen_server and friends.) Did they also port single-assignment variables, transparent-to-application-code IPC, and distributed code loading-and-execution, or was this just an OTP-the-library port and not a "Let's port some of the nicer Erlang/OTP runtime features, as well as gen_server and friends." project?

> Personally, having to serialize everything and getting "transparent" scalability by moving stuff over the network doesn't appeal to me.

There's no need for scare quotes. Erlang process distribution is transparent to program code. And, like, anyone writing distributed software has to be aware that accessing off-node data is almost always more expensive than accessing on-node data. It's a law of physics. You can't ignore it.

Anyway. As an application writer, I also find Erlang's process distribution incredibly nice (until measurements demonstrate that it's too slow for your application and you have to do a bit of redesign).

For most web app backend services, and a lot of web infrastructure, Erlang is more than fast enough, and it gives you the tools to scale out easily to meet increasing demand.

> The main point seems to be that Ericsson made a good switch with it...

The point of that example is that over 1.5 million lines of Erlang were used in a piece of telecom hardware that provided 99.9999999% ("nine nines") uptime. (That means the switch was down for no more than about 31.5 milliseconds per year.)
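The downtime figure is easy to check. Nine nines means an unavailability of 1e-9. Multiply that by a year, here assumed to be 365 days with leap years ignored, and you get roughly 31.5 ms. A minimal Python check:

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 s; ignores leap years

def downtime_ms_per_year(nines: int) -> float:
    """Maximum downtime per year, in milliseconds, for `nines` nines of availability."""
    unavailability = 10.0 ** -nines  # e.g. 9 nines -> 1e-9
    return unavailability * SECONDS_PER_YEAR * 1000.0

print(round(downtime_ms_per_year(9), 1))           # 31.5 (ms)
print(round(downtime_ms_per_year(5) / 60_000, 2))  # 5.26 (minutes)
```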

Erlang isn't good for every project. Only people who don't know what they're talking about make that claim. Erlang and OTP do provide you with the tools to make fault-tolerant, scalable software relatively easily. Is it the only toolset that does this? Fuck no. But it is a pretty well-thought-out, battle-tested, actively maintained one.

What tools do you use when you must write highly-fault-tolerant, scalable software?

[0] If we assume a moderately complex function, there are certainly people alive who could keep all of the interactions between the first half of the currently-running-code and the second half of the to-be-switched-to-code in their head. I'm not one of them.

[1] http://learnyousomeerlang.com/designing-a-concurrent-applica... (You might need to reload the page after it is first loaded. Late image loading scrolled the page away from the intended anchor on my system.)

MichaelGG · on July 19, 2015

Thank you for this comment.

I put "transparent" in quotes because going over a network isn't transparent. Even .NET Remoting can transparently create objects over the network. It's just a bad idea; it's better to be explicit. Like you say, the performance issues are simply a fact.

I've written some telecom stuff and ran the first VoIP-oriented 911 service provider. We missed a single call in a year, and we followed up manually on that one (it happened during a hard, scheduled failover, and we were monitoring for call attempts). It's mostly a matter of testing and simply assuming everything will fail. That ranges from having progressively higher-level exception handlers to assuming that every connection, server, and process (anything, really) will fail and making sure a failover path is available. After that, there's monitoring, to try to prevent system-wide cascading failures.
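Here is a minimal Python sketch of the "assume it will fail, keep a failover path" idea. `call_with_failover` and the `route_call` stand-in are hypothetical names for illustration. They are not taken from the service described above.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")

def call_with_failover(targets: Sequence[str], attempt: Callable[[str], T]) -> T:
    """Try each target in order and return the first successful result."""
    last_error: Optional[Exception] = None
    for target in targets:
        try:
            return attempt(target)
        except Exception as err:  # any single target may fail; move on to the next
            last_error = err
    # Every path failed: raise loudly so monitoring can catch it.
    raise RuntimeError("all failover targets failed") from last_error

def route_call(server: str) -> str:
    # Hypothetical stand-in: the primary is down, the backup works.
    if server == "primary":
        raise ConnectionError("primary unreachable")
    return f"routed via {server}"

print(call_with_failover(["primary", "backup"], route_call))  # routed via backup
```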

Looking back over VoIP stuff I did more recently, the availability rate is way, way lower. A huge chunk of the problems were just lack of testing or procedure. It's embarrassing, really. After that, the second biggest problem was missing limits to prevent resource exhaustion. Failure of the runtimes/VMs was never an issue, across Windows and Linux, CLR and Mono.
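One way to express such a limit, sketched in Python, is a bounded intake queue that sheds excess work instead of growing without bound. `BoundedIntake` is a hypothetical name. A real system would also need to tell the rejected caller something useful, such as a busy signal.

```python
import queue

class BoundedIntake:
    """Accept work only up to a fixed limit; reject the excess instead of exhausting memory."""

    def __init__(self, limit: int):
        self.pending: "queue.Queue[str]" = queue.Queue(maxsize=limit)
        self.rejected = 0

    def offer(self, item: str) -> bool:
        try:
            self.pending.put_nowait(item)  # never blocks; raises queue.Full at the limit
            return True
        except queue.Full:
            self.rejected += 1  # shed load, e.g. reply "busy" to the caller
            return False

intake = BoundedIntake(limit=2)
print([intake.offer(c) for c in ("call-1", "call-2", "call-3")])  # [True, True, False]
```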

How is Erlang going to help with logic errors more than any managed language that discourages state? I don't see how crash and retry fixes the majority of bugs. I can see how it's better than an unmanaged/unverified language where a single fault trashes the entire process, sure. But against JVM/CLR languages, say?

As far as a million lines with high uptime: the Linux kernel is pretty big and haven't people achieved high uptime with it? But I wouldn't take that as an argument for C, just as evidence that it's possible in C. Is this an invalid comparison? I'd be more interested in Ericsson's engineering dept, but I imagine it's gonna be what we expect, right? Heavy testing and specs?

And suppose I say OK, and move to Erlang. How is it going to maintain HA while pushing, say, a million packets a second of RTP traffic? Right now I just load-balance stuff out to various processing servers and call it a day. Even the guys I know that use Erlang, all the heavy lifting is C. Erlang's just the signal plane (even then, I wonder how they'd scale to handling DDoS levels of signaling).

As far as I remember, the process distribution part, it's just shuttling around serialized function calls over TCP, right? Not to demean it, just it's not a secret magic perf sauce, is it?

I'll give it another look, I've most likely missed something.

simoncion · on July 22, 2015

I notice that you haven't mentioned when you last used HiPE, nor have you mentioned what was unstable about it when you used it. I also notice that you haven't offered any sort of detailed description of or link to the OTP port that you saw some time ago. I'm genuinely interested in all of these things.

Every single time I have heard of "network transparency", whether it was from the Project Athena documentation out of MIT, OpenGL tomes, or CORBA programmer's guides, the author has said something to the effect of:

"Network transparency means that -to client code- access of local resources appears to be identical to access of remote resources; client code doesn't have to care where a resource is. However, access of non-local resources is bound to be slower (often substantially slower) than access of local resources. Be aware when writing performance critical code!"

Everything I've read defines network transparency in this way. Everyone I've talked to knows this definition and knows about its performance implications. Everyone I know who's not a freshly-minted web developer agrees that if you lack a certain level of experience, you have no business designing distributed systems. I'm not sure why you have such trouble with the term.

In regards to your VoIP service: Erlang/OTP provides battle-hardened monitoring, failover, exception handling, and the like. When you use Erlang, you don't have to write any of that, or fish for libraries of unknown quality to provide that functionality. That's one of the big things that's nice about the language and platform.

> Failure of the runtimes/VMs was never an issue...

Neither I nor most folks who work with languages that run on a properly developed VM run into VM failures very often. If we did, we'd probably stop using that particular faulty system. :)

> How is Erlang going to help with logic errors...

It helps by letting you write code only for the happy path. See my next comment.

> I don't see how crash and retry fixes the majority of bugs.

You might be confused about what it means to crash in Erlang. When you write Erlang, you code only for the cases that you must handle. This reduces the amount of code you must write and test. If a component of your software encounters unexpected or invalid input, or gets put into an unanticipated state, it crashes, the invalid state is lost, and the supervisor for that part of the system restarts the component. Folks sometimes talk about writing a crash-free error kernel [0] surrounded by code that dies when it runs into something unexpected.

Do you get this for free? Yes and no. The process supervision code is built in. The modular software design needed to take advantage of the supervision machinery is something you must do yourself. You would likely end up doing very similar design work regardless of what language or platform you used.
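Below is a rough Python analogue of the supervision idea. It assumes OTP-style semantics, where a supervisor restarts a crashed child but gives up and escalates if restarts exceed an intensity limit within a time period. `ToySupervisor` is hypothetical. Real OTP supervisors run children as separate, isolated processes, which a Python exception loop only approximates.

```python
import time
from typing import Callable, List

class ToySupervisor:
    """Restart a crashed worker; escalate if it needs more than `intensity`
    restarts within `period` seconds (loosely modeled on OTP restart intensity)."""

    def __init__(self, worker: Callable[[], None], intensity: int = 3, period: float = 5.0):
        self.worker = worker
        self.intensity = intensity
        self.period = period
        self.restarts: List[float] = []

    def run(self) -> None:
        while True:
            try:
                self.worker()  # worker code handles only the happy path
                return         # normal exit: nothing to restart
            except Exception:
                now = time.monotonic()
                # Forget restarts that fall outside the sliding window.
                self.restarts = [t for t in self.restarts if now - t < self.period]
                if len(self.restarts) >= self.intensity:
                    raise      # too many crashes: give up and escalate
                self.restarts.append(now)  # bad state is discarded; start fresh

attempts = {"n": 0}

def flaky_worker() -> None:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ValueError("unexpected input")  # crash instead of handling it

ToySupervisor(flaky_worker).run()
print(attempts["n"])  # 3
```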

> ...the Linux kernel is pretty big and haven't people achieved high uptime with it? ... Is this an invalid comparison?

It probably is an invalid comparison. And yeah, Ericsson's engineering department is likely full of good, disciplined programmers. However, most tools (like Erlang/OTP) that improve programmer productivity will improve the productivity of all but the very, very weakest of programmers.

> Even the guys I know that use Erlang, all the heavy lifting is C. Erlang's just the signal plane.

Yeah. As I understand it, that's the general pattern for high-performance systems. If you have really serious performance needs for your robust distributed (or simply fault-tolerant) system, do some perf tests and write the performance-critical parts in C or C++ or whatever. Then use an Erlang Port Driver (or some other IPC mechanism) to connect them to the more difficult or logically tricky bits that are written in Erlang.

As I understand it, this is how the software running the AXD301 was designed. If you have an hour or so, you might be interested in [1]. It's an Ericsson presentation on the design of the switch in question. Some of it you undoubtedly already know, but much of it you probably don't. If you're looking for more, Joe Armstrong's papers on the proper way to design Erlang systems are always good reads.
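To make the Port pattern above a little more concrete: with `open_port({spawn, Cmd}, [{packet, 2}, binary])`, Erlang frames each message to and from the external program with a 2-byte big-endian length header. Below is a minimal sketch of the external program's side of that framing. It is written in Python for brevity, though the real heavy lifting would typically be in C or C++. The function names are hypothetical, and the 2-byte header limits payloads to 65,535 bytes. A real program's entry point would call `serve(sys.stdin.buffer, sys.stdout.buffer)`.

```python
import io
import struct
from typing import BinaryIO, Optional

def write_frame(out: BinaryIO, payload: bytes) -> None:
    """Send one message prefixed with its length as a 2-byte big-endian integer."""
    out.write(struct.pack(">H", len(payload)) + payload)
    out.flush()

def read_frame(inp: BinaryIO) -> Optional[bytes]:
    """Read one length-prefixed message; return None at EOF (port closed)."""
    header = inp.read(2)
    if len(header) < 2:
        return None
    (length,) = struct.unpack(">H", header)
    return inp.read(length)

def serve(inp: BinaryIO, out: BinaryIO) -> None:
    # The performance-critical work would go here; this sketch just upper-cases.
    while (msg := read_frame(inp)) is not None:
        write_frame(out, msg.upper())

# Simulate what the Erlang side would send and receive.
demo_in = io.BytesIO(struct.pack(">H", 5) + b"hello")
demo_out = io.BytesIO()
serve(demo_in, demo_out)
print(demo_out.getvalue())  # b'\x00\x05HELLO'
```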

> As far as I remember, the process distribution part ... [isn't] a secret magic perf sauce, is it?

It was written by programmers much like you and me, so -no- it's not secret or magical. The protocol is even partially documented [2]. What's more, AFAICT no one claims that it's the source of serious performance gains. It is -however- reliable, -I gather- well understood, and backed by an active, talented development team.
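As a concrete picture of what gets shuttled around: messages between nodes are encoded in Erlang's external term format. This is the same encoding `term_to_binary/1` produces, documented in the ERTS User's Guide alongside the protocol page cited as [2]. The Python sketch below encodes a small subset: small and 32-bit integers, atoms, tuples, and binaries. `Atom` and `encode_term` are hypothetical names. Atoms are written with the UTF-8 small-atom tag, which, as far as I know, recent OTP releases emit by default. Older releases used a different atom tag, and decoders accept both.

```python
import struct
from dataclasses import dataclass

@dataclass(frozen=True)
class Atom:
    """Hypothetical stand-in for an Erlang atom such as `ok`."""
    name: str

VERSION_MAGIC = 131        # first byte of every encoded term
SMALL_INTEGER_EXT = 97     # unsigned 8-bit integer
INTEGER_EXT = 98           # signed 32-bit big-endian integer
SMALL_TUPLE_EXT = 104      # 1-byte arity, then the elements
BINARY_EXT = 109           # 4-byte length, then raw bytes
SMALL_ATOM_UTF8_EXT = 119  # 1-byte length, then UTF-8 name

def _encode(term) -> bytes:
    if isinstance(term, int) and not isinstance(term, bool):
        if 0 <= term <= 255:
            return bytes([SMALL_INTEGER_EXT, term])
        if -2**31 <= term < 2**31:
            return struct.pack(">Bi", INTEGER_EXT, term)
        raise ValueError("bignums are not handled in this sketch")
    if isinstance(term, Atom):
        name = term.name.encode("utf-8")
        if len(name) > 255:
            raise ValueError("atom too long for SMALL_ATOM_UTF8_EXT")
        return bytes([SMALL_ATOM_UTF8_EXT, len(name)]) + name
    if isinstance(term, tuple):
        if len(term) > 255:
            raise ValueError("large tuples are not handled in this sketch")
        return bytes([SMALL_TUPLE_EXT, len(term)]) + b"".join(_encode(t) for t in term)
    if isinstance(term, bytes):
        return struct.pack(">BI", BINARY_EXT, len(term)) + term
    raise TypeError(f"unsupported term: {term!r}")

def encode_term(term) -> bytes:
    """Encode `term` in the external term format (subset only)."""
    return bytes([VERSION_MAGIC]) + _encode(term)

# {ok, 1} as it might travel inside a distribution message
print(list(encode_term((Atom("ok"), 1))))  # [131, 104, 2, 119, 2, 111, 107, 97, 1]
```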

[0] http://learnyousomeerlang.com/building-applications-with-otp

[1] http://www.erlang.se/publications/Ulf_Wiger.pdf

[2] http://erlang.org/doc/apps/erts/erl_dist_protocol.html