I'm really not sure about the drive for garbage-collected languages in memory-constrained systems with no user recoverability. Seems like a recipe for a device that just stops working from time to time.
Embedded systems, IMO, must be deterministic, reliable and consistent. Introducing garbage collection violates these three principles. Without them, how can you guarantee an interrupt can be reliably serviced in time? How can you guarantee that memory won't be exhausted because of some unexpected condition which prevents a timely GC? Many embedded systems developers don't even use malloc(), preferring static allocations so they can actually understand their memory requirements.
It's either big enough for Linux, in which case have at it, or you need to reconsider why you're down in the kilobytes of total memory with a garbage collector.
This seems awfully skeptical. The http://nerves-project.org/ project (which puts Elixir/Erlang on bare metal) seems to be gaining some success, and the BEAM VM is considered "near-realtime" and notably features no "stop the world" GC, doing collection per process (processes are cheap to create/destroy and are entirely isolated).
http://www.rosepoint.com/ uses Nerves under the hood on its marine navigation products (so, real-world results). Garth (the CEO) gave a presentation a year or so ago where he said his entire industry is Java-driven and he was skeptical about his decision to go with Nerves, but after testing it on some realtime data for a month and seeing absolutely zero memory leaks or faults (and relatively low CPU usage), he decided to go forward with it and has been pretty happy with it so far.
I assumed that, between that and the fault-tolerant nature of the architecture, it would be ideal for a standalone device that would be harder to service. I remember doing specs for in-vehicle systems in college, and the "what if it crashes" question was a nightmare.
Unfortunately, BEAM is not really very easy to hack on. The community has gotten much better than it was a few years ago, but the code is pretty hard to understand.
I think the BEAM approach could be very attractive for embedded systems, given the right investment.
I'll grant you that there's not a lot of documentation for the internals, but I don't think the internal code is that hard to understand. To start with, unlike some languages I've worked with, a large amount of OTP is built in Erlang, not C. Another key thing Erlang does is avoid complexity when it can. For example, the GC algorithm is just about the simplest GC you can get (caveat: it is generational); because of the language constraints, a very simple GC is effective.
You certainly need to spend some time figuring out all the data types and how to access them in C, but if you're willing to spend that time, and you're capable of mucking about inside a VM, I don't see how it's that hard to understand. It feels to me to be on about the same level as the FreeBSD kernel: after chasing down surprising behavior enough times, I've got a pretty good feel for how to read the code and whereabouts to start looking for code I'd like to read, but making changes can be a stretch, depending on where they need to happen. OTOH, I only have to dive into the depths when my team manages to break BEAM or the kernel, which isn't every day... If more things broke, I'd have more skill here. ;)
I've written a bunch of Erlang, and a few Linux kernel bits.
You're right, I rarely break BEAM itself, because most of what I'm able to break is in Erlang. The only time that's untrue is when performance is involved. If you spend a lot of your day hacking on BEAM, I'd love some more documentation like the BEAM handbook, if you're interested.
> I'm really not sure about the drive for garbage-collected languages in memory-constrained systems with no user recoverability. Seems like a recipe for a device that just stops working from time to time.
Remember all those home computers from the seventies and eighties, such as the C64, Apple II, MSX compatibles and Spectrum? Almost all of them were running "garbage collected" BASIC interpreters.
I don't think blanket statements are justified. There are a lot of different types of embedded systems.
> Without them, how can you guarantee an interrupt can be reliably serviced in time?
You wouldn't allocate memory in an IRQ service routine in the first place, GC or not. A GC'd, dynamic (malloc), or statically allocated system would all take exactly as long to service an interrupt.
GC can also be confined to a subset of the system, where less time-critical functionality runs.
That's not to say embedded systems should do allocation at runtime. It's often reasonable to avoid it. But perhaps not all the time.
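To make the "don't allocate in the interrupt path" idea concrete, here is a minimal Go sketch. It assumes a hypothetical bare-metal runtime (something like GERT) that can call a Go function on an interrupt; the names handleIRQ, events, and worker are invented for illustration. The handler only touches a pre-allocated buffered channel, so its latency is the same whether the rest of the program uses a GC, malloc, or static buffers.

    package main

    import (
        "fmt"
        "time"
    )

    // events is allocated once at startup; sending an int-sized value into a
    // buffered channel does not allocate on the hot path.
    var events = make(chan uint32, 64)

    // handleIRQ stands in for a hypothetical interrupt callback (standard Go
    // has no interrupt vectors; a bare-metal runtime would have to wire this
    // up). It only touches pre-allocated state: no allocation, no blocking.
    func handleIRQ(status uint32) {
        select {
        case events <- status:
        default: // queue full: drop or count overruns, but never block
        }
    }

    // worker drains the queue outside "interrupt" context, where allocation
    // and GC activity are acceptable.
    func worker() {
        for status := range events {
            fmt.Printf("event %#x\n", status)
        }
    }

    func main() {
        go worker()
        // Simulate a few interrupts; on real hardware the interrupt
        // controller would invoke handleIRQ instead.
        for i := uint32(0); i < 3; i++ {
            handleIRQ(0x100 + i)
        }
        time.Sleep(100 * time.Millisecond)
    }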
Are you sure those BASIC interpreters used garbage collection? I wouldn't have thought so. I don't think you even had dynamic memory allocation in most of these, you had to size your arrays in advance. The memory might not have gotten allocated until you got to that line of code but that's not the same thing...
J2ME is nowhere near being hard real-time. The minimal Java ME Embedded configuration is a 32-bit MCU with 130 KB RAM and 350 KB Flash/ROM. That is close to the border where systems have an MMU, and it is considered a very "big" configuration in the realm of embedded systems.
10-20 kB of RAM and 64 kB of flash is a lot for many, many applications.
Rust seems to be on a good path to being the real C competitor in general bare-metal development. Right now Cortex-M* family support is at a level where you can start writing some apps with it. I'm not considering Ada to be mainstream.
Correct me if I'm wrong: Aonix no longer exists; it was bought by PTC. Their real-time Java now targets only x86/x64 systems.
You are wrong about "anything less"
Anything less is probably 90% of the market. You will find 130 kB RAM and 350 kB flash only in high-end products from NXP (LPC family), Kinetis (KL family) or ST (STM32 family). You only need that much RAM for Java :)
I believe his point was that they implemented hard-real-time, embedded Java with GC on some components, and that it was fielded successfully. That refutes any objection about whether it can be done. At that point, we're talking about whether the new project can pull that off, on what kind of hardware, with what constraints.
EDIT to clear up two sub-conversations here: what Aonix pulled off for hard-real-time Java, and that there are also hard-real-time GCs for embedded. Example of the latter.
However, I don't know what they have done with the PicoJava offerings, since PTC isn't as friendly as Aonix regarding making the documentation available.
As for the market, it depends on which one the products built with those CPUs are actually being sold to.
My Cisco phone and the Ricoh printer around the corner both are running some form of embedded Java.
Maybe you think they are part of the remaining 10%, but Cisco and Ricoh thought it was worthwhile for their sales.
I believe we have different definitions of embedded system.
There are a lot of them; for me the simplest definition is a CPU/uC system without an MMU.
That Cisco and Ricoh gear has Linux running inside (at least the phone I have runs some 200 MHz MIPS).
The 32-bit market was at $6 billion by 2014 per Atmel's report. There's also a huge amount of sales for Windows Embedded and embedded Linuxes. That represents a significant chunk of a massive market. So, it's quite worthwhile to call even a 32-bit-targeted, hard-real-time GC useful for "embedded" systems. As he said, it's part of the standard definition used in the embedded sector. The 32-bit side is going up rapidly, too, due to decreasing cost and power usage.
EDIT: The specs on them are also starting to look like the desktops of old. Actually, they started to do that quite a while ago.
My definition is the market definition; it doesn't single out one specific architecture design.
The market goes all the way from something like the PIC10 to something like a 64-bit ARMv8.
It is all a matter of what a company is targeting as customer market, and how much it is willing to spend.
Just because a given language is not able to target 100% of the market, it doesn't make it invalid to such market.
If that were the case, C wouldn't be a valid language for embedded development either, given that many CPUs/uCs aren't able to cope with straight ANSI C-compliant compilers and require either a C dialect or pure Assembly.
No, he's making the point that the latency introduced by a GC is most important in the case of a hard real-time system. If J2ME isn't one, then using it as an example is specious.
Hah, that's a fair point, actually. Maybe I'm using a different definition of 'embedded system'. To me, anything that's a general purpose application processor these days (i.e. capable of running Linux) barely fits the definition. I wouldn't really call the iPhone CPU an 'embedded system' although, I guess, it kind of is.
I'd say that's a bit myopic.
There's a huge range of devices between "a few kilobytes of memory" and "smartphone" that would be well-served by something like this.
I agree with you that there are many devices in that range, but why are they not well served by using an operating system? When is a device large enough to run the whole Go runtime but too small for (say) Linux?
Compared to a complete Linux kernel, the Go runtime is pretty tiny.
But to your point, there is no magic answer. The question is rather: when are the capabilities of a full OS kernel like Linux worth the resources needed to run it? And the answer is ultimately: it depends.
I've written a prototype bit of software in Rust to run on a fleet of Raspberry Pi units. So far it seems impeccably reliable, which is nice, but cross-compiling it was not the most fun I've ever had programming.
As the tooling improves I can definitely see it being a good language to use on embedded devices. And it's a rather fun language too; I certainly find it more pleasant than Go to write (but I seem to be in the minority, considering the popularity of Go lately).
It actually runs inside Docker, on a Linux system, as I've been deploying it using Resin.io for our test units. Annoyingly I've had a couple of units crap out on me, might be a bug in an older ResinOS image that has since been resolved though.
When I get time (eventually) I'm going to be working on our own minimal Linux system for the devices. Really all I want is a device that can be accessed from behind firewalls (looking at Teleport for this with their new ARM support), and the rest can be compiled Rust binaries using upstart or somesuch :)
We do have some upcoming projects where I might get a chance to try writing stuff without an operating system. That'll be an interesting challenge!
The OP is talking about more constrained embedded devices: devices on which there is no Docker or even an OS. I would not call the Pi an embedded device in this context.
I'd trust a GC implementation a lot further than I'd trust a typical C programmer. Yes, there's a certain risk that your device will occasionally stop working, but absent formal verification that's a risk for a device whose software is written in any language. Rather than an absolute notion of risk/no risk, let's start talking about acceptable defect rates.
Rust is pretty cool. If I were writing code for a platform like this I might use it (but probably only if I was convinced that GC issues were going to be a real practical problem if I used OCaml/Haskell/Scala-native).
GERT can't deterministically service interrupts, but it's technically because of the armv7a architecture and its non-deterministic generic interrupt controller. Also, unless your Go program is constantly creating and destroying objects, the garbage collector won't really run. I wouldn't put GERT on my ABS brakes just yet, but I think you can engineer around a GC.
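As a rough illustration of "engineering around a GC", here is a minimal Go sketch (the buffer size and names are made up): allocate the working set once at startup, reuse it in place so steady-state allocation is zero, and optionally disable automatic collection with debug.SetGCPercent(-1).

    package main

    import (
        "fmt"
        "runtime"
        "runtime/debug"
    )

    // A working set sized and allocated up front, in the spirit of the
    // static-allocation style many embedded developers prefer.
    const numSamples = 1024

    var samples = make([]int32, numSamples) // allocated once, reused forever

    func process(frame []int32) int64 {
        var sum int64
        for _, s := range frame {
            sum += int64(s)
        }
        return sum
    }

    func main() {
        // Disable automatic collection; with no further heap allocation the
        // collector has nothing to do anyway.
        debug.SetGCPercent(-1)

        for i := 0; i < 1000; i++ {
            // Fill the pre-allocated buffer in place instead of building a
            // new slice per iteration.
            for j := range samples {
                samples[j] = int32(i + j)
            }
            _ = process(samples)
        }

        var ms runtime.MemStats
        runtime.ReadMemStats(&ms)
        fmt.Println("GC cycles:", ms.NumGC) // stays at (or near) zero
    }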
I wish there was an armv7R or armv8R dev board around (that doesn't cost thousands) because those are actually meant for realtime applications and I would really like to try GERT on one.
> Embedded systems, IMO, must be deterministic, reliable and consistent.
This is the definition of a hard real-time system. In most of the literature, 'embedded system' is a broader term that just means there is some compute embedded in a device that performs a larger task.
It's looking more and more like mainstream embedded SoCs will combine the general-use HMI processor core (like an A8/A9) with a smaller real-time core for control tasks.
TI Sitara (Beaglebone family) does this via the PRU, and Freescale added a Cortex-M4 to the i.MX 6SoloX for a similar purpose.
I think it's worth seeing how this experiment works out in practice. Go's concurrent GC has predictable pauses of less than 100 microseconds (not milli-) even on large heaps (>50 GB) with heavy object-allocation patterns. I believe for embedded software the heap will be much smaller :) and the allocation pattern lighter as well, so real pauses will be almost negligible, with a 99.9999% guarantee of staying under 100 microseconds (and, I believe, under 10 microseconds). That may be just enough for many cases.
There is a price you pay for this: you can even get a 1-microsecond pause, but how much work gets done in that microsecond? You should measure the total time spent in GC over x seconds instead of measuring one pause. If your task takes a lot of time, then all those GC pauses add up and extend the task's execution time.
In practice those numbers you gave (provided by golang developers) are not always true. I know because I run apps written in Go in production.
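For what it's worth, both costs being debated here can be read out of the Go runtime. A small, illustrative sketch (the allocation loop is purely synthetic pressure, not a real workload): it reports the wall time, the sum of stop-the-world pauses, and the overall fraction of CPU the collector consumed.

    package main

    import (
        "fmt"
        "runtime"
        "time"
    )

    // Package-level sink so the allocations below escape to the heap and
    // actually create GC pressure.
    var sink []byte

    func main() {
        start := time.Now()
        for i := 0; i < 1000000; i++ {
            sink = make([]byte, 1024) // deliberate, synthetic allocation pressure
        }
        elapsed := time.Since(start)

        var ms runtime.MemStats
        runtime.ReadMemStats(&ms)
        fmt.Println("wall time:         ", elapsed)
        fmt.Println("GC cycles:         ", ms.NumGC)
        fmt.Println("sum of STW pauses: ", time.Duration(ms.PauseTotalNs))
        fmt.Printf("GC CPU fraction:    %.4f\n", ms.GCCPUFraction)
        // Running with GODEBUG=gctrace=1 also prints a per-cycle breakdown.
    }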
Many modern embedded systems are more powerful than the Xerox PARC Star with Mesa/Cedar, ETHZ Ceres with Oberon, DEC Topaz with Modula-2+, and Washington SpinOS with Modula-3 were.
Also, embedded real-time JVMs fit in a few hundred KB and are being used by the likes of the military, e.g. the Aonix picoJVM, to control real-time stuff like battleship missile tracking systems, which I assume is quite real time.
Given that there are real time JVMs controlling ballistic systems and aiming turrets on battleships, I guess they have a pretty good predictable response time.
Also I have a JVM running on the Cisco phone on my desk and the Ricoh laser printer down the hall.
Just because there is a portion of the market to which a certain concept doesn't apply, it doesn't mean it isn't viable in other segments of the same market.
For Go to be successful on embedded systems, it doesn't need to run everywhere.
Heck, there are even embedded CPUs that cannot cope with ANSI C, and that hasn't prevented people from making use of it in other segments of the embedded space.
"its not about size, its about predictable response time."
As far as GCs go, it's also about size if it's a constrained embedded system. I've seen a number of GC papers discussing tradeoffs between size (i.e. RAM use) and speed/latency. This even factors in a bit on the large ones like Vega, where they were still balancing those factors to get an optimal, "pauseless" GC for accelerating enterprise apps.
There's a fair amount of implicit allocation in Go. When you reference a slice, you create a slice descriptor, which takes up space. Maps, channels, and goroutines all allocate. It's not impossible to avoid allocation in a block of code, but it's not just avoiding explicit allocation.
How do interrupts work in this system? Do they map to goroutines, or what?
The thing is, this allocation is usually on the stack. With a bit of experience, it is relatively easy to write your program such that all these allocations are on the stack. The compiler also gives you a clear breakdown of which objects are allocated on the heap. It is one big advantage of Go vs. Java, for example, that you get much better control over allocation. All Go types are "value types", so this gives much more control and less memory pressure. (Except for interfaces: they are great when you need dynamic typing, but they behave more like reference types and do some unavoidable heap allocation, so avoid them in code paths which must not allocate from the heap.)
No, it's really not. How do you allocate a data structure on the stack and then call a function passing a pointer to it at a polymorphic call site in Go? (A polymorphic call site is one in which the compiler cannot statically tell which function is being called.)
Yes, it is possible. The Go GC is written in the Go subset that can be compiled without heap allocation. The compiler even has a switch for this, which turns any heap allocation into a compile error. This is a subset of all possible Go programs, but still a very useful one.
Sorry, I don't know and haven't been able to locate it in the help for the build tool, but it was mentioned in a talk about the GC, especially about it being written in Go itself.
Oh I see what you mean. Yeah forcing vars to actually be stack allocated can be challenging when using interfaces everywhere. What are the Go authors' opinions on improving escape analysis?
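To tie the two points above together, here is a small, hypothetical sketch (the type and function names are invented for illustration) showing both the compiler's escape-analysis report and the interface-call escape. Building it with go build -gcflags='-m' makes the compiler print which values move to the heap.

    package main

    import "fmt"

    type summer struct{ total int }

    // addDirect takes a concrete pointer; with nothing forcing it to escape,
    // escape analysis can keep the caller's value on the stack.
    func addDirect(s *summer, n int) { s.total += n }

    // addViaInterface hands the pointer to fmt.Println, whose parameter is
    // interface-typed (a polymorphic call site), so the value escapes.
    func addViaInterface(s *summer, n int) {
        s.total += n
        fmt.Println(s)
    }

    func main() {
        a := summer{}
        addDirect(&a, 1) // stays on the stack; nothing in the -m report moves it

        b := summer{}
        addViaInterface(&b, 2) // the -m report shows b moved to the heap
    }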
Nim can run on bare metal natively and even on microcontrollers like the Arduino. The GC is deterministic, the GC algorithm is pluggable, and it can also be turned off.
Sure, though they generally achieve that by running on big iron, with lots of spare CPU and memory, and running a lot of big, exotic software. That is, environments that are the opposite of where you'd generally want to deploy any code bare metal.
The Aonix data on their website was very useful. It proved most of our points. That's why we reference it. It's still available through archive.org. Although, just checking, I find a lot of the good links are dead when I try them. Uh oh.
In the Aonix design, there is no GC on the hard-real-time threads or portions of the heap. Those are usually done statically anyway for max predictability. The GC can apply to anything else, with it preempted by real-time threads when necessary. It was usually tied into an RTOS like LynuxWorks' (now Lynx).
I still refer to Aonix because they were more developer-friendly, had more information on their website than the few whitepapers from PTC, and their web site is still partially up.
* Deterministic - the system is intrinsically incapable of undefined behavior, provably so. (Though extrinsic factors like hardware or network failure could result in undefined behavior.)
No, but with manually managed memory you usually have complete control over when an allocation happens, and often you can keep everything entirely on the stack. That's one advantage C would have over Go, for example.
Of course, in the end, it doesn't matter as much as people make it out to, because you can easily blow the stack in C. In reality, one of the worst disadvantages of garbage collectors is latency, and Go's GC is best-in-class in that respect.
And, obviously, while Go is pretty competitive in memory usage to many higher level languages, in my experience you can still be much, much more frugal on memory when coding in C.
Wow, it's a great piece of work. I was always wondering how to implement such a big Go runtime on bare metal, but someone finally did it. I'm glad to have a paper as well. So how big is the compiled binary of the Go runtime? I hope it's small enough to port to Cortex-M series MCUs.
Hi, I'm the author of GERT. The size of the ELF for the laser projector program is 2.1M, so it probably will not fit on a Cortex-M :(. Additionally, I don't think GERT will be as useful on a single-core SOC as it is on a multicore chip, because blocking operations (like reading a UART) may literally take forever. The memory safety can certainly be useful though! I'd say just start tinkering and try it out. You can probably gut enough of the ELF to get it down to a few hundred kilobytes.
It's awesome that you've released your masters project under a free software license. You see a lot of research that took several years of labour but wilts away because it remained proprietary. Well done!
It's planned. If ARMv8 had market penetration ~1.5 years ago, then I probably would have started that way. One big issue with most SOCs is the lack of publicly available data sheets for writing good drivers. That's also why I picked the imx6Q; its data sheet is very detailed.
I'd say that even two years ago most mobile phones being sold were already ARMv8. That doesn't help with the SoC documentation, you're right that this has been a consistent weak spot. Usually documentation is offered up or it washes up only when the market relevancy of any one SoC approaches zero. Before then, it's passworded up and jealously guarded. Makes no sense to me, especially when you consider most of any one SoC to be consisting of reused generic IP blocks. I mean, I can deal with an NDA for something tricky like your new GPU, but that doesn't explain why I can't figure out the interrupt routing or your clock and GPIO blocks.
If you're referring to ARM servers, then things are still pretty solid (it takes a while to line up an entire hardware and software ecosystem, even in a world where you're all set if it runs Linux). There are specs like SBSA and SBBR that ensure servers from any SoC vendor look roughly the same, but I would wonder why you would target bare metal in that case anyway. Have you considered targeting ARMv8 VMs, like the one modeled by KVM/qemu? Extra bonus in that it looks like an ARM server.
I've been developing with iMX for over 5 years and I'll heartily recommend that part over any other Linux-class SoC on the market right now.
Freescale's support is probably the best available out there in this class of chips. Documentation is mature and plentiful (excepting the GPU of course but that's being worked around), and there is plenty of code sitting on their Github servers including Yocto recipes that are pretty close to mainline.
I just updated Debian on my cubox-i (also iMX6) last week, which also runs mostly Go code. Never thought that Go would be low-level enough to run on bare hardware without replicating a lot of OS-level code. Interesting project, thanks for sharing... I hope I get to check out how you did it when/if I have some more time.