What computer and software is used by the Falcon 9? (2015) (space.stackexchange.com)
338 points by thom on May 31, 2020 | 156 comments



Knowing that the ground control software is written in LabView is one of the more disappointing things I've read recently. I did a bunch of test engineering during college co-ops, including multiple "codebases" of LabView, plus the normal class coursework and research lab instrumentation. I hate LabView, to the degree that I made a commitment never to apply to a job that lists its use as one of the requirements or as part of the desired qualifications.

I remember working on getting a brushless motor test stand up and running using a LabView VI, which was written by one of the senior test engineers a couple years before. After spending days working on it, I eventually went to him for help. He sat down with me and it for another two hours, and even then, he needed to take the code (that he wrote!) and go sit down with it alone the next day for several hours to understand what was going on. If you're not working it essentially full-time, on the same codebase, it's insanely difficult to maintain any non-trivial code.

What makes the situation an even bigger shame is that National Instruments makes LabWindows/CVI, an ANSI C equivalent to LabView. The few times I've used it (in a graduate-level experimental physics class), it was so much easier to use and maintain.


How well do you actually know LabVIEW? Have you ever used its object-oriented system, actor framework, network streams, or get/set control value by indices (a feature actually implemented for SpaceX) functionalities? These are all things a professional LabVIEW engineer will be familiar with and use. The fact that a system didn't use these means it's either seriously outdated or the person writing it wasn't aware.

It sounds like you hate LabVIEW because you worked on a few poorly architected systems, as I don't see anything in your hate for it that actually applies to LabVIEW the language/environment.

> If you're not working it essentially full-time, on the same codebase, it's insanely difficult to maintain any non-trivial code.

It's hard to maintain any codebase if you're not giving it attention. I would argue that LabVIEW, when properly architected, can be maintained just fine and maybe even easier than other languages. I've done it.

LabVIEW is highly performant and excels at throwing data around and processing it and interacting with hardware. Its dataflow nature actually makes it easy to modify and maintain, when done properly. There aren't many languages that make it that easy to make user interfaces and distributed systems. Another language/ecosystem that I would consider for such a system is Elixir/Erlang, but you'll be searching for a user interface solution and lots of hardware drivers. However, with Elixir/Erlang you gain a system even more built for distribution but you lose out a lot on synchronous/sequential performance. C#/F#/.NET are other options to use since NI makes quite a lot of .NET drivers.


This is definitely fair--I don't consider myself a LabView expert. I'm proficient with object-oriented systems (Python, C++, Rust (if you consider Rust object-oriented)), with get/set for indices and network streams, but not familiar with actor frameworks outside of a passing read-some-docs a few times. I'd estimate I've cumulatively spent around a month working on LabView full-time, including writing/editing several working instrumentation sets in different settings.

I don't like LabView because of the when-properly-architected asterisk. Sure, well-written code doesn't happen all the time in any language, but my personal experience is that both the frequency and severity of that being the case are much higher in LabView than any other language I've used, making the barrier to entry for writing good code essentially being a full-time LabView engineer.

You'll get no argument that it can do a good job when it's written well--the good LabView VIs I've used are definitely fast, and as you pointed out, NI makes sure that it plays well with hardware. But, to me, the risk of getting hired somewhere with a poorly-constructed system written in LabView is far too high, considering the grief I've had trying to fix those systems.

I appreciate your suggestions for other alternatives--I haven't personally used C#/.NET a lot, but I know several test engineers who love it enough to do all their GUI work in it. I'll keep it in mind next time I'm setting up a DAQ system!


Just a few clarifications. By get/set control values by indices I meant a specific feature in LabVIEW that allows high-performance updates of front-panel controls. You use these when you have a lot of data being streamed to the front panel. By network streams, I again meant a specific LabVIEW feature (https://www.ni.com/documentation/en/labview-comms/latest/dat...) for sending data back and forth over Ethernet. It's a nice high-level library that is performant. It just makes it extremely easy to pass data around. In any big system, these features are a must-have. Also, I wasn't judging, so apologies if it came across that way. :) There's just a difference between small systems that don't use LabVIEW's features and those that do. Most people who have used LabVIEW to its full extent have completely different complaints than those whose experience revolves around university or internship LabVIEW software, and so I like to point out that LabVIEW isn't fundamentally flawed as a visual language.

I think that the when properly architected asterisk applies to any language. LabVIEW is in a difficult situation because there are two huge factors at play. It makes it easy to start off and get things going, but systems inevitably grow beyond their original scope, and so LabVIEW systems are often developed by non-software people but still expected to grow. That's a conflict. The second thing is that more traditionally trained programmers have a mental block when it comes to LabVIEW, so people who do know how to properly modularize code don't pay attention to LabVIEW. So in LabVIEW, you have some experts who do treat it seriously, as if it was any other software system, but then a lot of code that was thrown together by others new to software and to LabVIEW. I would say Python is actually in a very similar situation by being "friendly" to beginners, so there is a ton of bad Python code out there.

I once interviewed at a place that had completely sworn off LabVIEW. I couldn't get them to explain why. They just hated it. But then I asked to see their Python code, which is what they had moved to, and I was greeted with a single file that was 10,000+ lines long. And in there I saw some function signatures that were 15-20 lines long (yes, the function's arguments spanned that many lines). So I see that as a disconnect. They hated LabVIEW but didn't know why and loved Python and didn't know why. In either case, they didn't know what they were doing.

The .NET APIs NI makes are nice enough, and I have used them before but not to a deep degree.


I have written and maintained a piece of testing/control software. It was originally horrible “old” spaghetti Labview. I maintained it for a while then wrote a replacement with Labview’s actor framework. I agree that the AF seems to be the only reasonable way to write certain kinds of larger application. However, the actor framework was, for me, just barely good enough to be usable.

- The debugging story is very bad: There were community efforts to write some tools to help, but they amounted to downloading software from a forum post.

- Development is incredibly tedious: Creating a new actor, or message, or (god forbid) implementation of an abstract message. They all take SO many clicks across lots of files in a slow IDE. Tedious and error prone.

- Distributing actors across nodes is not supported by the framework.

- There is no concept of supervision trees, so if you want “let it crash” robustness, you’ll be writing that yourself

I wish it were different, because at its heart I kind of love the idea of a strongly-typed dataflow language.

Edit: I want to point out that despite how personally ugly I found the experience, I’m not sure there was a better choice. I would love to have used Elixir, but what (non-software) engineering company can hire for that?


I agree with your complaints. Indeed, implementing an abstract message is awkward, although there was a specific version that did improve the project interface to make that easier at some point. I remember a relief of "finally". Haha. The right-click menu for the actor framework is implemented using project providers and LabVIEW scripting, both of which need massive improvement. It's an awkward process to even get the license to implement project providers as a user. I've just ignored them because I'd rather just write custom tools off to the side that use LabVIEW's application control system manually. And there's also a lot of ideas I have that require scripting, but it's not an easy system to use.

I also agree with the actor framework being too bare bones, and there are plenty of things I disagree with in the design. That's why I extended it myself to add in additional support. For example, I added built in messages to notify when an actor is ready, a publish/subscribe mechanism to break out of the recommended hierarchical setup when needed, a streaming actor that you can subscribe to the data it streams out via messages, a finite state machine actor that allows state transitions triggered by messages or internal requests and message mechanisms to notify when this happens to subscribers of the state updates, and some more.

> There is no concept of supervision trees, so if you want “let it crash” robustness, you’ll be writing that yourself

Yea. I recently learned Elixir/Erlang to some degree. I'd like to have an OTP-like system in LabVIEW. The above extensions are actually already close in some ways. For example, the FSM actor I made is similar to gen_fsm or gen_statem in OTP.

> I want to point out that despite how personally ugly I found the experience, I’m not sure there was a better choice.

Therein lies the rub. I've searched for one. Despite my complaints about LabVIEW, I end up finding other systems even harder to work with for these applications. Elixir/Erlang are great, but developing a user interface with them will require web-technologies. C#/F#/.NET is the other system I've identified, but it's a fairly complicated ecosystem. And what language allows me to do desktop apps, web apps, web servers, real-time deterministic programming, and FPGA programming all in the same language and environment? It's a tough feature set to replicate and replace.


I think another thing that is often overlooked is that a project like this requires a ton of very diverse hardware in the form of hundreds or maybe thousands of sensors, valves, PLCs etc. A rocket launch site reminds me a lot of an industrial process facility. National Instruments sells all of that hardware with LabView as a complete solution. Their PXI systems, which are used extensively in the process industry, are quite expensive up-front but the seamless integration of the hardware with LabView drivers, TestStand etc makes it so much more cost-efficient down the line.


That AMA is from 2013, when LabView would have seemed like a more reasonable choice. Dunno if they've changed since then, but it's possible they've come to their senses.


Having worked at National Instruments, I will say there are some engineers out there that really love LabVIEW. It has been a successful tool.


A slight inaccuracy: the flight strings are not x86, but ARM, running on a custom board.

Also, only the actual graphical display application uses Chromium/JS. The rest of the system is all C++. The display code has 100% test coverage, down to validation of graphical output (for example if you have a progress bar and you set it to X% the tests verify that it is actually drawn correctly).


How stateless and independent is each display?


Can't say exactly re statelessness, but I would not be surprised if you could power-cycle all displays if needed in the middle of a mission.

As far as independent, completely independent from each other.


> but I would not be surprised if you could power-cycle all displays if needed in the middle of a mission.

Indeed you can, judging by the instruction to power off and clean all displays (then power them back on) prior to ISS rendezvous/docking in the live stream.


How is that done? By pixelwise comparison with a test image?


I'm curious about this as well. From what I remember chromium does what you suggest to verify what is rendered down to the pixel: https://developers.google.com/web/updates/2020/02/chromium-c...
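In spirit it's a golden-image comparison. A hedged sketch of the idea (my own illustration; Chromium's real pixel tests run through its own test harness rather than code like this):

  // Compare a rendered frame against a stored reference capture, allowing a
  // small number of differing pixels for anti-aliasing noise.
  #include <cstddef>
  #include <cstdint>
  #include <vector>

  struct Image {
      int width = 0;
      int height = 0;
      std::vector<uint32_t> pixels;  // packed RGBA, row-major
  };

  bool matches_golden(const Image& rendered, const Image& golden,
                      std::size_t max_differing_pixels = 0) {
      if (rendered.width != golden.width || rendered.height != golden.height)
          return false;
      std::size_t diffs = 0;
      for (std::size_t i = 0; i < rendered.pixels.size(); ++i) {
          if (rendered.pixels[i] != golden.pixels[i] && ++diffs > max_differing_pixels)
              return false;
      }
      return true;
  }

  // A progress-bar test like the one described upthread might render the bar
  // at 40%, capture the frame, and assert matches_golden(frame, golden_40pct).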


SpaceX has had a booth at GDC at least two times, they attend in order to recruit game developers (programmers), as they feel these programmers work with a lot of the same constraints they do in terms of resources and experience with real-time, performant systems, etc.


It's true, I was there! (And I was a gamedev previously). Although you forgot crunch time which I think is the real required experience. ;)


Sounds kinda disappointing/irresponsible. Chromium is an overly complex, relatively buggy piece of software. For example I recently experienced a bug where text I selected was significantly offset from what should have been selected given the pointer position (on Linux), a misclicky kind of bug. Considering that the majority of Crew Dragon's controls are through touch screen, maybe even manual docking (not sure, but I don't see a joystick anywhere), my personal experience with Chromium does not instill confidence for such critical applications. But I guess they tested it very thoroughly.

C++ and Linux do not strike me as mission-critical pieces of software / languages either. My heart desires something with verified properties, advanced languages with integrated proof checkers, verified OSs, or at least smaller purpose-built OSs that were carefully reviewed. But I guess SpaceX is not to blame for this. It's likely such approaches are simply not ready yet. And I know that C++ and Linux are common in mission critical applications. JSF avionics is written in C++ for example.

(Not to mention JavaScript in space, in a mission critical role. That shouldn't have happened.)


Most safety critical code in the world, both civilian and military, is currently written in C/C++. Ada/SPARK is also used but it's below 50%.

These languages have the best practical static code analyzers, verification and proofing tools money can buy. I'm personally using Astrée https://www.absint.com/astree/index.htm Automatic docking software for the ATV that delivers supplies to the ISS is written in C code and verified with Astrée.

People get stuck on the language, but if it has some features that make it work, you can use it with verifiers. Then you write tests and simulators and tests ...

JavaScript and Chromium seem more suspect to me, but if the code is well tested and has limited run time, I think it can be good to go. The state is stored and processed in those radiation hardened RAD750 processors. The Chromium/JavaScript runtime image can restart every few seconds if needed.

The UI and controls seem like unnecessarily flashy bullshit to me. Usually if you diverge from KISS principle bad things happen. But before I make any judgements I would see how they made those decisions.


How much is Astrée? I get nervous when you can’t see a price on the website.


In this case, C++ code is written in a different style than regular applications: exceptions are not allowed, allocations are not allowed past a certain point in program initialization, raw pointers are frowned upon, etc. This is common in this type of software and when you pair it with the right development culture it really makes a difference in the robustness of the result.
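For concreteness, here's a minimal sketch of my own (not actual SpaceX code) of what that style tends to look like: all storage reserved up front, failures reported through return values instead of exceptions, and data owned by value rather than through raw pointers.

  // Fixed-capacity queue: storage is reserved at compile time, so it never
  // allocates at runtime and never throws.
  #include <array>
  #include <cstddef>
  #include <optional>

  template <typename T, std::size_t Capacity>
  class StaticQueue {
  public:
      // Returns false instead of throwing when the queue is full.
      bool push(const T& value) noexcept {
          if (count_ == Capacity) return false;
          buffer_[(head_ + count_) % Capacity] = value;
          ++count_;
          return true;
      }

      // Returns an empty optional instead of throwing when the queue is empty.
      std::optional<T> pop() noexcept {
          if (count_ == 0) return std::nullopt;
          T value = buffer_[head_];
          head_ = (head_ + 1) % Capacity;
          --count_;
          return value;
      }

  private:
      std::array<T, Capacity> buffer_{};  // owned by value, no raw pointers
      std::size_t head_ = 0;
      std::size_t count_ = 0;
  };

  // Usage: telemetry samples queued for downlink, bounded at 256 entries.
  StaticQueue<double, 256> telemetry_queue;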


Is C++ code formally verified, e.g. with something like Coq?


Any idea if there's a database layer (eg SQLite, PG, etc) on the systems?


None, if possible. In the safety critical parts you want to use software that:

- doesn't do any dynamic memory allocation, has bounded memory usage, never runs out of memory.

- has bounded execution times

- has low interrupt latency

- can optionally checksum data and internal state (if it needs to be loaded into memory)

- is tested using the most rigorous standards, 100% verification and code coverage.

If you write a great, relatively general but customizable database system like this with good indexing, you need only 5-10 customers and you are set. Safety-critical systems are growing in size and there is real need for software that can get easily certified.

SQLite could probably get certified as part of a system, at least as a read-only database. It's relatively solid code. But it will cost big money and I don't think anyone has done it.
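To illustrate the list above, a rough sketch (my own, not any certified product) of what a store with those properties might look like: statically allocated, bounded in both memory and lookup time, with a checksum carried on every record.

  #include <array>
  #include <cstdint>
  #include <cstring>

  struct Record {
      bool     in_use = false;
      uint32_t key = 0;
      double   value = 0.0;
      uint32_t checksum = 0;  // covers key and value
  };

  // Fletcher-style checksum over a byte span; bounded work, no allocation.
  static uint32_t checksum_bytes(const uint8_t* data, std::size_t len) {
      uint32_t a = 1, b = 0;
      for (std::size_t i = 0; i < len; ++i) {
          a = (a + data[i]) % 65521;
          b = (b + a) % 65521;
      }
      return (b << 16) | a;
  }

  static uint32_t record_checksum(const Record& r) {
      uint8_t buf[sizeof(r.key) + sizeof(r.value)];
      std::memcpy(buf, &r.key, sizeof(r.key));
      std::memcpy(buf + sizeof(r.key), &r.value, sizeof(r.value));
      return checksum_bytes(buf, sizeof(buf));
  }

  template <std::size_t MaxRecords>
  class StaticStore {
  public:
      // Update or insert; work is bounded by MaxRecords, never allocates.
      bool put(uint32_t key, double value) {
          Record* slot = nullptr;
          for (auto& r : table_) {
              if (r.in_use && r.key == key) { slot = &r; break; }  // update existing
              if (!r.in_use && slot == nullptr) slot = &r;         // first free slot
          }
          if (slot == nullptr) return false;  // full: fail explicitly, never grow
          slot->in_use = true;
          slot->key = key;
          slot->value = value;
          slot->checksum = record_checksum(*slot);
          return true;
      }

      // Returns false on a missing key or a checksum mismatch (corrupted state).
      bool get(uint32_t key, double& out) const {
          for (const auto& r : table_) {
              if (r.in_use && r.key == key) {
                  if (record_checksum(r) != r.checksum) return false;
                  out = r.value;
                  return true;
              }
          }
          return false;
      }

  private:
      std::array<Record, MaxRecords> table_{};  // all storage reserved up front
  };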


On the ground side, yes, there are many databases for different purposes. But not on the actual flight software.


Very curious, how is state persisted where non-volatile storage is needed, and how is that persistence structured?


It is never persisted outside of RAM. The flight computers are never powered off during a mission. You rely on two flight computers to always be functioning, that’s why the system is one-fault redundant.

You can think of each string as being "functional": given the same set of inputs you expect the same set of outputs.
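A toy illustration of that idea (mine, not actual flight code): if each control cycle is a pure function of the current sensor readings, two strings fed identical inputs produce identical outputs, which is what makes comparing redundant strings meaningful.

  struct SensorInputs    { double altitude_m; double velocity_mps; double tank_pressure_kpa; };
  struct ActuatorOutputs { double throttle_fraction; double gimbal_angle_rad; };

  // No hidden state, no side effects: same inputs always yield the same outputs.
  ActuatorOutputs control_cycle(const SensorInputs& in) {
      ActuatorOutputs out{};
      out.throttle_fraction = (in.velocity_mps > 100.0) ? 0.8 : 1.0;  // placeholder logic
      out.gimbal_angle_rad  = 0.0;                                    // placeholder logic
      return out;
  }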


What happens if some anomaly happens where the shuttle loses power for a second?


Are you asking about what happens if all the power goes out or are you asking how they get state back? If everything reboots it's fine. On restart the computers will look at all the sensors and project the state of now. There is no reason to keep the state of the past, is there? Why does it matter where the craft was 5 minutes before? What matters is what is in front of them and the future, right? If you tell the computer I want to fly home, it looks at where you are and tells you how to get there. It does not need to remember the past parts of the trip, does it?


It'd have to know that its destination was Earth, or does it just sit idle after a power on and wait for the command? What happens if the astronauts are asleep? I feel like it needs some persisted data.


I'm reminded of the quote from Linus:

"Real men don’t use backups, they post their stuff on a public ftp server and let the rest of the world make copies."

Real programmers don't use persistent storage, they just rely on physical reality in general.


State, as far as the system is concerned, is what values the multitude of sensors are reporting at any given moment.


The fact that it's implemented in JavaScript and Chromium just tells you the fancy display is entirely superfluous to mission success.


Interesting way to see it. For anyone wondering how unreliable it is, think instead that the trade off was well planned.


Is there a backup should that fancy display fail? If not, then I'd hardly call it "superfluous to mission success".


Apparently the physical controls below the screens can perform all of the mission critical functions in the event of a display failure.


If only browsers had a similar fallback.


Once you decide that absolute reliability is unattainable, you can instead aim for sufficient redundancy. It’s not perfect, but with enough safe guards in place it may very well be good enough.


Software is not always like hardware though: the same software failure can cascade across redundant instances rapidly just as easily as it affects the primary.


The same hardware failure can cascade too; redundancy alone is not a panacea, you need some safeguards. Ideally, nothing should fully trust anything else.


I was just reading that they design the rocket so that one engine could explode without affecting the others (hopefully) and they test the engines by dropping a nut or bolt or something in, which is not something rocket engines normally can tolerate.


Maybe cascade was the wrong word. I meant that if you pop in a hot spare and it's running the same software, there's a good chance it will encounter the same issue, sometimes immediately.


In some cases redundant software is developed independently by different teams to the same spec; I think I read about that with the Space Shuttle.

How correlated are your layers of swiss cheese...?


I'm reasonably sure that quite a bit of the complexity is more or less directly caused by the amount of features it provides. I agree with you in that Chromium probably brings in a lot of features they don't need, but I would also say that reimplementing the features they do need is also a huge liability and would incur quite a bit of complexity too... but it'd make it their insular system with no other resources flowing into improving it.

I don't have the data to make that call, but I would think someone at their end evaluated this quite thoroughly and determined the risks of rolling their own to exceed the risks of using Chromium.

(Also they mentioned on the stream that they do in fact have a few hardware buttons for important things. Can't cite this, sorry...)


Why not Javascript? I would argue that a good usable UI in a spacecraft is a good thing, and Javascript is a great language for building UIs


> Javascript is a great language for building UIs

I disagree. It is easy to build something presentable with HTML and browsers, and there have been many JS frameworks that lower the barrier. However, that is not because JS is a good language for it; it is because JS is the only language available in that environment.

In fact, there are at least two other languages that transpile to JS, because you need JS to be able to run in a browser. But JS the language is not good enough, so you want to write your code in something else.

Building a UI in a completely different language, without a browser, is definitely harder, but you might have several advantages in the other, non-UI parts.


You haven’t made any arguments about why Javascript isn’t good for building UIs there. Here’s mine for why it is:

- first class functions (callback style is a good fit for UI programming)

- async/await and single threaded - many components of UIs are async by nature, Javascript supports you well here while keeping things simple with its single threaded model


Sharing the UI thread with logic is something to be mitigated when building UIs, it's not a feature.


In theory, maybe. In practice, your UI thread very rarely has to do the kind of heavy lifting that would block it so badly you actually have problems.

I have encountered this -- for example parsing very large incoming JSON payloads -- but very rarely. And if you really need to mitigate it, there are webworkers.


There's... a large number of languages with both of those features.


Could you name a few better than JS?


In the context of writing reliable software for a spacecraft? Pretty much anything with a reasonable type system, for a start. Maybe C#, or some reasonable subset of C++20, or Rust. Maybe Reason or OCaml.


I mean very clearly the good people at SpaceX have heard of these languages and decided against it. I think JS and Chromium are the best UI kit I can think of in terms of man hours to get X done. If you're careful the only thing I've ever had issues with is latency, but in space if you're in a situation where missing a frame or two is important you're already beyond fucked.


From my experience, Qt/QML is way better for things like that. I mean, the UI does not need to be responsive, does it?

Maybe they couldn't afford the Qt license.


Many people are using TypeScript these days, it works very well.

I like C# and Rust too. I've built UI's in C++ and MFC/OpenGL/DirectX. It remains my fairly strong opinion that JavaScript is well suited to UI development.


Javascript is a bad language period. For anything. It's used for UIs for historical reasons, not because it's good.


Javascript has also evolved quite a bit and there is also TypeScript that improves on that. The language is not perfect by any means, but it gets better.


I would expect the UI in a space ship to be soft realtime. Chromium and JS are anything but.


The only thing that's self-evident to me is that Javascript is a popular language for building UIs.


I don't see why.

Chromium is being used in the UI. Not on critical flight software.

> maybe even manual docking

It isn't done manually

> C++ and linux do not strike me as mission critical piece of software / language either

A lot of flight software is done in C++, as you even mentioned. In the end, your best reliability partner is not using a language that's only known by a few people.

What would be a good "mission critical piece of software"? QNX (with no memory protection?). Linux is fine. I suppose they're not firing up Ubuntu out of the box and running that with no customizations.

Separation of concerns and redundancy usually beat correctness on the long term (which doesn't mean they didn't test the correctness of the system).


  QNX (with no memory protection?)
It's probably a misunderstanding, but QNX has a very strong memory protection model that extends to its driver stack. Given that it's a true realtime/microkernel, it has its own set of problems, but memory protection isn't one of them. For a number of cases QNX is going to be a large step up from Linux.

http://www.qnx.com/developers/docs/6.5.0/topic/com.qnx.doc.i...


Unlikely that they are using JavaScript in the actual flight controller. Also, under the screen they have a small number of manual buttons that is likely to be separate from the touchscreen interface logic.


>javascript in the actual flight controller.

Zero chance that's the case. I meant that in aerospace applications the UI is mission critical too.


> I meant that in aerospace applications UI is mission critical too.

It is, but usability of the UI is mission critical as well. The humans in the capsule need to be able to quickly and effectively consume information in a way that doesn't make important facts hard to get at, and then quickly and effectively perform commands to address issues.

I don't think a touch GUI is a bad choice in this regard, and if I'm building a touch GUI I'd rather use some existing framework. My personal choice would've been something like Gtk or Qt though :)


It's most likely WebGL and some glue JS code. I don't think they're running everything in just JS for the view code.


In the case of SpaceX, it seems that they use embedded Linux and C++ with x86 processors, the same as standard PC processors. A reason for using embedded Linux is that it allows using standard C++ or even scripting languages for controlling the hardware from user-space by just reading and writing files. Linux device drivers (aka kernel modules) map the hardware to special files under /sys or /dev. For instance, it makes it possible to control a GPIO (General Purpose IO), which the device driver maps to the sysfs special file system, by just writing 1 or 0 to a file like /sys/class/gpio/gpio4/value, which would enable GPIO 4 and turn on a LED attached to it. Another practical example of this feature is that, on Linux, it is possible to turn the keyboard capslock LED on or off by writing to a /sys/class/leds file, such as "$ echo 1 > /sys/class/leds/input7\:\:capslock/brightness" which turns on the Capslock LED. By writing 0 to it, the LED is turned off.

This feature of Unix and Linux allows controlling the hardware with any programming language capable of reading and writing files, including Python and standard C++. I guess that they may be using standard PC hardware with industrial IO cards. They may also use a single-board computer or custom board with low-power x86 variant processors built for embedded applications as a SOC (system-on-chip). One example of a low-power x86 SOC-based processor is: https://www.cnx-software.com/2015/04/09/vortex86dx3-is-a-new...
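As a rough sketch of that sysfs approach (the paths are examples and vary per board; newer kernels prefer the GPIO character-device API, but the "hardware as files" idea is the same):

  // Export a GPIO line and drive it high or low by writing to sysfs files.
  #include <fstream>
  #include <string>

  bool set_gpio(int gpio, bool on) {
      // Make the pin visible under /sys/class/gpio/gpioN (the write simply
      // fails if it is already exported, which is fine here).
      std::ofstream exporter("/sys/class/gpio/export");
      if (exporter) { exporter << gpio; exporter.close(); }

      const std::string base = "/sys/class/gpio/gpio" + std::to_string(gpio);

      std::ofstream direction(base + "/direction");
      if (!direction) return false;
      direction << "out";
      direction.close();  // flush before writing the value

      std::ofstream value(base + "/value");
      if (!value) return false;
      value << (on ? "1" : "0");
      return true;
  }

  // Usage: turn on an LED wired to GPIO 4, the example pin mentioned above.
  // set_gpio(4, true);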


[flagged]


you sure gave enough arguments to beat those of the parent.


not on me to prove his arguments stable genius


Interesting that it's three identical processors running the same software. I remember that the automated Vienna central train station used a pair of machines, x86 and SPARC, one running the logic in a procedural language (Chill) and the other running in a logic language (Prolog). They didn't want both processors to share any code, and therefore potentially the same bug.


That hasn't been the case in Vienna for some years, so you must have seen that setup a decade or so ago.

These days it's THALES-built systems running SIL-4 rated code in a best 2-of-3 or 3-of-5 voting system, and each of the nodes can be either PPC, Arm or x86 - but not all the same, for architecture reasons. These systems produce a vote that has to match, or else the system shuts itself down. Chill and Prolog haven't been shipped in that capacity in decades...

(Disclaimer: worked for Thales on these systems, had to help remove a ton of Chill code, fix bugs in its compiler, etc.)


Yeah, I was involved in the Alcatel system in the 90s.

Still, the basic point is interesting: the Thales system also uses a separation of architecture.


>They didn’t want both processors to share any code, and therefore potentially the same bug.

They're not worried about bugs, they're worried about electromagnetic radiation (which is way more present when you get outside the atmosphere) flipping bits.


There have been cases of homogenous CPU architectures in safety-critical systems where all CPUs failed at once.

Bit-flipping rads are one of a multi-variate set of problems that these systems are designed to detect and avoid. Nodes can degrade otherwise: SRAM failure, battery/cap depletion, all sorts of things. The idea is to detect when a voting node has had a catastrophic system failure, and then remove offending nodes from the decision as rapidly as possible.

A multi-architecture system assists with managing (and certifying) things such as compiler bugs and impact from system-level homogeneity, where critical components all exhibit the same bug/failure. An x86 hardware bug is fatal if all 5 of the nodes in a voting system are x86 and making a critical decision; it is less of an issue if the other 2 or 3 nodes are PPC, ARM, MIPS, etc., or a mixture thereof, and - as variant hardware architectures - are democratically still voting during that period. The idea is to avoid homogeneity and pursue heterogeneity.


> They're not worried about bugs, they're worried about electromagnetic radiation

I'm aware of that; I commented because I'm surprised by it.

I'm sure the issue was discussed so I'm not implying that they are idiots. But it does seem odd to me.


They are worried about bugs, but they address those in other ways. This is some of the most extensively integration tested software there is (I was part of the team that built the integration testing). Every commit is run through a custom CI system that successively tests pieces of the system working together, up to and including automated hardware tests. The number of lines of code dedicated to testing the vehicle is definitely much greater than the number of lines of code on the vehicle.


Looks like the goal is radiation tolerance, not n-version programming.


This is exactly how the Airbus fly-by-wire system works as well. Two software stacks running on two separate hardware configs get bundled together in a single box. Then they take three of those systems and feed them from different redundant sensors together with pilot input and have the system vote on the output. If one computer returns a different result than the other two, it’s assumed to be faulty (or fed by faulty sensors) and no longer considered for input. On top of that there are actually separate redundant computer systems that run different parts of the plane with some overlap in case one system completely fails.
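A stripped-down sketch of that 2-of-3 vote (my own illustration of the concept, not Airbus or SpaceX code):

  // Each redundant channel computes a command; a channel whose output disagrees
  // with the other two is treated as failed and excluded from future votes.
  #include <array>
  #include <cmath>
  #include <optional>

  struct Channel {
      double command;  // e.g. commanded elevator deflection
      bool   healthy;  // set false once the channel is voted out
  };

  // Returns the agreed command, or nothing if no two healthy channels agree.
  std::optional<double> vote(std::array<Channel, 3>& ch, double tolerance) {
      for (int i = 0; i < 3; ++i) {
          int j = (i + 1) % 3, k = (i + 2) % 3;
          if (ch[i].healthy && ch[j].healthy &&
              std::fabs(ch[i].command - ch[j].command) <= tolerance) {
              // i and j agree; if k disagrees with both, mark it failed.
              if (ch[k].healthy &&
                  std::fabs(ch[k].command - ch[i].command) > tolerance) {
                  ch[k].healthy = false;
              }
              return (ch[i].command + ch[j].command) / 2.0;
          }
      }
      return std::nullopt;  // no majority: hand over to a backup system
  }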


From the latest video NASA uploaded, the astronauts explain that the UI sends "commands", and if it fails, they have physical buttons below the screen to send these commands.

https://mobile.twitter.com/NASA/status/1266885097359388672


The UI at 2:22 is very similar to the one discussed in https://news.ycombinator.com/item?id=23162820 and seems to use three.js and GSAP. Also, the open-source Lato font seems to be used throughout the interface.


open-source Lato font

That'd be a really nice touch, one of my favourite fonts for effects. Released after the client who commissioned it, a chain of retail bookstores, rejected the design. Its largest deployment till now, although just dumped in a redesign, was for headings at the classifieds website OLX. Caught eyes for a while I guess.

No need then for an especially legible font for cockpit screens like Airbus required https://b612-font.com/ following decades of practices in avionics. Of course the use-case is different and I'm sure everything is adequate for SpaceX goals. Like with the several layers of spontaneously rebooting virtual machines in Tesla cars, in the end branding looks gorgeous.


I’ve read a bit about how strict the space shuttle coders were. I wonder what they think about the code running this spacecraft.

https://www.fastcompany.com/28121/they-write-right-stuff

https://history.nasa.gov/computers/Ch4-5.html

https://www.nasa.gov/mission_pages/shuttle/flyout/flyfeature...


There are reports that the procedure for fixing bugs on the STS was so heavy-weight that they often redefined non-fatal bugs as correct operation. The example I read of was an error event that dumped thousands of lines of text to the flight console screen, obscuring useful information.


Basically every change, whether software or hardware, had a lengthy acceptance process. Reviews, analysis, verification and validation, documentation, operation and maintenance procedures, etc.


COTS software is nowhere near what I'd call "spaceworthy". They either determined the UI was not life-critical or ignored that fact.


”Source: Discussion with various SpaceX engineers at GDC 2015/2016”

I wonder if they had to change any of this for the human rated mission? Would be super exciting to hear tech talks from Spacex!


Next time someone complaining about electron I'm going to tell them it's technically chromium and thus space worthy :)


Fascinating. The increased complexity in the flight control software and user interface probably makes the job of being an astronaut easier in the “happy path” - less cognitive load interpreting metrics and raw telemetry when you have the computer doing that for you.

My wonder then is how to effectively manage when things go wrong. The Apollo spacecraft were able to be essentially hot patched directly in the cockpit via the dsky. Can you do that with a fancy touch screen interface now? Perhaps up/down links to the capsule from earth are good/reliable enough to send patches from ground control, if need be? I wonder how the astronauts feel about a potential “loss” of control in case something goes wrong?


Other references say parts of Falcon 9 and Dragon use a version of Unix called VxWorks from Charles River. It is realtime (not virtual) and predates Linux. NASA uses it in a lot of its space probes because it's been debugged for 33 years.

That doesn't mean it's perfect. The two 2003 Mars rovers experienced a bug in the then-newfangled flash memory driver. It was successfully fixed and reloaded from Earth.

https://en.wikipedia.org/wiki/VxWorks


I’ve worked with VxWorks and I don’t think it can be considered Unix unless something changed in the last 10 years.


Can you give some details about the experience?

I'm curious about this system, but not curious enough to spend a week playing with it.


We used it for a “hard realtime” embedded system that didn’t even use virtual memory. It used to have a pretty archaic toolchain (diab and some clunky Eclipse setup). The OS itself seemed pretty meh too (we had a license and hence all the sources). Not sure how you would even get to try it without some trial agreement with their rep? Maybe try QNX instead? It implements POSIX so at least you’ll have an easier time moving to Linux. It's also hard realtime and actually a microkernel arch, which is nice.


And they added Python 3.8 support last year:

https://blogs.windriver.com/wind_river_blog/2019/09/vxworks-...

This damn snake can be compiled on anything.


VxWorks from Charles River

I think it is Wind River Systems.


“They use Chromium and JavaScript for the Dragon 2 flight interface.“

This is the most interesting part for me being a web dev! How likely do you think this is?


Just because they do, doesn’t mean you should.

I’m running an automotive firmware project at the moment and there are such things as software integrity levels, and each time you add something to the stack you create quite a large amount of testing and validation and process that needs to be followed.

And it’s pretty much impossible to take something existing like Chromium and certify it to what you need. The architecture is wrong. The process is wrong. There is a lack of formal requirement traceability. Many other issues.


I believe the choice of Chromium and JS for the UI simply shows that the astronauts are merely inconvenient payloads. Gagarin's first flight was already fully automated 59 years ago.


Should've paired them with a dog. As the old joke goes, the astronauts are there to feed the dog; the dog is there to bite the astronauts if either of them tries to touch any of the controls.


As we saw in a recent Space Force episode on Netflix, the ape was much better than the dog. But you need to pair them, as with Doug and Bob. Doug being the ape of course.


The UI design consisting exclusively of elegant but low contrast shades of blue certainly gives a similar vibe. But that might still be misleading; perhaps they deliberately keep the routine visualizations in a low key "style over substance" passenger entertainment mode so that they have plenty of visual headroom for when something actually important comes up?


This reminds me of the Airbus concept. They follow what's called a "lights out" design for all of those buttons on the overhead panel. When things are set and working the way they should be for normal flight, there is no indication. Only if something is non-normal does the indication or button light up.

For example, buttons that should be "on" for flight only light up when the system they control is "off", and things like emergency overrides or backup systems that should be "off" normally have buttons that light up when they're "on".

It's an interesting concept. Obvious benefit is you can glance at the overhead panel and instead of having to consider the state of, say, 50 or more buttons, you can instantly see if everything is "normal" for flight. But the downside is that it's not as intuitive to answer the question "is system X turned on (or off)"...you have to look at the button, then know what's considered "normal" for that system, on or off, and determine if the lack or presence of indicator light means on or off. That said...the pilots know the plane well enough where it's not a problem.


That's an interesting thought. If the UI is always blue-ish...and suddenly a bright red triangle appears with an Exclamation Point inside....you might be about to die and should act accordingly.


Right, I think the crew dragon is intended to fly fully automatic from launch to docking? But imagine if there is an emergency and the crew need to take over, but lo and behold, chromium crashed! Hopefully the spacecraft can still be operated without the display.


That's just the ship telemetry tab that's crashed. The tab with YouTube still works (undoubtedly playing "Space Oddity").


I think it is not helpful to call out that their architecture is "wrong" and their process is "wrong".

The attribute "wrong" is very opaque.

As somebody who doesn't have your background, I cannot follow you. I have no idea about automotive and space software requirements.

Since there are many aspects that lead to choices of a system architecture it requires more words to explain why a choice is not a good choice for the job.


I was referring to the architecture of Chromium being wrong to achieve certification for safety critical applications.

And look, I haven't said impossible, but rough estimate I'd say removing 75% of the features and 5+ years to rewrite it to be compliant. And the subset of javascript that it could run would be very limited - it might not even have loops or functions.


But mixed-criticality systems are quite normal in many deployments nowadays, especially with the advent of virtualization and hypervisors you should be able to run safety critical and non-safety critical functions at the same time on the same SoC.


Yes, but you have to prove that if one display considered to be "non critical" shows misleading data, it will not cause harm to the crew. For example, if the Chromium crashed and showed the heading or fuel level incorrectly, would they fly the rocket ship into the ocean or run out of fuel and drop out of the sky.

Yes, it's called software integrity decomposition, and it's complicated and requires much analysis.


Heresy.


That was the most surprising part to me too, but not that surprising. Lots of automotive and newer industrial UIs use Qt Quick with QML which is not all that different from HTML5 and JS with something like React.

It's used in the presentation layer, not the actual control systems.


It is very different from the audit/certification viewpoint, and Qt specifically caters to the automotive market



The Flight Software team is about 35 people. We write all the code for Falcon 9, Grasshopper, and Dragon applications; and do the core platform work, also on those vehicles; we also write simulation software; test the flight code; write the communications and analysis software, deployed in our ground stations. We also work in Mission Control to support active missions.


Imagine if everyone on earth was as productive as those 35 people.


It reminds me of this olognion piece:

https://www.theolognion.com/unreal-engine-5-is-meant-to-ridi...

That's not just hilarious, that's also pretty close to how I feel most of the time.

I mean, some people have built the Airbus A380.

On the other hand, last week, I created a test DB with the user "test" and the password "test". I couldn't connect to it, and after a long debugging session, noticed I misspelled "test".

So, yeah...


Well, that Mars orbiter missed its target because they got their units of measurement mixed up.

https://www.latimes.com/archives/la-xpm-1999-oct-01-mn-17288...

So, don’t feel too bad...


It does feel better knowing my mistake didn't cost $125 million.

Is there a book that lists many bugs and weird IT stories? Like the 500-mile email or the FileNotFound boolean.


A selection of dangerous and/or expensive bugs from space flight alone:

https://en.wikipedia.org/wiki/List_of_software_bugs#Space

I'm sure there's plenty more, albeit of lesser significance.


Thanks


I put a comma ',' instead of a dot '.' in a file name with an extension and wasted a couple of hours debugging.


I appreciate it's a joke, kicking at open doors at that. Still, here's what one dude coded in JavaScript in his spare time a few years ago: https://www.pouet.net/prod.php?which=71881


This is amazing XD


You can get a lot done when you're not re-inventing HTML dropdown form elements or replacing the Android date picker because some "stakeholder" just really, really needs them to look on-brand. And other low-value bullshit tasks done solely on the guesswork-hunch-feelings of someone whose job is to create new Jira tasks. See also: 90s development teams who made all kinds of things you'd expect a team 5x as large and a timeline twice as long for, today.


I’m imagining the insane work hours.


There is probably quite some overtime involved. That said, you don't get reliable flight software simply by putting in long work hours.


The whole Dragon is a simulation written in Python and running inside a VM running inside an OS written in JS and running inside an Electron app running in a container somewhere on Amazon AWS.


Welp, I called it two weeks ago - https://news.ycombinator.com/item?id=23146976


They should have said "Visual Basic and some Excel macros", just to see the Twitter reactions.


I’m really curious what language the telemetry software is written in. Does anyone know what the historically preferred language is?


No idea for SpaceX, but a lot of commercial avionics run code written in Ada.


Depending on the platforms and customers C and Java are also heavily used next to Ada. The history behind Ada is mostly that the military required components to be made for that language. That requirement shifted around 10 years ago to a preference.


Probably running Node Javascript (or Typescript), transmitted in JSON over a restful API, hosted on a K8s cluster using Docker containers. Also, the JSON payload has a formal schema in the form of JSonSchema V5. /s

On a more serious note. Some of the amazing stuff that the SpaceX team are accomplishing really puts into perspective the day to day work I and others here do.



I’m now curious about their memory structure, are they just using off the shelf multibit ECC memory?


You might need multibit ECC also in CPU caches. Pretty much all CPU caches have ECC, but no idea how large of an error they can fix.

Then again, probably control logic is running on something radiation hardened and GUI layer (with some redundant hardware) can be easily rebooted as necessary.


I don't think that's necessarily true. I've worked with a couple of cacheful CPUs that only used parity checks on caches, not ECC. To get safety, the cache has to be set up as write-through instead of write-back, so that it is almost always safe to throw away a line that suffers a parity error.


At least I've seen Machine Check Exceptions on x86 CPUs about L3 ECC errors on non-Xeon CPUs. I'd presume also other levels are protected.


LabView? Lol. That's the cheapest SW of this kind, usually used only in countries which cannot afford or are not allowed to use the industry-standard simulation SW. Like Iran or North Korea.

The rest looks good, similar to the architecture used in the shuttle program (3 redundant dSpace realtime Linux boxes).


For triple redundancy, is there a way to re-sync if one of them disagrees? Control systems tend to have internal states that may be disrupted even if an input has glitched for a single cycle. Will it eventually converge back to nominal? Or is that processor rejected for the duration of the mission?


If a flight string desyncs (i.e. it has state at the end of a cycle different from its two siblings) then it is automatically rebooted. Then state is automatically resynced from any other sibling after boot. There is no converging between the strings, they are either in complete sync or they aren’t.
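Conceptually (a rough illustration, not the actual implementation), the end-of-cycle check could be thought of like this:

  // At the end of each cycle every string publishes a hash of its state; a
  // string that disagrees with its two siblings reboots and copies state back
  // from one of them.
  #include <cstdint>
  #include <functional>

  struct FlightString {
      uint64_t state_hash;                      // hash of all state at end of cycle
      std::function<void()> reboot_and_resync;  // reboot, then copy state from a sibling
  };

  void check_sync(FlightString& a, FlightString& b, FlightString& c) {
      if (a.state_hash == b.state_hash && b.state_hash == c.state_hash) return;  // in sync

      // Exactly one string disagreeing with the other two gets rebooted and resynced.
      if (a.state_hash == b.state_hash)      c.reboot_and_resync();
      else if (a.state_hash == c.state_hash) b.reboot_and_resync();
      else if (b.state_hash == c.state_hash) a.reboot_and_resync();
      // If all three disagree there is no majority to resync from; that case
      // needs a different recovery path and is outside this sketch.
  }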


What about radiation hardening? Or do we just rely on ECC now? What about ECC for flops?


> The flight software is written in C/C++

I wish we could check the source code using the PVS-Studio analyzer to see what we can find out there :). It would be more interesting than these standard articles on project checks.


seems like while most of us worried about the singularity;

leftpad managed to expand from planet earth to the ISS. well played!

#universalpaperclips


It would have taken them a century if they had written the whole thing in memory-safe langs.


(2015)


Url changed from https://space.stackexchange.com/a/9446 to the main page.


Just like a PC?


Using a gigantic overly complex software written in a non-memory-safe language for a system that gives commands to the flight control system?

This is how you get the Cylons pwning your spaceships. >sigh<


We get it, you like Go. Meanwhile they are in space tho..


Isn't mission critical software overwhelmingly C++?


A lot of Ada as well.


Adama wouldn't have worried so much about networked ships if the people who programmed this ship programmed Battlestars. They still used COBOL.


They still used COBOL.

But they were experts in using it, some even say Lords


Heaven help them if the Javascript they run depends on npm components.

When I read about these multiply redundant systems I always want to know what provisions are made for restarting or shutting down a part that is producing wrong answers. Can the Power device reset the machine whose instructions lost the vote? If the two cores disagree, do they both reset?

Also, what is done for procedures that must not be interrupted? E.g., once de-orbit is initiated, if the microcontrollers get no new instructions, do they have enough to re-enter safely? Or do they need updates whenever something is supposed to happen?


I highly doubt JS is used outside the UI layer.


Javascript is what will be used to program the James Webb Telescope if/when it ever goes into orbit.


Why is this comment grayed out... This is so true.

Assuming they're dumb enough to put Chromium on a spaceship for a UI, then there's a good chance that yes, they did use bower/npm. Unless they designed it in pure HTML/JS... again this seems unlikely... given the blunt stupidity of the initial choice.



