
I had the opportunity to go down to JPL and speak with team members about this design decision. The space-hardened processors are not fast enough to do real-time sensor fusion and flight control, so they were forced to move to the faster Snapdragon. This processor will have bit flips on Mars, possibly as often as every few minutes. Their solution is to hold two copies of memory and double-check operations as much as possible, and if any difference is detected they simply reboot. Ingenuity will start to fall out of the sky, but it can go through a full reboot and come back online in a few hundred milliseconds to continue flying.

In the far future where robots are exploring distant planets, our best tech troubleshooting tool is to turn it off and turn it on again.
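
To make the idea concrete, here is a rough sketch of the "two copies, compare, reboot on mismatch" scheme described above. This is only an illustration, not JPL's code: the names `DualState`, `MismatchError`, and `run`, the integer state, and the simulated bit flip are all assumptions of mine.

```python
import copy


class MismatchError(Exception):
    """Raised when the two copies of the state disagree."""


class DualState:
    """Hypothetical holder that keeps two independent copies of the state."""

    def __init__(self, initial):
        self.a = copy.deepcopy(initial)
        self.b = copy.deepcopy(initial)

    def update(self, fn):
        # Apply the same operation to each copy separately.
        self.a = fn(self.a)
        self.b = fn(self.b)

    def read(self):
        # Any difference between the copies is treated as corruption.
        if self.a != self.b:
            raise MismatchError("copies diverged")
        return self.a


def run(steps, initial, step_fn, corrupt_at=None):
    """Run `steps` ticks and return (final_state, reboot_count).

    Assumes an integer state so that a bit flip can be simulated with XOR.
    """
    state = DualState(initial)
    reboots = 0
    for tick in range(steps):
        state.update(step_fn)
        if tick == corrupt_at:
            state.b ^= 1 << 3  # simulate a single bit flip in one copy
        try:
            state.read()
        except MismatchError:
            # "Reboot": throw everything away and restart from a known-good state.
            reboots += 1
            state = DualState(initial)
    return state.read(), reboots
```

Note that with only two copies you can detect a flip but not tell which copy is wrong, which is why the only recovery available in this sketch is a restart.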




I'm a little surprised they didn't go for three separate computers and compare them for every operation, or something like that, but I'm sure they have their reasons.
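
For reference, the three-computer approach usually comes down to majority voting over the three results. A minimal sketch, assuming the results are hashable values and using a hypothetical `vote` helper of my own (not anything from the flight software):

```python
from collections import Counter


def vote(results):
    """Return the value that a strict majority of replicas agree on.

    Raises RuntimeError if no value has a strict majority.
    """
    value, count = Counter(results).most_common(1)[0]
    if count * 2 <= len(results):
        raise RuntimeError("no majority among replicas")
    return value
```

With three replicas a single faulty result is outvoted, so `vote([7, 7, 9])` yields 7 without needing a reboot, at the cost of carrying and powering three computers.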


I've never seen an off-the-shelf processor that has hardware support for doing that kind of cross-checking on every instruction. And doing it in software would probably add so much overhead that the error-checking would be much more likely to fail than the application code.

If you're willing to relax your real-time constraints a bit, and risk a brief period of incorrect behavior before the error is caught, the problem becomes vastly easier and cheaper to solve.


>off-the-shelf processor that has hardware support for doing that kind of cross-checking on every instruction.

It is usually done with COTS CPUs, either by running the CPUs in lockstep (the simpler early generations of CPUs) or by inserting hardware checkpoints at various points, such as at branches or after a set number of instructions. A recent commercial system of this kind was the triple Itanium from Tandem/NonStop (HP).
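
A loose software analogy of the checkpoint approach, purely as a sketch: two replicas advance independently and are compared only every `interval` steps rather than after every operation. The function name `run_with_checkpoints` and its parameters are hypothetical, and real systems do this comparison in hardware.

```python
def run_with_checkpoints(step_a, step_b, initial, steps, interval):
    """Advance two replicas and compare their state every `interval` steps.

    Returns the step number at which a divergence was detected, or None
    if the replicas agreed at every checkpoint.
    """
    a = b = initial
    for i in range(1, steps + 1):
        a = step_a(a)
        b = step_b(b)
        # Compare only at checkpoints, not after every single step.
        if i % interval == 0 and a != b:
            return i
    return None
```

The trade-off is that a fault occurring between checkpoints goes unnoticed until the next comparison, so a larger `interval` means less overhead but later detection.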


There are the ARM Cortex-R series of processors which have two cores running in lockstep for fault tolerance.


Perhaps the double-memory-and-checking is only done on the control algos and not on sensor fusion/object detection etc?

Kind of "Ok if we for a brief moment believe there's an obstacle in front of us, since it'll be gone next tick, but not ok to turn off motors".


Do you have a source for those fast reboots? It's running Linux, after all.


A few hundred milliseconds seems easily doable with a custom Linux distro.


Unless your /dev/sda wants a fsck. :-)


Is Ingenuity running Linux? All of the flight controller software I've seen for autonomous drones doesn't use an operating system.


Yep, from the article:

"This the first time we’ll be flying Linux on Mars. We’re actually running on a Linux operating system. The software framework that we’re using is one that we developed at JPL for cubesats and instruments, and we open-sourced[0] it a few years ago. So, you can get the software framework that’s flying on the Mars helicopter, and use it on your own project. It’s kind of an open-source victory, because we’re flying an open-source operating system and an open-source flight software framework and flying commercial parts that you can buy off the shelf if you wanted to do this yourself someday. This is a new thing for JPL because they tend to like what’s very safe and proven, but a lot of people are very excited about it, and we’re really looking forward to doing it."

[0]: https://github.com/nasa/fprime



