Toyota's firmware: Bad design and its consequences (edn.com)
177 points by JeremyBanks on Oct 29, 2013 | 193 comments



As I mentioned in another recent thread

https://news.ycombinator.com/item?id=6615457

engineers are often not aware of basic principles of fail safe design. I mentioned Toyota, and this article confirms it.

Not mentioned in this article is the most basic fail-safe method of all - a mechanical override that can be activated by the driver. This is as simple as a button that physically removes power from the ignition system so that the engine cannot continue running.

I don't mean a button that sends a command to the computer to shut down. I mean it physically disconnects power to the ignition. Just like the big red STOP button you'll find on every table saw, drill press, etc.

Back when I worked on critical flight systems for Boeing, the pilot had the option of, via flipping circuit breakers, physically removing power from computers that had been possessed by skynet and were operating perversely.

This is well known in airframe design. As I've recommended before, people who write safety-critical software, where people will die if it malfunctions, might spend a few dollars to hire an aerospace engineer to review their design and coach their engineers on how to do fail-safe systems properly.


One interesting thing I've noticed is that subway systems, long home of one of the more iconic failsafes, the big red lever behind glass connected directly to hydraulic brakes, are moving away from that. An old-style subway (like NYC) has emergency-stop-in-place levers that cause the train to screech to a halt immediately. But automatic train systems, like the Copenhagen Metro, have a similar looking lever that is just a computer signal that signals an emergency condition. The default response (according to the fine-print under it that I recently read) is that the train will continue on to the next station, open its doors, and then hold until further instructions.

That part I assume is on purpose, because even a computerized system could have "stop immediately" as the default policy when the emergency lever is pulled. Would be interesting to read the analysis that led to the decision. My guess is that it's because on-train issues are statistically the most likely emergency situation a passenger would signal (heart attacks, fights, etc.), in which case continuing to the next station (typically 30-90 seconds) where emergency staff can meet the train and access it, rather than stopping in the middle of a subway tunnel or elevated rail segment, is the most sensible policy.


NYC subways have additional safety features - if the track signals indicate stop (due to imminent collision with another train, for example), there are tripcocks. These rise from the tracks, catch on the bottom of the train, and cause it to hard-brake immediately.


Failsafes don't always fail safely. Take the unmanned Chicago subway train that cruised through the failsafes and crashed into another train, two weeks ago. Another article quotes a CTA official as not knowing how it could have escaped the yard without the brakes tripping.

http://www.huffingtonpost.com/2013/10/04/cta-blue-line-crash...


Reminds me of http://www.amazon.com/Systemantics-Systems-Work-Especially-T... "Fail-safe systems fail by failing to fail safe." - John Gall


Yowch! I hadn't heard about that.


Those mechanical overrides can still come in handy: http://www.timeout.com/newyork/film/the-hot-seat-samuel-l-ja... [granted, incident allegedly from 1990, but still applicable]


Many of the newer NYC trains, I am told, are more complicated - an emergency lever pull (someone caught in the doors) should hard-brake the train. If it's travelled more than 1000 feet, it is assumed the train is already in the tunnel, and the lever just signals the conductor, who presumably will radio for help and tell the operator to hold at the next station. I guess this implies that there's a computer inline.
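The policy described reduces to a distance threshold. A minimal sketch of that logic, assuming the 1000-foot figure from the comment above (the function and state names are hypothetical, not an actual transit-system implementation):

```python
def emergency_lever_response(distance_traveled_ft, threshold_ft=1000):
    """Hypothetical model of the newer-train policy described above:
    near the platform, a lever pull hard-brakes the train; past the
    threshold the train is assumed to already be in the tunnel, so the
    pull only signals the conductor."""
    if distance_traveled_ft <= threshold_ft:
        return "HARD_BRAKE"
    return "SIGNAL_CONDUCTOR"
```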

The tripcocks, I hope, are still connected directly to the brakes.


"Just like the big red STOP button you'll find on every table saw, drill press, etc."

For those who like finding out, "What's the formal name for that?"... it's called a kill switch:

https://en.wikipedia.org/wiki/Kill_switch

For further reading, see the related "Dead man's switch":

https://en.wikipedia.org/wiki/Dead_man%27s_switch


You already have that, it is called the ignition key. The fact that the driver didn't think to either switch off the engine, put the gear in neutral, or slam on the brakes, makes me think he also wouldn't think to press the big red button.

Also pilots get proper training to handle their vehicles, car drivers not so much.


On my (older) Prius, when I insert my mystical key fob into the dashboard and turn on the car, I can't remove it until I've turned the power off: it's held physically in the slot. (I haven't tried yanking hard.) As far as I know, the "On" button is electronic rather than physical. The gearshift is also just sending instructions to a computer: it doesn't even stay in position after you've used it. And I honestly don't know how the brake pedal works: it somehow swaps over from magnetic regenerative braking to traditional friction brakes at some point, but I'm not sure to what degree that's electronic vs. mechanical. (Maybe the parking brake is purely mechanical: it's definitely on my "in case of emergencies, try this" list.)

The point is, most of the options you've listed there really may be computer-mediated in modern cars. (And yes, I've heard that there's a strong correlation between unintended acceleration and older drivers, and that a lot of those cases really are driver error. But I don't think you're making that case here.)


The Prius uses hydraulic brakes, which are at times assisted by the electromechanical motor.

The drivetrain spins the electromechanical motor at all times, adding drag. The drag comes not just from the added rotating mass but also from a dynamic resistance: the computer varies the motor's electrical properties to achieve either regeneration (temporarily switching the motor and its circuitry into generator mode, usually while coasting downhill or to a stop) or additional braking (electrically braking the motor, using energy stored in the batteries, to add further resistance to the drivetrain at the cost of heat generation and range reduction).

If a check engine light related to the electromechanical subsystems of your Prius comes on (indicating a fault), those systems are disabled, meaning the car is more or less non-hybrid during those times. Braking will feel stiff and the car sluggish, but it is by no means dangerous to drive (unless you consider the new learning curve for the car's performance profile to be dangerous, which it is).
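The regeneration-vs-friction trade described above is essentially a torque-splitting policy. A minimal sketch under simplified assumptions (a real controller also accounts for speed, temperature, and battery state of charge; all names are hypothetical):

```python
def blended_brake_torque(request_nm, regen_limit_nm, battery_full):
    """Split a braking request between regenerative and friction braking.

    Hypothetical policy: use regenerative braking up to its limit (zero
    if the battery cannot accept more charge), and make up the remainder
    with conventional friction brakes.
    """
    regen = 0.0 if battery_full else min(request_nm, regen_limit_nm)
    friction = request_nm - regen
    return regen, friction
```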

Also: your emergency brake is indeed fully mechanical, but on newer models it may be released electromechanically via a command; I'm unsure. I haven't worked on one since the second generation.

P.S. You forgot a subgroup: your steering rack is also electromechanical, one of the first of its kind in production. Meaning, if you ever experienced a total blackout, your steering would also become much more resistant. This, however, isn't considered a safety hazard, because at speed the steering rack does little to assist the driver; forward momentum takes care of that. The steering assistance is mostly there for parking-lot situations.

(Source: I was at one time a Toyota technician, and my back still remembers the recall on first-generation Prius battery packs; they weighed 124 lbs and were way awkward to remove.)


As far as I know, the "On" button is electronic rather than physical. The gearshift is also just sending instructions to a computer: it doesn't even stay in position after you've used it. And I honestly don't know how the brake pedal works: it somehow swaps over from magnetic regenerative braking to traditional friction brakes at some point, but I'm not sure to what degree that's electronic vs. mechanical.

All of these functions are electronic on the Prius (and indeed on every hybrid car that I know of that's on the roads). The balance between regenerative and regular friction braking in particular requires quite a bit of computer code and calibration to get right.


From the article, emphasis theirs:

Vehicle tests confirmed that one particular dead task would result in loss of throttle control, and that the driver might have to fully remove their foot from the brake during an unintended acceleration event before being able to end the unwanted acceleration.

Every one of those approaches you suggested are, in many modern cars, fully software driven. And the article even shows an example of how a bug in the software can only be resolved through the exact opposite of what a rational person would do in a crisis.

I think the only actual mechanical failsafe left is the handbrake. Please tell me that's still sacred...


Handbrakes are almost always mechanical cables, but they're almost certainly not enough to stop a car under high engine output. They're mechanical and not power-boosted (see the comments on how much force is needed on the brake pedal without power assistance), and besides, most people have at some point driven around for a few miles before realizing that beeping sound was the parking brake stuck on the whole time.


Even if you are able to, fully engaging the handbrake in a car at highway speeds, while the drivetrain is in runaway, wouldn't be helpful.

source: I have tested this at ~30-40 mph and nothing about that experience leads me to believe that it would be safer if I had been going faster, and at full throttle.


Oh, of course - I've left the "emergency 'smell funny' lever" on before (thanks Mitch Hedberg).

So you're probably right, it's a stretch to call the handbrake something useful in emergencies when in reality it probably wouldn't perform that function.


It's called a parking brake now. Only useful when the car isn't moving :)


It might be sensible, though, to couple a killswitch to the handbrake, so that engaging it switches the engine off, or cuts off all electronic control, or something.

I don't drive cars, so i have no idea if that would conflict with normal uses of the handbrake. Perhaps it could have a position beyond the normal brake-engaging position that did this? So that if someone panics and yanks on it as hard as they can, they get the result they probably want.


Handbrakes usually brake only the rear wheels. If you turn the handbrake fully on while driving at higher speeds, you may lose directional control.


Very well said. The focus on convenience and cool features over safety makes me really sad, and want to force automotive engineers to watch some Alan Kay talks. He loves to talk how people who don't know the history and basics of their craft will arrive at inferior solutions, for example in this one: http://www.youtube.com/watch?v=FvmTSpJU-Xc


An electric hand brake is very common in modern cars. Luckily, most of the features you listed are usually implemented in separate ECUs. Neutral probably goes through the gearbox ECU, the handbrake through the brake ECU, etc.


>Electric hand brake is very common in modern cars.

Yuck!

>Luckily most the features you listed are usually implemented in separate ECUs.

What concerns me would be how the systems handle unexpected inputs.

In the article it notes that the only way to end one unexpected acceleration event was to stop using the brakes. I'm not sure if the vehicle in question has separate controllers, but if it doesn't that's a real concern that unexpected input from one tickles a bug in another.


The parking brake is electronically controlled in a lot of new cars now.


Ah, didn't know that. Figures!


The previous sentence in the article seems to indicate that a "dead task" in this context will only happen if a specific bit in the controller's RAM is corrupted.


No, you don't. Many modern cars use contactless/electronic ignition keys and start buttons connected to software. Braking is a software function (i.e., ABS), and most Americans drive automatic (software) transmissions.


>>Many modern cars use contactless/electronic ignition keys and start buttons connected to software.

Fair enough, but as far as I know not a valid point in this specific case.

>>Braking is a software function

Yes, but a completely separated system. The odds of both software systems failing simultaneously are getting into hash-collision territory...

>>most Americans drive automatic (software) transmissions.

Again a separate system, and you still have a neutral position.


>Yes, but a completely separated system.

Are you sure? It seems sensible to me that accelerator inputs would factor heavily into the braking system, so it seems very sensible to me that an unexpected condition in one could translate over to the other - as the article noted the only way to undo one unexpected acceleration condition was to completely remove your foot from the brake pedal. Sounds like cross-over to me...

>you still have a neutral position.

... which is likely just a software input to the transmission computer.

In all likelihood the only non-electronic failsafe is the handbrake, which I still think is a direct mechanical connection in almost all cars.


> In all likelihood the only non-electronic failsafe is the handbrake

Not on my Nissan Leaf. The brake lever is a switch that turns what I assume (based on the noise) is a small motor to engage the rear brakes. As far as I can tell, everything is electronically controlled. Brakes, accelerator, "ignition", "transmission" (both in quotes because the Leaf really has neither), parking brake. If there's a firmware failure, there's not a mechanically-operated fail safe to save me.


> Yes, but a completely separated system. The odds of both software systems failing simultaneously is getting in the hash collisions domain...

It isn't, really, in a lot of newer cars. You have a brake override system [1] that can reduce the power output of the engine by various means.

[1]: http://auto.howstuffworks.com/car-driving-safety/safety-regu...


Separate systems, FOR NOW. How long do you think that will last? And would you care to bet your life on it?


> The fact that the driver didn't think to either switch off the engine, put the gear in neutral, or slam on the brakes

The gearshift in the Prius is totally electronic, and it does not allow you to switch into neutral if you're traveling above a certain speed.


> it does not allow you to switch into neutral if you're traveling above a certain speed.

I don't think this is correct (at least for 1997 - 2009 models). Could you offer a citation?


I was a passenger in a 2005 Prius going at ~65 mph when the driver tried it. It didn't work. Needless to say, you can switch into neutral when stopped, so I gather there is some speed threshold above which it doesn't let you switch into neutral.

Edit: this made me curious, so I did some cursory research and found this:

http://answers.yahoo.com/question/index?qid=20100204083409AA...

According to one commenter, to shift into neutral when driving you can do one of the following:

1. Press the park button

2. Shift into reverse

3. Hold the shifter in the neutral position for 3 seconds

A video posted by a different commenter shows the driver holding the shifter in the neutral position for not quite 3 seconds, but still longer than is required to shift the car into other gears (and if my memory is serving me right, longer than is required to shift into neutral when stopped).

In any case, the most obvious way to shift the car into neutral did not work for us, and it's unlikely a panicked driver would think to try any of the methods listed above.


The cost of the level of training airline pilots get (around 100k USD) would be prohibitive for individual drivers. Also, they fly with an equally well-trained colleague who will run the checklists in an emergency and correct their mistakes. I assume private pilots make mistakes as stupid as individual drivers do.

Aircraft investigation is really good at overcoming hindsight bias and looking at human factors in a more objective way. What seems absolutely logical for you to type, having read about these incidents before, might not be as obvious to a driver who hasn't read about unwanted acceleration but is suddenly experiencing it.


My car doesn't have an ignition key, it has a button. If you press the button while in motion, I'm not sure what happens, but I'd bet it would ignore it.


I have a 2010 Prius, and according to the manual, this is precisely what happens if you push the power button while moving.


2013 Prius here. So, I tried it on the way home. A quick push of the button is ignored, but if I hold it down the car shuts off. It takes power steering with it though, so I don't recommend it under uncontrolled circumstances. I had to come to a stop to restart as well.

Of course, if the computer is busted, I'm assuming the long button press will be sent to /dev/null.


> It takes power steering with it though,

That's what sucks about cars with power steering: when the assist fails, they don't get any of the benefit of a car that was designed without power steering from the start.


I had my old Caravan chew through a serpentine belt once...

Getting that boat of a minivan around the next corner in traffic was entertaining, dangerous, and probably the best upper-body workout I got that year.


A number of cars these days are removing the ignition key entirely. Just look up cars with "push button start". Most of them no longer have keys; instead, the keyfob that opens the doors also acts as a signal that it's OK to start the car. It's one big reason I don't want to buy a car that doesn't require a key. I know keys are easy to duplicate, but I don't believe the claims that the fobs are hard to duplicate.


Others already said it. Only to take the heat off Toyota a bit, two relevant questions from Renault drivers:

http://answers.yahoo.com/question/index?qid=20091121085512AA...

http://www.cliosport.net/forum/showthread.php?623742-06-Clio...!


Modern ignition keys are electronic - they send a signal to the computer. This is NOT adequate.


Many modern cars have keyless ignition.

Yes, better driver training could have made some of these faults less serious. Either by fully braking properly, switching into neutral, or other techniques. That doesn't excuse the faults though.


My wife's old Camry once got its starter motor stuck on – she could remove her key and it kept running until it smoked itself out. Needless to say we don't own that car any more.

Software that ensures safety like this really ought to be mandated to be open-source.


WalterBright wrote: "engineers are often not aware of basic principles of fail safe design."

I would suggest they should not be called "engineers" then. And in many countries, they're not. Part of the problem is that the tech community includes a lot of different people. Some are programmers, some are program managers, some went to engineering school, some are licensed engineers (in some other discipline). In the US, these are all commonly called engineers. Sadly, I think a lot of web programmers just don't know the true scope of the software industry and its practices.

If you want to design/build a bridge, you need a state license and insurance. The software industry isn't regulated like that. Anyone can design the software that controls a car. That's probably OK since web apps are non-critical systems. But I can't help but wonder if net security wouldn't be better if more programmers had better training in recognizing and improving the total impact of a system.

It is only the reputation of the company and potential damages in a lawsuit such as this one that put pressure on the car manufacturer and web-app startup to test their code in depth. Actually, I do wonder how much the US auto safety regulations are involved with firmware--or do they just test the macro behavior of the car?


> test their code in depth.

Failsafe design flaws are not uncovered by testing code.

Failsafe systems are designed not by "the code works therefore it is safe", but by "assume the code FAILS". Regardless of how much testing is done, you still ASSUME IT FAILS AND ACTS PERVERSELY. Then what?

(Note that acting perversely is hardly farfetched in these days of ubiquitous hacking.)
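"Assume it fails" leads to designs where an independent monitor, not the controlling task itself, decides whether the output can be trusted. A minimal sketch of that watchdog pattern (names and the zero-output safe state are hypothetical, not any particular vehicle's implementation):

```python
import time

class IndependentMonitor:
    """Sketch of an external monitor that assumes the controlled task can fail.

    The task must 'kick' the monitor periodically; if the kicks stop
    (hung task, corrupted state), the monitor forces the actuator to a
    safe state rather than trusting the task's last commanded output.
    """

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_kick = time.monotonic()

    def kick(self):
        # Called by the controlling task each loop iteration.
        self.last_kick = time.monotonic()

    def safe_output(self, commanded_output):
        # Fail safe: cut the output to zero, don't hold the last value.
        if time.monotonic() - self.last_kick > self.timeout_s:
            return 0.0
        return commanded_output
```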


I have a quick question for you that's a matter of personal curiosity and one I think you might be delighted in answering: What sort of failsafes are there in a fly-by-wire system? Is it a matter of redundancy or another mechanism that ensures pilot inputs yield expected outputs?

I've really been enjoying the posts you shared relating to your time in aerospace. I think there are a lot of lessons the entire software industry should learn from...


All I know in detail is the 757 system, which uses triply-redundant hydraulic systems. Any computer control of the flight control systems (such as the autopilot) can be quickly locked out by the pilot who then reverts to manual control.

The computer control systems were dual, meaning two independent computer boards. The boards were designed independently, had different CPU architectures on board, were programmed in different languages, were developed by different teams, the algorithms used were different, and a third group would check that there was no inadvertent similarity.

An electronic comparator compared the results of the boards, and if they differed, automatically locked out both and alerted the pilot. And oh yea, there were dual comparators, and either one could lock them out.
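The comparator logic itself can be sketched in a few lines (this is an illustration of the pattern, not the 757's actual implementation; names and the tolerance parameter are hypothetical):

```python
def comparator_vote(channel_a, channel_b, tolerance):
    """Dual-channel comparator in the spirit described above: the two
    channels compute the same result independently; if they disagree
    beyond tolerance, lock both out and alert the pilot, who reverts
    to manual control."""
    if abs(channel_a - channel_b) > tolerance:
        return None, "LOCKOUT: channels disagree, alert pilot"
    return (channel_a + channel_b) / 2.0, "OK"
```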

This was pretty much standard practice at the time.

Note the complete lack of "we can write software that won't fail!" nonsense. This attitude permeates everything in airframe design, which is why air travel is so incredibly safe despite its inherent danger.


This is such a cool comment. Thanks for writing it.


The shuttles had similar concepts - various flaps had multiple redundant hydraulic pumps to control them, so that even if one went nuts and started going in reverse, the other pumps would overpower it, and the result would simply be slower response times.


Gosh, this is an incredible comment. I see in greater detail what is meant by your illustration of "dual path." I had no idea the systems-level design was so thoroughly isolated.

Thank you very much for taking the time to share and answer my question!


I'd be surprised if there was a single question on a state licensing exam on failsafe design.

The sample tests I looked at had none. The GRE exams I took had none. The engineering courses I took never mentioned it. I don't recall ever seeing an engineering textbook discussing it. I've never seen it brought up in engineering forums or discussions about engineering disasters.

And I see little evidence of awareness of it outside of aerospace - Toyota, Fukushima, and Deepwater Horizon being standout examples of such lack. You can throw in New Orleans, where hospitals (and everyone else but one building) put their emergency generators in the basement. And a NYC phone company substation was entirely destroyed because a vital oil pump was in a basement that flooded during Sandy.


I looked into licensing at one point when it seemed like my career would be heading in a different direction. As I understand it, my state defines "engineering" as anything that could affect public safety, and says that all engineers have to be licensed, but there are only exams in certain subject areas.

I think the guiding principle of engineering regulation would lead one to believe that software controls for a car should be covered by licensing, but that this has not occurred in practice due to the regulations not keeping up with technological change.


> Anyone can design the software that controls a car. That's probably OK since web apps are non-critical systems.

Wat?


A couple articles I wrote on the topic:

"Safe Systems from Unreliable Parts"

http://www.drdobbs.com/architecture-and-design/safe-systems-...

"Designing Safe Software Systems"

http://www.drdobbs.com/architecture-and-design/designing-saf...


Everything I learned as an engineering undergrad had an underlying safety mindset. It was made pretty clear that redundancy and safety are foremost in design procedure. If most engineers these days aren't receiving that in their education then we need to drastically shift the education paradigm. As the people designing all these important systems, they absolutely must be focused on safety.


I guess you weren't working on the 787, which is exclusively fly-by-wire -- turn off the computers and it crashes.


I worked on the 757, and got it hammered into me how to do failsafe design.

I don't know the failsafe design of the 787, but I have faith that Boeing, the FAA, and the aerospace engineers know what they're doing with failsafe design.


WalterBright I have a great deal of appreciation and respect for nearly everything you've said in this thread up to the point where you declare to have "faith" in Boeing and the FAA. Engineering critical systems and faith of any kind don't belong together.


I'll get on a 787 and fly on it because I have faith in Boeing etc.

But you're right in that if I was actually working on the 787, I would have no such faith, and would verify the designs I was responsible for.


I often point out in aviation-related threads the striking difference between aircraft engineering in the past and today.

In 1989, a DC-10 operating United flight 232 suffered an uncontained engine failure which damaged the tail and disabled flight controls, resulting in 111 deaths (it could have been more, but by a freak of chance a DC-10 flight instructor was on board and was able to assist the crew in landing, which may have made the difference for the other 185 people on board). In 2010, an Airbus A380 operating Qantas flight 32 suffered an uncontained engine failure which damaged a wing, disabled a hydraulic system and braking systems, and disabled some flight controls while starting a fire. It resulted in... zero deaths.

In the early 1990s, Boeing 737-200 and 737-300 aircraft had a variety of uncommanded rudder movement issues, resulting in at least 157 deaths. In 2012-2013, the 787's battery fires resulted in... zero deaths.

In other words, the main difference between "then" and "now" is exactly the opposite of the usual arguments against modern aircraft development: more recent aircraft, when they have serious issues, result in fewer injuries and deaths than older aircraft when they experienced serious issues.

This track record of improvement gives justifiable faith that modern aircraft development is safer.


I don't think anyone should be especially reassured that no airframes have been lost due to battery fires. If any of the 787 battery fires that occurred on the ground had occurred in flight it isn't clear the aircraft would have survived.


If any of the 787 battery fires that occurred on the ground had occurred in flight it isn't clear the aircraft would have survived.

There was an in-flight fire, on an ANA 787. It made an emergency landing and the plane was evacuated. No lives were lost, and the airframe was not lost.


Even in the 787, the spoilers and horizontal stabilizers can be operated electrically independently of hydraulics and flight computers: http://en.wikipedia.org/wiki/Fly-by-wire.


Are you sure there is no backup, either a simpler computer or something else?


I was always taught that the mechanical override was shifting it out of drive and into neutral.


This is true for most vehicles, but on some cars (like the Prius) the shifter is not mechanical.

Yes, it's unnerving.


The list is only growing longer . . .

If you have computer-assisted hill starts, collision avoidance, or computer assisted braking [through ABS, certain traction & stability control systems, etc] your computer has control of your brakes.

If you have range-assisted cruise control, or early collision alert, your computer likely has total control over your throttle.

If you have parking assist, lane departure warning, lane following assist, or electric power steering etc: your computer has control over your steering.

If your car has a DCT: your computer has control over your shifting _as well as both clutches._ -- Meaning you have _no mechanical interface_ to disengage the motor from the transmission. This is similar to your Prius example: the shifter is not mechanical.

On many new cars: there's no ignition key to remove. You likely have a smart "keyfob" that simply needs to be within X-feet of the car, and then you have a push button ignition.

I'm sure there's some override for the button [push and hold for three seconds], but it's still going to go through some electronics to figure that out.

---

The only bit that scares me is that all these systems potentially share the CAN bus with the horrendous "Infotainment" systems that every manufacturer loves to install. shudders


Manufacturers have started using LIN for slow-control accessories like power windows, door locks, wipers, etc., with a separate CAN bus for critical functions of the auto and another system (sometimes also CAN) for non-critical things like GPS integration and entertainment. And apparently MOST has finally taken off after much teeth-gnashing: MOST is a higher-bandwidth system than CAN, designed for 'infotainment' systems in autos. https://en.wikipedia.org/wiki/MOST_Bus
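Segregating buses only helps if the gateway between them is restrictive. A minimal sketch of an allowlist-style gateway policy (hypothetical bus names and frame IDs, not any real manufacturer's scheme):

```python
def gateway_allows(source_bus, frame_id, critical_allowlist):
    """Sketch of a bus-gateway policy: frames originating on the
    infotainment network reach the critical CAN only if their ID is
    explicitly allowlisted, so a compromised head unit cannot inject
    arbitrary powertrain commands."""
    if source_bus == "infotainment":
        return frame_id in critical_allowlist
    # Traffic originating on the critical bus passes through.
    return True
```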


Thanks for sharing!

I have to admit though I'm surprised that some of those accessories even require a bus as they're typically operated by simple switches.

I guess door locks, wipers, etc. make some sense: with the prevalence of central locking, as well as wipers that sense rain and things like that.

Windows are a bit odd, as they just need a very simple switch. I suppose having all the accessories on a common bus must simplify the wiring harness though.

---

I'm curious how TESLA integrates their in-dash system with the rest of the car.

I thought their in-dash console could control some rather safety-critical "preferences" of the car ... for instance I thought you could adjust the level of regenerative braking.

(Other manufacturers are also offering "sport modes", etc -- though they're not always controllable through the dash.)


One example of where a CAN/LIN-bus-connected power window is good is my old Jetta. You could insert the key in the driver's door lock and turn it backwards to make the car roll down all of the windows and open the sunroof. As someone else mentioned, you could likely implement features like this via hard wiring, but then you get massive wiring harnesses.


>I have to admit though I'm surprised that some of those accessories even require a bus as they're typically operated by simple switches.

They certainly don't require a bus in principle. However, if you want to understand both the business case and some very good engineering reasoning behind the use of data buses to control simple automobile accessories you should have a look at the wiring harness for an early 90's Mercedes or similar luxury car. Demand for fancy features drove the number of wires running to and fro to an unmanageable level. Having a mile of wires in a car is expensive for many reasons I'm sure you can imagine, as well as each connection point becoming an opportunity for something to go wrong.

I'm also curious how electric cars in general will manage safety systems going forward, especially in light of this Toyota court decision.


That's why you buy a car with a manual transmission. Those haven't been electronic-ized yet and are unlikely to be as the forces involved in depressing the clutch are substantial and unlikely to go away.

Once the clutch plates are no longer in contact the engine can do whatever it wants but none of that makes it to the wheels. And then you can get the car out of gear to coast, turn the whole car off and then back on.

There's no substitute for having a human being in the loop to execute high-level "executive" functions, especially when things aren't done to the very high standards of aerospace. You know, like cars.


> That's why you buy a car with a clutch pedal.

There exist "manual transmissions" that lack an operator-controlled clutch. DCTs are very much manual transmissions, but many of them rely on a TCM to disengage the clutches.


I do agree that having a hydraulic (or cable)-clutch car eliminates the ECU from a critical path (power to wheels), but there are plenty of electronic-ized manuals.

There are all sorts of advanced DSG / dual-clutched transmissions in many newer cars, but even some older cars like the MR2 and Smart cars have a http://en.wikipedia.org/wiki/Electrohydraulic_manual_transmi... : quite literally a manual transmission where the clutch has been actuated electronically.


I agree. With a Camry, shifting into N would have immediately remedied the problem. I remember hearing reports claiming: "Shifting to N didn't do anything".

The explanation always seemed obvious to me: Non-technical driver experiences UA. In a panic shifts into neutral. Hears engine scream. Thinks, "The car is still going!" and doesn't realize that despite the sound, the car is actually slowing down.

In my experience this is a common misconception: people mistake high revs for "going fast".


Oh wow, good point, I hadn't thought of that. "High revs" = "motor going fast" to me, not "wheels going fast."

But I had not considered that this correlation is only intuitive to me because I own a car with a manual transmission.


I'd imagine the sudden loss of power should be apparent as well.

But then, people can react irrationally (or not at all) in a full on panic.


Car firmware scares me at times. One day I started up my Kia Soul and the transmission decided it wasn't going to shift out of first gear. A reboot later and everything was working fine, and it hasn't happened since.

Still though, sort of horrifying. (Hopefully I hit some sort of safety or fallback mode that can only occur on boot!)

It is with a mixture of fear and amusement that I observe the workings of my car's firmware. (I can hear the interrupts fire in my stereo system when I change the volume through the steering wheel control and the music skips, some buffer wasn't full! I almost have a 100% repro worked out. :) )

Edit:

Having worked in Firmware for some time now, I can confirm that the skill level of many embedded developers is not what I'd call stupendous. Now of course most people are, on average, average, but embedded seems like a special case.

The thing is, embedded systems have exploded in complexity in the last few years. No longer are software projects worked on end to end by just a handful of engineers, rather embedded engineering teams are being forced to learn the lessons about properly scaling up software engineering that developers in other areas learned long ago. A project with 16KB of space for code could be written by 2 or 3 developers sitting next to each other, and it was reasonable to keep the entire program state in one's head.

Nowadays? You can get Cortex M4 boards that look a damn lot like an actual computer. Sure you don't have much RAM, but the code complexity is way up there. You aren't just talking over an I2C bus to a couple of peripherals anymore!

On top of this, more and more features are being shoved into cars through the use of software. I talked to a developer of wiring harnesses for one of the major auto manufacturers, he described to me exactly how the auto companies see software as "the easy part" of things, which means they get the short shaft in terms of resources (test, dev, time, budget, etc), but are expected to bear most of the load of new feature work. (After all, it is so cheap to do it in software!)

FWIW, this developer said he has gone back to purchasing older model cars, he won't buy any of the cars running his own team's firmware.


embedded seems like a special case

I know :-(

I have worked in firmware for almost my entire software career and I agree completely. The main problem is that most embedded software programmers started out as EE's (like me) and never learned how to architect or build complex software systems. The ones who took the time to learn CS & SE produce noticeably better code and with their knowledge of electronics can do amazing things, but the rest... well, I still remember trying to explain to an EE developer during a code review why "#define ZERO 0" was still a Magic Number!


Bringing back horrible memories. I ended up with an EE degree, but I was two or three core classes away from a CompE. The software literacy has been nice throughout my career[1], but I was never able to swing into any of the firmware development groups at my previous employer. Closest I got was some systems engineering that covered custom FPGAs and off-the-shelf signal processing hardware. Now I will probably never make it past the gatekeepers for any kind of embedded job.

What the embedded industry needs is more systems engineering. Not taking an EE and saying, "Poof, you're a systems engineer now!", but actual, interdisciplinary engineers who understand systems development concepts, requirements analysis, safety analysis, and everything else that goes along with it. It's primarily an aerospace thing right now, but it really should expand to the broader field.

[1] In fact, at the moment I am a pure software engineer.


I had a Mercedes SLK 350 about ~5 years ago (leased). I was at a stop light in the Chicago loop, and while holding the brake, the engine revved up 2-3K RPMs with no accelerator pedal input. After taking it in for work, I was told it was a software issue and had been resolved.

I don't lease/purchase Mercedes vehicles anymore.


Hello, I signed up specifically to argue with you (be proud).

I think that may have been an overzealous algorithm giving the car a blip of throttle to stop it from stalling when it is idling.

An engine still needs fuel and air to run when even when there is no load on it and this is controlled by a computer. This computer targets some rpm for idling. If the rpm drops substantially low, due to for example a 'mechanical' glitch like dodgy fuel, then the computer may over compensate with too much throttle, causing the revving you experienced.

:)
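To make the anti-stall idea concrete, here's a toy sketch of what such an idle controller might look like. The target rpm, gain, and clamp are invented for illustration, not anything from an actual Mercedes ECU; the point is just that a proportional correction to a sudden rpm sag can overshoot and produce a visible rev blip.

```c
/* Hypothetical anti-stall idle controller: targets ~700 rpm and
 * opens the throttle proportionally when revs sag below target.
 * All constants are illustrative. */
#define IDLE_TARGET_RPM 700
#define THROTTLE_GAIN   2    /* throttle counts per rpm of error */
#define THROTTLE_MAX    100

static int idle_throttle_cmd(int measured_rpm)
{
    int error = IDLE_TARGET_RPM - measured_rpm;
    if (error <= 0)
        return 0;  /* at or above target: no correction needed */

    int cmd = error * THROTTLE_GAIN;
    return (cmd > THROTTLE_MAX) ? THROTTLE_MAX : cmd;
}
```

A dip to 500 rpm (dodgy fuel, AC compressor kicking in, etc.) saturates the command at full authority, which is exactly the kind of overcompensation that could rev the engine to 2-3k before settling back down.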


I'll double on this. I had an SLK230 ('03) and noticed that when at a stop light it would sometimes rev up to about 2k rpms and come back down. I started watching for it and noticed that it happened right after the rpm needle dropped into the 500 range so I think it was an anti-stall feature in the car.


At least have your software look and see:

a) Brake pedal is fully pressed

b) Vehicle speed is at zero

If these conditions are met, and you're attempting to prevent a stall, you may want to consider pushing the transmission into neutral to prevent "unintended acceleration". Transmission is drive-by-wire, so could be done by issuing a command over CANBUS.
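Something like this, say, where every name is invented (these are obviously not real CAN signal names, just a sketch of the interlock logic described above):

```c
#include <stdbool.h>

#define IDLE_RPM_MAX 1200u  /* illustrative threshold, well above normal idle */

/* Return true if the TCM should be commanded into neutral: the driver
 * is standing on the brake, the car is stationary, the accelerator is
 * untouched, and yet the engine is revving well above idle. */
static bool should_force_neutral(bool brake_fully_pressed,
                                 unsigned vehicle_speed_kph,
                                 unsigned engine_rpm,
                                 unsigned accel_pedal_pct)
{
    return brake_fully_pressed
        && vehicle_speed_kph == 0
        && accel_pedal_pct == 0
        && engine_rpm > IDLE_RPM_MAX;
}
```

The check itself is trivial; the hard part (as the rest of this thread shows) is trusting that the sensor inputs and the software around it haven't themselves been corrupted.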


According to the comments on this thread, that scenario can apply to ANY manufacturer. These problems are inherent to the technology being used, not the users of the technology.


The software has become dramatically larger, dramatically more complicated, and dramatically more important. At the same time, the process around developing embedded projects has not even slightly kept up.

Most of the crappy projects I've worked on weren't crappy because the developers didn't know what they were doing. They were crappy because they were old code bases that were recycled over and over and over again through small hardware iterations, and no team working on a single iteration can ever get permission from management to do the necessary maintenance (refactor, redesign, whatever) -- they are just required to get their little port finished fast. Everybody makes the smartest hacks they can on top of the crap they were given, and each step of the way the system gets worse.

Most companies don't start from scratch -- which is almost impossible anyway, given the low quality of documentation from chip manufacturers -- but start from whatever example code base or framework the manufacturer provides. These are invariably bad, and were developed like I mentioned above (hacks on hacks with each hardware iteration)... and then the hacks for your specific project start.

Then the actual development cycle is pretty much dictated to be the 'waterfall' method since it's tough to be 'agile' with hardware. Proper debugging tools, at the level you would get with desktop software development, are either unavailable or cost more than the company is willing to invest. Proper code analysis tools are the same. Continuous deployment and automated testing are practically impossible unless you have tens to hundreds of millions of dollars to spend on infrastructure.

And there's never enough time for any of it.

And there's never enough documentation for any of it.

At this stage, embedded has reached the complexity of enterprise software running on desktops/servers, but without the process and tools needed to make that actually work.


A few years ago my automatic Honda Accord had an issue where instead of kicking into 4th, it would start jumping between 3rd and 4th rapidly, about once a second. I took it back to the dealer and they claimed that the transmission control chip had "fried" and fixed it for about $15 before insurance.

I have no idea what actually went wrong, it's been working swell for 6 years after that now. My next car is going to be manual though, I no longer fully trust that sort of thing.


> transmission control chip had fried

> $15 before insurance

Was this outside of warranty? Because I've paid _hundreds_ for transmission control modules of all sorts: Dodge, FORD, Honda.

The "chips" they would be referencing are likely surface-mount not socket-mount on anything much newer than 1995. (I think my 1993 EEC-IV has a socket mount for the main processor but I haven't had it apart in ages.)

There is no way a brand new TCM is $15 before insurance; and I just don't see what "chip" they would be replacing that costs $15.

Gotta love Honda transmissions though; they've had their fair share of rather interesting transmission reliability issues. My personal favorite comes from the Acura TL.[0]

[0]: http://en.wikipedia.org/wiki/Acura_TL#2000

"...as the third clutch pack wore, particles blocked off oil passages and prevented the transmission from shifting or holding gears normally. The transmission would slip, fail to shift, or suddenly downshift and make the car come to a screeching halt, even at freeway speeds..."


> Was this outside of warranty?

I honestly don't know, the car was a 2002, this was in 2007. I bought it used a year before (from by father). IIRC it had 115k miles on it when I bought it. I've been able to get another 60k on it since then.

I've never quite bought the "chip fried" line. My suspicion at the time was that this was a frequent issue that they were trying to keep somewhat under wraps (hence the near-free repair), but now I am wondering if there was a firmware issue that they fixed by reflashing something.

That or they couldn't find the issue and it resolved itself when they reset something...

I wish I had pressed for more details, but I was mostly just happy the transmission hadn't killed itself.


>he won't buy any of the cars running his own team's firmware.

These cars run in the millions in production numbers and are driven everyday, all day, by various types of drivers in various conditions. Where's the big firmware mess? Ignoring the limited case of this Prius issue, which seems to have a lot to do with floormats being stuck to the accelerator, I'm just not seeing it.

I think there's a problem of visibility here. If you've worked in, say, fast food, you might not want to ever eat it. Software is the same thing. You get to see how the sausage is made. That doesn't mean that the sausage is unsafe. Or that older methods, you didn't witness or were part of, were better.

You drive the rear wheel drive V8 that weighs 2x my car's weight and I'll drive mine with front drive, ABS, traction control, and incredible MPG. There are a lot of reasons to own a classic car, but safety and efficiency aren't those reasons.


I think the biggest problem with software is that it can be perfectly safe and work OK in 99.9999% of all cases, and still contain deadly bugs that strike in very specific and difficult to reproduce circumstances. That the software works very well most of the time only adds to the belief that there must be another explanation for what happened (I think even Woz could reproduce the runaway accelerator bug in his Prius?)

That a piece of software is widespread and works well most of the time simply can't be used to say whether the software is "safe". To say anything about that, you have to look at the process used to develop the software, and see whether that process has a good success rate of predicting and uncovering software bugs.

If Toyota's code is as bad as this article makes it out to be and the certification standards are so poor, I won't be surprised if we see more cases like this in the future, although they will still be rare.


> You drive the rear wheel drive V8 that weighs 2x my car's weight

Actually, a lot of newer cars weigh more, not the other way around. I believe that this is mostly due to additional safety devices. For example a 1967 Mustang weighs 2500-3000 lbs[1], while a 2014 Mustang is 3500-3700 lbs[2]. A 2014 Toyota Prius is 3000 lbs[3], while a 2014 Corolla weights 2800 lbs[4]. A 1980 Corolla weighed 2000 lbs[5]. As you see, that rear wheel drive V8 weighs the same as the Prius. I think the improvements in MPG mostly come from improvements in engine efficiencies, not weight savings. I don't know what kind of car you drive, but I'd be surprised if it only weighs 1500 lbs.

That being said, I mostly agree with your other points.

[1] http://auto.howstuffworks.com/1967-1968-ford-mustang-specifi...

[2] http://www.motortrend.com/cars/2014/ford/mustang/specificati...

[3] http://www.toyota.com/prius/features.html#!/weights_capaciti...

[4] http://www.toyota.com/corolla/features.html#!/weights_capaci...

[5] http://www.ultimatespecs.com/car-specs/Toyota/5260/Toyota-Co...


Well, yes, cars are heavier thanks to life saving crumple zones and other safety engineering. I'd rather crash in a prius than a classic mustang.

That said, its also a little unfair to compare a 4 door sedan to a 2 door sports car. Modern cars are still heavier, but the difference is a bit more sane.

I think my dad's Cadillac when I was growing up was over 5,000 lbs. The 70s and 80s certainly had heavy cars, but they were death-traps with super shitty mpg.


My car (Kia Soul) weighs 2700lb and is considered light. Indeed the newer models have been steadily increasing in weight as more and more features are added. Of course part of this is the consumer's fault, more features means more parts which means more weight! The 2012 version of the Soul is currently up to nearly 3000lb!


What's this do for your willingness to use a driverless car?

My outsider's impression is similar to your insider's: the complexity of external systems is catching up with embedded systems, but perhaps not the practices and talent.


To be fair, there are bugs in humans too. Chances are (especially over time) the bugs in a computer will be worked out to the point that they are rarer than the ones in humans.


Yes - the continuous improvement cycle is much faster on machines!


At least there would be no wondering who (human vs. computer) was at fault if it was driverless.


I would trust car firmware written (entirely) by Google. They're a software company, after all.


Google has never written software before that causes people to die if it fails. It's a very different thing from other kinds of software that you just reboot if it is acting funny.


That is exactly why I would not trust firmware written primarily or entirely by Google. For all the talk of clueless EEs writing code in this thread, I have horror stories of clueless SEs messing with hardware.


Umm, you can A/B test a new logo or colour scheme. You can't A/B test a collision avoidance algorithm. What makes Google a successful ad platform doesn't translate to making them a good safety-critical engineering company.


I imagine you could, it just would be expensive, and would not involve the customers at large.


They're also a SW firm that seems to know how to manage complexity. There's a lot of software firms that I still wouldn't trust. :-)


> embedded systems have exploded in complexity in the last few years

I feel like something has to give soon. I don't think we can sustain the exploding complexity of software systems too much longer. The cognitive load will be too much to bear to develop an "average" software system soon, unless we come up with some really fundamentally different concepts to manage complexity.

Maybe "systems" are the problem. Systems are almost by definition infinitely complex. Maybe there's a better model that reduces the "system" effect? Could it be a functional approach?


I just want to say: computer controlled cars have saved more lives than any other parts in a car. How different would the automotive world be without ABS and similar devices?


ABS is a great example of a simple embedded system. Initial implementations were relatively self contained.

Personally I'd be spun out in a ditch about 5 times a day without traction control. I tried turning traction control off in my car once, and it turns out the way I drive is basically 100% dependent upon traction control functioning!


Can Michael Barr be trusted to give an unbiased review? His business is, after all, embedded software training and coding standards. So magnifying the negative consequences of bad coding practices would play into his business strategy...

(I am not saying his verdict is biased, I am just stating he has cause to find Toyota guilty).

Furthermore, was Toyota's code really of such poor quality relative to the rest of the industry, and relative to the economic realities of the market? I mean it's all good and well to demand aerospace and medical device quality code, but would the average consumer be willing to pay $1000 per LOC for the speed control in his car? I very much doubt it.


Michael Barr didn't find Toyota guilty, a court in Oklahoma did. Michael Barr provided testimony to that court.

If you're implying that Michael Barr somehow stretched the truth of what he found, and did so under a legal oath, then I think you're implying that he perjured himself, which is a pretty serious allegation.


Expressing an opinion via the use of concrete but in-this-case-not-actually-demonstrated examples from the code is not normally considered perjury. He believed what he said was true, and had good reasons for that belief.

That doesn't mean it's the truth though. I too read through that looking for something more damning than I found:

The memory wasn't ECC and the code didn't do anything to mitigate that risk, but that doesn't prove that a memory fault occurred or even tell us what the likelihood is. Toyota screwed up the stack depth analysis. But AFAICT no stack overflow condition was found. Apparently some other stuff was found (probably with a static analysis tool) on which the article doesn't elaborate.

These are bugs, and certainly don't make me feel good about the system. But they're not a specific finding of fault either.


No, I am not saying at all that he lied, but the case can be argued various ways. What I am saying is that the court expected Michael Barr to give expert testimony on whether the firmware was of "good enough" quality, given the constraints of the market and the risks.

It's easy to overreact after a tragic incident (and I have full sympathy with the families), and to demand more stringent certification, but it comes at a cost as well: the price of the software goes up.


Or that he has a conflict of interest which can subtly, and subconsciously, affect one's decisions and conclusions.


Now let's take a moment to consider how one would go about charging an expert-in-a-field with perjury.


Yet, Barr discovered things that the NASA report didn't.

NASA: "The NESC team examined the software code (more than 280,000 lines) for paths that might initiate such a UA (unintended acceleration), but none were identified."


Notice that $1000/LOC for 11,000 LOC (the figure from the article, IIRC) spread over 1,000,000 units sold amounts to little more than $11 per car, so economies of scale do make it perfectly feasible, I guess.


Yes, but it is not just the $ / LOC, it is also the time to market which influences the profitability of a product.

But your point is taken, it might well be feasible to demand aerospace quality code from the automakers.


It is important to note here that many automakers allegedly demand extremely high quality code from themselves. The MISRA C standard [1] is widely used as a guide for high quality embedded code in many industries and was developed by the Motor Industry Software Reliability Association.

[1] http://en.wikipedia.org/wiki/MISRA_C


Improving quality is not the solution. The issue is lack of a proper failsafe system - in this case, a STOP button that physically removes power from the ignition system.

No amount of quality can guarantee the computer cannot fail.


Wargames.


It's typically been my experience that (within limits) higher quality equals faster production. Can you imagine the debugging time they must burn with 11,000 global variables?


Considering the risks of flying vs. driving, I would argue it makes more sense to demand aerospace quality from cars than it does from airplanes.


Thanks for pointing out the potential bias. I did find myself thinking some of the points were a little exaggerated.

Everyone has come across code which no-one in the company wants to touch because it's too complex. However, that same code may have been in production for decades, proving fit for purpose by example.

That said, I'm not sure I'm entirely against the idea of companies being sued for failing to write sufficient quality code. Could do wonders for the industry!


> However, that same code may have been in production for decades, proving fit for purpose by example.

Until it isn't anymore because one of the critical but undocumented (and possibly even unintended) assumptions no longer applies. See also: Ariane 5. I can easily see that happening in this case - all they'd have to do is shrink the available stack space to free up RAM elsewhere and suddenly critical variables would intermittently get corrupted, possibly only under circumstances that didn't happen in testing.
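Without memory protection, one cheap way to catch that failure mode is a "canary" guard region between the stack and critical data, checked periodically. This is a generic technique, not anything from Toyota's actual code; in a real system the guard would be placed by the linker script, here it's just a plain array for illustration:

```c
#include <stdint.h>

#define CANARY_WORDS 16
#define CANARY_VALUE 0xDEADBEEFu

/* In a real build the linker places this region just past the end of
 * the stack; a static array stands in for it in this sketch. */
static volatile uint32_t stack_guard[CANARY_WORDS];

static void stack_guard_init(void)
{
    for (unsigned i = 0; i < CANARY_WORDS; i++)
        stack_guard[i] = CANARY_VALUE;
}

/* Call periodically (e.g. from a watchdog task).
 * Nonzero means the stack has grown into the guard region. */
static int stack_guard_breached(void)
{
    for (unsigned i = 0; i < CANARY_WORDS; i++)
        if (stack_guard[i] != CANARY_VALUE)
            return 1;
    return 0;
}
```

It only detects the overflow after the fact, of course, but "detect and enter a limp-home mode" beats silently corrupting whatever variables happen to sit next to the stack.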


Witnesses are often paid by the party on behalf of who they are testifying.


Slightly sensationalist. The story is driven by a plaintiff's lawyer and a consultant suing a large BigCo with deep pockets. So nothing new there. The tear-down into the ECU though is slightly interesting.

Unintentional RTOS task shutdown was heavily investigated as a potential source of the UA. As single bits in memory control each task, corruption due to HW or SW faults will suspend needed tasks or start unwanted ones. Vehicle tests confirmed that one particular dead task would result in loss of throttle control, and that the driver might have to fully remove their foot from the brake during an unintended acceleration event before being able to end the unwanted acceleration.
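The single-bit problem has a standard mitigation that (per the testimony) Toyota didn't apply: store every critical flag alongside its bitwise complement, so a single corrupted word is detectable. A hedged sketch, with invented names, of how that looks:

```c
#include <stdint.h>
#include <stdbool.h>

/* Each critical flag is stored twice: the value and its complement.
 * A flipped bit in either copy breaks the invariant value == ~value_inv. */
typedef struct {
    uint32_t value;
    uint32_t value_inv;  /* must always equal ~value */
} crit_flag_t;

static void crit_set(crit_flag_t *f, uint32_t v)
{
    f->value = v;
    f->value_inv = ~v;
}

/* Returns false on detected corruption; the caller should then enter
 * a fail-safe (e.g. limp-home) mode rather than trust the value. */
static bool crit_get(const crit_flag_t *f, uint32_t *out)
{
    if (f->value != ~f->value_inv)
        return false;
    *out = f->value;
    return true;
}
```

It doesn't prevent the corruption, but it turns "a dead task silently kills throttle control" into "the system notices and can fail safe", which is the whole point.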


At the risk of a flame war, I'm going to state the obvious. firmware code often has quality issues because it is often written by Electrical Engineers.

EE's often have little software engineering taught as mandatory subjects in college, and often only get to learn good SW engineering practices on the job, or via milspec/aerospace certification, if ever.

No disrespect is intended, it's just my experience after working with a lot of otherwise very talented EEs.


I've written production firmware (computer engineering background). In my experience EE's don't usually write production firmware, but are responsible for writing non-production verification loops and code.

Usually a computer engineer or software person works with the EE who developed the board level hardware + ASIC engineers.

Of course, there exist EEs who are talented at writing software, but I wonder how often EEs are tasked with writing production code.


I was an embedded contractor for 8 years. It was a huge majority of EEs writing the production code at every company I worked with. I probably worked closely with about 300 engineers in that time, and fewer than 20 of them had Computer Science backgrounds. This was especially true in aerospace/defense, where I very well might have been the only CS guy in the (large) building.

And, no, most weren't "EEs who are talented at writing software" :)


I am an EE, and I wrote several thousand lines of production code for an airborne radar system. It wasn't even firmware; it was signal processing application code running in RTLinux on commercial server hardware. Our firmware was all written by EEs, though I ended up having to help the EE tasked with the C code quite a bit.


That sounds really awesome! But as a military system, this project likely had a lot of mandated quality processes.


"Usually a computer engineer or software person works with the EE" This does not always happen. This really depends on Conway's Law:

http://en.wikipedia.org/wiki/Conway's_law

For example, if the firmware is controlling a device that does not typically include higher-level computer systems, such as an ECU originally designed for a non-hybrid vehicle, there will likely be no CE/CS people involved.


Has it been ruled out that it was just a case of the wrong pedal being pressed? That is often the simplest explanation for cases like this. The driver swears they were hitting the brake pedal, but that car just kept accelerating! Not surprisingly, these cases often (but not exclusively) involve elderly drivers, new drivers, or car rentals.


Since most of the victims of uncontrolled acceleration in Toyota cars were elderly, I think we can conclude that this was the case in the majority of incidents. Computers don't discriminate by age on their own.


We'll never know, unless the pedal was being pressed hard enough to bend it out of shape. Maybe it is an additional reason to make the system fail safe - assume people will press the wrong pedal and then sue.


The weird thing is we should be able to know! I recall at one point one of the incidents was investigated, and they said that under hard braking at high speeds, particularly sustained hard braking, certain chemical changes happen in the surface of the brake pad due to the temps. They said they found no evidence of those chemical signs.


> certain chemical changes happen in the surface of the brake pad due to the temps

In laymen's terms, we call that "burning" :-)


Brake pads don't usually burn in cars.


This would only be expected in cars where the brake pad is necessarily engaged when you press the brake pedal.


I thought cars were still generally like that. The ABS can step in and take over, but without pro-active behavior from the ABS you have a direct hydraulic connection. I didn't think ABS was being called upon as involved here.


Brakes are computer-controlled in many cars these days, such as the Prius. The computer chooses between engaging magnetic brakes or the hydraulic system or routing force directly from the pedal to the brakes.


Seems like for software-driven cars, a small 15-minute diagnostics log would be a good idea. A black-box without audio recording, to avoid privacy implications.
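The usual implementation of that is a fixed-size ring buffer in persistent memory, with the oldest samples overwritten first. A minimal sketch, with a made-up record layout (real EDRs log far more channels):

```c
#include <stdint.h>

#define LOG_CAPACITY 900  /* e.g. 15 min at one sample per second */

typedef struct {
    uint32_t timestamp_ms;
    uint16_t vehicle_speed;
    uint16_t engine_rpm;
    uint8_t  throttle_pct;
    uint8_t  brake_pct;
} log_rec_t;

static log_rec_t log_buf[LOG_CAPACITY];
static unsigned  log_head;   /* next slot to overwrite */
static unsigned  log_count;  /* number of valid records */

/* Append one sample, silently dropping the oldest once full. */
static void log_push(log_rec_t rec)
{
    log_buf[log_head] = rec;
    log_head = (log_head + 1) % LOG_CAPACITY;
    if (log_count < LOG_CAPACITY)
        log_count++;
}
```

The interesting engineering questions are elsewhere: the buffer has to live in memory that survives the crash, and the logging path has to be simple enough that the same firmware bug can't take it out too.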


I just want to add that not even formal methods absolutely guarantee absence of errors. Firstly, there might be errors in the proof itself. Secondly, you always make assumptions about the CPU, memory and other hardware components. These might fail due to a variety of reasons: hardware bugs, electricity disturbances, heat, radiation, physical damage and more.


You're right that formal methods do not guarantee absence of 'errors' in the sense of unforseen unwanted behavior.

But formal methods, which typically are based on computer-checked proofs, can help you to eliminate certain possibilities. They are severely underestimated and underused because of our dependence on C.

The formal methods do not 'fail' as such. They just fail to prove anything beyond the proven property. Such properties can very well (and often do) include failure (of whatever kind) of parts in the system.

Apart from AI, which as an approach to embedded systems is almost by definition the opposite, I only see one way forward from the mess we're in now: formal methods.


That's why aerospace electronics have temporal and physical redundancy as well as voting algos. These generally rule out the radiation, physical damage, and RF issues.


The issue wasn't that the error happened. The issue was Toyota did not do proper due diligence to prevent errors.


"For the bulk of this research, EDN consulted Michael Barr, CTO and co-founder of Barr Group, an embedded systems consulting firm, last week. As a primary expert witness for the plaintiffs, the in-depth analysis conducted by Barr and his colleagues illuminates a shameful example of software design and development"

Well, okay, so the article kindly submitted here takes the position of personal injury attorneys who just won a trial before an Oklahoma jury. And perhaps that is the correct factual position about what Toyota did and what Toyota should have done instead. (Disclosure: I am a lawyer, so I had law school courses that trained me to think on both sides of issues that go to litigation.)

When "unintended acceleration" cases were first mentioned in the news media, including one case that occurred here in Minnesota, I was very wary of buying Toyota cars, and bought other brands instead. But as our previous cars wore out, we bought Toyotas, and Toyota vehicles are what we drive for all our driving now.

I notice that both cars we bought have very clear warnings near the floor mats about attaching those securely and not using any floor mat that isn't attached securely. Toyota, from this point of view, seems to be acknowledging that floor mats used to get jammed up against accelerator pedals in a way that made cars hard to control. We have not had any problems with our vehicles. The news stories about unintended acceleration in Toyota cars seem to have diminished. Perhaps whatever was bad about the former designs has been fixed.


That's a dangerous line of reasoning in this case.

I could be convinced that this court ruling is erroneous, and that the unintended acceleration issues can be entirely accounted for by floor mats and driver error. But in this case I think we should be thankful for bad floor mats and driver error, as they've brought to light very fundamental flaws in Toyota's firmware engineering processes.

If there had never been a single unintended acceleration in a toyota vehicle it would not have been through robust engineering but instead through luck. And we need our vehicles to be safe by design, not through happenstance.


If you could solve the halting problem, you could make software that doesn't misbehave.

I'm not saying Toyota should be allowed to provide this shoddy piece of software in a critical subsystem, but I very much think a) other vendors software will be just as crappy and b) this feels like the court longing for reasons to fault Toyota on something that was still very likely user error, not software misbehaving.


It's not necessary to solve the halting problem to have the kind of protections aircraft have, let alone things like not drastically miscounting the stack space, having memory protection against stack overflow, using ECC RAM, not having 11,000 global variables, etc. Even if it was user error, this isn't even close to a sane design.


If there had never been a single unintended acceleration in a Toyota vehicle, it would not have been through robust engineering but through luck. And we need our vehicles to be safe by design, not through happenstance.

I especially appreciate your comment because my childhood best friend (an electrical engineer who designs safety-critical systems) thinks this way. He started out in avionics, and was the lead designer for the avionics system for a commercial airliner that so far has a very good safety record indeed, and then he moved over to the medical device industry. In his work, "zero defects" is the only standard, and fundamental understanding of how a system works, from the level of subatomic physics on up, is his approach to design with no hidden flaws. That approach is not easy, but he thinks that is the appropriate approach when human lives are at stake.


This problem in general is yet another symptom of the immaturity of the software industry as a whole. Most standards are community standards rather than well accepted and extremely well known official standards. And a lot of best practices still come down to judgment. Moreover, best practices and standards vary greatly depending on the nature of the product. Even within a small sub-field like embedded systems the requirements are very different for a car, airliner, or 3D printer.

It's telling that even a big company like Toyota, which is fairly risk-averse and well known for its commitment to quality and safety, can still churn out pretty crappy software in a hugely important role. Software development is still a pretty hard problem overall; we live in an era where there have been lots of successes, but failure is still common and its consequences can sometimes be severe.


What EDN reports aren't really - by and large - design issues. They are more process issues: lack of certification (a big deal for embedded systems); lack of substantial code review (MISRA violations); lack of HW design testing (tin whiskers); inadequate analysis of memory usage.

Process issues can be solved, given motivated management and diligent engineers. Whether the effort to solve them was done and how that effort goes becomes a cultural thing, and I can't speak to the Toyota SW engineering culture(s?).


This is the same as saying, "Your web site has a slew of bugs in the JavaScript for signing up for the newsletter, and the CSS is terrible, and that's why it doesn't validate credit cards properly." They never point out the actual code that causes the acceleration problem - they just malign the code in general. Looks like a lot of lawyering by people who want to squeeze more money out of Toyota.

Besides, why would bugs cause more problems when the driver is elderly (http://www.forbes.com/2010/03/26/toyota-acceleration-elderly...)?


I think you missed this bit:

Toyota claimed the 2005 Camry's main CPU had error detecting and correcting (EDAC) RAM. It didn't.

Unintentional RTOS task shutdown was heavily investigated as a potential source of the UA. As single bits in memory control each task, corruption due to HW or SW faults will suspend needed tasks or start unwanted ones. Vehicle tests confirmed that one particular dead task would result in loss of throttle control, and that the driver might have to fully remove their foot from the brake during an unintended acceleration event before being able to end the unwanted acceleration.

In other words, they used non-error-correcting memory, and the investigation found a code path that would lead to the observed behaviour if a single bit flips.


>> ...driver might have to fully remove their foot from the brake during an unintended acceleration event before being able to end the unwanted acceleration. <<

"Might" is a weasel word you use when you don't have proof. Are they saying that the the system is non-deterministic? Seriously, you can say it "might" cause the problem, or you can run tests that cause the problem. Even if it is non-deterministic, you could run 1 gazillion test and get a percentage. Then you could figure out, based on the amount those system get used, how often you'd expect acceleration to be uncontrolled.

And it still doesn't explain why it mostly happens to the elderly.


Some of the issues identified in this review could have led directly to unintended behavior (e.g, lack of ECC memory, and coding practices that could lead to stack overruns).


Yeah I agree. This is a remarkably uninformative article by a breathtakingly biased source presented uncritically and with absolutely no context.


So... what's the actual bug? "Cyclomatic complexity" this, "single point of failure" that - it all sounds very damning, but if you don't see the bug it's rather less convincing. I mean, I can look at any code and say it's too sloppy to work.


You're right, this article doesn't go into much detail. However the expert witness' testimony does, and it's not pretty: https://www.dropbox.com/s/wnzqidngrtj8y2l/Bookout_v_Toyota_B...

He found massive failures in all of the safety systems, and successfully demonstrated that a single bit flip could cause the task responsible for controlling the air/fuel mixture to stop running, preventing the driver from decelerating the car. The safety mechanisms in the car would entirely fail to catch this, and at this point Toyota wasn't using error-correcting RAM, so it's not entirely implausible.

He found many possible buffer overflows, stack overflows, race conditions, and unsafe casts that could lead to memory corruption or logic errors. He went on at length about bigger-picture design flaws in the way their failsafes were implemented, often rendering them useless. They explicitly ignored error codes from the operating system indicating that things were going wrong, as well as warnings from their own code that the CPU was overburdened and necessary tasks may not have been completed.

He testified that Toyota had no real bug-tracking system, no consistent code review, and countless violations of both its own safe coding standards and other standards it had contributed to.

The corresponding Reddit discussion at http://www.reddit.com/r/programming/comments/1pgyaa/ may also be of interest.


None of this is finding the bug. RAM bits almost never flip, and "a single bit" - "the" bit he pointed to - flipping on multiple occasions is a virtual impossibility. As to all those other things: yes, bad stuff, but did he see how, at the small-picture level, these faults could cause the problem? The whole thing is extremely unclear, and blaming all those different things smells of not having actually understood the problem.


Previous report on the issue by NASA (PDF, 177 Pages) http://www.nhtsa.gov/staticfiles/nvs/pdf/NASA-UA_report.pdf


11,000 global variables

absence of any bug-tracking system

wow. and this is toyota.

[edit: http://en.wikipedia.org/wiki/The_Toyota_Way http://en.wikipedia.org/wiki/Toyota_Production_System ]


Could it be that your understanding of Toyota quality is based on the millions and millions of dollars of marketing money spent each year? And then it turns out they are just like every other large global company.


Based on your reasoning, why don't all other automakers (all have large marketing budgets, GM spends even more than Toyota) have similar reps for quality?

Toyota's rep has been well earned, there's plenty of literature in operations about this, as well as popular trade coverage (i.e., http://www.nytimes.com/2013/10/29/automobiles/japanese-autos...)

Their software process wasn't given the same attention, and they've been burned by it. Embedded systems are really tricky, and a good process (design through end-of-life) is key.


It's not just marketing. I was in the manufacturing program of a large public university in Michigan, where you would expect all the staff to be heavily biased in favor of Detroit automakers, and the constant refrain was "Toyota, Toyota, lean manufacturing, six sigma, Toyota."


That firmware written in C for a for-profit company is shoddy should surprise no one. Rather, non-shoddy software would be the real surprise. There is a tremendous lack of understanding of the 'rest of the software world' in the embedded world, compounded by people who never studied, wanted, or trained for developing software.

This is a perfect example of software development done without management maturity, process maturity, and with inferior technical tools. That large scale systems for life-critical services are written in ASM/C is horrifying. That management did not enforce certification compliance is horrifying. That the correctness process did not account for tin whiskers or ECC memory is horrifying. That the engineers violated MISRA (which evidently they attempted to adhere to) is less horrifying, but still bad.


Followup: I found a transcript via Slashdot about this - https://www.dropbox.com/s/wnzqidngrtj8y2l/Bookout_v_Toyota_B...

Let's just say that this is an incredibly readable discussion on how to do safety critical software wrong in many, many, many ways. Everything from using binary blobs to using gratuitous amounts of globals (> 10,000 ?!?!) to not having an issue tracker(!!?!?!).

At any rate, this document contraindicates buying a 2005 Camry.


This happened to me too (Toyota minivan).

I was in stop-and-go traffic and went from accelerator to brake, and the vehicle started revving its engine, trying to accelerate (the gears were grinding, trying to shift to compensate for the slammed brake). The situation ended when I lifted my foot off the brake and pumped it.

I took it to the Toyota service department and they said it was due to me pressing both brake and accelerator at the same time. I was skeptical of my own user error, so I later tried to recreate the issue (by pressing brake + accelerator together in various combinations) but was unable to repro it.

Now that I see this article, it makes me realize it's probably the firmware.


PS: this caused me to rear-end a taxi, only going at about 2 km/hr, so no damage. I'm in Thailand, so I paid the driver about $20 and off he went.


Here's the testimony from the trial. They're starting to post it on EE Times. First part is about Task X. http://www.eetimes.com/document.asp?doc_id=1319936&


So, a question.

Why does the industry use something like C? Even with the MISRA guidelines, it still seems like the wrong tool for the job when you have things out there like Erlang, which are designed for fault-tolerance and concurrency from the get go. I can imagine the requirement for hard real-time operation might be an issue (not sure enough about Erlang specifically to say if it does or doesn't address or hinder this, so I'll leave that to someone else).

What's that industry moving towards, software wise, to address this? Is Ada an alternative?

C (even MISRA) just seems like a horribly risky thing to use in this scenario.


Primarily size. When you have 256k or less to work with, you don't have much choice.

Specific to Erlang: http://www.erlang.org/faq/implementations.html (see 8.9)


Ah, thanks for the link.

It's easy to forget how small the RAM budgets still are in some parts of the embedded world.


Sometimes I question the value of going digital for systems like this. Does the benefit gained over an analog equivalent justify the risks associated with using a black box that cannot be qualitatively evaluated or examined by the layperson?

My friend had a 50s Ford when I was in high school. The throttle control was a lever and rod attached to a couple of springs.


Purely mechanical does not mean failure free. I've had the throttle linkage stick such that the spring was insufficient to drive the throttle back when I removed my foot from the gas. Fortunately, the ignition key truly disabled the ignition (hard-wired). The thing you had to be REALLY careful of was to not twist the key back so far as to engage the anti-theft steering lock mechanism ('70s and newer?).


I've had mechanical throttles get stuck, too. But it wasn't an issue as I would hook my foot under the pedal and pull it up, or just turn the key off. Or push in the clutch. Or shift to neutral. Or step hard on the brake (the brake is stronger than the engine). My car also has a separate switch for the fuel pump, which can be turned off.

The point is there being multiple independent ways to stop the car. Running everything through the same central computer makes for a completely unsafe system.


I've had a similar "total clutch failure" before.

1993 Mustangs have a plastic ratcheting mechanism that picks up slack on their [very heavy] clutch cables.

The teeth on the plastic ratchets eventually wear out (duh....)

If you're FORD, you design it so that this piece fails catastrophically, releasing all tension on the clutch cable. (Instead of, say, hitting a bump-stop that keeps some minimum amount of tension on the clutch cable.)

No tension on the clutch cable = no clutch. No clutch on the OEM T5 w/ a stock-spec clutch = good luck with the next downshift.

(For anyone interested: firewall adjuster + aluminum clutch quadrant = sweet deal.)

---

However I'd still argue the purely mechanical systems are _objectively better_ in that they lend themselves to preventative maintenance.

Had I thought to look: I would've seen the _very_ worn teeth on my clutch quadrant. I knew the clutch cable was old, rusty, and starting to bind in the sleeve, which probably led to the wear of my clutch quadrant. As for a throttle cable: not only can I feel it binding in the pedal, but it's fairly easy to visually inspect a throttle cable or spring for faults, rust, etc.

In addition mechanical fixes are all radically simpler. If my throttle is sticking, I lubricate the cable and replace the return spring. If your computer controlled throttle is sticking: you hope it's (A) an actual bug and not intended behavior, (B) a bug that the mfr. is aware about, (C) a bug that a patch is available for, (D) that the ECU flash can be applied free or inexpensively.

Say you're an enterprising embedded electronics engineer and you wanted to fix it yourself? If you try to modify an automotive computer: you've just tampered with an emissions control device. Your car is no longer street legal in the United States.

Purely mechanical systems by their very nature are perfectly transparent. Proprietary computer software is almost always a "black box." -- Due to federal regulations though: you have absolutely no legal way to repair or replace it on a street-driven vehicle. You are stuck with software you cannot see, understand, or control.


I didn't mean to imply that. I've driven the same drive by wire car for several years, and I'm still around to tell the tale :)

But consider that a good mechanic could figure out the failure in the analog world, and this Toyota issue has required deaths and 9-figure lawsuits to get close to an answer.


The throttle control on an early 90s, or early 2000 FORD is also a lever and rod [or braided steel cable] attached to a couple springs...

Unless you had cruise control [remember when that was an _option_?]... then there's a servo or equivalent motor in-line with the cable and springs.

---

Even with a digital system: there's no benefit. It's the same mechanical piece, it has to be, that's how combustion engines work: you open a throttle plate, the vacuum sucks in more air, the computer compensates w/ more fuel, compress that, spark it, you have more power.

You will always have a throttle plate and some sort of lever; and since you want it to fail closed you best have a return spring somewhere.

All you gain by adding a digital control system is [perhaps] clever software control and [certainly] additional complexity.

Is it worth it? Perhaps... there are some _very_ cool systems developed as a result of drive-by-wire. Lane following assist, parking assist, assisted hill starts, manual transmissions with automated clutches, collision avoidance, etc.[0]

I would definitely argue for more _transparency_ in this technology, but I don't know that I could argue that the technology itself is a bad thing.

[0]: http://www.youtube.com/watch?v=ridS396W2BY "Volvo Trucks - Emergency braking at its best!"


Digital control allows for digital override. In the not-so-far-off future, automatic driving systems are going to radically reduce the number of car accidents, and for every 1 accident that occurs because of a firmware bug, 10 will be avoided because computers will be that much safer than human drivers (who should never, ever be driving in the first place - they kill about 40,000 people a year in the United States and maim hundreds of thousands more).


More people not ordering independent certification of life-critical systems they're purchasing, then blaming everyone but themselves when it kills somebody.

Get your manufacturer to pay a respectable firm to certify their software rather than just their mechanical hardware.


This kind of thing makes me never want a car without a keyed ignition and to a lesser extent (and for other reasons), a clutch pedal. Heavy machinery should have a hardware-based killswitch. I'm kind of amazed this isn't regulated.


Having a clutch also prevents accidental acceleration due to hitting the wrong pedal, which is much more likely to occur than a firmware glitch.


When Toyota started doing recalls on floor mats to get rid of unintended acceleration, I assumed they had no idea what the real problem was. I always suspected something like this.


If this is a case of bad design in cars, we have come a long way. It was not long ago that the Ford Pinto was catching fire due to a known-bad physical design.


Perhaps an open hardware car will emerge at some point as a reaction to over-complicating vehicles.


Open hardware cars already exist: https://en.wikipedia.org/wiki/Rally_Fighter


Nice! Would be great to see an open design become popular.


Is there some kind of Lagrangian programming paradigm where systems have many paths toward lower levels of energy at any point in time?


And people call me paranoid because I refuse to drive cars without a physical kill switch.



