A Case Study of Toyota Unintended Acceleration and Software Safety (betterembsw.blogspot.com)
95 points by mzehrer on Jan 17, 2015 | 86 comments



This is an absolutely fascinating slide set. Thanks submitter very much for the link. I have written embedded C before, and the following facts just blow my mind:

1. The Throttle Angle function in the Toyota code had a McCabe Cyclomatic Complexity of 146 (over 50 is considered untestable according to slides) [slide 38]

2. The main throttle function was 1300 lines long, and had no directed tests. [slide 38]

3. I find the static analysis results quite alarming. [slide 37]

4. 80+% of variables were declared as global. [slide 40]

I find this to be a stunning lapse of quality, especially for a safety-critical system.
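
For anyone who hasn't run into the metric: for a single function, McCabe complexity is roughly the number of decision points plus one, which is also the number of test cases basis-path testing asks for. A toy sketch in C, with a made-up function that is not from the Toyota code:

```c
#include <stdint.h>

/* Hypothetical example, not taken from the Toyota code base.
 * Three decision points (three ifs) -> cyclomatic complexity 3 + 1 = 4. */
uint8_t clamp_throttle_pct(int16_t requested_pct, int brake_pressed)
{
    if (requested_pct < 0) {        /* decision 1: negative request */
        return 0;
    }
    if (requested_pct > 100) {      /* decision 2: request above full scale */
        requested_pct = 100;
    }
    if (brake_pressed) {            /* decision 3: brake wins over pedal */
        return 0;
    }
    return (uint8_t)requested_pct;
}
```

Four test cases cover the independent paths here (negative request, over-range request, brake pressed, normal request). By the same count, a function scoring 146 would need at least 146.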


Maybe they had a "hardware guy" with little software training or experience write the code?

I've done a lot of both hardware and software, and I've seen a lot more bad software done by hardware engineers than I've seen bad hardware done by software engineers. The software guys usually know that they're out of their bailiwick when it comes to hardware design.

E.g. one of the worst in my experience was a 30,000 line shell script, few if any functions, used as part of our production flow. A simple refactoring could have cut it down at least 90%. Even worse, it was totally unsupported because the guy who wrote it was reassigned.


I think this is probably the essence of it. Hardware people see software as just hardware that can be easily reconfigured. That was a reasonable model back in the 8048 days, when one only had 1kB of ROM and 256B of RAM to work in. But these programs are obviously far larger, with orders of magnitude more data paths and astronomically more possible states. Different techniques are required. Someone from a hardware background isn't going to understand that.


"Hardware people see software as just hardware that can be easily reconfigured"

Not really, not if you have a realistic view of software. When I was an EE student, about half of my classmates loved software, the other half didn't like it and weren't good at it. And in Silicon Valley, a lot of EEs ended up doing software at all levels. I've always been jealous of how creative software could be, and how it didn't have the same limitations as hardware. Lots of hardware is programmed at low levels by EEs, which essentially ends up being drivers and other board support software. They also realize that their training and education in software is not at the same level as that of software engineers.

BTW, it used to be that if you couldn't cut it in engineering, EE/ME/ChemE, and you wanted to pursue a technical degree, most people would go into computer science. I realize it's not like that today, but that's how it was back in the day.


That's interesting. I graduated in 2005 from an engineering college and none of the engineers went into Comp Sci if they couldn't cut it in EE/ME/AE/ChemE. They went into "Engineering and Management" instead, which was basically a business degree for engineers.


The only engineering management program at our school is a graduate degree for engineers with undergrad engineering degrees already. One of the attractions of Comp Sci was the availability of jobs, which was why they tried to complete a traditional engineering program in the first place. It was kind of hilarious to see a frustrated student complaining about one of the simpler classes, but then explain that he was enduring the pain because of a job at the end of the tunnel.


Static analysis does not show you the complete picture. I have worked on countless horrifically complex, buggy and unmanageable J2EE applications that had perfect cyclomatic complexity et al. scores. I have also written perfectly reviewed, manageable and well tested pieces of code that didn't.

I very much question the experience of anyone who can refer to a codebase as "spaghetti code" without seeing it and relying solely on static analysis. Software is not that simple or transparent.


I'm going to guess that the applications you refer to are "simple" at the level of individual functions, but that's only because the complexity has been spread out so much that it becomes difficult to understand the whole. That's exactly why I think cyclomatic complexity and related metrics are of little benefit, and may even be harmful - "refactoring" to reduce "point complexity" can result in increasing the complexity of the whole.

I've always wondered what the cyclomatic complexity of TeX is. While Knuth is a bit of an "edge case", it would be fun to see what static analysers think of his code... complete with copious use of global variables, goto, and very, very long functions. Give someone who has never had any experience with TeX the results and ask them what they think the defect rate would be, then show them the fact that it's one of the most bug-free pieces of software ever written.


> it's one of the most bug-free pieces of software ever written.

Not doubting you, but do you have a source to demonstrate that claim? If I make the claim to someone else, it'd be easier to provide evidence than for me to handwave.

TeX version is 3.14159265, so some of those are probably bugfixes.

EDIT: Um. Look, I rarely complain about downvotes, but what's up with the downvoting on HN lately? Is it me, or what? This is a simple request for more information about something I don't know about. It's not an easy thing to Google. It's up to the parent to provide evidence.

https://www.google.com/search?q=tex+bug+free shows a lot of evidence that TeX is absent of bugs, but that's not the question. The question is the total bugs that have been fixed since it was first written relative to every other major software project. That's not so easy to answer. https://www.google.com/search?q=low+total+bug+count brings up nothing relevant. In fact, it could turn out to be entirely false that TeX had a low total bug count over its history relative to its size, especially during its very early days. We don't know, because no one has provided evidence one way or another.

All of this is exceedingly obvious, and it's getting tedious to type out huge edits like this whenever something straightforward is downvoted.

I'm seriously tempted to create my own community at this point out of desperation, one that focuses on technical merit and being nice rather than posturing. I wonder if one already exists? I've heard some pretty good things about newsgroups, but haven't really looked into any.


From http://en.wikipedia.org/wiki/TeX#Development

Knuth has kept a very detailed log of all the bugs he has corrected and changes he has made in the program since 1982; as of 2008, the list contains 427 entries, not including the version modification that should be done after his death as the final change in TeX.

The file is called "tex82.bug": http://mirrors.rit.edu/CTAN/systems/knuth/dist/errata/tex82....

Wikipedia is (slightly) out of date since there's 428 currently in the above file, but note that not all of these are actual (functionality-breaking) bugs, just changes; for example, #2 is just a renaming of variables and #425 is an optimisation.

That would be 428 total changes, in a span of a little over 32 years, with the majority of them extremely early in TeX's history - #214 was in 1983, #321 in 1985, #400 in 1991, #420 (a "missing goto") in 2007.


Thanks! That's one of the coolest changelogs I've ever seen. It's pretty amazing to see 9000 lines represent 32 years of changes.


There are a lot of people that would be very happy to receive a check from Knuth for reporting a bug (those checks are rarely if ever cashed, for obvious reasons).

So it's not like people aren't looking for bugs.


Donald Knuth no longer writes personal checks for finding errors in his books and code, though he still issues a credit for the correct amount at the (pretty sure it's fictional) Bank of San Serriffe [0]. The page I linked to also mentions that he will try to send legal tender to a bug-finder if she or he really wants it.

[0]: http://www-cs-faculty.stanford.edu/~uno/news08.html


> what's up with the downvoting on HN lately

I've noticed a rash of downvotes on posts coming all at once lately. As if somebody gets a bug somewhere unpleasant and goes to go downvote all of that person's posts that they can.

(Which AFAICT doesn't impact karma score, but does downvote all those posts.)

> I'm seriously tempted to create my own community at this point out of desperation, one that focuses on technical merit and being nice rather than posturing.

I'm down. Email's in my profile if you want to chat about the idea.


FWIW I think you're an asset here and don't let the downvotes get to you. It happens.


Did you miss the joke that that release is 'Pi'?


I'm aware how the TeX version numbering scheme works. Would you please stop making assumptions about me, from our earlier conversation to this one? I was asking about the total bug count of the system since its inception, and 3.14159265 is at least 8 changes since version 3.

I'll skip the sarcasm and say that yes I realize 3.14159265 is Pi.


> I have also written perfectly reviewed, manageable and well tested pieces of code that didn't.

Ok, so I'll bite. The fact that not all bad code scores poorly on cyclomatic complexity scores does not imply that code that scores poorly in cyclomatic complexity isn't bad code.

So you wrote a function with cyclomatic complexity greater than some accepted value (50?) and you claim that it's well tested. I'm skeptical. How many of those code paths did you follow in your tests? What was the ratio of lines of logic code to lines of test code for that function? Can you really say that your tests provided adequate coverage of this function?

I've fixed a lot of cowboy-code with horrible cyclomatic complexity and no state separation, and this report about the 1500-line throttle control function dipping into system-global pool of universally-accessible variables sent shivers down my spine.


>80+% of variables were declared as global

This is actually important for safety-critical programming. It lessens the likelihood of running out of stack space, and it's useful for eliminating dynamic allocation.

Some safety-critical software has 100% global variables, without even using a stack.
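
The all-static style described here might look something like this in C. This is only a minimal sketch: the buffer, its capacity, and the function names are invented for illustration and are not from any real ECU code.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_SENSOR_SAMPLES 8u   /* hypothetical capacity, fixed at build time */

/* All storage is statically allocated: its size is known at link time,
 * so there is no malloc() that can fail at run time and no large
 * stack frame that can overflow. */
static int16_t sample_buf[MAX_SENSOR_SAMPLES];
static size_t  sample_count;

/* Returns 0 on success, -1 if the fixed buffer is already full. */
int sample_push(int16_t value)
{
    if (sample_count >= MAX_SENSOR_SAMPLES) {
        return -1;
    }
    sample_buf[sample_count++] = value;
    return 0;
}

/* Average of the stored samples; returns 0 when the buffer is empty. */
int16_t sample_average(void)
{
    int32_t sum = 0;
    size_t i;

    if (sample_count == 0u) {
        return 0;
    }
    for (i = 0u; i < sample_count; i++) {
        sum += sample_buf[i];
    }
    return (int16_t)(sum / (int32_t)sample_count);
}
```

Running out of room becomes an explicit return code the caller has to handle rather than a failed allocation or a silent stack overflow.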


But even then, you want to declare those variables as static or otherwise find some way to reduce their scope. Most global variables in the Toyota code were visible and writable from the whole program.

This is not even counting all the variables declared with the wrong type or more than once, uninitialized variables, etc.


I think this is possibly one area where functional programming could shine if the garbage collection issue could be dealt with once and for all (and that's a hard one). It's for practical reasons impossible to test something with that much state exhaustively, but once broken up into functional units you just might get there. Something along the lines of Erlang for embedded systems.


No. Functional programming does not magically free you from "state". The world has state, the unlimited number of real world inputs are the state of the program, merely expressed as function parameters. The CPU itself has registers, and stores a call stack.

The abstraction of functional programming does not alleviate all of these issues, and can introduce other ones.


Of course it doesn't free you from state. But it can help to put the state into a more manageable context. Just like side-effect free functions can help you with that.

I've done enough embedded programming (and repairs on embedded programming projects) to know just how bad the spaghetti can get and it really wouldn't hurt to borrow a few leaves from the functional world in those cases.
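
As a rough illustration of borrowing that idea in plain C, here is a sketch of a side-effect free update step. The struct, its field, and the rate-limiting rule are all hypothetical and chosen only to show the shape: state goes in as an argument and comes out as the return value, with no globals touched.

```c
#include <stdint.h>

/* Hypothetical controller state; the field name is invented for illustration. */
typedef struct {
    int16_t throttle_pct;   /* current commanded throttle, 0..100 */
} throttle_state_t;

/* Pure function: the result depends only on the arguments, and nothing
 * outside the returned value is modified. Moves the throttle toward the
 * pedal position by at most max_step percent per call. */
throttle_state_t throttle_step(throttle_state_t prev, int16_t pedal_pct,
                               int16_t max_step)
{
    throttle_state_t next = prev;
    int16_t delta = (int16_t)(pedal_pct - prev.throttle_pct);

    if (delta > max_step) {
        delta = max_step;
    } else if (delta < -max_step) {
        delta = (int16_t)(-max_step);
    }
    next.throttle_pct = (int16_t)(prev.throttle_pct + delta);
    return next;
}
```

Because the function reads and writes nothing but its parameters and return value, it can be exercised on a desktop with a table of inputs and expected outputs, without the rest of the system.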


I'm not going to defend the other points, but most variables being global is not at all unusual for code like this. These sorts of realtime embedded applications typically have no heap allocation, so the only way to define persistent storage is as statically allocated globals. Modularization in these systems is done by convention, e.g. by limiting which include files are visible to a compilation unit.


Non-stack variables can be static instead of global. Either file-scoped or function-scoped.
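
To make the two options concrete, a minimal sketch in C (all names are hypothetical). Both variables are statically allocated just like a global would be, but neither can be named from another source file.

```c
#include <stdint.h>

/* File-scoped: statically allocated, but only code in this .c file can
 * name it. Other modules must go through the accessor functions below. */
static uint16_t pedal_position_raw;

void pedal_set_raw(uint16_t raw)
{
    pedal_position_raw = raw;
}

uint16_t pedal_get_raw(void)
{
    return pedal_position_raw;
}

/* Function-scoped: persists across calls like a global, but is visible
 * only inside this function. */
uint32_t pedal_update_count(void)
{
    static uint32_t calls;   /* static storage, zero-initialized at startup */
    calls++;
    return calls;
}
```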


Exactly.

This is also broken down on slide 40 - local static and file static variables.


It's strange that the slides don't seem to mention this point at all, even though the author is presumably an expert in this field. Why would he make such a big point of this if it's commonly done in the field?


There is a difference between being an expert in the theory of software development and being an expert in the reality.

The theory says that bad software is written when you don't have 100% test coverage, static analysis on checkins, peer review, pair programming, the usual guidelines around encapsulation, inheritance etc. The reality is that bad software mainly comes about through poorly thought out requirements, bad initial architecture decisions and ridiculous time constraints.

It's always a shame when you see stories like this that those aspects never really get investigated or analysed.


Koopman was one of the main witnesses for the prosecution. He has also worked in the nuclear navy and industry specifically designing safety critical systems.


No ECC in RAM, either. Yiiiikes.

Edit: or peer review, or even a bug tracker. My coworkers working on web apps have a better SDLC than these jokers.


They did have peer review just not on every module. This is very common even amongst companies who say they do peer review.

And curious about the lack of bug tracking software. Many companies including the one I work for "technically" don't use one. I wouldn't call Rally or Trello a bug tracker but they do work just fine for that purpose.

I also did find it amusing that they called out lack of formal specifications as an issue given that Agile encourages you not to create them.


> They did have peer review just not on every module. This is very common even amongst companies who say they do peer review.

This is not common at all in safety-critical applications.

> I also did find it amusing that they called out lack of formal specifications as an issue given that Agile encourages you not to create them.

This shouldn't be amusing if you understand Agile. Agile is for systems where you don't yet fully know the scope or the desired end behavior. The guts of a production car are as waterfall as it possibly can get.


It's not acceptable for a system that can kill you to be programmed without at least two programmers looking over every line of code. Common practice or not, given what we know about programming, it's immoral.


Not sure if it's widely known how they were caught... but it was when they outsourced their translation of documents and a translator read documents and blew the whistle.

Which is why the top translation firms now offer... security!


Wow, do you have a link for that? I think this is a fascinating story in and of itself, but one of the tangential things that also interests me about it is that nobody I know in Japan (where I live) has ever even heard of this story.


Here's a link to one key article: http://www.asbpe.org/wp-content/uploads/2014/07/Toyota-Cover...

In Japan, some journalists know, but media organizations won't publish the story.

I am the translator.


A friend is a director at an interesting company that provides secure translations amongst other services to companies and has made bank since this happened. I guess it's a bit of a secret outside the industry.


'How they were caught' you mean how we discovered Toyotas suffered from "UA"? The story claims it was the incident of the off-duty highway patrol officer on a call with 911 for 4 minutes trying to stop the car before it crashed and killed all 4 occupants which blew the lid on the story.


"How they were caught" involved thousands of consumer complaints and lawsuits that continue up to the present day. Tens of thousands of unintended acceleration incidents also go unreported and unlitigated because the damage is not major enough to be worthwhile. The government auto safety regulators investigated and then closed the investigations many times due to clever maneuvering by Toyota's lawyers and government-relations people. They also lied to Congress (which they have since admitted to, when the DOJ investigated their cover-up and fined them $1.2 billion). This story is far from over.


Which is why it's good to know how to shift into neutral.


There have been cases where the driver attempted to switch to neutral and the car would not shift. Also, there is often not enough time to shift to neutral because the majority of instances of Toyota unintended acceleration take place in parking lots and other confined spaces, and the car hits something before the driver can react.


Here is a link to a video published yesterday with dashcam video of many sudden acceleration incidents. https://www.youtube.com/watch?v=3oAshG36dNI These are mainly in Korea, where dashcams are the norm, and likely to be KIA or Hyundai vehicles (they also have big software problems, apparently). Imagine driving in some of these instances of close quarters. Would you be able to shift into neutral to stop in time?

Also, Dr. Antony Anderson, who closely follows this technical issue on his blog, recently published a paper in IEEE Access about how software can be fooled by mechanical glitches such as intermittent connections:

http://ieeexplore.ieee.org/ielx7/6287639/6705689/06777269.pd...

(patience--this link is very slow)

Separately, speaking to your point about shifting to neutral, Dr. Anderson has noted that it is unreasonably risky to design a safety-critical system with the expectation that operator responses such as shifting to neutral can provide an effective failsafe. Would any of you here actually design such a system?


You know the transmission in those cars is computer controlled, too?


My only criticism is that he's comparing Toyota's code and processes to a published but unenforced standard and clearly they are nowhere near that standard. It would be more relevant to compare Toyota to other manufacturers.

It seems the legal argument against Toyota is that they were not following industry standards - but if no one else was, could you really call it an industry standard?


So if all car manufacturers are as bad as each other, that's OK?


Not at all. But if it turned out that Toyota was already trying three times harder than everyone else to get it right, it would be harder to argue they didn't take "reasonable" precautions.

In the talk, he did mention that the government agency that certifies vehicles does only basic checks and does not enforce any standards on software. One could easily argue that they are partially to blame, as they're leaving it up to the manufacturers.


Also see this report from another expert involved in one of the suits:

http://www.safetyresearch.net/blog/articles/toyota-unintende...

And the slides (PDF):

http://www.safetyresearch.net/Library/BarrSlides_FINAL_SCRUB...


I thought the "unintended acceleration" was proven to have been due to older drivers who got confused?


Well, Toyota probably has the better PR team (and more of an interest in getting its message out).

It appears as if neither theory has ever been proven. The slides argue that the standard of evidence in civil proceedings is simply "more likely than not" and not "beyond reasonable doubt".

Reading between the lines, I think this expert witness is arguing that anybody who writes code this terrible for a safety-critical application deserves to pay through the nose, regardless of the actual chain of events.


The example he gave was a young police officer who was a very experienced driver, and had a passenger call 911 in a panic while he was unable to stop the car with his foot firmly planted on the brake - and it ended up killing all four occupants. That negates the 'old people can't drive properly' theory.


Although in Toyota's defense the officer was too clueless to know he just had to hold the start button down for three seconds to turn off the ignition.


That's got to be the worst ever user interface. Press the START button to turn OFF the ignition? That's not accidental, that's malicious, especially in an emergency situation. Have you ever seen a chunk of industrial machinery? Those big red mushroom buttons are the emergency shutdowns, and they don't take 'three seconds' and are not labeled 'start' for extra confusion in times of panic.


In push to start cars, holding down the start button is the normal way to turn them off, so it is not unreasonable to attempt to do so for an emergency shutdown.

Having said that, I would like an emergency stop button in cars (preferably one that works independently of the onboard computer).


It's certainly a reasonable thing to try, but as your aunt eropple notes, an emergency shutoff should be something so obvious that a person in fear for his life would think of it.


Such as an ignition switch which you turn to the left to physically cut power to the ignition system, like we used to have back when cars weren't so overcomplex.


Three seconds is a very long time...

Maybe just go with one of these http://thetimedok.files.wordpress.com/2013/02/big-red-button...


"Too clueless," or scared for his life?

I mean...geez.


The Michael Barr Group was selected to analyze the Toyota controller code. They've filed online reports you can read.


One of the interesting slides points out that a failsafe in the system wouldn't actually work if the driver was pumping the brakes. Drivers who are used to anti-lock braking systems -- i.e., younger drivers -- probably will not pump the brakes, but drivers who learned on cars which didn't have anti-lock brakes -- older drivers -- are probably more likely to do so.

Which could account for some of the alleged age discrepancies, beyond the incident involving the police officer, which cannot be blamed on age/confusion.


Is it known why he did not turn off the ignition?


According to Mr. Koopman's presentation, turning off the ignition turns off the steering assist and would be a bad idea. The recommended workaround would be to shift to neutral.


Apparently it was one of those "push button" ignition systems, which you have to hold down for three seconds to kill while in motion. Probably the driver didn't know that, or didn't hold the button long enough in a highly stressful situation.

The "old-fashioned" key ignition would have provided a much clearer interface to killing the engine. As would shifting into neutral, which is the obvious other option; dunno if there was a reason that wouldn't work.


No, it is not. But one physicist, Dr. Ron Belt, has published some papers that may explain the reason for the skew toward older drivers among SUA victims. Basically, he says that their driving habits tend to deplete batteries, and the resulting unstable electrical supply shared by loaded cars and their ECUs can tend to set off software glitches in poorly written code. The pedal confusion theory is often used as a defense by automakers but it has been discredited.


The Audi 5000 unintended acceleration in the 80s was shown to be that.


I can see an excellent case here for open sourcing such code.


I've always wanted to build an open source ECU, but haven't had the time.


I think this is one of the most interesting software stories in recent years, up there with Stuxnet.

The software development process seems so staggeringly, jaw-droppingly incompetent and negligent, and it now seems clear that software flaws really did kill people despite the heavy layer of spin that it was driver error, floor mats, etc.

I almost certainly couldn't buy a Toyota ever again knowing this. But it also really makes me wonder: how bad is the QA and testing for the software components of other carmakers' vehicles?


> I almost certainly couldn't buy a Toyota ever again knowing this.

What makes you think any other car company is different/better?


That's what I mean; I don't necessarily think it is (which is scary).

Maybe we only got to know how horribly flawed the software situation is at Toyota because they had finally had enough people killed that they just couldn't keep it hidden any longer, and Audi and Honda are just as bad and just haven't yet had this kind of exposure event.

I would prefer to doubt that, but it does all kind of make me want to buy a restored 1978 Datsun with carburetors and mechanical everything.


This makes me want a Tesla. That company actually seems to hire qualified engineers given the failure modes it's had in the exceedingly rare cases of accidents that have thus far made the news.


Idk... there was an article on here a while ago about how the Ford/Firestone tires = death incidents a decade and a half ago actually resulted in Ford making much better quality cars.


If this didn't result in Toyota making better cars, I think that would be an indication that human society is fundamentally broken.


It doesn't seem at all clear to me that it was "spin" when there were cases of floor mats getting caught under the pedal, incidents of driver error, and physical problems.

It also is striking that the analysis could not provide an example of a single error condition that would cause the crash scenario. It's only a high level analysis.


It probably wasn't just spin -- Toyota probably didn't initially believe that they had a multiple-fatality software bug. When I first read some of the news stories about this, years ago, I also thought it sounded far-fetched that a bug that caused your car to accelerate to 100+ mph and crash could have made it all the way to production.

But it is important to note that the post we are all commenting on here isn't "the" analysis. This is just an academic case study of this now-infamous and interesting case, based on public information.

The closest thing that we have to "the" analysis on this (since we will never see Toyota's internal analyses) is, as tokenrove linked to elsewhere in this thread, the one Michael Barr did that was the main analysis[1][2][3] used in the court case against Toyota.

But there have been a lot of other interesting articles and posts on this case, other than just this one.

[1] testimony part 1: http://www.safetyresearch.net/Library/Koopman%2010-11-13%20a...

[2] testimony part 2: http://www.safetyresearch.net/Library/Koopman%2010-11-13%20p...

[3] slides: http://www.safetyresearch.net/Library/BarrSlides_FINAL_SCRUB...


How hard would it be to get an old Prius from a junk yard, extract the ECU, extract the software from there, and disassemble and annotate the whole thing?

The hardest part I figure would be getting around the inevitable read protection, but there are some folks that have done very interesting things in this arena (for instance, reading out PIC chips that had their read fuses blown).


Go for it. Just remember it is a huge amount of code.


Your comments about the software development process remind me of the arguments from a few years ago against open source software like Linux. One of the arguments for professionally made, closed source, proprietary software (like Microsoft Windows) was that programmers getting paid by a corporation would obviously develop better software than a bunch of unpaid, unprofessional, part-time hackers. Funny how much that argument has changed today.


If someone ever comes up with a practical method of creating reliable software then everyone would use it. You can always find a programmer who will claim that they would have done it better... For all we know whatever Toyota is doing is way better than what everyone else is doing and this is their only safety critical bug (assuming that the bug actually exists)...

In the end the software is either correct or it is not.


The information that has been released so far, especially Michael Barr's comments on it, suggest that this is far from the only bug.

For example, http://www.edn.com/design/automotive/4423428/Toyota-s-killer... quotes Barr's claims: "Toyota’s electronic throttle control system (ETCS) source code is of unreasonable quality." "Toyota’s source code is defective and contains bugs, including bugs that can cause unintended acceleration (UA)."

I am a little appalled at the number of apologist comments on this story. There is mounting evidence that this wasn't a "it could happen to anyone" bug, but rather a serious violation of software engineering ethics. Code must not kill.


>...software engineering...

Some of us do not believe that such a thing exists in any meaningful form. In the past every approach to creating reliable software has failed to deliver. That makes it hard to believe that any particular approach is finally the answer.

I agree that the Toyota software is poorly written. That in no way means that conforming to a particular standard or method would have automatically produced software that was more reliable.


I work with some people in the static analysis industry, and I'm told that Toyota did have static analysis tools that would have identified a number of these issues; as a particular example, recursive functions, which are verboten under the MISRA rules they should have been following.

I heard, less reliably, that the tools had been used and the results ignored.
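
For context on why recursion gets flagged: the stack depth of a recursive function depends on run-time data, which makes worst-case stack usage hard to bound on a small target. A minimal sketch of the usual rewrite in C, using made-up checksum functions that are not from the Toyota code:

```c
#include <stddef.h>
#include <stdint.h>

/* Recursive form: each byte adds another call frame, so worst-case
 * stack depth depends on n. This is the pattern the rule forbids. */
uint32_t checksum_recursive(const uint8_t *data, size_t n)
{
    if (n == 0u) {
        return 0u;
    }
    return (uint32_t)data[0] + checksum_recursive(data + 1, n - 1u);
}

/* Iterative form: same result, one stack frame regardless of n. */
uint32_t checksum_iterative(const uint8_t *data, size_t n)
{
    uint32_t sum = 0u;
    size_t i;

    for (i = 0u; i < n; i++) {
        sum += data[i];
    }
    return sum;
}
```

A static analyzer can spot the first form mechanically by finding a cycle in the call graph, which is presumably the kind of finding being described.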


I'm not really a fan of MISRA-C or V process and stuff like that, but the quality of the source code and process is truly appalling if these slides are right... tens of thousands of global variables, no bug tracking and so on and so on.

I really hope they improved since then, for the sake of anybody driving a Toyota.


Anyone know what the things he "couldn't talk about" were? I was hoping to hear the open mic bit at the end for a hint but the video is cut off.


He mentioned he's still involved in legal proceedings and expects to be deposed in further trials. I imagine specific implementation details, things under NDA, etc. are what he's trying to avoid.

In work like this, you certainly wouldn't want to accidentally cause a negligent company to get off the hook because of an unimportant public statement.


This happened to my parents in a 2002 Civic.


The 2002 Civic used a mechanical throttle cable and mechanical transmission linkage. These Toyotas/Lexuses did not use any such mechanical links to the drive train; they were wholly dependent on software.

The engine management on your parents' Honda cannot adjust the throttle percentage or gearing, only ignition and valve timing along with air/fuel ratios and gear hold-out timings. This could result in a surge of power, or a stalled car, but nothing like the experience of wide open throttle.

The Toyota UA is thought to have been rooted mostly in the fact that they adopted fly-by-wire throttle schemes while failing to compensate for the much more major software/hardware/management risks that come with such systems.



