Reverse engineering the VW electronic control unit (lwn.net)
123 points by oska on Jan 9, 2016 | 25 comments



The "data driven" nature of the ECU is not surprising. Ever since the first ECUs were made they have used lookup tables (LUTs) as opposed to mathematical models. I don't know exactly how ubiquitous LUTs are today but certainly they are traditionally used in this application. Computer enthusiasts may enjoy knowing that the controller in many LCD monitors is similarly LUT based rather than containing curves as formulas.

As an aside, I imagine "A2L" is short for addr2line.


> As an aside, I imagine "A2L" is short for addr2line.

A2L stands for "ASAM MCD-2 MC Language" [0] (but it does contain the addresses of internal variables and the information needed to interpret each variable's metadata, so your guess makes sense).

Mostly, A2L files are generated by taking the concrete addresses from an ELF file and filling in a template A2L file. The resulting file can then be used with an "application tool" (usually Vector CANape or ETAS INCA) to read out or manipulate variable values on the live ECU via the XCP/CCP protocols. XCP also provides mechanisms for so-called bypass operation, where stimuli that would normally come from sensors are fed in from the host PC (used mainly in stand-alone test scenarios for the ECU).

[0] https://wiki.asam.net/display/STANDARDS/ASAM+MCD-2+MC
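
For what it's worth, a hedged sketch of the ECU-side half of that workflow: measurement and calibration variables are typically placed in dedicated linker sections, the ELF then carries their final addresses, and a script merges those addresses into the A2L template's ECU_ADDRESS fields. The section names and attribute syntax below are assumptions (they vary by compiler and toolchain), not something taken from the article:

    #include <stdint.h>

    /* Calibration parameter: tool-writable, lives in its own section so the
     * application tool (CANape/INCA) can patch it over XCP. */
    __attribute__((section(".calibration")))
    volatile const uint16_t IdleSpeedSetpoint_rpm = 780u;

    /* Measurement variable: placed in a known RAM section; the A2L tells the
     * tool its address so it can be read out cyclically from the live ECU. */
    __attribute__((section(".measurement")))
    volatile uint16_t EngineSpeed_rpm;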


Yeah, so much for the "reverse engineering" operation. The guy didn't even take the time to research what an A2L file is, but he was quick to throw accusations around [1]. There's nothing like some quick internet exposure.

[1] "Domke said that it is clear that lots of different kinds of cheating is going on in the ECU" ... because "12KB block of code that is used to ensure the tachometer always shows 780 RPM when the car is idling. Even though the engine is not that steady [...]". What does he expect? The RPM niddle to have 1-2-10 rpm resolution on the analog board? Or for the value to switch super fast on the digital screen to annoy the driver? All this stuff is filtered to have a smooth display info on the dashboard, same way input values from sensor in all sensor based applications are filtered at some point, for example.

[edit]

And btw, the SW is not set to display anything. The engine ECU does not display shit. It sends data to the BCM or some other module over the CAN (usually) network. The value of 780 rpm in this case is a setpoint value. The ECU is programmed to KEEP a minimum idle engine speed when no torque is requested, otherwise the engine would stall. This is done in a closed loop, one of the inputs being the engine speed, which is calculated in different ways from system to system (usually a crank wheel sensor counting the teeth). As you can imagine, this loop will not hold a perfectly fixed 780 rpm idle speed, while the BCM will filter the dashboard value for the driver's comfort.
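
To make the setpoint/closed-loop distinction concrete, here is a minimal sketch of an idle speed controller (my own illustration, not VW/Bosch code; all names and gains are invented):

    #include <stdint.h>

    #define IDLE_SETPOINT_RPM 780
    #define KP                3     /* proportional gain (fixed-point, /100) */
    #define KI                1     /* integral gain (fixed-point, /100)     */
    #define GAIN_DEN          100

    static int32_t integral;

    /* Called from a fixed-rate task; measured_rpm comes from counting crank
     * wheel teeth and is never exactly 780. Returns a torque/idle-air
     * correction request, not anything that gets displayed. */
    int32_t idle_speed_control(int32_t measured_rpm)
    {
        int32_t error = IDLE_SETPOINT_RPM - measured_rpm;

        integral += error;
        return ((KP * error) + (KI * integral)) / GAIN_DEN;
    }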


> he was quick to throw accusations around

To be fair:

a) English isn't his native language, and both "schummeln" and "betrügen" would be "cheating" in English, though "schummeln" carries a much less negative connotation

b) in the talk it sounded more like he was saying that the ECU (or other embedded computers) already do a lot of filtering and processing, even in places where you'd expect a straight path from the sensor to the reading (e.g. the idle rpm). Listening to the talk, it didn't sound to me as if he was implying that all of that processing was on the same level as the emissions stuff; just that there's a lot going on that most people don't even realise.


Betrügwagen.


This is a good thing. Having the computations done once and the results stored in a LUT saves compute power, and it makes the device much easier to test exhaustively. A LUT doesn't have any edge cases, it only has values, whereas computing the same results in real time might produce values that cause problems. A LUT simply cannot return a value that isn't in it (provided you index it right...).


We (at another supplier) work with two types of LUTs: index and interpolation. The "index" ones will indeed return only values calibrated by the car maker or the subsystem supplier and nothing else. Interpolation lookup tables can compute values "in between" the calibrated values, based on a fairly simple formula and the input values.
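
A small sketch of the interpolation flavour (illustrative only, invented names): the output is blended between neighbouring calibrated points and clamped at the edges, so it stays bounded by the calibrated values:

    #include <stdint.h>

    #define N_POINTS 4u

    static const int16_t x_axis[N_POINTS] = { 0, 100, 200, 400 };   /* calibration */
    static const int16_t y_vals[N_POINTS] = { 10, 40, 55, 60 };     /* calibration */

    int16_t lut_interpolate(int16_t x)
    {
        uint8_t i;

        if (x <= x_axis[0]) {                 /* clamp low  */
            return y_vals[0];
        }
        if (x >= x_axis[N_POINTS - 1u]) {     /* clamp high */
            return y_vals[N_POINTS - 1u];
        }
        for (i = 1u; x > x_axis[i]; i++) {
            /* find the upper breakpoint */
        }
        return (int16_t)(y_vals[i - 1u] +
            ((int32_t)(y_vals[i] - y_vals[i - 1u]) * (x - x_axis[i - 1u])) /
            (x_axis[i] - x_axis[i - 1u]));
    }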

Having this type of computation saves lots of CPU cycles and there's nothing wrong with using them. Believe me, there are a lot of models computed in real time, so saving some cycles on maps is quite all right. CPU load is high, as nobody will ever pay for an overkill CPU for a specific application with lower requirements. Modern single-core TriCores from Infineon run at 120-150-180 MHz. Maximum CPU load (reached during regeneration or other calculation-intensive activities) can go up to 85-87%. You can't clock single cores much higher because the heat becomes unmanageable. So future cars (2016+) are set up to use new TriCores with a multicore architecture. They are more complicated to program (to really run on multiple cores, not just use one), and still this new power will not be spent on removing LUTs.

Since LUTs are also calibratable, you can have the same model on very similar engines, just with different calibration maps (slightly adjusted to match each engine) => high reusability of code.


You're right, there are also interpolating LUTs; for those, the output values tend to be bounded by the values listed in the edge columns (and rows, in the case of 2D tables).

Since you seem to have a lot of insight into this material, why is the cycle budget spent to that degree? With my limited insight into the world of embedded systems like these, I'd imagine a main loop cycling several thousand times per second and none of the inputs generating more than a few kHz of pulses, so 120 MHz would seem ample. What am I missing? Are these chips programmed in some high-level language with significant overhead?


(Coding is done in C, following MISRA rules and some additional internal standards. This includes the generated code - that part may not be highly optimized every time. The rest of the SW is, I'd say, pretty much OK optimization-wise. Assembler may also be used in some limited scenarios, especially for low-level device drivers.)

I think what you are missing is the complexity of the models working with the inputs to produce the outputs. The models are much more than a few LUTs here and there. There is just so much to do, and it has to be done in real time (we actually use real-time operating systems, like RTA-OSEK and RTA-OS). Stuff is executed time-based (1 ms to 1 s, with most at 10 ms and 100 ms) or event-based (generated by SW or by interrupts). Events can come faster than 1 ms (crank wheel tooth counting at high rpm, for example).
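
As a much simplified illustration of those rasters (the real thing runs under a preemptive OSEK RTOS, and every name here is invented): a 1 ms tick dispatches the time-based tasks, while event-based work such as crank tooth handling runs from interrupts instead:

    #include <stdint.h>

    static volatile uint32_t tick_ms;   /* advanced once per 1 ms timer tick */

    static void task_1ms(void)   { /* fast I/O, injection/ignition scheduling */ }
    static void task_10ms(void)  { /* bulk of the control models              */ }
    static void task_100ms(void) { /* slow models, diagnostics                */ }

    /* Called once per 1 ms tick. */
    void scheduler_step(void)
    {
        tick_ms++;
        task_1ms();
        if ((tick_ms % 10u) == 0u)  { task_10ms();  }
        if ((tick_ms % 100u) == 0u) { task_100ms(); }
    }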


Ah, that explains a lot of the missing details. Thank you!


I thought that if one exceeded 70% utilization on a real-time system, one lost guarantees of being real-time?


More detail on the "it depends". The guarantees are of the form "if the utilization is less than X%, then an arbitrary collection of periodic tasks can be scheduled without missing deadlines". The key term there is "arbitrary". For example, rate-monotonic scheduling provides this guarantee up to ~70% utilization. However, manually scheduled tasks can walk up into the 90s. A UPS application I worked on scheduled all of its periodic tasks manually by ensuring that each slower-rate task had a rate that was always an integer divisor of (and lower preemptable priority than) the next higher-rate task. This proceeded all the way down to the much slower periodic "housekeeping" tasks that weren't even realtime.
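
For reference, the classic rate-monotonic result behind the "70%" figure is the Liu & Layland bound U <= n * (2^(1/n) - 1) for n independent periodic tasks: 100% for one task, about 82.8% for two, falling toward roughly 69.3% (ln 2) as n grows. With harmonic periods as described above (each period an integer multiple of the next faster one, e.g. 1 ms / 10 ms / 100 ms), the schedulable utilization rises to 100%, which is why a manually arranged set of rasters can safely run into the high 80s or 90s.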


Ahh, that clears it up for me… nice example too. Thanks!


That depends. As long as you don't miss any interrupts and as long as you manage to fulfill your latency promises it can work, but the higher the load the closer to the abyss you'll be walking.


[deleted]


> Is the thermal envelope really that tight?

Not an automotive developer, but I've done quite a bit of embedded stuff that was placed in/on various items, including cars. Yes, it is that tight. You're in a sealed container under the hood of a car, the ambient temperatures can be quite high, and the only way to dump the heat you generate outside the box you're in is conduction. Those boxes are typically made of ABS, so conduction will be terrible.

Of course you could re-design the enclosures, have a heatsink mount point on the outside (some expensive trickery required to do that without breaking the sealed environment), or even an air duct that would allow you to use a fan (but fans tend to fail).

So I can totally see why the constraint on power consumption for such an embedded system would be so drastic.


Notice that the reason the code is data-driven to this extent is simply that it has been automatically generated from MATLAB Simulink models.

Simulink is massively popular in the car industry, and it will produce code exactly like this, as there is very little control flow in Simulink.

(There is of course control flow, but really only in the basic building blocks you are composing, like a switch or a min/max block. Here is a good example of a Simulink model:

http://de.mathworks.com/help/simulink/examples/engine-timing...)
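
For a feel of what such generated code looks like, here is a hedged sketch in the style of Simulink/Embedded Coder output (the rtU/rtY/rtP naming follows the usual code-generation conventions; nothing here comes from the actual VW/Bosch binary):

    #include <stdint.h>

    typedef struct { int16_t EngineSpeed; int16_t CoolantTemp; } ExtU;  /* inputs      */
    typedef struct { int16_t InjectionQty; } ExtY;                      /* outputs     */
    typedef struct { int16_t Gain; int16_t Offset; } Params;            /* calibration */

    ExtU rtU;
    ExtY rtY;
    const Params rtP = { 3, 100 };

    /* One model step: read inputs, push data through the blocks, write
     * outputs. Straight-line data flow with almost no control flow. */
    void model_step(void)
    {
        rtY.InjectionQty = (int16_t)(((int32_t)rtP.Gain * rtU.EngineSpeed) / 64 + rtP.Offset);
    }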


I think "data driven" is meant in how all software modules are running in a kind of quasi-parallel mode: They read a bunch of "current" values, work on them, output a bunch of "next" values. Rinse and repeat.

Modern ECU software has lots of physical models in there. (I worked on a few.) Just using a few LUTs hasn't been cutting it for quite some time.


Back in the 1990s, using almost exclusively LUTs made sense due to the limited processing power. The Chrysler JTEC PCM has three Motorola 68HC microcontrollers working in tandem: one for fuel, one for ignition, and one that ties it all back together with the other sensors.


The other reason is to make the device work in a deterministic manner.


And in general, not recomputing things that should not be recomputed sounds like very good engineering to me. In the wider software world, we're way too eager to waste computational resources on pointless things.


This seems to be about engines that use diesel exhaust fluid (lots of references to AdBlue), not the 2.0 L TDIs that are part of the emissions scandal.


There was a separate issue with the 3.0L TDIs (which do use AdBlue): VW had not informed the regulators of certain modes that the engine could enter due to temperature. This looks like that issue.

"Audi also shed light on the Auxiliary Emission Control Devices (AECDs) which the EPA had earlier labeled "defeat devices," confirming that the tech was part of a "warmup strategy" for the catalyst in the 3-liter emissions control system, and was thus completely different from the emissions-cheating software found in the 2.0-liter VW engines."

http://autoweek.com/article/vw-diesel-scandal/audi-develops-...


The ECU he looked at was from his own car, which does use DEF, but also was affected by the recall that reprogrammed it, so it had the affected engine. While many of the affected cars didn't use DEF, some did. And since he could tinker and test with his own car, that's what he looked at.


OK, I thought this was something new. Seems like the fix for these is easy -- just use the normal dosing regime for the AdBlue and everything should be fine. I'm more interested in what happens with cars like mine that don't use the fluid, and what the performance and mileage trade-offs were to get under the emissions targets.


It was a Bosch ECU, not a VW ECU. VW (via Audi) just provides the A2L parametrization for the ECU.



