Issue 62938 – Barometer driver hangs and kills accelerometer on its way (code.google.com)
100 points by cryptoz on Jan 28, 2014 | 25 comments



Aaaah I2C, one of the simplest protocols there is and also probably the one that gave me the most debugging time.

If someone wants to look into it and is willing to tinker with the guts of the phone, the first thing is to see the state of the bus when the devices stop responding. I2C's "idle" level is high thanks to pull ups, the devices only drive the bus through an open drain.

Sometimes a device's state machine will go fubar (either because of a hardware bug or programming error) and will lock the bus down, basically making it impossible for anybody else to use it.
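
To make that first check concrete, here is a minimal sketch in Python (for readability, not driver code). It takes SCL and SDA levels sampled with a scope or a GPIO read while no transfer should be running and names the likely condition. The function name and the result strings are made up for illustration.

```python
def classify_idle_bus(scl_high: bool, sda_high: bool) -> str:
    """Interpret SCL/SDA levels sampled while no transfer should be active."""
    if scl_high and sda_high:
        # Pull-ups win: nobody is driving either line.
        return "idle: both lines released"
    if scl_high and not sda_high:
        # Typical of a slave that lost track of the clock while sending a 0.
        return "SDA held low: a device is likely stuck mid-byte or mid-ACK"
    if not scl_high and sda_high:
        return "SCL held low: a device is stretching the clock or the master is wedged"
    return "both lines low: a device is holding the bus, or a pull-up is missing"


for scl, sda in [(True, True), (True, False), (False, True), (False, False)]:
    print(f"SCL={int(scl)} SDA={int(sda)} -> {classify_idle_bus(scl, sda)}")
```

This only narrows down the symptom. It cannot say which device is responsible.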

If this happens the next step is to disconnect/reset all other devices on the bus to make sure which one is screwing up (that's the difficulty with I2C, since the two lines can be driven by any master or slave you can never really know who's doing what when things go wrong).

Another thing to look for is the level of the line. Since there are many devices and pull-ups on the wire, it's not uncommon to have messed-up levels (0 is really 0.2 or 1 is really 0.8, if the pull-up is too strong or too weak respectively). Depending on temperature and other factors, that can lead certain interfaces to sample bad values.
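
As a rough worked example of "too strong or too weak", the usual sizing rules give a window for the pull-up. The resistor must be large enough that a device sinking its rated current still reaches a valid low, and small enough that the line rises in time against the bus capacitance. The numbers below (3.3 V supply, 0.4 V maximum low level at 3 mA, 300 ns fast-mode rise time, 100 pF of bus capacitance) are assumptions for illustration, not measurements from the phone.

```python
def pullup_bounds(vdd, vol_max, iol, rise_time_max, bus_capacitance):
    """Return (r_min, r_max) in ohms for an I2C pull-up resistor.

    r_min: below this the pull-up is too strong and a device sinking `iol`
           amps cannot pull the line under `vol_max` volts.
    r_max: above this the pull-up is too weak and the RC rise from 30% to
           70% of VDD takes longer than `rise_time_max` seconds.
    """
    r_min = (vdd - vol_max) / iol
    # 0.8473 = ln(0.7 / 0.3), the RC charging time between the two thresholds.
    r_max = rise_time_max / (0.8473 * bus_capacitance)
    return r_min, r_max


# Assumed example values, see the text above.
r_min, r_max = pullup_bounds(vdd=3.3, vol_max=0.4, iol=3e-3,
                             rise_time_max=300e-9, bus_capacitance=100e-12)
print(f"pull-up should sit between {r_min:.0f} and {r_max:.0f} ohms")
```

Several pull-ups on the same wire act in parallel, so it is the combined value that has to land inside the window.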

And then well... you have to capture the transaction that causes the lock up and try to understand what goes wrong...

As a quick fix it might just be possible to force a reset of the bogus chip when a lockup is detected, that would prevent having to restart everything. There's usually a GPIO for that (if they wired it...).

I hope you have a good oscilloscope!

Quite frankly I can empathize with the dev not wanting to look into this bug, by the looks of it that's the kind of minor bug that'll take several days to track down and fix.


Very good summary, though let me throw in some doubt:

    > it might just be possible to force a
    > reset of the bogus chip when a lockup is
    > detected [..] There's usually a GPIO for
    > that (if they wired it...).
Yes, if they wired it. In my experience, actual design with this best practice is frustratingly rare. It's as if the hardware designers think, "Oh, it's just I2C, what could go wrong."

If they didn't do it, the chips are probably only connected to a master board level reset, and you're basically screwed.

To find out whether it's the case on the affected devices, absent schematics or a scope, I'd grep around the kernel sources and look for a definition of such a pin. (If sources aren't available, try symbols.)
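
A minimal sketch of that search, assuming the kernel tree is checked out locally. The regular expression and the list of file suffixes are guesses at how a reset pin might be named (for example a `reset-gpios` device tree property); a particular vendor tree may well use something else, so treat a miss as inconclusive.

```python
import os
import re

# Assumed naming patterns; adjust for the tree being searched.
RESET_PIN_PATTERN = re.compile(r"reset[-_]?gpios?|gpio[-_]?reset", re.IGNORECASE)
SOURCE_SUFFIXES = (".c", ".h", ".dts", ".dtsi")


def find_reset_pin_mentions(root):
    """Return (path, line_number, text) for each line that mentions a reset GPIO."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(SOURCE_SUFFIXES):
                continue
            path = os.path.join(dirpath, name)
            # errors="replace" keeps odd encodings from aborting the scan.
            with open(path, errors="replace") as handle:
                for lineno, line in enumerate(handle, start=1):
                    if RESET_PIN_PATTERN.search(line):
                        hits.append((path, lineno, line.strip()))
    return hits
```

Calling `find_reset_pin_mentions("path/to/kernel")` (a placeholder path) returns the matching lines. The interesting hits are the ones in the sensor drivers or the board's device tree files.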


You're being way too kind to these chips.

Look at the datasheet for the BMP280 barometer chip, the AK8963 compass, and the MPU6500 accelerometer (close enough):

http://datasheet.octopart.com/BMP280-Bosch-datasheet-1369120...
http://www.akm.com/akm/en/file/datasheet/AK8963.pdf
http://dlnmh9ip6v2uc.cloudfront.net/datasheets/Components/Ge...

The compass (AK8963) has a reset line, the other two have no reset line at all. Your best bet is to drop VCC, but what are the chances the hardware guy just tied them to the power bus and left it at that?


> The compass (AK8963) has a reset line, the other two have no reset line at all. Your best bet is to drop VCC, but what are the chances the hardware guy just tied them to the power bus and left it at that?

That is, if you also have even more circuitry to disable the bus pullups; otherwise they will continue to power the device via clamping diodes, potentially causing further confusion. The hardware guys don't implement I2C slave power control for a reason: it gets bloody expensive real fast - both in terms of BOM cost and PCB real estate.

Having said that, the lack of a RESET line on many I2C slaves is just ridiculous. I have a sticker on my monitor that says "fix the hanging MMA8451Q bug"; it's been there for at least three months. The very thought of debugging I2C transactions makes me stop even trying.


Good point about the pull-ups. Forgot about that.


> Yes, if they wired it. In my experience, actual design with this best practice is frustratingly rare.

Keep in mind how precious board space and GPIOs are in a modern phone. The boards are tiny, and even with 8 or 10 layers, they are still packed with traces on each layer.

Add to that, that the processor itself doesn't have a lot of GPIOs left over for a design like this. There are soooo many peripherals these days. Sure, you could use a port extender, but that is (a) another chip, (b) extra board space, (c) extra cost. So that's not going to happen unless something really important needs it, like the audio subsystem.


My favorite thing is when vendors are too lazy to implement I2C, so you have to open the datasheets for every device that you want to have on the same bus and compare each timing chart to see if they'll work together. Scales like O(n^2).


Sometimes the slave's state machine will get 'lost' emitting a zero. It is possible that simply toggling SCL until SDA goes high (followed by a STOP) can resolve these issues without a separate reset line. Implementing this may require temporarily putting the I2C pins into GPIO mode and bit banging.
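
Here is a sketch of that recovery sequence, run against a toy simulated bus so it can be tried without hardware. `SimulatedBus` and its method names are invented for the demo; on a real part they would be GPIO writes and reads with a delay of half a clock period between edges, which is omitted here. The limit of nine clock pulses is the commonly used convention (enough to clock out a full byte plus the ACK slot), not something stated above.

```python
class SimulatedBus:
    """Toy open-drain bus with one slave that holds SDA low until it has
    seen `clocks_until_release` rising SCL edges (an assumption for the demo)."""

    def __init__(self, clocks_until_release):
        self.master_scl = True   # True means released, i.e. pulled high
        self.master_sda = True
        self.remaining = clocks_until_release
        self.events = []

    def read_sda(self):
        # Wired-AND: the line is high only if nobody pulls it low.
        return self.master_sda and self.remaining == 0

    def set_scl(self, high):
        if high and not self.master_scl and self.remaining > 0:
            self.remaining -= 1  # the slave shifts out one more bit
        self.master_scl = high

    def set_sda(self, high):
        before = self.read_sda()
        self.master_sda = high
        if not before and self.read_sda() and self.master_scl:
            self.events.append("STOP")  # SDA rising while SCL is high


def recover_bus(bus, max_clocks=9):
    """Toggle SCL until SDA is released, then issue a STOP. Returns success."""
    bus.set_sda(True)
    bus.set_scl(True)
    pulses = 0
    while not bus.read_sda():
        if pulses == max_clocks:
            return False         # still stuck: needs a reset or a power cycle
        bus.set_scl(False)
        bus.set_scl(True)
        pulses += 1
    # STOP condition: take SDA low while SCL is low, then raise SCL, then SDA.
    bus.set_scl(False)
    bus.set_sda(False)
    bus.set_scl(True)
    bus.set_sda(True)
    return True


bus = SimulatedBus(clocks_until_release=3)
print("recovered:", recover_bus(bus), "events:", bus.events)
```

If the function returns False, the slave is not just finishing a byte, and clocking it further is unlikely to help.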


Thanks for the suggestion; I'll probably end up implementing this in my I2C driver. The temporary GPIO switch won't hurt; at that point, the I2C host should probably get a reset too just to un-confuse it (at least I can generate THAT in pure software).


Pedantic, but: can we replace the title with something like "Bug in AOSP breaks barometer, accelerometer usage."


No worries, it won't be long before a helpful mod comes along to change the title to "Issue 62938: Barometer driver hangs and kills accelerometer on its way." Enjoy the context we have until that happens.


I think that would be a much better title (minus the issue number).

As it is, there's a strong presumption that the bug is not in AOSP, so that's misleading. As for the "impeding science" part, it's clickbait more than anything else IMHO.


What context? Right now the title is empty clickbait: "This bug in AOSP is impeding science. How to best get Google's attention?"


Context, being what does it affect, and why should anybody care. The current title accurately states what is affected (AOSP), and does a less great job of saying who should care (all scientists and probably the entirety of humanity, apparently). The "correct" title gives no indication of what is affected, and little motivation for caring. The title probably will change (they seem to do that, anyhow), but it will probably change for the worse. This is an argument that's been going for years, and it's not going anywhere, but I'm still going to gripe about it from time to time.


It would be a lot easier to get their attention if there was an actual 'adb bugreport' attached to the bug.

There's very little context on what the bug is and how to reproduce it besides some vague references to PressureNET (which means very little to someone who hasn't used that before).


How do you know it's a bug in AOSP? It could well be a driver issue.


It affects the Nexus 5 (and to a lesser extent Nexus 4), shouldn't that be stock Android?


The Android Open Source Project refers to the components released by Google as open source. This is the framework for the most part, plus a few Java apps. The drivers are part of the per-device "BSP" layer, and not part of AOSP. They're sometimes delivered as source (certainly the kernel components are), but often not (the userspace HAL libraries are almost never source-visible).

This bug looks to be between the kernel driver and sensor HAL to me. It might be fixable in code we can see, but none of that is part of "AOSP".


No... If the driver is provided as a blob from the manufacturer then it may have nothing to do with AOSP directly.


This is a hardware issue that only affects some of the phones. I did a factory reset and, with no additional apps installed, the phone still stopped auto-rotating every day and needed a reboot. RMA'd the phone; the new one does not have the accelerometer problem anymore. However, it has the focus problem in low-light conditions, which the first one didn't have. Not sure I will buy another LG phone in the future.


Ah, I'm glad I stumbled upon this. I've seen this exact problem on my device and didn't know what was triggering it. I've been running a widget to track barometric changes, specifically SyPressure, and since I installed it, at least 3 times I've seen the orientation sensor stop working. I hadn't made the correlation since I was in other applications when it would fail.


It's probably embarrassing to admit but I read the title and wondered if Barometer was an Uber competitor and some driver had gone postal. It did not quite fit the meaning but it was the best I could do till I read the bug.

But the whole embedded market seems like this now - everything just the wrong side of reliable, and abstractions an impossibility. Maybe I am just too new to the area.


Look on the bright side, my first interpretation of the subject line was "solder in a new chip" not "This can only be fixed via reboot."


Please, won't anybody think of the science?!


Promise them cake. That worked for GLaDOS...



