Gaining kernel code execution on an MTE-enabled Pixel 8 (github.blog)
302 points by gulced 9 months ago | 60 comments



The big thing here is that the GPU has historically been a pain point for Android, because it has extreme access to the AP in ways that basically sidestep any mitigation that you put in its way. Any bugs in the driver's mapping code (and there have been many) end up giving very powerful primitives, and this fact has repeatedly been used in in-the-wild exploits. Unfortunately, I don't think much is going to change here until this gets rearchitected.


IMO, what needs to happen is that half-assed mobile GPUs need to stop including their own MMU, and use a standard IO-MMU.


A number of GPUs use a standard Arm SMMU instead of an IOMMU already.

The problem with those GPUs in general is driver issues, the hardware is fine.


> A number of GPUs use a standard Arm SMMU instead of an IOMMU already.

Yes, I'm talking about using cores like an ARM SMMU (which is an IO-MMU). Perhaps some GPUs do, but many (most?) don't including the Mali-G710 in this article that's currently shipping in the Pixel 8.

> The problem with those GPUs in general is driver issues, the hardware is fine.

Exactly. I want them to stop writing bespoke kernel code manually fiddling with some custom page table format that gives physical memory read/write primitives when they get it wrong.
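To illustrate the bug class being described, here's a toy Python sketch (not the actual Mali driver code; all names are made up) of a driver that maintains its own page tables and botches the bounds check with a bytes-vs-pages unit mismatch, silently mapping past the backing allocation:

```python
# Toy model of a bespoke GPU page-table mapper. The buggy variant compares a
# page count against a byte count, so the range check effectively never fires
# and a mapping can spill far past the region -- the kind of mistake that
# yields physical read/write primitives. Purely illustrative.

PAGE_SIZE = 4096

class ToyGpuMmu:
    def __init__(self, region_pages):
        self.region_pages = region_pages   # pages backing the allocation
        self.page_table = {}               # gpu page -> backing page

    def map_buggy(self, start_page, num_pages):
        # BUG: right-hand side is in bytes, left-hand side is in pages.
        if start_page + num_pages > self.region_pages * PAGE_SIZE:
            raise ValueError("out of range")
        for i in range(num_pages):
            self.page_table[start_page + i] = start_page + i

    def map_fixed(self, start_page, num_pages):
        # Correct check: both sides in pages.
        if start_page + num_pages > self.region_pages:
            raise ValueError("out of range")
        for i in range(num_pages):
            self.page_table[start_page + i] = start_page + i

mmu = ToyGpuMmu(region_pages=4)
mmu.map_buggy(start_page=4, num_pages=64)   # spills past the 4-page region
print(len(mmu.page_table))                  # 64 pages mapped that shouldn't be
```

With a standard IOMMU/SMMU, this validation and table-walking logic lives in shared, heavily-reviewed hardware and kernel code instead of being reimplemented per vendor.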


> Perhaps some GPUs do, but many (most?) don't

See Qualcomm Adreno


Recent Adreno.

VideoCore, IMG, Mali, and the RDNA2 respun as the "XClipse GPU" all do not use system-standard IOMMUs to provide their user device contexts.


Is the GPU driver closed source?


It is in this case.


The UM (user-mode) part is; the KM (kernel-mode) part is OSS.


What does AP mean here?


Application Processor, i.e. the main processor


[flagged]


It's a very common phrase in spoken English.



It's been around for 38 years. It's not going anywhere.

https://www.merriam-webster.com/dictionary/pain%20point


I get that, but in the GP post there was zero need to use a buzzword instead of something normal, such as "problem" or "issue" or something.


What can I say, the corporate environment has ruined me


> What is interesting about this vulnerability is that it is a logic bug in the memory management unit of the Arm Mali GPU and it is capable of bypassing Memory Tagging Extension (MTE)

The rest of the article appears to describe that the bug is actually caused by a race condition, and the use-after-free is simply a consequence of it.


Would this affect GrapheneOS installs as well prior to the March update?


One of the main goals of GrapheneOS is to release security updates as soon as possible, so if it's patched upstream GrapheneOS almost surely includes the patch.

Sometimes they even adopt pre-release AOSP security patch levels or backport security fixes from unreleased AOSP or kernel sources.


Given that this is related to a hardware-ish problem (maybe firmware inside it?) in the GPU, I'd bet it even affects it after the March update, which was related to the Bluetooth stack.

EDIT: Ignore me, I was confusing that with the recent blog post they had about finding an issue with MTE applying to all system apps too. Looks like GrapheneOS should have this as of their 2024030600 release because it brings in the "full 2024-03-05 security patch level"


Probabilistic Arm MTE memory safety is a stepping stone to deterministic CHERI hardware, https://saaramar.github.io/memory_safety_blogpost_2022/ & https://news.ycombinator.com/item?id=39668053

  The right kind of mitigations targets the 1st order primitive; the root cause of the bug.

  Hardware solutions: CHERI (Morello, CheriIoT), MTE
  Software mitigations: kalloc_type+dataPAC, AUTOSLAB, Firebloom, GuardedMemcpy, CastGuard, attack surface reduction
  Safe programming languages: Rust, Swift

  MTE/CHERI play pretty nicely - they help ensure that whatever bugs we have in these areas are killed at their root cause… MSR, MSRC and Azure Silicon pushed for… scaling CHERI down to RISC-V32E, the smallest core RISC-V specification.
Microsoft Research open-sourced a hardware/software stack for CHERI in IoT devices, https://msrc.microsoft.com/blog/2023/02/first-steps-in-cheri...

  CHERI-based microcontroller that aims to… get very strong security guarantees if we are willing to co-design the instruction set architecture (ISA), the application binary interface (ABI), isolation model, and the core parts of the software stack… our microcontroller achieves the following security properties:

  Deterministic mitigation for spatial safety (using CHERI-ISA capabilities).
  Deterministic mitigation for heap and cross-compartment stack temporal safety (using a load barrier, zeroing, revocation, and a one-bit information flow control scheme).
  Fine-grained compartmentalization (using additional CHERI-ISA features and a tiny monitor).
David Chisnall, U of Cambridge, https://lobste.rs/s/gnjx2n/c_can_be_memory_safe#c_9ohzku via https://eclypsium.com/blog/a-faster-path-to-memory-safety-ch...

> There are around 13 billion lines of open source C and C++, which end up in various TCBs. This number gets even bigger when you include proprietary code… if we all stopped writing C/C++ code now and every software engineer focused on rewriting legacy code in safe languages (and on the assumption that everything can be written in safe languages) then it would take 5-10 to replace everything and we’d likely see a lot of logic bugs because we’d be replacing old well-tested code with new code that would need different algorithms and data structures to fit with allowable idioms in safe languages.

> If we didn’t do the rewriting thing and just stopped writing code in C/C++, then at normal code replacement rates, our TCBs would be entirely safe in around 50 years. If we don’t all agree to stop writing C/C++, it’s at least 100 years.

> In contrast, if the major CPU vendors shipped CHERI CPUs in five years, most machines (and all high-value ones) would have memory safety within 15 years of today, without needing programmers to change their behaviour.
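The "probabilistic vs. deterministic" distinction above can be made concrete with a small simulation. MTE gives pointers and memory granules a 4-bit tag, so a stale pointer still matches a re-tagged allocation roughly 1 time in 16 (real allocators may exclude the previous tag, shifting the odds slightly), whereas a CHERI capability check either passes or traps, deterministically. A hedged sketch:

```python
# Simulate MTE's probabilistic tag check: a dangling pointer carries the old
# 4-bit tag, the reallocated granule gets a fresh random tag, and the access
# survives only when they happen to collide (~1/16 of the time).

import random

TAG_BITS = 4
random.seed(0)

def mte_check(ptr_tag, mem_tag):
    # Hardware faults on mismatch; access proceeds on match.
    return ptr_tag == mem_tag

trials = 100_000
survived = 0
for _ in range(trials):
    stale_tag = random.getrandbits(TAG_BITS)   # tag baked into the stale pointer
    new_tag = random.getrandbits(TAG_BITS)     # tag of the reused granule
    if mte_check(stale_tag, new_tag):
        survived += 1

print(survived / trials)   # roughly 1/16 = 0.0625
```

That residual 1-in-16 survival rate is why MTE is a probabilistic mitigation (and why, as in the linked article, logic bugs that never trip the tag check bypass it entirely), while CHERI's bounds-carrying capabilities make spatial violations fail every time.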


For anyone interested in CHERI for embedded/IoT and other similar use cases lowRISC (whom I work for) are building a couple of FPGA based evaluation platforms for CHERIoT (The Microsoft created CHERI variant referred to above): https://www.sunburst-project.org/

The first is the Sonata system: https://github.com/lowRISC/sonata-system. This comprises a dedicated PCB with an FPGA along with various peripherals and headers. The PCB design is done and will be available through Mouser (plus it's open source including the board layout so you can assemble your own if you like). We're currently working on the RTL for the FPGA. When complete you'll have a complete CHERIoT based microcontroller like system with documentation and tooling.

Additionally we're building the Symphony system, which combines Sonata with the OpenTitan Earl Grey root of trust: https://github.com/lowRISC/symphony-system


There is also Solaris SPARC ADI, which most folks keep forgetting because of Oracle and the state of Solaris SPARC, unfortunately.


That’s comparable to MTE, and much weaker than CHERI.


CHERI is great, but until it becomes a widespread product and not ARM Morello test board, or current RISC-V prototype, anything else in production is better than nothing.


Does SPARC count as being in production anymore?


It definitely counts, it is available for anyone that still wants to buy one.


It's certainly available, though.


Since this was an off-CPU hardware bug, I don't see how CHERI would help.

Anyway, the last time I looked into it, CHERI wasn't sound: It was still possible to write memory bugs on top of it. Have they fixed that yet?


Yes and no. CHERI provides bounds safety but not lifetime safety. If you use capability enhanced garbage collection you can have both, but obviously bolting garbage collection on top of everything you're already doing with manual management (reference counting, etc.) in your existing C/C++ codebase is going to be the worst of both worlds.

Lifetime safety is a much harder problem to solve. Despite CHERI providing "more robust" bounds safety, the fact that you get decent lifetime safety essentially for free from MTE is a huge plus. The two technologies aren't incompatible, so in theory you could bolt the two together to get MTE lifetime safety and CHERI bounds safety, but that would likely waste a ton of memory.
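The spatial/temporal split can be sketched in a few lines. In this toy model (illustrative names, not a real CHERI implementation), a capability carries bounds, so every out-of-bounds access traps deterministically; but once the underlying memory is freed and reused, a dangling capability is still "in bounds" and reads the new object unless revocation runs:

```python
# Toy capability: deterministic spatial safety, no temporal safety.

class Capability:
    def __init__(self, mem, base, length):
        self.mem, self.base, self.length = mem, base, length

    def load(self, offset):
        # Bounds are part of the capability, so this check always fires.
        if not (0 <= offset < self.length):
            raise MemoryError("capability bounds violation")
        return self.mem[self.base + offset]

memory = bytearray(16)
cap = Capability(memory, base=0, length=8)

# Spatial: every out-of-bounds access traps, no probability involved.
try:
    cap.load(8)
    spatial_caught = False
except MemoryError:
    spatial_caught = True

# Temporal: "free" the object and let the allocator reuse the range; the
# stale capability still dereferences the new data without any trap.
memory[0:8] = b"newdata!"
leaked = cap.load(0)                # reads the new allocation's first byte

print(spatial_caught, chr(leaked))
```

This is the gap that CHERI revocation schemes (or an MTE-style tag check layered on top) aim to close.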


> then it would take 5-10 to replace everything

If we're talking years, that seems wildly optimistic. I imagine the bike-shedding alone would take that long.


The root problem seems to be that the user is executing malicious code and abusing some MMU hash collision.

The exploit can probably be written in most languages, including Rust.


[flagged]


I am chewing my tobacco. Maybe I misunderstood the article ...


hardware is _that_ bad?? holy...


GPU hardware is crawling with bugs. Hardware is only re-spun for things that cannot be worked around in the driver at an acceptable cost. That approach is possible because GPUs do not allow relatively direct hardware access like CPUs do.


This is a bug in the driver that runs on the CPU.


This is great research and a great write-up, but I'm a little (pleasantly) surprised to see it on GitHub's blog.

Does anyone know what their "business reason" for doing research like this is? (not that a business reason should be needed, but like I said, I'm a bit surprised to see it here)


Man Yue Mo worked at Semmle (https://blog.sonatype.com/steps-to-responsible-disclosure) before it was acquired by GitHub (https://github.blog/2019-09-18-github-welcomes-semmle/). That research function has carried on as the GitHub Security Lab.

Semmle built CodeQL, now offered by GitHub (https://docs.github.com/en/code-security/code-scanning/intro...), which GitHub and Microsoft (see https://www.microsoft.com/en-us/security/blog/2023/11/02/ann...) want to associate with "deep security insight".

So they continue to fund this kind of novel security research, for which security practitioners across industry are grateful.


This work comes from GitHub's Security Lab https://securitylab.github.com/


A little surprising that it hasn't been shifted into MSRC, but GitHub operates very independently inside Microsoft.


They got bought by Microsoft and so have the resources to sponsor research, including of this kind. There's a GitHub app, and the security of that app is not outside their purview. If an attacker manages to install a lurky app on your phone, they could do stuff as you. If you're someone with GitHub clout, that could be really damaging, so it's in their interest to find such vulnerabilities.


They have hosted Actions runners for Arm too, so they may have an interest in checking and verifying the security capabilities of Arm hardware with MTE for sandboxing.


> Does anyone know what their "business reason" for doing research like this is? (not that a business reason should be needed, but like I said, I'm a bit surprised to see it here)

I think it's basically basic research [0]. To a first order, GitHub as a product doesn't really need Android security experts, but employing them has some potential long-term benefits.

[0]: https://en.wikipedia.org/wiki/Basic_research


Unlike other departments, security teams often don’t have anything to do so this research is a good use of free time.


What is this comment? Github security research lab solely focuses on security research and publishes some of the best research in the industry.

Man Yue Mo is a security researcher who finds some of the most complex and impactful bugs in the industry like crbug.com/40065473


Seeing mmsc's post history, especially computer security related comments, I presume he was just being sarcastic :)


Indeed.

Although, it wouldn’t be abnormal for a security team to have free time, and dedicate it to researching an emerging technology whether it directly contributes to the business goals or not. Of course I’m not talking about a security team that is reading log files from their SIEM while sitting in a SOC.


How many security teams have you been on? Definitely ones with less work than I've been on...


Sadly people didn't see the sarcasm in this comment


i understand people disliking using tone indicators, especially when they can ruin a joke, but they are really wonderful things that can prevent misunderstandings like this online


Wow, that's just absolutely incorrect. Ignoring that tons of security teams are actually stupidly busy, this person's specific role at GitHub is security research. GitHub have security products for code security, which he ties into.


My colleagues at the GH Security Lab saw this and made this thread/response [1]

I’ll paste:

Why does GitHub Security Lab do research like @mmolgtm’s recent work on bypassing MTE on the Pixel 8? This question was asked on Hacker News and we think it’s worth a short thread. news.ycombinator.com/item?id=397522…

First an important point: we only research open source code, which means that many parts of your phone (for example most of your apps) are out-of-scope for us. That said, all open source code is in-scope, including projects that aren’t hosted on GitHub. (Quote tweet reply to this tweet [2])

In this particular case, @mmolgtm found a bug in Arm Mali, which is an open source GPU driver used on many Android phones. Android itself is open source. https://developer.arm.com/downloads/-/mali-drivers/valhall-k...

Open source software is the foundation of much of the world’s software. So when open source wins, we win. And that’s why @GitHub takes its responsibility seriously, to help make open source software more secure.

GitHub Security Lab sits within @GitHubSecurity, and we focus exclusively on open source security with four main priorities:

First, we run the GitHub Advisory Database, which is a comprehensive database of open source vulnerabilities. https://t.co/U4HlXO2l1G

Second, we share information around secure coding practices, through blogs and video content. https://t.co/EdO5SZtR0B

Third, we use GitHub’s CodeQL to scan thousands of open source repositories for common security mistakes, like SQL injections or path traversals. https://t.co/m72rt2a5RL

And fourth, we do deep research on critical open source projects. @mmolgtm's recent work on Arm Mali is an example of this. https://t.co/jxVYeoJjtO

The work that we do feeds into GitHub’s security products. For example, the advisory database is used to generate Dependabot alerts. https://docs.github.com/en/code-security/dependabot/dependab...

Similarly, our work with CodeQL provides feedback to the code scanning team to help improve and further develop the feature so that more vulnerabilities are caught quickly and automatically. https://docs.github.com/en/code-security/code-scanning/intro...

And these activities also benefit open source, because GitHub security products, including Dependabot and CodeQL, are free for open source projects!

Our deep research work is primarily intended to inspire the community, so that we can improve open source security together. That’s why we publish detailed blog posts and proof-of-concept exploits.

https://github.com/github/securitylab/tree/main/SecurityExpl...

We’re big believers in Linus's law: “given enough eyeballs, all bugs are shallow”. Together, we’re making open source software secure. https://en.wikipedia.org/wiki/Linus%27s_law

[1]: https://x.com/ghsecuritylab/status/1770940743944720557

[2]: https://x.com/zemarmot/status/1681008991663423489


I am surprised no one has yet introduced a CPU and phone with little if any GPU and called it a business phone. The obvious advantages include security, cost, and power consumption.


The obvious disadvantage is no high-DPI touchscreen, so you're back to a BlackBerry or Palm Treo, things that were sold as business phones.


And that requires a powerful GPU? I thought a much much simpler 2D accelerator in the style of the S3 911 of yesteryear would be enough.


Swipe up from the bottom of your iPhone. Oops, you're suddenly doing 3D transformations.

There are dozens of UI effects which rely on the GPU, and there's just no such thing as a 2D GPU these days, it makes no sense unless you're building a retro console or something.


> Swipe up from the bottom of your iPhone. Oops, you're suddenly doing 3D transformations.

So don't do that exact effect? This is a pretty weak objection.

> there's just no such thing as a 2D GPU these days, it makes no sense

This might be stronger but I'm not an expert on pixel pushing.


There's quite a step between “you can't have fancy UI animations” and “you're back to BlackBerry” though…


Things like inertial scrolling are not 'fancy UI animations', they're core components of a touch ui. Take out the touch UI and you're back to something like a nicer Treo.


Anyone who has an eInk device (where such animations are impossible due to the refresh rate of the screen) can tell you that it's still fully usable and has nothing to do with getting back to BlackBerry or Treo.

It looks less nice and is limited in some ways, but for business needs it does the job perfectly.



