Qualcomm’s Hexagon DSP, and Now, NPU (chipsandcheese.com)
56 points by mfiguiere 11 months ago | 23 comments



One of the major downsides of the Hexagon DSP is that it's nearly impossible to actually run anything on it unless you somehow get your hands on an unprovisioned/unlocked SoC.

The HLOS (High-level OS) running on the Hexagon requires every "applet" to be signed by either the Qualcomm root cert or the OEM's cert. Usually, every phone ships with a set of generic Hexagon applets (or "skeleton libs") that are provided and signed by the OEM, and these seem to be freely usable for offloading some computational work to the DSP (mainly FastCV et al - https://developer.qualcomm.com/sites/default/files/docs/qual...). Those, of course, come with their own bugs: https://research.checkpoint.com/2021/pwn2own-qualcomm-dsp/

On some older SoCs, you could use a TOCTOU (time-of-check to time-of-use) exploit to bypass the signature check by patching the applet loader shim in memory after the shim itself had been authenticated: https://github.com/geohot/freethedsp/ (I have personally ported this to the msm8953, and it seems to work.)
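To make the TOCTOU idea concrete, here is a toy Python model. It is not the freethedsp exploit and not Qualcomm's loader: a hypothetical "loader" verifies a buffer against an allow-list of SHA-256 digests (a stand-in for real signature verification), and then something else rewrites that same buffer before it is used. The usual fix is to verify and execute from a private copy. All names here (`is_trusted`, `load_vulnerable`, `load_safe`, `patch`) are illustrative assumptions.

```python
import hashlib
from typing import Callable

# Hypothetical allow-list of digests, standing in for a real signature check.
TRUSTED_DIGESTS = {hashlib.sha256(b"signed applet").hexdigest()}


def is_trusted(code: bytes) -> bool:
    """Toy 'signature check': accept only code whose SHA-256 digest is allow-listed."""
    return hashlib.sha256(code).hexdigest() in TRUSTED_DIGESTS


def load_vulnerable(shared: bytearray, race: Callable[[bytearray], None]) -> bytes:
    """Checks the shared buffer, then uses that same buffer: a TOCTOU window."""
    if not is_trusted(bytes(shared)):
        raise PermissionError("signature check failed")
    race(shared)          # time of check has passed; the attacker rewrites shared memory
    return bytes(shared)  # time of use: returns whatever the buffer holds now


def load_safe(shared: bytearray, race: Callable[[bytearray], None]) -> bytes:
    """Snapshots into private memory first, then checks and uses only the snapshot."""
    private = bytes(shared)  # immutable copy the attacker cannot reach
    if not is_trusted(private):
        raise PermissionError("signature check failed")
    race(shared)             # the write still lands, but only in the shared buffer
    return private


def patch(buf: bytearray) -> None:
    """Simulated attacker: overwrite the already-verified buffer in place."""
    buf[:] = b"patched code"


if __name__ == "__main__":
    print(load_vulnerable(bytearray(b"signed applet"), patch))  # b'patched code'
    print(load_safe(bytearray(b"signed applet"), patch))        # b'signed applet'
```

In the real attack the "race" is another agent modifying memory that the authenticated code later reads, but the shape is the same: the check and the use must refer to memory the attacker cannot change in between.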


I am not surprised. When I worked at Qualcomm, my main gripe was how closed and secretive they were about everything. The tech underneath was pretty cool, although nothing spectacular in my opinion. I don't think I ever saw anything that deserved all that secrecy, at least in the GPU.

When I switched to NVidia I was surprised to find a much more open ecosystem with good public documentation. NVidia did have some tasty secret sauce stuff that they didn't expose outright, but they did what they could to empower developers to make the best use of the underlying hardware. They strike the right balance between openness and maintaining a competitive advantage, in my view.

Just my opinion based on working in both companies for a number of years. Thankfully I no longer have a dog in that fight.


I am an ex-QCOMer and agree with everything here. We always said it was a legal firm with a tech problem. That stranglehold on IP really holds the company back, IMO. Sure, the licensing model made $$$, but they lose a lot of goodwill in the tech community.


Hello,

> The HLOS (High-level OS) running on the Hexagon requires every "applet" to be signed by either the Qualcomm root cert or the OEMs cert

That hasn't been true for quite a few years now :) See Unsigned PDs, which have been allowed for general-purpose compute since at least the sm8150 (Snapdragon 855).

Note that the article you mention says this about it:

> Signature-free dynamic shared objects are run inside an Unsigned PD, which is the user PD limited in its access to underlying DSP drivers and thread priorities. An Unsigned PD is designed to support only general computing applications.


I spent way too much time trying to make use of it with Halide and was not successful. Are you saying that this is now possible? I am the developer of an app that would greatly benefit from it.


Yes. Note, however, that the Pixel line shipped with Hexagon access restricted for non-platform Android apps. But on other devices, things should just work.


This whole approach makes little sense for a developer (not to mention a user). When a consumer buys a phone at a particular price point, they expect it to offer some level of performance. If devs can only offload to these accelerators on a tiny subset of the devices on the market, that by definition leads to a fragmented user experience (and a ton more dev work). Why bother?

I am becoming convinced that the CPU (and maybe the GPU) is the only viable accelerator on Android devices. All these fancy accelerators are just for phone makers to do their own thing (mainly camera crap). Might as well make them part of the ISP.

Also, I fear Apple is going to eat Android's lunch at this rate :(


The new Brew MP?


You just gave me PTSD flashbacks. Man, I am getting old.


I find this one funny. When I was working at qcom, it was surprising to see that BREW was still in the monorepo in the 2020s (though no longer used by anybody, of course).


> However, the Hexagon DSPs are notoriously hard to program.

I disagree that Hexagon is "hard" to program: the HVX intrinsics are fairly straightforward if you've used NEON/AVX/etc. The problem is that you don't get access to the HMX/HTA/(whatever they call the NPU currently), just HVX. So for running on the DSP, your options are (1) Qualcomm's Hexagon SDK, which doesn't give you access to the tensor hardware, and (2) their SNPE SDK, which does have access to HMX/HTA/etc. but doesn't let you program against the hardware directly. Instead, SNPE is supposed to convert your Caffe/TF/ONNX/Torch models to a Qualcomm-proprietary format so it can map ops to the most appropriate hardware. What actually happens is that their conversion tools fall apart the first time you step away from their vanilla AlexNet/ResNet examples and try to convert a production-grade model. Combine the brittleness of their conversion tools with the lack of documentation and support, and it becomes impossible to use the hardware.

All that being said, I would be interested to hear if anyone has had luck with other ways of using their (admittedly impressive) hardware. Maybe a better approach would be Halide for Hexagon and OpenCL for Adreno? They were also going to release a QNN update this year or next; not sure if they've given anyone access to it yet. It's really a shame that the hardware is so good but the software is so clunky.


> I disagree the Hexagon is "hard" to program - the HVX intrinsics are fairly straightforward if you've used neon/avx/etc.

The difficulty comes in launching your code on the device from the apps core. It's quite different from the usual conventions for this kind of offload.

I agree that once you're executing code there, the "HVX intrinsics are fairly straightforward," etc.
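Part of what makes launching unusual is the RPC split mentioned elsewhere in this thread: code on the apps core calls a stub that marshals arguments and hands them across to the DSP, where a skeleton ("skel") library unmarshals them and calls your function. Below is a toy Python model of that stub/skel pattern only. The wire format, method ID, and function names are hypothetical; in the Hexagon SDK this glue is typically generated by tooling, and the transport goes through a driver rather than a plain function call.

```python
import struct
from typing import Callable, Dict

WIRE_FORMAT = "<Iii"   # hypothetical: little-endian uint32 method id + two int32 args
METHOD_SCALE_ADD = 1   # hypothetical method id


def dsp_scale_add(x: int, y: int) -> int:
    """The 'real' work that would run on the DSP side."""
    return 2 * x + y


# Skeleton-side dispatch table (DSP side): method id -> implementation.
SKEL_METHODS: Dict[int, Callable[[int, int], int]] = {METHOD_SCALE_ADD: dsp_scale_add}


def skel_invoke(request: bytes) -> bytes:
    """DSP side: unmarshal method id and args, dispatch, marshal the result."""
    method_id, x, y = struct.unpack(WIRE_FORMAT, request)
    result = SKEL_METHODS[method_id](x, y)
    return struct.pack("<i", result)


def stub_scale_add(x: int, y: int,
                   transport: Callable[[bytes], bytes] = skel_invoke) -> int:
    """Apps-core side: looks like a local call, but marshals and ships the request."""
    reply = transport(struct.pack(WIRE_FORMAT, METHOD_SCALE_ADD, x, y))
    (result,) = struct.unpack("<i", reply)
    return result


if __name__ == "__main__":
    print(stub_scale_add(3, 4))  # 10
```

Passing a different `transport` (for example, one that logs the request bytes) makes it visible that every call crosses a boundary, which is why coarse-grained calls generally fit this model better than many tiny ones.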


As I understood it, SNPE is the old way and QNN is the future. I ran into some engineers at a recent AI conference, and it seemed even they could not explain how to do model optimization in a sane way. What the heck is Qualcomm thinking? For my work, I've just decided to ignore their accelerators for now.


> However, the Hexagon DSPs are notoriously hard to program. While the Snapdragon CPU and GPU can be targeted using OpenMP and OpenCL, respectively, no such model exists for the DSP.

It's true, this is a big drawback. Something like OpenCL would be really nice. I haven't used OpenCL in a while, but back then the design seemed to assume that the accelerator's memory was a separate destination that data had to be copied to. On the SoCs where Hexagon shows up, that's generally not the case. But hopefully there are already some tweaks to OpenCL that use different SMMU contexts instead of copies.


I don't think that's true at all: you can just write C/C++ with SIMD intrinsics, just like you can on ARM or x86, and the instruction set is mostly awesome. OpenCL would just be an extra layer getting in the way. If you want to run some existing OpenCL code, I guess that would be nice, but I doubt any OpenCL code written for a GPU would actually run well on Hexagon anyway.

The article also complains about VLIW in the same paragraph, but I don't think VLIW makes things harder; it just makes problems more obvious. If you write ARM or x86 code with a dependency between every instruction, that's going to suck too; you just won't know it until you run it, whereas with VLIW it's obvious as soon as you look at the generated code. For the kinds of programs that make sense to run on a processor like Hexagon, VLIW is fine.
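To illustrate the point about dependencies, here is a toy Python model of VLIW bundle packing. It is not Hexagon's actual scheduler: each operation is greedily placed in the earliest bundle after all of its inputs, with at most four operations per bundle (roughly matching Hexagon packets of up to four instructions) and an assumed latency of one bundle per operation. A serial sum of eight values is a chain where every add waits on the previous one, so it uses one slot per bundle; a tree reduction of the same values packs into three bundles.

```python
from typing import Dict, List, Optional

Deps = Dict[str, List[str]]  # op name -> ops it depends on (dict must be in topological order)


def pack_bundles(deps: Deps, width: int = 4) -> List[List[str]]:
    """Greedy placement with unit latency: each op lands in the earliest bundle
    after all of its inputs that still has a free slot."""
    bundle_of: Dict[str, int] = {}
    bundles: List[List[str]] = []
    for op, inputs in deps.items():
        b = max((bundle_of[i] + 1 for i in inputs), default=0)
        while b < len(bundles) and len(bundles[b]) >= width:
            b += 1  # bundle is full: slide to the next one
        while len(bundles) <= b:
            bundles.append([])
        bundles[b].append(op)
        bundle_of[op] = b
    return bundles


def serial_sum(n: int) -> Deps:
    """acc = x0 + x1; acc += x2; ...: every add depends on the previous one."""
    deps: Deps = {"s1": []}
    for i in range(2, n):
        deps[f"s{i}"] = [f"s{i - 1}"]
    return deps


def tree_sum(n: int) -> Deps:
    """Pairwise reduction; n must be a power of two. Inputs are assumed to
    already be in registers, so leaves carry no dependencies."""
    deps: Deps = {}
    level: List[Optional[str]] = [None] * n
    depth = 0
    while len(level) > 1:
        nxt: List[Optional[str]] = []
        for j in range(0, len(level), 2):
            name = f"t{depth}_{j // 2}"
            deps[name] = [x for x in (level[j], level[j + 1]) if x is not None]
            nxt.append(name)
        level = nxt
        depth += 1
    return deps


if __name__ == "__main__":
    print([len(b) for b in pack_bundles(serial_sum(8))])  # [1, 1, 1, 1, 1, 1, 1]
    print([len(b) for b in pack_bundles(tree_sum(8))])    # [4, 2, 1]
```

The real-world analogue is reading the disassembly: long runs of single-instruction packets are a hint that the code is one long dependency chain, which an out-of-order ARM or x86 core would also struggle with, just less visibly.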

The whole Hexagon environment is just so much better than any of the other similar DSPs I'm aware of: you can use open source LLVM to compile code for it (so you aren't stuck with an old version of GCC), and the OS is much closer to standard (e.g. thread synchronization is just pthreads).

I did a bunch of work on Hexagon and I like it a lot. It is my favorite in its class.


It has a pretty nice high-level programming model, tbh. FastRPC is quite nice.

And there's Halide support too.


I used to work with the Halide guys from 2019 to 2021; I was on the TVM team. The compilers group was very proud of Halide's success in the early Pixel phones. It was the library behind HDR+ on the Pixel 3.

Hexagon is a difficult architecture to write code for, but the benefits are worth it: it's the secret sauce behind why Qualcomm's modems are so good. I see people getting all excited over AVX-512 and I just think, "well, we had 2048-bit vectors years ago."


Ex-Qualcomm QCT compiler person here: Qualcomm RF was already that good even while Hexagon was being developed. If they moved their advantage onto Hexagon after that, that's cool.


It is? Not some esoteric analog design? That explains the secrecy.


> However, the Hexagon DSPs are notoriously hard to program. While the Snapdragon CPU and GPU can be targeted using OpenMP and OpenCL, respectively, no such model exists for the DSP.

That’s ultimately the downfall of NPUs. If it’s not accessible it may as well not exist.


Qualcomm's naming schemes are so confusing: I was on the QCOM Hexagon compilers team for a few years and I still don't know which hardware unit the NPU is. Is it Hexagon Matrix Extensions (HMX)?


Sadly, Qualcomm plays a part in maintaining that confusion.

While Hexagon and HVX are publicly documented ISAs, HTA (legacy), HMX and friends are not publicly documented ISA extensions.


Oh yeah, I specifically worked on HMX stuff and I don't even know. I stopped keeping up with product names when we stopped using LOTR names for our internal projects. I wish they would reveal _something_, because right now it's a bit of a black hole on my resume.



