Seems more like an oversight, since you have to stitch together a bunch of subop...

arghwhat · 2024-04-12T11:47:15.000000Z

It does seem like an oversight, but there's nothing "suboptimal non-default options" about it, even if the implementation posted here seems somewhat hastily hacked together.

segfaultbuserr · 2024-04-12T13:36:24.000000Z

> but there's nothing "suboptimal non-default options" about it

If "bypassing the official driver to invoke the underlying hardware feature directly through source code modification (and incompatibilities must be carefully worked around by turning off IOMMU and large BAR, since the feature was never officially supported)" does not count as "suboptimal non-default options", then I don't know what counts as "suboptimal non-default options".

arghwhat · 2024-04-13T07:28:11.000000Z

> bypassing the official driver

The driver is not bypassed. This is a patch to the official open-source kernel driver where the feature is added, which is how all upstream Linux driver development is done.

> to invoke the underlying hardware feature directly

Accessing hardware features directly is pretty much the sole job of a driver, and the only thing "bypassed" is some abstractions internal to the driver. That just means the patch would fail review on the basis of code style, and on the basis of possibly only supporting one device family.

> through source code modification

That is a weird way to describe software engineering. Making the code available for further development is kind of the whole point of open source.

> turning off IOMMU

This is not a P2PDMA problem, and just a result of them not also adding the necessary IOMMU boilerplate, which would be added if the patch was done properly to be upstreamed.

> large BAR

This is an expected and "optimal" system requirement.

segfaultbuserr · 2024-04-13T17:21:24.000000Z

> The driver is not bypassed. This is a patch to the official open-source kernel driver where the feature is added, which is how all upstream Linux driver development is done. [source code modification] is a weird way to describe software engineering. Making the code available for further development is kind of the whole point of open source.

My previous comment was written with an unspoken assumption: hardware drivers tend to be very different from other forms of software. For ordinary free-and-open-source software, the source code availability largely guarantees community control. However, the same often does not apply to drivers. Even with source code availability, they're often written by vendors using NDAed information and in-house expertise about the underlying hardware design. As a result, drivers remain under a vendor's tight control. Even with access to 100% of the source code, it's often still difficult to do meaningful development, because the documentation that would explain "why" rather than "what" is missing. The driver can be full of magic numbers and unexplained functionality, with no description other than a few helper functions and macros. This is not just a hypothetical scenario; OpenBSD developers encounter this situation on a daily basis. In an OpenBSD presentation, the speaker said the study of Linux code is a form of "reverse-engineering from source code".

Geohot didn't find the workaround by reading hardware documentation. Instead, it was found by making educated guesses based on the existing source code, and by watching what happens when you send the commands to hardware to invoke a feature unexposed by the HAL. Thus, it was found by reverse-engineering (in a wider sense). And I call it a driver bypass, in the sense that it bypasses the original design decisions made by Nvidia's developers.

> [turning off IOMMU] is not a P2PDMA problem, and just a result of them not also adding the necessary IOMMU boilerplate, which would be added if the patch was done properly to be upstreamed.

Good point, I stand corrected.

I'll consider no longer calling geohot's hack "a bypass" and accepting your characterization of "driver development" if it really gets upstreamed to Linux - which usually requires maintainer review, and Nvidia's maintainer is likely to reject the patch.

> [large BAR] is an expected and "optimal" system requirement.

I meant "turning off (IOMMU && large BAR)". Disabling large BAR in order to use PCIe P2P is a suboptimal configuration.

arghwhat · 2024-04-15T20:28:06.000000Z

> For ordinary free-and-open-source software, the source code availability largely guarantees community control. However, the same often does not apply to drivers. Even with source code availability, they're often written by vendors using NDAed information and in-house expertise about the underlying hardware design. As a result, drivers remain under a vendor's tight control.

This is not true for any upstream Linux kernel driver. They are fully open source, fully controlled by the community and not subject to any NDAs to work on. The vendor can only exert control in the form of reviews and open maintainership. This is the case for both AMD and Intel GPU drivers.

For numbers, Arch Linux bundles ~7.5k loadable kernel drivers (which is a subset of all available), but AUR only has 8 out-of-tree drivers, out of which two are proprietary and 2-3 are community-led. "Vendor-controlled" drivers are an extreme outlier on Linux.

While not true for nvidia's proprietary, closed-source driver stack, the linked open source nvidia (kernel) driver is nvidia's work-in-progress driver to be upstreamed in exactly this fashion.

> by watching what happens when you send the commands to hardware to invoke a feature unexposed by the HAL. Thus, it was found by reverse-engineering

It is true that documentation was lacking, but there was no reverse-engineering here, just standard (albeit hacked-up) driver development. "The HAL" means nothing, as it's just a random abstraction within the driver.

I used to work on proprietary network drivers for high-performance, FPGA-based NICs, and apart from it being more annoying to debug it's really no different than coding on anything else. Unless you're bringing up new IP blocks, it's mostly you and the code, not specs.

It would have been reverse-engineering if the registers and blocks weren't in the driver in the first place, but in this case all the parts were there to bring up a completely standard feature. What he did was use existing driver code to ask for what he wanted.

Not saying that it wasn't significant effort, but unless you consider "reading code you did not write" reverse-engineering (in which case all coding is reverse-engineering), then it has nothing to do with reverse-engineering, or bypassing anything. Considering this reverse-engineering is also a bit of a disservice to those that actually reverse-engineered GPUs and wrote open-source GPU drivers for them, like panfrost and recently the Apple silicon drivers.

Apart from the fact that the feature is hacked and not properly wired up (as evidenced by the IOMMU issue and abstraction break), the implementation flow is exactly the same as when volunteers contribute fixes or features to other complex, open-source upstream GPU drivers, like AMDGPU. Not by reverse-engineering, nor by signing NDAs, but by reading and writing code and debugging your hardware. Stuff is just always harder in kernel mode.

> I meant "turning off (IOMMU && large BAR)". Disabling large BAR in order to use PCIe P2P is a suboptimal configuration.

The requirement is large BAR on, not off - the phrasing is a bit poor on the page, but they're trying to say that you need large BAR support, and that you need IOMMU off, not that you need large BAR and IOMMU off.

segfaultbuserr · 2024-04-17T21:31:30.000000Z

> This is not true for any upstream Linux kernel driver.

This is very true for at least a portion of upstream Linux kernel drivers.

> They are fully open source

True.

> fully controlled by the community

False (Of course, this claim is based on my belief that source code access is insufficient to do driver development, hardware datasheets are not just nice to have but mandatory for a true free operating system. If you disagree, nothing more can be said).

The upstream Linux kernel has absolutely no requirement that the contributed hardware drivers must include public documentation. In fact, many are written by vendor-hired contractors, or by independent embedded system companies who make computer gadgets based on these chips (so they have a business relationship with the vendor that gives them datasheet access).

If you want to do any low-level work on these kinds of drivers independently, it almost always involves trial-and-error and educated guesses.

> and not subject to any NDAs to work on. The vendor can only exert control in the form of reviews and open maintainership. This is the case for both AMD and Intel GPU drivers.

Sure, the drivers themselves are not subject to any NDAs to work on, but hardware vendors often maintain de facto control, because they have technical advantages not available to outsiders, in terms of NDAed information and in-house expertise. Theoretically everyone is free to participate, and it often happens, but when 80% of the experts are programmers with privileged information, it's difficult, although not impossible, for an outsider to join the development effort (it's no surprise that most outsider contributions are high-level changes that do not directly affect hardware operations, like moving away from a deprecated kernel API or fixing memory leaks, rather than, say, changing the PLL divider of the hardware's reference clock).

> For numbers, Arch Linux bundles ~7.5k loadable kernel drivers (which is a subset of all available), but AUR only has 8 out-of-tree drivers, out of which two are proprietary and 2-3 are community-led. "Vendor-controlled" drivers are an extreme outlier on Linux.

As a contributor to the Linux kernel, my personal experience is that drivers containing code similar to the following are not rare:

    uint8_t registers[] = {
        0xFF, 0xFE, 0xEA, 0x3C, 0x5A, 0x6A,
        0x01, 0x02, 0x3A, 0x4D, 0x55, 0x66
        /* many lines */
    };

    /* initialize hardware */
    write_registers(device_ctx, registers);

I bet at least 10% of device drivers (or as high as 25%?) are written with privileged information, and they're almost impossible for independent developers to work with (other than by making educated guesses) due to incomplete or NDAed documentation. Datasheet access is the only way to interpret the intentions of this kind of driver. Even with macro or bitfield definitions, which are better than nothing, the exact effect of a register is often subtle. Not to mention that hardware bugs are common and their workarounds often involve very particular register-write sequences, which tend not to be publicly documented. Sometimes even firmware binary blobs are embedded into the source code this way, and the code is almost always contributed by a vendor who actually knows what the blob is doing. If I were asked to do some development on top of an existing driver but without hardware documentation, I would refuse every single time.
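
For contrast with the opaque byte table above, here is a minimal sketch of what the same kind of initialization might look like with macro and bitfield definitions. Everything in it is an assumption made for illustration: the register names, offsets, and bit meanings are invented and not taken from any real device or driver, and `write_reg` writes to a plain in-memory array (`struct fake_device`) rather than real memory-mapped hardware.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register map: offsets and bit meanings are invented. */
#define REG_CTRL     0x00u      /* control register */
#define REG_CLK_DIV  0x01u      /* reference clock divider */
#define REG_COUNT    0x02u      /* number of registers in the fake device */

#define CTRL_ENABLE  (1u << 0)  /* power up the block */
#define CTRL_IRQ_EN  (1u << 1)  /* enable interrupts */

/* Stand-in for a device: a plain array instead of memory-mapped I/O. */
struct fake_device {
    uint8_t regs[REG_COUNT];
};

/* Write one register; returns 0 on success, -1 if the offset is out of range. */
static int write_reg(struct fake_device *dev, size_t offset, uint8_t value)
{
    if (offset >= REG_COUNT)
        return -1;
    dev->regs[offset] = value;
    return 0;
}

/* Named equivalent of an opaque "write this byte table" initialization. */
static int init_hardware(struct fake_device *dev, uint8_t clk_div)
{
    if (write_reg(dev, REG_CLK_DIV, clk_div) != 0)
        return -1;
    return write_reg(dev, REG_CTRL, CTRL_ENABLE | CTRL_IRQ_EN);
}
```

Even in this form, the names only say what is written. Why a particular divider value is safe, or in what order the writes must happen, would still have to come from a datasheet.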

_zoltan_ · 2024-04-12T20:26:09.000000Z

I have some news for you: you must disable IOMMU on the H100 platform anyway, at least for optimal GDS :-)

segfaultbuserr · 2024-04-13T17:28:37.000000Z

I stand corrected. If it's already suboptimal in practice to begin with, the hack does not make it more suboptimal... Still, disabling large BAR is suboptimal...

talldayo · 2024-04-12T15:55:46.000000Z

> then I don't know what counts as "suboptimal non-default options".

Boy oh boy do I have a bridge to sell you: https://nouveau.freedesktop.org/