Reverse engineering Dell iDRAC to get rid of GPU throttling (github.com/l4rz)
249 points by f_devd on May 10, 2023 | 90 comments



BMCs in general leave me uneasy.

I like the idea: a small computer that monitors and controls your big computer. But I hate the implementation. Why are they all super-secret firmware blobs? Why can't I just install my Linux of choice and run the manufacturer's software? That would still suck, but not as badly as the full-stack nonsense they foist on you at this point.


They're special firmware blobs because the OEMs generally aren't building their BMCs from scratch as they might with their main boards and other components. They're generally getting the BMC SoC from the likes of Aspeed and others, who are the ones keeping them closed up. I've tried to get the magic binaries and source for various projects but have given up because there are so many layers of NDAs and sales gatekeepers. I'm not entirely sure who makes the Dell BMCs, but I know Supermicro bundles Aspeed (at least they did with older generations of their main boards).

I agree with you that you should be able to run whatever, since in the end it's just another computer, but the manufacturers believe otherwise since there's "valuable IP" or whatever nonsense (insert eye-roll emoji here).

There are open specs like Redfish, but that still doesn't get to the heart of the matter.


It is complicated.

AMI sells a BMC software stack, https://www.ami.com/megarac/ . Intel and smaller manufacturers were unhappy about always paying the AMI tax, so Intel created OpenBMC as a hedge against AMI's monopoly for small manufacturers. I have heard OpenBMC has users at Facebook, Google, IBM, ByteDance, and Alibaba.

Dell owns their own stack in iDRAC; I have heard most of their systems are Nuvoton based. I suspect Dell pays big bucks to keep their systems at feature parity with the other options, and they view it as an investment.

There are also silicon devices on the motherboard with drivers that cannot be shared, so it's not surprising that companies don't share source in a way that would be useful.

If you want a system with a BMC you can experiment with, try the ASRock E3C246D4C; it looks like there are hobbyists who have it (impressively) running coreboot and OpenBMC.

https://9esec.io/blog/coreboot-on-the-asrock-e3c246d4c/


Aspeed aren't the ones keeping them closed - OpenBMC running on Aspeed chips is an option chosen by some system vendors. It's a vendor choice whether to go with an open platform or a closed BMC from AMI etc.

https://github.com/AspeedTech-BMC/openbmc (or see the upstream OpenBMC tree)

HPE are adding some OpenBMC support https://www.hpe.com/us/en/compute/openbmc-proliant-servers.h...

This work from Supermicro looked promising https://lore.kernel.org/openbmc/CACPK8XdE0sRmt4x54YJVJO2wDT5...


I was responsible for poking at BMC security in data centers.

BMCs lack security fundamentals and often behave like cheap IoT knock-offs. They often use outdated kernels, libraries, and security mechanisms.


Yeah they are utter garbage. For years you had to use Java 6 with absolutely every modern security measure turned off in both the JVM runtime itself and your browser to access Dell DRACs. Accept expired certs, run unsigned code, I'm sure this is all fine ...

I mostly work in the cloud now but when I last had to manage a bunch of physical machines we had a physically separate network accessed via its own VPN to get onto the BMCs. Because yeah, the security situation was a joke.


I found that if you leave the BMC port unplugged on a Supermicro, it'll conveniently fail over to whatever other Ethernet is plugged in, meaning an outage of your management network may roll the BMC over onto another network unintentionally.

I'd put money on there being pre-auth vulnerabilities in those things, judging by the engineering quality.


That sounds like a showstopper for many sites.

On a much less-important note, it might explain some weirdness I'm seeing lately with one of my home Supermicro servers. (The docs say the BMC should only listen on one port, but the switch still sees some degree of responsiveness on the normal non-management port when "off".)


I believe you can override this behavior in BMC settings
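
On the Supermicro boards I've seen, this is the shared/failover LAN mode, and you can pin the BMC to the dedicated port over IPMI. A rough sketch (the OEM raw command below is what is commonly cited for X9/X10/X11-era boards; treat the values as an assumption and verify against Supermicro's docs for your generation):

    # query the current BMC LAN interface mode (Supermicro OEM command)
    # reply: 00 = dedicated, 01 = shared/onboard, 02 = failover
    ipmitool raw 0x30 0x70 0x0c 0

    # pin the BMC to the dedicated management port only
    ipmitool raw 0x30 0x70 0x0c 1 0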


Ran a fleet of servers with terrible BMCs. We kept those on a well-sealed-off private network.

Woe betide you if you run into one of the BMC implementations that shares a host network interface; no separate cable! These things are terrible from a security standpoint.


I hope you disabled OS passthrough, because that could get gory.


The S in BMC stands for "Security".


I was responsible for trying to improve the management and operation of a large fleet of BMCs for a while. Plenty of bugs, and the pace of releases is slow. :(

Definitely an area where a more open ecosystem would improve the pace of innovation.


We need a Linux for BMCs. Oxide is working on one, but I'd like to see a contender fielded from the seL4 community, along with some other folks. For example, why doesn't Wind River have one already?


OpenBMC is the leader at this point.


Why linux? Why not *BSD?


> Why linux? Why not *BSD?

GPL can force manufacturers to cooperate with users. Of course, they can still use closed source binary modules and userland programs ...


That sure sounds appealing to manufacturers.


Once it reaches critical mass it is appealing because the value you receive from the community is far greater than what you have to give back.


If the point is code diversity, what stops you producing a fork of BSD for BMCs under the GPL?


GPL can also turn manufacturers away. I would rather have variation in the possible BMC operating systems instead of sticking linux everywhere and contributing to a monoculture.


...which you wouldn't get with BSD, as they'd just close it down and ship you a binary, since they would have no obligation or reason to do otherwise.

The BSD "freedom" is not for the user of the software; it's for corporations to take.


And how would this affect the original open BSD-based BMC firmware which would be available? If the corporations want to maintain their own fork, it's on them.

The original BSD release will stay open and free for everyone to utilize.


How can you trust that a closed source OpenBSD fork is secure when there's no way to audit the quality (or lack thereof) in the firmware the vendor gives you?

If it's GPL you can at least interrogate the code release and make an informed decision


The same way I can trust all the binary blobs vendors stick into GPL-licensed systems. How do I make an informed decision about those?


Software diversity is a luxury customers won't pay for.


Sometimes the bugs end up at a security conference.

https://airbus-seclab.github.io/ilo/BHUSA2021-Slides-hpe_ilo...


Actually, why aren't they literally normal little computers? Like, fully open, bring your own OS computers. All it needs is some peripherals - Ethernet, its own host USB, gadget mode USB to present mouse/kb/storage to the main computer, video capture card, some GPIO to control power - but there's nothing all that special there; then you just install Debian or w/e and control the perfectly standard interfaces at will.


The interface between the BMC and motherboard is unique to each motherboard, especially for "added value" features that some servers have. DC-SCM is working on standardizing this but I don't know how interoperable it will be.


Personally I have some PiKVMs at home. And while they're not at full parity with iDRAC Enterprise, and it takes some extra effort to get ATX control, I have STRONGLY been considering dropping them in our DC to at least replace the Dell IP KVM that frankly is a security nightmare.


It may not be "enterprise" enough for a given employer, but it's not hard to replicate most of this functionality with cheap (and open) hardware. For example, I had a case that called for a BMC that I resolved with a spare Raspberry Pi 3B, a very cheap capture device, and a "smart plug". Total cost of materials was about 30 euros, and (for me) it wasn't any harder to operate than an iDRAC.



It would also be sooo much easier to automate everything if it just ran a slightly customized Debian install instead of... whatever the fuck abomination the manufacturer made.


To add to this, manufacturers (HPE) are now requiring an expensive annual fee in order for customers to use the hardware.


I hear that many BMCs internally just run Linux.


The repo claims that the servers themselves throttle the GPUs, but isn't it the GPUs themselves that can throttle, or maybe the OS? Neither of those is controlled by the server (hopefully), so is there a different system at play here?


I can actually answer this (it's how I stumbled onto the repo): it's done through a signal from the motherboard called PWRBRK (Power Brake), pin 30 on the PCIe connector. It tells the PCIe device to stay in a low-power mode; in the case of Nvidia GPUs that's about 50W (300MHz out of 2100MHz in my case).

You can check whether it's active using `nvidia-smi -q | grep Slowdown`, as shown in the post.
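
A slightly fuller check, if it helps (field names as of recent drivers; they may differ on older ones):

    # throttle reasons; "HW Power Brake Slowdown : Active" means PWRBRK is asserted
    nvidia-smi -q -d PERFORMANCE | grep -i -E "power brake|slowdown"

    # compare current vs. maximum SM clock to see the effect of the throttle
    nvidia-smi --query-gpu=name,clocks.sm,clocks.max.sm --format=csv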


No, that's controlled by the server: try `lspci -vv` on any Linux system. Look at the link speed and width, e.g. `LnkSta: Speed 8GT/s, Width x2` (x2 means 2 lanes).

Try:

`sudo lspci -vv | grep -P "[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f]|downgrad" |grep -B1 downgrad`

Besides the speed, you can have another problem with lane limitations.

For example, AMD CPUs have a lot of lanes, but unless you have an EPYC, most of them are not exposed, so the PCH tries to spread its meager set among the devices connected to your PCI bus, and if you have an x16 GPU, but also a WiFi adapter, a WWAN card and a few identical NVMe drives, you may find only one of the NVMe drives benchmarks at the throughput you expect.
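
To check a single suspect device, comparing what the link is capable of against what it actually negotiated is usually enough (the bus address below is an example; take yours from plain `lspci` output):

    # LnkCap = what the device supports, LnkSta = what was negotiated;
    # lspci appends "(downgraded)" when speed or width is below capability
    sudo lspci -vv -s 01:00.0 | grep -E "LnkCap:|LnkSta:"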


> For example, AMD CPUs have a lot of lanes, but unless you have an EPYC, most of them are not exposed, so the PCH tries to spread its meager set among the devices connected to your PCI bus, and if you have an x16 GPU, but also a WiFi adapter, a WWAN card and a few identical NVMe drives, you may find only one of the NVMe drives benchmarks at the throughput you expect.

Most AM4 boards put an x16 slot direct to the CPU, and an x4 direct linked NVMe slot. That's 20 of the 24 lanes; the other 4 lanes go to the chipset, which all the rest of the peripherals are behind. (There's some USB and other I/O from the cpu, too). AM5 CPUs added another 4 lanes, which is usually a second cpu x4 slot.

Early AM4 boards might not have a cpu x4 NVMe slot, and those 4 cpu lanes might not be exposed, and the a300/x300 chipsetless boards don't tend to expose everything, but where else are you seeing AMD boards where all the CPU lanes aren't exposed?


> Early AM4 boards might not have a cpu x4 NVMe slot, and those 4 cpu lanes might not be exposed, and the a300/x300 chipsetless boards don't tend to expose everything

I'm sorry, I oversimplified, and said "most of them" while I should have said "not all of them" as 20/24 is more correct for B550 chipsets (the most common for AM4) instead of trying to generalize.

Your explanation is more correct than mine.

For anyone who might want extra details about the number of lanes per CPU, https://pcguide101.com/motherboard/how-many-pcie-lanes-does-... is a good read that shows the difference for APUs.


I'm still not quite sure what you're trying to say?

Lanes behind the chipset are multiplexed, and you can't get more than x4 throughput through the chipset (and the link speed between the cpu and the chipset varies depending on the chipset and cpu). But that's not a problem of the CPU lanes not being exposed, it's a problem of "not enough lanes" or more likely, lanes not arranged how you'd like. On AM4, if your GPU uses x16, and one NVMe uses x4, then everything else is going to be squeezed through the chipset. On AM5, you usually get two x4 NVMe slots, but again everything else is squeezed through the chipset; x670 is particularly constrained because it just puts a second chipset downstream of the first chipset, so you're just adding more stuff to squeeze through the same x4 link to the CPU.

Personally, I found that link to be more confusing than just reading through the descriptions on wikipedia for a particular Zen version. For example https://en.wikipedia.org/wiki/Zen_3 ... just text search in the page for "lanes" and it explains for all the flavors of chips how many lanes, and how many go to the chipset. Similarly the page for AMD chipsets is pretty succinct https://en.wikipedia.org/wiki/List_of_AMD_chipsets#AM5_chips...


There's a reason why so many motherboard makers avoid putting a block diagram in their manuals and go for paragraphs of legalese instead, and laziness is only half of it.


> Most AM4 boards put an x16 slot direct to the CPU, and an x4 direct linked NVMe slot. That's 20 of the 24 lanes; the other 4 lanes go to the chipset, which all the rest of the peripherals are behind. (There's some USB and other I/O from the cpu, too). AM5 CPUs added another 4 lanes, which is usually a second cpu x4 slot.

Mine just go to a second NVMe, weirdly enough.


I meant to say the additional x4 is usually a second cpu x4 [NVMe] slot. Not a pci-e x4 slot.


> For example, AMD CPUs have a lot of lanes, but unless you have an EPYC, most of them are not exposed, so the PCH tries to spread its meager set among the devices connected to your PCI bus, and if you have an x16 GPU, but also a WiFi adapter, a WWAN card and a few identical NVMe drives, you may find only one of the NVMe drives benchmarks at the throughput you expect.

example from my X670E board:

* first NVMe = x4 Gen 5

* second NVMe = x4 Gen 4

* 2 USB ports connected to the CPU (10/5 Gbit)

and EVERYTHING ELSE goes through an x4 Gen 4 PCIe link, including 3 more NVMe slots, 7 SATA ports, a bunch of USB ports, a few x1 PCIe slots, networking, etc.


PCIe devices can only draw limited wattage until the host clears them for higher power. There is also the separate power brake mechanism (an optional part of PCIe) mentioned in the article, which was proposed by Nvidia for PCIe, so it seems likely their GPUs support it.


There are a number of valid thermal-dissipation reasons why you don't want to load the heat-producing parts in a 1U server beyond what it was designed for.

This article doesn't mention what the max TDP of each GPU is, which makes me suspicious. Or things like the max TDP of the CPUs (such as when running a multi-core prime-number stress benchmark to load them to 100%) combined with the total wattage of the GPUs.

If you have never built an x86-64 1U dual-socket server from discrete whitebox components (chassis, power supply, 12x13 motherboard, etc.), this is harder to understand intuitively.

I would recommend that people who want four powerful GPUs in something they own look at more conventionally sized server chassis, 3U to 4U in height, or a tower format if it doesn't need to be in a datacenter cabinet somewhere.


I'm dealing with something similar. I wanted to use Redfish to clear out hard drives, but storage is not standardized across vendors. Dell has a secure erase, HPE Gen10 has Smart Storage, and anything older doesn't expose any useful functionality in its Redfish API. What a mess. So I need to use PXE booting and probably WinPE to do this.
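
For what it's worth, the Redfish Drive schema does define a standard SecureErase action; the mess is whether (and where) each vendor implements it. A rough sketch against an iDRAC, with the hostname, credentials, and drive URI as placeholders:

    # list storage controllers, then follow the Drives links in the response
    curl -sk -u root:calvin \
      https://idrac.example/redfish/v1/Systems/System.Embedded.1/Storage

    # on drives that expose the standard action, secure erase is a POST to
    # <drive URI>/Actions/Drive.SecureErase with an empty JSON body
    curl -sk -u root:calvin -X POST -H "Content-Type: application/json" -d '{}' \
      "https://idrac.example<drive URI>/Actions/Drive.SecureErase"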


Dell is peak asshole design. They also blast fans at full speed if you install GPUs that you didn't buy from them. Fuck them.


In case you're referring to their servers (from https://dl.dell.com/manuals/common/poweredge_pcie_cooling.pd... ):

> The automatic system cooling response for third-party cards provisions airflow based on common industry PCIe requirements, regulating the inlet air to the card to a maximum of 55°C. The algorithm also approximates airflow in linear foot per minute (LFM) for the card based on card power delivery expectations for the slot (not actual card power consumption) and sets fan speeds to meet that LFM expectation. Since the airflow delivery is based on limited information from the third-party card, it is possible that this estimated airflow delivery may result in overcooling or undercooling of the card. Therefore, Dell EMC provides airflow customization for third-party PCIe adapters installed in PowerEdge platforms.

You need to use their RACADM interface to update the minimum LFM for your card.
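
Roughly along these lines, using the System.PCIeSlotLFM attribute group Dell documents for this (the slot index and LFM value are placeholders; accepted values vary by platform and iDRAC version, so check the `get` output and Dell's docs first):

    # inspect the airflow settings for the slot your card sits in
    racadm get System.PCIeSlotLFM.3

    # switch the slot from automatic to a custom airflow target
    racadm set System.PCIeSlotLFM.3.LFMMode Custom
    racadm set System.PCIeSlotLFM.3.CustomLFM 300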


HP is way worse than Dell. Fans at full speed I can handle. Servers permanently throwing errors because a part isn't HP-branded, that's peak asshole design. So is refusing to report the status of drives if I use a third-party drive sled (which is some folded metal and some LEDs on a flex PCB) AND throwing errors about them.


Lenovo soft-bricks their workstations if they detect non-branded cards in them.

No upgrading my WiFi, that's a no-no!


The WiFi thing is slightly understandable because the FCC requires you to limit your radiated emissions, and when you do the certification you have to control the entire configuration to pass testing, which means your radio is paired with the antenna cable and antenna in the body of the device. Allowing people to replace the radio without the paired antenna cable and antenna could push radiated emissions outside the limits allowed by the FCC. It's dumb in practice but at least somewhat understandable in principle.


If you sell me something FCC-compliant and I modify it to make it noncompliant, isn't that entirely on me? Why do you need to try to stop me?


And not just stop you or refuse to activate the card, but to intentionally cause the hardware I purchased to irreparably malfunction as long as I attempt to use something you didn't sell me.

Of course, I can use a USB WiFi card, no problem. It's just the convenient PCIe slot inside the system, specifically designed to support a wide variety of functions besides WiFi, that gets locked down in the BIOS, just in case I wanted to use a third-party card of a type you don't happen to sell.


I’ve never heard this explanation before and I don’t buy it. Transmission power is constant regardless of the antenna used.


Yeah, it's 100% pure fresh from the ass BS.

Lenovo did that to reduce their customers' ability to buy PC expansion devices from other manufacturers, in hopes of making more aftermarket sales in the future.

It was a pure greed tactic.


It used to be very common across all OEMs, at least in the consumer space. I remember having to flash a modded BIOS on my 2012 HP laptop so it would boot with a WiFi + BT card. Back in those days it was uncommon for a laptop to have BT.


Hmm I've got an HP Z6 that doesn't seem to care?

Are you sure this isn't a "1U rack will fry itself if you put a space heater inside it" type of thing?


5U system, an ML350 Gen9. I ignore the errors, but of course that means that if a real error pops up there, I won't know. It's a lower-urgency server of my own, so it's OK, but it would be annoying in production as well.

I see the Z6 is a workstation unit; they're going to be more flexible there.


Don't forget how Dell included/includes DRM in their laptop chargers to prevent customers from buying cheap aftermarket replacements. Of course the wire for the DRM functionality is as thin as possible and is always the first thing to break.


They don’t prevent you from using aftermarket chargers. They just display a warning, which can be disabled in the BIOS.


They also set the CPU speed to minimum.


I don't know what it's like currently, but when I stopped using Dell laptops a few years ago they wouldn't even boot if the battery was dead and you plugged in a third party charger or a charger with a busted data cable.


Other manufacturers do worse and prevent boot if your PCI IDs aren't on a positive list.

This is present on ThinkPads, for example, and while you used to be able to patch the BIOS, Intel Boot Guard now prevents you from doing that "for your own protection" :)

I hope the MSI leak contains actual Boot Guard keys for Intel 11th gen+ and can be used to allow "unauthorized" PCI modules on modern ThinkPads!


I'd love to buy an AMD Ryzen 3 PRO 5350G, but I don't want to deal with the stupid locked CPUs floating around from Lenovo pulls.


What worked for some older Think machines (I haven't tried it on my ThinkPad though) is to update the BIOS (it can be to the same version) and change the serial number to all zeroes (the update script asks if you want that). That got rid of the WiFi whitelist I encountered.


Very interesting!

Could you please explain which serial? Can you run dmidecode and tell me which handle/UUID is all zeroes?

Even if it may not directly apply to current ThinkPads, it implies the UEFI module might have other conditionals before it goes on to check the positive list - something that should be easy to verify by reversing the LenovoWmaPolicyDxe.sct PE32.
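
For reference, here's how to dump the identifiers I mean on the machine itself (standard dmidecode keywords):

    # serial number and UUID as reported by the firmware
    sudo dmidecode -s system-serial-number
    sudo dmidecode -s system-uuid

    # or the full System Information record (DMI type 1), handles included
    sudo dmidecode -t 1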


If you're interested, I have an unmodified dump of my ThinkPad's BIOS flash and the same version I paid some Russian guy to modify so that the whitelist is removed and an extra menu is unlocked. I basically sent them the dump, and after a "donation" I got the patched dump back.

Would love to learn how to figure out what the changes are because with my limited knowledge I didn’t figure anything out.


Which thinkpad model?

If you want to learn, https://erfur.github.io/2019/03/28/down_the_rabbit_hole_pt3.... is a good guide!


Thanks, I am so going down this rabbit hole.

It's for an X230, I wanted 802.11ac wifi in it.


If you are interested in doing other modifications yourself, also read about UEFI variables, how they control menu options, and how they can be tweaked with just GRUB.


The X230 is easy: just install coreboot instead of messing with BIOS patches.


I ran coreboot for a while but had issues with USB 3 and the docking station, so I unfortunately had to revert to the Lenovo BIOS.


Boot Guard keys are OEM-specific.


Commentary on the MSI leak, like https://sizeof.cat/post/leak-intel-private-keys-msi-firmware... , suggested the Boot Guard keys may be common to other manufacturers: "It is assumed that the keys for downloading the guard are not limited to compromising MSI products and can also be used to attack equipment from other manufacturers using Intel’s 11th, 12th and 13th generation processors (for example, Intel, Lenovo and Supermicro boards are mentioned)"


Those are boards that MSI OEMed for other vendors.


So servers and desktops only, which crushes my hope of getting rid of Boot Guard on my laptop.


Restrictive nonsense seems common in the server space, unfortunately. HPE does similar things. IIRC they disable certain features if you use hard drives that aren't HPE "approved".


Dell also does this with their EMC storage arrays; it's meant to push you toward their pro services. You are supposed to tell the array to order drives for you from pro services, and someone from some nameless MSP contracted with Dell installs them for you at a 10x markup.


Are there even any good server vendors? Dell, HPE and Lenovo do their lock-in shit. Supermicro's BMC is pretty bad. xFusion is totally-not-Huawei-I-pwomise. There are a few more that come to mind, but they're all niche players (HPC and the like) and don't really do sales on a small scale.


> E.g. it is well known that adding a third-party PCIe NIC makes fans run at the maximum speed.

From the article


I observed this behavior in a Dell system about a decade ago, but based on experience over the last 5 years or so with PowerEdge servers, installing a third-party GPU no longer triggers the (extremely loud) maximum fan-speed response.


Even if that's the case, the damage is done.

I still tell people about the capacitor plague and how we should have class-actioned Dell out of existence over it.


Why Dell specifically? Wasn't all hardware that used the cheaper capacitors affected?


Because they were the most well known of the affected companies and they didn't really do much to repair the goodwill the issue cost them.

Sure, they "put up $300 million for repairs" according to https://www.theguardian.com/technology/blog/2010/jun/29/dell..., but I bet the lion's share of that went to their largest purchasers, so the people who blew their school's IT budget on Dell computers were just SOL.


They also stop producing laptop batteries after a few years while refusing to let the laptop charge third-party battery replacements, significantly limiting the useful life of their laptops.


You can actually call a small-business rep and get them to order one for you. They still produce them, or at least have them in stock; they just don't sell them online.


Isn't that illegal? Hijacking your customer's PC if they install something that isn't from you?


Nominally it's done for your protection. Lowering power (clocks) and managing the heat load (fast fans) for unapproved gear prevents things from dying and getting people REALLY mad, and it likely reduces warranty claims.



