Hacker News new | past | comments | ask | show | jobs | submit | maxwell86's comments login

> Accessing InfiniBand and GPUs directly become a problem.

I use nvidia containers on HPC systems every day and accessing NICs, doing RDMA to GPUs, etc. "just works" and performs as well as baremetal. Every time we upgrade our container we verify the new container with a set of benchmarks against both the old one and baremetal.

> You don’t want to give indirect root access via docker group, too.

I don't know of any HPC center using Docker, though. It does not sound like a good idea, because the Docker daemon runs as root.


> exactly for how they implemented and supply a 12V solution (which includes both the physical products they supply, and the messaging they've put out around it) which this article yet again underlines as being both real, and even worse than initially thought.

I'd recommend people gloating to read the article. EDIT: (see my edit below).

The conclusions are clear:

- The problem is NOT the new connection; that's fine. New PSUs come with a connector that does not need any adaptor, and those are safe and work fine.

- The problem is a poor-quality adaptor shipped with 4090s for people who buy a $1600 GFX card but then skimp on a new PSU and want to pair it with an old one (EDIT: "skimp" is out of place and victim blaming; it would have been more appropriate to say here that NVIDIA and partners decided to add an adapter to avoid suggesting that users need a new PSU).

These adaptors are distributed by NVIDIA but built by a supplier. Igor's recommendation is, I quote: "NVIDIA has to take its own supplier to task here, and replacing the adapters in circulation would actually be the least they could do."

EDIT: This comment can be misunderstood as me speculating about whether the OP read the article. I am not speculating: the OP did not read the article, which claims the opposite of what the OP claims. The OP claims that 12V solutions are the issue, while the article states that they are fine, and as proof shows that new PSUs implement them correctly. In fact, the _goal_ of the article is to set the record straight about this, by clarifying that the only problem is the quality of the adapter, not 12V per se. So this comment is not speculation about whether the OP read the article, but a response to set the record straight for those who might read the OP's comment but not the article (I often come to HN for the comments more than the articles, so I'd find such a comment helpful myself).


"Skim on a new PSU" sounds like people are cheaping out or something. Many people already have a more than sufficient PSU and replacing it just for another plug is a waste of natural resources.

NVidia should just include an adaptor that's not a fire hazard. The consumers are not to blame here.

Ps I think you mean "skimp"


Agreed, not sure what other words to use instead.

I think it would have been better for these cards to not have an adapter at all. I've added an EDIT to try to word this differently.


Other commenters are claiming that NVIDIA KNEW the adapter had an issue with melting and/or catching fire. If that's true, I still think NVIDIA has 100% liability.

If it was late in the development cycle that this was discovered, then the proper thing to do would have been to delay the release, or just not include adapters and offer them later. It would have been a minor PR hit, but not nearly as bad as shipping adapters known to be faulty.


> Other commenters are claiming that NVIDIA KNEW the adapter h

Those refer to a PCI Express Forum issue that was opened about the connector drawing too much power, not the adapter.


Why would I waste a perfectly good PSU just because Nvidia can't make adapters that don't melt?


TBH NVIDIA should just have not included any adapter at all.


Why?


But does the inclusion of a shoddy adapter not just encourage the usage of the older PSU?


The problem isn't with older PSUs; they can work fine with a good adapter.

The problem is with the adapter design: it is not just one bad choice, but multiple layers of negligence that compound the issue.

* The adapter has 6 pins all bridged together by thin connections that can break.

* 4 heavy gauge wires are attached to those pins with a surface mount solder joint. They're not through hole soldered which would provide more contact AND far greater strength. They're not crimped which would provide the best contact and strength.

* There's no strain relief. So if you hold the cables close to where they're soldered to the connector and flex it you can easily break those surface mount solder joints.

* Because the 6 pins / 4 wires are all bridged asymmetrically, some bridges have more current passing through them and if they're fatigued or damaged they'll have higher resistance. Higher resistance means more heat.

Overall it's just really poorly engineered on multiple levels. It's diarrhea poured over-top an open face shit sandwich.
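For intuition, here is a back-of-envelope sketch in Python of why a fatigued joint runs hotter. The resistance and current-split values are purely illustrative assumptions (the article publishes no such measurements); only the P = I²R relationship is the point:

```python
# Heat dissipated in a single solder bridge carrying current I is P = I^2 * R.
# All numbers below are illustrative assumptions, not measured values.

def joint_heat_w(current_a: float, resistance_ohm: float) -> float:
    """Power dissipated as heat in one joint/bridge, in watts."""
    return current_a ** 2 * resistance_ohm

# A 600 W load at 12 V draws 50 A; suppose (hypothetically) one bridge
# carries a quarter of it.
bridge_current_a = 600 / 12 / 4  # 12.5 A

healthy = joint_heat_w(bridge_current_a, 0.002)   # assume ~2 mOhm contact
fatigued = joint_heat_w(bridge_current_a, 0.020)  # 10x resistance after damage

print(f"healthy joint:  {healthy:.2f} W")   # 0.31 W
print(f"fatigued joint: {fatigued:.2f} W")  # 3.12 W -- 10x the heat
```

At the same current, heat scales linearly with resistance, so a damaged bridge concentrates the extra dissipation right at the weakest point.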


Same conclusion here. Pushing 500+ watts through surface-mount solder joints is pure negligence.


Yep. In general whenever you rely on a solder joint to provide mechanical strength or support, you're in a state of sin. Running 600 watts through such a connection is pure negligence.


Yes. They should not have included any adapter at all.

Those with older PSUs should have had to make a decision about whether they want an adapter or not, and then pick an adapter of the appropriate quality.


I'd recommend people gloating to read the article.

This is a neat way to parallelize and scale up this swipe but it's still the same swipe:

Please don't comment on whether someone read an article. "Did you even read the article? It mentions that" can be shortened to "The article mentions that."

https://news.ycombinator.com/newsguidelines.html


Thanks. I've added an EDIT to clarify that I am not speculating about whether the OP read the article, and that the only point of my comment is to set the record straight for those who often just come to HN for the comments (like myself).


Just take it out. You don't know who has and hasn't read the article, how much they are 'gloating' and it's one of the oldest known bad tropes of internet forums which is why it's in the guidelines. Your comment only gets better without it.


> Stand on the right of escalators

This surprised me when visiting the UK.

I actually expected people to stand on the "left" of the escalators - just like how they drive on the left and overtake on the right - but they didn't. They stood on the right.

I was left puzzled ???


Walk on the left, which is enabled by people standing on the right. Quite simple IMO, but maybe that's just because it's ever been thus!


That's the opposite of all other traffic rules in the UK.

Cars stop on the left, which allows other cars to overtake them on the right.


> How likely is it that any given node would be mapping out more than 2^64 bytes worth of virtual pages?

In the Grace Hopper whitepaper, NVIDIA says that they connect multiple nodes with a fabric that allows them to create a virtual address space across all of them.


Do you have some scientific references on synchronous concurrency?


This paper: https://past.date-conference.com/proceedings-archive/2017/py... is on Sequentially Constructive Synchronous Concurrency which is also the base for the synchronous language Blech (https://www.blech-lang.org/)


Rust already has a GCC backend that can do all that.

This post is about a new front-end.


The rust compiler can already use GCC as a backend. So I don't see how this opens more platforms than that.


Because that doesn't use the GCC frontend interface, it requires build tools and embedded toolchains to be modified to understand the rust compiler interface. By using the GCC frontend, it's just another language that the existing toolchain can understand.


The Rust module system is radically different from C and C++ and other similar languages in the embedded space.

Every build system that has added support for Rust (and there aren't many) had to be radically modified to achieve that.

None of these supports the GCC Rust frontend, but all of them support the Rust frontend.

So if you actually wanted to build any >100 LOC Rust project for embedded targets not supported by LLVM, doing it with the Rust frontend is as easy as just running 1 CLI command to pick its GCC backend.

Doing it with the GCC frontend, would require you to either port one of the build systems to support it, or... give the GCC frontend a CLI API that's 100% compatible with the Rust frontend.


Cargo support for gccrs is part of this project:

https://github.com/Rust-GCC/cargo-gccrs

Moreover, modules are less interesting to me in embedded development; what I'm interested in is access to Rust's borrow checker, for gaining certainty about small portions of larger projects that are written in other languages.


So it's not about platform support, but about toolchain integration? Who benefits from that, projects using C/C++ who want to use a rust library? Or is it about distro package maintainers?


The toolchain support helps embedded developers, mainly. For example, Xtensa and AVR toolchains are generally byzantine monstrosities of makefiles, Python, dialog, etc., so giving them a low-effort means to consume Rust is a boon. Ideally, Rust is just another source file in the srcdir soup.

That said, gcc supports more platforms than llvm, including esoteric and unpopular desktop configurations.

Personally, I plan to drop this into marsdev as soon as it releases. Writing 32X games in rust sounds like silly fun.


Incidentally, Xtensa has hired someone for the last… year or so to make using Rust on their stuff work well.

AVR support is almost there in mainline rustc but has a codegen bug or two, last I heard.


I think you meant Espressif, Steve. They have a fork of LLVM with Xtensa support they're looking to upstream (still a few things missing, like the DSP/AI instructions in the ESP32-S3, and I think the codegen is better on GCC for now). And the folks at esp-rs, who work at Espressif along with outside contributors, maintain a Rust toolchain and standard library (based on ESP-IDF) which they also want to upstream. There's also a baremetal target which has a dedicated developer at Espressif; it's pretty amazing. Although the ESP32-S3 is going to be the last Xtensa chip from them: they're planning on moving to RISC-V wholesale, with all their products in the last year based on it. Ferrous Systems even designed a Rust-specific devkit based on their RISC-V ESP32-C3, to teach embedded Rust on.

I bet Cadence was ripping them off for the IP, which is a shame…

Any talks in Oxide about porting Hubris to RISC-V? I hear getting your hands on Cortex-M*s in bulk is still pretty challenging these days.


Ahh whoops you’re right, lol. Embarrassing.

Hubris was designed to be easy to port to RISC-V, so yes! We didn't end up doing it ourselves, but someone else did! https://github.com/oxidecomputer/hubris/discussions/365


This is not really a problem.

While it is true that you can only use masks from v0, and this requires moving a mask into v0 after generating it with a vector instruction, those moves don't actually copy data from one register to another. Instead, they just "rename" registers.

So...

    ...         <- generate mask into v2
    mov v0, v2  <- move mask into v0
    vadd ...    <- vector instruction, always uses v0
doesn't really put some bits into v2, then copy them to v0, and then call the vector instruction.

Instead, the mov v0, v2 just disappears due to a register rename (e.g. v2 gets renamed as v0 for vadd), and vadd picks the mask directly from the register that was previously called v2 but is now called v0.

Any CPU would implement register renaming before even thinking of adding vector registers, so it is fair to assume that every CPU that implements the RISC-V V extension supports it.
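A toy Python sketch of the rename idea (purely illustrative; no real microarchitecture is this simple): the rename table maps architectural register names to physical registers, so "mov v0, v2" is just a table update and no vector data is copied.

```python
# Minimal model of move elimination via register renaming.

class RenameTable:
    def __init__(self):
        self.map = {}        # architectural name -> physical register id
        self.data_moves = 0  # count of actual data copies performed

    def write(self, arch, phys):
        """An instruction produced a result into a fresh physical register."""
        self.map[arch] = phys

    def mov(self, dst, src):
        """Reg-reg move, eliminated at rename: dst aliases src's physical reg."""
        self.map[dst] = self.map[src]

    def read(self, arch):
        return self.map[arch]

rt = RenameTable()
rt.write("v2", phys=17)  # mask generated into v2 (physical reg p17)
rt.mov("v0", "v2")       # "move mask into v0": only the table changes
print(rt.read("v0"))     # 17 -- vadd reads the mask straight from p17
print(rt.data_moves)     # 0  -- no bytes moved between registers
```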


The article says as much.

> Although this may seem like a significant drawback, it's actually not that alarming, at least not to me. While it's true that you must insert various "move mask to v0" instructions that wouldn't otherwise be there, it's important to remember that these will not really be actual computation instructions. Moves from one vector register to another will always be simple register renames handled by the front end of any high-performance chip, and I would consider it highly unlikely that you would change masks so frequently as to overburden the front end.

This is not the point the author was making.


The author claims that if you only write to the lower 32 bits of the v0 register, which could be 1024 bits wide, the hardware somehow has to allocate a 1024-bit-wide register to back them up, and then makes some "locality" arguments.

The hardware can back the 1024-bit register with a pool of 32-bit registers, and if you only wrote to the first 32 bits, with all others zero, it can use a single 32-bit register to back it up, making this "as good" as the single-mask-register solution, which the author thinks is good.
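The pooling idea above can be sketched like this (a hypothetical Python model, not a claim about how any shipping core actually works):

```python
# Model a wide architectural vector register backed by a pool of 32-bit
# chunks, allocating storage only for chunks that were actually written.

class ChunkedVectorReg:
    CHUNK_BITS = 32

    def __init__(self, width_bits=1024):
        self.width_bits = width_bits
        self.chunks = {}  # chunk index -> 32-bit value; missing chunks read as 0

    def write_low32(self, value):
        """Write only the low 32 bits (e.g. a short mask)."""
        self.chunks[0] = value & 0xFFFFFFFF

    def read_chunk(self, index):
        return self.chunks.get(index, 0)

    def chunks_allocated(self):
        return len(self.chunks)

v0 = ChunkedVectorReg(width_bits=1024)
v0.write_low32(0b1011)        # a 4-bit mask in the low bits
print(v0.read_chunk(0))       # 11 (0b1011)
print(v0.read_chunk(31))      # 0  -- upper 992 bits read as zero
print(v0.chunks_allocated())  # 1  -- only one 32-bit chunk backs it
```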


Determining that you only wrote to the bottom 32 bits of the register being copied from is hard for hardware to see; and even if the compiler can see it, it has no way to tell the hardware.


I know of a lot of embedded designs which do not do register renaming.


I doubt any of those are implementing vector instructions. If they do, then renaming is a relatively small addition in comparison.


Vector instructions are replaced with DSP instructions for embedded applications. Non-speculating SoCs with DSPs seem pretty standard?


> The elevation of the room where the rack is installed must be below 9,842 feet (3,000 meters).

Why?


> even if you have a PhD many will see you as "less than."

Inside academia and outside of it.

Inside academia you are not on a tenure track or similar, and will have to put up with a lot.

Outside academia, your peers will be making 3x or more what you make, working fewer hours, with less stress, etc.

The reason RSE jobs are hard to fill, and often aren't even opened, is that they don't make sense. If you are good enough for an RSE job, you will be good enough for research positions at FAANG. Those pay 10x more, so you need someone willing to turn down that 10x pay, and also willing to work double the hours.

RSEs making reasonable pay for the skills they require make no sense either, because that would put their pay at 2x that of professors, etc.


I'm curious which company pays an RS 10x more than an SE/RSE. I've found it's usually SEs/RSEs that make a bit more than RSs, but I've never seen an RSE comp significantly outweigh an SE's unless it's in ML/crypto.

On that note I wish levels.fyi had RS and RSE roles...


This isn't totally crazy but it's only if you stretch the facts enough. An RSE at Oxford will earn 32k GBP, like a postdoc. In theory, that person could be so good they could get the highest starting salary possible at a Tier 1 paying company like a research engineer at Hudson River Trading, which can be > 320k TC. So it's possible but only for a very very small number of people. 3x-5x is much more likely.


> highest starting salary possible at a Tier 1 paying company like a research engineer at Hudson River Trading, which can be > 320k TC.

You don't have to be at a "tier 1 paying company" to exceed 320k TC; that's easily achievable at any tech company that has gone public in the NYC metro or Bay Area. TC at a FAANG (MANGA or whatever they're calling it) for a more senior IC role can easily cross the $500k mark. If you've been there a while and have been accruing shares that have increased dramatically in value, crossing the seven-figure mark is not unheard of.

So you don't have to stretch the facts too much to realistically achieve the 10x. Take two very qualified engineers graduating with a PhD: one chooses to go to Google and stay there, the other goes to a university and works as an RSE. Fast forward a decade, and you'll definitely be seeing a 10x difference in total comp.


Oh after a decade sure. I was just interviewing with Meta for > $500k TC so totally believe it. I meant immediately after a PhD.


10x more than academia. I’m a PhD in mineral physics turned Big Four MLE via startups and I earn about 10x what I would be on if I had stayed on the academic track.


10x is possible, but maybe 3-5x is more realistic (even for non-FAANG).

