
Sigh. It's great that these container images exist to give people an easy on-ramp, but they definitely don't work for every use case (especially once you're in embedded, where space matters and you might not be online to pull multi-GB updates from some registry).

So it's important that vendors don't feel let off the hook to provide sane packaging just because there's an option to use a kitchen-sink container image they rebuild every day from source.




I know it's still different from what you're looking for, so you probably already know this, but many projects like this have the Dockerfile on GitHub, which shows exactly how they set up the image. For example:

https://github.com/RadeonOpenCompute/ROCm-docker/blob/master...

They also have some for Fedora. Looks like for this you need to install their repo:

    curl -sL https://repo.radeon.com/rocm/rocm.gpg.key | apt-key add - \
    && printf "deb [arch=amd64] https://repo.radeon.com/rocm/apt/$ROCM_VERSION/ jammy main" | tee /etc/apt/sources.list.d/rocm.list \
    && printf "deb [arch=amd64] https://repo.radeon.com/amdgpu/$AMDGPU_VERSION/ubuntu jammy main" | tee /etc/apt/sources.list.d/amdgpu.list \
then install Python and a couple of other dependencies (build-essential, etc.), and then the package in question: rocm-dev.
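For reference, the rest of that Dockerfile boils down to roughly this (package names are from memory of the ROCm images, so treat them as approximate):

    apt-get update \
    && apt-get install -y python3 python3-pip build-essential libnuma-dev \
    && apt-get install -y rocm-dev \
    && rm -rf /var/lib/apt/lists/*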

So they are doing the packaging. There might even be documentation elsewhere for that type of setup.


Oh yeah, I mean... having the source for the container build is kind of table stakes at this point. No one would accept a 10gb mystery meat blob as the basis of their production system. It's bad enough that we still accept binary-only drivers and proprietary libraries like TensorRT.

I think my issue is more with the mindset that it's okay to have one narrow slice of supported versions of everything that are "known to work together", ship exactly that slice in the container, and leave you immediately pooched the moment you step outside it.

This is not hypothetical, btw. I've run into real problems around it with libraries like gproto, where tensorflow's bazel build pulls in an exact version that's different from the default one in nixpkgs, and then you get symbol conflicts when something links against the tensorflow C++ API while also linking another component that uses the default gproto. I know these problems are solvable with symbol visibility control and the like, but that stuff is far from universal and hard to get right, especially if the person setting up the library's build rules doesn't themselves use it in that kind of heterogeneous environment (like, everyone at Google just links the same global proto version from the monorepo, so it doesn't matter).
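(If anyone wants to see the collision for themselves, one rough way is to dump the protobuf symbols each shared object exports and compare them; the library names below are just placeholders for whatever your build produces:)

    # list the protobuf symbols each library defines (paths/names are placeholders)
    nm -D --defined-only libtensorflow_cc.so | c++filt | grep 'google::protobuf' | sort > tf_syms.txt
    nm -D --defined-only libother_component.so | c++filt | grep 'google::protobuf' | sort > other_syms.txt
    comm -12 tf_syms.txt other_syms.txt   # symbols defined in both are candidates for ODR/version clashes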


> I think my issue is more with the mindset that it's okay to have one narrow slice of supported versions of everything that are "known to work together", ship exactly that slice in the container, and leave you immediately pooched the moment you step outside it.

I hear you. I think docker has been a plague on the quality of software. It's allowed "works for me" to become the norm, except it's now pronounced "works on the official docker image". It seems to be especially true in the ML sphere, where compiling things is so temperamental that there are a lot of binaries being distributed.

Docker was meant to be a deployment platform, not a distribution medium.


I don't know what world you live in, but this is a problem for any software development.

You need to ensure that there is only one version of any library used globally throughout the code and that the set of versions is compatible with each other, and preferably you also want everything to be built against the same toolchain with the same flags.

That usually means onboarding third-party libraries into your own build system.


I'd say with semver becoming far better known, this is not a problem for "any" software development. The developer gets to choose libraries that are stable, which often also influences the choice of language. Mistakes happen (Guava broke the Java ecosystem for about two years), but that's never accepted as just a fact of software development; it's treated as a mistake.

Wanting to hold the Python+C ecosystem more accountable is fair, I think. At least in my own experience from around half a year ago, Anaconda doesn't work and you need a Dockerfile for any sort of reproducibility, which has its own issues since GPU access with docker isn't that easy. That would mean developers from the vendors working with Anaconda, for example, on solving the issue rather than just hoping for contributors to do it. If AMD were to make easy, reproducible builds without root or a VM a reality, that would be reason enough to try their hardware. If not, hopefully Nvidia does, and then there really would be no way across the moat, for me at least.
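(For what it's worth, the GPU-in-docker part mostly comes down to passing the right devices/runtime through; roughly like this, with the image names only as examples:)

    # AMD/ROCm: expose the kernel driver interfaces to the container
    docker run -it --device=/dev/kfd --device=/dev/dri \
        --group-add video --security-opt seccomp=unconfined \
        rocm/dev-ubuntu-22.04

    # Nvidia: needs nvidia-container-toolkit installed on the host
    docker run -it --gpus all nvidia/cuda:12.2.0-runtime-ubuntu22.04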


Semver is a joke and doesn't work. Languages like C and C++ can easily have problems if you link code built with different versions together (even if you aim for them to be compatible, or even if they are indeed the same source version but with subtly different flags), and there are no good solutions for this, except not doing it.
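A concrete (hypothetical) example of the "same source, different flags" failure is libstdc++'s dual ABI: build a library and its consumer with different settings of _GLIBCXX_USE_CXX11_ABI and anything with std::string in its interface stops linking:

    # foo.cpp defines `void frob(const std::string&)`, app.cpp calls it (both files hypothetical)
    g++ -c -D_GLIBCXX_USE_CXX11_ABI=0 foo.cpp -o foo.o
    g++ -c -D_GLIBCXX_USE_CXX11_ABI=1 app.cpp -o app.o
    g++ app.o foo.o -o app
    # => undefined reference to `frob(std::__cxx11::basic_string<char, ...> const&)'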

A docker container is not really any different from any other process; the main difference is that it runs in a chroot pretty much.


> problems if you link code built with different versions

But that has nothing to do with semver.

Semver gives you information about when you can replace one version with another version. It doesn't promise that you can mix multiple versions together.


It gives you information about intent, not reality.

And you are mixing multiple versions if you are building against version x.y and linking against version x.(y+z).


Maybe I misunderstood "built with", because I thought you were talking about the compiler version there. I know semver is just intent, but the intent doesn't even touch mixing internal data from multiple versions.

If linking against a different version of the code breaks like that, that sounds like someone did semver wrong. If that happens a lot to you, then oh, I'm sorry about that happening.


Every versioning scheme necessarily describes intent, not reality.


This would be a job for Guix. Much better than docker, and exportable to a lot of formats. Or just build a VM from the CLI, an ad-hoc environment, a Docker export, or a direct rootfs to deploy and run on any compatible machine.
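(Roughly, and from memory of the Guix CLI, the same package set can be turned into each of those targets; the package and file names here are only illustrative:)

    guix shell python python-numpy            # ad-hoc environment in the current shell
    guix pack -f docker python python-numpy   # tarball you can feed to `docker load`
    guix system vm my-os.scm                  # script that boots a QEMU VM of the declared system
    guix system image my-os.scm               # disk image / rootfs you can deploy directly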


It's not a universal problem. A lot of modern languages allow multiple versions of a library to be pulled into the same code base through different dependency paths (e.g. nodejs, rust). It's not a perfect answer by any means, but it's nice not needing to worry about some package pulling in an inconvenient version of one of its dependencies.
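(You can see it directly, too; both toolchains will happily report duplicated versions living side by side, the package names here being just examples:)

    cargo tree --duplicates   # crates present in more than one version in the dependency graph
    npm ls minimist           # every copy of a package nested under its different dependents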

Also, just to name it, it's ridiculous that a specific graphics card manages to restrict the version of gproto that you're using. You don't have this problem with nvidia drivers, since the cuda stuff is much less fiddly. AMD needs to pull its finger out and fix the bugs in their stack that make it this fragile.


In NixOS, I can install multiple versions of libraries.

Or rather, I install no versions of libraries, because NixOS will put them all in the store in different folders and will compile the executable to use the correct path (or patch the ELF when needed).

It has an issue with pip, because it's allergic to just randomly executing things as part of package management, but pip in general is wtf.
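(Rough illustration: the versions coexist as separate store paths, and each binary's RUNPATH pins exactly which ones it resolves; the binary path below is hypothetical:)

    ls -d /nix/store/*-protobuf-*                # every protobuf version currently in the store
    patchelf --print-rpath ./result/bin/my-app   # the exact store paths this binary will load from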


Ironically I'm having this problem in a Nix build context because of the broken approach Nix takes to packaging bazel—which itself is largely a consequence of the larger issue I'm grouching about here: unbundling tensorflow's locked dependencies is very hard to do when the underlying source is written to assume it's only targeting the exact version specified in the build rules. You can't just switch it to target the gproto in nixpkgs because then you get compilation failures.


That's trivial with Guix.


> No one would accept a 10gb mystery meat blob as the basis of their production system

Well, except for cuda. Which is a massive pile of proprietary software that people are using in production anyway.


If anything, the situation with TensorRT shows that companies are absolutely willing to accept a multi-gig meat blob.


> No one would accept a 10gb mystery meat blob as the basis of their production system

Heh, if only. When working with F100s I've seen many terrible, terrible things.


I feel the same way, especially about build systems. OpenSSL and v8 are among a large list of things that have horrid build systems. The only way to build them sanely is to use some rando's CMake fork, and then it Just Works: literally a two-liner in your build system to add them to your project with a sane CMake script.


I was part of a Nix migration over the past two years, and literally one of the first things we checked was that there was already a community-maintained tensorflow+gpu package in nixpkgs. Without that, the whole thing would have been a complete non-starter; we sure as heck didn't have the resources or know-how to figure it out ourselves as a small DevOps team just trying to do basic packaging.


> So it's important that vendors don't feel let off the hook to provide sane packaging just because there's an option to use a kitchen-sink container image they rebuild every day.

Sadly, if e.g. 95% of their users can use the container, then it could make economic sense to do it that way.


> especially once you're in embedded

is this a real problem? exactly which embedded platform has a device that ROCm supports?


Robotic perception is the one relevant to me. You want to do object recognition on an industrial x86 or Jetson-type machine, without having to use Ubuntu or whatever the one "blessed" underlay system is (either natively or implicitly because you pulled a container based on it).


>industrial x86 or Jetson-type machine

that's not embedded dev. if you

1. use underpowered devices to perform sophisticated tasks

2. use code/tools that operate at extremely high levels of "abstraction"

don't be surprised when all the inherent complexity is tamed using just more layers of "abstraction". if that becomes a problem for your cost/power/space budget then reconsider choice 1 or choice 2.


Not sure this is worth an argument over semantics, but modern "embedded" development is a lot bigger than just microcontrollers and wearables. IMO as soon as you're deploying a computer into any kind of "appliance", or you're offline for periods of time, or you're running on batteries or your primary network connection is wireless... then yeah, you're starting to hit the requirements associated with embedded and need to seek established solutions for them, including using distros which account for those requirements.


fwiw CompTIA classifies an embedded engineer/developer as "those who develop an optimized code for specific hardware platforms."


> IMO as soon as you're deploying a computer into any kind of "appliance", or you're offline for periods of time, or you're running on batteries or your primary network connection is wireless

yes and in those instances you do not reach for pytorch/tensorflow on top of ubuntu on top of x86 with a discrete gpu and 32gb of ram. instead you reach for C on a micro, or some arm soc that supports bare metal or at most an rtos. that's embedded dev.

so i'll repeat myself: if you want to run extremely high-level code then don't be "surprised pikachu" when the underpowered platform you chose due to concrete, tight budgets doesn't work out.


The hardware can be fast, actually. Here’s an example of relatively modern industrial x86: https://www.onlogic.com/ml100g-41/ That thing is probably faster than half of currently sold laptops.

However, containers and Ubuntu Linux don't perform great in that environment. Ubuntu is for desktops; containers are for cloud data centers. An offline stand-alone device is different. BTW, end users typically aren't aware that the thing is a computer at all.

Personally, I usually pick Alpine or Debian Linux for similar use cases, on bare metal, i.e. without any containers.


> Ubuntu is for desktops

Tell that to their (much larger, more profitable, and better-funded) server org. This is far from true.


It also works much better as a server. Snaps work really well for things like certbot.

On the desktop you have to worry about things like... UIs, sound, Wine, etc.


That is the moat they're trying to cross. Imagine you have a PyTorch app and can run it on iOS, on ARM, on AMD, on Intel … cloud, or embedded. Just imagine. You scale and embed according to your business case, not according to any one firm's current strategy.

Or at least you'd have that option, even if that heaven never comes. Or it comes and we just aren't aware of it yet, like the internet. Did you need to use IBM running SNA to provide a token-ring-based network? In 1980 …

Imagine it, and let us, or them, compete …


Not that I want to encourage gatekeeping in the first place, but you'll have more success if you have a clue what the other person is talking about (and some idea of what embedded looks like outside of tiny micros, and how the concerns about abstractions extend beyond how much computational power is available).


Clearly you've never used an Nvidia Jetson and have no idea what it is. You don't need a discrete GPU; it has a quite sophisticated GPU in the SoC. It's Nvidia's embedded platform for ML/AI.


Better yet if the tide shifts so we can have a compatibility layer. The key is the tide. Obviously they wouldn't try to sue … it would be a sign that we finally have real competition. That is where innovation happens.

"x86 cannot do 64-bit; let us do this and that so the market can use only our CPU." Repeat with me: x86-64 is impossible.

Not sure Apple is in this; otherwise the really great competition comes.



