
Semi-related: I have a huge number of BC-250s [0]. Now that ETH mining is over, I'm looking for something interesting to do with them. Not looking to sell them (though that might be possible at quantity); I'd rather work with you to run something on them. They iPXE boot Ubuntu. GigE. No onboard storage, but 16 GB of RAM. Easily tuned for performance.

Thoughts?

[0] https://www.techspot.com/news/93980-14800-asrock-mining-rig-...




Contribute to the different volunteer computing projects: https://en.wikipedia.org/wiki/List_of_volunteer_computing_pr...


Good suggestion, I reached out to Folding@home. Let's see what happens.


GIMPS!


How well did the economics of this kind of operation end up working out? These seem like a fairly recent development, so they really wouldn't have had much time (say, the 500 days cited) to reach profitability.

It would be interesting to see how the GPU driver side of this works. If they boot Ubuntu, what kind of GPU driver is required to run the GPU? Is it compatible with the open-source amdgpu driver?

In any case, these would work rather well for some kind of VPS server hosting or maybe more like dedicated server hosting, given the density/form factor. That is assuming the driver situation doesn't preclude a choice of operating system...


They run standard Ubuntu 20.04 and can be upgraded to 22 or whatever else comes along.

Standard AMD Ubuntu driver (21.50.2.50002, but that can be upgraded as well). I heavily modified the AMD packaging down to just the necessary files because these iPXE boot (sadly, it's still around ~60 MB).

The bigger issue is that they don't have any onboard persistent storage (could be added, but the speed is limited to about 500mbit/s) and they are only gigE.

Running strictly from memory, they are also prone to memory corruption. Odd, I know, but I see it at the scale we operate. Thus, they need to be treated as interruptible machines. Reboot to running is about 60s.

So, quite a few limitations, but still good hardware, if we can find a good workload for them.
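To make the "interruptible" part concrete, this is roughly how I think about dispatching work to them (a simplified sketch; run_job_on_node and the ssh command are just hypothetical stand-ins for however the work actually gets shipped to a blade):

    import subprocess
    import time

    REBOOT_WAIT_S = 90   # ~60s reboot plus some margin
    MAX_ATTEMPTS = 5

    def run_job_on_node(node, job_cmd, timeout_s=600):
        """Hypothetical dispatcher: run a command on one blade over ssh.

        Returns True on success, False if the node hung, dropped ssh, or the
        job died -- all of which get treated the same way.
        """
        try:
            result = subprocess.run(
                ["ssh", "-o", "ConnectTimeout=10", node, job_cmd],
                timeout=timeout_s,
            )
            return result.returncode == 0
        except (subprocess.TimeoutExpired, OSError):
            return False

    def run_interruptible(node, job_cmd):
        """Retry a job, assuming the node can silently wedge at any time."""
        for attempt in range(1, MAX_ATTEMPTS + 1):
            if run_job_on_node(node, job_cmd):
                return True
            # Node is suspect: power-cycle it (however your PDU does that)
            # and wait out the ~60 second reboot before trying again.
            print(f"{node}: attempt {attempt} failed, rebooting and retrying")
            time.sleep(REBOOT_WAIT_S)
        return False

The practical consequence is that any state a job cares about has to live off-blade; the boards themselves are best treated as stateless workers.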


Is this memory corruption you speak of silent, or simply fatal?

This could be a significant problem if the workload requires some form of integrity, since the hardware could be quietly introducing errors into otherwise normal-looking computations.

I remember having this issue with AMD cards when mining too, where it was common to undervolt and overclock the memory. I wonder if any of those tuning tools work here, and if it would be possible to underclock the memory to increase its stability.
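If the standard amdgpu overdrive interface is even exposed on these APUs (a big if: I haven't checked, and it normally needs amdgpu.ppfeaturemask enabling overdrive at boot), underclocking the memory could in principle be scripted along these lines. Treat the sysfs paths and the "m 1 <MHz>" command syntax as assumptions rather than anything verified on a BC-250:

    # Sketch only: assumes the amdgpu overdrive sysfs interface exists on this
    # part and accepts the Navi-style "m 1 <MHz>" memory-clock command, neither
    # of which I've verified on a BC-250.
    DEVICE = "/sys/class/drm/card0/device"

    def set_max_mem_clock(mhz: int) -> None:
        # Overdrive writes require the manual performance level.
        with open(f"{DEVICE}/power_dpm_force_performance_level", "w") as f:
            f.write("manual")
        with open(f"{DEVICE}/pp_od_clk_voltage", "w") as f:
            f.write(f"m 1 {mhz}\n")  # set the upper memory clock state
        with open(f"{DEVICE}/pp_od_clk_voltage", "w") as f:
            f.write("c\n")           # commit the change

    set_max_mem_clock(1500)  # hypothetical target; start conservative and test

Whether that actually buys back stability at this density is another question.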

Either way, this echoes the sentiment I've generally had about hardware intended for mining, including the bitcoin-branded 2000-watt power supplies built with bottom-of-the-barrel parts. Most hardware built for mining was built with exactly one purpose in mind and has significant warts when you try to repurpose it. The constraints and requirements of cryptomining are really quite different from those of most modern IT systems.


Silent. It'll be things like you can't ssh into the box any more, or you log in and can't reboot it. It's likely due to ethash mining, which is heavily RAM-bound, combined with the voltage/clock settings. Luckily, it is easy to change those settings to build in more stability. I have a process that auto-tunes the machines for known instabilities... but the weird silent RAM corruption cases are much harder to detect.
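For the silent ones, about the best I've come up with is a canary on each blade: fill a chunk of RAM with a known pattern, periodically re-verify it, and pull the node from rotation when it disagrees. A stripped-down sketch of the idea (not our actual tooling):

    import hashlib
    import os
    import sys
    import time

    CANARY_MB = 512          # RAM dedicated to the canary buffer
    CHECK_INTERVAL_S = 300   # re-verify every 5 minutes

    def main():
        buf = bytearray(os.urandom(CANARY_MB * 1024 * 1024))
        expected = hashlib.sha256(buf).hexdigest()
        while True:
            time.sleep(CHECK_INTERVAL_S)
            if hashlib.sha256(buf).hexdigest() != expected:
                # The pattern changed underneath us: flag the node so the
                # scheduler stops sending it work.
                print("canary mismatch: probable silent memory corruption")
                sys.exit(1)

    if __name__ == "__main__":
        main()

It only catches flips inside the buffer it owns, so it's more of a smoke detector than real coverage.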

You're totally right that mining hardware was mostly single-purpose, especially at large scale. Those PSUs did the job, but yes, they were generally hand-soldered in China and prone to do weird things.

It certainly puts a damper on what can be done with them now that the merge has happened, but I'd like to keep trying to find uses!


I wonder if these have any chance of running TensorFlow or other ML applications. The problem, again, would be that there is no local storage, so the 4 GB Stable Diffusion model might be a bit much, but once loaded it might work well for that kind of non-critical application.
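Something like the following is what I had in mind, assuming a ROCm build of PyTorch even works on this APU (completely unverified) and using tmpfs as the model cache since there's no disk; the model ID and paths are just illustrative:

    # Sketch under big assumptions: ROCm-enabled PyTorch that supports this
    # RDNA1 APU, and enough of the 16 GB left over once the weights are loaded.
    import os

    # No local disk, so keep the Hugging Face cache in RAM-backed tmpfs.
    os.environ["HF_HOME"] = "/dev/shm/hf-cache"

    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",  # ~4 GB of weights pulled over the network
        torch_dtype=torch.float16,
    )
    pipe = pipe.to("cuda")  # ROCm builds of PyTorch still expose the device as "cuda"

    image = pipe("a rack of repurposed mining blades, oil painting").images[0]
    image.save("/dev/shm/out.png")

Every cold start would re-download the weights over GigE, so given the interruptible-machine caveat above, anything serving from these would want a warm pool.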

I think one of the reasons GPU memory corruption may cause the system to freeze is that the GPU and main memory are unified on APUs, which would probably explain the machines sometimes being hard to log in to or use.


It is effectively this GPU with RDNA1: https://en.wikipedia.org/wiki/Radeon_RX_5000_series

Yes, shared memory is definitely the cause.


> Running strictly from memory, they are also prone to memory corruption. Odd, I know, but I see it at the scale we operate. Thus, they need to be treated as interruptible machines. Reboot to running is about 60s.

This would be an instant dealbreaker for me. To quote the inimitable Sweet Brown, _ain't nobody got time for that_. [1]

1: https://en.wikipedia.org/wiki/Ain't_Nobody_Got_Time_for_That


Do these support graphics of any kind? Can you run a test with Vulkan? Can the boards run Windows and correctly start DirectX?


It is effectively this GPU with RDNA1: https://en.wikipedia.org/wiki/Radeon_RX_5000_series

I don't know about Windows; at this scale, I doubt it would be easy to iPXE boot it onto this many blades over GigE.


Stable Diffusion hosting


Seems to be AMD cards, which have way less support in the ecosystem.



