Oxide's "On the Metal" podcast is an incredibly fun deep dive into technical issues no one should have to deal with as told by people who lived them. "Deep" as in: software DRAM drivers, Ring -1 security, bespoke motherboard designs... I only wish there were more episodes.
For whatever it's worth, I think the reason the doc site was submitted to HN recently is in fact because of our recent episode on the frontend.[0] We have really enjoyed doing Oxide and Friends, and if you're an On the Metal listener, we think you'll find a lot to like!
Oxide + Friends is very good. It's very motivating to listen to and gets me excited to try and build new things. Way more positive vibes being put out there than a lot of stuff.
O+F is posted to YouTube, so I get the auto-generated transcript from there. It's unscripted/unstructured, so if you're primarily wanting to hear discussion of the Oxide hardware/software/progress, it's pretty difficult to consume as an audio program.
This is not a dig on the program at all; I'm glad they are making the time to produce it, and I'd rather they spend their effort getting racks out the door instead of generating marketing hype.
When people presume things about moderation, they almost always get it wrong. That's a pity because all anyone has to do to get the correct answer is ask us.
If you see an account that's banned and you don't think it should be, please let us know at hn@ycombinator.com so we can take a look. We've unbanned quite a few accounts that way. In the meantime, you can vouch for their good comments (see https://news.ycombinator.com/newsfaq.html#cvouch). But please don't divert the thread with offtopic noise that is likely to be incorrect.
Everything about Oxide's gear sounds like fun. I imagine it must be a bit like what working with minicomputers in the '70s through the '90s was like.
I did a little work in the late 90s with Alpha-based machines. I was impressed that those machines didn't seem like the hack-job crap that PC-based stuff was (with simulated chips from the early 1980s hiding out in dark corners because "compatibility") and still is today. I'm betting working with Sun gear felt similar, though I never got to work with it. Just having an honest-to-God serial console, as opposed to crappy bag-on-the-side things that scrape video memory and pretend to be "legacy" PC input devices, would be an amazing thing.
I'll never be able to work with their stuff because I don't work with Customers at that scale. I'm also vastly unqualified to work for them at their current stage. I suppose maybe someday they'll need field service technicians... I can hope, I guess.
I really wish they'd do a tour of it, hardware & software. They've gotta be proud as hell of what they've pulled off, it boggles my mind that they're not more eager to show it off.
I'd fly out to their factory on my dime and pay to watch a dog-and-pony show and hear a Q&A. I'm that excited about this stuff. Even if I just end up being a Customer of somebody who hosts my VM on this gear it's plenty exciting. It actually feels like a new computer, as opposed to the same old, same old offerings from the incumbents.
Given it was only relatively recently that they were talking about getting their first deployment out the door, I'd suspect staying mostly quiet is strategically wise: it avoids having more prospective customers come in the door at once than they can (as yet) serve to the standard they want.
So "they probably really really -want- to show it off, but are showing remarkable restraint for good reason" seems at the very least plausible to me.
> I imagine it must be a bit like what working with minicomputers in the '70s through the '90s was like.
I had a chance to talk to someone who worked at Oxide once. I got excited because my experience matches up nicely with their products on several levels.
The person got excited as well and started talking about how to interview there, but the conversation got kind of weird. They kept emphasizing how important it was to not talk about compensation in the interview. Apparently they pay engineers all the same comp (not bad, but would have been a significant step down from every offer I received during that time of my life) and they select for people who aren't interested in getting paid a lot for their unique skills.
Probably not a bad deal for people who like working in that domain with like minded people. At the time I got some very uneasy feelings from being aggressively coached to not bring up compensation or ask any questions about equity during the interview like it was some unspoken rule that would get me disqualified. Maybe the person was exaggerating, but I found at least one other person with a similar story.
Honestly, their comp would have been awesome if I was a single guy living in a low cost of living location and working remote, but at the time it would have meant giving up quite a bit to work for an early stage startup with high expectations and an unspoken rule that I should never ask about compensation.
This conversation strikes me as unlikely on several levels. First, no one would have coached you on "how to interview at Oxide" because that's not where the process starts -- it starts with you preparing your materials.[0] (Our review of the materials constitutes ~95% of our process.) Second, we have always been very explicit about compensation (that is, we ourselves brought it up early in conversations); no one at Oxide would tell you to "not bring it up" because everyone at Oxide knows that it is a subject dealt with early in the process. And finally, this is all assuming that you were talking to someone before March 2021, when we published our blog post on it.[1] After the blog post, compensation simply doesn't come up: everyone has seen it -- and indeed, our approach to compensation is part of what attracted them to the company!
The person to whom you are replying clearly meant (IMO) that you shouldn’t ask for more compensation or you will make people act defensive. Frankly, your reply reads a little defensive so maybe that’s not awful advice?
It also seems like this was a spontaneous initial conversation and not part of the process, so I’m not sure why you are suggesting that they made it up.
I didn't realize there are two ways of looking at it until your comment. It didn't occur to me that if you randomly have Oxide on a list of 40 other companies to apply to, and you expect to talk to them about compensation as you would with most others, you're going to have a weird time because they have an uncommon policy.
But on the other hand, Oxide is very up-front about it, and their CTO is happy to go on HN and chat about it. So not knowing about it makes you look like you didn't do your research before the interview, or knowing about it and trying to force the issue anyways makes you look kind of arrogant (if you don't agree with it, you can just not apply).
> It also seems like this was a spontaneous initial conversation and not part of the process, so I’m not sure why you are suggesting that they made it up.
Yes, it was an initial conversation as I said. When I asked about equity compensation I was told that compensation discussions are to be avoided because bringing it up could be considered a negative by the company.
> First, no one would have coached you on "how to interview at Oxide" because that's not where the process starts
I was using “interview” as a generic term for applying to a company, not literally referring to your internal process.
I didn’t interview with or even pursue Oxide after the conversation (and never said I did)
This all came up because I asked the person what the equity compensation was like. Not an unreasonable question when talking about a startup. That's when they started advising me that I shouldn't bring it up and that it's not something they talk about. After that I got uncomfortable about pursuing a company that discourages any conversation about compensation to the extent that someone felt it necessary to warn me about it when I hadn't even applied.
> no one at Oxide would tell you to "not bring it up" because everyone at Oxide knows that it is a subject dealt with early in the process.
They were trying to tell me that it was important that I avoid giving the impression that I cared about compensation, as that would be a negative if I talked to anyone else at the company. Just repeating what I was told.
> And finally, this is all assuming that you were talking to someone before March 2021, when we published our blog post on it.[1]
No, they told me to look up the blog post, but I had not read the company blog before talking to this person.
> After the blog post, compensation simply doesn't come up: everyone has seen it -- and indeed, our approach to compensation is part of what attracted them to the company!
Or maybe your approach to compensation is what filters people out of the application pipeline? I don’t think it’s realistic to think that this compensation strategy is what attracts people to the company rather than pre-selecting people out.
I read the blog post, but I feel like I’m missing the equity portion of the conversation still.
Regardless, is it so hard to believe that compensation “simply doesn’t come up” because potential candidates (like me) are sometimes coached to not bring it up? Or that the company’s stance appears to discourage bringing it up? This feels like some circular logic: Nobody brings it up because we discourage people from bringing it up.
It definitely wouldn't be viewed as a negative to talk about it, and in fact our transparency on this topic makes it very easy to talk about directly. So we absolutely don't discourage talking about it -- but it's also true that people for whom the compensation is going to make Oxide impossible do self-select out. And that's okay! People have different needs at different stages of their career, and there are different things that they want; there is nothing wrong with optimizing for compensation -- but it's also true that Oxide is very unlikely to be a fit for someone optimizing for compensation, for many reasons.
> People have different needs at different stages of their career, and there are different things that they want; there is nothing wrong with optimizing for compensation -- but it's also true that Oxide is very unlikely to be a fit for someone optimizing for compensation, for many reasons.
I think you are missing that there are other reasons to avoid Oxide that don't have to do with optimizing for compensation. They may very well be optimizing for something else, but compensation is still a data point: they may be willing to take less money, just not too much less.
Oh, there are definitely many reasons to not work at Oxide -- not least that it's hard, grueling work! (Indeed, part of our process is getting candidates sufficiently understanding what the work actually entails to allow them to make a decision for themselves.)
Hey Bryan, I enjoyed reading the cash compensation article, but I'm curious how it meshes with your equity compensation. Is it awarded purely based on when people joined? Does everyone who wasn't a founder get the same amount? It feels like you could run into much of the same problems the article points out about transparency and so on if the equity component isn't as straightforward?
Speaking as someone who dropped out in the first round: for the nobodies like me, they say equity is based on time of joining. It was early 2021, I think, and the amount was already published in the job posting or mentioned pretty early (I don't really recall)... and it was brutally low considering the pay cut. Maybe they keep it to negotiate with the somebodies.
Yes, that is a sad point. I love Oxide and everything about them. It is the only company in Silicon Valley that I'm honestly kid-like excited about (I fake the "passion" for others but don't feel it). My partner is probably tired of hearing me drool about this one company I wish I could work for...
But with a family to support, it's never going to happen. The pay cut would be brutal, so I never apply. If I ever become an independently wealthy multimillionaire, the first thing I'll do is apply to Oxide. As long as I need a paycheck, it's impossible.
As someone who's only dealt with commodity server hardware, these specs make me salivate. And all these boot/management TUIs are just so satisfying to look at.
(Sorry to be that guy, but just a friendly suggestion: high contrast dark themes are difficult to read for people with astigmatism. Especially since this is technical documentation, intended to be thoroughly read, you might want to consider a light theme toggle.)
I haven't worked with SuperMicro (aside from having it inside "appliance" devices that I've worked adjacent to), but I assume my experience with Dell and HP commodity servers is similar.
The thick layer of hardware contrivances necessary to maintain IBM PC compatibility is unnecessary for the task of bulk hosting of x86/x64 VMs. There's a lot of hardware and software that just doesn't need to be there.
Bare metal out-of-band management ends up being bolted-on to these "legacy" contrivances (scraping video memory for remote consoles, faking being USB peripherals). A serial console or SSH connection to a service processor would be vastly superior. I can't begin to count how many times an iDRAC "lied" to me about issues with a machine, or how many times the solution was "upgrade the iDRAC firmware and reboot it".
I have been mostly unimpressed with the quality of firmware for motherboards, baseboard management controllers, RAID adapters, NICs, HBAs, power supplies, backplanes, front panels, etc. Every new model of system or component ends up being an exercise in fear / anticipation of problems. The integrator has very little power over the firmware quality and I can be assured that if I do have a firmware-induced issue I'm many, many steps away from actually communicating with somebody who can help.
Granted, maybe if I was buying at the scale of Oxide's prospective Customers I'd have some pull with the integrators, but I'm skeptical of that, even.
Oxide is actually building computers. Putting commodity motherboards into boxes with other commodity components won't ever have the level of attention to detail and integration that Oxide can provide.
Probably not a coincidence. It would be interesting to know which ODM they partnered with for the hardware.
I've done some work with SuperMicro in the past. Some of their boards come with extensive headers and customization options right out of the box. They're also happy to work on board level customizations with the right contracts in place.
We didn't work with an ODM: the ODMs were unwilling to contemplate some of the most basic things we needed (e.g., replacing the BMC with a much lighter weight service processor, having a true root-of-trust, etc.) -- let alone the more ambitious things we wanted to do (e.g., our own switch). The compute sled and the switch are both of our own design and look nothing like what you'll find from an ODM; if you're curious about the details, we have discussed them quite a bit in our Oxide and Friends podcast.[0][1][2]
Please consider generating some sort of automated transcript from these.
I'd hope your target audience would understand the limitations of such a thing, and I'm probably not the only person who'd rather read than listen even with the obvious caveats.
(these days automated transcripts seem to be no harder to mentally fix up the errors in as I read them than "somebody typing fast on a software keyboard and suffering the inevitable tyop and autocorrupt related issues" is, though of course others' mileage may vary)
I was going for "set up some code once and don't think about it again" to maximise the odds of the idea sounding tempting.
Proofreading would set up an expectation on the part of readers that it -had- been proofread and corrected and therefore a commitment to perform a repeated "boring but important" task going forwards for whoever's doing said proofreading.
That way would likely lie either delayed transcripts or never getting to initial activation energy to provide anything at all.
So I think "add a quick bit of code to your podcast publishing workflow and a CAVEAT IN BIG LETTERS" is better to do first.
If it turns out enough people care about the transcript, doing it a more labour intensive nicer way later is something they can decide, well, later.
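To make the "quick bit of code" idea concrete, here's a minimal sketch of what I mean, assuming the open-source openai-whisper package and a local file named episode.mp3 (both placeholders -- this isn't anything Oxide actually runs):

    # Rough sketch: auto-generate a transcript for a published episode.
    # Assumes `pip install openai-whisper` and ffmpeg on the PATH.
    import whisper

    def transcribe_episode(audio_path: str, out_path: str) -> None:
        # "base" trades accuracy for speed; a bigger model would do better.
        model = whisper.load_model("base")
        result = model.transcribe(audio_path)
        with open(out_path, "w") as f:
            # One segment per line, with a rough start timestamp.
            for seg in result["segments"]:
                start = int(seg["start"])
                f.write(f"[{start // 60:02d}:{start % 60:02d}] {seg['text'].strip()}\n")

    if __name__ == "__main__":
        transcribe_episode("episode.mp3", "episode-transcript.txt")

Plus the CAVEAT IN BIG LETTERS at the top of the output file, of course.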
This is pretty darn close to the "Gimlet" compute sleds:
One 225W TDP 64-core AMD Milan CPU
1 TiB of DRAM across 16 DIMMs
12 front-facing hot-swappable PCIe Gen 4 U.2 storage devices
2 internal M.2 devices
2 ports of 100 GbE networking
I recommend Midnight Lizard as a backup for difficult sites. It's a little heavier/slower, but sometimes works on pages where Dark Reader doesn't. You can configure Midnight Lizard not to apply to all sites by default, then selectively turn it on where Dark Reader fails.
I would never have thought to use an extension intended for adding dark mode, to instead view dark mode websites in a light mode, but that makes total sense!
> high contrast dark themes are difficult to read for people with astigmatism.
Are you yourself affected by this? I have astigmatism and I keep hearing this, but I've never experienced it. If you are affected, do you keep your screen at high brightness? I'm wondering if it doesn't happen to me only because my astigmatism is mild, or if the fact that I tend/have to keep my screens at relatively low brightness plays a role.
Incidentally, for different reasons, high contrast dark themes can be problematic for me as well (especially at high brightness). Dark Reader and Midnight Lizard are essential for me in keeping contrast in a comfortable range.
I am super interested in learning more about the storage subsystem! I figured they'd be using ZFS, given the people involved, but it appears they've also gone ahead and built a clustered FS (crucible) on top of it? I figured something like that would be necessary to handle fault tolerance at the gimlet level. (Losing an entire shelf / drive controller, etc.) Getting ZFS to go multi-node is surely a neat trick.
Second to that I just want to say the presentation of these docs is top notch. (I so desperately wish I was the target customer for these systems; reading these docs makes me want to do terrible things to my electrical service and play with one of these racks.)
They use Crucible on top of ZFS. https://github.com/oxidecomputer/crucible
I don't think they have anything for S3-like service but there are other options for that, e.g. https://garagehq.deuxfleurs.fr or MinIO.
I am not sure whether they have their own SSDs or use off-the-shelf SSDs, just with their own firmware or something.
They implemented a block store with replication from scratch? That's kinda brave, considering that that's a project big enough to justify full startups for!
However, the folks at Oxide are at the top of the game in this space, with decades of experience building and testing such systems. Second, Oxide's Crucible stack is written completely from scratch in Rust, which dramatically reduces failure modes common to such stacks, which are often written in C/C++.
Finally I actually understand what they're building. Now I must ask: Why?
On-prem servers aren't a new invention. The market seems pretty saturated (and shrinking). Virtualization isn't a new invention either. The market seems pretty well served, at least commercially. Can the integration of both be a convincing enough advantage?
The management UI certainly looks nice; it's something I'd like to have on my KVM box at home (any good Proxmox alternatives?). I don't see why it'd have to be bound to an enormous server.
One benefit is that they're competing in an industry where the time to get a sled up and running for compute can be measured in weeks (they say they've heard up to 90 days), whereas their solution is basically plug and play -- Bryan was trying to get Steve to admit setup took "hours" whereas Steve was hedging and saying customers could get started "within a week."
They go into more details on their podcast, and this section in particular covers the bootstrap time: https://youtu.be/5P5Mk_IggE0?t=3381 Pretty fascinating stuff.
Do I have it right that they ship their own hypervisor (that's based on maybe Solaris, not KVM) as firmware? Let's assume a very small team can compete on a technical level, it still seems like it could cut out a lot of the potential market.
I can't imagine that large cloud / "web scale" companies would want that. Most want a fair bit of control of their own hypervisor and management stacks based around KVM. And "enterprise" type companies are going to have issues with certification I would have thought -- will RedHat, Microsoft, SAP, Oracle, etc certify their supported products on top of this hypervisor? Seems like a difficult and expensive process.
So what's left? Companies that support their own virtual machine software but don't support their own hypervisor and don't like what's available from vmware or Microsoft or RedHat. A small niche. Or are my assumptions wrong?
I think the assumption that 'most want a fair bit of control' is quite often just not true. Enterprises want something that just works; they want control only if they can't have something that just works.
If you have a team that is struggling building an internal cloud with all this control (and problems) and all this commodity hardware (and its problems) then maybe they would be happy to switch to something that just works.
> And "enterprise" type companies are going to have issues with certification I would have thought -- will RedHat, Microsoft, SAP, Oracle, etc certify their supported products on top of this hypervisor? Seems like a difficult and expensive process.
If that were the case and nobody running any of these would run their rack, then I wouldn't think they would have received any funding. But I don't know enough about these certification processes to really comment.
> Companies that support their own virtual machine software but don't support their own hypervisor and don't like what's available from vmware or Microsoft or RedHat
None of these come with a fully integrated rack.
The competition would be somebody willing to buy a rack of Dell servers with VMware software. Or somebody willing to buy a rack of Dell servers and then use RedHat and set up all their own cloud-style infrastructure.
> I think the assumption that 'most want a fair bit of control' is quite often just not true. Enterprises want something that just works; they want control only if they can't have something that just works.
That's not what I'm assuming here. Read carefully: I divide the market into 3 categories. Those who support their own VM image software and hypervisors, those who support neither, and those who support VM images but not hypervisors.
First is Amazon, Google, Facebook and the like (and it's not an assumption we can see their public contributions to KVM, QEMU, etc., and hear their talks about some of what they use internally). Second is "enterprise" who wants something that just works. Third is ? and would they want to support their software on a niche hypervisor?
> If that were the case and nobody running any of these would run their rack, then I wouldn't think they would have received any funding. But I don't know enough about these certification processes to really comment.
Well it is the case that enterprise (supported) software is not just supported on any hypervisor. https://access.redhat.com/articles/973163 RHEL runs on their own KVM as well as MS, VMware, some cloud vendors. Some application software also gets certified to hardware and hypervisors, not just operating system (e.g., SAP does this).
> None of these come with a fully integrated rack.
It's not fully integrated if it doesn't come with the guest software though, is it?
> The competition would be somebody willing to buy a rack of Dell servers with VMware software. Or somebody willing to buy a rack of Dell servers and then use RedHat and set up all their own cloud-style infrastructure.
Right. And the problem for Oxide is that the competition will have fully certified and supported operating system and application software for their virtual machines.
The hypervisor is based on bhyve from FreeBSD plus Propolis in user space. Illumos actually has/had KVM, and there is a talk by Bryan Cantrill where he speaks about the porting effort. All of that information is readily searchable.
Public cloud like AWS is a premium product. When your web-scale business wants to increase profits by cutting costs and you've already done a few passes making your software run faster, owning your own metal and renting colo space starts looking like a big avenue for savings. Especially if you're doing something bandwidth-intensive, where AWS makes you pay through the nose. I think we'll see a fair number of companies move back towards owning metal, especially if the metal is super easy to manage. I think the Oxide pitch is making owning racks sensible for ~2,000-engineer companies instead of only ~20,000-engineer companies.
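To put toy numbers on the bandwidth point (every figure below is a made-up placeholder for illustration, not a real quote from AWS or any colo provider):

    # Toy egress-cost comparison; all figures are illustrative assumptions.
    egress_tb_per_month = 200                  # assumed workload
    cloud_rate_per_gb = 0.07                   # assumed blended internet-egress rate
    cloud_monthly = egress_tb_per_month * 1000 * cloud_rate_per_gb

    colo_flat_commit = 2500                    # assumed flat monthly cost of owned/colo links
    print(f"cloud egress:   ~${cloud_monthly:,.0f}/month")
    print(f"colo bandwidth: ~${colo_flat_commit:,.0f}/month (flat, assumed)")

The exact numbers don't matter; the point is that metered egress scales with traffic while owned links are closer to a flat commit.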
I'm not affiliated with them, but I recall this being marketed at some point as giving you the flexibility and customization powers that the Googles and Facebooks of the world have with their on-prem infrastructure, without needing as deep a dedicated staff as they have for just this -- the staff that allowed them to develop all their custom tooling in the first place.
Basically, it's for you if you are on-prem and you are dissatisfied with what you are getting out of today's on-prem vendors: bad firmware with slow update cycles; issues with racks, power supplies, cabling, and interconnecting systems; closed-down systems that don't allow much customization; etc. They are also open-sourcing a lot of their work along the way.
Again, I'm not affiliated with them, and my info may be outdated so take it with a grain of salt. But that's how I've seen them for some time.
I would say Oxide is inflexible and non-customizable since they have exactly one hardware configuration and few software features at this point. Their claim is more that their rack works and everything else on the market is full of bugs.
I'm not an infra engineer, but this claim "everything else on the market is full of bugs" might be the killer app. Of course, it needs to be true. What if they iterate to an insanely stable embedded code base (BIOS, etc.)? Then, continuously upgrade the hardware to use the latest CPU/RAM/NVME. I could see that being very valuable.
I fully expect there will be data-loss bugs and poor performance during recovery in their in-house distributed block storage solution. That's just in the nature of the problem domain, and this is all new code.
The short answer is that on-prem is used by a lot of companies for many reasons that go beyond "legacy/people don't know about cloud" (and even go beyond "regulatory environments"!), and these boxes are meant for people who have serious requirements.
I'd recommend the O+F ep posted in sibling, but I think here the pitch is "well you need this hardware anyways right? How about buying one that's easy to use and doesn't take a month to get working?" All built by people who are so obsessed with root cause analysis that they've ended up writing their own firmware, running on an OS where these people are common contributors.
Thanks, I think now I get the excitement (from the engineering perspective more than the business one)! "Fixing everything" is probably everyone's dream.
This is the most succinct pitch for Oxide, the problem is that many people do not know what the hyperscalers even are, so it doesn't land with a lot of folks.
A dumbed-down interpretation: what most people can buy off the shelf for servers more or less looks like a PC shoved into an odd-looking case (a 1U or 3U rackmount). To get anywhere near the cost/performance of the big players, you need something that's designed for data center server workloads.
Stupid question (no trolling, I promise): could this product be valuable to public cloud vendors, like AWS, Google, Oracle, IBM, etc.? Or even medium-sized ones, like Linode? My thinking: could this be the cloud's inversion moment, akin to TSMC and the outsourcing of semiconductor manufacturing?
For big vendors, no, because what Oxide does is basically sell the kind of server that the hyperscalers have been building for themselves internally. But for smaller cloud providers, who aren't running custom hardware made for hyperscale and are instead using off-the-shelf servers with all their flaws, it could make a lot of sense.
I am definitely not an infra engineer, but I see it as an attempt to fix problems that have accumulated over decades as standards calcified and no single vendor was able to improve due to interoperability issues. Some of these things can’t be tackled without a full end-to-end solution, hence the full server. It doesn’t mean components couldn’t be swappable in the future, but at the beginning it’s best to develop on a narrow set of hardware until a solid baseline of stability is reached. Sort of an Apple-like approach for large scale servers.
Oxide is an HN darling. I have never seen anything even close to negative about them here. I hope someone writes a case-study about this. It seems to be a mix of the charisma of the team/founders and their product that makes everyone love them.
I would put Wireguard/Tailscale in the same category as well.
Here are the big negative things I can come up with quickly:
1. It's not clear the market case closes, because while there are customers who would greatly benefit from their systems, those same customers are also very averse to change, meaning that most of them will probably sit out for the first few generations to see if they have staying power. If everyone does that, it becomes a self-fulfilling prophecy.
2. ... but this would have been fine, a few years ago. It's not fine in the current market situation, where there is much less easy money looking for a place to go. I sure hope they are sitting on a long runway.
3. Speaking of, they are very late in their execution. An SP3 platform that only shipped in 2023 is not great. I really hope they are far along on their SP5 development. (... But this also dampens current sales. I bet a lot of potential customers are thinking that an SP5 Oxide rack seems much more appealing than an SP3 one, so why not wait?)
But the reason they are not discussed that much is that people who understand the market are all really, really hoping that they pull it off. Because the current situation in server hardware is dire. In every thread, people who don't know much about servers ask why not get a similar system from Dell or Supermicro or whoever. The answer is that the commodity servers are pieces of shit that require significant local engineering resources to manage. The software quality of the firmware is generally horrendous, and when this is pointed out to the vendors, their answer is that they know, but everyone else is just as bad, wontfix. A significant draw of the cloud is that all that pain goes away, because the hyperscalers realized how shit everything was and fixed it in their systems. It just never trickled down to the market below them.
There are just very few people who regularly buy $500k-1M in server hardware and have a good understanding of the market and the finances involved in the alternative.
> I would put Wireguard/Tailscale in the same category as well.
Sometimes the hype exists for a good reason - as a paying enterprise Tailscale customer, I’ll fight you if you ever suggest I have to staff people to admin an IPsec or SSL-based VPN again.
It sounds nice, but I've been building systems like that since 2012 (usually VMware vCloud Director plus custom code), using hardware like Fujitsu's CX1000, the Nexus 1000V distributed vSwitch, Brocade network switches, and SAN storage. The systems I built back then could dynamically provision separate network/compute/storage segments for PCI compliance and more. If I were to build something like this today I'd probably consider using Kubernetes too, and integrating it with public cloud for scale-out. This way a business can have their "own" cloud that is much more cost-effective at certain scales, plus access to "unlimited" public cloud resources at the same time (at the cost of increased system complexity).
Is this just more of the same, or is there some innovation there? It is interesting that computing trends go in centralise/decentralise cycles; we can observe this as far back as the 1980s. I can't wait for the next "decentralise" cycle as I'm under the impression a lot more innovation happens during that phase.
> I can't wait for the next "decentralise" cycle as I'm under the impression a lot more innovation happens during that phase.
I think we are beyond that cycle: centralization and decentralization now happen at the same time. We have many more, much cheaper chips everywhere that do a lot more; we have local and distributed compute deployed everywhere, and we also have large datacenters at the center of it.
Neither datacenters nor distributed compute is going to go away anytime soon.
It's great to see Oxide shipping. They have an incredibly talented team that's worked very hard for a long time in the best tradition of Sun Microsystems.
Can you elaborate on the Sun comparison? I am a huge fan of Sun and what they did for computing at large - designing hardware, creating specs, their contributions to evolving unix and so on. I'm not sure how Oxide compares. Unless you're talking about "in the spirit of Sun".
I'm not the one you replied to, so I don't know what they mean specifically about the "tradition of Sun", but
Bryan worked at Sun where he helped create DTrace.
After the Oracle acquisition, he left to join Joyent as VP of Engineering and then CTO. Steve was COO of Joyent. And I have heard similar comparisons between Joyent and Sun.
Sun was a combination hardware and software shop, which Bryan appreciated and has tried to replicate at Oxide. The only reason they have a chance today is that the hardware/firmware in most servers is of terrible quality.
I wonder how much the smallest, cheapest configuration will cost. I would really love to buy one of these. But I suspect it will cost 100x more than I can afford heh.
The Supermicro equivalent to one of those sleds might be around $14k. Times 32, plus switches etc., plus the margin for proprietary development, plus the lack of volume-manufacturing benefits.
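Spelling that out (the $14k per-sled figure is just my rough guess above; the rest is arithmetic):

    # Back-of-envelope rack cost, using the rough numbers from the comment above.
    sled_equivalent = 14_000      # guessed Supermicro-equivalent price per sled
    sleds_per_rack = 32           # sleds in a full Oxide rack
    compute_only = sled_equivalent * sleds_per_rack   # ~$448k before anything else

    # Switches, cabling, integration, development margin, low-volume manufacturing,
    # and the software all stack on top of that base.
    print(f"compute alone: ~${compute_only:,}")

Which is consistent with the $500k-1M figure people throw around for a full rack.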
Yeah that’s about 100x more than I can afford exactly.
I think I will go back to my little collection of RPi Compute Module machines. And maybe some day in the very distant future I can buy big boy servers lol
Depending on your exact wants, I think there was an LTT video recently on how you can purchase older server blades for very reasonable prices, and upgrade them using also reasonably priced used server-grade CPUs and RAM. Obviously you won’t have the bang of new hardware, but if what you want to do is play around with enterprise-class systems…
I'm a little surprised that, according to the docs, "the Oxide rack does not come with any preloaded machine images" [1]. But I guess it makes sense that the initial, early-adopter customers wouldn't have a problem with rolling their own VM images. I'm at least glad that Oxide didn't decide to start by only allowing VMs that use a pre-defined set of pre-made images in some custom format.
As an on-prem sysadmin, I'm not sure I will ever in my life work on an environment that has the minimum specifications of an Oxide system. Every entire datacenter I've worked in has less total capacity than a single "sled" here.
Not in the near future: a lot of the design relies on the benefits of scale, and scaling down kind of removes some of those benefits. But also, just like, things are still early; there's not a lot of choice, period.
That said, we do have a lot of fans who want to buy something, and it would be cool to figure out how to do that. But also, we gotta like, make and ship the primary product. So we'll see.
As someone who has been attempting to smash computers together into 'hyperconverged infrastructure' since before Y2K, I could not be happier for the buzz around Oxide; hopefully we'll see smaller-scale versions of the concept out soon -- something in the 3-10 node size for SMB and/or a small scale-out system for SOHO. It seems insane to me that the tech to do this with off-the-shelf free software exists, but there's no way to buy it ready to go at small scale. Go get a quote for an HCI VMware buildout and DISMAY.
That kind of magnitude of compute is more datacenter level. But either way, you're probably better off buying one of the all-in-one tightly integrated solutions from Nvidia or Intel.
I'm particularly impressed with their anti-tamper measures. "For each server sled, shine a light into the cubby to look for any physical tampering or damage."
The rack also comes with built-in security features to ensure all hardware and software are genuine Oxide products:
purpose-built hardware root of trust (RoT) – present on every Oxide server and switch – cryptographically validates that its own firmware is genuine and unmodified
encryption of data at rest via internal key management system built on the RoT
trust quorum establishment at boot time to ensure the cryptographically-derived rack secret is verified before unlocking storage
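To give a flavor of what that trust-quorum bullet is getting at, here's a toy sketch of splitting a rack-secret-style key into shares so that no single sled holds it by itself. (This is a simplistic n-of-n XOR split purely for illustration; my understanding is the real design uses a proper threshold scheme plus a lot more machinery around the RoT.)

    # Toy illustration of "no single node holds the rack secret".
    # n-of-n XOR splitting only; NOT what Oxide actually ships.
    import secrets
    from functools import reduce

    def xor_bytes(a: bytes, b: bytes) -> bytes:
        return bytes(x ^ y for x, y in zip(a, b))

    def split_secret(secret: bytes, n: int) -> list[bytes]:
        # n-1 random shares, plus one share that XORs back to the secret.
        shares = [secrets.token_bytes(len(secret)) for _ in range(n - 1)]
        shares.append(reduce(xor_bytes, shares, secret))
        return shares

    def combine_shares(shares: list[bytes]) -> bytes:
        # All shares are required; any missing share leaves only random noise.
        return reduce(xor_bytes, shares)

    rack_secret = secrets.token_bytes(32)      # e.g. a storage-unlock key
    shares = split_secret(rack_secret, 8)      # one share per hypothetical sled
    assert combine_shares(shares) == rack_secret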
That's pretty standard? Anything that's going straight into 400V (I assume) deserves a good eyeball before it's energised. I doubt their "root of trust" knows how to protect the system from a dead mouse in the bus bars.
I'm not worried about dead mice; I'm worried about tampering. I would have hoped there'd be more guidance than "just look and see if anything looks weird". A determined threat actor can easily side-step that check with, for example, careful desoldering/resoldering technique.
I’d love to see someone provide a turnkey managed bare metal container platform, complete with L4 / L7 routing. I haven’t heard if oxide has a container play, and I suspect it may require virtualization based on their choice of host OS.
By "L4 / L7 routing" do you mean specifically something that works like Amazon's Application Load Balancer and Network Load Balancer for Kubernetes? And by "turnkey managed" you mean, you """just""" """configure""" """IP addresses""" and it """all just works"""?
You can certainly install Ubuntu on a very powerful machine with a WAN interface (e.g., a NIC connected to a residential cable internet connection). Then, use something like k0s to provision other bare metal workers. Those three steps, and you've got a "bare metal container platform." You don't need a "LoadBalancer", you can specify that nginx-controller runs on the host network of specifically the machine with the WAN interface and configure its service's external IP to the WAN IP, and now you support Ingress.
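For the "configure its service's external IP to the WAN IP" part, here's roughly what that looks like with the official Kubernetes Python client -- the namespace, labels, and the 203.0.113.10 address are placeholders, and the controller pods are assumed to already run with hostNetwork on the WAN-attached node (e.g. via a nodeSelector):

    # Rough sketch: expose an ingress controller on a known WAN IP.
    # Only wires up the Service side; the controller deployment is assumed.
    from kubernetes import client, config

    config.load_kube_config()

    svc = client.V1Service(
        metadata=client.V1ObjectMeta(name="ingress-nginx", namespace="ingress-nginx"),
        spec=client.V1ServiceSpec(
            selector={"app.kubernetes.io/name": "ingress-nginx"},  # placeholder labels
            ports=[
                client.V1ServicePort(name="http", port=80, target_port=80),
                client.V1ServicePort(name="https", port=443, target_port=443),
            ],
            external_i_ps=["203.0.113.10"],  # the WAN IP of the edge machine
        ),
    )
    client.CoreV1Api().create_namespaced_service(namespace="ingress-nginx", body=svc)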
But how do you imagine having multiple LoadBalancer resources without multiple IPs? And how do you imagine having multiple IPs without ARIN? The turnkey challenging part is the public IPv4 addresses, not the platform.
> You don't need a "LoadBalancer", you can specify that nginx-controller runs on the host network of specifically the machine with the WAN interface and configure its service's external IP to the WAN IP, and now you support Ingress.
And if you need to service that machine or it goes down?
> But how do you imagine having multiple LoadBalancer resources without multiple IPs? And how do you imagine having multiple IPs without ARIN?
Most colos/transit providers will happily lend you their IPs
I don't believe they have any GPU option in this initial rack. In one of the podcasts they said something about none of the GPU vendors being open enough to allow the kind of deep integration they're going for. It might change in future of course.
After giving the "Known Behavior and Limitations" a scroll here https://docs.oxide.computer/release-notes/system/1-0-0
It seems a little... half baked? Especially for a company whose specialty is the integrated management platform. Interested to see customer reviews.
https://oxide.computer/podcasts/on-the-metal
I just noticed they have a second podcast. I'll assume it's just as good.
https://oxide.computer/podcasts/oxide-and-friends