Ask HN: What is the point of IBM mainframes?
29 points by julius on Nov 28, 2022 | 33 comments
From another HN post I learned that quite a few companies use IBM mainframes. And they are very expensive.

What can you accomplish with an IBM mainframe that you could not do on x86 Linux servers? How is the premium in price justified?




I think this is the wrong way to think about it.

For the type of organizations that run workloads on IBM mainframes, there are three drivers:

* Legacy: The application was written for the mainframe, cannot run on anything else, too expensive in terms of dev and test time to re-platform.

* Business value: This is the big one; these workloads make their companies 100s of millions to billions of dollars per year. The price premium for running this on a mainframe is a rounding error.

* Reliability: With the cloud, I hold the opinion that the average x86 application is less available/reliable than a well-run pre-cloud application (which already included HA, etc.). Mainframe apps and hardware blow all this out of the water.

FWIW: I programmed mainframes briefly early in my career, so I am quite familiar with the ecosystem.


The reliability is legendary.

Every calculation in the CPU is replicated. If a CPU shows any sign of failure, the system will try to migrate threads off the failing CPU to another one.

DRAM is RAIDed.

There is a disaster recovery capability that can replicate across several data centers within a 70 km range via optical fiber. If one of them burns, gets flooded, or is hit with a nuke, the others will pick up the slack automatically.


> The reliability is legendary.

Not at the OS level. Back when I was doing penetration testing, nearly every organization that had IBM mainframes would suffer pretty severe outages just from our basic scans and doing things like checking open ports. They were also super duper easy to break into 90% of the time.

Also, most of the software running on mainframes has been running for decades, which means they've had 40+ years to work out all the bugs. I'm 100% certain that if you took any given "modern" software stack (take your pick!) and very carefully applied patches to it for 40 years without ever adding any major new features, it would be just as reliable.


I spent a few good early years working at the system level with z/OS, z/VM, and mainframe hardware from the z10 era.

The reliability is indeed legendary for the use cases it was originally designed for (running z/OS or TPF, IBM Db2 on z/OS, CICS, and COBOL batch jobs). However, IBM marketing folks will try to sell you on specialty processors that can run Java applications, Linux VMs (s390 arch), etc. - that's where the reliability story goes off the rails.

Most serious mainframe users I worked with had legacy applications covering one of the original use cases I mentioned, and those run reliably. Redundancy is engineered at every level of the hardware. You can hot-plug CPUs like blades on a running system for maintenance and replace them; same with memory modules, storage devices, etc. Upgrades to z/OS are also so thoroughly documented that you can avoid downtime or plan for minimal outages.


Using Xen (https://xenproject.org/), you can live migrate running virtual machines from one physical computer to another on commodity PC hardware while continuing to serve requests (no downtime). You can then turn off the computer the VM was originally running on, upgrade it, then migrate the VM back to it again. I did this once while pinging the VM from another machine. It didn't even drop any packets. My jaw dropped though.
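
For anyone curious what that experiment looks like, here's a rough sketch (the guest and host names are made up; it assumes a Xen host with the xl toolstack and SSH access to the target host):

    # Ping the guest continuously while live-migrating it to another host.
    import subprocess
    import threading

    GUEST_NAME = "guest1"                 # xl domain name (hypothetical)
    GUEST_ADDR = "guest1.example.com"     # guest's IP/hostname (hypothetical)
    TARGET_HOST = "host2.example.com"     # destination Xen host (hypothetical)

    def ping_guest() -> None:
        # Watch this output for dropped packets during the migration.
        subprocess.run(["ping", GUEST_ADDR])

    def live_migrate() -> None:
        # xl migrate copies the domain's state over and switches execution
        # to the target host while the guest keeps running.
        subprocess.run(["xl", "migrate", GUEST_NAME, TARGET_HOST], check=True)

    if __name__ == "__main__":
        threading.Thread(target=ping_guest, daemon=True).start()
        live_migrate()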


Proxmox also does this


The "specialized processor" is a different microcode for the same CPU. My understanding is that this microcode has disabled a few instruction so z/OS won't run but doesn't really do anything special for Java like

https://en.wikipedia.org/wiki/Jazelle

The point is that Java or Linux workloads could be run on some other CPU and so face competition, but this is not the case for z/OS. Thus there is a reason to lower the price for workloads that could be easily migrated but keep it high for captive ones.


Yes, they do crash, but they never produce wrong results, and that is exactly what they were built to do. And don't forget: these things pre-date the internet, so random strangers having access to the network was unthinkable.

Back in the day all the essential software, e.g. flight control, which still largely runs on S/360 today, had to be proven to be correct. There are mathematical concepts/processes that allow that.

The firmware is proven to work correctly. The compiler is proven to work correctly. The OS is proven .... I guess you get where this goes.
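
To give a flavour of what "proving software" means, here is a toy example (in Lean, nothing to do with any actual mainframe codebase): a tiny function together with a machine-checked proof that it meets its specification.

    -- A toy "code + specification + proof": mymax never returns
    -- something smaller than its first argument.
    def mymax (a b : Nat) : Nat :=
      if a ≤ b then b else a

    theorem mymax_ge_left (a b : Nat) : a ≤ mymax a b := by
      unfold mymax
      split
      · assumption          -- branch where a ≤ b holds
      · exact Nat.le_refl a -- branch where a > b, so mymax a b = a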

The problem is: Nobody today even wants to learn "how to prove software" anymore. I tried to teach a class at my University a few years back, and 12 out of 12 students dropped out in the first 3 lessons. My usual dropout rate is close to 0%.

For non-essential software you are probably right. After 60 years of development all the bugs are gone; plus, the developers know their machines (hardware & software) inside out. No chasing after ever-changing platforms, standards, APIs & SDKs.


Is security the same thing as reliability?


Not "the same thing" but security is a HUGELY IMPORTANT under the umbrella of "reliability".

If your stuff isn't secure, how could you possibly claim that it's reliable? If I said "resilient" (a synonym for "reliable") instead, would it make more sense (in the scope of IT stuff)?


But is security even "under the umbrella of 'reliability'"?

Depends (like pretty much everything else) on your definitions, I guess.

I could see defining "security" and "reliability" along orthogonal axes:

"Sure, it's a bit leaky, but it's very, very reliable -- including reliably leaky."


At a high level, security can be thought of as the CIA triad:

Confidentiality, Integrity, Availability.


I can second the legacy and reliability assertions. My experience with z/Arch was more recent - and it's a very different world from the modern cloud CI/CD Linux git Dev[Sec]Ops thing - but the reliability and legacy support immediately impressed me. A single z/Arch rack is basically a cloud in a box: get it power and network connectivity and you're golden. Other vendors can offer that, though. The legacy support is another story; I watched an nginx proxy and a custom financial reporting app from the late 70s running side by side on the same box, and that's when I understood the additional dollars.


> What can you accomplish with an IBM mainframe that you could not do on x86 Linux servers?

I would probably word that as: what can a mainframe do that commodity servers cannot? They are more reliable and allow for spinning up a vast number of workloads with incredibly fast virtual networking. If one needed a "cloud" of Linux nodes, a Z16 would give insanely fast deployment and teardown of workloads on demand, and those nodes could talk to each other with very low latency and high throughput. Some companies and organizations need a small private cloud (think pod or blast-radius-contained mini region) that they can spin up on demand. Even in a catastrophic failure, the fault blast radius is limited to the group of racks comprising the mainframe.

The cost is not just hardware. Anything IBM is going to be IBM supported, and your business would have factored that into the ROI/TCO. The contracts are very expensive, but you have the highest-level engineers a phone call away, and if you require it they can also remotely diagnose problems and have replacement parts in {n} hours, based on the contract. Not everyone needs this, of course, which understandably leads many to question the cost.


Capability-based security [2] is what IBM and other mainframe systems offer. When you have a job, in the setup you specify exactly what resources and runtimes are to be made available, which virtual disks, etc. The code running in that constrained environment can't affect anything else. This is, effectively, capability-based security. It's the thing that didn't get built into Multics in time, and that Unix made fun of and ignored.
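
The core idea is easy to show outside the mainframe world too. A toy sketch (plain Python, purely illustrative; the JCL comparison is a loose analogy): instead of code reaching out to whatever it wants (ambient authority), it can only use the handles it was explicitly handed.

    import io
    from typing import TextIO

    # Ambient authority: the code can open any path it can name.
    def report_ambient() -> str:
        with open("/etc/passwd") as f:   # nothing stops this
            return f.read()

    # Capability style: the code receives exactly the handles it may use
    # and has no way to name or reach anything else.
    def report_capability(allowed_input: TextIO) -> str:
        # The only "disk" this code can touch is the object it was given.
        return allowed_input.read()

    if __name__ == "__main__":
        # The job setup decides what the job gets - loosely like a JCL DD
        # statement naming the datasets a step may use.
        dataset = io.StringIO("record 1\nrecord 2\n")
        print(report_capability(dataset))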

It's the same thing that Linus thought wasn't important at all, ignoring the advice of his teacher, Andrew Tanenbaum, who was trying to teach him about microkernels and why they were better [3].

In the PC world, we don't even have ECC memory in most desktops, etc. That's why Rowhammer[1] and all the other things like it work, because it's substandard RAM that we've grown to accept in the name of cost savings.

We keep finding Ersatz versions of Capabilities, first in Virtualization such as VMware, then in Docker and containers, and now in WASM/WASI. We'll eventually learn the lesson, likely in 5-10 years now. If we can keep WASM from getting corrupted, it might make it.

Sure, mainframes aren't cost competitive, but that doesn't matter: the cost of computing is nowhere near as important as the business it makes possible.

[1] https://en.wikipedia.org/wiki/Row_hammer

[2] https://en.wikipedia.org/wiki/Capability-based_security

[3] https://en.wikipedia.org/wiki/Tanenbaum%E2%80%93Torvalds_deb...


What is Ersatz?


It means imitation, subpar, or fake.


jerry-rigged, improvised, a kludge that works, sort of


My understanding is: Silicon Valley hires tens of thousands of the most brilliant, expensive engineers to write always-on distributed systems and to try and keep them alive with the data sort-of consistent. Enterprise does not do this. Your bank writes simple, naive, non-distributed programs. The operator of one super-reliable computer invokes the program and makes sure it runs to completion each night. When it is time to make a deployment or a database migration on an internet-connected service, they just declare a maintenance window and go down for a few hours. There's one big ole SQL database in normal form with joins and transactions.


Agree on all except the last sentence: As I understand it, on the more modern mainframes there's one big ole SQL database in normal form with joins and transactions, but on others there's a huge bunch of applications (usually written in COBOL) that handle data in fixed-field files, predating normal form, joins, and transactions.
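
For anyone who hasn't seen it, "fixed-field" just means every record is a fixed-width line and every field lives at a fixed column offset. A little sketch (the layout here is invented, not any real bank's copybook):

    # Parse a fixed-width record: columns 0-9 account number,
    # 10-29 name, 30-38 balance in cents (layout is hypothetical).
    RECORD = "0012345678JOHN Q PUBLIC       000123456"

    def parse_record(line: str) -> dict:
        return {
            "account": line[0:10],
            "name":    line[10:30].rstrip(),
            "balance": int(line[30:39]) / 100,  # digits stored as text
        }

    if __name__ == "__main__":
        print(parse_record(RECORD))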


You can continue to run internal software that has been under active development for over 50 years. In 1970, Unix and C did not exist, but COBOL did. Do you really want to rewrite all your credit card processing code from scratch when it has 50 years of business rules and bug fixes?


Google news for malware, ransomware, DC/cloud outages, millions - and see how many reference an IBM mainframe.

You get what you pay for. All your newfangled software and hardware just reinvents the wheel, is buggy, is layered on an onion stack with a lifespan of maybe 3 years, and takes a lot more humans and money to run, diagnose, and maintain. And in three years, a new CTO will come from Wharton, tell you you're a moron for not using microservices 2.0, and spend $100M rearchitecting it.

In the meantime, the Z/Power boxes just keep going and 4 or 5 guys run the whole show. They sleep at night and have a life.


I currently work at an organization that has a lot of very flaky applications spread across a handful of servers (5, 10, 20) using ZooKeeper, Ignite, and a few other frameworks that attempt to provide "clustering".

It all adds a lot of overhead and isn't particularly reliable - frankly, I think it would make more sense to run it on a mainframe and have simple application code, because the fault tolerance lives at the hardware/OS layer.

If you are a big tech company, the massive scale means the cost advantage of x86 is worth dealing with node failures, and it's not like any single mainframe could handle it anyway. But for a lot of internal applications that are too big for any single x86 machine yet don't require thousands of nodes, I can totally see where it makes sense.


I actually want to ask the reverse question: why is it not popular for mainframe manufacturers to rent out their cores, RAM, and storage to non-banking/insurance businesses? Is it because of the cost?


It's because they're not cost competitive. For the same amount of money you can get a whole heck of a lot more servers (CPU and memory!) running Intel, AMD, or more specialized hardware. Even if we're only talking about their (IBM's) cost!

If mainframes were actually competitive with modern server hardware everyone would be using them. Even IBM uses regular Intel hardware in their cloud stuff!

Mainframes aren't even fast... IBM will make all sorts of BS claims about memory speed and the bandwidth of their interconnects and whatnot but all of it is 100% artificial benchmark nonsense that doesn't translate into real-world usefulness because nobody is rewriting their mainframe shit to take advantage of it.

I don't know about you, but in the time I've been in IT, "hardware failures" that actually had any sort of serious impact on operations were few and far between. The whole point of modern solutions (everything from N-tier architecture to containers and on-demand compute/function stuff) is to make the hardware irrelevant. At my work we had a whole data center taken down as part of a planned test and I doubt that any end users even noticed (and it was down for hours because they screwed something up when bringing things back online, hehe). I think something like 6,000 servers and some large amount of networking equipment were completely powered down? I don't know the specifics (and probably shouldn't give them out anyway).

The whole point of mainframes is to serve a function: All the hardware is redundant/super robust (within itself). That function is mostly meaningless in today's IT infrastructure world.


Mainframe performance is very fast within the mainframe itself. As with every other platform, performance drops quite a lot when you escape into the real world.


IBM Cloud does rent out mainframe cores, although it is quite expensive.


It is because, in business practice, a very widely used paradigm/model of thinking is "good enough". The best-known example is the Ford Model T - it was not the fastest, it was not the most beautiful, but it was cheap and reliable enough for daily use, and it conquered the US.

Nearly all other producers before the Ford Model T tried business models like "the fastest" or "the most reliable and comfortable". Those are also good models, but they are just much less popular than "good enough".


Interestingly, this was a story line in the series Halt and Catch Fire. I think it was in season 2.


I can tell a story about a typical mainframe client from decades ago.

This was a successful, big, distributed corporation, which began back when mainframes were the only viable technology for them, and all of their business was inside the machine.

They grew for decades, but then a problem happened - the mainframe reached its limits, and the vendor was not agile enough and said "we will transfer all your infrastructure to new hardware, but it will take a few months" (they understood this as stopping the business for a long time).

I don't know all the exact details; all I know is that another company made them an offer, claiming they would transfer them to the cloud without stopping work.


I think this is because IBM is a software/consulting company, and their software costs so much that the hardware cost is not important.

And sure, a big share of their business software intensively uses very specific features of their mainframes.

And as a last trick, such companies usually don't disclose real prices, but when you buy hardware + software + consulting in one package, they offer huge discounts.

So sure, you could pay their consultants to write for Linux, but it will cost more.


For hard specifics, people here have already mentioned memory protection.

I could add that I've seen an x86 server from IBM, but with an IBM chipset; it supported more CPUs than any competitor of its time, and for some clients this was a killer feature.


As I've understood it[1], in a word: Throughput.

___

[1]: Edit -- Removed superfluous "haven't read TFA" disclaimer.



