What Happens When You Mix Java with a 1960 IBM Mainframe (thenewstack.io)
230 points by 3n7r0pY on Jan 23, 2017 | 79 comments



I'm still struggling with my own legacy code problem. I'm reviving an old LISP program from the early 1980s. Parts of it were written for the original Stanford AI Lab SAIL system in the 1970s. It last ran under Franz LISP in 1986.

My current struggle is with one line of code:

   (defun getenode (l) (cadr l))
That ought to be simple enough. But it's being applied not to a list but to a "hunk". A "hunk" is an obsolete MacLISP concept.[1] It's a block of memory which has N contiguous LISP cells, each with two pointers. This is the memory object underlying structures and arrays in MacLISP. Macros were used to create the illusion of structured data objects, with hunks underneath. However, you could still access a "hunk" with car, cdr, cxr, etc.

I'm converting this to Common LISP, which has real structures, but not hunks. That, with some new macro support, works for the regular structure operations. So far, so good.

But which element of the structure does (cadr l), which usually means the same thing as "(car (cdr l))", access? (cadr (list 0 1 2 4)) returns 1, so you'd think it would be field 1 of the structure. But no. It's more complicated and depends on how hunks are laid out in memory.

The Franz LISP manual from 1983 [2] says "Although hunks are not list cells, you can still access the first two hunk elements with cdr and car and you can access any hunk element with cxr†." At footnote "†", "In a hunk, the function cdr references the first element and car the second." This is backwards from the way lists behave.

A blog posting from 2008 about MacLISP says "A Maclisp hunk was a structure like a cons cell that could hold an arbitrary number of pointers, up to total of 512. Each of these slots in a hunk was referred to as a numbered cxr, with a numbering scheme that went like this: ( cxr-1 cxr-2 cxr-3 ... cxr-n cxr-0 ). No matter how many slots were in the hunk, car was equivalent to (cxr 1 hunk) and cdr was equivalent to (cxr 0 hunk)." Note that element 0 is at the end, which is even stranger. The documentation is silent about what "cadr" would do. Does it get element 2, or get element 0 and then apply "car" to it?
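To keep the two readings straight while I experiment, here's a toy Common Lisp sketch. This is purely my own stand-in model of hunks as vectors, not real MacLISP behaviour and not the original code:

  ;; Toy model only: represent a hunk as a vector indexed by cxr number.
  ;; Per the quotes above, CAR of a hunk is (cxr 1 hunk) and CDR is (cxr 0 hunk).
  (defun make-hunk (&rest elements)
    (coerce elements 'vector))

  (defun cxr (n hunk)
    (aref hunk n))

  (defun hunk-car (hunk) (cxr 1 hunk))  ; "the second element" in Franz's wording
  (defun hunk-cdr (hunk) (cxr 0 hunk))  ; "the first element" in Franz's wording

  ;; Reading A: CADR is the literal composition (car (cdr hunk)), i.e. take
  ;; slot 0 and then CAR whatever object lives there.
  (defun hunk-cadr-composed (hunk)
    (let ((d (hunk-cdr hunk)))
      (if (consp d) (car d) (hunk-car d))))

  ;; Reading B: CADR names a fixed slot, by analogy with lists where CADR is
  ;; the second element.  Which cxr number that maps to is exactly what I
  ;; don't know; slot 2 here is just a placeholder guess.
  (defun hunk-cadr-slot (hunk) (cxr 2 hunk))

With that model, (hunk-car (make-hunk 'slot0 'slot1 'slot2)) returns SLOT1 and (hunk-cdr ...) returns SLOT0, which at least agrees with the Franz footnote. Which reading of cadr matches what the original structure macros relied on is still the open question.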

The original code [3] contains no relevant comments. I'm trying to figure out from the context what the original author, Greg Nelson, had in mind. He died in 2015.[4]

[1] http://www.mschaef.com/blog/tech/lisp/car-cdr.html [2] http://www.softwarepreservation.org/projects/LISP/franz/Fran... [3] https://github.com/John-Nagle/pasv/blob/master/src/CPC4/z.li... [4] https://en.wikipedia.org/wiki/Greg_Nelson_(computer_scientis...


There are only a few possibilities for what (cadr hunk) could mean. One way to solve this is to try them all and see which one runs successfully.

Based on †, it sounds like (cadr (hunk (hunk 1 2) 3)) should return 2.
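A quick check with hunks modelled as plain vectors agrees, assuming (hunk a b) stores a as the "first element" (CDR) and b as the "second" (CAR), which is just my reading of the footnote and not something I've verified:

  ;; Stand-ins for (hunk 1 2) and (hunk (hunk 1 2) 3), with slot 0 = CDR
  ;; and slot 1 = CAR as in the footnote.
  (let* ((inner (vector 1 2))
         (outer (vector inner 3)))
    ;; cadr read as (car (cdr outer)): slot 0 of outer, then slot 1 of that
    (aref (aref outer 0) 1))   ; => 2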

Is the old LISP code available online somewhere? I'm curious to see it.


You can run it on an emulated PDP-10 running ITS. A friend of mine has been creating an easy-to-use project that sets all of this up. MacLisp is included in the system.

https://github.com/PDP-10/its


See the link listed above: https://github.com/John-Nagle/pasv/blob/master/src/CPC4/z.li...

I put all the code on Github. The oldest version of each file is exactly what ran in 1986.

The code is delicate. It's a theorem prover, and there's much manipulation of complex data structures, with little explanation of what's going on. The overall theory is documented; this is the original Oppen-Nelson simplifier and there are published papers. But the code has few comments.


getenode is only called in a few places, and each call looks like:

  (getenode znode)
i.e. getenode is only called on znodes. The definition of makeznode is:

  (defun makeznode (node)
         (prog (l znode)
               (setq l (list (list '(1 . 1) node)))
               (xzfield node (list l node nil))
               (setq znode (tellz l node))
               (or (null znode) (eq znode node) (break makeznode))
               (return l)))
So it seems like getenode is called on a regular list whose structure looks like

  (list (list '(1 . 1) node))
Assuming "node" is an enode, the way to access it would be (cadar znode), not (cdar znode). Try changing the definition of getenode to

  (defun getenode (l) (cadar l))
and see if it runs.


I thought that way at first. But the program breaks when (getenode znode) is called on an item of type "node", not a list. This happens when (interntimes) takes the path which ends

    (or (setq znode (tellz l node)) (return t))
    (or (eq node (getenode znode)) (zmerge node (getenode znode)))))
and (tellz) has taken the path which ends (return (baserowz* i)). "baserowz*" is an array which contains links to "node" items, not list cells. What "z.lisp" and "ze.lisp" are doing, by the way, is solving systems of linear inequalities by using linear programming on a sparse matrix.

Also, see

    (defun isznode (x) (and (hunkp x) (= (hunksize x) 8))) ; original

    (defun isznode (x) (equal (type-of x) 'node))   ; CL version
which indicate that znodes are hunks/structures, not lists. This is inconsistent, yet somehow it used to work.
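A purely illustrative way to make the two call patterns explicit (this assumes the CL port defines node with defstruct, and makes no claim about which field the original cadr actually pulled out in the node case):

  (defun getenode (l)
    (typecase l
      ;; the list built by makeznode: (((1 . 1) node))
      (cons (cadar l))
      ;; the baserowz* path hands over a node directly; the right field is
      ;; exactly what's still unknown
      (node (break "getenode on a node: ~S" l))
      (t (error "getenode: unexpected object ~S" l))))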

(If you want to talk privately about this, I'm at "nagle@animats.com". Too much detail for HN.)


or abuse, ouch!

   (unless (setq znode (tellz l node))
     (return t))


Having done a bit of lisp spelunking myself, I would suggest that you dig into the source of the lisp your program ran under. Trying to figure out behavior from the surrounding context can be rather difficult when you're playing with an archaic, sparsely documented lisp.


Note that this is the order of display:

The order of display of hunk slots is historical in nature. For better or worse, the elements of a hunk display in order except that the 0th element is last, not first. e.g., for a hunk of a length n+1, (cxr1 . cxr2 . ... . cxrn . cxr0 .)

It could still make sense for the layout in memory to be sequential, like (cxr-0 == CDR, cxr-1 == CAR, ... others ...).

Note also that CAR extracts the leftmost element of a hunk, just as it addresses the leftmost element of a cons. Similarly, CDR extracts the rightmost element of hunks and conses.

It seems more logical that CADR is just the combination of CAR with CDR. I don't think the designers would try to carry over the fact that it means "second" with proper lists to hunks. It just seems unlikely, but I have no proof.

Also:

(Note that the operation CAR is undefined on hunk-1's, but CDR is not.) This means that if you want to make a plist for a hunk of your own, you can use its cdr as a hunk; it does not mean that you can blindly assume that any hunk wants its CDR treated that way. The exact use of the slots of a hunk is up to the creator; it's a good idea to mark your hunks (e.g., by placing a distinctive object in their cxr-1 slot) so that you can tell them from hunks created by other programs.

My guess is that there is some metadata associated with a hunk, stored in CXR-0, a.k.a. CDR.
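If it helps, here's a small sketch of that "mark your hunks" advice, again using plain vectors as stand-in hunks (slot numbering as in the Pitmanual, everything else made up for illustration):

  (defparameter *my-hunk-tag* 'my-program-hunk)

  (defun make-marked-hunk (size)
    (let ((h (make-array size :initial-element nil)))
      (setf (aref h 1) *my-hunk-tag*)   ; put the distinctive object in the cxr-1 slot
      h))

  (defun my-hunk-p (x)
    (and (vectorp x)
         (> (length x) 1)
         (eq (aref x 1) *my-hunk-tag*)))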

http://www.maclisp.info/pitmanual/hunks.html


You should consult the Maclisp manuals. There is one in ITS, and the Pitmanual is here: http://maclisp.info/pitmanual/index.html

You should probably also test some code in Maclisp.



Is someone in the US government really still using an IBM 7074? Really? I'm shocked. How could that possibly be cost-effective?

Actually, this blog post makes the story clearer:

http://nikhilism.com/post/2016/systems-we-love/

It isn't a physical IBM 7074.

When it came time to migrate from 7074 to S/360, rather than rewriting their 7074 software, they just wrote a 7074 emulator for S/360. And, it sounds like, they are still running their 7074 software, under their 7074 emulator, most likely on a recent z/Architecture mainframe.

The article makes it sound like people still use "1960s mainframes" when I very much doubt anyone is still running 1960s hardware in production. People use modern machines: modern IBM mainframes have multicore 64-bit processors, while other mainframe vendors such as Unisys and Fujitsu mainly (I believe) use x86-64 hardware running Linux, which runs a software emulator for the old mainframe CPU.

A lot of legacy, sure, but I think this article makes it sound even more legacy than it really is.


The picture of an IBM 7074 is from Wikipedia.

IBM offered 7074 emulation as a standard IBM System/360 product.[1] On an S/360, it required some special hardware support. In 1972, IBM gave users a free IBM 7074 emulator, software only, for System/370 machines.[2] They may still be running that program on a Z-series mainframe.

[1] http://bitsavers.trailing-edge.com/pdf/ibm/370/compatibility...

[2] https://books.google.com/books?id=p5zVQgaQ-N0C&pg=PA11


It's possible Ms Bellotti and her team didn't realise they were working with an emulator. Mainframes being what they are, the programmers were probably never in the same room as the machines they were programming, and it does take a bit of digging to figure out that the architecture you see before your eyes is emulated on another machine (like in "The Story of Mel").

Still, even the blog post you link to doesn't make it absolutely clear that Java was running on an emulated S/370. It says that the decision was made to emulate the older architecture rather than rewrite the old programs, but then it goes on to say "These are still operational". Does it mean the old programs? Or the old machines? It's hard to say.

As to how unlikely it is to see a very old machine still in use, instead of one made in more recent times: last year I talked to an engineer who claimed he had seen a PDP still in operation at some transport company, if memory serves.


I believe (a coworker of mine worked on S/360) that all the old software can be effectively run through emulators on the current system z. The feeling was that once you had done the development and testing of an older system, IBM was never going to force you to rewrite your code. As a result, the upgrades were pretty seamless over the years. Many of those customers never had to upgrade and never had a reason to.


PDP-10 was discontinued in 1983, but PDP-11 wasn't discontinued until 1997, with third-parties continuing to sell parts, so it's really not that unlikely to come across PDPs, depending on which line.


I 100% buy that they're using 1960s hardware. I've talked to some people who had to spend half their day on eBay trawling for parts to keep their ancient systems running. I've personally worked with medical offices still using 1970s hardware; it's not that rare. Many places have an "if it ain't broke don't fix it" attitude.


And it's an open question whether or not that attitude is better. A day or two of office admin time every 3-6 months is quite possibly more cost effective than hiring one of us for weeks/months/years to create a new system...


Especially in a doctor's office where artificial supply restrictions on degrees and licensing give them little or no competition for clients.


The I/O speed reported in the article seemed highly unlikely for mag tape.


"a different group of mainframes would run an automated job that would harvest that data from the magnetic tape and load them up into more traditional databases"


I think most of it is highly unlikely...


> A lot of legacy, sure, but I think this article makes it sound even more legacy than it really is.

Indeed. The point of the talk was that 1) legacy is often assumed to be bad not for any real technical reasons but just because it is legacy and 2) a lot of what was being presented as legacy wasn't even legacy. Their OS 2200 version was actually newer than the Oracle DB they were using on the "modern" side of the stack.


You might enjoy this article given the kind of work you do:

http://www.pcworld.com/article/249951/computers/if-it-aint-b...

You've probably seen plenty of crazy stuff in legacy systems but I'm hoping at least one surprises you. Maybe the first one. :)


I know of an S/360 assembly code base that (as of a decade ago) still ran code that assumed a 24-bit address space and used 31-bit pointers with tags stuffed in the unused 7 bits. So it had to run in the lowest 16MB of memory, on a 64-bit machine. I don't think they had the budget to rewrite that subsystem to fix it. So yes, new hardware, but some of the legacy problems in the software run very deep.
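The packing itself is simple enough; a rough sketch in Lisp (field positions are my own guess for illustration, not the actual layout of that code base):

  ;; 7 tag bits packed above a 24-bit address inside a 31-bit pointer word.
  (defun make-tagged-pointer (address tag)
    (dpb tag (byte 7 24) (ldb (byte 24 0) address)))

  (defun pointer-address (p) (ldb (byte 24 0) p))
  (defun pointer-tag (p) (ldb (byte 7 24) p))

  ;; (pointer-address (make-tagged-pointer #xABCDEF 5)) => #xABCDEF
  ;; (pointer-tag (make-tagged-pointer #xABCDEF 5))     => 5

The scheme only stays sound while every object sits below 16MB, which is why the whole subsystem is pinned to the bottom of memory even on 64-bit hardware.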


They're often virtualized but don't underestimate mainframe persistence.

At my last job as a consultant someone made https://github.com/manheim/antimony for testing IBM TN5250 mainframe screens in Ruby - it's like Selenium but even more brittle. :>


It's "cost effective" because "cost effectiveness" is not as objective as we think, once we start considering things like risk tolerance.


ClearPath Dorado still had a Univac on a chip until 2015. This is actual hardware, not just software emulation.


Yes. And they never accessed a 1960 IBM mainframe at all with their software. The title is probably intentionally misleading.

And the one they actually accessed, and received query results from in 6 milliseconds, was introduced in 2008, so it's not so old and slow:

https://www.app5.unisys.com/offerings/ClearPathConnection/st...

"Single image performance range of 300 MIPS at the entry level and maximum single image performance of approximately 5,700 MIPS (32 processor system)."

"An expanded memory subsystem supports larger memory capabilities and offers memory configurations that include the ability to expand up to 4GW per cell and up to 32GW for a maximum eight-cell system."


Mazal tov, that's containerization...from the 1970s.

Everything old is new again. I love our profession.


I was going to ask if they couldn't just replace the hardware with emulators :-)


Does ext4 journal mode not journal everything?


This article was such a pleasant surprise! I loved talking at Systems We Love and I love talking about legacy architecture in general. Bryan and the Joyent team were very accommodating and understanding. The White House is an amazing place to work (even now), but it's not a system that understands conference talks very well. Wish we could have a video, but it was a heavy lift just getting the bureaucracy okay with the idea that I was going to talk about information that was already public, without naming agencies or projects and without going into detail beyond what was in a user manual, and that this was not a security risk.


Hi Marianne! I'm with The New Stack (not the writer of this article), and was wondering about the video myself. Interesting to see you address that there's definitely no public-facing version, as it seems a lot of the commenters here would love to see it. I can only imagine the bureaucratic complexity involved.


Oh yeah, when I talked to the agency that owns the 7074 (more on this in a second) they were like "You can't do this because we will get hacked"

"How are you going to get hacked if I talk about your mainframe? It's not connected to the public internet, is it?"

"No. Well... we don't know... but ... hackers! Hackers are really smart Marianne."

Part of the compromise was that I promised I would only use information that was already available publicly through government reports and news articles. I went back through my talk and documented where each fact was already published somewhere else until they were comfortable with it. So the ambiguity on whether the 7074 was the actual machine or an emulator was deliberate... there were certain things I could not find a public comment on and therefore agreed to avoid making direct statements about.

This all seems super annoying, but it makes sense when you realize how heavily scrutinized public servants are. In the end they are only trying to protect me, my organization and Obama's legacy. Three things that are really important to me. So I can't exactly blame them for it. I was happy to be able to find a middle ground where they felt comfortable, the organizers weren't too badly inconvenienced and I got to give the talk I wanted to.


>> It was at this point that the seasoned data architects in the department began expressing their exasperation. “15 years ago, everybody was telling us ‘Get off the mainframe, get on AT&T applications, build these thick clients. Mainframes are out.’ And now thick clients are out, and everybody’s moving to APIs and microservices, which basically are very similar to the thin client that a terminal uses to interact with a mainframe.”

A.k.a. "Them as not knows their history are doomed to repeat it". But I suppose it's more a case of business imperatives than real ignorance that drives this mad race to make new stuff that works worse than the old stuff, only so we can then go back to the old stuff with a different name.

Btw, that lady is my new tech hero:

“The systems that I love are really the systems that other engineers hate,” Bellotti told the audience — “the messy, archaic, half chewing gum and duct tape systems that are sort of patched together.”

<3 <3 <3


"15 years ago" doesn't sound right.

Thick clients were being pushed in the mid-90s. But then the Clipper Chip plans went south ...


Consider that state/federal bureaucracies get on "trends" with a delay of 2 or 3 years minimum, and the story is probably from 2015 or thereabout - the agency was launched in 2014.


Wish I could upvote this twice.


The thing is that there was a lot about those old systems that was slow, so you were very, very careful how you programmed. You tended not to use vast library stacks; you went close to the metal and coded in languages like Assembler, COBOL or FORTRAN. I/O was often run through specialised co-processors (such as IBM's channel processors) and the terminals could sometimes help too.

I have friends who have been looking after legacy applications for an airline running on Unisys. The core apps for reservation, Cargo booking and weight/balance were written in FORTRAN. In recent times, the front end was written in Java to give web access. They tried to rewrite the core apps but it was impossible to do so and get the performance.


>> They tried to rewrite the core apps but it was impossible to do so and get the performance.

Well, COBOL is a bit like the C of mainframes: you can manipulate memory directly and so on. You can't really do that sort of thing with Java.


a) If it was really running on the old hardware, then Ruby on a modern machine would have been several orders of magnitude faster than the original code, at least because of the faster I/O.

b) If the whole thing was indeed running in an emulator, the emulation overhead would have negated all direct-memory-access advantages.


Emulators on mainframes are much more sophisticated and performant than is typical on x86 and ARM platforms. The hardware and even software is often designed with emulation in mind, not just for backwards compatibility but for forwards compatibility, too.


Do you mean to say that they were running a mainframe emulator on a mainframe?


Compilers for some IBM mainframes have for decades (since the 1970s, I think) targeted an intermediate virtual machine instruction set which is then translated on-the-fly to the local architecture by the OS upon execution. So in the case of IBM their machines are truly built with both forward and backward compatibility in mind. The pointers for this instruction set have been 128 bits since the beginning, long before 64-bit hardware even came into being.

Some (or all?) of the latest Unisys mainframes run on Intel Xeons, but with custom chips for translating the machine code of their old architectures.

I don't work in this area. I just like reading about it. Though, unfortunately, it's difficult to find clear specifications and descriptions on how these architectures work.

For example, Unisys' 2200 ClearPath architecture is one of the (if not _the_) last architectures still sold that uses signed-magnitude representation, as well as having an odd-sized integer width of 39 bits. (INT_MAX is 549755813887 and INT_MIN is -549755813887, and the compiler has to "emulate" the modulo arithmetic semantics required of unsigned types in C. ClearPath MCP is also a POSIX system, and has to emulate an 8-bit char type.) I discovered that you could download the specification for their C compiler for free online, which was useful when discussing the relevancy of undefined behavior in C.

But AFAIU (and this is where finding concrete details is more difficult) the latest models of the ClearPath line use Xeons with custom chips bolted on to help run the machine code of the older architecture. In any event, the point is that while the old architecture is arguably emulated, it's not the pure software emulation you might assume, and the resulting performance is better than that of the previous models of those mainframes, which were still being built at least until a few years ago. In other words, direct memory access isn't ruled out, because the I/O systems may have been intentionally designed to work efficiently in a backwards-compatible manner.
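Just to sanity-check those limits (assuming all 39 bits are magnitude, with the sign carried separately, and that the unsigned types also have 39 value bits; I haven't verified either):

  ;; Signed-magnitude gives symmetric limits, unlike two's complement
  ;; where INT_MIN is one larger in magnitude than INT_MAX.
  (- (expt 2 39) 1)                     ; => 549755813887, the quoted INT_MAX
  (- 1 (expt 2 39))                     ; => -549755813887, the quoted INT_MIN

  ;; Emulating C's unsigned wraparound then means reducing results mod 2^39
  ;; in software, e.g.:
  (mod (+ 549755813887 1) (expt 2 39))  ; => 0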


The 128-bit pointer intermediate code is used on what IBM calls "midrange systems" (i.e. AS/400), not mainframes. IBM mainframes execute their machine code directly, and the ISA has been designed since the beginning to allow for efficient virtualization while remaining extensible and backwards-compatible. Otherwise, the IBM mainframe magic is in I/O offload and truly immense memory bandwidth. On the other hand, Unisys systems use an architecture that is significantly different from what today's programmer would expect, with a completely different memory model originally implemented in hardware (which essentially combines the memory protection model implemented on AS/400 in software with Lisp machine-style pointer tagging).


Excellent comment. I'll also point out that system z mainframes are not slow machines, so even with some emulation overhead there is typically enough performance.


Yes, but good luck translating any sizeable chunk of code to a higher-level language without a massive effort figuring out which things are discarded side effects that can be ignored and which things are relied on later. I just spent a few hours last night massaging a C translation from 6502 assembler. It's a tiny piece of code, ~3000 lines that'll probably shrink to ~2500 or so as I figure out which results are ignored (the original translation attempted a faithful 1:1 translation, instruction by instruction, which leads to things like long sequences to handle basic multiplication etc.). But it takes ages, because it is not always obvious when the code depends on, e.g., the status flags being set, and values keep being moved between registers. Now try doing that with a big piece of code.

There's a reason why people often resort to emulators.


> The thing is that there was a lot about those old systems that was slow, so you were very, very careful how you programmed.

That's a common sentiment. I wish I could find the quote by someone who made the transition; it was about how happy they were to be able to compile so much quicker, and how getting immediate feedback made them so much more productive.


The notion of waiting ages for programs to compile or assemble is mostly related to the older hardware.

I compile/assemble COBOL and IBM's assembly language on a z13 daily and it's pretty much instantaneous.


> The notion of waiting ages for programs to compile or assemble is mostly related to the older hardware.

Oh, I was talking more about older ways to organize the data centre: batch vs timeshare processing.

https://en.wikipedia.org/wiki/Time-sharing


Did anyone read this and go "huh"?

There's this whole thing about how they are getting data from mainframes where "the data was being returned in between one and six milliseconds".

But then: "harvest that data from the magnetic tape and load them up into more traditional databases. That Java application was extracting the data from the databases"

But then: "But the data from the mainframes was actually arriving (from its new home in the database) in less than six milliseconds. The bottleneck was — of course — the Java application."

So of course it is entirely possible to write slow Java applications. But then the story seems to end! So what happened? Did they fix the application?


I have heard from many of my friends about massive projects to 'modernize' mainframe applications with a Java stack. Java did not deliver the improvement that management was expecting. Once the projects had consumed their entire budget at ~25% completion, they were scrapped.

I think, though without any proof, that overreliance on hundreds of mixed-quality libraries, combined with the 'best practices' of enterprise development and heavy application of design patterns, creates a very large surface area for change. This makes a reasonable translation of functionality to Java almost impossible.


Was it Java that did not deliver improvement, or was it the team? :)


… or organizational culture? A lot of places really wanted to believe the problem was the technology because that's relatively much easier to change than going to a bunch of very senior managers and telling them that the way they're used to doing business is too expensive to continue.


I am inclined to say Java, as I have not seen the mythical teams who work on Java without the whole caboodle of 'Enterprise Apps' culture.


You can write modern software on Java without using "enterprise" features: https://github.com/prestodb/presto


Nice. It may be the best way to organize a project this large. I know this is a Maven thing, and once you're using everyone's favorite IntelliJ IDEA it shouldn't matter, but I personally find ~1000 directories for ~4000 files a kind of Javaism.


Can you expand on what makes the Presto project different than the aforementioned enterprise approach? (Not a Java developer, so I don't have context)


I believe it's more because companies that have an "enterprise apps" culture embraced Java more, rather than it being the fault of Java itself. I've seen first-hand how these companies have "enterprisified" C++, C# and heck, even PHP.

I do think Java the language has a long way to go, but it is catching up a bit. I see it as the least-common-denominator language for most companies. Large open source Java projects have been successful (everything on Hadoop, HBase, Cassandra, etc.) and a lot less enterprisey.


When I first saw the Chromebook, my thought was that Larry Ellison's 1998 dream of a network computer thin terminal had come true. It's smarter than a true thin terminal, but everything lives in the cloud (err, butt).


Being someone who worked for Sun at the time, I'd rather say it was Scott McNealy's dream.


That Chrome extension is clbuttic.


Indeed.

> but everything lives in my butt (err, butt).

I had to open the comment in porn (er, incognito) mode in order to parse it :D.


"It used magnetic-core memory (instead of cathode ray tubes) according to Bellotti"

I didn't think magnetic-core memory and CRTs were interchangeable...


To expect CRT/Williams Tube memory is a slightly bizarre assumption; it's a technology that was only very briefly used (developed in 1947 and used only in machines in the early 1950s, all but disappearing by 1956). It's also pretty much unheard of now.

Magnetic core memory, on the other hand, was dominant from 1955 until around 1976, when silicon went mainstream. Machines with MCM were in general use for a long time after that too.

It's possible the interviewer confused it with the widespread use of CRTs in pre-flat-screen monitors. (Shrug, maybe CRT was very briefly mentioned? Who knows?!)

Williams Tube memory is pretty interesting BTW. Worth taking a look at Selectron memory too. You know, for fun.


This is the one thing the article got wrong about the talk. The 7074 replaced magnetic drums with core memory. It's an easy mistake to make since the 700 series used vacuum tube logic and the 7070 replaced those with transistors. The 7074 came about two years after that.


I think the implication wasn't that they were interchangeable, but that the machine they were expecting to see used CRTs, and instead, they found it using MCM.


I didn't think that CRTs were a memory technology, but apparently they have been: https://en.wikipedia.org/wiki/Williams_tube


I'm not sure the article did a good job of answering its headline. I now know that there is a mainframe (an emulator, it seems, from additional research) that returns API responses very quickly (1-6ms) TO a Java app... which apparently is inefficient, as the page render takes 6-10 seconds.

Interesting problems, not terribly well described in the post; I would love to read more.


Yeah it describes pretty basic "discoveries", and reads a bit like a book report. Would love some detail.


An explanation from 'bcantrill about why the talk was not recorded: https://lobste.rs/s/q2xz14/systems_we_love_videos/comments/i...


She likes those crufty legacy systems only because she works with each one for a short time. If she had to work on one every day for years, she would hate it, too.


Hmm, interesting. My bet is that the Java devs were not great at implementing concurrency. But who knows.


Probably lots of XML transforms going on to be enterprisey.


If it ain't broken, don't fix it.


You get sued for violating copyright?


Remember COBOL.net !


EBCDIC stacktraces!



