In modern usage, "von Neumann machine" is unfortunately poorly defined. You won't find it defined in Hennessy and Patterson (even though they won the von Neumann award); it's discussed but never defined, even in Appendix L.
There are strong similarities between the Princeton IAS, UNIVAC 1, IBM 701, and even the CDC 1604 (Cray's transistor version). In fact, I'd say the von Neumann reports [0] are the first 'architecture', with apologies to IBM. Sort of, kind of. Don't chop my head off. But that similarity is not what people are thinking of when they say von Neumann machine. It should be, but it isn't.
But you have to really read the writing to get a sense of what a von Neumann machine actually is, and reading those reports is damn hard. The Computer As Von Neumann Planned It [1] is fairly readable.
As an example, the von Neumann machine word had a binary digit (bit) describing whether a minor cycle (word) was a standard number (data) or an order (an instruction; really two instructions); see section 15.1 of the EDVAC report if you want to look it up. So it's kind of tagged (the quoted passage below gives the exact wording, and a toy decode sketch follows it). Lots of other weirdnesses and cool ideas too.
> Minor cycles fall into two classes: Standard numbers and orders. These two categories should be distinguished from each other by their respective first units i.e. by the value of i0. We agree accordingly that i0 = 0 is to designate a standard number, and i0 = 1 an order.
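To make the convention concrete, here is a minimal toy decode in Python. It assumes a 32-bit minor cycle with the first unit i0 read as the most significant bit; that placement (and the example values) is my assumption for illustration, not the report's exact coding.

    # Toy sketch of the EDVAC minor-cycle tag (section 15.1).
    # Assumption: 32-bit words, first unit i0 read as the MSB.

    def classify_minor_cycle(word: int) -> str:
        """Say whether the leading unit i0 marks this minor cycle
        as a standard number (data) or an order (instruction)."""
        i0 = (word >> 31) & 1
        return "order" if i0 == 1 else "standard number"

    print(classify_minor_cycle(0x0000002A))  # i0 = 0 -> standard number
    print(classify_minor_cycle(0x8000002A))  # i0 = 1 -> order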
Anyways, these days whenever someone says von Neumann they generally gloss over this blizzard of detail (which they probably were never taught) and just mean scalar. It's doubtful whether they are even distinguishing the Harvard and Princeton architectures. They just mean something basic, fundamental, abstract. But they don't really mean von Neumann.
All of this was the basis of the patent lawsuit mentioned in the article. Over the years, many histories have been written. Reconsidering the Stored-Program Concept [2] is pretty good.
You are making an interesting point, one rather ignored in general accounts.
We may say that two strains soon emerged among fixed word-length machines. One prefixed the contents of an address with one or more flags to taint the data as to its type or kind, as in "<fixed-length taint>((<instruction>[<operand>])|<data>)", where the taint may indicate data versus instruction, binary versus decimal, a flag to mark a decision, etc. The other, more generalized form was "(<fixed-length instruction><operand>)|<data>", where a fixed-length instruction represents a serial selector into a hierarchically structured function table, and the interpretation of the contents of any address is left to context. The latter model lends itself more easily to self-modifying programs (with dedicated instructions to address just the instruction part or just the operand, and simple increments of instructions to perform pointer arithmetic), and it soon became the prevalent standard, before the advent of B registers and/or stacks. Eventually, though, we saw a return of tainting (at both the machine and the OS level) to ensure program and data integrity ... A toy sketch of both decodings follows.
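Here is a minimal Python sketch of the two strains above; the 16-bit layout, field widths, and taint position are all invented purely for illustration.

    # Strain (a): a leading taint bit declares what the word is.
    # Strain (b): the same bit pattern means data or (instruction,
    # operand) depending only on context.
    # Invented 16-bit layout: [taint:1][instruction:5][operand:10]

    def decode_tagged(word: int):
        if (word >> 15) & 1:               # taint set: an instruction
            return ("instruction", (word >> 10) & 0x1F, word & 0x3FF)
        return ("data", word & 0x7FFF)

    def decode_by_context(word: int, expect_instruction: bool):
        if expect_instruction:             # interpretation left to context
            return ("instruction", (word >> 10) & 0x3F, word & 0x3FF)
        return ("data", word)

    # Pointer arithmetic in the untagged strain: a plain increment
    # bumps only the operand field, leaving the instruction part alone.
    word = (0b000011 << 10) | 5            # instruction 3, operand 5
    word += 1
    assert decode_by_context(word, True) == ("instruction", 3, 6)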
I think this is conflating the stored-program design and the von Neumann machine. The storage of both programs and data in the same address space was novel, but other options for storing programs exist (see especially the Harvard architecture). You can even make a case that "stored program" as an idea predates electronics and computers. Both Leibniz and Babbage explored the idea of a programmable computing device, and the Jacquard loom used ordered sets of cards that most people would recognize as a program.
Babbage didn't want to put the program and data in the same storage device. The program was a chain of cards; the data store was rows of number wheels mounted on a big cylinder for addressing purposes. Probably the closest production machine to Babbage's machine was the IBM Card Programmed Calculator, which we would not consider a computer today.
All thinking about this issue has to face the fact that there were no good memory elements before magnetic core. Delay lines were slow and sequential, drums were slower and also sequential, and Williams tubes were faster and random-access but expensive per bit. Core memory wasn't cheap either; it was about a million dollars a megabyte in 1970. "Just put the program in main memory" wasn't a good option until halfway decent memory hardware was developed.
> You can even make a case that "stored program" as an idea predates electronics and computers.
You could, but then you'd be conflating something on the other end of the spectrum: turning 'programmable' into 'stored program'. There is no sensible way in which a Jacquard loom is a 'stored program' device.
That's a fair point --- I'm certainly stretching to try to extend 'configurable' into 'stored program'. I do still think that the shift to storing instructions in electronic memory (once such memories became available) was less revolutionary than evolutionary. Von Neumann's innovation (or at least the idea he was first to describe systematically), treating instructions and data as the same type of thing, was probably a little more of a leap, but I think all the implications of that weren't realized until later.
The book The Dream Machine [1] touches on the controversy and patent litigation around the "von Neumann" machine.
IIRC, a colleague of von Neumann circulated a technical report by him, in which he summarized the work of others along with his own with regard to the stored-program architecture. But his name was the only one on it, and the inventors of the other machines got pissed off.
There was a rush to patent the idea, and patent litigation. But the idea was never patented, I think because of prior art.
It's interesting to think about what would have happened if the idea had been patented... I mean, it is a significant idea and probably deserved a patent under the law. But would that have set computing history back by a decade or two?
The Dream Machine also goes into some other "inside baseball"... e.g. the relationship between Turing and Church, etc.
Through no fault of its own, this is a very American-centric view of early computing. I'm pretty sure that at the time this was written, the story of COLOSSUS was still highly classified by the British government. It has since been declassified, and you might read about it in this book [1].
Colossus was an important machine, but I don't think it was a stored-program computer (it was programmed through a plugboard). On the other hand, Wikipedia's entry on stored-program computers says "In 1936 Konrad Zuse anticipated in two patent applications that machine instructions could be stored in the same storage used for data", though I don't think that was implemented in any of his machines.
Isn't the concept kind of implicit in a universal Turing machine?
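In spirit, yes: a universal Turing machine reads the description of the machine it simulates from the same tape as that machine's input, so program-stored-with-data is already present there. Here is a minimal sketch of just that storage point (the three-opcode accumulator machine is invented for illustration; it is not a UTM):

    # The stored-program kernel of the idea: instructions and data
    # share one memory, and execution simply walks that memory.

    memory = [
        ("load",  6),   # 0: acc <- memory[6]
        ("add",   7),   # 1: acc <- acc + memory[7]
        ("store", 6),   # 2: memory[6] <- acc (overwrites data in place)
        ("halt",  0),   # 3:
        None, None,     # 4-5: unused
        2,              # 6: data word
        40,             # 7: data word
    ]

    pc, acc = 0, 0
    while True:
        op, arg = memory[pc]
        pc += 1
        if op == "load":
            acc = memory[arg]
        elif op == "add":
            acc += memory[arg]
        elif op == "store":
            memory[arg] = acc
        elif op == "halt":
            break

    print(memory[6])  # 42: one store held both the program and its result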
[0] https://library.ias.edu/ecp
[1] http://cva.stanford.edu/classes/cs99s/papers/godfrey-compute...
[2] http://www.markpriestley.net/pdfs/ReconsideringTheStoredProg...