> "Now back in the early days of this country, when they moved heavy objects around, they didn't have any Caterpillar tractors, they didn't have any big cranes. They used oxen. And when they got a great big log on the ground, and one ox couldn't budge the darn thing, they did not try to grow a bigger ox. They used two oxen! And I think they're trying to tell us something. When we need greater computer power, the answer is not "get a bigger computer", it's "get another computer". Which of course, is what common sense would have told us to begin with."
This is a reaction to Grosch's Law, "Computing power increases as the square of the price".[1] In the early 1980s, people still believed that. Seymour Cray did. John McCarthy did when I was at Stanford around then. It didn't last into the era of microprocessors.
Amusingly, in the horse-powered era, once railroads were running but trucks weren't yet viable, there was a "last mile" problem - getting stuff from the railroad station or dock to the final destination. The 19th century solution was to develop a bigger breed of horse - the Shire Horse.[1]
The article you linked doesn't support your anecdote about the origin of the Shire Horse. It describes their history dating back centuries before railways. Their biggest use seems to have been hauling material to and from ports, not trains.
Sure it does. Shires go back a ways, but they were not bred in quantity until the 1850s or so. "In the late nineteenth and early twentieth centuries, there were large numbers of Shires, and many were exported to the United States."
Before and after that period, those big guys were an exotic breed. That's the railroad but pre-truck period.
(I've owned a Percheron, and have known some Shires.)
When you cite a source for something, the source should justify the thing you're claiming. The relevant part of your post to the thread was that "The 19th century solution was to develop a bigger breed of horse." It is entirely contrary to what the Wikipedia article says - "The breed was established in the mid-eighteenth century, although its origins are much older".
(relatives have owned Friesians and Clydesdales and Norwegian Fjord Horses, but it's neither here nor there)
That other definition doesn't really seem to fit either, but I acknowledge that if they had used a different word ("adopted a bigger breed" or "popularized a bigger breed" or something) then it would fit with the anecdote.
In working animal husbandry, breeds are rarely static and are typically selectively bred for the task at hand. As that task or its environment evolves, the breed is further developed or even forked to accommodate the new conditions.
So I think “developed” is fairly appropriate here, though “adapted” might have been more clear.
Additionally, with early pioneer logging, another solution to logs too large to handle was to not drop them in the first place.
In the Pacific Northwest, US, early loggers would leave the huge ones - to the point where pioneers could complain about a lack of available timber in an old-growth forest.
When the original University of Washington campus was built, land-clearing costs were a huge portion of the overall capital spend. The largest trees on the site weren't used for anything productive; rather, they were climbed, chained together, and domino-felled all at once. By chaining the trees together, the crews only needed to fell one tree, which brought the whole mess down into a pile that they then burned.
I think there's a lesson here about choosing which logs you want to move.
The "to record an event" meaning of logging does in fact originate from wooden logs, which were used to calculate the speed of a ship under sail. A log, tied to a fathom-line, was cast off the stern, and the number of knots which passed through the sailors hands in a measured time interval determined the speed. This was recorded as "log", in what came to be known as the "ship's log". The term came to be used for event recording in general. This was used as a component of "dead reckoning" (that is, deduced reckoning of position) in navigation, prior to the development of accurate time-keeping and direct position reporting through LORAN, radar-navigation, and ultimately GPS. Dead reckoning was not especially accurate and had some rather notorious failure modes.
More recently, there was the Honda Point disaster of 1923, in which the US Navy lost 7 destroyers running at a flank speed of 20 knots off the Santa Barbara coast: <https://en.wikipedia.org/wiki/Honda_Point_disaster>.
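Back to the knot-counting: the arithmetic works out pretty neatly with the traditionally cited figures - knots spaced 47 feet 3 inches apart on the line and a 28-second sandglass - making each counted knot almost exactly one nautical mile per hour. A rough sketch, assuming those traditional numbers:

    NAUTICAL_MILE_FT = 6076.12  # feet in one nautical mile
    KNOT_SPACING_FT = 47.25     # traditional spacing between knots on the log-line
    GLASS_SECONDS = 28          # traditional sandglass interval

    def speed_in_knots(knots_counted):
        # Scale the line payed out during one glass up to a full hour.
        feet_per_hour = knots_counted * KNOT_SPACING_FT * (3600 / GLASS_SECONDS)
        return feet_per_hour / NAUTICAL_MILE_FT

    print(speed_in_knots(5))  # ~5.0, i.e. five nautical miles per hour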
It's interesting to note that advances in timekeeping typically translate to improvements in location determination.
That "they didn't have any big cranes" forces the analogy in a way that breaks it. The solution wherever cranes are used is absolutely to get a bigger crane. And also, oxen were absolutely bred to be bigger. That's kind of the defining thing that distinguishes draft oxen from other kinds of cattle. But that process was limited by some factors that are peculiar to domesticated animals. And, of course, if you need to solve the problem right now, you make do with the current state of the art in farm animal technology.
Admiral Hopper's lecture wasn't delivered too long after 1976, which saw the release of both the CRAY-1 (single CPU) and the ILLIAC IV (parallel). ILLIAC IV, being more expensive, harder to use, and slower than the CRAY-1, was a promising hint at future possibility, but not particularly successful. Cray's quip on this subject was (paraphrasing) that he'd rather plow a field with one strong ox than $bignum chickens. Admiral Hopper was presumably responding to that.
What they both seem to miss is that the best tool for the job depends on both the job and the available tools. And they both seem to be completely missing that, if you know what you're doing, scale up and scale out are complementary: first you scale up the individual nodes as much as is practical, and then you start to scale out once scale up loses steam.
>And also, oxen were absolutely bred to be bigger. That's kind of the defining thing that distinguishes draft oxen from other kinds of cattle.
In your attempt to take down the analogy you just reinforced it. They quickly hit the limits of large oxen and had to scale far faster than any selective breeding could manage.
The exact same thing happened in computing even during the absolute heyday of Moore's law. Workloads would very quickly hit the ceiling of a single server, and the way to unblock yourself was not to wait for next-gen chips but to parallelize.
It's not that multiprocessing systems didn't exist at the time Hopper delivered this lecture; it's that they remained fairly niche products for computing researchers and fairly deep-pocketed organizations. At the time, multiprocessing was still very difficult to pull off. It wasn't necessarily analogous to just yoking two oxen to the same cart. It was maybe more like a world where the time it takes to breed an ox that's twice as strong is comparable to the time it takes to develop a working yoke for state-of-the-art oxen, and also nobody's quite sure how to drive a two-oxen team because it's still such a new idea. So the parallel option wasn't as sure of a bet from a business perspective as it is now.
Interestingly, there are cases where a "support crane" is made to lift crane components up to a higher altitude where a different "primary crane" can be assembled to do the remainder of the heavy lifting. At that point the lifting can theoretically be efficiently parallelized, with two items being able to be hoisted at any given moment.
Her analogy is in the context of her larger topic of systems of computers, and I think history has largely proven out her advocated approach. She highlights a case of trying to cram a multi-user time sharing system, security system, database and actual programs all onto a single large computer, and contrasts this with having a separate system managing multiple user access, a separate system managing the database, and individual systems managing each application. Which sounds a lot like a firewall/gateway + Postgres server + API server + reporting server + etc + etc setup that is pretty much the design of every major system I've worked on. Yes, to a small degree, we sort of cram these things back into a single box these days by way of virtualization, but it's pretty rare to see a system where everything is running on the same machine, under the same OS like Admiral Hopper would have been talking about at the time.
If you served me two pound cakes at the same time, I would say things were improved. However, it wouldn't be very efficient as I would still only eat them serially instead of in parallel
My favorite quote/story was "Never Never Never take the First No!" (16:20) because some people are just obstructionists and others just want to see how serious you are. As she says, appreciation for how to do this comes with age, but to me a good definition of management in general is learning if, when, how, and how much to pushback against pushback.
There’s an inflection point, however. Hence Seymour Cray’s famous quip, “If you were plowing a field, which would you rather use? Two strong oxen or 1024 chickens?”
Well, if growing or breeding bigger oxen were as feasible as building bigger (more powerful) computers was/is, perhaps people would take a different path?
In other words, perhaps the analogy is flawed?
Sure, but it's important to notice that the biggest off the shelf computers keep getting bigger.
Dual socket Epyc is pretty big these days.
If you can fit your job on one box (+ spares, as needed), you can save a whole lot of complexity vs spreading it over several.
It's always worth considering what you can fit on one box with 192-256 cores, 12TB of ram, and whatever storage you can attach to 256 lanes of PCIe 5.0 (minus however many lanes you need for network I/O).
You can probably go bigger with exotic computers, but if you have bottlenecks with the biggest off the shelf computer you can get, you might be better off scaling horizontally. Assuming you aren't growing 4x a year, though, you should have plenty of notice that you're coming to the end of easy vertical scaling. And sometimes you get lucky and AMD or Intel makes a nicely timed release to get you some more room.
exactly, that's what it is
as we hit the end of Moore's law (which we won't, but we will hit the end of feature-size shrinkage)... one of the optimizations they will do is rote, trivial process optimization. So if the chip failure rate on the assembly line is 40%, they drop it to 10%. Costs will drop accordingly, because there are that many more working transistors per dollar, thus keeping Moore's law going.
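To make the yield arithmetic concrete, a toy calculation (the wafer cost and die count here are invented for illustration):

    WAFER_COST = 10_000    # hypothetical dollars per processed wafer
    DIES_PER_WAFER = 500   # hypothetical gross die count

    def cost_per_good_die(failure_rate):
        # Only the working dies can be sold, so the wafer cost is spread over them.
        good_dies = DIES_PER_WAFER * (1 - failure_rate)
        return WAFER_COST / good_dies

    print(cost_per_good_die(0.40))  # ~$33.33 per working chip at 40% failures
    print(cost_per_good_die(0.10))  # ~$22.22 per working chip at 10% failures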
How to organize them is a hard problem. For general-purpose use, we have only three architectures today - shared memory multiprocessors, GPUs, and clusters.
When Hopper gave that talk, people were proposing all sorts of non-shared memory multiprocessor setups. The ILLIAC IV and the BBN Butterfly predate that talk, while the NCube, the Transputer, and the Connection Machine followed it by a year or two. This was a hot topic at the time.
All of those were duds. Other than the PS3 Cell, also a dud, none of those architectures were built in quantity. They're really hard to program. You have to organize your program around the data transfer between neighbor units. It really works only for programs that have a spatial structure, such as finite element analysis, weather prediction, or fluid dynamics calculations for nuclear weapons design. Those were a big part of government computing when Hopper was active. They aren't a big part of computing today.
It's interesting that GPUs became generally useful beyond graphics. But that's another story.
(Some) TPUs look more like those non-shared memory systems. The TPU has compute tiles with local memory and the program needs to deal with data transfer. However, the heavy lifting is left to the compiler, rather than the programmer.
Some TPUs are also structured around fixed dataflow (systolic arrays for matrix multiplication).
Tandem's NonStop was this, starting in ~1976 or 1977: loosely coupled, shared-nothing processors, communicating with messages over a pair of high-speed inter-processor busses. It's not really comparable to the other examples you cite - the nCube/Transputer/Connection Machine - in that it was programmed conventionally, not requiring a special parallel language. It was certainly a niche product, but not a dud. It's still around, having been ported from a proprietary stack machine to MIPS to Itanium to x86.
EDIT: I suppose it can be compared to a Single System Image cluster.
Sure, but, as I was (rather unpopularly) pointing out in another comment, that point was pretty hard to reach in 1982. Specifically the point where you've met both criteria: bigger computer is too cost prohibitive to get, and lots of smaller computers is easier. At the time of this lecture, parallel computers had a nasty tendency to achieve poorer real-world performance on practical applications than their sequential contemporaries, despite greater theoretical performance.
It's still kind of hard even now. To date in my career I've had more successes with improving existing systems' throughput by removing parallelism than I have by adding it. Amdahl's Law plus the memory hierarchy is one heck of a one-two punch.
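For anyone who hasn't run the numbers, a minimal sketch of the Amdahl's Law arithmetic (the 95% parallel fraction is just an illustrative figure):

    def amdahl_speedup(parallel_fraction, workers):
        # The serial fraction caps the speedup no matter how many workers you add.
        serial = 1 - parallel_fraction
        return 1 / (serial + parallel_fraction / workers)

    for n in (2, 8, 64, 1024):
        print(n, round(amdahl_speedup(0.95, n), 1))
    # 2 -> 1.9, 8 -> 5.9, 64 -> 15.4, 1024 -> 19.6: even 1024 chickens top out below 20x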
because you could still make bipolar electronics that beat out mass-produced consumer electronics. By the mid 1990s even IBM abandoned bipolar mainframes and had to introduce parallelism so a cluster of (still slower) CMOS mainframes could replace a bipolar mainframe. This great book was written by someone who worked on this project
we had at Cornell were going to win (ours was way bigger) because they were scalable. (e.g. the way Cray himself saw it, a conventional supercomputer had to live within a small enough space that the cycle time was not unduly limited by the speed of light so that kind of supercomputer had to become physically smaller, not larger, to get faster)
Now, for very specialized tasks like codebreaking, ASICs are a good answer, and you'd probably stuff a large number of them into expansion cards in rather ordinary computers; clusters today possibly also have some ASICs for glue and communications, such as
The problem I see with people who attempt parallelism for the first time is that they don't realize the task size has to be larger than the overhead to transfer tasks between cores or nodes. That is, if you are processing most CSV files you can't round-robin assign rows to threads, but 10,000 row chunks are probably fine. You usually get good results over a large range of chunk sizes, but chunking is essential if you want most parallel jobs to really get a speedup. I find it frustrating as hell to see so many blog posts pushing the idea that some programming scheme like Actors is going to solve your problems, and meeting people that treat chunking as a mere optimization you'll apply after the fact. My inclination is you can get the project done faster (human time) if you build in chunking right away, but I've learned you just have to let people learn that lesson for themselves.
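A minimal sketch of the chunking idea in Python, assuming a plain CSV and the standard multiprocessing pool (the file name, column index, and chunk size are made up for illustration):

    import csv
    from itertools import islice
    from multiprocessing import Pool

    CHUNK_SIZE = 10_000  # rows per task; big enough to dwarf the hand-off overhead

    def process_chunk(rows):
        # Stand-in for real per-row work; here we just sum one numeric column.
        return sum(float(row[2]) for row in rows)

    def read_chunks(path, size=CHUNK_SIZE):
        # Yield lists of rows rather than single rows, so each task is "big".
        with open(path, newline="") as f:
            reader = csv.reader(f)
            next(reader)  # skip the header row
            while True:
                chunk = list(islice(reader, size))
                if not chunk:
                    return
                yield chunk

    if __name__ == "__main__":
        with Pool() as pool:
            # One task per chunk, not one task per row.
            totals = pool.imap_unordered(process_chunk, read_chunks("data.csv"))
            print(sum(totals))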
To your last point, it's been interesting to watch people struggle to effectively use technologies like Hadoop and Spark now that we've all moved to the cloud.
Originally, the whole point of the Hadoop architecture was that the data were pre-chunked and already sitting on the local storage of your compute nodes, so that the overhead to transfer at least that first map task was effectively zero, and your big data transfer cost was collecting all the (hopefully much smaller than your input data) results of that into one place in the reduce step.
Now we're in the cloud and the original data's all sitting in object storage. So shoving all your raw data through a comparatively small, slow network interface is an essential first step of any job, and it's not nearly so easy to get speedups as impressive as what people were doing 15 years ago.
That said I wouldn't want to go back. HDFS clusters were such a PITA to work with and I'm not the one paying the monthly AWS bill.
> The problem I see with people who attempt parallelism for the first time is that they don't realize the task size has to be larger than the overhead to transfer tasks between cores or nodes.
My big sticking point is that for some key classes of tasks, it's not clear that this is even possible. I've seen no credible reason to think that throwing more processors at the problem will ever build that one tool-generated template-heavy C++ file (IYKYK) in under a minute, or accurately simulate an old game console with a useful "fast forward" button, or fit an FPGA design before I decide to take a long coffee-and-HN break.
To be fair, some things that do parallelize well (e.g. large-scale finite element analysis, web servers) are extremely important. It's not as though these techniques and architectures and research projects are simply a waste of time. It's just that, like so many others before it, parallelism has been hyped for the past decade as "the" new computing paradigm that we've got to shove absolutely everything into, and I don't believe it.
It isn't for a great many tasks. Basically, whenever you're computing f(g(x)), you can't execute f and g concurrently.
What you can do is run g and h concurrently in something that looks like f(g(), h()). And you can vectorize.
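A minimal sketch of that second case, assuming g and h are independent (the functions here are invented for illustration; for CPU-bound pure Python you'd want a process pool rather than threads):

    from concurrent.futures import ThreadPoolExecutor

    def g():
        # Hypothetical sub-computation with no dependency on h.
        return sum(i * i for i in range(1_000_000))

    def h():
        # Another independent sub-computation.
        return sum(i * 3 for i in range(1_000_000))

    def f(a, b):
        # f still has to wait for both results; only g and h overlap.
        return a + b

    with ThreadPoolExecutor() as pool:
        future_g = pool.submit(g)
        future_h = pool.submit(h)
        print(f(future_g.result(), future_h.result()))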
A lot of early multiprocessor computers only gave you that last option. They had a special mode where you'd send exactly the same instructions to all of the CPUs, and the CPUs would be mapped to different memory. So in many respects it was more like a primitive version of SSE instructions than like what modern multiprocessor computers do.
I was getting into 3D around the time the Pentium came out, and I spent a lot of time comparing the price of a single Pentium computer against multiple used 486s, the logic being that a mini render farm would still be faster than a single Pentium. Never pulled the trigger on either option.
These days, “get another computer” and “get a bigger computer” are basically the same thing; differences primarily residing in packaging and interconnects, but boy howdy can those interconnects make a difference.
Actually, humans have been doing exactly that through breeding over the millennia. They were just limited in their means.
This analogy has some "you wouldn't download a car!" vibes — sure I would, if it were practical (: And vertical scaling of computers is practical (up to some limits).
This is irrelevant to the example cited by Hopper. If you have a large log, you don't have time to breed a larger ox. You need to solve the problem with the oxen you have.
The thing is, Hopper said this in 1982. This was a time when, to keep stretching the analogy, it wasn't hard to find a second ox, but yokes were still flaky bleeding edge technology that mostly didn't work very well in practice.
One potentially more likely solution back in the day was to just accept the job was going to take a while. This would be analogous to using a block and tackle. The ox can do the job but they're going to pull for twice as long to get it done. Imagine pulleys cost $10, but a second ox costs $1000 and a yoke costs $5000, and getting the job done in less time is not worth $5,990 to you.
I'm all in favor of making the best of what's available. But at the same time, if such thinking is taken as dogma, innovation suffers.
You spoke of one log, and the time scales involved.
But suppose you have an entire forest of logs. Then it may indeed be worth breeding bigger oxen (or rather, inventing tractors).
I don't mean to accuse Hopper of shortsightedness, but when quotes by famous people, like the above, are thrown around without context, they encourage that dogmatic thinking.
So, I was more replying to that quote as it appeared here, rather than as it appeared in her talk.
> if such thinking is taken as dogma, innovation suffers.
I don’t think there’s anything about the original post, with quote about oxen, that reads as dogmatic, or invites such perspective.
Also, I think we can all agree most innovation happens as an extension of “making the best of what’s available” rather than independent of it, on a fully separate track.
Using two oxen can lead to realizing a bigger ox would be beneficial.
I don't mean to wear out this thread, and I totally respect your different viewpoint, but when I see:
... they're trying to tell us something. When we need greater
computer power, the answer is not "get a bigger computer", it's
"get another computer".
that does read as dogmatic advice to me, taken in isolation. It boils down to "the answer is X."
Not "consider these factors" or "weigh these different options," but just "this is the answer, full stop."
That's dogma, no?
(that aside, I do slightly regret the snarkiness of my initial comment :)
It is extremely rare that your compute workload has scaling properties that need just a little bit faster computer. The vast majority of the time if you are bound by hardware at all, the answer is to scale horizontally.
The only exception is really where you have a bounded task that will never grow in compute time.
Perhaps I misunderstand you, but what about those decades where CPUs were made faster and faster, from a few MHz up to several GHz, before hitting physical manufacturing and power/heat limits?
Was that all just a bunch of wasted effort, and what they should have been doing was build more and more 50MHz chips?
Of course not. There are lots of advantages to scaling up rather than out.
Even today, there are clear advantages to using an "xlarge" instance on AWS rather than a whole bunch of "nano" ones working together.
But all this seems so straightforward that I suspect I really don't understand your point...
>Perhaps I misunderstand you, but what about those decades where CPUs were made faster and faster, from a few MHz up to several GHz, before hitting physical manufacturing and power/heat limits?
If you waited for chips to catch up to your workload, you got smoked by any competitors who parallelized. Waiting even a year to double speed when you could just use two computers was still an eternity.
> Was that all just a bunch of wasted effort, and what they should have been doing was build more and more 50MHz chips?
No, that’s a stupid question and you know it. You set it up as a strawman to attack.
Hardware improvements are amazing and have let us do tons for much cheaper.
However, the ~4ghz CPUs we have now are not meaningfully faster in single thread performance compared to what you could buy literally a decade ago. If you’re sitting around waiting for 32ghz that should only be “3 years away”, you’re dead in the water. All modern improvements are power savings and density of parallel cores, which require you to face what Grace presented all those years ago.
Faster CPUs aren’t coming.
xlarge on AWS is a ton of parallel cores. Not faster.
I just want to make one last attempt to get my point across, since I think you are discussing in good faith, even if I don't like your aggressive timbre.
There is risk in reinforcing a narrow-minded approach that "all we need is more oxen." It limits one's imagination. That's the essence of what I've been advocating against in this thread, though perhaps my attempts and examples have merely chummed your waters. Ironically, I'd say Grace Hopper rather agrees, elsewhere in the linked talk[1].
I think the saddest phrase I ever hear in a computer installation
is that horrible one "but we've always done it that way." That's a
forbidden phrase in my office.
I liked grace hopper's comments as a rebuttal against "only vertical! No horizontal!" but I'd agree that reading that rebuttal dogmatically would be just as bad of a decision.
Bigger is better in terms of height and girth when it comes to capabilities. At any given time, figure out the most cost efficient number of oxen of varying breeds for your workload and redundancy needs and have at it. In another year, if you're still travelling the Oregon Trail, you can reconsider doing the math again and trading in the last batch's oxen for some new ones; repeat ad infinitum, or for as long as you're in business.
You clearly aren’t working in the constraints of computing in reality. The clock speed ceiling has been in place for nearly 20 years now. You haven’t posted anything suggesting alternatives are possible.
Your point has been made and I’m telling you very explicitly that it’s bad. The years of waiting for faster processors have been gone for basically a generation of humans. When you hit the limit of a core, you don’t wait a year for a faster core, you parallelize. The entire GPU boom is exemplary of this.
I agree. And it is interesting too that the ceiling for a faster single computer still goes back to her visualization of a nanosecond. Keep cutting that wire smaller and smaller, and there's almost nothing left to cut. But if we want it to go faster, we'd need to keep halving the wires.
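To put numbers on that, a quick back-of-the-envelope sketch of how far a signal could possibly travel in one clock period (the clock rates are just illustrative):

    C = 299_792_458  # speed of light in metres per second

    def max_signal_path_cm(clock_hz):
        # Upper bound: light-speed travel during one clock period.
        return C / clock_hz * 100

    for ghz in (1, 4, 32):
        print(f"{ghz:>2} GHz -> at most {max_signal_path_cm(ghz * 1e9):.1f} cm per cycle")
    # 1 GHz is roughly Hopper's 30 cm nanosecond wire; 32 GHz leaves under 1 cm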
Despite the very plain language, her talk has a lot of depth to it, and I do find it interesting how on the money she was with her thoughts all the way back then.
I think the misunderstanding here (and I apologize where I've contributed to it) is that you think I'm talking specifically and only about CPU clock rates.
The scale-up/scale-out tradeoff applies to many things, both in computing and elsewhere. I was trying to make a larger point.
I guess it's appropriate, in this discussion about logging, that we got into some mixup between the forest and the trees (:
In the context of an analogy for parallelism, a tractor is just a bigger ox. The whole point seems to be that instead of making a bigger X to do function Y, one has the option to use multiple X at the same time.
>But suppose you have an entire forest of logs. Then it may indeed be worth breeding bigger oxen
That’s idiotic unless you have other constraints. The parallelism also allows you to split the oxen up to handle multiple smaller logs at the same time when their combined force isn’t needed.
To an extent, but there's also a reason why beasts of burden didn't get to arbitrarily large sizes. Scaling has limits (particularly in this case both thermal limits and material strength limits).
> "Now back in the early days of this country, when they moved heavy objects around, they didn't have any Caterpillar tractors, they didn't have any big cranes. They used oxen. And when they got a great big log on the ground, and one ox couldn't budge the darn thing, they did not try to grow a bigger ox. They used two oxen! And I think they're trying to tell us something. When we need greater computer power, the answer is not "get a bigger computer", it's "get another computer". Which of course, is what common sense would have told us to begin with."