Between his time at Sun and Netflix, Cockcroft has had massive influence on architectural development practices over the past decade and a half. I'd say his influence is on par with folks like Wirth, Hoare, and the GoF. Not nearly as obvious as the guys and gals whose influence was reified in hardware products, but still huge.
I'd say Urs is on another level than any eng manager known to the public. He directed the evolution of Google's infrastructure, which has pretty much always been at the forefront of modern large-scale infrastructure.
Google is the leader in large scale infrastructure for a single (or small set) of customers (Google, Youtube etc). You could call this "private hyper-scale cloud". AWS is the leader in vending that infrastructure to the rest of the world (a.k.a. public cloud).
I would argue that over time the second market will be much larger and more important than what Google built internally.
> a single (or small set) of customers (Google, Youtube etc)
Well, the applications inside Google are at least as diverse as the applications running on AWS. Google's infrastructure is used for an extremely wide range of use cases, from running a shell command remotely to supporting planetary-scale deployment of the world's largest customer-facing applications.
It was not designed for simple or uniform use cases. In fact, it's impossible to design something that is simple and uniform and can at the same time support Google's growth along the way.
Your examples, Google (search) and YouTube, are actually examples with extremely diverse requirements across their entire tech stack. Many of those requirements cannot be supported by any existing public cloud provider.
Though, to be fair, Google is investing a lot in exposing some of its internal infrastructure as a public cloud. It's a relative latecomer to that game, but that game is also just getting started.
Those are all features. Line of sight means transmissions don't propagate forever. Stepping on somebody else's transmission is evident to everybody as a lost transmission.
The only real complaints about radio transmissions are that better training is needed, and that there's a limited number of conversations possible per hour, which is sometimes a limiting factor on landings.
Identifying the Reasons in hindsight is not the point. There are always Reasons, but if you can't reliably identify them before the event rather than after, they're of no predictive interest.
Blanket statements like "Cassandra doesn't work" and "Hadoop is a complete waste of hosts for most companies" are unproductive and contribute nothing, unless you can back them with data and real world examples.
So, what data do you base these assertions on? Also, not to burst your bubble, but a lot of businesses (if not the majority) run Spark on YARN. And Spark is built on the JVM.
If they had data and examples they would almost certainly have enough experience not to say things like "______ doesn't work" and "______ is a complete waste of hosts."
Disagree on two fronts:
- By Hadoop I assume you mean MapReduce? There are other engines like Samza and Flink, and Kafka makes a great event store. Anyway, MR is great for massive-throughput batch jobs, for example huge Hive queries or Pig jobs. If you're reading, exceeding the heap size, doing one thing, and then writing, there's no bonus from doing it in Spark.
- Spark is written in Scala, which runs on the JVM. And it has a nice Java API as well!
I'd be interested to know how you can assert that Hadoop is a 'complete waste of hosts for most companies'. Also, don't underestimate the many, many people successfully running Spark on YARN at scale. Hadoop is actually quite helpful for some workloads.
Most companies simply don't have the data volume to make Hadoop worthwhile. You can process tens of TB in an RDBMS on a beefy machine cheaper than a Hadoop cluster.
Hadoop is slow, but on huge data volume the overheads are dwarfed by the parallelism gained. Most companies don't have huge volume though.
For example recently I saw someone propose using Hadoop for a sub-TB dataset...
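The crossover argument above can be sketched with a toy cost model. All the numbers here (scan rates, node counts, overhead) are made-up illustrative values, not benchmarks; the point is only that a cluster's fixed startup/coordination cost dominates on small data and vanishes on large data:

```python
def single_node_hours(volume_tb, tb_per_hour=1.0):
    """One beefy machine: no coordination overhead, no parallelism."""
    return volume_tb / tb_per_hour

def cluster_hours(volume_tb, nodes=20, tb_per_hour=1.0, overhead_hours=2.0):
    """A cluster job: fixed overhead plus a perfectly parallel scan.
    (Assumes ideal scaling, which real clusters only approximate.)"""
    return overhead_hours + volume_tb / (nodes * tb_per_hour)

# ~1 TB: the fixed overhead alone exceeds the single-node runtime.
print(single_node_hours(1) < cluster_hours(1))      # True (1.0 vs 2.05 hours)
# 500 TB: the overhead is dwarfed by the parallel speed-up.
print(cluster_hours(500) < single_node_hours(500))  # True (27 vs 500 hours)
```

Under these assumptions the break-even point sits in the tens of TB, which is why a sub-TB dataset on Hadoop makes little sense.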
Actually, it's the other way around. He designed and built it, and Netflix learned scale experience from him.
Probably the most influential and successful IT manager in human history. (I worked in his group at Netflix.)