This is really interesting and thought-provoking, but I'm skeptical.
1. In my experience, each data source and each data format requires a lot of custom work, and each kind of prediction task requires additional custom work on top of that. I don't see this going away. Even if a company develops a solid core of reusable engineering infrastructure, that core will always need to be adapted to the problem at hand. At that point, the company looks more like a consultancy, with non-trivial marginal/variable costs. Palantir operates this way: a core set of tools and infrastructure, with consultants implementing and applying them at each client through a lot of custom integration work.
2. Assuming that isn't a problem and the shared infrastructure can generalize away enough of the custom work to be feasible, this thesis actually reads like an argument for the big tech companies dominating all data companies. Google, Microsoft, and Amazon have the engineering talent and resources to build this hypothetical infrastructure, and they have the internal political will to do so because they can then expose it as cloud APIs. Indeed, they already appear to be attempting this in certain domains.
3. Superior engineering infrastructure is a real competitive advantage, but it isn't enough of a moat for a single company to dominate this space. Yes, great engineers are hard to find, but there are enough of them that, given enough money, more than one company could feasibly build this infrastructure. You can't say the same about trying to buy the social network of Facebook or Instagram.