How do you think S3 works? It's a distributed cluster with shards, replication, and everything. Elasticsearch implements its own version of that, with some optimizations that suit its use case, like running algorithms close to the data instead of fetching the data over a network and then running the algorithm.
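To make the "close to the data" point concrete, here is a toy sketch (not how Elasticsearch is actually implemented). The shard contents, scores, and function names are all made up for illustration. Each shard ranks its own hits and ships only its top k to the coordinator, versus pulling every record over the network first and ranking afterwards.

```python
import heapq

# Hypothetical shards: each holds (doc_id, score) pairs for its slice of the data.
SHARDS = [
    [("a", 0.9), ("b", 0.2), ("c", 0.5)],
    [("d", 0.7), ("e", 0.1)],
    [("f", 0.95), ("g", 0.3), ("h", 0.6)],
]


def local_top_k(shard, k):
    """Runs on the shard itself: only the k best hits leave the node."""
    return heapq.nlargest(k, shard, key=lambda hit: hit[1])


def search_near_data(shards, k):
    """Coordinator merges the small per-shard results.

    Returns the global top k and the number of records that crossed the network.
    """
    partials = [hit for shard in shards for hit in local_top_k(shard, k)]
    return heapq.nlargest(k, partials, key=lambda hit: hit[1]), len(partials)


def search_fetch_everything(shards, k):
    """Object-store style: pull every record first, then rank in one place."""
    everything = [hit for shard in shards for hit in shard]
    return heapq.nlargest(k, everything, key=lambda hit: hit[1]), len(everything)
```

Both functions return the same top k, but the first moves at most k records per shard while the second moves the whole data set. With real shard sizes that gap is the difference the comment is about.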
You'd have a hard time matching performance, throughput and (particularly) cost by rebuilding that on top of S3. It would probably work well enough for smallish data sets but would get expensive pretty quickly beyond that. For the same reason, network-attached storage is not a thing with Elasticsearch. Technically it works (though NFS is not recommended), but the performance sucks. Bare metal, preferably with SSDs, is what you want.
Elastic is doing plenty with ML, but fundamentally search requires digging through heaps of data, whether you use ML or not. That data needs to live somewhere, and that somewhere is indeed a distributed object store, which is exactly what Elastic has implemented already.
As for reindexing all the time: you index only once, and again only when you change the schema or the data. That is kind of the whole point: indexing is CPU intensive and you don't want to redo that work over and over again at query time. There are systems like Presto that do scan the data at query time, of course, but they tend not to be used for real-time, interactive use cases (like search). The whole point of having an index is not having to scan heaps of data when you run your queries. Systems like Presto are good at delegating the work of scanning that data to gazillions of cluster nodes, which makes it seem cheaper than it actually is.
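A minimal sketch of the index-once versus scan-every-time trade-off, assuming a naive whitespace tokenizer and a three-document corpus invented for the example. A real search engine does far more at index time (analysis, scoring data, compression), but the shape of the trade-off is the same.

```python
from collections import defaultdict

# Made-up corpus for illustration only.
DOCS = {
    1: "elasticsearch stores shards on local disk",
    2: "s3 is a distributed object store",
    3: "presto scans data on many nodes",
}


def build_index(docs):
    """Index time: tokenize every document once (the CPU-heavy part)."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def search_index(index, term):
    """Query time with an index: a single dictionary lookup."""
    return sorted(index.get(term.lower(), set()))


def search_scan(docs, term):
    """Query time without an index: re-tokenize every document on every query."""
    term = term.lower()
    return sorted(d for d, text in docs.items() if term in text.lower().split())


INDEX = build_index(DOCS)
```

`search_index(INDEX, "on")` and `search_scan(DOCS, "on")` return the same document IDs, but the scan touches every document on every query, while the index pays that cost once up front and only again when the data or schema changes.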