If you sprinkle in a bit of infrastructure, I think we're already there. The ability to distill a variety of content into vectors and perform approximate nearest neighbor search (shameless plug: https://milvus.io) across all of them can really help power a lot of these applications. With the retrieved vectors, you could match questions with answers or create a reverse index to the original content to perform summarization.
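To make the retrieve-then-look-up pattern concrete, here's a rough sketch. It uses exact cosine similarity as a stand-in for a real ANN index (e.g. Milvus), and the embed() function and toy corpus are placeholders, not a real model or API:

    # Sketch of "embed content -> search -> map results back to the original text".
    # Exact cosine search stands in for an ANN index; embed() is a placeholder.
    import numpy as np

    corpus = [
        "How do I reset my password?",
        "You can reset your password from the account settings page.",
        "Our office is closed on public holidays.",
    ]

    def embed(texts):
        # Placeholder embedding: a deterministic pseudo-random vector per text.
        # A real system would call a trained sentence-embedding model here.
        out = []
        for t in texts:
            rng = np.random.default_rng(abs(hash(t)) % (2**32))
            out.append(rng.normal(size=384))
        return np.stack(out)

    vectors = embed(corpus)
    vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)

    def search(query_vec, k=2):
        # Cosine similarity against every stored vector, highest score first.
        query_vec = query_vec / np.linalg.norm(query_vec)
        scores = vectors @ query_vec
        top = np.argsort(-scores)[:k]
        # The "reverse index" back to the original content is just the row id.
        return [(corpus[i], float(scores[i])) for i in top]

    query = embed(["password reset"])[0]
    print(search(query))

With real embeddings the same structure lets you match questions to answers or hand the retrieved passages to a summarizer.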
With that being said, one of the main challenges ahead will be multimodal learning. We're sort of there when it comes to combining text with visual data, but there are many other modalities out there as well.
Might be usable for low-column-count tabular data, but it would be pretty terrible for any other semantically dense modality, e.g. video, molecules, geospatial data, etc.
Not precisely, but if you had 50 documents in that 1000-dimensional embedding space and reduced them to three dimensions while still getting the exact same nearest-neighbor ordering, then it would at least still function, right?

I guess the problem is taking a new document (like a search query) in the higher-dimensional embedding, reducing it to three dimensions so you can search in that reduced space, and expecting that to also preserve the same nearest-neighbor ordering.
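Here's a rough sketch of that concern, assuming a PCA-style projection (scikit-learn) as the reducer and random data in place of real document embeddings:

    # 50 "documents" in 1000-d, projected to 3-d, plus a new query projected
    # with the same fitted PCA. Compare nearest-neighbor orderings in both spaces.
    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(42)
    docs_hi = rng.normal(size=(50, 1000))    # 50 documents, 1000-d embeddings
    query_hi = rng.normal(size=(1, 1000))    # a new document / search query

    pca = PCA(n_components=3).fit(docs_hi)
    docs_lo = pca.transform(docs_hi)
    query_lo = pca.transform(query_hi)       # same projection applied to the query

    def nn_order(query, docs):
        # Indices of documents sorted by Euclidean distance to the query.
        dists = np.linalg.norm(docs - query, axis=1)
        return np.argsort(dists)

    order_hi = nn_order(query_hi, docs_hi)
    order_lo = nn_order(query_lo, docs_lo)

    # If the projection preserved neighborhood structure, these should agree,
    # at least for the top few results. With only 3 dimensions they usually won't.
    print("top-5 in 1000-d:", order_hi[:5])
    print("top-5 in 3-d:   ", order_lo[:5])

The reduction is fit only on the stored documents, so there's no guarantee the query's neighbors survive the projection, which is exactly the issue above.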