I've actually seen the opposite trend. You don't want to standardize on the search engine, and that has been a major problem with SOLR.
Instead, you want to custom-build the data retrieval system so it's tailored to your use case.
One example from experience was needing to add hard-filterable metadata to an in-house search index. We solved this by calculating bit masks that represented all the filtering criteria and having a frontend preprocessor that would first restrict to the filtered subset and then do TF-IDF-based relevance sorting.
Creating the bit-mask tooling ourselves (instead of relying on whatever baked-in scan-and-filter method comes with off-the-shelf search engine tools) gave us complete control over the trade-offs, particularly around managing document deletions and optimizing runtime performance in ways that just weren't available out of the box. It also meant we could integrate any in-house code into the search engine as needed, since the whole system was in-house code.
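To make the idea concrete, here's a minimal sketch of "hard filter by bit mask, then rank by TF-IDF." This is not the original in-house code: the filter names (`in_stock`, `on_sale`, `new_arrival`), the `Doc` class, and the `search` function are hypothetical, and the scoring is a basic smoothed TF-IDF rather than whatever the real system used.

```python
import math
from collections import Counter

# Hypothetical filter criteria, one bit each (names invented for illustration).
FILTER_BITS = {"in_stock": 1 << 0, "on_sale": 1 << 1, "new_arrival": 1 << 2}

class Doc:
    def __init__(self, doc_id, text, attrs):
        self.doc_id = doc_id
        self.terms = text.lower().split()
        # Pre-compute a bit mask encoding which filter criteria this doc satisfies.
        self.mask = 0
        for attr in attrs:
            self.mask |= FILTER_BITS[attr]

def tf_idf_score(query_terms, doc, df, n_docs):
    """Basic smoothed TF-IDF score of a document against the query terms."""
    tf = Counter(doc.terms)
    score = 0.0
    for term in query_terms:
        if tf[term]:
            idf = math.log((1 + n_docs) / (1 + df[term])) + 1.0
            score += (tf[term] / len(doc.terms)) * idf
    return score

def search(docs, query, required=()):
    # Build the required-bits mask from the query's filter criteria.
    required_mask = 0
    for attr in required:
        required_mask |= FILTER_BITS[attr]

    # Hard filter first: keep only docs whose mask covers every required bit.
    candidates = [d for d in docs if (d.mask & required_mask) == required_mask]

    # Then rank only the filtered subset by TF-IDF relevance.
    query_terms = query.lower().split()
    df = Counter(t for d in candidates for t in set(d.terms))
    return sorted(
        candidates,
        key=lambda d: tf_idf_score(query_terms, d, df, len(candidates)),
        reverse=True,
    )

if __name__ == "__main__":
    docs = [
        Doc(1, "red running shoes", ["in_stock", "on_sale"]),
        Doc(2, "blue running shoes", ["in_stock"]),
        Doc(3, "red dress shoes", ["on_sale"]),
    ]
    # Doc 3 is dropped by the hard filter; docs 1 and 2 are ranked by relevance.
    for d in search(docs, "red shoes", required=("in_stock",)):
        print(d.doc_id)
```

The point of owning the mask layout is that you also own the trade-offs: for example, deletions could be handled with a tombstone bit or by rebuilding masks in bulk, whichever suits your workload, instead of inheriting whatever the engine decided for you.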
You want to create data models that are highly application-specific, and then route data into them. The mistaken approach of one-size-fits-all tools, especially in information retrieval, is to pre-define the supported behavior of the application, like a web service wrapping a search index, with baked-in assumptions about the trade-offs and only limited support for modifying or configuring them under the hood.
The gravest mistake is thinking that, just because your use case seems to function OK with those assumptions now, you can marry yourself to the underlying data model. Eventually you'll hit the point where you have to throw it away and build something custom, but by then it will be far more costly to do so, and extremely hard to migrate gracefully and keep integrations working.