> You can write your own tokenizer, but it has to be as a C extension.

Another option is to tokenize the data before insertion into PostgreSQL (which, ironically, I'm currently doing in one of my projects using Lucene).
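
Roughly what I mean, as a minimal sketch: run the tokenizer outside the database and store the lexemes yourself. The documents table, tokens column, and the tokenize() placeholder are just for illustration (in my project the real analyzer is Lucene):

    import re
    import psycopg2

    def tokenize(text):
        # Placeholder analyzer: lowercase and split on non-word characters.
        # In practice this calls out to the real tokenizer (Lucene, in my case).
        return [t for t in re.split(r"\W+", text.lower()) if t]

    conn = psycopg2.connect("dbname=mydb")
    with conn, conn.cursor() as cur:
        body = "The quick brown fox"
        # array_to_tsvector stores the pre-computed lexemes as-is,
        # bypassing PostgreSQL's own parser and dictionaries.
        # tsvector lexemes are a set, so deduplicate before handing them over.
        cur.execute(
            "INSERT INTO documents (body, tokens) "
            "VALUES (%s, array_to_tsvector(%s::text[]))",
            (body, sorted(set(tokenize(body)))),
        )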




Sure, but that gives away most of the benefit of using it. At a minimum, you usually have to have your tokenizer available at query time, so you can tokenize the query the same way the indexed text was tokenized.
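
For example (with made-up table/column names, and a toy tokenizer standing in for the real one), the query has to go through the same analyzer before it hits the index:

    import re
    import psycopg2

    def tokenize(text):
        # Must be the same analyzer that produced the indexed lexemes,
        # otherwise the query terms simply won't match anything.
        return [t for t in re.split(r"\W+", text.lower()) if t]

    conn = psycopg2.connect("dbname=mydb")
    with conn, conn.cursor() as cur:
        lexemes = tokenize("Quick fox")
        # AND the lexemes together; the 'simple' config avoids re-stemming them.
        cur.execute(
            "SELECT id, body FROM documents"
            " WHERE tokens @@ to_tsquery('simple', %s)",
            (" & ".join(lexemes),),
        )
        print(cur.fetchall())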

You also have to run some batch process to reindex entries that have been added or changed. You don't have to handle deletes, which is certainly the trickiest part of keeping an external index in sync, but once you have some external cronjob (or whatever) running your code over all the full-text entries, you might as well have that job insert the data into ElasticSearch. Most of the difficulty in setting up full text search outside of the DB is creating and managing/monitoring the process that shuttles data out of the database and into ElasticSearch. If you're going to do most of that work anyway, you might as well get the full benefit, rather than shuttling the data out of the database and then right back into it.
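
To make that concrete, here's roughly what that cronjob ends up looking like (the updated_at column, index name, and hard-coded cursor are assumptions for the sake of the sketch):

    import psycopg2
    from elasticsearch import Elasticsearch
    from elasticsearch.helpers import bulk

    es = Elasticsearch("http://localhost:9200")
    conn = psycopg2.connect("dbname=mydb")

    def changed_rows(since):
        # Pull everything added or changed since the last run out of Postgres.
        with conn.cursor() as cur:
            cur.execute(
                "SELECT id, body, updated_at FROM documents"
                " WHERE updated_at > %s",
                (since,),
            )
            for doc_id, body, updated_at in cur:
                yield {
                    "_index": "documents",
                    "_id": doc_id,
                    "_source": {"body": body, "updated_at": updated_at.isoformat()},
                }

    # Normally the cursor would be read from and written back to a state file;
    # it's hard-coded here to keep the sketch short.
    bulk(es, changed_rows("2024-01-01T00:00:00"))

A real version would also have to remove deleted rows from the index, which is the part that's genuinely hard to keep in sync, and it's this job that needs the managing/monitoring: if it stalls, the index silently goes stale.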



