
Mordecai uses spaCy and is worth a look for extracting place names: https://github.com/openeventdata/mordecai

I wasn't too successful running it against tweets (low hit rate, false positives, low spatial resolution), but geolocating tweets is a hard problem, and I'm sure it would work better against more structured text.




Are you using the pre-trained NER models or your own? If the former, I wouldn't expect them to work well on tweets, since they weren't trained on tweets.


Do you have any recommendations for building a custom language model for business-specific NER?


I recommend Prodigy to label your examples and train a spaCy model. Prodigy is the best tool I have ever used for NLP labeling. Starting with a blank model will most likely work better, but you can also try starting from one of spaCy's pre-trained models.

https://prodi.gy/
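
For anyone wondering what "starting with a blank model" looks like once the examples are labeled, here is a rough sketch. It assumes spaCy v3 and that your annotations have already been exported as character-offset spans; the PRODUCT label, the two sentences, and the helper names (collect_labels, validate_spans, train_blank_ner) are all made up for illustration and are not part of Prodigy or spaCy. Prodigy has its own training workflow, so treat this as the do-it-by-hand version, not what Prodigy runs internally.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start_char, end_char, label)

# Hypothetical labeled data; the label and sentences are invented.
TRAIN_DATA: List[Tuple[str, List[Span]]] = [
    ("Acme Widget Pro ships next week.", [(0, 15, "PRODUCT")]),
    ("We returned the Acme Widget Mini.", [(16, 32, "PRODUCT")]),
]


def collect_labels(data: List[Tuple[str, List[Span]]]) -> List[str]:
    """Return the sorted set of entity labels used in the data."""
    return sorted({label for _, spans in data for _, _, label in spans})


def validate_spans(text: str, spans: List[Span]) -> bool:
    """Check that spans are in bounds, non-empty, and do not overlap."""
    last_end = 0
    for start, end, _ in sorted(spans):
        if start < last_end or start >= end or end > len(text):
            return False
        last_end = end
    return True


def train_blank_ner(data: List[Tuple[str, List[Span]]], n_iter: int = 20, seed: int = 0):
    """Train an NER component on a blank English pipeline (assumes spaCy v3)."""
    import random

    import spacy
    from spacy.training import Example

    random.seed(seed)
    nlp = spacy.blank("en")      # blank pipeline, no pre-trained weights
    ner = nlp.add_pipe("ner")    # add an untrained entity recognizer
    for label in collect_labels(data):
        ner.add_label(label)

    examples = [
        Example.from_dict(nlp.make_doc(text), {"entities": spans})
        for text, spans in data
        if validate_spans(text, spans)
    ]
    optimizer = nlp.initialize(lambda: examples)
    for _ in range(n_iter):
        random.shuffle(examples)
        nlp.update(examples, sgd=optimizer, losses={})
    return nlp
```

Calling train_blank_ner(TRAIN_DATA) returns a pipeline you can run on new text and inspect via doc.ents. Two examples are nowhere near enough to learn anything useful; the point is only the shape of the loop. To start from a pre-trained pipeline instead, you would load it with spacy.load and update its existing NER component rather than adding a new one.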


I also recommend looking at ipyannotate if most of your workflow is in Jupyter.



