
Perfect timing for me. I've been working on a project that needs multilingual vectorization and entity recognition. So far I've been using DBpedia and YAGO queries.

Given an input text, what would be a good way to extract a list of entities? The surrounding word sequence should make it possible to tell whether a token is part of an entity or just an ordinary word.

Is it possible to fine-tune these embeddings, à la BERT or ELMo?




My past paper describes an entity linking method based on Wikipedia2Vec: https://arxiv.org/abs/1601.01343

You need to extract entity names using NER software (e.g., spaCy, Stanford NER), and then resolve those names to knowledge base entities using the entity linking method.
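To make the two steps concrete, here is a minimal sketch of the pipeline shape, not the method from the paper. Everything in it is made up for illustration: the tiny knowledge base `KB`, the alias table `ALIASES`, the word vectors `WORDS`, and the helper names. The alias lookup stands in for a real NER tool, and the linking step simply picks the candidate entity whose vector is closest (by cosine similarity) to the average of the context word vectors.

```python
import math
import re

# Hypothetical toy knowledge base: entity name -> made-up 3-d embedding.
KB = {
    "Apple_Inc.": [0.9, 0.1, 0.0],
    "Apple_(fruit)": [0.0, 0.2, 0.9],
    "Paris": [0.1, 0.9, 0.1],
}

# Hypothetical alias table: lowercase surface form -> candidate entities.
ALIASES = {
    "apple": ["Apple_Inc.", "Apple_(fruit)"],
    "paris": ["Paris"],
}

# Hypothetical context word embeddings living in the same vector space.
WORDS = {
    "iphone": [1.0, 0.0, 0.0],
    "released": [0.6, 0.2, 0.0],
    "pie": [0.0, 0.0, 1.0],
    "baked": [0.0, 0.1, 0.8],
}


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def tokenize(text):
    return re.findall(r"[A-Za-z]+", text)


def find_mentions(text):
    """Stand-in for the NER step: keep tokens that match a known alias."""
    return [t for t in tokenize(text) if t.lower() in ALIASES]


def context_vector(text):
    """Average the vectors of all context words we have embeddings for."""
    vecs = [WORDS[t.lower()] for t in tokenize(text) if t.lower() in WORDS]
    if not vecs:
        return [0.0, 0.0, 0.0]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def link_entities(text):
    """Resolve each mention to the candidate entity closest to the context."""
    ctx = context_vector(text)
    return {
        mention: max(ALIASES[mention.lower()], key=lambda e: cosine(KB[e], ctx))
        for mention in find_mentions(text)
    }
```

With real data you would swap `find_mentions` for an actual NER model and load the entity and word vectors from a trained embedding file instead of the hard-coded dictionaries.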


You can use the TAGME API to do that; it’s state of the art as far as I know. Implementing it yourself is pretty painful, though, I think.

There’s no reason you can’t fine-tune these on your task. That’s true of any word embeddings, not just BERT or ELMo.
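As a rough illustration of what fine-tuning static word embeddings means, here is a small sketch in plain Python. It is not how BERT or ELMo are fine-tuned (those are contextual models); it only shows the general idea of letting task gradients flow back into the embedding table. The vectors, the two-example dataset, and all names are hypothetical.

```python
import math

# Hypothetical "pre-trained" 2-d embeddings (made-up numbers).
emb = {
    "good": [0.1, 0.3],
    "bad": [0.2, 0.2],
    "movie": [0.0, 0.1],
}

# Hypothetical labeled task data: (tokens, label), where 1 means positive.
data = [(["good", "movie"], 1), (["bad", "movie"], 0)]

w = [0.5, -0.5]  # classifier weights on top of the averaged embedding
b = 0.0          # classifier bias
lr = 0.5         # learning rate


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(tokens):
    """Return (probability of the positive class, averaged embedding)."""
    avg = [sum(emb[t][i] for t in tokens) / len(tokens) for i in range(2)]
    return sigmoid(sum(wi * xi for wi, xi in zip(w, avg)) + b), avg


def loss():
    """Mean log loss over the toy dataset."""
    total = 0.0
    for tokens, y in data:
        p, _ = predict(tokens)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(data)


def train(epochs):
    """SGD that updates the classifier AND the embedding table."""
    global b
    for _ in range(epochs):
        for tokens, y in data:
            p, avg = predict(tokens)
            g = p - y  # gradient of the log loss w.r.t. the logit
            grad_avg = [g * wi for wi in w]  # taken before w changes
            for i in range(2):
                w[i] -= lr * g * avg[i]
            b -= lr * g
            # The fine-tuning step: push the gradient into each word vector.
            for t in tokens:
                for i in range(2):
                    emb[t][i] -= lr * grad_avg[i] / len(tokens)
```

Freezing the embeddings would amount to skipping the last loop, so only `w` and `b` are trained.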



