
The preprocessing you need is what Lucene calls an Analyzer (the principle is the same for search in general): the component that knows how to prepare incoming plain text for storage in an index, with a corresponding component that prepares the search query. Here you need one made for code search. That's no different from needing analyzers for other natural languages (stemming sucks for almost everything but English). Thinking about it, the frontend of most compilers could probably make a pretty good Analyzer for its language. It already knows the language-specific constructs and can split the source into the parts it needs for further processing, which is basically what an analyzer does.
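To make the idea concrete, here is a rough Python sketch of a code-oriented analyzer. It is not Lucene's API: the names `analyze`, `build_index` and `search` are made up for illustration, and a pair of regexes stands in for a real compiler frontend. The point it shows is that the same analysis runs on both the indexed source and the query, and that identifiers are split into sub-words instead of being stemmed.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for a real lexer: identifiers and integer literals only.
_TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|\d+")
# Sub-word splitter: acronym runs (HTTP), capitalized or lowercase words, digits.
_PARTS = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")


def analyze(text):
    """Turn source code or a query string into lowercase index terms."""
    terms = []
    for token in _TOKEN.findall(text):
        # Keep the whole identifier so exact lookups still work.
        terms.append(token.lower())
        # Also emit camelCase / snake_case parts as separate terms.
        parts = [p.lower() for p in _PARTS.findall(token)]
        if len(parts) > 1:
            terms.extend(parts)
    return terms


def build_index(docs):
    """Build a toy inverted index: term -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents matching any analyzed query term (OR semantics)."""
    hits = set()
    for term in analyze(query):
        hits |= index.get(term, set())
    return hits


docs = {
    "a.py": "def parseHttpRequest(user_id): pass",
    "b.py": "class HTTPServer: pass",
}
index = build_index(docs)
print(sorted(search(index, "http")))    # both files contain an "http" sub-word
print(sorted(search(index, "UserId")))  # matches user_id via "user" and "id"
```

Matching on any term is a simplification chosen to keep the sketch short; a real search would rank results. Swapping the regexes for the token stream of an actual lexer would be the next step toward the compiler-frontend idea.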


