
Nice name and clean interface!

I'm glad to see Remote as a location, but due to the free-form writing in the original posts, there are errors. For example, "Haskell dev at Standard Chartered Bank" is listed under Remote, but the post itself says "Remote work isn’t an option". The post for Button similarly doesn't allow remote, but uses "Remote - no" to convey that.

I've been planning on building some filtering for the Who is Hiring threads, and I've pretty much determined that some degree of manual review will be needed. In the most recent thread, I found a huge number of posts containing "remote" which don't actually allow remote working. "No remote" is fairly common and easy to filter out, but there are any number of variations that you can't anticipate a priori.




> I've pretty much determined that some degree of manual review will be needed

You're spot on with everything. I did a lot of manual review, and the site already filters out "NO REMOTE", "REMOTE no", "Remote not" and "No Remote" entries. I did spot the "Remote work isn’t an option" post, but I decided not to write that kind of completely ad-hoc filtering rule; it's just ugly.
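For anyone curious, a filter for the four phrasings listed above could look roughly like this. It is only a sketch: the pattern and the `allows_remote` name are assumptions for illustration, not the site's actual code.

```python
import re

# Hypothetical pattern covering the four phrasings mentioned above:
# "NO REMOTE", "No Remote", "REMOTE no" and "Remote not" (case-insensitive).
NO_REMOTE = re.compile(r"\bno\s+remote\b|\bremote\s+(?:no|not)\b", re.IGNORECASE)

def allows_remote(post: str) -> bool:
    """True if the post mentions "remote" and none of the negative phrasings match."""
    return "remote" in post.lower() and NO_REMOTE.search(post) is None
```

A post saying "Remote work isn’t an option" still slips through as remote here, which is exactly the kind of case that needs either manual review or an ad-hoc rule.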


You could break the text up into sentences [1] and do sentiment analysis [2] on the sentences with 'remote' in. Then flag based on that.

[1] https://opennlp.apache.org/documentation/1.5.3/manual/opennl...

[2] http://nlp.stanford.edu/sentiment/


Wikify it.

Let users log in and change the remote/non-remote status (and other attributes).

Have some kind of trust system (could be linked to HN points or whatever).

(Even better if the YC guys made a custom job board where you fill in a form with all the details so there is no inconsistency.)


Or you could hire people to do it via oDesk or Mechanical Turk. Not so interesting technically, but it's a job people are good at.


Hire people for cheap to help people be hired for $$$, with no reward for the upsell. Brilliant! :)


Sentiment analysis probably isn't the right option here, though it may work.

I think a combination of dependency parsing[1] and regex is the way to go.

regex examples: "Remote: No", "No remote please"

Dependency parsing examples: "Remote work isn’t an option", "Remote work will not be considered"

[1] look for negation in the parse tree using something like http://demo.ark.cs.cmu.edu/parse?sentence=Remote%20work%20is...
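To make the idea concrete without pulling in a parser, here is a crude stand-in: it is not a dependency parse, just a check for a negation cue in the same sentence as "remote". The cue list, the sentence splitter and the `remote_negated` name are all assumptions for illustration.

```python
import re

# Naive sentence splitter: break after ., ! or ? followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")
# Assumed negation cues; a real parse tree would find these structurally.
NEGATION = re.compile(r"\b(?:no|not|never|without)\b|n['’]t\b", re.IGNORECASE)
REMOTE = re.compile(r"\bremote\b", re.IGNORECASE)

def remote_negated(post: str) -> bool:
    """True if any sentence mentions "remote" together with a negation cue."""
    for sentence in SENTENCE_END.split(post):
        if REMOTE.search(sentence) and NEGATION.search(sentence):
            return True
    return False
```

This catches all four examples above, but it also flags a friendly "REMOTE no problem!", which is where looking at the actual parse tree would probably do better.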


Sentence segmentation and sentiment analysis may be overkill.

N-grams + Naive Bayes is potentially Good Enough.
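A minimal sketch of what that might look like in plain Python: unigram and bigram counts fed into a multinomial Naive Bayes with add-one smoothing. The class, the label names and the six training snippets are made up for illustration; a real run would train on hand-labelled posts.

```python
import math
from collections import Counter

def ngrams(text, n_max=2):
    """Lowercased whitespace tokens as unigrams and bigrams."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]

class NaiveBayes:
    def __init__(self):
        self.counts = {}        # label -> Counter of n-gram frequencies
        self.docs = Counter()   # label -> number of training examples
        self.vocab = set()

    def fit(self, examples):
        for text, label in examples:
            grams = ngrams(text)
            self.docs[label] += 1
            self.counts.setdefault(label, Counter()).update(grams)
            self.vocab.update(grams)
        return self

    def predict(self, text):
        total_docs = sum(self.docs.values())
        best_label, best_score = None, -math.inf
        for label, counter in self.counts.items():
            # Log prior plus add-one smoothed log likelihood of each n-gram.
            score = math.log(self.docs[label] / total_docs)
            denom = sum(counter.values()) + len(self.vocab)
            for gram in ngrams(text):
                score += math.log((counter[gram] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Toy, hand-written training data (an assumption, not real thread data).
TOY = [
    ("remote ok", "remote"),
    ("remote welcome", "remote"),
    ("fully remote team", "remote"),
    ("no remote", "onsite"),
    ("remote not possible", "onsite"),
    ("remote work will not be considered", "onsite"),
]
model = NaiveBayes().fit(TOY)
```

With so little data the margins are thin, so treat the predictions as a demonstration of the mechanics rather than evidence the approach works.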


All these strategies are interesting, but I'm afraid we are over-engineering the problem here. The pretty simplistic strategy I'm using now is basically just pattern matching, and so far I had only 4 misplaced posts out of the 840 for April alone: that is < 0.5%. And it's blazing fast! I can rebuild the entire db in less than 30 seconds.

Given these numbers, I believe pretty much anything more complicated than that would be total overkill... Good food for thought though!


I just manually curate in these cases. HN hiring threads don't ever exceed a level where 0.5% manual review would be onerous.


I think you will need 100% manual review to find those 0.5%


In my experience with data quality management, manual translation of these edge cases is not pleasant. Yet it can be very valuable. It's a bit like "online learning" in machine learning - each time an error is found, you provide the correct answer. Yes, you might end up with a long array of phrases/regexes to check against. However, it scales just right for the amount of data you have and provides high quality results.
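A rough sketch of that workflow, assuming a plain in-memory list of patterns (the seed patterns and the function names are hypothetical): every time a reviewer spots a misfiled post, the offending phrase goes into the list and the next rebuild gets it right.

```python
import re

# Seed patterns; each reviewed mistake appends one more entry.
NEGATIVE_PATTERNS = [r"\bno\s+remote\b", r"\bremote\s+(?:no|not)\b"]

def add_correction(phrase: str) -> None:
    """Record a phrase from a misfiled post as an escaped literal pattern."""
    NEGATIVE_PATTERNS.append(re.escape(phrase.lower()))

def is_remote(post: str) -> bool:
    """True if the post mentions "remote" and no known negative pattern matches."""
    text = post.lower()
    if "remote" not in text:
        return False
    return not any(re.search(pattern, text) for pattern in NEGATIVE_PATTERNS)
```

For example, calling `add_correction("Remote work isn’t an option")` once is enough to move that post out of the Remote bucket on the next rebuild.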


> REMOTE no

"REMOTE no problem!" :) Just kidding. Great job.


A better option would be to require job postings to make location and remote-ability explicit at the top, in a standard format/layout. Because quite often I'm Cmd+F-ing through a thread and landing on a ton of "no remote" posts, which is frustrating.
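If posts did start with a fixed first line, parsing would become trivial. A sketch, assuming a purely hypothetical header of the form "Company | Location | REMOTE: yes" (nothing like this is currently required in the threads):

```python
import re

# Hypothetical header format: "Company | Location | REMOTE: yes" (or "no").
HEADER = re.compile(
    r"^\s*(?P<company>[^|]+?)\s*\|\s*(?P<location>[^|]+?)\s*\|"
    r"\s*REMOTE:\s*(?P<remote>yes|no)\s*$",
    re.IGNORECASE,
)

def parse_header(post: str):
    """Parse the first line of a post; return None if it does not follow the format."""
    first_line = post.splitlines()[0] if post.strip() else ""
    match = HEADER.match(first_line)
    if match is None:
        return None
    return {
        "company": match["company"],
        "location": match["location"],
        "remote": match["remote"].lower() == "yes",
    }
```

Posts that return `None` could then be flagged for manual review instead of being guessed at.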



