To be fair I don't think you need CV in this specific case where the problem space is very limited.

1. There's no lighting, so the enemies have specific, fixed pixel colours that don't appear in any of the backgrounds. Scan and target these.

2. Enemies appear in a specific zone of the canvas, which makes the scan faster and combines with the point below.

If there's expected ambiguity, one can (a) detect a few interesting background properties by looking at pixels where enemies never appear (e.g. corners), and/or (b) use a couple of other pixels relative to the candidate match (maybe neighbours, maybe not; it could just as well be 20px down, 10px left) to discriminate.
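A minimal sketch of that scan, assuming a raw RGBA frame buffer as bytes; the colours, the zone, and the confirmation offsets are all made up for illustration, not taken from the actual captcha:

```python
# Hypothetical sprite colours -- in practice you'd dump them from the real frames.
ENEMY_COLOURS = {(163, 59, 59), (255, 183, 183)}

# Extra pixels to check relative to a candidate match, to rule out noise.
# Offsets are arbitrary here; a real version would pick them per sprite.
CONFIRM_OFFSETS = [(1, 0), (0, 1)]

def find_enemies(frame, width, height, zone):
    """Return (x, y) candidates whose colour matches a known enemy colour.

    frame: bytes, 4 bytes per pixel (RGBA).
    zone: (x0, y0, x1, y1) region where enemies can appear,
          so we never scan the whole canvas.
    """
    x0, y0, x1, y1 = zone
    hits = []
    for y in range(y0, y1):
        for x in range(x0, x1):
            i = 4 * (y * width + x)
            if tuple(frame[i:i + 3]) not in ENEMY_COLOURS:
                continue
            # Discriminate: the relative pixels must also be enemy-coloured.
            ok = True
            for dx, dy in CONFIRM_OFFSETS:
                cx, cy = x + dx, y + dy
                if not (0 <= cx < width and 0 <= cy < height):
                    ok = False
                    break
                j = 4 * (cy * width + cx)
                if tuple(frame[j:j + 3]) not in ENEMY_COLOURS:
                    ok = False
                    break
            if ok:
                hits.append((x, y))
    return hits
```

Restricting the loop to the zone and bailing out on the first non-matching confirm pixel keeps this fast enough to run every frame.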

Side story: one day my team was tasked with doing textual document content recognition for some biz. Everyone was like "oh it's going to be $$$ to pull out CV+OCR and have the OCR learn the specific font".

Turns out the document in question was:

    - an extremely standardised gov format
    - produced only by gov administration
    - of a known fixed, overall size with clear identifiable boundaries
    - printing a known, standardised list of fields at fixed positions
    - with a known, standard font specifically made for quick automatic recognition
    - containing only /[A-Za-z0-9]/ chars (plus a few I can't recall, but essentially dash, plus, slash...)
    - on a known, standardised background
    - the only variable is the quality of the scan and the size parameters

So I put up a file upload form, piped the image through a reasonable ImageMagick filter sequence to turn it into a no-background monochrome image, looked for corners/borders, resized+rotated, scanned through the image until I hit a black pixel, then looked at lit/unlit pixel patterns (think 7-segment display in reverse).

Cobbled the thing together in a couple of afternoons, with a quick, simple UI to have the user crop/rotate the doc (putting it mostly upright). It was stupidly fast to run and the success rate was very high. Interestingly enough, the failure mode was very good: it could reliably tell "ok, I can't make any sense of this", vs OCR which claimed success but output gibberish.
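The "7-segment display in reverse" step could look something like this toy sketch, assuming the image is already binarised and deskewed; the cell size, probe points and templates are invented here, where a real version would derive them from the actual font metrics:

```python
# A character cell of the (hypothetical) fixed-width font: 3x5 pixels.
CELL_W, CELL_H = 3, 5

# Probe points inside a cell: (x, y) positions we sample for lit/unlit,
# like reading the segments of a 7-segment display.
PROBES = [(1, 0), (0, 2), (2, 2), (1, 4)]

# Known lit/unlit signatures for a couple of glyphs (made up for this sketch).
TEMPLATES = {
    (True, True, True, True): "0",
    (True, False, False, True): "1",
}

def read_glyph(rows, x0):
    """Classify one character cell of a binarised image.

    rows: list of strings, '#' = lit pixel, '.' = unlit.
    x0: left edge of the cell.
    Returns the glyph, or None when no template matches -- the
    "I can't make any sense of this" failure mode, rather than gibberish.
    """
    sig = tuple(rows[y][x0 + x] == "#" for x, y in PROBES)
    return TEMPLATES.get(sig)
```

Because the signature lookup either hits a template exactly or returns None, a bad scan fails loudly instead of producing plausible-looking wrong text.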

You can get surprisingly far with very little when you have known knowns.




Nah, a proper anecdote should end with 'and you could check one checkbox at the gov site and, instead of the scan, receive the "printed" PDF/A with the text layer intact'.

But yeah, there is always a way to optimize. Even with a clean-room implementation (i.e. not looking at the source of that DOOM captcha), you can easily narrow recognition down to a couple of 2x2 blocks and just pattern-match them against a known background (i.e. not a monster).
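A rough sketch of that block comparison, assuming you have a reference frame of the known background; frames here are just 2-D lists of pixel values, and all names are made up:

```python
def changed_blocks(frame, background, block=2):
    """Return top-left coordinates of block x block tiles that differ
    from the known background -- i.e. something (a monster?) is drawn there.

    frame, background: 2-D lists of pixel values of the same shape.
    """
    hits = []
    for y in range(0, len(frame) - block + 1, block):
        for x in range(0, len(frame[0]) - block + 1, block):
            if any(frame[y + dy][x + dx] != background[y + dy][x + dx]
                   for dy in range(block) for dx in range(block)):
                hits.append((x, y))
    return hits
```

Comparing whole tiles instead of single pixels means a stray matching pixel in the background can't produce a false positive on its own.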



