Build a better spell checker, win $10,000 (microsoft.com)
93 points by willf on Dec 16, 2010 | 60 comments



  However, by submitting your entry, you:
  are granting us an irrevocable, royalty-free, worldwide
  right and license 
I'll stop reading there. I honestly can't believe that competent programmers would be willing to work for free for one of the world's largest multinational software corporations, with only a chance of being paid a relatively small amount.


I hate to support Microsoft on this one - I was actually checking the rules to see if they forbid you from dual-licensing it - but you pulled this out of context. The license only allows them to use your code to evaluate who won the contest, and use it in promotional materials. Now, the latter part is a bit murky, it seems they have a license to publish a screenshot of your solution. But nevertheless, they are not claiming ownership of your work.

The full text is below:

Other than what is set forth below, we are not claiming any ownership rights to your entry. However, by submitting your entry, you are granting us an irrevocable, royalty-free, worldwide right and license to: (i) use, review, assess, test and otherwise analyze your entry and all its content in connection with this Contest; and (ii) feature your entry and all content in connection with the marketing, sale, or promotion of this Contest (including but not limited to internal and external sales meetings, conference presentations, tradeshows, and screen shots of the Contest entry in press releases) in all media (now known or later developed)

http://web-ngram.research.microsoft.com/spellerchallenge/Doc...


He said he stopped reading.


Which is a very good reason not to make a post on HN afterwards.


I was being facetious. I didn't really stop reading until I read this:

  understand that we cannot control the incoming information you will 
  disclose to our representatives in the course of entering, or what our
  representatives will remember about your entry. You also understand that we will 
  not restrict work assignments of representatives who have had access to your entry.
  By entering this Contest, you agree that use of information in our representatives’
  unaided memories in the development or deployment of our products or services does
  not create liability for us under this agreement or copyright or trade secret law;
Sounds like a filthy, under-handed way of saying "we won't copy your code verbatim, but we reserve the right to copy your algorithm and not even give you credit for it".


This seems much worse than the original excerpt you posted.


What on earth? I know we all love a good MS pile-on, but if this contest weren't being run by Microsoft, it'd be an academic shared task with results to be written up and published in a workshop of some conference, probably ACL or EMNLP (which will probably still happen, actually), for no prize money and yet with most or all of the authors open-sourcing their research code anyway.

And this isn't even substantially different from putting together OSS to do the job; even GPLed code lets MSR "use, review, assess, test, and otherwise analyze" it.

EDIT: To be even clearer, the idea of a shared task, run a bit like a contest (with a validation corpus and a secret test corpus, and a designated winner at the end), is really well established in the Natural Language Processing community, and the usual expectation is that you publish at the end. When I say in my first paragraph that this "'d be an academic shared task", I'm not speculating. The remarkable thing about the MSR contest is not the publication requirement, but that they're paying out money at all.


Either that, or it might get built at one of these newfangled "start-ups". Good luck buying, for $10,000, a company that has a spellchecker better than Microsoft's.


Good luck building a company around a spell-checker, no matter what level of quality, in a world where most people already have Office.


A sufficiently-smart spell-checker would be indistinguishable from a translation aide, sentiment analyzer, voice-recognition classifier, etc. Basically, whatever tech is required to make spell-checkers better has other uses than spell-checking, and those might be valuable.


A sufficiently-smart spell-checker would be indistinguishable from an artificial intelligence. A company producing one of those things would make billions, but not by selling a spell-checker.


I didn't mean it would be indistinguishable from all of those things (because it is an AGI and could act as any of those things if it wished), but rather that it would likely have to incorporate one or more of those things to spell better. I have a feeling sentiment analysis alone would be a pretty good next step for spelling/grammar-checking.


The founder of http://afterthedeadline.com/ hangs out on HN, so I think you should take this issue up with him.


And I'm guessing he received more than $10,000 from Automattic (WordPress.com) when they acquired him / his company.

http://blog.afterthedeadline.com/2009/09/08/after-the-deadli...


Just to be clear, that's more than just a spell checker.


For word processing, sure, but Thunderbird and Swype are two examples of software that I use on a daily basis that have uses for spell checking.


I am on no university's payroll; nobody pays me to do research.


I'm no fan of Microsoft but way to quote them out of context.

... are granting us an irrevocable, royalty-free, worldwide right and license to: (i) use, review, assess, test and otherwise analyze your entry and all its content in connection with this Contest; and (ii) feature your entry and all content in connection with the marketing, sale, or promotion of this Contest (including but not limited to internal and external sales meetings, conference presentations, tradeshows, and screen shots of the Contest entry in press releases) in all media (now known or later developed)


Unless I am misreading the terms you are only submitting a URL to a REST webservice you create:

...Once you submit the URL of your service at the evaluation web site http://spellerchallenge.com/, a job will be scheduled to call your web service and post a status update to the Challenge’s community page...

Basically it sounds like you are giving them the right to use whatever service you create for the purpose of the contest. It would be hard for them to evaluate your entry if you didn't give them the right to do so. After reading through the rest of the terms, I can't find any requirement to give them access to anything other than that webservice and, if you win, a paper describing your research (no code requirement specified in any of the submission requirements and if you don't want their money, you don't even have to write a paper to participate).


I had the same reaction, simply because if you entered the challenge and actually produced a better spell checker, you would probably have a few people other than Microsoft willing to pay a bit more than $10,000.

Now if it were a million or two, I'd totally do it.


Exactly. Give us the cow and the milk and we’ve got some awesome beans you’ll totally love.


Career-wise, a "won an international search algorithm contest by MS Research" line in your CV is worth much more than 10K.


Yeah seriously. This contest reminded me of the Netflix contest and that one landed you 1 million dollars if you won.


A million is probably still small compared to what they would make off a revolutionary algorithm (and significantly smaller than MS would make off a better spellcheck), but at least it's enough to quit your job and do what you want for a while.

10k is nothing. It's an insult to the work that would go into an improvement.


If you've already got a better algorithm, you now know it's worth at least $10,000 to Microsoft.


Debatably. This could be a talent-scouting endeavor, so the person who came up with it (and thus this and their future creations) could be worth at least 10k to MS.

Though I think it's more likely a side-goal, so I do agree. Just pointing out alternatives :)


For hackers, working on a problem like this is an order of magnitude more fun than just earning money.


I agree with you up until the point where the person owning the product of your hack is Microsoft. If the result was FOSS, that would be different.


I think they only will get the license to use it without paying any royalty. You still own the product, no?

Edit: And the license is not exclusive either. This will in a way give validation to your algorithm opening more doors (if you really want to make money).


You wouldn't own their changes to it. I.e. they take what you gave them, build on it, and now they own it and not you. I imagine they are looking for a novel idea that they can own for $10k and then they will implement into Office "the Microsoft way" for about $500k+.


Well, yeah. Did you expect them to give out $10,000 just to know that somewhere in the world a better algorithm exists?

I agree that $10,000 seems rather low, but you could presumably license the algorithm to other parties. The sucky part is that it appears to apply to all applicants instead of just the winner.


However, by submitting your entry, you: are granting us an irrevocable, royalty-free, worldwide right and license.

They're not saying that only the winner has to grant said license, they're saying that everybody who enters has to grant said license.

It's like 99designs, except in this case, MS gets to take home all the designs instead of just the winning one.


No you don’t. That was quoted out of context. You grant them the license to evaluate the algorithm in connection with the contest and to use your algorithm for promotion in connection with the contest. This license doesn’t give Microsoft the right to use your algorithm anywhere else.


Yeah, you're right. I certainly wouldn't enter that contest.


"All your spellcheck algorithms are belong to us"


$10K? That probably wouldn't pay for a single programmer's efforts to do this...essentially you'll be working for Microsoft for free.

Hell Microsoft, at least do like Netflix and make the prize something worth pursuing.


Almost as big of a ripoff as that company that gives you less than 20,000 bucks for a 2-10% steak in your company:

http://tinyurl.com/yb74jsr

Or maybe both of these deals have other less tangible benefits to the people who end up winning.


Please don't use tinyurl or other URL shorteners. People want to see the actual link, and the posting software can handle long URLs nicely anyway.


Though I normally would agree with you, Tinyurl was fundamental to the point of this person's post. So much so that even before clicking the random Tinyurl I thought to myself, "Nobody would tinyurl something in this context, it must link back to Y-Com."


Yeah, YC does love its steaks...

Here is the difference: with YC, you actually have your own company at the end. YC gives you 20K when you have nothing.

In this case, you get $0, and only if you hit a home run do you get $10K.

Completely different.


I always find it a bit disingenuous when I see this kind of competition. I quickly went through the official rules [1] and they're unclear about the true motivations behind this offer. Sure, you get $10k if you develop a great spell checking algorithm, and Microsoft claims no ownership over your implementation. But then there's two clauses that I feel weird about:

* "are granting us an irrevocable, royalty-free, worldwide right and license to: (i) use, review, assess, test and otherwise analyze your entry and all its content in connection with this Contest; and (ii) feature your entry and all content in connection with the marketing, sale, or promotion of this Contest (including but not limited to internal and external sales meetings, conference presentations, tradeshows, and screen shots of the Contest entry in press releases) in all media (now known or later developed)"

* "understand and acknowledge that the Promotion Parties may have developed or commissioned materials similar or identical to your submission and you waive any claims you may have resulting from any similarities to your entry"

I'll admit that this kind of contest pokes my CS brain and that other people will be at least curious enough about it to participate. But then you're getting $10k whereas Microsoft would be getting a bunch more out of your work. Am I wrong? Possibly. But my eyebrow moved when I read these pages.

[1] http://web-ngram.research.microsoft.com/spellerchallenge/Doc...


The Expected F1 (EF1) is the harmonic mean of expected Precision and Recall.[..] the Expected Percision is defined as... (Rules page)

Could at least have run the Spell Check Challenge pages through a spell check!
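
For anyone unfamiliar with the metric being quoted, here is a minimal sketch of the harmonic mean of precision and recall. The quote above elides how the contest defines the "expected" precision and recall, so this function simply takes the two values as inputs; the name `harmonic_mean_f1` is made up for illustration and is not from the rules page.

```python
def harmonic_mean_f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall.

    How the contest computes its *expected* precision and recall is not
    shown in the quote, so both values are plain inputs here (assumption).
    """
    if precision + recall == 0:
        # Avoid division by zero when both inputs are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(harmonic_mean_f1(0.8, 0.6))
```

The harmonic mean is pulled toward the smaller of the two inputs, so a checker cannot score well by maximizing precision alone or recall alone.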


 understand that we cannot control the incoming information you will disclose to our representatives in the course of entering, or what our representatives will remember about your entry. You also understand that we will not restrict work assignments of representatives who have had access to your entry. By entering this Contest, you agree that use of information in our representatives’ unaided memories in the development or deployment of our products or services does not create liability for us under this agreement or copyright or trade secret law;

Come on... What kind of fucked up plan is this? Let someone work for free, then let your whole engineering team "review" it... so oops, sorry if we remembered your algorithm. We didn't claim we won't.

I can't believe anyone would be willing to participate in this. This is daylight robbery.


That is almost certainly not what this is about. It's just a common accusation in lawsuits, so Microsoft is just covering its ass. This is the same reason why a lot of Hollywood studios will return spec scripts unopened — if they even open a package from somebody who hasn't signed an agreement like the one MS presents here, they're in danger of getting sued for big bucks.


I did a proof of concept for a better spell checker about 3 years ago. Query groups of two to three words in a search engine and look at the result count. Then replace the word in question with other similarly spelled words and run a query with each. The word with the highest result count is extremely likely to be the correct word. Really, it's surprising how accurate it is.

The glory of this is that it works with proper nouns that don't occur in dictionaries (e.g., xkcd). In Google's initial demo for Wave, they showed "Icland is an icland" being corrected to "Iceland is an island." I'm fairly confident they took a similar approach. There's also a good chance it could work for other languages, because it doesn't use anything specific to English.

The disappointing part is that most of the accomplishment comes in "Suggestion Intelligence First," meaning that from a list of 5, the top result is the correct result. In most cases, the Suggestion Intelligence is just fine; you just need to pick the right one yourself.

If anyone's interested, this was my presentation: http://soe.rutgers.edu/sites/default/files/gset/Presentation...

And this was the "research paper." Unfortunately it was a three-week program, and the other coder and I (i.e., the ones who understood how the thing worked) didn't contribute much to the paper. Feel free to ask if you have any questions. The email in there isn't actually my email: http://soe.rutgers.edu/sites/default/files/gset/Paper08-Hung...
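
A rough sketch of the idea described in this comment, not the code from the presentation or paper. Everything named here is an assumption for illustration: `hit_count` is a hypothetical stand-in for "number of search results for this phrase", backed by a small table of made-up counts rather than a real search engine, and "similarly spelled" is taken to mean within one edit of the original word.

```python
import string

# Made-up phrase counts standing in for search-engine result counts.
FAKE_HIT_COUNTS = {
    "iceland is an": 120_000,
    "is an island": 450_000,
    "is an icland": 12,
    "icland is an": 40,
}


def hit_count(phrase: str) -> int:
    """Hypothetical stand-in for a search engine's result count."""
    return FAKE_HIT_COUNTS.get(phrase.lower(), 0)


def similar_spellings(word: str) -> set[str]:
    """All strings within one edit (delete, swap, replace, insert) of word."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    swaps = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    replaces = {a + c + b[1:] for a, b in splits if b for c in letters}
    inserts = {a + c + b for a, b in splits for c in letters}
    return deletes | swaps | replaces | inserts


def best_word(before: list[str], word: str, after: list[str]) -> str:
    """Pick the spelling whose surrounding phrase has the highest count."""
    candidates = similar_spellings(word) | {word}

    def score(candidate: str) -> int:
        return hit_count(" ".join(before + [candidate] + after))

    # Sorting first makes tie-breaking deterministic.
    return max(sorted(candidates), key=score)


if __name__ == "__main__":
    print(best_word([], "icland", ["is", "an"]))
    print(best_word(["is", "an"], "icland", []))
```

With the made-up counts above, the same misspelling resolves to a different word depending on its neighbors, which is the point of querying word groups instead of single words. Note there is no dictionary anywhere in the sketch, so unknown proper nouns can win purely on counts.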


> Q. Who is not eligible to compete in the challenge?

> Entrants who are younger than 18 years of age;

Ugh. They just ruled out a large portion of the hackers for whom 10k is a lot of money.


They probably didn't have much of a choice on this one, strictly for legal reasons.


I don't know. You don't have to be 18 to win 10k at a chess tournament.


irrevocable, royalty-free, worldwide right and license

If words like that were used in said chess tournament, you probably would have to be 18 to compete.


They probably have to be 18 to grant irrevocable, royalty-free, worldwide right and license. But maybe they should have made it work somehow.


The way it's been explained to me before is that you do have to be an adult to enter into a legal agreement.




I'll pay more with far less onerous terms. Seriously. I'll happily sponsor grammar aid work too. Get in touch: http://gruschow.org


I should note that I work for Microsoft, but not for Microsoft Research, and of course, I don't speak for them.


I hope you get paid more than a $10,000 prize for working at Microsoft.


Probably this has already been thought about but it seems to me that a good checker should look for typos arising from the proximity of certain letters on the QWERTY keyboard.

Particularly potential typos which would be correct spellings of different, unintended words (these pass undetected by existing checkers).
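
A minimal sketch of what that could look like, assuming a US QWERTY layout with only the letter keys and a rough guess at how far each row is shifted sideways. The names `build_neighbors` and `adjacent_key_words` and the tiny word list are invented for illustration; a real checker would also need context to decide whether the typed word was actually unintended.

```python
QWERTY_ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
# Approximate horizontal stagger of each row, in key widths (assumption).
ROW_OFFSETS = [0.0, 0.25, 0.75]


def build_neighbors() -> dict[str, set[str]]:
    """Map each letter to the letters on physically adjacent keys."""
    positions = {
        key: (row, col + ROW_OFFSETS[row])
        for row, keys in enumerate(QWERTY_ROWS)
        for col, key in enumerate(keys)
    }
    neighbors: dict[str, set[str]] = {key: set() for key in positions}
    for a, (row_a, x_a) in positions.items():
        for b, (row_b, x_b) in positions.items():
            # Adjacent means same or next row, and at most one key width apart.
            if a != b and abs(row_a - row_b) <= 1 and abs(x_a - x_b) <= 1.0:
                neighbors[a].add(b)
    return neighbors


NEIGHBORS = build_neighbors()


def adjacent_key_words(word: str, dictionary: set[str]) -> set[str]:
    """Dictionary words reachable from word by one adjacent-key slip."""
    found = set()
    for i, ch in enumerate(word):
        for near in NEIGHBORS.get(ch, ()):
            candidate = word[:i] + near + word[i + 1:]
            if candidate in dictionary:
                found.add(candidate)
    return found


if __name__ == "__main__":
    words = {"form", "from", "firm", "farm", "fork", "dorm", "norm"}
    print(sorted(adjacent_key_words("form", words)))
```

Every word this returns is itself correctly spelled, which is exactly the case a dictionary-only checker lets through. Transpositions such as "from" are not adjacent-key slips, so this sketch does not cover them.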


Does this mean IE will finally get a built-in spell checker?


Try another 3 zeroes.


Right on. Netflix forked out a million for their contest, and they have a market cap that's puny compared to Microsoft's.



