Hacker News | datageek's comments

The algorithm needed to execute quickly. Image analysis takes too long.


What's the bottleneck? Bandwidth for downloading images? Understanding of an algorithm that would do the job?


Bandwidth and general cumbersomeness of dealing with larger amounts of data with starving-startup resources. I actually spent about a decade of my career focused on image processing, and while I love its power, I knew how much of an engineering challenge it can be at massive scale. I need to do a blog post about this, since I know my choice is a bit surprising and needs explanation.


Difficulty in creating an algorithm.

There are ways to get it done algorithmically; the challenge, however, is getting enough data. 30,000 images is too few. You would need a few million, and then even simple machine learning algorithms would work.

The latest machine learning techniques such as unsupervised deep learning might work, with millions of unlabeled images and the 30,000 labelled.


Technically I wouldn't call it a 'Photo quality algorithm' in that case


Kaggle - San Francisco, CA

We're looking for:

* Data scientists

* Developers (REMOTE)

* Technical sales

More information at http://www.kaggle.com/pages/jobs

Kaggle has just closed a large Series A ($11.25m). Our early employees will help shape Kaggle's direction and grow along with the company. Regardless of the position, you should have a strong interest in data science and the intellectual curiosity to engage with competition clients from a wide variety of fields.

Kaggle is aiming to build a meritocratic marketplace that will change the way data science gets done. Read more at: http://www.businessweek.com/magazine/kaggles-contests-crunch...


FWIW, Martin O'Leary doesn't sound like a Jewish name.


ONLY 40% of Physics Nobels are given to Jews, which leaves 60% to non-Jews. So it's not surprising that Martin O'Leary isn't Jewish. Unless his mom is and he took his dad's non-Jewish name. :)


There doesn't seem to be anything stopping people from including other data.


You're welcome to bring additional data as long as it's publicly available.

http://kaggle.com/view-postlist/forum-29-rta-freeway-travel-...


"This competition requires participants to predict travel time on Sydney's M4 freeway from past travel time observations". This line seems to suggest that past travel time is the most important part of the experiment; however, as another commenter (rrrhys) pointed out, the data is useless since the road has changed, and the grandparent of this post mentioned sporting events affecting traffic.

All of that said, perhaps a strong model can be generated using just historical data.


Build a better chess rating system and enter your system into the following competition: http://kaggle.com/chess.

You may want to use machine learning techniques, which you can learn from Andrew Ng's Stanford lectures (http://www.youtube.com/watch?v=UzxYlbK2c7E&feature=chann...).


Bustaname.com has some neat functionality, whereas Bruteforcenaming.com is super simple.


At 50 years old, it's probably due for an upgrade!


Instead of being upgraded, the Elo system is actually being applied to other sports because of its solid statistical basis.


It is solid and it is easily computable.
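To show how little computation is involved, here is a minimal sketch of the textbook Elo update in Python. The function names and the K-factor of 32 are my own choices for illustration; real federations use different K values and extra rules, so treat this as an assumption-laden sketch rather than any official implementation.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B (logistic curve, 400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_ratings(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return new ratings for A and B after one game.

    score_a is 1.0 for an A win, 0.5 for a draw, 0.0 for an A loss.
    k is the K-factor; 32 is an illustrative assumption, not a standard.
    """
    exp_a = expected_score(rating_a, rating_b)
    exp_b = 1.0 - exp_a
    score_b = 1.0 - score_a
    new_a = rating_a + k * (score_a - exp_a)
    new_b = rating_b + k * (score_b - exp_b)
    return new_a, new_b


if __name__ == "__main__":
    # Two equally rated players; A wins, so A gains exactly what B loses.
    print(update_ratings(1500.0, 1500.0, 1.0))
```

Each game needs only one exponentiation and a couple of additions, and the points gained by one player equal the points lost by the other.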


Benford's Law is really simple and neat. Worth reading about in isolation: http://en.wikipedia.org/wiki/Benford%27s_law
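For anyone who wants to try it, here is a rough Python sketch that compares the leading digits of a dataset with the Benford distribution, P(d) = log10(1 + 1/d). The helper names are hypothetical, the input is assumed to be non-zero integers, and powers of two are used only as a convenient sequence that is known to follow the law closely.

```python
import math
from collections import Counter


def benford_expected() -> dict:
    """Expected frequency of each leading digit 1..9 under Benford's Law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(n: int) -> int:
    """Leading decimal digit of a non-zero integer."""
    return int(str(abs(n))[0])


def observed_frequencies(values) -> dict:
    """Observed leading-digit frequencies, ignoring zeros."""
    digits = [leading_digit(v) for v in values if v != 0]
    if not digits:
        raise ValueError("need at least one non-zero value")
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}


if __name__ == "__main__":
    expected = benford_expected()
    observed = observed_frequencies(2 ** k for k in range(1, 1001))
    for d in range(1, 10):
        print(d, round(expected[d], 3), round(observed[d], 3))
```

A large gap between the two columns on real figures is only a hint that something deserves a closer look, not evidence of anything by itself.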


Possibly the start of an arms race between fraudsters and data miners?


I wonder if there's a business model in this?

