Hacker News

I wonder how it came to be that such a critical component of a multibillion-dollar industry relies on something as amateurish as LAION. This is no offense to the author at all, who organised a gigantic effort which we now see is very valuable. But I would imagine a company like Google could do a much better job in no time, simply due to expertise and resources.

I guess the answer is legal liability.




By all indications, it's definitely a solid engineering project. On what basis should we deride it as "amateurish"?

Because it originates in the public sector? As opposed to the private sector (where, as we know, everything is done to the highest possible engineering standards)?


I know the word sounds demeaning, and I didn't mean to criticise it on technical grounds. I find it an extremely impressive project.

I meant that it does not have the refinement you would expect for such a critical tool. A substantial portion of LAION is composed of duplicates. If you have ever browsed it, you will find that many annotations are quite basic and in some cases incorrect. In ChatGPT's case we know there was a small army of people going through their dataset to filter and refine issues that are presumably similar to those.


Who is to say Google et al. don't have an in-house army that re-annotates low-scoring images and stores them in a parallel dataset?


> I guess the answer is legal liability.

Indeed. The answer is in the article. He gets offered jobs all the time, but nobody offers to buy the data itself. Clearly that is because nobody wants to own it; it's all about plausible deniability.


> Clearly because nobody wants to own it

Is it open source? If that's the case, you already own it.


Forever relevant: https://xkcd.com/2347/



