I wonder how it came to be that such a critical component of a multi billion industry relies on something so amateurish as LAION. This is no offense to the author at all, who organised a gigantic effort which we now see is very valuable. But I would imagine a company like Google could do a much better job in no time, simply due to expertise and resources.
By all indications, it's definitely a solid engineering project. On what basis should we deride it as "amateurish"?
Because it originates in the public sector? As opposed to the private sector (where, as we know, everything is done to the highest possible engineering standards)?
I know the word sounds demeaning, and I didn't mean to criticise in on technical grounds. I find it an extremely impressive project.
I meant that it does not have the refinement you would expect for such a critical tool. A substantial portion of LAION is composed of duplicates. If you have ever browsed it, you will find that many annotations are quite basic and in some cases incorrect. In ChatGPT's case we know there was a small army of people going through their dataset to filter and refine issues that are presumably similar to those.
Indeed. The answer is in the article. He gets offered jobs all the time, but nobody offers to buy the data itself. Clearly because nobody wants to own it, it's all plausible deniability.
I guess the answer is legal liability.