I wonder how it came to be that such a critical component of a multi billion ind...

lisasays · on April 25, 2023

By all indications, it's definitely a solid engineering project. On what basis should we deride it as "amateurish"?

Because it originates in the public sector? As opposed to the private sector (where, as we know, everything is done to the highest possible engineering standards)?

epups · on April 25, 2023

I know the word sounds demeaning, and I didn't mean to criticise it on technical grounds. I find it an extremely impressive project.

I meant that it does not have the refinement you would expect for such a critical tool. A substantial portion of LAION is composed of duplicates. If you have ever browsed it, you will find that many annotations are quite basic and in some cases incorrect. In ChatGPT's case, we know there was a small army of people going through their dataset to filter out and fix issues presumably similar to these.

wodenokoto · on April 25, 2023

Who's to say Google et al. don't have an in-house army that re-annotates images with a low score and stores them in a parallel dataset?

dahwolf · on April 25, 2023

"I guess the answer is legal liability."

Indeed. The answer is in the article. He gets offered jobs all the time, but nobody offers to buy the data itself. Clearly because nobody wants to own it; it's all about plausible deniability.

bioemerl · on April 26, 2023

> Clearly because nobody wants to own it

Is it open source? If that's the case, you already own it.

Maken · on April 25, 2023

Forever relevant: https://xkcd.com/2347/