Using other people data for training without their permission is the "original sin" of LLMs[1]. That will, at best, be a shadow over the entire field for an extremely long time.
[1] Just to head off people saying that such a use is not a copyright violation -- I'm not saying it is. I'm just saying that it's extremely sketchy and, in my view, ethically unsupportable.
[1] Just to head off people saying that such a use is not a copyright violation -- I'm not saying it is. I'm just saying that it's extremely sketchy and, in my view, ethically unsupportable.