> Future models will continue to amplify certain statistical properties of their training data, and that amplified output will continue to pollute the public spaces from which future training data is drawn.
That's why on FB I mark my own writing as AI-generated, and AI-generated slop as genuine. What's disguised as a "transparency disclaimer" is really just a flag marking which content is a potential dataset to train from and which isn't.
I'm sorry for the low-content remark, but, oh my god... I never thought about doing this, and now my mind is reeling at the implications. Shielding my own writing from AI plagiarism by masquerading it as AI-generated slop in the first place... but then, in the same stroke, further undermining our collective ability to identify genuine human writing, while also flagging my own work as low-value to my readers and hoping they can read between the lines. It's a fascinating play.
Reminds me of the good old days of first-generation Google ReCaptcha, where I always entered only the one word Google already knew and ignored or intentionally mistyped the other.