
First thing I thought reading your comment was "this guy's repeating the low-background steel analogy that was posted the other day". I don't know if that's actually true, and it doesn't matter; my point is that the internet is already an echo chamber. GPT is trained on Reddit, HN, and other forums where people say the same stuff over and over again anyway. It's not like the pre-GPT internet is some pristine pool of new ideas. As long as someone somewhere is injecting some new stuff, and they will be, it doesn't matter that there will be more automatically generated crap; there already was.



Ignoring the effects of volume is a common and fatal mistake. "This effect already exists" is not incompatible with "Multiplying the impact of this effect by orders of magnitude will cause it to become a massive problem."

Or, put another way: A slightly degraded signal-to-noise ratio isn't really a problem, but once it degrades far enough, the whole stream becomes impossible to decode, and worthless.


As I understand it, LLMs use temperature scaling to tune how variable the generated tokens are, so that the output appears "good": somewhere between getting stuck always saying the same thing and saying nonsense. [0]

I wonder if more output being used in the training data will result in a shift to higher temperatures over time to keep variety in the output.

[0] Discussion the other day: https://news.ycombinator.com/item?id=35131112
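
For anyone who hasn't seen temperature written out, here is a minimal sketch of the usual idea, not any particular model's implementation: divide the logits by a temperature before the softmax, then sample. The function names and the example logits are made up for illustration.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the max before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature=1.0, rng=random):
    """Sample one token index according to the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.1]  # hypothetical scores for three tokens
    for t in (0.2, 1.0, 5.0):
        probs = softmax_with_temperature(logits, t)
        print(t, [round(p, 3) for p in probs])
```

Low temperatures pile almost all the probability onto the top token (the "always saying the same thing" end), and high temperatures flatten the distribution toward uniform (the "nonsense" end).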


My Information Theory is rusty (and wasn't too strong to begin with), but didn't Shannon prove that we can decode a signal given arbitrarily bad noise, but that the more noise, the lower the channel capacity?


Yep, pretty much. But therein lies the problem: filtering out garbage writing is something done by humans, whose limited time, attention, and effort all make for a pretty tight bottleneck in the first place. Degrade that channel capacity, and our ability to find texts of value becomes pretty close to nil.

edit: Also, when the noise comes in the form of competing signals rather than random noise, decoding becomes much, much more difficult, but we're also outside my area of expertise at this point.
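
To put rough numbers on the Shannon point from the parent comment, here is a small sketch using the Shannon-Hartley formula, C = B * log2(1 + S/N). It assumes a band-limited channel with additive white Gaussian noise, which is only an analogy for human readers filtering text; the helper names are mine.

```python
import math


def db_to_linear(snr_db):
    """Convert a signal-to-noise ratio from decibels to a plain power ratio."""
    return 10 ** (snr_db / 10)


def shannon_capacity(bandwidth_hz, snr_linear):
    """Shannon-Hartley capacity in bits per second for an AWGN channel."""
    if bandwidth_hz < 0 or snr_linear < 0:
        raise ValueError("bandwidth and SNR must be non-negative")
    return bandwidth_hz * math.log2(1.0 + snr_linear)


if __name__ == "__main__":
    bandwidth = 1000.0  # hypothetical 1 kHz channel
    for snr_db in (30, 10, 0, -10, -30):
        capacity = shannon_capacity(bandwidth, db_to_linear(snr_db))
        print(f"SNR {snr_db:>4} dB -> {capacity:10.2f} bit/s")
```

The capacity never reaches zero while there is any signal at all, but it shrinks quickly as the noise grows, which is the "tight bottleneck gets tighter" problem described above.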



