
‘delve’ is given as an example right there in TFA.



Yes, but the material presented in no way distinguishes between potential organic growth of 'delve' and LLM-induced use. They just note that although 'delve' was already on the rise, the word gained further popularity in 2023-24, at the same time ChatGPT took off. Word adoption is certainly not a linear phenomenon. And as the author states, 'I don't think anyone has reliable information about post-2021 language usage by humans'.

So I would still state that noun-phrase frequency in LLM output tends to reflect noun-phrase frequency in the training data in a similar context (disregarding, for the moment, the bias introduced through RLHF and other tuning).

I'm sure there will be cross-fertilization from LLM to human and back, but I'm not yet seeing data showing that the influence on word frequency is that pronounced.

The author seems to have some other objections to the rise of LLMs, which I fully understand.


The fact that making this distinction is impossible is reason enough to stop.


Even granting that we can disregard a really huge factor here, which I'm not sure we really can, one cannot know beforehand how the clustering of the vocabulary is going to go pre-training, and it's speculated that both at the center and at the edges of clusters we get random particularities. Hence the "solidgoldmagikarp" phenomenon and many others.


There is almost certainly organic growth as well, as more people in Nigeria and other SSA countries have gained very good internet access in recent years.



