disclaimer: i am in the middle of my journey learning this subject, and have at least fine-tuned existing models. some may still consider my take misinformed.
one thing i am not a big fan of in this discourse is how the community makes up jargon for relatively simple things on one hand (a la "embeddings"), while coming up with misnomers like "hallucinations" for otherwise blunt things on the other. it contributes to missing the forest for the trees in this discussion.
while i agree it is quite impressive to see what the state of the art can seemingly achieve on the surface, the way we design these models is fundamentally not aligned with what we want to do with their results. while some media for generative machine learning produce relatively harmless results (e.g. images and audio), it is text where i have a lot of reservations.
we train these models to essentially turn noise into something that humans can pass off as acceptable (for the given prompt). yet people outside (and sadly "inside") this space claim it to be the word of God.
i still strongly believe that a hybrid system is the only way we can get the most out of the current approach. almost a decade ago we were already good at parsing sentence structure and "understanding" what the user writes. the output can easily be produced by hardcoded logic with some noise thrown in by the models, and it should only be treated with that level of seriousness. a rough sketch of what i mean is below.
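to make that concrete, here is a minimal sketch of such a hybrid pipeline. everything in it is made up for illustration: the intents, the handlers, and the paraphrase() function, which stands in for the generative model and is only allowed to vary the wording, never the facts.

    import random
    import re

    # hypothetical stand-in for a generative model: it only adds "noise"
    # (surface variation) around a fixed answer. in a real system this
    # call would go to an llm constrained by the template.
    def paraphrase(text: str) -> str:
        openers = ["sure thing:", "here you go:", "as requested:"]
        return f"{random.choice(openers)} {text}"

    # classical, deterministic layer: parse the user's sentence with
    # plain rules and map it to a known intent. patterns are examples.
    INTENTS = {
        r"\b(weather|forecast)\b": "weather",
        r"\b(open|business) hours?\b": "hours",
    }

    def parse_intent(user_input: str) -> str | None:
        for pattern, intent in INTENTS.items():
            if re.search(pattern, user_input.lower()):
                return intent
        return None

    # hardcoded logic decides *what* is said; the model only decides *how*.
    HANDLERS = {
        "weather": lambda: "today's forecast is sunny, 21°C.",
        "hours": lambda: "we are open 9am-5pm, monday to friday.",
    }

    def respond(user_input: str) -> str:
        intent = parse_intent(user_input)
        if intent is None:
            # refuse instead of letting the model improvise an answer
            return "sorry, i can't help with that."
        return paraphrase(HANDLERS[intent]())

    print(respond("what are your open hours?"))

the point of the split is that the model cannot introduce a wrong fact: everything factual comes from the deterministic handlers, and an unrecognized request is refused rather than improvised.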
i might be shouting into a void, but it is imperative that we stop taking what these models output as production-ready code, or even as a starting point! otherwise, the only thing you achieve is to further train the models running the chat services.