
From my observation, even the largest GPT-2 model has difficulty retaining any long-range relationship information. In the "unicorn" writing example that was published originally, the model 'forgets' where the researchers are (climbing a mountain versus being beside a lake iirc) after just a few sentences. Because of this, it's hard to imagine models of this type being able to write long-form coherent papers. Now if we could somehow constrain the generated text to conform to a predefined graph structure that isn't forgotten so quickly...



Maybe the problem is that most of these models seem to rely on sequential information (even the transformer needs this for forward generation of text) to encode long-range information.

But I can't remember the last time I relied on remembering the exact ordering of tokens to finish an essay, or hell, even to reply to an email.

Structurally we retain some kind of hierarchical information (topic, places, names, events) about text.

Is there any active research looking into text generation models which do this? Maybe some kind of query made in a learned vector space that is not temporally dependent but rather "spatial" - as in: these are the facts about the text generated so far.
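
To make the idea concrete, a rough PyTorch sketch of what I mean (the FactMemory module, its chunking scheme and sizes are completely made up, not from any published model):

    import torch
    import torch.nn as nn

    class FactMemory(nn.Module):
        """Hypothetical memory: one vector per earlier chunk (sentence, scene, ...),
        queried by content rather than by token position."""
        def __init__(self, d_model=768, n_heads=4):
            super().__init__()
            self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

        def write(self, chunk_hidden_states):
            # Summarise each earlier chunk by mean-pooling its hidden states.
            # chunk_hidden_states: list of (len_i, d_model) tensors.
            return torch.stack([h.mean(dim=0) for h in chunk_hidden_states])

        def read(self, hidden, memory):
            # hidden: (batch, seq, d_model); memory: (n_chunks, d_model)
            mem = memory.unsqueeze(0).expand(hidden.size(0), -1, -1)
            # No positional ordering on the memory side: retrieval is purely
            # content-based, so an old fact is as reachable as a recent one.
            retrieved, _ = self.attn(query=hidden, key=mem, value=mem)
            return hidden + retrieved

    # e.g. between decoder layers: h = layer(h); h = fact_memory.read(h, memory)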


I'm interested in this as well.

I have been trying to fine-tune GPT-2 on genre fiction to work as a sort of "fiction replicator". Stylistically it actually does reasonably well, but it lacks narrative cohesion. This problem, as you point out, is corpus-agnostic.
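
For what it's worth, the fine-tuning side is just standard language-model fine-tuning; something like this with the Hugging Face libraries (the corpus file, model size and hyperparameters here are placeholders):

    from datasets import load_dataset
    from transformers import (GPT2LMHeadModel, GPT2TokenizerFast, Trainer,
                              TrainingArguments, DataCollatorForLanguageModeling)

    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2-medium")
    tokenizer.pad_token = tokenizer.eos_token      # GPT-2 ships without a pad token
    model = GPT2LMHeadModel.from_pretrained("gpt2-medium")

    ds = load_dataset("text", data_files={"train": "genre_fiction.txt"})
    ds = ds.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
                batched=True, remove_columns=["text"])

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="gpt2-fiction",
                               per_device_train_batch_size=2,
                               num_train_epochs=1),
        train_dataset=ds["train"],
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()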

I thought of keeping track of characters and key interactions outside of the model, but I haven't figured out how to make the two components interact reliably -- beyond just having the tracker generate prompts for the language model in a kind of cooperative setting.
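
The naive version of that cooperative setup looks something like this (StoryState and its fields are made up for illustration, and `generate` stands in for whatever samples from GPT-2):

    from dataclasses import dataclass, field

    @dataclass
    class StoryState:
        characters: dict = field(default_factory=dict)   # name -> short description
        events: list = field(default_factory=list)       # running list of key interactions

        def to_prompt(self):
            chars = "; ".join(f"{n}: {d}" for n, d in self.characters.items())
            recent = " ".join(self.events[-5:])           # keep only recent events
            return f"[Characters] {chars}\n[So far] {recent}\n[Continue] "

    def continue_story(state, last_paragraph, generate):
        # Prepend the serialized state so the facts can't fall out of context.
        prompt = state.to_prompt() + last_paragraph
        return generate(prompt)

    state = StoryState(characters={"Mara": "a tired cartographer"},
                       events=["Mara finds a sealed letter."])
    # next_paragraph = continue_story(state, "She broke the seal.", my_gpt2_sampler)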

Is there a known way to set up a transformer to do infix generation? That is: give it a start prompt and an end prompt, plus an estimated number of tokens to fill in between. That seems like it should be doable and could improve things, but I haven't found any work on this problem, and I haven't had the time (and potentially don't have the skills) to dig into it myself yet.
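
What I have in mind would roughly be a fine-tuning data transform like the sketch below, where the middle of a document is moved to the end so an ordinary left-to-right model learns to produce the infix conditioned on both ends (the sentinel tokens are made up, and I don't know whether anyone actually trains this way):

    import random

    PRE, SUF, MID = "<|prefix|>", "<|suffix|>", "<|middle|>"   # made-up sentinels

    def infix_example(doc_tokens):
        # Cut the document into prefix / middle / suffix and move the middle
        # span to the end of the training sequence.
        i, j = sorted(random.sample(range(1, len(doc_tokens)), 2))
        prefix, middle, suffix = doc_tokens[:i], doc_tokens[i:j], doc_tokens[j:]
        return [PRE] + prefix + [SUF] + suffix + [MID] + middle

    # At generation time the prompt would be
    #   <|prefix|> ...start... <|suffix|> ...end... <|middle|>
    # and the estimated token budget becomes a cap on how much gets sampled.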


Note that "just a few sentences" is more coherence than anything has ever managed before.

Also, in many examples that I've seen there's a clear thread that runs through the generated text. For example, a couple of small passages posted by others in this thread all revolve around one or two words that are repeated throughout, even as the details around those words keep changing.

See: https://news.ycombinator.com/item?id=21456705

Now it's "my bed", now it's a "strange bed", but it's always about a bed.



