
How are you getting the book text into GPT-3?



You don't. You cut the book into snippets and create embeddings for them, which lets you rank them by semantic similarity to a query. You then prompt GPT-3 with the question plus, say, the three most relevant snippets from the book.

The most difficult thing about the process is preventing the model from making stuff up.
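
To make that concrete, here's a minimal sketch of the whole flow. It assumes the pre-1.0 OpenAI Python client, the text-embedding-ada-002 and text-davinci-003 models, and a hypothetical book.txt; naive fixed-size chunking and the "answer only from the context" instruction are just one simple way to do the ranking and to curb made-up answers:

    import numpy as np
    import openai

    book_text = open("book.txt").read()  # placeholder path

    def embed(texts):
        # One call for brevity; a real book needs batching to stay under limits.
        resp = openai.Embedding.create(model="text-embedding-ada-002", input=texts)
        return np.array([d["embedding"] for d in resp["data"]])

    # 1. Cut the book into snippets (naive fixed-size chunks here).
    snippets = [book_text[i:i + 1000] for i in range(0, len(book_text), 1000)]
    snippet_vecs = embed(snippets)

    # 2. Rank snippets by cosine similarity to the query.
    query = "Why does the protagonist leave the city?"
    q_vec = embed([query])[0]
    sims = snippet_vecs @ q_vec / (
        np.linalg.norm(snippet_vecs, axis=1) * np.linalg.norm(q_vec))
    top3 = [snippets[i] for i in np.argsort(sims)[-3:][::-1]]

    # 3. Prompt GPT-3 with the question plus the top snippets; telling it to
    #    answer only from the context helps keep it from making stuff up.
    prompt = ("Answer the question using ONLY the context below. If the answer "
              "is not in the context, say \"I don't know.\"\n\nContext:\n"
              + "\n---\n".join(top3)
              + f"\n\nQuestion: {query}\nAnswer:")
    completion = openai.Completion.create(model="text-davinci-003",
                                          prompt=prompt, max_tokens=256,
                                          temperature=0)
    print(completion["choices"][0]["text"].strip())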


This is exactly what I'm working on! My project takes Zoom conversations, uses pyannote for speaker diarisation, whisper for transcription, and pinecone.io for semantic search, then feeds all that into GPT-3 so we can ask questions about the conversation.

For us this is super useful because it's not unusual for our discovery sessions to last days and we're all terrible at taking notes.

As a nerd, my brain is already buzzing with ways I could use this for my group's D&D campaigns.
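
For anyone curious, a rough sketch of a pipeline like the one described above. It assumes pyannote.audio 2.x, openai-whisper, the pre-1.0 OpenAI client, and the classic pinecone client; names like meeting.wav, the "meetings" index, and the sample question are placeholders, not the actual project code:

    import openai
    import pinecone
    import whisper
    from pyannote.audio import Pipeline

    AUDIO = "meeting.wav"

    # 1. Who spoke when (pyannote may need use_auth_token=...).
    diarizer = Pipeline.from_pretrained("pyannote/speaker-diarization")
    diarization = diarizer(AUDIO)
    turns = [(t.start, t.end, spk)
             for t, _, spk in diarization.itertracks(yield_label=True)]

    # 2. What was said.
    asr = whisper.load_model("base")
    segments = asr.transcribe(AUDIO)["segments"]  # each has start/end/text

    def speaker_at(ts):
        """Label a timestamp with the speaker whose turn contains it."""
        for start, end, spk in turns:
            if start <= ts <= end:
                return spk
        return "UNKNOWN"

    lines = [f'{speaker_at(s["start"])}: {s["text"].strip()}' for s in segments]

    # 3. Embed each line and upsert into Pinecone for semantic search.
    pinecone.init(api_key="...", environment="...")
    index = pinecone.Index("meetings")
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=lines)
    index.upsert(vectors=[(f"seg-{i}", d["embedding"], {"text": lines[i]})
                          for i, d in enumerate(resp["data"])])

    # 4. At question time: retrieve the closest lines and hand them to GPT-3,
    #    prompting as in the snippet-ranking sketch above.
    q = openai.Embedding.create(model="text-embedding-ada-002",
                                input=["What did we decide about pricing?"])
    hits = index.query(vector=q["data"][0]["embedding"], top_k=3,
                       include_metadata=True)
    context = "\n".join(m["metadata"]["text"] for m in hits["matches"])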


Are you getting good results when summarizing a human speaking? On my project, even though Whisper does a good job transcribing it, I'm not happy with the query results. My theory is that GPT-3 is designed for the written word, and the way people speak and the way they write are structurally different. Or maybe I'm just figuring this out and I'm not good enough at it yet.


It’s often not enough to just index the snippets themselves. You may need to augment them. For instance, you may need to keep track of the context, and prepend it to the actual snippet that you want to index.

The important thing in such a pipeline is not GPT-3. The important thing is the retrieval/ranking algorithm that finds the most relevant snippets and feeds them into GPT-3. The latter is only the mouthpiece, if you will.

In fact, you might even find that you're better off without it (no confabulation, just ground-truth data).
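
A hypothetical illustration of that augmentation, with made-up field names: embed the snippet together with its surrounding context (e.g. chapter and section titles), but keep the raw snippet for whatever you show or feed downstream:

    import openai

    def index_snippet(snippet, chapter, section):
        # The context-prefixed text is what gets embedded and ranked...
        augmented = f"{chapter} > {section}\n{snippet}"
        resp = openai.Embedding.create(model="text-embedding-ada-002",
                                       input=[augmented])
        # ...while the raw snippet is what you later display or feed to GPT-3.
        return {"vector": resp["data"][0]["embedding"],
                "text": snippet,
                "context": f"{chapter} > {section}"}

    record = index_snippet("He refused the offer without explanation.",
                           chapter="Chapter 4: The Falling Out",
                           section="The dinner scene")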


Interesting, I have been playing with something similar to use as a knowledge lookup tool stapled to a dynamic prompt builder.
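
One plausible reading of that (purely a guess at the design, with retrieve() standing in for the lookup tool): build the prompt dynamically from whatever the lookup returns, best matches first, trimmed to a budget:

    def build_prompt(question, retrieve, budget_chars=4000):
        """Assemble a prompt from retrieved facts, best matches first."""
        header = "Use the facts below to answer.\n\nFacts:\n"
        footer = f"\n\nQuestion: {question}\nAnswer:"
        body = ""
        for fact in retrieve(question):  # assumed to yield strings, best first
            line = f"- {fact}\n"
            if len(header) + len(body) + len(line) + len(footer) > budget_chars:
                break
            body += line
        return header + body + footer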


This sounds so interesting. Do you have any plans to write up a more detailed description of it?


I've got tons of notes so it shouldn't be too hard to do a write-up. Currently it's in a private repo, but if I can get sign-off from my boss I'll open-source it.


Any idea when you might find out? :)


How do you create the embeddings? Is there a GPT-3 API that returns a paragraph's embedding vector?


There is. But I'm not sure how good they are. Nils Reimers wrote a Medium post about them, concluding that they performed worse than SOTA sentence-transformer models.
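
For reference, both options side by side, assuming the pre-1.0 OpenAI client and the sentence-transformers package (the model names are just common choices, not necessarily the ones Reimers benchmarked):

    import openai
    from sentence_transformers import SentenceTransformer

    paragraph = "Call me Ishmael. Some years ago..."

    # GPT-3-family embedding endpoint:
    resp = openai.Embedding.create(model="text-embedding-ada-002",
                                   input=[paragraph])
    openai_vec = resp["data"][0]["embedding"]  # 1536-dim list of floats

    # Sentence-transformer alternative:
    model = SentenceTransformer("all-MiniLM-L6-v2")
    st_vec = model.encode(paragraph)  # 384-dim numpy array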



