Okay, I know it's not possible to bypass the 8k token limit. I don't have access to the 32k model yet.
I have transcripts that are typically around 15,000 tokens long, and I want to split that text into different topics. The problem is GPT-4's current context limit.
The obvious approach would be to split the text into chunks and send each one to the API. However, GPT-4 won't have the context of the other chunks to accurately identify the topics in the text. For example, a chunk boundary could land in the middle of a topic. I can't think of a way to programmatically chunk the text without accidentally breaking up topics.
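For reference, this is roughly the kind of chunking I mean (a minimal sketch using tiktoken for token counting; the 6,000-token chunk size and 500-token overlap are just guesses, and the overlap still doesn't guarantee a topic won't be cut in half):

```python
import tiktoken

# Tokenizer used by GPT-4 models
ENCODING = tiktoken.get_encoding("cl100k_base")

def chunk_transcript(text: str, max_tokens: int = 6000, overlap_tokens: int = 500) -> list[str]:
    """Split text into overlapping token-sized chunks.

    The overlap gives each chunk a bit of the previous context, but a
    chunk boundary can still fall in the middle of a topic.
    """
    tokens = ENCODING.encode(text)
    chunks = []
    start = 0
    while start < len(tokens):
        end = min(start + max_tokens, len(tokens))
        chunks.append(ENCODING.decode(tokens[start:end]))
        if end == len(tokens):
            break
        # Step back so the next chunk repeats the tail of this one
        start = end - overlap_tokens
    return chunks
```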
Does anyone know of a way around this, or have a better approach? Or is this just the reality of GPT-4 at the moment?