A technique I have had success with is summarizing in multiple passes.
Map-reduce the text with overlapping sections, then propagate the results back down and repeat the process; on the second pass, each map-reduce node knows the context it is operating in and can pick out more salient details.
Concretely, on the first pass, your leaf nodes are given a prompt like "The following is lines X-Y of a Z-line article. Output a 1 paragraph summary."
You then summarize those summaries, and so on up the tree. You can then propagate that information back down for a second pass, where the leaf nodes are given a prompt like "The following is lines X-Y of a Z-line article. The article is about <topic>. The section before line X is about <subtopic>. The section after line Y is about <subtopic>. Output a 1 paragraph summary that covers the details most relevant to this article in the surrounding context."
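A minimal sketch of the two-pass scheme. The `llm` callable, chunk sizes, and prompt wording are all illustrative stand-ins, not a specific API; line indices here are 0-based and the end index is exclusive.

```python
def chunk_lines(lines, size, overlap):
    """Split lines into overlapping chunks of (start, end, body)."""
    step = size - overlap
    chunks, start = [], 0
    while start < len(lines):
        end = min(start + size, len(lines))
        chunks.append((start, end, lines[start:end]))
        if end == len(lines):
            break
        start += step
    return chunks

def two_pass_summarize(article_lines, llm, size=40, overlap=10):
    chunks = chunk_lines(article_lines, size, overlap)
    total = len(article_lines)
    # Pass 1: context-free leaf summaries.
    pass1 = [
        llm(f"The following is lines {s}-{e} of a {total}-line article. "
            "Output a 1 paragraph summary.\n\n" + "\n".join(body))
        for s, e, body in chunks
    ]
    # Reduce: summarize the summaries to get the article-level topic.
    topic = llm("Summarize the article from these section summaries:\n"
                + "\n".join(pass1))
    # Pass 2: re-summarize each leaf, now with topic and neighbor context.
    pass2 = []
    for i, (s, e, body) in enumerate(chunks):
        before = pass1[i - 1] if i > 0 else "nothing"
        after = pass1[i + 1] if i + 1 < len(chunks) else "nothing"
        pass2.append(llm(
            f"The following is lines {s}-{e} of a {total}-line article. "
            f"The article is about: {topic}. "
            f"The section before line {s} is about: {before}. "
            f"The section after line {e} is about: {after}. "
            "Output a 1 paragraph summary that covers the details most "
            "relevant to the surrounding context.\n\n" + "\n".join(body)))
    # Final reduce over the context-aware summaries.
    return llm("Summarize the article from these section summaries:\n"
               + "\n".join(pass2))
```

The same idea extends to deeper trees: for long inputs you would recurse on the reduce step as well, and each intermediate node's second-pass prompt would carry its parent's summary down to its children.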
Could you expand on this? Is the idea to embed paragraphs (or some other arbitrary subsections) of the text, run a semantic search for the most relevant paragraphs, and then summarize only those?
Yes, that's exactly right, but it presumes you know what to look for and what you want in your summary. Our use case is picking out action items and next steps from meeting notes, so this works for us. It won't fit every use case, though, e.g. "summarize this paper."
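A rough sketch of that retrieval-then-summarize flow. The bag-of-words "embedding" here is a toy stand-in for a real embedding model, and `llm` is again a hypothetical callable; only the overall shape (embed, rank by similarity, summarize the top hits) reflects the approach described above.

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy embedding: bag-of-words token counts. A real system would call
    # a sentence-embedding model here instead.
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def top_k_paragraphs(paragraphs, query, k=3):
    """Rank paragraphs by similarity to the query, keep the best k."""
    q = embed(query)
    ranked = sorted(paragraphs, key=lambda p: cosine(embed(p), q), reverse=True)
    return ranked[:k]

def summarize_relevant(paragraphs, query, llm, k=3):
    relevant = top_k_paragraphs(paragraphs, query, k)
    return llm(f"Extract {query} from the following excerpts:\n\n"
               + "\n\n".join(relevant))
```

This is where the caveat above bites: the ranking is only as good as the query, so it works when you can name what you want ("action items," "next steps") and not when you can't.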