> and parsing it out of the man pages is not something I’d like to imagine doing reliably. So I must satisfy myself by manually writing these facts down. And this turns out to be the bottleneck of the entire operation.
If you do, you must unit-test the LLM stage. How do you do that without wasting a lot of time and resources? If the unit tests run through a few thousand times, would you bet your life on it never failing? I would if it was any other code.
I find LLMs very helpful when the task is annoyingly underdefined / understructured, but the result I want is easy to eyeball-audit.
This seems like one of those. Boiling down manpages to a consistent structure which a program can consume is going to involve a lot of special-casing a script, because they aren't written to be scraped like that.
But opening the result in one window, then loading the manpages one at a time in the other, and sanity-checking the contents, is less effort than manually copy-pasting everything and getting it into a consistent data format by hand.
Feeding the result of an LLM-grep sight-unseen into another program is an insane thing to do, of course. But using it like the above could save a lot of time.
You can probably use an LLM for this.