I am the maintainer of a man page that turns into a 260+ page PDF document in letter size.
I haven't hacked on groff, but I recently did a whole bunch of work on the man2html program from the man tools. That code is extremely hacky, like you wouldn't believe!
Mandoc, which is what the BSDs and some other systems use for formatting manpages, has very good HTML output. In fact, the program used to be named “mdocml” because it was written to be an mdoc‐to‐HTML converter.
Unfortunately the -man macros (as opposed to the modern -mdoc macros) aren’t that good for conversion to other formats like HTML, because they’re by nature presentation‐focused. All major troff implementations support the -mdoc macros, though, and -mdoc is much better suited. It’s what I write all my manpages in these days (and it’s a drop‐in replacement: replace foo.1 written in -man with foo.1 written in -mdoc, and groff, man, etc. will handle it without any further changes). I also like to convert manpages from -man to -mdoc, or write new pages for programs that don’t have one. It gets a little exhausting to convert long pages like the one you linked, though.
I wonder how good mandoc's handling of the troff language as such is.
The long page that I wrote, though it is based on the old -man macros, is actually to a large extent based on its own macros, which are retargetable.
As I started to polish the document for better PDF output, I needed to reach into more of the power of groff, while maintaining compatibility with man2html. That's when I started hacking on man2html to handle more of the troff language. I found that loops didn't work very well and there were issues with nested if/else and such.
Keep in mind, groff can itself generate HTML. I think it may need some pre- and post-processing to do it, though, because (I think) what are links in the HTML version end up being footnotes in the PDF/PS version. Could be wrong about that, though.
No thanks; that is broken by design. HTML is treated as a typesetter device, more or less, when what is needed is a semantic translation of the high level document structure.
As an analogy to another software system, you wouldn't want to generate HTML from LaTeX by processing the DVI file.
What is needed is the high level macros of a specific package being recognized and translated to HTML at a high level.
However hacky, the man2html program does that (and I made it work better: it has better support for handling more sophisticated macros, and is less buggy. I likely won't invest any more time into it, however, and I'm not going anywhere near groff).
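To make the distinction concrete, here is a minimal sketch of what "translating at a high level" means: the macro names themselves are mapped to HTML elements, rather than typesetting the page and then emitting HTML for the result. This is not how man2html is actually implemented; the function name `man_to_html`, the two lookup tables, and the tiny subset of -man macros handled are all assumptions made for illustration.

```python
import html

# Hypothetical mapping from a few -man macros to HTML templates.
# A real converter handles far more macros, plus the troff language itself.
BLOCK_MACROS = {
    "SH": "<h2>{}</h2>",   # section heading
    "SS": "<h3>{}</h3>",   # subsection heading
}
INLINE_MACROS = {
    "B": "<b>{}</b>",      # bold
    "I": "<i>{}</i>",      # italic
}


def man_to_html(source):
    """Translate a tiny subset of -man source to HTML, macro by macro."""
    out = []
    para = []

    def flush():
        # Emit the pending paragraph, if any.
        if para:
            out.append("<p>" + " ".join(para) + "</p>")
            para.clear()

    for line in source.splitlines():
        if line.startswith("."):
            # Macro request: ".NAME arguments"
            name, _, arg = line[1:].partition(" ")
            arg = html.escape(arg.strip().strip('"'))
            if name in BLOCK_MACROS:
                flush()
                out.append(BLOCK_MACROS[name].format(arg))
            elif name in INLINE_MACROS:
                para.append(INLINE_MACROS[name].format(arg))
            elif name == "PP":
                flush()
            # Any other macro is silently ignored in this sketch.
        elif line.strip():
            para.append(html.escape(line.strip()))
    flush()
    return "\n".join(out)


if __name__ == "__main__":
    sample = ".SH NAME\nfoo does things.\n.PP\nRun\n.B foo\nwith care."
    print(man_to_html(sample))
```

The point of the sketch is that `.SH` becomes a heading because it *is* a heading, not because it happened to be rendered in a larger font at some position on the page.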
http://www.kylheku.com/cgit/man/
I have it so that a man page can detect whether it's being compiled by groff or by man2html and re-target some of its macros.
That aforementioned large man page is here; the macros are up front: http://www.kylheku.com/cgit/txr/tree/txr.1
HTML and PDF here: http://sourceforge.net/projects/txr/files/txr-104/
(The index and hyperlinks in the HTML are due to a post-processing pass, implemented in the "genman.txr" script.)