Does this work well on screenshots of a desktop or webpage? We’ve found that a lot of the document segmentation/layout parsing models and tools are tuned almost exclusively to paper-style docs, the kind of thing you’d see in a research paper or shared PDF, and can’t handle the text layouts of modern applications and webpage UIs. We’ve been cobbling together our own techniques because of that.
A webpage should work pretty well, but a desktop would probably fail miserably.
That said, the library supports adding your own post-processing pipeline to group nodes together, so it’s totally extensible. Probably similar to what you already have.
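
For a rough idea of what a node-grouping step could look like, here is a minimal sketch. It is not the library's actual API: `Node`, `group_nodes`, and `max_gap` are made-up names, and the rule (merge boxes whose gap is under a pixel threshold) is just one simple heuristic you might start from.

```python
from dataclasses import dataclass


@dataclass
class Node:
    # Hypothetical text node: content plus bounding box in pixels.
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def close_enough(a: Node, b: Node, max_gap: float) -> bool:
    # Gap between the boxes along each axis; 0 when they overlap on that axis.
    dx = max(a.x0 - b.x1, b.x0 - a.x1, 0.0)
    dy = max(a.y0 - b.y1, b.y0 - a.y1, 0.0)
    return dx <= max_gap and dy <= max_gap


def group_nodes(nodes: list[Node], max_gap: float = 8.0) -> list[list[Node]]:
    # Union-find over node indices: nearby nodes end up in the same group,
    # including nodes that are only connected through a chain of neighbours.
    parent = list(range(len(nodes)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            if close_enough(nodes[i], nodes[j], max_gap):
                parent[find(j)] = find(i)

    groups: dict[int, list[Node]] = {}
    for i, node in enumerate(nodes):
        groups.setdefault(find(i), []).append(node)
    return list(groups.values())
```

For UI screenshots you would probably want a tighter threshold than for paper-style docs, since menus and toolbars pack unrelated labels close together. The pairwise loop is quadratic, which is fine for a few hundred nodes per screen.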