The lack of accessibility sucks, but part of me wonders why we don't have accessibility tools that don't rely on developers tagging images, yet still give us the ability to navigate the layout of a page.
Wouldn't tools that do text and image recognition on a selected screen region, arrow keys that jump to the center of the nearest shape, etc. be a lot more useful? Like a Google Lens for the desktop.
Text selection is 100% reliable, but being able to extract the text and layout from any arbitrary image, and then navigate it, would help people who struggle with vision a lot more, even if it's only 85% reliable.
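To be concrete, here's a rough sketch of the kind of thing I mean. Pillow's ImageGrab and pytesseract are just stand-ins for whatever screen capture and OCR you'd actually use: grab a region of the screen, OCR it into word boxes, and find the box nearest the cursor so an arrow-key press could jump to its center.

```python
# Rough sketch only: ImageGrab + pytesseract stand in for a real
# screen-capture + OCR pipeline; nothing here depends on the app
# exposing an accessibility tree.
from PIL import ImageGrab
import pytesseract

def word_boxes(region):
    """OCR a screen region and return (text, center_x, center_y) for each word."""
    img = ImageGrab.grab(bbox=region)  # (left, top, right, bottom) in screen coords
    data = pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT)
    boxes = []
    for text, left, top, w, h, conf in zip(
        data["text"], data["left"], data["top"],
        data["width"], data["height"], data["conf"]
    ):
        if text.strip() and float(conf) > 0:  # drop empty/garbage detections
            boxes.append((text, region[0] + left + w // 2, region[1] + top + h // 2))
    return boxes

def nearest_box(boxes, x, y):
    """Pick the recognized word whose center is closest to the cursor at (x, y)."""
    return min(boxes, key=lambda b: (b[1] - x) ** 2 + (b[2] - y) ** 2)

# An "arrow key" handler could then move focus to the returned center.
boxes = word_boxes((0, 0, 800, 600))
if boxes:
    text, cx, cy = nearest_box(boxes, 400, 300)
    print(f"nearest word: {text!r} at ({cx}, {cy})")
```

Even at 85% word accuracy, that's enough to let someone find and jump between the visible text blocks of an otherwise untagged UI.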
Once you're at the level of OCRing text from the screen, it becomes a general-AI-complete problem. See, e.g., how bad OCR still is for books, and there you can at least make good assumptions about the shape and meaning of the text (sentences, paragraphs, chapters). With arbitrary UIs, you have to recognize what each block of pixels means, and that's the GAI-complete problem at which even humans fail (that's why the UI/UX/HCI fields exist).