
AccessKit developer here. It's still really early in development; there's not much there yet.



> Each node has an integer ID, a role (e.g. button or window), and a variety of optional attributes. The schema also defines actions that can be requested by assistive technologies, such as moving the keyboard focus, invoking a button, or selecting text.

This sounds very similar to what I'm using for my Semantic UI project (which has similar aims).
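Concretely, I'd guess the node side of that schema looks something like this (these are my guesses at the shapes, not AccessKit's actual types):

    // Guessed sketch of a node as the quoted README describes it:
    // an integer ID, a role, and optional attributes.

    #[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
    struct NodeId(u64);

    #[derive(Debug, Clone, Copy)]
    enum Role {
        Window,
        Button,
        TextInput,
        // ... and so on
    }

    #[derive(Debug, Clone)]
    struct Node {
        id: NodeId,
        role: Role,
        // Everything below would be optional in the schema:
        name: Option<String>,
        value: Option<String>,
        children: Vec<NodeId>,
    }

    fn main() {
        let ok_button = Node {
            id: NodeId(2),
            role: Role::Button,
            name: Some("OK".into()),
            value: None,
            children: Vec::new(),
        };
        println!("{:?}", ok_button);
    }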

Accessibility systems require the ability to programmatically interact with the UI, too (install Accerciser if you're on an AT-SPI2-based system to have a play around); I'm not sure how your system supports typing. (Is it all done via Action::ReplaceSelectedText?)

Also, have you thought about latency? AT-SPI2 is really laggy (“bring down your system for several seconds at a time” levels of laggy), and from a cursory inspection AccessKit looks even heavier.


I'd like to know more about the Semantic UI project.

The way text input is implemented depends on the user's platform and input needs. When using a screen reader with a hardware keyboard, the screen reader will often use the accessibility API to programmatically move the keyboard focus, but once the focus is in a text input control, the input itself happens as usual, not through the platform's accessibility API. For users who require alternate input methods such as speech recognition, it depends on the platform. On Windows, for instance, text input isn't even done through the accessibility API; it's done through a separate API called Text Services Framework. But AccessKit will offer the ReplaceSelectedText action for platforms that can expose it.
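For illustration, a ReplaceSelectedText request might look roughly like this (a sketch with made-up names, not AccessKit's final API):

    // Sketch of an assistive technology requesting a text replacement;
    // all names here are hypothetical, not AccessKit's final API.

    #[derive(Debug, Clone, Copy)]
    struct NodeId(u64);

    #[derive(Debug)]
    enum Action {
        Focus,
        Invoke,
        ReplaceSelectedText { replacement: String },
    }

    #[derive(Debug)]
    struct ActionRequest {
        target: NodeId, // e.g. a text input control
        action: Action,
    }

    fn main() {
        // e.g. a speech-recognition engine dictating into a text field:
        let request = ActionRequest {
            target: NodeId(42),
            action: Action::ReplaceSelectedText { replacement: "hello".into() },
        };
        println!("{:?}", request);
    }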

I have certainly thought about latency; as a Windows screen reader developer, I've wrestled with that problem for a long time. The relevant factor here is not the amount of information being pushed, but the number of round trips between the assistive technology (e.g. screen reader) and the application. If I'm not mistaken, this is what makes AT-SPI problematic in this area. This has also been a problem for the Windows UI Automation API, and a major focus of my time on the Windows accessibility team at Microsoft was helping to solve it. As for AccessKit, I'll refer you to the section in the README about how applications will push tree updates to platform adapters. Since a large tree update can be pushed all at once, AccessKit doesn't make the problem of multiple round trips any worse.
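To make that concrete, a batched update might look something like this (a rough sketch with hypothetical types, not AccessKit's actual API):

    // Hypothetical sketch: the application pushes a whole batch of node
    // changes in a single message, so the assistive technology never has
    // to make one IPC round trip per node.

    #[derive(Debug, Clone, Copy)]
    struct NodeId(u64);

    #[derive(Debug)]
    struct Node {
        id: NodeId,
        name: Option<String>,
        children: Vec<NodeId>,
    }

    #[derive(Debug)]
    struct TreeUpdate {
        // Every node added or changed since the last update:
        nodes: Vec<Node>,
        // The node that currently has keyboard focus, if any:
        focus: Option<NodeId>,
    }

    fn push_update(update: TreeUpdate) {
        // In a real adapter this would cross the process boundary once,
        // e.g. serialized over a platform accessibility API or a socket.
        println!("pushing {} nodes in one message", update.nodes.len());
    }

    fn main() {
        let update = TreeUpdate {
            nodes: vec![
                Node { id: NodeId(1), name: Some("window".into()), children: vec![NodeId(2)] },
                Node { id: NodeId(2), name: Some("OK".into()), children: vec![] },
            ],
            focus: Some(NodeId(2)),
        };
        push_update(update); // one round trip, however large the tree
    }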


> The relevant factor here is not the amount of information being pushed, but the number of round trips between the assistive technology (e.g. screen reader) and the application. If I'm not mistaken, this is what makes AT-SPI problematic in this area.

That explains a lot! AT-SPI2 has, as you say, a lot of round trips – and some applications (e.g. Firefox) seem to use a blocking D-Bus interface that means they drop X events while talking to the accessibility bus.

> I'd like to know more about the Semantic UI project.

I don't think it qualifies for a definite article just yet. :-) I got annoyed with the lack of good, lightweight, cross-platform GUI toolkits in Rust and tried to make my own, but then ran into the same problem with accessibility APIs… so now I'm trying to solve both problems at once: defining a schema and interaction protocol for the semantics of a user interface, as a first-class citizen. All the information needed to construct a GUI would be present in the “accessibility data”, but in principle any kind of UI could be generated just as easily. (Of course, a GUI auto-generated from the SUI data would be like a CSS-free webpage; I'm planning to make a proper GUI library too, later.)

There are three types of thing in the schema I've got so far (there's a rough Rust sketch after this list):

• “Widget type” – basically a role. Each widget has exactly one widget type, which implies a certain set of features (e.g. section-with-heading has a heading).

• “Feature” – a group of attributes with a semantic meaning. For example, the heading feature consists of a reference to the heading widget (which must itself have the feature providing its natural-language representation). I'm not sure how to deal with stuff like “can be scrolled”, because I still haven't finished bikeshedding questions like: should there be implied zero-size features? Should widget types just carry a lot of semantics? Or should there be a load of explicit-but-redundant mandatory features on every button widget saying it can be pressed? (I'm leaning towards the last option now, on the assumption that simplicity beats trying to reduce bandwidth.)

• “Event”. Every change to the state of widgets is accompanied by an event. There are separate events for semantically different changes, even when the resulting tree change looks the same; for instance, when LibreOffice Calc deletes table-cell widgets that have scrolled off-screen, the widgets are gone but the actual cells still exist; that's a different thing from someone deleting a worksheet, so it should get a different event. This makes SUI retained-mode, but it should be usable with immediate-mode UIs in the same situations as AccessKit is.
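Here's that rough Rust sketch of the three concepts (everything provisional and purely illustrative):

    // Very rough sketch of the three schema concepts; all names are
    // provisional.

    #[derive(Debug, Clone, Copy)]
    struct WidgetId(u64);

    // "Widget type": basically a role; implies a set of features.
    #[derive(Debug)]
    enum WidgetType {
        Button,
        Heading,
        SectionWithHeading,
        TableCell,
    }

    // "Feature": a group of attributes with a semantic meaning.
    #[derive(Debug)]
    enum Feature {
        // e.g. section-with-heading refers to its heading widget:
        Heading { widget: WidgetId },
        // natural-language representation of a widget:
        Label { text: String },
        // the explicit-but-redundant "can be pressed" option above:
        Pressable,
    }

    // "Event": every state change gets one, and semantically different
    // changes get different events even if the tree diff is the same.
    #[derive(Debug)]
    enum Event {
        // widgets scrolled out of view and dropped from the tree,
        // but the underlying cells still exist:
        WidgetsUnloaded { widgets: Vec<WidgetId> },
        // the underlying data itself was deleted:
        WidgetsDeleted { widgets: Vec<WidgetId> },
    }

    fn main() {
        let cell = (WidgetId(7), WidgetType::TableCell, Feature::Label { text: "A1".into() });
        let ev = Event::WidgetsUnloaded { widgets: vec![WidgetId(7)] };
        println!("{:?} {:?}", cell, ev);
    }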

I haven't worked out how to represent “alternate interface interacts with program” yet, but I'm leaning towards a second kind of event, with the set of valid user events (and hence what the alternate UI “looks” like) determined by the widgets currently in the tree and their features.

Another question is how to represent cursors. Obviously there should be co-ordinate-positional (mouse-like) cursors and cursors over the widget graph, but keyboard-driven GUIs don't behave like either of those… so do I just let the alternate interface deal with cursors? But then how does the application know what's currently selected by the cursor? (Focus, hover, select… all with different semantics, and I'm probably not aware of all of them.) Maybe SUI should just keep out of that entirely, and pass through a cursor ID and various events without trying to fit them to a model?
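The pass-through option might look something like this (again, just a sketch):

    // Sketch of the "keep out of it" option: SUI just relays an opaque
    // cursor ID plus events, without imposing a cursor model.

    #[derive(Debug, Clone, Copy)]
    struct CursorId(u32);

    #[derive(Debug, Clone, Copy)]
    struct WidgetId(u64);

    #[derive(Debug)]
    enum CursorEvent {
        // mouse-like, co-ordinate-positional:
        MovedTo { x: f64, y: f64 },
        // graph-positional:
        OverWidget { widget: WidgetId },
        // semantics the application defines; SUI doesn't interpret them:
        Focus { widget: WidgetId },
        Hover { widget: WidgetId },
        Select { widget: WidgetId },
    }

    fn main() {
        let ev = (CursorId(0), CursorEvent::Focus { widget: WidgetId(3) });
        println!("{:?}", ev);
    }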

You can tell I'm not very good at this; if I'd heard of AccessKit earlier than a week into this project, I wouldn't've started it! :-p

Since pretty much every OS supports Unix domain sockets, I intended to use them as the transport. Backends for existing accessibility systems (e.g. AT-SPI2, IAccessible2) were planned as daemons, but to be honest I don't know enough about IPC to have planned this out properly, and I haven't really got attached to any one architecture. I don't even know that it would work; IAccessible is a COM interface, and afaik Windows cares strongly about which process provides those.
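A minimal sketch of the transport I had in mind, using std's UnixListener (which is Unix-only in std, so Windows would need something else; the wire format is hand-waved):

    // Minimal Unix-domain-socket sketch of the planned transport; a
    // daemon for e.g. AT-SPI2 would connect as a client and translate.

    use std::io::{BufRead, BufReader, Write};
    use std::os::unix::net::{UnixListener, UnixStream};

    fn serve(path: &str) -> std::io::Result<()> {
        // Remove any stale socket from a previous run, then listen:
        let _ = std::fs::remove_file(path);
        let listener = UnixListener::bind(path)?;
        for stream in listener.incoming() {
            let mut stream = stream?;
            // One batched update per message; a real protocol would
            // frame this properly instead of newline-delimiting it.
            stream.write_all(b"{\"nodes\":[...]}\n")?;
        }
        Ok(())
    }

    fn read_update(path: &str) -> std::io::Result<String> {
        let stream = UnixStream::connect(path)?;
        let mut line = String::new();
        BufReader::new(stream).read_line(&mut line)?;
        Ok(line)
    }

    fn main() -> std::io::Result<()> {
        let path = "/tmp/sui.sock";
        // In the real design the app and the accessibility daemon would
        // be separate processes; a thread stands in for that here.
        std::thread::spawn(move || serve(path));
        std::thread::sleep(std::time::Duration::from_millis(50));
        println!("got: {}", read_update(path)?);
        Ok(())
    }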

I thought the amount of information was a factor in IPC latency (even though computers can download gigabytes over a network in seconds), so I've been distracting myself with trying to “lazy-load” lots of the data. If you're right that round trips are what matter – and you probably are – then lazy loading is worse than useless, and I should just push everything eagerly.

A final question: how do I deal with reading order? I have no answers to this.



