The product I would like to see out of this is a way to automate UI QA.
Ideally it would be given a persona and a list of use cases, try to accomplish each task, and save the state where it failed.
Something like Chrome Lighthouse, but for usability. Bonus points if it can highlight which parts of my documentation use mismatched terminology, making it difficult for newcomers to understand which button I am referring to.
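To make the ask concrete, here is a rough sketch of the loop I have in mind: a persona, a list of use cases, and a record of the state wherever a task fails. Every name here (`UseCase`, `Failure`, `run_suite`, the `agent` callable) is made up for illustration; the agent that actually drives the browser is the part I can't buy, so it is just a function you pass in.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class UseCase:
    name: str
    goal: str  # plain-language task, e.g. "export last month's report as CSV"


@dataclass
class Failure:
    use_case: str
    step: int     # step index at which the agent gave up (-1 if it crashed)
    reason: str
    state: dict   # whatever was captured at failure: URL, DOM snapshot path, etc.


# Hypothetical agent signature: returns None on success, a Failure otherwise.
Agent = Callable[[str, UseCase], Optional[Failure]]


def run_suite(persona: str, use_cases: list[UseCase], agent: Agent) -> list[Failure]:
    """Try every use case as the given persona and collect the failures."""
    failures: list[Failure] = []
    for use_case in use_cases:
        try:
            failure = agent(persona, use_case)
        except Exception as exc:
            # A crash in the agent is still a finding worth recording.
            failure = Failure(use_case.name, -1, f"agent crashed: {exc}", {})
        if failure is not None:
            failures.append(failure)
    return failures
```

The output is deliberately just a list of failures with saved state, not tickets; what to do with that list is a separate question.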
I've seen similar sentiment even pre-LLM that AI would help automate other forms of testing, and I just don't quite see it.
Implementing tests is not the hard part. You could make that an intern project or hire a consultant for 3 months. The hard part is the interpretation of results.
That is - making a thing that spits out tickets/alerts is easy. The signal/noise tuning and actual investigation workflows are the hard part and still very manual & human operated. I don't see LLM mouse/keyboard control changing that yet.
> making a thing that spits out tickets/alerts is easy.
I don't really believe that what I am asking for is hard, yet as far as I know I still can't buy it.
> actual investigation workflows are the hard part and still very manual & human operated.
Sure, but it would give your QA worker pre-tested, use-case-based paths, each flagged as potentially problematic or not, with a screen recording and a timestamp of where it went wrong.
These will always need a human in the loop to vet the findings before cutting a ticket to the development team.
Fair - I'm not personally familiar with the state of the art in UI QA automation, but I know there have been various screen-recording-type tools available for a decade+ with mixed success.
I come more from a "big data" background, and have dealt with CTOs who think "can't we just use AI?" is the answer to data quality checking on multi-PB data lakes with 1000s of unique datasets from 100s of vendors. That is - they don't want to staff a data quality team; they think you can just magic it all away.
The answer was always - sure, but you are fixated on the easy part - anomaly detection.
Actual data analysis on what broke, when, how, and why, and escalating to the data provider, was always 95% of the work. Someone needs to look at the exhaust, and there will be exhaust every single day. So you can kill your dev team's productivity or actually staff an operations team responsible for the tickets the thing spits out.
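To show how small the "easy part" is, here is a minimal sketch of anomaly detection on daily row counts using a plain z-score. The function name, the input shape, and the threshold of 3 are all assumptions for illustration, not anything from a real pipeline.

```python
from statistics import mean, stdev


def flag_anomalies(daily_row_counts: dict[str, list[int]],
                   z_threshold: float = 3.0) -> list[str]:
    """Return datasets whose latest daily count deviates sharply from history."""
    flagged = []
    for dataset, counts in daily_row_counts.items():
        history, latest = counts[:-1], counts[-1]
        if len(history) < 2:
            continue  # not enough history to estimate a spread
        spread = stdev(history)
        if spread == 0:
            # Perfectly flat history: any change at all is unusual.
            if latest != history[0]:
                flagged.append(dataset)
            continue
        if abs(latest - mean(history)) / spread > z_threshold:
            flagged.append(dataset)
    return flagged
```

That is the whole detector. It tells you a dataset looks off today; it says nothing about what broke, when, how, or why, or who at the vendor to escalate to.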
That's fair, and I don't think I have a good counter to this. It would be very easy for such a UI QA product to become just another "security vulnerability scanner" that cuts low-severity tickets that nobody looks at.
Do y'all see a way to ramp from mostly-human-in-the-loop to mostly-AI? Can you take a system that does 1% of the hard part (signal/noise tuning) and teach it to get better over time?
I'm thinking of a single particular application under test and a mostly static group of SMEs who might be involved to respond/tune.
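One possible shape for that ramp, as a sketch only: log each SME verdict per check, and let a check skip human review only once its track record is good enough. `TriageTuner`, the routing labels, and the default thresholds are all hypothetical, and this doesn't show that the approach works in practice.

```python
from collections import defaultdict
from typing import Optional


class TriageTuner:
    """Route findings to humans until a check has earned auto-ticketing."""

    def __init__(self, min_reviews: int = 20, min_precision: float = 0.9):
        self.min_reviews = min_reviews
        self.min_precision = min_precision
        # check_id -> [confirmed_count, total_reviewed]
        self._counts = defaultdict(lambda: [0, 0])

    def record_verdict(self, check_id: str, confirmed: bool) -> None:
        """Store an SME's call on one finding: real issue or noise."""
        counts = self._counts[check_id]
        counts[0] += int(confirmed)
        counts[1] += 1

    def precision(self, check_id: str) -> Optional[float]:
        """Fraction of reviewed findings the SMEs confirmed, if any exist."""
        confirmed, total = self._counts.get(check_id, (0, 0))
        return confirmed / total if total else None

    def route(self, check_id: str) -> str:
        """Return 'auto_ticket' for trusted checks, else 'human_review'."""
        _, total = self._counts.get(check_id, (0, 0))
        precision = self.precision(check_id)
        if total >= self.min_reviews and precision is not None \
                and precision >= self.min_precision:
            return "auto_ticket"
        return "human_review"
```

With a mostly static SME group and one application, the verdict log stays small enough to audit by hand, which is the only reason I'd trust a scheme like this at all.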
It’s both. Most manual tests are required to be run whenever the underlying code has changed, and that’s pretty slow and annoying. Interpreting results is usually pretty trivial, like checking the HTTP code or checking against an assert. I don’t think most companies use, or should use, manual testing, but where it’s unavoidable, this is a great workaround.