Are you referring to the test you mentioned in this thread? https://news.ycombinator.com/item?id=26784732

If so, December predates Conformer, so you're talking about the sconv model, which is the model I was complaining about upthread. It was very polarizing with users, and despite the theoretical WER improvements, its errors were much more catastrophic than those of the model that preceded it.
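To illustrate why a WER improvement can hide this kind of problem, here is a minimal sketch of word error rate as word-level edit distance divided by reference length. The function name and the example phrases are hypothetical, not taken from any real model's output; the point is only that a harmless near miss and a destructive misrecognition can score identically.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix seen so
    # far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical examples: both hypotheses have one substituted word.
near_miss = word_error_rate("select line ten", "select line tin")
catastrophic = word_error_rate("select line ten", "delete line ten")
print(near_miss, catastrophic)
```

Both calls return the same score, even though the second hypothesis would run a different command entirely. WER counts word errors but carries no notion of how costly each one is.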

In either case, I'm constantly making improvements - I'm in the middle of a retrain that fixes some of the biggest issues (such as misrecognizing some short commands as numbers), and I've done a lot of other work recently that has really polished up the experience with the existing model.




I totally forgot about that conversation! Yeah, I must be referring to sconv then. I was thinking of the new custom-trained model you were releasing to your paid beta Patreon subscribers, and confused the two.

As a side rant, it turned out that simply stepping away from work for a few weeks around the holidays nearly fixed my RSI, which makes me so sad about the nature of my career whenever it crops back up.

Btw, any chance you've done any work on the `phones` or related tooling? I remember that (and editing in general) being a pain point.


Yeah, for sure, breaks are really important.

sconv was especially disappointing because it looked so good on metrics during my training, but the cracks really started to show once it entered user testing. Conformer has been so much less stressful in comparison because most user complaints are about near misses (or completely ambiguous speech where the output is not _wrong_ per se if you listen to the audio) rather than catastrophic failure.

There's another interesting emergent behavior with my user base as I make improvements, which is that as I release improved models allowing users to speak faster without mistakes, some users will speak even faster until there are mistakes again.

Edit: Yep! There have been several improvements on editing, though that's more in the user script domain and my work has still been mostly on the backing tech. I'm planning on working on "first party" user scripts in the future where that stuff is more polished too.


> as I release improved models allowing users to speak faster without mistakes, users will speak even faster until there are mistakes again.

LOL. Users will be users! That's a hilarious case study, thanks for sharing.

> Yep! There have been several improvements on editing, though that's more in the user script domain and my work has still been mostly on the backing tech. I'm planning on working on "first party" user scripts in the future where that stuff is more polished too.

That would be wonderful! If you haven't seen them, I'd suggest looking at Serenade (also ASR) and Nebo (handwriting OCR on iPad) as interesting references for editing UI. They seem to have tight integration between the recognition and editing steps, making errors painless to fix by exposing alternative recognitions at the click of a button or a short command. It lets them make x% precision@n as convenient as x% accuracy.
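As a rough sketch of the distinction, the snippet below compares top-1 accuracy with what the comment calls precision@n, read here as top-n accuracy: the fraction of utterances whose correct transcript appears anywhere in the recognizer's first n candidates. The function name and the sample data are made up for illustration and do not come from Serenade, Nebo, or any real recognizer.

```python
from typing import List, Tuple


def top_n_accuracy(results: List[Tuple[str, List[str]]], n: int) -> float:
    """Fraction of utterances whose true transcript is among the top n candidates.

    Each item in `results` is (true_transcript, ranked_candidates), with the
    recognizer's best guess first.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if not results:
        raise ValueError("results must not be empty")
    hits = sum(1 for truth, candidates in results if truth in candidates[:n])
    return hits / len(results)


# Hypothetical recognition results, best guess first.
results = [
    ("save file", ["save file", "safe file", "save pile"]),
    ("undo that", ["undue that", "undo that", "undo hat"]),
    ("go to line ten", ["go to line tin", "go to line then", "go to line ten"]),
    ("select word", ["select word", "select ward", "elect word"]),
]

print(top_n_accuracy(results, 1))  # 0.5
print(top_n_accuracy(results, 3))  # 1.0
```

On this toy data the recognizer is right on the first guess only half the time, but the correct transcript is always somewhere in the top three. An editing UI that exposes the alternatives cheaply is what lets the second number matter to the user.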


I would say not quite as convenient, because they lean on that UI to also make you constantly confirm top-1 commands that would've worked fine. As you can see in my Conformer demo video, I can hit top-1 so reliably that I don't even need to wait to look at the command before I start saying the next one.



