IMO the RNN is overkill for this problem, compared to a simple and elegant algorithm called the "$1 unistroke recognizer". That one works beautifully even when trained with just a single sample of each gesture.

I hope the $1 unistroke recognizer gets more recognition, because it can be integrated into any project in an afternoon to add gesture recognition and make the UI friendlier.

It works quite reliably for Palm-style "Graffiti" text entry, as long as each letter is just a single stroke. The original paper also makes a great effort to be readable and understandable.

https://depts.washington.edu/acelab/proj/dollar/index.html
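
If anyone wants a feel for how it works: the preprocessing is just a handful of geometry steps. Here's a rough Python sketch of the normalization described in the paper (the real $1 additionally runs a golden-section search over rotations when comparing against templates, which this omits):

    import math

    N = 64  # points per normalized stroke

    def path_length(pts):
        return sum(math.dist(a, b) for a, b in zip(pts, pts[1:]))

    def resample(pts, n=N):
        # Redistribute the raw stroke into n evenly spaced points.
        interval = path_length(pts) / (n - 1)
        pts = list(pts)
        out, acc = [pts[0]], 0.0
        i = 1
        while i < len(pts):
            d = math.dist(pts[i - 1], pts[i])
            if acc + d >= interval:
                t = (interval - acc) / d
                q = (pts[i - 1][0] + t * (pts[i][0] - pts[i - 1][0]),
                     pts[i - 1][1] + t * (pts[i][1] - pts[i - 1][1]))
                out.append(q)
                pts.insert(i, q)  # q becomes the next segment start
                acc = 0.0
            else:
                acc += d
            i += 1
        if len(out) == n - 1:  # rounding can leave us one point short
            out.append(pts[-1])
        return out

    def centroid(pts):
        return (sum(x for x, _ in pts) / len(pts),
                sum(y for _, y in pts) / len(pts))

    def rotate_to_zero(pts):
        # Rotate around the centroid so the angle to the first
        # point is zero (the paper's "indicative angle").
        cx, cy = centroid(pts)
        t = -math.atan2(pts[0][1] - cy, pts[0][0] - cx)
        c, s = math.cos(t), math.sin(t)
        return [((x - cx) * c - (y - cy) * s + cx,
                 (x - cx) * s + (y - cy) * c + cy) for x, y in pts]

    def scale_and_translate(pts, size=250.0):
        # Scale to a reference square, then center on the origin;
        # assumes the stroke isn't a perfectly straight line.
        xs, ys = [x for x, _ in pts], [y for _, y in pts]
        w, h = max(xs) - min(xs), max(ys) - min(ys)
        pts = [(x * size / w, y * size / h) for x, y in pts]
        cx, cy = centroid(pts)
        return [(x - cx, y - cy) for x, y in pts]

    def normalize(stroke):
        return scale_and_translate(rotate_to_zero(resample(stroke)))

Matching is then just comparing point-wise distances against each stored template and taking the closest one.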




A big issue with the $1 recognizer is that it requires strokes to be drawn in a specific way: for example, to draw a circle you need to go counterclockwise, and if you go clockwise (which seems more natural to me) it gets recognized as a caret. This makes it not really usable in a free-drawing context, where users are not aware of the details of your implementation.


But this is only a potential issue if you expect users to record their own gestures and then switch direction for some reason. If you are the one defining the gestures, you can just preprocess them to allow multiple directions/orientations (or simply record multiple variants yourself).


Not an issue: just invert each recorded gesture and add it to the same symbol.
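
Something like this (a rough sketch; assume templates maps gesture names to lists of recorded point lists):

    def add_gesture(templates, name, points):
        # Store the sample and its reverse under the same name,
        # so both drawing directions match the same symbol.
        templates.setdefault(name, []).append(list(points))
        templates[name].append(list(reversed(points)))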


This does not scale well when the drawing is more complicated. A simple example is a square, which can start at 4 corners and be drawn in 2 directions, so now you need 8 samples; and it gets worse, because some people use multiple strokes for the square.
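
Concretely (a sketch, assuming you keep the square as its 4 corner points before resampling):

    def square_variants(corners):
        # corners: the 4 corners of one recorded square, in order.
        # 4 starting corners x 2 directions = 8 single-stroke
        # templates, and multi-stroke squares still aren't covered.
        variants = []
        for pts in (list(corners), list(reversed(corners))):
            for i in range(4):
                loop = pts[i:] + pts[:i]
                variants.append(loop + loop[:1])  # close the stroke
        return variants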

The other algorithms in the family are more robust to this, but after experimenting, an RNN or vision model does much better on the consistency side of things.


What I meant is to add both the clockwise and counter-clockwise variants of the same gesture. Rotation is another matter: $1 unistroke can be made either sensitive or insensitive to gesture rotation, depending on what you want. Often you'd want to distinguish "7" from "L".

Uni-stroke is a much more elegant input method than multi-stroke. You can react to the user's gesture as soon as they lift the mouse button (or stylus, or finger), without introducing some arbitrary delay. Users can learn gestures and become very fast with them. Multi-stroke, on the other hand, requires coordinating each stroke with the previous ones, and to me that doesn't justify its complexity. I admit I have a preference for software where users adapt and become proficient, while many products with wider audiences need to be more accessible to beginners. Different strokes...


Right, but for a square you have to add 8 samples, not 2, to cover the 4 starting points and 2 directions, and that still doesn't account for the users who multi-stroke.

> Different strokes...

I see what you did there :] I'm definitely in the reduce-user-burden camp.

https://quickdraw.withgoogle.com/ is a good baseline to start from for a more resilient gesture recognizer.


it thinks everything I draw is a caret.


> ^^ ^^^^^^ ^^^^^^^^^ ^ ^^^^ ^^ ^ ^^^^^^

I don't understand what you're trying to say here.


People here testing the example on this page and reporting errors seem to be missing the fact that this demo is "trained" on a single example. The linked paper [0] goes into error rates, and they improve quickly with a few more examples.

[0] https://faculty.washington.edu/wobbrock/pubs/uist-07.01.pdf, page 8


I've just tried it, and it's pretty bad, at least without training.

My rectangle is recognized as a caret, my zigzag as a curly bracket.

And it doesn't support drawing a shape in two strokes, like the arrow for example.


There's no "training"; it's more like data-sample matching, closer to these new vector databases than to a neural network. You have to have gesture or point-cloud samples in the data set.
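
Roughly, recognition is just a nearest-neighbour scan over the stored samples (a sketch, assuming strokes are already resampled and normalized to equal-length point lists):

    import math

    def recognize(stroke, templates):
        # templates: dict mapping name -> list of normalized point
        # lists. There is no training step: adding a gesture is
        # just appending another sample to the data set.
        def score(sample):
            return sum(math.dist(p, q)
                       for p, q in zip(stroke, sample)) / len(stroke)
        return min(((name, score(sample))
                    for name, samples in templates.items()
                    for sample in samples),
                   key=lambda pair: pair[1])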


I played with this for a bit and found it too simple. If you don't draw the example shapes exactly, it confuses them. I recommend trying "delete" versus "x" from the example shapes to see just how poorly it does. I could not get it to consistently differentiate between different drawing techniques.

This would certainly get you started for gesture interfaces, where drawing a shape the same way every time is expected. It would not be a good fit for the use case here of diagramming.


Agreed, it works really well for how simple it is!

We implemented it in ES6 as part of a uni project if anyone's interested: https://github.com/gurgunday/onedollar-unistroke-es6


I tried 4 different shapes (circle, rectangle, triangle and heart) and it always said "Ellipse with score ...".


From the README:

> By default, it recognizes three shapes: Arrow, Check mark, and Ellipse.

> You can add more templates by drawing them and clicking on the Add Template button.

It worked well for the three, except that a clockwise circle wouldn't work, only a counter-clockwise one.


It works really well if all you're drawing is an ellipse ¯\_(ツ)_/¯. Could be a bug in the client or in your implementation of $1.


I implemented that in Objective-C when the iPhone was new-ish. It was a fun demo on a touch screen. It was surprising how well it worked for how simple it was. https://github.com/ddunkin/dollar-touch


It does not work as well.

I have this deep-seated fear that NNs will be the death of the lessons learned from 1970-2010. After all, if you can use massive amounts of compute to materialize what seems to be a good-enough function approximator, why bother with advanced algorithms at all?

Obviously, the reason we should is that approximators like NNs have explainability issues and corner-case unpredictability, plus they are bad at real-world complexity (which is why self-driving efforts continue to struggle even when exposed to a narrow subset of the real world).


I think you're right about explainability and unexpected handling of corner cases, but I think one of the lessons from GOFAI is that handcrafted algorithms might look good in a lab but rarely handle real-world complexity well at all. Folks worked for decades to try to build systems that did even a tiny fraction of what ChatGPT or SD do, and basically all failed.

For safety stuff, justice-related decision-making, etc., I think explainability is critical, but on the other hand, for something like "match a doodle to a controlled vocabulary of shapes" (and tons of other very-simple-for-humans-but-annoyingly-hard-for-computers problems), why not just use the tiny model?

Maybe if we get really good at making ML models, we can make models that invent comprehensible algorithms that solve complex problems and can be tweaked by hand. And maybe if we discover that a problem can be solved reasonably well by a very tiny model, that's a good indication that there is in fact a decent algorithm for it (and it's worth trying to find the human-comprehensible one).


exactly



