Shape Catcher: Find Unicode characters by drawing them

cedex12 · on May 7, 2017

Similar tool for mathematical symbols: http://detexify.kirelabs.org/classify.html

brudgers · on May 7, 2017

I could not get it to identify a British Pound symbol after several attempts. The top proposed glyph was much more obscure and the following ones were increasingly obscure from there.

I suspect that the training corpus may have been a table of Unicode glyphs rather than text from the wild.

Psyvire · on May 7, 2017

I think it might be that it's using a font that styles it differently from what you're drawing. I got it to recognise it even with my rather crude drawing after I changed the style: https://imgur.com/a/e9B5p

throwanem · on May 7, 2017

This is kind of a missing piece, in a lot of ways. With such a large character set as Unicode's, discovery can be a real pain - when you see a novel character, how do you find out what it's called, so you can find out how to type it?

Unless you're using something like Emacs which lets you point at a character and ask the editor to tell you everything it knows about what's there, this kind of identification becomes a daunting task to contemplate. Shapecatcher does an excellent job of it; as long as you can draw something roughly approximating the glyph you have in mind, it'll very effectively winnow down the search space to a very manageable list of possible matches.

anonymouz · on May 7, 2017

> Unless you're using something like Emacs which lets you point at a character and ask the editor to tell you everything it knows about what's there, this kind of identification becomes a daunting task to contemplate.

If you already have the character in some text file, it would be much easier and more reliable to copy and paste it into some Unicode table lookup tool (e.g. https://unicode-table.com/en/).
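
When the character can be copied, the same lookup can also be done locally. A minimal sketch using Python's standard `unicodedata` module; the helper name `describe` is made up for illustration and is not part of any tool mentioned in this thread:

```python
import unicodedata


def describe(ch: str) -> str:
    """Return the code point, official name, and general category of one character."""
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    # Some code points (e.g. control characters) have no name, hence the default.
    name = unicodedata.name(ch, "<unnamed>")
    return f"U+{ord(ch):04X} {name} ({unicodedata.category(ch)})"


# The pound sign discussed earlier in the thread.
print(describe("\u00a3"))  # U+00A3 POUND SIGN (Sc)
```

This only helps once you have the character as text; for a glyph seen on paper or in an image, drawing it is still the way in.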

ardacinar · on May 7, 2017

The search isn't really perfect. I tried drawing a (pretty good, IMO) Hiragana "no" and that result was in third place (first was a Latin small m; の looks nothing like an m). Then I tried a Greek small sigma (σ), but not perfectly (I draw my sigmas in a weird way, like this: http://imgur.com/a/XYVHO). The top result I got (Malayalam fraction one quarter: ൳) kind of looks like the thing I drew, but the rest of the results don't really resemble it and there's no sigma there.

darkengine · on May 7, 2017

In the box on the right, it says "Japanese, Korean and Chinese characters are currently not supported", which is disappointing, but probably why you're not seeing it in the results.

Silhouette · on May 7, 2017

Interesting idea. It seems to struggle a bit with some types of characters. For example, drawing a lowercase pi would return many characters with more than two legs, which showed up ahead of pi itself and other characters that do have the two. Does clicking on the good/bad feedback links in cases like this help to train the algorithm in some way?

tyingq · on May 7, 2017

Really well done, and handled my crappy drawings just fine.

I did see the link to your thesis on captcha, but a specific higher level blog post on how this works would likely be popular.

Edit: One piece of feedback...it's hard to draw dots. You have to drag the cursor with the button down, or drag your finger on mobile, to get a dot. So dots end up more like little lines. Also, an "Undo" to remove the last "cursor down / draw" event would be nice. Starting over for every line is the only current option.

hsivonen · on May 7, 2017

This is cool, though I was a bit disappointed to notice the part about no support for CJK characters after trying to draw one and not having it recognized. It seems to me that looking up Unihan ideographs is an area where a tool like this could be particularly useful.

chch · on May 7, 2017

I've been using Shape Catcher for a while for my non-CJK needs, but the best tool I've found for Unihan characters has been the one from LINE[1]. Specifically, unlike others I've used, it tends to do a good job with semi-cursive/calligraphic writing. That makes it faster to look up complicated characters (because you can scribble a bit), plus it's useful when you don't actually know exactly what the "original" form of a character is, and have to just draw it from sight.

e.g. if you see [2], and don't recognize it, but can draw it into the tool, you can get 而 correctly, without having to draw the very specific stroke order[3], like in other tools.

It's made for Chinese, but I often use it for Japanese, too. :)

[1] http://ce.linedict.com/dict.html#/cnen/home

[2] http://www.ryuurui.com/uploads/1/5/4/8/15489306/6738815.jpg?...

[3] https://upload.wikimedia.org/wikipedia/commons/2/2a/%E8%80%8...

bentley · on May 7, 2017

Kanjipad does a decent job whenever I try it, and it’s available in my package repos: http://fishsoup.net/software/kanjipad/

amake · on May 7, 2017

Handwriting tools for CJK characters are pretty much ubiquitous.

m-p-3 · on May 7, 2017

It reminds me of the special character finder in Google Docs, very well done.

justinclift · on May 7, 2017

I use this occasionally when trying to find a new glyph. There are some drawbacks though:

• Last updated in 2012: http://shapecatcher.com/news.html

• No way to draw straight lines except pixel-by-pixel (really tedious). This turned out to be a pain when trying to draw various arrow types (made of straight lines).

I'm hoping the author, Benjamin Milde, picks the project up again and keeps it updated, or makes it open source so that someone else can.

hashhar · on May 7, 2017

This is really great. Works perfectly and solves a very practical problem for me. Unicode really should do something about discoverability though.

yorwba · on May 7, 2017

Using Fcitx on Linux, any Unicode character can be typed using the hotkey for the Unicode addon [1] and typing the description. So if you ever need the "arabic ligature bismillah ar-rahman ar-raheem" (﷽), it's right at your fingertips!

[1] https://fcitx-im.org/wiki/Unicode
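
The same idea, finding a character from its description, can be sketched with Python's standard `unicodedata` module. This is not how the Fcitx addon is implemented; `search` is a hypothetical helper doing a plain substring scan over official character names:

```python
import unicodedata

# unicodedata.lookup maps an exact official name to its character.
ch = unicodedata.lookup("ARABIC LIGATURE BISMILLAH AR-RAHMAN AR-RAHEEM")
print(f"U+{ord(ch):04X}")


def search(fragment, limit=10):
    """Return up to `limit` characters whose official name contains `fragment`."""
    fragment = fragment.upper()
    hits = []
    for cp in range(0x110000):  # every Unicode code point
        c = chr(cp)
        # Unnamed code points yield "" and therefore never match.
        if fragment in unicodedata.name(c, ""):
            hits.append(c)
            if len(hits) == limit:
                break
    return hits


print(search("pound sign"))
```

The full scan is slow compared with a prebuilt index, but it is enough to show the principle.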

hashhar · on May 8, 2017

Wow. That is awesome. Thanks for leading me to it.

ygra · on May 7, 2017

Unicode governs standardization of the character set and related things like algorithms. That is, they are not involved in anything that relates to fonts, input methods, or other more user-facing things.

riffraff · on May 7, 2017

Out of curiosity, why do you usually need to discover Unicode symbols?

mdani · on May 7, 2017

I am learning to read a new language, Urdu. My goal is to read some old hand-written documents, so I can't copy-paste the characters. But I can easily draw them and confirm my reading. This is perfect for this purpose.

eriknstr · on May 7, 2017

Pretty neat. It would be useful to be able to restrict the blocks that are searched. For example, I might know that the character I'm looking for is Japanese, so if I could tell the tool that, it could restrict itself to Katakana, Katakana Phonetic Extensions, and any other blocks that apply to Japanese specifically.
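
A rough sketch of what such a filter could look like as a post-processing step over a list of candidate matches. The block ranges are the standard Unicode ones; the choice of which blocks count as "Japanese" (including Hiragana) and the names `JAPANESE_BLOCKS`, `block_of`, and `restrict` are assumptions for illustration, not anything Shapecatcher exposes:

```python
# Inclusive code point ranges of a few Unicode blocks. Which blocks to treat
# as "Japanese" is an assumption made for this sketch.
JAPANESE_BLOCKS = {
    "Hiragana": (0x3040, 0x309F),
    "Katakana": (0x30A0, 0x30FF),
    "Katakana Phonetic Extensions": (0x31F0, 0x31FF),
}


def block_of(ch, blocks=JAPANESE_BLOCKS):
    """Return the name of the block in `blocks` containing `ch`, or None."""
    cp = ord(ch)
    for name, (lo, hi) in blocks.items():
        if lo <= cp <= hi:
            return name
    return None


def restrict(candidates, blocks=JAPANESE_BLOCKS):
    """Keep only candidates that fall inside one of `blocks`, preserving order."""
    return [ch for ch in candidates if block_of(ch, blocks) is not None]


# A Latin "m" is dropped; the Hiragana and Katakana "no" are kept.
print(restrict(["m", "\u306e", "\u30ce"]))
```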

trhaynes · on May 7, 2017

Love this tool! (Btw title should be one word: "Shapecatcher")

nfriedly · on May 7, 2017

Android Wear does something like this for emojis - I've gotten pretty good at drawing a "thumbs up" to respond to text messages and such.

tigerBL00D · on May 7, 2017

Pretty cool! I wonder why the recognizer is not very good at differentiating among types of faces (sad face, happy face, etc.)

hughes · on May 7, 2017

𝗇ìϲе homoglyph search tool you've made :)

(it found all the letters of the word "nice" quite well!)
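
To see which look-alikes ended up in a string like that, one option is to print each character's code point and official name. A small sketch with Python's standard `unicodedata` module; `names` is a hypothetical helper:

```python
import unicodedata


def names(text):
    """Return (code point, official name) for every character in `text`."""
    return [(f"U+{ord(c):04X}", unicodedata.name(c, "<unnamed>")) for c in text]


# The look-alike word from the comment above, pasted as-is.
for code_point, name in names("𝗇ìϲе"):
    print(code_point, name)
```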

runnr_az · on May 7, 2017

That's really fun!

dbcurtis · on May 8, 2017

for sure! I did about three serious test cases then got distracted into making random squiggles just to discover new wacko unicode characters.

Now that Python supports unicode identifiers.......
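
For what it's worth, Python applies NFKC normalization to identifiers while parsing, so some look-alikes fold back to plain letters while others stay distinct. A small sketch of that behaviour (the variable names are arbitrary):

```python
import unicodedata

# U+1D5C7 is MATHEMATICAL SANS-SERIF SMALL N; NFKC folds it to a plain "n".
fancy_n = "\U0001D5C7"
print(unicodedata.normalize("NFKC", fancy_n) == "n")

namespace = {}
# The parser normalizes the identifier, so this assigns to plain "nice".
exec(f"{fancy_n}ice = 1", namespace)
print("nice" in namespace)

# Cyrillic U+0435 is a separate letter, not a compatibility form of Latin "e",
# so this creates a second, visually identical but distinct identifier.
exec("nic\u0435 = 2", namespace)
print(len([key for key in namespace if key.startswith("nic")]))
```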