Show HN: Nihongo – Study Japanese using authentic text from games, songs, etc. (nihongo-app.com)
139 points by chrisvasselli on Aug 20, 2015 | 101 comments



OP, this is wonderful. I'm a longtime Japanese learner and I wrote the iOS Anki app.

Just looking through the website (unfortunately I don't have an iOS device on hand), you've got some killer features here that I've wanted for a long time.

'clippings' looks really excellent. I would go so far as to say this is a game-changer for me as it currently stands. If you took the same idea and went bigger, that would be awesome: I'm thinking an e-book reader. That is, load up a Japanese ePub and then have one-click access to a context-aware J->E dictionary lookup via your app. Then being able to generate cards from the book would be perfect.

If I had to pick a missing feature (again just going from the description) it would be audiovisual context. This has been crucial to my learning - any plans to integrate it?

Finally, I don't know if you're aware but there is precedent for 'clippings' in the tools that generate Anki cards from movie subtitles. They create cards that have a still from the movie, the audio clip, and the subtitles in Japanese and English. Yours is more universal, but I think something that combined features of both would be even more powerful (i.e. audio and images).


I would LOVE to have integration with e-books, I think that could totally change the way I learn Japanese. Since over time the app learns what words you know and what words you don't, it could even suggest books for you that are around the right difficulty level. Wouldn't that be amazing?

Unfortunately, I haven't been able to find any place where you can purchase DRM-free e-books, so I think this would require working with Japanese book publishers. The one exception is the Harry Potter books, which are published DRM-free through their own website. I read the first two Harry Potter books using clippings to generate flashcards for each chapter as I went along, and it was incredibly helpful.

The idea of audiovisual context is interesting. Could you tell me more about what you have in mind? I've thought about adding text-to-speech options on the dictionary entries and in flashcards. But something more complete, like what Rosetta Stone does, seems a lot harder, just from the perspective of acquiring rights to the images.


Aozora Bunko (http://www.aozora.gr.jp/) is the only legitimate source I know of for ebooks (and they are free). Long term it would be really great to get some Japanese publishers to commit to ebooks, but I know that historically they have been strongly, publicly resistant.

The Wakaru app does allow ebook reading with a dictionary, but I always found it clunky. I don't think it allows easily generating cards from a book, the dictionary was less useful than yours looks, and it doesn't learn your vocab over time.

For audiovisual context, I'm really thinking of:

  - images to match vocab items (potentially auto-sourced from a Creative Commons licensed image search engine)
  - images from articles to match sentences from articles
  - sound clips to match text taken from audio captions
  - video clips OR audio clips + screenshots to match sentences from video subtitles

Regarding image rights, I would say that there is a huge amount of stuff in the public domain or liberally licensed (e.g. on Mediawiki). But you could also allow the user to add their own images, where they take responsibility for making sure they have the rights.


Yeah, unfortunately I find the content in Aozora Bunko not all that interesting. Or at least, I don't know how to find good content in it. I actually built a proof-of-concept of a "Books" tab in Nihongo using Aozora Bunko, but I found it sort of unsatisfying, so I never pursued it further. I haven't used Wakaru before, I'll have to check that out.

I'll explore the audiovisual ideas more, that's pretty interesting, thanks. I'd love to have user-generated content in the app, like the ability to share decks. Seems like that could be a good way to handle image rights as well.


I've got the same problem with finding decent content in Aozora Bunko. For me, the main issue is how to find content which matches my level (whether it is actually interesting is the other problem, and seemingly a harder one to solve).

I set up a really hacky "grader" algorithm and ran it on a bunch of the Aozora Bunko books, then put them online [1]. The very "easiest" books are kinda boring, but it is certainly easier finding usable material in that list than looking in Aozora Bunko itself. The one I've found most interesting so far is 銀河鉄道の夜 [2]

[1] http://www.readyourgrade.com/

[2] http://www.aozora.gr.jp/cards/000081/files/43737_19215.html


This is awesome. I'd love to have these kinds of features in Nihongo.


Ohh this is cool. Very nicely put together!


memrise.com handles media how Blahah described. User-submitted and it's the users' responsibility instead of Memrise's responsibility.

Mnemonics are essentially audiovisual cues to aid learning. Memrise is largely centered around them.

You might be able to borrow concepts, although this is making an assumption that you have not heard of Memrise.


I've heard of Memrise, but haven't really spent any time playing with it. In fact, I just checked my phone and it looks like I downloaded it at some point and never opened it. Adding it to my list to check out, thanks!


> The one exception is the Harry Potter books

Harry Potter is notorious among Japanese readers for having a horribly inaccurate translation. Example: http://islaya.sarashi.com/haripota01.html


Interesting, I hadn't heard that before. Still, for people learning Japanese, I tend to think it's a good choice. At my level, it's easy to get lost when reading things that I don't already know the general plot for. Harry Potter is pretty easy to follow and not too difficult from a vocabulary perspective.


> 'clippings' looks really excellent. I would go so far as to say this is a game-changer for me as it currently stands.

Midori, my personal favorite iOS J/E dictionary, has had a feature like this since I started using it, so for at least five years or so. =/ See the "Text Translation" section: http://www.midoriapp.com/features/


Yep, Midori's great, and I even took some inspiration from them for the Clippings UI. For me, the big difference is that Midori's version isn't integrated with flashcards.


So it does, and it looks nicely executed too. Wish I'd found it back when I had an iOS device.


I have also been at it for a while now. What I notice nowadays is that the same English->Japanese cards keep coming up quite often, because they have very similar cards/synonyms. Therefore, when adding new words, I take extra precaution to come up with a short 2-4 word sentence that is unique enough and captures the meaning quite well. Sometimes it might take me a few tries though, when I end up confusing it with another card anyway. This may also mean that sometimes I have to go and change a card/note I made a few years ago, because it 'conflicts' with something I want to add now.

So with regard to the clippings feature I feel that generating single-word flash cards might come back to bite you. It makes English->Japanese "production" difficult and annoying in the long run. On the other hand, it also makes Japanese->English unnecessarily complicated with all the meanings associated with a single word. A relatively simple example that comes to mind is 手入れ, of which I would rather have two short sentences describing each use case. Or with regards to the clippings feature, I want it to generate the card with the meaning I encountered, because the other meanings of that word might never actually be used by the Japanese.

I am not sure an app will (or even should) be able to generate such cards. Condensing the Japanese while getting the English meaning right is hard enough for a human. It certainly would be awesome, because at the moment I have to resort to generating the cards manually for the words/sentences I encounter while reading manga or books on my Kobo e-reader. History shows that (1) I am too lazy to generate a card for a word that I had to look up, and that I tend to just keep on reading (and forgetting the word again later on), or (2) that I actually stop reading for a few days/weeks because I feel I have to enter the backlog of cards in Anki before continuing.


Really good observations.

With regards to English->Japanese cards, I actually decided not to support them in the app. I used to study with both J->E and E->J flashcards, but I find that E->J gets you into the habit of thinking about one-to-one relationships between words in English and Japanese, when most of the time the relationship is not that simple. I found it more effective to study only J->E, and internalize the general meaning and "feeling" of the word. Then, as you start to encounter it in natural contexts, you'll internalize the subtleties to the meaning. When you go to produce Japanese, it's best not to start from English and translate to Japanese anyway, so I found the most important thing was to internalize the Japanese words.

The whole experience you mentioned where you either didn't create flashcards for words you encountered, or stopped reading to catch up on generating flashcards is exactly why I built Clippings. And you're right, they aren't going to be perfect, but I still think it's way better than doing it manually. 手入れ is a good example, here's how that would work in Nihongo:

When you create a clipping that includes 手入れ, a flashcard gets created with 手入れ on the front, and "repairs, maintenance, tending, trimming, grooming" on the back. That's the primary meaning in the dictionary. If you get to that flashcard, and that doesn't sound right, you can edit the flashcard to add to or replace that backside text with "crackdown, (police) raid". There's no way we can know which meaning was intended in the text, but we can at least make it easy to edit. And if you want, you can always add your own custom text to be displayed on the back of the flashcard. If I encounter a sentence that really clearly shows how the word is used, I'll often add that on the back in addition to the dictionary meaning.


Every feature in this app except the flash cards already exists in "imi wa", which has been available for years.


I think I have a fair number of new ideas in Nihongo that improve on imiwa. I used to use imiwa, and it's a good app, but I wouldn't have built Nihongo if I didn't think there were significant ways I could improve on it. I hope you'll check it out! =)


Hey HN, creator of Nihongo here!

I spent a long time studying Japanese vocabulary using Anki, Flashcards Deluxe, etc., but I found that studying using premade word lists never worked that well for me. Nihongo is built around studying the words you're encountering naturally through your own hobbies and interests. Flashcard packs are automatically generated from the words that you look up in the built-in dictionary. You can also copy and paste Japanese text into the app like song lyrics, textbook readings, videogame scripts, or books, and it will automatically find all the words contained in the text, and generate flashcards from them.

I built Nihongo because it was the tool that I wanted and felt like was missing for studying Japanese. Hopefully it'll be useful for some of you too! I'd love to hear your feedback.


I'm also the author of a Japanese learning tool: https://github.com/mikekchar/jldrill I have some advice for you if you decide to use spaced repetition in your flashcards (not sure if you do already as I haven't had time to look in detail).

JLDrill is mostly abandonware at the moment and is written in very poor Ruby ;-). I don't actually recommend that anyone use it. It was written for me. I think the pheonix branch will actually work on modern Ruby, but I was halfway through trying a different scheduling idea, so it probably has considerable bugs.

However, if you decide to do spaced repetition in your app, I would humbly suggest reading JLDrill's strategy on it: https://github.com/mikekchar/JLDrill/blob/master/web/src/Str...

Most tools (like Anki) use the SM2 algorithm, but I found that my strategy works considerably better (for me, anyway ;-) ) The latest version of JLDrill has an ability to "forget" items if you have neglected studying them for too long: https://github.com/mikekchar/JLDrill/blob/master/web/src/Usa...

Feel free to ping me if you have any questions.


Very interesting! Nihongo does use spaced repetition, and it looks like there are some similarities and some differences with JLDrill. Here's the overview:

- When you select a deck, you study words 10 at a time in "learn" mode.

-- In "learn" mode, you have to get each flashcard correct 3 times in a row, at which point it is "learned".

- Every day, you'll be given a set of cards you've learned to "review".

- These cards use a form of spaced repetition. In order to make spaced repetition more approachable, I don't use time intervals in the same way apps like Anki do. Instead, each card has a level, and as you progress through the levels, the interval at which they appear increases. Each level takes three steps to get through, each of which has its own interval.

-- Level 1: 1 day, 2 days, 3 days.

-- Level 2: 1 week, 2 weeks, 3 weeks

-- Level 3: 1 month, 2 months, 3 months

-- Level 4: 6 months, 9 months, 1 year

-- Level 5: Never show again

- If you mark a word wrong at Level 1, it drops down to Level 0, where it will keep showing up in that same session periodically until you've marked it correct enough to move it back to Level 1.

- Getting through your daily set of review cards is encouraged as being the most important thing, adding new cards is secondary.

- We don't want to discourage people from coming back, even if they got lazy and stopped studying for a month. So we never build up a huge pile of flashcards for people to review.
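The level scheme above can be sketched in a few lines of Python. This is only an illustration of the list as written, not Nihongo's actual code: the names `INTERVALS`, `advance`, `next_due`, and `days_to_retire` are hypothetical, months are rounded to 30 days and a year to 365, and wrong answers are left out because the list only says what happens at Level 1.

```python
from datetime import date, timedelta

# Days to wait at each step of each level, following the list above.
# Assumption: 1 month = 30 days, 1 year = 365 days.
INTERVALS = {
    1: [1, 2, 3],
    2: [7, 14, 21],
    3: [30, 60, 90],
    4: [180, 270, 365],
}
RETIRED_LEVEL = 5  # "Never show again"


def advance(level, step):
    """Return the (level, step) a card moves to after a correct answer."""
    if level >= RETIRED_LEVEL:
        return RETIRED_LEVEL, 0
    if step + 1 < len(INTERVALS[level]):
        return level, step + 1
    return level + 1, 0


def next_due(level, step, today):
    """Return the next review date, or None once the card is retired."""
    if level >= RETIRED_LEVEL:
        return None
    return today + timedelta(days=INTERVALS[level][step])


def days_to_retire():
    """Days from learning a card to Level 5 with no mistakes and no missed days."""
    return sum(sum(steps) for steps in INTERVALS.values())
```

Under these rounding assumptions, `days_to_retire()` adds up to 1043 days, a little under three years of flawless reviews.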

Give it a try if you have a chance, and if you have any thoughts/suggestions I'd love to hear them!


Hi. Unfortunately, I don't have an iOS device, so I can't try Nihongo. However, I'll give you some feedback on the spaced repetition stuff. Hopefully you will find it interesting.

IMHO your goal in creating a spaced repetition algorithm is to balance the cost of learning against the cost of forgetting. I generated a lot of data for myself of how I used JLDrill. What I found was that it takes me a little over a minute (on average) to memorize (or re-memorize) a word, but about 2 seconds to review it if I already remember it. Just to keep the math easy, let's assume that it takes 100 seconds of review to memorize something and 2 to review it if you already remember it.

What I found was that there isn't much point in reducing review frequency beyond every 2-3 months. Remember that while the forgetting curve has a gamma distribution, the section of the curve between 90-100% recall is nearly linear. So you can approximate the drop off linearly.

Assume you have an item which is scheduled to be reviewed after 1 year. Further assume that this corresponds to a 90% recall rate. This means that 10% of the time you will forget it. Since the average re-learning time is 100 seconds, that means that the average cost of forgetting is 10 seconds. If you were to review it every 6 months instead, you would review it twice in a year (an extra 2 seconds of review time), but the recall rate would increase to 95%. This means that the forgetting cost is only 5 seconds. So you have increased your review cost by 2 seconds, but decreased your forgetting cost by 5 seconds. If you were to double the frequency again (every 3 months), then you have increased the review cost by 4 seconds and decreased the forgetting cost by 7.5 seconds. The sweet spot seems to be about every 2 months according to the data for me.
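For anyone who wants to play with these numbers, here is a small Python sketch of the per-review quantities used above. It assumes the linear drop-off described earlier (0% forgotten at interval zero, 10% forgotten at 12 months); the function names are made up for illustration, and the constants are the rounded figures from this comment, not measured data.

```python
RELEARN_SECONDS = 100        # cost of re-memorizing a forgotten word
REVIEW_SECONDS = 2           # cost of reviewing a word you still remember
FORGET_PCT_AT_ONE_YEAR = 10  # i.e. 90% recall when reviewed once a year


def forget_pct(interval_months):
    """Percent chance of having forgotten the word, assuming a linear drop-off."""
    return FORGET_PCT_AT_ONE_YEAR * interval_months / 12


def forgetting_cost(interval_months):
    """Expected relearning seconds at each review."""
    return RELEARN_SECONDS * forget_pct(interval_months) / 100


def review_seconds_per_year(interval_months):
    """Seconds per year spent on the reviews themselves."""
    return REVIEW_SECONDS * 12 / interval_months


for months in (12, 6, 3):
    print(months, review_seconds_per_year(months), forgetting_cost(months))
```

Changing `RELEARN_SECONDS` is a quick way to see how sensitive the trade-off is to how long relearning really takes.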

So basically, I would say that your level 4 is counter productive and most of level 3 is at the edge of being useful.

One of the questions I struggled with was, "At what point do you prune vocabulary from the drill?" When do you know it "well enough"? Unfortunately, this is a difficult question because it depends on how often you use the language outside of study. Word frequency lists are illuminating for this question, though. I have not been able to find a good list of Japanese frequencies (the one in Edict is based on newspapers and therefore not good for general study). However, take a look at this English word frequency list: https://en.wiktionary.org/wiki/Wiktionary:Frequency_lists/PG...

The linked list shows you the most common 10K words. The first word has a frequency of about 5% (so read 100 words and you will see it 5 times on average). Even at only the 1000th most frequent word we're already down to 0.008%. So I would have to read about 12k words before I see it on average.

Consider that a 5 year old child has a vocabulary of 5k word families (not just words of vocabulary, but all permutations of that word -- so police and policeman belong to the same word family). The frequency of the 5000th word is 0.0013%, so I have to read close to 80k words before I see it on average.

Consider that in English a book has about 400 words per printed page (reference books have more, children's books have less, of course). This means that in order to be exposed to the 5000th most frequently used word once you would have to read about 200 pages of text (a short novel). Studies have also shown that you need 95% comprehension of a text in order to correctly infer the vocabulary and grammar that you don't know. So it means that you probably want to be exposed to that word 2 or 3 times a year. At 5000 words of vocabulary, you need to be reading 2-3 short novels per year in order to maintain your vocabulary.

This gives you some idea of the amount of natural language that you need to be exposed to if you want to "naturally" remember something indefinitely. A 5 year old child's proficiency in language is not terribly good (I invite you to test this out on friends and relatives ;-) ) If you want to be at a 10 year old's level you need 10k words (last word has a frequency of 0.0004% or every 250k words).
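The frequency arithmetic above is easy to reproduce. A minimal Python sketch, assuming the percentages and the 400 words per page quoted in this comment (the function names are only illustrative):

```python
WORDS_PER_PAGE = 400  # rough figure for an English printed page


def words_between_sightings(frequency_percent):
    """Average number of words you read per occurrence of the word."""
    return 100 / frequency_percent


def pages_between_sightings(frequency_percent):
    """The same distance expressed in printed pages."""
    return words_between_sightings(frequency_percent) / WORDS_PER_PAGE


for rank, pct in ((1000, 0.008), (5000, 0.0013), (10000, 0.0004)):
    print(rank, round(words_between_sightings(pct)), round(pages_between_sightings(pct)))
```

For the 5000th word this works out to roughly 77k words, or a bit under 200 pages, which is where the "short novel" estimate comes from.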

Anyway, the long and the short of it is that permanently expiring a word from study is potentially useful for beginners, but by the time you reach intermediate level, you almost certainly want to hang on to every word. Unless you get a lot of outside exposure to the language, you are unlikely to encounter these words in normal usage. In fact, I came to the conclusion that expiring words is essentially useless because you only want to do it for the most common 2k words and those will only correspond to about 4 extra hours of study per year. Allowing the user to manually expire a word is probably useful, though.

Finally, for words that you are just learning: There is some new-ish research (i.e., the last 20 years or so) that is very interesting.

There is a phenomenon known as the "spacing effect" (which is different from the confusingly similarly named "spaced repetition" and even more confusingly similarly named "spaced learning") -- to be clear, all three of these things are different ;-).

The spacing effect suggests that if you wait until you forget something and then relearn it, it will be easier to remember the second time. The interesting thing is that if you learn something, then wait until you forget, then learn it again, it turns out that you will remember it better than if you had remembered it consistently the whole time.

The spacing effect is essentially the opposite of spaced repetition. In spaced repetition we want to review something before the person forgets it. In spacing, you want to review just after the person forgets it, so that they have a chance to relearn it. The papers I have read on the subject indicate that the difference is very pronounced.

With "spaced learning" and the related "interleaving" approach, instead of spending a large chunk of time studying, you have much smaller sessions that are separated by time. The idea is to have enough time to forget something. With interleaving, you interleave study material of different subjects so that your mind is forced to think about something else (and in my understanding, forget the previous material as fast as possible). This reduces the time you need to wait until you can relearn, which increases the speed at which you can learn the material.

So having said all that, my suggestion is to space your level 1 items much further apart. A review after 1-3 days is good for the first review and then you want to wait for another few days. The goal is to allow the user to forget and relearn their material once or twice before it is on the "long term" track of review (hopefully without forgetting).

Hope you found the above interesting. There is lots of research on the topic at the moment so if you enjoy reading psychology papers, there are many to choose from ;-)


Thanks, you've definitely given me a lot to chew on.

I definitely struggled with a lot of the same questions. I thought a lot about the question of "when do you know a word well enough, and prune it?" and I came to generally the same conclusion, that there's not a lot of benefit from completely removing a word. The cost of asking someone to review a word once every 3-12 months is so low, that it's best to just keep asking them.

And for all practical purposes, that's what Nihongo does. In order to naturally bring a word to level 5 (the level at which we no longer prompt for it), it would take almost 3 years. And that's assuming that you didn't ever get it wrong, and never missed a day where it came up. I'm definitely the only person who's reached this (starting from when I started development on Nihongo), and I think there's a very good chance I'm the only person who ever will.

So practically, the question is what to do when something reaches the upper reaches of level 3 and level 4. Do you keep showing these words to them once every 2-3 months? Or do you continue to space them further apart? I'm not sure I'm convinced either way, but it's something I'll definitely continue to think about and re-evaluate. I have a little bit of skepticism around your numbers - for example, I don't think it takes 100 seconds to re-learn a word that you knew very well at one point and just forgot how to pronounce - but I think the point is valid.

As for spaced learning, it's an interesting idea that I haven't heard about before. The key thing I worry isn't being considered though is motivation. Unless a user is familiar with the theory behind it, I think forgetting and re-learning each word would be terribly demotivating. I'm definitely going to be doing some more research though.

Thanks for pointing me at all this, I've got lots of reading to go do!


By the way, if you have any specific papers in mind that you think would be good reading, please let me know!


Hi Chris, looks like an awesome start. Some quick suggestions and questions:

1. You should fully support romaji entry for looking things up. The mechanics of switching input methods on iOS are pretty terrible so it is really inconvenient to have to switch into Japanese input mode. I searched for "hana" but got only 花合わせ and 花を引く as matches -- because those have 'hana' in the English definition string. Searching for はな brought the two obvious matches (花 and 鼻) to the top, but I had to go back and cycle through several other input methods to get to Japanese input mode, which was annoying and took longer than the rest of looking up the word. Some other dictionaries do this, and even sites like Amazon Japan now do this for searches as well -- dealing with input methods is just a hassle in general.

2. The understanding of verb conjugation seems really awesome and jumping between things on a "learning tangent" works really well.

3. You have to sync vocab lists and spaced repetition drill status/progress to the cloud. It's just too useful not to do it -- you want to be able to do that in quick downtime moments from any device. Luckily there are many easy ways to do it nowadays. I don't blame you for shipping without that, though! ;-)

All in all, great start!


Thanks for the feedback!

1. I actually just finished implementing romaji entry, and it will be coming in the next version update. Honestly, after implementing it, I can't believe how much time I've wasted switching input methods. =)

2. Thanks!

3. I know, I know! I really want to, but I spent a couple weeks diving into it and found it really buggy and hard to get right. I'm using Core Data under the hood to store this stuff. Any suggestions for tools I should look at to help with syncing?


For syncing, look at Couchbase Mobile, which has recently added support for Core Data: http://blog.couchbase.com/syncing-with-core-data


Looks promising, thanks. I initially tried with iCloud syncing, and it was a mess.


> 1. I actually just finished implementing romaji entry

Great! I didn't see a need to put it this bluntly in my initial comment, but having to manually switch input methods is enough of a deal-breaker for me that I was going to uninstall the app... but now I will leave it installed and wait for that update. ;-)


I like the approach to input taken by the kanji website I am using [0]. It accepts input in romaji and transliterates it into hiragana as you type.

[0] https://www.wanikani.com/


That is good when the input is known to be Japanese, but in this case the user might be inputting Japanese using the English keyboard, or might be inputting an English word to look it up in Japanese.
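One way to handle that ambiguity is to search for both the text as typed and its kana reading whenever the text happens to be valid romaji. The Python sketch below is only an illustration under that assumption; it is not how Nihongo or WaniKani do it. The table is deliberately tiny (no voiced sounds, small っ, or long vowels), and `to_hiragana` and `search_terms` are hypothetical names.

```python
# A deliberately small romaji table: enough for the examples, not a full mapping.
ROMAJI_TO_HIRAGANA = {
    "a": "あ", "i": "い", "u": "う", "e": "え", "o": "お",
    "ka": "か", "ki": "き", "ku": "く", "ke": "け", "ko": "こ",
    "sa": "さ", "shi": "し", "su": "す", "se": "せ", "so": "そ",
    "ta": "た", "chi": "ち", "tsu": "つ", "te": "て", "to": "と",
    "na": "な", "ni": "に", "nu": "ぬ", "ne": "ね", "no": "の",
    "ha": "は", "hi": "ひ", "fu": "ふ", "he": "へ", "ho": "ほ",
    "ma": "ま", "mi": "み", "mu": "む", "me": "め", "mo": "も",
    "ya": "や", "yu": "ゆ", "yo": "よ",
    "ra": "ら", "ri": "り", "ru": "る", "re": "れ", "ro": "ろ",
    "wa": "わ", "wo": "を", "n": "ん",
}


def to_hiragana(text):
    """Convert romaji to hiragana by greedy longest match; None if it does not fit."""
    text = text.lower()
    out = []
    i = 0
    while i < len(text):
        for size in (3, 2, 1):
            chunk = text[i:i + size]
            kana = ROMAJI_TO_HIRAGANA.get(chunk)
            if kana:
                out.append(kana)
                i += len(chunk)
                break
        else:
            return None  # some part of the text is not romaji under this table
    return "".join(out)


def search_terms(query):
    """Return the queries to try: the raw text, plus a kana reading if one exists."""
    terms = [query]
    kana = to_hiragana(query)
    if kana:
        terms.append(kana)
    return terms
```

With this, `search_terms("hana")` yields both "hana" and "はな", so the English-definition matches and the 花/鼻 matches can be merged and ranked, while `search_terms("flower")` stays English-only.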


I would love to use this app! I have been doing something to similar effect for Anki, except using web scraping/APIs of various Japanese dictionary sites with Python scripts. The whole thing is clunky and buggy and requires a fair amount of manual work (which I always intended to automate, but have not had the time), so I would love to use an app that does this all well.

Unfortunately being iOS only locks me out. I noticed at the bottom of the page it states it was built from the ground up for iOS. This tells me it could take a while (maybe never) to get a port to another OS.


Yeah, unfortunately it's all built very iOS-specific. And since this is a side-project for me, I'm not sure I'll find the time to make an Android version anytime soon. If I ever have the time/resources, it's something I'd love to do!


Native speaker of both Japanese and English here. This looks quite nice and I've forwarded it to my friends who are learning Japanese. :)

I was initially really skeptical when I saw the tagline about "learning from songs and games", because songs are grammatically broken and often nonsensical, while games/anime/manga often have brash speech that you stop using once you're past junior high. But being able to learn off of your interests in a digitally integrated way is quite awesome (I remember meeting a South Korean guy in college whose Japanese was quite good just from learning from anime/manga).

Commonality is useful as well, since the language has changed dramatically in the last 100 years, particularly pre and post war.

Kudos! :)


Yeah, it's hard to sum up the idea in 80 characters, but I like that idea of learning off of your interests.

Thanks!


I work in Tokyo and speak okay Japanese, but sort of hit a plateau after college (partly due to knowing 'enough for daily life' and partly due to having to work and no longer having to go to Japanese class).

So my study lists consist 100% of words and phrases I hear and don't understand when I am out at the Japanese semi-mandatory drinking-with-coworkers. I currently use Japanese.app, but am open to switching since it doesn't sync vocab lists hint hint :-).

So I study off my actual experiences here in Japan. I too was instinctively skeptical of the "games, songs" bit. (Although, I notice that text is not actually on the website.) I think "learning off your interests" is better, too.


iOS only :<

No planned Android support anywhere? No signup for an Android news subscription system, or RSS if you don't want to set up a mail system?

Sadface.


I'd love an email notification for Android support as well!


Tangential shameless self-promotion: I built a tool for Chinese learners that can take starred entries from the Hanping Dictionary app (for Android) and import them into Anki. It's available at https://github.com/kerrickstaley/hsk_flashcards. File a bug if you have trouble running/using it. Obviously not as seamless as Nihongo :)


Before I built Nihongo, I used to use Imiwa as my Japanese dictionary, and Flashcards Deluxe for spaced repetition studying. I built this ridiculously convoluted setup where I would upload a list of words from Imiwa to Dropbox, my laptop at home would detect a new file in my Dropbox and run a script to convert the format to the one supported by Flashcards Deluxe, and then I'd open up Flashcards Deluxe and download the deck from Dropbox. I told someone about this, and they said "man, wouldn't it be awesome if there were a Japanese app that could just do that automatically?" And thus, Nihongo was born. =)

So KerrickStaley, wouldn't it be awesome if there were a Chinese app that could just do that automatically? =p


You're giving me ideas :P

The problem is that I don't have time to build a monolithic app that serves as both a good Chinese dictionary and a good flashcard app; it's much easier to cobble together a system using existing software.


Nice app, will definitely play with it soon!

By auto-generating flashcards, what are you doing about showing accurate definitions/meanings for the cards? I couldn't tell from your page alone - are you just pulling down the list of senses from JMDict or something? That's always been my hesitation with auto-generating cards, the "meaning" side of the card tends not to be helpful.


By default, the backside of the card has the first "sense" from JMDict on it. This sense is supposed to represent the primary meaning of the word. You can also add/replace that with the other senses by editing the flashcard, or you can add your own notes to the back.
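As a rough illustration of that default, here is a Python sketch of a card that shows only the first sense until the user edits it. The `Flashcard` class and its fields are hypothetical, not Nihongo's data model, and the glosses are the 手入れ ones quoted earlier in the thread rather than a real JMDict lookup.

```python
from dataclasses import dataclass, field


@dataclass
class Flashcard:
    front: str
    senses: list  # all dictionary senses, in dictionary order
    shown: list = field(default_factory=lambda: [0])  # indexes of senses on the back
    note: str = ""  # optional text written by the user

    def back(self):
        """Build the back of the card from the selected senses plus any note."""
        lines = [self.senses[i] for i in self.shown]
        if self.note:
            lines.append(self.note)
        return "\n".join(lines)


card = Flashcard(
    front="手入れ",
    senses=[
        "repairs, maintenance, tending, trimming, grooming",
        "crackdown, (police) raid",
    ],
)
print(card.back())  # only the first sense by default

card.shown = [1]    # replace it with the second sense
print(card.back())
```

Keeping every sense on the card and storing only which ones are visible means an edit never throws dictionary data away.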

I've found using the first sense works pretty well. Having the whole definition on the back is way too much for your brain to process, but the first sense is usually only a few words that encapsulate the meaning of the word pretty well.


I've been studying the language for a while, and recently made the switch to Japanese-Japanese dictionaries, after using EDICT for a long time.

This has highlighted some reservations I have about it.

The most available example (not the best) is 適当, which can be used interchangeably to mean "adequate" and "half-assed", sort of sarcastically. The definition is a mostly undifferentiated bag of words, without necessarily any regard for nuance or typical use cases.

Contrast this with the goo dictionary, which has a slightly better structure (http://dictionary.goo.ne.jp/leaf/jn2/151064/m0u/%E9%81%A9%E5...) and the 類語 dictionary, that gives synonyms and situations where you would use one over another (http://dictionary.goo.ne.jp/leaf/thsrs/2512/m0u/).

I understand that there's probably no way to deal with this in a scalable way that would be as easy to turn into flash cards, but it's kinda sad to see the gap between the two solutions.

Apologies for not having a more constructive suggestion.


I've been relying on Japanese-Japanese dictionaries for a while, and they work best IMHO once you're past a certain level. Also, most good non-JP-JP resources are in English, and, after all, English is not my native language either, so I might as well deal with everything in Japanese. I just wish I could find an app that does everything in Japanese, but I never found one. Tried a few, but there are so many that it's hard to find something decent. The result is that I haven't actually actively studied Japanese for a while, and rely on "passive" learning from talking, reading and watching TV (I live in Japan, that helps). I'm actually sufficiently annoyed that I'm not retaining as much as I would like that I'm considering writing a web app to handle my own learning.

Edit: by the way, I once stumbled upon an online dictionary that also showed the "standard" intonation for words (something that I've very rarely seen mentioned), but I can't find it again :(


Yeah, it's definitely a weakness in Nihongo. I'd love to have a really well-integrated Japanese-Japanese dictionary, but I've never been able to find a J-J open-source database like JMDict/EDict. My solution for now is that you can tap on the menu button in the top-right of any entry, and open it in the built-in iPhone dictionary, which includes a J-J dictionary. But that's not a very satisfying answer.


>You can also copy and paste Japanese text into the app like song lyrics, textbook readings, videogame scripts, or books, and it will automatically find all the words contained in the text, and generate flashcards from them.

How would 「気に入らない」 be parsed for flashcards?

An entry for 気 with its most common meaning(s) and an entry for 入る with its most common meaning(s)?


In this case, there's actually an entry in the dictionary for 「気に入る」, so it will figure out that this is a conjugation of that entry, and use 「気に入る」and its most common meaning as the flashcard.

Many of these kinds of phrases exist in the dictionary (I'm using JMDict under the hood), but for those that don't, it would create two flashcards, one for each word. The flashcards are set by default to use the most common meaning, but if there are alternative meanings, you can choose to show those in addition, or instead of the most common meaning.
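
A minimal sketch of the lookup described above, not Nihongo's actual code. It assumes a toy dictionary standing in for JMDict and a hypothetical two-entry deconjugation table; the names `DICTIONARY`, `RULES`, and `extract_headwords` are made up for illustration. At each position it tries the longest span first, so a phrase entry wins over its component words when one exists.

```python
# Toy stand-in for JMDict: headword -> most common meaning.
DICTIONARY = {
    "気に入る": "to be pleased with; to like",
    "気": "spirit; mind; feeling",
    "入る": "to enter",
}

# Hypothetical deconjugation rules: (conjugated ending, dictionary-form ending).
RULES = [
    ("らない", "る"),  # plain negative: 入らない -> 入る
    ("ります", "る"),  # polite: 入ります -> 入る
]


def candidates(surface):
    """Yield the surface form itself, then any dictionary forms the rules produce."""
    yield surface
    for ending, base in RULES:
        if surface.endswith(ending):
            yield surface[: -len(ending)] + base


def lookup(surface, dictionary):
    """Return the matching headword for a surface form, or None."""
    for form in candidates(surface):
        if form in dictionary:
            return form
    return None


def extract_headwords(text, dictionary=DICTIONARY):
    """Greedy longest match: at each position, prefer the longest span with an entry."""
    found, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):
            headword = lookup(text[i:j], dictionary)
            if headword:
                found.append(headword)
                i = j
                break
        else:
            i += 1  # nothing in the dictionary starts here (e.g. a particle)
    return found
```

With the phrase entry present, 「気に入らない」 yields one flashcard headword, 気に入る. With that entry removed from the dictionary, the same input falls back to two headwords, 気 and 入る.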


* internet high five *

Awesome.

I couldn't think of how to phrase the problem. "Short phrases with meanings that don't match the individual words used being parsed as individual words instead of the short phrase" is kind of wordy and a little confusing.


> "Short phrases with meanings that don't match the individual words used being parsed as individual words instead of the short phrase" is kind of wordy and a little confusing.

These are called "idioms", or, if you want to be really technical for its own sake, "non-compositional phrases". But I note that, at least according to wikipedia, "idiom" is the technical term as well as the vernacular one.

https://en.wikipedia.org/wiki/Phraseme#Non-compositional_phr...


You know those times where you need to describe something but are forgetting the word so you call a "bridge" "an elevated structure built to allow crossing over bodies of water and roads"?

That was one of those times... though I appreciate you calling me out on it. :)


Heheh, got it! Yeah, fortunately JMDict is a really mature project, and has got a ton of these kinds of things covered.


Which morphological analyzer are you using to do the deconjugation?


I actually ended up rolling my own. There are some tricky cases involved, and I wanted full control over it.


Actually, I should give some proper credit here. My husband Justin wrote this component, which turned out awesome. He's really into Japanese linguistics. =)


Would you have specific examples of things that wouldn't have been possible with e.g. mecab or chasen?


To be honest, I haven't spent a huge amount of time exploring mecab and chasen, although I probably should. The decision to write our own was partially just because it seemed like a fun academic exercise, and we learned a lot from building it. We also have some features in mind for the future that will need deep control over the tokenizing and deconjugation, but those haven't made it into the app yet.


>I built Nihongo because it was the tool that I wanted and felt like was missing for studying Japanese. Hopefully it'll be useful for some of you too! I'd love to hear your feedback.

This alone made me download the app even though I haven't seriously studied Japanese since college. Have you written anywhere about your learning habits / methods?


I haven't, but I'd like to! I've been meaning to start a blog to discuss topics like that, but haven't found the time quite yet.


Really like the app, any plans to make an iPad/Universal version? Thanks!


Yep, I'm working on it right now! That's going to be the next big update. I'm aiming to release shortly after iOS 9 comes out, so September/October probably.


Sounds great! Is the code open source? Do you plan versions for Android and WinPhone? We would like the latter.


Sorry, it's all closed source right now. There are pieces of it I might release as open-source at some point though.

As for Android and Windows, I just don't have the time unfortunately. This is a solo side-project for me. But if I ever find the time/resources, I'd love to port it to other platforms.


@chrisvasselli:

When I was a child, my parents tried to teach me Japanese as my first language, but we live in Brazil and that didn't work out. As soon as I got into school, I dropped it and started speaking exclusively Portuguese. Still, I have some knowledge of "spoken Japanese", even though I don't know how to write kanji.

That's why I found this clippings feature really great. I can read the pronunciation over the kanji and deduce what a word means from context. I've always missed that when trying to study Japanese. And if I can't deduce its meaning, it's also easier to look up in the dictionary if I know its pronunciation (in the app it's probably just a click). Some people think that going through this trouble actually helps with memorizing the kanji, but for me it's just a time-consuming task.

Too bad I have no plans to buy an iPhone. No plans for an Android or a web version?

Also, how did you get these texts to feed the clippings? By hand? Scraping?


Nice, that's exactly what I'm going for! Part of my philosophy with the app was to make my studying of Japanese as efficient as it could possibly be. I tried to shave out all of the time spent doing tasks that don't contribute to my actual learning, like trying to draw kanji, count strokes, manually type out flashcards, etc.

As for the clippings, those are all user-supplied. For example, I love classic JRPGs, and FFVI is my favorite. If you google around a bit, you can find the script for the Japanese version of FFVI. Import the script into Nihongo, and you can pre-study the words as you play along. You can also filter the flashcards down to just the words that show up more than once, so if you're not ready to study everything, you can just focus on the important words. It also works great if you want to study song lyrics, and I even used it to read Harry Potter in Japanese (thanks J.K. Rowling for putting out DRM-free books!)
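
The "words that show up more than once" filter can be sketched in a few lines. This assumes the clipping has already been turned into a list of dictionary headwords; the function name and the `min_count` parameter are hypothetical, not the app's API.

```python
from collections import Counter


def repeated_words(headwords, min_count=2):
    """Keep only headwords seen at least min_count times, in first-seen order."""
    counts = Counter(headwords)
    return [word for word, n in counts.items() if n >= min_count]
```

For a script where 魔法 appears three times, 帝国 twice and 幻獣 once, only 魔法 and 帝国 would survive the default filter.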

Unfortunately, I don't have any plans for Android or the web right now, since this is a solo side-project for me, and I just don't have the time. But I'd love to do it in the future!


Looks great! Any timeline on Android?


Thanks! Unfortunately, I don't have any plans for Android right now. This is a solo side-project for me, so I just don't have the time. I'd love to at some point in the future though!


This


Second vote for Android version.


>Our dictionary is made to find what you're looking for fast, with the words Japanese speakers actually use at the top of every search result.

May I ask how this was factored? Also what if I want to use a certain dialect? Osaka might opt for a different word than Hokkaido. I assume standard Tokyo dialect?

>Word Commonality

Also interested in how you measure this. Frequency used in a newspaper? Is it part of the Jouyou kanji? How about age of speaker? "Common among younger generation" and "common among older generation" might be a better benchmark for commonality.

>We'll even add furigana.

Is it editable for instances when it is wrong?

I don't have an iPhone to test, but I am very interested in helping others learn Japanese and, more importantly, making sure they are learning correct information. I've seen few coded systems that weren't heavily curated provide accurate information. There's nearly always a "well...that's...actually wrong" somewhere in the dataset that things use.

Not saying that's the case here - just that it is a possibility.


Great questions!

We include words from all different dialects, but standard Tokyo dialect will inevitably float to the top. I learned Japanese in Kansai, so I'm sensitive to this issue. =)

As for how the commonality is determined, I use a combination of corpuses including newspapers, novels, literature/poetry, and spoken language. I actually initially had a feature that would tell you "this word is more common in novels", but it didn't work consistently well enough, so I ended up scrapping it.

When we actually choose how to order the search results, we take into account a few more things - like if the search string is part of the primary meaning of the word, if it's a partial match of the meaning string or a full match, etc. In order to determine the relative importances of the various factors, we actually automatically optimized them against a huge set of vocabulary lists that a Japanese native, English fluent speaker made for Japanese students studying English. This gives us the best chance of picking the Japanese word that a Japanese native speaker would choose for a given English word.
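
To make the idea concrete, here is a rough sketch of weighted ranking with weights tuned against a gold list. Everything in it is an assumption for illustration: the feature names, the invented `GOLD` pairs, and the brute-force grid search, which stands in for whatever optimizer is really used.

```python
from itertools import product

FEATURES = ["commonality", "primary_sense", "full_match"]  # hypothetical names

# Invented training pairs: (candidate entries for an English query, expected word).
GOLD = [
    ([("食べる", {"commonality": 0.9, "primary_sense": 1.0, "full_match": 1.0}),
      ("喰らう", {"commonality": 0.2, "primary_sense": 1.0, "full_match": 1.0})],
     "食べる"),
    ([("犬小屋", {"commonality": 0.4, "primary_sense": 1.0, "full_match": 0.0}),
      ("犬", {"commonality": 0.8, "primary_sense": 1.0, "full_match": 1.0})],
     "犬"),
]


def score(features, weights):
    """Weighted sum of a candidate's feature values."""
    return sum(weights[name] * value for name, value in features.items())


def rank(candidates, weights):
    """Sort candidates from highest to lowest score."""
    return sorted(candidates, key=lambda c: score(c[1], weights), reverse=True)


def top1_accuracy(gold, weights):
    """Fraction of queries whose expected word is ranked first."""
    hits = sum(1 for cands, expected in gold if rank(cands, weights)[0][0] == expected)
    return hits / len(gold)


def fit_weights(gold, names, grid=(0.0, 0.5, 1.0)):
    """Try every weight combination on a small grid and keep the most accurate."""
    best, best_acc = None, -1.0
    for values in product(grid, repeat=len(names)):
        weights = dict(zip(names, values))
        acc = top1_accuracy(gold, weights)
        if acc > best_acc:
            best, best_acc = weights, acc
    return best, best_acc
```

A grid search only scales to a handful of factors; with more features or a large vocabulary list, a proper learning-to-rank method would be the more realistic choice.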

As for the furigana, since we're taking arbitrary user input, there are bound to be mistakes here or there, and that's the case with this app too. I don't have any plans to let people edit the furigana right now, but I am improving the parsing all the time.


>we actually automatically optimized them against a huge set of vocabulary lists that a Japanese native, English fluent speaker made for Japanese students studying English. This gives us the best chance of picking the Japanese word that a Japanese native speaker would choose for a given English word.

That's actually a pretty awesome way to handle it compared to other methods I've seen used.

>As for the furigana, since we're taking arbitrary user input, there are bound to be mistakes here or there, and that's the case with this app too.

Of course, I'm understanding of that issue. Parsing arbitrary input will always have errors, Google isn't exempt from this problem either. :)

It's very, very difficult to get some of these things right (and sometimes just impossible given current technology/software).

My issue with it is that not all people who use the app will be technologically savvy and understand that the furigana could be wrong. For someone learning a new language, giving the wrong reading could be extremely detrimental for them.

I guess just voicing a gripe I have. I understand from a marketing point, every JP app does this when they automatically add furigana readings. I just wish there was some transparency that they aren't always accurate. :\


Yeah, it's a good gripe to bring up. I'd love to come up with some indication of "confidence" in the parsing. Or maybe when there are two possible parses, give the user the ability to see both somehow?


I think giving an option of two possible parsings would confuse people who are learning. They wouldn't know which is the right one to pick! So adding a second option just adds to the confusion...

I think showing a confidence level would be a good solution based on the number of available readings for the given context of the kanji.

生 would be a good example. When used by itself, the confidence that the reading is correct should be low (since there are many possible options!) but if it is used in 生まれる the confidence level is very high (because there is only the one possible reading). When used in 生す it could be 50/50 (since it could be read な or む).
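
As a rough sketch of that idea, confidence could simply be one over the number of candidate readings for the form as written. The `READINGS` table is a hand-made, non-exhaustive illustration, and the function name is hypothetical.

```python
# Hand-made, non-exhaustive table of candidate readings per surface form.
READINGS = {
    "生": ["なま", "せい", "き", "しょう"],
    "生まれる": ["うまれる"],
    "生す": ["なす", "むす"],
}


def reading_confidence(surface, readings=READINGS):
    """Naive confidence: 1 / number of candidate readings (0.0 if unknown)."""
    options = readings.get(surface, [])
    if not options:
        return 0.0
    return 1.0 / len(options)
```

This treats every reading as equally likely, which is why 生す comes out at exactly 50/50. It ignores how often each reading actually occurs.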

Explaining this to learners is a little tricky.


Yeah, 生 is a good example. Fortunately, there are corpuses out there with hand-parsed sentences, so we at least train on those to pick the word for 生 on its own that's most common (I just tried - it's なま). We could use this to get our confidence even better than 50-50 for 生す.
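
A sketch of how counts from a hand-parsed corpus could drive both the chosen reading and a confidence figure. The input format, a list of (surface, reading) pairs, and the function names are assumptions; real corpora would need their own loaders.

```python
from collections import Counter, defaultdict


def train(tagged_tokens):
    """Count how often each reading was chosen for each surface form."""
    counts = defaultdict(Counter)
    for surface, reading in tagged_tokens:
        counts[surface][reading] += 1
    return counts


def best_reading(counts, surface):
    """Return (most frequent reading, its share of all observations)."""
    seen = counts.get(surface)
    if not seen:
        return None, 0.0
    reading, n = seen.most_common(1)[0]
    return reading, n / sum(seen.values())
```

If the corpus tagged 生す as むす three times out of four, the share would be 0.75 rather than the flat 0.5 that counting readings alone gives.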

I can see how giving users more options could just bring more confusion. Such a tricky problem! I'll be thinking about this one.


Sweet! Looks like a nice addition to WaniKani. I'm using WaniKani for my general studies, but automatic card generation based on my word lookup is definitely neat.


Yeah, I think Nihongo makes a great complement to WaniKani! Nihongo doesn't have lessons, and isn't really meant to teach you Japanese in the same way WaniKani is. I like to think of Nihongo as a great companion tool for however you're learning Japanese already, whether that be from a class, from living in Japan, reading books, or studying using other tools.


Lately I've been thinking of learning Japanese just like I started learning English. After two years of basic English courses starting from third grade, I started to play videogames like Pokemon and watch movies with subtitles, and my vocabulary and proper expression greatly increased. So I should begin by studying elementary Japanese for a while, and then start playing Pokemon in Japanese and study as I see fit.

Good job OP! An Android version would be great!


Is there anything like this in different languages? I think you could make a great language-learning platform with this idea.


Not that I know of! I'd love to expand to other languages at some point.

One of the tricks is that I think it's important for there to be a best-in-class dictionary attached to it. If users feel like they'd rather use a different dictionary, then you don't get their dictionary history as flashcards, and overall you create a crappy user experience for them if they have to switch back and forth all the time.

Best-in-class dictionaries require someone deeply familiar with the language, and specifically the pitfalls of learning the language, to make. So I think to expand to other languages, I'd want to get passionate learners of the other languages involved as well.

Definitely a long-term goal though.


I'd be interested in helping out, if you decide to take this further. I'm a passionate learner of French and Arabic but I am not deeply familiar with either language yet. I do know some people who are native speakers though.


> Have you ever tried to use a word you looked up in the dictionary, only to find out that no one has ever heard of it?

This is so great. EDICT has so many entries that Japanese speakers have either never heard of, or would never use, or even where the definition is archaic or just plain wrong.

Maybe someday, this app will be available for Android :-)


Unfortunately doesn't run on iOS 7 or below. Any chance you could support it? I'd love to use this app.


The first couple versions of the app supported iOS 7, so when you try to download it, you should get a prompt to download the earlier version. Unfortunately, I don't have any plans to support iOS 7 going forward. =/


The App Store doesn't present me with that option. I believe you need to have an older version already installed to be able to redownload it.


Ah, sorry, I didn't realize that! Unfortunately I make use of some of the features in iOS 8 in a way that would make it difficult to add support for iOS 7. If you ever get a new device compatible with iOS 8 I hope you'll check it out!


Android version please!


I have android :(


I need this, but for Mandarin


And now for Android please! Always looking for a good Japanese learning app.


Could you shed some light on your algorithms (how you declare something as Common, Uncommon) and data sources (JMDict, Tatoeba)?

Disclaimer: I am developing a free Japanese dictionary for Android


Does it leverage text-to-speech for pronunciation purposes?


Not yet! But it's on my short list of features to add! Good to know there's interest.


Looks lovely.

Seems like even a starter-level student could benefit from this, with the flashcards and word commonality features.


Just curious, how do you mark a word as common? I often used jisho.org or classic.jisho.org (from WWWJDIC) and from what I remember, a lot of those words marked as "common" were most definitely not.


Yeah, I found the commonalities in WWWJDic/JMDict to be pretty problematic, so I came up with my own. I use a combination of corpuses including newspapers, novels, literature/poetry, and spoken language. Some of these have been hand-parsed by humans, and others I parse using the same parsing as clippings. I think the result is pretty good!
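
One simple way to combine several corpora, shown here only as a sketch: convert each corpus to relative frequencies, average them so a huge newspaper corpus doesn't drown out a small spoken one, then bucket the result. The thresholds and function names are invented, and equal weighting of corpora is an assumption.

```python
from collections import Counter


def relative_freq(tokens):
    """Map each word to its share of all tokens in one corpus."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def combined_commonality(corpora):
    """Average each word's relative frequency across corpora, weighting corpora equally."""
    freqs = [relative_freq(tokens) for tokens in corpora.values()]
    words = set().union(*freqs)
    return {w: sum(f.get(w, 0.0) for f in freqs) / len(freqs) for w in words}


def label(value, common=0.2, uncommon=0.05):
    """Bucket a combined score; the thresholds here are made up."""
    if value >= common:
        return "common"
    if value >= uncommon:
        return "uncommon"
    return "rare"
```

A word missing from one corpus simply contributes zero for that corpus, so a word that shows up in none of them has no score at all and would need separate handling.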


Commonality is a tricky thing. There are words that most native speakers won't actually know, words they know but rarely use, words that used to be trendy but aren't anymore, etc. It's hard to tell from "common", "uncommon", and "rare" which bucket a word or expression would fall into, while "native speakers won't actually know" is a very important distinction to make. Moreover, it's not clear which of "uncommon" and "rare" ranks above the other (both being below "common"), since the two words are near-synonyms. To give an example, since I gave your app a try (best I've found so far, by the way, but unfortunately it doesn't match my own needs): 自問自答 is marked rare. 出勤 is marked uncommon. Now, looking at this and some other words, my guess is that uncommon is above rare. Fine. 自問自答 might be rare, but there's no Japanese person I know (excluding small kids) who wouldn't understand it. (And, in fact, it stuck in my mind because I keep hearing it.) So in your classification, there is no room to tell apart those words that most Japanese people won't know.


Thanks, it's useful to hear that the difference between "uncommon" and "rare" was unclear to you. I'll think about how to make that better.

I definitely would love to have a distinction in the app for "this is a word that any educated Japanese person would know". I haven't found the dataset yet that can let me build that, unfortunately. I checked, and in the case of 自問自答, it actually doesn't appear even once in any of the corpuses I'm using. So it seems like we need some other source of data.


The data used for commonality in JMDict is a little dated (1998) and biased (exclusively based on newspapers, which tend to have specific vocabulary), which I guess is why you mentioned they were problematic. However, there are more recent data sets available on the Monash FTP archive (http://ftp.monash.edu.au/pub/nihongo/). For example, there's one dataset from 2008 using blog entries from goo.ne.jp, and another with novels. They could be good additions to your corpus (fwiw, for 自問自答 there are 18038 occurrences indicated in the dataset for goo.ne.jp and 68 in the dataset from novels). Certainly, doing some similar work with current data from the net would be useful too. I wish there was some regular scraping done, so that we could always use fresh data. Hell, I wish Google, Bing or any other search engine would just give out such word frequencies from their spider bots' data (and not just for Japanese).


Hmm, I actually use that novels corpus. Sounds like you may have found a bug in Nihongo. I'll look into it. Thanks!


'Clippings' looks really handy. Thanks!



