
> write up a more specific use case list with specific commands and responses

A "no swear words" rule for kids. The speaker needs to maintain a global counter, keyed by kid_id and word, that is incremented every time a swear word is heard. It would reset every week for rewards, penalties, etc.

The parents (as admins) could add or remove words from the swear-word list, and could configure how often the counter is reset.
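To make the counter idea concrete, here is a minimal sketch. It assumes the speaker already hands over a text transcript and a kid_id for each utterance (speaker identification is not shown), and the class and method names are made up for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta


class SwearCounter:
    """Per-kid, per-word counter that resets after a configurable period."""

    def __init__(self, words, period=timedelta(weeks=1), now=None):
        self.words = {w.lower() for w in words}
        self.period = period  # how often the counts are reset
        self.period_start = now or datetime.now()
        self.counts = defaultdict(lambda: defaultdict(int))

    # Admin (parent) operations on the word list.
    def add_word(self, word):
        self.words.add(word.lower())

    def remove_word(self, word):
        self.words.discard(word.lower())

    def _rotate_if_due(self, now):
        # Start a fresh period once the configured interval has elapsed.
        if now - self.period_start >= self.period:
            self.counts.clear()
            self.period_start = now

    def hear(self, kid_id, transcript, now=None):
        """Count listed words in one transcribed utterance; return the hits."""
        now = now or datetime.now()
        self._rotate_if_due(now)
        hits = 0
        for token in transcript.lower().split():
            token = token.strip(".,!?;:\"'")
            if token in self.words:
                self.counts[kid_id][token] += 1
                hits += 1
        return hits

    def total(self, kid_id):
        """Total hits for one kid in the current period."""
        return sum(self.counts[kid_id].values())
```

The `now` parameter is only there so the reset can be exercised without waiting a week; a real device would also need to persist the counts across restarts.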

A non-trivial example: I am an ESL speaker trying to teach my kid English, but I have little patience for the grammar and pronunciation. (My bad.)

I'd like to record a simple poem, e.g. "The Moon" by Robert Louis Stevenson, and have my kid listen to it without needing to stare at a screen. The aim is to recite it correctly. The programmable speaker would put stronger emphasis on any missed words the next time it plays the poem, keep a count of badly spoken words, and run several rounds to see whether my kid improves and eventually recites the full text correctly.
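A rough sketch of the comparison step, assuming a speech recognizer has already turned each recitation into text. It aligns the recited words against the reference with Python's `difflib` and tallies which reference words were skipped or replaced; the function names are illustrative, and judging pronunciation quality would need more than a transcript.

```python
import difflib
import re
from collections import Counter


def words(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z']+", text.lower())


def missed_words(reference, transcript):
    """Return the reference words that the recitation skipped or got wrong."""
    ref, heard = words(reference), words(transcript)
    matcher = difflib.SequenceMatcher(a=ref, b=heard, autojunk=False)
    missed = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        # "delete" = words skipped, "replace" = a different word was heard.
        if tag in ("delete", "replace"):
            missed.extend(ref[i1:i2])
    return missed


def run_rounds(reference, transcripts):
    """Tally misses per word across several recitation attempts."""
    tally = Counter()
    for transcript in transcripts:
        tally.update(missed_words(reference, transcript))
    return tally
```

Words with the highest tally are the ones to emphasize on the next playback, and a round with an empty `missed_words` result counts as a correct recitation.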

There are many family games that could be played around a programmable speaker, e.g. with the speaker as the dungeon master, the storyteller, the on-demand background music composer, etc.




Ok, here are some quick musings. I’m open to talking about different approaches to any of this, or about additional features I didn’t cover here. I’m hearing two strong use cases here.

1. “Family database” - you tell it facts and it tries to remember and reproduce them later when asked. Like storing your family tree somehow and asking about it. This has a lot of nuance but is likely possible in some way with current technology.

2. Teaching assist. You give it a task, such as an English text, and it helps train and evaluate your kid.

For (1), I see three obvious approaches:

1. Manually enter the data with a computer, but use natural language queries (some kind of Intent model) to access it

2. Have a strict set of data that you can enter (such as family tree, grocery lists, voice memos), and allow slightly strict natural language-ish statements to add data, and natural language queries to access it.

3. Convince a general understanding model like GPT-2 to learn your facts, and just pipe your questions directly into it. This is the coolest answer but would likely also be wrong more often.
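As a rough illustration of approach 2, here is a sketch that accepts one rigid statement form ("X's relation is Y") and one question form ("Who is X's relation?"). The templates and the `FamilyFacts` class are hypothetical, and it assumes the speech recognizer emits plain ASCII apostrophes; a real system would need many more patterns.

```python
import re

# Hypothetical templates; anything that fits neither one is rejected.
STATEMENT = re.compile(
    r"^(?P<subject>\w+)'s (?P<relation>\w+) is (?P<value>[\w ]+?)\.?$", re.I
)
QUESTION = re.compile(
    r"^who is (?P<subject>\w+)'s (?P<relation>\w+)\??$", re.I
)


class FamilyFacts:
    """Stores (subject, relation) -> value facts from templated utterances."""

    def __init__(self):
        self.facts = {}

    def handle(self, utterance):
        text = utterance.strip()
        # Try the question template first, then the statement template.
        m = QUESTION.match(text)
        if m:
            key = (m["subject"].lower(), m["relation"].lower())
            return self.facts.get(key, "I don't know.")
        m = STATEMENT.match(text)
        if m:
            key = (m["subject"].lower(), m["relation"].lower())
            self.facts[key] = m["value"]
            return "OK."
        return "Sorry, I didn't understand that."
```

For example, after `handle("Anna's mother is Maria.")`, the call `handle("Who is Anna's mother?")` returns `"Maria"`. The strictness is the point: whatever falls outside the templates gets a clear "didn't understand" rather than a guess.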

I think use case (2), the teaching assist, is an easier task, and I would likely use an entirely different approach for it than for (1).

(2) is easiest when the system already knows the text of the poem. If you’re making up a new poem, or reciting one it can’t look up somehow, it will be harder to match two voices against the text of the poem.

One caveat to all of this is I’ve heard speech recognition doesn’t work as well on children because they aren’t well represented in the training data.



