Hacker News
Fakelish – Fake English word generator (nwtgck.org)
110 points by lioeters on Dec 5, 2021 | 80 comments



Nice idea, but the naive implementation makes the output unconvincing as hypothetical English words. I had a brief look, and it seems to be proportionally selecting and sticking together sequences of letters sampled from English words (lib/word-probability.ts). This doesn't take into account syllable boundaries, the way the English spelling system maps to phones/phonemes, or the phonotactic properties of English, which is why the output looks unconvincing.

A better approach would be to use a Markov chain built from sampling English text letter-by-letter... an even better approach would be to build your stats from some source of English words in IPA transcription with syllable boundaries etc. marked, then map from IPA to spelling via some kind of lookup table. We use a similar process in reverse in my research group for building datasets for doing Bayesian phylogenies of language families.


Clearly you are far more of a linguist than I am, but from such a perspective, I had a similar impression; I reloaded the page several times and none of the words struck me as being remotely plausibly English. These are worse than most Hollywood scifi words/names.


A significant improvement on letter-by-letter, but not that much harder, is to use n-grams: "two letters to predict the third" etc. Still not "industry grade", but the results start making more sense.
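
To make the "two letters to predict the third" idea concrete, here is a minimal sketch of a character-level n-gram Markov chain. It is not the Fakelish implementation; the function names (`build_model`, `generate`), the `^`/`$` padding markers, and the tiny corpus are all assumptions made for illustration. Setting `order=1` gives the plain letter-by-letter chain.

```python
import random
from collections import defaultdict

def build_model(words, order=2):
    """Map each `order`-letter context to the list of letters seen after it."""
    model = defaultdict(list)
    for word in words:
        # "^" pads the start of the word, "$" marks its end (both hypothetical markers).
        padded = "^" * order + word.lower() + "$"
        for i in range(len(padded) - order):
            model[padded[i:i + order]].append(padded[i + order])
    return model

def generate(model, order=2, max_len=12, rng=random):
    """Walk the chain from the start context until the end marker or max_len."""
    context, out = "^" * order, []
    while len(out) < max_len:
        nxt = rng.choice(model[context])  # duplicates in the list act as frequencies
        if nxt == "$":
            break
        out.append(nxt)
        context = (context + nxt)[-order:]
    return "".join(out)

# Tiny hypothetical corpus; a real run would use a full English word list.
corpus = ["calmly", "ailment", "minable", "sundial", "trident", "filmlike"]
model = build_model(corpus, order=2)
print([generate(model) for _ in range(5)])
```

With a corpus this small the output mostly recombines the input words; a larger word list and `order=3` would probably give more plausible results, at the cost of reproducing more real words.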


A letter-by-letter Markov chain would lead to similarly unconvincing results. As you said, vocal groups matter much more than single letters. If you know anything about Korean, they actually group letters into characters that way. If one could build such a Markov chain for English, it would be very convincing, I think.


You're right, I forgot that Markov chains are memoryless.


I used a letter by letter Markov chain for this: http://password.supply/

The output is definitely not convincing as actual words (but reasonable for somewhat more memorable passwords).


You should check out the VOLT paper, I think it would work well. It's a new technique for splitting up a vocabulary into subwords while minimizing entropy. These subwords could then be mixed and matched, maybe by a neural model, for better results.


Thank you for the reference. To save others a search, I believe this is the paper:

Vocabulary Learning via Optimal Transport for Neural Machine Translation - https://arxiv.org/abs/2012.15671

https://jingjing-nlp.github.io/volt-blog/

https://github.com/Jingjing-NLP/VOLT


I got "minable" on my first try, found it impressive, and was surprised that it wasn't a word. After 3 other reloads nothing else came up.


Definitely not a fake word. Coal, for instance, is a minable resource.

https://www.dictionary.com/browse/minable


Similarly, ”shitbin” was the second word on my first try, and I had to internet search to convince myself that it isn’t in fact a word.


It definitely is a word, since "mine" is an existing verb.


I got "episexic" and, well, I kind of like that one.


Speaking of gibberish English: I know this has been on YouTube for 10 years, but there are always newcomers who haven't had their brain melted by it:

https://www.youtube.com/watch?v=-VsmF9m_Nt8


For a non-native speaker of English - this sounds like lots of songs.

Tangentially related - this is how I discovered Nightwish some 15 years ago: https://www.youtube.com/watch?v=gg5_mlQOsUQ


Nobody could make up words like Frankie Smith (may he RIP 2019) in the middle of Double Dutch Bus https://youtu.be/fK9hK82r-AM


The sound engineer on the Loveline show had Dr. Drew Pinsky trying to sing this song as an evergreen.


Thank you.

I know this comment doesn't add anything of value to the discussion per se, but that's given me the biggest laugh I've had in months.

Nightwish came into my life in the 00s, and I couldn't tell you one song meaning, yet I love the sound.

This is just a perfect video, thank you for sharing.


Unfortunately, despite now knowing the words, the song will never sound right anymore :)

Great band, though.


Modern version: https://youtu.be/ybcvlxivscw

"English" starts at around 0:48, but the others are also worth a listen!


Here's another similar one, but acted prose instead of a song:

https://www.youtube.com/watch?v=Vt4Dfa4fOEY


This isn't nonsense in the same way, but it has a similar appeal: https://www.youtube.com/watch?v=Y8yEH8TZUsk


This is what a parse error feels like.


My brain isn’t melted. This could just be some obscure Dutch dialect for all I know.


I'm sure some people don't hear it, like "the dress", but for some of us it sounds like an Uncanny Valley of English: close but not quite, just enough for our brains to trip over / struggle to comprehend b/c it is so close.


I wonder how much exposure those creeped-out people have had to other Germanic languages.


As well as the associations with [1], this also made me think of one of my favourite essays, "Horsehistory study and the automated discovery of new areas of thought"[2]

[1] https://www.thisworddoesnotexist.com/ [2] https://interconnected.org/home/2021/06/16/horsehistory


<obligatory>

Jabberwocky

’Twas brillig, and the slithy toves
      Did gyre and gimble in the wabe:
All mimsy were the borogoves,
      And the mome raths outgrabe.

“Beware the Jabberwock, my son!
      The jaws that bite, the claws that catch!
Beware the Jubjub bird, and shun
      The frumious Bandersnatch!”

He took his vorpal sword in hand;
      Long time the manxome foe he sought—
So rested he by the Tumtum tree
      And stood awhile in thought.

And, as in uffish thought he stood,
      The Jabberwock, with eyes of flame,
Came whiffling through the tulgey wood,
      And burbled as it came!

One, two! One, two! And through and through
      The vorpal blade went snicker-snack!
He left it dead, and with its head
      He went galumphing back.

“And hast thou slain the Jabberwock?
      Come to my arms, my beamish boy!
O frabjous day! Callooh! Callay!”
      He chortled in his joy.

’Twas brillig, and the slithy toves
      Did gyre and gimble in the wabe:
All mimsy were the borogoves,
      And the mome raths outgrabe.

</obligatory>


I am aware of two translations of this poem into Czech. They are completely different from each other and both very playful.


Have you seen the ActionScript version?

Many years of /. posts and other results might find you a version that's readable if you search.


I love this.


Sorry, after a few refreshes not a single word was anything that looked remotely like English. It all looked like complete gibberish or words in another language. Most of them weren’t even pronounceable.


On my first load, I got "Plailmly", which uses a sequence of consonants that I'm reasonably certain occurs nowhere in the English language.


I think ailml is the offending sequence here. It's pretty difficult to say and doesn't sound like something that you'd find in a native English word.

There's calmly, which is similar, to be fair, but there's something about the tongue positions for ailml that I find noticeably more difficult; it's too far forward.


Not nowhere, but uncommon: calmly, filmlike, ...


Try for -ailm-.


Ailment


As with flailmen, you've put a syllable break (and a morpheme break!) between the L and the M. This will make continuing the sequence into -ailml- impossible, since an English syllable can't start with ml-.

Interestingly, there's nothing wrong in general with starting a syllable with ml-; it's fundamentally the same mouth motion as starting with bl- or pl-, both of which are common in English. But ml- isn't allowed.

This plays into a pet observation of mine, which is that an underappreciated constraint on the space of words that actually exist in a language -- as opposed to the space of words that could conceivably exist -- is that by and large they must descend from older words in an older form of the language, so that even if a word like "plailm" obeys the rules for modern English syllables, it can't exist because its precursor word would have violated the rules for older English sounds. (I don't know if this is actually true as applied to "plailm", but the phenomenon (of possible sounds failing to exist due to their precursors having been impossible) is real.)


Very good, mlord / mlady ;)


Milord and milady do not involve syllables starting with ml-. They involve a reduced vowel coming between the /m/ and the /l/, making milord two syllables and milady three. They also aren't spelled "mlord" or "mlady"; your options are "milord", "milady", "m'lord", or "m'lady".


Yes. That is why I used the word “;)” at the end there. And, yes, I know ;) is not a word.

I’ve been splained that one of the reasons “humor” sometimes doesn’t play well on HN is that people here have such a wide diversity of English grok. I didn’t anticipate someone could have too much knowledge, but, huzzah, there 'tis, one small step f’r ’man, and so forth.


In that case, you might wish to know that putting a smile at the end of a comment like that is also a common way of calling the person you're talking to stupid.


Welp, it was a wink, not a smile. The intention was good-natured. Just havin' fun on the internet with my new pal, Thaumasiotes, who is plainly the only other person among the swarming billions who found this tiny quirk of language worth blethering about with me. I hear you, mostly people think this sort of mishegas is nutballs. Their loss.

τί οὖν τίμιον; τὸ κροτεῖσθαι; οὐχί. οὐκοῦν οὐδὲ τὸ ὑπὸ γλωσσῶν κροτεῖσθαι: αἱ γὰρ παρὰ τῶν πολλῶν εὐφημίαι κρότος γλωσσῶν.


Flailmen


Flailmen: Awkward males, made uncomfortable and rendered incoherent by the close proximity of a romantic interest. Also, medieval warriors wielding flails.


Also mailmen, you know, male posties.


Runinal Worriably Homenite

I like these, especially the last.


Down due to rate limiting so I can't look at it, but sounds similar to the fantastic https://www.thisworddoesnotexist.com/


The first word I got was 'scrotal', which is a real word.


After a few refreshes I got 'sundial'.


Should probably do a final pass filter against an English word dictionary.
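
A minimal sketch of such a final pass, assuming the generator's output and a word list are both available as plain lists of strings. The function name `drop_real_words` and the sample data are hypothetical; a real filter would load a full dictionary file.

```python
def drop_real_words(candidates, dictionary):
    """Keep only candidates that are absent from the dictionary (case-insensitive)."""
    known = {w.lower() for w in dictionary}
    return [c for c in candidates if c.lower() not in known]

# Hypothetical stand-in for a real word list (e.g. a file with one word per line).
dictionary = ["scrotal", "sundial", "minable", "trident"]
candidates = ["Runinal", "sundial", "Homenite", "Scrotal", "episexic"]
print(drop_real_words(candidates, dictionary))
# ['Runinal', 'Homenite', 'episexic']
```

This only catches exact matches; inflected or derived forms that the word list happens to omit would still slip through.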


The github example contains “trident” so figure author knows.


Can anyone tell me more about how this works? Most of these don't resemble English words at all to me lol, wondering what the generative procedure/parameters are in the first place


I find much more interesting:

http://www.thisworddoesnotexist.com/

as it also fakes the definition.

But if you want to write some Vogon like poetry, the words generated by Fakelish might be just fine.


dynoderma

dyn·o·derma

a slender, membranous musclelike structure, believed to represent a cross between a cranium and the external spaces of fish and invertebrates, supporting the glans in most vertebrates

"a dynoderma is thought to have existed in all living organisms"


https://raw.githubusercontent.com/nwtgck/fakelish-npm/develo...

Basically a big probability map. I'm guessing this was machine generated though, and it isn't clear to me how that was done.
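
For readers unfamiliar with the idea, sampling from a probability map can be sketched as below. This is a guess at the general technique, not a reconstruction of the linked file: the fragments, the weights, and the function name `fake_word` are all made up for illustration.

```python
import random

# Hypothetical weights; the real map in the linked file is much larger
# and its exact structure is not reproduced here.
fragment_weights = {"pla": 3, "il": 5, "ment": 4, "ly": 6, "dyn": 1, "der": 2}

def fake_word(weights, n_fragments=3, rng=random):
    """Concatenate fragments drawn with probability proportional to their weight."""
    fragments = list(weights)
    picks = rng.choices(
        fragments,
        weights=[weights[f] for f in fragments],
        k=n_fragments,
    )
    return "".join(picks)

print(fake_word(fragment_weights))
```

Each fragment here is drawn independently of its neighbours, so nothing stops two awkward consonant clusters from landing next to each other.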


The following is the text of a recent HN comment (not my own) on the subject of non-drug highs. As it suggests starting with a nonsense word, the OP fake word generator ought to suit:

> It is really quite easy: have someone you know provide a nonsense word. It needs to have no logical sense or connections to anything - pure nonsense. Then, with that phrase held in your most present and loudest inner voice you repeat that phrase in your head. Repeat it over and over, forcefully to drive any other thoughts or thought fragments out of your mental conversation(s) (at all mental conversation levels, if you have more than one going at once). After a few minutes of forceful repeating, it echoes on its own, and a few realization moments later 20-30 minutes have passed and it feels like waking from a refreshing dream. When in the "state", it really can't be described because it is whatever your imagination and recent experiences feedback froth back and forth. It's relaxing and refreshing, and a great way to clear one's head when working on difficult complex mental goals.


Quite a few cromulent words, but far from perfect.


I see what you did there


Aimlessly flying through Dasher can create some pretty plausible new words. It’s worth playing around with if you haven’t seen it. It’s in most Linux package managers.

https://www.inference.org.uk/dasher/


I got Donsize. It's when the family handles the layoffs


I recently started playing the NYT Spelling Bee game. There you find yourself wishfully inventing a lot of plausibly English-sounding words, only to learn that indeed, (e.g.) "vilicent" is not a part of the language. IMO the quality of these words is low compared to what a human being comes up with.


So many of the generated names sound like pharmaceutical brands...

Also, if anyone is playing the NYT's Spelling Bee game, you've probably become pretty familiar with common English three/four letter combos, which you then iterate on and manipulate to find words. It's all about the patterns!



Reminds me of the Italian song made up of English sounding gibberish (although some real words do sneak in, like "alright") https://www.youtube.com/watch?v=-VsmF9m_Nt8


JFYI, that Celentano's song is a famous/mainstream example of grammelot:

https://en.wikipedia.org/wiki/Grammelot

The master/pioneer of grammelot in Italy was Dario Fo, sample (of the "English" one):

https://www.youtube.com/watch?v=8A4n9Ez9O8g


The title immediately made me go "but Adriano Celentano did this". A staple of my childhood even if I only watched German dubs.


These read just as plausibly as "Transient companies selling low quality imported products on Amazon." If perhaps a bit too easily pronounced in English.


I've seen most of these drugs advertised on television.


Came here to say this.

For life’s more persistent problems: Ask your doctor about Subrixate today!


This or pronounceable password generators are great for making usernames for random sites. Sometimes you can even get the .com for them! (if you’re into that)


Coming soon to a Teams meeting in front of you ;-) Amazing!


Portmanteaus are absobloddylutely fun. Though a bit cruel upon those learning the language.


I'll be honest... the words don't look convincing XD


Strange, most of the words I saw looked Greek or Latin.


As an American, I assumed it was more like British English.


I want to register all of these as domain names.


This website has been temporarily rate limited


The website got hackernewsed


Markov's chain?



