Fakelish – Fake English word generator

mrbukkake · on Dec 5, 2021

Nice idea, but the naive implementation leaves the output unconvincing as hypothetical English words. I had a brief look and it seems to be proportionally selecting and sticking together sequences of letters sampled from English words (lib/word-probability.ts). This doesn't take into account syllable boundaries, the way the English spelling system maps to phones/phonemes, or the phonotactic properties of English, which is why the output looks unconvincing.

A better approach would be to use a Markov chain built from sampling English text letter-by-letter... an even better approach would be to build your stats from some source of English words in IPA transcription with syllable boundaries etc. marked, then map from IPA to spelling via some kind of lookup table. We use a similar process in reverse in my research group for building datasets for doing Bayesian phylogenies of language families.

KennyBlanken · on Dec 5, 2021

Clearly you are far more of a linguist than I am, but from such a perspective, I had a similar impression; I reloaded the page several times and none of the words struck me as being remotely plausibly English. These are worse than most Hollywood scifi words/names.

rlayton2 · on Dec 5, 2021

A significant improvement on letter-by-letter, but not that much harder, is to use n-grams: "two letters to predict the third" etc. Still not "industry grade", but the results start making more sense.
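A minimal sketch of the "two letters to predict the third" idea, assuming a plain list of real words as training data. The names `build_model` and `generate` and the padding markers are made up for illustration; this is not Fakelish's code.

```python
import random
from collections import defaultdict

START, END = "^", "$"  # hypothetical padding markers, not real letters

def build_model(words, order=2):
    """Map each `order`-letter context to the letters observed right after it."""
    model = defaultdict(list)
    for word in words:
        padded = START * order + word.lower() + END
        for i in range(len(padded) - order):
            model[padded[i:i + order]].append(padded[i + order])
    return model

def generate(model, order=2, rng=random, max_len=12):
    """Walk the chain from the start context until END is drawn or max_len is hit."""
    context, out = START * order, []
    while len(out) < max_len:
        nxt = rng.choice(model[context])  # duplicates in the list act as weights
        if nxt == END:
            break
        out.append(nxt)
        context = (context + nxt)[-order:]
    return "".join(out)

# Toy training set taken from words mentioned in this thread.
model = build_model(["calm", "calmly", "film", "mine", "minable", "sundial"])
print(generate(model))
```

With a list this small the output is mostly the training words themselves; a real word list is what makes new combinations appear. Raising `order` makes results more English-like but also more likely to reproduce existing words.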

bruce343434 · on Dec 5, 2021

A letter-by-letter Markov chain would lead to similarly unconvincing results. As you said, vocal groups matter much more than single letters. If you know anything about Korean, they actually group letters into characters that way. If one could build such a Markov chain for English it would be very convincing, I think.

mrbukkake · on Dec 5, 2021

You're right, I forgot that Markov chains are memoryless.

dminor · on Dec 6, 2021

I used a letter by letter Markov chain for this: http://password.supply/

The output is definitely not convincing as actual words (but reasonable for somewhat more memorable passwords).

rajansaini · on Dec 6, 2021

You should check out the VOLT paper, I think it would work well. It's a new technique for splitting up a vocabulary into subwords while minimizing entropy. These subwords could then be mixed and matched, maybe by a neural model, for better results.

lioeters · on Dec 6, 2021

Thank you for the reference. To save others a search, I believe this is the paper:

Vocabulary Learning via Optimal Transport for Neural Machine Translation - https://arxiv.org/abs/2012.15671

https://jingjing-nlp.github.io/volt-blog/

https://github.com/Jingjing-NLP/VOLT

themdonuts · on Dec 5, 2021

I got "minable" on my first try, found it impressive, and was surprised that it wasn't a word. After 3 other reloads nothing else came up.

tw04 · on Dec 5, 2021

Definitely not a fake word. Coal, for instance, is a minable resource.

https://www.dictionary.com/browse/minable

phs318u · on Dec 6, 2021

Similarly, “shitbin” was the second word on my first try, and I had to internet search to convince myself that it isn’t in fact a word.

thaumasiotes · on Dec 5, 2021

It definitely is a word, since "mine" is an existing verb.

Wistar · on Dec 6, 2021

I got "episexic" and, well, I kind of like that one.

SavantIdiot · on Dec 5, 2021

Speaking of gibberish English: I know this has been on YouTube for 10 years, but there are always newcomers who haven't had their brain melted by it:

https://www.youtube.com/watch?v=-VsmF9m_Nt8

BrandoElFollito · on Dec 5, 2021

For a non-native speaker of English - this sounds like lots of songs.

Tangentially related - this is how I discovered Nightwish some 15 years ago: https://www.youtube.com/watch?v=gg5_mlQOsUQ

speedcoder · on Dec 5, 2021

Nobody could make up words like Frankie Smith (may he RIP 2019) in the middle of Double Dutch Bus https://youtu.be/fK9hK82r-AM

genewitch · on Dec 5, 2021

The sound engineer on the loveline show had Dr Drew Pinsky trying to sing this song as an evergreen.

mPReDiToR · on Dec 6, 2021

Thank you.

I know this comment doesn't add anything of value to the discussion per se, but that's given me the biggest laugh I've had in months.

Nightwish came into my life in the 00s, and I couldn't tell you one song meaning, yet I love the sound.

This is just a perfect video, thank you for sharing.

BrandoElFollito · on Dec 7, 2021

Unfortunately, despite now knowing the words, the song will never sound right anymore :)

Great band, though.

dustintrex · on Dec 6, 2021

Modern version: https://youtu.be/ybcvlxivscw

"English" starts at around 0:48, but the others are also worth a listen!

LordDragonfang · on Dec 5, 2021

Here's another similar one, but acted prose instead of a song:

https://www.youtube.com/watch?v=Vt4Dfa4fOEY

Joeboy · on Dec 5, 2021

This isn't nonsense in the same way, but it has a similar appeal: https://www.youtube.com/watch?v=Y8yEH8TZUsk

formerly_proven · on Dec 5, 2021

This is what a parse error feels like.

avgcorrection · on Dec 5, 2021

My brain isn’t melted. This could just be some obscure Dutch dialect for all I know.

SavantIdiot · on Dec 6, 2021

I'm sure some people don't hear it, like "the dress", but for some of us it sounds like an Uncanny Valley of English: close but not quite, just enough for our brains to trip over / struggle to comprehend b/c it is so close.

avgcorrection · on Dec 10, 2021

I wonder how much exposure those creeped-out people have had to other Germanic languages.

scubbo · on Dec 5, 2021

As well as the associations with [1], this also made me think of one of my favourite essays, "Horsehistory study and the automated discovery of new areas of thought"[2]

[1] https://www.thisworddoesnotexist.com/ [2] https://interconnected.org/home/2021/06/16/horsehistory

gumby · on Dec 5, 2021

Jabberwocky

’Twas brillig, and the slithy toves

      Did gyre and gimble in the wabe:

All mimsy were the borogoves,

      And the mome raths outgrabe.

“Beware the Jabberwock, my son!

      The jaws that bite, the claws that catch!

Beware the Jubjub bird, and shun

      The frumious Bandersnatch!”

He took his vorpal sword in hand;

      Long time the manxome foe he sought—

So rested he by the Tumtum tree

      And stood awhile in thought.

And, as in uffish thought he stood,

      The Jabberwock, with eyes of flame,

Came whiffling through the tulgey wood,

      And burbled as it came!

One, two! One, two! And through and through

      The vorpal blade went snicker-snack!

He left it dead, and with its head

      He went galumphing back.

“And hast thou slain the Jabberwock?

      Come to my arms, my beamish boy!

O frabjous day! Callooh! Callay!”

      He chortled in his joy.

’Twas brillig, and the slithy toves

      Did gyre and gimble in the wabe:

All mimsy were the borogoves,

      And the mome raths outgrabe.

</obligatory>

inglor_cz · on Dec 6, 2021

I am aware of two translations of this poem into Czech. They are completely different from each other and both very playful.

mPReDiToR · on Dec 6, 2021

Have you seen the ActionScript version?

Many years of /. posts and other results might find you a version that's readable if you search.

gumby · on Dec 6, 2021

I love this.

nkrisc · on Dec 5, 2021

Sorry, after a few refreshes not a single word was anything that looked remotely like English. It all looked like complete gibberish or words in another language. Most of them weren’t even pronounceable.

LordDragonfang · on Dec 5, 2021

On my first load, I got "Plailmly", which uses a sequence of consonants that I'm reasonably certain occurs nowhere in the English language.

nkrisc · on Dec 6, 2021

I think ailml is the offending sequence here. It's pretty difficult to say and doesn't sound like something that you'd find in a native English word.

There's calmly which is similar, to be fair, but there's something about the tongue positions for ailml that I find noticeably more difficult, it's too far forward.

lokl · on Dec 5, 2021

Not nowhere, but uncommon: calmly, filmlike, ...

thaumasiotes · on Dec 5, 2021

Try for -ailm-.

Kaibeezy · on Dec 5, 2021

Ailment

thaumasiotes · on Dec 6, 2021

As with flailmen, you've put a syllable break (and a morpheme break!) between the L and the M. This will make continuing the sequence into -ailml- impossible, since an English syllable can't start with ml-.

Interestingly, there's nothing wrong in general with starting a syllable with ml-; it's fundamentally the same mouth motion as starting with bl- or pl-, both of which are common in English. But ml- isn't allowed.

This plays into a pet observation of mine, which is that an underappreciated constraint on the space of words that actually exist in a language -- as opposed to the space of words that could conceivably exist -- is that by and large they must descend from older words in an older form of the language, so that even if a word like "plailm" obeys the rules for modern English syllables, it can't exist because its precursor word would have violated the rules for older English sounds. (I don't know if this is actually true as applied to "plailm", but the phenomenon (of possible sounds failing to exist due to their precursors having been impossible) is real.)
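A rough sketch of the "a syllable can't start with ml-" check, done on spelling rather than sounds. The assumption (mine, not the generator's) is that a consonant cluster inside a word is acceptable only if it splits into a word-final cluster plus a word-initial cluster seen in real words. The function names and the tiny word list are hypothetical.

```python
import re

VOWELS = "aeiouy"  # crude assumption: "y" always counts as a vowel

def edge_clusters(words):
    """Collect consonant clusters seen at the start (onsets) and end (codas) of words."""
    onsets, codas = {""}, {""}
    for w in words:
        w = w.lower()
        onsets.add(re.match(f"[^{VOWELS}]*", w).group())
        codas.add(re.search(f"[^{VOWELS}]*$", w).group())
    return onsets, codas

def plausible(word, onsets, codas):
    """True if every consonant cluster can be read as attested coda + attested onset."""
    clusters = re.split(f"[{VOWELS}]+", word.lower())
    if len(clusters) < 2:  # no vowel at all
        return False
    first, *middle, last = clusters
    if first not in onsets or last not in codas:
        return False
    return all(
        any(c[:i] in codas and c[i:] in onsets for i in range(len(c) + 1))
        for c in middle
    )

onsets, codas = edge_clusters(
    ["calm", "film", "play", "like", "mail", "men", "flail", "print"]
)
print(plausible("ailment", onsets, codas))   # "l" + "m" is a legal split
print(plausible("mlord", onsets, codas))     # "ml" never starts a word here
print(plausible("plailmly", onsets, codas))  # passes on spelling: "lm" + "l"
```

Note that "plailmly" still gets through, because on paper "lm" ends words like "calm" and "film". Catching it would need pronunciation data rather than spelling, which is the IPA point made upthread.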

Kaibeezy · on Dec 6, 2021

Very good, mlord / mlady ;)

thaumasiotes · on Dec 6, 2021

Milord and milady do not involve syllables starting with ml-. They involve a reduced vowel coming between the /m/ and the /l/, making milord two syllables and milady three. They also aren't spelled "mlord" or "mlady"; your options are "milord", "milady", "m'lord", or "m'lady".

Kaibeezy · on Dec 6, 2021

Yes. That is why I used the word “;)” at the end there. And, yes, I know ;) is not a word.

I’ve been splained that one of the reasons “humor” sometimes doesn’t play well on HN is that people here have such a wide diversity of English grok. I didn’t anticipate someone could have too much knowledge, but, huzzah, there 'tis, one small step f’r ’man, and so forth.

thaumasiotes · on Dec 6, 2021

In that case, you might wish to know that putting a smile at the end of a comment like that is also a common way of calling the person you're talking to stupid.

Kaibeezy · on Dec 6, 2021

Welp, it was a wink, not a smile. The intention was good-natured. Just havin' fun on the internet with my new pal, Thaumasiotes, who is plainly the only other person among the swarming billions who found this tiny quirk of language worth blethering about with me. I hear you, mostly people think this sort of mishegas is nutballs. Their loss.

τί οὖν τίμιον; τὸ κροτεῖσθαι; οὐχί. οὐκοῦν οὐδὲ τὸ ὑπὸ γλωσσῶν κροτεῖσθαι: αἱ γὰρ παρὰ τῶν πολλῶν εὐφημίαι κρότος γλωσσῶν.

genewitch · on Dec 5, 2021

Flailmen

robbedpeter · on Dec 5, 2021

Flailmen: Awkward males, made uncomfortable and rendered incoherent by the close proximity of a romantic interest. Also, medieval warriors wielding flails.

Kaibeezy · on Dec 6, 2021

Also mailmen, you know, male posties.

clavicat · on Dec 6, 2021

Runinal Worriably Homenite

I like these, especially the last.

foobarbecue · on Dec 5, 2021

Down due to rate limiting so I can't look at it, but sounds similar to the fantastic https://www.thisworddoesnotexist.com/

quercusa · on Dec 5, 2021

The first word I got was 'scrotal', which is a real word.

jstx1 · on Dec 5, 2021

After a few refreshes I got 'sundial'.

echelon · on Dec 5, 2021

Should probably do a final pass filter against an English word dictionary.
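Something like this minimal sketch, assuming the dictionary is just an in-memory collection of known words (in practice it could be loaded from a word list file). `filter_real_words` is a made-up name, not part of Fakelish.

```python
def filter_real_words(candidates, dictionary):
    """Drop any generated candidate that already exists in the dictionary."""
    known = {w.lower() for w in dictionary}
    return [c for c in candidates if c.lower() not in known]

# Words reported in this thread; the dictionary here is a tiny stand-in.
dictionary = ["minable", "scrotal", "sundial", "trident"]
generated = ["minable", "plailmly", "Sundial", "episexic"]
print(filter_real_words(generated, dictionary))
```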

Terretta · on Dec 6, 2021

The github example contains “trident” so figure author knows.

annetipasto · on Dec 5, 2021

Can anyone tell me more about how this works? Most of these don't resemble English words at all to me lol, wondering what the generative procedure/parameters are in the first place

jaclaz · on Dec 5, 2021

I find much more interesting:

http://www.thisworddoesnotexist.com/

as it also fakes the definition.

But if you want to write some Vogon like poetry, the words generated by Fakelish might be just fine.

newsbinator · on Dec 5, 2021

dynoderma

dyn·o·derma

a slender, membranous musclelike structure, believed to represent a cross between a cranium and the external spaces of fish and invertebrates, supporting the glans in most vertebrates

"a dynoderma is thought to have existed in all living organisms"

dharmaturtle · on Dec 5, 2021

https://raw.githubusercontent.com/nwtgck/fakelish-npm/develo...

Basically a big probability map. I'm guessing this was machine generated though, and it isn't clear to me how that was done.
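For anyone wondering how a probability map like that could be used, here is a toy sketch of weighted chunk sampling. The map layout, the chunk values, and the `^`/`$` markers are all assumptions for illustration; I have not checked how the actual file is structured or consumed.

```python
import random

# Hypothetical toy map: chunk -> {next chunk: relative weight}.
# "^" marks the start of a word and "$" marks the end.
PROB_MAP = {
    "^": {"mi": 3, "sun": 1},
    "mi": {"na": 2, "ne": 1},
    "sun": {"di": 1},
    "na": {"ble": 1},
    "ne": {"$": 1},
    "di": {"al": 1},
    "ble": {"$": 1},
    "al": {"$": 1},
}

def sample_word(prob_map, rng=random):
    """Pick each next chunk in proportion to its weight until "$" is drawn."""
    chunk, parts = "^", []
    while True:
        options = prob_map[chunk]
        chunk = rng.choices(list(options), weights=list(options.values()))[0]
        if chunk == "$":
            return "".join(parts)
        parts.append(chunk)

print(sample_word(PROB_MAP))
```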

a9h74j · on Dec 6, 2021

The following is the text of a recent HN comment (not my own) on the subject of non-drug highs. As it suggests starting with a nonsense word, the OP fake word generator ought to suit:

> It is really quite easy: have someone you know provide a nonsense word. It needs to have no logical sense or connections to anything - pure nonsense. Then, with that phrase held in your most present and loudest inner voice you repeat that phrase in your head. Repeat it over and over, forcefully to drive any other thoughts or thought fragments out of your mental conversation(s) (at all mental conversation levels, if you have more than one going at once). After a few minutes of forceful repeating, it echoes on its own, and a few realization moments later 20-30 minutes have passed and it feels like waking from a refreshing dream. When in the "state", it really can't be described because it is whatever your imagination and recent experiences feedback froth back and forth. It's relaxing and refreshing, and a great way to clear one's head when working on difficult complex mental goals.

4ensic · on Dec 5, 2021

Quite a few cromulent words, but far from perfect.

kaczordon · on Dec 5, 2021

I see what you did there

aendruk · on Dec 5, 2021

Aimlessly flying through Dasher can create some pretty plausible new words. It’s worth playing around with if you haven’t seen it. It’s in most Linux package managers.

https://www.inference.org.uk/dasher/

alanlammiman · on Dec 5, 2021

I got Donsize. It's when the family handles the layoffs

hyperbovine · on Dec 5, 2021

I recently started playing the NYT Spelling Bee game. There you find yourself wishfully inventing a lot of plausibly English-sounding words, only to learn that indeed, (e.g.) "vilicent" is not a part of the language. IMO the quality of these words is low compared to what a human being comes up with.

SkipperCat · on Dec 6, 2021

So many of the generated names sound like pharmaceutical brands...

Also, if anyone is playing the NYT's Spelling Bee game, you've probably become pretty familiar with common English three/four letter combos and then iterate/manipulate them to find words. It's all about the patterns!

shoto_io · on Dec 5, 2021

Reminds me of https://news.ycombinator.com/item?id=29002776

dsizzle · on Dec 5, 2021

Reminds me of the Italian song made up of English sounding gibberish (although some real words do sneak in, like "alright") https://www.youtube.com/watch?v=-VsmF9m_Nt8

jaclaz · on Dec 6, 2021

JFYI, that Celentano's song is a famous/mainstream example of grammelot:

https://en.wikipedia.org/wiki/Grammelot

The master/pioneer of grammelot in Italy was Dario Fo, sample (of the "English" one):

https://www.youtube.com/watch?v=8A4n9Ez9O8g

SomeBoolshit · on Dec 5, 2021

The title immediately made me go "but Adriano Celentano did this". A staple of my childhood even if I only watched German dubs.

delgaudm · on Dec 5, 2021

These read just as plausibly as "Transient companies selling low quality imported products on Amazon." If perhaps a bit too easily pronounced in English.

jnellis · on Dec 5, 2021

I've seen most of these drugs advertised on television.

labster · on Dec 6, 2021

Came here to say this.

For life’s more persistent problems: Ask your doctor about Subrixate today!

deegles · on Dec 5, 2021

This or pronounceable password generators are great for making usernames for random sites. Sometimes you can even get the .com for them! (if you’re into that)

surfingdino · on Dec 5, 2021

Coming soon to a Teams meeting in front of you ;-) Amazing!

Zenst · on Dec 5, 2021

Portmanteaus are absobloddylutely fun. Though a bit cruel upon those learning the language.

kottaram · on Dec 6, 2021

I'll be honest... the words don't look convincing XD

trynumber9 · on Dec 5, 2021

Strange, most of the words I saw looked Greek or Latin.

dbavaria · on Dec 5, 2021

As an American, I assumed it was more like British English.

andrew_ · on Dec 6, 2021

I want to register all of these as domain names.

tony-allan · on Dec 5, 2021

This website has been temporarily rate limited

jcmontx · on Dec 5, 2021

The website got hackernewsed

Orionos · on Dec 5, 2021

Markov's chain?