His deadly prose is
so authentic that it has
a life of its own.
But ordering street
food and riding the subway
had become old hat.
"As an engineer,
I'm sort of a student of
how things fall apart."
I love short stories, and I firmly believe that one of the secrets to good storytelling is to be compact in emotion (let it expand in the reader's head) and start as close to the end as possible.
These haikus are the perfect amount of prose to package up NYT stories.
Slight tangent, but if you love short stories then you'll probably like the famous "six-word story" by Ernest Hemingway. It's unbelievable that he could write such an emotional story in six words. If you haven't read it, it's here:
Isn't it just automatically extracting sentences that can be broken into 5, 7, 5 groups of syllables? I don't think anyone has made any attempt to summarise the article.
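For concreteness, that kind of extraction could look roughly like the sketch below. This is only an illustration, not the site's actual code: `split_haiku`, `toy_count`, and the hand-written `TOY_COUNTS` table are made-up names, and a real version would need a proper syllable dictionary.

```python
# Hypothetical syllable table covering only the words in the example sentence.
TOY_COUNTS = {
    "but": 1, "ordering": 3, "street": 1, "food": 1, "and": 1, "riding": 2,
    "the": 1, "subway": 2, "had": 1, "become": 2, "old": 1, "hat": 1,
}


def toy_count(word):
    """Look up a word's syllable count, or return None if it is unknown."""
    return TOY_COUNTS.get(word.lower().strip('.,;:"'))


def split_haiku(words, count_syllables, pattern=(5, 7, 5)):
    """Return the lines if word boundaries fall exactly on the pattern, else None."""
    lines, current, total, idx = [], [], 0, 0
    for word in words:
        if idx == len(pattern):
            return None  # words left over after the last line is full
        n = count_syllables(word)
        if n is None:
            return None  # unknown word, so the sentence cannot be scored
        current.append(word)
        total += n
        if total == pattern[idx]:
            lines.append(" ".join(current))
            current, total = [], 0
            idx += 1
        elif total > pattern[idx]:
            return None  # a word straddles the line boundary
    return lines if idx == len(pattern) else None


sentence = "But ordering street food and riding the subway had become old hat"
print(split_haiku(sentence.split(), toy_count))
```

A sentence such as "The subway had become old hat" is rejected because "become" straddles the fifth syllable.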
The process is described in the about section (http://haiku.nytimes.com/about). There is an algorithm that extracts the haikus, but they only make it to the website when a journalist thinks one is good:
"The algorithm discards some potential poems if they are awkwardly constructed and it does not scan articles covering sensitive topics. Furthermore, the machine has no aesthetic sense. It can't distinguish between an elegant verse and a plodding one. But, when it does stumble across something beautiful or funny or just a gem of a haiku, human journalists select it and post it on this blog."
Ah, very good! The other interesting point is that they use a dictionary that includes syllable counts, which they have augmented with words like "Rihanna".
Part of me wishes this page had been submitted instead of the top level.
Yes, to clarify, I started with the base CMUdict for syllable counts, but I had the program keep track of any term misses it ran into. This way I could augment its vocabulary. It also helped me find some tokenization bugs and try some rules for dealing with compound words like "unsportsmanlike".
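A minimal sketch of that idea, for anyone curious. It assumes the usual CMUdict line format (a word followed by phonemes, where vowel phonemes end in a stress digit), so the syllable count is the number of phonemes ending in a digit. The `SAMPLE_ENTRIES` lines are illustrative stand-ins for the real file, and `SyllableLookup` is a made-up name, not the actual implementation.

```python
from collections import Counter

# Illustrative entries in CMUdict format; a real run would read the full file.
SAMPLE_ENTRIES = """\
HAIKU  HH AY1 K UW0
SUBWAY  S AH1 B W EY2
ORDERING  AO1 R D ER0 IH0 NG
"""


def load_counts(text):
    """Map each word to its syllable count (number of stress-marked phonemes)."""
    counts = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith(";;;"):
            continue  # skip blank lines and comment lines
        word, *phones = line.split()
        word = word.split("(")[0].lower()  # drop variant markers like WORD(1)
        counts.setdefault(word, sum(p[-1].isdigit() for p in phones))
    return counts


class SyllableLookup:
    """Dictionary lookup that records every word it fails to find."""

    def __init__(self, counts):
        self.counts = dict(counts)
        self.misses = Counter()

    def count(self, word):
        key = word.lower()
        if key in self.counts:
            return self.counts[key]
        self.misses[key] += 1  # remember the miss so it can be added by hand
        return None

    def add(self, word, syllables):
        self.counts[word.lower()] = syllables


lookup = SyllableLookup(load_counts(SAMPLE_ENTRIES))
lookup.count("Rihanna")            # unknown, so it is recorded as a miss
print(lookup.misses.most_common())
lookup.add("Rihanna", 3)           # augment the vocabulary manually
```

Reviewing the most frequent misses first is one way to decide which words are worth adding.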
One approximate hack that works pretty well is to count the number of blocks of vowels separated by consonants. It breaks on some words, but was close enough to use for something I was working on. (Datamining rhymes from lyrics.)
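A minimal sketch of that heuristic, assuming "y" is treated as a vowel (the name `approx_syllables` is just for illustration):

```python
import re


def approx_syllables(word):
    """Estimate syllables as the number of runs of consecutive vowels."""
    return len(re.findall(r"[aeiouy]+", word.lower()))


for w in ["subway", "street", "haiku", "become"]:
    print(w, approx_syllables(w))
```

It gets "subway" and "street" right, but a silent final "e" trips it up: "become" is counted as three syllables rather than two.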
I'm not an expert on haikus, but I'm guessing that in English, there's an accepted convention that each line in a haiku generally serves as an independent clause?
So:
What she has given
them is institutional
hagiography
is less "aesthetic" than:
The story's not clear;
Durer may have cooked it up
just to do a nude
One extra layer of machine work could be to use NLP to filter out phrases in which the fifth/seventh syllable doesn't belong to a word that is not a noun. It would be interesting to see how much more it would filter/improve the auto-generated haikus.
Eh, I think the only somewhat-hard requirement is seventeen syllables, and even that is often waived in favor of aesthetics. Plus they really tend to work better in Japanese anyway.
Somewhat on-topic: Jack Kerouac made attempts to "Americanize" the haiku form a bit, which I always thought were pretty neat. I think they're collected in a book called (something like) Book of Haiku.
According to the American Haiku Society (yes, it exists), the syllable count is actually less important than including a seasonal word and a "cut" between two different sets of imagery. But that's a little harder to teach a bit of hack code to do... so...
This is really neat. I hope it stays up after April 1st.
Autogenerated haiku fans might also want to check out Twitter Haiku, which generates haikus from your recent tweets: https://sleepy-mesa-7562.herokuapp.com/
It strings words together randomly so it's a bit more dada. An example from my tweets:
Leaving shut when pain
Like em sleeping no day keep
Me dream normal sure
This is hilarious. With a bit of tweaking it could be made to only detect "haikus" whose lines end at suitable grammatical boundaries rather than word boundaries. I think this would give rise to a higher quality selection.
There are a few awkwardness checks we do that disqualify some haikus. For instance, if the second or third line has a comma right near the front or back, or if there are month or title abbreviations, etc. But I'm always looking to refine it further.
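As a rough illustration of checks like these (not the real code), here is a sketch. It assumes "right near the front or back" means a comma directly after a line's first word or directly before its last word, and it uses a short, hypothetical abbreviation list.

```python
# Hypothetical, incomplete list of month and title abbreviations.
ABBREVIATIONS = {
    "jan.", "feb.", "aug.", "sept.", "oct.", "nov.", "dec.",
    "mr.", "mrs.", "ms.", "dr.", "prof.", "gov.", "sen.", "rep.",
}


def has_edge_comma(line):
    """True if a comma follows the first word or precedes the last word."""
    words = line.split()
    if len(words) < 2:
        return False
    return words[0].endswith(",") or words[-2].endswith(",")


def is_awkward(lines):
    """Apply the comma check to lines two and three, and the abbreviation check to all."""
    if any(has_edge_comma(line) for line in lines[1:]):
        return True
    return any(w.lower() in ABBREVIATIONS for line in lines for w in line.split())


print(is_awkward(["But ordering street",
                  "food, and riding the subway",
                  "had become old hat"]))
```

A comma at the very end of a line is deliberately allowed here, since a pause at a line break usually reads fine.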
It would be nice if lines didn't end with prepositions or determiners (http://en.wikipedia.org/wiki/English_determiners). Those words depend on whatever follows, so it causes an awkward split. It shouldn't be hard to add such a check.
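Something like the sketch below would do as a first pass. The word lists are short and hand-picked (an assumption, not a complete grammar), and a plain word list can't tell a determiner from a pronoun ("her", "that") or a preposition from a verb particle ("cooked it up"), so "up" is left out on purpose.

```python
import re

# Small, hand-picked lists; a real check would likely use a part-of-speech tagger.
PREPOSITIONS = {"of", "in", "on", "at", "to", "for", "with", "by", "from", "into"}
DETERMINERS = {"the", "a", "an", "this", "that", "these", "those",
               "my", "your", "his", "her", "its", "our", "their"}
DANGLING = PREPOSITIONS | DETERMINERS


def dangling_line_ends(lines):
    """Return the indexes of lines whose last word is a preposition or determiner."""
    flagged = []
    for i, line in enumerate(lines):
        words = re.findall(r"[a-z']+", line.lower())
        if words and words[-1] in DANGLING:
            flagged.append(i)
    return flagged


print(dangling_line_ends(["She walked into the",
                          "station and waited for a",
                          "train that never came"]))
```

A haiku with any flagged lines could simply be dropped before it reaches moderation.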
Sorry, I actually wrote the haiku finding logic in November and it's been running since then while we figured out the look of the tumblr and the moderation workflow.
If you scroll down to the first entry it was posted April 1st, 2013 (from an article in November 2012). Just wanted to note that for anyone else confused.
If you sort by upvotes, there are a number of gems in there.