Hacker News new | past | comments | ask | show | jobs | submit login
Generating pseudo random text with Markov chains using Python (uswaretech.com)
15 points by jcsalterego on June 16, 2009 | hide | past | favorite | 5 comments



In "The Practice of Programming", Brian Kernighan and Rob Pike use the Markov Chain Algorithm to illustrate a number of lessons about software design and implementation. I used it as my "Hello world" program when learning a new programming language.

In Python, the implementation is surprisingly succinct:

    import sys
    import random
    import collections

    MAXGEN = 10000
    NONWORD = ""
    suffixchain = collections.defaultdict(list)

    w1 = w2 = NONWORD
    for line in sys.stdin:
        for word in line.split():
            suffixchain[(w1, w2)].append(word)
            w1, w2 = w2, word
    suffixchain[(w1, w2)].append(NONWORD)

    w1 = w2 = NONWORD
    for i in range(MAXGEN):
        suffixes = suffixchain[(w1, w2)]
        next = random.choice(suffixes)
        if next is NONWORD:
            break
        print next,
        w1, w2 = w2, next
To use it, just run:

  $ python markov.py < english_corpus.txt


This is relevant, if you want to do something fancier:

http://megahal.alioth.debian.org/How.html

BTW, has anything better than Megahal turned up in the meantime, that is also open source?


This is a big reason why speech recognition works. You have a list of probabilities of sequence of words occuring(language model) and it decides on the next word based on the last few previous words.

So if you have the word Super, odds are the next word is Man and not Microphone.


I resent that.


Not ground-breaking by any means, but a nice, concrete application of Markov chains.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: