Hacker News
Generating domain names using Markov Chains (codeismightier.com)
65 points by codeismightier on Oct 19, 2008 | 25 comments



The new Web 2.0-esque (or dare I say Web 3.0-esque) nomenclature is dominated by "double vowels" (Wii, xumii, mobee) or vowel combos (cuil) not often found in normal English. So this approach won't give you a really innovative name. You'd probably need to perturb the Markov chain to achieve that, or burn a lot of free time mining your own cognitive faculties!
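One way to "perturb" the chain along those lines: flatten the transition counts with a temperature-style exponent, so rarer letter combinations get picked more often. A minimal sketch; the `temperature` knob and the toy counts are my own assumptions, not from the thread:

```python
def perturb(counts, temperature=2.0):
    # Raise each count to 1/temperature: temperature > 1 flattens the
    # distribution, boosting rare transitions relative to common ones.
    return {c: n ** (1.0 / temperature) for c, n in counts.items()}

# Toy example: 'a' is 100x more common than the unusual 'ii' digraph.
weights = perturb({'a': 100.0, 'ii': 1.0})
```

After perturbing, 'ii' is only 10x less likely than 'a' instead of 100x, so the generator starts producing more Web 2.0-flavored oddities.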


But those names suck.


Yeah, but you have to remember that those names will affect the perception of similar names. The Wii, for example, being a monster hit is going to affect the way people perceive names.


You could train the Markov chain specifically on all registered domains. And weight by recency, if you want to be trendy.
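A minimal sketch of how that recency weighting might look. The exponential half-life decay and the `(name, age_in_days)` input shape are my own assumptions, not anything from the thread:

```python
import collections

M = 2  # context length for this toy example

def train_weighted(domains_with_age, half_life_days=365.0):
    # F[context][next_char] = recency-weighted count.
    # Newer registrations contribute more via exponential decay.
    F = collections.defaultdict(collections.Counter)
    for name, age_days in domains_with_age:
        w = ' ' * M + name + ' '
        weight = 0.5 ** (age_days / half_life_days)
        for i in range(len(w) - M):
            F[w[i:i + M]][w[i + M]] += weight
    return F

# A 10-day-old registration vs. a roughly 20-year-old one.
F = train_weighted([('chattly', 10), ('usenet', 7300)])
```

The fresh registration ends up weighted close to 1, while the 20-year-old name contributes almost nothing, so the chain skews toward whatever is being registered right now.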


So this approach won't give you a name like the ones everyone else is using, and that somehow makes it not innovative?


Chattly.com. Beat you to it.

Registered 10/19. Nice. :)


As an aside, Markov chains are a key tool in bioinformatics (hidden Markov models).


Very cool ... but I don't want the code, I just want a web app to generate names for me (and check that the domain is available, of course).

C'mon, where's markovr.com?


I used the tool to make the name markovable, got the domain and made this: http://www.markovable.net


Keep that site up :) It's a great tool.


I think you mean a web app to generate names and squat them. :-)


All this script needs is a bulk-whois gateway on the other end. But seriously, does anyone know of a whois service that allows free high-volume lookups?


Don't bother with whois; query the root/gTLD nameservers for a valid NS record for the domain to see if it's registered. DNS is much quicker, and nobody's going to shut you down for using it.


You can subscribe to receive a daily list of all registered .com/.net/.org domains from the registry.


You guys killed my server (a 256 MB slice)! I can't even ssh into it now!


I bet it was the downloading of that dictionary file.


Does the web console (at manage.slicehost.com) still work?


Yep. Thanks for the suggestion. After I logged in I realized that Apache was basically in an infinite loop where it kept trying to create new child processes, but those were instantly being killed because of a lack of memory. After a hard reset it seems to be working now.


Someone should make this a web app.


Silly python people always writing lots of code. This should be, like, 15 lines.

http://search.cpan.org/~rclamp/Algorithm-MarkovChain-0.06/li...


Funny, I did a similar thing recently, specifically for domain names. The code to train a Markov chain (without using any libraries outside standard Python) is this:

  import re, collections
  from itertools import *
  from operator import add
  
  def words(text):
      return (w.group() for w in re.finditer(r"\w+", text.lower())
              if w.group().isalpha())
  
  # F[context][next_char] = count, with add-one smoothing via the default
  F = collections.defaultdict(lambda: collections.defaultdict(lambda: 1))
  
  M = 5  # context length
  
  def add_spaces(word):
      return ' ' * M + word.strip() + ' ' * M
  
  N = 0  # total transition count, set by train()
  
  def train(words):
      for word in words:
          w = add_spaces(word)
          for i in range(len(w) - M):
              F[w[i:i+M]][w[i+M]] += 1
      global N
      N = reduce(add, (n for d in F.itervalues() for n in d.itervalues()))
  
  train(words(file('big.txt').read()))
Then, I tried to measure how "good" a particular word is (experimenting with different measures):

  from math import log
  
  def goodness(word):
      w = add_spaces(word)
      res = 0
      for i in range(len(w) - M):
          res += F[w[i:i+M]][w[i+M]]
          # alternative measure: res += log(F[w[i:i+M]][w[i+M]])
      return float(res) / ((len(w) - M) * N)
  
  print 'test score:', reduce(add, (goodness(w) for w in words(file('test.txt').read())))
Then, tried to find the top N words at less than 2 edits distance:

  alphabet = 'abcdefghijklmnopqrstuvwxyz'
  
  def edits1(word):
      if isinstance(word, str):
          n = len(word)
          return set(chain(
              (word[0:i] + word[i+1:] for i in range(n)),                         # deletion
              (word[0:i] + word[i+1] + word[i] + word[i+2:] for i in range(n-1)), # transposition
              (word[0:i] + c + word[i+1:] for i in range(n) for c in alphabet),   # alteration
              (word[0:i] + c + word[i:] for i in range(n+1) for c in alphabet)))  # insertion
      else:  # assume word is actually an iterable of words
          return set(reduce(chain, (edits1(e) for e in word)))
  
  def edits2(word):
      return edits1(edits1(word))

  def topn(n, word):
      words = [(x, goodness(x)) for x in edits2(word)]
      words.sort(key=lambda x: -x[1])
      return words[:n]

I have borrowed some code from Peter Norvig's spelling checker, and the style is very much influenced by it.
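The comment above walks through training and scoring, but stops short of the sampling step that actually emits new names. Here is a minimal Python 3 sketch of that step (the original is Python 2; the `generate` helper and the Counter-based table are my own, not from the comment):

```python
import collections
import random

M = 5  # context length, matching the comment above

def train(words):
    # F[context][next_char] = count; words padded with M spaces on each side.
    F = collections.defaultdict(collections.Counter)
    for word in words:
        w = ' ' * M + word + ' ' * M
        for i in range(len(w) - M):
            F[w[i:i + M]][w[i + M]] += 1
    return F

def generate(F, rng=random):
    # Start from the all-spaces context and walk the chain until a
    # space (end-of-word marker) is emitted.
    ctx = ' ' * M
    out = []
    while True:
        counts = F[ctx]
        letters = list(counts)
        c = rng.choices(letters, weights=[counts[l] for l in letters])[0]
        if c == ' ':
            return ''.join(out)
        out.append(c)
        ctx = ctx[1:] + c

F = train(['chattly', 'markovable', 'flickr'])
name = generate(F)
```

Because the sampler only follows transitions that were actually observed, every context it reaches has at least one recorded next character, and the trailing-space padding guarantees termination. (With a corpus this tiny it can only reproduce the training words; on a real dictionary it blends them into new ones.)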


It'd be nice to include the actual library:

http://search.cpan.org/src/RCLAMP/Algorithm-MarkovChain-0.06...

…so, 165+15 = 180 lines total?


Try more like 800 lines for the library. It comes with demos, a test suite, etc. And I think you're skipping the base class in your naive count.

I'm not really sure how the line count of a library is relevant to anything?


For pretty obvious reasons?

  import generating_domain_names as gds

  print gds.genName()
Two lines of code to do Markov-chain-generated domain names!


I just deleted some dead code left from debugging and now it's a few lines shorter.



