Then I use neato to lay out the graphs. In each case I run it a few hundred times and pick the result that has the largest boxes. That makes it the most compact output, and it seems to be a good heuristic.
Finally I create the HTML you see using an HTML template.
echo Create page with map
./Create_PGE.py > PaulGrahamEssays
There's a small lie in this. The web site is actually a statically generated "wiki". When it was first devised I didn't have the facility to run scripts on my host. I generate the pages as plain text with some mark-down, generate the entire site off-line, and then upload the parts that have changed.
If you try to edit, your suggested new version gets emailed to me, where it goes through an aggressive spam filter. Then it sits in my inbox until I decide whether it's a good change. If so, I trigger a refresh. Some people have passwords that trigger the refresh automatically, without my intervention, so it really does work as a limited-access wiki.
Here's a stripped-down version of the "LayoutGraph" script. In essence, it lays out the graph 100 times and scores each attempt according to the size of the biggest box. Each attempt is saved in a file whose name includes the score, and the best one is picked off at the end. That way I can compare attempts with each other while it runs.
GraphName=$1
GraphSize=$2

echo Laying out graph ${GraphName}

LayoutParms="-Gstart=random -Nshape=box -Goverlap=false -Gsplines=true"

# 100 attempts; the loop counters just provide progress numbering
for n in 9 8 7 6 5 4 3 2 1 0
do
    for m in 9 8 7 6 5 4 3 2 1 0
    do
        # One layout attempt, emitting both the image and its image-map
        neato ${GraphName}.dot $LayoutParms -Gsize="${GraphSize}" \
            -Tpng -o ${GraphName}.png -Timap -o ${GraphName}.map
        # Score the attempt by the width of the widest box in the image-map
        score=`sed "s/,/ /g" ${GraphName}.map | gawk '{printf("%04d\n",$5-$3)}' | sort -rn | head -1`
        mv ${GraphName}.png ${GraphName}.png_$score
        mv ${GraphName}.map ${GraphName}.map_$score
        echo $n$m $score `ls -l ${GraphName}.png_* | tail -1`
    done
done

# Keep the attempt with the highest score, then discard the rest
cp `ls ${GraphName}.png_0??? | tail -1` ${GraphName}.png
cp `ls ${GraphName}.map_0??? | tail -1` ${GraphName}.map
ls -r ${GraphName}.png_0??? | tail -n +2 | xargs rm
ls -r ${GraphName}.map_0??? | tail -n +2 | xargs rm
I kind of assumed everyone would know how the underlying PageRank algorithm works.
Put people on pages at random, and have them click on an outgoing link at random. At each step, 20% of the time, teleport them to a new random page instead. What is their distribution in the long run?
The higher a page's rank, the more likely you are to end up there. A high rank generally means the page has more incoming links, and in particular incoming links from pages with high rankings of their own. Incoming links from lower-ranked pages - pages with few incoming links themselves - don't give you much "Google Juice".
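To make the random-surfer picture concrete, here's a toy power iteration in plain awk. The three-page graph, the 80%/20% split, and the iteration count are all just illustrative choices, not anything from the comment above:

```shell
# Toy PageRank by power iteration: surfers follow an outgoing link with
# probability 0.8 and teleport to a random page with probability 0.2.
# The 3-page graph (1->2, 1->3, 2->3, 3->1) is made up for illustration.
ranks=$(awk 'BEGIN {
    d = 0.8; N = 3                            # follow-a-link probability, page count
    link[1,2] = 1; link[1,3] = 1; out[1] = 2  # page 1 links to pages 2 and 3
    link[2,3] = 1;                out[2] = 1  # page 2 links to page 3
    link[3,1] = 1;                out[3] = 1  # page 3 links to page 1
    for (i = 1; i <= N; i++) q[i] = 1.0 / N   # start the surfers spread evenly
    for (t = 0; t < 50; t++) {                # iterate towards the limit
        for (j = 1; j <= N; j++) r[j] = (1 - d) / N
        for (i = 1; i <= N; i++)
            for (j = 1; j <= N; j++)
                if (link[i,j]) r[j] += d * q[i] / out[i]
        for (j = 1; j <= N; j++) q[j] = r[j]
    }
    for (j = 1; j <= N; j++) printf("page %d: %.4f\n", j, q[j])
}')
echo "$ranks"
```

Page 3 comes out on top: it has two incoming links, one of them from the highly ranked page 1, while page 2's single incoming link leaves it with the lowest score.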
I'm curious, do people really not know this? Should I write it up? It's a standard piece of linear algebra to find the eigenvector of the appropriate matrix. I thought pretty much everyone would know it.
For those who are interested, here's a lengthier explanation I wrote that relates PageRank explicitly to solving the eigenvector equation (Mq = q, in the language of the article):
Skip past the opening, and down to "Basic Description of PageRank". The article eventually gets somewhat technical, but hopefully this at least helps explain the basics. Incidentally, I don't use the term "eigenvector" in the article, but when we're analysing an equation like Mq = q, that's an eigenvector equation!
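For reference, here's a sketch of how that equation is usually written out in the textbook formulation; the damping factor d and the notation below are the standard ones, not necessarily the article's:

```latex
% PageRank as an eigenvector problem.  q_i is the long-run fraction of
% surfers on page i; d is the damping factor (e.g. 0.85); N is the
% number of pages; J is the N-by-N all-ones matrix.
M = d\,A + \frac{1-d}{N}\,J,
\qquad
A_{ji} =
\begin{cases}
  1/\operatorname{outdeg}(i) & \text{if page $i$ links to page $j$,} \\
  0                          & \text{otherwise.}
\end{cases}
% M is column-stochastic, so by Perron-Frobenius its largest eigenvalue
% is 1, and the PageRank vector is the corresponding eigenvector:
M q = q, \qquad q \ge 0, \qquad \textstyle\sum_i q_i = 1.
```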
Unfortunately, in the present context, only the early part of the notes will be really accessible to someone who has only just taken an intro class in linear algebra. The later parts, which deal with issues such as how fast the PageRank algorithm converges, really require people to have more mathematical experience.
But it's not the job of such a paper to teach them about these things. This sort of paper serves as context and motivation. The material itself can, and should, be learned from other texts designed to teach it. I think what you have is an excellent piece. Read carefully at first, then skim more and more as you get lost. Inspired, go and read about some of the necessary math, then come back and get further next time.
That is exactly why I took linear algebra in college. I (tried to) read the original Google paper and was like, "eigen-what??" and decided to take the class as an elective.
Linear algebra is stupendously useful material, and I think it really should be standard in any CS curriculum. It wasn't in mine.
> It's a standard piece of linear algebra to find the eigenvector of the appropriate matrix.
Yes please!! I just learned about eigenpairs and the teacher refused to explain their significance. I have no idea what they are beyond their purely abstract definition. I would definitely read your post about this.
I like. You see pg's popular essays all the time and get a little tired of them. Beating the averages, yeah whatever... hackers and painters, uh huh. But the less popular ones are really good, I especially like "The Submarine".
(Maybe because it is my hobby to push back against dress codes. One time, I was visiting my company's Singapore office. "You can't wear tennis shoes here, it's against the dress code!" "Do you know who I am?" "Nope." "Can you fire me?" "Nope." "Then what are you going to do about it?" :)
It is what Google's ranking is said to be based upon, though they keep their exact algorithm secret, afaik.
In a nutshell, you give each page a base score, and have it give away part of its score to the pages it links to.
Assuming that the page author wouldn't link to a page with poor/irrelevant content, you get some sort of quality/popularity score for each page.
Interestingly, there's no font styling (or any CSS for that matter; he is using tables, for christ's sake) on that web page, so the reason you are seeing a serif font is your web browser's defaults more than anything else.
It's too old school to expect people to have their own personal default style settings for their browsers. There are barely any sites these days that don't define their own styles.
It's actually a neat idea of the original web, but it's of little practical value today.
He might mean within the graphs. I haven't played with asking neato to change the font being used, and I'm not sure what fonts I have available on my very, very elderly setup.
http://www.solipsys.co.uk/new/PaulGrahamEssays.html?HN
I was prompted to provide these links by this item:
http://news.ycombinator.com/item?id=2011897
I do regularly re-visit PG's essays, and I use these charts/graphs to see what catches my eye on any particular occasion.