From skimming, this appears to be the algorithm given in the original PageRank paper from 12 years ago. You don't need hundreds of megs of RAM to do it, though; put yourself in the mindset of a 1960s COBOL programmer, and you can do the matrix-vector multiplication in a few KB of memory with the following approach. The adjacency matrix is a set of triples (src, dst, weight), indicating a link from URL src to URL dst with the given weight, where the weights in a row sum to s. You're computing an occupancy vector consisting of pairs (url, weight) whose weights total 1. (A code sketch of one pass follows the steps below.)
1. Sort the adjacency matrix by src and the occupancy vector by url.
2. Iterate through the current occupancy vector and the adjacency matrix in parallel. For each triple (a, b, c) and pair (a, d), emit a partial-sum pair (b, d*c).
3. The partial-sum pairs come out ordered by source, not by destination URL. Sort them by URL.
4. Sum the partial-sum elements for the same URL, and add in the contribution of sD + tE. (I'm leaving out how to compute sD at this point, but it's not that hard.) This is your new occupancy vector.
5. Go back to step 2 for a few more iterations.
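Here's a minimal sketch of one pass in Python, everything held in memory so the sort/merge structure is easy to see; in practice each sort would be an external sort over files (or tapes). Treating the row sum s as 1 and folding the sD + tE term into a single uniform correction are my simplifications, since the parent leaves sD's computation unspecified:

    from itertools import groupby

    def pagerank_pass(edges, ranks, t=0.15):
        # edges: (src, dst, weight) triples; weights for each src sum to 1
        #        (i.e. s = 1 -- an assumption, for simplicity).
        # ranks: (url, weight) pairs whose weights sum to 1.
        # t:     teleport probability for the tE term, spread uniformly.
        urls = sorted(u for u, _ in ranks)
        n = len(urls)

        # Step 1: sort the adjacency matrix by src, the occupancy vector by url.
        edges = sorted(edges)
        occupancy = sorted(ranks)

        # Step 2: merge-join the two sorted streams, purely sequentially.
        # For each triple (a, b, c) meeting a pair (a, d), emit (b, d*c).
        partials = []
        i, m = 0, len(edges)
        for a, d in occupancy:
            while i < m and edges[i][0] < a:
                i += 1
            j = i
            while j < m and edges[j][0] == a:
                _, b, c = edges[j]
                partials.append((b, d * c))
                j += 1

        # Step 3: the partial sums are in src order, not url order; sort them.
        partials.sort()

        # Step 4: sum per url, damp by (1 - t), and hand the lost mass
        # (teleport plus dangling pages) back uniformly. Handling sD this
        # way is my assumption, not something the parent specifies.
        summed = {u: (1.0 - t) * sum(w for _, w in g)
                  for u, g in groupby(partials, key=lambda p: p[0])}
        leaked = 1.0 - sum(summed.values())
        return [(u, summed.get(u, 0.0) + leaked / n) for u in urls]

    # Step 5: iterate a few times on a toy three-page graph.
    edges = [("a", "b", 0.5), ("a", "c", 0.5), ("b", "c", 1.0), ("c", "a", 1.0)]
    ranks = [("a", 1 / 3), ("b", 1 / 3), ("c", 1 / 3)]
    for _ in range(20):
        ranks = pagerank_pass(edges, ranks)
    print(ranks)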
This approach parallelizes nicely with MapReduce, and doesn't require anything but sequential data access; you could do it with four tape drives and a microcontroller. I know I've seen it in some paper somewhere, but I can't remember where.
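To make the MapReduce fit concrete: step 2 becomes the mapper, the framework's shuffle-and-sort gives you step 3 for free, and step 4 becomes the reducer. A sketch, where the joined (rank, outlinks) input record is my assumption about how you'd stage the data:

    def mapper(url, record):
        rank, outlinks = record           # outlinks: list of (dst, weight)
        for dst, weight in outlinks:
            yield dst, rank * weight      # step 2's partial-sum pair (b, d*c)

    def reducer(url, partial_sums):
        # Step 4, minus the sD + tE correction, which needs a global constant.
        yield url, sum(partial_sums)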
While that is true, this is not the most efficient way to solve the problem if you have a lot of RAM. My point is merely that your laptop can compute PageRank for, roughly, any link graph that will fit on its hard disk, or that it can access at a reasonable speed over the network.
(At 30 links per web page and 5 bytes per link, you need about 160 bytes per web page to represent this link graph. That would allow you to store a 6-billion-page graph on a 1TB notebook hard disk. Unfortunately, making 8 passes over that terabyte will probably take more than a day.)
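For what it's worth, the back-of-envelope checks out, assuming something like 80 MB/s of sustained sequential throughput on a notebook drive (my number, not yours):

    bytes_per_page = 30 * 5              # 150, rounded up to 160 above
    graph_bytes = 6e9 * 160              # 960 GB: just fits on a 1 TB disk
    seconds = 8 * graph_bytes / 80e6     # 8 passes at ~80 MB/s sequential
    print(seconds / 3600)                # ~26.7 hours, i.e. more than a day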