We've just launched our new web app, seoprofiler.com, after crawling the web for over a year with our own crawler running on Amazon EC2 infrastructure. The submitted page shows some of the statistics we've gathered.
I can only speculate. We used seed sites from all American and European countries, including Denmark, so Danish websites should be covered. It might be the geo-IP database we use (maxmind.com), or there really are only 6 sites in Denmark with more backlinks, or Danish webmasters host their sites in other countries.
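To illustrate the last possibility: if sites are assigned to countries by the location of their server IP, a Danish site hosted abroad is counted under the hosting country. The sketch below assumes a tiny hypothetical in-memory geo-IP table built from reserved documentation networks. It is not MaxMind's API or SEOprofiler's actual method, and the site names and IPs are made up.

```python
from __future__ import annotations

import ipaddress

# Hypothetical geo-IP table: network -> ISO country code.
# A real crawler would query a geo-IP database (e.g. MaxMind); these
# reserved documentation ranges are used only for illustration.
GEO_TABLE = {
    ipaddress.ip_network("192.0.2.0/24"): "DK",
    ipaddress.ip_network("198.51.100.0/24"): "US",
    ipaddress.ip_network("203.0.113.0/24"): "DE",
}


def country_for_ip(ip: str) -> str | None:
    """Return the country code of the first network containing ip, if any."""
    addr = ipaddress.ip_address(ip)
    for network, country in GEO_TABLE.items():
        if addr in network:
            return country
    return None


def count_sites_by_country(site_ips: dict[str, str]) -> dict[str, int]:
    """Count sites per country based only on where the server IP is located."""
    counts: dict[str, int] = {}
    for site, ip in site_ips.items():
        country = country_for_ip(ip) or "unknown"
        counts[country] = counts.get(country, 0) + 1
    return counts


# Hypothetical sites: the second .dk site sits on a US server,
# so it is counted under "US" rather than "DK".
sites = {
    "example.dk": "192.0.2.10",
    "shop-example.dk": "198.51.100.7",
    "example.com": "198.51.100.20",
}
print(count_sites_by_country(sites))
```

Under these assumptions only one of the two .dk sites ends up in the Danish count, which is the kind of undercounting described above.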
http://www.seoprofiler.com/company/about compares your data to Google link: queries, which are indeed incomplete. How does your crawl compare to Yahoo! Site Explorer link: queries? The latter seem more complete to me.
SEOprofiler shows the backlinks only for the creativecommons.org subdomain, excluding support.creativecommons.org, wiki.creativecommons.org, search.creativecommons.org, de.creativecommons.org, etc. For this reason, the backlink number appears to be lower.
75,233,046 links to GeoCities sites, enough to rank 42nd in the world. You would think Yahoo could do something better with those pages (e.g. http://www.geocities.com/monkiineko/index.html), given all that incoming traffic.
Right, the parent was commenting that Yahoo is essentially throwing away a site with a very high number of backlinks.
Seems to me they would have been better off making it read-only and then trying to use it to sell other Yahoo services. That is, if they thought the abuse costs of continuing to run GeoCities were too high, and I imagine the abuse costs were the only really serious costs of keeping it running.
According to Yahoo, Yahoo Site Explorer will only run until 2012 because of the Yahoo-Microsoft search deal ( http://www.ysearchblog.com/2010/08/17/search-alliance-update... ). Open Site Explorer is the free promotional site for the commercial Linkscape toolset, so it's similar to us. Google link: queries are very incomplete, which is why webmasters use other databases. In addition, we provide other tools as well.
I think their site list is not varied enough. The fact that they're hitting sites like reddit, the W3C validator and wordpress.org tells me they either built the list from their own browser history or got tripped up by heavily SEO'ed sites in Google results (those tend to be more 'social' as well).
Yes, the lists show the numbers per subdomain, so wordpress.org and en.wordpress.org are counted separately. I suspect that all Google domains combined would be #1 (google.com, maps.google.com, code, picasaweb, sites, news, video, groups).
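As a rough sketch of the difference between per-subdomain and combined counts, the snippet below sums per-host backlink counts into per-domain totals. It assumes a naive "last two labels" rule for the parent domain, which is wrong for suffixes like .co.uk (a real implementation would consult the Public Suffix List). The counts are made up for illustration and are not SEOprofiler data.

```python
from __future__ import annotations

from collections import Counter


def registrable_domain(host: str) -> str:
    """Naively reduce a host name to its last two labels.

    Assumption: ignores multi-label public suffixes such as .co.uk;
    a real implementation would use the Public Suffix List.
    """
    labels = host.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def aggregate_backlinks(per_host: dict[str, int]) -> Counter:
    """Sum per-host backlink counts into per-domain totals."""
    totals: Counter = Counter()
    for host, count in per_host.items():
        totals[registrable_domain(host)] += count
    return totals


# Made-up per-host counts, for illustration only.
per_host = {
    "creativecommons.org": 1000,
    "wiki.creativecommons.org": 200,
    "de.creativecommons.org": 50,
    "wordpress.org": 800,
    "en.wordpress.org": 300,
}
print(aggregate_backlinks(per_host).most_common())
# [('creativecommons.org', 1250), ('wordpress.org', 1100)]
```

A per-subdomain list would show creativecommons.org with 1000 here, while the combined view credits the domain with 1250.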