Websites with the most backlinks (seoprofiler.com)
23 points by andrevoget on Sept 7, 2010 | 19 comments

We've just launched our new web app seoprofiler.com after crawling the web for over a year (our own crawler, running on Amazon EC2 infrastructure). The submitted page shows some of the statistics that we've gathered.

Interesting concept.

But this makes me wonder: http://toksvaerd.net/ is number 7 on the list for Denmark. It's apparently a site for parents of school children in a small town. The site has had about 300 visits in 2010 according to their counter (http://www.counter.gd/Homepage-193570.html). Are there really only 6 sites with more backlinks in Denmark? http://www.seoprofiler.com/statistics/top-backlinks/denmark

I can only speculate. We used seed sites from all American and European countries, including Denmark, so Danish websites should be covered. It might be the geo IP database that we use (maxmind.com), or there are really only 6 sites with more backlinks in Denmark, or Danish webmasters use other countries to host their sites.

Looks like some fishy sites are linking to that page. http://www.google.com/search?q=links%3Atoksvaerd.net&ie=...

Maybe it was previously hacked and part of a SEO-spam network?

http://www.seoprofiler.com/company/about compares to Google link: queries, which are indeed incomplete. How does your crawl compare to Yahoo! Site Explorer link: queries? The latter seem more complete to me.

In our tests, Yahoo Site Explorer sometimes shows more links and sometimes fewer links than we do. It depends on the subdomain.

SEOprofiler shows the backlinks only for the creativecommons.org subdomain, excluding support.creativecommons.org, wiki.creativecommons.org, search.creativecommons.org, de.creativecommons.org, etc. For this reason, the backlink number appears to be lower.

No, that's not the reason. As you can see by clicking "subdomains", they only account for 1.1% of links seoprofiler knows about.

75,233,046 links to Geocities sites, enough to rank 42nd in the world. You would think Yahoo could do something better with it (e.g. http://www.geocities.com/monkiineko/index.html) given all that incoming traffic.

Yahoo took Geocities offline earlier this year. There are at least 6 copies of varying completeness available.

and something from http://www.internetarchaeology.org/ but I can't find the link.

Right, the parent was commenting that Yahoo is essentially throwing away a site that has a very high number of backlinks.

Seems to me that they would have been better off letting it go read-only and then attempting to use it to sell other Yahoo services (I mean, if they thought the abuse costs of continuing to run Geocities were too high, and I imagine that the abuse costs were the only really serious costs associated with continuing to run it).

When they killed it, it had a higher Alexa rank than Digg, too.

Looks to me like seoprofiler.com wasted a year of their lives. This data is already free from https://siteexplorer.search.yahoo.com and http://www.opensiteexplorer.org/ and by doing a site: query on Google.

According to Yahoo, the Yahoo Site Explorer runs only until 2012 because of the Yahoo-Microsoft search deal ( http://www.ysearchblog.com/2010/08/17/search-alliance-update... ). Open Site Explorer is the free promotion site of the commercial toolset Linkscape, so it's similar to us. Google link queries are very incomplete, which is why webmasters use other databases. In addition, we provide other tools as well.

I think their site list is not varied enough. The fact that they're hitting sites with reddit, w3c validator and wordpress.org tells me they just built the list from their own browser history, or got tripped up by heavily SEO'ed sites in Google results (those tend to be more 'social' as well).

Are wordpress.org and en.wordpress.org mutually exclusive? If so, they'd be #1 if the two were combined.

Yes, the lists show the numbers for subdomains so wordpress.org and en.wordpress.org are mutually exclusive. I suspect that all Google domains combined would be #1 (google.com, maps.google.com, code, picasaweb, sites, news, video, groups).
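
For anyone who wants to try the "combined" view on their own numbers, here is a rough Python sketch of summing per-subdomain backlink counts into one total per domain. The function names and the sample counts are made up for illustration and are not seoprofiler's data or method. It also assumes the domain is simply the last two labels of the host name, which is wrong for suffixes such as co.uk; a real version would need the Public Suffix List.

```python
from collections import defaultdict


def registered_domain(host: str) -> str:
    """Naive guess at the registered domain: keep the last two labels.

    Assumption for illustration only; breaks on suffixes like co.uk.
    """
    labels = host.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def combine_by_domain(counts: dict[str, int]) -> dict[str, int]:
    """Sum per-subdomain backlink counts into one total per domain."""
    totals: dict[str, int] = defaultdict(int)
    for host, backlinks in counts.items():
        totals[registered_domain(host)] += backlinks
    return dict(totals)


# Hypothetical counts, not taken from the seoprofiler.com lists.
sample = {
    "wordpress.org": 100,
    "en.wordpress.org": 50,
    "google.com": 40,
    "maps.google.com": 30,
}

combined = combine_by_domain(sample)
# Rank domains by combined backlink count, highest first.
ranking = sorted(combined.items(), key=lambda item: item[1], reverse=True)
```

With per-subdomain lists, each host is ranked on its own; a grouping step like this is what it would take to answer whether a whole domain would come out on top.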

You're right, but it doesn't matter because Google, et al. "see" subdomains as separate entities, just the same as TLD names.

