Top 50,000 paying Adsense keywords (purchased from a shady looking website) (docs.google.com)
105 points by iamelgringo on April 26, 2010 | 39 comments



So, last March, after two startup ideas bombed, I was doing some research on where online advertising revenue was being spent. I figured that if I was going to even partially monetize with ad revenue, it made sense to figure out what topics and niches were paying.

As part of that research, I ended up purchasing a list of the top 50k paying keywords off a shady looking website. The site only took PayPal, and they were only charging 50 bucks, so I figured it was worth it, and I wasn't risking too much aside from exposing my PayPal account, so I bought it.

I was pretty skeptical of the data at first, but I randomly spot checked around 50 search terms, with my Adsense account, and they seemed to correlate pretty closely.

I then wrote a script to cross reference how much people were paying for each set of keywords with how many results were returned for those keywords. That gave me a ratio of High cost keywords::number of results, which was enlightening.
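A minimal sketch of that kind of cross-reference, for anyone who wants to try it. This is not the original script: the function names and the sample numbers below are made up for illustration, and it assumes you already have a cost per keyword and a result count per keyword loaded into two dicts.

```python
# Hypothetical sketch: the names and the sample figures are illustrative
# assumptions, not values taken from the purchased CSV.

def cost_per_result(cpc_by_keyword, results_by_keyword):
    """Return {keyword: cpc / result_count} for keywords present in both inputs."""
    ratios = {}
    for keyword, cpc in cpc_by_keyword.items():
        results = results_by_keyword.get(keyword)
        if not results:
            # Skip keywords with a missing or zero result count.
            continue
        ratios[keyword] = cpc / results
    return ratios


def rank_by_ratio(ratios):
    """Sort keywords so the highest cost-to-results ratio comes first."""
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)


# Made-up sample data: cost per click and number of search results.
cpc = {"mesothelioma": 50.0, "ameratrade": 40.0, "visa": 5.0, "orphan": 1.0}
results = {"mesothelioma": 10_000_000, "ameratrade": 20_000, "visa": 500_000_000}

ranked = rank_by_ratio(cost_per_result(cpc, results))
for keyword, ratio in ranked:
    print(f"{keyword}: {ratio:.2e}")
```

A high ratio here means an expensive keyword with relatively few competing pages, which is the kind of niche the ratio was meant to surface.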

Mind you, this data is suspect. Google doesn't charge a fixed rate for its advertising. From what I understand, each time an ad is served, the price is essentially set by an auction and varies accordingly. Also, I bought this data off a shady looking website during a really low point in one of the worst economies in decades. Caveat emptor.

That being said, I thought you might find it interesting.


Do you have the rights to re-distribute this data?


There was no guidance whatsoever on the website I bought it from as to whether or not I could redistribute it. I agreed to nothing, and I didn't click "next" on any licensing terms. The site I got it from was really pretty sketchy. The site was only 3 or 4 pages: a description of the product, a price list, a "buy now" page and a download page. I clicked the buy now button, paid $50 and downloaded a CSV. No license, no agreements. Nothing.

Also, it was in March '09 that I did this, not March '10. Sorry about that.

I'm really not trying to be shady at all with this, and I really had no idea that people would be uncomfortable with something like this or think it's wrong.

If people have serious moral qualms about this being here, I'm happy to take it down. I just thought it was an interesting hack so I thought I'd share it.


By the Berne Convention, everything is copyrighted on creation. And, to the best of my knowledge, the default is that you can't distribute a copyrighted work unless you have an explicit license to do so.

For example, I can walk into a book store and buy Harry Potter for cash, without signing a license agreement. But, that doesn't mean I can redistribute its contents.

You are usually allowed to resell what you bought under the first sale doctrine (but not to sell copies of it).

IANAL, etc etc


I think there is an exception for "facts," although it may only apply to individual facts, not a collection of them. Anyone here know more about this?


I'm not a lawyer, but per Feist v. Rural, facts cannot be copyrighted in the United States. Collections of facts may be eligible for copyright, but it requires authorship, and hinges on the creativity of presentation (the author must select which facts to include, how to present them, etc.) but the facts within the collection are not entitled to copyright protection. If this collection was generated automatically by accessing the Google API, it would probably not qualify for copyright protection based on a lack of creativity in its creation.

http://caselaw.lp.findlaw.com/scripts/getcase.pl?court=US...


Facts can't be copyrighted. There's no problem with redistributing the data.


FWIW in the UK (and Europe I think) there are database rights too: http://en.wikipedia.org/wiki/Database_rights

Where copyright is termed so as to protect a unique arrangement of data then an ordered listing would appear to be protected.


Facts and ideas can't be copyrighted, but their expression and structure can.

http://www.templetons.com/brad/copymyths.html

This is definitely a copyright violation, and I am shocked and enraged that this was voted so highly on HN and not flagged.

[Edits]

1. The standard "Recipes and collection of recipes" argument. This is a collection of recipes.

2. I am not shocked that there is a copyright violation. I am shocked that such an egregious violation has been voted so highly on HN. We are all digital workers on HN, and such disregard for digital work is (still) shocking.


Enraged? Your reaction doesn't fit the crime, as is so often the case with copyright.


Expression/structure can be copyrighted, but that's also not the whole story: http://en.wikipedia.org/wiki/Feist_Publications_v._Rural_Tel...

If this document merely lists the keywords in ascending/descending order of popularity, it is unlikely that the author's expression is creative enough to warrant copyright.


It is ill-regard of the system, not disregard of the work.


Since when has Hacker News become Pirate Bay?


Vincent has a point, I don't feel comfortable looking at this even if the source was shady.


Then, don't look at it.


I throw myself on the court of community opinion. Please feel free to flag this post if you think it's unethical or wrong. And, the mods and/or the HN algorithm will mark it dead.

I'm really not trying to be an ass. I thought it was interesting, and the recent post where the author was driving traffic to his blog by posting the top 50 keywords reminded me of it. I did this project over a year ago in March of '09. I don't think I'm giving away any trade secrets at this point.

In the spirit of openness, I thought I'd share the whole list, which I'm sure is the one he bought. But, as I said, it's really no skin off my nose. Feel free to flag this if you think it's inappropriate. I'm really not up for getting flamed or arguing.


Well, you certainly convinced me to download with this comment :]


Very kind of you to share, I've seen a lot of data like this in the past, it looks pretty familiar in terms of the type of stuff on there.


These surface on the SEO underground every once in a while. I personally don't think the data is that valuable except for getting less-than-savvy folks to pay scads of money for a CSV file (present company excepted, most folks who buy this are going to be the DigitalPoint "How do I make $1 a day with AdSense!?" crowd that buy most Make Money Online products).

That said, the existence of lists like this always makes me shoot a baleful glare in the general direction of Google's "transparency." They are very happy to give away all the rest of the world's information, but anything Big G thinks is commercially sensitive for Google is guarded like Fort Knox. (Did you know AdSense users can't even tell you the CPC rates they get without violating the TOS?)


You've hit the nail on the head. I didn't really want to bring up that issue, but I'm glad that you did. Google also makes it against their TOS to automatically crawl their search content: http://www.google.com/support/webmasters/bin/answer.py?answe...

IMHO, that's patently absurd considering the business they are in.

re: transparency:

In what other business would it be remotely okay to not talk about the purchase price, or the sales price of an item? That's just patently silly.

And, the transparency really only gets worse the closer you get to Google headquarters. I can't tell you how many Googlers I end up having brief chats with who can't talk about what they do or what they are working on in more than very vague terms. It's creepy.

re: I personally don't think the data is that valuable except for getting less-than-savvy folks to pay scads of money for a CSV file

You're absolutely correct. The list is really not that valuable, but you see people take that information and run with it, putting out hundreds of cheap SEO sites that target mesothelioma. Where I did find value in it was in categorizing sectors where people pay a lot of money for customer lead generation. That's Legal and Financial.

So, I mulled that idea around for a while, and I decided I would try to focus my startup, http://Newsley.com, on the Economic and Financial News space. One quarter of all online ad spend is devoted to financial services marketing. It's a 3.5 billion a year industry. And I don't really see financial services ads that much, so they must be paying a lot.

I've been taking Newsley in a different direction now that I'm partway into the business. I'm talking to a lot of people who manage large sums of money for a living, and I'm picking their brains about their news finding and filtering problems. It's been really valuable. My current business has really nothing to do with the original list, but the list at least pointed me in a certain direction, and I'm quite happy I did it.


I'm flagging you. I was one of the people paying 50 bucks for it, and this just starts some kind of stupid DRM battle. The guys did the work to accumulate the data, they're not asking for your firstborn. Giving it away is just screwing over somebody else's business model simply because you can.


So somebody spent the time to figure this data out, is selling it, and you're now giving away their product for free? Is there more to this story?


It's forbidden right now. Is there any other URL?



[deleted]


The problem is you don't know which ads are effective and who's just pissing money away (not to mention A/B testing). And what works for a competitor won't always work for your site/product.


Likely this data comes from Google's keyword tool, which has a paid API.

semisland.com/semrush.com has a lot of good data of this type. Also, they scrape the ads and organic rankings for a ton of keywords.


For those that don't want to download, I'll give you a major theme in this after a quick perusal.

1. It's very hard to bid on trademarked names. For example "Disney" may take issue with your "Orlando on $20 a day" clickbank ebook ad.

2. It's a little less competitive if you use misspellings. So bidding on "Dinsey" could provide higher CR. This is a fairly well-known theme.

Do note, my example was not included in the file.


Caveat: a trademark protects the owner from you attempting to confuse the public into buying your product because they think it comes from the trademark owner. Using typos is attempting confusion. Thus simply relabelling things as "dinsey" doesn't avoid a potential trademark suit.


Has anyone actually looked at these? Most of them have typos (e.g., ameratrade instead of ameritrade).

I don't think this is the top 50k search terms, maybe it's the top 50k when there was a google suggest or something?

I refuse to believe "visaa" ranks higher than "visa" and that "ameratrade" was #1.


It's not a list of search volume, it's a list of the top keywords ordered by CPC price in AdSense. Some (most?) likely have little to no search volume at all.


Ah, my mistake, thanks. Makes sense then that one-offs are the most valuable (lowest competition, latches on to a real one).


Also it could be that people who misspell these terms are easier to monetize.


Now that would be a fascinating study indeed! The correlation of misspelling and impulse shopping :)


A lot of trademarked terms (e.g. "visa") might not be available as AdSense keywords. So people bid on the misspellings.
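If you want to see how many entries in a list like this are near-misses of a brand name, here is a rough sketch using Python's standard difflib. The brand list, the 0.8 cutoff, and the function name are all assumptions picked for illustration; nothing here is derived from the actual file beyond the examples mentioned in this thread.

```python
import difflib

# Hypothetical helper: flags keywords that closely resemble, but do not
# exactly match, one of a hand-picked set of brand names.
def find_brand_typos(keywords, brands, cutoff=0.8):
    """Return {keyword: closest_brand} for likely misspellings of a brand."""
    brand_set = set(brands)
    typos = {}
    for keyword in keywords:
        if keyword in brand_set:
            # Exact brand names are not misspellings.
            continue
        # get_close_matches returns the best matches at or above the cutoff.
        matches = difflib.get_close_matches(keyword, brands, n=1, cutoff=cutoff)
        if matches:
            typos[keyword] = matches[0]
    return typos


# Sample keywords taken from examples mentioned in this thread.
keywords = ["visaa", "ameratrade", "visa", "mesothelioma"]
brands = ["visa", "ameritrade"]  # assumed brand list, supply your own

typos = find_brand_typos(keywords, brands)
print(typos)
```

The cutoff is a judgment call: set it lower and unrelated words start matching, set it higher and transposition typos like "dinsey" may slip through.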


10 Big Myths about copyright explained http://www.templetons.com/brad/copymyths.html


And the relevant tidbit: "Facts and ideas can't be copyrighted, but their expression and structure can. You can always write the facts in your own words though."


It's not a fact, it's a database and databases are protected by law.


This depends on your jurisdiction.


Has anyone downloaded this yet?



