Hacker News new | past | comments | ask | show | jobs | submit login
Tutorial: Get Google search terms and rank from HTTP referrer (pressbin.com)
83 points by jawns on Nov 29, 2010 | hide | past | favorite | 36 comments



Sorry to moan.. but it makes me quite angry that this is getting so many points.

Not only is it a pretty dull code snippet for which a combination of Google Analytics and Google Webmaster Tools would do a better job, but also the code has many flaws.

Even with the blatant SQL injection fixed there's still redundancy in the database - why store results_url separately when you can just use something like

    echo '<a href="http://google.com/search?q='.htmlspecialchars($row['search_term']).'">results url</a>';
and save storing all that duplicate data for every referral?

There's also no error handling - what if you get a referral from example.com/google/page.html? Boom! undefined index 'q' on line 5.

Why's $refer being stored on line 1 only to switch back to using $_SERVER['HTTP_REFERER'] again on line 3?

Why copy $vars into three separate variables at all?

And even if all this was fixed and the code was perfect, what's the point of this submission? Why's it interesting? What do we learn from this?

Again, I know complaining like this is frowned upon, but now and then I think it's useful to have some discussion about what's a good submission and what's not. If this faded out at 10 points then the system would be working. As it is it's been highly ranked for hours.


Hi user24,

As the author of the post, I'd like to respond.

When I wrote the code for this example, my main point was simply to illustrate how to extract certain useful bits of data from Google's referrer URL, and explain what they mean.

I was not intending it to be production-ready code -- that's why there's no error handling.

Indeed, as one other commenter noted, the storing of the values in a database could have been left off entirely.

What's the point of the submission, you ask? Well, Google's referrer URL uses variables that are not very intuitive. Who would know, just by looking at it, that the "cd" value represented the position at which a link appears on the Google search results page?

So I wrote this post for people who might see this referrer in their logs and want to know what it means. And while Google Analytics and Google Webmaster Tools do provide essentially the same information (and more!) there may be limited cases in which a person might want to use the information contained in the referrer URL within their script, e.g. "Hey, you searched for such-and-such. Here are some other posts you might like."


Hey,

I didn't mean to offend. And I've also been guilty of releasing non-production-ready code.

And if you'd just included that kind of disclaimer clearly up front I don't think I'd have felt so compelled to comment on this.

As far as the point of the submission. Yes, I found it interesting to learn about Google's referers, but I think I'd rather have just read about that, rather than seeing the code too. It was unclear whether the article was "Hey lets deconstruct google referers" or "Hey lets play with some PHP". The former is interesting, the latter not.

Thanks for the reply.


I am learning just by reading the comments of you guys correcting/sh!tting on the poster. As a noob to PHP it is sometimes nice to see what not to do.


Oh there's some benefit to be squeezed out of it I agree. It was news to me that google are sending this info.

These comments aren't going to be available to everyone who hits OP's site. If we're lucky, it's not a popular site and no-one will see it. If we're unlucky, the next web app you use could be coded by someone self-taught by articles like this.


That's what the flag link is for.


yes, and I used it when I saw this submission back when it was on 7 points. But it appears that enough people didn't that this submission has remained on the front page for over 3 hours. So, as I explained, I felt that system of flagging wasn't working in this case and that a comment was warranted.

I could as well reply to your comment that that's what the downvote button is for.


PHP has methods for URL parsing. There's no need to approximate them with regexes:

  $query_string = parse_url($_SERVER["HTTP_REFERER"], PHP_URL_QUERY);
  parse_str($query_string, $vars);
  $term = $vars['q'];
  $rank = $vars['cd'];
  $url = $vars['url'];
Correct code is even shorter.


That's awesome! Definitely gonna update the post with your code.


And please, please fix the SQL code. It isn't helping noobs (it's not even valid SQL), and it's an eye-sore for people who know how to insert 3 variables into the DB.


nice to see people are still oblivious to sql injection and using string interpolation instead of prepared statements.


Good point. I should note that I'm not oblivious, but I wanted the example to be as readable as possible. I've added a note at the bottom of the post to make it clear that the code should be modified before using it in a live appplication.


Simple code examples with glaring security holes are a great way to teach people how to write insecure code. If you show usable code in a tutorial it will make it into a production environment somewhere.


After years of blogging, occasionally on programming-related topics, and years of managing an outsourcing operation, my guiding star for all code samples is "this will be used without modification by the least competent member of the cheapest team identified by a non-programmer working in the procurement department of the company which handles my mother's medical insurance."

If you have a blog where you routinely discuss topics that might come up in trivial back office code, you can probably think of a comment or email or hundred from the type of developer I'm thinking of.


Is that really a concern, though? People ignoring warnings like the one at the bottom of the post are probably going to write insecure code anyway. From a purely selfish standpoint, there is no advantage to me from not posting insecure code examples. In fact, the more people copy-paste such bad code and make sites that get hacked, the more opportunities I have for work when they get fired.


That really is pretty selfish.

I'm thinking there are two sides to this, the first is that just like you added the 'don't use this' as an after thought the majority of the people that find your code will cut and paste it without actually reading the article, the second is that if this is your 'first approach' to keep it readable you probably have at least a few instances where you forgot to update to more solid code at a later stage because you thought 'x' or 'y' is not facing the web at the moment. And then one day someone bridges two systems and bang, security hole.


I know it's selfish, I just thought it was a direction of thought worth exploring. Personally I wouldn't post insecure code without a lot more warnings, closer to (or commented in) the code itself. At the same time I would have absolutely no sympathy for a "programmer" that would copy paste code without even reading the entire blog post it's from, let alone make an effort to understand it. They don't deserve their job.

Your point on 'first approach' security holes accidentally being persisted is a good one, and I can certainly think of a few bits of code I wrote that were never meant to be secure, but could potentially be used in a larger, web-facing project at some point. Some food for thought there on perhaps never writing insecure code, even if it's just a test.

Tangential addendum: If security is Done Right, then there shouldn't be a choice between "easy to write, read and follow" and "secure".


I'd say that anyone who needs a tutorial to tell them how to do this in PHP probably doesn't have the skills to fix the security hole. You're setting people up to fail, which isn't very nice (although I'm sure you didn't intend it that way).

Seriously, it's worth making it a tiny bit less readable to makie it copy/paste safe!


There is a common misconception that secure code has to be by definition less readable and harder to write. The upvotes on the parent comment suggest that even HN readers think that way.

I believe that this is a false dichotomy. Good practices and separation of concerns often increase code readability. For example I think the updated version of your code with explicit parameter binding is much more readable than string concatenation.


If readability is your goal, the code should be focused on the task you're trying to teach. Code to put the result in a database is ancillary and could be cut, for a positive impact on readability.


It's clearly a proof of concept code focusing on the actual topic, and he begins the code with "Here's an example".


A very important and effective technique in teaching others (regardless of the subject) is to never write or display something that's wrong, because it's often the only thing a student will remember. This is obvious when it comes to language, but it applies to pretty much anything.


Is that better or worse than not teaching anything because your student couldn't pick out the 10 lines of actual functionality among the 100 lines of paranoiaplate in each example?


I take your point but unshift is still correct. One of the reasons PHP has a reputation for being insecure is because there are so many tutorials lying around that show the wrong way to do it. When newbies Google for solutions, they copy + paste what they find, they don't (bother | know that they should) look through it to see if there are any potential security holes.


Based on everyone's feedback, I've switched the example to the more-secure prepared statements.

Thanks, everyone!


As an aside, did anyone read the second post, about turning extensions of the URLs from .php to .html? That's trading "wrong" for "more wrong". Just rewrite your URLs to have a reasonable hierarchy and you're done, no need to add any cruft like extensions.


hm, old? saw that first time in 2009 :)

here is a tutorial how to track these data directly in analytics: http://yoast.com/track-seo-rankings-google-analytics/


Surprised this isn't yet in Google Analytics.


You can write a custom filter in GA to pull this data out into a report in a separate profile:

http://www.seomoz.org/blog/tracking-organic-ranking-in-googl...


You can get the data if you're signed up to google webmaster tools. Have been able to for a while. And that includes info on times when your page appeared but wasn't clicked on.


Wait, Google has been passing rank in the referrer? Doesn't this completely kill the business model of SERP monitoring services like GinzaMetrics?


As far as I know, this won't work if the user came from https:// - the referrer will be empty, won't it?


That's right, but the article is about Google search, so the referrer will be hidden only if the user comes from https://encrypted.google.com


You need something a bit more complex to parse the referrer from a Google Image search, due to the way Google links images to another page on their site instead of directly to your page. You end up with a nested URL within the referrer that needs to be double-decoded to get back the original search term, or something along those lines.


I am looking at my apache access logs, and clicking to my site from a search query and I can't see the 'cd' / rank parameter. Does this work for other people?


Wow, I saved every referrer for each visitor to BitBuffet.com, I had no idea that it contains the rank! Very neat, now I will peruse my database...




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: