Hacker News new | past | comments | ask | show | jobs | submit login
SQL injection search (github.com/search)
286 points by mike_esspe on June 1, 2013 | hide | past | favorite | 107 comments





this one seems to produce quite a lot of false-positives though, doesn't it?


Those all look like XSS vulnerabilities to me.


Of course this is a pretty good starting point to find XSS, but I still see quite a few false positives:

the first result I see currently: https://github.com/matsprehn/122B/blob/1d54d2a72f25a23d63ff7...

also spotted this, which looks pretty harmless: https://github.com/cameroni2003/picgrid/blob/0b3becda1f250ef...

a lot others look similar, plus it depends on context...


Yes, sure. Those two do not reveal vulnerabilities just from the results, but I have to wonder at not using templates on the first.


Is it an XSS if you can only make it output code to your own browser? I can already execute whatever JavaScript I want in the console, so what's the advantage of having the server deliver that code to (only) me?

I can definitely see the issue if the server saves the unfiltered input and tries to print that out for other uses, but it seems to me that outputting a raw $_GET variable will only go to the requester and therefore could only run unfiltered code on that requesters machine.

EDIT: Answering my own question:

This is a security hole because the unsafe JavaScript is stored in the URL for the page (it is a GET parameter). This bad URL could then be sent as a link in a spam email or similar. Victims clicking on the link would then see a page that comes from the legitimate source, but is running unsafe code compromising that user's session. This sort of attack relies on the attacker distributing the link with the bad code as a URL parameter, and is not a vulnerability that a user could encounter when just visiting that site as I had first assumed.


That's _exactly_ what XSS is about. One possible way to exploit things like this is if I send you a link to a website, that embeds the target page through an iframe with javascript output injected. I could then have the JS steal your cookies/session or worse.


Or bit.ly/shortener links - you got it. I used to think like you did - what harm is there in 'exploiting' my own browser? Took a while for the penny to drop in my case.


Yes. They are referred to as "reflective" or "non-persistent" XSS vulnerabilities. The attacker might exploit these using for example an "invisible" iframe on a different website, and thus loading the vulnerable website, in the background, with the desired parameters. This will result in the malicious javascript being executed "on the vuln. website", in the victims browser.

This (http://web.math.jjay.cuny.edu/fcm791/web2.0_Vulnerabilities....) is a pretty good paper (jump to page 7) if you are interested.



Heh, cute. This means Github could probably do some automated means of informing these people that their code is insecure and would be a danger to themselves and their users. I'm not sure if they should, but it's interesting that they could.


There's definitely an opportunity for a service to help developers spot obvious security holes.

https://codeclimate.com/ is one I've used but it's Ruby only AFAIK.


Lots of these exist. Check out whitehatsec.com and veracode.com


I'm pretty sure anyone can write a github bot. I remember there used to be several (some of which would submit pull requests!)


GitHub has not been friendly to bots in the past[1].

[1] https://news.ycombinator.com/item?id=4982240 "GitHub Says ‘No Thanks’ to Bots — Even if They’re Nice"


The same bot could notify those people via email using the email address found in commits.


This is a potentially great idea. You could make your build process include submitting your code to a search engine like this (perhaps in some obfuscated manner) and making illegal patterns fail if not manually "approved". Just because the halting problem exists doesn't mean there's not a low hanging fruit in approaching it.


Yep, It's called 'static analysis'. http://en.wikipedia.org/wiki/Static_program_analysis


Heh. I sure romanticized a semi-mundane thing that already exists. ;-)


Don't worry. That's basically what every tech company does these days anyway.


Tons of code review tools already exist out there, runnable on your CI machine. Sonar and Fortify are a few off the top of my head, but there are many, many more.

Nothing new here. Google code search a while back was similar. Also search for index.php~, a great way to see the source for an index page with a unix temp file that Apache will forward as plain text. Oops.


Not sure if this search would be good, but your responses triggers the id of a distributed code review like reCAPTHA. Possibly a pattern search brings up suspicious code and then there is distributed review and once enough people flag it as SQL injection, it could be flagged on Github and someone could submit a patch, etc. Clearly not bulletproof, but this would provide a way of doing code review on projects you might have no other interest in helping.


This isn't a search for SQL injection, its a search for a couple things that you often find in older PHP code that is generally hacked together and likely to have SQL injection vulnerabilities for historical and cultural reasons. However it's perfectly easy to avoid SQL injection even using these things.

    $id = mysql_real_escape_string($_GET['id']);
    $res = mysql_query("SELECT foo FROM bar WHERE id='$id'");
That may be ugly, but it's bulletproof regarding injection.


"mysql_real_escape_string" is the silliest function name ever. I assume there is a "mysql_escape_string" function that doesn't do what you expect it to do?


PHP is a simplistic wrapper around the C API's, so you get all of the legacy of C without any of its performance and memory size benefits.

PHP has some of the insanest defaults due to its C heritage. String functions deal with bytes not characters, so cannot be used safely with utf8 without setting the mbstring.func_overload setting to replace them with unicode-aware versions (except for str_pad, which always deals in bytes). Sort() defaults to binary sorting, and cannot be tricked in any way to sort utf8 in dictionary order if you're running your server on windows (and even on linux it requires an extra parameter on every call). Natsort(), which is supposed to sort like a human would, cannot be made to sort in dictionary order at all. The proper way to sort is by using the Collator class, which is not referenced from the sort() documentation, didn't exist before PHP 5.3, and is in the optional intl extension which is usually disabled by default.

Still better than mysql though, which has a very unique interpretation of unicode collation.


Surprisingly, no one mentioned that mysql_real_escape_string still fails to properly escape a string when the character set has been changed using a SET NAMES or SET CHARACTER SET query. Never mind the big yellow warning sign in the manual.

The search goes like this: https://github.com/search?q=mysql_query+%22SET+NAMES%22+mysq...


The difference between the two is _real_ reacts based on the current connection's character set, and the other doesn't. Yet another reason to jump on the PDO bandwagon is to avoid having to know that subtle difference.


They got the name from mysql's client library API. They just made a wrapper for it.


Don't forget strstr(). So nice, they named it twice.


In PHP's defense, strstr() originally comes from C: https://developer.apple.com/library/mac/documentation/Darwin...


PHP's naming of functions is really horrible.

For example the random usage of underscores: strtoupper(...) substr_compare(...) str_split(...) str_word_count(...)


I'm not a PHP dev, but I have heard that mysql_real_escape_string is not a preferred method of preventing SQL injection anymore?


Currently, the use of PDO is preferred and anything involving the mysql libraries should be avoided, and support for them is being deprecated in PHP anyway.

I found this interesting, though, regarding specifically SQL injection when mysql_real_escape_string is used: http://stackoverflow.com/questions/5741187/sql-injection-tha...

basically the argument appears to boil down to mixed character sets causing escaping not to act as predicted. I can't speak to the validity of it though.


Support for the mysql_ library is being deprecated. Support for the mysqli library is alive and well (although I personally think people should adopt PDO anyway.)


It's not. The preferred method of preventing SQL injections is via prepared statements. mysql_real_escape_string is only suitable for strings (as the name implies). Something like

    SELECT * FROM table WHERE id=$_GET['field']
where $_GET['field'] has been passed through mysql_real_escape_string is still vulnerable. Using prepared statements forces php to send data to the DBMS in such a way that it cannot confuse user input from the actual SQL. This is due to the fact that preparing data forces you to give types to the data before you use it in a query. Escaping input (such as with mysql_real_escape_string) makes this confusion still possible.


Most of the search results show exploitable code. Regardless, this is a search for SQL injection vulnerability, even if not every search result falls under that category.


    use PDO;
    use PDOException;

    /**
     * Used for interacting with the database. Usage:
     * <pre>
     * $db = Database::get();
     * $db->call( ... );
     * </pre>
     */
    class Database extends Obj {
      private static $instance;
      private $dataStore;

      /**
       * Sets the connection that this class uses for database transactions.
       */
      public function __construct() {
        global $dbhost;
        global $dbname;
        global $dbuser;
        global $dbpass;

        try {
          $this->setDataStore(
            new PDO( "pgsql:dbname=$dbname;host=$dbhost", $dbuser, $dbpass ) );
        }
        catch( PDOException $ex ) {
          $this->log( $ex->getMessage() );
        }
      }

      /**
       * Returns the singleton database instance.
       */
      public function get() {
        if( self::$instance === null ) {
          self::$instance = new Database();
        }

        return self::$instance;
      }

      /**
       * Call a database function and return the results. If there are
       * multiple columns to return, then the value for $params must contain
       * a comma; otherwise, without a comma, the value for $params is used
       * as the return column name. For example:
       *
       *- SELECT $params FROM $proc( ?, ? ); -- with comma
       *- SELECT $proc( ?, ? ) AS $params; -- without comma
       *- SELECT $proc( ?, ? ); -- empty
       *
       * @param $proc Name of the function or stored procedure to call.
       * @param $params Name of parameters to use as return columns.
       */
      public function call( $proc, $params = "" ) {
        $args = array();
        $count = 0;
        $placeholders = "";

        // Key is zero-based (e.g., $proc = 0, $params = 1).
        foreach( func_get_args() as $key => $parameter ) {
          // Skip the $proc and $params arguments to this method.
          if( $key < 2 ) continue;

          $count++;
          $placeholders = empty( $placeholders ) ? "?" : "$placeholders,?";
          array_push( $args, $parameter );
        }

        $sql = "";

        if( empty( $params ) ) {
          // If there are no parameters, then just make a call.
          $sql = "SELECT recipe.$proc( $placeholders )";
        }
        else if( strpos( $params, "," ) !== false ) {
          // If there is a comma, select the column names.
          $sql = "SELECT $params FROM recipe.$proc( $placeholders )";
        }
        else {
          // Otherwise, select the result into the given column name.
          $sql = "SELECT recipe.$proc( $placeholders ) AS $params";
        }

        $statement = $this->getDataStore()->prepare( $sql );

        //$this->log( "SQL: $sql" );

        for( $i = 1; $i <= $count; $i++ ) {
          //$this->log( "Bind " . $i . " to " . $args[$i - 1] );
          $statement->bindParam( $i, $args[$i - 1] );
        }

        $statement->execute();
        $result = $statement->fetchAll();
        $this->decodeArray( $result );

        return $result;
      }

      /**
       * Converts an array of numbers into an array suitable for usage with
       * PostgreSQL.
       *
       * @param $array An array of integers.
       */
      public function arrayToString( $array ) {
        return "{" . implode( ",", $array ) . "}";
      }

      /**
       * Recursive method to decode a UTF8-encoded array.
       *
       * @param $array - The array to decode.
       * @param $key - Name of the function to call.
       */
      private function decodeArray( &$array ) {
        if( is_array( $array ) ) {
          array_map( array( $this, "decodeArray" ), $array );
        }
        else {
          $array = utf8_decode( $array );
        }
      }

      private function getDataStore() {
        return $this->dataStore;
      }

      private function setDataStore( $dataStore ) {
        $this->dataStore = $dataStore;
      }
    }
Example usage:

    $db = Database::get();
    $result = $db->call( "is_existing_cookie", "existing", $cookie_value );

    return isset( $result[0] ) ? $result[0]["existing"] > 0 : false;
Another example:

    private function authenticate() {
      $db = Database::get();
      $db->call( "authentication_upsert", "",
        $this->getCookieToken(),
        $this->getBrowserPlatform(),
        $this->getBrowserName(),
        $this->getBrowserVersion(),
        $this->getIp()
      );
    }
Switching to PDO is better. Critiques welcome on Code Review SE.

http://codereview.stackexchange.com/questions/26507/generic-...


I don't know much about PHP but I happened to rewrite some old forms a couple of years ago. The original author had relied on a technique called "magic quotes" (http://php.net/manual/en/security.magicquotes.php) which automatically sanitized user input. When we upgraded our version of PHP "magic quotes" had been deprecated and dropped.

It would be interesting to know if some of these developers are relying on "magic quotes" or something similar... and also to know how large share of the total number of projects these projects represent.


Yeah no, magic quotes didn't ever sanitize.


There's a joke to be made here about "broken crypto search".


I'll make that joke, but it won't be very funny.

https://github.com/search?p=2&q=MD5+password+extension%3...

https://github.com/search?q=CURLOPT_SSL_VERIFYHOST+NOT+depre...

There's more low-hanging fruit, if you're willing to use more specialized searches. For example, guess what mode of operation the PyCrypto library uses by default for all its block ciphers if you don't explicitly pick a sane one:

https://github.com/search?p=1&q=extension%3Apy+crypto+ci...



Also, look for "cookie secret" or anything which looks like a cert file.


Looking around I found a simple CMS sold to small online stores. Through their links you can find a listing of their customers (people who use their CMS). Problem is the CMS is open to SQL injection everywhere. If a script kiddie found this info they could take down a lot of online stores. Not good.


There are tools that will find these and do all kinds of keep-you-up-at-night terrifying things with your application.

You don't even have to be a script kiddie. You just have to be able to click a button.


Did you tell them?


Or better yet write a patch and submit a pull request informing them.


Nice example, but not all are insecure. For example, the second one here is:

    $result = mysql_query('DELETE FROM saves WHERE id = '.(int)$_GET['delete']);


That's an example of hazardously bad programming practices. You're one mistake away from complete disaster. You should be sure that it takes more than one mistake to expose you to that sort of risk.

Casting to int is not a general purpose escaping system, and further, if you miss even one of these your entire application can be trashed.

Using mysql_query at all is a sign there's something severely wrong with your application.


It's a sign that the developer has had a fairly informal introduction to SQL. Personally I see no reason not to use the PDO extension in any situation where you would use mysql.


Depends on the purpose of the code. If it's one-page script then it's fine. If it's the context of the bigger app, mysql_query is probably a problem.


There is an opposing viewpoint, i.e. if you actually need an int, casting to int is one of the most reasonable ways of getting it.


Yes, and given that (IIRC) $_GET always returns strings anyway, casting to int before adding to a database makes sense. Even so, you should still attempt to validate the string with ctype_digit() and make sure it accurately represents the integer you expect. If you just cast directly to an int, you can't really predict the results.

And also, doing the cast inside the sql statement makes it difficult to see. It should be done, if at all, outside the query where it's obvious to anyone looking at the code that this is something that should be paid attention to.


In this case, you don't "need an int", you need a value that's safe to put in a database query.

If you're inserting a value in a database, you always, always, always use the proper escaping mechanism. No exceptions.

That's why using a library with a reliable, well-defined, easy to use escaping system is absolutely imperative.


In any case the real problem with this line of code is not the int cast, it's that it's using a GET request to delete a record.


Not necessarily - there can still be data in the _GET array, even if the request method is POST.


Yes, fair point, but I do think it's a red flag. On looking a bit further I see that yes, in fact, he is using a GET link to delete records: https://github.com/Paton/Saaave/blob/master/_views/browse.ph...

This is an actual, serious problem which I would note in a code review, as opposed to the int thing which cannot, as far as I can can tell, ever lead to an exploit or malfunction which would have been avoided by using a named escape function in this code. If anyone can think of a specific example to prove this wrong, please say so.


It still doesn't check if a particular id belongs to the user, so you can delete all the items in the table. But I agree that's different kind of problem :)


Your assuming it's a system where data belongs to set users - maybe any user is allowed to delete any piece of data.


The search obviously doesn't find all cases, but is a good start.

While there's nothing technically wrong with the example given, I might argue that since that won't work in all cases, it might be better to enforce a more rigorous policy of SQL query cleansing, or using bound params. Although this example is so simple I might not.

Then again, the fact that $_GET is even available at the location the query is taking place means this is most likely a type of design that I abhor, that PHP makes easy. Put actions in functions or methods, and then call them.


Pretty much every language will make getting direct user input then passing it to a database easy. What generally makes this less easy (or at least less intuitive) is a framework. Don't compare the likes of Rails or Django to PHP. Compare Laravel4 or Symfony2.

That doesn't mean PHP doesn't deserve some stick, it does, but most of it's current reported problems spawn from backwards compatibility. Nobody should be using mysql_*, they should be using prepared statements via PDO.

They could solve this by deleteing all the functions you're not supposed to use, but a whole bunch of legacy PHP would stop working. I'm fairly sure this would illicit more hate than the current method of slowly deprecating.


What I meant is that $_GET, $_POST and $_REQUEST aren't available in functions unless you declare them global. The fact that $_GET is being used in the query means either that the query is being performed in the main of that script (probably lots of small php files meant to be called by the browser), or that they are in a function/method and then pull in the global $_GET array.

IMHO, neither are good design choices, as one couples the program to HTTP too tightly (and not in a sane way), the other leads to crazy spaghetti PHP as the control flow can be affected by global variables set (or received from the user) far from where they are used.


I would like to say i'm surprised, but I'm not. PHP makes this easier by not even supporting parameter binding in the older, original mysql binding, so it's more prevalent.

That said, I'm sure a slight tweak to the search would find a lot in other languages as well.


Sort the results by "last indexed" and see that people are doing it right now!

https://github.com/search?q=extension%3Aphp+mysql_query+%24_...


There is a huge need in the space for a well marketed quality assurance contractor who can find problems like this and fix them.

"We found these issues, and we can fix them all. Pay us for finding them or pay us some more for fixing them, too." sort of thing.

Why don't you see QA shops popping up like this?


That feels a bit like "We found these vulnerabilities. You should pay us. It would be a shame if anything happened to your website"


If you do it right, it won't. Many security audit companies are asked to do test audits, and I've seen security audit contracts starting with responsible disclosure of vulnerabilities. Of course, certain amount of trust is necessary so the sides have to behave in a way that is conductive to the establishment of the trust.


> I've seen security audit contracts starting with responsible disclosure of vulnerabilities.

But was that

"We found a hole, here it is." "Thanks! Your good at that" "I know, want to hire us?"

or

"We found a hole. Pay us and we'll tell you."

Because those are very different approaches.


You'd need some pretty iron-clad liability waivers... if you miss something, you're opening yourself up for a "malpractice" sort of scenario.


Liability can be addressed with non-reliance disclosures.

Bigger issue is a market problem. People who need the help the most do not know they need it.


But then, you'd basically be getting paid for maybe kinda helping secure software.


Cigital does exactly this and more. We've even got a plugin for Eclipse that will provide suggestions on secure coding for Java.

Info about the QA/Security Consulting: http://www.cigital.com/services/

Secure coding plugin - http://www.cigital.com/products/secureassist/


Looks like Code Climate is providing an automated service like this for Rails apps: https://codeclimate.com/security-monitor


There are loads of companies already doing this, at least in the UK. They find the holes and you fix them.


The space is dominated by security oriented companies. Bugs are oftentimes not security issues though.


List please? I've tried Googling and not found them.


I guess one of the reasons is that it's much easier to fix those problems than to find them.


I'm just amazed and disturbed that people who write this kind of code are aware of version control.


The person who checked in code isn't necessarily the person who wrote it in the first place.

When dealing with legacy code, especially code going back into the 1980s for example, it's not at all unusual for the code to have been stored in several different version control systems over that time. I know of projects that started with SCCS, moved to RCS, then to CVS, then to Perforce, then to Subversion, and most recently to Git.

If those transitions are done quickly, then somebody will often just take all of the code from a checkout of the old VCS, and check it into the new system.

The same can happen when initially using a VCS, after having not used one before, especially when taking a project over from a developer or even a team that was less-talented. Even if the new developer(s) are going to review the code, and fix bugs and other flaws, a version with the initial state of the code is often a useful thing to have.


I guess people would pay for a service that could identify 90% of all security issues with an online service by going through source code and available routes. Anything that is available today?


I've seen quite a few things that are germane, google "static analysis for security",pentesting etc

http://pentestlab.wordpress.com/2012/11/27/automated-source-...

https://www.owasp.org/index.php/Static_Code_Analysis

http://code.google.com/p/yara-project/

http://ulsrl.org/

http://www.lightbluetouchpaper.org/ (the 3rd post

(these aren't the services, these are what you shd read to decide if some service's operatives are appropriately expensive, and up on current research



Tinfoil scans remotely for known vulnerabilities (i.e. WordPress version X has exploit Y). What you really want is source code analysis. WhitehatSec.com and Veracode.com are the two huge players in this space.


For the Rails folks out there, I have had pretty good luck with Brakeman Scanner in the past.

http://brakemanscanner.org/


In addition to $_GET searching for $_POST and $_REQUEST are bad too. Could even through in $_COOKIE and $_SERVER for that matter.


GitHub (and friends) sound like a great repository to train and prove the value of an automated code review product.


Using unsanitized $_GET is the least of their problems considering mysql_* is deprecated.


So a gaping security hole that compromises all data is less important than using a deprecated interface?


For an application with no exposure, arguably it is.

When you upgrade to PHP 5.6 and your application grinds to a halt because mysql_query isn't available, you'll be wishing you'd fixed it sooner.


If you're on RHEL/CentOS, 2023 is gonna be a bad year.

(RHEL/CentOS is currently on PHP 5.3 and will stay there for a long time.)


He said it was the least of their problems, not that it was the least important.


I'm sorry, isn't that the same thing?


I believe the ext/mysql is deprecated, not the function names. Mysqlnd is a drop in replacement and unencumbered by the copyleft license issues that plague the original extension.


The mysql_* function API is part of ext/mysql and is deprecated — mysqlnd is a lower level library that basically replaces libmysqlclient in the stack (and is used from PDO and mysqli as well).

I'd be interested in a cite on the licensing issues, incidentally; I'm not aware of anything. (There _is_ an issue with the JSON extension at present, though, so it's not impossible.)




Some of these look like deliberate examples of vulnerable code (e.g. the one named "Injection.SQL.php")

Alarmingly (and sadly) most do not.


No need to be alarmed (or sad.)

Without some real-world context for each project, we just can't say whether the code we're looking at represents a real security problem. I'm certain that github hosts just as many one-off, let-me-scratch-some-code-together-that-I'll-never-use-again type projects as it does hard core, production quality ones.

That said it's a neat technique for quickly auditing code. Someone should now write an automated tool for submitting security patches to all of these projects.


I'd love to see the number of search results graphed over time.


75 thousand results. And these are just the public repos!


mysql_query is deprecated... use MySQLi or PDO


MySQLi or PDO will not automatically solve almost any of the security issues found in this search.


Using either of them correctly will.

It is very easy to verify that a mysqli or PDO call is correct by looking at it. The same cannot be said for mysql_query.


90% of example code for PDO uses prepared statements. 90% of example code for the mysql_* functions is ancient and full of security holes.


This f'ing insane, and absolutely brilliant.


;.;




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: