Having recently optimized a piece of PHP code, I ran into a number of things that were counterintuitive with respect to performance.
- array_key_exists is 5X slower than isset (though there are different semantics)
- direct string concatenation is 2X faster than using "implode"
- memoization can be a huge win since function calls are 5X slower than array accesses. (No inlining in PHP)
Of course, these optimizations don't matter except in your hotspots. (Why we still have to have this disclaimer on HN puzzles me, but people seem to keep trotting out the usual Knuth quote out of context as a way to write off micro-optimization techniques in general.)
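A minimal sketch of the isset-guarded memoization pattern described above (the function and the "expensive work" are made up, not from any benchmark):

    // Memoize an expensive function in a static array; isset() is the cheap guard.
    function expensive_lookup($key) {
        static $cache = array();
        if (isset($cache[$key])) {        // much cheaper than array_key_exists(),
            return $cache[$key];          // but note: a cached NULL would look like a miss
        }
        $value = md5(str_repeat($key, 1000)); // stand-in for the real expensive work
        $cache[$key] = $value;
        return $value;
    }

    echo expensive_lookup('foo'), "\n"; // computed
    echo expensive_lookup('foo'), "\n"; // served from the cache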
Also important to keep in mind:
- PHP has copy-on-write semantics for arrays. So if you set $foo = $bar (arrays), you don't incur any additional memory until you alter $foo. (Aside from the additional reference.) Once you change any entry of $foo, PHP makes a copy of the whole thing. (This can result in massive performance and memory bloat if you don't realize it's happening; see the sketch after this list.)
- PHP arrays are not arrays, but a hybrid of a linear array and a hashtable. ("one data structure to rule them all.") So even a simple "array" of integers incurs more memory than you'd expect. In fact, IIRC, an array of integers incurs approximately 100 bytes of memory per entry. Ouch. There are extensions in newer versions of PHP that allow you to use 'real' arrays. If you're stuck using normal PHP arrays, good luck trying to design optimized data structures for the problem at hand.
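If you want to see the copy-on-write behaviour (and the per-entry overhead) for yourself, something like this works; the exact numbers vary by PHP version:

    $bar = range(1, 100000);
    echo memory_get_usage(), "\n";   // baseline with one array

    $foo = $bar;                     // no copy yet: $foo just points at the same data
    echo memory_get_usage(), "\n";   // barely moves

    $foo[0] = 42;                    // first write forces a copy of the entire array
    echo memory_get_usage(), "\n";   // jumps by roughly the full size of the array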
Even more micro-optimization: use single quotes for strings instead of double quotes so PHP does not try to parse them. (phpbench seems to dispute this but doesn't say which version of PHP they are using?)
Don't forget the strange performance hit from require_once vs require; use your own flags instead.
>direct string concatenation is 2X faster than using "implode"
That depends on how many strings you are concatenating. The beauty of implode is that you can build an HTML snippet and implode it all at the end before output.
Never checked that myself, but I believe you will get better results (performance-wise) with output buffering than with concatenating output for one big final `echo`.
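For reference, the three approaches being compared look roughly like this ($items is just a hypothetical array of strings; none of this is benchmarked here):

    $items = array('one', 'two', 'three'); // hypothetical data

    // 1. Collect snippets in an array, implode once at the end.
    $parts = array();
    foreach ($items as $item) {
        $parts[] = '<li>' . htmlspecialchars($item) . '</li>';
    }
    echo '<ul>' . implode('', $parts) . '</ul>';

    // 2. Plain string concatenation.
    $html = '<ul>';
    foreach ($items as $item) {
        $html .= '<li>' . htmlspecialchars($item) . '</li>';
    }
    echo $html . '</ul>';

    // 3. Output buffering: echo as you go, flush once at the end.
    ob_start();
    echo '<ul>';
    foreach ($items as $item) {
        echo '<li>', htmlspecialchars($item), '</li>';
    }
    echo '</ul>';
    ob_end_flush();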
Thanks for these. I read the article and thought "These are kinda basic..." whereas I just had to decide between implode and concatenation.
edit: in retrospect, what I was trying to say was essentially "Hacker news can write a better article on php optimization, just in the comments alone." Elitism? probably.
I wrote a PHP application (kind of a journey planner) a few months ago that at "beta" took about 1 minute to run each time due to all the processing it did. I spent a good few days speeding it up and making it use less memory and it now runs in under 10 seconds. Here's a few things I did that worked for me.
1. Install APC and use it to cache objects as well as PHP files. Cache wherever possible. I used APC lots, as well as caching some heavy processing to disk. Cache results of expensive functions that are called many times in one script in a global or static variable (see the sketch after this list).
2. Unwrap lots of the lovely OO wrappers, such as the ORM (similar to ActiveRecord). This made the code messier but much faster. PHP takes a big hit every time it instantiates a new object.
3. Take advantage of PHP's copy-on-write memory allocation; understand how PHP does garbage collection; use references where possible; understand what PHP is doing behind the scenes.
4. Profile with xdebug and kcachegrind. Great for finding what's taking up the time and which functions are being called many times. Inline small functions that are called many times.
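For point 1, the APC part of it looked roughly like this (requires the APC extension; the key, the TTL and build_route_table() are made-up placeholders, not my actual code):

    function get_route_table($city) {
        $key = 'routes_' . $city;               // hypothetical cache key
        $routes = apc_fetch($key, $hit);
        if ($hit) {
            return $routes;                     // cache hit: skip the expensive work
        }
        $routes = build_route_table($city);     // hypothetical expensive function
        apc_store($key, $routes, 3600);         // keep it for an hour
        return $routes;
    }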
Stopped reading after the first pointless micro-optimization. With some very rare exceptions, the cost of these kinds of optimizations, in terms of code that is harder to maintain and change, is far greater than the cost of bluntly adding more hardware.
I mean seriously, if you're using PHP's OOP functionality but you find yourself having to squeeze out performance by dropping getters and setters on your classes, you should really first take a step back and take a good look at your entire approach.
Those kinds of optimization may be sensible to tighten some critical inner loops of your code. But especially there you should avoid high-level features anyway.
Also, before trying to squeeze out performance that way, you might as well rewrite that part of your code in C, and let the C compiler perform real optimizations.
However, chances are good that you don't even need that, as some intelligent combination of already existing fast PHP functions (implemented in C) usually does the trick.
Unfortunately, these tips were written by somebody who does not understand how PHP works. See detailed critique of the previous version here:
https://php100.wordpress.com/2009/06/26/php-performance-goog...
The article may have changed since then but some of the points are still valid.
It also doesn't mention current stable version of PHP (5.3) and still has no mention of bytecode caching. At least it mentions profiling now...
No mention of APC, and they didn't even bother to discuss profilers beyond a link (which doesn't even return xhprof on the first page!)
The getter/setter thing is the perfect definition of a premature optimization. Making that optimization will result in an infinitesimally small gain compared to what could be had optimizing bad SQL and caching deficiencies.
-mysql_query()? BAD GOOGLE WEBMASTER, NO COOKIE FOR YOU. Seriously - there is no good reason not to use PDO, or at least MDB2. (Doctrine is better, but it's just a superset of PDO.)
-References are a modern PHP programmer's (quiet you, stop laughing!) best friend. "Don't use shortened variables because you'll copy data! Just reuse the $_POST['barf']!" Never mind that $foo = &$_POST['barf'] achieves both goals...
Agreed. Hardly any of it is specific to PHP ("use the latest version", No SQL in a loop, use caching, use a profiler) and the stuff that is seems of marginal benefit.
This is an old article (2009), which the PHP team has responded to:
"With regards to the new article posted at [URL], all of the advice in it is completely incorrect. We at the PHP team would like to offer some thoughts aimed at debunking these claims, which the author has clearly not verified. "
The example they gave ($description = strip_tags($_POST['description']);) would trigger copy-on-write, unless I'm mistaken.
And the SQL example isn't meant to be copy/pasted: it's a demonstration of a fairly common anti-pattern that has nothing to do with SQL escaping. Adding escaping there would just confuse the point.
The problem, like with all bad code examples, is that people copy and paste it without thinking and don't look back if it runs, so here we are in 2011 with SQLi holes in literally every other website.
Either write a complete, secure example or don't write an example at all. Like a colleague of mine said, putting incomplete code samples on the Internet is like handing out loaded Glocks to children; expect feet and heads to be blown off. I wouldn't even write a toy example with mysql_query anymore because the number of footnotes required (that people would ignore) would fill a page.
I recommend Varnish (an HTTP cache) so pages can be returned without touching PHP at all, and APC, a compiled-PHP-script cache that is slated to be bundled with the PHP 6 releases. Both should lower the resources required.
I guess these are okay suggestions, though a few of them aren't going to make a slow piece of code much faster. Micro optimizations result in micro gains. (On getter/setters, I think their uselessness is better grounds for wiping them out than function call speed, but I digress..)
I particularly dislike the glaring SQL Injection error and not using mysqli in the example. They could have at least used a fake escape_data() function around the values if they don't want to use prepared statements. And ignoring that mysqli_query() would be slow called inside a loop, the solution is taking an n loop to a 2n loop. Ah, if only PHP had inline Python generators to reduce it to one...
Yes, the SQL example was lame. There are too many PHP "tutorials" which attempt to demonstrate one concept while blatantly ignoring basic security principles. In this case, the example should have used prepared statements with either MySQLi or PDO.
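For comparison, the prepared-statement version with PDO is barely longer (connection details, table and column names below are placeholders, not the article's example):

    $pdo = new PDO('mysql:host=localhost;dbname=example', 'user', 'pass'); // placeholder credentials
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

    $stmt = $pdo->prepare('INSERT INTO users (first_name, last_name) VALUES (?, ?)');
    $users = array(array('first_name' => 'Ada', 'last_name' => 'Lovelace')); // hypothetical data
    foreach ($users as $user) {
        $stmt->execute(array($user['first_name'], $user['last_name'])); // values never touch the SQL string
    }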
I read "PHP: The Good Parts" on the train last night, and face-palmed the whole way - it's ALL written in an insecure style, except for the one chapter that's explicitly dedicated to security.
Using mysqli* functions would still be a "mistake" if you ask me. PDO exists, and gives you all the benefits of mysqli but with relatively painless cross-DB support.
I totally agree that PDO is great. Though I also like the procedural style of the mysqli_ functions (don't shoot!). (In Java stuff I have an "object, build yourself from these rows" OOPy pattern.) As for cross-DB support, unless you're using an ORM it's probably going to be painful. Given that some databases conform to ANSI SQL while others (like MySQL) do their own thing, plus the issue of built-ins and custom functions (e.g. LucidDB lets you write Java/Jython/Javascript user-defined functions/procedures/transformations), I don't trust any of the SQL strings to work on multiple databases.
PDO's advantage is a standard interface (like JDBC), and if you're planning to ever use more than MySQL with PHP then yes you should use PDO even for MySQL just to get used to the standard.
From my experience it's faster to make SQL queries in a loop than it is to make one huge SQL query that gets all the data but requires post-processing in PHP.
Let me clarify:
If you are using all the data from every column and every row, then certainly - do one big query.
But a lot of the time you write code that gets a parent, then all the children.
You only output the parent once. So if you try to write a query that returns all the children at once, you are also by necessity returning the parent data multiple times - yet you only output the parent data once.
To do this typically you store a variable with the previous_parent_id, then check if the new row matches it in your client loop.
Don't do this. It's slower.
Get just the parent data and loop on it, then get the child data in individual queries.
The reason it's faster is database indexes. When you get the parent data you want a sort order - hopefully that column is indexed and the database can return it directly without sorting.
Same for the child data, you want it sorted, you have an index that covers the parent_id, and your sort column and the database can directly return the data to you.
But, if you try to join the parent and child table, only the parent data is pre-sorted. The child data will need to be sorted after the join - often on disk. This is terrible for performance. (The child index is used for the join, but not the sort.)
Additionally you are often transferring lots of data, because you are repeating the parent columns over and over uselessly. That's not free. Even if it's a local database the database server still needs to buffer all that data and so does the client.
Caveat: This is my experience with MySQL, it's possible other databases are able to use indexes to sort both the parent and child records, even through a join.
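Roughly the shape I mean, in PDO for brevity (the schema is invented; assumes an index on the parent sort column and a composite index on (post_id, created_at) for the children):

    $pdo = new PDO('mysql:host=localhost;dbname=example', 'user', 'pass'); // placeholder connection

    // Parents once, already in index order.
    $parents = $pdo->query('SELECT id, title FROM posts ORDER BY created_at DESC')->fetchAll();

    // Children per parent: each query is answered straight from the (post_id, created_at) index.
    $childStmt = $pdo->prepare('SELECT body FROM comments WHERE post_id = ? ORDER BY created_at');
    foreach ($parents as $parent) {
        echo '<h2>' . htmlspecialchars($parent['title']) . '</h2>';
        $childStmt->execute(array($parent['id']));
        foreach ($childStmt->fetchAll() as $comment) {
            echo '<p>' . htmlspecialchars($comment['body']) . '</p>';
        }
    }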
It really depends on the type of query you're making.
If you're talking about a tree, here's how to get a whole sub-tree (as in all descendants of a parent), without fetching parents multiple times, without multiple queries, sorted in a previously defined order and also indexed and really fast: http://dev.mysql.com/tech-resources/articles/hierarchical-da...
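For anyone who doesn't want to click through: the nested-set approach from that article stores lft/rgt bounds per node, and the subtree query ends up looking something like this (table name and data here are invented, check the article for the real schema):

    $pdo = new PDO('mysql:host=localhost;dbname=example', 'user', 'pass'); // placeholder connection
    $stmt = $pdo->prepare(
        'SELECT node.name
           FROM category AS node, category AS parent
          WHERE node.lft BETWEEN parent.lft AND parent.rgt
            AND parent.name = ?
          ORDER BY node.lft'
    );
    $stmt->execute(array('Electronics')); // every descendant of "Electronics", in tree order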
> When you get the parent data you want a sort order
Not necessarily. People use a sort order mostly to limit the number of rows returned (say you want only the first 50 items with the lowest prices). But optimizing a SELECT using ORDER BY is really hard, as there are other restrictions you need to be aware of (like if you're using an index on multiple columns, you can't have a range condition on the first column and sort on the second, at least in MySQL).
That's why, if performance is an issue, there are ways to workaround the need to sort -- for example you can keep extra data, like page=1 if position is between 0 and 50, page=2 if position is between 50 and 100, and so on, such that LIMITing the query to the first 50 items is WHERE page=1 (basically storage-efficient precached queries - if the conditions are stable, you can do it).
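A sketch of that trick (it assumes you maintain the page column yourself whenever the ordering data changes, which is only practical if it changes rarely; schema is invented):

    $pdo = new PDO('mysql:host=localhost;dbname=example', 'user', 'pass'); // placeholder connection

    // Instead of: SELECT * FROM items ORDER BY price LIMIT 0, 50
    // read back a precomputed bucket with a plain indexed lookup:
    $stmt = $pdo->prepare('SELECT * FROM items WHERE page = ? ORDER BY price');
    $stmt->execute(array(1)); // the 50 cheapest items, if page=1 was assigned to them on write
    $cheapest = $stmt->fetchAll();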
And in cases where you can't fetch the data efficiently in a single query, you're probably doing it wrong (like you chose the wrong data representation - for example the relational model is really awful for describing anything related to graphs).
Of course, I'm not talking about cases when you're fetching unrelated data or cases where performance doesn't matter or cases when you've got BLOBs in your parent :)
You can also let the database aggregate that tree for you. For instance, PostgreSQL allows for ARRAY/RECORD data types that can be used for aggregating.
In addition, XML datatypes can be handy here. Those are generic, so you don't need to define your RECORD types. However, if you use lots of XML elements instead of XML attributes, the communication overhead due to the big XML serialization might defeat performance.
This is why I prefer to aggregate JSON structures instead. Unfortunately, PostgreSQL doesn't yet have good JSON support, neither natively nor via contrib. So this requires a custom JSON implementation.
If you use a database without advanced data structures (such as MySQL) you can also try to aggregate your data by string concatenation. Of course, you need a non-occurring separator character, or better an escaping function, here. This might still be fast, but it will make your SQL query hard to write and hard to maintain.
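The MySQL string-concatenation variant ends up looking something like this (GROUP_CONCAT with an explicit separator; the schema and the '|' separator are made up, and the escaping caveat above still applies):

    $pdo = new PDO('mysql:host=localhost;dbname=example', 'user', 'pass'); // placeholder connection

    // One row per parent, with the children flattened into a delimited string.
    $rows = $pdo->query(
        "SELECT p.id, p.title,
                GROUP_CONCAT(c.body ORDER BY c.created_at SEPARATOR '|') AS comments
           FROM posts p
           LEFT JOIN comments c ON c.post_id = p.id
          GROUP BY p.id, p.title"
    )->fetchAll();

    foreach ($rows as $row) {
        $comments = ($row['comments'] === null) ? array() : explode('|', $row['comments']);
        // ... render, and hope no comment actually contains '|'
    }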
In this case you're generally better off querying once to get the parent data, collecting all the child keys from it and querying a second time to collect ALL the children.
The technique you've described is the reason early ORMs had a bad reputation with respect to performance. I'd have seriously strong words with any of my developers I found querying inside a loop. Check out the N+1 problem.
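i.e. something along these lines (a sketch only; the schema is invented and the keys are assumed to be integers so the IN() list can be built safely):

    $pdo = new PDO('mysql:host=localhost;dbname=example', 'user', 'pass'); // placeholder connection

    // Query 1: all the parents.
    $parents = $pdo->query('SELECT id, title FROM posts ORDER BY created_at DESC')->fetchAll();

    // Query 2: all of their children in one go, grouped by parent key in PHP.
    $ids = array();
    foreach ($parents as $p) {
        $ids[] = (int) $p['id'];
    }

    $children = array();
    if ($ids) {
        $sql = 'SELECT post_id, body FROM comments WHERE post_id IN (' . implode(',', $ids) . ')'
             . ' ORDER BY created_at';
        foreach ($pdo->query($sql) as $c) {
            $children[$c['post_id']][] = $c;
        }
    }

    foreach ($parents as $p) {
        $comments = isset($children[$p['id']]) ? $children[$p['id']] : array();
        // render $p['title'] and its $comments...
    }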
Unless you have 3 levels of tables to work through.
Yes, you can get all the data from the third level in one query - but then you have to find the keys you want from it in your client, and without an index. (Alternately you may be building a hash-index on the fly each time you retrieve the data, which isn't optimal either. For example if you store the key in a PHP array.)
Obviously you need to be smart about what you are doing. But simply always getting the data in bulk is not automatically correct.
That last example is poorly chosen on other levels as well:
- Because the values are concatenated into the query it makes the query harder to cache for the database. For performance reasons queries should always be prepared/parametrized. On Oracle this makes such a big difference that, at a big customer of my employer, a single query by a different app in its own schema (run lots of times) was the bottleneck, causing five-fold decreases in responsiveness in our app on our schema.
- There's no escaping at all, this is like an open door for hackers. Again, with parametrized queries there would have been less risk (you can still get SQL injection in triggers, but the risk is definitely lower).
This article is a very good example of how individually correct pieces of information can add up to a biased view.
@google
Please remove the article or replace it with a more accurate and complete one. If you really care about PHP performance to make the world a better place, hire some skilled guys who can help the development of the PHP core and libraries. The side effect is that those guys will write better articles.
Why do you think so? It doesn't contain any snide comments. Sure, it is clear that the developers of PHP made some choices that have a negative effect on performance, but so did the Ruby guys. Making tradeoffs that affect performance doesn't necessarily look bad, and this article remains neutral as to the background of the performance issues in PHP.
As to hiring developers to work on PHP: I don't see how that would be of benefit to Google. They aren't using it anywhere and I'm sure engineers at Google wouldn't touch PHP with a pitchfork when it comes to selecting a language for a new project. That doesn't mean that PHP is left behind in any way, though, because Facebook and Yahoo are actively working on the PHP ecosystem.
No matter how slow the frontend is, the quicker the execution of that script finishes, the quicker that process can handle another request, saving you resources.
Properly optimized front-end will help your servers a lot.
Case of a typical website: 6-15 JavaScript files, a handful of CSS files, tens of images, 60-100 HTTP requests. On a repeat visit it's still the same number of requests, the majority of them just 304 Not Modified responses.
Case of a front-end-optimized website: 1 request for the main document, one for the combined and minified .js, one for the combined and compacted .css, 1-4 for CSS sprite files and some content images. On a repeat visit: 1 request for the document (the rest of the resources have far-future expire times, so they are not requested).
The server has to serve an order of magnitude fewer requests.
Agreed, but all the static assets shouldn't be served by your application server, but either from a completely different machine (or CDN), or at least from a reverse proxy in front of the application server. Aside from eventual port or file-handle starvation, serving assets should have no effect on your application server.
So while having one file with all the assets is advantageous for the end users, it should have no or next to no influence on application server performance.
Even more troubling to me is that the author completely ignores the magic methods __get and __set. I ran the two tests he provided locally, as well as altered versions using __get/__set and direct property access, and both were roughly 10x the speed of the getName/setName methods.
Even ignoring the performance issue, why would you even consider using an explicit setName over __set ?
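For anyone who hasn't used them, the two styles being compared are roughly this (a bare-bones sketch, not the actual benchmark code):

    // Explicit accessors:
    class UserA {
        private $name;
        public function getName()      { return $this->name; }
        public function setName($name) { $this->name = $name; }
    }

    // Magic methods:
    class UserB {
        private $data = array();
        public function __get($prop)         { return isset($this->data[$prop]) ? $this->data[$prop] : null; }
        public function __set($prop, $value) { $this->data[$prop] = $value; }
    }

    $a = new UserA(); $a->setName('Alice'); echo $a->getName(); // explicit calls
    $b = new UserB(); $b->name = 'Alice';   echo $b->name;      // routed through __set()/__get()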
> why would you even consider using an explicit setName over __set ?
• You don't put logic of all setters in one place. Although you could create some dispatcher in __set that looks up method for the setter and calls it, that's a bit of magic that may not be expected by someone using the class.
• You can control visibility and override setters using standard PHP syntax rather than custom code in __set()
• Inside the class and derived classes it's clear when `$this->foo` is direct access and when it's a setter, otherwise you need to be careful about property declarations and visibility.
• It's possible to pass extra optional arguments to the setter
(none of these points are particularly strong, but there are non-insane reasons to use setters)
Forgot about those. Also I didn't know that they were actually faster.
In very simple classes I think the explicit getters and setters get the job done. Of course if you want to invoke it a lot your solution is a lot better.
I was mainly debating the notion of sacrificing readability and maintainability for the sake of a few microseconds; I feel it's wrong. If you care about speed that much, I think you shouldn't be using PHP in the first place.
You know what's much, much faster but no-one will ever recommend?
Globals instead of copying gobs of data between classes.
For example, WordPress used to use a global object cache that was passed around by reference. Modern versions now throw around multiple copies of huge gobs of data, copying it back and forth over and over again - all the users and posts and comments on a page. That makes for a HUGE performance decrease that is easily measured on complex pages (200-300% slower). If you have frequent cache misses it's quite a workout for the system.
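Schematically, the difference is something like this (names made up; thanks to copy-on-write the copy only actually hurts once the callee writes to the array, which a cache of course does constantly):

    $cache = array();

    // Copying style: the function works on (and returns) its own copy of the cache.
    function warm_cache_copy($cache) {
        $cache['posts'] = array('...huge result set...'); // hypothetical loaded data; the write forces a copy
        return $cache;
    }
    $cache = warm_cache_copy($cache);

    // Reference style: one shared structure, mutated in place, nothing copied.
    function warm_cache_ref(&$cache) {
        $cache['posts'] = array('...huge result set...');
    }
    warm_cache_ref($cache);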
This article is really, really rudimentary, and not in the good ways. Why does it recommend "not copying data" without explaining references? Why is _Google_ recommending the use of bare mysql_query? I don't care if it's a toy example, they should be reinforcing best practices (or at least not-horrible practices) by using PDO or something similar.
Granted, most PHP programmers I know are not particularly competent. (I am, but I'm also weird enough to be willing to get competent with something like PHP.) But we can at least _try_ to hammer good practices into them.
Sacrifices readability for performance. Sometimes that's a worthwhile compromise, but often not. And if you plan on referencing $description more than once (which most code would), then it wouldn't make much sense to run strip_tags each time.
I doubt it even sacrifices performance. The temporary result of strip_tags() still has to be allocated in order to be echo'd out -- so you're using the same amount of memory either way. $description will be de-allocated as soon as it goes out of scope.
Both versions of the code do exactly the same:
1. Take description from memory
2. Allocate string and put the stripped version there
3. Output the string (here you might save a little if you don't use output buffering, since nothing is copied into a buffer - it just goes straight to output)
4. Release the memory allocated in 2.
Not trolling, honest question: why is PHP still used? What makes it preferable over alternative server-side technologies? I have programmed PHP before, and I know I could just Google this question, but this kind of search usually turns up rants by language zealots.
> why is PHP still used? What makes it preferable over alternative server-side technologies?
Years longer of being a web-focused language than other options.
Lots and lots of really good tools, all easily installable via PEAR or PECL, or on the development side.
An excellent community.
Lots of smart developers.
Lots of support from big companies (Not to mention acceptance at the enterprise level).
Continued development and progress.
Other technologies don't offer any big advantages to switching.
Despite what you might think by reading HN, PHP is still a major player, and still has a lot of momentum going forward. Using PHP for your web site is not the exception.
> Not trolling... I know I could just Google this question, but this kind of
> search usually turns up rants by language zealots.
Isn't that what trolling is?
I'm not sure why that would even provoke rants by language zealots. It's a simple question with a simple answer: big libraries and projects that can't be ported in an afternoon.
That is to say, Haskell is a nice language but there's no Drupal port to Haskell.
See also: http://phpbench.com/