Facebook rewrites PHP runtime, will open source on Tuesday (sdtimes.com)
163 points by VonGuard on Jan 31, 2010 | 98 comments



This was mentioned in that anonymous Facebook employee interview: http://news.ycombinator.com/item?id=1089800

Specifically:

Rumpus: So tell me about the engineers.

Employee: They’re weird, and smart as balls. For example, this guy right now is single-handedly rewriting, essentially, the entire site. Our site is coded, I’d say, 90% in PHP. All the front end — everything you see — is generated via a language called PHP. He is creating HPHP, Hyper-PHP, which means he’s literally rewriting the entire language. There’s this distinction in coding between a scripted language and a compiled language. PHP is an example of a scripted language. The computer or browser reads the program like a script, from top to bottom, and executes it in that order: anything you declare at the bottom cannot be referenced at the top. But with a compiled language, the program you write is compiled into an executable file. It doesn’t have to read the program from beginning to end in order to execute commands. It’s much faster that way. So this engineer is converting the site from one that runs on a scripted language to one that runs on a compiled language. However, if you went to go talk to him about basketball, you would probably have the most awkward conversation you’d have with a human being in your entire life. You just can’t talk to these people on a normal level. If you wanted to talk about basketball, talk about graph theory. Then he’d get it. And there’s a lot of people like that. But by golly, they can do their jobs.


> The computer or browser reads the program like a script, from top to bottom, and executes it in that order: anything you declare at the bottom cannot be referenced at the top. But with a compiled language, the program you write is compiled into an executable file. It doesn’t have to read the program from beginning to end in order to execute commands. It’s much faster that way.

I lost count of the number of untrue statements there after about ten.


The person being interviewed was obviously a non-technical person who was trying to provide information as he/she understood. It's easy to just take every statement literally and proclaim the person an idiot, but it's almost as easy to translate these statements into their (obvious) accurate equivalents and get some value from this interview. I don't understand the point of the former approach.


To expand, the interviewer in the original interview:

- thought that Facebook's servers would be hosted in their office

- was unfamiliar with the idea of usability studies or tracking eyeballs during usability studies

- was baffled by the idea that facebook would track your clicks throughout the entire site to help determine who your closest contacts were

- was also surprised by the fact that in the cloud, nothing is ever truly deleted

- was surprised to see that facebook employees would have the ability to log in as any user in the system


Wanna talk basketball?


Terrified stare


Sure. Just got back from watching KU/K-State. Great game; K-State played some strong defense down to the end of regulation, and KU had problems with poor shot choice the whole game. Thought for a minute KU was going to blow it again in overtime, especially after Collins missed the free throw, but that last foul on Brady (and the resulting five-point lead) was just the nail in the coffin.

I do wish they hadn't had so many timekeeping problems, though (and I'm not sure what was up with the aggressive travelling calls in the first half either).


Not without some graph theory:

http://www.colleyrankings.com/method.html


Obviously, this isn't the traditional way of looking at compilation, but it does make sense.

I guess the problem being described, somewhat obtusely, is that massive amounts of dependencies (in Facebook's case millions of lines of code) are loaded in order to execute a relatively small page.

I'll admit that to a CS person who understands compilers, their description makes it look like they don't know what they're talking about. But if you explained Facebook's interpretation/compilation problems to a non-techy, it would probably come out sounding similar.

(Note comment reuse: http://www.reddit.com/r/programming/comments/aodzg/one_faceb...)


It's the wrong distinction to make. PHP is slow because the runtime is slow, not because it's a "scripting language".

SBCL and V8 and TraceMonkey are all very fast "scripting languages" because they have a good codegen in the runtime. PHP does not do any codegen.

If I were explaining this to a non-technical person, I would say: A computer works by executing instructions, one at a time. When PHP is executed, it is compiled to a list of instructions, but not ones the processor can directly understand. It uses a "virtual machine" to read each instruction and then execute the corresponding instructions on the real computer. This is slow, because what could be one instruction actually becomes many, sometimes hundreds, of actual instructions. The goal of this project is to skip the virtual machine and run our code directly on the real machine, meaning that we won't have to do as much unnecessary work. This will make our application run faster!

Now, this still might make CS folks mad, because there is bookkeeping overhead other than deciding which instructions to run (you can't store "scalars" in registers, after all), and of course modern CPUs don't execute one instruction at a time. But now we're wandering off into esoterica -- at least the basic idea is right, and I don't use meaningless expressions like "scripting language".
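To make the "virtual machine" explanation above concrete, here is a minimal sketch in Python. The opcode names and the program are invented for illustration and are not PHP's actual opcodes. The dispatch loop pays a fetch-and-decode cost for every instruction, while the direct version stands in for compiled code (it is still Python, so it only illustrates the shape of the difference, not real native speed).

```python
# Toy bytecode for the expression (a + b) * c. The opcode names are invented
# for this sketch and are not PHP's (Zend) opcodes.
PROGRAM = [
    ("LOAD", "a"),
    ("LOAD", "b"),
    ("ADD", None),
    ("LOAD", "c"),
    ("MUL", None),
    ("RET", None),
]


def run_vm(program, env):
    """Interpret the program one instruction at a time, counting dispatch steps."""
    stack = []
    steps = 0
    for opcode, arg in program:
        steps += 1  # every instruction pays for fetch + decode + dispatch
        if opcode == "LOAD":
            stack.append(env[arg])
        elif opcode == "ADD":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif opcode == "MUL":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        elif opcode == "RET":
            return stack.pop(), steps
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    raise RuntimeError("program ended without RET")


def run_direct(env):
    """The same computation with no dispatch loop (a stand-in for compiled code)."""
    return (env["a"] + env["b"]) * env["c"]


env = {"a": 2, "b": 3, "c": 4}
result, steps = run_vm(PROGRAM, env)
print(result, steps)    # 20 6
print(run_direct(env))  # 20
```

Both paths compute the same value; the VM just needs six dispatch steps, each with its own branching and stack traffic, to get there.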


Yes, the runtime is slow. But many language design decisions, notably references, make it difficult to make a fast implementation (I should know, I did my PhD on it).

They're trying to explain the problems that Facebook are having, which is that there is just such a large volume of code. They aren't trying to give a short course in interpretation and compilation.

Scripting language is not a meaningless expression. Many papers use it. I submitted papers with "scripting language" in the title to PLDI and POPL, and there was not one complaint or problem with the term from reviewers (who found plenty of tiny "errors" otherwise).


What does "scripting language" generally mean? UNIX "scripts" can be written in C. Is C a scripting language?


Generally speaking, a scripting language comes from the set {Python, Perl, JavaScript, PHP, Lua, Ruby}.

It's a tricky term to define, much like "compiler". Actually "compiler" is a nice analogy: everyone knows what one is, but the actual test for "is this a compiler" is shaky, and returns true for lots of things that aren't _really_ compilers.

Anyway, Ousterhout coined it in his paper on TCL, but annoyingly chose not to formally define it. The set at that time was (I think) {TCL, Perl, sh}.

But because he didn't define it properly, the word has been used for years without a proper definition. The best I can do is to say read my PhD thesis, starting on page 7, where I spent a meaty six pages explaining in as much detail as I could what it means to be a scripting language.


From my understanding of the issue, a scripting language was once a small language with one purpose which quickly expanded to fill all possible needs (e.g. Perl).

By this definition C isn't a scripting language, as it was small and stayed that way.


How is that meaningful from a computer science standpoint?


I'm definitely stealing from someone else's comment a long time ago but they said it very well: The difference is that scripting languages have no main function (as far as I know).


This obviously is not in the spirit of what you're saying, but...

Python main() functions by Guido van Rossum

http://www.artima.com/weblogs/viewpost.jsp?thread=4829


I think the idea is that you optionally can have a main, but you aren't required to do so ;-)


Well, isn't that called a "general-purpose programming language"?


Maybe "scripting language" is the way today's kids say it?


C is a general-purpose programming language but I don't think anyone ever called it a "scripting" language... (besides "scripting" DSLs with C syntax in some games)


Quick question from a noob: why do references make it harder to make a fast implementation? I'm genuinely interested in the topic, could you post a link to your PhD thesis or some other relevant information? Cheers


I am guessing it is related to dereferencing. It seems Google agrees, it's also mentioned on Wikipedia at http://en.wikipedia.org/wiki/Reference_%28C%2B%2B%29#Relatio...

[snip]...consequence of this is that in many implementations, operating on a variable with automatic or static lifetime through a reference, although syntactically similar to accessing it directly, can involve hidden dereference operations that are costly.[/snip]


It's the description of PHP that isn't quite right. It's not read perfectly top-down like a script, nor is it interpreted at the same time as it's read. You can declare a function at the end of a file and use it in the first line.


Well, that's not quite right either. I've purged my mind of the edge cases here, but it generally involves includes.

I admit it sounds a bit off, like the quotee doesn't know exactly what she's talking about, but she's not egregiously wrong.


Hmmmm. There's no way you can name ten. I'd wager you can't name five, and I'd be surprised if you could name two.


I can name at least nine; assuming jrockway is more familiar with PHP than I am (very likely), 10 is reasonable or even understated.

1) The term "scripting language"; meaningless, and usually just used as an insult (as in this case) rather than with any reasonable definition.

2) Browsers don't execute PHP

3) Not all scripting languages are executed based on line order.

4) PHP doesn't require values to be declared before they're used.

5) Not all compiled languages are compiled to a separate file.

6) Most modern compiled languages compile to byte-code, which is not more "executable" than the original source.

7) Compilers do have to read the program from beginning to end. This might just be my ignorance, but I've never heard of a random-access parser.

8) Being compiled doesn't necessarily make execution any faster.

9) The compiled file still has to be read entirely, before it can be executed. Depending on code size, sections of it may be read repeatedly.


I thought we were talking about the bit jrockway actually quoted. Oh well, you're still miles off.

1) You can read from the interview that the quotee was speaking to a non-technical person. Even so, they're using the term in the commonly used sense. But you haven't shown that they're wrong, only that you disagree with the term.

2) You are right. This is the only falsehood I can see here. I'll stipulate that it's OK in a 'you know what I meant' kinda way.

3) Your answer to 2) depends on the quotee speaking about PHP, but now they aren't? And in 1) you said that scripting languages don't exist. Basically, you're reaching. Anyway, please name a scripting language which is not executed in line order. PHP certainly is.

4) Values aren't mentioned anywhere. Anyway, PHP requires functions and classes to be declared before use. (Strictly speaking, values don't exist before they are used, so what you said doesn't make sense. I presume you meant variables, but variables aren't declared. So I presume you meant defined, but strictly speaking there is no such thing as a variable in PHP anyway, just a mapping of strings to values (see: all the literature on the topic, including mine)).

5) They're obviously talking in the context of a native compiler, so I see nothing wrong with this.

6) They do? First I heard of it. I'm guessing you've got a funny definition of 'modern compiled languages' to back this up.

7) Executables don't though.

8) Not necessarily, sure. The first iteration of my PHP compiler was 10 times slower. But to say that the quotee is _wrong_ just because some implementation can be slower than some other implementation, is faulty logic.

9) What's your point? Why are they wrong?

So I'll give you one. Two if we count number 8, but that only counts for the most extreme form of pedantry, and it doesn't really contradict anything. So, yeah, one. A long way from ten.


All of those complaints are from the quote.

1) All the more reason not to introduce incorrect terminology; non-technical people won't be able to understand that it's wrong.

3) There are legitimate definitions of "scripting languages"; the one I use is that a scripting language is not useful without 3rd-party code. The Bourne shell and JavaScript are classic examples of scripting languages.

QuakeC is a scripting language which isn't executed in line order.

4) Values are procedures (PHP doesn't have functions; the keyword is mis-named), classes, or variables. They do not have to be declared before use in PHP -- if they did, writing mutually-recursive procedures would be impossible in PHP.

5) I see nothing "obvious" about your statement. They state that compiling results in a separate file, which is wrong because some compilers don't.

6) JavaScript (in all popular implementations), Python, JVM languages (Java, Scala), .NET languages (C#, F#, VB.NET), Perl 6. I think even Ruby has a bytecode compiler, now.

7) There is no indication that this new PHP implementation compiles to native executables.

8) Their claim is that compilation makes execution faster. There exist cases where compilation does not make execution faster. Therefore, their claim is incorrect.

9) Their quote states that compiled binaries don't have to be read from beginning to end to execute. This is incorrect. When a binary is executed, or a library linked, it is loaded entirely into memory.


1) Whatever your opinion, the quotee is not wrong.

3) Please define it. Note that I spend six pages in my PhD on defining it, and there are no formal definitions. Ousterhout introduced the term, and didn't define it.

That's a crap definition of scripting language. I don't even know what it means, and it's unusable. C++ is pretty much useless without the C++ standard library. Is it a scripting language now?

3) Nice. The fact that you could dredge up a minor language which peaked in 1996 does not make the quotee wrong.

4) This is all wrong.

> They do not have to be declared before use in PHP -- if they did, writing mutually-recursive procedures would be impossible in PHP.

Whether or not mutually recursive procedures have to be declared is a matter of parsing style. For example, it's there in C since it was created using a one-pass compiler. It's not there in Java. In PHP, declaring a function (say x) puts the function into the function-symbol table under the entry "x". When calling x(), the function-table is looked up. It is not necessary to have defined x to parse code that uses it. For example:

  if (false) { x(); } // legit, x() is never called

In a single PHP file, classes and functions declared in the top scope are considered declared at the top of the file. This can lend the appearance that they are not required to be declared, but it's a hack. If you include a file later, don't expect to call its functions now.

> Values are procedures, classes, or variables.

I presume you mean that variables, classes and procedures are all kinds of values. That is simply incorrect. Classes are not first-class values, and first-class classes and functions are approximated in PHP by allowing classes to be instantiated by name (using a string) at run-time. This is changing slightly in 5.3, but the semantics are complicated, and still use strings.

Variables are just syntax in PHP. As I said, look at any literature on PHP (or JavaScript if you prefer). Variables are syntax for keys in a local symbol-table, and aren't real entities. They are certainly not values.

> PHP doesn't have functions; the keyword is mis-named.

Evidence? I can't think how this is correct.

5) So what if some compilers don't. Nearly all compilers do. The level of pedantry here is astounding. Just because you can list an edge case in which they are wrong, does not make them wrong in the general case.

6) Ah, your definition of "compiled" is "bytecode compiled". What a funny circular argument.

7) Except that the quotee says it! "But with a compiled language, the program you write is compiled into an executable file."

8) "Is a rocketship faster than a bicycle?" Yes. "Aha, but if my rocket isn't moving, the bicycle is faster. Therefore I assert that rocketships are _not_ faster than bicycles". Bollox.

9) Loading from memory is not "reading" in the sense the quotee used, which was clearly parsing. I think you're being deliberately obstinate: try using the context of the article to determine what the words mean. I could argue that binaries do not have to be fully read into memory (they are loaded lazily, page-by-4k-page, by most modern OSes), but I don't want to start a discussion on it. Heaven forbid an obscure 90s OS used 8k pages.

Anyway, you're deliberately twisting the quotee's words and being pedantic, so I'm done here.


6) A lot of people share his definition - for example, everyone who says that Java, C# etc are compiled languages. I think that's enough people to make this definition true.

8) Being natively compiled does not make that program automatically faster. It depends on the use case (I/O vs compute bound) and it also depends on the quality and complexity of the code generator and optimizer. Furthermore, bytecode compiled languages (as opposed to native compiled) may be able to better optimize at runtime using JIT compilation due to being able to make assumptions you couldn't make at "compile time" in AOT compilation. Saying that natively compiled programs are faster than other kinds of programs is simply not always true.


3) 4)

<?php

echo foo('baz');

function foo($bar) { return $bar; }

?>

This looks to me like you can use PHP functions before declaring them above.


Functions and classes declared anywhere in a file are pushed to the top of that file by the parser. This obscures all the edge cases. Simplest example I can think of:

  echo foo('baz'); // error

  {
    function foo($bar) { return $bar; } // not pulled to top of file
  }
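The hoisting behaviour described in the comments above (top-level declarations registered before anything runs, nested ones only when execution reaches them) can be modelled with a small Python sketch. This is a toy model of the described semantics, not the Zend engine's implementation; the statement tuples and the run_file helper are made up for the example.

```python
# Statement kinds are invented for this sketch:
#   ("func", name, callable)  declares a function
#   ("call", name, argument)  calls a function and records the result
#   ("block", [statements])   a nested block such as { ... } or an if body

def run_file(statements):
    functions = {}  # function-symbol table: name -> callable
    output = []

    # Pass 1: declarations at the top level of the file are registered
    # before any statement runs ("pushed to the top of the file").
    for stmt in statements:
        if stmt[0] == "func":
            functions[stmt[1]] = stmt[2]

    # Pass 2: execute in order. Nested declarations are registered only
    # when execution reaches them.
    def execute(stmts):
        for stmt in stmts:
            kind = stmt[0]
            if kind == "func":
                functions[stmt[1]] = stmt[2]
            elif kind == "call":
                name, arg = stmt[1], stmt[2]
                if name not in functions:
                    raise NameError(f"call to undefined function {name}()")
                output.append(functions[name](arg))
            elif kind == "block":
                execute(stmt[1])

    execute(statements)
    return output


def identity(value):
    return value


# Call first, declare later at top level: works.
hoisted = [("call", "foo", "baz"), ("func", "foo", identity)]
print(run_file(hoisted))  # ['baz']

# Call first, declare later inside a block: fails.
nested = [("call", "foo", "baz"), ("block", [("func", "foo", identity)])]
try:
    run_file(nested)
except NameError as error:
    print(error)  # call to undefined function foo()
```

The lookup happens at call time in both cases; the only difference is whether the table entry already exists when the call runs.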


You'd have to work pretty hard to make a compiled algorithm slower than an interpreted one.


Not true. The runtime has access to information that the compiler doesn't, such as which branches are actually taken.


This makes no sense. Interpreters add indirection at every single statement in the program. The fact that they may know the direction of the occasional branch cannot possibly make up for this.


JIT is a valid strategy for a scripting language interpreter.


No. A JIT is a valid strategy for a scripting language implementation. An interpreter is a different thing. Many JIT compilers use an interpreter for running uncompiled code and profiling ("mixed-mode interpreter"), but "interpreter" and "JIT" are in no way synonymous.


-1 for that? Wow.


Can you give an example of this? In my experience compiled code is often 100x faster than interpreted.


Currently this is generally true, though not always. The compiler only has so much information available when it's compiling, the run-time potentially has more information. The most aggressive optimizations of compiled code rely on profiling data gathered from actually running the code. You generate representative usage scenarios, run your program using those usage scenarios (usually automated), gather data about how frequently different code blocks are hit through the use of instrumented binaries, then use that data to produce a highly efficient optimized compiler output.

However, not all software engineering groups have the capabilities to produce such highly optimized binaries, and there is always the risk that a user's particular usage patterns will differ enough from the expected patterns that they will lose the benefit of this extensive optimization. However, in an interpreted or byte-code language a lot of the same information needed for optimization is available to the run-time. A run-time designed for optimization may be able to take advantage of that, creating super efficient code paths based on actual usage. This model is more difficult to implement but potentially more robust than statically optimized compilation. It also has the potential to take greater advantage of differences in hardware: a statically compiled native binary doesn't have the ability to morph its optimization based on whether it's running on a single core Atom or a 6-way Core i7 CPU, or some 100-core monster of the future, but a run-time potentially can.

In the average case most of this is just theory, but the potential is very real.

Some worthwhile background reading:

Trace Trees: http://www.ics.uci.edu/~franz/Site/pubs-pdf/ICS-TR-06-16.pdf

A blog post / talk from Steve Yegge on dynamic language performance and other topics: http://steve-yegge.blogspot.com/2008/05/dynamic-languages-st...


Very interesting. Can this kind of technology yield further speedups for languages that are already fast (compared to getting dynamic language speed closer to fast)?

BTW these technologies are not interpreters, but compilers (but runtime compilers).


Interpreters are not JITs.


But there is no reason why an interpreter could not contain a JIT to compile on demand. In fact, the definition of interpreter that seems to be in common use is that it takes raw source code and executes it - nowhere have I ever seen anybody state that an interpreter cannot compile the source code on demand as it's interpreted (perhaps to speed up future calls to that code). This is still distinct from VM based implementations, which compile the source code to byte code that is then executed, and from natively compiled languages, where the code is compiled directly to the host processor's instruction set.


It's probably a tangent, but the theory is used in the OpenGL implementation of OS X [ http://lists.cs.uiuc.edu/pipermail/llvmdev/2006-August/00649... ].


Modern CPUs have branch prediction.


Your link is for this story, not for the quote. Here's the correct source: http://therumpus.net/2010/01/conversations-about-the-interne...


Thanks! I kept clicking the above link and ending up right back in the same place!

Time for a break.


If this is true and it is just a compiled version of PHP how much better is that going to be than APC or eAccelerator? There must be more to it than just compiled PHP.


The idea is probably to compile PHP into machine code rather than byte code.


I know you're thinking machine code must be better than byte-code, but that doesn't do any good, quite the opposite.

With byte-code you can compile sections of code to machine-code, based on runtime profiling / type-inference (as the JVM does). With ahead-of-time compiling to machine code, you're probably going to end up with a serialization of the PHP code in the final executable, plus an interpreter :)


[cringe]

This person knows just enough to be dangerous.


The article is completely unsubstantiated.

Well, I was able to put all the pieces together on this one, finally, and I now understand exactly what is up: Facebook has rewritten the PHP runtime from scratch.

Look if it's true, this is cool, it will no doubt be a great contribution to the OS world, but let's wait until Tuesday or until we have more concrete info beyond this author's guess that FB has completely rewritten the PHP runtime. I realize the author has little to gain from making this up, except for maybe 15 min of fame on HN, but still, don't believe everything you read.


I think that someone reimplementing PHP and cleaning up a lot of its . . . quirks . . . could be a very good thing for the web. Someone who didn't care about maintaining backwards compatibility with code that relies on those . . . quirks . . . and who did care about language design, and clean implementation.

Unfortunately, I have this sinking feeling that this is not going to be that.


Is there an implication that all this optimization could be merged into PHP core? PHP's appeal is ubiquitous support. Even if it is open-sourced, I can't imagine a bunch of hosting companies suddenly serving Facebook-flavored PHP.


The PHP internals developers heaped scorn on compilation when I talked to them about phc (http://phpcompiler.org). I think it might be different with a Facebook seal of approval though.


Should the current internal developers object to the idea even when implemented, well, "a new set of PHP internals developers" would not be the worst thing that ever happened to PHP....


Wouldn't it have been easier to have moved away from PHP? From what I understand, Facebook only uses PHP for the most front of ends. Business logic is all in other languages. Isn't it easy to port the front-end into something else?


No.


Note: This user has worked for Facebook for two years.


I guess I should probably clarify what I meant by the "No." above:

Facebook has a large, well-tested codebase, with a huge amount of infrastructure built on and to support PHP. It would be ridiculous to expect Facebook to migrate all of this to a different language.

Re: the article itself, you'll have to wait until Tuesday.


What other languages? (genuine question)


Reddit was rewritten from Lisp to Python: http://www.aaronsw.com/weblog/rewritingreddit

Twitter from Ruby into Scala and JVM: http://www.artima.com/scalazine/articles/twitter_on_scala.ht...

I think there are two aspects of a language (correct me if I am wrong) -- its elegance/simplicity/breadth of code produced by programmers that write with it --- and then the efficiency/effectiveness/HW-optimized executable code produced by its compiler once it's run: so how do PHP and Facebook fit in with all this?


Twitter's frontend is still definitely Ruby (on Rails). Based on what I've heard from people who have consulted there, it's a gigantic ball of crap. They've so heavily patched Rails 2.0 that they can't realistically migrate to a more modern version of Rails.


Which is sorta ridiculous considering that it's definitely not among the most complex of the Web apps out there. Even late competitors like www.shoutem.com are much more complex as they allow for a bunch of Twitters to be created on the same platform.

The only complex thing about Twitter is its size, and I bet their developers are working round the clock just to keep it from falling apart.


I do a fair amount of Rails, so I'm really curious here. How could it be that they've so heavily patched 2.0 that they can't move on? Anyone from Twitter care to comment?

I've worked on many Rails apps, and have upgraded the apps from version to version. It's a pain when key elements of the API shift, but it's not that bad...even when the project has monkey-patched Rails a lot. And twitter certainly has the resources to afford to dedicate a few programmers to this task, so I'm just not sure I buy it.


One of the contractors I spoke with said that they had a branch running Rails 2.1 successfully. When they deployed it in production, the entire application fell on its face.

Supposedly, the problem was caused by Cache Money, but nobody at Twitter wanted to risk moving to a different version again. They're still on 2.0 today. :-)

Another fun fact: Twitter has over 1,500 remote git branches. They also have bright green deer in the reception area of their office. :-)


FB is on a much different scale, so it'll be out of the question in the short term, but I wonder if they have looked at Quercus.


Erlang, C++, Java, Python and possibly others that I can't quite remember. See http://www.infoq.com/presentations/Facebook-Software-Stack for actual details.


Yeah, I had the same question. The PHP syntax definitely puts me off. That said, it's just the front-end. Might as well keep on keepin' on with what you're using, and since it's open source, just rewrite the thing.

Deep down inside I wish they were using ERB (or HAML) however, and he was writing HERB (or HHAML). I'm still enamored with Ruby syntax, and wish it would get some more big business love.


> That team were forced to sign NDA's, and taken to a very quiet, secluded meeting room where some cool new Facebook-backed open source project was described.

This caught my eye - an interesting use of the term "open source" that I haven't previously been aware of. This is only a single datapoint (and the article states the project is going to be opened up anyways), but I do have a feeling the term has been diluted and joined the buzzword ranks.


It's "open source" as in, "we'll release it as open source when we are good and ready". So far, that's been never.


Yeah, open source is kinda a verb now. Anyway, Facebook has other open source projects: see Hive http://www.facebook.com/pages/Hive/43928506208

Kinda silly to have a facebook page for an open source project, though.


Here's the main page for open source facebook projects:

http://developers.facebook.com/opensource.php

I agree it's a bit silly. I think they hope that it'll catch on sometime in the future and they'll have yet another type of group socializing on facebook.


You mean people actually socialize on Facebook pages?

It seems that the only effective purpose for pages is for celebrities/companies to push updates to their fans (and almost always ignore the reverse direction), and groups are even worse. Most pages and groups I see are things that you join because you agree with/identify with the name and then totally ignore. I wonder if facebook a) cares, and b) could do something to fix it.


Ya I used to agree with you completely. I still do for the majority of pages out there.

I recently learned of a counter example. There is a locally owned sports bar and restaurant that has a very active group. Most of the members are fans of sports teams that are across country and the games play regularly at this bar. The owner is active in the group too and definitely encourages and responds to conversation. The other group I know that is relatively active has more or less the same characteristics. It's people who identify with the group but are otherwise disjoint - and this is the best way for them to casually communicate. I think it's safe to say this was the original intent of groups and pages. Celebrities and companies are just taking advantage of it. Of course, I have a handful of pages on my facebook profile, so I guess I'm as guilty as anyone else. (I'm With COCO!)

EDIT: abusing -> taking advantage


This is particularly interesting because PHP's runtime was rewritten for 5.1 back in late-2005 (on top of 5.0's Zend Engine 2.0 improvements in mid-2004).

And is PHP really considered "pokey"? Sounds like this guy is making stuff up because I get execution times of <0.01 seconds on my micro-framework.


PHP is dirt slow. When you look at the implementations of Lua or Python (which have approximately the same design as PHP), it's about 4-5 times slower. Note this is only in the interpreter -- lots of the library code is written in C, which makes the difference somewhat less relevant.

The implication is that writing code that uses PHP's built-in libraries is pretty fast, but the more you write in PHP itself, the slower it gets. For example, my impression is that Yahoo isn't really written in PHP - it's written in C patched together using PHP.

The Zend engine was "rewritten" for PHP 4, PHP 5, and to a certain extent for PHP 5.1. I guess it wasn't a complete rewrite because the legacy code from about 10 years ago is still in there. Anyway, it's still dirty, badly written, slow, and very very badly commented. Hacks abound (and not the good kind).


This is a point I always have trouble impressing on people. They run a simple benchmark on a tiny code base that pulls some data from the database and spits some data out, they compare, and PHP seems to be lightning fast.

The problem is that, because it's interpreted, at some point PHP slows down in proportion to the size of your code base. A small app runs really fast; a big app with 100,000 lines of code will kill your server unless you modularize it really really well, which is harder than it seems, because the more modularized you make it, the more separate "includes" you end up with in different files, and then you come to realize that including a lot of files is itself a problem. And the nature of PHP's very loose coupling tends to lead to code that is nearly impossible to refactor on a large scale once you have gone too far down the path.

I work on an application that has a very thin PHP layer that performs some simple web services that are the back end for a pure Java web app. Amazingly, when we load test it, the PHP part is the bottleneck, burning CPU like crazy just parsing all our files ... over ... and over ... and over. The Java code, meanwhile, while theoretically doing far more "work", is completely bored. We will probably look at using an accelerator of some kind or maybe just rewriting all the PHP in another language.


If your problem is parsing time, just use an accelerator. APC is the standard and best integrated.

On the other hand, if you have an opportunity to switch out PHP for something better (read: nearly anything), you should. Otherwise it might grow to a point that you can't remove it.


Evaluate Quercus.


"For example, my impression is that Yahoo isn't really written in PHP - it's written in C patched together using PHP."

Yahoo! is a company, not a single application. Not everything at Yahoo! is written in PHP, but the vast majority of Yahoo! properties do use PHP heavily on the frontend, and not just as a way of patching together C extensions.


Well, if you use APC then that's all irrelevant. It's the equivalent of creating .pyc files. I never got the impression it was slower in actual execution, though.
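To make the .pyc comparison concrete, here is a minimal Python sketch of what a bytecode cache does in principle: compile the source once, keep the compiled code object, and reuse it on later requests instead of parsing again. The class name `CompileCache` and its `run` method are made up for illustration, and this is not a description of how APC works internally; it only shows the idea of skipping the repeated parse step.

```python
class CompileCache:
    """Toy bytecode cache: parse each distinct source string only once."""

    def __init__(self):
        self._cache = {}    # source text -> compiled code object
        self.compiles = 0   # how many times we actually had to parse

    def run(self, source, env=None):
        code = self._cache.get(source)
        if code is None:
            # Cache miss: pay the parse/compile cost once.
            code = compile(source, "<cached>", "exec")
            self._cache[source] = code
            self.compiles += 1
        # Execute the (possibly cached) code object in a fresh namespace.
        namespace = dict(env or {})
        exec(code, namespace)
        return namespace


cache = CompileCache()
script = "result = sum(range(n))"   # hypothetical "page" script
for _ in range(1000):
    out = cache.run(script, {"n": 10})

print(out["result"], cache.compiles)
```

The script is executed 1000 times but compiled only once, which is the saving an accelerator is after when parsing dominates the request time.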


it's about 4-5 times slower.

Really? In what actual benchmarks, real world situations?


Note this is only in the interpreter

You did read this part, right?



For the comparison you seem to be interested in, choose the measurements where the programs are forced onto a single core:

http://shootout.alioth.debian.org/u32/benchmark.php?test=all...


The argument being made was that the PHP runtime is 4-5 times slower than Python and that PHP only looks as fast as it does because of the C libraries. This is simply untrue and the OP wasn't able to back it up. Cores don't come into it.


The Python mandelbrot program uses 4 cores; the PHP mandelbrot program uses 1. That's why Python seems so much faster on mandelbrot.

The Python spectral-norm program uses 4 cores; the PHP spectral-norm program uses 1. That's why Python seems so much faster on spectral-norm.

The Python binary-trees program uses 4 cores; the PHP binary-trees program uses 1. That's why Python seems so much faster on binary-trees.
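A rough way to see the effect of the core count: when a program spreads its work over several cores, its elapsed time understates the CPU work it did. The sketch below compares two programs by elapsed time and by estimated CPU time, under the simplifying assumption of perfect scaling across cores. The function names and the timings are made up for illustration; they are not shootout results.

```python
def cpu_seconds(elapsed_seconds, cores_used):
    """Estimate total CPU time, assuming perfect scaling across cores."""
    return elapsed_seconds * cores_used


def speed_ratio(time_a, time_b):
    """How many times faster A is than B (a ratio above 1 means A is faster)."""
    return time_b / time_a


# Hypothetical timings for illustration only.
four_core_elapsed = 10.0   # program A, running on 4 cores
one_core_elapsed = 40.0    # program B, running on 1 core

by_elapsed = speed_ratio(four_core_elapsed, one_core_elapsed)
by_cpu = speed_ratio(cpu_seconds(four_core_elapsed, 4),
                     cpu_seconds(one_core_elapsed, 1))
print(by_elapsed, by_cpu)
```

With these made-up numbers the four-core program looks 4 times faster by the clock, yet both did the same amount of CPU work, which is why the single-core measurements are the fairer comparison of the interpreters themselves.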


Right. Python is not 4-5 times faster than PHP. That was my point.


Actually I was basing it off the Language ShootOut. Last time I looked through it properly, Python was 16x slower than C, and PHP was 70x slower.

I wonder why it's changed. PHP certainly hasn't gotten faster in the meantime.


A quick stumble through the Language Shootout suggests using PHP instead of C might make you need, oh say 10 to 100 times as many servers. So there is room for improvement.

Of course if your workload is not dominated by script running CPU time then it doesn't really matter. Even then, until the cost for extra servers and their management exceeds the engineering cost to recode, add servers.

I get execution times of >0.01 seconds – hope you aren't planning to handle more than 50 transactions per second.

It all depends what kind of world you live in. I know people that use their hard disks for booting and loading caches and that's it. If one query in a hundred requires a seek they will not keep up. Most of the rest of us could happily fork a CGI PHP for each request and not notice.
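The transactions-per-second remark is simple arithmetic: a worker that spends t seconds of CPU on each request can serve at most 1/t requests per second. A small sketch of that estimate follows, with hypothetical function names and numbers, and assuming the workload is purely CPU-bound (no waiting on disk or the database):

```python
import math


def max_requests_per_second(seconds_per_request, workers=1):
    """Upper bound on throughput for a CPU-bound workload."""
    return workers / seconds_per_request


def servers_needed(target_rps, seconds_per_request, workers_per_server=1):
    """Servers required to sustain target_rps, rounded up."""
    per_server = max_requests_per_second(seconds_per_request, workers_per_server)
    return math.ceil(target_rps / per_server)


# Hypothetical figures: 0.02 s of CPU per request, 4 workers per server.
print(max_requests_per_second(0.02))
print(servers_needed(1000, 0.02, workers_per_server=4))
```

At 0.02 seconds per request a single worker tops out near 50 requests per second, and a runtime that is 10 times slower per request needs roughly 10 times the servers for the same load.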


I guess you stumbled past the places where the benchmarks game website links to "Overall Performance: PHP is rarely the bottleneck (HTML slides)" http://talks.php.net/show/drupal08/7


Well, with quotes like "it is simply not a language designed for the sorts of workloads that Java and .NET are" I'm not sure of the author's technical level.

Also, his comparison of Zend's "folks" (which ones? Founders? Execs? Sales people?) to "gestapo officer looking for a spy: "What? Who said that? Who said it was slow? Tell us their name!"" isn't too apt - they'd probably like to know who complained about PHP performance, so they can offer them their products/services, rather than silence them somehow.


Probably the same people who decided the backslash was the correct character to use as a namespace separator.


I suppose once you do that there's no telling what you're capable of.


I'm curious; is this (http://timdorr.com/archives/2005/12/pgf-the-ease-of.php) the micro-framework of which you speak?


Nope, that's an older version of it: http://github.com/timdorr/asoworx



