
"(And strictly speaking, Facebook isn’t written in PHP; it’s written in a C++ macro language with a striking resemblance.)"

I'm a Facebook engineer who works on the HipHop compiler and HipHop virtual machine. It's in PHP, absolutely full stop.

It's amazing how much the fact that g++ is involved somewhere in the toolchain confuses people in this matter. C++ is just an intermediate representation; the source language really is warts-and-all PHP.




PHP as a language is, at present, defined by the behaviour of the canonical PHP.net implementation. Change the implementation, and de facto you've changed the language; Perl 5 is similar in this respect. A claim that an alternate implementation or tool chain supports _that_ very same language is a curious statement. It is an alternate implementation, and surely it must differ behaviourally, even if only by having unique bugs.

IIRC Facebook engineers must use HipHop for all their development, with PHP.net's php now being incompatible (hence the push to speed up the interpreter). Wouldn't that make the language-supported-by-HipHop a PHP-flavoured superset at the very least?


Which canonical PHP.net implementation? They differ across minor revisions, in intentional and unintentional ways.

In the absence of a standard, saying what is and is not PHP is necessarily a practical matter: useful PHP implementations are those that run non-trivial PHP applications. HipHop qualifies.

FB's dependence on HipHop comes from backwards-compatible extensions to the language (yield, for example). These extensions don't prevent HipHop from running normal PHP programs, though our use of them does prevent us from using Zend. The extensions sit behind flags that default to off, in case you don't want to use them.


This was the same argument I was having in my head, and in the end I couldn't draw a clear enough line between something like Jython and something like HipHop.


Does warts-and-all PHP include that eval wart, or is it more like most-warts-and-a-subset-of-all PHP? Not snarky, just curious if there was something I missed.


The HipHop compiler will optimize much, much more effectively if non-trivial eval() is disallowed, as it is in FB's production code. Obviously, since it's an ahead-of-time system, complex code in eval() will run slowly. The HipHop virtual machine is perfectly happy with eval().


What is the problem with eval? Even Python lets you do that. There are legitimate uses for it.


It is difficult to statically compile code that uses eval, since it could be doing anything at run-time.


Name one. I can't think of any that aren't better served by other constructs.

eval does have one huge, honking problem though: it permits text to be interpreted as code. This is just asking for code injection attacks.
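To make the injection risk concrete, here is a minimal Python sketch. The helper name `unsafe_calculator` and the sample inputs are made up for illustration; the point is only that text meant to be arithmetic can just as easily be a call into the OS.

```python
def unsafe_calculator(user_text):
    # Hypothetical helper: eval() runs the text as code, not as data.
    return eval(user_text)

# Intended use: the text really is just arithmetic.
benign = unsafe_calculator("1 + 2")

# Same call, hostile text: it imports a module and queries the process.
# A real attack would do something worse than read the working directory.
leaked = unsafe_calculator("__import__('os').getcwd()")
```

Nothing in `unsafe_calculator` can tell the two inputs apart, which is the whole problem.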

If you really need incremental/multi-stage evaluation, see MetaOCaml (http://www.metaocaml.org/) for the proper way to do it (without exposing yourself to injection vulnerabilities). It's a consequence of PHP's by-the-seat-of-your-pants approach to language design that the PHP devs settled for eval instead.


"Name one. I can't think of any that aren't better served by other constructs."

User input of code. It's hard to implement a REPL without it. Even if you do implement without it there's still an "exec" implementation hiding in there somewhere.

Also, on rare occasions, it is actually an optimization when used carefully, like the Python namedtuple example mentioned nearby.

I'm just answering your challenge. I totally agree that in general it's a bad idea and that's an unusual case. While for it to work properly it has to ship with the interpreter, if I were designing a language I would move it out of the global namespace at least, and require some sort of explicit module import with lots of dire warnings in the documentation.


Even in the case of a REPL, what you want isn't "interpret this string as code in the language in which this program is written and apply it to the state of the currently executing program". What you're really looking for is "interpret this string as code in language X and apply it to the state of the given sandbox".

The problem with using eval for a REPL is that you want an interpreter, but eval gives you unprincipled incremental compilation.

JavaScript (and I'm sure other languages) is moving in the right direction with sandboxed evals, but those are very difficult to get right in an imperative language, where a lot of code relies on global state (e.g. the DOM). Purely functional languages solve this problem well by disallowing side-effects. Of course, there remains the issue of unconstrained resource usage, but this is much easier to solve.
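As a rough illustration of "apply it to the state of the given sandbox", Python's eval and exec accept an explicit namespace dict. The sketch below (the name `run_in_namespace` is hypothetical) keeps REPL state in that dict rather than in the host program's variables. Note the assumption: this only separates state. It is not a security boundary, since the evaluated code can still reach builtins and imports.

```python
def run_in_namespace(source, namespace):
    """Evaluate an expression, or execute statements, against `namespace` only."""
    try:
        # Try to treat the input as an expression first, so its value can be returned.
        code = compile(source, "<repl>", "eval")
    except SyntaxError:
        # Not an expression (e.g. an assignment): run it as statements instead.
        exec(compile(source, "<repl>", "exec"), namespace)
        return None
    return eval(code, namespace)

sandbox = {}                                  # all REPL state lives here
run_in_namespace("x = 20", sandbox)           # statement: binds x inside the sandbox
result = run_in_namespace("x + 22", sandbox)  # expression: reads x from the sandbox
```

The host program only sees `x` through `sandbox["x"]`; the variable never appears among its own names.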


"User input of code."

I use this in my Python data processing scripts. That is, when I generate data, I generate it as Python literals, either lists or dictionaries. When I have to process it, minimal parsing needed, I just eval it into my code.

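  def parse_and_append(line, seq, prefix):
      # Evaluate whatever follows the prefix as a Python literal and collect it.
      if prefix in line:
          seq.append(eval(line[line.find(prefix) + len(prefix):]))

  nums = []
  mappings = []  # collects the parsed dicts
  for line in data_file:  # data_file: an open file of generated output
      parse_and_append(line, nums, "nums: ")
      parse_and_append(line, mappings, "mappings: ")

Terribly insecure for webpages, sure, but very efficient use of my time.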


JSON will let you do that just as well. Or, if you must, there's ast.literal_eval which will only allow Python literals.
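A small sketch of both options, assuming a data line shaped like the ones in the parent comment (the sample line is made up):

```python
import ast
import json

line = "nums: [1, 2, 3]"   # made-up sample line in the parent's format
prefix = "nums: "
payload = line[line.find(prefix) + len(prefix):]

# Both parse the same text, and neither will execute code.
nums_from_literal = ast.literal_eval(payload)
nums_from_json = json.loads(payload)

# literal_eval refuses anything that is not a plain literal.
try:
    ast.literal_eval("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True
```

The two are not interchangeable for every input: JSON has no tuples or sets and requires double-quoted strings, while literal_eval accepts any Python literal.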


I see no reason to use JavaScript to process my data files - keep in mind this is data post-processing of experiments, not a user-facing application. And I enjoy Python, so I'll stick with it. But thanks for the pointer to literal_eval, I have not explored that part of the standard library.


It is easier to write an optimizer if you don't use eval, since the eval'd code needs an interpreter.


> [...] since the eval code needs an interpreter.

Or a compiler at run-time.


For some reason, I can't reply to colanderman's comment above/below, but Python's namedtuple is implemented using eval (technically it uses exec, but it's the same idea in Python).
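For anyone curious what that looks like, here is a simplified sketch of the general technique: build class source as a string, then exec it. This is not the actual standard-library code, and `make_record_class` is a made-up name.

```python
def make_record_class(name, fields):
    # Refuse anything that is not a plain identifier, so the generated
    # source cannot be hijacked through the field names.
    for ident in [name, *fields]:
        if not ident.isidentifier():
            raise ValueError(f"not a valid identifier: {ident!r}")

    args = ", ".join(fields)
    assignments = "\n".join(f"        self.{f} = {f}" for f in fields)
    source = (
        f"class {name}:\n"
        f"    def __init__(self, {args}):\n"
        f"{assignments}\n"
    )

    namespace = {}
    exec(source, namespace)   # compile the generated class at run time
    return namespace[name]

Point = make_record_class("Point", ["x", "y"])
p = Point(1, 2)
```

The generated `__init__` is ordinary code with real named parameters, which is where the speed argument for this trick comes from.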


Interesting. Redacted, with apologies.


Oh. I guess you weren't using poetic license.


I was trying, but it appears to have been a bit too poetic.


Do your engineers use the PHP interpreter in development (for rapid prototyping) and HipHop-compiled PHP in production (for speed)?


They have an interpreted version of HipHop as well. They can't use the PHP interpreter due to a couple of added features (Python-esque yield is the only one I know of).


I find it to be an interesting philosophical question about what defines a language.

If you write code in Java, but compile it to native format (using gcj, for instance) instead of using the JVM, is it still Java? In doing so, you lose what is probably the language's biggest selling feature.

Or, perhaps less relevant now, but I remember the days where people would go on about how Ruby didn't support native threads. Except there were interpreters that did use native threads with the exact same Ruby code. Was Ruby code running under one of those other interpreters still a Ruby application?

PHP too comes with expectations about the environment, such as how it integrates into the web server stack. If those aren't met, is it still PHP? I'm not really familiar with HipHop, but, for instance, if you have to compile your code before deployment, you're not meeting one very prominent expectation of PHP: the ability to edit the code in-place on the live server (best practices notwithstanding).

So, what does define a language? Is a language just the syntax? Standard library APIs? Standard library behaviour (see Ruby example)? Runtime environment? I'm not sure you will find a generally accepted answer.


There used to be a generally accepted answer, which is that there were languages, and there were implementations of languages, and that those were separate things. C++ was not g++, and vice versa.

But the lines got very much blurred by the rise of scripting languages, in which - typically - the language was defined by the implementation. So for a few years there, it really was difficult to tell whether this was Python or CPython.

Fortunately, nearly all the languages to which this applies matured and got new implementations. Rubyists often refer to the C Ruby implementation as MRI (Matz's Ruby Interpreter), and there are lots of different Ruby implementations now. Python went out of its way to document behaviour which was specific to CPython, and put a lot of weight behind Unladen Swallow and PyPy.

But your Java example is bonkers. Compiling Java to native is still Java; it's just not on the JVM. Compiling Clojure to the JVM is just JVM. The language feature is not the language. In Java's case, the language is specified, the VM is specified, the bytecode is specified. There really should be no ambiguity at all there.


The problem is that most people lack the proper compiler development background when discussing languages.

Most people without a proper CS background mix the language with the implementation.

A language is defined by:

- syntax

- semantics

- libraries

Everything above can be made available as:

- bytecode interpreter

- text parser based interpreter (like the earlier BASICs)

- compiler

- JIT

That is why it is absurd to discuss language A vs language B in regard to implementations, because any language can have all types of implementations.

It is always a matter of cost/benefit which type of implementation is used as default for a given language.


Somehow I find it very comforting that such a heavily trafficked site relies on it. I've been a happy PHP programmer for at least 10 years. Not overly happy about it, but in the end not disappointed either. I haven't found an overall better alternative tool for the domain yet.


Hm, I'll admit I've spread a bit of misinformation on that matter myself; thanks for the correction. The source of my confusion (and I suspect others' as well) was some comments around the time HipHop was made public saying that it basically wasn't worth using unless you were willing to put in a significant effort to write your PHP according to some strict guidelines. That made me peg it as something akin to PyPy's RPython.


I seem to remember reading that Facebook was using XHP, which allowed XML literals. That would certainly be an improvement over "warts-and-all PHP", in my opinion.

Of course, I'll trust what an actual engineer from Facebook is saying over some article that I can't even cite. Is this not the case?


XHP is orthogonal to HipHop, since XHP is available as a module for Zend PHP - https://github.com/facebook/xhp.


"It's in PHP, absolutely full stop"

Well, you're the authority, but is that really a fair assessment? Has HipHop diverged from PHP enough to really still be considered PHP?


He's using poetic license, if that's not obvious.


"Prosaic license"? If the essay was written in iambic pentameter, it certainly escaped me. Anyway, "C++ macro language with striking similarities" has a pretty precise technical meaning, which sane, literate people might actually mistake.


"poetic license" isn't the same as "poetic."

From the free dictionary:

poetic license n. The liberty taken by an artist or a writer in deviating from conventional form or fact to achieve a desired effect.



