A PHP Compiler, a.k.a. the FFI Rabbit Hole (ircmaxell.com)
139 points by ingve on April 23, 2019 | 64 comments



Ironically about the only acronym he does NOT try to explain is FFI, the reason for the project.

> A foreign function interface (FFI) is a mechanism by which a program written in one programming language can call routines or make use of services written in another.

https://en.wikipedia.org/wiki/Foreign_function_interface
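
To make that definition concrete, here is a minimal sketch using the FFI extension that ships with PHP 7.4 and later. It assumes the extension is loaded and that the C library can be opened under the name `libc.so.6` (typical for glibc-based Linux, not guaranteed elsewhere). `cAbs` is a hypothetical helper name, not something taken from the article.

```php
<?php
// Hypothetical helper: call C's abs() through PHP's FFI extension.
// Returns null when FFI or the assumed library name is unavailable.
function cAbs(int $value): ?int
{
    if (!extension_loaded('ffi')) {
        return null;
    }
    try {
        // First argument: plain C declarations. Second: the shared library to load.
        // "libc.so.6" is an assumption that holds on glibc-based Linux systems.
        $libc = FFI::cdef('int abs(int j);', 'libc.so.6');
        return $libc->abs($value);
    } catch (\Throwable $e) {
        // Library not found, FFI disabled by configuration, and so on.
        return null;
    }
}
```

Calling `cAbs(-5)` runs the C routine rather than PHP's own `abs()`, which is the "program in one language calls a routine written in another" part of the definition.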


When you think about how dynamic PHP is, wouldn't you pretty much end up with the source code and a PHP interpreter baked together into a binary?


From his GitHub page:

"Future Work - Right now, this only supports an EXTREMELY limited subset of PHP. There is no support for dynamic anything. Arrays aren't supported. Neither Object properties nor methods are supported. And the only builtin functions that are supported are var_dump and strlen."

https://github.com/ircmaxell/php-compiler/blob/master/README...

He basically has avoided that road thus far :)

I think you're right though. Things like eval() and ${$deref-me-at-runtime} aren't going to be easily compiled.


There are many who would argue that removing those features from PHP would be a massive improvement.


Replacing php entirely would be a massive improvement. I only had sorrows with it, the kind of language that is painful to write all the way through.


For every word a hipster programmer spends on how terrible PHP is, another 10,000 lines of PHP are released into the wild.


That comment is useless on a really broad spectrum:

1. We have to criticise the tools we use. That is the only way we improve. If people had that mentality of not criticising a tool because of how popular it is, we would be stuck with COBOL for life.

2. The amount of code released has extremely little to do with a language being good or bad, just with its popularity. And it is hard to move away from a language once you have a solid base. Still, the language flaws are right there, and if for any reason you have to touch the code base, whether for a security upgrade, a refactor or a feature implementation, there is potential for a lot of grief just because of the tool someone decided to use for the job.

3. What was that about "hipster programmer", really? It has been a while since PHP stopped being the mainstream language for backend web development, so that really makes little sense.


> 1. We have to criticise the tools we use. That is the only way we improve. If people had that mentality of not criticising a tool because of how popular it is, we would be stuck with COBOL for life.

Your comment wasn't constructive criticism, it was kvetching.

And we are stuck with COBOL for life. I guarantee you that every financial transaction you make gets touched by some COBOL somewhere. One takeaway from this might be, "the horrors! Organizations should be spending more money rewriting things in the latest technologies!" Another takeaway from this might be, "COBOL was so revolutionary for its time that it enjoyed massive adoption three decades ago, and it's still good enough that it isn't worth the cost to replace it in many cases."

You may not enjoy working with it. I've worked with it, I didn't love it either. But it is really, really good at what it does and it doesn't suffer any of the issues plaguing software development in more modern technologies.

> The amount of code released has extremely little to do with a language being good or bad, just with its popularity.

The empirical view would be that truly crappy things don't tend to get both the adoption and staying power of something like PHP.

> if for any reason you have to touch the code base, whether for a security upgrade, a refactor or a feature implementation, there is potential for a lot of grief just because of the tool someone decided to use for the job.

And this is unique to PHP? I've worked in so many different languages and architectures I actually can't ever remember all of them off the top of my head. They all have warts. It's possible to build fragile, incomprehensible code in all of them.

PHP's greatest issue is its accessibility. It is so easy to get started with PHP that for over a decade it was the first language for most web developers. That has led to a lot of really awful code being published, coupled with a lot of really awful, stale documentation scattered around the web.

> 3. What was that about "hipster programmer", really? It has been a while since PHP stopped being the mainstream language for backend web development, so that really makes little sense.

If you measure by "technologies mentioned in HN headlines", maybe. Other sources find that PHP is still used by around 79% of the web (https://w3techs.com/technologies/details/pl-php/all/all). The yearly usage statistics graph is hilarious: https://w3techs.com/technologies/history_overview/programmin...

There are intelligent, reasoned criticisms of PHP from smart people and I enjoy reading those. Same as similar criticisms of Python, or Node.js, or Elegant Pootwaddle. But most of the time, "PHP is bad" is just a low-effort comment people make hoping to get a few nods from the internet.

Conjecture: the average WordPress site is longer-lived than 50% of the startups launched through YC.


First, I am happy to see you are not just a troll and actually engage in discussions, thanks.

What I meant with COBOL is that we could well be using it today in everyday applications if we hadn't searched for better alternatives for security-sensitive applications on the web.

> it is really, really good at what it does and it doesn't suffer any of the issues plaguing software development in more modern technologies.

Could you provide examples? Because I could really use a better approach to developing in PHP and end my grief where I work.

> The empirical view would be that truly crappy things don't tend to get both the adoption and staying power of something like PHP.

That is a naive statement that disregards the history of PHP. When it was conceived there wasn't anything much better and people just went with it. It was very convenient and filled a gap that JavaScript just couldn't fill, namely writing code alongside HTML.

> PHP's greatest issue is its accessibility. It is so easy to get started with PHP that for over a decade it was the first language for most web developers

That is far from its greatest issue. There are a lot of accessible languages out there, so that is simply not the case. The problem with PHP is that it is badly designed at its core, and not much can be done without breaking compatibility. The surrounding tools (for refactoring, IntelliSense, etc.) are either really bad or really expensive (PHPStorm). There is less crap code written in Haskell or Rust than in PHP not because Haskell is a niche language, but because the language is designed to make stupid decisions hard, something that can't be said about PHP. The language has to hold the programmer's hand sometimes.

Even Python, for that matter, has such an idiomatic approach to language design that it feels just weird to write stuff that does not follow the language style guide, even more so if programmers stick with style-check tools from the beginning.

> Your comment wasn't constructive criticism, it was kvetching.

> But most of the time, "PHP is bad" is just a low-effort comment people make hoping to get a few nods from the internet.

That should not be an excuse to write equally low-effort answers. I didn't really think much of my comment, I was just being sarcastic for the sake of it. It wasn't supposed to be constructive criticism. I mentioned criticism in response to what you said about PHP and its output of code lines per word of criticism.

But just to be clear, I am not saying anything with ill intentions, take this all with a light heart, no hard feelings.


Replacing a language does not make all the code written in it go away, though. PHP has its tentacles all over the web with WordPress, Drupal, Magento, etc. and those will not be replaced any time soon.


Very true, and the main reason why it is still used. Still, that does not disprove what I said, that replacing PHP would be an improvement. A really tough one in many cases, but still an improvement.


FB is replacing PHP with Hack. Nobody outside of FB seems terribly interested now that it won't run existing PHP.


PHP 7 is so much better and PHP 8 is on track to become even better.


Can't really agree. 7.3 differs so much from 7.0 that migrating is just not possible without a careful code rewrite. How is a version good when it is incompatible across minor releases?

I know little about version 8, but as long as they are not abandoning backwards compatibility it won't make up for the language flaws.


True. What they are doing with PHP 7 and 8 is pretty impressive.


Not necessarily. The TclQuadCode [0] compiler, which compiles Tcl (which is even more flexible than PHP) into native machine code, handles a very large number of cases.

[0] https://core.tcl.tk/tclquadcode/dir?ci=trunk


It depends on the code. You can write PHP code with all types either explicitly specified or inferable and no dynamic variable fun without it being too unidiomatic.
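
As a rough illustration of that style, here is a sketch in which every parameter, return value and intermediate result has a type that can be read off the source without running it. The function names are made up for the example, and nothing here claims that the compiler from the article accepts this exact code.

```php
<?php
// No eval(), no $$, no extract(): all types are declared or trivially inferable.
function totalLength(string $first, string $second): int
{
    // strlen() returns int, so the sum is statically known to be int.
    return strlen($first) + strlen($second);
}

function average(int $total, int $count): float
{
    // The declared return type widens an exact integer quotient to float.
    return $count === 0 ? 0.0 : $total / $count;
}
```

Code like this still reads as ordinary PHP, which is the point of the comment above: the dynamic features are opt-in, so avoiding them costs little.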


That's how we handled it in phc, for exactly that reason. Turns out it was also useful for simple things like "what is the value of this float", which changed between different versions. It also gets you the standard library.

Detailed discussion here: https://paulbiggar.com/research/#sac-2009


Common Lisp is more dynamic than PHP yet is still compiled to reasonable assembly.


Common Lisp is much more dynamic in some ways, but much less dynamic in others. In PHP you can't redeclare functions or classes, while in Common Lisp you can. On the other hand, in PHP you can manipulate the current scope and make variables appear out of thin air, but you can't do stuff like that in Common Lisp.

For example in PHP:

    $a = $_GET['greeting'] ?? 'hello';
    eval('echo $a;'); // => hello, or an error or anything else if there is a query parameter called greeting.
    $c = [ $a => eval('return "hi";') ];
    extract($c);
    echo $hello; // => hi, or an error or anything else if there is a query parameter called greeting.
While in CL:

    (let ((a "hello"))
      (eval 'a))
    ; => debugger invoked, the variable A is unbound.
A CL compiler can make a lot of assumptions about the local environment, and only has to account for real dynamicity in the global environment. For PHP it's basically the other way around. Since most optimizations seem to be relatively localized, that might be a real issue. But then, of course, a good compiler could fall back to unreasonable assembly wherever such dynamic features are used.


You can't manipulate the lexical environment using the standardized language, but SBCL and other Lisps provide libraries for programmatic manipulation of the lexical environment.


Conceptually, you just need to add an extra layer of indirection to local variable lookup.


But we can use dynamic binding:

  CL-USER 29 > (let ((a "hello"))
                 (declare (special a))
                 (eval 'a))
  "hello"


If they let you call a function to eval a dynamic variable containing PHP code then yes, but that's generally not a great programming practice from a security and performance perspective.


It's not a great practice but super common. I have seen code like

$i = "j"; $$i = 3;

WTF?


This isn't all that uncommon in C. It's not really fundamentally different than pointers to pointers, just a slightly different syntax:

    int a = 1; int *b = &a; *b = 2;
If this can be compiled to efficient assembly, I don't think that kind of variable binding presents much more of a challenge.


The content of the $$ is a string so it can be anything including invalid variable names or unknown variables. There is nothing the compiler can do. In C it's totally deterministic.
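
A small sketch of that point, with a hypothetical function name: the string behind `$$` does not have to be a legal identifier, so the set of possible variable names is not visible in the source.

```php
<?php
// Hypothetical demo: variable variables accept any string as a name,
// including ones that could never be written as a literal $identifier.
function arbitraryNames(string $name): array
{
    $$name = 42;               // creates a local whose name is only known at run time
    return get_defined_vars(); // the function's local symbol table as an array
}
```

Calling `arbitraryNames('not a valid identifier!')` returns an array that contains that odd key alongside `name`, which is why a compiler cannot enumerate the locals ahead of time here.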


In C the variable to be dereferenced can be uninitialized or it can result in a segfault. $$ is similar, though the potential space indicated by an arbitrarily long string of characters is much larger than the potential space indicated by an integer or long. Really, I think PHP has an advantage here since it only needs to reference its internal symbol table, which it has full control over, vs. memory at large, where an erroneous symbol could actually be in a program's data memory and look like it's correct. A typo'd var var would have to be either a valid symbol name or necessarily be known to be bad.


This isn't true: a C program could read a pointer address from the user, convert it to an int and store it in the two-star pointer, then dereference it (this is essentially what gdb's `x` command does).

All this is essentially a double hashtable lookup (assuming the lexical environment is implemented as a hashtable, but if it's something else, it's similarly trivial):

    environment[environment["i"]] = 3
People have been compiling extremely dynamic languages since the Lisps, at the very least; and, assembly itself isn't really a "statically typed" language.


Such a program invokes undefined behavior and thus isn't a correct C program.


Hmm, as far as I can tell, if you cast an appropriately sized unsigned integer to a pointer type and then dereference the pointer, the result is only UB if the integer isn’t the address of an initialized memory location compatible with the target pointer type.


You just have the compiler add all local variables into a dictionary of pointers keyed on the variable name whenever $$ appears in scope. Then compile all $$ as the dictionary lookup.
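
Here is a sketch of what that lowering could look like, written in PHP itself so the effect is easy to check. It is only a model of the idea, not what any real compiler emits; the function name and the fallback for unknown names are assumptions.

```php
<?php
// Hypothetical lowering of:  $i = "j"; $$i = 3;
function loweredVariableVariable(): array
{
    $i = 'j';
    $j = null;

    // The "dictionary of pointers": one reference per local, keyed by its name.
    $locals = ['i' => &$i, 'j' => &$j];

    // $$i = 3 becomes: read the name out of $i, then store through that slot.
    $target = $locals['i'];
    if (!array_key_exists($target, $locals)) {
        // Name not known at compile time: create a fresh slot (assumed fallback).
        $locals[$target] = null;
    }
    $locals[$target] = 3; // writes through the reference, so $j itself changes

    return ['i' => $i, 'j' => $j];
}
```

Locals in scopes that never mention `$$` would not need the dictionary at all, so the cost stays confined to the functions that use the feature.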


The fact that ${anything interpreted as a string} exists is what makes PHP great and awful at the same time. It lets you do things that would demand a lot of contrived patterns, classes and interfaces in other languages. But it also means you usually end up with a nightmare to maintain, full of security holes.


Yeah, I would never hire someone using dynamic variables, any occurrence of "$$" is a no-no and just denotes poor engineering.


I think if PHP is the only language you know this pattern probably feels extremely elegant. The JS guys also like to do stuff like this.


I write a lot of JS (read: TypeScript if I have any say in it) for work so I’m probably a JS guy.

I hate stuff like that. It breaks more than it helps. It overcomplicates trying to dance your way through reasoning what is actually happening. It’s unreliable and I hate it.

Carry on...


Just wait until you meet dynamic dynamic variables, nothing more elegant than that...

    $x=0;$a='x';$b='a'; echo $$$b;


Lazy coder here :

    foreach (['a' => $closureA, 'b' => $closureB, 'c' => $closureC] as $name => $toApply) {
      $myObject->{convertNameToFieldName($name)} = $toApply($something);
    }


Too clever by half - a textbook example. This is, in essence, write-only code.


Haven’t seen that one in the wild. Can that chain just keep going?!


Not sure how long .. but the chain can be quite long!!

    $x = 'y';
    $y = 'x';
    echo $$$$$$$$$$$$$y;
I've tested with 1500 `$` and it still works :)
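
The reason it keeps working can be sketched as a loop: each extra `$` is one more lookup that uses the previous result as the next name. The model below is simplified and hypothetical (real PHP converts non-string names to strings and warns on undefined variables rather than throwing).

```php
<?php
// Hypothetical model of how a chain like $$$y resolves: each "$" is one
// lookup in the symbol table, using the previous result as the next name.
function resolveChain(array $symbols, string $name, int $dollars)
{
    $value = $name;
    for ($k = 0; $k < $dollars; $k++) {
        if (!is_string($value) || !array_key_exists($value, $symbols)) {
            throw new RuntimeException('undefined variable in chain');
        }
        $value = $symbols[$value];
    }
    return $value;
}
```

With `$x = 'y'` and `$y = 'x'` the two names point at each other, so the result simply alternates with the number of `$` signs and the chain never runs out.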


Good to know that this method scales well ;).


Looks really cool, but it's a little above my skill level. Does this mean you could protect your PHP source code if you wanted to compile and then distribute it?


Please don't do this. It's both ineffective and pisses off those of us who try to produce good bug reports when your obfuscated software misbehaves.


This is just a general issue with third party code - C/C++ libraries have always suffered from this weakness. Still, I've worked in places where if we were unable to distribute our library closed source the business wing wouldn't let us distribute it at all - everyone is welcome to their thoughts on copy-left software, but a thing that exists wouldn't have existed if we could only release it open-source (and yes, we even explained how licensing worked to the higher ups)


I'm confused.

This is a compiler. The difference between distributing the output of this and a compiled C or C++ library is basically that this emits machine code that no sane C or C++ compiler would emit today. This does not prevent me from figuring out how the parts you want kept secret work, it only spreads bugs out over more code.

My objection is twofold: first, on practical grounds, this makes producing good bug reports harder. That's because there's more code to prove isn't the true source of the problematic behavior.

The second is aesthetic. If your goal is to keep your secrets, this does not accomplish it.


No, such behavior does not make technical sense, any more than bits can be coloured. Yet... https://ansuz.sooke.bc.ca/entry/23


I understand why this is generally not recommended but there are situations where someone like you wouldn't have to submit a bug report, so chill man.


I do security for a living. Generally speaking, code that A) is obfuscated, B) is written in PHP, and C) I'm told won't have bugs... is exactly the code I'm probably going to need to file bugs against.


Yes, the compile script outputs code via LLVM as a binary. So yeah, you could do it.


Except the "obfuscation" is a side effect, and not a very strong one at that.


Cool! Thanks.


Another cool (and more functional) compiler is Peachpie. You can compile WordPress and run it on the .NET VM. From the reports it's even a few times faster than "native" PHP :)

https://www.peachpie.io


I appreciate the clear explanations of AOT, VM, etc. Someone tried to explain AOT to me a while back, but I didn’t understand anything.


Does anyone know how this compares to PHC? It was an ahead-of-time compiler for PHP.

https://github.com/pbiggar/phc


phc is dead, so there's that :)

Apart from that, it seems to be following a similar path, but is a little bit earlier down it. Could probably look at the code we generated and copy some of the solutions.

PHC was also designed to support eval/dynamic features by embedding the PHP interpreter and optimizing around it. Not sure that this implementation is going down that path, which has upsides and downsides, the upside being that eval can be handled. Would be interested to see what he ends up doing.


How heavily does eval get used in PHP?


More than you'd think. I did some analysis (in 2008 mind you, long time ago, PHP4/PHP5). I think I found that half the packages on sourceforge (sourceforge!) had a use of a dynamic construct.

When I looked into it, it wasn't so much eval as dynamic includes, often used for localization:

    $lang = "en";
    include("l18n/$lang.php");
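
For comparison, a sketch of how the same localization lookup could be written so that every path is a literal. This is an assumed rewrite, not something from the analysis: `translationFile` and the `de` entry are hypothetical.

```php
<?php
// Hypothetical allow-list: every path a compiler would have to consider is a
// literal here, instead of being assembled from a run-time string.
function translationFile(string $lang): string
{
    $known = [
        'en' => 'l18n/en.php',
        'de' => 'l18n/de.php', // assumed to exist; only "en" appears in the original
    ];
    // Unknown or malicious values fall back to the default language.
    return $known[$lang] ?? $known['en'];
}
```

The caller would then write `include translationFile($lang);`. The include is still chosen at run time, but from a finite set that is visible in the source.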


It is not used a lot in modern code. The only use case that I think is reasonably OK nowadays is template engines or config caches in development mode: they usually compile template code to generated PHP files, but eval() it immediately during development for simplicity.


This is amazing work and great job documenting! I had no idea about the Foreign Function Interface coming to PHP 7.4, that's exciting.


Wonder why meta-compilation is not more popular? There are already some frameworks available like RPython for quite some time, Truffle is relatively new.


Very impressive work, will comment again if I manage a test compile of some minor project! :)



HipHop turned into HHVM, which as of Feb no longer supports PHP going forward. https://hhvm.com/blog/2019/02/11/hhvm-4.0.0.html



