Show HN: A PHP parser written in PHP

aarondf · on Feb 22, 2012

This is not a snide comment, I'm truly curious. What is the point of "an [x] parser/interpreter/compiler written in [x]"? I've seen one now for JS and this one for PHP. I lean more toward the sponge learning [1] side of HN, so forgive me if this is super obvious.

[1] http://alexrosen.com/blog/2011/05/sponge-learning/

edit: grammar.

mfonda · on Feb 22, 2012

The benefit is that you can use it to transform PHP code into an abstract syntax tree [1]. This allows you to do some really cool things like static analysis, code transformation, preprocessing, etc. in a convenient way.

The fact that it is written in PHP is nice because it allows you to use it in existing PHP projects in an environment you are already familiar with. I think the fact that it is written in PHP is less important than the fact that it can be used in PHP (e.g. it would be equally useful as a PHP extension written in C).

[1] http://en.wikipedia.org/wiki/Abstract_syntax_tree
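
To make the static-analysis point concrete, here is a minimal sketch. It uses Python's built-in `ast` module as a stand-in, since the API of the PHP parser under discussion is not shown in this thread. The sample source and the `called_names` helper are made up for illustration.

```python
import ast

# Hypothetical sample program to analyze.
SOURCE = """
def greet(name):
    print("Hello, " + name.upper())

greet("world")
"""

def called_names(source):
    """Return the sorted names of all functions and methods called in source."""
    tree = ast.parse(source)  # source text -> abstract syntax tree
    names = set()
    for node in ast.walk(tree):  # visit every node in the tree
        if isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Name):         # plain call: f(...)
                names.add(func.id)
            elif isinstance(func, ast.Attribute):  # method call: obj.f(...)
                names.add(func.attr)
    return sorted(names)

if __name__ == "__main__":
    print(called_names(SOURCE))
```

The analysis never touches the raw text: it only asks the tree "which nodes are calls?", which is the convenience the comment above describes.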

EmielMols · on Feb 22, 2012

For instance to allow meta-programming (programming code executed at compile time) in the same language as is being interpreted/compiled.

But more formally, a self-hosting compiler says stuff about the language per se, see also http://en.wikipedia.org/wiki/Bootstrapping_(compilers).

Zak · on Feb 22, 2012

a self-hosting compiler says stuff about the language per se

That it's Turing-complete and can write to a file? I think all it really says is that somebody was hard-headed enough to do it.

aarondf · on Feb 22, 2012

Thanks for the answer and the link. I'm still trying to wrap my head around the chicken/egg problem, but that may take a little while.

jacquesm · on Feb 22, 2012

Line one in tfa: "Its purpose is to simplify static code analysis and manipulation."

exim · on Feb 22, 2012

It doesn't necessarily apply to this specific posting. But in general, as with many open source projects out there, the point is to increase the author's visibility on the net and to help the author's CV when looking for a job.

And there is nothing wrong with this. It is just that many authors don't want to say so openly and instead come up with artificial rationales.

mbetter · on Feb 23, 2012

As far as I'm concerned, "just for the hell of it" is the most noble of all rationales when it comes to stuff like this.

bradt · on Feb 22, 2012

This would be a great way to create a PHP-based templating system that just uses a subset of PHP as syntax. Would be very fast.

bithive123 · on Feb 22, 2012

This must be a stupid question because I've noticed that it's commonplace, but why do PHP-based projects try to build templating systems in PHP? Do they not realize that the original use case for PHP was to turn plain HTML files into dynamic templates?

I saw this in Horde the other day: mixed in with the usual PHP tags, they had added their own XML tags for doing "if/else" type things in the template. What is the purpose of that?

geon · on Feb 23, 2012

I believe it's an attempt to assure there is no logic in the template code.

...which I think is retarded and anal, since the important part is separation of business logic and presentation. There is bound to be some level of display logic, which is why most template languages have loops and conditionals. But if you are going that far, why not just use PHP? It is arguably a very good template language.

leftnode · on Feb 22, 2012

PHP already has this built into it - http://us.php.net/alternative%20syntax - just use it.

ceejayoz · on Feb 22, 2012

That's not a subset, that's just a different syntax.

If you're making a template language for use in a SaaS system, you'd want to perhaps allow echo, but not file_put_contents.
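
A rough sketch of that "subset" idea: parse the template code, walk the tree, and reject any call that is not on an allowlist. Python's `ast` module is used here as an analogue for a PHP parser. The `ALLOWED_CALLS` set and the `disallowed_calls` helper are hypothetical, with `print` playing the role of `echo` and `open` the role of `file_put_contents`.

```python
import ast

# Hypothetical allowlist of functions a template author may call.
ALLOWED_CALLS = {"print", "len", "str"}

def disallowed_calls(template_source):
    """Return sorted names of called functions that are not on the allowlist."""
    tree = ast.parse(template_source)
    bad = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            if isinstance(node.func, ast.Name):
                if node.func.id not in ALLOWED_CALLS:
                    bad.add(node.func.id)
            else:
                # Method calls and other indirect calls are rejected outright,
                # because their target cannot be checked by name alone.
                bad.add("<indirect call>")
    return sorted(bad)

if __name__ == "__main__":
    print(disallowed_calls('print(len("abc"))'))
    print(disallowed_calls('open("/etc/passwd").read()'))
```

This only checks call names. A real sandbox would also have to deal with imports, attribute access, and dynamic evaluation, so treat it as an illustration of the approach and not a security boundary.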

soult · on Feb 22, 2012

Funny you say that, because what you are describing is already implemented and called the Smarty template system[1], one of the worst examples of the Inner-platform effect[2] the world has ever seen.

1: http://www.smarty.net/
2: https://en.wikipedia.org/wiki/Inner-platform_effect

NameNickHN · on Feb 22, 2012

I've been saying this for years now and people think I hate templates as a concept. I only dislike template systems like Smarty. Ever since I learned that there is a {php}{/php} tag in Smarty, I knew the system was broken.

Gigablah · on Feb 23, 2012

{php}{/php} is deprecated as of Smarty 3.

bradt · on Feb 22, 2012

The issue with Smarty is that it is yet another syntax to learn. A PHP template system that uses PHP for the logic and control structures makes a lot more sense IMO.

themonk · on Feb 23, 2012

That is what PHP is.

kemo · on Feb 22, 2012

I don't really think Smarty has anything to do with this. Though I do agree on the "one of the worst examples" point.

jakejake · on Feb 23, 2012

Well apparently this is not a popular opinion around here but I actually like Smarty when the situation calls for it. In version 3 the "block" feature makes Smarty templates somewhat similar to .NET "Master Pages". I personally like using Smarty as the View in an MVC pattern. I've even used it for email message templates.

I have one reason for not using Smarty in my more recent work but it has nothing to do with my feelings about Smarty itself. Rather just that a lot of web apps lately have all of the view logic in the client. So the server rendering of the view is often just outputting mostly static HTML for the JavaScript to use as its starting point. So it's not that I dislike Smarty, it's just that there's barely any logic at all happening at that level so there's no need for it.

paulhauggis · on Feb 22, 2012

I used to use Smarty for all my projects, mostly because I had designers working on them and it was easier for them to read/manage.

But you can write code that is just as readable without Smarty, and load times are much faster (although Smarty does have a caching system).

easy_rider · on Feb 23, 2012

As someone who has been forced to work on a social web app using Smarty, I find myself asking co-workers to "just kill me now". Hey, at least I'm having a fun time moving out all the business logic those smart Smarty guys from India put in... right? No more. No, but seriously, kill it with fire.

And caching... on disk, who cares? In general, do we not have better, faster and more versatile options? Is there anything that would in any way justify choosing Smarty?

xatax · on Feb 22, 2012

PHP already has this: the `disable_functions` INI setting.[1]

Disclaimer: I've never had to use this, so don't know what the limitations and pitfalls are for this, and there may be security concerns that I'm unaware of.

[1] http://www.php.net/manual/en/ini.core.php#ini.disable-functi...

wadetandy · on Feb 22, 2012

There is a php port of the extremely awesome and popular HAML templating system from Ruby, phphaml: http://phphaml.sourceforge.net/

fooyc · on Feb 22, 2012

There is also a modern php implementation of HAML at https://github.com/arnaud-lb/MtHaml

I believe it's much faster too.

kemo · on Feb 22, 2012

Lithium's templating "engine" does something like that, using the tokenizer.

amosrobinson · on Feb 22, 2012

Cool... Now define your Node datatype in Haskell deriving Show & Read, and pretty-print to that. Then you can (more easily) do some interesting analysis and transforms!

jasonlotito · on Feb 22, 2012

How does this compare to PHP_CodeSniffer, and its Tokenizer? You're both using token_get_all under the hood, but PHPCS has support for CSS/JS as well.

nikic · on Feb 22, 2012

PHP_CodeSniffer - as you already say - works with the source code at a token level. This is necessary, because it looks at the precise formatting of the code (like whitespace usage).

The parser is more for analyzers that are not interested in the precise formatting of the code, but want to look at the code from a higher level perspective.

For example, if you want to do control flow analysis and type inference, working directly on the tokens would be really hard. An Abstract Syntax Tree makes this kind of work much easier, as you don't have to think about the tiny details of the language.
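
The token-level versus tree-level distinction can be sketched with Python's standard `tokenize` and `ast` modules, used here as an analogue for `token_get_all` and the parser. The sample line and the helper names are assumptions for illustration only.

```python
import ast
import io
import tokenize

# Hypothetical input with odd spacing and a trailing comment.
SOURCE = "x   =  1 + 2  # spaced oddly\n"

def token_strings(source):
    """Lex source and return (token type name, token text) pairs."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [(tokenize.tok_name[tok.type], tok.string) for tok in tokens]

def tree_dump(source):
    """Parse source and return a textual dump of its syntax tree."""
    return ast.dump(ast.parse(source))

if __name__ == "__main__":
    for pair in token_strings(SOURCE):
        print(pair)           # the comment survives as its own token
    print(tree_dump(SOURCE))  # the tree holds only the assignment structure
```

The token stream still carries the comment (and each token's position), which is what a formatting checker needs. The tree for the oddly spaced line is identical to the tree for `x=1+2`, which is what a higher-level analyzer wants.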

mgkimsal · on Feb 22, 2012

One of the interesting things about Groovy for me has been the AST transformation stuff - annotating something as @Singleton, then having the engine make it into a singleton at compile time, etc.

Certainly this project doesn't get us there immediately, but might give some neat ideas for future PHP versions to incorporate.

nazar · on Feb 22, 2012

Can it compile itself then?

cfdrake · on Feb 22, 2012

It's only a parser - it just transforms plain source code into an abstract syntax tree representation. However, if you wanted to, you could use this tree for a variety of things - including translating and generating compiled code.
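
As a small illustration of "using the tree for other things", the sketch below parses code, rewrites the tree, and generates source again. It relies on Python's `ast` module (with `ast.unparse`, available from Python 3.9) as a stand-in for a PHP toolchain. The `RenameCalls` class and `rename_calls` helper are hypothetical names.

```python
import ast

class RenameCalls(ast.NodeTransformer):
    """Rewrite calls to old_name(...) into calls to new_name(...)."""

    def __init__(self, old_name, new_name):
        self.old_name = old_name
        self.new_name = new_name

    def visit_Call(self, node):
        self.generic_visit(node)  # transform nested calls first
        if isinstance(node.func, ast.Name) and node.func.id == self.old_name:
            replacement = ast.Name(id=self.new_name, ctx=ast.Load())
            node.func = ast.copy_location(replacement, node.func)
        return node

def rename_calls(source, old_name, new_name):
    """Parse source, rename matching calls, and return regenerated code."""
    tree = RenameCalls(old_name, new_name).visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)  # requires Python 3.9+

if __name__ == "__main__":
    print(rename_calls("y = old_log(1) + other(2)", "old_log", "log"))
```

Because the output is regenerated from the tree, the original formatting and comments are not preserved. That is the usual trade-off of tree-based rewriting compared with token-based tools.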

alexpak · on Feb 22, 2012

It's not a compiler, just a syntactic parser.

dawsdesign · on Feb 22, 2012

<?php eval($codeBase); ?>

wopsky · on Feb 22, 2012

Yo dog...

drewdrewdrew · on Feb 22, 2012

PHP inception brain explode

icheishvili · on Feb 22, 2012

It seems like you could have saved yourself quite a bit of parsing/lexing work if you had used the parser that ships with PHP:

http://us3.php.net/manual/en/function.token-get-all.php
http://us3.php.net/manual/en/function.token-name.php

Very cool nonetheless.

mfonda · on Feb 22, 2012

They are very different things. token_get_all just tokenizes the code, but this tool parses PHP code into an AST. If you look at the source of this project, you'll notice that it does indeed use token_get_all to handle the lexing.

icheishvili · on Feb 22, 2012

I just went and read Lexer.php to see what you mean. Never mind on my previous comment :)

Well done with the project--I have a use case for it regarding enforcing PHP style guide @ $work.