Show HN: A PHP parser written in PHP (github.com/nikic)
77 points by nikic on Feb 22, 2012 | 39 comments



This is not a snide comment, I'm truly curious. What is the point of "An [x] parser/interpreter/compiler written in [x]"? I've now seen one for JS and this one for PHP. I lean more toward the sponge learning [1] side of HN, so forgive me if this is super obvious.

[1] http://alexrosen.com/blog/2011/05/sponge-learning/

edit: grammar.


The benefit is that you can use it to transform PHP code into an abstract syntax tree [1]. This allows you to do some really cool things like static analysis, code transformation, preprocessing, etc. in a convenient way.

The fact that it is written in PHP is nice because it allows you to use it in existing PHP projects in an environment you are already familiar with. I think the fact that it is written in PHP is less important than the fact that it can be used in PHP (e.g. it would be equally useful as a PHP extension written in C).

[1] http://en.wikipedia.org/wiki/Abstract_syntax_tree


For instance, to allow meta-programming (code executed at compile time) in the same language as the one being interpreted/compiled.

But more formally, a self-hosting compiler says stuff about the language per se, see also http://en.wikipedia.org/wiki/Bootstrapping_(compilers).


a self-hosting compiler says stuff about the language per se

That it's Turing-complete and can write to a file? I think all it really says is that somebody was hard-headed enough to do it.


Thanks for the answer and the link. I'm still trying to wrap my head around the chicken/egg problem, but that may take a little while.


Line one in tfa: "It's purpose is to simplify static code analysis and manipulation."


It doesn't necessarily apply to this specific posting. But in general, as with many open source projects out there, the point is to increase the author's visibility on the net and to help the author's CV when looking for a job.

And there is nothing wrong with this. It is just that many authors don't want to say so and try to come up with artificial rationales instead.


As far as I'm concerned, "just for the hell of it" is the most noble of all rationales when it comes to stuff like this.


This would be a great way to create a PHP-based templating system that just uses a subset of PHP as syntax. Would be very fast.


This must be a stupid question because I've noticed that it's commonplace, but why do PHP-based projects try to build templating systems in PHP? Do they not realize that the original use case for PHP was to turn plain HTML files into dynamic templates?

I saw this in Horde, the other day; mixed in with the usual PHP tags they had added their own XML tags for doing "if/else" type things in the template. What is the purpose of that?


I believe it's an attempt to assure there is no logic in the template code.

...which I think is retarded and anal, since the important part is separation of business logic and presentation. There is bound to be some level of display logic, which is why most template languages have loops and conditionals. But if you are going that far, why not just use PHP? It is arguably a very good template language.


PHP already has this built into it - http://us.php.net/alternative%20syntax - just use it.
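
For anyone who hasn't seen the alternative syntax, here is a minimal sketch of what it looks like in a template. The `render_list()` helper and the sample data are made up for illustration; only `foreach (...):` / `endforeach;` is the built-in feature being shown.

```php
<?php
// Render a list using PHP's alternative control-structure syntax.
// render_list() and its $items argument are hypothetical names for this sketch.
function render_list(array $items)
{
    ob_start(); // capture the template output as a string
?>
<ul>
<?php foreach ($items as $item): ?>
  <li><?php echo htmlspecialchars($item); ?></li>
<?php endforeach; ?>
</ul>
<?php
    return ob_get_clean();
}

echo render_list(array('apples', 'pears & plums'));
```

The markup stays readable as HTML, and the only "template language" involved is PHP itself.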


That's not a subset, that's just a different syntax.

If you're making a template language for use in a SaaS system, you'd want to perhaps allow echo, but not file_put_contents.
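
A rough sketch of how such a subset check might look, using only the built-in tokenizer. The helper name `find_disallowed_calls()` and the whitelist are assumptions for illustration, and a token-level scan like this is naive: it is not a real sandbox.

```php
<?php
// Sketch: list function calls in template code that are not whitelisted.
// find_disallowed_calls() and $allowed are hypothetical names.
function find_disallowed_calls($code, array $allowed)
{
    // Drop whitespace and comment tokens so "name (" still looks like a call.
    $skip = array(T_WHITESPACE, T_COMMENT, T_DOC_COMMENT);
    $tokens = array();
    foreach (token_get_all($code) as $t) {
        if (is_array($t) && in_array($t[0], $skip, true)) {
            continue;
        }
        $tokens[] = $t;
    }

    $bad = array();
    $n = count($tokens);
    for ($i = 0; $i + 1 < $n; $i++) {
        $tok = $tokens[$i];
        // An identifier (T_STRING) directly followed by "(" is treated as a call.
        if (is_array($tok) && $tok[0] === T_STRING && $tokens[$i + 1] === '('
            && !in_array(strtolower($tok[1]), $allowed, true)) {
            $bad[] = $tok[1];
        }
    }
    return $bad;
}

$template = '<?php echo strtoupper("hi"); file_put_contents("/tmp/x", "y");';
print_r(find_disallowed_calls($template, array('strtoupper')));
```

`echo` passes because it is a language construct with its own token rather than a function call. The check misses variable functions (`$f()`), `eval`, `include`, backticks, and fully qualified names on newer PHP versions, and it over-reports method calls and declarations, which is exactly why working on a proper syntax tree is attractive for this use case.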


Funny you say that, because what you are describing is already implemented and called the Smarty template system[1], one of the worst examples of the Inner-platform effect[2] the world has ever seen.

1: http://www.smarty.net/ 2: https://en.wikipedia.org/wiki/Inner-platform_effect


I've been saying this for years now and people think I hate templates as a concept. I only dislike template systems like Smarty. Ever since I learned that there is a {php}{/php} tag in Smarty, I knew the system was broken.


{php}{/php} is deprecated as of Smarty 3.


The issue with Smarty is that it is yet another syntax to learn. A PHP template system that uses PHP for the logic and control structures makes a lot more sense IMO.


That is what PHP is.


I don't really think Smarty has anything to do with this. Though I do agree on the "one of the worst examples" point.


Well apparently this is not a popular opinion around here but I actually like Smarty when the situation calls for it. In version 3 the "block" feature makes Smarty templates somewhat similar to .NET "Master Pages". I personally like using Smarty as the View in an MVC pattern. I've even used it for email message templates.

I have one reason for not using Smarty in my more recent work, but it has nothing to do with my feelings about Smarty itself. It's just that a lot of web apps lately have all of the view logic in the client, so the server rendering of the view is often just outputting mostly static HTML for the JavaScript to use as its starting point. So it's not that I dislike Smarty, it's just that there's barely any logic at all happening at that level, so there's no need for it.


I used to use Smarty for all my projects, mostly because I had designers working on them and it was easier for them to read and manage.

But you can write code that is just as readable without Smarty, and load times are much faster (although Smarty does have a caching system).


As someone who has been forced to work on a social web app using Smarty, I find myself asking co-workers to "just kill me now". Hey, at least I'm having a fun time moving all the business logic those smart Smarty guys from India put in... right? No, but seriously, kill it with fire.

And caching... on disk, who cares? In general, do we not have better, faster and more versatile options? Is there anything that would in any way justify choosing Smarty?


PHP already has this: the `disable_functions` INI setting.[1]

Disclaimer: I've never had to use this, so don't know what the limitations and pitfalls are for this, and there may be security concerns that I'm unaware of.

[1] http://www.php.net/manual/en/ini.core.php#ini.disable-functi...
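
A small sketch of how a script could inspect that setting at runtime. The `is_disabled()` helper is a hypothetical name; `ini_get()` is the only built-in it relies on. As far as I know, `disable_functions` has to be set in php.ini itself (it cannot be changed with `ini_set()`), and it only affects internal functions.

```php
<?php
// Report whether a function name is listed in the disable_functions setting.
// is_disabled() is a hypothetical helper for this sketch.
function is_disabled($name)
{
    // The setting is a comma-separated list; it may be empty.
    $list = explode(',', (string) ini_get('disable_functions'));
    $list = array_map('strtolower', array_map('trim', $list));
    return in_array(strtolower($name), $list, true);
}

var_dump(is_disabled('file_put_contents'));
```

Note that this applies to the whole PHP process, not to a single template, so it is a much blunter tool than a per-template whitelist.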


There is a PHP port of the extremely awesome and popular HAML templating system from Ruby, phphaml: http://phphaml.sourceforge.net/


There is also a modern PHP implementation of HAML at https://github.com/arnaud-lb/MtHaml

I believe it's much faster too.


Lithium's templating "engine" does something like that, using the tokenizer.


Cool... Now define your Node datatype in Haskell deriving Show & Read, and pretty-print to that. Then you can (easier) do some interesting analysis and transforms!


How does this compare to PHP_CodeSniffer and its Tokenizer? You're both using token_get_all under the hood, but PHPCS has support for CSS/JS as well.


PHP_CodeSniffer - as you already say - works with the source code at a token level. This is necessary, because it looks at the precise formatting of the code (like whitespace usage).

The parser is more for analyzers that are not interested in the precise formatting of the code, but want to look at the code from a higher level perspective.

For example, if you want to do control flow analysis and type inference working directly on the tokens would be really hard. An Abstract Syntax Tree makes this kind of work much easier, as you don't have to think about the tiny details of the language.
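
To illustrate why a tree helps, here is a toy sketch. The array-based node shape below is invented for this example and is not the node format the parser actually produces. It represents `$a = foo($b + bar(1));` and finds every function call with a short recursive walk.

```php
<?php
// Hypothetical array-based tree for:  $a = foo($b + bar(1));
// (An illustration only, not the project's real node classes.)
$ast = array(
    'type' => 'assign',
    'var'  => array('type' => 'variable', 'name' => 'a'),
    'expr' => array('type' => 'call', 'name' => 'foo', 'args' => array(
        array(
            'type'  => 'plus',
            'left'  => array('type' => 'variable', 'name' => 'b'),
            'right' => array('type' => 'call', 'name' => 'bar', 'args' => array(
                array('type' => 'number', 'value' => 1),
            )),
        ),
    )),
);

// Recursively collect the names of all called functions.
function collect_calls(array $node)
{
    $calls = array();
    if (isset($node['type']) && $node['type'] === 'call') {
        $calls[] = $node['name'];
    }
    foreach ($node as $child) {
        if (is_array($child)) {
            $calls = array_merge($calls, collect_calls($child));
        }
    }
    return $calls;
}

print_r(collect_calls($ast));
```

The walker never has to care about whitespace, parentheses or operator precedence; the nesting of the tree already encodes all of that.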


One of the interesting things about Groovy for me has been the AST transformation stuff - annotating something as @Singleton, then having the engine make it into a singleton at compile time, etc.

Certainly this project doesn't get us there immediately, but might give some neat ideas for future PHP versions to incorporate.


Can it compile itself then?


It's only a parser - it just transforms plain source code into an abstract syntax tree representation. However, if you wanted to, you could use this tree for a variety of things - including translating and generating compiled code.


It's not a compiler, just a syntactic parser.


<?php eval($codeBase); ?>


Yo dog...


PHP inception brain explode


It seems like you could have saved yourself quite a bit of parsing/lexing work if you had used the parser that ships with PHP:

http://us3.php.net/manual/en/function.token-get-all.php http://us3.php.net/manual/en/function.token-name.php

Very cool nonetheless.


They are very different things. token_get_all just tokenizes the code, but this tool parses PHP code into an AST. If you look at the source of this project, you'll notice that it does indeed use token_get_all to handle the lexing.
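
A quick sketch of what the tokenizer alone gives you. `token_get_all()` and `token_name()` are the built-in functions mentioned above; the `token_names()` wrapper is just a hypothetical helper for this example.

```php
<?php
// Return the flat list of token names that token_get_all() yields.
// token_names() is a hypothetical helper for this sketch.
function token_names($code)
{
    $names = array();
    foreach (token_get_all($code) as $token) {
        // Most tokens are arrays of [id, text, line]; single characters
        // such as "+" or ";" come back as plain strings.
        $names[] = is_array($token) ? token_name($token[0]) : $token;
    }
    return $names;
}

echo implode(' ', token_names('<?php echo 1 + 2;')), "\n";
```

The result is a flat sequence, whitespace tokens included. Nothing in it says that `1 + 2` forms a single expression that is the argument of `echo`; recovering that structure is the parser's job.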


I just went and read Lexer.php to see what you mean. Never mind on my previous comment :)

Well done with the project--I have a use case for it: enforcing the PHP style guide @ $work.



