Do you have any stats on the difference in CPU/memory usage with this approach? Obviously your database/memcache calls are optimized, but I'm curious what the overall effect is.
When we deployed it, I started by adding only the overhead (output buffering and storing the output in a tree to render it later) throughout the entire codebase. When I benchmarked this step before and after, the performance difference on our pages wasn't measurable; it was within the margin of error. That means everything from then on was pure gain, once we started using it to actually save time on DB and cache calls.
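To make the "overhead" step concrete, here is a minimal Python sketch of the idea as we read it from this thread, not the author's PHP implementation: instead of echoing HTML immediately, code appends literal chunks and placeholder nodes to a tree; the page then collects every data key it needs, fetches them in one batch, and renders the tree in a single traversal. The class names `OutputTree` and `Placeholder`, the `write`/`defer`/`child` methods, and `fetch_many` are all hypothetical.

```python
from typing import Callable, Dict, List, Union


class Placeholder:
    """Hypothetical: a hole in the output, filled in once its data has been fetched."""

    def __init__(self, key: str, render: Callable[[object], str]) -> None:
        self.key = key        # identifies the data this hole needs
        self.render = render  # turns the fetched value into HTML


Node = Union[str, Placeholder, "OutputTree"]


class OutputTree:
    """Hypothetical: buffers output as a tree instead of echoing it immediately."""

    def __init__(self) -> None:
        self.children: List[Node] = []

    def write(self, text: str) -> None:
        self.children.append(text)  # literal HTML, stored as-is

    def defer(self, key: str, render: Callable[[object], str]) -> None:
        self.children.append(Placeholder(key, render))

    def child(self) -> "OutputTree":
        sub = OutputTree()  # a nested region, e.g. a sidebar or widget
        self.children.append(sub)
        return sub

    def keys(self) -> List[str]:
        """Collect every data key in the tree so they can be fetched in one batch."""
        found: List[str] = []
        for node in self.children:
            if isinstance(node, Placeholder):
                found.append(node.key)
            elif isinstance(node, OutputTree):
                found.extend(node.keys())
        return found

    def render(self, data: Dict[str, object]) -> str:
        """Single traversal: copy literal text, fill placeholders from `data`."""
        parts: List[str] = []
        for node in self.children:
            if isinstance(node, str):
                parts.append(node)
            elif isinstance(node, Placeholder):
                parts.append(node.render(data[node.key]))
            else:
                parts.append(node.render(data))
        return "".join(parts)


def fetch_many(keys):
    """Hypothetical stand-in for ONE batched DB/memcache call."""
    return {k: k.upper() for k in keys}


page = OutputTree()
page.write("<h1>")
page.defer("user:1", lambda name: str(name))
page.write("</h1>")
sidebar = page.child()
sidebar.write("<p>")
sidebar.defer("user:2", lambda name: f"friend {name}")
sidebar.write("</p>")

data = fetch_many(set(page.keys()))  # one round trip for the whole page
html = page.render(data)
print(html)  # → <h1>USER:1</h1><p>friend USER:2</p>
```

The buffering and the tree walk are the only costs added on top of plain echoing; the savings come from replacing many scattered lookups with the single `fetch_many` call.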
Right now it's impossible for me to compare using it and not using it on a given page, because it requires so much rewriting and restructuring of parts of the code that you can't just turn it on or off.
I think the best way to compare would be to fork an open-source framework, adapt it to use this technique, and then compare the CPU and memory usage of running the same site on both versions. I wish I had time to do that kind of tedious research for an article, but I don't. My goal was merely to share the idea; I'd be thrilled if someone picked it up and built an implementation that everyone can measure.
I developed this technology over months on a constantly shifting closed-source codebase with 20+ developers committing code daily, which is another reason why a before/after comparison is hard to achieve. Deploying it was a massive task in itself, nothing like a change you could make on a single source branch.
If I had to guess, I'd say PHP memory usage would increase, but not by much. After all, you're only storing as much data as your final page's HTML output (which you generally want to keep small), plus some very small objects. And if your traditional MVC framework was already doing a lot of output buffering, there might be no difference at all: the buffering just moves into this new technique. As for CPU, I don't think the cost would be noticeable; we're just adding things to a small tree and then traversing it once.
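One way to sanity-check this guess on a toy scale is to time and trace the same page rendered two ways. The sketch below is Python rather than PHP, so its numbers say nothing about the author's production system. `render_direct` appends HTML to one buffer (roughly analogous to echoing into a single output buffer), while `render_deferred` stores literal chunks plus tiny placeholder tuples and fills them in one pass. Both function names and the tuple-based placeholder are hypothetical, and no real DB or cache is involved.

```python
import time
import tracemalloc
from typing import Callable, List, Tuple


def render_direct(rows: int) -> str:
    """Baseline: append finished HTML to one buffer as it is produced."""
    out: List[str] = []
    for i in range(rows):
        out.append(f"<li>item {i}</li>")
    return "".join(out)


def render_deferred(rows: int) -> str:
    """Deferred: keep literal chunks plus small placeholder objects, then fill them in one pass."""
    nodes: List[object] = []
    for i in range(rows):
        nodes.append("<li>")
        nodes.append(("item", i))  # hypothetical placeholder: (data key, id)
        nodes.append("</li>")
    # Single traversal: strings are copied, placeholders are resolved.
    return "".join(n if isinstance(n, str) else f"{n[0]} {n[1]}" for n in nodes)


def measure(fn: Callable[[int], str], rows: int) -> Tuple[float, int]:
    """Return (wall-clock seconds, peak traced bytes) for one render."""
    start = time.perf_counter()
    fn(rows)  # timed without tracemalloc, which would slow it down
    elapsed = time.perf_counter() - start

    tracemalloc.start()
    fn(rows)  # second run only to record peak Python allocations
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return elapsed, peak


if __name__ == "__main__":
    assert render_direct(1000) == render_deferred(1000)  # same HTML either way
    for fn in (render_direct, render_deferred):
        secs, peak = measure(fn, 10_000)
        print(f"{fn.__name__}: {secs * 1000:.2f} ms, peak {peak / 1024:.0f} KiB")
```

A fair comparison of the real technique would still need the full-site fork described above, since the savings it is meant to produce come from batched DB/cache access, which this toy omits.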
I think the most wasteful part of our implementation is the small amount of extra code it takes to deal with datex instead of just echoing content. That's why I mention in the article that this would be better handled at the language level, where the memory and CPU concerns could be highly optimized and the feature would also benefit from lighter syntax.