Upcoming Hardening in PHP (dustri.org)
312 points by mmsc 3 months ago | 126 comments



The linked CVE-2024-2961 article is a pretty fantastic read on its own:

https://www.ambionics.io/blog/iconv-cve-2024-2961-p1

People are so creative, I can't help but feel some hope for our future :)


i find it boring. because it's very obvious whoever added php://filter was clearly just adding exploit paths (java flavored ones no less). there's zero valid use for that thing.


Yes. Just like the Log4j issue root cause. Too powerful and abstract features to wield securely.

Or maybe if we keep intent out of it; features were added in a time when we all worried less about security and internet implications. I would like to say ‘in the security dark ages’ but we are probably still in that era. ;)


Thanks for that. I’ve never seen it before. What a neat path they took.


What a perfect shit-storm!

- What people are saying about the filter protocol, and anything else that interprets the files.

- Why do people insist on making all remote file protocols transparent? Fuck that, a file is a file! If you want to support some remote stuff, do that on the OS, where it's opt-in.

- And PHP keeps insisting it's not a web framework while being a web framework. So it has no access control around local file reading, while clearly designing it to be used in a way where only a few directories should be accessible.

- A buffer overflow in glibc, go figure... But yeah, C is perfect and the only issue is that the devs don't know the language well enough or there aren't enough people reviewing this niche library.

- Didn't everybody decide to ditch the pre-unicode encodings from the standard tools? I remember the announcement (that was supposed to include glibc), it made a lot of people angry. Those have been an endless source of CVEs.


> I find it fascinating that people are putting so much efforts optimizing exploitation techniques, yet ~nobody bothers fixing them, even if it only takes a couple of lines of code and 20 minutes.

There's definite reward in having a 0-day. Either you can get a bounty, or sell it in the hacker-souk.

That "couple of lines of code and 20 minutes" is sort of in the eye of the beholder. If you are a highly-experienced language developer, the fixes are likely to be a lot more obvious, simpler, more comprehensive, and robust, than if you are a relatively junior IC.


I think if somebody wants to describe themselves as an "ethical hacker", and a conference wants to let people talk about exploits they've found, the minimum bar for disclosure is at least a description of a mitigation that could be taken, and ideally an actual code diff if it's an open source project.

There's a bit of street cred for finding a 0day, a bit of glamour about figuring out the puzzle. There's not much for the person who fixes it. I think as an industry it might be worth trying to fix that somehow.


Selling hacks is ethical


Let's suppose you are right. Why not accompany that with a proposed fix, too?


I don’t necessarily agree that selling hacks is ethical, but if I've already spent time figuring out how to exploit a system, reporting it to the relevant place is charity. I’ll do that, but I’m definitely not spending time trying to fix the code if the solution isn’t immediately obvious. Even more so if you have to fight to get the bug recognised in the first place.


Why? And: Always?


Paying for bounties is paying for exploits. That is to say, choosing not to pay for exploits is tantamount to selling your customers off for a price, the price of the bounty.


I actually agree, in the same way that selling lock picks or guns is ethical. They are just tools. How they are used is the responsibility of the person wielding them.


I can think of benign uses for lock picks and guns. What is the benign use of a secret exploit?


One example I can think of is the WoW private server Warmane uses an RCE to extend client functionality.

https://www.reddit.com/r/wowservers/comments/1eebxwf/warning...


You've never needed to get root access on an old computer when nobody knows the password?


It doesn't have to be secret. For example, unlocking old phones. There are certainly people waiting for the right exploits to get access to their old wallet.


I think it is probably because a lot of these things are deemed acceptable. For example, the stream filter chain one is only exploitable if the input to some PHP I/O functions like file_get_contents is attacker-controlled, and those cases are already treated as LFR vulnerabilities in the application, not in the language runtime. Also, some of them (e.g. stream filter chains) are fun and useful enough (turning LFI into RCE) that I bet there are definitely some people who would rather those things not be fixed, given that a properly secured application wouldn't be affected.
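
To make the mechanism concrete, here is a minimal, benign sketch of what the `php://filter` wrapper does once a path string reaches `file_get_contents()`. It only uses the built-in `string.toupper` stream filter on a temp file. The `readUserFile()` helper and its "reject anything containing `://`" rule are hypothetical, shown as one possible application-level guard and not as PHP's own mitigation.

```php
<?php
// Create a small file that we are allowed to read, and clean it up at exit.
$path = tempnam(sys_get_temp_dir(), 'hn_');
file_put_contents($path, 'hello');
register_shutdown_function('unlink', $path);

// A plain path returns the bytes unchanged.
$plain = file_get_contents($path);

// The same call with a php://filter prefix runs the content through a
// stream filter first; here the built-in string.toupper filter.
$filtered = file_get_contents('php://filter/read=string.toupper/resource=' . $path);

// Hypothetical guard: refuse wrapper-style paths when the path is user-controlled.
function readUserFile(string $userPath): string
{
    if (strpos($userPath, '://') !== false) {
        throw new InvalidArgumentException('Stream wrappers are not allowed here');
    }
    $contents = file_get_contents($userPath);
    if ($contents === false) {
        throw new RuntimeException('Could not read file');
    }
    return $contents;
}
```

The point is that the caller of `file_get_contents()` never asked for a transformation; the path string alone selected it, which is why attacker-controlled paths matter so much here.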


Breaking something is easier than protecting everything from all fronts.

Hackers write the worst code, but all the mess needs only one successful hit to become a 0day.


Instead of making a website about it, you can take any step of your exploit chain, change the code so that the exploit cannot possibly work, and submit that as a patch. You would still get a CVE number assigned that you can add to your resume.

For example, look at the glibc/iconv CVE some other user posted[1]. In the section "Out-of-bound write when converting to ISO-2022-CN-EXT" they have mapped out the boundary checks. By diagnosing the problem this detailed, they already did 90% of the work. The other 10% are the patch and writing to the mailing list.

[1] https://www.ambionics.io/blog/iconv-cve-2024-2961-p1


Making a website about it benefits other people; finding the vulnerability helps other people; even if it’s 10%, why can’t someone else do it?

Surely someone doing all this would already have submitted a patch if they felt comfortable.


When liability and cybersecurity laws start being enforced more strictly, many companies will certainly bother to fix them.

It's like how no one cares to keep a kitchen clean, or a factory in order, until the inspection shows up and closes the doors.

Naturally even for those, we are at various levels of how those inspections are honestly enforced across the globe.


Well, you can produce the exploit all on your own and showcase it.

But to get your fix in, you'd have to interact with the PHP ecosystem.


Something I'd really like is for PHP to somehow be stricter on the number of arguments passed to a function.

As of now, PHP emits an error if arguments are missing but not if there are too many.

A way to bake that in without breaking old code would be to allow function definition to put an explicit stop to the argument list, for example using the void type keyword:

    function foo (int $a, string $b, void) : bool
    { ... }

A few months ago I discussed this on the development mailing list and people seemed to agree, and even suggested that this would be a good idea by default, without the keyword thing I suggested. But I never got the time to properly write an RFC. There is already an old one from years ago that was voted against, but I was told it was from before anything strict and typing related was considered important in PHP. If anyone's up to it, please write this RFC :) !


FWIW, core PHP functions do throw an ArgumentCountError when passing fewer OR more parameters than the signature allows.


Yep! And currently the behavior is not consistent with user defined functions.
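
A small sketch of the inconsistency being discussed, assuming PHP 8 (on PHP 7 the built-in case produced a warning instead of an exception). The function name `add` is made up for the example.

```php
<?php
// User-defined function that declares exactly two parameters.
function add(int $a, int $b): int
{
    return $a + $b;
}

// Extra arguments to a user-defined function are silently ignored.
$userResult = add(1, 2, 3);

// Too few arguments throw ArgumentCountError (since PHP 7.1).
try {
    add(1);
    $tooFew = 'no error';
} catch (ArgumentCountError $e) {
    $tooFew = 'ArgumentCountError';
}

// A built-in function rejects extra arguments as well (PHP 8 behavior).
try {
    strrev('a', 'b');
    $builtinTooMany = 'no error';
} catch (ArgumentCountError $e) {
    $builtinTooMany = 'ArgumentCountError';
}
```

So the only combination that passes without any error is "too many arguments to a user-defined function", which is exactly the gap the proposal targets.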


I'm not a fan of the void argument syntax. Wouldn't something like the code below work? We already do it with `strict_types=1`.

<?php declare(strict_args=1);


Mark the function with an attribute to allow the old behavior. See link for similar use case.

https://www.php.net/manual/en/class.allowdynamicproperties.p...


Yup, that's even better.


That's equivalent to my initial proposal, except that mine adds the information in the type signature of the function rather than in a decorator.


If I understood your proposal correctly, to get the new behavior you add an explicit stop to the function; my proposal adds an attribute to keep the old behavior.

So, if I understand this right, your version would require updating every function signature to get the new behavior, rather than marking a few functions to keep the old behavior and automatically getting the new, better behavior for every other function for free.


Oh okay, you would do it in reverse. I strongly agree with that. But that means the new feature is opt-out rather than opt-in, and it may break some old code. Maybe it should be done in two-steps (opt-in + deprecation, then opt-out).


Typically you emit E_DEPRECATED for a full major version, then in the next major version you throw an error, e.g if it would land in PHP 9.0 then E_DEPRECATED for non-compliant functions and in PHP 10.0 start throwing errors.
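
The "deprecate first, throw later" path can be approximated in user land today. The sketch below is only an illustration of that idea: `warnOnExtraArgs()` and `foo()` are hypothetical names, and it uses `E_USER_DEPRECATED` because user code cannot raise the engine's own `E_DEPRECATED`.

```php
<?php
// Hypothetical helper: emit a deprecation when a caller passes more
// arguments than the function declares. Variadic functions are skipped.
function warnOnExtraArgs(string $function, int $given): void
{
    $ref = new ReflectionFunction($function);
    if ($ref->isVariadic()) {
        return;
    }
    $declared = $ref->getNumberOfParameters();
    if ($given > $declared) {
        trigger_error(
            sprintf('%s() expects at most %d arguments, %d given', $function, $declared, $given),
            E_USER_DEPRECATED
        );
    }
}

function foo(int $a, string $b): bool
{
    warnOnExtraArgs(__FUNCTION__, func_num_args());
    return $a > 0 && $b !== '';
}

// Collect deprecations instead of printing them.
$deprecations = [];
set_error_handler(function (int $errno, string $errstr) use (&$deprecations): bool {
    $deprecations[] = $errstr;
    return true;
}, E_USER_DEPRECATED);

$ok = foo(1, 'x');                // matches the signature, no notice
$stillOk = foo(1, 'x', 'extra');  // one deprecation, the call still succeeds
restore_error_handler();
```

Swapping `trigger_error()` for a thrown `ArgumentCountError` would correspond to the second step of the migration described above.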


One of the main advantages of actually allowing more arguments is forward compatibility:

You can, within a library, provide an additional argument to a callback without actually introducing a BC break for all users.

My favorite approach would be allowing too many args on dynamic calls (closures, and function calls with dynamic name, not method calls in general) and otherwise rejecting it.


There's no need to have this in the language just to solve the case you describe.

"function current_thing(id: Int, callback: Fn(Int)) {}" and, when you decide you need more you have a myriad of options to add these. From the despised "function current_real_thing(id: Int, success_callback: Fn(Int), error_callback: Fn(Error)) {}" to "some_namespace:current_thing(...)" via "OtherClass::current_thing(...)" to "load current_thing from NewImplementationOfThing" and so on.

Being strict and explicit isn't opposed to being flexible. And strictness and explicitness are most often a prerequisite for future change, rather than something that hampers it.

It's far easier to refactor, maintain, test and reason about strict and limited implementations than to do so with dynamic, runtime-magically-changing implementations.


I found this approach works best with languages having method overloading. For PHP it felt quite limiting, and it also requires you to have more complexity and overhead with wrapping.

But I have no hard evidence at hand, only how I experienced that in PHP.


I'd be curious to read about what percentage of active PHP devs use the recent features. The last time I worked in a PHP codebase (2020?) was half PHP 5 (bad) and half PHP 7 (much nicer). Curious if there's any real info out there on this


I work on multiple projects from PHP 5.2 to PHP 8.3 and everything in between.

Statistics based on packagist(composer)

https://stitcher.io/blog/php-version-stats-january-2024

Statistics based on web servers in the wild

https://w3techs.com/technologies/details/pl-php

https://w3techs.com/technologies/history_details/pl-php


PHP 5 is as close to phased out as it gets at this point. No doubt it's still in a lot of legacy enterprise codebases (lots of breaking changes going from 5 to 7 or 8), but outside of that no one is using it.


PHP 7 was released 9 years ago.


Yeah, and I just finished porting an enormous amount of production code from PHP 5 to 7.x before fully moving it to 8. There are so many breaking changes in each major version, when you have a lot of live projects and clients don't have the budget to pay you to upgrade them, they can lay stagnant for years until way past EOL. It would have been nice to know, for instance, that future versions of PHP would throw warnings about undeclared variables or inaccessible named properties of "arrays" - which could previously be relied upon to be false-ish. That's a major pain point in code bases that treated arrays as simply dynamic objects that could be checked or defined at will. Lots of isset() and !empty() and other BS. Fine, but it takes time to sit down and check it all. I really preferred it when it let you just screw up or try to access a null property or define a variable inside a block and access it later without throwing any errors at all. Nothing about its actual functionality has changed in that regard; it's just errors you have to suppress or be more verbose to get around. In PHP 8 you can still do this:

`if ($a) { $previouslyUndefined = 2; } if ($previouslyUndefined) { echo "yeah"; }`

PHP still knows what $previouslyUndefined is or isn't at the second if statement, but it'll throw an error now in the first statement if you hadn't declared it outside the block. Why? Who cares? Scope in PHP is still understood to be inline, not in block; there is no equivalent to let vs var in JS. Stop telling me where I can check a variable if you're not enforcing various kinds of scope.


Your $previouslyUndefined thing as something that's changed, as far as I know, isn't true? Unless I've missed some very recent change.

If $a is true, that snippet will just execute with no errors. If $a is false you'll get a warning trying to check $previouslyUndefined in the second if. That behavior's been the same for a very long time. The blocks don't matter for scope but the fact that you never executed the line that would have defined the variable does.

Similarly, warnings on accessing array keys that don't exist, that's been a thing forever too. Pretty sure both go back with the same behavior to PHP 4, and probably earlier.
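
The behavior described in this reply can be checked with a short script. This is a sketch that wraps the snippet in a hypothetical `check()` function and collects diagnostics with an error handler; the exact message text and severity (notice vs. warning) differ between PHP versions, so only the count is recorded.

```php
<?php
// Collect warnings/notices instead of printing them.
$messages = [];
set_error_handler(function (int $errno, string $errstr) use (&$messages): bool {
    $messages[] = $errstr;
    return true;
});

function check(bool $a): string
{
    if ($a) {
        $previouslyUndefined = 2;   // only assigned when $a is true
    }
    // Blocks do not create scope, so the variable is visible here if it was assigned.
    if ($previouslyUndefined) {
        return 'yeah';
    }
    return 'nope';
}

$whenTrue = check(true);
$countAfterTrue = count($messages);    // nothing reported so far
$whenFalse = check(false);
$countAfterFalse = count($messages);   // one "undefined variable" diagnostic
restore_error_handler();
```

The diagnostic comes from reading the variable on the path where the assignment never ran, not from assigning it inside a block.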


>Why? Who cares?

I do, if I typoed previouslyUndefined the first time. I get it adds boilerplate, but it also catches stupid bugs


The Laravel ecosystem folks seem to be always up to date in recent PHP developments. At least, that's my impression.


Drupal is very hot on attributes, has fiber support, uses readonly (properties since PHP 8.1, classes since PHP 8.2) extensively, enums are used -- overall it's fairly up to date.


Symfony also does a great job adding polyfills way ahead of a PHP release, e.g. https://github.com/symfony/polyfill-php84


Yes, and this is incredibly annoying. Many packages add them as a dependency, and then you get subtle bugs because of it. Or worse, they add a dependency for the polyfill that is related to an extension and suffer performance issues when the extensions are not installed; yet no warning is output.


Somehow people seem to be missing the fact that this would've been an opt-in feature.

> A way to bake that in without breaking old code would be to allow function definition to put an explicit stop to the argument list, ...


It might be too big a change on a language level given this has been in since forever, but it might be picked up by static analysis / a linter. I'd argue it's always better to have additional protections like this in a linter as the process of adding linter rules is easier and less impactful than making a language change.

It's also always preferred to not add anything to the language imo; in this case, I'd opt to have the interpreter emit a warning or info message. It's not broken, it's a developer error.


Indeed on the PHP internals mailing list some people were saying that it would be better to entirely deprecate passing extra arguments to a non-variadic function, without adding syntax/keyword to the language.


I think this would be nice ergonomically, from a coding perspective, but I'm curious as to how it would be a security threat to pass too many arguments. What's the potential exploit here?


An exploit, I don't know, but like any stricter type verification, it would catch some bugs for sure. Note that builtin functions already throw an ArgumentCountError when passing fewer OR more parameters than the signature allows. My proposal consists in making this behavior consistent for user-defined functions (optionally at first).


The trouble for me, where the rubber meets the road, is external API calls that spread their arguments into a PHP function that takes a bunch of args. So I would love a way to detect if they're sending too many, which I don't think currently exists (?) but not at the expense of breaking the API if they actually do send too many.


I don’t really understand the issue. Already if you have a mismatch, the only way you’d ever know is through static analysis. It will run and maybe crash during run time. I always joke that changing a function signature is the single most risky thing you can do in php (especially if you have any dynamic dispatch). Making it even more risky isn’t the right answer, IMHO.

Oh, and doing this would literally break class autoloading in symfony, and even the engine itself, which relies on this feature.


> the only way you’d ever know is through static analysis

Not for builtin PHP functions which already throw errors on arity mismatch.

> this would literally break class autoloading in symfony, and even the engine itself, which relies on this feature

I don't understand. Could you point to where in the Symfony code it relies on being able to wrongly call a function with more arguments than it expects and will use?

For variadic functions there is the ... operator already in the language since version 5.6, and my proposal wouldn't break that. Also note that builtin functions already throw an ArgumentCountError in PHP 8 when called with too many arguments.

Here is the full thread discussing it on the PHP internal mailing list: https://news-web.php.net/php.internals/122928


> Not for builtin PHP functions which already throw errors on arity mismatch.

It still only happens during run time. Having a never-called function with an incorrect number of arguments is not an error.


Of course! And I strongly agree that static analysis is a good thing. But PHP still is a dynamically typed language and most of its development and usage is done with this dynamic approach in mind. It's not a XOR either, we can have better dynamic error reporting AND develop better static analysis tools at the same time.

Also, due to the nature and usage of PHP, some things cannot be statically analyzed because they're inherently dynamic. A simple example would be MVC frameworks where the routing is done like so: /controller/action/param1/param2/param3 where "controller" references a class and "action" references a method, which will take the "paramN" as arguments through the splatting of an array: `$ctrl->$actn(...$args);`. In such situations it would be nice to have errors/exceptions raised automatically if the URL is wrong (too few OR too many arguments) without having to manually assess everything inside each method. Since PHP 7 and 8 we've been moving away from long lines of isset() and !empty() (and verifications such as is_numeric() etc. thanks to argument typing).
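
Until the engine does this itself, the arity check for such a dynamic call can be done by hand with reflection. The following is a sketch under stated assumptions: `ArticleController` and `dispatch()` are hypothetical names, and a real router would also need to validate the controller and action names.

```php
<?php
// Hypothetical controller used only for illustration.
class ArticleController
{
    public function show(string $id, string $format = 'html'): string
    {
        return "article $id as $format";
    }
}

// Hypothetical dispatcher: compare the number of URL parameters with the
// method signature before making the dynamic call.
function dispatch(object $ctrl, string $actn, array $args): string
{
    $method = new ReflectionMethod($ctrl, $actn);
    $given = count($args);
    $min = $method->getNumberOfRequiredParameters();
    $max = $method->isVariadic() ? PHP_INT_MAX : $method->getNumberOfParameters();

    if ($given < $min || $given > $max) {
        throw new InvalidArgumentException(
            sprintf('%s expects between %d and %d arguments, %d given', $actn, $min, $max, $given)
        );
    }
    // array_values() keeps string keys from being treated as named arguments.
    return $ctrl->$actn(...array_values($args));
}
```

For example, `dispatch(new ArticleController(), 'show', ['42'])` goes through, while passing three URL segments raises the exception instead of silently dropping the last one.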


You should look at `func_get_args()` usage in the wild. This is sometimes used for (mostly outdated) good-enough reasons and doing this might break it?


I know of func_get_args, but proper variadic functions have been a thing since PHP 5.6 (released more than 10 years ago) using the ... operator. Also, my initial proposal doesn't break existing code :).
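
For readers who have not seen both styles side by side, here is a minimal comparison. The function names are made up for the example.

```php
<?php
// Older style: no declared parameters, arguments are read via func_get_args().
function sumLegacy()
{
    return array_sum(func_get_args());
}

// Since PHP 5.6: the signature itself says the function is variadic.
function sumVariadic(int ...$numbers): int
{
    return array_sum($numbers);
}

$legacy = sumLegacy(1, 2, 3);
$variadic = sumVariadic(1, 2, 3);
```

Both return the same result, but only the second one announces in its signature that extra arguments are expected. A strict "no extra arguments" rule applied by default would reject calls like `sumLegacy(1, 2, 3)`, which is the compatibility concern raised above.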


Variadic functions serve a purpose but also change how the engine parses arguments. func_get_args is faster and more efficient.


IIRC, that's how Symfony and other PHP frameworks do dependency injection.


To be specific about static analysis: Lots of tools catch this. Sure, making some checks native would be nice, but for instance PHPStan always catches this, and more.

Regardless of the ‘improve the language’ angle: if somebody isn’t running PHPStan (or Psalm, Sonar, etc.), then they’re missing out.

PHPStan is currently so good that using it should be non-negotiable. So the question would then even be: “I’d like rule 123 of the tool to be native, who helps with the RFC?”


I don’t find these tools too useful. What they catch feels like trying to force PHP to be an entirely different language, and doesn’t actually help with maintaining code. Then again, I’m usually writing low-level libraries that take advantage of quirks in the language for performance:

    $a = &$arr[];

Gives you a reference to null and appends it to the array. Then you can pass $a to something to mutate the tail “from a distance”. Most people never need this, nor should they use it. But when you are writing a streaming parser, it is quite handy as a one-liner instead of writing

    $a = null;
    $arr[] = &$a;

or keeping track of the current index and dealing with off-by-one errors.

For applications, these static analysis tools are great. For libraries, not so much.
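
A runnable version of the two forms, for anyone who wants to see the effect. The variable names and values are only illustrative.

```php
<?php
$arr = ['first'];

// Append a null element and bind $a to it in one step.
$a = &$arr[];

// Writing through $a fills in the tail element "from a distance".
$a = 'second';

// The two-line equivalent: bind a fresh variable to a newly appended slot.
$b = null;
$arr[] = &$b;
$b = 'third';

// Break the references so later writes to $a or $b do not touch the array.
unset($a, $b);
```

After this runs, `$arr` holds `'first'`, `'second'`, `'third'` in that order.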


> Suggestion to make those parts read-only was rejected as a 0.6% performance impact was deemed too expensive for too little gain.

Big Oof. :( :( :(


I'm okay with the tradeoff. PHP prioritizing speed over uncommon security is the right call here.


missing /j


At a large PHP shop, 0.6% can be tens of millions of dollars.


Over what time period? You’re implying they are spending at least a billion dollars on hardware costs for 0.6% to be tens of millions.


At a large PHP shop, a successful exploit can be the end of the company.


almost everyone has all the things required for those exploits disabled.

why would i accept a performance penalty if i don't allow open('https://google.com') to begin with?

the correct action would be to remove all the stupid features that everyone serious disables anyway.


It seems hard to "disable" the issues mentioned at https://dustri.org/b/upcoming-hardening-in-php.html


PHP is a bad choice to begin with if your use case is that performance critical


No doubt there. It's just that providing a secure platform seems a tad more important.


PHP has always been slow. It's getting slightly faster, but it's still REALLY, REALLY slow for anything CPU heavy. This is why the ML crowd sticks with Python (numpy), which is incredibly fast.

PHP is still lacking: there is no unicode support, and for a web language this is really bad. Also, the way PHP functions makes modern web use (like websockets) impossible; there are hacks around this but they all kind of suck.


Python is slower than most of the horses I bet on. That's pretty slow.

The important - CPU intensive parts - of numpy, pandas, pytorch, and all the other "fast python" libraries out there, are actually written in C.

Pure python should not be used for anything that requires good performance: it is programmer ergonomic, not CPU ergonomic. It is great that through the use of FFIs it has access to powerful libraries written in a language that isn't slow, but that does not make it as a language itself, fast.


That's my point: Python is one of the slowest languages, and still has high quality, high perf libraries like numpy. PHP has no way to install deps that actually are written in asm/fortran or c.


PHP still has PECL[0] which is a huge collection of C extensions.

[0] https://pecl.php.net/


> PHP has no way to install deps that actually are written in asm/fortran or c.

https://www.php.net/manual/en/book.ffi.php is not enough for your needs?


PHP has decent FFI, nothing is stopping you from using the same libraries as you would with Python. Here's someone's quick hack as an example: https://github.com/dstogov/php-tensorflow

For an interpreted language PHP itself is ridiculously fast and the VM is rather small so you can use something else coughElixircough for parallellisation. I use it all the time for data wrangling stuff and database imports because it's robust, fast and PsySh is a pretty neat environment.

The array data structure is quite nice too. It's built on simple parts that are foundational to the VM itself, and very flexible, similar to lists in Lisp-like languages but without the seek lag when data grows due to the indexing.


PsySh

I’ll have to check this out.

Though not popular, PHP can be a surprisingly decent scripting language.


https://psysh.org/

It's very popular, as in a lot of businesses use it, it's just not fashionable.

I think it's a great tool to have. It had gradual typing before it was cool. You can type in like a page of code including the layout and render whatever in a PDO-supported database on a web page, served by the builtin web server, which is great for data exploration and things like SQL optimisation. At the moment I'm handling some data flows and conversions in a project with something like a terabyte of email and office documents that need to go into RDBMS, because there are some liberally licensed lightweight libraries (in contrast to the bulky stuff in Jakarta-land) and the performance is good enough to not be a bottleneck.

Edit: And when a library isn't good enough, I can usually trivially fix or extend it because it's in a familiar language and written by a simple minded person like myself.

There's a degree of clunkiness and incoherence in built-in API:s that might be off-putting at first but the included batteries and PsySH make for a quite decent tool anyway.


There were many many times I'd start writing a bash script, but then switch it to a PHP script. I've done this so many times that now I just start writing in PHP.


> This is why the ML crowd sticks with Python (numpy) thats incredibly fast.

That is not why.

You stick to python because it's your common denominator. You all picked it up in school.

Python is the most popular language around, and one of the slowest.


Actually PHP itself is very fast compared to Python, especially for an interpreted language.

Python only seems fast because all the heavy duty number crunching libraries are actually written in C.


This comment and all siblings fight over PHP vs Python etc., but that just isn’t the bottleneck in most apps.

By far, for most apps, the biggest bottleneck is the database.


Yet, just a pip install away. In PHP this is not possible.


http://pecl.php.net/packages.php

`pecl install Tensor`?


Just a composer require away.


I'm pretty sure this is wrong. PHP has been faster than Python for a long time, but numpy is not written in Python, it's written in C. Just like PHP, coincidentally :)


Python is also written in C.


It's not wrong, numpy is a Python package (written in C) but you can USE it with just a pip install.


I tend to use PHP for my backend work.

In my experience, it’s actually very fast. That may be partly because of the way I write the code, though, and my backend code isn’t really too massive.


> I find it fascinating that people are putting so much efforts optimizing exploitation techniques, yet ~nobody bothers fixing them, even if it only takes a couple of lines of code and 20 minutes.

Like it or not, exploiting just seems more fun and rewarding. A lot of people will be interested to learn on your blog how you came to find and exploit a vulnerability. The 10-line patch gets little attention. Not even taking into consideration bug bounties...


Exploiting is mainly much, much harder. Programmers are pretty good at preventing the obvious exploits so the gaps left to exploit are the tricky ones.


Are these issues very particular to PHP? Honest question, this is all above my current programming knowledge.


Yes. Most languages don't have anything like the filter notation for arbitrary reads to escalate through.


Off topic: there are two key technologies that "digitalized and computerized" the World: Visual Basic and PHP. And Excel.


I found the solution:

sudo apt-get purge php.*


### Project structure

```
project/
├── data/
│   └── surveys.json
├── app.js
└── package.json
```

### Step 1: Initialize the project

1. Create the project folder and change into it:

   ```bash
   mkdir project
   cd project
   ```
2. Initialize npm:

   ```bash
   npm init -y
   ```
3. Install Express.js:

   ```bash
   npm install express
   ```
### Step 2: Create the `surveys.json` file for storing surveys

Create a `data` folder and a `surveys.json` file in it with the following initial contents:

```json
[]
```

This will be an array of objects, where each object is a separate survey.

### Step 3: Create the `app.js` file for the server logic

Here is the code for `app.js`:

```javascript
const express = require('express');
const fs = require('fs');
const app = express();

app.use(express.json());

const surveysFilePath = './data/surveys.json';

// Read the survey data from the file
const readSurveys = () => {
  const data = fs.readFileSync(surveysFilePath, 'utf-8');
  return JSON.parse(data);
};

// Write the survey data to the file
const writeSurveys = (surveys) => {
  fs.writeFileSync(surveysFilePath, JSON.stringify(surveys, null, 2));
};

// Create a new survey
app.post('/surveys', (req, res) => {
  const surveys = readSurveys();
  const newSurvey = { id: Date.now(), ...req.body, editable: true };
  surveys.push(newSurvey);
  writeSurveys(surveys);
  res.status(201).json(newSurvey);
});

// Get all surveys
app.get('/surveys', (req, res) => {
  const surveys = readSurveys();
  res.json(surveys);
});

// Edit a survey (if it is editable)
app.put('/surveys/:id', (req, res) => {
  const surveys = readSurveys();
  const surveyId = parseInt(req.params.id, 10);
  const surveyIndex = surveys.findIndex(survey => survey.id === surveyId);

  if (surveyIndex === -1) {
    return res.status(404).json({ error: 'Survey not found' });
  }

  if (!surveys[surveyIndex].editable) {
    return res.status(403).json({ error: 'Survey cannot be edited' });
  }

  surveys[surveyIndex] = { ...surveys[surveyIndex], ...req.body, editable: false };
  writeSurveys(surveys);
  res.json(surveys[surveyIndex]);
});

// Delete a survey
app.delete('/surveys/:id', (req, res) => {
  const surveys = readSurveys();
  const surveyId = parseInt(req.params.id, 10);
  const newSurveys = surveys.filter(survey => survey.id !== surveyId);

  if (newSurveys.length === surveys.length) {
    return res.status(404).json({ error: 'Survey not found' });
  }

  writeSurveys(newSurveys);
  res.status(204).send();
});

// Start the server
const PORT = 3000;
app.listen(PORT, () => {
  console.log(`Server is running on http://localhost:${PORT}`);
});
```

### Notes on the code:

1. *Routes*:
   - `POST /surveys`: creates a new survey with the field `editable: true`.
   - `GET /surveys`: returns all surveys.
   - `PUT /surveys/:id`: edits the survey with the given ID if `editable: true`, after which `editable` becomes `false`.
   - `DELETE /surveys/:id`: deletes the survey with the given ID.

2. *Functions*:
   - `readSurveys`: reads the data from the JSON file.
   - `writeSurveys`: writes the data to the JSON file.

### Step 4: Run the project

Start the server with:

```bash
node app.js
```

The API will then be available at `http://localhost:3000`.


That's good, PHP is too permissive


The real question is why does PHP have so many bugs that it's so trivial to exploit?


Seems like an attitude problem.


Historically it has been a skill issue. It seems it persists even today.


Honestly, the development of the PHP core has always been rather amateur. From historically just adding features whenever, to now adding hundreds of breaking changes per minor release. This results in a terrible codebase and a language where upgrading minor versions is so painful and costly for some firms that they end up stuck on an old version.

The last part makes the fact that there could be massive security holes like RCE in the core language very worrying.


Historically they have been one of the languages that added very few breaking changes and were criticized for that. You can't have it both ways.

Going from php 4 to php 8 isn't even that painful and there are twenty years between. It's the one language where what you wrote 20 years ago probably still works today.

Upgrading a php application is one of the least expensive and most uneventful things you can do. Try upgrading that java swing application or that react app that's 15 months old. Go, haskell, vue, python 2 to 3 are more difficult because of syntax changes, whereas in php you have some breaking changes like globals being removed or ereg removal or mysql_ being removed. The changes were small, like using mysqli instead of mysql or using preg instead of ereg.


Somewhat contrary to my own comment below, migrating PHP upwards can be painful, but some of that stems from who's involved. I'm brought in on PHP upgrade projects, and often the code is 10+ years old, and no one who wrote it is still around. There's typically no documentation, no tests, and little historical knowledge.

Now... that's also a problem in other languages too, but I've found it less so with something like... older Java. Because there are some things that are completely gone from earlier PHP, if people relied on those bits 15 years ago, there's potentially a lot of code touching that needs to happen.

My experience is there's less deprecation of language features in older languages (mostly thinking of Java). There were perhaps less 'wonky' ways of doing dynamic Java stuff in 2006, so 2006 Java will still work today. 2006 PHP can work today (see comments below), but the more 'advanced' your PHP was the more at-risk it will be when trying to upgrade to, say, PHP 8.x.

The other big thing, though, is frameworks. The more you tied your code to, say, ZF1, the bigger an upgrade effort will be (sometimes far bigger than expected). I've been hit by this a couple times in the last few years.


My complaint was adding lots of breaking changes to minor version upgrades. The fact, that historically they didn’t do breaking changes in major version upgrades does not excuse going against industry standards.

There are literally companies stuck on PHP 7 because going to 8 is too painful. And honestly with my decade of experience your claim that going from 4 to 8 isn’t that painful sounds like nonsense and something you haven’t done. And the claim that php 4 code will work on php 8 is 100% nonsense. The syntax for how classes were defined in 4 is not supported in 8.

I’ve upgraded code to new major versions of other languages without hassle. They generally had fewer breaking changes in their major versions; when I upgraded to Vue3, for example, it was super painless.

And can you point to a minor upgrade where Go changed the syntax to the point the old one no longer works? With Go normally it's the tooling that changes. I can't remember the syntax changing on me. Especially since they have a Compatibility Promise[1] it seems weird.

[1] https://go.dev/doc/go1compat


> There are literally companies stuck on PHP 7 because going to 8 is too painful.

Usually due to frameworks. Those tend to play fast and loose with language constructs, so small changes can make them not work anymore. And as those same frameworks will also break a lot more things between versions, you cannot easily upgrade them, so you cannot upgrade your php version.

Currently maintaining an internal symfony 1 website so I know the pain. At least the documentation of old symfony is still available unlike many more modern frameworks.


Like everything... It depends.

Just use rector

https://github.com/rectorphp/rector


Our best engineer spent 2 weeks full time updating our codebase from 7 to 8 using rector (the project was already running PHP 7 but still had a lot of deprecated code from the PHP 5 era which didn't work on PHP 8). That would have been an eternity without Rector.


I've previously used that to do something. Fell flat on its face. But the fact they literally had to create a tool to solve this problem confirms my point.


I use it all the time and it works great. You need to know what you're doing.


So again, you seem to be confirming my point that it’s not foolproof.


You are right, going from 4 to 8 is a huge challenge.

Minor point: classes are using the same syntax, but you are right that they typically won't work because constructors have changed. In PHP4 the constructor was a function named the same as the class; in PHP5 and on it's __construct. The PHP4 version was deprecated in PHP 7 and removed in PHP 8.0.
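To make the constructor point concrete, here is a minimal sketch. The class names `LegacyCounter` and `ModernCounter` are made up for illustration, and the behavior described in the comments assumes the script runs on PHP 8.0 or later:

```php
<?php
// PHP 4 style: the "constructor" is a method named after the class.
// On PHP 8.0+ it is just an ordinary method, so `new` never calls it.
class LegacyCounter
{
    public $count = 0;

    function LegacyCounter()
    {
        $this->count = 10;
    }
}

// PHP 5+ style: __construct is called by `new` on every current version.
class ModernCounter
{
    public $count = 0;

    public function __construct()
    {
        $this->count = 10;
    }
}

$legacy = new LegacyCounter();
$modern = new ModernCounter();

var_dump($legacy->count); // int(0) on PHP 8.0+: LegacyCounter() never ran
var_dump($modern->count); // int(10)
```

The nasty part is that nothing fails loudly: the object is created, it just never gets initialized.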

create_function is just gone. So is each(). Oh and HTTP_RAW_POST_DATA. The list is long.
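For what it's worth, the usual replacements are short. This is only a sketch with made-up data (`$prices`, `$double`), not a full migration recipe:

```php
<?php
$prices = ['apple' => 3, 'pear' => 5];

// Removed in 8.0:  while (list($name, $price) = each($prices)) { ... }
// foreach does the same walk without touching the internal array pointer.
$lines = [];
foreach ($prices as $name => $price) {
    $lines[] = "$name costs $price";
}

// Removed in 8.0:  $double = create_function('$x', 'return $x * 2;');
// A closure replaces it and no longer evaluates code from a string.
$double = function ($x) {
    return $x * 2;
};

// Removed in 7.0:  $body = $HTTP_RAW_POST_DATA;
// Reading the php://input stream is the replacement for the raw request body.

print_r($lines);
var_dump($double(21)); // int(42)
```

The mechanical rewrite is easy; the pain comes from how many places in an old codebase use these.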

If you have used == (and in PHP4 we did that; come on, don't be holier than holy and claim you only used === in PHP4), then PHP8 will have some surprises for you.
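One likely source of those surprises, assuming the change meant here is the PHP 8.0 rework of comparisons between numbers and non-numeric strings, looks like this (the variable names are just for illustration):

```php
<?php
// Loose (==) versus strict (===) comparison between a number and a string.
$loose  = (0 == "foo");   // true on PHP 7, false on PHP 8.0+
$empty  = (0 == "");      // true on PHP 7, false on PHP 8.0+
$strict = (0 === "foo");  // false on every version: the types differ

// in_array() compares loosely unless the third argument is true.
$looseSearch  = in_array("foo", [0]);        // true on PHP 7, false on PHP 8.0+
$strictSearch = in_array("foo", [0], true);  // false on every version

var_dump($loose, $empty, $strict, $looseSearch, $strictSearch);
```

Code that quietly relied on the old results, for example in a `switch` or an `in_array()` check, keeps running after the upgrade but takes a different branch.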

In my experience rector became usable last year-ish, if you tried before, give it another whirl.


There’s a system that I started working on, in 2008, when a lot of hosting outfits were still running PHP 4. It’s still running now, but I think one of the current maintainers rewrote a lot of it with Laravel, recently.

I know that they were still using the old code, just a year or so ago, on PHP 8.


I had code I wrote for someone in 2002 (php4) still running in 2017 (php 5.x). It broke trying to go to 7, and it was a small area that broke. It was core, so broke everything, but it had to do with re-assigning of $this at runtime. I think in 5 it was complaining about it as a warning, but no one was looking. Had I/we been stricter about the use of $this back in 2002, that code might still be running today. Doesn't mean it should or wouldn't be faster if rewritten with newer language features, but ... it had a good run for 15 years.

EDIT: Was reminded of another site started in 2000 (start of PHP4) that is still running. I can only see the login page now, but I see the login page is still displaying a particular URL structure that was/is slightly uncommon. If they kept that but rewrote the entire thing internally... that would be odd, because it would be easier to rewrite the whole thing. I've no doubt they've upgraded some internal parts, if only to accommodate new business needs over the past 20+ years (I stopped working with this project in 2003?) but it's still up and running.


Surprised to see Go in your list though.


> From historically just adding features whenever to now adding hundreds of breaking changes per minor release.

Should be noted that it stopped being the case close to a decade ago now. Since PHP 8 things have changed a lot and it's a significantly better platform, both in terms of usage and the people behind it.

PHP spent a long time running on fumes with little backing. It's now got huge financial backing from JetBrains, WordPress, Symfony, Laravel, etc., and there are now people paid to work on it, which has dramatically improved the quality and quantity of improvements, which are mostly focused around performance and bug fixes.

The performance gains aren't just figures on paper either. There was a real world improvement of around 12% on PHP 8.3 alone.


Doesn’t Facebook still run most of their backend on Hack[0] (compiled PHP subset)?

[0] https://hacklang.org/


I think the fact Facebook decided to just fork that language instead of improving it shows how bad the core development was. The internals newsletter was just brutal. I remember reading a thread from a Facebook dev asking who quietly just reverted his commit. No discussion, no nothing, the code was just reverted if someone didn't like it.

I really think that was a major misstep by the project.


> I really think that was a major misstep by the project.

Oh, yeah.

As someone who has written and maintained a lot of "infrastructure-level" stuff, I have come to learn that releasing a project that serves users, or is infrastructure for other projects, is like having children.

Making them is fun. Releasing them, is a pain, but, once they are out there, it is my Responsibility to support them, and accept that they have their own agency.

I can't just go in and pretend that I'm Lord Farquaad, and treat the project as if it's my private fiefdom. It's now a public resource, and my decisions and actions affect a lot of others. I also tend to write software that supports folks with a rather ... pithy ... demeanor, so screwups can result in not-pleasant feedback.

That's a big reason why I don't mind that most of my public repos aren't popular.


Hack is only PHP in a very ship-of-theseus sense - it has PHP _vibes_ but they replaced the language, the runtime, the standard library, and all of the infrastructure

(and all of them much improved over PHP IMO - especially XHP [equivalent to JSX, where HTML is a first-class citizen in the language syntax])


> Should be noted that it stopped being the case close to a decade ago now. Since PHP 8 things have changed a lot and it's a significantly better platform, both in terms of usage and the people behind it.

The breaking changes section of UPGRADING file for PHP 8.4 is over 200 lines. For 8.3 it was over 100 lines.

And when it went to version 8, there was only 1 full-time developer working on it and as far as I know 0 part-time. The rest were volunteers doing it as a hobby. That full-time developer who was paid by Jetbrains decided he wanted to work on another project. This resulted in the next release being pretty much nothing. At which point, everyone realised this language went from funded to not funded and they created the PHP foundation.


I assume you're referring to what the upgrade guide refers to as "Backward Incompatible Changes". If so reading through the list I can't see a single one on there that has a major impact, in fact I'd wager that 99% of all 8.3 instances will have no issue upgrading to 8.4 as they are all very superficial changes to some very legacy areas of the language.

I'm also not seeing 200 on there, though you said "200 lines". Are you talking about the length of the article? If so, that's not really a helpful metric.


> I assume you're referring to what the upgrade guide refers to as "Backward Incompatible Changes". If so reading through the list I can't see a single one on there that has a major impact, in fact I'd wager that 99% of all 8.3 instances will have no issue upgrading to 8.4 as they are all very superficial changes to some very legacy areas of the language.

They changed error handling. That is a major impact. If things start throwing errors when they previously didn't, it results in your app breaking because how you were handling errors is no longer applicable. Now you have third-party libraries, etc. all breaking because the PHP core team can't be bothered to follow industry standards. And yes, SemVer is, at this point, the industry standard to the point people use 8.* in their composer require because they expect SemVer.

And changing error handling in very legacy areas of the code is the worst especially when there isn't even an RFC to say that they would be doing it. The fact it's legacy means people don't expect it to change.

> I'm also not seeing 200 on there, though you said "200 lines". Are you talking about the length of the article? If so, that's not really a helpful metric.

It's useful in giving an impression of the number of changes. Especially when given an example of another release to see how they're increasing. And it's extremely useful when comparing to other languages where the breaking changes section either doesn't exist or is extremely small. Even allowing for formatting and some entries taking two lines, 200+ lines ends up being over 100 breaking changes.

I get it, you like PHP and you're protective over your tooling. I use PHP heavily, in fact, I'm building my business on top of it. But that does not remove my ability to look at how everything is compared to other languages and see there is a major problem. On Ubuntu, it has packages for each minor release whereas in Python it's just python3-*. Why? Because PHP's reputation for adding breaking changes whenever (despite the claim that they're really good at it) has been there for so long that even Linux distros know that people need to deal with that pain.

The problem with me being mainly a PHP developer is, I know all the problems. You can't BS me like you can devs who don't work with it so much.


Since PHP introduced the formal RFC process for changing the language, things have become much less cowboy and much more professional.

Typically it is the features from before that which cause problems.



