"No, the problem results because lowercase i (in most languages) and uppercase I
(in most languages) are not actually considered to be the upper/lower variant of
the same letter in Turkish. In Turkish, the undotted ı is the lowercase of I,
and the dotted İ is the uppercase of i. If you have a class named Image, it
will break if the locale is changed to Turkish because the class_exists() function
uses zend_str_tolower(), and changes the case on all classes, because they are
supposed to be case insensitive. Someone else above explained it very well:
"class_exists() function uses zend_str_tolower(). zend_str_tolower() uses
zend_tolower(). zend_tolower() uses _tolower_l() on Windows and tolower() on
other OSes. _tolower_l() is not locale aware. tolower() is LC_CTYPE aware."
Edit: Someone else later said the following (I'm wondering if it's true):
"This, practically, can't be fixed. Mainly because there's no way to know if 'I' is uppercase of 'i' or 'ı' since there's not a separate place for Turkish 'I' in code tables. The same holds for 'i' (can't be known if it's lowercase of 'I' or 'İ').
I told 2 years ago and will say it again: PHP should provide a way to turn off case-insensitive function/class name lookup. No good programmer uses this Basic language feature since identifiers are case-sensitive in all real languages like Python, Ruby, C#, Java."
If it wasn't clear by the comments on the bug report or by the quoted sections of this comment's parent, let me rephrase it. This issue is entirely caused by the fact that PHP is case insensitive for classes and function names (but not variables, go figure). That is, if you define a class MyClass, you can instantiate it using MyClass or myclass or MYCLASS. You can call the functions from the standard library in whatever case either (so, array_map or ARRAY_MAP is fine).
Based on the behavior of this bug, it appears that the way PHP handles this case insensitivity is that it just lowercases all class and function names before resolving them. And this bug in particular shows up for Turkish because 'i' is not the lowercase equivalent of 'I'.
Pretty much all other modern languages are case sensitive, so I'd be surprised to find this issue elsewhere.
Well, VB.NET is case insensitive yet the problem doesn’t crop up there because it’s not braindead enough to use the same locale while compiling & executing. Yes, I get that PHP code isn’t compiled in a separate step but there still is no reason for it to use a user-defined locale. It should use the C locale, end of story. I don’t understand why this isn’t trivial to fix. Is there any place where PHP depends on a user-defined locale for parsing?
EDIT: “trivial to fix” as in, doesn’t cause regression, not necessarily that it’s a small change to the code base.
Which would not help in this case, since in Turkish the upper-case representation of `i` is not `I` but a different symbol. So the class you're looking for would not exist.
That still doesn't make sense. If 'i' is not the lowercase equivalent of 'I', then the lowercasing should just result in another letter, right? The only thing that could cause the bug is if it uses two different ways of lowercasing (perhaps one when registering the class, and another way when looking up the class).
The mapping between uppercase and lowercase can be completely arbitrary, and as long as it's used consistently you shouldn't get this kind of bug.
It's not PHP's fault that accurately performing case transformations across locales is difficult; it's just actually very difficult. The solution isn't to "fix" the process of transforming letter case; the solution is to simply not transform the names of your identifiers. Unfortunately that is simple only in a very isolated setting; in the real world, doing such a thing is liable to break a lot of software.
This is a really good example of the problem at hand:
>The Greek letter Σ has two different lowercase forms: "ς" in word-final position and "σ" elsewhere.
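For anyone who wants to see the sigma case in action, here is a minimal Python sketch. It assumes a reasonably recent CPython 3, whose `str.lower()` applies the Unicode final-sigma rule; the sample word is just an illustration.

```python
# Lowercasing is context dependent: the same capital sigma maps to two
# different lowercase letters depending on where it sits in the word.
word = "ΟΔΥΣΣΕΥΣ"               # "Odysseus" in capitals
lowered = word.lower()
print(lowered)                   # inner sigmas become "σ", the final one "ς"

# Both lowercase forms map back to the same capital, so the round trip holds.
print(lowered.upper() == word)

# A lone capital sigma has no preceding letter, so it is not treated as final.
print("Σ".lower())
```

The point is only that "lowercase this string" is not a simple per-character table lookup, even before locales get involved.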
The identifiers are lowercased multiple times: first at parse time, presumably using the locale of the OS, some setting in php.ini, or some fixed locale. (In practice it doesn't matter where this initial locale is set; it just matters that it's set at parse time.) They're then lowercased again at runtime; if the locale was changed at runtime, such that the casing rules in the two locales produce any differences, the identifier will not be found.
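A rough simulation of that two-stage lowercasing, in Python rather than PHP. Everything here is a hypothetical stand-in: `lower_c`, `lower_tr`, and `ClassTable` are made-up names that only mimic the described behavior, not PHP's actual internals.

```python
def lower_c(name: str) -> str:
    """ASCII-only lowercasing, roughly what the C locale does."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in name)


def lower_tr(name: str) -> str:
    """Simulated Turkish rules: I -> dotless ı, İ -> i, otherwise ASCII rules."""
    return lower_c(name.replace("I", "ı").replace("İ", "i"))


class ClassTable:
    """Stores class names under their lowercased form, as the thread describes."""

    def __init__(self):
        self._classes = {}

    def declare(self, name, lower):
        self._classes[lower(name)] = name

    def exists(self, name, lower):
        return lower(name) in self._classes


table = ClassTable()
table.declare("Info", lower_c)           # "parse time": stored as "info"

print(table.exists("Info", lower_c))     # same rules on both sides: found
print(table.exists("Info", lower_tr))    # looks up "ınfo": not found
print(table.exists("info", lower_tr))    # no capital I involved: found
```

Note how the byte-for-byte identical name fails while the all-lowercase spelling still works, which is the odd symptom people describe in the bug report.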
I'm not saying this to defend PHP; just to shed some light on the case-folding problem. Having case-insensitive identifiers is a design mistake.
Unfortunately, the obvious answer (parse code in the C locale) breaks code that's in the wild and relies on PHP's undocumented locale-specific case-insensitivity.
Obviously, case-insensitive identifiers are a bad idea, but PHP is stuck with them at this point.
I don't think it's "obvious" at all. The problems caused by case sensitive identifiers are legion, especially in dynamic languages with implicit declaration. It's not at all clear to me that this problem is of case insensitive identifiers and not simply in PHP's implementation.
Changing the case insensitivity of PHP identifiers is a good idea. I suspect breakage would be minimal and easily fixable. One has to keep in mind that variable names are already case sensitive in PHP, and most (reasonably good) code I've seen in the wild does honor the spelling of class and method names.
Of course it will never happen, but I really think this would be a great idea for the next major version. Backwards compatibility is important but not at all costs. A language needs to remain agile enough to allow for the recognition of (and ultimately the fixing of) mistakes.
I doubt that breakage would be minimal. I'm not as certain as you that most code does honor the spelling of class and method names, but even if we assume that this is the case I'd assume that there are tons of undetected errors. Currently, there's just no way to test that you're using the proper spelling, so nobody does.
It would be trivial to write a command line tool to check (and maybe automatically fix) those typos. Incorporating it into a major new version also ensures everybody has enough time to prepare - and in case of hopeless and unfixable legacy apps: there is always the option not to upgrade.
Breaking changes in programming languages are not that uncommon, C# and Perl spring to mind from personal experience, but also to a lesser degree such things have happened with PHP itself (and it became better for it). In this case, it's actually a change that moves the runtime's behavior closer to what's expected. It's a change that improves internal consistency while also eliminating silly bugs like the one discussed here.
I'm not against changing that behavior since it's arguably stupid and inconsistent [1], but I'm wary of "trivial" changes. It's not only apps that need fixing, pretty much every library needs checking (and maybe fixing). This cannot be done with a commandline check, since classnames can be constructed on the fly, called via eval() or call_user_func() etc. Class names may be loaded or even defined on the fly (the SOAP Pear Extension does this to create proxies). All those cases can only be checked by executing the program. It's probably a good change, but this is anything but trivial.
[1] actually, I don't care at all since I moved on to greener pastures.
Having seen the hairballs of creative PHP in the wild, I'd think the only sane way to do this would be some sort of deprecation warning whenever a symbol lookup matched only case insensitively.
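A minimal sketch of what such a lookup could look like, written in Python for illustration. `SymbolTable` and its methods are hypothetical names, and the folding is deliberately ASCII-only so it cannot depend on the locale.

```python
import string
import warnings

# ASCII-only folding table: A-Z -> a-z, everything else untouched.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


class SymbolTable:
    def __init__(self):
        self._exact = {}     # declared spelling -> value
        self._folded = {}    # folded spelling -> declared spelling

    def declare(self, name, value):
        self._exact[name] = value
        self._folded[name.translate(_ASCII_LOWER)] = name

    def lookup(self, name):
        # An exact match is the quiet path.
        if name in self._exact:
            return self._exact[name]
        # Fall back to a case-insensitive match, but complain about it.
        declared = self._folded.get(name.translate(_ASCII_LOWER))
        if declared is None:
            raise KeyError(name)
        warnings.warn(
            f"'{name}' matched '{declared}' only case-insensitively",
            DeprecationWarning,
            stacklevel=2,
        )
        return self._exact[declared]


symbols = SymbolTable()
symbols.declare("MyClass", object)
symbols.lookup("MyClass")    # silent
symbols.lookup("myclass")    # still resolves, but emits a DeprecationWarning
```

Old code keeps running, and the warnings give a list of call sites to clean up before the fallback is ever removed.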
Like the ones that they introduced to fix array[key_without_quotes] where key_without_quotes was mapped to a string if it was not a defined constant and a NOTICE was issued? The first thing everyone did was turn off E_NOTICE since practically all code emitted that notice. It took years until you could run apps with E_NOTICE turned on :)
True: I spent years swimming upstream trying to get PHP developers to log sensibly and actually fix problems rather than suppressing the notices, which is probably why I'm much happier with the Python community.
Just using LC_ALL="C" would break people using some other language to write code. I personally strictly disagree with writing code in anything but English, but other people think it's okay to have Russian class names or something. Using LC_ALL="C" would make this impossible.
If there are so many people depending on PHP and all the code written in PHP in all of Turkey, why doesn't someone in Turkey fix the problem?
Or anywhere for that matter?
There is no "they" in this equation. There is no person who should be held more accountable than you or I for fixing this problem.
The choices are simple:
1) Fix the problem
2) Find a workaround
3) Don't use PHP
What's that? There is a lot of open source software that you wanted to use for free that's written in PHP that does just what you need except for this tiny little trivial thing that should be easy to fix? Well too bad!
Trade off the cost of fixing it against the cost of rewriting the big, free, open source package that's written in PHP you wanted to use, in the programming language of your choice, and stop complaining.
It might not be an easy fix if you're not familiar with PHP's guts. It's the kind of fix that can induce a lot of unexpected regressions.
Not wanting to fix a bug because it's not worth the time or risks breaking backward compatibility is perfectly fine by me. But at least take a decision and say something.
If they don't plan on fixing it they should say something like "We believe this is a minor bug that only concerns a small number of users. In order to fix this we'd need to change X, Y and Z and make sure we don't introduce regressions. If you want to try and do it we'll be glad to review your patches. In the meantime you can use this workaround: [...]".
I hate it when I submit a bug report and it's being ignored. You also build a strawman argument with the "lot of open source software that you wanted to use for free". It's a bug and should be fixed (even if the fix is closing the ticket as "wontfix").
specifically the complaint that this problem manifests with lots of off-the-shelf software (although I suppose he didn't specify FOSS in his original comment).
However, I wasn't directly responding to that guy, I was more responding to what I feel has been aptly described as a "witch hunt" by others on this page.
> It might not be an easy fix if you're not familiar with PHP's guts
10 years is a long time for someone to have the chance to get familiar with it.
Even if you assume that for 8 years everyone was saying "oh, it will get fixed some time", even 2 years is a long time for anyone seriously affected by this problem to become familiar enough with PHP to fix it, if that's the path that will produce the most value for them (i.e. if there's enough value in some existing codebase or off-the-shelf software to warrant fixing this if there's truly no other workaround).
Still, I can see the sense in promoting major issues like this with PHP, but posting the bug report on the front page of HN is far less useful than, say, writing a blog post about it with some case studies of where the problem has been manifest, how people have dealt with it, the history of the bug, etc.
Actually that's a good blog post, might put it on my list ;)
We all know PHP has its shortcomings, but there appears to be a witch hunt going on here.
I think some of the witch hunt comes in attempt to steer people away from a language which is badly designed and has a million bugs which cannot be fixed without breaking most of the existing code written for the language.
Pestering a language like that is only fair.
While I'm sure it gets tiresome for those who for whatever reason have to work or prefer working in PHP, it is only a polite gesture to the software developers who have yet to take that dark path.
While I agree lots of that is emotive, it is hard to argue against the truth in 1. badly designed, 2. bugs, and 3. which cannot be fixed without breaking most PHP code.
The rest follows naturally. Anyway: Have an upvote for objectively dissecting my semi-objective analysis.
Would it rock? Nah. It would merely make it less horrible. It lacks fundamental elegance and it lacks advanced capabilities. Even if fixed, it would still only be clinging to the crown of mediocrity.
But that is fine. Not all languages can or should be mind bending or define their own paradigm. So yeah, if all its problems were somehow, against all likelihood, fixed, it would be fine. It would be fine, but it would in no way "rock".
I am one of those who chose to work with PHP, and I'm getting more than tired about that witch hunt, to a point where I would actually punch someone.
I do realise PHP is not ideal, but as I always say, in the end it gets shit done, and the vast majority of those problems or "design flaws" do not affect 99.9% of the community. Most of the people who bitch about it are not even users. In the heavy-user circles we know of some of the issues and try to advocate for improvement, but seriously, in almost a decade of using PHP, I could compile all those blog posts and confidently say: I never had any issue whatsoever with any of the problems pointed out.
And that's what most elitists around here forget: for 99.9% of its user base, PHP is not flawed. It works, and it's easy to learn, easy to scale, easy to deploy and upgrade, and easy to find developers for...
While I get what you are trying to say, you are in no position to say so, and the community, the one that actually matters, has already spoken: we have no major issue with PHP, so leave us the fuck alone please. If it bugs you that much, consult a therapist; you have bigger issues than PHP...
I suspect that the only people who can defend PHP on technical merits are people who have yet to try something better and have yet to realize just how much better most other options out there are.
If that describes you, I am sorry for you, and deeply encourage you to take on something new on the side, in a different language. Just for fun, learning and exploration. Just to let your mind get a feel for how the world can (and maybe should) be different.
I've done PHP. I've been there. So don't get me wrong. PHP has good sides. Yes yes, it does. But IMO (and I'm far from alone here) they are completely overshadowed by the bad sides.
Having dealt with every language under the sun over the last 20 years, PHP still has an as yet unbeatable sweet spot when it comes to getting stuff done.
The PHP project has ensured that "3) Don't use PHP" is the lowest-cost option. These bugs are not trivial, and there are far too many of them.
It is important that people should be fully aware of the technical liability they are taking on when they adopt PHP for nontrivial projects.
It isn't reasonable to demand that other people fix the huge collection of weird bugs in your project. Particularly when they are not invested in PHP (any more). PHP's bug collection is a strong reason not to invest in PHP (any more). If it is important to you to encourage PHP adoption, then YOU fix the bugs.
I am not wasting my life working around this nonsense because there is no reason why I should have to. There are alternatives which already work correctly.
Don't trade off against the cost of rewriting.
Trade off against the cost of using any of the well-developed alternatives which do not have the same bugs, the same volume of bugs, or the same internal processes which generate and shelter bugs for years on end.
This is a huge bug. Believe it or not, many dev people in Turkey use locale tr_TR (which is perfectly normal) and when they begin to use "any" off-the-shelf PHP library/class with an uppercase I, it does not work at all. A little example: if APC has a class with I, it won't work on your tr_TR configured Windows Server.
PHP is crap. This bug is clearly a WONTFIX; it's been 10 years since it was reported. I remember this bug from when I was 14. Thank God I moved on to other languages afterwards.
If this is such a dealbreaker for developers in Turkey, why have none of them, in the 10 years this bug has been alive, submitted a patch for it? PHP is open source, it relies on code submissions.
edit: not trolling, just curious. What drives people to complain about specific, well-defined open source bugs without any effort to fix it? I understand hard-to-nail down issues like user experience, but this shouldn't be that hard to plan out and fix independently.
If there are viable alternative projects which never had that problem, you can save all the time rather than trying to salvage someone else's broken software. It's a lot less time and trouble. What reason do I even have for fixing your project? I don't owe PHP loyalty when it is broken for me.
If there are viable alternative projects which are more responsive to bug reports, that is more promising for the future - if a second bug I see is reasonably likely to be fixed in the future, I can feel more confident basing my own code on it.
People spend months and years writing apps on top of things like PHP and once they have the code they don't necessarily have a lot of choice. At that point maybe you fix the bugs in your dependencies rather than rewrite your own app. But when you have a choice, you don't adopt a tool which is going to leave you with this much technical liability.
This is offered peacefully in an attempt to explain the question which seems to confuse you.
Every time an article critical of PHP appears, defenders come out of the woodwork. It's a great language, they say. It's no more flawed than any other language. Critics are just biased. It has problems, but other languages have problems too. People build large apps with PHP, so it must be good.
But come on. This language is complete crap. Code spontaneously fails depending on the locale? And the bug has been open for ten years and still is not fixed? And this is only one bizarre and inexplicable bug out of hundreds, maybe thousands, of bizarre and inexplicable bugs in PHP.
This language isn't defensible. If you want to say that it's worth dealing with the flaws due to the ecosystem, fine, fair enough. But don't tell us that PHP is no worse than any other language. It's far worse.
Of course, this isn't an article critical of PHP, and I'm sure those defenders would just as readily state, "Every time PHP appears on Hacker News the PHP-haters come out."
While I tend to agree that I would not use PHP for new projects, I would disagree that it's indefensible. All you need to defend it is, "it's easy." In the sense of, "it's nearby, it's within reach." If it happens to be the language installed on your system, its use is automatically defensible on those grounds alone.
It might not nurture you and love you and cherish you; hell, it may abuse you at times, as any language with idiosyncrasies does. It might even have more idiosyncrasies than other languages do. But those do not make a relationship indefensible -- merely difficult. And in some cases, the difficulty makes the love even more binding -- which is why we still have people who program in low-level languages, for example, even though those have all the more tendency to abuse you for the tiniest mistake you make.
The choice to use PHP may be defensible. The PHP ecosystem may be. But I don't believe the language itself is. It's a subtle but important distinction: there are some good reasons to use PHP (although IMO more good reasons not to), but there are few or no good reasons for PHP's problems.
> Code spontaneously fails depending on the locale?
It doesn't spontaneously fail. The language's functions are case-insensitive and they documented this. [1] [2] When you change the locale to Turkish the letters change. Thus, the class name changes and no longer works as expected.
So it is documented, because it may not behave as expected, but it is not spontaneous.
That's incorrect. It's not behaving as documented. Whether you compare in a case-sensitive or case-insensitive way "Info" should always match "Info". The bug results in a situation where it doesn't.
I'd accept that you cannot reference class "info" using name "Info" in Turkish locale, but that's not the case here.
Nowhere in your documentation does it state that a class name cannot be accessed using the exact same, byte-for-byte identical name, depending on locale.
Case sensitivity changing depending on locale would be weird, but at least vaguely sensible. Identical strings no longer matching is just plain wrong.
PHP is a big legacy open source project, worked on by many volunteers whenever they can spare the time, just like any other open source project.
It is wildly successful despite this and many other bugs.
I only wish that the people who spend so much time endlessly attacking PHP and its developers would instead focus some of that energy on helping to improve PHP, but I guess some of us are just negatively charged.
Sad that we have yet another anti-PHP posting hitting the front of HN in as many days, let the hating re-commence (again)...
Responses like this to people who don't like PHP are just as bad as the people constantly and loudly attacking it. Neither accomplishes anything other than building animosity.
Your suggestion that people improve PHP instead of attacking it is naive. PHP is, as you said, a big legacy open source project. As a result of that, it's basically impossible to make the extreme, breaking changes that many people (me included) think would be required to make it a reasonable competitor to the existing options. (And the PHP community is not especially inclined to change. It took years for short array syntax to get added to the language. If something as obviously beneficial as that is going to be hotly debated, making real, breaking changes is impossible.)
Faced with the alternatives of trying to radically change PHP (which is, as I said above, impossible) or to use and improve other languages and frameworks, I think the choice is obvious. It was one thing 5-10 years ago when there weren't necessarily good or mature alternatives, but we have many choices now. In my opinion, it makes very little sense to use something with as much extraordinarily painful legacy baggage as PHP unless you have an exceptionally good reason for doing so.
After 5+ years of eloquent, smart programmers* posting long, well researched screeds about what's deeply broken with PHP's design at the most fundamental levels, there is no other conclusion to be reached. The only way to improve PHP is to replace it, and we have a long way to go to get there.
* Note that I am not including myself in this list. But any trivial search for "what's wrong with PHP", much less "PHP sucks" produces 5+ years of very bright, articulate, sometimes downright famous programmers making this same point about PHP. Most recently Jamie Zawinski at http://www.jwz.org/blog/2011/05/computational-feces/#comment...
> After 5+ years of eloquent, smart programmers* posting long, well researched screeds about what's deeply broken with PHP's design at the most fundamental levels, there is no other conclusion to be reached.
Issuing holy decrees from their ivory towers more like. Meanwhile, lots of tremendously successful companies doing real work in PHP each and every day, at the coal face, where it matters. Does their hard work deserve this constant ridicule?
> The only way to improve PHP is to replace it, and we have a long way to go to get there.
Agreed. When something better comes along, I'll start using it (like how I switched from Perl to PHP a long time ago). The problem is, many "eloquent, smart programmers" are too busy "posting long, well researched screeds" to spend some time making PHP better (or making a better PHP).
> Does their hard work deserve this constant ridicule?
Nobody seriously criticizing PHP is ridiculing people working with it. On the contrary, I personally applaud people having to, and succeeding at extracting diamonds with a broken pickaxe.
> The problem is, many "eloquent, smart programmers" are too busy "posting long, well researched screeds" to spend some time making PHP better (or making a better PHP).
A "better" PHP would be at its core so entirely different and incompatible with the current PHP that calling it "PHP" would be a complete misnomer. Since it would be such a different language/platform with only vague syntactic and semantic similarities, one might as well invest its time in the better designed, actively developed, battle-tested, currently available alternatives.
> Nobody seriously criticizing PHP is ridiculing people working with it.
I rarely see this. I've found it hard to find a good criticism of PHP that doesn't find a way to insult its user base.
Probably the best criticism of PHP has come from Jeff Atwood's recent post[1] (and even he couldn't get through the post without slinging immature, and downright vulgar, insults).
That's exactly what I'm talking about, it's insulting and it disappoints me that all of the recent HN threads have been full of similar insults. Their arguments always seem to boil down to "my hammer is better than your hammer, and you're an idiot for choosing the wrong hammer, but luckily I am here to re-educate you".
PHP is so incredibly bad as to have no redeeming qualities to the language whatsoever other than its simplicity in deployment.
Whereas I can have reasoned conversations with proponents of most modern languages, PHP is simply and unequivocally a complete and total failure of a language, and there is resultantly absolutely no room for concession when discussing the language.
PHP is broken and should never be used, and if there are use cases that the alternatives don't address, we should work to address them.
> PHP is broken and should never be used, and if there are use cases that the alternatives don't address, we should work to address them.
What alternatives meet your standards of not sharing qualities of PHP while matching the quality of simplicity in deployment (of which Python and Ruby instantly fail)?
I don't think any language matches the simplicity of deployment, but outside some very constrained use cases in environments where there can not be sufficient technical expertise and staffing, deployment doesn't begin to justify the technical travesty of PHP.
In a corporate technology organization there is no justification.
It seems to me that the kicker is that the obviously superior alternatives, such as Python, Ruby, and JavaScript, don't offer the ease of deployment that PHP has via mod_php, and instead insist on the developer writing a web server. While there are advantages (performance, control) to the web server approach, it is clear that there are advantages (simplicity) to being able to stick code snippets in web pages.
If you could deploy (say) Python-decorated web pages via Apache (on el cheapo hosting services) rather than write your own server and figure out how to host it, then the problem would be solved. We have the languages, just not the ecosystems.
Obviously I'm not the first to observe this. Mod_python exists, it's just not popular.
You are wrong to say that these 'insist on the developer writing a web server.' All you have to do is choose a web server, which was also true with PHP (even if all you want to do is choose Apache).
Some of these servers are so easy to install and use, so much EASIER than Apache, that it really gets ridiculous to complain about.
mod_python has been deprecated for years in favor of mod_wsgi, which uses an actual standard. That you don't know this shows that the problem is too much documentation guiding people to things which are no longer modern or standard. If you use Apache, use mod_wsgi, it is infinitely better.
webfaction costs $5.50/mo, lets you do proper Python deploys rather than endless hacks and has good support, if you need cheap Python hosting.
> If you could deploy (say) python-decorated web pages via apache
This is a terrible, terrible idea. There are a few Python frameworks which have tried similar things, but Model-View-Controller was invented for a reason.
PHP cannot be improved. An "improved PHP" would be a completely different language, and probably a rewrite of the codebase. The difference between "improved PHP" and PHP as it stands today would be at least as great as the difference between Perl and Perl6.
And if you are going to design a new language (that's what "improved PHP" would be), you have very little to gain in basing your work on PHP. The ecosystem is in the current PHP, and it is as likely to transition to a completely different language as it would be to transition to your "improved PHP".
I don't think those people understand the economics of programming languages. It's not going to die off any time soon, what with all the thousands of companies that use it, and zillions of lines of code.
As someone who would rather not ever work with PHP again, the best thing is to simply focus on other languages and environments, helping to bolster those ecosystems.
Agreed, but publicising stupid problems with PHP actually does help that cause - it might push someone who's "on the fence" over to using a better system.
I'm one of those on the fence. I've spent most of my programming life working on PHP and the recent barrage of negativity against PHP has made me more interested in learning another language, if just to make an educated comparison and see whether I continue to use PHP "because I know it" or because it is actually good.
I'm currently having a go at Python + Flask in my spare time.
Recent? I never cared for it, and I know that was a common sentiment amongst many 'HN' type people (even though there was no HN then), but until Ruby on Rails came out, I had never found something that had everything I needed and wide appeal. At the time, for me, that meant continued work on Apache Rivet ( http://tcl.apache.org/rivet/ ) , which in a lot of ways was better than PHP, but always suffered from not having a lot of users.
I stopped using PHP years ago because their open-source community is a broken insider network. After wasting a few bug reports repeatedly arguing with certain core developers who took the position that code and documentation not being in sync wasn't a bug, I quit trying.
That's why, for example, .NET world has .ToLowerInvariant() and .ToUpperInvariant() and developers are advised to use it when doing internal stuff. Interpreting / parsing a language is clearly an internal task and shouldn't be affected by locale changes.
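To make the "invariant" idea concrete, here is a small Python sketch of locale-independent identifier folding. `lower_invariant` is a hypothetical helper name; it leans on the fact that Python's `bytes.lower()` only touches ASCII letters and ignores the process locale.

```python
def lower_invariant(identifier: str) -> str:
    """Lowercase ASCII A-Z only; leave every other character untouched.

    In UTF-8 every byte of a multi-byte character is >= 0x80, so the
    ASCII-only bytes.lower() cannot alter non-ASCII letters.
    """
    return identifier.encode("utf-8").lower().decode("utf-8")


# The result never depends on LC_CTYPE or any other locale setting.
print(lower_invariant("Image"))    # image
print(lower_invariant("INFO"))     # info
print(lower_invariant("İmage"))    # İmage (dotted capital İ is left alone)
```

With folding like this, "Image" and "IMAGE" resolve to the same key on every machine, at the cost of treating non-ASCII letters as case-sensitive.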
Other languages like PHP (partially), BASIC or Pascal are case insensitive, so lookup has to be done case-insensitively which means that case has to be normalized, so transforming case of identifier becomes necessary. If it can't be done consistently, that's a problem.
I'd prefer them both to be one way or the other, but if they have to be different this is the right way to do it. For instance, functions can check to see what name they were called by (and process the call differently if we want to distinguish between cases). Variables can't do that so we must explicitly distinguish between them.
What was the advantage of case-insensitive class and function names? Sounds to me like something that was implemented very early on without great reasons and then kept for backwards compatibility. In all my programming in PHP I have never thought to take advantage of this.
I'm assuming the original reason is lost in the mists of time, but one advantage it has when calling/using external/3rd party code is in style conventions. If in my code my convention is to use functionNames but in yours you use functionames or FunctionNames, I can still code in my style after include()ing your file. A small advantage, granted.
> What was the advantage of case insensitive class and function names?
The programmer can be sloppy/lazy and still have things turn out largely as expected. If you're just learning how to program, this makes it a bit easier, since a whole class of possible problems goes away.
I don't understand what's the problem with fixing this really. I would completely agree that making "Info" and "info" class names compatible is "not fixable", but what is the problem in making "Info" work if both the definition and usage are the same case? The bug says that this is exactly backwards - mixed case works, but same case doesn't.
The only way to make it not work is to first change the case in one locale and then case-insensitive compare it in another locale. Why would this kind of operation ever happen? Any sane situation should "just work":
- in declaration convert to lower-case and save, in usage convert to lower-case and lookup -> has to work
- in declaration save original, in usage search all classes with case-insensitive compare -> has to work
How was that bug ever created in the first place? I get the fact that "I" doesn't match to lower-case "i" in tr_TR, but why does it matter when comparing strings which should be equal? Just be consistent in how both the declarations and usages are converted...
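One plausible way to end up with "lowered in one locale, looked up in another" is a script that changes locale after its classes are already registered. The sketch below is a toy model, not PHP's actual code: `tolower_locale` is a hypothetical stand-in for a locale-aware tolower() that only models the Turkish rule for 'I', and `class_table` is an assumed name for the lookup table.

```python
def tolower_locale(name: str, locale: str) -> str:
    """Toy stand-in for a locale-aware tolower().

    Assumption: under "tr_TR" the only difference modeled is 'I' -> 'ı'.
    """
    if locale == "tr_TR":
        return "".join("ı" if ch == "I" else ch.lower() for ch in name)
    return name.lower()


class_table = {}

# The declaration is registered while the default "C" locale is active.
class_table[tolower_locale("Info", "C")] = "class Info"

# Later the script switches to tr_TR, and the usage is lowered again.
key = tolower_locale("Info", "tr_TR")
print(key)                 # ınfo
print(key in class_table)  # False: same spelling as the declaration, not found

# An all-lowercase usage contains no 'I', so it still resolves.
print(tolower_locale("info", "tr_TR") in class_table)  # True
```

Under these assumptions both steps are "convert to lower-case", yet they disagree because the conversion itself changed in between, which matches the "same case fails, other case works" symptom described above.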
This is my biggest problem with PHP. Aside from poor language construction and the plethora of poorly written code, the core language has lots of problems in it. When upgrading to PHP 5.4.3 I found six or seven show-stopper bugs in PHP and some of its extensions (one of which has never worked). I am still waiting on the fix to one of them. https://bugs.php.net/bug.php?id=62302
I know it seems insane, but Turkish capitalization is not fun to work with as a programmer. When they latinized the alphabet 100 years ago or so, they were short on vowels and so it must have seemed pretty clever and convenient to make i and I separate letters with İ and ı as their respective case pairs. From a western programmer's perspective though it's one of the worst Unicode special cases owing to its combined unexpectedness and commonness.
Just as an example, text-transform: uppercase has been broken in Turkish for all major browsers until I believe Firefox finally fixed it late last year, after having a bug open for nearly a decade.
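For anyone who wants to see the case pairs described above, here is a small Python sketch. The first three lines use Python's built-in, locale-independent Unicode mappings; `tr_lower` and `tr_upper` are hypothetical helpers that apply the Turkish pairs by hand.

```python
# Default Unicode mappings, as implemented by Python's str methods.
print("I".lower() == "i")        # True
print("ı".upper() == "I")        # True: dotless ı uppercases to plain I
print("İ".lower() == "i\u0307")  # True: i followed by COMBINING DOT ABOVE

# Turkish pairs: I <-> ı and İ <-> i (hand-written tables, illustration only).
TR_LOWER = {"I": "ı", "İ": "i"}
TR_UPPER = {"i": "İ", "ı": "I"}


def tr_lower(s: str) -> str:
    return "".join(TR_LOWER.get(ch, ch.lower()) for ch in s)


def tr_upper(s: str) -> str:
    return "".join(TR_UPPER.get(ch, ch.upper()) for ch in s)


print(tr_upper("istanbul"))  # İSTANBUL
print(tr_lower("ISPARTA"))   # ısparta
```

Note that the same code point 'I' has two different lowercase forms depending on which table you apply, which is exactly why the language of the text has to be known from somewhere outside the text itself.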
From my point of view there could be one very simple solution: just add new codepoints in Unicode for Turkish I and i. So the Latin i would follow the common case conventions, and Turkish i would use whatever crazy stuff they have there.
Of course that might be bit late to do now, there is probably too much text encoded in the current format.
It's probably a bit late, agreed, but it seems to me this problem is just as much the fault of the encoding itself as it is the fault of PHP: Turkish i and I may look like Western European i and I but they're entirely different characters.
It's funny, the first thing I thought is "someone was having trouble with the turkish I and tried a hackaround, and now it's unfixable."
I blame Atatürk. If I had a time machine, I'd skip killing Hitler and travel back to the language reform time. "Do you know how much trouble this is going to cause us? Reuse the X, make one a dotted e. I don't care, this is going to fuck everything up!"
Şimdi İstanbul'da oturuyorum. :) (I think that's right, I'm still learning the language...) The comment was meant to be snarky -- obviously, the e-i-ö-ü \ a-ı-o-u rule would be broken, which is the reason for the undotted I. Further, nobody could have anticipated in 1927 the vast extent of automation that we are going through now.
For those that don't know Turkish, there is a facet of the language called vowel harmony. When suffixes are added to a word, which is common in Turkish for everything from pluralization and verb conjugations to prepositions, the vowels in the suffix will be altered to match the last vowel. (Some Arabic loanwords don't follow this, mind you, but it works 98% of the time.) So, the dotted and undotted vowels (except e for some reason) all follow this pattern.
(Incidentally, however, the problems could be solved by turning the single dotted i into a double dotted I, keeping the original symmetry. At that point, a lowercase dotted I would no longer break any system, since you could map them to be functionally equivalent for anything after the reform. While we're at it, I have a few ideas for English language reform...)
You are missing the point that the "new" alphabet was designed long before the "systems".
The right way should be getting involved in the process of developing standards, not changing some characters because some new tech comes along and has problems with the language (going on a rampage, as you mentioned earlier, isn't a valid solution either).
I don't quite understand your point here. I understand which came first, by quite some time. It happens that this is a stickier problem than merely "developing standards." What happens when an American tries to log in from a Turkish terminal? If everything is made case insensitive, i turns into İ rather than I. Similarly, what happens when a Türk logs in from another terminal? Do you have the locale attached to the user? (Public terminals can be an issue if someone can't change the keyboard layout. Do they log in as denIz? Does that work? It might if everything is brought to upper case, but not lowercase.)
The Turkish I is one of the most interesting issues in internationalization.
Please stop saying PHP is crap. The volume of news on the topic is damn too high. Consider that PHP coders are beginning to migrate to things like Python, and that most of them don't want to learn programming; they still want to monkey-write programs until, through trial and error, it works. I am on one of the #python-xx IRC channels, and it is a horror.
PHP is cool, it is useful, it is a magnet for bad developers. This way they don't pollute our ecosystems.
PHP is to Ruby (or whatever your hater flavor is) as Christians are to Muslims. Neither of them is going to go away until one of them kills all the others. The more likely alternative is something else coming and destroying them both.