As far as I can tell from my reading, the strangeness stems from the fact that:
> Arrays in PHP are ordered hashtables (i.e. the hash buckets are part of a doubly linked list)
And that iteration is done using an "internal array pointer":
> This pointer is part of the HashTable structure and is basically just a pointer to the current hashtable Bucket. The internal array pointer is safe against modification, i.e. if the current Bucket is removed, then the internal array pointer will be updated to point to the next bucket.
Which together require some complex copying rules to allow for some simple things like iterating over the same array in nested loops.
I'm not very familiar with the implementation details of many other languages with these constructs, but in Python a `for` loop (which operates similarly to the described `foreach` loop in PHP) simply operates over an iterator, which has a well-defined implementation [1]. I don't know about the implementation any deeper than that, however.
I'm curious how other languages' implementations of foreach-type constructs stack up, and how the choice of implementation for the standard list/array datatype affects the interface.
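For reference, a minimal sketch of the Python side, assuming nothing beyond the built-in `iter()`/`next()` protocol (the helper name `for_each` is made up for illustration). Each loop asks the container for its own iterator object, so nested loops over the same list don't share a cursor:

```python
def for_each(iterable, body):
    """Hand-rolled equivalent of `for item in iterable: body(item)`."""
    iterator = iter(iterable)      # each loop gets its own iterator object
    while True:
        try:
            item = next(iterator)  # advance this loop's private cursor
        except StopIteration:
            break                  # iterator exhausted: the loop ends
        body(item)


items = [1, 2, 3]
pairs = []

# Nested loops over the same list: the inner loop's iterator is independent
# of the outer one, so the list itself never needs to be copied.
for_each(items, lambda a: for_each(items, lambda b: pairs.append((a, b))))
```

Since the cursor lives in the iterator rather than in the list, there is no shared "current position" for the two loops to fight over.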
Is this actually Stackoverflow or an impostor phishing site? I don't see the "Question has been closed as not constructive" notice even though this question meets all the requirements for it.
Really? It's a purely technical question with one correct answer--exactly the sort of questions that StackOverflow wants. It is not a question that would breed discussion, have a list of answers or have no correct answer, which is what gets closed.
Sorry, I was being facetious. You're completely right. It's just that that's the automatic assumption I make when I see a SO question that's deemed interesting enough to get upvoted on HN.
The worst is when the top google result is a closed Stackoverflow question complaining that it's a dupe, or belongs on another site, with no pointers to the actual helpful info.
You know what, I read the question and answer and said to myself "at least no one on HN can complain about THIS one being closed". Disappointed but not surprised.
It's really predictable when the "Question has been closed as not constructive" label will appear - if the question doesn't contain code, it probably will be closed as not constructive.
If anybody feels like explaining something else that's also puzzling about the Zend engine and PHP arrays: the other day I spent a few hours on a WTF moment while writing a PHP extension in C and querying array zvals.
I was doing something fairly simple, trying to extract values passed as named arguments to a function and turn them back into simple C types (char * and int):
The part that I still don't understand (but that I figured out by trial-and-error) was why `zend_hash_find` takes a `void **`[1] as argument, which should actually be a `zval ***` cast as `void **`. What's the purpose of the triple pointer here?
Here's my understanding of why each pointer is needed, from reading the Zend source code for a while:
1) The innermost pointer is needed because Zend hash tables actually store a "zval *" (pointer to zval), not a zval directly. The zval is allocated separately, then its pointer is stored into the table.
2) The second pointer is needed because Zend tables internally malloc storage for whatever they store (a "zval *" in this case), then access that data through a pointer. The "zval *" pointer is memcpy'd into the malloc'd area, and that area is accessed through a "zval **" pointer. This allows users of zend_hash_xxx not only to access the "zval *" pointer, but also to change it.
3) In C, one way for a function to return a value is to take a pointer to a variable that will store the result. Since zend_hash_find returns the internal "zval **" data, you need to pass in a "zval ***", i.e. a pointer to the "zval **" variable that receives the actual return value. Through this "zval **" pointer, you can read and also change the "zval *" data stored in that hash table cell.
1) Symbol tables store a double pointer, not a single one; see my comment to the parent for the reason why. There may be hash tables that store single pointers, but not symbol tables.
PHP arrays store zval **. The reason for it has to do with references: basically, if you do $a =& $b; and then $a = 1, and want $b to be 1 now too, there's no way to change it when $a = 1 is executed if the symbol table entry for b stores a zval *. However, if b's entry stores a pointer to the actual zval *, and so does a's entry, then when you change that storage to point to a zval with 1 instead, both $a and $b change. Hope I explained it clearly, it's hard to do without drawing a picture :)
So, to receive a zval **, you need to pass a zval *** to the hash function. That's why the parameter type in general is void **: the generic hash (hashes are used for all kinds of things, not only zvals) stores void *, so to receive it you pass void **.
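In lieu of a picture, here is a rough model of that indirection in Python, with no real pointers involved. The `Slot` and `SymbolTable` classes are invented for illustration and are not Zend's data structures; a slot stands in for the shared storage that both names point at, and the refcount/is_ref bookkeeping is ignored entirely:

```python
class Slot:
    """Stands in for the shared storage cell that holds the current value."""

    def __init__(self, value):
        self.value = value


class SymbolTable:
    """Maps variable names to slots rather than directly to values."""

    def __init__(self):
        self.entries = {}

    def assign(self, name, value):
        # $name = value: write through the existing slot if there is one,
        # so every name sharing that slot observes the change.
        if name in self.entries:
            self.entries[name].value = value
        else:
            self.entries[name] = Slot(value)

    def bind_reference(self, name, target):
        # $name =& $target: both names now point at the same slot.
        self.entries[name] = self.entries[target]

    def read(self, name):
        return self.entries[name].value


table = SymbolTable()
table.assign("b", 0)
table.bind_reference("a", "b")  # $a =& $b;
table.assign("a", 1)            # $a = 1;
```

If the table mapped names straight to values, the last assignment could only rebind `a`; with the extra level, `b` sees the new value too.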
Just for fun, there are places in the code IIRC where a quadruple pointer can be found (see zend_fcall_info_args_save for example). Pretty rare case though; I don't remember any place with a five-level pointer.
I think the interesting thing that this highlights about php, perhaps especially for people who've never worked in it, is the fact that php is an extremely rare example of a scripting language that has value semantics for complex objects.
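To make "value semantics" concrete for people coming from other scripting languages, here is a rough Python sketch. Python hands the callee the same list object, so the PHP-like behaviour has to be imitated with an explicit copy; the function names are made up for the example:

```python
def append_by_reference(items):
    # Python's default: the callee mutates the caller's list.
    items.append("x")


def append_by_value(items):
    # Imitating value semantics: work on a private copy instead.
    items = list(items)
    items.append("x")
    return items


original = ["a", "b"]

result = append_by_value(original)
unchanged = list(original)     # snapshot: the caller's list was not touched

append_by_reference(original)  # now the caller's list does change
```

In PHP the second behaviour is what you get for arrays without asking for it, which is the unusual part.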
> perhaps especially for people who've never worked in it
At my last job, one of our interview questions for an experienced PHP programmer was "What makes PHP arrays different or unique, from a computer science point of view?" Not a single one ever got anywhere close to the right answer (that they're ordered hash tables, not arrays).
Pretty sure this applies to by-value and by-reference semantics too.
That doesn't make PHP unique. AWK already did that.
By extension, a pitfall of interview questions: you can't always assume you actually know the correct answer or even that there is a correct answer. Which may cause interviewees to start second-guessing themselves and make them appear worse than they are. You can basically send them into an infinite loop with an ill-formed question. "Does he mean that they aren't actually arrays? Well, he can't mean that, because there are other languages whose arrays are actually hash tables. And they aren't called 'associative arrays' for nothing. So what does he mean?"
If the candidate said all of the above instead of just thinking it within their head, I would award them additional points. When I'm interviewing someone, I care about how they think and process instead of what they know.
Most of the time, getting the right answer is a matter of asking the right question. If most experienced people can't ask a fairly simple question, then maybe the question is to blame. That bit about the "computer science point of view" may be a little bit vague. Asking for implementation specifics may be more appropriate. I do know that arrays in PHP are implemented as hash tables as this is a common rant about it. But from your question I did not understand what you mean.
I'm still curious what you mean by "ordered", because PHP arrays don't order their keys. The common rant previously mentioned, aka having to use asort() to get a proper array:
Most hash tables have a semi-random order dictated by the hash algorithm in combination with the bucket count. PHP Arrays are ordered by insertion order (each slot in the hash table has a next pointer, the last of which is appended to on insertion).
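A toy version of that layout, sketched in Python under some loud assumptions: the class and field names are hypothetical, a plain dict stands in for the hash slots and is used only for lookup, and order is carried entirely by the prev/next links. It also models the "internal array pointer" moving to the next bucket when the current one is removed:

```python
class Bucket:
    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.prev = None   # previous bucket in insertion order
        self.next = None   # next bucket in insertion order


class OrderedTable:
    """Hash lookup via a dict, insertion order via a doubly linked list."""

    def __init__(self):
        self.index = {}      # key -> Bucket (stands in for the hash slots)
        self.head = None     # first inserted bucket
        self.tail = None     # most recently inserted bucket
        self.current = None  # the "internal array pointer"

    def set(self, key, value):
        bucket = self.index.get(key)
        if bucket is not None:
            bucket.value = value      # existing key keeps its position
            return
        bucket = Bucket(key, value)
        self.index[key] = bucket
        if self.tail is None:
            self.head = self.current = bucket
        else:
            bucket.prev = self.tail
            self.tail.next = bucket   # append to the end of the list
        self.tail = bucket

    def remove(self, key):
        bucket = self.index.pop(key)
        if self.current is bucket:
            self.current = bucket.next  # pointer moves on to the next bucket
        if bucket.prev is None:
            self.head = bucket.next
        else:
            bucket.prev.next = bucket.next
        if bucket.next is None:
            self.tail = bucket.prev
        else:
            bucket.next.prev = bucket.prev

    def keys(self):
        bucket, out = self.head, []
        while bucket is not None:
            out.append(bucket.key)
            bucket = bucket.next
        return out


table = OrderedTable()
for key in ("z", 10, "a"):
    table.set(key, str(key).upper())
table.remove("z")   # the internal pointer was on "z" and moves to 10
```

Mixed string and integer keys sit in the same list, and walking `head` to `tail` always gives creation order regardless of how the keys hash.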
The order may be unusual or even non-obvious, but it is predictable.
"Ordered" means "having order", not "having order that I wanted it to have". The order, unless you have sorted it, is the order of creation of elements, which may not be what you want if you want numeric key order and you insert keys in a different order. So what it means is that the PHP array/hash structure supports the concept of order, so you can ask questions like "what is the first element? what is the next one?" This is not true for all hash structures: for many hashes, asking "what is the next element after this one" is meaningless since there's no order defined on the elements.
I got it now. The issue is the language barrier. And by language I mean my native language where "ordered" is sometimes used as synonym for "sorted" (at least in CS and Math fields), hence the confusion.
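The ordered-versus-sorted distinction can be shown in a few lines of Python, whose built-in `dict` has preserved insertion order since 3.7. The keys and values here are made up:

```python
# Keys inserted out of numeric order.
scores = {}
scores[3] = "c"
scores[1] = "a"
scores[2] = "b"

insertion_order = list(scores)   # the order the keys were created in
sorted_order = sorted(scores)    # the order you have to ask for explicitly

# "What is the first element? What is the next one?" are well-defined.
walker = iter(scores)
first = next(walker)
second = next(walker)
```

Here `insertion_order` comes out as 3, 1, 2 while `sorted_order` is 1, 2, 3: the structure is ordered without being sorted.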
Insert-ordered hash tables are a little less unique now. Ruby 1.9+ has them (with the built-in Hash), though it was introduced quietly and not a lot of people seem to know that.
It's a surprisingly useful data structure sometimes. I'm not sure if conflating three major kinds of data structures into one was a great decision, though. Honestly the OP's weirdness is much less likely to be important than the mixed-integer/string key weirdnesses you're more likely to run into.
But why would you ask a question like that in an interview? What does it tell you about the person you're interviewing? Or does it tell you something about yourself?
> But why would you ask a question like that in an interview?
Simple. You want to find out if they spend their time on HN. There hasn't been a single discussion on HN about PHP that doesn't mention that arrays in PHP are hash tables. Not one.
It tests the candidate's ability to work on a vaguely specified problem and also gives them insight on how the candidate feels about working on a lot of poorly specified projects.
If the candidate is not smart enough to get up and leave the interview after being asked such a terrible question, you don't want to hire them. The company's actual business model is to sell the resumes of people who are smart enough to reject working there to companies looking for good coders.
I am absolutely certain that you are correct. I have never programmed in PHP so I was confused trying to understand exactly what was meant by references in the stack overflow question. I was struck by how every discussion of the topic was completely imprecise.
There was no clear explanation of references and their relationship with variables and how they differ from what are usually called 'pointers' that clearly distinguished PHP's view of variables from every other object oriented programming language that primarily used reference semantics for objects.
It was only in the context of a discussion of how zvals work and the is_ref and the refcount fields that I could understand exactly what semantics were being used by PHP.
It's even less interesting because PHP arrays have value semantics so someFunc($array) and foreach($array..) aren't really that different. The whole thing is pretty intuitive.
Unless they are experienced developers doing complex data structures, this is probably good advice. References need to be approached very carefully to use them right. It doesn't help that in old PHP versions you had to use references when working with objects; some people picked up bad habits in those days, and those habits propagate through code without people actually understanding what's going on.
I hear ya. In fact, I got burned by that problem enough times to make a test for it in my PHPCS 'coding standard'... which is basically a handful of standards that look for my stupid, repeated, coding errors :)
[1] http://excess.org/article/2013/02/itergen1/#iterators