How PHP's foreach works

Wilduck · on Feb 27, 2013

As far as I can tell from my reading, the strangeness stems from the fact that:

> Arrays in PHP are ordered hashtables (i.e. the hash buckets are part of a doubly linked list)

And that iteration is done using an "internal array pointer":

> This pointer is part of the HashTable structure and is basically just a pointer to the current hashtable Bucket. The internal array pointer is safe against modification, i.e. if the current Bucket is removed, then the internal array pointer will be updated to point to the next bucket.

Together, these require some complex copying rules to allow simple things like iterating over the same array in nested loops.
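Here is a minimal C sketch of the quoted behaviour, under simplifying assumptions: buckets are linked in insertion order, and removing the bucket the internal pointer sits on moves the pointer to the next bucket. The names (`table`, `table_remove`, `internal_ptr`) are hypothetical and only model the idea. This is not Zend's real `HashTable`.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified bucket: only the insertion-order links are modeled. */
typedef struct bucket {
    int value;
    struct bucket *prev, *next;   /* doubly linked list in insertion order */
} bucket;

typedef struct {
    bucket *head, *tail;
    bucket *internal_ptr;         /* models the "internal array pointer" */
} table;

static void table_append(table *t, int value) {
    bucket *b = malloc(sizeof *b);
    if (!b) abort();
    b->value = value;
    b->prev = t->tail;
    b->next = NULL;
    if (t->tail) t->tail->next = b; else t->head = b;
    t->tail = b;
    if (!t->internal_ptr) t->internal_ptr = b;   /* if unset, point at the new bucket */
}

/* Unlink a bucket; if the internal pointer is on it, advance the pointer first. */
static void table_remove(table *t, bucket *b) {
    if (t->internal_ptr == b) t->internal_ptr = b->next;
    if (b->prev) b->prev->next = b->next; else t->head = b->next;
    if (b->next) b->next->prev = b->prev; else t->tail = b->prev;
    free(b);
}

/* Pointer on the 2nd of {10, 20, 30}; deleting that bucket leaves it on 30. */
static int demo_remove_current(void) {
    table t = {NULL, NULL, NULL};
    table_append(&t, 10);
    table_append(&t, 20);
    table_append(&t, 30);
    t.internal_ptr = t.head->next;       /* like advancing with next() once */
    table_remove(&t, t.internal_ptr);    /* delete the current element */
    int current = t.internal_ptr->value; /* still valid: now 30 */
    while (t.head) table_remove(&t, t.head);
    return current;
}
```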

I'm not very familiar with the implementation details of many other languages with these constructs, but in Python a `for` loop (which works much like the `foreach` loop described for PHP) simply operates over an iterator, which has a well-defined protocol [1]. I don't know the implementation any deeper than that, however.
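For contrast, here is a minimal C sketch of the external-iterator style. All names are hypothetical. Every loop owns its own cursor, so nesting loops over one array just works. The second function shares a single cursor between both loops, roughly the way a lone internal array pointer would, and the inner loop clobbers the outer one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical external iterator: each loop owns its own cursor. */
typedef struct {
    const int *data;
    size_t len;
    size_t pos;
} int_iter;

static int_iter iter_new(const int *data, size_t len) {
    int_iter it = {data, len, 0};
    return it;
}

/* Returns 1 and writes the next element to *out, or 0 when exhausted. */
static int iter_next(int_iter *it, int *out) {
    if (it->pos >= it->len) return 0;
    *out = it->data[it->pos++];
    return 1;
}

/* Nested loops over the same array, each with an independent cursor. */
static size_t count_pairs_external(const int *data, size_t len) {
    size_t pairs = 0;
    int a, b;
    int_iter outer = iter_new(data, len);
    while (iter_next(&outer, &a)) {
        int_iter inner = iter_new(data, len);
        while (iter_next(&inner, &b)) pairs++;
    }
    return pairs;
}

/* The same loops sharing ONE cursor, like a single pointer stored with the array. */
static size_t count_pairs_shared(const int *data, size_t len) {
    size_t pairs = 0;
    int a, b;
    int_iter shared = iter_new(data, len);
    while (iter_next(&shared, &a)) {
        shared.pos = 0;                        /* inner loop restarts the shared cursor */
        while (iter_next(&shared, &b)) pairs++;
    }                                          /* ...and leaves it at the end for the outer loop */
    return pairs;
}
```

With `{1, 2, 3}`, the independent cursors visit all 9 pairs. The shared cursor stops after 3 because the inner loop leaves it at the end.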

I'm curious how other languages' implementations of foreach-type constructs stack up, and how the choice of implementation for the standard list/array datatype affects the interface.

[1] http://excess.org/article/2013/02/itergen1/#iterators

danso · on Feb 27, 2013

Is this actually Stackoverflow or an impostor phishing site? I don't see the "Question has been closed as not constructive" notice even though this question meets all the requirements for it.

tikhonj · on Feb 27, 2013

Really? It's a purely technical question with one correct answer: exactly the sort of question StackOverflow wants. It is not a question that would breed discussion, have a list of answers, or have no correct answer, which is what gets closed.

danso · on Feb 27, 2013

Sorry, I was being facetious. You're completely right. It's just that this is the automatic assumption I make when I see an SO question that's deemed interesting enough to get upvoted on HN.

dionidium · on Feb 27, 2013

I wish that were true. Useful stuff is closed all the time. Which is good, because we are totally running out of bits on the internet!

menacingly · on Feb 28, 2013

The worst is when the top google result is a closed Stackoverflow question complaining that it's a dupe, or belongs on another site, with no pointers to the actual helpful info.

taejo · on Feb 28, 2013

AFAIK, it's explicit policy not to close questions as dupes without linking to the original question.

benparsons · on Feb 28, 2013

Can you provide an example of this?

jsmeaton · on Feb 28, 2013

You know what, I read the question and answer and said to myself "at least no one on HN can complain about THIS one being closed". Disappointed but not surprised.

yen223 · on Feb 28, 2013

It's really predictable when the "Question has been closed as not constructive" label will appear: if the question doesn't contain code, it will probably be closed as not constructive.

kenneth · on Feb 28, 2013

If anybody feels like explaining something else that's also puzzling about the Zend engine and PHP arrays: the other day I spent a few hours on a WTF moment while writing a PHP extension in C and querying array zvals.

I was doing something fairly simple: trying to extract values passed as named arguments to a function and turn them back into simple C types (`char *` and `int`):

    // capturing hash keys as zvals
    zval **salt_hex_val;
    zval **key_hex_val;
    zval **iterations_val;
    if (    // getting values
            zend_hash_find(hash, "salt", strlen("salt") + 1, (void**)&salt_hex_val) == FAILURE ||
            zend_hash_find(hash, "key", strlen("key") + 1, (void**)&key_hex_val) == FAILURE ||
            zend_hash_find(hash, "iterations", strlen("iterations") + 1, (void**)&iterations_val) == FAILURE ||
            // checking types
            Z_TYPE_PP(salt_hex_val) != IS_STRING ||
            Z_TYPE_PP(key_hex_val) != IS_STRING ||
            (Z_TYPE_PP(iterations_val) != IS_LONG && Z_TYPE_PP(iterations_val) != IS_DOUBLE)
        ) {
        php_error_docref(NULL TSRMLS_CC, E_WARNING, "Could not extract and check types on required values in hash: salt, key, and iterations.");
        RETURN_NULL();
    }
    
    char *salt_hex;
    char *key_hex;

    if (Z_STRLEN_PP(salt_hex_val) != salt_length * 2 ||
            Z_STRLEN_PP(key_hex_val) != key_length * 2) {
        php_error_docref(NULL TSRMLS_CC, E_WARNING, "Key or Salt length incorrect.");
        RETURN_NULL();
    }
    
    salt_hex = Z_STRVAL_PP(salt_hex_val);
    key_hex = Z_STRVAL_PP(key_hex_val);

    int iterations = (Z_TYPE_PP(iterations_val) == IS_LONG ?
        (int)Z_LVAL_PP(iterations_val) :
        (int)Z_DVAL_PP(iterations_val));

The part that I still don't understand (but that I figured out by trial and error) is why `zend_hash_find` takes a `void **` as an argument, when what you actually have to pass is a `zval ***` cast to `void **`. What's the purpose of the triple pointer here?

    zend_hash_find(hash, "salt", strlen("salt") + 1, (void**)&salt_hex_val)

ahomescu1 · on Feb 28, 2013

Here's my understanding of why each pointer is needed, from reading the Zend source code for a while:

1) The innermost pointer is needed because Zend hash tables actually store a `zval *` (pointer to zval), not a zval directly. The zval is allocated separately, then its pointer is stored into the table.

2) The second pointer is needed because Zend tables internally malloc storage for whatever they store (a `zval *` in this case), then access that data through a pointer. The `zval *` pointer is memcpy'd into the malloc'd area, and this data is accessed through a `zval **` pointer. This allows users of zend_hash_xxx not only to read the `zval *` pointer, but also to change it.

3) In C, one way for a function to return a value is by passing a pointer to a variable that will store the result. Since zend_hash_find returns the internal `zval **` data, you need to pass in a `zval ***`, a pointer to the `zval **` variable that receives the actual return value you want. Through this `zval **` pointer, you can read and also change the `zval *` data stored in that hash table cell.
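The three levels can be sketched in C with a toy table. The `ztable`, `my_zval` and `ztable_find` names are hypothetical and not part of the Zend API. The sketch types the slot as `my_zval *` to avoid pointer casts. Zend's generic table stores `void *`, which is why the real call spells the same argument as `(void **)&salt_hex_val`.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for a zval. */
typedef struct { long lval; } my_zval;

typedef struct {
    const char *key;
    my_zval *data;   /* level 1: the table stores a pointer to a separate zval */
} cell;

typedef struct {
    cell cells[8];
    int count;
} ztable;

static void ztable_add(ztable *t, const char *key, my_zval *z) {
    assert(t->count < 8);
    t->cells[t->count].key = key;
    t->cells[t->count].data = z;   /* the table keeps its own copy of the pointer */
    t->count++;
}

/* Level 2: hand back the ADDRESS of the stored pointer (a my_zval **), so the
   caller can read or replace it. Level 3: C returns it through an out-parameter,
   hence the my_zval *** argument. Returns 0 on success, -1 on failure. */
static int ztable_find(ztable *t, const char *key, my_zval ***out) {
    for (int i = 0; i < t->count; i++) {
        if (strcmp(t->cells[i].key, key) == 0) {
            *out = &t->cells[i].data;
            return 0;
        }
    }
    return -1;
}

static long demo_triple_pointer(void) {
    my_zval one = {1}, two = {2};
    ztable t;
    t.count = 0;
    ztable_add(&t, "salt", &one);

    my_zval **slot;                                      /* what we want back */
    if (ztable_find(&t, "salt", &slot) != 0) return -1;  /* pass its address */
    long before = (*slot)->lval;                         /* reads 1 */
    *slot = &two;                                        /* replaces the pointer inside the table */

    my_zval **again;
    if (ztable_find(&t, "salt", &again) != 0) return -1;
    return before * 10 + (*again)->lval;                 /* 1 * 10 + 2 */
}
```

`demo_triple_pointer()` returns 12. It reads 1 through the slot, then stores a different pointer into the table through that same slot, and a second lookup sees 2.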

smsm42 · on Feb 28, 2013

1) Symbol tables store double pointers, not single ones; see my comment on the parent for the reason why. There may be hash tables that store single pointers, but not symbol tables.

smsm42 · on Feb 28, 2013

PHP arrays store `zval **`. The reason for it has to do with references. Basically, suppose you do `$a =& $b;` and then `$a = 1`, and you want `$b` to be 1 now too. If the symbol table entry for b stores a `zval *`, there's no way to change it when `$a = 1` is executed. However, if b's entry stores a pointer to the actual `zval *`, and so does a's entry, then when you change that storage to point to a zval holding 1 instead, both $a and $b change. Hope I explained it clearly; it's hard to do without drawing a picture :)

So, to receive a `zval **`, you need to pass a `zval ***` to the hash function. That's why the type in general is `void **`: a generic hash (hashes are used for all kinds of things, not only zvals) stores `void *`, so to receive it you pass `void **`.
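The reference scheme described above can be sketched in C. `my_zval` and the `entry_*` variables are hypothetical. Real PHP 5 zvals also carry `refcount` and `is_ref`, which this sketch ignores. When both entries point at one shared slot, repointing the slot through `$a`'s entry is visible through `$b`'s. A plain `zval *` per entry cannot achieve this.

```c
#include <assert.h>

/* Hypothetical stand-in for a zval; refcount and is_ref are left out. */
typedef struct { long lval; } my_zval;

/* Entries hold a pointer to a shared slot that holds the my_zval *. */
static long demo_reference(void) {
    my_zval initial = {0}, one = {1};
    my_zval *slot = &initial;        /* storage holding the current zval pointer */
    my_zval **entry_b = &slot;       /* symbol-table entry for $b */
    my_zval **entry_a = entry_b;     /* $a =& $b: same slot, not a copy */
    *entry_a = &one;                 /* $a = 1: repoint the shared slot */
    return (*entry_b)->lval;         /* $b sees 1 */
}

/* Entries hold the my_zval * directly: changing one entry cannot affect the other. */
static long demo_without_reference(void) {
    my_zval initial = {0}, one = {1};
    my_zval *entry_b = &initial;
    my_zval *entry_a = entry_b;
    entry_a = &one;                  /* only $a's entry changes */
    (void)entry_a;
    return entry_b->lval;            /* $b still sees 0 */
}
```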

Just for fun, there are places in the code, IIRC, where a quadruple pointer can be found (see zend_fcall_info_args_save for example). It's a pretty rare case, though; I don't remember any place with a five-level pointer.

stormbrew · on Feb 28, 2013

I think the interesting thing that this highlights about php, perhaps especially for people who've never worked in it, is the fact that php is an extremely rare example of a scripting language that has value semantics for complex objects.

I've always found that an interesting choice.

unconed · on Feb 28, 2013

> perhaps especially for people who've never worked in it

At my last job, one of our interview questions for an experienced PHP programmer was "What makes PHP arrays different or unique, from a computer science point of view?" Not a single one ever got anywhere close to the right answer (that they're ordered hash tables, not arrays).

Pretty sure this applies to by-value and by-reference semantics too.

Confusion · on Feb 28, 2013

That doesn't make PHP unique. AWK already did that.

By extension, a pitfall of interview questions: you can't always assume you actually know the correct answer, or even that there is a correct answer. This may cause interviewees to start second-guessing themselves and make them appear worse than they are. You can basically send them into an infinite loop with an ill-formed question. "Does he mean that they aren't actually arrays? Well, he can't mean that, because there are other languages whose arrays are actually hash tables. And they aren't called 'associative arrays' for nothing. So what does he mean?"

chime · on Feb 28, 2013

If the candidate said all of the above out loud instead of just thinking it, I would award them additional points. When I'm interviewing someone, I care about how they think and process problems rather than what they know.

1SaltwaterC · on Feb 28, 2013

Most of the time, getting the right answer is a matter of asking the right question. If most experienced people can't answer a fairly simple question, then maybe the question is to blame. That bit about the "computer science point of view" may be a little vague; asking for implementation specifics may be more appropriate. I do know that arrays in PHP are implemented as hash tables, as this is a common rant about it. But from your question I did not understand what you meant.

I'm still curious what you mean by "ordered", because PHP arrays don't order their keys. Hence the common rant mentioned above about having to use asort() to get a proper array:

  php > $arr = array();
  php > $arr[2] = 2;
  php > $arr[1] = 1;
  php > var_dump($arr);
  array(2) {
    [2]=>
    int(2)
    [1]=>
    int(1)
  }

stormbrew · on Feb 28, 2013

Most hash tables have a semi-random iteration order dictated by the hash algorithm in combination with the bucket count. PHP arrays are ordered by insertion order: each bucket also has a next pointer in a separate insertion-order list, and new elements are appended at the end of that list.

The order may be unusual or even non-obvious, but it is predictable.
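Here is a minimal C sketch of an insertion-ordered hash table. It assumes integer keys and a fixed bucket count. All names are hypothetical, and PHP's real `HashTable` differs in detail. Lookups go through the hash chains, while iteration follows a separate list that every new entry is appended to. This matches the `$arr[2]` then `$arr[1]` example above, where key 2 is dumped first.

```c
#include <assert.h>
#include <stdlib.h>

#define NBUCKETS 8

/* Hypothetical insertion-ordered hash table with integer keys. */
typedef struct entry {
    long key;
    long value;
    struct entry *chain;        /* next entry in the same hash bucket */
    struct entry *order_next;   /* next entry in insertion order */
} entry;

typedef struct {
    entry *buckets[NBUCKETS];
    entry *first, *last;        /* insertion-order list */
} ordered_table;

static void ot_set(ordered_table *t, long key, long value) {
    size_t idx = (size_t)key % NBUCKETS;
    for (entry *e = t->buckets[idx]; e; e = e->chain) {
        if (e->key == key) { e->value = value; return; }  /* update keeps its position */
    }
    entry *e = malloc(sizeof *e);
    if (!e) abort();
    e->key = key;
    e->value = value;
    e->chain = t->buckets[idx];   /* push onto the hash chain */
    t->buckets[idx] = e;
    e->order_next = NULL;         /* append to the insertion-order list */
    if (t->last) t->last->order_next = e; else t->first = e;
    t->last = e;
}

/* Write keys in iteration (insertion) order; returns how many were written. */
static size_t ot_keys(const ordered_table *t, long *out, size_t cap) {
    size_t n = 0;
    for (const entry *e = t->first; e && n < cap; e = e->order_next) out[n++] = e->key;
    return n;
}

static void ot_free(ordered_table *t) {
    entry *e = t->first;
    while (e) {
        entry *next = e->order_next;
        free(e);
        e = next;
    }
}
```

Setting key 2 and then key 1 on an empty `ordered_table t = {{NULL}, NULL, NULL};` makes `ot_keys` report 2 first, then 1.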

jeltz · on Feb 28, 2013

Ruby 1.9 hash tables are also ordered by insertion order.

smsm42 · on Feb 28, 2013

"Ordered" means "having an order", not "having the order that I wanted it to have". Unless you have sorted it, the order is the order in which the elements were created. That may not be what you want if you want numeric key order and you insert keys in a different order. So what it means is that the PHP array/hash structure supports the concept of order: you can ask questions like "what is the first element? what is the next one?" This is not true of all hash structures; for many hashes, asking "what is the next element after this one?" is meaningless, since no order is defined on the elements.

1SaltwaterC · on March 4, 2013

I got it now. The issue is the language barrier. And by language I mean my native language, where "ordered" is sometimes used as a synonym for "sorted" (at least in CS and math), hence the confusion.

stormbrew · on Feb 28, 2013

Insert-ordered hash tables are a little less unique now. Ruby 1.9+ has them (with the built-in Hash), though it was introduced quietly and not a lot of people seem to know that.

It's a surprisingly useful data structure sometimes. I'm not sure if conflating three major kinds of data structures into one was a great decision, though. Honestly the OP's weirdness is much less likely to be important than the mixed-integer/string key weirdnesses you're more likely to run into.

objectified · on Feb 28, 2013

But why would you ask a question like that in an interview? What does it tell you about the person you're interviewing? Or does it tell you something about yourself?

maratd · on Feb 28, 2013

> But why would you ask a question like that in an interview?

Simple. You want to find out if they spend their time on HN. There hasn't been a single discussion on HN about PHP that doesn't mention that arrays in PHP are hash tables. Not one.

KMag · on March 1, 2013

It tests the candidate's ability to work on a vaguely specified problem and also gives them insight into how the candidate feels about working on a lot of poorly specified projects.

If the candidate is not smart enough to get up and leave the interview after being asked such a terrible question, you don't want to hire them. The company's actual business model is to sell the resumes of people who are smart enough to reject working there to companies looking for good coders.

Edit: confusing use of "their".

cpleppert · on Feb 28, 2013

I am absolutely certain that you are correct. I have never programmed in PHP, so I was confused trying to understand exactly what was meant by references in the Stack Overflow question. I was struck by how imprecise every discussion of the topic was.

There was no clear explanation of references, their relationship to variables, and how they differ from what are usually called 'pointers' that clearly distinguished PHP's view of variables from that of the object-oriented languages that primarily use reference semantics for objects.

It was only in the context of a discussion of how zvals work, and of the is_ref and refcount fields, that I could understand exactly what semantics PHP uses.

nkozyra · on Feb 27, 2013

So basically it operates on a copy unless it determines it doesn't need to?

I'm not sure why this is interesting.
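The "copy unless it doesn't need to" part is copy-on-write. Here is a minimal C sketch, assuming a hypothetical refcounted array (`cow_array`, not Zend's structures). Assignment only shares the storage and bumps a counter. The first write through a holder that shares the storage makes a private copy, so the other holder never sees the change.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical refcounted array with copy-on-write value semantics. */
typedef struct {
    size_t refcount;
    size_t len;
    long items[];               /* flexible array member */
} cow_array;

static cow_array *cow_new(size_t len) {
    cow_array *a = calloc(1, sizeof(cow_array) + len * sizeof(long));
    if (!a) abort();
    a->refcount = 1;
    a->len = len;
    return a;
}

/* "$b = $a": share the storage and bump the refcount; nothing is copied. */
static cow_array *cow_assign(cow_array *a) {
    a->refcount++;
    return a;
}

static void cow_release(cow_array *a) {
    if (--a->refcount == 0) free(a);
}

/* "$b[i] = v": if the storage is shared, separate by copying first. */
static void cow_write(cow_array **holder, size_t i, long v) {
    cow_array *a = *holder;
    if (a->refcount > 1) {
        cow_array *copy = cow_new(a->len);
        memcpy(copy->items, a->items, a->len * sizeof(long));
        a->refcount--;          /* this holder stops sharing the old storage */
        *holder = a = copy;
    }
    a->items[i] = v;
}

static int demo_cow(void) {
    cow_array *a = cow_new(3);            /* items start at 0 */
    cow_array *b = cow_assign(a);         /* shared, no copy yet */
    int shared_before = (a == b);
    cow_write(&b, 0, 42);                 /* the copy happens here */
    int ok = shared_before && a != b && a->items[0] == 0
             && b->items[0] == 42 && a->refcount == 1;
    cow_release(a);
    cow_release(b);
    return ok;
}
```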

wvenable · on Feb 27, 2013

It's even less interesting because PHP arrays have value semantics, so someFunc($array) and foreach($array..) aren't really that different. The whole thing is pretty intuitive.

The answer to the question is pretty deep though.

francispelland · on Feb 27, 2013

Wasn't this a given when working with PHP? You can, after all, iterate by reference so that you modify the array as you go, rather than at the end:

    $array = array(1, 2, 3, 4, 5);
    foreach ($array as &$value) { /* ... */ }

dools · on Feb 28, 2013

A more "hair friendly" way to modify an array from within a foreach loop is to just use the array key:

    foreach($array as $key => $value)
    {
        if(someCondition($key))
            $array[$key] = someTransformation($value);
    }

or whatever; i.e., just use the key to modify the original array directly. It's much easier to read and see what's happening (in my opinion, anyway).

narcissus · on Feb 27, 2013

Just always be sure to unset $value afterwards :)

ufo · on Feb 28, 2013

I don't know PHP. What happens if you forget to unset the reference?

uxp · on Feb 28, 2013

Then you get the joy of being able to play with it later.

    $array = array(1,2,3,4,5);
    foreach ($array as &$value) {
      echo $value; // some non-mutating action
    }
    $value = 'woops';
    print_r($array); // ([0] => 1 [1] => 2 [2] => 3 [3] => 4 [4] => woops)
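The same trap in C terms, as an analogy only (PHP references are not C pointers, and the function names are hypothetical). A loop variable that aliases each element in turn still aliases the last element after the loop, so a later write through it changes the array. Confining the alias to the loop body is the C counterpart of `unset($value)`.

```c
#include <assert.h>
#include <stddef.h>

/* Pitfall: the pointer outlives the loop and still aliases array[4]. */
static int leftover_reference(void) {
    int array[5] = {1, 2, 3, 4, 5};
    int *value = NULL;
    for (size_t i = 0; i < 5; i++) {
        value = &array[i];       /* like foreach ($array as &$value) */
    }
    *value = 99;                 /* like $value = 'woops'; rewrites array[4] */
    return array[4];
}

/* Fix: keep the alias inside the loop, which is what unset($value) achieves in PHP. */
static int scoped_reference(void) {
    int array[5] = {1, 2, 3, 4, 5};
    for (size_t i = 0; i < 5; i++) {
        int *value = &array[i];
        (void)value;             /* some non-mutating action would go here */
    }
    int value = 99;              /* a fresh variable, unrelated to the array */
    (void)value;
    return array[4];
}
```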

jtreminio · on Feb 28, 2013

That's why I tell the developers on my team that if they're thinking of using references in PHP, then they're probably Doing It Wrong.

smsm42 · on Feb 28, 2013

Unless they are experienced developers building complex data structures, this is probably good advice. References need to be approached very carefully to be used correctly. It doesn't help that in old PHP versions you had to use references when working with objects. Some people picked up bad habits from those times, and these habits propagate through code without people actually understanding what's going on.

notJim · on Feb 28, 2013

Huh, I've never actually had that problem, but that's a really good point.

xkcdfanboy · on Feb 28, 2013

I can't count the times that references have caused weird errors in my PHP code. Definitely a good recommendation.

narcissus · on Feb 28, 2013

I hear ya. In fact, I got burned by that problem enough times to make a test for it in my PHPCS 'coding standard'... which is basically a handful of standards that look for my stupid, repeated, coding errors :)

function_seven · on Feb 28, 2013

My own standard is this:

    foreach ($array as & $ref) {
        // Do something with $ref
    } unset ($ref);

i.e. put the `unset()` call on the same line as the closing brace, forever "welding" it to that block.

Kiro · on Feb 28, 2013

Why do you even use references in the first place?

smsm42 · on Feb 28, 2013

Data structures like trees, etc. are pretty hard to do without them, due to default array by-value semantics.
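Here is a C analogy for why by-value containers push tree code toward references. Structs stand in for nested PHP arrays, and the types and functions are hypothetical. Taking a child out of the container yields a copy, so edits only reach the tree when you hold a reference (here, a pointer) into it.

```c
#include <assert.h>

/* Hypothetical tree whose children are stored BY VALUE, like nested PHP arrays.
   Depth is fixed to keep the sketch tiny. */
typedef struct { int value; } leaf;
typedef struct { int value; leaf children[2]; } node;

static int update_through_copy(void) {
    node root = {0, {{0}, {0}}};
    leaf child = root.children[0];    /* like $child = $tree['children'][0]; */
    child.value = 7;                  /* edits the copy only */
    return root.children[0].value;    /* still 0 */
}

static int update_through_reference(void) {
    node root = {0, {{0}, {0}}};
    leaf *child = &root.children[0];  /* like $child = &$tree['children'][0]; */
    child->value = 7;                 /* edits the tree itself */
    return root.children[0].value;    /* now 7 */
}
```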