PHP uses ref-counting for most garbage collection. That means non-cyclic data structures are collected eagerly, as soon as the last reference to an object is removed.
Naïve ref-counting can't collect cyclic data structures, though. Normally, cycles are "collected" in PHP by just waiting until the request is done and ditching everything. That works great for web sites, but makes less sense for a command line app like Composer.
To better reclaim memory, PHP now has a cycle collector. Whenever a ref-count is decremented but not zero, that means a new island of detached cyclic objects could have been created. When this happens, it adds that object to an array of possible cyclic roots.
When that array gets full (10,000 elements), the cycle collector is triggered. This walks the array and tries to collect any cyclic objects. They reference this paper[1] for their algorithm for doing this, but what they describe just sounds like a regular simple synchronous cycle collector to me.
The basic process is pretty simple. Starting at an object that could be the beginning of some cyclic graph, speculatively decrement the ref-count of everything it refers to. If any of them go to zero, recursively do that to everything they refer to and so on. When that's done, if you end up with any objects that are at zero references, they can be collected. For everything left, undo the speculative decrements.
If you have a large live object graph, this process can be super slow: you have to traverse the entire object graph. If there are few dead objects, you burn a bunch of time doing this and don't get anything back.
Meanwhile, you're busy adding and removing references to live objects, so that potential root array is constantly filling up, re-triggering the same ineffective collection over and over again. Note that this happens even when you aren't allocating: just assigning references is enough to fill the array.
To me, this is the real problem compared to other languages. You shouldn't thrash your GC if you aren't allocating anything!
Disabling the GC (which only disables the cycle collector, not the regular delete-on-zero-refs) avoids that. However, it has a side effect. Once the potential root array is full, any new potential roots get discarded. That means even if you re-enable the cycle collector later, those cyclic objects may never be collected. Probably not a problem for Composer since its a command-line app that exits when done, but not a good idea for a long-running app.
There are other things PHP could do here:
1. Don't use ref-counting. Use a normal tracing GC. Then you only kick off GC based on allocation pressure, not just by mutating memory. Obviously, this would be a big change!
2. Consider prioritizing and incrementally processing the root array. If it kept track of how often the same object reappeared in the root array each GC, it can get a sense of "hey, we're probably not going to collect this". Sort the array by priority so that potentially cyclic objects that have been live in the past are at one end. Then don't process the whole array: just process for a while and stop.
http://php.net/manual/en/features.gc.php
Here's when I found:
PHP uses ref-counting for most garbage collection. That means non-cyclic data structures are collected eagerly, as soon as the last reference to an object is removed.
Naïve ref-counting can't collect cyclic data structures, though. Normally, cycles are "collected" in PHP by just waiting until the request is done and ditching everything. That works great for web sites, but makes less sense for a command line app like Composer.
To better reclaim memory, PHP now has a cycle collector. Whenever a ref-count is decremented but not zero, that means a new island of detached cyclic objects could have been created. When this happens, it adds that object to an array of possible cyclic roots.
When that array gets full (10,000 elements), the cycle collector is triggered. This walks the array and tries to collect any cyclic objects. They reference this paper[1] for their algorithm for doing this, but what they describe just sounds like a regular simple synchronous cycle collector to me.
The basic process is pretty simple. Starting at an object that could be the beginning of some cyclic graph, speculatively decrement the ref-count of everything it refers to. If any of them go to zero, recursively do that to everything they refer to and so on. When that's done, if you end up with any objects that are at zero references, they can be collected. For everything left, undo the speculative decrements.
If you have a large live object graph, this process can be super slow: you have to traverse the entire object graph. If there are few dead objects, you burn a bunch of time doing this and don't get anything back.
Meanwhile, you're busy adding and removing references to live objects, so that potential root array is constantly filling up, re-triggering the same ineffective collection over and over again. Note that this happens even when you aren't allocating: just assigning references is enough to fill the array.
To me, this is the real problem compared to other languages. You shouldn't thrash your GC if you aren't allocating anything!
Disabling the GC (which only disables the cycle collector, not the regular delete-on-zero-refs) avoids that. However, it has a side effect. Once the potential root array is full, any new potential roots get discarded. That means even if you re-enable the cycle collector later, those cyclic objects may never be collected. Probably not a problem for Composer since its a command-line app that exits when done, but not a good idea for a long-running app.
There are other things PHP could do here:
1. Don't use ref-counting. Use a normal tracing GC. Then you only kick off GC based on allocation pressure, not just by mutating memory. Obviously, this would be a big change!
2. Consider prioritizing and incrementally processing the root array. If it kept track of how often the same object reappeared in the root array each GC, it can get a sense of "hey, we're probably not going to collect this". Sort the array by priority so that potentially cyclic objects that have been live in the past are at one end. Then don't process the whole array: just process for a while and stop.
[1]: http://media.junglecode.net/media/Bacon01Concurrent.pdf