Composer – Disable GC when computing deps and refs (github.com/composer)
300 points by Damin0u on Dec 2, 2014 | 119 comments



For those looking for a technical explanation, the PHP garbage collector in this case is probably wasting a ton of CPU cycles trying to collect thousands of objects (a LOT of objects are created to represent all the inter-package rules when solving dependencies) during the solving process. It keeps trying and trying as objects are allocated, and it cannot collect anything, but it still has to check them all every time it triggers.

Disabling GC just kills the advanced GC (the cycle collector) but leaves the basic reference-counting approach to freeing memory, so Composer can keep trucking without using much more memory, since the GC wasn't really collecting anything anyway. The memory reduction many people report is instead due to some other improvements we made yesterday.

As to why the problem went unnoticed for so long, it seems the GC cannot be observed by profilers, so whenever we looked at profiles to improve things we obviously did not spot the issue. In most cases, though, this isn't an issue, and I would NOT recommend that everyone disable GC on their projects :) GC is very useful in many cases, especially long-running workers, but the Composer solver falls outside the use cases it was made for.
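
Roughly, the change boils down to the built-in GC toggle around the solver. A minimal sketch of the idea (not the literal diff; the solver call is just illustrative):

    <?php
    // Flush anything already recorded in the root buffer, then turn the
    // cycle collector off for the CPU-heavy solving step. Plain refcounting
    // keeps freeing non-cyclic garbage the whole time.
    gc_collect_cycles();   // reclaim cycles already queued as possible roots
    gc_disable();          // stop the cycle collector from re-triggering

    $transaction = $solver->solve($request);   // hypothetical solver call

    // A long-running process would call gc_enable() again here;
    // Composer exits shortly after solving, so it doesn't bother.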


> As to why the problem went unnoticed for so long, it seems the GC cannot be observed by profilers, so whenever we looked at profiles to improve things we obviously did not spot the issue.

That sounds like a bug in the profiler, not in Composer. Observing internal time is pretty important for any profiler.


Yes it is definitely a failure of the tooling and I hear it is actually being worked on.


Couldn't I also speed up Composer by using Toran Proxy? ( https://toranproxy.com/ )


That would help speed up the network part of the install/update yes. The linked patch on the other hand hopefully gets the CPU part to a decent level for most people.


Out of curiosity, what tools do you use for profiling and finding these sorts of things? Plain old xdebug, xhprof, or other things? I'm going to have to jump into debugging a fairly large Symfony application within the next couple of months and am on the lookout for good tools to help me along.


Traditionally xdebug/xhprof were pretty decent, but xhprof has been a bit abandoned since Facebook uses HHVM now.

There are two new commercial contenders, though, that came out in the last few months: Blackfire.io and QafooLabs.com

Both have announced support for showing GC time in profiles as a result of today's noise :) https://twitter.com/beberlei/status/539816149303955456 https://twitter.com/symfony_en/status/539815082881187841


Xhprof is maintained by Phacility. They did pick up on this[1], but it's not as easy as one might think.

[1] https://secure.phabricator.com/T6675


Thanks for the correction; I wasn't aware that was part of what Phabricator extracted out of Facebook :)


Thanks for the info, those two will be very helpful.


> It keeps trying and trying as objects are allocated

Worse: GC gets triggered just by assigning references; allocation isn't even needed.


I had the same issue in Python recently. The project runs as a server that loads a huge amount of objects from the database, and could use as much as 10GB of memory! Python's reference counting works great, but every so often the full-heap-scanning cycle collector would run, and it took quite a lot of time to scan a multi-GB heap.

We noticed the issue happened most often when deserializing objects (loading them from Redis into memory). As it turns out, Python would schedule a collection every time the object_created counter was sufficiently higher than the object_destroyed counter. In general this makes sense, because that way you can be sure that objects are being created and not freed, which most likely means a resource leak or a reference cycle. However, the same thing happens during deserialization: many new objects are created, and none are freed. Coupled with Python's low threshold (700), GC was triggered many, many times in every deserialization loop (usually in vain, as no new objects became recyclable). Disabling GC and running full collections manually solved the problem.


Looks like someone disabled garbage collection on that comment thread as well :)


The truth is, I don't understand the point of having to download MBs of stupid animated images I will not even look at when I expect to see a commit diff.


The much anticipated sour-grapes-party-spoiler award goes to:


You could always use the Lynx browser


It would be nice if Chrome let you disable animated gifs, without disabling all images.



Thank you!


As far as I understand, Composer is roughly the same thing as the CPAN client. And they just simply disabled the garbage collector for it.

What is this guy doing that he needs gigabytes of memory to install a bunch of PHP libraries?

    Before: Memory usage: 2194.78MB (peak: 3077.39MB), time: 1324.69s
    After:  Memory usage: 4542.54MB (peak: 4856.12MB), time:  232.66s


That user is using PEAR, which is the old, shitty PHP "CPAN".

That's the reason for the huge memory usage. We're slowly moving away from PEAR, but since it works for now, not everyone has transitioned or will.

Edit: I should also point out that there are a few packages that almost everyone uses (PHPMD, PHPCS, PHPUnit) that are still mostly pulled from PEAR, though I think PHPUnit has a Composer option.


PHPUnit stopped updating PEAR in April, and actively destroyed PHPUnit on their PEAR repo at the start of this month (i.e. a patch version update that consisted just of a bash script printing a migration message...).


Recursively gathering all the dependencies a project might have. A huge downside of the modern scripting-language landscape is that dependency graphs can get quite convoluted.


So, did they exchange a 70% reduction in execution time for a 100% increase in memory usage?


1324s -> 232s. Phrasing it as a "70% reduction" vs. a "100% increase" makes it sound like the two numbers measure the same thing and that 70 can be compared directly with 100.

The reality, even in the above edge case, is: 2x the memory for roughly 6x the speed.


This. Interpreting performance numbers is hard. Also, memory nowadays is cheap, CPU power isn't.


> Also, memory nowadays is cheap, CPU power isn't.

Unless you're running your deployment on a 512MB or 1GB VM. I've had Composer max out swap on those too. Even with 2GB of RAM it's not been happy sometimes, so it'll be interesting to see what difference this patch makes.


You shouldn't be running composer update on your deployment, just composer install, which doesn't take as much memory since it doesn't have to resolve dependencies.


> Unless you're running your deployment on a 512MB or 1GB VM.

Question - why not run Composer locally, as part of deployment?


Yeah, I have had to move to a more expensive EC2 instance on account of this very issue.


IMO you should commit your composer.lock file to your repository and then use composer.phar install --no-dev --optimize-autoloader on any production instance. Install is much faster and uses hardly any memory compared to the update command.

To add or update any dependencies for your project, run composer.phar update in your development environment, or somewhere it can use a ton of memory and CPU without issue. Then just commit and push your composer.lock changes. I've been doing it this way for over a year and have had no issues deploying changes on EC2.


In that particular case, yes; other reports on that thread show much bigger reductions in time for much smaller increases in memory.


Yeah, seems like a bug "fix."


The jubilation in the comments over this seems... misplaced?


Yeah, I wondered when they'll turn it on again.


Interesting. I was looking at the comments hoping for some more technical background, but unfortunately they seem to have been run over by the animated gif crowd.

Any more details on this?


It seems that when you start to hit the memory limit, PHP's automatic garbage collection will loop through the constructed objects to see if any can be cleaned up.

If none can (and in the case of Composer all the objects exist for a reason) then it's wasting time analysing the objects.

So in this case there's just a large waste of CPU doing nothing with GC enabled.


Well, yes and no, since some of the reports show an increase in memory usage, so the gc was doing something.


Most reports don't show significantly changed memory usage though (some increase slightly, others decrease slightly).



I think the article on this explains why a memory reduction by garbage collection also increases execution speed: http://derickrethans.nl/collecting-garbage-performance-consi...


Wonderful commit.

(I didn't know animated gifs in github comments are a thing. Maybe I work too much with boring projects.)


Could someone more versed in PHP and this project explain why turning off garbage collection helped so much? And why they didn't turn it back on at the end of the function?


PHP is reference counted, so memory is typically freed as soon as an object is no longer needed. Cycles are the exception, and they can cause memory leaks, so in version 5.3 PHP added a cycle collector, which reads every object in memory and very occasionally deletes objects that are disconnected but have greater-than-zero reference counts (cycles).

In my opinion, the PHP cycle collector is a pointless waste of time. In Objective-C, Apple just lets the memory leak by default; they give you tools to find the leaks, and then you modify the code to break the cycles.

There is no need to turn cycle collection back on at the end of the program, because the OS frees the memory at program termination.
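
For the unfamiliar, a minimal example of the kind of cycle the collector exists for (the class and variable names are just illustrative): after the unset(), the two objects keep each other's refcount at 1, so plain refcounting never frees them and only the cycle collector can.

    <?php
    class Node {
        public $other;                 // holds a reference to another Node
    }

    $a = new Node();
    $b = new Node();
    $a->other = $b;                    // $a references $b
    $b->other = $a;                    // $b references $a: a cycle

    unset($a, $b);                     // refcounts drop to 1, never to 0, so
                                       // reference counting alone can't free them

    $collected = gc_collect_cycles();  // the cycle collector reclaims them
    echo "Collected $collected cycle(s)\n";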


I agree that the cycle collector is a pointless waste of time. Most scripts run short enough that the memory leak doesn't really matter.

But for a long-running script, it's either a cycle collector or adding support for weak references. IMO, given how references are stored in PHP, and with my limited knowledge of PHP core, I'm quite sure a cycle collector wins on both developer time and usefulness. (Not every programmer knows how to manage reference cycles.)


I'll note that in common usage, PHP does not exit entirely at the end of every HTTP request. (By default, PHP-FPM never exits between requests.) You would, at the least, have to keep track of all live objects and delete them at the end of each request... which sounds like a garbage collector to me.


> Could someone more versed in PHP and this project explain why turning off garbage collection helped so much?

The cycle collector is relatively recent; I expect it's not very performant (since most PHP applications don't need it), and Composer's dependency resolution may be hitting a pathological case (creating lots of objects without cycles, triggering lots of collections but no actually useful work).

> And why they didn't turn it back on at the end of the function?

Since it's a package manager, I'd guess the expectation is that the process will die soon-ish afterwards (once it's installed whatever it's resolved). There's a discussion of re-enabling it after dependency resolution (so post-install hooks run with GC enabled), though.
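
If you want to see that pathological case for yourself, here's a rough toy reproduction (the class name, object count, and output are invented for illustration; actual timings depend on your machine and PHP version): build a large graph of live, cycle-free objects where each reference reassignment feeds the GC's root buffer, then compare wall-clock time with the cycle collector on and off.

    <?php
    // Many live, cycle-free objects, with reference reassignments that keep
    // adding "possible roots" to the GC buffer, so collections fire often
    // and never reclaim anything.
    class Rule {
        public $parent;
    }

    function buildRules($n) {
        $rules = array();
        $prev  = null;
        for ($i = 0; $i < $n; $i++) {
            $rule = new Rule();
            $rule->parent = $prev;
            $rules[] = $rule;
            $prev = $rule;        // the old $prev drops to a nonzero refcount,
                                  // which records it as a possible cycle root
        }
        return $rules;
    }

    $t = microtime(true);
    $a = buildRules(200000);
    printf("cycle collector on:  %.3fs\n", microtime(true) - $t);
    unset($a);

    gc_disable();                 // refcounting still frees everything acyclic
    $t = microtime(true);
    $b = buildRules(200000);
    printf("cycle collector off: %.3fs\n", microtime(true) - $t);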


Garbage collection is slow, but reduces memory usage. So disabling it costs memory. Also, Composer does not keep running: once the job is done, the script terminates, so you don't have to enable GC again (it's only disabled in the context of the current execution).


> Garbage collection is slow, but reduces memory usage

Note that it only reduces memory usage if there are cycles. Everything else is collected when its refcount falls to 0.


Turning it on again is currently being discussed: https://github.com/composer/composer/issues/3488


I remember a story about a friend of mine in an algorithmic contest for high school students in Poland (these contests are quite hard). He solved the problems correctly, but in his implementation he had to check in every iteration of a loop whether a collection still had any elements. He used col.size() == 0 instead of col.isEmpty(). The first was O(n) and it fucked up all the performance.


That's a bug.


Not really; some containers have linear-time size by design. The canonical example is a linked list in which you want to keep the splice-another-list-at-the-middle operation constant-time.
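
A tiny sketch of that trade-off (a hypothetical class, just for illustration): without a stored length counter, size() has to walk every node, while isEmpty() only needs to look at the head.

    <?php
    class ListNode {
        public $value;
        public $next;
        public function __construct($value, $next = null) {
            $this->value = $value;
            $this->next  = $next;
        }
    }

    class LinkedList {
        public $head = null;

        public function size() {          // O(n): walks the whole chain
            $n = 0;
            for ($node = $this->head; $node !== null; $node = $node->next) {
                $n++;
            }
            return $n;
        }

        public function isEmpty() {       // O(1): a single null check
            return $this->head === null;
        }
    }

Calling size() == 0 inside a loop over such a list turns an O(n) algorithm into an O(n^2) one, which is exactly the contest story above.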


Wait, when did Github become the new 4chan?


It's much closer to reddit than 4chan, otherwise the nature of the images posted would be a little different.

And it's been like this for 2 or 3 years now. I've seen comment spam of images for commits and issues for quite a while.


The first one I remember was the commit that added CoffeeScript to rails by default. https://github.com/rails/rails/compare/9333ca7...23aa7da


The first I remember was https://github.com/MrMEEE/bumblebee-Old-and-abbandoned/commi... , but I guess that was a couple of months later.


4chan? I always thought it was a mashup of Myspace and Dropbox. Can I find unspeakably awful porn there, too?


What is wrong with the comments...


Nothing actually, they're working correctly. The people on the other hand... that's the questionable part. ;)


I found an interesting comment between the gifs: https://github.com/composer/composer/commit/ac676f47f7bbc619... by h4cc

> Behold, found something in the docs about garbage collection:

>> Therefore, it is probably wise to call gc_collect_cycles() just before you call gc_disable() to free up the memory that could be lost through possible roots that are already recorded in the root buffer. [...]


Never have I seen so many gifs on a commit page.


Then you most certainly don't remember the infamous Bumblebee Fiasco.

https://github.com/MrMEEE/bumblebee-Old-and-abbandoned/commi...


Ugh. I remember that; I had posted on it to explain to the developer why there was so much attention, and I kept receiving mails and notifications from GitHub for ages. At the time there was no way to "stop watching" once you had commented on something, if I remember correctly.


I still have all those notification emails; I am still thinking about graphing them one day to see how the commenting rate on this thread evolved over time.


Am I the only one who considers this disgusting? If the GC is so bad that it causes 2-10x slower operation in this use case, then it's a bad GC. I mean really, really bad. Short-lived objects in any modern GC should be swept away trivially without a lot of overhead. Of course we're talking about PHP here, so perhaps it's redundant to say that it sucks, but jesus... runtimes that require hacks like this should be taken out back and shot.


Wow, OS X 64-bit Chrome can't handle that many animated gifs. 32-bit could just fine. What gives?!


BeamSyncDropper v2


I was curious, so I did some investigation, starting here:

http://php.net/manual/en/features.gc.php

Here's what I found:

PHP uses ref-counting for most garbage collection. That means non-cyclic data structures are collected eagerly, as soon as the last reference to an object is removed.

Naïve ref-counting can't collect cyclic data structures, though. Normally, cycles are "collected" in PHP by just waiting until the request is done and ditching everything. That works great for web sites, but makes less sense for a command line app like Composer.

To better reclaim memory, PHP now has a cycle collector. Whenever a ref-count is decremented but doesn't reach zero, a new island of detached cyclic objects could have been created. When this happens, PHP adds that object to an array of possible cyclic roots.

When that array gets full (10,000 elements), the cycle collector is triggered. This walks the array and tries to collect any cyclic objects. They reference this paper[1] for their algorithm for doing this, but what they describe just sounds like a regular simple synchronous cycle collector to me.

The basic process is pretty simple. Starting at an object that could be the beginning of some cyclic graph, speculatively decrement the ref-count of everything it refers to. If any of them go to zero, recursively do that to everything they refer to and so on. When that's done, if you end up with any objects that are at zero references, they can be collected. For everything left, undo the speculative decrements.
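
Here's a toy model of that process on a tiny hand-built graph (the GNode class, the graph, and the gray/white/black marking are all invented for illustration; the engine's real collector is written in C and works on its internal structures):

    <?php
    class GNode {
        public $rc   = 0;        // reference count
        public $out  = array();  // nodes this node references
        public $mark = 'black';
    }

    function addRef(GNode $from, GNode $to) { $from->out[] = $to; $to->rc++; }

    // Phase 1: from a candidate root, speculatively remove internal references.
    function markGray(GNode $n) {
        if ($n->mark === 'gray') return;
        $n->mark = 'gray';
        foreach ($n->out as $child) { $child->rc--; markGray($child); }
    }

    // Phase 2: anything still externally referenced gets its counts restored;
    // whatever stays at zero is an unreachable cycle.
    function scan(GNode $n) {
        if ($n->mark !== 'gray') return;
        if ($n->rc > 0) { scanBlack($n); return; }
        $n->mark = 'white';                      // provisionally garbage
        foreach ($n->out as $child) scan($child);
    }
    function scanBlack(GNode $n) {
        $n->mark = 'black';
        foreach ($n->out as $child) {
            $child->rc++;                        // undo the speculative decrement
            if ($child->mark !== 'black') scanBlack($child);
        }
    }

    // $a <-> $b form a dead cycle; $c is also held by a "live" outside reference.
    $a = new GNode(); $b = new GNode(); $c = new GNode();
    addRef($a, $b); addRef($b, $a); addRef($a, $c);
    $c->rc++;                                    // simulate a live variable holding $c

    markGray($a);                                // $a was flagged as a possible root
    scan($a);

    foreach (array('a' => $a, 'b' => $b, 'c' => $c) as $name => $n) {
        echo $name . ' is ' . ($n->mark === 'white' ? 'garbage' : 'live') . "\n";
    }
    // Output: a and b are garbage, c is live.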

If you have a large live object graph, this process can be super slow: you have to traverse the entire object graph. If there are few dead objects, you burn a bunch of time doing this and don't get anything back.

Meanwhile, you're busy adding and removing references to live objects, so that potential root array is constantly filling up, re-triggering the same ineffective collection over and over again. Note that this happens even when you aren't allocating: just assigning references is enough to fill the array.

To me, this is the real problem compared to other languages. You shouldn't thrash your GC if you aren't allocating anything!

Disabling the GC (which only disables the cycle collector, not the regular delete-on-zero-refs) avoids that. However, it has a side effect. Once the potential root array is full, any new potential roots get discarded. That means even if you re-enable the cycle collector later, those cyclic objects may never be collected. Probably not a problem for Composer since it's a command-line app that exits when done, but not a good idea for a long-running app.

There are other things PHP could do here:

1. Don't use ref-counting. Use a normal tracing GC. Then you only kick off GC based on allocation pressure, not just by mutating memory. Obviously, this would be a big change!

2. Consider prioritizing and incrementally processing the root array. If it kept track of how often the same object reappeared in the root array each GC, it could get a sense of "hey, we're probably not going to collect this". Sort the array by priority so that potentially cyclic objects that have been live in the past are at one end. Then don't process the whole array: just process for a while and stop.

[1]: http://media.junglecode.net/media/Bacon01Concurrent.pdf


That page has over 200MB worth of animated gifs, just as a warning.


Naive mark and sweep: making refcounting look fast for 50 years.


Some insightful comments would've been nice.


Excellent collection of gifs. That's the more interesting part for me. I already knew garbage collection was slow.


https://camo.githubusercontent.com/668aedc4bd252dd8fb5a57b90...

Particularly this. What a disturbing documentary.


This got me curious, what's that?



I see two lines changed! Click bait! :P


Do you work with 13yr-olds?


The commit is great. I love that the comments have spiraled completely out of control. At this point, 30 minutes after the link was posted, the comment thread is now a competition to see who can post the best gif.

I know we're serious here, but stuff like this reminds me why I love the internet so much. It's fun to cut loose once in a while.


Agreed. It's such a shame that HackerNews doesn't let you post animated GIFs - I think it'd really add a lot of value to the discussions here.


I'm hoping this is sarcasm since I don't think I've seen a single intelligent discussion in my life that was helped along by a funny GIF.

Not to say there's anything wrong with funny GIFs, but I come to HN exactly because it moderates away that sort of stuff.


It was sarcasm. Sorry I forgot the <sarcasm> tags - I'll be more careful in future.


If only you could post animated GIFs, then you could have added one that indicated sarcasm.


You forgot your closing tag :)


You forgot them again.


He only forgot to close it.


You guys are a tough crowd sometimes. </sarcasm>


There is a reason why I have an extension installed in Firefox just for the ability to choose when a gif plays.



The very one.


Technically, it will let you post them as links; it just won't upload and embed them. But a plugin or a userscript would fix that, provided they're posted somewhere with an easy-to-deal-with API like imgur.


I think it's great too, but I shudder to think what it might look like 5 years from now. I can only figure that there will be dead gifs everywhere.


Looks like GitHub hosts most/all of them, so it should be OK.


Everything that's hosted from 'camo.githubusercontent.com' is just proxied and will go away as the source goes away.


I think most of those gifs are hosted on github. I tried to post a linked image, and it only showed the hyperlink. When I uploaded the image to github, it showed up inline.


Github proxies the images in comments: https://github.com/blog/743-sidejack-prevention-phase-3-ssl-...

Here's the code behind it: https://github.com/atmos/camo


No kidding, when did GitHub turn into Reddit/4Chan/9Gag?


Since the start... ;)


That's what I figured as well, and since https://twitter.com/seldaek/status/539773104835555329 the rate of comments went nuts :)

I really felt bad seeing how we missed that improvement for so long, so turning it all into a gif-fest is more productive than self-deprecation!


I love it, but GitHub could suppress them behind a click-to-view (maybe all comments on mobile), because I just lost 10% of my data plan...


Pictures/gifs don't belong in GitHub comments. This is the dumbest thing.


If the Pull Request has to do with a bit of UI/UX, images and animations are appropriate, to demonstrate user interaction with the interface.


Strongly disagree. My team puts images in comments to document visual design changes. Occasionally, we'll even drop in an animated gif to illustrate a workflow/process. These can be extremely helpful in code review.


Warning NSFW Commit detected. "Pedophile 11-year-old girl images from 4Chan"

Stay classy programmers.


Warning: The commit is NSFW.

The commits are embarrassing, stupid, and really expose why developers are considered idiots. Why troll?

Because they are jerks. Period. Grow up, noobs.


gifhub galore :)


What was I looking at again? I forgot because all of these animated GIFs are amazing!


[deleted]


Please try and keep your attitude in check. Your tone is not helping.


[deleted]


> No. Concern trolling and tone policing needs to fuck right the fuck off.

Do you find you have much success motivating people when you use this tone of voice?

> I reported this issue in February. Others have reported it before I did. What is Composer's response? Silence and negligence while its maintainers go to cons and drink beer.

> Their priorities are totally fucking backwards.

I would strongly advise using less inflammatory titles than "Horrendously Stupid and Ill-Advised Install Instructions" if you wish people to not ignore you.

You might not want to do this, and that's fine and your prerogative.

If you do want to see a significant increase in cooperation, dropping the attitude will be a quick and effective way of doing so.


[deleted]


> I attempted to be civil months ago. It got ignored.

You pointed out a valid concern, a few people kicked the tyres of it, and when it turned out that the fix entails a good deal of work - something you've skipped over - you decided to become "rude."

> My goal is to fix the problem

As I said, your current method of trying to get it fixed plainly isn't working.

If you start and end at the same place (i.e. nothing gets done), the only difference being that you've made people want to actively avoid you, is that a good result for you?


[deleted]


> People actively avoiding me means I don't have to deal with people. This is a good thing, yes.

That's all I need to know - good work!


[deleted]


Not any longer :p


Yay! :P

(They have an auto-updater now, and I'm working on making it verify signatures in a future release.)


I think they're doing it on purpose now. They've fixed it for themselves but are hiding said fix from you specifically. The cheeky bastards.


Damn. The clever bastards. Probably switched to Trac/SVN because they know I only bother with Git.


Did you know that Composer is open source?

One of the advantages of open source is that you have the freedom to take the code and fix issues that concern you.

You can Google "open source" for more information - that might be a more productive use of your time than waiting for people who owe you nothing to make changes that you want.


Yes I'm well aware of what open source is and how it works (if you notice I'm pretty active on github); I've already sent a PR to fix part of the problem before.

However, it makes very little sense for them to, for example, merge a PR if it uses my public key for verification and not theirs. The problem I'm running into is that I can't just fix the issue without their cooperation.

Regardless, this is all moot because the Composer maintainer is now communicating with me privately to discuss how to fix the problem so it will be solved some day soon.


Surely you could do the work with your public key and have them swap in theirs once it's ready to go?


That's precisely the avenue I'm pursuing.



