Hacker News
How to Make Async Requests in PHP (segment.io)
34 points by calvinfo on Feb 4, 2013 | 14 comments



Sorry if I am misunderstanding, but why not use Gearman (http://gearman.org/)? We wrote a blog post on how we use Gearman for tracking API requests to Mixpanel (http://blog.nodesocket.com/how-we-tracking-api-requests-with...).

Or use React (http://reactphp.org/), a PHP async library.


Gearman is pretty solid once you have it working, but it can be a huge pain to get to that point. We spent probably two years (on-and-off, of course) tweaking code to get our framework talking nicely to Gearman. PHP's gearman-manager library is a little... wonky, and we were seeing no shortage of bizarre APC interference.

Although in this case, I imagine the real problem is portability. Curl is available pretty much everywhere. exec (or more directly, the pcntl/posix extensions) isn't in any out-of-the-box installation, and anything that needs to be further daemonized to get up and running (such as gearmand and gearman-manager) is even harder to use in a one-click solution.


First off, great article! If you can get away with it, using Gearman is definitely a more robust solution. We'll try and see if we can write adapters for dedicated job queues like it in the future.

We're mainly trying to make the setup process really simple - no need to start up extra job servers or worker computers. Ideally they can add in our small bits of code and then start tracking analytics data without much configuration.


Thanks. Gearman works out lovely for background tasks like emails, tracking, anything you want to fire and forget. Makes sense though that you can't have the dependency of Gearman for your public library.

Are you guys in San Francisco?


Yeah, we are! (calvin|friends)@segment.io if you want to get in touch.


Forking seems like a very expensive solution to the problem, especially in PHP, where exec() doesn't do what exec does in Unix in general: unlike the plain exec you'd call after a fork, PHP's exec() still runs a shell.

If you use their API even just one time per request, be aware that they will fork, execute a shell, fork again and then execute curl. Imagine where this is going when you use multiple API calls per request.

As this is "just" about analytics, why not use a UDP packet or two? Sure, they might not get delivered, but is that really so bad for analytics? Sending out a UDP packet is very fast and there will be no waiting going on at all.
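
For illustration, here is a rough sketch of the fire-and-forget UDP idea. It is in Python rather than PHP, and the function name, the JSON payload shape, and the default host and port are all assumptions made up for this example, not anything Segment.io's library actually does.

```python
import json
import socket


def send_event_udp(event, host="127.0.0.1", port=8125):
    """Send one analytics event as a single UDP datagram.

    Hypothetical helper: the JSON encoding and the default host/port
    are assumptions for this sketch. Returns the number of bytes sent.
    """
    payload = json.dumps(event).encode("utf-8")
    # SOCK_DGRAM means no handshake and no acknowledgement: sendto()
    # hands the datagram to the kernel and returns without waiting
    # for the receiver, so a lost packet is simply lost.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(payload, (host, port))
```

The caller never learns whether the datagram arrived, which is the trade-off described above: no waiting, but no delivery guarantee either.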


Author here. You're right that if you end up dealing with many requests per second, it will end up being resource intensive. In that case, the file-based approach or even a dedicated queue is a better option.

We actually queue calls to our API and send only a single request to our servers. Even if you make multiple API calls over the course of a request, there will still be one fork per request.
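
As a rough Python sketch of that batching pattern (not the library's actual code; `EventQueue`, `track`, `flush`, and the event shape are hypothetical names chosen for this example), calls are collected in memory and handed to the sender once:

```python
class EventQueue:
    """Collects tracking calls in memory and sends them as one batch.

    Hypothetical sketch: `send` is any callable that takes a list of
    events, e.g. something that forks curl or writes to a socket.
    """

    def __init__(self, send):
        self._send = send
        self._events = []

    def track(self, name, properties=None):
        # Queue only; nothing leaves the process here.
        self._events.append({"event": name, "properties": properties or {}})

    def flush(self):
        """Send all queued events in a single call; return how many were sent."""
        if not self._events:
            return 0
        batch, self._events = self._events, []
        self._send(batch)
        return len(batch)
```

Calling `flush()` once at the end of the web request means the expensive step (a fork, in the case discussed here) happens at most once, no matter how many `track()` calls were made.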

I'd actually wanted to include UDP, PHP extensions, and persistent sockets as part of the article, but they just ended up being a little out of scope. We might try and support them in the future. UDP is definitely an interesting idea for analytics applications.

We may also end up writing a custom in-memory queue which uses a persistent socket. For a vanilla PHP install, these seemed like the best available options.


UDP is kinda the only game in town with in-app analytics (if you don't want to drop ZeroMQ all over the place); Metricfire's PHP (and other) client libraries have been on this for a while now - http://docs.metricfire.com/


Yes, UDP is a good solution for analytics. StatsD uses UDP.
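
For reference, a StatsD counter is just a short text line of the form `name:value|c`, optionally followed by `|@rate` for sampling, sent as a UDP datagram. A minimal Python sketch of building that line (the function name and defaults are assumptions for this example):

```python
def statsd_counter(name, value=1, sample_rate=1.0):
    """Build the payload for a StatsD counter metric.

    Hypothetical helper: formats `name:value|c`, and appends
    `|@rate` only when the metric is sampled.
    """
    line = f"{name}:{value}|c"
    if sample_rate < 1.0:
        line += f"|@{sample_rate}"
    return line.encode("ascii")
```

The resulting bytes would then be passed to a UDP `sendto()` aimed at the StatsD daemon.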


DNS resolution also blocks, and PHP doesn't appear to have a plugin for async DNS resolution.


If you have a local DNS cache or at least a powerful DNS server close to you, this should really not be that much of a problem as the DNS server would be able to respond quickly from its local cache.

I doubt you'd lose much more than 20ms. That's still way better than forking tons and tons of curl processes and shells.


Happy to see you got PHP running. I imagine the next step is a WordPress plugin to handle it automatically.

I suspect that you'll frequently need to fall back to the socket approach. I imagine a lot of shared hosts block the capability to run exec (GoDaddy, HostGator, other common PHP hosts), and I believe that's where a lot of PHP lives.


True that, it's actually already built: https://segment.io/plugins/wordpress

We decided to inject JavaScript calls and use analytics.js's client-side tracking, because we think it's a bit friendlier for WordPress users, and it keeps their personal blogs and smaller projects on the free plan.


curl_multi???!!!



