Hacker News

REQ and REP have an evil, evil failure mode if they get out of sync, as I recall. The lockstep of request and reply is rather easy to break.
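A minimal pyzmq sketch of that lockstep, assuming pyzmq is installed (the `inproc://lockstep-demo` endpoint name is made up). A REQ socket that tries to send a second request before reading the reply is rejected with `EFSM`, ZeroMQ's "operation cannot be accomplished in current state" error. The mirror rule applies to REP, which must send a reply before it can receive again.

```python
import zmq


def second_send_is_rejected():
    """Return True if a REQ socket refuses a second send before recv()."""
    ctx = zmq.Context()
    rep = ctx.socket(zmq.REP)
    rep.bind("inproc://lockstep-demo")  # hypothetical endpoint name
    req = ctx.socket(zmq.REQ)
    req.connect("inproc://lockstep-demo")
    try:
        req.send(b"first")
        try:
            req.send(b"second")  # REQ must recv() a reply before sending again
        except zmq.ZMQError as exc:
            return exc.errno == zmq.EFSM
        return False
    finally:
        # LINGER=0 so the queued, never-answered request does not block term()
        req.close(linger=0)
        rep.close(linger=0)
        ctx.term()


print(second_send_is_rejected())  # True
```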



At JacobsParts we experienced this. Every few weeks our central API would just stop responding. We were using a threaded Python ZeroMQ server, and the REP "sockets" would get stuck, even after the client was gone. Enough stuck sockets and all threads were hung. Now we use Python gevent for an asynchronous server, and we use ROUTER/DEALER instead of REP/REQ. The clients can actually use either DEALER or REQ, depending on whether they are asynchronous or threaded. Anyway, the results have been fantastic and stable for a year now. When there are many small requests, the performance benefit of ZeroMQ is just amazing. You can hardly tell the difference between a remote call and a local function call!
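The ROUTER side of a setup like that can be sketched with plain pyzmq. This is not gevent code (pyzmq also ships a gevent-compatible `zmq.green` module for the green-thread version), the handler that upper-cases the payload is hypothetical, and the endpoint name is invented. The point is the envelope: ROUTER prefixes each incoming message with the peer's identity frame and routes a reply by that frame, so the server never enters REP's send/recv lockstep and can answer both REQ and DEALER clients.

```python
import zmq


def handle(frames):
    """Hypothetical handler: echo the routing envelope, upper-case the body."""
    *envelope, body = frames  # envelope = [identity, b""] for REQ-style clients
    return envelope + [body.upper()]


def router_demo():
    ctx = zmq.Context()
    server = ctx.socket(zmq.ROUTER)
    server.bind("inproc://router-demo")  # hypothetical endpoint name
    req_client = ctx.socket(zmq.REQ)
    req_client.connect("inproc://router-demo")
    dealer_client = ctx.socket(zmq.DEALER)
    dealer_client.connect("inproc://router-demo")
    try:
        req_client.send(b"hello")  # REQ adds the empty delimiter frame itself
        # DEALER adds nothing; we add the delimiter by hand to mimic REQ framing
        dealer_client.send_multipart([b"", b"world"])
        for _ in range(2):  # serve both requests, in whatever order they arrive
            server.send_multipart(handle(server.recv_multipart()))
        return req_client.recv(), dealer_client.recv_multipart()
    finally:
        for sock in (req_client, dealer_client, server):
            sock.close(linger=0)
        ctx.term()


print(router_demo())  # (b'HELLO', [b'', b'WORLD'])
```

The REQ client gets just the body back because REQ strips the delimiter, while the DEALER client sees the raw frames, which is what makes DEALER suitable for asynchronous clients that manage their own framing.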


+1 on sticking to DEALER/ROUTER on top of a green-thread system like gevent. This is how zerorpc works [1], and it is the backbone of the dotCloud platform.

Also note that most of the time PUB/SUB and PUSH/PULL are not a good idea either. The same results can usually be achieved by returning a stream on top of ROUTER/DEALER (this is what zerorpc does). The performance gains of custom topologies are great in theory, but in a typical modern web or mobile stack they are not worth the extra effort and lack of flexibility. The single best change we made to dotCloud's architecture was to move away from custom topologies and stick to DEALER/ROUTER.
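A stream over ROUTER/DEALER can be as simple as several reply messages for one request, followed by a terminator. The framing below (an empty frame as end-of-stream) is a hypothetical choice for illustration, not zerorpc's actual wire protocol, and the endpoint name is made up.

```python
import zmq

END = b""  # hypothetical end-of-stream marker; real chunks must not be empty


def serve_stream(router, count):
    """Answer one request with `count` chunks, then the END marker."""
    identity, request = router.recv_multipart()  # DEALER sent a single frame
    for i in range(count):
        router.send_multipart([identity, request + b"-" + str(i).encode()])
    router.send_multipart([identity, END])


def consume_stream(dealer):
    """Collect chunks until the END marker arrives."""
    chunks = []
    while True:
        chunk = dealer.recv()
        if chunk == END:
            return chunks
        chunks.append(chunk)


def stream_demo():
    ctx = zmq.Context()
    router = ctx.socket(zmq.ROUTER)
    router.bind("inproc://stream-demo")  # hypothetical endpoint name
    dealer = ctx.socket(zmq.DEALER)
    dealer.connect("inproc://stream-demo")
    try:
        dealer.send(b"job")
        serve_stream(router, 3)
        return consume_stream(dealer)
    finally:
        dealer.close(linger=0)
        router.close(linger=0)
        ctx.term()


print(stream_demo())  # [b'job-0', b'job-1', b'job-2']
```

Because neither side is bound to a one-request/one-reply state machine, the same socket pair carries both plain RPC and streamed results.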

[1] http://github.com/dotcloud/zerorpc-python


I'm curious, did you have some sort of timeout mechanic for your REP sockets? I don't, but I've never had problems either because I don't have that much traffic.


AFAIK ZeroMQ doesn't support any kind of timeout on REP sockets. It could be hacked in with signals or a watchdog thread in a multiprocessing setup, but that's ugly. If you're going to all that trouble it seems much cleaner to just move to DEALER/ROUTER.


I do like the way REQ/REP keeps track of the requests and has RPC semantics, though. Instead of blocking, you could always poll and time out instead.
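Polling instead of blocking looks roughly like this in pyzmq (a sketch; the 100 ms timeout and endpoint name are arbitrary). `Socket.poll` returns a nonzero event mask only if a reply became readable within the timeout. The catch is that after a timeout the REQ socket is still waiting for that reply, so it cannot simply send the next request; that is the problem the Lazy Pirate pattern deals with.

```python
import zmq


def request_with_timeout(sock, payload, timeout_ms):
    """Send on a REQ socket; return the reply, or None if none arrives in time."""
    sock.send(payload)
    if sock.poll(timeout_ms, zmq.POLLIN):
        return sock.recv()
    return None  # sock is still in the "awaiting reply" state


def timeout_demo():
    ctx = zmq.Context()
    silent = ctx.socket(zmq.REP)  # a peer that never replies
    silent.bind("inproc://timeout-demo")  # hypothetical endpoint name
    req = ctx.socket(zmq.REQ)
    req.connect("inproc://timeout-demo")
    try:
        return request_with_timeout(req, b"ping", timeout_ms=100)
    finally:
        req.close(linger=0)
        silent.close(linger=0)
        ctx.term()


print(timeout_demo())  # None
```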


The "lazy pirate pattern" was added to the Guide a while back. It explains how to deal with this robustly.

http://zguide.zeromq.org/page:all#Client-Side-Reliability-La...
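The core of the Lazy Pirate idea, in a minimal pyzmq form: poll with a timeout, and if no reply arrives, close the REQ socket with LINGER=0 so the pending request is discarded, open a fresh socket, and resend, giving up after a fixed number of attempts. The function name, defaults and retry policy are assumptions for illustration; the Guide's client differs in its details.

```python
import zmq


def lazy_pirate_request(ctx, endpoint, payload, timeout_ms=250, retries=3):
    """Send payload over REQ, retrying on a fresh socket after each timeout.

    Returns the reply, or None once `retries` attempts have all timed out.
    """
    for _ in range(retries):
        sock = ctx.socket(zmq.REQ)
        sock.connect(endpoint)
        try:
            sock.send(payload)
            if sock.poll(timeout_ms, zmq.POLLIN):
                return sock.recv()
        finally:
            # Discarding the socket resets REQ's send/recv state machine;
            # LINGER=0 drops the unanswered request instead of blocking.
            sock.close(linger=0)
    return None
```

Against a peer that never answers (for example a bound REP socket nobody reads from), the call returns None after roughly `retries * timeout_ms` milliseconds instead of hanging forever.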


Are you referring to the need to use a timeout? http://lucumr.pocoo.org/2012/6/26/disconnects-are-good-for-y...



