
Well yes, you still need monitoring to be able to tell that the failure is external, but the key idea is that each team can do it separately, without having to get everyone's buy-in or instrument every call in the system.

For example, we had a batch processing system with pretty relaxed latency requirements, and at some point we were asked to integrate with (internal) service X. The problem was that service X would go down periodically. The solution was pretty simple: a centralized error logging service we already had, some asserts on results, and timeouts on all HTTP calls. This works very well, for us at least. Service X still goes down every once in a while, but we can always detect that and explain to our (internal) customers that it is not our fault the system is down. Our customers were the ones who selected service X in the first place, so they are pretty understanding.
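
A minimal sketch of that pattern in Python (the endpoint and names are hypothetical, and the standard logging module stands in for our centralized error logging service, but the shape is the same):

    import logging

    import requests  # any HTTP client with timeout support works here

    log = logging.getLogger("batch")  # stand-in for the shared error logger

    SERVICE_X_URL = "https://service-x.internal/api/items"  # hypothetical endpoint

    def fetch_items(batch_id: str):
        """Call service X with a hard timeout; log and return None on any failure."""
        try:
            resp = requests.get(
                SERVICE_X_URL,
                params={"batch": batch_id},
                timeout=10,  # never let a hung call stall the whole batch
            )
            resp.raise_for_status()
            items = resp.json()
            # Sanity-check the payload so silent corruption gets logged
            # instead of propagating into downstream processing.
            assert isinstance(items, list), f"unexpected payload: {items!r}"
            return items
        except (requests.RequestException, AssertionError) as exc:
            # This log entry is what lets us say "service X was down, not us".
            log.error("service X call failed for batch %s: %s", batch_id, exc)
            return None

The point is that every failure mode of service X, whether a hang, an HTTP error, or a malformed response, ends up as a timestamped entry attributed to service X, with no cooperation from the other team required.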

Is it a desirable situation to be in? Nope. In the ideal case, someone would go to the team behind service X and help them make service X reliable, with proactive monitoring, good practices, more staffing, etc. But I work in a big org, and each team has its own budget, management and priorities. So the microservices approach is the best we can do to still get the work done under these conditions.



