
I work at an unrelated company. Even without new software pushes, and even for stable code, we need maintenance all the time.

A few examples:

- security holes discovered in the OS require an emergency update, which in turn requires lots of testing, because updating the OS has a bad tendency to break even stable software (I encountered that issue a few weeks ago);

- security holes discovered in third-party libraries/frameworks/..., which require updating, re-testing, and re-releasing stuff – and fixing whatever breaks because of undocumented changes in said library/framework/...;

- security holes discovered in your own code, of course, which also require updating, reviewing, re-testing, and re-releasing;

- monitoring your stack for misbehavior, which could indicate a software problem (bug? license expired? SSL certificate expired?), a hardware problem, a resource problem (disk full?), an attack, or, in Twitter's case, one of your clients (the companies that build ads) misusing your tools and attacking you by accident (see the sketch after this list);

- ... and once the misbehavior is detected, actually investigating and fixing the issue.
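On the monitoring point: as a rough illustration, here is a minimal sketch in Python (standard library only) of the kind of checks such a probe might run. The hostname, mount point, and alert thresholds are made-up examples, not anyone's actual configuration:

    # Toy monitoring probe: checks TLS certificate expiry and free disk
    # space. The host, path, and thresholds below are hypothetical.
    import shutil
    import socket
    import ssl
    import time

    def cert_days_remaining(host, port=443):
        # Connect, fetch the peer certificate, and compute days until expiry.
        ctx = ssl.create_default_context()
        with socket.create_connection((host, port), timeout=5) as sock:
            with ctx.wrap_socket(sock, server_hostname=host) as tls:
                cert = tls.getpeercert()
        expiry = ssl.cert_time_to_seconds(cert["notAfter"])
        return (expiry - time.time()) / 86400

    def disk_free_fraction(path="/"):
        # Fraction of the filesystem at `path` that is still free.
        usage = shutil.disk_usage(path)
        return usage.free / usage.total

    if __name__ == "__main__":
        if cert_days_remaining("example.com") < 14:
            print("ALERT: TLS certificate expires in under two weeks")
        if disk_free_fraction("/") < 0.10:
            print("ALERT: less than 10% free disk space")

In a real deployment, checks like these run continuously and page a human when they trip – which is exactly where the staffing question comes in.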

Sure, you can limp along for a while without anybody to handle these cases. But for how long? Keep in mind that Twitter is a high-value target for state-sponsored attackers (among others) all over the world, so any weakness will be probed and exploited mercilessly.




I run an engineering department, so I’m not speaking from ignorance here. Zero-days and bugs making it into prod are a red herring; they aren’t part of the discussion. We’re talking about a production platform outright failing because it’s a sinking ship being held together with duct tape.

If Twitter is failing that easily, these engineers deserve to be laid off.


Well, I can only speak from experience. I've encountered all of the points above just during the last three weeks or so. I've seen critical infrastructure at former companies taken down by an expired SSL certificate, a full hard drive, a power outage, or a DDoS.

Does your engineering department have a solution to all these problems that does not require human beings? Or are we talking about different things?


I would be interested to hear which web apps with 100,000,000+ users would run fine for weeks or months without hiccups if 80% of the people running them left overnight.



