Errors for the Twilio Rest API impacting multiple services (twilio.com)
261 points by IG_Semmelweiss on Feb 17, 2023 | 161 comments



The old "oops, laid off the wrong people" gag. Never gets old!

I say that jokingly, but wow does it suck. I've been on both sides. I think it's actually worse to not be laid off and be left picking up the pieces for incompetent management. One time I had to get admin rights and log into a "shadow IT" developer's machine (he had been laid off) to get his Eclipse workspace directory, recover the code, and decipher what state his 5 apps were in. Good times.


Not getting laid off, being handed someone else’s job on top of yours, and being told “just be glad you’re still here!” is terrible.

Sometimes getting laid off from a company that’s floundering (with personal finances in order enough to be able to handle it) can be a blessing.


I've been in a couple of companies that went under after serial layoffs. (For example, when I started at Atari in 1982 there were perhaps 4,000 employees, and when I resigned about five years later there were fewer than 200.)

Towards the end you start to wonder who the lucky ones really are. Not kidding.


I worked for a company for 3 years after half my team was laid off. In hindsight I clearly should've quit, but yeah the folks laid off and collecting that severance were definitely the lucky ones.


And those laid off last, get the least severance (if any).


Yeah it's all fun and games until you discover someone had automated a thing using a cron job on their own computer.


I loved when dependencies were manual jobs on local machines that some recently fired dev had been running for 3 years but IT had already wiped their machine.

Deserved to be fired for that one but man did it suck to deal with.


No they didn’t!

I worked at two orgs, and did worse or better.

One had a crappy backup policy, I replicated the prod database to my workstation and took snapshots twice a day. That saved us TWICE!

At the other, and I believe I still have an account running on their systems, they wanted to force devs to use Python 2.4 because that was what the base system included. I built my own Python plus deps and rsynced it to all the prod machines.

When I left, they disabled my account, then re-enabled it a couple of hours later. It was reportedly still running years later.


Or some critical service is running under someone's personal credentials.


I know it likely depends on the company, but in general, does severance usually have a stipulation that if the company needs something from you in the next few months, you have to cooperate? If not, I'd either try to extract a lot of money to consult or depending on my personal situation tell them I'm not interested in helping; perhaps not so politely.


I wouldn't think so. What generally happens is they'll give a couple months to the exit date for handover with severance dependent on you making it to that date.


Does not. Probably not legal in the US, but IANAL.


Actually, given how companies manage the WARN Act, I'd wager they do. Often they keep an employee on the books for the 90 days, just with no responsibilities. I would presume changing the responsibilities would be no different than being reassigned to a different team.


Yeah, but they have lost all leverage. If the employee refuses to assist, it's not like the company can fire them or withdraw the severance. No sticks left, only carrots, for which a very big carrot would be appropriate.


You're assuming this wasn't done maliciously


The whole thing might have been held together by chewing gum and they fired the one guy who re-applies it every day.


Most things are.


occam's razor suggests that


Eng 1: Yeah we got backups, hit up Bob

Eng 2: Bob left in December. Who replaced Bob?

Eng 1: I don't know, his boss' email is coming back as invalid in outlook.

Eng 2: Fk it I'll just do it myself.

1 hour later

Eng 2: So backups stopped when Bob left in December. He was doing them manually due to an error in the automation. I'll try to restore from December.

5 hours later

Eng 2: backups are corrupt and won't work. Looks like we've never restored from backup to production

--------

Nation State Actor: WTB: Twilio access

Recently Laid off Employee: I like money.


> Looks like we've never restored from backup to production

Do you often restore backups to production for testing purposes?


No wonder I don't receive any spammy robocalls or text messages today.


Finally something good to come out of the layoffs.


^ This. Twilio and Sendgrid are outright criminal in the way they handle their own fraudulent customers. Twilio makes money on every single one of those bogus communications and does nothing to proactively prevent that fraud because it would impact the bottom line. When your business is based on mostly fraud you need to find a new business.


Considering how many hoops I have to jump through to use their services (under the guise of anti-fraud/spam) I'd love to know how they get away with it


I think the scammers plan to get kicked off, they get one good campaign out and then get kicked off. Then show up as a new customer with a new name and new payment info, but upload the same list(s) and start spamming again, over and over in perpetuity.


Buying the Twilio account of a failed startup


My guess is extra off-the-books money.


We need more outages right before elections


Not to be pedantic, but if they executed an 11% layoff and later executed a 17% layoff of the remaining staff, that does NOT add up to 28%. That math is incorrect. Please fix the title.


That would be a 26.13% layoff.

The first layoff left you with 89%. Of that 89%, you kept 83%. 0.89 * 0.83 = 0.7387.

1 - 0.7387 is how much you lost = 0.2613 = 26.13%.

It's probably clearer to reason about if you have two consecutive 50% layoffs. Clearly you didn't have a 100% layoff. Your workforce got cut in half twice, leaving you with 25% of what you had originally (or a 75% layoff).
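The same compounding as a minimal Python sketch, assuming each round is a percentage of the headcount remaining at that time rather than of the original headcount (the helper name `cumulative_layoff` is just illustrative):

```python
def cumulative_layoff(*rounds):
    """Total fraction laid off when each round is a fraction of the
    headcount remaining at that time (not of the original headcount)."""
    remaining = 1.0
    for r in rounds:
        remaining *= 1.0 - r  # each round keeps (1 - r) of whoever is left
    return 1.0 - remaining


if __name__ == "__main__":
    print(f"{cumulative_layoff(0.11, 0.17):.2%}")  # 26.13%
    print(f"{cumulative_layoff(0.50, 0.50):.2%}")  # 75.00%
```

Simply adding the rounds (11 + 17 = 28) only works if both percentages were measured against the same original headcount.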


To be pedantic, I think the phrase you want is: to be pedantic.


To be exceedingly pedantic, I think GP is using the phrase “not to be pedantic” as a magical incantation intended to make the pedantry following it not pedantic, the same way people say “no offense” before saying something offensive, or “not financial advice” on YouTube videos that give financial advice.


It’s more so that I don’t view being correct with math as being pedantic but I understand it can be perceived that way in this case. In other words, my intention isn’t to be pedantic.


Or putting on hazard lights means "I'm allowed to park illegally here".


If that's not what that means, why did BMW put these hazard lights on my car?


Oh sure, it is meant to increase visibility when your car is stalled in the middle of the road. I meant to refer to scenarios like being illegally double parked on a side street, or in a handicap spot.


Yes precisely, it seems you are implying the yellow lights aren't primarily for allowing me to double park as I please?

Why else would BMW spend time and effort adding yellow lights to each corner of the vehicle?


Also why the high beams are there - to let people know you are behind them!


Yes, but an example of terrible BMW penny pinching: not hooking them up to a relay so that the high beams rapidly flash themselves.


That costs an additional $9.99 a month. You know that.


At least that one is honest: you're being a hazard.


The percentage has no direct relevance to the linked content, and doesn't belong in the title at all, IMO.


if it's only off by 2%, it's a bit pedantic :D


It's not about being pedantic: if they got high school math wrong, what else did they get wrong?


that's still pedantic. not everyone is a math person


2%, or 2 percentage points? ;)


To be pedantic, the original 11% layoff was a lot longer than 6 hours ago, which the original headline noted. So the correct percentage is 17%.


Unless both percentages refer to the same baseline, e.g. payroll as of 2022-01-01 (no idea if that's the case here).


Being correct with mathematics isn't pedantic, it's just good sense.


And compulsively annotating where other people are being incorrect with mathematics isn't good sense, it's just being pedantic


I hemmed and hawed on whether or not I'd consider this pedantry, and you're probably right - it probably is.


26.1%


This is the best I could find:

“According to Twilio’s latest earnings release, the company had 8,992 employees as of September 30, 2022 and expected to lay off 816 employees for the 2022 round of layoffs. Based on these figures, around 1,400 people will be impacted by this year’s layoffs.”

So that’s 2216 total out of 8992, 24.6%

https://techcrunch.com/2023/02/13/twilio-cuts-17-of-its-work...
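The headcount arithmetic from the quoted TechCrunch figures, as a short Python sketch. It assumes "around 1,400" is exact and that there were no hires or attrition between rounds, neither of which the article confirms:

```python
# Figures as quoted from the TechCrunch article above ("around 1,400" taken as exact).
headcount_sept_2022 = 8992
laid_off_2022 = 816
laid_off_2023 = 1400

total = laid_off_2022 + laid_off_2023
share = total / headcount_sept_2022  # share of the Sept 2022 headcount

# Second round measured against the headcount left after the first round.
second_round_share = laid_off_2023 / (headcount_sept_2022 - laid_off_2022)

print(total, f"{share:.1%}", f"{second_round_share:.1%}")  # 2216 24.6% 17.1%
```

Measured against the post-September headcount, the second round comes out near the reported 17%, so the gap from the 26.13% compounding figure presumably comes from the rounds' percentages being quoted against different baselines.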


26.13%


not to be pedantic but there likely was additional attrition and hires in the interim, without exact head counts there's little way to properly state a single number so why pick on it?


I worked with a guy once that I wasn't a fan of. Gruff, incompetent, egotistical. Just a general ass.

He got laid off two weeks before we were set to do a huge annual update. I stormed into the interim CTO's office (the CTO got laid off too) and explained that laying this guy off was the dumbest idea ever and that it had put the entire project in jeopardy and no one had the knowledge or access to systems that this guy had, and that laying him off was going to cost the company millions of dollars because the release was going to fail.

I don't know what wound up happening at that release. Two weeks later I was working elsewhere; a few years later that company completely evaporated.


Often the grumpy people are that way because their lives are filled with fixing important problems that no one else can be bothered to fix.


or they're undervalued because their systems never hiccup, so they never get to be seen as the hero. They get laid off because "They work on this stuff that is not high priority" ...


He was grumpy because he was vastly overpaid, but snubbed for the CTO role, and knew his job was on the line.


If one guy leaving costs your company millions of dollars, you were screwed to begin with. What if he quit, or even just got sick for a few weeks?


Yep. Better fire him now, before it's too late. /s


Very cool. I already started moving all my 2FA stuff out of Authy but guess it needs to be hastened now.


IMO it's good practice to not have one's vital 2FA codes held hostage by a service where one's accounts or IP address can be flagged for spam or abusive behavior by automated systems. (This also applies to Google Authenticator, for what it's worth!) Especially in a world where customer service teams are being trimmed wherever possible!

I use 1Password - its UI leaves some things to be desired, and it's not cheap, but it has zero incentive to cancel the account of any paying customer!


Google Authenticator is completely offline isn't it? How are accounts / 2FA material at risk of lockout by using Google Authenticator?


I switched out of Google Authenticator when they updated the app and all my 2FA just went away.

They did fix it with another update, but that was a seriously un-fun few days. Luckily I was just logged in to my AWS account so I could disable the 2FA.


It is, and because of that people regularly lose their vault when switching phones and forgetting to transfer the data.


What alternative are you using?


I migrated to Aegis a few months ago and would recommend it.

Exporting the configuration was a bit tricky, but I found a guide on GitHub: https://gist.github.com/gboudreau/94bb0c11a6209c82418d01a59d...


Passkeys if the site supports them as a secure authenticator for 2FA/MFA (should pop up in your device if you try to setup a secure hardware authenticator on iphones and android) or TOTPs stored securely somewhere that isn't vendor locked.


I use https://github.com/tadfisher/pass-otp with pass, which has a FOSS client for desktop and smartphone (at least for Android; no idea about iOS).


It's built in to passforios

https://github.com/mssun/passforios


iCloud Keychain


Bitwarden


I don't really like the idea of TOTP and passwords in the same place. If it gets compromised someone has both your password and TOTP code.


I use Bitwarden for passwords only. I use Authy for the TOTP, although I should use something else.

I backup my TOTP seeds in KeepassXC. I also have an offline backup of my Bitwarden vault that is in a separate KeePassXC vault. I agree that I don't like the idea of one vault holding both bits of info.


Yep I agree so I use 1Password only for passwords and Bitwarden only for TOTP.


Is there an easy way to import from authy to bw? Or do I have to do one at a time.


The officially supported method is to login to each 2FA account and delete the old 2FA and generate a new one that you make sure to record separately. Authy really locks you in.

There are some unsupported methods a Google search away. I have had luck with this one in the past but haven't used in over a year, so can't vouch for it still working.

https://gist.github.com/gboudreau/94bb0c11a6209c82418d01a59d...


Sorry I'm not sure.


I had no idea Authy was owned by Twilio. Well, I know what I'm doing this weekend.


how does this impact Authy?


Twilio owns Authy but regardless it doesn't. Authy is mostly used client-side with a server-side backup option.


ah gotcha, i didn't realize that


Authy appears to be up, but I guess if it were down, the only issue would be trying to setup a new device and have it sync over the 2FA codes? My understanding is after that initial sync, the TOTP it displays should all work offline.
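Why the codes keep working offline: a TOTP code is derived only from a shared seed and the current time. Below is a minimal Python sketch of the RFC 6238 algorithm with its common defaults (HMAC-SHA1, 30-second step, 6 digits). It illustrates the standard, not Authy's actual implementation, and the function names are hypothetical:

```python
import base64
import hashlib
import hmac
import struct
import time
from typing import Optional


def secret_from_base32(b32: str) -> bytes:
    """Decode a base32 seed as typically shown during 2FA setup."""
    cleaned = b32.replace(" ", "").upper()
    cleaned += "=" * (-len(cleaned) % 8)  # b32decode requires padding
    return base64.b32decode(cleaned)


def totp(secret: bytes, at: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """RFC 6238 TOTP with HMAC-SHA1. Needs only the seed and a clock, no network."""
    now = time.time() if at is None else at
    counter = int(now // step)  # number of time steps since the Unix epoch
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # RFC 4226 dynamic truncation
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


if __name__ == "__main__":
    # RFC 6238 test seed (ASCII "12345678901234567890"), 8 digits at T=59.
    print(totp(b"12345678901234567890", at=59, digits=8))  # 94287082
```

Once the seed is on the device, nothing in this computation touches the vendor's servers; the sync backend only matters when moving seeds to a new device.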


Authy is owned by Twilio.


Twilio owns Authy. I guess they’re assuming future issues there or with syncing.



Did you come up with the headline “Twilio Flex outage now 6 hours after laying off ~26% staff”? If so, could you provide any factual insights that back the implicit claim how this particular outage is the direct cause of the recent layoffs?

Edit/Update: the title of the submission got changed.


Honestly not looking forward to a return to understaffed tech teams fighting on-call fires 70% of the time. I'm not entirely certain how much of the recent stability in larger firms was simply a result of having more people.


I also wonder if the service industry labor shortage was somehow related to the tech hiring boom, if indirectly.


The title is heavily editorialized and is essentially clickbait. It's fine to speculate about things, but at first glance, the title looks like someone wrote a piece that might try to tie the outage to actual facts about employees who departed and how they might be relevant to the outage.

This post is nothing of the sort, and it seems unlikely to provoke thoughtful discussion.

Edit: title has been updated since I wrote this comment.


The context about layoffs is important! Users depend on not just reliable services but predictable management.

Here’s a related HN post describing a hiring effort during the layoffs https://news.ycombinator.com/item?id=34804077

“For example, Twilio 2 days ago announced they were laying off 1000+ people, and here they are opening a staff software engineering role a day later.”

That’s just more evidence of significant incompetence at the leadership level that needs to be weighed when considering a service offering.


Inflating the layoff percentage doesn't serve anything except to sensationalize an already editorialized headline. The number is 17%, not 28%.


the article states they laid off 11% , less than 5 months ago.

That's 26.1%

Anecdata, but in several years of being a Twilio customer, I have never seen such a long outage.

EDIT: fixed % of %.


>the article states they laid off 11% , less than 5 months ago.

Unless the 11% and 17% are both talking about the original number of employees, the percentages don't simply add up like that. If it was 11% of the original number of staff then 17% of the leftovers, it'd be 26%:

.11+.17*.89=0.26

Still, that's not a huge difference so I don't know why people are getting up in arms over it.


Sorry, but you cannot just add percentages together. If you lay off 11% from X and then lay off 17% from what you have remaining, the total amount laid off is not 28%. It does not quite work like that.


The original headline said 6 hours ago. 5 months ago is quite a bit longer than 6 hours ago, last I checked.


The 6 hours was referring to the length of the outage, not when the layoff was. Even the most recent layoff was 2 days ago, not 6 hours ago.


If you ate 75% of a pie, then ate 75% of that remaining piece of pie, how much pie would you have left?


0% of the pie would be left. If I'm that far into a pie I'm not leaving a small sliver.


I'd have a tummy ache. :(


That's only 28% if the 17% was calculated as a percentage of their pre-first-layoff headcount.


Well, it's 26%, but close enough.


Maybe they hired some more people between the two layoffs.


11% were laid off in Sept.


So is that total based on staff numbers from last year, or is the new round's layoff percentage based on current staffing?

It seems we are at the stage of layoffs where we can argue about how you measure the percentage sacked..


The original headline said 6 hours ago. September wasn't 6 hours ago last I checked.


General layoffs are just never a good idea unless as the very last resort. I've personally seen the demoralizing effect it creates, triggering more people to leave the same year. For highly technical teams, you can never really replace the institutional knowledge you're losing.


I thought the term is “tribal knowledge”, and yes, it will cost companies A LOT if they fire the wrong ppl.

Often times you have 2-3 ppl out of 6-8 that carry the cart while 3-4 are juniors or diversity hires.

Firing those 3 will have devastating consequences; firing the other 4 won't make a dent.

Firing based on excel is the most stupid thing you can do as a manager.


I think "institutional knowledge" is more common.


My old employer used twilio extensively for allowing the employees of customers to do automated clockouts/clockins. They seemed like a dumpster fire of a company to work with but that they had telephony figured out made them hard to dislodge from the market.



Clearly they should layoff the people responsible for the previous layoffs. They do not seem to know what they are doing.


"Those responsible for sacking the people who have just been sacked, have been sacked."



(insert spiderman pointing at spiderman meme)


In retrospect, could layoffs be less drastic if the company moved to a lower cost of living area?

Vs placing an adamant bet on paying SF rent, SF salaries and urging others to do the same? https://www.sfgate.com/local/article/Twilio-CEO-wants-tech-c...


One credit due to Twilio: it is very friendly to remote work, except perhaps for the pay tiers. But everyone's doing that. They never mandated a return to office, so that's in their favor too.

IIRC the context of the article you're referring to is more about Tesla and Oracle moving out of California.


It says 17% in the article. This is obviously a massive number, but just wondering where the 28% number is coming from? https://www.cnbc.com/2023/02/13/twilio-layoffs-1500-employee...


It's from someone doing incorrect percentage math. Twilio laid off 11% in September, then another 17% this week. You can't just add percentages; the actual total percentage from pre-September levels is around 26%.

Regardless, though, it seems to me a bit disingenuous to suggest that an outage now is correlated with a layoff from 5 months ago, let alone causal. Even with the more-recent layoff, Twilio is still well above its 2020 employee count, which should be more than sufficient to keep things running (and then some).

(Disclosure: former long-tenured Twilio employee; resigned a year ago.)


It’s unknowable to an outsider, and I agree the claim that the outage is related to the layoffs should not be made.

That being said, a 5-month delay between critical people being laid off and major issues arising is not at all surprising.

Most systems are designed to run on their own (developers like to be able to go on holiday). The real problems arise when those systems are changed, and/or upgraded.


Given that we're talking about Flex, I guarantee you that those systems are changed and upgraded way way way more often than every five months. (I would be surprised if they went an entire month without any significant deployments, let alone five.)

Regardless, I'm glad the title has been changed, as it was just unnecessary editorializing.


Well I think the issue is changes compounding over time.

Any engineer at the company can add a small extra feature here and there to a system where the expert is gone.

But the issue is when that expert was maintaining a vision for their part of the platform, and keeping all of the smaller changes in line with the proper architecture for that system (knowing why we don’t do x, etc.). The expert is able to push back against features that may cause problems, or suggest better ways to reach the goal. Laying off that expert means the things they were protecting are no longer protected. So a newcomer comes in and adds some change that creates an n+1 query, which might not be a big deal. But then they change some other functionality later that makes it n^2+1. Because they don't understand the system, their changes compound over time and eventually cause major issues.

But small changes are fine. That’s why it takes so long for quality to suffer when you remove the experts.

And you don’t always know who the experts are either, especially as company leadership. It’s not always the person with a big title. Managers may know better, depending on how technical they are.
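To make the "n+1 query" point above concrete, here is a minimal Python sketch using a hypothetical in-memory `FakeDB` that only counts calls (it is not any real ORM or database API). Loading n orders and then looking up each order's customer one at a time costs n+1 queries; a batched lookup costs 2:

```python
class FakeDB:
    """Hypothetical stand-in for a database; it only counts queries."""

    def __init__(self, customers):
        self.customers = customers  # customer_id -> name
        self.queries = 0

    def fetch_orders(self, n):
        self.queries += 1
        # Each order references a customer id.
        return [{"id": i, "customer_id": i % len(self.customers)} for i in range(n)]

    def fetch_customer(self, customer_id):
        self.queries += 1
        return self.customers[customer_id]

    def fetch_customers(self, ids):
        self.queries += 1
        return {cid: self.customers[cid] for cid in ids}


def names_n_plus_one(db, n):
    orders = db.fetch_orders(n)  # 1 query
    return [db.fetch_customer(o["customer_id"]) for o in orders]  # n more queries


def names_batched(db, n):
    orders = db.fetch_orders(n)  # 1 query
    lookup = db.fetch_customers({o["customer_id"] for o in orders})  # 1 query
    return [lookup[o["customer_id"]] for o in orders]


if __name__ == "__main__":
    customers = {0: "alice", 1: "bob", 2: "carol"}
    for fn in (names_n_plus_one, names_batched):
        db = FakeDB(customers)
        fn(db, 100)
        print(fn.__name__, db.queries)  # 101, then 2
```

Both functions return the same data, which is why the slow version tends to slip through review when nobody who knows the system is around to catch it.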


> The announcement came after the company already laid off around 11% of its workforce as part of a restructuring plan in September.


If you lay off 11% and then lay off 17% of what's left, that's only a total layoff of 26% of the original number, but yeah.


Regardless, the editorialized headline is inaccurate. If 11% happened months ago, that should not be included with the "6 hours after" bit.


Ha ha, found the nerd.

(I was thinking the same though.)


Have you found it hard to find nerds on HN before this?


I believe it's the latest round plus the previous from a few months ago?


They did two layoffs in a row, one of 20-ish first and then another one, so that must be the total number I imagine.


They fired 11% late last year already


probably just adding up the 11% layoff last September (although the percentages may be based on different numbers)


I remember a practice where people would be taken from a team and sent to another for a while. During that time no contact with the original team was supposed to happen. It was supposed to help deal with some of the bus-factor kinds of problems. Does anyone still do that?

Harder when so many are let go though.


Sounds like some kind of "forced" rotation program, although the no-contact thing is kind of weird. Rotation programs can be useful for roles that aren't super-specialized, and also motivates people to write better documentation and automation tools.


The no contact part was to force the rest of the team to deal with whatever came up, just as they would if the person had left or was on holiday.


During the pandemic the real economy learned that lean supply chains are more efficient but also present a massive liability. That they needed to accept slightly higher costs in exchange for greater robustness.

And then the entire tech world promptly ignored that lesson the very next year.


Does this mean that the staff laid off were doing a good job or a bad job?

Building a product that falls over within hours of reducing the headcount doesn't suggest they did a very good job, but they were keeping a fragile system up all this time ...


It could be literally anything from a failed health check and failed auto mitigation, to a routine failure that no one actually knows who to contact about anymore, to sabotage by a laid off employee.

It’s not really worth speculating about the fired employees' performance, but there’s probably internal chaos after yet another layoff round.


They were good at making themselves indispensable, until they were dispensed.


That depends on the reason. For example, did someone cripple a system on their way out the door to cause the outage? Is this some retaliation? If so it doesn't speak to the quality of what was built. I only use this example to speak to the wide array of reasons that could cause this. Without knowing the reason it's hard to assess anything.


Any large service consisting only of integrating 3rd party providers with 3rd party consumers is fragile.


Alternative hypothesis: someone had a backdoor and didn't take their layoff well.


Alternative alternative hypothesis: someone got laid off and told to work out the rest of their shift, and used their front-door access to take a massive dump in the boss's desk drawer.


This is just pure speculation - we don't actually know why it went down.


International company, 500 staff in three countries, $30 million last year. I'm the only IT person, and I have secure servers in my garage.


I'm glad I migrated our 2FA from Twilio to Vonage some weeks ago after an awful experience with toll fraud (more info here: https://yousefamar.com/memo/log/2023-01-29-00-05-40/)


They fired the one guy who does the nightly reboots that nobody knows about but that are critical for freeing resources, which nobody else bothered to figure out how to do.


This is way worse than the Twitter layoffs. Twitter engineering built a system that was robust to a 50% layoff. Twilio -- not so much.


Was it really robust though? Twitter has been plagued by random outages or service degradations since the layoffs. Not all of those can be attributed to changes in code or deployments either.

If you're solely judging that based off complete downtime, I guess, but I think it's misleading to claim there's been no impact from those layoffs that affected service availability.


Most of Twilio is working. The status page shows most things in green. The same could be said of Twitter. However, in both cases, there have been serious issues. The difference is Twilio is paid to keep these features running, whereas with Twitter, less so. Even then, people were losing their jobs at Twitter just a few days ago because they couldn't explain why their CEO's tweets weren't popular. There have been numerous reports of Twitter failing to properly handle CSAM despite claims by their CEO. There were reports of massive issues with getting service for advertisers.

Robust? No. I mean, if the only thing you look at is your feed, maybe? But that's not Twitter.


> robust to a 50% layoff

That's an interesting metric worth considering.


It really is, but it’s also a little crazy that a company could lose 50% of its staff and keep working.


But companies like twitter, until it was taken private at least, heavily depend on growth and adapting to emerging trends. You might keep the platform 'running' as is, but not more than that. I don't disagree that companies have people not pulling their weight but when there are large layoffs, the future of a company is, as the kids say, 'sus'. (Did I use that right?)


Did Twitter heavily depend on growth? They seemed to flat line for an awfully long time.


I guess not growth, but the potential for growth - as an investor, it's what I look for. If growth is already happening, then it's likely already priced in. If growth is not expected to happen, then unless there's a generous dividend, there is no reason to hold a stock. But if as an investor, you thought Twitter (when public) was primed for growth, it might have made a good bet as an investment. As it stands, I don't see it having any potential for growth, but again, the Twitter example is moot, as it's not public.


I mean, I'm glad it's working for you, but things have been breaking since the layoffs.


Touché. I never used it much and find Musk repellent. As you say, ‘working’ is used loosely.


Objectively, it really isn't.


This will likely cause more layoffs in the future.


[flagged]


What from? Please elaborate.


Coincidinky



