The weight of that single word "tricky" in "tricky test regression failures" is very high!
I can't imagine the obscure craziness you'll be encountering when working on a codebase that's easily over 25 years old, supporting billions and billions of devices.
Yes, although having never been at Microsoft (and thus just speculating), I wouldn't underestimate the managerial effect of siloing either.
There are many tricky problems in major companies that are actually relatively easy to fix with, say, a month of developer time, but no single team affected by one can justify that over a workaround that takes a week to implement.
Then nobody adds up the cost of 10 different teams each needing to repeat that workaround twice.