Tricky-to-reproduce problems are always the hardest. It sounds like you've made good headway in identifying temperature as an aggravating factor.
Replacing the motherboard would be a sensible next step.
If you've exhausted all the traditional troubleshooting suggestions (disassembling and reassembling all components, re-seating RAM, CPU, etc.), try this:
Get a thermal imaging camera (if available) and a can of cold spray (sometimes referred to colloquially as "liquid nitrogen"). Cool one section of the board at a time, and see if you can isolate which area causes the lockup (e.g. something near the power-related ICs for USB?). The camera isn't critical, but it might help you visualize where the temperature changes most rapidly and give you finer control over which components you thermally stress in any given test.
Borrowing a PSU from a friend and repeating the cold test might also be enlightening.
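If it helps to keep notes during the cold test, here is a minimal sketch of a logger that records the onboard temperature readings along with a label for the area you just sprayed. It assumes a Linux machine that exposes sensors under /sys/class/thermal; the function names, the label, and the log filename are all made up for illustration. Each line is forced to disk right away so the last entry survives a hard lockup. Keep in mind the onboard sensors won't report the temperature of the specific component you sprayed, so treat the numbers as rough context only.

```python
import os
import time
from pathlib import Path


def read_zone_temps(base="/sys/class/thermal"):
    """Return {zone_name: degrees_C} for every readable thermal zone.

    Assumes the Linux sysfs layout, where each thermal_zoneN/temp file
    holds the temperature in millidegrees Celsius.
    """
    temps = {}
    for zone in sorted(Path(base).glob("thermal_zone*")):
        try:
            millideg = int((zone / "temp").read_text().strip())
        except (OSError, ValueError):
            continue  # zone missing or unreadable, skip it
        temps[zone.name] = millideg / 1000.0
    return temps


def log_temps(logfile, label, base="/sys/class/thermal"):
    """Append one timestamped, labeled line and force it to disk."""
    temps = read_zone_temps(base)
    fields = " ".join(f"{name}={t:.1f}C" for name, t in temps.items())
    line = f"{time.strftime('%Y-%m-%d %H:%M:%S')} [{label}] {fields}\n"
    with open(logfile, "a") as f:
        f.write(line)
        f.flush()
        os.fsync(f.fileno())  # make sure the line survives a lockup
    return line


def watch(logfile, label, interval=1.0, samples=None):
    """Log repeatedly; samples=None means run until interrupted."""
    count = 0
    while samples is None or count < samples:
        print(log_temps(logfile, label), end="")
        count += 1
        time.sleep(interval)
```

Start something like watch("coldtest.log", "usb-power-area") before spraying a given section, then after a lockup and reboot check the last label in the log to see which area you were cooling when it died.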
If you treat a circuit as a 2D surface, yes. But the cold spray won't evenly change the temperature of that 1000 µF electrolytic capacitor. It's a good start though.
Side note:
A vendor my company uses outright rejects bug reports that aren't consistently reproducible. Very annoying.
We waste a lot of time trying to find a pattern to the issue, but can't always do so.
I've worked with a product manager who would reject very legitimate bug reports just because they came from me. Coming from a developer's mindset, I would write the kind of very detailed bug reports I would dream of receiving. The product manager told my direct manager that I was trying to show off and make his team look bad. So it turned into me writing up the bug report and my manager putting his name on it, with the product manager complaining that our department was out to get him.
I hate it when they do that. I've seen it from the inside, and it's a blatant cop-out. When a company shows no interest in tracking down and correcting their botched work, that's a big red flag for me to be on the lookout for another vendor.
Imagine if Boeing said "those plane crashes are intermittent, we won't work this problem until you've consistently reproduced it."
Good luck, and let us know how it turns out!