The fact that any modern computer chip works reliably is a pure miracle. The process variations are extreme, and you often end up with a lot of B-level engineers/technicians keeping things going. Having some experience in the semiconductor industry, it oftentimes felt like a lot of bubble gum and baling wire was used to get the product out the door. Hats off to all the people keeping these systems alive and functioning.
I've worked on the construction of a large dam and I think that's the case for most modern technology. Obviously once a dam is up and running it's really stable compared to what goes on in a fab, but getting it built? That's a whole other story.
The math and engineering behind dam construction are well understood, but actually getting one built in practice is a years-long story of yak shaving, cat herding, and overcoming every little piece of BS nature throws at the project. Unique and unpredictable geological conditions, unknown underground water sources, unexpected soil composition changes, surprise fault lines. Then there's the logistical nightmare of actually moving all the equipment in and earth out, the weather and environmental factors that impede every action, humans ignoring safety altogether, and so on and on. All of this implemented by workers on the ground who just barely know what's going on (through no fault of their own).
I'm continually surprised that anything complicated ever gets built at all.
There must be a name for this phenomenon. Like, the more you know, the less faith you have in it actually working. I'm pretty sure everyone feels this way about their work. I just asked my partner if it's the same with her (non-tech) job and she said yes, she can't believe it works at all. Arthur C. Clarke's Travel by Wire comes to mind too.
I would say it doesn't just work. That's why being a programmer isn't a one-off job. You're still there to glue things back together when they inevitably break unexpectedly.
It's more like disregarding all of the advantages and only focusing on the negatives, or on the incidents when they happen.
People routinely clown on companies for downtime but do not celebrate sending multiple MB pictures and videos over cell networks in remote locations from a super computer in their pocket.
Even 95% reliability is good for networks spanning the globe, compared to what we've had through most of history.
The average person easily plays into the trope that no one appreciates IT when it works, but readily has opinions when there are problems.
The funny thing is, anecdotally, I have never had a CPU fail on me. Memory, motherboards, PCIe cards, PSUs, hard drives, monitors, keyboards, and mice have failed, but I have yet to lose a CPU or an SSD.
I'm at a point where I no longer call these things chips. I don't know what they are; they have pins, but to me the precision and machinery involved operate at something close to atomic scale. It should not be possible for automated machines to fabricate these things. Yet here we are.
AMD just delayed Ryzen 9000 by one to two weeks because of production issues, including recalling samples sent to reviewers and units already at stores.
We appreciate the excitement around Ryzen 9000 series processors. During final checks, we found the initial production units that were shipped to our channel partners did not meet our full quality expectations. Out of an abundance of caution and to maintain the highest quality experiences for every Ryzen user, we are working with our channel partners to replace the initial production units with fresh units.
As a result, there will be a short delay in retail availability. The Ryzen 7 9700X and Ryzen 5 9600X processors will now go on sale on August 8th and the Ryzen 9 9950X and Ryzen 9 9900X processors will go on sale on August 15th. We pride ourselves in providing a high-quality experience for every Ryzen user, and we look forward to our fans having a great experience with the new Ryzen 9000 series.
Jack Huynh, AMD SVP and GM of Computing and Graphics
That, or they're calling Intel's bluff of benchmarking before the microcode update, and in either case they're upfront about the delay instead of clamming up for half a year while unrest mounts.
I'd wager it's not that easy to separate these two, especially when voltage is dynamically regulated.
If it's a Zeiss lens imperfection, a broken ASML machine, water or silicon impurity, etc, what matters to the customer is the final product you're buying.
The industry has always been kind of monolithic. I'm not sure you can count Qualcomm as a competitor just yet; competing means being in the same market. Laptop, desktop, and server have all been categories that haven't seen anything other than x86 for a long time.
Is this Intel's marketing department doing damage control?
Intel's been a disgrace, and that's why these chips suck. Imagine having all of the money and hiring everyone in sight and pulling all sorts of self-serving, dirty tricks industry-wide yet still letting an underdog on the verge of bankruptcy (AMD) beat you to 7 nm and eat your lunch.
I had a 13900K blue screen at random for almost a year. Anything that used more than a certain amount of cores would either crash or blue screen, which was irritating considering I built the machine in part so I could do simulations and renders in Blender. I was also unable to use dual channel RAM. It just wouldn't POST no matter what settings I used.
I went through 3 motherboards hoping it was that, and not the insanely expensive CPU that would be a pain to RMA. But as it turns out, the RMA process was very quick and painless once I provided my troubleshooting history. But due to it being my main computer that I make my money with, I had to buy another processor to fill the 2 week gap between sending the old one and receiving the new one.
I probably spent $1500 or more on this problem, and by the sounds of it, my troubles might not be over.
Similar story here. 13700KF, no OC, had the machine put together 18 months ago at a respectable shop (Central Computers for those in the bay). I had regular stability issues and blue screens to the point I took it back to the shop after three months - they couldn't find anything wrong. Took it in again at nine months with increasing frustration - nothing. Probably should have taken it again three months ago when I was playing the new Helldivers and that would crash the machine every other time I joined a game, but by that point I was pretty exhausted by the whole thing (and the game had other, unrelated bugs that had me looking elsewhere).
Odds are good I won't be bothering with Intel hardware or recommending it to others for the next decade or more - this has been a flatly terrible experience (and I'll note the irony of finally deciding to spend on a "dream build" and this being the result).
Exact same story here, spent 4-5k on my build with a 13700k which has blue screened hundreds of times in video games (R6, Hogwarts Legacy, Cyberpunk) over the last year (to the point that I don't even play competitive tournaments now).
I did all sorts, switching from Windows 11 to 10, buying new memory (twice!!), countless days debugging, updating my bios, etc.
I'm relieved to finally have found the cause, but my goodwill for Intel is burnt.
Do you know what the general fix is? Is it just a BIOS update?
I have the same chip but have never had issues. I did notice its thermal properties were a bit of a mess when I ran some benchmarks after putting the build together, so I slightly undervolted it in the BIOS. Performance-wise I never noticed any difference, but it stays nice and cool.
Not that this excuses anything, or is even a real fix for the issue, who knows. I wish I'd gotten a Threadripper instead and will be getting an AMD when I build a new system again.
I began to suspect that it was the processor after I started doing Blender renders and either Blender would crash 100% of the time or I would get a BSoD, which I thought was basically impossible on modern computers. The real sign was that dual channel RAM didn't work, but I refused to believe it was the processor. It was so expensive, I didn't want to entertain the idea that I would have to buy another.
A "solution" I had was to lower the amount of cores Blender could use to 12, instead of all of them, which was annoying, and that still only lowered the number of crashes, not stopped them. These crashes were bad too, in that they corrupted project files almost every time.
I basically did everything possible, with little success. All it did was make my computer slow to a crawl and still crash at random.
I don't know what I'm going to do next computer. AMD is seemingly having similar problems, so it's not like I can realistically switch with any confidence. I go close to 10 years between upgrades, so hopefully the landscape will look better then.
Unless you explicitly set your clock speed/voltage, it's overclocking; I can almost guarantee it. There has been extreme carelessness from motherboard manufacturers on top of Intel's problems.
Whenever I see a post like this, I wonder why the poster didn't just return the non-working part to the vendor they bought it from. This might be my US bias, but very few vendors put it on the customer to prove a certain level of testing and diagnostic activities; if the customer says it didn't work, the customer gets their money back. If you're not sure if it's the motherboard or the CPU or... return it all, and try again. No?
I may have been able to return it to Newegg, but return shipping is hit or miss as to whether you have to pay for it and I was honestly so frustrated with the whole fiasco that I just wanted to go directly to the source for a replacement that I knew(??) would work. Getting a replacement from Newegg might have just been another from the same lot.
As for having to prove I did ample troubleshooting, the last thing I wanted was to mail it in and have it returned because they couldn't find anything wrong with it, which has happened to me with other things in the past.
This is the reason why chip manufacturing is not just about buying an EUV machine and then starting to pump out chips to make it rain, as many people believe.
This is craftsmanship. John Doe can use Python to create software, and so can you, yet John Doe's code runs better and faster while your code crashes all the time.
People seem to forget that a craftsman's ability to use tools is a big factor in the final product.
Many people are saying it's all about the Zeiss machines and TSMC isn't actually bringing much over that. Usually as an argument about EU tech competitiveness.
Both. There's a huge difference between designing a functional chip and designing a chip with high yield and long term reliability. Companies and individuals vary wildly in their processes and understanding.
It's becoming a widespread problem in the industry. For decades technology-minded folks have been told to go into software development because that's where the money is. The field of custom IC design has been both short on applicants and has had huge barriers to entry. As a result, the experienced engineers are retiring and there aren't enough juniors coming up to replace them. Skills and institutional knowledge are being lost.
There is a lot of focus right now on tools such as AI to allow junior people to produce things at an expert level based on encoding that expert knowledge. It would be great if it worked, but so far the results aren't there.
Source: I develop EDA software for custom IC design.
Why are there so many errors with Intel CPUs lately? CPUs used to be the most reliable parts of a computer. Are there too many "moving" variables to extensively test everything? Is it no longer possible to grab the speed crown without cutting corners?
They had problems in the move to new process nodes and are scrambling to show good numbers vs. AMD and their own previous generations so they can sell new CPUs.
Intel's been asleep for a decade. Does anyone still remember how they charged outrageous prices for 1-3% improvements every year during AMD's Bulldozer years? They did nothing in R&D and just kept printing money.
Yeah, this is the result of that complacency. Ever since the first Ryzen launched, Intel's become more and more irrelevant. If either AMD manages to capture the low-TDP and embedded market or ARM improves in compatibility, they'll largely become the last choice for anything. They're so damn lucky AMD is fantastic at shooting themselves in the foot.
Ryzen and Apple M1 were the best things that happened to CPUs. On one hand I got a super efficient CPU in the M1 and on other hand, I can get a very performant 16 core CPU (Ryzen 7950x) without shelling out insane money for server/workstation CPUs.
Also they seem to have lost a huge amount of senior technical people… I can’t see who’s at the helm to guide the technical choices. They have limited institutional experience now (probably). I don’t think things will improve…
Intel has had years of fab problems for new nodes. This might just be another issue with the nodes not being up to the necessary quality to produce high performance chips.
The tricky thing here is that as this mostly affects unlocked CPUs, it is going to be hard to prove when the fault is from this algorithm vs. user/motherboard manufacturer overclocking. Unless there is some internal monitoring with fuses blown. Is there?
As part of the bathtub reliability curve, it's usual for a large fraction of failures to occur early in life. How much over the usual failure curve are we?
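To make that question concrete, here's a rough sketch of the comparison one would want to do, using a Weibull infant-mortality model with purely made-up numbers (none of these are Intel's actual figures):

    import math

    def weibull_cdf(t_months, shape, scale):
        # Expected fraction of units failed by time t under a Weibull model.
        # shape < 1 gives a decreasing hazard rate, i.e. infant mortality.
        return 1.0 - math.exp(-((t_months / scale) ** shape))

    # Purely illustrative parameters, not real failure data.
    baseline = weibull_cdf(12, shape=0.9, scale=3000)  # "normal" early-life failures
    observed = 0.04                                     # hypothetical field failure rate

    print(f"expected failures by 12 months: {baseline:.2%}")
    print(f"excess over the usual curve:    {observed - baseline:.2%}")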
It's still unclear what fraction of CPUs are impacted for both issues. Was oxidation limited to a single fab for a month and only 5% of produced CPUs? Is the microcode issue in TB 3.0 or TVB, so it would only impact the 13900s and 14900s?
It's also unclear whether, once degraded, a CPU can still reliably work at, say, 95% of peak frequency. In the case of a partial recall, a discount option might be worth offering if that is the case.
Anyway, it's mostly speculation beyond Intel's post on their forum (plus the Reddit responses); it will be interesting to see the next stages, which will hopefully clarify some of these. That's just a discussion forum, and I'm sure the final detailed announcement will be made via their main communication channels.
The PR is so bad that Intel is going to have to take responsibility regardless. And updating the microcode kind of "proves" that the old microcode was defective in some way.
Sort of a weird question but who's using Intel anymore? Most tech people are using Macs on the laptop side, AWS is going Graviton first, RaspberryPi/small form factor is ARM, and then of course there's all the higher level abstractions that are common where you can't even know what processor is under the hood.
The surveys also show that Linux is more or less equally popular with macOS among techies. As nice as M1/M2 MacBooks are, I probably wouldn't pay the Apple tax just to run Ubuntu on it.
Personally I use whatever boring laptop my company offers, and so far that's been Intel. As long as it runs Linux I don't care what's under the hood.
Most everyone in the personal computer space still. E.g. in Q4 2023 Intel held 78% of all desktop and laptop CPU sales (including ARM MacBooks and Chromebooks).
>Intel confirms oxidation and excessive voltage in 13th and 14th Gen CPUs
The title is ambiguous; it seems to imply 14th gen CPUs have oxidation issues, but the video explains that the oxidation issues affect only an early run of 13th gen. If you own a 14th gen Intel CPU you should be safe after applying the upcoming microcode update.
You’re not reading it wrong, that’s really what they’re asserting. GN is saying that it is an active problem resulting in the failures.
GN is sometimes really good and sometimes really bad. They do tons of great data-driven benchmarking and research like this, and then jump to bizarre conclusions at the end, or make bizarre proclamations like “long term value doesn’t exist because that’s not how money works”.
For a while back in 2019 they were declaring the death of 6-core CPUs because of 0.1% lows, and their supporting evidence was basically a couple of obviously broken games like BFV and Far Cry 5 where a 2C4T Pentium was outscoring their 6C processor by a factor of 2 in the 0.1% lows… and they didn't see a problem even when it was raised, etc.
Another example was Steve insisting last December that "Nvidia was leaving the gaming industry" because of "recent" quotes from Jensen about Nvidia being an AI company now… and the citation literally on screen behind him dated the quotation to the mid-2010s. Raised it to him, doesn't care. He got his clicks.
Like, as the gaming community prepares for the launch of the 5090 sometime this fall or early next year… how’s that working out?
It's not uncommon for Steve to do this. Like, there's certainly an issue here, and oxidation is an issue that existed in the past. Steve therefore thinks it must be the issue for all these chips, even though Intel says that's basically a small problem that was fixed over a year ago. Steve disagrees, on the basis of evidence such as:
(This space left intentionally blank)
Like, people need to get it through their heads that reviewers are humans too, with their own foibles and weaknesses. In some articles he does incredible investigative work. In others he's arguing that we need to abandon 6C processors in favor of (checks notes) stock 2C Pentiums because of frametimes, or that spending more can never deliver long-term value even if it lasts better, because "that's not how money works" (exasperated Steve face). It all depends on whether he's being data-driven or engineering support for an argument on a given day; he's way worse when he's working backwards to fit data to conclusion.
(ps: my other favorite thing about that chart is the frametimes getting worse as processor frequency increases. The 8600k and 9600k are literally the exact same silicon, same cache, same core count. So a pair of processors both lost frametime consistency as they were overclocked… and actually the faster overclock lost more frametime consistency than the slower overclock. It’s literally a straight correlation between “higher clocks = worse frametimes” even on the exact same chip models, which obviously indicates an engine problem, along with the previous point about pentiums being faster than a processor with 3x the cores and 50% more threads at double the frequency on the same architecture. Like it’s just a facially absurd result to stand as the primary peg in the “6c are dead now!” argument. And the rest of the games weren’t much better - BFV was not in a healthy state in 2018-2019, for example, it’s not really indicative of anything except the state of frostbite engine in 2018.)
At least wait until reviews after the August microcode patch. No telling if it actually fixes the problem, or what the performance impact is. On the other hand I know someone with a Ryzen Legion Pro laptop that seems pretty solid.
I'm not sure if I want to spend good money on a laptop that would need a BIOS update to undervolt the CPU right out of the box. TBH I'm not even sure if the mobile processors have the same issues as the desktop K-series SKUs, but it's not worth the long-term anxiety, I guess.
Unfortunately the Ryzen options are pretty limited where I live. The 7945HX options are either out of stock or out of my budget, so I'm forced to choose between a 7745HS and an 8845HS.
Does anyone know the list of CPUs affected? I have been running a 13700K for more than 9 months and have not had a single blue screen so far. Has Intel released an official list of SKUs? Thanks.
Sometimes I think if you need 128GB of RAM for your application then maybe it would make sense to build a tower/rack server and remote into that with your laptop.
Often, but not always, big RAM requirements like that also involve big CPU or disk requirements that would be impractical in a laptop form factor.
Is 128GB so much? My last workstation (not a laptop) had a 5950X and 128GB of DDR4, and that RAM is like 4 years old now.
I am not saying you will have it in an ultrabook form factor, but I had 32GB of RAM in a Sandy Bridge laptop (Intel Core i7-2700?) and 64GB not that much later. That was like 10+ years ago. Those were desktop-replacement and thicker-style laptops, but not exactly unreasonable equipment.
I use a Lenovo laptop, a T15g Gen 2, because of the 4x DIMM slots that let me fit 128GB of RAM, and yes, I tend to use that even under Linux. I do work and personal business across this device, which includes fully separate Firefox profiles for each, usually 4-5, and it's not uncommon to see it or Chrome/Chromium actually using (not just reserving) 60-70GB of RAM each, connected to Google apps, Teams/Office 365, whatever is needed for work.
Throw in an almost-always-running windoze VM for windoze things (another 8GB), at times other VMs using 4-8GB for various testing, LibreOffice with 20 complex spreadsheets, Steam games, normal use really, and it all adds up to a lot of at-minimum-reserved memory far exceeding physical, with things at times ballooning within that, so the machine occasionally gets cranky or even OOMs.
I rather wonder who actually uses the boxes with only 8GB of RAM that some still ship with.
I do stuff that uses all my RAM. My new workstation has 192GB and I've already had builds that pushed it to 180GB used. I am not alone.
I am not saying everyone needs that much, just that the bar for what warrants a dedicated server rack keeps going up, and 128GB in a thick laptop seems practical.
Windows 10 (and 11) works fine with 8 GB. You may be using applications that need more. The minimum requirement for 64-bit Windows 10 is only 2 GB, and even 4 GB can be workable depending on the application.
People have tested the 96GB DDR5 SODIMM kit and reported it to work with the Framework AMD laptop (Framework lets you pick none for RAM, so you won't be paying for RAM you won't use). Also, as far as I know, no one makes DDR5 SODIMMs with more than 48GB per stick.
It would probably be much cheaper to buy a Framework without RAM and then install 96GB yourself anyway, even if they did offer a 96GB option. It looks like they charge roughly the same amount for 2x32GB as it costs to buy 2x48GB DDR5 SODIMMs at retail.
Ugh, I hate to be this guy, but could you possibly use off-device resources?
There's a tonne of benefits to that approach: cooling, battery life, and performance among them, plus the ability to run tasks while your laptop is asleep.
I understand not wanting to own multiple devices, or cope with the power draw of an always-on machine, but if you truly need such resources you will end up better off (even in the short term) with a dual-system setup.
JetBrains Gateway and VS Code Remote render glyphs locally, and I doubt you would be able to notice the latency on execution (i.e., compiling and running your program).
For ssh there’s mosh.
I’d buy “having to transfer files” more than latency, realistically.
I dumped my Intel shares just now. The stock price has been sinking for the last week over this news, but I'm predicting a recall or a big lawsuit will take the news from bad to worse.
Not sure if this is being downvoted because it's incorrect or bad investment advice, but I am interested to know if or how INTC stock price will react to this.
Well if it does react it won't be positively! Intel has been hinging their future on becoming a general purpose fab (like a Western TSMC) and the CEO even said he was open to making chips for AMD and Nvidia. But if their manufacturing process has major defects and contamination then they might as well scrap all of that. They're not coming off as trustworthy here either so who is going to want to have their products made by them?
There is no indication it's a broad manufacturing problem. It was contamination on Intel 7 that happened in 2023 and was resolved, and it is not responsible for the instability issues in the majority of cases.
I think the real issue with the manufacturing problem is that Intel never said in 2023 "hey everyone, we had a manufacturing problem that affects all intel 13th gen cpus up until day X. We are offering RMAs for all affected units". As such, over a year later, no one outside of Intel knows exactly whether their chips are having instability due to this.
No tech company is going to announce every single wafer that’s ever been etched and reworked or has some random process problem that causes a discard or a batch failure. It happens constantly, 10-20% of chips being rejects or salvages is a good outcome.
Obviously that's not the case here, but as a rule, yeah, your CPU could have been reworked or something and it's not normal for any brand to tell you that.
If wallstreetbets has taught me anything it's that it'll probably go up 200% as soon as you sell against all logic, because it was priced in all along.
>stock price has been sinking for the last week over this news
What? Their price has been around $30 for months; it recently went up to $35 and now went back to around $32, although today many companies went down by a significant percentage.
Intel is looking really skeevy on this. Won't lying and downplaying bite them harder? How is this level of misrepresenting and covering up not criminal?
EDIT - Why the downvotes? I seriously don't understand why they are doing it this way.
They have actively blamed other companies, like Nvidia (not that I like them either, but they simply didn't cause this issue).
There are multiple teams claiming denied RMAs during the period Intel knew about the oxidation.
They didn't announce this until multiple outlets started talking about it.
They did 2 different announcements with different explanations and the more mild one first.
Their statements are contradictory at least in part.
This just isn't how people acting in good faith behave. Intel's own past behavior shows they have handled problems better before: with Spectre they were far more transparent and simply published findings on problems without hiding them for years.
They had to reduce safety margins to match AMD's performance. Now months later they still haven't completely debugged the problem so they are stalling.
This is the downside of the "only the paranoid survive" attitude.
Anyone steeped in the industry deeply enough to be shopping for a fab saw straight through the "overclocking to appear competitive" strategy from day one and knew this was a likely result. Intel's desperate marketing didn't help them (much) then, but the flip side is it didn't hurt them (much) now.