This could be a CPU problem, but it could just as easily be a memory subsystem or cooling issue. I really hope someone gets to the bottom of this soon and that it won't turn out to be a CPU issue; that could get expensive for AMD in a hurry.
Edit: and reading the comments in that thread, it would be great if people would mention whether they're running stock clocks and whether they have upgraded their BIOS.
Notably, this is the second time in a few months that this has happened. The previous one was a microcode update, though that one was Intel's fault (according to Intel, it happened only when the microcode was loaded by the kernel; they had probably only tested loading it from the firmware).
I recently updated the firmware on an MSI motherboard, and since then the machine has suffered from occasional memory errors. It could just be a coincidence, but the timing was very close: zero problems for months (probably years), then the first problem 4 hours after the upgrade. I later downgraded to the previous firmware version, but that did not change anything.
I've been on a T14 Gen 1 AMD with Fedora and the latest BIOS for the past 2 years and haven't really experienced any issues. Perhaps I'm missing something here. Care to elaborate?
That’s true to a certain extent. Once in a while a system won’t like a drastic change like going from an AMD chipset to Intel and you may get a bluescreen on reboot.
I bet it's a brown-out of some sort which will be fixed by bumping up the voltage a bit. Or using the Intel trick of slowing the CPU down when it needs to execute heavy vectorized workloads. I wonder why this only seems to happen under Linux though.
MSI is not much better than the other choices when it comes to buggy firmware. Just grep on a Z690 for how many AE_ALREADY_EXISTS errors you get on a fresh kernel boot (which indicates, at the very minimum, a total lack of attention to detail). It did not improve on the Z790, despite the fact that it was widely reported to them.
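For anyone who wants to run that check on their own board, here is a rough sketch in Python. It assumes the kernel log from a fresh boot has already been captured as text (for example from `dmesg`); the function name `count_already_exists` is made up for illustration, and it does nothing smarter than count the lines that mention the error string.

```python
def count_already_exists(log_text: str) -> int:
    """Count kernel log lines that mention AE_ALREADY_EXISTS.

    log_text is assumed to be the plain-text kernel log of one boot,
    e.g. the saved output of `dmesg`.
    """
    return sum(
        1 for line in log_text.splitlines() if "AE_ALREADY_EXISTS" in line
    )
```

Usage would be something like `count_already_exists(open("dmesg.txt").read())`, where `dmesg.txt` is a hypothetical file holding the saved log. A count of zero is what you would hope for from clean firmware.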
I noticed it after I upgraded to a 3900X. Not sure if it's an AMD thing, a USB hub thing, or a many-core thing. But it had never happened before. (Microsoft Pro Intellimouse)