I recently updated the firmware on an MSI motherboard, and since then the machine has suffered from occasional memory errors. It could be just a coincidence, but the timing was very close: zero problems for months (probably years), then the first problem 4 hours after the upgrade. I later downgraded to the previous firmware version, but that did not change anything.
There was a breaking issue, but it didn't manifest automatically.
It happened when the 2nd gen TR arrived. It used the same mainboards, so all the manufacturers issued BIOS updates.
Unfortunately, these updates claimed to support SEV (Secure Encrypted Virtualization). Linux of course tried to initialize it at boot/module load time, and the whole thing hung, because TR CPUs do not support SEV; only EPYCs do.
So there were the following fixes:
1) downgrade BIOS back to pre-TR2 version,
2) blacklist the ccp module, which would make kvm_amd non-functional,
3) wait for a fix in Linux kernel, which initializes SEV with a timeout.
So it wasn't that tragic an issue if you had a first-gen TR.
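To illustrate the idea behind fix 3, here is a minimal user-space sketch in Python of bounding an initialization call with a timeout instead of waiting on it forever. It is not the kernel's code; the names `run_with_timeout` and `hung_firmware_call` are made up for this example.

```python
import threading


def run_with_timeout(command, timeout_s):
    """Run `command` in a worker thread.

    Returns True if it finished within `timeout_s` seconds, False otherwise.
    """
    done = threading.Event()

    def worker():
        command()
        done.set()

    # Daemon thread: a command that never returns cannot keep the process alive.
    threading.Thread(target=worker, daemon=True).start()
    # Event.wait returns False if the timeout expires before the flag is set.
    return done.wait(timeout_s)


def hung_firmware_call():
    """Hypothetical stand-in for an init command the hardware never answers."""
    threading.Event().wait()  # blocks forever
```

With this pattern, `run_with_timeout(hung_firmware_call, 0.05)` gives up and reports failure instead of hanging the caller, which is roughly the behavior you want when firmware advertises a feature the CPU cannot actually provide.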
Indeed? I wonder if you've tried different memory performance settings in your BIOS (which "might" invalidate your warranty) or different memory modules altogether.
What bugs have you observed in the X399 Taichi? Coincidentally, I have the same board (since TR launch) and haven't even updated the BIOS -- it just works fine for me.
I'm wondering the same thing. The one thing I know that changed in Haswell is that the transactional memory instructions were found to be broken, but I assume those aren't the issue here...
Didn't know about the memory controller. The segfault issues didn't go far, and from the few reports I read it seemed to be a small problem (AMD has RMA'd a thousand and it didn't go further).
Whether the issue is with the CPU hardware, the mainboard design (VRM, etc), mainboard BIOS, kernel, or the Prime95 app itself still appears to be an open question.
Based on oscilloscope analysis of the VRM output in a linked thread elsewhere in the comments, it looks like the board's VRM design, or its configuration by the board's BIOS, may be the most likely suspect.
But there are less-researched reports of similar issues on other boards as well, which makes things a bit more murky.
Given the uncertainties there, it may put some people off buying into the TR/sTRX40 platform in general. But offering a blanket recommendation to avoid it is a bit premature.
Depending on board support you might get actual error correction or not.
If it doesn't boot, it's very likely that you actually bought registered memory, which is normal in servers, but _not_ supported at all by Ryzen processors, no matter the mainboard.
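If it does boot, one way to check whether the board is reporting ECC events at all on Linux is to look at the EDAC counters in sysfs. A rough sketch, assuming the usual `/sys/devices/system/edac/mc/mc*/ce_count` and `ue_count` layout; the function name `read_edac_counts` is my own:

```python
from pathlib import Path


def read_edac_counts(edac_root="/sys/devices/system/edac/mc"):
    """Return {controller: (corrected, uncorrected)} from EDAC sysfs counters.

    An empty dict means no memory controller was found under `edac_root`,
    e.g. because no EDAC driver is loaded.
    """
    counts = {}
    for mc in sorted(Path(edac_root).glob("mc[0-9]*")):
        try:
            ce = int((mc / "ce_count").read_text().strip())
            ue = int((mc / "ue_count").read_text().strip())
        except (OSError, ValueError):
            continue  # skip controllers with missing or unreadable counters
        counts[mc.name] = (ce, ue)
    return counts
```

An empty result suggests the kernel is not seeing an ECC-capable memory controller, though it is not proof either way: a controller that shows up with zero counts may simply not have hit an error yet.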