Dan Williams <dan.j.williams(a)intel.com> writes:
> On Wed, Feb 20, 2019 at 11:26 AM Jeff Moyer <jmoyer(a)redhat.com> wrote:
> > Borislav Petkov <bp(a)alien8.de> writes:
> > > Drop stable@
> > > On Wed, Feb 20, 2019 at 01:59:15PM -0500, Jeff Moyer wrote:
> > >> Sorry for necroposting. I thought the point of the CEC was to make sure
> > >> that the other registered decoders only ever saw uncorrected errors.
> > > Ha, good point! You mean drivers/ras/cec.c, right?
> > > If so, then I don't think we've ever talked about connecting CEC with
> > > NVDIMM and whether that would make sense. Lemme add Dan.
> > I don't think there's a difference between MCEs for NVDIMMs and normal
> > DRAM. I'll let Dan confirm or deny that.
> There is a difference. NVDIMMs have local tracking of discovered
> poison, methods to scan for latent poison, and methods to clear.
What I meant was that you couldn't tell the difference between an MCE
generated by accessing DRAM vs one generated by accessing an NVDIMM
(aside from checking the address).
> A CEC connection, iiuc, would seem an awkward fit. Awkward because
> what CEC enables is meant to be implemented natively in the hardware,
> and CEC seems to have no concept of the fact that errors can be
As far as I can tell, the Correctable Errors Collector just eats
correctable errors so that the rest of the registered decoders don't
have to worry about receiving them. It sounds like you're suggesting
that NVDIMMs won't spew correctable errors. If that's the case (I don't
think it is), then there's no need at all for these patches.
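
A minimal userspace sketch of that behavior, in C, may help here. It is
not the kernel's code: the chain, `collector`, `pmem_decoder`, the address
range, and the `collector_enabled` switch are all hypothetical stand-ins.
It shows a collector placed first in a decoder chain stopping correctable
errors from reaching later decoders, and a later decoder that still checks
the error type itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; not the kernel's types or API. */
enum verdict { CHAIN_CONTINUE, CHAIN_STOP };

struct err_event {
	uint64_t addr;		/* physical address of the error */
	bool uncorrected;	/* false means a correctable error */
};

typedef enum verdict (*decoder_fn)(const struct err_event *ev);

#define MAX_DECODERS 8
static decoder_fn chain[MAX_DECODERS];
static size_t n_decoders;

/* Models the collector being switched off (assumption for illustration). */
static bool collector_enabled = true;

/* Counts uncorrected errors that the pmem decoder acted on. */
static unsigned int pmem_poison_seen;

/* Decoders run in registration order. */
static void chain_register(decoder_fn fn)
{
	assert(n_decoders < MAX_DECODERS);
	chain[n_decoders++] = fn;
}

/* Collector: when enabled, swallows correctable errors and stops the chain. */
static enum verdict collector(const struct err_event *ev)
{
	if (collector_enabled && !ev->uncorrected)
		return CHAIN_STOP;
	return CHAIN_CONTINUE;
}

/* Made-up address range standing in for persistent memory. */
#define PMEM_START 0x100000000ULL
#define PMEM_END   0x200000000ULL

/* Later decoder: only acts on uncorrected errors inside its range. */
static enum verdict pmem_decoder(const struct err_event *ev)
{
	/* Own check, so the decoder does not rely on the collector. */
	if (!ev->uncorrected)
		return CHAIN_CONTINUE;
	/* The address is the only thing that marks the event as pmem. */
	if (ev->addr < PMEM_START || ev->addr >= PMEM_END)
		return CHAIN_CONTINUE;
	pmem_poison_seen++;
	return CHAIN_CONTINUE;
}

/* Walk the chain until a decoder asks to stop. */
static void chain_dispatch(const struct err_event *ev)
{
	for (size_t i = 0; i < n_decoders; i++)
		if (chain[i](ev) == CHAIN_STOP)
			break;
}
```

With the collector registered first, a correctable event never reaches
`pmem_decoder`; with `collector_enabled` cleared it does, and the decoder's
own check is what filters it out.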
Anyway, given that the correctable errors collector can be turned off in
the kernel config, and assuming that we still can get correctable errors
from NVDIMMs (I think we can, since I believe the caching hierarchy can
generate them as well), we definitely need to continue to check for
correctable errors in the nfit mce decoder. That's something I had