On Wed, Feb 20, 2019 at 11:40:10AM -0800, Dan Williams wrote:
> There is a difference. NVDIMMs have local tracking of discovered
> poison, methods to scan for latent poison, and methods to clear. A CEC
> connection, iiuc, would seem an awkward fit. Awkward because what CEC
> enables is meant to be implemented natively in the hardware, and CEC
> seems to have no concept of the fact that errors can be repaired.

CEC is a leaky bucket of sorts which does call memory_failure_queue() in
the end. So we poison only those pages whose address gets reported over
and over again.
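
To make the leaky bucket idea concrete, here is a minimal user-space
sketch. It is not the kernel's actual CEC code: the names bucket_report()
and bucket_decay(), the table size and the threshold are all made up for
illustration. The only thing it is meant to show is that a page gets acted
upon only when the same address keeps coming back faster than the counts
drain.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* All names and constants below are hypothetical, for illustration only. */
#define BUCKET_SLOTS      8  /* number of tracked addresses (assumed) */
#define OFFLINE_THRESHOLD 3  /* reports needed before acting (assumed) */

struct slot {
	uint64_t pfn;      /* page frame number that reported an error */
	unsigned count;    /* reports still "in the bucket" */
	bool used;
};

struct bucket {
	struct slot slots[BUCKET_SLOTS];
};

/* The "leak": run periodically, every counter drains by one. */
static void bucket_decay(struct bucket *b)
{
	for (size_t i = 0; i < BUCKET_SLOTS; i++) {
		struct slot *s = &b->slots[i];

		if (!s->used)
			continue;
		if (s->count > 0)
			s->count--;
		if (s->count == 0)
			s->used = false;   /* address is forgotten */
	}
}

/*
 * Record one correctable error for @pfn.
 *
 * Returns true once the same pfn has been reported OFFLINE_THRESHOLD
 * times without draining away, i.e. the point at which the caller would
 * go on to poison the page (in the kernel, the path that ends up in
 * memory_failure_queue()).
 */
static bool bucket_report(struct bucket *b, uint64_t pfn)
{
	struct slot *free_slot = NULL;

	for (size_t i = 0; i < BUCKET_SLOTS; i++) {
		struct slot *s = &b->slots[i];

		if (s->used && s->pfn == pfn) {
			if (++s->count >= OFFLINE_THRESHOLD) {
				s->used = false;   /* dealt with, stop tracking */
				s->count = 0;
				return true;
			}
			return false;
		}
		if (!s->used && !free_slot)
			free_slot = s;
	}

	if (free_slot) {
		free_slot->pfn = pfn;
		free_slot->count = 1;
		free_slot->used = true;
	}
	/* In this sketch a report is simply dropped when the table is full. */
	return false;
}
```

With these assumed values, an address that errors once and then stays
quiet is forgotten after a decay pass, while three reports for the same
pfn between decay passes make bucket_report() return true.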

Correctable errors are by definition already repaired, i.e., corrected,
so there's no need to do anything.

The way stuff is plumbed now, all correctable errors go to the CEC when
it is enabled, so NFIT doesn't see them.

But the patch Jeff quoted already changed NFIT to ignore correctable
errors, so I guess we don't have to do anything. And that patch is still
needed for the case where CEC is not enabled.
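
The routing described above boils down to a small decision table. The
sketch below is a toy model only: route_error() and the enum names are
hypothetical, and the last case (NFIT acting on uncorrectable errors) is
my assumption about the remaining branch, not something stated in this
thread.

```c
#include <stdbool.h>

/* Hypothetical names, used only to spell out the cases. */
enum handler {
	HANDLED_BY_CEC,    /* correctable, CEC enabled: NFIT never sees it */
	IGNORED_BY_NFIT,   /* correctable, CEC disabled: NFIT sees and skips it */
	HANDLED_BY_NFIT    /* assumed: uncorrectable errors still reach NFIT */
};

static enum handler route_error(bool correctable, bool cec_enabled)
{
	if (correctable && cec_enabled)
		return HANDLED_BY_CEC;
	if (correctable)
		return IGNORED_BY_NFIT;
	return HANDLED_BY_NFIT;
}
```

The middle case is the one the NFIT patch covers: without CEC in front of
it, NFIT would otherwise be handed correctable errors directly.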