[RFC v3 00/19] kunit: introduce KUnit, the Linux kernel unit testing framework
by Brendan Higgins
This patch set proposes KUnit, a lightweight unit testing and mocking
framework for the Linux kernel.
Unlike Autotest and kselftest, KUnit is a true unit testing framework;
it does not require installing the kernel on a test machine or in a VM
and does not require tests to be written in userspace running on a host
kernel. Additionally, KUnit is fast: from invocation to completion, KUnit
can run several dozen tests in under a second. Currently, KUnit's own
test suite runs in under a second from the initial invocation (build
time excluded).
KUnit is heavily inspired by JUnit, Python's unittest.mock, and
Googletest/Googlemock for C++. KUnit provides facilities for defining
unit test cases, grouping related test cases into test suites, providing
common infrastructure for running tests, mocking, spying, and much more.
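To make the case/suite vocabulary concrete, here is a minimal userspace sketch in C of test cases grouped into a suite and driven by a small runner. It only illustrates the structure: `struct test_case`, `struct test_suite`, `EXPECT_EQ` and `run_suite()` are hypothetical names invented for this example, not KUnit's API; the KUnit documentation describes the real interface.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical names for illustration only; this is NOT the KUnit API. */
struct test_ctx {
	int failures; /* failed expectations in the current case */
};

/* Expectation: record the failure but keep running the rest of the case. */
#define EXPECT_EQ(ctx, a, b)                                          \
	do {                                                          \
		if ((a) != (b)) {                                     \
			fprintf(stderr, "%s:%d: expected %s == %s\n", \
				__FILE__, __LINE__, #a, #b);          \
			(ctx)->failures++;                            \
		}                                                     \
	} while (0)

struct test_case {
	const char *name;
	void (*run)(struct test_ctx *ctx);
};

/* A suite groups related test cases. */
struct test_suite {
	const char *name;
	const struct test_case *cases;
	int num_cases;
};

/* Run every case in the suite; return the number of failing cases. */
static int run_suite(const struct test_suite *suite)
{
	int failed_cases = 0;

	for (int i = 0; i < suite->num_cases; i++) {
		struct test_ctx ctx = { .failures = 0 };

		suite->cases[i].run(&ctx);
		printf("%s.%s: %s\n", suite->name, suite->cases[i].name,
		       ctx.failures ? "FAILED" : "ok");
		if (ctx.failures)
			failed_cases++;
	}
	return failed_cases;
}

/* Code under test. */
static int add(int a, int b)
{
	return a + b;
}

static void add_positive(struct test_ctx *ctx)
{
	EXPECT_EQ(ctx, add(2, 3), 5);
}

static void add_negative(struct test_ctx *ctx)
{
	EXPECT_EQ(ctx, add(-2, -3), -5);
}

static const struct test_case add_cases[] = {
	{ "add_positive", add_positive },
	{ "add_negative", add_negative },
};

static const struct test_suite add_suite = {
	.name = "add",
	.cases = add_cases,
	.num_cases = sizeof(add_cases) / sizeof(add_cases[0]),
};
```

Calling `run_suite(&add_suite)` prints one status line per case and returns the number of failing cases, so a caller can turn it into an exit status.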
## What's so special about unit testing?
A unit test is supposed to test a single unit of code in isolation,
hence the name. There should be no dependencies outside the control of
the test; this means no external dependencies, which makes tests orders
of magnitude faster. Likewise, since there are no external dependencies,
there are no hoops to jump through to run the tests. Additionally, this
makes unit tests deterministic: a failing unit test always indicates a
problem. Finally, because unit tests necessarily have finer granularity,
they can easily exercise all code paths, solving the classic problem of
error handling code being difficult to reach.
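As a sketch of why isolation makes error paths easy to reach, the example below passes the allocator in as a function pointer so a test can force an allocation failure deterministically. `alloc_fn`, `failing_alloc()` and `dup_name()` are hypothetical names for this illustration; KUnit's own mocking facilities are not shown.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical allocator hook: production passes malloc, a test passes a fake. */
typedef void *(*alloc_fn)(size_t size);

/* Fake allocator that always fails, simulating out-of-memory on demand. */
static void *failing_alloc(size_t size)
{
	(void)size;
	return NULL;
}

/* Code under test: duplicate a name, reporting -ENOMEM if allocation fails. */
static int dup_name(alloc_fn alloc, const char *src, char **out)
{
	size_t len = strlen(src) + 1;
	char *copy = alloc(len);

	if (!copy)
		return -ENOMEM; /* the error path a unit test can now reach */
	memcpy(copy, src, len);
	*out = copy;
	return 0;
}
```

Because the failure is injected rather than provoked by real memory pressure, the error branch runs the same way on every test run.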
## Is KUnit trying to replace other testing frameworks for the kernel?
No. Most existing tests for the Linux kernel are end-to-end tests, which
have their place. A well-tested system has lots of unit tests, a
reasonable number of integration tests, and some end-to-end tests. KUnit
simply aims to cover the unit test space, which is currently not being
addressed.
## More information on KUnit
There is a bunch of documentation near the end of this patch set that
describes how to use KUnit and best practices for writing unit tests.
For convenience I am hosting the compiled docs here:
https://google.github.io/kunit-docs/third_party/kernel/docs/
Additionally for convenience, I have applied these patches to a branch:
https://kunit.googlesource.com/linux/+/kunit/rfc/4.19/v3
The repo may be cloned with:
git clone https://kunit.googlesource.com/linux
This patchset is on the kunit/rfc/4.19/v3 branch.
## Changes Since Last Version
- Changed namespace prefix from `test_*` to `kunit_*` as requested by
Shuah.
- Started converting/cleaning up the device tree unittest to use KUnit.
- Started adding KUnit expectations with custom messages.
--
2.20.0.rc0.387.gc7a69e6b6c-goog
[PATCH] libnvdimm, namespace: check nsblk->uuid immediately after its allocation
by Wei Yang
When creating nd_namespace_blk, its uuid is copied from nd_label->uuid.
If that memory allocation fails, the code jumps to the error branch.
This check is better done immediately after the allocation, whereas the
current implementation does it only after assigning claim_class.
This patch moves the check to immediately after the uuid allocation.
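A minimal userspace sketch of the resulting ordering, assuming simplified stand-ins for the kernel types: `struct label`, `struct ns_blk`, `memdup()` (standing in for kmemdup()) and `create_blk()` are hypothetical, and `UUID_LEN` stands in for NSLABEL_UUID_LEN. The point is only that the NULL check follows the allocation directly, before unrelated fields such as the claim class are filled in.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define UUID_LEN 16 /* hypothetical stand-in for NSLABEL_UUID_LEN */

/* Hypothetical, simplified stand-ins for nd_namespace_label / nd_namespace_blk. */
struct label {
	unsigned char uuid[UUID_LEN];
	int has_guid;
	int guid_class;
};

struct ns_blk {
	unsigned char *uuid;
	int claim_class;
};

/* Userspace stand-in for kmemdup(): allocate and copy, or return NULL. */
static void *memdup(const void *src, size_t len)
{
	void *p = malloc(len);

	if (p)
		memcpy(p, src, len);
	return p;
}

static struct ns_blk *create_blk(const struct label *label)
{
	struct ns_blk *blk = calloc(1, sizeof(*blk));

	if (!blk)
		return NULL;

	blk->uuid = memdup(label->uuid, UUID_LEN);
	/* Check the allocation right away, before touching other fields. */
	if (!blk->uuid)
		goto blk_err;

	if (label->has_guid)
		blk->claim_class = label->guid_class;

	return blk;

blk_err:
	free(blk);
	return NULL;
}
```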
Signed-off-by: Wei Yang <richardw.yang(a)linux.intel.com>
---
drivers/nvdimm/namespace_devs.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c
index 681af3a8fd62..9471b9ca04f5 100644
--- a/drivers/nvdimm/namespace_devs.c
+++ b/drivers/nvdimm/namespace_devs.c
@@ -2240,11 +2240,11 @@ static struct device *create_namespace_blk(struct nd_region *nd_region,
nsblk->lbasize = __le64_to_cpu(nd_label->lbasize);
nsblk->uuid = kmemdup(nd_label->uuid, NSLABEL_UUID_LEN,
GFP_KERNEL);
+ if (!nsblk->uuid)
+ goto blk_err;
if (namespace_label_has(ndd, abstraction_guid))
nsblk->common.claim_class
= to_nvdimm_cclass(&nd_label->abstraction_guid);
- if (!nsblk->uuid)
- goto blk_err;
memcpy(name, nd_label->name, NSLABEL_NAME_LEN);
if (name[0])
nsblk->alt_name = kmemdup(name, NSLABEL_NAME_LEN,
--
2.19.1
[mm PATCH v6 0/7] Deferred page init improvements
by Alexander Duyck
This patchset is essentially a refactor of the page initialization logic
that is meant to provide for better code reuse while providing a
significant improvement in deferred page initialization performance.
In my testing on an x86_64 system with 384GB of RAM and 3TB of persistent
memory per node, I have seen the following. In the case of regular memory
initialization the deferred init time was decreased from 3.75s to 1.06s on
average. For the persistent memory the initialization time dropped from
24.17s to 19.12s on average. This amounts to a 253% improvement for the
deferred memory initialization performance, and a 26% improvement in the
persistent memory initialization performance.
I have called out the improvement observed with each patch.
Note: This patch set is meant as a replacement for the v5 set that is already
in the MM tree.
I had considered just doing incremental changes, but Pavel had suggested at
the time that I submit it as a whole set. However, that was almost 3
weeks ago, so if incremental changes are preferred, let me know and
I can submit the changes as incremental updates.
I apologize for the delay in submitting this follow-on set. I had been
trying to address the DAX PageReserved bit issue at the same time, but
that is taking more time than I anticipated, so I decided to push this
before the code sits too much longer.
Commit bf416078f1d83 ("mm/page_alloc.c: memory hotplug: free pages as
higher order") causes issues with the revert of patch 7. It was
necessary to replace all instances of __free_pages_boot_core with
__free_pages_core.
v1->v2:
Fixed build issue on PowerPC due to page struct size being 56 bytes
Added new patch that removed __SetPageReserved call for hotplug
v2->v3:
Rebased on latest linux-next
Removed patch that had removed __SetPageReserved call from init
Added patch that folded __SetPageReserved into set_page_links
Tweaked __init_pageblock to use start_pfn to get section_nr instead of pfn
v3->v4:
Updated patch description and comments for mm_zero_struct_page patch
Replaced "default" with "case 64"
Removed #ifndef mm_zero_struct_page
Fixed typo in comment that omitted "_from" in kerneldoc for iterator
Added Reviewed-by for patches reviewed by Pavel
Added Acked-by from Michal Hocko
Added deferred init times for patches that affect init performance
Swapped patches 5 & 6, pulled some code/comments from 4 into 5
v4->v5:
Updated Acks/Reviewed-by
Rebased on latest linux-next
Split core bits of zone iterator patch from MAX_ORDER_NR_PAGES init
v5->v6:
Rebased on linux-next with previous v5 reverted
Drop the "This patch" or "This change" from patch descriptions.
Cleaned up patch descriptions for patches 3 & 4
Fixed kerneldoc for __next_mem_pfn_range_in_zone
Updated several Reviewed-by, and incorporated suggestions from Pavel
Added __init_single_page_nolru to patch 5 to consolidate code
Refactored iterator in patch 7 and fixed several issues
---
Alexander Duyck (7):
mm: Use mm_zero_struct_page from SPARC on all 64b architectures
mm: Drop meminit_pfn_in_nid as it is redundant
mm: Implement new zone specific memblock iterator
mm: Initialize MAX_ORDER_NR_PAGES at a time instead of doing larger sections
mm: Move hot-plug specific memory init into separate functions and optimize
mm: Add reserved flag setting to set_page_links
mm: Use common iterator for deferred_init_pages and deferred_free_pages
arch/sparc/include/asm/pgtable_64.h | 30 --
include/linux/memblock.h | 41 +++
include/linux/mm.h | 50 +++
mm/memblock.c | 64 ++++
mm/page_alloc.c | 571 +++++++++++++++++++++--------------
5 files changed, 498 insertions(+), 258 deletions(-)
--
[PATCH v5 1/7] libndctl: Use the supported_alignment attribute
by Oliver O'Halloran
Newer kernels provide the "supported_alignments" sysfs attribute that
indicates what alignments can be used with a PFN or DAX namespace. This
patch adds the plumbing inside libndctl to allow users to query this
information through:
ndctl_{dax|pfn}_get_supported_alignment(), and
ndctl_{dax|pfn}_get_num_alignments()
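The sketch below illustrates the parsing and bounds checking described in this patch, using a self-contained stand-in rather than libndctl itself. `struct alignments`, `parse_alignments()`, `get_supported_alignment()` and `MAX_ALIGNMENTS` are hypothetical; the 4KiB/2MiB fallback mirrors the patch's comment about kernels that lack the attribute. Unlike the patch, which returns `-EINVAL` from an `unsigned long` function, this sketch returns 0 for an out-of-range index and treats valid indices as `0 <= i < num`.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_ALIGNMENTS 8 /* hypothetical cap for this sketch */

/* Hypothetical stand-in for the per-device list of supported alignments. */
struct alignments {
	unsigned long supported[MAX_ALIGNMENTS];
	int num;
};

/*
 * Parse a space-separated list such as "4096 2097152". When the sysfs
 * attribute is missing (attr == NULL), fall back to 4KiB and 2MiB.
 */
static void parse_alignments(const char *attr, struct alignments *out)
{
	const char *p = attr ? attr : "4096 2097152";
	char *end;

	out->num = 0;
	while (out->num < MAX_ALIGNMENTS) {
		/* strtoul skips leading whitespace, including a trailing '\n' */
		unsigned long val = strtoul(p, &end, 0);

		if (end == p) /* no more numbers */
			break;
		out->supported[out->num++] = val;
		p = end;
	}
}

/* Return alignment i, or 0 when i is out of range (valid: 0 <= i < num). */
static unsigned long get_supported_alignment(const struct alignments *a, int i)
{
	if (i < 0 || i >= a->num)
		return 0;
	return a->supported[i];
}
```

A caller would loop `for (i = 0; i < a.num; i++)` and read each entry with `get_supported_alignment()`, much as it would pair ndctl_pfn_get_num_alignments() with ndctl_pfn_get_supported_alignment().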
Signed-off-by: Oliver O'Halloran <oohall(a)gmail.com>
---
v5: Fixed comment wording
v4: Changed return code of ndctl_pfn_get_supported_alignment from -1 to
-EINVAL.
Reworded comment about why we default to 4K and 2M alignments when
the sysfs attribute is missing.
Shuffled around prototypes in ndctl.h.
80 char compliance fixes.
rebased onto pending branch
v3: Changed the return type of the *_get_supported_alignment() functions
to unsigned long to match the existing *_get_alignment() functions.
---
ndctl/lib/libndctl.c | 43 ++++++++++++++++++++++++++++++++++++++++++
ndctl/lib/libndctl.sym | 4 ++++
ndctl/libndctl.h | 4 ++++
3 files changed, 51 insertions(+)
diff --git a/ndctl/lib/libndctl.c b/ndctl/lib/libndctl.c
index 830b791339d2..06f835d76117 100644
--- a/ndctl/lib/libndctl.c
+++ b/ndctl/lib/libndctl.c
@@ -31,6 +31,7 @@
#include <ccan/build_assert/build_assert.h>
#include <ndctl.h>
+#include <util/size.h>
#include <util/sysfs.h>
#include <ndctl/libndctl.h>
#include <ndctl/namespace.h>
@@ -237,6 +238,7 @@ struct ndctl_pfn {
int buf_len;
uuid_t uuid;
int id, generation;
+ struct ndctl_lbasize alignments;
};
struct ndctl_dax {
@@ -4814,6 +4816,19 @@ static void *__add_pfn(struct ndctl_pfn *pfn, const char *pfn_base)
else
pfn->size = strtoull(buf, NULL, 0);
+ /*
+ * The supported_alignments attribute was added before arches other
+ * than x86 had pmem support. If the kernel doesn't provide the
+ * attribute then it's safe to assume that we are running on x86 where
+ * 4KiB and 2MiB have always been supported.
+ */
+ sprintf(path, "%s/supported_alignments", pfn_base);
+ if (sysfs_read_attr(ctx, path, buf) < 0)
+ sprintf(buf, "%d %d", SZ_4K, SZ_2M);
+
+ if (parse_lbasize_supported(ctx, pfn_base, buf, &pfn->alignments) < 0)
+ goto err_read;
+
free(path);
return pfn;
@@ -5048,6 +5063,23 @@ NDCTL_EXPORT int ndctl_pfn_set_align(struct ndctl_pfn *pfn, unsigned long align)
return 0;
}
+NDCTL_EXPORT int ndctl_pfn_get_num_alignments(struct ndctl_pfn *pfn)
+{
+ return pfn->alignments.num;
+}
+
+NDCTL_EXPORT unsigned long ndctl_pfn_get_supported_alignment(
+ struct ndctl_pfn *pfn, int i)
+{
+ if (pfn->alignments.num == 0)
+ return 0;
+
+ if (i < 0 || i >= pfn->alignments.num)
+ return -EINVAL;
+ else
+ return pfn->alignments.supported[i];
+}
+
NDCTL_EXPORT int ndctl_pfn_set_namespace(struct ndctl_pfn *pfn,
struct ndctl_namespace *ndns)
{
@@ -5270,6 +5302,17 @@ NDCTL_EXPORT unsigned long ndctl_dax_get_align(struct ndctl_dax *dax)
return ndctl_pfn_get_align(&dax->pfn);
}
+NDCTL_EXPORT int ndctl_dax_get_num_alignments(struct ndctl_dax *dax)
+{
+ return ndctl_pfn_get_num_alignments(&dax->pfn);
+}
+
+NDCTL_EXPORT unsigned long ndctl_dax_get_supported_alignment(
+ struct ndctl_dax *dax, int i)
+{
+ return ndctl_pfn_get_supported_alignment(&dax->pfn, i);
+}
+
NDCTL_EXPORT int ndctl_dax_has_align(struct ndctl_dax *dax)
{
return ndctl_pfn_has_align(&dax->pfn);
diff --git a/ndctl/lib/libndctl.sym b/ndctl/lib/libndctl.sym
index 275db92ee103..a30a93e3c012 100644
--- a/ndctl/lib/libndctl.sym
+++ b/ndctl/lib/libndctl.sym
@@ -390,4 +390,8 @@ LIBNDCTL_19 {
global:
ndctl_cmd_xlat_firmware_status;
ndctl_cmd_submit_xlat;
+ ndctl_pfn_get_supported_alignment;
+ ndctl_pfn_get_num_alignments;
+ ndctl_dax_get_supported_alignment;
+ ndctl_dax_get_num_alignments;
} LIBNDCTL_18;
diff --git a/ndctl/libndctl.h b/ndctl/libndctl.h
index e55a5932781d..ac639b7d9142 100644
--- a/ndctl/libndctl.h
+++ b/ndctl/libndctl.h
@@ -597,7 +597,9 @@ int ndctl_pfn_set_uuid(struct ndctl_pfn *pfn, uuid_t uu);
void ndctl_pfn_get_uuid(struct ndctl_pfn *pfn, uuid_t uu);
int ndctl_pfn_has_align(struct ndctl_pfn *pfn);
int ndctl_pfn_set_align(struct ndctl_pfn *pfn, unsigned long align);
+int ndctl_pfn_get_num_alignments(struct ndctl_pfn *pfn);
unsigned long ndctl_pfn_get_align(struct ndctl_pfn *pfn);
+unsigned long ndctl_pfn_get_supported_alignment(struct ndctl_pfn *pfn, int i);
unsigned long long ndctl_pfn_get_resource(struct ndctl_pfn *pfn);
unsigned long long ndctl_pfn_get_size(struct ndctl_pfn *pfn);
int ndctl_pfn_set_namespace(struct ndctl_pfn *pfn, struct ndctl_namespace *ndns);
@@ -628,7 +630,9 @@ unsigned long long ndctl_dax_get_resource(struct ndctl_dax *dax);
int ndctl_dax_set_uuid(struct ndctl_dax *dax, uuid_t uu);
enum ndctl_pfn_loc ndctl_dax_get_location(struct ndctl_dax *dax);
int ndctl_dax_set_location(struct ndctl_dax *dax, enum ndctl_pfn_loc loc);
+int ndctl_dax_get_num_alignments(struct ndctl_dax *dax);
unsigned long ndctl_dax_get_align(struct ndctl_dax *dax);
+unsigned long ndctl_dax_get_supported_alignment(struct ndctl_dax *dax, int i);
int ndctl_dax_has_align(struct ndctl_dax *dax);
int ndctl_dax_set_align(struct ndctl_dax *dax, unsigned long align);
int ndctl_dax_set_namespace(struct ndctl_dax *dax,
--
2.20.1
[PATCH v3 1/2] nfit, mce: only handle uncorrectable machine checks
by Vishal Verma
The mce handler for 'nfit' devices is called for memory errors on a
Non-Volatile DIMM, and adds the error location to a 'badblocks' list.
This list is used by the various NVDIMM drivers to avoid consuming known
poison locations during IO.
The mce handler gets called for both corrected and uncorrectable errors.
Until now, both kinds of errors have been added to the badblocks list.
However, corrected memory errors indicate that the problem has already
been fixed by hardware, and the resulting interrupt is merely a
notification to Linux. As far as future accesses are concerned, that
location is perfectly fine to use, and thus doesn't need to be
included in the above badblocks list.
Add a check in the nfit mce handler to filter out corrected mce events,
and only process uncorrectable errors.
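Reduced to its logic, the new check in nfit_handle_mce() behaves like the predicate below. `struct mem_event` and `should_record_badblock()` are hypothetical simplifications; the real handler works on `struct mce` through mce_is_memory_error() and mce_is_correctable().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified event; the real handler inspects struct mce. */
struct mem_event {
	bool is_memory_error;
	bool is_correctable;
};

/* Only uncorrectable memory errors should be added to the badblocks list. */
static bool should_record_badblock(const struct mem_event *e)
{
	return e->is_memory_error && !e->is_correctable;
}
```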
Reported-by: Omar Avelar <omar.avelar(a)intel.com>
Fixes: 6839a6d96f4e ("nfit: do an ARS scrub on hitting a latent media error")
Cc: stable(a)vger.kernel.org
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: Tony Luck <tony.luck(a)intel.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Signed-off-by: Vishal Verma <vishal.l.verma(a)intel.com>
---
arch/x86/include/asm/mce.h | 1 +
arch/x86/kernel/cpu/mcheck/mce.c | 3 ++-
drivers/acpi/nfit/mce.c | 4 ++--
3 files changed, 5 insertions(+), 3 deletions(-)
v3: Unchanged from v2
diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 3a17107594c8..3111b3cee2ee 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -216,6 +216,7 @@ static inline int umc_normaddr_to_sysaddr(u64 norm_addr, u16 nid, u8 umc, u64 *s
int mce_available(struct cpuinfo_x86 *c);
bool mce_is_memory_error(struct mce *m);
+bool mce_is_correctable(struct mce *m);
DECLARE_PER_CPU(unsigned, mce_exception_count);
DECLARE_PER_CPU(unsigned, mce_poll_count);
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 953b3ce92dcc..27015948bc41 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -534,7 +534,7 @@ bool mce_is_memory_error(struct mce *m)
}
EXPORT_SYMBOL_GPL(mce_is_memory_error);
-static bool mce_is_correctable(struct mce *m)
+bool mce_is_correctable(struct mce *m)
{
if (m->cpuvendor == X86_VENDOR_AMD && m->status & MCI_STATUS_DEFERRED)
return false;
@@ -544,6 +544,7 @@ static bool mce_is_correctable(struct mce *m)
return true;
}
+EXPORT_SYMBOL_GPL(mce_is_correctable);
static bool cec_add_mce(struct mce *m)
{
diff --git a/drivers/acpi/nfit/mce.c b/drivers/acpi/nfit/mce.c
index e9626bf6ca29..7a51707f87e9 100644
--- a/drivers/acpi/nfit/mce.c
+++ b/drivers/acpi/nfit/mce.c
@@ -25,8 +25,8 @@ static int nfit_handle_mce(struct notifier_block *nb, unsigned long val,
struct acpi_nfit_desc *acpi_desc;
struct nfit_spa *nfit_spa;
- /* We only care about memory errors */
- if (!mce_is_memory_error(mce))
+ /* We only care about uncorrectable memory errors */
+ if (!mce_is_memory_error(mce) || mce_is_correctable(mce))
return NOTIFY_DONE;
/*
--
2.17.1
[PATCH 0/5] [v4] Allow persistent memory to be used like normal RAM
by Dave Hansen
v3 spurred a bunch of really good discussion. Thanks to everybody
that made comments and suggestions!
I would still love some Acks on this from the folks on cc, even if it
is on just the patch touching your area.
Note: these are based on commit d2f33c19644 in:
git://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm.git libnvdimm-pending
Changes since v3:
* Move HMM-related resource warning instead of removing it
* Use __request_resource() directly instead of devm.
* Create a separate DAX_PMEM Kconfig option, complete with help text
* Update patch descriptions and cover letter to give a better
overview of use-cases and hardware where this might be useful.
Changes since v2:
* Updates to dev_dax_kmem_probe() in patch 5:
  * Reject probes for devices with bad NUMA nodes. Keeps slow
    memory from being added to node 0.
  * Use raw request_mem_region()
  * Add comments about permanent reservation
  * Use dev_*() instead of printk()s
* Add references to nvdimm documentation in descriptions
* Remove unneeded GPL export
* Add Kconfig prompt and help text
Changes since v1:
* Now based on git://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm.git
* Use binding/unbinding from "dax bus" code
* Move over to a "dax bus" driver from being an nvdimm driver
--
Persistent memory is cool. But, currently, you have to rewrite
your applications to use it. Wouldn't it be cool if you could
just have it show up in your system like normal RAM and get to
it like a slow blob of memory? Well... have I got the patch
series for you!
== Background / Use Cases ==
Persistent Memory (aka Non-Volatile DIMMs / NVDIMMs) themselves
are described in detail in Documentation/nvdimm/nvdimm.txt.
However, this documentation focuses on actually using them as
storage. This set is focused on using NVDIMMs as a DRAM replacement.
This is intended for Intel-style NVDIMMs (aka Intel Optane DC
persistent memory). These DIMMs are physically persistent,
more akin to flash than traditional RAM. They are also expected to
be more cost-effective than RAM, which is why folks want this
set in the first place.
This set is not intended for RAM-based NVDIMMs. Those are not
cost-effective vs. plain RAM, and using them here would simply
be a waste.
But, why would you bother with this approach? Intel itself [1]
has announced a hardware feature that does something very similar:
"Memory Mode", which turns DRAM into a cache in front of persistent
memory, which as a whole is then used as normal "RAM".
Here are a few reasons:
1. The capacity of memory mode is the size of your persistent
memory that you dedicate. DRAM capacity is "lost" because it
is used for cache. With this, you get PMEM+DRAM capacity for
memory.
2. DRAM acts as a cache with memory mode, and caches can lead to
unpredictable latencies. Since memory mode is all-or-nothing
(either all your DRAM is used as cache or none is), your entire
memory space is exposed to these unpredictable latencies. This
solution lets you guarantee DRAM latencies if you need them.
3. The new "tier" of memory is exposed to software. That means
that you can build tiered applications or infrastructure. A
cloud provider could sell cheaper VMs that use more PMEM and
more expensive ones that use DRAM. That's impossible with
memory mode.
Don't take this as criticism of memory mode. Memory mode is
awesome, and doesn't strictly require *any* software changes (we
have software changes proposed for optimizing it though). It has
tons of other advantages over *this* approach. Basically, we
believe that the approach in these patches is complementary to
memory mode and that both can live side-by-side in harmony.
== Patch Set Overview ==
This series adds a new "driver" to which pmem devices can be
attached. Once attached, the memory "owned" by the device is
hot-added to the kernel and managed like any other memory. On
systems with an HMAT (a new ACPI table), each socket (roughly)
will have a separate NUMA node for its persistent memory so
this newly-added memory can be selected by its unique NUMA
node.
== Testing Overview ==
Here's how I set up a system to test this thing:
1. Boot qemu with lots of memory: "-m 4096", for instance
2. Reserve 512MB of physical memory. Reserving a spot at 2GB
physical seems to work: memmap=512M!0x0000000080000000
This will end up looking like a pmem device at boot.
3. When booted, convert the fsdax device to "device dax":
ndctl create-namespace -fe namespace0.0 -m dax
4. See patch 4 for instructions on binding the kmem driver
to a device.
5. Now, online the new memory sections. Perhaps:
grep ^MemTotal /proc/meminfo
for f in `grep -vl online /sys/devices/system/memory/*/state`; do
echo $f: `cat $f`
echo online_movable > $f
grep ^MemTotal /proc/meminfo
done
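For reference, the reservation in step 2 works out as follows: `memmap=512M!0x0000000080000000` reserves 512MiB starting at 0x80000000 (2GiB), so the range ends at 0xA0000000 (2.5GiB). The `struct memmap_range` and `memmap_end()` names below are hypothetical, used only to spell out the arithmetic.

```c
#include <assert.h>
#include <stdint.h>

#define MiB (1ULL << 20)
#define GiB (1ULL << 30)

/* Hypothetical representation of memmap=<size>!<start>: [start, start + size). */
struct memmap_range {
	uint64_t start;
	uint64_t size;
};

/* The reservation from step 2: memmap=512M!0x0000000080000000 */
static const struct memmap_range pmem_reservation = {
	.start = 0x0000000080000000ULL,
	.size = 512 * MiB,
};

/* First physical address past the reserved range. */
static uint64_t memmap_end(const struct memmap_range *r)
{
	return r->start + r->size;
}
```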
[1] https://itpeernetwork.intel.com/intel-optane-dc-persistent-memory-operati...
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: Dave Jiang <dave.jiang(a)intel.com>
Cc: Ross Zwisler <zwisler(a)kernel.org>
Cc: Vishal Verma <vishal.l.verma(a)intel.com>
Cc: Tom Lendacky <thomas.lendacky(a)amd.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: linux-nvdimm(a)lists.01.org
Cc: linux-kernel(a)vger.kernel.org
Cc: linux-mm(a)kvack.org
Cc: Huang Ying <ying.huang(a)intel.com>
Cc: Fengguang Wu <fengguang.wu(a)intel.com>
Cc: Borislav Petkov <bp(a)suse.de>
Cc: Bjorn Helgaas <bhelgaas(a)google.com>
Cc: Yaowei Bai <baiyaowei(a)cmss.chinamobile.com>
Cc: Takashi Iwai <tiwai(a)suse.de>
Cc: Jerome Glisse <jglisse(a)redhat.com>
[PATCH v2] nfit: add Hyper-V NVDIMM DSM command set to white list
by Dexuan Cui
Add the Hyper-V _DSM command set to the white list of NVDIMM command
sets.
This command set is documented at http://www.uefi.org/RFIC_LIST
(see "Virtual NVDIMM 0x1901").
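In the patch below, the Hyper-V family gets `dsm_mask = 0x1f`. Assuming, as the surrounding nfit code suggests, that bit i of the mask enables _DSM function i, 0x1f permits functions 0 through 4, while the MSFT family's 0xffffffff permits functions 0 through 31. A small sketch of that interpretation, with hypothetical helper names:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Assumed semantics: bit i set in mask => _DSM function i may be issued. */
static bool dsm_func_allowed(unsigned long mask, unsigned int func)
{
	/* guard the shift so out-of-range function numbers are simply rejected */
	return func < CHAR_BIT * sizeof(mask) && ((mask >> func) & 1UL);
}

/* Number of functions enabled by the mask (population count). */
static int dsm_func_count(unsigned long mask)
{
	int n = 0;

	for (; mask; mask &= mask - 1) /* clear lowest set bit each pass */
		n++;
	return n;
}
```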
Thanks Dan Williams <dan.j.williams(a)intel.com> for writing the
comment change.
Signed-off-by: Dexuan Cui <decui(a)microsoft.com>
Reviewed-by: Michael Kelley <mikelley(a)microsoft.com>
---
Changes in v2:
Updated the comment and changelog (Thanks, Dan!)
Rebased to the tag libnvdimm-fixes-5.0-rc4 of the nvdimm tree.
drivers/acpi/nfit/core.c | 17 ++++++++++++++---
drivers/acpi/nfit/nfit.h | 6 +++++-
include/uapi/linux/ndctl.h | 1 +
3 files changed, 20 insertions(+), 4 deletions(-)
diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
index e18ade5d74e9..a9270c99be72 100644
--- a/drivers/acpi/nfit/core.c
+++ b/drivers/acpi/nfit/core.c
@@ -1861,9 +1861,17 @@ static int acpi_nfit_add_dimm(struct acpi_nfit_desc *acpi_desc,
dev_set_drvdata(&adev_dimm->dev, nfit_mem);
/*
- * Until standardization materializes we need to consider 4
- * different command sets. Note, that checking for function0 (bit0)
- * tells us if any commands are reachable through this GUID.
+ * There are 4 "legacy" NVDIMM command sets
+ * (NVDIMM_FAMILY_{INTEL,MSFT,HPE1,HPE2}) that were created before
+ * an EFI working group was established to constrain this
+ * proliferation. The nfit driver probes for the supported command
+ * set by GUID. Note, if you're a platform developer looking to add
+ * a new command set to this probe, consider using an existing set,
+ * or otherwise seek approval to publish the command set at
+ * http://www.uefi.org/RFIC_LIST.
+ *
+ * Note, that checking for function0 (bit0) tells us if any commands
+ * are reachable through this GUID.
*/
for (i = 0; i <= NVDIMM_FAMILY_MAX; i++)
if (acpi_check_dsm(adev_dimm->handle, to_nfit_uuid(i), 1, 1))
@@ -1886,6 +1894,8 @@ static int acpi_nfit_add_dimm(struct acpi_nfit_desc *acpi_desc,
dsm_mask &= ~(1 << 8);
} else if (nfit_mem->family == NVDIMM_FAMILY_MSFT) {
dsm_mask = 0xffffffff;
+ } else if (nfit_mem->family == NVDIMM_FAMILY_HYPERV) {
+ dsm_mask = 0x1f;
} else {
dev_dbg(dev, "unknown dimm command family\n");
nfit_mem->family = -1;
@@ -3729,6 +3739,7 @@ static __init int nfit_init(void)
guid_parse(UUID_NFIT_DIMM_N_HPE1, &nfit_uuid[NFIT_DEV_DIMM_N_HPE1]);
guid_parse(UUID_NFIT_DIMM_N_HPE2, &nfit_uuid[NFIT_DEV_DIMM_N_HPE2]);
guid_parse(UUID_NFIT_DIMM_N_MSFT, &nfit_uuid[NFIT_DEV_DIMM_N_MSFT]);
+ guid_parse(UUID_NFIT_DIMM_N_HYPERV, &nfit_uuid[NFIT_DEV_DIMM_N_HYPERV]);
nfit_wq = create_singlethread_workqueue("nfit");
if (!nfit_wq)
diff --git a/drivers/acpi/nfit/nfit.h b/drivers/acpi/nfit/nfit.h
index 33691aecfcee..4de167b4f76f 100644
--- a/drivers/acpi/nfit/nfit.h
+++ b/drivers/acpi/nfit/nfit.h
@@ -34,11 +34,14 @@
/* https://msdn.microsoft.com/library/windows/hardware/mt604741 */
#define UUID_NFIT_DIMM_N_MSFT "1ee68b36-d4bd-4a1a-9a16-4f8e53d46e05"
+/* http://www.uefi.org/RFIC_LIST (see "Virtual NVDIMM 0x1901") */
+#define UUID_NFIT_DIMM_N_HYPERV "5746c5f2-a9a2-4264-ad0e-e4ddc9e09e80"
+
#define ACPI_NFIT_MEM_FAILED_MASK (ACPI_NFIT_MEM_SAVE_FAILED \
| ACPI_NFIT_MEM_RESTORE_FAILED | ACPI_NFIT_MEM_FLUSH_FAILED \
| ACPI_NFIT_MEM_NOT_ARMED | ACPI_NFIT_MEM_MAP_FAILED)
-#define NVDIMM_FAMILY_MAX NVDIMM_FAMILY_MSFT
+#define NVDIMM_FAMILY_MAX NVDIMM_FAMILY_HYPERV
#define NVDIMM_STANDARD_CMDMASK \
(1 << ND_CMD_SMART | 1 << ND_CMD_SMART_THRESHOLD | 1 << ND_CMD_DIMM_FLAGS \
@@ -94,6 +97,7 @@ enum nfit_uuids {
NFIT_DEV_DIMM_N_HPE1 = NVDIMM_FAMILY_HPE1,
NFIT_DEV_DIMM_N_HPE2 = NVDIMM_FAMILY_HPE2,
NFIT_DEV_DIMM_N_MSFT = NVDIMM_FAMILY_MSFT,
+ NFIT_DEV_DIMM_N_HYPERV = NVDIMM_FAMILY_HYPERV,
NFIT_SPA_VOLATILE,
NFIT_SPA_PM,
NFIT_SPA_DCR,
diff --git a/include/uapi/linux/ndctl.h b/include/uapi/linux/ndctl.h
index f57c9e434d2d..de5d90212409 100644
--- a/include/uapi/linux/ndctl.h
+++ b/include/uapi/linux/ndctl.h
@@ -243,6 +243,7 @@ struct nd_cmd_pkg {
#define NVDIMM_FAMILY_HPE1 1
#define NVDIMM_FAMILY_HPE2 2
#define NVDIMM_FAMILY_MSFT 3
+#define NVDIMM_FAMILY_HYPERV 4
#define ND_IOCTL_CALL _IOWR(ND_IOCTL, ND_CMD_CALL,\
struct nd_cmd_pkg)
--
2.19.1
[PATCH 1/2] libnvdimm, pfn: use size is enough
by Wei Yang
When checking whether the current nd_region intersects with others,
*size* has already been calculated to expand to SECTION size, so
passing size alone is enough.
Signed-off-by: Wei Yang <richardw.yang(a)linux.intel.com>
---
drivers/nvdimm/pfn_devs.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
index becf0bb481b3..5eca050b3660 100644
--- a/drivers/nvdimm/pfn_devs.c
+++ b/drivers/nvdimm/pfn_devs.c
@@ -686,7 +686,7 @@ static void trim_pfn_device(struct nd_pfn *nd_pfn, u32 *start_pad, u32 *end_trun
if (region_intersects(start, size, IORESOURCE_SYSTEM_RAM,
IORES_DESC_NONE) == REGION_MIXED
|| !IS_ALIGNED(end, nd_pfn->align)
- || nd_region_conflict(nd_region, start, size + adjust))
+ || nd_region_conflict(nd_region, start, size))
*end_trunc = end - phys_pmem_align_down(nd_pfn, end);
}
--
2.19.1
[PATCH v3 0/5] kvm "virtio pmem" device
by Pankaj Gupta
This patch series implements "virtio pmem".
"virtio pmem" is fake persistent memory (nvdimm) in the guest
which allows bypassing the guest page cache. It also
implements a VIRTIO-based asynchronous flush mechanism.
This patchset shares the guest kernel driver with the
changes suggested in v2. Tested with the Qemu-side device
emulation for virtio-pmem [6].
Details of the project idea for the 'virtio pmem' flushing interface
are shared in [3] & [4].
The implementation is divided into two parts:
a new virtio pmem guest driver and qemu code changes for the new
virtio pmem paravirtualized device.
1. Guest virtio-pmem kernel driver
---------------------------------
- Reads persistent memory range from paravirt device and
registers with 'nvdimm_bus'.
- The 'nvdimm/pmem' driver uses this information to allocate
a persistent memory region and set up filesystem operations
on the allocated memory.
- virtio pmem driver implements asynchronous flushing
interface to flush from guest to host.
2. Qemu virtio-pmem device
---------------------------------
- Creates virtio pmem device and exposes a memory range to
KVM guest.
- At host side this is file backed memory which acts as
persistent memory.
- Qemu-side flush uses aio thread pool APIs and virtio
for asynchronous handling of multiple guest requests.
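The changelog describes replacing a flag with a per-region flush callback that returns 0 or -EIO. A minimal synchronous sketch of that dispatch, assuming hypothetical names (`struct region`, `region_flush()`, `generic_flush()`, `provider_flush()`); the real virtio path is asynchronous and waits for the host to complete the flush.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified region with an optional provider flush callback. */
struct region {
	int (*flush)(struct region *r); /* NULL => use the generic flush */
	int generic_flushes;            /* counters make the dispatch observable */
	int provider_flushes;
	int fail;                       /* simulate a failed host-side flush */
};

static int generic_flush(struct region *r)
{
	r->generic_flushes++;
	return 0;
}

/* Stand-in for a paravirtual flush that asks the host to flush and may fail. */
static int provider_flush(struct region *r)
{
	r->provider_flushes++;
	return r->fail ? -EIO : 0;
}

/* Indirect call: prefer the region's own callback, else the generic path. */
static int region_flush(struct region *r)
{
	if (r->flush)
		return r->flush(r);
	return generic_flush(r);
}
```

Keeping the choice in a function pointer lets callers such as the pmem driver stay unaware of which backend is flushing, while still propagating -EIO when the host reports a failure.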
David Hildenbrand (CCed) also posted a modified version [6] of the
qemu virtio-pmem code based on the updated Qemu memory device API.
Virtio-pmem error handling:
----------------------------------------
Checked the behaviour of virtio-pmem for the types of errors below.
Need suggestions on the expected behaviour for handling these errors.
- Hardware Errors: Uncorrectable recoverable Errors:
a] virtio-pmem:
- As per the current logic, if the error page belongs to the Qemu
process, the host MCE handler isolates (hwpoisons) that page and
sends SIGBUS. The Qemu SIGBUS handler injects an exception into
the KVM guest.
- The KVM guest then isolates the page and sends SIGBUS to the guest
userspace process which has mapped the page.
b] Existing implementation for the ACPI pmem driver:
- Handles such errors with an MCE notifier and creates a list
of bad blocks. Read and direct-access DAX operations return EIO
if the accessed memory page falls in the bad block list.
- It also starts background scrubbing.
- Could similar functionality be reused in virtio-pmem with the MCE
notifier but without scrubbing (no ACPI/ARS)? Need inputs to
confirm whether this behaviour is OK or needs any change.
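The ACPI pmem behaviour described in b], failing any read that touches a known bad block with EIO, can be sketched as follows. The range list, `SECTOR_SIZE`, `range_is_bad()` and `pmem_read()` are hypothetical stand-ins, not the kernel's badblocks API.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SECTOR_SIZE 512

/* Hypothetical bad-block list entry: sectors [sector, sector + len). */
struct bad_range {
	unsigned long long sector;
	unsigned long long len;
};

/* True if [sector, sector + nsec) overlaps any known bad range. */
static bool range_is_bad(const struct bad_range *bb, size_t nbb,
			 unsigned long long sector, unsigned long long nsec)
{
	for (size_t i = 0; i < nbb; i++) {
		/* half-open interval overlap test */
		if (sector < bb[i].sector + bb[i].len &&
		    bb[i].sector < sector + nsec)
			return true;
	}
	return false;
}

/* Read from memory-backed media, refusing known-poisoned sectors. */
static int pmem_read(const unsigned char *media, const struct bad_range *bb,
		     size_t nbb, unsigned long long sector,
		     unsigned long long nsec, unsigned char *out)
{
	if (range_is_bad(bb, nbb, sector, nsec))
		return -EIO;
	memcpy(out, media + sector * SECTOR_SIZE, nsec * SECTOR_SIZE);
	return 0;
}
```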
Changes from PATCH v2: [1]
- Disable MAP_SYNC for ext4 & XFS filesystems - [Dan]
- Use name 'virtio pmem' in place of 'fake dax'
Changes from PATCH v1: [2]
- 0-day build test for build dependency on libnvdimm
Changes suggested by - [Dan Williams]
- Split the driver into two parts virtio & pmem
- Move queuing of async block request to block layer
- Add "sync" parameter in nvdimm_flush function
- Use indirect call for nvdimm_flush
- Don’t move declarations to common global header e.g nd.h
- nvdimm_flush() return 0 or -EIO if it fails
- Teach nsio_rw_bytes() that the flush can fail
- Rename nvdimm_flush() to generic_nvdimm_flush()
- Use 'nd_region->provider_data' for long dereferencing
- Remove virtio_pmem_freeze/restore functions
- Replace BSD license text with SPDX license text
- Add might_sleep() in virtio_pmem_flush - [Luiz]
- Make spin_lock_irqsave() narrow
Changes from RFC v3
- Rebase to latest upstream - Luiz
- Call ndregion->flush in place of nvdimm_flush - Luiz
- kmalloc return check - Luiz
- virtqueue full handling - Stefan
- Don't map entire virtio_pmem_req to device - Stefan
- request leak, correct sizeof req - Stefan
- Move declaration to virtio_pmem.c
Changes from RFC v2:
- Add flush function in the nd_region in place of switching
on a flag - Dan & Stefan
- Add flush completion function with proper locking and wait
for host side flush completion - Stefan & Dan
- Keep userspace API in uapi header file - Stefan, MST
- Use LE fields & New device id - MST
- Indentation & spacing suggestions - MST & Eric
- Remove extra header files & add licensing - Stefan
Changes from RFC v1:
- Reuse existing 'pmem' code for registering persistent
memory and other operations instead of creating an entirely
new block driver.
- Use VIRTIO driver to register memory information with
nvdimm_bus and create region_type accordingly.
- Call VIRTIO flush from existing pmem driver.
Pankaj Gupta (5):
libnvdimm: nd_region flush callback support
virtio-pmem: Add virtio-pmem guest driver
libnvdimm: add nd_region buffered dax_dev flag
ext4: disable map_sync for virtio pmem
xfs: disable map_sync for virtio pmem
[2] https://lkml.org/lkml/2018/8/31/407
[3] https://www.spinics.net/lists/kvm/msg149761.html
[4] https://www.spinics.net/lists/kvm/msg153095.html
[5] https://lkml.org/lkml/2018/8/31/413
[6] https://marc.info/?l=qemu-devel&m=153555721901824&w=2
drivers/acpi/nfit/core.c | 4 -
drivers/dax/super.c | 17 +++++
drivers/nvdimm/claim.c | 6 +
drivers/nvdimm/nd.h | 1
drivers/nvdimm/pmem.c | 15 +++-
drivers/nvdimm/region_devs.c | 45 +++++++++++++-
drivers/nvdimm/virtio_pmem.c | 84 ++++++++++++++++++++++++++
drivers/virtio/Kconfig | 10 +++
drivers/virtio/Makefile | 1
drivers/virtio/pmem.c | 125 +++++++++++++++++++++++++++++++++++++++
fs/ext4/file.c | 11 +++
fs/xfs/xfs_file.c | 8 ++
include/linux/dax.h | 9 ++
include/linux/libnvdimm.h | 11 +++
include/linux/virtio_pmem.h | 60 ++++++++++++++++++
include/uapi/linux/virtio_ids.h | 1
include/uapi/linux/virtio_pmem.h | 10 +++
17 files changed, 406 insertions(+), 12 deletions(-)
[PATCH v3 0/5] Optimize writecache when using pmem as cache
by Huaisheng Ye
From: Huaisheng Ye <yehs1(a)lenovo.com>
This patch set optimizes dm-writecache when persistent memory is used
as the cache data device.
Patches 1 and 2 remove an unused parameter and code that doesn't
actually work.
Patches 3 and 4 fix the constructor (ctr) failing with an invalid magic
or version, which happens when the pmem superblock contains stale data.
Patch 5 outputs seq_count within the status.
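A hedged sketch of the constructor check that patches 3 and 4 target: a superblock with an invalid magic or version fails unless the new optional `reinit` parameter asks for it to be rewritten. `SB_MAGIC`, `SB_VERSION`, `struct superblock` and `check_superblock()` are hypothetical, and treating an all-zero superblock as blank media is an assumption of this sketch rather than a statement about dm-writecache.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical values for this sketch; not dm-writecache's real constants. */
#define SB_MAGIC   0x57434143u
#define SB_VERSION 1u

struct superblock {
	uint32_t magic;
	uint32_t version;
	uint64_t seq_count;
};

/*
 * Validate the on-media superblock. Blank (all-zero) media, or an explicit
 * reinit request, gets a fresh superblock; other garbage fails the ctr.
 */
static int check_superblock(struct superblock *sb, bool reinit)
{
	bool blank = sb->magic == 0 && sb->version == 0;

	if (sb->magic == SB_MAGIC && sb->version == SB_VERSION)
		return 0; /* valid: keep existing cache contents */
	if (!blank && !reinit)
		return -EINVAL; /* stale data left on the pmem device */
	sb->magic = SB_MAGIC;
	sb->version = SB_VERSION;
	sb->seq_count = 0;
	return 0;
}
```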
Changes Since v2:
- seq_count is important for flush operations, output it within status
for debugging and analyzing code behavior.
Huaisheng Ye (5):
dm-writecache: remove unused size to writecache_flush_region
dm-writecache: get rid of memory_data flush to writecache_flush_entry
dm-writecache: expand pmem_reinit for struct dm_writecache
Documentation/device-mapper: add optional parameter reinit
dm-writecache: output seq_count within status
Documentation/device-mapper/writecache.txt | 4 ++++
drivers/md/dm-writecache.c | 23 +++++++++++++----------
2 files changed, 17 insertions(+), 10 deletions(-)
--
1.8.3.1