[PATCH v2 0/4] Remove nrexceptional tracking
by Matthew Wilcox (Oracle)
We actually use nrexceptional for very little these days. It's a minor
pain to keep in sync with nrpages, but the pain becomes much bigger
with the THP patches because we don't know how many indices a shadow
entry occupies. It's easier to just remove it than keep it accurate.
Also, we save 8 bytes per inode which is nothing to sneeze at; on my
laptop, it would improve shmem_inode_cache from 22 to 23 objects per
16kB, and inode_cache from 26 to 27 objects. Combined, that saves
a megabyte of memory from a combined usage of 25MB for both caches.
Unfortunately, ext4 doesn't cross a magic boundary, so it doesn't save
any memory for ext4.
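For reference, a minimal sketch of what the mapping_empty() helper added in
patch 1 likely boils down to -- an emptiness check on the page-cache XArray.
This is my own sketch, not quoted from the patch:

/* Sketch only: probable shape of the new helper in include/linux/pagemap.h */
static inline bool mapping_empty(struct address_space *mapping)
{
	return xa_empty(&mapping->i_pages);
}

Callers can then test mapping_empty(mapping) instead of comparing
nrpages + nrexceptional against zero.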
Matthew Wilcox (Oracle) (4):
mm: Introduce and use mapping_empty
mm: Stop accounting shadow entries
dax: Account DAX entries as nrpages
mm: Remove nrexceptional from inode
fs/block_dev.c | 2 +-
fs/dax.c | 8 ++++----
fs/gfs2/glock.c | 3 +--
fs/inode.c | 2 +-
include/linux/fs.h | 2 --
include/linux/pagemap.h | 5 +++++
mm/filemap.c | 16 ----------------
mm/swap_state.c | 4 ----
mm/truncate.c | 19 +++----------------
mm/workingset.c | 1 -
10 files changed, 15 insertions(+), 47 deletions(-)
--
2.28.0
[RFC 0/2] virtio-pmem: Asynchronous flush
by Pankaj Gupta
Jeff reported a preflush ordering issue with the existing implementation
of virtio pmem preflush. Dan suggested[1] implementing an asynchronous flush
for virtio pmem using a work queue, as done in md/RAID. This patch series
intends to solve the preflush ordering issue and also makes the flush
asynchronous from the submitting thread's POV.
Submitting this patch series for feedback; it is still a work in progress.
I have done basic testing and am currently doing more testing.
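For illustration, a rough sketch of what a workqueue-based flush could look
like. The request structure and function names below are placeholders I made
up, not the actual series code; only virtio_pmem_flush() is an existing
helper in drivers/nvdimm/nd_virtio.c:

/* Illustration only: complete a preflush bio from a work item so the
 * blocking host flush runs off the submitting thread. */
struct vpmem_flush_req {
	struct work_struct work;
	struct bio *bio;
	struct nd_region *nd_region;
};

static void vpmem_do_flush(struct work_struct *w)
{
	struct vpmem_flush_req *req =
		container_of(w, struct vpmem_flush_req, work);
	int err;

	err = virtio_pmem_flush(req->nd_region);	/* blocking host flush */
	req->bio->bi_status = errno_to_blk_status(err);
	bio_endio(req->bio);
	kfree(req);
}

The submitting path would then only allocate a request, INIT_WORK() it with
vpmem_do_flush() and queue_work() it, returning immediately to the caller.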
Pankaj Gupta (2):
pmem: make nvdimm_flush asynchronous
virtio_pmem: Async virtio-pmem flush
drivers/nvdimm/nd_virtio.c | 66 ++++++++++++++++++++++++++----------
drivers/nvdimm/pmem.c | 15 ++++----
drivers/nvdimm/region_devs.c | 3 +-
drivers/nvdimm/virtio_pmem.c | 9 +++++
drivers/nvdimm/virtio_pmem.h | 12 +++++++
5 files changed, 78 insertions(+), 27 deletions(-)
[1] https://marc.info/?l=linux-kernel&m=157446316409937&w=2
--
2.20.1
[PATCH v2] nvdimm: Avoid race between probe and reading device attributes
by Richard Palethorpe
It is possible to cause a division error and use-after-free by querying the
nmem device before the driver data is fully initialised in nvdimm_probe,
e.g. by doing:
(while true; do
cat /sys/bus/nd/devices/nmem*/available_slots 2>&1 > /dev/null
done) &
while true; do
for i in $(seq 0 4); do
echo nmem$i > /sys/bus/nd/drivers/nvdimm/bind
done
for i in $(seq 0 4); do
echo nmem$i > /sys/bus/nd/drivers/nvdimm/unbind
done
done
On 5.7-rc3 this causes:
[ 12.711578] divide error: 0000 [#1] SMP KASAN PTI
[ 12.712321] CPU: 0 PID: 231 Comm: cat Not tainted 5.7.0-rc3 #48
[ 12.713188] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
[ 12.714857] RIP: 0010:nd_label_nfree+0x134/0x1a0 [libnvdimm]
[ 12.715772] Code: ba 00 00 00 00 00 fc ff df 48 89 f9 48 c1 e9 03 0f b6 14 11 84 d2 74 05 80 fa 03 7e 52 8b 73 08 31 d2 89 c1 48 83 c4 08 5b 5d <f7> f6 31 d2 41 5c 83 c0 07 c1 e8 03 48 8d 84 00 8e 02 00 00 25 00
[ 12.718311] RSP: 0018:ffffc9000046fd08 EFLAGS: 00010282
[ 12.719030] RAX: 0000000000000000 RBX: ffffffffc0073aa0 RCX: 0000000000000000
[ 12.720005] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff888060931808
[ 12.720970] RBP: ffff88806609d018 R08: 0000000000000001 R09: ffffed100cc0a2b1
[ 12.721889] R10: ffff888066051587 R11: ffffed100cc0a2b0 R12: ffff888060931800
[ 12.722744] R13: ffff888064362000 R14: ffff88806609d018 R15: ffffffff8b1a2520
[ 12.723602] FS: 00007fd16f3d5580(0000) GS:ffff88806b400000(0000) knlGS:0000000000000000
[ 12.724600] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 12.725308] CR2: 00007fd16f1ec000 CR3: 0000000064322006 CR4: 0000000000160ef0
[ 12.726268] Call Trace:
[ 12.726633] available_slots_show+0x4e/0x120 [libnvdimm]
[ 12.727380] dev_attr_show+0x42/0x80
[ 12.727891] ? memset+0x20/0x40
[ 12.728341] sysfs_kf_seq_show+0x218/0x410
[ 12.728923] seq_read+0x389/0xe10
[ 12.729415] vfs_read+0x101/0x2d0
[ 12.729891] ksys_read+0xf9/0x1d0
[ 12.730361] ? kernel_write+0x120/0x120
[ 12.730915] do_syscall_64+0x95/0x4a0
[ 12.731435] entry_SYSCALL_64_after_hwframe+0x49/0xb3
[ 12.732163] RIP: 0033:0x7fd16f2fe4be
[ 12.732685] Code: c0 e9 c6 fe ff ff 50 48 8d 3d 2e 12 0a 00 e8 69 e9 01 00 66 0f 1f 84 00 00 00 00 00 64 8b 04 25 18 00 00 00 85 c0 75 14 0f 05 <48> 3d 00 f0 ff ff 77 5a c3 66 0f 1f 84 00 00 00 00 00 48 83 ec 28
[ 12.735207] RSP: 002b:00007ffd3177b838 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 12.736261] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007fd16f2fe4be
[ 12.737233] RDX: 0000000000020000 RSI: 00007fd16f1ed000 RDI: 0000000000000003
[ 12.738203] RBP: 00007fd16f1ed000 R08: 00007fd16f1ec010 R09: 0000000000000000
[ 12.739172] R10: 00007fd16f3f4f70 R11: 0000000000000246 R12: 00007ffd3177ce23
[ 12.740144] R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000020000
[ 12.741139] Modules linked in: nfit libnvdimm
[ 12.741783] ---[ end trace 99532e4b82410044 ]---
[ 12.742452] RIP: 0010:nd_label_nfree+0x134/0x1a0 [libnvdimm]
[ 12.743167] Code: ba 00 00 00 00 00 fc ff df 48 89 f9 48 c1 e9 03 0f b6 14 11 84 d2 74 05 80 fa 03 7e 52 8b 73 08 31 d2 89 c1 48 83 c4 08 5b 5d <f7> f6 31 d2 41 5c 83 c0 07 c1 e8 03 48 8d 84 00 8e 02 00 00 25 00
[ 12.745709] RSP: 0018:ffffc9000046fd08 EFLAGS: 00010282
[ 12.746340] RAX: 0000000000000000 RBX: ffffffffc0073aa0 RCX: 0000000000000000
[ 12.747209] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff888060931808
[ 12.748081] RBP: ffff88806609d018 R08: 0000000000000001 R09: ffffed100cc0a2b1
[ 12.748977] R10: ffff888066051587 R11: ffffed100cc0a2b0 R12: ffff888060931800
[ 12.749849] R13: ffff888064362000 R14: ffff88806609d018 R15: ffffffff8b1a2520
[ 12.750729] FS: 00007fd16f3d5580(0000) GS:ffff88806b400000(0000) knlGS:0000000000000000
[ 12.751708] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 12.752441] CR2: 00007fd16f1ec000 CR3: 0000000064322006 CR4: 0000000000160ef0
[ 12.821357] ==================================================================
[ 12.822284] BUG: KASAN: use-after-free in __mutex_lock+0x111c/0x11a0
[ 12.823084] Read of size 4 at addr ffff888065c26238 by task reproducer/218
[ 12.823968]
[ 12.824183] CPU: 2 PID: 218 Comm: reproducer Tainted: G D 5.7.0-rc3 #48
[ 12.825167] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
[ 12.826595] Call Trace:
[ 12.826926] dump_stack+0x97/0xe0
[ 12.827362] print_address_description.constprop.0+0x1b/0x210
[ 12.828111] ? __mutex_lock+0x111c/0x11a0
[ 12.828645] __kasan_report.cold+0x37/0x92
[ 12.829179] ? __mutex_lock+0x111c/0x11a0
[ 12.829706] kasan_report+0x38/0x50
[ 12.830158] __mutex_lock+0x111c/0x11a0
[ 12.830666] ? ftrace_graph_stop+0x10/0x10
[ 12.831193] ? is_nvdimm_bus+0x40/0x40 [libnvdimm]
[ 12.831820] ? mutex_trylock+0x2b0/0x2b0
[ 12.832333] ? nvdimm_probe+0x259/0x420 [libnvdimm]
[ 12.832975] ? mutex_trylock+0x2b0/0x2b0
[ 12.833500] ? nvdimm_probe+0x259/0x420 [libnvdimm]
[ 12.834122] ? prepare_ftrace_return+0xa1/0xf0
[ 12.834724] ? ftrace_graph_caller+0x6b/0xa0
[ 12.835269] ? acpi_label_write+0x390/0x390 [nfit]
[ 12.835909] ? nvdimm_probe+0x259/0x420 [libnvdimm]
[ 12.836558] ? nvdimm_probe+0x259/0x420 [libnvdimm]
[ 12.837179] nvdimm_probe+0x259/0x420 [libnvdimm]
[ 12.837802] nvdimm_bus_probe+0x110/0x6b0 [libnvdimm]
[ 12.838470] really_probe+0x212/0x9a0
[ 12.838954] driver_probe_device+0x1cd/0x300
[ 12.839511] ? driver_probe_device+0x5/0x300
[ 12.840063] device_driver_attach+0xe7/0x120
[ 12.840623] bind_store+0x18d/0x230
[ 12.841075] kernfs_fop_write+0x200/0x420
[ 12.841606] vfs_write+0x154/0x450
[ 12.842047] ksys_write+0xf9/0x1d0
[ 12.842497] ? __ia32_sys_read+0xb0/0xb0
[ 12.843010] do_syscall_64+0x95/0x4a0
[ 12.843495] entry_SYSCALL_64_after_hwframe+0x49/0xb3
[ 12.844140] RIP: 0033:0x7f5b235d3563
[ 12.844607] Code: 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 55 c3 0f 1f 40 00 48 83 ec 28 48 89 54 24 18
[ 12.846877] RSP: 002b:00007fff1c3bc578 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 12.847822] RAX: ffffffffffffffda RBX: 0000000000000006 RCX: 00007f5b235d3563
[ 12.848717] RDX: 0000000000000006 RSI: 000055f9576710d0 RDI: 0000000000000001
[ 12.849594] RBP: 000055f9576710d0 R08: 000000000000000a R09: 0000000000000000
[ 12.850470] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000006
[ 12.851333] R13: 00007f5b236a3500 R14: 0000000000000006 R15: 00007f5b236a3700
[ 12.852247]
[ 12.852466] Allocated by task 225:
[ 12.852893] save_stack+0x1b/0x40
[ 12.853310] __kasan_kmalloc.constprop.0+0xc2/0xd0
[ 12.853918] kmem_cache_alloc_node+0xef/0x270
[ 12.854475] copy_process+0x485/0x6130
[ 12.854945] _do_fork+0xf1/0xb40
[ 12.855353] __do_sys_clone+0xc3/0x100
[ 12.855843] do_syscall_64+0x95/0x4a0
[ 12.856302] entry_SYSCALL_64_after_hwframe+0x49/0xb3
[ 12.856939]
[ 12.857140] Freed by task 0:
[ 12.857522] save_stack+0x1b/0x40
[ 12.857940] __kasan_slab_free+0x12c/0x170
[ 12.858464] kmem_cache_free+0xb0/0x330
[ 12.858945] rcu_core+0x55f/0x19f0
[ 12.859385] __do_softirq+0x228/0x944
[ 12.859869]
[ 12.860075] The buggy address belongs to the object at ffff888065c26200
[ 12.860075] which belongs to the cache task_struct of size 6016
[ 12.861638] The buggy address is located 56 bytes inside of
[ 12.861638] 6016-byte region [ffff888065c26200, ffff888065c27980)
[ 12.863084] The buggy address belongs to the page:
[ 12.863702] page:ffffea0001970800 refcount:1 mapcount:0 mapping:0000000021ee3712 index:0x0 head:ffffea0001970800 order:3 compound_mapcount:0 compound_pincount:0
[ 12.865478] flags: 0x80000000010200(slab|head)
[ 12.866039] raw: 0080000000010200 0000000000000000 0000000100000001 ffff888066c0f980
[ 12.867010] raw: 0000000000000000 0000000080050005 00000001ffffffff 0000000000000000
[ 12.867986] page dumped because: kasan: bad access detected
[ 12.868696]
[ 12.868900] Memory state around the buggy address:
[ 12.869514] ffff888065c26100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 12.870414] ffff888065c26180: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 12.871318] >ffff888065c26200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 12.872238] ^
[ 12.872870] ffff888065c26280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 12.873754] ffff888065c26300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 12.874640]
==================================================================
This can be prevented by setting the driver data after initialisation is
complete.
Fixes: 4d88a97aa9e8 ("libnvdimm, nvdimm: dimm driver and base libnvdimm device-driver infrastructure")
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: Vishal Verma <vishal.l.verma(a)intel.com>
Cc: Dave Jiang <dave.jiang(a)intel.com>
Cc: Ira Weiny <ira.weiny(a)intel.com>
Cc: linux-nvdimm(a)lists.01.org
Cc: linux-kernel(a)vger.kernel.org
Cc: Coly Li <colyli(a)suse.com>
Signed-off-by: Richard Palethorpe <rpalethorpe(a)suse.com>
---
V2:
+ Reviewed by Coly and removed unnecessary lock
drivers/nvdimm/dimm.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/nvdimm/dimm.c b/drivers/nvdimm/dimm.c
index 7d4ddc4d9322..3d3988e1d9a0 100644
--- a/drivers/nvdimm/dimm.c
+++ b/drivers/nvdimm/dimm.c
@@ -43,7 +43,6 @@ static int nvdimm_probe(struct device *dev)
if (!ndd)
return -ENOMEM;
- dev_set_drvdata(dev, ndd);
ndd->dpa.name = dev_name(dev);
ndd->ns_current = -1;
ndd->ns_next = -1;
@@ -106,6 +105,8 @@ static int nvdimm_probe(struct device *dev)
if (rc)
goto err;
+ dev_set_drvdata(dev, ndd);
+
return 0;
err:
--
2.26.2
[PATCH 1/1] ndctl/namespace: Fix disable-namespace accounting relative to seed devices
by Redhairer Li
Seed namespaces are included in "ndctl disable-namespace all". However
since the user never "creates" them it is surprising to see
"disable-namespace" report 1 more namespace relative to the number that
have been created. Catch attempts to disable a zero-sized namespace:
Before:
{
"dev":"namespace1.0",
"size":"492.00 MiB (515.90 MB)",
"blockdev":"pmem1"
}
{
"dev":"namespace1.1",
"size":"492.00 MiB (515.90 MB)",
"blockdev":"pmem1.1"
}
{
"dev":"namespace1.2",
"size":"492.00 MiB (515.90 MB)",
"blockdev":"pmem1.2"
}
disabled 4 namespaces
After:
{
"dev":"namespace1.0",
"size":"492.00 MiB (515.90 MB)",
"blockdev":"pmem1"
}
{
"dev":"namespace1.3",
"size":"492.00 MiB (515.90 MB)",
"blockdev":"pmem1.3"
}
{
"dev":"namespace1.1",
"size":"492.00 MiB (515.90 MB)",
"blockdev":"pmem1.1"
}
disabled 3 namespaces
Signed-off-by: Redhairer Li <redhairer.li(a)intel.com>
---
ndctl/lib/libndctl.c | 11 ++++++++---
ndctl/region.c | 4 +++-
2 files changed, 11 insertions(+), 4 deletions(-)
diff --git a/ndctl/lib/libndctl.c b/ndctl/lib/libndctl.c
index ee737cb..49f362b 100644
--- a/ndctl/lib/libndctl.c
+++ b/ndctl/lib/libndctl.c
@@ -4231,6 +4231,7 @@ NDCTL_EXPORT int ndctl_namespace_disable_safe(struct ndctl_namespace *ndns)
const char *bdev = NULL;
char path[50];
int fd;
+ unsigned long long size = ndctl_namespace_get_size(ndns);
if (pfn && ndctl_pfn_is_enabled(pfn))
bdev = ndctl_pfn_get_block_device(pfn);
@@ -4260,9 +4261,13 @@ NDCTL_EXPORT int ndctl_namespace_disable_safe(struct ndctl_namespace *ndns)
devname, bdev, strerror(errno));
return -errno;
}
- } else
- ndctl_namespace_disable_invalidate(ndns);
-
+ } else {
+ if (size == 0)
+ /* Don't try to disable idle namespace (no capacity allocated) */
+ return -ENXIO;
+ else
+ ndctl_namespace_disable_invalidate(ndns);
+ }
return 0;
}
diff --git a/ndctl/region.c b/ndctl/region.c
index 7945007..0014bb9 100644
--- a/ndctl/region.c
+++ b/ndctl/region.c
@@ -72,6 +72,7 @@ static int region_action(struct ndctl_region *region, enum device_action mode)
{
struct ndctl_namespace *ndns;
int rc = 0;
+ unsigned long long size;
switch (mode) {
case ACTION_ENABLE:
@@ -80,7 +81,8 @@ static int region_action(struct ndctl_region *region, enum device_action mode)
case ACTION_DISABLE:
ndctl_namespace_foreach(region, ndns) {
rc = ndctl_namespace_disable_safe(ndns);
- if (rc)
+ size = ndctl_namespace_get_size(ndns);
+ if (rc && size != 0)
return rc;
}
rc = ndctl_region_disable_invalidate(region);
--
2.20.1.windows.1
[PATCH ndctl v1 0/8] daxctl: Add device align and range mapping allocation
by Joao Martins
Hey,
This series builds on top of this one[0] and makes the following improvements
to the Soft-Reserved subdivision:
1) Support for {create,reconfigure}-device for selecting @align (hugepage size).
Here we add a '-a|--align 4K|2M|1G' option to the existing commands;
2) Listing improvements for device alignment and mappings;
Note: Perhaps it is better to hide the mappings by default, and only
print with -v|--verbose. This would align with ndctl, as the mappings
info can be quite large.
3) Allow creating devices by selecting ranges. This allows keeping the
same GPA->HPA mapping as before we kexec the hypervisor with running guests:
daxctl list -d dax0.1 > /var/log/dax0.1.json
kexec -d -l bzImage
systemctl kexec
daxctl create -u --restore /var/log/dax0.1.json
JSON was what I thought would be easiest for a user, given that it is
the data format daxctl outputs. An alternative could be adding multiple:
--mapping <pgoff>:<start>-<end>
But that could end up as a gigantic line and be a little more
unmanageable, I think.
This series requires this series[0] on top of Dan's patches[1]:
[0] https://lore.kernel.org/linux-nvdimm/20200716172913.19658-1-joao.m.martin...
[1] https://lore.kernel.org/linux-nvdimm/159457116473.754248.7879464730875147...
The remaining TODOs here are docs, improving the tests to validate mappings,
and testing the restore path.
Suggestions/comments are welcome.
Joao
Joao Martins (8):
daxctl: add daxctl_dev_{get,set}_align()
util/json: Print device align
daxctl: add align support in reconfigure-device
daxctl: add align support in create-device
libdaxctl: add mapping iterator APIs
daxctl: include mappings when listing
libdaxctl: add daxctl_dev_set_mapping()
daxctl: Allow restore devices from JSON metadata
daxctl/device.c | 154 +++++++++++++++++++++++++++++++++++++++--
daxctl/lib/libdaxctl-private.h | 9 +++
daxctl/lib/libdaxctl.c | 152 +++++++++++++++++++++++++++++++++++++++-
daxctl/lib/libdaxctl.sym | 9 +++
daxctl/libdaxctl.h | 16 +++++
util/json.c | 63 ++++++++++++++++-
util/json.h | 3 +
7 files changed, 396 insertions(+), 10 deletions(-)
--
1.8.3.1
[PATCH ndctl v2 00/10] daxctl: Support for sub-dividing soft-reserved regions
by Joao Martins
Changes since v1:
* Add a Documentation/daxctl/ entry for each patch that adds commands or new
options.
* Fix functional test suite to only change region 0 and not touch others
* Fix reconfigure-device -s changes (third patch) for better bisection.
v1: https://lore.kernel.org/linux-nvdimm/20200403205900.18035-1-joao.m.martin...
---
This series introduces the daxctl support for sub-dividing soft-reserved
regions created by EFI/HMAT/efi_fake_mem. It's the userspace counterpart
of this recent patch series [0].
These new 'dynamic' regions can be partitioned into multiple different devices,
whose subdivisions can consist of one or more ranges. This
is in contrast to static dax regions -- created with ndctl-create-namespace
-m devdax -- which can be neither subdivided nor discontiguous.
See also cover-letter of [0].
The daxctl changes in these patches are depicted as:
* {create,destroy,disable,enable}-device:
These orchestrate/manage the sub-division devices.
They mimic the equivalent namespace commands.
* Allow reconfigure-device to change the size of an existing *dynamic* dax
device.
* Add test coverage (Tried to cover all range allocation code paths).
v2 of kernel patches now passes this test suite.
* Documentation regarding the new command additions.
[0] "device-dax: Support sub-dividing soft-reserved ranges",
https://lore.kernel.org/linux-nvdimm/159457116473.754248.7879464730875147...
Dan Williams (1):
daxctl: Cleanup whitespace
Joao Martins (9):
libdaxctl: add daxctl_dev_set_size()
daxctl: add resize support in reconfigure-device
daxctl: add command to disable devdax device
daxctl: add command to enable devdax device
libdaxctl: add daxctl_region_create_dev()
daxctl: add command to create device
libdaxctl: add daxctl_region_destroy_dev()
daxctl: add command to destroy device
daxctl/test: Add tests for dynamic dax regions
Documentation/daxctl/Makefile.am | 6 +-
Documentation/daxctl/daxctl-create-device.txt | 105 +++++++
Documentation/daxctl/daxctl-destroy-device.txt | 63 +++++
Documentation/daxctl/daxctl-disable-device.txt | 58 ++++
Documentation/daxctl/daxctl-enable-device.txt | 59 ++++
Documentation/daxctl/daxctl-reconfigure-device.txt | 16 ++
daxctl/builtin.h | 4 +
daxctl/daxctl.c | 4 +
daxctl/device.c | 310 ++++++++++++++++++++-
daxctl/lib/libdaxctl.c | 67 +++++
daxctl/lib/libdaxctl.sym | 7 +
daxctl/libdaxctl.h | 3 +
test/Makefile.am | 1 +
test/daxctl-create.sh | 294 +++++++++++++++++++
util/filter.c | 2 +-
15 files changed, 993 insertions(+), 6 deletions(-)
create mode 100644 Documentation/daxctl/daxctl-create-device.txt
create mode 100644 Documentation/daxctl/daxctl-destroy-device.txt
create mode 100644 Documentation/daxctl/daxctl-disable-device.txt
create mode 100644 Documentation/daxctl/daxctl-enable-device.txt
create mode 100755 test/daxctl-create.sh
--
1.8.3.1
[PATCH RFC v3] testing/nvdimm: Add test module for non-nfit platforms
by Santosh Sivaraj
The current test module cannot be used for testing platforms (make check)
that do not have support for NFIT. In order to get the ndctl tests working,
we need a module which can emulate NVDIMM devices without relying on
ACPI/NFIT.
The aim of this proposed module is to implement a similar functionality to the
existing module but without the ACPI dependencies. Currently interleaving and
error injection are not implemented.
Corresponding changes for ndctl are also required, to skip tests that depend
on nfit attributes; they will be sent as a reply to this.
Signed-off-by: Santosh Sivaraj <santosh(a)fossix.org>
---
tools/testing/nvdimm/config_check.c | 3 +-
tools/testing/nvdimm/test/Kbuild | 6 +-
tools/testing/nvdimm/test/ndtest.c | 819 ++++++++++++++++++++++++++++
tools/testing/nvdimm/test/ndtest.h | 65 +++
4 files changed, 891 insertions(+), 2 deletions(-)
create mode 100644 tools/testing/nvdimm/test/ndtest.c
create mode 100644 tools/testing/nvdimm/test/ndtest.h
diff --git a/tools/testing/nvdimm/config_check.c b/tools/testing/nvdimm/config_check.c
index cac891028cd1..3e3a5f518864 100644
--- a/tools/testing/nvdimm/config_check.c
+++ b/tools/testing/nvdimm/config_check.c
@@ -12,7 +12,8 @@ void check(void)
BUILD_BUG_ON(!IS_MODULE(CONFIG_ND_BTT));
BUILD_BUG_ON(!IS_MODULE(CONFIG_ND_PFN));
BUILD_BUG_ON(!IS_MODULE(CONFIG_ND_BLK));
- BUILD_BUG_ON(!IS_MODULE(CONFIG_ACPI_NFIT));
+ if (IS_ENABLED(CONFIG_ACPI_NFIT))
+ BUILD_BUG_ON(!IS_MODULE(CONFIG_ACPI_NFIT));
BUILD_BUG_ON(!IS_MODULE(CONFIG_DEV_DAX));
BUILD_BUG_ON(!IS_MODULE(CONFIG_DEV_DAX_PMEM));
}
diff --git a/tools/testing/nvdimm/test/Kbuild b/tools/testing/nvdimm/test/Kbuild
index 75baebf8f4ba..197bcb2b7f35 100644
--- a/tools/testing/nvdimm/test/Kbuild
+++ b/tools/testing/nvdimm/test/Kbuild
@@ -5,5 +5,9 @@ ccflags-y += -I$(srctree)/drivers/acpi/nfit/
obj-m += nfit_test.o
obj-m += nfit_test_iomap.o
-nfit_test-y := nfit.o
+ifeq ($(CONFIG_ACPI_NFIT),m)
+ nfit_test-y := nfit.o
+else
+ nfit_test-y := ndtest.o
+endif
nfit_test_iomap-y := iomap.o
diff --git a/tools/testing/nvdimm/test/ndtest.c b/tools/testing/nvdimm/test/ndtest.c
new file mode 100644
index 000000000000..415a40345584
--- /dev/null
+++ b/tools/testing/nvdimm/test/ndtest.c
@@ -0,0 +1,819 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include <linux/platform_device.h>
+#include <linux/device.h>
+#include <linux/module.h>
+#include <linux/genalloc.h>
+#include <linux/vmalloc.h>
+#include <linux/dma-mapping.h>
+#include <linux/list_sort.h>
+#include <linux/libnvdimm.h>
+#include <linux/ndctl.h>
+#include <nd-core.h>
+#include <linux/printk.h>
+
+#include "../watermark.h"
+#include "nfit_test.h"
+#include "ndtest.h"
+
+enum {
+ DIMM_SIZE = SZ_32M,
+ LABEL_SIZE = SZ_128K,
+ NUM_INSTANCES = 2,
+ NUM_DCR = 4,
+};
+
+#define NFIT_DIMM_HANDLE(node, socket, imc, chan, dimm) \
+ (((node & 0xfff) << 16) | ((socket & 0xf) << 12) \
+ | ((imc & 0xf) << 8) | ((chan & 0xf) << 4) | (dimm & 0xf))
+
+static struct ndtest_dimm dimm_group1[] = {
+ {
+ .type = NDTEST_REGION_TYPE_BLK | NDTEST_REGION_TYPE_PMEM,
+ .size = DIMM_SIZE,
+ .handle = NFIT_DIMM_HANDLE(0, 0, 0, 0, 0),
+ .uuid_str = "1e5c75d2-b618-11ea-9aa3-507b9ddc0f72",
+ .physical_id = 0,
+ },
+ {
+ .type = NDTEST_REGION_TYPE_PMEM,
+ .size = DIMM_SIZE,
+ .handle = NFIT_DIMM_HANDLE(0, 0, 0, 0, 1),
+ .uuid_str = "1c4d43ac-b618-11ea-be80-507b9ddc0f72",
+ .physical_id = 1,
+ },
+ {
+ .type = NDTEST_REGION_TYPE_PMEM,
+ .size = DIMM_SIZE * 2,
+ .handle = NFIT_DIMM_HANDLE(0, 0, 1, 0, 0),
+ .uuid_str = "a9f17ffc-b618-11ea-b36d-507b9ddc0f72",
+ .physical_id = 2,
+ },
+ {
+ .type = NDTEST_REGION_TYPE_BLK,
+ .size = DIMM_SIZE,
+ .handle = NFIT_DIMM_HANDLE(0, 0, 1, 0, 1),
+ .uuid_str = "b6b83b22-b618-11ea-8aae-507b9ddc0f72",
+ .physical_id = 3,
+ },
+ {
+ .type = NDTEST_REGION_TYPE_PMEM,
+ .size = DIMM_SIZE,
+ .handle = NFIT_DIMM_HANDLE(0, 1, 0, 0, 0),
+ .uuid_str = "bf9baaee-b618-11ea-b181-507b9ddc0f72",
+ .physical_id = 4,
+ },
+};
+
+static struct ndtest_dimm dimm_group2[] = {
+ {
+ .type = NDTEST_REGION_TYPE_PMEM,
+ .size = DIMM_SIZE,
+ .handle = NFIT_DIMM_HANDLE(1, 0, 0, 0, 0),
+ .uuid_str = "ca0817e2-b618-11ea-9db3-507b9ddc0f72",
+ .physical_id = 0,
+ },
+};
+
+static struct ndtest_config bus_configs[NUM_INSTANCES] = {
+ /* bus 1 */
+ {
+ .dimm_start = 0,
+ .dimm_count = ARRAY_SIZE(dimm_group1),
+ .dimm = dimm_group1,
+ },
+ /* bus 2 */
+ {
+ .dimm_start = ARRAY_SIZE(dimm_group1),
+ .dimm_count = ARRAY_SIZE(dimm_group2),
+ .dimm = dimm_group2,
+ },
+};
+
+static DEFINE_SPINLOCK(ndtest_lock);
+static struct ndtest_priv *instances[NUM_INSTANCES];
+static struct class *ndtest_dimm_class;
+static struct gen_pool *ndtest_pool;
+
+static inline struct ndtest_priv *to_ndtest_priv(struct device *dev)
+{
+ struct platform_device *pdev = to_platform_device(dev);
+
+ return container_of(pdev, struct ndtest_priv, pdev);
+}
+
+static int ndtest_config_get(struct ndtest_dimm *p, unsigned int buf_len,
+ struct nd_cmd_get_config_data_hdr *hdr)
+{
+ unsigned int len;
+
+ if ((hdr->in_offset + hdr->in_length) > LABEL_SIZE)
+ return -EINVAL;
+
+ hdr->status = 0;
+ len = min(hdr->in_length, LABEL_SIZE - hdr->in_offset);
+ memcpy(hdr->out_buf, p->label_area + hdr->in_offset, len);
+
+ return buf_len - len;
+}
+
+static int ndtest_config_set(struct ndtest_dimm *p, unsigned int buf_len,
+ struct nd_cmd_set_config_hdr *hdr)
+{
+ unsigned int len;
+
+ if ((hdr->in_offset + hdr->in_length) > LABEL_SIZE)
+ return -EINVAL;
+
+ len = min(hdr->in_length, LABEL_SIZE - hdr->in_offset);
+ memcpy(p->label_area + hdr->in_offset, hdr->in_buf, len);
+
+ return buf_len - len;
+}
+
+static int ndtest_ctl(struct nvdimm_bus_descriptor *nd_desc,
+ struct nvdimm *nvdimm, unsigned int cmd, void *buf,
+ unsigned int buf_len, int *cmd_rc)
+{
+ struct nd_cmd_get_config_size *size;
+ struct ndtest_dimm *p;
+ int _cmd_rc;
+
+ if (!cmd_rc)
+ cmd_rc = &_cmd_rc;
+
+ *cmd_rc = 0;
+
+ if (!nvdimm)
+ return -EINVAL;
+
+ p = nvdimm_provider_data(nvdimm);
+ if (!p)
+ return -EINVAL;
+
+ /* Failures for a DIMM can be injected using fail_cmd and
+ * fail_cmd_code, see the device attributes below
+ */
+ if (p->fail_cmd)
+ return p->fail_cmd_code ? p->fail_cmd_code : -EIO;
+
+ switch (cmd) {
+ case ND_CMD_GET_CONFIG_SIZE:
+ size = (struct nd_cmd_get_config_size *) buf;
+ size->status = 0;
+ size->max_xfer = 8;
+ size->config_size = p->config_size;
+ *cmd_rc = 0;
+ break;
+
+ case ND_CMD_GET_CONFIG_DATA:
+ *cmd_rc = ndtest_config_get(p, buf_len, buf);
+ break;
+
+ case ND_CMD_SET_CONFIG_DATA:
+ *cmd_rc = ndtest_config_set(p, buf_len, buf);
+ break;
+ default:
+ dev_dbg(p->dev, "invalid command %u\n", cmd);
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static ssize_t handle_show(struct device *dev, struct device_attribute *attr,
+ char *buf)
+{
+ struct ndtest_dimm *dimm = dev_get_drvdata(dev);
+
+ return sprintf(buf, "%#x\n", dimm->handle);
+}
+static DEVICE_ATTR_RO(handle);
+
+static ssize_t fail_cmd_show(struct device *dev, struct device_attribute *attr,
+ char *buf)
+{
+ struct ndtest_dimm *dimm = dev_get_drvdata(dev);
+
+ return sprintf(buf, "%#lx\n", dimm->fail_cmd);
+}
+
+static ssize_t fail_cmd_store(struct device *dev, struct device_attribute *attr,
+ const char *buf, size_t size)
+{
+ struct ndtest_dimm *dimm = dev_get_drvdata(dev);
+ unsigned long val;
+ ssize_t rc;
+
+ rc = kstrtol(buf, 0, &val);
+ if (rc)
+ return rc;
+
+ dimm->fail_cmd = val;
+ return size;
+}
+static DEVICE_ATTR_RW(fail_cmd);
+
+static ssize_t fail_cmd_code_show(struct device *dev, struct device_attribute *attr,
+ char *buf)
+{
+ struct ndtest_dimm *dimm = dev_get_drvdata(dev);
+
+ return sprintf(buf, "%d\n", dimm->fail_cmd_code);
+}
+
+static ssize_t fail_cmd_code_store(struct device *dev, struct device_attribute *attr,
+ const char *buf, size_t size)
+{
+ struct ndtest_dimm *dimm = dev_get_drvdata(dev);
+ unsigned long val;
+ ssize_t rc;
+
+ rc = kstrtol(buf, 0, &val);
+ if (rc)
+ return rc;
+
+ dimm->fail_cmd_code = val;
+ return size;
+}
+static DEVICE_ATTR_RW(fail_cmd_code);
+
+static struct attribute *dimm_attributes[] = {
+ &dev_attr_handle.attr,
+ &dev_attr_fail_cmd.attr,
+ &dev_attr_fail_cmd_code.attr,
+ NULL,
+};
+
+static struct attribute_group dimm_attribute_group = {
+ .attrs = dimm_attributes,
+};
+
+static const struct attribute_group *dimm_attribute_groups[] = {
+ &dimm_attribute_group,
+ NULL,
+};
+
+static void put_dimms(void *data)
+{
+ struct ndtest_priv *p = data;
+ int i;
+
+ for (i = 0; i < p->config->dimm_count; i++)
+ if (p->config->dimm[i].dev) {
+ device_unregister(p->config->dimm[i].dev);
+ p->config->dimm[i].dev = NULL;
+ }
+}
+
+#define NDTEST_SCM_DIMM_CMD_MASK \
+ ((1ul << ND_CMD_GET_CONFIG_SIZE) | \
+ (1ul << ND_CMD_GET_CONFIG_DATA) | \
+ (1ul << ND_CMD_SET_CONFIG_DATA))
+
+static ssize_t phys_id_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct nvdimm *nvdimm = to_nvdimm(dev);
+ struct ndtest_dimm *dimm = nvdimm_provider_data(nvdimm);
+
+ return sprintf(buf, "%#x\n", dimm->physical_id);
+}
+static DEVICE_ATTR_RO(phys_id);
+
+static ssize_t vendor_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ return sprintf(buf, "0x1234567\n");
+}
+static DEVICE_ATTR_RO(vendor);
+
+static ssize_t id_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct nvdimm *nvdimm = to_nvdimm(dev);
+ struct ndtest_dimm *dimm = nvdimm_provider_data(nvdimm);
+
+ return sprintf(buf, "%04x-%02x-%04x-%08x", 0xabcd,
+ 0xa, 2016, ~(dimm->handle));
+}
+static DEVICE_ATTR_RO(id);
+
+static ssize_t nvdimm_handle_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct nvdimm *nvdimm = to_nvdimm(dev);
+ struct ndtest_dimm *dimm = nvdimm_provider_data(nvdimm);
+
+ return sprintf(buf, "%#x\n", dimm->handle);
+}
+
+static struct device_attribute dev_attr_nvdimm_show_handle = {
+ .attr = { .name = "handle", .mode = 0444 },
+ .show = nvdimm_handle_show,
+};
+
+static ssize_t subsystem_vendor_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ return sprintf(buf, "0x%04x\n", 0);
+}
+static DEVICE_ATTR_RO(subsystem_vendor);
+
+static ssize_t dirty_shutdown_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ return sprintf(buf, "%d\n", 42);
+}
+static DEVICE_ATTR_RO(dirty_shutdown);
+
+static struct attribute *ndtest_nvdimm_attributes[] = {
+ &dev_attr_nvdimm_show_handle.attr,
+ &dev_attr_vendor.attr,
+ &dev_attr_id.attr,
+ &dev_attr_phys_id.attr,
+ &dev_attr_subsystem_vendor.attr,
+ &dev_attr_dirty_shutdown.attr,
+ NULL,
+};
+
+static umode_t ndtest_nvdimm_attr_visible(struct kobject *kobj,
+ struct attribute *a, int n)
+{
+ return a->mode;
+}
+
+static const struct attribute_group ndtest_nvdimm_attribute_group = {
+ .attrs = ndtest_nvdimm_attributes,
+ .is_visible = ndtest_nvdimm_attr_visible,
+};
+
+static const struct attribute_group *ndtest_nvdimm_attribute_groups[] = {
+ &ndtest_nvdimm_attribute_group,
+ NULL,
+};
+
+static int ndtest_blk_do_io(struct nd_blk_region *ndbr, resource_size_t dpa,
+ void *iobuf, u64 len, int rw)
+{
+ struct ndtest_dimm *dimm = ndbr->blk_provider_data;
+ struct ndtest_blk_mmio *mmio = dimm->mmio;
+ struct nd_region *nd_region = &ndbr->nd_region;
+ unsigned int lane;
+
+ lane = nd_region_acquire_lane(nd_region);
+
+ if (rw)
+ memcpy(mmio->base + dpa, iobuf, len);
+ else {
+ memcpy(iobuf, mmio->base + dpa, len);
+ arch_invalidate_pmem(mmio->base + dpa, len);
+ }
+
+ nd_region_release_lane(nd_region, lane);
+
+ return 0;
+}
+
+static int ndtest_blk_region_enable(struct nvdimm_bus *nvdimm_bus,
+ struct device *dev)
+{
+ struct nd_blk_region *ndbr = to_nd_blk_region(dev);
+ struct nvdimm *nvdimm;
+ struct ndtest_dimm *p;
+ struct ndtest_blk_mmio *mmio;
+
+ nvdimm = nd_blk_region_to_dimm(ndbr);
+ p = nvdimm_provider_data(nvdimm);
+
+ nd_blk_region_set_provider_data(ndbr, p);
+ p->region = to_nd_region(dev);
+
+ mmio = devm_kzalloc(dev, sizeof(struct ndtest_blk_mmio), GFP_KERNEL);
+ if (!mmio)
+ return -ENOMEM;
+
+ mmio->base = devm_nvdimm_memremap(dev, p->address, 12,
+ nd_blk_memremap_flags(ndbr));
+ if (!mmio->base) {
+ dev_err(dev, "%s failed to map blk dimm\n", nvdimm_name(nvdimm));
+ return -ENOMEM;
+ }
+
+ p->mmio = mmio;
+
+ return 0;
+}
+
+static struct nfit_test_resource *ndtest_resource_lookup(resource_size_t addr)
+{
+ int i;
+
+ for (i = 0; i < NUM_INSTANCES; i++) {
+ struct nfit_test_resource *n, *nfit_res = NULL;
+ struct ndtest_priv *t = instances[i];
+
+ if (!t)
+ continue;
+ spin_lock(&ndtest_lock);
+ list_for_each_entry(n, &t->resources, list) {
+ if (addr >= n->res.start && (addr < n->res.start
+ + resource_size(&n->res))) {
+ nfit_res = n;
+ break;
+ } else if (addr >= (unsigned long) n->buf
+ && (addr < (unsigned long) n->buf
+ + resource_size(&n->res))) {
+ nfit_res = n;
+ break;
+ }
+ }
+ spin_unlock(&ndtest_lock);
+ if (nfit_res)
+ return nfit_res;
+ }
+
+ pr_warn("Failed to get resource\n");
+
+ return NULL;
+}
+
+static void ndtest_release_resource(void *data)
+{
+ struct nfit_test_resource *res = data;
+
+ spin_lock(&ndtest_lock);
+ list_del(&res->list);
+ spin_unlock(&ndtest_lock);
+
+ if (resource_size(&res->res) >= DIMM_SIZE)
+ gen_pool_free(ndtest_pool, res->res.start,
+ resource_size(&res->res));
+ vfree(res->buf);
+ kfree(res);
+}
+
+static void *ndtest_alloc_resource(struct ndtest_priv *p, size_t size,
+ dma_addr_t *dma)
+{
+ dma_addr_t __dma;
+ void *buf;
+ struct nfit_test_resource *res;
+ struct genpool_data_align data = {
+ .align = SZ_128M,
+ };
+
+ res = kzalloc(sizeof(*res), GFP_KERNEL);
+ if (!res)
+ return NULL;
+
+ buf = vmalloc(size);
+ if (size >= DIMM_SIZE)
+ __dma = gen_pool_alloc_algo(ndtest_pool, size,
+ gen_pool_first_fit_align, &data);
+ else
+ __dma = (unsigned long) buf;
+
+ if (!__dma)
+ goto buf_err;
+
+ INIT_LIST_HEAD(&res->list);
+ res->dev = &p->pdev.dev;
+ res->buf = buf;
+ res->res.start = __dma;
+ res->res.end = __dma + size - 1;
+ res->res.name = "NFIT";
+ spin_lock_init(&res->lock);
+ INIT_LIST_HEAD(&res->requests);
+ spin_lock(&ndtest_lock);
+ list_add(&res->list, &p->resources);
+ spin_unlock(&ndtest_lock);
+
+ if (dma)
+ *dma = __dma;
+
+ if (!devm_add_action(&p->pdev.dev, ndtest_release_resource, res))
+ return res->buf;
+
+buf_err:
+ if (__dma && size >= DIMM_SIZE)
+ gen_pool_free(ndtest_pool, __dma, size);
+ if (buf)
+ vfree(buf);
+ kfree(res);
+
+ return NULL;
+}
+
+static int ndtest_dimm_register(struct ndtest_priv *priv,
+ struct ndtest_dimm *dimm, int id)
+{
+ struct device *dev = &priv->pdev.dev;
+ struct nd_mapping_desc mapping;
+ struct nd_region_desc *ndr_desc;
+ struct nd_blk_region_desc ndbr_desc;
+ unsigned long dimm_flags = 0;
+
+ if (dimm->type == NDTEST_REGION_TYPE_PMEM) {
+ set_bit(NDD_ALIASING, &dimm_flags);
+ if (priv->pdev.id == 0)
+ set_bit(NDD_LABELING, &dimm_flags);
+ }
+
+ dimm->nvdimm = nvdimm_create(priv->bus, dimm,
+ ndtest_nvdimm_attribute_groups, dimm_flags,
+ NDTEST_SCM_DIMM_CMD_MASK, 0, NULL);
+ if (!dimm->nvdimm) {
+ dev_err(dev, "Error creating DIMM object for %pOF\n", priv->dn);
+ return -ENXIO;
+ }
+
+ memset(&mapping, 0, sizeof(mapping));
+ memset(&ndbr_desc, 0, sizeof(ndbr_desc));
+
+ /* now add the region */
+ memset(&mapping, 0, sizeof(mapping));
+ mapping.nvdimm = dimm->nvdimm;
+ mapping.start = dimm->res.start;
+ mapping.size = dimm->size;
+
+ ndr_desc = &ndbr_desc.ndr_desc;
+ memset(ndr_desc, 0, sizeof(*ndr_desc));
+ ndr_desc->res = &dimm->res;
+ ndr_desc->provider_data = dimm;
+ ndr_desc->mapping = &mapping;
+ ndr_desc->num_mappings = 1;
+ ndr_desc->nd_set = &dimm->nd_set;
+ ndr_desc->num_lanes = 1;
+
+ if (dimm->type & NDTEST_REGION_TYPE_BLK) {
+ ndbr_desc.enable = ndtest_blk_region_enable;
+ ndbr_desc.do_io = ndtest_blk_do_io;
+ dimm->region = nvdimm_blk_region_create(priv->bus, ndr_desc);
+ } else
+ dimm->region = nvdimm_pmem_region_create(priv->bus, ndr_desc);
+
+ if (!dimm->region) {
+ dev_err(dev, "Error registering region %pR\n", ndr_desc->res);
+ return -ENXIO;
+ }
+
+ dimm->dev = device_create_with_groups(ndtest_dimm_class,
+ &priv->pdev.dev,
+ 0, dimm, dimm_attribute_groups,
+ "test_dimm%d", id);
+ if (!dimm->dev)
+ return -ENOMEM;
+
+ return 0;
+}
+
+static int ndtest_nvdimm_init(struct ndtest_priv *p)
+{
+ struct ndtest_dimm *d;
+ u64 uuid[2];
+ void *res;
+ int i, id;
+
+ for (i = 0; i < p->config->dimm_count; i++) {
+ d = &p->config->dimm[i];
+ d->id = id = p->config->dimm_start + i;
+ res = ndtest_alloc_resource(p, LABEL_SIZE, NULL);
+ if (!res)
+ return -ENOMEM;
+
+ d->label_area = res;
+ sprintf(d->label_area, "label%d", id);
+ d->config_size = LABEL_SIZE;
+ d->res.name = p->pdev.name;
+
+ if (uuid_parse(d->uuid_str, (uuid_t *) uuid))
+ pr_err("failed to parse UUID\n");
+
+ d->nd_set.cookie1 = cpu_to_le64(uuid[0]);
+ d->nd_set.cookie2 = cpu_to_le64(uuid[1]);
+
+ switch (d->type) {
+ case NDTEST_REGION_TYPE_PMEM:
+ /* setup the resource */
+ res = ndtest_alloc_resource(p, d->size,
+ &d->res.start);
+ if (!res)
+ return -ENOMEM;
+
+ d->res.end = d->res.start + d->size - 1;
+ break;
+ case NDTEST_REGION_TYPE_BLK:
+ WARN_ON(p->nblks > NUM_DCR);
+
+ if (!ndtest_alloc_resource(p, d->size,
+ &p->dimm_dma[p->nblks]))
+ return -ENOMEM;
+
+ if (!ndtest_alloc_resource(p, LABEL_SIZE,
+ &p->label_dma[p->nblks]))
+ return -ENOMEM;
+
+ if (!ndtest_alloc_resource(p, LABEL_SIZE,
+ &p->dcr_dma[p->nblks]))
+ return -ENOMEM;
+
+ d->address = p->dimm_dma[p->nblks];
+ p->nblks++;
+
+ break;
+ }
+
+ ndtest_dimm_register(p, d, id);
+ }
+
+ return 0;
+}
+
+static int ndtest_bus_register(struct ndtest_priv *p,
+ struct ndtest_config *config)
+{
+ p->config = &config[p->pdev.id];
+
+ p->bus_desc.ndctl = ndtest_ctl;
+ p->bus_desc.module = THIS_MODULE;
+ p->bus_desc.provider_name = NULL;
+ p->bus_desc.cmd_mask =
+ 1UL << ND_CMD_ARS_CAP | 1UL << ND_CMD_ARS_START |
+ 1UL << ND_CMD_ARS_STATUS | 1UL << ND_CMD_CLEAR_ERROR |
+ 1UL << ND_CMD_CALL;
+
+ p->bus = nvdimm_bus_register(&p->pdev.dev, &p->bus_desc);
+ if (!p->bus) {
+ dev_err(&p->pdev.dev, "Error creating nvdimm bus %pOF\n", p->dn);
+ return -ENOMEM;
+ }
+
+ return 0;
+}
+
+static int ndtest_probe(struct platform_device *pdev)
+{
+ struct ndtest_priv *p;
+ int rc;
+
+ p = to_ndtest_priv(&pdev->dev);
+ ndtest_bus_register(p, bus_configs);
+
+ p->dcr_dma = devm_kcalloc(&p->pdev.dev, NUM_DCR,
+ sizeof(dma_addr_t), GFP_KERNEL);
+ p->label_dma = devm_kcalloc(&p->pdev.dev, NUM_DCR,
+ sizeof(dma_addr_t), GFP_KERNEL);
+ p->dimm_dma = devm_kcalloc(&p->pdev.dev, NUM_DCR,
+ sizeof(dma_addr_t), GFP_KERNEL);
+
+ rc = ndtest_nvdimm_init(p);
+ if (rc)
+ goto err;
+
+ rc = devm_add_action_or_reset(&pdev->dev, put_dimms, p);
+ if (rc)
+ goto err;
+
+ platform_set_drvdata(pdev, p);
+
+ return 0;
+
+err:
+ nvdimm_bus_unregister(p->bus);
+ kfree(p->bus_desc.provider_name);
+ put_device(&pdev->dev);
+ kfree(p);
+ return rc;
+}
+
+static int ndtest_remove(struct platform_device *pdev)
+{
+ struct ndtest_priv *p = to_ndtest_priv(&pdev->dev);
+
+ nvdimm_bus_unregister(p->bus);
+ return 0;
+}
+
+static const struct platform_device_id ndtest_id[] = {
+ { KBUILD_MODNAME },
+ { },
+};
+
+static struct platform_driver ndtest_driver = {
+ .probe = ndtest_probe,
+ .remove = ndtest_remove,
+ .driver = {
+ .name = KBUILD_MODNAME,
+ },
+ .id_table = ndtest_id,
+};
+
+static void ndtest_release(struct device *dev)
+{
+ struct ndtest_priv *p = to_ndtest_priv(dev);
+
+ kfree(p);
+}
+
+static __init int ndtest_init(void)
+{
+ int rc, i;
+
+ pmem_test();
+ libnvdimm_test();
+ device_dax_test();
+ dax_pmem_test();
+ dax_pmem_core_test();
+#ifdef CONFIG_DEV_DAX_PMEM_COMPAT
+ dax_pmem_compat_test();
+#endif
+
+ nfit_test_setup(ndtest_resource_lookup, NULL);
+
+ ndtest_dimm_class = class_create(THIS_MODULE, "nfit_test_dimm");
+ if (IS_ERR(ndtest_dimm_class)) {
+ rc = PTR_ERR(ndtest_dimm_class);
+ goto err_register;
+ }
+
+ ndtest_pool = gen_pool_create(ilog2(SZ_4M), NUMA_NO_NODE);
+ if (!ndtest_pool) {
+ rc = -ENOMEM;
+ goto err_register;
+ }
+
+ if (gen_pool_add(ndtest_pool, SZ_4G, SZ_4G, NUMA_NO_NODE)) {
+ rc = -ENOMEM;
+ goto err_register;
+ }
+
+ /* Each instance can be taken as a bus, which can have multiple dimms */
+ for (i = 0; i < NUM_INSTANCES; i++) {
+ struct ndtest_priv *priv;
+ struct platform_device *pdev;
+
+ priv = kzalloc(sizeof(*priv), GFP_KERNEL);
+ if (!priv) {
+ rc = -ENOMEM;
+ goto err_register;
+ }
+
+ INIT_LIST_HEAD(&priv->resources);
+ pdev = &priv->pdev;
+ pdev->name = KBUILD_MODNAME;
+ pdev->id = i;
+ pdev->dev.release = ndtest_release;
+ rc = platform_device_register(pdev);
+ if (rc) {
+ put_device(&pdev->dev);
+ goto err_register;
+ }
+ get_device(&pdev->dev);
+
+ instances[i] = priv;
+ }
+
+ rc = platform_driver_register(&ndtest_driver);
+ if (rc)
+ goto err_register;
+
+ return 0;
+
+err_register:
+ pr_err("Error registering platform device\n");
+ if (ndtest_pool)
+ gen_pool_destroy(ndtest_pool);
+
+ for (i = 0; i < NUM_INSTANCES; i++)
+ if (instances[i])
+ platform_device_unregister(&instances[i]->pdev);
+
+ nfit_test_teardown();
+ for (i = 0; i < NUM_INSTANCES; i++)
+ if (instances[i])
+ put_device(&instances[i]->pdev.dev);
+
+ return rc;
+}
+
+static __exit void ndtest_exit(void)
+{
+ int i;
+
+ for (i = 0; i < NUM_INSTANCES; i++)
+ platform_device_unregister(&instances[i]->pdev);
+
+ platform_driver_unregister(&ndtest_driver);
+ nfit_test_teardown();
+
+ gen_pool_destroy(ndtest_pool);
+
+ for (i = 0; i < NUM_INSTANCES; i++)
+ put_device(&instances[i]->pdev.dev);
+ class_destroy(ndtest_dimm_class);
+}
+
+module_init(ndtest_init);
+module_exit(ndtest_exit);
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("IBM Corporation");
diff --git a/tools/testing/nvdimm/test/ndtest.h b/tools/testing/nvdimm/test/ndtest.h
new file mode 100644
index 000000000000..2e8ff749e2f4
--- /dev/null
+++ b/tools/testing/nvdimm/test/ndtest.h
@@ -0,0 +1,65 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef NDTEST_H
+#define NDTEST_H
+
+#include <linux/platform_device.h>
+#include <linux/libnvdimm.h>
+
+enum dimm_type {
+ NDTEST_REGION_TYPE_PMEM = 0x0,
+ NDTEST_REGION_TYPE_BLK = 0x1,
+};
+
+struct ndtest_priv {
+ struct platform_device pdev;
+ struct device_node *dn;
+ struct list_head resources;
+ struct nvdimm_bus_descriptor bus_desc;
+ struct nvdimm_bus *bus;
+ struct ndtest_config *config;
+
+ dma_addr_t *dcr_dma;
+ dma_addr_t *label_dma;
+ dma_addr_t *dimm_dma;
+ bool is_volatile;
+ unsigned int flags;
+ unsigned int nblks;
+};
+
+struct ndtest_blk_mmio {
+ void __iomem *base;
+ u64 size;
+ u64 base_offset;
+ u32 line_size;
+ u32 num_lines;
+ u32 table_size;
+};
+
+struct ndtest_dimm {
+ struct resource res;
+ struct device *dev;
+ struct nvdimm *nvdimm;
+ struct nd_region *region;
+ struct nd_interleave_set nd_set;
+ struct ndtest_blk_mmio *mmio;
+
+ dma_addr_t address;
+ unsigned long config_size;
+ unsigned long fail_cmd;
+ void *label_area;
+ char *uuid_str;
+ enum dimm_type type;
+ unsigned int size;
+ unsigned int handle;
+ unsigned int physical_id;
+ int id;
+ int fail_cmd_code;
+};
+
+struct ndtest_config {
+ unsigned int dimm_count;
+ unsigned int dimm_start;
+ struct ndtest_dimm *dimm;
+};
+
+#endif /* NDTEST_H */
--
2.26.2
Feedback requested: Exposing NVDIMM performance statistics in a generic way
by Vaibhav Jain
Hello,
I am looking for some community feedback on these two Problem-statements:
1. How to expose NVDIMM performance statistics in an arch- or
nvdimm-vendor-agnostic manner?
2. Is there a common set of performance statistics for NVDIMMs that all
vendors should provide?
Problem context
===============
While working on the bring-up of PAPR SCM based NVDIMMs[1] for arch/powerpc,
we want to expose certain dimm performance statistics like "Media
Read/Write Counts", "Power-on Seconds", etc. to user-space [2]. These
performance statistics are similar to what ipmctl[3] reports for Intel®
Optane™ persistent memory via the '-show performance' command line
arg. However, the reported set of performance stats doesn't cover
all of the performance stats supported by PAPR SCM based NVDIMMs.
For example, here is a subset of performance stats which are specific to
PAPR SCM NVDIMMs and are not reported by ipmctl:
* Controller Reset Count
* Controller Reset Elapsed Time
* Power-on Seconds
* Cache Read Hit Count
* Cache Write Hit Count
The possibility of updating ipmctl to add support for these performance
statistics is greatly hampered by the lack of ACPI support on the powerpc
arch. Secondly, vendors who don't support an ACPI/NFIT command set
similar to Intel® Optane™ (for example MSFT) are also left in the
lurch. Problem-statement#1 points to this specific problem.
Additionally, in the absence of any pre-agreed set of performance statistics
which all vendors should support, adding such functionality to
ipmctl may not serve other nvdimm vendors well. For
example, if support for reporting "Controller Reset Count" is added to
ipmctl, it may not be applicable to other vendors such as Intel®
Optane™. This issue is what Problem-statement#2 refers to.
Possible Solution for Problem#1
===============================
One possible solution to Problem#1 is to add support for reporting
NVDIMM performance statistics in 'ndctl'. 'libndctl' already has a layer
that abstracts the underlying NVDIMM vendors (via struct ndctl_dimm_ops),
making support for different NVDIMM vendors fairly easy. Also, ndctl is
more widely used than 'ipmctl', hence adding such functionality
to ndctl would make it more widely available.
The above solution was implemented as RFC patch-set[2], which exposes these
performance statistics through a generic abstraction in libndctl and
adds a presentation layer for this data in ndctl[4]. It adds a new
command line flag '--stats' to ndctl to report *all* nvdimm vendor
reported performance stats. The output is similar to the one below:
# ndctl list -D --stats
[
{
"dev":"nmem0",
"stats":{
"Power-on Seconds":603931,
"Media Read Count":0,
"Media Write Count":6313,
}
}
]
This was done by adding two new dimm-ops callbacks that are
implemented by the papr_scm implementation within libndctl. These
callbacks are invoked by newly introduced code in 'util/json-smart.c'
that formats the returned stats from these new dimm-ops and transforms
them into a json-object for later presentation. Please refer to
RFC patch-set[2] for the implementation details.
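For readers who want a feel for the shape of those hooks without opening the
RFC: the callback names below match [4], but the signatures are my own
illustrative guess, not the actual libndctl definitions.

/* Illustrative guess only -- see [4] for the real definitions. */
struct ndctl_dimm_ops {
	/* ... existing callbacks ... */

	/* fetch all vendor-reported performance stats for a dimm */
	struct ndctl_cmd *(*new_stats)(struct ndctl_dimm *dimm);

	/* read back one named stat from a previously fetched set */
	int (*get_stat)(struct ndctl_cmd *cmd, const char *name,
			unsigned long long *value);
};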
Possible Solution for Problem#2
===============================
A solution to Problem-statement#2 is what eludes me, though. If there were a
minimal set of performance stats (similar to what ndctl enforces for
health-stats), then such functionality would be easy to implement in
ndctl/ipmctl. But is it really possible to
have such a common set of performance stats that all NVDIMM vendors can
expose?
Patch-set[2], though, tries to bypass this problem by letting the vendor
decide which performance stats to expose. This opens up the possibility
of this functionality being abused by dimm vendors to report arbitrary data
through this flag that may not be performance stats.
Summing-up
==========
In light of the above, I am requesting your feedback on how
problem-statements #{1, 2} can be addressed within the ndctl subsystem, and
also on whether these problems are even worth solving.
References
==========
[1] https://github.com/torvalds/linux/blob/master/Documentation/powerpc/papr_...
[2] "[ndctl RFC-PATCH 0/4] Add support for reporting PAPR NVDIMM
Statistics"
https://lore.kernel.org/linux-nvdimm/20200518110814.145644-1-vaibhav@linu...
[3] https://docs.pmem.io/ipmctl-user-guide/instrumentation/show-device-perfor...
[4] "[RFC-PATCH 1/4] ndctl,libndctl: Implement new dimm-ops 'new_stats'
and 'get_stat'"
https://lore.kernel.org/linux-nvdimm/20200514225258.508463-2-vaibhav@linu...
Thanks,
~ Vaibhav
[PATCH v6 0/6] mm: introduce memfd_secret system call to create "secret" memory areas
by Mike Rapoport
From: Mike Rapoport <rppt(a)linux.ibm.com>
Hi,
This is an implementation of "secret" mappings backed by a file descriptor.
I've dropped the boot time reservation patch for now as it is not strictly
required for the basic usage and can be easily added later either with or
without CMA.
v6 changes:
* Silence the warning about missing syscall, thanks to Qian Cai
* Replace spaces with tabs in Kconfig additions, per Randy
* Add a selftest.
v5 changes:
* rebase on v5.9-rc5
* drop boot time memory reservation patch
v4 changes:
* rebase on v5.9-rc1
* Do not redefine PMD_PAGE_ORDER in fs/dax.c, thanks Kirill
* Make secret mappings exclusive by default and only require flags to
memfd_secret() system call for uncached mappings, thanks again Kirill :)
v3 changes:
* Squash kernel-parameters.txt update into the commit that added the
command line option.
* Make uncached mode explicitly selectable by architectures. For now enable
it only on x86.
v2 changes:
* Follow Michael's suggestion and name the new system call 'memfd_secret'
* Add kernel-parameters documentation about the boot option
* Fix i386-tinyconfig regression reported by the kbuild bot.
CONFIG_SECRETMEM now depends on !EMBEDDED to disable it on small systems
from one side and still make it available unconditionally on
architectures that support SET_DIRECT_MAP.
The file descriptor backing secret memory mappings is created using a
dedicated memfd_secret system call. The desired protection mode for the
memory is configured using the flags parameter of the system call. The mmap()
of the file descriptor created with memfd_secret() will create a "secret"
memory mapping. The pages in that mapping will be marked as not present in
the direct map and will have desired protection bits set in the user page
table. For instance, current implementation allows uncached mappings.
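A minimal user-space usage sketch (my own, not from the series; it assumes
the patched headers provide __NR_memfd_secret and that flags 0 selects the
default exclusive mode):

#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void)
{
	/* create the secret-memory file descriptor */
	int fd = syscall(__NR_memfd_secret, 0);
	if (fd < 0)
		return 1;

	/* size it, then map it; the backing pages are removed from the
	 * kernel direct map */
	if (ftruncate(fd, 4096) < 0)
		return 1;
	void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED,
		       fd, 0);
	if (p == MAP_FAILED)
		return 1;

	/* use p for secrets ... */
	return 0;
}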
Although normally Linux userspace mappings are protected from other users,
such secret mappings are useful for environments where a hostile tenant is
trying to trick the kernel into giving them access to other tenants
mappings.
Additionally, the secret mappings may be used as a means to protect guest
memory in a virtual machine host.
For a demonstration of secret memory usage we've created a userspace library
[1] that does two things: the first is to act as a preloader for openssl,
redirecting all the OPENSSL_malloc calls to secret memory so that any secret
keys get automatically protected this way; the other thing it does is
expose the API to the user who needs it. We anticipate that a lot of the
use cases would be like the openssl one: many toolkits that deal with
secret keys already have special handling for the memory to try to give
them greater protection, so this would simply be pluggable into the
toolkits without any need for user application modification.
I hesitated over whether to continue using new flags to memfd_create() or to
add a new system call, and I decided on a new system call after I
started to look into the man page updates. There would have been two completely
independent descriptions, and I think it would have been very confusing.
Hiding secret memory mappings behind an anonymous file allows (ab)use of
the page cache for tracking pages allocated for the "secret" mappings as
well as using address_space_operations for e.g. page migration callbacks.
The anonymous file may also be used implicitly, like hugetlb files, to
implement mmap(MAP_SECRET) and use the secret memory areas with "native" mm
ABIs in the future.
As the fragmentation of the direct map was one of the major concerns raised
during the previous postings, I've added an amortizing cache of PMD-size
pages to each file descriptor that is used as an allocation pool for the
secret memory areas.
v5: https://lore.kernel.org/lkml/20200916073539.3552-1-rppt@kernel.org
v4: https://lore.kernel.org/lkml/20200818141554.13945-1-rppt@kernel.org
v3: https://lore.kernel.org/lkml/20200804095035.18778-1-rppt@kernel.org
v2: https://lore.kernel.org/lkml/20200727162935.31714-1-rppt@kernel.org
v1: https://lore.kernel.org/lkml/20200720092435.17469-1-rppt@kernel.org
Mike Rapoport (6):
mm: add definition of PMD_PAGE_ORDER
mmap: make mlock_future_check() global
mm: introduce memfd_secret system call to create "secret" memory areas
arch, mm: wire up memfd_secret system call where relevant
mm: secretmem: use PMD-size pages to amortize direct map fragmentation
secretmem: test: add basic selftest for memfd_secret(2)
arch/Kconfig | 7 +
arch/arm64/include/asm/unistd.h | 2 +-
arch/arm64/include/asm/unistd32.h | 2 +
arch/arm64/include/uapi/asm/unistd.h | 1 +
arch/riscv/include/asm/unistd.h | 1 +
arch/x86/Kconfig | 1 +
arch/x86/entry/syscalls/syscall_32.tbl | 1 +
arch/x86/entry/syscalls/syscall_64.tbl | 1 +
fs/dax.c | 11 +-
include/linux/pgtable.h | 3 +
include/linux/syscalls.h | 1 +
include/uapi/asm-generic/unistd.h | 7 +-
include/uapi/linux/magic.h | 1 +
include/uapi/linux/secretmem.h | 8 +
kernel/sys_ni.c | 2 +
mm/Kconfig | 4 +
mm/Makefile | 1 +
mm/internal.h | 3 +
mm/mmap.c | 5 +-
mm/secretmem.c | 333 ++++++++++++++++++++++
scripts/checksyscalls.sh | 4 +
tools/testing/selftests/vm/.gitignore | 1 +
tools/testing/selftests/vm/Makefile | 3 +-
tools/testing/selftests/vm/memfd_secret.c | 296 +++++++++++++++++++
tools/testing/selftests/vm/run_vmtests | 17 ++
25 files changed, 703 insertions(+), 13 deletions(-)
create mode 100644 include/uapi/linux/secretmem.h
create mode 100644 mm/secretmem.c
create mode 100644 tools/testing/selftests/vm/memfd_secret.c
--
2.28.0
[PATCH -next] device-dax: change error code from positive to negative in range_parse
by Zhang Qilong
call trace:
-> mapping_store()
-> range_parse()
......
rc = -ENXIO;
According to context, the error return value of
range_parse should be negative.
Signed-off-by: Zhang Qilong <zhangqilong3(a)huawei.com>
---
drivers/dax/bus.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index 27513d311242..e15a1a7c2853 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -1047,7 +1047,7 @@ static ssize_t range_parse(const char *opt, size_t len, struct range *range)
{
unsigned long long addr = 0;
char *start, *end, *str;
- ssize_t rc = EINVAL;
+ ssize_t rc = -EINVAL;
str = kstrdup(opt, GFP_KERNEL);
if (!str)
--
2.17.1