This patch set provides functionality that will help to improve the
locality of the async_schedule calls used to provide deferred
initialization.

This patch set originally started out focused on just the one call to
async_schedule_domain in the nvdimm tree that was being used to defer the
device_add call. However, after doing some digging I realized the scope of
this was much broader than I had originally planned. As such, I went
through and reworked the underlying infrastructure, down to replacing the
queue_work call itself with a function of my own, and opted to provide a
NUMA-aware solution that would work for a broader audience.

In addition, I have added several tweaks and clean-ups to the front of the
patch set. Patches 1 through 4 address a number of issues that were
preventing the existing async_schedule calls from performing as well as
they could, either because they did not scale on a per-device basis or
because of issues that could result in a potential deadlock. For example,
patch 4 addresses the fact that we were calling async_schedule once per
driver instead of once per device; without addressing this first, devices
would still have ended up being probed on a non-local node.
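
As a rough illustration of where this is headed, here is a minimal sketch
of how a caller might use the interfaces introduced in this set
(queue_work_node / async_schedule_dev) to keep deferred work local to a
device's NUMA node; the foo_* names are hypothetical and not taken from
the patches themselves:

#include <linux/async.h>
#include <linux/device.h>
#include <linux/topology.h>

/* Hypothetical async callback, not part of the series itself. */
static void foo_finish_init(void *data, async_cookie_t cookie)
{
	struct device *dev = data;

	dev_info(dev, "deferred init running on node %d\n", numa_node_id());
}

static void foo_defer_init(struct device *dev)
{
	/*
	 * async_schedule_dev() selects a CPU on (or near) dev_to_node(dev),
	 * so the deferred work should run local to the device's memory.
	 */
	async_schedule_dev(foo_finish_init, dev);
}
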
RFC->v1:
Dropped nvdimm patch to submit later.
It relies on code in libnvdimm development tree.
Simplified queue_work_near to just convert node into a CPU.
Split up drivers core and PM core patches.
v1->v2:
Renamed queue_work_near to queue_work_node
Added WARN_ON_ONCE if we use queue_work_node with per-cpu workqueue
v2->v3:
Added Acked-by for queue_work_node patch
Continued rename from _near to _node to be consistent with queue_work_node
Renamed async_schedule_near_domain to async_schedule_node_domain
Renamed async_schedule_near to async_schedule_node
Added kerneldoc for new async_schedule_XXX functions
Updated patch description for patch 4 to include data on potential gains
v3->v4:
Added patch to consolidate use of need_parent_lock
Make asynchronous driver probing explicit about use of drvdata
v4->v5:
Added patch to move async_synchronize_full to address deadlock
Added bit async_probe to act as mutex for probe/remove calls
Added back nvdimm patch as code it relies on is now in Linus's tree
Incorporated review comments on parent & device locking consolidation
Rebased on latest linux-next
v5->v6:
Drop the "This patch" or "This change" from start of patch descriptions.
Drop unnecessary parenthesis in first patch
Use same wording for "selecting a CPU" in comments added in first patch
Added kernel documentation for async_probe member of device
Fixed up comments for async_schedule calls in patch 2
Moved code related to setting the async driver out of device.h and into dd.c
Added Reviewed-by for several patches
v6->v7:
Fixed typo which had kernel doc refer to "lock" when I meant "unlock"
Dropped "bool X:1" to "u8 X:1" from patch description
Added async_driver to device_private structure to store driver
Dropped unnecessary code shuffle from async_probe patch
Reordered patches to move fixes up to front
Added Reviewed-by for several patches
Updated cover page and patch descriptions throughout the set
v7->v8:
Replaced async_probe value with dead, only apply dead in device_del
Dropped Reviewed-by from patch 2 due to significant changes
Added Reviewed-by for patches reviewed by Luis Chamberlain
---
Alexander Duyck (9):
driver core: Move async_synchronize_full call
driver core: Establish order of operations for device_add and device_del via bitflag
device core: Consolidate locking and unlocking of parent and device
driver core: Probe devices asynchronously instead of the driver
workqueue: Provide queue_work_node to queue work near a given NUMA node
async: Add support for queueing on specific NUMA node
driver core: Attach devices on CPU local to device node
PM core: Use new async_schedule_dev command
libnvdimm: Schedule device registration on node local to the device
drivers/base/base.h | 4 +
drivers/base/bus.c | 46 ++------------
drivers/base/core.c | 11 +++
drivers/base/dd.c | 152 ++++++++++++++++++++++++++++++++++++++-------
drivers/base/power/main.c | 12 ++--
drivers/nvdimm/bus.c | 11 ++-
include/linux/async.h | 82 +++++++++++++++++++++++-
include/linux/device.h | 5 +
include/linux/workqueue.h | 2 +
kernel/async.c | 53 +++++++++-------
kernel/workqueue.c | 84 +++++++++++++++++++++++++
11 files changed, 362 insertions(+), 100 deletions(-)
--
Hi Linus, please pull from:
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm tags/dax-fixes-4.20-rc6
...to receive the last of the known regression fixes and fallout from
the Xarray conversion of the filesystem-dax implementation. On the path
to debugging why the dax memory-failure injection test started failing
after the Xarray conversion, a couple more fixes for
dax_lock_mapping_entry(), now called dax_lock_page(), surfaced. Those
plus the bug that started the hunt are now addressed. These patches
have appeared in a -next release with no issues reported.
Note the touches to mm/memory-failure.c are just the conversion to the
new function signature for dax_lock_page().
---
The following changes since commit 2e6e902d185027f8e3cb8b7305238f7e35d6a436:
Linux 4.20-rc4 (2018-11-25 14:19:31 -0800)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm tags/dax-fixes-4.20-rc6
for you to fetch changes up to 27359fd6e5f3c5db8fe544b63238b6170e8806d8:
dax: Fix unlock mismatch with updated API (2018-12-04 21:32:00 -0800)
----------------------------------------------------------------
dax fixes 4.20-rc6
* Fix the Xarray conversion of fsdax to properly handle
dax_lock_mapping_entry() in the presence of pmd entries.
* Fix inode destruction racing a new lock request.
----------------------------------------------------------------
Matthew Wilcox (3):
dax: Check page->mapping isn't NULL
dax: Don't access a freed inode
dax: Fix unlock mismatch with updated API
fs/dax.c | 55 ++++++++++++++++++++++++++++++++++++-----------------
include/linux/dax.h | 14 ++++++++------
mm/memory-failure.c | 6 ++++--
3 files changed, 50 insertions(+), 25 deletions(-)
Hi Linus, please pull from:
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm tags/libnvdimm-fixes-4.20-rc6
...to receive a regression fix for the Address Range Scrub
implementation (yes, another one), and support for platforms that
misalign persistent memory relative to the Linux memory hotplug section
constraint. Longer term, support for sub-section memory hotplug would
alleviate alignment waste, but until then this hack allows a 'struct
page' memmap to be established for these misaligned memory regions.
These have all appeared in a -next release, and thanks to Patrick for
reporting and testing the alignment padding fix.
---
The following changes since commit 9ff01193a20d391e8dbce4403dd5ef87c7eaaca6:
Linux 4.20-rc3 (2018-11-18 13:33:44 -0800)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm tags/libnvdimm-fixes-4.20-rc6
for you to fetch changes up to b5fd2e00a60248902315fb32210550ac3cb9f44c:
acpi/nfit: Fix user-initiated ARS to be "ARS-long" rather than "ARS-short" (2018-12-05 14:16:13 -0800)
----------------------------------------------------------------
libnvdimm fixes 4.20-rc6
* Unless and until the core mm handles memory hotplug units smaller than
a section (128M), persistent memory namespaces must be padded to
section alignment. The libnvdimm core already handled section
collision with "System RAM", but some configurations overlap
independent "Persistent Memory" ranges within a section, so additional
padding injection is added for that case.
* The recent reworks of the ARS (address range scrub) state machine to
reduce the number of state flags inadvertently missed a conversion of
acpi_nfit_ars_rescan() call sites. Fix the regression whereby
user-requested ARS results in a "short" scrub rather than a "long"
scrub.
* Fixup the unit tests to handle / test the 128M section alignment of
mocked test resources.
----------------------------------------------------------------
Dan Williams (3):
tools/testing/nvdimm: Align test resources to 128M
libnvdimm, pfn: Pad pfn namespaces relative to other regions
acpi/nfit: Fix user-initiated ARS to be "ARS-long" rather than "ARS-short"
drivers/acpi/nfit/core.c | 2 +-
drivers/nvdimm/nd-core.h | 2 ++
drivers/nvdimm/pfn_devs.c | 64 +++++++++++++++++++++++-----------------
drivers/nvdimm/region_devs.c | 41 +++++++++++++++++++++++++
tools/testing/nvdimm/test/nfit.c | 35 ++++++++++++++++++++--
5 files changed, 114 insertions(+), 30 deletions(-)
Internal to dax_unlock_mapping_entry(), dax_unlock_entry() is used to
store a replacement entry in the Xarray at the given xas-index with the
DAX_LOCKED bit clear. When called, dax_unlock_entry() expects the unlocked
value of the entry relative to the current Xarray state to be specified.
In most contexts dax_unlock_entry() is operating in the same scope as
the matched dax_lock_entry(). However, in the dax_unlock_mapping_entry()
case the implementation needs to recall the original entry. In the case
where the original entry is a 'pmd' entry, it is possible that the pfn
used to do the lookup is misaligned relative to the value stored in the
Xarray.
When creating the 'unlocked' entry be sure to align it to the expected
size as reflected by the DAX_PMD flag. Otherwise, future lookups become
confused by finding a 'pte' aligned value at an index that should return
a 'pmd' aligned value. This mismatch results in failure signatures like
the following:
WARNING: CPU: 38 PID: 1396 at fs/dax.c:340 dax_insert_entry+0x2b2/0x2d0
RIP: 0010:dax_insert_entry+0x2b2/0x2d0
[..]
Call Trace:
dax_iomap_pte_fault.isra.41+0x791/0xde0
ext4_dax_huge_fault+0x16f/0x1f0
? up_read+0x1c/0xa0
__do_fault+0x1f/0x160
__handle_mm_fault+0x1033/0x1490
handle_mm_fault+0x18b/0x3d0
...and potential corruption of nearby page state as housekeeping
routines, like dax_disassociate_entry(), may overshoot their expected
bounds starting at the wrong page.
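
For a concrete illustration of the mismatch (numbers are made up): with 4K
pages PMD_ORDER is 9, so a pmd entry covers 512 pfns. If the entry was
created for the range starting at pfn 0x10200 but is rebuilt on unlock from
a pte-level pfn inside that range, the encoded value no longer matches:

	unsigned long pfn  = 0x10203;            /* pte-level pfn inside the pmd */
	unsigned long mask = ~((1UL << 9) - 1);  /* PMD_ORDER_MASK with PMD_ORDER = 9 */
	unsigned long base = pfn & mask;         /* 0x10200, the pmd-aligned pfn to encode */
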
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Jan Kara <jack(a)suse.cz>
Fixes: 9f32d221301c ("dax: Convert dax_lock_mapping_entry to XArray")
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
---
fs/dax.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/fs/dax.c b/fs/dax.c
index 3f592dc18d67..6c5f8f345b1a 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -59,6 +59,7 @@ static inline unsigned int pe_order(enum page_entry_size pe_size)
/* The order of a PMD entry */
#define PMD_ORDER (PMD_SHIFT - PAGE_SHIFT)
+#define PMD_ORDER_MASK ~((1UL << PMD_ORDER) - 1)
static wait_queue_head_t wait_table[DAX_WAIT_TABLE_ENTRIES];
@@ -93,9 +94,13 @@ static unsigned long dax_to_pfn(void *entry)
return xa_to_value(entry) >> DAX_SHIFT;
}
-static void *dax_make_entry(pfn_t pfn, unsigned long flags)
+static void *dax_make_entry(pfn_t pfn_t, unsigned long flags)
{
- return xa_mk_value(flags | (pfn_t_to_pfn(pfn) << DAX_SHIFT));
+ unsigned long pfn = pfn_t_to_pfn(pfn_t);
+
+ if (flags & DAX_PMD)
+ pfn &= PMD_ORDER_MASK;
+ return xa_mk_value(flags | (pfn << DAX_SHIFT));
}
static bool dax_is_locked(void *entry)
I have loosely based this patch series on the following patch series
from Zhang Yi:
https://lore.kernel.org/lkml/cover.1536342881.git.yi.z.zhang@linux.intel.com

The original set attempted to address the fact that DAX pages were
treated like MMIO pages, which resulted in reduced performance. It did so
by ignoring the PageReserved flag if the page was either a DEV_DAX or
FS_DAX page.

I am proposing this as an alternative to that set, mainly because I
believe a few issues were overlooked in the original. Specifically, KVM
seems to have two different uses for the PageReserved flag: one being
whether or not we can pin the memory, the other being whether we should
mark the pages as dirty or accessed. I believe only the pinning case
really applies, so I have split the uses of kvm_is_reserved_pfn and
updated the callers that determine support for page pinning to also
check the pgmap to see if it supports pinning.
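
For illustration only, here is a rough sketch of the direction described
above; pgmap_supports_pinning() is a placeholder name used for this cover
letter, not necessarily the interface added by the patches:

#include <linux/mm.h>
#include <linux/memremap.h>

/*
 * Sketch only: split the old "is this pfn reserved?" question into
 * "can this pfn be refcounted and pinned?". Placeholder helper name below.
 */
static bool foo_pfn_is_refcounted(unsigned long pfn)
{
	struct page *page;

	if (!pfn_valid(pfn))
		return false;

	page = pfn_to_page(pfn);
	if (!PageReserved(page))
		return true;	/* ordinary page, safe to pin */

	/* DAX/ZONE_DEVICE pages are PageReserved but may still allow pinning. */
	if (is_zone_device_page(page))
		return pgmap_supports_pinning(page->pgmap);	/* placeholder */

	return false;
}
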
---
Alexander Duyck (3):
kvm: Split use cases for kvm_is_reserved_pfn to kvm_is_refcounted_pfn
mm: Add support for exposing if dev_pagemap supports refcount pinning
kvm: Add additional check to determine if a page is refcounted
arch/x86/kvm/mmu.c | 6 +++---
drivers/nvdimm/pfn_devs.c | 2 ++
include/linux/kvm_host.h | 2 +-
include/linux/memremap.h | 5 ++++-
include/linux/mm.h | 11 +++++++++++
virt/kvm/kvm_main.c | 34 +++++++++++++++++++++++++---------
6 files changed, 46 insertions(+), 14 deletions(-)
--