Does the BlobFS asynchronous API support multi-threaded writing?
by chen.zhenghua@zte.com.cn
Hi everyone,
I ran a simple test of the BlobFS asynchronous API, using the SPDK event framework to execute multiple tasks, each of which writes one file.
But it doesn't work: spdk_file_write_async() reported an error when resizing the file.
The call stack looks like this:
spdk_file_write_async() -> __readwrite() -> spdk_file_truncate_async() -> spdk_blob_resize()
The resize operation must be done on the metadata thread that invoked spdk_fs_load(), so only the task dispatched to the metadata CPU core works.
That is to say, only one thread can be used to write files. This is hard to use, and performance issues may arise.
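For illustration, here is a minimal sketch of the only pattern that seems to work: funnel every write through the thread that called spdk_fs_load(). The names g_md_thread, g_fs_channel and write_ctx are placeholders from my test, not SPDK APIs, and the channel is assumed to have been allocated on the metadata thread with spdk_fs_alloc_io_channel().

#include <stdlib.h>
#include "spdk/blobfs.h"
#include "spdk/thread.h"

/* Placeholder globals: the thread that called spdk_fs_load() and an
 * io_channel allocated on that same thread. */
static struct spdk_thread *g_md_thread;
static struct spdk_io_channel *g_fs_channel;

struct write_ctx {
    struct spdk_file *file;
    void *payload;
    uint64_t offset;
    uint64_t length;
};

static void
write_done(void *ctx, int fserrno)
{
    /* Completion also runs on the metadata thread. */
    free(ctx);
}

static void
do_write_on_md_thread(void *arg)
{
    struct write_ctx *ctx = arg;

    /* Any resize/truncate triggered by this call now happens on the
     * metadata thread, so spdk_blob_resize() no longer complains. */
    spdk_file_write_async(ctx->file, g_fs_channel, ctx->payload,
                          ctx->offset, ctx->length, write_done, ctx);
}

/* Called from any other SPDK thread. */
static int
queue_write(struct write_ctx *ctx)
{
    return spdk_thread_send_msg(g_md_thread, do_write_on_md_thread, ctx);
}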
Does anyone know more about this?
Thanks very much.
1 month
RFC: NVMf namespace masking
by Jonas Pfefferle
Hi all,
I would be happy to get some feedback on my NVMf target namespace masking
implementation using attach/detach:
https://review.spdk.io/gerrit/c/spdk/spdk/+/7821
The patch introduces namespace masking for NVMe-over-Fabrics targets by allowing controllers to be (dynamically) attached to and detached from namespaces, cf. NVMe spec 1.4, section 6.1.4.
Since SPDK only supports the dynamic controller model, a new controller is allocated on every fabric connect command.
This makes it possible to attach/detach the controllers of a specific host NQN to/from a namespace. A host can only perform operations on an active namespace. Inactive namespaces can be listed (not supported by SPDK), but no additional information can be retrieved:
"Unless otherwise noted, specifying an inactive NSID in a
command that uses the Namespace Identifier (NSID) field shall
cause the controller to abort the command with status
Invalid Field in Command" - NVMe spec 1.4 - section 6.1.5
Note that this patch does not implement the NVMe namespace attachment command; attaching/detaching is only possible via RPCs.
To preserve current behavior, all controllers are auto-attached.
To not auto-attach controllers, nvmf_subsystem_add_ns must be called with "--no-auto-attach". We introduce two new RPC calls:
- nvmf_ns_attach_ctrlr <subsysNQN> <NSID> [--host <hostNQN>]
- nvmf_ns_detach_ctrlr <subsysNQN> <NSID> [--host <hostNQN>]
If no host NQN is specified, all controllers (new and currently connected) will be attached to / detached from the specified namespace.
The list in spdk_nvmf_ns is used to keep track of the host NQNs whose controllers should be attached on connect.
The active_ns array in spdk_nvmf_ctrlr is used as a fast lookup to check whether an NSID is active or inactive when a command is executed.
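To illustrate the idea (a conceptual sketch with made-up types only, not the patch code): on every command that carries an NSID, the controller checks whether that NSID is active for this host, and if not, the command is completed with Invalid Field in Command (NVMe 1.4, section 6.1.5).

#include <stdbool.h>
#include <stdint.h>

#define MAX_NAMESPACES 1024   /* arbitrary illustrative limit */

/* Conceptual per-controller view of which namespaces are active. */
struct ctrlr_ns_mask {
    bool active[MAX_NAMESPACES];   /* indexed by nsid - 1 */
};

/* Returns true if the command may proceed; otherwise the target would
 * abort it with status "Invalid Field in Command". */
static bool
nsid_is_active(const struct ctrlr_ns_mask *mask, uint32_t nsid)
{
    if (nsid == 0 || nsid > MAX_NAMESPACES) {
        return false;
    }
    return mask->active[nsid - 1];
}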
Thanks,
Jonas
7 months
Query on scaling SPDK threads
by lokesharo@gmail.com
Hello
In the SPDK source code, the message pool has a size of 262143 (taken from _thread_lib_init()) and the per-thread cache size is SPDK_MSG_MEMPOOL_CACHE_SIZE (1024). This means that the first 255 threads created get a cache of 1024 entries; threads created after that still work, but their cache is NULL and they have to go to the global pool to get a message object when sending messages.
In my application, we create around 300 threads during init, so from the 256th thread onward the global pool is used for message objects.
Now, when the program runs, some of the threads that received 1024 cache entries at creation (threads 0-255) remain unused. This means some entries from the global pool are permanently held in unused caches, the threads in the 256-300 range cannot get free entries from the global pool, and spdk_thread_send_msg() fails for those threads.
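For reference, the failure shows up on our side roughly like this (a minimal sketch, assuming the int-returning spdk_thread_send_msg() in recent SPDK; the interpretation that the error comes from message-pool exhaustion is my reading, not documented behavior):

#include <stdio.h>
#include "spdk/thread.h"

static void
my_msg_fn(void *ctx)
{
    /* Runs on the target thread once a message object was obtained. */
}

static void
send_or_log(struct spdk_thread *target, void *ctx)
{
    int rc = spdk_thread_send_msg(target, my_msg_fn, ctx);

    if (rc != 0) {
        /* For threads 256+ (no per-thread cache) this fails once the
         * global message pool has no free entries left. */
        fprintf(stderr, "spdk_thread_send_msg() to %s failed: %d\n",
                spdk_thread_get_name(target), rc);
    }
}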
I have the following questions:
1. How do we scale the number of threads in the SPDK environment with the hardcoded values of cache size and pool size?
2. Why is the pool size set to 262143 and the cache size to 1024? Is there some logical explanation behind these numbers?
3. If the user changes these two numbers, are there any issues with that, be it performance, memory, or anything else?
4. Is there a maximum number of threads that SPDK recommends? If yes, what scaling model does SPDK suggest the user implement?
Thanks
Lokesh
7 months, 1 week
Announcing The 2021 PRC SPDK, PMDK and Intel® Performance Analyzers Virtual Forum!
by Cao, Gang
Hi all,
We’re glad to announce this year’s PRC SPDK, PMDK and Intel® Performance Analyzers Forum.
Like last year, it will be an online virtual forum, held as a whole-day event on Dec. 15 (Wednesday).
Registration for the Forum will open later in November, and we will send out an email at that time.
This is also a good opportunity to present some of your work around SPDK, PMDK, etc.
If you are interested, please send your presentation title and abstract to gang.cao(a)intel.com before Oct. 20. We will review and respond to you.
If there is any change to this Forum, we will send out an email accordingly.
Have a nice day!
Hello everyone,
This year's SPDK, PMDK and Intel® Performance Analyzers forum will, like last year, be held online. It will be a whole-day event on Wednesday, December 15.
Registration for the forum is expected to open in November, and we will send out an email at that time.
This is also a good opportunity to present your work based on SPDK, PMDK and other components. If you are interested, please send your presentation title and abstract to gang.cao(a)intel.com before October 20. We will follow up, including discussing the subsequent arrangements.
If there is any change to the forum arrangements, we will stay in touch; please watch for the related emails.
Best wishes!
Thanks,
Gang
7 months, 2 weeks
Could the NVMe-oF Target be extended into a data switch?
by 330416470@qq.com
This is an idea that came up during development; might it be feasible?
In the NVMe-oF Target framework, data is received from an RDMA or TCP channel and then written to an NVMe SSD. The NVMe-oF Target transport layer is closed and cannot be accessed by other modules.
So it is not a good fit for a storage cluster, where data must be copied from the source transport to the destination transport when transferring data from the master node to the slave nodes.
To solve this, the transport layer's data pool would have to be exposed and shared between the RDMA and NVMe channels, which makes the whole framework confusing.
To be more general and acceptable, maybe the NVMe-oF Target could be extended into a data switch, in which data received from one channel is transmitted to one or more channels without copying.
The switch could support many interfaces, such as TCP buffers, RDMA queues, NVMe channels, vhost vrings, and also host memory buffers. Each datapath could have a single input and one or multiple outputs, as decided by the application service.
The NVMe-oF Target is only one application scenario for such a data switch. This software data switch would also be suitable for storage clusters, vfio-user, and vhost-user.
The difficulty is that memory must be registered with all modules, such as the RNIC and PCIe devices, and the memory should be shared among multiple modules. There is a lot of work to be done.
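Purely as an illustration of the datapath abstraction I have in mind (every type and name below is hypothetical, nothing from the SPDK tree):

#include <stddef.h>

/* Hypothetical endpoint kinds the switch could bridge. */
enum dp_endpoint_type {
    DP_EP_TCP_BUFFER,
    DP_EP_RDMA_QUEUE,
    DP_EP_NVME_CHANNEL,
    DP_EP_VHOST_VRING,
    DP_EP_HOST_MEMORY,
};

struct dp_endpoint {
    enum dp_endpoint_type type;
    void *handle;                 /* transport/bdev specific object */
};

/* One datapath: a single input fanned out to one or more outputs.
 * The data buffer is registered once with every involved module
 * (RNIC, NVMe, vhost), so no copy is needed along the path. */
struct dp_path {
    struct dp_endpoint input;
    struct dp_endpoint *outputs;
    size_t num_outputs;
    void *shared_buf;             /* pre-registered, shared memory */
    size_t buf_len;
};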
7 months, 3 weeks
Using the Compression VBDEV
by Ruslan Laishev
Hello!
I have created a "Compression VBDEV" on top of the Malloc0 BDEV:
$ sudo /data/devadm/spdk/scripts/rpc.py bdev_compress_create -p ./pmem_files -b "CryMalloc0A"
COMP_CryMalloc0A
$sudo /data/devadm/spdk/scripts/rpc.py vhost_scsi_controller_add_target vhost.0 0 COMP_CryMalloc0A
0
When I try to execute "mkfs /dev/sda" (where /dev/sda is the COMP_CryMalloc0A device) in the inner VM, I get this error in the vhost output:
[2021-09-13 11:32:51.759692] vbdev_compress.c: 811:_comp_bdev_io_submit: *ERROR*: Unknown I/O type 4
[2021-09-13 11:32:51.759741] vbdev_compress.c: 822:_comp_bdev_io_submit: *ERROR*: on bdev_io submission!
...
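For context, my understanding is that each bdev module only handles the I/O types it implements, roughly along the lines of this sketch (a hypothetical module, not the actual vbdev_compress code), and mkfs apparently submits an I/O type that the compress vbdev does not handle:

#include <stdbool.h>
#include "spdk/bdev.h"

/* Hypothetical io_type_supported callback: any type not handled by the
 * module is rejected when an I/O of that type is submitted to it. */
static bool
my_vbdev_io_type_supported(void *ctx, enum spdk_bdev_io_type io_type)
{
    switch (io_type) {
    case SPDK_BDEV_IO_TYPE_READ:
    case SPDK_BDEV_IO_TYPE_WRITE:
    case SPDK_BDEV_IO_TYPE_RESET:
        return true;
    default:
        return false;
    }
}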
Is there anything I need to check?
Thanks in advance!
8 months
SPDK Jenkins CI offline 8th - 11th October
by Latecki, Karol
Announcement: SPDK Jenkins CI will be offline for a few days in October
When?
October 8th - October 11th
More specifically, the shutdown will start on Friday, October 8th at 2 PM GMT and end on Monday, October 11th at 7 AM GMT.
Why?
Building electrical maintenance.
How does that affect me?
SPDK CI will not be available to test patches submitted to review.spdk.io, so you will not receive "-1/+1 verified" votes.
The Gerrit instance at review.spdk.io itself will be available, so you can still submit your changes.
Thanks,
Karol Latecki
8 months, 1 week
QEMU virtio-scsi and options
by Ruslan Laishev
Hello!
On the example page (https://spdk.io/doc/vhost.html#vhost_qemu_config) I see:
-chardev socket,id=spdk_vhost_scsi0,path=/var/tmp/vhost.0 \
-device vhost-user-scsi-pci,id=scsi0,chardev=spdk_vhost_scsi0,num_queues=2
QEMU-KVM returns an error:
2021-09-05T09:33:28.912405Z qemu-kvm: -device vhost-user-scsi-pci,id=scsi0,chardev=spdk_vhost_scsi0,num_queues=2: 'vhost-user-scsi-pci' is not a valid device model name
I changed vhost-user-scsi-pci to virtio-scsi-pci and got:
2021-09-05T09:36:05.963504Z qemu-kvm: -device virtio-scsi-pci,id=scsi0,chardev=spdk_vhost_scsi0,num_queues=2: Property '.chardev' not found
[devadm@ceph-os-02 ~]$ sudo /usr/libexec/qemu-kvm -device virtio-scsi,?
virtio-scsi-pci options:
use-started=<bool>
event_idx=<bool> - on/off
failover_pair_id=<str>
packed=<bool> - on/off
ioeventfd=<bool> - on/off
multifunction=<bool> - on/off
rombar=<uint32>
virtqueue_size=<uint32>
x-disable-pcie=<bool> - on/off
indirect_desc=<bool> - on/off
x-pcie-lnkctl-init=<bool> - on/off
disable-modern=<bool>
num_queues=<uint32>
cmd_per_lun=<uint32>
disable-legacy=<OnOffAuto> - on/off/auto
command_serr_enable=<bool> - on/off
max_sectors=<uint32>
hotplug=<bool> - on/off
page-per-vq=<bool> - on/off
x-pcie-deverr-init=<bool> - on/off
x-pcie-pm-init=<bool> - on/off
x-pcie-flr-init=<bool> - on/off
x-pcie-lnksta-dllla=<bool> - on/off
param_change=<bool> - on/off
any_layout=<bool> - on/off
iothread=<link<iothread>>
addr=<int32> - Slot and optional function number, example: 06.0 or 06
migrate-extra=<bool> - on/off
modern-pio-notify=<bool> - on/off
vectors=<uint32>
x-pcie-extcap-init=<bool> - on/off
virtio-backend=<child<virtio-scsi-device>>
x-ignore-backend-features=<bool>
notify_on_empty=<bool> - on/off
iommu_platform=<bool> - on/off
ats=<bool> - on/off
romfile=<str>
virtio-pci-bus-master-bug-migration=<bool> - on/off
So, can someone help me with the right options?
TIA.
8 months, 1 week