I tested the BlobFS asynchronous API by using the SPDK event framework to run multiple tasks, each of which writes one file.
It does not work: spdk_file_write_async() reports an error when resizing the file.
The call stack looks like this:
spdk_file_write_async() -> __readwrite() -> spdk_file_truncate_async() -> spdk_blob_resize()
The resize operation must be done on the metadata thread, i.e. the thread that invoked spdk_fs_load(), so only the task dispatched to the metadata CPU core works.
In other words, only one thread can be used to write files. This makes the API hard to use and may cause performance problems.
Does anyone know more about this?
Thanks very much.
I would be happy to get some feedback on my NVMf target namespace masking
implementation using attach/detach:
The patch introduces namespace masking for NVMe-over-Fabrics
targets by allowing controllers to be (dynamically) attached to
and detached from namespaces, cf. NVMe spec 1.4 - section 6.1.4.
Since SPDK only supports the dynamic controller model, a new
controller is allocated on every fabric connect command.
This makes it possible to attach/detach the controllers of a
specific host NQN to/from a namespace. A host can only perform
operations on an active namespace. Inactive namespaces can
be listed (not supported by SPDK) but no additional
information can be retrieved:
"Unless otherwise noted, specifying an inactive NSID in a
command that uses the Namespace Identifier (NSID) field shall
cause the controller to abort the command with status
Invalid Field in Command" - NVMe spec 1.4 - section 6.1.5
Note that this patch does not implement the NVMe namespace
attachment command; attach/detach is possible via RPCs only.
To preserve current behavior, all controllers are auto-attached.
To disable auto-attach, nvmf_subsystem_add_ns
must be called with "--no-auto-attach". We introduce two new
RPCs:
- nvmf_ns_attach_ctrlr <subsysNQN> <NSID> [--host <hostNQN>]
- nvmf_ns_detach_ctrlr <subsysNQN> <NSID> [--host <hostNQN>]
If no host NQN is specified, all controllers
(new and currently connected) will attach/detach to/from the
namespace.
The list in spdk_nvmf_ns is used to keep track of the host NQNs
whose controllers should be attached on connect.
The active_ns array in spdk_nvmf_ctrlr is used for fast lookup
to check whether an NSID is active/inactive on command execution.
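The bookkeeping described above can be modeled in a few lines. This is only a rough sketch, not the patch itself: the names Namespace, Controller, set_attached and is_active are hypothetical, and only the idea of a per-namespace host list plus a per-controller table indexed by NSID is taken from the description. How a per-host detach interacts with auto-attach is an assumption and is simplified here.

```python
class Namespace:
    """Hypothetical stand-in for spdk_nvmf_ns."""

    def __init__(self, nsid, auto_attach=True):
        self.nsid = nsid
        self.auto_attach = auto_attach  # False models "--no-auto-attach"
        self.hosts = set()              # host NQNs to attach on connect


class Controller:
    """Hypothetical stand-in for spdk_nvmf_ctrlr (one per fabric connect)."""

    def __init__(self, host_nqn, namespaces):
        self.host_nqn = host_nqn
        # Table indexed by NSID so the per-command check is a single lookup.
        top = max((ns.nsid for ns in namespaces), default=0)
        self.active_ns = [False] * (top + 1)
        for ns in namespaces:
            self.active_ns[ns.nsid] = ns.auto_attach or host_nqn in ns.hosts

    def is_active(self, nsid):
        # NSID 0 and out-of-range NSIDs are never active.
        return 0 < nsid < len(self.active_ns) and self.active_ns[nsid]


def set_attached(ns, controllers, attached, host_nqn=None):
    """Attach (True) or detach (False); host_nqn=None means every host.

    Simplification: a per-host detach does not override auto_attach
    for controllers that connect later.
    """
    if host_nqn is None:
        ns.auto_attach = attached
        ns.hosts.clear()
    elif attached:
        ns.hosts.add(host_nqn)
    else:
        ns.hosts.discard(host_nqn)
    # Update controllers that are already connected.
    for ctrlr in controllers:
        if host_nqn in (None, ctrlr.host_nqn) and ns.nsid < len(ctrlr.active_ns):
            ctrlr.active_ns[ns.nsid] = attached
```

A command handler would call is_active(nsid) first and fail the command with Invalid Field in Command when it returns False.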
In the SPDK source code, the message pool has 262143 entries (taken from _thread_lib_init()) and the per-thread cache size is SPDK_MSG_MEMPOOL_CACHE_SIZE (1024). This means that the first 255 threads created each get a cache of 1024 entries; after that, threads are still created but their cache is NULL, and they have to go to the global pool to get a message object for sending messages.
In my application, we create around 300 threads during init, so from the 256th thread onward the global pool is used for message objects.
Now, when the program runs, some of the threads that got 1024 cache entries at creation (i.e. threads in the range 0 - 255) remain unused. As a result, the pool entries held in their caches are never used, the threads in the range 256 - 300 cannot get free entries from the global pool, and spdk_thread_send_msg() fails for those threads.
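To make the arithmetic concrete, here is a small sketch. It assumes, as described above, that every new thread reserves a full cache from the pool and gets no cache once fewer than 1024 entries remain; the function name cache_budget is made up for illustration and is not part of SPDK.

```python
POOL_SIZE = 262143   # global message pool size quoted above
CACHE_SIZE = 1024    # SPDK_MSG_MEMPOOL_CACHE_SIZE quoted above


def cache_budget(num_threads, pool_size=POOL_SIZE, cache_size=CACHE_SIZE):
    """Return (threads_with_cache, threads_without_cache, entries_left).

    Assumption: each thread takes cache_size entries out of the pool at
    creation time until the pool can no longer supply a full cache.
    """
    with_cache = min(num_threads, pool_size // cache_size)
    without_cache = num_threads - with_cache
    entries_left = pool_size - with_cache * cache_size
    return with_cache, without_cache, entries_left


if __name__ == "__main__":
    print(cache_budget(300))
```

Under these assumptions, 300 threads give 255 threads with a cache, 45 threads without one, and only 1023 entries left in the global pool for those 45 threads to share.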
I have the following questions:
1. How do we scale the number of threads in the SPDK environment with the hardcoded cache size and pool size?
2. Why is the pool size set to 262143 and the cache size to 1024? Is there some logical explanation behind the numbers?
3. If the user changes these two numbers, are there any issues with that, be it performance, memory or anything else?
4. Is there a maximum limit that SPDK suggests for scaling the number of threads? If yes, what scaling model does SPDK suggest the user should implement?
We’re glad to announce this year’s PRC SPDK, PMDK and Intel® Performance Analyzers Forum.
It will be an online virtual forum like last year, held as a full-day event on Wednesday, Dec. 15.
Registration for the Forum will open later in November, and we will send out an email at that time.
This is also a good opportunity to present some of your work around SPDK, PMDK, etc.
If you are interested, please send your presentation title and abstract to gang.cao(a)intel.com before Oct. 20. We will review and respond to you later.
If there is any change to this Forum, we will send out an email accordingly.
Have a nice day!
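Like last year, this year's SPDK, PMDK and Intel® Performance Analyzers forum will be held online. It takes place all day on Wednesday, December 15.
This is also a good opportunity to present your work based on SPDK, PMDK and related components. If you are interested, please send the title and abstract of your talk to gang.cao(a)intel.com before October 20. We will follow up, including on subsequent arrangements.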
This is an idea that came up during development; might it be feasible?
In the NVMe-oF Target framework, data is received from an RDMA or TCP channel and then written to an NVMe SSD. The NVMe-oF Target transport layer is closed and cannot be accessed by other modules.
This does not work well for a storage cluster: data must be copied from the source transport to the destination transport when it is transmitted from master to slave nodes.
One way to solve this is to expose the transport layer's data pool and share it between RDMA and NVMe channels, but that makes the whole framework confusing.
To be more comprehensive and acceptable, maybe the NVMe-oF Target could be extended into a data switch, which receives data from one channel and transmits it to one or more channels without copying.
The switch could support many interfaces, such as TCP buffers, RDMA queues, NVMe channels, vhost vrings, and also host memory buffers. Each datapath has a single input and one or more outputs, as decided by the application service.
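As a toy illustration of the single-input, multiple-output idea, the sketch below forwards one buffer to several outputs by handing each of them a view of the same memory instead of a copy. Everything in it (the DataSwitch class, connect, receive, the channel name "rdma0") is hypothetical; it runs in plain Python and does not model memory registration or any real SPDK transport.

```python
class DataSwitch:
    """Hypothetical switch: one input channel fans out to N outputs."""

    def __init__(self):
        self.routes = {}  # input name -> list of output callables

    def connect(self, src, sink):
        """Route data arriving on input `src` to the callable `sink`."""
        self.routes.setdefault(src, []).append(sink)

    def receive(self, src, buf):
        """Forward `buf` to every output of `src`; return the output count."""
        view = memoryview(buf)  # refers to buf's memory, no bytes are copied
        sinks = self.routes.get(src, [])
        for sink in sinks:
            sink(view)
        return len(sinks)


if __name__ == "__main__":
    switch = DataSwitch()
    nvme_out, replica_out = [], []
    switch.connect("rdma0", nvme_out.append)     # e.g. local NVMe write
    switch.connect("rdma0", replica_out.append)  # e.g. forward to a replica
    payload = bytearray(b"block-0")
    switch.receive("rdma0", payload)
    # Both outputs see the same underlying buffer object.
    print(nvme_out[0].obj is payload, replica_out[0].obj is payload)
```

Which outputs are connected to an input is left entirely to the application, matching the idea that the APP service decides the datapath.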
The NVMe-oF Target is only one application scenario for the data switch. Such a software data switch would also suit storage clusters, vfio-user and vhost-user.
The difficulty is that memory must be registered with all modules, such as the RNIC and PCIe devices, and shared among them. There is a lot of work to be done.
I have created a "Compression VBDEV" on top of the Malloc0 BDEV.
$ sudo /data/devadm/spdk/scripts/rpc.py bdev_compress_create -p ./pmem_files -b "CryMalloc0A"
$ sudo /data/devadm/spdk/scripts/rpc.py vhost_scsi_controller_add_target vhost.0 0 COMP_CryMalloc0A
When I try to execute "mkfs /dev/sda" (where /dev/sda is the COMP_Malloc0) inside the VM, I get the following errors in the vhost output:
[2021-09-13 11:32:51.759692] vbdev_compress.c: 811:_comp_bdev_io_submit: *ERROR*: Unknown I/O type 4
[2021-09-13 11:32:51.759741] vbdev_compress.c: 822:_comp_bdev_io_submit: *ERROR*: on bdev_io submission!
Is there anything I need to check?
Thanks in advance!
Announcement: SPDK Jenkins CI will be offline for a few days in October
October 8th - October 11th
More specifically, the shutdown will start on Friday, October 8th, at 2PM GMT and end on Monday, October 11th, at 7AM GMT.
Building electrical maintenance.
How does that affect me?
SPDK CI will not be available to test patches submitted to review.spdk.io, so you will not receive "-1/+1 verified" votes.
The Gerrit instance at review.spdk.io itself will remain available, so you can still submit your changes.