Reporting intermittent test failures
by Harris, James R
Hi all,
I’ve seen a lot of cases recently where -1 votes from the test pool have been removed from a patch due to a failure unrelated to the patch, but then nothing was filed in GitHub for that failure. The filing in GitHub could be a new issue, or a comment on an existing issue.
Please make those GitHub updates a priority. It’s the only way the project can understand the frequency of those intermittent failures and gather the data needed to get them fixed. If you’re not sure whether a failure has been seen before, search GitHub issues with the “Intermittent Failure” label, or ask on Slack if anyone else has seen the issue. There is no harm in filing a new issue that may turn out to be a duplicate – we can always clean these up later during the next bug scrub meeting. The important thing is that we get the failure tracked.
Thanks,
-Jim
2 months, 1 week
SPDK socket abstraction layer
by Zawadzki, Tomasz
Hello everyone,
Summary:
With this message I wanted to update the SPDK community on the state of the VPP socket abstraction as of the SPDK 19.07 release.
At this time there does not seem to be a clear efficiency improvement with VPP. There is no further work planned on the SPDK and VPP integration.
Details:
As some of you may remember, the SPDK 18.04 release introduced support for alternative socket types. Along with that release, Vector Packet Processing (VPP)<https://wiki.fd.io/view/VPP> 18.01 was integrated with SPDK by expanding the socket abstraction to use the VPP Communications Library (VCL). The TCP/IP stack in VPP<https://wiki.fd.io/view/VPP/HostStack> was in its early stages back then and has seen improvements throughout the last year.
To make better use of VPP capabilities, and following fruitful collaboration with the VPP team, in SPDK 19.07 this implementation was changed from VCL to the VPP Session API from VPP 19.04.2.
The VPP socket abstraction has met some challenges due to the inherent design of both projects, in particular related to running separate processes and memory copies.
Seeing improvements over the original implementation was encouraging, yet when measuring against the posix socket abstraction (taking the entire system into consideration, i.e. both processes), the results are comparable. In other words, at this time there does not seem to be a clear benefit to either socket abstraction from the standpoint of CPU efficiency or IOPS.
With this message I just wanted to update the SPDK community on the state of the socket abstraction layers as of the SPDK 19.07 release. Each SPDK release brings improvements to the abstraction and its implementations, with exciting work on more efficient use of the kernel TCP stack coming in SPDK 19.10 and SPDK 20.01.
However, there is no active involvement at this point around the VPP implementation of the socket abstraction in SPDK. Contributions in this area are always welcome. In case you're interested in implementing further enhancements of the VPP and SPDK integration, feel free to reply, or use one of the many SPDK community communication channels<https://spdk.io/community/>.
Thanks,
Tomek
7 months, 3 weeks
threads
by Trevor Kramer
Is it allowed to connect to an NVMe device in one thread (spdk_nvme_connect) and
use the returned controller handle in different threads to allocate and use
qpairs (spdk_nvme_ctrlr_alloc_io_qpair)? Or does everything have to be done
in a single thread? I'm seeing intermittent buffer corruption and I'm
wondering if this could be the cause.
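Roughly, the pattern I have in mind looks like the following. This is a simplified, hypothetical sketch rather than my exact code: error handling is omitted, and the PCIe address, namespace ID 1, and two-thread layout are just placeholders.

/*
 * Connect once from the main thread, then let each worker thread allocate
 * and drive its own I/O qpair from the shared controller handle.
 */
#include <pthread.h>
#include <stdbool.h>

#include "spdk/env.h"
#include "spdk/nvme.h"

static struct spdk_nvme_ctrlr *g_ctrlr;	/* connected from the main thread */

static void
io_complete(void *arg, const struct spdk_nvme_cpl *cpl)
{
	(void)cpl;
	*(bool *)arg = true;
}

static void *
worker(void *arg)
{
	struct spdk_nvme_qpair *qpair;
	struct spdk_nvme_ns *ns;
	void *buf;
	bool done = false;

	(void)arg;

	/* Each worker allocates its own qpair from the shared controller handle. */
	qpair = spdk_nvme_ctrlr_alloc_io_qpair(g_ctrlr, NULL, 0);
	ns = spdk_nvme_ctrlr_get_ns(g_ctrlr, 1);
	buf = spdk_zmalloc(4096, 4096, NULL, SPDK_ENV_SOCKET_ID_ANY, SPDK_MALLOC_DMA);

	/* Submit a read on this thread's qpair and poll it for the completion. */
	spdk_nvme_ns_cmd_read(ns, qpair, buf, 0, 1, io_complete, &done, 0);
	while (!done) {
		spdk_nvme_qpair_process_completions(qpair, 0);
	}

	spdk_free(buf);
	spdk_nvme_ctrlr_free_io_qpair(qpair);
	return NULL;
}

int
main(void)
{
	struct spdk_env_opts opts;
	struct spdk_nvme_transport_id trid = {0};
	pthread_t threads[2];

	spdk_env_opts_init(&opts);
	spdk_env_init(&opts);

	/* Placeholder PCIe address. */
	spdk_nvme_transport_id_parse(&trid, "trtype:PCIe traddr:0000:01:00.0");

	/* Connect to the controller once, from this thread only... */
	g_ctrlr = spdk_nvme_connect(&trid, NULL, 0);

	/* ...then use the returned handle from the worker threads. */
	for (int i = 0; i < 2; i++) {
		pthread_create(&threads[i], NULL, worker, NULL);
	}
	for (int i = 0; i < 2; i++) {
		pthread_join(threads[i], NULL);
	}

	spdk_nvme_detach(g_ctrlr);
	return 0;
}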
Thanks,
Trevor
8 months, 3 weeks
[Release] 20.04: SPDK Top, IDXD, NVMe qpair groups
by Zawadzki, Tomasz
On behalf of the SPDK community I'm pleased to announce the release of SPDK 20.04!
This release contains the following new features:
- SPDK Top: Added an application that allows users to monitor resource consumption by a running SPDK application.
- NVMe qpair groups: Added an API that allows grouping NVMe qpairs so they can be polled as a single entity.
- OCF: Added support for OCF v20.03.
- Crypto bdev: Added support for AES_XTS for the QAT polled mode driver.
- IDXD: Added support for IDXD as an accel plug-in module allowing for use with the generic accel framework API. This feature is considered experimental.
Note: Legacy INI style configuration for SPDK applications has been deprecated and will be removed in a future release. Please switch to JSON-RPC configuration files and/or RPC driven run-time configuration.
The full changelog for this release is available at:
https://github.com/spdk/spdk/releases/tag/v20.04
This release contains 989 commits from 43 authors with over 61k lines of code changed.
We'd especially like to recognize all of our first time contributors:
Allen Zhu
Charles Machalow
Maciej Szczepaniak
Michał Berger
Sudheer Mogilappagari
Sylvain Didelot
Xiaohui Zhu
Thanks to everyone for your contributions, participation, and effort!
Thanks,
Tomek
8 months, 3 weeks
Long IO latency with NVMe/TCP target due to large low watermark socket setting
by Wenhua Liu
Hi,
We opened an issue https://github.com/spdk/spdk/issues/1377 but did not see a response, so I thought I should bring it up here.
In the SPDK NVMe/TCP target, when initializing the socket, the low watermark is set to sizeof(struct spdk_nvme_tcp_c2h_data_hdr), which is 24 bytes. In our testing, sometimes a very small data packet (as small as 16 bytes) may be sent on the wire. After this, if no more data is sent on the same socket, this small packet won’t be received by the NVMe/TCP controller qpair thread because its size hasn’t reached the low watermark. Because of this, the qpair thread keeps waiting for more data to come in while the initiator keeps waiting for the IO request to complete. Hence the delay.
As the minimum amount of data that allows the target to determine the PDU type is sizeof(struct spdk_nvme_tcp_common_pdu_hdr), which is 8 bytes, we changed the low watermark setting as below. With the change, the problem was gone immediately.
--- tcp.c	2020-04-29 19:48:25.857651523 +0000
+++ tcp.c.new	2020-04-30 05:31:37.196499792 +0000
@@ -911,7 +911,7 @@ spdk_nvmf_tcp_qpair_sock_init(struct spd
 	int rc;
 
 	/* set low water mark */
-	rc = spdk_sock_set_recvlowat(tqpair->sock, sizeof(struct spdk_nvme_tcp_c2h_data_hdr));
+	rc = spdk_sock_set_recvlowat(tqpair->sock, sizeof(struct spdk_nvme_tcp_common_pdu_hdr));
 	if (rc != 0) {
 		SPDK_ERRLOG("spdk_sock_set_recvlowat() failed\n");
 		return rc;
I would suggest SPDK include this change in a future release.
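For illustration, here is a minimal standalone sketch (not SPDK code; it assumes the posix sock backend, where spdk_sock_set_recvlowat() boils down to a setsockopt(SO_RCVLOWAT) call) of why a packet smaller than the low watermark never wakes the poller:

/*
 * With the low watermark at 24 bytes, a 16-byte PDU queued in the receive
 * buffer does not make the socket poll()-readable, so the qpair poller never
 * wakes up to read it until more data arrives. With the watermark at 8 bytes
 * (the common PDU header size), the socket becomes readable immediately.
 */
#include <poll.h>
#include <sys/socket.h>

/* Rough equivalent of spdk_sock_set_recvlowat() for a raw fd. */
static int
set_recv_lowat(int fd, int nbytes)
{
	return setsockopt(fd, SOL_SOCKET, SO_RCVLOWAT, &nbytes, sizeof(nbytes));
}

/* Returns > 0 only once at least SO_RCVLOWAT bytes are queued on the socket. */
static int
wait_readable(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };

	return poll(&pfd, 1, timeout_ms);
}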
Thanks,
-Wenhua Liu
8 months, 3 weeks
Re: The parameter max_qpairs_per_ctrlr is misleading
by Wenhua Liu
Thanks for the suggestion. I created issue https://github.com/spdk/spdk/issues/1378.
Regards,
-Wenhua
On 4/29/20, 9:38 PM, "Yang, Ziye" <ziye.yang(a)intel.com> wrote:
Hi Wenhua,
Could you submit a GitHub issue at https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.... ?
We just had a community meeting and discussed this issue. We will change it in the SPDK code and target the 20.07 release.
Best Regards
Ziye Yang
-----Original Message-----
From: Wenhua Liu <liuw(a)vmware.com>
Sent: Thursday, April 30, 2020 11:59 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: [SPDK] The parameter max_qpairs_per_ctrlr is misleading
When creating an nvmf transport using the command “rpc.py nvmf_create_transport”, there is a parameter max_qpairs_per_ctrlr. It means the number of IO queues plus the admin queue per controller. But when people talk about the number of queues, they usually mean the number of IO queues. When someone creates an nvmf transport, if they know what max_qpairs_per_ctrlr means and want 8 IO queues, they will set this parameter to 9. If they pass 8, they will get only 7 IO queues.
In the worst case, if someone sets this parameter to 1, they won’t get any IO queues at all!
I would suggest the SPDK NVMF target change the definition of this parameter and treat it as the number of IO queues. Internally, SPDK should allow one extra qpair to be created for the admin queue.
Thanks,
-Wenhua Liu
8 months, 3 weeks
The parameter max_qpairs_per_ctrlr is misleading
by Wenhua Liu
When creating an nvmf transport using the command “rpc.py nvmf_create_transport”, there is a parameter max_qpairs_per_ctrlr. It means the number of IO queues plus the admin queue per controller. But when people talk about the number of queues, they usually mean the number of IO queues. When someone creates an nvmf transport, if they know what max_qpairs_per_ctrlr means and want 8 IO queues, they will set this parameter to 9. If they pass 8, they will get only 7 IO queues.
In the worst case, if someone sets this parameter to 1, they won’t get any IO queues at all!
I would suggest the SPDK NVMF target change the definition of this parameter and treat it as the number of IO queues. Internally, SPDK should allow one extra qpair to be created for the admin queue.
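To make the suggested behavior concrete, here is a tiny hypothetical sketch (the identifiers are illustrative only, not actual SPDK code):

#include <stdint.h>

/*
 * Hypothetical sketch of the suggested semantics: treat the user-supplied
 * value as the number of IO queues and add the admin queue internally.
 */
static uint32_t
total_qpairs_for_ctrlr(uint32_t requested_io_queues)
{
	/* e.g. requested_io_queues = 8 -> 9 qpairs total (8 IO + 1 admin) */
	return requested_io_queues + 1;
}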
Thanks,
-Wenhua Liu
8 months, 3 weeks
Preparation for SPDK 20.04 release
by Zawadzki, Tomasz
Hello all,
The merge window for the SPDK 20.04 release will close by April 24th.
Please ensure all patches you believe should be included in the release are merged to the master branch by this date.
You can mark them for inclusion by adding the hashtag '20.04' to those patches in Gerrit.
The current set of patches that are tagged and need to be reviewed can be seen here:
https://review.spdk.io/gerrit/q/hashtag:%2220.04%22+status:open
On April 24th a new branch, 'v20.04.x', will be created, and a patch on it will be tagged as the release candidate.
Then, on April 30th, the formal release will take place, tagging the last patch on the branch as SPDK 20.04.
Between the release candidate and the formal release, only critical fixes shall be backported to the 'v20.04.x' branch.
Development can continue without disruption on the 'master' branch.
PS. Due to the rescheduling of the US SPDK Summit, the code freeze was moved back to the original date.
Thanks,
Tomek
8 months, 4 weeks
vhost is destroyed many times when starting vhost device
by Li Feng
Hi,
In rte_vhost_compat.c, spdk_extern_vhost_pre_msg_handler calls
destroy_device when certain conditions are met.
I tested this and found that when a vhost-blk device is set up, the vhost
device gets destroyed 9 times.
The bdev (no matter which type) is affected as well. I'm using the aio
bdev, and io_setup/io_destroy are called 9 times when a disk
is inserted into a VM.
My concern is:
Is there any way to reduce the number of these calls (currently at least: number of disks * 9)?
These calls slow down the VM boot time if the VM has multiple disks.
There are some related logs:
Starting SPDK v20.01-pre git sha1 93eecf627 / DPDK 19.11.0 initialization...
[ DPDK EAL parameters: spdk_tgt --no-shconf -c 1 -m 1 --log-level=lib.eal:6 --log-level=lib.cryptodev:5 --log-level=user1:6 --iova-mode=pa --base-virtaddr=0x200000000000 --match-allocations --file-prefix=spdk_pid2004865 ]
EAL: No available hugepages reported in hugepages-1048576kB
app.c: 645:spdk_app_start: *NOTICE*: Total cores available: 1
reactor.c: 346:_spdk_reactor_run: *NOTICE*: Reactor started on core 0
bdev_aio.c: 563:bdev_aio_group_create_cb: *ERROR*: call io_setup
bdev_aio.c: 581:bdev_aio_group_destroy_cb: *ERROR*: call io_destroy
bdev_aio.c: 563:bdev_aio_group_create_cb: *ERROR*: call io_setup
rte_vhost_compat.c: 225:spdk_extern_vhost_pre_msg_handler: *ERROR*: destroy device
rte_vhost_compat.c: 130:stop_device: *ERROR*: call stop_device
bdev_aio.c: 581:bdev_aio_group_destroy_cb: *ERROR*: call io_destroy
bdev_aio.c: 563:bdev_aio_group_create_cb: *ERROR*: call io_setup
rte_vhost_compat.c: 225:spdk_extern_vhost_pre_msg_handler: *ERROR*: destroy device
rte_vhost_compat.c: 130:stop_device: *ERROR*: call stop_device
bdev_aio.c: 581:bdev_aio_group_destroy_cb: *ERROR*: call io_destroy
bdev_aio.c: 563:bdev_aio_group_create_cb: *ERROR*: call io_setup
rte_vhost_compat.c: 225:spdk_extern_vhost_pre_msg_handler: *ERROR*: destroy device
rte_vhost_compat.c: 130:stop_device: *ERROR*: call stop_device
bdev_aio.c: 581:bdev_aio_group_destroy_cb: *ERROR*: call io_destroy
bdev_aio.c: 563:bdev_aio_group_create_cb: *ERROR*: call io_setup
rte_vhost_compat.c: 225:spdk_extern_vhost_pre_msg_handler: *ERROR*: destroy device
rte_vhost_compat.c: 130:stop_device: *ERROR*: call stop_device
bdev_aio.c: 581:bdev_aio_group_destroy_cb: *ERROR*: call io_destroy
bdev_aio.c: 563:bdev_aio_group_create_cb: *ERROR*: call io_setup
rte_vhost_compat.c: 174:spdk_extern_vhost_pre_msg_handler: *ERROR*: destroy device
rte_vhost_compat.c: 130:stop_device: *ERROR*: call stop_device
bdev_aio.c: 581:bdev_aio_group_destroy_cb: *ERROR*: call io_destroy
bdev_aio.c: 563:bdev_aio_group_create_cb: *ERROR*: call io_setup
rte_vhost_compat.c: 193:spdk_extern_vhost_pre_msg_handler: *ERROR*: destroy device
rte_vhost_compat.c: 130:stop_device: *ERROR*: call stop_device
bdev_aio.c: 581:bdev_aio_group_destroy_cb: *ERROR*: call io_destroy
bdev_aio.c: 563:bdev_aio_group_create_cb: *ERROR*: call io_setup
rte_vhost_compat.c: 225:spdk_extern_vhost_pre_msg_handler: *ERROR*: destroy device
rte_vhost_compat.c: 130:stop_device: *ERROR*: call stop_device
bdev_aio.c: 581:bdev_aio_group_destroy_cb: *ERROR*: call io_destroy
bdev_aio.c: 563:bdev_aio_group_create_cb: *ERROR*: call io_setup
rte_vhost_compat.c: 225:spdk_extern_vhost_pre_msg_handler: *ERROR*: destroy device
rte_vhost_compat.c: 130:stop_device: *ERROR*: call stop_device
bdev_aio.c: 581:bdev_aio_group_destroy_cb: *ERROR*: call io_destroy
bdev_aio.c: 563:bdev_aio_group_create_cb: *ERROR*: call io_setup
rte_vhost_compat.c: 225:spdk_extern_vhost_pre_msg_handler: *ERROR*: destroy device
rte_vhost_compat.c: 130:stop_device: *ERROR*: call stop_device
bdev_aio.c: 581:bdev_aio_group_destroy_cb: *ERROR*: call io_destroy
bdev_aio.c: 563:bdev_aio_group_create_cb: *ERROR*: call io_setup
Thanks,
Feng Li
8 months, 4 weeks