Does the BlobFS Asynchronous API support multi-threaded writing?
by chen.zhenghua@zte.com.cn
Hi everyone,
I did a simple test of the BlobFS asynchronous API, using the SPDK event framework to execute multiple tasks, each task writing one file.
But it doesn't work: spdk_file_write_async() reported an error when resizing the file.
The call stack looks like this:
spdk_file_write_async() -> __readwrite() -> spdk_file_truncate_async() -> spdk_blob_resize()
The resize operation must be done on the metadata thread that invoked spdk_fs_load(), so only the task dispatched to the metadata CPU core works.
That is to say, only one thread can be used to write files. This is hard to use, and performance issues may arise.
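In other words, it looks like every write has to be funneled to the thread that called spdk_fs_load(). A minimal sketch of that idea (write_req and g_md_thread are my own illustrative names, and the spdk_file_write_async() signature may differ slightly between SPDK versions):

#include <stdlib.h>
#include "spdk/blobfs.h"
#include "spdk/thread.h"

/* Illustration only: dispatch each write to the metadata thread, since
 * the implicit spdk_blob_resize() must run there. */
struct write_req {
    struct spdk_file *file;
    struct spdk_io_channel *channel;   /* must be allocated on the metadata thread */
    void *payload;
    uint64_t offset;
    uint64_t length;
    spdk_file_op_complete cb_fn;
    void *cb_arg;
};

static struct spdk_thread *g_md_thread; /* the thread that called spdk_fs_load() */

static void
do_write_on_md_thread(void *ctx)
{
    struct write_req *req = ctx;

    spdk_file_write_async(req->file, req->channel, req->payload,
                          req->offset, req->length, req->cb_fn, req->cb_arg);
    free(req);
}

/* Called from any reactor: hand the write over to the metadata thread. */
static void
submit_write(struct write_req *req)
{
    spdk_thread_send_msg(g_md_thread, do_write_on_md_thread, req);
}

Even then, everything still serializes on one core, which is exactly the limitation I am asking about.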
Does anyone know more about this?
Thanks very much.
Query regarding nvme_tcp_read_data() API.
by Senthil Kumar Veluswamy
Hi,
In SPDK v20.07, I have a query about the code below.
In nvme_tcp_read_data(), if spdk_sock_recv() returns 0 (i.e. ret == 0), that condition is treated as FATAL, and the caller of nvme_tcp_read_data(), i.e. nvmf_tcp_sock_process(), disconnects the connection.
So I would like to know why this condition is treated as a fatal condition and a disconnect is issued, given that readv() can return zero bytes.
static int
nvme_tcp_read_data(struct spdk_sock *sock, int bytes,
                   void *buf)
{
    int ret;

    ret = spdk_sock_recv(sock, buf, bytes);

    if (ret > 0) {
        return ret;
    }

    if (ret < 0) {
        if (errno == EAGAIN || errno == EWOULDBLOCK) {
            return 0;
        }

        /* For connect reset issue, do not output error log */
        if (errno != ECONNRESET) {
            SPDK_ERRLOG("spdk_sock_recv() failed, errno %d: %s\n",
                        errno, spdk_strerror(errno));
        }
    }

    /* connection closed */
    return NVME_TCP_CONNECTION_FATAL;
}
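For context, this is my reading of how a caller is meant to interpret these return values (a simplified sketch of my own, not the actual nvmf_tcp_sock_process() code):

/* Simplified illustration of the calling convention:
 *   > 0                       -> that many bytes were read
 *   == 0                      -> nothing available yet (EAGAIN/EWOULDBLOCK), retry later
 *   NVME_TCP_CONNECTION_FATAL -> treat the connection as closed/broken
 */
static int
handle_pdu_bytes(struct spdk_sock *sock, void *buf, int want)
{
    int ret = nvme_tcp_read_data(sock, want, buf);

    if (ret == NVME_TCP_CONNECTION_FATAL) {
        /* spdk_sock_recv() returned 0 or a non-retryable error;
         * the caller tears down the qpair here. */
        return -1;
    }

    /* ret is either 0 (no data yet) or the number of bytes read. */
    return ret;
}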
Thanks,
Senthil Kumar V.
Can RocksDB workloads run on BlobFS in a QEMU virtual machine?
by zhaos@nbjl.nankai.edu.cn
Hi
I am trying to test the performance of RocksDB in a virtual machine where both the guest and host NVMe drivers work in user mode.
Running spdk-rocksdb on bare metal alone works without a problem, but the same configuration fails in the VM.
It seems feasible in theory, but I don't know if anyone has tried it before.
Running configuration:
Guest:
- spdk & spdk/rocksdb are mainline version.
- follow the BlobFS Getting Started Guide.
Host:
- spdk 18.04 release
- spdk/qemu with vhost-user-nvme patch (https://review.gerrithub.io/c/spdk/qemu/+/409094/)
- follow the vhost-target guide
Error log in Guest:
[root@localhost rocksdb-spdk]# ./db_bench --benchmarks="fillrandom" -spdk=/root/spdk/rocksdb.json -spdk_bdev=Nvme0n1
[2021-11-23 20:22:17.631040] Starting SPDK v22.01-pre git sha1 a1014fc / DPDK 21.08.0 initialization...
[2021-11-23 20:22:17.631315] [ DPDK EAL parameters: [2021-11-23 20:22:17.631360] rocksdb [2021-11-23 20:22:17.631391] --no-shconf [2021-11-23 20:22:17.631438] -c 0x1 [2021-11-23 20:22:17.631475] --log-level=lib.eal:6 [2021-11-23 20:22:17.631504] --log-level=lib.cryptodev:5 [2021-11-23 20:22:17.631534] --log-level=user1:6 [2021-11-23 20:22:17.631563] --iova-mode=pa [2021-11-23 20:22:17.631592] --base-virtaddr=0x200000000000 [2021-11-23 20:22:17.631621] --match-allocations [2021-11-23 20:22:17.631649] --file-prefix=spdk_pid12118 [2021-11-23 20:22:17.631678] ]
TELEMETRY: No legacy callbacks, legacy socket not created
[2021-11-23 20:22:17.775111] app.c: 543:spdk_app_start: *NOTICE*: Total cores available: 1
[2021-11-23 20:22:17.935764] app.c: 389:app_setup_trace: *NOTICE*: Tracepoint Group Mask 0x80 specified.
[2021-11-23 20:22:17.935853] app.c: 393:app_setup_trace: *NOTICE*: Use 'spdk_trace -s rocksdb -p 12118' to capture a snapshot of events at runtime.
[2021-11-23 20:22:17.935874] app.c: 395:app_setup_trace: *NOTICE*: Or copy /dev/shm/rocksdb_trace.pid12118 for offline analysis/debug.
[2021-11-23 20:22:17.935898] reactor.c: 943:reactor_run: *NOTICE*: Reactor started on core 0
[2021-11-23 20:22:17.936476] accel_engine.c:1012:spdk_accel_engine_initialize: *NOTICE*: Accel engine initialized to use software engine.
[2021-11-23 20:22:18.079801] nvme_qpair.c: 241:nvme_admin_qpair_print_command: *NOTICE*: SET FEATURES ASYNC EVENT CONFIGURATION cid:23 cdw10:0000000b PRP1 0x0 PRP2 0x0
[2021-11-23 20:22:18.079888] nvme_qpair.c: 456:spdk_nvme_print_completion: *NOTICE*: INVALID FIELD (00/02) qid:0 cid:23 cdw0:0 sqhd:0004 p:1 m:0 dnr:1
[2021-11-23 20:22:18.079911] nvme_ctrlr.c:3185:nvme_ctrlr_configure_aer_done: *NOTICE*: [0000:00:03.0] nvme_ctrlr_configure_aer failed!
using bdev Nvme0n1
Initializing RocksDB Options from the specified file
Initializing RocksDB Options from command-line flags
open error: Invalid argument: Found options incompatible with filesystem: IO error: /tmp/rocksdbtest-0/dbbench/CURRENT: No such file or directory
[root@localhost examples]# ./identify
[2021-11-23 20:27:06.080535] Starting SPDK v22.01-pre git sha1 a1014fc / DPDK 21.08.0 initialization...
[2021-11-23 20:27:06.080773] [ DPDK EAL parameters: [2021-11-23 20:27:06.080824] identify [2021-11-23 20:27:06.080854] --no-shconf [2021-11-23 20:27:06.080902] -c 0x1 [2021-11-23 20:27:06.080937] -n 1 [2021-11-23 20:27:06.080965] -m 0 [2021-11-23 20:27:06.080992] --log-level=lib.eal:6 [2021-11-23 20:27:06.081019] --log-level=lib.cryptodev:5 [2021-11-23 20:27:06.081047] --log-level=user1:6 [2021-11-23 20:27:06.081074] --iova-mode=pa [2021-11-23 20:27:06.081101] --base-virtaddr=0x200000000000 [2021-11-23 20:27:06.081129] --match-allocations [2021-11-23 20:27:06.081168] --file-prefix=spdk_pid12136 [2021-11-23 20:27:06.081196] ]
TELEMETRY: No legacy callbacks, legacy socket not created
[2021-11-23 20:27:06.293188] nvme_qpair.c: 241:nvme_admin_qpair_print_command: *NOTICE*: SET FEATURES ASYNC EVENT CONFIGURATION cid:23 cdw10:0000000b PRP1 0x0 PRP2 0x0
[2021-11-23 20:27:06.293340] nvme_qpair.c: 456:spdk_nvme_print_completion: *NOTICE*: INVALID FIELD (00/02) qid:0 cid:23 cdw0:0 sqhd:0004 p:1 m:0 dnr:1
[2021-11-23 20:27:06.293394] nvme_ctrlr.c:3185:nvme_ctrlr_configure_aer_done: *NOTICE*: [0000:00:03.0] nvme_ctrlr_configure_aer failed!
[2021-11-23 20:27:06.293925] nvme_qpair.c: 241:nvme_admin_qpair_print_command: *NOTICE*: GET FEATURES ARBITRATION cid:23 cdw10:00000001 PRP1 0x0 PRP2 0x0
[2021-11-23 20:27:06.293986] nvme_qpair.c: 456:spdk_nvme_print_completion: *NOTICE*: INVALID FIELD (00/02) qid:0 cid:23 cdw0:0 sqhd:0006 p:1 m:0 dnr:1
get_feature(0x01) failed
[2021-11-23 20:27:06.294317] nvme_qpair.c: 241:nvme_admin_qpair_print_command: *NOTICE*: GET FEATURES POWER MANAGEMENT cid:23 cdw10:00000002 PRP1 0x0 PRP2 0x0
[2021-11-23 20:27:06.294374] nvme_qpair.c: 456:spdk_nvme_print_completion: *NOTICE*: INVALID FIELD (00/02) qid:0 cid:23 cdw0:0 sqhd:0007 p:1 m:0 dnr:1
get_feature(0x02) failed
[2021-11-23 20:27:06.294563] nvme_qpair.c: 241:nvme_admin_qpair_print_command: *NOTICE*: GET FEATURES TEMPERATURE THRESHOLD cid:23 cdw10:00000004 PRP1 0x0 PRP2 0x0
[2021-11-23 20:27:06.294618] nvme_qpair.c: 456:spdk_nvme_print_completion: *NOTICE*: INVALID FIELD (00/02) qid:0 cid:23 cdw0:0 sqhd:0008 p:1 m:0 dnr:1
get_feature(0x04) failed
[2021-11-23 20:27:06.295011] nvme_qpair.c: 251:nvme_admin_qpair_print_command: *NOTICE*: GET LOG PAGE (02) qid:0 cid:23 nsid:ffffffff cdw10:000f0001 cdw11:00000000 PRP1 0x1ac3bf000 PRP2 0x0
[2021-11-23 20:27:06.295070] nvme_qpair.c: 456:spdk_nvme_print_completion: *NOTICE*: INVALID OPCODE (00/01) qid:0 cid:23 cdw0:0 sqhd:000a p:1 m:0 dnr:1
get log page failed
[2021-11-23 20:27:06.295159] nvme_qpair.c: 251:nvme_admin_qpair_print_command: *NOTICE*: GET LOG PAGE (02) qid:0 cid:22 nsid:ffffffff cdw10:007f0002 cdw11:00000000 PRP1 0x1ac3be000 PRP2 0x0
[2021-11-23 20:27:06.295207] nvme_qpair.c: 456:spdk_nvme_print_completion: *NOTICE*: INVALID OPCODE (00/01) qid:0 cid:22 cdw0:0 sqhd:000b p:1 m:0 dnr:1
get log page failed
[2021-11-23 20:27:06.295264] nvme_qpair.c: 251:nvme_admin_qpair_print_command: *NOTICE*: GET LOG PAGE (02) qid:0 cid:21 nsid:ffffffff cdw10:007f0003 cdw11:00000000 PRP1 0x1ac3bd000 PRP2 0x0
[2021-11-23 20:27:06.295309] nvme_qpair.c: 456:spdk_nvme_print_completion: *NOTICE*: INVALID OPCODE (00/01) qid:0 cid:21 cdw0:0 sqhd:000c p:1 m:0 dnr:1
get log page failed
=====================================================
NVMe Controller at 0000:00:03.0 [8086:5845]
=====================================================
Controller Capabilities/Features
================================
Vendor ID: 8086
Subsystem Vendor ID: 8086
Serial Number: NVMe_vhost.2
Model Number: SPDK Virtual NVMe Controller
Firmware Version: 18.04
Recommended Arb Burst: 6
IEEE OUI Identifier: e4 d2 5c
Multi-path I/O
May have multiple subsystem ports: No
May have multiple controllers: No
Associated with SR-IOV VF: No
Max Data Transfer Size: 131072
Max Number of Namespaces: 1
NVMe Specification Version (VS): 1.0
NVMe Specification Version (Identify): 1.0
Maximum Queue Entries: 256
Contiguous Queues Required: Yes
Arbitration Mechanisms Supported
Weighted Round Robin: Not Supported
Vendor Specific: Not Supported
Reset Timeout: 500 ms
Doorbell Stride: 4 bytes
NVM Subsystem Reset: Not Supported
Command Sets Supported
NVM Command Set: Supported
Boot Partition: Not Supported
Memory Page Size Minimum: 4096 bytes
Memory Page Size Maximum: 4096 bytes
Persistent Memory Region: Not Supported
Optional Asynchronous Events Supported
Namespace Attribute Notices: Not Supported
Firmware Activation Notices: Not Supported
128-bit Host Identifier: Not Supported
Controller Memory Buffer Support
================================
Supported: No
Persistent Memory Region Support
================================
Supported: No
Admin Command Set Attributes
============================
Security Send/Receive: Not Supported
Format NVM: Not Supported
Firmware Activate/Download: Not Supported
Namespace Management: Not Supported
Device Self-Test: Not Supported
Directives: Not Supported
NVMe-MI: Not Supported
Virtualization Management: Not Supported
Doorbell Buffer Config: Supported
Abort Command Limit: 1
Async Event Request Limit: 1
Number of Firmware Slots: N/A
Firmware Slot 1 Read-Only: N/A
Firmware Update Granularity: No Information Provided
Per-Namespace SMART Log: No
Asymmetric Namespace Access Log Page: Not Supported
Command Effects Log Page: Not Supported
Get Log Page Extended Data: Not Supported
Telemetry Log Pages: Not Supported
Error Log Page Entries Supported: 1
Keep Alive: Not Supported
NVM Command Set Attributes
==========================
Submission Queue Entry Size
Max: 64
Min: 64
Completion Queue Entry Size
Max: 16
Min: 16
Number of Namespaces: 1
Compare Command: Not Supported
Write Uncorrectable Command: Not Supported
Dataset Management Command: Supported
Write Zeroes Command: Not Supported
Set Features Save Field: Not Supported
Reservations: Not Supported
Timestamp: Not Supported
Copy: Not Supported
Volatile Write Cache: Not Present
Atomic Write Unit (Normal): 1
Atomic Write Unit (PFail): 1
Atomic Compare & Write Unit: 1
Fused Compare & Write: Not Supported
Scatter-Gather List
SGL Command Set: Not Supported
SGL Keyed: Not Supported
SGL Bit Bucket Descriptor: Not Supported
SGL Metadata Pointer: Not Supported
Oversized SGL: Not Supported
SGL Metadata Address: Not Supported
SGL Offset: Not Supported
Transport SGL Data Block: Not Supported
Replay Protected Memory Block: Not Supported
Firmware Slot Information
=========================
Active slot: 0
Error Log
=========
Number of Queues
================
Number of I/O Submission Queues: 1
Number of I/O Completion Queues: 1
Active Namespaces
=================
[2021-11-23 20:27:06.296348] nvme_qpair.c: 241:nvme_admin_qpair_print_command: *NOTICE*: GET FEATURES ERROR_RECOVERY cid:21 cdw10:00000005 PRP1 0x0 PRP2 0x0
[2021-11-23 20:27:06.296404] nvme_qpair.c: 456:spdk_nvme_print_completion: *NOTICE*: INVALID FIELD (00/02) qid:0 cid:21 cdw0:0 sqhd:000d p:1 m:0 dnr:1
get_feature(0x05) failed
Namespace ID:1
Command Set Identifier: NVM (00h)
Deallocate: Supported
Deallocated/Unwritten Error: Not Supported
Deallocated Read Value: Unknown
Deallocate in Write Zeroes: Not Supported
Deallocated Guard Field: 0xFFFF
Flush: Not Supported
Reservation: Not Supported
Namespace Sharing Capabilities: Private
Size (in LBAs): 937703088 (447GiB)
Capacity (in LBAs): 937703088 (447GiB)
Utilization (in LBAs): 937703088 (447GiB)
Thin Provisioning: Not Supported
Per-NS Atomic Units: No
NGUID/EUI64 Never Reused: No
Number of LBA Formats: 1
Current LBA Format: LBA Format #00
LBA Format #00: Data Size: 512 Metadata Size: 0
nvme_bdev_add_ns: *ERROR*: Namespaces are not identical.
by lullajd@yahoo.com
Hi,
This is with v21.10-rc1-122-g64fa301f6
When I try to configure multipath and run bdevperf, I see that nvme_bdev_add_ns() complains that "Namespaces are not identical". Could anybody please suggest what is wrong in the conf file?
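For what it's worth, my understanding (an assumption on my part, not the actual bdev_nvme code) is that a second path is only attached to the same bdev if the namespace identify data matches on both controllers, roughly along the lines of the sketch below.

#include <stdbool.h>
#include <string.h>
#include "spdk/nvme.h"
#include "spdk/uuid.h"

/* Hypothetical illustration of what "identical namespaces" could mean:
 * size and unique identifiers (NGUID/EUI64/UUID) must agree on both paths. */
static bool
ns_looks_identical(struct spdk_nvme_ns *ns1, struct spdk_nvme_ns *ns2)
{
    const struct spdk_nvme_ns_data *d1 = spdk_nvme_ns_get_data(ns1);
    const struct spdk_nvme_ns_data *d2 = spdk_nvme_ns_get_data(ns2);
    const struct spdk_uuid *u1 = spdk_nvme_ns_get_uuid(ns1);
    const struct spdk_uuid *u2 = spdk_nvme_ns_get_uuid(ns2);

    if (d1->nsze != d2->nsze) {
        return false;
    }
    if (memcmp(d1->nguid, d2->nguid, sizeof(d1->nguid)) != 0 || d1->eui64 != d2->eui64) {
        return false;
    }
    if ((u1 == NULL) != (u2 == NULL)) {
        return false;
    }
    if (u1 != NULL && spdk_uuid_compare(u1, u2) != 0) {
        return false;
    }
    return true;
}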
Here is my nvme discover o/p at initiator:
root@myVM:~/user/spdk# nvme discover -t tcp -a 10.1.109.152 -q nqn.2015-09.com.Xyz:15.4.7.3
Discovery Log Number of Records 2, Generation counter 1
=====Discovery Log Entry 0======
trtype: tcp
adrfam: ipv4
subtype: nvme subsystem
treq: not specified
portid: 0
trsvcid: 4420
subnqn: nqn.2015-09.com.xyz:93111e6e-663a-4ad0-9028-a3d75d90e5d3
traddr: 15.161.1.2
sectype: none
=====Discovery Log Entry 1======
trtype: tcp
adrfam: ipv4
subtype: nvme subsystem
treq: not specified
portid: 0
trsvcid: 4420
subnqn: nqn.2015-09.com.xyz:93111e6e-663a-4ad0-9028-a3d75d90e5d3
traddr: 15.162.2.2
sectype: none
Here is my bdev_new.conf:
root@myVM:~/user/spdk# cat ../bdev_new.conf
{
"subsystems": [
{
"subsystem": "bdev",
"config": [
{
"method": "bdev_nvme_set_options", "params": {
"timeout_us": 255000000,
"bdev_retry_count": -1,
"action_on_timeout": "none"
}
},
{
"method": "bdev_nvme_attach_controller", "params": {
"name": "Nvme0",
"trtype": "tcp",
"traddr": "15.161.1.2",
"trsvcid": "4420",
"subnqn": "nqn.2015-09.com.xyz:93111e6e-663a-4ad0-9028-a3d75d90e5d3",
"adrfam": "IPv4",
"hostnqn": "nqn.2015-09.com.Xyz:15.4.7.3",
"multipath": "multipath"
}
},
{
"method": "bdev_nvme_attach_controller", "params": {
"name": "Nvme0",
"trtype": "tcp",
"traddr": "15.162.2.2",
"trsvcid": "4420",
"subnqn": "nqn.2015-09.com.xyz:93111e6e-663a-4ad0-9028-a3d75d90e5d3",
"adrfam": "IPv4",
"multipath": "multipath",
"hostnqn": "nqn.2015-09.com.Xyz:15.4.7.3"
}
}
]
}
]
}
Here is the bdevperf run showing "Namespaces are not identical":
root@myVM:~/user/spdk# ./test/bdev/bdevperf/bdevperf -q 1 -o 4096 -w randwrite -c /home/localadmin/user/bdev_new.conf -t 60 -k 9999999
[2021-11-21 22:56:23.465497] Starting SPDK v22.01-pre git sha1 64fa301f6 / DPDK 21.08.0 initialization...
[2021-11-21 22:56:23.465680] [ DPDK EAL parameters: [2021-11-21 22:56:23.466282] bdevperf [2021-11-21 22:56:23.466548] --no-shconf [2021-11-21 22:56:23.466602] -c 0x1 [2021-11-21 22:56:23.467107] --log-level=lib.eal:6 [2021-11-21 22:56:23.467165] --log-level=lib.cryptodev:5 [2021-11-21 22:56:23.467690] --log-level=user1:6 [2021-11-21 22:56:23.467745] --iova-mode=pa [2021-11-21 22:56:23.467994] --base-virtaddr=0x200000000000 [2021-11-21 22:56:23.468136] --match-allocations [2021-11-21 22:56:23.468172] --file-prefix=spdk_pid12913 [2021-11-21 22:56:23.468262] ]
TELEMETRY: No legacy callbacks, legacy socket not created
[2021-11-21 22:56:23.600199] app.c: 543:spdk_app_start: *NOTICE*: Total cores available: 1
[2021-11-21 22:56:23.877499] reactor.c: 943:reactor_run: *NOTICE*: Reactor started on core 0
[2021-11-21 22:56:23.886093] accel_engine.c:1012:spdk_accel_engine_initialize: *NOTICE*: Accel engine initialized to use software engine.
[2021-11-21 22:56:24.102809] bdev_nvme.c:2555:nvme_bdev_add_ns: *ERROR*: Namespaces are not identical.
Running I/O for 60 seconds...
Job: Nvme0n1 (Core Mask 0x1)
Nvme0n1 : 14367.88 IOPS 56.12 MiB/s
=============================================================
Total : 14367.88 IOPS 56.12 MiB/s
root@myVM:~/user/spdk#
appropriate timeout values in bdev_nvme_set_options for failover tests
by lullajd@yahoo.com
Hi,
This is about the case where the initiator is using SPDK.
For testing NVMe-oF/TCP failover (cold primary/backup), I am trying to use the following timeout values, but they don't seem to help:
{
"subsystems": [
{
"subsystem": "bdev",
"config": [
{
"method": "bdev_nvme_set_options", "params": {
"retry_count": 254,
"timeout_us": 255000000,
"keep_alive_timeout_ms": 255000,
"transport_retry_count": 254,
"bdev_retry_count": -1,
"nvme_adminq_poll_period_us": 100000,
"nvme_ioq_poll_period_us": 0,
In my test, once the I/O has started, after some time I make the primary network path to the NVMe device (NVMe-oF + TCP) unavailable.
I expect the secondary path to become active and the I/O to resume.
Without SPDK (i.e. when I use the kernel drivers on Linux), the path switchover on the initiator takes ~45 seconds and the I/O resumes.
I suspect something is wrong with the timeout values above when trying this with SPDK.
Could somebody please suggest more appropriate values for the above timeouts? And please point out if anything else also needs to be corrected above.
thanks
Jitendra
[RFC PATCH v1] [RFC] Use multiple threads to handle vhost virtqueues
by majieyue@linux.alibaba.com
From: Ma Jie Yue <majieyue(a)linux.alibaba.com>
Currently the vhost virtqueues of the same device are handled by only one SPDK
thread, even if we have many reactors running, which means the performance of a
vhost device cannot be scaled up across multiple cores.
This patch binds each virtqueue to an individual SPDK thread and leverages the
SPDK scheduler's ability to run these threads on different reactors. For now only
the vhost blk module is covered; vhost scsi is left for later.
During testing, spdk_top shows that these threads are indeed dispatched to
different reactors, and the I/O performance also increases with the number
of queues.
Signed-off-by: Ma Jie Yue <majieyue(a)linux.alibaba.com>
---
lib/vhost/vhost.c | 158 +++++++++++++++----
lib/vhost/vhost_blk.c | 384 +++++++++++++++++++++------------------------
lib/vhost/vhost_internal.h | 28 +++-
3 files changed, 324 insertions(+), 246 deletions(-)
diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c
index edae5938e..80343dd1c 100644
--- a/lib/vhost/vhost.c
+++ b/lib/vhost/vhost.c
@@ -1089,6 +1089,46 @@ vhost_session_stop_done(struct spdk_vhost_session *vsession, int response)
vhost_session_cb_done(response);
}
+void
+vhost_session_start_vq_done(struct spdk_vhost_virtqueue *vq, int response)
+{
+ struct spdk_vhost_session *vsession = vq->vsession;
+
+ if (response == 0) {
+ vq->started = true;
+ vsession->active_queues++;
+
+ if (vsession->active_queues == vsession->max_queues) {
+ vsession->started = true;
+
+ assert(vsession->vdev->active_session_num < UINT32_MAX);
+ vsession->vdev->active_session_num++;
+ }
+ }
+
+ vhost_session_cb_done(response);
+}
+
+void
+vhost_session_stop_vq_done(struct spdk_vhost_virtqueue *vq, int response)
+{
+ struct spdk_vhost_session *vsession = vq->vsession;
+
+ if (response == 0) {
+ vq->started = false;
+ vsession->active_queues--;
+
+ if (vsession->active_queues == 0) {
+ vsession->started = false;
+
+ assert(vsession->vdev->active_session_num > 0);
+ vsession->vdev->active_session_num--;
+ }
+ }
+
+ vhost_session_cb_done(response);
+}
+
static void
vhost_event_cb(void *arg1)
{
@@ -1101,7 +1141,7 @@ vhost_event_cb(void *arg1)
}
vsession = vhost_session_find_by_id(ctx->vdev, ctx->vsession_id);
- ctx->cb_fn(ctx->vdev, vsession, NULL);
+ ctx->cb_fn(ctx->vdev, vsession, ctx->user_ctx);
pthread_mutex_unlock(&g_vhost_mutex);
}
@@ -1126,6 +1166,34 @@ vhost_session_send_event(struct spdk_vhost_session *vsession,
return g_dpdk_response;
}
+int
+vhost_session_send_event_mt(struct spdk_vhost_session *vsession,
+ spdk_vhost_session_fn cb_fn, unsigned timeout_sec,
+ const char *errmsg)
+{
+ struct vhost_session_fn_ctx ev_ctx = {0};
+ struct spdk_vhost_dev *vdev = vsession->vdev;
+ unsigned long i;
+
+ ev_ctx.vdev = vdev;
+ ev_ctx.vsession_id = vsession->id;
+ ev_ctx.cb_fn = cb_fn;
+
+ for (i = 0; i < vsession->max_queues; i++) {
+ ev_ctx.user_ctx = (void *)i;
+ spdk_thread_send_msg(vsession->thread[i], vhost_event_cb, &ev_ctx);
+
+ pthread_mutex_unlock(&g_vhost_mutex);
+ wait_for_semaphore(timeout_sec, errmsg);
+ pthread_mutex_lock(&g_vhost_mutex);
+
+ if (g_dpdk_response)
+ break;
+ }
+
+ return g_dpdk_response;
+}
+
static void
foreach_session_finish_cb(void *arg1)
{
@@ -1250,7 +1318,7 @@ int
vhost_stop_device_cb(int vid)
{
struct spdk_vhost_session *vsession;
- int rc;
+ int i, rc;
pthread_mutex_lock(&g_vhost_mutex);
vsession = vhost_session_find_by_vid(vid);
@@ -1267,6 +1335,14 @@ vhost_stop_device_cb(int vid)
}
rc = _stop_session(vsession);
+
+ /* clean up the threads */
+ if (!rc) {
+ for (i = 0; i < vsession->max_queues; i++) {
+ spdk_thread_send_msg(vsession->thread[i], vhost_dev_thread_exit, NULL);
+ }
+ }
+
pthread_mutex_unlock(&g_vhost_mutex);
return rc;
@@ -1280,6 +1356,7 @@ vhost_start_device_cb(int vid)
int rc = -1;
uint16_t i;
bool packed_ring;
+ struct spdk_cpuset *cpumask;
pthread_mutex_lock(&g_vhost_mutex);
@@ -1304,9 +1381,11 @@ vhost_start_device_cb(int vid)
packed_ring = ((vsession->negotiated_features & (1ULL << VIRTIO_F_RING_PACKED)) != 0);
vsession->max_queues = 0;
+ cpumask = spdk_thread_get_cpumask(vdev->thread);
memset(vsession->virtqueue, 0, sizeof(vsession->virtqueue));
for (i = 0; i < SPDK_VHOST_MAX_VQUEUES; i++) {
struct spdk_vhost_virtqueue *q = &vsession->virtqueue[i];
+ char *name;
q->vsession = vsession;
q->vring_idx = -1;
@@ -1362,6 +1441,16 @@ vhost_start_device_cb(int vid)
}
q->packed.packed_ring = packed_ring;
+
+ name = spdk_sprintf_alloc("%s.%u", vsession->name, i);
+ vsession->thread[i] = spdk_thread_create(name, cpumask);
+ free(name);
+ if (!vsession->thread[i]) {
+ SPDK_ERRLOG("Failed to create thread for virtqueue %s.%u", vsession->name, i);
+ rc = -EIO;
+ goto out;
+ }
+
vsession->max_queues = i + 1;
}
@@ -1401,55 +1490,56 @@ vhost_start_device_cb(int vid)
}
out:
+ if (rc) {
+ for (i = 0; i < vsession->max_queues; i++) {
+ spdk_thread_send_msg(vsession->thread[i], vhost_dev_thread_exit, NULL);
+ }
+ }
pthread_mutex_unlock(&g_vhost_mutex);
return rc;
}
void
-vhost_session_set_interrupt_mode(struct spdk_vhost_session *vsession, bool interrupt_mode)
+vhost_session_set_vq_interrupt_mode(struct spdk_vhost_virtqueue *q, bool interrupt_mode)
{
- uint16_t i;
bool packed_ring;
int rc = 0;
+ uint64_t num_events = 1;
+ struct spdk_vhost_session *vsession = q->vsession;
packed_ring = ((vsession->negotiated_features & (1ULL << VIRTIO_F_RING_PACKED)) != 0);
- for (i = 0; i < vsession->max_queues; i++) {
- struct spdk_vhost_virtqueue *q = &vsession->virtqueue[i];
- uint64_t num_events = 1;
+ /* vring.desc and vring.desc_packed are in a union struct
+ * so q->vring.desc can replace q->vring.desc_packed.
+ */
+ if (q->vring.desc == NULL || q->vring.size == 0) {
+ return;
+ }
- /* vring.desc and vring.desc_packed are in a union struct
- * so q->vring.desc can replace q->vring.desc_packed.
- */
- if (q->vring.desc == NULL || q->vring.size == 0) {
- continue;
+ if (interrupt_mode) {
+ /* Enable I/O submission notifications, we'll be interrupting. */
+ if (packed_ring) {
+ * (volatile uint16_t *) &q->vring.device_event->flags = VRING_PACKED_EVENT_FLAG_ENABLE;
+ } else {
+ * (volatile uint16_t *) &q->vring.used->flags = 0;
}
- if (interrupt_mode) {
- /* Enable I/O submission notifications, we'll be interrupting. */
- if (packed_ring) {
- * (volatile uint16_t *) &q->vring.device_event->flags = VRING_PACKED_EVENT_FLAG_ENABLE;
- } else {
- * (volatile uint16_t *) &q->vring.used->flags = 0;
- }
-
- /* In case of race condition, always kick vring when switch to intr */
- rc = write(q->vring.kickfd, &num_events, sizeof(num_events));
- if (rc < 0) {
- SPDK_ERRLOG("failed to kick vring: %s.\n", spdk_strerror(errno));
- }
+ /* In case of race condition, always kick vring when switch to intr */
+ rc = write(q->vring.kickfd, &num_events, sizeof(num_events));
+ if (rc < 0) {
+ SPDK_ERRLOG("failed to kick vring: %s.\n", spdk_strerror(errno));
+ }
- vsession->interrupt_mode = true;
+ vsession->interrupt_mode = true;
+ } else {
+ /* Disable I/O submission notifications, we'll be polling. */
+ if (packed_ring) {
+ * (volatile uint16_t *) &q->vring.device_event->flags = VRING_PACKED_EVENT_FLAG_DISABLE;
} else {
- /* Disable I/O submission notifications, we'll be polling. */
- if (packed_ring) {
- * (volatile uint16_t *) &q->vring.device_event->flags = VRING_PACKED_EVENT_FLAG_DISABLE;
- } else {
- * (volatile uint16_t *) &q->vring.used->flags = VRING_USED_F_NO_NOTIFY;
- }
-
- vsession->interrupt_mode = false;
+ * (volatile uint16_t *) &q->vring.used->flags = VRING_USED_F_NO_NOTIFY;
}
+
+ vsession->interrupt_mode = false;
}
}
diff --git a/lib/vhost/vhost_blk.c b/lib/vhost/vhost_blk.c
index 55fb82530..78821ce09 100644
--- a/lib/vhost/vhost_blk.c
+++ b/lib/vhost/vhost_blk.c
@@ -102,17 +102,20 @@ struct spdk_vhost_blk_session {
/* The parent session must be the very first field in this struct */
struct spdk_vhost_session vsession;
struct spdk_vhost_blk_dev *bvdev;
- struct spdk_poller *requestq_poller;
- struct spdk_io_channel *io_channel;
- struct spdk_poller *stop_poller;
+ struct spdk_poller *requestq_poller[SPDK_VHOST_MAX_VQUEUES];
+ struct spdk_io_channel *io_channel[SPDK_VHOST_MAX_VQUEUES];
+ struct spdk_poller *stop_poller[SPDK_VHOST_MAX_VQUEUES];
};
/* forward declaration */
+static int vhost_blk_stop_vq_cb(struct spdk_vhost_dev *vdev,
+ struct spdk_vhost_session *vsession, void *unused);
+
static const struct spdk_vhost_dev_backend vhost_blk_device_backend;
static int
process_blk_request(struct spdk_vhost_blk_task *task,
- struct spdk_vhost_blk_session *bvsession);
+ struct spdk_vhost_virtqueue *vq);
static struct spdk_vhost_blk_session *
to_blk_session(struct spdk_vhost_session *vsession)
@@ -124,8 +127,8 @@ to_blk_session(struct spdk_vhost_session *vsession)
static void
blk_task_finish(struct spdk_vhost_blk_task *task)
{
- assert(task->bvsession->vsession.task_cnt > 0);
- task->bvsession->vsession.task_cnt--;
+ assert(task->vq->task_cnt > 0);
+ task->vq->task_cnt--;
task->used = false;
}
@@ -421,7 +424,7 @@ blk_request_resubmit(void *arg)
struct spdk_vhost_blk_task *task = (struct spdk_vhost_blk_task *)arg;
int rc = 0;
- rc = process_blk_request(task, task->bvsession);
+ rc = process_blk_request(task, task->vq);
if (rc == 0) {
SPDK_DEBUGLOG(vhost_blk, "====== Task %p resubmitted ======\n", task);
} else {
@@ -435,12 +438,13 @@ blk_request_queue_io(struct spdk_vhost_blk_task *task)
int rc;
struct spdk_vhost_blk_session *bvsession = task->bvsession;
struct spdk_bdev *bdev = bvsession->bvdev->bdev;
+ struct spdk_vhost_virtqueue *vq = task->vq;
task->bdev_io_wait.bdev = bdev;
task->bdev_io_wait.cb_fn = blk_request_resubmit;
task->bdev_io_wait.cb_arg = task;
- rc = spdk_bdev_queue_io_wait(bdev, bvsession->io_channel, &task->bdev_io_wait);
+ rc = spdk_bdev_queue_io_wait(bdev, bvsession->io_channel[vq->vring_idx], &task->bdev_io_wait);
if (rc != 0) {
SPDK_ERRLOG("%s: failed to queue I/O, rc=%d\n", bvsession->vsession.name, rc);
invalid_blk_request(task, VIRTIO_BLK_S_IOERR);
@@ -449,8 +453,10 @@ blk_request_queue_io(struct spdk_vhost_blk_task *task)
static int
process_blk_request(struct spdk_vhost_blk_task *task,
- struct spdk_vhost_blk_session *bvsession)
+ struct spdk_vhost_virtqueue *vq)
{
+ struct spdk_vhost_session *vsession = vq->vsession;
+ struct spdk_vhost_blk_session *bvsession = to_blk_session(vsession);
struct spdk_vhost_blk_dev *bvdev = bvsession->bvdev;
const struct virtio_blk_outhdr *req;
struct virtio_blk_discard_write_zeroes *desc;
@@ -503,12 +509,12 @@ process_blk_request(struct spdk_vhost_blk_task *task,
if (type == VIRTIO_BLK_T_IN) {
task->used_len = payload_len + sizeof(*task->status);
- rc = spdk_bdev_readv(bvdev->bdev_desc, bvsession->io_channel,
+ rc = spdk_bdev_readv(bvdev->bdev_desc, bvsession->io_channel[vq->vring_idx],
&task->iovs[1], task->iovcnt, req->sector * 512,
payload_len, blk_request_complete_cb, task);
} else if (!bvdev->readonly) {
task->used_len = sizeof(*task->status);
- rc = spdk_bdev_writev(bvdev->bdev_desc, bvsession->io_channel,
+ rc = spdk_bdev_writev(bvdev->bdev_desc, bvsession->io_channel[vq->vring_idx],
&task->iovs[1], task->iovcnt, req->sector * 512,
payload_len, blk_request_complete_cb, task);
} else {
@@ -540,7 +546,7 @@ process_blk_request(struct spdk_vhost_blk_task *task,
return -1;
}
- rc = spdk_bdev_unmap(bvdev->bdev_desc, bvsession->io_channel,
+ rc = spdk_bdev_unmap(bvdev->bdev_desc, bvsession->io_channel[vq->vring_idx],
desc->sector * 512, desc->num_sectors * 512,
blk_request_complete_cb, task);
if (rc) {
@@ -570,7 +576,7 @@ process_blk_request(struct spdk_vhost_blk_task *task,
(uint64_t)desc->sector * 512, (uint64_t)desc->num_sectors * 512);
}
- rc = spdk_bdev_write_zeroes(bvdev->bdev_desc, bvsession->io_channel,
+ rc = spdk_bdev_write_zeroes(bvdev->bdev_desc, bvsession->io_channel[vq->vring_idx],
desc->sector * 512, desc->num_sectors * 512,
blk_request_complete_cb, task);
if (rc) {
@@ -590,7 +596,7 @@ process_blk_request(struct spdk_vhost_blk_task *task,
invalid_blk_request(task, VIRTIO_BLK_S_IOERR);
return -1;
}
- rc = spdk_bdev_flush(bvdev->bdev_desc, bvsession->io_channel,
+ rc = spdk_bdev_flush(bvdev->bdev_desc, bvsession->io_channel[vq->vring_idx],
0, flush_bytes,
blk_request_complete_cb, task);
if (rc) {
@@ -639,7 +645,7 @@ process_blk_task(struct spdk_vhost_virtqueue *vq, uint16_t req_idx)
return;
}
- task->bvsession->vsession.task_cnt++;
+ vq->task_cnt++;
blk_task_init(task);
@@ -653,7 +659,7 @@ process_blk_task(struct spdk_vhost_virtqueue *vq, uint16_t req_idx)
return;
}
- if (process_blk_request(task, task->bvsession) == 0) {
+ if (process_blk_request(task, vq) == 0) {
SPDK_DEBUGLOG(vhost_blk, "====== Task %p req_idx %d submitted ======\n", task,
req_idx);
} else {
@@ -702,7 +708,7 @@ process_packed_blk_task(struct spdk_vhost_virtqueue *vq, uint16_t req_idx)
req_idx, (req_idx + num_descs - 1) % vq->vring.size,
&task->inflight_head);
- task->bvsession->vsession.task_cnt++;
+ vq->task_cnt++;
blk_task_init(task);
@@ -715,7 +721,7 @@ process_packed_blk_task(struct spdk_vhost_virtqueue *vq, uint16_t req_idx)
return;
}
- if (process_blk_request(task, task->bvsession) == 0) {
+ if (process_blk_request(task, vq) == 0) {
SPDK_DEBUGLOG(vhost_blk, "====== Task %p req_idx %d submitted ======\n", task,
task_idx);
} else {
@@ -760,7 +766,7 @@ process_packed_inflight_blk_task(struct spdk_vhost_virtqueue *vq,
/* It's for cleaning inflight entries */
task->inflight_head = req_idx;
- task->bvsession->vsession.task_cnt++;
+ vq->task_cnt++;
blk_task_init(task);
@@ -773,7 +779,7 @@ process_packed_inflight_blk_task(struct spdk_vhost_virtqueue *vq,
return;
}
- if (process_blk_request(task, task->bvsession) == 0) {
+ if (process_blk_request(task, vq) == 0) {
SPDK_DEBUGLOG(vhost_blk, "====== Task %p req_idx %d submitted ======\n", task,
task_idx);
} else {
@@ -893,20 +899,6 @@ vdev_vq_worker(void *arg)
return _vdev_vq_worker(vq);
}
-static int
-vdev_worker(void *arg)
-{
- struct spdk_vhost_blk_session *bvsession = arg;
- struct spdk_vhost_session *vsession = &bvsession->vsession;
- uint16_t q_idx;
-
- for (q_idx = 0; q_idx < vsession->max_queues; q_idx++) {
- _vdev_vq_worker(&vsession->virtqueue[q_idx]);
- }
-
- return SPDK_POLLER_BUSY;
-}
-
static void
no_bdev_process_vq(struct spdk_vhost_blk_session *bvsession, struct spdk_vhost_virtqueue *vq)
{
@@ -985,9 +977,9 @@ _no_bdev_vdev_vq_worker(struct spdk_vhost_virtqueue *vq)
vhost_session_vq_used_signal(vq);
- if (vsession->task_cnt == 0 && bvsession->io_channel) {
- spdk_put_io_channel(bvsession->io_channel);
- bvsession->io_channel = NULL;
+ if (vq->task_cnt == 0 && bvsession->io_channel[vq->vring_idx]) {
+ spdk_put_io_channel(bvsession->io_channel[vq->vring_idx]);
+ bvsession->io_channel[vq->vring_idx] = NULL;
}
return SPDK_POLLER_BUSY;
@@ -1001,75 +993,55 @@ no_bdev_vdev_vq_worker(void *arg)
return _no_bdev_vdev_vq_worker(vq);
}
-static int
-no_bdev_vdev_worker(void *arg)
-{
- struct spdk_vhost_blk_session *bvsession = arg;
- struct spdk_vhost_session *vsession = &bvsession->vsession;
- uint16_t q_idx;
-
- for (q_idx = 0; q_idx < vsession->max_queues; q_idx++) {
- _no_bdev_vdev_vq_worker(&vsession->virtqueue[q_idx]);
- }
-
- return SPDK_POLLER_BUSY;
-}
-
static void
-vhost_blk_session_unregister_interrupts(struct spdk_vhost_blk_session *bvsession)
+vhost_blk_session_unregister_vq_interrupts(struct spdk_vhost_blk_session *bvsession,
+ int vq_idx)
{
struct spdk_vhost_session *vsession = &bvsession->vsession;
struct spdk_vhost_virtqueue *vq;
- int i;
-
- SPDK_DEBUGLOG(vhost_blk, "unregister virtqueues interrupt\n");
- for (i = 0; i < vsession->max_queues; i++) {
- vq = &vsession->virtqueue[i];
- if (vq->intr == NULL) {
- break;
- }
- SPDK_DEBUGLOG(vhost_blk, "unregister vq[%d]'s kickfd is %d\n",
- i, vq->vring.kickfd);
- spdk_interrupt_unregister(&vq->intr);
+ SPDK_DEBUGLOG(vhost_blk, "unregister virtqueues %d interrupt\n", vq_idx);
+ vq = &vsession->virtqueue[vq_idx];
+ if (vq->intr == NULL) {
+ return;
}
+
+ SPDK_DEBUGLOG(vhost_blk, "unregister vq[%d]'s kickfd is %d\n",
+ vq_idx, vq->vring.kickfd);
+ spdk_interrupt_unregister(&vq->intr);
}
static int
-vhost_blk_session_register_interrupts(struct spdk_vhost_blk_session *bvsession,
- spdk_interrupt_fn fn, const char *name)
+vhost_blk_session_register_vq_interrupts(struct spdk_vhost_blk_session *bvsession,
+ spdk_interrupt_fn fn, int vq_idx)
{
struct spdk_vhost_session *vsession = &bvsession->vsession;
struct spdk_vhost_virtqueue *vq = NULL;
- int i;
-
- SPDK_DEBUGLOG(vhost_blk, "Register virtqueues interrupt\n");
- for (i = 0; i < vsession->max_queues; i++) {
- vq = &vsession->virtqueue[i];
- SPDK_DEBUGLOG(vhost_blk, "Register vq[%d]'s kickfd is %d\n",
- i, vq->vring.kickfd);
-
- vq->intr = spdk_interrupt_register(vq->vring.kickfd, fn, vq, name);
- if (vq->intr == NULL) {
- SPDK_ERRLOG("Fail to register req notifier handler.\n");
- goto err;
- }
+
+ SPDK_DEBUGLOG(vhost_blk, "Register virtqueues %d interrupt\n", vq_idx);
+
+ vq = &vsession->virtqueue[vq_idx];
+ SPDK_DEBUGLOG(vhost_blk, "Register vq[%d]'s kickfd is %d\n",
+ vq_idx, vq->vring.kickfd);
+
+ vq->intr = SPDK_INTERRUPT_REGISTER(vq->vring.kickfd, fn, vq);
+ if (vq->intr == NULL) {
+ SPDK_ERRLOG("Fail to register req notifier handler.\n");
+ goto err;
}
return 0;
err:
- vhost_blk_session_unregister_interrupts(bvsession);
-
return -1;
}
static void
-vhost_blk_poller_set_interrupt_mode(struct spdk_poller *poller, void *cb_arg, bool interrupt_mode)
+vhost_blk_poller_set_vq_interrupt_mode(struct spdk_poller *poller, void *cb_arg, bool interrupt_mode)
{
- struct spdk_vhost_blk_session *bvsession = cb_arg;
+ struct spdk_vhost_virtqueue *vq = cb_arg;
- vhost_session_set_interrupt_mode(&bvsession->vsession, interrupt_mode);
+ vhost_session_set_vq_interrupt_mode(vq, interrupt_mode);
}
static struct spdk_vhost_blk_dev *
@@ -1127,35 +1099,44 @@ vhost_dev_bdev_remove_cpl_cb(struct spdk_vhost_dev *vdev, void *ctx)
bvdev->bdev = NULL;
}
-static int
-vhost_session_bdev_remove_cb(struct spdk_vhost_dev *vdev,
+static int vq_bdev_remove_cb(struct spdk_vhost_dev *vdev,
struct spdk_vhost_session *vsession,
void *ctx)
{
- struct spdk_vhost_blk_session *bvsession;
+ struct spdk_vhost_blk_session *bvsession = to_blk_session(vsession);
+ unsigned long vq_idx = (unsigned long)ctx;
int rc;
- bvsession = to_blk_session(vsession);
- if (bvsession->requestq_poller) {
- spdk_poller_unregister(&bvsession->requestq_poller);
- if (vsession->virtqueue[0].intr) {
- vhost_blk_session_unregister_interrupts(bvsession);
- rc = vhost_blk_session_register_interrupts(bvsession, no_bdev_vdev_vq_worker,
- "no_bdev_vdev_vq_worker");
- if (rc) {
- SPDK_ERRLOG("%s: Interrupt register failed\n", vsession->name);
- return rc;
- }
- }
+ if (bvsession->requestq_poller[vq_idx]) {
+ spdk_poller_unregister(&bvsession->requestq_poller[vq_idx]);
+ }
+
+ vhost_blk_session_unregister_vq_interrupts(bvsession, vq_idx);
- bvsession->requestq_poller = SPDK_POLLER_REGISTER(no_bdev_vdev_worker, bvsession, 0);
- spdk_poller_register_interrupt(bvsession->requestq_poller, vhost_blk_poller_set_interrupt_mode,
- bvsession);
+ if (spdk_interrupt_mode_is_enabled()) {
+ rc = vhost_blk_session_register_vq_interrupts(bvsession, no_bdev_vdev_vq_worker, vq_idx);
+ if (rc) {
+ SPDK_ERRLOG("%s: Interrupt register failed\n", vsession->name);
+ return rc;
+ }
}
+ bvsession->requestq_poller[vq_idx] = SPDK_POLLER_REGISTER(no_bdev_vdev_vq_worker, bvsession, 0);
+ spdk_poller_register_interrupt(bvsession->requestq_poller[vq_idx], vhost_blk_poller_set_vq_interrupt_mode,
+ bvsession);
+
return 0;
}
+static int
+vhost_session_bdev_remove_cb(struct spdk_vhost_dev *vdev,
+ struct spdk_vhost_session *vsession,
+ void *ctx)
+{
+ return vhost_session_send_event_mt(vsession, vq_bdev_remove_cb,
+ 3, "remove bdev");
+}
+
static void
bdev_remove_cb(void *remove_ctx)
{
@@ -1194,156 +1175,143 @@ bdev_event_cb(enum spdk_bdev_event_type type, struct spdk_bdev *bdev,
}
static void
-free_task_pool(struct spdk_vhost_blk_session *bvsession)
+free_vq_task_pool(struct spdk_vhost_virtqueue *vq)
{
- struct spdk_vhost_session *vsession = &bvsession->vsession;
- struct spdk_vhost_virtqueue *vq;
- uint16_t i;
-
- for (i = 0; i < vsession->max_queues; i++) {
- vq = &vsession->virtqueue[i];
- if (vq->tasks == NULL) {
- continue;
- }
-
- spdk_free(vq->tasks);
- vq->tasks = NULL;
+ if (vq->tasks == NULL) {
+ return;
}
+
+ spdk_free(vq->tasks);
+ vq->tasks = NULL;
+ return;
}
static int
-alloc_task_pool(struct spdk_vhost_blk_session *bvsession)
+alloc_vq_task_pool(struct spdk_vhost_virtqueue *vq)
{
- struct spdk_vhost_session *vsession = &bvsession->vsession;
- struct spdk_vhost_virtqueue *vq;
+ struct spdk_vhost_session *vsession = vq->vsession;
+ struct spdk_vhost_blk_session *bvsession = to_blk_session(vsession);
struct spdk_vhost_blk_task *task;
uint32_t task_cnt;
- uint16_t i;
uint32_t j;
- for (i = 0; i < vsession->max_queues; i++) {
- vq = &vsession->virtqueue[i];
- if (vq->vring.desc == NULL) {
- continue;
- }
+ if (vq->vring.desc == NULL) {
+ return 0;
+ }
- task_cnt = vq->vring.size;
- if (task_cnt > SPDK_VHOST_MAX_VQ_SIZE) {
- /* sanity check */
- SPDK_ERRLOG("%s: virtuque %"PRIu16" is too big. (size = %"PRIu32", max = %"PRIu32")\n",
- vsession->name, i, task_cnt, SPDK_VHOST_MAX_VQ_SIZE);
- free_task_pool(bvsession);
- return -1;
- }
- vq->tasks = spdk_zmalloc(sizeof(struct spdk_vhost_blk_task) * task_cnt,
- SPDK_CACHE_LINE_SIZE, NULL,
- SPDK_ENV_LCORE_ID_ANY, SPDK_MALLOC_DMA);
- if (vq->tasks == NULL) {
- SPDK_ERRLOG("%s: failed to allocate %"PRIu32" tasks for virtqueue %"PRIu16"\n",
- vsession->name, task_cnt, i);
- free_task_pool(bvsession);
- return -1;
- }
+ task_cnt = vq->vring.size;
+ if (task_cnt > SPDK_VHOST_MAX_VQ_SIZE) {
+ /* sanity check */
+ SPDK_ERRLOG("%s: virtuque %"PRIu16" is too big. (size = %"PRIu32", max = %"PRIu32")\n",
+ vsession->name, vq->vring_idx, task_cnt, SPDK_VHOST_MAX_VQ_SIZE);
+ return -1;
+ }
+ vq->tasks = spdk_zmalloc(sizeof(struct spdk_vhost_blk_task) * task_cnt,
+ SPDK_CACHE_LINE_SIZE, NULL,
+ SPDK_ENV_LCORE_ID_ANY, SPDK_MALLOC_DMA);
+ if (vq->tasks == NULL) {
+ SPDK_ERRLOG("%s: failed to allocate %"PRIu32" tasks for virtqueue %"PRIu16"\n",
+ vsession->name, task_cnt, vq->vring_idx);
+ return -1;
+ }
- for (j = 0; j < task_cnt; j++) {
- task = &((struct spdk_vhost_blk_task *)vq->tasks)[j];
- task->bvsession = bvsession;
- task->req_idx = j;
- task->vq = vq;
- }
+ for (j = 0; j < task_cnt; j++) {
+ task = &((struct spdk_vhost_blk_task *)vq->tasks)[j];
+ task->bvsession = bvsession;
+ task->req_idx = j;
+ task->vq = vq;
}
return 0;
}
static int
-vhost_blk_start_cb(struct spdk_vhost_dev *vdev,
+vhost_blk_start_vq_cb(struct spdk_vhost_dev *vdev,
struct spdk_vhost_session *vsession, void *unused)
{
struct spdk_vhost_blk_session *bvsession = to_blk_session(vsession);
struct spdk_vhost_blk_dev *bvdev;
- int i, rc = 0;
+ int rc = 0;
+ unsigned int vq_idx = (unsigned long)unused;
+ struct spdk_vhost_virtqueue *vq = &vsession->virtqueue[vq_idx];
bvdev = to_blk_dev(vdev);
assert(bvdev != NULL);
bvsession->bvdev = bvdev;
- /* validate all I/O queues are in a contiguous index range */
- for (i = 0; i < vsession->max_queues; i++) {
- /* vring.desc and vring.desc_packed are in a union struct
- * so q->vring.desc can replace q->vring.desc_packed.
- */
- if (vsession->virtqueue[i].vring.desc == NULL) {
- SPDK_ERRLOG("%s: queue %"PRIu32" is empty\n", vsession->name, i);
- rc = -1;
- goto out;
- }
+ assert(vq->vring_idx == vq_idx);
+
+ /* vring.desc and vring.desc_packed are in a union struct
+ * so q->vring.desc can replace q->vring.desc_packed.
+ */
+ if (vsession->virtqueue[vq_idx].vring.desc == NULL) {
+ SPDK_ERRLOG("%s: queue %"PRIu32" is empty\n", vsession->name, vq_idx);
+ rc = -1;
+ goto out;
}
- rc = alloc_task_pool(bvsession);
+ rc = alloc_vq_task_pool(vq);
if (rc != 0) {
- SPDK_ERRLOG("%s: failed to alloc task pool.\n", vsession->name);
+ SPDK_ERRLOG("%s: failed to alloc %u task pool.\n", vsession->name, vq_idx);
goto out;
}
if (bvdev->bdev) {
- bvsession->io_channel = spdk_bdev_get_io_channel(bvdev->bdev_desc);
- if (!bvsession->io_channel) {
- free_task_pool(bvsession);
- SPDK_ERRLOG("%s: I/O channel allocation failed\n", vsession->name);
+ bvsession->io_channel[vq_idx] = spdk_bdev_get_io_channel(bvdev->bdev_desc);
+ if (!bvsession->io_channel[vq_idx]) {
+ free_vq_task_pool(vq);
+ SPDK_ERRLOG("%s: I/O channel %u allocation failed\n", vsession->name, vq_idx);
rc = -1;
goto out;
}
}
if (spdk_interrupt_mode_is_enabled()) {
- if (bvdev->bdev) {
- rc = vhost_blk_session_register_interrupts(bvsession,
- vdev_vq_worker,
- "vdev_vq_worker");
- } else {
- rc = vhost_blk_session_register_interrupts(bvsession,
- no_bdev_vdev_vq_worker,
- "no_bdev_vdev_vq_worker");
- }
-
+ rc = vhost_blk_session_register_vq_interrupts(bvsession,
+ bvdev->bdev ? vdev_vq_worker : no_bdev_vdev_vq_worker, vq_idx);
if (rc) {
- SPDK_ERRLOG("%s: Interrupt register failed\n", vsession->name);
+ SPDK_ERRLOG("%s: Interrupt %u register failed\n", vsession->name, vq->vring_idx);
goto out;
}
}
- if (bvdev->bdev) {
- bvsession->requestq_poller = SPDK_POLLER_REGISTER(vdev_worker, bvsession, 0);
- } else {
- bvsession->requestq_poller = SPDK_POLLER_REGISTER(no_bdev_vdev_worker, bvsession, 0);
- }
- SPDK_INFOLOG(vhost, "%s: started poller on lcore %d\n",
- vsession->name, spdk_env_get_current_core());
+ bvsession->requestq_poller[vq_idx] = SPDK_POLLER_REGISTER(bvdev->bdev ? vdev_vq_worker : no_bdev_vdev_vq_worker,
+ vq, 0);
+ SPDK_INFOLOG(vhost, "%s.%u: started poller on lcore %d\n",
+ vsession->name, vq_idx, spdk_env_get_current_core());
+
+ spdk_poller_register_interrupt(bvsession->requestq_poller[vq_idx], vhost_blk_poller_set_vq_interrupt_mode,
+ vq);
- spdk_poller_register_interrupt(bvsession->requestq_poller, vhost_blk_poller_set_interrupt_mode,
- bvsession);
out:
- vhost_session_start_done(vsession, rc);
+ vhost_session_start_vq_done(vq, rc);
return rc;
}
static int
vhost_blk_start(struct spdk_vhost_session *vsession)
{
- return vhost_session_send_event(vsession, vhost_blk_start_cb,
+ int rc;
+
+ rc = vhost_session_send_event_mt(vsession, vhost_blk_start_vq_cb,
3, "start session");
+ if (rc) {
+ vhost_session_send_event_mt(vsession, vhost_blk_stop_vq_cb,
+ 3, "stop session");
+ }
+
+ return rc;
}
static int
-destroy_session_poller_cb(void *arg)
+destroy_session_vq_poller_cb(void *arg)
{
- struct spdk_vhost_blk_session *bvsession = arg;
- struct spdk_vhost_session *vsession = &bvsession->vsession;
- int i;
+ struct spdk_vhost_virtqueue *vq = arg;
+ struct spdk_vhost_session *vsession = vq->vsession;
+ struct spdk_vhost_blk_session *bvsession = to_blk_session(vsession);
- if (vsession->task_cnt > 0) {
+ if (vq->task_cnt > 0) {
return SPDK_POLLER_BUSY;
}
@@ -1351,48 +1319,46 @@ destroy_session_poller_cb(void *arg)
return SPDK_POLLER_BUSY;
}
- for (i = 0; i < vsession->max_queues; i++) {
- vsession->virtqueue[i].next_event_time = 0;
- vhost_vq_used_signal(vsession, &vsession->virtqueue[i]);
- }
+ vq->next_event_time = 0;
+ vhost_vq_used_signal(vsession, vq);
SPDK_INFOLOG(vhost, "%s: stopping poller on lcore %d\n",
vsession->name, spdk_env_get_current_core());
- if (bvsession->io_channel) {
- spdk_put_io_channel(bvsession->io_channel);
- bvsession->io_channel = NULL;
+ if (bvsession->io_channel[vq->vring_idx]) {
+ spdk_put_io_channel(bvsession->io_channel[vq->vring_idx]);
+ bvsession->io_channel[vq->vring_idx] = NULL;
}
- free_task_pool(bvsession);
- spdk_poller_unregister(&bvsession->stop_poller);
- vhost_session_stop_done(vsession, 0);
+ free_vq_task_pool(vq);
+ spdk_poller_unregister(&bvsession->stop_poller[vq->vring_idx]);
+ vhost_session_stop_vq_done(vq, 0);
spdk_vhost_unlock();
return SPDK_POLLER_BUSY;
}
static int
-vhost_blk_stop_cb(struct spdk_vhost_dev *vdev,
+vhost_blk_stop_vq_cb(struct spdk_vhost_dev *vdev,
struct spdk_vhost_session *vsession, void *unused)
{
struct spdk_vhost_blk_session *bvsession = to_blk_session(vsession);
+ unsigned long vq_idx = (unsigned long)unused;
+ struct spdk_vhost_virtqueue *vq = &vsession->virtqueue[vq_idx];
- spdk_poller_unregister(&bvsession->requestq_poller);
-
- if (vsession->virtqueue[0].intr) {
- vhost_blk_session_unregister_interrupts(bvsession);
+ if (bvsession->requestq_poller[vq_idx]) {
+ spdk_poller_unregister(&bvsession->requestq_poller[vq_idx]);
}
- bvsession->stop_poller = SPDK_POLLER_REGISTER(destroy_session_poller_cb,
- bvsession, 1000);
+ vhost_blk_session_unregister_vq_interrupts(bvsession, vq_idx);
+ bvsession->stop_poller[vq_idx] = SPDK_POLLER_REGISTER(destroy_session_vq_poller_cb, vq, 1000);
return 0;
}
static int
vhost_blk_stop(struct spdk_vhost_session *vsession)
{
- return vhost_session_send_event(vsession, vhost_blk_stop_cb,
+ return vhost_session_send_event_mt(vsession, vhost_blk_stop_vq_cb,
3, "stop session");
}
diff --git a/lib/vhost/vhost_internal.h b/lib/vhost/vhost_internal.h
index 36ab0c16f..92d096a8c 100644
--- a/lib/vhost/vhost_internal.h
+++ b/lib/vhost/vhost_internal.h
@@ -111,6 +111,14 @@ struct spdk_vhost_virtqueue {
void *tasks;
+ int task_cnt;
+
+ bool initialized;
+ bool started;
+ bool needs_restart;
+ bool forced_polling;
+ bool interrupt_mode;
+
/* Request count from last stats check */
uint32_t req_cnt;
@@ -150,10 +158,12 @@ struct spdk_vhost_session {
struct rte_vhost_memory *mem;
- int task_cnt;
-
uint16_t max_queues;
+ uint16_t active_queues;
+
+ int task_cnt;
+
uint64_t negotiated_features;
/* Local copy of device coalescing settings. */
@@ -168,6 +178,8 @@ struct spdk_vhost_session {
struct spdk_vhost_virtqueue virtqueue[SPDK_VHOST_MAX_VQUEUES];
+ struct spdk_thread *thread[SPDK_VHOST_MAX_VQUEUES]; /* thread of data plane per vq */
+
TAILQ_ENTRY(spdk_vhost_session) tailq;
};
@@ -175,7 +187,7 @@ struct spdk_vhost_dev {
char *name;
char *path;
- struct spdk_thread *thread;
+ struct spdk_thread *thread; /* if support mt, only run as control plane */
bool registered;
uint64_t virtio_features;
@@ -420,6 +432,7 @@ int vhost_destroy_connection_cb(int vid);
* Set vhost session to run in interrupt or poll mode
*/
void vhost_session_set_interrupt_mode(struct spdk_vhost_session *vsession, bool interrupt_mode);
+void vhost_session_set_vq_interrupt_mode(struct spdk_vhost_virtqueue *vq, bool interrupt_mode);
/*
* Memory registration functions used in start/stop device callbacks
@@ -464,6 +477,10 @@ int vhost_session_send_event(struct spdk_vhost_session *vsession,
spdk_vhost_session_fn cb_fn, unsigned timeout_sec,
const char *errmsg);
+int vhost_session_send_event_mt(struct spdk_vhost_session *vsession,
+ spdk_vhost_session_fn cb_fn, unsigned timeout_sec,
+ const char *errmsg);
+
/**
* Finish a blocking spdk_vhost_session_send_event() call and finally
* start the session. This must be called on the target lcore, which
@@ -477,6 +494,8 @@ int vhost_session_send_event(struct spdk_vhost_session *vsession,
*/
void vhost_session_start_done(struct spdk_vhost_session *vsession, int response);
+void vhost_session_start_vq_done(struct spdk_vhost_virtqueue *vq, int response);
+
/**
* Finish a blocking spdk_vhost_session_send_event() call and finally
* stop the session. This must be called on the session's lcore which
@@ -493,6 +512,9 @@ void vhost_session_start_done(struct spdk_vhost_session *vsession, int response)
*/
void vhost_session_stop_done(struct spdk_vhost_session *vsession, int response);
+void vhost_session_stop_vq_done(struct spdk_vhost_virtqueue *vq, int response);
+
+
struct spdk_vhost_session *vhost_session_find_by_vid(int vid);
void vhost_session_install_rte_compat_hooks(struct spdk_vhost_session *vsession);
int vhost_register_unix_socket(const char *path, const char *ctrl_name,
--
2.14.1.40.g8e62ba1
Need help with a multipath failover test on the SPDK initiator
by lullajd@yahoo.com
Hi,
This is about multipath failover support (NVMe over TCP) on the initiator with v22.01-pre git sha1 be2d126fd / DPDK 21.08.0.
I was trying to test the failover scenario wherein my fio (running on the SPDK initiator, accessing a remote NVMe device from a target over TCP) is supposed to continue running even when the primary path to the volume/device goes down.
I need suggestions/help in finding out:
0. whether v22.01-pre (git sha1 be2d126fd) has functional failover support on the SPDK initiator;
1. what is wrong in the bdev_new.conf below with respect to the multipath and timeout settings;
2. whether there is any example configuration demonstrating successful failover when the initiator is on SPDK.
Details at the bottom of this mail.
Thanks
Jitendra
As the following nvme discover command shows, there were two paths to reach the volume with traddr_ip1 and traddr_ip2.
The path via traddr_ip1 is the primary path and the path via traddr_ip2 is the secondary path. Both the paths were live
initially as shown by nvme list-subsys.
nvme discover -t tcp -a <discovery controller ip> -q <hostnqn>
Discovery Log Number of Records 2, Generation counter 1
=====Discovery Log Entry 0======
trtype: tcp
adrfam: ipv4
subtype: nvme subsystem
treq: not specified
portid: 0
trsvcid: 4420
subnqn: <subnqn>
traddr: traddr_ip1
sectype: none
=====Discovery Log Entry 1======
trtype: tcp
adrfam: ipv4
subtype: nvme subsystem
treq: not specified
portid: 0
trsvcid: 4420
subnqn: <subnqn>
traddr: traddr_ip2
sectype: none
Using this info from the nvme discover command, I wrote bdev_new.conf as follows:
cat mypath/jlulla/spdk/build/fio/bdev_new.conf
{
"subsystems": [
{
"subsystem": "bdev",
"config": [
{
"method": "bdev_nvme_set_options", "params": {
"retry_count": 254,
"timeout_us": 255000000,
"keep_alive_timeout_ms": 255000,
"transport_retry_count": 254,
"bdev_retry_count": -1,
"nvme_adminq_poll_period_us": 100000,
"nvme_ioq_poll_period_us": 0,
"action_on_timeout": "reset"
}
},
{
"method": "bdev_nvme_attach_controller", "params": {
"name": "Nvme0",
"trtype": "tcp",
"traddr": "traddr_ip1",
"trsvcid": "4420",
"subnqn": "<subnqn>",
"adrfam": "IPv4",
"hostnqn": "<hostnqn>",
"multipath": "failover"
}
},
{
"method": "bdev_nvme_attach_controller", "params": {
"name": "Nvme0",
"trtype": "tcp",
"traddr": "traddr_ip2",
"trsvcid": "4420",
"subnqn": "<subnqn>",
"adrfam": "IPv4",
"multipath": "failover",
"hostnqn": "<hostnqn>"
}
}
]
}
]
}
My fio job looked like the one below; the total bytes to be written were 64 GiB.
root@mymachine:~/jlulla/spdk# cat mypath/jlulla/fio_spdk_write_profile.fio
[global]
ioengine=spdk_bdev
spdk_conf=mypath/jlulla/spdk/build/fio/bdev_new.conf
direct=1
thread=1
prio=0
rw=write
bs=4k
numjobs=1
iodepth=16
verify=md5
do_verify=0
size=100%
norandommap
randrepeat=0
group_reporting=1
[job1]
filename=Nvme0n1
However, when I ran fio as shown below, it ended up with two issues:
jlulla/spdk# LD_PRELOAD=mypath/jlulla/spdk/build/fio/spdk_bdev fio mypath/jlulla/fio_spdk_write_profile.fio
Issue 1: 32602, Invalid parameters
job1: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=spdk_bdev, iodepth=16
fio-3.18.random_compress_support
Starting 1 thread
[2021-11-10 13:39:29.871918] Starting SPDK v22.01-pre git sha1 be2d126fd / DPDK 21.08.0 initialization...
[2021-11-10 13:39:29.872144] [ DPDK EAL parameters: [2021-11-10 13:39:29.872191] fio [2021-11-10 13:39:29.872224] --no-shconf [2021-11-10 13:39:29.872259] -c 0x1 [2021-11-10 13:39:29.872287] --log-level=lib.eal:6 [2021-11-10 13:39:29.872318] --log-level=lib.cryptodev:5 [2021-11-10 13:39:29.872353] --log-level=user1:6 [2021-11-10 13:39:29.872383] --iova-mode=pa [2021-11-10 13:39:29.872411] --base-virtaddr=0x200000000000 [2021-11-10 13:39:29.872440] --match-allocations [2021-11-10 13:39:29.872505] --file-prefix=spdk_pid26564 [2021-11-10 13:39:29.872537] ]
TELEMETRY: No legacy callbacks, legacy socket not created
[2021-11-10 13:39:30.077456] accel_engine.c:1012:spdk_accel_engine_initialize: *NOTICE*: Accel engine initialized to use software engine.
[2021-11-10 13:39:30.482123] json_config.c: 221:rpc_client_poller: *ERROR*: error response:
{
"code": -32602,
"message": "Invalid parameters"
}
Issue 2: fio started writing to the device, and after some time the primary path was made unavailable.
2A: fio immediately started showing the following, but I was expecting it to continue writing.
Jobs: 1 (f=1): [W(1)][48.4%][eta 06m:04s]
Jobs: 1 (f=1): [W(1)][48.4%][eta 06m:12s]
Jobs: 1 (f=1): [W(1)][48.4%][eta 06m:50s]
Jobs: 1 (f=1): [W(1)][48.4%][eta 07m:29s]
Jobs: 1 (f=1): [W(1)][48.5%][eta 11m:08s]
2B: Eventually, fio ended up showing Input/output error.
[2021-11-10 13:50:07.876969] bdev_nvme.c:2361:timeout_cb: *WARNING*: Warning: Detected a timeout. ctrlr=0x7f7c800d6bd0 qpair=(nil) cid=5
[2021-11-10 13:50:07.877344] nvme_qpair.c: 559:nvme_qpair_abort_queued_reqs: *ERROR*: aborting queued i/o
[2021-11-10 13:50:07.877432] nvme_qpair.c: 537:nvme_qpair_manual_complete_request: *NOTICE*: Command completed manually:
[2021-11-10 13:50:07.877449] nvme_qpair.c: 273:nvme_io_qpair_print_command: *NOTICE*: WRITE sqid:1 cid:0 nsid:1 lba:8117663 len:1 PRP1 0x0 PRP2 0x0
[2021-11-10 13:50:07.877461] nvme_qpair.c: 456:spdk_nvme_print_completion: *NOTICE*: ABORTED - BY REQUEST (00/07) qid:1 cid:0 cdw0:0 sqhd:0000 p:0 m:0 dnr:1
[2021-11-10 13:50:07.877480] nvme_qpair.c: 559:nvme_qpair_abort_queued_reqs: *ERROR*: aborting queued i/o
[2021-11-10 13:50:07.877496] nvme_qpair.c: 537:nvme_qpair_manual_complete_request: *NOTICE*: Command completed manually:
[2021-11-10 13:50:07.877506] nvme_qpair.c: 273:nvme_io_qpair_print_command: *NOTICE*: WRITE sqid:1 cid:0 nsid:1 lba:8117664 len:1 PRP1 0x0 PRP2 0x0
[2021-11-10 13:50:07.877516] nvme_qpair.c: 456:spdk_nvme_print_completion: *NOTICE*: ABORTED - BY REQUEST (00/07) qid:1 cid:0 cdw0:0 sqhd:0000 p:0 m:0 dnr:1
[2021-11-10 13:50:07.877525] nvme_qpair.c: 559:nvme_qpair_abort_queued_reqs: *ERROR*: aborting queued i/o
[2021-11-10 13:50:07.877535] nvme_qpair.c: 537:nvme_qpair_manual_complete_request: *NOTICE*: Command completed manually:
[2021-11-10 13:50:07.877544] nvme_qpair.c: 273:nvme_io_qpair_print_command: *NOTICE*: WRITE sqid:1 cid:0 nsid:1 lba:8117665 len:1 PRP1 0x0 PRP2 0x0
[2021-11-10 13:50:07.877553] nvme_qpair.c: 456:spdk_nvme_print_completion: *NOTICE*: ABORTED - BY REQUEST (00/07) qid:1 cid:0 cdw0:0 sqhd:0000 p:0 m:0 dnr:1
[2021-11-10 13:50:07.877563] nvme_qpair.c: 559:nvme_qpair_abort_queued_reqs: *ERROR*: aborting queued i/o
[2021-11-10 13:50:07.877572] nvme_qpair.c: 537:nvme_qpair_manual_complete_request: *NOTICE*: Command completed manually:
[2021-11-10 13:50:07.877582] nvme_qpair.c: 273:nvme_io_qpair_print_command: *NOTICE*: WRITE sqid:1 cid:0 nsid:1 lba:8117666 len:1 PRP1 0x0 PRP2 0x0
[2021-11-10 13:50:07.877591] nvme_qpair.c: 456:spdk_nvme_print_completion: *NOTICE*: ABORTED - BY REQUEST (00/07) qid:1 cid:0 cdw0:0 sqhd:0000 p:0 m:0 dnr:1
[2021-11-10 13:50:07.877600] nvme_qpair.c: 559:nvme_qpair_abort_queued_reqs: *ERROR*: aborting queued i/o
[2021-11-10 13:50:07.877609] nvme_qpair.c: 537:nvme_qpair_manual_complete_request: *NOTICE*: Command completed manually:
[2021-11-10 13:50:07.877618] nvme_qpair.c: 273:nvme_io_qpair_print_command: *NOTICE*: WRITE sqid:1 cid:0 nsid:1 lba:8117667 len:1 PRP1 0x0 PRP2 0x0
[2021-11-10 13:50:07.877637] nvme_qpair.c: 456:spdk_nvme_print_completion: *NOTICE*: ABORTED - BY REQUEST (00/07) qid:1 cid:0 cdw0:0 sqhd:0000 p:0 m:0 dnr:1
[2021-11-10 13:50:07.877647] nvme_qpair.c: 559:nvme_qpair_abort_queued_reqs: *ERROR*: aborting queued i/o
[2021-11-10 13:50:07.877656] nvme_qpair.c: 537:nvme_qpair_manual_complete_request: *NOTICE*: Command completed manually:
[2021-11-10 13:50:07.877666] nvme_qpair.c: 273:nvme_io_qpair_print_command: *NOTICE*: WRITE sqid:1 cid:0 nsid:1 lba:8117668 len:1 PRP1 0x0 PRP2 0x0
[2021-11-10 13:50:07.877676] nvme_qpair.c: 456:spdk_nvme_print_completion: *NOTICE*: ABORTED - BY REQUEST (00/07) qid:1 cid:0 cdw0:0 sqhd:0000 p:0 m:0 dnr:1
[2021-11-10 13:50:07.877684] nvme_qpair.c: 559:nvme_qpair_abort_queued_reqs: *ERROR*: aborting queued i/o
[2021-11-10 13:50:07.877695] nvme_qpair.c: 537:nvme_qpair_manual_complete_request: *NOTICE*: Command completed manually:
[2021-11-10 13:50:07.877705] nvme_qpair.c: 273:nvme_io_qpair_print_command: *NOTICE*: WRITE sqid:1 cid:0 nsid:1 lba:8117669 len:1 PRP1 0x0 PRP2 0x0
[2021-11-10 13:50:07.877714] nvme_qpair.c: 456:spdk_nvme_print_completion: *NOTICE*: ABORTED - BY REQUEST (00/07) qid:1 cid:0 cdw0:0 sqhd:0000 p:0 m:0 dnr:1
[2021-11-10 13:50:07.877724] nvme_qpair.c: 559:nvme_qpair_abort_queued_reqs: *ERROR*: aborting queued i/o
[2021-11-10 13:50:07.877732] nvme_qpair.c: 537:nvme_qpair_manual_complete_request: *NOTICE*: Command completed manually:
[2021-11-10 13:50:07.877749] nvme_qpair.c: 273:nvme_io_qpair_print_command: *NOTICE*: WRITE sqid:1 cid:0 nsid:1 lba:8117670 len:1 PRP1 0x0 PRP2 0x0
[2021-11-10 13:50:07.877759] nvme_qpair.c: 456:spdk_nvme_print_completion: *NOTICE*: ABORTED - BY REQUEST (00/07) qid:1 cid:0 cdw0:0 sqhd:0000 p:0 m:0 dnr:1
fio: io_u error on file Nvme0n1: Input/output error: write offset=33249947648, buflen=4096
fio: io_u error on file Nvme0n1: Input/output error: write offset=33249951744, buflen=4096
SPDK initiator NVMe/TCP read performance on a Null device target is low compared to write performance.
by vishwasdanivas@gmail.com
Hello,
I am trying to run NVMe/TCP perf experiments with an SPDK initiator and target, similar to the setup mentioned in the NVMe/TCP SPDK perf document below.
https://ci.spdk.io/download/performance-reports/SPDK_tcp_perf_report_2101...
I am running my SPDK initiator and target on two different machines, with the details at the end of this mail.
I am running the SPDK initiator scaling experiment as described in the document to measure SPDK initiator performance. The I/O size is 4K.
I have matched all the configs mentioned in the above document for tuning TCP and enabling zero copy on the target, and all the fio configs mentioned in the document for the initiator and target. The only variation is that I am using a Null bdev instead of real SSDs for running the tests.
I am able to see peak write performance close to 3M IOPS with fio on the initiator, as shown in the document.
However, for reads I am getting capped at around 1.5M IOPS and am not able to scale beyond that, even after playing around with different values for the number of cores/numjobs, iodepth, number of TCP connections to the target/subqn-cnodes, etc.
From initial debugging with the perf tool, it looks like there are a lot of L2/L3 cache misses (thrashing) for the fio read test compared to the fio write test. I am not entirely sure whether this could be the only reason for the degraded read performance.
I was wondering whether the SPDK read path touches more data per I/O, and whether the increased cache load and higher latencies are what leads to this.
Can you please shed more light on this?
Also, are there any tunings to help reach higher numbers, similar to the performance mentioned in the document for the read NVMe/TCP fio initiator test?
Initiator machine details.
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 72
On-line CPU(s) list: 0-71
Thread(s) per core: 2
Core(s) per socket: 18
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz
Stepping: 4
CPU MHz: 1000.740
CPU max MHz: 3700.0000
CPU min MHz: 1000.0000
BogoMIPS: 4600.00
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 25344K
NUMA node0 CPU(s): 0-17,36-53
NUMA node1 CPU(s): 18-35,54-71
Thanks,
Vishwas