Re: spdk can't pass fio test if 4 clients testing with 4 split partitions
by nixun_992@sina.com
Hi Pawel & Changpeng,
No, not specifically 512; I just specify random I/O sizes to stress the SPDK vhost program. As for /mnt/ssdtest1: the target disk shows up as /dev/sda in the guest, I mount /dev/sda1 on /mnt, and ssdtest1 is the test file used by fio.
My guest OS is CentOS 7.1, and dmesg does not show much of a problem. The main problem is that SPDK keeps resetting the controller; I am not sure why this happens, and I did not see it in the old version.
Thanks,
Xun
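For anyone trying to reproduce the setup described above, a minimal sketch of the guest-side steps (the filesystem type, the job file name, and the assumption that /dev/sda1 already exists are guesses, since they are not stated above):
# inside the guest: the vhost-scsi disk is assumed to appear as /dev/sda
mkfs.xfs /dev/sda1              # filesystem type is an assumption
mount /dev/sda1 /mnt
fio ssdtest.fio                 # hypothetical job file name, containing filename=/mnt/ssdtest1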
================================
I see bsrange 1k to 512k; is the NVMe formatted with a 512B block size here?
Which commit did you use for this test?
filename=/mnt/ssdtest1 – is this a directory on a mounted filesystem?
Can you send us the dmesg from the failure?
Paweł
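Two quick ways to answer the commit and block-size questions, as a sketch (assumes nvme-cli is installed, the drive is /dev/nvme0n1, and it is bound to the kernel nvme driver rather than to SPDK while the command runs):
# which commit is under test (run inside the spdk checkout)
git log -1 --format='%h %s'
# which LBA format the namespace uses; the line marked "in use" shows the data size
nvme id-ns /dev/nvme0n1 -H | grep 'in use'
# alternatively, while the device is bound to SPDK, the identify example in the tree
# (examples/nvme/identify/identify) also reports the namespace LBA format, if available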
From: SPDK [mailto:spdk-bounces@lists.01.org]
On Behalf Of Liu, Changpeng
Sent: Tuesday, June 13, 2017 9:18 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: Re: [SPDK] spdk can't pass fio test if 4 clients testing with 4 split partitions
Thanks Xun.
We’ll take a look at the issue.
From the error log messages, it seems a SCSI task management command was received by SPDK;
the most likely reason the VM sent the task management command is that some commands
timed out.
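One way to check the timeout theory from inside the guest, as a sketch (it assumes the vhost-scsi disk appears as /dev/sda, as described above; 120 seconds is an arbitrary value for the experiment):
# how long the guest SCSI layer waits before aborting a command and escalating
cat /sys/block/sda/device/timeout
# temporarily raise it; if the abort/reset storm goes away, slow completions are the trigger
echo 120 > /sys/block/sda/device/timeout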
From: SPDK [mailto:spdk-bounces@lists.01.org]
On Behalf Of nixun_992(a)sina.com
Sent: Monday, June 12, 2017 3:40 PM
To: spdk <spdk(a)lists.01.org>
Subject: [SPDK] spdk can't pass fio test if 4 clients testing with 4 split partitions
Hi, All:
spdk can't pass the fio test after about 2 hours of testing, and it can pass the same test if we use the version from before Mar 29.
The error messages are the following:
nvme_pcie.c:1133:nvme_pcie_qpair_abort_trackers: ***ERROR*** aborting outstanding command
READ sqid:1 cid:134 nsid:1 lba:1310481280 len:128
ABORTED - BY REQUEST (00/07) sqid:1 cid:134 cdw0:0 sqhd:0000 p:0 m:0 dnr:0
nvme_pcie.c:1133:nvme_pcie_qpair_abort_trackers: ***ERROR*** aborting outstanding command
READ sqid:1 cid:202 nsid:1 lba:1310481408 len:256
ABORTED - BY REQUEST (00/07) sqid:1 cid:202 cdw0:0 sqhd:0000 p:0 m:0 dnr:0
nvme_pcie.c:1133:nvme_pcie_qpair_abort_trackers: ***ERROR*** aborting outstanding command
READ sqid:1 cid:151 nsid:1 lba:1310481664 len:256
ABORTED - BY REQUEST (00/07) sqid:1 cid:151 cdw0:0 sqhd:0000 p:0 m:0 dnr:0
nvme_pcie.c:1133:nvme_pcie_qpair_abort_trackers: ***ERROR*** aborting outstanding command
READ sqid:1 cid:243 nsid:1 lba:1312030816 len:96
ABORTED - BY REQUEST (00/07) sqid:1 cid:243 cdw0:0 sqhd:0000 p:0 m:0 dnr:0
resetting controller
resetting controller
nvme_pcie.c:1133:nvme_pcie_qpair_abort_trackers: ***ERROR*** aborting outstanding command
READ sqid:1 cid:253 nsid:1 lba:998926248 len:88
ABORTED - BY REQUEST (00/07) sqid:1 cid:253 cdw0:0 sqhd:0000 p:0 m:0 dnr:0
nvme_pcie.c:1133:nvme_pcie_qpair_abort_trackers: ***ERROR*** aborting outstanding command
READ sqid:1 cid:243 nsid:1 lba:1049582336 len:176
ABORTED - BY REQUEST (00/07) sqid:1 cid:243 cdw0:0 sqhd:0000 p:0 m:0 dnr:0
nvme_pcie.c:1133:nvme_pcie_qpair_abort_trackers: ***ERROR*** aborting outstanding command
READ sqid:1 cid:169 nsid:1 lba:1109679488 len:128
ABORTED - BY REQUEST (00/07) sqid:1 cid:169 cdw0:0 sqhd:0000 p:0 m:0 dnr:0
nvme_pcie.c:1133:nvme_pcie_qpair_abort_trackers: ***ERROR*** aborting outstanding command
READ sqid:1 cid:134 nsid:1 lba:958884728 len:136
ABORTED - BY REQUEST (00/07) sqid:1 cid:134 cdw0:0 sqhd:0000 p:0 m:0 dnr:0
nvme_pcie.c:1133:nvme_pcie_qpair_abort_trackers: ***ERROR*** aborting outstanding command
READ sqid:1 cid:152 nsid:1 lba:1018345728 len:240
ABORTED - BY REQUEST (00/07) sqid:1 cid:152 cdw0:0 sqhd:0000 p:0 m:0 dnr:0
nvme_pcie.c:1133:nvme_pcie_qpair_abort_trackers: ***ERROR*** aborting outstanding command
READ sqid:1 cid:234 nsid:1 lba:898096896 len:8
ABORTED - BY REQUEST (00/07) sqid:1 cid:234 cdw0:0 sqhd:0000 p:0 m:0 dnr:0
nvme_pcie.c:1133:nvme_pcie_qpair_abort_trackers: ***ERROR*** aborting outstanding command
READ sqid:1 cid:130 nsid:1 lba:991125248 len:96
ABORTED - BY REQUEST (00/07) sqid:1 cid:130 cdw0:0 sqhd:0000 p:0 m:0 dnr:0
resetting controller
resetting controller
nvme_pcie.c:1133:nvme_pcie_qpair_abort_trackers: ***ERROR*** aborting outstanding command
READ sqid:1 cid:130 nsid:1 lba:609149952 len:64
ABORTED - BY REQUEST (00/07) sqid:1 cid:130 cdw0:0 sqhd:0000 p:0 m:0 dnr:0
=========================
Our vhost config is the following:
# The Split virtual block device slices block devices into multiple smaller bdevs.
[Split]
# Syntax:
# Split <bdev> <count> [<size_in_megabytes>]
#
# Split Nvme1n1 into two equally-sized portions, Nvme1n1p0 and Nvme1n1p1
Split Nvme0n1 4 200000
# Split Malloc2 into eight 1-megabyte portions, Malloc2p0 ... Malloc2p7,
# leaving the rest of the device inaccessible
#Split Malloc2 8 1
[VhostScsi0]
Dev 0 Nvme0n1p0
[VhostScsi1]
Dev 0 Nvme0n1p1
[VhostScsi2]
Dev 0 Nvme0n1p2
[VhostScsi3]
Dev 0 Nvme0n1p3
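For context, each VhostScsi controller above is exposed to its guest over a vhost-user socket. A sketch of the qemu arguments for the first client (the socket path /var/tmp/vhost.0, the memory size, and the ids are assumptions and must match the actual setup; the shared hugepage memory backend is required for vhost-user):
-object memory-backend-file,id=mem,size=4G,mem-path=/dev/hugepages,share=on \
-numa node,memdev=mem \
-chardev socket,id=vhost0,path=/var/tmp/vhost.0 \
-device vhost-user-scsi-pci,id=scsi0,chardev=vhost0 \
If the four clients are separate VMs, each one attaches to its own controller (VhostScsi1..3) in the same way.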
The fio script is the following:
[global]
filename=/mnt/ssdtest1
size=100G
numjobs=8
iodepth=16
ioengine=libaio
group_reporting
do_verify=1
verify=md5
# direct rand read
[rand-read]
bsrange=1k-512k
#direct=1
rw=randread
runtime=10000
stonewall
# direct seq read
[seq-read]
bsrange=1k-512k
direct=1
rw=read
runtime=10000
stonewall
# direct rand write
[rand-write]
bsrange=1k-512k
direct=1
rw=randwrite
runtime=10000
stonewall
# direct seq write
[seq-write]
bsrange=1k-512k
direct=1
rw=write
runtime=10000
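Since the report is about four clients hammering four split partitions at once, a sketch of launching the same job on all four guests concurrently (the hostnames vm0..vm3 and the job file path are hypothetical):
for h in vm0 vm1 vm2 vm3; do
    ssh root@$h "fio /root/ssdtest.fio" &    # same job file on each guest
done
wait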
_______________________________________________
SPDK mailing list
SPDK(a)lists.01.org
https://lists.01.org/mailman/listinfo/spdk
3 years, 8 months
SPDK Office Hours
by Walker, Benjamin
Hi Everyone,
We'd like to announce that SPDK Office Hours will be held in our IRC channel
(#spdk on FreeNode) between 10 and 11 AM Pacific Daylight Time/Mountain Standard
Time every Monday. If you jump in the channel at that time, we'll have SPDK
developers ready to answer questions!
In practice, there are SPDK developers in the IRC channel nearly 24/7 that can
assist with questions. If you don't feel like waiting until Monday, come say
hello any time.
Thanks,
The SPDK Maintainers
3 years, 8 months
Re: [SPDK] SPDK performance questions
by Abhik Sarkar
Hi Pawel,
Thanks for the response. Before anything else, I would like to clarify that we are not using NVMe yet, but rather SSDs.
I have attached the vhost config file to this email. The same vhost config is used for both the single-core and dual-core vhost runs by changing the mask parameters.
Following is the qemu launch command used.
/mnt/virt/spdk-qemu/build/x86_64-softmmu/qemu-system-x86_64 -D /mnt/qemu.log -m 1024 -object memory-backend-file,id=mem,size=1G,mem-path=/dev/hugepages,share=on \
-nographic -no-user-config -nodefaults -serial mon:telnet:localhost:7704,server,nowait -monitor mon:telnet:localhost:8804,server,nowait \
-chardev socket,id=charmonitor,path=/mnt/virt/var/qemu/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -numa node,memdev=mem \
-drive file=/home/qemu/qcows/sl6.1.qcow2,if=none,id=disk -device ide-hd,drive=disk,bootindex=0 \
-chardev socket,id=char0,path=/root/spdk/vhost.0 -device vhost-user-scsi-pci,id=scsi0,chardev=char0 \
-chardev socket,id=char1,path=/root/spdk/vhost.1 -device vhost-user-scsi-pci,id=scsi1,chardev=char1 \
-chardev socket,id=char2,path=/root/spdk/vhost.2 -device vhost-user-scsi-pci,id=scsi2,chardev=char2 \
-chardev socket,id=char3,path=/root/spdk/vhost.3 -device vhost-user-scsi-pci,id=scsi3,chardev=char3 \
-chardev socket,id=char4,path=/root/spdk/vhost.4 -device vhost-user-scsi-pci,id=scsi4,chardev=char4 \
-chardev socket,id=char5,path=/root/spdk/vhost.5 -device vhost-user-scsi-pci,id=scsi5,chardev=char5 \
-chardev socket,id=char6,path=/root/spdk/vhost.6 -device vhost-user-scsi-pci,id=scsi6,chardev=char6 \
--enable-kvm -device e1000,netdev=net0 -netdev user,id=net0,hostfwd=tcp::5555-:22 -smp 14,sockets=2,cores=7,threads=1
I notice that when employing the 2-core vhost, the second core (Cpu9) is busy handling software interrupts while fio executes:
%Cpu0 : 49.2 us, 50.8 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu9 : 11.6 us, 55.1 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 33.2 si, 0.0 st
Whereas when I run it with the 1-core vhost configuration, none of the cores is busy handling software interrupts:
%Cpu0 : 20.3 us, 79.7 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
This seems like a symptom that can bring the performance down with 2-core vhost config. What could be the reason for it?
I have also attached the fio config directed towards 7 ssds.
As far as the NUMA settings go, it would be really helpful to get the qemu launch command parameters for that (one possible pinning approach is sketched below), as we have had difficulty launching an SPDK-enabled VM with virsh.
I have separately posted about the virsh issue but have not had success yet.
Thanks/Regards
Abhik
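Not an authoritative recipe, but a sketch of one common way to do the pinning Pawel describes below (it assumes the vhost reactor runs on socket/NUMA node 0; the node number, hugepage path, and the use of numactl are assumptions):
# pin the qemu vcpus and its hugepage-backed memory to node 0, next to the vhost reactor
numactl --cpunodebind=0 --membind=0 \
  /mnt/virt/spdk-qemu/build/x86_64-softmmu/qemu-system-x86_64 \
    -object memory-backend-file,id=mem,size=1G,mem-path=/dev/hugepages,share=on,policy=bind,host-nodes=0 \
    -numa node,memdev=mem \
    ...   # remaining options unchanged from the launch command above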
From: SPDK <spdk-bounces(a)lists.01.org> on behalf of "Wodkowski, PawelX" <pawelx.wodkowski(a)intel.com>
Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org>
Date: Wednesday, June 7, 2017 at 6:22 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: Re: [SPDK] SPDK performance questions
Hi,
For SPDK vhost case, there are many possible settings that might be set in suboptimal way.
1. I see you have a 2-socket system, so the first thing to check is the NUMA assignments. The QEMU memory and CPU pinning should be on the same socket the vhost reactor/poller is running on, and the NVMe device should also be on the same socket.
2. What are the vhost config, fio job config, and qemu launch command?
If you are interested in throughput, I think there should be no difference between SPDK and kernel vhost at I/O block sizes above 32 KB. SPDK is focused on getting more I/Os per second at small block sizes (e.g. 4 KB) while using less CPU power than the kernel.
Pawel
From: SPDK [mailto:spdk-bounces@lists.01.org] On Behalf Of Abhik Sarkar
Sent: Tuesday, June 06, 2017 7:41 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: [SPDK] SPDK performance questions
I have a couple of questions regarding the data I have gathered so far on SCSI-based disks with SPDK, using libaio on the host.
1. Our fio results on the host, a 2-socket, 16-core box, are around 2.05 GBps. I have been able to push it to 1.88 GBps with SPDK vhost running on a single core handling 7 request queues (1 queue per disk); in other words, from the vhost point of view
it sees 7 different Unix sockets, and from the guest point of view, one disk per SCSI host. But when I have a single SCSI host with 7 target devices, the performance goes down to 1.38 GBps. Is this performance boost coming from the guest being able to populate separate queues in parallel, given that there is only a single vhost thread handling the requests? Is my understanding correct? Will we get the same performance by adding multiple virtio-scsi-pci devices and interfacing each disk with a different virtio device?
2. Another observation is that if I try to distribute it between 2 cores, the throughput drops to bw=1408MiB/s (1476MB/s). Here I see that the vhost app has an additional thread. With the additional thread handling a disjoint set of queues,
I would have imagined there should be some performance boost, but here it seems to have dropped. Is there lock contention of some sort? On the host, all these devices are on a single SCSI host.
3. Since we are not using a userspace driver, but rather relying on libaio, would it be more efficient to use kernel vhost?
3 years, 8 months
VM oops while testing SPDK hotplug.sh
by Isaac Otsiabah
Daniel, I have done more testing using hotplug.sh, and here is why one does not see the problem by running hotplug.sh as is. When hotplug.sh is run, it always creates a new VM to run the test on and destroys the VM when it completes. Therefore it uses a fresh VM every time and never gets the chance to do the inserts and removes on a VM that has already run the same test several times (i.e. 4 to 5 times). When I run hotplug.sh with a fresh VM, it passes; on the other hand, when I use a VM that has run the test several times before, the VM oopses. This is the problem, and I think it is also the scenario a customer is likely to hit.
I wasn't clear in my earlier emails because I was still determining the cause of the problem (at a high level at least). I think you will see the problem if you do this:
i. In an xterm window, create the VM separately (without the -daemon flag), for example:
IMAGE=/home/fedora24/fedora24-2.img
qemu-img create -f qcow2 $IMAGE 50G
MEM=8192M
FEDORA_ISO=/tmp/Fedora-Server-dvd-x86_64-24-1.2.iso
qemu_pidfile=/tmp/qemu_pidfile
qemu-2.7.1/x86_64-softmmu/qemu-system-x86_64 \
-hda $IMAGE \
-net user,hostfwd=tcp::10022-:22 \
-net nic \
-cpu host \
-m ${MEM} \
-pidfile "/tmp/qemu_pidfile" \
--enable-kvm \
-chardev socket,id=mon0,host=localhost,port=4444,ipv4,server,nowait \
-mon chardev=mon0,mode=readline \
-cdrom $FEDORA_ISO
ii. Then comment out the portion of hotplug.sh that creates the VM.
iii. Also comment out these 4 lines at the bottom of hotplug.sh to avoid killing the VM:
qemupid=`cat "$qemu_pidfile" | awk '{printf $0}'`
kill -9 $qemupid
rm "$qemu_pidfile"
rm "$test_img"
iv. Run hotplug.sh (about 5 times, e.g. with the loop sketched at the end of this message) and you will see the oops on the VM console.
The host system I am using is CentOS 7.2.
Isaac
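As referenced in step iv, a minimal sketch of re-running the patched script repeatedly against the same, persistent VM (the working directory and any environment the script needs are assumptions about the local setup):
# run from the spdk checkout after completing steps i-iii
for i in 1 2 3 4 5; do
    ./test/lib/nvme/hotplug.sh || { echo "run $i failed"; break; }
done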
From: SPDK [mailto:spdk-bounces@lists.01.org] On Behalf Of Verkamp, Daniel
Sent: Wednesday, May 17, 2017 1:09 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: Re: [SPDK] VM crashes while testing SPDK hotplug
Hi Isaac,
The version of the hotplug script in the repository (test/lib/nvme/hotplug.sh) is the current version we are running in our automated test pool.
We haven't hit the -net/--netdev issue that you mentioned yet because the version of qemu we are using is older (the current host system running this test is Fedora 25 with qemu 2.7.1). It looks like we'll need to update the script for that. We would be happy to accept a patch to hotplug.sh if the --netdev option also works on older qemu.
If the kernel crashes due to user program behavior, it sounds like there is a bug in the kernel uio driver. We haven't seen this crash in our automated testing, so I am not sure what the cause could be. It is also worth trying a newer kernel version (we are just using Linux 4.5.5 because the test VM image hasn't been updated in a while).
-- Daniel
From: SPDK [mailto:spdk-bounces@lists.01.org] On Behalf Of Isaac Otsiabah
Sent: Monday, May 15, 2017 1:43 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk@lists.01.org>>
Subject: Re: [SPDK] VM crashes while testing SPDK hotplug
Daniel, please, do you have an updated version of the hotplug.sh script you can share with us? I created the VM using this exact command on my Centos 7 host
IMAGE=/home/fedora24/fedora24.img
qemu-img create -f qcow2 $IMAGE 50G
MEM=8192M
FEDORA_ISO=/tmp/Fedora-Server-dvd-x86_64-24-1.2.iso
/tmp/qemu-2.9.0/x86_64-softmmu/qemu-system-x86_64 \
-hda $IMAGE \
-net nic,model=virtio \
-net bridge,br=br1 \
-netdev user,id=hotplug,hostfwd=tcp::10022-:22 \
-m ${MEM} \
-pidfile "/tmp/qemu_pid_fedora.txt" \
--enable-kvm \
-cpu host \
-chardev socket,id=mon0,host=localhost,port=4444,ipv4,server,nowait \
-mon chardev=mon0,mode=readline \
-cdrom $FEDORA_ISO
After the install is complete, I set up the guest IP address in /etc/sysconfig/network-scripts/ifcfg-ens3 and bring up the interface with ./ifup ens3.
From the VM, I clone spdk and build it.
Then I run the group of tests in hotplug.sh, skipping the VM creation and the copy-spdk-to-the-VM sections.
I mentioned the -netdev flag in my earlier email.
Isaac
From: Isaac Otsiabah
Sent: Monday, May 15, 2017 1:08 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk@lists.01.org>>
Cc: Isaac Otsiabah <IOtsiabah(a)us.fujitsu.com<mailto:IOtsiabah@us.fujitsu.com>>; Paul Von-Stamwitz <PVonStamwitz(a)us.fujitsu.com<mailto:PVonStamwitz@us.fujitsu.com>>; Edward Yang <eyang(a)us.fujitsu.com<mailto:eyang@us.fujitsu.com>>
Subject: RE: VM crashes while testing SPDK hotplug
Daniel, I installed a Fedora 24 VM and tested it. After running the test twice or more, the VM oopses. Unlike the previous failure on CentOS, this failure does not reboot; the VM just oopses after two or more test runs. My host is a CentOS machine. I found that the qemu-kvm that comes with the OS installation does not support NVMe, so I built qemu-system-x86_64 version 2.9.
[root@host1 spdk]# /tmp/qemu-2.9.0/x86_64-softmmu/qemu-system-x86_64 -version
QEMU emulator version 2.9.0
Copyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers
One observation (although this is not the problem, because local port 10022 was not responsive I executed scripts/setup.sh and the hotplug binary from the VM console at the appropriate breakpoints): hotplug.sh has the flag "-net user,hostfwd=tcp::10022-:22 \" to redirect guest ssh port 22 to host port 10022. However, qemu-system-x86_64 version 2.9 does not have this option; it has the -netdev option instead, which is different. The qemu-system-x86_64 man page for the -netdev flag is as follows:
-netdev user,id=str[,ipv4[=on|off]][,net=addr[/mask]][,host=addr]
[,ipv6[=on|off]][,ipv6-net=addr[/int]][,ipv6-host=addr]
[,restrict=on|off][,hostname=host][,dhcpstart=addr]
[,dns=addr][,ipv6-dns=addr][,dnssearch=domain][,tftp=dir]
[,bootfile=f][,hostfwd=rule][,guestfwd=rule][,smb=dir[,smbserver=addr]]
configure a user mode network backend with ID 'str',
its DHCP server and optional services
It says hostfwd=rule and does not give details of the rule. I used tcp, so I specified it as
-netdev user,id=hotplug,hostfwd=tcp::10022-:22 \
From the host, "netstat -an | egrep -i listen | less" shows that local port 10022 is being listened on. I installed sshpass and tested this -netdev redirection with a simple sshpass command to the VM but got no response. Therefore, I bypassed executing scripts/setup.sh and the hotplug binary via the sshpass command.
So I can test it without executing setup.sh and the hotplug binary through sshpass on port 10022. The main issue is: why does it oops after I run it 2 or more times?
Isaac
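A hedged observation on the unresponsive port forward above (an assumption about the cause, not something verified here): with -netdev, the user backend is not attached to any NIC until a -device option references its id, so hostfwd can be listening on the host side and still never reach the guest. The launch command above defines -netdev user,id=hotplug but no device uses it. A sketch of a pairing that mirrors the e1000/netdev combination used elsewhere in this digest:
-netdev user,id=hotplug,hostfwd=tcp::10022-:22 \
-device e1000,netdev=hotplug \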
From: SPDK [mailto:spdk-bounces@lists.01.org] On Behalf Of Verkamp, Daniel
Sent: Tuesday, May 09, 2017 3:33 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk@lists.01.org>>
Subject: Re: [SPDK] VM crashes while testing SPDK hotplug
Hi Isaac,
Our hotplug tests with a VM (test/lib/nvme/hotplug.sh) are working with a Fedora 24 VM guest running kernel 4.5.5. I suspect there is a bug in the CentOS kernel version (3.10 is fairly old and is probably missing uio/hotplug-related bug fixes from the mainline kernels).
Can you try to reproduce your problem on a newer kernel version and see if that is the cause of the issue?
Thanks,
-- Daniel
From: SPDK [mailto:spdk-bounces@lists.01.org] On Behalf Of Isaac Otsiabah
Sent: Tuesday, May 9, 2017 2:11 PM
To: spdk(a)lists.01.org<mailto:spdk@lists.01.org>
Subject: [SPDK] VM crashes while testing SPDK hotplug
I created a VM on a CentOS 7 host with a monitor socket listening on port 4449 and tested the hotplug.
1. VM creation is as follows:
IMAGE=/home/centos7/centos72.img
qemu-img create -f qcow2 $IMAGE 50G
MEM=8192M
ISO=/tmp/CentOS-7-x86_64-Everything-1611.iso
[root@host1]# cat /etc/redhat-release
CentOS Linux release 7.2.1511 (Core)
[root@host1]# ls -l /tmp/CentOS-7-x86_64-Everything-1611.iso
-r--------. 1 qemu qemu 8280604672 Apr 12 13:37 /tmp/CentOS-7-x86_64-Everything-1611.iso
qemu-2.9.0/x86_64-softmmu/qemu-system-x86_64 \
-hda $IMAGE \
-net nic,model=virtio \
-net bridge,br=br1 \
-m ${MEM} \
-pidfile "/tmp/qemu_pid2.txt" \
--enable-kvm \
-cpu host \
-chardev socket,id=mon0,host=localhost,port=4449,ipv4,server,nowait \
-mon chardev=mon0,mode=readline \
-cdrom $ISO
2. Without running the SPDK hotplug application (i.e. examples/nvme/hotplug/hotplug -i 0 -t 15 -n 4 -r 8), the qemu commands to insert fake NVMe devices work; I can see the NVMe devices in /dev/:
echo " drive_add 0 file=/root/test0,format=raw,id=drive0,if=none" | nc localhost 4449
echo " drive_add 1 file=/root/test1,format=raw,id=drive1,if=none" | nc localhost 4449
echo "drive_add 2 file=/root/test2,format=raw,id=drive2,if=none" | nc localhost 4449
echo "drive_add 3 file=/root/test3,format=raw,id=drive3,if=none" | nc localhost 4449
echo "device_add nvme,drive=drive0,id=nvme0,serial=nvme0" |nc localhost 4449
echo "device_add nvme,drive=drive1,id=nvme1,serial=nvme1" |nc localhost 4449
echo "device_add nvme,drive=drive2,id=nvme2,serial=nvme2" |nc localhost 4449
echo "device_add nvme,drive=drive3,id=nvme3,serial=nvme3" |nc localhost 4449
Also, the commands to delete the devices work without crashing the VM:
echo "device_del nvme0" | nc localhost 4449
echo "device_del nvme1" | nc localhost 4449
echo "device_del nvme2" | nc localhost 4449
echo "device_del nvme3" | nc localhost 4449
3. However, with the SPDK hotplug test application (examples/nvme/hotplug/hotplug -i 0 -t 15 -n 4 -r 8) running, the device_del command causes a fault that crashes the VM, and it reboots as a result. The /var/log/messages and vmcore-dmesg.txt files are in the attached tar file. I would appreciate any help in understanding why a bug in SPDK crashes the VM. Thanks.
Isaac
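To make the oops easier to share on the list, one possible addition to the qemu command above, as a sketch (the log path is arbitrary, and the console= parameters are an assumption about the guest kernel command line):
-serial file:/tmp/vm-console.log \
# boot the guest kernel with: console=ttyS0,115200 console=tty0
# so the oops text lands in /tmp/vm-console.log on the host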
3 years, 8 months
Questions on spdk roadmap
by Paul Von-Stamwitz
Hi,
For version 17.07:
"Full stack hotplug: Removal support (iSCSI, NVMe-oF and vhost)"
This looks like we're propagating device removal to the host interfaces. Is this correct?
"nvme-cli support"
Can you provide some more detail as to what this is?
Thanks,
Paul
3 years, 9 months