Rename: Question and proposal for network function of iSCSI target
by 松本周平 / MATSUMOTO,SHUUHEI
Hi,
“Unusual” was not the right word, so I would like to correct the title.
In addition, I would like to add one more item.
9) iSCSI SendTargets is implemented in the spdk_iscsi_send_tgts() function (in lib/iscsi/tgt_node.c).
I think the following two filter functions use the global portal group tag of the initiator's connection.
However, I have not found any statement in the iSCSI specification that the portal group tag is to be used as a filter.
Moreover, the meaning of the portal group tag in the SPDK iSCSI target differs from its meaning in the iSCSI specification.
- static int spdk_iscsi_tgt_node_visible(struct spdk_iscsi_tgt_node *target, const char *iqn, int pg_tag)
- static int spdk_iscsi_portal_grp_is_visible(struct spdk_iscsi_tgt_node *target, const char *iqn, int pg_tag)
I think the SendTargets code will need to be reconsidered.
Today that’s all from me.
I used the mailing list for this purpose again this time, but I would like to become more accustomed to GerritHub and Trello.
Thank you in advance for your support.
Shuhei Matsumoto
From: 松本周平 / MATSUMOTO,SHUUHEI
Sent: Friday, October 20, 2017 12:05 PM
To: Storage Performance Development Kit
Subject: RE: Unusual interface and implementation of SPDK network function for iSCSI target
Sorry for the repeated self-reply.
2) The user can specify “*” to mean 0.0.0.0 (INADDR_ANY) for IPv4 and “[*]” to mean [::] (in6addr_any) for IPv6.
I am not fully confident in my networking expertise, but I have never seen this convention elsewhere.
I would like to propose deleting the code related to “*” and “[*]”.
However, this code does not cause any apparent error or misunderstanding.
I did not understand why this interface was implemented, which is why I asked; if it is convenient, of course it should be maintained.
Thank you,
Shuhei Matsumoto
From: 松本周平 / MATSUMOTO,SHUUHEI
Sent: Friday, October 20, 2017 10:46 AM
To: Storage Performance Development Kit
Subject: RE: Unusual interface and implementation of SPDK network function for iSCSI target
I’m preparing a patch and commit message for each item. I apologize for the inconvenience until then.
From: 松本周平 / MATSUMOTO,SHUUHEI
Sent: Friday, October 20, 2017 10:28 AM
To: Storage Performance Development Kit
Subject: Unusual interface and implementation of SPDK network function for iSCSI target
Hi,
I am not very confident about networking, but from my reading of the code I have found at least the following items that make the iSCSI target error-prone or difficult to understand.
The customized socket interface of SPDK (lib/net/sock.c) is used only by the SPDK iSCSI target, so I think now may be a good opportunity to refactor it.
Regarding my pushed changes, I would like to change my priorities to the following:
1. the change https://review.gerrithub.io/#/c/381246/
2. the items below (only a few of which I have registered in GerritHub so far)
3. the rest of my pushed changes.
I appreciate any feedback. I hope these items make sense to you; a more standardized implementation should also make it easier to connect the SPDK iSCSI target to a user-space TCP/IP stack.
Best Regards,
Shuhei Matsumoto
1) spdk_sock_getaddr(sock, saddr, slen, caddr, clen) (in lib/net/sock.c) can return only IPv4 addresses correctly, because get_addr_str() does not take IPv6 into account. Hence the current code may not work correctly with IPv6.
static int get_addr_str(struct sockaddr_in *paddr, char *host, size_t hlen)
{
    uint8_t *pa;

    if (paddr == NULL || host == NULL)
        return -1;
    pa = (uint8_t *)&paddr->sin_addr.s_addr;
    snprintf(host, hlen, "%u.%u.%u.%u", pa[0], pa[1], pa[2], pa[3]);
    return 0;
}
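For reference, here is a minimal sketch (my own illustration, not existing SPDK code) of an address-family-aware variant based on getnameinfo(), which formats both IPv4 and IPv6 addresses:

#include <sys/socket.h>
#include <netdb.h>
#include <stddef.h>

/* Sketch: works for both AF_INET and AF_INET6 sockaddrs. */
static int get_addr_str_af(const struct sockaddr *sa, socklen_t salen, char *host, size_t hlen)
{
    int rc;

    if (sa == NULL || host == NULL)
        return -1;
    /* NI_NUMERICHOST returns the numeric form (e.g. "10.0.0.1" or "fe80::1")
     * without performing a reverse DNS lookup. */
    rc = getnameinfo(sa, salen, host, (socklen_t)hlen, NULL, 0, NI_NUMERICHOST);
    return (rc == 0) ? 0 : -1;
}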
2) The user can specify “*” to mean 0.0.0.0 (INADDR_ANY) for IPv4 and “[*]” to mean [::] (in6addr_any) for IPv6.
I am not fully confident in my networking expertise, but I have never seen this convention elsewhere.
I would like to propose deleting the code related to “*” and “[*]”.
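To make the convention in item 2 concrete, a minimal sketch (my own illustration, not the SPDK code itself) of the wildcard translation being described:

#include <string.h>

/* "*" stands for 0.0.0.0 (INADDR_ANY) and "[*]" for [::] (in6addr_any);
 * anything else is passed through unchanged. */
static const char *translate_wildcard_example(const char *host)
{
    if (strcmp(host, "*") == 0)
        return "0.0.0.0";
    if (strcmp(host, "[*]") == 0)
        return "[::]";
    return host;
}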
3) A network portal (struct spdk_iscsi_portal) remembers its IP address-port pair not as a struct sockaddr but only as strings, host and port.
Hence the iSCSI target does not know whether each network portal is IPv4 or IPv6 and has to check for “[” and “]” manually.
If we strip “[” and “]” from the user input and then pass it to getaddrinfo(), we can determine and remember whether it is IPv4 or IPv6.
That would be helpful, and we could then delete the helper functions spdk_sock_is_ipv6/4() in lib/net/sock.c.
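A minimal sketch (my own illustration, not a patch) of the idea in item 3, i.e. stripping the brackets and letting getaddrinfo() report the address family:

#include <sys/socket.h>
#include <netdb.h>
#include <string.h>
#include <stdio.h>

/* Returns AF_INET, AF_INET6, or -1 on error, for a numeric host string
 * such as "192.168.0.1", "[::1]" or "::1". */
static int detect_family_example(const char *host_in)
{
    char host[256];
    size_t len = strlen(host_in);
    struct addrinfo hints, *res;
    int family;

    /* Strip the "[" and "]" used for IPv6 literals, e.g. "[::1]" -> "::1". */
    if (len > 1 && host_in[0] == '[' && host_in[len - 1] == ']')
        snprintf(host, sizeof(host), "%.*s", (int)(len - 2), host_in + 1);
    else
        snprintf(host, sizeof(host), "%s", host_in);

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;
    hints.ai_flags = AI_NUMERICHOST;    /* host is expected to be numeric */
    if (getaddrinfo(host, NULL, &hints, &res) != 0)
        return -1;
    family = res->ai_family;            /* AF_INET or AF_INET6 */
    freeaddrinfo(res);
    return family;
}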
4) In spdk_iscsi_tgt_node_access(), “ALL” means that the initiator group allows any IP address-port pair or iSCSI name of an initiator.
However, the iSCSI target does not know all initiators beforehand.
Hence “ANY” may be a better keyword than “ALL”.
5) spdk_sock_connect() is not used anywhere. Hence the abstraction provided by spdk_sock_create(LISTEN or CONNECT) is not necessary in lib/net/sock.c.
6) spdk_iscsi_portal_grp_is_visible() may not run correctly in the following case:
- an iSCSI target has PG1-IG1 and PG1-IG2
- an initiator has logged in to PG1 of the target
- the initiator is allowed not by IG1 but by IG2
-> However, spdk_iscsi_portal_grp_is_visible() checks only the first IG for a given PG, that is, only IG1.
Hence in this case spdk_iscsi_portal_grp_is_visible() should return true but would return false.
-> I think this is caused by the PG-IG map being an array; a PG-IG map tree would be better (https://review.gerrithub.io/#/c/379933/).
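For item 6, a minimal sketch (my own illustration with made-up structure and field names, not the real SPDK data structures) of the behavior I think the check needs, namely scanning every IG mapped to the portal group rather than stopping at the first matching entry:

#include <stdbool.h>

struct pg_ig_map_example {
    int pg_tag;
    int ig_tag;
};

struct tgt_node_example {
    struct pg_ig_map_example maps[32];
    int nmaps;
};

static bool is_visible_example(const struct tgt_node_example *target, int conn_pg_tag,
                               bool (*ig_allows_initiator)(int ig_tag))
{
    int i;

    for (i = 0; i < target->nmaps; i++) {
        if (target->maps[i].pg_tag != conn_pg_tag)
            continue;
        /* Keep scanning: PG1 may be mapped to several IGs (IG1, IG2, ...),
         * and any one of them allowing the initiator is enough. */
        if (ig_allows_initiator(target->maps[i].ig_tag))
            return true;
    }
    return false;
}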
7) I found “OK” and “NG” in comments in lib/iscsi/tgt_node.c.
“NG” is not standard English (it is a localized Japanese usage), so it may be better to change these to “Allow” and “Deny”, since the comments describe ACL results.
8) An empty netmask list is treated as a normal configuration for an initiator group. However, we cannot create such a group.
(https://review.gerrithub.io/#/c/382920/)
From: SPDK [mailto:spdk-bounces@lists.01.org] On Behalf Of Victor Banh
Sent: Thursday, October 19, 2017 4:17 PM
To: Storage Performance Development Kit; Harris, James R; Cao, Gang
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio
I am using Ubuntu 16.04 and kernel 4.12.X.
________________________________
From: Cao, Gang <gang.cao(a)intel.com>
Sent: Wednesday, October 18, 2017 11:51:39 PM
To: Victor Banh; Storage Performance Development Kit; Harris, James R
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
[root@node4 gangcao]# uname -a
Linux node4 4.10.1 #1 SMP Fri Mar 10 15:59:57 CST 2017 x86_64 x86_64 x86_64 GNU/Linux
[root@node4 gangcao]# lsb_release -a
LSB Version: :core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch
Distributor ID: CentOS
Description: CentOS Linux release 7.2.1511 (Core)
Release: 7.2.1511
Codename: Core
From: Victor Banh [mailto:victorb@mellanox.com]
Sent: Thursday, October 19, 2017 2:14 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org>; Harris, James R <james.r.harris(a)intel.com>; Cao, Gang <gang.cao(a)intel.com>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Can you give the OS version and kernel version again for target and client?
I couldn't compile dpdk without installing the latest 4.12.X kernel on Ubuntu 16.04.
Thanks
Victor
________________________________
From: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao@intel.com>>
Sent: Wednesday, October 18, 2017 10:59:19 PM
To: Victor Banh; Storage Performance Development Kit; Harris, James R
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
This server does not have OFED installed and is in loopback mode.
I found another server, also in loopback mode, with a ConnectX-3 and OFED installed.
By the way, what SSD are you using? Maybe it is related to the SSD? I’ve just run with 2048k for a short duration and saw no issue. I will run for longer to see whether I can hit this error.
[root@slave3 fio]# lspci | grep -i mell
08:00.0 Ethernet controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]
[root@slave3 fio]# lsmod | grep -i mlx
mlx4_ib 159744 0
ib_core 208896 15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,ib_srpt,ib_ucm,rdma_ucm,mlx4_ib
mlx4_en 114688 0
mlx4_core 307200 2 mlx4_en,mlx4_ib
ptp 20480 3 ixgbe,igb,mlx4_en
[root@slave3 fio]# ofed_info
MLNX_OFED_LINUX-3.2-2.0.0.0 (OFED-3.2-2.0.0):
ar_mgr:
osm_plugins/ar_mgr/ar_mgr-1.0-0.30.ga1ea4b7.tar.gz
cc_mgr:
osm_plugins/cc_mgr/cc_mgr-1.0-0.29.ga1ea4b7.tar.gz
dapl:
dapl.git mlnx_ofed_3_1
commit c30fb6ce2cbc29d8ed4bde51437f7abb93378c78
dump_pr:
osm_plugins/dump_pr//dump_pr-1.0-0.25.ga1ea4b7.tar.gz
fabric-collector:
fabric_collector//fabric-collector-1.1.0.MLNX20140410.51b267e.tar.gz
fca:
mlnx_ofed_fca/fca-2.5.2431-1.src.rpm
hcoll:
mlnx_ofed_hcol/hcoll-3.4.807-1.src.rpm
ibacm:
mlnx_ofed/ibacm.git mlnx_ofed_3_2
commit 15ad8c13bdebbe62edea0b7df030710b65c14f7f
ibacm_ssa:
mlnx_ofed_ssa/acm/ibacm_ssa-0.0.9.3.MLNX20151203.50eb579.tar.gz
ibdump:
sniffer/sniffer-4.0.0-2/ibdump/linux/ibdump-4.0.0-2.tgz
ibsim:
mlnx_ofed_ibsim/ibsim-0.6-0.8.g9d76581.tar.gz
ibssa:
mlnx_ofed_ssa/distrib/ibssa-0.0.9.3.MLNX20151203.50eb579.tar.gz
ibutils:
ofed-1.5.3-rpms/ibutils/ibutils-1.5.7.1-0.12.gdcaeae2.tar.gz
ibutils2:
ibutils2/ibutils2-2.1.1-0.76.MLNX20160222.gd366c7b.tar.gz
infiniband-diags:
mlnx_ofed_infiniband_diags/infiniband-diags-1.6.6.MLNX20151130.7f0213e.tar.gz
infinipath-psm:
mlnx_ofed_infinipath-psm/infinipath-psm-3.3-2_g6f42cdb_open.tar.gz
iser:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d
isert:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d
kernel-mft:
mlnx_ofed_mft/kernel-mft-4.3.0-25.src.rpm
knem:
knem.git mellanox-master
commit f143ee19a575cd42a334422fa8bd329d671238db
libibcm:
mlnx_ofed/libibcm.git mlnx_ofed_3_0
commit d7d485df305e6536711485bd7e477668e77d8320
libibmad:
mlnx_ofed_libibmad/libibmad-1.3.12.MLNX20151122.d140cb1.tar.gz
libibprof:
mlnx_ofed_libibprof/libibprof-1.1.22-1.src.rpm
libibumad:
mlnx_ofed_libibumad/libibumad-1.3.10.2.MLNX20150406.966500d.tar.gz
libibverbs:
mlnx_ofed/libibverbs.git mlnx_ofed_3_2_2
commit 217a77686f4861229f0e4b94485a13f024634caf
libmlx4:
mlnx_ofed/libmlx4.git mlnx_ofed_3_2_2
commit dda6d7ae1e6e3779a485ebdd0a882f4bcbd027a6
libmlx5:
mlnx_ofed/libmlx5.git mlnx_ofed_3_2_2
commit d0c8645359e0f0aba0408b2d344f3b418d27019b
libopensmssa:
mlnx_ofed_ssa/plugin/libopensmssa-0.0.9.3.MLNX20151203.50eb579.tar.gz
librdmacm:
mlnx_ofed/librdmacm.git mlnx_ofed_3_2_2
commit 6bd430fed9e7b3d57a1876c040431ce7295c7703
libsdp:
libsdp.git mlnx_ofed_3_0
commit fbd01dfff05f42d6b82506e7dbf4bc6b7e6a59a4
libvma:
vma/source_rpms//libvma-7.0.14-0.src.rpm
mlnx-ethtool:
upstream/ethtool.git for-upstream
commit ac0cf295abe0c0832f0711fed66ab9601c8b2513
mlnx-ofa_kernel:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d
mpi-selector:
ofed-1.5.3-rpms/mpi-selector/mpi-selector-1.0.3-1.src.rpm
mpitests:
mlnx_ofed_mpitest/mpitests-3.2.17-e1c7f2f.src.rpm
mstflint:
mlnx_ofed_mstflint/mstflint-4.3.0-1.49.g9b9af70.tar.gz
multiperf:
mlnx_ofed_multiperf/multiperf-3.0-0.10.gda89e8c.tar.gz
mvapich2:
mlnx_ofed_mvapich2/mvapich2-2.2a-1.src.rpm
mxm:
mlnx_ofed_mxm/mxm-3.4.3079-1.src.rpm
ofed-docs:
docs.git mlnx_ofed-3.2
commit ea3386416f9f7130edd2c70fc3424cb2cda50f7d
openmpi:
mlnx_ofed_ompi_1.8/openmpi-1.10.3a1-1.src.rpm
opensm:
mlnx_ofed_opensm/opensm-4.6.1.MLNX20160112.774e977.tar.gz
perftest:
mlnx_ofed_perftest/perftest-3.0-0.18.gb464d59.tar.gz
qperf:
mlnx_ofed_qperf/qperf-0.4.9.tar.gz
rds-tools:
rds-tools.git mlnx_ofed_2_4
commit 299420ca25cf9996bc0748e3bc4b08748996ba49
sdpnetstat:
sdpnetstat.git mlnx_ofed_3_0
commit 3cf409a7cc07e5c71f9640eddbb801ece21b4169
sockperf:
sockperf/sockperf-2.7-43.git3ee62bd8107a.src.rpm
srp:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d
srptools:
srptools/srptools-1.0.2-12.src.rpm
Installed Packages:
-------------------
infiniband-diags
librdmacm
libmlx4
libibverbs-utils
mpi-selector
libibmad-devel
sdpnetstat
knem
libibumad-devel
libsdp
mlnx-ethtool
libibverbs-debuginfo
mlnx-ofa_kernel-modules
srp
opensm
mstflint
cc_mgr
libibmad
libibverbs
kernel-mft
libibverbs-devel-static
libibumad
librdmacm-devel
mlnx-ofa_kernel
libsdp-devel
ibutils2
mlnxofed-docs
libibmad-static
iser
opensm-devel
dump_pr
libibumad-static
rds-tools
libmlx4-debuginfo
mlnx-ofa_kernel-devel
opensm-libs
opensm-static
ar_mgr
dapl-devel-static
infiniband-diags-compat
libibverbs-devel
ibsim
[root@slave3 fio]# ibstat
CA 'mlx4_0'
CA type: MT4103
Number of ports: 2
Firmware version: 2.35.5100
Hardware version: 0
Node GUID: 0x248a0703006090e0
System image GUID: 0x248a0703006090e0
Port 1:
State: Active
Physical state: LinkUp
Rate: 40
Base lid: 0
LMC: 0
SM lid: 0
Capability mask: 0x04010000
Port GUID: 0x268a07fffe6090e0
Link layer: Ethernet
Port 2:
State: Active
Physical state: LinkUp
Rate: 40
Base lid: 0
LMC: 0
SM lid: 0
Capability mask: 0x04010000
Port GUID: 0x268a07fffe6090e1
Link layer: Ethernet
[root@slave3 fio]# ibv_devinfo
hca_id: mlx4_0
transport: InfiniBand (0)
fw_ver: 2.35.5100
node_guid: 248a:0703:0060:90e0
sys_image_guid: 248a:0703:0060:90e0
vendor_id: 0x02c9
vendor_part_id: 4103
hw_ver: 0x0
board_id: MT_1090111023
phys_port_cnt: 2
port: 1
state: PORT_ACTIVE (4)
max_mtu: 4096 (5)
active_mtu: 1024 (3)
sm_lid: 0
port_lid: 0
port_lmc: 0x00
link_layer: Ethernet
port: 2
state: PORT_ACTIVE (4)
max_mtu: 4096 (5)
active_mtu: 1024 (3)
sm_lid: 0
port_lid: 0
port_lmc: 0x00
link_layer: Ethernet
From: Victor Banh [mailto:victorb@mellanox.com]
Sent: Thursday, October 19, 2017 12:34 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk@lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris@intel.com>>; Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao@intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Do you install Mellanox OFED on the Target and Client server?
Can you run ibstat on both servers?
Thanks
Victor
________________________________
From: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao@intel.com>>
Sent: Wednesday, October 18, 2017 8:59:28 PM
To: Victor Banh; Storage Performance Development Kit; Harris, James R
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Victor,
I’ve just tried the SPDK v17.07.1 and DPDK v17.08.
nvme version: 1.1.38.gfaab
fio version: 3.1
I tried the 512k and 1024k IO sizes and there is no error. The dmesg information is below.
So there may be some other difference here? It looks like you are using a ConnectX-5 while I am using a ConnectX-4?
Other related information:
[root@node4 fio]# lspci | grep -i mell
81:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
81:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
[root@node4 fio]# lsmod | grep -i mlx
mlx5_ib 172032 0
ib_core 200704 15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,mlx5_ib,ib_srpt,ib_ucm,rdma_ucm
mlx5_core 380928 1 mlx5_ib
ptp 20480 3 ixgbe,igb,mlx5_core
[root@node4 fio]# uname -a
Linux node4 4.10.1 #1 SMP Fri Mar 10 15:59:57 CST 2017 x86_64 x86_64 x86_64 GNU/Linux
[577707.543326] nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 192.168.100.8:4420
[577730.854540] detected loopback device
[577730.893761] nvme nvme0: creating 7 I/O queues.
[577730.893797] detected loopback device
[577730.898611] detected loopback device
[577730.908917] detected loopback device
[577730.919073] detected loopback device
[577730.928922] detected loopback device
[577730.938679] detected loopback device
[577730.948365] detected loopback device
[577731.146290] nvme nvme0: new ctrl: NQN "nqn.2016-06.io.spdk:cnode2", addr 192.168.100.8:4420
Thanks,
Gang
From: Victor Banh [mailto:victorb@mellanox.com]
Sent: Thursday, October 19, 2017 9:43 AM
To: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao@intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk@lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris@intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Gang
Any update?
Do you see any error message from “dmesg” with 512k block size running fio?
Thanks
Victor
From: Victor Banh
Sent: Tuesday, October 17, 2017 7:37 PM
To: 'Cao, Gang' <gang.cao(a)intel.com<mailto:gang.cao@intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk@lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris@intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Gang
spdk-17.07.1 and dpdk-17.08
Thanks
Victor
From: Cao, Gang [mailto:gang.cao@intel.com]
Sent: Monday, October 16, 2017 8:51 PM
To: Victor Banh <victorb(a)mellanox.com<mailto:victorb@mellanox.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk@lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris@intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Victor,
Could you share which version of SPDK you are using when you see this error? Or maybe you can try the latest SPDK code?
Thanks,
Gang
From: Victor Banh [mailto:victorb@mellanox.com]
Sent: Tuesday, October 17, 2017 5:30 AM
To: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao@intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk@lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris@intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Cao
Do you see any message from dmesg?
I tried this fio version and still saw these error messages from dmesg.
fio-3.1
[869053.218235] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218250] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218259] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218263] ldm_validate_partition_table(): Disk read failed.
[869053.218269] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218277] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218285] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218292] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218296] Dev nvme2n1: unable to read RDB block 0
[869053.218303] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218311] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218323] Buffer I/O error on dev nvme2n1, logical block 3, async page read
[869053.218338] nvme2n1: unable to read partition table
[869053.246126] nvme2n1: detected capacity change from -62111005559226368 to -62042256479092736
[869053.246195] ldm_validate_partition_table(): Disk read failed.
[869053.246217] Dev nvme2n1: unable to read RDB block 0
From: Cao, Gang [mailto:gang.cao@intel.com]
Sent: Monday, October 09, 2017 10:59 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk@lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris@intel.com>>
Cc: Victor Banh <victorb(a)mellanox.com<mailto:victorb@mellanox.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Victor,
Thanks for your detailed information on the testing.
I’ve tried the latest SPDK code with the latest fio-3.1-20-g132b and with fio-2.19, and did not see this kind of error.
Could you share which version of SPDK you are using when you see this error? Or maybe you can try the latest SPDK code?
fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme0n1 --name=read-phase --rw=randwrite
read-phase: (g=0): rw=randwrite, bs=(R) 512KiB-512KiB, (W) 512KiB-512KiB, (T) 512KiB-512KiB, ioengine=libaio, iodepth=16
...
fio-3.1-20-g132b
Starting 4 processes
Jobs: 4 (f=4): [w(4)][100.0%][r=0KiB/s,w=1592MiB/s][r=0,w=3183 IOPS][eta 00m:00s]
read-phase: (groupid=0, jobs=1): err= 0: pid=46378: Tue Oct 10 01:23:39 2017
My NIC information:
[root@node4 nvme-cli-gerrit]# lsmod | grep -i mlx
mlx5_ib 172032 0
ib_core 200704 15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,mlx5_ib,ib_srpt,ib_ucm,rdma_ucm
mlx5_core 380928 1 mlx5_ib
ptp 20480 3 ixgbe,igb,mlx5_core
[root@node4 nvme-cli-gerrit]# lspci | grep -i mell
81:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
81:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
From: SPDK [mailto:spdk-bounces@lists.01.org] On Behalf Of Victor Banh
Sent: Friday, October 6, 2017 2:41 PM
To: Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris@intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk@lists.01.org>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio
From: Harris, James R [mailto:james.r.harris@intel.com]
Sent: Friday, October 06, 2017 2:32 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk@lists.01.org>>
Cc: Victor Banh <victorb(a)mellanox.com<mailto:victorb@mellanox.com>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio
(cc Victor)
From: James Harris <james.r.harris(a)intel.com<mailto:james.r.harris@intel.com>>
Date: Thursday, October 5, 2017 at 1:59 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk@lists.01.org>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio
Hi Victor,
Could you provide a few more details? This will help the list to provide some ideas.
1) On the client, are you using the SPDK NVMe-oF initiator or the kernel initiator?
Kernel initiator; I run these commands on the client server.
modprobe mlx5_ib
modprobe nvme-rdma
nvme discover -t rdma -a 192.168.10.11 -s 4420
nvme connect -t rdma -n nqn.2016-06.io.spdk:nvme-subsystem-1 -a 192.168.10.11 -s 4420
2) Can you provide the fio configuration file or command line? Just so we can have more specifics on “bigger block size”.
fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme1n1 --name=read-phase --rw=randwrite
3) Any details on the HW setup - specifically details on the RDMA NIC (or if you’re using SW RoCE).
Nvmf.conf on target server
[Global]
Comment "Global section"
ReactorMask 0xff00
[Rpc]
Enable No
Listen 127.0.0.1
[Nvmf]
MaxQueuesPerSession 8
MaxQueueDepth 128
[Subsystem1]
NQN nqn.2016-06.io.spdk:nvme-subsystem-1
Core 9
Mode Direct
Listen RDMA 192.168.10.11:4420
NVMe 0000:82:00.0
SN S2PMNAAH400039
It is RDMA NIC, ConnectX 5, Intel CPU Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz
NUMA node0 CPU(s): 0-7
NUMA node1 CPU(s): 8-15
Thanks,
-Jim
From: SPDK <spdk-bounces(a)lists.01.org<mailto:spdk-bounces@lists.01.org>> on behalf of Victor Banh <victorb(a)mellanox.com<mailto:victorb@mellanox.com>>
Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk@lists.01.org>>
Date: Thursday, October 5, 2017 at 11:26 AM
To: "spdk(a)lists.01.org<mailto:spdk@lists.01.org>" <spdk(a)lists.01.org<mailto:spdk@lists.01.org>>
Subject: [SPDK] Buffer I/O error on bigger block size running fio
Hi
I have set up SPDK NVMe-oF and keep getting errors with bigger block sizes when running fio randwrite tests.
I am using Ubuntu 16.04 with kernel version 4.12.0-041200-generic on target and client.
The DPDK is 17.08 and SPDK is 17.07.1.
Thanks
Victor
[46905.233553] perf: interrupt took too long (2503 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
[48285.159186] blk_update_request: I/O error, dev nvme1n1, sector 2507351968
[48285.159207] blk_update_request: I/O error, dev nvme1n1, sector 1301294496
[48285.159226] blk_update_request: I/O error, dev nvme1n1, sector 1947371168
[48285.159239] blk_update_request: I/O error, dev nvme1n1, sector 1891797568
[48285.159252] blk_update_request: I/O error, dev nvme1n1, sector 10833824
[48285.159265] blk_update_request: I/O error, dev nvme1n1, sector 614937152
[48285.159277] blk_update_request: I/O error, dev nvme1n1, sector 1872305088
[48285.159290] blk_update_request: I/O error, dev nvme1n1, sector 1504491040
[48285.159299] blk_update_request: I/O error, dev nvme1n1, sector 1182136128
[48285.159308] blk_update_request: I/O error, dev nvme1n1, sector 1662985792
[48285.191185] nvme nvme1: Reconnecting in 10 seconds...
[48285.191254] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191291] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191305] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191314] ldm_validate_partition_table(): Disk read failed.
[48285.191320] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191327] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191335] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191342] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191347] Dev nvme1n1: unable to read RDB block 0
[48285.191353] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191360] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191375] Buffer I/O error on dev nvme1n1, logical block 3, async page read
[48285.191389] nvme1n1: unable to read partition table
[48285.223197] nvme1n1: detected capacity change from 1600321314816 to 0
[48289.623192] nvme1n1: detected capacity change from 0 to -65647705833078784
[48289.623411] ldm_validate_partition_table(): Disk read failed.
[48289.623447] Dev nvme1n1: unable to read RDB block 0
[48289.623486] nvme1n1: unable to read partition table
[48289.643305] ldm_validate_partition_table(): Disk read failed.
[48289.643328] Dev nvme1n1: unable to read RDB block 0
[48289.643373] nvme1n1: unable to read partition table
3 years, 4 months
Failed to write to /dev/nvme-fabrics: Cannot allocate memory
by Pelplinski, Piotr
Hi,
During tests with multiple NVMf subsystems, we have found an issue on a virtual machine.
The patch fails on the Test Pool fedora-06 virtual machine, but passes on fedora-03.
Logs from failing fedora-06 machine:
build.log:
...
08:08:38 # nvme connect -t rdma -n nqn.2016-06.io.spdk:cnode1 -a 10.0.2.15 -s 4420
Failed to write to /dev/nvme-fabrics: Cannot allocate memory
...
dmesg.log:
...
[ 647.460103] blk-mq: failed to allocate request map
[ 647.470014] ------------[ cut here ]------------
[ 647.470039] WARNING: CPU: 14 PID: 18093 at drivers/infiniband/core/verbs.c:303 ib_dealloc_pd+0x47/0x70 [ib_core]
[...]
[ 647.470163] Call Trace:
[ 647.470172] nvme_rdma_dev_put+0x76/0x90 [nvme_rdma]
[ 647.470175] nvme_rdma_destroy_admin_queue+0x84/0x90 [nvme_rdma]
[ 647.470178] nvme_rdma_create_ctrl+0x2df/0x5db [nvme_rdma]
[ 647.470184] nvmf_dev_write+0x919/0xa8e [nvme_fabrics]
[ 647.470193] __vfs_write+0x37/0x160
[ 647.470196] ? __vfs_write+0x37/0x160
[ 647.470201] ? selinux_file_permission+0xd7/0x110
[ 647.470206] ? security_file_permission+0x3b/0xc0
[ 647.470209] vfs_write+0xb5/0x1a0
[ 647.470213] SyS_write+0x55/0xc0
...
Please look at this patch for more details:
https://review.gerrithub.io/#/c/379247/
Do you know what causes the issue on the virtual machine?
--
Best Regards,
Piotr Pelpliński
3 years, 4 months
Unusual interface and implementation of SPDK network function for iSCSI target
by 松本周平 / MATSUMOTO,SHUUHEI
3 years, 4 months
Error Occurs Randomly when Loading BlobFS Super Block
by Fenggang Wu
Hi Guys,
I ran into two scenarios using rocksdb db_bench where a problem occurs
while BlobFS is loading the super block: specifically, a mismatching
super block version number.
1) Malloc Bdev Module
I used the config as follows:
[Malloc]
NumberOfLuns 1
LunSizeInMB 64
I didn't use a LunSizeInMB greater than 128 because a larger LunSize causes
spdk_dma_zmalloc() to fail in mkfs.
Even with mkfs successfully executed, db_bench reports an error as follows:
Could not load SPDK blobfs - check that SPDK mkfs was run against block
device Malloc0.
Tracing through the source code with gdb, I found the origin of this problem:
in blobstore.c:1612, _spdk_bs_load_super_cpl(), the loaded super block version
mismatches:
ctx->super->version != SPDK_BS_VERSION.
2) My own vbdev_agg module.
I am implementing my own vbdev_agg module, which aggregates several base
bdevs into one virtual bdev using striping. Currently, I am using two NVMe
SSDs (372 GB each) as the base devices.
My vbdev_agg can successfully complete the
examples/blob/hello_world/hello_blob.c test, where it writes a blob and
reads it back with matching content. Different start pages and lengths
were tested.
Interestingly, the problem happens *randomly*. Sometimes db_bench exits
successfully. Other times, as in the Malloc scenario, db_bench complains
that it cannot load BlobFS, which is also because of a mismatching
SPDK_BS_VERSION. After a successful mkfs, db_bench may fail several
times, then succeed several times, then fail again, and so on.
Then I got confused. In my vbdev_agg case, if the super block data were not
successfully flushed to the two NVMe SSDs during mkfs, or the super block
reading code were not correct, db_bench could not succeed even once. But if
the super block is successfully written and the read logic is correctly
implemented, how can the super block version mismatch?
I've already excluded a multi-thread race. When rocksdb initializes the
SPDK environment (BlobFS) using one dedicated thread, the rocksdb main
thread is blocked until the fs is ready, so only one thread is carrying
out the BlobFS loading task.
If it is not caused by multi-thread nondeterminism, then where does this
random behavior come from?
The Malloc0 scenario fails every time. I suspect that this is due to the
small LUN size (64MB), but I am not sure. As for the random behavior in
the vbdev_agg scenario, I have no clue. BTW, I also tried using one single
nvme bdev, and it succeeds each time.
Any ideas/thoughts will be helpful. Feel free to ask for any additional
information. I sincerely appreciate it.
Thanks!
Fenggang
3 years, 4 months
NVMf Target configuration issue
by Gyan Prakash
Hi all,
I see an issue with the NVMf target configuration. I also found the thread
[SPDK] NVMf target configuration issue (Thu Sep 7 01:29:33 PDT 2017), which
describes a similar issue to the one I am seeing. I tried the suggestion
from that thread, but it did not help with my problem.
Error Message:
# ./nvmf_tgt -t all -c nvmf.conf
Starting DPDK 17.02.0 initialization...
[ DPDK EAL parameters: nvmf -c fff --file-prefix=spdk_pid3911 ]
EAL: Detected 12 lcore(s)
EAL: No free hugepages reported in hugepages-1048576kB
EAL: Probing VFIO support...
Occupied cpu core mask is 0xfff
Occupied cpu socket mask is 0x1
EAL: PCI device 0000:00:04.0 on NUMA socket 0
EAL: probe driver: 8086:2f20 spdk_ioat
Found matching device at 0000:00:04.0 vendor:0x8086 device:0x2f20
EAL: PCI device 0000:00:04.1 on NUMA socket 0
EAL: probe driver: 8086:2f21 spdk_ioat
Found matching device at 0000:00:04.1 vendor:0x8086 device:0x2f21
EAL: PCI device 0000:00:04.2 on NUMA socket 0
EAL: probe driver: 8086:2f22 spdk_ioat
Found matching device at 0000:00:04.2 vendor:0x8086 device:0x2f22
EAL: PCI device 0000:00:04.3 on NUMA socket 0
EAL: probe driver: 8086:2f23 spdk_ioat
Found matching device at 0000:00:04.3 vendor:0x8086 device:0x2f23
EAL: PCI device 0000:00:04.4 on NUMA socket 0
EAL: probe driver: 8086:2f24 spdk_ioat
Found matching device at 0000:00:04.4 vendor:0x8086 device:0x2f24
EAL: PCI device 0000:00:04.5 on NUMA socket 0
EAL: probe driver: 8086:2f25 spdk_ioat
Found matching device at 0000:00:04.5 vendor:0x8086 device:0x2f25
EAL: PCI device 0000:00:04.6 on NUMA socket 0
EAL: probe driver: 8086:2f26 spdk_ioat
Found matching device at 0000:00:04.6 vendor:0x8086 device:0x2f26
EAL: PCI device 0000:00:04.7 on NUMA socket 0
EAL: probe driver: 8086:2f27 spdk_ioat
Found matching device at 0000:00:04.7 vendor:0x8086 device:0x2f27
Ioat Copy Engine Offload Enabled
EAL: PCI device 0000:01:00.0 on NUMA socket 0
EAL: probe driver: 8086:953 spdk_nvme
Probing device 0000:01:00.0
[nvme] nvme_ctrlr.c: 502:nvme_ctrlr_set_state: setting state to init (no
timeout)
[nvme] nvme_ctrlr.c:1135:nvme_ctrlr_process_init: CC.EN = 1
[nvme] nvme_ctrlr.c:1149:nvme_ctrlr_process_init: Setting CC.EN = 0
[nvme] nvme_ctrlr.c: 506:nvme_ctrlr_set_state: setting state to disable and
wait for CSTS.RDY = 0 (timeout 20000 ms)
[nvme] nvme_ctrlr.c:1206:nvme_ctrlr_process_init: CC.EN = 0 && CSTS.RDY = 0
- enabling controller
[nvme] nvme_ctrlr.c:1208:nvme_ctrlr_process_init: Setting CC.EN = 1
[nvme] nvme_ctrlr.c: 506:nvme_ctrlr_set_state: setting state to enable and
wait for CSTS.RDY = 1 (timeout 20000 ms)
[nvme] nvme_ctrlr.c:1217:nvme_ctrlr_process_init: CC.EN = 1 && CSTS.RDY = 1
- controller is ready
[nvme] nvme_ctrlr.c: 594:nvme_ctrlr_identify: transport max_xfer_size
2072576
[nvme] nvme_ctrlr.c: 598:nvme_ctrlr_identify: MDTS max_xfer_size 131072
[nvme] nvme_ctrlr.c: 673:nvme_ctrlr_set_keep_alive_timeout: Controller KAS
is 0 - not enabling Keep Alive
[nvme] nvme_ctrlr.c: 502:nvme_ctrlr_set_state: setting state to ready (no
timeout)
[debug] bdev.c: 932:spdk_bdev_register: Inserting bdev Nvme0n1 into list
Total cores available: 12
Reactor started on core 1 on socket 0
Reactor started on core 2 on socket 0
Reactor started on core 3 on socket 0
Reactor started on core 4 on socket 0
Reactor started on core 5 on socket 0
Reactor started on core 6 on socket 0
Reactor started on core 7 on socket 0
Reactor started on core 8 on socket 0
Reactor started on core 9 on socket 0
Reactor started on core 10 on socket 0
Reactor started on core 0 on socket 0
[nvmf] nvmf.c: 68:spdk_nvmf_tgt_init: Max Queues Per Session: 4
[nvmf] nvmf.c: 69:spdk_nvmf_tgt_init: Max Queue Depth: 128
[nvmf] nvmf.c: 70:spdk_nvmf_tgt_init: Max In Capsule Data: 4096 bytes
[nvmf] nvmf.c: 71:spdk_nvmf_tgt_init: Max I/O Size: 131072 bytes
*** RDMA Transport Init ***
Reactor started on core 11 on socket 0
allocated subsystem nqn.2014-08.org.nvmexpress.discovery on lcore 0 on
socket 0
allocated subsystem nqn.2016-06.io.spdk:cnode1 on lcore 5 on socket 0
[rdma] rdma.c:1177:spdk_nvmf_rdma_listen: For listen id 0x1720840 with
context 0x1720db0, created completion channel 0x1720ad0
conf.c: 555:spdk_nvmf_construct_subsystem: ***ERROR*** Subsystem
nqn.2016-06.io.spdk:cnode1: missing NVMe directive
[nvmf] subsystem.c: 231:spdk_nvmf_delete_subsystem: subsystem is 0x1720b20
nvmf_tgt.c: 313:spdk_nvmf_startup: ***ERROR*** spdk_nvmf_parse_conf()
failed
[nvmf] subsystem.c: 231:spdk_nvmf_delete_subsystem: subsystem is 0x17205b0
[nvme] nvme_ctrlr.c: 406:nvme_ctrlr_shutdown: shutdown complete
# ./setup.sh
0000:01:00.0 (8086 0953): nvme -> uio_pci_generic
0000:00:04.0 (8086 2f20): ioatdma -> uio_pci_generic
0000:00:04.1 (8086 2f21): ioatdma -> uio_pci_generic
0000:00:04.2 (8086 2f22): ioatdma -> uio_pci_generic
0000:00:04.3 (8086 2f23): ioatdma -> uio_pci_generic
0000:00:04.4 (8086 2f24): ioatdma -> uio_pci_generic
0000:00:04.5 (8086 2f25): ioatdma -> uio_pci_generic
0000:00:04.6 (8086 2f26): ioatdma -> uio_pci_generic
0000:00:04.7 (8086 2f27): ioatdma -> uio_pci_generic
# lspci | grep -i vola*
01:00.0 Non-Volatile memory controller: Intel Corporation PCIe Data Center
SSD (rev 01)
# lspci -v -s 0000:01:00.0
01:00.0 Non-Volatile memory controller: Intel Corporation PCIe Data Center
SSD (rev 01) (prog-if 02 [NVM Express])
Subsystem: Intel Corporation DC P3700 SSD
Physical Slot: 1
Flags: fast devsel, IRQ 24, NUMA node 0
Memory at fb410000 (64-bit, non-prefetchable) [size=16K]
Expansion ROM at fb400000 [disabled] [size=64K]
Capabilities: [40] Power Management version 3
Capabilities: [50] MSI-X: Enable- Count=32 Masked-
Capabilities: [60] Express Endpoint, MSI 00
Capabilities: [100] Advanced Error Reporting
Capabilities: [150] Virtual Channel
Capabilities: [180] Power Budgeting <?>
Capabilities: [190] Alternative Routing-ID Interpretation (ARI)
Capabilities: [270] Device Serial Number 55-cd-2e-41-4d-34-1e-1b
Capabilities: [2a0] #19
Kernel driver in use: uio_pci_generic
Kernel modules: nvme
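For reference, the "missing NVMe directive" error above appears to come from spdk_nvmf_construct_subsystem() rejecting a [Subsystem] section in nvmf.conf that names no backing NVMe device. A minimal sketch of what such a Direct-mode section might look like is below; apart from the NVMe keyword (which the error itself names) and the PCI address and lcore taken from the log above, the key names and the listen address are assumptions recalled from the 17.x-era example configs and should be checked against the local etc/spdk/nvmf.conf.in:
[Subsystem1]
  NQN nqn.2016-06.io.spdk:cnode1
  Core 5
  Mode Direct
  Listen RDMA 192.168.1.10:4420
  Host nqn.2016-06.io.spdk:init
  NVMe 0000:01:00.0
In Virtual mode the section would instead carry SN and Namespace lines (for example, "Namespace Nvme0n1") rather than an NVMe line.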
3 years, 4 months
Memory Issue With SPDK App
by karthi M
Hi All,
Whenever the SPDK app is started and stopped multiple times, I see the memory footprint increasing gradually after each start. I have seen this behaviour in VirtualBox: when I start the iSCSI app in a loop, after some time the entire system memory is exhausted and is not released even after the binary is killed.
I’m using the SPDK version : 17.07.1
My VM configuration:
VM memory : 4GB
CPU cores : 1
Guest OS : Fedora 25
huge pages - 2MB pages : 512 (1GB)
The script I used to reproduce this issue:
#!/usr/bin/env bash
cd /spdk/app/iscsi_tgt/
repeat=$1
for i in $(seq 1 "$repeat"); do
    # Start the target in the background and give it time to initialize.
    ./iscsi_tgt -c iscsi.conf &
    sleep 2
    # Find the target's PID (the [i] keeps grep from matching itself) and kill it hard.
    pid=$(ps -ef | grep '[i]scsi_tgt' | awk '{print $2}' | head -1)
    kill -9 "$pid"
done
Has anyone seen a similar issue with the latest SPDK version, or is this expected behaviour?
Please share your inputs.
Regards,
Karthi
3 years, 4 months
Re: [SPDK] Understanding io_channel
by Annan Chang 張安男
Hello Fenggang and all,
I am new to SPDK, and I want to write a vbdev that sits on top of two bdev devices.
I found on the mailing list that vbdev_agg.c written by Fenggang is a suitable example for me.
(Thank you so much, Fenggang.)
I got the code from
https://github.com/fgwu/spdk/tree/737ae30c99c80e1bc1d13e535b5a1d9a411ffa8c
and compiled it successfully.
I used the configuration below:
============================================
[AIO]
AIO /dev/ram0 AIO0
AIO /dev/ram1 AIO1
[Agg]
VBDev myraid AIO0 AIO1
……………………………………….
[TargetNode1]
……………………
LUN0 myraid
…………………..
==================================================
When I try to connect to the iSCSI target,
the program gets an error at conn.c:1212:
===================================================
bytes = spdk_sock_writev(conn->sock, iov, iovec_cnt);
if (bytes == -1) {
	if (errno == EWOULDBLOCK || errno == EAGAIN) {
		return 0;
	} else {
		perror("writev"); /* <-- the error is reported here */
		return -1;
	}
}
======================================================
I found that iov is NULL, so the value returned by spdk_sock_writev() into bytes is -1.
I have tried to figure out why the iov is wrong, modified the code, and run some experiments,
but the problem is still there.
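To narrow this down, a minimal debugging sketch might help; it assumes the iov and iovec_cnt variables visible at that point in conn.c, and the helper name below is hypothetical (not part of SPDK). Calling it right before spdk_sock_writev() would show whether the aggregated vbdev is handing over a NULL or zero-length buffer:
#include <stdio.h>
#include <sys/uio.h>

/* Hypothetical helper, not part of SPDK: print every iovec entry so a
 * NULL base or zero length coming from the aggregated vbdev shows up
 * right before writev() fails. */
static void
debug_dump_iovec(const struct iovec *iov, int iovec_cnt)
{
	int i;

	if (iov == NULL) {
		fprintf(stderr, "iov itself is NULL (iovec_cnt=%d)\n", iovec_cnt);
		return;
	}
	for (i = 0; i < iovec_cnt; i++) {
		fprintf(stderr, "iov[%d]: base=%p len=%zu\n",
			i, iov[i].iov_base, iov[i].iov_len);
	}
}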
Can anyone give me some hints or suggestions?
Any suggestions/hints will be appreciated. Thank you very much!
Regards,
AnNan
3 years, 4 months
Asynchronous and Sequential Loop Implementation in SPDK
by 松本周平 / MATSUMOTO,SHUUHEI
Hi Jim, Ziya, and All,
Thank you for your very helpful feedback.
What I want to do in the near future is to execute a loop asynchronously and sequentially, like lib/util/io_channel.c does.
But this is for a particular purpose and not so simple.
If it does not take too much of your time, please check briefly whether my understanding is on the right track.
# I'm sorry, I know I should have a better tool than the mailing list for asking you about this.
Thank you,
Shuhei Matsumoto
[current implementation]
void spdk_iscsi_acceptor_stop_all(void)
{
	struct spdk_iscsi_portal *p;

	TAILQ_FOREACH(p, &pg->portal_head, tailq)
		spdk_iscsi_acceptor_stop(p);
}

void spdk_iscsi_acceptor_stop(struct spdk_iscsi_portal *p)
{
	spdk_poller_unregister(&p->acceptor_poller, NULL);
}
[asynchronous implementation to detect completion of unregister]
# Pick up the first item in the portal list and pass it to
# spdk_iscsi_acceptor_stop() together with the callback function address cb_fn.
void spdk_iscsi_acceptor_stop_all(spdk_iscsi_acceptor_stop_all_complete cb_fn)
{
	struct spdk_iscsi_portal *p;

	p = TAILQ_FIRST(&pg->portal_head);
	spdk_iscsi_acceptor_stop(p, cb_fn);
}
# When spdk_poller_unregister() completes, the callback _spdk_iscsi_acceptor_stop_cb(p, cb_fn) will be called.
void spdk_iscsi_acceptor_stop(struct spdk_iscsi_portal *p, spdk_iscsi_acceptor_stop_all_complete cb_fn)
{
	struct spdk_event *event;

	event = spdk_event_allocate(p->acceptor_poller->lcore, _spdk_iscsi_acceptor_stop_cb, p, cb_fn);
	spdk_poller_unregister(&p->acceptor_poller, event);
}
# Get the next portal after the current one.
# If the next is NULL, call the callback function and return.
# If the next is not NULL, stop the next portal's acceptor, passing along the same callback.
void _spdk_iscsi_acceptor_stop_cb(void *arg1, void *arg2)
{
	struct spdk_iscsi_portal *p, *next;
	spdk_iscsi_acceptor_stop_all_complete cb_fn = (spdk_iscsi_acceptor_stop_all_complete)arg2;

	p = (struct spdk_iscsi_portal *)arg1;
	next = TAILQ_NEXT(p, tailq);
	if (next == NULL) {
		cb_fn();
	} else {
		spdk_iscsi_acceptor_stop(next, cb_fn);
	}
}
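For the sketch above to compile, the callback type it uses would also need a definition. The typedef below is hypothetical and simply mirrors the zero-argument cb_fn() call in the sketch:

typedef void (*spdk_iscsi_acceptor_stop_all_complete)(void);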
> -----Original Message-----
> From: 松本周平 / MATSUMOTO,SHUUHEI
> Sent: Friday, October 13, 2017 9:46 AM
> To: Storage Performance Development Kit
> Subject: RE: [SPDK] Two Questions/Ideas about SPDK Framework (NVMf/e and RPC)
>
> Hi Jim and Ziye,
>
> Thank you for your very helpful feedback.
> I did not understand bdev correctly yet, but thanks to you I could have clear view and will be
> able to look into code easier.
>
> About RPC, it's very good to know SPDK already have asynchronous RPC implementation. I would like
> to learn that.
>
> Thank you,
> Shuhei
>
>
>
>
> > -----Original Message-----
> > From: SPDK [mailto:spdk-bounces@lists.01.org] On Behalf Of Harris,
> > James R
> > Sent: Friday, October 13, 2017 9:20 AM
> > To: Storage Performance Development Kit
> > Subject: [!]Re: [SPDK] Two Questions/Ideas about SPDK Framework
> > (NVMf/e and RPC)
> >
> >
> > > On Oct 12, 2017, at 4:59 PM, 松本周平 / MATSUMOTO,SHUUHEI
> > > <shuhei.matsumoto.xt(a)hitachi.com>
> > wrote:
> > >
> > > Hi,
> >
> > Hi Shuhei,
> >
> > >
> > > These are very unshaped questions/ideas yet because I have not a few
> > > things to do for iSCSI/SCSI
> > now, but I would like to hear your thought at this stage.
> > >
> > > 1)
> > > If NVMe_backend_driver and NVMf_target_driver run on the same CPU,
> > > we may be able to do the end-to-end run-to-completion model and may get some benefit from locality.
> > > (https://github.com/stanford-mast/reflex is very interesting for
> > > me.)
> > >
> > > As long as I understand, we should locate NVMe_backend_driver and
> > > NVMf_target_driver on the
> > different CPU, respectively.
> > > NVMf_target_driver is in the SPDK poller framework but NVMe_backend_driver is not.
> >
> > There are actually two NVMe “drivers” in SPDK.
> >
> > The primary one is the core NVMe driver found in lib/nvme. This
> > driver has no dependencies on the SPDK application framework and so
> > does not create any kind of poller. This library is passive, requiring the user to call
> spdk_nvme_qpair_process_completions() to check for any completed I/O.
> >
> > But there is also a bdev NVMe driver found in lib/bdev/nvme. bdev is
> > the abbreviated name for the SPDK block device layer. This driver
> > does have dependencies on the SPDK application framework and will
> > create pollers that call spdk_nvme_qpair_process_completions(). When
> > upper level protocols such as an NVMe-oF or iSCSI target create an I/O
> > channel to send I/O to an NVMe block device, this will trigger the
> > bdev NVMe driver to both allocate an NVMe queue pair and a poller for the completion queue. See
> bdev_nvme_create_cb() and bdev_nvme_poll() in lib/bdev/nvme/bdev_nvme.c.
> >
> > >
> > > Is it reasonable to put some functions of NVMe_backend_driver into the SPDK poller framework?
> > > Or when we do iSCSI target based on DPDK in future, we put some
> > > functions of NVMe_backend_driver
> > into the DPDK event driven framework?
> > >
> > >
> > > 2)
> > > Currently it is difficult for the RPC handler to use semaphore in
> > > the middle, hence I have proposed
> > one idea (https://review.gerrithub.io/#/c/379941/) as a workaround.
> > > Related with this, as Jim taught me, VHOST-SCSI thread is outside of
> > > SPDK and can use semaphore
> > to do synchronous operation.
> > >
> > > To support complex operation in RPC, I think there are at least three approaches:
> > > a) https://review.gerrithub.io/#/c/379941/
> > > b) support asynchronous RPC reply by using callback or event.
> > > c) RPC handler runs outside of SPDK threads and communicates with SPDK threads through IPC.
> > >
> > > I would like to propose b) if this looks reasonable.
> >
> > Yes - (b) is very reasonable. We have some cases of this already.
> > For example, most logical volume operations are asynchronous since
> > they require disk I/O. You can see lib/bdev/lvol/vbdev_lvol_rpc.c for some examples of how to
> implement an asynchronous RPC.
> >
> > >
> > >
> > > I’m afraid my explanation is not enough and thank you for your patience in advance.
> > >
> > > Thank you,
> > > Shuhei Matsumoto
> >
3 years, 4 months