We have encountered an issue with some Mellanox cards where rdma_bind_addr succeeds but
the ib_verbs pointer (the verbs field of struct rdma_cm_id) is NULL, which causes SPDK to
crash when it attempts to use this port. The cause appears to be an invalid GUID of 0 (a
procedure to re-flash the GUID is given below).
I don't think this should cause SPDK to crash, so I have submitted a patch for review that
checks that the verbs pointer is not NULL after binding:
Hope this helps,
It seems that Mellanox has a "blank_guid" option in its mstflint flash interface, so some
manufacturers may ship RNICs with a base GUID of 0. This can be fixed by using the same
tool to flash a new GUID; we generate it from the card's base MAC address.
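Before re-flashing, it may help to find out which ports are affected. The following is a minimal sketch, assuming the Linux sysfs layout /sys/class/infiniband/<device>/node_guid, where the GUID is printed as four colon-separated 16-bit groups. The function name find_zero_guid_devices is hypothetical and is not part of any Mellanox tool.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every RDMA device whose node GUID
# is all zeros. Assumes sysfs exposes /sys/class/infiniband/<dev>/node_guid
# in the form xxxx:xxxx:xxxx:xxxx. An alternate root can be passed as $1.
find_zero_guid_devices() {
    sysfs_root=${1:-/sys/class/infiniband}
    for guid_file in "$sysfs_root"/*/node_guid; do
        # If the glob matched nothing, the pattern comes back literally;
        # skip anything that is not a readable file.
        [ -r "$guid_file" ] || continue
        guid=$(cat "$guid_file")
        # Drop the colons, then check whether any non-zero digit remains.
        case $(printf '%s' "$guid" | tr -d ':') in
            *[!0]*) ;;                                  # valid, non-zero GUID
            *) basename "$(dirname "$guid_file")" ;;   # blank GUID: report it
        esac
    done
}
```

Running find_zero_guid_devices with no argument scans the real sysfs tree and prints one device name per line for each port that would need a new GUID.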
#Here is an example of such an RNIC with 0 as a base GUID:
[root@kblock01-knode05 ~]# ibv_devices
device node GUID
#First, generate the GUID from the MAC:
BASE_MAC=$(mstflint --device=mlx5_0 query | grep "^Base MAC" | tr -s " " | cut -d" " -f 3)
BASE_GUID=$(echo $BASE_MAC | cut -c 1-6)"0300"$(echo $BASE_MAC | cut -c 7-12)
#Now I flash the new GUID:
mstflint --device=mlx5_0 --guid=$BASE_GUID --override_cache_replacement --nofs sg
#The new GUID does not take effect until a firmware reset or power cycle:
mlxfwreset --device=/dev/mst/mt4117_pciconf0 --yes reset
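The BASE_GUID line above can also be wrapped in a small function that rejects malformed or all-zero MACs before anything is flashed. This is only a sketch: mac_to_guid is a hypothetical helper. It assumes mstflint reports the base MAC as 12 hex digits without separators, as the cut -c ranges in the procedure imply, and it inserts "0300" between the two halves exactly as the procedure does.

```shell
#!/bin/sh
# Hypothetical helper: derive a base GUID from a base MAC by inserting
# "0300" between the first 6 and the last 6 hex digits.
mac_to_guid() {
    # Tolerate ':' or '-' separators and normalise to lowercase hex.
    mac=$(printf '%s' "$1" | tr -d ':-' | tr 'A-F' 'a-f')
    case $mac in
        *[!0-9a-f]*) echo "invalid MAC: $1" >&2; return 1 ;;
    esac
    if [ ${#mac} -ne 12 ]; then
        echo "invalid MAC length: $1" >&2
        return 1
    fi
    # An all-zero MAC would just reproduce the blank-GUID problem.
    if [ "$mac" = "000000000000" ]; then
        echo "refusing to derive a GUID from an all-zero MAC" >&2
        return 1
    fi
    printf '%s0300%s\n' "$(printf '%s' "$mac" | cut -c 1-6)" \
                        "$(printf '%s' "$mac" | cut -c 7-12)"
}
```

In the procedure, BASE_GUID=$(mac_to_guid "$BASE_MAC") || exit 1 would then stop before the mstflint sg step if the queried MAC looks wrong.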