[PATCH net 0/9] mptcp: Fixes for v5.12
by Mat Martineau
These patches from the MPTCP tree fix a few multipath TCP issues:
Patches 1 and 5 clear some stale pointers when subflows close.
Patches 2, 4, and 9 plug some memory leaks.
Patch 3 fixes a memory accounting error identified by syzkaller.
Patches 6 and 7 fix a race condition that slowed data transmission.
Patch 8 adds missing wakeups when write buffer space is freed.
Florian Westphal (4):
mptcp: reset last_snd on subflow close
mptcp: put subflow sock on connect error
mptcp: dispose initial struct socket when its subflow is closed
mptcp: reset 'first' and ack_hint on subflow close
Geliang Tang (1):
mptcp: free resources when the port number is mismatched
Paolo Abeni (4):
mptcp: fix memory accounting on allocation error
mptcp: factor out __mptcp_retrans helper()
mptcp: fix race in release_cb
mptcp: fix missing wakeup
net/mptcp/protocol.c | 165 +++++++++++++++++++++++++++----------------
net/mptcp/subflow.c | 14 ++--
2 files changed, 112 insertions(+), 67 deletions(-)
base-commit: a9ecb0cbf03746b17a7c13bd8e3464e6789f73e8
--
2.30.1
1 year, 5 months
[PATCH net] mptcp: fix length of ADD_ADDR with port suboption
by Davide Caratti
in current Linux, MPTCP peers advertising endpoints with port numbers use
a sub-option length that wrongly accounts for the trailing TCP NOP. Also,
receivers will only process incoming ADD_ADDR with port having such wrong
sub-option length. Fix this, making ADD_ADDR compliant to RFC8684 §3.4.1.
this can be verified running tcpdump on the kselftests artifacts:
unpatched kernel:
[root@bottarga mptcp]# tcpdump -tnnr unpatched.pcap | grep add-addr
reading from file mp_join-01-ns1-0-CnJXtE.pcap, link-type LINUX_SLL (Linux cooked v1), snapshot length 65535
IP 10.0.1.1.10000 > 10.0.1.2.53078: Flags [.], ack 101, win 509, options [nop,nop,TS val 214459678 ecr 521312851,mptcp add-addr v1 id 1 a00:201:2774:2d88:7436:85c3:17fd:101], length 0
IP 10.0.1.2.53078 > 10.0.1.1.10000: Flags [.], ack 101, win 502, options [nop,nop,TS val 521312852 ecr 214459678,mptcp add-addr[bad opt]]
patched kernel:
[root@bottarga mptcp]# tcpdump -tnnr patched.pcap | grep add-addr
reading from file mp_join-01-ns1-0-pVuVxp.pcap, link-type LINUX_SLL (Linux cooked v1), snapshot length 65535
IP 10.0.1.1.10000 > 10.0.1.2.38178: Flags [.], ack 101, win 509, options [nop,nop,TS val 3728873902 ecr 2732713192,mptcp add-addr v1 id 1 10.0.2.1:10100 hmac 0xbccdfcbe59292a1f,nop,nop], length 0
IP 10.0.1.2.38178 > 10.0.1.1.10000: Flags [.], ack 101, win 502, options [nop,nop,TS val 2732713195 ecr 3728873902,mptcp add-addr v1-echo id 1 10.0.2.1:10100,nop,nop], length 0
Fixes: 22fb85ffaefb ("mptcp: add port support for ADD_ADDR suboption writing")
CC: stable(a)vger.kernel.org # 5.11+
Signed-off-by: Davide Caratti <dcaratti(a)redhat.com>
---
net/mptcp/options.c | 2 ++
net/mptcp/protocol.h | 14 ++++++++------
2 files changed, 10 insertions(+), 6 deletions(-)
diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index 444a38681e93..c8a7270dc61c 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -1196,6 +1196,8 @@ void mptcp_write_options(__be32 *ptr, const struct tcp_sock *tp,
ptr += 2;
}
} else {
+ /* account for 2 trailing 'nop' options */
+ len += TCPOLEN_MPTCP_PORT_ALIGN;
if (opts->ahmac) {
u8 *bptr = (u8 *)ptr;
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 91827d949766..e21a5bc36cf0 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -52,14 +52,15 @@
#define TCPOLEN_MPTCP_DSS_MAP64 14
#define TCPOLEN_MPTCP_DSS_CHECKSUM 2
#define TCPOLEN_MPTCP_ADD_ADDR 16
-#define TCPOLEN_MPTCP_ADD_ADDR_PORT 20
+#define TCPOLEN_MPTCP_ADD_ADDR_PORT 18
#define TCPOLEN_MPTCP_ADD_ADDR_BASE 8
-#define TCPOLEN_MPTCP_ADD_ADDR_BASE_PORT 12
+#define TCPOLEN_MPTCP_ADD_ADDR_BASE_PORT 10
#define TCPOLEN_MPTCP_ADD_ADDR6 28
-#define TCPOLEN_MPTCP_ADD_ADDR6_PORT 32
+#define TCPOLEN_MPTCP_ADD_ADDR6_PORT 30
#define TCPOLEN_MPTCP_ADD_ADDR6_BASE 20
-#define TCPOLEN_MPTCP_ADD_ADDR6_BASE_PORT 24
-#define TCPOLEN_MPTCP_PORT_LEN 4
+#define TCPOLEN_MPTCP_ADD_ADDR6_BASE_PORT 22
+#define TCPOLEN_MPTCP_PORT_LEN 2
+#define TCPOLEN_MPTCP_PORT_ALIGN 2
#define TCPOLEN_MPTCP_RM_ADDR_BASE 4
#define TCPOLEN_MPTCP_PRIO 3
#define TCPOLEN_MPTCP_PRIO_ALIGN 4
@@ -701,8 +702,9 @@ static inline unsigned int mptcp_add_addr_len(int family, bool echo, bool port)
len = TCPOLEN_MPTCP_ADD_ADDR6_BASE;
if (!echo)
len += MPTCPOPT_THMAC_LEN;
+ /* account for 2 trailing 'nop' options */
if (port)
- len += TCPOLEN_MPTCP_PORT_LEN;
+ len += TCPOLEN_MPTCP_PORT_LEN + TCPOLEN_MPTCP_PORT_ALIGN;
return len;
}
--
2.29.2
1 year, 5 months
[MPTCP][PATCH v5 mptcp-next 0/5] remove id 0 address
by Geliang Tang
v5:
- add patch 1 "avoid passing rm_list as a struct".
- re-use MPTCP_MIB_RMADDR/MPTCP_MIB_RMSUBFLOW in patch 3.
- use (id == 0) instead of (!id) in patch 4.
- tag: export/20210226T185626.
v4:
- add new function mptcp_nl_remove_id_zero_address.
- add new testcases.
v3:
- print subflow ids in patch 1;
- add the initial subflow's ids check in patch 2;
- update the test in patch 3;
v2:
- drop __mptcp_wr_shutdown in v1.
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/140
Geliang Tang (5):
mptcp: avoid passing rm_list as a struct
mptcp: remove all subflows involving id 0 address
mptcp: unify RM_ADDR and RM_SUBFLOW receiving
mptcp: remove id 0 address
selftests: mptcp: remove id 0 address testcases
net/mptcp/options.c | 4 +-
net/mptcp/pm.c | 23 ++--
net/mptcp/pm_netlink.c | 128 ++++++++++--------
net/mptcp/protocol.h | 16 ++-
.../testing/selftests/net/mptcp/mptcp_join.sh | 29 +++-
5 files changed, 122 insertions(+), 78 deletions(-)
--
2.29.2
1 year, 5 months
[MPTCP][PATCH v3 mptcp-next 0/3] remove id 0 address
by Geliang Tang
v3:
- print subflow ids in patch 1;
- add the initial subflow's ids check in patch 2;
- update the test in patch 3;
- tag: export/20210226T185626.
v2:
- drop __mptcp_wr_shutdown in v1.
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/140
Geliang Tang (3):
mptcp: unify RM_ADDR and RM_SUBFLOW receiving
mptcp: check the removing initial subflow
DO-NOT-MERGE: mptcp: remove id 0 test
net/mptcp/pm_netlink.c | 99 +++++++++----------
.../testing/selftests/net/mptcp/mptcp_join.sh | 10 +-
2 files changed, 52 insertions(+), 57 deletions(-)
--
2.29.2
1 year, 5 months
[RFC v2 PATCH 0/2] mptcp: fix deadlock in mptcp{,6}_release
by Paolo Abeni
This address the issue the easy way: removing the bugged features
(a lot of setsockopt including all mcast related).
As suggested by Mat(s) this switch to a whitelist. Assembling the
whilelist itself is quite complex/tedious/errorprone so still RFC.
Please have a look at 1/2 for the details
Paolo Abeni (2):
mptcp: only admit explicitly supported sockopt
mptcp: revert "mptcp: provide subflow aware release function"$ ....$
net/mptcp/protocol.c | 270 ++++++++++++++++++++++++++++++++++---------
1 file changed, 217 insertions(+), 53 deletions(-)
--
2.26.2
1 year, 5 months
[MPTCP][PATCH v4 mptcp-next 0/4] remove id 0 address
by Geliang Tang
v4:
- add new function mptcp_nl_remove_id_zero_address.
- add new testcases.
- tag: export/20210226T185626.
v3:
- print subflow ids in patch 1;
- add the initial subflow's ids check in patch 2;
- update the test in patch 3;
- tag: export/20210226T185626.
v2:
- drop __mptcp_wr_shutdown in v1.
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/140
Geliang Tang (4):
mptcp: remove all subflows involving id 0 address
mptcp: unify RM_ADDR and RM_SUBFLOW receiving
mptcp: remove id zero address
selftests: mptcp: mptcp: remove id 0 address testcases
net/mptcp/pm_netlink.c | 120 ++++++++++--------
.../testing/selftests/net/mptcp/mptcp_join.sh | 29 ++++-
2 files changed, 96 insertions(+), 53 deletions(-)
--
2.29.2
1 year, 5 months
[RFC PATCH 0/2] mptcp: fix deadlock in mptcp{,6}_release
by Paolo Abeni
This address the issue the easy way: removing the bugged feature
(mcast support :).
AFAICS the mcast related sockopt manipulate a global state, so there
could be existing working application doing setsockopt(mcast opt) on
TCP sockets. This is why I did not disable mcast at the inet level
for stream sockets.
Still I'm wondering if we should take the opposite approach: explicitly
forbitting all setsockopt except the one we implement in
net/mptcp/protocol.c
Additionally this possibly overlap with wider task - a more
comprehensive sockopt support. Ence the RFC tag.
Side note: syzbot confirm this fixes issues/170
Paolo Abeni (2):
mptcp: forbit mcast-related sockopt on MPTCP sockets
mptcp: revert "mptcp: provide subflow aware release function"$ ....$
net/mptcp/protocol.c | 100 ++++++++++++++++++++-----------------------
1 file changed, 47 insertions(+), 53 deletions(-)
--
2.26.2
1 year, 5 months
Re: possible deadlock in ipv6_sock_mc_close
by Matthieu Baerts
Hi Chuck,
(+ cc: MPTCP list)
On 01/03/2021 15:52, Chuck Lever wrote:
>> On Mar 1, 2021, at 8:49 AM, syzbot <syzbot+e2fa57709a385e6db10f(a)syzkaller.appspotmail.com> wrote:
(...)
>> syzbot found the following issue on:
(...)
> Hi, thanks for the report.
>
> Initial analysis:
>
> c8e88e3aa738 ("NFSD: Replace READ* macros in nfsd4_decode_layoutget()"
> changes code several layers above the network layer. In addition,
> neither of the stack traces contain NFSD functions. And, repro.c does
> not appear to exercise any filesystem code.
>
> Therefore the bisect result looks implausible to me. I don't see any
> obvious connection between the lockdep splat and c8e88e3aa738. (If
> someone else does, please let me know where to look).
From what I can see from the bisect logs, it looks like this issue is
difficult to reproduce. But it really looks like the issue is on MPTCP
side and not directly related to your patch.
Thanks to the syzbot team for reporting this!
It doesn't look very simple to fix but we are tracking this on our side:
https://github.com/multipath-tcp/mptcp_net-next/issues/170
Cheers,
Matt
--
Tessares | Belgium | Hybrid Access Solutions
www.tessares.net
1 year, 5 months
Fwd: possible deadlock in ipv6_sock_mc_close
by Matthieu Baerts
Hello,
This report from syzbot mentioned an possible deadlock with MPTCP
socket. I don't know if the commit from nfsd is really causing that.
I'm then sharing this with the list just in case someone saw similar
issues :)
Cheers,
Matt
-------- Forwarded Message --------
Subject: possible deadlock in ipv6_sock_mc_close
Date: Mon, 01 Mar 2021 05:49:14 -0800
From: syzbot <syzbot+e2fa57709a385e6db10f(a)syzkaller.appspotmail.com>
To: bfields(a)fieldses.org, chuck.lever(a)oracle.com, davem(a)davemloft.net,
dsahern(a)kernel.org, kuba(a)kernel.org, linux-kernel(a)vger.kernel.org,
linux-nfs(a)vger.kernel.org, netdev(a)vger.kernel.org,
syzkaller-bugs(a)googlegroups.com, yoshfuji(a)linux-ipv6.org
Hello,
syzbot found the following issue on:
HEAD commit: eee7ede6 Merge branch 'bnxt_en-error-recovery-bug-fixes'
git tree: net
console output: https://syzkaller.appspot.com/x/log.txt?x=123ad632d00000
kernel config: https://syzkaller.appspot.com/x/.config?x=e2d5ba72abae4f14
dashboard link: https://syzkaller.appspot.com/bug?extid=e2fa57709a385e6db10f
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=109d89b6d00000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=12e9e0dad00000
The issue was bisected to:
commit c8e88e3aa73889421461f878cd569ef84f231ceb
Author: Chuck Lever <chuck.lever(a)oracle.com>
Date: Tue Nov 3 20:06:04 2020 +0000
NFSD: Replace READ* macros in nfsd4_decode_layoutget()
bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=13bef9ccd00000
final oops: https://syzkaller.appspot.com/x/report.txt?x=107ef9ccd00000
console output: https://syzkaller.appspot.com/x/log.txt?x=17bef9ccd00000
IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+e2fa57709a385e6db10f(a)syzkaller.appspotmail.com
Fixes: c8e88e3aa738 ("NFSD: Replace READ* macros in
nfsd4_decode_layoutget()")
======================================================
WARNING: possible circular locking dependency detected
5.11.0-syzkaller #0 Not tainted
------------------------------------------------------
syz-executor905/8822 is trying to acquire lock:
ffffffff8d678fe8 (rtnl_mutex){+.+.}-{3:3}, at:
ipv6_sock_mc_close+0xd7/0x110 net/ipv6/mcast.c:323
but task is already holding lock:
ffff888024390120 (sk_lock-AF_INET6){+.+.}-{0:0}, at: lock_sock
include/net/sock.h:1600 [inline]
ffff888024390120 (sk_lock-AF_INET6){+.+.}-{0:0}, at:
mptcp6_release+0x57/0x130 net/mptcp/protocol.c:3507
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (sk_lock-AF_INET6){+.+.}-{0:0}:
lock_sock_nested+0xca/0x120 net/core/sock.c:3071
lock_sock include/net/sock.h:1600 [inline]
gtp_encap_enable_socket+0x277/0x4a0 drivers/net/gtp.c:824
gtp_encap_enable drivers/net/gtp.c:855 [inline]
gtp_newlink+0x2b3/0xc60 drivers/net/gtp.c:683
__rtnl_newlink+0x1059/0x1710 net/core/rtnetlink.c:3443
rtnl_newlink+0x64/0xa0 net/core/rtnetlink.c:3491
rtnetlink_rcv_msg+0x44e/0xad0 net/core/rtnetlink.c:5553
netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2502
netlink_unicast_kernel net/netlink/af_netlink.c:1312 [inline]
netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1338
netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1927
sock_sendmsg_nosec net/socket.c:654 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:674
____sys_sendmsg+0x6e8/0x810 net/socket.c:2350
___sys_sendmsg+0xf3/0x170 net/socket.c:2404
__sys_sendmsg+0xe5/0x1b0 net/socket.c:2437
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xae
-> #0 (rtnl_mutex){+.+.}-{3:3}:
check_prev_add kernel/locking/lockdep.c:2936 [inline]
check_prevs_add kernel/locking/lockdep.c:3059 [inline]
validate_chain kernel/locking/lockdep.c:3674 [inline]
__lock_acquire+0x2b14/0x54c0 kernel/locking/lockdep.c:4900
lock_acquire kernel/locking/lockdep.c:5510 [inline]
lock_acquire+0x1ab/0x730 kernel/locking/lockdep.c:5475
__mutex_lock_common kernel/locking/mutex.c:946 [inline]
__mutex_lock+0x139/0x1120 kernel/locking/mutex.c:1093
ipv6_sock_mc_close+0xd7/0x110 net/ipv6/mcast.c:323
mptcp6_release+0xb9/0x130 net/mptcp/protocol.c:3515
__sock_release+0xcd/0x280 net/socket.c:599
sock_close+0x18/0x20 net/socket.c:1258
__fput+0x288/0x920 fs/file_table.c:280
task_work_run+0xdd/0x1a0 kernel/task_work.c:140
tracehook_notify_resume include/linux/tracehook.h:189 [inline]
exit_to_user_mode_loop kernel/entry/common.c:174 [inline]
exit_to_user_mode_prepare+0x249/0x250 kernel/entry/common.c:208
__syscall_exit_to_user_mode_work kernel/entry/common.c:290 [inline]
syscall_exit_to_user_mode+0x19/0x50 kernel/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x44/0xae
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(sk_lock-AF_INET6);
lock(rtnl_mutex);
lock(sk_lock-AF_INET6);
lock(rtnl_mutex);
*** DEADLOCK ***
2 locks held by syz-executor905/8822:
#0: ffff888033080750 (&sb->s_type->i_mutex_key#13){+.+.}-{3:3}, at:
inode_lock include/linux/fs.h:775 [inline]
#0: ffff888033080750 (&sb->s_type->i_mutex_key#13){+.+.}-{3:3}, at:
__sock_release+0x86/0x280 net/socket.c:598
#1: ffff888024390120 (sk_lock-AF_INET6){+.+.}-{0:0}, at: lock_sock
include/net/sock.h:1600 [inline]
#1: ffff888024390120 (sk_lock-AF_INET6){+.+.}-{0:0}, at:
mptcp6_release+0x57/0x130 net/mptcp/protocol.c:3507
stack backtrace:
CPU: 1 PID: 8822 Comm: syz-executor905 Not tainted 5.11.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:79 [inline]
dump_stack+0xfa/0x151 lib/dump_stack.c:120
check_noncircular+0x25f/0x2e0 kernel/locking/lockdep.c:2127
check_prev_add kernel/locking/lockdep.c:2936 [inline]
check_prevs_add kernel/locking/lockdep.c:3059 [inline]
validate_chain kernel/locking/lockdep.c:3674 [inline]
__lock_acquire+0x2b14/0x54c0 kernel/locking/lockdep.c:4900
lock_acquire kernel/locking/lockdep.c:5510 [inline]
lock_acquire+0x1ab/0x730 kernel/locking/lockdep.c:5475
__mutex_lock_common kernel/locking/mutex.c:946 [inline]
__mutex_lock+0x139/0x1120 kernel/locking/mutex.c:1093
ipv6_sock_mc_close+0xd7/0x110 net/ipv6/mcast.c:323
mptcp6_release+0xb9/0x130 net/mptcp/protocol.c:3515
__sock_release+0xcd/0x280 net/socket.c:599
sock_close+0x18/0x20 net/socket.c:1258
__fput+0x288/0x920 fs/file_table.c:280
task_work_run+0xdd/0x1a0 kernel/task_work.c:140
tracehook_notify_resume include/linux/tracehook.h:189 [inline]
exit_to_user_mode_loop kernel/entry/common.c:174 [inline]
exit_to_user_mode_prepare+0x249/0x250 kernel/entry/common.c:208
__syscall_exit_to_user_mode_work kernel/entry/common.c:290 [inline]
syscall_exit_to_user_mode+0x19/0x50 kernel/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x405b73
Code: c7 c2 c0 ff ff ff f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f
00 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 03 00 00 00 0f 05 <48> 3d 00
f0 ff ff 77 45 c3 0f 1f 40 00 48 83 ec 18 89 7c 24 0c e8
RSP: 002b:00007ffdbac4d208 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
RAX: 0000000000000000 RBX: 0000000000000006 RCX: 0000000000405b73
RDX: 000000000000002a RSI: 0000000000000029 RDI: 0000000000000005
RBP: 0000000000000000 R08: 0000000000000088 R09: 0000000000f0b5ff
R10: 00000000200001c0 R11: 0000000000000246 R12: 0000000000010bda
R13: 00007ffdbac4d230 R14: 00007ffdbac4d220 R15: 00007ffdbac4d214
---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller(a)googlegroups.com.
syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
For information about bisection process see: https://goo.gl/tpsmEJ#bisection
syzbot can test patches for this issue, for details see:
https://goo.gl/tpsmEJ#testing-patches
1 year, 5 months