[RFC PATCH 0/5] mptcp: cope better with mp_join storm
by Paolo Abeni
This series it amied at fixing:
https://github.com/multipath-tcp/mptcp_net-next/issues/33
patch 3/5 do the main work, while patch 4 and 5 should
reduce the server load avoid creating unneeded children.
Patch 1 and 2 are somewhat related cleanup.
Patch 2 && 5 are likely the most controversial. Any
feedback more than welcome!
Paolo Abeni (5):
mptcp: cleanup subflow_finish_connect()
subflow: explicitly check for plain tcp rsk
subflow: use rsk_ops->send_reset()
subflow: introduce and use mptcp_can_accept_new_subflow()
subflow: do not create child subflow for fallback MP_JOIN
net/mptcp/subflow.c | 76 +++++++++++++++++++++++++--------------------
1 file changed, 42 insertions(+), 34 deletions(-)
--
2.26.2
9 months
WARNING in warn_bad_map
by syzbot
Hello,
syzbot found the following crash on:
HEAD commit: 7ae77150 Merge tag 'powerpc-5.8-1' of git://git.kernel.org..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=11124521100000
kernel config: https://syzkaller.appspot.com/x/.config?x=d195fe572fb15312
dashboard link: https://syzkaller.appspot.com/bug?extid=42a07faa5923cfaeb9c9
compiler: gcc (GCC) 9.0.0 20181231 (experimental)
Unfortunately, I don't have any reproducer for this crash yet.
IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+42a07faa5923cfaeb9c9(a)syzkaller.appspotmail.com
------------[ cut here ]------------
Bad mapping: ssn=2478501 map_seq=0 map_data_len=32748
WARNING: CPU: 0 PID: 20706 at net/mptcp/subflow.c:581 warn_bad_map.isra.0.part.0+0x7d/0xb0 net/mptcp/subflow.c:581
Kernel panic - not syncing: panic_on_warn set ...
CPU: 0 PID: 20706 Comm: syz-executor.2 Not tainted 5.7.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x188/0x20d lib/dump_stack.c:118
panic+0x2e3/0x75c kernel/panic.c:221
__warn.cold+0x2f/0x35 kernel/panic.c:582
report_bug+0x27b/0x2f0 lib/bug.c:195
fixup_bug arch/x86/kernel/traps.c:105 [inline]
fixup_bug arch/x86/kernel/traps.c:100 [inline]
do_error_trap+0x12b/0x220 arch/x86/kernel/traps.c:197
do_invalid_op+0x32/0x40 arch/x86/kernel/traps.c:216
invalid_op+0x23/0x30 arch/x86/entry/entry_64.S:1027
RIP: 0010:warn_bad_map.isra.0.part.0+0x7d/0xb0 net/mptcp/subflow.c:581
Code: 48 c1 ea 03 0f b6 14 02 48 89 d8 83 e0 07 83 c0 03 38 d0 7c 04 84 d2 75 1c 8b 13 44 89 e6 48 c7 c7 20 5c fe 88 e8 4b 2e 6b f9 <0f> 0b 48 83 c4 08 5b 5d 41 5c c3 48 89 df 89 4c 24 04 e8 7c 3f d9
RSP: 0018:ffffc90006f2f420 EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff88809e4fe43c RCX: 0000000000000000
RDX: 0000000000040000 RSI: ffffffff815d5ba7 RDI: fffff52000de5e76
RBP: ffff88809e4fe444 R08: ffff8880850aa580 R09: 0000000000000001
R10: ffffffff8c347a27 R11: fffffbfff1868f44 R12: 000000000025d1a5
R13: ffff88809e4fe400 R14: ffff88809e4fe444 R15: ffff88809e4fe43c
warn_bad_map net/mptcp/subflow.c:613 [inline]
validate_mapping net/mptcp/subflow.c:613 [inline]
get_mapping_status net/mptcp/subflow.c:728 [inline]
subflow_check_data_avail net/mptcp/subflow.c:766 [inline]
mptcp_subflow_data_available+0x145b/0x1aa0 net/mptcp/subflow.c:862
subflow_data_ready+0x10b/0x170 net/mptcp/subflow.c:903
tcp_data_ready+0xe8/0x230 net/ipv4/tcp_input.c:4776
tcp_data_queue+0x1161/0x4760 net/ipv4/tcp_input.c:4842
tcp_rcv_established+0x905/0x1d90 net/ipv4/tcp_input.c:5735
tcp_v4_do_rcv+0x605/0x8b0 net/ipv4/tcp_ipv4.c:1629
sk_backlog_rcv include/net/sock.h:996 [inline]
__release_sock+0x134/0x3a0 net/core/sock.c:2548
release_sock+0x54/0x1b0 net/core/sock.c:3064
mptcp_sendmsg+0x11f9/0x17f0 net/mptcp/protocol.c:872
inet_sendmsg+0x99/0xe0 net/ipv4/af_inet.c:814
sock_sendmsg_nosec net/socket.c:652 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:672
sock_write_iter+0x288/0x3c0 net/socket.c:999
call_write_iter include/linux/fs.h:1917 [inline]
do_iter_readv_writev+0x51e/0x790 fs/read_write.c:694
do_iter_write fs/read_write.c:999 [inline]
do_iter_write+0x18b/0x600 fs/read_write.c:980
vfs_writev+0x1b3/0x2f0 fs/read_write.c:1072
do_writev+0x27f/0x300 fs/read_write.c:1115
do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:295
entry_SYSCALL_64_after_hwframe+0x49/0xb3
RIP: 0033:0x45ca59
Code: 0d b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 db b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f5887229c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000014
RAX: ffffffffffffffda RBX: 000000000050e320 RCX: 000000000045ca59
RDX: 0000000000000001 RSI: 0000000020000200 RDI: 0000000000000004
RBP: 000000000078bf00 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 0000000000000d43 R14: 00000000004cb67f R15: 00007f588722a6d4
Kernel Offset: disabled
Rebooting in 86400 seconds..
---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller(a)googlegroups.com.
syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
9 months
[PATCH v2 0/4] mptcp: msk diag support
by Paolo Abeni
This introduces basic mptcp sockets diag support.
As IPPROTO_MPTCP excedes 8 bits, we need some changes at the inet_diag level:
a new attribute is introduced to allow user-space providing u32 protocol
values.
Patch 2 introduces new token APIs to allow traversing the existing msks, while
patch 3 bring in the actual diag implementation.
Patch 4 includes some basic functional tests
v1 -> v2
- fixed dump issue on large dump
- use flags for fallback, etc
- patch 4
Paolo Abeni (4):
inet_diag: support for wider protocol numbers
mptcp: add msk interations helpers
mptcp: add MPTCP socket diag interface
selftests/mptcp: add diag interface tests
include/uapi/linux/inet_diag.h | 1 +
include/uapi/linux/mptcp.h | 17 ++
net/core/sock.c | 1 +
net/ipv4/inet_diag.c | 63 +++++--
net/mptcp/Kconfig | 4 +
net/mptcp/Makefile | 2 +
net/mptcp/mptcp_diag.c | 167 ++++++++++++++++++
net/mptcp/protocol.h | 3 +
net/mptcp/token.c | 83 +++++++++
tools/testing/selftests/net/mptcp/Makefile | 2 +-
tools/testing/selftests/net/mptcp/diag.sh | 122 +++++++++++++
.../selftests/net/mptcp/mptcp_connect.c | 14 +-
12 files changed, 457 insertions(+), 22 deletions(-)
create mode 100644 net/mptcp/mptcp_diag.c
create mode 100755 tools/testing/selftests/net/mptcp/diag.sh
--
2.26.2
9 months, 1 week
[PATCH net-next 0/2] mptcp: add receive buffer auto-tuning
by Florian Westphal
First patch extends the test script to allow for reproducible results.
Second patch adds receive auto-tuning. Its based on what TCP is doing,
only difference is that we use the largest RTT of any of the subflows
and that we will update all subflows with the new value.
Else, we get spurious packet drops because the mptcp work queue might
not be able to move packets from subflow socket to master socket
fast enough. Without the adjustment, TCP may drop the packets because
the subflow socket is over its rcvbuffer limit.
Florian Westphal (2):
selftests: mptcp: add option to specify size of file to transfer
mptcp: add receive buffer auto-tuning
net/mptcp/protocol.c | 123 +++++++++++++++++++--
net/mptcp/protocol.h | 7 ++
net/mptcp/subflow.c | 5 +-
tools/testing/selftests/net/mptcp/mptcp_connect.sh | 52 ++++++---
4 files changed, 166 insertions(+), 21 deletions(-)
9 months, 2 weeks
[PATCH net] mptcp: fix DSS map generation on fin retransmission
by Paolo Abeni
The RFC 8684 mandates that no-data DATA FIN packets should carry
a DSS with 0 sequence number and data len equal to 1. Currently,
on FIN retransmission we re-use the existing mapping; if the previous
fin transmission was part of a partially acked data packet, we could
end-up writing in the egress packet a non-compliant DSS.
The above will be detected by a "Bad mapping" warning on the receiver
side.
This change addresses the issue explicitly checking for 0 len packet
when adding the DATA_FIN option.
Fixes: 6d0060f600ad ("mptcp: Write MPTCP DSS headers to outgoing data packets")
Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
---
net/mptcp/options.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
---
this should go on "top" of export branch, just before the "DO NOT MERGE"
changes.
Hopefully should not conflict with others
diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index f464f8669dfc..46470194b8ca 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -451,9 +451,9 @@ static bool mptcp_established_options_mp(struct sock *sk, struct sk_buff *skb,
}
static void mptcp_write_data_fin(struct mptcp_subflow_context *subflow,
- struct mptcp_ext *ext)
+ struct sk_buff *skb, struct mptcp_ext *ext)
{
- if (!ext->use_map) {
+ if (!ext->use_map || !skb->len) {
/* RFC6824 requires a DSS mapping with specific values
* if DATA_FIN is set but no data payload is mapped
*/
@@ -505,7 +505,7 @@ static bool mptcp_established_options_dss(struct sock *sk, struct sk_buff *skb,
opts->ext_copy = *mpext;
if (skb && tcp_fin && subflow->data_fin_tx_enable)
- mptcp_write_data_fin(subflow, &opts->ext_copy);
+ mptcp_write_data_fin(subflow, skb, &opts->ext_copy);
ret = true;
}
--
2.26.2
9 months, 2 weeks
[PATCH mptcp-next] mptcp: init autotune state also in simultaneous connect case.
by Florian Westphal
This is what Christoph suggested needs to be added to avoid
div0 in case of simultaneous connect.
Squashto: mptcp: add receive buffer auto-tuning
---
net/mptcp/subflow.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index 0f0fa1ba57a8..e1e19c76e267 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -1120,6 +1120,7 @@ static void subflow_state_change(struct sock *sk)
if (subflow_simultaneous_connect(sk)) {
mptcp_do_fallback(sk);
+ mptcp_rcv_space_init(mptcp_sk(parent), sk);
pr_fallback(mptcp_sk(parent));
subflow->conn_finished = 1;
if (inet_sk_state_load(parent) == TCP_SYN_SENT) {
--
2.26.2
9 months, 2 weeks
[PATCH iproute2-next 0/2] ss: msk diag support
by Paolo Abeni
basic support for MPTCP sockets diag interface.
The first patch update the required headers, while the 2nd one bring in
the actual implementation
Paolo Abeni (2):
include: update mptcp uAPI
ss: mptcp: add msk diag interface support
include/uapi/linux/inet_diag.h | 1 +
include/uapi/linux/mptcp.h | 15 +++++
misc/ss.c | 115 ++++++++++++++++++++++++++++++---
3 files changed, 121 insertions(+), 10 deletions(-)
--
2.26.2
9 months, 2 weeks
[PATCH net-next] mptcp: do nonce initialization at subflow creation time
by Paolo Abeni
This clean-up the code a bit, reduces the number of
used hooks and indirect call requested, and allow
better error reporting from __mptcp_subflow_connect()
Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
---
net/mptcp/subflow.c | 54 +++++++++++++++++----------------------------
1 file changed, 20 insertions(+), 34 deletions(-)
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index 548f9e347ff5..664aa9158363 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -29,34 +29,6 @@ static void SUBFLOW_REQ_INC_STATS(struct request_sock *req,
MPTCP_INC_STATS(sock_net(req_to_sk(req)), field);
}
-static int subflow_rebuild_header(struct sock *sk)
-{
- struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk);
- int local_id;
-
- if (subflow->request_join && !subflow->local_nonce) {
- struct mptcp_sock *msk = (struct mptcp_sock *)subflow->conn;
-
- pr_debug("subflow=%p", sk);
-
- do {
- get_random_bytes(&subflow->local_nonce, sizeof(u32));
- } while (!subflow->local_nonce);
-
- if (subflow->local_id)
- goto out;
-
- local_id = mptcp_pm_get_local_id(msk, (struct sock_common *)sk);
- if (local_id < 0)
- return -EINVAL;
-
- subflow->local_id = local_id;
- }
-
-out:
- return subflow->icsk_af_ops->rebuild_header(sk);
-}
-
static void subflow_req_destructor(struct request_sock *req)
{
struct mptcp_subflow_request_sock *subflow_req = mptcp_subflow_rsk(req);
@@ -984,7 +956,9 @@ int __mptcp_subflow_connect(struct sock *sk, int ifindex,
struct mptcp_sock *msk = mptcp_sk(sk);
struct mptcp_subflow_context *subflow;
struct sockaddr_storage addr;
+ int local_id = loc->id;
struct socket *sf;
+ struct sock *ssk;
u32 remote_token;
int addrlen;
int err;
@@ -996,7 +970,20 @@ int __mptcp_subflow_connect(struct sock *sk, int ifindex,
if (err)
return err;
- subflow = mptcp_subflow_ctx(sf->sk);
+ ssk = sf->sk;
+ subflow = mptcp_subflow_ctx(ssk);
+ do {
+ get_random_bytes(&subflow->local_nonce, sizeof(u32));
+ } while (!subflow->local_nonce);
+
+ if (!local_id) {
+ err = mptcp_pm_get_local_id(msk, (struct sock_common *)ssk);
+ if (err < 0)
+ goto failed;
+
+ local_id = err;
+ }
+
subflow->remote_key = msk->remote_key;
subflow->local_key = msk->local_key;
subflow->token = msk->token;
@@ -1007,15 +994,16 @@ int __mptcp_subflow_connect(struct sock *sk, int ifindex,
if (loc->family == AF_INET6)
addrlen = sizeof(struct sockaddr_in6);
#endif
- sf->sk->sk_bound_dev_if = ifindex;
+ ssk->sk_bound_dev_if = ifindex;
err = kernel_bind(sf, (struct sockaddr *)&addr, addrlen);
if (err)
goto failed;
mptcp_crypto_key_sha(subflow->remote_key, &remote_token, NULL);
- pr_debug("msk=%p remote_token=%u", msk, remote_token);
+ pr_debug("msk=%p remote_token=%u local_id=%d", msk, remote_token,
+ local_id);
subflow->remote_token = remote_token;
- subflow->local_id = loc->id;
+ subflow->local_id = local_id;
subflow->request_join = 1;
subflow->request_bkup = 1;
mptcp_info2sockaddr(remote, &addr);
@@ -1288,7 +1276,6 @@ void __init mptcp_subflow_init(void)
subflow_specific.conn_request = subflow_v4_conn_request;
subflow_specific.syn_recv_sock = subflow_syn_recv_sock;
subflow_specific.sk_rx_dst_set = subflow_finish_connect;
- subflow_specific.rebuild_header = subflow_rebuild_header;
#if IS_ENABLED(CONFIG_MPTCP_IPV6)
subflow_request_sock_ipv6_ops = tcp_request_sock_ipv6_ops;
@@ -1298,7 +1285,6 @@ void __init mptcp_subflow_init(void)
subflow_v6_specific.conn_request = subflow_v6_conn_request;
subflow_v6_specific.syn_recv_sock = subflow_syn_recv_sock;
subflow_v6_specific.sk_rx_dst_set = subflow_finish_connect;
- subflow_v6_specific.rebuild_header = subflow_rebuild_header;
subflow_v6m_specific = subflow_v6_specific;
subflow_v6m_specific.queue_xmit = ipv4_specific.queue_xmit;
--
2.26.2
9 months, 2 weeks