[Weekly meetings] MoM - 25th of April 2019
by Matthieu Baerts
Hello,
We just had our 47th meeting with Mat and Peter (Intel OTC),
Christoph (Apple), Florian (Red Hat) and myself (Tessares).
Thanks again for this new good meeting!
Here are the minutes of the meeting:
Initial architecture:
- Paolo will do a new iteration of his fixes/improvement we
discussed last week
- new patch set:
https://lists.01.org/pipermail/mptcp/2019-April/001080.html → mptcp
option len must be skb independent (in progress)
- Mat & Peter and Paolo (problem with the receive destination cache
expired) might not look at the same bug.
- Mat & Peter are working on a new version with some
refactoring/cleanup already discussed on Gerrit (new internal header, no
more magic numbers, etc.)
- Mat is finishing his use-after-free issue (on the ULP listening
socket)
- Mat & Peter hope to push to Gerrit soon!
Next steps:
- Mat & Peter: share the new version
- Review of ↑
- Integrate bug fixes/improvements from Paolo
- discuss next week about what to do with the shared received windows
- merge stuff in our repo
- prepare something to share with Eric
mptcp.org:
- MPTCP v1 still in RFC, no comment so far.
- we can use it to test the upstream version with this v1 (the goal
is for us to only support the v1)
Next meeting:
- We propose to have it next Thursday, the 2nd of May.
- Usual time: 16:00 UTC (9am PDT, 6pm CEST)
- Still open to everyone!
- https://annuel2.framapad.org/p/mptcp_upstreaming_20190502
Feel free to comment on these points and propose new ones for the next
meeting!
Talk to you next week,
Matthieu
--
Matthieu Baerts | R&D Engineer
matthieu.baerts(a)tessares.net
Tessares SA | Hybrid Access Solutions
www.tessares.net
1 Avenue Jean Monnet, 1348 Louvain-la-Neuve, Belgium
1 year, 11 months
[PATCH] mptcp: selftests: switch to netns+veth based tests
by Florian Westphal
... so we can exercise PMTU and MSS handling.
MTU on lo is 64k, so we never had to deal with segmentation either.
This also avoids problems with timewait state in inet_ns,
all net namespaces are torn down before script exits.
This uncovers several bugs:
1. mptcp_init_sock() passes init_net instead of sock_net(sk), i.e.
network namespaces are not supported
2. We corrupt tcp option space, I can see invalid tcp headers (too
short) in tcpdump, and receiver process hangs without getting the
data (poll timeout).
Seems to be gone after adding
if (size == MAX_TCP_OPTION_SPACE)
return size;
BUG(size > MAX_TCP_OPTION_SPACE);
in tcp_established_options() after writing mptcp options.
SACK writing doesn't handle 'no option space left' case.
3. Several WARN_ON from networking core are triggered, e.g. due
to mem accouting being off. Maybe fixed already with pending
locking fix in mptcp_recv.
4. "Replaced mapping before it was done" in dmesg.
5. receiver blocking in read(), while TCP socket is in CLOSE-WAIT.
wait_woken+0xd6/0x170
sk_wait_data+0x248/0x270
mptcp_recvmsg+0x5c0/0xd50
6. kmemleak gets noisy, we probably leak a refcount somewhere (iirc
Davide is already working on this).
7. crash on connect completion, probably same bug that Paolo reported
already.
As the script did not yet turn up any problem when using only
tcp, it appears these are MPTCP related bugs rather than with script
or mptcp_connect.c .
Once above issues are fixed, this will be extended again to
set different/varying MTU in ns1 and ns4.
Signed-off-by: Florian Westphal <fw(a)strlen.de>
---
.../selftests/net/mptcp/mptcp_connect.c | 303 +++++++++---------
.../selftests/net/mptcp/mptcp_connect.sh | 208 ++++++++++--
2 files changed, 336 insertions(+), 175 deletions(-)
diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect.c b/tools/testing/selftests/net/mptcp/mptcp_connect.c
index 2830a63ce64e..360b698e5ddd 100644
--- a/tools/testing/selftests/net/mptcp/mptcp_connect.c
+++ b/tools/testing/selftests/net/mptcp/mptcp_connect.c
@@ -26,16 +26,19 @@ extern int optind;
#define IPPROTO_MPTCP 99
#endif
+static bool listen_mode;
+static int poll_timeout;
+
static const char *cfg_host;
static const char *cfg_port = "12000";
-static int cfg_server_proto = IPPROTO_MPTCP;
-static int cfg_client_proto = IPPROTO_MPTCP;
+static int cfg_sock_proto = IPPROTO_MPTCP;
+
static void die_usage(void)
{
- fprintf(stderr, "Usage: mptcp_connect [-c MPTCP|TCP] [-p port] "
- "[-s MPTCP|TCP]\n");
- exit(-1);
+ fprintf(stderr, "Usage: mptcp_connect [-s MPTCP|TCP] [-p port] "
+ "[ -l ] [ -t timeout ] connect_address\n");
+ exit(1);
}
static const char *getxinfo_strerr(int err)
@@ -76,16 +79,13 @@ static int sock_listen_mptcp(const char * const listenaddr, const char * const p
xgetaddrinfo(listenaddr, port, &hints, &addr);
for (a = addr; a != NULL ; a = a->ai_next) {
- sock = socket(a->ai_family, a->ai_socktype, cfg_server_proto);
- if (sock < 0) {
- perror("socket");
+ sock = socket(a->ai_family, a->ai_socktype, cfg_sock_proto);
+ if (sock < 0)
continue;
- }
if (-1 == setsockopt(sock, SOL_SOCKET,SO_REUSEADDR,&one,sizeof one))
perror("setsockopt");
-
if (bind(sock, a->ai_addr, a->ai_addrlen) == 0)
break; /* success */
@@ -94,10 +94,19 @@ static int sock_listen_mptcp(const char * const listenaddr, const char * const p
sock = -1;
}
- if ((sock >= 0) && listen(sock ,20))
+ freeaddrinfo(addr);
+
+ if (sock < 0) {
+ fprintf(stderr, "Could not create listen socket\n");
+ return sock;
+ }
+
+ if (listen(sock, 20)) {
perror("listen");
+ close(sock);
+ return -1;
+ }
- freeaddrinfo(addr);
return sock;
}
@@ -132,7 +141,7 @@ static int sock_connect_mptcp(const char * const remoteaddr, const char * const
return sock;
}
-static size_t do_write(const int fd, char *buf, const size_t len)
+static size_t do_rnd_write(const int fd, char *buf, const size_t len)
{
size_t offset = 0;
@@ -157,62 +166,141 @@ static size_t do_write(const int fd, char *buf, const size_t len)
return offset;
}
-static void copyfd_io(int peerfd)
+static size_t do_write(const int fd, char *buf, const size_t len)
+{
+ size_t offset = 0;
+
+ while (offset < len) {
+ size_t written;
+ ssize_t bw;
+
+ bw = write(fd, buf+offset, len - offset);
+ if (bw < 0 ) {
+ perror("write");
+ return 0;
+ }
+
+ written = (size_t) bw;
+ offset += written;
+ }
+
+ return offset;
+}
+
+static ssize_t do_rnd_read(const int fd, char *buf, const size_t len)
{
- struct pollfd fds = { .events = POLLIN };
+ size_t cap = rand();
+
+ cap &= 0xffff;
- fds.fd = peerfd;
+ if (cap == 0)
+ cap = 1;
+ else if (cap > len)
+ cap = len;
+
+ return read(fd, buf, cap);
+}
+
+static int copyfd_io(int infd, int peerfd, int outfd)
+{
+ struct pollfd fds = {
+ .fd = peerfd,
+ .events = POLLIN | POLLOUT,
+ };
for (;;) {
- char buf[4096];
+ char buf[8192];
ssize_t len;
- switch(poll(&fds, 1, -1)) {
+ if (fds.events == 0)
+ break;
+
+ switch (poll(&fds, 1, poll_timeout)) {
case -1:
if (errno == EINTR)
continue;
perror("poll");
- return;
+ return 1;
case 0:
- /* should not happen, we requested infinite wait */
- fputs("Timed out?!", stderr);
- return;
+ fprintf(stderr, "%s: poll timed out (events: POLLIN %u, POLLOUT %u)\n",
+ __func__, fds.events & POLLIN, fds.events & POLLOUT);
+ return 2;
}
- if ((fds.revents & POLLIN) == 0)
- return;
+ if (fds.revents & POLLIN) {
+ len = do_rnd_read(peerfd, buf, sizeof(buf));
+ if (len == 0) {
+ /* no more data to receive: peer has closed its write side */
+ fds.events &= ~POLLIN;
- len = read(peerfd, buf, sizeof(buf));
- if (!len)
- return;
- if (len < 0) {
- if (errno == EINTR)
- continue;
+ if ((fds.events & POLLOUT) == 0)
+ break; /* and nothing more to send */
- perror("read");
- return;
- }
+ /* Else, still have data to transmit */
+ } else if (len < 0) {
+ perror("read");
+ return 3;
+ }
- if (!do_write(peerfd, buf, len))
- return;
+ do_write(outfd, buf, len);
+ }
+ if (fds.revents & POLLOUT) {
+ len = do_rnd_read(infd, buf, sizeof(buf));
+ if (len > 0) {
+ if (!do_rnd_write(peerfd, buf, len))
+ return 111;
+ } else if (len == 0) {
+ /* We have no more data to send. */
+ fds.events &= ~POLLOUT;
+
+ if ((fds.events & POLLIN) == 0)
+ break; /* ... and peer also closed already */
+
+ /* ... but we still receive. Close our write side. */
+ shutdown(peerfd, SHUT_WR);
+ } else {
+ if (errno == EINTR)
+ continue;
+ perror("read");
+ return 4;
+ }
+ }
}
+
+ close(peerfd);
+ return 0;
}
int main_loop_s(int listensock)
{
struct sockaddr_storage ss;
+ struct pollfd polls;
socklen_t salen;
int remotesock;
- salen = sizeof(ss);
- while ((remotesock = accept(listensock, (struct sockaddr *)&ss, &salen)) < 0)
- perror("accept");
+ polls.fd = listensock;
+ polls.events = POLLIN;
- copyfd_io(remotesock);
- close(remotesock);
+ switch (poll(&polls, 1, poll_timeout)) {
+ case -1:
+ perror("poll");
+ return 1;
+ case 0:
+ fprintf(stderr, "%s: timed out\n", __func__);
+ close(listensock);
+ return 2;
+ }
- return 0;
+ salen = sizeof(ss);
+ remotesock = accept(listensock, (struct sockaddr *)&ss, &salen);
+ if (remotesock >= 0) {
+ copyfd_io(0, remotesock, 1);
+ return 0;
+ } else {
+ perror("accept");
+ return 1;
+ }
}
static void init_rng(void)
@@ -221,7 +309,9 @@ static void init_rng(void)
unsigned int foo;
if (fd > 0) {
- read(fd, &foo, sizeof(foo));
+ int ret = read(fd, &foo, sizeof(foo));
+ if (ret < 0)
+ srand(fd + foo);
close(fd);
}
@@ -230,112 +320,14 @@ static void init_rng(void)
int main_loop()
{
- int pollfds = 2, timeout = -1;
- char start[32];
- int pipefd[2];
- ssize_t ret;
int fd;
- if (pipe(pipefd)) {
- perror("pipe");
- exit(1);
- }
-
- switch (fork()) {
- case 0:
- close(pipefd[0]);
-
- init_rng();
-
- fd = sock_listen_mptcp(NULL, cfg_port);
- if (fd < 0)
- return -1;
-
- write(pipefd[1], "RDY\n", 4);
- main_loop_s(fd);
- exit(1);
- case -1:
- perror("fork");
- return -1;
- default:
- close(pipefd[1]);
- break;
- }
-
- init_rng();
- ret = read(pipefd[0], start, (int)sizeof(start));
- if (ret < 0) {
- perror("read");
- return -1;
- }
-
- if (ret != 4 || strcmp(start, "RDY\n"))
- return -1;
-
/* listener is ready. */
- fd = sock_connect_mptcp(cfg_host, cfg_port, cfg_client_proto);
+ fd = sock_connect_mptcp(cfg_host, cfg_port, cfg_sock_proto);
if (fd < 0)
- return -1;
-
- for (;;) {
- struct pollfd fds[2];
- char buf[4096];
- ssize_t len;
-
- fds[0].fd = fd;
- fds[0].events = POLLIN;
- fds[1].fd = 0;
- fds[1].events = POLLIN;
- fds[1].revents = 0;
-
- switch (poll(fds, pollfds, timeout)) {
- case -1:
- if (errno == EINTR)
- continue;
- perror("poll");
- return -1;
- case 0:
- close(fd);
- return 0;
- }
-
- if (fds[0].revents & POLLIN) {
- unsigned int blen = rand();
-
- blen %= sizeof(buf);
-
- ++blen;
- len = read(fd, buf, blen);
- if (len < 0) {
- perror("read");
- return -1;
- }
-
- if (len > blen) {
- fprintf(stderr, "read returned more data than buffer length\n");
- len = blen;
- }
-
- write(1, buf, len);
- }
- if (fds[1].revents & POLLIN) {
- len = read(0, buf, sizeof(buf));
- if (len == 0) {
- pollfds = 1;
- timeout = 1000;
- continue;
- }
-
- if (len < 0) {
- perror("read");
- break;
- }
+ return 2;
- do_write(fd, buf, len);
- }
- }
-
- return 1;
+ return copyfd_io(0, fd, 1);
}
int parse_proto(const char *proto)
@@ -344,6 +336,8 @@ int parse_proto(const char *proto)
return IPPROTO_MPTCP;
if (!strcasecmp(proto, "TCP"))
return IPPROTO_TCP;
+
+ fprintf(stderr, "Unknown protocol: %s.", proto);
die_usage();
/* silence compiler warning */
@@ -354,20 +348,25 @@ static void parse_opts(int argc, char **argv)
{
int c;
- while ((c = getopt(argc, argv, "c:p:s:h")) != -1) {
+ while ((c = getopt(argc, argv, "lp:s:ht:")) != -1) {
switch (c) {
- case 'c':
- cfg_client_proto = parse_proto(optarg);
+ case 'l':
+ listen_mode = true;
break;
case 'p':
cfg_port = optarg;
break;
case 's':
- cfg_server_proto = parse_proto(optarg);
+ cfg_sock_proto = parse_proto(optarg);
break;
case 'h':
die_usage();
break;
+ case 't':
+ poll_timeout=atoi(optarg) * 1000;
+ if (poll_timeout <= 0)
+ poll_timeout = -1;
+ break;
}
}
@@ -376,11 +375,19 @@ static void parse_opts(int argc, char **argv)
cfg_host = argv[optind];
}
-
int main(int argc, char *argv[])
{
init_rng();
parse_opts(argc, argv);
+
+ if (listen_mode) {
+ int fd = sock_listen_mptcp(cfg_host, cfg_port);
+ if (fd < 0)
+ return 1;
+
+ return main_loop_s(fd);
+ }
+
return main_loop();
}
diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect.sh b/tools/testing/selftests/net/mptcp/mptcp_connect.sh
index ec64e95a6d35..0f283cd37225 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_connect.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_connect.sh
@@ -1,46 +1,200 @@
#!/bin/bash
-tmpin=$(mktemp)
-tmpout=$(mktemp)
+ret=0
+sin=""
+sout=""
+cin=""
+cout=""
+ksft_skip=4
+
+TEST_COUNT=0
cleanup()
{
- rm -f "$tmpin" "$tmpout"
+ rm -f "$cin" "$cout"
+ rm -f "$sin" "$sout"
+
+ for i in 1 2 3 4; do
+ ip netns del ns$i
+ done
+}
+
+ip -Version > /dev/null 2>&1
+if [ $? -ne 0 ];then
+ echo "SKIP: Could not run test without ip tool"
+ exit $ksft_skip
+fi
+
+sin=$(mktemp)
+sout=$(mktemp)
+cin=$(mktemp)
+cout=$(mktemp)
+trap cleanup EXIT
+
+for i in 1 2 3 4;do
+ ip netns add ns$i || exit $ksft_skip
+ ip -net ns$i link set lo up
+done
+
+# ns1 ns2 ns3 ns4
+# ns1eth2 ns2eth1 ns2eth3 ns3eth2 ns3eth4 ns4eth3
+
+ip link add ns1eth2 netns ns1 type veth peer name ns2eth1 netns ns2
+ip link add ns2eth3 netns ns2 type veth peer name ns3eth2 netns ns3
+ip link add ns3eth4 netns ns3 type veth peer name ns4eth3 netns ns4
+
+ip -net ns1 addr add 10.0.1.1/24 dev ns1eth2
+ip -net ns1 link set ns1eth2 up
+ip -net ns1 route add default via 10.0.1.2
+
+ip -net ns2 addr add 10.0.1.2/24 dev ns2eth1
+ip -net ns2 link set ns2eth1 up
+
+ip -net ns2 addr add 10.0.2.1/24 dev ns2eth3
+ip -net ns2 link set ns2eth3 up
+ip -net ns2 route add default via 10.0.2.2
+ip netns exec ns2 sysctl -q net.ipv4.ip_forward=1
+
+ip -net ns3 addr add 10.0.2.2/24 dev ns3eth2
+ip -net ns3 link set ns3eth2 up
+
+ip -net ns3 addr add 10.0.3.2/24 dev ns3eth4
+ip -net ns3 link set ns3eth4 up
+ip -net ns3 route add default via 10.0.2.1
+ip netns exec ns3 sysctl -q net.ipv4.ip_forward=1
+
+ip -net ns4 addr add 10.0.3.1/24 dev ns4eth3
+ip -net ns4 link set ns4eth3 up
+ip -net ns4 route add default via 10.0.3.2
+
+print_file_err()
+{
+ ls -l "$1" 1>&2
+ echo "Trailing bytes are: "
+ tail -c 27 "$1"
}
check_transfer()
{
- cl_proto=${1}
- srv_proto=${2}
+ in=$1
+ out=$2
+ what=$3
- printf "%-8s -> %-8s socket\t\t" ${cl_proto} ${srv_proto}
+ cmp "$in" "$out" > /dev/null 2>&1
+ if [ $? -ne 0 ] ;then
+ echo "[ FAIL ] $what does not match (in, out):"
+ print_file_err "$in"
+ print_file_err "$out"
- ./mptcp_connect -c ${cl_proto} -p 43212 -s ${srv_proto} 127.0.0.1 < "$tmpin" > "$tmpout" 2>/dev/null
- ret=$?
- if [ ${ret} -ne 0 ]; then
- echo "[ FAIL ]"
- return " exit code $ret"
+ return 1
fi
- cmp "$tmpin" "$tmpout" > /dev/null 2>&1
- if [ $? -ne 0 ] ;then
- echo "[ FAIL ]"
- ls -l "$tmpin" "$tmpout" 1>&2
- else
- echo "[ OK ]"
+
+ return 0
+}
+
+do_transfer()
+{
+ listener_ns="$1"
+ connector_ns="$2"
+ cl_proto="$3"
+ srv_proto="$4"
+ connect_addr="$5"
+
+ port=$((10000+$TEST_COUNT))
+ TEST_COUNT=$((TEST_COUNT+1))
+
+ :> "$cout"
+ :> "$sout"
+ ip netns exec ${connector_ns} ping -q -c 1 $connect_addr >/dev/null
+ if [ $? -ne 0 ] ; then
+ echo "$listener_ns -> $connect_addr connectivity [ FAIL ]" 1>&2
+ return 1
fi
+
+ printf "%-4s %-5s -> %-4s (%s:%d) %-5s\t" ${connector_ns} ${cl_proto} ${listener_ns} ${connect_addr} ${port} ${srv_proto}
+
+ ip netns exec ${listener_ns} ./mptcp_connect -t 10 -l -p $port -s ${srv_proto} 0.0.0.0 < "$sin" > "$sout" &
+ spid=$!
+
+ sleep 1
+
+ ip netns exec ${connector_ns} ./mptcp_connect -t 10 -p $port -s ${cl_proto} $connect_addr < "$cin" > "$cout" &
+ cpid=$!
+
+ wait $cpid
+ retc=$?
+ wait $spid
+ rets=$?
+
+ if [ ${rets} -ne 0 ] || [ ${retc} -ne 0 ]; then
+ echo "[ FAIL ] client exit code $retc, server $rets" 1>&2
+ echo "\nnetns ${listener_ns} socket stat for $port:" 1>&2
+ ip netns exec ${listener_ns} ss -nita 1>&2 -o "sport = :$port"
+ echo "\nnetns ${connector_ns} socket stat for $port:" 1>&2
+ ip netns exec ${connector_ns} ss -nita 1>&2 -o "dport = :$port"
+
+ ret=$rets
+ return 1
+ fi
+
+ check_transfer $sin $cout "file received by client"
+ retc=$?
+ check_transfer $cin $sout "file received by server"
+ rets=$?
+
+ if [ $retc -eq 0 ] && [ $rets -eq 0 ];then
+ echo "[ OK ]"
+ return 0
+ fi
+
+ return 1
}
-trap cleanup EXIT
+make_file()
+{
+ name=$1
+ who=$2
-SIZE=$((RANDOM % (1024 * 1024)))
-if [ $SIZE -eq 0 ] ; then
- SIZE=1
-fi
+ SIZE=$((RANDOM % (1024 * 8)))
+ TSIZE=$((SIZE * 1024))
+
+ dd if=/dev/urandom of="$name" bs=1024 count=$SIZE 2> /dev/null
+
+ SIZE=$((RANDOM % 1024))
+ SIZE=$((SIZE + 128))
+ TSIZE=$((TSIZE + SIZE))
+ dd if=/dev/urandom conf=notrunc of="$name" bs=1 count=$SIZE 2> /dev/null
+ echo -e "\nMPTCP_TEST_FILE_END_MARKER" >> "$name"
+
+ echo "Created $name (size $TSIZE) containing data sent by $who"
+}
+
+run_tests()
+{
+ listener_ns="$1"
+ connector_ns="$2"
+ connect_addr="$3"
+
+ do_transfer ${listener_ns} ${connector_ns} MPTCP MPTCP ${connect_addr}
+ [ $? -ne 0 ] && return
+ do_transfer ${listener_ns} ${connector_ns} MPTCP TCP ${connect_addr}
+ [ $? -ne 0 ] && return
+ do_transfer ${listener_ns} ${connector_ns} TCP MPTCP ${connect_addr}
+}
+
+make_file "$cin" "client"
+make_file "$sin" "server"
+
+for sender in 1 2 3 4;do
+ run_tests ns1 ns$sender 10.0.1.1
+
+ run_tests ns2 ns$sender 10.0.1.2
+ run_tests ns2 ns$sender 10.0.2.1
-dd if=/dev/urandom of="$tmpin" bs=1 count=$SIZE 2> /dev/null
+ run_tests ns3 ns$sender 10.0.2.2
+ run_tests ns3 ns$sender 10.0.3.2
-check_transfer MPTCP MPTCP
-check_transfer MPTCP TCP
-check_transfer TCP MPTCP
+ run_tests ns4 ns$sender 10.0.3.1
+done
-exit 0
+exit $ret
--
2.21.0
1 year, 11 months
Change in ...mptcp_net-next[t/upstream]: tcp, ulp: Add clone operation to tcp_ulp_ops
by Mat Martineau (GerritHub)
Mat Martineau has uploaded this change for review. ( https://review.gerrithub.io/c/multipath-tcp/mptcp_net-next/+/452480
Change subject: tcp, ulp: Add clone operation to tcp_ulp_ops
......................................................................
tcp, ulp: Add clone operation to tcp_ulp_ops
If ULP is used on a listening socket, icsk_ulp_ops and icsk_ulp_data are
copied when the listener is cloned. Sometimes the clone is immediately
deleted, which will invoke the release op on the clone and likely
corrupt the listening socket's icsk_ulp_data.
The clone operation is invoked immediately after the clone is copied and
gives the ULP type an opportunity to set up the clone socket and its
icsk_ulp_data.
Signed-off-by: Mat Martineau <mathew.j.martineau(a)linux.intel.com>
Change-Id: I380ce1411ccbaff50cb22b809e5b2ec5877fa1d2
---
M include/net/tcp.h
M net/ipv4/inet_connection_sock.c
M net/ipv4/tcp_ulp.c
3 files changed, 19 insertions(+), 0 deletions(-)
git pull ssh://review.gerrithub.io:29418/multipath-tcp/mptcp_net-next refs/changes/80/452480/1
--
To view, visit https://review.gerrithub.io/c/multipath-tcp/mptcp_net-next/+/452480
To unsubscribe, or for help writing mail filters, visit https://review.gerrithub.io/settings
Gerrit-Project: multipath-tcp/mptcp_net-next
Gerrit-Branch: t/upstream
Gerrit-Change-Id: I380ce1411ccbaff50cb22b809e5b2ec5877fa1d2
Gerrit-Change-Number: 452480
Gerrit-PatchSet: 1
Gerrit-Owner: Mat Martineau <mathew.j.martineau(a)linux.intel.com>
Gerrit-MessageType: newchange
1 year, 11 months
scheduling-while-atomic splat in mptcp_shutdown
by Florian Westphal
Hi,
I get following splat in mptcp_shutdown:
-----
BUG: sleeping function called from invalid context at net/core/sock.c:2911
in_atomic(): 0, irqs_disabled(): 0, pid: 2238, name: mptcp_connect
[ 61.599662] 1 lock held by mptcp_connect/2238:
[ 61.600395] #0: 000000005d0e085e (rcu_read_lock){....}, at: mptcp_shutdown+0x91/0x2a0
[ 61.601754] Preemption disabled at:
[ 61.601780] [<ffffffff811a28ee>] rcu_lockdep_current_cpu_online+0x2e/0xa0
[ 61.606184] Call Trace:
[ 61.607183] ? rcu_lockdep_current_cpu_online+0x2e/0xa
[ 61.608047] ___might_sleep.cold.73+0x129/0x13d
[ 61.608800] lock_sock_nested+0x32/0xc0
[ 61.609447] inet_shutdown+0x51/0x1c0
[ 61.609999] mptcp_shutdown+0x13e/0x2a0
-----
I can look into it next week in case noone more familiar with
the code has time to work on this.
1 year, 11 months
[PATCH] mptcp: honor net namespace when creating subflow sk
by Florian Westphal
Otherwise, even loopback communication doesn't work
when program is run from non-init_netns.
Signed-off-by: Florian Westphal <fw(a)strlen.de>
---
Could be squashed with
'mptcp: Make connection_list a real list of subflows'.
net/mptcp/protocol.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 1e73e307a6d2..1fbddbcb1164 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -703,12 +703,12 @@ static struct proto mptcp_prot = {
static int mptcp_subflow_create(struct sock *sk)
{
struct mptcp_sock *msk = mptcp_sk(sk);
+ struct net *net = sock_net(sk);
struct socket *sf;
int err;
pr_debug("msk=%p", msk);
- err = sock_create_kern(&init_net, PF_INET, SOCK_STREAM, IPPROTO_TCP,
- &sf);
+ err = sock_create_kern(net, PF_INET, SOCK_STREAM, IPPROTO_TCP, &sf);
if (!err) {
lock_sock(sf->sk);
err = tcp_set_ulp(sf->sk, "mptcp");
--
2.21.0
1 year, 11 months
[PATCH] mptcp: fix oops on accept
by Paolo Abeni
while running the self-test on a multi core VM, with a standard (non
debug) kernel config, I hit the following oops:
[ 234.587877] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
[ 234.591567] #PF error: [normal kernel read fault]
[ 234.593862] PGD 800000013a8ec067 P4D 800000013a8ec067 PUD 13a2ad067 PMD 0
[ 234.596616] Oops: 0000 [#1] SMP PTI
[ 234.597173] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.1.0-rc4.mptcp_xmit_07723d4+ #2139
[ 234.598363] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20180724_192412-buildhw-07.phx2.fedoraproject.org-1.fc29 04/01/2014
[ 234.599847] RIP: 0010:mptcp_finish_connect+0x2a/0x160
[ 234.600464] Code: 0f 1f 44 00 00 41 54 41 89 f4 55 53 48 89 fb 48 83 ec 10 65 48 8b 04 25 28 00 00 00 48 89 44 24 08 31 c0 48 8b 87 80 05 00 00 <48> 8b 40 20 48 8b a8 88 04 00 00 0f 1f 44 00 00 45 85 e4 0f 84 ad
[ 234.602646] RSP: 0018:ffffa014fbb03c78 EFLAGS: 00010246
[ 234.603307] RAX: 0000000000000000 RBX: ffffa014fac92280 RCX: 0000000000000001
[ 234.604158] RDX: ffffa014fa00c480 RSI: 0000000000000001 RDI: ffffa014fac92280
[ 234.605010] RBP: ffffa014f9a48000 R08: ffffa014fa00c4c0 R09: 0000000000000000
[ 234.605861] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000001
[ 234.606701] R13: ffffa014f6c99424 R14: 0000000000000014 R15: ffffa014f6c99410
[ 234.607555] FS: 0000000000000000(0000) GS:ffffa014fbb00000(0000) knlGS:0000000000000000
[ 234.608498] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 234.609110] CR2: 0000000000000020 CR3: 0000000137e22004 CR4: 00000000001606e0
[ 234.609863] Call Trace:
[ 234.610152] <IRQ>
[ 234.610407] subflow_finish_connect+0x3d/0xc0
[ 234.610883] tcp_rcv_established+0x641/0x660
[ 234.611353] ? tcp_v4_inbound_md5_hash+0x64/0x1d0
[ 234.611862] tcp_v4_do_rcv+0xf4/0x1e0
[ 234.612269] tcp_v4_rcv+0x99c/0xaa0
[ 234.612665] ip_protocol_deliver_rcu+0x4b/0x180
[ 234.613159] ip_local_deliver_finish+0x40/0x50
[ 234.613647] ip_local_deliver+0x6b/0xe0
[ 234.614071] ? ip_rcv_finish+0x62/0xa0
[ 234.614495] ip_rcv+0x52/0xd0
[ 234.614841] __netif_receive_skb_one_core+0x52/0x70
[ 234.615403] process_backlog+0x97/0x130
[ 234.615897] net_rx_action+0x299/0x3d0
[ 234.616319] __do_softirq+0xd8/0x2b7
[ 234.616730] irq_exit+0xd5/0xe0
[ 234.617089] smp_apic_timer_interrupt+0x74/0x140
[ 234.617933] apic_timer_interrupt+0xf/0x20
[ 234.618623] </IRQ>
[ 234.619003] RIP: 0010:default_idle+0x17/0x140
[ 234.619635] Code: ff 4c 89 e7 e8 7a ca c9 ff fb eb 98 90 90 90 90 90 90 90 0f 1f 44 00 00 41 54 55 53 65 8b 2d 00 95 f6 56 0f 1f 44 00 00 fb f4 <65> 8b 2d f2 94 f6 56 0f 1f 44 00 00 5b 5d 41 5c c3 65 8b 05 e1 94
The problem is in mptcp_accept(): the newly created subflow is
already added to the mptcp subflow list, but conn_finished is not set.
On next rx dst cache update we enter subflow_finish_connect()/mptcp_finish_connect()
and hit the crash due to NULL msk->subflow de-referncing.
Fix the above properly initializing the subflow at accept() time.
Fixes: a36080381c36 ("mptcp: Create SUBFLOW socket for incoming connections")
Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
---
Note: I'm ok with swashing this change in the commit referenced by the
'fixes' tag.
---
net/mptcp/protocol.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 55a39ec748b5..fcfb74818c30 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -639,6 +639,7 @@ static struct sock *mptcp_accept(struct sock *sk, int flags, int *err,
subflow->map_subflow_seq = 1;
subflow->rel_write_seq = 1;
subflow->conn = new_mptcp_sock->sk;
+ subflow->conn_finished = 1;
} else {
msk->subflow = new_sock;
}
--
2.20.1
1 year, 11 months
[PATCH] mptcp: mptcp option len must be skb independent
by Paolo Abeni
Otherwise the TCP stack can't compute correctly the mss and
that will break segmentation. Note that we can't observe the
issue with the self-test until we allow skb collapsing -
loopback default MTU is pretty big and we dont build big enogh
skb otherwise.
Fix the issue caching the use_map/use_checksum flags also at the
substream ctx level, and using them in tcp_established_options().
While at it, replace a bunch of magic numbers with the appropriate
constants.
Fixes: a0e41f8a56d6 ("mptcp: Write MPTCP DSS headers to outgoing data packets")
Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
--
Notes:
- I'm ok with swashing this commit into the referenced one
- this fixes the data corruption I was observing on the xmit path
refactor patchset
---
include/net/mptcp.h | 2 ++
net/ipv4/tcp_output.c | 18 ++++++++----------
net/mptcp/subflow.c | 4 ++++
3 files changed, 14 insertions(+), 10 deletions(-)
diff --git a/include/net/mptcp.h b/include/net/mptcp.h
index 0c0839d9ab38..5658b85da7ca 100644
--- a/include/net/mptcp.h
+++ b/include/net/mptcp.h
@@ -128,6 +128,8 @@ struct subflow_context {
bool fourth_ack; // send initial DSS
bool conn_finished;
bool map_valid;
+ bool use_map;
+ bool use_checksum;
struct sock *sk; /* underlying tcp_sock */
struct sock *conn; /* parent mptcp_sock */
void (*tcp_sk_data_ready)(struct sock *sk);
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 6798b1e91140..4e5dab801bed 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -822,19 +822,17 @@ static unsigned int tcp_established_options(struct sock *sk, struct sk_buff *skb
opts->mptcp.rcvr_key = remote_key;
size += TCPOLEN_MPTCP_MPC_ACK;
}
- } else if (subflow_ctx(sk)->mp_capable && skb) {
+ } else if (subflow_ctx(sk)->mp_capable) {
+ unsigned int ack_size = TCPOLEN_MPTCP_DSS_ACK64;
unsigned int dss_size = 0;
- unsigned int ack_size = 8;
- struct mptcp_ext *mpext;
u16 suboptions = 0;
- mpext = mptcp_get_ext(skb);
+ if (subflow_ctx(sk)->use_map) {
+ unsigned int map_size = TCPOLEN_MPTCP_DSS_BASE +
+ TCPOLEN_MPTCP_DSS_MAP64;
- if (mpext && mpext->use_map) {
- unsigned int map_size = 18;
-
- if (mpext->use_checksum)
- map_size += 2;
+ if (subflow_ctx(sk)->use_checksum)
+ map_size += TCPOLEN_MPTCP_DSS_CHECKSUM;
if (map_size <= remaining) {
remaining -= map_size;
@@ -849,7 +847,7 @@ static unsigned int tcp_established_options(struct sock *sk, struct sk_buff *skb
* overhead if mapping not populated
*/
if (dss_size == 0)
- ack_size += 4;
+ ack_size += TCPOLEN_MPTCP_DSS_BASE;
if (ack_size <= remaining) {
dss_size += ack_size;
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index 077d5baae270..3c6988cb3126 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -130,6 +130,8 @@ static struct subflow_context *clone_ctx(const struct sock *sk, struct sock *chi
child_ctx->sk = child;
child_ctx->conn = NULL;
child_ctx->tcp_sk_data_ready = ctx->tcp_sk_data_ready;
+ child_ctx->use_map = ctx->use_map;
+ child_ctx->use_checksum = ctx->use_checksum;
return child_ctx;
}
@@ -237,6 +239,8 @@ static int subflow_init(struct sock *sk)
tsk->is_mptcp = 1;
icsk->icsk_af_ops = &subflow_specific;
ctx->tcp_sk_data_ready = sk->sk_data_ready;
+ ctx->use_map = true;
+ ctx->use_checksum = false;
sk->sk_data_ready = subflow_data_ready;
out:
return err;
--
2.20.1
1 year, 11 months
[PATCH 0/3] mptcp: refactor xmit path
by Paolo Abeni
his series refactor the mptcp xmit path trying to make sendmsg behavior
more consistent, memory efficent and improve its performances.
The code is also available here:
https://github.com/pabeni/mptcp/tree/mptcp-proposal-devel
To support MP_JOIN and multiple subflows, the idea is adding, with later
patches, an RB-tree structure containing references to the pending page
fragments, their length and the associated mptcp write_seq number (is subflow
rel_write_seq needed, too?). RB-tree node can be allocated using the same
page frag used in this series for skb data. Each RB-tree node will own a
reference to the page frag.
RFC -> v1:
- addressed Mat's feedback, dropping patch 3/4 and switching
to per msk page frag use, and checking mptcp seq number for coalescing
Paolo Abeni (3):
mptcp: use sk_page_frag() in sendmsg
mptcp: sendmsg() do spool all the provided data
mptcp: allow collapsing consecutive sendpages on the same substream
net/mptcp/protocol.c | 181 ++++++++++++++++++++++++++++---------------
1 file changed, 119 insertions(+), 62 deletions(-)
--
2.20.1
1 year, 12 months