[RFC PATCH v4 00/17] MPTCP architecture proposal
by Mat Martineau
Hello everyone,
Peter and I have been working on this patch set to show how MPTCP
can fit into the Linux networking stack using these design ideas:
* Applications opt in to MPTCP using IPPROTO_MPTCP; regular TCP sockets
are still the default. A socket created with
socket(AF_INET, SOCK_STREAM, IPPROTO_MPTCP) will attempt to form an
MPTCP connection. IPPROTO_MPTCP == 99 as a placeholder.
* Subflows exist within the kernel as separate sockets, owned by an
MPTCP connection-level socket that is visible to userspace.
* Adds private pointers to struct sk_buff to store MPTCP metadata.
* Adds the CONFIG_MPTCP option to Kconfig.
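As a rough userspace illustration of the opt-in model above (a sketch, not code from the patch set): the helper name open_stream_socket() and the macro PROPOSAL_IPPROTO_MPTCP are hypothetical, the value 99 is only the placeholder mentioned above, and falling back to regular TCP when the kernel rejects the protocol is an assumption about how an application might choose to behave.

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>	/* close(), for callers of the helper */

/* Placeholder value from this proposal; hypothetical name, not an assigned number. */
#define PROPOSAL_IPPROTO_MPTCP 99

/*
 * Hypothetical helper: ask for an MPTCP socket first and fall back to a
 * regular TCP socket if the kernel does not accept the protocol number.
 * *used_mptcp is set to 1 if the first socket() call succeeded, else 0.
 * Returns a file descriptor, or -1 if both attempts fail.
 */
static int open_stream_socket(int *used_mptcp)
{
	int fd = socket(AF_INET, SOCK_STREAM, PROPOSAL_IPPROTO_MPTCP);

	if (fd >= 0) {
		*used_mptcp = 1;
		return fd;
	}

	/* A kernel without these patches is expected to fail the call above. */
	*used_mptcp = 0;
	return socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
}
```

A caller would use the returned descriptor with connect(), send() and recv() exactly as for TCP, and close() it when done; only the protocol argument to socket() differs.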
Note that this does not yet make use of Florian's CONFIG_SKB_EXTENSIONS,
but I plan to drop patch 12 of this series and use CONFIG_SKB_EXTENSIONS
instead (since they are designed for multiple uses and will hopefully
be merged upstream). Refer to
https://marc.info/?l=linux-netdev&m=154323251731893&w=2
The following patches can form an MPTCP connection with the
multipath-tcp.org kernel (tested with v0.94), and send DSS mappings that
are accepted for the initial data packet. It is an early implementation,
and I don't represent it as being upstreamable as-is or being everyone's
idea of what an eventual upstream implementation will necessarily look
like. It has significant limitations:
* Only one subflow is supported, no joins, and only IPv4.
* Does not support DSS checksums. Checksums must be disabled on the
remote stack (for multipath-tcp.org, 'sudo sysctl -w
net.mptcp.mptcp_checksum=0')
* Lots of debug statements (although they use dynamic debug and are
disabled by default) and TODOs.
* It's only been tested sending small amounts of data for each send.
Hopefully there are some interesting concepts to discuss, and this
code helps us assess how workable the above design principles
are. Thanks in advance for your feedback on the benefits or drawbacks of
this code, how it might be improved, or how other approaches might
compare.
The patch set applies to net-next (as of commit 1464193107da). I have also
pushed it to:
https://git.kernel.org/pub/scm/linux/kernel/git/martineau/linux.git
(mptcp-proposal branch)
v4 changes: Refine skb extension (remove copy hook), change rx path to
use skb extension instead of error queue.
v3 changes: Change skb extension technique, change rx path to use error
queue, add foundational code for multiple subflows, and many bug fixes.
v2 changes: Added receive path implementation (last two patches).
Reworked TCP option writing. Miscellaneous bug fixes including
header dependency cleanup.
Mat Martineau (7):
tcp: Add MPTCP option number
tcp: Define IPPROTO_MPTCP
skbuff: Add private data pointer
tcp: Prevent coalesce and collapse when skb->priv is used
tcp: Export low-level TCP functions
mptcp: Write MPTCP DSS headers to outgoing data packets
mptcp: Implement MPTCP receive path
Peter Krystad (10):
mptcp: Add MPTCP socket stubs
mptcp: Handle MPTCP TCP options
tcp: Add IPPROTO_SUBFLOW
tcp: expose tcp routines and structs for MPTCP
mptcp: Create SUBFLOW socket for outgoing connections
mptcp: Create SUBFLOW socket for incoming connections
mptcp: Add key generation and token tree
mptcp: Add shutdown() socket operation
mptcp: Add setsockopt()/getsockopt() socket operations
mptcp: Make connection_list a real list of subflows
include/linux/skbuff.h | 13 +-
include/linux/tcp.h | 26 ++
include/net/inet_common.h | 3 +
include/net/mptcp.h | 234 ++++++++++
include/net/tcp.h | 15 +
include/uapi/linux/in.h | 4 +
net/Kconfig | 1 +
net/Makefile | 1 +
net/core/skbuff.c | 5 +
net/ipv4/af_inet.c | 2 +-
net/ipv4/tcp.c | 12 +-
net/ipv4/tcp_input.c | 23 +-
net/ipv4/tcp_ipv4.c | 4 +-
net/ipv4/tcp_output.c | 249 +++++++++-
net/mptcp/Kconfig | 10 +
net/mptcp/Makefile | 3 +
net/mptcp/crypto.c | 215 +++++++++
net/mptcp/options.c | 302 ++++++++++++
net/mptcp/protocol.c | 939 ++++++++++++++++++++++++++++++++++++++
net/mptcp/subflow.c | 377 +++++++++++++++
net/mptcp/token.c | 256 +++++++++++
21 files changed, 2663 insertions(+), 31 deletions(-)
create mode 100644 include/net/mptcp.h
create mode 100644 net/mptcp/Kconfig
create mode 100644 net/mptcp/Makefile
create mode 100644 net/mptcp/crypto.c
create mode 100644 net/mptcp/options.c
create mode 100644 net/mptcp/protocol.c
create mode 100644 net/mptcp/subflow.c
create mode 100644 net/mptcp/token.c
--
2.19.1
3 years, 6 months
One approach to indirect call optimization
by Mat Martineau
I noticed this patch on netdev to avoid an indirect call to md5_lookup,
which was accepted. It is mitigating the cost of an existing indirect call
rather than adding a new one, but shows how the maintainers are looking at
the problem.
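The pattern in the forwarded patch below can be reduced to a small userspace sketch: test a cheap, directly readable pointer first, and only make the indirect call when it can return something. All names here (struct conn, conn_ops, conn_md5_key(), the lookups counter) are hypothetical stand-ins, not kernel APIs.

```c
#include <stddef.h>

struct md5_key { int id; };	/* stand-in for a real key structure */

struct conn;

struct conn_ops {
	/* Indirect call; expensive when retpolines are in use. */
	struct md5_key *(*md5_lookup)(struct conn *c);
};

struct conn {
	const struct conn_ops *ops;
	struct md5_key *keys;	/* NULL in the common case: no keys configured */
	unsigned int lookups;	/* counts indirect calls, for illustration only */
};

static struct md5_key *demo_lookup(struct conn *c)
{
	c->lookups++;
	return c->keys;
}

static const struct conn_ops demo_ops = { .md5_lookup = demo_lookup };

/* Skip the indirect call entirely when there are no keys to find. */
static struct md5_key *conn_md5_key(struct conn *c)
{
	if (!c->keys)
		return NULL;

	return c->ops->md5_lookup(c);
}
```

For a connection without keys, conn_md5_key() returns NULL and the lookups counter stays at zero, which is the cost saving the patch is after.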
--
Mat Martineau
Intel OTC
---------- Forwarded message ----------
Date: Mon, 23 Apr 2018 14:46:25
From: Eric Dumazet <edumazet(a)google.com>
To: David S . Miller <davem(a)davemloft.net>
Cc: netdev <netdev(a)vger.kernel.org>, Eric Dumazet <edumazet(a)google.com>,
Eric Dumazet <eric.dumazet(a)gmail.com>
Subject: [PATCH net-next] tcp: md5: only call tp->af_specific->md5_lookup() for
md5 sockets
RETPOLINE made calls to tp->af_specific->md5_lookup() quite expensive,
given they have no result.
We can omit the calls for sockets that have no md5 keys.
Signed-off-by: Eric Dumazet <edumazet(a)google.com>
---
net/ipv4/tcp_output.c | 26 ++++++++++++++------------
1 file changed, 14 insertions(+), 12 deletions(-)
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 383cac0ff0ec059ca7dbc1a6304cc7f8183e008d..95feffb6d53f8a9eadfb15a2fffeec498d6e993a 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -585,14 +585,15 @@ static unsigned int tcp_syn_options(struct sock *sk, struct sk_buff *skb,
 	unsigned int remaining = MAX_TCP_OPTION_SPACE;
 	struct tcp_fastopen_request *fastopen = tp->fastopen_req;

+	*md5 = NULL;
 #ifdef CONFIG_TCP_MD5SIG
-	*md5 = tp->af_specific->md5_lookup(sk, sk);
-	if (*md5) {
-		opts->options |= OPTION_MD5;
-		remaining -= TCPOLEN_MD5SIG_ALIGNED;
+	if (unlikely(rcu_access_pointer(tp->md5sig_info))) {
+		*md5 = tp->af_specific->md5_lookup(sk, sk);
+		if (*md5) {
+			opts->options |= OPTION_MD5;
+			remaining -= TCPOLEN_MD5SIG_ALIGNED;
+		}
 	}
-#else
-	*md5 = NULL;
 #endif

 	/* We always get an MSS option. The option bytes which will be seen in
@@ -720,14 +721,15 @@ static unsigned int tcp_established_options(struct sock *sk, struct sk_buff *skb

 	opts->options = 0;

+	*md5 = NULL;
 #ifdef CONFIG_TCP_MD5SIG
-	*md5 = tp->af_specific->md5_lookup(sk, sk);
-	if (unlikely(*md5)) {
-		opts->options |= OPTION_MD5;
-		size += TCPOLEN_MD5SIG_ALIGNED;
+	if (unlikely(rcu_access_pointer(tp->md5sig_info))) {
+		*md5 = tp->af_specific->md5_lookup(sk, sk);
+		if (*md5) {
+			opts->options |= OPTION_MD5;
+			size += TCPOLEN_MD5SIG_ALIGNED;
+		}
 	}
-#else
-	*md5 = NULL;
 #endif

 	if (likely(tp->rx_opt.tstamp_ok)) {
--
2.17.0.484.g0c8726318c-goog
3 years, 6 months
[PATCH 0/5] Forward-ports for mptcp-net-next
by Christoph Paasch
Hello,
here are some forward-ports for mptcp-net-next that come from
mptcp_trunk.
Cheers,
Christoph
Christoph Paasch (3):
mptcp: Remove IPv6 duplicated address detection
mptcp: Always call sk_data_ready
mptcp: Correctly initialize IPv6 fields
Kostas Peletidis (2):
Export tcp_send_ack symbol to fix build issue with mptcp_fullmesh
module (mptcp_trunk). This commit should make it possible to revert
64a80850.
Revert "Fix building mptcp_fullmesh as module"
net/ipv4/tcp_input.c | 2 +-
net/ipv4/tcp_output.c | 1 +
net/mptcp/mptcp_ctrl.c | 30 ++++++++--------
net/mptcp/mptcp_fullmesh.c | 85 ++++++----------------------------------------
4 files changed, 29 insertions(+), 89 deletions(-)
--
2.16.2
3 years, 6 months
[Weekly meetings] MoM - 29th of November 2018
by Matthieu Baerts
Hello,
We just had our 30th meeting with Mat, Peter and Ossama (Intel OTC),
Christoph (Apple), Florian (Red Hat) and myself (Tessares).
Thanks again for this new good meeting!
Here are the minutes of the meeting:
News from multipath-tcp.org:
- A few patches have been applied (mptcp_trunk, v0.93 and v0.94)
- Christoph updated mptcp_trunk to v4.19 (WIP)
Note: v4.19 introduces batch processing of network packets but
it should be safe for MPTCP, done at the above layer
https://lwn.net/Articles/763056/
skb extension:
- See the discussion on the ML:
https://lists.01.org/pipermail/mptcp/2018-November/000840.html (and next
messages on the same thread)
- Goal: Coalescing and extension data
- MPTCP needs something like that for the receive side
- patch by Florian, RFC, initially for ipsec/bridge
- coalescing: some changes have been proposed on netdev but (to be
checked) coalescing is only done if all TCP options are the same, so it
is not adapted to the MPTCP case
- note: with ktls, they disabled the coalescing
- note: with MPTCP, you don't need the DSS in all packets
- anyway, this RFC patch will also be very useful for MPTCP → we
know we should go in this direction as well
Mat:
- looking at sharing a new version without the use of err queue.
- also rebase the work on Florian's patch-set.
Matthieu:
- will send a new version of Netlink PM in the coming days/weeks:
the goal is to have it in the MPTCP version based on v4.19
Next meeting:
- We propose to have the next one on Thursday, the 6th of December.
- Usual time: 17:00 UTC (9am PST, 6pm CET)
- Still open to everyone!
- https://annuel2.framapad.org/p/mptcp_upstreaming_20181206
Feel free to comment on these points and propose new ones for the next
meeting!
Talk to you next week,
Matthieu
--
Matthieu Baerts | R&D Engineer
matthieu.baerts(a)tessares.net
Tessares SA | Hybrid Access Solutions
www.tessares.net
1 Avenue Jean Monnet, 1348 Louvain-la-Neuve, Belgium
3 years, 6 months
New skb extension proposal on netdev
by Mat Martineau
Florian posted a skb extension patch set on netdev today and I wanted to
have folks here take a look:
https://marc.info/?l=linux-netdev&m=154323251731893&w=2
So far, I really like how it solves the problem of making extension space
available to multiple layers.
To give more context for the netdev discussion (and to see if others want
to add to it):
The MPTCP control block in our upstream proposal branch looks like this:
struct mptcp_skb_cb {
	u64 data_ack;

	/* DSS mapping */
	u64 data_seq;
	u32 subflow_seq;
	u16 dll;
	__sum16 checksum;

	u8 use_map:1,
	   dsn64:1,
	   use_checksum:1,
	   data_fin:1,
	   use_ack:1,
	   ack64:1,
	   __unused:2;
};
This is the content of the MPTCP DSS option that needs to propagate from
the TCP headers to the MPTCP upper layer (and vice versa). The DSS mapping
is used to establish where a part of the TCP sequence space belongs in the
64-bit MPTCP sequence space. This can be included in any data packet, or
as little as once per 2^16 bytes of data.
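To make the mapping concrete, here is a minimal userspace sketch of the translation it describes. It is not code from the proposal branch: struct dss_map and dss_map_to_data_seq() are hypothetical names, and the sketch assumes the subflow sequence number passed in is expressed in the same space as the subflow_seq field of the mapping.

```c
#include <stdbool.h>
#include <stdint.h>

/* Subset of the mapping fields from struct mptcp_skb_cb above. */
struct dss_map {
	uint64_t data_seq;	/* start of the mapping in MPTCP sequence space */
	uint32_t subflow_seq;	/* start of the mapping in subflow sequence space */
	uint16_t dll;		/* data-level length of the mapping, in bytes */
};

/*
 * Translate a subflow sequence number into the 64-bit MPTCP sequence space.
 * Returns false if ssn lies outside the range covered by the mapping.
 */
static bool dss_map_to_data_seq(const struct dss_map *map, uint32_t ssn,
				uint64_t *dsn)
{
	/* Unsigned subtraction handles 32-bit sequence wraparound. */
	uint32_t offset = ssn - map->subflow_seq;

	if (offset >= map->dll)
		return false;

	*dsn = map->data_seq + offset;
	return true;
}
```

Because dll is a 16-bit field, a single mapping cannot cover more than 65535 bytes, which is why a long transfer needs a new mapping at least roughly once per 2^16 bytes.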
The approaches we've tried already are the skb_shared_info technique
mentioned in Florian's cover letter, and a variation on Eric's suggestion
(to add a non-initialized area to sk_buff_fclones). I guess we
misunderstood where Eric wanted to put that data - I built an skb
extension using an uninitialized area in sk_buff. Neither of these handled
the issue of sharing the extension bytes, which Florian's patch does.
Sharing that space makes it usable for other purposes (sp, nf_bridge,
kTLS).
I need to finish looking over the patch set, but so far I think it's a
good fit for MPTCP and other uses.
Thanks,
--
Mat Martineau
Intel OTC
3 years, 6 months
Weekly meeting - no meeting on the 22nd of November 2018
by Matthieu Baerts
Hello!
As agreed at the previous meeting, we will skip the one of this week. No
meeting tomorrow then.
Happy holidays for the lucky ones! :-)
Speak to you next week!
Matthieu
--
Matthieu Baerts | R&D Engineer
matthieu.baerts(a)tessares.net
Tessares SA | Hybrid Access Solutions
www.tessares.net
1 Avenue Jean Monnet, 1348 Louvain-la-Neuve, Belgium
3 years, 7 months
[Weekly meetings] MoM - 14th of November 2018
by Matthieu Baerts
Hello,
We just had our 29th meeting with Mat, Peter and Ossama (Intel OTC),
Christoph (Apple) and myself (Tessares).
Thanks again for this new good meeting!
Here are the minutes of the meeting:
Netdev 0x13:
- https://www.netdevconf.org/0x13/
- March 20th to 22nd, 2019
- Prague
- Closing of CFS: Tue, January 15, 2019.
- any ideas for a presentation?
- that would also be a good occasion to meet people and the group
- Christoph might not be present
- other occasions: Linux plumbers. Should be in Portugal next year.
(September 8-10, 2019 | Lisbon, Portugal)
Gerrit:
- the two repos are in sync now.
- use gerrit for review
- topgit: Matth can manage the tree to simplify the process: push
stuff for review, Matth apply them.
- let's push stuff on gerrit
- if gerrit is an issue, we can fallback to ML without issue
- goal: improve process, not adding complexity
git checkout -b <my_branch> master
(...)
git commit -s
git review ## instead of git send-email
- AP: Matth: document typical use cases with commands and put that
on the wiki
- I have a new change for 'master' (full MPTCP on top of
mptcp_trunk)
- I have a new change for 'mptcp_net-next' (new architecture)
- I have a fix for an existing topic
- I want to sync 'mptcp_net-next' with 'net-next'
Next steps:
- same as last time
- new AP for Matth, see [Gerrit] section.
- As usual, update the wiki if there is new stuff to add!
Next meeting:
- We propose to skip the one of next week, as many people are off. The
next one would be on Thursday, the 29th of November.
- Usual time: 17:00 UTC (9am PST, 6pm CET)
- Still open to everyone!
- https://annuel2.framapad.org/p/mptcp_upstreaming_20181129
Feel free to comment on these points and propose new ones for the next
meeting!
Talk to you in two weeks,
Matthieu
--
Matthieu Baerts | R&D Engineer
matthieu.baerts(a)tessares.net
Tessares SA | Hybrid Access Solutions
www.tessares.net
1 Avenue Jean Monnet, 1348 Louvain-la-Neuve, Belgium
3 years, 7 months
Gerrit: use cases
by Matthieu Baerts
Hello,
At the last meeting we had 2 weeks ago, we were interested in using
Gerrit to help review large patch sets.
We were also interested in using TopGit later to ease maintenance of
patches that are "ready to publish" without having to do any rebase on
published branches on this repo.
We would then like to combine the two tools but keep GitHub as the repo
containing the code. We identified gerrithub.io as a free Gerrit service
which mirrors the code on GitHub for us.
The GitHub repo will be used in read-only mode; all pushes should go
via the Gerrit server.
Using both Gerrit and TopGit is easy: we propose Changes on
Gerrit like on any Gerrit server (e.g. by using the 'git-review' tool) but
we don't "Submit" (aka "Merge") them. Instead, someone (the
developer?) can integrate each Change in the TopGit tree.
In order to configure Gerrit properly, we first need to list the
different use cases we want to support. Here are the ones I see for
the moment:
- anybody can create new Changes (aka Pull Request) for the top branch
- can anybody create Changes for others? (pusher != committer) or do
we want to restrict that for a group of people? Should be OK for me to
allow that for anybody
- can anybody propose a new version of other's Changes?
- anybody can read these new Changes
- anybody can create new Changes for any topic branches (Note: ideally
we should prefix these branches, e.g. "t/(...)")
- nobody can force-push (except admins, just in case)
- only a group of people can modify the Topgit tree:
- create new topics (refs/heads/t/* + refs/top-bases/t/*) → Create Reference
- modify topics (push) → Push, Push Merge Commits, Forge Author
Identity and Committer Identity
- delete topics (?)
- only a group of people can update the base of the Topgit tree (sync
with net-next).
- anybody can create a WIP branch (refs/heads/sandbox/${username}/*).
Or only a group of people? (IMO it is maybe better to restrict to a
group of people, anybody can create a fork on Github anyway)
- anybody can comment and give a "Code-Review +1" ("Looks good to me,
but someone else must approve")
- only a group of people can give a "Code-Review +2" ("Looks good to
me, approved")
- only another group of people (QA + CI?) can give a Verified +1
- only admin can manage the Gerrit config
- only a group of people can tag (also with push force permission?)
Please comment if you see other use-cases!
Cheers,
Matt
--
Matthieu Baerts | R&D Engineer
matthieu.baerts(a)tessares.net
Tessares SA | Hybrid Access Solutions
www.tessares.net
1 Avenue Jean Monnet, 1348 Louvain-la-Neuve, Belgium
3 years, 7 months