Filename: 329-traffic-splitting.txt
Status: Draft

0. Status

  This proposal describes the Conflux [CONFLUX] system developed by
  Mashael AlSabah, Kevin Bauer, Tariq Elahi, and Ian Goldberg. It aims to
  improve Tor client network performance by dynamically splitting traffic
  between two circuits.

1. Overview

1.1. Multipath TCP Design Space

  In order to understand our improvements to Conflux, it is important to
  properly conceptualize what is involved in the design of multipath
  algorithms in general.

  The design space is broken into two orthogonal parts: congestion
  control algorithms that apply to each path, and traffic scheduling
  algorithms that decide when to send packets on each path.

  MPTCP specifies 'coupled' congestion control (see [COUPLED]). Coupled
  congestion control updates single-path congestion control algorithms to
  account for shared bottlenecks between the paths, so that the combined
  congestion control algorithms do not overwhelm any bottlenecks that
  happen to be shared between the multiple paths. Various ways of
  accomplishing this have been proposed and implemented in the Linux
  kernel.

  Because Tor's congestion control only concerns itself with bottlenecks
  in Tor relay queues, and not with any other bottlenecks (such as
  intermediate Internet routers), we can avoid this complexity merely by
  specifying that any paths that are constructed should not share any
  relays. In this way, we can proceed to use the exact same congestion
  control as specified in Proposal 324, for each path.

  For this reason, this proposal will focus on the traffic scheduling
  algorithms, rather than coupling. We propose three candidate algorithms
  that have been studied in the literature, and will compare their
  performance using simulation and consensus parameters.

1.2. Divergence from the initial Conflux design

  The initial [CONFLUX] paper doesn't provide any indication of how to
  handle the size of the out-of-order cell queue, which we consider a
  potentially dangerous memory DoS vector (see [MEMORY_DOS]). It also
  used RTT as the sole heuristic for selecting which circuit to send on;
  RTT varies with the geographical locations of the participating relays,
  and does not reflect their actual available circuit capacity (which
  will be available to us via Proposal 324). Additionally, since the
  publication of [CONFLUX], more modern packet scheduling algorithms have
  been developed, which aim to reduce out-of-order queue size.

  We propose mitigations for these issues using modern scheduling
  algorithms, as well as implementation options for avoiding the
  out-of-order queue at Exit relays. Additionally, we consider
  resumption, side channel, and traffic analysis risks and benefits in
  [RESUMPTION], [SIDE_CHANNELS] and [TRAFFIC_ANALYSIS].

2. Design

  The following section describes the Conflux design. Each sub-section is
  a building block of the multipath design that Conflux proposes.

  The circuit construction is as follows:

             Primary Circuit (lower RTT)
           +-------+      +--------+
           |Guard 1|----->|Middle 1|----------+
           +---^---+      +--------+          |
  +-----+      |                           +--v---+
  | OP  +------+                           | Exit |--> ...
  +-----+      |                           +--^---+
           +---v---+      +--------+          |
           |Guard 2|----->|Middle 2|----------+
           +-------+      +--------+
             Secondary Circuit (higher RTT)

  Both circuits are built using current Tor path selection; however, they
  SHOULD NOT share the same Guard relay or middle relay. By avoiding the
  use of the same relays in these positions in the path, we ensure
  additional path capacity, and eliminate the need to use the more
  complicated 'coupled' congestion control algorithms from the MPTCP
  literature [COUPLED]. This both simplifies the design and improves
  performance.

  Then, the OP needs to link the two circuits together, as described in
  [LINKING_CIRCUITS], [LINKING_EXIT], and [LINKING_SERVICE].

  For ease of explanation, the primary circuit is the circuit with lower
  RTT, and the secondary circuit is the circuit with higher RTT. Initial
  RTT is measured during circuit linking, as described in
  [LINKING_CIRCUITS]. RTT is continually measured using SENDME timing, as
  in Proposal 324. This means that during use, the primary circuit and
  secondary circuit may switch roles, depending on unrelated network
  congestion caused by other Tor clients.

  We also support linking onion service circuits together. In this case,
  only two rendezvous circuits are linked. Each of these RP circuits will
  be constructed separately, and then linked. However, the same path
  constraints apply to each half of the circuits (no shared relays
  between the legs). Should, by chance, the service and the client sides
  end up sharing some relays, this is not catastrophic. Multipath TCP
  researchers we have consulted believe Tor's congestion control from
  Proposal 324 to be sufficient in this rare case.

  Only two circuits SHOULD be linked together. However, implementations
  SHOULD make it easy for researchers to *test* more than two paths, as
  this has been shown to assist in traffic analysis resistance
  [WTF_SPLIT]. At minimum, this means not hardcoding only two circuits in
  the implementation.

  If the number of circuits exceeds the current number of guard relays,
  guard relays MAY be re-used, but implementations SHOULD use the same
  number of Guards as paths.

  Linked circuits MUST NOT be extended further once linked (i.e.
  'cannibalization' is not supported).
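
  Below is a minimal sketch of the leg-disjointness constraint described
  above, in Python-style pseudocode. The pick_path() helper stands in for
  Tor's normal path selection and is hypothetical; the only point
  illustrated is rejecting candidate legs that share any relay with an
  existing leg.

      def legs_are_disjoint(leg_a, leg_b):
          # Compare relays by fingerprint; no relay may appear in both legs.
          return not (set(r.fingerprint for r in leg_a) &
                      set(r.fingerprint for r in leg_b))

      def build_conflux_legs(pick_path, max_tries=32):
          first = pick_path()              # normal Tor path selection
          for _ in range(max_tries):
              second = pick_path()
              if legs_are_disjoint(first, second):
                  return first, second     # safe to link these two circuits
          raise RuntimeError("could not find a disjoint second leg")
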
2.1. Advertising support for conflux

  We propose a new protocol version in order to advertise support for
  circuit linking on the relay side:

     "Relay=4" -- Relay supports a 2 byte sequence number in a RELAY cell
                  header used for multipath circuits which are linked
                  with the new RELAY_CIRCUIT_LINK relay cell command.

  XXX: Advertise this in the onion service descriptor.
  XXX: Can the onion service descriptor advertise more than two circuits?

  The next section describes how the circuits are linked together.

2.2. Linking circuits [LINKING_CIRCUITS]

  To link circuits, we propose new relay commands that are sent on both
  circuits, as well as a response to confirm the join, and an ack of this
  response. These commands create a 3-way handshake, which allows each
  endpoint to measure the initial RTT of each leg upon link, without
  needing to wait for any data.

  All three stages of this handshake are sent on *each* circuit leg to be
  linked.

  To save round trips, these cells SHOULD be combined with the initial
  RELAY_BEGIN cell on the faster circuit leg, using Proposal 325. See
  [LINKING_EXIT] and [LINKING_SERVICE] for more details on setup in each
  case.

  There are other ways to do this linking that we have considered, but
  they seem not to be significantly better than this method, especially
  since we can use Proposal 325 to eliminate the RTT cost of this setup
  before sending data. For those other ideas, see [ALTERNATIVE_LINKING]
  and [ALTERNATIVE_RTT], in the appendix.

  The first two parts of the handshake establish the link, and enable
  resumption:

     16 -- RELAY_CIRCUIT_LINK

           Sent from the OP to the exit/service in order to link
           circuits together at the end point.

     17 -- RELAY_CIRCUIT_LINKED

           Sent from the exit/service to the OP, to confirm the circuits
           were linked.

  These cells have the following contents:

     VERSION   [1 byte]
     PAYLOAD   [variable, up to end of relay payload]

  The VERSION tells us which circuit linking mechanism to use. At this
  point in time, only 0x01 is recognized and is the one described by the
  Conflux design.

  For version 0x01, the PAYLOAD contains:

     NONCE              [32 bytes]
     LAST_SEQNO_SENT    [8 bytes]
     LAST_SEQNO_RECV    [8 bytes]

  XXX: Should we let endpoints specify their preferred [SCHEDULING] alg
  here, to override consensus params? This has benefits: e.g. low-memory
  mobile clients can ask for an alg that is better for their reorder
  queues. But it also has complexity risk, if the other endpoint does not
  want to support it, because of its own memory issues.

  The NONCE contains a random 256-bit secret, used to associate the two
  circuits together. The nonce must not be shared outside of the circuit
  transmission, or data may be injected into TCP streams. This means it
  MUST NOT be logged to disk.

  The two sequence number fields are 0 upon initial link, but non-zero in
  the case of a resumption attempt (see [RESUMPTION]).

  If either circuit does not receive a RELAY_CIRCUIT_LINKED response,
  both circuits MUST be closed.

  The third stage of the handshake exists to help the exit/service
  measure initial RTT, for use in [SCHEDULING]:

     18 -- RELAY_CIRCUIT_LINKED_RTT_ACK

           Sent from the OP to the exit/service, to provide initial RTT
           measurement for the exit/service.

  For timeout of the handshake, clients should use the normal
  SOCKS/stream timeout already in use for RELAY_BEGIN.

  These three relay commands (RELAY_CIRCUIT_LINK, RELAY_CIRCUIT_LINKED,
  and RELAY_CIRCUIT_LINKED_RTT_ACK) are sent on *each* leg, to allow each
  endpoint to measure the initial RTT of each leg.
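
  As an illustration only, here is a minimal sketch of packing and
  parsing the version 0x01 link payload described above (1-byte version,
  32-byte nonce, and two 8-byte big-endian sequence numbers). The field
  layout follows this section; the helper names are hypothetical.

      import os, struct

      LINK_V1 = 0x01

      def encode_link_payload(nonce, last_sent=0, last_recv=0):
          # VERSION || NONCE(32) || LAST_SEQNO_SENT(8) || LAST_SEQNO_RECV(8)
          assert len(nonce) == 32
          return struct.pack("!B32sQQ", LINK_V1, nonce, last_sent, last_recv)

      def decode_link_payload(payload):
          version, nonce, last_sent, last_recv = struct.unpack_from("!B32sQQ",
                                                                    payload)
          if version != LINK_V1:
              raise ValueError("unknown conflux link version")
          return nonce, last_sent, last_recv

      # Initial link: fresh nonce, both sequence numbers zero.
      payload = encode_link_payload(os.urandom(32))
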
2.3. Linking Circuits from OP to Exit [LINKING_EXIT]

  To link exit circuits, two circuits to the same exit are built. The
  client records the circuit build time of each.

  If the circuits are being built on-demand, for immediate use, the
  circuit with the lower build time SHOULD use Proposal 325 to append its
  first RELAY cell to the RELAY_CIRCUIT_LINK cell. The exit MUST respond
  on this same leg. After that, actual RTT measurements MUST be used to
  determine future transmissions, as specified in [SCHEDULING].

  The RTT times between RELAY_CIRCUIT_LINK and RELAY_CIRCUIT_LINKED are
  measured by the client, to determine each circuit's RTT, which decides
  primary vs secondary circuit use, and is used for packet scheduling.
  Similarly, the exit measures the RTT times between RELAY_CIRCUIT_LINKED
  and RELAY_CIRCUIT_LINKED_RTT_ACK, for the same purpose.
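
  The sketch below illustrates, under the assumptions of this section,
  how a client might derive the initial RTT of each leg from the
  timestamps of the link handshake and pick the primary leg; the leg
  objects and their timestamp fields are hypothetical.

      def measure_initial_rtt(leg):
          # Timestamps recorded when RELAY_CIRCUIT_LINK was sent and when
          # RELAY_CIRCUIT_LINKED came back on this leg.
          return leg.linked_recv_time - leg.link_sent_time

      def pick_primary(leg_a, leg_b):
          # The leg with the lower measured RTT starts out as primary;
          # roles may later switch as SENDME-based RTT estimates update.
          rtt_a, rtt_b = measure_initial_rtt(leg_a), measure_initial_rtt(leg_b)
          return (leg_a, leg_b) if rtt_a <= rtt_b else (leg_b, leg_a)
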
2.4. Linking circuits to an onion service [LINKING_SERVICE]

  For onion services, we will only concern ourselves with linking
  rendezvous circuits.

  To join rendezvous circuits, clients make two introduce requests to a
  service's intropoint, causing it to create two rendezvous circuits, to
  meet the client at two separate rendezvous points. These introduce
  requests MUST be sent to the same intropoint (due to potential use of
  onionbalance), and SHOULD be sent back-to-back on the same intro
  circuit. They MAY be combined with Proposal 325.

  The first rendezvous circuit to be joined SHOULD use Proposal 325 to
  append the RELAY_BEGIN command, and the service MUST answer on this
  circuit, until RTT can be measured.

  Once both circuits are linked and RTT is measured, packet scheduling
  should be used, as per [SCHEDULING].

2.5. Congestion Control Application [CONGESTION_CONTROL]

  The SENDMEs for congestion control are performed per-leg. As data
  arrives, regardless of its ordering, it is counted towards SENDME
  delivery. In this way, 'cwnd - package_window' of each leg always
  reflects the available data to send on each leg. This is important for
  [SCHEDULING].

  The congestion control stream XON/XOFF can be sent on either leg, and
  applies to the stream's transmission on both legs.
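
  As a small sketch of the per-leg accounting described above (leg
  objects, the send_sendme() helper, and the SENDME spacing constant are
  all hypothetical placeholders; the actual spacing is governed by the
  congestion control of Proposal 324):

      SENDME_INC = 100   # placeholder spacing; set by congestion control

      def available(leg):
          # How many more cells this leg may carry right now.
          return leg.cwnd - leg.package_window

      def on_cell_delivered(leg):
          # Every delivered cell counts toward SENDME generation on its
          # own leg, regardless of cross-leg ordering.
          leg.delivered_since_sendme += 1
          if leg.delivered_since_sendme >= SENDME_INC:
              leg.delivered_since_sendme = 0
              send_sendme(leg)   # hypothetical helper
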
2.6. Sequencing [SEQUENCING]

  With multiple paths for data, the problem of data re-ordering appears.
  In other words, cells can arrive out of order from the two circuits,
  where cell N+1 arrives before cell N.

  Handling this reordering operates after congestion control for each
  circuit leg, but before relay cell command processing or stream data
  delivery.

  For the receiver to be able to reorder the received cells, a sequencing
  scheme needs to be implemented. However, because Tor does not drop or
  reorder packets inside of a circuit, this sequence number can be very
  small. It only has to signal that a cell comes after those arriving on
  another circuit.

  To achieve this, we add a small sequence number to the common relay
  header for all relay cells on linked circuits. This sequence number is
  meant to signal the number of cells sent on the *other* leg, so that
  each endpoint knows how many cells are still in-flight on another leg.
  It is different from the absolute sequence number used in
  [LINKING_CIRCUITS] and [RESUMPTION], but can be derived from that
  number, using relative arithmetic.

     Relay command   [1 byte]
     Recognized      [2 bytes]
     StreamID        [2 bytes]
     Digest          [4 bytes]
     Length          [2 bytes]
     > LongSeq       [1 bit]    # If this bit is set, use 31 bits for Seq
     > Sequencing    [7 or 31 bits]
     Data            [Remainder]

  The sequence number is only set for the first cell after the endpoint
  switches legs. In this case, LongSeq is set to 1, and the Sequencing
  field is 31 more bits. Otherwise, the field is a single zero byte.

  These fields MUST be present on ALL end-to-end relay cells on each leg
  that come from the endpoint, following a RELAY_CIRCUIT_LINK command.

  They are absent on 'leaky pipe' RELAY_COMMAND_DROP and
  RELAY_COMMAND_PADDING_NEGOTIATED cells that come from middle relays, as
  opposed to the endpoint, to support padding.

  When an endpoint switches legs, on the first cell in a new leg, LongSeq
  is set to 1, and the following 31 bits represent the *total* number of
  cells sent on the *other* leg, before the switch. The receiver must
  wait for that number of cells to arrive from the previous leg before
  delivering that cell.

  XXX: In the rare event that we send more than 2^31 cells (~1TB) on a
  single leg, do we force a switch of legs, or expand the field further?

  An alternative method of sequencing, which assumes that the endpoint
  knows it is going to switch one cell before it switches, is specified
  in [ALTERNATIVE_SEQUENCING]. Note that that method requires only 1 byte
  for sequence number and switch signaling, but requires that the sender
  know, one cell in advance, that it is planning to switch. (This is
  possible with [BLEST_TOR], but [LOWRTT_TOR] can switch based on RTT
  change, so it may be one cell late in that case.)
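
  As a sketch only, the LongSeq encoding described above could be packed
  and parsed as follows (big-endian, with the high bit of the first byte
  acting as the LongSeq flag); these helper names are illustrative, not
  part of the spec.

      import struct

      def encode_seq_field(cells_sent_on_other_leg=None):
          if cells_sent_on_other_leg is None:
              # No leg switch: LongSeq=0 plus a 7-bit zero value, one byte.
              return b"\x00"
          assert cells_sent_on_other_leg < (1 << 31)
          # Leg switch: LongSeq=1 plus a 31-bit count, four bytes total.
          return struct.pack("!I", (1 << 31) | cells_sent_on_other_leg)

      def decode_seq_field(data):
          # Returns (cells_to_wait_for_or_None, bytes_consumed).
          if data[0] & 0x80:
              (value,) = struct.unpack_from("!I", data)
              return value & 0x7fffffff, 4
          return None, 1
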
2.7. Resumption [RESUMPTION]

  In the event that a circuit leg is destroyed, it MAY be resumed.

  Resumption is achieved by re-using the NONCE, with the same linking
  method to the same endpoint (either [LINKING_EXIT] or
  [LINKING_SERVICE]). The resumed path need not use the same middle and
  guard relays, but should not share any relays with any existing leg(s).

  To provide resumption, endpoints store an absolute 64-bit cell counter
  of the last cell they have sent on a conflux pair (their
  LAST_SEQNO_SENT), as well as the last sequence number they have
  delivered in-order to edge connections corresponding to a conflux pair
  (their LAST_SEQNO_RECV). Additionally, endpoints MAY store the entire
  contents of unacked inflight cells (i.e. the 'package_window' from
  Proposal 324), for each leg, along with information corresponding to
  those cells' absolute sequence numbers.

  These 64-bit absolute counters can wrap without issue, as congestion
  windows will never grow to 2^64 cells until well past the Singularity.
  However, it is possible that extremely long, bulk circuits could exceed
  2^64 total sent or received cells, so endpoints SHOULD handle wrapped
  sequence numbers for purposes of computing retransmit information. (But
  even this case is unlikely to happen within the next decade or so.)

  Upon resumption, the LAST_SEQNO_SENT and LAST_SEQNO_RECV fields are
  used to convey the sequence numbers of the last cell the relay sent and
  received on that leg. The other endpoint can use these sequence numbers
  to determine if it received the in-flight data or not, or sent more
  data since that point, up to and including this absolute sequence
  number. If LAST_SEQNO_SENT has not been received, the endpoint MAY
  transmit the missing data, if it still has it buffered.

  Because both endpoints get information about the other side's absolute
  SENT sequence number, they will know exactly how many re-transmitted
  packets to expect, should the circuit stay open. Re-transmitters should
  not re-increment their absolute sent fields while re-transmitting.

  If an endpoint does not have this missing data, due to memory pressure,
  it should destroy *both* legs, as this represents unrecoverable data
  loss.

  Otherwise, the new circuit can be re-joined, and its RTT can be
  compared to the remaining circuit to determine if the new leg is
  primary or secondary.

  It is even possible to resume conflux circuits where both legs have
  been collapsed using this scheme, if endpoints continue to buffer their
  unacked package_window data for some time after this close. However,
  see [TRAFFIC_ANALYSIS] for more details on the full scope of this
  issue.

  If endpoints are buffering package_window data, such data should be
  given priority to be freed in any oomkiller invocation. See
  [MEMORY_DOS] for more oomkiller information.
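
  The retransmit bookkeeping described above amounts to comparing the
  peer's LAST_SEQNO_RECV against our own absolute sent counter. A minimal
  sketch, assuming a hypothetical retransmit buffer keyed by absolute
  sequence number (64-bit wrap handling omitted for brevity):

      def handle_resume_link(my_last_sent, peer_last_recv, retransmit_buf):
          # Cells with absolute sequence numbers in
          # (peer_last_recv, my_last_sent] never arrived and must be
          # re-sent, without re-incrementing the absolute sent counter.
          needed = range(peer_last_recv + 1, my_last_sent + 1)
          if any(seq not in retransmit_buf for seq in needed):
              # Data already freed under memory pressure: unrecoverable,
              # so the caller should destroy *both* legs.
              return None
          return [retransmit_buf[seq] for seq in needed]
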
3. Traffic Scheduling [SCHEDULING]

  In order to load balance the traffic between the two circuits, the
  original Conflux paper used only RTT. However, with Proposal 324, we
  will have accurate information on the instantaneous available bandwidth
  of each circuit leg, as 'cwnd - package_window' (see Section 3 of
  Proposal 324).

  Some additional RTT optimizations are also useful, to improve
  responsiveness and minimize out-of-order queue sizes.

  We specify two traffic schedulers from the multipath literature and
  adapt them to Tor: [LOWRTT_TOR] and [BLEST_TOR]. [LOWRTT_TOR] also has
  three variants, with different trade-offs.

  However, see the [TRAFFIC_ANALYSIS] sections of this proposal for
  important details on how this selection can be changed, to reduce
  website traffic fingerprinting.

3.1. LowRTT Scheduling [LOWRTT_TOR]

  This scheduling algorithm is based on the original [CONFLUX] paper,
  with ideas from [MPTCP]'s minRTT/LowRTT scheduler.

  In this algorithm, endpoints send cells on the circuit with lower RTT
  (the primary circuit). This continues while the congestion window on
  the circuit has available room: i.e. whenever
  cwnd - package_window > 0.

  Whenever the primary circuit's congestion window becomes full, the
  secondary circuit is used. We stop reading on the send window source
  (edge connection) when both congestion windows become full.

  In this way, unlike the original Conflux, we switch to the secondary
  circuit without causing congestion on the primary circuit. This
  improves both load times, and overall throughput.

  This behavior matches minRTT from [MPTCP], sometimes called LowRTT.

  It may be better to stop reading on the edge connection when the
  primary congestion window becomes full, rather than switch to the
  secondary circuit as soon as the primary congestion window becomes
  full. (I.e. only switch if the RTTs themselves change which circuit is
  primary.) This is what was done in the original Conflux paper. This
  behavior effectively causes us to optimize for responsiveness and
  congestion avoidance, rather than throughput. For evaluation, we should
  control this switching behavior with a consensus parameter (see
  [CONSENSUS_PARAMETERS]).

  Because of potential side channel risk (see [SIDE_CHANNELS]), a third
  variant of this algorithm, where the primary circuit is chosen during
  the [LINKING_CIRCUITS] handshake and never changed, should also be
  possible to control via consensus parameter.
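
  For comparison with the BLEST pseudocode in the next section, here is
  an illustrative sketch of this scheduler's main loop (first variant:
  switch to the secondary leg whenever the primary's window is full); the
  leg and source objects, and the sort_by_rtt() helper, are hypothetical.

      while source.has_data_to_send():
          # Re-evaluate which leg is primary, since SENDME-based RTT
          # estimates change during use.
          primary, secondary = sort_by_rtt(leg_a, leg_b)

          if primary.cwnd - primary.package_window > 0:
              primary.send(source.get_packet())
          elif secondary.cwnd - secondary.package_window > 0:
              secondary.send(source.get_packet())
          else:
              break  # both windows full; stop reading until a SENDME arrives
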
3.2. BLEST Scheduling [BLEST_TOR]

  [BLEST] attempts to predict the availability of the primary circuit,
  and use this information to reorder transmitted data, to minimize
  head-of-line blocking in the recipient (and thus minimize out-of-order
  queues there).

  BLEST_TOR uses the primary circuit until the congestion window is full.
  Then, it uses the relative RTT times of the two circuits to calculate
  how much data can be sent on the secondary circuit faster than if we
  just waited for the primary circuit to become available.

  This is achieved by computing two variables at the sender:

     rtts = secondary.currRTT / primary.currRTT
     primary_limit = (primary.cwnd + (rtts-1)/2)*rtts

  Note: This (rtts-1)/2 factor represents anticipated congestion window
  growth over this period. It may be different for Tor, depending on the
  CC alg.

  If primary_limit < secondary.cwnd - (secondary.package_window + 1),
  then there is enough space on the secondary circuit to send data faster
  than we could by waiting for the primary circuit.

  XXX: Note that BLEST uses total_send_window where we use secondary.cwnd
  in this check. total_send_window is min(recv_win, CWND). But since Tor
  does not use receive windows and instead uses stream XON/XOFF, we only
  use CWND. There is some concern this may alter BLEST's buffer
  minimization properties, but since the receive window should only
  matter if the application is slower than Tor, and XON/XOFF should cover
  that case, hopefully this is fine.

  Otherwise, if the primary_limit condition is not hit, cease reading on
  source edge connections until SENDME acks come back.

  Here is the pseudocode for this:

     while source.has_data_to_send():
        if primary.cwnd > primary.package_window:
           primary.send(source.get_packet())
           continue

        rtts = secondary.currRTT / primary.currRTT
        primary_limit = (primary.cwnd + (rtts-1)/2)*rtts

        if primary_limit < secondary.cwnd - (secondary.package_window+1):
           secondary.send(source.get_packet())
        else:
           break # wait for a SENDME to free up CWND space, then restart

  Note that BLEST also has a parameter lambda that is updated whenever
  head-of-line (HoL) blocking occurs. Because it is expensive and takes
  significant time to signal this over Tor, we omit this.

  XXX: See the [REORDER_SIGNALING] section if we want this lambda
  feedback.
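
  As an illustrative worked example (numbers chosen arbitrarily): if
  primary.currRTT = 200ms, secondary.currRTT = 300ms, and
  primary.cwnd = 100 cells, then rtts = 300/200 = 1.5 and
  primary_limit = (100 + (1.5-1)/2) * 1.5 = (100 + 0.25) * 1.5 ~= 150.4.
  The secondary leg would only be used if
  secondary.cwnd - (secondary.package_window + 1) exceeds roughly 150
  cells; otherwise the sender stops reading from the edge connection and
  waits for the primary leg's SENDMEs.
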
3.3. Reorder queue signaling [REORDER_SIGNALING]

  Reordering should be a fairly simple task. By using the sequence number
  field in [SEQUENCING], endpoints can know how many cells are still in
  flight on the other leg.

  To reorder them properly, a buffer of out-of-order cells needs to be
  kept. On the Exit side, this can quickly become overwhelming,
  considering that tens of thousands of circuits can be held open,
  leading to gigabytes of memory being used. This is a clear potential
  memory DoS vector, which means that a Tor implementation should be able
  to limit the size of those queues.

  Luckily, [BLEST_TOR] and the form of [LOWRTT_TOR] that only uses the
  primary circuit will minimize or eliminate this out-of-order buffer.

  XXX: The remainder of this section may be over-complicating things...
  We only need these concepts if we want to use BLEST's lambda feedback.

  The default for this queue size is governed by the
  'cflx_reorder_client' and 'cflx_reorder_srv' consensus parameters (see
  [CONSENSUS_PARAMS]). 'cflx_reorder_srv' applies to Exits and onion
  services. Both parameters can be overridden by Torrc, to larger or
  smaller than the consensus parameter. (Low memory clients may want to
  lower it; SecureDrop onion services or other high-upload services may
  want to raise it.)

  When the reorder queue hits this size, a RELAY_CONFLUX_XOFF is sent
  down the circuit leg that has data waiting in the queue, and use of
  that leg must cease until it drains to half of this value, at which
  point a RELAY_CONFLUX_XON is sent. Note that this is different from the
  stream XON/XOFF from Proposal 324.

  XXX: [BLEST] actually does not cease use of a path in this case, but
  instead uses this signal to adjust the lambda parameter, which biases
  traffic away from that leg.
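
  The following sketch illustrates the reorder queue and the XOFF/XON
  thresholds described above; the queue limit, deliver() hook, and
  send_on_leg() helper are hypothetical.

      import heapq

      class ReorderQueue:
          def __init__(self, limit):
              self.limit = limit          # e.g. from cflx_reorder_* params
              self.next_seq = 1           # next absolute seqno to deliver
              self.heap = []              # out-of-order cells, keyed by seqno
              self.xoff_sent = False

          def receive(self, seq, cell, leg):
              heapq.heappush(self.heap, (seq, cell))
              # Deliver any cells that are now in order.
              while self.heap and self.heap[0][0] == self.next_seq:
                  _, ready = heapq.heappop(self.heap)
                  deliver(ready)          # hypothetical: pass to streams
                  self.next_seq += 1
              # Queue too large: XOFF the leg whose data is waiting here.
              if len(self.heap) >= self.limit and not self.xoff_sent:
                  send_on_leg(leg, "RELAY_CONFLUX_XOFF")
                  self.xoff_sent = True
              # Queue drained to half the limit: allow that leg again.
              elif self.xoff_sent and len(self.heap) <= self.limit // 2:
                  send_on_leg(leg, "RELAY_CONFLUX_XON")
                  self.xoff_sent = False
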
4. Security Considerations

4.1. Memory Denial of Service [MEMORY_DOS]

  Both reorder queues and retransmit buffers inherently represent a
  memory denial of service condition.

  For [RESUMPTION] retransmit buffers, endpoints that support this
  feature SHOULD free retransmit information as soon as they get close to
  memory pressure. This prevents resumption while data is in flight, but
  will not otherwise harm operation.

  For reorder buffers, adversaries can potentially impact this at any
  point, but most obviously and most severely from the client position.

  In particular, clients can lie about sequence numbers, sending cells
  with sequence numbers such that the next expected sequence number is
  never sent. They can do this repeatedly on many circuits, to exhaust
  memory at exits.

  One option is to only allow actual traffic splitting in the downstream
  direction, towards clients, and always use the primary circuit for
  everything in the upstream direction. However, the ability to support
  conflux from the client to the exit shows promise against traffic
  analysis (see [WTF_SPLIT]).

  The other option is to use [BLEST_TOR] from clients to exits, as it has
  predictable interleaved cell scheduling, and minimizes reorder queues
  at exits. If the ratios prescribed by that algorithm are not followed
  within some bounds, the other endpoint can close both circuits, and
  free the queue memory.

  This still leaves the possibility that intermediate relays may block a
  leg, allowing cells to traverse only one leg, thus still accumulating
  in the reorder queue. Clients can also spoof sequence numbers
  similarly, to make it appear that they are following [BLEST_TOR],
  without actually sending any data on one of the legs.

  To handle either of these cases, when a relay is under memory pressure,
  the circuit OOM killer SHOULD free and close circuits with the oldest
  reorder queue data first. This heuristic was shown to be best during
  the [SNIPER] attack OOM killer iteration cycle.
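
  As a sketch of the OOM heuristic described above (circuit objects and
  the byte target are hypothetical), freeing the circuits whose oldest
  queued cell is the stalest first might look like:

      def oom_free_conflux_queues(circuits, bytes_to_free):
          # Sort by the arrival time of the oldest cell still waiting in
          # the reorder queue; the stalest queue is the likeliest attack.
          victims = sorted((c for c in circuits if c.reorder_queue),
                           key=lambda c: c.oldest_queued_cell_time)
          freed = 0
          for circ in victims:
              freed += circ.reorder_queue_bytes
              circ.close_both_legs()     # frees the queued cells
              if freed >= bytes_to_free:
                  break
          return freed
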
4.2. Side Channels [SIDE_CHANNELS]

  Two potential side channels may be introduced by the use of Conflux:
     1. RTT leg-use bias by altering SENDME latency
     2. Location info leaks through the use of both legs' latencies

  For RTT and leg-use bias, Guard relays could delay legs to introduce a
  pattern into the delivery of cells at the exit relay, by varying the
  latency of SENDME cells (every 100th cell) to change the distribution
  of traffic, in order to send information. This attack could be
  performed in either direction of traffic, to bias traffic load off of a
  particular Guard. If an adversary controls both Guards, it could in
  theory send a binary signal more easily, by alternating delays on each.

  However, this risk must be weighed against the potential benefits
  against traffic fingerprinting, as per [WTF_SPLIT]. Additionally, even
  ignoring cryptographic tagging attacks, this side channel provides
  significantly lower information over time than inter-packet-delay based
  side channels that are already available to Guards and routers along
  the path to the Guard.

  Tor currently provides no defenses against already existing
  single-circuit delay-based side channels, though both circuit padding
  and [BACKLIT] are potential options it could conceivably deploy. The
  [BACKLIT] paper also has an excellent review of the various methods
  that have been studied for such single circuit side channels, and the
  [BACKLIT] style RTT monitoring could be used to protect against these
  conflux side channels as well. Circuit padding can also help to obscure
  which cells are SENDMEs, since circuit padding is not counted towards
  SENDME totals.

  The second class of side channel is where the Exit relay may be able to
  use the two legs to infer more information about client location. See
  [LATENCY_LEAK] for more details. It is unclear at this time how much
  more severe this is for two paths than for just one.

  We should preserve the ability to disable conflux to and from Exit
  relays, should these side channels prove more severe, or should it
  prove possible to mitigate single-circuit side channels, but not
  conflux side channels.

  In all cases, all of these side channels appear less severe for onion
  service traffic, due to the higher path variability from relay
  selection, as well as the end-to-end nature of conflux in that case.
  This indicates that our ability to enable/disable conflux for services
  should be separate from Exits.

4.3. Traffic analysis [TRAFFIC_ANALYSIS]

  Even though conflux shows benefits against traffic analysis in
  [WTF_SPLIT], these gains may be moot if the adversary is able to
  perform packet counting and timing analysis at guards to guess which
  specific circuits are linked. In particular, the 3-way handshake in
  [LINKING_CIRCUITS] may be quite noticeable.

  As one countermeasure, it may be possible to eliminate the third stage
  of the handshake (RELAY_CIRCUIT_LINKED_RTT_ACK) by computing the
  exit/service RTT via measuring the time between CREATED/REND_JOINED and
  RELAY_CIRCUIT_LINK, but this will introduce cross-component complexity
  into Tor's protocol that could quickly become unwieldy and fragile.

  Additionally, the conflux handshake may make onion services stand out
  more, regardless of the number of stages in the handshake. For this
  reason, it may be wiser to simply address these issues with circuit
  padding machines during circuit setup (see padding-spec.txt).

  Additional traffic analysis considerations arise when combining conflux
  with padding, for purposes of mitigating traffic fingerprinting. For
  this, it seems wise to treat the packet schedulers as another piece of
  a combined optimization problem in tandem with optimizing padding
  machines, perhaps introducing randomness or fudge factors into their
  scheduling, as a parameterized distribution. For details, see
  https://github.com/torproject/tor/blob/master/doc/HACKING/CircuitPaddingDevelopment.md

  Finally, conflux may exacerbate forms of confirmation-based traffic
  analysis that close circuits to determine concretely if they were in
  use, since closing either leg might cause resumption to fail. TCP RST
  injection can perform this attack from the side, without surveillance
  capability. [RESUMPTION] with buffering of the inflight unacked
  package_window data, for retransmit, is a partial mitigation, if
  endpoints buffer this data for retransmission for a brief time even if
  both legs close. This seems more feasible for onion services, which are
  more vulnerable to this attack. However, if the adversary controls the
  client, they will notice the resumption re-link, and still obtain
  confirmation that way.

  It seems the only way to fully mitigate these kinds of attacks is with
  the Snowflake pluggable transport, which provides its own resumption
  and retransmit behavior. Additionally, Snowflake's use of UDP DTLS also
  protects against TCP RST injection, which we suspect to be the main
  vector for such attacks.

  In the future, a DTLS or QUIC transport for Tor such as masque could
  provide similar RST injection resistance, and resumption at
  Guard/Bridge nodes, as well.
+ Even though conflux shows benefits against traffic analysis in
+ [WTF_SPLIT], these gains may be moot if the adversary is able to perform
+ packet counting and timing analysis at guards to guess which specific
+   circuits are linked. In particular, the 3-way handshake in
+   [LINKING_CIRCUITS] may be quite noticeable.
+
+ As one countermeasure, it may be possible to eliminate the third leg
+ (RELAY_CIRCUIT_LINKED_ACK) by computing the exit/service RTT via
+ measuring the time between CREATED/REND_JOINED and RELAY_CIRCUIT_LINK,
+ but this will introduce cross-component complexity into Tor's protocol
+ that could quickly become unwieldy and fragile.
+
+ Additionally, the conflux handshake may make onion services stand out
+ more, regardless of the number of stages in the handshake. For this
+   reason, it may be wiser to simply address these issues with circuit
+ padding machines during circuit setup (see padding-spec.txt).
+
+ Additional traffic analysis considerations arise when combining conflux
+ with padding, for purposes of mitigating traffic fingerprinting. For
+ this, it seems wise to treat the packet schedulers as another piece of a
+ combined optimization problem in tandem with optimizing padding
+   machines, perhaps introducing randomness or fudge factors into their
+   scheduling, as a parameterized distribution. For details, see
+ https://github.com/torproject/tor/blob/master/doc/HACKING/CircuitPaddingDevelopment.md
+
+ Finally, conflux may exacerbate forms of confirmation-based traffic
+ analysis that close circuits to determine concretely if they were in
+   use, since closing either leg might cause resumption to fail. TCP RST
+   injection can perform this attack from the side, without requiring
+   full surveillance capability. [RESUMPTION], with buffering of the
+   in-flight unacked package_window data for retransmit, is a partial
+   mitigation if endpoints buffer this data for retransmission for a
+   brief time even if both legs close. This seems more feasible for
+   onion services, which are
+ more vulnerable to this attack. However, if the adversary controls the
+ client, they will notice the resumption re-link, and still obtain
+ confirmation that way.
+
+ It seems the only way to fully mitigate these kinds of attacks is with
+ the Snowflake pluggable transport, which provides its own resumption and
+ retransmit behavior. Additionally, Snowflake's use of UDP DTLS also
+ protects against TCP RST injection, which we suspect to be the main
+ vector for such attacks.
+
+ In the future, a DTLS or QUIC transport for Tor such as masque could
+ provide similar RST injection resistance, and resumption at Guard/Bridge
+ nodes, as well.
5. System Interactions
@@ -665,7 +676,7 @@ Status: Draft
- EWMA and KIST
- CBT and number of guards
- Onion service circ obfuscation
- - Future UDP (it may increase the need for UDP to buffer before dropping)
+ - Future UDP (may increase need for UDP to buffer before dropping)
- Padding (no sequence numbers on padding cells, as per [SEQUENCING])
- Also, any padding machines may need re-tuning
- No 'cannibalization' of linked circuits
@@ -697,116 +708,116 @@ Appended A [ALTERNATIVES]
A.1 BEGIN/END sequencing [ALTERNATIVE_SEQUENCING]
- In this method of signaling, we increment the sequence number by 1
- only when we switch legs, and use BEGIN/END "bookends" to know that
- all data on a leg has been received.
-
- To achieve this, we add a small sequence number to the common relay
- header for all relay cells on linked circuits, as well as a field to
- signal the beginning of a sequence, intermediate data, and the end
- of a sequence.
-
- Relay command [1 byte]
- Recognized [2 bytes]
- StreamID [2 bytes]
- Digest [4 bytes]
- Length [2 bytes]
- > Switching [2 bits] # 01 = BEGIN, 00 = CONTINUE, 10 = END
- > Sequencing [6 bits]
- Data [PAYLOAD_LEN - 12 - Length bytes]
-
- These fields MUST be present on ALL end-to-end relay cells on each leg
- that come from the endpoint, following a RELAY_CIRCUIT_LINK command.
-
- They are absent on 'leaky pipe' RELAY_COMMAND_DROP and
- RELAY_COMMAND_PADDING_NEGOTIATED cells that come from middle relays,
- as opposed to the endpoint, to support padding.
-
- Sequence numbers are incremented by one when an endpoint switches legs
- to transmit a cell. This number will wrap; implementations should treat
- 0 as the next sequence after 2^6-1. Because we do not expect to support
- significantly more than 2 legs, and much fewer than 63, this is not an
- issue.
-
- The first cell on a new circuit MUST use the BEGIN code for switching.
- Cells are delivered from that circuit until an END switching signal is
- received, even if cells arrive first on another circuit with the next
- sequence number before and END switching field. Recipients MUST only
- deliver cells with a BEGIN, if their Sequencing number is one more than
- the last END.
+ In this method of signaling, we increment the sequence number by 1 only
+ when we switch legs, and use BEGIN/END "bookends" to know that all data
+ on a leg has been received.
+
+ To achieve this, we add a small sequence number to the common relay
+ header for all relay cells on linked circuits, as well as a field to
+ signal the beginning of a sequence, intermediate data, and the end of a
+ sequence.
+
+ Relay command [1 byte]
+ Recognized [2 bytes]
+ StreamID [2 bytes]
+ Digest [4 bytes]
+ Length [2 bytes]
+ > Switching [2 bits] # 01 = BEGIN, 00 = CONTINUE, 10 = END
+ > Sequencing [6 bits]
+ Data [PAYLOAD_LEN - 12 - Length bytes]
+
+ These fields MUST be present on ALL end-to-end relay cells on each leg
+ that come from the endpoint, following a RELAY_CIRCUIT_LINK command.
+
+ They are absent on 'leaky pipe' RELAY_COMMAND_DROP and
+ RELAY_COMMAND_PADDING_NEGOTIATED cells that come from middle relays, as
+ opposed to the endpoint, to support padding.
+
+ Sequence numbers are incremented by one when an endpoint switches legs
+ to transmit a cell. This number will wrap; implementations should treat
+   0 as the next sequence after 2^6-1. Because we do not expect to
+   support significantly more than 2 legs, and certainly far fewer than
+   63, this is not an issue.
+
+ The first cell on a new circuit MUST use the BEGIN code for switching.
+ Cells are delivered from that circuit until an END switching signal is
+   received, even if cells with the next sequence number arrive first
+   on another circuit before an END switching field is seen. Recipients
+   MUST only deliver cells with a BEGIN if their Sequencing number is
+   one more than the last END.
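+
+   As a rough illustration (a hypothetical sketch, not part of this
+   proposal), the one-byte Switching/Sequencing field and the BEGIN
+   delivery rule above could be handled as follows; the bit placement
+   (Switching in the top two bits) and all helper names are assumptions:
+
+       /* Hypothetical helpers for the 2-bit Switching code and the
+        * 6-bit Sequencing counter described above. */
+       #include <stdint.h>
+
+       #define SWITCH_CONTINUE 0x0  /* 00 */
+       #define SWITCH_BEGIN    0x1  /* 01 */
+       #define SWITCH_END      0x2  /* 10 */
+
+       /* Pack Switching into the top 2 bits, Sequencing into the low 6. */
+       static uint8_t seq_field_pack(uint8_t switching, uint8_t seq)
+       {
+         return (uint8_t)(((switching & 0x3) << 6) | (seq & 0x3f));
+       }
+
+       /* Sequence numbers wrap: 0 is the next sequence after 2^6 - 1. */
+       static uint8_t seq_next(uint8_t seq)
+       {
+         return (uint8_t)((seq + 1) & 0x3f);
+       }
+
+       /* A BEGIN is only deliverable if its Sequencing number is one
+        * more (mod 64) than that of the last END received. */
+       static int begin_is_deliverable(uint8_t begin_seq, uint8_t last_end_seq)
+       {
+         return begin_seq == seq_next(last_end_seq);
+       }
+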
A.2 Alternative Link Handshake [ALTERNATIVE_LINKING]
- The circuit linking in [LINKING_CIRCUITS] could be done as encrypted
- ntor onionskin extension fields, similar to those used by v3 onions.
-
- This approach has at least four problems:
- i). For onion services, since the onionskins traverse the intro circuit
- and then return on the rend circuit, this handshake cannot measure
- RTT there.
- ii). Since these onionskins are larger, and have no PFS, an adversary
- at the middle relay knows that the onionskin is for linking, and
- can potentially try to obtain the onionskin key for attacks on
- the link.
- iii). It makes linking circuits more fragile, since they could timeout
- due to CBT, or other issues during construction.
- iv). The overhead in processing this onionskin through onionskin queues
- adds additional time for linking, even in the Exit case, making
- that RTT potentially noisy.
-
- Additionally, it is not clear that this approach actually saves us
- anything in terms of setup time, because we can optimize away the
- linking phase using Proposal 325, to combine initial RELAY_BEGIN cells
- with RELAY_CIRCUIT_LINK.
+ The circuit linking in [LINKING_CIRCUITS] could be done as encrypted
+ ntor onionskin extension fields, similar to those used by v3 onions.
+
+ This approach has at least four problems:
+ i). For onion services, since onionskins traverse the intro circuit
+ and return on the rend circuit, this handshake cannot measure
+ RTT there.
+ ii). Since these onionskins are larger, and have no PFS, an adversary
+ at the middle relay knows that the onionskin is for linking, and
+ can potentially try to obtain the onionskin key for attacks on
+ the link.
+  iii). It makes linking circuits more fragile, since they could time
+        out due to CBT, or hit other issues during construction.
+ iv). The overhead in processing this onionskin in onionskin queues
+ adds additional time for linking, even in the Exit case, making
+ that RTT potentially noisy.
+
+ Additionally, it is not clear that this approach actually saves us
+ anything in terms of setup time, because we can optimize away the
+ linking phase using Proposal 325, to combine initial RELAY_BEGIN cells
+ with RELAY_CIRCUIT_LINK.
A.3. Alternative RTT measurement [ALTERNATIVE_RTT]
- Instead of measuring RTTs during [LINKING_CIRCUITS], we could create
- PING/PONG cells, whose sole purpose is to allow endpoints to measure
- RTT.
-
- This was rejected for several reasons. First, during circuit use, we
- already have SENDMEs to measure RTT. Every 100 cells (or
- 'circwindow_inc' from Proposal 324), we are able to re-measure RTT based
- on the time between that Nth cell and the SENDME ack. So we only need
- PING/PONG to measure initial circuit RTT.
-
- If we were able to use onionskins, as per [ALTERNATIVE_LINKING] above,
- we might be able to specify a PING/PONG/PING handshake solely for
- measuring initial RTT, especially for onion service circuits.
-
- The reason for not making a dedicated PING/PONG for this purpose is that
- it is context-free. Even if we were able to use onionskins for linking
- and resumption, to avoid additional data in handshake that just measures
- RTT, we would have to enforce that this PING/PONG/PING only follows the
- exact form needed by this proposal, at the expected time, and at no
- other points.
-
- If we do not enforce this specific use of PING/PONG/PING, it becomes
- another potential side channel, for use in attacks such as [DROPMARK].
-
- In general, Tor is planning to remove current forms of context-free and
- semantic-free cells from its protocol:
- https://gitlab.torproject.org/tpo/core/torspec/-/issues/39
-
- We should not add more.
+ Instead of measuring RTTs during [LINKING_CIRCUITS], we could create
+ PING/PONG cells, whose sole purpose is to allow endpoints to measure
+ RTT.
+
+ This was rejected for several reasons. First, during circuit use, we
+ already have SENDMEs to measure RTT. Every 100 cells (or
+ 'circwindow_inc' from Proposal 324), we are able to re-measure RTT based
+ on the time between that Nth cell and the SENDME ack. So we only need
+ PING/PONG to measure initial circuit RTT.
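+
+   As a minimal sketch of that SENDME-based measurement (hypothetical
+   names, not part of this proposal), an endpoint would record when the
+   Nth data cell is sent and take the RTT sample when the corresponding
+   SENDME arrives:
+
+       /* Hypothetical sketch: re-estimate a leg's RTT from SENDME
+        * timing. N is 'circwindow_inc' (100 cells by default). */
+       #include <stdint.h>
+
+       typedef struct {
+         uint64_t nth_cell_sent_usec;  /* when the Nth data cell was sent */
+         uint64_t rtt_usec;            /* latest RTT estimate for this leg */
+       } leg_rtt_state_t;
+
+       static void leg_note_nth_cell_sent(leg_rtt_state_t *leg, uint64_t now)
+       {
+         leg->nth_cell_sent_usec = now;
+       }
+
+       static void leg_note_sendme_received(leg_rtt_state_t *leg, uint64_t now)
+       {
+         if (leg->nth_cell_sent_usec)
+           leg->rtt_usec = now - leg->nth_cell_sent_usec;
+       }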
+
+ If we were able to use onionskins, as per [ALTERNATIVE_LINKING] above,
+ we might be able to specify a PING/PONG/PING handshake solely for
+ measuring initial RTT, especially for onion service circuits.
+
+   The reason for not making a dedicated PING/PONG for this purpose is
+   that it is context-free. Even if we were able to use onionskins for
+   linking and resumption, to avoid adding data to the handshake solely
+   to measure RTT, we would still have to enforce that this
+   PING/PONG/PING follows only the exact form needed by this proposal,
+   at the expected time, and at no other points.
+
+ If we do not enforce this specific use of PING/PONG/PING, it becomes
+ another potential side channel, for use in attacks such as [DROPMARK].
+
+ In general, Tor is planning to remove current forms of context-free and
+ semantic-free cells from its protocol:
+ https://gitlab.torproject.org/tpo/core/torspec/-/issues/39
+
+ We should not add more.
Appendix B: Acknowledgments
- Thanks to Per Hurtig for helping us with the framing of the MPTCP
- problem space.
-
- Thanks to Simone Ferlin for clarifications on the [BLEST]
- paper, and for pointing us at the Linux kernel implementation.
-
- Extreme thanks goes again to Toke Høiland-Jørgensen, who helped
- immensely towards our understanding of how the BLEST condition relates
- to edge connection pushback, and for clearing up many other
- misconceptions we had.
-
- Finally, thanks to Mashael AlSabah, Kevin Bauer, Tariq Elahi, and Ian
- Goldberg, for the original [CONFLUX] paper!
+ Thanks to Per Hurtig for helping us with the framing of the MPTCP
+ problem space.
+
+ Thanks to Simone Ferlin for clarifications on the [BLEST] paper, and for
+ pointing us at the Linux kernel implementation.
+
+ Extreme thanks goes again to Toke Høiland-Jørgensen, who helped
+ immensely towards our understanding of how the BLEST condition relates
+ to edge connection pushback, and for clearing up many other
+ misconceptions we had.
+
+ Finally, thanks to Mashael AlSabah, Kevin Bauer, Tariq Elahi, and Ian
+ Goldberg, for the original [CONFLUX] paper!
References: