Filename: 329-traffic-splitting.txt
Status: Draft

0. Status

  This proposal describes the Conflux [CONFLUX] system developed by
  Mashael AlSabah, Kevin Bauer, Tariq Elahi, and Ian Goldberg. It aims to
  improve Tor client network performance by dynamically splitting traffic
  between two circuits.

1. Overview

1.1. Multipath TCP Design Space

  In order to understand our improvements to Conflux, it is important to
  properly conceptualize what is involved in the design of multipath
  algorithms in general.

  The design space is broken into two orthogonal parts: congestion
  control algorithms that apply to each path, and traffic scheduling
  algorithms that decide when to send packets on each path.

  MPTCP specifies 'coupled' congestion control (see [COUPLED]). Coupled
  congestion control updates single-path congestion control algorithms to
  account for shared bottlenecks between the paths, so that the combined
  congestion control algorithms do not overwhelm any bottlenecks that
  happen to be shared between the multiple paths. Various ways of
  accomplishing this have been proposed and implemented in the Linux
  kernel.

  Because Tor's congestion control only concerns itself with bottlenecks
  in Tor relay queues, and not with any other bottlenecks (such as
  intermediate Internet routers), we can avoid this complexity merely by
  specifying that any paths that are constructed should not share any
  relays. In this way, we can proceed to use the exact same congestion
  control as specified in Proposal 324, for each path.

  For this reason, this proposal will focus on the traffic scheduling
  algorithms, rather than coupling. We propose three candidate algorithms
  that have been studied in the literature, and will compare their
  performance using simulation and consensus parameters.

1.2. Divergence from the initial Conflux design

  The initial [CONFLUX] paper doesn't provide any indication of how to
  handle the size of the out-of-order cell queue, which we consider a
  potentially dangerous memory DoS vector (see [MEMORY_DOS]). It also
  used RTT as the sole heuristic for selecting which circuit to send on;
  RTT varies with the geographical locations of the participating relays,
  and does not reflect their actual available circuit capacity (which
  will be available to us via Proposal 324). Additionally, since the
  publication of [CONFLUX], more modern packet scheduling algorithms have
  been developed, which aim to reduce out-of-order queue size.

  We propose mitigations for these issues using modern scheduling
  algorithms, as well as implementation options for avoiding the
  out-of-order queue at Exit relays. Additionally, we consider
  resumption, side channel, and traffic analysis risks and benefits in
  [RESUMPTION], [SIDE_CHANNELS] and [TRAFFIC_ANALYSIS].

2. Design

  The following section describes the Conflux design. Each sub-section is
  a building block of the multipath design that Conflux proposes.

  The circuit construction is as follows:

             Primary Circuit (lower RTT)
           +-------+      +--------+
           |Guard 1|----->|Middle 1|----------+
           +---^---+      +--------+          |
  +-----+      |                           +--v---+
  | OP  +------+                           | Exit |--> ...
  +-----+      |                           +--^---+
           +---v---+      +--------+          |
           |Guard 2|----->|Middle 2|----------+
           +-------+      +--------+
             Secondary Circuit (higher RTT)

  Both circuits are built using current Tor path selection; however, they
  SHOULD NOT share the same Guard relay or middle relay. By avoiding the
  use of the same relays in these positions in the path, we ensure
  additional path capacity, and eliminate the need to use the more
  complicated 'coupled' congestion control algorithms from the MPTCP
  literature [COUPLED]. This both simplifies the design and improves
  performance.

  Then, the OP needs to link the two circuits together, as described in
  [LINKING_CIRCUITS], [LINKING_EXIT], and [LINKING_SERVICE].

  For ease of explanation, the primary circuit is the circuit with lower
  RTT, and the secondary circuit is the circuit with higher RTT. Initial
  RTT is measured during circuit linking, as described in
  [LINKING_CIRCUITS]. RTT is continually measured using SENDME timing, as
  in Proposal 324. This means that during use, the primary circuit and
  secondary circuit may switch roles, depending on unrelated network
  congestion caused by other Tor clients.

  We also support linking onion service circuits together. In this case,
  only two rendezvous circuits are linked. Each of these RP circuits will
  be constructed separately, and then linked. However, the same path
  constraints apply to each half of the circuits (no shared relays
  between the legs). Should, by chance, the service and the client sides
  end up sharing some relays, this is not catastrophic. Multipath TCP
  researchers we have consulted believe Tor's congestion control from
  Proposal 324 to be sufficient in this rare case.

  Only two circuits SHOULD be linked together. However, implementations
  SHOULD make it easy for researchers to *test* more than two paths, as
  this has been shown to assist in traffic analysis resistance
  [WTF_SPLIT]. At minimum, this means not hardcoding only two circuits in
  the implementation.

  If the number of circuits exceeds the current number of guard relays,
  guard relays MAY be re-used, but implementations SHOULD use the same
  number of Guards as paths.

  Linked circuits MUST NOT be extended further once linked (i.e.
  'cannibalization' is not supported).
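
  Below is a minimal sketch of the leg-disjointness constraint described
  above, in Python-style pseudocode. The pick_path() helper stands in for
  Tor's normal path selection and is hypothetical; the only point
  illustrated is rejecting candidate legs that share any relay with an
  existing leg.

      def legs_are_disjoint(leg_a, leg_b):
          # Compare relays by fingerprint; no relay may appear in both legs.
          return not (set(r.fingerprint for r in leg_a) &
                      set(r.fingerprint for r in leg_b))

      def build_conflux_legs(pick_path, max_tries=32):
          first = pick_path()              # normal Tor path selection
          for _ in range(max_tries):
              second = pick_path()
              if legs_are_disjoint(first, second):
                  return first, second     # safe to link these two circuits
          raise RuntimeError("could not find a disjoint second leg")
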
2.1. Advertising support for conflux

  We propose a new protocol version in order to advertise support for
  circuit linking on the relay side:

     "Relay=4" -- Relay supports a 2 byte sequence number in a RELAY cell
                  header used for multipath circuits which are linked
                  with the new RELAY_CIRCUIT_LINK relay cell command.

  XXX: Advertise this in the onion service descriptor.
  XXX: Can the onion service descriptor advertise more than two circuits?

  The next section describes how the circuits are linked together.

2.2. Linking circuits [LINKING_CIRCUITS]

  To link circuits, we propose new relay commands that are sent on both
  circuits, as well as a response to confirm the join, and an ack of this
  response. These commands create a 3-way handshake, which allows each
  endpoint to measure the initial RTT of each leg upon link, without
  needing to wait for any data.

  All three stages of this handshake are sent on *each* circuit leg to be
  linked.

  To save round trips, these cells SHOULD be combined with the initial
  RELAY_BEGIN cell on the faster circuit leg, using Proposal 325. See
  [LINKING_EXIT] and [LINKING_SERVICE] for more details on setup in each
  case.

  There are other ways to do this linking that we have considered, but
  they seem not to be significantly better than this method, especially
  since we can use Proposal 325 to eliminate the RTT cost of this setup
  before sending data. For those other ideas, see [ALTERNATIVE_LINKING]
  and [ALTERNATIVE_RTT], in the appendix.

  The first two parts of the handshake establish the link, and enable
  resumption:

     16 -- RELAY_CIRCUIT_LINK

           Sent from the OP to the exit/service in order to link
           circuits together at the end point.

     17 -- RELAY_CIRCUIT_LINKED

           Sent from the exit/service to the OP, to confirm the circuits
           were linked.

  These cells have the following contents:

     VERSION   [1 byte]
     PAYLOAD   [variable, up to end of relay payload]

  The VERSION tells us which circuit linking mechanism to use. At this
  point in time, only 0x01 is recognized and is the one described by the
  Conflux design.

  For version 0x01, the PAYLOAD contains:

     NONCE              [32 bytes]
     LAST_SEQNO_SENT    [8 bytes]
     LAST_SEQNO_RECV    [8 bytes]

  XXX: Should we let endpoints specify their preferred [SCHEDULING] alg
  here, to override consensus params? This has benefits: e.g. low-memory
  mobile clients can ask for an alg that is better for their reorder
  queues. But it also has complexity risk, if the other endpoint does not
  want to support it, because of its own memory issues.

  The NONCE contains a random 256-bit secret, used to associate the two
  circuits together. The nonce must not be shared outside of the circuit
  transmission, or data may be injected into TCP streams. This means it
  MUST NOT be logged to disk.

  The two sequence number fields are 0 upon initial link, but non-zero in
  the case of a resumption attempt (see [RESUMPTION]).

  If either circuit does not receive a RELAY_CIRCUIT_LINKED response,
  both circuits MUST be closed.

  The third stage of the handshake exists to help the exit/service
  measure initial RTT, for use in [SCHEDULING]:

     18 -- RELAY_CIRCUIT_LINKED_RTT_ACK

           Sent from the OP to the exit/service, to provide initial RTT
           measurement for the exit/service.

  For timeout of the handshake, clients should use the normal
  SOCKS/stream timeout already in use for RELAY_BEGIN.

  These three relay commands (RELAY_CIRCUIT_LINK, RELAY_CIRCUIT_LINKED,
  and RELAY_CIRCUIT_LINKED_RTT_ACK) are sent on *each* leg, to allow each
  endpoint to measure the initial RTT of each leg.
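
  As an illustration only, here is a minimal sketch of packing and
  parsing the version 0x01 link payload described above (1-byte version,
  32-byte nonce, and two 8-byte big-endian sequence numbers). The field
  layout follows this section; the helper names are hypothetical.

      import os, struct

      LINK_V1 = 0x01

      def encode_link_payload(nonce, last_sent=0, last_recv=0):
          # VERSION || NONCE(32) || LAST_SEQNO_SENT(8) || LAST_SEQNO_RECV(8)
          assert len(nonce) == 32
          return struct.pack("!B32sQQ", LINK_V1, nonce, last_sent, last_recv)

      def decode_link_payload(payload):
          version, nonce, last_sent, last_recv = struct.unpack_from("!B32sQQ",
                                                                    payload)
          if version != LINK_V1:
              raise ValueError("unknown conflux link version")
          return nonce, last_sent, last_recv

      # Initial link: fresh nonce, both sequence numbers zero.
      payload = encode_link_payload(os.urandom(32))
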
2.3. Linking Circuits from OP to Exit [LINKING_EXIT]

  To link exit circuits, two circuits to the same exit are built. The
  client records the circuit build time of each.

  If the circuits are being built on-demand, for immediate use, the
  circuit with the lower build time SHOULD use Proposal 325 to append its
  first RELAY cell to the RELAY_CIRCUIT_LINK cell. The exit MUST respond
  on this same leg. After that, actual RTT measurements MUST be used to
  determine future transmissions, as specified in [SCHEDULING].

  The RTT times between RELAY_CIRCUIT_LINK and RELAY_CIRCUIT_LINKED are
  measured by the client, to determine each circuit's RTT, which decides
  primary vs secondary circuit use, and is used for packet scheduling.
  Similarly, the exit measures the RTT times between RELAY_CIRCUIT_LINKED
  and RELAY_CIRCUIT_LINKED_RTT_ACK, for the same purpose.
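
  The sketch below illustrates, under the assumptions of this section,
  how a client might derive the initial RTT of each leg from the
  timestamps of the link handshake and pick the primary leg; the leg
  objects and their timestamp fields are hypothetical.

      def measure_initial_rtt(leg):
          # Timestamps recorded when RELAY_CIRCUIT_LINK was sent and when
          # RELAY_CIRCUIT_LINKED came back on this leg.
          return leg.linked_recv_time - leg.link_sent_time

      def pick_primary(leg_a, leg_b):
          # The leg with the lower measured RTT starts out as primary;
          # roles may later switch as SENDME-based RTT estimates update.
          rtt_a, rtt_b = measure_initial_rtt(leg_a), measure_initial_rtt(leg_b)
          return (leg_a, leg_b) if rtt_a <= rtt_b else (leg_b, leg_a)
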
2.4. Linking circuits to an onion service [LINKING_SERVICE]

  For onion services, we will only concern ourselves with linking
  rendezvous circuits.

  To join rendezvous circuits, clients make two introduce requests to a
  service's intropoint, causing it to create two rendezvous circuits, to
  meet the client at two separate rendezvous points. These introduce
  requests MUST be sent to the same intropoint (due to potential use of
  onionbalance), and SHOULD be sent back-to-back on the same intro
  circuit. They MAY be combined with Proposal 325.

  The first rendezvous circuit to be joined SHOULD use Proposal 325 to
  append the RELAY_BEGIN command, and the service MUST answer on this
  circuit, until RTT can be measured.

  Once both circuits are linked and RTT is measured, packet scheduling
  should be used, as per [SCHEDULING].

2.5. Congestion Control Application [CONGESTION_CONTROL]

  The SENDMEs for congestion control are performed per-leg. As data
  arrives, regardless of its ordering, it is counted towards SENDME
  delivery. In this way, 'cwnd - package_window' of each leg always
  reflects the available data to send on each leg. This is important for
  [SCHEDULING].

  The congestion control stream XON/XOFF can be sent on either leg, and
  applies to the stream's transmission on both legs.
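
  As a small sketch of the per-leg accounting described above (leg
  objects, the send_sendme() helper, and the SENDME spacing constant are
  all hypothetical placeholders; the actual spacing is governed by the
  congestion control of Proposal 324):

      SENDME_INC = 100   # placeholder spacing; set by congestion control

      def available(leg):
          # How many more cells this leg may carry right now.
          return leg.cwnd - leg.package_window

      def on_cell_delivered(leg):
          # Every delivered cell counts toward SENDME generation on its
          # own leg, regardless of cross-leg ordering.
          leg.delivered_since_sendme += 1
          if leg.delivered_since_sendme >= SENDME_INC:
              leg.delivered_since_sendme = 0
              send_sendme(leg)   # hypothetical helper
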
2.6. Sequencing [SEQUENCING]

  With multiple paths for data, the problem of data re-ordering appears.
  In other words, cells can arrive out of order from the two circuits,
  where cell N+1 arrives before cell N.

  Handling this reordering operates after congestion control for each
  circuit leg, but before relay cell command processing or stream data
  delivery.

  For the receiver to be able to reorder the received cells, a sequencing
  scheme needs to be implemented. However, because Tor does not drop or
  reorder packets inside of a circuit, this sequence number can be very
  small. It only has to signal that a cell comes after those arriving on
  another circuit.

  To achieve this, we add a small sequence number to the common relay
  header for all relay cells on linked circuits. This sequence number is
  meant to signal the number of cells sent on the *other* leg, so that
  each endpoint knows how many cells are still in-flight on another leg.
  It is different from the absolute sequence number used in
  [LINKING_CIRCUITS] and [RESUMPTION], but can be derived from that
  number, using relative arithmetic.

     Relay command   [1 byte]
     Recognized      [2 bytes]
     StreamID        [2 bytes]
     Digest          [4 bytes]
     Length          [2 bytes]
     > LongSeq       [1 bit]    # If this bit is set, use 31 bits for Seq
     > Sequencing    [7 or 31 bits]
     Data            [Remainder]

  The sequence number is only set for the first cell after the endpoint
  switches legs. In this case, LongSeq is set to 1, and the Sequencing
  field is 31 more bits. Otherwise, the field is a single zero byte.

  These fields MUST be present on ALL end-to-end relay cells on each leg
  that come from the endpoint, following a RELAY_CIRCUIT_LINK command.

  They are absent on 'leaky pipe' RELAY_COMMAND_DROP and
  RELAY_COMMAND_PADDING_NEGOTIATED cells that come from middle relays, as
  opposed to the endpoint, to support padding.

  When an endpoint switches legs, on the first cell in a new leg, LongSeq
  is set to 1, and the following 31 bits represent the *total* number of
  cells sent on the *other* leg, before the switch. The receiver must
  wait for that number of cells to arrive from the previous leg before
  delivering that cell.

  XXX: In the rare event that we send more than 2^31 cells (~1TB) on a
  single leg, do we force a switch of legs, or expand the field further?

  An alternative method of sequencing, which assumes that the endpoint
  knows it is going to switch one cell before it switches, is specified
  in [ALTERNATIVE_SEQUENCING]. Note that that method requires only 1 byte
  for sequence number and switch signaling, but requires that the sender
  know, one cell in advance, that it is planning to switch. (This is
  possible with [BLEST_TOR], but [LOWRTT_TOR] can switch based on RTT
  change, so it may be one cell late in that case.)
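
  As a sketch only, the LongSeq encoding described above could be packed
  and parsed as follows (big-endian, with the high bit of the first byte
  acting as the LongSeq flag); these helper names are illustrative, not
  part of the spec.

      import struct

      def encode_seq_field(cells_sent_on_other_leg=None):
          if cells_sent_on_other_leg is None:
              # No leg switch: LongSeq=0 plus a 7-bit zero value, one byte.
              return b"\x00"
          assert cells_sent_on_other_leg < (1 << 31)
          # Leg switch: LongSeq=1 plus a 31-bit count, four bytes total.
          return struct.pack("!I", (1 << 31) | cells_sent_on_other_leg)

      def decode_seq_field(data):
          # Returns (cells_to_wait_for_or_None, bytes_consumed).
          if data[0] & 0x80:
              (value,) = struct.unpack_from("!I", data)
              return value & 0x7fffffff, 4
          return None, 1
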
2.7. Resumption [RESUMPTION]

  In the event that a circuit leg is destroyed, it MAY be resumed.

  Resumption is achieved by re-using the NONCE, with the same linking
  method to the same endpoint (either [LINKING_EXIT] or
  [LINKING_SERVICE]). The resumed path need not use the same middle and
  guard relays, but should not share any relays with any existing leg(s).

  To provide resumption, endpoints store an absolute 64-bit cell counter
  of the last cell they have sent on a conflux pair (their
  LAST_SEQNO_SENT), as well as the last sequence number they have
  delivered in-order to edge connections corresponding to a conflux pair
  (their LAST_SEQNO_RECV). Additionally, endpoints MAY store the entire
  contents of unacked inflight cells (i.e. the 'package_window' from
  Proposal 324), for each leg, along with information corresponding to
  those cells' absolute sequence numbers.

  These 64-bit absolute counters can wrap without issue, as congestion
  windows will never grow to 2^64 cells until well past the Singularity.
  However, it is possible that extremely long, bulk circuits could exceed
  2^64 total sent or received cells, so endpoints SHOULD handle wrapped
  sequence numbers for purposes of computing retransmit information. (But
  even this case is unlikely to happen within the next decade or so.)

  Upon resumption, the LAST_SEQNO_SENT and LAST_SEQNO_RECV fields are
  used to convey the sequence numbers of the last cell the relay sent and
  received on that leg. The other endpoint can use these sequence numbers
  to determine if it received the in-flight data or not, or sent more
  data since that point, up to and including this absolute sequence
  number. If LAST_SEQNO_SENT has not been received, the endpoint MAY
  transmit the missing data, if it still has it buffered.

  Because both endpoints get information about the other side's absolute
  SENT sequence number, they will know exactly how many re-transmitted
  packets to expect, should the circuit stay open. Re-transmitters should
  not re-increment their absolute sent fields while re-transmitting.

  If an endpoint does not have this missing data, due to memory pressure,
  it should destroy *both* legs, as this represents unrecoverable data
  loss.

  Otherwise, the new circuit can be re-joined, and its RTT can be
  compared to the remaining circuit to determine if the new leg is
  primary or secondary.

  It is even possible to resume conflux circuits where both legs have
  been collapsed using this scheme, if endpoints continue to buffer their
  unacked package_window data for some time after this close. However,
  see [TRAFFIC_ANALYSIS] for more details on the full scope of this
  issue.

  If endpoints are buffering package_window data, such data should be
  given priority to be freed in any oomkiller invocation. See
  [MEMORY_DOS] for more oomkiller information.
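
  The retransmit bookkeeping described above amounts to comparing the
  peer's LAST_SEQNO_RECV against our own absolute sent counter. A minimal
  sketch, assuming a hypothetical retransmit buffer keyed by absolute
  sequence number (64-bit wrap handling omitted for brevity):

      def handle_resume_link(my_last_sent, peer_last_recv, retransmit_buf):
          # Cells with absolute sequence numbers in
          # (peer_last_recv, my_last_sent] never arrived and must be
          # re-sent, without re-incrementing the absolute sent counter.
          needed = range(peer_last_recv + 1, my_last_sent + 1)
          if any(seq not in retransmit_buf for seq in needed):
              # Data already freed under memory pressure: unrecoverable,
              # so the caller should destroy *both* legs.
              return None
          return [retransmit_buf[seq] for seq in needed]
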
3. Traffic Scheduling [SCHEDULING]

  In order to load balance the traffic between the two circuits, the
  original Conflux paper used only RTT. However, with Proposal 324, we
  will have accurate information on the instantaneous available bandwidth
  of each circuit leg, as 'cwnd - package_window' (see Section 3 of
  Proposal 324).

  Some additional RTT optimizations are also useful, to improve
  responsiveness and minimize out-of-order queue sizes.

  We specify two traffic schedulers from the multipath literature and
  adapt them to Tor: [LOWRTT_TOR] and [BLEST_TOR]. [LOWRTT_TOR] also has
  three variants, with different trade-offs.

  However, see the [TRAFFIC_ANALYSIS] sections of this proposal for
  important details on how this selection can be changed, to reduce
  website traffic fingerprinting.

3.1. LowRTT Scheduling [LOWRTT_TOR]

  This scheduling algorithm is based on the original [CONFLUX] paper,
  with ideas from [MPTCP]'s minRTT/LowRTT scheduler.

  In this algorithm, endpoints send cells on the circuit with lower RTT
  (the primary circuit). This continues while the congestion window on
  the circuit has available room: i.e. whenever
  cwnd - package_window > 0.

  Whenever the primary circuit's congestion window becomes full, the
  secondary circuit is used. We stop reading on the send window source
  (edge connection) when both congestion windows become full.

  In this way, unlike the original Conflux, we switch to the secondary
  circuit without causing congestion on the primary circuit. This
  improves both load times, and overall throughput.

  This behavior matches minRTT from [MPTCP], sometimes called LowRTT.

  It may be better to stop reading on the edge connection when the
  primary congestion window becomes full, rather than switch to the
  secondary circuit as soon as the primary congestion window becomes
  full. (I.e. only switch if the RTTs themselves change which circuit is
  primary.) This is what was done in the original Conflux paper. This
  behavior effectively causes us to optimize for responsiveness and
  congestion avoidance, rather than throughput. For evaluation, we should
  control this switching behavior with a consensus parameter (see
  [CONSENSUS_PARAMETERS]).

  Because of potential side channel risk (see [SIDE_CHANNELS]), a third
  variant of this algorithm, where the primary circuit is chosen during
  the [LINKING_CIRCUITS] handshake and never changed, should also be
  possible to control via consensus parameter.
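
  For comparison with the BLEST pseudocode in the next section, here is
  an illustrative sketch of this scheduler's main loop (first variant:
  switch to the secondary leg whenever the primary's window is full); the
  leg and source objects, and the sort_by_rtt() helper, are hypothetical.

      while source.has_data_to_send():
          # Re-evaluate which leg is primary, since SENDME-based RTT
          # estimates change during use.
          primary, secondary = sort_by_rtt(leg_a, leg_b)

          if primary.cwnd - primary.package_window > 0:
              primary.send(source.get_packet())
          elif secondary.cwnd - secondary.package_window > 0:
              secondary.send(source.get_packet())
          else:
              break  # both windows full; stop reading until a SENDME arrives
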
3.2. BLEST Scheduling [BLEST_TOR]

  [BLEST] attempts to predict the availability of the primary circuit,
  and use this information to reorder transmitted data, to minimize
  head-of-line blocking in the recipient (and thus minimize out-of-order
  queues there).

  BLEST_TOR uses the primary circuit until the congestion window is full.
  Then, it uses the relative RTT times of the two circuits to calculate
  how much data can be sent on the secondary circuit faster than if we
  just waited for the primary circuit to become available.

  This is achieved by computing two variables at the sender:

     rtts = secondary.currRTT / primary.currRTT
     primary_limit = (primary.cwnd + (rtts-1)/2)*rtts

  Note: This (rtts-1)/2 factor represents anticipated congestion window
  growth over this period. It may be different for Tor, depending on the
  CC alg.

  If primary_limit < secondary.cwnd - (secondary.package_window + 1),
  then there is enough space on the secondary circuit to send data faster
  than we could by waiting for the primary circuit.

  XXX: Note that BLEST uses total_send_window where we use secondary.cwnd
  in this check. total_send_window is min(recv_win, CWND). But since Tor
  does not use receive windows and instead uses stream XON/XOFF, we only
  use CWND. There is some concern this may alter BLEST's buffer
  minimization properties, but since the receive window should only
  matter if the application is slower than Tor, and XON/XOFF should cover
  that case, hopefully this is fine.

  Otherwise, if the primary_limit condition is not hit, cease reading on
  source edge connections until SENDME acks come back.

  Here is the pseudocode for this:

     while source.has_data_to_send():
        if primary.cwnd > primary.package_window:
           primary.send(source.get_packet())
           continue

        rtts = secondary.currRTT / primary.currRTT
        primary_limit = (primary.cwnd + (rtts-1)/2)*rtts

        if primary_limit < secondary.cwnd - (secondary.package_window+1):
           secondary.send(source.get_packet())
        else:
           break # wait for a SENDME to free up CWND space, then restart

  Note that BLEST also has a parameter lambda that is updated whenever
  head-of-line (HoL) blocking occurs. Because it is expensive and takes
  significant time to signal this over Tor, we omit this.

  XXX: See the [REORDER_SIGNALING] section if we want this lambda
  feedback.
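
  As an illustrative worked example (numbers chosen arbitrarily): if
  primary.currRTT = 200ms, secondary.currRTT = 300ms, and
  primary.cwnd = 100 cells, then rtts = 300/200 = 1.5 and
  primary_limit = (100 + (1.5-1)/2) * 1.5 = (100 + 0.25) * 1.5 ~= 150.4.
  The secondary leg would only be used if
  secondary.cwnd - (secondary.package_window + 1) exceeds roughly 150
  cells; otherwise the sender stops reading from the edge connection and
  waits for the primary leg's SENDMEs.
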
3.3. Reorder queue signaling [REORDER_SIGNALING]

  Reordering should be a fairly simple task. By using the sequence number
  field in [SEQUENCING], endpoints can know how many cells are still in
  flight on the other leg.

  To reorder them properly, a buffer of out-of-order cells needs to be
  kept. On the Exit side, this can quickly become overwhelming,
  considering that tens of thousands of circuits can be held open,
  leading to gigabytes of memory being used. This is a clear potential
  memory DoS vector, which means that a Tor implementation should be able
  to limit the size of those queues.

  Luckily, [BLEST_TOR] and the form of [LOWRTT_TOR] that only uses the
  primary circuit will minimize or eliminate this out-of-order buffer.

  XXX: The remainder of this section may be over-complicating things...
  We only need these concepts if we want to use BLEST's lambda feedback.

  The default for this queue size is governed by the
  'cflx_reorder_client' and 'cflx_reorder_srv' consensus parameters (see
  [CONSENSUS_PARAMS]). 'cflx_reorder_srv' applies to Exits and onion
  services. Both parameters can be overridden by Torrc, to larger or
  smaller than the consensus parameter. (Low memory clients may want to
  lower it; SecureDrop onion services or other high-upload services may
  want to raise it.)

  When the reorder queue hits this size, a RELAY_CONFLUX_XOFF is sent
  down the circuit leg that has data waiting in the queue, and use of
  that leg must cease until it drains to half of this value, at which
  point a RELAY_CONFLUX_XON is sent. Note that this is different from the
  stream XON/XOFF from Proposal 324.

  XXX: [BLEST] actually does not cease use of a path in this case, but
  instead uses this signal to adjust the lambda parameter, which biases
  traffic away from that leg.
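
  The following sketch illustrates the reorder queue and the XOFF/XON
  thresholds described above; the queue limit, deliver() hook, and
  send_on_leg() helper are hypothetical.

      import heapq

      class ReorderQueue:
          def __init__(self, limit):
              self.limit = limit          # e.g. from cflx_reorder_* params
              self.next_seq = 1           # next absolute seqno to deliver
              self.heap = []              # out-of-order cells, keyed by seqno
              self.xoff_sent = False

          def receive(self, seq, cell, leg):
              heapq.heappush(self.heap, (seq, cell))
              # Deliver any cells that are now in order.
              while self.heap and self.heap[0][0] == self.next_seq:
                  _, ready = heapq.heappop(self.heap)
                  deliver(ready)          # hypothetical: pass to streams
                  self.next_seq += 1
              # Queue too large: XOFF the leg whose data is waiting here.
              if len(self.heap) >= self.limit and not self.xoff_sent:
                  send_on_leg(leg, "RELAY_CONFLUX_XOFF")
                  self.xoff_sent = True
              # Queue drained to half the limit: allow that leg again.
              elif self.xoff_sent and len(self.heap) <= self.limit // 2:
                  send_on_leg(leg, "RELAY_CONFLUX_XON")
                  self.xoff_sent = False
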
4. Security Considerations

4.1. Memory Denial of Service [MEMORY_DOS]

  Both reorder queues and retransmit buffers inherently represent a
  memory denial of service condition.

  For [RESUMPTION] retransmit buffers, endpoints that support this
  feature SHOULD free retransmit information as soon as they get close to
  memory pressure. This prevents resumption while data is in flight, but
  will not otherwise harm operation.

  For reorder buffers, adversaries can potentially impact this at any
  point, but most obviously and most severely from the client position.

  In particular, clients can lie about sequence numbers, sending cells
  with sequence numbers such that the next expected sequence number is
  never sent. They can do this repeatedly on many circuits, to exhaust
  memory at exits.

  One option is to only allow actual traffic splitting in the downstream
  direction, towards clients, and always use the primary circuit for
  everything in the upstream direction. However, the ability to support
  conflux from the client to the exit shows promise against traffic
  analysis (see [WTF_SPLIT]).

  The other option is to use [BLEST_TOR] from clients to exits, as it has
  predictable interleaved cell scheduling, and minimizes reorder queues
  at exits. If the ratios prescribed by that algorithm are not followed
  within some bounds, the other endpoint can close both circuits, and
  free the queue memory.

  This still leaves the possibility that intermediate relays may block a
  leg, allowing cells to traverse only one leg, thus still accumulating
  in the reorder queue. Clients can also spoof sequence numbers
  similarly, to make it appear that they are following [BLEST_TOR],
  without actually sending any data on one of the legs.

  To handle either of these cases, when a relay is under memory pressure,
  the circuit OOM killer SHOULD free and close circuits with the oldest
  reorder queue data first. This heuristic was shown to be best during
  the [SNIPER] attack OOM killer iteration cycle.
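
  As a sketch of the OOM heuristic described above (circuit objects and
  the byte target are hypothetical), freeing the circuits whose oldest
  queued cell is the stalest first might look like:

      def oom_free_conflux_queues(circuits, bytes_to_free):
          # Sort by the arrival time of the oldest cell still waiting in
          # the reorder queue; the stalest queue is the likeliest attack.
          victims = sorted((c for c in circuits if c.reorder_queue),
                           key=lambda c: c.oldest_queued_cell_time)
          freed = 0
          for circ in victims:
              freed += circ.reorder_queue_bytes
              circ.close_both_legs()     # frees the queued cells
              if freed >= bytes_to_free:
                  break
          return freed
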
4.2. Side Channels [SIDE_CHANNELS]

  Two potential side channels may be introduced by the use of Conflux:
     1. RTT leg-use bias by altering SENDME latency
     2. Location info leaks through the use of both legs' latencies

  For RTT and leg-use bias, Guard relays could delay legs to introduce a
  pattern into the delivery of cells at the exit relay, by varying the
  latency of SENDME cells (every 100th cell) to change the distribution
  of traffic, in order to send information. This attack could be
  performed in either direction of traffic, to bias traffic load off of a
  particular Guard. If an adversary controls both Guards, it could in
  theory send a binary signal more easily, by alternating delays on each.

  However, this risk must be weighed against the potential benefits
  against traffic fingerprinting, as per [WTF_SPLIT]. Additionally, even
  ignoring cryptographic tagging attacks, this side channel provides
  significantly lower information over time than inter-packet-delay based
  side channels that are already available to Guards and routers along
  the path to the Guard.

  Tor currently provides no defenses against already existing
  single-circuit delay-based side channels, though both circuit padding
  and [BACKLIT] are potential options it could conceivably deploy. The
  [BACKLIT] paper also has an excellent review of the various methods
  that have been studied for such single circuit side channels, and the
  [BACKLIT] style RTT monitoring could be used to protect against these
  conflux side channels as well. Circuit padding can also help to obscure
  which cells are SENDMEs, since circuit padding is not counted towards
  SENDME totals.

  The second class of side channel is where the Exit relay may be able to
  use the two legs to infer more information about client location. See
  [LATENCY_LEAK] for more details. It is unclear at this time how much
  more severe this is for two paths than for just one.

  We should preserve the ability to disable conflux to and from Exit
  relays, should these side channels prove more severe, or should it
  prove possible to mitigate single-circuit side channels, but not
  conflux side channels.

  In all cases, all of these side channels appear less severe for onion
  service traffic, due to the higher path variability from relay
  selection, as well as the end-to-end nature of conflux in that case.
  This indicates that our ability to enable/disable conflux for services
  should be separate from Exits.

4.3. Traffic analysis [TRAFFIC_ANALYSIS]

  Even though conflux shows benefits against traffic analysis in
  [WTF_SPLIT], these gains may be moot if the adversary is able to
  perform packet counting and timing analysis at guards to guess which
  specific circuits are linked. In particular, the 3-way handshake in
  [LINKING_CIRCUITS] may be quite noticeable.

  As one countermeasure, it may be possible to eliminate the third stage
  of the handshake (RELAY_CIRCUIT_LINKED_RTT_ACK) by computing the
  exit/service RTT via measuring the time between CREATED/REND_JOINED and
  RELAY_CIRCUIT_LINK, but this will introduce cross-component complexity
  into Tor's protocol that could quickly become unwieldy and fragile.

  Additionally, the conflux handshake may make onion services stand out
  more, regardless of the number of stages in the handshake. For this
  reason, it may be wiser to simply address these issues with circuit
  padding machines during circuit setup (see padding-spec.txt).

  Additional traffic analysis considerations arise when combining conflux
  with padding, for purposes of mitigating traffic fingerprinting. For
  this, it seems wise to treat the packet schedulers as another piece of
  a combined optimization problem in tandem with optimizing padding
  machines, perhaps introducing randomness or fudge factors into their
  scheduling, as a parameterized distribution. For details, see
  https://github.com/torproject/tor/blob/master/doc/HACKING/CircuitPaddingDevelopment.md

  Finally, conflux may exacerbate forms of confirmation-based traffic
  analysis that close circuits to determine concretely if they were in
  use, since closing either leg might cause resumption to fail. TCP RST
  injection can perform this attack from the side, without surveillance
  capability. [RESUMPTION] with buffering of the inflight unacked
  package_window data, for retransmit, is a partial mitigation, if
  endpoints buffer this data for retransmission for a brief time even if
  both legs close. This seems more feasible for onion services, which are
  more vulnerable to this attack. However, if the adversary controls the
  client, they will notice the resumption re-link, and still obtain
  confirmation that way.

  It seems the only way to fully mitigate these kinds of attacks is with
  the Snowflake pluggable transport, which provides its own resumption
  and retransmit behavior. Additionally, Snowflake's use of UDP DTLS also
  protects against TCP RST injection, which we suspect to be the main
  vector for such attacks.

  In the future, a DTLS or QUIC transport for Tor such as masque could
  provide similar RST injection resistance, and resumption at
  Guard/Bridge nodes, as well.
+ Even though conflux shows benefits against traffic analysis in
+ [WTF_SPLIT], these gains may be moot if the adversary is able to perform
+ packet counting and timing analysis at guards to guess which specific
+   circuits are linked. In particular, the 3-way handshake in
+   [LINKING_CIRCUITS] may be quite noticeable.
+
+ As one countermeasure, it may be possible to eliminate the third leg
+ (RELAY_CIRCUIT_LINKED_ACK) by computing the exit/service RTT via
+ measuring the time between CREATED/REND_JOINED and RELAY_CIRCUIT_LINK,
+ but this will introduce cross-component complexity into Tor's protocol
+ that could quickly become unwieldy and fragile.
+
+ Additionally, the conflux handshake may make onion services stand out
+ more, regardless of the number of stages in the handshake. For this
+   reason, it may be wiser to simply address these issues with circuit
+ padding machines during circuit setup (see padding-spec.txt).
+
+ Additional traffic analysis considerations arise when combining conflux
+ with padding, for purposes of mitigating traffic fingerprinting. For
+ this, it seems wise to treat the packet schedulers as another piece of a
+ combined optimization problem in tandem with optimizing padding
+   machines, perhaps introducing randomness or fudge factors into their
+   scheduling, as a parameterized distribution. For details, see
+ https://github.com/torproject/tor/blob/master/doc/HACKING/CircuitPaddingDevelopment.md
+
+ Finally, conflux may exacerbate forms of confirmation-based traffic
+ analysis that close circuits to determine concretely if they were in
+   use, since closing either leg might cause resumption to fail. TCP RST
+   injection can perform this attack from the side, without requiring
+   full surveillance capability. [RESUMPTION], with buffering of the
+   in-flight unacked package_window data for retransmit, is a partial
+   mitigation if endpoints buffer this data for retransmission for a
+   brief time even if both legs close. This seems more feasible for
+   onion services, which are
+ more vulnerable to this attack. However, if the adversary controls the
+ client, they will notice the resumption re-link, and still obtain
+ confirmation that way.
+
+ It seems the only way to fully mitigate these kinds of attacks is with
+ the Snowflake pluggable transport, which provides its own resumption and
+ retransmit behavior. Additionally, Snowflake's use of UDP DTLS also
+ protects against TCP RST injection, which we suspect to be the main
+ vector for such attacks.
+
+ In the future, a DTLS or QUIC transport for Tor such as masque could
+ provide similar RST injection resistance, and resumption at Guard/Bridge
+ nodes, as well.
5. System Interactions
@@ -665,7 +676,7 @@ Status: Draft
- EWMA and KIST
- CBT and number of guards
- Onion service circ obfuscation
- - Future UDP (it may increase the need for UDP to buffer before dropping)
+ - Future UDP (may increase need for UDP to buffer before dropping)
- Padding (no sequence numbers on padding cells, as per [SEQUENCING])
- Also, any padding machines may need re-tuning
- No 'cannibalization' of linked circuits
@@ -697,116 +708,116 @@ Appended A [ALTERNATIVES]
A.1 BEGIN/END sequencing [ALTERNATIVE_SEQUENCING]
- In this method of signaling, we increment the sequence number by 1
- only when we switch legs, and use BEGIN/END "bookends" to know that
- all data on a leg has been received.
-
- To achieve this, we add a small sequence number to the common relay
- header for all relay cells on linked circuits, as well as a field to
- signal the beginning of a sequence, intermediate data, and the end
- of a sequence.
-
- Relay command [1 byte]
- Recognized [2 bytes]
- StreamID [2 bytes]
- Digest [4 bytes]
- Length [2 bytes]
- > Switching [2 bits] # 01 = BEGIN, 00 = CONTINUE, 10 = END
- > Sequencing [6 bits]
- Data [PAYLOAD_LEN - 12 - Length bytes]
-
- These fields MUST be present on ALL end-to-end relay cells on each leg
- that come from the endpoint, following a RELAY_CIRCUIT_LINK command.
-
- They are absent on 'leaky pipe' RELAY_COMMAND_DROP and
- RELAY_COMMAND_PADDING_NEGOTIATED cells that come from middle relays,
- as opposed to the endpoint, to support padding.
-
- Sequence numbers are incremented by one when an endpoint switches legs
- to transmit a cell. This number will wrap; implementations should treat
- 0 as the next sequence after 2^6-1. Because we do not expect to support
- significantly more than 2 legs, and much fewer than 63, this is not an
- issue.
-
- The first cell on a new circuit MUST use the BEGIN code for switching.
- Cells are delivered from that circuit until an END switching signal is
- received, even if cells arrive first on another circuit with the next
- sequence number before and END switching field. Recipients MUST only
- deliver cells with a BEGIN, if their Sequencing number is one more than
- the last END.
+ In this method of signaling, we increment the sequence number by 1 only
+ when we switch legs, and use BEGIN/END "bookends" to know that all data
+ on a leg has been received.
+
+ To achieve this, we add a small sequence number to the common relay
+ header for all relay cells on linked circuits, as well as a field to
+ signal the beginning of a sequence, intermediate data, and the end of a
+ sequence.
+
+ Relay command [1 byte]
+ Recognized [2 bytes]
+ StreamID [2 bytes]
+ Digest [4 bytes]
+ Length [2 bytes]
+ > Switching [2 bits] # 01 = BEGIN, 00 = CONTINUE, 10 = END
+ > Sequencing [6 bits]
+ Data [PAYLOAD_LEN - 12 - Length bytes]
+
+ These fields MUST be present on ALL end-to-end relay cells on each leg
+ that come from the endpoint, following a RELAY_CIRCUIT_LINK command.
+
+ They are absent on 'leaky pipe' RELAY_COMMAND_DROP and
+ RELAY_COMMAND_PADDING_NEGOTIATED cells that come from middle relays, as
+ opposed to the endpoint, to support padding.
+
+ Sequence numbers are incremented by one when an endpoint switches legs
+ to transmit a cell. This number will wrap; implementations should treat
+   0 as the next sequence after 2^6-1. Because we do not expect to
+   support significantly more than 2 legs, and certainly far fewer than
+   63, this is not an issue.
+
+ The first cell on a new circuit MUST use the BEGIN code for switching.
+ Cells are delivered from that circuit until an END switching signal is
+   received, even if cells with the next sequence number arrive first
+   on another circuit before an END switching field is seen. Recipients
+   MUST only deliver cells with a BEGIN if their Sequencing number is
+   one more than the last END.
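+
+   As a rough illustration (a hypothetical sketch, not part of this
+   proposal), the one-byte Switching/Sequencing field and the BEGIN
+   delivery rule above could be handled as follows; the bit placement
+   (Switching in the top two bits) and all helper names are assumptions:
+
+       /* Hypothetical helpers for the 2-bit Switching code and the
+        * 6-bit Sequencing counter described above. */
+       #include <stdint.h>
+
+       #define SWITCH_CONTINUE 0x0  /* 00 */
+       #define SWITCH_BEGIN    0x1  /* 01 */
+       #define SWITCH_END      0x2  /* 10 */
+
+       /* Pack Switching into the top 2 bits, Sequencing into the low 6. */
+       static uint8_t seq_field_pack(uint8_t switching, uint8_t seq)
+       {
+         return (uint8_t)(((switching & 0x3) << 6) | (seq & 0x3f));
+       }
+
+       /* Sequence numbers wrap: 0 is the next sequence after 2^6 - 1. */
+       static uint8_t seq_next(uint8_t seq)
+       {
+         return (uint8_t)((seq + 1) & 0x3f);
+       }
+
+       /* A BEGIN is only deliverable if its Sequencing number is one
+        * more (mod 64) than that of the last END received. */
+       static int begin_is_deliverable(uint8_t begin_seq, uint8_t last_end_seq)
+       {
+         return begin_seq == seq_next(last_end_seq);
+       }
+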
A.2 Alternative Link Handshake [ALTERNATIVE_LINKING]
- The circuit linking in [LINKING_CIRCUITS] could be done as encrypted
- ntor onionskin extension fields, similar to those used by v3 onions.
-
- This approach has at least four problems:
- i). For onion services, since the onionskins traverse the intro circuit
- and then return on the rend circuit, this handshake cannot measure
- RTT there.
- ii). Since these onionskins are larger, and have no PFS, an adversary
- at the middle relay knows that the onionskin is for linking, and
- can potentially try to obtain the onionskin key for attacks on
- the link.
- iii). It makes linking circuits more fragile, since they could timeout
- due to CBT, or other issues during construction.
- iv). The overhead in processing this onionskin through onionskin queues
- adds additional time for linking, even in the Exit case, making
- that RTT potentially noisy.
-
- Additionally, it is not clear that this approach actually saves us
- anything in terms of setup time, because we can optimize away the
- linking phase using Proposal 325, to combine initial RELAY_BEGIN cells
- with RELAY_CIRCUIT_LINK.
+ The circuit linking in [LINKING_CIRCUITS] could be done as encrypted
+ ntor onionskin extension fields, similar to those used by v3 onions.
+
+ This approach has at least four problems:
+ i). For onion services, since onionskins traverse the intro circuit
+ and return on the rend circuit, this handshake cannot measure
+ RTT there.
+ ii). Since these onionskins are larger, and have no PFS, an adversary
+ at the middle relay knows that the onionskin is for linking, and
+ can potentially try to obtain the onionskin key for attacks on
+ the link.
+  iii). It makes linking circuits more fragile, since they could time
+        out due to CBT, or hit other issues during construction.
+ iv). The overhead in processing this onionskin in onionskin queues
+ adds additional time for linking, even in the Exit case, making
+ that RTT potentially noisy.
+
+ Additionally, it is not clear that this approach actually saves us
+ anything in terms of setup time, because we can optimize away the
+ linking phase using Proposal 325, to combine initial RELAY_BEGIN cells
+ with RELAY_CIRCUIT_LINK.
A.3. Alternative RTT measurement [ALTERNATIVE_RTT]
- Instead of measuring RTTs during [LINKING_CIRCUITS], we could create
- PING/PONG cells, whose sole purpose is to allow endpoints to measure
- RTT.
-
- This was rejected for several reasons. First, during circuit use, we
- already have SENDMEs to measure RTT. Every 100 cells (or
- 'circwindow_inc' from Proposal 324), we are able to re-measure RTT based
- on the time between that Nth cell and the SENDME ack. So we only need
- PING/PONG to measure initial circuit RTT.
-
- If we were able to use onionskins, as per [ALTERNATIVE_LINKING] above,
- we might be able to specify a PING/PONG/PING handshake solely for
- measuring initial RTT, especially for onion service circuits.
-
- The reason for not making a dedicated PING/PONG for this purpose is that
- it is context-free. Even if we were able to use onionskins for linking
- and resumption, to avoid additional data in handshake that just measures
- RTT, we would have to enforce that this PING/PONG/PING only follows the
- exact form needed by this proposal, at the expected time, and at no
- other points.
-
- If we do not enforce this specific use of PING/PONG/PING, it becomes
- another potential side channel, for use in attacks such as [DROPMARK].
-
- In general, Tor is planning to remove current forms of context-free and
- semantic-free cells from its protocol:
- https://gitlab.torproject.org/tpo/core/torspec/-/issues/39
-
- We should not add more.
+ Instead of measuring RTTs during [LINKING_CIRCUITS], we could create
+ PING/PONG cells, whose sole purpose is to allow endpoints to measure
+ RTT.
+
+ This was rejected for several reasons. First, during circuit use, we
+ already have SENDMEs to measure RTT. Every 100 cells (or
+ 'circwindow_inc' from Proposal 324), we are able to re-measure RTT based
+ on the time between that Nth cell and the SENDME ack. So we only need
+ PING/PONG to measure initial circuit RTT.
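+
+   As a minimal sketch of that SENDME-based measurement (hypothetical
+   names, not part of this proposal), an endpoint would record when the
+   Nth data cell is sent and take the RTT sample when the corresponding
+   SENDME arrives:
+
+       /* Hypothetical sketch: re-estimate a leg's RTT from SENDME
+        * timing. N is 'circwindow_inc' (100 cells by default). */
+       #include <stdint.h>
+
+       typedef struct {
+         uint64_t nth_cell_sent_usec;  /* when the Nth data cell was sent */
+         uint64_t rtt_usec;            /* latest RTT estimate for this leg */
+       } leg_rtt_state_t;
+
+       static void leg_note_nth_cell_sent(leg_rtt_state_t *leg, uint64_t now)
+       {
+         leg->nth_cell_sent_usec = now;
+       }
+
+       static void leg_note_sendme_received(leg_rtt_state_t *leg, uint64_t now)
+       {
+         if (leg->nth_cell_sent_usec)
+           leg->rtt_usec = now - leg->nth_cell_sent_usec;
+       }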
+
+ If we were able to use onionskins, as per [ALTERNATIVE_LINKING] above,
+ we might be able to specify a PING/PONG/PING handshake solely for
+ measuring initial RTT, especially for onion service circuits.
+
+   The reason for not making a dedicated PING/PONG for this purpose is
+   that it is context-free. Even if we were able to use onionskins for
+   linking and resumption, to avoid adding data to the handshake solely
+   to measure RTT, we would still have to enforce that this
+   PING/PONG/PING follows only the exact form needed by this proposal,
+   at the expected time, and at no other points.
+
+ If we do not enforce this specific use of PING/PONG/PING, it becomes
+ another potential side channel, for use in attacks such as [DROPMARK].
+
+ In general, Tor is planning to remove current forms of context-free and
+ semantic-free cells from its protocol:
+ https://gitlab.torproject.org/tpo/core/torspec/-/issues/39
+
+ We should not add more.
Appendix B: Acknowledgments
- Thanks to Per Hurtig for helping us with the framing of the MPTCP
- problem space.
-
- Thanks to Simone Ferlin for clarifications on the [BLEST]
- paper, and for pointing us at the Linux kernel implementation.
-
- Extreme thanks goes again to Toke Høiland-Jørgensen, who helped
- immensely towards our understanding of how the BLEST condition relates
- to edge connection pushback, and for clearing up many other
- misconceptions we had.
-
- Finally, thanks to Mashael AlSabah, Kevin Bauer, Tariq Elahi, and Ian
- Goldberg, for the original [CONFLUX] paper!
+ Thanks to Per Hurtig for helping us with the framing of the MPTCP
+ problem space.
+
+ Thanks to Simone Ferlin for clarifications on the [BLEST] paper, and for
+ pointing us at the Linux kernel implementation.
+
+ Extreme thanks goes again to Toke Høiland-Jørgensen, who helped
+ immensely towards our understanding of how the BLEST condition relates
+ to edge connection pushback, and for clearing up many other
+ misconceptions we had.
+
+ Finally, thanks to Mashael AlSabah, Kevin Bauer, Tariq Elahi, and Ian
+ Goldberg, for the original [CONFLUX] paper!
References: