Diffstat (limited to 'proposals')
 proposals/000-index.txt                            |   22
 proposals/196-transport-control-ports.txt          |    4
 proposals/217-ext-orport-auth.txt                  |    6
 proposals/318-limit-protovers.md                   |    3
 proposals/324-rtt-congestion-control.txt           | 1120
 proposals/325-packed-relay-cells.md                |   73
 proposals/326-tor-relay-well-known-uri-rfc8615.md  |    9
 proposals/328-relay-overload-report.md             |  218
 proposals/329-traffic-splitting.txt                |  861
 proposals/330-authority-contact.md                 |  180
 proposals/331-res-tokens-for-anti-dos.md           |  618
 proposals/332-ntor-v3-with-extra-data.md           |  430
 proposals/333-vanguards-lite.md                    |   74
 proposals/BY_INDEX.md                              |   11
 proposals/README.md                                |   11
 proposals/ideas/xxx-backward-ecn.txt               |   85
 16 files changed, 3306 insertions(+), 419 deletions(-)
diff --git a/proposals/000-index.txt b/proposals/000-index.txt
index b6658ff..adfe0ba 100644
--- a/proposals/000-index.txt
+++ b/proposals/000-index.txt
@@ -116,7 +116,7 @@ Proposals by number:
193 Safe cookie authentication for Tor controllers [CLOSED]
194 Mnemonic .onion URLs [SUPERSEDED]
195 TLS certificate normalization for Tor 0.2.4.x [DEAD]
-196 Extended ORPort and TransportControlPort [FINISHED]
+196 Extended ORPort and TransportControlPort [CLOSED]
197 Message-based Inter-Controller IPC Channel [REJECTED]
198 Restore semantics of TLS ClientHello [CLOSED]
199 Integration of BridgeFinder and BridgeFinderHelper [OBSOLETE]
@@ -137,7 +137,7 @@ Proposals by number:
214 Allow 4-byte circuit IDs in a new link protocol [CLOSED]
215 Let the minimum consensus method change with time [CLOSED]
216 Improved circuit-creation key exchange [CLOSED]
-217 Tor Extended ORPort Authentication [FINISHED]
+217 Tor Extended ORPort Authentication [CLOSED]
218 Controller events to better understand connection/circuit usage [CLOSED]
219 Support for full DNS and DNSSEC resolution in Tor [NEEDS-REVISION]
220 Migrate server identity keys to Ed25519 [CLOSED]
@@ -238,7 +238,7 @@ Proposals by number:
315 Updating the list of fields required in directory documents [OPEN]
316 FlashFlow: A Secure Speed Test for Tor (Parent Proposal) [DRAFT]
317 Improve security aspects of DNS name resolution [NEEDS-REVISION]
-318 Limit protover values to 0-63 [ACCEPTED]
+318 Limit protover values to 0-63 [CLOSED]
319 RELAY_FRAGMENT cells [OPEN]
320 Removing TAP usage from v2 onion services [REJECTED]
321 Better performance and usability for the MyFamily option (v2) [OPEN]
@@ -248,6 +248,11 @@ Proposals by number:
325 Packed relay cells: saving space on small commands [OPEN]
326 The "tor-relay" Well-Known Resource Identifier [OPEN]
327 A First Take at PoW Over Introduction Circuits [DRAFT]
+328 Make Relays Report When They Are Overloaded [DRAFT]
+329 Overcoming Tor's Bottlenecks with Traffic Splitting [DRAFT]
+330 Modernizing authority contact entries [OPEN]
+331 Res tokens: Anonymous Credentials for Onion Service DoS Resilience [DRAFT]
+332 Ntor protocol with extra data, version 3 [OPEN]
Proposals by status:
@@ -257,6 +262,9 @@ Proposals by status:
294 TLS 1.3 Migration
316 FlashFlow: A Secure Speed Test for Tor (Parent Proposal)
327 A First Take at PoW Over Introduction Circuits
+ 328 Make Relays Report When They Are Overloaded
+ 329 Overcoming Tor's Bottlenecks with Traffic Splitting
+ 331 Res tokens: Anonymous Credentials for Onion Service DoS Resilience
NEEDS-REVISION:
212 Increase Acceptable Consensus Age [for 0.2.4.x+]
219 Support for full DNS and DNSSEC resolution in Tor [for 0.2.5.x]
@@ -286,6 +294,8 @@ Proposals by status:
324 RTT-based Congestion Control for Tor
325 Packed relay cells: saving space on small commands
326 The "tor-relay" Well-Known Resource Identifier
+ 330 Modernizing authority contact entries
+ 332 Ntor protocol with extra data, version 3
ACCEPTED:
265 Load Balancing with Overhead Parameters [for 0.2.9.x]
275 Stop including meaningful "published" time in microdescriptor consensus [for 0.3.1.x-alpha]
@@ -295,7 +305,6 @@ Proposals by status:
311 Tor Relay IPv6 Reachability
312 Tor Relay Automatic IPv6 Address Discovery
313 Tor Relay IPv6 Statistics
- 318 Limit protover values to 0-63
META:
000 Index of Tor Proposals
001 The Tor Proposal Process
@@ -306,8 +315,6 @@ Proposals by status:
290 Continuously update consensus methods
FINISHED:
160 Authorities vote for bandwidth offsets in consensus [for 0.2.1.x]
- 196 Extended ORPort and TransportControlPort [in 0.2.5.2-alpha]
- 217 Tor Extended ORPort Authentication [for 0.2.5.x]
232 Pluggable Transport through SOCKS proxy [in 0.2.6]
260 Rendezvous Single Onion Services [in 0.2.9.3-alpha]
282 Remove "Named" and "Unnamed" handling from consensus voting [for 0.3.3.x]
@@ -365,6 +372,7 @@ Proposals by status:
186 Multiple addresses for one OR or bridge [for 0.2.4.x+]
187 Reserve a cell type to allow client authorization [for 0.2.3.x]
193 Safe cookie authentication for Tor controllers
+ 196 Extended ORPort and TransportControlPort [in 0.2.5.2-alpha]
198 Restore semantics of TLS ClientHello [for 0.2.4.x]
200 Adding new, extensible CREATE, EXTEND, and related cells [in 0.2.4.8-alpha]
204 Subdomain support for Hidden Service addresses
@@ -375,6 +383,7 @@ Proposals by status:
214 Allow 4-byte circuit IDs in a new link protocol [in 0.2.4.11-alpha]
215 Let the minimum consensus method change with time [in 0.2.6.1-alpha]
216 Improved circuit-creation key exchange [in 0.2.4.8-alpha]
+ 217 Tor Extended ORPort Authentication [for 0.2.5.x]
218 Controller events to better understand connection/circuit usage [in 0.2.5.2-alpha]
220 Migrate server identity keys to Ed25519 [in 0.3.0.1-alpha]
221 Stop using CREATE_FAST [for 0.2.5.x]
@@ -405,6 +414,7 @@ Proposals by status:
302 Hiding onion service clients using padding [in 0.4.1.1-alpha]
304 Extending SOCKS5 Onion Service Error Codes
305 ESTABLISH_INTRO Cell DoS Defense Extension
+ 318 Limit protover values to 0-63 [in 0.4.5.1-alpha]
SUPERSEDED:
112 Bring Back Pathlen Coin Weight
113 Simplifying directory authority administration
diff --git a/proposals/196-transport-control-ports.txt b/proposals/196-transport-control-ports.txt
index afe6f11..f4d8a0e 100644
--- a/proposals/196-transport-control-ports.txt
+++ b/proposals/196-transport-control-ports.txt
@@ -2,7 +2,7 @@ Filename: 196-transport-control-ports.txt
Title: Extended ORPort and TransportControlPort
Author: George Kadianakis, Nick Mathewson
Created: 14 Mar 2012
-Status: Finished
+Status: Closed
Implemented-In: 0.2.5.2-alpha
1. Overview
@@ -196,7 +196,7 @@ Implemented-In: 0.2.5.2-alpha
5. Authentication
- To defend against cross-protocol attacks on the Extended ORPOrt,
+ To defend against cross-protocol attacks on the Extended ORPort,
proposal 213 defines an authentication scheme that should be used to
protect it.
diff --git a/proposals/217-ext-orport-auth.txt b/proposals/217-ext-orport-auth.txt
index 1305276..3734924 100644
--- a/proposals/217-ext-orport-auth.txt
+++ b/proposals/217-ext-orport-auth.txt
@@ -2,7 +2,7 @@ Filename: 217-ext-orport-auth.txt
Title: Tor Extended ORPort Authentication
Author: George Kadianakis
Created: 28-11-2012
-Status: Finished
+Status: Closed
Target: 0.2.5.x
1. Overview
@@ -131,7 +131,7 @@ Target: 0.2.5.x
"ExtORPort authentication server-to-client hash" | ClientNonce | ServerNonce)
+ ServerNonce is 32 random octets.
- Upon receiving that data, the client computers ServerHash herself and
+ Upon receiving that data, the client computes ServerHash herself and
validates it against the ServerHash provided by the server.
If the server-provided ServerHash is invalid, the client MUST
@@ -146,7 +146,7 @@ Target: 0.2.5.x
HMAC-SHA256(CookieString,
"ExtORPort authentication client-to-server hash" | ClientNonce | ServerNonce)
- Upon receiving that data, the server computers ClientHash herself and
+ Upon receiving that data, the server computes ClientHash herself and
validates it against the ClientHash provided by the client.
Finally, the server replies with:
diff --git a/proposals/318-limit-protovers.md b/proposals/318-limit-protovers.md
index d96a2c7..17e12b2 100644
--- a/proposals/318-limit-protovers.md
+++ b/proposals/318-limit-protovers.md
@@ -3,7 +3,8 @@ Filename: 318-limit-protovers.md
Title: Limit protover values to 0-63.
Author: Nick Mathewson
Created: 11 May 2020
-Status: Accepted
+Status: Closed
+Implemented-In: 0.4.5.1-alpha
```
# Limit protover values to 0-63.
diff --git a/proposals/324-rtt-congestion-control.txt b/proposals/324-rtt-congestion-control.txt
index e288563..f4397b6 100644
--- a/proposals/324-rtt-congestion-control.txt
+++ b/proposals/324-rtt-congestion-control.txt
@@ -60,7 +60,7 @@ searching.
Section [CONGESTION_SIGNALS] specifies how to use Tor's SENDME flow
control cells to measure circuit RTT, for use as an implicit congestion
-signal. It also specifies an explicit congestion signal, which can be
+signal. It also mentions an explicit congestion signal, which can be
used as a future optimization once all relays upgrade.
Section [CONTROL_ALGORITHMS] specifies two candidate congestion window
@@ -77,7 +77,7 @@ Section [SYSTEM_INTERACTIONS] describes how congestion control will
interact with onion services, circuit padding, and conflux-style traffic
splitting.
-Section [EVALUATION] describes how we will evaluate and tune our
+Section [EVALUATION] describes how we will evaluate and tune our
options for control algorithms and their parameters.
Section [PROTOCOL_SPEC] describes the specific cell formats and
@@ -99,11 +99,15 @@ To facilitate this, we will also change SENDME accounting logic
slightly. These changes only require clients, exits, and dirauths to
update.
-As a future optimization, we also specify an explicit congestion signal.
-This signal *will* require all relays on a circuit to upgrade to support
-it, but it will reduce congestion by making the first congestion event
+As a future optimization, it is possible to send a direct ECN congestion
+signal. This signal *will* require all relays on a circuit to upgrade to
+support it, but it will reduce congestion by making the first congestion event
on a circuit much faster to detect.
+To reduce confusion and complexity of this proposal, this signal has been
+moved to the ideas repository, under xxx-backward-ecn.txt [BACKWARD_ECN].
+
+
2.1 RTT measurement
Recall that Tor clients, exits, and onion services send
@@ -116,45 +120,67 @@ and the arrival of the SENDME command that the other endpoint
immediately sends to ack that 100th cell.
Circuits will record the current RTT measurement as a field in their
-circuit_t data structure. They will also record the minimum and maximum
-RTT seen so far.
+circuit_t data structure. The current RTT is N-EWMA smoothed[27] over
+'cc_ewma_cwnd_cnt' congestion windows worth of SENDME acks.
+
+Circuits will also record the minimum and maximum RTT seen so far.
Algorithms that make use of this RTT measurement for congestion
window update are specified in [CONTROL_ALGORITHMS].
+2.1.1. Clock Jump Heuristics [CLOCK_HEURISTICS]
+
+The timestamps for RTT (and BDP) are measured using Tor's
+monotime_absolute_usec() API. This API is designed to provide a monotonic
+clock that only moves forward. However, depending on the underlying system
+clock, this may result in the same timestamp value being returned for long
+periods of time, which would result in RTT 0-values. Alternatively, the clock
+may jump forward, resulting in abnormally large RTT values.
+
+To guard against this, we perform a series of heuristic checks on the time delta
+measured by the RTT estimator, and if these heuristics detect a stall or a jump,
+we do not use that value to update RTT or BDP, nor do we update any congestion
+control algorithm information for that round.
+
+If the time delta is 0, that is always treated as a clock stall.
+
+If we have measured at least 'cc_bwe_min' RTT values or we have successfully
+exited slow start, then on every SENDME ack the new candidate RTT is compared to
+the stored EWMA RTT. If the new RTT is either 100 times larger than the EWMA
+RTT, or 100 times smaller than the stored EWMA RTT, then we do not record that
+estimate, and do not update BDP or the congestion control algorithms for that
+SENDME ack.
+
2.2. SENDME behavior changes
We will make four major changes to SENDME behavior to aid in computing
and using RTT as a congestion signal.
-First, we will need to establish a ProtoVer of "CCtrl=1" to signal
+First, we will need to establish a ProtoVer of "FlowCtrl=2" to signal
support by Exits for the new SENDME format and congestion control
algorithm mechanisms. We will need a similar announcement in the onion
service descriptors of services that support congestion control.
Second, we will turn CIRCWINDOW_INCREMENT into a consensus parameter
-'circwindow_inc', instead of using a hardcoded value of 100 cells. It is
+cc_sendme_inc, instead of using a hardcoded value of 100 cells. It is
likely that more frequent SENDME cells will provide quicker reaction to
congestion, since the RTT will be measured more often. If
experimentation in Shadow shows that more frequent SENDMEs reduce
congestion and improve performance but add significant overhead, we can
reduce SENDME overhead by allowing SENDME cells to carry stream data, as
-well.
+well, using Proposal 325.
- TODO: If two endpoints view different consensus parameters for
- 'circwindow_inc', we will have complications measuring RTT,
- as well as complications for authenticated SENDME hash
- accounting. We need a way for endpoints to negotiate SENDME
- pacing with eachother, perhaps during circuit setup. This will
- require changes to the Onionskin/CREATE cell format (and
- RELAY_COMMAND_EXTEND), as mentioned in Section [PROTOCOL_SPEC].
+ TODO: Refer to or include final version of the negotiation proposal for
+ this: https://gitlab.torproject.org/tpo/core/tor/-/issues/40377
-Second, all end-to-end relay cells except RELAY_COMMAND_DROP and
-RELAY_COMMAND_INTRODUCE1 will count towards SENDME cell counts. The
-details behind how these cells are handled is addressed in section
-[SYSTEM_INTERACTIONS].
+Second, all end-to-end relay cells except RELAY_COMMAND_DROP and SENDME
+itself will count towards SENDME cell counts. The details behind how these
+cells are handled are addressed in section [SYSTEM_INTERACTIONS].
- TODO: List any other exceptions. There probably are some more.
+ XXX: Hrmm, this is not strictly necessary for this proposal to function.
+ It may make conflux easier though, but we can defer it to then. The
+ current implementation only counts RELAY_COMMAND_DATA towards sendme
+ acks, which is the same behavior as the old fixed sendme implementation.
Third, authenticated SENDMEs can remain as-is in terms of protocol
behavior, but will require some implementation updates to account for
@@ -171,72 +197,27 @@ examine include:
Fourth, stream level SENDMEs will be eliminated. Details on handling
streams and backpressure is covered in [FLOW_CONTROL].
-2.3. Backward ECN signaling [BACKWARD_ECN]
-
-As an optimization after the RTT deployment, we will deploy an explicit
-congestion control signal by allowing relays to modify the
-cell_t.command field when they detect congestion, on circuits for which
-all relays have support for this signal (as mediated by Tor protocol
-version handshake via the client). This is taken from the Options
-mail[1], section BACKWARD_ECN_TOR.
-
-To detect congestion in order to deliver this signal, we will deploy a
-simplified version of the already-simple CoDel algorithm on each
-outbound TLS connection at relays.
- https://queue.acm.org/detail.cfm?id=2209336
- https://tools.ietf.org/html/rfc8289
-
-Each cell will get a timestamp upon arrival at a relay that will allow
-us to measure how long it spends in queues, all the way to hitting a TLS
-outbuf.
-
-The duration of total circuitmux queue time for each cell will be
-compared a consensus parameter 'min_queue_target', which is set to 5% of
-min network RTT. (This mirrors the CoDel TARGET parameter).
-
-Additionally, an inspection INTERVAL parameter 'queue_interval' governs
-how long queue lengths must exceed 'min_queue_target' before a circuit
-is declared congested. This mirrors the CoDel INTERVAL parameter, and it
-should default to approximately 50-100% of average network RTT.
-
-As soon as the cells of a circuit spend more than 'min_queue_target'
-time in queues for at least 'queue_interval' amount of time, per-circuit
-flag 'ecn_exit_slow_start' will be set to 1. As soon as a cell is
-available in the opposite direction on that circuit, the relay will flip
-the cell_t.command of from CELL_COMMAND_RELAY to
-CELL_COMMAND_RELAY_CONGESTION. (We must wait for a cell in the opposite
-direction because that is the sender that caused the congestion).
-
-This enhancement will allow endpoints to very quickly exit from
-[CONTROL_ALGORITHM] "slow start" phase (during which, the congestion
-window increases exponentially). The ability to more quickly exit the
-exponential slow start phase during congestion will help reduce queue
-sizes at relays.
-
-To avoid side channels, this cell must only be flipped on
-CELL_COMMAND_RELAY, and not CELL_COMMAND_RELAY_EARLY. Additionally, all
-relays MUST enforce that only *one* such cell command is flipped, per
-direction, per circuit. Any additional CELL_COMMAND_RELAY_CONGESTION
-cells seen by any relay or client MUST cause those circuit participants
-to immediately close the circuit.
-
-As a further optimization, if no relay cells are pending in the opposite
-direction as congestion is happening, we can send a zero-filled cell
-instead. In the forward direction of the circuit, we can send this cell
-without any crypto layers, so long as further relays enforce that the
-contents are zero-filled, to avoid side channels.
-
3. Congestion Window Update Algorithms [CONTROL_ALGORITHMS]
-We specify two candidate window update algorithms. The specification
-describes the update algorithms for the slow start phase and the
-steady state phase separately, with some explanation. Then the
-combined algorithm is given.
-
-Note that these algorithms differ slightly from the background tor-dev
-mails[1,2], due to corrections and improvements. Hence they have been
-given different names than in those two mails.
+In general, the goal of congestion control is to ensure full and fair
+utilization of the capacity of a network path -- in the case of Tor the spare
+capacity of the circuit. This is accomplished by setting the congestion window
+to target the Bandwidth-Delay Product[28] (BDP) of the circuit in one way or
+another, so that the total data outstanding is roughly equal to the actual
+transit capacity of the circuit.
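
As a rough illustration (the numbers are hypothetical, not from this
proposal): a circuit that can carry 1000 cells/sec with a minimum RTT of
500 msec has

   BDP = 1000 cells/sec * 0.5 sec = 500 cells

so a congestion window near 500 cells keeps the circuit fully utilized
without building standing queues at relays.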
+
+There are several ways to update a congestion window to target the BDP. Some
+use direct BDP estimation, whereas others use backoff properties to achieve
+this. We specify three BDP estimation algorithms in the [BDP_ESTIMATION]
+sub-section, and three congestion window update algorithms in [TOR_WESTWOOD],
+[TOR_VEGAS], and [TOR_NOLA].
+
+Note that the congestion window update algorithms differ slightly from the
+background tor-dev mails[1,2], due to corrections and improvements. Hence they
+have been given different names than in those two mails. The third algorithm,
+[TOR_NOLA], simply uses the latest BDP estimate directly as its congestion
+window.
These algorithms will be evaluated by running Shadow simulations, to
help determine parameter ranges, but experimentation on the live network
@@ -245,21 +226,28 @@ when in competition with our current SENDME behavior, as used by real
network traffic. This experimentation and tuning is detailed in section
[EVALUATION].
-All of these algorithms have rules to update 'cwnd' - the current
-congestion window. In the C Tor reference implementation, 'cwnd' is
-called the circuit 'package_window'. C Tor also maintains a
-'deliver_window', which it uses to track how many cells it has received,
-in order to send the appropriate number of SENDME acks.
-
- TODO: This 'deliver_window' count can be updated by the other
- endpoint using the congestion control rules to watch for
- cheating. Alternatively, it can be simplified just to count
- the number of cells we get until we send a SENDME.
+All of these algorithms have rules to update 'cwnd' - the current congestion
+window, which starts out at a value controlled by consensus parameter
+'cc_cwnd_init'. The algorithms also keep track of 'inflight', which is a count
+of the number of cells currently not yet acked by a SENDME. The algorithm MUST
+ensure that cells cease being sent if 'cwnd - inflight <= 0'. Note that this
+value CAN become negative in the case where the cwnd is reduced while packets
+are inflight.
+
+While these algorithms are in use, updates and checks of the current
+'package_window' field are disabled. Where a 'package_window' value is
+still needed, for example by cell packaging schedulers, 'cwnd - inflight' is
+used (with checks to return 0 in the event of negative values).
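
A minimal sketch of this substitution, with hypothetical names standing in
for the C Tor fields:

   def congestion_window_room(cwnd, inflight):
       # Value used where a 'package_window' is still expected; clamp to 0
       # because cwnd - inflight can be negative after a window reduction.
       return max(0, cwnd - inflight)

   def can_package_more(cwnd, inflight):
       # Cells cease being sent once cwnd - inflight <= 0.
       return cwnd - inflight > 0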
+
+The 'deliver_window' field is still used to decide when to send a SENDME. In C
+Tor, the deliver window is initially set at 1000, but it never gets below 900,
+because authenticated sendmes (Proposal 289) require that we send only
+one SENDME at a time, and send it immediately after 100 cells are received.
+This property turns out to be very useful for [BDP_ESTIMATION].
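
A sketch of that receive-side accounting, using illustrative names
('send_sendme' is passed in as a hypothetical callback, and cc_sendme_inc
is assumed to be 100 here):

   DELIVER_WINDOW_INIT = 1000

   def note_cell_delivered(circ, send_sendme, cc_sendme_inc=100):
       # One SENDME is sent immediately per cc_sendme_inc received cells,
       # so deliver_window stays within [900, 1000].
       circ.deliver_window -= 1
       if circ.deliver_window <= DELIVER_WINDOW_INIT - cc_sendme_inc:
           send_sendme(circ)
           circ.deliver_window += cc_sendme_inc
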
Implementation of different algorithms should be very simple - each
-algorithm should have a different set of package_window update functions
-depending on the selected algorithm, as specified by consensus parameter
-'cc_alg'.
+algorithm should have its own congestion window update function, selected by
+the consensus parameter 'cc_alg'.
For C Tor's current flow control, these functions are defined in sendme.c,
and are called by relay.c:
@@ -282,235 +270,323 @@ after too many congestion signals. These are deliberate choices that
simplify the algorithms and also should provide better performance for
Tor workloads.
- TODO: We may want to experiment with adding revert-to-slow-start back
- in, but slow start is very expensive in a lot of ways, so let's
- see if we can avoid falling back into it, if at all possible.
+In all cases, variables in these sections are either consensus parameters
+specified in [CONSENSUS_PARAMETERS], or scoped to the circuit. Consensus
+parameters for congestion control are all prefixed by cc_. Everything else
+is circuit-scoped.
-3.1. Tor Westwood: TCP Westwood using RTT signaling [TOR_WESTWOOD]
- http://intronetworks.cs.luc.edu/1/html/newtcps.html#tcp-westwood
- http://nrlweb.cs.ucla.edu/nrlweb/publication/download/99/2001-mobicom-0.pdf
- http://cpham.perso.univ-pau.fr/TCP/ccr_v31.pdf
- https://c3lab.poliba.it/images/d/d7/Westwood_linux.pdf
-Recall that TCP Westwood is basically TCP Reno, but it uses RTT to help
-estimate the bandwidth-delay-product (BDP) of the link, and use that for
-"Fast recovery" after a congestion signal arrives.
+3.1. Estimating Bandwidth-Delay Product [BDP_ESTIMATION]
-We will also be using the RTT congestion signal as per BOOTLEG_RTT_TOR
-here, from the Options mail[1] and Defenestrator paper[3]. Recall that
-BOOTLEG_RTT_TOR emits a congestion signal when the current RTT falls
-below some fractional threshold ('rtt_thresh') fraction between RTT_min
-and RTT_max:
+At a high-level, there are three main ways to estimate the Bandwidth-Delay
+Product: by using the current congestion window and RTT, by using the inflight
+cells and RTT, and by measuring SENDME arrival rate.
- RTT_current < (1−rtt_thresh)*RTT_min + rtt_thresh*RTT_max
+All three estimators are updated every SENDME ack arrival.
-We can also optionally use the ECN signal described in [BACKWARD_ECN]
-above, to exit Slow Start.
+The SENDME arrival rate is the most accurate way to estimate BDP, but it
+requires averaging over multiple SENDME acks to do so. The congestion window
+and inflight estimates rely on the congestion algorithm more or less correctly
+tracking an approximation of the BDP, and then use current and minimum RTT to
+compensate for overshoot.
-Tor Westwood will require each circuit endpoint to maintain a
-Bandwidth-Delay-Product (BDP) and Bandwidth Estimate (BWE) variable.
+The SENDME estimator tends to be accurate after ~3-5 SENDME acks. The cwnd and
+inflight estimators tend to be accurate once the congestion window exceeds
+BDP.
-The bandwidth estimate is the current congestion window size divided by
-the RTT estimate:
+We specify all three because they are all useful in various cases. These cases
+are broken up and combined to form the Piecewise BDP estimator.
- BWE = cwnd / RTT_current
+The SENDME estimator is smoothed through N-EWMA averaging on these
+estimators[27], to smooth out temporary effects. This is accomplished by using
+an EWMA function with alpha = 2/(N+1):
-The BDP estimate is computed by multiplying the Bandwidth estimate by
-the circuit latency:
+ N_EWMA = BDP*2/(N+1) + N_EWMA_prev*(N-1)/(N+1).
+
+N is specified in terms of the number of current congestion windows worth of
+SENDMEs to average, via consensus parameter 'cc_ewma_cwnd_cnt'.
+
+The cwnd and inflight estimators also use N-EWMA smoothing in the same way,
+but only smooth the current RTT estimate, not the congestion window value.
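
A sketch of one N-EWMA update step, and of how N is derived from
'cc_ewma_cwnd_cnt' (function names are illustrative only):

   def n_ewma(prev_ewma, sample, N):
       # alpha = 2/(N+1), as in the formula above.
       if prev_ewma == 0:
           return sample            # no history yet
       return (sample * 2 + prev_ewma * (N - 1)) / (N + 1)

   def ewma_n(cc_ewma_cwnd_cnt, cwnd, cc_sendme_inc):
       # N = the number of SENDMEs in cc_ewma_cwnd_cnt congestion windows.
       return max(1, cc_ewma_cwnd_cnt * cwnd // cc_sendme_inc)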
+
+3.1.1. SENDME arrival BDP estimation
+
+The most accurate way to estimate BDP is to measure the amount of time between
+SENDME acks. In this period of time, we know that the endpoint successfully
+received 'cc_sendme_inc' cells.
+
+This means that the bandwidth of the circuit is then calculated as:
+
+ BWE = cc_sendme_inc/sendme_ack_timestamp_delta
+
+The bandwidth delay product of the circuit is calculated by multiplying this
+bandwidth estimate by the *minimum* RTT time of the circuit (to avoid counting
+queue time):
BDP = BWE * RTT_min
- TODO: Different papers on TCP Westwood and TCP Vegas recommend
- different methods for calculating BWE. See citations for
- details, but common options are 'packets_in_flight/RTT_current'
- or 'circwindow_inc*sendme_arrival_rate'. They also recommend
- averaging and filtering of the BWE, due to ack compression in
- inbound queues. We will need to experiment to determine how to
- best compute the BWE for Tor circuits.
+In order to minimize the effects of ack compression (aka SENDME responses
+becoming close to one another due to queue delay on the return), we
+maintain a history of a full congestion window worth of previous SENDME
+timestamps.
-3.1.1. Tor Westwood: Slow Start
+With this, the calculation becomes:
-Prior to the first congestion signal, Tor Westwood will update its
-congestion window exponentially, as per Slow Start.
+ BWE = (num_sendmes-1) * cc_sendme_inc / num_sendme_timestamp_delta
+ BDP = BWE * RTT_min
-Recall that this first congestion signal can be either BOOTLEG_RTT_TOR's
-RTT threshold signal, or BACKWARD_ECN's cell command signal.
+Note that because we are counting the number of cells *between* the first
+and last sendme of the congestion window, we must subtract 1 from the number
+of sendmes actually received. Over the time period between the first and last
+sendme of the congestion window, the other endpoint successfully read
+(num_sendmes-1) * cc_sendme_inc cells.
-For simplicity, we will just write the BOOTLEG_RTT_TOR check, which
-compares the current RTT measurement to the observed min and max RTT,
-using the consensus parameter 'rtt_thresh'.
+Furthermore, because the timestamps are microseconds, to avoid integer
+truncation, we compute the BDP using multiplication first:
-This section of the update algorithm is:
+ BDP = (num_sendmes-1) * cc_sendme_inc * RTT_min / num_sendme_timestamp_delta
- cwnd = cwnd + circwindow_inc # For acked cells
+Note that the SENDME BDP estimation will only work after two (2) SENDME acks
+have been received. Additionally, it tends not to be stable unless at least
+five (5) num_sendme's are used, due to ack compression. This is controlled by
+the 'cc_bwe_min' consensus parameter. Finally, if [CLOCK_HEURISTICS] have
+detected a clock jump or stall, this estimator is not updated.
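
For illustration, a sketch of this estimator over the stored SENDME
timestamps (in microseconds); the names are hypothetical, and the N-EWMA
smoothing described above is assumed to be applied by the caller:

   def sendme_bdp_estimate(sendme_timestamps, cc_sendme_inc, rtt_min,
                           cc_bwe_min=5):
       # Timestamps and RTT are in microseconds; BDP is in cells.
       num_sendmes = len(sendme_timestamps)
       if num_sendmes < cc_bwe_min:
           return None              # not enough acks for a stable estimate
       delta = sendme_timestamps[-1] - sendme_timestamps[0]
       if delta == 0:
           return None              # clock stall; see [CLOCK_HEURISTICS]
       # Multiply before dividing to avoid integer truncation.
       return (num_sendmes - 1) * cc_sendme_inc * rtt_min // delta
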
- # BOOTLEG_RTT_TOR threshold check:
- if RTT_current < (1−rtt_thresh)*RTT_min + rtt_thresh*RTT_max:
- cwnd = cwnd + circwindow_inc # Exponential window growth
- else:
- BDP = BWE*RTT_min
- cwnd = max(cwnd * cwnd_recovery_m, BDP)
- in_slow_start = 0
+If all edge connections no longer have data available to send on a circuit
+and all circuit queues have drained without blocking the local orconn, we stop
+updating this BDP estimate and discard old timestamps. However, we retain the
+actual estimator value.
-Increasing the congestion window by 100 *more* cells every SENDME allows
-the sender to send 100 *more* cells every 100 cells. Thus, this is an
-exponential function that causes cwnd to double every cwnd cells.
+3.1.2. Congestion Window BDP Estimation
-Once a congestion signal is experienced, Slow Start is exited, and the
-Additive-Increase-Multiplicative-Decrease (AIMD) steady-state phase
-begins.
+Assuming that the current congestion window is at or above the current BDP,
+the bandwidth estimate is the current congestion window size divided by the
+RTT estimate:
-3.1.2. Tor Westwood: Steady State AIMD
+ BWE = cwnd / RTT_current_ewma
-After slow start exits, in steady-state, after every SENDME response
-without a congestion signal, the window is updated as:
+The BDP estimate is computed by multiplying the Bandwidth estimate by
+the *minimum* circuit latency:
- cwnd = cwnd + circwindow_inc # For acked cells
- cwnd = cwnd + circwindow_inc/cwnd # Linear window growth
+ BDP = BWE * RTT_min
-This comes out to increasing cwnd by 1, every time cwnd cells are
-successfully sent without a congestion signal occurring. Thus this is
-additive linear growth, not exponential growth.
+Simplifying:
-If there is a congestion signal, cwnd is updated as:
-
- cwnd = cwnd + circwindow_inc # For acked cells
- cwnd = max(cwnd * cwnd_recovery_m, BDP) # For window shrink
+ BDP = cwnd * RTT_min / RTT_current_ewma
-This is called "Fast Recovery". If you dig into the citations, actual
-TCP Westwood has some additional details for responding to multiple
-packet losses that in some cases can fall back into slow-start, as well
-as some smoothing of the BDP to make up for dropped acks. Tor does not
-need either of these aspects of complexity.
+The net effect of this estimation is to correct for any overshoot of
+the cwnd over the actual BDP. It will obviously underestimate BDP if cwnd
+is below BDP.
-3.1.3. Tor Westwood: Complete SENDME Update Algorithm
+3.1.3. Inflight BDP Estimation
-Here is the complete congestion window algorithm for Tor Westwood, using
-only RTT signaling.
+Similar to the congestion window based estimation, the inflight estimation
+uses the current inflight packet count to derive BDP. It also subtracts local
+circuit queue use from the inflight packet count. This means it will be strictly
+less than or equal to the cwnd version:
-This will run each time we get a SENDME (aka sendme_process_circuit_level()):
+ BDP = (inflight - circ.chan_cells.n) * RTT_min / RTT_current_ewma
- in_slow_start = 1 # Per-circuit indicator
+If all edge connections no longer have data available to send on a circuit
+and all circuit queues have drained without blocking the local orconn, we stop
+updating this BDP estimate, because there are not sufficient inflight cells
+to properly estimate BDP.
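
A sketch of both closed-form estimators; 'queued_cells' stands in for
circ.chan_cells.n, and all other names are illustrative:

   def cwnd_bdp_estimate(cwnd, rtt_min, rtt_current_ewma):
       # BDP = cwnd * RTT_min / RTT_current_ewma
       return cwnd * rtt_min // rtt_current_ewma

   def inflight_bdp_estimate(inflight, queued_cells, rtt_min, rtt_current_ewma):
       # Subtract cells still sitting in the local circuit queue, so this
       # estimate is always <= the cwnd-based estimate.
       return (inflight - queued_cells) * rtt_min // rtt_current_ewma
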
- Every received SENDME ack, do:
- cwnd = cwnd + circwindow_inc # Update acked cells
+3.1.4. Piecewise BDP estimation
- # BOOTLEG_RTT_TOR threshold; can also be BACKWARD_ECN check:
- if RTT_current < (1−rtt_thresh)*RTT_min + rtt_thresh*RTT_max:
- if in_slow_start:
- cwnd = cwnd + circwindow_inc # Exponential growth
- else:
- cwnd = cwnd + circwindow_inc/cwnd # Linear growth
+The piecewise BDP estimation is used to help respond more quickly in the event
+the local OR connection is blocked, which indicates congestion somewhere along
+the path from the client to the guard (or between Exit and Middle). In this
+case, it takes the minimum of the inflight and SENDME estimators.
+
+When the local OR connection is not blocked, this estimator uses the max of
+the SENDME and cwnd estimator values.
+
+When the SENDME estimator has not gathered enough data, or has cleared its
+estimates based on lack of edge connection use, this estimator uses the
+Congestion Window BDP estimator value.
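
A sketch of that selection rule (hypothetical names; a SENDME estimate of
None stands for "not enough data, or estimates cleared due to idle edges"):

   def piecewise_bdp(sendme_bdp, cwnd_bdp, inflight_bdp, orconn_blocked):
       if sendme_bdp is None:
           return cwnd_bdp                       # fall back to cwnd estimator
       if orconn_blocked:
           return min(inflight_bdp, sendme_bdp)  # react to local congestion
       return max(sendme_bdp, cwnd_bdp)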
+
+
+3.2. Tor Westwood: TCP Westwood using RTT signaling [TOR_WESTWOOD]
+ http://intronetworks.cs.luc.edu/1/html/newtcps.html#tcp-westwood
+ http://nrlweb.cs.ucla.edu/nrlweb/publication/download/99/2001-mobicom-0.pdf
+ http://cpham.perso.univ-pau.fr/TCP/ccr_v31.pdf
+ https://c3lab.poliba.it/images/d/d7/Westwood_linux.pdf
+
+Recall that TCP Westwood is basically TCP Reno, but it uses BDP estimates
+for "Fast recovery" after a congestion signal arrives.
+
+We will also be using the RTT congestion signal as per BOOTLEG_RTT_TOR
+here, from the Options mail[1] and Defenestrator paper[3].
+
+This system must keep track of RTT measurements per circuit: RTT_min, RTT_max,
+and RTT_current. These are measured using the time delta between every
+'cc_sendme_inc' relay cells and the SENDME response. The first RTT_min can be
+measured arbitrarily, so long as it is larger than what we would get from
+SENDME.
+
+RTT_current is N-EWMA smoothed over 'cc_ewma_cwnd_cnt' congestion windows
+worth of SENDME acks.
+
+Recall that BOOTLEG_RTT_TOR emits a congestion signal when the current
+RTT falls below some fractional threshold ('cc_westwood_rtt_thresh') fraction
+between RTT_min and RTT_max. This check is:
+    RTT_current < (1-cc_westwood_rtt_thresh)*RTT_min
+                   + cc_westwood_rtt_thresh*RTT_max
+
+Additionally, if the local OR connection is blocked at the time of SENDME ack
+arrival, this is treated as an immediate congestion signal.
+
+(We can also optionally use the ECN signal described in
+ideas/xxx-backward-ecn.txt, to exit Slow Start.)
+
+Congestion signals from RTT, blocked OR connections, or ECN are processed only
+once per congestion window. This is achieved through the next_cc_event flag,
+which is initialized to a cwnd worth of SENDME acks, and is decremented
+each ack. Congestion signals are only evaluated when it reaches 0.
+
+Note that because the congestion signal threshold of TOR_WESTWOOD is a
+function of RTT_max, and excessive queuing can cause an increase in RTT_max,
+TOR_WESTWOOD may have a runaway condition. For this reason, we specify a
+cc_westwood_rtt_m backoff parameter, which can reduce RTT_max by a percentage
+of the difference between RTT_min and RTT_max, in the event of congestion.
+
+Here is the complete congestion window algorithm for Tor Westwood. This will run
+each time we get a SENDME (aka sendme_process_circuit_level()):
+
+   # Update acked cells
+   inflight -= cc_sendme_inc
+
+   if next_cc_event:
+     next_cc_event--
+
+   # Do not update anything if we detected a clock stall or jump,
+   # as per [CLOCK_HEURISTICS]
+   if clock_stalled_or_jumped:
+     return
+
+   if next_cc_event == 0:
+     # BOOTLEG_RTT_TOR threshold; can also be BACKWARD_ECN check:
+     if (RTT_current <
+          (100-cc_westwood_rtt_thresh)*RTT_min/100 +
+          cc_westwood_rtt_thresh*RTT_max/100) or orconn_blocked:
+       if in_slow_start:
+         cwnd += cwnd * cc_cwnd_inc_pct_ss   # Exponential growth
        else:
-         BDP = BWE*RTT_min
-         cwnd = max(cwnd * cwnd_recovery_m, BDP)   # Window shrink
-         in_slow_start = 0
+         cwnd = cwnd + cc_cwnd_inc           # Linear growth
+     else:
+       if cc_westwood_backoff_min:
+         cwnd = min(cwnd * cc_westwood_cwnd_m, BDP)   # Window shrink
+       else:
+         cwnd = max(cwnd * cc_westwood_cwnd_m, BDP)   # Window shrink
+       in_slow_start = 0
+
+       # Back off RTT_m