Diffstat (limited to 'proposals/324-rtt-congestion-control.txt')
 proposals/324-rtt-congestion-control.txt | 454
 1 file changed, 318 insertions(+), 136 deletions(-)
diff --git a/proposals/324-rtt-congestion-control.txt b/proposals/324-rtt-congestion-control.txt
index 78b6789..c601980 100644
--- a/proposals/324-rtt-congestion-control.txt
+++ b/proposals/324-rtt-congestion-control.txt
@@ -1,8 +1,9 @@
+```
Filename: 324-rtt-congestion-control.txt
Title: RTT-based Congestion Control for Tor
Author: Mike Perry
Created: 02 July 2020
-Status: Open
+Status: Finished
0. Motivation [MOTIVATION]
@@ -144,37 +145,43 @@ measured by the RTT estimator, and if these heurtics detect a stall or a jump,
we do not use that value to update RTT or BDP, nor do we update any congestion
control algorithm information that round.
-If the time delta is 0, that is always treated as a clock stall.
+If the time delta is 0, that is always treated as a clock stall: the RTT is
+not used, congestion control is not updated, and this fact is cached globally.
-If we have measured at least 'cc_bwe_min' RTT values or we have successfully
-exited slow start, then every sendme ACK, the new candidate RTT is compared to
-the stored EWMA RTT. If the new RTT is either 5000 times larger than the EWMA
-RTT, or 5000 times smaller than the stored EWMA RTT, then we do not record that
-estimate, and do not update BDP or the congestion control algorithms for that
-SENDME ack.
+If the circuit does not yet have an EWMA RTT or it is still in Slow Start, then
+no further checks are performed, and the RTT is used.
-Moreover, if a clock stall is detected by *any* circuit, this fact is
-cached, and this cached value is used on circuits for which we do not
-have enough data to compute the above heueristics. This cached value is
-also exported for use by the edge connection rate calculations done by
-[XON_ADVISORY].
+If the circuit has stored an EWMA RTT and has exited Slow Start, then every
+sendme ACK, the new candidate RTT is compared to the stored EWMA RTT. If the
+new RTT is 5000 times larger than the EWMA RTT, then the circuit does not
+record that estimate, and does not update BDP or the congestion control
+algorithms for that SENDME ack. If the new RTT is 5000 times smaller than the
+EWMA RTT, then the circuit uses the globally cached value from above (ie: it
+assumes the clock is stalled *only* if there was previously *also* a 0-delta RTT).
+
+If both ratio checks pass, the globally cached clock stall state is set to
+false (no stall), and the RTT value is used.
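+
+As a non-normative sketch, in the style of the pseudocode used elsewhere in
+this proposal (variable and helper names here are illustrative, not the
+actual C-Tor identifiers), these checks amount to:
+
+  # _delta is the raw time measured for this SENDME ack
+  if _delta == 0:
+    clock_stalled = true      # cached globally, across circuits
+    return                    # do not update RTT, BDP, or congestion control
+
+  if ewma_rtt == 0 or in_slow_start:
+    return use_rtt(_delta)    # no EWMA RTT yet, or still in Slow Start
+
+  if _delta > 5000*ewma_rtt:  # implausibly large jump: discard this sample
+    return
+
+  if _delta*5000 < ewma_rtt:  # implausibly small: trust the cached state
+    if clock_stalled:
+      return                  # a stall only if we previously saw a 0-delta
+  else:
+    clock_stalled = false     # both ratio checks passed
+
+  use_rtt(_delta)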
2.1.2. N_EWMA Smoothing [N_EWMA_SMOOTHING]
-Both RTT estimation and SENDME BDP estimation require smoothing, to
-reduce the effects of packet jitter.
+RTT estimation requires smoothing, to reduce the effects of packet jitter.
This smoothing is performed using N_EWMA[27], which is an Exponential
Moving Average with alpha = 2/(N+1):
- N_EWMA = BDP*2/(N+1) + N_EWMA_prev*(N-1)/(N+1).
+ N_EWMA = RTT*2/(N+1) + N_EWMA_prev*(N-1)/(N+1)
+ = (RTT*2 + N_EWMA_prev*(N-1))/(N+1).
+
+Note that the second rearranged form MUST be used in order to ensure that
+rounding errors are handled in the same manner as other implementations.
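+
+For example (non-normative), with N = 4, a previous N_EWMA of 301, and a new
+RTT sample of 401 (in arbitrary time units), the rearranged form gives
+(401*2 + 301*3)/5 = 1705/5 = 341, whereas summing the two terms with separate
+integer divisions gives 401*2/5 + 301*3/5 = 160 + 180 = 340. This unit-level
+divergence is why the single-division form is required.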
-Flow control rate limiting uses this function
+Flow control rate limiting uses this function.
-For both RTT and SENDME BDP estimation, N is the number of SENDME acks
-between congestion window updates, divided by the value of consensus
-parameter 'cc_ewma_cwnd_pct', and then capped at a max of 'cc_ewma_max',
-but always at least 2:
+During Slow Start, N is set to `cc_ewma_ss` for RTT estimation.
+
+After Slow Start, N is the number of SENDME acks between congestion window
+updates, divided by the value of consensus parameter 'cc_ewma_cwnd_pct', and
+then capped at a max of 'cc_ewma_max', but always at least 2:
N = MAX(MIN(CWND_UPDATE_RATE(cc)*cc_ewma_cwnd_pct/100, cc_ewma_max), 2);
@@ -239,12 +246,11 @@ have been given different names than in those two mails. The third algorithm,
[TOR_NOLA], simply uses the latest BDP estimate directly as its congestion
window.
-These algorithms will be evaluated by running Shadow simulations, to
-help determine parameter ranges, but experimentation on the live network
-will be required to determine which of these algorithms performs best
-when in competition with our current SENDME behavior, as used by real
-network traffic. This experimentation and tuning is detailed in section
-[EVALUATION].
+These algorithms were evaluated by running Shadow simulations, to help
+determine parameter ranges, and with experimentation on the live network.
+After this testing, we have converged on using [TOR_VEGAS], and RTT-based BDP
+estimation using the congestion window. We leave the other algorithms in
+place for historical reference.
All of these algorithms have rules to update 'cwnd' - the current congestion
window, which starts out at a value controlled by consensus parameter
@@ -263,7 +269,6 @@ The 'deliver_window' field is still used to decide when to send a SENDME. In C
tor, the deliver window is initially set at 1000, but it never gets below 900,
because authenticated sendmes (Proposal 289) require that we must send only
one SENDME at a time, and send it immediately after 100 cells are received.
-This property turns out to be very useful for [BDP_ESTIMATION].
Implementation of different algorithms should be very simple - each
algorithm should have a different update function depending on the selected algorithm,
@@ -300,22 +305,23 @@ is circuit-scoped.
At a high-level, there are three main ways to estimate the Bandwidth-Delay
Product: by using the current congestion window and RTT, by using the inflight
-cells and RTT, and by measuring SENDME arrival rate.
+cells and RTT, and by measuring SENDME arrival rate. After extensive Shadow
+simulation and live testing, we have arrived at using the congestion window
+and RTT based estimator, but we will describe all three for background.
All three estimators are updated every SENDME ack arrival.
-The SENDME arrival rate is the most accurate way to estimate BDP, but it
-requires averaging over multiple SENDME acks to do so. The congestion window
-and inflight estimates rely on the congestion algorithm more or less correctly
-tracking an approximation of the BDP, and then use current and minimum RTT to
-compensate for overshoot.
-
-The SENDME estimator tends to be accurate after ~3-5 SENDME acks. The cwnd and
-inflight estimators tend to be accurate once the congestion window exceeds
-BDP.
+The SENDME arrival rate is the most direct way to estimate BDP, but it
+requires averaging over multiple SENDME acks to do so. Unfortunately,
+this approach suffers from what is called "ACK compression", where returning
+SENDMEs build up in queues, causing over-estimation of the BDP.
-We specify all three because they are all useful in various cases. These cases
-are broken up and combined to form the Piecewise BDP estimator.
+The congestion window and inflight estimates rely on the congestion algorithm
+more or less correctly tracking an approximation of the BDP, and then use
+current and minimum RTT to compensate for overshoot. These estimators tend to
+under-estimate BDP, especially when the congestion window is below the BDP.
+This under-estimation is corrected for by the increase of the congestion
+window in congestion control algorithm rules.
3.1.1. SENDME arrival BDP estimation
@@ -372,6 +378,8 @@ in Shadow simulation, due to ack compression.
3.1.2. Congestion Window BDP Estimation
+This is the BDP estimator we use.
+
Assuming that the current congestion window is at or above the current BDP,
the bandwidth estimate is the current congestion window size divided by the
RTT estimate:
@@ -387,6 +395,13 @@ Simplifying:
BDP = cwnd * RTT_min / RTT_current_ewma
+The RTT_min for this calculation comes from the minimum RTT_current_ewma seen
+in the lifetime of this circuit. If the congestion window falls to
+`cc_cwnd_min` after slow start, implementations MAY choose to reset RTT_min
+for use in this calculation to either the RTT_current_ewma, or a
+percentile-weighted average between RTT_min and RTT_current_ewma, specified by
+`cc_rtt_reset_pct`. This helps with escaping starvation conditions.
+
The net effect of this estimation is to correct for any overshoot of
the cwnd over the actual BDP. It will obviously underestimate BDP if cwnd
is below BDP.
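+
+A non-normative sketch of this estimator, including the optional RTT_min
+reset (the weighting by `cc_rtt_reset_pct` shown here is one reasonable
+reading of the MAY clause above; names are illustrative):
+
+  # On each accepted RTT sample, track the lifetime minimum of the EWMA RTT:
+  RTT_min = MIN(RTT_min, RTT_current_ewma)
+
+  # Optional reset if cwnd has collapsed to the minimum after Slow Start:
+  if cwnd == 'cc_cwnd_min' and not in_slow_start:
+    RTT_min = ('cc_rtt_reset_pct'*RTT_current_ewma +
+               (100 - 'cc_rtt_reset_pct')*RTT_min)/100
+
+  # BDP estimate used by the congestion control algorithm:
+  BDP = cwnd*RTT_min/RTT_current_ewma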
@@ -405,12 +420,17 @@ and all circuit queues have drained without blocking the local orconn, we stop
updating this BDP estimate, because there are not sufficient inflight cells
to properly estimate BDP.
+While the research literature for Vegas says that inflight estimators
+performed better due to their ability to avoid overshoot, we had better
+performance results using other methods to control overshoot. Hence, we do not
+use the inflight BDP estimator.
+
3.1.4. Piecewise BDP estimation
-The piecewise BDP estimation is used to help respond more quickly in the event
-the local OR connection is blocked, which indicates congestion somewhere along
-the path from the client to the guard (or between Exit and Middle). In this
-case, it takes the minimum of the inflight and SENDME estimators.
+A piecewise BDP estimation could be used to help respond more quickly in the
+event the local OR connection is blocked, which indicates congestion somewhere
+along the path from the client to the guard (or between Exit and Middle). In
+this case, it takes the minimum of the inflight and SENDME estimators.
When the local OR connection is not blocked, this estimator uses the max of
the SENDME and cwnd estimator values.
@@ -512,6 +532,9 @@ each time we get a SENDME (aka sendme_process_circuit_level()):
TCP Vegas control algorithm estimates the queue lengths at relays by
subtracting the current BDP estimate from the current congestion window.
+After extensive Shadow simulation and live testing, we have settled on this
+congestion control algorithm for use in Tor.
+
Assuming the BDP estimate is accurate, any amount by which the congestion
window exceeds the BDP will cause data to queue.
@@ -529,63 +552,151 @@ original cwnd estimator. So while this capability to change the BDP estimator
remains in the C implementation, we do not expect it to be used.
However, it was useful to use a local OR connection block at the time of
-SENDME ack arrival, as an immediate congestion signal.
+SENDME ack arrival, as an immediate congestion signal. Note that in C-Tor,
+this orconn_block state is not derived from any socket info, but instead is a
+heuristic that declares an orconn as blocked if any circuit cell queue
+exceeds the 'cellq_high' consensus parameter.
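+
+As a rough sketch (non-normative; the function and helper names are
+illustrative), that heuristic is:
+
+  orconn_blocked(orconn):
+    for circ in circuits_on(orconn):
+      if cell_queue_len(circ) > 'cellq_high':
+        return true
+    return false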
(As an additional optimization, we could also use the ECN signal described in
ideas/xxx-backward-ecn.txt, but this is not implemented. It is likely only of
any benefit during Slow Start, and even that benefit is likely small.)
-Congestion signals from RTT, blocked OR connections, or ECN are processed only
-once per congestion window. This is achieved through the next_cc_event flag,
-which is initialized to a cwnd worth of SENDME acks, and is decremented
-each ack. Congestion signals are only evaluated when it reaches 0.
-
-Here is the complete pseudocode for TOR_VEGAS, which is run every time
-an endpoint receives a SENDME ack:
-
- # Update acked cells
- inflight -= cc_sendme_inc
+During Slow Start, we use RFC3742 Limited Slow Start[32], which checks the
+congestion signals from RTT, blocked OR connections, or ECN every single
+SENDME ack. It also provides a `cc_sscap_*` parameter for each path length,
+which reduces the congestion window increment rate after it is crossed, as
+per the rules in RFC3742:
+ rfc3742_ss_inc(cwnd):
+ if cwnd <= cc_ss_cap_pathtype:
+ # Below the cap, we increment as per cc_cwnd_inc_pct_ss percent:
+ return round(cc_cwnd_inc_pct_ss*cc_sendme_inc/100)
+ else:
+ # This returns an increment equivalent to RFC3742, rounded,
+ # with a minimum of inc=1.
+ # From RFC3742:
+ # K = int(cwnd/(0.5 max_ssthresh));
+ # inc = int(MSS/K);
+ return MAX(round((cc_sendme_inc*cc_ss_cap_pathtype)/(2*cwnd)), 1);
+
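+As a worked example (non-normative, assuming 'cc_sendme_inc'=31,
+'cc_cwnd_inc_pct_ss'=100, and the exit cap 'cc_sscap_exit'=600): below the
+cap, each SENDME increments cwnd by round(100*31/100) = 31 cells; at
+cwnd=1000, the increment falls to MAX(round(31*600/2000), 1) = 9 cells; at
+cwnd=2000, it falls further to 5 cells.
+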
+During both Slow Start and Steady State, if the congestion window is not full,
+we never increase the congestion window. We can still decrease it, or exit slow
+start, in this case. This is done to avoid causing overshoot. The original TCP
+Vegas addressed this problem by computing BDP and queue_use from inflight,
+instead of cwnd, but we found that approach to have significantly worse
+performance.
+
+Because C-Tor is single-threaded, multiple SENDME acks may arrive during one
+processing loop, before edge connections resume reading. For this reason,
+we use two heuristics to provide some slack in determining the full
+condition. The first is to allow a gap between inflight and cwnd,
+parameterized as 'cc_cwnd_full_gap' multiples of 'cc_sendme_inc':
+ cwnd_is_full(cwnd, inflight):
+ if inflight + 'cc_cwnd_full_gap'*'cc_sendme_inc' >= cwnd:
+ return true
+ else:
+ return false
+The second heuristic immediately resets the full state if it falls below
+'cc_cwnd_full_minpct' full:
+ cwnd_is_nonfull(cwnd, inflight):
+ if 100*inflight < 'cc_cwnd_full_minpct'*cwnd:
+ return true
+ else:
+ return false
+
+This full status is cached once per cwnd if 'cc_cwnd_full_per_cwnd=1';
+otherwise it is cached once per cwnd update. These two helper functions
+determine the number of acks in each case:
+ SENDME_PER_CWND(cwnd):
+ return ((cwnd + 'cc_sendme_inc'/2)/'cc_sendme_inc')
+ CWND_UPDATE_RATE(cwnd, in_slow_start):
+ # In Slow Start, update every SENDME
+ if in_slow_start:
+ return 1
+ else: # Otherwise, update as per the 'cc_inc_rate' (31)
+ return ((cwnd + 'cc_cwnd_inc_rate'*'cc_sendme_inc'/2)
+ / ('cc_cwnd_inc_rate'*'cc_sendme_inc'));
+
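+For example (non-normative), with cwnd=1000, 'cc_sendme_inc'=31, and
+'cc_cwnd_inc_rate'=1, SENDME_PER_CWND = (1000 + 15)/31 = 32 acks, and the
+steady-state CWND_UPDATE_RATE = (1000 + 15)/31 = 32 as well, i.e. one
+congestion window update per congestion window's worth of acks.
+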
+Shadow experimentation indicates that 'cc_cwnd_full_gap=2' and
+'cc_cwnd_full_per_cwnd=0' minimizes queue overshoot, whereas
+'cc_cwnd_full_per_cwnd=1' and 'cc_cwnd_full_gap=1' is slightly better
+for performance. Since there may be a difference between Shadow and live,
+we leave this parameterization in place.
+
+Here is the complete pseudocode for TOR_VEGAS with RFC3742, which is run every
+time an endpoint receives a SENDME ack. All variables are scoped to the
+circuit, unless prefixed by an underscore (local), or in single quotes
+(consensus parameters):
+
+ # Decrement counters that signal either an update or cwnd event
if next_cc_event:
next_cc_event--
+ if next_cwnd_event:
+ next_cwnd_event--
# Do not update anything if we detected a clock stall or jump,
# as per [CLOCK_HEURISTICS]
if clock_stalled_or_jumped:
+ inflight -= 'cc_sendme_inc'
return
+ if BDP > cwnd:
+ _queue_use = 0
+ else:
+ _queue_use = cwnd - BDP
+
+ if cwnd_is_full(cwnd, inflight):
+ cwnd_full = 1
+ else if cwnd_is_nonfull(cwnd, inflight):
+ cwnd_full = 0
+
+ if in_slow_start:
+ if _queue_use < 'cc_vegas_gamma' and not orconn_blocked:
+ # Only increase cwnd if the cwnd is full
+ if cwnd_full:
+ _inc = rfc3742_ss_inc(cwnd);
+ cwnd += _inc
+
+ # If the RFC3742 increment drops below steady-state increment
+ # over a full cwnd worth of acks, exit slow start.
+ if _inc*SENDME_PER_CWND(cwnd) <= 'cc_cwnd_inc'*'cc_cwnd_inc_rate':
+ in_slow_start = 0
+ else: # Limit hit. Exit Slow start (even if cwnd not full)
+ in_slow_start = 0
+ cwnd = BDP + 'cc_vegas_gamma'
+
+ # Provide an emergency hard-max on slow start:
+ if cwnd >= 'cc_ss_max':
+ cwnd = 'cc_ss_max'
+ in_slow_start = 0
+ else if next_cc_event == 0:
+ if _queue_use > 'cc_vegas_delta':
+ cwnd = BDP + 'cc_vegas_delta' - 'cc_cwnd_inc'
+ elif _queue_use > 'cc_vegas_beta' or orconn_blocked:
+ cwnd -= 'cc_cwnd_inc'
+ elif cwnd_full and _queue_use < 'cc_vegas_alpha':
+ # Only increment if queue is low, *and* the cwnd is full
+ cwnd += 'cc_cwnd_inc'
+
+ cwnd = MAX(cwnd, 'cc_circwindow_min')
+
+ # Specify next cwnd and cc update
if next_cc_event == 0:
- if BDP > cwnd:
- queue_use = 0
- else:
- queue_use = cwnd - BDP
-
- if in_slow_start:
- if queue_use < cc_vegas_gamma and not orconn_blocked:
- # Increment by slow start %, or at least 2 sendme_inc's worth
- cwnd = cwnd + MAX(cwnd * cc_cwnd_inc_pct_ss, 2*cc_sendme_inc)
- # If our BDP estimator thinks the BDP is still larger, use that
- cwnd = MAX(cwnd, BDP)
- else:
- cwnd = BDP + cc_vegas_gamma
- in_slow_start = 0
- else:
- if queue_use > cc_vegas_delta:
- cwnd = BDP + cc_vegas_delta - cc_cwnd_inc
- elif queue_use > cc_vegas_beta or orconn_blocked:
- cwnd -= cc_cwnd_inc
- elif queue_use < cc_vegas_alpha:
- cwnd += cc_cwnd_inc
-
- cwnd = MAX(cwnd, cc_circwindow_min)
+ next_cc_event = CWND_UPDATE_RATE(cwnd)
+ if next_cwnd_event == 0:
+ next_cwnd_event = SENDME_PER_CWND(cwnd)
+
+ # Determine if we need to reset the cwnd_full state
+ # (Parameterized)
+ if 'cc_cwnd_full_per_cwnd' == 1:
+ if next_cwnd_event == SENDME_PER_CWND(cwnd):
+ cwnd_full = 0
+ else:
+ if next_cc_event == CWND_UPDATE_RATE(cwnd):
+ cwnd_full = 0
- # Count the number of sendme acks until next update of cwnd,
- # rounded to nearest integer
- if in_slow_start:
- next_cc_event = round(cwnd / cc_sendme_inc)
- else
- # Never increment faster in slow start, only steady state.
- next_cc_event = round(cwnd / (cc_cwnd_inc_rate * cc_sendme_inc))
+ # Update acked cells
+ inflight -= 'cc_sendme_inc'
3.4. Tor NOLA: Direct BDP tracker [TOR_NOLA]
@@ -715,6 +826,12 @@ struct xon_cell {
u32 kbps_ewma;
}
+Parties SHOULD treat XON or XOFF cells with unrecognized versions as a
+protocol violation.
+
+In `xon_cell`, a zero value for `kbps_ewma` means that the stream's rate is
+unlimited. Parties should therefore not send "0" to mean "do not send data".
+
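+A non-normative sketch of how a party receiving an XON might apply these two
+rules (names are illustrative, not the actual C-Tor functions):
+
+  handle_xon(stream, xon):
+    if xon.version not in SUPPORTED_XON_VERSIONS:
+      return protocol_violation()        # unrecognized version
+    if xon.kbps_ewma == 0:
+      stream.rate_limit = UNLIMITED      # 0 means "unlimited", never "stop"
+    else:
+      stream.rate_limit = xon.kbps_ewma  # advisory rate from the XON cell
+    stream.resume_sending()
+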
4.1.1. XON/XOFF behavior
If the length of an edge outbuf queue exceeds the size provided in the
@@ -1071,7 +1188,7 @@ We will calibrate the Shadow simulator so that it has similar CDFs for all of
these metrics as the live network, without using congestion control.
Then, we will want to inspect CDFs of these three metrics for various
-congestion control algorithms and parameters.
+congestion control algorithms and parameters.
The live network testing will also spot-check performance characteristics of
a couple algorithm and parameter sets, to ensure we see similar results as
@@ -1103,7 +1220,7 @@ These are sorted in order of importance to tune, most important first.
- Description:
Specifies which congestion control algorithm clients should
use, as an integer.
- - Range: [0,3] (0=fixed, 1=Westwood, 2=Vegas, 3=NOLA)
+ - Range: 0 or 2 (0=fixed windows, 2=Vegas)
- Default: 2
- Tuning Values: [2,3]
- Tuning Notes:
@@ -1112,6 +1229,8 @@ These are sorted in order of importance to tune, most important first.
values, and even the optimal algorithm itself, will likely depend
upon how much fixed sendme traffic is in competition. See the
algorithm-specific parameters for additional tuning notes.
+ As of Tor 0.4.8, Vegas is the default algorithm, and support
+ for algorithms 1 (Westwood) and 3 (NOLA) has been removed.
- Shadow Tuning Results:
Westwood exhibited responsiveness problems, drift, and overshoot.
NOLA exhibited ack compression resulting in over-estimating the
@@ -1143,7 +1262,7 @@ These are sorted in order of importance to tune, most important first.
cc_sendme_inc:
- Description: Specifies how many cells a SENDME acks
- - Range: [1, 255]
+ - Range: [1, 254]
- Default: 31
- Tuning Values: 25,33,50
- Tuning Notes:
@@ -1157,7 +1276,7 @@ These are sorted in order of importance to tune, most important first.
cells that fit in a TLS frame. Much of the rest of Tor has
processing values at 32 cells, as well.
- Consensus Update Notes:
- This value MUST only be changed by a factor of 2, every 4 hours.
+ This value MUST only be changed by +/- 1, every 4 hours.
If greater changes are needed, they MUST be spread out over
multiple consensus updates.
@@ -1270,6 +1389,25 @@ These are sorted in order of importance to tune, most important first.
congestion, to avoid overloading slow relays. Values of 10 or 20
were best.
+ cc_ewma_ss:
+ - Description: This specifies the N in N_EWMA smoothing of RTT during
+ Slow Start.
+ - Range: [2, INT32_MAX]
+ - Default: 2
+ - Tuning Values: [2,4]
+ - Shadow Tuning Results:
+ Setting this to 2 helped reduce overshoot during Slow Start.
+
+ cc_rtt_reset_pct:
+ - Description: Specifies a percentile-weighted average between RTT_min
+ and RTT_current_ewma, used to reset RTT_min when the
+ congestion window hits cwnd_min.
+ - Range: [0, 100]
+ - Default: 100
+ - Shadow Tuning Results:
+ cwnd_min is not hit in Shadow simulations, but it can be hit
+ on the live network while under DoS conditions, and with cheaters.
+
cc_cwnd_inc:
- Description: How much to increment the congestion window by during
steady state, every cwnd.
@@ -1299,14 +1437,6 @@ These are sorted in order of importance to tune, most important first.
allocation, though. Values of 50-100 will be explored after
examining Shadow Guard Relay Utilization.
- cc_bdp_alg:
- - Description: The BDP estimation algorithm to use.
- - Range: [0,7]
- - Default: 7 (Piecewise EWMA)
- - Tuning Notes:
- We don't expect to need to tune this.
- - Shadow Tuning Results:
- We leave this as-is, but disable it in Vegas instead, below.
6.5.2. Westwood parameters
@@ -1365,50 +1495,93 @@ These are sorted in order of importance to tune, most important first.
6.5.3. Vegas Parameters
- cc_vegas_alpha_{exit,onion,sos,vg,sbws}:
- cc_vegas_beta_{exit,onion,sos,vg,sbws}:
- cc_vegas_gamma_{exit,onion,sos,vg,sbws}:
- cc_vegas_delta_{exit,onion,sos,vg,sbws}:
+ cc_vegas_alpha_{exit,onion,sbws}:
+ cc_vegas_beta_{exit,onion,sbws}:
+ cc_vegas_gamma_{exit,onion,sbws}:
+ cc_vegas_delta_{exit,onion,sbws}:
- Description: These parameters govern the number of cells
that [TOR_VEGAS] can detect in queue before reacting.
- Range: [0, 1000] (except delta, which has max of INT32_MAX)
- Defaults:
- alpha: path_len*2*31-31
- beta: path_len*2*31
- gamma: path_len*2*31
- delta: path_len*2*31 + 2*31
+ # OUTBUF_CELLS=62
+ cc_vegas_alpha_exit (3*OUTBUF_CELLS)
+ cc_vegas_beta_exit (4*OUTBUF_CELLS)
+ cc_vegas_gamma_exit (3*OUTBUF_CELLS)
+ cc_vegas_delta_exit (5*OUTBUF_CELLS)
+ cc_vegas_alpha_onion (3*OUTBUF_CELLS)
+ cc_vegas_beta_onion (6*OUTBUF_CELLS)
+ cc_vegas_gamma_onion (4*OUTBUF_CELLS)
+ cc_vegas_delta_onion (7*OUTBUF_CELLS)
- Tuning Notes:
The amount of queued cells that Vegas should tolerate is heavily
dependent upon competing congestion control algorithms. The specified
defaults are necessary to compete against current fixed SENDME traffic,
- but are much larger than neccessary otherwise. As the number of
- clients using fixed windows is reduced (or circwindow is lowered), we
- can reduce these parameters, which will result in less overall queuing
- and congestion on the network.
+ but are much larger than necessary otherwise. These values also
+ need a large-ish range between alpha and beta, to allow some degree of
+ variance in traffic, as per [33]. The tuning of these parameters
+ happened in two tickets[34,35]. The onion service parameters were
+ set on the basis that they should build up as much queue delay as
+ Exits, but tolerate up to 6 hops of outbuf delay. Lack of visibility
+ into the onion service congestion window on the live network prevented
+ confirming this.
- Shadow Tuning Results:
We found that the best values for 3-hop Exit circuits was to set
- beta and gamma to the size of the outbufs times the number of hops.
- This has the effect that Vegas detects congestion and backs off
- when this much queue delay is detected. Alpha is set to one TLS
- record/sendme_inc below this value. If the queue length is detected
- to be below that, we increase the congestion window. We still
- need to verify that the path length multiplier still holds for
- other types of circuits, specifically onion services.
-
- cc_vegas_bdp_mix:
- - Description: This parameter allows us to specify a weighted percent
- average between the cwnd BDP estimator and the piecewise
- estimator.
- - Range: [0, 100]
- - Default: 100 (use 100% CWND estimator)
- - Tuning Notes:
- The original Vegas literature specified the CWND estimator, but
- the Piecewise estimator is very likely more suited for our use
- case. This parameter allows us to average them. We're unlikely to
- need it.
+ alpha and gamma to the size of the outbufs times the number of
+ hops. Beta is set to one TLS record/sendme_inc above this value.
+
+ cc_sscap_{exit,onion,sbws}:
+ - Description: These parameters describe the RFC3742 'cap', after which
+ congestion window increments are reduced. INT32_MAX disables
+ RFC3742.
+ - Range: [100, INT32_MAX]
+ - Defaults:
+ sbws: 400
+ exit: 600
+ onion: 475
- Shadow Tuning Results:
- Because the BDP estimator had ack compression and over-estimation,
- we the CWND estimator.
+ We picked these defaults based on the average congestion window
+ seen in Shadow sims for exits and onion service circuits.
+
+ cc_ss_max:
+ - Description: This parameter provides a hard-max on the congestion
+ window in slow start.
+ - Range: [500, INT32_MAX]
+ - Default: 5000
+ - Shadow Tuning Results:
+ The largest congestion window seen in Shadow is ~3000, so this was
+ set as a safety valve above that.
+
+ cc_cwnd_full_gap:
+ - Description: This parameter defines the integer number of
+ 'cc_sendme_inc' multiples of gap allowed between inflight and
+ cwnd for the cwnd to still be declared full.
+ - Range: [0, INT16_MAX]
+ - Default: 4
+ - Shadow Tuning Results:
+ Low values resulted in a slight loss of performance, and increased
+ variance in throughput. Setting this at 4 seemed to achieve a good
+ balance between throughput and queue overshoot.
+
+ cc_cwnd_full_minpct:
+ - Description: This parameter defines a low watermark in percent. If
+ inflight falls below this percent of cwnd, the congestion window
+ is immediately declared non-full.
+ - Range: [0, 100]
+ - Default: 25
+
+ cc_cwnd_full_per_cwnd:
+ - Description: This parameter governs how often a cwnd must be
+ full, in order to allow congestion window increase. If it is 1,
+ then the cwnd only needs to be full once per cwnd worth of acks.
+ If it is 0, then it must be full once every cwnd update (ie:
+ every SENDME).
+ - Range: [0, 1]
+ - Default: 1
+ - Shadow Tuning Results:
+ A value of 0 resulted in a slight loss of performance, and increased
+ variance in throughput. The optimal number here likely depends on
+ edgeconn inbuf size, edgeconn kernel buffer size, and eventloop
+ behavior.
6.5.4. NOLA Parameters
@@ -1445,7 +1618,7 @@ These are sorted in order of importance to tune, most important first.
This threshold plus the sender's cwnd must be greater than the
cc_xon_rate value, or a rate cannot be computed. Unfortunately,
unless it is sent, the receiver does not know the cwnd. Therefore,
- this value should always be higher than cc_xon_rate minus
+ this value should always be higher than cc_xon_rate minus
'cc_cwnd_min' (100) minus the xon threshhold value (0).
cc_xon_rate
@@ -1483,7 +1656,7 @@ These are sorted in order of importance to tune, most important first.
- Tuning Notes:
Setting this higher will smooth over changes in the rate field,
and thus avoid XONs, but will reduce our reactivity to rate changes.
-
+
6.5.6. External Performance Parameters to Tune
@@ -1495,7 +1668,7 @@ These are sorted in order of importance to tune, most important first.
- Description: Specifies the percentage cutoff for the circuit build
timeout mechanism.
- Range: [60, 80]
- - Default: 80
+ - Default: 80
- Tuning Values: [70, 75, 80]
- Tuning Notes:
The circuit build timeout code causes Tor to use only the fastest
@@ -1976,7 +2149,7 @@ The client MUST reject an ntorv3 reply with field EXT_FIELD_TYPE=02, if the
client did not include EXT_FIELD_TYPE=01 in its handshake.
The client SHOULD reject a sendme_inc field value that differs from the
-current 'cc_sendme_inc' consensus parameter by more than a factor of 2, in
+current 'cc_sendme_inc' consensus parameter by more than +/- 1, in
either direction.
If a client rejects a handshake, it MUST close the circuit.
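+
+A non-normative sketch of this client-side check (the helper name is
+illustrative):
+
+  client_accepts_sendme_inc(handshake_sendme_inc):
+    # Accept only values within +/- 1 of the current consensus parameter
+    if abs(handshake_sendme_inc - 'cc_sendme_inc') <= 1:
+      return true
+    return false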
@@ -1987,8 +2160,7 @@ The pedantic reader will note that a rogue consensus can cause all clients
to decide to close circuits by changing 'cc_sendme_inc' by a large margin.
As a matter of policy, the directory authorities MUST NOT change
-'cc_sendme_inc' by more than a factor of two (2), within a four (4) hour
-window, for this reason.
+'cc_sendme_inc' by more than +/- 1.
In Shadow simulation, the optimal 'cc_sendme_inc' value to be ~31 cells, or
one (1) TLS record worth of cells. We do not expect to change this value
@@ -2133,3 +2305,13 @@ receive more data. It is sent to tell the sender to resume sending.
31. KIST: Kernel-Informed Socket Transport for Tor
https://matt.traudt.xyz/static/papers/kist-tops2018.pdf
+
+32. RFC3742 Limited Slow Start
+ https://datatracker.ietf.org/doc/html/rfc3742#section-2
+
+33. https://people.csail.mit.edu/venkatar/cc-starvation.pdf
+
+34. https://gitlab.torproject.org/tpo/core/tor/-/issues/40642
+
+35. https://gitlab.torproject.org/tpo/network-health/analysis/-/issues/49
+```