author    Mike Perry <mikeperry-git@torproject.org>  2023-06-27 21:05:13 +0000
committer Mike Perry <mikeperry-git@torproject.org>  2023-06-27 21:05:13 +0000
commit a8ded3ca6bfb795b0d050a805a1ee66a1673ebf6 (patch)
tree   ef8eab1d65e68f2c568fd666d9859797c7f5a11b /proposals/324-rtt-congestion-control.txt
parent 62301aa08519f8848933ecd48f28476c4a481f1a (diff)
Prop#324: Clarify that we use TOR_VEGAS and cwnd-RTT BDP estimation
Also remove the deprecated cc_bdp_alg param, and update the cc_alg param description.
Diffstat (limited to 'proposals/324-rtt-congestion-control.txt')
 proposals/324-rtt-congestion-control.txt | 81
 1 file changed, 41 insertions(+), 40 deletions(-)
diff --git a/proposals/324-rtt-congestion-control.txt b/proposals/324-rtt-congestion-control.txt
index 5126410..582c54d 100644
--- a/proposals/324-rtt-congestion-control.txt
+++ b/proposals/324-rtt-congestion-control.txt
@@ -163,14 +163,13 @@ false (no stall), and the RTT value is used.
2.1.2. N_EWMA Smoothing [N_EWMA_SMOOTHING]
-Both RTT estimation and SENDME BDP estimation require smoothing, to
-reduce the effects of packet jitter.
+RTT estimation requires smoothing, to reduce the effects of packet jitter.
This smoothing is performed using N_EWMA[27], which is an Exponential
Moving Average with alpha = 2/(N+1):
- N_EWMA = BDP*2/(N+1) + N_EWMA_prev*(N-1)/(N+1)
- = (BDP*2 + N_EWMA_prev*(N-1))/(N+1).
+ N_EWMA = RTT*2/(N+1) + N_EWMA_prev*(N-1)/(N+1)
+ = (RTT*2 + N_EWMA_prev*(N-1))/(N+1).
Note that the second rearranged form MUST be used in order to ensure that
rounding errors are handled in the same manner as other implementations.
@@ -179,10 +178,9 @@ Flow control rate limiting uses this function.
During Slow Start, N is set to `cc_ewma_ss`, for RTT estimation.
-After Slow Start, for both RTT and SENDME BDP estimation, N is the number
-of SENDME acks between congestion window updates, divided by the value of consensus
-parameter 'cc_ewma_cwnd_pct', and then capped at a max of 'cc_ewma_max',
-but always at least 2:
+After Slow Start, N is the number of SENDME acks between congestion window
+updates, divided by the value of consensus parameter 'cc_ewma_cwnd_pct', and
+then capped at a max of 'cc_ewma_max', but always at least 2:
N = MAX(MIN(CWND_UPDATE_RATE(cc)*cc_ewma_cwnd_pct/100, cc_ewma_max), 2);
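The smoothing and the computation of N above can be sketched in C roughly as follows. This is an illustrative sketch, not Tor's actual implementation; the function names `n_ewma_update` and `n_ewma_n` are hypothetical, and the parameter names mirror the consensus parameters described in the text.

```c
/* Integer N_EWMA update with alpha = 2/(N+1), using the rearranged form
 * given above so that integer rounding matches other implementations:
 *   N_EWMA = (RTT*2 + N_EWMA_prev*(N-1)) / (N+1)                       */
static unsigned long
n_ewma_update(unsigned long rtt, unsigned long prev, unsigned long n)
{
  return (rtt * 2 + prev * (n - 1)) / (n + 1);
}

/* N after Slow Start: the number of SENDME acks between congestion
 * window updates scaled by cc_ewma_cwnd_pct, capped at cc_ewma_max,
 * but always at least 2.                                               */
static unsigned long
n_ewma_n(unsigned long cwnd_update_rate, unsigned long cc_ewma_cwnd_pct,
         unsigned long cc_ewma_max)
{
  unsigned long n = cwnd_update_rate * cc_ewma_cwnd_pct / 100;
  if (n > cc_ewma_max)
    n = cc_ewma_max;
  if (n < 2)
    n = 2;
  return n;
}
```

Note how the minimum of 2 keeps the EWMA from degenerating into using only the latest RTT sample when the update rate is very low.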
@@ -247,12 +245,11 @@ have been given different names than in those two mails. The third algorithm,
[TOR_NOLA], simply uses the latest BDP estimate directly as its congestion
window.
-These algorithms will be evaluated by running Shadow simulations, to
-help determine parameter ranges, but experimentation on the live network
-will be required to determine which of these algorithms performs best
-when in competition with our current SENDME behavior, as used by real
-network traffic. This experimentation and tuning is detailed in section
-[EVALUATION].
+These algorithms were evaluated by running Shadow simulations, to help
+determine parameter ranges, and with experimentation on the live network.
+After this testing, we have converged on using [TOR_VEGAS], and RTT-based BDP
+estimation using the congestion window. We leave the algorithms in place
+for historical reference.
All of these algorithms have rules to update 'cwnd' - the current congestion
window, which starts out at a value controlled by consensus parameter
@@ -271,7 +268,6 @@ The 'deliver_window' field is still used to decide when to send a SENDME. In C
tor, the deliver window is initially set at 1000, but it never gets below 900,
because authenticated sendmes (Proposal 289) require that we must send only
one SENDME at a time, and send it immediately after 100 cells are received.
-This property turns out to be very useful for [BDP_ESTIMATION].
Implementation of different algorithms should be very simple - each
algorithm should have a different update function depending on the selected algorithm,
@@ -308,22 +304,23 @@ is circuit-scoped.
At a high-level, there are three main ways to estimate the Bandwidth-Delay
Product: by using the current congestion window and RTT, by using the inflight
-cells and RTT, and by measuring SENDME arrival rate.
+cells and RTT, and by measuring SENDME arrival rate. After extensive Shadow
+simulation and live testing, we have arrived at using the congestion-window
+RTT-based estimator, but we will describe all three for background.
All three estimators are updated every SENDME ack arrival.
-The SENDME arrival rate is the most accurate way to estimate BDP, but it
-requires averaging over multiple SENDME acks to do so. The congestion window
-and inflight estimates rely on the congestion algorithm more or less correctly
-tracking an approximation of the BDP, and then use current and minimum RTT to
-compensate for overshoot.
+The SENDME arrival rate is the most direct way to estimate BDP, but it
+requires averaging over multiple SENDME acks to do so. Unfortunately,
+this approach suffers from what is called "ACK compression", where returning
+SENDMEs build up in queues, causing over-estimation of the BDP.
-The SENDME estimator tends to be accurate after ~3-5 SENDME acks. The cwnd and
-inflight estimators tend to be accurate once the congestion window exceeds
-BDP.
-
-We specify all three because they are all useful in various cases. These cases
-are broken up and combined to form the Piecewise BDP estimator.
+The congestion window and inflight estimates rely on the congestion algorithm
+more or less correctly tracking an approximation of the BDP, and then use
+current and minimum RTT to compensate for overshoot. These estimators tend to
+under-estimate BDP, especially when the congestion window is below the BDP.
+This under-estimation is corrected for by congestion window increases in the
+congestion control algorithm's rules.
3.1.1. SENDME arrival BDP estimation
@@ -380,6 +377,8 @@ in Shadow simulation, due to ack compression.
3.1.2. Congestion Window BDP Estimation
+This is the BDP estimator we use.
+
Assuming that the current congestion window is at or above the current BDP,
the bandwidth estimate is the current congestion window size divided by the
RTT estimate:
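The cwnd-based estimate described here reduces to a single expression: the bandwidth estimate is cwnd divided by the current RTT, and the BDP is that bandwidth times the minimum RTT. A minimal sketch, assuming integer cell counts and RTTs in the same time unit (the function name `cwnd_bdp_estimate` is hypothetical):

```c
/* cwnd-based BDP estimate:
 *   BWE = cwnd / RTT_current
 *   BDP = BWE * RTT_min  =  cwnd * RTT_min / RTT_current
 * Multiplying before dividing preserves integer precision.             */
static unsigned long
cwnd_bdp_estimate(unsigned long cwnd, unsigned long rtt_min,
                  unsigned long rtt_cur)
{
  return cwnd * rtt_min / rtt_cur;
}
```

When the current RTT equals the minimum RTT (no queueing), the estimate is simply the congestion window; as queueing delay grows, the estimate shrinks below cwnd.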
@@ -420,12 +419,17 @@ and all circuit queues have drained without blocking the local orconn, we stop
updating this BDP estimate, because there are not sufficient inflight cells
to properly estimate BDP.
+While the research literature for Vegas says that inflight estimators
+performed better due to the ability to avoid overshoot, we had better
+performance results using other methods to control overshoot. Hence, we do not
+use the inflight BDP estimator.
+
3.1.4. Piecewise BDP estimation
-The piecewise BDP estimation is used to help respond more quickly in the event
-the local OR connection is blocked, which indicates congestion somewhere along
-the path from the client to the guard (or between Exit and Middle). In this
-case, it takes the minimum of the inflight and SENDME estimators.
+A piecewise BDP estimation could be used to help respond more quickly in the
+event the local OR connection is blocked, which indicates congestion somewhere
+along the path from the client to the guard (or between Exit and Middle). In
+this case, it takes the minimum of the inflight and SENDME estimators.
When the local OR connection is not blocked, this estimator uses the max of
the SENDME and cwnd estimator values.
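The piecewise selection rule described above amounts to a two-way switch on the blocked state of the local OR connection. A hedged sketch (the function name `piecewise_bdp` is hypothetical; this is not Tor's actual code):

```c
static unsigned long
ul_min(unsigned long a, unsigned long b) { return a < b ? a : b; }
static unsigned long
ul_max(unsigned long a, unsigned long b) { return a > b ? a : b; }

/* Piecewise BDP: while the local orconn is blocked (congestion toward
 * the guard, or between Exit and Middle), take the minimum of the
 * inflight and SENDME estimates; otherwise take the maximum of the
 * SENDME and cwnd estimates.                                           */
static unsigned long
piecewise_bdp(int orconn_blocked, unsigned long bdp_sendme,
              unsigned long bdp_cwnd, unsigned long bdp_inflight)
{
  if (orconn_blocked)
    return ul_min(bdp_inflight, bdp_sendme);
  return ul_max(bdp_sendme, bdp_cwnd);
}
```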
@@ -527,6 +531,9 @@ each time we get a SENDME (aka sendme_process_circuit_level()):
TCP Vegas control algorithm estimates the queue lengths at relays by
subtracting the current BDP estimate from the current congestion window.
+After extensive Shadow simulation and live testing, we have settled on this
+congestion control algorithm for use in Tor.
+
Assuming the BDP estimate is accurate, any amount by which the congestion
window exceeds the BDP will cause data to queue.
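The Vegas queue estimate described in these two paragraphs is the excess of the congestion window over the BDP estimate, clamped at zero. A minimal sketch (the function name `vegas_queue_use` is hypothetical):

```c
/* Vegas-style queue estimate: the amount by which cwnd exceeds the BDP
 * estimate is assumed to be data queued at relays. Clamped at 0 in case
 * the BDP estimate exceeds the congestion window.                      */
static unsigned long
vegas_queue_use(unsigned long cwnd, unsigned long bdp)
{
  return cwnd > bdp ? cwnd - bdp : 0;
}
```

The congestion control rules then grow or shrink cwnd based on whether this queue estimate is below or above the Vegas alpha/beta thresholds.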
@@ -1212,7 +1219,7 @@ These are sorted in order of importance to tune, most important first.
- Description:
Specifies which congestion control algorithm clients should
use, as an integer.
- - Range: [0,3] (0=fixed, 1=Westwood, 2=Vegas, 3=NOLA)
+ - Range: 0 or 2 (0=fixed windows, 2=Vegas)
- Default: 2
- Tuning Values: [2,3]
- Tuning Notes:
@@ -1221,6 +1228,8 @@ These are sorted in order of importance to tune, most important first.
values, and even the optimal algorithm itself, will likely depend
upon how much fixed sendme traffic is in competition. See the
algorithm-specific parameters for additional tuning notes.
+ As of Tor 0.4.8, Vegas is the default algorithm, and support
+ for algorithms 1 (Westwood) and 3 (NOLA) has been removed.
- Shadow Tuning Results:
Westwood exhibited responsiveness problems, drift, and overshoot.
NOLA exhibited ack compression resulting in over-estimating the
@@ -1427,14 +1436,6 @@ These are sorted in order of importance to tune, most important first.
allocation, though. Values of 50-100 will be explored after
examining Shadow Guard Relay Utilization.
- cc_bdp_alg:
- - Description: The BDP estimation algorithm to use.
- - Range: [0,7]
- - Default: 7 (Piecewise EWMA)
- - Tuning Notes:
- We don't expect to need to tune this.
- - Shadow Tuning Results:
- We leave this as-is, but disable it in Vegas instead, below.
6.5.2. Westwood parameters