From 4d950a261fdfbe18a16ead421d67e3959236be6b Mon Sep 17 00:00:00 2001
From: Mike Perry <mikeperry-git@torproject.org>
Date: Fri, 16 Dec 2022 22:22:27 +0000
Subject: Prop#324: Do not increase cwnd if the window is not full.

 - Allow a gap between inflight and cwnd before declaring the cwnd not full.
 - Parameterize how often a cwnd must be full
 - Clean up vegas algorithm for variable scoping and clarity
---
 proposals/324-rtt-congestion-control.txt | 184 +++++++++++++++++++++++--------
 1 file changed, 141 insertions(+), 43 deletions(-)

(limited to 'proposals/324-rtt-congestion-control.txt')
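
For readers of this patch, the full/non-full helpers added below can be
sketched in Python. This is only an illustration under stated assumptions,
not the C-Tor implementation: the Params class, the snake_case function
names, and update_cwnd_full() are hypothetical, the default values are
illustrative, and the divisions are assumed to be C-style integer divisions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Params:
    """Consensus parameters used by the helpers (illustrative values only)."""
    cc_sendme_inc: int = 31
    cc_cwnd_full_gap: int = 1
    cc_cwnd_full_minpct: int = 75
    cc_cwnd_inc_rate: int = 31


def cwnd_is_full(cwnd: int, inflight: int, p: Params) -> bool:
    # Full if inflight is within 'cc_cwnd_full_gap' SENDME increments of cwnd.
    return inflight + p.cc_cwnd_full_gap * p.cc_sendme_inc >= cwnd


def cwnd_is_nonfull(cwnd: int, inflight: int, p: Params) -> bool:
    # Non-full if inflight drops below 'cc_cwnd_full_minpct' percent of cwnd.
    return 100 * inflight < p.cc_cwnd_full_minpct * cwnd


def sendme_per_cwnd(cwnd: int, p: Params) -> int:
    # Number of SENDME acks in one cwnd, rounded to nearest (integer math).
    return (cwnd + p.cc_sendme_inc // 2) // p.cc_sendme_inc


def cwnd_update_rate(cwnd: int, in_slow_start: bool, p: Params) -> int:
    # In slow start, update on every SENDME ack.
    if in_slow_start:
        return 1
    # Otherwise, scale by 'cc_cwnd_inc_rate' (assumed integer rounding).
    step = p.cc_cwnd_inc_rate * p.cc_sendme_inc
    return (cwnd + step // 2) // step


def update_cwnd_full(cwnd_full: bool, cwnd: int, inflight: int,
                     p: Params) -> bool:
    # Hypothetical wrapper for the per-ack step in the pseudocode: set the
    # flag when full, clear it when clearly non-full, else keep the old value.
    if cwnd_is_full(cwnd, inflight, p):
        return True
    if cwnd_is_nonfull(cwnd, inflight, p):
        return False
    return cwnd_full
```

Between the two thresholds the cached flag is left unchanged, which is what
gives the check its slack when several SENDME acks arrive in one loop.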

diff --git a/proposals/324-rtt-congestion-control.txt b/proposals/324-rtt-congestion-control.txt
index d529d5c..4937744 100644
--- a/proposals/324-rtt-congestion-control.txt
+++ b/proposals/324-rtt-congestion-control.txt
@@ -538,7 +538,10 @@ original cwnd estimator. So while this capability to change the BDP estimator
 remains in the C implementation, we do not expect it to be used.
 
 However, it was useful to use a local OR connection block at the time of
-SENDME ack arrival, as an immediate congestion signal.
+SENDME ack arrival, as an immediate congestion signal. Note that in C-Tor,
+this orconn_block state is not derived from any socket info, but instead is a
+heuristic that declares an orconn as blocked if any circuit cell queue
+exceeds the 'cellq_high' consensus parameter.
 
 (As an additional optimization, we could also use the ECN signal described in
 ideas/xxx-backward-ecn.txt, but this is not implemented. It is likely only of
@@ -554,70 +557,132 @@ per the rules in RFC3742:
       # Below the cap, we increment as per cc_cwnd_inc_pct_ss percent:
       return round(cc_cwnd_inc_pct_ss*cc_sendme_inc/100)
     else:
-      # This returns an increment equivalent to RFC3742, rounded:
-      # K = int(cwnd/(0.5 max_ssthresh));
-      # inc = int(MSS/K);
-      return round((cc_sendme_inc*cc_ss_cap_pathtype)/(2*cwnd));
+      # This returns an increment equivalent to RFC3742, rounded,
+      # with a minimum of inc=1.
+      # From RFC3742:
+      #  K = int(cwnd/(0.5 max_ssthresh));
+      #  inc = int(MSS/K);
+      return MAX(round((cc_sendme_inc*cc_ss_cap_pathtype)/(2*cwnd)), 1);
+
+During both Slow Start and Steady State, if the congestion window is not full,
+we never increase the congestion window. We can still decrease it, or exit slow
+start, in this case. This is done to avoid causing overshoot. The original TCP
+Vegas addressed this problem by computing BDP and queue_use from inflight,
+instead of cwnd, but we found that approach to have significantly worse
+performance.
+
+Because C-Tor is single-threaded, multiple SENDME acks may arrive during one
+processing loop, before edge connections resume reading. For this reason,
+we use two heuristics to provide some slack in determining the full
+condition. The first is to allow a gap between inflight and cwnd,
+parameterized as 'cc_cwnd_full_gap' multiples of 'cc_sendme_inc':
+   cwnd_is_full(cwnd, inflight):
+     if inflight + 'cc_cwnd_full_gap'*'cc_sendme_inc' >= cwnd:
+       return true
+     else:
+       return false
+
+The second heuristic immediately resets the full state if inflight falls
+below 'cc_cwnd_full_minpct' percent of cwnd:
+   cwnd_is_nonfull(cwnd, inflight):
+     if 100*inflight < 'cc_cwnd_full_minpct'*cwnd:
+       return true
+     else:
+       return false
+
+This full status is cached once per cwnd if 'cc_cwnd_full_per_cwnd=1';
+otherwise it is cached once per cwnd update. These two helper functions
+determine the number of acks in each case:
+   SENDME_PER_CWND(cwnd):
+     return ((cwnd + 'cc_sendme_inc'/2)/'cc_sendme_inc')
+   CWND_UPDATE_RATE(cwnd, in_slow_start):
+     # In Slow Start, update every SENDME
+     if in_slow_start:
+       return 1
+     else: # Otherwise, update as per 'cc_cwnd_inc_rate' (31)
+       return ((cwnd + 'cc_cwnd_inc_rate'*'cc_sendme_inc'/2)
+           / ('cc_cwnd_inc_rate'*'cc_sendme_inc'));
 
-After Slow Start, congestion signals from RTT, blocked OR connections, or ECN
-are processed only once per congestion window. This is achieved through the
-next_cc_event flag, which is initialized to a cwnd worth of SENDME acks, and
-is decremented each ack. Congestion signals are only evaluated when it reaches
-0.
+Shadow experimentation indicates that 'cc_cwnd_full_gap=2' and
+'cc_cwnd_full_per_cwnd=0' minimize queue overshoot, whereas
+'cc_cwnd_full_per_cwnd=1' and 'cc_cwnd_full_gap=1' are slightly better
+for performance. Since Shadow and the live network may behave differently,
+we leave this parameterization in place.
 
 Here is the complete pseudocode for TOR_VEGAS with RFC3742, which is run every
-time an endpoint receives a SENDME ack:
-
-  # Update acked cells
-  inflight -= cc_sendme_inc
+time an endpoint receives a SENDME ack. All variables are scoped to the
+circuit, unless prefixed by an underscore (local), or in single quotes
+(consensus parameters):
 
+  # Decrement counters that signal either an update or cwnd event
   if next_cc_event:
     next_cc_event--
+  if next_cwnd_event:
+    next_cwnd_event--
 
   # Do not update anything if we detected a clock stall or jump,
   # as per [CLOCK_HEURISTICS]
   if clock_stalled_or_jumped:
+    inflight -= 'cc_sendme_inc'
     return
 
   if BDP > cwnd:
-    queue_use = 0
+    _queue_use = 0
   else:
-    queue_use = cwnd - BDP
+    _queue_use = cwnd - BDP
+
+  if cwnd_is_full(cwnd, inflight):
+    cwnd_full = 1
+  else if cwnd_is_nonfull(cwnd, inflight):
+    cwnd_full = 0
 
   if in_slow_start:
-    if queue_use < cc_vegas_gamma and not orconn_blocked:
-      inc = rfc3742_ss_inc(cwnd);
-      cwnd += inc
-      next_cc_event = 1
-
-      # If the RFC3742 increment drops below steady-state increment
-      # over a full cwnd worth of acks, exit slow start
-      if inc*SENDME_PER_CWND(cwnd) <= cc_cwnd_inc:
-        in_slow_start = 0
-        next_cc_event = round(cwnd / (cc_cwnd_inc_rate * cc_sendme_inc))
-    else:
+    if _queue_use < 'cc_vegas_gamma' and not orconn_blocked:
+      # Only increase cwnd if the cwnd is full
+      if cwnd_full:
+        _inc = rfc3742_ss_inc(cwnd);
+        cwnd += _inc
+
+        # If the RFC3742 increment drops below steady-state increment
+        # over a full cwnd worth of acks, exit slow start.
+        if _inc*SENDME_PER_CWND(cwnd) <= 'cc_cwnd_inc'*'cc_cwnd_inc_rate':
+          in_slow_start = 0
+    else: # Limit hit. Exit slow start (even if cwnd not full)
       in_slow_start = 0
-      cwnd = BDP + cc_vegas_gamma
-      next_cc_event = round(cwnd / (cc_cwnd_inc_rate * cc_sendme_inc))
+      cwnd = BDP + 'cc_vegas_gamma'
 
     # Provide an emergency hard-max on slow start:
-    if cwnd >= cc_ss_max:
-      cwnd = cc_ss_max
+    if cwnd >= 'cc_ss_max':
+      cwnd = 'cc_ss_max'
       in_slow_start = 0
-      next_cc_event = round(cwnd / (cc_cwnd_inc_rate * cc_sendme_inc))
   else if next_cc_event == 0:
-    if queue_use > cc_vegas_delta:
-      cwnd = BDP + cc_vegas_delta - cc_cwnd_inc
-    elif queue_use > cc_vegas_beta or orconn_blocked:
-      cwnd -= cc_cwnd_inc
-    elif queue_use < cc_vegas_alpha:
-      cwnd += cc_cwnd_inc
-
-    cwnd = MAX(cwnd, cc_circwindow_min)
+    if _queue_use > 'cc_vegas_delta':
+      cwnd = BDP + 'cc_vegas_delta' - 'cc_cwnd_inc'
+    elif _queue_use > 'cc_vegas_beta' or orconn_blocked:
+      cwnd -= 'cc_cwnd_inc'
+    elif cwnd_full and _queue_use < 'cc_vegas_alpha':
+      # Only increment if queue is low, *and* the cwnd is full
+      cwnd += 'cc_cwnd_inc'
+
+    cwnd = MAX(cwnd, 'cc_circwindow_min')
+
+  # Specify next cwnd and cc update
+  if next_cc_event == 0:
+    next_cc_event = CWND_UPDATE_RATE(cwnd, in_slow_start)
+  if next_cwnd_event == 0:
+    next_cwnd_event = SENDME_PER_CWND(cwnd)
+
+  # Determine if we need to reset the cwnd_full state
+  # (Parameterized)
+  if 'cc_cwnd_full_per_cwnd' == 1:
+    if next_cwnd_event == SENDME_PER_CWND(cwnd):
+      cwnd_full = 0
+  else:
+    if next_cc_event == CWND_UPDATE_RATE(cwnd, in_slow_start):
+      cwnd_full = 0
 
-    # Count the number of sendme acks until next update of cwnd,
-    # rounded to nearest integer
-    next_cc_event = round(cwnd / (cc_cwnd_inc_rate * cc_sendme_inc))
+  # Update acked cells
+  inflight -= 'cc_sendme_inc'
 
 
 3.4. Tor NOLA: Direct BDP tracker [TOR_NOLA]
@@ -1479,6 +1544,39 @@ These are sorted in order of importance to tune, most important first.
          The largest congestion window seen in Shadow is ~3000, so this was
          set as a safety valve above that.
 
+   cc_cwnd_full_gap:
+      - Description: This parameter defines the integer number of
+        'cc_sendme_inc' multiples of gap allowed between inflight and
+        cwnd, to still declare the cwnd full.
+      - Range: [0, INT32_MAX]
+      - Default: 1-2
+      - Shadow Tuning Results:
+        A value of 0 resulted in a slight loss of performance, and increased
+        variance in throughput. The optimal number here likely depends on
+        edgeconn inbuf size, edgeconn kernel buffer size, and eventloop
+        behavior.
+
+   cc_cwnd_full_minpct:
+      - Description: This parameter defines a low watermark in percent. If
+        inflight falls below this percent of cwnd, the congestion window
+        is immediately declared non-full.
+      - Range: [0, 100]
+      - Default: 75
+
+   cc_cwnd_full_per_cwnd:
+      - Description: This parameter governs how often a cwnd must be
+        full, in order to allow congestion window increase. If it is 1,
+        then the cwnd only needs to be full once per cwnd worth of acks.
+        If it is 0, then it must be full once every cwnd update (i.e.,
+        every SENDME).
+      - Range: [0, 1]
+      - Default: 1
+      - Shadow Tuning Results:
+        A value of 0 resulted in a slight loss of performance, and increased
+        variance in throughput. The optimal number here likely depends on
+        edgeconn inbuf size, edgeconn kernel buffer size, and eventloop
+        behavior.
+
 6.5.4. NOLA Parameters
 
   cc_nola_overshoot:
-- 
cgit v1.2.3-54-g00ecf