aboutsummaryrefslogtreecommitdiff
path: root/spec/path-spec/learning-timeouts.md
diff options
context:
space:
mode:
Diffstat (limited to 'spec/path-spec/learning-timeouts.md')
-rw-r--r--spec/path-spec/learning-timeouts.md305
1 files changed, 305 insertions, 0 deletions
diff --git a/spec/path-spec/learning-timeouts.md b/spec/path-spec/learning-timeouts.md
new file mode 100644
index 0000000..e98bac3
--- /dev/null
+++ b/spec/path-spec/learning-timeouts.md
@@ -0,0 +1,305 @@
+<a id="path-spec.txt-2.4"></a>
+
+# Learning when to give up ("timeout") on circuit construction {#timing-out}
+
+Since version 0.2.2.8-alpha, Tor clients attempt to learn when to give
+up on circuits based on network conditions.
+
+<a id="path-spec.txt-2.4.1"></a>
+
+## Distribution choice
+
+Based on studies of build times, we found that the distribution of
+circuit build times appears to be a Frechet distribution (and a multi-modal
+Frechet distribution, if more than one guard or bridge is used). However,
+estimators and quantile functions of the Frechet distribution are difficult
+to work with and slow to converge. So instead, since we are only interested
+in the accuracy of the tail, clients approximate the tail of the multi-modal
+distribution with a single Pareto curve.
+
+<a id="path-spec.txt-2.4.2"></a>
+
+## How much data to record {#observations}
+
+From our observations, the minimum number of circuit build times for a
+reasonable fit appears to be on the order of 100. However, to keep a
+good fit over the long term, clients store 1000 most recent circuit build
+times in a circular array.
+
+These build times only include the times required to build three-hop
+circuits, and the times required to build the first three hops of circuits
+with more than three hops. Circuits of fewer than three hops are not
+recorded, and hops past the third are not recorded.
+
+The Tor client should build test circuits at a rate of one every 'cbttestfreq'
+(10 seconds) until 'cbtmincircs' (100 circuits) are built, with a maximum of
+'cbtmaxopencircs' (default: 10) circuits open at once. This allows a fresh
+Tor to have a CircuitBuildTimeout estimated within 30 minutes after install
+or network change
+(see [Detecting Changing Network Conditions](#changes-in-network) below.)
+
+Timeouts are stored on disk in a histogram of 10ms bin width, the same
+width used to calculate the Xm value above. The timeouts recorded in the
+histogram must be shuffled after being read from disk, to preserve a
+proper expiration of old values after restart.
+
+Thus, some build time resolution is lost during restart. Implementations may
+choose a different persistence mechanism than this histogram, but be aware
+that build time binning is still needed for parameter estimation.
+
+<a id="path-spec.txt-2.4.3"></a>
+
+## Parameter estimation
+
+Once 'cbtmincircs' build times are recorded, Tor clients update the
+distribution parameters and recompute the timeout every circuit completion
+(though
+[see below](#changes-in-network)
+for when to pause and reset timeout due to
+too many circuits timing out).
+
+Tor clients calculate the parameters for a Pareto distribution fitting the
+data using the maximum likelihood estimator. For derivation, see:
+<https://en.wikipedia.org/wiki/Pareto_distribution#Estimation_of_parameters>
+
+Because build times are not a true Pareto distribution, we alter how Xm is
+computed. In a max likelihood estimator, the mode of the distribution is
+used directly as Xm.
+
+Instead of using the mode of discrete build times directly, Tor clients
+compute the Xm parameter using the weighted average of the midpoints
+of the 'cbtnummodes' (10) most frequently occurring 10ms histogram bins.
+Ties are broken in favor of earlier bins (that is, in favor of bins
+corresponding to shorter build times).
+
+(The use of 10 modes was found to minimize error from the selected
+cbtquantile, with 10ms bins for quantiles 60-80, compared to many other
+heuristics).
+
+To avoid ln(1.0+epsilon) precision issues, use log laws to rewrite the
+estimator for 'alpha' as the sum of logs followed by subtraction, rather
+than multiplication and division:
+
+`alpha = n/(Sum_n{ln(MAX(Xm, x_i))} - n\*ln(Xm))`
+
+In this, n is the total number of build times that have completed, x_i is
+the ith recorded build time, and Xm is the modes of x_i as above.
+
+All times below Xm are counted as having the Xm value via the MAX(),
+because in Pareto estimators, Xm is supposed to be the lowest value.
+However, since clients use mode averaging to estimate Xm, there can be
+values below our Xm. Effectively, the Pareto estimator then treats that
+everything smaller than Xm happened at Xm. One can also see that if
+clients did not do this, alpha could underflow to become negative, which
+results in an exponential curve, not a Pareto probability distribution.
+
+The timeout itself is calculated by using the Pareto Quantile function (the
+inverted CDF) to give us the value on the CDF such that 80% of the mass
+of the distribution is below the timeout value (parameter 'cbtquantile').
+
+The Pareto Quantile Function (inverse CDF) is:
+
+`F(q) = Xm/((1.0-q)^(1.0/alpha))`
+
+Thus, clients obtain the circuit build timeout for 3-hop circuits by
+computing:
+
+`timeout_ms = F(0.8) # 'cbtquantile' == 0.8`
+
+With this, we expect that the Tor client will accept the fastest 80% of the
+total number of paths on the network.
+
+Clients obtain the circuit close time to completely abandon circuits as:
+
+`close_ms = F(0.99) # 'cbtclosequantile' == 0.99`
+
+To avoid waiting an unreasonably long period of time for circuits that
+simply have relays that are down, Tor clients cap timeout_ms at the max
+build time actually observed so far, and cap close_ms at twice this max,
+but at least 60 seconds:
+
+```text
+ timeout_ms = MIN(timeout_ms, max_observed_timeout)
+ close_ms = MAX(MIN(close_ms, 2*max_observed_timeout), 'cbtinitialtimeout')
+```
+
+<a id="path-spec.txt-2.4.3"></a>
+
+## Calculating timeouts thresholds for circuits of different lengths {#different-lengths}
+
+The timeout_ms and close_ms estimates above are good only for 3-hop
+circuits, since only 3-hop circuits are recorded in the list of build
+times.
+
+To calculate the appropriate timeouts and close timeouts for circuits of
+other lengths, the client multiples the timeout_ms and close_ms values
+by a scaling factor determined by the number of communication hops
+needed to build their circuits:
+
+```text
+timeout_ms\[hops=n\] = timeout_ms * Actions(N) / Actions(3)
+
+close_ms\[hops=n\] = close_ms * Actions(N) / Actions(3)
+```
+
+where `Actions(N) = N * (N + 1) / 2.`
+
+To calculate timeouts for operations other than circuit building,
+the client should add X to Actions(N) for every round-trip communication
+required with the Xth hop.
+
+<a id="path-spec.txt-2.4.4"></a>
+
+## How to record timeouts {#recording-timeouts}
+
+Pareto estimators begin to lose their accuracy if the tail is omitted.
+Hence, Tor clients actually calculate two timeouts: a usage timeout, and a
+close timeout.
+
+Circuits that pass the usage timeout are marked as measurement circuits,
+and are allowed to continue to build until the close timeout corresponding
+to the point 'cbtclosequantile' (default 99) on the Pareto curve, or 60
+seconds, whichever is greater.
+
+The actual completion times for these measurement circuits should be
+recorded.
+
+Implementations should completely abandon a circuit and ignore the circuit
+if the total build time exceeds the close threshold. Such closed circuits
+should be ignored, as this typically means one of the relays in the path is
+offline.
+
+<a id="path-spec.txt-2.4.5"></a>
+
+## Detecting Changing Network Conditions {#changes-in-network}
+
+Tor clients attempt to detect both network connectivity loss and drastic
+changes in the timeout characteristics.
+
+To detect changing network conditions, clients keep a history of
+the timeout or non-timeout status of the past 'cbtrecentcount' circuits
+(20 circuits) that successfully completed at least one hop. If more than
+90% of these circuits timeout, the client discards all buildtimes history,
+resets the timeout to 'cbtinitialtimeout' (60 seconds), and then begins
+recomputing the timeout.
+
+If the timeout was already at least `cbtinitialtimeout`,
+the client doubles the timeout.
+
+The records here (of how many circuits succeeded or failed among the most
+recent 'cbrrecentcount') are not stored as persistent state. On reload,
+we start with a new, empty state.
+
+<a id="path-spec.txt-2.4.6"></a>
+
+## Consensus parameters governing behavior {#parameters}
+
+Clients that implement circuit build timeout learning should obey the
+following consensus parameters that govern behavior, in order to allow
+us to handle bugs or other emergent behaviors due to client circuit
+construction. If these parameters are not present in the consensus,
+the listed default values should be used instead.
+
+```text
+ cbtdisabled
+ Default: 0
+ Min: 0
+ Max: 1
+ Effect: If 1, all CircuitBuildTime learning code should be
+ disabled and history should be discarded. For use in
+ emergency situations only.
+
+ cbtnummodes
+ Default: 10
+ Min: 1
+ Max: 20
+ Effect: This value governs how many modes to use in the weighted
+ average calculation of Pareto parameter Xm. Selecting Xm as the
+ average of multiple modes improves accuracy of the Pareto tail
+ for quantile cutoffs from 60-80% (see cbtquantile).
+
+ cbtrecentcount
+ Default: 20
+ Min: 3
+ Max: 1000
+ Effect: This is the number of circuit build outcomes (success vs
+ timeout) to keep track of for the following option.
+
+ cbtmaxtimeouts
+ Default: 18
+ Min: 3
+ Max: 10000
+ Effect: When this many timeouts happen in the last 'cbtrecentcount'
+ circuit attempts, the client should discard all of its
+ history and begin learning a fresh timeout value.
+
+ Note that if this parameter's value is greater than the value
+ of 'cbtrecentcount', then the history will never be
+ discarded because of this feature.
+
+ cbtmincircs
+ Default: 100
+ Min: 1
+ Max: 10000
+ Effect: This is the minimum number of circuits to build before
+ computing a timeout.
+
+ Note that if this parameter's value is higher than 1000 (the
+ number of time observations that a client keeps in its
+ circular buffer), circuit build timeout calculation is
+ effectively disabled, and the default timeouts are used
+ indefinitely.
+
+ cbtquantile
+ Default: 80
+ Min: 10
+ Max: 99
+ Effect: This is the position on the quantile curve to use to set the
+ timeout value. It is a percent (10-99).
+
+ cbtclosequantile
+ Default: 99
+ Min: Value of cbtquantile parameter
+ Max: 99
+ Effect: This is the position on the quantile curve to use to set the
+ timeout value to use to actually close circuits. It is a
+ percent (0-99).
+
+ cbttestfreq
+ Default: 10
+ Min: 1
+ Max: 2147483647 (INT32_MAX)
+ Effect: Describes how often in seconds to build a test circuit to
+ gather timeout values. Only applies if less than 'cbtmincircs'
+ have been recorded.
+
+ cbtmintimeout
+ Default: 10
+ Min: 10
+ Max: 2147483647 (INT32_MAX)
+ Effect: This is the minimum allowed timeout value in milliseconds.
+
+ cbtinitialtimeout
+ Default: 60000
+ Min: Value of cbtmintimeout
+ Max: 2147483647 (INT32_MAX)
+ Effect: This is the timeout value to use before we have enough data
+ to compute a timeout, in milliseconds. If we do not have
+ enough data to compute a timeout estimate (see cbtmincircs),
+ then we use this interval both for the close timeout and the
+ abandon timeout.
+
+ cbtlearntimeout
+ Default: 180
+ Min: 10
+ Max: 60000
+ Effect: This is how long idle circuits will be kept open while cbt is
+ learning a new timeout value.
+
+ cbtmaxopencircs
+ Default: 10
+ Min: 0
+ Max: 14
+ Effect: This is the maximum number of circuits that can be open at
+ at the same time during the circuit build time learning phase.
+```