diff options
Diffstat (limited to 'spec/path-spec/learning-timeouts.md')
-rw-r--r-- | spec/path-spec/learning-timeouts.md | 305 |
1 files changed, 305 insertions, 0 deletions
diff --git a/spec/path-spec/learning-timeouts.md b/spec/path-spec/learning-timeouts.md new file mode 100644 index 0000000..e98bac3 --- /dev/null +++ b/spec/path-spec/learning-timeouts.md @@ -0,0 +1,305 @@ +<a id="path-spec.txt-2.4"></a> + +# Learning when to give up ("timeout") on circuit construction {#timing-out} + +Since version 0.2.2.8-alpha, Tor clients attempt to learn when to give +up on circuits based on network conditions. + +<a id="path-spec.txt-2.4.1"></a> + +## Distribution choice + +Based on studies of build times, we found that the distribution of +circuit build times appears to be a Frechet distribution (and a multi-modal +Frechet distribution, if more than one guard or bridge is used). However, +estimators and quantile functions of the Frechet distribution are difficult +to work with and slow to converge. So instead, since we are only interested +in the accuracy of the tail, clients approximate the tail of the multi-modal +distribution with a single Pareto curve. + +<a id="path-spec.txt-2.4.2"></a> + +## How much data to record {#observations} + +From our observations, the minimum number of circuit build times for a +reasonable fit appears to be on the order of 100. However, to keep a +good fit over the long term, clients store 1000 most recent circuit build +times in a circular array. + +These build times only include the times required to build three-hop +circuits, and the times required to build the first three hops of circuits +with more than three hops. Circuits of fewer than three hops are not +recorded, and hops past the third are not recorded. + +The Tor client should build test circuits at a rate of one every 'cbttestfreq' +(10 seconds) until 'cbtmincircs' (100 circuits) are built, with a maximum of +'cbtmaxopencircs' (default: 10) circuits open at once. This allows a fresh +Tor to have a CircuitBuildTimeout estimated within 30 minutes after install +or network change +(see [Detecting Changing Network Conditions](#changes-in-network) below.) + +Timeouts are stored on disk in a histogram of 10ms bin width, the same +width used to calculate the Xm value above. The timeouts recorded in the +histogram must be shuffled after being read from disk, to preserve a +proper expiration of old values after restart. + +Thus, some build time resolution is lost during restart. Implementations may +choose a different persistence mechanism than this histogram, but be aware +that build time binning is still needed for parameter estimation. + +<a id="path-spec.txt-2.4.3"></a> + +## Parameter estimation + +Once 'cbtmincircs' build times are recorded, Tor clients update the +distribution parameters and recompute the timeout every circuit completion +(though +[see below](#changes-in-network) +for when to pause and reset timeout due to +too many circuits timing out). + +Tor clients calculate the parameters for a Pareto distribution fitting the +data using the maximum likelihood estimator. For derivation, see: +<https://en.wikipedia.org/wiki/Pareto_distribution#Estimation_of_parameters> + +Because build times are not a true Pareto distribution, we alter how Xm is +computed. In a max likelihood estimator, the mode of the distribution is +used directly as Xm. + +Instead of using the mode of discrete build times directly, Tor clients +compute the Xm parameter using the weighted average of the midpoints +of the 'cbtnummodes' (10) most frequently occurring 10ms histogram bins. +Ties are broken in favor of earlier bins (that is, in favor of bins +corresponding to shorter build times). + +(The use of 10 modes was found to minimize error from the selected +cbtquantile, with 10ms bins for quantiles 60-80, compared to many other +heuristics). + +To avoid ln(1.0+epsilon) precision issues, use log laws to rewrite the +estimator for 'alpha' as the sum of logs followed by subtraction, rather +than multiplication and division: + +`alpha = n/(Sum_n{ln(MAX(Xm, x_i))} - n\*ln(Xm))` + +In this, n is the total number of build times that have completed, x_i is +the ith recorded build time, and Xm is the modes of x_i as above. + +All times below Xm are counted as having the Xm value via the MAX(), +because in Pareto estimators, Xm is supposed to be the lowest value. +However, since clients use mode averaging to estimate Xm, there can be +values below our Xm. Effectively, the Pareto estimator then treats that +everything smaller than Xm happened at Xm. One can also see that if +clients did not do this, alpha could underflow to become negative, which +results in an exponential curve, not a Pareto probability distribution. + +The timeout itself is calculated by using the Pareto Quantile function (the +inverted CDF) to give us the value on the CDF such that 80% of the mass +of the distribution is below the timeout value (parameter 'cbtquantile'). + +The Pareto Quantile Function (inverse CDF) is: + +`F(q) = Xm/((1.0-q)^(1.0/alpha))` + +Thus, clients obtain the circuit build timeout for 3-hop circuits by +computing: + +`timeout_ms = F(0.8) # 'cbtquantile' == 0.8` + +With this, we expect that the Tor client will accept the fastest 80% of the +total number of paths on the network. + +Clients obtain the circuit close time to completely abandon circuits as: + +`close_ms = F(0.99) # 'cbtclosequantile' == 0.99` + +To avoid waiting an unreasonably long period of time for circuits that +simply have relays that are down, Tor clients cap timeout_ms at the max +build time actually observed so far, and cap close_ms at twice this max, +but at least 60 seconds: + +```text + timeout_ms = MIN(timeout_ms, max_observed_timeout) + close_ms = MAX(MIN(close_ms, 2*max_observed_timeout), 'cbtinitialtimeout') +``` + +<a id="path-spec.txt-2.4.3"></a> + +## Calculating timeouts thresholds for circuits of different lengths {#different-lengths} + +The timeout_ms and close_ms estimates above are good only for 3-hop +circuits, since only 3-hop circuits are recorded in the list of build +times. + +To calculate the appropriate timeouts and close timeouts for circuits of +other lengths, the client multiples the timeout_ms and close_ms values +by a scaling factor determined by the number of communication hops +needed to build their circuits: + +```text +timeout_ms\[hops=n\] = timeout_ms * Actions(N) / Actions(3) + +close_ms\[hops=n\] = close_ms * Actions(N) / Actions(3) +``` + +where `Actions(N) = N * (N + 1) / 2.` + +To calculate timeouts for operations other than circuit building, +the client should add X to Actions(N) for every round-trip communication +required with the Xth hop. + +<a id="path-spec.txt-2.4.4"></a> + +## How to record timeouts {#recording-timeouts} + +Pareto estimators begin to lose their accuracy if the tail is omitted. +Hence, Tor clients actually calculate two timeouts: a usage timeout, and a +close timeout. + +Circuits that pass the usage timeout are marked as measurement circuits, +and are allowed to continue to build until the close timeout corresponding +to the point 'cbtclosequantile' (default 99) on the Pareto curve, or 60 +seconds, whichever is greater. + +The actual completion times for these measurement circuits should be +recorded. + +Implementations should completely abandon a circuit and ignore the circuit +if the total build time exceeds the close threshold. Such closed circuits +should be ignored, as this typically means one of the relays in the path is +offline. + +<a id="path-spec.txt-2.4.5"></a> + +## Detecting Changing Network Conditions {#changes-in-network} + +Tor clients attempt to detect both network connectivity loss and drastic +changes in the timeout characteristics. + +To detect changing network conditions, clients keep a history of +the timeout or non-timeout status of the past 'cbtrecentcount' circuits +(20 circuits) that successfully completed at least one hop. If more than +90% of these circuits timeout, the client discards all buildtimes history, +resets the timeout to 'cbtinitialtimeout' (60 seconds), and then begins +recomputing the timeout. + +If the timeout was already at least `cbtinitialtimeout`, +the client doubles the timeout. + +The records here (of how many circuits succeeded or failed among the most +recent 'cbrrecentcount') are not stored as persistent state. On reload, +we start with a new, empty state. + +<a id="path-spec.txt-2.4.6"></a> + +## Consensus parameters governing behavior {#parameters} + +Clients that implement circuit build timeout learning should obey the +following consensus parameters that govern behavior, in order to allow +us to handle bugs or other emergent behaviors due to client circuit +construction. If these parameters are not present in the consensus, +the listed default values should be used instead. + +```text + cbtdisabled + Default: 0 + Min: 0 + Max: 1 + Effect: If 1, all CircuitBuildTime learning code should be + disabled and history should be discarded. For use in + emergency situations only. + + cbtnummodes + Default: 10 + Min: 1 + Max: 20 + Effect: This value governs how many modes to use in the weighted + average calculation of Pareto parameter Xm. Selecting Xm as the + average of multiple modes improves accuracy of the Pareto tail + for quantile cutoffs from 60-80% (see cbtquantile). + + cbtrecentcount + Default: 20 + Min: 3 + Max: 1000 + Effect: This is the number of circuit build outcomes (success vs + timeout) to keep track of for the following option. + + cbtmaxtimeouts + Default: 18 + Min: 3 + Max: 10000 + Effect: When this many timeouts happen in the last 'cbtrecentcount' + circuit attempts, the client should discard all of its + history and begin learning a fresh timeout value. + + Note that if this parameter's value is greater than the value + of 'cbtrecentcount', then the history will never be + discarded because of this feature. + + cbtmincircs + Default: 100 + Min: 1 + Max: 10000 + Effect: This is the minimum number of circuits to build before + computing a timeout. + + Note that if this parameter's value is higher than 1000 (the + number of time observations that a client keeps in its + circular buffer), circuit build timeout calculation is + effectively disabled, and the default timeouts are used + indefinitely. + + cbtquantile + Default: 80 + Min: 10 + Max: 99 + Effect: This is the position on the quantile curve to use to set the + timeout value. It is a percent (10-99). + + cbtclosequantile + Default: 99 + Min: Value of cbtquantile parameter + Max: 99 + Effect: This is the position on the quantile curve to use to set the + timeout value to use to actually close circuits. It is a + percent (0-99). + + cbttestfreq + Default: 10 + Min: 1 + Max: 2147483647 (INT32_MAX) + Effect: Describes how often in seconds to build a test circuit to + gather timeout values. Only applies if less than 'cbtmincircs' + have been recorded. + + cbtmintimeout + Default: 10 + Min: 10 + Max: 2147483647 (INT32_MAX) + Effect: This is the minimum allowed timeout value in milliseconds. + + cbtinitialtimeout + Default: 60000 + Min: Value of cbtmintimeout + Max: 2147483647 (INT32_MAX) + Effect: This is the timeout value to use before we have enough data + to compute a timeout, in milliseconds. If we do not have + enough data to compute a timeout estimate (see cbtmincircs), + then we use this interval both for the close timeout and the + abandon timeout. + + cbtlearntimeout + Default: 180 + Min: 10 + Max: 60000 + Effect: This is how long idle circuits will be kept open while cbt is + learning a new timeout value. + + cbtmaxopencircs + Default: 10 + Min: 0 + Max: 14 + Effect: This is the maximum number of circuits that can be open at + at the same time during the circuit build time learning phase. +``` |