diff options
Diffstat (limited to 'doc/spec')
-rw-r--r-- | doc/spec/proposals/151-path-selection-improvements.txt | 85 |
1 files changed, 42 insertions, 43 deletions
diff --git a/doc/spec/proposals/151-path-selection-improvements.txt b/doc/spec/proposals/151-path-selection-improvements.txt index 7821a5dddb..94e964b017 100644 --- a/doc/spec/proposals/151-path-selection-improvements.txt +++ b/doc/spec/proposals/151-path-selection-improvements.txt @@ -20,7 +20,7 @@ Motivation Implementation - Storing Build Times + Gathering Build Times Circuit build times are stored in the circular array 'circuit_build_times' consisting of uint32_t elements as milliseconds. @@ -30,8 +30,16 @@ Implementation too large, because it will make it difficult for clients to adapt to moving between different links. - From our observations, this value appears to be on the order of 1000, - but is configurable in a #define NCIRCUITS_TO_OBSERVE. + From our observations, the minimum value for a reasonable fit appears + to be on the order of 500 (MIN_CIRCUITS_TO_OBSERVE). However, to keep + a good fit over the long term, we store 5000 most recent circuits in + the array (NCIRCUITS_TO_OBSERVE). + + The Tor client will build test circuits at a rate of one per + minute (BUILD_TIMES_TEST_FREQUENCY) up to the point of + MIN_CIRCUITS_TO_OBSERVE. This allows a fresh Tor to have + a CircuitBuildTimeout estimated within 8 hours after install, + upgrade, or network change (see below). Long Term Storage @@ -43,9 +51,9 @@ Implementation Example: TotalBuildTimes 100 - CircuitBuildTimeBin 0 50 - CircuitBuildTimeBin 50 25 - CircuitBuildTimeBin 100 13 + CircuitBuildTimeBin 25 50 + CircuitBuildTimeBin 75 25 + CircuitBuildTimeBin 125 13 ... Reading the histogram in will entail inserting <count> values @@ -57,7 +65,12 @@ Implementation Learning the CircuitBuildTimeout Based on studies of build times, we found that the distribution of - circuit buildtimes appears to be a Pareto distribution. + circuit buildtimes appears to be a Frechet distribution. However, + estimators and quantile functions of the Frechet distribution are + difficult to work with and slow to converge. So instead, since we + are only interested in the accuracy of the tail, we approximate + the tail of the distribution with a Pareto curve starting at + the mode of the circuit build time sample set. We will calculate the parameters for a Pareto distribution fitting the data using the estimators at @@ -73,11 +86,8 @@ Implementation Detecting Changing Network Conditions - We attempt to detect both network connectivty loss and drastic - changes in the timeout characteristics. Network connectivity loss - is detected by recording a timestamp every time Tor either completes - a TLS connection or receives a cell. If this timestamp is more than - 90 seconds in the past, circuit timeouts are no longer counted. + We attempt to detect both network connectivity loss and drastic + changes in the timeout characteristics. If more than MAX_RECENT_TIMEOUT_RATE (80%) of the past RECENT_CIRCUITS (20) time out, we assume the network connection @@ -86,6 +96,11 @@ Implementation position on the Pareto Quartile function for the ratio of timeouts. + Network connectivity loss is detected by recording a timestamp every + time Tor either completes a TLS connection or receives a cell. If + this timestamp is more than CircuitBuildTimeout*RECENT_CIRCUITS/3 + seconds in the past, circuit timeouts are no longer counted. + Testing After circuit build times, storage, and learning are implemented, @@ -96,44 +111,28 @@ Implementation the python produces matches that which is output to the state file in Tor, and verify that the Pareto parameters and cutoff points also match. - Soft timeout vs Hard Timeout - - At some point, it may be desirable to change the cutoff from a - single hard cutoff that destroys the circuit to a soft cutoff and - a hard cutoff, where the soft cutoff merely triggers the building - of a new circuit, and the hard cutoff triggers destruction of the - circuit. - - Good values for hard and soft cutoffs seem to be 80% and 60% - respectively, but we should eventually justify this with observation. - - When to Begin Calculation - - The number of circuits to observe (NCIRCUITS_TO_CUTOFF) before - changing the CircuitBuildTimeout will be tunable via a #define. From - our measurements, a good value for NCIRCUITS_TO_CUTOFF appears to be - on the order of 100. + We will also verify that there are no unexpected large deviations from + node selection, such as nodes from distant geographical locations being + completely excluded. Dealing with Timeouts Timeouts should be counted as the expectation of the region of - of the Pareto distribution beyond the cutoff. The proposal will - be updated with this value soon. + of the Pareto distribution beyond the cutoff. This is done by + generating a random sample for each timeout at points on the + curve beyond the current timeout cutoff. - Also, in the event of network failure, the observation mechanism - should stop collecting timeout data. + Future Work - Client Hints - - Some research still needs to be done to provide initial values - for CircuitBuildTimeout based on values learned from modem - users, DSL users, Cable Modem users, and dedicated links. A - radiobutton in Vidalia should eventually be provided that - sets CircuitBuildTimeout to one of these values and also - provide the option of purging all learned data, should any exist. + At some point, it may be desirable to change the cutoff from a + single hard cutoff that destroys the circuit to a soft cutoff and + a hard cutoff, where the soft cutoff merely triggers the building + of a new circuit, and the hard cutoff triggers destruction of the + circuit. - These values can either be published in the directory, or - shipped hardcoded for a particular Tor version. + It may also be beneficial to learn separate timeouts for each + guard node, as they will have slightly different distributions. + This will take longer to generate initial values though. Issues |