diff options
Diffstat (limited to 'doc/spec/proposals/151-path-selection-improvements.txt')
-rw-r--r-- | doc/spec/proposals/151-path-selection-improvements.txt | 148 |
1 files changed, 0 insertions, 148 deletions
diff --git a/doc/spec/proposals/151-path-selection-improvements.txt b/doc/spec/proposals/151-path-selection-improvements.txt deleted file mode 100644 index af89f21193..0000000000 --- a/doc/spec/proposals/151-path-selection-improvements.txt +++ /dev/null @@ -1,148 +0,0 @@ -Filename: 151-path-selection-improvements.txt -Title: Improving Tor Path Selection -Author: Fallon Chen, Mike Perry -Created: 5-Jul-2008 -Status: Finished -In-Spec: path-spec.txt - -Overview - - The performance of paths selected can be improved by adjusting the - CircuitBuildTimeout and avoiding failing guard nodes. This proposal - describes a method of tracking buildtime statistics at the client, and - using those statistics to adjust the CircuitBuildTimeout. - -Motivation - - Tor's performance can be improved by excluding those circuits that - have long buildtimes (and by extension, high latency). For those Tor - users who require better performance and have lower requirements for - anonymity, this would be a very useful option to have. - -Implementation - - Gathering Build Times - - Circuit build times are stored in the circular array - 'circuit_build_times' consisting of uint32_t elements as milliseconds. - The total size of this array is based on the number of circuits - it takes to converge on a good fit of the long term distribution of - the circuit builds for a fixed link. We do not want this value to be - too large, because it will make it difficult for clients to adapt to - moving between different links. - - From our observations, the minimum value for a reasonable fit appears - to be on the order of 500 (MIN_CIRCUITS_TO_OBSERVE). However, to keep - a good fit over the long term, we store 5000 most recent circuits in - the array (NCIRCUITS_TO_OBSERVE). - - The Tor client will build test circuits at a rate of one per - minute (BUILD_TIMES_TEST_FREQUENCY) up to the point of - MIN_CIRCUITS_TO_OBSERVE. This allows a fresh Tor to have - a CircuitBuildTimeout estimated within 8 hours after install, - upgrade, or network change (see below). - - Long Term Storage - - The long-term storage representation is implemented by storing a - histogram with BUILDTIME_BIN_WIDTH millisecond buckets (default 50) when - writing out the statistics to disk. The format this takes in the - state file is 'CircuitBuildTime <bin-ms> <count>', with the total - specified as 'TotalBuildTimes <total>' - Example: - - TotalBuildTimes 100 - CircuitBuildTimeBin 25 50 - CircuitBuildTimeBin 75 25 - CircuitBuildTimeBin 125 13 - ... - - Reading the histogram in will entail inserting <count> values - into the circuit_build_times array each with the value of - <bin-ms> milliseconds. In order to evenly distribute the values - in the circular array, the Fisher-Yates shuffle will be performed - after reading values from the bins. - - Learning the CircuitBuildTimeout - - Based on studies of build times, we found that the distribution of - circuit buildtimes appears to be a Frechet distribution. However, - estimators and quantile functions of the Frechet distribution are - difficult to work with and slow to converge. So instead, since we - are only interested in the accuracy of the tail, we approximate - the tail of the distribution with a Pareto curve starting at - the mode of the circuit build time sample set. - - We will calculate the parameters for a Pareto distribution - fitting the data using the estimators at - http://en.wikipedia.org/wiki/Pareto_distribution#Parameter_estimation. - - The timeout itself is calculated by using the Quartile function (the - inverted CDF) to give us the value on the CDF such that - BUILDTIME_PERCENT_CUTOFF (80%) of the mass of the distribution is - below the timeout value. - - Thus, we expect that the Tor client will accept the fastest 80% of - the total number of paths on the network. - - Detecting Changing Network Conditions - - We attempt to detect both network connectivity loss and drastic - changes in the timeout characteristics. - - We assume that we've had network connectivity loss if 3 circuits - timeout and we've received no cells or TLS handshakes since those - circuits began. We then set the timeout to 60 seconds and stop - counting timeouts. - - If 3 more circuits timeout and the network still has not been - live within this new 60 second timeout window, we then discard - the previous timeouts during this period from our history. - - To detect changing network conditions, we keep a history of - the timeout or non-timeout status of the past RECENT_CIRCUITS (20) - that successfully completed at least one hop. If more than 75% - of these circuits timeout, we discard all buildtimes history, - reset the timeout to 60, and then begin recomputing the timeout. - - Testing - - After circuit build times, storage, and learning are implemented, - the resulting histogram should be checked for consistency by - verifying it persists across successive Tor invocations where - no circuits are built. In addition, we can also use the existing - buildtime scripts to record build times, and verify that the histogram - the python produces matches that which is output to the state file in Tor, - and verify that the Pareto parameters and cutoff points also match. - - We will also verify that there are no unexpected large deviations from - node selection, such as nodes from distant geographical locations being - completely excluded. - - Dealing with Timeouts - - Timeouts should be counted as the expectation of the region of - of the Pareto distribution beyond the cutoff. This is done by - generating a random sample for each timeout at points on the - curve beyond the current timeout cutoff. - - Future Work - - At some point, it may be desirable to change the cutoff from a - single hard cutoff that destroys the circuit to a soft cutoff and - a hard cutoff, where the soft cutoff merely triggers the building - of a new circuit, and the hard cutoff triggers destruction of the - circuit. - - It may also be beneficial to learn separate timeouts for each - guard node, as they will have slightly different distributions. - This will take longer to generate initial values though. - -Issues - - Impact on anonymity - - Since this follows a Pareto distribution, large reductions on the - timeout can be achieved without cutting off a great number of the - total paths. This will eliminate a great deal of the performance - variation of Tor usage. |