diff options
Diffstat (limited to 'doc/spec/proposals/151-path-selection-improvements.txt')
-rw-r--r-- | doc/spec/proposals/151-path-selection-improvements.txt | 147 |
1 files changed, 0 insertions, 147 deletions
diff --git a/doc/spec/proposals/151-path-selection-improvements.txt b/doc/spec/proposals/151-path-selection-improvements.txt deleted file mode 100644 index e3c8f35451..0000000000 --- a/doc/spec/proposals/151-path-selection-improvements.txt +++ /dev/null @@ -1,147 +0,0 @@ -Filename: 151-path-selection-improvements.txt -Title: Improving Tor Path Selection -Version: -Last-Modified: -Author: Fallon Chen, Mike Perry -Created: 5-Jul-2008 -Status: Draft - -Overview - - The performance of paths selected can be improved by adjusting the - CircuitBuildTimeout and avoiding failing guard nodes. This proposal - describes a method of tracking buildtime statistics at the client, and - using those statistics to adjust the CircuitBuildTimeout. - -Motivation - - Tor's performance can be improved by excluding those circuits that - have long buildtimes (and by extension, high latency). For those Tor - users who require better performance and have lower requirements for - anonymity, this would be a very useful option to have. - -Implementation - - Storing Build Times - - Circuit build times will be stored in the circular array - 'circuit_build_times' consisting of uint16_t elements as milliseconds. - The total size of this array will be based on the number of circuits - it takes to converge on a good fit of the long term distribution of - the circuit builds for a fixed link. We do not want this value to be - too large, because it will make it difficult for clients to adapt to - moving between different links. - - From our initial observations, this value appears to be on the order - of 1000, but will be configurable in a #define NCIRCUITS_TO_OBSERVE. - The exact value for this #define will be determined by performing - goodness of fit tests using measurments obtained from the shufflebt.py - script from TorFlow. - - Long Term Storage - - The long-term storage representation will be implemented by storing a - histogram with BUILDTIME_BIN_WIDTH millisecond buckets (default 50) when - writing out the statistics to disk. The format of this histogram on disk - is yet to be finalized, but it will likely be of the format - 'CircuitBuildTime <bin> <count>', with the total specified as - 'TotalBuildTimes <total>' - Example: - - TotalBuildTimes 100 - CircuitBuildTimeBin 1 50 - CircuitBuildTimeBin 2 25 - CircuitBuildTimeBin 3 13 - ... - - Reading the histogram in will entail multiplying each bin by the - BUILDTIME_BIN_WIDTH and then inserting <count> values into the - circuit_build_times array each with the value of - <bin>*BUILDTIME_BIN_WIDTH. In order to evenly distribute the - values in the circular array, a form of index skipping must - be employed. Values from bin #N with bin count C and total T - will occupy indexes specified by N+((T/C)*k)-1, where k is the - set of integers ranging from 0 to C-1. - - For example, this would mean that the values from bin 1 would - occupy indexes 1+(100/50)*k-1, or 0, 2, 4, 6, 8, 10 and so on. - The values for bin 2 would occupy positions 1, 5, 9, 13. Collisions - will be inserted at the first empty position in the array greater - than the selected index (which may requiring looping around the - array back to index 0). - - Learning the CircuitBuildTimeout - - Based on studies of build times, we found that the distribution of - circuit buildtimes appears to be a Pareto distribution. - - We will calculate the parameters for a Pareto distribution - fitting the data using the estimators at - http://en.wikipedia.org/wiki/Pareto_distribution#Parameter_estimation. - - The timeout itself will be calculated by solving the CDF for the - a percentile cutoff BUILDTIME_PERCENT_CUTOFF. This value - represents the percentage of paths the Tor client will accept out of - the total number of paths. We have not yet determined a good - cutoff for this mathematically, but 85% seems a good choice for now. - - From http://en.wikipedia.org/wiki/Pareto_distribution#Definition, - the calculation we need is pow(BUILDTIME_PERCENT_CUTOFF/100.0, k)/Xm. - - Testing - - After circuit build times, storage, and learning are implemented, - the resulting histogram should be checked for consistency by - verifying it persists across successive Tor invocations where - no circuits are built. In addition, we can also use the existing - buildtime scripts to record build times, and verify that the histogram - the python produces matches that which is output to the state file in Tor, - and verify that the Pareto parameters and cutoff points also match. - - Soft timeout vs Hard Timeout - - At some point, it may be desirable to change the cutoff from a - single hard cutoff that destroys the circuit to a soft cutoff and - a hard cutoff, where the soft cutoff merely triggers the building - of a new circuit, and the hard cutoff triggers destruction of the - circuit. - - Good values for hard and soft cutoffs seem to be 85% and 65% - respectively, but we should eventually justify this with observation. - - When to Begin Calculation - - The number of circuits to observe (NCIRCUITS_TO_CUTOFF) before - changing the CircuitBuildTimeout will be tunable via a #define. From - our measurements, a good value for NCIRCUITS_TO_CUTOFF appears to be - on the order of 100. - - Dealing with Timeouts - - Timeouts should be counted as the expectation of the region of - of the Pareto distribution beyond the cutoff. The proposal will - be updated with this value soon. - - Also, in the event of network failure, the observation mechanism - should stop collecting timeout data. - - Client Hints - - Some research still needs to be done to provide initial values - for CircuitBuildTimeout based on values learned from modem - users, DSL users, Cable Modem users, and dedicated links. A - radiobutton in Vidalia should eventually be provided that - sets CircuitBuildTimeout to one of these values and also - provide the option of purging all learned data, should any exist. - - These values can either be published in the directory, or - shipped hardcoded for a particular Tor version. - -Issues - - Impact on anonymity - - Since this follows a Pareto distribution, large reductions on the - timeout can be achieved without cutting off a great number of the - total paths. This will eliminate a great deal of the performance - variation of Tor usage. |