diff options
author | Mike Perry <mikeperry-git@fscked.org> | 2009-09-01 20:13:52 -0700 |
---|---|---|
committer | Mike Perry <mikeperry-git@fscked.org> | 2009-09-16 15:52:03 -0700 |
commit | fd412549fdd6631c12d1e6d9092a9506f60d9789 (patch) | |
tree | 1d8e714733ebdee4a8e10a8a231421634c038886 /doc | |
parent | fca84469496e19542127a0c0f65b933a3eee4104 (diff) | |
download | tor-fd412549fdd6631c12d1e6d9092a9506f60d9789.tar.gz tor-fd412549fdd6631c12d1e6d9092a9506f60d9789.zip |
Update proposal to bring it more in-line with implementation.
Diffstat (limited to 'doc')
-rw-r--r-- | doc/spec/proposals/151-path-selection-improvements.txt | 80 |
1 files changed, 40 insertions, 40 deletions
diff --git a/doc/spec/proposals/151-path-selection-improvements.txt b/doc/spec/proposals/151-path-selection-improvements.txt index 3d5f07d3ab..df19e0f71f 100644 --- a/doc/spec/proposals/151-path-selection-improvements.txt +++ b/doc/spec/proposals/151-path-selection-improvements.txt @@ -2,7 +2,7 @@ Filename: 151-path-selection-improvements.txt Title: Improving Tor Path Selection Author: Fallon Chen, Mike Perry Created: 5-Jul-2008 -Status: Draft +Status: Implemented Overview @@ -22,51 +22,37 @@ Implementation Storing Build Times - Circuit build times will be stored in the circular array - 'circuit_build_times' consisting of uint16_t elements as milliseconds. - The total size of this array will be based on the number of circuits + Circuit build times are stored in the circular array + 'circuit_build_times' consisting of uint32_t elements as milliseconds. + The total size of this array is based on the number of circuits it takes to converge on a good fit of the long term distribution of the circuit builds for a fixed link. We do not want this value to be too large, because it will make it difficult for clients to adapt to moving between different links. - From our initial observations, this value appears to be on the order - of 1000, but will be configurable in a #define NCIRCUITS_TO_OBSERVE. - The exact value for this #define will be determined by performing - goodness of fit tests using measurments obtained from the shufflebt.py - script from TorFlow. + From our observations, this value appears to be on the order of 1000, + but is configurable in a #define NCIRCUITS_TO_OBSERVE. Long Term Storage - The long-term storage representation will be implemented by storing a + The long-term storage representation is implemented by storing a histogram with BUILDTIME_BIN_WIDTH millisecond buckets (default 50) when - writing out the statistics to disk. The format of this histogram on disk - is yet to be finalized, but it will likely be of the format - 'CircuitBuildTime <bin> <count>', with the total specified as - 'TotalBuildTimes <total>' + writing out the statistics to disk. The format this takes in the + state file is 'CircuitBuildTime <bin-ms> <count>', with the total + specified as 'TotalBuildTimes <total>' Example: TotalBuildTimes 100 - CircuitBuildTimeBin 1 50 - CircuitBuildTimeBin 2 25 - CircuitBuildTimeBin 3 13 + CircuitBuildTimeBin 0 50 + CircuitBuildTimeBin 50 25 + CircuitBuildTimeBin 100 13 ... - Reading the histogram in will entail multiplying each bin by the - BUILDTIME_BIN_WIDTH and then inserting <count> values into the - circuit_build_times array each with the value of - <bin>*BUILDTIME_BIN_WIDTH. In order to evenly distribute the - values in the circular array, a form of index skipping must - be employed. Values from bin #N with bin count C and total T - will occupy indexes specified by N+((T/C)*k)-1, where k is the - set of integers ranging from 0 to C-1. - - For example, this would mean that the values from bin 1 would - occupy indexes 1+(100/50)*k-1, or 0, 2, 4, 6, 8, 10 and so on. - The values for bin 2 would occupy positions 1, 5, 9, 13. Collisions - will be inserted at the first empty position in the array greater - than the selected index (which may requiring looping around the - array back to index 0). + Reading the histogram in will entail inserting <count> values + into the circuit_build_times array each with the value of + <bin-ms> milliseconds. In order to evenly distribute the values + in the circular array, the Fisher-Yates shuffle will be performed + after reading values from the bins. Learning the CircuitBuildTimeout @@ -77,14 +63,28 @@ Implementation fitting the data using the estimators at http://en.wikipedia.org/wiki/Pareto_distribution#Parameter_estimation. - The timeout itself will be calculated by solving the CDF for the - a percentile cutoff BUILDTIME_PERCENT_CUTOFF. This value - represents the percentage of paths the Tor client will accept out of - the total number of paths. We have not yet determined a good - cutoff for this mathematically, but 85% seems a good choice for now. + The timeout itself is calculated by using the Quartile function (the + inverted CDF) to give us the value on the CDF such that + BUILDTIME_PERCENT_CUTOFF (80%) of the mass of the distribution is + below the timeout value. - From http://en.wikipedia.org/wiki/Pareto_distribution#Definition, - the calculation we need is pow(BUILDTIME_PERCENT_CUTOFF/100.0, k)/Xm. + Thus, we expect that the Tor client will accept the fastest 80% of + the total number of paths on the network. + + Detecting Changing Network Conditions + + We attempt to detect both network connectivty loss and drastic + changes in the timeout characteristics. Network connectivity loss + is detected by recording a timestamp every time Tor either completes + a TLS connection or receives a cell. If this timestamp is more than + 90 seconds in the past, circuit timeouts are no longer counted. + + If more than MAX_RECENT_TIMEOUT_RATE (80%) of the past + RECENT_CIRCUITS (20) time out, we assume the network connection + has changed, and we discard all buildtimes history and compute + a new timeout by estimating a new Pareto curve using the + position on the Pareto Quartile function for the ratio of + timeouts. Testing @@ -104,7 +104,7 @@ Implementation of a new circuit, and the hard cutoff triggers destruction of the circuit. - Good values for hard and soft cutoffs seem to be 85% and 65% + Good values for hard and soft cutoffs seem to be 80% and 60% respectively, but we should eventually justify this with observation. When to Begin Calculation |