summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorMike Perry <mikeperry-git@fscked.org>2009-09-16 17:03:54 -0700
committerMike Perry <mikeperry-git@fscked.org>2009-09-16 17:03:54 -0700
commit81dc435ffaa86382c7702a5d7c7460635225dcb7 (patch)
treecf86d54ed4b250a3f90b413e82c90e2bc3c8893d
parent0352d4391756cf80c220d9daf49f2cbc55856e89 (diff)
downloadtor-81dc435ffaa86382c7702a5d7c7460635225dcb7.tar.gz
tor-81dc435ffaa86382c7702a5d7c7460635225dcb7.zip
Update proposal to match implementation.
-rw-r--r--doc/spec/proposals/151-path-selection-improvements.txt85
1 files changed, 42 insertions, 43 deletions
diff --git a/doc/spec/proposals/151-path-selection-improvements.txt b/doc/spec/proposals/151-path-selection-improvements.txt
index 7821a5dddb..94e964b017 100644
--- a/doc/spec/proposals/151-path-selection-improvements.txt
+++ b/doc/spec/proposals/151-path-selection-improvements.txt
@@ -20,7 +20,7 @@ Motivation
Implementation
- Storing Build Times
+ Gathering Build Times
Circuit build times are stored in the circular array
'circuit_build_times' consisting of uint32_t elements as milliseconds.
@@ -30,8 +30,16 @@ Implementation
too large, because it will make it difficult for clients to adapt to
moving between different links.
- From our observations, this value appears to be on the order of 1000,
- but is configurable in a #define NCIRCUITS_TO_OBSERVE.
+ From our observations, the minimum value for a reasonable fit appears
+ to be on the order of 500 (MIN_CIRCUITS_TO_OBSERVE). However, to keep
+ a good fit over the long term, we store 5000 most recent circuits in
+ the array (NCIRCUITS_TO_OBSERVE).
+
+ The Tor client will build test circuits at a rate of one per
+ minute (BUILD_TIMES_TEST_FREQUENCY) up to the point of
+ MIN_CIRCUITS_TO_OBSERVE. This allows a fresh Tor to have
+ a CircuitBuildTimeout estimated within 8 hours after install,
+ upgrade, or network change (see below).
Long Term Storage
@@ -43,9 +51,9 @@ Implementation
Example:
TotalBuildTimes 100
- CircuitBuildTimeBin 0 50
- CircuitBuildTimeBin 50 25
- CircuitBuildTimeBin 100 13
+ CircuitBuildTimeBin 25 50
+ CircuitBuildTimeBin 75 25
+ CircuitBuildTimeBin 125 13
...
Reading the histogram in will entail inserting <count> values
@@ -57,7 +65,12 @@ Implementation
Learning the CircuitBuildTimeout
Based on studies of build times, we found that the distribution of
- circuit buildtimes appears to be a Pareto distribution.
+ circuit buildtimes appears to be a Frechet distribution. However,
+ estimators and quantile functions of the Frechet distribution are
+ difficult to work with and slow to converge. So instead, since we
+ are only interested in the accuracy of the tail, we approximate
+ the tail of the distribution with a Pareto curve starting at
+ the mode of the circuit build time sample set.
We will calculate the parameters for a Pareto distribution
fitting the data using the estimators at
@@ -73,11 +86,8 @@ Implementation
Detecting Changing Network Conditions
- We attempt to detect both network connectivty loss and drastic
- changes in the timeout characteristics. Network connectivity loss
- is detected by recording a timestamp every time Tor either completes
- a TLS connection or receives a cell. If this timestamp is more than
- 90 seconds in the past, circuit timeouts are no longer counted.
+ We attempt to detect both network connectivity loss and drastic
+ changes in the timeout characteristics.
If more than MAX_RECENT_TIMEOUT_RATE (80%) of the past
RECENT_CIRCUITS (20) time out, we assume the network connection
@@ -86,6 +96,11 @@ Implementation
position on the Pareto Quartile function for the ratio of
timeouts.
+ Network connectivity loss is detected by recording a timestamp every
+ time Tor either completes a TLS connection or receives a cell. If
+ this timestamp is more than CircuitBuildTimeout*RECENT_CIRCUITS/3
+ seconds in the past, circuit timeouts are no longer counted.
+
Testing
After circuit build times, storage, and learning are implemented,
@@ -96,44 +111,28 @@ Implementation
the python produces matches that which is output to the state file in Tor,
and verify that the Pareto parameters and cutoff points also match.
- Soft timeout vs Hard Timeout
-
- At some point, it may be desirable to change the cutoff from a
- single hard cutoff that destroys the circuit to a soft cutoff and
- a hard cutoff, where the soft cutoff merely triggers the building
- of a new circuit, and the hard cutoff triggers destruction of the
- circuit.
-
- Good values for hard and soft cutoffs seem to be 80% and 60%
- respectively, but we should eventually justify this with observation.
-
- When to Begin Calculation
-
- The number of circuits to observe (NCIRCUITS_TO_CUTOFF) before
- changing the CircuitBuildTimeout will be tunable via a #define. From
- our measurements, a good value for NCIRCUITS_TO_CUTOFF appears to be
- on the order of 100.
+ We will also verify that there are no unexpected large deviations from
+ node selection, such as nodes from distant geographical locations being
+ completely excluded.
Dealing with Timeouts
Timeouts should be counted as the expectation of the region of
- of the Pareto distribution beyond the cutoff. The proposal will
- be updated with this value soon.
+ of the Pareto distribution beyond the cutoff. This is done by
+ generating a random sample for each timeout at points on the
+ curve beyond the current timeout cutoff.
- Also, in the event of network failure, the observation mechanism
- should stop collecting timeout data.
+ Future Work
- Client Hints
-
- Some research still needs to be done to provide initial values
- for CircuitBuildTimeout based on values learned from modem
- users, DSL users, Cable Modem users, and dedicated links. A
- radiobutton in Vidalia should eventually be provided that
- sets CircuitBuildTimeout to one of these values and also
- provide the option of purging all learned data, should any exist.
+ At some point, it may be desirable to change the cutoff from a
+ single hard cutoff that destroys the circuit to a soft cutoff and
+ a hard cutoff, where the soft cutoff merely triggers the building
+ of a new circuit, and the hard cutoff triggers destruction of the
+ circuit.
- These values can either be published in the directory, or
- shipped hardcoded for a particular Tor version.
+ It may also be beneficial to learn separate timeouts for each
+ guard node, as they will have slightly different distributions.
+ This will take longer to generate initial values though.
Issues