Update path-spec.txt with contents of proposal 151.

author: Mike Perry <mikeperry-git@fscked.org> 2010-05-04 16:37:43 -0700
committer: Mike Perry <mikeperry-git@fscked.org> 2010-05-10 12:59:05 -0700
commit: e84025bc2baf9248b8e888e2e5c7eefbb6354d14 (patch)
tree: fdb269fd465513e82c36ce222f43821e119c2e4f /doc
parent: 835ab531028911fbec6e9706a1ba33bcab37f102 (diff)
download: tor-e84025bc2baf9248b8e888e2e5c7eefbb6354d14.tar.gz
tor-e84025bc2baf9248b8e888e2e5c7eefbb6354d14.zip
2 files changed, 131 insertions, 1 deletions
diff --git a/doc/spec/path-spec.txt b/doc/spec/path-spec.txt
index 8a85718a08..ae5ab3e3dd 100644
--- a/doc/spec/path-spec.txt
+++ b/doc/spec/path-spec.txt
@@ -175,6 +175,7 @@ of their choices.
 
    XXXX
 
+
 2.2. Path selection and constraints
 
    We choose the path for each new circuit before we build it.  We choose the
@@ -295,8 +296,136 @@ of their choices.
    at a given node -- either via the ".exit" notation or because the
    destination is running at the same location as an exit node.
 
+2.4. Learning when to give up ("timeout") on circuit construction
+
+   Since version 0.2.2.8-alpha, Tor attempts to learn when to give up on
+   circuits based on network conditions.
+
+2.4.1 Distribution choice and parameter estimation
+
+   Based on studies of build times, we found that the distribution of
+   circuit build times appears to be a Frechet distribution. However,
+   estimators and quantile functions of the Frechet distribution are
+   difficult to work with and slow to converge. So instead, since we
+   are only interested in the accuracy of the tail, we approximate
+   the tail of the distribution with a Pareto curve.
+
+   We calculate the parameters for a Pareto distribution fitting the data
+   using the estimators at
+   http://en.wikipedia.org/wiki/Pareto_distribution#Parameter_estimation.
+
+   Because this is not a true Pareto distribution, we alter how Xm is
+   computed. The Xm parameter is computed as the midpoint of the most
+   frequently occurring 50ms histogram bin, until the point where 1000
+   circuits are recorded. After this point, the weighted average of the top
+   3 midpoint modes is used as Xm. All times below this value are counted
+   as having the midpoint value of this weighted average bin.
+
+   The timeout itself is calculated by using the Pareto Quantile function (the
+   inverted CDF) to give us the value on the CDF such that 80% of the mass
+   of the distribution is below the timeout value.
+
+   Thus, we expect that the Tor client will accept the fastest 80% of
+   the total number of paths on the network.
+
+2.4.2. How much data to record
+
+   From our observations, the minimum number of circuit build times for a
+   reasonable fit appears to be on the order of 100. However, to keep a
+   good fit over the long term, we store 1000 most recent circuit build times
+   in a circular array.
+
+   The Tor client should build test circuits at a rate of one per
+   minute up until 100 circuits are built. This allows a fresh Tor to have
+   a CircuitBuildTimeout estimated within 1.5 hours after install,
+   upgrade, or network change (see below).
+
+   Timeouts are stored on disk in a histogram of 50ms bin width, the same
+   width used to calculate the Xm value above. This histogram must be shuffled
+   after being read from disk, to preserve a proper expiration of old values
+   after restart.
+
+2.4.3. How to record timeouts
+
+   Timeouts should be counted as the expectation of the region of
+   of the Pareto distribution beyond the cutoff. This is done by
+   generating a random sample for each timeout at points on the
+   curve beyond the current timeout cutoff up to the 90% quantile marker.
+
+2.4.4. Detecting Changing Network Conditions
+
+   We attempt to detect both network connectivity loss and drastic
+   changes in the timeout characteristics.
+
+   We assume that we've had network connectivity loss if 3 circuits
+   timeout and we've received no cells or TLS handshakes since those
+   circuits began. We then temporarily set the timeout to 60 seconds
+   and stop counting timeouts.
+
+   If 3 more circuits timeout and the network still has not been
+   live within this new 60 second timeout window, we then discard
+   the previous timeouts during this period from our history.
+
+   To detect changing network conditions, we keep a history of
+   the timeout or non-timeout status of the past 20 circuits that
+   successfully completed at least one hop. If more than 90% of
+   these circuits timeout, we discard all buildtimes history, reset
+   the timeout to 60, and then begin recomputing the timeout.
+
+   If the timeout was already 60 or higher, we double the timeout.
+
+2.4.5. Consensus parameters governing behavior
+
+   Clients that implement circuit build timeout learning should obey the
+   following consensus parameters that govern behavior, in order to allow
+   us to handle bugs or other emergent behaviors due to client circuit
+   construction. If these parameters are not present in the consensus,
+   the listed default values should be used instead.
+
+      cbtrecentcount
+        Default: 20
+        Effect: This is the number of circuit build times to keep track of
+                for the following option.
+
+      cbtmaxtimeouts
+        Default: 18
+        Effect: When this many timeouts happen in the last 'cbtrecentcount'
+                circuit attempts, the client should discard all of its
+                history and begin learning a fresh timeout value.
+
+      cbtmincircs
+        Default: 100
+        Effect: This is the minimum number of circuits to build before
+                computing a timeout.
+
+      cbtquantile
+        Default: 80
+        Effect: This is the position on the quantile curve to use to set the
+                timeout value. It is a percent (0-99).
+
+      cbtmaxsynthquantile
+        Default: 90
+        Effect: This is the maximum position on the quantile curve to use to
+                generate synthetic circuit build times for timeouts. It is a
+                percent (0-99).
+
+      cbttestfreq
+        Default: 60
+        Effect: Describes how often in seconds to build a test circuit to
+                gather timeout values. Only applies if less than 'cbtmincircs'
+                have been recorded.
+
+      cbtmintimeout
+        Default: 2000
+        Effect: This is the minimum allowed timeout value in milliseconds.
+
+      cbtinitialtimeout
+        Default: 60000
+        Effect: This is the timeout value to use before computing a timeout,
+                in milliseconds.
+
 
-2.4. Handling failure
+2.5. Handling failure
 
    If an attempt to extend a circuit fails (either because the first create
    failed or a subsequent extend failed) then the circuit is torn down and is
diff --git a/doc/spec/proposals/151-path-selection-improvements.txt b/doc/spec/proposals/151-path-selection-improvements.txt
index 363eebae84..af89f21193 100644
--- a/doc/spec/proposals/151-path-selection-improvements.txt
+++ b/doc/spec/proposals/151-path-selection-improvements.txt
@@ -3,6 +3,7 @@ Title: Improving Tor Path Selection
 Author: Fallon Chen, Mike Perry
 Created: 5-Jul-2008
 Status: Finished
+In-Spec: path-spec.txt
 
 Overview
author	Mike Perry <mikeperry-git@fscked.org>	2010-05-04 16:37:43 -0700
committer	Mike Perry <mikeperry-git@fscked.org>	2010-05-10 12:59:05 -0700
commit	e84025bc2baf9248b8e888e2e5c7eefbb6354d14 (patch)
tree	fdb269fd465513e82c36ce222f43821e119c2e4f /doc
parent	835ab531028911fbec6e9706a1ba33bcab37f102 (diff)
download	tor-e84025bc2baf9248b8e888e2e5c7eefbb6354d14.tar.gz tor-e84025bc2baf9248b8e888e2e5c7eefbb6354d14.zip