From 95969867fcff48cf63ea9a247e9ac09810e160d2 Mon Sep 17 00:00:00 2001
From: Mike Perry <mikeperry-git@fscked.org>
Date: Tue, 12 Aug 2008 18:23:38 +0000
Subject: Updated to remove dropping of failing guards and just focus on the
 specifics of recording, storing, and learning circuitbuildtimeout parameters.

svn:r16511
---
 proposals/151-path-selection-improvements.txt | 138 +++++++++++++++-----------
 1 file changed, 79 insertions(+), 59 deletions(-)

(limited to 'proposals/151-path-selection-improvements.txt')
diff --git a/proposals/151-path-selection-improvements.txt b/proposals/151-path-selection-improvements.txt
index 3362efb..fd24a9c 100644
--- a/proposals/151-path-selection-improvements.txt
+++ b/proposals/151-path-selection-improvements.txt
@@ -10,8 +10,8 @@ Overview
 
   The performance of paths selected can be improved by adjusting the
   CircuitBuildTimeout and avoiding failing guard nodes. This proposal
-  describes a method of tracking buildtime statistics, and using those
-  statistics to adjust the CircuitBuildTimeout and the number of guards.
+  describes a method of tracking buildtime statistics at the client, and 
+  using those statistics to adjust the CircuitBuildTimeout.
 
 Motivation
 
@@ -22,71 +22,91 @@ Motivation
 
 Implementation
 
+  Storing Build Times
+
+    Circuit build times will be stored in the circular array
+    'circuit_build_times' consisting of uint16_t elements as milliseconds.
+    The total size of this array will be based on the number of circuits
+    it takes to converge on a good fit of the long term distribution of
+    the circuit builds for a fixed link. We do not want this value to be
+    too large, because it will make it difficult for clients to adapt to
+    moving between different links.
+
+    From our initial observations, this value appears to be on the order 
+    of 1000, but will be configurable in a #define NCIRCUITS_TO_OBSERVE.
+ 
+  Long Term Storage
+
+    The long-term storage representation will be implemented by storing a 
+    histogram with BUILDTIME_BIN_WIDTH millisecond buckets (default 50) when 
+    writing out the statistics to disk. The format of this histogram on disk 
+    is yet to be finalized, but it will likely be of the format 
+    'CircuitBuildTime <bin> <count>'.
+    Example:
+
+    CircuitBuildTimeBin 1 100
+    CircuitBuildTimeBin 2 50
+    ...
+
+    Reading the histogram in will entail multiplying each bin by the 
+    BUILDTIME_BIN_WIDTH and then inserting <count> values into the 
+    circuit_build_times array each with the value of
+    <bin>*BUILDTIME_BIN_WIDTH.
+
   Learning the CircuitBuildTimeout
 
     Based on studies of build times, we found that the distribution of
-    circuit buildtimes appears to be a Pareto distribution. The number
-    of circuits to observe (ncircuits_to_cutoff) before changing the
-    CircuitBuildTimeout will be tunable. From out measurements, 
-    ncircuits_to_cuttoff appears to be on the order of 100.
- 
-	In addition, the total number of circuits gathered
-    (ncircuits_to_observe) will also be tunable. It is likely that
-    ncircuits_to_observe will be somewhere on the order of 1000. The values
-    can be represented compactly in Tor in milliseconds as a circular array
-    of 16 bit integers. More compact long-term storage representations can
-    be implemented by simply storing a histogram with 50 millisecond buckets
-    when writing out the statistics to disk.
-
-  Calculating the preferred CircuitBuildTimeout
-
-    Circuits that have longer buildtimes than some x% of the estimated
-    CDF of the Pareto distribution will be excluded. x will be tunable
-    as well.
-
-  Circuit timeouts
-
-    In the event of a timeout, backoff values should include the 100-x%
-    of expected CDF of timeouts.  Also, in the event of network failure,
-    the observation mechanism should stop collecting timeout data.
-
-  Dropping Failed Guards
-
-    In addition, we have noticed that some entry guards are much more
-    failure prone than others. In particular, the circuit failure rates for
-    the fastest entry guards was approximately 20-25%, where as slower
-    guards exhibit failure rates as high as 45-50%. In [1], it was
-    demonstrated that failing guard nodes can deliberately bias path
-    selection to improve their success at capturing traffic. For both these
-    reasons, failing guards should be avoided. 
-    
-    We propose increasing the number of entry guards to five, and gathering
-    circuit failure statistics on each entry guard. Any guards that exceed
-    the average failure rate of all guards by 10% after we have
-    gathered ncircuits_to_observe circuits will be replaced.
-    
+    circuit buildtimes appears to be a Pareto distribution. 
 
-Issues
+    We will calculate the parameters for a Pareto distribution
+    fitting the data using the estimators at
+    http://en.wikipedia.org/wiki/Pareto_distribution#Parameter_estimation.
 
-  Impact on anonymity
+    The timeout itself will be calculated by solving the CDF for the 
+    a percentile cutoff BUILDTIME_PERCENT_CUTOFF. This value
+    represents the percentage of paths the Tor client will accept out of
+    the total number of paths. We have not yet determined a good
+    cutoff for this mathematically, but 85% seems a good choice for now.
 
-    Since this follows a Pareto distribution, large reductions on the
-    timeout can be achieved without cutting off a great number of the
-    total paths.  However, hard statistics on which cutoff percentage
-    gives optimal performance have not yet been gathered.
+    From http://en.wikipedia.org/wiki/Pareto_distribution#Definition,
+    the calculation we need is pow(BUILDTIME_PERCENT_CUTOFF/100.0, k)/Xm. 
+
+  When to Begin Calculation
+
+    The number of circuits to observe (NCIRCUITS_TO_CUTOFF) before 
+    changing the CircuitBuildTimeout will be tunable via a #define. From 
+    our measurements, a good value for NCIRCUITS_TO_CUTOFF appears to be 
+    on the order of 100.
+
+  Dealing with Timeouts
+
+    Timeouts should be counted as the expectation of the region of 
+    of the Pareto distribution beyond the cutoff. The proposal will
+    be updated with this value soon.
+
+    Also, in the event of network failure, the observation mechanism 
+    should stop collecting timeout data.
 
-  Guard Turnover
+    Circuits that timeout will be destroyed, as this indicates one
+    or more of their respective nodes are currently overloaded.
 
-    We contend that the risk from failing guards biasing path selection
-    outweighs the risk of exposure to larger portions of the network
-    for the first hop. Furthermore, from our observations, it appears
-    that circuit failure is strongly correlated to node load. Allowing
-    clients to migrate away from failing guards should naturally
-    rebalance the network, and eventually clients should converge on
-    a stable set of reliable guards. It is also likely that once clients
-    begin to migrate away from failing guards, their load should go
-    down, causing their failure rates to drop as well.
+  Client Hints
 
+    Some research still needs to be done to provide initial values
+    for CircuitBuildTimeout based on values learned from modem
+    users, DSL users, Cable Modem users, and dedicated links. A 
+    radiobutton in Vidalia should eventually be provided that
+    sets CircuitBuildTimeout to one of these values and also 
+    provide the option of purging all learned data, should any exist.
 
-[1] http://www.crhc.uiuc.edu/~nikita/papers/relmix-ccs07.pdf
+    These values can either be published in the directory, or
+    shipped hardcoded for a particular Tor version.
+    
+Issues
+
+  Impact on anonymity
 
+    Since this follows a Pareto distribution, large reductions on the
+    timeout can be achieved without cutting off a great number of the
+    total paths. This will eliminate a great deal of the performance
+    variation of Tor usage.
-- 
cgit v1.2.3-54-g00ecf