1 files changed, 174 insertions, 0 deletions
diff --git a/doc/spec/proposals/161-computing-bandwidth-adjustments.txt b/doc/spec/proposals/161-computing-bandwidth-adjustments.txt
new file mode 100644
index 0000000000..d219826668
--- /dev/null
+++ b/doc/spec/proposals/161-computing-bandwidth-adjustments.txt
@@ -0,0 +1,174 @@
+Title: Computing Bandwidth Adjustments
+Filename: 161-computing-bandwidth-adjustments.txt
+Author: Mike Perry
+Created: 12-May-2009
+Target: 0.2.2.x
+Status: Finished
+
+
+1. Motivation
+
+  There is high variance in the performance of the Tor network. Despite
+  our efforts to balance load evenly across the Tor nodes, some nodes are
+  significantly slower and more overloaded than others.
+
+  Proposal 160 describes how we can augment the directory authorities to
+  vote on measured bandwidths for routers. This proposal describes what
+  goes into the measuring process.
+
+
+2. Measurement Selection
+
+  The general idea is to determine a load factor representing the ratio
+  of the capacity of measured nodes to the rest of the network. This load
+  factor could be computed from three potentially relevant statistics:
+  circuit failure rates, circuit extend times, or stream capacity.
+
+  Circuit failure rates and circuit extend times appear to be
+  non-linearly proportional to node load. We've observed that the same
+  nodes when scanned at US nighttime hours (when load is presumably
+  lower) exhibit almost no circuit failure, and significantly faster
+  extend times than when scanned during the day.
+
+  Stream capacity, however, is much more uniform, even during US
+  nighttime hours. Moreover, it is a more intuitive representation of
+  node capacity, and also less dependent upon distance and latency
+  if amortized over large stream fetches.
+
+
+3. Average Stream Bandwidth Calculation
+
+  The average stream bandwidths are obtained by dividing the network into
+  slices of 50 nodes each, grouped according to advertised node bandwidth.
+
+  Two hop circuits are built using nodes from the same slice, and a large
+  file is downloaded via these circuits. The file sizes are set based
+  on node percentile rank as follows:
+    
+     0-10: 2M
+     10-20: 1M
+     20-30: 512k
+     30-50: 256k
+     50-100: 128k
+
+  These sizes are based on measurements performed during test scans.
+
+  This process is repeated until each node has been chosen to participate
+  in at least 5 circuits.
+
+
+4. Ratio Calculation
+
+  The ratios are calculated by dividing each measured value by the 
+  network-wide average.
+
+
+5. Ratio Filtering
+
+  After the base ratios are calculated, a second pass is performed
+  to remove any streams with nodes of ratios less than X=0.5 from
+  the results of other nodes. In addition, all outlying streams
+  with capacity of one standard deviation below a node's average
+  are also removed.
+
+  The final ratio result will be greater of the unfiltered ratio
+  and the filtered ratio.
+
+
+6. Pseudocode for Ratio Calculation Algorithm
+
+  Here is the complete pseudocode for the ratio algorithm:
+
+    Slices = {S | S is 50 nodes of similar consensus capacity}
+    for S in Slices:
+      while exists node N in S with circ_chosen(N) < 7:
+        fetch_slice_file(build_2hop_circuit(N, (exit in S)))
+      for N in S:
+        BW_measured(N) = MEAN(b | b is bandwidth of a stream through N)
+        Bw_stddev(N) = STDDEV(b | b is bandwidth of a stream through N)
+      Bw_avg(S) = MEAN(b | b = BW_measured(N) for all N in S)  
+      for N in S:
+        Normal_Streams(N) = {stream via N | bandwidth >= BW_measured(N)} 
+        BW_Norm_measured(N) =  MEAN(b | b is a bandwidth of Normal_Streams(N))
+
+    Bw_net_avg(Slices) = MEAN(BW_measured(N) for all N in Slices)
+    Bw_Norm_net_avg(Slices) = MEAN(BW_Norm_measured(N) for all N in Slices)
+
+    for N in all Slices:
+      Bw_net_ratio(N) = Bw_measured(N)/Bw_net_avg(Slices)
+      Bw_Norm_net_ratio(N) = BW_Norm_measured(N)/Bw_Norm_net_avg(Slices)
+
+      ResultRatio(N) = MAX(Bw_net_ratio(N), Bw_Norm_net_ratio(N))
+
+
+7. Security implications
+
+  The ratio filtering will deal with cases of sabotage by dropping
+  both very slow outliers in stream average calculations, as well
+  as dropping streams that used very slow nodes from the calculation
+  of other nodes.
+
+  This scheme will not address nodes that try to game the system by
+  providing better service to scanners. The scanners can be detected
+  at the entry by IP address, and at the exit by the destination fetch
+  IP.
+
+  Measures can be taken to obfuscate and separate the scanners' source
+  IP address from the directory authority IP address. For instance,
+  scans can happen offsite and the results can be rsynced into the
+  authorities. The destination server IP can also change.
+ 
+  Neither of these methods are foolproof, but such nodes can already
+  lie about their bandwidth to attract more traffic, so this solution
+  does not set us back any in that regard.
+
+
+8. Parallelization
+
+  Because each slice takes as long as 6 hours to complete, we will want
+  to parallelize as much as possible. This will be done by concurrently
+  running multiple scanners from each authority to deal with different
+  segments of the network. Each scanner piece will continually loop 
+  over a portion of the network, outputting files of the form:
+
+   node_id=<idhex> SP strm_bw=<BW_measured(N)> SP 
+         filt_bw=<BW_Norm_measured(N)> ns_bw=<CurrentConsensusBw(N)> NL
+
+  The most recent file from each scanner will be periodically gathered 
+  by another script that uses them to produce network-wide averages 
+  and calculate ratios as per the algorithm in section 6. Because nodes 
+  may shift in capacity, they may appear in more than one slice and/or 
+  appear more than once in the file set. The most recently measured
+  line will be chosen in this case.
+
+
+9. Integration with Proposal 160
+
+  The final results will be produced for the voting mechanism
+  described in Proposal 160 by multiplying the derived ratio by
+  the average published consensus bandwidth during the course of the
+  scan, and taking the weighted average with the previous consensus
+  bandwidth:
+
+     Bw_new = Round((Bw_current * Alpha + Bw_scan_avg*Bw_ratio)/(Alpha + 1))
+
+  The Alpha parameter is a smoothing parameter intended to prevent
+  rapid oscillation between loaded and unloaded conditions. It is
+  currently fixed at 0.333.
+
+  The Round() step consists of rounding to the 3 most significant figures
+  in base10, and then rounding that result to the nearest 1000, with 
+  a minimum value of 1000.
+
+  This will produce a new bandwidth value that will be output into a 
+  file consisting of lines of the form:
+
+     node_id=<idhex> SP bw=<Bw_new> NL
+ 
+  The first line of the file will contain a timestamp in UNIX time()
+  seconds. This will be used by the authority to decide if the 
+  measured values are too old to use.
+ 
+  This file can be either copied or rsynced into a directory readable
+  by the directory authority.
+