aboutsummaryrefslogtreecommitdiff
path: root/doc/spec/proposals/161-computing-bandwidth-adjustments.txt
diff options
context:
space:
mode:
Diffstat (limited to 'doc/spec/proposals/161-computing-bandwidth-adjustments.txt')
-rw-r--r--doc/spec/proposals/161-computing-bandwidth-adjustments.txt188
1 files changed, 188 insertions, 0 deletions
diff --git a/doc/spec/proposals/161-computing-bandwidth-adjustments.txt b/doc/spec/proposals/161-computing-bandwidth-adjustments.txt
new file mode 100644
index 0000000000..b02dc64eb2
--- /dev/null
+++ b/doc/spec/proposals/161-computing-bandwidth-adjustments.txt
@@ -0,0 +1,188 @@
+Title: Computing Bandwidth Adjustments
+Filename: 161-computing-bandwidth-adjustments.txt
+Author: Mike Perry
+Created: 12-May-2009
+Target: 0.2.2.x
+Status: Open
+
+
+1. Motivation
+
+ There is high variance in the performance of the Tor network. Despite
+ our efforts to balance load evenly across the Tor nodes, some nodes are
+ significantly slower and more overloaded than others.
+
+ Proposal 160 describes how we can augment the directory authorities to
+ vote on measured bandwidths for routers. This proposal describes what
+ goes into the measuring process.
+
+
+2. Measurement Selection
+
+ The general idea is to determine a load factor representing the ratio
+ of the capacity of measured nodes to the rest of the network. This load
+ factor could be computed from three potentially relevant statistics:
+ circuit failure rates, circuit extend times, or stream capacity.
+
+ Circuit failure rates and circuit extend times appear to be
+ non-linearly proportional to node load. We've observed that the same
+ nodes when scanned at US nighttime hours (when load is presumably
+ lower) exhibit almost no circuit failure, and significantly faster
+ extend times than when scanned during the day.
+
+ Stream capacity, however, is much more uniform, even during US
+ nighttime hours. Moreover, it is a more intuitive representation of
+ node capacity, and also less dependent upon distance and latency
+ if amortized over large stream fetches.
+
+
+3. Average Stream Bandwidth Calculation
+
+ The average stream bandwidths are obtained by dividing the network into
+ slices of 50 nodes each, grouped according to advertised node bandwidth.
+
+ Two hop circuits are built using nodes from the same slice, and a large
+ file is downloaded via these circuits. For nodes in the first 15% of the
+ network, a 500K file will be used. For nodes in the next 15%, a 250K file
+ will be used. For nodes in next 15%, a 100K file will be used. The
+ remainder of the nodes will fetch a 75K file.[1]
+
+ This process is repeated 250 times, and average stream capacities are
+ assigned to each node from these results.
+
+ In the future, a node generator type can be created to ensure that
+ each node is chosen to participate in an equal number of circuits,
+ and the selection will continue until every live node is chosen
+ to participate in at least 7 circuits.
+
+
+4. Ratio Calculation Options
+
+ There are two options for deriving the ratios themselves. They can
+ be obtained by dividing each nodes' average stream capacity by
+ either the average for the slice, or the average for the network as a
+ whole.
+
+ Dividing by the network-wide average has the advantage that it will
+ account for issues related to unbalancing between higher vs lower
+ capacity, such as Steven Murdoch's queuing theory weighting result.
+ For this reason, we will opt for network-wide averages.
+
+
+5. Ratio Filtering
+
+ After the base ratios are calculated, a second pass is performed
+ to remove any streams with nodes of ratios less than X=0.5 from
+ the results of other nodes. In addition, all outlying streams
+ with capacity of one standard deviation below a node's average
+ are also removed.
+
+ The final ratio result will be calculated as the maximum of
+ these two resulting ratios if both are less than 1.0, the minimum
+ if both are greater than 1.0, and the mean if one is greater
+ and one is less than 1.0.
+
+
+6. Pseudocode for Ratio Calculation Algorithm
+
+ Here is the complete pseudocode for the ratio algorithm:
+
+ Slices = {S | S is 50 nodes of similar consensus capacity}
+ for S in Slices:
+ while exists node N in S with circ_chosen(N) < 7:
+ fetch_slice_file(build_2hop_circuit(N, (exit in S)))
+ for N in S:
+ BW_measured(N) = MEAN(b | b is bandwidth of a stream through N)
+ Bw_stddev(N) = STDDEV(b | b is bandwidth of a stream through N)
+ Bw_avg(S) = MEAN(b | b = BW_measured(N) for all N in S)
+ Normal_Routers(S) = {N | Bw_measured(N)/Bw_avg(S) > 0.5 }
+ for N in S:
+ Normal_Streams(N) =
+ {stream via N | all nodes in stream not in {Normal_Routers(S)-N}
+ and bandwidth > BW_measured(N)-Bw_stddev(N)}
+ BW_Norm_measured(N) = MEAN(b | b is a bandwidth of Normal_Streams(N))
+
+ Bw_net_avg(Slices) = MEAN(BW_measured(N) for all N in Slices)
+ Bw_Norm_net_avg(Slices) = MEAN(BW_Norm_measured(N) for all N in Slices)
+
+ for N in all Slices:
+ Bw_net_ratio(N) = Bw_measured(N)/Bw_net_avg(Slices)
+ Bw_Norm_net_ratio(N) = Bw_measured2(N)/Bw_Norm_net_avg(Slices)
+
+ if Bw_net_ratio(N) < 1.0 and Bw_Norm_net_ratio(N) < 1.0:
+ ResultRatio(N) = MAX(Bw_net_ratio(N), Bw_Norm_net_ratio(N))
+ else if Bw_net_ratio(N) > 1.0 and Bw_Norm_net_ratio(N) > 1.0:
+ ResultRatio(N) = MIN(Bw_net_ratio(N), Bw_Norm_net_ratio(N))
+ else:
+ ResultRatio(N) = MEAN(Bw_net_ratio(N), Bw_Norm_net_ratio(N))
+
+
+7. Security implications
+
+ The ratio filtering will deal with cases of sabotage by dropping
+ both very slow outliers in stream average calculations, as well
+ as dropping streams that used very slow nodes from the calculation
+ of other nodes.
+
+ This scheme will not address nodes that try to game the system by
+ providing better service to scanners. The scanners can be detected
+ at the entry by IP address, and at the exit by the destination fetch.
+
+ Measures can be taken to obfuscate and separate the scanners' source
+ IP address from the directory authority IP address. For instance,
+ scans can happen offsite and the results can be rsynced into the
+ authorities. The destination fetch can also be obscured by using SSL
+ and periodically changing the large document that is fetched.
+
+ Neither of these methods are foolproof, but such nodes can already
+ lie about their bandwidth to attract more traffic, so this solution
+ does not set us back any in that regard.
+
+
+8. Parallelization
+
+ Because each slice takes as long as 6 hours to complete, we will want
+ to parallelize as much as possible. This will be done by concurrently
+ running multiple scanners from each authority to deal with different
+ segments of the network. Each scanner piece will continually loop
+ over a portion of the network, outputting files of the form:
+
+ node_id=<idhex> SP strm_bw=<BW_measured(N)> SP
+ filt_bw=<BW_Norm_measured(N)> NL
+
+ The most recent file from each scanner will be periodically gathered
+ by another script that uses them to produce network-wide averages
+ and calculate ratios as per the algorithm in section 6. Because nodes
+ may shift in capacity, they may appear in more than one slice and/or
+ appear more than once in the file set. The line that yields a ratio
+ closest to 1.0 will be chosen in this case.
+
+
+9. Integration with Proposal 160
+
+ The final results will be produced for the voting mechanism
+ described in Proposal 160 by multiplying the derived ratio by
+ the average published consensus bandwidth during the course of the
+ scan, and taking the weighted average with the previous consensus
+ bandwidth:
+
+ Bw_new = (Bw_current * Alpha + Bw_scan_avg*Bw_ratio)/(Alpha + 1)
+
+ The Alpha parameter is a smoothing parameter intended to prevent
+ rapid oscillation between loaded and unloaded conditions.
+
+ This will produce a new bandwidth value that will be output into a
+ file consisting of lines of the form:
+
+ node_id=<idhex> SP bw=<Bw_new> NL
+
+ The first line of the file will contain a timestamp in UNIX time()
+ seconds. This will be used by the authority to decide if the
+ measured values are too old to use.
+
+ This file can be either copied or rsynced into a directory readable
+ by the directory authority.
+
+
+1. Exact values for each segment are still being determined via
+test scans.