diff options
Diffstat (limited to 'doc/spec/proposals/161-computing-bandwidth-adjustments.txt')
-rw-r--r-- | doc/spec/proposals/161-computing-bandwidth-adjustments.txt | 188 |
1 files changed, 188 insertions, 0 deletions
diff --git a/doc/spec/proposals/161-computing-bandwidth-adjustments.txt b/doc/spec/proposals/161-computing-bandwidth-adjustments.txt new file mode 100644 index 0000000000..b02dc64eb2 --- /dev/null +++ b/doc/spec/proposals/161-computing-bandwidth-adjustments.txt @@ -0,0 +1,188 @@ +Title: Computing Bandwidth Adjustments +Filename: 161-computing-bandwidth-adjustments.txt +Author: Mike Perry +Created: 12-May-2009 +Target: 0.2.2.x +Status: Open + + +1. Motivation + + There is high variance in the performance of the Tor network. Despite + our efforts to balance load evenly across the Tor nodes, some nodes are + significantly slower and more overloaded than others. + + Proposal 160 describes how we can augment the directory authorities to + vote on measured bandwidths for routers. This proposal describes what + goes into the measuring process. + + +2. Measurement Selection + + The general idea is to determine a load factor representing the ratio + of the capacity of measured nodes to the rest of the network. This load + factor could be computed from three potentially relevant statistics: + circuit failure rates, circuit extend times, or stream capacity. + + Circuit failure rates and circuit extend times appear to be + non-linearly proportional to node load. We've observed that the same + nodes when scanned at US nighttime hours (when load is presumably + lower) exhibit almost no circuit failure, and significantly faster + extend times than when scanned during the day. + + Stream capacity, however, is much more uniform, even during US + nighttime hours. Moreover, it is a more intuitive representation of + node capacity, and also less dependent upon distance and latency + if amortized over large stream fetches. + + +3. Average Stream Bandwidth Calculation + + The average stream bandwidths are obtained by dividing the network into + slices of 50 nodes each, grouped according to advertised node bandwidth. + + Two hop circuits are built using nodes from the same slice, and a large + file is downloaded via these circuits. For nodes in the first 15% of the + network, a 500K file will be used. For nodes in the next 15%, a 250K file + will be used. For nodes in next 15%, a 100K file will be used. The + remainder of the nodes will fetch a 75K file.[1] + + This process is repeated 250 times, and average stream capacities are + assigned to each node from these results. + + In the future, a node generator type can be created to ensure that + each node is chosen to participate in an equal number of circuits, + and the selection will continue until every live node is chosen + to participate in at least 7 circuits. + + +4. Ratio Calculation Options + + There are two options for deriving the ratios themselves. They can + be obtained by dividing each nodes' average stream capacity by + either the average for the slice, or the average for the network as a + whole. + + Dividing by the network-wide average has the advantage that it will + account for issues related to unbalancing between higher vs lower + capacity, such as Steven Murdoch's queuing theory weighting result. + For this reason, we will opt for network-wide averages. + + +5. Ratio Filtering + + After the base ratios are calculated, a second pass is performed + to remove any streams with nodes of ratios less than X=0.5 from + the results of other nodes. In addition, all outlying streams + with capacity of one standard deviation below a node's average + are also removed. + + The final ratio result will be calculated as the maximum of + these two resulting ratios if both are less than 1.0, the minimum + if both are greater than 1.0, and the mean if one is greater + and one is less than 1.0. + + +6. Pseudocode for Ratio Calculation Algorithm + + Here is the complete pseudocode for the ratio algorithm: + + Slices = {S | S is 50 nodes of similar consensus capacity} + for S in Slices: + while exists node N in S with circ_chosen(N) < 7: + fetch_slice_file(build_2hop_circuit(N, (exit in S))) + for N in S: + BW_measured(N) = MEAN(b | b is bandwidth of a stream through N) + Bw_stddev(N) = STDDEV(b | b is bandwidth of a stream through N) + Bw_avg(S) = MEAN(b | b = BW_measured(N) for all N in S) + Normal_Routers(S) = {N | Bw_measured(N)/Bw_avg(S) > 0.5 } + for N in S: + Normal_Streams(N) = + {stream via N | all nodes in stream not in {Normal_Routers(S)-N} + and bandwidth > BW_measured(N)-Bw_stddev(N)} + BW_Norm_measured(N) = MEAN(b | b is a bandwidth of Normal_Streams(N)) + + Bw_net_avg(Slices) = MEAN(BW_measured(N) for all N in Slices) + Bw_Norm_net_avg(Slices) = MEAN(BW_Norm_measured(N) for all N in Slices) + + for N in all Slices: + Bw_net_ratio(N) = Bw_measured(N)/Bw_net_avg(Slices) + Bw_Norm_net_ratio(N) = Bw_measured2(N)/Bw_Norm_net_avg(Slices) + + if Bw_net_ratio(N) < 1.0 and Bw_Norm_net_ratio(N) < 1.0: + ResultRatio(N) = MAX(Bw_net_ratio(N), Bw_Norm_net_ratio(N)) + else if Bw_net_ratio(N) > 1.0 and Bw_Norm_net_ratio(N) > 1.0: + ResultRatio(N) = MIN(Bw_net_ratio(N), Bw_Norm_net_ratio(N)) + else: + ResultRatio(N) = MEAN(Bw_net_ratio(N), Bw_Norm_net_ratio(N)) + + +7. Security implications + + The ratio filtering will deal with cases of sabotage by dropping + both very slow outliers in stream average calculations, as well + as dropping streams that used very slow nodes from the calculation + of other nodes. + + This scheme will not address nodes that try to game the system by + providing better service to scanners. The scanners can be detected + at the entry by IP address, and at the exit by the destination fetch. + + Measures can be taken to obfuscate and separate the scanners' source + IP address from the directory authority IP address. For instance, + scans can happen offsite and the results can be rsynced into the + authorities. The destination fetch can also be obscured by using SSL + and periodically changing the large document that is fetched. + + Neither of these methods are foolproof, but such nodes can already + lie about their bandwidth to attract more traffic, so this solution + does not set us back any in that regard. + + +8. Parallelization + + Because each slice takes as long as 6 hours to complete, we will want + to parallelize as much as possible. This will be done by concurrently + running multiple scanners from each authority to deal with different + segments of the network. Each scanner piece will continually loop + over a portion of the network, outputting files of the form: + + node_id=<idhex> SP strm_bw=<BW_measured(N)> SP + filt_bw=<BW_Norm_measured(N)> NL + + The most recent file from each scanner will be periodically gathered + by another script that uses them to produce network-wide averages + and calculate ratios as per the algorithm in section 6. Because nodes + may shift in capacity, they may appear in more than one slice and/or + appear more than once in the file set. The line that yields a ratio + closest to 1.0 will be chosen in this case. + + +9. Integration with Proposal 160 + + The final results will be produced for the voting mechanism + described in Proposal 160 by multiplying the derived ratio by + the average published consensus bandwidth during the course of the + scan, and taking the weighted average with the previous consensus + bandwidth: + + Bw_new = (Bw_current * Alpha + Bw_scan_avg*Bw_ratio)/(Alpha + 1) + + The Alpha parameter is a smoothing parameter intended to prevent + rapid oscillation between loaded and unloaded conditions. + + This will produce a new bandwidth value that will be output into a + file consisting of lines of the form: + + node_id=<idhex> SP bw=<Bw_new> NL + + The first line of the file will contain a timestamp in UNIX time() + seconds. This will be used by the authority to decide if the + measured values are too old to use. + + This file can be either copied or rsynced into a directory readable + by the directory authority. + + +1. Exact values for each segment are still being determined via +test scans. |