aboutsummaryrefslogtreecommitdiff
path: root/doc/spec/proposals/108-mtbf-based-stability.txt
diff options
context:
space:
mode:
Diffstat (limited to 'doc/spec/proposals/108-mtbf-based-stability.txt')
-rw-r--r--doc/spec/proposals/108-mtbf-based-stability.txt88
1 files changed, 0 insertions, 88 deletions
diff --git a/doc/spec/proposals/108-mtbf-based-stability.txt b/doc/spec/proposals/108-mtbf-based-stability.txt
deleted file mode 100644
index 294103760b..0000000000
--- a/doc/spec/proposals/108-mtbf-based-stability.txt
+++ /dev/null
@@ -1,88 +0,0 @@
-Filename: 108-mtbf-based-stability.txt
-Title: Base "Stable" Flag on Mean Time Between Failures
-Author: Nick Mathewson
-Created: 10-Mar-2007
-Status: Closed
-Implemented-In: 0.2.0.x
-
-Overview:
-
- This document proposes that we change how directory authorities set the
- stability flag from inspection of a router's declared Uptime to the
- authorities' perceived mean time between failure for the router.
-
-Motivation:
-
- Clients prefer nodes that the authorities call Stable. This flag is (as
- of 0.2.0.0-alpha-dev) set entirely based on the node's declared value for
- uptime. This creates an opportunity for malicious nodes to declare
- falsely high uptimes in order to get more traffic.
-
-Spec changes:
-
- Replace the current rule for setting the Stable flag with:
-
- "Stable" -- A router is 'Stable' if it is active and its observed Stability
- for the past month is at or above the median Stability for active routers.
- Routers are never called stable if they are running a version of Tor
- known to drop circuits stupidly. (0.1.1.10-alpha through 0.1.1.16-rc
- are stupid this way.)
-
- Stability shall be defined as the weighted mean length of the runs
- observed by a given directory authority. A run begins when an authority
- decides that the server is Running, and ends when the authority decides
- that the server is not Running. In-progress runs are counted when
- measuring Stability. When calculating the mean, runs are weighted by
- $\alpha ^ t$, where $t$ is time elapsed since the end of the run, and
- $0 < \alpha < 1$. Time when an authority is down do not count to the
- length of the run.
-
-Rejected Alternative:
-
- "A router's Stability shall be defined as the sum of $\alpha ^ d$ for every
- $d$ such that the router was considered reachable for the entire day
- $d$ days ago.
-
- This allows a simpler implementation: every day, we multiply
- yesterday's Stability by alpha, and if the router was observed to be
- available every time we looked today, we add 1.
-
- Instead of "day", we could pick an arbitrary time unit. We should
- pick alpha to be high enough that long-term stability counts, but low
- enough that the distant past is eventually forgotten. Something
- between .8 and .95 seems right.
-
- (By requiring that routers be up for an entire day to get their
- stability increased, instead of counting fractions of a day, we
- capture the notion that stability is more like "probability of
- staying up for the next hour" than it is like "probability of being
- up at some randomly chosen time over the next hour." The former
- notion of stability is far more relevant for long-lived circuits.)
-
-Limitations:
-
- Authorities can have false positives and false negatives when trying to
- tell whether a router is up or down. So long as these aren't terribly
- wrong, and so long as they aren't significantly biased, we should be able
- to use them to estimate stability pretty well.
-
- Probing approaches like the above could miss short incidents of
- downtime. If we use the router's declared uptime, we could detect
- these: but doing so would penalize routers who reported their uptime
- accurately.
-
-Implementation:
-
- For now, the easiest way to store this information at authorities
- would probably be in some kind of periodically flushed flat file.
- Later, we could move to Berkeley db or something if we really had to.
-
- For each router, an authority will need to store:
- The router ID.
- Whether the router is up.
- The time when the current run started, if the router is up.
- The weighted sum length of all previous runs.
- The time at which the weighted sum length was last weighted down.
-
- Servers should probe at random intervals to test whether servers are
- running.