author    Nick Mathewson <nickm@torproject.org>  2023-10-12 12:27:58 -0400
committer Nick Mathewson <nickm@torproject.org>  2023-10-12 12:27:58 -0400
commit    e4e0d93d56ee8c1aec4c2efaa7046b651f0fe55c (patch)
tree      15a085da265ae3b2b70f29a70f910a5371059a78 /path-spec.txt
parent    b719a373934d3e79ef833446c6aeeb19be485510 (diff)
Move all text-only specifications into the OLD_TXT directory.
Diffstat (limited to 'path-spec.txt')
-rw-r--r--  path-spec.txt | 1051
1 file changed, 0 insertions, 1051 deletions
diff --git a/path-spec.txt b/path-spec.txt
deleted file mode 100644
index 33d50e5..0000000
--- a/path-spec.txt
+++ /dev/null
@@ -1,1051 +0,0 @@
-
- Tor Path Specification
-
- Roger Dingledine
- Nick Mathewson
-
-Note: This is an attempt to specify Tor as currently implemented. Future
-versions of Tor will implement improved algorithms.
-
-This document tries to cover how Tor chooses to build circuits and assign
-streams to circuits. Other implementations MAY take other approaches, but
-implementors should be aware of the anonymity and load-balancing implications
-of their choices.
-
- THIS SPEC ISN'T DONE YET.
-
- The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
- NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
- "OPTIONAL" in this document are to be interpreted as described in
- RFC 2119.
-
-Table of Contents
-
- 1. General operation
- 1.1. Terminology
- 1.2. A relay's bandwidth
- 2. Building circuits
- 2.1. When we build
- 2.1.0. We don't build circuits until we have enough directory info
- 2.1.1. Clients build circuits preemptively
- 2.1.2. Clients build circuits on demand
- 2.1.3. Relays build circuits for testing reachability and bandwidth
- 2.1.4. Hidden-service circuits
- 2.1.5. Rate limiting of failed circuits
- 2.1.6. When to tear down circuits
- 2.2. Path selection and constraints
- 2.2.1. Choosing an exit
- 2.2.2. User configuration
- 2.3. Cannibalizing circuits
- 2.4. Learning when to give up ("timeout") on circuit construction
- 2.4.1. Distribution choice
- 2.4.2. How much data to record
- 2.4.3. Parameter estimation
- 2.4.4. Calculating timeout thresholds for circuits of different lengths
- 2.4.5. How to record timeouts
- 2.4.6. Detecting Changing Network Conditions
- 2.4.7. Consensus parameters governing behavior
- 2.5. Handling failure
- 3. Attaching streams to circuits
- 4. Hidden-service related circuits
- 5. Guard nodes
- 5.1. How consensus bandwidth weights factor into entry guard selection
- 6. Server descriptor purposes
- 7. Detecting route manipulation by Guard nodes (Path Bias)
- 7.1. Measuring path construction success rates
- 7.2. Measuring path usage success rates
- 7.3. Scaling success counts
- 7.4. Parametrization
- 7.5. Known barriers to enforcement
- X. Old notes
- X.1. Do we actually do this?
- X.2. A thing we could do to deal with reachability.
- X.3. Some stuff that worries me about entry guards. 2006 Jun, Nickm.
-
-1. General operation
-
- Tor begins building circuits as soon as it has enough directory
- information to do so (see section 5 of dir-spec.txt). Some circuits are
- built preemptively because we expect to need them later (for user
- traffic), and some are built because of immediate need (for user traffic
- that no current circuit can handle, for testing the network or our
- reachability, and so on).
-
- [Newer versions of Tor (0.2.6.2-alpha and later):
- If the consensus contains Exits (the typical case), Tor will build both
- exit and internal circuits. When bootstrap completes, Tor will be ready
- to handle an application requesting an exit circuit to services like the
- World Wide Web.
-
- If the consensus does not contain Exits, Tor will only build internal
- circuits. In this case, earlier statuses will have included "internal"
- as indicated above. When bootstrap completes, Tor will be ready to handle
- an application requesting an internal circuit to hidden services at
- ".onion" addresses.
-
- If a future consensus contains Exits, exit circuits may become available.]
-
- When a client application creates a new stream (by opening a SOCKS
- connection or launching a resolve request), we attach it to an appropriate
- open circuit if one exists, or wait if an appropriate circuit is
- in-progress. We launch a new circuit only
- if no current circuit can handle the request. We rotate circuits over
- time to avoid some profiling attacks.
-
- To build a circuit, we choose all the nodes we want to use, and then
- construct the circuit. Sometimes, when we want a circuit that ends at a
- given hop, and we have an appropriate unused circuit, we "cannibalize" the
- existing circuit and extend it to the new terminus.
-
- These processes are described in more detail below.
-
- This document describes Tor's automatic path selection logic only; path
- selection can be overridden by a controller (with the EXTENDCIRCUIT and
- ATTACHSTREAM commands). Paths constructed through these means may
- violate some constraints given below.
-
-1.1. Terminology
-
- A "path" is an ordered sequence of nodes, not yet built as a circuit.
-
- A "clean" circuit is one that has not yet been used for any traffic.
-
- A "fast" or "stable" or "valid" node is one that has the 'Fast' or
- 'Stable' or 'Valid' flag
- set respectively, based on our current directory information. A "fast"
- or "stable" circuit is one consisting only of "fast" or "stable" nodes.
-
- In an "exit" circuit, the final node is chosen based on waiting stream
- requests if any, and in any case it avoids nodes with exit policy of
- "reject *:*". An "internal" circuit, on the other hand, is one where
- the final node is chosen just like a middle node (ignoring its exit
- policy).
-
- A "request" is a client-side stream or DNS resolve that needs to be
- served by a circuit.
-
- A "pending" circuit is one that we have started to build, but which has
- not yet completed.
-
- A circuit or path "supports" a request if it is okay to use the
- circuit/path to fulfill the request, according to the rules given below.
- A circuit or path "might support" a request if some aspect of the request
- is unknown (usually its target IP), but we believe the path probably
- supports the request according to the rules given below.
-
-1.2. A relay's bandwidth
-
- Old versions of Tor did not report bandwidths in network status
- documents, so clients had to learn them from the routers' advertised
- relay descriptors.
-
- For versions of Tor prior to 0.2.1.17-rc, everywhere below where we
- refer to a relay's "bandwidth", we mean its clipped advertised
- bandwidth, computed by taking the smaller of the 'rate' and
- 'observed' arguments to the "bandwidth" element in the relay's
- descriptor. If a router's advertised bandwidth is greater than
- MAX_BELIEVABLE_BANDWIDTH (currently 10 MB/s), we clip it to that
- value.
-
- For more recent versions of Tor, we take the bandwidth value declared
- in the consensus, and fall back to the clipped advertised bandwidth
- only if the consensus does not have bandwidths listed.
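-
- As a non-normative illustration, the rule above could be implemented as
- follows (Python; the function and argument names are ours, not Tor's):
-
-   MAX_BELIEVABLE_BANDWIDTH = 10 * 1000 * 1000   # 10 MB/s, in bytes per second
-
-   def relay_weighting_bandwidth(consensus_bw, rate, observed):
-       """Bandwidth value used when weighting this relay for path selection."""
-       if consensus_bw is not None:
-           # Recent Tor versions: prefer the value listed in the consensus.
-           return consensus_bw
-       # Otherwise fall back to the clipped advertised bandwidth.
-       return min(rate, observed, MAX_BELIEVABLE_BANDWIDTH)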
-
-2. Building circuits
-
-2.1. When we build
-
-2.1.0. We don't build circuits until we have enough directory info
-
- There's a class of possible attacks where our directory servers
- only give us information about the relays that they would like us
- to use. To prevent this attack, we don't build multi-hop
- circuits for real traffic (like those in 2.1.1, 2.1.2, 2.1.4
- below) until we have enough directory information to be
- reasonably confident this attack isn't being done to us.
-
- Here, "enough" directory information is defined as:
-
- * Having a consensus that's been valid at some point in the
- last REASONABLY_LIVE_TIME interval (24 hours).
-
- * Having enough descriptors that we could build at least some
- fraction F of all bandwidth-weighted paths, without taking
- ExitNodes/EntryNodes/etc into account.
-
- (F is set by the PathsNeededToBuildCircuits option,
- defaulting to the 'min_paths_for_circs_pct' consensus
- parameter, with a final default value of 60%.)
-
- * Having enough descriptors that we could build at least some
- fraction F of all bandwidth-weighted paths, _while_ taking
- ExitNodes/EntryNodes/etc into account.
-
- (F is as above.)
-
- * Having a descriptor for every one of the first
- NUM_USABLE_PRIMARY_GUARDS guards among our primary guards. (see
- guard-spec.txt)
-
- We define the "fraction of bandwidth-weighted paths" as the product of
- the following three fractions:
-
- * The fraction of descriptors that we have for nodes with the Guard
- flag, weighted by their bandwidth for the guard position.
- * The fraction of descriptors that we have for all nodes,
- weighted by their bandwidth for the middle position.
- * The fraction of descriptors that we have for nodes with the Exit
- flag, weighted by their bandwidth for the exit position.
-
- If the consensus has zero weighted bandwidth for a given kind of
- relay (Guard, Middle, or Exit), Tor instead uses the fraction of relays
- for which it has the descriptor (not weighted by bandwidth at all).
-
- If the consensus lists zero exit-flagged relays, Tor instead uses the
- fraction of middle relays.
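-
- As a non-normative sketch, the fraction above could be computed as follows
- (Python; the node objects and helper callables are illustrative only):
-
-   def position_fraction(nodes, have_descriptor, position_weight):
-       """Fraction of position-weighted bandwidth covered by held descriptors."""
-       total = sum(position_weight(n) for n in nodes)
-       if total == 0:
-           # Zero weighted bandwidth for this position: fall back to the plain
-           # fraction of relays for which we have a descriptor.
-           return 0.0 if not nodes else (
-               sum(1 for n in nodes if have_descriptor(n)) / len(nodes))
-       return sum(position_weight(n) for n in nodes if have_descriptor(n)) / total
-
-   def fraction_of_paths(guards, all_nodes, exits, have_descriptor, wg, wm, we):
-       f_guard = position_fraction(guards, have_descriptor, wg)
-       f_mid = position_fraction(all_nodes, have_descriptor, wm)
-       # With zero Exit-flagged relays in the consensus, reuse the middle fraction.
-       f_exit = f_mid if not exits else position_fraction(exits, have_descriptor, we)
-       return f_guard * f_mid * f_exit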
-
-
-2.1.1. Clients build circuits preemptively
-
- When running as a client, Tor tries to maintain at least a certain
- number of clean circuits, so that new streams can be handled
- quickly. To increase the likelihood of success, Tor tries to
- predict what circuits will be useful by choosing from among nodes
- that support the ports we have used in the recent past (by default
- one hour). Specifically, on startup Tor tries to maintain one clean
- fast exit circuit that allows connections to port 80, and at least
- two fast clean stable internal circuits in case we get a resolve
- request or hidden service request (at least three if we _run_ a
- hidden service).
-
- After that, Tor will adapt the circuits that it preemptively builds
- based on the requests it sees from the user: it tries to have two fast
- clean exit circuits available for every port seen within the past hour
- (each circuit can be adequate for many predicted ports -- it doesn't
- need two separate circuits for each port), and it tries to have the
- above internal circuits available if we've seen resolves or hidden
- service activity within the past hour. If there are 12 or more clean
- circuits open, it doesn't open more even if it has more predictions.
-
- Only stable circuits can "cover" a port that is listed in the
- LongLivedPorts config option. Similarly, hidden service requests
- to ports listed in LongLivedPorts make us create stable internal
- circuits.
-
- Note that if there are no requests from the user for an hour, Tor
- will predict no use and build no preemptive circuits.
-
- The Tor client SHOULD NOT store its list of predicted requests to a
- persistent medium.
-
-2.1.2. Clients build circuits on demand
-
- Additionally, when a client request exists that no circuit (built or
- pending) might support, we create a new circuit to support the request.
- For exit connections, we pick an exit node that will handle the
- most pending requests (choosing arbitrarily among ties), launch a
- circuit to end there, and repeat until every unattached request
- might be supported by a pending or built circuit. For internal
- circuits, we pick an arbitrary acceptable path, repeating as needed.
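-
- A non-normative sketch of this greedy selection (Python; the request and
- exit objects, and the might_support() helper, are illustrative only):
-
-   def plan_exit_circuits(pending_requests, exits, might_support):
-       planned, remaining = [], list(pending_requests)
-       while remaining and exits:
-           # Pick the exit that might support the most unattached requests.
-           best = max(exits,
-                      key=lambda e: sum(1 for r in remaining if might_support(e, r)))
-           covered = [r for r in remaining if might_support(best, r)]
-           if not covered:
-               break                    # nothing can support what is left
-           planned.append(best)
-           remaining = [r for r in remaining if r not in covered]
-       return planned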
-
- Clients consider a circuit to become "dirty" as soon as a stream is
- attached to it, or some other request is performed over the circuit.
- If a circuit has been "dirty" for at least MaxCircuitDirtiness seconds,
- new streams may not be attached to it.
-
- In some cases we can reuse an already established circuit if it's
- clean; see Section 2.3 (cannibalizing circuits) for details.
-
-2.1.3. Relays build circuits for testing reachability and bandwidth
-
- Tor relays test reachability of their ORPort once they have
- successfully built a circuit (on startup and whenever their IP address
- changes). They build an ordinary fast internal circuit with themselves
- as the last hop. As soon as any testing circuit succeeds, the Tor
- relay decides it's reachable and is willing to publish a descriptor.
-
- We launch multiple testing circuits (one at a time), until we
- have NUM_PARALLEL_TESTING_CIRC (4) such circuits open. Then we
- do a "bandwidth test" by sending a certain number of relay drop
- cells down each circuit: BandwidthRate * 10 / CELL_NETWORK_SIZE
- total cells divided across the four circuits, but never more than
- CIRCWINDOW_START (1000) cells total. This exercises both outgoing and
- incoming bandwidth, and helps to jumpstart the observed bandwidth
- (see dir-spec.txt).
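-
- For illustration only, the number of drop cells per testing circuit works
- out as below (Python; the cell size constant is an assumption here):
-
-   CELL_NETWORK_SIZE = 512            # assumed cell size on the wire, in bytes
-   CIRCWINDOW_START = 1000
-   NUM_PARALLEL_TESTING_CIRC = 4
-
-   def drop_cells_per_testing_circuit(bandwidth_rate):   # BandwidthRate, bytes/sec
-       total = min(bandwidth_rate * 10 // CELL_NETWORK_SIZE, CIRCWINDOW_START)
-       return total // NUM_PARALLEL_TESTING_CIRC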
-
- Tor relays also test reachability of their DirPort once they have
- established a circuit, but they use an ordinary exit circuit for
- this purpose.
-
-2.1.4. Hidden-service circuits
-
- See section 4 below.
-
-2.1.5. Rate limiting of failed circuits
-
- If we fail to build a circuit N times in an X-second period (see Section
- 2.3 for how this works), we stop building circuits until the X seconds
- have elapsed.
- XXXX
-
-2.1.6. When to tear down circuits
-
- Clients should tear down circuits (in general) only when those circuits
- have no streams on them. Additionally, clients should tear down
- stream-less circuits only under one of the following conditions:
-
- - The circuit has never had a stream attached, and it was created too
- long in the past (based on CircuitsAvailableTimeout or
- cbtlearntimeout, depending on timeout estimate status).
-
- - The circuit is dirty (has had a stream attached), and it has been
- dirty for at least MaxCircuitDirtiness.
-
-2.2. Path selection and constraints
-
- We choose the path for each new circuit before we build it. We choose the
- exit node first, followed by the other nodes in the circuit, front to
- back. (In other words, for a 3-hop circuit, we first pick hop 3,
- then hop 1, then hop 2.) All paths we generate obey the following
- constraints:
-
- - We do not choose the same router twice for the same path.
- - We do not choose any router in the same family as another in the same
- path. (Two routers are in the same family if each one lists the other
- in the "family" entries of its descriptor.)
- - We do not choose more than one router in a given /16 subnet
- (unless EnforceDistinctSubnets is 0).
- - We don't choose any non-running or non-valid router unless we have
- been configured to do so. By default, we are configured to allow
- non-valid routers in "middle" and "rendezvous" positions.
- - If we're using Guard nodes, the first node must be a Guard (see
- section 5 below).
- - XXXX Choosing the length
-
- For "fast" circuits, we only choose nodes with the Fast flag. For
- non-"fast" circuits, all nodes are eligible.
-
- For all circuits, we weight node selection according to router bandwidth.
-
- We also weight the bandwidth of Exit and Guard flagged nodes depending on
- the fraction of total bandwidth that they make up and depending upon the
- position they are being selected for.
-
- These weights are published in the consensus, and are computed as described
- in Section "Computing Bandwidth Weights" of dir-spec.txt. They are:
-
- Wgg - Weight for Guard-flagged nodes in the guard position
- Wgm - Weight for non-flagged nodes in the guard position
- Wgd - Weight for Guard+Exit-flagged nodes in the guard position
-
- Wmg - Weight for Guard-flagged nodes in the middle position
- Wmm - Weight for non-flagged nodes in the middle position
- Wme - Weight for Exit-flagged nodes in the middle position
- Wmd - Weight for Guard+Exit-flagged nodes in the middle position
-
- Weg - Weight for Guard-flagged nodes in the exit position
- Wem - Weight for non-flagged nodes in the exit position
- Wee - Weight for Exit-flagged nodes in the exit position
- Wed - Weight for Guard+Exit-flagged nodes in the exit position
-
- Wgb - Weight for BEGIN_DIR-supporting Guard-flagged nodes
- Wmb - Weight for BEGIN_DIR-supporting non-flagged nodes
- Web - Weight for BEGIN_DIR-supporting Exit-flagged nodes
- Wdb - Weight for BEGIN_DIR-supporting Guard+Exit-flagged nodes
-
- Wbg - Weight for Guard-flagged nodes for BEGIN_DIR requests
- Wbm - Weight for non-flagged nodes for BEGIN_DIR requests
- Wbe - Weight for Exit-flagged nodes for BEGIN_DIR requests
- Wbd - Weight for Guard+Exit-flagged nodes for BEGIN_DIR requests
-
- If any of those weights is malformed or not present in a consensus,
- clients proceed with the regular path selection algorithm setting
- the weights to the default value of 10000.
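-
- A non-normative sketch of weighted selection for a single position, applying
- the constraints and weights above (Python; the callables passed in are
- illustrative only, and weight_for() is assumed to return the appropriate
- Wxx value for the node's flags in this position, or 10000 if missing):
-
-   import random
-
-   def choose_weighted(candidates, path, bandwidth, weight_for,
-                       same_family, same_subnet):
-       pool, weights = [], []
-       for node in candidates:
-           if node in path:
-               continue                 # never pick the same router twice
-           if any(same_family(node, o) or same_subnet(node, o) for o in path):
-               continue                 # family and /16 constraints
-           pool.append(node)
-           weights.append(bandwidth(node) * weight_for(node))
-       return random.choices(pool, weights=weights, k=1)[0] if pool else None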
-
- Additionally, we may be building circuits with one or more requests in
- mind. Each kind of request puts certain constraints on paths:
-
- - All service-side introduction circuits and all rendezvous paths
- should be Stable.
- - All connection requests for connections that we think will need to
- stay open a long time require Stable circuits. Currently, Tor decides
- this by examining the request's target port, and comparing it to a
- list of "long-lived" ports. (Default: 21, 22, 706, 1863, 5050,
- 5190, 5222, 5223, 6667, 6697, 8300.)
- - DNS resolves require an exit node whose exit policy is not equivalent
- to "reject *:*".
- - Reverse DNS resolves require a version of Tor with advertised eventdns
- support (available in Tor 0.1.2.1-alpha-dev and later).
- - All connection requests require an exit node whose exit policy
- supports their target address and port (if known), or which "might
- support it" (if the address isn't known). See 2.2.1.
- - Rules for Fast? XXXXX
-
-2.2.1. Choosing an exit
-
- If we know what IP address we want to connect to or resolve, we can
- trivially tell whether a given router will support it by simulating
- its declared exit policy.
-
- Because we often connect to addresses of the form hostname:port, we do not
- always know the target IP address when we select an exit node. In these
- cases, we need to pick an exit node that "might support" connections to a
- given port when the target address is unknown. An exit node "might support"
- such a connection if any clause that accepts any connections to that port
- precedes all clauses (if any) that reject all connections to that port.
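-
- A non-normative sketch of this "might support" test, over a simplified exit
- policy represented as an ordered list of (accept, ip_pattern, port_pattern)
- clauses (Python; this is not Tor's policy parser):
-
-   def might_support_port(policy, port):
-       """True if a clause accepting some address on `port` precedes every
-       clause that rejects all addresses on `port`."""
-       for accept, ip_pattern, port_pattern in policy:
-           if not _matches_port(port_pattern, port):
-               continue
-           if accept:
-               return True      # accepts at least some address on this port
-           if ip_pattern == "*":
-               return False     # "reject *:port" reached before any accept
-       return False             # policy exhausted without an accepting clause
-
-   def _matches_port(port_pattern, port):
-       if port_pattern == "*":
-           return True
-       lo, _, hi = port_pattern.partition("-")
-       return int(lo) <= port <= int(hi or lo)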
-
- Unless requested to do so by the user, we never choose an exit node
- flagged as "BadExit" by more than half of the authorities who advertise
- themselves as listing bad exits.
-
-2.2.2. User configuration
-
- Users can alter the default behavior for path selection with configuration
- options.
-
- - If "ExitNodes" is provided, then every request requires an exit node on
- the ExitNodes list. (If a request is supported by no nodes on that list,
- and StrictExitNodes is false, then Tor treats that request as if
- ExitNodes were not provided.)
-
- - "EntryNodes" and "StrictEntryNodes" behave analogously.
-
- - If a user tries to connect to or resolve a hostname of the form
- <target>.<servername>.exit, the request is rewritten to a request for
- <target>, and the request is only supported by the exit whose nickname
- or fingerprint is <servername>.
-
- - When set, "HSLayer2Nodes" and "HSLayer3Nodes" relax Tor's path
- restrictions to allow nodes in the same /16 and node family to reappear
- in the path. They also allow the guard node to be chosen as the RP, IP,
- and HSDIR, and as the hop before those positions.
-
-2.3. Cannibalizing circuits
-
- If we need a circuit and have a clean one already established, in
- some cases we can adapt the clean circuit for our new
- purpose. Specifically,
-
- For hidden service interactions, we can "cannibalize" a clean internal
- circuit if one is available, so we don't need to build those circuits
- from scratch on demand.
-
- We can also cannibalize clean circuits when the client asks to exit
- at a given node -- either via the ".exit" notation or because the
- destination is running at the same location as an exit node.
-
-2.4. Learning when to give up ("timeout") on circuit construction
-
- Since version 0.2.2.8-alpha, Tor clients attempt to learn when to give
- up on circuits based on network conditions.
-
-2.4.1. Distribution choice
-
- Based on studies of build times, we found that the distribution of
- circuit build times appears to be a Frechet distribution (and a multi-modal
- Frechet distribution, if more than one guard or bridge is used). However,
- estimators and quantile functions of the Frechet distribution are difficult
- to work with and slow to converge. So instead, since we are only interested
- in the accuracy of the tail, clients approximate the tail of the multi-modal
- distribution with a single Pareto curve.
-
-2.4.2. How much data to record
-
- From our observations, the minimum number of circuit build times for a
- reasonable fit appears to be on the order of 100. However, to keep a
- good fit over the long term, clients store the 1000 most recent circuit build
- times in a circular array.
-
- These build times only include the times required to build three-hop
- circuits, and the times required to build the first three hops of circuits
- with more than three hops. Circuits of fewer than three hops are not
- recorded, and hops past the third are not recorded.
-
- The Tor client should build test circuits at a rate of one every 'cbttestfreq'
- (10 seconds) until 'cbtmincircs' (100 circuits) are built, with a maximum of
- 'cbtmaxopencircs' (default: 10) circuits open at once. This allows a fresh
- Tor to have a CircuitBuildTimeout estimated within 30 minutes after install
- or network change (see section 2.4.6 below).
-
- Timeouts are stored on disk in a histogram of 10ms bin width, the same
- width used to calculate the Xm value (see section 2.4.3). The timeouts
- recorded in the
- histogram must be shuffled after being read from disk, to preserve a
- proper expiration of old values after restart.
-
- Thus, some build time resolution is lost during restart. Implementations may
- choose a different persistence mechanism than this histogram, but be aware
- that build time binning is still needed for parameter estimation.
-
-2.4.3. Parameter estimation
-
- Once 'cbtmincircs' build times are recorded, Tor clients update the
- distribution parameters and recompute the timeout every circuit completion
- (though see section 2.4.6 for when to pause and reset the timeout due to
- too many circuits timing out).
-
- Tor clients calculate the parameters for a Pareto distribution fitting the
- data using the maximum likelihood estimator. For derivation, see:
- https://en.wikipedia.org/wiki/Pareto_distribution#Estimation_of_parameters
-
- Because build times are not a true Pareto distribution, we alter how Xm is
- computed. In a max likelihood estimator, the mode of the distribution is
- used directly as Xm.
-
- Instead of using the mode of discrete build times directly, Tor clients
- compute the Xm parameter using the weighted average of the midpoints
- of the 'cbtnummodes' (10) most frequently occurring 10ms histogram bins.
- Ties are broken in favor of earlier bins (that is, in favor of bins
- corresponding to shorter build times).
-
- (The use of 10 modes was found to minimize error from the selected
- cbtquantile, with 10ms bins for quantiles 60-80, compared to many other
- heuristics).
-
- To avoid ln(1.0+epsilon) precision issues, use log laws to rewrite the
- estimator for 'alpha' as the sum of logs followed by subtraction, rather
- than multiplication and division:
-
- alpha = n/(Sum_n{ln(MAX(Xm, x_i))} - n*ln(Xm))
-
- In this, n is the total number of build times that have completed, x_i is
- the ith recorded build time, and Xm is the mode-based estimate described above.
-
- All times below Xm are counted as having the Xm value via the MAX(),
- because in Pareto estimators, Xm is supposed to be the lowest value.
- However, since clients use mode averaging to estimate Xm, there can be
- values below our Xm. Effectively, the Pareto estimator then treats
- everything smaller than Xm as if it happened at Xm. One can also see that if
- clients did not do this, alpha could underflow to become negative, which
- results in an exponential curve, not a Pareto probability distribution.
-
- The timeout itself is calculated by using the Pareto Quantile function (the
- inverted CDF) to give us the value on the CDF such that 80% of the mass
- of the distribution is below the timeout value (parameter 'cbtquantile').
-
- The Pareto Quantile Function (inverse CDF) is:
-
- F(q) = Xm/((1.0-q)^(1.0/alpha))
-
- Thus, clients obtain the circuit build timeout for 3-hop circuits by
- computing:
-
- timeout_ms = F(0.8) # 'cbtquantile' == 0.8
-
- With this, we expect that the Tor client will accept the fastest 80% of the
- total number of paths on the network.
-
- Clients obtain the circuit close time to completely abandon circuits as:
-
- close_ms = F(0.99) # 'cbtclosequantile' == 0.99
-
- To avoid waiting an unreasonably long period of time for circuits that
- simply have relays that are down, Tor clients cap timeout_ms at the max
- build time actually observed so far, and cap close_ms at twice this max,
- but at least 60 seconds:
-
- timeout_ms = MIN(timeout_ms, max_observed_timeout)
- close_ms = MAX(MIN(close_ms, 2*max_observed_timeout), 'cbtinitialtimeout')
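-
- The whole estimation step might look like the following non-normative
- sketch (Python; build times are in milliseconds, and the constant names
- mirror the consensus parameters in section 2.4.7):
-
-   import math
-   from collections import Counter
-
-   CBT_BIN_WIDTH = 10              # ms
-   CBT_NUM_MODES = 10              # 'cbtnummodes'
-   CBT_QUANTILE = 0.80             # 'cbtquantile'
-   CBT_CLOSE_QUANTILE = 0.99       # 'cbtclosequantile'
-   CBT_INITIAL_TIMEOUT = 60000     # 'cbtinitialtimeout', ms
-
-   def estimate_xm(build_times):
-       bins = Counter(int(t // CBT_BIN_WIDTH) * CBT_BIN_WIDTH for t in build_times)
-       # Most frequent bins first; ties broken in favor of earlier bins.
-       top = sorted(bins.items(), key=lambda kv: (-kv[1], kv[0]))[:CBT_NUM_MODES]
-       total = sum(count for _, count in top)
-       # Weighted average of the midpoints of the selected bins.
-       return sum((start + CBT_BIN_WIDTH / 2) * count for start, count in top) / total
-
-   def estimate_alpha(build_times, xm):
-       # Assumes at least some build times exceed Xm.
-       n = len(build_times)
-       return n / (sum(math.log(max(xm, x)) for x in build_times) - n * math.log(xm))
-
-   def pareto_quantile(xm, alpha, q):
-       return xm / ((1.0 - q) ** (1.0 / alpha))
-
-   def compute_timeouts(build_times):
-       xm = estimate_xm(build_times)
-       alpha = estimate_alpha(build_times, xm)
-       timeout_ms = pareto_quantile(xm, alpha, CBT_QUANTILE)
-       close_ms = pareto_quantile(xm, alpha, CBT_CLOSE_QUANTILE)
-       max_observed = max(build_times)
-       timeout_ms = min(timeout_ms, max_observed)
-       close_ms = max(min(close_ms, 2 * max_observed), CBT_INITIAL_TIMEOUT)
-       return timeout_ms, close_ms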
-
-2.4.4. Calculating timeout thresholds for circuits of different lengths
-
- The timeout_ms and close_ms estimates above are good only for 3-hop
- circuits, since only 3-hop circuits are recorded in the list of build
- times.
-
- To calculate the appropriate timeouts and close timeouts for circuits of
- other lengths, the client multiplies the timeout_ms and close_ms values
- by a scaling factor determined by the number of communication hops
- needed to build their circuits:
-
- timeout_ms[hops=N] = timeout_ms * Actions(N) / Actions(3)
-
- close_ms[hops=N] = close_ms * Actions(N) / Actions(3)
-
- where Actions(N) = N * (N + 1) / 2.
-
- To calculate timeouts for operations other than circuit building,
- the client should add X to Actions(N) for every round-trip communication
- required with the Xth hop.
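-
- A minimal sketch of this scaling (Python):
-
-   def actions(n):
-       # Communication hops needed to build an n-hop circuit: extending to
-       # hop k requires cells to traverse k hops, so 1 + 2 + ... + n.
-       return n * (n + 1) // 2
-
-   def scaled_timeouts(timeout_ms, close_ms, hops):
-       factor = actions(hops) / actions(3)
-       return timeout_ms * factor, close_ms * factor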
-
-2.4.5. How to record timeouts
-
- Pareto estimators begin to lose their accuracy if the tail is omitted.
- Hence, Tor clients actually calculate two timeouts: a usage timeout, and a
- close timeout.
-
- Circuits that pass the usage timeout are marked as measurement circuits,
- and are allowed to continue to build until the close timeout corresponding
- to the point 'cbtclosequantile' (default 99) on the Pareto curve, or 60
- seconds, whichever is greater.
-
- The actual completion times for these measurement circuits should be
- recorded.
-
- Implementations should completely abandon a circuit, and ignore its build
- time, if the total build time exceeds the close threshold, as this
- typically means one of the relays in the path is offline.
-
-2.4.6. Detecting Changing Network Conditions
-
- Tor clients attempt to detect both network connectivity loss and drastic
- changes in the timeout characteristics.
-
- To detect changing network conditions, clients keep a history of
- the timeout or non-timeout status of the past 'cbtrecentcount' circuits
- (20 circuits) that successfully completed at least one hop. If more than
- 90% of these circuits time out, the client discards all build time history,
- resets the timeout to 'cbtinitialtimeout' (60 seconds), and then begins
- recomputing the timeout.
-
- If the timeout was already at least `cbtinitialtimeout`,
- the client doubles the timeout.
-
- The records here (of how many circuits succeeded or failed among the most
- recent 'cbtrecentcount') are not stored as persistent state. On reload,
- we start with a new, empty state.
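-
- A non-normative sketch of this reset logic (Python; the `state` object with
- its build_times list and timeout_ms field is illustrative only):
-
-   from collections import deque
-
-   CBT_RECENT_COUNT = 20           # 'cbtrecentcount'
-   CBT_INITIAL_TIMEOUT = 60000     # 'cbtinitialtimeout', ms
-
-   recent = deque(maxlen=CBT_RECENT_COUNT)   # in-memory only; never persisted
-
-   def record_outcome(timed_out, state):
-       """Record one circuit that completed at least one hop."""
-       recent.append(timed_out)
-       if len(recent) == CBT_RECENT_COUNT and sum(recent) / len(recent) > 0.9:
-           state.build_times.clear()            # discard all build time history
-           if state.timeout_ms >= CBT_INITIAL_TIMEOUT:
-               state.timeout_ms *= 2            # already at/above the initial value
-           else:
-               state.timeout_ms = CBT_INITIAL_TIMEOUT
-           recent.clear()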
-
-2.4.7. Consensus parameters governing behavior
-
- Clients that implement circuit build timeout learning should obey the
- following consensus parameters that govern behavior, in order to allow
- us to handle bugs or other emergent behaviors due to client circuit
- construction. If these parameters are not present in the consensus,
- the listed default values should be used instead.
-
- cbtdisabled
- Default: 0
- Min: 0
- Max: 1
- Effect: If 1, all CircuitBuildTime learning code should be
- disabled and history should be discarded. For use in
- emergency situations only.
-
- cbtnummodes
- Default: 10
- Min: 1
- Max: 20
- Effect: This value governs how many modes to use in the weighted
- average calculation of Pareto parameter Xm. Selecting Xm as the
- average of multiple modes improves accuracy of the Pareto tail
- for quantile cutoffs from 60-80% (see cbtquantile).
-
- cbtrecentcount
- Default: 20
- Min: 3
- Max: 1000
- Effect: This is the number of circuit build outcomes (success vs
- timeout) to keep track of for the following option.
-
- cbtmaxtimeouts
- Default: 18
- Min: 3
- Max: 10000
- Effect: When this many timeouts happen in the last 'cbtrecentcount'
- circuit attempts, the client should discard all of its
- history and begin learning a fresh timeout value.
-
- Note that if this parameter's value is greater than the value
- of 'cbtrecentcount', then the history will never be
- discarded because of this feature.
-
- cbtmincircs
- Default: 100
- Min: 1
- Max: 10000
- Effect: This is the minimum number of circuits to build before
- computing a timeout.
-
- Note that if this parameter's value is higher than 1000 (the
- number of time observations that a client keeps in its
- circular buffer), circuit build timeout calculation is
- effectively disabled, and the default timeouts are used
- indefinitely.
-
- cbtquantile
- Default: 80
- Min: 10
- Max: 99
- Effect: This is the position on the quantile curve to use to set the
- timeout value. It is a percent (10-99).
-
- cbtclosequantile
- Default: 99
- Min: Value of cbtquantile parameter
- Max: 99
- Effect: This is the position on the quantile curve to use to set the
- timeout value to use to actually close circuits. It is a
- percent (0-99).
-
- cbttestfreq
- Default: 10
- Min: 1
- Max: 2147483647 (INT32_MAX)
- Effect: Describes how often in seconds to build a test circuit to
- gather timeout values. Only applies if less than 'cbtmincircs'
- have been recorded.
-
- cbtmintimeout
- Default: 10
- Min: 10
- Max: 2147483647 (INT32_MAX)
- Effect: This is the minimum allowed timeout value in milliseconds.
-
- cbtinitialtimeout
- Default: 60000
- Min: Value of cbtmintimeout
- Max: 2147483647 (INT32_MAX)
- Effect: This is the timeout value to use before we have enough data
- to compute a timeout, in milliseconds. If we do not have
- enough data to compute a timeout estimate (see cbtmincircs),
- then we use this interval both for the close timeout and the
- abandon timeout.
-
- cbtlearntimeout
- Default: 180
- Min: 10
- Max: 60000
- Effect: This is how long idle circuits will be kept open while cbt is
- learning a new timeout value.
-
- cbtmaxopencircs
- Default: 10
- Min: 0
- Max: 14
- Effect: This is the maximum number of circuits that can be open at
- the same time during the circuit build time learning phase.
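-
- For illustration, the defaults and ranges above can be applied to a raw
- parameter dictionary as follows (Python; this is only a sketch, not Tor's
- option handling, and the two "value of another parameter" minimums are
- approximated by their defaults):
-
-   CBT_PARAMS = {
-       # name: (default, minimum, maximum)
-       "cbtdisabled":       (0, 0, 1),
-       "cbtnummodes":       (10, 1, 20),
-       "cbtrecentcount":    (20, 3, 1000),
-       "cbtmaxtimeouts":    (18, 3, 10000),
-       "cbtmincircs":       (100, 1, 10000),
-       "cbtquantile":       (80, 10, 99),
-       "cbtclosequantile":  (99, 80, 99),      # real minimum: current cbtquantile
-       "cbttestfreq":       (10, 1, 2**31 - 1),
-       "cbtmintimeout":     (10, 10, 2**31 - 1),
-       "cbtinitialtimeout": (60000, 10, 2**31 - 1),  # real minimum: cbtmintimeout
-       "cbtlearntimeout":   (180, 10, 60000),
-       "cbtmaxopencircs":   (10, 0, 14),
-   }
-
-   def cbt_param(consensus_params, name):
-       default, lo, hi = CBT_PARAMS[name]
-       return min(max(consensus_params.get(name, default), lo), hi)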
-
-2.5. Handling failure
-
- If an attempt to extend a circuit fails (either because the first create
- failed or a subsequent extend failed) then the circuit is torn down and is
- no longer pending. (XXXX really?) Requests that might have been
- supported by the pending circuit thus become unsupported, and a new
- circuit needs to be constructed.
-
- If a stream "begin" attempt fails with an EXITPOLICY error, we
- decide that the exit node's exit policy is not correctly advertised,
- so we treat the exit node as if it were a non-exit until we retrieve
- a fresh descriptor for it.
-
- Excessive amounts of either type of failure can indicate an
- attack on anonymity. See section 7 for how excessive failure is handled.
-
-3. Attaching streams to circuits
-
- When a circuit that might support a request is built, Tor tries to attach
- the request's stream to the circuit and sends a BEGIN, BEGIN_DIR,
- or RESOLVE relay
- cell as appropriate. If the request completes unsuccessfully, Tor
- considers the reason given in the CLOSE relay cell. [XXX yes, and?]
-
-
- After a request has remained unattached for SocksTimeout (2 minutes
- by default), Tor abandons the attempt and signals an error to the
- client as appropriate (e.g., by closing the SOCKS connection).
-
- XXX Timeouts and when Tor auto-retries.
-
- * What stream-end-reasons are appropriate for retrying.
-
- If no reply to BEGIN/RESOLVE, then the stream will timeout and fail.
-
-4. Hidden-service related circuits
-
- XXX Tracking expected hidden service use (client-side and hidserv-side)
-
-5. Guard nodes
-
- We use Guard nodes (also called "helper nodes" in the research
- literature) to prevent certain profiling attacks. For an overview of
- our Guard selection algorithm -- which has grown rather complex -- see
- guard-spec.txt.
-
-5.1. How consensus bandwidth weights factor into entry guard selection
-
- When weighting a list of routers for choosing an entry guard, the following
- consensus parameters (from the "bandwidth-weights" line) apply:
-
- Wgg - Weight for Guard-flagged nodes in the guard position
- Wgm - Weight for non-flagged nodes in the guard Position
- Wgd - Weight for Guard+Exit-flagged nodes in the guard Position
- Wgb - Weight for BEGIN_DIR-supporting Guard-flagged nodes
- Wmb - Weight for BEGIN_DIR-supporting non-flagged nodes
- Web - Weight for BEGIN_DIR-supporting Exit-flagged nodes
- Wdb - Weight for BEGIN_DIR-supporting Guard+Exit-flagged nodes
-
- Please see "bandwidth-weights" in section 3.4.1 of dir-spec.txt for more in-depth
- descriptions of these parameters.
-
- If a router has been marked as both an entry guard and an exit, then we
- prefer to use it more, with our preference for doing so (roughly) linearly
- increasing w.r.t. the router's non-guard bandwidth and bandwidth weight
- (calculated without taking the guard flag into account). From proposal
- #236:
-
- |
- | Let Wpf denote the weight from the 'bandwidth-weights' line a
- | client would apply to N for position p if it had the guard
- | flag, Wpn the weight if it did not have the guard flag, and B the
- | measured bandwidth of N in the consensus. Then instead of choosing
- | N for position p proportionally to Wpf*B or Wpn*B, clients should
- | choose N proportionally to F*Wpf*B + (1-F)*Wpn*B.
-
- where F is the weight as calculated using the above parameters.
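-
- As a non-normative sketch (Python; F is assumed to have been derived from
- the bandwidth-weights parameters above):
-
-   def guard_selection_weight(bandwidth, w_with_guard_flag, w_without_guard_flag, f):
-       """Selection weight for node N in position p: F*Wpf*B + (1-F)*Wpn*B."""
-       return (f * w_with_guard_flag + (1 - f) * w_without_guard_flag) * bandwidth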
-
-6. Server descriptor purposes
-
- There are currently three "purposes" supported for server descriptors:
- general, controller, and bridge. Most descriptors are of type general
- -- these are the ones listed in the consensus, and the ones fetched
- and used in normal cases.
-
- Controller-purpose descriptors are those delivered by the controller
- and labelled as such: they will be kept around (and expire like
- normal descriptors), and they can be used by the controller in its
- EXTENDCIRCUIT commands. Otherwise they are ignored by Tor when it
- chooses paths.
-
- Bridge-purpose descriptors are for routers that are used as bridges. See
- doc/design-paper/blocking.pdf for more design explanation, or proposal
- 125 for specific details. Currently bridge descriptors are used in place
- of normal entry guards, for Tor clients that have UseBridges enabled.
-
-7. Detecting route manipulation by Guard nodes (Path Bias)
-
- The Path Bias defense is designed to defend against a type of route
- capture where malicious Guard nodes deliberately fail or choke circuits
- that extend to non-colluding Exit nodes to maximize their network
- utilization in favor of carrying only compromised traffic.
-
- In the extreme, the attack allows an adversary that carries c/n
- of the network capacity to deanonymize c/n of the network
- connections, breaking the O((c/n)^2) property of Tor's original
- threat model. It also allows targeted attacks aimed at monitoring
- the activity of specific users, bridges, or Guard nodes.
-
- There are two points where path selection can be manipulated:
- during construction, and during usage. Circuit construction
- can be manipulated by inducing circuit failures during circuit
- extend steps, which causes the Tor client to transparently retry
- the circuit construction with a new path. Circuit usage can be
- manipulated by abusing the stream retry features of Tor (for
- example by withholding stream attempt responses from the client
- until the stream timeout has expired), at which point the Tor client
- will also transparently retry the stream on a new path.
-
- The defense as deployed therefore makes two independent sets of
- measurements of successful path use: one during circuit construction,
- and one during circuit usage.
-
- The intended behavior is for clients to ultimately disable the use
- of Guards responsible for excessive circuit failure of either type
- (see section 7.4); however known issues with the Tor network currently
- restrict the defense to being informational only at this stage (see
- section 7.5).
-
-7.1. Measuring path construction success rates
-
- Clients maintain two counts for each of their guards: a count of the
- number of times a circuit was extended to at least two hops through that
- guard, and a count of the number of circuits that successfully complete
- through that guard. The ratio of these two numbers is used to determine
- a circuit success rate for that Guard.
-
- Circuit build timeouts are counted as construction failures if the
- circuit fails to complete before the 95% "right-censored" timeout
- interval, not the 80% timeout condition (see section 2.4).
-
- If a circuit closes prematurely after construction but before being
- requested to close by the client, this is counted as a failure.
-
-7.2. Measuring path usage success rates
-
- Clients maintain two usage counts for each of their guards: a count
- of the number of usage attempts, and a count of the number of
- successful usages.
-
- A usage attempt means any attempt to attach a stream to a circuit.
-
- Usage success status is temporarily recorded by state flags on circuits.
- Guard usage success counts are not incremented until circuit close. A
- circuit is marked as successfully used if we receive a properly
- recognized RELAY cell on that circuit that was expected for the current
- circuit purpose.
-
- If subsequent stream attachments fail or time out, the successfully used
- state of the circuit is cleared, causing it once again to be regarded
- as a usage attempt only.
-
- Upon close by the client, all circuits that are still marked as usage
- attempts are probed using a RELAY_BEGIN cell constructed with a
- destination of the form 0.a.b.c:25, where a.b.c is a 24-bit random
- nonce. If we get a RELAY_COMMAND_END in response matching our nonce,
- the circuit is counted as successfully used.
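-
- For illustration, the probe address might be constructed as follows
- (Python; not Tor source):
-
-   import random
-
-   def make_probe_target():
-       nonce = random.getrandbits(24)                     # 24-bit random nonce
-       a, b, c = (nonce >> 16) & 0xFF, (nonce >> 8) & 0xFF, nonce & 0xFF
-       # The nonce lets the client recognize the RELAY_COMMAND_END that
-       # answers this probe.
-       return nonce, "0.%d.%d.%d:25" % (a, b, c)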
-
- If any unrecognized RELAY cells arrive after the probe has been sent,
- the circuit is counted as a usage failure.
-
- If the stream failure reason codes DESTROY, TORPROTOCOL, or INTERNAL
- are received in response to any stream attempt, such circuits are not
- probed and are declared usage failures.
-
- Prematurely closed circuits are not probed, and are counted as usage
- failures.
-
-7.3. Scaling success counts
-
- To provide a moving average of recent Guard activity while
- still preserving the ability to verify correctness, we periodically
- "scale" the success counts by multiplying them by a scale factor
- between 0 and 1.0.
-
- Scaling is performed when either usage or construction attempt counts
- exceed a parametrized value.
-
- To avoid error due to scaling during circuit construction and use,
- currently open circuits are subtracted from the usage counts before
- scaling, and added back after scaling.
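-
- A minimal sketch of the scaling step (Python; scale_factor is
- pb_multfactor / pb_scalefactor from section 7.4, 1/2 by default):
-
-   def scale_counts(attempts, successes, scale_factor, open_circuits):
-       # Exclude currently open circuits from scaling, then restore them.
-       attempts = (attempts - open_circuits) * scale_factor + open_circuits
-       successes = successes * scale_factor
-       return attempts, successes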
-
-7.4. Parametrization
-
- The following consensus parameters tune various aspects of the
- defense.
-
- pb_mincircs
- Default: 150
- Min: 5
- Effect: This is the minimum number of circuits that must complete
- at least 2 hops before we begin evaluating construction rates.
-
-
- pb_noticepct
- Default: 70
- Min: 0
- Max: 100
- Effect: If the circuit success rate falls below this percentage,
- we emit a notice log message.
-
- pb_warnpct
- Default: 50
- Min: 0
- Max: 100
- Effect: If the circuit success rate falls below this percentage,
- we emit a warn log message.
-
- pb_extremepct
- Default: 30
- Min: 0
- Max: 100
- Effect: If the circuit success rate falls below this percentage,
- we emit a more alarmist warning log message. If
- pb_dropguards is set to 1, we also disable the use of the
- guard.
-
- pb_dropguards
- Default: 0
- Min: 0
- Max: 1
- Effect: If the circuit success rate falls below pb_extremepct,
- when pb_dropguards is set to 1, we disable use of that
- guard.
-
- pb_scalecircs
- Default: 300
- Min: 10
- Effect: After this many circuits have completed at least two hops,
- Tor performs the scaling described in Section 7.3.
-
- pb_multfactor and pb_scalefactor
- Default: 1/2
- Min: 0.0
- Max: 1.0
- Effect: The double-precision result obtained from
- pb_multfactor/pb_scalefactor is multiplied by our current
- counts to scale them.
-
- pb_minuse
- Default: 20
- Min: 3
- Effect: This is the minimum number of circuits that we must attempt to
- use before we begin evaluating construction rates.
-
- pb_noticeusepct
- Default: 80
- Min: 3
- Effect: If the circuit usage success rate falls below this percentage,
- we emit a notice log message.
-
- pb_extremeusepct
- Default: 60
- Min: 3
- Effect: If the circuit usage success rate falls below this percentage,
- we emit a warning log message. We also disable the use of the
- guard if pb_dropguards is set.
-
- pb_scaleuse
- Default: 100
- Min: 10
- Effect: After we have attempted to use this many circuits,
- Tor performs the scaling described in Section 7.3.
-
-7.5. Known barriers to enforcement
-
- Due to intermittent CPU overload at relays, the normal rate of
- successful circuit completion is highly variable. The Guard-dropping
- version of the defense is unlikely to be deployed until the ntor
- circuit handshake is enabled, or the nature of CPU overload induced
- failure is better understood.
-
-
-
-X. Old notes
-
-X.1. Do we actually do this?
-
-How to deal with network down.
- - While all helpers are down/unreachable and there are no established
- or on-the-way testing circuits, launch a testing circuit. (Do this
- periodically in the same way we try to establish normal circuits
- when things are working normally.)
- (Testing circuits are a special type of circuit, that streams won't
- attach to by accident.)
- - When a testing circuit succeeds, mark all helpers up and hold
- the testing circuit open.
- - If a connection to a helper succeeds, close all testing circuits.
- Else mark that helper down and try another.
- - If the last helper is marked down and we already have a testing
- circuit established, then add the first hop of that testing circuit
- to the end of our helper node list, close that testing circuit,
- and go back to square one. (Actually, rather than closing the
- testing circuit, can we get away with converting it to a normal
- circuit and beginning to use it immediately?)
-
- [Do we actually do any of the above? If so, let's spec it. If not, let's
- remove it. -NM]
-
-X.2. A thing we could do to deal with reachability.
-
-And as a bonus, it leads to an answer to Nick's attack ("If I pick
-my helper nodes all on 18.0.0.0:*, then I move, you'll know where I
-bootstrapped") -- the answer is to pick your original three helper nodes
-without regard for reachability. Then the above algorithm will add some
-more that are reachable for you, and if you move somewhere, it's more
-likely (though not certain) that some of the originals will become useful.
-Is that smart or just complex?
-
-X.3. Some stuff that worries me about entry guards. 2006 Jun, Nickm.
-
- It is unlikely for two users to have the same set of entry guards.
- Observing a user is sufficient to learn its entry guards. So, as we move
- around, entry guards make us linkable. If we want to change guards when
- our location (IP? subnet?) changes, we have two bad options. We could
-
- - Drop the old guards. But if we go back to our old location,
- we'll not use our old guards. For a laptop that sometimes gets used
- from work and sometimes from home, this is pretty fatal.
- - Remember the old guards as associated with the old location, and use
- them again if we ever go back to the old location. This would be
- nasty, since it would force us to record where we've been.
-
- [Do we do any of this now? If not, this should move into 099-misc or
- 098-todo. -NM]