diff options
Diffstat (limited to 'spec/path-spec')
-rw-r--r-- | spec/path-spec/attaching-streams-to-circuits.md | 19 | ||||
-rw-r--r-- | spec/path-spec/building-circuits.md | 9 | ||||
-rw-r--r-- | spec/path-spec/cannibalizing-circuits.md | 15 | ||||
-rw-r--r-- | spec/path-spec/detecting-route-manipulation.md | 202 | ||||
-rw-r--r-- | spec/path-spec/general-operation.md | 97 | ||||
-rw-r--r-- | spec/path-spec/guard-nodes.md | 44 | ||||
-rw-r--r-- | spec/path-spec/handling-failure.md | 19 | ||||
-rw-r--r-- | spec/path-spec/hidden-service-related-circuits.md | 5 | ||||
-rw-r--r-- | spec/path-spec/index.md | 21 | ||||
-rw-r--r-- | spec/path-spec/learning-timeouts.md | 305 | ||||
-rw-r--r-- | spec/path-spec/path-selection-constraints.md | 146 | ||||
-rw-r--r-- | spec/path-spec/server-descriptor-purposes.md | 19 | ||||
-rw-r--r-- | spec/path-spec/when-we-build.md | 177 |
13 files changed, 1078 insertions, 0 deletions
diff --git a/spec/path-spec/attaching-streams-to-circuits.md b/spec/path-spec/attaching-streams-to-circuits.md
new file mode 100644
index 0000000..585885b
--- /dev/null
+++ b/spec/path-spec/attaching-streams-to-circuits.md
@@ -0,0 +1,19 @@
+<a id="path-spec.txt-3"></a>
+
+# Attaching streams to circuits
+
+When a circuit that might support a request is built, Tor tries to attach
+the request's stream to the circuit and sends a BEGIN, BEGIN_DIR,
+or RESOLVE relay
+cell as appropriate. If the request completes unsuccessfully, Tor
+considers the reason given in the CLOSE relay cell. \[XXX yes, and?\]
+
+After a request has remained unattached for SocksTimeout (2 minutes
+by default), Tor abandons the attempt and signals an error to the
+client as appropriate (e.g., by closing the SOCKS connection).
+
+XXX Timeouts and when Tor auto-retries.
+
+- What stream-end-reasons are appropriate for retrying.
+
+If no reply to BEGIN/RESOLVE, then the stream will time out and fail.
diff --git a/spec/path-spec/building-circuits.md b/spec/path-spec/building-circuits.md
new file mode 100644
index 0000000..dfb1332
--- /dev/null
+++ b/spec/path-spec/building-circuits.md
@@ -0,0 +1,9 @@
+<a id="path-spec.txt-2"></a>
+
+# Building circuits
+
+Here we describe a number of rules for building circuits: under what
+circumstances we do so, how we choose the paths for them, when we give
+up on an in-progress circuit, and what we do when circuit
+construction fails.
+
diff --git a/spec/path-spec/cannibalizing-circuits.md b/spec/path-spec/cannibalizing-circuits.md
new file mode 100644
index 0000000..3987839
--- /dev/null
+++ b/spec/path-spec/cannibalizing-circuits.md
@@ -0,0 +1,15 @@
+<a id="path-spec.txt-2.3"></a>
+
+# Cannibalizing circuits
+
+If we need a circuit and have a clean one already established, in
+some cases we can adapt the clean circuit for our new
+purpose.
Specifically, + +For hidden service interactions, we can "cannibalize" a clean internal +circuit if one is available, so we don't need to build those circuits +from scratch on demand. + +We can also cannibalize clean circuits when the client asks to exit +at a given node -- either via the ".exit" notation or because the +destination is running at the same location as an exit node. diff --git a/spec/path-spec/detecting-route-manipulation.md b/spec/path-spec/detecting-route-manipulation.md new file mode 100644 index 0000000..76be4d1 --- /dev/null +++ b/spec/path-spec/detecting-route-manipulation.md @@ -0,0 +1,202 @@ +<a id="path-spec.txt-7"></a> + +# Detecting route manipulation by Guard nodes (Path Bias) {#pathbias} + +The Path Bias defense is designed to defend against a type of route +capture where malicious Guard nodes deliberately fail or choke circuits +that extend to non-colluding Exit nodes to maximize their network +utilization in favor of carrying only compromised traffic. + +In the extreme, the attack allows an adversary that carries c/n +of the network capacity to deanonymize c/n of the network +connections, breaking the O((c/n)^2) property of Tor's original +threat model. It also allows targeted attacks aimed at monitoring +the activity of specific users, bridges, or Guard nodes. + +There are two points where path selection can be manipulated: +during construction, and during usage. Circuit construction +can be manipulated by inducing circuit failures during circuit +extend steps, which causes the Tor client to transparently retry +the circuit construction with a new path. Circuit usage can be +manipulated by abusing the stream retry features of Tor (for +example by withholding stream attempt responses from the client +until the stream timeout has expired), at which point the tor client +will also transparently retry the stream on a new path. 
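The threat-model arithmetic above can be illustrated with a small sketch (the function names and numbers are ours, not part of the spec): a passive adversary must control both the guard and the exit positions of a circuit, while a route-capturing adversary compromises every connection that enters one of its guards.

```python
# Hypothetical illustration of the c/n argument above; not Tor code.
def passive_compromise_rate(c: float, n: float) -> float:
    """Chance both guard and exit are malicious: roughly (c/n)^2."""
    return (c / n) ** 2

def route_capture_rate(c: float, n: float) -> float:
    """Chance of compromise when malicious guards fail circuits to
    honest exits until a colluding exit is chosen: roughly c/n."""
    return c / n

# An adversary with 10% of network capacity:
print(passive_compromise_rate(10, 100))  # about 0.01 (1% of connections)
print(route_capture_rate(10, 100))       # 0.1 (10% of connections)
```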
+ +The defense as deployed therefore makes two independent sets of +measurements of successful path use: one during circuit construction, +and one during circuit usage. + +The intended behavior is for clients to ultimately disable the use +of Guards responsible for excessive circuit failure of either type +(for the parameters to do this, see ["Parameterization"](#parameters) below); +however known issues with the Tor network currently +restrict the defense to being informational only at this stage +(see ["Known barriers to enforcement"](#barriers)). + +<a id="path-spec.txt-7.1"></a> + +## Measuring path construction success rates {#construction-success-rate} + +Clients maintain two counts for each of their guards: a count of the +number of times a circuit was extended to at least two hops through that +guard, and a count of the number of circuits that successfully complete +through that guard. The ratio of these two numbers is used to determine +a circuit success rate for that Guard. + +[Circuit build timeouts](./learning-timeouts.md) +are counted as construction failures if the +circuit fails to complete before the 95% "right-censored" timeout +interval, not the 80% timeout condition. + +If a circuit closes prematurely after construction but before being +requested to close by the client, this is counted as a failure. + +<a id="path-spec.txt-7.2"></a> + +## Measuring path usage success rates {#usage-success-rate} + +Clients maintain two usage counts for each of their guards: a count +of the number of usage attempts, and a count of the number of +successful usages. + +A usage attempt means any attempt to attach a stream to a circuit. + +Usage success status is temporarily recorded by state flags on circuits. +Guard usage success counts are not incremented until circuit close. A +circuit is marked as successfully used if we receive a properly +recognized RELAY cell on that circuit that was expected for the current +circuit purpose. 
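A minimal sketch of the two per-guard measurement pairs described in this section (the class and field names are illustrative assumptions, not identifiers from the Tor source):

```python
# Illustrative per-guard Path Bias accounting; names are assumptions.
class GuardPathBias:
    def __init__(self) -> None:
        self.circ_attempts = 0    # circuits extended to >= 2 hops
        self.circ_successes = 0   # circuits that completed construction
        self.use_attempts = 0     # circuits we tried to attach a stream to
        self.use_successes = 0    # circuits with an expected RELAY cell

    def construction_rate(self) -> float:
        return self.circ_successes / self.circ_attempts

    def usage_rate(self) -> float:
        return self.use_successes / self.use_attempts

g = GuardPathBias()
for completed in (True, True, False, True):
    g.circ_attempts += 1
    g.circ_successes += completed
print(g.construction_rate())  # 0.75
```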
+ +If subsequent stream attachments fail or time out, the successfully used +state of the circuit is cleared, causing it once again to be regarded +as a usage attempt only. + +Upon close by the client, all circuits that are still marked as usage +attempts are probed using a RELAY_BEGIN cell constructed with a +destination of the form 0.a.b.c:25, where a.b.c is a 24 bit random +nonce. If we get a RELAY_COMMAND_END in response matching our nonce, +the circuit is counted as successfully used. + +If any unrecognized RELAY cells arrive after the probe has been sent, +the circuit is counted as a usage failure. + +If the stream failure reason codes DESTROY, TORPROTOCOL, or INTERNAL +are received in response to any stream attempt, such circuits are not +probed and are declared usage failures. + +Prematurely closed circuits are not probed, and are counted as usage +failures. + +<a id="path-spec.txt-7.3"></a> + +## Scaling success counts {#scaling} + +To provide a moving average of recent Guard activity while +still preserving the ability to verify correctness, we periodically +"scale" the success counts by multiplying them by a scale factor +between 0 and 1.0. + +Scaling is performed when either usage or construction attempt counts +exceed a parametrized value. + +To avoid error due to scaling during circuit construction and use, +currently open circuits are subtracted from the usage counts before +scaling, and added back after scaling. + +<a id="path-spec.txt-7.4"></a> + +## Parametrization {#parameters} + +The following consensus parameters tune various aspects of the +defense. + +```text + pb_mincircs + Default: 150 + Min: 5 + Effect: This is the minimum number of circuits that must complete + at least 2 hops before we begin evaluating construction rates. + + pb_noticepct + Default: 70 + Min: 0 + Max: 100 + Effect: If the circuit success rate falls below this percentage, + we emit a notice log message. 
+
+ pb_warnpct
+ Default: 50
+ Min: 0
+ Max: 100
+ Effect: If the circuit success rate falls below this percentage,
+ we emit a warn log message.
+
+ pb_extremepct
+ Default: 30
+ Min: 0
+ Max: 100
+ Effect: If the circuit success rate falls below this percentage,
+ we emit a more alarmist warning log message. If
+ pb_dropguards is set to 1, we also disable the use of the
+ guard.
+
+ pb_dropguards
+ Default: 0
+ Min: 0
+ Max: 1
+ Effect: If the circuit success rate falls below pb_extremepct,
+ when pb_dropguards is set to 1, we disable use of that
+ guard.
+
+ pb_scalecircs
+ Default: 300
+ Min: 10
+ Effect: After this many circuits have completed at least two hops,
+ Tor performs the scaling described in
+ ["Scaling success counts"](#scaling).
+
+ pb_multfactor and pb_scalefactor
+ Default: 1/2
+ Min: 0.0
+ Max: 1.0
+ Effect: The double-precision result obtained from
+ pb_multfactor/pb_scalefactor is multiplied by our current
+ counts to scale them.
+
+ pb_minuse
+ Default: 20
+ Min: 3
+ Effect: This is the minimum number of circuits that we must attempt to
+ use before we begin evaluating usage rates.
+
+ pb_noticeusepct
+ Default: 80
+ Min: 3
+ Effect: If the circuit usage success rate falls below this percentage,
+ we emit a notice log message.
+
+ pb_extremeusepct
+ Default: 60
+ Min: 3
+ Effect: If the circuit usage success rate falls below this percentage,
+ we emit a warning log message. We also disable the use of the
+ guard if pb_dropguards is set.
+
+ pb_scaleuse
+ Default: 100
+ Min: 10
+ Effect: After we have attempted to use this many circuits,
+ Tor performs the scaling described in
+ ["Scaling success counts"](#scaling).
+```
+
+<a id="path-spec.txt-7.5"></a>
+
+## Known barriers to enforcement {#barriers}
+
+Due to intermittent CPU overload at relays, the normal rate of
+successful circuit completion is highly variable.
The Guard-dropping
+version of the defense is unlikely to be deployed until the ntor
+circuit handshake is enabled, or the nature of CPU-overload-induced
+failure is better understood.
diff --git a/spec/path-spec/general-operation.md b/spec/path-spec/general-operation.md
new file mode 100644
index 0000000..721ab3d
--- /dev/null
+++ b/spec/path-spec/general-operation.md
@@ -0,0 +1,97 @@
+<a id="path-spec.txt-1"></a>
+
+# General operation
+
+Tor begins building circuits as soon as it has
+[enough directory information](./when-we-build.md) to do so.
+Some circuits are
+built preemptively because we expect to need them later (for user
+traffic), and some are built because of immediate need (for user traffic
+that no current circuit can handle, for testing the network or our
+reachability, and so on).
+
+```text
+ [Newer versions of Tor (0.2.6.2-alpha and later):
+ If the consensus contains Exits (the typical case), Tor will build both
+ exit and internal circuits. When bootstrap completes, Tor will be ready
+ to handle an application requesting an exit circuit to services like the
+ World Wide Web.
+
+ If the consensus does not contain Exits, Tor will only build internal
+ circuits. In this case, earlier statuses will have included "internal"
+ as indicated above. When bootstrap completes, Tor will be ready to handle
+ an application requesting an internal circuit to hidden services at
+ ".onion" addresses.
+
+ If a future consensus contains Exits, exit circuits may become available.]
+```
+
+When a client application creates a new stream (by opening a SOCKS
+connection or launching a resolve request), we attach it to an appropriate
+open circuit if one exists, or wait if an appropriate circuit is
+in-progress. We launch a new circuit only
+if no current circuit can handle the request. We rotate circuits over
+time to avoid some profiling attacks.
+
+To build a circuit, we choose all the nodes we want to use, and then
+construct the circuit.
Sometimes, when we want a circuit that ends at a +given hop, and we have an appropriate unused circuit, we "cannibalize" the +existing circuit and extend it to the new terminus. + +These processes are described in more detail below. + +This document describes Tor's automatic path selection logic only; path +selection can be overridden by a controller (with the EXTENDCIRCUIT and +ATTACHSTREAM commands). Paths constructed through these means may +violate some constraints given below. + +<a id="path-spec.txt-1.1"></a> + +## Terminology + +A "path" is an ordered sequence of nodes, not yet built as a circuit. + +A "clean" circuit is one that has not yet been used for any traffic. + +A "fast" or "stable" or "valid" node is one that has the 'Fast' or +'Stable' or 'Valid' flag +set respectively, based on our current directory information. A "fast" +or "stable" circuit is one consisting only of "fast" or "stable" nodes. + +In an "exit" circuit, the final node is chosen based on waiting stream +requests if any, and in any case it avoids nodes with exit policy of +"reject *:*". An "internal" circuit, on the other hand, is one where +the final node is chosen just like a middle node (ignoring its exit +policy). + +A "request" is a client-side stream or DNS resolve that needs to be +served by a circuit. + +A "pending" circuit is one that we have started to build, but which has +not yet completed. + +A circuit or path "supports" a request if it is okay to use the +circuit/path to fulfill the request, according to the rules given below. +A circuit or path "might support" a request if some aspect of the request +is unknown (usually its target IP), but we believe the path probably +supports the request according to the rules given below. + +<a id="path-spec.txt-1.2"></a> + +## A relay's bandwidth {#bandwidth} + +Old versions of Tor did not report bandwidths in network status +documents, so clients had to learn them from the routers' advertised +relay descriptors. 
+
+For versions of Tor prior to 0.2.1.17-rc, everywhere below where we
+refer to a relay's "bandwidth", we mean its clipped advertised
+bandwidth, computed by taking the smaller of the 'rate' and
+'observed' arguments to the "bandwidth" element in the relay's
+descriptor. If a router's advertised bandwidth is greater than
+MAX_BELIEVABLE_BANDWIDTH (currently 10 MB/s), we clip to that
+value.
+
+For more recent versions of Tor, we take the bandwidth value declared
+in the consensus, and fall back to the clipped advertised bandwidth
+only if the consensus does not have bandwidths listed.
diff --git a/spec/path-spec/guard-nodes.md b/spec/path-spec/guard-nodes.md
new file mode 100644
index 0000000..af5750c
--- /dev/null
+++ b/spec/path-spec/guard-nodes.md
@@ -0,0 +1,44 @@
+<a id="path-spec.txt-5"></a>
+
+# Guard nodes
+
+We use Guard nodes (also called "helper nodes" in the research
+literature) to prevent certain profiling attacks. For an overview of
+our Guard selection algorithm -- which has grown rather complex -- see
+guard-spec.txt.
+
+<a id="path-spec.txt-5.1"></a>
+
+## How consensus bandwidth weights factor into entry guard selection {#bw-and-guards}
+
+When weighting a list of routers for choosing an entry guard, the following
+consensus parameters (from the "bandwidth-weights" line) apply:
+
+```text
+ Wgg - Weight for Guard-flagged nodes in the guard position
+ Wgm - Weight for non-flagged nodes in the guard position
+ Wgd - Weight for Guard+Exit-flagged nodes in the guard position
+ Wgb - Weight for BEGIN_DIR-supporting Guard-flagged nodes
+ Wmb - Weight for BEGIN_DIR-supporting non-flagged nodes
+ Web - Weight for BEGIN_DIR-supporting Exit-flagged nodes
+ Wdb - Weight for BEGIN_DIR-supporting Guard+Exit-flagged nodes
+```
+
+Please see "bandwidth-weights" in §3.4.1 of dir-spec.txt for more in-depth
+descriptions of these parameters.
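As a hedged sketch of how the guard-position weights above might be applied (the weight values, router data, and helper names here are invented for illustration; consensus weights are integers interpreted against a scale of 10000):

```python
import random

# Invented example data; real values come from the consensus.
def guard_weight(bw: int, is_guard: bool, is_exit: bool,
                 wgg: int, wgm: int, wgd: int) -> float:
    """Weight a router's bandwidth for the guard position."""
    if is_guard and is_exit:
        return bw * wgd / 10000.0
    if is_guard:
        return bw * wgg / 10000.0
    return bw * wgm / 10000.0

routers = [
    ("guard-only", 5000, True, False),
    ("guard+exit", 8000, True, True),
    ("unflagged", 3000, False, False),
]
# With Wgm = Wgd = 0 (illustrative), only the pure Guard node is eligible.
weights = [guard_weight(bw, g, e, wgg=5870, wgm=0, wgd=0)
           for _, bw, g, e in routers]
pick = random.choices([name for name, *_ in routers], weights=weights)[0]
print(pick)  # guard-only
```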
+ +If a router has been marked as both an entry guard and an exit, then we +prefer to use it more, with our preference for doing so (roughly) linearly +increasing w.r.t. the router's non-guard bandwidth and bandwidth weight +(calculated without taking the guard flag into account). From proposal +236: + +| +| Let Wpf denote the weight from the 'bandwidth-weights' line a +| client would apply to N for position p if it had the guard +| flag, Wpn the weight if it did not have the guard flag, and B the +| measured bandwidth of N in the consensus. Then instead of choosing +| N for position p proportionally to Wpf*B or Wpn*B, clients should +| choose N proportionally to F*Wpf*B + (1-F)*Wpn*B. + +where F is the weight as calculated using the above parameters. diff --git a/spec/path-spec/handling-failure.md b/spec/path-spec/handling-failure.md new file mode 100644 index 0000000..8db9245 --- /dev/null +++ b/spec/path-spec/handling-failure.md @@ -0,0 +1,19 @@ +<a id="path-spec.txt-2.5"></a> + +# Handling failure + +If an attempt to extend a circuit fails (either because the first create +failed or a subsequent extend failed) then the circuit is torn down and is +no longer pending. (XXXX really?) Requests that might have been +supported by the pending circuit thus become unsupported, and a new +circuit needs to be constructed. + +If a stream "begin" attempt fails with an EXITPOLICY error, we +decide that the exit node's exit policy is not correctly advertised, +so we treat the exit node as if it were a non-exit until we retrieve +a fresh descriptor for it. + +Excessive amounts of either type of failure can indicate an +attack on anonymity. +See [discussion of path bias detection](./detecting-route-manipulation.md) +for how excessive failure is handled. 
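The EXITPOLICY rule above might be sketched like this (the data model is an assumption made for illustration; C Tor tracks this state differently):

```python
class ExitView:
    """Client-side view of one exit node; illustrative only."""
    def __init__(self) -> None:
        self.usable_as_exit = True

    def on_stream_end(self, reason: str) -> None:
        if reason == "EXITPOLICY":
            # Advertised policy was wrong: treat as a non-exit for now.
            self.usable_as_exit = False

    def on_fresh_descriptor(self) -> None:
        # A newly fetched descriptor lifts the penalty.
        self.usable_as_exit = True

node = ExitView()
node.on_stream_end("EXITPOLICY")
assert not node.usable_as_exit
node.on_fresh_descriptor()
assert node.usable_as_exit
```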
diff --git a/spec/path-spec/hidden-service-related-circuits.md b/spec/path-spec/hidden-service-related-circuits.md new file mode 100644 index 0000000..fc6e30d --- /dev/null +++ b/spec/path-spec/hidden-service-related-circuits.md @@ -0,0 +1,5 @@ +<a id="path-spec.txt-4"></a> + +# Hidden-service related circuits + +XXX Tracking expected hidden service use (client-side and hidserv-side) diff --git a/spec/path-spec/index.md b/spec/path-spec/index.md new file mode 100644 index 0000000..55e5df8 --- /dev/null +++ b/spec/path-spec/index.md @@ -0,0 +1,21 @@ +# Tor Path Specification + +```text + Roger Dingledine + Nick Mathewson +``` + +Note: This is an attempt to specify Tor as currently implemented. Future +versions of Tor will implement improved algorithms. + +This document tries to cover how Tor chooses to build circuits and assign +streams to circuits. Other implementations MAY take other approaches, but +implementors should be aware of the anonymity and load-balancing implications +of their choices. + +THIS SPEC ISN'T DONE YET. + +The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL +NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and +"OPTIONAL" in this document are to be interpreted as described in +RFC 2119. diff --git a/spec/path-spec/learning-timeouts.md b/spec/path-spec/learning-timeouts.md new file mode 100644 index 0000000..e98bac3 --- /dev/null +++ b/spec/path-spec/learning-timeouts.md @@ -0,0 +1,305 @@ +<a id="path-spec.txt-2.4"></a> + +# Learning when to give up ("timeout") on circuit construction {#timing-out} + +Since version 0.2.2.8-alpha, Tor clients attempt to learn when to give +up on circuits based on network conditions. + +<a id="path-spec.txt-2.4.1"></a> + +## Distribution choice + +Based on studies of build times, we found that the distribution of +circuit build times appears to be a Frechet distribution (and a multi-modal +Frechet distribution, if more than one guard or bridge is used). 
However,
+estimators and quantile functions of the Frechet distribution are difficult
+to work with and slow to converge. So instead, since we are only interested
+in the accuracy of the tail, clients approximate the tail of the multi-modal
+distribution with a single Pareto curve.
+
+<a id="path-spec.txt-2.4.2"></a>
+
+## How much data to record {#observations}
+
+From our observations, the minimum number of circuit build times for a
+reasonable fit appears to be on the order of 100. However, to keep a
+good fit over the long term, clients store the 1000 most recent circuit build
+times in a circular array.
+
+These build times only include the times required to build three-hop
+circuits, and the times required to build the first three hops of circuits
+with more than three hops. Circuits of fewer than three hops are not
+recorded, and hops past the third are not recorded.
+
+The Tor client should build test circuits at a rate of one every 'cbttestfreq'
+(10 seconds) until 'cbtmincircs' (100 circuits) are built, with a maximum of
+'cbtmaxopencircs' (default: 10) circuits open at once. This allows a fresh
+Tor to have a CircuitBuildTimeout estimated within 30 minutes after install
+or network change
+(see [Detecting Changing Network Conditions](#changes-in-network) below).
+
+Timeouts are stored on disk in a histogram of 10ms bin width, the same
+width used to calculate the Xm value below. The timeouts recorded in the
+histogram must be shuffled after being read from disk, to preserve a
+proper expiration of old values after restart.
+
+Thus, some build time resolution is lost during restart. Implementations may
+choose a different persistence mechanism than this histogram, but be aware
+that build time binning is still needed for parameter estimation.
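The storage scheme above can be sketched as follows (the helper names and sample values are ours; only the 1000-entry circular buffer, the 10ms bins, and the shuffle-on-load step come from the text):

```python
import random
from collections import deque

BIN_MS = 10          # histogram bin width from the text
MAX_TIMES = 1000     # size of the circular array

def to_histogram(times):
    """Bin build times (ms) into 10ms-wide buckets for persistence."""
    hist = {}
    for t in times:
        b = (t // BIN_MS) * BIN_MS
        hist[b] = hist.get(b, 0) + 1
    return hist

def from_histogram(hist):
    """Re-expand bins at their midpoints and shuffle, so restored values
    carry no fake recency ordering into the expiration logic."""
    times = [b + BIN_MS // 2 for b, n in hist.items() for _ in range(n)]
    random.shuffle(times)
    return deque(times, maxlen=MAX_TIMES)

build_times = deque([512, 498, 1204, 733], maxlen=MAX_TIMES)
restored = from_histogram(to_histogram(build_times))
print(sorted(restored))  # [495, 515, 735, 1205] -- note the lost resolution
```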
+
+<a id="path-spec.txt-2.4.3"></a>
+
+## Parameter estimation
+
+Once 'cbtmincircs' build times are recorded, Tor clients update the
+distribution parameters and recompute the timeout on every circuit completion
+(though
+[see below](#changes-in-network)
+for when to pause and reset the timeout due to
+too many circuits timing out).
+
+Tor clients calculate the parameters for a Pareto distribution fitting the
+data using the maximum likelihood estimator. For derivation, see:
+<https://en.wikipedia.org/wiki/Pareto_distribution#Estimation_of_parameters>
+
+Because build times are not a true Pareto distribution, we alter how Xm is
+computed. In a max likelihood estimator, the mode of the distribution is
+used directly as Xm.
+
+Instead of using the mode of discrete build times directly, Tor clients
+compute the Xm parameter using the weighted average of the midpoints
+of the 'cbtnummodes' (10) most frequently occurring 10ms histogram bins.
+Ties are broken in favor of earlier bins (that is, in favor of bins
+corresponding to shorter build times).
+
+(The use of 10 modes was found to minimize error from the selected
+cbtquantile, with 10ms bins for quantiles 60-80, compared to many other
+heuristics.)
+
+To avoid ln(1.0+epsilon) precision issues, use log laws to rewrite the
+estimator for 'alpha' as the sum of logs followed by subtraction, rather
+than multiplication and division:
+
+`alpha = n/(Sum_n{ln(MAX(Xm, x_i))} - n*ln(Xm))`
+
+In this, n is the total number of build times that have completed, x_i is
+the ith recorded build time, and Xm is the weighted mode average of the
+x_i, as above.
+
+All times below Xm are counted as having the Xm value via the MAX(),
+because in Pareto estimators, Xm is supposed to be the lowest value.
+However, since clients use mode averaging to estimate Xm, there can be
+values below our Xm. Effectively, the Pareto estimator then treats
+everything smaller than Xm as having happened at Xm.
One can also see that if
+clients did not do this, alpha could underflow to become negative, which
+results in an exponential curve, not a Pareto probability distribution.
+
+The timeout itself is calculated by using the Pareto Quantile function (the
+inverted CDF) to give us the value on the CDF such that 80% of the mass
+of the distribution is below the timeout value (parameter 'cbtquantile').
+
+The Pareto Quantile Function (inverse CDF) is:
+
+`F(q) = Xm/((1.0-q)^(1.0/alpha))`
+
+Thus, clients obtain the circuit build timeout for 3-hop circuits by
+computing:
+
+`timeout_ms = F(0.8) # 'cbtquantile' == 0.8`
+
+With this, we expect that the Tor client will accept the fastest 80% of the
+total number of paths on the network.
+
+Clients obtain the circuit close time to completely abandon circuits as:
+
+`close_ms = F(0.99) # 'cbtclosequantile' == 0.99`
+
+To avoid waiting an unreasonably long period of time for circuits that
+simply have relays that are down, Tor clients cap timeout_ms at the max
+build time actually observed so far, and cap close_ms at twice this max,
+but at least 60 seconds:
+
+```text
+ timeout_ms = MIN(timeout_ms, max_observed_timeout)
+ close_ms = MAX(MIN(close_ms, 2*max_observed_timeout), 'cbtinitialtimeout')
+```
+
+<a id="path-spec.txt-2.4.3"></a>
+
+## Calculating timeout thresholds for circuits of different lengths {#different-lengths}
+
+The timeout_ms and close_ms estimates above are good only for 3-hop
+circuits, since only 3-hop circuits are recorded in the list of build
+times.
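Putting the pieces above together, here is a sketch of the estimator and the two thresholds (the sample build times and the Xm stand-in are invented; the formulas are the ones given above):

```python
import math

def pareto_alpha(times_ms, xm):
    """MLE for alpha using the log-law form with MAX(Xm, x_i)."""
    n = len(times_ms)
    return n / (sum(math.log(max(xm, x)) for x in times_ms) - n * math.log(xm))

def pareto_quantile(xm, alpha, q):
    """Inverse CDF: F(q) = Xm / (1-q)^(1/alpha)."""
    return xm / ((1.0 - q) ** (1.0 / alpha))

times = [400, 450, 500, 650, 900, 1400, 2600, 5100]  # invented samples
xm = 425  # stand-in for the weighted mode average
alpha = pareto_alpha(times, xm)
timeout_ms = pareto_quantile(xm, alpha, 0.8)    # 'cbtquantile' == 0.8
close_ms = pareto_quantile(xm, alpha, 0.99)     # 'cbtclosequantile' == 0.99
assert xm < timeout_ms < close_ms
```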
+
+To calculate the appropriate timeouts and close timeouts for circuits of
+other lengths, the client multiplies the timeout_ms and close_ms values
+by a scaling factor determined by the number of communication hops
+needed to build their circuits:
+
+```text
+timeout_ms[hops=N] = timeout_ms * Actions(N) / Actions(3)
+
+close_ms[hops=N] = close_ms * Actions(N) / Actions(3)
+```
+
+where `Actions(N) = N * (N + 1) / 2`.
+
+To calculate timeouts for operations other than circuit building,
+the client should add X to Actions(N) for every round-trip communication
+required with the Xth hop.
+
+<a id="path-spec.txt-2.4.4"></a>
+
+## How to record timeouts {#recording-timeouts}
+
+Pareto estimators begin to lose their accuracy if the tail is omitted.
+Hence, Tor clients actually calculate two timeouts: a usage timeout, and a
+close timeout.
+
+Circuits that pass the usage timeout are marked as measurement circuits,
+and are allowed to continue to build until the close timeout corresponding
+to the point 'cbtclosequantile' (default 99) on the Pareto curve, or 60
+seconds, whichever is greater.
+
+The actual completion times for these measurement circuits should be
+recorded.
+
+Implementations should completely abandon a circuit if the total build
+time exceeds the close threshold. Such closed circuits should be ignored,
+as this typically means one of the relays in the path is offline.
+
+<a id="path-spec.txt-2.4.5"></a>
+
+## Detecting Changing Network Conditions {#changes-in-network}
+
+Tor clients attempt to detect both network connectivity loss and drastic
+changes in the timeout characteristics.
+
+To detect changing network conditions, clients keep a history of
+the timeout or non-timeout status of the past 'cbtrecentcount' circuits
+(20 circuits) that successfully completed at least one hop.
If more than
+90% of these circuits time out, the client discards all buildtimes history,
+resets the timeout to 'cbtinitialtimeout' (60 seconds), and then begins
+recomputing the timeout.
+
+If the timeout was already at least `cbtinitialtimeout`,
+the client doubles the timeout.
+
+The records here (of how many circuits succeeded or failed among the most
+recent 'cbtrecentcount') are not stored as persistent state. On reload,
+we start with a new, empty state.
+
+<a id="path-spec.txt-2.4.6"></a>
+
+## Consensus parameters governing behavior {#parameters}
+
+Clients that implement circuit build timeout learning should obey the
+following consensus parameters that govern behavior, in order to allow
+us to handle bugs or other emergent behaviors due to client circuit
+construction. If these parameters are not present in the consensus,
+the listed default values should be used instead.
+
+```text
+ cbtdisabled
+ Default: 0
+ Min: 0
+ Max: 1
+ Effect: If 1, all CircuitBuildTime learning code should be
+ disabled and history should be discarded. For use in
+ emergency situations only.
+
+ cbtnummodes
+ Default: 10
+ Min: 1
+ Max: 20
+ Effect: This value governs how many modes to use in the weighted
+ average calculation of Pareto parameter Xm. Selecting Xm as the
+ average of multiple modes improves accuracy of the Pareto tail
+ for quantile cutoffs from 60-80% (see cbtquantile).
+
+ cbtrecentcount
+ Default: 20
+ Min: 3
+ Max: 1000
+ Effect: This is the number of circuit build outcomes (success vs
+ timeout) to keep track of for the following option.
+
+ cbtmaxtimeouts
+ Default: 18
+ Min: 3
+ Max: 10000
+ Effect: When this many timeouts happen in the last 'cbtrecentcount'
+ circuit attempts, the client should discard all of its
+ history and begin learning a fresh timeout value.
+
+ Note that if this parameter's value is greater than the value
+ of 'cbtrecentcount', then the history will never be
+ discarded because of this feature.
+
+ cbtmincircs
+ Default: 100
+ Min: 1
+ Max: 10000
+ Effect: This is the minimum number of circuits to build before
+ computing a timeout.
+
+ Note that if this parameter's value is higher than 1000 (the
+ number of time observations that a client keeps in its
+ circular buffer), circuit build timeout calculation is
+ effectively disabled, and the default timeouts are used
+ indefinitely.
+
+ cbtquantile
+ Default: 80
+ Min: 10
+ Max: 99
+ Effect: This is the position on the quantile curve to use to set the
+ timeout value. It is a percent (10-99).
+
+ cbtclosequantile
+ Default: 99
+ Min: Value of cbtquantile parameter
+ Max: 99
+ Effect: This is the position on the quantile curve to use to set the
+ timeout value to use to actually close circuits. It is a
+ percent (0-99).
+
+ cbttestfreq
+ Default: 10
+ Min: 1
+ Max: 2147483647 (INT32_MAX)
+ Effect: Describes how often in seconds to build a test circuit to
+ gather timeout values. Only applies if less than 'cbtmincircs'
+ have been recorded.
+
+ cbtmintimeout
+ Default: 10
+ Min: 10
+ Max: 2147483647 (INT32_MAX)
+ Effect: This is the minimum allowed timeout value in milliseconds.
+
+ cbtinitialtimeout
+ Default: 60000
+ Min: Value of cbtmintimeout
+ Max: 2147483647 (INT32_MAX)
+ Effect: This is the timeout value to use before we have enough data
+ to compute a timeout, in milliseconds. If we do not have
+ enough data to compute a timeout estimate (see cbtmincircs),
+ then we use this interval both for the close timeout and the
+ abandon timeout.
+
+ cbtlearntimeout
+ Default: 180
+ Min: 10
+ Max: 60000
+ Effect: This is how long idle circuits will be kept open while cbt is
+ learning a new timeout value.
+
+ cbtmaxopencircs
+ Default: 10
+ Min: 0
+ Max: 14
+ Effect: This is the maximum number of circuits that can be open
+ at the same time during the circuit build time learning phase.
+``` diff --git a/spec/path-spec/path-selection-constraints.md b/spec/path-spec/path-selection-constraints.md new file mode 100644 index 0000000..8cac4fc --- /dev/null +++ b/spec/path-spec/path-selection-constraints.md @@ -0,0 +1,146 @@ +<a id="path-spec.txt-2.2"></a> + +# Path selection and constraints + +We choose the path for each new circuit before we build it, +based on our current directory information. +(Clients and relays use the latest directory information they have; +directory authorities use their own opinions.) + +We choose the +exit node first, followed by the other nodes in the circuit, front to +back. (In other words, for a 3-hop circuit, we first pick hop 3, +then hop 1, then hop 2.) + +## Universal constraints + +All paths we generate obey the following +constraints: + +- We do not choose the same router twice for the same path. +- We do not choose any router in the same family as another in the same + path. (Two routers are in the same family if each one lists the other + in the "family" entries of its descriptor.) +- We do not choose more than one router in a given network range, + which defaults to /16 for IPv4 and /32 for IPv6. + (C Tor overrides this with `EnforceDistinctSubnets`; + Arti overrides this with `ipv[46]_subnet_family_prefix`.) +- The first node must be a Guard (see + discussion [below](./guard-nodes.md) and in the + [guard specification](../guard-spec)). +- XXXX Choosing the length + +## Special-purpose constraints + +Additionally, we may be building circuits with one or more requests in +mind. Each kind of request puts certain constraints on paths. + +Most circuits need to be "Fast". +For these, we only choose nodes with the `Fast` flag. +For non-"fast" circuits, nodes without the `Fast` flag are eligible. + +- TODO document which circuits (do not) need to be Fast. + +Similarly, some circuits need to be "Stable". +For these, we only choose nodes with the Stable flag. 
+ +- All service-side introduction circuits and all rendezvous paths + should be Stable. +- All connection requests for connections that we think will need to + stay open a long time require Stable circuits. Currently, Tor decides + this by examining the request's target port, and comparing it to a + list of "long-lived" ports. (Default: 21, 22, 706, 1863, 5050, + 5190, 5222, 5223, 6667, 6697, 8300.) + +<a id="path-spec.txt-2.2.1"></a> + +## Weighting node selection + +For all circuits, we weight node selection according to router bandwidth. + +We also weight the bandwidth of Exit and Guard flagged nodes depending on +the fraction of total bandwidth that they make up and depending upon the +position they are being selected for. + +These weights are published in the consensus, and are computed as described +in +["Computing Bandwidth Weights"](../dir-spec/computing-consensus.md#computing-bandwidth-weights) +in the directory specification. +They are: + +```text + Wgg - Weight for Guard-flagged nodes in the guard position + Wgm - Weight for non-flagged nodes in the guard Position + Wgd - Weight for Guard+Exit-flagged nodes in the guard Position + + Wmg - Weight for Guard-flagged nodes in the middle Position + Wmm - Weight for non-flagged nodes in the middle Position + Wme - Weight for Exit-flagged nodes in the middle Position + Wmd - Weight for Guard+Exit flagged nodes in the middle Position + + Weg - Weight for Guard flagged nodes in the exit Position + Wem - Weight for non-flagged nodes in the exit Position + Wee - Weight for Exit-flagged nodes in the exit Position + Wed - Weight for Guard+Exit-flagged nodes in the exit Position + + Wgb - Weight for BEGIN_DIR-supporting Guard-flagged nodes + Wmb - Weight for BEGIN_DIR-supporting non-flagged nodes + Web - Weight for BEGIN_DIR-supporting Exit-flagged nodes + Wdb - Weight for BEGIN_DIR-supporting Guard+Exit-flagged nodes + + Wbg - Weight for Guard+Exit-flagged nodes for BEGIN_DIR requests + Wbm - Weight for 
Guard+Exit-flagged nodes for BEGIN_DIR requests + Wbe - Weight for Guard+Exit-flagged nodes for BEGIN_DIR requests + Wbd - Weight for Guard+Exit-flagged nodes for BEGIN_DIR requests +``` + +If any of those weights is malformed or not present in a consensus, +clients proceed with the regular path selection algorithm setting +the weights to the default value of 10000. + + +## Choosing an exit + +If we know what IP address we want to connect to, we can +trivially tell whether a given router will support it by simulating +its declared exit policy. + +(DNS resolve requests are only sent to relays whose +exit policy is not equivalent to "reject *:*".) + +Because we often connect to addresses of the form hostname:port, we do not +always know the target IP address when we select an exit node. In these +cases, we need to pick an exit node that "might support" connections to a +given address port with an unknown address. An exit node "might support" +such a connection if any clause that accepts any connections to that port +precedes all clauses (if any) that reject all connections to that port. + +Unless requested to do so by the user, we never choose an exit node +flagged as "BadExit" by more than half of the authorities who advertise +themselves as listing bad exits. + +<a id="path-spec.txt-2.2.2"></a> + +## User configuration + +Users can alter the default behavior for path selection with configuration +options. + +```text + - If "ExitNodes" is provided, then every request requires an exit node on + the ExitNodes list. (If a request is supported by no nodes on that list, + and StrictExitNodes is false, then Tor treats that request as if + ExitNodes were not provided.) + + - "EntryNodes" and "StrictEntryNodes" behave analogously. 
+
+   - If a user tries to connect to or resolve a hostname of the form
+     <target>.<servername>.exit, the request is rewritten to a request for
+     <target>, and the request is only supported by the exit whose nickname
+     or fingerprint is <servername>.
+
+   - When set, "HSLayer2Nodes" and "HSLayer3Nodes" relax Tor's path
+     restrictions to allow nodes in the same /16 and node family to reappear
+     in the path. They also allow the guard node to be chosen as the RP, IP,
+     and HSDIR, and as the hop before those positions.
+```
diff --git a/spec/path-spec/server-descriptor-purposes.md b/spec/path-spec/server-descriptor-purposes.md
new file mode 100644
index 0000000..3a7dca0
--- /dev/null
+++ b/spec/path-spec/server-descriptor-purposes.md
@@ -0,0 +1,19 @@
+<a id="path-spec.txt-6"></a>
+
+# Server descriptor purposes
+
+There are currently three "purposes" supported for server descriptors:
+general, controller, and bridge. Most descriptors are of type general
+-- these are the ones listed in the consensus, and the ones fetched
+and used in normal cases.
+
+Controller-purpose descriptors are those delivered by the controller
+and labelled as such: they will be kept around (and expire like
+normal descriptors), and they can be used by the controller in its
+EXTENDCIRCUIT commands. Otherwise they are ignored by Tor when it
+chooses paths.
+
+Bridge-purpose descriptors are for routers that are used as bridges. See
+doc/design-paper/blocking.pdf for more design explanation, or proposal
+125 for specific details. Currently bridge descriptors are used in place
+of normal entry guards, for Tor clients that have UseBridges enabled.
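The three descriptor purposes above can be sketched as a small model. This is purely illustrative — the `DescriptorPurpose` enum and `usable_for_path_selection` helper are hypothetical names, not part of any Tor implementation:

```python
from dataclasses import dataclass
from enum import Enum, auto

class DescriptorPurpose(Enum):
    GENERAL = auto()     # listed in the consensus; fetched and used normally
    CONTROLLER = auto()  # delivered over the control port; kept but ignored
                         # when Tor chooses paths on its own
    BRIDGE = auto()      # used in place of entry guards when bridges are on

@dataclass
class Descriptor:
    fingerprint: str
    purpose: DescriptorPurpose

def usable_for_path_selection(desc: Descriptor, use_bridges: bool) -> bool:
    """Hypothetical check: may ordinary path selection use this descriptor?"""
    if desc.purpose is DescriptorPurpose.GENERAL:
        return True
    if desc.purpose is DescriptorPurpose.BRIDGE:
        # Bridge descriptors stand in for entry guards only when
        # UseBridges is enabled.
        return use_bridges
    # Controller-purpose descriptors are ignored in normal path choice.
    return False
```

The point of the sketch is that purpose acts as a filter: only general-purpose descriptors (plus bridge descriptors, when enabled) ever feed the path-selection algorithm.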
diff --git a/spec/path-spec/when-we-build.md b/spec/path-spec/when-we-build.md
new file mode 100644
index 0000000..7130978
--- /dev/null
+++ b/spec/path-spec/when-we-build.md
@@ -0,0 +1,177 @@
+<a id="path-spec.txt-2.1"></a>
+
+# When we build
+
+<a id="path-spec.txt-2.1.0"></a>
+
+## We don't build circuits until we have enough directory info
+
+There's a class of possible attacks where our directory servers
+only give us information about the relays that they would like us
+to use. To prevent this attack, we don't build multi-hop
+circuits
+(including
+[preemptive circuits](#preemptive),
+[on-demand circuits](#on-demand),
+[onion-service circuits](#onion-service),
+or [self-testing circuits](#self-test))
+for real traffic
+until we have enough directory information to be
+reasonably confident this attack isn't being done to us.
+
+Here, "enough" directory information is defined as:
+
+```text
+  * Having a consensus that's been valid at some point in the
+    last REASONABLY_LIVE_TIME interval (24 hours).
+
+  * Having enough descriptors that we could build at least some
+    fraction F of all bandwidth-weighted paths, without taking
+    ExitNodes/EntryNodes/etc into account.
+
+    (F is set by the PathsNeededToBuildCircuits option,
+    defaulting to the 'min_paths_for_circs_pct' consensus
+    parameter, with a final default value of 60%.)
+
+  * Having enough descriptors that we could build at least some
+    fraction F of all bandwidth-weighted paths, _while_ taking
+    ExitNodes/EntryNodes/etc into account.
+
+    (F is as above.)
+
+  * Having a descriptor for every one of the first
+    NUM_USABLE_PRIMARY_GUARDS guards among our primary guards. (see
+    guard-spec.txt)
+```
+
+We define the "fraction of bandwidth-weighted paths" as the product of
+these three fractions.
+
+```text
+  * The fraction of descriptors that we have for nodes with the Guard
+    flag, weighted by their bandwidth for the guard position.
+ * The fraction of descriptors that we have for all nodes, + weighted by their bandwidth for the middle position. + * The fraction of descriptors that we have for nodes with the Exit + flag, weighted by their bandwidth for the exit position. +``` + +If the consensus has zero weighted bandwidth for a given kind of +relay (Guard, Middle, or Exit), Tor instead uses the fraction of relays +for which it has the descriptor (not weighted by bandwidth at all). + +If the consensus lists zero exit-flagged relays, Tor instead uses the +fraction of middle relays. + +<a id="path-spec.txt-2.1.1"></a> + +## Clients build circuits preemptively {#preemptive} + +When running as a client, Tor tries to maintain at least a certain +number of clean circuits, so that new streams can be handled +quickly. To increase the likelihood of success, Tor tries to +predict what circuits will be useful by choosing from among nodes +that support the ports we have used in the recent past (by default +one hour). Specifically, on startup Tor tries to maintain one clean +fast exit circuit that allows connections to port 80, and at least +two fast clean stable internal circuits in case we get a resolve +request or hidden service request (at least three if we _run_ a +hidden service). + +After that, Tor will adapt the circuits that it preemptively builds +based on the requests it sees from the user: it tries to have two fast +clean exit circuits available for every port seen within the past hour +(each circuit can be adequate for many predicted ports -- it doesn't +need two separate circuits for each port), and it tries to have the +above internal circuits available if we've seen resolves or hidden +service activity within the past hour. If there are 12 or more clean +circuits open, it doesn't open more even if it has more predictions. + +Only stable circuits can "cover" a port that is listed in the +LongLivedPorts config option. 
Similarly, hidden service requests
+to ports listed in LongLivedPorts make us create stable internal
+circuits.
+
+Note that if there are no requests from the user for an hour, Tor
+will predict no use and build no preemptive circuits.
+
+The Tor client SHOULD NOT store its list of predicted requests to a
+persistent medium.
+
+<a id="path-spec.txt-2.1.2"></a>
+
+## Clients build circuits on demand {#on-demand}
+
+Additionally, when a client request exists that no circuit (built or
+pending) might support, we create a new circuit to support the request.
+For exit connections, we pick an exit node that will handle the
+most pending requests (choosing arbitrarily among ties), launch a
+circuit to end there, and repeat until every unattached request
+might be supported by a pending or built circuit. For internal
+circuits, we pick an arbitrary acceptable path, repeating as needed.
+
+Clients consider a circuit to become "dirty" as soon as a stream is
+attached to it, or some other request is performed over the circuit.
+If a circuit has been "dirty" for at least MaxCircuitDirtiness seconds,
+new circuits may not be attached to it.
+
+In some cases we can reuse an already established circuit if it's
+clean; see ["cannibalizing circuits"](./cannibalizing-circuits.md)
+for details.
+
+<a id="path-spec.txt-2.1.3"></a>
+
+## Relays build circuits for testing reachability and bandwidth {#self-test}
+
+Tor relays test reachability of their ORPort once they have
+successfully built a circuit (on startup and whenever their IP address
+changes). They build an ordinary fast internal circuit with themselves
+as the last hop. As soon as any testing circuit succeeds, the Tor
+relay decides it's reachable and is willing to publish a descriptor.
+
+We launch multiple testing circuits (one at a time), until we
+have NUM_PARALLEL_TESTING_CIRC (4) such circuits open.
Then we
+do a "bandwidth test" by sending a certain number of relay drop
+cells down each circuit: BandwidthRate * 10 / CELL_NETWORK_SIZE
+total cells divided across the four circuits, but never more than
+CIRCWINDOW_START (1000) cells total. This exercises both outgoing and
+incoming bandwidth, and helps to jumpstart the observed bandwidth
+(see dir-spec.txt).
+
+Tor relays also test reachability of their DirPort once they have
+established a circuit, but they use an ordinary exit circuit for
+this purpose.
+
+<a id="path-spec.txt-2.1.4"></a>
+
+## Hidden-service circuits {#onion-service}
+
+See ["Hidden-service-related circuits"](./hidden-service-related-circuits.md).
+
+<a id="path-spec.txt-2.1.5"></a>
+
+## Rate limiting of failed circuits
+
+If we fail to build a circuit N times in an X-second period
+(see ["Handling failure"](./handling-failure.md)
+for how this works), we stop building circuits until the X seconds
+have elapsed.
+XXXX
+
+<a id="path-spec.txt-2.1.6"></a>
+
+## When to tear down circuits
+
+Clients should tear down circuits (in general) only when those circuits
+have no streams on them. Additionally, clients should tear down
+stream-less circuits only under one of the following conditions:
+
+```text
+  - The circuit has never had a stream attached, and it was created too
+    long in the past (based on CircuitsAvailableTimeout or
+    cbtlearntimeout, depending on timeout estimate status).
+
+  - The circuit is dirty (has had a stream attached), and it has been
+    dirty for at least MaxCircuitDirtiness.
+```
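The teardown conditions above can be summarized in a short sketch. This is a hypothetical model, not real Tor code: the `Circuit` structure, its field names, and the `should_tear_down` helper are illustrative, and the timeout values must be supplied by the caller (corresponding to CircuitsAvailableTimeout/cbtlearntimeout and MaxCircuitDirtiness):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Circuit:
    num_streams: int              # streams currently attached
    created_at: float             # timestamp when the circuit finished building
    dirty_since: Optional[float]  # None until the first stream is attached

def should_tear_down(circ: Circuit, now: float,
                     circuits_available_timeout: float,
                     max_circuit_dirtiness: float) -> bool:
    """Return True if a stream-less client circuit may be torn down."""
    if circ.num_streams > 0:
        return False  # circuits carrying streams are left alone
    if circ.dirty_since is None:
        # Never had a stream: expire once it is older than the
        # availability timeout.
        return now - circ.created_at > circuits_available_timeout
    # Has carried a stream: expire after it has been dirty long enough.
    return now - circ.dirty_since >= max_circuit_dirtiness
```

The key design point the sketch captures is that the two timers never apply to the same circuit: a circuit is judged either by its age (if it was never used) or by how long it has been dirty (if it was), and never torn down while streams are attached.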