From 489f6185bff08278e648d944ec1a9b2d03443d21 Mon Sep 17 00:00:00 2001
From: Nick Mathewson
Date: Fri, 26 Jan 2007 01:59:50 +0000
Subject: Move specification documents into new doc/spec subdirectory.
 (Proposals, drafts, and bad ideas still remain in doc.)

svn:r9411
---
 doc/spec/path-spec.txt | 551 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 551 insertions(+)
 create mode 100644 doc/spec/path-spec.txt

diff --git a/doc/spec/path-spec.txt b/doc/spec/path-spec.txt
new file mode 100644
index 0000000000..dc6d25331f
--- /dev/null
+++ b/doc/spec/path-spec.txt
@@ -0,0 +1,551 @@
$Id$

                        Tor Path Specification

                           Roger Dingledine
                           Nick Mathewson

Note: This is an attempt to specify Tor as currently implemented.  Future
versions of Tor will implement improved algorithms.

This document tries to cover how Tor chooses to build circuits and assign
streams to circuits.  Other implementations MAY take other approaches, but
implementors should be aware of the anonymity and load-balancing implications
of their choices.

                  THIS SPEC ISN'T DONE OR CORRECT YET.

1. General operation

   Tor begins building circuits as soon as it has enough directory
   information to do so (see section 5.1 of dir-spec.txt).  Some circuits are
   built preemptively because we expect to need them later (for user
   traffic), and some are built because of immediate need (for user traffic
   that no current circuit can handle, for testing the network or our
   reachability, and so on).

   When a client application creates a new stream (by opening a SOCKS
   connection or launching a resolve request), we attach it to an appropriate
   open circuit if one exists, or wait if an appropriate circuit is in
   progress.  We launch a new circuit only if no current circuit can handle
   the request.  We rotate circuits over time to avoid some profiling
   attacks.

   To build a circuit, we choose all the nodes we want to use, and then
   construct the circuit.  Sometimes, when we want a circuit that ends at a
   given hop, and we have an appropriate unused circuit, we "cannibalize" the
   existing circuit and extend it to the new terminus.

   These processes are described in more detail below.

   This document describes Tor's automatic path selection logic only; path
   selection can be overridden by a controller (with the EXTENDCIRCUIT and
   ATTACHSTREAM commands).  Paths constructed through these means may
   violate some of the constraints given below.

1b. Terminology

   A "path" is an ordered sequence of nodes, not yet built as a circuit.

   A "clean" circuit is one that has not yet been used for any traffic.

   A "fast" or "stable" or "valid" node is one that has the 'Fast' or
   'Stable' or 'Valid' flag set respectively, based on our current directory
   information.  A "fast" or "stable" circuit is one consisting only of
   "fast" or "stable" nodes.

   In an "exit" circuit, the final node is chosen based on waiting stream
   requests if any, and in any case it avoids nodes with an exit policy of
   "reject *:*".  An "internal" circuit, on the other hand, is one where
   the final node is chosen just like a middle node (ignoring its exit
   policy).

   A "request" is a client-side stream or DNS resolve that needs to be
   served by a circuit.

   A "pending" circuit is one that we have started to build, but which has
   not yet completed.

   A circuit or path "supports" a request if it is okay to use the
   circuit/path to fulfill the request, according to the rules given below.

   A circuit or path "might support" a request if some aspect of the request
   is unknown (usually its target IP), but we believe the path probably
   supports the request according to the rules given below.
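
   [Illustration, non-normative: below is a rough Python sketch of how the
   two definitions above might be checked against a single exit policy.
   The clause representation, the helper names, and the "accept clause
   before any reject-all clause" reading of "might support" are invented
   for this example; they are not Tor's actual data structures or rules.]

      # A policy is an ordered list of (action, address, port) clauses,
      # where "*" matches anything.

      def supports(policy, addr, port):
          # Target address known: simulate the declared exit policy.
          for action, pol_addr, pol_port in policy:
              if pol_addr in ("*", addr) and pol_port in ("*", port):
                  return action == "accept"
          return False  # no clause matched; treat as rejected

      def might_support(policy, port):
          # Target address unknown: be optimistic if an accept clause for
          # this port comes before any clause rejecting every address on it.
          for action, pol_addr, pol_port in policy:
              if pol_port in ("*", port):
                  if action == "accept":
                      return True   # might match the unknown address
                  if pol_addr == "*":
                      return False  # all addresses on this port rejected
          return False

      # Example: with this policy, might_support(policy, "80") is True,
      # while supports(policy, "10.0.0.1", "80") is False.
      policy = [("reject", "10.0.0.1", "*"),
                ("accept", "*", "80"),
                ("reject", "*", "*")]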
2. Building circuits

2.1. When we build.

2.1.1. Clients build circuits preemptively

   When running as a client, Tor tries to maintain at least a certain
   number of clean circuits, so that new streams can be handled
   quickly.  To increase the likelihood of success, Tor tries to
   predict what circuits will be useful by choosing from among nodes
   that support the ports we have used in the recent past (by default
   one hour).  Specifically, on startup Tor tries to maintain one clean
   fast exit circuit that allows connections to port 80, and at least
   two internal circuits in case we get a resolve request or hidden
   service request (at least three internal circuits if we _run_ a
   hidden service).

   After that, Tor will adapt the circuits that it preemptively builds
   based on the requests it sees from the user: it tries to have a clean
   fast exit circuit available for every port seen recently (one circuit
   is adequate for many predicted ports -- it doesn't keep a separate
   circuit for each port), and it tries to have the above internal
   circuits available if we've seen resolves or hidden service activity
   recently.  If there are 12 clean circuits open, it doesn't open more
   even if it has more predictions.  Lastly, note that if there are no
   requests from the user for an hour, Tor will predict no use and build
   no preemptive circuits.

   The Tor client SHOULD NOT store its list of predicted requests to a
   persistent medium.

2.1.2. Clients build circuits on demand

   Additionally, when a client request exists that no circuit (built or
   pending) might support, we create a new circuit to support the request.
   We do so by picking a request arbitrarily, launching a circuit to
   support it, and repeating until every unattached request might be
   supported by a pending or built circuit.

   For hidden service interactions, we can "cannibalize" a clean internal
   circuit if one is available, so we don't need to build those circuits
   from scratch on demand.

   We can also cannibalize clean circuits when the client asks to exit
   at a given node -- either via mapaddress or the ".exit" notation,
   or because the destination is running at the same location as an
   exit node.

2.1.3. Servers build circuits for testing reachability

   Tor servers test reachability of their ORPort on startup and whenever
   their IP address changes.
   XXXX

2.1.4. Hidden-service circuits

   See section 4 below.

2.1.5. Rate limiting of failed circuits

   If we fail to build a circuit N times in an X second period (see section
   2.3 for how this works), we stop building circuits until the X seconds
   have elapsed.
   XXX

2.1.6. When to tear down circuits


2.2. Path selection and constraints

   We choose the path for each new circuit before we build it.  We choose
   the exit node first, followed by the other nodes in the circuit.  All
   paths we generate obey the following constraints (illustrated in the
   sketch after this list):
     - We do not choose the same router twice for the same path.
     - We do not choose any router in the same family as another in the
       same path.
     - We do not choose more than one router in a given /16 subnet
       (unless EnforceDistinctSubnets is 0).
     - We don't choose any non-running or non-valid router unless we have
       been configured to do so.  By default, we are configured to allow
       non-valid routers in "middle" and "rendezvous" positions.
     - If we're using Guard nodes, the first node must be a Guard (see
       section 5 below).
     - XXXX Choosing the length
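
   [Illustration, non-normative: a Python sketch of checking the listed
   constraints when considering a candidate node for a partial path.
   EnforceDistinctSubnets is the real configuration option named above;
   the Node fields and helper names are invented for the example.]

      import ipaddress
      from dataclasses import dataclass, field

      @dataclass
      class Node:                      # invented example structure
          identity: str                # fingerprint
          ip: str
          family: set = field(default_factory=set)  # declared family

      def same_slash16(a, b):
          # True if two IPv4 addresses share their first 16 bits.
          return (ipaddress.ip_address(a).packed[:2] ==
                  ipaddress.ip_address(b).packed[:2])

      def violates_constraints(path, cand, enforce_distinct_subnets=True):
          for node in path:
              if node.identity == cand.identity:
                  return True   # same router twice in one path
              if (node.identity in cand.family or
                      cand.identity in node.family):
                  return True   # two routers from the same family
              if enforce_distinct_subnets and same_slash16(node.ip, cand.ip):
                  return True   # more than one router in a /16
          return False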
   For circuits that do not need to be "fast", when choosing among
   multiple candidates for a path element, we choose randomly.

   For "fast" circuits, we pick a given router as an exit with probability
   proportional to its advertised bandwidth [the smaller of the 'rate' and
   'observed' arguments to the "bandwidth" element in its descriptor].  If a
   router's advertised bandwidth is greater than MAX_BELIEVABLE_BANDWIDTH
   (1.5 MB/s), we clip to that value.

   For non-exit positions on "fast" circuits, we pick routers as above, but
   we weight the clipped advertised bandwidth of Exit-flagged nodes depending
   on the fraction of bandwidth available from non-Exit nodes.  Call the
   total clipped advertised bandwidth for Exit nodes under consideration E,
   and the total clipped advertised bandwidth for non-Exit nodes under
   consideration N.  If E<N/2, we do not consider Exit-flagged nodes.
   Otherwise, we weight their bandwidth with the factor (E-N/2)/(N+E-N/2) ==
   (2E - N)/(2E + N).  This ensures that bandwidth is evenly distributed
   over nodes in 3-hop paths.

   Users can alter the default behavior for path selection with
   configuration options.  For example, if a user tries to connect to or
   resolve a hostname of the form <target>.<servername>.exit, the request
   is rewritten to a request for <target>, and the request is only
   supported by the exit whose nickname or fingerprint is <servername>.

2.3. Handling failure

   If an attempt to extend a circuit fails (either because the first create
   failed or a subsequent extend failed) then the circuit is torn down and
   is no longer pending.  (XXXX really?)  Requests that might have been
   supported by the pending circuit thus become unsupported, and a new
   circuit needs to be constructed.

   If a stream "begin" attempt fails with an EXITPOLICY error, we
   decide that the exit node's exit policy is not correctly advertised,
   so we treat the exit node as if it were a non-exit until we retrieve
   a fresh descriptor for it.

   XXXX

3. Attaching streams to circuits

   When a circuit that might support a request is built, Tor tries to attach
   the request's stream to the circuit and sends a BEGIN or RESOLVE relay
   cell as appropriate.  If the request completes unsuccessfully, Tor
   considers the reason given in the CLOSE relay cell.  [XXX yes, and?]

   After a request has remained unattached for [XXXX interval?], Tor
   abandons the attempt and signals an error to the client as appropriate
   (e.g., by closing the SOCKS connection).

   XXX Timeouts and when Tor auto-retries.
    * What stream-end-reasons are appropriate for retrying.

   If there is no reply to the BEGIN/RESOLVE, the stream will time out
   and fail.

4. Hidden-service related circuits

   XXX Tracking expected hidden service use (client-side and hidserv-side)

5. Guard nodes

   XXX writeme

6. Testing circuits



(From some emails by arma)

Right now the code exists to pick helper nodes, store our choices to
disk, and use them for our entry nodes.  But there are three topics
to tackle before I'm comfortable turning them on by default.  First,
how to handle churn: since Tor nodes are not always up, and sometimes
disappear forever, we need a plan for replacing missing helpers in a
safe way.  Second, we need a way to distinguish "the network is down"
from "all my helpers are down", also in a safe way.  Lastly, we need to
examine the situation where a client picks three crummy helper nodes
and is forever doomed to a lousy Tor experience.  Here's my plan:

How to handle churn.
  - Keep track of whether you have ever actually established a
    connection to each helper.  Any helper node in your list that you've
    never used is ok to drop immediately.  Also, we don't save that
    one to disk.
  - If all our helpers are down, we need more helper nodes: add a new
    one to the *end* of our list.  Only remove dead ones when they have
    been gone for a very long time (months).
  - Pick from the first n (by default 3) helper nodes in your list
    that are up (according to the network-statuses) and reachable
    (according to your local firewall config).
  - This means that order matters when writing/reading them to disk.
    (See the sketch after this list.)
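
(Non-normative sketch, in Python, of the churn rules above; the Helper
fields and function names are invented for the example, and the "very
long time" rule is simplified to "keep anything we ever connected to".)

    from dataclasses import dataclass

    @dataclass
    class Helper:                 # invented example structure
        identity: str
        ever_connected: bool      # have we ever actually reached it?
        listed_up: bool           # up according to the network-statuses
        reachable: bool           # allowed by local firewall config

    def usable_helpers(helpers, n=3):
        # Order matters: take the first n that are up and reachable.
        live = [h for h in helpers if h.listed_up and h.reachable]
        return live[:n]

    def maintain_helpers(helpers, pick_new_helper):
        # Helpers we never actually used may be dropped as soon as they
        # vanish; established ones are kept around much longer.
        helpers = [h for h in helpers if h.listed_up or h.ever_connected]
        # If every helper is down, grow the list at the *end*, so the
        # original choices keep priority when they come back.
        if not usable_helpers(helpers):
            helpers.append(pick_new_helper())
        return helpers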
How to deal with network down.
  - While all helpers are down/unreachable and there are no established
    or on-the-way testing circuits, launch a testing circuit.  (Do this
    periodically in the same way we try to establish normal circuits
    when things are working normally.)
    (Testing circuits are a special type of circuit, that streams won't
    attach to by accident.)
  - When a testing circuit succeeds, mark all helpers up and hold
    the testing circuit open.
  - If a connection to a helper succeeds, close all testing circuits.
    Else mark that helper down and try another.
  - If the last helper is marked down and we already have a testing
    circuit established, then add the first hop of that testing circuit
    to the end of our helper node list, close that testing circuit,
    and go back to square one.  (Actually, rather than closing the
    testing circuit, can we get away with converting it to a normal
    circuit and beginning to use it immediately?)

How to pick non-sucky helpers.
  - When we're picking a new helper node, don't use ones which aren't
    reachable according to our local ReachableAddresses configuration.
    (There's an attack here: if I pick my helper nodes in a very
    restrictive environment, say "ReachableAddresses 18.0.0.0/255.0.0.0:*",
    then somebody watching me use the network from another location will
    guess where I first joined the network.  But let's ignore it for now.)
  - Right now we choose new helpers just like we'd choose any entry
    node: they must be "stable" (claim >1 day uptime) and "fast" (advertise
    >10kB capacity).  In 0.1.1.11-alpha, clients let dirservers define
    "stable" and "fast" however they like, and they just believe them.
    So the next step is to make them a function of the current network:
    e.g. line up all the 'up' nodes in order and declare the top
    three-quarters to be stable, fast, etc, as long as they meet some
    minimum too.  (See the sketch after this list.)
  - If that's not sufficient (it won't be), dirservers should introduce
    a new status flag: in addition to "stable" and "fast", we should
    also describe certain nodes as "entry", meaning they are suitable
    to be chosen as a helper.  The first difference would be that we'd
    demand the top half rather than the top three-quarters.  Another
    requirement would be to look at "mean time between returning" to
    ensure that these nodes spend most of their time available.  (Up for
    two days straight, once a month, is not good enough.)
  - Lastly, we need a function, given our current set of helpers and a
    directory of the rest of the network, that decides when our helper
    set has become "too crummy" and we need to add more.  For example,
    this could be based on the currently advertised capacity of each of
    our helpers, and it would also be based on the user's preferences
    of speed vs. security.
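
(Non-normative sketch, in Python, of the "function of the current
network" idea above: rank the 'up' nodes, declare the top three-quarters
stable/fast, and still demand the stated minimums of >1 day uptime and
>10kB capacity.  The node fields are invented for the example.)

    from dataclasses import dataclass

    @dataclass
    class UpNode:                      # invented example structure
        uptime: int                    # seconds
        advertised_capacity: int       # bytes/sec
        stable: bool = False
        fast: bool = False

    def assign_flags(up_nodes, min_uptime=24*60*60, min_capacity=10*1024):
        cutoff = (3 * len(up_nodes)) // 4       # top three-quarters
        by_uptime = sorted(up_nodes, key=lambda n: n.uptime, reverse=True)
        for i, n in enumerate(by_uptime):
            n.stable = i < cutoff and n.uptime >= min_uptime
        by_capacity = sorted(up_nodes,
                             key=lambda n: n.advertised_capacity,
                             reverse=True)
        for i, n in enumerate(by_capacity):
            n.fast = i < cutoff and n.advertised_capacity >= min_capacity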
***

Lasse wrote:
> I am a bit concerned with performance if we are to have e.g. two out of
> three helper nodes down or unreachable. How often should Tor check if
> they are back up and running?

Right now Tor believes a threshold of directory servers when deciding
whether each server is up.  When Tor observes a server to be down
(connection failed or building the first hop of the circuit failed),
it marks it as down and doesn't try it again, until it gets a new
network-status from somebody, at which point it takes a new consensus
and marks the appropriate servers as up.

According to sec 5.1 of dir-spec.txt, the client will try to fetch a new
network-status at least every 30 minutes, and more often in certain cases.

With the proposed scheme, we'll also mark all our helpers as up shortly
after the last one is marked down.

> When should there be
> added an extra node to the helper node list? This is kind of an
> important threshold?

I agree, this is an important question.  I don't have a good answer yet.  Is
it terrible, anonymity-wise, to add a new helper every time only one of
your helpers is up?  Notice that I say add rather than replace -- so you'd
only use this fourth helper when one of your main three helpers is down,
and if three of your four are down, you'd add a fifth, but only use it
when two of the first four are down, etc.

In fact, this may be smarter than just picking a random node for your
testing circuit, because if your network goes up and down a lot, then
eventually you have a chance of using any entry node in the network for
your testing circuit.

We have a design choice here.  Do we only try to use helpers for the
connections that will have streams on them (revealing our communication
partners), or do we also want to restrict the overall set of nodes that
we'll connect to, to discourage people from enumerating all Tor clients?

I'm increasingly of the belief that we want to hide our presence too,
based on the fact that Steven and George and others keep coming up with
attacks that start with "Assuming we know the set of users".

If so, then here's a revised "How to deal with network down" section
(sketched in code below):

  1) When a helper is marked down or the helper list shrinks, and as
     a result the total number of helpers that are either (up and
     reachable) or (reachable but never connected to) is <= 1, then pick
     a new helper and add it to the end of the list.
     [We count nodes that have never been connected to, since otherwise
     we might keep on adding new nodes before trying any of them.  By
     "reachable" I mean "is allowed by ReachableAddresses".]
  2) When you fail to connect to a helper that has never been connected
     to, you remove him from the list right then (and the above rule
     might kick in).
  3) When you succeed at connecting to a helper that you've never
     connected to before, mark all reachable helpers earlier in the list
     as up, and close that circuit.
     [We close the circuit, since if the other helpers are now up, we
     prefer to use them for circuits that will reveal communication
     partners.]

This certainly seems simpler.  Are there holes that I'm missing?
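
(Non-normative sketch, in Python, of those three rules, just to make the
bookkeeping concrete.  The field names are invented; marked_up stands
for the client's own up/down marking described above.)

    def need_new_helper(helpers):
        # Rule 1: count helpers that are (up and reachable) or (reachable
        # but never connected to); if <= 1, extend the list at the end.
        candidates = [h for h in helpers
                      if h.reachable and
                         (h.marked_up or not h.ever_connected)]
        return len(candidates) <= 1

    def on_connect_failed(helpers, h):
        # Rule 2: a never-used helper that fails is dropped immediately
        # (after which rule 1 may kick in).
        if not h.ever_connected:
            helpers.remove(h)

    def on_connect_succeeded(helpers, h):
        # Rule 3: the first successful use marks every reachable helper
        # earlier in the list as up (and the caller closes this circuit,
        # preferring those helpers for real traffic).
        if not h.ever_connected:
            for other in helpers[:helpers.index(h)]:
                if other.reachable:
                    other.marked_up = True
            h.ever_connected = True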
> If running from a laptop you will meet different firewall settings, so
> how should Helper Nodes settings keep up with moving from an open
> ReachableAddresses to a FascistFirewall setting after the helper nodes
> have been selected?

I added the word "reachable" to three places in the above list, and I
believe that totally solves this question.

And as a bonus, it leads to an answer to Nick's attack ("If I pick
my helper nodes all on 18.0.0.0:*, then I move, you'll know where I
bootstrapped") -- the answer is to pick your original three helper nodes
without regard for reachability.  Then the above algorithm will add some
more that are reachable for you, and if you move somewhere, it's more
likely (though not certain) that some of the originals will become useful.
Is that smart or just complex?

> What happens if (when?) performance of the third node is bad?

My above solution solves this a little bit, in that we always try to
have two nodes available.  But what if they are both up but bad?  I'm not
sure.  As my previous mail said, we need some function, given our list
of helpers and the network directory, that will tell us when we're in a
bad situation.  I can imagine some simple versions of this function --
for example, when both our working helpers are in the bottom half of
the nodes, ranked by capacity.  (See the sketch below.)

But the hard part: what's the remedy when we decide there's something
to fix?  Do we add a third, and now we have two crummy ones and a new
one?  Or do we drop one or both of the bad ones?

Perhaps we believe the latest claim from the network-status consensus,
and we count a helper the dirservers believe is crummy as "not worth
trying" (equivalent to "not reachable under our current ReachableAddresses
config") -- and then the above algorithm would end up adding good ones,
but we'd go back to the originals if they resume being acceptable?  That's
an appealing design.  I wonder if it will cause the typical Tor user to
have a helper node list that comprises most of the network, though.  I'm
ok with this.
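
(Non-normative sketch, in Python, of that simple "bad situation" test:
working helpers all in the bottom half of the network, ranked by
advertised capacity.  Field names are invented for the example.)

    def helpers_seem_crummy(helpers, all_nodes):
        # Rank the whole network by advertised capacity; worry if every
        # working helper falls in the bottom half.
        ranked = sorted(all_nodes, key=lambda n: n.advertised_capacity)
        bottom = {n.identity for n in ranked[:len(ranked) // 2]}
        working = [h for h in helpers if h.marked_up and h.reachable]
        return bool(working) and all(h.identity in bottom for h in working)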
> Another point you might want to keep in mind, is the possibility to
> reuse the code in order to add a second layer helper node (meaning node
> number two) to "protect" the first layer (node number one) helper nodes.
> These nodes should be tied to each of the first layer nodes. E.g. there
> is one helper node list, as described in your mail, for each of the
> first layer nodes, following their create/destroy.

True.  Does that require us to add a fourth hop to our path length,
since the first hop is from a limited set, the second hop is from a
limited set, and the third hop might also be constrained because, say,
we're asking for an unusual exit port?

> Another of the things might worth adding to the to do list is
> localization of server (helper) nodes. Making it possible to pick
> countries/regions where you do (not) want your helper nodes located. (As
> in "HelperNodesLocated us,!eu" etc.) I know this requires the use of
> external data and may not be worth it, but it _could_ be integrated at
> the directory servers only -- adding a list of node IP's and e.g. a
> country/region code to the directory and thus reduce the overhead. (?)
> Maybe extending the Family-term?

I think we are heading towards doing path selection based on geography,
but I don't have a good sense yet of how that will actually turn out --
that is, with what mechanism Tor clients will learn the information they
need.  But this seems to be something that is orthogonal to the rest of
this discussion, so I look forward to having somebody else solve it for
us, and fitting it in when it's ready. :)

> And I would like to keep an option to pick the first X helper nodes
> myself and then let Tor extend this list if these nodes are down (like
> EntryNodes in current code).  Even if this opens up for some new types
> of "relationship" attacks.

Good idea.  Here's how I'd like to name these:

The "EntryNodes" config option is a list of seed helper nodes.  When we
read EntryNodes, any node listed in EntryNodes but not in the current
helper node list gets *pre*pended to the helper node list.

The "NumEntryNodes" config option (currently called NumHelperNodes)
specifies the number of up, reachable, good-enough helper nodes that
will make up the pool of possible choices for first hop, counted from
the front of the helper node list until we have enough.

The "UseEntryNodes" config option (currently called UseHelperNodes)
tells us to turn on all this helper node behavior.  If you set EntryNodes,
then this option is implied.

The "StrictEntryNodes" config option, provided for backward compatibility
and for debugging, means a) we replace the helper node list with the
current EntryNodes list, and b) whenever we would do an operation that
alters the helper node list, we don't.  (Yes, this means that if all the
helper nodes are down, we lose until we mark them up again.  But this is
how it behaves now.)

> I am sure my next point has been asked before, but what about testing
> the current speed of the connections when looking for new helper nodes,
> not only testing the connectivity? I know this might contribute to a lot
> of overhead in the network, but if this only occurs e.g. when using
> helper nodes for a Hidden Service it might not have that large an impact,
> but could help availability for the services?

If we're just going to be testing them when we're first picking them,
then it seems we can do the same thing by letting the directory servers
test them.  This has the added benefit that all the (behaving) clients
use the same data, so they don't end up partitioned by a node that
(for example) performs selectively for his victims.

Another idea would be to periodically keep track of what speeds you get
through your helpers, and make decisions from this.  The reason we haven't
done this yet is because there are a lot of variables -- perhaps the
web site is slow, perhaps some other node in the path is slow, perhaps
your local network is slow briefly, perhaps you got unlucky, etc.  I
believe that over time (assuming the user has roughly the same browsing
habits) all of these would average out and you'd get a usable answer,
but I don't have a good sense of how long it would take to converge,
so I don't know whether this would be worthwhile.

> BTW. I feel comfortable with all the terms helper/entry/contact nodes,
> but I think you (the developers) should just pick one and stay with it
> to avoid confusion.

I think I'm going to try to co-opt the term 'Entry' node for this
purpose.  We're going to have to keep referring to helper nodes for the
research community for a while though, so they realize that Tor does
more than just let users ask for certain entry nodes.



============================================================
Some stuff that worries me about entry guards. 2006 Jun, Nickm.

1. It is unlikely for two users to have the same set of entry guards.

2. Observing a user is sufficient to learn its entry guards.

3. So, as we move around, we leak our