Diffstat (limited to 'doc/spec/path-spec.txt')

 -rw-r--r--   doc/spec/path-spec.txt | 423
 1 file changed, 0 insertions, 423 deletions
diff --git a/doc/spec/path-spec.txt b/doc/spec/path-spec.txt
deleted file mode 100644
index dceb21dad7..0000000000
--- a/doc/spec/path-spec.txt
+++ /dev/null
@@ -1,423 +0,0 @@

$Id$

                        Tor Path Specification

                           Roger Dingledine
                           Nick Mathewson

Note: This is an attempt to specify Tor as currently implemented.  Future
versions of Tor will implement improved algorithms.

This document tries to cover how Tor chooses to build circuits and assign
streams to circuits.  Other implementations MAY take other approaches, but
implementors should be aware of the anonymity and load-balancing implications
of their choices.

                   THIS SPEC ISN'T DONE YET.

1. General operation

   Tor begins building circuits as soon as it has enough directory
   information to do so (see section 5 of dir-spec.txt).  Some circuits are
   built preemptively because we expect to need them later (for user
   traffic), and some are built because of immediate need (for user traffic
   that no current circuit can handle, for testing the network or our
   reachability, and so on).

   When a client application creates a new stream (by opening a SOCKS
   connection or launching a resolve request), we attach it to an
   appropriate open circuit if one exists, or wait if an appropriate circuit
   is in progress.  We launch a new circuit only if no current circuit can
   handle the request.  We rotate circuits over time to avoid some profiling
   attacks.

   To build a circuit, we choose all the nodes we want to use, and then
   construct the circuit.  Sometimes, when we want a circuit that ends at a
   given hop, and we have an appropriate unused circuit, we "cannibalize"
   the existing circuit and extend it to the new terminus.

   These processes are described in more detail below.

   This document describes Tor's automatic path selection logic only; path
   selection can be overridden by a controller (with the EXTENDCIRCUIT and
   ATTACHSTREAM commands).  Paths constructed through these means may
   violate some of the constraints given below.

1.1. Terminology

   A "path" is an ordered sequence of nodes, not yet built as a circuit.

   A "clean" circuit is one that has not yet been used for any traffic.

   A "fast", "stable", or "valid" node is one that has the 'Fast', 'Stable',
   or 'Valid' flag set respectively, based on our current directory
   information.  A "fast" or "stable" circuit is one consisting only of
   "fast" or "stable" nodes.

   In an "exit" circuit, the final node is chosen based on waiting stream
   requests if any, and in any case it avoids nodes with an exit policy of
   "reject *:*".  An "internal" circuit, on the other hand, is one where the
   final node is chosen just like a middle node (ignoring its exit policy).

   A "request" is a client-side stream or DNS resolve that needs to be
   served by a circuit.

   A "pending" circuit is one that we have started to build, but which has
   not yet completed.

   A circuit or path "supports" a request if it is okay to use the
   circuit/path to fulfill the request, according to the rules given below.
   A circuit or path "might support" a request if some aspect of the request
   is unknown (usually its target IP), but we believe the path probably
   supports the request according to the rules given below.

2. Building circuits

2.1. When we build

2.1.1. Clients build circuits preemptively

   When running as a client, Tor tries to maintain at least a certain number
   of clean circuits, so that new streams can be handled quickly.  To
   increase the likelihood of success, Tor tries to predict what circuits
   will be useful by choosing from among nodes that support the ports we
   have used in the recent past (by default one hour).  Specifically, on
   startup Tor tries to maintain one clean fast exit circuit that allows
   connections to port 80, and at least two fast clean stable internal
   circuits in case we get a resolve request or hidden service request (at
   least three if we _run_ a hidden service).

   After that, Tor will adapt the circuits that it preemptively builds based
   on the requests it sees from the user: it tries to have two fast clean
   exit circuits available for every port seen within the past hour (each
   circuit can be adequate for many predicted ports -- it doesn't need two
   separate circuits for each port), and it tries to have the above internal
   circuits available if we've seen resolves or hidden service activity
   within the past hour.  If there are 12 or more clean circuits open, it
   doesn't open more even if it has more predictions.

   Only stable circuits can "cover" a port that is listed in the
   LongLivedPorts config option.  Similarly, hidden service requests to
   ports listed in LongLivedPorts make us create stable internal circuits.

   Note that if there are no requests from the user for an hour, Tor will
   predict no use and build no preemptive circuits.

   The Tor client SHOULD NOT store its list of predicted requests to a
   persistent medium.
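   As a non-normative illustration (the class, method, and variable names
   here are invented, not taken from the Tor source), the following Python
   sketch shows one way a client could track predicted ports and decide how
   many additional clean exit circuits it still wants:

      import time

      PREDICTION_WINDOW = 60 * 60   # remember requests for one hour
      CIRCS_PER_PORT = 2            # aim for two clean exit circuits per port
      MAX_CLEAN_CIRCUITS = 12       # don't open more clean circuits than this

      class PortPredictor:
          def __init__(self):
              self.last_seen = {}   # port -> time of most recent request

          def note_request(self, port):
              self.last_seen[port] = time.time()

          def predicted_ports(self):
              cutoff = time.time() - PREDICTION_WINDOW
              return [p for p, t in self.last_seen.items() if t >= cutoff]

          def exit_circuits_wanted(self, clean_exit_circuits):
              # clean_exit_circuits: one set of coverable ports per clean
              # exit circuit we already have open.
              wanted = 0
              for port in self.predicted_ports():
                  covering = sum(1 for ports in clean_exit_circuits
                                 if port in ports)
                  wanted += max(0, CIRCS_PER_PORT - covering)
              room = MAX_CLEAN_CIRCUITS - len(clean_exit_circuits)
              return max(0, min(wanted, room))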
2.1.2. Clients build circuits on demand

   Additionally, when a client request exists that no circuit (built or
   pending) might support, we create a new circuit to support the request.
   For exit connections, we pick an exit node that will handle the most
   pending requests (choosing arbitrarily among ties), launch a circuit to
   end there, and repeat until every unattached request might be supported
   by a pending or built circuit.  For internal circuits, we pick an
   arbitrary acceptable path, repeating as needed.

   In some cases we can reuse an already established circuit if it's clean;
   see Section 2.3 (cannibalizing circuits) for details.

2.1.3. Servers build circuits for testing reachability and bandwidth

   Tor servers test reachability of their ORPort once they have
   successfully built a circuit (on startup and whenever their IP address
   changes).  They build an ordinary fast internal circuit with themselves
   as the last hop.  As soon as any testing circuit succeeds, the Tor server
   decides it's reachable and is willing to publish a descriptor.

   We launch multiple testing circuits (one at a time), until we have
   NUM_PARALLEL_TESTING_CIRC (4) such circuits open.  Then we do a
   "bandwidth test" by sending a certain number of relay drop cells down
   each circuit: BandwidthRate * 10 / CELL_NETWORK_SIZE total cells divided
   across the four circuits, but never more than CIRCWINDOW_START (1000)
   cells total.  This exercises both outgoing and incoming bandwidth, and
   helps to jumpstart the observed bandwidth (see dir-spec.txt).

   Tor servers also test reachability of their DirPort once they have
   established a circuit, but they use an ordinary exit circuit for this
   purpose.
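   The arithmetic above is simple enough to show directly.  The sketch below
   is non-normative; it assumes the usual 512-byte on-the-wire cell size for
   CELL_NETWORK_SIZE, and the function name is invented:

      CELL_NETWORK_SIZE = 512            # assumed size of a cell on the wire
      CIRCWINDOW_START = 1000
      NUM_PARALLEL_TESTING_CIRC = 4

      def bandwidth_test_cells(bandwidth_rate):
          """Total relay drop cells for the bandwidth test, and the share
          sent down each of the four testing circuits.  bandwidth_rate is
          the configured BandwidthRate in bytes per second."""
          total = bandwidth_rate * 10 // CELL_NETWORK_SIZE
          total = min(total, CIRCWINDOW_START)
          return total, total // NUM_PARALLEL_TESTING_CIRC

      # Example: with BandwidthRate = 100 KB/s this yields
      # min(100000 * 10 / 512, 1000) = 1000 cells, i.e. 250 per circuit.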
2.1.4. Hidden-service circuits

   See section 4 below.

2.1.5. Rate limiting of failed circuits

   If we fail to build a circuit N times in an X second period (see Section
   2.3 for how this works), we stop building circuits until the X seconds
   have elapsed.
   XXXX

2.1.6. When to tear down circuits

   XXXX

2.2. Path selection and constraints

   We choose the path for each new circuit before we build it.  We choose
   the exit node first, followed by the other nodes in the circuit.  All
   paths we generate obey the following constraints:
     - We do not choose the same router twice for the same path.
     - We do not choose any router in the same family as another in the
       same path.
     - We do not choose more than one router in a given /16 subnet (unless
       EnforceDistinctSubnets is 0).
     - We don't choose any non-running or non-valid router unless we have
       been configured to do so.  By default, we are configured to allow
       non-valid routers in "middle" and "rendezvous" positions.
     - If we're using Guard nodes, the first node must be a Guard (see
       section 5 below).
   XXXX Choosing the length

   For circuits that do not need to be "fast", when choosing among multiple
   candidates for a path element, we choose randomly.

   For "fast" circuits, we pick a given router as an exit with probability
   proportional to its advertised bandwidth [the smaller of the 'rate' and
   'observed' arguments to the "bandwidth" element in its descriptor].  If a
   router's advertised bandwidth is greater than MAX_BELIEVABLE_BANDWIDTH
   (currently 10 MB/s), we clip to that value.

   For non-exit positions on "fast" circuits, we pick routers as above, but
   we weight the clipped advertised bandwidth of Exit-flagged nodes
   depending on the fraction of bandwidth available from non-Exit nodes.
   Call the total clipped advertised bandwidth for Exit nodes under
   consideration E, and the total clipped advertised bandwidth for all nodes
   under consideration T.  If E < T/3, we do not consider Exit-flagged
   nodes.  Otherwise, we weight their bandwidth with the factor (E-T/3)/E.
   This ensures that bandwidth is evenly distributed over nodes in 3-hop
   paths.

   Similarly, guard nodes are weighted by the factor (G-T/3)/G, where G is
   the total clipped advertised bandwidth for Guard-flagged nodes, and they
   are not considered for non-guard positions if this value is less than 0.
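   A non-normative sketch of this weighting, for choosing a node in a
   non-exit position on a "fast" circuit (the Router type and its field
   names are invented for illustration; guard weighting with the factor
   (G-T/3)/G would be applied analogously):

      import random
      from collections import namedtuple

      Router = namedtuple("Router", ["nickname", "advertised_bw", "is_exit"])

      MAX_BELIEVABLE_BANDWIDTH = 10 * 1024 * 1024   # 10 MB/s, in bytes/sec

      def clipped_bw(router):
          return min(router.advertised_bw, MAX_BELIEVABLE_BANDWIDTH)

      def choose_for_non_exit_position(routers):
          T = sum(clipped_bw(r) for r in routers)
          E = sum(clipped_bw(r) for r in routers if r.is_exit)
          if E < T / 3:
              exit_factor = 0.0          # don't use Exit-flagged nodes here
          else:
              exit_factor = (E - T / 3) / E
          weights = [clipped_bw(r) * (exit_factor if r.is_exit else 1.0)
                     for r in routers]
          return random.choices(routers, weights=weights, k=1)[0]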
   Additionally, we may be building circuits with one or more requests in
   mind.  Each kind of request puts certain constraints on paths:
     - All service-side introduction circuits and all rendezvous paths
       should be Stable.
     - All connection requests for connections that we think will need to
       stay open a long time require Stable circuits.  Currently, Tor
       decides this by examining the request's target port, and comparing
       it to a list of "long-lived" ports.  (Default: 21, 22, 706, 1863,
       5050, 5190, 5222, 5223, 6667, 6697, 8300.)
     - DNS resolves require an exit node whose exit policy is not
       equivalent to "reject *:*".
     - Reverse DNS resolves require a version of Tor with advertised
       eventdns support (available in Tor 0.1.2.1-alpha-dev and later).
     - All connection requests require an exit node whose exit policy
       supports their target address and port (if known), or which "might
       support" it (if the address isn't known).  See 2.2.1.
     - Rules for Fast? XXXXX

2.2.1. Choosing an exit

   If we know what IP address we want to connect to or resolve, we can
   trivially tell whether a given router will support it by simulating its
   declared exit policy.

   Because we often connect to addresses of the form hostname:port, we do
   not always know the target IP address when we select an exit node.  In
   these cases, we need to pick an exit node that "might support"
   connections to a given port when the target address is unknown.  An exit
   node "might support" such a connection if any clause that accepts any
   connections to that port precedes all clauses (if any) that reject all
   connections to that port.

   Unless requested to do so by the user, we never choose an exit server
   flagged as "BadExit" by more than half of the authorities who advertise
   themselves as listing bad exits.
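   A non-normative sketch of this "might support" test.  The policy
   representation -- an ordered list of (accept, address_pattern, ports)
   clauses -- and the function name are invented for illustration:

      def might_support_port(policy, port):
          """True if some clause accepting connections to `port` precedes
          every clause that rejects all connections to `port`."""
          for accept, address_pattern, ports in policy:
              if port not in ports:
                  continue
              if accept:
                  return True       # accepts at least some address on this port
              if address_pattern == "*":
                  return False      # rejects every address on this port
          return False

      # Example: a policy of "accept *:80, reject *:*" might support
      # connections to port 80, but not to port 443.
      policy = [(True, "*", {80}), (False, "*", range(1, 65536))]
      assert might_support_port(policy, 80)
      assert not might_support_port(policy, 443)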
2.2.2. User configuration

   Users can alter the default behavior for path selection with
   configuration options.

   - If "ExitNodes" is provided, then every request requires an exit node
     on the ExitNodes list.  (If a request is supported by no nodes on that
     list, and StrictExitNodes is false, then Tor treats that request as if
     ExitNodes were not provided.)

   - "EntryNodes" and "StrictEntryNodes" behave analogously.

   - If a user tries to connect to or resolve a hostname of the form
     <target>.<servername>.exit, the request is rewritten to a request for
     <target>, and the request is only supported by the exit whose nickname
     or fingerprint is <servername>.

2.3. Cannibalizing circuits

   If we need a circuit and have a clean one already established, in some
   cases we can adapt the clean circuit for our new purpose.  Specifically,

   For hidden service interactions, we can "cannibalize" a clean internal
   circuit if one is available, so we don't need to build those circuits
   from scratch on demand.

   We can also cannibalize clean circuits when the client asks to exit at a
   given node -- either via the ".exit" notation or because the destination
   is running at the same location as an exit node.

2.4. Handling failure

   If an attempt to extend a circuit fails (either because the first create
   failed or a subsequent extend failed) then the circuit is torn down and
   is no longer pending.  (XXXX really?)  Requests that might have been
   supported by the pending circuit thus become unsupported, and a new
   circuit needs to be constructed.

   If a stream "begin" attempt fails with an EXITPOLICY error, we decide
   that the exit node's exit policy is not correctly advertised, so we
   treat the exit node as if it were a non-exit until we retrieve a fresh
   descriptor for it.

   XXXX

3. Attaching streams to circuits

   When a circuit that might support a request is built, Tor tries to
   attach the request's stream to the circuit and sends a BEGIN, BEGIN_DIR,
   or RESOLVE relay cell as appropriate.  If the request completes
   unsuccessfully, Tor considers the reason given in the CLOSE relay cell.
   [XXX yes, and?]

   After a request has remained unattached for SocksTimeout (2 minutes by
   default), Tor abandons the attempt and signals an error to the client as
   appropriate (e.g., by closing the SOCKS connection).

   XXX Timeouts and when Tor auto-retries.
     * What stream-end-reasons are appropriate for retrying.

   If there is no reply to the BEGIN/RESOLVE cell, the stream will time out
   and fail.

4. Hidden-service related circuits

   XXX Tracking expected hidden service use (client-side and hidserv-side)

5. Guard nodes

   We use Guard nodes (also called "helper nodes" in the literature) to
   prevent certain profiling attacks.  Here's the risk: if we choose entry
   and exit nodes at random, and an attacker controls C out of N servers
   (ignoring advertised bandwidth), then the attacker will control the
   entry and exit node of any given circuit with probability (C/N)^2.  As
   we make many different circuits over time, the probability that the
   attacker will see a sample of about (C/N)^2 of our traffic goes to 1.
   Since statistical sampling works, the attacker can be sure of learning a
   profile of our behavior.

   If, on the other hand, we picked an entry node and held it fixed, we
   would have probability C/N of choosing a bad entry and being profiled,
   and probability (N-C)/N of choosing a good entry and not being profiled.

   When guard nodes are enabled, Tor maintains an ordered list of entry
   nodes as our chosen guards, and stores this list persistently to disk.
   If a Guard node becomes unusable, rather than replacing it, Tor adds new
   guards to the end of the list.  When choosing the first hop of a
   circuit, Tor chooses at random from among the first NumEntryGuards
   (default 3) usable guards on the list.  If there are not at least 2
   usable guards on the list, Tor adds routers until there are, or until
   there are no more usable routers to add.

   A guard is unusable if any of the following hold:
     - it is not marked as a Guard by the networkstatuses,
     - it is not marked Valid (and the user hasn't allowed invalid nodes in
       the entry position),
     - it is not marked Running, or
     - Tor couldn't reach it the last time it tried to connect.

   A guard is unusable for a particular circuit if any of the rules for
   path selection in 2.2 are not met.  In particular, if the circuit is
   "fast" and the guard is not Fast, or if the circuit is "stable" and the
   guard is not Stable, or if the guard has already been chosen as the exit
   node in that circuit, Tor can't use it as a guard node for that circuit.

   If a guard is excluded because of its status in the networkstatuses for
   over 30 days, Tor removes it from the list entirely, preserving the
   order of the remaining guards.

   If Tor fails to connect to an otherwise usable guard, it retries
   periodically: every hour for six hours, every 4 hours for 3 days, every
   18 hours for a week, and every 36 hours thereafter.  Additionally, Tor
   retries unreachable guards the first time it adds a new guard to the
   list, since it is possible that the old guards were only marked as
   unreachable because the network was unreachable or down.

   Tor does not add a guard persistently to the list until the first time
   we have connected to it successfully.
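   A non-normative sketch of the retry schedule above (the function name is
   invented): given how long a guard has been marked unreachable, it returns
   how long to wait before the next connection attempt.

      HOUR = 60 * 60
      DAY = 24 * HOUR

      def guard_retry_interval(time_since_failure):
          """Seconds to wait before retrying an unreachable guard, given the
          number of seconds since it first became unreachable."""
          if time_since_failure < 6 * HOUR:
              return HOUR           # every hour for six hours
          if time_since_failure < 3 * DAY:
              return 4 * HOUR       # every 4 hours for 3 days
          if time_since_failure < 7 * DAY:
              return 18 * HOUR      # every 18 hours for a week
          return 36 * HOUR          # every 36 hours thereafter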
6. Router descriptor purposes

   There are currently three "purposes" supported for router descriptors:
   general, controller, and bridge.  Most descriptors are of type general
   -- these are the ones listed in the consensus, and the ones fetched and
   used in normal cases.

   Controller-purpose descriptors are those delivered by the controller and
   labelled as such: they will be kept around (and expire like normal
   descriptors), and they can be used by the controller in its
   EXTENDCIRCUIT commands.  Otherwise they are ignored by Tor when it
   chooses paths.

   Bridge-purpose descriptors are for routers that are used as bridges.
   See doc/design-paper/blocking.pdf for more design explanation, or
   proposal 125 for specific details.  Currently bridge descriptors are
   used in place of normal entry guards, for Tor clients that have
   UseBridges enabled.


X. Old notes

X.1. Do we actually do this?

   How to deal with the network being down:

   - While all helpers are down/unreachable and there are no established or
     on-the-way testing circuits, launch a testing circuit.  (Do this
     periodically in the same way we try to establish normal circuits when
     things are working normally.)

     (Testing circuits are a special type of circuit that streams won't
     attach to by accident.)

   - When a testing circuit succeeds, mark all helpers up and hold the
     testing circuit open.

   - If a connection to a helper succeeds, close all testing circuits.
     Else mark that helper down and try another.

   - If the last helper is marked down and we already have a testing
     circuit established, then add the first hop of that testing circuit to
     the end of our helper node list, close that testing circuit, and go
     back to square one.  (Actually, rather than closing the testing
     circuit, can we get away with converting it to a normal circuit and
     beginning to use it immediately?)

   [Do we actually do any of the above?  If so, let's spec it.  If not,
   let's remove it. -NM]

X.2. A thing we could do to deal with reachability.

   And as a bonus, it leads to an answer to Nick's attack ("If I pick my
   helper nodes all on 18.0.0.0:*, then I move, you'll know where I
   bootstrapped") -- the answer is to pick your original three helper nodes
   without regard for reachability.  Then the above algorithm will add some
   more that are reachable for you, and if you move somewhere, it's more
   likely (though not certain) that some of the originals will become
   useful.  Is that smart or just complex?

X.3. Some stuff that worries me about entry guards. 2006 Jun, Nickm.

   It is unlikely for two users to have the same set of entry guards.
   Observing a user is sufficient to learn its entry guards.  So, as we
   move around, entry guards make us linkable.  If we want to change guards
   when our location (IP? subnet?) changes, we have two bad options.  We
   could
     - Drop the old guards.  But if we go back to our old location, we'll
       not use our old guards.  For a laptop that sometimes gets used from
       work and sometimes from home, this is pretty fatal.
     - Remember the old guards as associated with the old location, and use
       them again if we ever go back to the old location.  This would be
       nasty, since it would force us to record where we've been.

   [Do we do any of this now?  If not, this should move into 099-misc or
   098-todo. -NM]