From 489f6185bff08278e648d944ec1a9b2d03443d21 Mon Sep 17 00:00:00 2001
From: Nick Mathewson
Date: Fri, 26 Jan 2007 01:59:50 +0000
Subject: Move specification documents into new doc/spec subdirectory.
 (Proposals, drafts, and bad ideas still remain in doc.)

svn:r9411
---
 doc/spec/path-spec.txt | 551 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 551 insertions(+)
 create mode 100644 doc/spec/path-spec.txt

diff --git a/doc/spec/path-spec.txt b/doc/spec/path-spec.txt
new file mode 100644
index 0000000000..dc6d25331f
--- /dev/null
+++ b/doc/spec/path-spec.txt
@@ -0,0 +1,551 @@
$Id$

                        Tor Path Specification

                           Roger Dingledine
                           Nick Mathewson

Note: This is an attempt to specify Tor as currently implemented.  Future
versions of Tor will implement improved algorithms.

This document tries to cover how Tor chooses to build circuits and assign
streams to circuits.  Other implementations MAY take other approaches, but
implementors should be aware of the anonymity and load-balancing implications
of their choices.

                  THIS SPEC ISN'T DONE OR CORRECT YET.

1. General operation

   Tor begins building circuits as soon as it has enough directory
   information to do so (see section 5.1 of dir-spec.txt).  Some circuits are
   built preemptively because we expect to need them later (for user
   traffic), and some are built because of immediate need (for user traffic
   that no current circuit can handle, for testing the network or our
   reachability, and so on).

   When a client application creates a new stream (by opening a SOCKS
   connection or launching a resolve request), we attach it to an appropriate
   open circuit if one exists, or wait if an appropriate circuit is in
   progress.  We launch a new circuit only if no current circuit can handle
   the request.  We rotate circuits over time to avoid some profiling
   attacks.

   To build a circuit, we choose all the nodes we want to use, and then
   construct the circuit.  Sometimes, when we want a circuit that ends at a
   given hop, and we have an appropriate unused circuit, we "cannibalize" the
   existing circuit and extend it to the new terminus.

   These processes are described in more detail below.

   This document describes Tor's automatic path selection logic only; path
   selection can be overridden by a controller (with the EXTENDCIRCUIT and
   ATTACHSTREAM commands).  Paths constructed through these means may
   violate some of the constraints given below.

1b. Terminology

   A "path" is an ordered sequence of nodes, not yet built as a circuit.

   A "clean" circuit is one that has not yet been used for any traffic.

   A "fast" or "stable" or "valid" node is one that has the 'Fast' or
   'Stable' or 'Valid' flag set respectively, based on our current directory
   information.  A "fast" or "stable" circuit is one consisting only of
   "fast" or "stable" nodes.

   In an "exit" circuit, the final node is chosen based on waiting stream
   requests if any, and in any case it avoids nodes with an exit policy of
   "reject *:*".  An "internal" circuit, on the other hand, is one where
   the final node is chosen just like a middle node (ignoring its exit
   policy).

   A "request" is a client-side stream or DNS resolve that needs to be
   served by a circuit.

   A "pending" circuit is one that we have started to build, but which has
   not yet completed.

   A circuit or path "supports" a request if it is okay to use the
   circuit/path to fulfill the request, according to the rules given below.

   A circuit or path "might support" a request if some aspect of the request
   is unknown (usually its target IP), but we believe the path probably
   supports the request according to the rules given below.
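
   [Illustration, non-normative: below is a rough Python sketch of how the
   two definitions above might be checked against a single exit policy.
   The clause representation, the helper names, and the "accept clause
   before any reject-all clause" reading of "might support" are invented
   for this example; they are not Tor's actual data structures or rules.]

      # A policy is an ordered list of (action, address, port) clauses,
      # where "*" matches anything.

      def supports(policy, addr, port):
          # Target address known: simulate the declared exit policy.
          for action, pol_addr, pol_port in policy:
              if pol_addr in ("*", addr) and pol_port in ("*", port):
                  return action == "accept"
          return False  # no clause matched; treat as rejected

      def might_support(policy, port):
          # Target address unknown: be optimistic if an accept clause for
          # this port comes before any clause rejecting every address on it.
          for action, pol_addr, pol_port in policy:
              if pol_port in ("*", port):
                  if action == "accept":
                      return True   # might match the unknown address
                  if pol_addr == "*":
                      return False  # all addresses on this port rejected
          return False

      # Example: with this policy, might_support(policy, "80") is True,
      # while supports(policy, "10.0.0.1", "80") is False.
      policy = [("reject", "10.0.0.1", "*"),
                ("accept", "*", "80"),
                ("reject", "*", "*")]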
2. Building circuits

2.1. When we build.

2.1.1. Clients build circuits preemptively

   When running as a client, Tor tries to maintain at least a certain
   number of clean circuits, so that new streams can be handled
   quickly.  To increase the likelihood of success, Tor tries to
   predict what circuits will be useful by choosing from among nodes
   that support the ports we have used in the recent past (by default
   one hour).  Specifically, on startup Tor tries to maintain one clean
   fast exit circuit that allows connections to port 80, and at least
   two internal circuits in case we get a resolve request or hidden
   service request (at least three internal circuits if we _run_ a
   hidden service).

   After that, Tor will adapt the circuits that it preemptively builds
   based on the requests it sees from the user: it tries to have a clean
   fast exit circuit available for every port seen recently (one circuit
   is adequate for many predicted ports -- it doesn't keep a separate
   circuit for each port), and it tries to have the above internal
   circuits available if we've seen resolves or hidden service activity
   recently.  If there are 12 clean circuits open, it doesn't open more
   even if it has more predictions.  Lastly, note that if there are no
   requests from the user for an hour, Tor will predict no use and build
   no preemptive circuits.

   The Tor client SHOULD NOT store its list of predicted requests to a
   persistent medium.

2.1.2. Clients build circuits on demand

   Additionally, when a client request exists that no circuit (built or
   pending) might support, we create a new circuit to support the request.
   We do so by picking a request arbitrarily, launching a circuit to
   support it, and repeating until every unattached request might be
   supported by a pending or built circuit.

   For hidden service interactions, we can "cannibalize" a clean internal
   circuit if one is available, so we don't need to build those circuits
   from scratch on demand.

   We can also cannibalize clean circuits when the client asks to exit
   at a given node -- either via mapaddress or the ".exit" notation,
   or because the destination is running at the same location as an
   exit node.

2.1.3. Servers build circuits for testing reachability

   Tor servers test reachability of their ORPort on startup and whenever
   their IP address changes.
   XXXX

2.1.4. Hidden-service circuits

   See section 4 below.

2.1.5. Rate limiting of failed circuits

   If we fail to build a circuit N times in an X second period (see section
   2.3 for how this works), we stop building circuits until the X seconds
   have elapsed.
   XXX

2.1.6. When to tear down circuits


2.2. Path selection and constraints

   We choose the path for each new circuit before we build it.  We choose
   the exit node first, followed by the other nodes in the circuit.  All
   paths we generate obey the following constraints (illustrated in the
   sketch after this list):
     - We do not choose the same router twice for the same path.
     - We do not choose any router in the same family as another in the
       same path.
     - We do not choose more than one router in a given /16 subnet
       (unless EnforceDistinctSubnets is 0).
     - We don't choose any non-running or non-valid router unless we have
       been configured to do so.  By default, we are configured to allow
       non-valid routers in "middle" and "rendezvous" positions.
     - If we're using Guard nodes, the first node must be a Guard (see
       section 5 below).
     - XXXX Choosing the length
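
   [Illustration, non-normative: a Python sketch of checking the listed
   constraints when considering a candidate node for a partial path.
   EnforceDistinctSubnets is the real configuration option named above;
   the Node fields and helper names are invented for the example.]

      import ipaddress
      from dataclasses import dataclass, field

      @dataclass
      class Node:                      # invented example structure
          identity: str                # fingerprint
          ip: str
          family: set = field(default_factory=set)  # declared family

      def same_slash16(a, b):
          # True if two IPv4 addresses share their first 16 bits.
          return (ipaddress.ip_address(a).packed[:2] ==
                  ipaddress.ip_address(b).packed[:2])

      def violates_constraints(path, cand, enforce_distinct_subnets=True):
          for node in path:
              if node.identity == cand.identity:
                  return True   # same router twice in one path
              if (node.identity in cand.family or
                      cand.identity in node.family):
                  return True   # two routers from the same family
              if enforce_distinct_subnets and same_slash16(node.ip, cand.ip):
                  return True   # more than one router in a /16
          return False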
   For circuits that do not need to be "fast", when choosing among
   multiple candidates for a path element, we choose randomly.

   For "fast" circuits, we pick a given router as an exit with probability
   proportional to its advertised bandwidth [the smaller of the 'rate' and
   'observed' arguments to the "bandwidth" element in its descriptor].  If a
   router's advertised bandwidth is greater than MAX_BELIEVABLE_BANDWIDTH
   (1.5 MB/s), we clip to that value.

   For non-exit positions on "fast" circuits, we pick routers as above, but
   we weight the clipped advertised bandwidth of Exit-flagged nodes depending
   on the fraction of bandwidth available from non-Exit nodes.  Call the
   total clipped advertised bandwidth for Exit nodes under consideration E,
   and the total clipped advertised bandwidth for non-Exit nodes under
   consideration N.  If E<N/2, we do not consider Exit-flagged nodes.
   Otherwise, we weight their bandwidth with the factor (E-N/2)/(N+E-N/2) ==
   (2E - N)/(2E + N).  This ensures that bandwidth is evenly distributed
   over nodes in 3-hop paths.

   Users can alter the default behavior for path selection with
   configuration options.  For example, if a user tries to connect to or
   resolve a hostname of the form <target>.<servername>.exit, the request
   is rewritten to a request for <target>, and the request is only
   supported by the exit whose nickname or fingerprint is <servername>.

2.3. Handling failure

   If an attempt to extend a circuit fails (either because the first create
   failed or a subsequent extend failed) then the circuit is torn down and
   is no longer pending.  (XXXX really?)  Requests that might have been
   supported by the pending circuit thus become unsupported, and a new
   circuit needs to be constructed.

   If a stream "begin" attempt fails with an EXITPOLICY error, we
   decide that the exit node's exit policy is not correctly advertised,
   so we treat the exit node as if it were a non-exit until we retrieve
   a fresh descriptor for it.

   XXXX

3. Attaching streams to circuits

   When a circuit that might support a request is built, Tor tries to attach
   the request's stream to the circuit and sends a BEGIN or RESOLVE relay
   cell as appropriate.  If the request completes unsuccessfully, Tor
   considers the reason given in the CLOSE relay cell.  [XXX yes, and?]

   After a request has remained unattached for [XXXX interval?], Tor
   abandons the attempt and signals an error to the client as appropriate
   (e.g., by closing the SOCKS connection).

   XXX Timeouts and when Tor auto-retries.
    * What stream-end-reasons are appropriate for retrying.

   If there is no reply to the BEGIN/RESOLVE, the stream will time out
   and fail.

4. Hidden-service related circuits

   XXX Tracking expected hidden service use (client-side and hidserv-side)

5. Guard nodes

   XXX writeme

6. Testing circuits



(From some emails by arma)

Right now the code exists to pick helper nodes, store our choices to
disk, and use them for our entry nodes.  But there are three topics
to tackle before I'm comfortable turning them on by default.  First,
how to handle churn: since Tor nodes are not always up, and sometimes
disappear forever, we need a plan for replacing missing helpers in a
safe way.  Second, we need a way to distinguish "the network is down"
from "all my helpers are down", also in a safe way.  Lastly, we need to
examine the situation where a client picks three crummy helper nodes
and is forever doomed to a lousy Tor experience.  Here's my plan:

How to handle churn.
  - Keep track of whether you have ever actually established a
    connection to each helper.  Any helper node in your list that you've
    never used is ok to drop immediately.  Also, we don't save that
    one to disk.
  - If all our helpers are down, we need more helper nodes: add a new
    one to the *end* of our list.  Only remove dead ones when they have
    been gone for a very long time (months).
  - Pick from the first n (by default 3) helper nodes in your list
    that are up (according to the network-statuses) and reachable
    (according to your local firewall config).
  - This means that order matters when writing/reading them to disk.
    (See the sketch after this list.)
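
(Non-normative sketch, in Python, of the churn rules above; the Helper
fields and function names are invented for the example, and the "very
long time" rule is simplified to "keep anything we ever connected to".)

    from dataclasses import dataclass

    @dataclass
    class Helper:                 # invented example structure
        identity: str
        ever_connected: bool      # have we ever actually reached it?
        listed_up: bool           # up according to the network-statuses
        reachable: bool           # allowed by local firewall config

    def usable_helpers(helpers, n=3):
        # Order matters: take the first n that are up and reachable.
        live = [h for h in helpers if h.listed_up and h.reachable]
        return live[:n]

    def maintain_helpers(helpers, pick_new_helper):
        # Helpers we never actually used may be dropped as soon as they
        # vanish; established ones are kept around much longer.
        helpers = [h for h in helpers if h.listed_up or h.ever_connected]
        # If every helper is down, grow the list at the *end*, so the
        # original choices keep priority when they come back.
        if not usable_helpers(helpers):
            helpers.append(pick_new_helper())
        return helpers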
How to deal with network down.
  - While all helpers are down/unreachable and there are no established
    or on-the-way testing circuits, launch a testing circuit.  (Do this
    periodically in the same way we try to establish normal circuits
    when things are working normally.)
    (Testing circuits are a special type of circuit, that streams won't
    attach to by accident.)
  - When a testing circuit succeeds, mark all helpers up and hold
    the testing circuit open.
  - If a connection to a helper succeeds, close all testing circuits.
    Else mark that helper down and try another.
  - If the last helper is marked down and we already have a testing
    circuit established, then add the first hop of that testing circuit
    to the end of our helper node list, close that testing circuit,
    and go back to square one.  (Actually, rather than closing the
    testing circuit, can we get away with converting it to a normal
    circuit and beginning to use it immediately?)

How to pick non-sucky helpers.
  - When we're picking a new helper node, don't use ones which aren't
    reachable according to our local ReachableAddresses configuration.
    (There's an attack here: if I pick my helper nodes in a very
    restrictive environment, say "ReachableAddresses 18.0.0.0/255.0.0.0:*",
    then somebody watching me use the network from another location will
    guess where I first joined the network.  But let's ignore it for now.)
  - Right now we choose new helpers just like we'd choose any entry
    node: they must be "stable" (claim >1 day uptime) and "fast" (advertise
    >10kB capacity).  In 0.1.1.11-alpha, clients let dirservers define
    "stable" and "fast" however they like, and they just believe them.
    So the next step is to make them a function of the current network:
    e.g. line up all the 'up' nodes in order and declare the top
    three-quarters to be stable, fast, etc, as long as they meet some
    minimum too.  (See the sketch after this list.)
  - If that's not sufficient (it won't be), dirservers should introduce
    a new status flag: in addition to "stable" and "fast", we should
    also describe certain nodes as "entry", meaning they are suitable
    to be chosen as a helper.  The first difference would be that we'd
    demand the top half rather than the top three-quarters.  Another
    requirement would be to look at "mean time between returning" to
    ensure that these nodes spend most of their time available.  (Up for
    two days straight, once a month, is not good enough.)
  - Lastly, we need a function, given our current set of helpers and a
    directory of the rest of the network, that decides when our helper
    set has become "too crummy" and we need to add more.  For example,
    this could be based on the currently advertised capacity of each of
    our helpers, and it would also be based on the user's preferences
    of speed vs. security.
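
(Non-normative sketch, in Python, of the "function of the current
network" idea above: rank the 'up' nodes, declare the top three-quarters
stable/fast, and still demand the stated minimums of >1 day uptime and
>10kB capacity.  The node fields are invented for the example.)

    from dataclasses import dataclass

    @dataclass
    class UpNode:                      # invented example structure
        uptime: int                    # seconds
        advertised_capacity: int       # bytes/sec
        stable: bool = False
        fast: bool = False

    def assign_flags(up_nodes, min_uptime=24*60*60, min_capacity=10*1024):
        cutoff = (3 * len(up_nodes)) // 4       # top three-quarters
        by_uptime = sorted(up_nodes, key=lambda n: n.uptime, reverse=True)
        for i, n in enumerate(by_uptime):
            n.stable = i < cutoff and n.uptime >= min_uptime
        by_capacity = sorted(up_nodes,
                             key=lambda n: n.advertised_capacity,
                             reverse=True)
        for i, n in enumerate(by_capacity):
            n.fast = i < cutoff and n.advertised_capacity >= min_capacity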
***

Lasse wrote:
> I am a bit concerned with performance if we are to have e.g. two out of
> three helper nodes down or unreachable. How often should Tor check if
> they are back up and running?

Right now Tor believes a threshold of directory servers when deciding
whether each server is up.  When Tor observes a server to be down
(connection failed or building the first hop of the circuit failed),
it marks it as down and doesn't try it again, until it gets a new
network-status from somebody, at which point it takes a new consensus
and marks the appropriate servers as up.

According to sec 5.1 of dir-spec.txt, the client will try to fetch a new
network-status at least every 30 minutes, and more often in certain cases.

With the proposed scheme, we'll also mark all our helpers as up shortly
after the last one is marked down.

> When should there be
> added an extra node to the helper node list? This is kind of an
> important threshold?

I agree, this is an important question.  I don't have a good answer yet.  Is
it terrible, anonymity-wise, to add a new helper every time only one of
your helpers is up?  Notice that I say add rather than replace -- so you'd
only use this fourth helper when one of your main three helpers is down,
and if three of your four are down, you'd add a fifth, but only use it
when two of the first four are down, etc.

In fact, this may be smarter than just picking a random node for your
testing circuit, because if your network goes up and down a lot, then
eventually you have a chance of using any entry node in the network for
your testing circuit.

We have a design choice here.  Do we only try to use helpers for the
connections that will have streams on them (revealing our communication
partners), or do we also want to restrict the overall set of nodes that
we'll connect to, to discourage people from enumerating all Tor clients?

I'm increasingly of the belief that we want to hide our presence too,
based on the fact that Steven and George and others keep coming up with
attacks that start with "Assuming we know the set of users".

If so, then here's a revised "How to deal with network down" section
(sketched in code below):

  1) When a helper is marked down or the helper list shrinks, and as
     a result the total number of helpers that are either (up and
     reachable) or (reachable but never connected to) is <= 1, then pick
     a new helper and add it to the end of the list.
     [We count nodes that have never been connected to, since otherwise
     we might keep on adding new nodes before trying any of them.  By
     "reachable" I mean "is allowed by ReachableAddresses".]
  2) When you fail to connect to a helper that has never been connected
     to, you remove him from the list right then (and the above rule
     might kick in).
  3) When you succeed at connecting to a helper that you've never
     connected to before, mark all reachable helpers earlier in the list
     as up, and close that circuit.
     [We close the circuit, since if the other helpers are now up, we
     prefer to use them for circuits that will reveal communication
     partners.]

This certainly seems simpler.  Are there holes that I'm missing?
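
(Non-normative sketch, in Python, of those three rules, just to make the
bookkeeping concrete.  The field names are invented; marked_up stands
for the client's own up/down marking described above.)

    def need_new_helper(helpers):
        # Rule 1: count helpers that are (up and reachable) or (reachable
        # but never connected to); if <= 1, extend the list at the end.
        candidates = [h for h in helpers
                      if h.reachable and
                         (h.marked_up or not h.ever_connected)]
        return len(candidates) <= 1

    def on_connect_failed(helpers, h):
        # Rule 2: a never-used helper that fails is dropped immediately
        # (after which rule 1 may kick in).
        if not h.ever_connected:
            helpers.remove(h)

    def on_connect_succeeded(helpers, h):
        # Rule 3: the first successful use marks every reachable helper
        # earlier in the list as up (and the caller closes this circuit,
        # preferring those helpers for real traffic).
        if not h.ever_connected:
            for other in helpers[:helpers.index(h)]:
                if other.reachable:
                    other.marked_up = True
            h.ever_connected = True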
> If running from a laptop you will meet different firewall settings, so
> how should Helper Nodes settings keep up with moving from an open
> ReachableAddresses to a FascistFirewall setting after the helper nodes
> have been selected?

I added the word "reachable" to three places in the above list, and I
believe that totally solves this question.

And as a bonus, it leads to an answer to Nick's attack ("If I pick
my helper nodes all on 18.0.0.0:*, then I move, you'll know where I
bootstrapped") -- the answer is to pick your original three helper nodes
without regard for reachability.  Then the above algorithm will add some
more that are reachable for you, and if you move somewhere, it's more
likely (though not certain) that some of the originals will become useful.
Is that smart or just complex?

> What happens if (when?) performance of the third node is bad?

My above solution solves this a little bit, in that we always try to
have two nodes available.  But what if they are both up but bad?  I'm not
sure.  As my previous mail said, we need some function, given our list
of helpers and the network directory, that will tell us when we're in a
bad situation.  I can imagine some simple versions of this function --
for example, when both our working helpers are in the bottom half of
the nodes, ranked by capacity.  (See the sketch below.)

But the hard part: what's the remedy when we decide there's something
to fix?  Do we add a third, and now we have two crummy ones and a new
one?  Or do we drop one or both of the bad ones?

Perhaps we believe the latest claim from the network-status consensus,
and we count a helper the dirservers believe is crummy as "not worth
trying" (equivalent to "not reachable under our current ReachableAddresses
config") -- and then the above algorithm would end up adding good ones,
but we'd go back to the originals if they resume being acceptable?  That's
an appealing design.  I wonder if it will cause the typical Tor user to
have a helper node list that comprises most of the network, though.  I'm
ok with this.
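
(Non-normative sketch, in Python, of that simple "bad situation" test:
working helpers all in the bottom half of the network, ranked by
advertised capacity.  Field names are invented for the example.)

    def helpers_seem_crummy(helpers, all_nodes):
        # Rank the whole network by advertised capacity; worry if every
        # working helper falls in the bottom half.
        ranked = sorted(all_nodes, key=lambda n: n.advertised_capacity)
        bottom = {n.identity for n in ranked[:len(ranked) // 2]}
        working = [h for h in helpers if h.marked_up and h.reachable]
        return bool(working) and all(h.identity in bottom for h in working)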
> Another point you might want to keep in mind, is the possibility to
> reuse the code in order to add a second layer helper node (meaning node
> number two) to "protect" the first layer (node number one) helper nodes.
> These nodes should be tied to each of the first layer nodes. E.g. there
> is one helper node list, as described in your mail, for each of the
> first layer nodes, following their create/destroy.

True.  Does that require us to add a fourth hop to our path length,
since the first hop is from a limited set, the second hop is from a
limited set, and the third hop might also be constrained because, say,
we're asking for an unusual exit port?

> Another of the things might worth adding to the to do list is
> localization of server (helper) nodes. Making it possible to pick
> countries/regions where you do (not) want your helper nodes located. (As
> in "HelperNodesLocated us,!eu" etc.) I know this requires the use of
> external data and may not be worth it, but it _could_ be integrated at
> the directory servers only -- adding a list of node IP's and e.g. a
> country/region code to the directory and thus reduce the overhead. (?)
> Maybe extending the Family-term?

I think we are heading towards doing path selection based on geography,
but I don't have a good sense yet of how that will actually turn out --
that is, with what mechanism Tor clients will learn the information they
need.  But this seems to be something that is orthogonal to the rest of
this discussion, so I look forward to having somebody else solve it for
us, and fitting it in when it's ready. :)

> And I would like to keep an option to pick the first X helper nodes
> myself and then let Tor extend this list if these nodes are down (like
> EntryNodes in current code).  Even if this opens up for some new types
> of "relationship" attacks.

Good idea.  Here's how I'd like to name these:

The "EntryNodes" config option is a list of seed helper nodes.  When we
read EntryNodes, any node listed in EntryNodes but not in the current
helper node list gets *pre*pended to the helper node list.

The "NumEntryNodes" config option (currently called NumHelperNodes)
specifies the number of up, reachable, good-enough helper nodes that
will make up the pool of possible choices for first hop, counted from
the front of the helper node list until we have enough.

The "UseEntryNodes" config option (currently called UseHelperNodes)
tells us to turn on all this helper node behavior.  If you set EntryNodes,
then this option is implied.

The "StrictEntryNodes" config option, provided for backward compatibility
and for debugging, means a) we replace the helper node list with the
current EntryNodes list, and b) whenever we would do an operation that
alters the helper node list, we don't.  (Yes, this means that if all the
helper nodes are down, we lose until we mark them up again.  But this is
how it behaves now.)

> I am sure my next point has been asked before, but what about testing
> the current speed of the connections when looking for new helper nodes,
> not only testing the connectivity? I know this might contribute to a lot
> of overhead in the network, but if this only occurs e.g. when using
> helper nodes for a Hidden Service it might not have that large an impact,
> but could help availability for the services?

If we're just going to be testing them when we're first picking them,
then it seems we can do the same thing by letting the directory servers
test them.  This has the added benefit that all the (behaving) clients
use the same data, so they don't end up partitioned by a node that
(for example) performs selectively for his victims.

Another idea would be to periodically keep track of what speeds you get
through your helpers, and make decisions from this.  The reason we haven't
done this yet is because there are a lot of variables -- perhaps the
web site is slow, perhaps some other node in the path is slow, perhaps
your local network is slow briefly, perhaps you got unlucky, etc.  I
believe that over time (assuming the user has roughly the same browsing
habits) all of these would average out and you'd get a usable answer,
but I don't have a good sense of how long it would take to converge,
so I don't know whether this would be worthwhile.

> BTW. I feel comfortable with all the terms helper/entry/contact nodes,
> but I think you (the developers) should just pick one and stay with it
> to avoid confusion.

I think I'm going to try to co-opt the term 'Entry' node for this
purpose.  We're going to have to keep referring to helper nodes for the
research community for a while though, so they realize that Tor does
more than just let users ask for certain entry nodes.



============================================================
Some stuff that worries me about entry guards. 2006 Jun, Nickm.

1. It is unlikely for two users to have the same set of entry guards.

2. Observing a user is sufficient to learn its entry guards.

3. So, as we move around, we leak our