diff options
Diffstat (limited to 'doc/spec/proposals/141-jit-sd-downloads.txt')
-rw-r--r-- | doc/spec/proposals/141-jit-sd-downloads.txt | 325 |
1 files changed, 0 insertions, 325 deletions
diff --git a/doc/spec/proposals/141-jit-sd-downloads.txt b/doc/spec/proposals/141-jit-sd-downloads.txt deleted file mode 100644 index b0c2b2cbcd..0000000000 --- a/doc/spec/proposals/141-jit-sd-downloads.txt +++ /dev/null @@ -1,325 +0,0 @@ -Filename: 141-jit-sd-downloads.txt -Title: Download server descriptors on demand -Version: $Revision$ -Last-Modified: $Date$ -Author: Peter Palfrader -Created: 15-Jun-2008 -Status: Draft - -1. Overview - - Downloading all server descriptors is the most expensive part - of bootstrapping a Tor client. These server descriptors currently - amount to about 1.5 Megabytes of data, and this size will grow - linearly with network size. - - Fetching all these server descriptors takes a long while for people - behind slow network connections. It is also a considerable load on - our network of directory mirrors. - - This document describes proposed changes to the Tor network and - directory protocol so that clients will no longer need to download - all server descriptors. - - These changes consist of moving load balancing information into - network status documents, implementing a means to download server - descriptors on demand in an anonymity-preserving way, and dealing - with exit node selection. - -2. What is in a server descriptor - - When a Tor client starts the first thing it will try to get is a - current network status document: a consensus signed by a majority - of directory authorities. This document is currently about 100 - Kilobytes in size, tho it will grow linearly with network size. - This document lists all servers currently running on the network. - The Tor client will then try to get a server descriptor for each - of the running servers. All server descriptors currently amount - to about 1.5 Megabytes of downloads. - - A Tor client learns several things about a server from its descriptor. - Some of these it already learned from the network status document - published by the authorities, but the server descriptor contains it - again in a single statement signed by the server itself, not just by - the directory authorities. - - Tor clients use the information from server descriptors for - different purposes, which are considered in the following sections. - - #three ways: One, to determine if a server will be able to handle - #this client's request; two, to actually communicate or use the server; - #three, for load balancing decisions. - # - #These three points are considered in the following subsections. - -2.1 Load balancing - - The Tor load balancing mechanism is quite complex in its details, but - it has a simple goal: The more traffic a server can handle the more - traffic it should get. That means the more traffic a server can - handle the more likely a client will use it. - - For this purpose each server descriptor has bandwidth information - which tries to convey a server's capacity to clients. - - Currently we weigh servers differently for different purposes. There - is a weigh for when we use a server as a guard node (our entry to the - Tor network), there is one weigh we assign servers for exit duties, - and a third for when we need intermediate (middle) nodes. - -2.2 Exit information - - When a Tor wants to exit to some resource on the internet it will - build a circuit to an exit node that allows access to that resource's - IP address and TCP Port. - - When building that circuit the client can make sure that the circuit - ends at a server that will be able to fulfill the request because the - client already learned of all the servers' exit policies from their - descriptors. - -2.3 Capability information - - Server descriptors contain information about the specific version or - the Tor protocol they understand [proposal 105]. - - Furthermore the server descriptor also contains the exact version of - the Tor software that the server is running and some decisions are - made based on the server version number (for instance a Tor client - will only make conditional consensus requests [proposal 139] when - talking to Tor servers version 0.2.1.1-alpha or later). - -2.4 Contact/key information - - A server descriptor lists a server's IP address and TCP ports on which - it accepts onion and directory connections. Furthermore it contains - the onion key (a short lived RSA key to which clients encrypt CREATE - cells). - -2.5 Identity information - - A Tor client learns the digest of a server's key from the network - status document. Once it has a server descriptor this descriptor - contains the full RSA identity key of the server. Clients verify - that 1) the digest of the identity key matches the expected digest - it got from the consensus, and 2) that the signature on the descriptor - from that key is valid. - - -3. No longer require clients to have copies of all SDs - -3.1 Load balancing info in consensus documents - - One of the reasons why clients download all server descriptors is for - doing load proper load balancing as described in 2.1. In order for - clients to not require all server descriptors this information will - have to move into the network status document. - - Consensus documents will have a new line per router similar - to the "r", "s", and "v" lines that already exist. This line - will convey weight information to clients. - - "w Bandwidth=193" - - The bandwidth number is the lesser of observed bandwidth and bandwidth - rate limit from the server descriptor that the "r" line referenced by - digest (1st and 3rd field of the bandwidth line in the descriptor). - It is given in kilobytes per second so the byte value in the - descriptor has to be divided by 1024 (and is then truncated, i.e. - rounded down). - - Authorities will cap the bandwidth number at some arbitrary value, - currently 10MB/sec. If a router claims a larger bandwidth an - authority's vote will still only show Bandwidth=10240. - - The consensus value for bandwidth is the median of all bandwidth - numbers given in votes. In case of an even number of votes we use - the lower median. (Using this procedure allows us to change the - cap value more easily.) - - Clients should believe the bandwidth as presented in the consensus, - not capping it again. - -3.2 Fetching descriptors on demand - - As described in 2.4 a descriptor lists IP address, OR- and Dir-Port, - and the onion key for a server. - - A client already knows the IP address and the ports from the consensus - documents, but without the onion key it will not be able to send - CREATE/EXTEND cells for that server. Since the client needs the onion - key it needs the descriptor. - - If a client only downloaded a few descriptors in an observable manner - then that would leak which nodes it was going to use. - - This proposal suggests the following: - - 1) when connecting to a guard node for which the client does not - yet have a cached descriptor it requests the descriptor it - expects by hash. (The consensus document that the client holds - has a hash for the descriptor of this server. We want exactly - that descriptor, not a different one.) - - It does that by sending a RELAY_REQUEST_SD cell. - - A client MAY cache the descriptor of the guard node so that it does - not need to request it every single time it contacts the guard. - - 2) when a client wants to extend a circuit that currently ends in - server B to a new next server C, the client will send a - RELAY_REQUEST_SD cell to server B. This cell contains in its - payload the hash of a server descriptor the client would like - to obtain (C's server descriptor). The server sends back the - descriptor and the client can now form a valid EXTEND/CREATE cell - encrypted to C's onion key. - - Clients MUST NOT cache such descriptors. If they did they might - leak that they already extended to that server at least once - before. - - Replies to RELAY_REQUEST_SD requests need to be padded to some - constant upper limit in order to conceal a client's destination - from anybody who might be counting cells/bytes. - - RELAY_REQUEST_SD cells contain the following information: - - hash of the server descriptor requested - - hash of the identity digest of the server for which we want the SD - - IP address and OR-port or the server for which we want the SD - - padding factor - the number of cells we want the answer - padded to. - [XXX this just occured to me and it might be smart. or it might - be stupid. clients would learn the padding factor they want - to use from the consensus document. This allows us to grow - the replies later on should SDs become larger.] - [XXX: figure out a decent padding size] - -3.3 Protocol versions - - Server descriptors contain optional information of supported - link-level and circuit-level protocols in the form of - "opt protocols Link 1 2 Circuit 1". These are not currently needed - and will probably eventually move into the "v" (version) line in - the consensus. This proposal does not deal with them. - - Similarly a server descriptor contains the version number of - a Tor node. This information is already present in the consensus - and is thus available to all clients immediately. - -3.4 Exit selection - - Currently finding an appropriate exit node for a user's request is - easy for a client because it has complete knowledge of all the exit - policies of all servers on the network. - - The consensus document will once again be extended to contain the - information required by clients. This information will be a summary - of each node's exit policy. The exit policy summary will only contain - the list of ports to which a node exits to most destination IP - addresses. - - A summary should claim a router exits to a specific TCP port if, - ignoring private IP addresses, the exit policy indicates that the - router would exit to this port to most IP address. either two /8 - netblocks, or one /8 and a couple of /12s or any other combination). - The exact algorith used is this: Going through all exit policy items - - ignore any accept that is not for all IP addresses ("*"), - - ignore rejects for these netblocks (exactly, no subnetting): - 0.0.0.0/8, 169.254.0.0/16, 127.0.0.0/8, 192.168.0.0/16, 10.0.0.0/8, - and 172.16.0.0/12m - - for each reject count the number of IP addresses rejected against - the affected ports, - - once we hit an accept for all IP addresses ("*") add the ports in - that policy item to the list of accepted ports, if they don't have - more than 2^25 IP addresses (that's two /8 networks) counted - against them (i.e. if the router exits to a port to everywhere but - at most two /8 networks). - - An exit policy summary will be included in votes and consensus as a - new line attached to each exit node. The line will have the format - "p" <space> "accept"|"reject" <portlist> - where portlist is a comma seperated list of single port numbers or - portranges (e.g. "22,80-88,1024-6000,6667"). - - Whether the summary shows the list of accepted ports or the list of - rejected ports depends on which list is shorter (has a shorter string - representation). In case of ties we choose the list of accepted - ports. As an exception to this rule an allow-all policy is - represented as "accept 1-65535" instead of "reject " and a reject-all - policy is similarly given as "reject 1-65535". - - Summary items are compressed, that is instead of "80-88,89-100" there - only is a single item of "80-100", similarly instead of "20,21" a - summary will say "20-21". - - Port lists are sorted in ascending order. - - The maximum allowed length of a policy summary (including the "accept " - or "reject ") is 1000 characters. If a summary exceeds that length we - use an accept-style summary and list as much of the port list as is - possible within these 1000 bytes. - -3.4.1 Consensus selection - - When building a consensus, authorities have to agree on a digest of - the server descriptor to list in the router line for each router. - This is documented in dir-spec section 3.4. - - All authorities that listed that agreed upon descriptor digest in - their vote should also list the same exit policy summary - or list - none at all if the authority has not been upgraded to list that - information in their vote. - - If we have votes with matching server descriptor digest of which at - least one of them has an exit policy then we differ between two cases: - a) all authorities agree (or abstained) on the policy summary, and we - use the exit policy summary that they all listed in their vote, - b) something went wrong (or some authority is playing foul) and we - have different policy summaries. In that case we pick the one - that is most commonly listed in votes with the matching - descriptor. We break ties in favour of the lexigraphically larger - vote. - - If none one of the votes with a matching server descriptor digest has - an exit policy summary we use the most commonly listed one in all - votes, breaking ties like in case b above. - -3.4.2 Client behaviour - - When choosing an exit node for a specific request a Tor client will - choose from the list of nodes that exit to the requested port as given - by the consensus document. If a client has additional knowledge (like - cached full descriptors) that indicates the so chosen exit node will - reject the request then it MAY use that knowledge (or not include such - nodes in the selection to begin with). However, clients MUST NOT use - nodes that do not list the port as accepted in the summary (but for - which they know that the node would exit to that address from other - sources, like a cached descriptor). - - An exception to this is exit enclave behaviour: A client MAY use the - node at a specific IP address to exit to any port on the same address - even if that node is not listed as exiting to the port in the summary. - -4. Migration - -4.1 Consensus document changes. - - The consensus will need to include - - bandwidth information (see 3.1) - - exit policy summaries (3.4) - - A new consensus method (number TBD) will be chosen for this. - -5. Future possibilities - - This proposal still requires that all servers have the descriptors of - every other node in the network in order to answer RELAY_REQUEST_SD - cells. These cells are sent when a circuit is extended from ending at - node B to a new node C. In that case B would have to answer a - RELAY_REQUEST_SD cell that asks for C's server descriptor (by SD digest). - - In order to answer that request B obviously needs a copy of C's server - descriptor. The RELAY_REQUEST_SD cell already has all the info that - B needs to contact C so it can ask about the descriptor before passing it - back to the client. - |