From 19c503cf8f43443f15ccf3dba686bdb3c9ddcdce Mon Sep 17 00:00:00 2001
From: Nick Mathewson
Date: Mon, 16 Jun 2008 17:30:22 +0000
Subject: Add proposal 141: download server descriptors on demand. (Status: Draft).

svn:r15302
---
 proposals/141-jit-sd-downloads.txt | 219 +++++++++++++++++++++++++++++++++++++
 1 file changed, 219 insertions(+)
 create mode 100644 proposals/141-jit-sd-downloads.txt

diff --git a/proposals/141-jit-sd-downloads.txt b/proposals/141-jit-sd-downloads.txt
new file mode 100644
index 0000000..2384405
--- /dev/null
+++ b/proposals/141-jit-sd-downloads.txt
@@ -0,0 +1,219 @@
+Filename: 141-jit-sd-downloads.txt
+Title: Download server descriptors on demand
+Version: $Revision$
+Last-Modified: $Date$
+Author: Peter Palfrader
+Created: 15-Jun-2008
+Status: Draft
+
+1. Overview
+
+  Downloading all server descriptors is the most expensive part
+  of bootstrapping a Tor client.  These server descriptors currently
+  amount to about 1.5 Megabytes of data, and this size will grow
+  linearly with network size.
+
+  Fetching all these server descriptors takes a long while for people
+  behind slow network connections.  It is also a considerable load on
+  our network of directory mirrors.
+
+  This document describes proposed changes to the Tor network and
+  directory protocol so that clients will no longer need to download
+  all server descriptors.
+
+  These changes consist of moving load balancing information into
+  network status documents, implementing a means to download server
+  descriptors on demand in an anonymity-preserving way, and dealing
+  with exit node selection.
+
+2. What is in a server descriptor
+
+  When a Tor client starts, the first thing it will try to get is a
+  current network status document, a consensus signed by a majority
+  of directory authorities.  This document is currently about 100
+  Kilobytes in size, though it will grow linearly with network size.
+  This document lists all servers currently running on the network.
+  The Tor client will then try to get a server descriptor for each
+  of the running servers.  All server descriptors currently amount
+  to about 1.5 Megabytes of downloads.
+
+  A Tor client learns several things about a server from its descriptor.
+  Some of these it already learned from the network status document
+  published by the authorities, but the server descriptor contains them
+  again in a single statement signed by the server itself, not just by
+  the directory authorities.
+
+  Tor clients use the information from server descriptors for
+  different purposes, which are considered in the following sections.
+
+  #three ways: One, to determine if a server will be able to handle
+  #this client's request; two, to actually communicate or use the server;
+  #three, for load balancing decisions.
+  #
+  #These three points are considered in the following subsections.
+
+2.1 Load balancing
+
+  The Tor load balancing mechanism is quite complex in its details, but
+  it has a simple goal: The more traffic a server can handle the more
+  traffic it should get.  That means the more traffic a server can
+  handle the more likely a client will use it.
+
+  For this purpose each server descriptor has bandwidth information
+  which tries to convey a server's capacity to clients.
+
+  Currently we weigh servers differently for different purposes.  There
+  is a weight for when we use a server as a guard node (our entry to the
+  Tor network), there is one weight we assign servers for exit duties,
+  and a third for when we need intermediate (middle) nodes.
+
+2.2 Exit information
+
+  When a Tor client wants to exit to some resource on the internet it
+  will build a circuit to an exit node that allows access to that
+  resource's IP address and TCP port.
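As an illustration of what "allows access to that resource's IP address
and TCP port" means in practice, here is a minimal sketch of a
first-match exit policy check.  The rule tuple layout, the function
names, and the choice to refuse when no rule matches are assumptions
made for this example; they are not taken from the proposal or from the
Tor source.

```python
import ipaddress

# Hypothetical in-memory form of an exit policy: an ordered list of
# (action, network, low_port, high_port) rules, evaluated first-match.
def policy_allows(policy, address, port):
    """Return True if the first rule matching (address, port) accepts."""
    addr = ipaddress.ip_address(address)
    for action, network, low, high in policy:
        if addr in ipaddress.ip_network(network) and low <= port <= high:
            return action == "accept"
    # Assumption for this sketch: if nothing matched, refuse.
    return False


def candidate_exits(servers, address, port):
    """Return the names of servers whose policy accepts the destination.

    `servers` is a hypothetical mapping of nickname -> policy list.
    """
    return [name for name, policy in servers.items()
            if policy_allows(policy, address, port)]


web_only = [
    ("reject", "0.0.0.0/0", 25, 25),
    ("accept", "0.0.0.0/0", 80, 80),
    ("accept", "0.0.0.0/0", 443, 443),
    ("reject", "0.0.0.0/0", 1, 65535),
]
no_exit = [("reject", "0.0.0.0/0", 1, 65535)]
servers = {"alpha": web_only, "beta": no_exit}
```

With these made-up policies, `candidate_exits(servers, "192.0.2.10", 443)`
yields only `alpha`.  The point of the sketch is that this check needs
every candidate's full policy, which today only the descriptors supply.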
+
+  When building that circuit the client can make sure that the circuit
+  ends at a server that will be able to fulfill the request because the
+  client already learned of all the servers' exit policies from their
+  descriptors.
+
+2.3 Capability information
+
+  Server descriptors contain information about the specific versions of
+  the Tor protocol they understand [proposal 105].
+
+  Furthermore the server descriptor also contains the exact version of
+  the Tor software that the server is running and some decisions are
+  made based on the server version number (for instance a Tor client
+  will only make conditional consensus requests [proposal from 13 Apr
+  2008 that never got a number] when talking to Tor servers version
+  0.2.1.1-alpha or later).
+
+2.4 Contact/key information
+
+  A server descriptor lists a server's IP address and TCP ports on which
+  it accepts onion and directory connections.  Furthermore it contains
+  the onion key, a short lived RSA key to which clients encrypt CREATE
+  cells.
+
+2.5 Identity information
+
+  A Tor client learns the digest of a server's key from the network
+  status document.  Once it has a server descriptor this descriptor
+  contains the full RSA identity key of the server.  Clients verify
+  that 1) the digest of the identity key matches the expected digest
+  it got from the consensus, and 2) that the signature on the descriptor
+  from that key is valid.
+
+
+3. Doing away with the need for all SDs
+
+3.1 Load balancing info in consensus documents
+
+  One of the reasons why clients download all server descriptors is for
+  doing proper load balancing as described in 2.1.  In order for
+  clients to not require all server descriptors this information will
+  have to move into the network status document.
+
+  [XXX Two open questions here:
+   a) how do we arrive at a consensus weight?
+   b) how to represent weights in the consensus?
+      Maybe "s Guard=0.13 Exit=0.02 Middle=0.00 Stable.."
+  ]
+
+3.2 Fetching descriptors on demand
+
+  As described in 2.4 a descriptor lists IP address, OR- and Dir-Port,
+  and the onion key for a server.
+
+  A client already knows the IP address and the ports from the consensus
+  documents, but without the onion key it will not be able to send
+  CREATE/EXTEND cells for that server.  Since the client needs the onion
+  key it needs the descriptor.
+
+  If a client only downloaded a few descriptors in an observable manner
+  then that would leak which nodes it was going to use.
+
+  This proposal suggests the following:
+
+  1) when connecting to a guard node for which the client does not
+     yet have a cached descriptor it requests the descriptor it
+     expects by hash.  (The consensus document that the client holds
+     has a hash for the descriptor of this server.  We want exactly
+     that descriptor, not a different one.)
+
+     [XXX: How?  We could either come up with a new cell type,
+      RELAY_REQUEST_SD that takes only a hash (of the SD), or use
+      RELAY_BEGIN_DIR.  The former is probably smarter since we will
+      want to use it later on as well, and there we will require
+      padding.]
+
+     A client MAY cache the descriptor of the guard node so that it does
+     not need to request it every single time it contacts the guard.
+
+  2) when a client wants to extend a circuit that currently ends in
+     server B to a new next server C, the client will send a
+     RELAY_REQUEST_SD cell to server B.  This cell contains in its
+     payload the hash of a server descriptor the client would like
+     to obtain (C's server descriptor).  The server sends back the
+     descriptor and the client can now form a valid EXTEND/CREATE cell
+     encrypted to C's onion key.
+
+     Clients MUST NOT cache such descriptors.  If they did they might
+     leak that they already extended to that server at least once
+     before.
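The two rules above (accept only the descriptor whose hash the consensus
names; cache it for guards, never for extension targets) can be sketched
on the client side as follows.  This is an illustration only: the class
and method names are hypothetical, and hashing the complete descriptor
bytes with SHA-1 is an assumption made for the example, since the
proposal just says "hash".

```python
import hashlib


class DescriptorStore:
    """Hypothetical client-side store applying the caching rules above."""

    def __init__(self):
        # digest (lowercase hex) -> descriptor bytes, guard nodes only
        self._guard_cache = {}

    def accept(self, expected_digest_hex, descriptor, is_guard):
        """Check a received descriptor against the consensus digest.

        Assumption: the digest is SHA-1 over the bytes as received.
        """
        actual = hashlib.sha1(descriptor).hexdigest()
        if actual != expected_digest_hex.lower():
            raise ValueError("descriptor does not match consensus digest")
        if is_guard:
            # Rule 1: a client MAY cache its guard's descriptor.
            self._guard_cache[actual] = descriptor
        # Rule 2: descriptors fetched while extending are not stored.
        return descriptor

    def cached_guard_descriptor(self, digest_hex):
        """Return a cached guard descriptor, or None if not cached."""
        return self._guard_cache.get(digest_hex.lower())


store = DescriptorStore()
guard_sd = b"made-up descriptor body for a guard\n"
hop_sd = b"made-up descriptor body for server C\n"
guard_digest = hashlib.sha1(guard_sd).hexdigest()
hop_digest = hashlib.sha1(hop_sd).hexdigest()

store.accept(guard_digest, guard_sd, is_guard=True)
store.accept(hop_digest, hop_sd, is_guard=False)
```

After these two calls only the guard's descriptor can be retrieved from
the store; the descriptor for C was used and dropped, so the client has
to request it again the next time it extends to C.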
+
+  Replies to RELAY_REQUEST_SD requests need to be padded to some
+  constant upper limit in order to conceal a client's destination
+  from anybody who might be counting cells/bytes.
+
+  [XXX: detailed spec of RELAY_REQUEST_SD cell and its reply]
+  [XXX: figure out a decent padding size]
+
+3.3 Protocol versions
+
+  [XXX: find out where we need "opt protocols Link 1 2 Circuit 1"
+   information described in 2.3 above.  If we need it, it might have
+   to go into the consensus document.]
+
+  [XXX: Similarly find out where we need the version number of a
+   remote Tor server.  This information is in the consensus, but
+   maybe we use it in some place where having it signed by the
+   server in question is really important?]
+
+3.4 Exit selection
+
+  Currently finding an appropriate exit node for a user's request is
+  easy for a client because it has complete knowledge of all the exit
+  policies of all servers on the network.
+
+  [XXX: I have no finished ideas here yet.
+   - if clients only rely on the current exit flag they will
+     a) never use servers for exit purposes that don't have it,
+     b) have a hard time finding a suitable exit node for
+        their weird port that only a few servers allow.
+   - the authorities could create a new summary document that
+     lists all the exit policies and their nodes (by fingerprint).
+     I need to find out how large that document would be.
+   - can we make the "Exit" flag more useful?  can we come
+     up with some "standard policies" and have operators pick
+     one of the standards?
+  ]
+
+4. Future possibilities
+
+  This proposal still requires that all servers have the descriptors of
+  every other node in the network in order to answer RELAY_REQUEST_SD
+  cells.  These cells are sent when a circuit is extended from ending at
+  node B to a new node C.  In that case B would have to answer a
+  RELAY_REQUEST_SD cell that asks for C's server descriptor (by SD digest).
+
+  In order to answer that request B obviously needs a copy of C's server
+  descriptor.
+  In the future we might amend RELAY_REQUEST_SD cells to
+  also contain the expected IP address and OR-port of the server C (the
+  client learns them from the network status document), so that B no
+  longer needs to know all the descriptors of the entire network but
+  instead can simply go and ask C for its descriptor before passing it
+  back to the client.
+
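To make the amended request concrete, here is one possible payload
layout, given purely as a sketch.  The proposal does not specify the
RELAY_REQUEST_SD format, so the field order, the 20-byte digest length,
the IPv4-only address, and the function names below are all assumptions
made for this example.

```python
import ipaddress
import struct

DIGEST_LEN = 20  # assumption: a SHA-1 sized descriptor digest
PAYLOAD_LEN = DIGEST_LEN + 4 + 2  # digest + IPv4 address + OR-port


def pack_request(sd_digest, ip, or_port):
    """Build a hypothetical payload: digest, IPv4 address, OR-port."""
    if len(sd_digest) != DIGEST_LEN:
        raise ValueError("digest must be %d bytes" % DIGEST_LEN)
    address = ipaddress.IPv4Address(ip).packed      # 4 bytes
    port = struct.pack("!H", or_port)               # 2 bytes, big-endian
    return sd_digest + address + port


def unpack_request(payload):
    """Split the hypothetical payload back into (digest, ip, or_port)."""
    if len(payload) != PAYLOAD_LEN:
        raise ValueError("payload must be %d bytes" % PAYLOAD_LEN)
    digest = payload[:DIGEST_LEN]
    ip = str(ipaddress.IPv4Address(payload[DIGEST_LEN:DIGEST_LEN + 4]))
    (or_port,) = struct.unpack("!H", payload[DIGEST_LEN + 4:])
    return digest, ip, or_port


example = pack_request(b"\x11" * DIGEST_LEN, "192.0.2.7", 9001)
```

With a payload like this, B could read the address and port of C from
the request itself, fetch C's descriptor, and check it against the
digest before relaying it, instead of holding descriptors for the whole
network.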