author Nick Mathewson <nickm@torproject.org> 2008-06-16 17:30:22 +0000
committer Nick Mathewson <nickm@torproject.org> 2008-06-16 17:30:22 +0000
commit bcd7357b7123be7b136f0bce66630abf515e9a39 (patch)
tree aa5ede078d19753a9f422c4a3c003615e0e5e781 /doc/spec
parent 44452c2756990520ad1e88172867fbac08c64e38 (diff)
Add proposal 141: download server descriptors on demand. (Status: Draft).
svn:r15302
Diffstat (limited to 'doc/spec')
-rw-r--r-- doc/spec/proposals/000-index.txt | 2
-rw-r--r-- doc/spec/proposals/141-jit-sd-downloads.txt | 219
2 files changed, 221 insertions, 0 deletions
diff --git a/doc/spec/proposals/000-index.txt b/doc/spec/proposals/000-index.txt
index 0306dead4a..398b9410d9 100644
--- a/doc/spec/proposals/000-index.txt
+++ b/doc/spec/proposals/000-index.txt
@@ -63,6 +63,7 @@ Proposals by number:
138 Remove routers that are not Running from consensus documents [CLOSED]
139 Download consensus documents only when it will be trusted [CLOSED]
140 Provide diffs between consensuses [OPEN]
+141 Download server descriptors on demand [DRAFT]
Proposals by status:
@@ -74,6 +75,7 @@ Proposals by status:
132 A Tor Web Service For Verifying Correct Browser Configuration
133 Incorporate Unreachable ORs into the Tor Network
134 More robust consensus voting with diverse authority sets
+ 141 Download server descriptors on demand
OPEN:
120 Shutdown descriptors when Tor servers stop
121 Hidden Service Authentication
diff --git a/doc/spec/proposals/141-jit-sd-downloads.txt b/doc/spec/proposals/141-jit-sd-downloads.txt
new file mode 100644
index 0000000000..238440532d
--- /dev/null
+++ b/doc/spec/proposals/141-jit-sd-downloads.txt
@@ -0,0 +1,219 @@
+Filename: 141-jit-sd-downloads.txt
+Title: Download server descriptors on demand
+Version: $Revision$
+Last-Modified: $Date$
+Author: Peter Palfrader
+Created: 15-Jun-2008
+Status: Draft
+
+1. Overview
+
+ Downloading all server descriptors is the most expensive part
+ of bootstrapping a Tor client. These server descriptors currently
+ amount to about 1.5 Megabytes of data, and this size will grow
+ linearly with network size.
+
+ Fetching all these server descriptors takes a long while for people
+ behind slow network connections. It is also a considerable load on
+ our network of directory mirrors.
+
+ This document describes proposed changes to the Tor network and
+ directory protocol so that clients will no longer need to download
+ all server descriptors.
+
+ These changes consist of moving load balancing information into
+ network status documents, implementing a means to download server
+ descriptors on demand in an anonymity-preserving way, and dealing
+ with exit node selection.
+
+2. What is in a server descriptor
+
+ When a Tor client starts the first thing it will try to get is a
+ current network status document, a consensus signed by a majority
+ of directory authorities. This document is currently about 100
+ Kilobytes in size, though it will grow linearly with network size.
+ This document lists all servers currently running on the network.
+ The Tor client will then try to get a server descriptor for each
+ of the running servers. All server descriptors currently amount
+ to about 1.5 Megabytes of downloads.
+
+ A Tor client learns several things about a server from its descriptor.
+ Some of these it already learned from the network status document
+ published by the authorities, but the server descriptor contains them
+ again in a single statement signed by the server itself, not just by
+ the directory authorities.
+
+ Tor clients use the information from server descriptors for
+ different purposes, which are considered in the following sections.
+
+ #three ways: One, to determine if a server will be able to handle
+ #this client's request; two, to actually communicate or use the server;
+ #three, for load balancing decisions.
+ #
+ #These three points are considered in the following subsections.
+
+2.1 Load balancing
+
+ The Tor load balancing mechanism is quite complex in its details, but
+ it has a simple goal: The more traffic a server can handle the more
+ traffic it should get. That means the more traffic a server can
+ handle the more likely a client will use it.
+
+ For this purpose each server descriptor has bandwidth information
+ which tries to convey a server's capacity to clients.
+
+ Currently we weigh servers differently for different purposes. There
+ is a weight for when we use a server as a guard node (our entry to the
+ Tor network), there is one weight we assign servers for exit duties,
+ and a third for when we need intermediate (middle) nodes.
+
+2.2 Exit information
+
+ When a Tor client wants to exit to some resource on the internet, it will
+ build a circuit to an exit node that allows access to that resource's
+ IP address and TCP Port.
+
+ When building that circuit the client can make sure that the circuit
+ ends at a server that will be able to fulfill the request because the
+ client already learned of all the servers' exit policies from their
+ descriptors.
+
+2.3 Capability information
+
+ Server descriptors contain information about the specific versions of
+ the Tor protocol they understand [proposal 105].
+
+ Furthermore the server descriptor also contains the exact version of
+ the Tor software that the server is running and some decisions are
+ made based on the server version number (for instance a Tor client
+ will only make conditional consensus requests [proposal from 13 Apr
+ 2008 that never got a number] when talking to Tor servers version
+ 0.2.1.1-alpha or later).
+
+2.4 Contact/key information
+
+ A server descriptor lists a server's IP address and TCP ports on which
+ it accepts onion and directory connections. Furthermore it contains
+ the onion key, a short lived RSA key to which clients encrypt CREATE
+ cells.
+
+2.5 Identity information
+
+ A Tor client learns the digest of a server's key from the network
+ status document. Once it has a server descriptor this descriptor
+ contains the full RSA identity key of the server. Clients verify
+ that 1) the digest of the identity key matches the expected digest
+ it got from the consensus, and 2) that the signature on the descriptor
+ from that key is valid.
+
+
+3. Doing away with the need for all SDs
+
+3.1 Load balancing info in consensus documents
+
+ One of the reasons why clients download all server descriptors is for
+ doing proper load balancing as described in 2.1. In order for
+ clients to not require all server descriptors this information will
+ have to move into the network status document.
+
+ [XXX Two open questions here:
+ a) how do we arrive at a consensus weight?
+ b) how to represent weights in the consensus?
+ Maybe "s Guard=0.13 Exit=0.02 Middle=0.00 Stable.."
+ ]
+
+3.2 Fetching descriptors on demand
+
+ As described in 2.4 a descriptor lists IP address, OR- and Dir-Port,
+ and the onion key for a server.
+
+ A client already knows the IP address and the ports from the consensus
+ documents, but without the onion key it will not be able to send
+ CREATE/EXTEND cells for that server. Since the client needs the onion
+ key it needs the descriptor.
+
+ If a client only downloaded a few descriptors in an observable manner
+ then that would leak which nodes it was going to use.
+
+ This proposal suggests the following:
+
+ 1) when connecting to a guard node for which the client does not
+ yet have a cached descriptor it requests the descriptor it
+ expects by hash. (The consensus document that the client holds
+ has a hash for the descriptor of this server. We want exactly
+ that descriptor, not a different one.)
+
+ [XXX: How? We could either come up with a new cell type,
+ RELAY_REQUEST_SD that takes only a hash (of the SD), or use
+ RELAY_BEGIN_DIR. The former is probably smarter since we will
+ want to use it later on as well, and there we will require
+ padding.]
+
+ A client MAY cache the descriptor of the guard node so that it does
+ not need to request it every single time it contacts the guard.
+
+ 2) when a client wants to extend a circuit that currently ends in
+ server B to a new next server C, the client will send a
+ RELAY_REQUEST_SD cell to server B. This cell contains in its
+ payload the hash of a server descriptor the client would like
+ to obtain (C's server descriptor). The server sends back the
+ descriptor and the client can now form a valid EXTEND/CREATE cell
+ encrypted to C's onion key.
+
+ Clients MUST NOT cache such descriptors. If they did they might
+ leak that they already extended to that server at least once
+ before.
+
+ Replies to RELAY_REQUEST_SD requests need to be padded to some
+ constant upper limit in order to conceal a client's destination
+ from anybody who might be counting cells/bytes.
+
+ [XXX: detailed spec of RELAY_REQUEST_SD cell and its reply]
+ [XXX: figure out a decent padding size]
+
+3.3 Protocol versions
+
+ [XXX: find out where we need "opt protocols Link 1 2 Circuit 1"
+ information described in 2.3 above. If we need it, it might have
+ to go into the consensus document.]
+
+ [XXX: Similarly find out where we need the version number of a
+ remote tor server. This information is in the consensus, but
+ maybe we use it in some place where having it signed by the
+ server in question is really important?]
+
+3.4 Exit selection
+
+ Currently finding an appropriate exit node for a user's request is
+ easy for a client because it has complete knowledge of all the exit
+ policies of all servers on the network.
+
+ [XXX: I have no finished ideas here yet.
+ - if clients only rely on the current exit flag they will
+ a) never use servers for exit purposes that don't have it,
+ b) will have a hard time finding a suitable exit node for
+ their weird port that only a few servers allow.
+ - the authorities could create a new summary document that
+ lists all the exit policies and their nodes (by fingerprint).
+ I need to find out how large that document would be.
+ - can we make the "Exit" flag more useful? can we come
+ up with some "standard policies" and have operators pick
+ one of the standards?
+ ]
+
+4. Future possibilities
+
+ This proposal still requires that all servers have the descriptors of
+ every other node in the network in order to answer RELAY_REQUEST_SD
+ cells. These cells are sent when a circuit is extended from ending at
+ node B to a new node C. In that case B would have to answer a
+ RELAY_REQUEST_SD cell that asks for C's server descriptor (by SD digest).
+
+ In order to answer that request B obviously needs a copy of C's server
+ descriptor. In the future we might amend RELAY_REQUEST_SD cells to
+ contain also the expected IP address and OR-port of the server C (the
+ client learns them from the network status document), so that B no
+ longer needs to know all the descriptors of the entire network but
+ instead can simply go and ask C for its descriptor before passing it
+ back to the client.
+
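
Illustrative sketch (not part of the commit): section 3.2 of the proposal
above asks that a client request a descriptor by the hash listed in its
consensus, and that replies be padded to a constant size. The Python below
shows one way those two checks could fit together. Everything named here is
an assumption made for illustration: the proposal leaves the cell format and
the padding size open ("[XXX]"), so PADDED_REPLY_LEN, the 4-byte length
prefix, and the helper names are hypothetical, and hashing the whole byte
string with SHA-1 is a simplification of how descriptor digests are computed.

```python
import hashlib
import hmac

# Hypothetical constant: the proposal has not yet chosen a padding size.
PADDED_REPLY_LEN = 4096


def descriptor_digest(descriptor: bytes) -> bytes:
    """Digest of a descriptor (simplified: SHA-1 over all of its bytes)."""
    return hashlib.sha1(descriptor).digest()


def pad_reply(descriptor: bytes, size: int = PADDED_REPLY_LEN) -> bytes:
    """Relay side: length-prefix the descriptor, then zero-pad to `size`."""
    if len(descriptor) + 4 > size:
        raise ValueError("descriptor too large for padded reply")
    body = len(descriptor).to_bytes(4, "big") + descriptor
    return body + b"\x00" * (size - len(body))


def unpad_and_verify(reply: bytes, expected_digest: bytes) -> bytes:
    """Client side: strip padding and accept only the expected descriptor."""
    length = int.from_bytes(reply[:4], "big")
    descriptor = reply[4:4 + length]
    # Constant-time comparison against the digest taken from the consensus.
    if not hmac.compare_digest(descriptor_digest(descriptor), expected_digest):
        raise ValueError("descriptor does not match the consensus digest")
    return descriptor
```

Because every reply has the same length, an observer counting bytes learns
nothing about which descriptor was requested. The digest check ensures the
client gets exactly the descriptor its consensus names, not a different one.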