From 19c503cf8f43443f15ccf3dba686bdb3c9ddcdce Mon Sep 17 00:00:00 2001
From: Nick Mathewson <nickm@torproject.org>
Date: Mon, 16 Jun 2008 17:30:22 +0000
Subject: Add proposal 141: download server descriptors on demand.  (Status:
 Draft).

svn:r15302
---
 proposals/141-jit-sd-downloads.txt | 219 +++++++++++++++++++++++++++++++++++++
 1 file changed, 219 insertions(+)
 create mode 100644 proposals/141-jit-sd-downloads.txt

diff --git a/proposals/141-jit-sd-downloads.txt b/proposals/141-jit-sd-downloads.txt
new file mode 100644
index 0000000..2384405
--- /dev/null
+++ b/proposals/141-jit-sd-downloads.txt
@@ -0,0 +1,219 @@
+Filename: 141-jit-sd-downloads.txt
+Title: Download server descriptors on demand
+Version: $Revision$
+Last-Modified: $Date$
+Author: Peter Palfrader
+Created: 15-Jun-2008
+Status: Draft
+
+1. Overview
+
+  Downloading all server descriptors is the most expensive part
+  of bootstrapping a Tor client.  These server descriptors currently
+  amount to about 1.5 Megabytes of data, and this size will grow
+  linearly with network size.
+
+  Fetching all these server descriptors takes a long while for people
+  behind slow network connections.  It is also a considerable load on
+  our network of directory mirrors.
+
+  This document describes proposed changes to the Tor network and
+  directory protocol so that clients will no longer need to download
+  all server descriptors.
+
+  These changes consist of moving load balancing information into
+  network status documents, implementing a means to download server
+  descriptors on demand in an anonymity-preserving way, and dealing
+  with exit node selection.
+
+2. What is in a server descriptor
+
+  When a Tor client starts, the first thing it will try to get is a
+  current network status document, a consensus signed by a majority
+  of directory authorities.  This document is currently about 100
+  Kilobytes in size, though it will grow linearly with network size.
+  This document lists all servers currently running on the network.
+  The Tor client will then try to get a server descriptor for each
+  of the running servers.  All server descriptors currently amount
+  to about 1.5 Megabytes of downloads.
+
+  A Tor client learns several things about a server from its descriptor.
+  Some of these it already learned from the network status document
+  published by the authorities, but the server descriptor contains it
+  again in a single statement signed by the server itself, not just by
+  the directory authorities.
+
+  Tor clients use the information from server descriptors for
+  different purposes, which are considered in the following sections.
+
+  #three ways:  One, to determine if a server will be able to handle
+  #this client's request; two, to actually communicate or use the server;
+  #three, for load balancing decisions.
+  #
+  #These three points are considered in the following subsections.
+
+2.1 Load balancing
+
+  The Tor load balancing mechanism is quite complex in its details, but
+  it has a simple goal: The more traffic a server can handle the more
+  traffic it should get.  That means the more traffic a server can
+  handle the more likely a client will use it.
+
+  For this purpose each server descriptor has bandwidth information
+  which tries to convey a server's capacity to clients.
+
+  Currently we weigh servers differently for different purposes.  There
+  is a weight for when we use a server as a guard node (our entry to the
+  Tor network), there is one weight we assign servers for exit duties,
+  and a third for when we need intermediate (middle) nodes.
+
+2.2 Exit information
+
+  When a Tor client wants to exit to some resource on the internet it will
+  build a circuit to an exit node that allows access to that resource's
+  IP address and TCP Port.
+
+  When building that circuit the client can make sure that the circuit
+  ends at a server that will be able to fulfill the request because the
+  client already learned of all the servers' exit policies from their
+  descriptors.
+
+2.3 Capability information
+
+  Server descriptors contain information about the specific versions of
+  the Tor protocol they understand [proposal 105].
+
+  Furthermore the server descriptor also contains the exact version of
+  the Tor software that the server is running and some decisions are
+  made based on the server version number (for instance a Tor client
+  will only make conditional consensus requests [proposal from 13 Apr
+  2008 that never got a number] when talking to Tor servers version
+  0.2.1.1-alpha or later).
+
+2.4 Contact/key information
+
+  A server descriptor lists a server's IP address and TCP ports on which
+  it accepts onion and directory connections.  Furthermore it contains
+  the onion key, a short-lived RSA key to which clients encrypt CREATE
+  cells.
+
+2.5 Identity information
+
+  A Tor client learns the digest of a server's key from the network
+  status document.  Once it has a server descriptor this descriptor
+  contains the full RSA identity key of the server.  Clients verify
+  that 1) the digest of the identity key matches the expected digest
+  it got from the consensus, and 2) that the signature on the descriptor
+  from that key is valid.
+
+
+3. Doing away with the need for all SDs
+
+3.1 Load balancing info in consensus documents
+
+  One of the reasons why clients download all server descriptors is for
+  doing proper load balancing as described in 2.1.  In order for
+  clients to not require all server descriptors this information will
+  have to move into the network status document.
+
+  [XXX Two open questions here:
+   a) how do we arrive at a consensus weight?
+   b) how to represent weights in the consensus?
+      Maybe "s Guard=0.13 Exit=0.02 Middle=0.00 Stable.."
+  ]
+
+3.2 Fetching descriptors on demand
+
+  As described in 2.4 a descriptor lists IP address, OR- and Dir-Port,
+  and the onion key for a server.
+
+  A client already knows the IP address and the ports from the consensus
+  documents, but without the onion key it will not be able to send
+  CREATE/EXTEND cells for that server.  Since the client needs the onion
+  key it needs the descriptor.
+
+  If a client only downloaded a few descriptors in an observable manner
+  then that would leak which nodes it was going to use.
+
+  This proposal suggests the following:
+
+  1) when connecting to a guard node for which the client does not
+     yet have a cached descriptor it requests the descriptor it
+     expects by hash.  (The consensus document that the client holds
+     has a hash for the descriptor of this server.  We want exactly
+     that descriptor, not a different one.)
+
+     [XXX: How?  We could either come up with a new cell type,
+      RELAY_REQUEST_SD that takes only a hash (of the SD), or use
+      RELAY_BEGIN_DIR.  The former is probably smarter since we will
+      want to use it later on as well, and there we will require
+      padding.]
+
+     A client MAY cache the descriptor of the guard node so that it does
+     not need to request it every single time it contacts the guard.
+
+  2) when a client wants to extend a circuit that currently ends in
+     server B to a new next server C, the client will send a
+     RELAY_REQUEST_SD cell to server B.  This cell contains in its
+     payload the hash of a server descriptor the client would like
+     to obtain (C's server descriptor).  The server sends back the
+     descriptor and the client can now form a valid EXTEND/CREATE cell
+     encrypted to C's onion key.
+
+     Clients MUST NOT cache such descriptors.  If they did they might
+     leak that they already extended to that server at least once
+     before.
+
+  Replies to RELAY_REQUEST_SD requests need to be padded to some
+  constant upper limit in order to conceal a client's destination
+  from anybody who might be counting cells/bytes.
+
+  [XXX: detailed spec of RELAY_REQUEST_SD cell and its reply]
+  [XXX: figure out a decent padding size]
+
+3.3 Protocol versions
+
+  [XXX: find out where we need "opt protocols Link 1 2 Circuit 1"
+  information described in 2.3 above.  If we need it, it might have
+  to go into the consensus document.]
+
+  [XXX: Similarly find out where we need the version number of a
+  remote tor server.  This information is in the consensus, but
+  maybe we use it in some place where having it signed by the
+  server in question is really important?]
+
+3.4 Exit selection
+
+  Currently finding an appropriate exit node for a user's request is
+  easy for a client because it has complete knowledge of all the exit
+  policies of all servers on the network.
+
+  [XXX: I have no finished ideas here yet.
+    - if clients only rely on the current exit flag they will
+      a) never use servers for exit purposes that don't have it,
+      b) have a hard time finding a suitable exit node for
+         their weird port that only a few servers allow.
+    - the authorities could create a new summary document that
+      lists all the exit policies and their nodes (by fingerprint).
+      I need to find out how large that document would be.
+    - can we make the "Exit" flag more useful?  can we come
+      up with some "standard policies" and have operators pick
+      one of the standards?
+  ]
+
+4. Future possibilities
+
+  This proposal still requires that all servers have the descriptors of
+  every other node in the network in order to answer RELAY_REQUEST_SD
+  cells.  These cells are sent when a circuit is extended from ending at
+  node B to a new node C.  In that case B would have to answer a
+  RELAY_REQUEST_SD cell that asks for C's server descriptor (by SD digest).
+
+  In order to answer that request B obviously needs a copy of C's server
+  descriptor.  In the future we might amend RELAY_REQUEST_SD cells to
+  also contain the expected IP address and OR-port of the server C (the
+  client learns them from the network status document), so that B no
+  longer needs to know all the descriptors of the entire network but
+  instead can simply go and ask C for its descriptor before passing it
+  back to the client.
+
-- 
cgit v1.2.3-54-g00ecf