Filename: 141-jit-sd-downloads.txt
Title: Download server descriptors on demand
Version: $Revision$
Last-Modified: $Date$
Author: Peter Palfrader
Created: 15-Jun-2008
Status: Draft
1. Overview
Downloading all server descriptors is the most expensive part
of bootstrapping a Tor client. These server descriptors currently
amount to about 1.5 Megabytes of data, and this size will grow
linearly with network size.
Fetching all these server descriptors takes a long while for people
behind slow network connections. It is also a considerable load on
our network of directory mirrors.
This document describes proposed changes to the Tor network and
directory protocol so that clients will no longer need to download
all server descriptors.
These changes consist of moving load balancing information into
network status documents, implementing a means to download server
descriptors on demand in an anonymity-preserving way, and dealing
with exit node selection.
2. What is in a server descriptor
When a Tor client starts, the first thing it will try to get is a
current network status document: a consensus signed by a majority
of directory authorities. This document is currently about 100
Kilobytes in size, though it will grow linearly with network size.
This document lists all servers currently running on the network.
The Tor client will then try to get a server descriptor for each
of the running servers. All server descriptors currently amount
to about 1.5 Megabytes of downloads.
A Tor client learns several things about a server from its descriptor.
Some of these it already learned from the network status document
published by the authorities, but the server descriptor contains it
again in a single statement signed by the server itself, not just by
the directory authorities.
Tor clients use the information from server descriptors for
different purposes, which are considered in the following sections.
#three ways: One, to determine if a server will be able to handle
#this client's request; two, to actually communicate or use the server;
#three, for load balancing decisions.
#
#These three points are considered in the following subsections.
2.1 Load balancing
The Tor load balancing mechanism is quite complex in its details, but
it has a simple goal: The more traffic a server can handle the more
traffic it should get. That means the more traffic a server can
handle the more likely a client will use it.
For this purpose each server descriptor has bandwidth information
which tries to convey a server's capacity to clients.
Currently we weigh servers differently for different purposes. There
is one weight for when we use a server as a guard node (our entry to
the Tor network), another weight we assign servers for exit duties,
and a third for when we need intermediate (middle) nodes.
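The goal stated above (more capacity means a proportionally higher chance of being picked) can be illustrated with a minimal sketch. This is not Tor's actual selection algorithm, which also applies the per-position weights just mentioned; the function name and the plain bandwidth-proportional rule are assumptions made for illustration.

```python
import random

def pick_server(bandwidths, rng=random):
    """Pick one server name with probability proportional to its bandwidth.

    bandwidths: dict mapping a server name to its advertised capacity.
    rng: the random module or a random.Random instance (hypothetical
         parameter, added so that callers can seed the choice).
    """
    names = list(bandwidths)
    weights = [bandwidths[name] for name in names]
    # random.choices draws with replacement using the relative weights.
    return rng.choices(names, weights=weights, k=1)[0]
```

With bandwidths of 100 and 300, the second server would be chosen about three times as often as the first.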
2.2 Exit information
When a Tor client wants to exit to some resource on the internet it
will build a circuit to an exit node that allows access to that
resource's IP address and TCP port.
When building that circuit the client can make sure that the circuit
ends at a server that will be able to fulfill the request because the
client already learned of all the servers' exit policies from their
descriptors.
2.3 Capability information
Server descriptors contain information about the specific versions of
the Tor protocol they understand [proposal 105].
Furthermore the server descriptor also contains the exact version of
the Tor software that the server is running and some decisions are
made based on the server version number (for instance a Tor client
will only make conditional consensus requests [proposal 139] when
talking to Tor servers version 0.2.1.1-alpha or later).
2.4 Contact/key information
A server descriptor lists a server's IP address and TCP ports on which
it accepts onion and directory connections. Furthermore it contains
the onion key (a short lived RSA key to which clients encrypt CREATE
cells).
2.5 Identity information
A Tor client learns the digest of a server's key from the network
status document. Once it has a server descriptor this descriptor
contains the full RSA identity key of the server. Clients verify
that 1) the digest of the identity key matches the expected digest
it got from the consensus, and 2) that the signature on the descriptor
from that key is valid.
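The two checks can be sketched as follows. The sketch assumes that the digest in the consensus is a SHA-1 hash over the encoded identity key, and it leaves the RSA signature check to a caller-supplied function; `verify_signature` and the other names are hypothetical and not part of any Tor API.

```python
import hashlib
import hmac

def descriptor_matches_consensus(identity_key_der, expected_digest,
                                 verify_signature):
    """Return True only if both checks from the text above pass.

    identity_key_der: encoded identity key taken from the descriptor (bytes).
    expected_digest:  raw digest bytes learned from the consensus.
    verify_signature: hypothetical callable that checks the descriptor
                      signature against the given key and returns a bool.
    """
    # Check 1: the key in the descriptor hashes to the expected digest.
    digest = hashlib.sha1(identity_key_der).digest()
    if not hmac.compare_digest(digest, expected_digest):
        return False
    # Check 2: the descriptor signature made with that key is valid.
    return bool(verify_signature(identity_key_der))
```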
3. No longer require clients to have copies of all SDs
3.1 Load balancing info in consensus documents
One of the reasons why clients download all server descriptors is to
do proper load balancing as described in 2.1. In order for clients
to not require all server descriptors, this information will have to
move into the network status document.
Consensus documents will have a new line per router similar
to the "r", "s", and "v" lines that already exist. This line
will convey weight information to clients.
"w Bandwidth=193671"
The bandwidth number is the lesser of observed bandwidth and bandwidth
rate limit from the server descriptor that the "r" line references by
digest (1st and 3rd field of the bandwidth line in the descriptor).
The bandwidth item is added as another item in the router tuple
described in dir-spec section 3.4:
| * Two router entries are "the same" if they have the same
| <descriptor digest, published time, nickname, IP, ports> tuple.
| We choose the tuple for a given router as whichever tuple appears
| for that router in the most votes. We break ties in favor of
| the more recently published.
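As a rough illustration of the "w" line from this section, the sketch below derives it from a descriptor's bandwidth line and parses it back. It assumes the bandwidth line has the form "bandwidth <rate> <burst> <observed>", so that the 1st and 3rd fields are the rate limit and the observed bandwidth; the function names are hypothetical.

```python
def make_w_line(descriptor_bandwidth_line):
    """Build a consensus "w" line from a descriptor "bandwidth" line."""
    fields = descriptor_bandwidth_line.split()
    if len(fields) != 4 or fields[0] != "bandwidth":
        raise ValueError("unexpected bandwidth line")
    rate, observed = int(fields[1]), int(fields[3])
    # The consensus carries the lesser of rate limit and observed bandwidth.
    return "w Bandwidth=%d" % min(rate, observed)

def parse_w_line(line):
    """Return the bandwidth value carried by a "w" line as an int."""
    keyword, _, rest = line.partition(" ")
    key, _, value = rest.partition("=")
    if keyword != "w" or key != "Bandwidth":
        raise ValueError("not a w line")
    return int(value)
```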
3.2 Fetching descriptors on demand
As described in 2.4 a descriptor lists IP address, OR- and Dir-Port,
and the onion key for a server.
A client already knows the IP address and the ports from the consensus
documents, but without the onion key it will not be able to send
CREATE/EXTEND cells for that server. Since the client needs the onion
key it needs the descriptor.
If a client only downloaded a few descriptors in an observable manner
then that would leak which nodes it was going to use.
This proposal suggests the following:
1) when connecting to a guard node for which the client does not
yet have a cached descriptor it requests the descriptor it
expects by hash. (The consensus document that the client holds
has a hash for the descriptor of this server. We want exactly
that descriptor, not a different one.)
It does that by sending a RELAY_REQUEST_SD cell.
A client MAY cache the descriptor of the guard node so that it does
not need to request it every single time it contacts the guard.
2) when a client wants to extend a circuit that currently ends in
server B to a new next server C, the client will send a
RELAY_REQUEST_SD cell to server B. This cell contains in its
payload the hash of a server descriptor the client would like
to obtain (C's server descriptor). The server sends back the
descriptor and the client can now form a valid EXTEND/CREATE cell
encrypted to C's onion key.
Clients MUST NOT cache such descriptors. If they did they might
leak that they already extended to that server at least once
before.
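The caching rules in 1) and 2) can be summarised in a small sketch: guard descriptors may be kept, and all others are fetched each time and never stored. The class and the `fetch` callable are hypothetical; `fetch` stands in for sending a RELAY_REQUEST_SD cell and waiting for the reply.

```python
class DescriptorSource:
    """Fetch descriptors by hash, caching only those of guard nodes."""

    def __init__(self, fetch):
        # fetch: hypothetical callable mapping a descriptor hash to the
        # descriptor bytes (one RELAY_REQUEST_SD round trip).
        self._fetch = fetch
        self._guard_cache = {}

    def get(self, descriptor_hash, is_guard):
        if is_guard:
            # A client MAY cache its guard's descriptor.
            if descriptor_hash not in self._guard_cache:
                self._guard_cache[descriptor_hash] = self._fetch(descriptor_hash)
            return self._guard_cache[descriptor_hash]
        # Descriptors of later hops MUST NOT be cached, so the request
        # pattern is the same on every extension.
        return self._fetch(descriptor_hash)
```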
Replies to RELAY_REQUEST_SD requests need to be padded to some
constant upper limit in order to conceal a client's destination
from anybody who might be counting cells/bytes.
RELAY_REQUEST_SD cells contain the following information:
- hash of the server descriptor requested
- hash of the identity digest of the server for which we want the SD
- IP address and OR-port of the server for which we want the SD
- padding factor - the number of cells we want the answer
padded to.
[XXX this just occurred to me and it might be smart. or it might
be stupid. clients would learn the padding factor they want
to use from the consensus document. This allows us to grow
the replies later on should SDs become larger.]
[XXX: figure out a decent padding size]
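To make the padding factor concrete, the sketch below pads a reply to a fixed number of cells. The padding size is still open in this proposal, so everything here is an assumption: the 498 usable bytes per relay cell, the zero-byte padding, and the function name. A real format would also need a way for the client to find the end of the descriptor, which this sketch leaves out.

```python
RELAY_PAYLOAD_LEN = 498  # assumption: usable bytes per relay cell

def pad_reply(descriptor, padding_factor):
    """Split a descriptor into exactly padding_factor equal-sized chunks.

    descriptor: descriptor bytes to send back to the client.
    padding_factor: number of cells the answer is padded to.
    """
    total = padding_factor * RELAY_PAYLOAD_LEN
    if len(descriptor) > total:
        raise ValueError("descriptor does not fit into the padded reply")
    # Fill the remainder with zero bytes so every reply has the same size.
    padded = descriptor + b"\x00" * (total - len(descriptor))
    return [padded[i:i + RELAY_PAYLOAD_LEN]
            for i in range(0, total, RELAY_PAYLOAD_LEN)]
```

An observer counting cells then sees the same number of cells for every reply, whichever descriptor was requested.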
3.3 Protocol versions
Server descriptors contain optional information of supported
link-level and circuit-level protocols in the form of
"opt protocols Link 1 2 Circuit 1". These are not currently needed
and will probably eventually move into the "v" (version) line in
the consensus. This proposal does not deal with them.
Similarly a server descriptor contains the version number of
a Tor node. This information is already present in the consensus
and is thus available to all clients immediately.
3.4 Exit selection
Currently finding an appropriate exit node for a user's request is
easy for a client because it has complete knowledge of all the exit
policies of all servers on the network.
The consensus document will once again be extended to contain the
information required by clients. This information will be a summary
of each node's exit policy. The exit policy summary will only contain
the list of ports to which a node exits to most destination IP
addresses.
A summary should claim a router exits to a specific TCP port if,
ignoring private IP addresses (link and site local per RFC3300), the
exit policy indicates that the router would exit to this port to any
IP address with the exception of at most 2^25 single addresses (That's
either two /8 netblocks, or one /8 and a couple of /12s or any other
combination).
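The 2^25 rule can be sketched for a simplified policy model. The sketch assumes a policy is an ordered, first-match list of (action, network, low port, high port) rules that ends in a wildcard rule, that rejected networks do not overlap, and that no narrower accept rules need to be accounted for. The list of private networks and all names are illustrative assumptions.

```python
import ipaddress

MAX_REJECTED = 2 ** 25  # address threshold from the text above

# Assumption: the private ranges that the summary ignores.
PRIVATE_NETS = [ipaddress.ip_network(n) for n in (
    "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",
    "169.254.0.0/16", "127.0.0.0/8")]

def exits_to_port(policy, port):
    """Decide whether a summary should claim the router exits to port."""
    rejected = 0
    for action, network, low, high in policy:
        if not low <= port <= high:
            continue  # rule does not cover this port
        net = ipaddress.ip_network(network)
        if net.prefixlen == 0:
            # Wildcard rule: it decides all addresses not rejected so far.
            return action == "accept" and rejected <= MAX_REJECTED
        if any(net.subnet_of(private) for private in PRIVATE_NETS):
            continue  # rules about private addresses are ignored
        if action == "reject":
            rejected += net.num_addresses
    return False  # assumption: without a wildcard rule, claim no exit
```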
An exit policy summary will be included in votes and consensus as a
new line attached to each exit node. A lack of policy should indicate
a non-exit policy. The line will have the format
"p" <space> "accept"|"reject" <portlist>
where portlist is a comma separated list of single port numbers or
portranges (e.g. "22,80-88,1024-6000,6667"). Whether the summary
shows the list of accepted ports or the list of rejected ports depends
on which list is shorter (has fewer elements). In case of ties we
choose the list of accepted ports.
Similarly to IP address, ports, timestamp, and bandwidth a consensus
should list the exit policy matching the descriptor digest referenced
in the consensus document.
4. Migration
4.1 Consensus document changes.
The consensus will need to include
- bandwidth information (see 3.1)
- exit policy summaries (3.4)
A new consensus method (number TBD) will be chosen for this.
5. Future possibilities
This proposal still requires that all servers have the descriptors of
every other node in the network in order to answer RELAY_REQUEST_SD
cells. These cells are sent when a circuit is extended from ending at
node B to a new node C. In that case B would have to answer a
RELAY_REQUEST_SD cell that asks for C's server descriptor (by SD digest).
In order to answer that request B obviously needs a copy of C's server
descriptor. The RELAY_REQUEST_SD cell already has all the info that
B needs to contact C so it can ask about the descriptor before passing it
back to the client.