Filename: 140-consensus-diffs.txt
Title: Provide diffs between consensuses
Author: Peter Palfrader
Created: 13-Jun-2008
Status: Accepted
0. History
22-May-2009: Restricted the ed format even more strictly for ease of
implementation. -nickm
25-May-2014: Adapted to the new dir-spec version 3 and made the diff urls
backwards-compatible. -mvdan
1. Overview.
Tor clients and servers need a list of which relays are on the
network. This list, the consensus, is created by authorities
hourly and clients fetch a copy of it, with some delay, hourly.
This proposal suggests that, once clients have a consensus, they
download diffs between consensuses instead of downloading a full
consensus every hour.
This applies not only to ordinary directory consensuses, but also to the
newer microdescriptor consensuses added in the third version of the
directory specification.
2. Numbers
After implementing proposal 138, which removes nodes that are not
running from the list, a consensus document is about 92 kilobytes
in size after compression.
The diff between two consecutive consensuses, in ed format, is on
average 13 kilobytes compressed.
3. Proposal
3.1 Clients
If a client has a consensus that is recent enough it SHOULD
try to download a diff to get the latest consensus rather than
fetching a full one.
[XXX: what is recent enough?
time delta in hours / size of compressed diff in bytes
0 20
1 9650
2 17011
3 23150
4 29813
5 36079
6 39455
7 43903
8 48907
9 54549
10 60057
11 67810
12 71171
13 73863
14 76048
15 80031
16 84686
17 89862
18 94760
19 94868
20 94223
21 93921
22 92144
23 90228
[ size of gzip compressed "diff -e" between the consensus on
2008-06-01-00:00:00 and the following consensuses that day.
Consensuses have been modified to exclude down routers per
proposal 138. ]
Data suggests that for the first few hours diffs are very useful,
saving about 60% for the first three hours, 30% for the first 10,
and almost nothing once we are past 16 hours.
]
3.2 Servers
Directory authorities and servers need to keep up to X [XXX: depends
on how long clients try to download diffs per above] old consensus
documents so they can build diffs. They should offer a diff to the
most recent consensus at the following request:
HTTP/1.0 GET /tor/status-vote/current/consensus/<FPRLIST>.z
X-Or-Diff-From-Consensus: HASH1 HASH2...
where the hashes are the full digests of the consensuses the client
currently has, and FPRLIST is a list of (abbreviated) fingerprints of
authorities the client trusts.
Servers will only return a consensus if more than half of the requested
authorities have signed the document; otherwise a 404 error will be sent
back. The fingerprints can be shortened to a length of any multiple of
two, using only the leftmost part of the encoded fingerprint. Tor uses
3 bytes (6 hex characters) of the fingerprint. (This is just like the
conditional consensus downloads that Tor supports starting with
0.1.2.1-alpha.)
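The request above could be built as in the following minimal sketch.
The function name build_diff_request and its parameters are
hypothetical and not part of this proposal. Joining the fingerprints
with "+" is an assumption taken from the existing conditional
consensus URLs in dir-spec; this document does not itself define the
separator.

```python
def build_diff_request(authority_fingerprints, known_digests, prefix_hex_len=6):
    """Return (path, headers) for a consensus request that also asks for a diff.

    authority_fingerprints: hex-encoded fingerprints of trusted authorities.
    known_digests: full digests of the consensuses the client already has.
    prefix_hex_len: number of leftmost hex characters to keep (a multiple of two).
    """
    if prefix_hex_len <= 0 or prefix_hex_len % 2 != 0:
        raise ValueError("prefix length must be a positive multiple of two")
    # Keep only the leftmost part of each encoded fingerprint.
    short = [fp.upper()[:prefix_hex_len] for fp in authority_fingerprints]
    # Assumption: fingerprints are joined with "+", as in dir-spec URLs.
    path = "/tor/status-vote/current/consensus/" + "+".join(short) + ".z"
    headers = {"X-Or-Diff-From-Consensus": " ".join(known_digests)}
    return path, headers
```

With the default of 6 hex characters this mirrors the 3-byte
abbreviation mentioned above.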
The advantage of using the same URL that is currently used for
consensuses is that the client doesn't need to know whether a server
supports consensus diffs. If it doesn't, it will simply ignore the
extra header and return the full consensus.
If a server cannot offer a diff from any of the consensuses identified
by the hashes, but has a current consensus, it MUST return the
full consensus.
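The server-side rules described so far can be sketched as follows.
This is an illustration under stated assumptions, not the actual
implementation: the function choose_response, its arguments, and the
in-memory dictionary of precomputed diffs are all hypothetical. The
case where the client already has the latest consensus is not
handled, since it is still an open question.

```python
def choose_response(current, signers, requested_fprs, diffs, client_digests):
    """Return (status, body) for a consensus request.

    current: text of the current full consensus.
    signers: hex fingerprints of the authorities that signed it.
    requested_fprs: (abbreviated) fingerprints from the request URL.
    diffs: hypothetical map from an old consensus digest to the diff
           that turns that consensus into the current one.
    client_digests: digests listed in X-Or-Diff-From-Consensus.
    """
    requested = [f.upper() for f in requested_fprs]
    signed = sum(
        1 for prefix in requested
        if any(s.upper().startswith(prefix) for s in signers)
    )
    # More than half of the requested authorities must have signed.
    if requested and signed * 2 <= len(requested):
        return 404, ""
    # Offer a diff if we can build one from any consensus the client has.
    for digest in client_digests:
        if digest in diffs:
            return 200, diffs[digest]
    # Otherwise fall back to the full consensus.
    return 200, current
```

A server that ignores the extra header behaves like the last line
alone, which is why the same URL can be used for both cases.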
[XXX: what should we do when the client already has the latest
consensus? I can think of the following options:
- send back 3xx not modified
- send back 200 ok and an empty diff
- send back 404 nothing newer here.
I currently lean towards the empty diff.]
4. Diff Format
Diffs start with the token "network-status-diff-version" followed by a
space and the version number, currently "1".
If a document does not start with network-status-diff it is assumed
to be a full consensus download and would therefore currently start
with "network-status-version 3".
Following the network-status-diff line is another header line, starting with
the token "hash" followed by the full digest of the consensus that this diff
applies to and the full digest of the consensus that the resulting consensus
should have.
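A minimal sketch of how a client might read these two header lines
is shown below. The function name parse_diff_header and its return
shape are hypothetical. It returns None when the document does not
look like a diff, so that the caller can treat it as a full
consensus.

```python
def parse_diff_header(document):
    """Split a diff into (from_digest, to_digest, command_lines).

    Returns None if the document does not start with the diff version
    token, in which case it is assumed to be a full consensus.
    """
    lines = document.splitlines()
    if not lines or not lines[0].startswith("network-status-diff-version "):
        return None
    version = lines[0].split(" ", 1)[1]
    if version != "1":
        raise ValueError("unsupported diff version: %r" % version)
    if len(lines) < 2:
        raise ValueError("missing hash line")
    parts = lines[1].split(" ")
    if len(parts) != 3 or parts[0] != "hash":
        raise ValueError("malformed hash line: %r" % lines[1])
    # parts[1] is the digest of the consensus the diff applies to,
    # parts[2] the digest the resulting consensus should have.
    return parts[1], parts[2], lines[2:]
```

After applying the diff, the client would compare the digest of the
result against the second value before using the new consensus.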
Following the network-status-diff header lines is a diff, or patch, in
limited ed format. We choose this format because it is easy to create
and process with standard tools (patch, diff -e, ed). This will help
us in developing and testing this proposal and it should make future
debugging easier.
[ If at some point in the future we decide that the space benefits of
a custom diff format outweigh these benefits, we can always
introduce a new diff format and offer it at, for instance,
../diff2/... ]
We support the following ed commands, each on a line by itself:
- "<n1>d" Delete line n1
- "<n1>,<n2>d" Delete lines n1 through n2, inclusive
- "<n1>c" Replace line n1 with the following block
- "<n1>,<n2>c" Replace lines n1 through n2, inclusive, with the
following block.
- "<n1>a" Append the following block after line n1.
- "a" Append the following block after the current line.
Note that line numbers always apply to the file after all previous
commands have already been applied.
The commands MUST apply to the file from back to front, such that
lines are only ever referred to by their position in the original
file.
The "current line" is either the first line of the file, if this is
the first command, the last line of a block we added in an append or
change command, or the line immediately following a set of lines we just
deleted (or the last line of the file if there are no lines after
that).
The replace and append commands take blocks. These blocks are simply
appended to the diff after the line with the command. A line with
just a period (".") ends the block (and is not part of the lines
to add). Note that it is impossible to insert a line with just
a single dot.
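The command set above is small enough that applying a diff can be
sketched in a few dozen lines. The following is an illustration
only, assuming the header lines have already been removed. The name
apply_ed_diff is hypothetical. Accepting "0a" to insert before the
first line is an assumption borrowed from standard ed and is not
stated in this proposal. The sketch does not verify that commands
arrive in back-to-front order.

```python
import re

# Optional "<n1>" or "<n1>,<n2>" followed by one of the commands a, c, d.
_COMMAND = re.compile(r"(?:(\d+)(?:,(\d+))?)?([acd])")

def apply_ed_diff(original_lines, diff_lines):
    """Apply a diff in the limited ed format and return the new lines."""
    lines = list(original_lines)
    current = 1  # 1-based "current line"; the first line before any command
    i = 0
    while i < len(diff_lines):
        m = _COMMAND.fullmatch(diff_lines[i])
        if m is None:
            raise ValueError("unsupported command: %r" % diff_lines[i])
        i += 1
        start_s, end_s, op = m.groups()
        if start_s is None:
            if op != "a":
                raise ValueError("only 'a' may omit its line number")
            start = end = current
        else:
            if op == "a" and end_s is not None:
                raise ValueError("'a' takes a single line number")
            start = int(start_s)
            end = int(end_s) if end_s is not None else start
        lowest = 0 if op == "a" else 1  # assumption: "0a" inserts at the top
        if not (lowest <= start <= end <= len(lines)):
            raise ValueError("line range out of bounds")
        block = []
        if op in ("a", "c"):
            # Collect the block up to the terminating "." line.
            while True:
                if i >= len(diff_lines):
                    raise ValueError("unterminated block")
                line = diff_lines[i]
                i += 1
                if line == ".":
                    break
                block.append(line)
        if op == "d":
            del lines[start - 1:end]
            # Line following the deleted lines, or the last line of the file.
            current = min(start, len(lines))
        elif op == "c":
            lines[start - 1:end] = block
            current = start - 1 + len(block)
        else:
            lines[start:start] = block
            current = start + len(block)
    return lines
```

For example, applying the commands "5a", "3,4c" and "1d" in that
order to a five-line file edits it from back to front, so each line
number still refers to a position in the original file.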