From 4e78e4971868036b08752fb7019db5ff23e6d2ba Mon Sep 17 00:00:00 2001 From: Nick Mathewson Date: Mon, 16 Jun 2008 01:14:02 +0000 Subject: Add proposal 140-consensus-diffs.txt svn:r15293 --- proposals/140-consensus-diffs.txt | 148 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 148 insertions(+) create mode 100644 proposals/140-consensus-diffs.txt (limited to 'proposals/140-consensus-diffs.txt') diff --git a/proposals/140-consensus-diffs.txt b/proposals/140-consensus-diffs.txt new file mode 100644 index 0000000..e2f0edc --- /dev/null +++ b/proposals/140-consensus-diffs.txt @@ -0,0 +1,148 @@ +Filename: 140-consensus-diffs.txt +Title: Provide diffs between consensuses +Version: $Revision$ +Last-Modified: $Date$ +Author: Peter Palfrader +Created: 13-Jun-2008 +Status: Open + +1. Overview. + + Tor clients and servers need a list of which relays are on the + network. This list, the consensus, is created by authorities + hourly and clients fetch a copy of it, with some delay, hourly. + + This proposal suggests that clients download diffs of consensuses + once they have a consensus instead of hourly downloading a full + consensus. + +2. Numbers + + After implementing proposal 138 which removes nodes that are not + running from the list a consensus document is about 92 kilobytes + in size after compression. + + The diff between two consecutive consensus, in ed format, is on + average 13 kilobytes compressed. + +3. Proposal + +3.1 Clients + + If a client has a consensus that is recent enough it SHOULD + try to download a diff to get the latest consensus rather than + fetching a full one. + + [XXX: what is recent enough? + time delta in hours / size of compressed diff + 0 20 + 1 9650 + 2 17011 + 3 23150 + 4 29813 + 5 36079 + 6 39455 + 7 43903 + 8 48907 + 9 54549 + 10 60057 + 11 67810 + 12 71171 + 13 73863 + 14 76048 + 15 80031 + 16 84686 + 17 89862 + 18 94760 + 19 94868 + 20 94223 + 21 93921 + 22 92144 + 23 90228 + [ size of gzip compressed "diff -e" between the consensus on + 2008-06-01-00:00:00 and the following consensuses that day. + Consensuses have been modified to exclude down routers per + proposal 138. ] + + Data suggests that for the first few hours diffs are very useful, + saving about 60% for the first three hours, 30% for the first 10, + and almost nothing once we are past 16 hours. + ] + +3.2 Servers + + Directory authorities and servers need to keep up to X [XXX: depends + on how long clients try to download diffs per above] old consensus + documents so they can build diffs. They should offer a diff to the + most recent consensus at the URL + + http://tor.noreply.org/tor/status-vote/current/consensus/diff// + + where hash is the full digest of the consensus the client currently + has, and FPRLIST is a list of (abbreviated) fingerprints of + authorities the client trusts. + + Servers will only return a consensus if more than half of the requested + authorities have signed the document, otherwise a 404 error will be sent + back. The fingerprints can be shortened to a length of any multiple of + two, using only the leftmost part of the encoded fingerprint. Tor uses + 3 bytes (6 hex characters) of the fingerprint. (This is just like the + conditional consensus downloads that Tor supports starting with + 0.1.2.1-alpha.) + + If a server cannot offer a diff from the consensus identified by the + hash but has a current consensus it MUST return the full consensus. + + [XXX: what should we do when the client already has the latest + consensus? I can think of the following options: + - send back 3xx not modified + - send back 200 ok and an empty diff + - send back 404 nothing newer here. + + I currently lean towards the empty diff.] + +4. Diff Format + + Diffs start with the token "network-status-diff-version" followed by a + space and the version number, currently "1". + + If a document does not start with network-status-diff it is assumed + to be a full consensus download and would therefore currently start + with "network-status-version 3". + + Following the network-status-diff header line is a diff, or patch, in + limited ed format. We choose this format because it is easy to create + and process with standard tools (patch, diff -e, ed). This will help + us in developing and testing this proposal and it should make future + debugging easier. + + [ If at one point in the future we decide that the space benefits from + a custom diff format outweighs these benefits we can always + introduce a new diff format and offer it at for instance + ../diff2/... ] + + We support the following ed commands, each on a line by itself: + - "d" Delete line n1 + - ",d" Delete lines n1 through n2, including + - "c" Replace line n1 with the following block + - ",c" Replace lines n1 through n2, including, with the + following block. + - "a" Append the following block after line n1. + - "a" Append the following block after the current line. + - "s/.//" Remove the first character in the current line. + + Note that line numbers always apply to the file after all previous + commands have already been applied. + + The "current line" is either the first line of the file, if this is + the first command, the last line of a block we added in an append or + change command, or the line immediate following a set of lines we just + deleted (or the last line of the file if there are no lines after + that). + + The replace and append command take blocks. These blocks are simply + appended to the diff after the line with the command. A line with + just a period (".") ends the block (and is not part of the lines + to add). Note that it is impossible to insert a line with just + a single dot. Recommended procedure is to insert a line with + two dots, then remove the first character of that line using s/.//. -- cgit v1.2.3-54-g00ecf