diff options
Diffstat (limited to 'doc/spec/dir-spec.txt')
-rw-r--r-- | doc/spec/dir-spec.txt | 500 |
1 files changed, 483 insertions, 17 deletions
diff --git a/doc/spec/dir-spec.txt b/doc/spec/dir-spec.txt index 9a2a62bc46..cfe6cd8f92 100644 --- a/doc/spec/dir-spec.txt +++ b/doc/spec/dir-spec.txt @@ -1,4 +1,3 @@ -$Id$ Tor directory protocol, version 3 @@ -11,7 +10,7 @@ $Id$ Caches and authorities must still support older versions of the directory protocols, until the versions of Tor that require them are - finally out of commission. See Section XXXX on backward compatibility. + finally out of commission. This document merges and supersedes the following proposals: @@ -183,7 +182,8 @@ $Id$ All directory information is uploaded and downloaded with HTTP. [Authorities also generate and caches also cache documents produced and - used by earlier versions of this protocol; see section XXX for notes.] + used by earlier versions of this protocol; see dir-spec-v1.txt and + dir-spec-v2.txt for notes on those versions.] 1.1. What's different from version 2? @@ -592,9 +592,9 @@ $Id$ with unrecognized items; the protocols line should be preceded with an "opt" until these Tors are obsolete.] - "allow-single-hop-exits" + "allow-single-hop-exits" NL - [At most one.] + [At most once.] Present only if the router allows single-hop circuits to make exit connections. Most Tor servers do not support this: this is @@ -613,7 +613,7 @@ $Id$ Fingerprint is encoded in hex (using upper-case letters), with no spaces. - "published" + "published" YYYY-MM-DD HH:MM:SS NL [Exactly once.] @@ -628,8 +628,8 @@ $Id$ As documented in 2.1 above. See migration notes in section 2.2.1. - "geoip-start" YYYY-MM-DD HH:MM:SS NL - "geoip-client-origins" CC=N,CC=N,... NL + ("geoip-start" YYYY-MM-DD HH:MM:SS NL) + ("geoip-client-origins" CC=N,CC=N,... NL) Only generated by bridge routers (see blocking.pdf), and only when they have been configured with a geoip database. @@ -642,6 +642,227 @@ $Id$ "geoip-start" is the time at which we began collecting geoip statistics. + "geoip-start" and "geoip-client-origins" have been replaced by + "bridge-stats-end" and "bridge-stats-ips" in 0.2.2.4-alpha. The + reason is that the measurement interval with "geoip-stats" as + determined by subtracting "geoip-start" from "published" could + have had a variable length, whereas the measurement interval in + 0.2.2.4-alpha and later is set to be exactly 24 hours long. In + order to clearly distinguish the new measurement intervals from + the old ones, the new keywords have been introduced. + + "bridge-stats-end" YYYY-MM-DD HH:MM:SS (NSEC s) NL + [At most once.] + + YYYY-MM-DD HH:MM:SS defines the end of the included measurement + interval of length NSEC seconds (86400 seconds by default). + + A "bridge-stats-end" line, as well as any other "bridge-*" line, + is only added when the relay has been running as a bridge for at + least 24 hours. + + "bridge-ips" CC=N,CC=N,... NL + [At most once.] + + List of mappings from two-letter country codes to the number of + unique IP addresses that have connected from that country to the + bridge and which are no known relays, rounded up to the nearest + multiple of 8. + + "dirreq-stats-end" YYYY-MM-DD HH:MM:SS (NSEC s) NL + [At most once.] + + YYYY-MM-DD HH:MM:SS defines the end of the included measurement + interval of length NSEC seconds (86400 seconds by default). + + A "dirreq-stats-end" line, as well as any other "dirreq-*" line, + is only added when the relay has opened its Dir port and after 24 + hours of measuring directory requests. + + "dirreq-v2-ips" CC=N,CC=N,... NL + [At most once.] + "dirreq-v3-ips" CC=N,CC=N,... NL + [At most once.] + + List of mappings from two-letter country codes to the number of + unique IP addresses that have connected from that country to + request a v2/v3 network status, rounded up to the nearest multiple + of 8. Only those IP addresses are counted that the directory can + answer with a 200 OK status code. + + "dirreq-v2-reqs" CC=N,CC=N,... NL + [At most once.] + "dirreq-v3-reqs" CC=N,CC=N,... NL + [At most once.] + + List of mappings from two-letter country codes to the number of + requests for v2/v3 network statuses from that country, rounded up + to the nearest multiple of 8. Only those requests are counted that + the directory can answer with a 200 OK status code. + + "dirreq-v2-share" num% NL + [At most once.] + "dirreq-v3-share" num% NL + [At most once.] + + The share of v2/v3 network status requests that the directory + expects to receive from clients based on its advertised bandwidth + compared to the overall network bandwidth capacity. Shares are + formatted in percent with two decimal places. Shares are + calculated as means over the whole 24-hour interval. + + "dirreq-v2-resp" status=num,... NL + [At most once.] + "dirreq-v3-resp" status=nul,... NL + [At most once.] + + List of mappings from response statuses to the number of requests + for v2/v3 network statuses that were answered with that response + status, rounded up to the nearest multiple of 4. Only response + statuses with at least 1 response are reported. New response + statuses can be added at any time. The current list of response + statuses is as follows: + + "ok": a network status request is answered; this number + corresponds to the sum of all requests as reported in + "dirreq-v2-reqs" or "dirreq-v3-reqs", respectively, before + rounding up. + "not-enough-sigs: a version 3 network status is not signed by a + sufficient number of requested authorities. + "unavailable": a requested network status object is unavailable. + "not-found": a requested network status is not found. + "not-modified": a network status has not been modified since the + If-Modified-Since time that is included in the request. + "busy": the directory is busy. + + "dirreq-v2-direct-dl" key=val,... NL + [At most once.] + "dirreq-v3-direct-dl" key=val,... NL + [At most once.] + "dirreq-v2-tunneled-dl" key=val,... NL + [At most once.] + "dirreq-v3-tunneled-dl" key=val,... NL + [At most once.] + + List of statistics about possible failures in the download process + of v2/v3 network statuses. Requests are either "direct" + HTTP-encoded requests over the relay's directory port, or + "tunneled" requests using a BEGIN_DIR cell over the relay's OR + port. The list of possible statistics can change, and statistics + can be left out from reporting. The current list of statistics is + as follows: + + Successful downloads and failures: + + "complete": a client has finished the download successfully. + "timeout": a download did not finish within 10 minutes after + starting to send the response. + "running": a download is still running at the end of the + measurement period for less than 10 minutes after starting to + send the response. + + Download times: + + "min", "max": smallest and largest measured bandwidth in B/s. + "d[1-4,6-9]": 1st to 4th and 6th to 9th decile of measured + bandwidth in B/s. For a given decile i, i/10 of all downloads + had a smaller bandwidth than di, and (10-i)/10 of all downloads + had a larger bandwidth than di. + "q[1,3]": 1st and 3rd quartile of measured bandwidth in B/s. One + fourth of all downloads had a smaller bandwidth than q1, one + fourth of all downloads had a larger bandwidth than q3, and the + remaining half of all downloads had a bandwidth between q1 and + q3. + "md": median of measured bandwidth in B/s. Half of the downloads + had a smaller bandwidth than md, the other half had a larger + bandwidth than md. + + "entry-stats-end" YYYY-MM-DD HH:MM:SS (NSEC s) NL + [At most once.] + + YYYY-MM-DD HH:MM:SS defines the end of the included measurement + interval of length NSEC seconds (86400 seconds by default). + + An "entry-stats-end" line, as well as any other "entry-*" + line, is first added after the relay has been running for at least + 24 hours. + + "entry-ips" CC=N,CC=N,... NL + [At most once.] + + List of mappings from two-letter country codes to the number of + unique IP addresses that have connected from that country to the + relay and which are no known other relays, rounded up to the + nearest multiple of 8. + + "cell-stats-end" YYYY-MM-DD HH:MM:SS (NSEC s) NL + [At most once.] + + YYYY-MM-DD HH:MM:SS defines the end of the included measurement + interval of length NSEC seconds (86400 seconds by default). + + A "cell-stats-end" line, as well as any other "cell-*" line, + is first added after the relay has been running for at least 24 + hours. + + "cell-processed-cells" num,...,num NL + [At most once.] + + Mean number of processed cells per circuit, subdivided into + deciles of circuits by the number of cells they have processed in + descending order from loudest to quietest circuits. + + "cell-queued-cells" num,...,num NL + [At most once.] + + Mean number of cells contained in queues by circuit decile. These + means are calculated by 1) determining the mean number of cells in + a single circuit between its creation and its termination and 2) + calculating the mean for all circuits in a given decile as + determined in "cell-processed-cells". Numbers have a precision of + two decimal places. + + "cell-time-in-queue" num,...,num NL + [At most once.] + + Mean time cells spend in circuit queues in milliseconds. Times are + calculated by 1) determining the mean time cells spend in the + queue of a single circuit and 2) calculating the mean for all + circuits in a given decile as determined in + "cell-processed-cells". + + "cell-circuits-per-decile" num NL + [At most once.] + + Mean number of circuits that are included in any of the deciles, + rounded up to the next integer. + + "exit-stats-end" YYYY-MM-DD HH:MM:SS (NSEC s) NL + [At most once.] + + YYYY-MM-DD HH:MM:SS defines the end of the included measurement + interval of length NSEC seconds (86400 seconds by default). + + An "exit-stats-end" line, as well as any other "exit-*" line, is + first added after the relay has been running for at least 24 hours + and only if the relay permits exiting (where exiting to a single + port and IP address is sufficient). + + "exit-kibibytes-written" port=N,port=N,... NL + [At most once.] + "exit-kibibytes-read" port=N,port=N,... NL + [At most once.] + + List of mappings from ports to the number of kibibytes that the + relay has written to or read from exit connections to that port, + rounded up to the next full kibibyte. + + "exit-streams-opened" port=N,port=N,... NL + [At most once.] + + List of mappings from ports to the number of opened exit streams + to that port, rounded up to the nearest multiple of 4. + "router-signature" NL Signature NL [At end, exactly once.] @@ -795,10 +1016,10 @@ $Id$ generate exactly the same consensus given the same set of votes. The procedure for deciding when to generate vote and consensus status - documents are described in section XXX below. + documents are described in section 1.4 on the voting timeline. Status documents contain a preamble, an authority section, a list of - router status entries, and one more footers signature, in that order. + router status entries, and one or more footer signature, in that order. Unlike other formats described above, a SP in these documents must be a single space character (hex 20). @@ -905,6 +1126,32 @@ $Id$ enough votes were counted for the consensus for an authoritative opinion to have been formed about their status. + "params" SP [Parameters] NL + + [At most once] + + Parameter ::= Keyword '=' Int32 + Int32 ::= A decimal integer between -2147483648 and 2147483647. + Parameters ::= Parameter | Parameters SP Parameter + + The parameters list, if present, contains a space-separated list of + key-value pairs, sorted in lexical order by their keyword. Each + parameter has its own meaning. + + (Only included when the vote is generated with consensus-method 7 or + later.) + + Commonly used "param" arguments at this point include: + + "CircWindow" -- the default package window that circuits should + be established with. It started out at 1000 cells, but some + research indicates that a lower value would mean fewer cells in + transit in the network at any given time. Obeyed by Tor 0.2.1.20 + and later. + + "CircPriorityHalflifeMsec" -- the halflife parameter used when + weighting which circuit will send the next cell. Obeyed by Tor + 0.2.2.7-alpha and later. The authority section of a vote contains the following items, followed in turn by the authority's current key certificate: @@ -1030,13 +1277,20 @@ $Id$ descriptors if they would cause "v" lines to be over 128 characters long. - "w" SP "Bandwidth=" INT NL + "w" SP "Bandwidth=" INT [SP "Measured=" INT] NL [At most once.] An estimate of the bandwidth of this server, in an arbitrary unit (currently kilobytes per second). Used to weight router - selection. Other weighting keywords may be added later. + selection. + + Additionally, the Measured= keyword is present in votes by + participating bandwidth measurement authorities to indicate + a measured bandwidth currently produced by measuring stream + capacities. + + Other weighting keywords may be added later. Clients MUST ignore keywords they do not recognize. "p" SP ("accept" / "reject") SP PortList NL @@ -1051,8 +1305,57 @@ $Id$ or does not support (if 'reject') for exit to "most addresses". - The signature section contains the following item, which appears - Exactly Once for a vote, and At Least Once for a consensus. + The footer section is delineated in all votes and consensuses supporting + consensus method 9 and above with the following: + + "directory-footer" NL + + It contains two subsections, a bandwidths-weights line and a + directory-signature. + + The bandwidths-weights line appears At Most Once for a consensus. It does + not appear in votes. + + "bandwidth-weights" SP + "Wbd=" INT SP "Wbe=" INT SP "Wbg=" INT SP "Wbm=" INT SP + "Wdb=" INT SP + "Web=" INT SP "Wed=" INT SP "Wee=" INT SP "Weg=" INT SP "Wem=" INT SP + "Wgb=" INT SP "Wgd=" INT SP "Wgg=" INT SP "Wgm=" INT SP + "Wmb=" INT SP "Wmd=" INT SP "Wme=" INT SP "Wmg=" INT SP "Wmm=" INT NL + + These values represent the weights to apply to router bandwidths during + path selection. They are sorted in alphabetical order in the list. The + integer values are divided by BW_WEIGHT_SCALE=10000 or the consensus + param "bwweightscale". They are: + + Wgg - Weight for Guard-flagged nodes in the guard position + Wgm - Weight for non-flagged nodes in the guard Position + Wgd - Weight for Guard+Exit-flagged nodes in the guard Position + + Wmg - Weight for Guard-flagged nodes in the middle Position + Wmm - Weight for non-flagged nodes in the middle Position + Wme - Weight for Exit-flagged nodes in the middle Position + Wmd - Weight for Guard+Exit flagged nodes in the middle Position + + Weg - Weight for Guard flagged nodes in the exit Position + Wem - Weight for non-flagged nodes in the exit Position + Wee - Weight for Exit-flagged nodes in the exit Position + Wed - Weight for Guard+Exit-flagged nodes in the exit Position + + Wgb - Weight for BEGIN_DIR-supporting Guard-flagged nodes + Wmb - Weight for BEGIN_DIR-supporting non-flagged nodes + Web - Weight for BEGIN_DIR-supporting Exit-flagged nodes + Wdb - Weight for BEGIN_DIR-supporting Guard+Exit-flagged nodes + + Wbg - Weight for Guard+Exit-flagged nodes for BEGIN_DIR requests + Wbm - Weight for Guard+Exit-flagged nodes for BEGIN_DIR requests + Wbe - Weight for Guard+Exit-flagged nodes for BEGIN_DIR requests + Wbd - Weight for Guard+Exit-flagged nodes for BEGIN_DIR requests + + These values are calculated as specified in Section 3.4.3. + + The signature contains the following item, which appears Exactly Once + for a vote, and At Least Once for a consensus. "directory-signature" SP identity SP signing-key-digest NL Signature @@ -1065,7 +1368,7 @@ $Id$ the signing authority, and "signing-key-digest" is the hex-encoded digest of the current authority signing key of the signing authority. -3.3. Deciding how to vote. +3.3. Assigning flags in a vote (This section describes how directory authorities choose which status flags to apply to routers, as of Tor 0.2.0.0-alpha-dev. Later directory @@ -1128,7 +1431,7 @@ $Id$ least one /8 address space. "Fast" -- A router is 'Fast' if it is active, and its bandwidth is - either in the top 7/8ths for known active routers or at least 100KB/s. + either in the top 7/8ths for known active routers or at least 20KB/s. "Guard" -- A router is a possible 'Guard' if its Weighted Fractional Uptime is at least the median for "familiar" active routers, and if @@ -1179,6 +1482,13 @@ $Id$ rate limit from the router descriptor. It is given in kilobytes per second, and capped at some arbitrary value (currently 10 MB/s). + The Measured= keyword on a "w" line vote is currently computed + by multiplying the previous published consensus bandwidth by the + ratio of the measured average node stream capacity to the network + average. If 3 or more authorities provide a Measured= keyword for + a router, the authorities produce a consensus containing a "w" + Bandwidth= keyword equal to the median of the Measured= votes. + The ports listed in a "p" line should be taken as those ports for which the router's exit policy permits 'most' addresses, ignoring any accept not for all addresses, ignoring all rejects for private @@ -1199,6 +1509,10 @@ $Id$ Known-flags is the union of all flags known by any voter. + Entries are given on the "params" line for every keyword on which any + authority voted. The values given are the low-median of all votes on + that keyword. + "client-versions" and "server-versions" are sorted in ascending order; A version is recommended in the consensus if it is recommended by more than half of the voting authorities that included a @@ -1261,6 +1575,14 @@ $Id$ one, breaking ties in favor of the lexicographically larger vote.) The port list is encoded as specified in 3.4.2. + * If consensus-method 6 or later is in use and if 3 or more + authorities provide a Measured= keyword in their votes for + a router, the authorities produce a consensus containing a + Bandwidth= keyword equal to the median of the Measured= votes. + + * If consensus-method 7 or later is in use, the params line is + included in the output. + The signatures at the end of a consensus document are sorted in ascending order by identity digest. @@ -1281,6 +1603,10 @@ $Id$ "3" -- Added legacy ID key support to aid in authority ID key rollovers "4" -- No longer list routers that are not running in the consensus "5" -- adds support for "w" and "p" lines. + "6" -- Prefers measured bandwidth values rather than advertised + "7" -- Provides keyword=integer pairs of consensus parameters + "8" -- Provides microdescriptor summaries + "9" -- Provides weights for selecting flagged routers in paths Before generating a consensus, an authority must decide which consensus method to use. To do this, it looks for the highest version number @@ -1313,6 +1639,147 @@ $Id$ use an accept-style summary and list as much of the port list as is possible within these 1000 bytes. [XXXX be more specific.] +3.4.3. Computing Bandwidth Weights + + Let weight_scale = 10000 + + Let G be the total bandwidth for Guard-flagged nodes. + Let M be the total bandwidth for non-flagged nodes. + Let E be the total bandwidth for Exit-flagged nodes. + Let D be the total bandwidth for Guard+Exit-flagged nodes. + Let T = G+M+E+D + + Let Wgd be the weight for choosing a Guard+Exit for the guard position. + Let Wmd be the weight for choosing a Guard+Exit for the middle position. + Let Wed be the weight for choosing a Guard+Exit for the exit position. + + Let Wme be the weight for choosing an Exit for the middle position. + Let Wmg be the weight for choosing a Guard for the middle position. + + Let Wgg be the weight for choosing a Guard for the guard position. + Let Wee be the weight for choosing an Exit for the exit position. + + Balanced network conditions then arise from solutions to the following + system of equations: + + Wgg*G + Wgd*D == M + Wmd*D + Wme*E + Wmg*G (guard bw = middle bw) + Wgg*G + Wgd*D == Wee*E + Wed*D (guard bw = exit bw) + Wed*D + Wmd*D + Wgd*D == D (aka: Wed+Wmd+Wdg = 1) + Wmg*G + Wgg*G == G (aka: Wgg = 1-Wmg) + Wme*E + Wee*E == E (aka: Wee = 1-Wme) + + We are short 2 constraints with the above set. The remaining constraints + come from examining different cases of network load. + + Case 1: E >= T/3 && G >= T/3 (Neither Exit nor Guard Scarce) + + In this case, the additional two constraints are: Wme*E == Wmd*D and + Wgd == 0, which maximizes Exit-flagged bandwidth in the middle position. + + This leads to the solution: + + Wgg = (weight_scale*(D+E+G+M))/(3*G) + Wmd = (weight_scale*(2*D + 2*E - G - M))/(6*D) + Wme = (weight_scale*(2*D + 2*E - G - M))/(6*E) + Wee = (weight_scale*(-2*D + 4*E + G + M))/(6*E) + Wmg = weight_scale - Wgg + Wed = weight_scale - Wmd + Wgd = 0 + + Case 2: E < T/3 && G < T/3 (Both are scarce) + + Let R denote the more scarce class (Rare) between Guard vs Exit. + Let S denote the less scarce class. + + Subcase a: R+D < S + + In this subcase, we simply devote all of D bandwidth to the + scarce class. + + Wgg = Wee = weight_scale + Wmg = Wme = Wmd = 0; + if E < G: + Wed = weight_scale + Wgd = 0 + else: + Wed = 0 + Wgd = weight_scale + + Subcase b: R+D >= S + + In this case, if M <= T/3, we have enough bandwidth to try to achieve + a balancing condition, and add the constraints Wgg == 1 and + Wme*E == Wmd*D: + + Wgg = weight_scale + Wgd = (weight_scale*(D + E - 2*G + M))/(3*D) (T/3 >= G (Ok)) + Wmd = (weight_scale*(D + E + G - 2*M))/(6*D) (T/3 >= M) + Wme = (weight_scale*(D + E + G - 2*M))/(6*E) + Wee = (weight_scale*(-D + 5*E - G + 2*M))/(6*E) (2E+M >= T/3) + Wmg = 0; + Wed = weight_scale - Wgd - Wmd + + If M >= T/3, the above solution will not be valid (one of the weights + will be < 0 or > 1). In this case, we use: + + Wgg = weight_scale + Wee = weight_scale + Wmg = Wme = Wmd = 0 + Wgd = (weight_scale*(D+E-G))/(2*D) + Wed = weight_scale - Wgd + + Case 3: One of E < T/3 or G < T/3 + + Let S be the scarce class (of E or G). + + Subcase a: (S+D) < T/3: + if G=S: + Wgg = Wgd = weight_scale; + Wmd = Wed = Wmg = 0; + Wme = (weight_scale*(E-M))/(2*E); + Wee = weight_scale-Wme; + if E=S: + Wee = Wed = weight_scale; + Wmd = Wgd = Wmg = 0; + Wmg = (weight_scale*(G-M))/(2*G); + Wgg = weight_scale-Wmg; + + Subcase b: (S+D) >= T/3 + if G=S: + Add constraints Wmg = 0, Wme*E == Wmd*D to maximize exit bandwidth + in the middle position: + Wgd = (weight_scale*(D + E - 2*G + M))/(3*D); + Wmd = (weight_scale*(D + E + G - 2*M))/(6*D); + Wme = (weight_scale*(D + E + G - 2*M))/(6*E); + Wee = (weight_scale*(-D + 5*E - G + 2*M))/(6*E); + Wgg = weight_scale; + Wmg = 0; + Wed = weight_scale - Wgd - Wmd; + if E=S: + Add constraints Wgd = 0, Wme*E == Wmd*D: + Wgg = (weight_scale*(D + E + G + M))/(3*G); + Wmd = (weight_scale*(2*D + 2*E - G - M))/(6*D); + Wme = (weight_scale*(2*D + 2*E - G - M))/(6*E); + Wee = (weight_scale*(-2*D + 4*E + G + M))/(6*E); + Wgd = 0; + Wmg = weight_scale - Wgg; + Wed = weight_scale - Wmd; + + To ensure consensus, all calculations are performed using integer math + with a fixed precision determined by the bwweightscale consensus + parameter (defaults at 10000). + + For future balancing improvements, Tor clients support 11 additional weights + for directory requests and middle weighting. These weights are currently + set at weight_scale, with the exception of the following groups of + assignments: + + Directory requests use middle weights: + Wbd=Wmd, Wbg=Wmg, Wbe=Wme, Wbm=Wmm + + Handle bridges and strange exit policies: + Wgm=Wgg, Wem=Wee, Weg=Wed + 3.5. Detached signatures Assuming full connectivity, every authority should compute and sign the @@ -1884,7 +2351,6 @@ $Id$ A. Consensus-negotiation timeline. - Period begins: this is the Published time. Everybody sends votes Reconciliation: everybody tries to fetch missing votes. |