diff options
Diffstat (limited to 'spec/dir-spec/computing-consensus.md')
-rw-r--r-- | spec/dir-spec/computing-consensus.md | 683 |
1 files changed, 683 insertions, 0 deletions
diff --git a/spec/dir-spec/computing-consensus.md b/spec/dir-spec/computing-consensus.md new file mode 100644 index 0000000..329fd45 --- /dev/null +++ b/spec/dir-spec/computing-consensus.md @@ -0,0 +1,683 @@ +# Computing a consensus from a set of votes { #computing-consensus } + +Given a set of votes, authorities compute the contents of the consensus. + +The consensus status, along with as many signatures as the server +currently knows (see section 3.10 below), should be available at + +`http://<hostname>/tor/status-vote/next/consensus.z` + +The contents of the consensus document are as follows: + +The "valid-after", "valid-until", and "fresh-until" times are taken as +the median of the respective values from all the votes. + +The times in the "voting-delay" line are taken as the median of the +VoteSeconds and DistSeconds times in the votes. + +Known-flags is the union of all flags known by any voter. + +Entries are given on the "params" line for every keyword on which a +majority of authorities (total authorities, not just those +participating in this vote) voted on, or if at least three +authorities voted for that parameter. The values given are the +low-median of all votes on that keyword. + +(In consensus methods 7 to 11 inclusive, entries were given on +the "params" line for every keyword on which *any* authority voted, +the value given being the low-median of all votes on that keyword.) + +```text + "client-versions" and "server-versions" are sorted in ascending + order; A version is recommended in the consensus if it is recommended + by more than half of the voting authorities that included a + client-versions or server-versions lines in their votes. +``` + +With consensus methods 19 through 33, a package line is generated for a +given PACKAGENAME/VERSION pair if at least three authorities list such a +package in their votes. (Call these lines the "input" lines for +PACKAGENAME.) The consensus will contain every "package" line that is +listed verbatim by more than half of the authorities listing a line for +the PACKAGENAME/VERSION pair, and no others. + +The authority item groups (dir-source, contact, fingerprint, +vote-digest) are taken from the votes of the voting +authorities. These groups are sorted by the digests of the +authorities identity keys, in ascending order. If the consensus +method is 3 or later, a dir-source line must be included for +every vote with legacy-key entry, using the legacy-key's +fingerprint, the voter's ordinary nickname with the string +"-legacy" appended, and all other fields as from the original +vote's dir-source line. + +```text + A router status entry: + * is included in the result if some router status entry with the same + identity is included by more than half of the authorities (total + authorities, not just those whose votes we have). + (Consensus method earlier than 21) + + * is included according to the rules in section 3.8.0.1 and + 3.8.0.2 below. (Consensus method 22 or later) + + * For any given RSA identity digest, we include at most + one router status entry. + + * For any given Ed25519 identity, we include at most one router + status entry. + + * A router entry has a flag set if that is included by more than half + of the authorities who care about that flag. + + * Two router entries are "the same" if they have the same + (descriptor digest, published time, nickname, IP, ports> tuple. + We choose the tuple for a given router as whichever tuple appears + for that router in the most votes. We break ties first in favor of + the more recently published, then in favor of smaller server + descriptor digest. + + [ + * The Named flag appears if it is included for this routerstatus by + _any_ authority, and if all authorities that list it list the same + nickname. However, if consensus-method 2 or later is in use, and + any authority calls this identity/nickname pair Unnamed, then + this routerstatus does not get the Named flag. + + * If consensus-method 2 or later is in use, the Unnamed flag is + set for a routerstatus if any authorities have voted for a different + identities to be Named with that nickname, or if any authority + lists that nickname/ID pair as Unnamed. + + (With consensus-method 1, Unnamed is set like any other flag.) + + [But note that authorities no longer vote for the Named flag, + and the above two bulletpoints are now irrelevant.] + ] + + * The version is given as whichever version is listed by the most + voters, with ties decided in favor of more recent versions. + + * If consensus-method 4 or later is in use, then routers that + do not have the Running flag are not listed at all. + + * If consensus-method 5 or later is in use, then the "w" line + is generated using a low-median of the bandwidth values from + the votes that included "w" lines for this router. + + * If consensus-method 5 or later is in use, then the "p" line + is taken from the votes that have the same policy summary + for the descriptor we are listing. (They should all be the + same. If they are not, we pick the most commonly listed + one, breaking ties in favor of the lexicographically larger + vote.) The port list is encoded as specified in section 3.8.2. + + * If consensus-method 6 or later is in use and if 3 or more + authorities provide a Measured= keyword in their votes for + a router, the authorities produce a consensus containing a + Bandwidth= keyword equal to the median of the Measured= votes. + + * If consensus-method 7 or later is in use, the params line is + included in the output. + + * If the consensus method is under 11, bad exits are considered as + possible exits when computing bandwidth weights. Otherwise, if + method 11 or later is in use, any router that is determined to get + the BadExit flag doesn't count when we're calculating weights. + + * If consensus method 12 or later is used, only consensus + parameters that more than half of the total number of + authorities voted for are included in the consensus. + + [ As of 0.2.6.1-alpha, authorities no longer advertise or negotiate + any consensus methods lower than 13. ] + + * If consensus method 13 or later is used, microdesc consensuses + omit any router for which no microdesc was agreed upon. + + * If consensus method 14 or later is used, the ns consensus and + microdescriptors may include an "a" line for each router, listing + an IPv6 OR port. + + * If consensus method 15 or later is used, microdescriptors + include "p6" lines including IPv6 exit policies. + + * If consensus method 16 or later is used, ntor-onion-key + are included in microdescriptors + + * If consensus method 17 or later is used, authorities impose a + maximum on the Bandwidth= values that they'll put on a 'w' + line for any router that doesn't have at least 3 measured + bandwidth values in votes. They also add an "Unmeasured=1" + flag to such 'w' lines. + + * If consensus method 18 or later is used, authorities include + "id" lines in microdescriptors. This method adds RSA ids. + + * If consensus method 19 or later is used, authorities may include + "package" lines in consensuses. + + * If consensus method 20 or later is used, authorities may include + GuardFraction information in microdescriptors. + + * If consensus method 21 or later is used, authorities may include + an "id" line for ed25519 identities in microdescriptors. + + [ As of 0.2.8.2-alpha, authorities no longer advertise or negotiate + consensus method 21, because it contains bugs. ] + + * If consensus method 22 or later is used, and the votes do not + produce a majority consensus about a relay's Ed25519 key (see + 3.8.0.1 below), the consensus must include a NoEdConsensus flag on + the "s" line for every relay whose listed Ed key does not reflect + consensus. + + * If consensus method 23 or later is used, authorities include + shared randomness protocol data on their votes and consensus. + + * If consensus-method 24 or later is in use, then routers that + do not have the Valid flag are not listed at all. + + [ As of 0.3.4.1-alpha, authorities no longer advertise or negotiate + any consensus methods lower than 25. ] + + * If consensus-method 25 or later is in use, then we vote + on recommended-protocols and required-protocols lines in the + consensus. We also include protocols lines in routerstatus + entries. + + * If consensus-method 26 or later is in use, then we initialize + bandwidth weights to 1 in our calculations, to avoid + division-by-zero errors on unusual networks. + + * If consensus method 27 or later is used, the microdesc consensus + may include an "a" line for each router, listing an IPv6 OR port. + + [ As of 0.4.3.1-alpha, authorities no longer advertise or negotiate + any consensus methods lower than 28. ] + + * If consensus method 28 or later is used, microdescriptors no longer + include "a" lines. + + * If consensus method 29 or later is used, microdescriptor "family" + lines are canonicalized to improve compression. + + * If consensus method 30 or later is used, the base64 encoded + ntor-onion-key does not include the trailing = sign. + + * If consensus method 31 or later is used, authorities parse the + "bwweightscale" and "maxunmeasuredbw" parameters correctly when + computing votes. + + * If consensus method 32 or later is used, authorities handle the + "MiddleOnly" flag specially when computing a consensus. When the + voters agree to include "MiddleOnly" in a routerstatus, they + automatically remove "Exit", "Guard", "V2Dir", and "HSDir". If + the BadExit flag is included in the consensus, they automatically + add it to the routerstatus. + + * If consensus method 33 or later is used, and the consensus + flavor is "microdesc", then the "Publication" field in the "r" + line is set to "2038-01-01 00:00:00". + + * If consensus method 34 or later is used, the consensus + does not include any "package" lines. +``` + +The signatures at the end of a consensus document are sorted in +ascending order by identity digest. + +All ties in computing medians are broken in favor of the smaller or +earlier item. + +<a id="dir-spec.txt-3.8.0.1"></a> + +## Deciding which Ids to include { #choosing-relay-ids } + +This sorting algorithm is used for consensus-method 22 and later. + +```text + First, consider each listing by tuple of <Ed,Rsa> identities, where 'Ed' + may be "None" if the voter included "id ed25519 none" to indicate that + the authority knows what ed25519 identities are, and thinks that the RSA + key doesn't have one. + + For each such <Ed, RSA> tuple that is listed by more than half of the + total authorities (not just total votes), include it. (It is not + possible for any other <id-Ed, id-RSA'> to have as many votes.) If more + than half of the authorities list a single <Ed,Rsa> pair of this type, we + consider that Ed key to be "consensus"; see description of the + NoEdConsensus flag. + + Log any other id-RSA values corresponding to an id-Ed we included, and any + other id-Ed values corresponding to an id-RSA we included. + + For each <id-RSA> that is not yet included, if it is listed by more than + half of the total authorities, and we do not already have it listed with + some <id-Ed>, include it, but do not consider its Ed identity canonical. +``` + +<a id="dir-spec.txt-3.8.0.2"></a> + +### Deciding which descriptors to include { #choosing-relay-descs } + +Deciding which descriptors to include. + +A tuple belongs to an `<id-RSA, id-Ed>` identity if it is a new tuple that +matches both ID parts, or if it is an old tuple (one with no Ed opinion) +that matches the RSA part. A tuple belongs to an `<id-RSA>` identity if its +RSA identity matches. + +A tuple matches another tuple if all the fields that are present in both +tuples are the same. + +For every included identity, consider the tuples belonging to that +identity. Group them into sets of matching tuples. Include the tuple +that matches the largest set, breaking ties in favor of the most recently +published, and then in favor of the smaller server descriptor digest. + +<a id="dir-spec.txt-3.8.1"></a> + +## Forward compatibility { #consensus-method-list } + +Future versions of Tor will need to include new information in the +consensus documents, but it is important that all authorities (or at least +half) generate and sign the same signed consensus. + +To achieve this, authorities list in their votes their supported methods +for generating consensuses from votes. Later methods will be assigned +higher numbers. Currently specified methods: + +```text + "1" -- The first implemented version. + "2" -- Added support for the Unnamed flag. + "3" -- Added legacy ID key support to aid in authority ID key rollovers + "4" -- No longer list routers that are not running in the consensus + "5" -- adds support for "w" and "p" lines. + "6" -- Prefers measured bandwidth values rather than advertised + "7" -- Provides keyword=integer pairs of consensus parameters + "8" -- Provides microdescriptor summaries + "9" -- Provides weights for selecting flagged routers in paths + "10" -- Fixes edge case bugs in router flag selection weights + "11" -- Don't consider BadExits when calculating bandwidth weights + "12" -- Params are only included if enough auths voted for them + "13" -- Omit router entries with missing microdescriptors. + "14" -- Adds support for "a" lines in ns consensuses and microdescriptors. + "15" -- Adds support for "p6" lines. + "16" -- Adds ntor keys to microdescriptors + "17" -- Adds "Unmeasured=1" flags to "w" lines + "18" -- Adds 'id' to microdescriptors. + "19" -- Adds "package" lines to consensuses + "20" -- Adds GuardFraction information to microdescriptors. + "21" -- Adds Ed25519 keys to microdescriptors. + "22" -- Instantiates Ed25519 voting algorithm correctly. + "23" -- Adds shared randomness protocol data. + "24" -- No longer lists routers that are not Valid in the consensus. + "25" -- Vote on recommended-protocols and required-protocols. + "26" -- Initialize bandwidth weights to 1 to avoid division-by-zero. + "27" -- Adds support for "a" lines in microdescriptor consensuses. + "28" -- Removes "a" lines from microdescriptors. + "29" -- Canonicalizes families in microdescriptors. + "30" -- Removes padding from ntor-onion-key. + "31" -- Uses correct parsing for bwweightscale and maxunmeasuredbw + when computing weights + "32" -- Adds special handling for MiddleOnly flag. + "33" -- Sets "publication" field in microdesc consensus "r" lines + to a meaningless value. + "34" -- Removes "package" lines from consensus. +``` + +Before generating a consensus, an authority must decide which consensus +method to use. To do this, it looks for the highest version number +supported by more than 2/3 of the authorities voting. If it supports this +method, then it uses it. Otherwise, it falls back to the newest consensus +method that it supports (which will probably not result in a sufficiently +signed consensus). + +All authorities MUST support method 25; authorities SHOULD support +more recent methods as well. Authorities SHOULD NOT support or +advertise support for any method before 25. Clients MAY assume that +they will never see a current valid signed consensus for any method +before method 25. + +(The consensuses generated by new methods must be parsable by +implementations that only understand the old methods, and must not cause +those implementations to compromise their anonymity. This is a means for +making changes in the contents of consensus; not for making +backward-incompatible changes in their format.) + +The following methods have incorrect implementations; authorities SHOULD +NOT advertise support for them: + +"21" -- Did not correctly enable support for ed25519 key collation. + +<a id="dir-spec.txt-3.8.2"></a> + +## Encoding port lists + +Whether the summary shows the list of accepted ports or the list of +rejected ports depends on which list is shorter (has a shorter string +representation). In case of ties we choose the list of accepted +ports. As an exception to this rule an allow-all policy is +represented as "accept 1-65535" instead of "reject " and a reject-all +policy is similarly given as "reject 1-65535". + +Summary items are compressed, that is instead of "80-88,89-100" there +only is a single item of "80-100", similarly instead of "20,21" a +summary will say "20-21". + +Port lists are sorted in ascending order. + +The maximum allowed length of a policy summary (including the "accept " +or "reject ") is 1000 characters. If a summary exceeds that length we +use an accept-style summary and list as much of the port list as is +possible within these 1000 bytes. \[XXXX be more specific.\] + +<a id="dir-spec.txt-3.8.3"></a> + +## Computing Bandwidth Weights + +Let weight_scale = 10000, or the value of the "bwweightscale" parameter. +(Before consensus method 31 there was a bug in parsing bwweightscale, so +that if there were any consensus parameters after it alphabetically, it +would always be treated as 10000. A similar bug existed for +"maxunmeasuredbw".) + +Starting with consensus method 26, G, M, E, and D are initialized to 1 and +T to 4. Prior consensus methods initialize them all to 0. With this change, +test tor networks that are small or new are much more likely to produce +bandwidth-weights in their consensus. The extra bandwidth has a negligible +impact on the bandwidth weights in the public tor network. + +Let G be the total bandwidth for Guard-flagged nodes. +Let M be the total bandwidth for non-flagged nodes. +Let E be the total bandwidth for Exit-flagged nodes. +Let D be the total bandwidth for Guard+Exit-flagged nodes. +Let T = G+M+E+D + +Let Wgd be the weight for choosing a Guard+Exit for the guard position. +Let Wmd be the weight for choosing a Guard+Exit for the middle position. +Let Wed be the weight for choosing a Guard+Exit for the exit position. + +Let Wme be the weight for choosing an Exit for the middle position. +Let Wmg be the weight for choosing a Guard for the middle position. + +Let Wgg be the weight for choosing a Guard for the guard position. +Let Wee be the weight for choosing an Exit for the exit position. + +Balanced network conditions then arise from solutions to the following +system of equations: + +Wgg*G + Wgd*D == M + Wmd*D + Wme*E + Wmg*G (guard bw = middle bw) +Wgg*G + Wgd*D == Wee*E + Wed*D (guard bw = exit bw) +Wed*D + Wmd*D + Wgd*D == D (aka: Wed+Wmd+Wdg = weight_scale) +Wmg*G + Wgg*G == G (aka: Wgg = weight_scale-Wmg) +Wme*E + Wee*E == E (aka: Wee = weight_scale-Wme) + +We are short 2 constraints with the above set. The remaining constraints +come from examining different cases of network load. The following +constraints are used in consensus method 10 and above. There are another +incorrect and obsolete set of constraints used for these same cases in +consensus method 9. For those, see dir-spec.txt in Tor 0.2.2.10-alpha +to 0.2.2.16-alpha. + +Case 1: E >= T/3 && G >= T/3 (Neither Exit nor Guard Scarce) + +In this case, the additional two constraints are: Wmg == Wmd, +Wed == 1/3. + +```text + This leads to the solution: + Wgd = weight_scale/3 + Wed = weight_scale/3 + Wmd = weight_scale/3 + Wee = (weight_scale*(E+G+M))/(3*E) + Wme = weight_scale - Wee + Wmg = (weight_scale*(2*G-E-M))/(3*G) + Wgg = weight_scale - Wmg + + Case 2: E < T/3 && G < T/3 (Both are scarce) +``` + +Let R denote the more scarce class (Rare) between Guard vs Exit. +Let S denote the less scarce class. + +Subcase a: R+D \< S + +In this subcase, we simply devote all of D bandwidth to the +scarce class. + +```text + Wgg = Wee = weight_scale + Wmg = Wme = Wmd = 0; + if E < G: + Wed = weight_scale + Wgd = 0 + else: + Wed = 0 + Wgd = weight_scale + + Subcase b: R+D >= S +``` + +In this case, if M \<= T/3, we have enough bandwidth to try to achieve +a balancing condition. + +Add constraints Wgg = weight_scale, Wmd == Wgd to maximize bandwidth in +the guard position while still allowing exits to be used as middle nodes: + +Wee = (weight_scale\*(E - G + M))/E +Wed = (weight_scale\*(D - 2*E + 4*G - 2*M))/(3*D) +Wme = (weight_scale\*(G-M))/E +Wmg = 0 +Wgg = weight_scale +Wmd = (weight_scale - Wed)/2 +Wgd = (weight_scale - Wed)/2 + +If this system ends up with any values out of range (ie negative, or +above weight_scale), use the constraints Wgg == weight_scale and Wee == +weight_scale, since both those positions are scarce: + +```text + Wgg = weight_scale + Wee = weight_scale + Wed = (weight_scale*(D - 2*E + G + M))/(3*D) + Wmd = (weight_Scale*(D - 2*M + G + E))/(3*D) + Wme = 0 + Wmg = 0 + Wgd = weight_scale - Wed - Wmd + + If M > T/3, then the Wmd weight above will become negative. Set it to 0 + in this case: + Wmd = 0 + Wgd = weight_scale - Wed + + Case 3: One of E < T/3 or G < T/3 + + Let S be the scarce class (of E or G). + + Subcase a: (S+D) < T/3: + if G=S: + Wgg = Wgd = weight_scale; + Wmd = Wed = Wmg = 0; + // Minor subcase, if E is more scarce than M, + // keep its bandwidth in place. + if (E < M) Wme = 0; + else Wme = (weight_scale*(E-M))/(2*E); + Wee = weight_scale-Wme; + if E=S: + Wee = Wed = weight_scale; + Wmd = Wgd = Wme = 0; + // Minor subcase, if G is more scarce than M, + // keep its bandwidth in place. + if (G < M) Wmg = 0; + else Wmg = (weight_scale*(G-M))/(2*G); + Wgg = weight_scale-Wmg; + + Subcase b: (S+D) >= T/3 + if G=S: + Add constraints Wgg = weight_scale, Wmd == Wed to maximize bandwidth + in the guard position, while still allowing exits to be + used as middle nodes: + Wgg = weight_scale + Wgd = (weight_scale*(D - 2*G + E + M))/(3*D) + Wmg = 0 + Wee = (weight_scale*(E+M))/(2*E) + Wme = weight_scale - Wee + Wmd = (weight_scale - Wgd)/2 + Wed = (weight_scale - Wgd)/2 + if E=S: + Add constraints Wee == weight_scale, Wmd == Wgd to maximize bandwidth + in the exit position: + Wee = weight_scale; + Wed = (weight_scale*(D - 2*E + G + M))/(3*D); + Wme = 0; + Wgg = (weight_scale*(G+M))/(2*G); + Wmg = weight_scale - Wgg; + Wmd = (weight_scale - Wed)/2; + Wgd = (weight_scale - Wed)/2; +``` + +To ensure consensus, all calculations are performed using integer math +with a fixed precision determined by the bwweightscale consensus +parameter (defaults at 10000, Min: 1, Max: INT32_MAX). (See note above +about parsing bug in bwweightscale before consensus method 31.) + +For future balancing improvements, Tor clients support 11 additional weights +for directory requests and middle weighting. These weights are currently +set at weight_scale, with the exception of the following groups of +assignments: + +Directory requests use middle weights: + +Wbd=Wmd, Wbg=Wmg, Wbe=Wme, Wbm=Wmm + +Handle bridges and strange exit policies: + +Wgm=Wgg, Wem=Wee, Weg=Wed + +<a id="dir-spec.txt-3.9"></a> + +## Computing consensus flavors + +Consensus flavors are variants of the consensus that clients can choose +to download and use instead of the unflavored consensus. The purpose +of a consensus flavor is to remove or replace information in the +unflavored consensus without forcing clients to download information +they would not use anyway. + +Directory authorities can produce and serve an arbitrary number of +flavors of the same consensus. A downside of creating too many new +flavors is that clients will be distinguishable based on which flavor +they download. A new flavor should not be created when adding a field +instead wouldn't be too onerous. + +Examples for consensus flavors include: + +```text + - Publishing hashes of microdescriptors instead of hashes of + full descriptors (see section 3.9.2). + - Including different digests of descriptors, instead of the + perhaps-soon-to-be-totally-broken SHA1. +``` + +Consensus flavors are derived from the unflavored consensus once the +voting process is complete. This is to avoid consensus synchronization +problems. + +Every consensus flavor has a name consisting of a sequence of one +or more alphanumeric characters and dashes. For compatibility, +the original (unflavored) consensus type is called "ns". + +The supported consensus flavors are defined as part of the +authorities' consensus method. + +All consensus flavors have in common that their first line is +"network-status-version" where version is 3 or higher, and the flavor +is a string consisting of alphanumeric characters and dashes: + +"network-status-version" SP version \[SP flavor\] NL + +<a id="dir-spec.txt-3.9.1"></a> + +### ns consensus + +The ns consensus flavor is equivalent to the unflavored consensus. +When the flavor is omitted from the "network-status-version" line, +it should be assumed to be "ns". Some implementations may explicitly +state that the flavor is "ns" when generating consensuses, but should +accept consensuses where the flavor is omitted. + +<a id="dir-spec.txt-3.9.2"></a> + +### Microdescriptor consensus + +The microdescriptor consensus is a consensus flavor that contains +microdescriptor hashes instead of descriptor hashes and that omits +exit-policy summaries which are contained in microdescriptors. The +microdescriptor consensus was designed to contain elements that are +small and frequently changing. Clients use the information in the +microdescriptor consensus to decide which servers to fetch information +about and which servers to fetch information from. + +The microdescriptor consensus is based on the unflavored consensus with +the exceptions as follows: + +"network-status-version" SP version SP "microdesc" NL + +\[At start, exactly once.\] + +The flavor name of a microdescriptor consensus is "microdesc". + +Changes to router status entries are as follows: + +```text + "r" SP nickname SP identity SP publication SP IP SP ORPort + SP DirPort NL + + [At start, exactly once.] + + Similar to "r" lines in section 3.4.1, but without the digest element. + + "a" SP address ":" port NL + + [Any number] + + Identical to the "r" lines in section 3.4.1. +``` + +(Only included when the vote is generated with consensus-method 14 +or later, and the consensus is generated with consensus-method 27 or +later.) + +"p" ... NL + +\[At most once\] + +Not currently generated. + +Exit policy summaries are contained in microdescriptors and +therefore omitted in the microdescriptor consensus. + +"m" SP digest NL + +\[Exactly once.\*\] + +"digest" is the base64 of the SHA256 hash of the router's +microdescriptor with trailing =s omitted. For a given router +descriptor digest and consensus method there should only be a +single microdescriptor digest in the "m" lines of all votes. +If different votes have different microdescriptor digests for +the same descriptor digest and consensus method, at least one +of the authorities is broken. If this happens, the microdesc +consensus should contain whichever microdescriptor digest is +most common. If there is no winner, we break ties in the favor +of the lexically earliest. + +\[\*Before consensus method 13, this field was sometimes erroneously +omitted.\] + +Additionally, a microdescriptor consensus SHOULD use the sha256 digest +algorithm for its signatures. + +<a id="dir-spec.txt-3.10"></a> |