Diffstat (limited to 'attic/text_formats/bandwidth-file-spec.txt')
-rw-r--r-- | attic/text_formats/bandwidth-file-spec.txt | 1315 |
1 files changed, 1315 insertions, 0 deletions
diff --git a/attic/text_formats/bandwidth-file-spec.txt b/attic/text_formats/bandwidth-file-spec.txt
new file mode 100644
index 0000000..bad13f6
--- /dev/null
+++ b/attic/text_formats/bandwidth-file-spec.txt
@@ -0,0 +1,1315 @@
+
+                     Tor Bandwidth File Format
+                             juga
+                             teor
+
+Table of Contents
+
+    1. Scope and preliminaries
+        1.2. Acknowledgements
+        1.3. Outline
+        1.4. Format Versions
+    2. Format details
+        2.1. Definitions
+        2.2. Header List format
+        2.3. Relay Line format
+        2.4. Implementation details
+            2.4.1. Writing bandwidth files atomically
+            2.4.2. Additional KeyValue pair definitions
+                2.4.2.1. Simple Bandwidth Scanner
+                2.4.2.2. Torflow
+    A. Sample data
+        A.1. Generated by Torflow
+        A.2. Generated by sbws version 0.1.0
+        A.3. Generated by sbws version 1.0.3
+        A.4. Headers generated by sbws version 1.0.4
+        A.5. Generated by sbws version 1.1.0
+    B. Scaling bandwidths
+        B.1. Scaling requirements
+        B.2. A linear scaling method
+        B.3. Quota changes
+        B.4. Torflow aggregation
+
+1. Scope and preliminaries
+
+   This document describes the format of Tor's Bandwidth File, version
+   1.0.0 and later.
+
+   It is a new specification for the existing bandwidth file format,
+   which we call version 1.0.0. It also specifies new format versions
+   1.1.0 and later, which are backwards compatible with 1.0.0 parsers.
+
+   Since Tor version 0.2.4.12-alpha, the directory authorities use
+   the Bandwidth File called "V3BandwidthsFile" generated by
+   Torflow [1]. The details of this format are described in Torflow's
+   README.spec.txt. We also summarise the format in this specification.
+
+   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
+   NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
+   "OPTIONAL" in this document are to be interpreted as described in
+   RFC 2119.
+
+1.2. Acknowledgements
+
+   The original bandwidth generator (Torflow) and format were
+   created by mike.
Teor suggested writing this specification while
+   contributing to pastly's new bandwidth generator implementation.
+
+   This specification was revised after feedback from:
+
+     Nick Mathewson (nickm)
+     Iain Learmonth (irl)
+
+1.3. Outline
+
+   The Tor directory protocol (dir-spec.txt [3]), sections 3.4.1
+   and 3.4.2, uses the term "bandwidth measurements" to refer to what
+   is called the Bandwidth File here.
+
+   A Bandwidth File contains information on relays' bandwidth
+   capacities and is produced by bandwidth generators, previously known
+   as bandwidth scanners.
+
+1.4. Format Versions
+
+   1.0.0 - The legacy Bandwidth File format
+
+   1.1.0 - Adds a header containing information about the bandwidth
+           file. Documents the sbws and Torflow relay line keys.
+
+   1.2.0 - If there are not enough eligible relays, the bandwidth file
+           SHOULD contain a header, but no relays. (To match Torflow's
+           existing behaviour.)
+
+           Adds scanner and destination countries to the header.
+           Adds new KeyValue Lines to the Header List section with
+           statistics about the number of relays included in the file.
+           Adds new KeyValues to Relay Bandwidth Lines, with different
+           bandwidth values (averages and descriptor bandwidths).
+
+   1.4.0 - Adds monitoring KeyValues to the header and relay lines.
+
+           RelayLines for excluded relays MAY be present in the bandwidth
+           file for diagnostic reasons. Similarly, if there are not enough
+           eligible relays, the bandwidth file MAY contain all known relays.
+
+           Diagnostic relay lines SHOULD be marked with vote=0, and
+           Tor SHOULD NOT use their bandwidths in its votes.
+
+           Also adds Tor version.
+   1.5.0 - Removes "recent_measurement_attempt_count" KeyValue.
+   1.6.0 - Adds congestion control stream events KeyValues.
+   1.7.0 - Adds ratios KeyValues to the relay lines and network averages
+           KeyValues to the header.
+
+   All Tor versions can consume format version 1.0.0.
+ + All Tor versions can consume format version 1.1.0 and later, + but Tor versions earlier than 0.3.5.1-alpha warn if the header + contains any KeyValue lines after the Timestamp. + + Tor versions 0.4.0.3-alpha, 0.3.5.8, 0.3.4.11, and earlier do not + understand "vote=0". Instead, they will vote for the actual bandwidths + that sbws puts in diagnostic relay lines: + * 1 for relays with "unmeasured=1", and + * the relay's measured and scaled bandwidth when "under_min_report=1". + +2. Format details + + The Bandwidth File MUST contain the following sections: + - Header List (exactly once), which is a partially ordered list of + - Header Lines (one or more times), then + - Relay Lines (zero or more times), in an arbitrary order. + If it does not contain these sections, parsers SHOULD ignore the file. + +2.1. Definitions + + The following nonterminals are defined in Tor directory protocol + sections 1.2., 2.1.1., 2.1.3.: + + bool + Int + SP (space) + NL (newline) + KeywordChar + ArgumentChar + nickname + hexdigest (a '$', followed by 40 hexadecimal characters + ([A-Fa-f0-9])) + + Nonterminal defined section 2 of version-spec.txt [4]: + + version_number + + We define the following nonterminals: + + Line ::= ArgumentChar* NL + RelayLine ::= KeyValue (SP KeyValue)* NL + HeaderLine ::= KeyValue NL + KeyValue ::= Key "=" Value + Key ::= (KeywordChar | "_")+ + Value ::= ArgumentCharValue+ + ArgumentCharValue ::= any printing ASCII character except NL and SP. + Terminator ::= "=====" or "====" + Generators SHOULD use a 5-character terminator. + Timestamp ::= Int + Bandwidth ::= Int + MasterKey ::= a base64-encoded Ed25519 public key, with + padding characters omitted. + DateTime ::= "YYYY-MM-DDTHH:MM:SS", as in ISO 8601 + CountryCode ::= Two capital ASCII letters ([A-Z]{2}), as defined in + ISO 3166-1 alpha-2 plus "ZZ" to denote unknown country + (eg the destination is in a Content Delivery Network). 
+ CountryCodeList ::= One or more CountryCode(s) separated by a comma + ([A-Z]{2}(,[A-Z]{2})*). + + Note that key_value and value are defined in Tor directory protocol + with different formats to KeyValue and Value here. + + Tor versions earlier than 0.3.5.1-alpha require all lines in the file + to be 510 characters or less. The previous limit was 254 characters in + Tor 0.2.6.2-alpha and earlier. Parsers MAY ignore longer Lines. + + Note that directory authorities are only supported on the two most + recent stable Tor versions, so we expect that line limits will be + removed after Tor 0.4.0 is released in 2019. + +2.2. Header List format + + It consists of a Timestamp line and zero or more HeaderLines. + + All the header lines MUST conform to the HeaderLine format, except + the first Timestamp line. + + The Timestamp line is not a HeaderLine to keep compatibility with + the legacy Bandwidth File format. + + Some header Lines MUST appear in specific positions, as documented + below. All other Lines can appear in any order. + + If a parser does not recognize any extra material in a header Line, + the Line MUST be ignored. + + If a header Line does not conform to this format, the Line SHOULD be + ignored by parsers. + + It consists of: + + Timestamp NL + + [At start, exactly once.] + + The Unix Epoch time in seconds of the most recent generator bandwidth + result. + + If the generator implementation has multiple threads or + subprocesses which can fail independently, it SHOULD take the most + recent timestamp from each thread and use the oldest value. This + ensures all the threads continue running. + + If there are threads that do not run continuously, they SHOULD be + excluded from the timestamp calculation. + + If there are no recent results, the generator MUST NOT generate a new + file. + + It does not follow the KeyValue format for backwards compatibility + with version 1.0.0. + + "version" version_number NL + + [In second position, zero or one time.] 
+ + The specification document format version. + It uses semantic versioning [5]. + + This Line was added in version 1.1.0 of this specification. + + Version 1.0.0 documents do not contain this Line, and the + version_number is considered to be "1.0.0". + + "software" Value NL + + [Zero or one time.] + + The name of the software that created the document. + + This Line was added in version 1.1.0 of this specification. + + Version 1.0.0 documents do not contain this Line, and the software + is considered to be "torflow". + + "software_version" Value NL + + [Zero or one time.] + + The version of the software that created the document. + The version may be a version_number, a git commit, or some other + version scheme. + + This Line was added in version 1.1.0 of this specification. + + "file_created" DateTime NL + + [Zero or one time.] + + The date and time timestamp in ISO 8601 format and UTC time zone + when the file was created. + + This Line was added in version 1.1.0 of this specification. + + "generator_started" DateTime NL + + [Zero or one time.] + + The date and time timestamp in ISO 8601 format and UTC time zone + when the generator started. + + This Line was added in version 1.1.0 of this specification. + + "earliest_bandwidth" DateTime NL + + [Zero or one time.] + + The date and time timestamp in ISO 8601 format and UTC time zone + when the first relay bandwidth was obtained. + + This Line was added in version 1.1.0 of this specification. + + "latest_bandwidth" DateTime NL + + [Zero or one time.] + + The date and time timestamp in ISO 8601 format and UTC time zone + of the most recent generator bandwidth result. + + This time MUST be identical to the initial Timestamp line. + + This duplicate value is included to make the format easier for people + to read. + + This Line was added in version 1.1.0 of this specification. + + "number_eligible_relays" Int NL + + [Zero or one time.] 
+
+    The number of relays that have enough measurements to be
+    included in the bandwidth file.
+
+    This Line was added in version 1.2.0 of this specification.
+
+  "minimum_percent_eligible_relays" Int NL
+
+    [Zero or one time.]
+
+    The percentage of relays in the consensus that SHOULD be
+    included in every generated bandwidth file.
+
+    If this threshold is not reached, format versions 1.3.0 and earlier
+    SHOULD NOT contain any relays. (Bandwidth files always include a
+    header.)
+
+    Format versions 1.4.0 and later SHOULD include all the relays for
+    diagnostic purposes, even if this threshold is not reached. But these
+    relays SHOULD be marked so that Tor does not vote on them.
+    See section 1.4 for details.
+
+    The minimum percentage is 60% in Torflow, so sbws uses
+    60% as the default.
+
+    This Line was added in version 1.2.0 of this specification.
+
+  "number_consensus_relays" Int NL
+
+    [Zero or one time.]
+
+    The number of relays in the consensus.
+
+    This Line was added in version 1.2.0 of this specification.
+
+  "percent_eligible_relays" Int NL
+
+    [Zero or one time.]
+
+    The number of eligible relays, as a percentage of the number
+    of relays in the consensus.
+
+    This line SHOULD be equal to:
+        (number_eligible_relays * 100.0) / number_consensus_relays
+    that is, the share of relays in the consensus eligible for this file.
+
+    This Line was added in version 1.2.0 of this specification.
+
+  "minimum_number_eligible_relays" Int NL
+
+    [Zero or one time.]
+
+    The minimum number of relays that SHOULD be included in the bandwidth
+    file. See minimum_percent_eligible_relays for details.
+
+    This line SHOULD be equal to:
+        number_consensus_relays * (minimum_percent_eligible_relays / 100.0)
+
+    This Line was added in version 1.2.0 of this specification.
+
+  "scanner_country" CountryCode NL
+
+    [Zero or one time.]
+
+    The country, as in political geolocation, where the generator is run.
+
+    This Line was added in version 1.2.0 of this specification.
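The percent_eligible_relays and minimum_number_eligible_relays formulas above can be sketched in Python as follows. This is only an illustration, not how sbws or Torflow compute these values: the function names are hypothetical, and rounding to the nearest integer is an assumption, because the specification requires an Int but does not say how to round.

```python
def eligibility_header(number_eligible_relays, number_consensus_relays,
                       minimum_percent_eligible_relays=60):
    """Return the eligibility header KeyValues as a dict (hypothetical helper).

    60 is the default percentage named in the specification.
    Rounding with round() is an assumption; the spec only requires an Int.
    """
    percent = (number_eligible_relays * 100.0) / number_consensus_relays
    minimum_number = number_consensus_relays * (
        minimum_percent_eligible_relays / 100.0)
    return {
        "number_eligible_relays": number_eligible_relays,
        "minimum_percent_eligible_relays": minimum_percent_eligible_relays,
        "number_consensus_relays": number_consensus_relays,
        "percent_eligible_relays": round(percent),
        "minimum_number_eligible_relays": round(minimum_number),
    }


def has_enough_eligible_relays(number_eligible_relays, number_consensus_relays,
                               minimum_percent_eligible_relays=60):
    """True when the threshold is reached (assumed to mean >=).

    Integer arithmetic avoids floating point comparisons.
    """
    return (number_eligible_relays * 100
            >= minimum_percent_eligible_relays * number_consensus_relays)


if __name__ == "__main__":
    for key, value in eligibility_header(4200, 7000).items():
        # Header Lines use the KeyValue format: Key "=" Value
        print(f"{key}={value}")
```

A generator could use the boolean to decide whether relay lines need to be marked as diagnostic, as described in section 1.4.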
+
+  "destinations_countries" CountryCodeList NL
+
+    [Zero or one time.]
+
+    The country, as in political geolocation, or countries where the
+    destination Web server(s) are located.
+    The destination Web servers serve the data that the generator retrieves
+    to measure the bandwidth.
+
+    This Line was added in version 1.2.0 of this specification.
+
+  "recent_consensus_count" Int NL
+
+    [Zero or one time.]
+
+    The number of different consensuses seen in the last data_period
+    days. (data_period is 5 by default.)
+
+    Assuming that Tor clients fetch a consensus every 1-2 hours,
+    and that the data_period is 5 days, the Value of this Key SHOULD be
+    between:
+        data_period * 24 / 2 = 60
+        data_period * 24 = 120
+
+    This Line was added in version 1.4.0 of this specification.
+
+  "recent_priority_list_count" Int NL
+
+    [Zero or one time.]
+
+    The number of times that a list with a subset of relays prioritized
+    to be measured has been created in the last data_period days.
+    (data_period is 5 by default.)
+
+    In 2019, with 7000 relays in the network, the Value of this Key SHOULD be
+    approximately:
+        data_period * 24 / 1.5 = 80
+    where 1.5 is the approximate number of hours it takes to measure a
+    priority list of 7000 * 0.05 (350) relays, when the fraction of relays
+    in a priority list is 5% (0.05).
+
+    This Line was added in version 1.4.0 of this specification.
+
+  "recent_priority_relay_count" Int NL
+
+    [Zero or one time.]
+
+    The number of relays that have been in the list of relays prioritized
+    to be measured in the last data_period days. (data_period is 5 by
+    default.)
+
+    In 2019, with 7000 relays in the network, the Value of this Key SHOULD be
+    approximately:
+        80 * (7000 * 0.05) = 28000
+    where 0.05 (5%) is the fraction of relays in a priority list and 80
+    is the approximate number of priority lists (see
+    "recent_priority_list_count").
+
+    This Line was added in version 1.4.0 of this specification.
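The expected values quoted above for the monitoring keys can be reproduced with a few lines of Python. This is a sketch for checking the arithmetic only: the function name and parameter names are hypothetical, and the defaults are the 2019 figures given in the text (5 days, 7000 relays, 5% per priority list, about 1.5 hours per list).

```python
def expected_recent_counts(data_period_days=5, relays=7000,
                           priority_fraction=0.05,
                           hours_per_priority_list=1.5):
    """Reproduce the approximate values given for the monitoring keys.

    All defaults come from the figures quoted in the specification text;
    round() is used only to present the results as integers.
    """
    hours = data_period_days * 24
    # One consensus every 1-2 hours gives the lower and upper bounds.
    consensus_min = hours // 2
    consensus_max = hours
    # One priority list is measured roughly every 1.5 hours.
    priority_lists = round(hours / hours_per_priority_list)
    relays_per_list = round(relays * priority_fraction)
    return {
        "recent_consensus_count": (consensus_min, consensus_max),
        "recent_priority_list_count": priority_lists,
        "recent_priority_relay_count": priority_lists * relays_per_list,
    }


if __name__ == "__main__":
    for key, value in expected_recent_counts().items():
        print(key, value)
```

A monitoring script could compare the values found in a header against these estimates and flag a generator whose counts are far below them.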
+ + "recent_measurement_attempt_count" Int NL + + [Zero or one time.] + + The number of times that any relay has been queued to be measured + in the last data_period days. (data_period is 5 by default.) + + In 2019, with 7000 relays in the network, the Value of this Key SHOULD be + approximately the same as "recent_priority_relay_count", + assuming that there is one attempt to measure a relay for each relay that + has been prioritized unless there are system, network or implementation + issues. + + This Line was added in version 1.4.0 of this specification and removed + in version 1.5.0. + + "recent_measurement_failure_count" Int NL + + [Zero or one time.] + + The number of times that the scanner attempted to measure a relay in + the last data_period days (5 by default), but the relay has not been + measured because of system, network or implementation issues. + + This Line was added in version 1.4.0 of this specification. + + "recent_measurements_excluded_error_count" Int NL + + [Zero or one time.] + + The number of relays that have no successful measurements in the last + data_period days (5 by default). + + (See the note in section 1.4, version 1.4.0, about excluded relays.) + + This Line was added in version 1.4.0 of this specification. + + "recent_measurements_excluded_near_count" Int NL + + [Zero or one time.] + + The number of relays that have some successful measurements in the last + data_period days (5 by default), but all those measurements were + performed in a period of time that was too short (by default 1 day). + + (See the note in section 1.4, version 1.4.0, about excluded relays.) + + This Line was added in version 1.4.0 of this specification. + + "recent_measurements_excluded_old_count" Int NL + + [Zero or one time.] + + The number of relays that have some successful measurements, but all + those measurements are too old (more than 5 days, by default). + + Excludes relays that are already counted in + recent_measurements_excluded_near_count. 
+ + (See the note in section 1.4, version 1.4.0, about excluded relays.) + + This Line was added in version 1.4.0 of this specification. + + "recent_measurements_excluded_few_count" Int NL + + [Zero or one time.] + + The number of relays that don't have enough recent successful + measurements. (Fewer than 2 measurements in the last 5 days, by + default). + + Excludes relays that are already counted in + recent_measurements_excluded_near_count and + recent_measurements_excluded_old_count. + + (See the note in section 1.4, version 1.4.0, about excluded relays.) + + This Line was added in version 1.4.0 of this specification. + + "time_to_report_half_network" Int NL + + [Zero or one time.] + + The time in seconds that it would take to report measurements about the + half of the network, given the number of eligible relays and the time + it took in the last days (5 days, by default). + + (See the note in section 1.4, version 1.4.0, about excluded relays.) + + This Line was added in version 1.4.0 of this specification. + + "tor_version" version_number NL + + [Zero or one time.] + + The Tor version of the Tor process controlled by the generator. + + This Line was added in version 1.4.0 of this specification. + + "mu" Int NL + + [Zero or one time.] + + The network stream bandwidth average calculated as explained in B4.2. + + This Line was added in version 1.7.0 of this specification. + + "muf" Int NL + + [Zero or one time.] + + The network stream bandwidth average filtered calculated as explained in + B4.2. + + This Line was added in version 1.7.0 of this specification. + + KeyValue NL + + [Zero or more times.] + + There MUST NOT be multiple KeyValue header Lines with the same key. + If there are, the parser SHOULD choose an arbitrary Line. + + If a parser does not recognize a Keyword in a KeyValue Line, it + MUST be ignored. + + Future format versions may include additional KeyValue header Lines. 
+    Additional header Lines will be accompanied by a minor version
+    increment.
+
+    Implementations MAY add additional header Lines as needed. This
+    specification SHOULD be updated to avoid conflicting meanings for
+    the same header keys.
+
+    Parsers MUST NOT rely on the order of these additional Lines.
+
+    Additional header Lines MUST NOT use any keywords specified in the
+    relay measurements format.
+    If there are, the parser MAY ignore conflicting keywords.
+
+  Terminator NL
+
+    [Zero or one time.]
+
+    The Header List section ends with a Terminator.
+
+    In version 1.0.0, Header List ends when the first relay bandwidth
+    is found conforming to the next section.
+
+    Implementations of version 1.1.0 and later SHOULD use a 5-character
+    terminator.
+
+    Tor 0.4.0.1-alpha and later look for a 5-character terminator,
+    or the first relay bandwidth line. sbws versions 0.1.0 to 1.0.2
+    used a 4-character terminator; this bug was fixed in 1.0.3.
+
+2.3. Relay Line format
+
+  It consists of zero or more RelayLines containing relay ids and
+  bandwidths. The relays and their KeyValues are in arbitrary order.
+
+  There MUST NOT be multiple KeyValue pairs with the same key in the same
+  RelayLine. If there are, the parser SHOULD choose an arbitrary Value.
+
+  There MUST NOT be multiple RelayLines per relay identity (node_id or
+  master_key_ed25519). If there are, parsers SHOULD issue a warning.
+  Parsers MAY reject the file, choose an arbitrary RelayLine, or ignore
+  both RelayLines.
+
+  If a parser does not recognize any extra material in a RelayLine,
+  the extra material MUST be ignored.
+
+  Each RelayLine includes the following KeyValue pairs:
+
+  "node_id" hexdigest
+
+    [Exactly once.]
+
+    The fingerprint for the relay's RSA identity key.
+
+    Note: In bandwidth files read by Tor versions earlier than
+    0.3.4.1-alpha, node_id MUST NOT be at the end of the Line.
+    These authority versions are no longer supported.
+
+    Current Tor versions ignore master_key_ed25519, so node_id MUST be
+    present in each relay Line.
+
+    Implementations of version 1.1.0 and later SHOULD include both node_id
+    and master_key_ed25519. Parsers SHOULD accept Lines that contain at
+    least one of them.
+
+  "master_key_ed25519" MasterKey
+
+    [Zero or one time.]
+
+    The relay's master Ed25519 key, base64 encoded,
+    without trailing "="s, to avoid ambiguity with KeyValue "="
+    character.
+
+    This KeyValue pair SHOULD be present, see the note under node_id.
+
+    This KeyValue was added in version 1.1.0 of this specification.
+
+  "bw" Bandwidth
+
+    [Exactly once.]
+
+    The bandwidth of this relay in kilobytes per second.
+
+    No Zero Bandwidths:
+    Tor accepts zero bandwidths, but they trigger bugs in older Tor
+    implementations. Therefore, implementations SHOULD NOT produce zero
+    bandwidths. Instead, they SHOULD use one as their minimum bandwidth.
+    If there are zero bandwidths, the parser MAY ignore them.
+
+    Bandwidth Aggregation:
+    Multiple measurements can be aggregated using an averaging scheme,
+    such as a mean, median, or decaying average.
+
+    Bandwidth Scaling:
+    Torflow scales bandwidths to kilobytes per second. Other
+    implementations SHOULD use kilobytes per second for their initial
+    bandwidth scaling.
+
+    If different implementations or configurations are used in votes for
+    the same network, their measurements MAY need further scaling. See
+    Appendix B for information about scaling, and one possible scaling
+    method.
+
+    MaxAdvertisedBandwidth:
+    Bandwidth generators MUST limit the relays' measured bandwidth based
+    on the MaxAdvertisedBandwidth.
+    A relay's MaxAdvertisedBandwidth limits the bandwidth-avg in its
+    descriptor. bandwidth-avg is the minimum of MaxAdvertisedBandwidth,
+    BandwidthRate, RelayBandwidthRate, BandwidthBurst, and
+    RelayBandwidthBurst.
+    Therefore, generators MUST limit a relay's measured bandwidth to its
+    descriptor's bandwidth-avg.
This limit needs to be implemented in the + generator, because generators may scale consensus weights before + sending them to Tor. + Generators SHOULD NOT limit measured bandwidths based on descriptors' + bandwidth-observed, because that penalises new relays. + + sbws limits the relay's measured bandwidth to the bandwidth-avg + advertised. + + Torflow partitions relays based on their bandwidth. For unmeasured + relays, Torflow uses the minimum of all descriptor bandwidths, + including bandwidth-avg (MaxAdvertisedBandwidth) and + bandwidth-observed. Then Torflow measures the relays in each partition + against each other, which implicitly limits a relay's measured + bandwidth to the bandwidths of similar relays. + + Torflow also generates consensus weights based on the ratio between the + measured bandwidth and the minimum of all descriptor bandwidths (at the + time of the measurement). So when an operator reduces the + MaxAdvertisedBandwidth for a relay, Torflow reduces that relay's + measured bandwidth. + + KeyValue + + [Zero or more times.] + + Future format versions may include additional KeyValue pairs on a + RelayLine. + Additional KeyValue pairs will be accompanied by a minor version + increment. + + Implementations MAY add additional relay KeyValue pairs as needed. + This specification SHOULD be updated to avoid conflicting meanings + for the same Keywords. + + Parsers MUST NOT rely on the order of these additional KeyValue + pairs. + + Additional KeyValue pairs MUST NOT use any keywords specified in the + header format. + If there are, the parser MAY ignore conflicting keywords. + +2.4. Implementation details + +2.4.1. Writing bandwidth files atomically + + To avoid inconsistent reads, implementations SHOULD write bandwidth files + atomically. If the file is transferred from another host, it SHOULD be + written to a temporary path, then renamed to the V3BandwidthsFile path. 
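The write-then-rename approach described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the sbws implementation: the function name and the temporary file prefix are hypothetical, and the temporary file is created in the destination directory because a rename is only atomic within a single filesystem.

```python
import os
import tempfile


def write_bandwidth_file_atomically(content, destination):
    """Write content to a temporary path, then rename it into place.

    Hypothetical helper. Readers of `destination` see either the old
    file or the complete new file, never a partially written one.
    """
    directory = os.path.dirname(os.path.abspath(destination))
    # mkstemp creates the file readable only by its owner; adjust the
    # permissions if Tor runs as a different user.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".bwfile-")
    try:
        with os.fdopen(fd, "w", encoding="ascii") as tmp:
            tmp.write(content)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the data is on disk first
        os.replace(tmp_path, destination)  # atomic rename
    except BaseException:
        # Do not leave a stale temporary file behind on failure.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise


if __name__ == "__main__":
    write_bandwidth_file_atomically("1523911758\n", "V3BandwidthsFile")
```

os.replace is used rather than os.rename so that an existing destination is overwritten on every platform Python supports.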
+ + sbws versions 0.7.0 and later write the bandwidth file to an archival + location, create a temporary symlink to that location, then atomically rename + the symlink + to the configured V3BandwidthsFile path. + + Torflow does not write bandwidth files atomically. + +2.4.2. Additional KeyValue pair definitions + + KeyValue pairs in RelayLines that current implementations generate. + +2.4.2.1. Simple Bandwidth Scanner + + sbws RelayLines contain these keys: + + "node_id" hexdigest + + As above. + + "bw" Bandwidth + + As above. + + "nick" nickname + + [Exactly once.] + + The relay nickname. + + Torflow also has a "nick" KeyValue. + + "rtt" Int + + [Zero or one time.] + + The Round Trip Time in milliseconds to obtain 1 byte of data. + + This KeyValue was added in version 1.1.0 of this specification. + It became optional in version 1.3.0 or 1.4.0 of this specification. + + "time" DateTime + + [Exactly once.] + + The date and time timestamp in ISO 8601 format and UTC time zone + when the last bandwidth was obtained. + + This KeyValue was added in version 1.1.0 of this specification. + The Torflow equivalent is "measured_at". + + "success" Int + + [Zero or one time.] + + The number of times that the bandwidth measurements for this relay were + successful. + + This KeyValue was added in version 1.1.0 of this specification. + + "error_circ" Int + + [Zero or one time.] + + The number of times that the bandwidth measurements for this relay + failed because of circuit failures. + + This KeyValue was added in version 1.1.0 of this specification. + The Torflow equivalent is "circ_fail". + + "error_stream" Int + + [Zero or one time.] + + The number of times that the bandwidth measurements for this relay + failed because of stream failures. + + This KeyValue was added in version 1.1.0 of this specification. + + "error_destination" Int + + [Zero or one time.] 
+ + The number of times that the bandwidth measurements for this relay + failed because the destination Web server was not available. + + This KeyValue was added in version 1.4.0 of this specification. + + "error_second_relay" Int + + [Zero or one time.] + + The number of times that the bandwidth measurements for this relay + failed because sbws could not find a second relay for the test circuit. + + This KeyValue was added in version 1.4.0 of this specification. + + "error_misc" Int + + [Zero or one time.] + + The number of times that the bandwidth measurements for this relay + failed because of other reasons. + + This KeyValue was added in version 1.1.0 of this specification. + + "bw_mean" Int + + [Zero or one time.] + + The measured bandwidth mean for this relay in bytes per second. + + This KeyValue was added in version 1.2.0 of this specification. + + "bw_median" Int + + [Zero or one time.] + + The measured bandwidth median for this relay in bytes per second. + + This KeyValue was added in version 1.2.0 of this specification. + + "desc_bw_avg" Int + + [Zero or one time.] + + The descriptor average bandwidth for this relay in bytes per second. + + This KeyValue was added in version 1.2.0 of this specification. + + "desc_bw_obs_last" Int + + [Zero or one time.] + + The last descriptor observed bandwidth for this relay in bytes per + second. + + This KeyValue was added in version 1.2.0 of this specification. + + "desc_bw_obs_mean" Int + + [Zero or one time.] + + The descriptor observed bandwidth mean for this relay in bytes per + second. + + This KeyValue was added in version 1.2.0 of this specification. + + "desc_bw_bur" Int + + [Zero or one time.] + + The descriptor burst bandwidth for this relay in bytes per + second. + + This KeyValue was added in version 1.2.0 of this specification. + + "consensus_bandwidth" Int + + [Zero or one time.] + + The consensus bandwidth for this relay in bytes per second. 
+
+    This KeyValue was added in version 1.2.0 of this specification.
+
+  "consensus_bandwidth_is_unmeasured" Bool
+
+    [Zero or one time.]
+
+    This KeyValue is True if the consensus bandwidth for this relay was
+    not obtained from three or more bandwidth authorities, and False
+    otherwise.
+
+    This KeyValue was added in version 1.2.0 of this specification.
+
+  "relay_in_recent_consensus_count" Int
+
+    [Zero or one time.]
+
+    The number of times this relay was found in a consensus in the
+    last data_period days. (Unless otherwise stated, data_period is
+    5 by default.)
+
+    This KeyValue was added in version 1.4.0 of this specification.
+
+  "relay_recent_priority_list_count" Int
+
+    [Zero or one time.]
+
+    The number of times this relay has been prioritized to be measured
+    in the last data_period days.
+
+    This KeyValue was added in version 1.4.0 of this specification.
+
+  "relay_recent_measurement_attempt_count" Int
+
+    [Zero or one time.]
+
+    The number of times a measurement of this relay was attempted in the
+    last data_period days.
+
+    This KeyValue was added in version 1.4.0 of this specification.
+
+  "relay_recent_measurement_failure_count" Int
+
+    [Zero or one time.]
+
+    The number of times a measurement of this relay was attempted in the
+    last data_period days, but it was not possible to obtain a
+    measurement.
+
+    This KeyValue was added in version 1.4.0 of this specification.
+
+  "relay_recent_measurements_excluded_error_count" Int
+
+    [Zero or one time.]
+
+    The number of recent relay measurement attempts that failed.
+    Measurements are recent if they are in the last data_period days
+    (5 by default).
+
+    (See the note in section 1.4, version 1.4.0, about excluded relays.)
+
+    This KeyValue was added in version 1.4.0 of this specification.
+
+  "relay_recent_measurements_excluded_near_count" Int
+
+    [Zero or one time.]
+ + When all of a relay's recent successful measurements were performed in + a period of time that was too short (by default 1 day), the relay is + excluded. This KeyValue contains the number of recent successful + measurements for the relay that were ignored for this reason. + + (See the note in section 1.4, version 1.4.0, about excluded relays.) + + This KeyValue was added in version 1.4.0 of this specification. + + "relay_recent_measurements_excluded_old_count" Int + + [Zero or one time.] + + The number of successful measurements for this relay that are too old + (more than data_period days, 5 by default). + + Excludes measurements that are already counted in + relay_recent_measurements_excluded_near_count. + + (See the note in section 1.4, version 1.4.0, about excluded relays.) + + This KeyValue was added in version 1.4.0 of this specification. + + "relay_recent_measurements_excluded_few_count" Int + + [Zero or one time.] + + The number of successful measurements for this relay that were ignored + because the relay did not have enough successful measurements (fewer + than 2, by default). + + Excludes measurements that are already counted in + relay_recent_measurements_excluded_near_count or + relay_recent_measurements_excluded_old_count. + + (See the note in section 1.4, version 1.4.0, about excluded relays.) + + This KeyValue was added in version 1.4.0 of this specification. + + "under_min_report" bool + + [Zero or one time.] + + If the value is 1, there are not enough eligible relays in the + bandwidth file, and Tor bandwidth authorities MAY NOT vote on this + relay. (Current Tor versions do not change their behaviour based on + the "under_min_report" key.) + + If the value is 0 or the KeyValue is not present, there are enough + relays in the bandwidth file. + + Because Tor versions released before April 2019 (see section 1.4. 
for + the full list of versions) ignore "vote=0", generator implementations + MUST NOT change the bandwidths for under_min_report relays. Using the + same bw value makes authorities that do not understand "vote=0" + or "under_min_report=1" produce votes that don't change relay weights + too much. It also avoids flapping when the reporting threshold is + reached. + + This KeyValue was added in version 1.4.0 of this specification. + + "unmeasured" bool + + [Zero or one time.] + + If the value is 1, this relay was not successfully measured and + Tor bandwidth authorities MAY NOT vote on this relay. + (Current Tor versions do not change their behaviour based on + the "unmeasured" key.) + + If the value is 0 or the KeyValue is not present, this relay + was successfully measured. + + Because Tor versions released before April 2019 (see section 1.4. for + the full list of versions) ignore "vote=0", generator implementations + MUST set "bw=1" for unmeasured relays. Using the minimum bw value + makes authorities that do not understand "vote=0" or "unmeasured=1" + produce votes that don't change relay weights too much. + + This KeyValue was added in version 1.4.0 of this specification. + + "vote" bool + + [Zero or one time.] + + If the value is 0, Tor directory authorities SHOULD ignore the relay's + entry in the bandwidth file. They SHOULD vote for the relay the same + way they would vote for a relay that is not present in the file. + + This MAY be the case when this relay was not successfully measured but + it is included in the Bandwidth File, to diagnose why they were not + measured. + + If the value is 1 or the KeyValue is not present, Tor directory + authorities MUST use the relay's bw value in any votes for that relay. + + Implementations MUST also set "bw=1" for unmeasured relays. + But they MUST NOT change the bw for under_min_report relays. + (See the explanations under "unmeasured" and "under_min_report" + for more details.) 
+ + This KeyValue was added in version 1.4.0 of this specification. + + "xoff_recv" Int + + [Zero or one time.] + + The number of times this relay received `XOFF_RECV` stream events while + being measured in the last data_period days. + + This KeyValue was added in version 1.6.0 of this specification. + + "xoff_sent" Int + + [Zero or one time.] + + The number of times this relay received `XOFF_SENT` stream events while + being measured in the last data_period days. + + This KeyValue was added in version 1.6.0 of this specification. + + "r_strm" Float + + [Zero or one time.] + + The stream ratio of this relay, calculated as explained in section + B.4, step 3. + + This KeyValue was added in version 1.7.0 of this specification. + + "r_strm_filt" Float + + [Zero or one time.] + + The filtered stream ratio of this relay, calculated as explained in + section B.4, step 3. + + This KeyValue was added in version 1.7.0 of this specification. + + +2.4.2.2. Torflow + + Torflow RelayLines include node_id and bw, and other KeyValue pairs [2]. + +References: + +1. https://gitweb.torproject.org/torflow.git +2. https://gitweb.torproject.org/torflow.git/tree/NetworkScanners/BwAuthority/README.spec.txt#n332 + The Torflow specification is outdated, and does not match the current + implementation. See section A.1. for the format produced by Torflow. +3. https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt +4. https://gitweb.torproject.org/torspec.git/tree/version-spec.txt +5. https://semver.org/ + +A. Sample data + +The following has not been obtained from any real measurement. + +A.1. 
Generated by Torflow + +This is an example version 1.0.0 document: + +1523911758 +node_id=$68A483E05A2ABDCA6DA5A3EF8DB5177638A27F80 bw=760 nick=Test measured_at=1523911725 updated_at=1523911725 pid_error=4.11374090719 pid_error_sum=4.11374090719 pid_bw=57136645 pid_delta=2.12168374577 circ_fail=0.2 scanner=/filepath +node_id=$96C15995F30895689291F455587BD94CA427B6FC bw=189 nick=Test2 measured_at=1523911623 updated_at=1523911623 pid_error=3.96703337994 pid_error_sum=3.96703337994 pid_bw=47422125 pid_delta=2.65469736988 circ_fail=0.0 scanner=/filepath + +A.2. Generated by sbws version 0.1.0 + +1523911758 +version=1.1.0 +software=sbws +software_version=0.1.0 +latest_bandwidth=2018-04-16T20:49:18 +file_created=2018-04-16T21:49:18 +generator_started=2018-04-16T15:13:25 +earliest_bandwidth=2018-04-16T15:13:26 +==== +bw=380 error_circ=0 error_misc=0 error_stream=1 master_key_ed25519=YaqV4vbvPYKucElk297eVdNArDz9HtIwUoIeo0+cVIpQ nick=Test node_id=$68A483E05A2ABDCA6DA5A3EF8DB5177638A27F80 rtt=380 success=1 time=2018-05-08T16:13:26 +bw=189 error_circ=0 error_misc=0 error_stream=0 master_key_ed25519=a6a+dZadrQBtfSbmQkP7j2ardCmLnm5NJ4ZzkvDxbo0I nick=Test2 node_id=$96C15995F30895689291F455587BD94CA427B6FC rtt=378 success=1 time=2018-05-08T16:13:36 + +A.3. 
Generated by sbws version 1.0.3 + +1523911758 +version=1.2.0 +latest_bandwidth=2018-04-16T20:49:18 +file_created=2018-04-16T21:49:18 +generator_started=2018-04-16T15:13:25 +earliest_bandwidth=2018-04-16T15:13:26 +minimum_number_eligible_relays=3862 +minimum_percent_eligible_relays=60 +number_consensus_relays=6436 +number_eligible_relays=6000 +percent_eligible_relays=93 +software=sbws +software_version=1.0.3 +===== +bw=38000 bw_mean=1127824 bw_median=1180062 desc_bw_avg=1073741824 desc_bw_obs_last=17230879 desc_bw_obs_mean=14732306 error_circ=0 error_misc=0 error_stream=1 master_key_ed25519=YaqV4vbvPYKucElk297eVdNArDz9HtIwUoIeo0+cVIpQ nick=Test node_id=$68A483E05A2ABDCA6DA5A3EF8DB5177638A27F80 rtt=380 success=1 time=2018-05-08T16:13:26 +bw=1 bw_mean=199162 bw_median=185675 desc_bw_avg=409600 desc_bw_obs_last=836165 desc_bw_obs_mean=858030 error_circ=0 error_misc=0 error_stream=0 master_key_ed25519=a6a+dZadrQBtfSbmQkP7j2ardCmLnm5NJ4ZzkvDxbo0I nick=Test2 node_id=$96C15995F30895689291F455587BD94CA427B6FC rtt=378 success=1 time=2018-05-08T16:13:36 + +A.3.1. When there are not enough eligible measured relays: + +1540496079 +version=1.2.0 +earliest_bandwidth=2018-10-20T19:35:52 +file_created=2018-10-25T19:35:03 +generator_started=2018-10-25T11:42:56 +latest_bandwidth=2018-10-25T19:34:39 +minimum_number_eligible_relays=3862 +minimum_percent_eligible_relays=60 +number_consensus_relays=6436 +number_eligible_relays=2960 +percent_eligible_relays=46 +software=sbws +software_version=1.0.3 +===== + +A.4. 
Headers generated by sbws version 1.0.4 + +1523911758 +version=1.2.0 +latest_bandwidth=2018-04-16T20:49:18 +destinations_countries=TH,ZZ +file_created=2018-04-16T21:49:18 +generator_started=2018-04-16T15:13:25 +earliest_bandwidth=2018-04-16T15:13:26 +minimum_number_eligible_relays=3862 +minimum_percent_eligible_relays=60 +number_consensus_relays=6436 +number_eligible_relays=6000 +percent_eligible_relays=93 +scanner_country=SN +software=sbws +software_version=1.0.4 +===== + +A.5 Generated by sbws version 1.1.0 + +1523911758 +version=1.4.0 +latest_bandwidth=2018-04-16T20:49:18 +destinations_countries=TH,ZZ +file_created=2018-04-16T21:49:18 +generator_started=2018-04-16T15:13:25 +earliest_bandwidth=2018-04-16T15:13:26 +minimum_number_eligible_relays=3862 +minimum_percent_eligible_relays=60 +number_consensus_relays=6436 +number_eligible_relays=6000 +percent_eligible_relays=93 +recent_measurement_attempt_count=6243 +recent_measurement_failure_count=732 +recent_measurements_excluded_error_count=969 +recent_measurements_excluded_few_count=3946 +recent_measurements_excluded_near_count=90 +recent_measurements_excluded_old_count=0 +recent_priority_list_count=20 +recent_priority_relay_count=6243 +scanner_country=SN +software=sbws +software_version=1.1.0 +time_to_report_half_network=57273 +===== +bw=1 error_circ=1 error_destination=0 error_misc=0 error_second_relay=0 error_stream=0 master_key_ed25519=J3HQ24kOQWac3L1xlFLp7gY91qkb5NuKxjj1BhDi+m8 nick=snap269 node_id=$DC4D609F95A52614D1E69C752168AF1FCAE0B05F relay_recent_measurement_attempt_count=3 relay_recent_measurements_excluded_error_count=1 relay_recent_measurements_excluded_near_count=3 relay_recent_consensus_count=3 relay_recent_priority_list_count=3 success=3 time=2019-03-16T18:20:57 unmeasured=1 vote=0 +bw=1 error_circ=0 error_destination=0 error_misc=0 error_second_relay=0 error_stream=2 master_key_ed25519=h6ZB1E1yBFWIMloUm9IWwjgaPXEpL5cUbuoQDgdSDKg nick=relay node_id=$C4544F9E209A9A9B99591D548B3E2822236C0503 
relay_recent_measurement_attempt_count=3 relay_recent_measurements_excluded_error_count=2 relay_recent_measurements_excluded_few_count=1 relay_recent_consensus_count=3 relay_recent_priority_list_count=3 success=1 time=2019-03-17T06:50:58 unmeasured=1 vote=0 + +B. Scaling bandwidths + +B.1. Scaling requirements + + Tor accepts zero bandwidths, but they trigger bugs in older Tor + implementations. Therefore, scaling methods SHOULD perform the + following checks: + * If the total bandwidth is zero, all relays should be given equal + bandwidths. + * If the scaled bandwidth is zero, it should be rounded up to one. + + Initial experiments indicate that scaling may not be needed for + torflow and sbws, because their measured bandwidths are similar + enough already. + +B.2. A linear scaling method + + If scaling is required, here is a simple linear bandwidth scaling + method, which ensures that all bandwidth votes contain approximately + the same total bandwidth: + + 1. Calculate the relay quota by dividing the total measured bandwidth + in all votes, by the number of relays with measured bandwidth + votes. In the public tor network, this is approximately 7500 as of + April 2018. The quota should be a consensus parameter, so it can be + adjusted for all generators on the network. + + 2. Calculate a vote quota by multiplying the relay quota by the number + of relays this bandwidth authority has measured + bandwidths for. + + 3. Calculate a scaling factor by dividing the vote quota by the + total unscaled measured bandwidth in this bandwidth + authority's upcoming vote. + + 4. Multiply each unscaled measured bandwidth by the scaling + factor. + + Now, the total scaled bandwidth in the upcoming vote is + approximately equal to the quota. + +B.3. Quota changes + + If all generators are using scaling, the quota can be gradually + reduced or increased as needed. 
Smaller quotas decrease the size + of uncompressed consensuses, and may decrease the size of + consensus diffs and compressed consensuses. But if the relay + quota is too small, some relays may be over- or under-weighted. + +B.4. Torflow aggregation + + Torflow implements two methods to compute the bandwidth values from the + (stream) bandwidth measurements: with and without PID control feedback. + The method described here is without PID control (see Torflow + specification, section 2.2). + + In the following sections, the relays' measured bandwidths refer to the + ones that this bandwidth authority has measured for the relays that + would be included in the bandwidth authority's upcoming vote. + + 1. Calculate the filtered bandwidth for each relay: + - choose the relay's measurements (`bw_j`) that are equal to or + greater than the mean of the measurements for this relay + - calculate the mean of those measurements + + In pseudocode: + + bw_filt_i = mean(bw_j for each bw_j >= mean(bw_j)) + + 2. Calculate network averages: + - calculate the filtered average by dividing the sum of all the + relays' filtered bandwidth by the number of relays that have been + measured (`n`), i.e., calculate the mean average of the relays' + filtered bandwidth. + - calculate the stream average by dividing the sum of all the + relays' measured bandwidth by the number of relays that have been + measured (`n`), i.e., calculate the mean average of the relays' + measured bandwidth. + + In pseudocode: + + bw_avg_filt = sum(bw_filt_i) / n + bw_avg_strm = sum(bw_i) / n + + 3. Calculate ratios for each relay: + - calculate the filtered ratio by dividing each relay's filtered + bandwidth by the filtered average + - calculate the stream ratio by dividing each relay's measured + bandwidth by the stream average + + In pseudocode: + + r_filt_i = bw_filt_i / bw_avg_filt + r_strm_i = bw_i / bw_avg_strm + + 4. 
Calculate the final ratio for each relay: + The final ratio is the larger of the filtered ratio and the + stream ratio. + + In pseudocode: + + r_i = max(r_filt_i, r_strm_i) + + 5. Calculate the scaled bandwidth for each relay: + The most recent descriptor observed bandwidth (`bw_obs_i`) is + multiplied by the final ratio. + + In pseudocode: + + bw_new_i = r_i * bw_obs_i + + <<In this way, the resulting network status consensus bandwidth + values are effectively re-weighted proportional to how much faster + the node was as compared to the rest of the network.>>
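
  The five steps above can be sketched in Python as follows. This is an
  illustrative sketch only, not the Torflow or sbws implementation. The
  function name torflow_scale and its two dictionary arguments are
  hypothetical. It also assumes that a relay's measured bandwidth (`bw_i`)
  is the mean of its individual measurements (`bw_j`), which the text above
  does not state explicitly, and that at least one measurement in the
  network is non-zero.

```python
from statistics import mean


def torflow_scale(measurements, observed_bw):
    """Scale relay bandwidths following steps 1-5 (no PID feedback).

    measurements: dict mapping a relay id to a non-empty list of that
                  relay's measured stream bandwidths (bw_j).
    observed_bw:  dict mapping a relay id to its most recent descriptor
                  observed bandwidth (bw_obs_i).
    Both argument shapes are assumptions made for this sketch.
    """
    # Assumption: the relay's measured bandwidth bw_i is the mean of bw_j.
    bw = {relay: mean(ms) for relay, ms in measurements.items()}

    # Step 1: mean of the measurements that are >= the relay's own mean.
    bw_filt = {
        relay: mean(m for m in ms if m >= bw[relay])
        for relay, ms in measurements.items()
    }

    # Step 2: network averages over the n measured relays.
    n = len(measurements)
    bw_avg_filt = sum(bw_filt.values()) / n
    bw_avg_strm = sum(bw.values()) / n

    scaled = {}
    for relay in measurements:
        # Step 3: per-relay ratios against the network averages.
        r_filt = bw_filt[relay] / bw_avg_filt
        r_strm = bw[relay] / bw_avg_strm
        # Step 4: the final ratio is the larger of the two.
        r = max(r_filt, r_strm)
        # Step 5: scale the descriptor observed bandwidth.
        scaled[relay] = r * observed_bw[relay]
    return scaled


example = torflow_scale(
    {"relayA": [100, 200, 300], "relayB": [50, 50]},
    {"relayA": 1000, "relayB": 500},
)
```

  In this example relayA is faster than the network average, so its final
  ratio is above 1 and its observed bandwidth is scaled up, while relayB's
  is scaled down. A real generator would additionally apply the checks in
  section B.1, such as rounding a zero result up to one.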