Diffstat (limited to 'attic/text_formats/bandwidth-file-spec.txt')
-rw-r--r-- attic/text_formats/bandwidth-file-spec.txt 1315
1 files changed, 1315 insertions, 0 deletions
diff --git a/attic/text_formats/bandwidth-file-spec.txt b/attic/text_formats/bandwidth-file-spec.txt
new file mode 100644
index 0000000..bad13f6
--- /dev/null
+++ b/attic/text_formats/bandwidth-file-spec.txt
@@ -0,0 +1,1315 @@
+
+ Tor Bandwidth File Format
+ juga
+ teor
+
+Table of Contents
+
+ 1. Scope and preliminaries
+ 1.2. Acknowledgements
+ 1.3. Outline
+ 1.4. Format Versions
+ 2. Format details
+ 2.1. Definitions
+ 2.2. Header List format
+ 2.3. Relay Line format
+ 2.4. Implementation details
+ 2.4.1. Writing bandwidth files atomically
+ 2.4.2. Additional KeyValue pair definitions
+ 2.4.2.1. Simple Bandwidth Scanner
+ 2.4.2.2. Torflow
+ A. Sample data
+ A.1. Generated by Torflow
+ A.2. Generated by sbws version 0.1.0
+ A.3. Generated by sbws version 1.0.3
+ A.4. Headers generated by sbws version 1.0.4
+ A.5. Generated by sbws version 1.1.0
+ B. Scaling bandwidths
+ B.1. Scaling requirements
+ B.2. A linear scaling method
+ B.3. Quota changes
+ B.4. Torflow aggregation
+
+1. Scope and preliminaries
+
+ This document describes the format of Tor's Bandwidth File, version
+ 1.0.0 and later.
+
+ It is a new specification for the existing bandwidth file format,
+ which we call version 1.0.0. It also specifies new format versions
+ 1.1.0 and later, which are backwards compatible with 1.0.0 parsers.
+
+ Since Tor version 0.2.4.12-alpha, the directory authorities use
+ the Bandwidth File called "V3BandwidthsFile" generated by
+ Torflow [1]. The details of this format are described in Torflow's
+ README.spec.txt. We also summarise the format in this specification.
+
+ The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
+ NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
+ "OPTIONAL" in this document are to be interpreted as described in
+ RFC 2119.
+
+1.2. Acknowledgements
+
+ The original bandwidth generator (Torflow) and format were
+ created by mike. Teor suggested writing this specification while
+ contributing to pastly's new bandwidth generator implementation.
+
+ This specification was revised after feedback from:
+
+ Nick Mathewson (nickm)
+ Iain Learmonth (irl)
+
+1.3. Outline
+
+ Sections 3.4.1 and 3.4.2 of the Tor directory protocol
+ (dir-spec.txt [3]) use the term "bandwidth measurements" to refer to
+ what is called a Bandwidth File here.
+
+ A Bandwidth File contains information on relays' bandwidth
+ capacities and is produced by bandwidth generators, previously known
+ as bandwidth scanners.
+
+1.4. Format Versions
+
+ 1.0.0 - The legacy Bandwidth File format
+
+ 1.1.0 - Adds a header containing information about the bandwidth
+ file. Documents the sbws and Torflow relay line keys.
+
+ 1.2.0 - If there are not enough eligible relays, the bandwidth file
+ SHOULD contain a header, but no relays. (To match Torflow's
+ existing behaviour.)
+
+ Adds scanner and destination countries to the header.
+ Adds new KeyValue Lines to the Header List section with
+ statistics about the number of relays included in the file.
+ Adds new KeyValues to Relay Bandwidth Lines, with different
+ bandwidth values (averages and descriptor bandwidths).
+
+ 1.4.0 - Adds monitoring KeyValues to the header and relay lines.
+
+ RelayLines for excluded relays MAY be present in the bandwidth
+ file for diagnostic reasons. Similarly, if there are not enough
+ eligible relays, the bandwidth file MAY contain all known relays.
+
+ Diagnostic relay lines SHOULD be marked with vote=0, and
+ Tor SHOULD NOT use their bandwidths in its votes.
+
+ Also adds Tor version.
+ 1.5.0 - Removes "recent_measurement_attempt_count" KeyValue.
+ 1.6.0 - Adds congestion control stream events KeyValues.
+ 1.7.0 - Adds ratios KeyValues to the relay lines and network averages
+ KeyValues to the header.
+
+ All Tor versions can consume format version 1.0.0.
+
+ All Tor versions can consume format version 1.1.0 and later,
+ but Tor versions earlier than 0.3.5.1-alpha warn if the header
+ contains any KeyValue lines after the Timestamp.
+
+ Tor versions 0.4.0.3-alpha, 0.3.5.8, 0.3.4.11, and earlier do not
+ understand "vote=0". Instead, they will vote for the actual bandwidths
+ that sbws puts in diagnostic relay lines:
+ * 1 for relays with "unmeasured=1", and
+ * the relay's measured and scaled bandwidth when "under_min_report=1".
+
+2. Format details
+
+ The Bandwidth File MUST contain the following sections:
+ - Header List (exactly once), which is a partially ordered list of
+ - Header Lines (one or more times), then
+ - Relay Lines (zero or more times), in an arbitrary order.
+ If it does not contain these sections, parsers SHOULD ignore the file.
+
+2.1. Definitions
+
+ The following nonterminals are defined in Tor directory protocol
+ sections 1.2., 2.1.1., 2.1.3.:
+
+ bool
+ Int
+ SP (space)
+ NL (newline)
+ KeywordChar
+ ArgumentChar
+ nickname
+ hexdigest (a '$', followed by 40 hexadecimal characters
+ ([A-Fa-f0-9]))
+
+ The following nonterminal is defined in section 2 of version-spec.txt [4]:
+
+ version_number
+
+ We define the following nonterminals:
+
+ Line ::= ArgumentChar* NL
+ RelayLine ::= KeyValue (SP KeyValue)* NL
+ HeaderLine ::= KeyValue NL
+ KeyValue ::= Key "=" Value
+ Key ::= (KeywordChar | "_")+
+ Value ::= ArgumentCharValue+
+ ArgumentCharValue ::= any printing ASCII character except NL and SP.
+ Terminator ::= "=====" or "===="
+ Generators SHOULD use a 5-character terminator.
+ Timestamp ::= Int
+ Bandwidth ::= Int
+ MasterKey ::= a base64-encoded Ed25519 public key, with
+ padding characters omitted.
+ DateTime ::= "YYYY-MM-DDTHH:MM:SS", as in ISO 8601
+ CountryCode ::= Two capital ASCII letters ([A-Z]{2}), as defined in
+ ISO 3166-1 alpha-2 plus "ZZ" to denote unknown country
+ (e.g., the destination is in a Content Delivery Network).
+ CountryCodeList ::= One or more CountryCode(s) separated by a comma
+ ([A-Z]{2}(,[A-Z]{2})*).
+
+ Note that key_value and value are defined in Tor directory protocol
+ with different formats to KeyValue and Value here.
+
+ Tor versions earlier than 0.3.5.1-alpha require all lines in the file
+ to be 510 characters or less. The previous limit was 254 characters in
+ Tor 0.2.6.2-alpha and earlier. Parsers MAY ignore longer Lines.
+
+ Note that directory authorities are only supported on the two most
+ recent stable Tor versions, so we expect that line limits will be
+ removed after Tor 0.4.0 is released in 2019.
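
A minimal Python sketch of how the KeyValue and RelayLine productions above could be parsed. The function name `parse_relay_line` and the regular expression are illustrative assumptions, not part of the specification; in particular, KeywordChar is approximated as `[A-Za-z0-9-]` and ArgumentCharValue as printable ASCII without SP.

```python
import re

# Key ::= (KeywordChar | "_")+ ; Value ::= ArgumentCharValue+
# KeywordChar is approximated as [A-Za-z0-9-] (an assumption of this sketch).
KEY_VALUE = re.compile(r"([A-Za-z0-9_-]+)=([\x21-\x7e]+)")


def parse_relay_line(line):
    """Return a dict with the KeyValue pairs found in one RelayLine."""
    pairs = {}
    for token in line.rstrip("\n").split(" "):
        match = KEY_VALUE.fullmatch(token)
        if match is None:
            # Material that is not a KeyValue is skipped in this sketch.
            continue
        # Duplicate keys are forbidden; this sketch keeps the first Value.
        pairs.setdefault(match.group(1), match.group(2))
    return pairs
```

Because a Key cannot contain "=", the first "=" in a token always separates the Key from the Value, even when the Value itself contains "=".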
+
+2.2. Header List format
+
+ The Header List consists of a Timestamp line and zero or more HeaderLines.
+
+ All the header lines MUST conform to the HeaderLine format, except
+ the first Timestamp line.
+
+ The Timestamp line is not a HeaderLine to keep compatibility with
+ the legacy Bandwidth File format.
+
+ Some header Lines MUST appear in specific positions, as documented
+ below. All other Lines can appear in any order.
+
+ If a parser does not recognize any extra material in a header Line,
+ the Line MUST be ignored.
+
+ If a header Line does not conform to this format, the Line SHOULD be
+ ignored by parsers.
+
+ It consists of:
+
+ Timestamp NL
+
+ [At start, exactly once.]
+
+ The Unix Epoch time in seconds of the most recent generator bandwidth
+ result.
+
+ If the generator implementation has multiple threads or
+ subprocesses which can fail independently, it SHOULD take the most
+ recent timestamp from each thread and use the oldest value. This
+ ensures all the threads continue running.
+
+ If there are threads that do not run continuously, they SHOULD be
+ excluded from the timestamp calculation.
+
+ If there are no recent results, the generator MUST NOT generate a new
+ file.
+
+ It does not follow the KeyValue format for backwards compatibility
+ with version 1.0.0.
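
The timestamp rule for generators with several independent workers can be sketched as follows. The function name `header_timestamp` and the mapping from worker name to result timestamps are hypothetical; skipping workers without any results is also an assumption of this sketch.

```python
def header_timestamp(results_by_worker):
    """Newest result per worker, then the oldest of those values.

    results_by_worker maps a worker name to the Unix timestamps of its
    results. Only continuously running workers should be passed in.
    """
    # Workers with no results at all are skipped (an assumption).
    latest = [max(stamps) for stamps in results_by_worker.values() if stamps]
    if not latest:
        # No recent results: the generator must not write a new file.
        return None
    return min(latest)
```

With this choice, a single stalled worker holds the Timestamp back even while the other workers keep producing results.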
+
+ "version" version_number NL
+
+ [In second position, zero or one time.]
+
+ The specification document format version.
+ It uses semantic versioning [5].
+
+ This Line was added in version 1.1.0 of this specification.
+
+ Version 1.0.0 documents do not contain this Line, and the
+ version_number is considered to be "1.0.0".
+
+ "software" Value NL
+
+ [Zero or one time.]
+
+ The name of the software that created the document.
+
+ This Line was added in version 1.1.0 of this specification.
+
+ Version 1.0.0 documents do not contain this Line, and the software
+ is considered to be "torflow".
+
+ "software_version" Value NL
+
+ [Zero or one time.]
+
+ The version of the software that created the document.
+ The version may be a version_number, a git commit, or some other
+ version scheme.
+
+ This Line was added in version 1.1.0 of this specification.
+
+ "file_created" DateTime NL
+
+ [Zero or one time.]
+
+ The date and time timestamp in ISO 8601 format and UTC time zone
+ when the file was created.
+
+ This Line was added in version 1.1.0 of this specification.
+
+ "generator_started" DateTime NL
+
+ [Zero or one time.]
+
+ The date and time timestamp in ISO 8601 format and UTC time zone
+ when the generator started.
+
+ This Line was added in version 1.1.0 of this specification.
+
+ "earliest_bandwidth" DateTime NL
+
+ [Zero or one time.]
+
+ The date and time timestamp in ISO 8601 format and UTC time zone
+ when the first relay bandwidth was obtained.
+
+ This Line was added in version 1.1.0 of this specification.
+
+ "latest_bandwidth" DateTime NL
+
+ [Zero or one time.]
+
+ The date and time timestamp in ISO 8601 format and UTC time zone
+ of the most recent generator bandwidth result.
+
+ This time MUST be identical to the initial Timestamp line.
+
+ This duplicate value is included to make the format easier for people
+ to read.
+
+ This Line was added in version 1.1.0 of this specification.
+
+ "number_eligible_relays" Int NL
+
+ [Zero or one time.]
+
+ The number of relays that have enough measurements to be
+ included in the bandwidth file.
+
+ This Line was added in version 1.2.0 of this specification.
+
+ "minimum_percent_eligible_relays" Int NL
+
+ [Zero or one time.]
+
+ The percentage of relays in the consensus that SHOULD be
+ included in every generated bandwidth file.
+
+ If this threshold is not reached, format versions 1.3.0 and earlier
+ SHOULD NOT contain any relays. (Bandwidth files always include a
+ header.)
+
+ Format versions 1.4.0 and later SHOULD include all the relays for
+ diagnostic purposes, even if this threshold is not reached. But these
+ relays SHOULD be marked so that Tor does not vote on them.
+ See section 1.4 for details.
+
+ The minimum percentage is 60% in Torflow, so sbws uses
+ 60% as the default.
+
+ This Line was added in version 1.2.0 of this specification.
+
+ "number_consensus_relays" Int NL
+
+ [Zero or one time.]
+
+ The number of relays in the consensus.
+
+ This Line was added in version 1.2.0 of this specification.
+
+ "percent_eligible_relays" Int NL
+
+ [Zero or one time.]
+
+ The number of eligible relays, as a percentage of the number
+ of relays in the consensus.
+
+ This line SHOULD be equal to:
+ (number_eligible_relays * 100.0) / number_consensus_relays
+
+ This Line was added in version 1.2.0 of this specification.
+
+ "minimum_number_eligible_relays" Int NL
+
+ [Zero or one time.]
+
+ The minimum number of relays that SHOULD be included in the bandwidth
+ file. See minimum_percent_eligible_relays for details.
+
+ This line SHOULD be equal to:
+ number_consensus_relays * (minimum_percent_eligible_relays / 100.0)
+
+ This Line was added in version 1.2.0 of this specification.
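
As a worked example, the two formulas for percent_eligible_relays and minimum_number_eligible_relays can be evaluated like this. The function name `eligibility_stats` and the `enough_relays` key are hypothetical, and truncating to an Int is an assumption; the text above does not state a rounding rule.

```python
def eligibility_stats(number_eligible_relays, number_consensus_relays,
                      minimum_percent_eligible_relays=60):
    """Compute the derived eligibility header values from the raw counts."""
    percent = (number_eligible_relays * 100.0) / number_consensus_relays
    minimum = number_consensus_relays * (minimum_percent_eligible_relays / 100.0)
    return {
        # Truncation to Int is an assumption of this sketch.
        "percent_eligible_relays": int(percent),
        "minimum_number_eligible_relays": int(minimum),
        # Hypothetical helper value, not a header key.
        "enough_relays": number_eligible_relays >= minimum,
    }
```

For example, 6256 eligible relays out of 6514 in the consensus give 96 percent, against a minimum of 3908 relays at the default 60% threshold.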
+
+ "scanner_country" CountryCode NL
+
+ [Zero or one time.]
+
+ The country, as in political geolocation, where the generator is run.
+
+ This Line was added in version 1.2.0 of this specification.
+
+ "destinations_countries" CountryCodeList NL
+
+ [Zero or one time.]
+
+ The country, as in political geolocation, or countries where the
+ destination Web server(s) are located.
+ The destination Web Servers serve the data that the generator retrieves
+ to measure the bandwidth.
+
+ This Line was added in version 1.2.0 of this specification.
+
+ "recent_consensus_count" Int NL
+
+ [Zero or one time.]
+
+ The number of different consensuses seen in the last data_period
+ days. (data_period is 5 by default.)
+
+ Assuming that Tor clients fetch a consensus every 1-2 hours,
+ and that the data_period is 5 days, the Value of this Key SHOULD be
+ between:
+ data_period * 24 / 2 = 60
+ data_period * 24 = 120
+
+ This Line was added in version 1.4.0 of this specification.
+
+ "recent_priority_list_count" Int NL
+
+ [Zero or one time.]
+
+ The number of times that a list with a subset of relays prioritized
+ to be measured has been created in the last data_period days.
+ (data_period is 5 by default.)
+
+ In 2019, with 7000 relays in the network, the Value of this Key SHOULD be
+ approximately:
+ data_period * 24 / 1.5 = 80
+ Here, 1.5 is the approximate number of hours it takes to measure a
+ priority list of 7000 * 0.05 (350) relays, when the fraction of relays
+ in a priority list is 5% (0.05).
+
+ This Line was added in version 1.4.0 of this specification.
+
+ "recent_priority_relay_count" Int NL
+
+ [Zero or one time.]
+
+ The number of relays that have been in the list of relays prioritized
+ to be measured in the last data_period days. (data_period is 5 by
+ default.)
+
+ In 2019, with 7000 relays in the network, the Value of this Key SHOULD be
+ approximately:
+ 80 * (7000 * 0.05) = 28000
+ Here, 0.05 (5%) is the fraction of relays in a priority list and 80
+ is the approximate number of priority lists (see
+ "recent_priority_list_count").
+
+ This Line was added in version 1.4.0 of this specification.
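
The expected values quoted for recent_consensus_count, recent_priority_list_count and recent_priority_relay_count follow from simple arithmetic, reproduced here as a sketch. The function name `expected_recent_counts`, its parameter names and the `_min`/`_max` keys are hypothetical; the default values are the 2019 figures quoted above.

```python
def expected_recent_counts(data_period_days=5, relays=7000,
                           priority_fraction=0.05, hours_per_list=1.5):
    """Reproduce the approximate monitoring values from the text."""
    hours = data_period_days * 24
    # One priority list is measured roughly every hours_per_list hours.
    lists = round(hours / hours_per_list)
    relays_per_list = round(relays * priority_fraction)
    return {
        # A consensus is fetched every 1-2 hours.
        "recent_consensus_count_min": hours // 2,
        "recent_consensus_count_max": hours,
        "recent_priority_list_count": lists,
        "recent_priority_relay_count": lists * relays_per_list,
    }
```

A monitoring script could compare the values in a generated header against these estimates to spot a scanner that is measuring much more slowly than expected.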
+
+ "recent_measurement_attempt_count" Int NL
+
+ [Zero or one time.]
+
+ The number of times that any relay has been queued to be measured
+ in the last data_period days. (data_period is 5 by default.)
+
+ In 2019, with 7000 relays in the network, the Value of this Key SHOULD be
+ approximately the same as "recent_priority_relay_count",
+ assuming that there is one attempt to measure a relay for each relay that
+ has been prioritized unless there are system, network or implementation
+ issues.
+
+ This Line was added in version 1.4.0 of this specification and removed
+ in version 1.5.0.
+
+ "recent_measurement_failure_count" Int NL
+
+ [Zero or one time.]
+
+ The number of times that the scanner attempted to measure a relay in
+ the last data_period days (5 by default), but the relay has not been
+ measured because of system, network or implementation issues.
+
+ This Line was added in version 1.4.0 of this specification.
+
+ "recent_measurements_excluded_error_count" Int NL
+
+ [Zero or one time.]
+
+ The number of relays that have no successful measurements in the last
+ data_period days (5 by default).
+
+ (See the note in section 1.4, version 1.4.0, about excluded relays.)
+
+ This Line was added in version 1.4.0 of this specification.
+
+ "recent_measurements_excluded_near_count" Int NL
+
+ [Zero or one time.]
+
+ The number of relays that have some successful measurements in the last
+ data_period days (5 by default), but all those measurements were
+ performed in a period of time that was too short (by default 1 day).
+
+ (See the note in section 1.4, version 1.4.0, about excluded relays.)
+
+ This Line was added in version 1.4.0 of this specification.
+
+ "recent_measurements_excluded_old_count" Int NL
+
+ [Zero or one time.]
+
+ The number of relays that have some successful measurements, but all
+ those measurements are too old (more than 5 days, by default).
+
+ Excludes relays that are already counted in
+ recent_measurements_excluded_near_count.
+
+ (See the note in section 1.4, version 1.4.0, about excluded relays.)
+
+ This Line was added in version 1.4.0 of this specification.
+
+ "recent_measurements_excluded_few_count" Int NL
+
+ [Zero or one time.]
+
+ The number of relays that don't have enough recent successful
+ measurements. (Fewer than 2 measurements in the last 5 days, by
+ default).
+
+ Excludes relays that are already counted in
+ recent_measurements_excluded_near_count and
+ recent_measurements_excluded_old_count.
+
+ (See the note in section 1.4, version 1.4.0, about excluded relays.)
+
+ This Line was added in version 1.4.0 of this specification.
+
+ "time_to_report_half_network" Int NL
+
+ [Zero or one time.]
+
+ The time in seconds that it would take to report measurements about
+ half of the network, given the number of eligible relays and the time
+ it took in the last data_period days (5 days, by default).
+
+ (See the note in section 1.4, version 1.4.0, about excluded relays.)
+
+ This Line was added in version 1.4.0 of this specification.
+
+ "tor_version" version_number NL
+
+ [Zero or one time.]
+
+ The Tor version of the Tor process controlled by the generator.
+
+ This Line was added in version 1.4.0 of this specification.
+
+ "mu" Int NL
+
+ [Zero or one time.]
+
+ The network stream bandwidth average, calculated as explained in B.4.2.
+
+ This Line was added in version 1.7.0 of this specification.
+
+ "muf" Int NL
+
+ [Zero or one time.]
+
+ The filtered network stream bandwidth average, calculated as explained
+ in B.4.2.
+
+ This Line was added in version 1.7.0 of this specification.
+
+ KeyValue NL
+
+ [Zero or more times.]
+
+ There MUST NOT be multiple KeyValue header Lines with the same key.
+ If there are, the parser SHOULD choose an arbitrary Line.
+
+ If a parser does not recognize a Keyword in a KeyValue Line, it
+ MUST be ignored.
+
+ Future format versions may include additional KeyValue header Lines.
+ Additional header Lines will be accompanied by a minor version
+ increment.
+
+ Implementations MAY add additional header Lines as needed. This
+ specification SHOULD be updated to avoid conflicting meanings for
+ the same header keys.
+
+ Parsers MUST NOT rely on the order of these additional Lines.
+
+ Additional header Lines MUST NOT use any keywords specified in the
+ relay measurements format.
+ If they do, the parser MAY ignore the conflicting keywords.
+
+ Terminator NL
+
+ [Zero or one time.]
+
+ The Header List section ends with a Terminator.
+
+ In version 1.0.0, Header List ends when the first relay bandwidth
+ is found conforming to the next section.
+
+ Implementations of version 1.1.0 and later SHOULD use a 5-character
+ terminator.
+
+ Tor 0.4.0.1-alpha and later look for a 5-character terminator,
+ or the first relay bandwidth line. sbws versions 0.1.0 to 1.0.2
+ used a 4-character terminator; this bug was fixed in 1.0.3.
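
A sketch of how a parser might separate the Header List from the relay lines, accepting both terminator lengths and the version 1.0.0 layout without a terminator. The function name `split_bandwidth_file` is hypothetical, and treating the first line that contains SP as the first relay line is a heuristic assumption of this sketch.

```python
TERMINATORS = ("=====", "====")


def split_bandwidth_file(text):
    """Split a bandwidth file into (timestamp, headers, relay_lines)."""
    lines = text.splitlines()
    timestamp = int(lines[0])  # the Timestamp line is not a KeyValue
    headers = {}
    index = 1
    while index < len(lines):
        line = lines[index]
        if line in TERMINATORS:
            index += 1  # relay lines start after the terminator
            break
        if " " in line:
            # Heuristic: a HeaderLine is a single KeyValue without SP,
            # so this is taken to be the first relay line (1.0.0 layout).
            break
        key, sep, value = line.partition("=")
        if sep and key and value:
            # Keep the first occurrence of a duplicated key.
            headers.setdefault(key, value)
        # Lines that do not conform to the HeaderLine format are ignored.
        index += 1
    return timestamp, headers, lines[index:]
```

A caller can then treat a missing "version" key as format version 1.0.0, as described for the "version" Line.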
+
+2.3. Relay Line format
+
+ It consists of zero or more RelayLines containing relay ids and
+ bandwidths. The relays and their KeyValues are in arbitrary order.
+
+ There MUST NOT be multiple KeyValue pairs with the same key in the same
+ RelayLine. If there are, the parser SHOULD choose an arbitrary Value.
+
+ There MUST NOT be multiple RelayLines per relay identity (node_id or
+ master_key_ed25519). If there are, parsers SHOULD issue a warning.
+ Parsers MAY reject the file, choose an arbitrary RelayLine, or ignore
+ both RelayLines.
+
+ If a parser does not recognize any extra material in a RelayLine,
+ the extra material MUST be ignored.
+
+ Each RelayLine includes the following KeyValue pairs:
+
+ "node_id" hexdigest
+
+ [Exactly once.]
+
+ The fingerprint for the relay's RSA identity key.
+
+ Note: In bandwidth files read by Tor versions earlier than
+ 0.3.4.1-alpha, node_id MUST NOT be at the end of the Line.
+ These authority versions are no longer supported.
+
+ Current Tor versions ignore master_key_ed25519, so node_id MUST be
+ present in each relay Line.
+
+ Implementations of version 1.1.0 and later SHOULD include both node_id
+ and master_key_ed25519. Parsers SHOULD accept Lines that contain at
+ least one of them.
+
+ "master_key_ed25519" MasterKey
+
+ [Zero or one time.]
+
+ The relay's master Ed25519 key, base64 encoded,
+ without trailing "="s, to avoid ambiguity with KeyValue "="
+ character.
+
+ This KeyValue pair SHOULD be present, see the note under node_id.
+
+ This KeyValue was added in version 1.1.0 of this specification.
+
+ "bw" Bandwidth
+
+ [Exactly once.]
+
+ The bandwidth of this relay in kilobytes per second.
+
+ No Zero Bandwidths:
+ Tor accepts zero bandwidths, but they trigger bugs in older Tor
+ implementations. Therefore, implementations SHOULD NOT produce zero
+ bandwidths. Instead, they SHOULD use one as their minimum bandwidth.
+ If there are zero bandwidths, the parser MAY ignore them.
+
+ Bandwidth Aggregation:
+ Multiple measurements can be aggregated using an averaging scheme,
+ such as a mean, median, or decaying average.
+
+ Bandwidth Scaling:
+ Torflow scales bandwidths to kilobytes per second. Other
+ implementations SHOULD use kilobytes per second for their initial
+ bandwidth scaling.
+
+ If different implementations or configurations are used in votes for
+ the same network, their measurements MAY need further scaling. See
+ Appendix B for information about scaling, and one possible scaling
+ method.
+
+ MaxAdvertisedBandwidth:
+ Bandwidth generators MUST limit the relays' measured bandwidth based
+ on the MaxAdvertisedBandwidth.
+ A relay's MaxAdvertisedBandwidth limits the bandwidth-avg in its
+ descriptor. bandwidth-avg is the minimum of MaxAdvertisedBandwidth,
+ BandwidthRate, RelayBandwidthRate, BandwidthBurst, and
+ RelayBandwidthBurst.
+ Therefore, generators MUST limit a relay's measured bandwidth to its
+ descriptor's bandwidth-avg. This limit needs to be implemented in the
+ generator, because generators may scale consensus weights before
+ sending them to Tor.
+ Generators SHOULD NOT limit measured bandwidths based on descriptors'
+ bandwidth-observed, because that penalises new relays.
+
+ sbws limits the relay's measured bandwidth to the bandwidth-avg
+ advertised.
+
+ Torflow partitions relays based on their bandwidth. For unmeasured
+ relays, Torflow uses the minimum of all descriptor bandwidths,
+ including bandwidth-avg (MaxAdvertisedBandwidth) and
+ bandwidth-observed. Then Torflow measures the relays in each partition
+ against each other, which implicitly limits a relay's measured
+ bandwidth to the bandwidths of similar relays.
+
+ Torflow also generates consensus weights based on the ratio between the
+ measured bandwidth and the minimum of all descriptor bandwidths (at the
+ time of the measurement). So when an operator reduces the
+ MaxAdvertisedBandwidth for a relay, Torflow reduces that relay's
+ measured bandwidth.
+
+ KeyValue
+
+ [Zero or more times.]
+
+ Future format versions may include additional KeyValue pairs on a
+ RelayLine.
+ Additional KeyValue pairs will be accompanied by a minor version
+ increment.
+
+ Implementations MAY add additional relay KeyValue pairs as needed.
+ This specification SHOULD be updated to avoid conflicting meanings
+ for the same Keywords.
+
+ Parsers MUST NOT rely on the order of these additional KeyValue
+ pairs.
+
+ Additional KeyValue pairs MUST NOT use any keywords specified in the
+ header format.
+ If they do, the parser MAY ignore the conflicting keywords.
+
+2.4. Implementation details
+
+2.4.1. Writing bandwidth files atomically
+
+ To avoid inconsistent reads, implementations SHOULD write bandwidth files
+ atomically. If the file is transferred from another host, it SHOULD be
+ written to a temporary path, then renamed to the V3BandwidthsFile path.
+
+ sbws versions 0.7.0 and later write the bandwidth file to an archival
+ location, create a temporary symlink to that location, then atomically
+ rename the symlink to the configured V3BandwidthsFile path.
+
+ Torflow does not write bandwidth files atomically.
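
A minimal sketch of the write-then-rename approach described above, assuming the temporary file is created in the same directory as the destination so that the rename stays on one filesystem. The function name `write_atomically` and the `.bwfile-` prefix are hypothetical; this is not the sbws implementation, which renames a symlink instead.

```python
import os
import tempfile


def write_atomically(path, data):
    """Write data to a temporary file, then rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".bwfile-")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the contents to disk first
        # os.replace overwrites the destination in a single rename step.
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

A reader that opens the destination path sees either the previous complete file or the new complete file, never a partially written one.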
+
+2.4.2. Additional KeyValue pair definitions
+
+ This section lists the KeyValue pairs in RelayLines that current
+ implementations generate.
+
+2.4.2.1. Simple Bandwidth Scanner
+
+ sbws RelayLines contain these keys:
+
+ "node_id" hexdigest
+
+ As above.
+
+ "bw" Bandwidth
+
+ As above.
+
+ "nick" nickname
+
+ [Exactly once.]
+
+ The relay nickname.
+
+ Torflow also has a "nick" KeyValue.
+
+ "rtt" Int
+
+ [Zero or one time.]
+
+ The Round Trip Time in milliseconds to obtain 1 byte of data.
+
+ This KeyValue was added in version 1.1.0 of this specification.
+ It became optional in version 1.3.0 or 1.4.0 of this specification.
+
+ "time" DateTime
+
+ [Exactly once.]
+
+ The date and time timestamp in ISO 8601 format and UTC time zone
+ when the last bandwidth was obtained.
+
+ This KeyValue was added in version 1.1.0 of this specification.
+ The Torflow equivalent is "measured_at".
+
+ "success" Int
+
+ [Zero or one time.]
+
+ The number of times that the bandwidth measurements for this relay were
+ successful.
+
+ This KeyValue was added in version 1.1.0 of this specification.
+
+ "error_circ" Int
+
+ [Zero or one time.]
+
+ The number of times that the bandwidth measurements for this relay
+ failed because of circuit failures.
+
+ This KeyValue was added in version 1.1.0 of this specification.
+ The Torflow equivalent is "circ_fail".
+
+ "error_stream" Int
+
+ [Zero or one time.]
+
+ The number of times that the bandwidth measurements for this relay
+ failed because of stream failures.
+
+ This KeyValue was added in version 1.1.0 of this specification.
+
+ "error_destination" Int
+
+ [Zero or one time.]
+
+ The number of times that the bandwidth measurements for this relay
+ failed because the destination Web server was not available.
+
+ This KeyValue was added in version 1.4.0 of this specification.
+
+ "error_second_relay" Int
+
+ [Zero or one time.]
+
+ The number of times that the bandwidth measurements for this relay
+ failed because sbws could not find a second relay for the test circuit.
+
+ This KeyValue was added in version 1.4.0 of this specification.
+
+ "error_misc" Int
+
+ [Zero or one time.]
+
+ The number of times that the bandwidth measurements for this relay
+ failed because of other reasons.
+
+ This KeyValue was added in version 1.1.0 of this specification.
+
+ "bw_mean" Int
+
+ [Zero or one time.]
+
+ The measured bandwidth mean for this relay in bytes per second.
+
+ This KeyValue was added in version 1.2.0 of this specification.
+
+ "bw_median" Int
+
+ [Zero or one time.]
+
+ The measured bandwidth median for this relay in bytes per second.
+
+ This KeyValue was added in version 1.2.0 of this specification.
+
+ "desc_bw_avg" Int
+
+ [Zero or one time.]
+
+ The descriptor average bandwidth for this relay in bytes per second.
+
+ This KeyValue was added in version 1.2.0 of this specification.
+
+ "desc_bw_obs_last" Int
+
+ [Zero or one time.]
+
+ The last descriptor observed bandwidth for this relay in bytes per
+ second.
+
+ This KeyValue was added in version 1.2.0 of this specification.
+
+ "desc_bw_obs_mean" Int
+
+ [Zero or one time.]
+
+ The descriptor observed bandwidth mean for this relay in bytes per
+ second.
+
+ This KeyValue was added in version 1.2.0 of this specification.
+
+ "desc_bw_bur" Int
+
+ [Zero or one time.]
+
+ The descriptor burst bandwidth for this relay in bytes per
+ second.
+
+ This KeyValue was added in version 1.2.0 of this specification.
+
+ "consensus_bandwidth" Int
+
+ [Zero or one time.]
+
+ The consensus bandwidth for this relay in bytes per second.
+
+ This KeyValue was added in version 1.2.0 of this specification.
+
+ "consensus_bandwidth_is_unmeasured" Bool
+
+ [Zero or one time.]
+
+ If the consensus bandwidth for this relay was not obtained from
+ three or more bandwidth authorities, this KeyValue is True;
+ otherwise, it is False.
+
+ This KeyValue was added in version 1.2.0 of this specification.
+
+ "relay_in_recent_consensus_count" Int
+
+ [Zero or one time.]
+
+ The number of times this relay was found in a consensus in the
+ last data_period days. (Unless otherwise stated, data_period is
+ 5 by default.)
+
+ This KeyValue was added in version 1.4.0 of this specification.
+
+ "relay_recent_priority_list_count" Int
+
+ [Zero or one time.]
+
+ The number of times this relay has been prioritized to be measured
+ in the last data_period days.
+
+ This KeyValue was added in version 1.4.0 of this specification.
+
+ "relay_recent_measurement_attempt_count" Int
+
+ [Zero or one time.]
+
+ The number of times that a measurement of this relay was attempted
+ in the last data_period days.
+
+ This KeyValue was added in version 1.4.0 of this specification.
+
+ "relay_recent_measurement_failure_count" Int
+
+ [Zero or one time.]
+
+ The number of times that a measurement of this relay was attempted
+ in the last data_period days, but it was not possible to obtain a
+ measurement.
+
+ This KeyValue was added in version 1.4.0 of this specification.
+
+ "relay_recent_measurements_excluded_error_count" Int
+
+ [Zero or one time.]
+
+ The number of recent relay measurement attempts that failed.
+ Measurements are recent if they are in the last data_period days
+ (5 by default).
+
+ (See the note in section 1.4, version 1.4.0, about excluded relays.)
+
+ This KeyValue was added in version 1.4.0 of this specification.
+
+ "relay_recent_measurements_excluded_near_count" Int
+
+ [Zero or one time.]
+
+ When all of a relay's recent successful measurements were performed in
+ a period of time that was too short (by default 1 day), the relay is
+ excluded. This KeyValue contains the number of recent successful
+ measurements for the relay that were ignored for this reason.
+
+ (See the note in section 1.4, version 1.4.0, about excluded relays.)
+
+ This KeyValue was added in version 1.4.0 of this specification.
+
+ "relay_recent_measurements_excluded_old_count" Int
+
+ [Zero or one time.]
+
+ The number of successful measurements for this relay that are too old
+ (more than data_period days, 5 by default).
+
+ Excludes measurements that are already counted in
+ relay_recent_measurements_excluded_near_count.
+
+ (See the note in section 1.4, version 1.4.0, about excluded relays.)
+
+ This KeyValue was added in version 1.4.0 of this specification.
+
+ "relay_recent_measurements_excluded_few_count" Int
+
+ [Zero or one time.]
+
+ The number of successful measurements for this relay that were ignored
+ because the relay did not have enough successful measurements (fewer
+ than 2, by default).
+
+ Excludes measurements that are already counted in
+ relay_recent_measurements_excluded_near_count or
+ relay_recent_measurements_excluded_old_count.
+
+ (See the note in section 1.4, version 1.4.0, about excluded relays.)
+
+ This KeyValue was added in version 1.4.0 of this specification.
+
+ "under_min_report" bool
+
+ [Zero or one time.]
+
+ If the value is 1, there are not enough eligible relays in the
+ bandwidth file, and Tor bandwidth authorities MAY NOT vote on this
+ relay. (Current Tor versions do not change their behaviour based on
+ the "under_min_report" key.)
+
+ If the value is 0 or the KeyValue is not present, there are enough
+ relays in the bandwidth file.
+
+ Because Tor versions released before April 2019 (see section 1.4. for
+ the full list of versions) ignore "vote=0", generator implementations
+ MUST NOT change the bandwidths for under_min_report relays. Using the
+ same bw value makes authorities that do not understand "vote=0"
+ or "under_min_report=1" produce votes that don't change relay weights
+ too much. It also avoids flapping when the reporting threshold is
+ reached.
+
+ This KeyValue was added in version 1.4.0 of this specification.
+
+ "unmeasured" bool
+
+ [Zero or one time.]
+
+ If the value is 1, this relay was not successfully measured and
+ Tor bandwidth authorities MAY NOT vote on this relay.
+ (Current Tor versions do not change their behaviour based on
+ the "unmeasured" key.)
+
+ If the value is 0 or the KeyValue is not present, this relay
+ was successfully measured.
+
+ Because Tor versions released before April 2019 (see section 1.4. for
+ the full list of versions) ignore "vote=0", generator implementations
+ MUST set "bw=1" for unmeasured relays. Using the minimum bw value
+ makes authorities that do not understand "vote=0" or "unmeasured=1"
+ produce votes that don't change relay weights too much.
+
+ This KeyValue was added in version 1.4.0 of this specification.
+
+ "vote" bool
+
+ [Zero or one time.]
+
+ If the value is 0, Tor directory authorities SHOULD ignore the relay's
+ entry in the bandwidth file. They SHOULD vote for the relay the same
+ way they would vote for a relay that is not present in the file.
+
+ This MAY be the case when this relay was not successfully measured but
+ it is included in the Bandwidth File, to diagnose why it was not
+ measured.
+
+ If the value is 1 or the KeyValue is not present, Tor directory
+ authorities MUST use the relay's bw value in any votes for that relay.
+
+ Implementations MUST also set "bw=1" for unmeasured relays.
+ But they MUST NOT change the bw for under_min_report relays.
+ (See the explanations under "unmeasured" and "under_min_report"
+ for more details.)
+
+ This KeyValue was added in version 1.4.0 of this specification.
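+
+ As an illustration only, the handling of "vote" described above could
+ be sketched as follows. The function names parse_relay_line and
+ bw_for_vote, and the shortened node_id values, are hypothetical and are
+ not part of Tor or sbws. The sketch assumes KeyValue pairs are separated
+ by whitespace and returns None when the relay's entry should be ignored.
+
```python
def parse_relay_line(line):
    """Split a RelayLine into a dict of KeyValue pairs (values stay strings)."""
    # Split on the first "=" only, because values may contain "=" themselves.
    return dict(kv.split("=", 1) for kv in line.split())


def bw_for_vote(relay):
    """Return the bw to use in a vote, or None if the entry is ignored.

    A missing "vote" key is treated the same as vote=1.
    """
    if relay.get("vote", "1") == "0":
        return None
    return int(relay["bw"])


# Hypothetical, shortened RelayLines for illustration.
measured = parse_relay_line("node_id=$AAAA bw=760 nick=Test")
excluded = parse_relay_line("node_id=$BBBB bw=1 nick=Test2 unmeasured=1 vote=0")
```
+
+ In this sketch, bw_for_vote(measured) yields the relay's bw, and
+ bw_for_vote(excluded) yields None, so the relay is treated as if it were
+ not present in the file.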
+
+ "xoff_recv" Int
+
+ [Zero or one time.]
+
+ The number of times this relay received `XOFF_RECV` stream events while
+ being measured in the last data_period days.
+
+ This KeyValue was added in version 1.6.0 of this specification.
+
+ "xoff_sent" Int
+
+ [Zero or one time.]
+
+ The number of times this relay received `XOFF_SENT` stream events while
+ being measured in the last data_period days.
+
+ This KeyValue was added in version 1.6.0 of this specification.
+
+ "r_strm" Float
+
+ [Zero or one time.]
+
+ The stream ratio of this relay, calculated as explained in section B.4,
+ step 3.
+
+ This KeyValue was added in version 1.7.0 of this specification.
+
+ "r_strm_filt" Float
+
+ [Zero or one time.]
+
+ The filtered stream ratio of this relay, calculated as explained in
+ section B.4, step 3.
+
+ This KeyValue was added in version 1.7.0 of this specification.
+
+
+2.4.2.2. Torflow
+
+ Torflow RelayLines include node_id and bw, and other KeyValue pairs [2].
+
+References:
+
+1. https://gitweb.torproject.org/torflow.git
+2. https://gitweb.torproject.org/torflow.git/tree/NetworkScanners/BwAuthority/README.spec.txt#n332
+ The Torflow specification is outdated, and does not match the current
+ implementation. See section A.1. for the format produced by Torflow.
+3. https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt
+4. https://gitweb.torproject.org/torspec.git/tree/version-spec.txt
+5. https://semver.org/
+
+A. Sample data
+
+The following has not been obtained from any real measurement.
+
+A.1. Generated by Torflow
+
+This is an example version 1.0.0 document:
+
+1523911758
+node_id=$68A483E05A2ABDCA6DA5A3EF8DB5177638A27F80 bw=760 nick=Test measured_at=1523911725 updated_at=1523911725 pid_error=4.11374090719 pid_error_sum=4.11374090719 pid_bw=57136645 pid_delta=2.12168374577 circ_fail=0.2 scanner=/filepath
+node_id=$96C15995F30895689291F455587BD94CA427B6FC bw=189 nick=Test2 measured_at=1523911623 updated_at=1523911623 pid_error=3.96703337994 pid_error_sum=3.96703337994 pid_bw=47422125 pid_delta=2.65469736988 circ_fail=0.0 scanner=/filepath
+
+A.2. Generated by sbws version 0.1.0
+
+1523911758
+version=1.1.0
+software=sbws
+software_version=0.1.0
+latest_bandwidth=2018-04-16T20:49:18
+file_created=2018-04-16T21:49:18
+generator_started=2018-04-16T15:13:25
+earliest_bandwidth=2018-04-16T15:13:26
+====
+bw=380 error_circ=0 error_misc=0 error_stream=1 master_key_ed25519=YaqV4vbvPYKucElk297eVdNArDz9HtIwUoIeo0+cVIpQ nick=Test node_id=$68A483E05A2ABDCA6DA5A3EF8DB5177638A27F80 rtt=380 success=1 time=2018-05-08T16:13:26
+bw=189 error_circ=0 error_misc=0 error_stream=0 master_key_ed25519=a6a+dZadrQBtfSbmQkP7j2ardCmLnm5NJ4ZzkvDxbo0I nick=Test2 node_id=$96C15995F30895689291F455587BD94CA427B6FC rtt=378 success=1 time=2018-05-08T16:13:36
+
+A.3. Generated by sbws version 1.0.3
+
+1523911758
+version=1.2.0
+latest_bandwidth=2018-04-16T20:49:18
+file_created=2018-04-16T21:49:18
+generator_started=2018-04-16T15:13:25
+earliest_bandwidth=2018-04-16T15:13:26
+minimum_number_eligible_relays=3862
+minimum_percent_eligible_relays=60
+number_consensus_relays=6436
+number_eligible_relays=6000
+percent_eligible_relays=93
+software=sbws
+software_version=1.0.3
+=====
+bw=38000 bw_mean=1127824 bw_median=1180062 desc_bw_avg=1073741824 desc_bw_obs_last=17230879 desc_bw_obs_mean=14732306 error_circ=0 error_misc=0 error_stream=1 master_key_ed25519=YaqV4vbvPYKucElk297eVdNArDz9HtIwUoIeo0+cVIpQ nick=Test node_id=$68A483E05A2ABDCA6DA5A3EF8DB5177638A27F80 rtt=380 success=1 time=2018-05-08T16:13:26
+bw=1 bw_mean=199162 bw_median=185675 desc_bw_avg=409600 desc_bw_obs_last=836165 desc_bw_obs_mean=858030 error_circ=0 error_misc=0 error_stream=0 master_key_ed25519=a6a+dZadrQBtfSbmQkP7j2ardCmLnm5NJ4ZzkvDxbo0I nick=Test2 node_id=$96C15995F30895689291F455587BD94CA427B6FC rtt=378 success=1 time=2018-05-08T16:13:36
+
+A.3.1. When there are not enough eligible measured relays:
+
+1540496079
+version=1.2.0
+earliest_bandwidth=2018-10-20T19:35:52
+file_created=2018-10-25T19:35:03
+generator_started=2018-10-25T11:42:56
+latest_bandwidth=2018-10-25T19:34:39
+minimum_number_eligible_relays=3862
+minimum_percent_eligible_relays=60
+number_consensus_relays=6436
+number_eligible_relays=2960
+percent_eligible_relays=46
+software=sbws
+software_version=1.0.3
+=====
+
+A.4. Headers generated by sbws version 1.0.4
+
+1523911758
+version=1.2.0
+latest_bandwidth=2018-04-16T20:49:18
+destinations_countries=TH,ZZ
+file_created=2018-04-16T21:49:18
+generator_started=2018-04-16T15:13:25
+earliest_bandwidth=2018-04-16T15:13:26
+minimum_number_eligible_relays=3862
+minimum_percent_eligible_relays=60
+number_consensus_relays=6436
+number_eligible_relays=6000
+percent_eligible_relays=93
+scanner_country=SN
+software=sbws
+software_version=1.0.4
+=====
+
+A.5. Generated by sbws version 1.1.0
+
+1523911758
+version=1.4.0
+latest_bandwidth=2018-04-16T20:49:18
+destinations_countries=TH,ZZ
+file_created=2018-04-16T21:49:18
+generator_started=2018-04-16T15:13:25
+earliest_bandwidth=2018-04-16T15:13:26
+minimum_number_eligible_relays=3862
+minimum_percent_eligible_relays=60
+number_consensus_relays=6436
+number_eligible_relays=6000
+percent_eligible_relays=93
+recent_measurement_attempt_count=6243
+recent_measurement_failure_count=732
+recent_measurements_excluded_error_count=969
+recent_measurements_excluded_few_count=3946
+recent_measurements_excluded_near_count=90
+recent_measurements_excluded_old_count=0
+recent_priority_list_count=20
+recent_priority_relay_count=6243
+scanner_country=SN
+software=sbws
+software_version=1.1.0
+time_to_report_half_network=57273
+=====
+bw=1 error_circ=1 error_destination=0 error_misc=0 error_second_relay=0 error_stream=0 master_key_ed25519=J3HQ24kOQWac3L1xlFLp7gY91qkb5NuKxjj1BhDi+m8 nick=snap269 node_id=$DC4D609F95A52614D1E69C752168AF1FCAE0B05F relay_recent_measurement_attempt_count=3 relay_recent_measurements_excluded_error_count=1 relay_recent_measurements_excluded_near_count=3 relay_recent_consensus_count=3 relay_recent_priority_list_count=3 success=3 time=2019-03-16T18:20:57 unmeasured=1 vote=0
+bw=1 error_circ=0 error_destination=0 error_misc=0 error_second_relay=0 error_stream=2 master_key_ed25519=h6ZB1E1yBFWIMloUm9IWwjgaPXEpL5cUbuoQDgdSDKg nick=relay node_id=$C4544F9E209A9A9B99591D548B3E2822236C0503 relay_recent_measurement_attempt_count=3 relay_recent_measurements_excluded_error_count=2 relay_recent_measurements_excluded_few_count=1 relay_recent_consensus_count=3 relay_recent_priority_list_count=3 success=1 time=2019-03-17T06:50:58 unmeasured=1 vote=0
+
+B. Scaling bandwidths
+
+B.1. Scaling requirements
+
+ Tor accepts zero bandwidths, but they trigger bugs in older Tor
+ implementations. Therefore, scaling methods SHOULD perform the
+ following checks:
+ * If the total bandwidth is zero, all relays should be given equal
+ bandwidths.
+ * If the scaled bandwidth is zero, it should be rounded up to one.
+
+ Initial experiments indicate that scaling may not be needed for
+ torflow and sbws, because their measured bandwidths are similar
+ enough already.
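+
+ The two zero-bandwidth checks in this section could be sketched as
+ below. The function name apply_zero_checks is hypothetical, and using 1
+ as the equal bandwidth when the total is zero is an assumption of this
+ sketch: the text above only requires that the bandwidths be equal.
+
```python
def apply_zero_checks(scaled_bws):
    """Apply the zero-bandwidth checks to a dict of node_id -> scaled bw."""
    if sum(scaled_bws.values()) == 0:
        # Total is zero: give every relay the same bandwidth.
        # Assumption: 1 is used as the equal value.
        return {node_id: 1 for node_id in scaled_bws}
    # Otherwise, round any individual zero up to one.
    return {node_id: max(bw, 1) for node_id, bw in scaled_bws.items()}
```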
+
+B.2. A linear scaling method
+
+ If scaling is required, here is a simple linear bandwidth scaling
+ method, which ensures that all bandwidth votes contain approximately
+ the same total bandwidth:
+
+ 1. Calculate the relay quota by dividing the total measured bandwidth
+ in all votes by the number of relays with measured bandwidth
+ votes. In the public Tor network, this is approximately 7500 as of
+ April 2018. The quota should be a consensus parameter, so it can be
+ adjusted for all generators on the network.
+
+ 2. Calculate a vote quota by multiplying the relay quota by the number
+ of relays this bandwidth authority has measured
+ bandwidths for.
+
+ 3. Calculate a scaling factor by dividing the vote quota by the
+ total unscaled measured bandwidth in this bandwidth
+ authority's upcoming vote.
+
+ 4. Multiply each unscaled measured bandwidth by the scaling
+ factor.
+
+ Now, the total scaled bandwidth in the upcoming vote is
+ approximately equal to the vote quota.
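+
+ A minimal sketch of steps 2 to 4, assuming the relay quota from step 1
+ is already known (for example, from a consensus parameter). The
+ function name linear_scale is hypothetical. Rounding to an integer and
+ rounding zero up to one (as in section B.1) are choices made for this
+ sketch, not requirements stated in the steps above.
+
```python
def linear_scale(unscaled_bws, relay_quota):
    """Scale a dict of node_id -> unscaled measured bw for one authority."""
    total = sum(unscaled_bws.values())
    if total == 0:
        # Section B.1: with a zero total, give all relays equal bandwidths.
        return {node_id: relay_quota for node_id in unscaled_bws}
    # Step 2: vote quota = relay quota * number of measured relays.
    vote_quota = relay_quota * len(unscaled_bws)
    # Step 3: scaling factor = vote quota / total unscaled bandwidth.
    factor = vote_quota / total
    # Step 4: scale each bandwidth; round zero up to one (section B.1).
    return {
        node_id: max(round(bw * factor), 1)
        for node_id, bw in unscaled_bws.items()
    }


scaled = linear_scale({"a": 100, "b": 300}, relay_quota=7500)
```
+
+ With two relays and a relay quota of 7500, the vote quota is 15000 and
+ the scaling factor is 37.5, so the scaled bandwidths sum to the vote
+ quota.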
+
+B.3. Quota changes
+
+ If all generators are using scaling, the quota can be gradually
+ reduced or increased as needed. Smaller quotas decrease the size
+ of uncompressed consensuses, and may decrease the size of
+ consensus diffs and compressed consensuses. But if the relay
+ quota is too small, some relays may be over- or under-weighted.
+
+B.4. Torflow aggregation
+
+ Torflow implements two methods to compute the bandwidth values from the
+ (stream) bandwidth measurements: with and without PID control feedback.
+ The method described here is without PID control (see Torflow
+ specification, section 2.2).
+
+ In the following steps, the relays' measured bandwidths are the ones
+ that this bandwidth authority has measured, for the relays that would
+ be included in this bandwidth authority's upcoming vote.
+
+ 1. Calculate the filtered bandwidth for each relay:
+ - choose the relay's measurements (`bw_j`) that are greater than or
+ equal to the mean of the measurements for this relay
+ - calculate the mean of those measurements
+
+ In pseudocode:
+
+ bw_filt_i = mean(bw_j for each bw_j >= mean(bw_j))
+
+ 2. Calculate network averages:
+ - calculate the filtered average by dividing the sum of all the
+ relays' filtered bandwidth by the number of relays that have been
+ measured (`n`), i.e., calculate the mean of the relays'
+ filtered bandwidth.
+ - calculate the stream average by dividing the sum of all the
+ relays' measured bandwidth by the number of relays that have been
+ measured (`n`), i.e., calculate the mean of the relays'
+ measured bandwidth.
+
+ In pseudocode:
+
+ bw_avg_filt = sum(bw_filt_i) / n
+ bw_avg_strm = sum(bw_i) / n
+
+ 3. Calculate ratios for each relay:
+ - calculate the filtered ratio by dividing each relay's filtered
+ bandwidth by the filtered average
+ - calculate the stream ratio by dividing each relay's measured
+ bandwidth by the stream average
+
+ In pseudocode:
+
+ r_filt_i = bw_filt_i / bw_avg_filt
+ r_strm_i = bw_i / bw_avg_strm
+
+ 4. Calculate the final ratio for each relay:
+ The final ratio is the larger of the filtered ratio and the stream
+ ratio.
+
+ In pseudocode:
+
+ r_i = max(r_filt_i, r_strm_i)
+
+ 5. Calculate the scaled bandwidth for each relay:
+ The most recent descriptor observed bandwidth (`bw_obs_i`) is
+ multiplied by the final ratio.
+
+ In pseudocode:
+
+ bw_new_i = r_i * bw_obs_i
+
+ <<In this way, the resulting network status consensus bandwidth
+ values are effectively re-weighted proportional to how much faster
+ the node was as compared to the rest of the network.>>
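+
+ The five steps can be put together in a short sketch that follows the
+ prose of each step. This is not the Torflow or sbws implementation. The
+ function name torflow_scale and the input layout are hypothetical, and
+ the sketch assumes that a relay's measured bandwidth (`bw_i`) is the
+ mean of its measurements, which this section does not state explicitly.
+
```python
from statistics import mean


def torflow_scale(measurements, bw_obs):
    """Scale bandwidths without PID control.

    measurements: node_id -> non-empty list of measured bandwidths (bw_j)
    bw_obs:       node_id -> most recent descriptor observed bandwidth
    """
    # Assumption: bw_i is the mean of the relay's measurements.
    bw = {n: mean(m) for n, m in measurements.items()}
    # Step 1: mean of the measurements that are >= the relay's mean.
    bw_filt = {
        n: mean([b for b in m if b >= bw[n]])
        for n, m in measurements.items()
    }
    # Step 2: network averages over all measured relays.
    bw_avg_strm = mean(list(bw.values()))
    bw_avg_filt = mean(list(bw_filt.values()))
    scaled = {}
    for n in measurements:
        # Step 3: per-relay ratios.
        r_strm = bw[n] / bw_avg_strm
        r_filt = bw_filt[n] / bw_avg_filt
        # Steps 4 and 5: take the larger ratio, scale the observed bandwidth.
        scaled[n] = max(r_strm, r_filt) * bw_obs[n]
    return scaled


scaled = torflow_scale(
    {"a": [100, 300], "b": [200, 200]},
    {"a": 1000, "b": 1000},
)
```
+
+ In this example both relays have the same mean measurement, but relay
+ "a" has a higher filtered bandwidth, so its filtered ratio is above 1
+ and its scaled bandwidth ends up larger than its observed bandwidth.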