Diffstat (limited to 'spec/bandwidth-file-spec')
-rw-r--r-- | spec/bandwidth-file-spec/definitions.md | 55 | ||||
-rw-r--r-- | spec/bandwidth-file-spec/format-details.md | 11 | ||||
-rw-r--r-- | spec/bandwidth-file-spec/header-list-format.md | 422 | ||||
-rw-r--r-- | spec/bandwidth-file-spec/implementation-details.md | 398 | ||||
-rw-r--r-- | spec/bandwidth-file-spec/index.md | 18 | ||||
-rw-r--r-- | spec/bandwidth-file-spec/relay-line-format.md | 129 | ||||
-rw-r--r-- | spec/bandwidth-file-spec/sample-data.md | 139 | ||||
-rw-r--r-- | spec/bandwidth-file-spec/scaling-bandwidths.md | 132 | ||||
-rw-r--r-- | spec/bandwidth-file-spec/scope-preliminaries.md | 85 |
9 files changed, 1389 insertions, 0 deletions
diff --git a/spec/bandwidth-file-spec/definitions.md b/spec/bandwidth-file-spec/definitions.md new file mode 100644 index 0000000..c01b407 --- /dev/null +++ b/spec/bandwidth-file-spec/definitions.md @@ -0,0 +1,55 @@ +<a id="bandwidth-file-spec.txt-2.1"></a> + +# Definitions + +The following nonterminals are defined in the Tor directory protocol +sections 1.2., 2.1.1., 2.1.3.: + +```text + bool + Int + SP (space) + NL (newline) + KeywordChar + ArgumentChar + nickname + hexdigest (a '$', followed by 40 hexadecimal characters + ([A-Fa-f0-9])) + + Nonterminal defined in section 2 of version-spec.txt [4]: + + version_number + + We define the following nonterminals: + + Line ::= ArgumentChar* NL + RelayLine ::= KeyValue (SP KeyValue)* NL + HeaderLine ::= KeyValue NL + KeyValue ::= Key "=" Value + Key ::= (KeywordChar | "_")+ + Value ::= ArgumentCharValue+ + ArgumentCharValue ::= any printing ASCII character except NL and SP. + Terminator ::= "=====" or "====" + Generators SHOULD use a 5-character terminator. + Timestamp ::= Int + Bandwidth ::= Int + MasterKey ::= a base64-encoded Ed25519 public key, with + padding characters omitted. + DateTime ::= "YYYY-MM-DDTHH:MM:SS", as in ISO 8601 + CountryCode ::= Two capital ASCII letters ([A-Z]{2}), as defined in + ISO 3166-1 alpha-2 plus "ZZ" to denote unknown country + (e.g. the destination is in a Content Delivery Network). + CountryCodeList ::= One or more CountryCode(s) separated by a comma + ([A-Z]{2}(,[A-Z]{2})*). +``` + +Note that key_value and value are defined in the Tor directory protocol +with formats different from KeyValue and Value here. + +Tor versions earlier than 0.3.5.1-alpha require all lines in the file +to be 510 characters or less. The previous limit was 254 characters in +Tor 0.2.6.2-alpha and earlier. Parsers MAY ignore longer Lines. + +Note that directory authorities are only supported on the two most +recent stable Tor versions, so we expect that line limits will be +removed after Tor 0.4.0 is released in 2019. 
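As a rough illustration of the grammar above, the following Python sketch splits a HeaderLine or RelayLine into KeyValue pairs and recognises a Terminator. It is not code from Tor, sbws or Torflow; the function names are hypothetical, and it assumes KeywordChar is `A-Z`, `a-z`, `0-9` or `-` as in the Tor directory protocol. Skipping individual non-conforming items (rather than whole Lines) is a simplification chosen for the sketch.

```python
import re
from typing import Dict, Optional, Tuple

# Key ::= (KeywordChar | "_")+ ; KeywordChar assumed to be A-Z, a-z, 0-9, "-".
KEY_RE = re.compile(r"[A-Za-z0-9_-]+")
# Value ::= ArgumentCharValue+ : printing ASCII except SP and NL (0x21-0x7E).
VALUE_RE = re.compile(r"[\x21-\x7e]+")
# Terminator ::= "=====" or "====" (generators SHOULD emit the 5-character form).
TERMINATORS = ("=====", "====")


def is_terminator(line: str) -> bool:
    """Return True if the line (without its NL) is a Terminator."""
    return line in TERMINATORS


def parse_key_value(item: str) -> Optional[Tuple[str, str]]:
    """Split one KeyValue on its first '='; return None if it does not conform."""
    key, sep, value = item.partition("=")
    if not sep or not KEY_RE.fullmatch(key) or not VALUE_RE.fullmatch(value):
        return None
    return key, value


def parse_line(line: str) -> Dict[str, str]:
    """Parse KeyValue (SP KeyValue)* NL into a dict (hypothetical helper).

    Non-conforming items are skipped. For duplicate keys the spec lets the
    parser choose an arbitrary Value; this sketch keeps the first one.
    """
    pairs: Dict[str, str] = {}
    for item in line.rstrip("\n").split(" "):
        kv = parse_key_value(item)
        if kv is not None:
            pairs.setdefault(kv[0], kv[1])
    return pairs
```

Because Value may itself contain "=" or "+" (for example in base64 keys), the sketch splits each item only on its first "=".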
diff --git a/spec/bandwidth-file-spec/format-details.md b/spec/bandwidth-file-spec/format-details.md new file mode 100644 index 0000000..c5de6f3 --- /dev/null +++ b/spec/bandwidth-file-spec/format-details.md @@ -0,0 +1,11 @@ +<a id="bandwidth-file-spec.txt-2"></a> + +# Format details + +```text + The Bandwidth File MUST contain the following sections: + - Header List (exactly once), which is a partially ordered list of + - Header Lines (one or more times), then + - Relay Lines (zero or more times), in an arbitrary order. + If it does not contain these sections, parsers SHOULD ignore the file. +``` diff --git a/spec/bandwidth-file-spec/header-list-format.md b/spec/bandwidth-file-spec/header-list-format.md new file mode 100644 index 0000000..206ff0c --- /dev/null +++ b/spec/bandwidth-file-spec/header-list-format.md @@ -0,0 +1,422 @@ +<a id="bandwidth-file-spec.txt-2.2"></a> + +# Header List format + +It consists of a Timestamp line and zero or more HeaderLines. + +All the header lines MUST conform to the HeaderLine format, except +the first Timestamp line. + +The Timestamp line is not a HeaderLine to keep compatibility with +the legacy Bandwidth File format. + +Some header Lines MUST appear in specific positions, as documented +below. All other Lines can appear in any order. + +If a parser does not recognize any extra material in a header Line, +the Line MUST be ignored. + +If a header Line does not conform to this format, the Line SHOULD be +ignored by parsers. + +It consists of: + +Timestamp NL + +\[At start, exactly once.\] + +The Unix Epoch time in seconds of the most recent generator bandwidth +result. + +If the generator implementation has multiple threads or +subprocesses which can fail independently, it SHOULD take the most +recent timestamp from each thread and use the oldest value. This +ensures all the threads continue running. + +If there are threads that do not run continuously, they SHOULD be +excluded from the timestamp calculation. 
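The Timestamp rule for multi-threaded generators (newest result per thread, then the oldest of those) can be sketched as below. This is a hypothetical Python helper, not the sbws implementation; the thread names and the `continuous` set are assumptions made for illustration. It returns None when there are no results, matching the rule that a generator MUST NOT write a new file in that case.

```python
from typing import Collection, Mapping, Optional, Sequence


def file_timestamp(results_by_thread: Mapping[str, Sequence[int]],
                   continuous: Collection[str]) -> Optional[int]:
    """Return the header Timestamp: the oldest of each thread's newest result.

    results_by_thread maps a (hypothetical) thread name to the Unix times of
    its bandwidth results. Threads not listed in `continuous` are excluded.
    Returns None when there are no results, so no file should be generated.
    """
    newest_per_thread = [
        max(times)
        for name, times in results_by_thread.items()
        if name in continuous and times
    ]
    if not newest_per_thread:
        return None
    # If one thread stops producing results, its newest time ages,
    # and so does the Timestamp computed here.
    return min(newest_per_thread)
```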
+ +If there are no recent results, the generator MUST NOT generate a new +file. + +It does not follow the KeyValue format for backwards compatibility +with version 1.0.0. + +"version" version_number NL + +\[In second position, zero or one time.\] + +The specification document format version. +It uses semantic versioning \[5\]. + +This Line was added in version 1.1.0 of this specification. + +Version 1.0.0 documents do not contain this Line, and the +version_number is considered to be "1.0.0". + +"software" Value NL + +\[Zero or one time.\] + +The name of the software that created the document. + +This Line was added in version 1.1.0 of this specification. + +Version 1.0.0 documents do not contain this Line, and the software +is considered to be "torflow". + +"software_version" Value NL + +\[Zero or one time.\] + +The version of the software that created the document. +The version may be a version_number, a git commit, or some other +version scheme. + +This Line was added in version 1.1.0 of this specification. + +"file_created" DateTime NL + +\[Zero or one time.\] + +The date and time timestamp in ISO 8601 format and UTC time zone +when the file was created. + +This Line was added in version 1.1.0 of this specification. + +"generator_started" DateTime NL + +\[Zero or one time.\] + +The date and time timestamp in ISO 8601 format and UTC time zone +when the generator started. + +This Line was added in version 1.1.0 of this specification. + +"earliest_bandwidth" DateTime NL + +\[Zero or one time.\] + +The date and time timestamp in ISO 8601 format and UTC time zone +when the first relay bandwidth was obtained. + +This Line was added in version 1.1.0 of this specification. + +"latest_bandwidth" DateTime NL + +\[Zero or one time.\] + +The date and time timestamp in ISO 8601 format and UTC time zone +of the most recent generator bandwidth result. + +This time MUST be identical to the initial Timestamp line. 
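Since latest_bandwidth repeats the Timestamp line as a DateTime, a consistency check only needs a UTC conversion. The sketch below uses the standard `datetime` module; the helper names are hypothetical. With the sample data in Appendix A, Timestamp `1523911758` corresponds to `latest_bandwidth=2018-04-16T20:49:18`.

```python
from datetime import datetime, timezone


def timestamp_to_datetime(timestamp: int) -> str:
    """Format a Unix Timestamp as the spec's DateTime (ISO 8601, UTC)."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).strftime(
        "%Y-%m-%dT%H:%M:%S")


def latest_bandwidth_matches(timestamp_line: str, latest_bandwidth: str) -> bool:
    """Check that latest_bandwidth is identical to the initial Timestamp line."""
    return timestamp_to_datetime(int(timestamp_line)) == latest_bandwidth
```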
+ +This duplicate value is included to make the format easier for people +to read. + +This Line was added in version 1.1.0 of this specification. + +"number_eligible_relays" Int NL + +\[Zero or one time.\] + +The number of relays that have enough measurements to be +included in the bandwidth file. + +This Line was added in version 1.2.0 of this specification. + +"minimum_percent_eligible_relays" Int NL + +\[Zero or one time.\] + +The percentage of relays in the consensus that SHOULD be +included in every generated bandwidth file. + +If this threshold is not reached, format versions 1.3.0 and earlier +SHOULD NOT contain any relays. (Bandwidth files always include a +header.) + +Format versions 1.4.0 and later SHOULD include all the relays for +diagnostic purposes, even if this threshold is not reached. But these +relays SHOULD be marked so that Tor does not vote on them. +See section 1.4 for details. + +The minimum percentage is 60% in Torflow, so sbws uses +60% as the default. + +This Line was added in version 1.2.0 of this specification. + +"number_consensus_relays" Int NL + +\[Zero or one time.\] + +The number of relays in the consensus. + +This Line was added in version 1.2.0 of this specification. + +"percent_eligible_relays" Int NL + +\[Zero or one time.\] + +The number of eligible relays, as a percentage of the number +of relays in the consensus. + +```text + This line SHOULD be equal to: + (number_eligible_relays * 100.0) / number_consensus_relays + that is, the percentage of consensus relays eligible to be included in this file. + + This Line was added in version 1.2.0 of this specification. + + "minimum_number_eligible_relays" Int NL + + [Zero or one time.] +``` + +The minimum number of relays that SHOULD be included in the bandwidth +file. See minimum_percent_eligible_relays for details. + +```text + This line SHOULD be equal to: + number_consensus_relays * (minimum_percent_eligible_relays / 100.0) + + This Line was added in version 1.2.0 of this specification.
+ + "scanner_country" CountryCode NL + + [Zero or one time.] + + The country, as in political geolocation, where the generator is run. + + This Line was added in version 1.2.0 of this specification. + + "destinations_countries" CountryCodeList NL + + [Zero or one time.] +``` + +The country, as in political geolocation, or countries where the +destination Web server(s) are located. +The destination Web servers serve the data that the generator retrieves +to measure the bandwidth. + +This Line was added in version 1.2.0 of this specification. + +"recent_consensus_count" Int NL + +\[Zero or one time.\] + +The number of different consensuses seen in the last data_period +days. (data_period is 5 by default.) + +```text + Assuming that Tor clients fetch a consensus every 1-2 hours, + and that the data_period is 5 days, the Value of this Key SHOULD be + between: + data_period * 24 / 2 = 60 + data_period * 24 = 120 + + This Line was added in version 1.4.0 of this specification. + + "recent_priority_list_count" Int NL + + [Zero or one time.] +``` + +The number of times that a list with a subset of relays prioritized +to be measured has been created in the last data_period days. +(data_period is 5 by default.) + +```text + In 2019, with 7000 relays in the network, the Value of this Key SHOULD be + approximately: + data_period * 24 / 1.5 = 80 + where 1.5 is the approximate number of hours it takes to measure a + priority list of 7000 * 0.05 (350) relays, when the fraction of relays + in a priority list is 5% (0.05). + + This Line was added in version 1.4.0 of this specification. + + "recent_priority_relay_count" Int NL + + [Zero or one time.] +``` + +The number of relays that have been in the list of relays prioritized +to be measured in the last data_period days. (data_period is 5 by +default.)
+ +```text + In 2019, with 7000 relays in the network, the Value of this Key SHOULD be + approximately: + 80 * (7000 * 0.05) = 28000 + where 0.05 (5%) is the fraction of relays in a priority list and 80 is + the approximate number of priority lists (see + "recent_priority_list_count"). + + This Line was added in version 1.4.0 of this specification. + + "recent_measurement_attempt_count" Int NL + + [Zero or one time.] +``` + +The number of times that any relay has been queued to be measured +in the last data_period days. (data_period is 5 by default.) + +In 2019, with 7000 relays in the network, the Value of this Key SHOULD be +approximately the same as "recent_priority_relay_count", +assuming that there is one attempt to measure a relay for each relay that +has been prioritized unless there are system, network or implementation +issues. + +This Line was added in version 1.4.0 of this specification and removed +in version 1.5.0. + +"recent_measurement_failure_count" Int NL + +\[Zero or one time.\] + +The number of times that the scanner attempted to measure a relay in +the last data_period days (5 by default), but the relay was not +measured because of system, network or implementation issues. + +This Line was added in version 1.4.0 of this specification. + +"recent_measurements_excluded_error_count" Int NL + +\[Zero or one time.\] + +The number of relays that have no successful measurements in the last +data_period days (5 by default). + +(See the note in section 1.4, version 1.4.0, about excluded relays.) + +This Line was added in version 1.4.0 of this specification. + +"recent_measurements_excluded_near_count" Int NL + +\[Zero or one time.\] + +The number of relays that have some successful measurements in the last +data_period days (5 by default), but all those measurements were +performed in a period of time that was too short (by default 1 day). + +(See the note in section 1.4, version 1.4.0, about excluded relays.)
+ +This Line was added in version 1.4.0 of this specification. + +"recent_measurements_excluded_old_count" Int NL + +\[Zero or one time.\] + +The number of relays that have some successful measurements, but all +those measurements are too old (more than 5 days, by default). + +Excludes relays that are already counted in +recent_measurements_excluded_near_count. + +(See the note in section 1.4, version 1.4.0, about excluded relays.) + +This Line was added in version 1.4.0 of this specification. + +"recent_measurements_excluded_few_count" Int NL + +\[Zero or one time.\] + +The number of relays that don't have enough recent successful +measurements. (Fewer than 2 measurements in the last 5 days, by +default.) + +Excludes relays that are already counted in +recent_measurements_excluded_near_count and +recent_measurements_excluded_old_count. + +(See the note in section 1.4, version 1.4.0, about excluded relays.) + +This Line was added in version 1.4.0 of this specification. + +"time_to_report_half_network" Int NL + +\[Zero or one time.\] + +The time in seconds that it would take to report measurements for +half of the network, given the number of eligible relays and the time +it took over the last few days (5 days, by default). + +(See the note in section 1.4, version 1.4.0, about excluded relays.) + +This Line was added in version 1.4.0 of this specification. + +"tor_version" version_number NL + +\[Zero or one time.\] + +The Tor version of the Tor process controlled by the generator. + +This Line was added in version 1.4.0 of this specification. + +"mu" Int NL + +\[Zero or one time.\] + +The network stream bandwidth average calculated as explained in B4.2. + +This Line was added in version 1.7.0 of this specification. + +"muf" Int NL + +\[Zero or one time.\] + +The filtered network stream bandwidth average, calculated as explained in +B4.2. + +This Line was added in version 1.7.0 of this specification.
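The header arithmetic given above for percent_eligible_relays, minimum_number_eligible_relays, recent_consensus_count and the 2019 priority-list estimates can be checked with a few lines of Python. These helpers are hypothetical. The spec gives floating-point formulas for Int fields without saying how to round; rounding to the nearest integer is an assumption here, chosen because it reproduces the sbws 1.0.3 sample values in Appendix A (93, 46 and 3862).

```python
from typing import Tuple


def percent_eligible_relays(number_eligible: int, number_consensus: int) -> int:
    # (number_eligible_relays * 100.0) / number_consensus_relays, rounded (assumption).
    return round(number_eligible * 100.0 / number_consensus)


def minimum_number_eligible_relays(number_consensus: int,
                                   minimum_percent: int = 60) -> int:
    # number_consensus_relays * (minimum_percent_eligible_relays / 100.0), rounded.
    return round(number_consensus * (minimum_percent / 100.0))


def recent_consensus_count_range(data_period: int = 5) -> Tuple[int, int]:
    # One consensus every 2 hours (lower bound) to one every hour (upper bound).
    return data_period * 24 // 2, data_period * 24


def expected_priority_counts(data_period: int = 5, relays: int = 7000,
                             fraction: float = 0.05,
                             hours_per_list: float = 1.5) -> Tuple[int, int]:
    # recent_priority_list_count: one priority list every ~1.5 hours.
    lists = round(data_period * 24 / hours_per_list)
    # Relays per priority list: 7000 * 0.05 = 350 in the 2019 example.
    relays_per_list = round(relays * fraction)
    # recent_priority_relay_count: lists * relays per list.
    return lists, lists * relays_per_list
```

For example, `percent_eligible_relays(2960, 6436)` gives 46, the value in the "not enough eligible measured relays" sample, which is below the default minimum of 60.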
+ +KeyValue NL + +\[Zero or more times.\] + +"dirauth_nickname" NL + +\[Zero or one time.\] + +The nickname of the directory authority that publishes this V3BandwidthsFile. + +This Line was added in version 1.8.0 of this specification. + +There MUST NOT be multiple KeyValue header Lines with the same key. +If there are, the parser SHOULD choose an arbitrary Line. + +If a parser does not recognize a Keyword in a KeyValue Line, the Line +MUST be ignored. + +Future format versions may include additional KeyValue header Lines. +Additional header Lines will be accompanied by a minor version +increment. + +Implementations MAY add additional header Lines as needed. This +specification SHOULD be updated to avoid conflicting meanings for +the same header keys. + +Parsers MUST NOT rely on the order of these additional Lines. + +Additional header Lines MUST NOT use any keywords specified in the +relay measurements format. +If there are, the parser MAY ignore conflicting keywords. + +Terminator NL + +\[Zero or one time.\] + +The Header List section ends with a Terminator. + +In version 1.0.0, the Header List ends when the first relay bandwidth +line conforming to the next section is found. + +Implementations of version 1.1.0 and later SHOULD use a 5-character +terminator. + +Tor 0.4.0.1-alpha and later look for a 5-character terminator, +or the first relay bandwidth line. sbws versions 0.1.0 to 1.0.2 +used a 4-character terminator; this bug was fixed in 1.0.3. diff --git a/spec/bandwidth-file-spec/implementation-details.md b/spec/bandwidth-file-spec/implementation-details.md new file mode 100644 index 0000000..dbc02cc --- /dev/null +++ b/spec/bandwidth-file-spec/implementation-details.md @@ -0,0 +1,398 @@ +<a id="bandwidth-file-spec.txt-2.4"></a> + +# Implementation details + +<a id="bandwidth-file-spec.txt-2.4.1"></a> + +## Writing bandwidth files atomically { #write-atomically } + +To avoid inconsistent reads, implementations SHOULD write bandwidth files +atomically.
If the file is transferred from another host, it SHOULD be +written to a temporary path, then renamed to the V3BandwidthsFile path. + +sbws versions 0.7.0 and later write the bandwidth file to an archival +location, create a temporary symlink to that location, then atomically rename +the symlink +to the configured V3BandwidthsFile path. + +Torflow does not write bandwidth files atomically. + +<a id="bandwidth-file-spec.txt-2.4.2"></a> + +## Additional KeyValue pair definitions { #key-value-pairs } + +KeyValue pairs in RelayLines that current implementations generate. + +<a id="bandwidth-file-spec.txt-2.4.2.1"></a> + +### Simple Bandwidth Scanner { #sbws } + +sbws RelayLines contain these keys: + +"node_id" hexdigest + +As above. + +"bw" Bandwidth + +As above. + +"nick" nickname + +\[Exactly once.\] + +The relay nickname. + +Torflow also has a "nick" KeyValue. + +"rtt" Int + +\[Zero or one time.\] + +The Round Trip Time in milliseconds to obtain 1 byte of data. + +This KeyValue was added in version 1.1.0 of this specification. +It became optional in version 1.3.0 or 1.4.0 of this specification. + +"time" DateTime + +\[Exactly once.\] + +The date and time timestamp in ISO 8601 format and UTC time zone +when the last bandwidth was obtained. + +This KeyValue was added in version 1.1.0 of this specification. +The Torflow equivalent is "measured_at". + +"success" Int + +\[Zero or one time.\] + +The number of times that the bandwidth measurements for this relay were +successful. + +This KeyValue was added in version 1.1.0 of this specification. + +"error_circ" Int + +\[Zero or one time.\] + +The number of times that the bandwidth measurements for this relay +failed because of circuit failures. + +This KeyValue was added in version 1.1.0 of this specification. +The Torflow equivalent is "circ_fail". + +"error_stream" Int + +\[Zero or one time.\] + +The number of times that the bandwidth measurements for this relay +failed because of stream failures. 
+ +This KeyValue was added in version 1.1.0 of this specification. + +"error_destination" Int + +\[Zero or one time.\] + +The number of times that the bandwidth measurements for this relay +failed because the destination Web server was not available. + +This KeyValue was added in version 1.4.0 of this specification. + +"error_second_relay" Int + +\[Zero or one time.\] + +The number of times that the bandwidth measurements for this relay +failed because sbws could not find a second relay for the test circuit. + +This KeyValue was added in version 1.4.0 of this specification. + +"error_misc" Int + +\[Zero or one time.\] + +The number of times that the bandwidth measurements for this relay +failed because of other reasons. + +This KeyValue was added in version 1.1.0 of this specification. + +"bw_mean" Int + +\[Zero or one time.\] + +The measured bandwidth mean for this relay in bytes per second. + +This KeyValue was added in version 1.2.0 of this specification. + +"bw_median" Int + +\[Zero or one time.\] + +The measured bandwidth median for this relay in bytes per second. + +This KeyValue was added in version 1.2.0 of this specification. + +"desc_bw_avg" Int + +\[Zero or one time.\] + +The descriptor average bandwidth for this relay in bytes per second. + +This KeyValue was added in version 1.2.0 of this specification. + +"desc_bw_obs_last" Int + +\[Zero or one time.\] + +The last descriptor observed bandwidth for this relay in bytes per +second. + +This KeyValue was added in version 1.2.0 of this specification. + +"desc_bw_obs_mean" Int + +\[Zero or one time.\] + +The descriptor observed bandwidth mean for this relay in bytes per +second. + +This KeyValue was added in version 1.2.0 of this specification. + +"desc_bw_bur" Int + +\[Zero or one time.\] + +The descriptor burst bandwidth for this relay in bytes per +second. + +This KeyValue was added in version 1.2.0 of this specification. 
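The RelayLine bw is in kilobytes per second, while bw_mean, bw_median and desc_bw_avg above are in bytes per second. The hypothetical Python sketch below shows how a generator might derive an unscaled bw from a measured mean while applying two rules from the Relay Line format: cap the measurement at the descriptor's bandwidth-avg (which reflects MaxAdvertisedBandwidth), and never emit zero. It assumes 1 kilobyte = 1000 bytes and performs no Appendix B scaling, so it does not reproduce the bw values in the sample data.

```python
def unscaled_bw_kb(bw_mean: int, desc_bw_avg: int) -> int:
    """Hypothetical: derive an unscaled RelayLine bw (KB/s) from bytes/s values."""
    # Generators MUST limit the measured bandwidth to the descriptor's
    # bandwidth-avg, which already includes MaxAdvertisedBandwidth.
    capped_bytes = min(bw_mean, desc_bw_avg)
    # Assumption: 1 kilobyte = 1000 bytes, rounded to the nearest integer.
    kb = round(capped_bytes / 1000)
    # No zero bandwidths: use 1 as the minimum.
    return max(kb, 1)
```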
+ +"consensus_bandwidth" Int + +\[Zero or one time.\] + +The consensus bandwidth for this relay in bytes per second. + +This KeyValue was added in version 1.2.0 of this specification. + +"consensus_bandwidth_is_unmeasured" bool + +\[Zero or one time.\] + +If the consensus bandwidth for this relay was not obtained from +three or more bandwidth authorities, this KeyValue is True; +otherwise it is False. + +This KeyValue was added in version 1.2.0 of this specification. + +"relay_in_recent_consensus_count" Int + +\[Zero or one time.\] + +The number of times this relay was found in a consensus in the +last data_period days. (Unless otherwise stated, data_period is +5 by default.) + +This KeyValue was added in version 1.4.0 of this specification. + +"relay_recent_priority_list_count" Int + +\[Zero or one time.\] + +The number of times this relay has been prioritized to be measured +in the last data_period days. + +This KeyValue was added in version 1.4.0 of this specification. + +"relay_recent_measurement_attempt_count" Int + +\[Zero or one time.\] + +The number of times the generator tried to measure this relay in the +last data_period days. + +This KeyValue was added in version 1.4.0 of this specification. + +"relay_recent_measurement_failure_count" Int + +\[Zero or one time.\] + +The number of times the generator tried to measure this relay in the +last data_period days, but it was not possible to obtain a +measurement. + +This KeyValue was added in version 1.4.0 of this specification. + +"relay_recent_measurements_excluded_error_count" Int + +\[Zero or one time.\] + +The number of recent relay measurement attempts that failed. +Measurements are recent if they are in the last data_period days +(5 by default). + +(See the note in section 1.4, version 1.4.0, about excluded relays.) + +This KeyValue was added in version 1.4.0 of this specification.
+ +"relay_recent_measurements_excluded_near_count" Int + +\[Zero or one time.\] + +When all of a relay's recent successful measurements were performed in +a period of time that was too short (by default 1 day), the relay is +excluded. This KeyValue contains the number of recent successful +measurements for the relay that were ignored for this reason. + +(See the note in section 1.4, version 1.4.0, about excluded relays.) + +This KeyValue was added in version 1.4.0 of this specification. + +"relay_recent_measurements_excluded_old_count" Int + +\[Zero or one time.\] + +The number of successful measurements for this relay that are too old +(more than data_period days, 5 by default). + +Excludes measurements that are already counted in +relay_recent_measurements_excluded_near_count. + +(See the note in section 1.4, version 1.4.0, about excluded relays.) + +This KeyValue was added in version 1.4.0 of this specification. + +"relay_recent_measurements_excluded_few_count" Int + +\[Zero or one time.\] + +The number of successful measurements for this relay that were ignored +because the relay did not have enough successful measurements (fewer +than 2, by default). + +Excludes measurements that are already counted in +relay_recent_measurements_excluded_near_count or +relay_recent_measurements_excluded_old_count. + +(See the note in section 1.4, version 1.4.0, about excluded relays.) + +This KeyValue was added in version 1.4.0 of this specification. + +"under_min_report" bool + +\[Zero or one time.\] + +If the value is 1, there are not enough eligible relays in the +bandwidth file, and Tor bandwidth authorities MAY NOT vote on this +relay. (Current Tor versions do not change their behaviour based on +the "under_min_report" key.) + +If the value is 0 or the KeyValue is not present, there are enough +relays in the bandwidth file. + +Because Tor versions released before April 2019 (see section 1.4. 
for +the full list of versions) ignore "vote=0", generator implementations +MUST NOT change the bandwidths for under_min_report relays. Using the +same bw value makes authorities that do not understand "vote=0" +or "under_min_report=1" produce votes that don't change relay weights +too much. It also avoids flapping when the reporting threshold is +reached. + +This KeyValue was added in version 1.4.0 of this specification. + +"unmeasured" bool + +\[Zero or one time.\] + +If the value is 1, this relay was not successfully measured and +Tor bandwidth authorities MAY NOT vote on this relay. +(Current Tor versions do not change their behaviour based on +the "unmeasured" key.) + +If the value is 0 or the KeyValue is not present, this relay +was successfully measured. + +Because Tor versions released before April 2019 (see section 1.4. for +the full list of versions) ignore "vote=0", generator implementations +MUST set "bw=1" for unmeasured relays. Using the minimum bw value +makes authorities that do not understand "vote=0" or "unmeasured=1" +produce votes that don't change relay weights too much. + +This KeyValue was added in version 1.4.0 of this specification. + +"vote" bool + +\[Zero or one time.\] + +If the value is 0, Tor directory authorities SHOULD ignore the relay's +entry in the bandwidth file. They SHOULD vote for the relay the same +way they would vote for a relay that is not present in the file. + +This MAY be the case when this relay was not successfully measured but +it is included in the Bandwidth File, to help diagnose why it was not +measured. + +If the value is 1 or the KeyValue is not present, Tor directory +authorities MUST use the relay's bw value in any votes for that relay. + +Implementations MUST also set "bw=1" for unmeasured relays. +But they MUST NOT change the bw for under_min_report relays. +(See the explanations under "unmeasured" and "under_min_report" +for more details.) + +This KeyValue was added in version 1.4.0 of this specification.
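The interplay of "bw", "unmeasured", "under_min_report" and "vote" described above can be summarised in a small Python sketch. The helper names are hypothetical, not sbws functions. It emits only the non-default values, since an absent "unmeasured" or "under_min_report" means 0 and an absent "vote" means 1.

```python
from typing import Dict


def relay_status_keys(measured_bw: int, measured: bool,
                      under_min_report: bool) -> Dict[str, int]:
    """Hypothetical: status KeyValues a 1.4.0+ generator could emit for a relay."""
    # Unmeasured relays MUST get bw=1; under_min_report MUST NOT change bw.
    keys = {"bw": measured_bw if measured else 1}
    if not measured:
        keys["unmeasured"] = 1
    if under_min_report:
        keys["under_min_report"] = 1
    if not measured or under_min_report:
        # Authorities that understand vote=0 SHOULD ignore this entry.
        keys["vote"] = 0
    return keys


def authority_uses_entry(keys: Dict[str, int]) -> bool:
    """vote=0 means ignore the entry; a missing vote KeyValue means vote=1."""
    return keys.get("vote", 1) != 0
```

Keeping the measured bw for under_min_report relays, while forcing bw=1 only for unmeasured ones, mirrors the spec's reasoning about older authorities that ignore "vote=0".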
+ +"xoff_recv" Int + +\[Zero or one time.\] + +The number of times this relay received `XOFF_RECV` stream events while +being measured in the last data_period days. + +This KeyValue was added in version 1.6.0 of this specification. + +"xoff_sent" Int + +\[Zero or one time.\] + +The number of times this relay received `XOFF_SENT` stream events while +being measured in the last data_period days. + +This KeyValue was added in version 1.6.0 of this specification. + +"r_strm" Float + +\[Zero or one time.\] + +The stream ratio of this relay calculated as explained in B4.3. + +This KeyValue was added in version 1.7.0 of this specification. + +"r_strm_filt" Float + +\[Zero or one time.\] + +The filtered stream ratio of this relay calculated as explained in B4.3. + +This KeyValue was added in version 1.7.0 of this specification. + +<a id="bandwidth-file-spec.txt-2.4.2.2"></a> + +### Torflow + +Torflow RelayLines include node_id and bw, and other KeyValue pairs \[2\]. + +References: + + +1. <https://gitlab.torproject.org/tpo/network-health/torflow> +2. <https://gitlab.torproject.org/tpo/network-health/torflow/-/blob/main/NetworkScanners/BwAuthority/README.spec.txt?ref_type=heads#L332> + The Torflow specification is outdated and does not match the current + implementation. See section A.1. for the format produced by Torflow. +3. [The Tor Directory Protocol](../dir-spec) +4. [How Tor Version Numbers Work In Tor](../version-spec.md) +5. <https://semver.org/> diff --git a/spec/bandwidth-file-spec/index.md b/spec/bandwidth-file-spec/index.md new file mode 100644 index 0000000..86563fa --- /dev/null +++ b/spec/bandwidth-file-spec/index.md @@ -0,0 +1,18 @@ +# Tor Bandwidth File Format + +```text + juga + teor +``` + +This document describes the format of Tor's Bandwidth File, version +1.0.0 and later. + +It is a new specification for the existing bandwidth file format, +which we call version 1.0.0.
It also specifies new format versions +1.1.0 and later, which are backwards compatible with 1.0.0 parsers. + +Since Tor version 0.2.4.12-alpha, the directory authorities use +the Bandwidth File called "V3BandwidthsFile" generated by +Torflow \[1\]. The details of this format are described in Torflow's +README.spec.txt. We also summarise the format in this specification. diff --git a/spec/bandwidth-file-spec/relay-line-format.md b/spec/bandwidth-file-spec/relay-line-format.md new file mode 100644 index 0000000..31e1d9b --- /dev/null +++ b/spec/bandwidth-file-spec/relay-line-format.md @@ -0,0 +1,129 @@ +<a id="bandwidth-file-spec.txt-2.3"></a> + +# Relay Line format { #relay-line } + +It consists of zero or more RelayLines containing relay ids and +bandwidths. The relays and their KeyValues are in arbitrary order. + +There MUST NOT be multiple KeyValue pairs with the same key in the same +RelayLine. If there are, the parser SHOULD choose an arbitrary Value. + +There MUST NOT be multiple RelayLines per relay identity (node_id or +master_key_ed25519). If there are, parsers SHOULD issue a warning. +Parsers MAY reject the file, choose an arbitrary RelayLine, or ignore +both RelayLines. + +If a parser does not recognize any extra material in a RelayLine, +the extra material MUST be ignored. + +Each RelayLine includes the following KeyValue pairs: + +"node_id" hexdigest + +\[Exactly once.\] + +The fingerprint for the relay's RSA identity key. + +```text + Note: In bandwidth files read by Tor versions earlier than + 0.3.4.1-alpha, node_id MUST NOT be at the end of the Line. + These authority versions are no longer supported. +``` + +Current Tor versions ignore master_key_ed25519, so node_id MUST be +present in each relay Line. + +Implementations of version 1.1.0 and later SHOULD include both node_id +and master_key_ed25519. Parsers SHOULD accept Lines that contain at +least one of them.
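The duplicate-identity and "at least one identity" rules above can be sketched as a small RelayLine parser. This is hypothetical Python, not Tor's parser. Of the options the spec allows for duplicate RelayLines, it chooses to warn and keep the first line; it also strips the optional leading "$" from node_id (see the version 1.9.0 note) and compares fingerprints case-insensitively, both of which are choices made for the sketch.

```python
import logging
from typing import Dict, Iterable, List, Set, Tuple

log = logging.getLogger("bwfile")  # hypothetical logger name


def parse_relay_lines(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse RelayLines, keeping the first line seen for each relay identity."""
    relays: List[Dict[str, str]] = []
    seen: Set[Tuple[str, str]] = set()
    for line in lines:
        kvs: Dict[str, str] = {}
        for item in line.split():
            key, sep, value = item.partition("=")
            if sep and key:
                kvs.setdefault(key, value)  # duplicate keys: keep the first
        if "node_id" in kvs:
            # Normalise: optional leading '$', case-insensitive hex digits.
            kvs["node_id"] = kvs["node_id"].lstrip("$").upper()
        ids = [(kind, kvs[key])
               for kind, key in (("rsa", "node_id"), ("ed25519", "master_key_ed25519"))
               if kvs.get(key)]
        if not ids:
            continue  # needs at least one of node_id / master_key_ed25519
        if any(i in seen for i in ids):
            log.warning("duplicate RelayLine for %s; keeping the first", ids[0][1])
            continue
        seen.update(ids)
        relays.append(kvs)
    return relays
```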
+ +From version 1.9.0 of this specification, "node_id" no longer needs to +start with the dollar sign before the hexadecimal digits. + +"master_key_ed25519" MasterKey + +\[Zero or one time.\] + +The relay's master Ed25519 key, base64 encoded, +without trailing "="s, to avoid ambiguity with KeyValue "=" +character. + +This KeyValue pair SHOULD be present; see the note under node_id. + +This KeyValue was added in version 1.1.0 of this specification. + +"bw" Bandwidth + +\[Exactly once.\] + +The bandwidth of this relay in kilobytes per second. + +No Zero Bandwidths: +Tor accepts zero bandwidths, but they trigger bugs in older Tor +implementations. Therefore, implementations SHOULD NOT produce zero +bandwidths. Instead, they SHOULD use one as their minimum bandwidth. +If there are zero bandwidths, the parser MAY ignore them. + +Bandwidth Aggregation: +Multiple measurements can be aggregated using an averaging scheme, +such as a mean, median, or decaying average. + +Bandwidth Scaling: +Torflow scales bandwidths to kilobytes per second. Other +implementations SHOULD use kilobytes per second for their initial +bandwidth scaling. + +If different implementations or configurations are used in votes for +the same network, their measurements MAY need further scaling. See +Appendix B for information about scaling, and one possible scaling +method. + +MaxAdvertisedBandwidth: +Bandwidth generators MUST limit the relays' measured bandwidth based +on the MaxAdvertisedBandwidth. +A relay's MaxAdvertisedBandwidth limits the bandwidth-avg in its +descriptor. bandwidth-avg is the minimum of MaxAdvertisedBandwidth, +BandwidthRate, RelayBandwidthRate, BandwidthBurst, and +RelayBandwidthBurst. +Therefore, generators MUST limit a relay's measured bandwidth to its +descriptor's bandwidth-avg. This limit needs to be implemented in the +generator, because generators may scale consensus weights before +sending them to Tor.
+Generators SHOULD NOT limit measured bandwidths based on descriptors' +bandwidth-observed, because that penalises new relays. + +sbws limits the relay's measured bandwidth to the bandwidth-avg +advertised. + +Torflow partitions relays based on their bandwidth. For unmeasured +relays, Torflow uses the minimum of all descriptor bandwidths, +including bandwidth-avg (MaxAdvertisedBandwidth) and +bandwidth-observed. Then Torflow measures the relays in each partition +against each other, which implicitly limits a relay's measured +bandwidth to the bandwidths of similar relays. + +Torflow also generates consensus weights based on the ratio between the +measured bandwidth and the minimum of all descriptor bandwidths (at the +time of the measurement). So when an operator reduces the +MaxAdvertisedBandwidth for a relay, Torflow reduces that relay's +measured bandwidth. + +KeyValue + +\[Zero or more times.\] + +Future format versions may include additional KeyValue pairs on a +RelayLine. +Additional KeyValue pairs will be accompanied by a minor version +increment. + +Implementations MAY add additional relay KeyValue pairs as needed. +This specification SHOULD be updated to avoid conflicting meanings +for the same Keywords. + +Parsers MUST NOT rely on the order of these additional KeyValue +pairs. + +Additional KeyValue pairs MUST NOT use any keywords specified in the +header format. +If there are, the parser MAY ignore conflicting keywords. diff --git a/spec/bandwidth-file-spec/sample-data.md b/spec/bandwidth-file-spec/sample-data.md new file mode 100644 index 0000000..1a689fd --- /dev/null +++ b/spec/bandwidth-file-spec/sample-data.md @@ -0,0 +1,139 @@ +<a id="bandwidth-file-spec.txt-A"></a> + +# Sample data + +The following has not been obtained from any real measurement. 
+ +<a id="bandwidth-file-spec.txt-A.1"></a> + +## Generated by Torflow { #torflow } + +This is an example version 1.0.0 document: + +```text +1523911758 +node_id=$68A483E05A2ABDCA6DA5A3EF8DB5177638A27F80 bw=760 nick=Test measured_at=1523911725 updated_at=1523911725 pid_error=4.11374090719 pid_error_sum=4.11374090719 pid_bw=57136645 pid_delta=2.12168374577 circ_fail=0.2 scanner=/filepath +node_id=$96C15995F30895689291F455587BD94CA427B6FC bw=189 nick=Test2 measured_at=1523911623 updated_at=1523911623 pid_error=3.96703337994 pid_error_sum=3.96703337994 pid_bw=47422125 pid_delta=2.65469736988 circ_fail=0.0 scanner=/filepath +``` + +<a id="bandwidth-file-spec.txt-A.2"></a> + +## Generated by sbws version 0.1.0 { #sbws-010 } + +```text +1523911758 +version=1.1.0 +software=sbws +software_version=0.1.0 +latest_bandwidth=2018-04-16T20:49:18 +file_created=2018-04-16T21:49:18 +generator_started=2018-04-16T15:13:25 +earliest_bandwidth=2018-04-16T15:13:26 +==== + +bw=380 error_circ=0 error_misc=0 error_stream=1 master_key_ed25519=YaqV4vbvPYKucElk297eVdNArDz9HtIwUoIeo0+cVIpQ nick=Test node_id=$68A483E05A2ABDCA6DA5A3EF8DB5177638A27F80 rtt=380 success=1 time=2018-05-08T16:13:26 +bw=189 error_circ=0 error_misc=0 error_stream=0 master_key_ed25519=a6a+dZadrQBtfSbmQkP7j2ardCmLnm5NJ4ZzkvDxbo0I nick=Test2 node_id=$96C15995F30895689291F455587BD94CA427B6FC rtt=378 success=1 time=2018-05-08T16:13:36 +``` + +<a id="bandwidth-file-spec.txt-A.3"></a> + +## Generated by sbws version 1.0.3 { #sbws-103 } + +````text +1523911758 +version=1.2.0 +latest_bandwidth=2018-04-16T20:49:18 +file_created=2018-04-16T21:49:18 +generator_started=2018-04-16T15:13:25 +earliest_bandwidth=2018-04-16T15:13:26 +minimum_number_eligible_relays=3862 +minimum_percent_eligible_relays=60 +number_consensus_relays=6436 +number_eligible_relays=6000 +percent_eligible_relays=93 +software=sbws +software_version=1.0.3 +===== + +bw=38000 bw_mean=1127824 bw_median=1180062 desc_bw_avg=1073741824 desc_bw_obs_last=17230879
 desc_bw_obs_mean=14732306 error_circ=0 error_misc=0 error_stream=1 master_key_ed25519=YaqV4vbvPYKucElk297eVdNArDz9HtIwUoIeo0+cVIpQ nick=Test node_id=$68A483E05A2ABDCA6DA5A3EF8DB5177638A27F80 rtt=380 success=1 time=2018-05-08T16:13:26 +bw=1 bw_mean=199162 bw_median=185675 desc_bw_avg=409600 desc_bw_obs_last=836165 desc_bw_obs_mean=858030 error_circ=0 error_misc=0 error_stream=0 master_key_ed25519=a6a+dZadrQBtfSbmQkP7j2ardCmLnm5NJ4ZzkvDxbo0I nick=Test2 node_id=$96C15995F30895689291F455587BD94CA427B6FC rtt=378 success=1 time=2018-05-08T16:13:36 +```` + +<a id="bandwidth-file-spec.txt-A.3.1"></a> + +### When there are not enough eligible measured relays { #sbws-103-not-enough-measured } + +```text +1540496079 +version=1.2.0 +earliest_bandwidth=2018-10-20T19:35:52 +file_created=2018-10-25T19:35:03 +generator_started=2018-10-25T11:42:56 +latest_bandwidth=2018-10-25T19:34:39 +minimum_number_eligible_relays=3862 +minimum_percent_eligible_relays=60 +number_consensus_relays=6436 +number_eligible_relays=2960 +percent_eligible_relays=46 +software=sbws +software_version=1.0.3 +===== +``` + +<a id="bandwidth-file-spec.txt-A.4"></a> + +## Headers generated by sbws version 1.0.4 { #sbws-104 } + +```text +1523911758 +version=1.2.0 +latest_bandwidth=2018-04-16T20:49:18 +destinations_countries=TH,ZZ +file_created=2018-04-16T21:49:18 +generator_started=2018-04-16T15:13:25 +earliest_bandwidth=2018-04-16T15:13:26 +minimum_number_eligible_relays=3862 +minimum_percent_eligible_relays=60 +number_consensus_relays=6436 +number_eligible_relays=6000 +percent_eligible_relays=93 +scanner_country=SN +software=sbws +software_version=1.0.4 +===== +``` + +<a id="bandwidth-file-spec.txt-A.5"></a> + +## Generated by sbws version 1.1.0 { #sbws-110 } + +```text +1523911758 +version=1.4.0 +latest_bandwidth=2018-04-16T20:49:18 +destinations_countries=TH,ZZ +file_created=2018-04-16T21:49:18 +generator_started=2018-04-16T15:13:25 +earliest_bandwidth=2018-04-16T15:13:26 +minimum_number_eligible_relays=3862
+minimum_percent_eligible_relays=60 +number_consensus_relays=6436 +number_eligible_relays=6000 +percent_eligible_relays=93 +recent_measurement_attempt_count=6243 +recent_measurement_failure_count=732 +recent_measurements_excluded_error_count=969 +recent_measurements_excluded_few_count=3946 +recent_measurements_excluded_near_count=90 +recent_measurements_excluded_old_count=0 +recent_priority_list_count=20 +recent_priority_relay_count=6243 +scanner_country=SN +software=sbws +software_version=1.1.0 +time_to_report_half_network=57273 +===== + +bw=1 error_circ=1 error_destination=0 error_misc=0 error_second_relay=0 error_stream=0 master_key_ed25519=J3HQ24kOQWac3L1xlFLp7gY91qkb5NuKxjj1BhDi+m8 nick=snap269 node_id=$DC4D609F95A52614D1E69C752168AF1FCAE0B05F relay_recent_measurement_attempt_count=3 relay_recent_measurements_excluded_error_count=1 relay_recent_measurements_excluded_near_count=3 relay_recent_consensus_count=3 relay_recent_priority_list_count=3 success=3 time=2019-03-16T18:20:57 unmeasured=1 vote=0 +bw=1 error_circ=0 error_destination=0 error_misc=0 error_second_relay=0 error_stream=2 master_key_ed25519=h6ZB1E1yBFWIMloUm9IWwjgaPXEpL5cUbuoQDgdSDKg nick=relay node_id=$C4544F9E209A9A9B99591D548B3E2822236C0503 relay_recent_measurement_attempt_count=3 relay_recent_measurements_excluded_error_count=2 relay_recent_measurements_excluded_few_count=1 relay_recent_consensus_count=3 relay_recent_priority_list_count=3 success=1 time=2019-03-17T06:50:58 unmeasured=1 vote=0 +``` diff --git a/spec/bandwidth-file-spec/scaling-bandwidths.md b/spec/bandwidth-file-spec/scaling-bandwidths.md new file mode 100644 index 0000000..65bb317 --- /dev/null +++ b/spec/bandwidth-file-spec/scaling-bandwidths.md @@ -0,0 +1,132 @@ +<a id="bandwidth-file-spec.txt-B"></a> + +# Scaling bandwidths + +<a id="bandwidth-file-spec.txt-B.1"></a> + +## Scaling requirements + +```text + Tor accepts zero bandwidths, but they trigger bugs in older Tor + implementations. 
 Therefore, scaling methods SHOULD perform the + following checks: + * If the total bandwidth is zero, all relays should be given equal + bandwidths. + * If the scaled bandwidth is zero, it should be rounded up to one. +``` + +Initial experiments indicate that scaling may not be needed for +Torflow and sbws, because their measured bandwidths are similar +enough already. + +<a id="bandwidth-file-spec.txt-B.2"></a> + +## A linear scaling method { #linear-method } + +If scaling is required, here is a simple linear bandwidth scaling +method, which ensures that all bandwidth votes contain approximately +the same total bandwidth: + +```text + 1. Calculate the relay quota by dividing the total measured bandwidth + in all votes by the number of relays with measured bandwidth + votes. In the public tor network, this is approximately 7500 as of + April 2018. The quota should be a consensus parameter, so it can be + adjusted for all generators on the network. + + 2. Calculate a vote quota by multiplying the relay quota by the number + of relays this bandwidth authority has measured + bandwidths for. + + 3. Calculate a scaling factor by dividing the vote quota by the + total unscaled measured bandwidth in this bandwidth + authority's upcoming vote. + + 4. Multiply each unscaled measured bandwidth by the scaling + factor. +``` + +Now, the total scaled bandwidth in the upcoming vote is +approximately equal to the vote quota. + +<a id="bandwidth-file-spec.txt-B.3"></a> + +## Quota changes + +If all generators are using scaling, the quota can be gradually +reduced or increased as needed. Smaller quotas decrease the size +of uncompressed consensuses, and may decrease the size of +consensus diffs and compressed consensuses. But if the relay +quota is too small, some relays may be over- or under-weighted.
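Steps 2 to 4 of the linear scaling method, together with the two checks from the scaling requirements, can be sketched in Python as below. This is an illustration under stated assumptions, not a reference implementation: `linear_scale` is a hypothetical name, the relay quota from step 1 is passed in (the text suggests it should be a consensus parameter), results are rounded to integers, and when the total is zero every relay is given the relay quota as its equal bandwidth.

```python
def linear_scale(unscaled: dict[str, float], relay_quota: float) -> dict[str, int]:
    """Scale one generator's measured bandwidths with the linear method.

    `unscaled` maps a relay id to its unscaled measured bandwidth (KB/s).
    """
    n = len(unscaled)
    if n == 0:
        return {}
    # Step 2: vote quota = relay quota * number of relays measured.
    vote_quota = relay_quota * n
    total = sum(unscaled.values())
    if total == 0:
        # Requirement: a zero total gives all relays equal bandwidths.
        # Assumption: the relay quota is used as that equal value.
        equal = max(1, round(relay_quota))
        return {relay: equal for relay in unscaled}
    # Step 3: scaling factor = vote quota / total unscaled bandwidth.
    factor = vote_quota / total
    # Step 4, plus the requirement that zero scaled bandwidths become one.
    return {relay: max(1, round(bw * factor)) for relay, bw in unscaled.items()}
```

With two relays measured at 100 and 300 and a relay quota of 7500, the vote quota is 15000, the factor is 37.5, and the scaled values are 3750 and 11250.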
 + +<a id="bandwidth-file-spec.txt-B.4"></a> + +## Torflow aggregation + +Torflow implements two methods to compute the bandwidth values from the +(stream) bandwidth measurements: with and without PID control feedback. +The method described here is without PID control (see Torflow +specification, section 2.2). + +In the following steps, the relays' measured bandwidth refers to the +bandwidth that this bandwidth authority has measured for the relays that +would be included in its upcoming vote. + +```text + 1. Calculate the filtered bandwidth for each relay: + - choose the relay's measurements (`bw_j`) that are equal to or greater + than the mean of the measurements for this relay + - calculate the mean of those measurements + + In pseudocode: + + bw_filt_i = mean(bw_j where bw_j >= mean(bw_j)) + + 2. Calculate network averages: + - calculate the filtered average by dividing the sum of all the + relays' filtered bandwidth by the number of relays that have been + measured (`n`), ie, calculate the mean average of the relays' + filtered bandwidth. + - calculate the stream average by dividing the sum of all the + relays' measured bandwidth by the number of relays that have been + measured (`n`), ie, calculate the mean average of the relays' + measured bandwidth. + + In pseudocode: + + bw_avg_filt = sum(bw_filt_i) / n + bw_avg_strm = sum(bw_i) / n + + 3. Calculate ratios for each relay: + - calculate the filtered ratio by dividing each relay's filtered + bandwidth by the filtered average + - calculate the stream ratio by dividing each relay's measured + bandwidth by the stream average + + In pseudocode: + + r_filt_i = bw_filt_i / bw_avg_filt + r_strm_i = bw_i / bw_avg_strm + + 4. Calculate the final ratio for each relay: + The final ratio is the larger of the filtered ratio and the + stream ratio. + + In pseudocode: + + r_i = max(r_filt_i, r_strm_i) + + 5.
 Calculate the scaled bandwidth for each relay: + The most recent descriptor observed bandwidth (`bw_obs_i`) is + multiplied by the ratio. + + In pseudocode: + + bw_new_i = r_i * bw_obs_i +``` + +\<\<In this way, the resulting network status consensus bandwidth +values are effectively re-weighted proportional to how much faster +the node was as compared to the rest of the network.>> diff --git a/spec/bandwidth-file-spec/scope-preliminaries.md b/spec/bandwidth-file-spec/scope-preliminaries.md new file mode 100644 index 0000000..7eb1419 --- /dev/null +++ b/spec/bandwidth-file-spec/scope-preliminaries.md @@ -0,0 +1,85 @@ +<a id="bandwidth-file-spec.txt-1"></a> + +# Scope and preliminaries + +The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL +NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and +"OPTIONAL" in this document are to be interpreted as described in +RFC 2119. + +<a id="bandwidth-file-spec.txt-1.2"></a> + +## Acknowledgements + +The original bandwidth generator (Torflow) and format were +created by mike. Teor suggested writing this specification while +contributing to pastly's new bandwidth generator implementation. + +This specification was revised after feedback from: + +Nick Mathewson (nickm) +Iain Learmonth (irl) + +<a id="bandwidth-file-spec.txt-1.3"></a> + +## Outline + +Sections 3.4.1 and 3.4.2 of the Tor directory protocol (dir-spec.txt \[3\]) +use the term "bandwidth measurements" to refer to what this +document calls a Bandwidth File. + +A Bandwidth File contains information on relays' bandwidth +capacities and is produced by bandwidth generators, previously known +as bandwidth scanners. + +<a id="bandwidth-file-spec.txt-1.4"></a> + +## Format Versions + +```text + 1.0.0 - The legacy Bandwidth File format + + 1.1.0 - Adds a header containing information about the bandwidth + file. Documents the sbws and Torflow relay line keys. + + 1.2.0 - If there are not enough eligible relays, the bandwidth file + SHOULD contain a header, but no relays.
 (To match Torflow's + existing behaviour.) + + Adds scanner and destination countries to the header. + Adds new KeyValue Lines to the Header List section with + statistics about the number of relays included in the file. + Adds new KeyValues to Relay Bandwidth Lines, with different + bandwidth values (averages and descriptor bandwidths). + + 1.4.0 - Adds monitoring KeyValues to the header and relay lines. + + RelayLines for excluded relays MAY be present in the bandwidth + file for diagnostic reasons. Similarly, if there are not enough + eligible relays, the bandwidth file MAY contain all known relays. + + Diagnostic relay lines SHOULD be marked with vote=0, and + Tor SHOULD NOT use their bandwidths in its votes. + + Also adds Tor version. + 1.5.0 - Removes "recent_measurement_attempt_count" KeyValue. + 1.6.0 - Adds congestion control stream events KeyValues. + 1.7.0 - Adds ratio KeyValues to the relay lines and network averages + KeyValues to the header. + 1.8.0 - Adds "dirauth_nickname" KeyValue to the header. + 1.9.0 - Allows "node_id" KeyValue without the dollar sign at the start of the + hexdigest. + All Tor versions can consume format version 1.0.0. +``` + +All Tor versions can consume format version 1.1.0 and later, +but Tor versions earlier than 0.3.5.1-alpha warn if the header +contains any KeyValue lines after the Timestamp. + +```text + Tor versions 0.4.0.3-alpha, 0.3.5.8, 0.3.4.11, and earlier do not + understand "vote=0". Instead, they will vote for the actual bandwidths + that sbws puts in diagnostic relay lines: + * 1 for relays with "unmeasured=1", and + * the relay's measured and scaled bandwidth when "under_min_report=1". +```
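The effect of vote=0 on directory authorities running different Tor versions can be summarised with a small Python sketch. It models only the rules stated in this section, not Tor's C implementation: `bandwidths_for_vote` is a hypothetical name, its input is RelayLines already parsed into Key/Value dicts, and the `understands_vote0` flag stands in for the Tor version check.

```python
def bandwidths_for_vote(relay_lines: list[dict[str, str]],
                        understands_vote0: bool = True) -> dict[str, int]:
    """Pick which RelayLine bandwidths an authority would vote on.

    `understands_vote0` is False for the older Tor versions listed above.
    """
    votes: dict[str, int] = {}
    for fields in relay_lines:
        if understands_vote0 and fields.get("vote") == "0":
            # Diagnostic line: Tor SHOULD NOT use its bandwidth in a vote.
            continue
        # Older Tor versions use "bw" even on diagnostic lines
        # (1 for unmeasured=1, the scaled bandwidth for under_min_report=1).
        votes[fields["node_id"].removeprefix("$")] = int(fields["bw"])
    return votes
```

This is why sbws writes bw=1 on "unmeasured=1" diagnostic lines: an authority that ignores vote=0 then votes the minimum non-zero bandwidth instead of a misleading measurement.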