aboutsummaryrefslogtreecommitdiff
path: root/spec/bandwidth-file-spec/header-list-format.md
diff options
context:
space:
mode:
Diffstat (limited to 'spec/bandwidth-file-spec/header-list-format.md')
-rw-r--r--spec/bandwidth-file-spec/header-list-format.md422
1 files changed, 422 insertions, 0 deletions
diff --git a/spec/bandwidth-file-spec/header-list-format.md b/spec/bandwidth-file-spec/header-list-format.md
new file mode 100644
index 0000000..206ff0c
--- /dev/null
+++ b/spec/bandwidth-file-spec/header-list-format.md
@@ -0,0 +1,422 @@
+<a id="bandwidth-file-spec.txt-2.2"></a>
+
+# Header List format
+
+It consists of a Timestamp line and zero or more HeaderLines.
+
+All the header lines MUST conform to the HeaderLine format, except
+the first Timestamp line.
+
+The Timestamp line is not a HeaderLine to keep compatibility with
+the legacy Bandwidth File format.
+
+Some header Lines MUST appear in specific positions, as documented
+below. All other Lines can appear in any order.
+
+If a parser does not recognize any extra material in a header Line,
+the Line MUST be ignored.
+
+If a header Line does not conform to this format, the Line SHOULD be
+ignored by parsers.
+
+It consists of:
+
+Timestamp NL
+
+\[At start, exactly once.\]
+
+The Unix Epoch time in seconds of the most recent generator bandwidth
+result.
+
+If the generator implementation has multiple threads or
+subprocesses which can fail independently, it SHOULD take the most
+recent timestamp from each thread and use the oldest value. This
+ensures all the threads continue running.
+
+If there are threads that do not run continuously, they SHOULD be
+excluded from the timestamp calculation.
+
+If there are no recent results, the generator MUST NOT generate a new
+file.
+
+It does not follow the KeyValue format for backwards compatibility
+with version 1.0.0.
+
+"version" version_number NL
+
+\[In second position, zero or one time.\]
+
+The specification document format version.
+It uses semantic versioning \[5\].
+
+This Line was added in version 1.1.0 of this specification.
+
+Version 1.0.0 documents do not contain this Line, and the
+version_number is considered to be "1.0.0".
+
+"software" Value NL
+
+\[Zero or one time.\]
+
+The name of the software that created the document.
+
+This Line was added in version 1.1.0 of this specification.
+
+Version 1.0.0 documents do not contain this Line, and the software
+is considered to be "torflow".
+
+"software_version" Value NL
+
+\[Zero or one time.\]
+
+The version of the software that created the document.
+The version may be a version_number, a git commit, or some other
+version scheme.
+
+This Line was added in version 1.1.0 of this specification.
+
+"file_created" DateTime NL
+
+\[Zero or one time.\]
+
+The date and time timestamp in ISO 8601 format and UTC time zone
+when the file was created.
+
+This Line was added in version 1.1.0 of this specification.
+
+"generator_started" DateTime NL
+
+\[Zero or one time.\]
+
+The date and time timestamp in ISO 8601 format and UTC time zone
+when the generator started.
+
+This Line was added in version 1.1.0 of this specification.
+
+"earliest_bandwidth" DateTime NL
+
+\[Zero or one time.\]
+
+The date and time timestamp in ISO 8601 format and UTC time zone
+when the first relay bandwidth was obtained.
+
+This Line was added in version 1.1.0 of this specification.
+
+"latest_bandwidth" DateTime NL
+
+\[Zero or one time.\]
+
+The date and time timestamp in ISO 8601 format and UTC time zone
+of the most recent generator bandwidth result.
+
+This time MUST be identical to the initial Timestamp line.
+
+This duplicate value is included to make the format easier for people
+to read.
+
+This Line was added in version 1.1.0 of this specification.
+
+"number_eligible_relays" Int NL
+
+\[Zero or one time.\]
+
+The number of relays that have enough measurements to be
+included in the bandwidth file.
+
+This Line was added in version 1.2.0 of this specification.
+
+"minimum_percent_eligible_relays" Int NL
+
+\[Zero or one time.\]
+
+The percentage of relays in the consensus that SHOULD be
+included in every generated bandwidth file.
+
+If this threshold is not reached, format versions 1.3.0 and earlier
+SHOULD NOT contain any relays. (Bandwidth files always include a
+header.)
+
+Format versions 1.4.0 and later SHOULD include all the relays for
+diagnostic purposes, even if this threshold is not reached. But these
+relays SHOULD be marked so that Tor does not vote on them.
+See section 1.4 for details.
+
+The minimum percentage is 60% in Torflow, so sbws uses
+60% as the default.
+
+This Line was added in version 1.2.0 of this specification.
+
+"number_consensus_relays" Int NL
+
+\[Zero or one time.\]
+
+The number of relays in the consensus.
+
+This Line was added in version 1.2.0 of this specification.
+
+"percent_eligible_relays" Int NL
+
+\[Zero or one time.\]
+
+The number of eligible relays, as a percentage of the number
+of relays in the consensus.
+
+```text
+ This line SHOULD be equal to:
+ (number_eligible_relays * 100.0) / number_consensus_relays
+ to the number of relays in the consensus to include in this file.
+
+ This Line was added in version 1.2.0 of this specification.
+
+ "minimum_number_eligible_relays" Int NL
+
+ [Zero or one time.]
+```
+
+The minimum number of relays that SHOULD be included in the bandwidth
+file. See minimum_percent_eligible_relays for details.
+
+```text
+ This line SHOULD be equal to:
+ number_consensus_relays * (minimum_percent_eligible_relays / 100.0)
+
+ This Line was added in version 1.2.0 of this specification.
+
+ "scanner_country" CountryCode NL
+
+ [Zero or one time.]
+
+ The country, as in political geolocation, where the generator is run.
+
+ This Line was added in version 1.2.0 of this specification.
+
+ "destinations_countries" CountryCodeList NL
+
+ [Zero or one time.]
+```
+
+The country, as in political geolocation, or countries where the
+destination Web server(s) are located.
+The destination Web Servers serve the data that the generator retrieves
+to measure the bandwidth.
+
+This Line was added in version 1.2.0 of this specification.
+
+"recent_consensus_count" Int NL
+
+\[Zero or one time.\].
+
+The number of the different consensuses seen in the last data_period
+days. (data_period is 5 by default.)
+
+```text
+ Assuming that Tor clients fetch a consensus every 1-2 hours,
+ and that the data_period is 5 days, the Value of this Key SHOULD be
+ between:
+ data_period * 24 / 2 = 60
+ data_period * 24 = 120
+
+ This Line was added in version 1.4.0 of this specification.
+
+ "recent_priority_list_count" Int NL
+
+ [Zero or one time.]
+```
+
+The number of times that a list with a subset of relays prioritized
+to be measured has been created in the last data_period days.
+(data_period is 5 by default.)
+
+```text
+ In 2019, with 7000 relays in the network, the Value of this Key SHOULD be
+ approximately:
+ data_period * 24 / 1.5 = 80
+ Being 1.5 the approximate number of hours it takes to measure a
+ priority list of 7000 * 0.05 (350) relays, when the fraction of relays
+ in a priority list is the 5% (0.05).
+
+ This Line was added in version 1.4.0 of this specification.
+
+ "recent_priority_relay_count" Int NL
+
+ [Zero or one time.]
+```
+
+The number of relays that has been in in the list of relays prioritized
+to be measured in the last data_period days. (data_period is 5 by
+default.)
+
+```text
+ In 2019, with 7000 relays in the network, the Value of this Key SHOULD be
+ approximately:
+ 80 * (7000 * 0.05) = 28000
+ Being 0.05 (5%) the fraction of relays in a priority list and 80
+ the approximate number of priority lists (see
+ "recent_priority_list_count").
+
+ This Line was added in version 1.4.0 of this specification.
+
+ "recent_measurement_attempt_count" Int NL
+
+ [Zero or one time.]
+```
+
+The number of times that any relay has been queued to be measured
+in the last data_period days. (data_period is 5 by default.)
+
+In 2019, with 7000 relays in the network, the Value of this Key SHOULD be
+approximately the same as "recent_priority_relay_count",
+assuming that there is one attempt to measure a relay for each relay that
+has been prioritized unless there are system, network or implementation
+issues.
+
+This Line was added in version 1.4.0 of this specification and removed
+in version 1.5.0.
+
+"recent_measurement_failure_count" Int NL
+
+\[Zero or one time.\]
+
+The number of times that the scanner attempted to measure a relay in
+the last data_period days (5 by default), but the relay has not been
+measured because of system, network or implementation issues.
+
+This Line was added in version 1.4.0 of this specification.
+
+"recent_measurements_excluded_error_count" Int NL
+
+\[Zero or one time.\]
+
+The number of relays that have no successful measurements in the last
+data_period days (5 by default).
+
+(See the note in section 1.4, version 1.4.0, about excluded relays.)
+
+This Line was added in version 1.4.0 of this specification.
+
+"recent_measurements_excluded_near_count" Int NL
+
+\[Zero or one time.\]
+
+The number of relays that have some successful measurements in the last
+data_period days (5 by default), but all those measurements were
+performed in a period of time that was too short (by default 1 day).
+
+(See the note in section 1.4, version 1.4.0, about excluded relays.)
+
+This Line was added in version 1.4.0 of this specification.
+
+"recent_measurements_excluded_old_count" Int NL
+
+\[Zero or one time.\]
+
+The number of relays that have some successful measurements, but all
+those measurements are too old (more than 5 days, by default).
+
+Excludes relays that are already counted in
+recent_measurements_excluded_near_count.
+
+(See the note in section 1.4, version 1.4.0, about excluded relays.)
+
+This Line was added in version 1.4.0 of this specification.
+
+"recent_measurements_excluded_few_count" Int NL
+
+\[Zero or one time.\]
+
+The number of relays that don't have enough recent successful
+measurements. (Fewer than 2 measurements in the last 5 days, by
+default).
+
+Excludes relays that are already counted in
+recent_measurements_excluded_near_count and
+recent_measurements_excluded_old_count.
+
+(See the note in section 1.4, version 1.4.0, about excluded relays.)
+
+This Line was added in version 1.4.0 of this specification.
+
+"time_to_report_half_network" Int NL
+
+\[Zero or one time.\]
+
+The time in seconds that it would take to report measurements about the
+half of the network, given the number of eligible relays and the time
+it took in the last days (5 days, by default).
+
+(See the note in section 1.4, version 1.4.0, about excluded relays.)
+
+This Line was added in version 1.4.0 of this specification.
+
+"tor_version" version_number NL
+
+\[Zero or one time.\]
+
+The Tor version of the Tor process controlled by the generator.
+
+This Line was added in version 1.4.0 of this specification.
+
+"mu" Int NL
+
+\[Zero or one time.\]
+
+The network stream bandwidth average calculated as explained in B4.2.
+
+This Line was added in version 1.7.0 of this specification.
+
+"muf" Int NL
+
+\[Zero or one time.\]
+
+The network stream bandwidth average filtered calculated as explained in
+B4.2.
+
+This Line was added in version 1.7.0 of this specification.
+
+KeyValue NL
+
+\[Zero or more times.\]
+
+"dirauth_nickname" NL
+
+\[Zero or one time.\]
+
+The dirauth's nickname which publishes this V3BandwidthsFile.
+
+This Line was added in version 1.8.0 of this specification.
+
+There MUST NOT be multiple KeyValue header Lines with the same key.
+If there are, the parser SHOULD choose an arbitrary Line.
+
+If a parser does not recognize a Keyword in a KeyValue Line, it
+MUST be ignored.
+
+Future format versions may include additional KeyValue header Lines.
+Additional header Lines will be accompanied by a minor version
+increment.
+
+Implementations MAY add additional header Lines as needed. This
+specification SHOULD be updated to avoid conflicting meanings for
+the same header keys.
+
+Parsers MUST NOT rely on the order of these additional Lines.
+
+Additional header Lines MUST NOT use any keywords specified in the
+relay measurements format.
+If there are, the parser MAY ignore conflicting keywords.
+
+Terminator NL
+
+\[Zero or one time.\]
+
+The Header List section ends with a Terminator.
+
+In version 1.0.0, Header List ends when the first relay bandwidth
+is found conforming to the next section.
+
+Implementations of version 1.1.0 and later SHOULD use a 5-character
+terminator.
+
+Tor 0.4.0.1-alpha and later look for a 5-character terminator,
+or the first relay bandwidth line. sbws versions 0.1.0 to 1.0.2
+used a 4-character terminator, this bug was fixed in 1.0.3.