Diffstat (limited to 'spec/bandwidth-file-spec/header-list-format.md')
-rw-r--r-- | spec/bandwidth-file-spec/header-list-format.md | 422 |
1 files changed, 422 insertions, 0 deletions
diff --git a/spec/bandwidth-file-spec/header-list-format.md b/spec/bandwidth-file-spec/header-list-format.md
new file mode 100644
index 0000000..206ff0c
--- /dev/null
+++ b/spec/bandwidth-file-spec/header-list-format.md
@@ -0,0 +1,422 @@
+<a id="bandwidth-file-spec.txt-2.2"></a>
+
+# Header List format
+
+It consists of a Timestamp line and zero or more HeaderLines.
+
+All the header lines MUST conform to the HeaderLine format, except
+the first Timestamp line.
+
+The Timestamp line is not a HeaderLine to keep compatibility with
+the legacy Bandwidth File format.
+
+Some header Lines MUST appear in specific positions, as documented
+below. All other Lines can appear in any order.
+
+If a parser does not recognize any extra material in a header Line,
+the Line MUST be ignored.
+
+If a header Line does not conform to this format, the Line SHOULD be
+ignored by parsers.
+
+It consists of:
+
+Timestamp NL
+
+\[At start, exactly once.\]
+
+The Unix Epoch time in seconds of the most recent generator bandwidth
+result.
+
+If the generator implementation has multiple threads or
+subprocesses which can fail independently, it SHOULD take the most
+recent timestamp from each thread and use the oldest value. This
+ensures all the threads continue running.
+
+If there are threads that do not run continuously, they SHOULD be
+excluded from the timestamp calculation.
+
+If there are no recent results, the generator MUST NOT generate a new
+file.
+
+It does not follow the KeyValue format for backwards compatibility
+with version 1.0.0.
+
+"version" version_number NL
+
+\[In second position, zero or one time.\]
+
+The specification document format version.
+It uses semantic versioning \[5\].
+
+This Line was added in version 1.1.0 of this specification.
+
+Version 1.0.0 documents do not contain this Line, and the
+version_number is considered to be "1.0.0".
+
+"software" Value NL
+
+\[Zero or one time.\]
+
+The name of the software that created the document.
+
+This Line was added in version 1.1.0 of this specification.
+
+Version 1.0.0 documents do not contain this Line, and the software
+is considered to be "torflow".
+
+"software_version" Value NL
+
+\[Zero or one time.\]
+
+The version of the software that created the document.
+The version may be a version_number, a git commit, or some other
+version scheme.
+
+This Line was added in version 1.1.0 of this specification.
+
+"file_created" DateTime NL
+
+\[Zero or one time.\]
+
+The date and time timestamp in ISO 8601 format and UTC time zone
+when the file was created.
+
+This Line was added in version 1.1.0 of this specification.
+
+"generator_started" DateTime NL
+
+\[Zero or one time.\]
+
+The date and time timestamp in ISO 8601 format and UTC time zone
+when the generator started.
+
+This Line was added in version 1.1.0 of this specification.
+
+"earliest_bandwidth" DateTime NL
+
+\[Zero or one time.\]
+
+The date and time timestamp in ISO 8601 format and UTC time zone
+when the first relay bandwidth was obtained.
+
+This Line was added in version 1.1.0 of this specification.
+
+"latest_bandwidth" DateTime NL
+
+\[Zero or one time.\]
+
+The date and time timestamp in ISO 8601 format and UTC time zone
+of the most recent generator bandwidth result.
+
+This time MUST be identical to the initial Timestamp line.
+
+This duplicate value is included to make the format easier for people
+to read.
+
+This Line was added in version 1.1.0 of this specification.
+
+"number_eligible_relays" Int NL
+
+\[Zero or one time.\]
+
+The number of relays that have enough measurements to be
+included in the bandwidth file.
+
+This Line was added in version 1.2.0 of this specification.
+
+"minimum_percent_eligible_relays" Int NL
+
+\[Zero or one time.\]
+
+The percentage of relays in the consensus that SHOULD be
+included in every generated bandwidth file.
+
+If this threshold is not reached, format versions 1.3.0 and earlier
+SHOULD NOT contain any relays.
+(Bandwidth files always include a
+header.)
+
+Format versions 1.4.0 and later SHOULD include all the relays for
+diagnostic purposes, even if this threshold is not reached. But these
+relays SHOULD be marked so that Tor does not vote on them.
+See section 1.4 for details.
+
+The minimum percentage is 60% in Torflow, so sbws uses
+60% as the default.
+
+This Line was added in version 1.2.0 of this specification.
+
+"number_consensus_relays" Int NL
+
+\[Zero or one time.\]
+
+The number of relays in the consensus.
+
+This Line was added in version 1.2.0 of this specification.
+
+"percent_eligible_relays" Int NL
+
+\[Zero or one time.\]
+
+The number of eligible relays, as a percentage of the number
+of relays in the consensus.
+
+```text
+ This line SHOULD be equal to:
+ (number_eligible_relays * 100.0) / number_consensus_relays
+ to the number of relays in the consensus to include in this file.
+
+ This Line was added in version 1.2.0 of this specification.
+
+ "minimum_number_eligible_relays" Int NL
+
+ [Zero or one time.]
+```
+
+The minimum number of relays that SHOULD be included in the bandwidth
+file. See minimum_percent_eligible_relays for details.
+
+```text
+ This line SHOULD be equal to:
+ number_consensus_relays * (minimum_percent_eligible_relays / 100.0)
+
+ This Line was added in version 1.2.0 of this specification.
+
+ "scanner_country" CountryCode NL
+
+ [Zero or one time.]
+
+ The country, as in political geolocation, where the generator is run.
+
+ This Line was added in version 1.2.0 of this specification.
+
+ "destinations_countries" CountryCodeList NL
+
+ [Zero or one time.]
+```
+
+The country, as in political geolocation, or countries where the
+destination Web server(s) are located.
+The destination Web Servers serve the data that the generator retrieves
+to measure the bandwidth.
+
+This Line was added in version 1.2.0 of this specification.
+
+"recent_consensus_count" Int NL
+
+\[Zero or one time.\]
+
+The number of different consensuses seen in the last data_period
+days. (data_period is 5 by default.)
+
+```text
+ Assuming that Tor clients fetch a consensus every 1-2 hours,
+ and that the data_period is 5 days, the Value of this Key SHOULD be
+ between:
+ data_period * 24 / 2 = 60
+ data_period * 24 = 120
+
+ This Line was added in version 1.4.0 of this specification.
+
+ "recent_priority_list_count" Int NL
+
+ [Zero or one time.]
+```
+
+The number of times that a list with a subset of relays prioritized
+to be measured has been created in the last data_period days.
+(data_period is 5 by default.)
+
+```text
+ In 2019, with 7000 relays in the network, the Value of this Key SHOULD be
+ approximately:
+ data_period * 24 / 1.5 = 80
+ where 1.5 is the approximate number of hours it takes to measure a
+ priority list of 7000 * 0.05 (350) relays, when the fraction of relays
+ in a priority list is 5% (0.05).
+
+ This Line was added in version 1.4.0 of this specification.
+
+ "recent_priority_relay_count" Int NL
+
+ [Zero or one time.]
+```
+
+The number of relays that have been in the list of relays prioritized
+to be measured in the last data_period days. (data_period is 5 by
+default.)
+
+```text
+ In 2019, with 7000 relays in the network, the Value of this Key SHOULD be
+ approximately:
+ 80 * (7000 * 0.05) = 28000
+ where 0.05 (5%) is the fraction of relays in a priority list and 80 is
+ the approximate number of priority lists (see
+ "recent_priority_list_count").
+
+ This Line was added in version 1.4.0 of this specification.
+
+ "recent_measurement_attempt_count" Int NL
+
+ [Zero or one time.]
+```
+
+The number of times that any relay has been queued to be measured
+in the last data_period days. (data_period is 5 by default.)
+
+In 2019, with 7000 relays in the network, the Value of this Key SHOULD be
+approximately the same as "recent_priority_relay_count",
+assuming that there is one attempt to measure a relay for each relay that
+has been prioritized unless there are system, network or implementation
+issues.
+
+This Line was added in version 1.4.0 of this specification and removed
+in version 1.5.0.
+
+"recent_measurement_failure_count" Int NL
+
+\[Zero or one time.\]
+
+The number of times that the scanner attempted to measure a relay in
+the last data_period days (5 by default), but the relay has not been
+measured because of system, network or implementation issues.
+
+This Line was added in version 1.4.0 of this specification.
+
+"recent_measurements_excluded_error_count" Int NL
+
+\[Zero or one time.\]
+
+The number of relays that have no successful measurements in the last
+data_period days (5 by default).
+
+(See the note in section 1.4, version 1.4.0, about excluded relays.)
+
+This Line was added in version 1.4.0 of this specification.
+
+"recent_measurements_excluded_near_count" Int NL
+
+\[Zero or one time.\]
+
+The number of relays that have some successful measurements in the last
+data_period days (5 by default), but all those measurements were
+performed in a period of time that was too short (by default 1 day).
+
+(See the note in section 1.4, version 1.4.0, about excluded relays.)
+
+This Line was added in version 1.4.0 of this specification.
+
+"recent_measurements_excluded_old_count" Int NL
+
+\[Zero or one time.\]
+
+The number of relays that have some successful measurements, but all
+those measurements are too old (more than 5 days, by default).
+
+Excludes relays that are already counted in
+recent_measurements_excluded_near_count.
+
+(See the note in section 1.4, version 1.4.0, about excluded relays.)
+
+This Line was added in version 1.4.0 of this specification.
+
+"recent_measurements_excluded_few_count" Int NL
+
+\[Zero or one time.\]
+
+The number of relays that don't have enough recent successful
+measurements. (Fewer than 2 measurements in the last 5 days, by
+default).
+
+Excludes relays that are already counted in
+recent_measurements_excluded_near_count and
+recent_measurements_excluded_old_count.
+
+(See the note in section 1.4, version 1.4.0, about excluded relays.)
+
+This Line was added in version 1.4.0 of this specification.
+
+"time_to_report_half_network" Int NL
+
+\[Zero or one time.\]
+
+The time in seconds that it would take to report measurements about
+half of the network, given the number of eligible relays and the time
+it took in the last days (5 days, by default).
+
+(See the note in section 1.4, version 1.4.0, about excluded relays.)
+
+This Line was added in version 1.4.0 of this specification.
+
+"tor_version" version_number NL
+
+\[Zero or one time.\]
+
+The Tor version of the Tor process controlled by the generator.
+
+This Line was added in version 1.4.0 of this specification.
+
+"mu" Int NL
+
+\[Zero or one time.\]
+
+The network stream bandwidth average calculated as explained in B4.2.
+
+This Line was added in version 1.7.0 of this specification.
+
+"muf" Int NL
+
+\[Zero or one time.\]
+
+The network stream bandwidth average filtered calculated as explained in
+B4.2.
+
+This Line was added in version 1.7.0 of this specification.
+
+KeyValue NL
+
+\[Zero or more times.\]
+
+"dirauth_nickname" NL
+
+\[Zero or one time.\]
+
+The dirauth's nickname which publishes this V3BandwidthsFile.
+
+This Line was added in version 1.8.0 of this specification.
+
+There MUST NOT be multiple KeyValue header Lines with the same key.
+If there are, the parser SHOULD choose an arbitrary Line.
+
+If a parser does not recognize a Keyword in a KeyValue Line, it
+MUST be ignored.
+
+Future format versions may include additional KeyValue header Lines.
+Additional header Lines will be accompanied by a minor version
+increment.
+
+Implementations MAY add additional header Lines as needed. This
+specification SHOULD be updated to avoid conflicting meanings for
+the same header keys.
+
+Parsers MUST NOT rely on the order of these additional Lines.
+
+Additional header Lines MUST NOT use any keywords specified in the
+relay measurements format.
+If they do, the parser MAY ignore conflicting keywords.
+
+Terminator NL
+
+\[Zero or one time.\]
+
+The Header List section ends with a Terminator.
+
+In version 1.0.0, Header List ends when the first relay bandwidth
+is found conforming to the next section.
+
+Implementations of version 1.1.0 and later SHOULD use a 5-character
+terminator.
+
+Tor 0.4.0.1-alpha and later look for a 5-character terminator,
+or the first relay bandwidth line. sbws versions 0.1.0 to 1.0.2
+used a 4-character terminator; this bug was fixed in 1.0.3.
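The parsing rules in the added file can be illustrated with a minimal Python sketch. It is not taken from Tor or sbws. It assumes that KeyValue lines have the form `key=value` and that the 5-character terminator is `=====`, both of which come from the wider bandwidth file specification rather than from this section. The names `parse_header_list`, `latest_bandwidth_matches`, `TERMINATOR` and `RELAY_KEYS` are hypothetical, and treating a line whose first key is `bw` or `node_id` as the first relay bandwidth line is a simplifying heuristic.

```python
from datetime import datetime, timezone
from typing import Dict, List, Tuple

TERMINATOR = "====="            # assumption: 5-character terminator from the wider spec
RELAY_KEYS = ("bw", "node_id")  # assumption: heuristic for the first relay bandwidth line


def parse_header_list(lines: List[str]) -> Tuple[int, Dict[str, str], int]:
    """Return (timestamp, headers, number of lines consumed by the Header List)."""
    if not lines:
        raise ValueError("a bandwidth file must start with a Timestamp line")
    timestamp = int(lines[0])  # the first line is a bare Unix timestamp, not a KeyValue
    headers: Dict[str, str] = {}
    consumed = 1
    for line in lines[1:]:
        if line == TERMINATOR:
            consumed += 1      # the terminator belongs to the Header List
            break
        key, sep, value = line.partition("=")
        if sep and key in RELAY_KEYS:
            break              # legacy files: headers end at the first relay line
        consumed += 1
        if not sep or not key or not value or " " in key:
            continue           # non-conforming header Lines are ignored
        headers.setdefault(key, value)  # duplicate keys: keep the first (an arbitrary pick)
    if "version" not in headers:
        # Version 1.0.0 documents carry neither a version nor a software Line.
        headers["version"] = "1.0.0"
        headers.setdefault("software", "torflow")
    return timestamp, headers, consumed


def latest_bandwidth_matches(timestamp: int, headers: Dict[str, str]) -> bool:
    """Check that latest_bandwidth, when present, equals the Timestamp line."""
    value = headers.get("latest_bandwidth")
    if value is None:
        return True
    parsed = datetime.fromisoformat(value).replace(tzinfo=timezone.utc)
    return int(parsed.timestamp()) == timestamp


# Illustrative input only; the relay line is shortened and not a real measurement.
example = [
    "1523911758",
    "version=1.4.0",
    "software=sbws",
    "latest_bandwidth=2018-04-16T20:49:18",
    "garbage line",
    "version=9.9.9",
    "=====",
    "bw=380 node_id=$AAAA",
]
```

Calling `parse_header_list(example)` returns the timestamp, the header dictionary, and the index at which relay lines start. Unknown keys are kept in the dictionary so that a caller can ignore them, which matches the rule that unrecognized Keywords must not cause a failure.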