<a id="bandwidth-file-spec.txt-2.2"></a>

# Header List format

The Header List consists of a Timestamp line and zero or more HeaderLines.

All the header lines MUST conform to the HeaderLine format, except
the first Timestamp line.

The Timestamp line is not a HeaderLine to keep compatibility with
the legacy Bandwidth File format.

Some header Lines MUST appear in specific positions, as documented
below. All other Lines can appear in any order.

If a parser does not recognize any extra material in a header Line,
the Line MUST be ignored.

If a header Line does not conform to this format, the Line SHOULD be
ignored by parsers.

It consists of:

Timestamp NL

\[At start, exactly once.\]

The Unix Epoch time in seconds of the most recent generator bandwidth
result.

If the generator implementation has multiple threads or
subprocesses which can fail independently, it SHOULD take the most
recent timestamp from each thread and use the oldest value. This
ensures that the timestamp only advances while all the threads
continue running.

If there are threads that do not run continuously, they SHOULD be
excluded from the timestamp calculation.

If there are no recent results, the generator MUST NOT generate a new
file.

It does not follow the KeyValue format for backwards compatibility
with version 1.0.0.
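
The thread rule above can be sketched as follows. This is a minimal illustration and not taken from any generator: the function name `header_timestamp` and the `results_by_worker` mapping are hypothetical, and the caller is assumed to pass only recent results from workers that run continuously.

```python
from typing import Dict, List, Optional


def header_timestamp(results_by_worker: Dict[str, List[int]]) -> Optional[int]:
    """Pick the Timestamp value from per-worker result times (Unix seconds).

    Hypothetical helper: takes the most recent timestamp of each worker and
    returns the oldest of those. Workers without any results are skipped,
    which is a simplification made for this sketch.
    """
    latest_per_worker = [max(times) for times in results_by_worker.values() if times]
    if not latest_per_worker:
        # No results at all: the generator must not write a new file.
        return None
    return min(latest_per_worker)


# Worker "a" last reported at 200, worker "b" at 180, so 180 is used.
print(header_timestamp({"a": [100, 200], "b": [150, 180]}))  # prints 180
```

Returning `None` lets the caller skip file generation when there are no results.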

"version" version_number NL

\[In second position, zero or one time.\]

The specification document format version.
It uses semantic versioning \[5\].

This Line was added in version 1.1.0 of this specification.

Version 1.0.0 documents do not contain this Line, and the
version_number is considered to be "1.0.0".

"software" Value NL

\[Zero or one time.\]

The name of the software that created the document.

This Line was added in version 1.1.0 of this specification.

Version 1.0.0 documents do not contain this Line, and the software
is considered to be "torflow".

"software_version" Value NL

\[Zero or one time.\]

The version of the software that created the document.
The version may be a version_number, a git commit, or some other
version scheme.

This Line was added in version 1.1.0 of this specification.

"file_created" DateTime NL

\[Zero or one time.\]

The date and time timestamp in ISO 8601 format and UTC time zone
when the file was created.

This Line was added in version 1.1.0 of this specification.

"generator_started" DateTime NL

\[Zero or one time.\]

The date and time timestamp in ISO 8601 format and UTC time zone
when the generator started.

This Line was added in version 1.1.0 of this specification.

"earliest_bandwidth" DateTime NL

\[Zero or one time.\]

The date and time timestamp in ISO 8601 format and UTC time zone
when the first relay bandwidth was obtained.

This Line was added in version 1.1.0 of this specification.

"latest_bandwidth" DateTime NL

\[Zero or one time.\]

The date and time timestamp in ISO 8601 format and UTC time zone
of the most recent generator bandwidth result.

This time MUST be identical to the initial Timestamp line.

This duplicate value is included to make the format easier for people
to read.

This Line was added in version 1.1.0 of this specification.
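
The requirement that latest_bandwidth matches the Timestamp line can be checked with a short conversion. The sketch below assumes that DateTime values use the layout `YYYY-MM-DDTHH:MM:SS` without a time zone suffix; that layout is not defined in this section, and both function names are hypothetical.

```python
from datetime import datetime, timezone

# Assumed DateTime layout (ISO 8601, UTC, no zone designator).
DATETIME_FORMAT = "%Y-%m-%dT%H:%M:%S"


def timestamp_to_datetime(timestamp: int) -> str:
    """Render a Unix timestamp as a UTC DateTime string."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).strftime(DATETIME_FORMAT)


def latest_bandwidth_matches(timestamp: int, latest_bandwidth: str) -> bool:
    """Return True if latest_bandwidth denotes the same instant as the Timestamp line."""
    parsed = datetime.strptime(latest_bandwidth, DATETIME_FORMAT)
    return int(parsed.replace(tzinfo=timezone.utc).timestamp()) == timestamp


print(timestamp_to_datetime(1523911758))  # prints 2018-04-16T20:49:18
```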

"number_eligible_relays" Int NL

\[Zero or one time.\]

The number of relays that have enough measurements to be
included in the bandwidth file.

This Line was added in version 1.2.0 of this specification.

"minimum_percent_eligible_relays" Int NL

\[Zero or one time.\]

The percentage of relays in the consensus that SHOULD be
included in every generated bandwidth file.

If this threshold is not reached, format versions 1.3.0 and earlier
SHOULD NOT contain any relays. (Bandwidth files always include a
header.)

Format versions 1.4.0 and later SHOULD include all the relays for
diagnostic purposes, even if this threshold is not reached. But these
relays SHOULD be marked so that Tor does not vote on them.
See section 1.4 for details.

The minimum percentage is 60% in Torflow, so sbws uses
60% as the default.

This Line was added in version 1.2.0 of this specification.

"number_consensus_relays" Int NL

\[Zero or one time.\]

The number of relays in the consensus.

This Line was added in version 1.2.0 of this specification.

"percent_eligible_relays" Int NL

\[Zero or one time.\]

The number of eligible relays, as a percentage of the number
of relays in the consensus.

This line SHOULD be equal to:

```text
(number_eligible_relays * 100.0) / number_consensus_relays
```

This Line was added in version 1.2.0 of this specification.

"minimum_number_eligible_relays" Int NL

\[Zero or one time.\]

The minimum number of relays that SHOULD be included in the bandwidth
file. See minimum_percent_eligible_relays for details.
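
The relationship between the relay-count headers can be sketched in a few lines. The formulas are the ones this section gives for percent_eligible_relays and minimum_number_eligible_relays; rounding to the nearest integer is an assumption made here, since the headers are Int values and the rounding rule is not stated. The relay counts in the example are made up.

```python
def percent_eligible_relays(number_eligible_relays: int,
                            number_consensus_relays: int) -> int:
    """(number_eligible_relays * 100.0) / number_consensus_relays, rounded (assumption)."""
    return round((number_eligible_relays * 100.0) / number_consensus_relays)


def minimum_number_eligible_relays(number_consensus_relays: int,
                                   minimum_percent_eligible_relays: int = 60) -> int:
    """number_consensus_relays * (minimum_percent_eligible_relays / 100.0), rounded (assumption)."""
    return round(number_consensus_relays * (minimum_percent_eligible_relays / 100.0))


# Hypothetical network: 6930 relays in the consensus, 6514 of them eligible.
consensus, eligible = 6930, 6514
print(percent_eligible_relays(eligible, consensus))            # prints 94
print(minimum_number_eligible_relays(consensus))               # prints 4158
print(eligible >= minimum_number_eligible_relays(consensus))   # prints True
```

The last line is the threshold check: when it is false, the behaviour described under minimum_percent_eligible_relays applies.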

This line SHOULD be equal to:

```text
number_consensus_relays * (minimum_percent_eligible_relays / 100.0)
```

This Line was added in version 1.2.0 of this specification.

"scanner_country" CountryCode NL

\[Zero or one time.\]

The country, as in political geolocation, where the generator is run.

This Line was added in version 1.2.0 of this specification.

"destinations_countries" CountryCodeList NL

\[Zero or one time.\]

The country or countries, as in political geolocation, where the
destination Web server(s) are located.
The destination Web servers serve the data that the generator retrieves
to measure the bandwidth.

This Line was added in version 1.2.0 of this specification.

"recent_consensus_count" Int NL

\[Zero or one time.\]

The number of different consensuses seen in the last data_period
days. (data_period is 5 by default.)

Assuming that Tor clients fetch a consensus every 1-2 hours,
and that the data_period is 5 days, the Value of this Key SHOULD be
between:

```text
data_period * 24 / 2 =  60
data_period * 24     = 120
```

This Line was added in version 1.4.0 of this specification.

"recent_priority_list_count" Int NL

\[Zero or one time.\]

The number of times that a list with a subset of relays prioritized
to be measured has been created in the last data_period days.
(data_period is 5 by default.)

In 2019, with 7000 relays in the network, the Value of this Key SHOULD be
approximately:

```text
data_period * 24 / 1.5 = 80
```

where 1.5 is the approximate number of hours it takes to measure a
priority list of 7000 * 0.05 (350) relays, when the fraction of relays
in a priority list is 5% (0.05).

This Line was added in version 1.4.0 of this specification.

"recent_priority_relay_count" Int NL

\[Zero or one time.\]

The number of relays that have been in the list of relays prioritized
to be measured in the last data_period days. (data_period is 5 by
default.)

In 2019, with 7000 relays in the network, the Value of this Key SHOULD be
approximately:

```text
80 * (7000 * 0.05) = 28000
```

where 0.05 (5%) is the fraction of relays in a priority list and 80
is the approximate number of priority lists (see
"recent_priority_list_count").

This Line was added in version 1.4.0 of this specification.

"recent_measurement_attempt_count" Int NL

\[Zero or one time.\]

The number of times that any relay has been queued to be measured
in the last data_period days. (data_period is 5 by default.)

In 2019, with 7000 relays in the network, the Value of this Key SHOULD be
approximately the same as "recent_priority_relay_count",
assuming that there is one attempt to measure a relay for each relay that
has been prioritized unless there are system, network or implementation
issues.

This Line was added in version 1.4.0 of this specification and removed
in version 1.5.0.
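
The expected values quoted for the recent_* counters follow from simple arithmetic on data_period. The sketch below reproduces them with the 2019 figures given in this section (7000 relays, 5% of relays per priority list, about 1.5 hours per list); the function names are hypothetical and the results are rough expectations, not limits.

```python
from typing import Tuple


def expected_consensus_count_range(data_period_days: int = 5) -> Tuple[int, int]:
    """Range for recent_consensus_count: one consensus every 2 hours up to one per hour."""
    hours = data_period_days * 24
    return hours // 2, hours


def expected_priority_list_count(data_period_days: int = 5,
                                 hours_per_list: float = 1.5) -> int:
    """Approximate recent_priority_list_count: one list per measurement round."""
    return round(data_period_days * 24 / hours_per_list)


def expected_priority_relay_count(network_relays: int = 7000,
                                  list_fraction: float = 0.05,
                                  data_period_days: int = 5,
                                  hours_per_list: float = 1.5) -> int:
    """Approximate recent_priority_relay_count: lists times relays per list."""
    relays_per_list = round(network_relays * list_fraction)
    return expected_priority_list_count(data_period_days, hours_per_list) * relays_per_list


print(expected_consensus_count_range())   # prints (60, 120)
print(expected_priority_list_count())     # prints 80
print(expected_priority_relay_count())    # prints 28000
```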

"recent_measurement_failure_count" Int NL

\[Zero or one time.\]

The number of times that the scanner attempted to measure a relay in
the last data_period days (5 by default), but the relay has not been
measured because of system, network or implementation issues.

This Line was added in version 1.4.0 of this specification.

"recent_measurements_excluded_error_count" Int NL

\[Zero or one time.\]

The number of relays that have no successful measurements in the last
data_period days (5 by default).

(See the note in section 1.4, version 1.4.0, about excluded relays.)

This Line was added in version 1.4.0 of this specification.

"recent_measurements_excluded_near_count" Int NL

\[Zero or one time.\]

The number of relays that have some successful measurements in the last
data_period days (5 by default), but all those measurements were
performed in a period of time that was too short (by default 1 day).

(See the note in section 1.4, version 1.4.0, about excluded relays.)

This Line was added in version 1.4.0 of this specification.

"recent_measurements_excluded_old_count" Int NL

\[Zero or one time.\]

The number of relays that have some successful measurements, but all
those measurements are too old (more than 5 days, by default).

Excludes relays that are already counted in
recent_measurements_excluded_near_count.

(See the note in section 1.4, version 1.4.0, about excluded relays.)

This Line was added in version 1.4.0 of this specification.

"recent_measurements_excluded_few_count" Int NL

\[Zero or one time.\]

The number of relays that don't have enough recent successful
measurements. (Fewer than 2 measurements in the last 5 days, by
default).

Excludes relays that are already counted in
recent_measurements_excluded_near_count and
recent_measurements_excluded_old_count.

(See the note in section 1.4, version 1.4.0, about excluded relays.)

This Line was added in version 1.4.0 of this specification.
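
The four exclusion counters are mutually exclusive, with "near" taking precedence over "old" and both taking precedence over "few". One possible reading of that precedence is sketched below. The function name, the argument layout, and the handling of a relay with a single measurement are assumptions made for illustration; this is not a description of how any generator implements the checks.

```python
from typing import List, Optional

DAY = 24 * 60 * 60  # seconds


def exclusion_reason(success_times: List[int], now: int,
                     data_period_days: int = 5,
                     min_span_days: int = 1,
                     min_count: int = 2) -> Optional[str]:
    """Return "error", "near", "old", "few", or None if the relay is not excluded.

    success_times holds the Unix times of a relay's successful measurements.
    """
    if not success_times:
        return "error"  # no successful measurements at all
    span = max(success_times) - min(success_times)
    if len(success_times) >= 2 and span < min_span_days * DAY:
        return "near"   # all measurements too close together
    recent = [t for t in success_times if now - t <= data_period_days * DAY]
    if not recent:
        return "old"    # every measurement is older than data_period
    if len(recent) < min_count:
        return "few"    # not enough recent measurements
    return None


now = 10 * DAY
print(exclusion_reason([9 * DAY, 9 * DAY + 3600], now))  # prints near
print(exclusion_reason([3 * DAY, 9 * DAY], now))         # prints few
```

Checking the conditions in this order is what keeps a relay from being counted under more than one header.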

"time_to_report_half_network" Int NL

\[Zero or one time.\]

The time in seconds that it would take to report measurements about
half of the network, given the number of eligible relays and the time
it took in the last data_period days (5 days, by default).

(See the note in section 1.4, version 1.4.0, about excluded relays.)

This Line was added in version 1.4.0 of this specification.

"tor_version" version_number NL

\[Zero or one time.\]

The Tor version of the Tor process controlled by the generator.

This Line was added in version 1.4.0 of this specification.

"mu" Int NL

\[Zero or one time.\]

The network stream bandwidth average calculated as explained in B4.2.

This Line was added in version 1.7.0 of this specification.

"muf" Int NL

\[Zero or one time.\]

The filtered network stream bandwidth average, calculated as explained
in B4.2.

This Line was added in version 1.7.0 of this specification.

"dirauth_nickname"  NL

\[Zero or one time.\]

The nickname of the directory authority that publishes this
V3BandwidthsFile.

This Line was added in version 1.8.0 of this specification.

KeyValue NL

\[Zero or more times.\]

There MUST NOT be multiple KeyValue header Lines with the same key.
If there are, the parser SHOULD choose an arbitrary Line.

If a parser does not recognize a Keyword in a KeyValue Line, it
MUST be ignored.

Future format versions may include additional KeyValue header Lines.
Additional header Lines will be accompanied by a minor version
increment.

Implementations MAY add additional header Lines as needed. This
specification SHOULD be updated to avoid conflicting meanings for
the same header keys.

Parsers MUST NOT rely on the order of these additional Lines.

Additional header Lines MUST NOT use any keywords specified in the
relay measurements format.
If they do, the parser MAY ignore the conflicting keywords.

Terminator NL

\[Zero or one time.\]

The Header List section ends with a Terminator.

In version 1.0.0, the Header List ends when the first relay bandwidth
line conforming to the next section is found.

Implementations of version 1.1.0 and later SHOULD use a 5-character
terminator.

Tor 0.4.0.1-alpha and later look for a 5-character terminator,
or the first relay bandwidth line. sbws versions 0.1.0 to 1.0.2
used a 4-character terminator; this bug was fixed in 1.0.3.
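
The parsing rules in this section can be put together in a small reader. The sketch below is an illustration under stated assumptions: KeyValue Lines are taken to have the form `key=value` and the terminator is taken to be a run of `=` characters, both of which are defined elsewhere in the specification and not in this section. The function name is hypothetical. The sketch also does not detect the end of a version 1.0.0 header by recognizing relay bandwidth lines, so it is only suitable for input that ends with a terminator or contains nothing but the header.

```python
from typing import Dict, List, Tuple

# Assumption: 5-character terminator, plus the 4-character form written by old sbws.
TERMINATORS = ("=====", "====")


def parse_header_list(lines: List[str]) -> Tuple[int, Dict[str, str]]:
    """Parse the Timestamp line and the header Lines that follow it."""
    if not lines:
        raise ValueError("missing Timestamp line")
    timestamp = int(lines[0])  # first line is a bare Unix timestamp, not a KeyValue
    headers: Dict[str, str] = {}
    for line in lines[1:]:
        if line in TERMINATORS:
            break
        key, sep, value = line.partition("=")
        if not sep or not key or " " in key:
            continue  # malformed header Line: ignore it
        headers.setdefault(key, value)  # duplicate key: keep the first Line seen
    if "version" not in headers:
        # Legacy documents carry no version Line.
        headers["version"] = "1.0.0"
        headers.setdefault("software", "torflow")
    return timestamp, headers


example = ["1523911758", "version=1.4.0", "software=sbws",
           "not a header", "software=other", "=====", "bw=760 node_id=$AAAA"]
print(parse_header_list(example))
# prints (1523911758, {'version': '1.4.0', 'software': 'sbws'})
```

Unknown keys are kept in the returned dictionary and can simply be left unread by the caller, which matches the rule that unrecognized Keywords are ignored.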