                  Tor Bandwidth File Format
                            juga
                            teor

Table of Contents

    1. Scope and preliminaries
        1.2. Acknowledgements
        1.3. Outline
        1.4. Format Versions
    2. Format details
        2.1. Definitions
        2.2. Header List format
        2.3. Relay Line format
        2.4. Implementation details
            2.4.1. Writing bandwidth files atomically
            2.4.2. Additional KeyValue pair definitions
                2.4.2.1. Simple Bandwidth Scanner
                2.4.2.2. Torflow
    A. Sample data
        A.1. Generated by Torflow
        A.2. Generated by sbws version 0.1.0
        A.3. Generated by sbws version 1.0.3
        A.4. Headers generated by sbws version 1.0.4
        A.5. Generated by sbws version 1.1.0
    B. Scaling bandwidths
        B.1. Scaling requirements
        B.2. A linear scaling method
        B.3. Quota changes
        B.4. Torflow aggregation

1. Scope and preliminaries

  This document describes the format of Tor's Bandwidth File, version
  1.0.0 and later.

  It is a new specification for the existing bandwidth file format,
  which we call version 1.0.0. It also specifies new format versions
  1.1.0 and later, which are backwards compatible with 1.0.0 parsers.

  Since Tor version 0.2.4.12-alpha, the directory authorities use
  a Bandwidth File called "V3BandwidthsFile" generated by
  Torflow [1]. The details of this format are described in Torflow's
  README.spec.txt. We also summarise the format in this specification.

    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
    NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and
    "OPTIONAL" in this document are to be interpreted as described in
    RFC 2119.

1.2. Acknowledgements

  The original bandwidth generator (Torflow) and format were
  created by mike. Teor suggested writing this specification while
  contributing to pastly's new bandwidth generator implementation.

  This specification was revised after feedback from:

    Nick Mathewson (nickm)
    Iain Learmonth (irl)

1.3. Outline

  Sections 3.4.1 and 3.4.2 of the Tor directory protocol
  (dir-spec.txt [3]) use the term "bandwidth measurements" to refer to
  what is called a Bandwidth File here.

  A Bandwidth File contains information on relays' bandwidth
  capacities and is produced by bandwidth generators, previously known
  as bandwidth scanners.

1.4. Format Versions

  1.0.0 - The legacy Bandwidth File format

  1.1.0 - Adds a header containing information about the bandwidth
          file. Documents the sbws and Torflow relay line keys.

  1.2.0 - If there are not enough eligible relays, the bandwidth file
          SHOULD contain a header, but no relays. (To match Torflow's
          existing behaviour.)

          Adds scanner and destination countries to the header.
          Adds new KeyValue Lines to the Header List section with
          statistics about the number of relays included in the file.
          Adds new KeyValues to Relay Bandwidth Lines, with different
          bandwidth values (averages and descriptor bandwidths).

  1.4.0 - Adds monitoring KeyValues to the header and relay lines.

          RelayLines for excluded relays MAY be present in the bandwidth
          file for diagnostic reasons. Similarly, if there are not enough
          eligible relays, the bandwidth file MAY contain all known relays.

          Diagnostic relay lines SHOULD be marked with vote=0, and
          Tor SHOULD NOT use their bandwidths in its votes.

          Also adds the Tor version to the header.

  1.5.0 - Removes the "recent_measurement_attempt_count" KeyValue.

  All Tor versions can consume format version 1.0.0.

  All Tor versions can consume format version 1.1.0 and later,
  but Tor versions earlier than 0.3.5.1-alpha warn if the header
  contains any KeyValue lines after the Timestamp.

  Tor versions 0.4.0.3-alpha, 0.3.5.8, 0.3.4.11, and earlier do not
  understand "vote=0". Instead, they will vote for the actual bandwidths
  that sbws puts in diagnostic relay lines:
    * 1 for relays with "unmeasured=1", and
    * the relay's measured and scaled bandwidth when "under_min_report=1".
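
  The effect of "vote=0" on a consumer can be sketched as follows. This
  is a minimal illustration in Python, not Tor's implementation: the
  function name and the understands_vote_zero flag are hypothetical, and
  the RelayLine is assumed to be already parsed into a dict of strings.

```python
def bandwidth_to_vote(pairs, understands_vote_zero=True):
    """Return the bandwidth a consumer would vote for, or None for no vote.

    `pairs` is one RelayLine parsed into a dict of str -> str (assumed).
    """
    if understands_vote_zero and pairs.get("vote") == "0":
        # Diagnostic line: its bandwidth is not used in votes.
        return None
    # Consumers that do not understand vote=0 use the bw value as-is.
    return int(pairs["bw"])
```

  With understands_vote_zero=False, the sketch mimics the older Tor
  versions listed above, which vote for whatever "bw" value the
  diagnostic line carries.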

2. Format details

  The Bandwidth File MUST contain the following sections:
  - Header List (exactly once), which is a partially ordered list of
    - Header Lines (one or more times), then
  - Relay Lines (zero or more times), in an arbitrary order.
  If it does not contain these sections, parsers SHOULD ignore the file.

2.1. Definitions

  The following nonterminals are defined in Tor directory protocol
  sections 1.2., 2.1.1., 2.1.3.:

    bool
    Int
    SP (space)
    NL (newline)
    KeywordChar
    ArgumentChar
    nickname
    hexdigest (a '$', followed by 40 hexadecimal characters
      ([A-Fa-f0-9]))

  The following nonterminal is defined in section 2 of
  version-spec.txt [4]:

    version_number

  We define the following nonterminals:

    Line ::= ArgumentChar* NL
    RelayLine ::= KeyValue (SP KeyValue)* NL
    HeaderLine ::= KeyValue NL
    KeyValue ::= Key "=" Value
    Key ::= (KeywordChar | "_")+
    Value ::= ArgumentCharValue+
    ArgumentCharValue ::= any printing ASCII character except NL and SP.
    Terminator ::= "=====" or "===="
                   Generators SHOULD use a 5-character terminator.
    Timestamp ::= Int
    Bandwidth ::= Int
    MasterKey ::= a base64-encoded Ed25519 public key, with
                  padding characters omitted.
    DateTime ::= "YYYY-MM-DDTHH:MM:SS", as in ISO 8601
    CountryCode ::= Two capital ASCII letters ([A-Z]{2}), as defined in
                    ISO 3166-1 alpha-2, plus "ZZ" to denote an unknown
                    country (e.g. the destination is in a Content Delivery
                    Network).
    CountryCodeList ::= One or more CountryCode(s) separated by a comma
                        ([A-Z]{2}(,[A-Z]{2})*).

  Note that key_value and value are defined in Tor directory protocol
  with different formats to KeyValue and Value here.
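
  As an illustration, the KeyValue and CountryCodeList nonterminals can
  be checked with regular expressions. This is a sketch in Python, not
  part of the format: the function names are hypothetical, and
  KeywordChar is assumed to be [A-Za-z0-9-] as in dir-spec.txt.

```python
import re

# Key ::= (KeywordChar | "_")+, with KeywordChar assumed to be [A-Za-z0-9-].
# Value ::= one or more printing ASCII characters except NL and SP.
KEY_VALUE_RE = re.compile(r"([A-Za-z0-9_-]+)=([\x21-\x7e]+)")
COUNTRY_CODE_LIST_RE = re.compile(r"[A-Z]{2}(,[A-Z]{2})*")


def parse_key_value(text):
    """Return (key, value) if text is a single KeyValue, else None."""
    match = KEY_VALUE_RE.fullmatch(text)
    if match is None:
        return None
    return match.group(1), match.group(2)


def is_country_code_list(value):
    """Return True if value has the shape of a CountryCodeList."""
    return COUNTRY_CODE_LIST_RE.fullmatch(value) is not None
```

  Because a Key cannot contain "=", the first "=" always separates the
  Key from the Value, and a Value may itself contain "=".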

  Tor versions earlier than 0.3.5.1-alpha require all lines in the file
  to be 510 characters or less. The previous limit was 254 characters in
  Tor 0.2.6.2-alpha and earlier. Parsers MAY ignore longer Lines.

  Note that directory authorities are only supported on the two most
  recent stable Tor versions, so we expect that line limits will be
  removed after Tor 0.4.0 is released in 2019.

2.2. Header List format

  The Header List consists of a Timestamp line and zero or more
  HeaderLines.

  All the header lines MUST conform to the HeaderLine format, except
  the first Timestamp line.

  The Timestamp line is not a HeaderLine to keep compatibility with
  the legacy Bandwidth File format.

  Some header Lines MUST appear in specific positions, as documented
  below. All other Lines can appear in any order.

  If a parser does not recognize any extra material in a header Line,
  the Line MUST be ignored.

  If a header Line does not conform to this format, the Line SHOULD be
  ignored by parsers.

  The Header List contains the following Lines:

    Timestamp NL

      [At start, exactly once.]

      The Unix Epoch time in seconds of the most recent generator bandwidth
      result.

      If the generator implementation has multiple threads or
      subprocesses which can fail independently, it SHOULD take the most
      recent timestamp from each thread and use the oldest value. The
      Timestamp then stops advancing when any thread stops producing
      results, which makes a stalled thread visible to consumers.

      If there are threads that do not run continuously, they SHOULD be
      excluded from the timestamp calculation.

      If there are no recent results, the generator MUST NOT generate a new
      file.

      It does not follow the KeyValue format for backwards compatibility
      with version 1.0.0.
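
      The multi-thread rule above can be sketched as follows. This is
      an illustration in Python under stated assumptions: the function
      name and the input shape (a dict mapping each continuously running
      thread to the Unix times of its results) are hypothetical, and
      returning None when any thread has no results is one possible
      policy, not a requirement of this specification.

```python
def header_timestamp(result_times_by_thread):
    """Pick the Timestamp for the header, or None if no file should be written.

    `result_times_by_thread` maps a thread name to a list of Unix times
    (seconds) of that thread's results. Threads that do not run
    continuously are assumed to have been left out by the caller.
    """
    if not result_times_by_thread:
        return None
    if any(not times for times in result_times_by_thread.values()):
        # Assumption: a thread without results means "no recent results".
        return None
    # Most recent result of each thread, then the oldest of those values.
    return min(max(times) for times in result_times_by_thread.values())
```

      For example, if one thread last produced a result at time 200 and
      another at time 180, the header Timestamp is 180.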

    "version=" version_number NL

      [In second position, zero or one time.]

      The specification document format version.
      It uses semantic versioning [5].

      This Line was added in version 1.1.0 of this specification.

      Version 1.0.0 documents do not contain this Line, and the
      version_number is considered to be "1.0.0".

    "software=" Value NL

      [Zero or one time.]

      The name of the software that created the document.

      This Line was added in version 1.1.0 of this specification.

      Version 1.0.0 documents do not contain this Line, and the software
      is considered to be "torflow".

    "software_version=" Value NL

      [Zero or one time.]

      The version of the software that created the document.
      The version may be a version_number, a git commit, or some other
      version scheme.

      This Line was added in version 1.1.0 of this specification.

    "file_created=" DateTime NL

      [Zero or one time.]

      The date and time, in ISO 8601 format and the UTC time zone, when
      the file was created.

      This Line was added in version 1.1.0 of this specification.

    "generator_started=" DateTime NL

      [Zero or one time.]

      The date and time, in ISO 8601 format and the UTC time zone, when
      the generator started.

      This Line was added in version 1.1.0 of this specification.

    "earliest_bandwidth=" DateTime NL

      [Zero or one time.]

      The date and time, in ISO 8601 format and the UTC time zone, when
      the first relay bandwidth was obtained.

      This Line was added in version 1.1.0 of this specification.

    "latest_bandwidth=" DateTime NL

      [Zero or one time.]

      The date and time, in ISO 8601 format and the UTC time zone, of
      the most recent generator bandwidth result.

      This time MUST be identical to the initial Timestamp line.

      This duplicate value is included to make the format easier for people
      to read.

      This Line was added in version 1.1.0 of this specification.
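
      Converting between the initial Timestamp and the DateTime used by
      "latest_bandwidth" can be sketched as follows. The function names
      are hypothetical; the sketch only assumes that DateTime values are
      in UTC, as stated above.

```python
from datetime import datetime, timezone

DATETIME_FORMAT = "%Y-%m-%dT%H:%M:%S"


def timestamp_to_datetime(timestamp):
    """Format a Unix time in seconds as a DateTime string in UTC."""
    moment = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    return moment.strftime(DATETIME_FORMAT)


def datetime_to_timestamp(value):
    """Parse a DateTime string, interpreted as UTC, into a Unix time."""
    parsed = datetime.strptime(value, DATETIME_FORMAT)
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())


def latest_bandwidth_matches(timestamp, latest_bandwidth):
    """Check that latest_bandwidth denotes the same time as the Timestamp."""
    return datetime_to_timestamp(latest_bandwidth) == timestamp
```

      A generator could call timestamp_to_datetime() on the Timestamp it
      is about to write, so that the two values cannot diverge.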

    "number_eligible_relays=" Int NL

      [Zero or one time.]

      The number of relays that have enough measurements to be
      included in the bandwidth file.

      This Line was added in version 1.2.0 of this specification.

    "minimum_percent_eligible_relays=" Int NL

      [Zero or one time.]

      The percentage of relays in the consensus that SHOULD be
      included in every generated bandwidth file.

      If this threshold is not reached, format versions 1.3.0 and earlier
      SHOULD NOT contain any relays. (Bandwidth files always include a
      header.)

      Format versions 1.4.0 and later SHOULD include all the relays for
      diagnostic purposes, even if this threshold is not reached. But these
      relays SHOULD be marked so that Tor does not vote on them.
      See section 1.4 for details.

      The minimum percentage is 60% in Torflow, so sbws uses
      60% as the default.

      This Line was added in version 1.2.0 of this specification.

    "number_consensus_relays=" Int NL

      [Zero or one time.]

      The number of relays in the consensus.

      This Line was added in version 1.2.0 of this specification.

    "percent_eligible_relays=" Int NL

      [Zero or one time.]

      The number of eligible relays, as a percentage of the number
      of relays in the consensus.

      This line SHOULD be equal to:
          (number_eligible_relays * 100.0) / number_consensus_relays

      This Line was added in version 1.2.0 of this specification.

    "minimum_number_eligible_relays=" Int NL

      [Zero or one time.]

      The minimum number of relays that SHOULD be included in the bandwidth
      file. See minimum_percent_eligible_relays for details.

      This line SHOULD be equal to:
          number_consensus_relays * (minimum_percent_eligible_relays / 100.0)

      This Line was added in version 1.2.0 of this specification.
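
      The two formulas above can be sketched as follows. The Values are
      Ints, but this specification does not say how to round, so the
      use of round() here is an assumption, as are the function names
      and the handling of an empty consensus.

```python
def percent_eligible_relays(number_eligible_relays, number_consensus_relays):
    """Eligible relays as a percentage of the relays in the consensus."""
    if number_consensus_relays == 0:
        return 0  # assumption: avoid dividing by zero on an empty consensus
    return round((number_eligible_relays * 100.0) / number_consensus_relays)


def minimum_number_eligible_relays(number_consensus_relays,
                                   minimum_percent_eligible_relays=60):
    """Minimum number of relays that should be included in the file."""
    return round(number_consensus_relays
                 * (minimum_percent_eligible_relays / 100.0))


def has_enough_eligible_relays(number_eligible_relays,
                               number_consensus_relays,
                               minimum_percent_eligible_relays=60):
    """True if the eligible relays reach the minimum number."""
    minimum = minimum_number_eligible_relays(
        number_consensus_relays, minimum_percent_eligible_relays)
    return number_eligible_relays >= minimum
```

      For example, with 6500 relays in the consensus and the default of
      60%, the minimum number of eligible relays is 3900.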

    "scanner_country=" CountryCode NL

      [Zero or one time.]

      The country, as in political geolocation, where the generator is run.

      This Line was added in version 1.2.0 of this specification.

    "destinations_countries=" CountryCodeList NL

      [Zero or one time.]

      The country, as in political geolocation, or countries where the
      destination Web server(s) are located.
      The destination Web Servers serve the data that the generator retrieves
      to measure the bandwidth.

      This Line was added in version 1.2.0 of this specification.

    "recent_consensus_count=" Int NL

      [Zero or one time.]

      The number of different consensuses seen in the last data_period
      days. (data_period is 5 by default.)

      Assuming that Tor clients fetch a consensus every 1-2 hours,
      and that the data_period is 5 days, the Value of this Key SHOULD be
      between:
          data_period * 24 / 2 =  60
          data_period * 24     = 120

      This Line was added in version 1.4.0 of this specification.

    "recent_priority_list_count=" Int NL

      [Zero or one time.]

      The number of times that a list with a subset of relays prioritized
      to be measured has been created in the last data_period days.
      (data_period is 5 by default.)

      In 2019, with 7000 relays in the network, the Value of this Key SHOULD be
      approximately:
          data_period * 24 / 1.5 = 80
      where 1.5 is the approximate number of hours it takes to measure a
      priority list of 7000 * 0.05 (350) relays, when the fraction of relays
      in a priority list is 5% (0.05).

      This Line was added in version 1.4.0 of this specification.

    "recent_priority_relay_count=" Int NL

      [Zero or one time.]

      The number of relays that have been in the list of relays prioritized
      to be measured in the last data_period days. (data_period is 5 by
      default.)

      In 2019, with 7000 relays in the network, the Value of this Key SHOULD be
      approximately:
          80 * (7000 * 0.05) = 28000
      where 0.05 (5%) is the fraction of relays in a priority list and 80
      is the approximate number of priority lists (see
      "recent_priority_list_count").

      This Line was added in version 1.4.0 of this specification.
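
      The approximate values given for "recent_consensus_count",
      "recent_priority_list_count" and "recent_priority_relay_count"
      can be reproduced with the sketch below. The function and
      parameter names are hypothetical; the defaults are the 2019
      figures quoted above.

```python
def expected_recent_consensus_count(data_period_days=5):
    """Return the (low, high) range for recent_consensus_count."""
    hours = data_period_days * 24
    # One consensus every 2 hours at the low end, every hour at the high end.
    return hours // 2, hours


def expected_recent_priority_list_count(data_period_days=5,
                                        hours_per_list=1.5):
    """Approximate number of priority lists created in data_period days."""
    return round(data_period_days * 24 / hours_per_list)


def expected_recent_priority_relay_count(list_count,
                                         relays_in_network=7000,
                                         percent_per_list=5):
    """Approximate number of relays that have been in a priority list."""
    relays_per_list = relays_in_network * percent_per_list // 100
    return list_count * relays_per_list
```

      A monitoring tool could compare the Values in a bandwidth file
      against these estimates to spot a generator that is falling
      behind.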

    "recent_measurement_attempt_count=" Int NL

      [Zero or one time.]

      The number of times that any relay has been queued to be measured
      in the last data_period days. (data_period is 5 by default.)

      In 2019, with 7000 relays in the network, the Value of this Key SHOULD be
      approximately the same as "recent_priority_relay_count",
      assuming that there is one attempt to measure a relay for each relay that
      has been prioritized unless there are system, network or implementation
      issues.

      This Line was added in version 1.4.0 of this specification and removed
      in version 1.5.0.

    "recent_measurement_failure_count=" Int NL

      [Zero or one time.]

      The number of times that the scanner attempted to measure a relay in
      the last data_period days (5 by default), but the relay has not been
      measured because of system, network or implementation issues.

      This Line was added in version 1.4.0 of this specification.

    "recent_measurements_excluded_error_count=" Int NL

      [Zero or one time.]

      The number of relays that have no successful measurements in the last
      data_period days (5 by default).

      (See the note in section 1.4, version 1.4.0, about excluded relays.)

      This Line was added in version 1.4.0 of this specification.

    "recent_measurements_excluded_near_count=" Int NL

      [Zero or one time.]

      The number of relays that have some successful measurements in the last
      data_period days (5 by default), but all those measurements were
      performed in a period of time that was too short (by default 1 day).

      (See the note in section 1.4, version 1.4.0, about excluded relays.)

      This Line was added in version 1.4.0 of this specification.

    "recent_measurements_excluded_old_count=" Int NL

      [Zero or one time.]

      The number of relays that have some successful measurements, but all
      those measurements are too old (more than 5 days, by default).

      Excludes relays that are already counted in
      recent_measurements_excluded_near_count.

      (See the note in section 1.4, version 1.4.0, about excluded relays.)

      This Line was added in version 1.4.0 of this specification.

    "recent_measurements_excluded_few_count=" Int NL

      [Zero or one time.]

      The number of relays that don't have enough recent successful
      measurements. (Fewer than 2 measurements in the last 5 days, by
      default).

      Excludes relays that are already counted in
      recent_measurements_excluded_near_count and
      recent_measurements_excluded_old_count.

      (See the note in section 1.4, version 1.4.0, about excluded relays.)

      This Line was added in version 1.4.0 of this specification.
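
      One possible reading of how the four exclusion counts above take
      precedence over each other is sketched below. This is not sbws
      code: the function name, the input (the Unix times of all stored
      successful measurements of one relay) and the exact comparisons
      are assumptions made for illustration.

```python
DAY = 24 * 60 * 60  # seconds


def exclusion_reason(success_times, now, data_period_days=5,
                     min_span_days=1, min_recent=2):
    """Return "error", "near", "old", "few", or None if the relay is eligible.

    `success_times` holds the Unix times of the relay's successful
    measurements; `now` is the current Unix time.
    """
    if not success_times:
        return "error"  # no successful measurements at all
    if max(success_times) - min(success_times) < min_span_days * DAY:
        return "near"   # all measurements fall within too short a period
    recent = [t for t in success_times if now - t <= data_period_days * DAY]
    if not recent:
        return "old"    # every measurement is older than data_period
    if len(recent) < min_recent:
        return "few"    # not enough recent measurements
    return None
```

      Each relay is counted under the first reason that applies, which
      matches the statements above that the "old" count excludes relays
      counted as "near", and the "few" count excludes both.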

    "time_to_report_half_network=" Int NL

      [Zero or one time.]

      The time in seconds that it would take to report measurements about
      half of the network, given the number of eligible relays and the time
      it took to measure them in the last data_period days (5 by default).

      (See the note in section 1.4, version 1.4.0, about excluded relays.)

      This Line was added in version 1.4.0 of this specification.

    "tor_version=" version_number NL

      [Zero or one time.]

      The Tor version of the Tor process controlled by the generator.

      This Line was added in version 1.4.0 of this specification.

    KeyValue NL

      [Zero or more times.]

      There MUST NOT be multiple KeyValue header Lines with the same key.
      If there are, the parser SHOULD choose an arbitrary Line.

      If a parser does not recognize a Keyword in a KeyValue Line, it
      MUST be ignored.

      Future format versions may include additional KeyValue header Lines.
      Additional header Lines will be accompanied by a minor version
      increment.

      Implementations MAY add additional header Lines as needed. This
      specification SHOULD be updated to avoid conflicting meanings for
      the same header keys.

      Parsers MUST NOT rely on the order of these additional Lines.

      Additional header Lines MUST NOT use any keywords specified in the
      relay measurements format.
      If they do, the parser MAY ignore conflicting keywords.

    Terminator NL

      [Zero or one time.]

      The Header List section ends with a Terminator.

      In version 1.0.0, Header List ends when the first relay bandwidth
      is found conforming to the next section.

      Implementations of version 1.1.0 and later SHOULD use a 5-character
      terminator.

      Tor 0.4.0.1-alpha and later look for a 5-character terminator,
      or the first relay bandwidth line. sbws versions 0.1.0 to 1.0.2
      used a 4-character terminator; this bug was fixed in sbws 1.0.3.
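
  A minimal sketch of a Header List parser, in Python, is given below.
  It is not Tor's parser. The function name is hypothetical, and two
  choices are assumptions: a line containing "node_id=" is treated as
  the first relay bandwidth line, and the first of several duplicate
  keys is kept.

```python
import re

HEADER_LINE_RE = re.compile(r"([A-Za-z0-9_-]+)=([\x21-\x7e]+)")
TERMINATORS = ("=====", "====")


def parse_header_list(lines):
    """Return (timestamp, headers, index of the first relay line), or None.

    `lines` is the whole file as a list of lines, with or without NL.
    """
    if not lines:
        return None
    first = lines[0].rstrip("\n")
    if re.fullmatch(r"[0-9]+", first) is None:
        return None  # the file must start with the Timestamp line
    headers = {}
    index = 1
    while index < len(lines):
        line = lines[index].rstrip("\n")
        if line in TERMINATORS:
            index += 1  # relay lines start after the terminator
            break
        if "node_id=" in line:
            break  # version 1.0.0: the header ends at the first relay line
        match = HEADER_LINE_RE.fullmatch(line)
        if match is not None:
            # Keep the first Line for a duplicated key (an arbitrary choice).
            headers.setdefault(match.group(1), match.group(2))
        # Lines that are not HeaderLines are ignored.
        index += 1
    # Defaults for version 1.0.0 documents, as described above.
    headers.setdefault("version", "1.0.0")
    headers.setdefault("software", "torflow")
    return int(first), headers, index
```

  The returned index lets the caller hand the remaining lines to a
  relay line parser without scanning the header again.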

2.3. Relay Line format

  The Relay Line section consists of zero or more RelayLines containing
  relay ids and bandwidths. The relays and their KeyValues are in
  arbitrary order.

  There MUST NOT be multiple KeyValue pairs with the same key in the same
  RelayLine. If there are, the parser SHOULD choose an arbitrary Value.

  There MUST NOT be multiple RelayLines per relay identity (node_id or
  master_key_ed25519). If there are, parsers SHOULD issue a warning.
  Parsers MAY reject the file, choose an arbitrary RelayLine, or ignore
  both RelayLines.

  If a parser does not recognize any extra material in a RelayLine,
  the extra material MUST be ignored.
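
  The rules above on duplicates and extra material can be sketched as
  follows. This is an illustration in Python with hypothetical function
  names. It takes one of the permitted options (warn and ignore both
  RelayLines of a duplicated relay) and, as an assumption, identifies
  relays by node_id only and skips lines without node_id or bw.

```python
import logging

log = logging.getLogger(__name__)


def parse_relay_line(line):
    """Parse one RelayLine into a dict, keeping the first Value per key."""
    pairs = {}
    for token in line.rstrip("\n").split(" "):
        key, separator, value = token.partition("=")
        if not separator or not key or not value:
            continue  # extra material is ignored
        pairs.setdefault(key, value)
    return pairs


def collect_relay_lines(lines):
    """Return a dict of node_id -> KeyValues, dropping duplicated relays."""
    relays = {}
    duplicated = set()
    for line in lines:
        pairs = parse_relay_line(line)
        node_id = pairs.get("node_id")
        if node_id is None or "bw" not in pairs:
            continue  # assumption: lines without node_id or bw are skipped
        if node_id in relays or node_id in duplicated:
            log.warning("Multiple RelayLines for %s, ignoring them", node_id)
            relays.pop(node_id, None)
            duplicated.add(node_id)
            continue
        relays[node_id] = pairs
    return relays
```

  Unrecognized keys stay in the returned dicts, so a caller can simply
  not look them up, which has the same effect as ignoring them.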

  Each RelayLine includes the following KeyValue pairs:

    "node_id=" hexdigest

      [Exactly once.]

      The fingerprint for the relay's RSA identity key.

      Note: In bandwidth files read by Tor versions earlier than
            0.3.4.1-alpha, node_id MUST NOT be at the end of the Line.
            These authority versions are no longer supported.

      Current Tor versions ignore master_key_ed25519, so node_id MUST be
      present in each relay Line.

      Implementations of version 1.1.0 and later SHOULD include both node_id
      and master_key_ed25519. Parsers SHOULD accept Lines that contain at
      least one of them.

    "master_key_ed25519=" MasterKey

      [Zero or one time.]

      The relay's master Ed25519 key, base64 encoded,
      without trailing "="s, to avoid ambiguity with KeyValue "="
      character.

      This KeyValue pair SHOULD be present, see the note under node_id.

      This KeyValue was added in version 1.1.0 of this specification.

    "bw=" Bandwidth

      [Exactly once.]

      The bandwidth of this relay in kilobytes per second.

      No Zero Bandwidths:
      Tor accepts zero bandwidths, but they trigger bugs in older Tor
      implementations. Therefore, implementations SHOULD NOT produce zero
      bandwidths. Instead, they SHOULD use one as their minimum bandwidth.
      If there are zero bandwidths, the parser MAY ignore them.

      Bandwidth Aggregation:
      Multiple measurements can be aggregated using an averaging scheme,
      such as a mean, median, or decaying average.

      Bandwidth Scaling:
      Torflow scales bandwidths to kilobytes per second. Other
      implementations SHOULD use kilobytes per second for their initial
      bandwidth scaling.

      If different implementations or configurations are used in votes for
      the same network, their measurements MAY need further scaling. See
      Appendix B for information about scaling, and one possible scaling
      method.

      MaxAdvertisedBandwidth:
      Bandwidth generators MUST limit the relays' measured bandwidth based
      on the MaxAdvertisedBandwidth.
      A relay's MaxAdvertisedBandwidth limits the bandwidth-avg in its
      descriptor. bandwidth-avg is the minimum of MaxAdvertisedBandwidth,
      BandwidthRate, RelayBandwidthRate, BandwidthBurst, and
      RelayBandwidthBurst.
      Therefore, generators MUST limit a relay's measured bandwidth to its
      descriptor's bandwidth-avg. This limit needs to be implemented in the
      generator, because generators may scale consensus weights before
      sending them to Tor.
      Generators SHOULD NOT limit measured bandwidths based on descriptors'
      bandwidth-observed, because that penalises new relays.

      sbws limits the relay's measured bandwidth to the bandwidth-avg
      advertised.

      Torflow partitions relays based on their bandwidth. For unmeasured
      relays, Torflow uses the minimum of all descriptor bandwidths,
      including bandwidth-avg (MaxAdvertisedBandwidth) and
      bandwidth-observed. Then Torflow measures the relays in each partition
      against each other, which implicitly limits a relay's measured
      bandwidth to the bandwidths of similar relays.

      Torflow also generates consensus weights based on the ratio between the
      measured bandwidth and the minimum of all descriptor bandwidths (at the
      time of the measurement). So when an operator reduces the
      MaxAdvertisedBandwidth for a relay, Torflow reduces that relay's
      measured bandwidth.
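
      The bandwidth-avg limit and the minimum bandwidth of one can be
      sketched together as follows. This is an illustration in Python,
      not the sbws or Torflow implementation: the function name is
      hypothetical, inputs are assumed to be in bytes per second, a
      kilobyte is assumed to be 1000 bytes, and any scaling between
      generators is left out.

```python
def reported_bw(measured_bytes_per_second, desc_bw_avg_bytes_per_second):
    """Return a "bw" value in kilobytes per second for one relay."""
    # Limit the measured bandwidth to the descriptor's bandwidth-avg.
    limited = min(measured_bytes_per_second, desc_bw_avg_bytes_per_second)
    # Assumption: 1 kilobyte is 1000 bytes; the result is truncated.
    kilobytes_per_second = limited // 1000
    # Never produce a zero bandwidth; use one as the minimum.
    return max(1, kilobytes_per_second)
```

      For example, a relay measured at 5,000,000 bytes per second whose
      descriptor advertises a bandwidth-avg of 2,000,000 bytes per
      second is reported with bw=2000.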

    KeyValue

      [Zero or more times.]

      Future format versions may include additional KeyValue pairs on a
      RelayLine.
      Additional KeyValue pairs will be accompanied by a minor version
      increment.

      Implementations MAY add additional relay KeyValue pairs as needed.
      This specification SHOULD be updated to avoid conflicting meanings
      for the same Keywords.

      Parsers MUST NOT rely on the order of these additional KeyValue
      pairs.

      Additional KeyValue pairs MUST NOT use any keywords specified in the
      header format.
      If they do, the parser MAY ignore conflicting keywords.

2.4. Implementation details

2.4.1. Writing bandwidth files atomically

  To avoid inconsistent reads, implementations SHOULD write bandwidth files
  atomically. If the file is transferred from another host, it SHOULD be
  written to a temporary path, then renamed to the V3BandwidthsFile path.

  sbws versions 0.7.0 and later write the bandwidth file to an archival
  location, create a temporary symlink to that location, then atomically
  rename the symlink to the configured V3BandwidthsFile path.

  Torflow does not write bandwidth files atomically.
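
  The temporary-path-then-rename approach can be sketched as follows.
  This is a minimal Python illustration, not the sbws implementation
  (which uses a symlink, as described above). The function name is
  hypothetical. The temporary file is created in the destination
  directory so that the rename stays on one filesystem.

```python
import os
import tempfile


def write_bandwidth_file_atomically(path, content):
    """Write content to path so that readers never see a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # mkstemp creates the file with owner-only permissions; adjust them
    # if the reader runs as a different user.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".bwfile-")
    try:
        with os.fdopen(fd, "w", encoding="ascii") as tmp:
            tmp.write(content)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the data is on disk
        # Atomically replace the destination with the complete file.
        os.replace(tmp_path, path)
    except BaseException:
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

  A reader that opens the V3BandwidthsFile path at any moment sees
  either the previous complete file or the new complete file.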

2.4.2. Additional KeyValue pair definitions

  This section lists the KeyValue pairs in RelayLines that current
  implementations generate.

2.4.2.1. Simple Bandwidth Scanner

  sbws RelayLines contain these keys:

    "node_id=" hexdigest

      As above.

    "bw=" Bandwidth

      As above.

    "nick=" nickname

      [Exactly once.]

      The relay nickname.

      Torflow also has a "nick=" KeyValue.

    "rtt=" Int

      [Zero or one time.]

      The Round Trip Time in milliseconds to obtain 1 byte of data.

      This KeyValue was added in version 1.1.0 of this specification.
      It became optional in version 1.3.0 or 1.4.0 of this specification.

    "time=" DateTime

      [Exactly once.]

      The date and time, in ISO 8601 format and the UTC time zone, when
      the last bandwidth was obtained.

      This KeyValue was added in version 1.1.0 of this specification.
      The Torflow equivalent is "measured_at=".

    "success=" Int

      [Zero or one time.]

      The number of times that the bandwidth measurements for this relay were
      successful.

      This KeyValue was added in version 1.1.0 of this specification.

    "error_circ=" Int

      [Zero or one time.]

      The number of times that the bandwidth measurements for this relay
      failed because of circuit failures.

      This KeyValue was added in version 1.1.0 of this specification.
      The Torflow equivalent is "circ_fail=".

    "error_stream=" Int

      [Zero or one time.]

      The number of times that the bandwidth measurements for this relay
      failed because of stream failures.

      This KeyValue was added in version 1.1.0 of this specification.

    "error_destination=" Int

      [Zero or one time.]

      The number of times that the bandwidth measurements for this relay
      failed because the destination Web server was not available.

      This KeyValue was added in version 1.4.0 of this specification.

    "error_second_relay=" Int

      [Zero or one time.]

      The number of times that the bandwidth measurements for this relay
      failed because sbws could not find a second relay for the test circuit.

      This KeyValue was added in version 1.4.0 of this specification.

    "error_misc=" Int

      [Zero or one time.]

      The number of times that the bandwidth measurements for this relay
      failed because of other reasons.

      This KeyValue was added in version 1.1.0 of this specification.

    "bw_mean=" Int

      [Zero or one time.]

      The measured bandwidth mean for this relay in bytes per second.

      This KeyValue was added in version 1.2.0 of this specification.

    "bw_median=" Int

      [Zero or one time.]

      The measured bandwidth median for this relay in bytes per second.

      This KeyValue was added in version 1.2.0 of this specification.

    "desc_bw_avg=" Int

      [Zero or one time.]

      The descriptor average bandwidth for this relay in bytes per second.

      This KeyValue was added in version 1.2.0 of this specification.

    "desc_bw_obs_last=" Int

      [Zero or one time.]

      The last descriptor observed bandwidth for this relay in bytes per
      second.

      This KeyValue was added in version 1.2.0 of this specification.

    "desc_bw_obs_mean=" Int

      [Zero or one time.]

      The descriptor observed bandwidth mean for this relay in bytes per
      second.

      This KeyValue was added in version 1.2.0 of this specification.

    "desc_bw_bur=" Int

      [Zero or one time.]

      The descriptor burst bandwidth for this relay in bytes per
      second.

      This KeyValue was added in version 1.2.0 of this specification.

    "consensus_bandwidth=" Int

      [Zero or one time.]

      The consensus bandwidth for this relay in bytes per second.

      This KeyValue was added in version 1.2.0 of this specification.

    "consensus_bandwidth_is_unmeasured=" Bool

      [Zero or one time.]

      True if the consensus bandwidth for this relay was not obtained
      from three or more bandwidth authorities, and False otherwise.

      This KeyValue was added in version 1.2.0 of this specification.

    "relay_in_recent_consensus_count=" Int

      [Zero or one time.]

      The number of times this relay was found in a consensus in the
      last data_period days. (Unless otherwise stated, data_period is
      5 by default.)

      This KeyValue was added in version 1.4.0 of this specification.

    "relay_recent_priority_list_count=" Int

      [Zero or one time.]

      The number of times this relay has been prioritized to be measured
      in the last data_period days.

      This KeyValue was added in version 1.4.0 of this specification.

    "relay_recent_measurement_attempt_count=" Int

      [Zero or one time.]

      The number of times a measurement of this relay was attempted in
      the last data_period days.

      This KeyValue was added in version 1.4.0 of this specification.

    "relay_recent_measurement_failure_count=" Int

      [Zero or one time.]

      The number of times a measurement of this relay was attempted in
      the last data_period days, but it was not possible to obtain a
      measurement.

      This KeyValue was added in version 1.4.0 of this specification.

    "relay_recent_measurements_excluded_error_count=" Int

      [Zero or one time.]

      The number of recent relay measurement attempts that failed.
      Measurements are recent if they are in the last data_period days
      (5 by default).

      (See the note in section 1.4, version 1.4.0, about excluded relays.)

      This KeyValue was added in version 1.4.0 of this specification.

    "relay_recent_measurements_excluded_near_count=" Int

      [Zero or one time.]

      When all of a relay's recent successful measurements were performed in
      a period of time that was too short (by default 1 day), the relay is
      excluded. This KeyValue contains the number of recent successful
      measurements for the relay that were ignored for this reason.

      (See the note in section 1.4, version 1.4.0, about excluded relays.)

      This KeyValue was added in version 1.4.0 of this specification.

    "relay_recent_measurements_excluded_old_count=" Int

      [Zero or one time.]

      The number of successful measurements for this relay that are too old
      (more than data_period days, 5 by default).

      Excludes measurements that are already counted in
      relay_recent_measurements_excluded_near_count.

      (See the note in section 1.4, version 1.4.0, about excluded relays.)

      This KeyValue was added in version 1.4.0 of this specification.

    "relay_recent_measurements_excluded_few_count=" Int

      [Zero or one time.]

      The number of successful measurements for this relay that were ignored
      because the relay did not have enough successful measurements (fewer
      than 2, by default).

      Excludes measurements that are already counted in
      relay_recent_measurements_excluded_near_count or
      relay_recent_measurements_excluded_old_count.

      (See the note in section 1.4, version 1.4.0, about excluded relays.)

      This KeyValue was added in version 1.4.0 of this specification.

    "under_min_report=" bool

      [Zero or one time.]

      If the value is 1, there are not enough eligible relays in the
      bandwidth file, and Tor bandwidth authorities MAY NOT vote on this
      relay. (Current Tor versions do not change their behaviour based on
      the "under_min_report" key.)

      If the value is 0 or the KeyValue is not present, there are enough
      relays in the bandwidth file.

      Because Tor versions released before April 2019 (see section 1.4. for
      the full list of versions) ignore "vote=0", generator implementations
      MUST NOT change the bandwidths for under_min_report relays. Using the
      same bw value makes authorities that do not understand "vote=0"
      or "under_min_report=1" produce votes that don't change relay weights
      too much. It also avoids flapping when the reporting threshold is
      reached.

      This KeyValue was added in version 1.4.0 of this specification.

    "unmeasured=" bool

      [Zero or one time.]

      If the value is 1, this relay was not successfully measured and
      Tor bandwidth authorities MAY NOT vote on this relay.
      (Current Tor versions do not change their behaviour based on
      the "unmeasured" key.)

      If the value is 0 or the KeyValue is not present, this relay
      was successfully measured.

      Because Tor versions released before April 2019 (see section 1.4. for
      the full list of versions) ignore "vote=0", generator implementations
      MUST set "bw=1" for unmeasured relays. Using the minimum bw value
      makes authorities that do not understand "vote=0" or "unmeasured=1"
      produce votes that don't change relay weights too much.

      This KeyValue was added in version 1.4.0 of this specification.

    "vote=" bool

      [Zero or one time.]

      If the value is 0, Tor directory authorities SHOULD ignore the relay's
      entry in the bandwidth file. They SHOULD vote for the relay the same
      way they would vote for a relay that is not present in the file.

      This MAY be the case when this relay was not successfully measured
      but it is included in the Bandwidth File, to diagnose why it was not
      measured.

      If the value is 1 or the KeyValue is not present, Tor directory
      authorities MUST use the relay's bw value in any votes for that relay.

      Implementations MUST also set "bw=1" for unmeasured relays.
      But they MUST NOT change the bw for under_min_report relays.
      (See the explanations under "unmeasured" and "under_min_report"
      for more details.)

      This KeyValue was added in version 1.4.0 of this specification.
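
      As an illustration of the "vote" rules above, here is a minimal
      Python sketch of how a consumer might decide whether to use a
      relay's bw value. The names parse_relay_line and should_use_bw are
      hypothetical and not part of Tor or sbws. The sketch assumes that
      KeyValues are separated by spaces and that values contain no
      spaces, as in the sample data in appendix A.

```python
def parse_relay_line(line):
    """Split a RelayLine into a dict mapping each key to its value string."""
    pairs = {}
    for item in line.split():
        # Split on the first "=" only, because values may contain "=".
        key, _, value = item.partition("=")
        pairs[key] = value
    return pairs


def should_use_bw(pairs):
    """Return True if the relay's bw value should be used when voting.

    A missing "vote" KeyValue is treated the same as "vote=1".
    """
    return pairs.get("vote", "1") != "0"
```

      For example, a line containing "unmeasured=1 vote=0" yields False,
      and a line without a "vote" KeyValue yields True.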

2.4.2.2. Torflow

  Torflow RelayLines include node_id and bw, and other KeyValue pairs [2].

References:

1. https://gitweb.torproject.org/torflow.git
2. https://gitweb.torproject.org/torflow.git/tree/NetworkScanners/BwAuthority/README.spec.txt#n332
   The Torflow specification is outdated, and does not match the current
   implementation. See section A.1. for the format produced by Torflow.
3. https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt
4. https://gitweb.torproject.org/torspec.git/tree/version-spec.txt
5. https://semver.org/

A. Sample data

The following has not been obtained from any real measurement.

A.1. Generated by Torflow

This is an example version 1.0.0 document:

1523911758
node_id=$68A483E05A2ABDCA6DA5A3EF8DB5177638A27F80 bw=760 nick=Test measured_at=1523911725 updated_at=1523911725 pid_error=4.11374090719 pid_error_sum=4.11374090719 pid_bw=57136645 pid_delta=2.12168374577 circ_fail=0.2 scanner=/filepath
node_id=$96C15995F30895689291F455587BD94CA427B6FC bw=189 nick=Test2 measured_at=1523911623 updated_at=1523911623 pid_error=3.96703337994 pid_error_sum=3.96703337994 pid_bw=47422125 pid_delta=2.65469736988 circ_fail=0.0 scanner=/filepath

A.2. Generated by sbws version 0.1.0

1523911758
version=1.1.0
software=sbws
software_version=0.1.0
latest_bandwidth=2018-04-16T20:49:18
file_created=2018-04-16T21:49:18
generator_started=2018-04-16T15:13:25
earliest_bandwidth=2018-04-16T15:13:26
====
bw=380 error_circ=0 error_misc=0 error_stream=1 master_key_ed25519=YaqV4vbvPYKucElk297eVdNArDz9HtIwUoIeo0+cVIpQ nick=Test node_id=$68A483E05A2ABDCA6DA5A3EF8DB5177638A27F80 rtt=380 success=1 time=2018-05-08T16:13:26
bw=189 error_circ=0 error_misc=0 error_stream=0 master_key_ed25519=a6a+dZadrQBtfSbmQkP7j2ardCmLnm5NJ4ZzkvDxbo0I nick=Test2 node_id=$96C15995F30895689291F455587BD94CA427B6FC rtt=378 success=1 time=2018-05-08T16:13:36

A.3. Generated by sbws version 1.0.3

1523911758
version=1.2.0
latest_bandwidth=2018-04-16T20:49:18
file_created=2018-04-16T21:49:18
generator_started=2018-04-16T15:13:25
earliest_bandwidth=2018-04-16T15:13:26
minimum_number_eligible_relays=3862
minimum_percent_eligible_relays=60
number_consensus_relays=6436
number_eligible_relays=6000
percent_eligible_relays=93
software=sbws
software_version=1.0.3
=====
bw=38000 bw_mean=1127824 bw_median=1180062 desc_bw_avg=1073741824 desc_bw_obs_last=17230879 desc_bw_obs_mean=14732306 error_circ=0 error_misc=0 error_stream=1 master_key_ed25519=YaqV4vbvPYKucElk297eVdNArDz9HtIwUoIeo0+cVIpQ nick=Test node_id=$68A483E05A2ABDCA6DA5A3EF8DB5177638A27F80 rtt=380 success=1 time=2018-05-08T16:13:26
bw=1 bw_mean=199162 bw_median=185675 desc_bw_avg=409600 desc_bw_obs_last=836165 desc_bw_obs_mean=858030 error_circ=0 error_misc=0 error_stream=0 master_key_ed25519=a6a+dZadrQBtfSbmQkP7j2ardCmLnm5NJ4ZzkvDxbo0I nick=Test2 node_id=$96C15995F30895689291F455587BD94CA427B6FC rtt=378 success=1 time=2018-05-08T16:13:36

A.3.1. When there are not enough eligible measured relays:

1540496079
version=1.2.0
earliest_bandwidth=2018-10-20T19:35:52
file_created=2018-10-25T19:35:03
generator_started=2018-10-25T11:42:56
latest_bandwidth=2018-10-25T19:34:39
minimum_number_eligible_relays=3862
minimum_percent_eligible_relays=60
number_consensus_relays=6436
number_eligible_relays=2960
percent_eligible_relays=46
software=sbws
software_version=1.0.3
=====

A.4. Headers generated by sbws version 1.0.4

1523911758
version=1.2.0
latest_bandwidth=2018-04-16T20:49:18
destinations_countries=TH,ZZ
file_created=2018-04-16T21:49:18
generator_started=2018-04-16T15:13:25
earliest_bandwidth=2018-04-16T15:13:26
minimum_number_eligible_relays=3862
minimum_percent_eligible_relays=60
number_consensus_relays=6436
number_eligible_relays=6000
percent_eligible_relays=93
scanner_country=SN
software=sbws
software_version=1.0.4
=====

A.5. Generated by sbws version 1.1.0

1523911758
version=1.4.0
latest_bandwidth=2018-04-16T20:49:18
destinations_countries=TH,ZZ
file_created=2018-04-16T21:49:18
generator_started=2018-04-16T15:13:25
earliest_bandwidth=2018-04-16T15:13:26
minimum_number_eligible_relays=3862
minimum_percent_eligible_relays=60
number_consensus_relays=6436
number_eligible_relays=6000
percent_eligible_relays=93
recent_measurement_attempt_count=6243
recent_measurement_failure_count=732
recent_measurements_excluded_error_count=969
recent_measurements_excluded_few_count=3946
recent_measurements_excluded_near_count=90
recent_measurements_excluded_old_count=0
recent_priority_list_count=20
recent_priority_relay_count=6243
scanner_country=SN
software=sbws
software_version=1.1.0
time_to_report_half_network=57273
=====
bw=1 error_circ=1 error_destination=0 error_misc=0 error_second_relay=0 error_stream=0 master_key_ed25519=J3HQ24kOQWac3L1xlFLp7gY91qkb5NuKxjj1BhDi+m8 nick=snap269 node_id=$DC4D609F95A52614D1E69C752168AF1FCAE0B05F relay_recent_measurement_attempt_count=3 relay_recent_measurements_excluded_error_count=1 relay_recent_measurements_excluded_near_count=3 relay_recent_consensus_count=3 relay_recent_priority_list_count=3 success=3 time=2019-03-16T18:20:57 unmeasured=1 vote=0
bw=1 error_circ=0 error_destination=0 error_misc=0 error_second_relay=0 error_stream=2 master_key_ed25519=h6ZB1E1yBFWIMloUm9IWwjgaPXEpL5cUbuoQDgdSDKg nick=relay node_id=$C4544F9E209A9A9B99591D548B3E2822236C0503 relay_recent_measurement_attempt_count=3 relay_recent_measurements_excluded_error_count=2 relay_recent_measurements_excluded_few_count=1 relay_recent_consensus_count=3 relay_recent_priority_list_count=3 success=1 time=2019-03-17T06:50:58 unmeasured=1 vote=0
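
The samples in A.2 to A.5 share the same layout: a timestamp line,
header KeyValue lines, a terminator line, and then RelayLines. A minimal
Python sketch of splitting such a document follows. The function name
split_bandwidth_file is hypothetical, and the sketch assumes that a
terminator line ("====" or "=====") is present, so it does not handle
version 1.0.0 documents such as the one in A.1.

```python
TERMINATORS = ("====", "=====")


def split_bandwidth_file(text):
    """Return (timestamp, headers, relay_lines) for a document with a terminator."""
    # Ignore empty lines, such as a trailing newline at the end of the file.
    lines = [ln for ln in text.splitlines() if ln.strip()]
    timestamp = int(lines[0])
    # Find the first terminator line; header lines come before it.
    for end, ln in enumerate(lines):
        if ln in TERMINATORS:
            break
    else:
        raise ValueError("no terminator line found")
    headers = dict(ln.split("=", 1) for ln in lines[1:end])
    relay_lines = lines[end + 1:]
    return timestamp, headers, relay_lines
```

With the document in A.3.1, this returns an empty list of RelayLines.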

B. Scaling bandwidths

B.1. Scaling requirements

  Tor accepts zero bandwidths, but they trigger bugs in older Tor
  implementations. Therefore, scaling methods SHOULD perform the
  following checks:
   * If the total bandwidth is zero, all relays should be given equal
     bandwidths.
   * If the scaled bandwidth is zero, it should be rounded up to one.
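
  The two checks above can be sketched in Python as follows. The
  function name apply_scaling_checks is hypothetical. The sketch assumes
  that bandwidths are non-negative integers, and it uses 1 as the equal
  value when the total is zero, which is an assumption: the text only
  requires that the values be equal.

```python
def apply_scaling_checks(scaled_bws):
    """Apply the zero-bandwidth checks to a list of scaled bandwidths."""
    if not scaled_bws:
        return []
    if sum(scaled_bws) == 0:
        # Total is zero: give every relay the same bandwidth.
        # The value 1 is an assumption made for this sketch.
        return [1] * len(scaled_bws)
    # Round any individual zero bandwidth up to one.
    return [max(1, bw) for bw in scaled_bws]
```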

  Initial experiments indicate that scaling may not be needed for
  torflow and sbws, because their measured bandwidths are similar
  enough already.

B.2. A linear scaling method

  If scaling is required, here is a simple linear bandwidth scaling
  method, which ensures that all bandwidth votes contain approximately
  the same total bandwidth:

  1. Calculate the relay quota by dividing the total measured bandwidth
     in all votes by the number of relays with measured bandwidth
     votes. In the public tor network, this is approximately 7500 as of
     April 2018. The quota should be a consensus parameter, so it can be
     adjusted for all generators on the network.

  2. Calculate a vote quota by multiplying the relay quota by the number
     of relays this bandwidth authority has measured
     bandwidths for.

  3. Calculate a scaling factor by dividing the vote quota by the
     total unscaled measured bandwidth in this bandwidth
     authority's upcoming vote.

  4. Multiply each unscaled measured bandwidth by the scaling
     factor.

  Now, the total scaled bandwidth in the upcoming vote is
  approximately equal to the vote quota.
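
  Steps 2 to 4 can be sketched in Python as follows. The function name
  linear_scale is hypothetical. The relay quota from step 1 is taken as
  a parameter, and its default of 7500 is only the approximate April
  2018 figure mentioned above. Raising an error for a zero total is a
  choice made for this sketch; see B.1 for how zero totals should be
  handled.

```python
def linear_scale(measured_bws, relay_quota=7500):
    """Scale one bandwidth authority's measured bandwidths linearly."""
    total = sum(measured_bws)
    if total == 0:
        # The scaling factor is undefined when the total is zero.
        raise ValueError("total measured bandwidth is zero; see B.1")
    # Step 2: the vote quota for this bandwidth authority.
    vote_quota = relay_quota * len(measured_bws)
    # Step 3: the scaling factor.
    factor = vote_quota / total
    # Step 4: scale each unscaled measured bandwidth.
    return [bw * factor for bw in measured_bws]
```

  The sum of the returned values equals relay_quota multiplied by the
  number of relays, up to floating point error. The results are not
  rounded, so the checks in B.1 still need to be applied afterwards.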

B.3. Quota changes

  If all generators are using scaling, the quota can be gradually
  reduced or increased as needed. Smaller quotas decrease the size
  of uncompressed consensuses, and may decrease the size of
  consensus diffs and compressed consensuses. But if the relay
  quota is too small, some relays may be over- or under-weighted.

B.4. Torflow aggregation

  Torflow implements two methods to compute the bandwidth values from the
  (stream) bandwidth measurements: with and without PID control feedback.
  The method described here is without PID control (see Torflow
  specification, section 2.2).

  In the following steps, the relays' measured bandwidths are the ones
  that this bandwidth authority has measured for the relays that would
  be included in its upcoming vote.

  1. Calculate the filtered bandwidth for each relay:
    - choose the relay's measurements (`bw_j`) that are equal to or
      greater than the mean of the measurements for this relay
    - calculate the mean of those measurements

    In pseudocode:

      bw_filt_i = mean({bw_j : bw_j >= mean(bw_j)})

  2. Calculate network averages:
    - calculate the filtered average by dividing the sum of all the
      relays' filtered bandwidths by the number of relays that have been
      measured (`n`), i.e., calculate the mean of the relays' filtered
      bandwidths.
    - calculate the stream average by dividing the sum of all the
      relays' measured bandwidths by the number of relays that have been
      measured (`n`), i.e., calculate the mean of the relays' measured
      bandwidths.

    In pseudocode:

      bw_avg_filt = sum(bw_filt_i) / n
      bw_avg_strm = sum(bw_i) / n

  3. Calculate ratios for each relay:
    - calculate the filtered ratio by dividing each relay filtered
      bandwidth by the filtered average
    - calculate the stream ratio by dividing each relay measured
      bandwidth by the stream average

    In pseudocode:

      r_filt_i = bw_filt_i / bw_avg_filt
      r_strm_i = bw_i / bw_avg_strm

  4. Calculate the final ratio for each relay:
    The final ratio is the larger of the filtered ratio and the stream
    ratio.

    In pseudocode:

      r_i = max(r_filt_i, r_strm_i)

  5. Calculate the scaled bandwidth for each relay:
    The most recent descriptor observed bandwidth (`bw_obs_i`) is
    multiplied by the final ratio.

    In pseudocode:

      bw_new_i = r_i * bw_obs_i

    <<In this way, the resulting network status consensus bandwidth
    values are effectively re-weighted proportional to how much faster
    the node was as compared to the rest of the network.>>
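
  The five steps above can be sketched in Python as follows. This is an
  illustration and not Torflow's code. The function name torflow_scale
  and the dictionary-based inputs are hypothetical. The sketch assumes
  that a relay's measured bandwidth (`bw_i`) is the mean of its
  measurements, and that every relay has at least one measurement and
  the network averages are non-zero.

```python
from statistics import mean


def torflow_scale(measurements, bw_obs):
    """Return a dict mapping each relay to its scaled bandwidth (bw_new_i).

    measurements: dict mapping a relay to its list of measurements (bw_j).
    bw_obs: dict mapping a relay to its most recent descriptor observed
        bandwidth (bw_obs_i).
    """
    # Assumption: bw_i is the mean of the relay's measurements.
    bw = {r: mean(m) for r, m in measurements.items()}
    # Step 1: mean of the measurements that are >= the relay's mean.
    bw_filt = {
        r: mean([x for x in m if x >= bw[r]])
        for r, m in measurements.items()
    }
    # Step 2: network averages over the n measured relays.
    n = len(measurements)
    bw_avg_filt = sum(bw_filt.values()) / n
    bw_avg_strm = sum(bw.values()) / n
    scaled = {}
    for r in measurements:
        # Step 3: ratios for this relay.
        r_filt = bw_filt[r] / bw_avg_filt
        r_strm = bw[r] / bw_avg_strm
        # Step 4: the final ratio is the larger of the two.
        ratio = max(r_filt, r_strm)
        # Step 5: scale the descriptor observed bandwidth.
        scaled[r] = ratio * bw_obs[r]
    return scaled
```

  For example, with measurements {"a": [10, 30], "b": [20, 20]}, relay
  "a" has a filtered bandwidth of 30 and relay "b" has 20, so the
  filtered average is 25 and the stream average is 20. The final ratios
  are then 1.2 for "a" and 1.0 for "b".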