summaryrefslogtreecommitdiff
path: root/doc/HACKING/CircuitPaddingDevelopment.md
blob: a0dee68e919b0944ba9b43dcce9658e211113746 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
1001
1002
1003
1004
1005
1006
1007
1008
1009
1010
1011
1012
1013
1014
1015
1016
1017
1018
1019
1020
1021
1022
1023
1024
1025
1026
1027
1028
1029
1030
1031
1032
1033
1034
1035
1036
1037
1038
1039
1040
1041
1042
1043
1044
1045
1046
1047
1048
1049
1050
1051
1052
1053
1054
1055
1056
1057
1058
1059
1060
1061
1062
1063
1064
1065
1066
1067
1068
1069
1070
1071
1072
1073
1074
1075
1076
1077
1078
1079
1080
1081
1082
1083
1084
1085
1086
1087
1088
1089
1090
1091
1092
1093
1094
1095
1096
1097
1098
1099
1100
1101
1102
1103
1104
1105
1106
1107
1108
1109
1110
1111
1112
1113
1114
1115
1116
1117
1118
1119
1120
1121
1122
1123
1124
1125
1126
1127
1128
1129
1130
1131
1132
1133
1134
1135
1136
1137
1138
1139
1140
1141
1142
1143
1144
1145
1146
1147
1148
1149
1150
1151
1152
1153
1154
1155
1156
1157
1158
1159
1160
1161
1162
1163
1164
1165
1166
1167
1168
1169
1170
1171
1172
1173
1174
1175
1176
1177
1178
1179
1180
1181
1182
1183
1184
1185
1186
1187
1188
1189
1190
1191
1192
1193
1194
1195
1196
1197
1198
1199
1200
1201
1202
1203
1204
1205
1206
1207
1208
1209
1210
1211
1212
1213
1214
1215
1216
1217
1218
1219
1220
1221
1222
1223
1224
1225
1226
1227
1228
1229
1230
# Circuit Padding Developer Documentation

This document is written for researchers who wish to prototype and evaluate circuit-level padding defenses in Tor.

Written by Mike Perry and George Kadianakis.

# Table of Contents

- [0. Background](#0-background)
- [1. Introduction](#1-introduction)
    - [1.1. System Overview](#11-system-overview)
    - [1.2. Layering Model](#12-layering-model)
    - [1.3. Computation Model](#13-computation-model)
    - [1.4. Deployment Constraints](#14-other-deployment-constraints)
- [2. Creating New Padding Machines](#2-creating-new-padding-machines)
    - [2.1. Registering a New Padding Machine](#21-registering-a-new-padding-machine)
    - [2.2. Machine Activation and Shutdown](#22-machine-activation-and-shutdown)
- [3. Specifying Padding Machines](#3-specifying-padding-machines)
    - [3.1. Padding Machine States](#31-padding-machine-states)
    - [3.2. Padding Machine State Transitions](#32-padding-machine-state-transitions)
    - [3.3. Specifying Per-State Padding](#33-specifying-per-state-padding)
    - [3.4. Specifying Precise Cell Counts](#34-specifying-precise-cell-counts)
    - [3.5. Specifying Overhead Limits](#35-specifying-overhead-limits)
- [4. Evaluating Padding Machines](#4-evaluating-padding-machines)
    - [4.1. Pure Simulation](#41-pure-simulation)
    - [4.2. Testing in Chutney](#42-testing-in-chutney)
    - [4.3. Testing in Shadow](#43-testing-in-shadow)
    - [4.4. Testing on the Live Network](#44-testing-on-the-live-network)
- [5. Example Padding Machines](#5-example-padding-machines)
    - [5.1. Deployed Circuit Setup Machines](#51-deployed-circuit-setup-machines)
    - [5.2. Adaptive Padding Early](#52-adaptive-padding-early)
    - [5.3. Sketch of Tamaraw](#53-sketch-of-tamaraw)
    - [5.4. Other Padding Machines](#54-other-padding-machines)
- [6. Framework Implementation Details](#6-framework-implementation-details)
    - [6.1. Memory Allocation Conventions](#61-memory-allocation-conventions)
    - [6.2. Machine Application Events](#62-machine-application-events)
    - [6.3. Internal Machine Events](#63-internal-machine-events)
- [7. Future Features and Optimizations](#7-future-features-and-optimizations)
    - [7.1. Load Balancing and Flow Control](#71-load-balancing-and-flow-control)
    - [7.2. Timing and Queuing Optimizations](#72-timing-and-queuing-optimizations)
    - [7.3. Better Machine Negotiation](#73-better-machine-negotiation)
    - [7.4. Probabilistic State Transitions](#74-probabilistic-state-transitions)
    - [7.5. More Complex Pattern Recognition](#75-more-complex-pattern-recognition)
- [8. Open Research Problems](#8-open-research-problems)
    - [8.1. Onion Service Circuit Setup](#81-onion-service-circuit-setup)
    - [8.2. Onion Service Fingerprinting](#82-onion-service-fingerprinting)
    - [8.3. Open World Fingerprinting](#83-open-world-fingerprinting)
    - [8.4. Protocol Usage Fingerprinting](#84-protocol-usage-fingerprinting)
    - [8.5. Datagram Transport Side Channels](#85-datagram-transport-side-channels)
- [9. Must Read Papers](#9-must-read-papers)


## 0. Background

Tor supports both connection-level and circuit-level padding, and both
systems are live on the network today. The connection-level padding behavior
is described in [section 2 of
padding-spec.txt](https://github.com/torproject/torspec/blob/master/padding-spec.txt#L47). The
circuit-level padding behavior is described in [section 3 of
padding-spec.txt](https://github.com/torproject/torspec/blob/master/padding-spec.txt#L282).

These two systems are orthogonal and should not be confused. The
connection-level padding system is only active while the TLS connection is
otherwise idle. Moreover, it regards circuit-level padding as normal data
traffic, and hence while the circuit-level padding system is actively padding,
the connection-level padding system will not add any additional overhead.

While the currently deployed circuit-level padding behavior is quite simple,
it is built on a flexible framework. This framework supports the description
of event-driven finite state machine by filling in fields of a simple C
structure, and is designed to support any delay-free statistically shaped
cover traffic on individual circuits, with cover traffic flowing to and from a
node of the implementor's choice (Guard, Middle, Exit, Rendezvous, etc).

This class of system was first proposed in
[Timing analysis in low-latency mix networks: attacks and defenses](https://www.freehaven.net/anonbib/cache/ShWa-Timing06.pdf)
by Shmatikov and Wang, and extended for the website traffic fingerprinting
domain by Juarez et al. in
[Toward an Efficient Website Fingerprinting Defense](http://arxiv.org/pdf/1512.00524). The
framework also supports fixed parameterized probability distributions, as
used in [APE](https://www.cs.kau.se/pulls/hot/thebasketcase-ape/) by Tobias
Pulls, and many other features.

This document describes how to use Tor's circuit padding framework to
implement and deploy novel delay-free cover traffic defenses.

## 1. Introduction

The circuit padding framework is the official way to implement padding
defenses in Tor. It may be used in combination with application-layer
defenses, and/or obfuscation defenses, or on its own.

Its current design should be enough to deploy most defenses without
modification, but you can extend it to [provide new
features](#7-future-features-and-optimizations) as well.

### 1.1. System Overview

Circuit-level padding can occur between Tor clients and relays at any hop of
one of the client's circuits. Both parties need to support the same padding
mechanisms for the system to work, and the client must enable it.

We added a padding negotiation relay cell to the Tor protocol that clients use
to ask a relay to start padding, as well as a torrc directive for researchers
to pin their clients' relay selection to the subset of Tor nodes that
implement their custom defenses, to support ethical live network testing and
evaluation.

Circuit-level padding is performed by 'padding machines'. A padding machine is
a finite state machine. Every state specifies a different form of
padding style, or stage of padding, in terms of inter-packet timings and total
packet counts.

Padding state machines are specified by filling in fields of a C structure,
which specifies the transitions between padding states based on various events,
probability distributions of inter-packet delays, and the conditions under
which padding machines should be applied to circuits.

This compact C structure representation is designed to function as a
microlanguage, which can be compiled down into a
bitstring that [can be tuned](#13-computation-model) using various
optimization methods (such as gradient descent, GAs, or GANs), either in
bitstring form or C struct form.

The event driven, self-contained nature of this framework is also designed to
make [evaluation](#4-evaluating-padding-machines) both expedient and rigorously
reproducible.

This document covers the engineering steps to write, test, and deploy a
padding machine, as well as how to extend the framework to support new machine
features.

If you prefer to learn by example, you may want to skip to either the
[QuickStart Guide](CircuitPaddingQuickStart.md), and/or [Section
5](#5-example-padding-machines) for example machines to get you up and running
quickly.

### 1.2. Layering Model

The circuit padding framework is designed to provide one layer in a layered
system of interchangeable components.

The circuit padding framework operates at the Tor circuit layer. It only deals
with the inter-cell timings and quantity of cells sent on a circuit. It can
insert cells on a circuit in arbitrary patterns, and in response to arbitrary
conditions, but it cannot delay cells. It also does not deal with packet
sizes, how cells are packed into TLS records, or ways that the Tor protocol
might be recognized on the wire.

The problem of differentiating Tor traffic from non-Tor traffic based on
TCP/TLS packet sizes, initial handshake patterns, and DPI characteristics is the
domain of [pluggable
transports](https://trac.torproject.org/projects/tor/wiki/doc/AChildsGardenOfPluggableTransports),
which may optionally be used in conjunction with this framework (or without
it).

This document focuses primarily on the circuit padding framework's cover
traffic features, and will only briefly touch on the potential obfuscation and
application layer coupling points of the framework. Explicit layer coupling 
points can be created by adding either new [machine application
events](#62-machine-application-events) or new [internal machine
events](#63-internal-machine-events) to the circuit padding framework, so that
your padding machines can react to events from other layers.

### 1.3. Computation Model

The circuit padding framework is designed to support succinctly specified
defenses that can be tuned through [computer-assisted
optimization](#4-evaluating-padding-machines).

We chose to generalize the original [Adaptive Padding 2-state
design](https://www.freehaven.net/anonbib/cache/ShWa-Timing06.pdf) into an
event-driven state machine because state machines are the simplest form of
sequence recognition devices from [automata
theory](https://en.wikipedia.org/wiki/Finite-state_machine).

Most importantly: this framing allows cover traffic defenses to be modeled as
an optimization problem search space, expressed as fields of a C structure
(which is simultaneously a compact opaque bitstring as well as a symbolic
vector in an abstract feature space). This kind of space is particularly well
suited to search by gradient descent, GAs, and GANs. 

When performing this optimization search, each padding machine should have a
fitness function, which will allow two padding machines to be compared for
relative effectiveness. Optimization searches work best if this fitness can be
represented as a single number, for example the total amount by which it
reduces the [Balanced
Accuracy](https://en.wikipedia.org/wiki/Precision_and_recall#Imbalanced_Data)
of an adversary's classifier, divided by an amount of traffic overhead. 

Before you begin the optimization phase for your defense, you should
also carefully consider the [features and
optimizations](#7-future-features-and-optimizations) that we suspect will be
useful, and also see if you can come up with any more. You should similarly be
sure to restrict your search space to avoid areas of the bitstring/feature
vector that you are sure you will not need. For example, some
[applications](#8-open-research-problems) may not need the histogram
accounting used by Adaptive Padding, but might need to add other forms of
[pattern recognition](#75-more-complex-pattern-recognition) to react to
sequences that resemble HTTP GET and HTTP POST.

### 1.4. Other Deployment Constraints

The framework has some limitations that are the result of deliberate
choices. We are unlikely to deploy defenses that ignore these limitations.

In particular, we have deliberately not provided any mechanism to delay actual
user traffic, even though we are keenly aware that if we were to support
additional delay, defenses would be able to have [more success with less
bandwidth
overhead](https://freedom.cs.purdue.edu/anonymity/trilemma/index.html).

In the website traffic fingerprinting domain, [provably optimal
defenses](https://www.cypherpunks.ca/~iang/pubs/webfingerprint-ccs14.pdf)
achieve their bandwidth overhead bounds by ensuring that a non-empty queue is
maintained, by rate limiting traffic below the actual throughput of a circuit.
For optimal results, this queue must avoid draining to empty, and yet it
must also be drained fast enough to avoid tremendous queue overhead in fast
Tor relays, which carry hundreds of thousands of circuits simultaneously.

Unfortunately, Tor's end-to-end flow control is not congestion control. Its
window sizes are currently fixed. This means there is no signal when queuing
occurs, and thus no ability to limit queue size through pushback. This means
there is currently no way to do the fine-grained queue management necessary to
create such a queue and rate limit traffic effectively enough to keep this
queue from draining to empty, without also risking that aggregate queuing
would cause out-of-memory conditions on fast relays.

It may be possible to create a congestion control algorithm that can support
such fine grained queue management, but this is a [deeply unsolved area of
research](https://lists.torproject.org/pipermail/tor-dev/2018-November/013562.html).

Even beyond these major technical hurdles, additional latency is also
unappealing to the wider Internet community, for the simple reason that
bandwidth [continues to increase
exponentially](https://ipcarrier.blogspot.com/2014/02/bandwidth-growth-nearly-what-one-would.html)
where as the speed of light is fixed. Significant engineering effort has been
devoted to optimizations that reduce the effect of latency on Internet
protocols. To go against this trend would ensure our irrelevance to the wider
conversation about traffic analysis defenses for low latency Internet protocols.

On the other hand, through [load
balancing](https://gitweb.torproject.org/torspec.git/tree/proposals/265-load-balancing-with-overhead.txt)
and [circuit multiplexing strategies](https://bugs.torproject.org/29494), we
believe it is possible to add significant bandwidth overhead in the form of
cover traffic, without significantly impacting end-user performance.

For these reasons, we believe the trade-off should be in favor of adding more
cover traffic, rather than imposing queuing memory overhead and queuing delay.

As a last resort for narrowly scoped application domains (such as
shaping Tor service-side onion service traffic to look like other websites or
different application-layer protocols), delay *may* be added at the
[application layer](https://petsymposium.org/2017/papers/issue2/paper54-2017-2-source.pdf).
Any additional cover traffic required by such defenses should still be
added at the circuit padding layer using this framework, to provide
engineering efficiency through loose layer coupling and component re-use, as
well as to provide additional gains against [low
resolution](https://github.com/torproject/torspec/blob/master/padding-spec.txt#L47)
end-to-end traffic correlation.

Because such delay-based defenses will impact performance significantly more
than simply adding cover traffic, they must be optional, and negotiated by
only specific application layer endpoints that want them. This will have
consequences for anonymity sets and base rates, if such traffic shaping and
additional cover traffic is not very carefully constructed.

In terms of acceptable overhead, because Tor onion services
[currently use](https://metrics.torproject.org/hidserv-rend-relayed-cells.html)
less than 1% of the
[total consumed bandwidth](https://metrics.torproject.org/bandwidth-flags.html)
of the Tor network, and because onion services exist to provide higher
security as compared to Tor Exit traffic, they are an attractive target for
higher-overhead defenses. We encourage researchers to target this use case
for defenses that require more overhead, and/or for the deployment of
optional negotiated application-layer delays on either the server or the
client side.

## 2. Creating New Padding Machines

This section explains how to use the existing mechanisms in Tor to define a
new circuit padding machine.  We assume here that you know C, and are at
least somewhat familiar with Tor development.  For more information on Tor
development in general, see the other files in doc/HACKING/ in a recent Tor
distribution.

Again, if you prefer to learn by example, you may want to skip to either the
[QuickStart Guide](CircuitPaddingQuickStart.md), and/or [Section
5](#5-example-padding-machines) for example machines to get up and running
quickly.

To create a new padding machine, you must:

  1. Define your machine using the fields of a heap-allocated
     `circpad_machine_spec_t` C structure.

  2. Register this object in the global list of available padding machines,
     using `circpad_register_padding_machine()`.

  3. Ensure that your machine is properly negotiated under your desired
     circuit conditions.

### 2.1. Registering a New Padding Machine

Again, a circuit padding machine is designed to be specified entirely as a [single
C structure](#13-computation-model).

Your machine definitions should go into their own functions in
[circuitpadding_machines.c](https://github.com/torproject/tor/blob/master/src/core/or/circuitpadding_machines.c). For
details on all of the fields involved in specifying a padding machine, see
[Section 3](#3-specifying-padding-machines).

You must register your machine in `circpad_machines_init()` in
[circuitpadding.c](https://github.com/torproject/tor/blob/master/src/core/or/circuitpadding.c). To
add a new padding machine specification, you must allocate a
`circpad_machine_spec_t` on the heap with `tor_malloc_zero()`, give it a
human readable name string, and a machine number equivalent to the number of
machines in the list, and register the structure using
`circpad_register_padding_machine()`.

Each machine must have a client instance and a relay instance. Register your
client-side machine instance in the `origin_padding_machines` list, and your
relay side machine instance in the `relay_padding_machines` list. Once you
have registered your instance, you do not need to worry about deallocation;
this is handled for you automatically.

Both machine lists use registration order to signal machine precedence for a
given `machine_idx` slot on a circuit. This means that machines that are
registered last are checked for activation *before* machines that are
registered first. (This reverse precedence ordering allows us to
deprecate older machines simply by adding new ones after them.)

### 2.2. Machine Activation and Shutdown

After a machine has been successfully registered with the framework, it will
be instantiated on any client-side circuits that support it. Only client-side
circuits may initiate padding machines, but either clients or relays may shut
down padding machines.

#### 2.2.1. Machine Application Conditions

The
[circpad_machine_conditions_t conditions field](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L641)
of your `circpad_machine_spec_t` machine definition instance controls the
conditions under which your machine will be attached and enabled on a Tor
circuit, and when it gets shut down.

*All* of your explicitly specified conditions in
`circpad_machine_spec_t.conditions` *must* be met for the machine to be
applied to a circuit. If *any* condition ceases to be met, then the machine
is shut down.  (This is checked on every event that arrives, even if the
condition is unrelated to the event.)
Another way to look at this is that
all specified conditions must evaluate to true for the entire duration that
your machine is running. If any are false, your machine does not run (or
stops running and shuts down).

In particular, as part of the
[circpad_machine_conditions_t structure](https://github.com/torproject/tor/blob/master/src/core/or/circuitpadding.h#L149),
the circuit padding subsystem gives the developer the option to enable a
machine based on:
  - The
    [length](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L157)
    on the circuit (via the `min_hops` field).
  - The
    [current state](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L174)
    of the circuit, such as streams, relay_early, etc. (via the
    `circpad_circuit_state_t state_mask` field).
  - The
    [purpose](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L178)
    (i.e. type) of the circuit (via the `circpad_purpose_mask_t purpose_mask`
    field).

This condition mechanism is the preferred way to determine if a machine should
apply to a circuit. For information about potentially useful conditions that
we have considered but have not yet implemented, see [Section
7.3](#73-better-machine-negotiation). We will happily accept patches for those
conditions, or any for other additional conditions that are needed for your
use case.

#### 2.2.2. Detecting and Negotiating Machine Support

When a new machine specification is added to Tor (or removed from Tor), you
should bump the Padding subprotocol version in `src/core/or/protover.c` and
`src/rust/protover/protover.rs`, add a field to `protover_summary_flags_t` in
`or.h`, and set this field in `memoize_protover_summary()` in versions.c. This
new field must then be checked in `circpad_node_supports_padding()` in
`circuitpadding.c`.

Note that this protocol version update and associated support check is not
necessary if your experiments will *only* be using your own relays that
support your own padding machines. This can be accomplished by using the
`MiddleNodes` directive; see [Section 4](#4-evaluating-padding-machines) for more information.

If the protocol support check passes for the circuit, then the client sends a
`RELAY_COMMAND_PADDING_NEGOTIATE` cell towards the
`circpad_machine_spec_t.target_hop` relay, and immediately enables the
padding machine, and may begin sending padding. (The framework does not wait
for the `RELAY_COMMAND_PADDING_NEGOTIATED` response to begin padding,
so that we can
switch between machines rapidly.)

#### 2.2.3. Machine Shutdown Mechanisms

Padding machines can be shut down on a circuit in three main ways:
  1. During a `circpad_machine_event` callback, when
     `circpad_machine_spec_t.conditions` no longer applies (client side)
  2. After a transition to the CIRCPAD_STATE_END, if
     `circpad_machine_spec_t.should_negotiate_end` is set (client or relay
     side)
  3. If there is a `RELAY_COMMAND_PADDING_NEGOTIATED` error response from the
     relay during negotiation.

Each of these cases causes the originating node to send a relay cell towards
the other side, indicating that shutdown has occurred. The client side sends
`RELAY_COMMAND_PADDING_NEGOTIATE`, and the relay side sends
`RELAY_COMMAND_PADDING_NEGOTIATED`.

Because padding from malicious exit nodes can be used to construct active
timing-based side channels to malicious guard nodes, the client checks that
padding-related cells only come from relays with active padding machines.
For this reason, when a client decides to shut down a padding machine,
the framework frees the mutable `circuit_t.padding_info`, but leaves the
`circuit_t.padding_machine` pointer set until the
`RELAY_COMMAND_PADDING_NEGOTIATED` response comes back, to ensure that any
remaining in-flight padding packets are recognized a valid. Tor does
not yet close circuits due to violation of this property, but the
[vanguards addon component "bandguard"](https://github.com/mikeperry-tor/vanguards/blob/master/README_TECHNICAL.md#the-bandguards-subsystem)
does.

As an optimization, a client may replace a machine with another, by
sending a `RELAY_COMMAND_PADDING_NEGOTIATE` cell to shut down a machine, and
immediately sending a `RELAY_COMMAND_PADDING_NEGOTIATE` to start a new machine
in the same index, without waiting for the response from the first negotiate
cell.

Unfortunately, there is a known bug as a consequence of this optimization. If
your machine depends on repeated shutdown and restart of the same machine
number on the same circuit, please see [Bug
30922](https://bugs.torproject.org/30992). Depending on your use case, we may
need to fix that bug or help you find a workaround. See also [Section
6.1.3](#613-deallocation-and-shutdown) for some more technical details on this
mechanism.


## 3. Specifying Padding Machines

By now, you should understand how to register, negotiate, and control the
lifetime of your padding machine, but you still don't know how to make it do
anything yet. This section should help you understand how to specify how your
machine reacts to events and adds padding to the wire.

If you prefer to learn by example first instead, you may wish to skip to
[Section 5](#5-example-padding-machines).


A padding machine is specified by filling in an instance of
[circpad_machine_spec_t](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L605). Instances
of this structure specify the precise functionality of a machine: it's
what the circuit padding developer is called to write. These instances
are created only at startup, and are referenced via `const` pointers during
normal operation.

In this section we will go through the most important elements of this
structure.

### 3.1. Padding Machine States

A padding machine is a finite state machine where each state
specifies a different style of padding.

As an example of a simple padding machine, you could have a state machine
with the following states: `[START] -> [SETUP] -> [HTTP] -> [END]` where the
`[SETUP]` state pads in a way that obfuscates the ''circuit setup'' of Tor,
and the `[HTTP]` state pads in a way that emulates a simple HTTP session. Of
course, padding machines can be more complicated than that, with dozens of
states and non-trivial transitions.

Padding developers encode the machine states in the
[circpad_machine_spec_t structure](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L655). Each
machine state is described by a
[circpad_state_t structure](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L273)
and each such structure specifies the style and amount of padding to be sent,
as well as the possible state transitions.

The function `circpad_machine_states_init()` must be used for allocating and
initializing the `circpad_machine_spec_t.states` array before states and
state transitions can be defined, as some of the state object has non-zero
default values.

### 3.2. Padding Machine State Transitions

As described above, padding machines can have multiple states, to
support different forms of padding. Machines can transition between
states based on events that occur either on the circuit level or on
the machine level.

State transitions are specified using the
[next_state field](https://github.com/torproject/tor/blob/master/src/core/or/circuitpadding.h#L381)
of the `circpad_state_t` structure. As a simple example, to transition
from state `A` to state `B` when event `E` occurs, you would use the
following code: `A.next_state[E] = B`.

#### 3.2.1. State Transition Events

Here we will go through
[the various events](https://github.com/torproject/tor/blob/master/src/core/or/circuitpadding.h#L30)
that can be used to transition between states:

* Circuit-level events
  * `CIRCPAD_EVENT_NONPADDING_RECV`: A non-padding cell is received
  * `CIRCPAD_EVENT_NONPADDING_SENT`: A non-adding cell is sent
  * `CIRCPAD_EVENT_PADDING_SENT`: A padding cell is sent
  * `CIRCPAD_EVENT_PADDING_RECV`: A padding cell is received
* Machine-level events
  * `CIRCPAD_EVENT_INFINITY`: Tried to schedule padding using the ''infinity bin''.
  * `CIRCPAD_EVENT_BINS_EMPTY`: All histogram bins are empty (out of tokens)
  * `CIRCPAD_EVENT_LENGTH_COUNT`: State has used all its padding capacity (see `length_dist` below)

### 3.3. Specifying Per-State Padding

Each state of a padding machine specifies either:
  * A padding histogram describing inter-transmission delays between cells;
d     OR
  * A parameterized delay probability distribution for inter-transmission
    delays between cells.

Either mechanism specifies essentially the *minimum inter-transmission time*
distribution. If non-padding traffic does not get transmitted from this
endpoint before the delay value sampled from this distribution expires, a
padding packet is sent.

The choice between histograms and probability distributions can be subtle. A
rule of thumb is that probability distributions are easy to specify and
consume very little memory, but might not be able to describe certain types
of complex padding logic. Histograms, in contrast, can support precise
packet-count oriented or multimodal delay schemes, and can use token removal
logic to reduce overhead and shape the total padding+non-padding inter-packet
delay distribution towards an overall target distribution.

We suggest that you start with a probability distribution if possible, and
you move to a histogram-based approach only if a probability distribution
does not suit your needs.

#### 3.3.1. Padding Probability Distributions

The easiest, most compact way to schedule padding using a machine state is to
use a probability distribution that specifies the possible delays. That can
be done
[using the circpad_state_t fields](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L339)
`iat_dist`, `dist_max_sample_usec` and `dist_added_shift_usec`.

The Tor circuit padding framework
[supports multiple types](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L214)
of probability distributions, and the developer should use the
[circpad_distribution_t structure](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L240)
to specify them as well as the required parameters.

#### 3.3.2. Padding Histograms

A more advanced way to schedule padding is to use a ''padding
histogram''. The main advantages of a histogram are that it allows you to
specify distributions that are not easily parameterized in closed form, or
require specific packet counts at particular time intervals. Histograms also
allow you to make use of an optional traffic minimization and shaping
optimization called *token removal*, which is central to the original
[Adaptive Padding](https://www.freehaven.net/anonbib/cache/ShWa-Timing06.pdf)
concept.

If a histogram is used by a state (as opposed to a fixed parameterized
distribution), then the developer must use the
[histogram-related fields](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L285)
of the `circpad_state_t` structure.

The width of a histogram bin specifies the range of inter-packet delay times,
whereas its height specifies the amount of tokens in each bin. To sample a
padding delay from a histogram, we first randomly pick a bin (weighted by the
amount of tokens in each bin) and then sample a delay from within that bin by
picking a uniformly random delay using the width of the bin as the range.

Each histogram also has an ''infinity bin'' as its final bin.  If the
''infinity bin'' is chosen,
we don't schedule any padding (i.e., we schedule padding with
infinite delay). If the developer does not want infinite delay, they
should not give any tokens to the ''infinity bin''.

If a token removal strategy is specified (via the
`circpad_state_t.token_removal` field), each time padding is sent using a
histogram, the padding machine will remove a token from the appropriate
histogram bin whenever this endpoint sends *either a padding packet or a
non-padding packet*. The different removal strategies govern what to do when
the bin corresponding to the current inter-packet delay is empty.

Token removal is optional. It is useful if you want to do things like specify
a burst should be at least N packets long, and you only want to add padding
packets if there are not enough non-padding packets. The cost of doing token
removal is additional memory allocations for making per-circuit copies of
your histogram that can be modified.

### 3.4. Specifying Precise Cell Counts

Padding machines should be able to specify the exact amount of padding they
send. For histogram-based machines this can be done using a specific amount
of tokens, but another (and perhaps easier) way to do this, is to use the
[length_dist field](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L362)
of the `circpad_state_t` structure.

The `length_dist` field is basically a probability distribution similar to the
padding probability distributions, which applies to a specific machine state
and specifies the amount of padding we are willing to send during that state.
This value gets sampled when we transition to that state (TODO document this
in the code).

### 3.5. Specifying Overhead Limits

Separately from the length counts, it is possible to rate limit the overhead
percentage of padding at both the global level across all machines, and on a
per-machine basis.

At the global level, the overhead percentage of all circuit padding machines
as compared to total traffic can be limited through the Tor consensus
parameter `circpad_global_max_padding_pct`. This overhead is defined as the
percentage of padding cells *sent* out of the sum of non padding and padding
cells *sent*, and is applied *only after* at least
`circpad_global_allowed_cells` padding cells are sent by that relay or client
(to allow for small bursts of pure padding on otherwise idle or freshly
restarted relays). When both of these limits are hit by a relay or client, no
further padding cells will be sent, until sufficient non-padding traffic is
sent to cause the percentage of padding traffic to fall back below the
threshold.

Additionally, each individual padding machine can rate limit itself by
filling in the fields `circpad_machine_spec_t.max_padding_percent` and
`circpad_machine_spec_t.allowed_padding_count`, which behave identically to
the consensus parameters, but only apply to that specific machine.

## 4. Evaluating Padding Machines

One of the goals of the circuit padding framework is to provide improved
evaluation and scientific reproducibility for lower cost. This includes both
the [choice](#13-computation-model) of the compact C structure representation
(which has an easy-to-produce bitstring representation for optimization by
gradient descent, GAs, or GANs), as well as rapid prototyping and evaluation.

So far, whenever evaluation cost has been a barrier, each research group has
developed their own ad-hoc packet-level simulators of various padding
mechanisms for evaluating website fingerprinting attacks and defenses. The
process typically involves doing a crawl of Alexa top sites over Tor, and
recording the Tor cell count and timing information for each page in the
trace. These traces are then fed to simulations of defenses, which output
modified trace files.

Because no standardized simulation and evaluation mechanism exists, it is
often hard to tell if independent implementations of various attacks and
defenses are in fact true-to-form or even properly calibrated for direct
comparison, and discrepancies in results across the literature suggests
this is not always so.

Our preferred outcome with this framework is that machines are tuned
and optimized on a tracing simulator, but that the final results come from
an actual live network test of the defense. The traces from this final crawl
should be preserved as artifacts to be run on the simulator and reproduced
on the live network by future papers, ideally in journal venues that have an
artifact preservation policy.

### 4.1. Pure Simulation

When doing initial tuning of padding machines, especially in adversarial
settings, variations of a padding machine defense may need to be applied to
network activity hundreds or even millions of times. The wall-clock time
required to do this kind of tuning using live testing or even Shadow network
emulation may often be prohibitive.

To help address this, and to better standardize results, Tobias Pulls has
implemented a [circpad machine trace simulator](https://github.com/pylls/circpad-sim),
which uses Tor's unit test framework to simulate applying padding machines to
circuit packet traces via a combination of Tor patches and python scripts. This
simulator can be used to record traces from clients, Guards, Middles, Exits,
and any other hop in the path, only for circuits that are run by the
researcher. This makes it possible to safely record baseline traces and
ultimately even mount passive attacks on the live network, without impacting
or recording any normal user traffic.

In this way, a live crawl of the Alexa top sites could be performed once, to
produce a standard "undefended" corpus. Padding machines can be then quickly
evaluated and tuned on these simulated traces in a standardized way, and then
the results can then be [reproduced on the live Tor
network](#44-Testing-on-the-Live-Network) with the machines running on your own relays.

Please be mindful of the Limitations section of the simulator documentation,
however, to ensure that you are aware of the edge cases and timing
approximations that are introduced by this approach.

### 4.2. Testing in Chutney

The Tor Project provides a tool called
[Chutney](https://github.com/torproject/chutney/) which makes it very easy to
setup private Tor networks. While getting it work for the first time might
take you some time of doc reading, the final result is well worth it for the
following reasons:

- You control all the relays and hence you have greater control and debugging
  capabilities.
- You control all the relays and hence you can toggle padding support on/off
  at will.
- You don't need to be cautious about overhead or damaging the real Tor
  network during testing.
- You don't even need to be online; you can do all your testing offline over
  localhost.

A final word of warning here is that since Chutney runs over localhost, the
packet latencies and delays are completely different from the real Tor
network, so if your padding machines rely on real network timings you will
get different results on Chutney. You can work around this by using a
different set of delays if Chutney is used, or by moving your padding
machines to the real network when you want to do latency-related testing.

### 4.3. Testing in Shadow

[Shadow](https://shadow.github.io/) is an environment for running entire Tor
network simulations, similar to Chutney, but designed to be both more memory
efficient, as well as provide an accurate Tor network topology and latency
model.

While Shadow is significantly more memory efficient than Chutney, and can make
use of extremely accurate Tor network capacity and latency models, it will not
be as fast or efficient as the [circpad trace simulator](https://github.com/pylls/circpad-sim),
if you need to do many many iterations of an experiment to tune your defense.

### 4.4. Testing on the Live Network

Live network testing is the gold standard for verifying that any attack or
defense is behaving as expected, to minimize the influence of simplifying
assumptions.

However, it is not ethical, or necessarily possible, to run high-resolution
traffic analysis attacks on the entire Tor network. But it is both ethical
and possible to run small scale experiments that target only your own
clients, who will only use your own Tor relays that support your new padding
machines.

We provide the `MiddleNodes` torrc directive to enable this, which will allow
you to specify the fingerprints and/or IP netmasks of relays to be used in
the second hop position. Options to restrict other hops also exist, if your
padding system is padding to a different hop. The `HSLayer2Nodes` option
overrides the `MiddleNodes` option for onion service circuits, if both are
set. (The
[vanguards addon](https://github.com/mikeperry-tor/vanguards/README_TECHNICAL.md)
will set `HSLayer2Nodes`.)

When you run your own clients, and use MiddleNodes to restrict your clients
to use your relays, you can perform live network evaluations of a defense
applied to whatever traffic crawl or activity your clients do.

## 5. Example Padding Machines

### 5.1. Deployed Circuit Setup Machines

Tor currently has two padding machines enabled by default, which aim to hide
certain features of the client-side onion service circuit setup protocol. For
more details on their precise goal and function, please see
[proposal 302](https://github.com/torproject/torspec/blob/master/proposals/302-padding-machines-for-onion-clients.txt)
. In this section we will go over the code of those machines to clarify some
of the engineering parts:

#### 5.1.1. Overview

The codebase of proposal 302 can be found in
[circuitpadding_machines.c](https://github.com/torproject/tor/blob/master/src/core/or/circuitpadding_machines.c)
and specifies four padding machines:

- The [client-side introduction](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L60) circuit machine.
- The [relay-side introduction](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L146) circuit machine.
- The [client-side rendezvous](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L257) circuit machine
- The [relay-side rendezvous](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L374) circuit machine.

Each of those machines has its own setup function, and
[they are all initialized](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.c#L2718)
by the circuit padding framework. To understand more about them, please
carefully read the individual setup function for each machine which are
fairly well documented. Each function goes through the following steps:
- Machine initialization
  - Give it a [name](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L70)
  - Specify [which hop](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L73) the padding should go to
  - Specify whether it should be [client-side](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L75) or relay-side.
- Specify for [which types of circuits](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L78) the machine should apply
- Specify whether the circuits should be [kept alive](https://github.com/torproject/tor/blob/master/src/core/or/circuitpadding_machines.c#L112) until the machine finishes padding.
- Sets [padding limits](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L116) to avoid too much overhead in case of bugs or errors.
- Setup [machine states](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L120)
   - Specify [state transitions](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L123).
- Finally [register the machine](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L137) to the global machine list

### 5.2. Adaptive Padding Early

[Adaptive Padding Early](https://www.cs.kau.se/pulls/hot/thebasketcase-ape/)
is a variant of Adaptive Padding/WTF-PAD that does not use histograms or token
removal to shift padding distributions, but instead uses fixed parameterized
distributions to specify inter-packet timing thresholds for burst and gap
inter-packet delays.

Tobias Pulls's [QuickStart Guide](CircuitPaddingQuickStart.md) describes how
to get this machine up and running, and has links to branches with a working
implementation.

### 5.3. Sketch of Tamaraw

The [Tamaraw defense
paper](https://www.cypherpunks.ca/~iang/pubs/webfingerprint-ccs14.pdf) is the
only defense to date that provides a proof of optimality for the finite-length
website traffic fingerprinting domain. These bounds assume that a defense is
able to perform a full, arbitrary transform of a trace that is under a fixed
number of packets in length.

The key insight to understand Tamaraw's optimality is that it achieves one
such optimal transform by delaying traffic below a circuit's throughput. By
doing this, it creates a queue that is rarely empty, allowing it to produce
a provably optimal transform with minimal overhead. As [Section
1.4](#14-other-deployment-constraints) explains, this queue
cannot be maintained on the live Tor network without risk of out-of-memory
conditions at relays.

However, if the queue is not maintained in the Tor network, but instead by the
application layer, it could be deployed by websites that opt in to using it.

In this case, the application layer component would do *optional* constant
rate shaping, negotiated between a web browser and a website. The Circuit
Padding Framework can then easily fill in any missing gaps of cover traffic
packets, and also ensure that only a fixed length number of packets are sent
in total.

However, for such a defense to be safe, additional care must be taken to
ensure that the resulting traffic pattern still has a large
anonymity/confusion set with other traces on the live network.

Accomplishing this is an unsolved problem.

### 5.4. Other Padding Machines

Our partners in this project at RIT have produced a couple prototypes, based
on their published research designs
[REB and RBB](https://www.researchgate.net/publication/329743510_UNDERSTANDING_FEATURE_DISCOVERY_IN_WEBSITE_FINGERPRINTING_ATTACKS).

As [their writeup
explains](https://github.com/notem/tor-rbp-padding-machine-doc), because RBB
uses delay, the circuit padding machine that they made is a no-delay version.

They also ran into an issue with the 0-delay timing workaround for [bug
31653](https://bugs.torproject.org/31653). Keep an eye on that bug for updates
with improved workarounds/fixes.

Their code is [available on github](https://github.com/notem/tor/tree/circuit_padding_rbp_machine).


## 6. Framework Implementation Details

If you need to add additional events, conditions, or other features to the
circuit padding framework, then this section is for you.

### 6.1. Memory Allocation Conventions

If the existing circuit padding features are sufficient for your needs, then
you do not need to worry about memory management or pointer lifespans. The
circuit padding framework should take care of this for you automatically.

However, if you need to add new padding machine features to support your
padding machines, then it will be helpful to understand how circuits
correspond to the global machine definitions, and how mutable padding machine
memory is managed.

#### 6.1.1. Circuits and Padding Machines

In Tor, the
[circuit_t structure](https://github.com/torproject/tor/blob/master/src/core/or/circuit_st.h)
is the superclass structure for circuit-related state that is used on both
clients and relays. On clients, the actual datatype of the object pointed to
by `circuit_t *` is the subclass structure
[origin_circuit_t](https://github.com/torproject/tor/blob/master/src/core/or/origin_circuit_st.h). The
macros `CIRCUIT_IS_ORIGIN()` and `TO_ORIGIN_CIRCUIT()` are used to determine
if a circuit is a client-side (origin) circuit and to cast the pointer safely
to `origin_circuit_t *`.

Because circuit padding machines can be present at both clients and relays,
the circuit padding fields are stored in the `circuit_t *` superclass
structure. Notice that there are actually two sets of circuit padding fields:
a `const circpad_machine_spec_t *` array, and a `circpad_machine_runtime_t *`
array. Each of these arrays holds at most two elements, as there can be at
most two padding machines on each circuit.

The `const circpad_machine_spec_t *` points to a globally allocated machine
specification. These machine specifications are
allocated and set up during Tor program startup, in `circpad_machines_init()`
in
[circuitpadding.c](https://github.com/torproject/tor/blob/master/src/core/or/circuitpadding.c). Because
the machine specification object is shared by all circuits, it must not be
modified or freed until program exit (by `circpad_machines_free()`). The
`const` qualifier should enforce this at compile time.

The `circpad_machine_runtime_t *` array member points to the mutable runtime
information for machine specification at that same array index. This runtime
structure keeps track of the current machine state, packet counts, and other
information that must be updated as the machine operates. When a padding
machine is successfully negotiated `circpad_setup_machine_on_circ()` allocates
the associated runtime information.

#### 6.1.2. Histogram Management

If a `circpad_state_t` of a machine specifies a `token_removal` strategy
other than `CIRCPAD_TOKEN_REMOVAL_NONE`, then every time
there is a state transition into this state, `circpad_machine_setup_tokens()`
will copy the read-only `circpad_state_t.histogram` array into a mutable
version at `circpad_machine_runtime_t.histogram`. This mutable copy is used
to decrement the histogram bin accounts as packets are sent, as per the
specified token removal strategy.

When the state machine transitions out of this state, the mutable histogram copy is freed
by this same `circpad_machine_setup_tokens()` function.

#### 6.1.3. Deallocation and Shutdown

As an optimization, padding machines can be swapped in and out by the client
without waiting a full round trip for the relay machine to shut down.

Internally, this is accomplished by immediately freeing the heap-allocated
`circuit_t.padding_info` field corresponding to that machine, but still preserving the
`circuit_t.padding_machine` pointer to the global padding machine
specification until the response comes back from the relay. Once the response
comes back, that `circuit_t.padding_machine` pointer is set to NULL, if the
response machine number matches the current machine present.

Because of this partial shutdown condition, we have two macros for iterating
over machines. `FOR_EACH_ACTIVE_CIRCUIT_MACHINE_BEGIN()` is used to iterate
over machines that have both a `circuit_t.padding_info` slot and a
`circuit_t.padding_machine` slot occupied. `FOR_EACH_CIRCUIT_MACHINE_BEGIN()`
is used when we need to iterate over all machines that are either active or
are simply waiting for a response to a shutdown request.

If the machine is replaced instead of just shut down, then the client frees
the `circuit_t.padding_info`, and then sets the `circuit_t.padding_machine`
and `circuit_t.padding_info` fields for this next machine immediately. This is
done in `circpad_add_matching_machines()`. In this case, since the new machine
should have a different machine number, the shut down response from the relay
is silently discarded, since it will not match the new machine number.

If this sequence of machine teardown and spin-up happens rapidly enough for
the same machine number (as opposed to different machines), then a race
condition can happen. This is
[known bug #30992](https://bugs.torproject.org/30992).

When the relay side decides to shut down a machine, it sends a
`RELAY_COMMAND_PADDING_NEGOTIATED` towards the client. If this cell matches the
current machine number on the client, that machine is torn down, by freeing
the `circuit_t.padding_info` slot and immediately setting
`circuit_t.padding_machine` slot to NULL.

Additionally, if Tor decides to close a circuit forcibly due to error before
the padding machine is shut down, then `circuit_t.padding_info` is still
properly freed by the call to `circpad_circuit_free_all_machineinfos()`
in `circuit_free_()`.

### 6.2. Machine Application Events

The framework checks client-side origin circuits to see if padding machines
should be activated or terminated during specific event callbacks in
`circuitpadding.c`. We list these event callbacks here only for reference. You
should not modify any of these callbacks to get your machine to run; instead,
you should use the `circpad_machine_spec_t.conditions` field.

However, you may add new event callbacks if you need other activation events,
for example to provide obfuscation-layer or application-layer signaling. Any
new event callbacks should behave exactly like the existing callbacks.

During each of these event callbacks, the framework checks to see if any
current running padding machines have conditions that no longer apply as a
result of the event, and shuts those machines down. Then, it checks to see if
any new padding machines should be activated as a result of the event, based
on their circuit application conditions. **Remember: Machines are checked in
reverse order in the machine list. This means that later, more recently added
machines take precedence over older, earlier entries in each list.**

Both of these checks are performed using the machine application conditions
that you specify in your machine's `circpad_machine_spec_t.conditions` field.

The machine application event callbacks are prefixed by `circpad_machine_event_` by convention in circuitpadding.c. As of this writing, these callbacks are:

  - `circpad_machine_event_circ_added_hop()`: Called whenever a new hop is
    added to a circuit.
  - `circpad_machine_event_circ_built()`: Called when a circuit has completed
    construction and is
    opened. <!-- open != ready for traffic. Which do we mean? -nickm -->
  - `circpad_machine_event_circ_purpose_changed()`: Called when a circuit
    changes purpose.
  - `circpad_machine_event_circ_has_no_relay_early()`: Called when a circuit
    runs out of `RELAY_EARLY` cells.
  - `circpad_machine_event_circ_has_streams()`: Called when a circuit gets a
    stream attached.
  - `circpad_machine_event_circ_has_no_streams()`: Called when the last
    stream is detached from a circuit.

### 6.3. Internal Machine Events

To provide for some additional capabilities beyond simple finite state machine
behavior, the circuit padding machines also have internal events that they
emit to themselves when packet count length limits are hit, when the Infinity
bin is sampled, and when the histogram bins are emptied of all tokens.

These events are implemented as `circpad_internal_event_*` functions in
`circuitpadding.c`, which are called from various areas that determine when
the events should occur.

While the conditions that trigger these internal events to be called may be
complex, they are processed by the state machine definitions in a nearly
identical manner as the cell processing events, with the exception that they
are sent to the current machine only, rather than all machines on the circuit.


## 7. Future Features and Optimizations

While implementing the circuit padding framework, our goal was to deploy a
system that obscured client-side onion service circuit setup and supported
deployment of WTF-PAD and/or APE. Along the way, we noticed several features
that might prove useful to others, but were not essential to implement
immediately. We do not have immediate plans to implement these ideas, but we
would gladly accept patches that do so.

The following list gives an overview of these improvements, but as this
document ages, it may become stale. The canonical list of improvements that
researchers may find useful is tagged in our bugtracker with
[circpad-researchers](https://trac.torproject.org/projects/tor/query?keywords=~circpad-researchers),
and the list of improvements that are known to be necessary for some research
areas are tagged with
[circpad-researchers-want](https://trac.torproject.org/projects/tor/query?keywords=~circpad-researchers-want).

Please consult those lists for the latest status of these issues. Note that
not all fixes will be backported to all Tor versions, so be mindful of which
Tor releases receive which fixes as you conduct your experiments.

### 7.1. Load Balancing and Flow Control

Fortunately, non-Exit bandwidth is already plentiful and exceeds the Exit
capacity, and we anticipate that if we inform our relay operator community of
the need for non-Exit bandwidth to satisfy padding overhead requirements,
they will be able to provide that with relative ease.

Unfortunately, padding machines that have large quantities of overhead will
require changes to our load balancing system to account for this
overhead. The necessary changes are documented in
[Proposal 265](https://gitweb.torproject.org/torspec.git/tree/proposals/265-load-balancing-with-overhead.txt).

Additionally, padding cells are not currently subjected to flow control. For
high amounts of padding, we may want to change this. See [ticket
31782](https://bugs.torproject.org/31782) for details.

### 7.2. Timing and Queuing Optimizations

The circuitpadding framework has some timing related issues that may impact
results. If high-resolution timestamps are fed to opaque deep learning
trainers, those training models may end up able to differentiate padding
traffic from non-padding traffic due to these timing bugs.

The circuit padding cell event callbacks come from post-decryption points in
the cell processing codepath, and from the pre-queue points in the cell send
codepath. This means that our cell events will not reflect the actual time
when packets are read or sent on the wire. This is much worse in the send
direction, as the circuitmux queue, channel outbuf, and kernel TCP buffer will
impose significant additional delay between when we currently report that a
packet was sent, and when it actually hits the wire.

[Ticket 29494](https://bugs.torproject.org/29494) has a more detailed
description of this problem, and an experimental branch that changes the cell
event callback locations to be from circuitmux post-queue, which with KIST,
should be an accurate reflection of when they are actually sent on the wire.

If your padding machine and problem space depends on very accurate notions of
relay-side packet timing, please try that branch and let us know on the
ticket if you need any further assistance fixing it up.

Additionally, with these changes, it will be possible to provide further
overhead reducing optimizations by letting machines specify flags to indicate
that padding should not be sent if there are any cells pending in the cell
queue, for doing things like extending cell bursts more accurately and with
less overhead.

However, even if we solve the queuing issues, Tor's current timers are not as
precise as some padding designs may require. We will still have issues of
timing precision to solve. [Ticket 31653](https://bugs.torproject.org/31653)
describes an issue the circuit padding system has with sending 0-delay padding
cells, and [ticket 32670](https://bugs.torproject.org/32670) describes a
libevent timer accuracy issue, which causes callbacks to vary up to 10ms from
their scheduled time, even in absence of load.

All of these issues strongly suggest that you either truncate the resolution
of any timers you feed to your classifier, or that you omit timestamps
entirely from the classification problem until these issues are addressed.

### 7.3. Better Machine Negotiation

Circuit padding is applied to circuits through machine conditions.

The following machine conditions may be useful for some use cases, but have
not been implemented yet:
  * [Exit Policy-based Stream Conditions](https://bugs.torproject.org/29083)
  * [Probability to apply machine/Cointoss condition](https://bugs.torproject.org/30092)
  * [Probability distributions for launching new padding circuit(s)](https://bugs.torproject.org/31783)
  * [More flexible purpose handling](https://bugs.torproject.org/32040)

Additionally, the following features may help to obscure that padding is being
negotiated, and/or streamline that negotiation process:
  * [Always send negotiation cell on all circuits](https://bugs.torproject.org/30172)
  * [Better shutdown handling](https://bugs.torproject.org/30992)
  * [Preference-ordered negotiation menu](https://bugs.torproject.org/30348)

### 7.4. Probabilistic State Transitions

Right now, the state machine transitions are fully deterministic. However,
one could imagine a state machine that uses probabilistic transitions between
states to simulate a random walk or Hidden Markov Model traversal across
several pages.

The simplest way to implement this is to make  the `circpad_state_t.next_state` array
into an array of structs that have a next state field, and a probability to
transition to that state.

If you need this feature, please see [ticket
31787](https://bugs.torproject.org/31787) for more details.

### 7.5. More Complex Pattern Recognition

State machines are extremely efficient sequence recognition devices. But they
are not great pattern recognition devices. This is one of the reasons why
[Adaptive Padding](https://www.freehaven.net/anonbib/cache/ShWa-Timing06.pdf)
used state machines in combination with histograms, to model the target
distribution of interpacket delays for transmitted packets.

However, there currently is no such optimization for reaction to patterns of
*received* traffic. There may also be cases where defenses must react to more
complex patterns of sent traffic than can be expressed by our current
histogram and length count events.

For example: if you wish your machine to react to a certain count of incoming
cells in a row, right now you have to have a state for each cell, and use the
infinity bin to timeout of the sequence in each state. We could make this more
compact if each state had an arrival cell counter and inter-cell timeout. Or
we could make more complex mechanisms to recognize certain patterns of arrival
traffic in a state.

The best way to build recognition primitives like this into the framework is
to add additional [Internal Machine Events](#63-internal-machine-events) for
the pattern in question.

As another simple example, a useful additional event might be to transition
whenever any of your histogram bins are empty, rather than all of them. To do
this, you would add `CIRCPAD_EVENT_ANY_BIN_EMPTY` to the enum
`circpad_event_t` in `circuitpadding.h`. You would then create a function
`circuitpadding_internal_event_any_bin_empty()`, which would work just like
`circuitpadding_internal_event_bin_empty()`, and also be called from
`check_machine_token_supply()` in `circuitpadding.c` but with the check for
each bin being zero instead of the total. With this change, new machines could
react to this new event in the same way as any other.

If you have any other ideas that may be useful, please comment on [ticket
32680](https://bugs.torproject.org/32680).


## 8. Open Research Problems

### 8.1. Onion Service Circuit Setup

Our circuit setup padding does not address timing-based features, only
packet counts. Deep learning can probably see this.

However, before going too deep down the timing rabithole, we may need to make
[some improvements to Tor](#72-timing-and-queuing-optimizations). Please
comment on those tickets if you need this.

### 8.2. Onion Service Fingerprinting

We have done nothing to obscure the service side of onion service circuit
setup. Because service-side onion services will have the reverse traffic byte
counts as normal clients, they will likely need some kind of [hybrid
application layer traffic shaping](#53-sketch-of-tamaraw), in addition to
simple circuit setup obfuscation.

Fingerprinting in
[combination](https://github.com/mikeperry-tor/vanguards/blob/master/README_SECURITY.md)
with
[vanguards](https://github.com/mikeperry-tor/vanguards/blob/master/README_TECHNICAL.md)
ia also an open issue.

### 8.3. Open World Fingerprinting

Similarly, Open World/clearweb website fingerprinting defenses remain
an unsolved problem from the practicality point of view. The original WTF-PAD
defense was never tuned, and it is showing accuracy issues against deep
learning attacks.

### 8.4. Protocol Usage Fingerprinting

Traffic Fingerprinting to determine the protocol in use by a client has not
been studied, either from the attack or defense point of view.

### 8.5. Datagram Transport Side Channels

Padding can reduce the accuracy of dropped-cell side channels in such
transports, but we don't know [how to measure
this](https://lists.torproject.org/pipermail/tor-dev/2018-November/013562.html).

## 9. Must Read Papers

These are by far the most important papers in the space, to date:

 - [Tamaraw](https://www.cypherpunks.ca/~iang/pubs/webfingerprint-ccs14.pdf)
 - [Bayes, Not Naive](https://www.petsymposium.org/2017/papers/issue4/paper50-2017-4-source.pdf)
 - [Anonymity Trilemma](https://eprint.iacr.org/2017/954.pdf)
 - [WTF-PAD](http://arxiv.org/pdf/1512.00524)

Except for WTF-PAD, these papers were selected because they point towards
optimality bounds that can be benchmarked against.

We cite them even though we are skeptical that provably optimal defenses can
be constructed, at least not without trivial or impractical transforms (such as
those that can be created with unbounded queue capacity, or stored knowledge
of traces for every possible HTTP trace on the Internet).

We also are not demanding an optimality or security proof for every defense.

Instead, we cite the above as benchmarks. We believe the space, especially the
open-world case, to be more akin to an optimization problem, where a
WTF-PAD-like defense must be tuned through an optimizer to produce results
comparable to provably optimal but practically unrealizable defenses, through
rigorous adversarial evaluation.