summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
-rw-r--r--doc/HACKING/CircuitPaddingDevelopment.md1230
-rw-r--r--doc/HACKING/CircuitPaddingQuickStart.md263
2 files changed, 1493 insertions, 0 deletions
diff --git a/doc/HACKING/CircuitPaddingDevelopment.md b/doc/HACKING/CircuitPaddingDevelopment.md
new file mode 100644
index 0000000000..a0dee68e91
--- /dev/null
+++ b/doc/HACKING/CircuitPaddingDevelopment.md
@@ -0,0 +1,1230 @@
+# Circuit Padding Developer Documentation
+
+This document is written for researchers who wish to prototype and evaluate circuit-level padding defenses in Tor.
+
+Written by Mike Perry and George Kadianakis.
+
+# Table of Contents
+
+- [0. Background](#0-background)
+- [1. Introduction](#1-introduction)
+ - [1.1. System Overview](#11-system-overview)
+ - [1.2. Layering Model](#12-layering-model)
+ - [1.3. Computation Model](#13-computation-model)
+ - [1.4. Deployment Constraints](#14-other-deployment-constraints)
+- [2. Creating New Padding Machines](#2-creating-new-padding-machines)
+ - [2.1. Registering a New Padding Machine](#21-registering-a-new-padding-machine)
+ - [2.2. Machine Activation and Shutdown](#22-machine-activation-and-shutdown)
+- [3. Specifying Padding Machines](#3-specifying-padding-machines)
+ - [3.1. Padding Machine States](#31-padding-machine-states)
+ - [3.2. Padding Machine State Transitions](#32-padding-machine-state-transitions)
+ - [3.3. Specifying Per-State Padding](#33-specifying-per-state-padding)
+ - [3.4. Specifying Precise Cell Counts](#34-specifying-precise-cell-counts)
+ - [3.5. Specifying Overhead Limits](#35-specifying-overhead-limits)
+- [4. Evaluating Padding Machines](#4-evaluating-padding-machines)
+ - [4.1. Pure Simulation](#41-pure-simulation)
+ - [4.2. Testing in Chutney](#42-testing-in-chutney)
+ - [4.3. Testing in Shadow](#43-testing-in-shadow)
+ - [4.4. Testing on the Live Network](#44-testing-on-the-live-network)
+- [5. Example Padding Machines](#5-example-padding-machines)
+ - [5.1. Deployed Circuit Setup Machines](#51-deployed-circuit-setup-machines)
+ - [5.2. Adaptive Padding Early](#52-adaptive-padding-early)
+ - [5.3. Sketch of Tamaraw](#53-sketch-of-tamaraw)
+ - [5.4. Other Padding Machines](#54-other-padding-machines)
+- [6. Framework Implementation Details](#6-framework-implementation-details)
+ - [6.1. Memory Allocation Conventions](#61-memory-allocation-conventions)
+ - [6.2. Machine Application Events](#62-machine-application-events)
+ - [6.3. Internal Machine Events](#63-internal-machine-events)
+- [7. Future Features and Optimizations](#7-future-features-and-optimizations)
+ - [7.1. Load Balancing and Flow Control](#71-load-balancing-and-flow-control)
+ - [7.2. Timing and Queuing Optimizations](#72-timing-and-queuing-optimizations)
+ - [7.3. Better Machine Negotiation](#73-better-machine-negotiation)
+ - [7.4. Probabilistic State Transitions](#74-probabilistic-state-transitions)
+ - [7.5. More Complex Pattern Recognition](#75-more-complex-pattern-recognition)
+- [8. Open Research Problems](#8-open-research-problems)
+ - [8.1. Onion Service Circuit Setup](#81-onion-service-circuit-setup)
+ - [8.2. Onion Service Fingerprinting](#82-onion-service-fingerprinting)
+ - [8.3. Open World Fingerprinting](#83-open-world-fingerprinting)
+ - [8.4. Protocol Usage Fingerprinting](#84-protocol-usage-fingerprinting)
+ - [8.5. Datagram Transport Side Channels](#85-datagram-transport-side-channels)
+- [9. Must Read Papers](#9-must-read-papers)
+
+
+## 0. Background
+
+Tor supports both connection-level and circuit-level padding, and both
+systems are live on the network today. The connection-level padding behavior
+is described in [section 2 of
+padding-spec.txt](https://github.com/torproject/torspec/blob/master/padding-spec.txt#L47). The
+circuit-level padding behavior is described in [section 3 of
+padding-spec.txt](https://github.com/torproject/torspec/blob/master/padding-spec.txt#L282).
+
+These two systems are orthogonal and should not be confused. The
+connection-level padding system is only active while the TLS connection is
+otherwise idle. Moreover, it regards circuit-level padding as normal data
+traffic, and hence while the circuit-level padding system is actively padding,
+the connection-level padding system will not add any additional overhead.
+
+While the currently deployed circuit-level padding behavior is quite simple,
+it is built on a flexible framework. This framework supports the description
+of event-driven finite state machine by filling in fields of a simple C
+structure, and is designed to support any delay-free statistically shaped
+cover traffic on individual circuits, with cover traffic flowing to and from a
+node of the implementor's choice (Guard, Middle, Exit, Rendezvous, etc).
+
+This class of system was first proposed in
+[Timing analysis in low-latency mix networks: attacks and defenses](https://www.freehaven.net/anonbib/cache/ShWa-Timing06.pdf)
+by Shmatikov and Wang, and extended for the website traffic fingerprinting
+domain by Juarez et al. in
+[Toward an Efficient Website Fingerprinting Defense](http://arxiv.org/pdf/1512.00524). The
+framework also supports fixed parameterized probability distributions, as
+used in [APE](https://www.cs.kau.se/pulls/hot/thebasketcase-ape/) by Tobias
+Pulls, and many other features.
+
+This document describes how to use Tor's circuit padding framework to
+implement and deploy novel delay-free cover traffic defenses.
+
+## 1. Introduction
+
+The circuit padding framework is the official way to implement padding
+defenses in Tor. It may be used in combination with application-layer
+defenses, and/or obfuscation defenses, or on its own.
+
+Its current design should be enough to deploy most defenses without
+modification, but you can extend it to [provide new
+features](#7-future-features-and-optimizations) as well.
+
+### 1.1. System Overview
+
+Circuit-level padding can occur between Tor clients and relays at any hop of
+one of the client's circuits. Both parties need to support the same padding
+mechanisms for the system to work, and the client must enable it.
+
+We added a padding negotiation relay cell to the Tor protocol that clients use
+to ask a relay to start padding, as well as a torrc directive for researchers
+to pin their clients' relay selection to the subset of Tor nodes that
+implement their custom defenses, to support ethical live network testing and
+evaluation.
+
+Circuit-level padding is performed by 'padding machines'. A padding machine is
+a finite state machine. Every state specifies a different form of
+padding style, or stage of padding, in terms of inter-packet timings and total
+packet counts.
+
+Padding state machines are specified by filling in fields of a C structure,
+which specifies the transitions between padding states based on various events,
+probability distributions of inter-packet delays, and the conditions under
+which padding machines should be applied to circuits.
+
+This compact C structure representation is designed to function as a
+microlanguage, which can be compiled down into a
+bitstring that [can be tuned](#13-computation-model) using various
+optimization methods (such as gradient descent, GAs, or GANs), either in
+bitstring form or C struct form.
+
+The event driven, self-contained nature of this framework is also designed to
+make [evaluation](#4-evaluating-padding-machines) both expedient and rigorously
+reproducible.
+
+This document covers the engineering steps to write, test, and deploy a
+padding machine, as well as how to extend the framework to support new machine
+features.
+
+If you prefer to learn by example, you may want to skip to either the
+[QuickStart Guide](CircuitPaddingQuickStart.md), and/or [Section
+5](#5-example-padding-machines) for example machines to get you up and running
+quickly.
+
+### 1.2. Layering Model
+
+The circuit padding framework is designed to provide one layer in a layered
+system of interchangeable components.
+
+The circuit padding framework operates at the Tor circuit layer. It only deals
+with the inter-cell timings and quantity of cells sent on a circuit. It can
+insert cells on a circuit in arbitrary patterns, and in response to arbitrary
+conditions, but it cannot delay cells. It also does not deal with packet
+sizes, how cells are packed into TLS records, or ways that the Tor protocol
+might be recognized on the wire.
+
+The problem of differentiating Tor traffic from non-Tor traffic based on
+TCP/TLS packet sizes, initial handshake patterns, and DPI characteristics is the
+domain of [pluggable
+transports](https://trac.torproject.org/projects/tor/wiki/doc/AChildsGardenOfPluggableTransports),
+which may optionally be used in conjunction with this framework (or without
+it).
+
+This document focuses primarily on the circuit padding framework's cover
+traffic features, and will only briefly touch on the potential obfuscation and
+application layer coupling points of the framework. Explicit layer coupling
+points can be created by adding either new [machine application
+events](#62-machine-application-events) or new [internal machine
+events](#63-internal-machine-events) to the circuit padding framework, so that
+your padding machines can react to events from other layers.
+
+### 1.3. Computation Model
+
+The circuit padding framework is designed to support succinctly specified
+defenses that can be tuned through [computer-assisted
+optimization](#4-evaluating-padding-machines).
+
+We chose to generalize the original [Adaptive Padding 2-state
+design](https://www.freehaven.net/anonbib/cache/ShWa-Timing06.pdf) into an
+event-driven state machine because state machines are the simplest form of
+sequence recognition devices from [automata
+theory](https://en.wikipedia.org/wiki/Finite-state_machine).
+
+Most importantly: this framing allows cover traffic defenses to be modeled as
+an optimization problem search space, expressed as fields of a C structure
+(which is simultaneously a compact opaque bitstring as well as a symbolic
+vector in an abstract feature space). This kind of space is particularly well
+suited to search by gradient descent, GAs, and GANs.
+
+When performing this optimization search, each padding machine should have a
+fitness function, which will allow two padding machines to be compared for
+relative effectiveness. Optimization searches work best if this fitness can be
+represented as a single number, for example the total amount by which it
+reduces the [Balanced
+Accuracy](https://en.wikipedia.org/wiki/Precision_and_recall#Imbalanced_Data)
+of an adversary's classifier, divided by an amount of traffic overhead.
+
+Before you begin the optimization phase for your defense, you should
+also carefully consider the [features and
+optimizations](#7-future-features-and-optimizations) that we suspect will be
+useful, and also see if you can come up with any more. You should similarly be
+sure to restrict your search space to avoid areas of the bitstring/feature
+vector that you are sure you will not need. For example, some
+[applications](#8-open-research-problems) may not need the histogram
+accounting used by Adaptive Padding, but might need to add other forms of
+[pattern recognition](#75-more-complex-pattern-recognition) to react to
+sequences that resemble HTTP GET and HTTP POST.
+
+### 1.4. Other Deployment Constraints
+
+The framework has some limitations that are the result of deliberate
+choices. We are unlikely to deploy defenses that ignore these limitations.
+
+In particular, we have deliberately not provided any mechanism to delay actual
+user traffic, even though we are keenly aware that if we were to support
+additional delay, defenses would be able to have [more success with less
+bandwidth
+overhead](https://freedom.cs.purdue.edu/anonymity/trilemma/index.html).
+
+In the website traffic fingerprinting domain, [provably optimal
+defenses](https://www.cypherpunks.ca/~iang/pubs/webfingerprint-ccs14.pdf)
+achieve their bandwidth overhead bounds by ensuring that a non-empty queue is
+maintained, by rate limiting traffic below the actual throughput of a circuit.
+For optimal results, this queue must avoid draining to empty, and yet it
+must also be drained fast enough to avoid tremendous queue overhead in fast
+Tor relays, which carry hundreds of thousands of circuits simultaneously.
+
+Unfortunately, Tor's end-to-end flow control is not congestion control. Its
+window sizes are currently fixed. This means there is no signal when queuing
+occurs, and thus no ability to limit queue size through pushback. This means
+there is currently no way to do the fine-grained queue management necessary to
+create such a queue and rate limit traffic effectively enough to keep this
+queue from draining to empty, without also risking that aggregate queuing
+would cause out-of-memory conditions on fast relays.
+
+It may be possible to create a congestion control algorithm that can support
+such fine grained queue management, but this is a [deeply unsolved area of
+research](https://lists.torproject.org/pipermail/tor-dev/2018-November/013562.html).
+
+Even beyond these major technical hurdles, additional latency is also
+unappealing to the wider Internet community, for the simple reason that
+bandwidth [continues to increase
+exponentially](https://ipcarrier.blogspot.com/2014/02/bandwidth-growth-nearly-what-one-would.html)
+where as the speed of light is fixed. Significant engineering effort has been
+devoted to optimizations that reduce the effect of latency on Internet
+protocols. To go against this trend would ensure our irrelevance to the wider
+conversation about traffic analysis defenses for low latency Internet protocols.
+
+On the other hand, through [load
+balancing](https://gitweb.torproject.org/torspec.git/tree/proposals/265-load-balancing-with-overhead.txt)
+and [circuit multiplexing strategies](https://bugs.torproject.org/29494), we
+believe it is possible to add significant bandwidth overhead in the form of
+cover traffic, without significantly impacting end-user performance.
+
+For these reasons, we believe the trade-off should be in favor of adding more
+cover traffic, rather than imposing queuing memory overhead and queuing delay.
+
+As a last resort for narrowly scoped application domains (such as
+shaping Tor service-side onion service traffic to look like other websites or
+different application-layer protocols), delay *may* be added at the
+[application layer](https://petsymposium.org/2017/papers/issue2/paper54-2017-2-source.pdf).
+Any additional cover traffic required by such defenses should still be
+added at the circuit padding layer using this framework, to provide
+engineering efficiency through loose layer coupling and component re-use, as
+well as to provide additional gains against [low
+resolution](https://github.com/torproject/torspec/blob/master/padding-spec.txt#L47)
+end-to-end traffic correlation.
+
+Because such delay-based defenses will impact performance significantly more
+than simply adding cover traffic, they must be optional, and negotiated by
+only specific application layer endpoints that want them. This will have
+consequences for anonymity sets and base rates, if such traffic shaping and
+additional cover traffic is not very carefully constructed.
+
+In terms of acceptable overhead, because Tor onion services
+[currently use](https://metrics.torproject.org/hidserv-rend-relayed-cells.html)
+less than 1% of the
+[total consumed bandwidth](https://metrics.torproject.org/bandwidth-flags.html)
+of the Tor network, and because onion services exist to provide higher
+security as compared to Tor Exit traffic, they are an attractive target for
+higher-overhead defenses. We encourage researchers to target this use case
+for defenses that require more overhead, and/or for the deployment of
+optional negotiated application-layer delays on either the server or the
+client side.
+
+## 2. Creating New Padding Machines
+
+This section explains how to use the existing mechanisms in Tor to define a
+new circuit padding machine. We assume here that you know C, and are at
+least somewhat familiar with Tor development. For more information on Tor
+development in general, see the other files in doc/HACKING/ in a recent Tor
+distribution.
+
+Again, if you prefer to learn by example, you may want to skip to either the
+[QuickStart Guide](CircuitPaddingQuickStart.md), and/or [Section
+5](#5-example-padding-machines) for example machines to get up and running
+quickly.
+
+To create a new padding machine, you must:
+
+ 1. Define your machine using the fields of a heap-allocated
+ `circpad_machine_spec_t` C structure.
+
+ 2. Register this object in the global list of available padding machines,
+ using `circpad_register_padding_machine()`.
+
+ 3. Ensure that your machine is properly negotiated under your desired
+ circuit conditions.
+
+### 2.1. Registering a New Padding Machine
+
+Again, a circuit padding machine is designed to be specified entirely as a [single
+C structure](#13-computation-model).
+
+Your machine definitions should go into their own functions in
+[circuitpadding_machines.c](https://github.com/torproject/tor/blob/master/src/core/or/circuitpadding_machines.c). For
+details on all of the fields involved in specifying a padding machine, see
+[Section 3](#3-specifying-padding-machines).
+
+You must register your machine in `circpad_machines_init()` in
+[circuitpadding.c](https://github.com/torproject/tor/blob/master/src/core/or/circuitpadding.c). To
+add a new padding machine specification, you must allocate a
+`circpad_machine_spec_t` on the heap with `tor_malloc_zero()`, give it a
+human readable name string, and a machine number equivalent to the number of
+machines in the list, and register the structure using
+`circpad_register_padding_machine()`.
+
+Each machine must have a client instance and a relay instance. Register your
+client-side machine instance in the `origin_padding_machines` list, and your
+relay side machine instance in the `relay_padding_machines` list. Once you
+have registered your instance, you do not need to worry about deallocation;
+this is handled for you automatically.
+
+Both machine lists use registration order to signal machine precedence for a
+given `machine_idx` slot on a circuit. This means that machines that are
+registered last are checked for activation *before* machines that are
+registered first. (This reverse precedence ordering allows us to
+deprecate older machines simply by adding new ones after them.)
+
+### 2.2. Machine Activation and Shutdown
+
+After a machine has been successfully registered with the framework, it will
+be instantiated on any client-side circuits that support it. Only client-side
+circuits may initiate padding machines, but either clients or relays may shut
+down padding machines.
+
+#### 2.2.1. Machine Application Conditions
+
+The
+[circpad_machine_conditions_t conditions field](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L641)
+of your `circpad_machine_spec_t` machine definition instance controls the
+conditions under which your machine will be attached and enabled on a Tor
+circuit, and when it gets shut down.
+
+*All* of your explicitly specified conditions in
+`circpad_machine_spec_t.conditions` *must* be met for the machine to be
+applied to a circuit. If *any* condition ceases to be met, then the machine
+is shut down. (This is checked on every event that arrives, even if the
+condition is unrelated to the event.)
+Another way to look at this is that
+all specified conditions must evaluate to true for the entire duration that
+your machine is running. If any are false, your machine does not run (or
+stops running and shuts down).
+
+In particular, as part of the
+[circpad_machine_conditions_t structure](https://github.com/torproject/tor/blob/master/src/core/or/circuitpadding.h#L149),
+the circuit padding subsystem gives the developer the option to enable a
+machine based on:
+ - The
+ [length](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L157)
+ on the circuit (via the `min_hops` field).
+ - The
+ [current state](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L174)
+ of the circuit, such as streams, relay_early, etc. (via the
+ `circpad_circuit_state_t state_mask` field).
+ - The
+ [purpose](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L178)
+ (i.e. type) of the circuit (via the `circpad_purpose_mask_t purpose_mask`
+ field).
+
+This condition mechanism is the preferred way to determine if a machine should
+apply to a circuit. For information about potentially useful conditions that
+we have considered but have not yet implemented, see [Section
+7.3](#73-better-machine-negotiation). We will happily accept patches for those
+conditions, or any for other additional conditions that are needed for your
+use case.
+
+#### 2.2.2. Detecting and Negotiating Machine Support
+
+When a new machine specification is added to Tor (or removed from Tor), you
+should bump the Padding subprotocol version in `src/core/or/protover.c` and
+`src/rust/protover/protover.rs`, add a field to `protover_summary_flags_t` in
+`or.h`, and set this field in `memoize_protover_summary()` in versions.c. This
+new field must then be checked in `circpad_node_supports_padding()` in
+`circuitpadding.c`.
+
+Note that this protocol version update and associated support check is not
+necessary if your experiments will *only* be using your own relays that
+support your own padding machines. This can be accomplished by using the
+`MiddleNodes` directive; see [Section 4](#4-evaluating-padding-machines) for more information.
+
+If the protocol support check passes for the circuit, then the client sends a
+`RELAY_COMMAND_PADDING_NEGOTIATE` cell towards the
+`circpad_machine_spec_t.target_hop` relay, and immediately enables the
+padding machine, and may begin sending padding. (The framework does not wait
+for the `RELAY_COMMAND_PADDING_NEGOTIATED` response to begin padding,
+so that we can
+switch between machines rapidly.)
+
+#### 2.2.3. Machine Shutdown Mechanisms
+
+Padding machines can be shut down on a circuit in three main ways:
+ 1. During a `circpad_machine_event` callback, when
+ `circpad_machine_spec_t.conditions` no longer applies (client side)
+ 2. After a transition to the CIRCPAD_STATE_END, if
+ `circpad_machine_spec_t.should_negotiate_end` is set (client or relay
+ side)
+ 3. If there is a `RELAY_COMMAND_PADDING_NEGOTIATED` error response from the
+ relay during negotiation.
+
+Each of these cases causes the originating node to send a relay cell towards
+the other side, indicating that shutdown has occurred. The client side sends
+`RELAY_COMMAND_PADDING_NEGOTIATE`, and the relay side sends
+`RELAY_COMMAND_PADDING_NEGOTIATED`.
+
+Because padding from malicious exit nodes can be used to construct active
+timing-based side channels to malicious guard nodes, the client checks that
+padding-related cells only come from relays with active padding machines.
+For this reason, when a client decides to shut down a padding machine,
+the framework frees the mutable `circuit_t.padding_info`, but leaves the
+`circuit_t.padding_machine` pointer set until the
+`RELAY_COMMAND_PADDING_NEGOTIATED` response comes back, to ensure that any
+remaining in-flight padding packets are recognized a valid. Tor does
+not yet close circuits due to violation of this property, but the
+[vanguards addon component "bandguard"](https://github.com/mikeperry-tor/vanguards/blob/master/README_TECHNICAL.md#the-bandguards-subsystem)
+does.
+
+As an optimization, a client may replace a machine with another, by
+sending a `RELAY_COMMAND_PADDING_NEGOTIATE` cell to shut down a machine, and
+immediately sending a `RELAY_COMMAND_PADDING_NEGOTIATE` to start a new machine
+in the same index, without waiting for the response from the first negotiate
+cell.
+
+Unfortunately, there is a known bug as a consequence of this optimization. If
+your machine depends on repeated shutdown and restart of the same machine
+number on the same circuit, please see [Bug
+30922](https://bugs.torproject.org/30992). Depending on your use case, we may
+need to fix that bug or help you find a workaround. See also [Section
+6.1.3](#613-deallocation-and-shutdown) for some more technical details on this
+mechanism.
+
+
+## 3. Specifying Padding Machines
+
+By now, you should understand how to register, negotiate, and control the
+lifetime of your padding machine, but you still don't know how to make it do
+anything yet. This section should help you understand how to specify how your
+machine reacts to events and adds padding to the wire.
+
+If you prefer to learn by example first instead, you may wish to skip to
+[Section 5](#5-example-padding-machines).
+
+
+A padding machine is specified by filling in an instance of
+[circpad_machine_spec_t](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L605). Instances
+of this structure specify the precise functionality of a machine: it's
+what the circuit padding developer is called to write. These instances
+are created only at startup, and are referenced via `const` pointers during
+normal operation.
+
+In this section we will go through the most important elements of this
+structure.
+
+### 3.1. Padding Machine States
+
+A padding machine is a finite state machine where each state
+specifies a different style of padding.
+
+As an example of a simple padding machine, you could have a state machine
+with the following states: `[START] -> [SETUP] -> [HTTP] -> [END]` where the
+`[SETUP]` state pads in a way that obfuscates the ''circuit setup'' of Tor,
+and the `[HTTP]` state pads in a way that emulates a simple HTTP session. Of
+course, padding machines can be more complicated than that, with dozens of
+states and non-trivial transitions.
+
+Padding developers encode the machine states in the
+[circpad_machine_spec_t structure](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L655). Each
+machine state is described by a
+[circpad_state_t structure](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L273)
+and each such structure specifies the style and amount of padding to be sent,
+as well as the possible state transitions.
+
+The function `circpad_machine_states_init()` must be used for allocating and
+initializing the `circpad_machine_spec_t.states` array before states and
+state transitions can be defined, as some of the state object has non-zero
+default values.
+
+### 3.2. Padding Machine State Transitions
+
+As described above, padding machines can have multiple states, to
+support different forms of padding. Machines can transition between
+states based on events that occur either on the circuit level or on
+the machine level.
+
+State transitions are specified using the
+[next_state field](https://github.com/torproject/tor/blob/master/src/core/or/circuitpadding.h#L381)
+of the `circpad_state_t` structure. As a simple example, to transition
+from state `A` to state `B` when event `E` occurs, you would use the
+following code: `A.next_state[E] = B`.
+
+#### 3.2.1. State Transition Events
+
+Here we will go through
+[the various events](https://github.com/torproject/tor/blob/master/src/core/or/circuitpadding.h#L30)
+that can be used to transition between states:
+
+* Circuit-level events
+ * `CIRCPAD_EVENT_NONPADDING_RECV`: A non-padding cell is received
+ * `CIRCPAD_EVENT_NONPADDING_SENT`: A non-adding cell is sent
+ * `CIRCPAD_EVENT_PADDING_SENT`: A padding cell is sent
+ * `CIRCPAD_EVENT_PADDING_RECV`: A padding cell is received
+* Machine-level events
+ * `CIRCPAD_EVENT_INFINITY`: Tried to schedule padding using the ''infinity bin''.
+ * `CIRCPAD_EVENT_BINS_EMPTY`: All histogram bins are empty (out of tokens)
+ * `CIRCPAD_EVENT_LENGTH_COUNT`: State has used all its padding capacity (see `length_dist` below)
+
+### 3.3. Specifying Per-State Padding
+
+Each state of a padding machine specifies either:
+ * A padding histogram describing inter-transmission delays between cells;
+d OR
+ * A parameterized delay probability distribution for inter-transmission
+ delays between cells.
+
+Either mechanism specifies essentially the *minimum inter-transmission time*
+distribution. If non-padding traffic does not get transmitted from this
+endpoint before the delay value sampled from this distribution expires, a
+padding packet is sent.
+
+The choice between histograms and probability distributions can be subtle. A
+rule of thumb is that probability distributions are easy to specify and
+consume very little memory, but might not be able to describe certain types
+of complex padding logic. Histograms, in contrast, can support precise
+packet-count oriented or multimodal delay schemes, and can use token removal
+logic to reduce overhead and shape the total padding+non-padding inter-packet
+delay distribution towards an overall target distribution.
+
+We suggest that you start with a probability distribution if possible, and
+you move to a histogram-based approach only if a probability distribution
+does not suit your needs.
+
+#### 3.3.1. Padding Probability Distributions
+
+The easiest, most compact way to schedule padding using a machine state is to
+use a probability distribution that specifies the possible delays. That can
+be done
+[using the circpad_state_t fields](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L339)
+`iat_dist`, `dist_max_sample_usec` and `dist_added_shift_usec`.
+
+The Tor circuit padding framework
+[supports multiple types](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L214)
+of probability distributions, and the developer should use the
+[circpad_distribution_t structure](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L240)
+to specify them as well as the required parameters.
+
+#### 3.3.2. Padding Histograms
+
+A more advanced way to schedule padding is to use a ''padding
+histogram''. The main advantages of a histogram are that it allows you to
+specify distributions that are not easily parameterized in closed form, or
+require specific packet counts at particular time intervals. Histograms also
+allow you to make use of an optional traffic minimization and shaping
+optimization called *token removal*, which is central to the original
+[Adaptive Padding](https://www.freehaven.net/anonbib/cache/ShWa-Timing06.pdf)
+concept.
+
+If a histogram is used by a state (as opposed to a fixed parameterized
+distribution), then the developer must use the
+[histogram-related fields](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L285)
+of the `circpad_state_t` structure.
+
+The width of a histogram bin specifies the range of inter-packet delay times,
+whereas its height specifies the amount of tokens in each bin. To sample a
+padding delay from a histogram, we first randomly pick a bin (weighted by the
+amount of tokens in each bin) and then sample a delay from within that bin by
+picking a uniformly random delay using the width of the bin as the range.
+
+Each histogram also has an ''infinity bin'' as its final bin. If the
+''infinity bin'' is chosen,
+we don't schedule any padding (i.e., we schedule padding with
+infinite delay). If the developer does not want infinite delay, they
+should not give any tokens to the ''infinity bin''.
+
+If a token removal strategy is specified (via the
+`circpad_state_t.token_removal` field), each time padding is sent using a
+histogram, the padding machine will remove a token from the appropriate
+histogram bin whenever this endpoint sends *either a padding packet or a
+non-padding packet*. The different removal strategies govern what to do when
+the bin corresponding to the current inter-packet delay is empty.
+
+Token removal is optional. It is useful if you want to do things like specify
+a burst should be at least N packets long, and you only want to add padding
+packets if there are not enough non-padding packets. The cost of doing token
+removal is additional memory allocations for making per-circuit copies of
+your histogram that can be modified.
+
+### 3.4. Specifying Precise Cell Counts
+
+Padding machines should be able to specify the exact amount of padding they
+send. For histogram-based machines this can be done using a specific amount
+of tokens, but another (and perhaps easier) way to do this, is to use the
+[length_dist field](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L362)
+of the `circpad_state_t` structure.
+
+The `length_dist` field is basically a probability distribution similar to the
+padding probability distributions, which applies to a specific machine state
+and specifies the amount of padding we are willing to send during that state.
+This value gets sampled when we transition to that state (TODO document this
+in the code).
+
+### 3.5. Specifying Overhead Limits
+
+Separately from the length counts, it is possible to rate limit the overhead
+percentage of padding at both the global level across all machines, and on a
+per-machine basis.
+
+At the global level, the overhead percentage of all circuit padding machines
+as compared to total traffic can be limited through the Tor consensus
+parameter `circpad_global_max_padding_pct`. This overhead is defined as the
+percentage of padding cells *sent* out of the sum of non padding and padding
+cells *sent*, and is applied *only after* at least
+`circpad_global_allowed_cells` padding cells are sent by that relay or client
+(to allow for small bursts of pure padding on otherwise idle or freshly
+restarted relays). When both of these limits are hit by a relay or client, no
+further padding cells will be sent, until sufficient non-padding traffic is
+sent to cause the percentage of padding traffic to fall back below the
+threshold.
+
+Additionally, each individual padding machine can rate limit itself by
+filling in the fields `circpad_machine_spec_t.max_padding_percent` and
+`circpad_machine_spec_t.allowed_padding_count`, which behave identically to
+the consensus parameters, but only apply to that specific machine.
+
+## 4. Evaluating Padding Machines
+
+One of the goals of the circuit padding framework is to provide improved
+evaluation and scientific reproducibility for lower cost. This includes both
+the [choice](#13-computation-model) of the compact C structure representation
+(which has an easy-to-produce bitstring representation for optimization by
+gradient descent, GAs, or GANs), as well as rapid prototyping and evaluation.
+
+So far, whenever evaluation cost has been a barrier, each research group has
+developed their own ad-hoc packet-level simulators of various padding
+mechanisms for evaluating website fingerprinting attacks and defenses. The
+process typically involves doing a crawl of Alexa top sites over Tor, and
+recording the Tor cell count and timing information for each page in the
+trace. These traces are then fed to simulations of defenses, which output
+modified trace files.
+
+Because no standardized simulation and evaluation mechanism exists, it is
+often hard to tell if independent implementations of various attacks and
+defenses are in fact true-to-form or even properly calibrated for direct
+comparison, and discrepancies in results across the literature suggests
+this is not always so.
+
+Our preferred outcome with this framework is that machines are tuned
+and optimized on a tracing simulator, but that the final results come from
+an actual live network test of the defense. The traces from this final crawl
+should be preserved as artifacts to be run on the simulator and reproduced
+on the live network by future papers, ideally in journal venues that have an
+artifact preservation policy.
+
+### 4.1. Pure Simulation
+
+When doing initial tuning of padding machines, especially in adversarial
+settings, variations of a padding machine defense may need to be applied to
+network activity hundreds or even millions of times. The wall-clock time
+required to do this kind of tuning using live testing or even Shadow network
+emulation may often be prohibitive.
+
+To help address this, and to better standardize results, Tobias Pulls has
+implemented a [circpad machine trace simulator](https://github.com/pylls/circpad-sim),
+which uses Tor's unit test framework to simulate applying padding machines to
+circuit packet traces via a combination of Tor patches and python scripts. This
+simulator can be used to record traces from clients, Guards, Middles, Exits,
+and any other hop in the path, only for circuits that are run by the
+researcher. This makes it possible to safely record baseline traces and
+ultimately even mount passive attacks on the live network, without impacting
+or recording any normal user traffic.
+
+In this way, a live crawl of the Alexa top sites could be performed once, to
+produce a standard "undefended" corpus. Padding machines can be then quickly
+evaluated and tuned on these simulated traces in a standardized way, and then
+the results can then be [reproduced on the live Tor
+network](#44-Testing-on-the-Live-Network) with the machines running on your own relays.
+
+Please be mindful of the Limitations section of the simulator documentation,
+however, to ensure that you are aware of the edge cases and timing
+approximations that are introduced by this approach.
+
+### 4.2. Testing in Chutney
+
+The Tor Project provides a tool called
+[Chutney](https://github.com/torproject/chutney/) which makes it very easy to
+setup private Tor networks. While getting it work for the first time might
+take you some time of doc reading, the final result is well worth it for the
+following reasons:
+
+- You control all the relays and hence you have greater control and debugging
+ capabilities.
+- You control all the relays and hence you can toggle padding support on/off
+ at will.
+- You don't need to be cautious about overhead or damaging the real Tor
+ network during testing.
+- You don't even need to be online; you can do all your testing offline over
+ localhost.
+
+A final word of warning here is that since Chutney runs over localhost, the
+packet latencies and delays are completely different from the real Tor
+network, so if your padding machines rely on real network timings you will
+get different results on Chutney. You can work around this by using a
+different set of delays if Chutney is used, or by moving your padding
+machines to the real network when you want to do latency-related testing.
+
+### 4.3. Testing in Shadow
+
+[Shadow](https://shadow.github.io/) is an environment for running entire Tor
+network simulations, similar to Chutney, but designed to be both more memory
+efficient, as well as provide an accurate Tor network topology and latency
+model.
+
+While Shadow is significantly more memory efficient than Chutney, and can make
+use of extremely accurate Tor network capacity and latency models, it will not
+be as fast or efficient as the [circpad trace simulator](https://github.com/pylls/circpad-sim),
+if you need to do many many iterations of an experiment to tune your defense.
+
+### 4.4. Testing on the Live Network
+
+Live network testing is the gold standard for verifying that any attack or
+defense is behaving as expected, to minimize the influence of simplifying
+assumptions.
+
+However, it is not ethical, or necessarily possible, to run high-resolution
+traffic analysis attacks on the entire Tor network. But it is both ethical
+and possible to run small scale experiments that target only your own
+clients, who will only use your own Tor relays that support your new padding
+machines.
+
+We provide the `MiddleNodes` torrc directive to enable this, which will allow
+you to specify the fingerprints and/or IP netmasks of relays to be used in
+the second hop position. Options to restrict other hops also exist, if your
+padding system is padding to a different hop. The `HSLayer2Nodes` option
+overrides the `MiddleNodes` option for onion service circuits, if both are
+set. (The
+[vanguards addon](https://github.com/mikeperry-tor/vanguards/README_TECHNICAL.md)
+will set `HSLayer2Nodes`.)
+
+When you run your own clients, and use MiddleNodes to restrict your clients
+to use your relays, you can perform live network evaluations of a defense
+applied to whatever traffic crawl or activity your clients do.
+
+## 5. Example Padding Machines
+
+### 5.1. Deployed Circuit Setup Machines
+
+Tor currently has two padding machines enabled by default, which aim to hide
+certain features of the client-side onion service circuit setup protocol. For
+more details on their precise goal and function, please see
+[proposal 302](https://github.com/torproject/torspec/blob/master/proposals/302-padding-machines-for-onion-clients.txt)
+. In this section we will go over the code of those machines to clarify some
+of the engineering parts:
+
+#### 5.1.1. Overview
+
+The codebase of proposal 302 can be found in
+[circuitpadding_machines.c](https://github.com/torproject/tor/blob/master/src/core/or/circuitpadding_machines.c)
+and specifies four padding machines:
+
+- The [client-side introduction](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L60) circuit machine.
+- The [relay-side introduction](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L146) circuit machine.
+- The [client-side rendezvous](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L257) circuit machine
+- The [relay-side rendezvous](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L374) circuit machine.
+
+Each of those machines has its own setup function, and
+[they are all initialized](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.c#L2718)
+by the circuit padding framework. To understand more about them, please
+carefully read the individual setup function for each machine which are
+fairly well documented. Each function goes through the following steps:
+- Machine initialization
+ - Give it a [name](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L70)
+ - Specify [which hop](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L73) the padding should go to
+ - Specify whether it should be [client-side](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L75) or relay-side.
+- Specify for [which types of circuits](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L78) the machine should apply
+- Specify whether the circuits should be [kept alive](https://github.com/torproject/tor/blob/master/src/core/or/circuitpadding_machines.c#L112) until the machine finishes padding.
+- Sets [padding limits](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L116) to avoid too much overhead in case of bugs or errors.
+- Setup [machine states](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L120)
+ - Specify [state transitions](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L123).
+- Finally [register the machine](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L137) to the global machine list
+
+### 5.2. Adaptive Padding Early
+
+[Adaptive Padding Early](https://www.cs.kau.se/pulls/hot/thebasketcase-ape/)
+is a variant of Adaptive Padding/WTF-PAD that does not use histograms or token
+removal to shift padding distributions, but instead uses fixed parameterized
+distributions to specify inter-packet timing thresholds for burst and gap
+inter-packet delays.
+
+Tobias Pulls's [QuickStart Guide](CircuitPaddingQuickStart.md) describes how
+to get this machine up and running, and has links to branches with a working
+implementation.
+
+### 5.3. Sketch of Tamaraw
+
+The [Tamaraw defense
+paper](https://www.cypherpunks.ca/~iang/pubs/webfingerprint-ccs14.pdf) is the
+only defense to date that provides a proof of optimality for the finite-length
+website traffic fingerprinting domain. These bounds assume that a defense is
+able to perform a full, arbitrary transform of a trace that is under a fixed
+number of packets in length.
+
+The key insight to understand Tamaraw's optimality is that it achieves one
+such optimal transform by delaying traffic below a circuit's throughput. By
+doing this, it creates a queue that is rarely empty, allowing it to produce
+a provably optimal transform with minimal overhead. As [Section
+1.4](#14-other-deployment-constraints) explains, this queue
+cannot be maintained on the live Tor network without risk of out-of-memory
+conditions at relays.
+
+However, if the queue is not maintained in the Tor network, but instead by the
+application layer, it could be deployed by websites that opt in to using it.
+
+In this case, the application layer component would do *optional* constant
+rate shaping, negotiated between a web browser and a website. The Circuit
+Padding Framework can then easily fill in any missing gaps of cover traffic
+packets, and also ensure that only a fixed length number of packets are sent
+in total.
+
+However, for such a defense to be safe, additional care must be taken to
+ensure that the resulting traffic pattern still has a large
+anonymity/confusion set with other traces on the live network.
+
+Accomplishing this is an unsolved problem.
+
+### 5.4. Other Padding Machines
+
+Our partners in this project at RIT have produced a couple prototypes, based
+on their published research designs
+[REB and RBB](https://www.researchgate.net/publication/329743510_UNDERSTANDING_FEATURE_DISCOVERY_IN_WEBSITE_FINGERPRINTING_ATTACKS).
+
+As [their writeup
+explains](https://github.com/notem/tor-rbp-padding-machine-doc), because RBB
+uses delay, the circuit padding machine that they made is a no-delay version.
+
+They also ran into an issue with the 0-delay timing workaround for [bug
+31653](https://bugs.torproject.org/31653). Keep an eye on that bug for updates
+with improved workarounds/fixes.
+
+Their code is [available on github](https://github.com/notem/tor/tree/circuit_padding_rbp_machine).
+
+
+## 6. Framework Implementation Details
+
+If you need to add additional events, conditions, or other features to the
+circuit padding framework, then this section is for you.
+
+### 6.1. Memory Allocation Conventions
+
+If the existing circuit padding features are sufficient for your needs, then
+you do not need to worry about memory management or pointer lifespans. The
+circuit padding framework should take care of this for you automatically.
+
+However, if you need to add new padding machine features to support your
+padding machines, then it will be helpful to understand how circuits
+correspond to the global machine definitions, and how mutable padding machine
+memory is managed.
+
+#### 6.1.1. Circuits and Padding Machines
+
+In Tor, the
+[circuit_t structure](https://github.com/torproject/tor/blob/master/src/core/or/circuit_st.h)
+is the superclass structure for circuit-related state that is used on both
+clients and relays. On clients, the actual datatype of the object pointed to
+by `circuit_t *` is the subclass structure
+[origin_circuit_t](https://github.com/torproject/tor/blob/master/src/core/or/origin_circuit_st.h). The
+macros `CIRCUIT_IS_ORIGIN()` and `TO_ORIGIN_CIRCUIT()` are used to determine
+if a circuit is a client-side (origin) circuit and to cast the pointer safely
+to `origin_circuit_t *`.
+
+Because circuit padding machines can be present at both clients and relays,
+the circuit padding fields are stored in the `circuit_t *` superclass
+structure. Notice that there are actually two sets of circuit padding fields:
+a `const circpad_machine_spec_t *` array, and a `circpad_machine_runtime_t *`
+array. Each of these arrays holds at most two elements, as there can be at
+most two padding machines on each circuit.
+
+The `const circpad_machine_spec_t *` points to a globally allocated machine
+specification. These machine specifications are
+allocated and set up during Tor program startup, in `circpad_machines_init()`
+in
+[circuitpadding.c](https://github.com/torproject/tor/blob/master/src/core/or/circuitpadding.c). Because
+the machine specification object is shared by all circuits, it must not be
+modified or freed until program exit (by `circpad_machines_free()`). The
+`const` qualifier should enforce this at compile time.
+
+The `circpad_machine_runtime_t *` array member points to the mutable runtime
+information for machine specification at that same array index. This runtime
+structure keeps track of the current machine state, packet counts, and other
+information that must be updated as the machine operates. When a padding
+machine is successfully negotiated `circpad_setup_machine_on_circ()` allocates
+the associated runtime information.
+
+#### 6.1.2. Histogram Management
+
+If a `circpad_state_t` of a machine specifies a `token_removal` strategy
+other than `CIRCPAD_TOKEN_REMOVAL_NONE`, then every time
+there is a state transition into this state, `circpad_machine_setup_tokens()`
+will copy the read-only `circpad_state_t.histogram` array into a mutable
+version at `circpad_machine_runtime_t.histogram`. This mutable copy is used
+to decrement the histogram bin accounts as packets are sent, as per the
+specified token removal strategy.
+
+When the state machine transitions out of this state, the mutable histogram copy is freed
+by this same `circpad_machine_setup_tokens()` function.
+
+#### 6.1.3. Deallocation and Shutdown
+
+As an optimization, padding machines can be swapped in and out by the client
+without waiting a full round trip for the relay machine to shut down.
+
+Internally, this is accomplished by immediately freeing the heap-allocated
+`circuit_t.padding_info` field corresponding to that machine, but still preserving the
+`circuit_t.padding_machine` pointer to the global padding machine
+specification until the response comes back from the relay. Once the response
+comes back, that `circuit_t.padding_machine` pointer is set to NULL, if the
+response machine number matches the current machine present.
+
+Because of this partial shutdown condition, we have two macros for iterating
+over machines. `FOR_EACH_ACTIVE_CIRCUIT_MACHINE_BEGIN()` is used to iterate
+over machines that have both a `circuit_t.padding_info` slot and a
+`circuit_t.padding_machine` slot occupied. `FOR_EACH_CIRCUIT_MACHINE_BEGIN()`
+is used when we need to iterate over all machines that are either active or
+are simply waiting for a response to a shutdown request.
+
+If the machine is replaced instead of just shut down, then the client frees
+the `circuit_t.padding_info`, and then sets the `circuit_t.padding_machine`
+and `circuit_t.padding_info` fields for this next machine immediately. This is
+done in `circpad_add_matching_machines()`. In this case, since the new machine
+should have a different machine number, the shut down response from the relay
+is silently discarded, since it will not match the new machine number.
+
+If this sequence of machine teardown and spin-up happens rapidly enough for
+the same machine number (as opposed to different machines), then a race
+condition can happen. This is
+[known bug #30992](https://bugs.torproject.org/30992).
+
+When the relay side decides to shut down a machine, it sends a
+`RELAY_COMMAND_PADDING_NEGOTIATED` towards the client. If this cell matches the
+current machine number on the client, that machine is torn down, by freeing
+the `circuit_t.padding_info` slot and immediately setting
+`circuit_t.padding_machine` slot to NULL.
+
+Additionally, if Tor decides to close a circuit forcibly due to error before
+the padding machine is shut down, then `circuit_t.padding_info` is still
+properly freed by the call to `circpad_circuit_free_all_machineinfos()`
+in `circuit_free_()`.
+
+### 6.2. Machine Application Events
+
+The framework checks client-side origin circuits to see if padding machines
+should be activated or terminated during specific event callbacks in
+`circuitpadding.c`. We list these event callbacks here only for reference. You
+should not modify any of these callbacks to get your machine to run; instead,
+you should use the `circpad_machine_spec_t.conditions` field.
+
+However, you may add new event callbacks if you need other activation events,
+for example to provide obfuscation-layer or application-layer signaling. Any
+new event callbacks should behave exactly like the existing callbacks.
+
+During each of these event callbacks, the framework checks to see if any
+current running padding machines have conditions that no longer apply as a
+result of the event, and shuts those machines down. Then, it checks to see if
+any new padding machines should be activated as a result of the event, based
+on their circuit application conditions. **Remember: Machines are checked in
+reverse order in the machine list. This means that later, more recently added
+machines take precedence over older, earlier entries in each list.**
+
+Both of these checks are performed using the machine application conditions
+that you specify in your machine's `circpad_machine_spec_t.conditions` field.
+
+The machine application event callbacks are prefixed by `circpad_machine_event_` by convention in circuitpadding.c. As of this writing, these callbacks are:
+
+ - `circpad_machine_event_circ_added_hop()`: Called whenever a new hop is
+ added to a circuit.
+ - `circpad_machine_event_circ_built()`: Called when a circuit has completed
+ construction and is
+ opened. <!-- open != ready for traffic. Which do we mean? -nickm -->
+ - `circpad_machine_event_circ_purpose_changed()`: Called when a circuit
+ changes purpose.
+ - `circpad_machine_event_circ_has_no_relay_early()`: Called when a circuit
+ runs out of `RELAY_EARLY` cells.
+ - `circpad_machine_event_circ_has_streams()`: Called when a circuit gets a
+ stream attached.
+ - `circpad_machine_event_circ_has_no_streams()`: Called when the last
+ stream is detached from a circuit.
+
+### 6.3. Internal Machine Events
+
+To provide for some additional capabilities beyond simple finite state machine
+behavior, the circuit padding machines also have internal events that they
+emit to themselves when packet count length limits are hit, when the Infinity
+bin is sampled, and when the histogram bins are emptied of all tokens.
+
+These events are implemented as `circpad_internal_event_*` functions in
+`circuitpadding.c`, which are called from various areas that determine when
+the events should occur.
+
+While the conditions that trigger these internal events to be called may be
+complex, they are processed by the state machine definitions in a nearly
+identical manner as the cell processing events, with the exception that they
+are sent to the current machine only, rather than all machines on the circuit.
+
+
+## 7. Future Features and Optimizations
+
+While implementing the circuit padding framework, our goal was to deploy a
+system that obscured client-side onion service circuit setup and supported
+deployment of WTF-PAD and/or APE. Along the way, we noticed several features
+that might prove useful to others, but were not essential to implement
+immediately. We do not have immediate plans to implement these ideas, but we
+would gladly accept patches that do so.
+
+The following list gives an overview of these improvements, but as this
+document ages, it may become stale. The canonical list of improvements that
+researchers may find useful is tagged in our bugtracker with
+[circpad-researchers](https://trac.torproject.org/projects/tor/query?keywords=~circpad-researchers),
+and the list of improvements that are known to be necessary for some research
+areas are tagged with
+[circpad-researchers-want](https://trac.torproject.org/projects/tor/query?keywords=~circpad-researchers-want).
+
+Please consult those lists for the latest status of these issues. Note that
+not all fixes will be backported to all Tor versions, so be mindful of which
+Tor releases receive which fixes as you conduct your experiments.
+
+### 7.1. Load Balancing and Flow Control
+
+Fortunately, non-Exit bandwidth is already plentiful and exceeds the Exit
+capacity, and we anticipate that if we inform our relay operator community of
+the need for non-Exit bandwidth to satisfy padding overhead requirements,
+they will be able to provide that with relative ease.
+
+Unfortunately, padding machines that have large quantities of overhead will
+require changes to our load balancing system to account for this
+overhead. The necessary changes are documented in
+[Proposal 265](https://gitweb.torproject.org/torspec.git/tree/proposals/265-load-balancing-with-overhead.txt).
+
+Additionally, padding cells are not currently subjected to flow control. For
+high amounts of padding, we may want to change this. See [ticket
+31782](https://bugs.torproject.org/31782) for details.
+
+### 7.2. Timing and Queuing Optimizations
+
+The circuitpadding framework has some timing related issues that may impact
+results. If high-resolution timestamps are fed to opaque deep learning
+trainers, those training models may end up able to differentiate padding
+traffic from non-padding traffic due to these timing bugs.
+
+The circuit padding cell event callbacks come from post-decryption points in
+the cell processing codepath, and from the pre-queue points in the cell send
+codepath. This means that our cell events will not reflect the actual time
+when packets are read or sent on the wire. This is much worse in the send
+direction, as the circuitmux queue, channel outbuf, and kernel TCP buffer will
+impose significant additional delay between when we currently report that a
+packet was sent, and when it actually hits the wire.
+
+[Ticket 29494](https://bugs.torproject.org/29494) has a more detailed
+description of this problem, and an experimental branch that changes the cell
+event callback locations to be from circuitmux post-queue, which with KIST,
+should be an accurate reflection of when they are actually sent on the wire.
+
+If your padding machine and problem space depends on very accurate notions of
+relay-side packet timing, please try that branch and let us know on the
+ticket if you need any further assistance fixing it up.
+
+Additionally, with these changes, it will be possible to provide further
+overhead reducing optimizations by letting machines specify flags to indicate
+that padding should not be sent if there are any cells pending in the cell
+queue, for doing things like extending cell bursts more accurately and with
+less overhead.
+
+However, even if we solve the queuing issues, Tor's current timers are not as
+precise as some padding designs may require. We will still have issues of
+timing precision to solve. [Ticket 31653](https://bugs.torproject.org/31653)
+describes an issue the circuit padding system has with sending 0-delay padding
+cells, and [ticket 32670](https://bugs.torproject.org/32670) describes a
+libevent timer accuracy issue, which causes callbacks to vary up to 10ms from
+their scheduled time, even in absence of load.
+
+All of these issues strongly suggest that you either truncate the resolution
+of any timers you feed to your classifier, or that you omit timestamps
+entirely from the classification problem until these issues are addressed.
+
+### 7.3. Better Machine Negotiation
+
+Circuit padding is applied to circuits through machine conditions.
+
+The following machine conditions may be useful for some use cases, but have
+not been implemented yet:
+ * [Exit Policy-based Stream Conditions](https://bugs.torproject.org/29083)
+ * [Probability to apply machine/Cointoss condition](https://bugs.torproject.org/30092)
+ * [Probability distributions for launching new padding circuit(s)](https://bugs.torproject.org/31783)
+ * [More flexible purpose handling](https://bugs.torproject.org/32040)
+
+Additionally, the following features may help to obscure that padding is being
+negotiated, and/or streamline that negotiation process:
+ * [Always send negotiation cell on all circuits](https://bugs.torproject.org/30172)
+ * [Better shutdown handling](https://bugs.torproject.org/30992)
+ * [Preference-ordered negotiation menu](https://bugs.torproject.org/30348)
+
+### 7.4. Probabilistic State Transitions
+
+Right now, the state machine transitions are fully deterministic. However,
+one could imagine a state machine that uses probabilistic transitions between
+states to simulate a random walk or Hidden Markov Model traversal across
+several pages.
+
+The simplest way to implement this is to make the `circpad_state_t.next_state` array
+into an array of structs that have a next state field, and a probability to
+transition to that state.
+
+If you need this feature, please see [ticket
+31787](https://bugs.torproject.org/31787) for more details.
+
+### 7.5. More Complex Pattern Recognition
+
+State machines are extremely efficient sequence recognition devices. But they
+are not great pattern recognition devices. This is one of the reasons why
+[Adaptive Padding](https://www.freehaven.net/anonbib/cache/ShWa-Timing06.pdf)
+used state machines in combination with histograms, to model the target
+distribution of interpacket delays for transmitted packets.
+
+However, there currently is no such optimization for reaction to patterns of
+*received* traffic. There may also be cases where defenses must react to more
+complex patterns of sent traffic than can be expressed by our current
+histogram and length count events.
+
+For example: if you wish your machine to react to a certain count of incoming
+cells in a row, right now you have to have a state for each cell, and use the
+infinity bin to timeout of the sequence in each state. We could make this more
+compact if each state had an arrival cell counter and inter-cell timeout. Or
+we could make more complex mechanisms to recognize certain patterns of arrival
+traffic in a state.
+
+The best way to build recognition primitives like this into the framework is
+to add additional [Internal Machine Events](#63-internal-machine-events) for
+the pattern in question.
+
+As another simple example, a useful additional event might be to transition
+whenever any of your histogram bins are empty, rather than all of them. To do
+this, you would add `CIRCPAD_EVENT_ANY_BIN_EMPTY` to the enum
+`circpad_event_t` in `circuitpadding.h`. You would then create a function
+`circuitpadding_internal_event_any_bin_empty()`, which would work just like
+`circuitpadding_internal_event_bin_empty()`, and also be called from
+`check_machine_token_supply()` in `circuitpadding.c` but with the check for
+each bin being zero instead of the total. With this change, new machines could
+react to this new event in the same way as any other.
+
+If you have any other ideas that may be useful, please comment on [ticket
+32680](https://bugs.torproject.org/32680).
+
+
+## 8. Open Research Problems
+
+### 8.1. Onion Service Circuit Setup
+
+Our circuit setup padding does not address timing-based features, only
+packet counts. Deep learning can probably see this.
+
+However, before going too deep down the timing rabithole, we may need to make
+[some improvements to Tor](#72-timing-and-queuing-optimizations). Please
+comment on those tickets if you need this.
+
+### 8.2. Onion Service Fingerprinting
+
+We have done nothing to obscure the service side of onion service circuit
+setup. Because service-side onion services will have the reverse traffic byte
+counts as normal clients, they will likely need some kind of [hybrid
+application layer traffic shaping](#53-sketch-of-tamaraw), in addition to
+simple circuit setup obfuscation.
+
+Fingerprinting in
+[combination](https://github.com/mikeperry-tor/vanguards/blob/master/README_SECURITY.md)
+with
+[vanguards](https://github.com/mikeperry-tor/vanguards/blob/master/README_TECHNICAL.md)
+ia also an open issue.
+
+### 8.3. Open World Fingerprinting
+
+Similarly, Open World/clearweb website fingerprinting defenses remain
+an unsolved problem from the practicality point of view. The original WTF-PAD
+defense was never tuned, and it is showing accuracy issues against deep
+learning attacks.
+
+### 8.4. Protocol Usage Fingerprinting
+
+Traffic Fingerprinting to determine the protocol in use by a client has not
+been studied, either from the attack or defense point of view.
+
+### 8.5. Datagram Transport Side Channels
+
+Padding can reduce the accuracy of dropped-cell side channels in such
+transports, but we don't know [how to measure
+this](https://lists.torproject.org/pipermail/tor-dev/2018-November/013562.html).
+
+## 9. Must Read Papers
+
+These are by far the most important papers in the space, to date:
+
+ - [Tamaraw](https://www.cypherpunks.ca/~iang/pubs/webfingerprint-ccs14.pdf)
+ - [Bayes, Not Naive](https://www.petsymposium.org/2017/papers/issue4/paper50-2017-4-source.pdf)
+ - [Anonymity Trilemma](https://eprint.iacr.org/2017/954.pdf)
+ - [WTF-PAD](http://arxiv.org/pdf/1512.00524)
+
+Except for WTF-PAD, these papers were selected because they point towards
+optimality bounds that can be benchmarked against.
+
+We cite them even though we are skeptical that provably optimal defenses can
+be constructed, at least not without trivial or impractical transforms (such as
+those that can be created with unbounded queue capacity, or stored knowledge
+of traces for every possible HTTP trace on the Internet).
+
+We also are not demanding an optimality or security proof for every defense.
+
+Instead, we cite the above as benchmarks. We believe the space, especially the
+open-world case, to be more akin to an optimization problem, where a
+WTF-PAD-like defense must be tuned through an optimizer to produce results
+comparable to provably optimal but practically unrealizable defenses, through
+rigorous adversarial evaluation.
diff --git a/doc/HACKING/CircuitPaddingQuickStart.md b/doc/HACKING/CircuitPaddingQuickStart.md
new file mode 100644
index 0000000000..167ff9f292
--- /dev/null
+++ b/doc/HACKING/CircuitPaddingQuickStart.md
@@ -0,0 +1,263 @@
+# A Padding Machine from Scratch
+
+A quickstart guide by Tobias Pulls.
+
+This document describes the process of building a "padding machine" in tor's new
+circuit padding framework from scratch. Notes were taken as part of porting
+[Adaptive Padding Early
+(APE)](https://www.cs.kau.se/pulls/hot/thebasketcase-ape/) from basket2 to the
+circuit padding framework. The goal is just to document the process and provide
+useful pointers along the way, not create a useful machine.
+
+The quick and dirty plan is to:
+1. clone and compile tor
+2. use newly built tor in TB and at small (non-exit) relay we run
+3. add a bare-bones APE padding machine
+4. run the machine, inspect logs for activity
+5. port APE's state machine without thinking much about parameters
+
+## Clone and compile tor
+
+```bash
+git clone https://git.torproject.org/tor.git
+cd tor
+git checkout tor-0.4.1.5
+```
+Above we use the tag for tor-0.4.1.5 where the circuit padding framework was
+released. Note that this version of the framework is missing many features and
+fixes that have since been merged to origin/master. If you need the newest
+framework features, you should use that master instead.
+
+```bash
+sh autogen.sh
+./configure
+make
+```
+When you run `./configure` you'll be told of missing dependencies and packages
+to install on debian-based distributions. Important: if you plan to run `tor` on
+a relay as part of the real Tor network and your server runs a distribution that
+uses systemd, then I'd recommend that you `apt install dpkg dpkg-dev
+libevent-dev libssl-dev asciidoc quilt dh-apparmor libseccomp-dev dh-systemd
+libsystemd-dev pkg-config dh-autoreconf libfakeroot zlib1g zlib1g-dev automake
+liblzma-dev libzstd-dev` and ensure that tor has systemd support enabled:
+`./configure --enable-systemd`. Without this, on a recent Ubuntu, my tor service
+was forcefully restarted (SIGINT interrupt) by systemd every five minutes.
+
+If you want to install on your localsystem, run `make install`. For our case we
+just want the tor binary at `src/app/tor`.
+
+## Use tor in TB and at a relay
+Download and install a fresh Tor Browser (TB) from torproject.org. Make sure it
+works. From the command line, relative to the folder created when you extracted
+TB, run `./Browser/start-tor-browser --verbose` to get some basic log output.
+Note the version of tor, in my case, `Tor 0.4.0.5 (git-bf071e34aa26e096)` as
+part of TB 8.5.4. Shut down TB, copy the `tor` binary that you compiled earlier
+and replace `Browser/TorBrowser/Tor/tor`. Start TB from the command line again,
+you should see a different version, in my case `Tor 0.4.1.5
+(git-439ca48989ece545)`.
+
+The relay we run is also on linux, and `tor` is located at `/usr/bin/tor`. To
+view relevant logs since last boot `sudo journalctl -b /usr/bin/tor`, where we
+find `Tor 0.4.0.5 running on Linux`. Copy the locally compiled `tor` to the
+relay at a temporary location and then make sure it's ownership and access
+rights are identical to `/usr/bin/tor`. Next, shut down the running tor service
+with `sudo service tor stop`, wait for it to stop (typically 30s), copy our
+locally compiled tor to replace `/usr/bin/tor` then start the service again.
+Checking the logs we see `or 0.4.1.5 (git-439ca48989ece545)`.
+
+Repeatedly shutting down a relay is detrimental to the network and should be
+avoided. Sorry about that.
+
+We have one more step left before we move on the machine: configure TB to always
+use our middle relay. Edit `Browser/TorBrowser/Data/Tor/torrc` and set
+`MiddleNodes <fingerprint>`, where `<fingerprint>` is the fingerprint of the
+relay. Start TB, visit a website, and manually confirm that the middle is used
+by looking at the circuit display.
+
+## Add a bare-bones APE padding machine
+Now the fun part. We have several resources at our disposal (mind that links
+might be broken in the future, just search for the headings):
+- The official [Circuit Padding Developer
+ Documentation](https://storm.torproject.org/shared/ChieH_sLU93313A2gopZYT3x2waJ41hz5Hn2uG1Uuh7).
+- Notes we made on the [implementation of the circuit padding
+ framework](https://github.com/pylls/padding-machines-for-tor/blob/master/notes/circuit-padding-framework.md).
+- The implementation of the current circuit padding machines in tor:
+ [circuitpadding.c](https://gitweb.torproject.org/tor.git/tree/src/core/or/circuitpadding_machines.c)
+ and
+ [circuitpadding_machines.h](https://gitweb.torproject.org/tor.git/tree/src/core/or/circuitpadding_machines.h).
+
+Please consult the above links for details. Moving forward, the focus is to
+describe what was done, not necessarily explaining all the details why.
+
+Since we plan to make changes to tor, create a new branch `git checkout -b
+circuit-padding-ape-machine tor-0.4.1.5`.
+
+We start with declaring two functions, one for the machine at the client and one
+at the relay, in `circuitpadding_machines.h`:
+
+```c
+void circpad_machine_relay_wf_ape(smartlist_t *machines_sl);
+void circpad_machine_client_wf_ape(smartlist_t *machines_sl);
+```
+
+The definitions go into `circuitpadding_machines.c`:
+
+```c
+/**************** Adaptive Padding Early (APE) machine ****************/
+
+/**
+ * Create a relay-side padding machine based on the APE design.
+ */
+void
+circpad_machine_relay_wf_ape(smartlist_t *machines_sl)
+{
+ circpad_machine_spec_t *relay_machine
+ = tor_malloc_zero(sizeof(circpad_machine_spec_t));
+
+ relay_machine->name = "relay_wf_ape";
+ relay_machine->is_origin_side = 0; // relay-side
+
+ // Pad to/from the middle relay, only when the circuit has streams
+ relay_machine->target_hopnum = 2;
+ relay_machine->conditions.min_hops = 2;
+ relay_machine->conditions.state_mask = CIRCPAD_CIRC_STREAMS;
+
+ // limits to help guard against excessive padding
+ relay_machine->allowed_padding_count = 1;
+ relay_machine->max_padding_percent = 1;
+
+ // one state to start with: START (-> END, never takes a slot in states)
+ circpad_machine_states_init(relay_machine, 1);
+ relay_machine->states[CIRCPAD_STATE_START].
+ next_state[CIRCPAD_EVENT_NONPADDING_SENT] =
+ CIRCPAD_STATE_END;
+
+ // register the machine
+ relay_machine->machine_num = smartlist_len(machines_sl);
+ circpad_register_padding_machine(relay_machine, machines_sl);
+
+ log_info(LD_CIRC,
+ "Registered relay WF APE padding machine (%u)",
+ relay_machine->machine_num);
+}
+
+/**
+ * Create a client-side padding machine based on the APE design.
+ */
+void
+circpad_machine_client_wf_ape(smartlist_t *machines_sl)
+{
+ circpad_machine_spec_t *client_machine
+ = tor_malloc_zero(sizeof(circpad_machine_spec_t));
+
+ client_machine->name = "client_wf_ape";
+ client_machine->is_origin_side = 1; // client-side
+
+ /** Pad to/from the middle relay, only when the circuit has streams, and only
+ * for general purpose circuits (typical for web browsing)
+ */
+ client_machine->target_hopnum = 2;
+ client_machine->conditions.min_hops = 2;
+ client_machine->conditions.state_mask = CIRCPAD_CIRC_STREAMS;
+ client_machine->conditions.purpose_mask =
+ circpad_circ_purpose_to_mask(CIRCUIT_PURPOSE_C_GENERAL);
+
+ // limits to help guard against excessive padding
+ client_machine->allowed_padding_count = 1;
+ client_machine->max_padding_percent = 1;
+
+ // one state to start with: START (-> END, never takes a slot in states)
+ circpad_machine_states_init(client_machine, 1);
+ client_machine->states[CIRCPAD_STATE_START].
+ next_state[CIRCPAD_EVENT_NONPADDING_SENT] =
+ CIRCPAD_STATE_END;
+
+ client_machine->machine_num = smartlist_len(machines_sl);
+ circpad_register_padding_machine(client_machine, machines_sl);
+ log_info(LD_CIRC,
+ "Registered client WF APE padding machine (%u)",
+ client_machine->machine_num);
+}
+```
+
+We also have to modify `circpad_machines_init()` in `circuitpadding.c` to
+register our machines:
+
+```c
+ /* Register machines for the APE WF defense */
+ circpad_machine_client_wf_ape(origin_padding_machines);
+ circpad_machine_relay_wf_ape(relay_padding_machines);
+```
+
+We run `make` to get a new `tor` binary and copy it to our local TB.
+
+## Run the machine
+To be able
+to view circuit info events in the console as we launch TB, we add `Log
+[circ]info notice stdout` to `torrc` of TB.
+
+Running TB to visit example.com we first find in the log:
+
+```
+Aug 30 18:36:43.000 [info] circpad_machine_client_hide_intro_circuits(): Registered client intro point hiding padding machine (0)
+Aug 30 18:36:43.000 [info] circpad_machine_relay_hide_intro_circuits(): Registered relay intro circuit hiding padding machine (0)
+Aug 30 18:36:43.000 [info] circpad_machine_client_hide_rend_circuits(): Registered client rendezvous circuit hiding padding machine (1)
+Aug 30 18:36:43.000 [info] circpad_machine_relay_hide_rend_circuits(): Registered relay rendezvous circuit hiding padding machine (1)
+Aug 30 18:36:43.000 [info] circpad_machine_client_wf_ape(): Registered client WF APE padding machine (2)
+Aug 30 18:36:43.000 [info] circpad_machine_relay_wf_ape(): Registered relay WF APE padding machine (2)
+```
+
+All good, our machine is running. Looking further we find:
+
+```
+Aug 30 18:36:55.000 [info] circpad_setup_machine_on_circ(): Registering machine client_wf_ape to origin circ 2 (5)
+Aug 30 18:36:55.000 [info] circpad_node_supports_padding(): Checking padding: supported
+Aug 30 18:36:55.000 [info] circpad_negotiate_padding(): Negotiating padding on circuit 2 (5), command 2
+Aug 30 18:36:55.000 [info] circpad_machine_spec_transition(): Circuit 2 circpad machine 0 transitioning from 0 to 65535
+Aug 30 18:36:55.000 [info] circpad_machine_spec_transitioned_to_end(): Padding machine in end state on circuit 2 (5)
+Aug 30 18:36:55.000 [info] circpad_circuit_machineinfo_free_idx(): Freeing padding info idx 0 on circuit 2 (5)
+Aug 30 18:36:55.000 [info] circpad_handle_padding_negotiated(): Middle node did not accept our padding request on circuit 2 (5)
+```
+We see that our middle support padding (since we upgraded to tor-0.4.1.5), that
+we attempt to negotiate, our machine starts on the client, transitions to the
+end state, and is freed. The last line shows that the middle doesn't have a
+padding machine that can run.
+
+Next, we follow the same steps as earlier and replace the modified `tor` at our
+middle relay. We don't update the logging there to avoid logging on the info
+level on the live network. Looking at the client log again we see that
+negotiation works as before except for the last line: it's missing, so the
+machine is running at the middle as well.
+
+## Implementing the APE state machine
+
+Porting is fairly straightforward: define the states for all machines, add two
+more machines (for the receive portion of WTFP-PAD, beyond AP), and pick
+reasonable parameters for the distributions (I completely winged it now, as when
+implementing APE). The [circuit-padding-ape-machine
+branch](https://github.com/pylls/tor/tree/circuit-padding-ape-machine) contains
+the commits for the full machines with plenty of comments.
+
+Some comments on the process:
+
+- `tor-0.4.1.5` does not support two machines on the same circuit, the following
+ fix has to be made: https://trac.torproject.org/projects/tor/ticket/31111 .
+ The good news is that everything else seems to work after the small change in
+ the fix.
+- APE randomizes its distributions. Currently, this can only be done during
+ start of `tor`. This makes sense in the censorship circumvention setting
+ (`obfs4`), less so for WF defenses: further randomizing each circuit is likely
+ a PITA for attackers with few downsides.
+- it was annoying to figure out that the lack of systemd support in my compiled
+ tor caused systemd to interrupt (SIGINT) my tor process at the middle relay
+ every five minutes. Updated build steps above to hopefully save others the
+ pain.
+- there's for sure some bug on relays when sending padding cells too early (?).
+ It can happen with some probability with the APE implementation due to
+ `circpad_machine_relay_wf_ape_send()`. Will investigate next.
+- Moving the registration of machines from the definition of the machines to
+ `circpad_machines_init()` makes sense, as suggested in the circuit padding doc
+ draft.
+
+Remember that APE is just a proof-of-concept and we make zero claims about its
+ability to withstand WF attacks, in particular those based on deep learning.