10 files changed, 1753 insertions, 132 deletions
diff --git a/doc/HACKING/CircuitPaddingDevelopment.md b/doc/HACKING/CircuitPaddingDevelopment.md
new file mode 100644
index 0000000000..a4e65697b8
--- /dev/null
+++ b/doc/HACKING/CircuitPaddingDevelopment.md
@@ -0,0 +1,1234 @@
+# Circuit Padding Developer Documentation
+
+This document is written for researchers who wish to prototype and evaluate circuit-level padding defenses in Tor.
+
+Written by Mike Perry and George Kadianakis.
+
+# Table of Contents
+
+- [0. Background](#0-background)
+- [1. Introduction](#1-introduction)
+    - [1.1. System Overview](#11-system-overview)
+    - [1.2. Layering Model](#12-layering-model)
+    - [1.3. Computation Model](#13-computation-model)
+    - [1.4. Deployment Constraints](#14-other-deployment-constraints)
+- [2. Creating New Padding Machines](#2-creating-new-padding-machines)
+    - [2.1. Registering a New Padding Machine](#21-registering-a-new-padding-machine)
+    - [2.2. Machine Activation and Shutdown](#22-machine-activation-and-shutdown)
+- [3. Specifying Padding Machines](#3-specifying-padding-machines)
+    - [3.1. Padding Machine States](#31-padding-machine-states)
+    - [3.2. Padding Machine State Transitions](#32-padding-machine-state-transitions)
+    - [3.3. Specifying Per-State Padding](#33-specifying-per-state-padding)
+    - [3.4. Specifying Precise Cell Counts](#34-specifying-precise-cell-counts)
+    - [3.5. Specifying Overhead Limits](#35-specifying-overhead-limits)
+- [4. Evaluating Padding Machines](#4-evaluating-padding-machines)
+    - [4.1. Pure Simulation](#41-pure-simulation)
+    - [4.2. Testing in Chutney](#42-testing-in-chutney)
+    - [4.3. Testing in Shadow](#43-testing-in-shadow)
+    - [4.4. Testing on the Live Network](#44-testing-on-the-live-network)
+- [5. Example Padding Machines](#5-example-padding-machines)
+    - [5.1. Deployed Circuit Setup Machines](#51-deployed-circuit-setup-machines)
+    - [5.2. Adaptive Padding Early](#52-adaptive-padding-early)
+    - [5.3. Sketch of Tamaraw](#53-sketch-of-tamaraw)
+    - [5.4. Other Padding Machines](#54-other-padding-machines)
+- [6. Framework Implementation Details](#6-framework-implementation-details)
+    - [6.1. Memory Allocation Conventions](#61-memory-allocation-conventions)
+    - [6.2. Machine Application Events](#62-machine-application-events)
+    - [6.3. Internal Machine Events](#63-internal-machine-events)
+- [7. Future Features and Optimizations](#7-future-features-and-optimizations)
+    - [7.1. Load Balancing and Flow Control](#71-load-balancing-and-flow-control)
+    - [7.2. Timing and Queuing Optimizations](#72-timing-and-queuing-optimizations)
+    - [7.3. Better Machine Negotiation](#73-better-machine-negotiation)
+    - [7.4. Probabilistic State Transitions](#74-probabilistic-state-transitions)
+    - [7.5. More Complex Pattern Recognition](#75-more-complex-pattern-recognition)
+- [8. Open Research Problems](#8-open-research-problems)
+    - [8.1. Onion Service Circuit Setup](#81-onion-service-circuit-setup)
+    - [8.2. Onion Service Fingerprinting](#82-onion-service-fingerprinting)
+    - [8.3. Open World Fingerprinting](#83-open-world-fingerprinting)
+    - [8.4. Protocol Usage Fingerprinting](#84-protocol-usage-fingerprinting)
+    - [8.5. Datagram Transport Side Channels](#85-datagram-transport-side-channels)
+- [9. Must Read Papers](#9-must-read-papers)
+
+
+## 0. Background
+
+Tor supports both connection-level and circuit-level padding, and both
+systems are live on the network today. The connection-level padding behavior
+is described in [section 2 of
+padding-spec.txt](https://github.com/torproject/torspec/blob/master/padding-spec.txt#L47). The
+circuit-level padding behavior is described in [section 3 of
+padding-spec.txt](https://github.com/torproject/torspec/blob/master/padding-spec.txt#L282).
+
+These two systems are orthogonal and should not be confused. The
+connection-level padding system is only active while the TLS connection is
+otherwise idle. Moreover, it regards circuit-level padding as normal data
+traffic, and hence while the circuit-level padding system is actively padding,
+the connection-level padding system will not add any additional overhead.
+
+While the currently deployed circuit-level padding behavior is quite simple,
+it is built on a flexible framework. This framework supports the description
+of event-driven finite state machine by filling in fields of a simple C
+structure, and is designed to support any delay-free statistically shaped
+cover traffic on individual circuits, with cover traffic flowing to and from a
+node of the implementor's choice (Guard, Middle, Exit, Rendezvous, etc).
+
+This class of system was first proposed in
+[Timing analysis in low-latency mix networks: attacks and defenses](https://www.freehaven.net/anonbib/cache/ShWa-Timing06.pdf)
+by Shmatikov and Wang, and extended for the website traffic fingerprinting
+domain by Juarez et al. in
+[Toward an Efficient Website Fingerprinting Defense](http://arxiv.org/pdf/1512.00524). The
+framework also supports fixed parameterized probability distributions, as
+used in [APE](https://www.cs.kau.se/pulls/hot/thebasketcase-ape/) by Tobias
+Pulls, and many other features.
+
+This document describes how to use Tor's circuit padding framework to
+implement and deploy novel delay-free cover traffic defenses.
+
+## 1. Introduction
+
+The circuit padding framework is the official way to implement padding
+defenses in Tor. It may be used in combination with application-layer
+defenses, and/or obfuscation defenses, or on its own.
+
+Its current design should be enough to deploy most defenses without
+modification, but you can extend it to [provide new
+features](#7-future-features-and-optimizations) as well.
+
+### 1.1. System Overview
+
+Circuit-level padding can occur between Tor clients and relays at any hop of
+one of the client's circuits. Both parties need to support the same padding
+mechanisms for the system to work, and the client must enable it.
+
+We added a padding negotiation relay cell to the Tor protocol that clients use
+to ask a relay to start padding, as well as a torrc directive for researchers
+to pin their clients' relay selection to the subset of Tor nodes that
+implement their custom defenses, to support ethical live network testing and
+evaluation.
+
+Circuit-level padding is performed by 'padding machines'. A padding machine is
+a finite state machine. Every state specifies a different form of
+padding style, or stage of padding, in terms of inter-packet timings and total
+packet counts.
+
+Padding state machines are specified by filling in fields of a C structure,
+which specifies the transitions between padding states based on various events,
+probability distributions of inter-packet delays, and the conditions under
+which padding machines should be applied to circuits.
+
+This compact C structure representation is designed to function as a
+microlanguage, which can be compiled down into a
+bitstring that [can be tuned](#13-computation-model) using various
+optimization methods (such as gradient descent, GAs, or GANs), either in
+bitstring form or C struct form.
+
+The event driven, self-contained nature of this framework is also designed to
+make [evaluation](#4-evaluating-padding-machines) both expedient and rigorously
+reproducible.
+
+This document covers the engineering steps to write, test, and deploy a
+padding machine, as well as how to extend the framework to support new machine
+features.
+
+If you prefer to learn by example, you may want to skip to either the
+[QuickStart Guide](CircuitPaddingQuickStart.md), and/or [Section
+5](#5-example-padding-machines) for example machines to get you up and running
+quickly.
+
+### 1.2. Layering Model
+
+The circuit padding framework is designed to provide one layer in a layered
+system of interchangeable components.
+
+The circuit padding framework operates at the Tor circuit layer. It only deals
+with the inter-cell timings and quantity of cells sent on a circuit. It can
+insert cells on a circuit in arbitrary patterns, and in response to arbitrary
+conditions, but it cannot delay cells. It also does not deal with packet
+sizes, how cells are packed into TLS records, or ways that the Tor protocol
+might be recognized on the wire.
+
+The problem of differentiating Tor traffic from non-Tor traffic based on
+TCP/TLS packet sizes, initial handshake patterns, and DPI characteristics is the
+domain of [pluggable
+transports](https://trac.torproject.org/projects/tor/wiki/doc/AChildsGardenOfPluggableTransports),
+which may optionally be used in conjunction with this framework (or without
+it).
+
+This document focuses primarily on the circuit padding framework's cover
+traffic features, and will only briefly touch on the potential obfuscation and
+application layer coupling points of the framework. Explicit layer coupling 
+points can be created by adding either new [machine application
+events](#62-machine-application-events) or new [internal machine
+events](#63-internal-machine-events) to the circuit padding framework, so that
+your padding machines can react to events from other layers.
+
+### 1.3. Computation Model
+
+The circuit padding framework is designed to support succinctly specified
+defenses that can be tuned through [computer-assisted
+optimization](#4-evaluating-padding-machines).
+
+We chose to generalize the original [Adaptive Padding 2-state
+design](https://www.freehaven.net/anonbib/cache/ShWa-Timing06.pdf) into an
+event-driven state machine because state machines are the simplest form of
+sequence recognition devices from [automata
+theory](https://en.wikipedia.org/wiki/Finite-state_machine).
+
+Most importantly: this framing allows cover traffic defenses to be modeled as
+an optimization problem search space, expressed as fields of a C structure
+(which is simultaneously a compact opaque bitstring as well as a symbolic
+vector in an abstract feature space). This kind of space is particularly well
+suited to search by gradient descent, GAs, and GANs. 
+
+When performing this optimization search, each padding machine should have a
+fitness function, which will allow two padding machines to be compared for
+relative effectiveness. Optimization searches work best if this fitness can be
+represented as a single number, for example the total amount by which it
+reduces the [Balanced
+Accuracy](https://en.wikipedia.org/wiki/Precision_and_recall#Imbalanced_Data)
+of an adversary's classifier, divided by an amount of traffic overhead. 
+
+Before you begin the optimization phase for your defense, you should
+also carefully consider the [features and
+optimizations](#7-future-features-and-optimizations) that we suspect will be
+useful, and also see if you can come up with any more. You should similarly be
+sure to restrict your search space to avoid areas of the bitstring/feature
+vector that you are sure you will not need. For example, some
+[applications](#8-open-research-problems) may not need the histogram
+accounting used by Adaptive Padding, but might need to add other forms of
+[pattern recognition](#75-more-complex-pattern-recognition) to react to
+sequences that resemble HTTP GET and HTTP POST.
+
+### 1.4. Other Deployment Constraints
+
+The framework has some limitations that are the result of deliberate
+choices. We are unlikely to deploy defenses that ignore these limitations.
+
+In particular, we have deliberately not provided any mechanism to delay actual
+user traffic, even though we are keenly aware that if we were to support
+additional delay, defenses would be able to have [more success with less
+bandwidth
+overhead](https://freedom.cs.purdue.edu/anonymity/trilemma/index.html).
+
+In the website traffic fingerprinting domain, [provably optimal
+defenses](https://www.cypherpunks.ca/~iang/pubs/webfingerprint-ccs14.pdf)
+achieve their bandwidth overhead bounds by ensuring that a non-empty queue is
+maintained, by rate limiting traffic below the actual throughput of a circuit.
+For optimal results, this queue must avoid draining to empty, and yet it
+must also be drained fast enough to avoid tremendous queue overhead in fast
+Tor relays, which carry hundreds of thousands of circuits simultaneously.
+
+Unfortunately, Tor's end-to-end flow control is not congestion control. Its
+window sizes are currently fixed. This means there is no signal when queuing
+occurs, and thus no ability to limit queue size through pushback. This means
+there is currently no way to do the fine-grained queue management necessary to
+create such a queue and rate limit traffic effectively enough to keep this
+queue from draining to empty, without also risking that aggregate queuing
+would cause out-of-memory conditions on fast relays.
+
+It may be possible to create a congestion control algorithm that can support
+such fine grained queue management, but this is a [deeply unsolved area of
+research](https://lists.torproject.org/pipermail/tor-dev/2018-November/013562.html).
+
+Even beyond these major technical hurdles, additional latency is also
+unappealing to the wider Internet community, for the simple reason that
+bandwidth [continues to increase
+exponentially](https://ipcarrier.blogspot.com/2014/02/bandwidth-growth-nearly-what-one-would.html)
+where as the speed of light is fixed. Significant engineering effort has been
+devoted to optimizations that reduce the effect of latency on Internet
+protocols. To go against this trend would ensure our irrelevance to the wider
+conversation about traffic analysis defenses for low latency Internet protocols.
+
+On the other hand, through [load
+balancing](https://gitweb.torproject.org/torspec.git/tree/proposals/265-load-balancing-with-overhead.txt)
+and [circuit multiplexing strategies](https://bugs.torproject.org/29494), we
+believe it is possible to add significant bandwidth overhead in the form of
+cover traffic, without significantly impacting end-user performance.
+
+For these reasons, we believe the trade-off should be in favor of adding more
+cover traffic, rather than imposing queuing memory overhead and queuing delay.
+
+As a last resort for narrowly scoped application domains (such as
+shaping Tor service-side onion service traffic to look like other websites or
+different application-layer protocols), delay *may* be added at the
+[application layer](https://petsymposium.org/2017/papers/issue2/paper54-2017-2-source.pdf).
+Any additional cover traffic required by such defenses should still be
+added at the circuit padding layer using this framework, to provide
+engineering efficiency through loose layer coupling and component re-use, as
+well as to provide additional gains against [low
+resolution](https://github.com/torproject/torspec/blob/master/padding-spec.txt#L47)
+end-to-end traffic correlation.
+
+Because such delay-based defenses will impact performance significantly more
+than simply adding cover traffic, they must be optional, and negotiated by
+only specific application layer endpoints that want them. This will have
+consequences for anonymity sets and base rates, if such traffic shaping and
+additional cover traffic is not very carefully constructed.
+
+In terms of acceptable overhead, because Tor onion services
+[currently use](https://metrics.torproject.org/hidserv-rend-relayed-cells.html)
+less than 1% of the
+[total consumed bandwidth](https://metrics.torproject.org/bandwidth-flags.html)
+of the Tor network, and because onion services exist to provide higher
+security as compared to Tor Exit traffic, they are an attractive target for
+higher-overhead defenses. We encourage researchers to target this use case
+for defenses that require more overhead, and/or for the deployment of
+optional negotiated application-layer delays on either the server or the
+client side.
+
+## 2. Creating New Padding Machines
+
+This section explains how to use the existing mechanisms in Tor to define a
+new circuit padding machine.  We assume here that you know C, and are at
+least somewhat familiar with Tor development.  For more information on Tor
+development in general, see the other files in doc/HACKING/ in a recent Tor
+distribution.
+
+Again, if you prefer to learn by example, you may want to skip to either the
+[QuickStart Guide](CircuitPaddingQuickStart.md), and/or [Section
+5](#5-example-padding-machines) for example machines to get up and running
+quickly.
+
+To create a new padding machine, you must:
+
+  1. Define your machine using the fields of a heap-allocated
+     `circpad_machine_spec_t` C structure.
+
+  2. Register this object in the global list of available padding machines,
+     using `circpad_register_padding_machine()`.
+
+  3. Ensure that your machine is properly negotiated under your desired
+     circuit conditions.
+
+### 2.1. Registering a New Padding Machine
+
+Again, a circuit padding machine is designed to be specified entirely as a [single
+C structure](#13-computation-model).
+
+Your machine definitions should go into their own functions in
+[circuitpadding_machines.c](https://github.com/torproject/tor/blob/master/src/core/or/circuitpadding_machines.c). For
+details on all of the fields involved in specifying a padding machine, see
+[Section 3](#3-specifying-padding-machines).
+
+You must register your machine in `circpad_machines_init()` in
+[circuitpadding.c](https://github.com/torproject/tor/blob/master/src/core/or/circuitpadding.c). To
+add a new padding machine specification, you must allocate a
+`circpad_machine_spec_t` on the heap with `tor_malloc_zero()`, give it a
+human readable name string, and a machine number equivalent to the number of
+machines in the list, and register the structure using
+`circpad_register_padding_machine()`.
+
+Each machine must have a client instance and a relay instance. Register your
+client-side machine instance in the `origin_padding_machines` list, and your
+relay side machine instance in the `relay_padding_machines` list. Once you
+have registered your instance, you do not need to worry about deallocation;
+this is handled for you automatically.
+
+Both machine lists use registration order to signal machine precedence for a
+given `machine_idx` slot on a circuit. This means that machines that are
+registered last are checked for activation *before* machines that are
+registered first. (This reverse precedence ordering allows us to
+deprecate older machines simply by adding new ones after them.)
+
+### 2.2. Machine Activation and Shutdown
+
+After a machine has been successfully registered with the framework, it will
+be instantiated on any client-side circuits that support it. Only client-side
+circuits may initiate padding machines, but either clients or relays may shut
+down padding machines.
+
+#### 2.2.1. Machine Application Conditions
+
+The
+[circpad_machine_conditions_t conditions field](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L641)
+of your `circpad_machine_spec_t` machine definition instance controls the
+conditions under which your machine will be attached and enabled on a Tor
+circuit, and when it gets shut down.
+
+*All* of your explicitly specified conditions in
+`circpad_machine_spec_t.conditions` *must* be met for the machine to be
+applied to a circuit. If *any* condition ceases to be met, then the machine
+is shut down.  (This is checked on every event that arrives, even if the
+condition is unrelated to the event.)
+Another way to look at this is that
+all specified conditions must evaluate to true for the entire duration that
+your machine is running. If any are false, your machine does not run (or
+stops running and shuts down).
+
+In particular, as part of the
+[circpad_machine_conditions_t structure](https://github.com/torproject/tor/blob/master/src/core/or/circuitpadding.h#L149),
+the circuit padding subsystem gives the developer the option to enable a
+machine based on:
+  - The
+    [length](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L157)
+    on the circuit (via the `min_hops` field).
+  - The
+    [current state](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L174)
+    of the circuit, such as streams, relay_early, etc. (via the
+    `circpad_circuit_state_t state_mask` field).
+  - The
+    [purpose](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L178)
+    (i.e. type) of the circuit (via the `circpad_purpose_mask_t purpose_mask`
+    field).
+
+This condition mechanism is the preferred way to determine if a machine should
+apply to a circuit. For information about potentially useful conditions that
+we have considered but have not yet implemented, see [Section
+7.3](#73-better-machine-negotiation). We will happily accept patches for those
+conditions, or any for other additional conditions that are needed for your
+use case.
+
+#### 2.2.2. Detecting and Negotiating Machine Support
+
+When a new machine specification is added to Tor (or removed from Tor), you
+should bump the Padding subprotocol version in `src/core/or/protover.c` and
+`src/rust/protover/protover.rs`, add a field to `protover_summary_flags_t` in
+`or.h`, and set this field in `memoize_protover_summary()` in versions.c. This
+new field must then be checked in `circpad_node_supports_padding()` in
+`circuitpadding.c`.
+
+Note that this protocol version update and associated support check is not
+necessary if your experiments will *only* be using your own relays that
+support your own padding machines. This can be accomplished by using the
+`MiddleNodes` directive; see [Section 4](#4-evaluating-padding-machines) for more information.
+
+If the protocol support check passes for the circuit, then the client sends a
+`RELAY_COMMAND_PADDING_NEGOTIATE` cell towards the
+`circpad_machine_spec_t.target_hop` relay, and immediately enables the
+padding machine, and may begin sending padding. (The framework does not wait
+for the `RELAY_COMMAND_PADDING_NEGOTIATED` response to begin padding,
+so that we can
+switch between machines rapidly.)
+
+#### 2.2.3. Machine Shutdown Mechanisms
+
+Padding machines can be shut down on a circuit in three main ways:
+  1. During a `circpad_machine_event` callback, when
+     `circpad_machine_spec_t.conditions` no longer applies (client side)
+  2. After a transition to the CIRCPAD_STATE_END, if
+     `circpad_machine_spec_t.should_negotiate_end` is set (client or relay
+     side)
+  3. If there is a `RELAY_COMMAND_PADDING_NEGOTIATED` error response from the
+     relay during negotiation.
+
+Each of these cases causes the originating node to send a relay cell towards
+the other side, indicating that shutdown has occurred. The client side sends
+`RELAY_COMMAND_PADDING_NEGOTIATE`, and the relay side sends
+`RELAY_COMMAND_PADDING_NEGOTIATED`.
+
+Because padding from malicious exit nodes can be used to construct active
+timing-based side channels to malicious guard nodes, the client checks that
+padding-related cells only come from relays with active padding machines.
+For this reason, when a client decides to shut down a padding machine,
+the framework frees the mutable `circuit_t.padding_info`, but leaves the
+`circuit_t.padding_machine` pointer set until the
+`RELAY_COMMAND_PADDING_NEGOTIATED` response comes back, to ensure that any
+remaining in-flight padding packets are recognized a valid. Tor does
+not yet close circuits due to violation of this property, but the
+[vanguards addon component "bandguard"](https://github.com/mikeperry-tor/vanguards/blob/master/README_TECHNICAL.md#the-bandguards-subsystem)
+does.
+
+As an optimization, a client may replace a machine with another, by
+sending a `RELAY_COMMAND_PADDING_NEGOTIATE` cell to shut down a machine, and
+immediately sending a `RELAY_COMMAND_PADDING_NEGOTIATE` to start a new machine
+in the same index, without waiting for the response from the first negotiate
+cell.
+
+Unfortunately, there is a known bug as a consequence of this optimization. If
+your machine depends on repeated shutdown and restart of the same machine
+number on the same circuit, please see [Bug
+30922](https://bugs.torproject.org/30992). Depending on your use case, we may
+need to fix that bug or help you find a workaround. See also [Section
+6.1.3](#613-deallocation-and-shutdown) for some more technical details on this
+mechanism.
+
+
+## 3. Specifying Padding Machines
+
+By now, you should understand how to register, negotiate, and control the
+lifetime of your padding machine, but you still don't know how to make it do
+anything yet. This section should help you understand how to specify how your
+machine reacts to events and adds padding to the wire.
+
+If you prefer to learn by example first instead, you may wish to skip to
+[Section 5](#5-example-padding-machines).
+
+
+A padding machine is specified by filling in an instance of
+[circpad_machine_spec_t](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L605). Instances
+of this structure specify the precise functionality of a machine: it's
+what the circuit padding developer is called to write. These instances
+are created only at startup, and are referenced via `const` pointers during
+normal operation.
+
+In this section we will go through the most important elements of this
+structure.
+
+### 3.1. Padding Machine States
+
+A padding machine is a finite state machine where each state
+specifies a different style of padding.
+
+As an example of a simple padding machine, you could have a state machine
+with the following states: `[START] -> [SETUP] -> [HTTP] -> [END]` where the
+`[SETUP]` state pads in a way that obfuscates the ''circuit setup'' of Tor,
+and the `[HTTP]` state pads in a way that emulates a simple HTTP session. Of
+course, padding machines can be more complicated than that, with dozens of
+states and non-trivial transitions.
+
+Padding developers encode the machine states in the
+[circpad_machine_spec_t structure](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L655). Each
+machine state is described by a
+[circpad_state_t structure](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L273)
+and each such structure specifies the style and amount of padding to be sent,
+as well as the possible state transitions.
+
+The function `circpad_machine_states_init()` must be used for allocating and
+initializing the `circpad_machine_spec_t.states` array before states and
+state transitions can be defined, as some of the state object has non-zero
+default values.
+
+### 3.2. Padding Machine State Transitions
+
+As described above, padding machines can have multiple states, to
+support different forms of padding. Machines can transition between
+states based on events that occur either on the circuit level or on
+the machine level.
+
+State transitions are specified using the
+[next_state field](https://github.com/torproject/tor/blob/master/src/core/or/circuitpadding.h#L381)
+of the `circpad_state_t` structure. As a simple example, to transition
+from state `A` to state `B` when event `E` occurs, you would use the
+following code: `A.next_state[E] = B`.
+
+#### 3.2.1. State Transition Events
+
+Here we will go through
+[the various events](https://github.com/torproject/tor/blob/master/src/core/or/circuitpadding.h#L30)
+that can be used to transition between states:
+
+* Circuit-level events
+  * `CIRCPAD_EVENT_NONPADDING_RECV`: A non-padding cell is received
+  * `CIRCPAD_EVENT_NONPADDING_SENT`: A non-adding cell is sent
+  * `CIRCPAD_EVENT_PADDING_SENT`: A padding cell is sent
+  * `CIRCPAD_EVENT_PADDING_RECV`: A padding cell is received
+* Machine-level events
+  * `CIRCPAD_EVENT_INFINITY`: Tried to schedule padding using the ''infinity bin''.
+  * `CIRCPAD_EVENT_BINS_EMPTY`: All histogram bins are empty (out of tokens)
+  * `CIRCPAD_EVENT_LENGTH_COUNT`: State has used all its padding capacity (see `length_dist` below)
+
+### 3.3. Specifying Per-State Padding
+
+Each state of a padding machine specifies either:
+  * A padding histogram describing inter-transmission delays between cells;
+d     OR
+  * A parameterized delay probability distribution for inter-transmission
+    delays between cells.
+
+Either mechanism specifies essentially the *minimum inter-transmission time*
+distribution. If non-padding traffic does not get transmitted from this
+endpoint before the delay value sampled from this distribution expires, a
+padding packet is sent.
+
+The choice between histograms and probability distributions can be subtle. A
+rule of thumb is that probability distributions are easy to specify and
+consume very little memory, but might not be able to describe certain types
+of complex padding logic. Histograms, in contrast, can support precise
+packet-count oriented or multimodal delay schemes, and can use token removal
+logic to reduce overhead and shape the total padding+non-padding inter-packet
+delay distribution towards an overall target distribution.
+
+We suggest that you start with a probability distribution if possible, and
+you move to a histogram-based approach only if a probability distribution
+does not suit your needs.
+
+#### 3.3.1. Padding Probability Distributions
+
+The easiest, most compact way to schedule padding using a machine state is to
+use a probability distribution that specifies the possible delays. That can
+be done
+[using the circpad_state_t fields](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L339)
+`iat_dist`, `dist_max_sample_usec` and `dist_added_shift_usec`.
+
+The Tor circuit padding framework
+[supports multiple types](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L214)
+of probability distributions, and the developer should use the
+[circpad_distribution_t structure](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L240)
+to specify them as well as the required parameters.
+
+#### 3.3.2. Padding Histograms
+
+A more advanced way to schedule padding is to use a ''padding
+histogram''. The main advantages of a histogram are that it allows you to
+specify distributions that are not easily parameterized in closed form, or
+require specific packet counts at particular time intervals. Histograms also
+allow you to make use of an optional traffic minimization and shaping
+optimization called *token removal*, which is central to the original
+[Adaptive Padding](https://www.freehaven.net/anonbib/cache/ShWa-Timing06.pdf)
+concept.
+
+If a histogram is used by a state (as opposed to a fixed parameterized
+distribution), then the developer must use the
+[histogram-related fields](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L285)
+of the `circpad_state_t` structure.
+
+The width of a histogram bin specifies the range of inter-packet delay times,
+whereas its height specifies the amount of tokens in each bin. To sample a
+padding delay from a histogram, we first randomly pick a bin (weighted by the
+amount of tokens in each bin) and then sample a delay from within that bin by
+picking a uniformly random delay using the width of the bin as the range.
+
+Each histogram also has an ''infinity bin'' as its final bin.  If the
+''infinity bin'' is chosen,
+we don't schedule any padding (i.e., we schedule padding with
+infinite delay). If the developer does not want infinite delay, they
+should not give any tokens to the ''infinity bin''.
+
+If a token removal strategy is specified (via the
+`circpad_state_t.token_removal` field), each time padding is sent using a
+histogram, the padding machine will remove a token from the appropriate
+histogram bin whenever this endpoint sends *either a padding packet or a
+non-padding packet*. The different removal strategies govern what to do when
+the bin corresponding to the current inter-packet delay is empty.
+
+Token removal is optional. It is useful if you want to do things like specify
+a burst should be at least N packets long, and you only want to add padding
+packets if there are not enough non-padding packets. The cost of doing token
+removal is additional memory allocations for making per-circuit copies of
+your histogram that can be modified.
+
+### 3.4. Specifying Precise Cell Counts
+
+Padding machines should be able to specify the exact amount of padding they
+send. For histogram-based machines this can be done using a specific amount
+of tokens, but another (and perhaps easier) way to do this, is to use the
+[length_dist field](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L362)
+of the `circpad_state_t` structure.
+
+The `length_dist` field is basically a probability distribution similar to the
+padding probability distributions, which applies to a specific machine state
+and specifies the amount of padding we are willing to send during that state.
+This value gets sampled when we transition to that state (TODO document this
+in the code).
+
+### 3.5. Specifying Overhead Limits
+
+Separately from the length counts, it is possible to rate limit the overhead
+percentage of padding at both the global level across all machines, and on a
+per-machine basis.
+
+At the global level, the overhead percentage of all circuit padding machines
+as compared to total traffic can be limited through the Tor consensus
+parameter `circpad_global_max_padding_pct`. This overhead is defined as the
+percentage of padding cells *sent* out of the sum of non padding and padding
+cells *sent*, and is applied *only after* at least
+`circpad_global_allowed_cells` padding cells are sent by that relay or client
+(to allow for small bursts of pure padding on otherwise idle or freshly
+restarted relays). When both of these limits are hit by a relay or client, no
+further padding cells will be sent, until sufficient non-padding traffic is
+sent to cause the percentage of padding traffic to fall back below the
+threshold.
+
+Additionally, each individual padding machine can rate limit itself by
+filling in the fields `circpad_machine_spec_t.max_padding_percent` and
+`circpad_machine_spec_t.allowed_padding_count`, which behave identically to
+the consensus parameters, but only apply to that specific machine.
+
+## 4. Evaluating Padding Machines
+
+One of the goals of the circuit padding framework is to provide improved
+evaluation and scientific reproducibility for lower cost. This includes both
+the [choice](#13-computation-model) of the compact C structure representation
+(which has an easy-to-produce bitstring representation for optimization by
+gradient descent, GAs, or GANs), as well as rapid prototyping and evaluation.
+
+So far, whenever evaluation cost has been a barrier, each research group has
+developed their own ad-hoc packet-level simulators of various padding
+mechanisms for evaluating website fingerprinting attacks and defenses. The
+process typically involves doing a crawl of Alexa top sites over Tor, and
+recording the Tor cell count and timing information for each page in the
+trace. These traces are then fed to simulations of defenses, which output
+modified trace files.
+
+Because no standardized simulation and evaluation mechanism exists, it is
+often hard to tell if independent implementations of various attacks and
+defenses are in fact true-to-form or even properly calibrated for direct
+comparison, and discrepancies in results across the literature suggests
+this is not always so.
+
+Our preferred outcome with this framework is that machines are tuned
+and optimized on a tracing simulator, but that the final results come from
+an actual live network test of the defense. The traces from this final crawl
+should be preserved as artifacts to be run on the simulator and reproduced
+on the live network by future papers, ideally in journal venues that have an
+artifact preservation policy.
+
+### 4.1. Pure Simulation
+
+When doing initial tuning of padding machines, especially in adversarial
+settings, variations of a padding machine defense may need to be applied to
+network activity hundreds or even millions of times. The wall-clock time
+required to do this kind of tuning using live testing or even Shadow network
+emulation may often be prohibitive.
+
+To help address this, and to better standardize results, Tobias Pulls has
+implemented a [circpad machine trace simulator](https://github.com/pylls/circpad-sim),
+which uses Tor's unit test framework to simulate applying padding machines to
+circuit packet traces via a combination of Tor patches and python scripts. This
+simulator can be used to record traces from clients, Guards, Middles, Exits,
+and any other hop in the path, only for circuits that are run by the
+researcher. This makes it possible to safely record baseline traces and
+ultimately even mount passive attacks on the live network, without impacting
+or recording any normal user traffic.
+
+In this way, a live crawl of the Alexa top sites could be performed once, to
+produce a standard "undefended" corpus. Padding machines can be then quickly
+evaluated and tuned on these simulated traces in a standardized way, and then
+the results can then be [reproduced on the live Tor
+network](#44-Testing-on-the-Live-Network) with the machines running on your own relays.
+
+Please be mindful of the Limitations section of the simulator documentation,
+however, to ensure that you are aware of the edge cases and timing
+approximations that are introduced by this approach.
+
+### 4.2. Testing in Chutney
+
+The Tor Project provides a tool called
+[Chutney](https://github.com/torproject/chutney/) which makes it very easy to
+setup private Tor networks. While getting it work for the first time might
+take you some time of doc reading, the final result is well worth it for the
+following reasons:
+
+- You control all the relays and hence you have greater control and debugging
+  capabilities.
+- You control all the relays and hence you can toggle padding support on/off
+  at will.
+- You don't need to be cautious about overhead or damaging the real Tor
+  network during testing.
+- You don't even need to be online; you can do all your testing offline over
+  localhost.
+
+A final word of warning here is that since Chutney runs over localhost, the
+packet latencies and delays are completely different from the real Tor
+network, so if your padding machines rely on real network timings you will
+get different results on Chutney. You can work around this by using a
+different set of delays if Chutney is used, or by moving your padding
+machines to the real network when you want to do latency-related testing.
+
+### 4.3. Testing in Shadow
+
+[Shadow](https://shadow.github.io/) is an environment for running entire Tor
+network simulations, similar to Chutney, but designed to be both more memory
+efficient, as well as provide an accurate Tor network topology and latency
+model.
+
+While Shadow is significantly more memory efficient than Chutney, and can make
+use of extremely accurate Tor network capacity and latency models, it will not
+be as fast or efficient as the [circpad trace simulator](https://github.com/pylls/circpad-sim),
+if you need to do many many iterations of an experiment to tune your defense.
+
+### 4.4. Testing on the Live Network
+
+Live network testing is the gold standard for verifying that any attack or
+defense is behaving as expected, to minimize the influence of simplifying
+assumptions.
+
+However, it is not ethical, or necessarily possible, to run high-resolution
+traffic analysis attacks on the entire Tor network. But it is both ethical
+and possible to run small scale experiments that target only your own
+clients, who will only use your own Tor relays that support your new padding
+machines.
+
+We provide the `MiddleNodes` torrc directive to enable this, which will allow
+you to specify the fingerprints and/or IP netmasks of relays to be used in
+the second hop position. Options to restrict other hops also exist, if your
+padding system is padding to a different hop. The `HSLayer2Nodes` option
+overrides the `MiddleNodes` option for onion service circuits, if both are
+set. (The
+[vanguards addon](https://github.com/mikeperry-tor/vanguards/README_TECHNICAL.md)
+will set `HSLayer2Nodes`.)
+
+When you run your own clients, and use MiddleNodes to restrict your clients
+to use your relays, you can perform live network evaluations of a defense
+applied to whatever traffic crawl or activity your clients do.
+
+## 5. Example Padding Machines
+
+### 5.1. Deployed Circuit Setup Machines
+
+Tor currently has two padding machines enabled by default, which aim to hide
+certain features of the client-side onion service circuit setup protocol. For
+more details on their precise goal and function, please see
+[proposal 302](https://github.com/torproject/torspec/blob/master/proposals/302-padding-machines-for-onion-clients.txt)
+. In this section we will go over the code of those machines to clarify some
+of the engineering parts:
+
+#### 5.1.1. Overview
+
+The codebase of proposal 302 can be found in
+[circuitpadding_machines.c](https://github.com/torproject/tor/blob/master/src/core/or/circuitpadding_machines.c)
+and specifies four padding machines:
+
+- The [client-side introduction](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L60) circuit machine.
+- The [relay-side introduction](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L146) circuit machine.
+- The [client-side rendezvous](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L257) circuit machine
+- The [relay-side rendezvous](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L374) circuit machine.
+
+Each of those machines has its own setup function, and
+[they are all initialized](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.c#L2718)
+by the circuit padding framework. To understand more about them, please
+carefully read the individual setup function for each machine which are
+fairly well documented. Each function goes through the following steps:
+- Machine initialization
+  - Give it a [name](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L70)
+  - Specify [which hop](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L73) the padding should go to
+  - Specify whether it should be [client-side](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L75) or relay-side.
+- Specify for [which types of circuits](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L78) the machine should apply
+- Specify whether the circuits should be [kept alive](https://github.com/torproject/tor/blob/master/src/core/or/circuitpadding_machines.c#L112) until the machine finishes padding.
+- Sets [padding limits](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L116) to avoid too much overhead in case of bugs or errors.
+- Setup [machine states](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L120)
+   - Specify [state transitions](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L123).
+- Finally [register the machine](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L137) to the global machine list
+
+### 5.2. Adaptive Padding Early
+
+[Adaptive Padding Early](https://www.cs.kau.se/pulls/hot/thebasketcase-ape/)
+is a variant of Adaptive Padding/WTF-PAD that does not use histograms or token
+removal to shift padding distributions, but instead uses fixed parameterized
+distributions to specify inter-packet timing thresholds for burst and gap
+inter-packet delays.
+
+Tobias Pulls's [QuickStart Guide](CircuitPaddingQuickStart.md) describes how
+to get this machine up and running, and has links to branches with a working
+implementation.
+
+### 5.3. Sketch of Tamaraw
+
+The [Tamaraw defense
+paper](https://www.cypherpunks.ca/~iang/pubs/webfingerprint-ccs14.pdf) is the
+only defense to date that provides a proof of optimality for the finite-length
+website traffic fingerprinting domain. These bounds assume that a defense is
+able to perform a full, arbitrary transform of a trace that is under a fixed
+number of packets in length.
+
+The key insight to understand Tamaraw's optimality is that it achieves one
+such optimal transform by delaying traffic below a circuit's throughput. By
+doing this, it creates a queue that is rarely empty, allowing it to produce
+a provably optimal transform with minimal overhead. As [Section
+1.4](#14-other-deployment-constraints) explains, this queue
+cannot be maintained on the live Tor network without risk of out-of-memory
+conditions at relays.
+
+However, if the queue is not maintained in the Tor network, but instead by the
+application layer, it could be deployed by websites that opt in to using it.
+
+In this case, the application layer component would do *optional* constant
+rate shaping, negotiated between a web browser and a website. The Circuit
+Padding Framework can then easily fill in any missing gaps of cover traffic
+packets, and also ensure that only a fixed length number of packets are sent
+in total.
+
+However, for such a defense to be safe, additional care must be taken to
+ensure that the resulting traffic pattern still has a large
+anonymity/confusion set with other traces on the live network.
+
+Accomplishing this is an unsolved problem.
+
+### 5.4. Other Padding Machines
+
+Our partners in this project at RIT have produced a couple prototypes, based
+on their published research designs
+[REB and RBB](https://www.researchgate.net/publication/329743510_UNDERSTANDING_FEATURE_DISCOVERY_IN_WEBSITE_FINGERPRINTING_ATTACKS).
+
+As [their writeup
+explains](https://github.com/notem/tor-rbp-padding-machine-doc), because RBB
+uses delay, the circuit padding machine that they made is a no-delay version.
+
+They also ran into an issue with the 0-delay timing workaround for [bug
+31653](https://bugs.torproject.org/31653). Keep an eye on that bug for updates
+with improved workarounds/fixes.
+
+Their code is [available on github](https://github.com/notem/tor/tree/circuit_padding_rbp_machine).
+
+
+## 6. Framework Implementation Details
+
+If you need to add additional events, conditions, or other features to the
+circuit padding framework, then this section is for you.
+
+### 6.1. Memory Allocation Conventions
+
+If the existing circuit padding features are sufficient for your needs, then
+you do not need to worry about memory management or pointer lifespans. The
+circuit padding framework should take care of this for you automatically.
+
+However, if you need to add new padding machine features to support your
+padding machines, then it will be helpful to understand how circuits
+correspond to the global machine definitions, and how mutable padding machine
+memory is managed.
+
+#### 6.1.1. Circuits and Padding Machines
+
+In Tor, the
+[circuit_t structure](https://github.com/torproject/tor/blob/master/src/core/or/circuit_st.h)
+is the superclass structure for circuit-related state that is used on both
+clients and relays. On clients, the actual datatype of the object pointed to
+by `circuit_t *` is the subclass structure
+[origin_circuit_t](https://github.com/torproject/tor/blob/master/src/core/or/origin_circuit_st.h). The
+macros `CIRCUIT_IS_ORIGIN()` and `TO_ORIGIN_CIRCUIT()` are used to determine
+if a circuit is a client-side (origin) circuit and to cast the pointer safely
+to `origin_circuit_t *`.
+
+Because circuit padding machines can be present at both clients and relays,
+the circuit padding fields are stored in the `circuit_t *` superclass
+structure. Notice that there are actually two sets of circuit padding fields:
+a `const circpad_machine_spec_t *` array, and a `circpad_machine_runtime_t *`
+array. Each of these arrays holds at most two elements, as there can be at
+most two padding machines on each circuit.
+
+The `const circpad_machine_spec_t *` points to a globally allocated machine
+specification. These machine specifications are
+allocated and set up during Tor program startup, in `circpad_machines_init()`
+in
+[circuitpadding.c](https://github.com/torproject/tor/blob/master/src/core/or/circuitpadding.c). Because
+the machine specification object is shared by all circuits, it must not be
+modified or freed until program exit (by `circpad_machines_free()`). The
+`const` qualifier should enforce this at compile time.
+
+The `circpad_machine_runtime_t *` array member points to the mutable runtime
+information for machine specification at that same array index. This runtime
+structure keeps track of the current machine state, packet counts, and other
+information that must be updated as the machine operates. When a padding
+machine is successfully negotiated `circpad_setup_machine_on_circ()` allocates
+the associated runtime information.
+
+#### 6.1.2. Histogram Management
+
+If a `circpad_state_t` of a machine specifies a `token_removal` strategy
+other than `CIRCPAD_TOKEN_REMOVAL_NONE`, then every time
+there is a state transition into this state, `circpad_machine_setup_tokens()`
+will copy the read-only `circpad_state_t.histogram` array into a mutable
+version at `circpad_machine_runtime_t.histogram`. This mutable copy is used
+to decrement the histogram bin accounts as packets are sent, as per the
+specified token removal strategy.
+
+When the state machine transitions out of this state, the mutable histogram copy is freed
+by this same `circpad_machine_setup_tokens()` function.
+
+#### 6.1.3. Deallocation and Shutdown
+
+As an optimization, padding machines can be swapped in and out by the client
+without waiting a full round trip for the relay machine to shut down.
+
+Internally, this is accomplished by immediately freeing the heap-allocated
+`circuit_t.padding_info` field corresponding to that machine, but still preserving the
+`circuit_t.padding_machine` pointer to the global padding machine
+specification until the response comes back from the relay. Once the response
+comes back, that `circuit_t.padding_machine` pointer is set to NULL, if the
+response machine number matches the current machine present.
+
+Because of this partial shutdown condition, we have two macros for iterating
+over machines. `FOR_EACH_ACTIVE_CIRCUIT_MACHINE_BEGIN()` is used to iterate
+over machines that have both a `circuit_t.padding_info` slot and a
+`circuit_t.padding_machine` slot occupied. `FOR_EACH_CIRCUIT_MACHINE_BEGIN()`
+is used when we need to iterate over all machines that are either active or
+are simply waiting for a response to a shutdown request.
+
+If the machine is replaced instead of just shut down, then the client frees
+the `circuit_t.padding_info`, and then sets the `circuit_t.padding_machine`
+and `circuit_t.padding_info` fields for this next machine immediately. This is
+done in `circpad_add_matching_machines()`. In this case, since the new machine
+should have a different machine number, the shut down response from the relay
+is silently discarded, since it will not match the new machine number.
+
+If this sequence of machine teardown and spin-up happens rapidly enough for
+the same machine number (as opposed to different machines), then a race
+condition can happen. This is
+[known bug #30992](https://bugs.torproject.org/30992).
+
+When the relay side decides to shut down a machine, it sends a
+`RELAY_COMMAND_PADDING_NEGOTIATED` towards the client. If this cell matches the
+current machine number on the client, that machine is torn down, by freeing
+the `circuit_t.padding_info` slot and immediately setting
+`circuit_t.padding_machine` slot to NULL.
+
+Additionally, if Tor decides to close a circuit forcibly due to error before
+the padding machine is shut down, then `circuit_t.padding_info` is still
+properly freed by the call to `circpad_circuit_free_all_machineinfos()`
+in `circuit_free_()`.
+
+### 6.2. Machine Application Events
+
+The framework checks client-side origin circuits to see if padding machines
+should be activated or terminated during specific event callbacks in
+`circuitpadding.c`. We list these event callbacks here only for reference. You
+should not modify any of these callbacks to get your machine to run; instead,
+you should use the `circpad_machine_spec_t.conditions` field.
+
+However, you may add new event callbacks if you need other activation events,
+for example to provide obfuscation-layer or application-layer signaling. Any
+new event callbacks should behave exactly like the existing callbacks.
+
+During each of these event callbacks, the framework checks to see if any
+current running padding machines have conditions that no longer apply as a
+result of the event, and shuts those machines down. Then, it checks to see if
+any new padding machines should be activated as a result of the event, based
+on their circuit application conditions. **Remember: Machines are checked in
+reverse order in the machine list. This means that later, more recently added
+machines take precedence over older, earlier entries in each list.**
+
+Both of these checks are performed using the machine application conditions
+that you specify in your machine's `circpad_machine_spec_t.conditions` field.
+
+The machine application event callbacks are prefixed by `circpad_machine_event_` by convention in circuitpadding.c. As of this writing, these callbacks are:
+
+  - `circpad_machine_event_circ_added_hop()`: Called whenever a new hop is
+    added to a circuit.
+  - `circpad_machine_event_circ_built()`: Called when a circuit has completed
+    construction and is
+    opened. <!-- open != ready for traffic. Which do we mean? -nickm -->
+  - `circpad_machine_event_circ_purpose_changed()`: Called when a circuit
+    changes purpose.
+  - `circpad_machine_event_circ_has_no_relay_early()`: Called when a circuit
+    runs out of `RELAY_EARLY` cells.
+  - `circpad_machine_event_circ_has_streams()`: Called when a circuit gets a
+    stream attached.
+  - `circpad_machine_event_circ_has_no_streams()`: Called when the last
+    stream is detached from a circuit.
+
+### 6.3. Internal Machine Events
+
+To provide for some additional capabilities beyond simple finite state machine
+behavior, the circuit padding machines also have internal events that they
+emit to themselves when packet count length limits are hit, when the Infinity
+bin is sampled, and when the histogram bins are emptied of all tokens.
+
+These events are implemented as `circpad_internal_event_*` functions in
+`circuitpadding.c`, which are called from various areas that determine when
+the events should occur.
+
+While the conditions that trigger these internal events to be called may be
+complex, they are processed by the state machine definitions in a nearly
+identical manner as the cell processing events, with the exception that they
+are sent to the current machine only, rather than all machines on the circuit.
+
+
+## 7. Future Features and Optimizations
+
+While implementing the circuit padding framework, our goal was to deploy a
+system that obscured client-side onion service circuit setup and supported
+deployment of WTF-PAD and/or APE. Along the way, we noticed several features
+that might prove useful to others, but were not essential to implement
+immediately. We do not have immediate plans to implement these ideas, but we
+would gladly accept patches that do so.
+
+The following list gives an overview of these improvements, but as this
+document ages, it may become stale. The canonical list of improvements that
+researchers may find useful is tagged in our bugtracker with
+[circpad-researchers](https://trac.torproject.org/projects/tor/query?keywords=~circpad-researchers),
+and the list of improvements that are known to be necessary for some research
+areas are tagged with
+[circpad-researchers-want](https://trac.torproject.org/projects/tor/query?keywords=~circpad-researchers-want).
+
+Please consult those lists for the latest status of these issues. Note that
+not all fixes will be backported to all Tor versions, so be mindful of which
+Tor releases receive which fixes as you conduct your experiments.
+
+### 7.1. Load Balancing and Flow Control
+
+Fortunately, non-Exit bandwidth is already plentiful and exceeds the Exit
+capacity, and we anticipate that if we inform our relay operator community of
+the need for non-Exit bandwidth to satisfy padding overhead requirements,
+they will be able to provide that with relative ease.
+
+Unfortunately, padding machines that have large quantities of overhead will
+require changes to our load balancing system to account for this
+overhead. The necessary changes are documented in
+[Proposal 265](https://gitweb.torproject.org/torspec.git/tree/proposals/265-load-balancing-with-overhead.txt).
+
+Additionally, padding cells are not currently subjected to flow control. For
+high amounts of padding, we may want to change this. See [ticket
+31782](https://bugs.torproject.org/31782) for details.
+
+### 7.2. Timing and Queuing Optimizations
+
+The circuitpadding framework has some timing related issues that may impact
+results. If high-resolution timestamps are fed to opaque deep learning
+trainers, those training models may end up able to differentiate padding
+traffic from non-padding traffic due to these timing bugs.
+
+The circuit padding cell event callbacks come from post-decryption points in
+the cell processing codepath, and from the pre-queue points in the cell send
+codepath. This means that our cell events will not reflect the actual time
+when packets are read or sent on the wire. This is much worse in the send
+direction, as the circuitmux queue, channel outbuf, and kernel TCP buffer will
+impose significant additional delay between when we currently report that a
+packet was sent, and when it actually hits the wire.
+
+[Ticket 29494](https://bugs.torproject.org/29494) has a more detailed
+description of this problem, and an experimental branch that changes the cell
+event callback locations to be from circuitmux post-queue, which with KIST,
+should be an accurate reflection of when they are actually sent on the wire.
+
+If your padding machine and problem space depends on very accurate notions of
+relay-side packet timing, please try that branch and let us know on the
+ticket if you need any further assistance fixing it up.
+
+Additionally, with these changes, it will be possible to provide further
+overhead reducing optimizations by letting machines specify flags to indicate
+that padding should not be sent if there are any cells pending in the cell
+queue, for doing things like extending cell bursts more accurately and with
+less overhead.
+
+However, even if we solve the queuing issues, Tor's current timers are not as
+precise as some padding designs may require. We will still have issues of
+timing precision to solve. [Ticket 31653](https://bugs.torproject.org/31653)
+describes an issue the circuit padding system has with sending 0-delay padding
+cells, and [ticket 32670](https://bugs.torproject.org/32670) describes a
+libevent timer accuracy issue, which causes callbacks to vary up to 10ms from
+their scheduled time, even in absence of load.
+
+All of these issues strongly suggest that you either truncate the resolution
+of any timers you feed to your classifier, or that you omit timestamps
+entirely from the classification problem until these issues are addressed.
+
+### 7.3. Better Machine Negotiation
+
+Circuit padding is applied to circuits through machine conditions.
+
+The following machine conditions may be useful for some use cases, but have
+not been implemented yet:
+  * [Exit Policy-based Stream Conditions](https://bugs.torproject.org/29083)
+  * [Probability to apply machine/Cointoss condition](https://bugs.torproject.org/30092)
+  * [Probability distributions for launching new padding circuit(s)](https://bugs.torproject.org/31783)
+  * [More flexible purpose handling](https://bugs.torproject.org/32040)
+
+Additionally, the following features may help to obscure that padding is being
+negotiated, and/or streamline that negotiation process:
+  * [Always send negotiation cell on all circuits](https://bugs.torproject.org/30172)
+  * [Better shutdown handling](https://bugs.torproject.org/30992)
+  * [Preference-ordered negotiation menu](https://bugs.torproject.org/30348)
+
+### 7.4. Probabilistic State Transitions
+
+Right now, the state machine transitions are fully deterministic. However,
+one could imagine a state machine that uses probabilistic transitions between
+states to simulate a random walk or Hidden Markov Model traversal across
+several pages.
+
+The simplest way to implement this is to make  the `circpad_state_t.next_state` array
+into an array of structs that have a next state field, and a probability to
+transition to that state.
+
+If you need this feature, please see [ticket
+31787](https://bugs.torproject.org/31787) for more details.
+
+### 7.5. More Complex Pattern Recognition
+
+State machines are extremely efficient sequence recognition devices. But they
+are not great pattern recognition devices. This is one of the reasons why
+[Adaptive Padding](https://www.freehaven.net/anonbib/cache/ShWa-Timing06.pdf)
+used state machines in combination with histograms, to model the target
+distribution of interpacket delays for transmitted packets.
+
+However, there currently is no such optimization for reaction to patterns of
+*received* traffic. There may also be cases where defenses must react to more
+complex patterns of sent traffic than can be expressed by our current
+histogram and length count events.
+
+For example: if you wish your machine to react to a certain count of incoming
+cells in a row, right now you have to have a state for each cell, and use the
+infinity bin to timeout of the sequence in each state. We could make this more
+compact if each state had an arrival cell counter and inter-cell timeout. Or
+we could make more complex mechanisms to recognize certain patterns of arrival
+traffic in a state.
+
+The best way to build recognition primitives like this into the framework is
+to add additional [Internal Machine Events](#63-internal-machine-events) for
+the pattern in question.
+
+As another simple example, a useful additional event might be to transition
+whenever any of your histogram bins are empty, rather than all of them. To do
+this, you would add `CIRCPAD_EVENT_ANY_BIN_EMPTY` to the enum
+`circpad_event_t` in `circuitpadding.h`. You would then create a function
+`circuitpadding_internal_event_any_bin_empty()`, which would work just like
+`circuitpadding_internal_event_bin_empty()`, and also be called from
+`check_machine_token_supply()` in `circuitpadding.c` but with the check for
+each bin being zero instead of the total. With this change, new machines could
+react to this new event in the same way as any other.
+
+If you have any other ideas that may be useful, please comment on [ticket
+32680](https://bugs.torproject.org/32680).
+
+
+## 8. Open Research Problems
+
+### 8.1. Onion Service Circuit Setup
+
+Our circuit setup padding does not address timing-based features, only
+packet counts. Deep learning can probably see this.
+
+However, before going too deep down the timing rabithole, we may need to make
+[some improvements to Tor](#72-timing-and-queuing-optimizations). Please
+comment on those tickets if you need this.
+
+### 8.2. Onion Service Fingerprinting
+
+We have done nothing to obscure the service side of onion service circuit
+setup. Because service-side onion services will have the reverse traffic byte
+counts as normal clients, they will likely need some kind of [hybrid
+application layer traffic shaping](#53-sketch-of-tamaraw), in addition to
+simple circuit setup obfuscation.
+
+Fingerprinting in
+[combination](https://github.com/mikeperry-tor/vanguards/blob/master/README_SECURITY.md)
+with
+[vanguards](https://github.com/mikeperry-tor/vanguards/blob/master/README_TECHNICAL.md)
+ia also an open issue.
+
+### 8.3. Open World Fingerprinting
+
+Similarly, Open World/clearweb website fingerprinting defenses remain
+an unsolved problem from the practicality point of view. The original WTF-PAD
+defense was never tuned, and it is showing accuracy issues against deep
+learning attacks.
+
+### 8.4. Protocol Usage Fingerprinting
+
+Traffic Fingerprinting to determine the protocol in use by a client has not
+been studied, either from the attack or defense point of view.
+
+### 8.5. Datagram Transport Side Channels
+
+Padding can reduce the accuracy of dropped-cell side channels in such
+transports, but we don't know [how to measure
+this](https://lists.torproject.org/pipermail/tor-dev/2018-November/013562.html).
+
+## 9. Must Read Papers
+
+These are by far the most important papers in the space, to date:
+
+ - [Tamaraw](https://www.cypherpunks.ca/~iang/pubs/webfingerprint-ccs14.pdf)
+ - [Bayes, Not Naive](https://www.petsymposium.org/2017/papers/issue4/paper50-2017-4-source.pdf)
+ - [Anonymity Trilemma](https://eprint.iacr.org/2017/954.pdf)
+ - [WTF-PAD](http://arxiv.org/pdf/1512.00524)
+
+Except for WTF-PAD, these papers were selected because they point towards
+optimality bounds that can be benchmarked against.
+
+We cite them even though we are skeptical that provably optimal defenses can
+be constructed, at least not without trivial or impractical transforms (such as
+those that can be created with unbounded queue capacity, or stored knowledge
+of traces for every possible HTTP trace on the Internet).
+
+We also are not demanding an optimality or security proof for every defense.
+
+Instead, we cite the above as benchmarks. We believe the space, especially the
+open-world case, to be more akin to an optimization problem, where a
+WTF-PAD-like defense must be tuned through an optimizer to produce results
+comparable to provably optimal but practically unrealizable defenses, through
+rigorous adversarial evaluation.
+
+## A. Acknowledgments
+
+This research was supported in part by NSF grants CNS-1619454 and CNS-1526306.
diff --git a/doc/HACKING/CircuitPaddingQuickStart.md b/doc/HACKING/CircuitPaddingQuickStart.md
new file mode 100644
index 0000000000..167ff9f292
--- /dev/null
+++ b/doc/HACKING/CircuitPaddingQuickStart.md
@@ -0,0 +1,263 @@
+# A Padding Machine from Scratch
+
+A quickstart guide by Tobias Pulls.
+
+This document describes the process of building a "padding machine" in tor's new
+circuit padding framework from scratch. Notes were taken as part of porting
+[Adaptive Padding Early
+(APE)](https://www.cs.kau.se/pulls/hot/thebasketcase-ape/) from basket2 to the
+circuit padding framework. The goal is just to document the process and provide
+useful pointers along the way, not create a useful machine. 
+
+The quick and dirty plan is to:
+1. clone and compile tor
+2. use newly built tor in TB and at small (non-exit) relay we run
+3. add a bare-bones APE padding machine
+4. run the machine, inspect logs for activity
+5. port APE's state machine without thinking much about parameters
+
+## Clone and compile tor
+
+```bash
+git clone https://git.torproject.org/tor.git
+cd tor
+git checkout tor-0.4.1.5
+```
+Above we use the tag for tor-0.4.1.5 where the circuit padding framework was
+released. Note that this version of the framework is missing many features and
+fixes that have since been merged to origin/master. If you need the newest
+framework features, you should use that master instead.
+
+```bash
+sh autogen.sh 
+./configure
+make
+```
+When you run `./configure` you'll be told of missing dependencies and packages
+to install on debian-based distributions. Important: if you plan to run `tor` on
+a relay as part of the real Tor network and your server runs a distribution that
+uses systemd, then I'd recommend that you `apt install dpkg dpkg-dev
+libevent-dev libssl-dev asciidoc quilt dh-apparmor libseccomp-dev dh-systemd
+libsystemd-dev pkg-config dh-autoreconf libfakeroot zlib1g zlib1g-dev automake
+liblzma-dev libzstd-dev` and ensure that tor has systemd support enabled:
+`./configure --enable-systemd`. Without this, on a recent Ubuntu, my tor service
+was forcefully restarted (SIGINT interrupt) by systemd every five minutes.
+
+If you want to install on your localsystem, run `make install`. For our case we
+just want the tor binary at `src/app/tor`.
+
+## Use tor in TB and at a relay
+Download and install a fresh Tor Browser (TB) from torproject.org. Make sure it
+works. From the command line, relative to the folder created when you extracted
+TB, run `./Browser/start-tor-browser --verbose` to get some basic log output.
+Note the version of tor, in my case, `Tor 0.4.0.5 (git-bf071e34aa26e096)` as
+part of TB 8.5.4. Shut down TB, copy the `tor` binary that you compiled earlier
+and replace `Browser/TorBrowser/Tor/tor`. Start TB from the command line again,
+you should see a different version, in my case `Tor 0.4.1.5
+(git-439ca48989ece545)`.
+
+The relay we run is also on linux, and `tor` is located at `/usr/bin/tor`. To
+view relevant logs since last boot `sudo journalctl -b /usr/bin/tor`, where we
+find `Tor 0.4.0.5 running on Linux`. Copy the locally compiled `tor` to the
+relay at a temporary location and then make sure it's ownership and access
+rights are identical to `/usr/bin/tor`. Next, shut down the running tor service
+with `sudo service tor stop`, wait for it to stop (typically 30s), copy our
+locally compiled tor to replace `/usr/bin/tor` then start the service again.
+Checking the logs we see `or 0.4.1.5 (git-439ca48989ece545)`.
+
+Repeatedly shutting down a relay is detrimental to the network and should be
+avoided. Sorry about that.
+
+We have one more step left before we move on the machine: configure TB to always
+use our middle relay. Edit `Browser/TorBrowser/Data/Tor/torrc` and set
+`MiddleNodes <fingerprint>`, where `<fingerprint>` is the fingerprint of the
+relay. Start TB, visit a website, and manually confirm that the middle is used
+by looking at the circuit display. 
+
+## Add a bare-bones APE padding machine
+Now the fun part. We have several resources at our disposal (mind that links
+might be broken in the future, just search for the headings):
+- The official [Circuit Padding Developer
+  Documentation](https://storm.torproject.org/shared/ChieH_sLU93313A2gopZYT3x2waJ41hz5Hn2uG1Uuh7).
+- Notes we made on the [implementation of the circuit padding
+  framework](https://github.com/pylls/padding-machines-for-tor/blob/master/notes/circuit-padding-framework.md).
+- The implementation of the current circuit padding machines in tor:
+  [circuitpadding.c](https://gitweb.torproject.org/tor.git/tree/src/core/or/circuitpadding_machines.c)
+  and
+  [circuitpadding_machines.h](https://gitweb.torproject.org/tor.git/tree/src/core/or/circuitpadding_machines.h).
+
+Please consult the above links for details. Moving forward, the focus is to
+describe what was done, not necessarily explaining all the details why. 
+
+Since we plan to make changes to tor, create a new branch `git checkout -b
+circuit-padding-ape-machine tor-0.4.1.5`. 
+
+We start with declaring two functions, one for the machine at the client and one
+at the relay, in `circuitpadding_machines.h`:
+
+```c
+void circpad_machine_relay_wf_ape(smartlist_t *machines_sl);
+void circpad_machine_client_wf_ape(smartlist_t *machines_sl);
+```
+
+The definitions go into `circuitpadding_machines.c`:
+
+```c
+/**************** Adaptive Padding Early (APE) machine ****************/
+
+/** 
+ * Create a relay-side padding machine based on the APE design. 
+ */
+void
+circpad_machine_relay_wf_ape(smartlist_t *machines_sl)
+{
+  circpad_machine_spec_t *relay_machine
+  = tor_malloc_zero(sizeof(circpad_machine_spec_t));
+
+  relay_machine->name = "relay_wf_ape";
+  relay_machine->is_origin_side = 0; // relay-side
+
+  // Pad to/from the middle relay, only when the circuit has streams
+  relay_machine->target_hopnum = 2;
+  relay_machine->conditions.min_hops = 2;
+  relay_machine->conditions.state_mask = CIRCPAD_CIRC_STREAMS;
+
+  // limits to help guard against excessive padding
+  relay_machine->allowed_padding_count = 1;
+  relay_machine->max_padding_percent = 1;
+
+  // one state to start with: START (-> END, never takes a slot in states)
+  circpad_machine_states_init(relay_machine, 1);
+  relay_machine->states[CIRCPAD_STATE_START].
+    next_state[CIRCPAD_EVENT_NONPADDING_SENT] =
+    CIRCPAD_STATE_END;
+
+  // register the machine
+  relay_machine->machine_num = smartlist_len(machines_sl);
+  circpad_register_padding_machine(relay_machine, machines_sl);
+  
+  log_info(LD_CIRC,
+           "Registered relay WF APE padding machine (%u)",
+           relay_machine->machine_num);
+}
+
+/** 
+ * Create a client-side padding machine based on the APE design. 
+ */
+void
+circpad_machine_client_wf_ape(smartlist_t *machines_sl)
+{
+    circpad_machine_spec_t *client_machine
+  = tor_malloc_zero(sizeof(circpad_machine_spec_t));
+
+  client_machine->name = "client_wf_ape";
+  client_machine->is_origin_side = 1; // client-side
+
+  /** Pad to/from the middle relay, only when the circuit has streams, and only
+  * for general purpose circuits (typical for web browsing)
+  */
+  client_machine->target_hopnum = 2;
+  client_machine->conditions.min_hops = 2;
+  client_machine->conditions.state_mask = CIRCPAD_CIRC_STREAMS;
+  client_machine->conditions.purpose_mask =
+    circpad_circ_purpose_to_mask(CIRCUIT_PURPOSE_C_GENERAL);
+
+  // limits to help guard against excessive padding
+  client_machine->allowed_padding_count = 1;
+  client_machine->max_padding_percent = 1;
+
+  // one state to start with: START (-> END, never takes a slot in states)
+  circpad_machine_states_init(client_machine, 1);
+  client_machine->states[CIRCPAD_STATE_START].
+    next_state[CIRCPAD_EVENT_NONPADDING_SENT] =
+    CIRCPAD_STATE_END;
+
+  client_machine->machine_num = smartlist_len(machines_sl);
+  circpad_register_padding_machine(client_machine, machines_sl);
+  log_info(LD_CIRC,
+           "Registered client WF APE padding machine (%u)",
+           client_machine->machine_num);
+}
+```
+
+We also have to modify `circpad_machines_init()` in `circuitpadding.c` to
+register our machines:
+
+```c
+  /* Register machines for the APE WF defense */
+  circpad_machine_client_wf_ape(origin_padding_machines);
+  circpad_machine_relay_wf_ape(relay_padding_machines);
+```
+
+We run `make` to get a new `tor` binary and copy it to our local TB. 
+
+## Run the machine
+To be able
+to view circuit info events in the console as we launch TB, we add `Log
+[circ]info notice stdout` to `torrc` of TB. 
+
+Running TB to visit example.com we first find in the log:
+
+```
+Aug 30 18:36:43.000 [info] circpad_machine_client_hide_intro_circuits(): Registered client intro point hiding padding machine (0)
+Aug 30 18:36:43.000 [info] circpad_machine_relay_hide_intro_circuits(): Registered relay intro circuit hiding padding machine (0)
+Aug 30 18:36:43.000 [info] circpad_machine_client_hide_rend_circuits(): Registered client rendezvous circuit hiding padding machine (1)
+Aug 30 18:36:43.000 [info] circpad_machine_relay_hide_rend_circuits(): Registered relay rendezvous circuit hiding padding machine (1)
+Aug 30 18:36:43.000 [info] circpad_machine_client_wf_ape(): Registered client WF APE padding machine (2)
+Aug 30 18:36:43.000 [info] circpad_machine_relay_wf_ape(): Registered relay WF APE padding machine (2)
+```
+
+All good, our machine is running. Looking further we find:
+
+```
+Aug 30 18:36:55.000 [info] circpad_setup_machine_on_circ(): Registering machine client_wf_ape to origin circ 2 (5)
+Aug 30 18:36:55.000 [info] circpad_node_supports_padding(): Checking padding: supported
+Aug 30 18:36:55.000 [info] circpad_negotiate_padding(): Negotiating padding on circuit 2 (5), command 2
+Aug 30 18:36:55.000 [info] circpad_machine_spec_transition(): Circuit 2 circpad machine 0 transitioning from 0 to 65535
+Aug 30 18:36:55.000 [info] circpad_machine_spec_transitioned_to_end(): Padding machine in end state on circuit 2 (5)
+Aug 30 18:36:55.000 [info] circpad_circuit_machineinfo_free_idx(): Freeing padding info idx 0 on circuit 2 (5)
+Aug 30 18:36:55.000 [info] circpad_handle_padding_negotiated(): Middle node did not accept our padding request on circuit 2 (5)
+```
+We see that our middle support padding (since we upgraded to tor-0.4.1.5), that
+we attempt to negotiate, our machine starts on the client, transitions to the
+end state, and is freed. The last line shows that the middle doesn't have a
+padding machine that can run. 
+
+Next, we follow the same steps as earlier and replace the modified `tor` at our
+middle relay. We don't update the logging there to avoid logging on the info
+level on the live network. Looking at the client log again we see that
+negotiation works as before except for the last line: it's missing, so the
+machine is running at the middle as well.  
+
+## Implementing the APE state machine
+
+Porting is fairly straightforward: define the states for all machines, add two
+more machines (for the receive portion of WTFP-PAD, beyond AP), and pick
+reasonable parameters for the distributions (I completely winged it now, as when
+implementing APE). The [circuit-padding-ape-machine
+branch](https://github.com/pylls/tor/tree/circuit-padding-ape-machine) contains
+the commits for the full machines with plenty of comments. 
+
+Some comments on the process:
+
+- `tor-0.4.1.5` does not support two machines on the same circuit, the following
+  fix has to be made: https://trac.torproject.org/projects/tor/ticket/31111 .
+  The good news is that everything else seems to work after the small change in
+  the fix. 
+- APE randomizes its distributions. Currently, this can only be done during
+  start of `tor`. This makes sense in the censorship circumvention setting
+  (`obfs4`), less so for WF defenses: further randomizing each circuit is likely
+  a PITA for attackers with few downsides.
+- it was annoying to figure out that the lack of systemd support in my compiled
+  tor caused systemd to interrupt (SIGINT) my tor process at the middle relay
+  every five minutes. Updated build steps above to hopefully save others the
+  pain.
+- there's for sure some bug on relays when sending padding cells too early (?).
+  It can happen with some probability with the APE implementation due to
+  `circpad_machine_relay_wf_ape_send()`. Will investigate next.
+- Moving the registration of machines from the definition of the machines to
+  `circpad_machines_init()` makes sense, as suggested in the circuit padding doc
+  draft.
+
+Remember that APE is just a proof-of-concept and we make zero claims about its
+ability to withstand WF attacks, in particular those based on deep learning.
diff --git a/doc/HACKING/CodeStructure.md b/doc/HACKING/CodeStructure.md
index 736d6cd484..fffafcaed1 100644
--- a/doc/HACKING/CodeStructure.md
+++ b/doc/HACKING/CodeStructure.md
@@ -2,128 +2,121 @@
 TODO: revise this to talk about how things are, rather than how things
 have changed.
 
-TODO: Make this into good markdown.
-
-
-
-For quite a while now, the program "tor" has been built from source
-code in just two directories: src/common and src/or.
+For quite a while now, the program *tor* has been built from source
+code in just two directories: **src/common** and **src/or**.
 
 This has become more-or-less untenable, for a few reasons -- most
 notably of which is that it has led our code to become more
 spaghetti-ish than I can endorse with a clean conscience.
 
 So to fix that, we've gone and done a huge code movement in our git
-master branch, which will land in a release once Tor 0.3.5.1-alpha is
+master branch, which will land in a release once Tor `0.3.5.1-alpha` is
 out.
 
 Here's what we did:
 
-  * src/common has been turned into a set of static libraries.  These
-all live in the "src/lib/*" directories.  The dependencies between
+  * **src/common** has been turned into a set of static libraries.  These
+all live in the **src/lib/*** directories.  The dependencies between
 these libraries should have no cycles.  The libraries are:
 
-    arch -- Headers to handle architectural differences
-    cc -- headers to handle differences among compilers
-    compress -- wraps zlib, zstd, lzma
-    container -- high-level container types
-    crypt_ops -- Cryptographic operations. Planning to split this into
+    - **arch** -- Headers to handle architectural differences
+    - **cc** -- headers to handle differences among compilers
+    - **compress** -- wraps zlib, zstd, lzma
+    - **container** -- high-level container types
+    - **crypt_ops** -- Cryptographic operations. Planning to split this into
 a higher and lower level library
-    ctime -- Operations that need to run in constant-time. (Properly,
+    - **ctime** -- Operations that need to run in constant-time. (Properly,
 data-invariant time)
-    defs -- miscelaneous definitions needed throughout Tor.
-    encoding -- transforming one data type into another, and various
+    - **defs** -- miscelaneous definitions needed throughout Tor.
+    - **encoding** -- transforming one data type into another, and various
 data types into strings.
-    err -- lowest-level error handling, in cases where we can't use
+    - **err** -- lowest-level error handling, in cases where we can't use
 the logs because something that the logging system needs has broken.
-    evloop -- Generic event-loop handling logic
-    fdio -- Low-level IO wrapper functions for file descriptors.
-    fs -- Operations on the filesystem
-    intmath -- low-level integer math and misc bit-twiddling hacks
-    lock -- low-level locking code
-    log -- Tor's logging module.  This library sits roughly halfway up
+    - **evloop** -- Generic event-loop handling logic
+    - **fdio** -- Low-level IO wrapper functions for file descriptors.
+    - **fs** -- Operations on the filesystem
+    - **intmath** -- low-level integer math and misc bit-twiddling hacks
+    - **lock** -- low-level locking code
+    - **log** -- Tor's logging module.  This library sits roughly halfway up
 the library dependency diagram, since everything it depends on has to
 be carefully crafted to *not* log.
-    malloc -- Low-level wrappers for the platform memory allocation functions.
-    math -- Higher-level mathematical functions, and floating-point math
-    memarea -- An arena allocator
-    meminfo -- Functions for querying the current process's memory
+    - **malloc** -- Low-level wrappers for the platform memory allocation functions.
+    - **math** -- Higher-level mathematical functions, and floating-point math
+    - **memarea** -- An arena allocator
+    - **meminfo** -- Functions for querying the current process's memory
 status and resources
-    net -- Networking compatibility and convenience code
-    osinfo -- Querying information about the operating system
-    process -- Launching and querying the status of other processes
-    sandbox -- Backend for the linux seccomp2 sandbox
-    smartlist_core -- The lowest-level of the smartlist_t data type.
+    - **net** -- Networking compatibility and convenience code
+    - **osinfo** -- Querying information about the operating system
+    - **process** -- Launching and querying the status of other processes
+    - **sandbox** -- Backend for the linux seccomp2 sandbox
+    - **smartlist_core** -- The lowest-level of the smartlist_t data type.
 Separated from the rest of the containers library because the logging
 subsystem depends on it.
-    string -- Compatibility and convenience functions for manipulating
+    - **string** -- Compatibility and convenience functions for manipulating
 C strings.
-    term -- Terminal-related functions (currently limited to a getpass
+    - **term** -- Terminal-related functions (currently limited to a getpass
 function).
-    testsupport -- Macros for mocking, unit tests, etc.
-    thread -- Higher-level thread compatibility code
-    time -- Higher-level time management code, including format
+    - **testsupport** -- Macros for mocking, unit tests, etc.
+    - **thread** -- Higher-level thread compatibility code
+    - **time** -- Higher-level time management code, including format
 conversions and monotonic time
-    tls -- Our wrapper around our TLS library
-    trace -- Formerly src/trace -- a generic event tracing API
-    wallclock -- Low-level time code, used by the log module.
+    - **tls** -- Our wrapper around our TLS library
+    - **trace** -- Formerly src/trace -- a generic event tracing API
+    - **wallclock** -- Low-level time code, used by the log module.
 
-  * To ensure that the dependency graph in src/common remains under
-control, there is a tool that you can run called "make
-check-includes".  It verifies that each module in Tor only includes
+  * To ensure that the dependency graph in **src/common** remains under
+control, there is a tool that you can run called `make
+check-includes`.  It verifies that each module in Tor only includes
 the headers that it is permitted to include, using a per-directory
-".may_include" file.
+*.may_include* file.
 
-  * The src/or/or.h header has been split into numerous smaller
+  * The **src/or/or.h** header has been split into numerous smaller
 headers.  Notably, many important structures are now declared in a
-header called foo_st.h, where "foo" is the name of the structure.
+header called *foo_st.h*, where "foo" is the name of the structure.
 
-  * The src/or directory, which had most of Tor's code, had been split
+  * The **src/or** directory, which had most of Tor's code, had been split
 up into several directories.  This is still a work in progress:  This
 code has not itself been refactored, and its dependency graph is still
 a tangled web.  I hope we'll be working on that over the coming
 releases, but it will take a while to do.
 
-    The new top-level source directories are:
-
-     src/core -- Code necessary to actually perform or use onion routing.
-     src/feature -- Code used only by some onion routing
+    - The new top-level source directories are:
+        - **src/core** -- Code necessary to actually perform or use onion routing.
+        - **src/feature** -- Code used only by some onion routing
 configurations, or only for a special purpose.
-     src/app -- Top-level code to run, invoke, and configure the
+        - **src/app** -- Top-level code to run, invoke, and configure the
 lower-level code
 
-   The new second-level source directories are:
-     src/core/crypto -- High-level cryptographic protocols used in Tor
-     src/core/mainloop -- Tor's event loop, connection-handling, and
+    - The new second-level source directories are:
+        - **src/core/crypto** -- High-level cryptographic protocols used in Tor
+        - **src/core/mainloop** -- Tor's event loop, connection-handling, and
 traffic-routing code.
-     src/core/or -- Parts related to handling onion routing itself
-     src/core/proto -- support for encoding and decoding different
+        - **src/core/or** -- Parts related to handling onion routing itself
+        - **src/core/proto** -- support for encoding and decoding different
 wire protocols
-
-     src/feature/api -- Support for making Tor embeddable
-     src/feature/client -- Functionality which only Tor clients need
-     src/feature/control -- Controller implementation
-     src/feature/dirauth -- Directory authority
-     src/feature/dircache -- Directory cache
-     src/feature/dirclient -- Directory client
-     src/feature/dircommon -- Shared code between the other directory modules
-     src/feature/hibernate -- Hibernating when Tor is out of bandwidth
+        - **src/feature/api** -- Support for making Tor embeddable
+        - **src/feature/client** -- Functionality which only Tor clients need
+        - **src/feature/control** -- Controller implementation
+        - **src/feature/dirauth** -- Directory authority
+        - **src/feature/dircache** -- Directory cache
+        - **src/feature/dirclient** -- Directory client
+        - **src/feature/dircommon** -- Shared code between the other directory modules
+        - **src/feature/hibernate** -- Hibernating when Tor is out of bandwidth
 or shutting down
-     src/feature/hs -- v3 onion service implementation
-     src/feature/hs_common -- shared code between both onion service
+        - **src/feature/hs** -- v3 onion service implementation
+        - **src/feature/hs_common** -- shared code between both onion service
 implementations
-     src/feature/nodelist -- storing and accessing the list of relays on
+        - **src/feature/nodelist** -- storing and accessing the list of relays on
 the network.
-     src/feature/relay -- code that only relay servers and exit servers need.
-     src/feature/rend -- v2 onion service implementation
-     src/feature/stats -- statistics and history
-
-     src/app/config -- configuration and state for Tor
-     src/app/main -- Top-level functions to invoke the rest or Tor.
+        - **src/feature/relay** -- code that only relay servers and exit servers need.
+        - **src/feature/rend** -- v2 onion service implementation
+        - **src/feature/stats** -- statistics and history
+        - **src/app/config** -- configuration and state for Tor
+        - **src/app/main** -- Top-level functions to invoke the rest or Tor.
 
-  * The "tor" executable is now built in src/app/tor rather than src/or/tor.
+  * The `tor` executable is now built in **src/app/tor** rather than **src/or/tor**.
 
   * There are more static libraries than before that you need to build
 into your application if you want to embed Tor.  Rather than
-maintaining this list yourself, I recommend that you run "make
-show-libs" to have Tor emit a list of what you need to link.
+maintaining this list yourself, I recommend that you run `make
+show-libs` to have Tor emit a list of what you need to link.
diff --git a/doc/HACKING/CodingStandards.md b/doc/HACKING/CodingStandards.md
index 74db2a39a3..7999724166 100644
--- a/doc/HACKING/CodingStandards.md
+++ b/doc/HACKING/CodingStandards.md
@@ -42,6 +42,7 @@ If you have changed build system components:
    - For example, if you have changed Makefiles, autoconf files, or anything
      else that affects the build system.
 
+
 License issues
 ==============
 
@@ -58,7 +59,6 @@ Some compatible licenses include:
   - CC0 Public Domain Dedication
 
 
-
 How we use Git branches
 =======================
 
@@ -99,17 +99,28 @@ When you do a commit that needs a ChangeLog entry, add a new file to
 the `changes` toplevel subdirectory.  It should have the format of a
 one-entry changelog section from the current ChangeLog file, as in
 
-- Major bugfixes:
+  o Major bugfixes (security):
     - Fix a potential buffer overflow. Fixes bug 99999; bugfix on
       0.3.1.4-beta.
+  o Minor features (performance):
+    - Make tor faster. Closes ticket 88888.
 
 To write a changes file, first categorize the change.  Some common categories
-are: Minor bugfixes, Major bugfixes, Minor features, Major features, Code
-simplifications and refactoring.  Then say what the change does.  If
-it's a bugfix, mention what bug it fixes and when the bug was
-introduced.  To find out which Git tag the change was introduced in,
-you can use `git describe --contains <sha1 of commit>`.
-
+are:
+  o Minor bugfixes (subheading):
+  o Major bugfixes (subheading):
+  o Minor features (subheading):
+  o Major features (subheading):
+  o Code simplifications and refactoring:
+  o Testing:
+  o Documentation:
+
+The subheading is a particular area within Tor.  See the ChangeLog for
+examples.
+
+Then say what the change does.  If it's a bugfix, mention what bug it fixes
+and when the bug was introduced. To find out which Git tag the change was
+introduced in, you can use `git describe --contains <sha1 of commit>`.
 If you don't know the commit, you can search the git diffs (-S) for the first
 instance of the feature (--reverse).
 
@@ -147,10 +158,6 @@ that our CI passes.  These checks are implemented in
 `scripts/maint/lintChanges.py`.
 
 Changes file style guide:
-  * Changes files begin with "  o Header (subheading):".  The header
-    should usually be "Minor/Major bugfixes/features". The subheading is a
-    particular area within Tor.  See the ChangeLog for examples.
-
   * Make everything terse.
 
   * Write from the user's point of view: describe the user-visible changes
@@ -212,6 +219,9 @@ deviations from our C whitespace style.  Generally, we use:
    - No space between a function name and an opening paren. `puts(x)`, not
      `puts (x)`.
    - Function declarations at the start of the line.
+   - Use `void foo(void)` to declare a function with no arguments.  Saying
+     `void foo()` is C++ syntax.
+   - Use `const` for new APIs.
 
 If you use an editor that has plugins for editorconfig.org, the file
 `.editorconfig` will help you to conform this coding style.
@@ -228,20 +238,49 @@ We have some wrapper functions like `tor_malloc`, `tor_free`, `tor_strdup`, and
 `tor_gettimeofday;` use them instead of their generic equivalents.  (They
 always succeed or exit.)
 
+Specifically, Don't use `malloc`, `realloc`, `calloc`, `free`, or
+`strdup`. Use `tor_malloc`, `tor_realloc`, `tor_calloc`, `tor_free`, or
+`tor_strdup`.
+
+Don't use `tor_realloc(x, y\*z)`. Use `tor_reallocarray(x, y, z)` instead.;
+
 You can get a full list of the compatibility functions that Tor provides by
 looking through `src/lib/*/*.h`.  You can see the
 available containers in `src/lib/containers/*.h`.  You should probably
 familiarize yourself with these modules before you write too much code, or
 else you'll wind up reinventing the wheel.
 
-We don't use `strcat` or `strcpy` or `sprintf` of any of those notoriously broken
-old C functions.  Use `strlcat`, `strlcpy`, or `tor_snprintf/tor_asprintf` instead.
+
+We don't use `strcat` or `strcpy` or `sprintf` of any of those notoriously
+broken old C functions.  We also avoid `strncat` and `strncpy`. Use
+`strlcat`, `strlcpy`, or `tor_snprintf/tor_asprintf` instead.
 
 We don't call `memcmp()` directly.  Use `fast_memeq()`, `fast_memneq()`,
-`tor_memeq()`, or `tor_memneq()` for most purposes.
+`tor_memeq()`, or `tor_memneq()` for most purposes.  If you really need a
+tristate return value, use `tor_memcmp()` or `fast_memcmp()`.
+
+Don't call `assert()` directly. For hard asserts, use `tor_assert()`. For
+soft asserts, use `tor_assert_nonfatal()` or `BUG()`. If you need to print
+debug information in assert error message, consider using `tor_assertf()` and
+`tor_assertf_nonfatal()`.  If you are writing code that is too low-level to
+use the logging subsystem, use `raw_assert()`.
+
+Don't use `toupper()` and `tolower()` functions. Use `TOR_TOUPPER` and
+`TOR_TOLOWER` macros instead. Similarly, use `TOR_ISALPHA`, `TOR_ISALNUM` et.
+al. instead of `isalpha()`, `isalnum()`, etc.
+
+When allocating new string to be added to a smartlist, use
+`smartlist_add_asprintf()` to do both at once.
+
+Avoid calling BSD socket functions directly. Use portable wrappers to work
+with sockets and socket addresses. Also, sockets should be of type
+`tor_socket_t`.
+
+Don't use any of these functions: they aren't portable. Use the
+version prefixed with `tor_` instead: strtok_r, memmem, memstr,
+asprintf, localtime_r, gmtime_r, inet_aton, inet_ntop, inet_pton,
+getpass, ntohll, htonll.  (This list is incomplete.)
 
-Also see a longer list of functions to avoid in:
-https://people.torproject.org/~nickm/tor-auto/internal/this-not-that.html
 
 What code can use what other code?
 ----------------------------------
@@ -331,8 +370,16 @@ definitions when necessary.)
 Assignment operators shouldn't nest inside other expressions.  (You can
 ignore this inside macro definitions when necessary.)
 
-Functions not to write
-----------------------
+Binary data and wire formats
+----------------------------
+
+Use pointer to `char` when representing NUL-terminated string. To represent
+arbitrary binary data, use pointer to `uint8_t`. (Many older Tor APIs ignore
+this rule.)
+
+Refrain from attempting to encode integers by casting their pointers to byte
+arrays. Use something like `set_uint32()`/`get_uint32()` instead and don't
+forget about endianness.
 
 Try to never hand-write new code to parse or generate binary
 formats. Instead, use trunnel if at all possible.  See
@@ -343,7 +390,6 @@ for more information about trunnel.
 
 For information on adding new trunnel code to Tor, see src/trunnel/README
 
-
 Calling and naming conventions
 ------------------------------
 
@@ -451,6 +497,8 @@ to use it as a function callback), define it with a name like
       abc_free_(obj);
     }
 
+When deallocating, don't say e.g. `if (x) tor_free(x)`. The convention is to
+have deallocators do nothing when NULL pointer is passed.
 
 Doxygen comment conventions
 ---------------------------
diff --git a/doc/HACKING/EndOfLifeTor.md b/doc/HACKING/EndOfLifeTor.md
new file mode 100644
index 0000000000..2fece2ca9d
--- /dev/null
+++ b/doc/HACKING/EndOfLifeTor.md
@@ -0,0 +1,50 @@
+
+End of Life on an old release series
+------------------------------------
+
+Here are the steps that the maintainer should take when an old Tor release
+series reaches End of Life.  Note that they are _only_ for entire series that
+have reached their planned EOL: they do not apply to security-related
+deprecations of individual versions.
+
+### 0. Preliminaries
+
+0. A few months before End of Life:
+   Write a deprecation announcement.
+   Send the announcement out with every new release announcement.
+
+1. A month before End of Life:
+   Send the announcement to tor-announce, tor-talk, tor-relays, and the
+   packagers.
+
+### 1. On the day
+
+1. Open tickets to remove the release from:
+   - the jenkins builds
+   - tor's Travis CI cron jobs
+   - chutney's Travis CI tests (#)
+   - stem's Travis CI tests (#)
+
+2. Close the milestone in Trac. To do this, go to Trac, log in,
+   select "Admin" near the top of the screen, then select "Milestones" from
+   the menu on the left.  Click on the milestone for this version, and
+   select the "Completed" checkbox. By convention, we select the date as
+   the End of Life date.
+
+3. Replace NNN-backport with NNN-unreached-backport in all open trac tickets.
+
+4. If there are any remaining tickets in the milestone:
+     - merge_ready tickets are for backports:
+       - if there are no supported releases for the backport, close the ticket
+       - if there is an earlier (LTS) release for the backport, move the ticket
+         to that release
+     - other tickets should be closed (if we won't fix them) or moved to a
+       supported release (if we will fix them)
+
+5. Mail the end of life announcement to tor-announce, the packagers list,
+   and tor-relays. The current list of packagers is in ReleasingTor.md.
+
+6. Ask at least two of weasel/arma/Sebastian to remove the old version
+   number from their approved versions list.
+
+7. Update the CoreTorReleases wiki page.
diff --git a/doc/HACKING/Fuzzing.md b/doc/HACKING/Fuzzing.md
index 2039d6a4c0..c2db7e9853 100644
--- a/doc/HACKING/Fuzzing.md
+++ b/doc/HACKING/Fuzzing.md
@@ -1,6 +1,6 @@
-= Fuzzing Tor
+# Fuzzing Tor
 
-== The simple version (no fuzzing, only tests)
+## The simple version (no fuzzing, only tests)
 
 Check out fuzzing-corpora, and set TOR_FUZZ_CORPORA to point to the place
 where you checked it out.
@@ -12,7 +12,7 @@ This won't actually fuzz Tor!  It will just run all the fuzz binaries
 on our existing set of testcases for the fuzzer.
 
 
-== Different kinds of fuzzing
+## Different kinds of fuzzing
 
 Right now we support three different kinds of fuzzer.
 
@@ -37,7 +37,7 @@ In all cases, you'll need some starting examples to give the fuzzer when it
 starts out.  There's a set in the "fuzzing-corpora" git repository.  Try
 setting TOR_FUZZ_CORPORA to point to a checkout of that repository
 
-== Writing Tor fuzzers
+## Writing Tor fuzzers
 
 A tor fuzzing harness should have:
 * a fuzz_init() function to set up any necessary global state.
@@ -52,7 +52,7 @@ bug, or accesses memory it shouldn't. This helps fuzzing frameworks detect
 "interesting" cases.
 
 
-== Guided Fuzzing with AFL
+## Guided Fuzzing with AFL
 
 There is no HTTPS, hash, or signature for American Fuzzy Lop's source code, so
 its integrity can't be verified. That said, you really shouldn't fuzz on a
@@ -101,7 +101,7 @@ macOS (OS X) requires slightly more preparation, including:
 * using afl-clang (or afl-clang-fast from the llvm directory)
 * disabling external crash reporting (AFL will guide you through this step)
 
-== Triaging Issues
+## Triaging Issues
 
 Crashes are usually interesting, particularly if using AFL_HARDEN=1 and --enable-expensive-hardening. Sometimes crashes are due to bugs in the harness code.
 
@@ -115,7 +115,7 @@ To see what fuzz-http is doing with a test case, call it like this:
 
 (Logging is disabled while fuzzing to increase fuzzing speed.)
 
-== Reporting Issues
+## Reporting Issues
 
 Please report any issues discovered using the process in Tor's security issue
 policy:
diff --git a/doc/HACKING/HelpfulTools.md b/doc/HACKING/HelpfulTools.md
index cba57e875d..866b321287 100644
--- a/doc/HACKING/HelpfulTools.md
+++ b/doc/HACKING/HelpfulTools.md
@@ -315,6 +315,30 @@ If you use emacs for editing Tor and nothing else, you could always just say:
 There is probably a better way to do this.  No, we are probably not going
 to clutter the files with emacs stuff.
 
+Building a tag file (code index)
+--------------------------------
+
+Many functions in tor use `MOCK_IMPL` wrappers for unit tests. Your
+tag-building program must be told how to handle this syntax.
+
+If you're using emacs, you can generate an emacs-compatible tag file using
+`make tags`. This will run your system's `etags`. Tor's build system assumes
+that you're using the emacs-specific version of `etags` (bundled under the
+`xemacs21-bin` package on Debian). This is incompatible with other versions of
+`etags` such as the version provided by Exuberant Ctags.
+
+If you're using vim or emacs, you can also use Universal Ctags to build a tag
+file using the syntax:
+
+    ctags -R -D 'MOCK_IMPL(r,h,a)=r h a' .
+
+If you're using an older version of Universal Ctags, you can use the following
+instead:
+
+    ctags -R --mline-regex-c='/MOCK_IMPL\([^,]+,\W*([a-zA-Z0-9_]+)\W*,/\1/f/{mgroup=1}' .
+
+A vim-compatible tag file will be generated by default. If you use emacs, add
+the `-e` flag to generate an emacs-compatible tag file.
 
 Doxygen
 -------
diff --git a/doc/HACKING/HowToReview.md b/doc/HACKING/HowToReview.md
index 2d1f3d1c9e..2325e70175 100644
--- a/doc/HACKING/HowToReview.md
+++ b/doc/HACKING/HowToReview.md
@@ -63,7 +63,7 @@ Let's look at the code!
 Let's look at the documentation!
 --------------------------------
 
-- Does the documentation confirm to CodingStandards.txt?
+- Does the documentation conform to CodingStandards.txt?
 
 - Does it make sense?
 
diff --git a/doc/HACKING/Module.md b/doc/HACKING/Module.md
index 9cf36090b4..781bb978f2 100644
--- a/doc/HACKING/Module.md
+++ b/doc/HACKING/Module.md
@@ -8,13 +8,24 @@ module in Tor.
 In the context of the tor code base, a module is a subsystem that we can
 selectively enable or disable, at `configure` time.
 
-Currently, there is only one module:
+Currently, tor has these modules:
 
+  - Relay subsystem (relay)
+  - Directory cache system (dircache).
   - Directory Authority subsystem (dirauth)
 
-It is located in its own directory in `src/feature/dirauth/`. To disable it,
-one need to pass `--disable-module-dirauth` at configure time. All modules
-are currently enabled by default.
+The dirauth code is located in its own directory in `src/feature/dirauth/`.
+
+The relay code is located in a directory named `src/*/*relay`, which is
+being progressively refactored and disabled.
+
+The dircache code is located in `src/*/*dircache`.  Right now, it is
+disabled if and only if the relay module is disabled.  (We are treating
+them as separate modules because they are logically independent, not
+because you would actually want to run one without the other.)
+
+To disable a module, pass `--disable-module-{dirauth,relay}` at configure
+time. All modules are currently enabled by default.
 
 ## Build System ##
 
@@ -24,7 +35,7 @@ The changes to the build system are pretty straightforward.
    contains a list (white-space separated) of the module in tor. Add yours to
    the list.
 
-2. Use the `AC_ARG_ENABLE([module-dirauth]` template for your new module. We
+2. Use the `AC_ARG_ENABLE([module-relay]` template for your new module. We
    use the "disable module" approach instead of enabling them one by one. So,
    by default, tor will build all the modules.
 
@@ -32,7 +43,7 @@ The changes to the build system are pretty straightforward.
    the C code to conditionally compile things for your module. And the
    `BUILD_MODULE_<name>` is also defined for automake files (e.g: include.am).
 
-3. In the `src/core/include.am` file, locate the `MODULE_DIRAUTH_SOURCES`
+3. In the `src/core/include.am` file, locate the `MODULE_RELAY_SOURCES`
    value.  You need to create your own `_SOURCES` variable for your module
    and then conditionally add the it to `LIBTOR_A_SOURCES` if you should
    build the module.
@@ -40,18 +51,14 @@ The changes to the build system are pretty straightforward.
    It is then **very** important to add your SOURCES variable to
    `src_or_libtor_testing_a_SOURCES` so the tests can build it.
 
-4. Do the same for header files, locate `ORHEADERS +=` which always add all
-   headers of all modules so the symbol can be found for the module entry
-   points.
-
 Finally, your module will automatically be included in the
-`TOR_MODULES_ALL_ENABLED` variable which is used to build the unit tests. They
-always build everything in order to tests everything.
+`TOR_MODULES_ALL_ENABLED` variable which is used to build the unit tests.
+They always build everything in order to test everything.
 
 ## Coding ##
 
-As mentioned above, a module must be isolated in its own directory (name of
-the module) in `src/feature/`.
+As mentioned above, a module should be isolated in its own directories,
+suffixed with the name of the module, in `src/*/`.
 
 There are couples of "rules" you want to follow:
 
diff --git a/doc/HACKING/ReleasingTor.md b/doc/HACKING/ReleasingTor.md
index a97ca08ce9..0f453ca2aa 100644
--- a/doc/HACKING/ReleasingTor.md
+++ b/doc/HACKING/ReleasingTor.md
@@ -5,7 +5,7 @@ Putting out a new release
 Here are the steps that the maintainer should take when putting out a
 new Tor release:
 
-=== 0. Preliminaries
+### 0. Preliminaries
 
 1. Get at least two of weasel/arma/Sebastian to put the new
    version number in their approved versions list.  Give them a few
@@ -18,7 +18,7 @@ new Tor release:
    date of a TB that contains it.  See note below in "commit, upload,
    announce".
 
-=== I. Make sure it works
+### I. Make sure it works
 
 1. Make sure that CI passes: have a look at Travis
    (https://travis-ci.org/torproject/tor/branches), Appveyor
@@ -52,7 +52,7 @@ new Tor release:
       memory leaks.)
 
 
-=== II. Write a changelog
+### II. Write a changelog
 
 
 1a. (Alpha release variant)
@@ -139,7 +139,7 @@ new Tor release:
    text of existing entries, though.)
 
 
-=== III. Making the source release.
+### III. Making the source release.
 
 1. In `maint-0.?.x`, bump the version number in `configure.ac` and run
    `make update-versions` to update version numbers in other
@@ -165,7 +165,7 @@ new Tor release:
    If it is not, you'll need to poke Roger, Weasel, and Sebastian again: see
    item 0.1 at the start of this document.
 
-=== IV. Commit, upload, announce
+### IV. Commit, upload, announce
 
 1. Sign the tarball, then sign and push the git tag:
 
@@ -197,7 +197,7 @@ new Tor release:
 3. Email the packagers (cc'ing tor-team) that a new tarball is up.
    The current list of packagers is:
 
-       - {weasel,gk,mikeperry} at torproject dot org
+       - {weasel,sysrqb,mikeperry} at torproject dot org
        - {blueness} at gentoo dot org
        - {paul} at invizbox dot io
        - {vincent} at invizbox dot com
@@ -241,15 +241,17 @@ new Tor release:
    For templates to use when announcing, see:
        https://trac.torproject.org/projects/tor/wiki/org/teams/NetworkTeam/AnnouncementTemplates
 
-=== V. Aftermath and cleanup
+### V. Aftermath and cleanup
 
 1. If it's a stable release, bump the version number in the
     `maint-x.y.z` branch to "newversion-dev", and do a `merge -s ours`
     merge to avoid taking that change into master.
 
-2. Forward-port the ChangeLog (and ReleaseNotes if appropriate) to the
-   master branch.
-
-3. Keep an eye on the blog post, to moderate comments and answer questions.
+2. If there is a new `maint-x.y.z` branch, create a Travis CI cron job that
+   builds the release every week. (It's ok to skip the weekly build if the
+   branch was updated in the last 24 hours.)
 
+3. Forward-port the ChangeLog (and ReleaseNotes if appropriate) to the
+   master branch.
 
+4. Keep an eye on the blog post, to moderate comments and answer questions.