12 files changed, 2000 insertions, 189 deletions
diff --git a/doc/HACKING/CircuitPaddingDevelopment.md b/doc/HACKING/CircuitPaddingDevelopment.md
new file mode 100644
index 0000000000..a4e65697b8
--- /dev/null
+++ b/doc/HACKING/CircuitPaddingDevelopment.md
@@ -0,0 +1,1234 @@
+# Circuit Padding Developer Documentation
+
+This document is written for researchers who wish to prototype and evaluate circuit-level padding defenses in Tor.
+
+Written by Mike Perry and George Kadianakis.
+
+# Table of Contents
+
+- [0. Background](#0-background)
+- [1. Introduction](#1-introduction)
+    - [1.1. System Overview](#11-system-overview)
+    - [1.2. Layering Model](#12-layering-model)
+    - [1.3. Computation Model](#13-computation-model)
+    - [1.4. Deployment Constraints](#14-other-deployment-constraints)
+- [2. Creating New Padding Machines](#2-creating-new-padding-machines)
+    - [2.1. Registering a New Padding Machine](#21-registering-a-new-padding-machine)
+    - [2.2. Machine Activation and Shutdown](#22-machine-activation-and-shutdown)
+- [3. Specifying Padding Machines](#3-specifying-padding-machines)
+    - [3.1. Padding Machine States](#31-padding-machine-states)
+    - [3.2. Padding Machine State Transitions](#32-padding-machine-state-transitions)
+    - [3.3. Specifying Per-State Padding](#33-specifying-per-state-padding)
+    - [3.4. Specifying Precise Cell Counts](#34-specifying-precise-cell-counts)
+    - [3.5. Specifying Overhead Limits](#35-specifying-overhead-limits)
+- [4. Evaluating Padding Machines](#4-evaluating-padding-machines)
+    - [4.1. Pure Simulation](#41-pure-simulation)
+    - [4.2. Testing in Chutney](#42-testing-in-chutney)
+    - [4.3. Testing in Shadow](#43-testing-in-shadow)
+    - [4.4. Testing on the Live Network](#44-testing-on-the-live-network)
+- [5. Example Padding Machines](#5-example-padding-machines)
+    - [5.1. Deployed Circuit Setup Machines](#51-deployed-circuit-setup-machines)
+    - [5.2. Adaptive Padding Early](#52-adaptive-padding-early)
+    - [5.3. Sketch of Tamaraw](#53-sketch-of-tamaraw)
+    - [5.4. Other Padding Machines](#54-other-padding-machines)
+- [6. Framework Implementation Details](#6-framework-implementation-details)
+    - [6.1. Memory Allocation Conventions](#61-memory-allocation-conventions)
+    - [6.2. Machine Application Events](#62-machine-application-events)
+    - [6.3. Internal Machine Events](#63-internal-machine-events)
+- [7. Future Features and Optimizations](#7-future-features-and-optimizations)
+    - [7.1. Load Balancing and Flow Control](#71-load-balancing-and-flow-control)
+    - [7.2. Timing and Queuing Optimizations](#72-timing-and-queuing-optimizations)
+    - [7.3. Better Machine Negotiation](#73-better-machine-negotiation)
+    - [7.4. Probabilistic State Transitions](#74-probabilistic-state-transitions)
+    - [7.5. More Complex Pattern Recognition](#75-more-complex-pattern-recognition)
+- [8. Open Research Problems](#8-open-research-problems)
+    - [8.1. Onion Service Circuit Setup](#81-onion-service-circuit-setup)
+    - [8.2. Onion Service Fingerprinting](#82-onion-service-fingerprinting)
+    - [8.3. Open World Fingerprinting](#83-open-world-fingerprinting)
+    - [8.4. Protocol Usage Fingerprinting](#84-protocol-usage-fingerprinting)
+    - [8.5. Datagram Transport Side Channels](#85-datagram-transport-side-channels)
+- [9. Must Read Papers](#9-must-read-papers)
+
+
+## 0. Background
+
+Tor supports both connection-level and circuit-level padding, and both
+systems are live on the network today. The connection-level padding behavior
+is described in [section 2 of
+padding-spec.txt](https://github.com/torproject/torspec/blob/master/padding-spec.txt#L47). The
+circuit-level padding behavior is described in [section 3 of
+padding-spec.txt](https://github.com/torproject/torspec/blob/master/padding-spec.txt#L282).
+
+These two systems are orthogonal and should not be confused. The
+connection-level padding system is only active while the TLS connection is
+otherwise idle. Moreover, it regards circuit-level padding as normal data
+traffic, and hence while the circuit-level padding system is actively padding,
+the connection-level padding system will not add any additional overhead.
+
+While the currently deployed circuit-level padding behavior is quite simple,
+it is built on a flexible framework. This framework supports the description
+of event-driven finite state machine by filling in fields of a simple C
+structure, and is designed to support any delay-free statistically shaped
+cover traffic on individual circuits, with cover traffic flowing to and from a
+node of the implementor's choice (Guard, Middle, Exit, Rendezvous, etc).
+
+This class of system was first proposed in
+[Timing analysis in low-latency mix networks: attacks and defenses](https://www.freehaven.net/anonbib/cache/ShWa-Timing06.pdf)
+by Shmatikov and Wang, and extended for the website traffic fingerprinting
+domain by Juarez et al. in
+[Toward an Efficient Website Fingerprinting Defense](http://arxiv.org/pdf/1512.00524). The
+framework also supports fixed parameterized probability distributions, as
+used in [APE](https://www.cs.kau.se/pulls/hot/thebasketcase-ape/) by Tobias
+Pulls, and many other features.
+
+This document describes how to use Tor's circuit padding framework to
+implement and deploy novel delay-free cover traffic defenses.
+
+## 1. Introduction
+
+The circuit padding framework is the official way to implement padding
+defenses in Tor. It may be used in combination with application-layer
+defenses, and/or obfuscation defenses, or on its own.
+
+Its current design should be enough to deploy most defenses without
+modification, but you can extend it to [provide new
+features](#7-future-features-and-optimizations) as well.
+
+### 1.1. System Overview
+
+Circuit-level padding can occur between Tor clients and relays at any hop of
+one of the client's circuits. Both parties need to support the same padding
+mechanisms for the system to work, and the client must enable it.
+
+We added a padding negotiation relay cell to the Tor protocol that clients use
+to ask a relay to start padding, as well as a torrc directive for researchers
+to pin their clients' relay selection to the subset of Tor nodes that
+implement their custom defenses, to support ethical live network testing and
+evaluation.
+
+Circuit-level padding is performed by 'padding machines'. A padding machine is
+a finite state machine. Every state specifies a different form of
+padding style, or stage of padding, in terms of inter-packet timings and total
+packet counts.
+
+Padding state machines are specified by filling in fields of a C structure,
+which specifies the transitions between padding states based on various events,
+probability distributions of inter-packet delays, and the conditions under
+which padding machines should be applied to circuits.
+
+This compact C structure representation is designed to function as a
+microlanguage, which can be compiled down into a
+bitstring that [can be tuned](#13-computation-model) using various
+optimization methods (such as gradient descent, GAs, or GANs), either in
+bitstring form or C struct form.
+
+The event driven, self-contained nature of this framework is also designed to
+make [evaluation](#4-evaluating-padding-machines) both expedient and rigorously
+reproducible.
+
+This document covers the engineering steps to write, test, and deploy a
+padding machine, as well as how to extend the framework to support new machine
+features.
+
+If you prefer to learn by example, you may want to skip to either the
+[QuickStart Guide](CircuitPaddingQuickStart.md), and/or [Section
+5](#5-example-padding-machines) for example machines to get you up and running
+quickly.
+
+### 1.2. Layering Model
+
+The circuit padding framework is designed to provide one layer in a layered
+system of interchangeable components.
+
+The circuit padding framework operates at the Tor circuit layer. It only deals
+with the inter-cell timings and quantity of cells sent on a circuit. It can
+insert cells on a circuit in arbitrary patterns, and in response to arbitrary
+conditions, but it cannot delay cells. It also does not deal with packet
+sizes, how cells are packed into TLS records, or ways that the Tor protocol
+might be recognized on the wire.
+
+The problem of differentiating Tor traffic from non-Tor traffic based on
+TCP/TLS packet sizes, initial handshake patterns, and DPI characteristics is the
+domain of [pluggable
+transports](https://trac.torproject.org/projects/tor/wiki/doc/AChildsGardenOfPluggableTransports),
+which may optionally be used in conjunction with this framework (or without
+it).
+
+This document focuses primarily on the circuit padding framework's cover
+traffic features, and will only briefly touch on the potential obfuscation and
+application layer coupling points of the framework. Explicit layer coupling 
+points can be created by adding either new [machine application
+events](#62-machine-application-events) or new [internal machine
+events](#63-internal-machine-events) to the circuit padding framework, so that
+your padding machines can react to events from other layers.
+
+### 1.3. Computation Model
+
+The circuit padding framework is designed to support succinctly specified
+defenses that can be tuned through [computer-assisted
+optimization](#4-evaluating-padding-machines).
+
+We chose to generalize the original [Adaptive Padding 2-state
+design](https://www.freehaven.net/anonbib/cache/ShWa-Timing06.pdf) into an
+event-driven state machine because state machines are the simplest form of
+sequence recognition devices from [automata
+theory](https://en.wikipedia.org/wiki/Finite-state_machine).
+
+Most importantly: this framing allows cover traffic defenses to be modeled as
+an optimization problem search space, expressed as fields of a C structure
+(which is simultaneously a compact opaque bitstring as well as a symbolic
+vector in an abstract feature space). This kind of space is particularly well
+suited to search by gradient descent, GAs, and GANs. 
+
+When performing this optimization search, each padding machine should have a
+fitness function, which will allow two padding machines to be compared for
+relative effectiveness. Optimization searches work best if this fitness can be
+represented as a single number, for example the total amount by which it
+reduces the [Balanced
+Accuracy](https://en.wikipedia.org/wiki/Precision_and_recall#Imbalanced_Data)
+of an adversary's classifier, divided by an amount of traffic overhead. 
+
+Before you begin the optimization phase for your defense, you should
+also carefully consider the [features and
+optimizations](#7-future-features-and-optimizations) that we suspect will be
+useful, and also see if you can come up with any more. You should similarly be
+sure to restrict your search space to avoid areas of the bitstring/feature
+vector that you are sure you will not need. For example, some
+[applications](#8-open-research-problems) may not need the histogram
+accounting used by Adaptive Padding, but might need to add other forms of
+[pattern recognition](#75-more-complex-pattern-recognition) to react to
+sequences that resemble HTTP GET and HTTP POST.
+
+### 1.4. Other Deployment Constraints
+
+The framework has some limitations that are the result of deliberate
+choices. We are unlikely to deploy defenses that ignore these limitations.
+
+In particular, we have deliberately not provided any mechanism to delay actual
+user traffic, even though we are keenly aware that if we were to support
+additional delay, defenses would be able to have [more success with less
+bandwidth
+overhead](https://freedom.cs.purdue.edu/anonymity/trilemma/index.html).
+
+In the website traffic fingerprinting domain, [provably optimal
+defenses](https://www.cypherpunks.ca/~iang/pubs/webfingerprint-ccs14.pdf)
+achieve their bandwidth overhead bounds by ensuring that a non-empty queue is
+maintained, by rate limiting traffic below the actual throughput of a circuit.
+For optimal results, this queue must avoid draining to empty, and yet it
+must also be drained fast enough to avoid tremendous queue overhead in fast
+Tor relays, which carry hundreds of thousands of circuits simultaneously.
+
+Unfortunately, Tor's end-to-end flow control is not congestion control. Its
+window sizes are currently fixed. This means there is no signal when queuing
+occurs, and thus no ability to limit queue size through pushback. This means
+there is currently no way to do the fine-grained queue management necessary to
+create such a queue and rate limit traffic effectively enough to keep this
+queue from draining to empty, without also risking that aggregate queuing
+would cause out-of-memory conditions on fast relays.
+
+It may be possible to create a congestion control algorithm that can support
+such fine grained queue management, but this is a [deeply unsolved area of
+research](https://lists.torproject.org/pipermail/tor-dev/2018-November/013562.html).
+
+Even beyond these major technical hurdles, additional latency is also
+unappealing to the wider Internet community, for the simple reason that
+bandwidth [continues to increase
+exponentially](https://ipcarrier.blogspot.com/2014/02/bandwidth-growth-nearly-what-one-would.html)
+where as the speed of light is fixed. Significant engineering effort has been
+devoted to optimizations that reduce the effect of latency on Internet
+protocols. To go against this trend would ensure our irrelevance to the wider
+conversation about traffic analysis defenses for low latency Internet protocols.
+
+On the other hand, through [load
+balancing](https://gitweb.torproject.org/torspec.git/tree/proposals/265-load-balancing-with-overhead.txt)
+and [circuit multiplexing strategies](https://bugs.torproject.org/29494), we
+believe it is possible to add significant bandwidth overhead in the form of
+cover traffic, without significantly impacting end-user performance.
+
+For these reasons, we believe the trade-off should be in favor of adding more
+cover traffic, rather than imposing queuing memory overhead and queuing delay.
+
+As a last resort for narrowly scoped application domains (such as
+shaping Tor service-side onion service traffic to look like other websites or
+different application-layer protocols), delay *may* be added at the
+[application layer](https://petsymposium.org/2017/papers/issue2/paper54-2017-2-source.pdf).
+Any additional cover traffic required by such defenses should still be
+added at the circuit padding layer using this framework, to provide
+engineering efficiency through loose layer coupling and component re-use, as
+well as to provide additional gains against [low
+resolution](https://github.com/torproject/torspec/blob/master/padding-spec.txt#L47)
+end-to-end traffic correlation.
+
+Because such delay-based defenses will impact performance significantly more
+than simply adding cover traffic, they must be optional, and negotiated by
+only specific application layer endpoints that want them. This will have
+consequences for anonymity sets and base rates, if such traffic shaping and
+additional cover traffic is not very carefully constructed.
+
+In terms of acceptable overhead, because Tor onion services
+[currently use](https://metrics.torproject.org/hidserv-rend-relayed-cells.html)
+less than 1% of the
+[total consumed bandwidth](https://metrics.torproject.org/bandwidth-flags.html)
+of the Tor network, and because onion services exist to provide higher
+security as compared to Tor Exit traffic, they are an attractive target for
+higher-overhead defenses. We encourage researchers to target this use case
+for defenses that require more overhead, and/or for the deployment of
+optional negotiated application-layer delays on either the server or the
+client side.
+
+## 2. Creating New Padding Machines
+
+This section explains how to use the existing mechanisms in Tor to define a
+new circuit padding machine.  We assume here that you know C, and are at
+least somewhat familiar with Tor development.  For more information on Tor
+development in general, see the other files in doc/HACKING/ in a recent Tor
+distribution.
+
+Again, if you prefer to learn by example, you may want to skip to either the
+[QuickStart Guide](CircuitPaddingQuickStart.md), and/or [Section
+5](#5-example-padding-machines) for example machines to get up and running
+quickly.
+
+To create a new padding machine, you must:
+
+  1. Define your machine using the fields of a heap-allocated
+     `circpad_machine_spec_t` C structure.
+
+  2. Register this object in the global list of available padding machines,
+     using `circpad_register_padding_machine()`.
+
+  3. Ensure that your machine is properly negotiated under your desired
+     circuit conditions.
+
+### 2.1. Registering a New Padding Machine
+
+Again, a circuit padding machine is designed to be specified entirely as a [single
+C structure](#13-computation-model).
+
+Your machine definitions should go into their own functions in
+[circuitpadding_machines.c](https://github.com/torproject/tor/blob/master/src/core/or/circuitpadding_machines.c). For
+details on all of the fields involved in specifying a padding machine, see
+[Section 3](#3-specifying-padding-machines).
+
+You must register your machine in `circpad_machines_init()` in
+[circuitpadding.c](https://github.com/torproject/tor/blob/master/src/core/or/circuitpadding.c). To
+add a new padding machine specification, you must allocate a
+`circpad_machine_spec_t` on the heap with `tor_malloc_zero()`, give it a
+human readable name string, and a machine number equivalent to the number of
+machines in the list, and register the structure using
+`circpad_register_padding_machine()`.
+
+Each machine must have a client instance and a relay instance. Register your
+client-side machine instance in the `origin_padding_machines` list, and your
+relay side machine instance in the `relay_padding_machines` list. Once you
+have registered your instance, you do not need to worry about deallocation;
+this is handled for you automatically.
+
+Both machine lists use registration order to signal machine precedence for a
+given `machine_idx` slot on a circuit. This means that machines that are
+registered last are checked for activation *before* machines that are
+registered first. (This reverse precedence ordering allows us to
+deprecate older machines simply by adding new ones after them.)
+
+### 2.2. Machine Activation and Shutdown
+
+After a machine has been successfully registered with the framework, it will
+be instantiated on any client-side circuits that support it. Only client-side
+circuits may initiate padding machines, but either clients or relays may shut
+down padding machines.
+
+#### 2.2.1. Machine Application Conditions
+
+The
+[circpad_machine_conditions_t conditions field](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L641)
+of your `circpad_machine_spec_t` machine definition instance controls the
+conditions under which your machine will be attached and enabled on a Tor
+circuit, and when it gets shut down.
+
+*All* of your explicitly specified conditions in
+`circpad_machine_spec_t.conditions` *must* be met for the machine to be
+applied to a circuit. If *any* condition ceases to be met, then the machine
+is shut down.  (This is checked on every event that arrives, even if the
+condition is unrelated to the event.)
+Another way to look at this is that
+all specified conditions must evaluate to true for the entire duration that
+your machine is running. If any are false, your machine does not run (or
+stops running and shuts down).
+
+In particular, as part of the
+[circpad_machine_conditions_t structure](https://github.com/torproject/tor/blob/master/src/core/or/circuitpadding.h#L149),
+the circuit padding subsystem gives the developer the option to enable a
+machine based on:
+  - The
+    [length](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L157)
+    on the circuit (via the `min_hops` field).
+  - The
+    [current state](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L174)
+    of the circuit, such as streams, relay_early, etc. (via the
+    `circpad_circuit_state_t state_mask` field).
+  - The
+    [purpose](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L178)
+    (i.e. type) of the circuit (via the `circpad_purpose_mask_t purpose_mask`
+    field).
+
+This condition mechanism is the preferred way to determine if a machine should
+apply to a circuit. For information about potentially useful conditions that
+we have considered but have not yet implemented, see [Section
+7.3](#73-better-machine-negotiation). We will happily accept patches for those
+conditions, or any for other additional conditions that are needed for your
+use case.
+
+#### 2.2.2. Detecting and Negotiating Machine Support
+
+When a new machine specification is added to Tor (or removed from Tor), you
+should bump the Padding subprotocol version in `src/core/or/protover.c` and
+`src/rust/protover/protover.rs`, add a field to `protover_summary_flags_t` in
+`or.h`, and set this field in `memoize_protover_summary()` in versions.c. This
+new field must then be checked in `circpad_node_supports_padding()` in
+`circuitpadding.c`.
+
+Note that this protocol version update and associated support check is not
+necessary if your experiments will *only* be using your own relays that
+support your own padding machines. This can be accomplished by using the
+`MiddleNodes` directive; see [Section 4](#4-evaluating-padding-machines) for more information.
+
+If the protocol support check passes for the circuit, then the client sends a
+`RELAY_COMMAND_PADDING_NEGOTIATE` cell towards the
+`circpad_machine_spec_t.target_hop` relay, and immediately enables the
+padding machine, and may begin sending padding. (The framework does not wait
+for the `RELAY_COMMAND_PADDING_NEGOTIATED` response to begin padding,
+so that we can
+switch between machines rapidly.)
+
+#### 2.2.3. Machine Shutdown Mechanisms
+
+Padding machines can be shut down on a circuit in three main ways:
+  1. During a `circpad_machine_event` callback, when
+     `circpad_machine_spec_t.conditions` no longer applies (client side)
+  2. After a transition to the CIRCPAD_STATE_END, if
+     `circpad_machine_spec_t.should_negotiate_end` is set (client or relay
+     side)
+  3. If there is a `RELAY_COMMAND_PADDING_NEGOTIATED` error response from the
+     relay during negotiation.
+
+Each of these cases causes the originating node to send a relay cell towards
+the other side, indicating that shutdown has occurred. The client side sends
+`RELAY_COMMAND_PADDING_NEGOTIATE`, and the relay side sends
+`RELAY_COMMAND_PADDING_NEGOTIATED`.
+
+Because padding from malicious exit nodes can be used to construct active
+timing-based side channels to malicious guard nodes, the client checks that
+padding-related cells only come from relays with active padding machines.
+For this reason, when a client decides to shut down a padding machine,
+the framework frees the mutable `circuit_t.padding_info`, but leaves the
+`circuit_t.padding_machine` pointer set until the
+`RELAY_COMMAND_PADDING_NEGOTIATED` response comes back, to ensure that any
+remaining in-flight padding packets are recognized a valid. Tor does
+not yet close circuits due to violation of this property, but the
+[vanguards addon component "bandguard"](https://github.com/mikeperry-tor/vanguards/blob/master/README_TECHNICAL.md#the-bandguards-subsystem)
+does.
+
+As an optimization, a client may replace a machine with another, by
+sending a `RELAY_COMMAND_PADDING_NEGOTIATE` cell to shut down a machine, and
+immediately sending a `RELAY_COMMAND_PADDING_NEGOTIATE` to start a new machine
+in the same index, without waiting for the response from the first negotiate
+cell.
+
+Unfortunately, there is a known bug as a consequence of this optimization. If
+your machine depends on repeated shutdown and restart of the same machine
+number on the same circuit, please see [Bug
+30922](https://bugs.torproject.org/30992). Depending on your use case, we may
+need to fix that bug or help you find a workaround. See also [Section
+6.1.3](#613-deallocation-and-shutdown) for some more technical details on this
+mechanism.
+
+
+## 3. Specifying Padding Machines
+
+By now, you should understand how to register, negotiate, and control the
+lifetime of your padding machine, but you still don't know how to make it do
+anything yet. This section should help you understand how to specify how your
+machine reacts to events and adds padding to the wire.
+
+If you prefer to learn by example first instead, you may wish to skip to
+[Section 5](#5-example-padding-machines).
+
+
+A padding machine is specified by filling in an instance of
+[circpad_machine_spec_t](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L605). Instances
+of this structure specify the precise functionality of a machine: it's
+what the circuit padding developer is called to write. These instances
+are created only at startup, and are referenced via `const` pointers during
+normal operation.
+
+In this section we will go through the most important elements of this
+structure.
+
+### 3.1. Padding Machine States
+
+A padding machine is a finite state machine where each state
+specifies a different style of padding.
+
+As an example of a simple padding machine, you could have a state machine
+with the following states: `[START] -> [SETUP] -> [HTTP] -> [END]` where the
+`[SETUP]` state pads in a way that obfuscates the ''circuit setup'' of Tor,
+and the `[HTTP]` state pads in a way that emulates a simple HTTP session. Of
+course, padding machines can be more complicated than that, with dozens of
+states and non-trivial transitions.
+
+Padding developers encode the machine states in the
+[circpad_machine_spec_t structure](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L655). Each
+machine state is described by a
+[circpad_state_t structure](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L273)
+and each such structure specifies the style and amount of padding to be sent,
+as well as the possible state transitions.
+
+The function `circpad_machine_states_init()` must be used for allocating and
+initializing the `circpad_machine_spec_t.states` array before states and
+state transitions can be defined, as some of the state object has non-zero
+default values.
+
+### 3.2. Padding Machine State Transitions
+
+As described above, padding machines can have multiple states, to
+support different forms of padding. Machines can transition between
+states based on events that occur either on the circuit level or on
+the machine level.
+
+State transitions are specified using the
+[next_state field](https://github.com/torproject/tor/blob/master/src/core/or/circuitpadding.h#L381)
+of the `circpad_state_t` structure. As a simple example, to transition
+from state `A` to state `B` when event `E` occurs, you would use the
+following code: `A.next_state[E] = B`.
+
+#### 3.2.1. State Transition Events
+
+Here we will go through
+[the various events](https://github.com/torproject/tor/blob/master/src/core/or/circuitpadding.h#L30)
+that can be used to transition between states:
+
+* Circuit-level events
+  * `CIRCPAD_EVENT_NONPADDING_RECV`: A non-padding cell is received
+  * `CIRCPAD_EVENT_NONPADDING_SENT`: A non-adding cell is sent
+  * `CIRCPAD_EVENT_PADDING_SENT`: A padding cell is sent
+  * `CIRCPAD_EVENT_PADDING_RECV`: A padding cell is received
+* Machine-level events
+  * `CIRCPAD_EVENT_INFINITY`: Tried to schedule padding using the ''infinity bin''.
+  * `CIRCPAD_EVENT_BINS_EMPTY`: All histogram bins are empty (out of tokens)
+  * `CIRCPAD_EVENT_LENGTH_COUNT`: State has used all its padding capacity (see `length_dist` below)
+
+### 3.3. Specifying Per-State Padding
+
+Each state of a padding machine specifies either:
+  * A padding histogram describing inter-transmission delays between cells;
+d     OR
+  * A parameterized delay probability distribution for inter-transmission
+    delays between cells.
+
+Either mechanism specifies essentially the *minimum inter-transmission time*
+distribution. If non-padding traffic does not get transmitted from this
+endpoint before the delay value sampled from this distribution expires, a
+padding packet is sent.
+
+The choice between histograms and probability distributions can be subtle. A
+rule of thumb is that probability distributions are easy to specify and
+consume very little memory, but might not be able to describe certain types
+of complex padding logic. Histograms, in contrast, can support precise
+packet-count oriented or multimodal delay schemes, and can use token removal
+logic to reduce overhead and shape the total padding+non-padding inter-packet
+delay distribution towards an overall target distribution.
+
+We suggest that you start with a probability distribution if possible, and
+you move to a histogram-based approach only if a probability distribution
+does not suit your needs.
+
+#### 3.3.1. Padding Probability Distributions
+
+The easiest, most compact way to schedule padding using a machine state is to
+use a probability distribution that specifies the possible delays. That can
+be done
+[using the circpad_state_t fields](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L339)
+`iat_dist`, `dist_max_sample_usec` and `dist_added_shift_usec`.
+
+The Tor circuit padding framework
+[supports multiple types](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L214)
+of probability distributions, and the developer should use the
+[circpad_distribution_t structure](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L240)
+to specify them as well as the required parameters.
+
+#### 3.3.2. Padding Histograms
+
+A more advanced way to schedule padding is to use a ''padding
+histogram''. The main advantages of a histogram are that it allows you to
+specify distributions that are not easily parameterized in closed form, or
+require specific packet counts at particular time intervals. Histograms also
+allow you to make use of an optional traffic minimization and shaping
+optimization called *token removal*, which is central to the original
+[Adaptive Padding](https://www.freehaven.net/anonbib/cache/ShWa-Timing06.pdf)
+concept.
+
+If a histogram is used by a state (as opposed to a fixed parameterized
+distribution), then the developer must use the
+[histogram-related fields](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L285)
+of the `circpad_state_t` structure.
+
+The width of a histogram bin specifies the range of inter-packet delay times,
+whereas its height specifies the amount of tokens in each bin. To sample a
+padding delay from a histogram, we first randomly pick a bin (weighted by the
+amount of tokens in each bin) and then sample a delay from within that bin by
+picking a uniformly random delay using the width of the bin as the range.
+
+Each histogram also has an ''infinity bin'' as its final bin.  If the
+''infinity bin'' is chosen,
+we don't schedule any padding (i.e., we schedule padding with
+infinite delay). If the developer does not want infinite delay, they
+should not give any tokens to the ''infinity bin''.
+
+If a token removal strategy is specified (via the
+`circpad_state_t.token_removal` field), each time padding is sent using a
+histogram, the padding machine will remove a token from the appropriate
+histogram bin whenever this endpoint sends *either a padding packet or a
+non-padding packet*. The different removal strategies govern what to do when
+the bin corresponding to the current inter-packet delay is empty.
+
+Token removal is optional. It is useful if you want to do things like specify
+a burst should be at least N packets long, and you only want to add padding
+packets if there are not enough non-padding packets. The cost of doing token
+removal is additional memory allocations for making per-circuit copies of
+your histogram that can be modified.
+
+### 3.4. Specifying Precise Cell Counts
+
+Padding machines should be able to specify the exact amount of padding they
+send. For histogram-based machines this can be done using a specific amount
+of tokens, but another (and perhaps easier) way to do this, is to use the
+[length_dist field](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.h#L362)
+of the `circpad_state_t` structure.
+
+The `length_dist` field is basically a probability distribution similar to the
+padding probability distributions, which applies to a specific machine state
+and specifies the amount of padding we are willing to send during that state.
+This value gets sampled when we transition to that state (TODO document this
+in the code).
+
+### 3.5. Specifying Overhead Limits
+
+Separately from the length counts, it is possible to rate limit the overhead
+percentage of padding at both the global level across all machines, and on a
+per-machine basis.
+
+At the global level, the overhead percentage of all circuit padding machines
+as compared to total traffic can be limited through the Tor consensus
+parameter `circpad_global_max_padding_pct`. This overhead is defined as the
+percentage of padding cells *sent* out of the sum of non padding and padding
+cells *sent*, and is applied *only after* at least
+`circpad_global_allowed_cells` padding cells are sent by that relay or client
+(to allow for small bursts of pure padding on otherwise idle or freshly
+restarted relays). When both of these limits are hit by a relay or client, no
+further padding cells will be sent, until sufficient non-padding traffic is
+sent to cause the percentage of padding traffic to fall back below the
+threshold.
+
+Additionally, each individual padding machine can rate limit itself by
+filling in the fields `circpad_machine_spec_t.max_padding_percent` and
+`circpad_machine_spec_t.allowed_padding_count`, which behave identically to
+the consensus parameters, but only apply to that specific machine.
+
+## 4. Evaluating Padding Machines
+
+One of the goals of the circuit padding framework is to provide improved
+evaluation and scientific reproducibility for lower cost. This includes both
+the [choice](#13-computation-model) of the compact C structure representation
+(which has an easy-to-produce bitstring representation for optimization by
+gradient descent, GAs, or GANs), as well as rapid prototyping and evaluation.
+
+So far, whenever evaluation cost has been a barrier, each research group has
+developed their own ad-hoc packet-level simulators of various padding
+mechanisms for evaluating website fingerprinting attacks and defenses. The
+process typically involves doing a crawl of Alexa top sites over Tor, and
+recording the Tor cell count and timing information for each page in the
+trace. These traces are then fed to simulations of defenses, which output
+modified trace files.
+
+Because no standardized simulation and evaluation mechanism exists, it is
+often hard to tell if independent implementations of various attacks and
+defenses are in fact true-to-form or even properly calibrated for direct
+comparison, and discrepancies in results across the literature suggests
+this is not always so.
+
+Our preferred outcome with this framework is that machines are tuned
+and optimized on a tracing simulator, but that the final results come from
+an actual live network test of the defense. The traces from this final crawl
+should be preserved as artifacts to be run on the simulator and reproduced
+on the live network by future papers, ideally in journal venues that have an
+artifact preservation policy.
+
+### 4.1. Pure Simulation
+
+When doing initial tuning of padding machines, especially in adversarial
+settings, variations of a padding machine defense may need to be applied to
+network activity hundreds or even millions of times. The wall-clock time
+required to do this kind of tuning using live testing or even Shadow network
+emulation may often be prohibitive.
+
+To help address this, and to better standardize results, Tobias Pulls has
+implemented a [circpad machine trace simulator](https://github.com/pylls/circpad-sim),
+which uses Tor's unit test framework to simulate applying padding machines to
+circuit packet traces via a combination of Tor patches and python scripts. This
+simulator can be used to record traces from clients, Guards, Middles, Exits,
+and any other hop in the path, only for circuits that are run by the
+researcher. This makes it possible to safely record baseline traces and
+ultimately even mount passive attacks on the live network, without impacting
+or recording any normal user traffic.
+
+In this way, a live crawl of the Alexa top sites could be performed once, to
+produce a standard "undefended" corpus. Padding machines can be then quickly
+evaluated and tuned on these simulated traces in a standardized way, and then
+the results can then be [reproduced on the live Tor
+network](#44-Testing-on-the-Live-Network) with the machines running on your own relays.
+
+Please be mindful of the Limitations section of the simulator documentation,
+however, to ensure that you are aware of the edge cases and timing
+approximations that are introduced by this approach.
+
+### 4.2. Testing in Chutney
+
+The Tor Project provides a tool called
+[Chutney](https://github.com/torproject/chutney/) which makes it very easy to
+setup private Tor networks. While getting it work for the first time might
+take you some time of doc reading, the final result is well worth it for the
+following reasons:
+
+- You control all the relays and hence you have greater control and debugging
+  capabilities.
+- You control all the relays and hence you can toggle padding support on/off
+  at will.
+- You don't need to be cautious about overhead or damaging the real Tor
+  network during testing.
+- You don't even need to be online; you can do all your testing offline over
+  localhost.
+
+A final word of warning here is that since Chutney runs over localhost, the
+packet latencies and delays are completely different from the real Tor
+network, so if your padding machines rely on real network timings you will
+get different results on Chutney. You can work around this by using a
+different set of delays if Chutney is used, or by moving your padding
+machines to the real network when you want to do latency-related testing.
+
+### 4.3. Testing in Shadow
+
+[Shadow](https://shadow.github.io/) is an environment for running entire Tor
+network simulations, similar to Chutney, but designed to be both more memory
+efficient, as well as provide an accurate Tor network topology and latency
+model.
+
+While Shadow is significantly more memory efficient than Chutney, and can make
+use of extremely accurate Tor network capacity and latency models, it will not
+be as fast or efficient as the [circpad trace simulator](https://github.com/pylls/circpad-sim),
+if you need to do many many iterations of an experiment to tune your defense.
+
+### 4.4. Testing on the Live Network
+
+Live network testing is the gold standard for verifying that any attack or
+defense is behaving as expected, to minimize the influence of simplifying
+assumptions.
+
+However, it is not ethical, or necessarily possible, to run high-resolution
+traffic analysis attacks on the entire Tor network. But it is both ethical
+and possible to run small scale experiments that target only your own
+clients, who will only use your own Tor relays that support your new padding
+machines.
+
+We provide the `MiddleNodes` torrc directive to enable this, which will allow
+you to specify the fingerprints and/or IP netmasks of relays to be used in
+the second hop position. Options to restrict other hops also exist, if your
+padding system is padding to a different hop. The `HSLayer2Nodes` option
+overrides the `MiddleNodes` option for onion service circuits, if both are
+set. (The
+[vanguards addon](https://github.com/mikeperry-tor/vanguards/README_TECHNICAL.md)
+will set `HSLayer2Nodes`.)
+
+When you run your own clients, and use MiddleNodes to restrict your clients
+to use your relays, you can perform live network evaluations of a defense
+applied to whatever traffic crawl or activity your clients do.
+
+## 5. Example Padding Machines
+
+### 5.1. Deployed Circuit Setup Machines
+
+Tor currently has two padding machines enabled by default, which aim to hide
+certain features of the client-side onion service circuit setup protocol. For
+more details on their precise goal and function, please see
+[proposal 302](https://github.com/torproject/torspec/blob/master/proposals/302-padding-machines-for-onion-clients.txt)
+. In this section we will go over the code of those machines to clarify some
+of the engineering parts:
+
+#### 5.1.1. Overview
+
+The codebase of proposal 302 can be found in
+[circuitpadding_machines.c](https://github.com/torproject/tor/blob/master/src/core/or/circuitpadding_machines.c)
+and specifies four padding machines:
+
+- The [client-side introduction](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L60) circuit machine.
+- The [relay-side introduction](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L146) circuit machine.
+- The [client-side rendezvous](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L257) circuit machine
+- The [relay-side rendezvous](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L374) circuit machine.
+
+Each of those machines has its own setup function, and
+[they are all initialized](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding.c#L2718)
+by the circuit padding framework. To understand more about them, please
+carefully read the individual setup function for each machine which are
+fairly well documented. Each function goes through the following steps:
+- Machine initialization
+  - Give it a [name](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L70)
+  - Specify [which hop](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L73) the padding should go to
+  - Specify whether it should be [client-side](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L75) or relay-side.
+- Specify for [which types of circuits](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L78) the machine should apply
+- Specify whether the circuits should be [kept alive](https://github.com/torproject/tor/blob/master/src/core/or/circuitpadding_machines.c#L112) until the machine finishes padding.
+- Sets [padding limits](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L116) to avoid too much overhead in case of bugs or errors.
+- Setup [machine states](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L120)
+   - Specify [state transitions](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L123).
+- Finally [register the machine](https://github.com/torproject/tor/blob/35e978da61efa04af9a5ab2399dff863bc6fb20a/src/core/or/circuitpadding_machines.c#L137) to the global machine list
+
+### 5.2. Adaptive Padding Early
+
+[Adaptive Padding Early](https://www.cs.kau.se/pulls/hot/thebasketcase-ape/)
+is a variant of Adaptive Padding/WTF-PAD that does not use histograms or token
+removal to shift padding distributions, but instead uses fixed parameterized
+distributions to specify inter-packet timing thresholds for burst and gap
+inter-packet delays.
+
+Tobias Pulls's [QuickStart Guide](CircuitPaddingQuickStart.md) describes how
+to get this machine up and running, and has links to branches with a working
+implementation.
+
+### 5.3. Sketch of Tamaraw
+
+The [Tamaraw defense
+paper](https://www.cypherpunks.ca/~iang/pubs/webfingerprint-ccs14.pdf) is the
+only defense to date that provides a proof of optimality for the finite-length
+website traffic fingerprinting domain. These bounds assume that a defense is
+able to perform a full, arbitrary transform of a trace that is under a fixed
+number of packets in length.
+
+The key insight to understand Tamaraw's optimality is that it achieves one
+such optimal transform by delaying traffic below a circuit's throughput. By
+doing this, it creates a queue that is rarely empty, allowing it to produce
+a provably optimal transform with minimal overhead. As [Section
+1.4](#14-other-deployment-constraints) explains, this queue
+cannot be maintained on the live Tor network without risk of out-of-memory
+conditions at relays.
+
+However, if the queue is not maintained in the Tor network, but instead by the
+application layer, it could be deployed by websites that opt in to using it.
+
+In this case, the application layer component would do *optional* constant
+rate shaping, negotiated between a web browser and a website. The Circuit
+Padding Framework can then easily fill in any missing gaps of cover traffic
+packets, and also ensure that only a fixed length number of packets are sent
+in total.
+
+However, for such a defense to be safe, additional care must be taken to
+ensure that the resulting traffic pattern still has a large
+anonymity/confusion set with other traces on the live network.
+
+Accomplishing this is an unsolved problem.
+
+### 5.4. Other Padding Machines
+
+Our partners in this project at RIT have produced a couple prototypes, based
+on their published research designs
+[REB and RBB](https://www.researchgate.net/publication/329743510_UNDERSTANDING_FEATURE_DISCOVERY_IN_WEBSITE_FINGERPRINTING_ATTACKS).
+
+As [their writeup
+explains](https://github.com/notem/tor-rbp-padding-machine-doc), because RBB
+uses delay, the circuit padding machine that they made is a no-delay version.
+
+They also ran into an issue with the 0-delay timing workaround for [bug
+31653](https://bugs.torproject.org/31653). Keep an eye on that bug for updates
+with improved workarounds/fixes.
+
+Their code is [available on github](https://github.com/notem/tor/tree/circuit_padding_rbp_machine).
+
+
+## 6. Framework Implementation Details
+
+If you need to add additional events, conditions, or other features to the
+circuit padding framework, then this section is for you.
+
+### 6.1. Memory Allocation Conventions
+
+If the existing circuit padding features are sufficient for your needs, then
+you do not need to worry about memory management or pointer lifespans. The
+circuit padding framework should take care of this for you automatically.
+
+However, if you need to add new padding machine features to support your
+padding machines, then it will be helpful to understand how circuits
+correspond to the global machine definitions, and how mutable padding machine
+memory is managed.
+
+#### 6.1.1. Circuits and Padding Machines
+
+In Tor, the
+[circuit_t structure](https://github.com/torproject/tor/blob/master/src/core/or/circuit_st.h)
+is the superclass structure for circuit-related state that is used on both
+clients and relays. On clients, the actual datatype of the object pointed to
+by `circuit_t *` is the subclass structure
+[origin_circuit_t](https://github.com/torproject/tor/blob/master/src/core/or/origin_circuit_st.h). The
+macros `CIRCUIT_IS_ORIGIN()` and `TO_ORIGIN_CIRCUIT()` are used to determine
+if a circuit is a client-side (origin) circuit and to cast the pointer safely
+to `origin_circuit_t *`.
+
+Because circuit padding machines can be present at both clients and relays,
+the circuit padding fields are stored in the `circuit_t *` superclass
+structure. Notice that there are actually two sets of circuit padding fields:
+a `const circpad_machine_spec_t *` array, and a `circpad_machine_runtime_t *`
+array. Each of these arrays holds at most two elements, as there can be at
+most two padding machines on each circuit.
+
+The `const circpad_machine_spec_t *` points to a globally allocated machine
+specification. These machine specifications are
+allocated and set up during Tor program startup, in `circpad_machines_init()`
+in
+[circuitpadding.c](https://github.com/torproject/tor/blob/master/src/core/or/circuitpadding.c). Because
+the machine specification object is shared by all circuits, it must not be
+modified or freed until program exit (by `circpad_machines_free()`). The
+`const` qualifier should enforce this at compile time.
+
+The `circpad_machine_runtime_t *` array member points to the mutable runtime
+information for machine specification at that same array index. This runtime
+structure keeps track of the current machine state, packet counts, and other
+information that must be updated as the machine operates. When a padding
+machine is successfully negotiated `circpad_setup_machine_on_circ()` allocates
+the associated runtime information.
+
+#### 6.1.2. Histogram Management
+
+If a `circpad_state_t` of a machine specifies a `token_removal` strategy
+other than `CIRCPAD_TOKEN_REMOVAL_NONE`, then every time
+there is a state transition into this state, `circpad_machine_setup_tokens()`
+will copy the read-only `circpad_state_t.histogram` array into a mutable
+version at `circpad_machine_runtime_t.histogram`. This mutable copy is used
+to decrement the histogram bin accounts as packets are sent, as per the
+specified token removal strategy.
+
+When the state machine transitions out of this state, the mutable histogram copy is freed
+by this same `circpad_machine_setup_tokens()` function.
+
+#### 6.1.3. Deallocation and Shutdown
+
+As an optimization, padding machines can be swapped in and out by the client
+without waiting a full round trip for the relay machine to shut down.
+
+Internally, this is accomplished by immediately freeing the heap-allocated
+`circuit_t.padding_info` field corresponding to that machine, but still preserving the
+`circuit_t.padding_machine` pointer to the global padding machine
+specification until the response comes back from the relay. Once the response
+comes back, that `circuit_t.padding_machine` pointer is set to NULL, if the
+response machine number matches the current machine present.
+
+Because of this partial shutdown condition, we have two macros for iterating
+over machines. `FOR_EACH_ACTIVE_CIRCUIT_MACHINE_BEGIN()` is used to iterate
+over machines that have both a `circuit_t.padding_info` slot and a
+`circuit_t.padding_machine` slot occupied. `FOR_EACH_CIRCUIT_MACHINE_BEGIN()`
+is used when we need to iterate over all machines that are either active or
+are simply waiting for a response to a shutdown request.
+
+If the machine is replaced instead of just shut down, then the client frees
+the `circuit_t.padding_info`, and then sets the `circuit_t.padding_machine`
+and `circuit_t.padding_info` fields for this next machine immediately. This is
+done in `circpad_add_matching_machines()`. In this case, since the new machine
+should have a different machine number, the shut down response from the relay
+is silently discarded, since it will not match the new machine number.
+
+If this sequence of machine teardown and spin-up happens rapidly enough for
+the same machine number (as opposed to different machines), then a race
+condition can happen. This is
+[known bug #30992](https://bugs.torproject.org/30992).
+
+When the relay side decides to shut down a machine, it sends a
+`RELAY_COMMAND_PADDING_NEGOTIATED` towards the client. If this cell matches the
+current machine number on the client, that machine is torn down, by freeing
+the `circuit_t.padding_info` slot and immediately setting
+`circuit_t.padding_machine` slot to NULL.
+
+Additionally, if Tor decides to close a circuit forcibly due to error before
+the padding machine is shut down, then `circuit_t.padding_info` is still
+properly freed by the call to `circpad_circuit_free_all_machineinfos()`
+in `circuit_free_()`.
+
+### 6.2. Machine Application Events
+
+The framework checks client-side origin circuits to see if padding machines
+should be activated or terminated during specific event callbacks in
+`circuitpadding.c`. We list these event callbacks here only for reference. You
+should not modify any of these callbacks to get your machine to run; instead,
+you should use the `circpad_machine_spec_t.conditions` field.
+
+However, you may add new event callbacks if you need other activation events,
+for example to provide obfuscation-layer or application-layer signaling. Any
+new event callbacks should behave exactly like the existing callbacks.
+
+During each of these event callbacks, the framework checks to see if any
+current running padding machines have conditions that no longer apply as a
+result of the event, and shuts those machines down. Then, it checks to see if
+any new padding machines should be activated as a result of the event, based
+on their circuit application conditions. **Remember: Machines are checked in
+reverse order in the machine list. This means that later, more recently added
+machines take precedence over older, earlier entries in each list.**
+
+Both of these checks are performed using the machine application conditions
+that you specify in your machine's `circpad_machine_spec_t.conditions` field.
+
+The machine application event callbacks are prefixed by `circpad_machine_event_` by convention in circuitpadding.c. As of this writing, these callbacks are:
+
+  - `circpad_machine_event_circ_added_hop()`: Called whenever a new hop is
+    added to a circuit.
+  - `circpad_machine_event_circ_built()`: Called when a circuit has completed
+    construction and is
+    opened. <!-- open != ready for traffic. Which do we mean? -nickm -->
+  - `circpad_machine_event_circ_purpose_changed()`: Called when a circuit
+    changes purpose.
+  - `circpad_machine_event_circ_has_no_relay_early()`: Called when a circuit
+    runs out of `RELAY_EARLY` cells.
+  - `circpad_machine_event_circ_has_streams()`: Called when a circuit gets a
+    stream attached.
+  - `circpad_machine_event_circ_has_no_streams()`: Called when the last
+    stream is detached from a circuit.
+
+### 6.3. Internal Machine Events
+
+To provide for some additional capabilities beyond simple finite state machine
+behavior, the circuit padding machines also have internal events that they
+emit to themselves when packet count length limits are hit, when the Infinity
+bin is sampled, and when the histogram bins are emptied of all tokens.
+
+These events are implemented as `circpad_internal_event_*` functions in
+`circuitpadding.c`, which are called from various areas that determine when
+the events should occur.
+
+While the conditions that trigger these internal events to be called may be
+complex, they are processed by the state machine definitions in a nearly
+identical manner as the cell processing events, with the exception that they
+are sent to the current machine only, rather than all machines on the circuit.
+
+
+## 7. Future Features and Optimizations
+
+While implementing the circuit padding framework, our goal was to deploy a
+system that obscured client-side onion service circuit setup and supported
+deployment of WTF-PAD and/or APE. Along the way, we noticed several features
+that might prove useful to others, but were not essential to implement
+immediately. We do not have immediate plans to implement these ideas, but we
+would gladly accept patches that do so.
+
+The following list gives an overview of these improvements, but as this
+document ages, it may become stale. The canonical list of improvements that
+researchers may find useful is tagged in our bugtracker with
+[circpad-researchers](https://trac.torproject.org/projects/tor/query?keywords=~circpad-researchers),
+and the list of improvements that are known to be necessary for some research
+areas are tagged with
+[circpad-researchers-want](https://trac.torproject.org/projects/tor/query?keywords=~circpad-researchers-want).
+
+Please consult those lists for the latest status of these issues. Note that
+not all fixes will be backported to all Tor versions, so be mindful of which
+Tor releases receive which fixes as you conduct your experiments.
+
+### 7.1. Load Balancing and Flow Control
+
+Fortunately, non-Exit bandwidth is already plentiful and exceeds the Exit
+capacity, and we anticipate that if we inform our relay operator community of
+the need for non-Exit bandwidth to satisfy padding overhead requirements,
+they will be able to provide that with relative ease.
+
+Unfortunately, padding machines that have large quantities of overhead will
+require changes to our load balancing system to account for this
+overhead. The necessary changes are documented in
+[Proposal 265](https://gitweb.torproject.org/torspec.git/tree/proposals/265-load-balancing-with-overhead.txt).
+
+Additionally, padding cells are not currently subjected to flow control. For
+high amounts of padding, we may want to change this. See [ticket
+31782](https://bugs.torproject.org/31782) for details.
+
+### 7.2. Timing and Queuing Optimizations
+
+The circuitpadding framework has some timing related issues that may impact
+results. If high-resolution timestamps are fed to opaque deep learning
+trainers, those training models may end up able to differentiate padding
+traffic from non-padding traffic due to these timing bugs.
+
+The circuit padding cell event callbacks come from post-decryption points in
+the cell processing codepath, and from the pre-queue points in the cell send
+codepath. This means that our cell events will not reflect the actual time
+when packets are read or sent on the wire. This is much worse in the send
+direction, as the circuitmux queue, channel outbuf, and kernel TCP buffer will
+impose significant additional delay between when we currently report that a
+packet was sent, and when it actually hits the wire.
+
+[Ticket 29494](https://bugs.torproject.org/29494) has a more detailed
+description of this problem, and an experimental branch that changes the cell
+event callback locations to be from circuitmux post-queue, which with KIST,
+should be an accurate reflection of when they are actually sent on the wire.
+
+If your padding machine and problem space depends on very accurate notions of
+relay-side packet timing, please try that branch and let us know on the
+ticket if you need any further assistance fixing it up.
+
+Additionally, with these changes, it will be possible to provide further
+overhead reducing optimizations by letting machines specify flags to indicate
+that padding should not be sent if there are any cells pending in the cell
+queue, for doing things like extending cell bursts more accurately and with
+less overhead.
+
+However, even if we solve the queuing issues, Tor's current timers are not as
+precise as some padding designs may require. We will still have issues of
+timing precision to solve. [Ticket 31653](https://bugs.torproject.org/31653)
+describes an issue the circuit padding system has with sending 0-delay padding
+cells, and [ticket 32670](https://bugs.torproject.org/32670) describes a
+libevent timer accuracy issue, which causes callbacks to vary up to 10ms from
+their scheduled time, even in absence of load.
+
+All of these issues strongly suggest that you either truncate the resolution
+of any timers you feed to your classifier, or that you omit timestamps
+entirely from the classification problem until these issues are addressed.
+
+### 7.3. Better Machine Negotiation
+
+Circuit padding is applied to circuits through machine conditions.
+
+The following machine conditions may be useful for some use cases, but have
+not been implemented yet:
+  * [Exit Policy-based Stream Conditions](https://bugs.torproject.org/29083)
+  * [Probability to apply machine/Cointoss condition](https://bugs.torproject.org/30092)
+  * [Probability distributions for launching new padding circuit(s)](https://bugs.torproject.org/31783)
+  * [More flexible purpose handling](https://bugs.torproject.org/32040)
+
+Additionally, the following features may help to obscure that padding is being
+negotiated, and/or streamline that negotiation process:
+  * [Always send negotiation cell on all circuits](https://bugs.torproject.org/30172)
+  * [Better shutdown handling](https://bugs.torproject.org/30992)
+  * [Preference-ordered negotiation menu](https://bugs.torproject.org/30348)
+
+### 7.4. Probabilistic State Transitions
+
+Right now, the state machine transitions are fully deterministic. However,
+one could imagine a state machine that uses probabilistic transitions between
+states to simulate a random walk or Hidden Markov Model traversal across
+several pages.
+
+The simplest way to implement this is to make  the `circpad_state_t.next_state` array
+into an array of structs that have a next state field, and a probability to
+transition to that state.
+
+If you need this feature, please see [ticket
+31787](https://bugs.torproject.org/31787) for more details.
+
+### 7.5. More Complex Pattern Recognition
+
+State machines are extremely efficient sequence recognition devices. But they
+are not great pattern recognition devices. This is one of the reasons why
+[Adaptive Padding](https://www.freehaven.net/anonbib/cache/ShWa-Timing06.pdf)
+used state machines in combination with histograms, to model the target
+distribution of interpacket delays for transmitted packets.
+
+However, there currently is no such optimization for reaction to patterns of
+*received* traffic. There may also be cases where defenses must react to more
+complex patterns of sent traffic than can be expressed by our current
+histogram and length count events.
+
+For example: if you wish your machine to react to a certain count of incoming
+cells in a row, right now you have to have a state for each cell, and use the
+infinity bin to timeout of the sequence in each state. We could make this more
+compact if each state had an arrival cell counter and inter-cell timeout. Or
+we could make more complex mechanisms to recognize certain patterns of arrival
+traffic in a state.
+
+The best way to build recognition primitives like this into the framework is
+to add additional [Internal Machine Events](#63-internal-machine-events) for
+the pattern in question.
+
+As another simple example, a useful additional event might be to transition
+whenever any of your histogram bins are empty, rather than all of them. To do
+this, you would add `CIRCPAD_EVENT_ANY_BIN_EMPTY` to the enum
+`circpad_event_t` in `circuitpadding.h`. You would then create a function
+`circuitpadding_internal_event_any_bin_empty()`, which would work just like
+`circuitpadding_internal_event_bin_empty()`, and also be called from
+`check_machine_token_supply()` in `circuitpadding.c` but with the check for
+each bin being zero instead of the total. With this change, new machines could
+react to this new event in the same way as any other.
+
+If you have any other ideas that may be useful, please comment on [ticket
+32680](https://bugs.torproject.org/32680).
+
+
+## 8. Open Research Problems
+
+### 8.1. Onion Service Circuit Setup
+
+Our circuit setup padding does not address timing-based features, only
+packet counts. Deep learning can probably see this.
+
+However, before going too deep down the timing rabithole, we may need to make
+[some improvements to Tor](#72-timing-and-queuing-optimizations). Please
+comment on those tickets if you need this.
+
+### 8.2. Onion Service Fingerprinting
+
+We have done nothing to obscure the service side of onion service circuit
+setup. Because service-side onion services will have the reverse traffic byte
+counts as normal clients, they will likely need some kind of [hybrid
+application layer traffic shaping](#53-sketch-of-tamaraw), in addition to
+simple circuit setup obfuscation.
+
+Fingerprinting in
+[combination](https://github.com/mikeperry-tor/vanguards/blob/master/README_SECURITY.md)
+with
+[vanguards](https://github.com/mikeperry-tor/vanguards/blob/master/README_TECHNICAL.md)
+ia also an open issue.
+
+### 8.3. Open World Fingerprinting
+
+Similarly, Open World/clearweb website fingerprinting defenses remain
+an unsolved problem from the practicality point of view. The original WTF-PAD
+defense was never tuned, and it is showing accuracy issues against deep
+learning attacks.
+
+### 8.4. Protocol Usage Fingerprinting
+
+Traffic Fingerprinting to determine the protocol in use by a client has not
+been studied, either from the attack or defense point of view.
+
+### 8.5. Datagram Transport Side Channels
+
+Padding can reduce the accuracy of dropped-cell side channels in such
+transports, but we don't know [how to measure
+this](https://lists.torproject.org/pipermail/tor-dev/2018-November/013562.html).
+
+## 9. Must Read Papers
+
+These are by far the most important papers in the space, to date:
+
+ - [Tamaraw](https://www.cypherpunks.ca/~iang/pubs/webfingerprint-ccs14.pdf)
+ - [Bayes, Not Naive](https://www.petsymposium.org/2017/papers/issue4/paper50-2017-4-source.pdf)
+ - [Anonymity Trilemma](https://eprint.iacr.org/2017/954.pdf)
+ - [WTF-PAD](http://arxiv.org/pdf/1512.00524)
+
+Except for WTF-PAD, these papers were selected because they point towards
+optimality bounds that can be benchmarked against.
+
+We cite them even though we are skeptical that provably optimal defenses can
+be constructed, at least not without trivial or impractical transforms (such as
+those that can be created with unbounded queue capacity, or stored knowledge
+of traces for every possible HTTP trace on the Internet).
+
+We also are not demanding an optimality or security proof for every defense.
+
+Instead, we cite the above as benchmarks. We believe the space, especially the
+open-world case, to be more akin to an optimization problem, where a
+WTF-PAD-like defense must be tuned through an optimizer to produce results
+comparable to provably optimal but practically unrealizable defenses, through
+rigorous adversarial evaluation.
+
+## A. Acknowledgments
+
+This research was supported in part by NSF grants CNS-1619454 and CNS-1526306.
diff --git a/doc/HACKING/CircuitPaddingQuickStart.md b/doc/HACKING/CircuitPaddingQuickStart.md
new file mode 100644
index 0000000000..167ff9f292
--- /dev/null
+++ b/doc/HACKING/CircuitPaddingQuickStart.md
@@ -0,0 +1,263 @@
+# A Padding Machine from Scratch
+
+A quickstart guide by Tobias Pulls.
+
+This document describes the process of building a "padding machine" in tor's new
+circuit padding framework from scratch. Notes were taken as part of porting
+[Adaptive Padding Early
+(APE)](https://www.cs.kau.se/pulls/hot/thebasketcase-ape/) from basket2 to the
+circuit padding framework. The goal is just to document the process and provide
+useful pointers along the way, not create a useful machine. 
+
+The quick and dirty plan is to:
+1. clone and compile tor
+2. use newly built tor in TB and at small (non-exit) relay we run
+3. add a bare-bones APE padding machine
+4. run the machine, inspect logs for activity
+5. port APE's state machine without thinking much about parameters
+
+## Clone and compile tor
+
+```bash
+git clone https://git.torproject.org/tor.git
+cd tor
+git checkout tor-0.4.1.5
+```
+Above we use the tag for tor-0.4.1.5 where the circuit padding framework was
+released. Note that this version of the framework is missing many features and
+fixes that have since been merged to origin/master. If you need the newest
+framework features, you should use that master instead.
+
+```bash
+sh autogen.sh 
+./configure
+make
+```
+When you run `./configure` you'll be told of missing dependencies and packages
+to install on debian-based distributions. Important: if you plan to run `tor` on
+a relay as part of the real Tor network and your server runs a distribution that
+uses systemd, then I'd recommend that you `apt install dpkg dpkg-dev
+libevent-dev libssl-dev asciidoc quilt dh-apparmor libseccomp-dev dh-systemd
+libsystemd-dev pkg-config dh-autoreconf libfakeroot zlib1g zlib1g-dev automake
+liblzma-dev libzstd-dev` and ensure that tor has systemd support enabled:
+`./configure --enable-systemd`. Without this, on a recent Ubuntu, my tor service
+was forcefully restarted (SIGINT interrupt) by systemd every five minutes.
+
+If you want to install on your localsystem, run `make install`. For our case we
+just want the tor binary at `src/app/tor`.
+
+## Use tor in TB and at a relay
+Download and install a fresh Tor Browser (TB) from torproject.org. Make sure it
+works. From the command line, relative to the folder created when you extracted
+TB, run `./Browser/start-tor-browser --verbose` to get some basic log output.
+Note the version of tor, in my case, `Tor 0.4.0.5 (git-bf071e34aa26e096)` as
+part of TB 8.5.4. Shut down TB, copy the `tor` binary that you compiled earlier
+and replace `Browser/TorBrowser/Tor/tor`. Start TB from the command line again,
+you should see a different version, in my case `Tor 0.4.1.5
+(git-439ca48989ece545)`.
+
+The relay we run is also on linux, and `tor` is located at `/usr/bin/tor`. To
+view relevant logs since last boot `sudo journalctl -b /usr/bin/tor`, where we
+find `Tor 0.4.0.5 running on Linux`. Copy the locally compiled `tor` to the
+relay at a temporary location and then make sure it's ownership and access
+rights are identical to `/usr/bin/tor`. Next, shut down the running tor service
+with `sudo service tor stop`, wait for it to stop (typically 30s), copy our
+locally compiled tor to replace `/usr/bin/tor` then start the service again.
+Checking the logs we see `or 0.4.1.5 (git-439ca48989ece545)`.
+
+Repeatedly shutting down a relay is detrimental to the network and should be
+avoided. Sorry about that.
+
+We have one more step left before we move on the machine: configure TB to always
+use our middle relay. Edit `Browser/TorBrowser/Data/Tor/torrc` and set
+`MiddleNodes <fingerprint>`, where `<fingerprint>` is the fingerprint of the
+relay. Start TB, visit a website, and manually confirm that the middle is used
+by looking at the circuit display. 
+
+## Add a bare-bones APE padding machine
+Now the fun part. We have several resources at our disposal (mind that links
+might be broken in the future, just search for the headings):
+- The official [Circuit Padding Developer
+  Documentation](https://storm.torproject.org/shared/ChieH_sLU93313A2gopZYT3x2waJ41hz5Hn2uG1Uuh7).
+- Notes we made on the [implementation of the circuit padding
+  framework](https://github.com/pylls/padding-machines-for-tor/blob/master/notes/circuit-padding-framework.md).
+- The implementation of the current circuit padding machines in tor:
+  [circuitpadding.c](https://gitweb.torproject.org/tor.git/tree/src/core/or/circuitpadding_machines.c)
+  and
+  [circuitpadding_machines.h](https://gitweb.torproject.org/tor.git/tree/src/core/or/circuitpadding_machines.h).
+
+Please consult the above links for details. Moving forward, the focus is to
+describe what was done, not necessarily explaining all the details why. 
+
+Since we plan to make changes to tor, create a new branch `git checkout -b
+circuit-padding-ape-machine tor-0.4.1.5`. 
+
+We start with declaring two functions, one for the machine at the client and one
+at the relay, in `circuitpadding_machines.h`:
+
+```c
+void circpad_machine_relay_wf_ape(smartlist_t *machines_sl);
+void circpad_machine_client_wf_ape(smartlist_t *machines_sl);
+```
+
+The definitions go into `circuitpadding_machines.c`:
+
+```c
+/**************** Adaptive Padding Early (APE) machine ****************/
+
+/** 
+ * Create a relay-side padding machine based on the APE design. 
+ */
+void
+circpad_machine_relay_wf_ape(smartlist_t *machines_sl)
+{
+  circpad_machine_spec_t *relay_machine
+  = tor_malloc_zero(sizeof(circpad_machine_spec_t));
+
+  relay_machine->name = "relay_wf_ape";
+  relay_machine->is_origin_side = 0; // relay-side
+
+  // Pad to/from the middle relay, only when the circuit has streams
+  relay_machine->target_hopnum = 2;
+  relay_machine->conditions.min_hops = 2;
+  relay_machine->conditions.state_mask = CIRCPAD_CIRC_STREAMS;
+
+  // limits to help guard against excessive padding
+  relay_machine->allowed_padding_count = 1;
+  relay_machine->max_padding_percent = 1;
+
+  // one state to start with: START (-> END, never takes a slot in states)
+  circpad_machine_states_init(relay_machine, 1);
+  relay_machine->states[CIRCPAD_STATE_START].
+    next_state[CIRCPAD_EVENT_NONPADDING_SENT] =
+    CIRCPAD_STATE_END;
+
+  // register the machine
+  relay_machine->machine_num = smartlist_len(machines_sl);
+  circpad_register_padding_machine(relay_machine, machines_sl);
+  
+  log_info(LD_CIRC,
+           "Registered relay WF APE padding machine (%u)",
+           relay_machine->machine_num);
+}
+
+/** 
+ * Create a client-side padding machine based on the APE design. 
+ */
+void
+circpad_machine_client_wf_ape(smartlist_t *machines_sl)
+{
+    circpad_machine_spec_t *client_machine
+  = tor_malloc_zero(sizeof(circpad_machine_spec_t));
+
+  client_machine->name = "client_wf_ape";
+  client_machine->is_origin_side = 1; // client-side
+
+  /** Pad to/from the middle relay, only when the circuit has streams, and only
+  * for general purpose circuits (typical for web browsing)
+  */
+  client_machine->target_hopnum = 2;
+  client_machine->conditions.min_hops = 2;
+  client_machine->conditions.state_mask = CIRCPAD_CIRC_STREAMS;
+  client_machine->conditions.purpose_mask =
+    circpad_circ_purpose_to_mask(CIRCUIT_PURPOSE_C_GENERAL);
+
+  // limits to help guard against excessive padding
+  client_machine->allowed_padding_count = 1;
+  client_machine->max_padding_percent = 1;
+
+  // one state to start with: START (-> END, never takes a slot in states)
+  circpad_machine_states_init(client_machine, 1);
+  client_machine->states[CIRCPAD_STATE_START].
+    next_state[CIRCPAD_EVENT_NONPADDING_SENT] =
+    CIRCPAD_STATE_END;
+
+  client_machine->machine_num = smartlist_len(machines_sl);
+  circpad_register_padding_machine(client_machine, machines_sl);
+  log_info(LD_CIRC,
+           "Registered client WF APE padding machine (%u)",
+           client_machine->machine_num);
+}
+```
+
+We also have to modify `circpad_machines_init()` in `circuitpadding.c` to
+register our machines:
+
+```c
+  /* Register machines for the APE WF defense */
+  circpad_machine_client_wf_ape(origin_padding_machines);
+  circpad_machine_relay_wf_ape(relay_padding_machines);
+```
+
+We run `make` to get a new `tor` binary and copy it to our local TB. 
+
+## Run the machine
+To be able
+to view circuit info events in the console as we launch TB, we add `Log
+[circ]info notice stdout` to `torrc` of TB. 
+
+Running TB to visit example.com we first find in the log:
+
+```
+Aug 30 18:36:43.000 [info] circpad_machine_client_hide_intro_circuits(): Registered client intro point hiding padding machine (0)
+Aug 30 18:36:43.000 [info] circpad_machine_relay_hide_intro_circuits(): Registered relay intro circuit hiding padding machine (0)
+Aug 30 18:36:43.000 [info] circpad_machine_client_hide_rend_circuits(): Registered client rendezvous circuit hiding padding machine (1)
+Aug 30 18:36:43.000 [info] circpad_machine_relay_hide_rend_circuits(): Registered relay rendezvous circuit hiding padding machine (1)
+Aug 30 18:36:43.000 [info] circpad_machine_client_wf_ape(): Registered client WF APE padding machine (2)
+Aug 30 18:36:43.000 [info] circpad_machine_relay_wf_ape(): Registered relay WF APE padding machine (2)
+```
+
+All good, our machine is running. Looking further we find:
+
+```
+Aug 30 18:36:55.000 [info] circpad_setup_machine_on_circ(): Registering machine client_wf_ape to origin circ 2 (5)
+Aug 30 18:36:55.000 [info] circpad_node_supports_padding(): Checking padding: supported
+Aug 30 18:36:55.000 [info] circpad_negotiate_padding(): Negotiating padding on circuit 2 (5), command 2
+Aug 30 18:36:55.000 [info] circpad_machine_spec_transition(): Circuit 2 circpad machine 0 transitioning from 0 to 65535
+Aug 30 18:36:55.000 [info] circpad_machine_spec_transitioned_to_end(): Padding machine in end state on circuit 2 (5)
+Aug 30 18:36:55.000 [info] circpad_circuit_machineinfo_free_idx(): Freeing padding info idx 0 on circuit 2 (5)
+Aug 30 18:36:55.000 [info] circpad_handle_padding_negotiated(): Middle node did not accept our padding request on circuit 2 (5)
+```
+We see that our middle support padding (since we upgraded to tor-0.4.1.5), that
+we attempt to negotiate, our machine starts on the client, transitions to the
+end state, and is freed. The last line shows that the middle doesn't have a
+padding machine that can run. 
+
+Next, we follow the same steps as earlier and replace the modified `tor` at our
+middle relay. We don't update the logging there to avoid logging on the info
+level on the live network. Looking at the client log again we see that
+negotiation works as before except for the last line: it's missing, so the
+machine is running at the middle as well.  
+
+## Implementing the APE state machine
+
+Porting is fairly straightforward: define the states for all machines, add two
+more machines (for the receive portion of WTFP-PAD, beyond AP), and pick
+reasonable parameters for the distributions (I completely winged it now, as when
+implementing APE). The [circuit-padding-ape-machine
+branch](https://github.com/pylls/tor/tree/circuit-padding-ape-machine) contains
+the commits for the full machines with plenty of comments. 
+
+Some comments on the process:
+
+- `tor-0.4.1.5` does not support two machines on the same circuit, the following
+  fix has to be made: https://trac.torproject.org/projects/tor/ticket/31111 .
+  The good news is that everything else seems to work after the small change in
+  the fix. 
+- APE randomizes its distributions. Currently, this can only be done during
+  start of `tor`. This makes sense in the censorship circumvention setting
+  (`obfs4`), less so for WF defenses: further randomizing each circuit is likely
+  a PITA for attackers with few downsides.
+- it was annoying to figure out that the lack of systemd support in my compiled
+  tor caused systemd to interrupt (SIGINT) my tor process at the middle relay
+  every five minutes. Updated build steps above to hopefully save others the
+  pain.
+- there's for sure some bug on relays when sending padding cells too early (?).
+  It can happen with some probability with the APE implementation due to
+  `circpad_machine_relay_wf_ape_send()`. Will investigate next.
+- Moving the registration of machines from the definition of the machines to
+  `circpad_machines_init()` makes sense, as suggested in the circuit padding doc
+  draft.
+
+Remember that APE is just a proof-of-concept and we make zero claims about its
+ability to withstand WF attacks, in particular those based on deep learning.
diff --git a/doc/HACKING/CodeStructure.md b/doc/HACKING/CodeStructure.md
index 736d6cd484..fffafcaed1 100644
--- a/doc/HACKING/CodeStructure.md
+++ b/doc/HACKING/CodeStructure.md
@@ -2,128 +2,121 @@
 TODO: revise this to talk about how things are, rather than how things
 have changed.
 
-TODO: Make this into good markdown.
-
-
-
-For quite a while now, the program "tor" has been built from source
-code in just two directories: src/common and src/or.
+For quite a while now, the program *tor* has been built from source
+code in just two directories: **src/common** and **src/or**.
 
 This has become more-or-less untenable, for a few reasons -- most
 notably of which is that it has led our code to become more
 spaghetti-ish than I can endorse with a clean conscience.
 
 So to fix that, we've gone and done a huge code movement in our git
-master branch, which will land in a release once Tor 0.3.5.1-alpha is
+master branch, which will land in a release once Tor `0.3.5.1-alpha` is
 out.
 
 Here's what we did:
 
-  * src/common has been turned into a set of static libraries.  These
-all live in the "src/lib/*" directories.  The dependencies between
+  * **src/common** has been turned into a set of static libraries.  These
+all live in the **src/lib/*** directories.  The dependencies between
 these libraries should have no cycles.  The libraries are:
 
-    arch -- Headers to handle architectural differences
-    cc -- headers to handle differences among compilers
-    compress -- wraps zlib, zstd, lzma
-    container -- high-level container types
-    crypt_ops -- Cryptographic operations. Planning to split this into
+    - **arch** -- Headers to handle architectural differences
+    - **cc** -- headers to handle differences among compilers
+    - **compress** -- wraps zlib, zstd, lzma
+    - **container** -- high-level container types
+    - **crypt_ops** -- Cryptographic operations. Planning to split this into
 a higher and lower level library
-    ctime -- Operations that need to run in constant-time. (Properly,
+    - **ctime** -- Operations that need to run in constant-time. (Properly,
 data-invariant time)
-    defs -- miscelaneous definitions needed throughout Tor.
-    encoding -- transforming one data type into another, and various
+    - **defs** -- miscelaneous definitions needed throughout Tor.
+    - **encoding** -- transforming one data type into another, and various
 data types into strings.
-    err -- lowest-level error handling, in cases where we can't use
+    - **err** -- lowest-level error handling, in cases where we can't use
 the logs because something that the logging system needs has broken.
-    evloop -- Generic event-loop handling logic
-    fdio -- Low-level IO wrapper functions for file descriptors.
-    fs -- Operations on the filesystem
-    intmath -- low-level integer math and misc bit-twiddling hacks
-    lock -- low-level locking code
-    log -- Tor's logging module.  This library sits roughly halfway up
+    - **evloop** -- Generic event-loop handling logic
+    - **fdio** -- Low-level IO wrapper functions for file descriptors.
+    - **fs** -- Operations on the filesystem
+    - **intmath** -- low-level integer math and misc bit-twiddling hacks
+    - **lock** -- low-level locking code
+    - **log** -- Tor's logging module.  This library sits roughly halfway up
 the library dependency diagram, since everything it depends on has to
 be carefully crafted to *not* log.
-    malloc -- Low-level wrappers for the platform memory allocation functions.
-    math -- Higher-level mathematical functions, and floating-point math
-    memarea -- An arena allocator
-    meminfo -- Functions for querying the current process's memory
+    - **malloc** -- Low-level wrappers for the platform memory allocation functions.
+    - **math** -- Higher-level mathematical functions, and floating-point math
+    - **memarea** -- An arena allocator
+    - **meminfo** -- Functions for querying the current process's memory
 status and resources
-    net -- Networking compatibility and convenience code
-    osinfo -- Querying information about the operating system
-    process -- Launching and querying the status of other processes
-    sandbox -- Backend for the linux seccomp2 sandbox
-    smartlist_core -- The lowest-level of the smartlist_t data type.
+    - **net** -- Networking compatibility and convenience code
+    - **osinfo** -- Querying information about the operating system
+    - **process** -- Launching and querying the status of other processes
+    - **sandbox** -- Backend for the linux seccomp2 sandbox
+    - **smartlist_core** -- The lowest-level of the smartlist_t data type.
 Separated from the rest of the containers library because the logging
 subsystem depends on it.
-    string -- Compatibility and convenience functions for manipulating
+    - **string** -- Compatibility and convenience functions for manipulating
 C strings.
-    term -- Terminal-related functions (currently limited to a getpass
+    - **term** -- Terminal-related functions (currently limited to a getpass
 function).
-    testsupport -- Macros for mocking, unit tests, etc.
-    thread -- Higher-level thread compatibility code
-    time -- Higher-level time management code, including format
+    - **testsupport** -- Macros for mocking, unit tests, etc.
+    - **thread** -- Higher-level thread compatibility code
+    - **time** -- Higher-level time management code, including format
 conversions and monotonic time
-    tls -- Our wrapper around our TLS library
-    trace -- Formerly src/trace -- a generic event tracing API
-    wallclock -- Low-level time code, used by the log module.
+    - **tls** -- Our wrapper around our TLS library
+    - **trace** -- Formerly src/trace -- a generic event tracing API
+    - **wallclock** -- Low-level time code, used by the log module.
 
-  * To ensure that the dependency graph in src/common remains under
-control, there is a tool that you can run called "make
-check-includes".  It verifies that each module in Tor only includes
+  * To ensure that the dependency graph in **src/common** remains under
+control, there is a tool that you can run called `make
+check-includes`.  It verifies that each module in Tor only includes
 the headers that it is permitted to include, using a per-directory
-".may_include" file.
+*.may_include* file.
 
-  * The src/or/or.h header has been split into numerous smaller
+  * The **src/or/or.h** header has been split into numerous smaller
 headers.  Notably, many important structures are now declared in a
-header called foo_st.h, where "foo" is the name of the structure.
+header called *foo_st.h*, where "foo" is the name of the structure.
 
-  * The src/or directory, which had most of Tor's code, had been split
+  * The **src/or** directory, which had most of Tor's code, had been split
 up into several directories.  This is still a work in progress:  This
 code has not itself been refactored, and its dependency graph is still
 a tangled web.  I hope we'll be working on that over the coming
 releases, but it will take a while to do.
 
-    The new top-level source directories are:
-
-     src/core -- Code necessary to actually perform or use onion routing.
-     src/feature -- Code used only by some onion routing
+    - The new top-level source directories are:
+        - **src/core** -- Code necessary to actually perform or use onion routing.
+        - **src/feature** -- Code used only by some onion routing
 configurations, or only for a special purpose.
-     src/app -- Top-level code to run, invoke, and configure the
+        - **src/app** -- Top-level code to run, invoke, and configure the
 lower-level code
 
-   The new second-level source directories are:
-     src/core/crypto -- High-level cryptographic protocols used in Tor
-     src/core/mainloop -- Tor's event loop, connection-handling, and
+    - The new second-level source directories are:
+        - **src/core/crypto** -- High-level cryptographic protocols used in Tor
+        - **src/core/mainloop** -- Tor's event loop, connection-handling, and
 traffic-routing code.
-     src/core/or -- Parts related to handling onion routing itself
-     src/core/proto -- support for encoding and decoding different
+        - **src/core/or** -- Parts related to handling onion routing itself
+        - **src/core/proto** -- support for encoding and decoding different
 wire protocols
-
-     src/feature/api -- Support for making Tor embeddable
-     src/feature/client -- Functionality which only Tor clients need
-     src/feature/control -- Controller implementation
-     src/feature/dirauth -- Directory authority
-     src/feature/dircache -- Directory cache
-     src/feature/dirclient -- Directory client
-     src/feature/dircommon -- Shared code between the other directory modules
-     src/feature/hibernate -- Hibernating when Tor is out of bandwidth
+        - **src/feature/api** -- Support for making Tor embeddable
+        - **src/feature/client** -- Functionality which only Tor clients need
+        - **src/feature/control** -- Controller implementation
+        - **src/feature/dirauth** -- Directory authority
+        - **src/feature/dircache** -- Directory cache
+        - **src/feature/dirclient** -- Directory client
+        - **src/feature/dircommon** -- Shared code between the other directory modules
+        - **src/feature/hibernate** -- Hibernating when Tor is out of bandwidth
 or shutting down
-     src/feature/hs -- v3 onion service implementation
-     src/feature/hs_common -- shared code between both onion service
+        - **src/feature/hs** -- v3 onion service implementation
+        - **src/feature/hs_common** -- shared code between both onion service
 implementations
-     src/feature/nodelist -- storing and accessing the list of relays on
+        - **src/feature/nodelist** -- storing and accessing the list of relays on
 the network.
-     src/feature/relay -- code that only relay servers and exit servers need.
-     src/feature/rend -- v2 onion service implementation
-     src/feature/stats -- statistics and history
-
-     src/app/config -- configuration and state for Tor
-     src/app/main -- Top-level functions to invoke the rest or Tor.
+        - **src/feature/relay** -- code that only relay servers and exit servers need.
+        - **src/feature/rend** -- v2 onion service implementation
+        - **src/feature/stats** -- statistics and history
+        - **src/app/config** -- configuration and state for Tor
+        - **src/app/main** -- Top-level functions to invoke the rest or Tor.
 
-  * The "tor" executable is now built in src/app/tor rather than src/or/tor.
+  * The `tor` executable is now built in **src/app/tor** rather than **src/or/tor**.
 
   * There are more static libraries than before that you need to build
 into your application if you want to embed Tor.  Rather than
-maintaining this list yourself, I recommend that you run "make
-show-libs" to have Tor emit a list of what you need to link.
+maintaining this list yourself, I recommend that you run `make
+show-libs` to have Tor emit a list of what you need to link.
diff --git a/doc/HACKING/CodingStandards.md b/doc/HACKING/CodingStandards.md
index 4f229348e4..7999724166 100644
--- a/doc/HACKING/CodingStandards.md
+++ b/doc/HACKING/CodingStandards.md
@@ -42,6 +42,7 @@ If you have changed build system components:
    - For example, if you have changed Makefiles, autoconf files, or anything
      else that affects the build system.
 
+
 License issues
 ==============
 
@@ -58,7 +59,6 @@ Some compatible licenses include:
   - CC0 Public Domain Dedication
 
 
-
 How we use Git branches
 =======================
 
@@ -99,29 +99,65 @@ When you do a commit that needs a ChangeLog entry, add a new file to
 the `changes` toplevel subdirectory.  It should have the format of a
 one-entry changelog section from the current ChangeLog file, as in
 
-- Major bugfixes:
+  o Major bugfixes (security):
     - Fix a potential buffer overflow. Fixes bug 99999; bugfix on
       0.3.1.4-beta.
+  o Minor features (performance):
+    - Make tor faster. Closes ticket 88888.
 
 To write a changes file, first categorize the change.  Some common categories
-are: Minor bugfixes, Major bugfixes, Minor features, Major features, Code
-simplifications and refactoring.  Then say what the change does.  If
-it's a bugfix, mention what bug it fixes and when the bug was
-introduced.  To find out which Git tag the change was introduced in,
-you can use `git describe --contains <sha1 of commit>`.
-
-If at all possible, try to create this file in the same commit where you are
-making the change.  Please give it a distinctive name that no other branch will
-use for the lifetime of your change. To verify the format of the changes file,
-you can use `make check-changes`.  This is run automatically as part of
-`make check` -- if it fails, we must fix it before we release.  These
-checks are implemented in `scripts/maint/lintChanges.py`.
+are:
+  o Minor bugfixes (subheading):
+  o Major bugfixes (subheading):
+  o Minor features (subheading):
+  o Major features (subheading):
+  o Code simplifications and refactoring:
+  o Testing:
+  o Documentation:
+
+The subheading is a particular area within Tor.  See the ChangeLog for
+examples.
+
+Then say what the change does.  If it's a bugfix, mention what bug it fixes
+and when the bug was introduced. To find out which Git tag the change was
+introduced in, you can use `git describe --contains <sha1 of commit>`.
+If you don't know the commit, you can search the git diffs (-S) for the first
+instance of the feature (--reverse).
+
+For example, for #30224, we wanted to know when the bridge-distribution-request
+feature was introduced into Tor:
+    $ git log -S bridge-distribution-request --reverse
+    commit ebab521525
+    Author: Roger Dingledine <arma@torproject.org>
+    Date:   Sun Nov 13 02:39:16 2016 -0500
+
+        Add new BridgeDistribution config option
+
+    $ git describe --contains ebab521525
+    tor-0.3.2.3-alpha~15^2~4
+
+If you need to know all the Tor versions that contain a commit, use:
+    $ git tag --contains 9f2efd02a1 | sort -V
+    tor-0.2.5.16
+    tor-0.2.8.17
+    tor-0.2.9.14
+    tor-0.2.9.15
+    ...
+    tor-0.3.0.13
+    tor-0.3.1.9
+    tor-0.3.1.10
+    ...
+
+If at all possible, try to create the changes file in the same commit where
+you are making the change.  Please give it a distinctive name that no other
+branch will use for the lifetime of your change. We usually use "ticketNNNNN"
+or "bugNNNNN", where NNNNN is the ticket number. To verify the format of the
+changes file, you can use `make check-changes`.  This is run automatically as
+part of `make check` -- if it fails, we must fix it as soon as possible, so
+that our CI passes.  These checks are implemented in
+`scripts/maint/lintChanges.py`.
 
 Changes file style guide:
-  * Changes files begin with "  o Header (subheading):".  The header
-    should usually be "Minor/Major bugfixes/features". The subheading is a
-    particular area within Tor.  See the ChangeLog for examples.
-
   * Make everything terse.
 
   * Write from the user's point of view: describe the user-visible changes
@@ -183,6 +219,9 @@ deviations from our C whitespace style.  Generally, we use:
    - No space between a function name and an opening paren. `puts(x)`, not
      `puts (x)`.
    - Function declarations at the start of the line.
+   - Use `void foo(void)` to declare a function with no arguments.  Saying
+     `void foo()` is C++ syntax.
+   - Use `const` for new APIs.
 
 If you use an editor that has plugins for editorconfig.org, the file
 `.editorconfig` will help you to conform this coding style.
@@ -199,20 +238,49 @@ We have some wrapper functions like `tor_malloc`, `tor_free`, `tor_strdup`, and
 `tor_gettimeofday;` use them instead of their generic equivalents.  (They
 always succeed or exit.)
 
+Specifically, Don't use `malloc`, `realloc`, `calloc`, `free`, or
+`strdup`. Use `tor_malloc`, `tor_realloc`, `tor_calloc`, `tor_free`, or
+`tor_strdup`.
+
+Don't use `tor_realloc(x, y\*z)`. Use `tor_reallocarray(x, y, z)` instead.;
+
 You can get a full list of the compatibility functions that Tor provides by
 looking through `src/lib/*/*.h`.  You can see the
 available containers in `src/lib/containers/*.h`.  You should probably
 familiarize yourself with these modules before you write too much code, or
 else you'll wind up reinventing the wheel.
 
-We don't use `strcat` or `strcpy` or `sprintf` of any of those notoriously broken
-old C functions.  Use `strlcat`, `strlcpy`, or `tor_snprintf/tor_asprintf` instead.
+
+We don't use `strcat` or `strcpy` or `sprintf` of any of those notoriously
+broken old C functions.  We also avoid `strncat` and `strncpy`. Use
+`strlcat`, `strlcpy`, or `tor_snprintf/tor_asprintf` instead.
 
 We don't call `memcmp()` directly.  Use `fast_memeq()`, `fast_memneq()`,
-`tor_memeq()`, or `tor_memneq()` for most purposes.
+`tor_memeq()`, or `tor_memneq()` for most purposes.  If you really need a
+tristate return value, use `tor_memcmp()` or `fast_memcmp()`.
+
+Don't call `assert()` directly. For hard asserts, use `tor_assert()`. For
+soft asserts, use `tor_assert_nonfatal()` or `BUG()`. If you need to print
+debug information in assert error message, consider using `tor_assertf()` and
+`tor_assertf_nonfatal()`.  If you are writing code that is too low-level to
+use the logging subsystem, use `raw_assert()`.
+
+Don't use `toupper()` and `tolower()` functions. Use `TOR_TOUPPER` and
+`TOR_TOLOWER` macros instead. Similarly, use `TOR_ISALPHA`, `TOR_ISALNUM` et.
+al. instead of `isalpha()`, `isalnum()`, etc.
+
+When allocating new string to be added to a smartlist, use
+`smartlist_add_asprintf()` to do both at once.
+
+Avoid calling BSD socket functions directly. Use portable wrappers to work
+with sockets and socket addresses. Also, sockets should be of type
+`tor_socket_t`.
+
+Don't use any of these functions: they aren't portable. Use the
+version prefixed with `tor_` instead: strtok_r, memmem, memstr,
+asprintf, localtime_r, gmtime_r, inet_aton, inet_ntop, inet_pton,
+getpass, ntohll, htonll.  (This list is incomplete.)
 
-Also see a longer list of functions to avoid in:
-https://people.torproject.org/~nickm/tor-auto/internal/this-not-that.html
 
 What code can use what other code?
 ----------------------------------
@@ -302,8 +370,16 @@ definitions when necessary.)
 Assignment operators shouldn't nest inside other expressions.  (You can
 ignore this inside macro definitions when necessary.)
 
-Functions not to write
-----------------------
+Binary data and wire formats
+----------------------------
+
+Use pointer to `char` when representing NUL-terminated string. To represent
+arbitrary binary data, use pointer to `uint8_t`. (Many older Tor APIs ignore
+this rule.)
+
+Refrain from attempting to encode integers by casting their pointers to byte
+arrays. Use something like `set_uint32()`/`get_uint32()` instead and don't
+forget about endianness.
 
 Try to never hand-write new code to parse or generate binary
 formats. Instead, use trunnel if at all possible.  See
@@ -314,7 +390,6 @@ for more information about trunnel.
 
 For information on adding new trunnel code to Tor, see src/trunnel/README
 
-
 Calling and naming conventions
 ------------------------------
 
@@ -422,6 +497,8 @@ to use it as a function callback), define it with a name like
       abc_free_(obj);
     }
 
+When deallocating, don't say e.g. `if (x) tor_free(x)`. The convention is to
+have deallocators do nothing when NULL pointer is passed.
 
 Doxygen comment conventions
 ---------------------------
diff --git a/doc/HACKING/CodingStandardsRust.md b/doc/HACKING/CodingStandardsRust.md
index fc562816db..b570e10dc7 100644
--- a/doc/HACKING/CodingStandardsRust.md
+++ b/doc/HACKING/CodingStandardsRust.md
@@ -256,7 +256,7 @@ Here are some additional bits of advice and rules:
    or 2) should fail (i.e. in a unittest).
 
    You SHOULD NOT use `unwrap()` anywhere in which it is possible to handle the
-   potential error with either `expect()` or the eel operator, `?`.
+   potential error with the eel operator, `?` or another non panicking way.
    For example, consider a function which parses a string into an integer:
 
         fn parse_port_number(config_string: &str) -> u16 {
@@ -264,12 +264,12 @@ Here are some additional bits of advice and rules:
         }
 
    There are numerous ways this can fail, and the `unwrap()` will cause the
-   whole program to byte the dust!  Instead, either you SHOULD use `expect()`
+   whole program to byte the dust!  Instead, either you SHOULD use `ok()`
    (or another equivalent function which will return an `Option` or a `Result`)
    and change the return type to be compatible:
 
         fn parse_port_number(config_string: &str) -> Option<u16> {
-            u16::from_str_radix(config_string, 10).expect("Couldn't parse port into a u16")
+            u16::from_str_radix(config_string, 10).ok()
         }
 
    or you SHOULD use `or()` (or another similar method):
diff --git a/doc/HACKING/EndOfLifeTor.md b/doc/HACKING/EndOfLifeTor.md
new file mode 100644
index 0000000000..2fece2ca9d
--- /dev/null
+++ b/doc/HACKING/EndOfLifeTor.md
@@ -0,0 +1,50 @@
+
+End of Life on an old release series
+------------------------------------
+
+Here are the steps that the maintainer should take when an old Tor release
+series reaches End of Life.  Note that they are _only_ for entire series that
+have reached their planned EOL: they do not apply to security-related
+deprecations of individual versions.
+
+### 0. Preliminaries
+
+0. A few months before End of Life:
+   Write a deprecation announcement.
+   Send the announcement out with every new release announcement.
+
+1. A month before End of Life:
+   Send the announcement to tor-announce, tor-talk, tor-relays, and the
+   packagers.
+
+### 1. On the day
+
+1. Open tickets to remove the release from:
+   - the jenkins builds
+   - tor's Travis CI cron jobs
+   - chutney's Travis CI tests (#)
+   - stem's Travis CI tests (#)
+
+2. Close the milestone in Trac. To do this, go to Trac, log in,
+   select "Admin" near the top of the screen, then select "Milestones" from
+   the menu on the left.  Click on the milestone for this version, and
+   select the "Completed" checkbox. By convention, we select the date as
+   the End of Life date.
+
+3. Replace NNN-backport with NNN-unreached-backport in all open trac tickets.
+
+4. If there are any remaining tickets in the milestone:
+     - merge_ready tickets are for backports:
+       - if there are no supported releases for the backport, close the ticket
+       - if there is an earlier (LTS) release for the backport, move the ticket
+         to that release
+     - other tickets should be closed (if we won't fix them) or moved to a
+       supported release (if we will fix them)
+
+5. Mail the end of life announcement to tor-announce, the packagers list,
+   and tor-relays. The current list of packagers is in ReleasingTor.md.
+
+6. Ask at least two of weasel/arma/Sebastian to remove the old version
+   number from their approved versions list.
+
+7. Update the CoreTorReleases wiki page.
diff --git a/doc/HACKING/Fuzzing.md b/doc/HACKING/Fuzzing.md
index 2039d6a4c0..c2db7e9853 100644
--- a/doc/HACKING/Fuzzing.md
+++ b/doc/HACKING/Fuzzing.md
@@ -1,6 +1,6 @@
-= Fuzzing Tor
+# Fuzzing Tor
 
-== The simple version (no fuzzing, only tests)
+## The simple version (no fuzzing, only tests)
 
 Check out fuzzing-corpora, and set TOR_FUZZ_CORPORA to point to the place
 where you checked it out.
@@ -12,7 +12,7 @@ This won't actually fuzz Tor!  It will just run all the fuzz binaries
 on our existing set of testcases for the fuzzer.
 
 
-== Different kinds of fuzzing
+## Different kinds of fuzzing
 
 Right now we support three different kinds of fuzzer.
 
@@ -37,7 +37,7 @@ In all cases, you'll need some starting examples to give the fuzzer when it
 starts out.  There's a set in the "fuzzing-corpora" git repository.  Try
 setting TOR_FUZZ_CORPORA to point to a checkout of that repository
 
-== Writing Tor fuzzers
+## Writing Tor fuzzers
 
 A tor fuzzing harness should have:
 * a fuzz_init() function to set up any necessary global state.
@@ -52,7 +52,7 @@ bug, or accesses memory it shouldn't. This helps fuzzing frameworks detect
 "interesting" cases.
 
 
-== Guided Fuzzing with AFL
+## Guided Fuzzing with AFL
 
 There is no HTTPS, hash, or signature for American Fuzzy Lop's source code, so
 its integrity can't be verified. That said, you really shouldn't fuzz on a
@@ -101,7 +101,7 @@ macOS (OS X) requires slightly more preparation, including:
 * using afl-clang (or afl-clang-fast from the llvm directory)
 * disabling external crash reporting (AFL will guide you through this step)
 
-== Triaging Issues
+## Triaging Issues
 
 Crashes are usually interesting, particularly if using AFL_HARDEN=1 and --enable-expensive-hardening. Sometimes crashes are due to bugs in the harness code.
 
@@ -115,7 +115,7 @@ To see what fuzz-http is doing with a test case, call it like this:
 
 (Logging is disabled while fuzzing to increase fuzzing speed.)
 
-== Reporting Issues
+## Reporting Issues
 
 Please report any issues discovered using the process in Tor's security issue
 policy:
diff --git a/doc/HACKING/HelpfulTools.md b/doc/HACKING/HelpfulTools.md
index d499238526..ae892c34a2 100644
--- a/doc/HACKING/HelpfulTools.md
+++ b/doc/HACKING/HelpfulTools.md
@@ -251,16 +251,16 @@ Now you can run Tor with profiling enabled, and use the pprof utility to look at
 performance! See the gperftools manual for more info, but basically:
 
 2. Run `env CPUPROFILE=/tmp/profile src/app/tor -f <path/torrc>`. The profile file
-   is not written to until Tor finishes execuction.
+   is not written to until Tor finishes execution.
 
-3. Run `pprof src/app/tor /tm/profile` to start the REPL.
+3. Run `pprof src/app/tor /tmp/profile` to start the REPL.
 
 Generating and analyzing a callgraph
 ------------------------------------
 
 0. Build Tor on linux or mac, ideally with -O0 or -fno-inline.
 
-1. Clone 'https://gitweb.torproject.org/user/nickm/calltool.git/' .
+1. Clone 'https://git.torproject.org/user/nickm/calltool.git/' .
    Follow the README in that repository.
 
 Note that currently the callgraph generator can't detect calls that pass
@@ -315,6 +315,30 @@ If you use emacs for editing Tor and nothing else, you could always just say:
 There is probably a better way to do this.  No, we are probably not going
 to clutter the files with emacs stuff.
 
+Building a tag file (code index)
+--------------------------------
+
+Many functions in tor use `MOCK_IMPL` wrappers for unit tests. Your
+tag-building program must be told how to handle this syntax.
+
+If you're using emacs, you can generate an emacs-compatible tag file using
+`make tags`. This will run your system's `etags`. Tor's build system assumes
+that you're using the emacs-specific version of `etags` (bundled under the
+`xemacs21-bin` package on Debian). This is incompatible with other versions of
+`etags` such as the version provided by Exuberant Ctags.
+
+If you're using vim or emacs, you can also use Universal Ctags to build a tag
+file using the syntax:
+
+    ctags -R -D 'MOCK_IMPL(r,h,a)=r h a' .
+
+If you're using an older version of Universal Ctags, you can use the following
+instead:
+
+    ctags -R --mline-regex-c='/MOCK_IMPL\([^,]+,\W*([a-zA-Z0-9_]+)\W*,/\1/f/{mgroup=1}' .
+
+A vim-compatible tag file will be generated by default. If you use emacs, add
+the `-e` flag to generate an emacs-compatible tag file.
 
 Doxygen
 -------
@@ -371,3 +395,18 @@ source code. Here's how to use it:
 
   6. See the Doxygen manual for more information; this summary just
      scratches the surface.
+
+Style and best-practices checking
+--------------------------------
+
+We use scripts to check for various problems in the formatting and style
+of our source code.  The "check-spaces" test detects a bunch of violations
+of our coding style on the local level.  The "check-best-practices" test
+looks for violations of some of our complexity guidelines.
+
+You can tell the tool about exceptions to the complexity guidelines via its
+exceptions file (scripts/maint/practracker/exceptions.txt).  But before you
+do this, consider whether you shouldn't fix the underlying problem.  Maybe
+that file really _is_ too big.  Maybe that function really _is_ doing too
+much.  (On the other hand, for stable release series, it is sometimes better
+to leave things unrefactored.)
diff --git a/doc/HACKING/HowToReview.md b/doc/HACKING/HowToReview.md
index 2d1f3d1c9e..2325e70175 100644
--- a/doc/HACKING/HowToReview.md
+++ b/doc/HACKING/HowToReview.md
@@ -63,7 +63,7 @@ Let's look at the code!
 Let's look at the documentation!
 --------------------------------
 
-- Does the documentation confirm to CodingStandards.txt?
+- Does the documentation conform to CodingStandards.txt?
 
 - Does it make sense?
 
diff --git a/doc/HACKING/Maintaining.md b/doc/HACKING/Maintaining.md
new file mode 100644
index 0000000000..4d5a7f6b76
--- /dev/null
+++ b/doc/HACKING/Maintaining.md
@@ -0,0 +1,113 @@
+# Maintaining Tor
+
+This document details the duties and processes on maintaining the Tor code
+base.
+
+The first section describes who is the current Tor maintainer and what are the
+responsibilities. Tor has one main single maintainer but does have many
+committers and subsystem maintainers.
+
+The second third section describes how the **alpha and master** branches are
+maintained and by whom.
+
+Finally, the last section describes how the **stable** branches are maintained
+and by whom.
+
+This document does not cover how Tor is released, please see
+[ReleasingTor.md](ReleasingTor.md) for that information.
+
+## Tor Maintainer
+
+The current maintainer is Nick Mathewson <nickm@torproject.org>.
+
+The maintainer takes final decisions in terms of engineering, architecture and
+protocol design. Releasing Tor falls under their responsibility.
+
+## Alpha and Master Branches
+
+The Tor repository always has at all times a **master** branch which contains
+the upstream ongoing development.
+
+It may also contain a branch for a released feature freezed version which is
+called the **alpha** branch. The git tag and version number is always
+postfixed with `-alpha[-dev]`. For example: `tor-0.3.5.0-alpha-dev` or
+`tor-0.3.5.3-alpha`.
+
+Tor is separated into subsystems and some of those are maintained by other
+developers than the main maintainer. Those people have commit access to the
+code base but only commit (in most cases) into the subsystem they maintain.
+
+Upstream merges are restricted to the alpha and master branches. Subsystem
+maintainers should never push a patch into a stable branch which is the
+responsibility of the [stable branch maintainer](#stable-branches).
+
+### Who
+
+In alphabetical order, the following people have upstream commit access and
+maintain the following subsystems:
+
+- David Goulet <dgoulet@torproject.org>
+  * Onion Service (including Shared Random).  
+    ***keywords:*** *[tor-hs]*
+  * Channels, Circuitmux, Connection, Scheduler.  
+    ***keywords:*** *[tor-chan, tor-cmux, tor-sched, tor-conn]*
+  * Cell Logic (Handling/Parsing).  
+    ***keywords:*** *[tor-cell]*
+  * Threading backend.
+    ***keywords:*** *[tor-thread]*  
+
+- George Kadianakis <asn@torproject.org>
+  * Onion Service (including Shared Random).  
+    ***keywords:*** *[tor-hs]*
+  * Guard.  
+    ***keywords:*** *[tor-guard]*
+  * Pluggable Transport (excluding Bridge networking).  
+    ***keywords:*** *[tor-pt]*
+
+### Tasks
+
+These are the tasks of a subsystem maintainer:
+
+1. Regularly go over `merge_ready` tickets relevant to the related subsystem
+   and for the current alpha or development (master branch) Milestone.
+
+2. A subsystem maintainer is expected to contribute to any design changes
+   (including proposals) or large patch set about the subsystem.
+
+3. Leave their ego at the door. Mistakes will be made but they have to be
+   taking care of seriously. Learn and move on quickly.
+
+### Merging Policy
+
+These are few important items to follow when merging code upstream:
+
+1. To merge code upstream, the patch must have passed our CI (currently
+   github.com/torproject), have a corresponding ticket and reviewed by
+   **at least** one person that is not the original coder.
+
+   Example A: If Alice writes a patch then Bob, a Tor network team member,
+   reviews it and flags it `merge_ready`. Then, the maintainer is required
+   to look at the patch and makes a decision.
+
+   Example B: If the maintainer writes a patch then Bob, a Tor network
+   team member, reviews it and flags it `merge_ready`, then the maintainer
+   can merge the code upstream.
+
+2. Maintainer makes sure the commit message should describe what was fixed
+   and, if it applies, how was it fixed. It should also always refer to
+   the ticket number.
+
+3. Trivial patches such as comment change, documentation, syntax issues or
+   typos can be merged without a ticket or reviewers.
+
+4. Tor uses the "merge forward" method, that is, if a patch applies to the
+   alpha branch, it has to be merged there first and then merged forward
+   into master.
+
+5. Maintainer should always consult with the network team about any doubts,
+   mis-understandings or unknowns of a patch. Final word will always go to the
+   main Tor maintainer.
+
+## Stable Branches
+
+(Currently being drafted and reviewed by the network team.)
diff --git a/doc/HACKING/Module.md b/doc/HACKING/Module.md
index 9cf36090b4..781bb978f2 100644
--- a/doc/HACKING/Module.md
+++ b/doc/HACKING/Module.md
@@ -8,13 +8,24 @@ module in Tor.
 In the context of the tor code base, a module is a subsystem that we can
 selectively enable or disable, at `configure` time.
 
-Currently, there is only one module:
+Currently, tor has these modules:
 
+  - Relay subsystem (relay)
+  - Directory cache system (dircache).
   - Directory Authority subsystem (dirauth)
 
-It is located in its own directory in `src/feature/dirauth/`. To disable it,
-one need to pass `--disable-module-dirauth` at configure time. All modules
-are currently enabled by default.
+The dirauth code is located in its own directory in `src/feature/dirauth/`.
+
+The relay code is located in a directory named `src/*/*relay`, which is
+being progressively refactored and disabled.
+
+The dircache code is located in `src/*/*dircache`.  Right now, it is
+disabled if and only if the relay module is disabled.  (We are treating
+them as separate modules because they are logically independent, not
+because you would actually want to run one without the other.)
+
+To disable a module, pass `--disable-module-{dirauth,relay}` at configure
+time. All modules are currently enabled by default.
 
 ## Build System ##
 
@@ -24,7 +35,7 @@ The changes to the build system are pretty straightforward.
    contains a list (white-space separated) of the module in tor. Add yours to
    the list.
 
-2. Use the `AC_ARG_ENABLE([module-dirauth]` template for your new module. We
+2. Use the `AC_ARG_ENABLE([module-relay]` template for your new module. We
    use the "disable module" approach instead of enabling them one by one. So,
    by default, tor will build all the modules.
 
@@ -32,7 +43,7 @@ The changes to the build system are pretty straightforward.
    the C code to conditionally compile things for your module. And the
    `BUILD_MODULE_<name>` is also defined for automake files (e.g: include.am).
 
-3. In the `src/core/include.am` file, locate the `MODULE_DIRAUTH_SOURCES`
+3. In the `src/core/include.am` file, locate the `MODULE_RELAY_SOURCES`
    value.  You need to create your own `_SOURCES` variable for your module
    and then conditionally add the it to `LIBTOR_A_SOURCES` if you should
    build the module.
@@ -40,18 +51,14 @@ The changes to the build system are pretty straightforward.
    It is then **very** important to add your SOURCES variable to
    `src_or_libtor_testing_a_SOURCES` so the tests can build it.
 
-4. Do the same for header files, locate `ORHEADERS +=` which always add all
-   headers of all modules so the symbol can be found for the module entry
-   points.
-
 Finally, your module will automatically be included in the
-`TOR_MODULES_ALL_ENABLED` variable which is used to build the unit tests. They
-always build everything in order to tests everything.
+`TOR_MODULES_ALL_ENABLED` variable which is used to build the unit tests.
+They always build everything in order to test everything.
 
 ## Coding ##
 
-As mentioned above, a module must be isolated in its own directory (name of
-the module) in `src/feature/`.
+As mentioned above, a module should be isolated in its own directories,
+suffixed with the name of the module, in `src/*/`.
 
 There are couples of "rules" you want to follow:
 
diff --git a/doc/HACKING/ReleasingTor.md b/doc/HACKING/ReleasingTor.md
index 55a40fc89b..0f453ca2aa 100644
--- a/doc/HACKING/ReleasingTor.md
+++ b/doc/HACKING/ReleasingTor.md
@@ -5,7 +5,7 @@ Putting out a new release
 Here are the steps that the maintainer should take when putting out a
 new Tor release:
 
-=== 0. Preliminaries
+### 0. Preliminaries
 
 1. Get at least two of weasel/arma/Sebastian to put the new
    version number in their approved versions list.  Give them a few
@@ -18,35 +18,41 @@ new Tor release:
    date of a TB that contains it.  See note below in "commit, upload,
    announce".
 
-=== I. Make sure it works
+### I. Make sure it works
 
-1. Use it for a while, as a client, as a relay, as a hidden service,
-   and as a directory authority. See if it has any obvious bugs, and
-   resolve those.
+1. Make sure that CI passes: have a look at Travis
+   (https://travis-ci.org/torproject/tor/branches), Appveyor
+   (https://ci.appveyor.com/project/torproject/tor/history), and
+   Jenkins (https://jenkins.torproject.org/view/tor/).
+   Make sure you're looking at the right branches.
 
-   As applicable, merge the `maint-X` branch into the `release-X` branch.
-   But you've been doing that all along, right?
+   If there are any unexplained failures, try to fix them or figure them
+   out.
 
-2. Are all of the jenkins builders happy?  See jenkins.torproject.org.
+2. Verify that there are no big outstanding issues.  You might find such
+   issues --
 
-   What about the bsd buildbots?
-         See http://buildbot.pixelminers.net/builders/
+    * On Trac
 
-   What about Coverity Scan?
+    * On coverity scan
 
-   What about clang scan-build?
+    * On OSS-Fuzz
 
-   Does 'make distcheck' complain?
+3. Run checks that aren't covered above, including:
 
-   How about 'make test-stem' and 'make test-network' and
-   `make test-network-full`?
+    * clang scan-build.  (See the script in ./scripts/test/scan_build.sh)
 
-       - Are all those tests still happy with --enable-expensive-hardening ?
+    * make test-network and make test-network-all (with
+      --enable-fragile-hardening)
 
-   Any memory leaks?
+    * Running Tor yourself and making sure that it actually works for you.
 
+    * Running Tor under valgrind.  (Our 'fragile hardening' doesn't cover
+      libevent and openssl, so using valgrind will sometimes find extra
+      memory leaks.)
 
-=== II. Write a changelog
+
+### II. Write a changelog
 
 
 1a. (Alpha release variant)
@@ -55,11 +61,14 @@ new Tor release:
    of them and reordering to focus on what users and funders would find
    interesting and understandable.
 
-   To do this, first run `./scripts/maint/lintChanges.py changes/*` and
-   fix as many warnings as you can.  Then run `./scripts/maint/sortChanges.py
-   changes/* > changelog.in` to combine headings and sort the entries.
-   After that, it's time to hand-edit and fix the issues that lintChanges
-   can't find:
+   To do this, run
+      `./scripts/maint/sortChanges.py changes/* > changelog.in`
+   to combine headings and sort the entries.  Copy the changelog.in file
+   into the ChangeLog.  Run 'format_changelog.py' (see below) to clean
+   up the line breaks.
+
+   After that, it's time to hand-edit and fix the issues that
+   lintChanges can't find:
 
    1. Within each section, sort by "version it's a bugfix on", else by
       numerical ticket order.
@@ -68,8 +77,6 @@ new Tor release:
 
       Make stuff very terse
 
-      Make sure each section name ends with a colon
-
       Describe the user-visible problem right away
 
       Mention relevant config options by name.  If they're rare or unusual,
@@ -79,7 +86,9 @@ new Tor release:
 
       Present and imperative tense: not past.
 
-      'Relays', not 'servers' or 'nodes' or 'Tor relays'.
+      "Relays", not "servers" or "nodes" or "Tor relays".
+
+      "Onion services", not "hidden services".
 
       "Stop FOOing", not "Fix a bug where we would FOO".
 
@@ -100,12 +109,14 @@ new Tor release:
 
    For stable releases that backport things from later, we try to compose
    their releases, we try to make sure that we keep the changelog entries
-   identical to their original versions, with a 'backport from 0.x.y.z'
+   identical to their original versions, with a "backport from 0.x.y.z"
    note added to each section.  So in this case, once you have the items
    from the changes files copied together, don't use them to build a new
    changelog: instead, look up the corrected versions that were merged
    into ChangeLog in the master branch, and use those.
 
+   Add "backport from X.Y.Z" in the section header for these entries.
+
 2. Compose a short release blurb to highlight the user-facing
    changes. Insert said release blurb into the ChangeLog stanza. If it's
    a stable release, add it to the ReleaseNotes file too. If we're adding
@@ -128,51 +139,65 @@ new Tor release:
    text of existing entries, though.)
 
 
-=== III. Making the source release.
+### III. Making the source release.
 
 1. In `maint-0.?.x`, bump the version number in `configure.ac` and run
-   `perl scripts/maint/updateVersions.pl` to update version numbers in other
+   `make update-versions` to update version numbers in other
    places, and commit.  Then merge `maint-0.?.x` into `release-0.?.x`.
 
-   (NOTE: To bump the version number, edit `configure.ac`, and then run
-   either `make`, or `perl scripts/maint/updateVersions.pl`, depending on
-   your version.)
-
    When you merge the maint branch forward to the next maint branch, or into
    master, merge it with "-s ours" to avoid a needless version bump.
 
 2. Make distcheck, put the tarball up in somewhere (how about your
-   homedir on your homedir on people.torproject.org?) , and tell `#tor`
-   about it. Wait a while to see if anybody has problems building it.
-   (Though jenkins is usually pretty good about catching these things.)
+   homedir on your homedir on people.torproject.org?) , and tell `#tor-dev`
+   about it.
+
+   If you want, wait until at least one person has built it
+   successfully.  (We used to say "wait for others to test it", but our
+   CI has successfully caught these kinds of errors for the last several
+   years.)
+
 
-=== IV. Commit, upload, announce
+3. Make sure that the new version is recommended in the latest consensus.
+   (Otherwise, users will get confused when it complains to them
+   about its status.)
+
+   If it is not, you'll need to poke Roger, Weasel, and Sebastian again: see
+   item 0.1 at the start of this document.
+
+### IV. Commit, upload, announce
 
 1. Sign the tarball, then sign and push the git tag:
 
         gpg -ba <the_tarball>
-        git tag -u <keyid> tor-0.3.x.y-status
-        git push origin tag tor-0.3.x.y-status
+        git tag -s tor-0.4.x.y-<status>
+        git push origin tag tor-0.4.x.y-<status>
 
-   (You must do this before you update the website: it relies on finding
-   the version by tag.)
+   (You must do this before you update the website: the website scripts
+   rely on finding the version by tag.)
+
+   (If your default PGP key is not the one you want to sign with, then say
+   "-u <keyid>" instead of "-s".)
 
 2. scp the tarball and its sig to the dist website, i.e.
-   `/srv/dist-master.torproject.org/htdocs/` on dist-master. When you want
-   it to go live, you run "static-update-component dist.torproject.org"
-   on dist-master.
+   `/srv/dist-master.torproject.org/htdocs/` on dist-master. Run
+   "static-update-component dist.torproject.org" on dist-master.
 
-   In the webwml.git repository, `include/versions.wmi` and `Makefile`
-   to note the new version.
+   In the webwml.git repository, `include/versions.wmi` and `Makefile`.
+   In the project/web/tpo.git repository, update `databags/versions.ini`
+   to note the new version.  Push these changes to master.
 
    (NOTE: Due to #17805, there can only be one stable version listed at
    once.  Nonetheless, do not call your version "alpha" if it is stable,
    or people will get confused.)
 
+   (NOTE: It will take a while for the website update scripts to update
+   the website.)
+
 3. Email the packagers (cc'ing tor-team) that a new tarball is up.
    The current list of packagers is:
 
-       - {weasel,gk,mikeperry} at torproject dot org
+       - {weasel,sysrqb,mikeperry} at torproject dot org
        - {blueness} at gentoo dot org
        - {paul} at invizbox dot io
        - {vincent} at invizbox dot com
@@ -186,37 +211,47 @@ new Tor release:
 
    Also, email tor-packagers@lists.torproject.org.
 
+   Mention where to download the tarball (https://dist.torproject.org).
+
+   Include a link to the changelog.
+
+
 4. Add the version number to Trac.  To do this, go to Trac, log in,
     select "Admin" near the top of the screen, then select "Versions" from
     the menu on the left.  At the right, there will be an "Add version"
     box.  By convention, we enter the version in the form "Tor:
-    0.2.2.23-alpha" (or whatever the version is), and we select the date as
+    0.4.0.1-alpha" (or whatever the version is), and we select the date as
     the date in the ChangeLog.
 
-5. Double-check: did the version get recommended in the consensus yet?  Is
-   the website updated?  If not, don't announce until they have the
-   up-to-date versions, or people will get confused.
+5. Wait for the download page to be updated. (If you don't do this before you
+   announce, people will be confused.)
 
 6. Mail the release blurb and ChangeLog to tor-talk (development release) or
    tor-announce (stable).
 
    Post the changelog on the blog as well. You can generate a
-   blog-formatted version of the changelog with the -B option to
-   format-changelog.
+   blog-formatted version of the changelog with
+      `./scripts/maint/format_changelog.py --B`
 
    When you post, include an estimate of when the next TorBrowser
    releases will come out that include this Tor release.  This will
    usually track https://wiki.mozilla.org/RapidRelease/Calendar , but it
    can vary.
 
+   For templates to use when announcing, see:
+       https://trac.torproject.org/projects/tor/wiki/org/teams/NetworkTeam/AnnouncementTemplates
 
-=== V. Aftermath and cleanup
+### V. Aftermath and cleanup
 
 1. If it's a stable release, bump the version number in the
     `maint-x.y.z` branch to "newversion-dev", and do a `merge -s ours`
     merge to avoid taking that change into master.
 
-2. Forward-port the ChangeLog (and ReleaseNotes if appropriate).
+2. If there is a new `maint-x.y.z` branch, create a Travis CI cron job that
+   builds the release every week. (It's ok to skip the weekly build if the
+   branch was updated in the last 24 hours.)
 
-3. Keep an eye on the blog post, to moderate comments and answer questions.
+3. Forward-port the ChangeLog (and ReleaseNotes if appropriate) to the
+   master branch.
 
+4. Keep an eye on the blog post, to moderate comments and answer questions.