Diffstat (limited to 'doc/HACKING')
-rw-r--r-- doc/HACKING/CodeStructure.md 151
-rw-r--r-- doc/HACKING/CodingStandards.md 74
-rw-r--r-- doc/HACKING/CodingStandardsRust.md 6
-rw-r--r-- doc/HACKING/EndOfLifeTor.md 50
-rw-r--r-- doc/HACKING/Fuzzing.md 14
-rw-r--r-- doc/HACKING/HelpfulTools.md 15
-rw-r--r-- doc/HACKING/HowToReview.md 2
-rw-r--r-- doc/HACKING/Maintaining.md 113
-rw-r--r-- doc/HACKING/ReleasingTor.md 145
-rw-r--r-- doc/HACKING/design/00-overview.md 124
-rw-r--r-- doc/HACKING/design/01-common-utils.md 121
-rw-r--r-- doc/HACKING/design/01a-memory.md 93
-rw-r--r-- doc/HACKING/design/01b-collections.md 43
-rw-r--r-- doc/HACKING/design/01c-time.md 75
-rw-r--r-- doc/HACKING/design/01d-crypto.md 169
-rw-r--r-- doc/HACKING/design/01e-os-compat.md 50
-rw-r--r-- doc/HACKING/design/01f-threads.md 26
-rw-r--r-- doc/HACKING/design/01g-strings.md 95
-rw-r--r-- doc/HACKING/design/02-dataflow.md 236
-rw-r--r-- doc/HACKING/design/03-modules.md 247
-rw-r--r-- doc/HACKING/design/Makefile 34
-rw-r--r-- doc/HACKING/design/this-not-that.md 51
22 files changed, 1769 insertions, 165 deletions
diff --git a/doc/HACKING/CodeStructure.md b/doc/HACKING/CodeStructure.md
index 736d6cd484..fffafcaed1 100644
--- a/doc/HACKING/CodeStructure.md
+++ b/doc/HACKING/CodeStructure.md
@@ -2,128 +2,121 @@
TODO: revise this to talk about how things are, rather than how things
have changed.
-TODO: Make this into good markdown.
-
-
-
-For quite a while now, the program "tor" has been built from source
-code in just two directories: src/common and src/or.
+For quite a while now, the program *tor* has been built from source
+code in just two directories: **src/common** and **src/or**.
This has become more-or-less untenable, for a few reasons -- most
notable of which is that it has led our code to become more
spaghetti-ish than I can endorse with a clean conscience.
So to fix that, we've gone and done a huge code movement in our git
-master branch, which will land in a release once Tor 0.3.5.1-alpha is
+master branch, which will land in a release once Tor `0.3.5.1-alpha` is
out.
Here's what we did:
- * src/common has been turned into a set of static libraries. These
-all live in the "src/lib/*" directories. The dependencies between
+ * **src/common** has been turned into a set of static libraries. These
+all live in the **src/lib/*** directories. The dependencies between
these libraries should have no cycles. The libraries are:
- arch -- Headers to handle architectural differences
- cc -- headers to handle differences among compilers
- compress -- wraps zlib, zstd, lzma
- container -- high-level container types
- crypt_ops -- Cryptographic operations. Planning to split this into
+ - **arch** -- Headers to handle architectural differences
+ - **cc** -- headers to handle differences among compilers
+ - **compress** -- wraps zlib, zstd, lzma
+ - **container** -- high-level container types
+ - **crypt_ops** -- Cryptographic operations. Planning to split this into
a higher and lower level library
- ctime -- Operations that need to run in constant-time. (Properly,
+ - **ctime** -- Operations that need to run in constant-time. (Properly,
data-invariant time)
- defs -- miscelaneous definitions needed throughout Tor.
- encoding -- transforming one data type into another, and various
+ - **defs** -- miscellaneous definitions needed throughout Tor.
+ - **encoding** -- transforming one data type into another, and various
data types into strings.
- err -- lowest-level error handling, in cases where we can't use
+ - **err** -- lowest-level error handling, in cases where we can't use
the logs because something that the logging system needs has broken.
- evloop -- Generic event-loop handling logic
- fdio -- Low-level IO wrapper functions for file descriptors.
- fs -- Operations on the filesystem
- intmath -- low-level integer math and misc bit-twiddling hacks
- lock -- low-level locking code
- log -- Tor's logging module. This library sits roughly halfway up
+ - **evloop** -- Generic event-loop handling logic
+ - **fdio** -- Low-level IO wrapper functions for file descriptors.
+ - **fs** -- Operations on the filesystem
+ - **intmath** -- low-level integer math and misc bit-twiddling hacks
+ - **lock** -- low-level locking code
+ - **log** -- Tor's logging module. This library sits roughly halfway up
the library dependency diagram, since everything it depends on has to
be carefully crafted to *not* log.
- malloc -- Low-level wrappers for the platform memory allocation functions.
- math -- Higher-level mathematical functions, and floating-point math
- memarea -- An arena allocator
- meminfo -- Functions for querying the current process's memory
+ - **malloc** -- Low-level wrappers for the platform memory allocation functions.
+ - **math** -- Higher-level mathematical functions, and floating-point math
+ - **memarea** -- An arena allocator
+ - **meminfo** -- Functions for querying the current process's memory
status and resources
- net -- Networking compatibility and convenience code
- osinfo -- Querying information about the operating system
- process -- Launching and querying the status of other processes
- sandbox -- Backend for the linux seccomp2 sandbox
- smartlist_core -- The lowest-level of the smartlist_t data type.
+ - **net** -- Networking compatibility and convenience code
+ - **osinfo** -- Querying information about the operating system
+ - **process** -- Launching and querying the status of other processes
+ - **sandbox** -- Backend for the linux seccomp2 sandbox
+ - **smartlist_core** -- The lowest-level of the smartlist_t data type.
Separated from the rest of the containers library because the logging
subsystem depends on it.
- string -- Compatibility and convenience functions for manipulating
+ - **string** -- Compatibility and convenience functions for manipulating
C strings.
- term -- Terminal-related functions (currently limited to a getpass
+ - **term** -- Terminal-related functions (currently limited to a getpass
function).
- testsupport -- Macros for mocking, unit tests, etc.
- thread -- Higher-level thread compatibility code
- time -- Higher-level time management code, including format
+ - **testsupport** -- Macros for mocking, unit tests, etc.
+ - **thread** -- Higher-level thread compatibility code
+ - **time** -- Higher-level time management code, including format
conversions and monotonic time
- tls -- Our wrapper around our TLS library
- trace -- Formerly src/trace -- a generic event tracing API
- wallclock -- Low-level time code, used by the log module.
+ - **tls** -- Our wrapper around our TLS library
+ - **trace** -- Formerly src/trace -- a generic event tracing API
+ - **wallclock** -- Low-level time code, used by the log module.
- * To ensure that the dependency graph in src/common remains under
-control, there is a tool that you can run called "make
-check-includes". It verifies that each module in Tor only includes
+ * To ensure that the dependency graph in **src/common** remains under
+control, there is a tool that you can run called `make
+check-includes`. It verifies that each module in Tor only includes
the headers that it is permitted to include, using a per-directory
-".may_include" file.
+*.may_include* file.
- * The src/or/or.h header has been split into numerous smaller
+ * The **src/or/or.h** header has been split into numerous smaller
headers. Notably, many important structures are now declared in a
-header called foo_st.h, where "foo" is the name of the structure.
+header called *foo_st.h*, where "foo" is the name of the structure.
- * The src/or directory, which had most of Tor's code, had been split
+ * The **src/or** directory, which had most of Tor's code, has been split
up into several directories. This is still a work in progress: This
code has not itself been refactored, and its dependency graph is still
a tangled web. I hope we'll be working on that over the coming
releases, but it will take a while to do.
- The new top-level source directories are:
-
- src/core -- Code necessary to actually perform or use onion routing.
- src/feature -- Code used only by some onion routing
+ - The new top-level source directories are:
+ - **src/core** -- Code necessary to actually perform or use onion routing.
+ - **src/feature** -- Code used only by some onion routing
configurations, or only for a special purpose.
- src/app -- Top-level code to run, invoke, and configure the
+ - **src/app** -- Top-level code to run, invoke, and configure the
lower-level code
- The new second-level source directories are:
- src/core/crypto -- High-level cryptographic protocols used in Tor
- src/core/mainloop -- Tor's event loop, connection-handling, and
+ - The new second-level source directories are:
+ - **src/core/crypto** -- High-level cryptographic protocols used in Tor
+ - **src/core/mainloop** -- Tor's event loop, connection-handling, and
traffic-routing code.
- src/core/or -- Parts related to handling onion routing itself
- src/core/proto -- support for encoding and decoding different
+ - **src/core/or** -- Parts related to handling onion routing itself
+ - **src/core/proto** -- support for encoding and decoding different
wire protocols
-
- src/feature/api -- Support for making Tor embeddable
- src/feature/client -- Functionality which only Tor clients need
- src/feature/control -- Controller implementation
- src/feature/dirauth -- Directory authority
- src/feature/dircache -- Directory cache
- src/feature/dirclient -- Directory client
- src/feature/dircommon -- Shared code between the other directory modules
- src/feature/hibernate -- Hibernating when Tor is out of bandwidth
+ - **src/feature/api** -- Support for making Tor embeddable
+ - **src/feature/client** -- Functionality which only Tor clients need
+ - **src/feature/control** -- Controller implementation
+ - **src/feature/dirauth** -- Directory authority
+ - **src/feature/dircache** -- Directory cache
+ - **src/feature/dirclient** -- Directory client
+ - **src/feature/dircommon** -- Shared code between the other directory modules
+ - **src/feature/hibernate** -- Hibernating when Tor is out of bandwidth
or shutting down
- src/feature/hs -- v3 onion service implementation
- src/feature/hs_common -- shared code between both onion service
+ - **src/feature/hs** -- v3 onion service implementation
+ - **src/feature/hs_common** -- shared code between both onion service
implementations
- src/feature/nodelist -- storing and accessing the list of relays on
+ - **src/feature/nodelist** -- storing and accessing the list of relays on
the network.
- src/feature/relay -- code that only relay servers and exit servers need.
- src/feature/rend -- v2 onion service implementation
- src/feature/stats -- statistics and history
-
- src/app/config -- configuration and state for Tor
- src/app/main -- Top-level functions to invoke the rest or Tor.
+ - **src/feature/relay** -- code that only relay servers and exit servers need.
+ - **src/feature/rend** -- v2 onion service implementation
+ - **src/feature/stats** -- statistics and history
+ - **src/app/config** -- configuration and state for Tor
+ - **src/app/main** -- Top-level functions to invoke the rest of Tor.
- * The "tor" executable is now built in src/app/tor rather than src/or/tor.
+ * The `tor` executable is now built in **src/app/tor** rather than **src/or/tor**.
* There are more static libraries than before that you need to build
into your application if you want to embed Tor. Rather than
-maintaining this list yourself, I recommend that you run "make
-show-libs" to have Tor emit a list of what you need to link.
+maintaining this list yourself, I recommend that you run `make
+show-libs` to have Tor emit a list of what you need to link.
diff --git a/doc/HACKING/CodingStandards.md b/doc/HACKING/CodingStandards.md
index 4f229348e4..2c273910d1 100644
--- a/doc/HACKING/CodingStandards.md
+++ b/doc/HACKING/CodingStandards.md
@@ -42,6 +42,7 @@ If you have changed build system components:
- For example, if you have changed Makefiles, autoconf files, or anything
else that affects the build system.
+
License issues
==============
@@ -58,7 +59,6 @@ Some compatible licenses include:
- CC0 Public Domain Dedication
-
How we use Git branches
=======================
@@ -99,29 +99,65 @@ When you do a commit that needs a ChangeLog entry, add a new file to
the `changes` toplevel subdirectory. It should have the format of a
one-entry changelog section from the current ChangeLog file, as in
-- Major bugfixes:
+ o Major bugfixes (security):
- Fix a potential buffer overflow. Fixes bug 99999; bugfix on
0.3.1.4-beta.
+ o Minor features (performance):
+ - Make tor faster. Closes ticket 88888.
To write a changes file, first categorize the change. Some common categories
-are: Minor bugfixes, Major bugfixes, Minor features, Major features, Code
-simplifications and refactoring. Then say what the change does. If
-it's a bugfix, mention what bug it fixes and when the bug was
-introduced. To find out which Git tag the change was introduced in,
-you can use `git describe --contains <sha1 of commit>`.
-
-If at all possible, try to create this file in the same commit where you are
-making the change. Please give it a distinctive name that no other branch will
-use for the lifetime of your change. To verify the format of the changes file,
-you can use `make check-changes`. This is run automatically as part of
-`make check` -- if it fails, we must fix it before we release. These
-checks are implemented in `scripts/maint/lintChanges.py`.
+are:
+ o Minor bugfixes (subheading):
+ o Major bugfixes (subheading):
+ o Minor features (subheading):
+ o Major features (subheading):
+ o Code simplifications and refactoring:
+ o Testing:
+ o Documentation:
+
+The subheading is a particular area within Tor. See the ChangeLog for
+examples.
+
+Then say what the change does. If it's a bugfix, mention what bug it fixes
+and when the bug was introduced. To find out which Git tag the change was
+introduced in, you can use `git describe --contains <sha1 of commit>`.
+If you don't know the commit, you can search the git diffs (-S) for the first
+instance of the feature (--reverse).
+
+For example, for #30224, we wanted to know when the bridge-distribution-request
+feature was introduced into Tor:
+ $ git log -S bridge-distribution-request --reverse
+ commit ebab521525
+ Author: Roger Dingledine <arma@torproject.org>
+ Date: Sun Nov 13 02:39:16 2016 -0500
+
+ Add new BridgeDistribution config option
+
+ $ git describe --contains ebab521525
+ tor-0.3.2.3-alpha~15^2~4
+
+If you need to know all the Tor versions that contain a commit, use:
+ $ git tag --contains 9f2efd02a1 | sort -V
+ tor-0.2.5.16
+ tor-0.2.8.17
+ tor-0.2.9.14
+ tor-0.2.9.15
+ ...
+ tor-0.3.0.13
+ tor-0.3.1.9
+ tor-0.3.1.10
+ ...
+
+If at all possible, try to create the changes file in the same commit where
+you are making the change. Please give it a distinctive name that no other
+branch will use for the lifetime of your change. We usually use "ticketNNNNN"
+or "bugNNNNN", where NNNNN is the ticket number. To verify the format of the
+changes file, you can use `make check-changes`. This is run automatically as
+part of `make check` -- if it fails, we must fix it as soon as possible, so
+that our CI passes. These checks are implemented in
+`scripts/maint/lintChanges.py`.
Changes file style guide:
- * Changes files begin with " o Header (subheading):". The header
- should usually be "Minor/Major bugfixes/features". The subheading is a
- particular area within Tor. See the ChangeLog for examples.
-
* Make everything terse.
* Write from the user's point of view: describe the user-visible changes
@@ -314,7 +350,6 @@ for more information about trunnel.
For information on adding new trunnel code to Tor, see src/trunnel/README
-
Calling and naming conventions
------------------------------
@@ -422,7 +457,6 @@ to use it as a function callback), define it with a name like
abc_free_(obj);
}
-
Doxygen comment conventions
---------------------------
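The changes-file workflow added to CodingStandards.md above can be sketched as a small shell script. This is a minimal sketch, not the project's tooling: it writes the example entry from the hunk into a scratch directory standing in for a checkout, names the file after a hypothetical ticket (`ticket88888`), assumes the indentation mirrors the ChangeLog, and applies a rough header check whose regular expression is an assumption of this sketch.

```shell
#!/bin/sh
# Sketch: create a changes file and sanity-check the shape of its header line.
# "ticket88888" and the scratch directory are placeholders for illustration.
set -eu

workdir=$(mktemp -d)            # stand-in for a Tor checkout
mkdir -p "$workdir/changes"
changes_file="$workdir/changes/ticket88888"

# One-entry changelog section, copied from the example in the hunk above.
cat > "$changes_file" <<'EOF'
  o Minor features (performance):
    - Make tor faster. Closes ticket 88888.
EOF

# Rough local check of the "o Header (subheading):" shape only.
# This regex is an assumption; lintChanges.py is the authoritative checker.
if grep -Eq '^  o [A-Z][A-Za-z ]+( \([a-z ,-]+\))?:$' "$changes_file"; then
  header_ok=yes
else
  header_ok=no
fi
echo "header_ok=$header_ok"
```

In a real checkout, the file would go under the top-level `changes` directory, and `make check-changes` would replace the grep.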
diff --git a/doc/HACKING/CodingStandardsRust.md b/doc/HACKING/CodingStandardsRust.md
index fc562816db..b570e10dc7 100644
--- a/doc/HACKING/CodingStandardsRust.md
+++ b/doc/HACKING/CodingStandardsRust.md
@@ -256,7 +256,7 @@ Here are some additional bits of advice and rules:
or 2) should fail (i.e. in a unittest).
You SHOULD NOT use `unwrap()` anywhere in which it is possible to handle the
- potential error with either `expect()` or the eel operator, `?`.
+ potential error with the eel operator, `?`, or in another non-panicking way.
For example, consider a function which parses a string into an integer:
fn parse_port_number(config_string: &str) -> u16 {
@@ -264,12 +264,12 @@ Here are some additional bits of advice and rules:
}
There are numerous ways this can fail, and the `unwrap()` will cause the
- whole program to byte the dust! Instead, either you SHOULD use `expect()`
+ whole program to byte the dust! Instead, either you SHOULD use `ok()`
(or another equivalent function which will return an `Option` or a `Result`)
and change the return type to be compatible:
fn parse_port_number(config_string: &str) -> Option<u16> {
- u16::from_str_radix(config_string, 10).expect("Couldn't parse port into a u16")
+ u16::from_str_radix(config_string, 10).ok()
}
or you SHOULD use `or()` (or another similar method):
diff --git a/doc/HACKING/EndOfLifeTor.md b/doc/HACKING/EndOfLifeTor.md
new file mode 100644
index 0000000000..2fece2ca9d
--- /dev/null
+++ b/doc/HACKING/EndOfLifeTor.md
@@ -0,0 +1,50 @@
+
+End of Life on an old release series
+------------------------------------
+
+Here are the steps that the maintainer should take when an old Tor release
+series reaches End of Life. Note that they are _only_ for entire series that
+have reached their planned EOL: they do not apply to security-related
+deprecations of individual versions.
+
+### 0. Preliminaries
+
+0. A few months before End of Life:
+ Write a deprecation announcement.
+ Send the announcement out with every new release announcement.
+
+1. A month before End of Life:
+ Send the announcement to tor-announce, tor-talk, tor-relays, and the
+ packagers.
+
+### 1. On the day
+
+1. Open tickets to remove the release from:
+ - the jenkins builds
+ - tor's Travis CI cron jobs
+ - chutney's Travis CI tests (#)
+ - stem's Travis CI tests (#)
+
+2. Close the milestone in Trac. To do this, go to Trac, log in,
+ select "Admin" near the top of the screen, then select "Milestones" from
+ the menu on the left. Click on the milestone for this version, and
+ select the "Completed" checkbox. By convention, we select the date as
+ the End of Life date.
+
+3. Replace NNN-backport with NNN-unreached-backport in all open trac tickets.
+
+4. If there are any remaining tickets in the milestone:
+ - merge_ready tickets are for backports:
+ - if there are no supported releases for the backport, close the ticket
+ - if there is an earlier (LTS) release for the backport, move the ticket
+ to that release
+ - other tickets should be closed (if we won't fix them) or moved to a
+ supported release (if we will fix them)
+
+5. Mail the end of life announcement to tor-announce, the packagers list,
+ and tor-relays. The current list of packagers is in ReleasingTor.md.
+
+6. Ask at least two of weasel/arma/Sebastian to remove the old version
+ number from their approved versions list.
+
+7. Update the CoreTorReleases wiki page.
diff --git a/doc/HACKING/Fuzzing.md b/doc/HACKING/Fuzzing.md
index 2039d6a4c0..c2db7e9853 100644
--- a/doc/HACKING/Fuzzing.md
+++ b/doc/HACKING/Fuzzing.md
@@ -1,6 +1,6 @@
-= Fuzzing Tor
+# Fuzzing Tor
-== The simple version (no fuzzing, only tests)
+## The simple version (no fuzzing, only tests)
Check out fuzzing-corpora, and set TOR_FUZZ_CORPORA to point to the place
where you checked it out.
@@ -12,7 +12,7 @@ This won't actually fuzz Tor! It will just run all the fuzz binaries
on our existing set of testcases for the fuzzer.
-== Different kinds of fuzzing
+## Different kinds of fuzzing
Right now we support three different kinds of fuzzer.
@@ -37,7 +37,7 @@ In all cases, you'll need some starting examples to give the fuzzer when it
starts out. There's a set in the "fuzzing-corpora" git repository. Try
setting TOR_FUZZ_CORPORA to point to a checkout of that repository
-== Writing Tor fuzzers
+## Writing Tor fuzzers
A tor fuzzing harness should have:
* a fuzz_init() function to set up any necessary global state.
@@ -52,7 +52,7 @@ bug, or accesses memory it shouldn't. This helps fuzzing frameworks detect
"interesting" cases.
-== Guided Fuzzing with AFL
+## Guided Fuzzing with AFL
There is no HTTPS, hash, or signature for American Fuzzy Lop's source code, so
its integrity can't be verified. That said, you really shouldn't fuzz on a
@@ -101,7 +101,7 @@ macOS (OS X) requires slightly more preparation, including:
* using afl-clang (or afl-clang-fast from the llvm directory)
* disabling external crash reporting (AFL will guide you through this step)
-== Triaging Issues
+## Triaging Issues
Crashes are usually interesting, particularly if using AFL_HARDEN=1 and --enable-expensive-hardening. Sometimes crashes are due to bugs in the harness code.
@@ -115,7 +115,7 @@ To see what fuzz-http is doing with a test case, call it like this:
(Logging is disabled while fuzzing to increase fuzzing speed.)
-== Reporting Issues
+## Reporting Issues
Please report any issues discovered using the process in Tor's security issue
policy:
diff --git a/doc/HACKING/HelpfulTools.md b/doc/HACKING/HelpfulTools.md
index d499238526..cba57e875d 100644
--- a/doc/HACKING/HelpfulTools.md
+++ b/doc/HACKING/HelpfulTools.md
@@ -371,3 +371,18 @@ source code. Here's how to use it:
6. See the Doxygen manual for more information; this summary just
scratches the surface.
+
+Style and best-practices checking
+---------------------------------
+
+We use scripts to check for various problems in the formatting and style
+of our source code. The "check-spaces" test detects a bunch of violations
+of our coding style on the local level. The "check-best-practices" test
+looks for violations of some of our complexity guidelines.
+
+You can tell the tool about exceptions to the complexity guidelines via its
+exceptions file (scripts/maint/practracker/exceptions.txt). But before you
+do this, consider whether you shouldn't fix the underlying problem. Maybe
+that file really _is_ too big. Maybe that function really _is_ doing too
+much. (On the other hand, for stable release series, it is sometimes better
+to leave things unrefactored.)
diff --git a/doc/HACKING/HowToReview.md b/doc/HACKING/HowToReview.md
index 2d1f3d1c9e..2325e70175 100644
--- a/doc/HACKING/HowToReview.md
+++ b/doc/HACKING/HowToReview.md
@@ -63,7 +63,7 @@ Let's look at the code!
Let's look at the documentation!
--------------------------------
-- Does the documentation confirm to CodingStandards.txt?
+- Does the documentation conform to CodingStandards.txt?
- Does it make sense?
diff --git a/doc/HACKING/Maintaining.md b/doc/HACKING/Maintaining.md
new file mode 100644
index 0000000000..4d5a7f6b76
--- /dev/null
+++ b/doc/HACKING/Maintaining.md
@@ -0,0 +1,113 @@
+# Maintaining Tor
+
+This document details the duties and processes for maintaining the Tor code
+base.
+
+The first section describes who the current Tor maintainer is and what their
+responsibilities are. Tor has a single main maintainer, but it also has many
+committers and subsystem maintainers.
+
+The second section describes how the **alpha and master** branches are
+maintained and by whom.
+
+Finally, the last section describes how the **stable** branches are maintained
+and by whom.
+
+This document does not cover how Tor is released; please see
+[ReleasingTor.md](ReleasingTor.md) for that information.
+
+## Tor Maintainer
+
+The current maintainer is Nick Mathewson <nickm@torproject.org>.
+
+The maintainer makes the final decisions on engineering, architecture, and
+protocol design. Releasing Tor falls under their responsibility.
+
+## Alpha and Master Branches
+
+The Tor repository always has a **master** branch, which contains ongoing
+upstream development.
+
+It may also contain a branch for a released, feature-frozen version, which is
+called the **alpha** branch. Its git tag and version number are always
+suffixed with `-alpha[-dev]`. For example: `tor-0.3.5.0-alpha-dev` or
+`tor-0.3.5.3-alpha`.
+
+Tor is separated into subsystems, and some of those are maintained by
+developers other than the main maintainer. Those people have commit access to
+the code base but only commit (in most cases) into the subsystem they maintain.
+
+Upstream merges are restricted to the alpha and master branches. Subsystem
+maintainers should never push a patch into a stable branch; that is the
+responsibility of the [stable branch maintainer](#stable-branches).
+
+### Who
+
+In alphabetical order, the following people have upstream commit access and
+maintain the following subsystems:
+
+- David Goulet <dgoulet@torproject.org>
+ * Onion Service (including Shared Random).
+ ***keywords:*** *[tor-hs]*
+ * Channels, Circuitmux, Connection, Scheduler.
+ ***keywords:*** *[tor-chan, tor-cmux, tor-sched, tor-conn]*
+ * Cell Logic (Handling/Parsing).
+ ***keywords:*** *[tor-cell]*
+ * Threading backend.
+ ***keywords:*** *[tor-thread]*
+
+- George Kadianakis <asn@torproject.org>
+ * Onion Service (including Shared Random).
+ ***keywords:*** *[tor-hs]*
+ * Guard.
+ ***keywords:*** *[tor-guard]*
+ * Pluggable Transport (excluding Bridge networking).
+ ***keywords:*** *[tor-pt]*
+
+### Tasks
+
+These are the tasks of a subsystem maintainer:
+
+1. Regularly go over `merge_ready` tickets relevant to the related subsystem
+ and for the current alpha or development (master branch) Milestone.
+
+2. A subsystem maintainer is expected to contribute to any design changes
+ (including proposals) or large patch sets affecting the subsystem.
+
+3. Leave their ego at the door. Mistakes will be made, but they have to be
+ taken care of seriously. Learn and move on quickly.
+
+### Merging Policy
+
+These are a few important items to follow when merging code upstream:
+
+1. To merge code upstream, the patch must have passed our CI (currently
+ github.com/torproject), have a corresponding ticket, and have been reviewed
+ by **at least** one person who is not the original coder.
+
+ Example A: If Alice writes a patch, then Bob, a Tor network team member,
+ reviews it and flags it `merge_ready`. The maintainer is then required
+ to look at the patch and make a decision.
+
+ Example B: If the maintainer writes a patch then Bob, a Tor network
+ team member, reviews it and flags it `merge_ready`, then the maintainer
+ can merge the code upstream.
+
+2. The maintainer makes sure that the commit message describes what was fixed
+ and, if it applies, how it was fixed. It should also always refer to
+ the ticket number.
+
+3. Trivial patches such as comment changes, documentation fixes, syntax issues,
+ or typos can be merged without a ticket or reviewers.
+
+4. Tor uses the "merge forward" method, that is, if a patch applies to the
+ alpha branch, it has to be merged there first and then merged forward
+ into master.
+
+5. The maintainer should always consult with the network team about any doubts,
+ misunderstandings, or unknowns of a patch. The final word will always go to
+ the main Tor maintainer.
+
+## Stable Branches
+
+(Currently being drafted and reviewed by the network team.)
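The "merge forward" rule from the merging policy above can be illustrated in a throwaway repository. This is a sketch under stated assumptions: the branch names `maint-alpha` and `ticket99999`, the file name, and the commit messages are all hypothetical, since the document does not say how the alpha branch is named.

```shell
#!/bin/sh
# Sketch: merge a fix into the alpha branch first, then merge alpha forward
# into master. All names below are made up for illustration.
set -eu

if ! command -v git >/dev/null 2>&1; then
  echo "git not found; skipping sketch"
  exit 0
fi

repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is "master"
git config user.name "Example Committer"
git config user.email "committer@example.invalid"

echo "base" > file.txt
git add file.txt
git commit -q -m "Base commit"
git branch maint-alpha                    # hypothetical alpha branch

# The patch is written on a topic branch based on the alpha branch.
git checkout -q -b ticket99999 maint-alpha
echo "fix" >> file.txt
git commit -q -a -m "Fix example bug. Fixes bug 99999."
fix_commit=$(git rev-parse HEAD)

# Step 1: merge into the alpha branch first.
git checkout -q maint-alpha
git merge -q --no-ff -m "Merge branch 'ticket99999' into maint-alpha" ticket99999

# Step 2: merge the alpha branch forward into master.
git checkout -q master
git merge -q --no-ff -m "Merge branch 'maint-alpha'" maint-alpha

git log --oneline --graph master
```

Because the fix reaches master only through the alpha branch, both branches end up containing the same commit rather than two separately applied copies of the patch.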
diff --git a/doc/HACKING/ReleasingTor.md b/doc/HACKING/ReleasingTor.md
index 55a40fc89b..f40e2af573 100644
--- a/doc/HACKING/ReleasingTor.md
+++ b/doc/HACKING/ReleasingTor.md
@@ -5,7 +5,7 @@ Putting out a new release
Here are the steps that the maintainer should take when putting out a
new Tor release:
-=== 0. Preliminaries
+### 0. Preliminaries
1. Get at least two of weasel/arma/Sebastian to put the new
version number in their approved versions list. Give them a few
@@ -18,35 +18,41 @@ new Tor release:
date of a TB that contains it. See note below in "commit, upload,
announce".
-=== I. Make sure it works
+### I. Make sure it works
-1. Use it for a while, as a client, as a relay, as a hidden service,
- and as a directory authority. See if it has any obvious bugs, and
- resolve those.
+1. Make sure that CI passes: have a look at Travis
+ (https://travis-ci.org/torproject/tor/branches), Appveyor
+ (https://ci.appveyor.com/project/torproject/tor/history), and
+ Jenkins (https://jenkins.torproject.org/view/tor/).
+ Make sure you're looking at the right branches.
- As applicable, merge the `maint-X` branch into the `release-X` branch.
- But you've been doing that all along, right?
+ If there are any unexplained failures, try to fix them or figure them
+ out.
-2. Are all of the jenkins builders happy? See jenkins.torproject.org.
+2. Verify that there are no big outstanding issues. You might find such
+ issues --
- What about the bsd buildbots?
- See http://buildbot.pixelminers.net/builders/
+ * On Trac
- What about Coverity Scan?
+ * On coverity scan
- What about clang scan-build?
+ * On OSS-Fuzz
- Does 'make distcheck' complain?
+3. Run checks that aren't covered above, including:
- How about 'make test-stem' and 'make test-network' and
- `make test-network-full`?
+ * clang scan-build. (See the script in ./scripts/test/scan_build.sh)
- - Are all those tests still happy with --enable-expensive-hardening ?
+ * make test-network and make test-network-all (with
+ --enable-fragile-hardening)
- Any memory leaks?
+ * Running Tor yourself and making sure that it actually works for you.
+ * Running Tor under valgrind. (Our 'fragile hardening' doesn't cover
+ libevent and openssl, so using valgrind will sometimes find extra
+ memory leaks.)
-=== II. Write a changelog
+
+### II. Write a changelog
1a. (Alpha release variant)
@@ -55,11 +61,14 @@ new Tor release:
of them and reordering to focus on what users and funders would find
interesting and understandable.
- To do this, first run `./scripts/maint/lintChanges.py changes/*` and
- fix as many warnings as you can. Then run `./scripts/maint/sortChanges.py
- changes/* > changelog.in` to combine headings and sort the entries.
- After that, it's time to hand-edit and fix the issues that lintChanges
- can't find:
+ To do this, run
+ `./scripts/maint/sortChanges.py changes/* > changelog.in`
+ to combine headings and sort the entries. Copy the changelog.in file
+ into the ChangeLog. Run 'format_changelog.py' (see below) to clean
+ up the line breaks.
+
+ After that, it's time to hand-edit and fix the issues that
+ lintChanges can't find:
1. Within each section, sort by "version it's a bugfix on", else by
numerical ticket order.
@@ -68,8 +77,6 @@ new Tor release:
Make stuff very terse
- Make sure each section name ends with a colon
-
Describe the user-visible problem right away
Mention relevant config options by name. If they're rare or unusual,
@@ -79,7 +86,9 @@ new Tor release:
Present and imperative tense: not past.
- 'Relays', not 'servers' or 'nodes' or 'Tor relays'.
+ "Relays", not "servers" or "nodes" or "Tor relays".
+
+ "Onion services", not "hidden services".
"Stop FOOing", not "Fix a bug where we would FOO".
@@ -100,12 +109,14 @@ new Tor release:
For stable releases that backport things from later releases, we try to
make sure that we keep the changelog entries
- identical to their original versions, with a 'backport from 0.x.y.z'
+ identical to their original versions, with a "backport from 0.x.y.z"
note added to each section. So in this case, once you have the items
from the changes files copied together, don't use them to build a new
changelog: instead, look up the corrected versions that were merged
into ChangeLog in the master branch, and use those.
+ Add "backport from X.Y.Z" in the section header for these entries.
+
2. Compose a short release blurb to highlight the user-facing
changes. Insert said release blurb into the ChangeLog stanza. If it's
a stable release, add it to the ReleaseNotes file too. If we're adding
@@ -128,47 +139,61 @@ new Tor release:
text of existing entries, though.)
-=== III. Making the source release.
+### III. Making the source release.
1. In `maint-0.?.x`, bump the version number in `configure.ac` and run
- `perl scripts/maint/updateVersions.pl` to update version numbers in other
+ `make update-versions` to update version numbers in other
places, and commit. Then merge `maint-0.?.x` into `release-0.?.x`.
- (NOTE: To bump the version number, edit `configure.ac`, and then run
- either `make`, or `perl scripts/maint/updateVersions.pl`, depending on
- your version.)
-
When you merge the maint branch forward to the next maint branch, or into
master, merge it with "-s ours" to avoid a needless version bump.
2. Make distcheck, put the tarball up in somewhere (how about your
- homedir on your homedir on people.torproject.org?) , and tell `#tor`
- about it. Wait a while to see if anybody has problems building it.
- (Though jenkins is usually pretty good about catching these things.)
+ homedir on people.torproject.org?), and tell `#tor-dev`
+ about it.
+
+ If you want, wait until at least one person has built it
+ successfully. (We used to say "wait for others to test it", but our
+ CI has successfully caught these kinds of errors for the last several
+ years.)
+
-=== IV. Commit, upload, announce
+3. Make sure that the new version is recommended in the latest consensus.
+ (Otherwise, users will get confused when it complains to them
+ about its status.)
+
+ If it is not, you'll need to poke Roger, Weasel, and Sebastian again: see
+ item 0.1 at the start of this document.
+
+### IV. Commit, upload, announce
1. Sign the tarball, then sign and push the git tag:
gpg -ba <the_tarball>
- git tag -u <keyid> tor-0.3.x.y-status
- git push origin tag tor-0.3.x.y-status
+ git tag -s tor-0.4.x.y-<status>
+ git push origin tag tor-0.4.x.y-<status>
- (You must do this before you update the website: it relies on finding
- the version by tag.)
+ (You must do this before you update the website: the website scripts
+ rely on finding the version by tag.)
+
+ (If your default PGP key is not the one you want to sign with, then say
+ "-u <keyid>" instead of "-s".)
2. scp the tarball and its sig to the dist website, i.e.
- `/srv/dist-master.torproject.org/htdocs/` on dist-master. When you want
- it to go live, you run "static-update-component dist.torproject.org"
- on dist-master.
+ `/srv/dist-master.torproject.org/htdocs/` on dist-master. Run
+ "static-update-component dist.torproject.org" on dist-master.
- In the webwml.git repository, `include/versions.wmi` and `Makefile`
- to note the new version.
+ In the webwml.git repository, update `include/versions.wmi` and `Makefile`.
+ In the project/web/tpo.git repository, update `databags/versions.ini`
+ to note the new version. Push these changes to master.
(NOTE: Due to #17805, there can only be one stable version listed at
once. Nonetheless, do not call your version "alpha" if it is stable,
or people will get confused.)
+ (NOTE: It will take a while for the website update scripts to update
+ the website.)
+
3. Email the packagers (cc'ing tor-team) that a new tarball is up.
The current list of packagers is:
@@ -186,37 +211,47 @@ new Tor release:
Also, email tor-packagers@lists.torproject.org.
+ Mention where to download the tarball (https://dist.torproject.org).
+
+ Include a link to the changelog.
+
+
4. Add the version number to Trac. To do this, go to Trac, log in,
select "Admin" near the top of the screen, then select "Versions" from
the menu on the left. At the right, there will be an "Add version"
box. By convention, we enter the version in the form "Tor:
- 0.2.2.23-alpha" (or whatever the version is), and we select the date as
+ 0.4.0.1-alpha" (or whatever the version is), and we select the date as
the date in the ChangeLog.
-5. Double-check: did the version get recommended in the consensus yet? Is
- the website updated? If not, don't announce until they have the
- up-to-date versions, or people will get confused.
+5. Wait for the download page to be updated. (If you don't do this before you
+ announce, people will be confused.)
6. Mail the release blurb and ChangeLog to tor-talk (development release) or
tor-announce (stable).
Post the changelog on the blog as well. You can generate a
- blog-formatted version of the changelog with the -B option to
- format-changelog.
+ blog-formatted version of the changelog with
+ `./scripts/maint/format_changelog.py -B`
When you post, include an estimate of when the next TorBrowser
releases will come out that include this Tor release. This will
usually track https://wiki.mozilla.org/RapidRelease/Calendar , but it
can vary.
+ For templates to use when announcing, see:
+ https://trac.torproject.org/projects/tor/wiki/org/teams/NetworkTeam/AnnouncementTemplates
-=== V. Aftermath and cleanup
+### V. Aftermath and cleanup
1. If it's a stable release, bump the version number in the
`maint-x.y.z` branch to "newversion-dev", and do a `merge -s ours`
merge to avoid taking that change into master.
-2. Forward-port the ChangeLog (and ReleaseNotes if appropriate).
+2. If there is a new `maint-x.y.z` branch, create a Travis CI cron job that
+ builds the release every week. (It's ok to skip the weekly build if the
+ branch was updated in the last 24 hours.)
-3. Keep an eye on the blog post, to moderate comments and answer questions.
+3. Forward-port the ChangeLog (and ReleaseNotes if appropriate) to the
+ master branch.
+4. Keep an eye on the blog post, to moderate comments and answer questions.
diff --git a/doc/HACKING/design/00-overview.md b/doc/HACKING/design/00-overview.md
new file mode 100644
index 0000000000..2103a9062a
--- /dev/null
+++ b/doc/HACKING/design/00-overview.md
@@ -0,0 +1,124 @@
+
+## Overview ##
+
+This document describes the general structure of the Tor codebase, how
+it fits together, what functionality is available for extending Tor,
+and gives some notes on how Tor got that way.
+
+Tor remains a work in progress: We've been working on it for more than a
+decade, and we've learned a lot about good coding since we first
+started. This means, however, that some of the older pieces of Tor will
+have some "code smell" in them that could sure stand a brisk
+refactoring. So when I describe a piece of code, I'll sometimes give a
+note on how it got that way, and whether I still think that's a good
+idea.
+
+The first drafts of this document were written in the Summer and Fall of
+2015, when Tor 0.2.6 was the most recent stable version, and Tor 0.2.7
+was under development. If you're reading this far in the future, some
+things may have changed. Caveat haxxor!
+
+This document is not an overview of the Tor protocol. For that, see the
+design paper and the specifications at https://spec.torproject.org/ .
+
+For more information about Tor's coding standards and some helpful
+development tools, see doc/HACKING in the Tor repository.
+
+For more information about writing tests, see doc/HACKING/WritingTests.md
+in the Tor repository.
+
+### The very high level ###
+
+Ultimately, Tor runs as an event-driven network daemon: it responds to
+network events, signals, and timers by sending and receiving things over
+the network. Clients, relays, and directory authorities all use the
+same codebase: the Tor process will run as a client, relay, or authority
+depending on its configuration.
+
+Tor has a few major dependencies, including Libevent (used to tell which
+sockets are readable and writable), OpenSSL (used for many encryption
+functions, and to implement the TLS protocol), and zlib (used to
+compress and uncompress directory information).
+
+Most of Tor's work today is done in a single event-driven main thread.
+Tor also spawns one or more worker threads to handle CPU-intensive
+tasks. (Right now, this only includes circuit encryption.)
+
+On startup, Tor initializes its libraries, reads and responds to its
+configuration files, and launches a main event loop. At first, the only
+events that Tor listens for are a few signals (like TERM and HUP), and
+one or more listener sockets (for different kinds of incoming
+connections). Tor also configures a timer function to run once per
+second to handle periodic events. As Tor runs over time, other events
+will open, and new events will be scheduled.
+
+The codebase is divided into a few main subdirectories:
+
+ src/common -- utility functions, not necessarily tor-specific.
+
+ src/or -- implements the Tor protocols.
+
+ src/test -- unit and regression tests
+
+ src/ext -- Code maintained elsewhere that we include in the Tor
+ source distribution.
+
+ src/trunnel -- automatically generated code (from the Trunnel
+ tool): used to parse and encode binary formats.
+
+### Some key high-level abstractions ###
+
+The most important abstractions at Tor's high level are Connections,
+Channels, Circuits, and Nodes.
+
+A 'Connection' represents a stream-based information flow. Most
+connections are TCP connections to remote Tor servers and clients. (But
+as a shortcut, a relay will sometimes make a connection to itself
+without actually using a TCP connection. More details later on.)
+Connections exist in different varieties, depending on what
+functionality they provide. The principal types of connection are
+"edge" (e.g., a SOCKS connection or a connection from an exit relay to a
+destination), "OR" (a TLS stream connecting to a relay), "Directory" (an
+HTTP connection to learn about the network), and "Control" (a connection
+from a controller).
+
+A 'Circuit' is a persistent tunnel through the Tor network, established
+with public-key cryptography, and used to send cells one or more hops.
+Clients keep track of multi-hop circuits, and the cryptography
+associated with each hop. Relays, on the other hand, keep track only of
+their hop of each circuit.
+
+A 'Channel' is an abstract view of sending cells to and from a Tor
+relay. Currently, all channels are implemented using OR connections.
+If we switch to other strategies in the future, we'll have more
+connection types.
+
+A 'Node' is a view of a Tor instance's current knowledge and opinions
+about a Tor relay or bridge.
+
+### The rest of this document. ###
+
+> **Note**: This section describes the eventual organization of this
+> document, which is not yet complete.
+
+We'll begin with an overview of the various utility functions available
+in Tor's 'common' directory. Knowing about these is key to writing
+portable, simple code in Tor.
+
+Then we'll go on and talk about the main data-flow of the Tor network:
+how Tor generates and responds to network traffic. This will occupy a
+chapter for the main overview, with other chapters for special topics.
+
+After that, we'll mention the main modules in Tor, and describe the
+function of each.
+
+We'll cover the directory subsystem next: how Tor learns about other
+relays, and how relays advertise themselves.
+
+Then we'll cover a few specialized modules, such as hidden services,
+sandboxing, hibernation, accounting, statistics, guards, path
+generation, pluggable transports, and how they integrate with the rest of Tor.
+
+We'll close with a meandering overview of important pending issues in
+the Tor codebase, and how they affect the future of the Tor software.
+
diff --git a/doc/HACKING/design/01-common-utils.md b/doc/HACKING/design/01-common-utils.md
new file mode 100644
index 0000000000..79a6a7b7d3
--- /dev/null
+++ b/doc/HACKING/design/01-common-utils.md
@@ -0,0 +1,121 @@
+
+## Utility code in Tor
+
+Most of Tor's utility code is in modules in the src/common subdirectory.
+
+These are divided, broadly, into _compatibility_ functions, _utility_
+functions, _containers_, and _cryptography_. (Someday in the future, it
+would be great to split these modules into separate directories. Also, some
+functions are probably put in the wrong modules.)
+
+### Compatibility code
+
+These functions live in src/common/compat\*.c; some corresponding macros live
+in src/common/compat\*.h. They serve as wrappers around platform-specific or
+compiler-specific functionality.
+
+In general, the rest of the Tor code *should not* be calling platform-specific
+or otherwise non-portable functions. Instead, they should call wrappers from
+compat.c, which implement a common cross-platform API. (If you don't know
+whether a function is portable, it's usually good enough to see whether it
+exists on OSX, Linux, and Windows.)
+
+Other compatibility modules include backtrace.c, which generates stack traces
+for crash reporting; sandbox.c, which implements the Linux seccomp2 sandbox;
+and procmon.c, which handles monitoring a child process.
+
+Parts of address.c are compatibility code for handling network addressing
+issues; other parts are in util.c.
+
+Notable compatibility areas are:
+
+ * mmap support for mapping files into the address space (read-only)
+
+ * Code to work around the intricacies
+
+ * Workaround code for Windows's horrible winsock incompatibilities and
+ Linux's intricate socket extensions.
+
+ * Helpful string functions like memmem, memstr, asprintf, strlcpy, and
+ strlcat that not all platforms have.
+
+ * Locale-ignoring variants of the ctypes functions.
+
+ * Time-manipulation functions
+
+ * File locking function
+
+ * IPv6 functions for platforms that don't have enough IPv6 support
+
+ * Endianness functions
+
+ * OS functions
+
+ * Threading and locking functions.
+
+### Utility functions
+
+General-purpose utilities are in util.c; they include higher-level wrappers
+around many of the compatibility functions to provide things like
+file-at-once access, memory management functions, math, string manipulation,
+time manipulation, filesystem manipulation, etc.
+
+(Some functionality, like daemon-launching, would be better off in a
+compatibility module.)
+
+In util_format.c, we have code to implement stuff like base-32 and base-64
+encoding.
+
+The address.c module interfaces with the system resolver and implements
+address parsing and formatting functions. It converts sockaddrs to and from
+a more compact tor_addr_t type.
+
+The di_ops.c module provides constant-time comparison and associative-array
+operations, for side-channel avoidance.
+
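As a rough illustration of what "constant-time comparison" means here, the sketch below compares two buffers without returning early at the first mismatch. The name `sketch_memeq_ct` is hypothetical and is not Tor's API; the real code in di_ops.c takes more care to stop a compiler from reintroducing data-dependent branches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not Tor's real API: returns 1 if the two buffers
 * are equal and 0 otherwise, touching every byte regardless of where the
 * first difference is. */
static int
sketch_memeq_ct(const void *a, const void *b, size_t len)
{
  const uint8_t *x = a;
  const uint8_t *y = b;
  uint8_t diff = 0;
  size_t i;
  for (i = 0; i < len; ++i) {
    /* Accumulate differences instead of returning early. */
    diff |= (uint8_t)(x[i] ^ y[i]);
  }
  return diff == 0;
}

/* Small usage check. */
static void
sketch_memeq_demo(void)
{
  assert(sketch_memeq_ct("abcd", "abcd", 4));
  assert(!sketch_memeq_ct("abcd", "abce", 4));
}
```

The point of the design is that the running time depends only on `len`, not on the position of the first differing byte, which is what a plain memcmp() can leak.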
+The logging subsystem in log.c supports logging to files, to controllers, to
+stdout/stderr, or to the system log.
+
+The abstraction in memarea.c is used in cases when a large number of
+temporary objects need to be allocated, and they can all be freed at the same
+time.
+
+The torgzip.c module wraps the zlib library to implement compression.
+
+Workqueue.c provides a simple multithreaded work-queue implementation.
+
+### Containers
+
+The container.c module defines these container types, used throughout the Tor
+codebase.
+
+There is a dynamic array called **smartlist**, used as our general resizeable
+array type. It supports sorting, searching, common set operations, and so
+on. It has specialized functions for smartlists of strings, and for
+heap-based priority queues.
+
+There's a bit-array type.
+
+A set of mapping types to map strings, 160-bit digests, and 256-bit digests
+to void \*. These are what we generally use when we want O(1) lookup.
+
+Additionally, for containers, we use the ht.h and tor_queue.h headers, in
+src/ext. These provide intrusive hashtable and linked-list macros.
+
+### Cryptography
+
+Once, we tried to keep our cryptography code in a single "crypto.c" file,
+with an "aes.c" module containing an AES implementation for use with older
+OpenSSLs.
+
+Now, our practice has become to introduce crypto_\*.c modules when adding new
+cryptography backend code. We have modules for Ed25519, Curve25519,
+secret-to-key algorithms, and password-based boxed encryption.
+
+Our various TLS compatibility code, wrappers, and hacks are kept in
+tortls.c, which is probably too full of Tor-specific kludges. I'm
+hoping we can eliminate most of those kludges when we finally remove
+support for older versions of our TLS handshake.
+
+
+
diff --git a/doc/HACKING/design/01a-memory.md b/doc/HACKING/design/01a-memory.md
new file mode 100644
index 0000000000..9a20782962
--- /dev/null
+++ b/doc/HACKING/design/01a-memory.md
@@ -0,0 +1,93 @@
+
+## Memory management
+
+### Heap-allocation functions
+
+Tor imposes a few light wrappers over C's native malloc and free
+functions, to improve convenience, and to allow wholesale replacement
+of malloc and free as needed.
+
+You should never use 'malloc', 'calloc', 'realloc', or 'free' on their
+own; always use the variants prefixed with 'tor_'.
+They are the same as the standard C functions, with the following
+exceptions:
+
+ * tor_free(NULL) is a no-op.
+ * tor_free() is a macro that takes an lvalue as an argument and sets it to
+ NULL after freeing it. To avoid this behavior, you can use tor_free_()
+ instead.
+ * tor_malloc() and friends fail with an assertion if they are asked to
+ allocate a value so large that it is probably an underflow.
+ * It is always safe to tor_malloc(0), regardless of whether your libc
+ allows it.
+ * tor_malloc(), tor_realloc(), and friends are never allowed to fail.
+ Instead, Tor will die with an assertion. This means that you never
+ need to check their return values. See the next subsection for
+ information on why we think this is a good idea.
+
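To make the exceptions above concrete, here is a minimal standalone sketch of an allocator that never returns NULL and a free macro that clears its argument. The `sketch_` names are hypothetical stand-ins written for this example; they are not the real tor_malloc() and tor_free() definitions.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for tor_malloc(): never returns NULL. */
static void *
sketch_malloc(size_t size)
{
  /* Ask for at least one byte, so a zero-size request is always valid. */
  void *p = malloc(size ? size : 1);
  if (p == NULL) {
    fprintf(stderr, "out of memory\n");
    abort();
  }
  return p;
}

/* Hypothetical stand-in for tor_free(): takes an lvalue, frees it, and
 * sets it to NULL. free(NULL) is already a no-op in standard C. */
#define sketch_free(p) \
  do {                 \
    free(p);           \
    (p) = NULL;        \
  } while (0)

/* Returns 1 if the pointer was cleared and a second free was harmless. */
static int
sketch_free_demo(void)
{
  char *buf = sketch_malloc(16);
  sketch_free(buf);
  assert(buf == NULL);
  sketch_free(buf); /* safe: buf is NULL, so this is a no-op */
  return buf == NULL;
}
```

Clearing the pointer turns most use-after-free and double-free mistakes into NULL dereferences or no-ops, which are much easier to notice.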
+We define additional general-purpose memory allocation functions as well:
+
+ * tor_malloc_zero(x) behaves as calloc(1, x), except that it makes clear
+ the intent to allocate a single zeroed-out value.
+ * tor_reallocarray(x,y) behaves as the OpenBSD reallocarray function.
+ Use it for cases when you need to realloc() in a multiplication-safe
+ way.
+
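The "multiplication-safe" part can be sketched as below: check that the element count times the element size fits in a size_t before calling realloc(). The `sketch_` functions are assumptions made for illustration and may differ from how tor_reallocarray() is actually written.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns 1 if nmemb * size can be computed without overflowing size_t. */
static int
sketch_mul_ok(size_t nmemb, size_t size)
{
  return size == 0 || nmemb <= SIZE_MAX / size;
}

/* Hypothetical reallocarray-style wrapper: aborts rather than returning
 * NULL, both on overflow and on allocation failure. */
static void *
sketch_reallocarray(void *ptr, size_t nmemb, size_t size)
{
  size_t total;
  void *result;
  if (!sketch_mul_ok(nmemb, size))
    abort();
  total = nmemb * size;
  /* Avoid realloc(ptr, 0), whose behavior varies between platforms. */
  result = realloc(ptr, total ? total : 1);
  if (result == NULL)
    abort();
  return result;
}

/* Small usage check: growing an array keeps its old contents. */
static void
sketch_reallocarray_demo(void)
{
  int *arr = sketch_reallocarray(NULL, 4, sizeof(int));
  arr[3] = 7;
  arr = sketch_reallocarray(arr, 8, sizeof(int));
  assert(arr[3] == 7);
  free(arr);
}
```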
+And specific-purpose functions as well:
+
+ * tor_strdup() and tor_strndup() behave as the underlying libc functions,
+ but use tor_malloc() instead of the underlying function.
+ * tor_memdup() copies a chunk of memory of a given size.
+ * tor_memdup_nulterm() copies a chunk of memory of a given size, then
+ NUL-terminates it just to be safe.
+
+#### Why assert on failure?
+
+Why don't we allow tor_malloc() and its allies to return NULL?
+
+First, it's error-prone. Many programmers forget to check for NULL return
+values, and testing for malloc() failures is a major pain.
+
+Second, it's not necessarily a great way to handle OOM conditions. It's
+probably better (we think) to have a memory target where we dynamically free
+things ahead of time in order to stay under the target. Trying to respond to
+an OOM at the point of tor_malloc() failure, on the other hand, would involve
+a rare operation invoked from deep in the call stack. (Again, that's
+error-prone and hard to debug.)
+
+Third, thanks to the rise of Linux and other operating systems that allow
+memory to be overcommitted, you can't actually ever rely on getting a NULL
+from malloc() when you're out of memory; instead you have to use an approach
+closer to tracking the total memory usage.
+
+#### Conventions for your own allocation functions.
+
+Whenever you create a new type, the convention is to give it a pair of
+x_new() and x_free() functions, named after the type.
+
+Calling x_free(NULL) should always be a no-op.
+
+
+### Grow-only memory allocation: memarea.c
+
+It's often handy to allocate a large number of tiny objects, all of which
+need to disappear at the same time. You can do this in tor using the
+memarea.c abstraction, which uses a set of grow-only buffers for allocation,
+and only supports a single "free" operation at the end.
+
+Using memareas also helps you avoid memory fragmentation. You see, some libc
+malloc implementations perform badly on the case where a large number of
+small temporary objects are allocated at the same time as a few long-lived
+objects of similar size. But if you use tor_malloc() for the long-lived ones
+and a memarea for the temporary objects, the malloc implementation is likelier
+to do better.
+
+To create a new memarea, use memarea_new(). To drop all the storage from a
+memarea, and invalidate its pointers, use memarea_drop_all().
+
+The allocation functions memarea_alloc(), memarea_alloc_zero(),
+memarea_memdup(), memarea_strdup(), and memarea_strndup() are analogous to
+the similarly-named malloc() functions. There is intentionally no
+memarea_free() or memarea_realloc().
+
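The grow-only idea can be sketched in a few dozen lines. Everything below (the `sketch_` names, the fixed 4096-byte chunk size, the C11 `max_align_t` alignment trick) is an assumption made for this example and is not the memarea.c implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_CHUNK_SIZE 4096

typedef struct sketch_chunk_t {
  struct sketch_chunk_t *next;
  size_t used;
  union {
    max_align_t align_; /* forces suitable alignment for mem */
    char mem[SKETCH_CHUNK_SIZE];
  } u;
} sketch_chunk_t;

typedef struct sketch_area_t {
  sketch_chunk_t *head;
} sketch_area_t;

static sketch_area_t *
sketch_area_new(void)
{
  sketch_area_t *area = calloc(1, sizeof(*area));
  if (area == NULL)
    abort();
  return area;
}

/* Hand out n bytes from the newest chunk, adding a chunk when it is full.
 * There is deliberately no per-object free. */
static void *
sketch_area_alloc(sketch_area_t *area, size_t n)
{
  const size_t align = sizeof(max_align_t);
  void *result;
  if (n == 0)
    n = 1;
  if (n > SKETCH_CHUNK_SIZE)
    abort(); /* oversized requests are out of scope for this sketch */
  n = (n + align - 1) / align * align; /* round up to keep alignment */
  if (area->head == NULL || area->head->used + n > SKETCH_CHUNK_SIZE) {
    sketch_chunk_t *chunk = calloc(1, sizeof(*chunk));
    if (chunk == NULL)
      abort();
    chunk->next = area->head;
    area->head = chunk;
  }
  result = area->head->u.mem + area->head->used;
  area->head->used += n;
  return result;
}

/* Free every chunk at once; all pointers from the area become invalid. */
static void
sketch_area_drop_all(sketch_area_t *area)
{
  sketch_chunk_t *chunk = area->head;
  while (chunk != NULL) {
    sketch_chunk_t *next = chunk->next;
    free(chunk);
    chunk = next;
  }
  free(area);
}

/* Small usage check: many tiny objects, one free at the end. */
static void
sketch_area_demo(void)
{
  sketch_area_t *area = sketch_area_new();
  int i;
  for (i = 0; i < 1000; ++i)
    assert(sketch_area_alloc(area, 24) != NULL);
  sketch_area_drop_all(area);
}
```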
+
diff --git a/doc/HACKING/design/01b-collections.md b/doc/HACKING/design/01b-collections.md
new file mode 100644
index 0000000000..def60b0f15
--- /dev/null
+++ b/doc/HACKING/design/01b-collections.md
@@ -0,0 +1,43 @@
+
+## Collections in tor
+
+### Smartlists: Neither lists, nor especially smart.
+
+For historical reasons, we call our dynamic-allocated array type
+"smartlist_t". It can grow or shrink as elements are added and removed.
+
+All smartlists hold an array of void \*. Whenever you expose a smartlist
+in an API you *must* document which types its pointers actually hold.
+
+<!-- It would be neat to fix that, wouldn't it? -NM -->
+
+Smartlists are created empty with smartlist_new() and freed with
+smartlist_free(). See the container.h module documentation for more
+information; there are many convenience functions for commonly needed
+operations.
+
+
+### Digest maps, string maps, and more.
+
+Tor makes frequent use of maps from 160-bit digests, 256-bit digests,
+or nul-terminated strings to void \*. These types are digestmap_t,
+digest256map_t, and strmap_t respectively. See the container.h
+module documentation for more information.
+
+
+### Intrusive lists and hashtables
+
+For performance-sensitive cases, we sometimes want to use "intrusive"
+collections: ones where the bookkeeping pointers are stuck inside the
+structures that belong to the collection. If you've used the
+BSD-style sys/queue.h macros, you'll be familiar with these.
+
+Unfortunately, the sys/queue.h macros vary significantly between the
+platforms that have them, so we provide our own variants in
+src/ext/tor_queue.h .
+
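For readers who have not seen the intrusive style before, here is a minimal hand-written sketch of it. The types and functions are hypothetical; they show the idea only and do not use the tor_queue.h macros.

```c
#include <assert.h>
#include <stddef.h>

/* The element carries its own link, so adding it to a list needs no
 * separate node allocation. */
typedef struct sketch_item_t {
  int value;
  struct sketch_item_t *next; /* bookkeeping pointer stored in the element */
} sketch_item_t;

typedef struct sketch_list_t {
  sketch_item_t *head;
} sketch_list_t;

/* Insert an item at the front of the list. */
static void
sketch_list_push(sketch_list_t *list, sketch_item_t *item)
{
  item->next = list->head;
  list->head = item;
}

/* Count the items by walking the embedded pointers. */
static size_t
sketch_list_len(const sketch_list_t *list)
{
  size_t n = 0;
  const sketch_item_t *it;
  for (it = list->head; it != NULL; it = it->next)
    ++n;
  return n;
}

/* Small usage check: the items live on the stack, and the list only
 * links them together. */
static void
sketch_list_demo(void)
{
  sketch_list_t list = { NULL };
  sketch_item_t a = { 1, NULL };
  sketch_item_t b = { 2, NULL };
  sketch_list_push(&list, &a);
  sketch_list_push(&list, &b);
  assert(sketch_list_len(&list) == 2);
  assert(list.head == &b);
}
```

The trade-off is that an element can be on only as many lists as it has embedded link fields, in exchange for zero allocations per insertion.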
+We also provide an intrusive hashtable implementation in src/ext/ht.h
+. When you're using it, you'll need to define your own hash
+functions. If attacker-induced collisions are a worry here, use the
+cryptographic siphash24g function to extract hashes.
+
diff --git a/doc/HACKING/design/01c-time.md b/doc/HACKING/design/01c-time.md
new file mode 100644
index 0000000000..5cd0b354fd
--- /dev/null
+++ b/doc/HACKING/design/01c-time.md
@@ -0,0 +1,75 @@
+
+## Time in tor ##
+
+### What time is it? ###
+
+We have several notions of the current time in Tor.
+
+The *wallclock time* is available from time(NULL) with
+second-granularity and tor_gettimeofday() with microsecond
+granularity. It corresponds most closely to "the current time and date".
+
+The *monotonic time* is available with the set of monotime_\*
+functions declared in compat_time.h. Unlike the wallclock time, it
+can only move forward. It does not necessarily correspond to a real
+world time, and it is not portable between systems.
+
+The *coarse monotonic time* is available from the set of
+monotime_coarse_\* functions in compat_time.h. It is the same as
+monotime_\* on some platforms. On others, it gives a monotonic timer
+with less precision, but which is more efficient to access.
+
+### Cached views of time. ###
+
+On some systems (like Linux), many time functions use a VDSO to avoid
+the overhead of a system call. But on other systems, gettimeofday()
+and time() can be costly enough that you wouldn't want to call them
+tens of thousands of times. To get a recent, but not especially
+accurate, view of the current time, see approx_time() and
+tor_gettimeofday_cached().
+
+
+### Parsing and encoding time values ###
+
+Tor has functions to parse and format time in these formats:
+
+ * RFC1123 format. ("Fri, 29 Sep 2006 15:54:20 GMT"). For this,
+ use format_rfc1123_time() and parse_rfc1123_time.
+
+ * ISO8601 format. ("2006-10-29 10:57:20") For this, use
+ format_local_iso_time and format_iso_time. We also support the
+ variant format "2006-10-29T10:57:20" with format_iso_time_nospace, and
+ "2006-10-29T10:57:20.123456" with format_iso_time_nospace_usec.
+
+ * HTTP format collections (preferably "Mon, 25 Jul 2016 04:01:11
+ GMT" or possibly "Wed Jun 30 21:49:08 1993" or even "25-Jul-16
+ 04:01:11 GMT"). For this, use parse_http_time. Don't generate anything
+ but the first format.
+
+Some of these functions use struct tm. You can use the standard
+tor_localtime_r and tor_gmtime_r() to wrap these in a safe way. We
+also have a tor_timegm() function.
+
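For a sense of what the ISO8601 formatting above produces, here is a sketch using only the standard C library. `sketch_format_iso_time` is a hypothetical name, and it calls plain gmtime() for brevity; that function is not reentrant, which is one reason Tor code should go through the tor_gmtime_r() wrapper instead.

```c
#include <assert.h>
#include <string.h>
#include <time.h>

#define SKETCH_ISO_TIME_LEN 19 /* strlen("2006-10-29 10:57:20") */

/* Format t as "YYYY-MM-DD HH:MM:SS" in UTC. buf must hold at least
 * SKETCH_ISO_TIME_LEN + 1 bytes. Returns 0 on success, -1 on failure. */
static int
sketch_format_iso_time(char *buf, size_t buflen, time_t t)
{
  const struct tm *tm = gmtime(&t);
  if (tm == NULL)
    return -1;
  if (strftime(buf, buflen, "%Y-%m-%d %H:%M:%S", tm) != SKETCH_ISO_TIME_LEN)
    return -1;
  return 0;
}

/* Small usage check against the Unix epoch. */
static void
sketch_format_iso_time_demo(void)
{
  char buf[SKETCH_ISO_TIME_LEN + 1];
  assert(sketch_format_iso_time(buf, sizeof(buf), (time_t)0) == 0);
  assert(strcmp(buf, "1970-01-01 00:00:00") == 0);
}
```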
+### Scheduling events ###
+
+The main way to schedule a not-too-frequent periodic event with
+respect to the Tor mainloop is via the mechanism in periodic.c.
+There's a big table of periodic_events in main.c, each of which gets
+invoked on its own schedule. You should not expect more than about
+one second of accuracy with these timers.
+
+You can create an independent timer using libevent directly, or using
+the periodic_timer_new() function. But you should avoid doing this
+for per-connection or per-circuit timers: Libevent's internal timer
+implementation uses a min-heap, and those tend to start scaling poorly
+once you have a few thousand entries.
+
+If you need to create a large number of fine-grained timers for some
+purpose, you should consider the mechanism in src/common/timers.c,
+which is optimized for the case where you have a large number of
+timers with not-too-long duration, many of which will be deleted
+before they actually expire. These timers should be reasonably
+accurate within a handful of milliseconds -- possibly better on some
+platforms. (The timers.c module uses William Ahern's timeout.c
+implementation as its backend, which is based on a hierarchical timing
+wheel algorithm. It's cool stuff; check it out.)
diff --git a/doc/HACKING/design/01d-crypto.md b/doc/HACKING/design/01d-crypto.md
new file mode 100644
index 0000000000..d4def947d1
--- /dev/null
+++ b/doc/HACKING/design/01d-crypto.md
@@ -0,0 +1,169 @@
+
+## Lower-level cryptography functionality in Tor ##
+
+Generally speaking, Tor code shouldn't be calling OpenSSL (or any
+other crypto library) directly. Instead, we should indirect through
+one of the functions in src/common/crypto\*.c or src/common/tortls.c.
+
+Cryptography functionality that's available is described below.
+
+### RNG facilities ###
+
+The most basic RNG capability in Tor is the crypto_rand() family of
+functions. These currently use OpenSSL's RAND_() backend, but may use
+something faster in the future.
+
+In addition to crypto_rand(), which fills in a buffer with random
+bytes, we also have functions to produce random integers in certain
+ranges; to produce random hostnames; to produce random doubles, etc.
+
+When you're creating a long-term cryptographic secret, you might want
+to use crypto_strongest_rand() instead of crypto_rand(). It takes the
+operating system's entropy source and combines it with output from
+crypto_rand(). This is a pure paranoia measure, but it might help us
+someday.
+
+You can use smartlist_choose() to pick a random element from a smartlist
+and smartlist_shuffle() to randomize the order of a smartlist. Both are
+potentially a bit slow.
+
+### Cryptographic digests and related functions ###
+
+We treat digests as separate types based on the length of their
+outputs. We support one 160-bit digest (SHA1), two 256-bit digests
+(SHA256 and SHA3-256), and two 512-bit digests (SHA512 and SHA3-512).
+
+You should not use SHA1 for anything new.
+
+The crypto_digest\*() family of functions manipulates digests. You
+can either compute a digest of a chunk of memory all at once using
+crypto_digest(), crypto_digest256(), or crypto_digest512(). Or you
+can create a crypto_digest_t object with
+crypto_digest{,256,512}_new(), feed information to it in chunks using
+crypto_digest_add_bytes(), and then extract the final digest using
+crypto_digest_get_digest(). You can copy the state of one of these
+objects using crypto_digest_dup() or crypto_digest_assign().
+
+We support the HMAC hash-based message authentication code
+instantiated using SHA256. See crypto_hmac_sha256. (You should not
+add any HMAC users with SHA1, and HMAC is not necessary with SHA3.)
+
+We also support the SHA3 cousins, SHAKE128 and SHAKE256. Unlike
+digests, these are extendable output functions (or XOFs) where you can
+get any amount of output. Use the crypto_xof_\*() functions to access
+these.
+
+We have several ways to derive keys from cryptographically strong secret
+inputs (like diffie-hellman outputs). The old
+crypto_expand_key_material_TAP() performs an ad-hoc KDF based on SHA1 -- you
+shouldn't use it for implementing anything but old versions of the Tor
+protocol. You can use HKDF-SHA256 (as defined in RFC5869) for more modern
+protocols. Also consider SHAKE256.
+
+If your input is potentially weak, like a password or passphrase, use a salt
+along with the secret_to_key() functions as defined in crypto_s2k.c. Prefer
+scrypt over other hashing methods when possible. If you're using a password
+to encrypt something, see the "boxed file storage" section below.
+
+Finally, in order to store objects in hash tables, Tor includes the
+randomized SipHash 2-4 function. Call it via the siphash24g() function in
+src/ext/siphash.h whenever you're creating a hashtable whose keys may be
+manipulated by an attacker in order to DoS you with collisions.
+
+
+### Stream ciphers ###
+
+You can create instances of a stream cipher using crypto_cipher_new().
+These are stateful objects of type crypto_cipher_t. Note that these
+objects only support AES-128 right now; a future version should add
+support for AES-256 and/or ChaCha20.
+
+You can encrypt/decrypt with crypto_cipher_encrypt or
+crypto_cipher_decrypt. The crypto_cipher_crypt_inplace function performs
+an encryption without a copy.
+
+Note that sensible people should not use raw stream ciphers; they should
+probably be using some kind of AEAD. Sorry.
+
+### Public key functionality ###
+
+We support four public key algorithms: DH1024, RSA, Curve25519, and
+Ed25519.
+
+We support DH1024 over two prime groups. You access these via the
+crypto_dh_\*() family of functions.
+
+We support RSA in many bit sizes for signing and encryption. You access
+it via the crypto_pk_\*() family of functions. Note that a crypto_pk_t
+may or may not include a private key. See the crypto_pk_\* functions in
+crypto.c for a full list of functions here.
+
+For Curve25519 functionality, see the functions and types in
+crypto_curve25519.c. Curve25519 is generally suitable for when you need
+a secure fast elliptic-curve diffie hellman implementation. When
+designing new protocols, prefer it over DH in Z_p.
+
+For Ed25519 functionality, see the functions and types in
+crypto_ed25519.c. Ed25519 is generally suitable as a secure fast
+elliptic curve signature method. For new protocols, prefer it over RSA
+signatures.
+
+### Metaformats for storage ###
+
+When OpenSSL manages the storage of some object, we use whatever format
+OpenSSL provides -- typically, some kind of PEM-wrapped base 64 encoding
+that starts with "-----BEGIN CRYPTOGRAPHIC OBJECT-----".
+
+When we manage the storage of some cryptographic object, we prefix the
+object with a 32-byte NUL-padded prefix in order to avoid accidental
+object confusion; see the crypto_read_tagged_contents_from_file() and
+crypto_write_tagged_contents_to_file() functions for manipulating
+these. The prefix is "== type: tag ==", where type describes the object
+and its encoding, and tag indicates which one it is.
+
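A sketch of how such a prefix could be built is shown below. The function name and the example strings are hypothetical, and the exact rules (for instance, whether at least one trailing NUL is required) are assumptions here; the real logic lives in the tagged-contents functions named above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_PREFIX_LEN 32

/* Fill out[] with "== type: tag ==" followed by NUL padding up to 32
 * bytes. Returns 0 on success, or -1 if the text does not fit with at
 * least one trailing NUL (an assumption made by this sketch). */
static int
sketch_make_prefix(char out[SKETCH_PREFIX_LEN], const char *type,
                   const char *tag)
{
  int n;
  memset(out, 0, SKETCH_PREFIX_LEN);
  n = snprintf(out, SKETCH_PREFIX_LEN, "== %s: %s ==", type, tag);
  if (n < 0 || n >= SKETCH_PREFIX_LEN)
    return -1;
  return 0;
}

/* Small usage check with made-up type and tag strings. */
static void
sketch_make_prefix_demo(void)
{
  char prefix[SKETCH_PREFIX_LEN];
  assert(sketch_make_prefix(prefix, "demo", "key") == 0);
  assert(strcmp(prefix, "== demo: key ==") == 0);
  assert(prefix[SKETCH_PREFIX_LEN - 1] == '\0');
}
```

A reader of the file can then compare the first 32 bytes against the prefix it expects and refuse to parse an object of the wrong kind.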
+### Boxed-file storage ###
+
+When managing keys, you frequently want to have some way to write a
+secret object to disk, encrypted with a passphrase. The crypto_pwbox
+and crypto_unpwbox functions do so in a way that's likely to be
+readable by future versions of Tor.
+
+### Certificates ###
+
+We have, alas, several certificate types in Tor.
+
+The tor_x509_cert_t type represents an X.509 certificate. This document
+won't explain X.509 to you -- possibly, no document can. (OTOH, Peter
+Gutmann's "x.509 style guide", though severely dated, does a good job of
+explaining how awful x.509 can be.) Do not introduce any new usages of
+X.509. Right now we only use it in places where TLS forces us to do so.
+
+The authority_cert_t type is used only for directory authority keys. It
+has a medium-term signing key (which the authorities actually keep
+online) signed by a long-term identity key (which the authority operator
+had really better be keeping offline). Don't use it for any new kind of
+certificate.
+
+For new places where you need a certificate, consider tor_cert_t: it
+represents a typed and dated _something_ signed by an Ed25519 key. The
+format is described in tor-spec. Unlike x.509, you can write it on a
+napkin.
+
+(Additionally, the Tor directory design uses a fairly wide variety of
+documents that include keys and which are signed by keys. You can
+consider these documents to be an additional kind of certificate if you
+want.)
+
+### TLS ###
+
+Tor's TLS implementation is more tightly coupled to OpenSSL than we'd
+prefer. You can read most of it in tortls.c.
+
+Unfortunately, TLS's state machine and our requirement for nonblocking
+IO support means that using TLS in practice is a bit hairy, since
+logical writes can block on physical reads, and vice versa.
+
+If you are lucky, you will never have to look at the code here.
+
+
+
diff --git a/doc/HACKING/design/01e-os-compat.md b/doc/HACKING/design/01e-os-compat.md
new file mode 100644
index 0000000000..072e95bc8a
--- /dev/null
+++ b/doc/HACKING/design/01e-os-compat.md
@@ -0,0 +1,50 @@
+
+## OS compatibility functions ##
+
+We've got a bunch of functions to wrap differences between various
+operating systems where we run.
+
+### The filesystem ###
+
+We wrap the most important filesystem functions with load-file,
+save-file, and map-file abstractions declared in util.c or compat.c. If
+you're messing about with file descriptors yourself, you might be doing
+things wrong. Most of the time, write_str_to_file() and
+read_str_from_file() are all you need.
+
+Use the check_private_directory() function to create or verify the
+presence of directories, and tor_listdir() to list the files in a
+directory.
+
+Those modules also have functions for manipulating paths a bit.
+
+### Networking ###
+
+Nearly all the world is on a Berkeley sockets API, except for
+Windows, whose version of the Berkeley API was corrupted by late-90s
+insistence on backward compatibility with the
+sort-of-berkeley-sort-of-not add-on *thing* that was Winsock.
+
+What's more, everybody who implemented sockets realized that select()
+wasn't a very good way to do nonblocking IO... and then the various
+implementations all decided to do something different.
+
+You can forget about most of these differences, fortunately: We use
+libevent to hide most of the differences between the various networking
+backends, and we add a few of our own functions to hide the differences
+that Libevent doesn't.
+
+To create a network connection, the right level of abstraction to look
+at is probably the connection_t system in connection.c. Most of the
+lower level work has already been done for you. If you need to
+instantiate something that doesn't fit well with connection_t, you
+should see whether you can instantiate it with connection_t anyway -- or
+you might need to refactor connection.c a little.
+
+Whenever possible, represent network addresses as tor_addr_t.
+
+### Process launch and monitoring ###
+
+Launching and/or monitoring a process is tricky business. You can use
+the mechanisms in procmon.c and tor_spawn_background(), but they're both
+a bit wonky. A refactoring would not be out of order.
diff --git a/doc/HACKING/design/01f-threads.md b/doc/HACKING/design/01f-threads.md
new file mode 100644
index 0000000000..a0dfa2d40e
--- /dev/null
+++ b/doc/HACKING/design/01f-threads.md
@@ -0,0 +1,26 @@
+
+## Threads in Tor ##
+
+Tor is based around a single main thread and one or more worker
+threads. We aim (with middling success) to use worker threads for
+CPU-intensive activities and the main thread for our networking.
+Fortunately (?) we have enough cryptography that moving what we can of the
+cryptographic processes to the workers should achieve good parallelism under most
+loads. Unfortunately, we only have a small fraction of our
+cryptography done in our worker threads right now.
+
+Our threads-and-workers abstraction is defined in workqueue.c, which
+combines a work queue with a thread pool, and integrates the
+signalling with libevent. Tor's main instance of a work queue is
+instantiated in cpuworker.c. It will probably need some refactoring
+as more types of work are added.
+
+On a lower level, we provide locks with tor_mutex_t, conditions with
+tor_cond_t, and thread-local storage with tor_threadlocal_t, all of
+which are specified in compat_threads.h and implemented in an
+OS-specific compat_\*threads.c module.
+
+Try to minimize sharing between threads: it is usually best to simply
+make the worker "own" all the data it needs while the work is in
+progress, and to give up ownership when it's complete.
+
diff --git a/doc/HACKING/design/01g-strings.md b/doc/HACKING/design/01g-strings.md
new file mode 100644
index 0000000000..145a35cd6f
--- /dev/null
+++ b/doc/HACKING/design/01g-strings.md
@@ -0,0 +1,95 @@
+
+## String processing in Tor ##
+
+Since you're reading about a C program, you probably expected this
+section: it's full of functions for manipulating the (notoriously
+dubious) C string abstraction. I'll describe some often-missed
+highlights here.
+
+### Comparing strings and memory chunks ###
+
+We provide strcmpstart() and strcmpend() to perform a strcmp with the start
+or end of a string.
+
+    tor_assert(!strcmpstart("Hello world","Hello"));
+    tor_assert(!strcmpend("Hello world","world"));
+
+    tor_assert(!strcasecmpstart("HELLO WORLD","Hello"));
+    tor_assert(!strcasecmpend("HELLO WORLD","world"));
+
+To compare two string pointers, either of which might be NULL, use
+strcmp_opt().
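+
+To make the semantics of these helpers concrete, here is a minimal
+sketch of how such functions could be written on top of the standard
+library. The sketch_ names are hypothetical, and the NULL ordering in
+sketch_strcmp_opt() is an assumption; consult the real definitions
+before relying on either.
+
+```c
+#include <assert.h>
+#include <string.h>
+
+/* Compare the start of `s` against `prefix`; 0 means `s` begins
+ * with it. */
+int
+sketch_strcmpstart(const char *s, const char *prefix)
+{
+  assert(s && prefix);
+  return strncmp(s, prefix, strlen(prefix));
+}
+
+/* Compare the end of `s` against `suffix`; 0 means `s` ends with it. */
+int
+sketch_strcmpend(const char *s, const char *suffix)
+{
+  size_t slen, sufflen;
+  assert(s && suffix);
+  slen = strlen(s);
+  sufflen = strlen(suffix);
+  if (sufflen > slen)
+    return strcmp(s, suffix);   /* cannot match; result is nonzero */
+  return strcmp(s + (slen - sufflen), suffix);
+}
+
+/* strcmp() that tolerates NULL.  Assumption: NULL sorts before any
+ * non-NULL string, and two NULLs compare equal. */
+int
+sketch_strcmp_opt(const char *a, const char *b)
+{
+  if (!a)
+    return b ? -1 : 0;
+  if (!b)
+    return 1;
+  return strcmp(a, b);
+}
+```
+
+As with strcmp(), a result of zero means "match", which is why the
+assertions above negate the return value.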
+
+To search for a string or a chunk of memory within a memory block that
+is not NUL-terminated, use tor_memstr() or tor_memmem() respectively.
+
+We avoid using memcmp() directly, since it tends to be used in cases
+when having a constant-time operation would be better. Instead, we
+recommend tor_memeq() and tor_memneq() for when you need a
+constant-time operation. In cases when you need a fast comparison,
+and timing leaks are not a danger, you can use fast_memeq() and
+fast_memneq().
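+
+The idea behind a constant-time comparison can be sketched as follows.
+This is only an illustration of the technique, with a hypothetical
+name; it is not the body of tor_memeq(), and a production version needs
+extra care to keep the compiler from reintroducing an early exit.
+
+```c
+#include <assert.h>
+#include <stddef.h>
+#include <stdint.h>
+
+/* Returns 1 if the n bytes at a and b are equal, else 0.  Every byte
+ * is always examined, so the loop does not stop at the first
+ * difference the way memcmp() may. */
+int
+sketch_ct_memeq(const void *a, const void *b, size_t n)
+{
+  const uint8_t *x = a, *y = b;
+  uint8_t diff = 0;
+  size_t i;
+  assert(a && b);
+  for (i = 0; i < n; ++i)
+    diff |= (uint8_t)(x[i] ^ y[i]);   /* accumulate any mismatch */
+  return diff == 0;
+}
+```
+
+Because the loop never branches on secret data, an attacker timing the
+comparison of, say, a MAC learns nothing about how many leading bytes
+were correct.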
+
+It's a common pattern to take a string representing one or more lines
+of text, and search within it for some other string, at the start of a
+line. You could search for "\\ntarget", but that would miss the first
+line. Instead, use find_str_at_start_of_line.
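+
+The pattern can be sketched like this: check the very first line, then
+every position that follows a newline. The function name is
+hypothetical and the code is a stand-in for illustration, not Tor's
+find_str_at_start_of_line().
+
+```c
+#include <assert.h>
+#include <string.h>
+
+/* Return a pointer to the first line of `haystack` that begins with
+ * `needle`, or NULL if no line does. */
+const char *
+sketch_find_at_line_start(const char *haystack, const char *needle)
+{
+  size_t nlen;
+  const char *line = haystack;
+  assert(haystack && needle);
+  nlen = strlen(needle);
+  while (line) {
+    if (strncmp(line, needle, nlen) == 0)
+      return line;
+    line = strchr(line, '\n');
+    if (line)
+      ++line;   /* step past the newline to the next line's start */
+  }
+  return NULL;
+}
+```
+
+Unlike a search for "\\ntarget", this finds a match on the first line
+and never matches text in the middle of a line.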
+
+### Parsing text ###
+
+Over the years, we have accumulated lots of ways to parse text --
+probably too many. Refactoring them to be safer and saner could be a
+good project! The one that seems most error-resistant is tokenizing
+text with smartlist_split_string(). This function takes a smartlist,
+a string, and a separator, and splits the string along occurrences of
+the separator, adding new strings for the sub-elements to the given
+smartlist.
+
+To handle time, you can use one of the functions mentioned above in
+"Parsing and encoding time values".
+
+For numbers in general, use the tor_parse_{long,ulong,double,uint64}
+family of functions. Each of these can be called in a few ways. The
+most general is as follows:
+
+    const int BASE = 10;
+    const int MINVAL = 10, MAXVAL = 10000;
+    const char *next;
+    int ok;
+    long lng = tor_parse_long("100", BASE, MINVAL, MAXVAL, &ok, &next);
+
+The return value should be ignored if "ok" is set to false. The input
+string needs to contain an entire number, or it's considered
+invalid... unless a non-NULL "next" pointer is supplied, in which case extra
+characters at the end are allowed, and "next" is set to point to the
+first such character.
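+
+The "ok" and "next" contract described above can be sketched on top of
+strtol(). The function below is a hypothetical stand-in, not
+tor_parse_long() itself: it inherits strtol()'s leniency (for example,
+leading whitespace is accepted), which the real function may not share.
+
+```c
+#include <assert.h>
+#include <errno.h>
+#include <stdlib.h>
+
+/* Parse `s` in the given base.  On success set *ok to 1 and return
+ * the value; on failure set *ok to 0 and return 0.  If `next` is
+ * non-NULL, trailing characters are allowed and *next points at the
+ * first one. */
+long
+sketch_parse_long(const char *s, int base, long min, long max,
+                  int *ok, const char **next)
+{
+  char *end;
+  long r;
+  assert(s);
+  errno = 0;
+  r = strtol(s, &end, base);
+  if (next)
+    *next = end;
+  /* Fail on: no digits, overflow, trailing junk without `next`, or
+   * a value outside [min, max]. */
+  if (end == s || errno == ERANGE || (!next && *end != '\0') ||
+      r < min || r > max) {
+    if (ok)
+      *ok = 0;
+    return 0;
+  }
+  if (ok)
+    *ok = 1;
+  return r;
+}
+```
+
+With this sketch, parsing "100 apples" fails when "next" is NULL, and
+succeeds with "next" pointing at the space when it is not.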
+
+### Generating blocks of text ###
+
+For not-too-large blocks of text, we provide tor_asprintf(), which
+behaves like other members of the sprintf() family, except that it
+always allocates enough memory on the heap for its output.
+
+For larger blocks: Rather than using strlcat and strlcpy to build
+text, or keeping pointers to the interior of a memory block, we
+recommend that you use the smartlist_* functions to build a smartlist
+full of substrings in order. Then you can concatenate them into a
+single string with smartlist_join_strings(), which also takes optional
+separator and terminator arguments.
+
+As a convenience, we provide smartlist_add_asprintf(), which combines
+the two methods above together. Many of the cryptographic digest
+functions also accept a not-yet-concatenated smartlist of strings.
+
+### Logging helpers ###
+
+Often we'd like to log a value that comes from an untrusted source.
+To do this, use escaped() to escape the nonprintable characters and
+other confusing elements in a string, and surround it by quotes. (Use
+esc_for_log() if you need to allocate a new string.)
+
+It's also handy to put memory chunks into hexadecimal before logging;
+you can use hex_str(memory, length) for that.
+
+The escaped() and hex_str() functions both provide outputs that are
+only valid till they are next invoked; they are not threadsafe.
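+
+The reason for that caveat is easiest to see in a sketch. The function
+below is hypothetical and only imitates the general shape of such a
+helper: it formats into a single static buffer, so each call overwrites
+the previous result. The 32-byte input limit is an assumption made for
+the example.
+
+```c
+#include <assert.h>
+#include <stddef.h>
+
+/* Hex-encode up to 32 bytes of `mem` into a static buffer and return
+ * a pointer to it.  The result is overwritten by the next call. */
+const char *
+sketch_hex_str(const unsigned char *mem, size_t len)
+{
+  static char buf[65];            /* shared by every call */
+  static const char digits[] = "0123456789ABCDEF";
+  size_t i;
+  assert(mem);
+  if (len > 32)
+    len = 32;                     /* truncate rather than overflow */
+  for (i = 0; i < len; ++i) {
+    buf[2*i] = digits[mem[i] >> 4];
+    buf[2*i + 1] = digits[mem[i] & 0x0f];
+  }
+  buf[2*len] = '\0';
+  return buf;
+}
+```
+
+In practice this means you should not pass two such results to the
+same log call, and should copy the string if you need to keep it.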
diff --git a/doc/HACKING/design/02-dataflow.md b/doc/HACKING/design/02-dataflow.md
new file mode 100644
index 0000000000..39f21a908c
--- /dev/null
+++ b/doc/HACKING/design/02-dataflow.md
@@ -0,0 +1,236 @@
+
+## Data flow in the Tor process ##
+
+We read bytes from the network, we write bytes to the network. For the
+most part, the bytes we write correspond roughly to bytes we have read,
+with bits of cryptography added in.
+
+The rest is a matter of details.
+
+![Diagram of main data flows in Tor](./diagrams/02/02-dataflow.png "Diagram of main data flows in Tor")
+
+### Connections and buffers: reading, writing, and interpreting. ###
+
+At a low level, Tor's networking code is based on "connections". Each
+connection represents an object that can send or receive network-like
+events. For the most part, each connection has a single underlying TCP
+stream (I'll discuss counterexamples below).
+
+A connection that behaves like a TCP stream has an input buffer and an
+output buffer. Incoming data is
+written into the input buffer ("inbuf"); data to be written to the
+network is queued on an output buffer ("outbuf").
+
+Buffers are implemented in buffers.c. Each of these buffers is
+implemented as a linked queue of memory extents, in the style of classic
+BSD mbufs, or Linux skbufs.
+
+A connection's reading and writing can be enabled or disabled. Under
+the hood, this functionality is implemented using libevent events: one
+for reading, one for writing. These events are turned on/off in
+main.c, in the functions connection_{start,stop}_{reading,writing}.
+
+When a read or write event is turned on, the main libevent loop polls
+the kernel, asking which sockets are ready to read or write. (This
+polling happens in the event_base_loop() call in run_main_loop_once()
+in main.c.) When libevent finds a socket that's ready to read or write,
+it invokes conn_{read,write}_callback(), also in main.c.
+
+These callback functions delegate to connection_handle_read() and
+connection_handle_write() in connection.c, which read or write on the
+network as appropriate, possibly delegating to OpenSSL.
+
+After data is read or written, or some other event occurs, these
+connection_handle_{read,write}() functions call logic functions whose job is
+to respond to the information. Some examples include:
+
+ * connection_flushed_some() -- called after a connection writes any
+ amount of data from its outbuf.
+ * connection_finished_flushing() -- called when a connection has
+ emptied its outbuf.
+ * connection_finished_connecting() -- called when an in-process connection
+ finishes making a remote connection.
+ * connection_reached_eof() -- called after receiving a FIN from the
+ remote server.
+ * connection_process_inbuf() -- called when more data arrives on
+ the inbuf.
+
+These functions then call into specific implementations depending on
+the type of the connection. For example, if the connection is an
+edge_connection_t, connection_reached_eof() will call
+connection_edge_reached_eof().
+
+> **Note:** "Also there are bufferevents!" We have vestigial
+> code for an alternative low-level networking
+> implementation, based on Libevent's evbuffer and bufferevent
+> code. These two object types take on (most of) the roles of
+> buffers and connections respectively. It isn't working in today's
+> Tor, due to code rot and possible lingering libevent bugs. More
+> work is needed; it would be good to get this working efficiently
+> again, to have IOCP support on Windows.
+
+
+#### Controlling connections ####
+
+A connection can have reading or writing enabled or disabled for a
+wide variety of reasons, including:
+
+ * Writing is disabled when there is no more data to write.
+ * For some connection types, reading is disabled when the inbuf is
+ too full.
+ * Reading/writing is temporarily disabled on connections that have
+ recently read/written enough data to reach their bandwidth limit.
+ * Reading is disabled on connections when reading more data from them
+ would require that data to be buffered somewhere else that is
+ already full.
+
+Currently, these conditions are checked in a diffuse set of
+increasingly complex conditional expressions. In the future, it could
+be helpful to transition to a unified model for handling temporary
+read/write suspensions.
+
+#### Kinds of connections ####
+
+Today Tor has the following connection and pseudoconnection types.
+For the most part, each type of connection has an associated C module
+that implements its underlying logic.
+
+**Edge connections** receive data from and deliver data to points
+outside the onion routing network. See `connection_edge.c`. They fall into two types:
+
+**Entry connections** are a type of edge connection. They receive data
+from the user running a Tor client, and deliver data to that user.
+They are used to implement SOCKSPort, TransPort, NATDPort, and so on.
+Sometimes they are called "AP" connections for historical reasons (it
+used to stand for "Application Proxy").
+
+**Exit connections** are a type of edge connection. They exist at an
+exit node, and transmit traffic to and from the network.
+
+(Entry connections and exit connections are also used as placeholders
+when performing a remote DNS request; they are not decoupled from the
+notion of "stream" in the Tor protocol. This is implemented partially
+in `connection_edge.c`, and partially in `dnsserv.c` and `dns.c`.)
+
+**OR connections** send and receive Tor cells over TLS, using some
+version of the Tor link protocol. Their implementation is spread
+across `connection_or.c`, with a bit of logic in `command.c`,
+`relay.c`, and `channeltls.c`.
+
+**Extended OR connections** are a type of OR connection for use on
+bridges using pluggable transports, so that the PT can tell the bridge
+some information about the incoming connection before passing on its
+data. They are implemented in `ext_orport.c`.
+
+**Directory connections** are server-side or client-side connections
+that implement Tor's HTTP-based directory protocol. These are
+instantiated using a socket when Tor is making an unencrypted HTTP
+connection. When Tor is tunneling a directory request over a Tor
+circuit, directory connections are implemented using a linked
+connection pair (see below). Directory connections are implemented in
+`directory.c`; some of the server-side logic is implemented in
+`dirserv.c`.
+
+**Controller connections** are local connections to a controller
+process implementing the controller protocol from
+control-spec.txt. These are in `control.c`.
+
+**Listener connections** are not stream oriented! Rather, they wrap a
+listening socket in order to detect new incoming connections. They
+bypass most of the stream logic. They don't have associated buffers.
+They are implemented in `connection.c`.
+
+![structure hierarchy for connection types](./diagrams/02/02-connection-types.png "structure hierarchy for connection types")
+
+>**Note**: "History Time!" You might occasionally find references to a couple of types of connections
+> which no longer exist in modern Tor. A *CPUWorker connection*
+>connected the main Tor process to a thread or process used for
+>computation. (Nowadays we use in-process communication.) Even more
+>anciently, a *DNSWorker connection* connected the main tor process to
+>a separate thread or process used for running `gethostbyname()` or
+>`getaddrinfo()`. (Nowadays we use Libevent's evdns facility to
+>perform DNS requests asynchronously.)
+
+#### Linked connections ####
+
+Sometimes two connections are joined together, such that data which the
+Tor process sends on one should immediately be received by the same
+Tor process on the other. (For example, when Tor makes a tunneled
+directory connection, this is implemented on the client side as a
+directory connection whose output goes, not to the network, but to a
+local entry connection. And when a directory receives a tunneled
+directory connection, this is implemented as an exit connection whose
+output goes, not to the network, but to a local directory connection.)
+
+The earliest versions of Tor to support linked connections used
+socketpairs for the purpose. But using socketpairs forced us to copy
+data through kernelspace, and wasted limited file descriptors. So
+instead, a pair of connections can be linked in-process. Each linked
+connection has a pointer to the other, such that data written on one
+is immediately readable on the other, and vice versa.
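+
+A minimal sketch of the in-process linking idea follows. The
+sketch_conn_t type, its fixed-size inbuf, and both functions are
+hypothetical simplifications for illustration; Tor's connection_t and
+buffer code are considerably richer.
+
+```c
+#include <assert.h>
+#include <stddef.h>
+#include <string.h>
+
+#define SKETCH_BUF_CAP 256
+
+typedef struct sketch_conn_t {
+  struct sketch_conn_t *linked;  /* the other half of the pair */
+  char inbuf[SKETCH_BUF_CAP];
+  size_t inbuf_len;
+} sketch_conn_t;
+
+/* Initialize two connections and point each at the other. */
+void
+sketch_link(sketch_conn_t *a, sketch_conn_t *b)
+{
+  assert(a && b);
+  memset(a, 0, sizeof(*a));
+  memset(b, 0, sizeof(*b));
+  a->linked = b;
+  b->linked = a;
+}
+
+/* "Write" on conn: the bytes land directly in the partner's inbuf,
+ * with no socket or kernel copy involved.  Returns the number of
+ * bytes accepted. */
+size_t
+sketch_write(sketch_conn_t *conn, const char *data, size_t len)
+{
+  sketch_conn_t *peer;
+  size_t room;
+  assert(conn && conn->linked && data);
+  peer = conn->linked;
+  room = SKETCH_BUF_CAP - peer->inbuf_len;
+  if (len > room)
+    len = room;
+  memcpy(peer->inbuf + peer->inbuf_len, data, len);
+  peer->inbuf_len += len;
+  return len;
+}
+```
+
+Compared with a socketpair, nothing here consumes a file descriptor,
+and the data is copied once, entirely in userspace.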
+
+### From connections to channels ###
+
+There's an abstraction layer above OR connections (the ones that
+handle cells) and below cells called **Channels**. A channel's
+purpose is to transmit authenticated cells from one Tor instance
+(relay or client) to another.
+
+Currently, only one implementation exists: Channel_tls, which sends
+and receives cells over a TLS-based OR connection.
+
+Cells are sent on a channel using
+`channel_write_{,packed_,var_}cell()`. Incoming cells arrive on a
+channel from its backend using `channel_queue*_cell()`, and are
+immediately processed using `channel_process_cells()`.
+
+Some cell types are handled below the channel layer, such as those
+that affect handshaking only. And some others are passed up to the
+generic cross-channel code in `command.c`: cells like `DESTROY` and
+`CREATED` are all trivial to handle. But relay cells
+require special handling...
+
+### From channels through circuits ###
+
+When a relay cell arrives on an existing circuit, it is handled in
+`circuit_receive_relay_cell()` -- one of the innermost functions in
+Tor. This function encrypts or decrypts the relay cell as
+appropriate, and decides whether the cell is intended for the current
+hop of the circuit.
+
+If the cell *is* intended for the current hop, we pass it to
+`connection_edge_process_relay_cell()` in `relay.c`, which acts on it
+based on its relay command, and (possibly) queues its data on an
+`edge_connection_t`.
+
+If the cell *is not* intended for the current hop, we queue it for the
+next channel in sequence with `append_cell_to_circuit_queue()`. This
+places the cell on a per-circuit queue for cells headed out on that
+particular channel.
+
+### Sending cells on circuits: the complicated bit. ###
+
+Relay cells are queued onto circuits from one of two (main) sources:
+reading data from edge connections, and receiving a cell to be relayed
+on a circuit. Both of these sources place their cells on a cell queue:
+each circuit has one cell queue for each direction that it travels.
+
+A naive implementation would skip using cell queues, and instead write
+each outgoing relay cell to its channel immediately. (Tor did this in
+its earlier versions.)
+But such an approach tends to give poor performance, because it allows
+high-volume circuits to clog channels, and it forces the Tor server to
+send data queued on a circuit even after that circuit has been closed.
+
+So by using queues on each circuit, we can add cells to each channel
+on a just-in-time basis, choosing the cell at each moment based on
+a performance-aware algorithm.
+
+This logic is implemented in two main modules: `scheduler.c` and
+`circuitmux*.c`. The scheduler code is responsible for determining
+globally, across all channels that could write cells, which one should
+next receive queued cells. The circuitmux code determines, for all
+of the circuits with queued cells for a channel, which one should
+queue the next cell.
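+
+To illustrate why per-circuit queues help, here is a deliberately
+simplified sketch of the circuitmux decision. Everything in it is
+hypothetical: the types, the names, and above all the policy, which
+just favors the circuit that has sent the fewest cells so far. It is a
+stand-in for a performance-aware algorithm, not the EWMA code in
+`circuitmux_ewma.c`.
+
+```c
+#include <assert.h>
+#include <stddef.h>
+
+typedef struct sketch_circ_t {
+  unsigned queued;   /* cells waiting on this circuit's queue */
+  unsigned sent;     /* cells already handed to the channel */
+} sketch_circ_t;
+
+/* Pick the circuit that should supply the channel's next cell: among
+ * circuits with queued cells, the one that has sent the least so far.
+ * Returns its index, or -1 if nothing is queued. */
+int
+sketch_pick_circuit(const sketch_circ_t *circs, size_t n)
+{
+  int best = -1;
+  size_t i;
+  assert(circs || n == 0);
+  for (i = 0; i < n; ++i) {
+    if (circs[i].queued == 0)
+      continue;
+    if (best < 0 || circs[i].sent < circs[best].sent)
+      best = (int)i;
+  }
+  return best;
+}
+
+/* Move up to `budget` cells to the channel, choosing a circuit afresh
+ * for each cell.  Returns how many cells were moved. */
+unsigned
+sketch_flush(sketch_circ_t *circs, size_t n, unsigned budget)
+{
+  unsigned moved = 0;
+  while (moved < budget) {
+    int i = sketch_pick_circuit(circs, n);
+    if (i < 0)
+      break;
+    circs[i].queued--;
+    circs[i].sent++;
+    moved++;
+  }
+  return moved;
+}
+```
+
+With one circuit holding 100 queued cells and another holding 2, a
+budget of 4 cells is split evenly between them, so the light circuit
+is not stuck behind the bulk one.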
+
+(This logic applies to outgoing relay cells only; incoming relay cells
+are processed as they arrive.)
diff --git a/doc/HACKING/design/03-modules.md b/doc/HACKING/design/03-modules.md
new file mode 100644
index 0000000000..93eb9d3089
--- /dev/null
+++ b/doc/HACKING/design/03-modules.md
@@ -0,0 +1,247 @@
+
+## Tor's modules ##
+
+### Generic modules ###
+
+`buffers.c`
+: Implements the `buf_t` buffered data type for connections, and several
+low-level data handling functions to handle network protocols on it.
+
+`channel.c`
+: Generic channel implementation. Channels handle sending and receiving cells
+among tor nodes.
+
+`channeltls.c`
+: Channel implementation for TLS-based OR connections. Uses `connection_or.c`.
+
+`circuitbuild.c`
+: Code for constructing circuits and choosing their paths. (*Note*:
+this module could plausibly be split into handling the client side,
+the server side, and the path generation aspects of circuit building.)
+
+`circuitlist.c`
+: Code for maintaining and navigating the global list of circuits.
+
+`circuitmux.c`
+: Generic circuitmux implementation. A circuitmux handles deciding, for a
+particular channel, which circuit should write next.
+
+`circuitmux_ewma.c`
+: A circuitmux implementation based on the EWMA (exponentially
+weighted moving average) algorithm.
+
+`circuituse.c`
+: Code to actually send and receive data on circuits.
+
+`command.c`
+: Handles incoming cells on channels.
+
+`config.c`
+: Parses options from torrc, and uses them to configure the rest of Tor.
+
+`confparse.c`
+: Generic torrc-style parser. Used to parse torrc and state files.
+
+`connection.c`
+: Generic and common connection tools, and implementation for the simpler
+connection types.
+
+`connection_edge.c`
+: Implementation for entry and exit connections.
+
+`connection_or.c`
+: Implementation for OR connections (the ones that send cells over TLS).
+
+`main.c`
+: Principal entry point, main loops, scheduled events, and network
+management for Tor.
+
+`ntmain.c`
+: Implements Tor as a Windows service. (Not very well.)
+
+`onion.c`
+: Generic code for generating and responding to CREATE and CREATED
+cells, and performing the appropriate onion handshakes. Also contains
+code to manage the server-side onion queue.
+
+`onion_fast.c`
+: Implements the old SHA1-based CREATE_FAST/CREATED_FAST circuit
+creation handshake. (Now deprecated.)
+
+`onion_ntor.c`
+: Implements the Curve25519-based NTOR circuit creation handshake.
+
+`onion_tap.c`
+: Implements the old RSA1024/DH1024-based TAP circuit creation handshake. (Now
+deprecated.)
+
+`relay.c`
+: Handles particular types of relay cells, and provides code to receive,
+encrypt, route, and interpret relay cells.
+
+`scheduler.c`
+: Decides which channel/circuit pair is ready to receive the next cell.
+
+`statefile.c`
+: Handles loading and storing Tor's state file.
+
+`tor_main.c`
+: Contains the actual `main()` function. (This is placed in a separate
+file so that the unit tests can have their own `main()`.)
+
+
+### Node-status modules ###
+
+`directory.c`
+: Implements the HTTP-based directory protocol, including sending,
+receiving, and handling most request types. (*Note*: The client parts
+of this, and the generic-HTTP parts of this, could plausibly be split
+off.)
+
+`microdesc.c`
+: Implements the compact "microdescriptor" format for keeping track of
+what we know about a router.
+
+`networkstatus.c`
+: Code for fetching, storing, and interpreting consensus and vote documents.
+
+`nodelist.c`
+: Higher-level view of our knowledge of which Tor servers exist. Each
+`node_t` corresponds to a router we know about.
+
+`routerlist.c`
+: Code for storing and retrieving router descriptors and extrainfo
+documents.
+
+`routerparse.c`
+: Generic and specific code for parsing all Tor directory information
+types.
+
+`routerset.c`
+: Parses and interprets a specification for a set of routers (by IP
+range, fingerprint, nickname (deprecated), or country).
+
+
+### Client modules ###
+
+`addressmap.c`
+: Handles client-side associations between one address and another.
+These are used to implement client-side DNS caching (NOT RECOMMENDED),
+MapAddress directives, Automapping, and more.
+
+`circpathbias.c`
+: Path bias attack detection for circuits: tracks whether
+connections made through a particular guard have an unusually high failure rate.
+
+`circuitstats.c`
+: Code to track circuit performance statistics in order to adapt our behavior.
+Notably includes an algorithm to track circuit build times.
+
+`dnsserv.c`
+: Implements DNSPort for clients. (Note that in spite of the word
+"server" in this module's name, it is used for Tor clients. It
+implements a DNS server, not DNS for servers.)
+
+`entrynodes.c`
+: Chooses, monitors, and remembers guard nodes. Also contains some
+bridge-related code.
+
+`torcert.c`
+: Code to interpret and generate Ed25519-based certificates.
+
+### Server modules ###
+
+`dns.c`
+: Server-side DNS code. Handles sending and receiving DNS requests on
+exit nodes, and implements the server-side DNS cache.
+
+`dirserv.c`
+: Implements the part of directory caches that handles responding to
+client requests.
+
+`ext_orport.c`
+: Implements the extended ORPort protocol for communication between
+server-side pluggable transports and Tor servers.
+
+`hibernate.c`
+: Performs bandwidth accounting, and puts Tor relays into hibernation
+when their bandwidth is exhausted.
+
+`router.c`
+: Management code for running a Tor server. In charge of RSA key
+maintenance, descriptor generation and uploading.
+
+`routerkeys.c`
+: Key handling code for a Tor server. (Currently handles only the
+Ed25519 keys, but the RSA keys could be moved here too.)
+
+
+### Onion service modules ###
+
+`rendcache.c`
+: Stores onion service descriptors.
+
+`rendclient.c`
+: Client-side implementation of the onion service protocol.
+
+`rendcommon.c`
+: Parts of the onion service protocol that are shared by clients,
+services, and/or Tor servers.
+
+`rendmid.c`
+: Tor-server-side implementation of the onion service protocol. (Handles
+acting as an introduction point or a rendezvous point.)
+
+`rendservice.c`
+: Service-side implementation of the onion service protocol.
+
+`replaycache.c`
+: Backend to check introduce2 requests for replay attempts.
+
+
+### Authority modules ###
+
+`dircollate.c`
+: Helper for `dirvote.c`: Given a set of votes, each containing a list
+of Tor nodes, determines which entries across all the votes correspond
+to the same nodes, and yields them in a useful order.
+
+`dirvote.c`
+: Implements the directory voting algorithms that authorities use.
+
+`keypin.c`
+: Implements a persistent key-pinning mechanism to tie RSA1024
+identities to ed25519 identities.
+
+### Miscellaneous modules ###
+
+`control.c`
+: Implements the Tor controller protocol.
+
+`cpuworker.c`
+: Implements the inner work queue function. We use this to move the
+work of circuit creation (on server-side) to other CPUs.
+
+`fp_pair.c`
+: Types for handling 2-tuples of 20-byte fingerprints.
+
+`geoip.c`
+: Parses geoip files (which map IP addresses to country codes), and
+performs lookups on the internal geoip table. Also stores some
+geoip-related statistics.
+
+`policies.c`
+: Parses and implements Tor exit policies.
+
+`reasons.c`
+: Maps internal reason-codes to human-readable strings.
+
+`rephist.c`
+: Tracks Tor servers' performance over time.
+
+`status.c`
+: Writes periodic "heartbeat" status messages about the state of the Tor
+process.
+
+`transports.c`
+: Implements management for the pluggable transports subsystem.
diff --git a/doc/HACKING/design/Makefile b/doc/HACKING/design/Makefile
new file mode 100644
index 0000000000..e126130970
--- /dev/null
+++ b/doc/HACKING/design/Makefile
@@ -0,0 +1,34 @@
+
+
+
+HTML= \
+ 00-overview.html \
+ 01-common-utils.html \
+ 01a-memory.html \
+ 01b-collections.html \
+ 01c-time.html \
+ 01d-crypto.html \
+ 01e-os-compat.html \
+ 01f-threads.html \
+ 01g-strings.html \
+ 02-dataflow.html \
+ 03-modules.html \
+ this-not-that.html
+
+PNG = \
+ diagrams/02/02-dataflow.png \
+ diagrams/02/02-connection-types.png
+
+all: generated
+
+generated: $(HTML) $(PNG)
+
+%.html: %.md
+ maruku $< -o $@
+
+%.png: %.dia
+ dia $< --export=$@
+
+clean:
+ rm -f $(HTML)
+ rm -f $(PNG)
diff --git a/doc/HACKING/design/this-not-that.md b/doc/HACKING/design/this-not-that.md
new file mode 100644
index 0000000000..815c7b2fbc
--- /dev/null
+++ b/doc/HACKING/design/this-not-that.md
@@ -0,0 +1,51 @@
+
+Don't use memcmp. Use {tor,fast}_{memeq,memneq,memcmp}.
+
+Don't use assert. Use tor_assert or tor_assert_nonfatal or BUG. Prefer
+nonfatal assertions or BUG()s.
+
+Don't use sprintf or snprintf. Use tor_asprintf or tor_snprintf.
+
+Don't write hand-written binary parsers. Use trunnel.
+
+Don't use malloc, realloc, calloc, free, strdup, etc. Use tor_malloc,
+tor_realloc, tor_calloc, tor_free, tor_strdup, etc.
+
+Don't use tor_realloc(x, y\*z). Use tor_reallocarray(x, y, z).
+
+Don't say "if (x) foo_free(x)". Just foo_free(x) and make sure that
+foo_free(NULL) is a no-op.
+
+Don't use toupper or tolower; use TOR_TOUPPER and TOR_TOLOWER.
+
+Don't use isalpha, isalnum, etc. Instead use TOR_ISALPHA, TOR_ISALNUM, etc.
+
+Don't use strcat, strcpy, strncat, or strncpy. Use strlcat and strlcpy
+instead.
+
+Don't use tor_asprintf then smartlist_add; use smartlist_add_asprintf.
+
+Don't use any of these functions: they aren't portable. Use the
+version prefixed with `tor_` instead: strtok_r, memmem, memstr,
+asprintf, localtime_r, gmtime_r, inet_aton, inet_ntop, inet_pton,
+getpass, ntohll, htonll, strdup. (This list is incomplete.)
+
+Don't create or close sockets directly. Instead use the wrappers in
+compat.h.
+
+When creating new APIs, only use 'char \*' to represent 'pointer to a
+nul-terminated string'. Represent 'pointer to a chunk of memory' as
+'uint8_t \*'. (Many older Tor APIs ignore this rule.)
+
+Don't encode/decode u32, u64, or u16 to byte arrays by casting
+pointers. That can crash if the pointers aren't aligned, and can cause
+endianness problems. Instead say something more like set_uint32(ptr,
+htonl(foo)) to encode, and ntohl(get_uint32(ptr)) to decode.
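+
+A minimal sketch of the alignment-safe approach is shown below. The
+sketch_ functions are hypothetical stand-ins that copy through
+memcpy(); they illustrate the idea and are not Tor's own set_uint32()
+and get_uint32() definitions.
+
+```c
+#include <assert.h>
+#include <stdint.h>
+#include <string.h>
+#include <arpa/inet.h>   /* htonl, ntohl (POSIX) */
+
+/* Store a 32-bit value at dst, which need not be aligned. */
+void
+sketch_set_uint32(void *dst, uint32_t v)
+{
+  assert(dst);
+  memcpy(dst, &v, sizeof(v));
+}
+
+/* Load a 32-bit value from src, which need not be aligned. */
+uint32_t
+sketch_get_uint32(const void *src)
+{
+  uint32_t v;
+  assert(src);
+  memcpy(&v, src, sizeof(v));
+  return v;
+}
+
+/* Encode a host-order value as 4 big-endian bytes at dst. */
+void
+sketch_encode_u32(uint8_t *dst, uint32_t host_value)
+{
+  sketch_set_uint32(dst, htonl(host_value));
+}
+
+/* Decode 4 big-endian bytes at src into a host-order value. */
+uint32_t
+sketch_decode_u32(const uint8_t *src)
+{
+  return ntohl(sketch_get_uint32(src));
+}
+```
+
+Because the bytes go through memcpy() and htonl(), the encoded form is
+the same on every platform, even when the pointer is odd-aligned.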
+
+Don't declare a 0-argument function with "void foo()". That's C++
+syntax. In C you say "void foo(void)".
+
+When creating new APIs, use const everywhere you reasonably can.
+
+Sockets should have type tor_socket_t, not int.
+