-rw-r--r-- Doxyfile.in | 2
-rw-r--r-- doc/HACKING/design/00-overview.md | 118
-rw-r--r-- doc/HACKING/design/01.00-lib-overview.md | 171
-rw-r--r-- doc/HACKING/design/01a-memory.md | 103
-rw-r--r-- doc/HACKING/design/01b-collections.md | 45
-rw-r--r-- doc/HACKING/design/01d-crypto.md | 142
-rw-r--r-- doc/HACKING/design/03-modules.md | 89
-rw-r--r-- src/app/app.dox | 2
-rw-r--r-- src/app/config/app_config.dox | 8
-rw-r--r-- src/app/main/app_main.dox | 4
-rw-r--r-- src/core/core.dox | 14
-rw-r--r-- src/core/crypto/core_crypto.dox | 8
-rw-r--r-- src/core/mainloop/core_mainloop.dox | 12
-rw-r--r-- src/core/or/core_or.dox | 64
-rw-r--r-- src/core/proto/core_proto.dox | 8
-rw-r--r-- src/feature/api/feature_api.dox | 4
-rw-r--r-- src/feature/client/feature_client.dox | 7
-rw-r--r-- src/feature/control/feature_control.dox | 10
-rw-r--r-- src/feature/dirauth/feature_dirauth.dox | 11
-rw-r--r-- src/feature/dircache/feature_dircache.dox | 8
-rw-r--r-- src/feature/dirclient/feature_dirclient.dox | 9
-rw-r--r-- src/feature/dircommon/feature_dircommon.dox | 9
-rw-r--r-- src/feature/dirparse/feature_dirparse.dox | 10
-rw-r--r-- src/feature/feature.dox | 2
-rw-r--r-- src/feature/hibernate/feature_hibernate.dox | 16
-rw-r--r-- src/feature/hs/feature_hs.dox | 10
-rw-r--r-- src/feature/hs_common/feature_hs_common.dox | 5
-rw-r--r-- src/feature/keymgt/feature_keymgt.dox | 5
-rw-r--r-- src/feature/nodelist/feature_nodelist.dox | 4
-rw-r--r-- src/feature/relay/feature_relay.dox | 6
-rw-r--r-- src/feature/rend/feature_rend.dox | 9
-rw-r--r-- src/feature/stats/feature_stats.dox | 12
-rw-r--r-- src/lib/arch/lib_arch.dox | 4
-rw-r--r-- src/lib/buf/lib_buf.dox | 15
-rw-r--r-- src/lib/cc/lib_cc.dox | 4
-rw-r--r-- src/lib/compress/lib_compress.dox | 8
-rw-r--r-- src/lib/conf/lib_conf.dox | 5
-rw-r--r-- src/lib/confmgt/lib_confmgt.dox | 9
-rw-r--r-- src/lib/container/lib_container.dox | 51
-rw-r--r-- src/lib/crypt_ops/lib_crypt_ops.dox | 139
-rw-r--r-- src/lib/ctime/lib_ctime.dox | 16
-rw-r--r-- src/lib/defs/lib_defs.dox | 4
-rw-r--r-- src/lib/dispatch/lib_dispatch.dox | 16
-rw-r--r-- src/lib/encoding/lib_encoding.dox | 8
-rw-r--r-- src/lib/err/lib_err.dox | 15
-rw-r--r-- src/lib/evloop/lib_evloop.dox | 9
-rw-r--r-- src/lib/fdio/lib_fdio.dox | 7
-rw-r--r-- src/lib/fs/lib_fs.dox | 11
-rw-r--r-- src/lib/geoip/lib_geoip.dox | 5
-rw-r--r-- src/lib/intmath/lib_intmath.dox | 4
-rw-r--r-- src/lib/lib.dox | 131
-rw-r--r-- src/lib/lock/lib_lock.dox | 8
-rw-r--r-- src/lib/log/lib_log.dox | 12
-rw-r--r-- src/lib/malloc/lib_malloc.dox | 78
-rw-r--r-- src/lib/math/lib_math.dox | 8
-rw-r--r-- src/lib/memarea/lib_memarea.dox | 30
-rw-r--r-- src/lib/meminfo/lib_meminfo.dox | 7
-rw-r--r-- src/lib/net/lib_net.dox | 8
-rw-r--r-- src/lib/osinfo/lib_osinfo.dox | 10
-rw-r--r-- src/lib/process/lib_process.dox | 4
-rw-r--r-- src/lib/pubsub/lib_pubsub.dox | 16
-rw-r--r-- src/lib/sandbox/lib_sandbox.dox | 17
-rw-r--r-- src/lib/smartlist_core/lib_smartlist_core.dox | 12
-rw-r--r-- src/lib/stats/lib_stats.dox | 4
-rw-r--r-- src/lib/string/lib_string.dox | 15
-rw-r--r-- src/lib/subsys/lib_subsys.dox | 34
-rw-r--r-- src/lib/term/lib_term.dox | 4
-rw-r--r-- src/lib/testsupport/lib_testsupport.dox | 4
-rw-r--r-- src/lib/thread/lib_thread.dox | 9
-rw-r--r-- src/lib/time/lib_time.dox | 11
-rw-r--r-- src/lib/tls/lib_tls.dox | 13
-rw-r--r-- src/lib/trace/lib_trace.dox | 8
-rw-r--r-- src/lib/version/lib_version.dox | 4
-rw-r--r-- src/lib/wallclock/lib_wallclock.dox | 13
-rw-r--r-- src/mainpage.dox | 121
-rw-r--r-- src/tools/tools.dox | 2
76 files changed, 1031 insertions, 809 deletions
diff --git a/Doxyfile.in b/Doxyfile.in
index 0128f12812..547a7db190 100644
--- a/Doxyfile.in
+++ b/Doxyfile.in
@@ -256,6 +256,8 @@ TAB_SIZE = 8
ALIASES =
+ALIASES += refdir{1}="\ref @top_srcdir@/src/\1 \"\1\""
+
# This tag can be used to specify a number of word-keyword mappings (TCL only).
# A mapping has the form "name=value". For example adding "class=itcl::class"
# will allow you to use the command class in the itcl::class meaning.
diff --git a/doc/HACKING/design/00-overview.md b/doc/HACKING/design/00-overview.md
index ff40a566be..1c14dc8c10 100644
--- a/doc/HACKING/design/00-overview.md
+++ b/doc/HACKING/design/00-overview.md
@@ -1,124 +1,6 @@
## Overview ##
-This document describes the general structure of the Tor codebase, how
-it fits together, what functionality is available for extending Tor,
-and gives some notes on how Tor got that way.
-
-Tor remains a work in progress: We've been working on it for nearly two
-decades, and we've learned a lot about good coding since we first
-started. This means, however, that some of the older pieces of Tor will
-have some "code smell" in them that could stand a brisk
-refactoring. So when I describe a piece of code, I'll sometimes give a
-note on how it got that way, and whether I still think that's a good
-idea.
-
-The first drafts of this document were written in the Summer and Fall of
-2015, when Tor 0.2.6 was the most recent stable version, and Tor 0.2.7
-was under development. There is a revision in progress (as of late
-2019), to bring it up to pace with Tor as of version 0.4.2. If you're
-reading this far in the future, some things may have changed. Caveat
-haxxor!
-
-This document is not an overview of the Tor protocol. For that, see the
-design paper and the specifications at https://spec.torproject.org/ .
-
-For more information about Tor's coding standards and some helpful
-development tools, see doc/HACKING in the Tor repository.
-
-
-### The very high level ###
-
-Ultimately, Tor runs as an event-driven network daemon: it responds to
-network events, signals, and timers by sending and receiving things over
-the network. Clients, relays, and directory authorities all use the
-same codebase: the Tor process will run as a client, relay, or authority
-depending on its configuration.
-
-Tor has a few major dependencies, including Libevent (used to tell which
-sockets are readable and writable), OpenSSL or NSS (used for many encryption
-functions, and to implement the TLS protocol), and zlib (used to
-compress and uncompress directory information).
-
-Most of Tor's work today is done in a single event-driven main thread.
-Tor also spawns one or more worker threads to handle CPU-intensive
-tasks. (Right now, this only includes circuit encryption and the more
-expensive compression algorithms.)
-
-On startup, Tor initializes its libraries, reads and responds to its
-configuration files, and launches a main event loop. At first, the only
-events that Tor listens for are a few signals (like TERM and HUP), and
-one or more listener sockets (for different kinds of incoming
-connections). Tor also configures several timers to handle periodic
-events. As Tor runs over time, other events will open, and new events
-will be scheduled.
-
-The codebase is divided into a few top-level subdirectories, each of
-which contains several sub-modules.
-
- * `src/ext` -- Code maintained elsewhere that we include in the Tor
- source distribution.
-
- * src/lib` -- Lower-level utility code, not necessarily tor-specific.
-
- * `src/trunnel` -- Automatically generated code (from the Trunnel
- tool): used to parse and encode binary formats.
-
- * `src/core` -- Networking code that is implements the central parts of
- the Tor protocol and main loop.
-
- * `src/feature` -- Aspects of Tor (like directory management, running a
- relay, running a directory authorities, managing a list of nodes,
- running and using onion services) that are built on top of the
- mainloop code.
-
- * `src/app` -- Highest-level functionality; responsible for setting up
- and configuring the Tor daemon, making sure all the lower-level
- modules start up when required, and so on.
-
- * `src/tools` -- Binaries other than Tor that we produce. Currently this
- is tor-resolve, tor-gencert, and the tor_runner.o helper module.
-
- * `src/test` -- unit tests, regression tests, and a few integration
- tests.
-
-In theory, the above parts of the codebase are sorted from highest-level to
-lowest-level, where high-level code is only allowed to invoke lower-level
-code, and lower-level code never includes or depends on code of a higher
-level. In practice, this refactoring is incomplete: The modules in `src/lib`
-are well-factored, but there are many layer violations ("upward
-dependencies") in `src/core` and `src/feature`. We aim to eliminate those
-over time.
-
-### Some key high-level abstractions ###
-
-The most important abstractions at Tor's high-level are Connections,
-Channels, Circuits, and Nodes.
-
-A 'Connection' represents a stream-based information flow. Most
-connections are TCP connections to remote Tor servers and clients. (But
-as a shortcut, a relay will sometimes make a connection to itself
-without actually using a TCP connection. More details later on.)
-Connections exist in different varieties, depending on what
-functionality they provide. The principle types of connection are
-"edge" (eg a socks connection or a connection from an exit relay to a
-destination), "OR" (a TLS stream connecting to a relay), "Directory" (an
-HTTP connection to learn about the network), and "Control" (a connection
-from a controller).
-
-A 'Circuit' is persistent tunnel through the Tor network, established
-with public-key cryptography, and used to send cells one or more hops.
-Clients keep track of multi-hop circuits, and the cryptography
-associated with each hop. Relays, on the other hand, keep track only of
-their hop of each circuit.
-
-A 'Channel' is an abstract view of sending cells to and from a Tor
-relay. Currently, all channels are implemented using OR connections.
-If we switch to other strategies in the future, we'll have more
-connection types.
-
-A 'Node' is a view of a Tor instance's current knowledge and opinions
-about a Tor relay or bridge.
### The rest of this document. ###
diff --git a/doc/HACKING/design/01.00-lib-overview.md b/doc/HACKING/design/01.00-lib-overview.md
deleted file mode 100644
index 58a92f4062..0000000000
--- a/doc/HACKING/design/01.00-lib-overview.md
+++ /dev/null
@@ -1,171 +0,0 @@
-
-## Library code in Tor.
-
-Most of Tor's utility code is in modules in the `src/lib` subdirectory. In
-general, this code is not necessarily Tor-specific, but is instead possibly
-useful for other applications.
-
-This code includes:
-
- * Compatibility wrappers, to provide a uniform API across different
- platforms.
-
- * Library wrappers, to provide a tor-like API over different libraries
- that Tor uses for things like compression and cryptography.
-
- * Containers, to implement some general-purpose data container types.
-
-The modules in `src/lib` are currently well-factored: each one depends
-only on lower-level modules. You can see an up-to-date list of the
-modules sorted from lowest to highest level by running
-`./scripts/maint/practracker/includes.py --toposort`.
-
-As of this writing, the library modules are (from lowest to highest
-level):
-
- * `lib/cc` -- Macros for managing the C compiler and
- language. Includes macros for improving compatibility and clarity
- across different C compilers.
-
- * `lib/version` -- Holds the current version of Tor.
-
- * `lib/testsupport` -- Helpers for making test-only code and test
- mocking support.
-
- * `lib/defs` -- Lowest-level constants used in many places across the
- code.
-
- * `lib/subsys` -- Types used for declaring a "subsystem". A subsystem
- is a module with support for initialization, shutdown,
- configuration, and so on.
-
- * `lib/conf` -- Types and macros used for declaring configuration
- options.
-
- * `lib/arch` -- Compatibility functions and macros for handling
- differences in CPU architecture.
-
- * `lib/err` -- Lowest-level error handling code: responsible for
- generating stack traces, handling raw assertion failures, and
- otherwise reporting problems that might not be safe to report
- via the regular logging module.
-
- * `lib/malloc` -- Wrappers and utilities for memory management.
-
- * `lib/intmath` -- Utilities for integer mathematics.
-
- * `lib/fdio` -- Utilities and compatibility code for reading and
- writing data on file descriptors (and on sockets, for platforms
- where a socket is not a kind of fd).
-
- * `lib/lock` -- Compatibility code for declaring and using locks.
- Lower-level than the rest of the threading code.
-
- * `lib/ctime` -- Constant-time implementations for data comparison
- and table lookup, used to avoid timing side-channels from standard
- implementations of memcmp() and so on.
-
- * `lib/string` -- Low-level compatibility wrappers and utility
- functions for string manipulation.
-
- * `lib/wallclock` -- Compatibility and utility functions for
- inspecting and manipulating the current (UTC) time.
-
- * `lib/osinfo` -- Functions for inspecting the version and
- capabilities of the operating system.
-
- * `lib/smartlist_core` -- The bare-bones pieces of our dynamic array
- ("smartlist") implementation. There are higher-level pieces, but
- these ones are used by (and therefore cannot use) the logging code.
-
- * `lib/log` -- Implements the logging system used by all higher-level
- Tor code. You can think of this as the logical "midpoint" of the
- library code: much of the higher-level code is higher-level
- _because_ it uses the logging module, and much of the lower-level
- code is specifically written to avoid having to log, because the
- logging module depends on it.
-
- * `lib/container` -- General purpose containers, including dynamic arrays
- ("smartlists"), hashtables, bit arrays, weak-reference-like "handles",
- bloom filters, and a bit more.
-
- * `lib/trace` -- A general-purpose API for introducing
- function-tracing functionality into Tor. Currently not much used.
-
- * `lib/thread` -- Threading compatibility and utility functionality,
- other than low-level locks (which are in `lib/lock`) and
- workqueue/threadpool code (which belongs in `lib/evloop`).
-
- * `lib/term` -- Code for terminal manipulation functions (like
- reading a password from the user).
-
- * `lib/memarea` -- A data structure for a fast "arena" style allocator,
- where the data is freed all at once. Used for parsing.
-
- * `lib/encoding` -- Implementations for encoding data in various
- formats, datatypes, and transformations.
-
- * `lib/dispatch` -- A general-purpose in-process message delivery
- system. Used by `lib/pubsub` to implement our inter-module
- publish/subscribe system.
-
- * `lib/sandbox` -- Our Linux seccomp2 sandbox implementation.
-
- * `lib/pubsub` -- Code and macros to implement our publish/subscribe
- message passing system.
-
- * `lib/fs` -- Utility and compatibility code for manipulating files,
- filenames, directories, and so on.
-
- * `lib/confmgt` -- Code to parse, encode, and manipulate our
- configuration files, state files, and so forth.
-
- * `lib/crypt_ops` -- Cryptographic operations. This module contains
- wrappers around the cryptographic libraries that we support,
- and implementations for some higher-level cryptographic
- constructions that we use.
-
- * `lib/meminfo` -- Functions for inspecting our memory usage, if the
- malloc implementation exposes that to us.
-
- * `lib/time` -- Higher level time functions, including fine-gained and
- monotonic timers.
-
- * `lib/math` -- Floating-point mathematical utilities, including
- compatibility code, and probability distributions.
-
- * `lib/buf` -- A general purpose queued buffer implementation,
- similar to the BSD kernel's "mbuf" structure.
-
- * `lib/net` -- Networking code, including address manipulation,
- compatibility wrappers,
-
- * `lib/compress` -- A compatibility wrapper around several
- compression libraries, currently including zlib, zstd, and lzma.
-
- * `lib/geoip` -- Utilities to manage geoip (IP to country) lookups
- and formats.
-
- * `lib/tls` -- Compatibility wrappers around the library (NSS or
- OpenSSL, depending on configuration) that Tor uses to implement the
- TLS link security protocol.
-
- * `lib/evloop` -- Tools to manage the event loop and related
- functionality, in order to implement asynchronous networking,
- timers, periodic events, and other scheduling tasks.
-
- * `lib/process` -- Utilities and compatibility code to launch and
- manage subprocesses.
-
-### What belongs in lib?
-
-In general, if you can imagine some program wanting the functionality
-you're writing, even if that program had nothing to do with Tor, your
-functionality belongs in lib.
-
-If it falls into one of the existing "lib" categories, your
-functionality belongs in lib.
-
-If you are using platform-specific `#ifdef`s to manage compatibility
-issues among platforms, you should probably consider whether you can
-put your code into lib.
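The layering rule in the deleted `01.00-lib-overview.md` (each `lib` module depends only on lower-level modules) can be expressed as a simple check. The C sketch below is hypothetical: `module_dep_t` and `first_layer_violation()` are invented for illustration and are not how `practracker/includes.py` is implemented. Given module names ordered from lowest to highest level, it reports the first dependency that points at a module of the same or a higher level, that is, an "upward dependency".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record for one "module A uses module B" relationship. */
typedef struct module_dep_t {
  const char *user;  /* the module that includes or calls the other */
  const char *used;  /* the module it depends on */
} module_dep_t;

/* Return the position of `name` in `order`, or -1 if it is not listed. */
static int
module_rank(const char *const *order, size_t n_order, const char *name)
{
  for (size_t i = 0; i < n_order; ++i) {
    if (strcmp(order[i], name) == 0)
      return (int)i;
  }
  return -1;
}

/* `order` lists modules from lowest to highest level. Return the index of
 * the first dependency that does not point to a strictly lower-level module
 * (or names an unknown module), or -1 if the layering is respected. */
int
first_layer_violation(const char *const *order, size_t n_order,
                      const module_dep_t *deps, size_t n_deps)
{
  assert((order || n_order == 0) && (deps || n_deps == 0));
  for (size_t i = 0; i < n_deps; ++i) {
    int user_rank = module_rank(order, n_order, deps[i].user);
    int used_rank = module_rank(order, n_order, deps[i].used);
    if (user_rank < 0 || used_rank < 0 || used_rank >= user_rank)
      return (int)i;
  }
  return -1;
}
```

For example, with the order `lib/cc`, `lib/malloc`, `lib/smartlist_core`, `lib/log`, `lib/container`, an edge from `lib/log` to `lib/smartlist_core` passes, while an edge from `lib/smartlist_core` to `lib/log` is flagged. Only the first of those relationships is stated in the text above; any other edges you feed it are examples.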
diff --git a/doc/HACKING/design/01a-memory.md b/doc/HACKING/design/01a-memory.md
deleted file mode 100644
index 4c6bb09018..0000000000
--- a/doc/HACKING/design/01a-memory.md
+++ /dev/null
@@ -1,103 +0,0 @@
-
-## Memory management
-
-### Heap-allocation functions: lib/malloc/malloc.h
-
-Tor imposes a few light wrappers over C's native malloc and free
-functions, to improve convenience, and to allow wholescale replacement
-of malloc and free as needed.
-
-You should never use 'malloc', 'calloc', 'realloc, or 'free' on their
-own; always use the variants prefixed with 'tor_'.
-They are the same as the standard C functions, with the following
-exceptions:
-
- * `tor_free(NULL)` is a no-op.
- * `tor_free()` is a macro that takes an lvalue as an argument and sets it to
- NULL after freeing it. To avoid this behavior, you can use `tor_free_()`
- instead.
- * tor_malloc() and friends fail with an assertion if they are asked to
- allocate a value so large that it is probably an underflow.
- * It is always safe to `tor_malloc(0)`, regardless of whether your libc
- allows it.
- * `tor_malloc()`, `tor_realloc()`, and friends are never allowed to fail.
- Instead, Tor will die with an assertion. This means that you never
- need to check their return values. See the next subsection for
- information on why we think this is a good idea.
-
-We define additional general-purpose memory allocation functions as well:
-
- * `tor_malloc_zero(x)` behaves as `calloc(1, x)`, except the it makes clear
- the intent to allocate a single zeroed-out value.
- * `tor_reallocarray(x,y)` behaves as the OpenBSD reallocarray function.
- Use it for cases when you need to realloc() in a multiplication-safe
- way.
-
-And specific-purpose functions as well:
-
- * `tor_strdup()` and `tor_strndup()` behaves as the underlying libc
- functions, but use `tor_malloc()` instead of the underlying function.
- * `tor_memdup()` copies a chunk of memory of a given size.
- * `tor_memdup_nulterm()` copies a chunk of memory of a given size, then
- NUL-terminates it just to be safe.
-
-#### Why assert on allocation failure?
-
-Why don't we allow `tor_malloc()` and its allies to return NULL?
-
-First, it's error-prone. Many programmers forget to check for NULL return
-values, and testing for `malloc()` failures is a major pain.
-
-Second, it's not necessarily a great way to handle OOM conditions. It's
-probably better (we think) to have a memory target where we dynamically free
-things ahead of time in order to stay under the target. Trying to respond to
-an OOM at the point of `tor_malloc()` failure, on the other hand, would involve
-a rare operation invoked from deep in the call stack. (Again, that's
-error-prone and hard to debug.)
-
-Third, thanks to the rise of Linux and other operating systems that allow
-memory to be overcommitted, you can't actually ever rely on getting a NULL
-from `malloc()` when you're out of memory; instead you have to use an approach
-closer to tracking the total memory usage.
-
-#### Conventions for your own allocation functions.
-
-Whenever you create a new type, the convention is to give it a pair of
-`x_new()` and `x_free_()` functions, named after the type.
-
-Calling `x_free(NULL)` should always be a no-op.
-
-There should additionally be an `x_free()` macro, defined in terms of
-`x_free_()`. This macro should set its lvalue to NULL. You can define it
-using the FREE_AND_NULL macro, as follows:
-
-```
-#define x_free(ptr) FREE_AND_NULL(x_t, x_free_, (ptr))
-```
-
-
-### Grow-only memory allocation: lib/memarea
-
-It's often handy to allocate a large number of tiny objects, all of which
-need to disappear at the same time. You can do this in tor using the
-memarea.c abstraction, which uses a set of grow-only buffers for allocation,
-and only supports a single "free" operation at the end.
-
-Using memareas also helps you avoid memory fragmentation. You see, some libc
-malloc implementations perform badly on the case where a large number of
-small temporary objects are allocated at the same time as a few long-lived
-objects of similar size. But if you use tor_malloc() for the long-lived ones
-and a memarea for the temporary object, the malloc implementation is likelier
-to do better.
-
-To create a new memarea, use `memarea_new()`. To drop all the storage from a
-memarea, and invalidate its pointers, use `memarea_drop_all()`.
-
-The allocation functions `memarea_alloc()`, `memarea_alloc_zero()`,
-`memarea_memdup()`, `memarea_strdup()`, and `memarea_strndup()` are analogous
-to the similarly-named malloc() functions. There is intentionally no
-`memarea_free()` or `memarea_realloc()`.
-
-### Special allocation: lib/malloc/map_anon.h
-
-TODO: WRITEME.
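The `x_new()` / `x_free_()` / `x_free()` convention described in the deleted `01a-memory.md` can be sketched in plain C. This is a minimal, self-contained illustration rather than Tor's code: `FREE_AND_NULL` here is a stand-in written against the standard `malloc`/`free` (Tor's own macro may differ in detail), and `widget_t` with its functions is an invented example type.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for Tor's FREE_AND_NULL (assumed shape): call `freefn` on the
 * pointer stored in the lvalue `var`, then set that lvalue to NULL.
 * Taking its address once means `var` is evaluated only once. */
#define FREE_AND_NULL(typ, freefn, var)   \
  do {                                    \
    typ **tmpvar__ = &(var);              \
    freefn(*tmpvar__);                    \
    (*tmpvar__) = NULL;                   \
  } while (0)

/* Hypothetical example type following the x_new()/x_free_() naming. */
typedef struct widget_t {
  char *name;
} widget_t;

/* Allocate a widget holding a private copy of `name`. The asserts mimic
 * the "allocation never fails" rule of tor_malloc() and friends. */
widget_t *
widget_new(const char *name)
{
  widget_t *w = calloc(1, sizeof(widget_t));
  assert(w);
  size_t len = strlen(name);
  w->name = malloc(len + 1);
  assert(w->name);
  memcpy(w->name, name, len + 1);
  return w;
}

/* Release a widget and everything it owns. Passing NULL is a no-op. */
void
widget_free_(widget_t *w)
{
  if (!w)
    return;
  free(w->name);
  free(w);
}

/* Free the widget in `ptr` and set `ptr` itself to NULL. */
#define widget_free(ptr) FREE_AND_NULL(widget_t, widget_free_, (ptr))
```

Because `widget_free()` takes the variable rather than its value, `widget_free(w);` leaves `w == NULL`, so a stale pointer cannot be freed twice; calling `widget_free_(w)` directly leaves the caller's variable untouched, much as `tor_free_()` does.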
diff --git a/doc/HACKING/design/01b-collections.md b/doc/HACKING/design/01b-collections.md
deleted file mode 100644
index ed6fdc9071..0000000000
--- a/doc/HACKING/design/01b-collections.md
+++ /dev/null
@@ -1,45 +0,0 @@
-
-## Collections in tor
-
-### Smartlists: Neither lists, nor especially smart.
-
-For historical reasons, we call our dynamic-allocated array type
-`smartlist_t`. It can grow or shrink as elements are added and removed.
-
-All smartlists hold an array of `void *`. Whenever you expose a smartlist
-in an API you *must* document which types its pointers actually hold.
-
-<!-- It would be neat to fix that, wouldn't it? -NM -->
-
-Smartlists are created empty with `smartlist_new()` and freed with
-`smartlist_free()`. See the `containers.h` module documentation for more
-information; there are many convenience functions for commonly needed
-operations.
-
-<!-- TODO: WRITE more about what you can do with smartlists. -->
-
-### Digest maps, string maps, and more.
-
-Tor makes frequent use of maps from 160-bit digests, 256-bit digests,
-or nul-terminated strings to `void *`. These types are `digestmap_t`,
-`digest256map_t`, and `strmap_t` respectively. See the containers.h
-module documentation for more information.
-
-### Intrusive lists and hashtables
-
-For performance-sensitive cases, we sometimes want to use "intrusive"
-collections: ones where the bookkeeping pointers are stuck inside the
-structures that belong to the collection. If you've used the
-BSD-style sys/queue.h macros, you'll be familiar with these.
-
-Unfortunately, the `sys/queue.h` macros vary significantly between the
-platforms that have them, so we provide our own variants in
-`src/ext/tor_queue.h`.
-
-We also provide an intrusive hashtable implementation in `src/ext/ht.h`.
-When you're using it, you'll need to define your own hash
-functions. If attacker-induced collisions are a worry here, use the
-cryptographic siphash24g function to extract hashes.
-
-<!-- TODO: WRITE about bloom filters, namemaps, bit-arrays, order functions.
--->
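The smartlist idea from the deleted `01b-collections.md` (a dynamically allocated array of `void *` that starts empty, grows and shrinks, and is freed as a unit) can be illustrated with a short sketch. `mini_list_t` and its functions are hypothetical names chosen for this example; they are not Tor's `smartlist_t` API and leave out its many convenience helpers. As the text requires for real smartlists, the test below documents what the stored pointers point to.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, minimal analogue of a smartlist: a growable array of
 * void pointers. Users must document what the pointers point to. */
typedef struct mini_list_t {
  void **list;   /* storage with room for `capacity` pointers */
  int num_used;  /* number of slots currently holding elements */
  int capacity;
} mini_list_t;

/* Create an empty list. */
mini_list_t *
mini_list_new(void)
{
  mini_list_t *sl = calloc(1, sizeof(mini_list_t));
  assert(sl);
  return sl;
}

/* Append `element`, doubling the storage when it is full. */
void
mini_list_add(mini_list_t *sl, void *element)
{
  if (sl->num_used == sl->capacity) {
    int new_cap = sl->capacity ? sl->capacity * 2 : 16;
    void **tmp = realloc(sl->list, sizeof(void *) * (size_t)new_cap);
    assert(tmp);
    sl->list = tmp;
    sl->capacity = new_cap;
  }
  sl->list[sl->num_used++] = element;
}

/* Return the number of elements in the list. */
int
mini_list_len(const mini_list_t *sl)
{
  return sl->num_used;
}

/* Return element `idx`; an out-of-range index fails an assertion. */
void *
mini_list_get(const mini_list_t *sl, int idx)
{
  assert(idx >= 0 && idx < sl->num_used);
  return sl->list[idx];
}

/* Remove element `idx` by moving the last element into its slot.
 * This takes constant time but does not preserve element order. */
void
mini_list_del(mini_list_t *sl, int idx)
{
  assert(idx >= 0 && idx < sl->num_used);
  sl->list[idx] = sl->list[--sl->num_used];
  sl->list[sl->num_used] = NULL;
}

/* Free the list storage (not the elements). Passing NULL is a no-op. */
void
mini_list_free(mini_list_t *sl)
{
  if (!sl)
    return;
  free(sl->list);
  free(sl);
}
```

Swapping the last element into the removed slot keeps deletion cheap at the cost of ordering; a list that must stay ordered would shift the tail down instead.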
diff --git a/doc/HACKING/design/01d-crypto.md b/doc/HACKING/design/01d-crypto.md
index d4def947d1..3e23a07013 100644
--- a/doc/HACKING/design/01d-crypto.md
+++ b/doc/HACKING/design/01d-crypto.md
@@ -1,132 +1,4 @@
-## Lower-level cryptography functionality in Tor ##
-
-Generally speaking, Tor code shouldn't be calling OpenSSL (or any
-other crypto library) directly. Instead, we should indirect through
-one of the functions in src/common/crypto\*.c or src/common/tortls.c.
-
-Cryptography functionality that's available is described below.
-
-### RNG facilities ###
-
-The most basic RNG capability in Tor is the crypto_rand() family of
-functions. These currently use OpenSSL's RAND_() backend, but may use
-something faster in the future.
-
-In addition to crypto_rand(), which fills in a buffer with random
-bytes, we also have functions to produce random integers in certain
-ranges; to produce random hostnames; to produce random doubles, etc.
-
-When you're creating a long-term cryptographic secret, you might want
-to use crypto_strongest_rand() instead of crypto_rand(). It takes the
-operating system's entropy source and combines it with output from
-crypto_rand(). This is a pure paranoia measure, but it might help us
-someday.
-
-You can use smartlist_choose() to pick a random element from a smartlist
-and smartlist_shuffle() to randomize the order of a smartlist. Both are
-potentially a bit slow.
-
-### Cryptographic digests and related functions ###
-
-We treat digests as separate types based on the length of their
-outputs. We support one 160-bit digest (SHA1), two 256-bit digests
-(SHA256 and SHA3-256), and two 512-bit digests (SHA512 and SHA3-512).
-
-You should not use SHA1 for anything new.
-
-The crypto_digest\*() family of functions manipulates digests. You
-can either compute a digest of a chunk of memory all at once using
-crypto_digest(), crypto_digest256(), or crypto_digest512(). Or you
-can create a crypto_digest_t object with
-crypto_digest{,256,512}_new(), feed information to it in chunks using
-crypto_digest_add_bytes(), and then extract the final digest using
-crypto_digest_get_digest(). You can copy the state of one of these
-objects using crypto_digest_dup() or crypto_digest_assign().
-
-We support the HMAC hash-based message authentication code
-instantiated using SHA256. See crypto_hmac_sha256. (You should not
-add any HMAC users with SHA1, and HMAC is not necessary with SHA3.)
-
-We also support the SHA3 cousins, SHAKE128 and SHAKE256. Unlike
-digests, these are extendable output functions (or XOFs) where you can
-get any amount of output. Use the crypto_xof_\*() functions to access
-these.
-
-We have several ways to derive keys from cryptographically strong secret
-inputs (like diffie-hellman outputs). The old
-crypto_expand_key_material-TAP() performs an ad-hoc KDF based on SHA1 -- you
-shouldn't use it for implementing anything but old versions of the Tor
-protocol. You can use HKDF-SHA256 (as defined in RFC5869) for more modern
-protocols. Also consider SHAKE256.
-
-If your input is potentially weak, like a password or passphrase, use a salt
-along with the secret_to_key() functions as defined in crypto_s2k.c. Prefer
-scrypt over other hashing methods when possible. If you're using a password
-to encrypt something, see the "boxed file storage" section below.
-
-Finally, in order to store objects in hash tables, Tor includes the
-randomized SipHash 2-4 function. Call it via the siphash24g() function in
-src/ext/siphash.h whenever you're creating a hashtable whose keys may be
-manipulated by an attacker in order to DoS you with collisions.
-
-
-### Stream ciphers ###
-
-You can create instances of a stream cipher using crypto_cipher_new().
-These are stateful objects of type crypto_cipher_t. Note that these
-objects only support AES-128 right now; a future version should add
-support for AES-128 and/or ChaCha20.
-
-You can encrypt/decrypt with crypto_cipher_encrypt or
-crypto_cipher_decrypt. The crypto_cipher_crypt_inplace function performs
-an encryption without a copy.
-
-Note that sensible people should not use raw stream ciphers; they should
-probably be using some kind of AEAD. Sorry.
-
-### Public key functionality ###
-
-We support four public key algorithms: DH1024, RSA, Curve25519, and
-Ed25519.
-
-We support DH1024 over two prime groups. You access these via the
-crypto_dh_\*() family of functions.
-
-We support RSA in many bit sizes for signing and encryption. You access
-it via the crypto_pk_*() family of functions. Note that a crypto_pk_t
-may or may not include a private key. See the crypto_pk_* functions in
-crypto.c for a full list of functions here.
-
-For Curve25519 functionality, see the functions and types in
-crypto_curve25519.c. Curve25519 is generally suitable for when you need
-a secure fast elliptic-curve diffie hellman implementation. When
-designing new protocols, prefer it over DH in Z_p.
-
-For Ed25519 functionality, see the functions and types in
-crypto_ed25519.c. Ed25519 is a generally suitable as a secure fast
-elliptic curve signature method. For new protocols, prefer it over RSA
-signatures.
-
-### Metaformats for storage ###
-
-When OpenSSL manages the storage of some object, we use whatever format
-OpenSSL provides -- typically, some kind of PEM-wrapped base 64 encoding
-that starts with "----- BEGIN CRYPTOGRAPHIC OBJECT ----".
-
-When we manage the storage of some cryptographic object, we prefix the
-object with 32-byte NUL-padded prefix in order to avoid accidental
-object confusion; see the crypto_read_tagged_contents_from_file() and
-crypto_write_tagged_contents_to_file() functions for manipulating
-these. The prefix is "== type: tag ==", where type describes the object
-and its encoding, and tag indicates which one it is.
-
-### Boxed-file storage ###
-
-When managing keys, you frequently want to have some way to write a
-secret object to disk, encrypted with a passphrase. The crypto_pwbox
-and crypto_unpwbox functions do so in a way that's likely to be
-readable by future versions of Tor.
### Certificates ###
@@ -153,17 +25,3 @@ napkin.
documents that include keys and which are signed by keys. You can
consider these documents to be an additional kind of certificate if you
want.)
-
-### TLS ###
-
-Tor's TLS implementation is more tightly coupled to OpenSSL than we'd
-prefer. You can read most of it in tortls.c.
-
-Unfortunately, TLS's state machine and our requirement for nonblocking
-IO support means that using TLS in practice is a bit hairy, since
-logical writes can block on a physical reads, and vice versa.
-
-If you are lucky, you will never have to look at the code here.
-
-
-
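The tagged-prefix format described in the deleted "Metaformats for storage" section (a 32-byte, NUL-padded header of the form `== type: tag ==` placed before the object) can be sketched as follows. The helpers `make_tagged_header()` and `check_tagged_header()` are hypothetical; this is not the code behind `crypto_write_tagged_contents_to_file()` or `crypto_read_tagged_contents_from_file()`, and the exact rules those functions apply are assumed here, not confirmed. The type and tag strings in the test are illustrative only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TAGGED_HEADER_LEN 32

/* Hypothetical helper: write "== type: tag ==" into a 32-byte buffer and
 * pad the remainder with NUL bytes. Returns 0 on success, or -1 (leaving
 * the buffer all zeros) if the text would not fit. */
int
make_tagged_header(char out[TAGGED_HEADER_LEN], const char *type,
                   const char *tag)
{
  assert(out && type && tag);
  memset(out, 0, TAGGED_HEADER_LEN);
  int n = snprintf(out, TAGGED_HEADER_LEN, "== %s: %s ==", type, tag);
  if (n < 0 || n >= TAGGED_HEADER_LEN) {
    memset(out, 0, TAGGED_HEADER_LEN);
    return -1;
  }
  return 0;
}

/* Hypothetical helper: return 1 if `data` (of `len` bytes) begins with the
 * header for `type` and `tag`, else 0. All 32 bytes are compared, padding
 * included, so a longer header that merely starts with the same text does
 * not match. */
int
check_tagged_header(const char *data, size_t len, const char *type,
                    const char *tag)
{
  char expected[TAGGED_HEADER_LEN];
  if (len < TAGGED_HEADER_LEN)
    return 0;
  if (make_tagged_header(expected, type, tag) < 0)
    return 0;
  return memcmp(data, expected, TAGGED_HEADER_LEN) == 0;
}
```

Comparing the full fixed-size header means an object written under one type/tag pair is rejected when read back as another, which is the object-confusion protection the text describes.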
diff --git a/doc/HACKING/design/03-modules.md b/doc/HACKING/design/03-modules.md
index 93eb9d3089..9ab2fa7da3 100644
--- a/doc/HACKING/design/03-modules.md
+++ b/doc/HACKING/design/03-modules.md
@@ -1,95 +1,6 @@
## Tor's modules ##
-### Generic modules ###
-
-`buffers.c`
-: Implements the `buf_t` buffered data type for connections, and several
-low-level data handling functions to handle network protocols on it.
-
-`channel.c`
-: Generic channel implementation. Channels handle sending and receiving cells
-among tor nodes.
-
-`channeltls.c`
-: Channel implementation for TLS-based OR connections. Uses `connection_or.c`.
-
-`circuitbuild.c`
-: Code for constructing circuits and choosing their paths. (*Note*:
-this module could plausibly be split into handling the client side,
-the server side, and the path generation aspects of circuit building.)
-
-`circuitlist.c`
-: Code for maintaining and navigating the global list of circuits.
-
-`circuitmux.c`
-: Generic circuitmux implementation. A circuitmux handles deciding, for a
-particular channel, which circuit should write next.
-
-`circuitmux_ewma.c`
-: A circuitmux implementation based on the EWMA (exponentially
-weighted moving average) algorithm.
-
-`circuituse.c`
-: Code to actually send and receive data on circuits.
-
-`command.c`
-: Handles incoming cells on channels.
-
-`config.c`
-: Parses options from torrc, and uses them to configure the rest of Tor.
-
-`confparse.c`
-: Generic torrc-style parser. Used to parse torrc and state files.
-
-`connection.c`
-: Generic and common connection tools, and implementation for the simpler
-connection types.
-
-`connection_edge.c`
-: Implementation for entry and exit connections.
-
-`connection_or.c`
-: Implementation for OR connections (the ones that send cells over TLS).
-
-`main.c`
-: Principal entry point, main loops, scheduled events, and network
-management for Tor.
-
-`ntmain.c`
-: Implements Tor as a Windows service. (Not very well.)
-
-`onion.c`
-: Generic code for generating and responding to CREATE and CREATED
-cells, and performing the appropriate onion handshakes. Also contains
-code to manage the server-side onion queue.
-
-`onion_fast.c`
-: Implements the old SHA1-based CREATE_FAST/CREATED_FAST circuit
-creation handshake. (Now deprecated.)
-
-`onion_ntor.c`
-: Implements the Curve25519-based NTOR circuit creation handshake.
-
-`onion_tap.c`
-: Implements the old RSA1024/DH1024-based TAP circuit creation handshake. (Now
-deprecated.)
-
-`relay.c`
-: Handles particular types of relay cells, and provides code to receive,
-encrypt, route, and interpret relay cells.
-
-`scheduler.c`
-: Decides which channel/circuit pair is ready to receive the next cell.
-
-`statefile.c`
-: Handles loading and storing Tor's state file.
-
-`tor_main.c`
-: Contains the actual `main()` function. (This is placed in a separate
-file so that the unit tests can have their own `main()`.)
-
-
### Node-status modules ###
`directory.c`
diff --git a/src/app/app.dox b/src/app/app.dox
index 29e8651d51..21d5791cde 100644
--- a/src/app/app.dox
+++ b/src/app/app.dox
@@ -1,5 +1,5 @@
/**
-@dir app
+@dir /app
@brief app: top-level entry point for Tor
The "app" directory has Tor's main entry point and configuration logic,
diff --git a/src/app/config/app_config.dox b/src/app/config/app_config.dox
index 03762fd27d..ef4a878277 100644
--- a/src/app/config/app_config.dox
+++ b/src/app/config/app_config.dox
@@ -1,4 +1,8 @@
/**
-@dir app/config
-@brief app/config
+@dir /app/config
+@brief app/config: Top-level configuration code
+
+Refactoring this module is a work in progress; see
+[ticket 29211](https://trac.torproject.org/projects/tor/ticket/29211).
+
**/
diff --git a/src/app/main/app_main.dox b/src/app/main/app_main.dox
index 1d94f89814..c714ad1396 100644
--- a/src/app/main/app_main.dox
+++ b/src/app/main/app_main.dox
@@ -1,4 +1,4 @@
/**
-@dir app/main
-@brief app/main
+@dir /app/main
+@brief app/main: Entry point for tor.
**/
diff --git a/src/core/core.dox b/src/core/core.dox
index 1352daebd3..11bf55cb78 100644
--- a/src/core/core.dox
+++ b/src/core/core.dox
@@ -1,8 +1,20 @@
/**
-@dir core
+@dir /core
@brief core: main loop and onion routing functionality
The "core" directory has the central protocols for Tor, which every
client and relay must implement in order to perform onion routing.
+It is divided into three lower-level pieces:
+
+ - \refdir{core/crypto} -- Tor-specific cryptography.
+
+ - \refdir{core/proto} -- Protocol encoding/decoding.
+
+ - \refdir{core/mainloop} -- A connection-oriented asynchronous mainloop.
+
+and one high-level piece:
+
+ - \refdir{core/or} -- Implements onion routing itself.
+
**/
diff --git a/src/core/crypto/core_crypto.dox b/src/core/crypto/core_crypto.dox
index e5acdd6528..28ece92bb8 100644
--- a/src/core/crypto/core_crypto.dox
+++ b/src/core/crypto/core_crypto.dox
@@ -1,4 +1,8 @@
/**
-@dir core/crypto
-@brief core/crypto
+@dir /core/crypto
+@brief core/crypto: Tor-specific cryptography
+
+This module implements Tor's circuit-construction crypto and Tor's
+relay crypto.
+
**/
diff --git a/src/core/mainloop/core_mainloop.dox b/src/core/mainloop/core_mainloop.dox
index 9b32cb7f60..28cd42bf60 100644
--- a/src/core/mainloop/core_mainloop.dox
+++ b/src/core/mainloop/core_mainloop.dox
@@ -1,4 +1,12 @@
/**
-@dir core/mainloop
-@brief core/mainloop
+@dir /core/mainloop
+@brief core/mainloop: Non-onion-routing mainloop functionality
+
+This module uses the event-loop code of \refdir{lib/evloop} to implement an
+asynchronous connection-oriented protocol handler.
+
+The layering here is imperfect: the code here was split from \refdir{core/or}
+without refactoring how the two modules call one another. Many functions
+should probably be moved and refactored.
+
**/
diff --git a/src/core/or/core_or.dox b/src/core/or/core_or.dox
index 1289a85c80..705e9b5436 100644
--- a/src/core/or/core_or.dox
+++ b/src/core/or/core_or.dox
@@ -1,4 +1,62 @@
/**
-@dir core/or
-@brief core/or
-**/
+@dir /core/or
+@brief core/or: *Onion routing happens here*.
+
+This is the central part of Tor that handles the core tasks of onion routing:
+building circuits, handling circuits, attaching streams to circuits, moving
+data around, and so forth.
+
+Some aspects of this module should probably be refactored into others.
+
+Notable files here include:
+
+`channel.c`
+: Generic channel implementation. Channels handle sending and receiving cells
+among tor nodes.
+
+`channeltls.c`
+: Channel implementation for TLS-based OR connections. Uses `connection_or.c`.
+
+`circuitbuild.c`
+: Code for constructing circuits and choosing their paths. (*Note*:
+this module could plausibly be split into handling the client side,
+the server side, and the path generation aspects of circuit building.)
+
+`circuitlist.c`
+: Code for maintaining and navigating the global list of circuits.
+
+`circuitmux.c`
+: Generic circuitmux implementation. A circuitmux handles deciding, for a
+particular channel, which circuit should write next.
+
+`circuitmux_ewma.c`
+: A circuitmux implementation based on the EWMA (exponentially
+weighted moving average) algorithm.
+
+`circuituse.c`
+: Code to actually send and receive data on circuits.
+
+`command.c`
+: Handles incoming cells on channels.
+
+`connection.c`
+: Generic and common connection tools, and implementation for the simpler
+connection types.
+
+`connection_edge.c`
+: Implementation for entry and exit connections.
+
+`connection_or.c`
+: Implementation for OR connections (the ones that send cells over TLS).
+
+`onion.c`
+: Generic code for generating and responding to CREATE and CREATED
+cells, and performing the appropriate onion handshakes. Also contains
+code to manage the server-side onion queue.
+
+`relay.c`
+: Handles particular types of relay cells, and provides code to receive,
+encrypt, route, and interpret relay cells.
+
+`scheduler.c`
+: Decides which channel/circuit pair is ready to receive the next cell.
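As an illustration of the EWMA policy mentioned for `circuitmux_ewma.c`, here is a minimal, self-contained C sketch. It is not Tor's implementation: the `toy_*` names are hypothetical, and it assumes the usual EWMA scheduling idea that each circuit's recent cell count decays geometrically over time and the circuit with the smallest count writes next, so quiet (interactive) circuits are favored over bulk ones. The per-tick decay factor is passed in directly; a real scheduler would more likely derive it from a configured half-life.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-circuit scheduling state. */
typedef struct toy_circ_ewma_t {
  double cell_count; /* exponentially decayed count of recently sent cells */
} toy_circ_ewma_t;

/* Age every circuit's count by one tick. factor is in (0, 1);
 * for example, 0.5 halves all counts. */
static void toy_ewma_decay(toy_circ_ewma_t *circs, size_t n, double factor)
{
  assert(factor > 0.0 && factor < 1.0);
  for (size_t i = 0; i < n; ++i)
    circs[i].cell_count *= factor;
}

/* Record that one cell was sent on a circuit. */
static void toy_ewma_note_cell(toy_circ_ewma_t *circ)
{
  circ->cell_count += 1.0;
}

/* Return the index of the circuit that should write next: the one with
 * the smallest decayed count. Ties go to the lowest index. */
static size_t toy_ewma_pick(const toy_circ_ewma_t *circs, size_t n)
{
  size_t best = 0;
  assert(n > 0);
  for (size_t i = 1; i < n; ++i)
    if (circs[i].cell_count < circs[best].cell_count)
      best = i;
  return best;
}
```

The linear scan keeps the sketch short; with many active circuits, a priority queue keyed on the decayed count would be the more natural structure.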
diff --git a/src/core/proto/core_proto.dox b/src/core/proto/core_proto.dox
index 3e1e4ddb6d..13ce751a76 100644
--- a/src/core/proto/core_proto.dox
+++ b/src/core/proto/core_proto.dox
@@ -1,4 +1,8 @@
/**
-@dir core/proto
-@brief core/proto
+@dir /core/proto
+@brief core/proto: Protocol encoding/decoding
+
+These functions should (but do not always) exist at a lower level than most
+of the rest of core.
+
**/
diff --git a/src/feature/api/feature_api.dox b/src/feature/api/feature_api.dox
index cb723b0601..06112120c3 100644
--- a/src/feature/api/feature_api.dox
+++ b/src/feature/api/feature_api.dox
@@ -1,4 +1,4 @@
/**
-@dir feature/api
-@brief feature/api
+@dir /feature/api
+@brief feature/api: In-process interface to starting/stopping Tor.
**/
diff --git a/src/feature/client/feature_client.dox b/src/feature/client/feature_client.dox
index 1a4881c50a..a8263b494c 100644
--- a/src/feature/client/feature_client.dox
+++ b/src/feature/client/feature_client.dox
@@ -1,4 +1,7 @@
/**
-@dir feature/client
-@brief feature/client
+@dir /feature/client
+@brief feature/client: Client-specific code
+
+(There is also a bunch of client-specific code in other modules.)
+
**/
diff --git a/src/feature/control/feature_control.dox b/src/feature/control/feature_control.dox
index 1f6e83c1dd..a0bf9413a1 100644
--- a/src/feature/control/feature_control.dox
+++ b/src/feature/control/feature_control.dox
@@ -1,4 +1,10 @@
/**
-@dir feature/control
-@brief feature/control
+@dir /feature/control
+@brief feature/control: Controller API.
+
+The Controller API is a text-based protocol that another program (or another
+thread, if you're running Tor in-process) can use to configure and control
+Tor while it is running. The current protocol is documented in
+[control-spec.txt](https://gitweb.torproject.org/torspec.git/tree/control-spec.txt).
+
**/
diff --git a/src/feature/dirauth/feature_dirauth.dox b/src/feature/dirauth/feature_dirauth.dox
index fa4bee5b31..9ee2d04589 100644
--- a/src/feature/dirauth/feature_dirauth.dox
+++ b/src/feature/dirauth/feature_dirauth.dox
@@ -1,4 +1,11 @@
/**
-@dir feature/dirauth
-@brief feature/dirauth
+@dir /feature/dirauth
+@brief feature/dirauth: Directory authority implementation.
+
+This module handles running Tor as a directory authority.
+
+The directory protocol is specified in
+[dir-spec.txt](https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt).
+
+
**/
diff --git a/src/feature/dircache/feature_dircache.dox b/src/feature/dircache/feature_dircache.dox
index 5f1c5cc70f..ef8a51aa9e 100644
--- a/src/feature/dircache/feature_dircache.dox
+++ b/src/feature/dircache/feature_dircache.dox
@@ -1,4 +1,8 @@
/**
-@dir feature/dircache
-@brief feature/dircache
+@dir /feature/dircache
+@brief feature/dircache: Run as a directory cache server
+
+This module handles the directory caching functionality that all relays may
+provide, for serving cached directory objects to clients.
+
**/
diff --git a/src/feature/dirclient/feature_dirclient.dox b/src/feature/dirclient/feature_dirclient.dox
index 984a17cf51..0cbae69111 100644
--- a/src/feature/dirclient/feature_dirclient.dox
+++ b/src/feature/dirclient/feature_dirclient.dox
@@ -1,4 +1,9 @@
/**
-@dir feature/dirclient
-@brief feature/dirclient
+@dir /feature/dirclient
+@brief feature/dirclient: Directory client implementation.
+
+The code here is used by all Tor instances that need to download directory
+information. Currently, that is all of them, since even authorities need to
+launch downloads to learn about relays that other authorities have listed.
+
**/
diff --git a/src/feature/dircommon/feature_dircommon.dox b/src/feature/dircommon/feature_dircommon.dox
index 2eff21065c..2d9866da01 100644
--- a/src/feature/dircommon/feature_dircommon.dox
+++ b/src/feature/dircommon/feature_dircommon.dox
@@ -1,4 +1,9 @@
/**
-@dir feature/dircommon
-@brief feature/dircommon
+@dir /feature/dircommon
+@brief feature/dircommon: Directory client and server shared code
+
+This module has the code that directory clients (anybody who downloads
+information about relays) and directory servers (anybody who serves such
+information) have in common.
+
**/
diff --git a/src/feature/dirparse/feature_dirparse.dox b/src/feature/dirparse/feature_dirparse.dox
index a6b34c1f5f..4f2136b02b 100644
--- a/src/feature/dirparse/feature_dirparse.dox
+++ b/src/feature/dirparse/feature_dirparse.dox
@@ -1,4 +1,10 @@
/**
-@dir feature/dirparse
-@brief feature/dirparse
+@dir /feature/dirparse
+@brief feature/dirparse: Parsing Tor directory objects
+
+We define a number of "directory objects" in
+[dir-spec.txt](https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt),
+all of them using a common line-oriented meta-format. This module is used by
+other parts of Tor to parse them.
+
**/
diff --git a/src/feature/feature.dox b/src/feature/feature.dox
index 1d9c3a9df4..03759f9a17 100644
--- a/src/feature/feature.dox
+++ b/src/feature/feature.dox
@@ -1,5 +1,5 @@
/**
-@dir feature
+@dir /feature
@brief feature: domain-specific modules
The "feature" directory has modules that Tor uses only for a particular
diff --git a/src/feature/hibernate/feature_hibernate.dox b/src/feature/hibernate/feature_hibernate.dox
index e24620a43c..eebb2d51a2 100644
--- a/src/feature/hibernate/feature_hibernate.dox
+++ b/src/feature/hibernate/feature_hibernate.dox
@@ -1,4 +1,16 @@
/**
-@dir feature/hibernate
-@brief feature/hibernate
+@dir /feature/hibernate
+@brief feature/hibernate: Bandwidth accounting and hibernation (!)
+
+This module implements two features that are only somewhat related, and
+should probably be separated in the future. The first is bandwidth
+accounting (making sure we use no more than a configured number of bytes per
+accounting period) combined with hibernation (avoiding network activity once
+we have used up all or most of that allowance). The second is clean shutdown,
+where we stop accepting new connections for a while and give old ones time to close.
+
+The two features are related only in the sense that "soft hibernation" (being
+almost out of bandwidth) is very close to the "shutting down" state. But it
+would be better in the long run to make the two completely separate.
+
**/
diff --git a/src/feature/hs/feature_hs.dox b/src/feature/hs/feature_hs.dox
index 08801d002d..32f44d57fb 100644
--- a/src/feature/hs/feature_hs.dox
+++ b/src/feature/hs/feature_hs.dox
@@ -1,4 +1,10 @@
/**
-@dir feature/hs
-@brief feature/hs
+@dir /feature/hs
+@brief feature/hs: v3 (current) onion service protocol
+
+This directory implements the v3 onion service protocol,
+as specified in
+[rend-spec-v3.txt](https://gitweb.torproject.org/torspec.git/tree/rend-spec-v3.txt).
+
+
**/
diff --git a/src/feature/hs_common/feature_hs_common.dox b/src/feature/hs_common/feature_hs_common.dox
index 8fd4f1b07c..85d7585872 100644
--- a/src/feature/hs_common/feature_hs_common.dox
+++ b/src/feature/hs_common/feature_hs_common.dox
@@ -1,4 +1,5 @@
/**
-@dir feature/hs_common
-@brief feature/hs_common
+@dir /feature/hs_common
+@brief feature/hs_common: Common to v2 (old) and v3 (current) onion services
+
**/
diff --git a/src/feature/keymgt/feature_keymgt.dox b/src/feature/keymgt/feature_keymgt.dox
index 8f72c70bbd..acc840eb2e 100644
--- a/src/feature/keymgt/feature_keymgt.dox
+++ b/src/feature/keymgt/feature_keymgt.dox
@@ -1,4 +1,5 @@
/**
-@dir feature/keymgt
-@brief feature/keymgt
+@dir /feature/keymgt
+@brief feature/keymgt: Store keys for relays, authorities, etc.
+
**/
diff --git a/src/feature/nodelist/feature_nodelist.dox b/src/feature/nodelist/feature_nodelist.dox
index faeb9970b3..0b25dd246d 100644
--- a/src/feature/nodelist/feature_nodelist.dox
+++ b/src/feature/nodelist/feature_nodelist.dox
@@ -1,4 +1,4 @@
/**
-@dir feature/nodelist
-@brief feature/nodelist
+@dir /feature/nodelist
+@brief feature/nodelist: Download and manage a list of relays
**/
diff --git a/src/feature/relay/feature_relay.dox b/src/feature/relay/feature_relay.dox
index 9aa7af48e6..6867818257 100644
--- a/src/feature/relay/feature_relay.dox
+++ b/src/feature/relay/feature_relay.dox
@@ -1,4 +1,6 @@
/**
-@dir feature/relay
-@brief feature/relay
+@dir /feature/relay
+@brief feature/relay: Relay-specific code
+
+(There is also a bunch of relay-specific code in other modules.)
**/
diff --git a/src/feature/rend/feature_rend.dox b/src/feature/rend/feature_rend.dox
index fcba0d460f..ed0784521c 100644
--- a/src/feature/rend/feature_rend.dox
+++ b/src/feature/rend/feature_rend.dox
@@ -1,4 +1,9 @@
/**
-@dir feature/rend
-@brief feature/rend
+@dir /feature/rend
+@brief feature/rend: v2 (old) onion services
+
+This directory implements the v2 onion service protocol,
+as specified in
+[rend-spec-v2.txt](https://gitweb.torproject.org/torspec.git/tree/rend-spec-v2.txt).
+
**/
diff --git a/src/feature/stats/feature_stats.dox b/src/feature/stats/feature_stats.dox
index fc4ffd19df..0ced00ce58 100644
--- a/src/feature/stats/feature_stats.dox
+++ b/src/feature/stats/feature_stats.dox
@@ -1,4 +1,12 @@
/**
-@dir feature/stats
-@brief feature/stats
+@dir /feature/stats
+@brief feature/stats: Relay statistics. Also, port prediction.
+
+This module collects anonymized relay statistics in order to publish them in
+relays' routerinfo and extrainfo documents.
+
+Additionally, it contains predict_ports.c, which remembers which ports we've
+visited recently as a client, so we can make sure we have open circuits that
+support them.
+
**/
diff --git a/src/lib/arch/lib_arch.dox b/src/lib/arch/lib_arch.dox
index 60b5fafeb4..edb0cbbf1d 100644
--- a/src/lib/arch/lib_arch.dox
+++ b/src/lib/arch/lib_arch.dox
@@ -1,4 +1,4 @@
/**
-@dir lib/arch
-@brief lib/arch
+@dir /lib/arch
+@brief lib/arch: Compatibility code for handling different CPU architectures.
**/
diff --git a/src/lib/buf/lib_buf.dox b/src/lib/buf/lib_buf.dox
index f21c4b1b72..a2ac23ee4c 100644
--- a/src/lib/buf/lib_buf.dox
+++ b/src/lib/buf/lib_buf.dox
@@ -1,4 +1,15 @@
/**
-@dir lib/buf
-@brief lib/buf
+@dir /lib/buf
+@brief lib/buf: An efficient byte queue.
+
+This module defines the buf_t type, which is used throughout our networking
+code. The implementation is a singly-linked queue of buffer chunks, similar
+to the BSD kernel's
+["mbuf"](https://www.freebsd.org/cgi/man.cgi?query=mbuf&sektion=9) structure.
+
+The buf_t type is also reasonable for use in constructing long strings.
+
+See \refdir{lib/net} for networking code that uses buf_t, and
+\refdir{lib/tls} for cryptographic code that uses buf_t.
+
**/
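To make the "singly-linked queue of buffer chunks" idea concrete, here is a minimal C sketch. It is not Tor's buf_t: the type and function names (`toy_chunk_t`, `toy_buf_t`, `toy_buf_add`, `toy_buf_get`) are hypothetical, and the tiny fixed chunk size is an assumption chosen so the example spans several chunks.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fixed chunk capacity; kept tiny so the example spans chunks. */
#define TOY_CHUNK_CAP 16

typedef struct toy_chunk_t {
  struct toy_chunk_t *next; /* next chunk in the queue, or NULL */
  size_t off;               /* offset of the first unread byte in mem */
  size_t len;               /* number of unread bytes starting at off */
  char mem[TOY_CHUNK_CAP];
} toy_chunk_t;

typedef struct toy_buf_t {
  toy_chunk_t *head; /* bytes are removed from here */
  toy_chunk_t *tail; /* bytes are appended here */
  size_t datalen;    /* total unread bytes across all chunks */
} toy_buf_t;

/* Append n bytes at the tail, allocating chunks as needed.
 * Returns 0 on success, -1 on allocation failure. */
static int toy_buf_add(toy_buf_t *buf, const char *data, size_t n)
{
  assert(buf != NULL && (data != NULL || n == 0));
  while (n > 0) {
    toy_chunk_t *t = buf->tail;
    if (t == NULL || t->off + t->len == TOY_CHUNK_CAP) {
      toy_chunk_t *c = calloc(1, sizeof(*c));
      if (c == NULL)
        return -1;
      if (t != NULL)
        t->next = c;
      else
        buf->head = c;
      buf->tail = t = c;
    }
    size_t space = TOY_CHUNK_CAP - (t->off + t->len);
    size_t k = n < space ? n : space;
    memcpy(t->mem + t->off + t->len, data, k);
    t->len += k;
    buf->datalen += k;
    data += k;
    n -= k;
  }
  return 0;
}

/* Remove up to n bytes from the head into out, freeing drained chunks.
 * Returns the number of bytes copied. */
static size_t toy_buf_get(toy_buf_t *buf, char *out, size_t n)
{
  size_t copied = 0;
  while (copied < n && buf->head != NULL) {
    toy_chunk_t *h = buf->head;
    size_t want = n - copied;
    size_t k = want < h->len ? want : h->len;
    memcpy(out + copied, h->mem + h->off, k);
    h->off += k;
    h->len -= k;
    buf->datalen -= k;
    copied += k;
    if (h->len == 0) {
      buf->head = h->next;
      if (buf->head == NULL)
        buf->tail = NULL;
      free(h);
    }
  }
  return copied;
}
```

Appending only touches the tail and reading only touches the head, so neither operation has to move the bytes already queued; that property is what makes a chunk queue a good fit for network buffers.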
diff --git a/src/lib/cc/lib_cc.dox b/src/lib/cc/lib_cc.dox
index 804260cb29..06f4e775bf 100644
--- a/src/lib/cc/lib_cc.dox
+++ b/src/lib/cc/lib_cc.dox
@@ -1,4 +1,4 @@
/**
-@dir lib/cc
-@brief lib/cc
+@dir /lib/cc
+@brief lib/cc: Macros for managing the C compiler and language.
**/
diff --git a/src/lib/compress/lib_compress.dox b/src/lib/compress/lib_compress.dox
index ac60794565..599126901a 100644
--- a/src/lib/compress/lib_compress.dox
+++ b/src/lib/compress/lib_compress.dox
@@ -1,4 +1,8 @@
/**
-@dir lib/compress
-@brief lib/compress
+@dir /lib/compress
+@brief lib/compress: Wraps several compression libraries
+
+Currently supported are zlib (mandatory), zstd (optional), and lzma
+(optional).
+
**/
diff --git a/src/lib/conf/lib_conf.dox b/src/lib/conf/lib_conf.dox
index 40a1d9f90f..be58fe5b55 100644
--- a/src/lib/conf/lib_conf.dox
+++ b/src/lib/conf/lib_conf.dox
@@ -1,4 +1,5 @@
/**
-@dir lib/conf
-@brief lib/conf
+@dir /lib/conf
+@brief lib/conf: Types and macros for declaring configuration options.
+
**/
diff --git a/src/lib/confmgt/lib_confmgt.dox b/src/lib/confmgt/lib_confmgt.dox
index 964fe1d074..d18fa304ca 100644
--- a/src/lib/confmgt/lib_confmgt.dox
+++ b/src/lib/confmgt/lib_confmgt.dox
@@ -1,4 +1,9 @@
/**
-@dir lib/confmgt
-@brief lib/confmgt
+@dir /lib/confmgt
+@brief lib/confmgt: Parse, encode, manipulate configuration files.
+
+This logic is used by both our state-file code (statefile.c) and our
+configuration code (config.c) to manage a set of named, typed fields,
+reading and writing them to disk and to the controller.
+
**/
diff --git a/src/lib/container/lib_container.dox b/src/lib/container/lib_container.dox
index 6ee719f47e..675aaeef3f 100644
--- a/src/lib/container/lib_container.dox
+++ b/src/lib/container/lib_container.dox
@@ -1,4 +1,51 @@
/**
-@dir lib/container
-@brief lib/container
+@dir /lib/container
+@brief lib/container: Hash tables, dynamic arrays, bit arrays, etc.
+
+### Smartlists: Neither lists, nor especially smart.
+
+For historical reasons, we call our dynamically allocated array type
+`smartlist_t`. It can grow or shrink as elements are added and removed.
+
+All smartlists hold an array of `void *`. Whenever you expose a smartlist
+in an API you *must* document which types its pointers actually hold.
+
+<!-- It would be neat to fix that, wouldn't it? -NM -->
+
+Smartlists are created empty with `smartlist_new()` and freed with
+`smartlist_free()`. See the `containers.h` header documentation for more
+information; there are many convenience functions for commonly needed
+operations.
+
+For low-level operations on smartlists, see also
+\refdir{lib/smartlist_core}.
+
+<!-- TODO: WRITE more about what you can do with smartlists. -->
+
+### Digest maps, string maps, and more.
+
+Tor makes frequent use of maps from 160-bit digests, 256-bit digests,
+or NUL-terminated strings to `void *`. These types are `digestmap_t`,
+`digest256map_t`, and `strmap_t` respectively. See the containers.h
+module documentation for more information.
+
+### Intrusive lists and hashtables
+
+For performance-sensitive cases, we sometimes want to use "intrusive"
+collections: ones where the bookkeeping pointers are stuck inside the
+structures that belong to the collection. If you've used the
+BSD-style `sys/queue.h` macros, you'll be familiar with these.
+
+Unfortunately, the `sys/queue.h` macros vary significantly between the
+platforms that have them, so we provide our own variants in
+`ext/tor_queue.h`.
+
+We also provide an intrusive hashtable implementation in `ext/ht.h`.
+When you're using it, you'll need to define your own hash
+functions. If attacker-induced collisions are a worry here, use the
+cryptographic siphash24g() function to compute hashes.
+
+<!-- TODO: WRITE about bloom filters, namemaps, bit-arrays, order functions.
+-->
+
**/
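To illustrate what "intrusive" means here, the following C sketch embeds the link pointer inside each element and recovers the element from its link with offsetof(). It is not the `ext/tor_queue.h` or `ext/ht.h` API; all `toy_*` and `TOY_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical intrusive link. */
typedef struct toy_link_t {
  struct toy_link_t *next;
} toy_link_t;

/* Recover a pointer to the enclosing struct from a pointer to its link. */
#define TOY_CONTAINER_OF(ptr, type, field) \
  ((type *)((char *)(ptr) - offsetof(type, field)))

/* A hypothetical element type that can sit on one list via pending_link. */
typedef struct toy_circ_t {
  int id;
  toy_link_t pending_link;
} toy_circ_t;

/* Push link onto the front of the list. No allocation happens: the
 * storage for the bookkeeping pointer is already inside the element. */
static void toy_list_push(toy_link_t **head, toy_link_t *link)
{
  assert(head != NULL && link != NULL);
  link->next = *head;
  *head = link;
}

/* Walk the list and sum the ids of the enclosing elements. */
static int toy_sum_ids(toy_link_t *head)
{
  int sum = 0;
  for (toy_link_t *l = head; l != NULL; l = l->next)
    sum += TOY_CONTAINER_OF(l, toy_circ_t, pending_link)->id;
  return sum;
}
```

An element can belong to several such lists at once by embedding one link field per list; the trade-off is that the element type has to know in advance which collections it may join.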
diff --git a/src/lib/crypt_ops/lib_crypt_ops.dox b/src/lib/crypt_ops/lib_crypt_ops.dox
index 1ea0b67d59..515c67f1c0 100644
--- a/src/lib/crypt_ops/lib_crypt_ops.dox
+++ b/src/lib/crypt_ops/lib_crypt_ops.dox
@@ -1,4 +1,139 @@
/**
-@dir lib/crypt_ops
-@brief lib/crypt_ops
+@dir /lib/crypt_ops
+@brief lib/crypt_ops: Cryptographic operations.
+
+This module contains wrappers around the cryptographic libraries that we
+support, and implementations for some higher-level cryptographic
+constructions that we use.
+
+It wraps our two major cryptographic backends (OpenSSL or NSS, as configured
+by the user), and also wraps other cryptographic code in src/ext.
+
+Generally speaking, Tor code shouldn't be calling OpenSSL or NSS
+(or any other crypto library) directly. Instead, we should indirect through
+one of the functions in this directory, or through \refdir{lib/tls}.
+
+The available cryptographic functionality is described below.
+
+### RNG facilities ###
+
+The most basic RNG capability in Tor is the crypto_rand() family of
+functions. These currently use the configured backend's RNG (OpenSSL's
+RAND_*() functions in OpenSSL builds), but may use something faster in the future.
+
+In addition to crypto_rand(), which fills in a buffer with random
+bytes, we also have functions to produce random integers in certain
+ranges; to produce random hostnames; to produce random doubles, etc.
+
+When you're creating a long-term cryptographic secret, you might want
+to use crypto_strongest_rand() instead of crypto_rand(). It takes the
+operating system's entropy source and combines it with output from
+crypto_rand(). This is a pure paranoia measure, but it might help us
+someday.
+
+You can use smartlist_choose() to pick a random element from a smartlist
+and smartlist_shuffle() to randomize the order of a smartlist. Both are
+potentially a bit slow.
+
+### Cryptographic digests and related functions ###
+
+We treat digests as separate types based on the length of their
+outputs. We support one 160-bit digest (SHA1), two 256-bit digests
+(SHA256 and SHA3-256), and two 512-bit digests (SHA512 and SHA3-512).
+
+You should not use SHA1 for anything new.
+
+The crypto_digest\*() family of functions manipulates digests. You
+can either compute a digest of a chunk of memory all at once using
+crypto_digest(), crypto_digest256(), or crypto_digest512(). Or you
+can create a crypto_digest_t object with
+crypto_digest{,256,512}_new(), feed information to it in chunks using
+crypto_digest_add_bytes(), and then extract the final digest using
+crypto_digest_get_digest(). You can copy the state of one of these
+objects using crypto_digest_dup() or crypto_digest_assign().
+
+We support the HMAC hash-based message authentication code
+instantiated using SHA256. See crypto_hmac_sha256(). (You should not
+add any HMAC users with SHA1, and HMAC is not necessary with SHA3.)
+
+We also support the SHA3 cousins, SHAKE128 and SHAKE256. Unlike
+digests, these are extendable output functions (or XOFs) where you can
+get any amount of output. Use the crypto_xof_\*() functions to access
+these.
+
+We have several ways to derive keys from cryptographically strong secret
+inputs (like Diffie-Hellman outputs). The old
+crypto_expand_key_material_TAP() performs an ad-hoc KDF based on SHA1 -- you
+shouldn't use it for implementing anything but old versions of the Tor
+protocol. You can use HKDF-SHA256 (as defined in RFC5869) for more modern
+protocols. Also consider SHAKE256.
+
+If your input is potentially weak, like a password or passphrase, use a salt
+along with the secret_to_key() functions as defined in crypto_s2k.c. Prefer
+scrypt over other hashing methods when possible. If you're using a password
+to encrypt something, see the "boxed file storage" section below.
+
+Finally, in order to store objects in hash tables, Tor includes the
+randomized SipHash 2-4 function. Call it via the siphash24g() function in
+src/ext/siphash.h whenever you're creating a hashtable whose keys may be
+manipulated by an attacker in order to DoS you with collisions.
+
+
+### Stream ciphers ###
+
+You can create instances of a stream cipher using crypto_cipher_new().
+These are stateful objects of type crypto_cipher_t. Note that these
+objects only support AES-128 right now; a future version should add
+support for AES-256 and/or ChaCha20.
+
+You can encrypt/decrypt with crypto_cipher_encrypt() or
+crypto_cipher_decrypt(). The crypto_cipher_crypt_inplace() function performs
+an encryption without a copy.
+
+Note that sensible people should not use raw stream ciphers; they should
+probably be using some kind of AEAD. Sorry.
+
+### Public key functionality ###
+
+We support four public key algorithms: DH1024, RSA, Curve25519, and
+Ed25519.
+
+We support DH1024 over two prime groups. You access these via the
+crypto_dh_\*() family of functions.
+
+We support RSA in many bit sizes for signing and encryption. You access
+it via the crypto_pk_\*() family of functions. Note that a crypto_pk_t
+may or may not include a private key. See the crypto_pk_\* functions in
+crypto_rsa.c for a full list of functions here.
+
+For Curve25519 functionality, see the functions and types in
+crypto_curve25519.c. Curve25519 is generally suitable when you need
+a secure, fast elliptic-curve Diffie-Hellman implementation. When
+designing new protocols, prefer it over DH in Z_p.
+
+For Ed25519 functionality, see the functions and types in
+crypto_ed25519.c. Ed25519 is generally suitable as a secure, fast
+elliptic-curve signature method. For new protocols, prefer it over RSA
+signatures.
+
+### Metaformats for storage ###
+
+When OpenSSL manages the storage of some object, we use whatever format
+OpenSSL provides -- typically, some kind of PEM-wrapped base64 encoding
+that starts with "-----BEGIN CRYPTOGRAPHIC OBJECT-----".
+
+When we manage the storage of some cryptographic object, we prefix the
+object with a 32-byte NUL-padded prefix in order to avoid accidental
+object confusion; see the crypto_read_tagged_contents_from_file() and
+crypto_write_tagged_contents_to_file() functions for manipulating
+these. The prefix is "== type: tag ==", where type describes the object
+and its encoding, and tag indicates which one it is.
+
+### Boxed-file storage ###
+
+When managing keys, you frequently want to have some way to write a
+secret object to disk, encrypted with a passphrase. The crypto_pwbox
+and crypto_unpwbox functions do so in a way that's likely to be
+readable by future versions of Tor.
+
**/
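A minimal C sketch of the 32-byte, NUL-padded "== type: tag ==" prefix described under "Metaformats for storage". It is not the code behind crypto_write_tagged_contents_to_file(); the `toy_*` names are hypothetical, the "ed25519v1-secret" and "type0" strings in the usage below are illustrative, and requiring at least one trailing NUL byte is an assumption of this sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Total prefix size, per the description above: 32 bytes, NUL-padded. */
#define TOY_TAG_LEN 32

/* Write "== type: tag ==" into out, padded with NULs to TOY_TAG_LEN bytes.
 * Returns 0 on success, or -1 if the text would not fit while leaving at
 * least one NUL byte (an assumption of this sketch). */
static int toy_make_tagged_header(char out[TOY_TAG_LEN],
                                  const char *type, const char *tag)
{
  char tmp[TOY_TAG_LEN];
  int n;

  assert(type != NULL && tag != NULL);
  n = snprintf(tmp, sizeof(tmp), "== %s: %s ==", type, tag);
  if (n < 0 || n >= TOY_TAG_LEN)
    return -1; /* encoding error, or the header would be truncated */
  memset(out, 0, TOY_TAG_LEN);
  memcpy(out, tmp, (size_t)n);
  return 0;
}

/* Return 1 if the first TOY_TAG_LEN bytes of data carry the expected
 * prefix, 0 otherwise. */
static int toy_check_tagged_header(const char *data,
                                   const char *type, const char *tag)
{
  char expected[TOY_TAG_LEN];
  if (toy_make_tagged_header(expected, type, tag) < 0)
    return 0;
  return memcmp(data, expected, TOY_TAG_LEN) == 0;
}
```

Because the whole fixed-size prefix is compared, a file holding one kind of object is rejected when a reader expects another, which is the "accidental object confusion" the prefix guards against.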
diff --git a/src/lib/ctime/lib_ctime.dox b/src/lib/ctime/lib_ctime.dox
index 476c95991c..2bcd0f036a 100644
--- a/src/lib/ctime/lib_ctime.dox
+++ b/src/lib/ctime/lib_ctime.dox
@@ -1,4 +1,16 @@
/**
-@dir lib/ctime
-@brief lib/ctime
+@dir /lib/ctime
+@brief lib/ctime: Constant-time code to avoid side-channels.
+
+This module contains constant-time implementations of various
+data comparison and table lookup functions. We use these in preference to
+memcmp() and so forth, since memcmp() can leak information about its inputs
+based on how fast it returns. In general, your code should call tor_memeq()
+and tor_memneq(), not memcmp().
+
+We also define some _non_-constant-time wrappers for memcmp() here. Since we
+consider direct calls to memcmp() to be errors, code that genuinely doesn't
+need to be constant-time must use the fast_memeq() / fast_memneq() /
+fast_memcmp() aliases instead.
+
**/
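A minimal C sketch of why a constant-time comparison differs from memcmp(): it always examines every byte and folds the differences together, instead of returning at the first mismatch. It is not Tor's tor_memeq(); `toy_ct_memeq` is a hypothetical name, and `volatile` is used here only as a simple way to discourage the compiler from adding an early exit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return 1 if the n-byte regions a and b are equal, 0 otherwise.
 * The loop always visits all n bytes and accumulates differences with
 * OR, so its running time does not depend on where (or whether) the
 * inputs differ. */
static int toy_ct_memeq(const void *a, const void *b, size_t n)
{
  const volatile uint8_t *pa = a;
  const volatile uint8_t *pb = b;
  uint32_t diff = 0;

  assert((a != NULL && b != NULL) || n == 0);
  for (size_t i = 0; i < n; ++i)
    diff |= (uint32_t)(pa[i] ^ pb[i]);

  /* diff is 0 iff the inputs are equal. For diff == 0, (diff - 1) wraps
   * to 0xFFFFFFFF, so bit 0 of ((diff - 1) >> 8) is 1; for 1..255 the
   * shift yields 0. No data-dependent branch is needed. */
  return (int)(1 & ((diff - 1) >> 8));
}
```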
diff --git a/src/lib/defs/lib_defs.dox b/src/lib/defs/lib_defs.dox
index 5adb527fc7..8ed4d7a0af 100644
--- a/src/lib/defs/lib_defs.dox
+++ b/src/lib/defs/lib_defs.dox
@@ -1,4 +1,4 @@
/**
-@dir lib/defs
-@brief lib/defs
+@dir /lib/defs
+@brief lib/defs: Lowest-level constants, used in many places.
**/
diff --git a/src/lib/dispatch/lib_dispatch.dox b/src/lib/dispatch/lib_dispatch.dox
index f194eff481..955b7df64f 100644
--- a/src/lib/dispatch/lib_dispatch.dox
+++ b/src/lib/dispatch/lib_dispatch.dox
@@ -1,4 +1,16 @@
/**
-@dir lib/dispatch
-@brief lib/dispatch
+@dir /lib/dispatch
+@brief lib/dispatch: In-process message delivery.
+
+This module provides a general in-process "message dispatch" system in which
+typed messages are sent on channels. The dispatch.h header has far more
+information.
+
+It is used by \refdir{lib/pubsub} to implement our general
+inter-module publish/subscribe system.
+
+This is not a fancy multi-threaded many-to-many dispatcher as you may be used
+to from more sophisticated architectures: this dispatcher is intended only
+for use in improving Tor's architecture.
+
**/
diff --git a/src/lib/encoding/lib_encoding.dox b/src/lib/encoding/lib_encoding.dox
index 4a5fad9271..ca698cb183 100644
--- a/src/lib/encoding/lib_encoding.dox
+++ b/src/lib/encoding/lib_encoding.dox
@@ -1,4 +1,8 @@
/**
-@dir lib/encoding
-@brief lib/encoding
+@dir /lib/encoding
+@brief lib/encoding: Encoding data in various forms, types, and transformations
+
+Here we have time formats (timefmt.c), quoted strings (qstring.c), C strings
+(string.c), base-16/32/64 (binascii.c), and more.
+
**/
diff --git a/src/lib/err/lib_err.dox b/src/lib/err/lib_err.dox
index 8994fa5fd8..d1479b1140 100644
--- a/src/lib/err/lib_err.dox
+++ b/src/lib/err/lib_err.dox
@@ -1,4 +1,15 @@
/**
-@dir lib/err
-@brief lib/err
+@dir /lib/err
+@brief lib/err: Lowest-level error handling code.
+
+This module is responsible for generating stack traces, handling raw
+assertion failures, and otherwise reporting problems that might not be
+safe to report via the regular logging module.
+
+There are three kinds of users for the functions in this module:
+ * Code that needs a way to assert(), but which cannot use the regular
+ `tor_assert()` macros in the logging module.
+ * Code that needs signal-safe error reporting.
+ * Higher-level error handling code.
+
**/
diff --git a/src/lib/evloop/lib_evloop.dox b/src/lib/evloop/lib_evloop.dox
index 86b60e3cd5..52fcf67755 100644
--- a/src/lib/evloop/lib_evloop.dox
+++ b/src/lib/evloop/lib_evloop.dox
@@ -1,4 +1,9 @@
/**
-@dir lib/evloop
-@brief lib/evloop
+@dir /lib/evloop
+@brief lib/evloop: Low-level event loop.
+
+This module has tools to manage the [libevent](https://libevent.org/) event
+loop and related functionality, in order to implement asynchronous
+networking, timers, periodic events, and other scheduling tasks.
+
**/
diff --git a/src/lib/fdio/lib_fdio.dox b/src/lib/fdio/lib_fdio.dox
index b868d28aab..9e2fda617a 100644
--- a/src/lib/fdio/lib_fdio.dox
+++ b/src/lib/fdio/lib_fdio.dox
@@ -1,4 +1,7 @@
/**
-@dir lib/fdio
-@brief lib/fdio
+@dir /lib/fdio
+@brief lib/fdio: Code to read/write on file descriptors.
+
+(This module also handles sockets, on platforms where a socket is not a kind
+of fd.)
**/
diff --git a/src/lib/fs/lib_fs.dox b/src/lib/fs/lib_fs.dox
index ad775ba553..4466250bb8 100644
--- a/src/lib/fs/lib_fs.dox
+++ b/src/lib/fs/lib_fs.dox
@@ -1,4 +1,11 @@
/**
-@dir lib/fs
-@brief lib/fs
+@dir /lib/fs
+@brief lib/fs: Files, filenames, directories, etc.
+
+This module is mostly a set of compatibility wrappers around
+operating-system-specific filesystem access.
+
+It also contains a set of convenience functions for safely writing to files,
+creating directories, and so on.
+
**/
diff --git a/src/lib/geoip/lib_geoip.dox b/src/lib/geoip/lib_geoip.dox
index 7ad99e8f55..da1123640b 100644
--- a/src/lib/geoip/lib_geoip.dox
+++ b/src/lib/geoip/lib_geoip.dox
@@ -1,4 +1,5 @@
/**
-@dir lib/geoip
-@brief lib/geoip
+@dir /lib/geoip
+@brief lib/geoip: IP-to-country mapping
+
**/
diff --git a/src/lib/intmath/lib_intmath.dox b/src/lib/intmath/lib_intmath.dox
index ce71e455d1..e9b7044706 100644
--- a/src/lib/intmath/lib_intmath.dox
+++ b/src/lib/intmath/lib_intmath.dox
@@ -1,4 +1,4 @@
/**
-@dir lib/intmath
-@brief lib/intmath
+@dir /lib/intmath
+@brief lib/intmath: Integer mathematics.
**/
diff --git a/src/lib/lib.dox b/src/lib/lib.dox
index f1b2291c76..fdf2c47687 100644
--- a/src/lib/lib.dox
+++ b/src/lib/lib.dox
@@ -1,8 +1,133 @@
/**
-@dir lib
+@dir /lib
@brief lib: low-level functionality.
-The "lib" directory contains low-level functionality, most of it not
-necessarily Tor-specific.
+The "lib" directory contains low-level functionality. In general, this
+code is not necessarily Tor-specific, but is instead possibly useful for
+other applications.
+
+The modules in `lib` are currently well-factored: each one depends
+only on lower-level modules. You can see an up-to-date list of the
+modules, sorted from lowest to highest level, by running
+`./scripts/maint/practracker/includes.py --toposort`.
+
+As of this writing, the library modules are (from lowest to highest
+level):
+
+ - \refdir{lib/cc} -- Macros for managing the C compiler and
+ language.
+
+ - \refdir{lib/version} -- Holds the current version of Tor.
+
+ - \refdir{lib/testsupport} -- Helpers for making
+ test-only code, and test mocking support.
+
+ - \refdir{lib/defs} -- Lowest-level constants.
+
+ - \refdir{lib/subsys} -- Types used for declaring a
+ "subsystem". (_A subsystem is a module with support for initialization,
+ shutdown, configuration, and so on._)
+
+ - \refdir{lib/conf} -- For declaring configuration options.
+
+ - \refdir{lib/arch} -- For handling differences in CPU
+ architecture.
+
+ - \refdir{lib/err} -- Lowest-level error handling code.
+
+ - \refdir{lib/malloc} -- Wrappers and utilities for memory
+ management.
+
+ - \refdir{lib/intmath} -- Integer mathematics.
+
+ - \refdir{lib/fdio} -- For
+ reading and writing on file descriptors.
+
+ - \refdir{lib/lock} -- Simple locking support.
+ (_Lower-level than the rest of the threading code._)
+
+ - \refdir{lib/ctime} -- Constant-time code to avoid
+ side-channels.
+
+ - \refdir{lib/string} -- Low-level string manipulation.
+
+ - \refdir{lib/wallclock} --
+ For inspecting and manipulating the current (UTC) time.
+
+ - \refdir{lib/osinfo} -- For inspecting the OS version
+ and capabilities.
+
+ - \refdir{lib/smartlist_core} -- The bare-bones
+ pieces of our dynamic array ("smartlist") implementation.
+
+ - \refdir{lib/log} -- Log messages to files, syslogs, etc.
+
+ - \refdir{lib/container} -- General purpose containers,
+ including dynamic arrays ("smartlists"), hashtables, bit arrays,
+ etc.
+
+ - \refdir{lib/trace} -- A general-purpose API for
+ function-tracing functionality in Tor. (_Currently not much used._)
+
+ - \refdir{lib/thread} -- Mid-level threading.
+
+ - \refdir{lib/term} -- Terminal manipulation
+ (like reading a password from the user).
+
+ - \refdir{lib/memarea} -- A fast
+ "arena" style allocator, where the data is freed all at once.
+
+ - \refdir{lib/encoding} -- Encoding
+ data in various formats, datatypes, and transformations.
+
+ - \refdir{lib/dispatch} -- A general-purpose in-process
+ message delivery system.
+
+ - \refdir{lib/sandbox} -- Our Linux seccomp2 sandbox
+ implementation.
+
+ - \refdir{lib/pubsub} -- A publish/subscribe message passing system.
+
+ - \refdir{lib/fs} -- Files, filenames, directories, etc.
+
+ - \refdir{lib/confmgt} -- Parse, encode, and manipulate configuration files.
+
+ - \refdir{lib/crypt_ops} -- Cryptographic operations.
+
+ - \refdir{lib/meminfo} -- Functions for inspecting our
+ memory usage, if the malloc implementation exposes that to us.
+
+ - \refdir{lib/time} -- Higher level time functions, including
+ fine-grained and monotonic timers.
+
+ - \refdir{lib/math} -- Floating-point mathematical utilities.
+
+ - \refdir{lib/buf} -- An efficient byte queue.
+
+ - \refdir{lib/net} -- Networking code, including address
+ manipulation, compatibility wrappers, etc.
+
+ - \refdir{lib/compress} -- Wraps several compression libraries.
+
+ - \refdir{lib/geoip} -- IP-to-country mapping.
+
+ - \refdir{lib/tls} -- TLS library wrappers.
+
+ - \refdir{lib/evloop} -- Low-level event-loop.
+
+ - \refdir{lib/process} -- Launch and manage subprocesses.
+
+### What belongs in lib?
+
+In general, if you can imagine some program wanting the functionality
+you're writing, even if that program had nothing to do with Tor, your
+functionality belongs in lib.
+
+If it falls into one of the existing "lib" categories, your
+functionality belongs in lib.
+
+If you are using platform-specific `ifdef`s to manage compatibility
+issues among platforms, you should probably consider whether you can
+put your code into lib.
**/
diff --git a/src/lib/lock/lib_lock.dox b/src/lib/lock/lib_lock.dox
index 44693e7a69..868b5ba7d4 100644
--- a/src/lib/lock/lib_lock.dox
+++ b/src/lib/lock/lib_lock.dox
@@ -1,4 +1,8 @@
/**
-@dir lib/lock
-@brief lib/lock
+@dir /lib/lock
+@brief lib/lock: Simple locking support.
+
+This module is lower-level than the rest of the threading code, since it
+is needed by more intermediate-level modules.
+
**/
diff --git a/src/lib/log/lib_log.dox b/src/lib/log/lib_log.dox
index 915d652407..a772dc3207 100644
--- a/src/lib/log/lib_log.dox
+++ b/src/lib/log/lib_log.dox
@@ -1,4 +1,12 @@
/**
-@dir lib/log
-@brief lib/log
+@dir /lib/log
+@brief lib/log: Log messages to files, syslogs, etc.
+
+You can think of this as the logical "midpoint" of the
+\refdir{lib} code: much of the higher-level code is higher-level
+_because_ it uses the logging module, and much of the lower-level code is
+specifically written to avoid having to log, because the logging module
+depends on it.
+
+
**/
diff --git a/src/lib/malloc/lib_malloc.dox b/src/lib/malloc/lib_malloc.dox
index 4923f14463..c05e4c6473 100644
--- a/src/lib/malloc/lib_malloc.dox
+++ b/src/lib/malloc/lib_malloc.dox
@@ -1,4 +1,78 @@
/**
-@dir lib/malloc
-@brief lib/malloc
+@dir /lib/malloc
+@brief lib/malloc: Wrappers and utilities for memory management.
+
+
+Tor imposes a few light wrappers over C's native malloc and free
+functions, to improve convenience, and to allow wholesale replacement
+of malloc and free as needed.
+
+You should never use 'malloc', 'calloc', 'realloc', or 'free' on their
+own; always use the variants prefixed with 'tor_'.
+They are the same as the standard C functions, with the following
+exceptions:
+
+ * `tor_free(NULL)` is a no-op.
+ * `tor_free()` is a macro that takes an lvalue as an argument and sets it to
+ NULL after freeing it. To avoid this behavior, you can use `tor_free_()`
+ instead.
+ * tor_malloc() and friends fail with an assertion if they are asked to
+ allocate a value so large that it is probably an underflow.
+ * It is always safe to `tor_malloc(0)`, regardless of whether your libc
+ allows it.
+ * `tor_malloc()`, `tor_realloc()`, and friends are never allowed to fail.
+ Instead, Tor will die with an assertion. This means that you never
+ need to check their return values. See the next subsection for
+ information on why we think this is a good idea.
+
+We define additional general-purpose memory allocation functions as well:
+
+ * `tor_malloc_zero(x)` behaves as `calloc(1, x)`, except that it makes clear
+ the intent to allocate a single zeroed-out value.
+ * `tor_reallocarray(x,y)` behaves as the OpenBSD reallocarray function.
+ Use it for cases when you need to realloc() in a multiplication-safe
+ way.
+
+And specific-purpose functions as well:
+
+ * `tor_strdup()` and `tor_strndup()` behave as the underlying libc
+ functions, but use `tor_malloc()` instead of the underlying function.
+ * `tor_memdup()` copies a chunk of memory of a given size.
+ * `tor_memdup_nulterm()` copies a chunk of memory of a given size, then
+ NUL-terminates it just to be safe.
+
+#### Why assert on allocation failure?
+
+Why don't we allow `tor_malloc()` and its allies to return NULL?
+
+First, it's error-prone. Many programmers forget to check for NULL return
+values, and testing for `malloc()` failures is a major pain.
+
+Second, it's not necessarily a great way to handle OOM conditions. It's
+probably better (we think) to have a memory target where we dynamically free
+things ahead of time in order to stay under the target. Trying to respond to
+an OOM at the point of `tor_malloc()` failure, on the other hand, would involve
+a rare operation invoked from deep in the call stack. (Again, that's
+error-prone and hard to debug.)
+
+Third, thanks to the rise of Linux and other operating systems that allow
+memory to be overcommitted, you can't actually ever rely on getting a NULL
+from `malloc()` when you're out of memory; instead you have to use an approach
+closer to tracking the total memory usage.
+
+#### Conventions for your own allocation functions.
+
+Whenever you create a new type, the convention is to give it a pair of
+`x_new()` and `x_free_()` functions, named after the type.
+
+Calling `x_free(NULL)` should always be a no-op.
+
+There should additionally be an `x_free()` macro, defined in terms of
+`x_free_()`. This macro should set its lvalue to NULL. You can define it
+using the FREE_AND_NULL macro, as follows:
+
+```
+#define x_free(ptr) FREE_AND_NULL(x_t, x_free_, (ptr))
+```
+
**/
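The following sketch illustrates the allocation rules described for lib/malloc above. Allocation never returns NULL, zero-byte requests are safe, implausibly large sizes are fatal, array reallocation checks for multiplication overflow, and the free macros set their argument to NULL. It also shows the `x_new()` / `x_free_()` / `x_free()` convention. All names here (`sketch_malloc()`, `SKETCH_FREE_AND_NULL()`, `widget_t`, and so on) are hypothetical stand-ins rather than Tor's definitions, and the size ceiling is an assumed value.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Requests at or above this size are treated as probable underflows (e.g. a
 * negative length converted to size_t). The exact value is an assumption. */
#define SKETCH_SIZE_CEILING (SIZE_MAX / 2)

/* Never returns NULL: on failure the process dies instead. */
static void *
sketch_malloc(size_t size)
{
  if (size >= SKETCH_SIZE_CEILING)
    abort();
  if (size == 0)
    size = 1;  /* some libcs return NULL for malloc(0); sidestep that */
  void *p = malloc(size);
  if (!p)
    abort();
  return p;
}

/* Like calloc(1, size): one zeroed-out value. */
static void *
sketch_malloc_zero(size_t size)
{
  void *p = sketch_malloc(size);
  memset(p, 0, size);
  return p;
}

/* reallocarray()-style: dies if n * size would overflow size_t. */
static void *
sketch_reallocarray(void *ptr, size_t n, size_t size)
{
  if (size != 0 && n > SIZE_MAX / size)
    abort();
  size_t total = n * size;
  if (total >= SKETCH_SIZE_CEILING)
    abort();
  if (total == 0)
    total = 1;
  void *p = realloc(ptr, total);
  if (!p)
    abort();
  return p;
}

/* Free, then set the variable to NULL. `ptr` is evaluated twice, so pass
 * only a side-effect-free lvalue. free(NULL) is already a no-op in C. */
#define sketch_free(ptr) do { free(ptr); (ptr) = NULL; } while (0)

/* Stand-in for FREE_AND_NULL(): takes the address once, so `var` is
 * evaluated only once. */
#define SKETCH_FREE_AND_NULL(typename, freefn, var) \
  do {                                              \
    typename **tmp_ptr_ = &(var);                   \
    freefn(*tmp_ptr_);                              \
    *tmp_ptr_ = NULL;                               \
  } while (0)

/* Hypothetical type following the x_new() / x_free_() / x_free() pattern. */
typedef struct widget_t {
  char *name;
} widget_t;

static widget_t *
widget_new(const char *name)
{
  widget_t *w = sketch_malloc_zero(sizeof(*w));
  size_t len = strlen(name) + 1;
  w->name = memcpy(sketch_malloc(len), name, len);
  return w;
}

static void
widget_free_(widget_t *w)
{
  if (!w)
    return;  /* widget_free_(NULL) is a no-op */
  sketch_free(w->name);
  free(w);
}

#define widget_free(w) SKETCH_FREE_AND_NULL(widget_t, widget_free_, (w))
```

Because `widget_free()` nulls the caller's variable, calling it twice on the same variable is harmless: the second call sees NULL and does nothing.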
diff --git a/src/lib/math/lib_math.dox b/src/lib/math/lib_math.dox
index c2e121dc8c..f20d7092b3 100644
--- a/src/lib/math/lib_math.dox
+++ b/src/lib/math/lib_math.dox
@@ -1,4 +1,8 @@
/**
-@dir lib/math
-@brief lib/math
+@dir /lib/math
+@brief lib/math: Floating-point math utilities.
+
+This module includes a bunch of floating-point compatibility code, and
+implementations for several probability distributions.
+
**/
diff --git a/src/lib/memarea/lib_memarea.dox b/src/lib/memarea/lib_memarea.dox
index dbd98de5ec..041191482d 100644
--- a/src/lib/memarea/lib_memarea.dox
+++ b/src/lib/memarea/lib_memarea.dox
@@ -1,4 +1,30 @@
/**
-@dir lib/memarea
-@brief lib/memarea
+@dir /lib/memarea
+@brief lib/memarea: A fast arena-style allocator.
+
+This module has a fast "arena" style allocator, where memory is freed all at
+once. This kind of allocation is very fast and avoids fragmentation, at the
+expense of requiring all the data to be freed at the same time. We use this
+for parsing and diff calculations.
+
+It's often handy to allocate a large number of tiny objects, all of which
+need to disappear at the same time. You can do this in tor using the
+memarea.c abstraction, which uses a set of grow-only buffers for allocation,
+and only supports a single "free" operation at the end.
+
+Using memareas also helps you avoid memory fragmentation. You see, some libc
+malloc implementations perform badly on the case where a large number of
+small temporary objects are allocated at the same time as a few long-lived
+objects of similar size. But if you use tor_malloc() for the long-lived ones
+and a memarea for the temporary objects, the malloc implementation is likelier
+to do better.
+
+To create a new memarea, use `memarea_new()`. To drop all the storage from a
+memarea, and invalidate its pointers, use `memarea_drop_all()`.
+
+The allocation functions `memarea_alloc()`, `memarea_alloc_zero()`,
+`memarea_memdup()`, `memarea_strdup()`, and `memarea_strndup()` are analogous
+to the similarly-named malloc() functions. There is intentionally no
+`memarea_free()` or `memarea_realloc()`.
+
**/
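To make the arena idea from lib/memarea concrete, here is a minimal bump-pointer allocator built from a chain of grow-only chunks, with a single call that releases everything at once. The names (`arena_t`, `arena_alloc()`, `arena_free_all()`) and the 4096-byte chunk size are illustrative assumptions. This is not Tor's `memarea_t` implementation.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ARENA_CHUNK_SIZE 4096

/* One grow-only chunk; chunks are chained so they can all be freed at once. */
typedef struct arena_chunk_t {
  struct arena_chunk_t *next;  /* previously filled chunk, or NULL */
  size_t used;                 /* bytes already handed out */
  size_t size;                 /* usable bytes in data[] */
  max_align_t data[];          /* storage, maximally aligned */
} arena_chunk_t;

typedef struct arena_t {
  arena_chunk_t *head;         /* chunk currently being filled */
} arena_t;

/* Round n up to a multiple of the strictest fundamental alignment. */
static size_t
round_up_align(size_t n)
{
  size_t a = alignof(max_align_t);
  return (n + a - 1) & ~(a - 1);
}

static arena_chunk_t *
chunk_new(size_t size, arena_chunk_t *next)
{
  arena_chunk_t *c = malloc(offsetof(arena_chunk_t, data) + size);
  if (!c)
    abort();
  c->next = next;
  c->used = 0;
  c->size = size;
  return c;
}

static arena_t *
arena_new(void)
{
  arena_t *a = malloc(sizeof(*a));
  if (!a)
    abort();
  a->head = chunk_new(ARENA_CHUNK_SIZE, NULL);
  return a;
}

/* Bump-pointer allocation: no per-object header, no per-object free. */
static void *
arena_alloc(arena_t *a, size_t n)
{
  n = round_up_align(n ? n : 1);
  if (a->head->size - a->head->used < n) {
    /* Current chunk is full: start a new one, big enough for n. */
    size_t sz = n > ARENA_CHUNK_SIZE ? n : ARENA_CHUNK_SIZE;
    a->head = chunk_new(sz, a->head);
  }
  void *p = (char *)a->head->data + a->head->used;
  a->head->used += n;
  return p;
}

static char *
arena_strdup(arena_t *a, const char *s)
{
  size_t len = strlen(s) + 1;
  return memcpy(arena_alloc(a, len), s, len);
}

/* Release every chunk; all pointers from this arena become invalid. */
static void
arena_free_all(arena_t *a)
{
  arena_chunk_t *c = a->head;
  while (c) {
    arena_chunk_t *next = c->next;
    free(c);
    c = next;
  }
  free(a);
}
```

Each allocation is just an alignment round-up and a pointer bump, which is where the speed and low fragmentation come from. The trade-off, as the text says, is that nothing can be freed or resized individually.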
diff --git a/src/lib/meminfo/lib_meminfo.dox b/src/lib/meminfo/lib_meminfo.dox
index c8def7e2f9..b57e60525e 100644
--- a/src/lib/meminfo/lib_meminfo.dox
+++ b/src/lib/meminfo/lib_meminfo.dox
@@ -1,4 +1,7 @@
/**
-@dir lib/meminfo
-@brief lib/meminfo
+@dir /lib/meminfo
+@brief lib/meminfo: Inspecting malloc() usage.
+
+Only available when malloc() provides mallinfo() or something similar.
+
**/
diff --git a/src/lib/net/lib_net.dox b/src/lib/net/lib_net.dox
index 03783c12aa..b4c00405d7 100644
--- a/src/lib/net/lib_net.dox
+++ b/src/lib/net/lib_net.dox
@@ -1,4 +1,8 @@
/**
-@dir lib/net
-@brief lib/net
+@dir /lib/net
+@brief lib/net: Low-level network-related code.
+
+This module includes address manipulation, compatibility wrappers,
+convenience functions, and so on.
+
**/
diff --git a/src/lib/osinfo/lib_osinfo.dox b/src/lib/osinfo/lib_osinfo.dox
index 7733755f20..4d9b1a6d76 100644
--- a/src/lib/osinfo/lib_osinfo.dox
+++ b/src/lib/osinfo/lib_osinfo.dox
@@ -1,4 +1,10 @@
/**
-@dir lib/osinfo
-@brief lib/osinfo
+@dir /lib/osinfo
+@brief lib/osinfo: For inspecting the OS version and capabilities.
+
+In general, we use this module when we're telling the user what operating
+system they are running. We shouldn't make decisions based on the output of
+these checks: instead, we should have more specific checks, either at compile
+time or run time, based on the observed system behavior.
+
**/
diff --git a/src/lib/process/lib_process.dox b/src/lib/process/lib_process.dox
index efb1adc091..723c9f193d 100644
--- a/src/lib/process/lib_process.dox
+++ b/src/lib/process/lib_process.dox
@@ -1,4 +1,4 @@
/**
-@dir lib/process
-@brief lib/process
+@dir /lib/process
+@brief lib/process: Launch and manage subprocesses.
**/
diff --git a/src/lib/pubsub/lib_pubsub.dox b/src/lib/pubsub/lib_pubsub.dox
index 9a3fc6dfac..c033660121 100644
--- a/src/lib/pubsub/lib_pubsub.dox
+++ b/src/lib/pubsub/lib_pubsub.dox
@@ -1,4 +1,16 @@
/**
-@dir lib/pubsub
-@brief lib/pubsub
+@dir /lib/pubsub
+@brief lib/pubsub: Publish-subscribe message passing.
+
+This module wraps the \refdir{lib/dispatch} module, to provide a more
+ergonomic and type-safe approach to message passing.
+
+In general, we favor this mechanism for cases where higher-level modules
+need to be notified when something happens in lower-level modules. (The
+alternative would be calling up from the lower-level modules, which
+would be error-prone; or maintaining lists of function-pointers, which
+would be clumsy and tend to complicate the call graph.)
+
+See pubsub.c for more information.
+
**/
diff --git a/src/lib/sandbox/lib_sandbox.dox b/src/lib/sandbox/lib_sandbox.dox
index eb42d97589..48eddac685 100644
--- a/src/lib/sandbox/lib_sandbox.dox
+++ b/src/lib/sandbox/lib_sandbox.dox
@@ -1,4 +1,17 @@
/**
-@dir lib/sandbox
-@brief lib/sandbox
+@dir /lib/sandbox
+@brief lib/sandbox: Linux seccomp2-based sandbox.
+
+This module uses Linux's seccomp2 facility via the
+[`libseccomp` library](https://github.com/seccomp/libseccomp), to restrict
+the set of system calls that Tor is allowed to invoke while it is running.
+
+Because there are many libc versions that invoke different system calls, and
+because handling strings is quite complex, this module is more complex and
+less portable than it needs to be.
+
+A better architecture would put the responsibility for invoking tricky system
+calls (like open()) in another, less restricted process, and give that
+process responsibility for enforcing our sandbox rules.
+
**/
diff --git a/src/lib/smartlist_core/lib_smartlist_core.dox b/src/lib/smartlist_core/lib_smartlist_core.dox
index 507d0fe92f..73c3b69056 100644
--- a/src/lib/smartlist_core/lib_smartlist_core.dox
+++ b/src/lib/smartlist_core/lib_smartlist_core.dox
@@ -1,4 +1,12 @@
/**
-@dir lib/smartlist_core
-@brief lib/smartlist_core
+@dir /lib/smartlist_core
+@brief lib/smartlist_core: Minimal dynamic array implementation
+
+A `smartlist_t` is a dynamic array type for holding `void *`. We use it
+throughout the rest of the codebase.
+
+There are higher-level pieces in \refdir{lib/container} but
+the ones in lib/smartlist_core are used by the logging code, and therefore
+cannot use the logging code.
+
**/
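A minimal sketch of a dynamic array of `void *`, in the spirit of `smartlist_t`, follows. The type and function names (`ptrlist_t`, `ptrlist_add()`, and so on), the initial capacity of 16, and the doubling growth policy are assumptions for illustration, not a description of lib/smartlist_core's code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical minimal dynamic array of void * (not Tor's smartlist_t). */
typedef struct ptrlist_t {
  void **list;     /* element storage */
  int num_used;    /* number of elements in use */
  int capacity;    /* number of slots allocated */
} ptrlist_t;

static ptrlist_t *
ptrlist_new(void)
{
  ptrlist_t *sl = malloc(sizeof(*sl));
  if (!sl)
    abort();
  sl->num_used = 0;
  sl->capacity = 16;
  sl->list = malloc(sizeof(void *) * (size_t)sl->capacity);
  if (!sl->list)
    abort();
  return sl;
}

/* Append, doubling the capacity when full so appends are amortized O(1). */
static void
ptrlist_add(ptrlist_t *sl, void *element)
{
  if (sl->num_used == sl->capacity) {
    int newcap = sl->capacity * 2;
    void **nl = realloc(sl->list, sizeof(void *) * (size_t)newcap);
    if (!nl)
      abort();
    sl->list = nl;
    sl->capacity = newcap;
  }
  sl->list[sl->num_used++] = element;
}

static void *
ptrlist_get(const ptrlist_t *sl, int idx)
{
  assert(idx >= 0 && idx < sl->num_used);
  return sl->list[idx];
}

/* Frees the list itself, not the elements it points to. */
static void
ptrlist_free(ptrlist_t *sl)
{
  if (!sl)
    return;
  free(sl->list);
  free(sl);
}
```

Because the list holds only pointers, ownership of the elements stays with the caller; freeing the list leaves them untouched.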
diff --git a/src/lib/stats/lib_stats.dox b/src/lib/stats/lib_stats.dox
deleted file mode 100644
index 897c41418f..0000000000
--- a/src/lib/stats/lib_stats.dox
+++ /dev/null
@@ -1,4 +0,0 @@
-/**
-@dir lib/stats
-@brief lib/stats
-**/
diff --git a/src/lib/string/lib_string.dox b/src/lib/string/lib_string.dox
index 3e038ea072..c8793ddf91 100644
--- a/src/lib/string/lib_string.dox
+++ b/src/lib/string/lib_string.dox
@@ -1,4 +1,15 @@
/**
-@dir lib/string
-@brief lib/string
+@dir /lib/string
+@brief lib/string: Low-level string manipulation.
+
+We have a number of compatibility functions here: some are for handling
+functionality that is not implemented (or not implemented the same) on every
+platform; some are for providing locale-independent versions of libc
+functions that would otherwise be defined differently for different users.
+
+Other functions here are for common string-manipulation operations that we do
+in the rest of the codebase.
+
+Any string function high-level enough to need logging belongs in a
+higher-level module.
**/
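As an illustration of why lib/string carries locale-independent replacements for libc functions: `tolower()` and `isspace()` from `<ctype.h>` consult the current locale, and have undefined behavior when passed a negative `char` value (other than one equal to `EOF`). The helpers below are hypothetical (`ascii_tolower()` and friends are not lib/string's names) and treat only ASCII characters specially, whatever locale is in effect.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical locale-independent helpers. Their results never depend on
 * setlocale(), and any char value, including negative ones, is safe. */
static int
ascii_isspace(char c)
{
  return c == ' ' || c == '\t' || c == '\n' || c == '\r' ||
         c == '\f' || c == '\v';
}

static char
ascii_tolower(char c)
{
  return (c >= 'A' && c <= 'Z') ? (char)(c - 'A' + 'a') : c;
}

/* Lowercase `s` in place, touching only the ASCII letters A-Z. */
static void
ascii_strlower(char *s)
{
  for (; *s; ++s)
    *s = ascii_tolower(*s);
}
```

Non-ASCII bytes pass through unchanged, which is usually what protocol code wants when comparing keywords case-insensitively.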
diff --git a/src/lib/subsys/lib_subsys.dox b/src/lib/subsys/lib_subsys.dox
index f9cd5eeb81..1a22a2d808 100644
--- a/src/lib/subsys/lib_subsys.dox
+++ b/src/lib/subsys/lib_subsys.dox
@@ -1,4 +1,34 @@
/**
-@dir lib/subsys
-@brief lib/subsys
+@dir /lib/subsys
+@brief lib/subsys: Types for declaring a "subsystem".
+
+## Subsystems in Tor
+
+A subsystem is a module with support for initialization, shutdown,
+configuration, and so on.
+
+Many parts of Tor can be initialized, cleaned up, and configured somewhat
+independently through a table-driven mechanism. Each such part is called a
+"subsystem".
+
+To declare a subsystem, make a global `const` instance of the `subsys_fns_t`
+type, filling in the function pointer fields that you require with ones
+corresponding to your subsystem. Any function pointers left as "NULL" will
+be a no-op. Each subsystem must have a name and a "level", which corresponds to
+the order in which it is initialized. (See `app/main/subsystem_list.c` for a
+list of current subsystems and their levels.)
+
+Then, insert your subsystem in the list in `app/main/subsystem_list.c`. It
+will need to occupy a position corresponding to its level.
+
+At this point, your subsystem will be handled like the others: it will get
+initialized at startup, torn down at exit, and so on.
+
+Historical note: Not all of Tor's code is currently handled as
+subsystems. As you work with older code, you may see some parts of the code
+that are initialized from `tor_init()` or `run_tor_main_loop()` or
+`tor_run_main()`; and torn down from `tor_cleanup()`. We aim to migrate
+these to subsystems over time; please don't add any new code that follows
+this pattern.
+
**/
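The table-driven mechanism described for lib/subsys can be sketched as an array of `const` descriptors, each with a name, a level, and optional hook functions, which a driver walks in order at startup. `demo_subsys_t` is a simplified, hypothetical stand-in for `subsys_fns_t` (the real struct has more fields), the level numbers are arbitrary, and tearing down in reverse order is this sketch's choice rather than something the text above states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for subsys_fns_t. */
typedef struct demo_subsys_t {
  const char *name;
  int level;                /* lower levels are initialized first */
  int (*initialize)(void);  /* NULL means "nothing to do" */
  void (*shutdown)(void);   /* NULL means "nothing to do" */
} demo_subsys_t;

static int log_ready = 0;
static int log_init(void) { log_ready = 1; return 0; }
static void log_shutdown(void) { log_ready = 0; }
/* Depends on the lower-level "log" subsystem being up already. */
static int net_init(void) { return log_ready ? 0 : -1; }

static const demo_subsys_t demo_log_subsys = {
  .name = "log", .level = 1,
  .initialize = log_init, .shutdown = log_shutdown,
};
static const demo_subsys_t demo_net_subsys = {
  .name = "net", .level = 2,
  .initialize = net_init,   /* no shutdown hook: left NULL */
};

/* Kept sorted by level, as the list in app/main/subsystem_list.c is. */
static const demo_subsys_t *demo_subsystems[] = {
  &demo_log_subsys,
  &demo_net_subsys,
};
#define N_DEMO_SUBSYS (sizeof(demo_subsystems) / sizeof(demo_subsystems[0]))

/* Initialize in list order; returns 0 on success, -1 on first failure. */
static int
demo_subsystems_init(void)
{
  for (size_t i = 0; i < N_DEMO_SUBSYS; ++i) {
    const demo_subsys_t *s = demo_subsystems[i];
    if (i > 0)
      assert(demo_subsystems[i - 1]->level <= s->level);
    if (s->initialize && s->initialize() < 0) {
      fprintf(stderr, "subsystem %s failed to initialize\n", s->name);
      return -1;
    }
  }
  return 0;
}

/* Tear down in reverse order, skipping NULL hooks. */
static void
demo_subsystems_shutdown(void)
{
  for (size_t i = N_DEMO_SUBSYS; i-- > 0; ) {
    if (demo_subsystems[i]->shutdown)
      demo_subsystems[i]->shutdown();
  }
}
```

Leaving a hook as NULL costs nothing, since the driver simply skips it; that is what lets each subsystem fill in only the fields it needs.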
diff --git a/src/lib/term/lib_term.dox b/src/lib/term/lib_term.dox
index 2bc5125839..3bf2f960ab 100644
--- a/src/lib/term/lib_term.dox
+++ b/src/lib/term/lib_term.dox
@@ -1,4 +1,4 @@
/**
-@dir lib/term
-@brief lib/term
+@dir /lib/term
+@brief lib/term: Terminal operations (password input).
**/
diff --git a/src/lib/testsupport/lib_testsupport.dox b/src/lib/testsupport/lib_testsupport.dox
index 63ccc47d34..c09c32e478 100644
--- a/src/lib/testsupport/lib_testsupport.dox
+++ b/src/lib/testsupport/lib_testsupport.dox
@@ -1,4 +1,4 @@
/**
-@dir lib/testsupport
-@brief lib/testsupport
+@dir /lib/testsupport
+@brief lib/testsupport: Helpers for test-only code and for function mocking.
**/
diff --git a/src/lib/thread/lib_thread.dox b/src/lib/thread/lib_thread.dox
index 68937ef793..2773aa009d 100644
--- a/src/lib/thread/lib_thread.dox
+++ b/src/lib/thread/lib_thread.dox
@@ -1,4 +1,9 @@
/**
-@dir lib/thread
-@brief lib/thread
+@dir /lib/thread
+@brief lib/thread: Mid-level threading.
+
+This module contains compatibility and convenience code for multithreading,
+except for low-level locks (which are in \refdir{lib/lock}) and
+workqueue/threadpool code (which belongs in \refdir{lib/evloop}).
+
**/
diff --git a/src/lib/time/lib_time.dox b/src/lib/time/lib_time.dox
index 50abf072f7..b76a31fb97 100644
--- a/src/lib/time/lib_time.dox
+++ b/src/lib/time/lib_time.dox
@@ -1,4 +1,11 @@
/**
-@dir lib/time
-@brief lib/time
+@dir /lib/time
+@brief lib/time: Higher-level time functions
+
+This includes both fine-grained timers and monotonic timers, along with
+wrappers for them to try to improve efficiency.
+
+For "what time is it" in UTC, see \refdir{lib/wallclock}. For parsing and
+encoding times and dates, see \refdir{lib/encoding}.
+
**/
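The difference between lib/time's monotonic timers and lib/wallclock's "what time is it" can be shown with POSIX `clock_gettime()`. This sketch assumes a POSIX system that provides `CLOCK_MONOTONIC`; `monotime_ns_sketch()` is a hypothetical name, not lib/time's API.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: nanoseconds on a monotonic clock. Unlike time(NULL),
 * which reports wall-clock UTC and can jump when the system clock is set,
 * this clock never goes backwards, so the difference between two readings
 * is safe to use for timeouts and rate measurements. */
static int64_t
monotime_ns_sketch(void)
{
  struct timespec ts;
  int r = clock_gettime(CLOCK_MONOTONIC, &ts);
  assert(r == 0);
  (void)r;  /* silence unused-variable warnings when NDEBUG is set */
  return (int64_t)ts.tv_sec * 1000000000 + ts.tv_nsec;
}
```

Subtracting two readings gives an elapsed time that stays correct even if the system's wall clock is changed in between, which is not true of differences between `time(NULL)` values.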
diff --git a/src/lib/tls/lib_tls.dox b/src/lib/tls/lib_tls.dox
index 40b7b2c27e..f0dba269e8 100644
--- a/src/lib/tls/lib_tls.dox
+++ b/src/lib/tls/lib_tls.dox
@@ -1,4 +1,13 @@
/**
-@dir lib/tls
-@brief lib/tls
+@dir /lib/tls
+@brief lib/tls: TLS library wrappers
+
+This module has compatibility wrappers around the library (NSS or OpenSSL,
+depending on configuration) that Tor uses to implement the TLS link security
+protocol.
+
+It also implements the logic for some legacy TLS protocol usage we used to
+support in old versions of Tor, involving conditional delivery of certificate
+chains (v1 link protocol) and conditional renegotiation (v2 link protocol).
+
**/
diff --git a/src/lib/trace/lib_trace.dox b/src/lib/trace/lib_trace.dox
index a1ae256506..64f762bc3e 100644
--- a/src/lib/trace/lib_trace.dox
+++ b/src/lib/trace/lib_trace.dox
@@ -1,4 +1,8 @@
/**
-@dir lib/trace
-@brief lib/trace
+@dir /lib/trace
+@brief lib/trace: Function-tracing functionality API.
+
+This module is used for adding "trace" support (low-granularity function
+logging) to Tor. Right now it doesn't have many users.
+
**/
diff --git a/src/lib/version/lib_version.dox b/src/lib/version/lib_version.dox
index 213e1a1ae8..93d2fb6b9b 100644
--- a/src/lib/version/lib_version.dox
+++ b/src/lib/version/lib_version.dox
@@ -1,4 +1,4 @@
/**
-@dir lib/version
-@brief lib/version
+@dir /lib/version
+@brief lib/version: holds the current version of Tor.
**/
diff --git a/src/lib/wallclock/lib_wallclock.dox b/src/lib/wallclock/lib_wallclock.dox
index 7bb2b075d1..7d43fa6129 100644
--- a/src/lib/wallclock/lib_wallclock.dox
+++ b/src/lib/wallclock/lib_wallclock.dox
@@ -1,4 +1,13 @@
/**
-@dir lib/wallclock
-@brief lib/wallclock
+@dir /lib/wallclock
+@brief lib/wallclock: Inspect and manipulate the current time.
+
+This module handles our concept of "what time is it" or "what time does the
+world agree it is?" Generally, if you want something derived from UTC, this
+is the module for you.
+
+For versions of the time that are more local, more monotonic, or more
+accurate, see \refdir{lib/time}. For parsing and encoding times and dates,
+see \refdir{lib/encoding}.
+
**/
diff --git a/src/mainpage.dox b/src/mainpage.dox
index 84eea3c526..02ce8675e7 100644
--- a/src/mainpage.dox
+++ b/src/mainpage.dox
@@ -1,11 +1,122 @@
/**
@mainpage Tor source reference
-@section intro Getting to know Tor
+@section intro Welcome to Tor
-Welcome to the Tor source code documentation! Here we have documentation for
-nearly every function, type, and module in the Tor source code. The high-level
-documentation is a work in progress. For now, have a look at the source code
-overview in doc/HACKING/design.
+This documentation describes the general structure of the Tor codebase, how
+it fits together, and what functionality is available for extending Tor,
+along with some notes on how Tor got that way. It also includes a reference
+for nearly every function, type, file, and module in the Tor source code. The
+high-level documentation is a work in progress.
+
+Tor itself remains a work in progress too: We've been working on it for
+nearly two decades, and we've learned a lot about good coding since we first
+started. This means, however, that some of the older pieces of Tor will have
+some "code smell" in them that could stand a brisk refactoring. So when we
+describe a piece of code, we'll sometimes give a note on how it got that way,
+and whether we still think that's a good idea.
+
+This document is not an overview of the Tor protocol. For that, see the
+design paper and the specifications at https://spec.torproject.org/ .
+
+For more information about Tor's coding standards and some helpful
+development tools, see
+[doc/HACKING](https://gitweb.torproject.org/tor.git/tree/doc/HACKING) in the
+Tor repository.
+
+@section highlevel The very high level
+
+Ultimately, Tor runs as an event-driven network daemon: it responds to
+network events, signals, and timers by sending and receiving things over
+the network. Clients, relays, and directory authorities all use the
+same codebase: the Tor process will run as a client, relay, or authority
+depending on its configuration.
+
+Tor has a few major dependencies, including Libevent (used to tell which
+sockets are readable and writable), OpenSSL or NSS (used for many encryption
+functions, and to implement the TLS protocol), and zlib (used to
+compress and uncompress directory information).
+
+Most of Tor's work today is done in a single event-driven main thread.
+Tor also spawns one or more worker threads to handle CPU-intensive
+tasks. (Right now, this only includes circuit encryption and the more
+expensive compression algorithms.)
+
+On startup, Tor initializes its libraries, reads and responds to its
+configuration files, and launches a main event loop. At first, the only
+events that Tor listens for are a few signals (like TERM and HUP), and
+one or more listener sockets (for different kinds of incoming
+connections). Tor also configures several timers to handle periodic
+events. As Tor runs over time, other events will open, and new events
+will be scheduled.
+
+The codebase is divided into a few top-level subdirectories, each of
+which contains several sub-modules.
+
+ - `ext` -- Code maintained elsewhere that we include in the Tor
+ source distribution.
+
+ - \refdir{lib} -- Lower-level utility code, not necessarily
+ tor-specific.
+
+ - `trunnel` -- Automatically generated code (from the Trunnel
+ tool): used to parse and encode binary formats.
+
+ - \refdir{core} -- Networking code that implements the central
+ parts of the Tor protocol and main loop.
+
+ - \refdir{feature} -- Aspects of Tor (like directory management,
+ running a relay, running a directory authority, managing a list of
+ nodes, running and using onion services) that are built on top of the
+ mainloop code.
+
+ - \refdir{app} -- Highest-level functionality; responsible for setting
+ up and configuring the Tor daemon, making sure all the lower-level
+ modules start up when required, and so on.
+
+ - \refdir{tools} -- Binaries other than Tor that we produce.
+ Currently this is tor-resolve, tor-gencert, and the tor_runner.o helper
+ module.
+
+ - `test` -- unit tests, regression tests, and a few integration
+ tests.
+
+In theory, the above parts of the codebase are sorted from lowest-level to
+highest-level, where high-level code is only allowed to invoke lower-level
+code, and lower-level code never includes or depends on code of a higher
+level. In practice, this refactoring is incomplete: The modules in
+\refdir{lib} are well-factored, but there are many layer violations ("upward
+dependencies") in \refdir{core} and \refdir{feature}.
+We aim to eliminate those over time.
+
+@section keyabstractions Some key high-level abstractions
+
+The most important abstractions at Tor's high level are Connections,
+Channels, Circuits, and Nodes.
+
+A 'Connection' (connection_t) represents a stream-based information flow.
+Most connections are TCP connections to remote Tor servers and clients. (But
+as a shortcut, a relay will sometimes make a connection to itself without
+actually using a TCP connection. More details later on.) Connections exist
+in different varieties, depending on what functionality they provide. The
+principal types of connection are edge_connection_t (e.g. a SOCKS connection or
+a connection from an exit relay to a destination), or_connection_t (a TLS
+stream connecting to a relay), dir_connection_t (an HTTP connection to learn
+about the network), and control_connection_t (a connection from a
+controller).
+
+A 'Circuit' (circuit_t) is a persistent tunnel through the Tor network,
+established with public-key cryptography, and used to send cells one or more
+hops. Clients keep track of multi-hop circuits (origin_circuit_t), and the
+cryptography associated with each hop. Relays, on the other hand, keep track
+only of their hop of each circuit (or_circuit_t).
+
+A 'Channel' (channel_t) is an abstract view of sending cells to and from a
+Tor relay. Currently, all channels are implemented using OR connections
+(channel_tls_t). If we switch to other strategies in the future, we'll have
+more channel types.
+
+A 'Node' (node_t) is a view of a Tor instance's current knowledge and opinions
+about a Tor relay or bridge.
**/
diff --git a/src/tools/tools.dox b/src/tools/tools.dox
index 54aa4df48e..1168ed5bad 100644
--- a/src/tools/tools.dox
+++ b/src/tools/tools.dox
@@ -1,5 +1,5 @@
/**
-@dir tools
+@dir /tools
@brief tools: other command-line tools for use with Tor.
The "tools" directory has a few other programs that use Tor, but are not part