76 files changed, 1031 insertions, 809 deletions
diff --git a/Doxyfile.in b/Doxyfile.in index 0128f12812..547a7db190 100644 --- a/Doxyfile.in +++ b/Doxyfile.in @@ -256,6 +256,8 @@ TAB_SIZE = 8 ALIASES = +ALIASES += refdir{1}="\ref @top_srcdir@/src/\1 \"\1\"" + # This tag can be used to specify a number of word-keyword mappings (TCL only). # A mapping has the form "name=value". For example adding "class=itcl::class" # will allow you to use the command class in the itcl::class meaning. diff --git a/doc/HACKING/design/00-overview.md b/doc/HACKING/design/00-overview.md index ff40a566be..1c14dc8c10 100644 --- a/doc/HACKING/design/00-overview.md +++ b/doc/HACKING/design/00-overview.md @@ -1,124 +1,6 @@ ## Overview ## -This document describes the general structure of the Tor codebase, how -it fits together, what functionality is available for extending Tor, -and gives some notes on how Tor got that way. - -Tor remains a work in progress: We've been working on it for nearly two -decades, and we've learned a lot about good coding since we first -started. This means, however, that some of the older pieces of Tor will -have some "code smell" in them that could stand a brisk -refactoring. So when I describe a piece of code, I'll sometimes give a -note on how it got that way, and whether I still think that's a good -idea. - -The first drafts of this document were written in the Summer and Fall of -2015, when Tor 0.2.6 was the most recent stable version, and Tor 0.2.7 -was under development. There is a revision in progress (as of late -2019), to bring it up to pace with Tor as of version 0.4.2. If you're -reading this far in the future, some things may have changed. Caveat -haxxor! - -This document is not an overview of the Tor protocol. For that, see the -design paper and the specifications at https://spec.torproject.org/ . - -For more information about Tor's coding standards and some helpful -development tools, see doc/HACKING in the Tor repository. 
- - -### The very high level ### - -Ultimately, Tor runs as an event-driven network daemon: it responds to -network events, signals, and timers by sending and receiving things over -the network. Clients, relays, and directory authorities all use the -same codebase: the Tor process will run as a client, relay, or authority -depending on its configuration. - -Tor has a few major dependencies, including Libevent (used to tell which -sockets are readable and writable), OpenSSL or NSS (used for many encryption -functions, and to implement the TLS protocol), and zlib (used to -compress and uncompress directory information). - -Most of Tor's work today is done in a single event-driven main thread. -Tor also spawns one or more worker threads to handle CPU-intensive -tasks. (Right now, this only includes circuit encryption and the more -expensive compression algorithms.) - -On startup, Tor initializes its libraries, reads and responds to its -configuration files, and launches a main event loop. At first, the only -events that Tor listens for are a few signals (like TERM and HUP), and -one or more listener sockets (for different kinds of incoming -connections). Tor also configures several timers to handle periodic -events. As Tor runs over time, other events will open, and new events -will be scheduled. - -The codebase is divided into a few top-level subdirectories, each of -which contains several sub-modules. - - * `src/ext` -- Code maintained elsewhere that we include in the Tor - source distribution. - - * src/lib` -- Lower-level utility code, not necessarily tor-specific. - - * `src/trunnel` -- Automatically generated code (from the Trunnel - tool): used to parse and encode binary formats. - - * `src/core` -- Networking code that is implements the central parts of - the Tor protocol and main loop. 
- - * `src/feature` -- Aspects of Tor (like directory management, running a - relay, running a directory authorities, managing a list of nodes, - running and using onion services) that are built on top of the - mainloop code. - - * `src/app` -- Highest-level functionality; responsible for setting up - and configuring the Tor daemon, making sure all the lower-level - modules start up when required, and so on. - - * `src/tools` -- Binaries other than Tor that we produce. Currently this - is tor-resolve, tor-gencert, and the tor_runner.o helper module. - - * `src/test` -- unit tests, regression tests, and a few integration - tests. - -In theory, the above parts of the codebase are sorted from highest-level to -lowest-level, where high-level code is only allowed to invoke lower-level -code, and lower-level code never includes or depends on code of a higher -level. In practice, this refactoring is incomplete: The modules in `src/lib` -are well-factored, but there are many layer violations ("upward -dependencies") in `src/core` and `src/feature`. We aim to eliminate those -over time. - -### Some key high-level abstractions ### - -The most important abstractions at Tor's high-level are Connections, -Channels, Circuits, and Nodes. - -A 'Connection' represents a stream-based information flow. Most -connections are TCP connections to remote Tor servers and clients. (But -as a shortcut, a relay will sometimes make a connection to itself -without actually using a TCP connection. More details later on.) -Connections exist in different varieties, depending on what -functionality they provide. The principle types of connection are -"edge" (eg a socks connection or a connection from an exit relay to a -destination), "OR" (a TLS stream connecting to a relay), "Directory" (an -HTTP connection to learn about the network), and "Control" (a connection -from a controller). 
- -A 'Circuit' is persistent tunnel through the Tor network, established -with public-key cryptography, and used to send cells one or more hops. -Clients keep track of multi-hop circuits, and the cryptography -associated with each hop. Relays, on the other hand, keep track only of -their hop of each circuit. - -A 'Channel' is an abstract view of sending cells to and from a Tor -relay. Currently, all channels are implemented using OR connections. -If we switch to other strategies in the future, we'll have more -connection types. - -A 'Node' is a view of a Tor instance's current knowledge and opinions -about a Tor relay or bridge. ### The rest of this document. ### diff --git a/doc/HACKING/design/01.00-lib-overview.md b/doc/HACKING/design/01.00-lib-overview.md deleted file mode 100644 index 58a92f4062..0000000000 --- a/doc/HACKING/design/01.00-lib-overview.md +++ /dev/null @@ -1,171 +0,0 @@ - -## Library code in Tor. - -Most of Tor's utility code is in modules in the `src/lib` subdirectory. In -general, this code is not necessarily Tor-specific, but is instead possibly -useful for other applications. - -This code includes: - - * Compatibility wrappers, to provide a uniform API across different - platforms. - - * Library wrappers, to provide a tor-like API over different libraries - that Tor uses for things like compression and cryptography. - - * Containers, to implement some general-purpose data container types. - -The modules in `src/lib` are currently well-factored: each one depends -only on lower-level modules. You can see an up-to-date list of the -modules sorted from lowest to highest level by running -`./scripts/maint/practracker/includes.py --toposort`. - -As of this writing, the library modules are (from lowest to highest -level): - - * `lib/cc` -- Macros for managing the C compiler and - language. Includes macros for improving compatibility and clarity - across different C compilers. - - * `lib/version` -- Holds the current version of Tor. 
- - * `lib/testsupport` -- Helpers for making test-only code and test - mocking support. - - * `lib/defs` -- Lowest-level constants used in many places across the - code. - - * `lib/subsys` -- Types used for declaring a "subsystem". A subsystem - is a module with support for initialization, shutdown, - configuration, and so on. - - * `lib/conf` -- Types and macros used for declaring configuration - options. - - * `lib/arch` -- Compatibility functions and macros for handling - differences in CPU architecture. - - * `lib/err` -- Lowest-level error handling code: responsible for - generating stack traces, handling raw assertion failures, and - otherwise reporting problems that might not be safe to report - via the regular logging module. - - * `lib/malloc` -- Wrappers and utilities for memory management. - - * `lib/intmath` -- Utilities for integer mathematics. - - * `lib/fdio` -- Utilities and compatibility code for reading and - writing data on file descriptors (and on sockets, for platforms - where a socket is not a kind of fd). - - * `lib/lock` -- Compatibility code for declaring and using locks. - Lower-level than the rest of the threading code. - - * `lib/ctime` -- Constant-time implementations for data comparison - and table lookup, used to avoid timing side-channels from standard - implementations of memcmp() and so on. - - * `lib/string` -- Low-level compatibility wrappers and utility - functions for string manipulation. - - * `lib/wallclock` -- Compatibility and utility functions for - inspecting and manipulating the current (UTC) time. - - * `lib/osinfo` -- Functions for inspecting the version and - capabilities of the operating system. - - * `lib/smartlist_core` -- The bare-bones pieces of our dynamic array - ("smartlist") implementation. There are higher-level pieces, but - these ones are used by (and therefore cannot use) the logging code. - - * `lib/log` -- Implements the logging system used by all higher-level - Tor code. 
You can think of this as the logical "midpoint" of the - library code: much of the higher-level code is higher-level - _because_ it uses the logging module, and much of the lower-level - code is specifically written to avoid having to log, because the - logging module depends on it. - - * `lib/container` -- General purpose containers, including dynamic arrays - ("smartlists"), hashtables, bit arrays, weak-reference-like "handles", - bloom filters, and a bit more. - - * `lib/trace` -- A general-purpose API for introducing - function-tracing functionality into Tor. Currently not much used. - - * `lib/thread` -- Threading compatibility and utility functionality, - other than low-level locks (which are in `lib/lock`) and - workqueue/threadpool code (which belongs in `lib/evloop`). - - * `lib/term` -- Code for terminal manipulation functions (like - reading a password from the user). - - * `lib/memarea` -- A data structure for a fast "arena" style allocator, - where the data is freed all at once. Used for parsing. - - * `lib/encoding` -- Implementations for encoding data in various - formats, datatypes, and transformations. - - * `lib/dispatch` -- A general-purpose in-process message delivery - system. Used by `lib/pubsub` to implement our inter-module - publish/subscribe system. - - * `lib/sandbox` -- Our Linux seccomp2 sandbox implementation. - - * `lib/pubsub` -- Code and macros to implement our publish/subscribe - message passing system. - - * `lib/fs` -- Utility and compatibility code for manipulating files, - filenames, directories, and so on. - - * `lib/confmgt` -- Code to parse, encode, and manipulate our - configuration files, state files, and so forth. - - * `lib/crypt_ops` -- Cryptographic operations. This module contains - wrappers around the cryptographic libraries that we support, - and implementations for some higher-level cryptographic - constructions that we use. 
- - * `lib/meminfo` -- Functions for inspecting our memory usage, if the - malloc implementation exposes that to us. - - * `lib/time` -- Higher level time functions, including fine-gained and - monotonic timers. - - * `lib/math` -- Floating-point mathematical utilities, including - compatibility code, and probability distributions. - - * `lib/buf` -- A general purpose queued buffer implementation, - similar to the BSD kernel's "mbuf" structure. - - * `lib/net` -- Networking code, including address manipulation, - compatibility wrappers, - - * `lib/compress` -- A compatibility wrapper around several - compression libraries, currently including zlib, zstd, and lzma. - - * `lib/geoip` -- Utilities to manage geoip (IP to country) lookups - and formats. - - * `lib/tls` -- Compatibility wrappers around the library (NSS or - OpenSSL, depending on configuration) that Tor uses to implement the - TLS link security protocol. - - * `lib/evloop` -- Tools to manage the event loop and related - functionality, in order to implement asynchronous networking, - timers, periodic events, and other scheduling tasks. - - * `lib/process` -- Utilities and compatibility code to launch and - manage subprocesses. - -### What belongs in lib? - -In general, if you can imagine some program wanting the functionality -you're writing, even if that program had nothing to do with Tor, your -functionality belongs in lib. - -If it falls into one of the existing "lib" categories, your -functionality belongs in lib. - -If you are using platform-specific `#ifdef`s to manage compatibility -issues among platforms, you should probably consider whether you can -put your code into lib. 
diff --git a/doc/HACKING/design/01a-memory.md b/doc/HACKING/design/01a-memory.md deleted file mode 100644 index 4c6bb09018..0000000000 --- a/doc/HACKING/design/01a-memory.md +++ /dev/null @@ -1,103 +0,0 @@ - -## Memory management - -### Heap-allocation functions: lib/malloc/malloc.h - -Tor imposes a few light wrappers over C's native malloc and free -functions, to improve convenience, and to allow wholescale replacement -of malloc and free as needed. - -You should never use 'malloc', 'calloc', 'realloc, or 'free' on their -own; always use the variants prefixed with 'tor_'. -They are the same as the standard C functions, with the following -exceptions: - - * `tor_free(NULL)` is a no-op. - * `tor_free()` is a macro that takes an lvalue as an argument and sets it to - NULL after freeing it. To avoid this behavior, you can use `tor_free_()` - instead. - * tor_malloc() and friends fail with an assertion if they are asked to - allocate a value so large that it is probably an underflow. - * It is always safe to `tor_malloc(0)`, regardless of whether your libc - allows it. - * `tor_malloc()`, `tor_realloc()`, and friends are never allowed to fail. - Instead, Tor will die with an assertion. This means that you never - need to check their return values. See the next subsection for - information on why we think this is a good idea. - -We define additional general-purpose memory allocation functions as well: - - * `tor_malloc_zero(x)` behaves as `calloc(1, x)`, except the it makes clear - the intent to allocate a single zeroed-out value. - * `tor_reallocarray(x,y)` behaves as the OpenBSD reallocarray function. - Use it for cases when you need to realloc() in a multiplication-safe - way. - -And specific-purpose functions as well: - - * `tor_strdup()` and `tor_strndup()` behaves as the underlying libc - functions, but use `tor_malloc()` instead of the underlying function. - * `tor_memdup()` copies a chunk of memory of a given size. 
- * `tor_memdup_nulterm()` copies a chunk of memory of a given size, then - NUL-terminates it just to be safe. - -#### Why assert on allocation failure? - -Why don't we allow `tor_malloc()` and its allies to return NULL? - -First, it's error-prone. Many programmers forget to check for NULL return -values, and testing for `malloc()` failures is a major pain. - -Second, it's not necessarily a great way to handle OOM conditions. It's -probably better (we think) to have a memory target where we dynamically free -things ahead of time in order to stay under the target. Trying to respond to -an OOM at the point of `tor_malloc()` failure, on the other hand, would involve -a rare operation invoked from deep in the call stack. (Again, that's -error-prone and hard to debug.) - -Third, thanks to the rise of Linux and other operating systems that allow -memory to be overcommitted, you can't actually ever rely on getting a NULL -from `malloc()` when you're out of memory; instead you have to use an approach -closer to tracking the total memory usage. - -#### Conventions for your own allocation functions. - -Whenever you create a new type, the convention is to give it a pair of -`x_new()` and `x_free_()` functions, named after the type. - -Calling `x_free(NULL)` should always be a no-op. - -There should additionally be an `x_free()` macro, defined in terms of -`x_free_()`. This macro should set its lvalue to NULL. You can define it -using the FREE_AND_NULL macro, as follows: - -``` -#define x_free(ptr) FREE_AND_NULL(x_t, x_free_, (ptr)) -``` - - -### Grow-only memory allocation: lib/memarea - -It's often handy to allocate a large number of tiny objects, all of which -need to disappear at the same time. You can do this in tor using the -memarea.c abstraction, which uses a set of grow-only buffers for allocation, -and only supports a single "free" operation at the end. - -Using memareas also helps you avoid memory fragmentation. 
You see, some libc -malloc implementations perform badly on the case where a large number of -small temporary objects are allocated at the same time as a few long-lived -objects of similar size. But if you use tor_malloc() for the long-lived ones -and a memarea for the temporary object, the malloc implementation is likelier -to do better. - -To create a new memarea, use `memarea_new()`. To drop all the storage from a -memarea, and invalidate its pointers, use `memarea_drop_all()`. - -The allocation functions `memarea_alloc()`, `memarea_alloc_zero()`, -`memarea_memdup()`, `memarea_strdup()`, and `memarea_strndup()` are analogous -to the similarly-named malloc() functions. There is intentionally no -`memarea_free()` or `memarea_realloc()`. - -### Special allocation: lib/malloc/map_anon.h - -TODO: WRITEME. diff --git a/doc/HACKING/design/01b-collections.md b/doc/HACKING/design/01b-collections.md deleted file mode 100644 index ed6fdc9071..0000000000 --- a/doc/HACKING/design/01b-collections.md +++ /dev/null @@ -1,45 +0,0 @@ - -## Collections in tor - -### Smartlists: Neither lists, nor especially smart. - -For historical reasons, we call our dynamic-allocated array type -`smartlist_t`. It can grow or shrink as elements are added and removed. - -All smartlists hold an array of `void *`. Whenever you expose a smartlist -in an API you *must* document which types its pointers actually hold. - -<!-- It would be neat to fix that, wouldn't it? -NM --> - -Smartlists are created empty with `smartlist_new()` and freed with -`smartlist_free()`. See the `containers.h` module documentation for more -information; there are many convenience functions for commonly needed -operations. - -<!-- TODO: WRITE more about what you can do with smartlists. --> - -### Digest maps, string maps, and more. - -Tor makes frequent use of maps from 160-bit digests, 256-bit digests, -or nul-terminated strings to `void *`. These types are `digestmap_t`, -`digest256map_t`, and `strmap_t` respectively. 
See the containers.h -module documentation for more information. - -### Intrusive lists and hashtables - -For performance-sensitive cases, we sometimes want to use "intrusive" -collections: ones where the bookkeeping pointers are stuck inside the -structures that belong to the collection. If you've used the -BSD-style sys/queue.h macros, you'll be familiar with these. - -Unfortunately, the `sys/queue.h` macros vary significantly between the -platforms that have them, so we provide our own variants in -`src/ext/tor_queue.h`. - -We also provide an intrusive hashtable implementation in `src/ext/ht.h`. -When you're using it, you'll need to define your own hash -functions. If attacker-induced collisions are a worry here, use the -cryptographic siphash24g function to extract hashes. - -<!-- TODO: WRITE about bloom filters, namemaps, bit-arrays, order functions. ---> diff --git a/doc/HACKING/design/01d-crypto.md b/doc/HACKING/design/01d-crypto.md index d4def947d1..3e23a07013 100644 --- a/doc/HACKING/design/01d-crypto.md +++ b/doc/HACKING/design/01d-crypto.md @@ -1,132 +1,4 @@ -## Lower-level cryptography functionality in Tor ## - -Generally speaking, Tor code shouldn't be calling OpenSSL (or any -other crypto library) directly. Instead, we should indirect through -one of the functions in src/common/crypto\*.c or src/common/tortls.c. - -Cryptography functionality that's available is described below. - -### RNG facilities ### - -The most basic RNG capability in Tor is the crypto_rand() family of -functions. These currently use OpenSSL's RAND_() backend, but may use -something faster in the future. - -In addition to crypto_rand(), which fills in a buffer with random -bytes, we also have functions to produce random integers in certain -ranges; to produce random hostnames; to produce random doubles, etc. - -When you're creating a long-term cryptographic secret, you might want -to use crypto_strongest_rand() instead of crypto_rand(). 
It takes the -operating system's entropy source and combines it with output from -crypto_rand(). This is a pure paranoia measure, but it might help us -someday. - -You can use smartlist_choose() to pick a random element from a smartlist -and smartlist_shuffle() to randomize the order of a smartlist. Both are -potentially a bit slow. - -### Cryptographic digests and related functions ### - -We treat digests as separate types based on the length of their -outputs. We support one 160-bit digest (SHA1), two 256-bit digests -(SHA256 and SHA3-256), and two 512-bit digests (SHA512 and SHA3-512). - -You should not use SHA1 for anything new. - -The crypto_digest\*() family of functions manipulates digests. You -can either compute a digest of a chunk of memory all at once using -crypto_digest(), crypto_digest256(), or crypto_digest512(). Or you -can create a crypto_digest_t object with -crypto_digest{,256,512}_new(), feed information to it in chunks using -crypto_digest_add_bytes(), and then extract the final digest using -crypto_digest_get_digest(). You can copy the state of one of these -objects using crypto_digest_dup() or crypto_digest_assign(). - -We support the HMAC hash-based message authentication code -instantiated using SHA256. See crypto_hmac_sha256. (You should not -add any HMAC users with SHA1, and HMAC is not necessary with SHA3.) - -We also support the SHA3 cousins, SHAKE128 and SHAKE256. Unlike -digests, these are extendable output functions (or XOFs) where you can -get any amount of output. Use the crypto_xof_\*() functions to access -these. - -We have several ways to derive keys from cryptographically strong secret -inputs (like diffie-hellman outputs). The old -crypto_expand_key_material-TAP() performs an ad-hoc KDF based on SHA1 -- you -shouldn't use it for implementing anything but old versions of the Tor -protocol. You can use HKDF-SHA256 (as defined in RFC5869) for more modern -protocols. Also consider SHAKE256. 
- -If your input is potentially weak, like a password or passphrase, use a salt -along with the secret_to_key() functions as defined in crypto_s2k.c. Prefer -scrypt over other hashing methods when possible. If you're using a password -to encrypt something, see the "boxed file storage" section below. - -Finally, in order to store objects in hash tables, Tor includes the -randomized SipHash 2-4 function. Call it via the siphash24g() function in -src/ext/siphash.h whenever you're creating a hashtable whose keys may be -manipulated by an attacker in order to DoS you with collisions. - - -### Stream ciphers ### - -You can create instances of a stream cipher using crypto_cipher_new(). -These are stateful objects of type crypto_cipher_t. Note that these -objects only support AES-128 right now; a future version should add -support for AES-128 and/or ChaCha20. - -You can encrypt/decrypt with crypto_cipher_encrypt or -crypto_cipher_decrypt. The crypto_cipher_crypt_inplace function performs -an encryption without a copy. - -Note that sensible people should not use raw stream ciphers; they should -probably be using some kind of AEAD. Sorry. - -### Public key functionality ### - -We support four public key algorithms: DH1024, RSA, Curve25519, and -Ed25519. - -We support DH1024 over two prime groups. You access these via the -crypto_dh_\*() family of functions. - -We support RSA in many bit sizes for signing and encryption. You access -it via the crypto_pk_*() family of functions. Note that a crypto_pk_t -may or may not include a private key. See the crypto_pk_* functions in -crypto.c for a full list of functions here. - -For Curve25519 functionality, see the functions and types in -crypto_curve25519.c. Curve25519 is generally suitable for when you need -a secure fast elliptic-curve diffie hellman implementation. When -designing new protocols, prefer it over DH in Z_p. - -For Ed25519 functionality, see the functions and types in -crypto_ed25519.c. 
Ed25519 is a generally suitable as a secure fast -elliptic curve signature method. For new protocols, prefer it over RSA -signatures. - -### Metaformats for storage ### - -When OpenSSL manages the storage of some object, we use whatever format -OpenSSL provides -- typically, some kind of PEM-wrapped base 64 encoding -that starts with "----- BEGIN CRYPTOGRAPHIC OBJECT ----". - -When we manage the storage of some cryptographic object, we prefix the -object with 32-byte NUL-padded prefix in order to avoid accidental -object confusion; see the crypto_read_tagged_contents_from_file() and -crypto_write_tagged_contents_to_file() functions for manipulating -these. The prefix is "== type: tag ==", where type describes the object -and its encoding, and tag indicates which one it is. - -### Boxed-file storage ### - -When managing keys, you frequently want to have some way to write a -secret object to disk, encrypted with a passphrase. The crypto_pwbox -and crypto_unpwbox functions do so in a way that's likely to be -readable by future versions of Tor. ### Certificates ### @@ -153,17 +25,3 @@ napkin. documents that include keys and which are signed by keys. You can consider these documents to be an additional kind of certificate if you want.) - -### TLS ### - -Tor's TLS implementation is more tightly coupled to OpenSSL than we'd -prefer. You can read most of it in tortls.c. - -Unfortunately, TLS's state machine and our requirement for nonblocking -IO support means that using TLS in practice is a bit hairy, since -logical writes can block on a physical reads, and vice versa. - -If you are lucky, you will never have to look at the code here. 
- - - diff --git a/doc/HACKING/design/03-modules.md b/doc/HACKING/design/03-modules.md index 93eb9d3089..9ab2fa7da3 100644 --- a/doc/HACKING/design/03-modules.md +++ b/doc/HACKING/design/03-modules.md @@ -1,95 +1,6 @@ ## Tor's modules ## -### Generic modules ### - -`buffers.c` -: Implements the `buf_t` buffered data type for connections, and several -low-level data handling functions to handle network protocols on it. - -`channel.c` -: Generic channel implementation. Channels handle sending and receiving cells -among tor nodes. - -`channeltls.c` -: Channel implementation for TLS-based OR connections. Uses `connection_or.c`. - -`circuitbuild.c` -: Code for constructing circuits and choosing their paths. (*Note*: -this module could plausibly be split into handling the client side, -the server side, and the path generation aspects of circuit building.) - -`circuitlist.c` -: Code for maintaining and navigating the global list of circuits. - -`circuitmux.c` -: Generic circuitmux implementation. A circuitmux handles deciding, for a -particular channel, which circuit should write next. - -`circuitmux_ewma.c` -: A circuitmux implementation based on the EWMA (exponentially -weighted moving average) algorithm. - -`circuituse.c` -: Code to actually send and receive data on circuits. - -`command.c` -: Handles incoming cells on channels. - -`config.c` -: Parses options from torrc, and uses them to configure the rest of Tor. - -`confparse.c` -: Generic torrc-style parser. Used to parse torrc and state files. - -`connection.c` -: Generic and common connection tools, and implementation for the simpler -connection types. - -`connection_edge.c` -: Implementation for entry and exit connections. - -`connection_or.c` -: Implementation for OR connections (the ones that send cells over TLS). - -`main.c` -: Principal entry point, main loops, scheduled events, and network -management for Tor. - -`ntmain.c` -: Implements Tor as a Windows service. (Not very well.) 
- -`onion.c` -: Generic code for generating and responding to CREATE and CREATED -cells, and performing the appropriate onion handshakes. Also contains -code to manage the server-side onion queue. - -`onion_fast.c` -: Implements the old SHA1-based CREATE_FAST/CREATED_FAST circuit -creation handshake. (Now deprecated.) - -`onion_ntor.c` -: Implements the Curve25519-based NTOR circuit creation handshake. - -`onion_tap.c` -: Implements the old RSA1024/DH1024-based TAP circuit creation handshake. (Now -deprecated.) - -`relay.c` -: Handles particular types of relay cells, and provides code to receive, -encrypt, route, and interpret relay cells. - -`scheduler.c` -: Decides which channel/circuit pair is ready to receive the next cell. - -`statefile.c` -: Handles loading and storing Tor's state file. - -`tor_main.c` -: Contains the actual `main()` function. (This is placed in a separate -file so that the unit tests can have their own `main()`.) - - ### Node-status modules ### `directory.c` diff --git a/src/app/app.dox b/src/app/app.dox index 29e8651d51..21d5791cde 100644 --- a/src/app/app.dox +++ b/src/app/app.dox @@ -1,5 +1,5 @@ /** -@dir app +@dir /app @brief app: top-level entry point for Tor The "app" directory has Tor's main entry point and configuration logic, diff --git a/src/app/config/app_config.dox b/src/app/config/app_config.dox index 03762fd27d..ef4a878277 100644 --- a/src/app/config/app_config.dox +++ b/src/app/config/app_config.dox @@ -1,4 +1,8 @@ /** -@dir app/config -@brief app/config +@dir /app/config +@brief app/config: Top-level configuration code + +Refactoring this module is a work in progress, see +[ticket 29211](https://trac.torproject.org/projects/tor/ticket/29211). + **/ diff --git a/src/app/main/app_main.dox b/src/app/main/app_main.dox index 1d94f89814..c714ad1396 100644 --- a/src/app/main/app_main.dox +++ b/src/app/main/app_main.dox @@ -1,4 +1,4 @@ /** -@dir app/main -@brief app/main +@dir /app/main +@brief app/main: Entry point for tor. 
**/ diff --git a/src/core/core.dox b/src/core/core.dox index 1352daebd3..11bf55cb78 100644 --- a/src/core/core.dox +++ b/src/core/core.dox @@ -1,8 +1,20 @@ /** -@dir core +@dir /core @brief core: main loop and onion routing functionality The "core" directory has the central protocols for Tor, which every client and relay must implement in order to perform onion routing. +It is divided into three lower-level pieces: + + - \refdir{core/crypto} -- Tor-specific cryptography. + + - \refdir{core/proto} -- Protocol encoding/decoding. + + - \refdir{core/mainloop} -- A connection-oriented asynchronous mainloop. + +and one high-level piece: + + - \refdir{core/or} -- Implements onion routing itself. + **/ diff --git a/src/core/crypto/core_crypto.dox b/src/core/crypto/core_crypto.dox index e5acdd6528..28ece92bb8 100644 --- a/src/core/crypto/core_crypto.dox +++ b/src/core/crypto/core_crypto.dox @@ -1,4 +1,8 @@ /** -@dir core/crypto -@brief core/crypto +@dir /core/crypto +@brief core/crypto: Tor-specific cryptography + +This module implements Tor's circuit-construction crypto and Tor's +relay crypto. + **/ diff --git a/src/core/mainloop/core_mainloop.dox b/src/core/mainloop/core_mainloop.dox index 9b32cb7f60..28cd42bf60 100644 --- a/src/core/mainloop/core_mainloop.dox +++ b/src/core/mainloop/core_mainloop.dox @@ -1,4 +1,12 @@ /** -@dir core/mainloop -@brief core/mainloop +@dir /core/mainloop +@brief core/mainloop: Non-onion-routing mainloop functionality + +This module uses the event-loop code of \refdir{lib/evloop} to implement an +asynchronous connection-oriented protocol handler. + +The layering here is imperfect: the code here was split from \refdir{core/or} +without refactoring how the two modules call one another. Probably many +functions should be moved and refactored. 
+ **/ diff --git a/src/core/or/core_or.dox b/src/core/or/core_or.dox index 1289a85c80..705e9b5436 100644 --- a/src/core/or/core_or.dox +++ b/src/core/or/core_or.dox @@ -1,4 +1,62 @@ /** -@dir core/or -@brief core/or -**/ +@dir /core/or +@brief core/or: *Onion routing happens here*. + +This is the central part of Tor that handles the core tasks of onion routing: +building circuits, handling circuits, attaching circuits to streams, moving +data around, and so forth. + +Some aspects of this module should probably be refactored into others. + +Notable files here include: + +`channel.c` +: Generic channel implementation. Channels handle sending and receiving cells +among Tor nodes. + +`channeltls.c` +: Channel implementation for TLS-based OR connections. Uses `connection_or.c`. + +`circuitbuild.c` +: Code for constructing circuits and choosing their paths. (*Note*: +this module could plausibly be split into handling the client side, +the server side, and the path generation aspects of circuit building.) + +`circuitlist.c` +: Code for maintaining and navigating the global list of circuits. + +`circuitmux.c` +: Generic circuitmux implementation. A circuitmux handles deciding, for a +particular channel, which circuit should write next. + +`circuitmux_ewma.c` +: A circuitmux implementation based on the EWMA (exponentially +weighted moving average) algorithm. + +`circuituse.c` +: Code to actually send and receive data on circuits. + +`command.c` +: Handles incoming cells on channels. + +`connection.c` +: Generic and common connection tools, and implementation for the simpler +connection types. + +`connection_edge.c` +: Implementation for entry and exit connections. + +`connection_or.c` +: Implementation for OR connections (the ones that send cells over TLS). + +`onion.c` +: Generic code for generating and responding to CREATE and CREATED +cells, and performing the appropriate onion handshakes. Also contains +code to manage the server-side onion queue.
+ +`relay.c` +: Handles particular types of relay cells, and provides code to receive, +encrypt, route, and interpret relay cells. + +`scheduler.c` +: Decides which channel/circuit pair is ready to receive the next cell. diff --git a/src/core/proto/core_proto.dox b/src/core/proto/core_proto.dox index 3e1e4ddb6d..13ce751a76 100644 --- a/src/core/proto/core_proto.dox +++ b/src/core/proto/core_proto.dox @@ -1,4 +1,8 @@ /** -@dir core/proto -@brief core/proto +@dir /core/proto +@brief core/proto: Protocol encoding/decoding + +These functions should (but do not always) exist at a lower level than most +of the rest of core. + **/ diff --git a/src/feature/api/feature_api.dox b/src/feature/api/feature_api.dox index cb723b0601..06112120c3 100644 --- a/src/feature/api/feature_api.dox +++ b/src/feature/api/feature_api.dox @@ -1,4 +1,4 @@ /** -@dir feature/api -@brief feature/api +@dir /feature/api +@brief feature/api: In-process interface to starting/stopping Tor. **/ diff --git a/src/feature/client/feature_client.dox b/src/feature/client/feature_client.dox index 1a4881c50a..a8263b494c 100644 --- a/src/feature/client/feature_client.dox +++ b/src/feature/client/feature_client.dox @@ -1,4 +1,7 @@ /** -@dir feature/client -@brief feature/client +@dir /feature/client +@brief feature/client: Client-specific code + +(There is also a bunch of client-specific code in other modules.) + **/ diff --git a/src/feature/control/feature_control.dox b/src/feature/control/feature_control.dox index 1f6e83c1dd..a0bf9413a1 100644 --- a/src/feature/control/feature_control.dox +++ b/src/feature/control/feature_control.dox @@ -1,4 +1,10 @@ /** -@dir feature/control -@brief feature/control +@dir /feature/control +@brief feature/control: Controller API. + +The Controller API is a text-based protocol that another program (or another +thread, if you're running Tor in-process) can use to configure and control +Tor while it is running. 
The current protocol is documented in +[control-spec.txt](https://gitweb.torproject.org/torspec.git/tree/control-spec.txt). + **/ diff --git a/src/feature/dirauth/feature_dirauth.dox b/src/feature/dirauth/feature_dirauth.dox index fa4bee5b31..9ee2d04589 100644 --- a/src/feature/dirauth/feature_dirauth.dox +++ b/src/feature/dirauth/feature_dirauth.dox @@ -1,4 +1,11 @@ /** -@dir feature/dirauth -@brief feature/dirauth +@dir /feature/dirauth +@brief feature/dirauth: Directory authority implementation. + +This module handles running Tor as a directory authority. + +The directory protocol is specified in +[dir-spec.txt](https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt). + + **/ diff --git a/src/feature/dircache/feature_dircache.dox b/src/feature/dircache/feature_dircache.dox index 5f1c5cc70f..ef8a51aa9e 100644 --- a/src/feature/dircache/feature_dircache.dox +++ b/src/feature/dircache/feature_dircache.dox @@ -1,4 +1,8 @@ /** -@dir feature/dircache -@brief feature/dircache +@dir /feature/dircache +@brief feature/dircache: Run as a directory cache server + +This module handles the directory caching functionality that all relays may +provide, for serving cached directory objects to clients. + **/ diff --git a/src/feature/dirclient/feature_dirclient.dox b/src/feature/dirclient/feature_dirclient.dox index 984a17cf51..0cbae69111 100644 --- a/src/feature/dirclient/feature_dirclient.dox +++ b/src/feature/dirclient/feature_dirclient.dox @@ -1,4 +1,9 @@ /** -@dir feature/dirclient -@brief feature/dirclient +@dir /feature/dirclient +@brief feature/dirclient: Directory client implementation. + +The code here is used by all Tor instances that need to download directory +information. Currently, that is all of them, since even authorities need to +launch downloads to learn about relays that other authorities have listed.
+ **/ diff --git a/src/feature/dircommon/feature_dircommon.dox b/src/feature/dircommon/feature_dircommon.dox index 2eff21065c..2d9866da01 100644 --- a/src/feature/dircommon/feature_dircommon.dox +++ b/src/feature/dircommon/feature_dircommon.dox @@ -1,4 +1,9 @@ /** -@dir feature/dircommon -@brief feature/dircommon +@dir /feature/dircommon +@brief feature/dircommon: Directory client and server shared code + +This module has the code that directory clients (anybody who downloads +information about relays) and directory servers (anybody who serves such +information) share in common. + **/ diff --git a/src/feature/dirparse/feature_dirparse.dox b/src/feature/dirparse/feature_dirparse.dox index a6b34c1f5f..4f2136b02b 100644 --- a/src/feature/dirparse/feature_dirparse.dox +++ b/src/feature/dirparse/feature_dirparse.dox @@ -1,4 +1,10 @@ /** -@dir feature/dirparse -@brief feature/dirparse +@dir /feature/dirparse +@brief feature/dirparse: Parsing Tor directory objects + +We define a number of "directory objects" in +[dir-spec.txt](https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt), +all of them using a common line-oriented meta-format. This module is used by +other parts of Tor to parse them. + **/ diff --git a/src/feature/feature.dox b/src/feature/feature.dox index 1d9c3a9df4..03759f9a17 100644 --- a/src/feature/feature.dox +++ b/src/feature/feature.dox @@ -1,5 +1,5 @@ /** -@dir feature +@dir /feature @brief feature: domain-specific modules The "feature" directory has modules that Tor uses only for a particular diff --git a/src/feature/hibernate/feature_hibernate.dox b/src/feature/hibernate/feature_hibernate.dox index e24620a43c..eebb2d51a2 100644 --- a/src/feature/hibernate/feature_hibernate.dox +++ b/src/feature/hibernate/feature_hibernate.dox @@ -1,4 +1,16 @@ /** -@dir feature/hibernate -@brief feature/hibernate +@dir /feature/hibernate +@brief feature/hibernate: Bandwidth accounting and hibernation (!)
+ +This module implements two features that are only somewhat related, and +should probably be separated in the future. One feature is bandwidth +accounting (making sure we use no more than so many gigabytes in a day) and +hibernation (avoiding network activity while we have used up all/most of our +configured gigabytes). The other feature is clean shutdown, where we stop +accepting new connections for a while and give the old ones time to close. + +The two features are related only in the sense that "soft hibernation" (being +almost out of bandwidth) is very close to the "shutting down" state. But it would be +better in the long run to make the two completely separate. + **/ diff --git a/src/feature/hs/feature_hs.dox b/src/feature/hs/feature_hs.dox index 08801d002d..32f44d57fb 100644 --- a/src/feature/hs/feature_hs.dox +++ b/src/feature/hs/feature_hs.dox @@ -1,4 +1,10 @@ /** -@dir feature/hs -@brief feature/hs +@dir /feature/hs +@brief feature/hs: v3 (current) onion service protocol + +This directory implements the v3 onion service protocol, +as specified in +[rend-spec-v3.txt](https://gitweb.torproject.org/torspec.git/tree/rend-spec-v3.txt). + + **/ diff --git a/src/feature/hs_common/feature_hs_common.dox b/src/feature/hs_common/feature_hs_common.dox index 8fd4f1b07c..85d7585872 100644 --- a/src/feature/hs_common/feature_hs_common.dox +++ b/src/feature/hs_common/feature_hs_common.dox @@ -1,4 +1,5 @@ /** -@dir feature/hs_common -@brief feature/hs_common +@dir /feature/hs_common +@brief feature/hs_common: Common to v2 (old) and v3 (current) onion services + **/ diff --git a/src/feature/keymgt/feature_keymgt.dox b/src/feature/keymgt/feature_keymgt.dox index 8f72c70bbd..acc840eb2e 100644 --- a/src/feature/keymgt/feature_keymgt.dox +++ b/src/feature/keymgt/feature_keymgt.dox @@ -1,4 +1,5 @@ /** -@dir feature/keymgt -@brief feature/keymgt +@dir /feature/keymgt +@brief feature/keymgt: Store keys for relays, authorities, etc.
+ **/ diff --git a/src/feature/nodelist/feature_nodelist.dox b/src/feature/nodelist/feature_nodelist.dox index faeb9970b3..0b25dd246d 100644 --- a/src/feature/nodelist/feature_nodelist.dox +++ b/src/feature/nodelist/feature_nodelist.dox @@ -1,4 +1,4 @@ /** -@dir feature/nodelist -@brief feature/nodelist +@dir /feature/nodelist +@brief feature/nodelist: Download and manage a list of relays **/ diff --git a/src/feature/relay/feature_relay.dox b/src/feature/relay/feature_relay.dox index 9aa7af48e6..6867818257 100644 --- a/src/feature/relay/feature_relay.dox +++ b/src/feature/relay/feature_relay.dox @@ -1,4 +1,6 @@ /** -@dir feature/relay -@brief feature/relay +@dir /feature/relay +@brief feature/relay: Relay-specific code + +(There is also a bunch of relay-specific code in other modules.) **/ diff --git a/src/feature/rend/feature_rend.dox b/src/feature/rend/feature_rend.dox index fcba0d460f..ed0784521c 100644 --- a/src/feature/rend/feature_rend.dox +++ b/src/feature/rend/feature_rend.dox @@ -1,4 +1,9 @@ /** -@dir feature/rend -@brief feature/rend +@dir /feature/rend +@brief feature/rend: version 2 (old) hidden services + +This directory implements the v2 onion service protocol, +as specified in +[rend-spec-v2.txt](https://gitweb.torproject.org/torspec.git/tree/rend-spec-v2.txt). + **/ diff --git a/src/feature/stats/feature_stats.dox b/src/feature/stats/feature_stats.dox index fc4ffd19df..0ced00ce58 100644 --- a/src/feature/stats/feature_stats.dox +++ b/src/feature/stats/feature_stats.dox @@ -1,4 +1,12 @@ /** -@dir feature/stats -@brief feature/stats +@dir /feature/stats +@brief feature/stats: Relay statistics. Also, port prediction. + +This module collects anonymized relay statistics in order to publish them in +relays' routerinfo and extrainfo documents. + +Additionally, it contains predict_ports.c, which remembers which ports we've +visited recently as a client, so we can make sure we have open circuits that +support them. 
+ **/ diff --git a/src/lib/arch/lib_arch.dox b/src/lib/arch/lib_arch.dox index 60b5fafeb4..edb0cbbf1d 100644 --- a/src/lib/arch/lib_arch.dox +++ b/src/lib/arch/lib_arch.dox @@ -1,4 +1,4 @@ /** -@dir lib/arch -@brief lib/arch +@dir /lib/arch +@brief lib/arch: Compatibility code for handling different CPU architectures. **/ diff --git a/src/lib/buf/lib_buf.dox b/src/lib/buf/lib_buf.dox index f21c4b1b72..a2ac23ee4c 100644 --- a/src/lib/buf/lib_buf.dox +++ b/src/lib/buf/lib_buf.dox @@ -1,4 +1,15 @@ /** -@dir lib/buf -@brief lib/buf +@dir /lib/buf +@brief lib/buf: An efficient byte queue. + +This module defines the buf_t type, which is used throughout our networking +code. The implementation is a singly-linked queue of buffer chunks, similar +to the BSD kernel's +["mbuf"](https://www.freebsd.org/cgi/man.cgi?query=mbuf&sektion=9) structure. + +The buf_t type is also reasonable for use in constructing long strings. + +See \refdir{lib/net} for networking code that uses buf_t, and +\refdir{lib/tls} for cryptographic code that uses buf_t. + **/ diff --git a/src/lib/cc/lib_cc.dox b/src/lib/cc/lib_cc.dox index 804260cb29..06f4e775bf 100644 --- a/src/lib/cc/lib_cc.dox +++ b/src/lib/cc/lib_cc.dox @@ -1,4 +1,4 @@ /** -@dir lib/cc -@brief lib/cc +@dir /lib/cc +@brief lib/cc: Macros for managing the C compiler and language. **/ diff --git a/src/lib/compress/lib_compress.dox b/src/lib/compress/lib_compress.dox index ac60794565..599126901a 100644 --- a/src/lib/compress/lib_compress.dox +++ b/src/lib/compress/lib_compress.dox @@ -1,4 +1,8 @@ /** -@dir lib/compress -@brief lib/compress +@dir /lib/compress +@brief lib/compress: Wraps several compression libraries + +Currently supported are zlib (mandatory), zstd (optional), and lzma +(optional). 
+ **/ diff --git a/src/lib/conf/lib_conf.dox b/src/lib/conf/lib_conf.dox index 40a1d9f90f..be58fe5b55 100644 --- a/src/lib/conf/lib_conf.dox +++ b/src/lib/conf/lib_conf.dox @@ -1,4 +1,5 @@ /** -@dir lib/conf -@brief lib/conf +@dir /lib/conf +@brief lib/conf: Types and macros for declaring configuration options. + **/ diff --git a/src/lib/confmgt/lib_confmgt.dox b/src/lib/confmgt/lib_confmgt.dox index 964fe1d074..d18fa304ca 100644 --- a/src/lib/confmgt/lib_confmgt.dox +++ b/src/lib/confmgt/lib_confmgt.dox @@ -1,4 +1,9 @@ /** -@dir lib/confmgt -@brief lib/confmgt +@dir /lib/confmgt +@brief lib/confmgt: Parse, encode, manipulate configuration files. + +This logic is used in common by our state files (statefile.c) and +configuration files (config.c) to manage a set of named, typed fields, +reading and writing them to disk and to the controller. + **/ diff --git a/src/lib/container/lib_container.dox b/src/lib/container/lib_container.dox index 6ee719f47e..675aaeef3f 100644 --- a/src/lib/container/lib_container.dox +++ b/src/lib/container/lib_container.dox @@ -1,4 +1,51 @@ /** -@dir lib/container -@brief lib/container +@dir /lib/container +@brief lib/container: Hash tables, dynamic arrays, bit arrays, etc. + +### Smartlists: Neither lists, nor especially smart. + +For historical reasons, we call our dynamic-allocated array type +`smartlist_t`. It can grow or shrink as elements are added and removed. + +All smartlists hold an array of `void *`. Whenever you expose a smartlist +in an API you *must* document which types its pointers actually hold. + +<!-- It would be neat to fix that, wouldn't it? -NM --> + +Smartlists are created empty with `smartlist_new()` and freed with +`smartlist_free()`. See the `containers.h` header documentation for more +information; there are many convenience functions for commonly needed +operations. + +For low-level operations on smartlists, see also +\refdir{lib/smartlist_core}. + +<!-- TODO: WRITE more about what you can do with smartlists. 
--> + +### Digest maps, string maps, and more. + +Tor makes frequent use of maps from 160-bit digests, 256-bit digests, +or nul-terminated strings to `void *`. These types are `digestmap_t`, +`digest256map_t`, and `strmap_t` respectively. See the containers.h +module documentation for more information. + +### Intrusive lists and hashtables + +For performance-sensitive cases, we sometimes want to use "intrusive" +collections: ones where the bookkeeping pointers are stuck inside the +structures that belong to the collection. If you've used the +BSD-style sys/queue.h macros, you'll be familiar with these. + +Unfortunately, the `sys/queue.h` macros vary significantly between the +platforms that have them, so we provide our own variants in +`ext/tor_queue.h`. + +We also provide an intrusive hashtable implementation in `ext/ht.h`. +When you're using it, you'll need to define your own hash +functions. If attacker-induced collisions are a worry here, use the +cryptographic siphash24g function to extract hashes. + +<!-- TODO: WRITE about bloom filters, namemaps, bit-arrays, order functions. +--> + **/ diff --git a/src/lib/crypt_ops/lib_crypt_ops.dox b/src/lib/crypt_ops/lib_crypt_ops.dox index 1ea0b67d59..515c67f1c0 100644 --- a/src/lib/crypt_ops/lib_crypt_ops.dox +++ b/src/lib/crypt_ops/lib_crypt_ops.dox @@ -1,4 +1,139 @@ /** -@dir lib/crypt_ops -@brief lib/crypt_ops +@dir /lib/crypt_ops +@brief lib/crypt_ops: Cryptographic operations. + +This module contains wrappers around the cryptographic libraries that we +support, and implementations for some higher-level cryptographic +constructions that we use. + +It wraps our two major cryptographic backends (OpenSSL or NSS, as configured +by the user), and also wraps other cryptographic code in src/ext. + +Generally speaking, Tor code shouldn't be calling OpenSSL or NSS +(or any other crypto library) directly. Instead, we should indirect through +one of the functions in this directory, or through \refdir{lib/tls}. 
+ +Cryptography functionality that's available is described below. + +### RNG facilities ### + +The most basic RNG capability in Tor is the crypto_rand() family of +functions. These currently use OpenSSL's RAND_() backend, but may use +something faster in the future. + +In addition to crypto_rand(), which fills in a buffer with random +bytes, we also have functions to produce random integers in certain +ranges; to produce random hostnames; to produce random doubles, etc. + +When you're creating a long-term cryptographic secret, you might want +to use crypto_strongest_rand() instead of crypto_rand(). It takes the +operating system's entropy source and combines it with output from +crypto_rand(). This is a pure paranoia measure, but it might help us +someday. + +You can use smartlist_choose() to pick a random element from a smartlist +and smartlist_shuffle() to randomize the order of a smartlist. Both are +potentially a bit slow. + +### Cryptographic digests and related functions ### + +We treat digests as separate types based on the length of their +outputs. We support one 160-bit digest (SHA1), two 256-bit digests +(SHA256 and SHA3-256), and two 512-bit digests (SHA512 and SHA3-512). + +You should not use SHA1 for anything new. + +The crypto_digest\*() family of functions manipulates digests. You +can either compute a digest of a chunk of memory all at once using +crypto_digest(), crypto_digest256(), or crypto_digest512(). Or you +can create a crypto_digest_t object with +crypto_digest{,256,512}_new(), feed information to it in chunks using +crypto_digest_add_bytes(), and then extract the final digest using +crypto_digest_get_digest(). You can copy the state of one of these +objects using crypto_digest_dup() or crypto_digest_assign(). + +We support the HMAC hash-based message authentication code +instantiated using SHA256. See crypto_hmac_sha256. (You should not +add any HMAC users with SHA1, and HMAC is not necessary with SHA3.) 
+ +We also support the SHA3 cousins, SHAKE128 and SHAKE256. Unlike +digests, these are extendable output functions (or XOFs) where you can +get any amount of output. Use the crypto_xof_\*() functions to access +these. + +We have several ways to derive keys from cryptographically strong secret +inputs (like diffie-hellman outputs). The old +crypto_expand_key_material_TAP() performs an ad-hoc KDF based on SHA1 -- you +shouldn't use it for implementing anything but old versions of the Tor +protocol. You can use HKDF-SHA256 (as defined in RFC5869) for more modern +protocols. Also consider SHAKE256. + +If your input is potentially weak, like a password or passphrase, use a salt +along with the secret_to_key() functions as defined in crypto_s2k.c. Prefer +scrypt over other hashing methods when possible. If you're using a password +to encrypt something, see the "boxed file storage" section below. + +Finally, in order to store objects in hash tables, Tor includes the +randomized SipHash 2-4 function. Call it via the siphash24g() function in +src/ext/siphash.h whenever you're creating a hashtable whose keys may be +manipulated by an attacker in order to DoS you with collisions. + + +### Stream ciphers ### + +You can create instances of a stream cipher using crypto_cipher_new(). +These are stateful objects of type crypto_cipher_t. Note that these +objects only support AES-128 right now; a future version should add +support for AES-128 and/or ChaCha20. + +You can encrypt/decrypt with crypto_cipher_encrypt or +crypto_cipher_decrypt. The crypto_cipher_crypt_inplace function performs +an encryption without a copy. + +Note that sensible people should not use raw stream ciphers; they should +probably be using some kind of AEAD. Sorry. + +### Public key functionality ### + +We support four public key algorithms: DH1024, RSA, Curve25519, and +Ed25519. + +We support DH1024 over two prime groups. You access these via the +crypto_dh_\*() family of functions. 
+ +We support RSA in many bit sizes for signing and encryption. You access +it via the crypto_pk_*() family of functions. Note that a crypto_pk_t +may or may not include a private key. See the crypto_pk_* functions in +crypto.c for a full list of functions here. + +For Curve25519 functionality, see the functions and types in +crypto_curve25519.c. Curve25519 is generally suitable for when you need +a secure, fast elliptic-curve Diffie-Hellman implementation. When +designing new protocols, prefer it over DH in Z_p. + +For Ed25519 functionality, see the functions and types in +crypto_ed25519.c. Ed25519 is generally suitable as a secure, fast +elliptic-curve signature method. For new protocols, prefer it over RSA +signatures. + +### Metaformats for storage ### + +When OpenSSL manages the storage of some object, we use whatever format +OpenSSL provides -- typically, some kind of PEM-wrapped base64 encoding +that starts with "----- BEGIN CRYPTOGRAPHIC OBJECT ----". + +When we manage the storage of some cryptographic object, we prefix the +object with a 32-byte NUL-padded prefix in order to avoid accidental +object confusion; see the crypto_read_tagged_contents_from_file() and +crypto_write_tagged_contents_to_file() functions for manipulating +these. The prefix is "== type: tag ==", where type describes the object +and its encoding, and tag indicates which one it is. + +### Boxed-file storage ### + +When managing keys, you frequently want to have some way to write a +secret object to disk, encrypted with a passphrase. The crypto_pwbox +and crypto_unpwbox functions do so in a way that's likely to be +readable by future versions of Tor. + **/ diff --git a/src/lib/ctime/lib_ctime.dox b/src/lib/ctime/lib_ctime.dox index 476c95991c..2bcd0f036a 100644 --- a/src/lib/ctime/lib_ctime.dox +++ b/src/lib/ctime/lib_ctime.dox @@ -1,4 +1,16 @@ /** -@dir lib/ctime -@brief lib/ctime +@dir /lib/ctime +@brief lib/ctime: Constant-time code to avoid side-channels.
+ +This module contains constant-time implementations of various +data comparison and table lookup functions. We use these in preference to +memcmp() and so forth, since memcmp() can leak information about its inputs +based on how fast it returns. In general, your code should call tor_memeq() +and tor_memneq(), not memcmp(). + +We also define some _non_-constant-time wrappers for memcmp() here: Since we +consider calls to memcmp() to be in error, we require code that doesn't +actually need to be constant-time to use the fast_memeq() / fast_memneq() / +fast_memcmp() aliases instead. + **/ diff --git a/src/lib/defs/lib_defs.dox b/src/lib/defs/lib_defs.dox index 5adb527fc7..8ed4d7a0af 100644 --- a/src/lib/defs/lib_defs.dox +++ b/src/lib/defs/lib_defs.dox @@ -1,4 +1,4 @@ /** -@dir lib/defs -@brief lib/defs +@dir /lib/defs +@brief lib/defs: Lowest-level constants, used in many places. **/ diff --git a/src/lib/dispatch/lib_dispatch.dox b/src/lib/dispatch/lib_dispatch.dox index f194eff481..955b7df64f 100644 --- a/src/lib/dispatch/lib_dispatch.dox +++ b/src/lib/dispatch/lib_dispatch.dox @@ -1,4 +1,16 @@ /** -@dir lib/dispatch -@brief lib/dispatch +@dir /lib/dispatch +@brief lib/dispatch: In-process message delivery. + +This module provides a general in-process "message dispatch" system in which +typed messages are sent on channels. The dispatch.h header has far more +information. + +It is used by \refdir{lib/pubsub} to implement our general +inter-module publish/subscribe system. + +This is not a fancy multi-threaded many-to-many dispatcher as you may be used +to from more sophisticated architectures: this dispatcher is intended only +for use in improving Tor's architecture.
+ **/ diff --git a/src/lib/encoding/lib_encoding.dox b/src/lib/encoding/lib_encoding.dox index 4a5fad9271..ca698cb183 100644 --- a/src/lib/encoding/lib_encoding.dox +++ b/src/lib/encoding/lib_encoding.dox @@ -1,4 +1,8 @@ /** -@dir lib/encoding -@brief lib/encoding +@dir /lib/encoding +@brief lib/encoding: Encoding data in various forms, types, and transformations + +Here we have time formats (timefmt.c), quoted strings (qstring.c), C strings +(string.c), base-16/32/64 (binascii.c), and more. + **/ diff --git a/src/lib/err/lib_err.dox b/src/lib/err/lib_err.dox index 8994fa5fd8..d1479b1140 100644 --- a/src/lib/err/lib_err.dox +++ b/src/lib/err/lib_err.dox @@ -1,4 +1,15 @@ /** -@dir lib/err -@brief lib/err +@dir /lib/err +@brief lib/err: Lowest-level error handling code. + +This module is responsible for generating stack traces, handling raw +assertion failures, and otherwise reporting problems that might not be +safe to report via the regular logging module. + +There are three kinds of users for the functions in this module: + * Code that needs a way to assert(), but which cannot use the regular + `tor_assert()` macros in the logging module. + * Code that needs signal-safe error reporting. + * Higher-level error handling code. + **/ diff --git a/src/lib/evloop/lib_evloop.dox b/src/lib/evloop/lib_evloop.dox index 86b60e3cd5..52fcf67755 100644 --- a/src/lib/evloop/lib_evloop.dox +++ b/src/lib/evloop/lib_evloop.dox @@ -1,4 +1,9 @@ /** -@dir lib/evloop -@brief lib/evloop +@dir /lib/evloop +@brief lib/evloop: Low-level event loop. + +This module has tools to manage the [libevent](https://libevent.org/) event +loop and related functionality, in order to implement asynchronous +networking, timers, periodic events, and other scheduling tasks.
+ **/ diff --git a/src/lib/fdio/lib_fdio.dox b/src/lib/fdio/lib_fdio.dox index b868d28aab..9e2fda617a 100644 --- a/src/lib/fdio/lib_fdio.dox +++ b/src/lib/fdio/lib_fdio.dox @@ -1,4 +1,7 @@ /** -@dir lib/fdio -@brief lib/fdio +@dir /lib/fdio +@brief lib/fdio: Code to read/write on file descriptors. + +(This module also handles sockets, on platforms where a socket is not a kind +of fd.) **/ diff --git a/src/lib/fs/lib_fs.dox b/src/lib/fs/lib_fs.dox index ad775ba553..4466250bb8 100644 --- a/src/lib/fs/lib_fs.dox +++ b/src/lib/fs/lib_fs.dox @@ -1,4 +1,11 @@ /** -@dir lib/fs -@brief lib/fs +@dir /lib/fs +@brief lib/fs: Files, filenames, directories, etc. + +This module is mostly a set of compatibility wrappers around +operating-system-specific filesystem access. + +It also contains a set of convenience functions for safely writing to files, +creating directories, and so on. + **/ diff --git a/src/lib/geoip/lib_geoip.dox b/src/lib/geoip/lib_geoip.dox index 7ad99e8f55..da1123640b 100644 --- a/src/lib/geoip/lib_geoip.dox +++ b/src/lib/geoip/lib_geoip.dox @@ -1,4 +1,5 @@ /** -@dir lib/geoip -@brief lib/geoip +@dir /lib/geoip +@brief lib/geoip: IP-to-country mapping + **/ diff --git a/src/lib/intmath/lib_intmath.dox b/src/lib/intmath/lib_intmath.dox index ce71e455d1..e9b7044706 100644 --- a/src/lib/intmath/lib_intmath.dox +++ b/src/lib/intmath/lib_intmath.dox @@ -1,4 +1,4 @@ /** -@dir lib/intmath -@brief lib/intmath +@dir /lib/intmath +@brief lib/intmath: Integer mathematics. **/ diff --git a/src/lib/lib.dox b/src/lib/lib.dox index f1b2291c76..fdf2c47687 100644 --- a/src/lib/lib.dox +++ b/src/lib/lib.dox @@ -1,8 +1,133 @@ /** -@dir lib +@dir /lib @brief lib: low-level functionality. -The "lib" directory contains low-level functionality, most of it not -necessarily Tor-specific. +The "lib" directory contains low-level functionality. In general, this +code is not necessarily Tor-specific, but is instead possibly useful for +other applications. 
+ +The modules in `lib` are currently well-factored: each one depends +only on lower-level modules. You can see an up-to-date list of the +modules, sorted from lowest to highest level, by running +`./scripts/maint/practracker/includes.py --toposort`. + +As of this writing, the library modules are (from lowest to highest +level): + + - \refdir{lib/cc} -- Macros for managing the C compiler and + language. + + - \refdir{lib/version} -- Holds the current version of Tor. + + - \refdir{lib/testsupport} -- Helpers for making + test-only code, and test mocking support. + + - \refdir{lib/defs} -- Lowest-level constants. + + - \refdir{lib/subsys} -- Types used for declaring a + "subsystem". (_A subsystem is a module with support for initialization, + shutdown, configuration, and so on._) + + - \refdir{lib/conf} -- For declaring configuration options. + + - \refdir{lib/arch} -- For handling differences in CPU + architecture. + + - \refdir{lib/err} -- Lowest-level error handling code. + + - \refdir{lib/malloc} -- Wrappers and utilities for memory + management. + + - \refdir{lib/intmath} -- Integer mathematics. + + - \refdir{lib/fdio} -- For + reading and writing on file descriptors. + + - \refdir{lib/lock} -- Simple locking support. + (_Lower-level than the rest of the threading code._) + + - \refdir{lib/ctime} -- Constant-time code to avoid + side-channels. + + - \refdir{lib/string} -- Low-level string manipulation. + + - \refdir{lib/wallclock} -- + For inspecting and manipulating the current (UTC) time. + + - \refdir{lib/osinfo} -- For inspecting the OS version + and capabilities. + + - \refdir{lib/smartlist_core} -- The bare-bones + pieces of our dynamic array ("smartlist") implementation. + + - \refdir{lib/log} -- Log messages to files, syslogs, etc. + + - \refdir{lib/container} -- General purpose containers, + including dynamic arrays ("smartlists"), hashtables, bit arrays, + etc. + + - \refdir{lib/trace} -- A general-purpose API for + function tracing in Tor.
(_Currently not much used._) + + - \refdir{lib/thread} -- Mid-level threading. + + - \refdir{lib/term} -- Terminal manipulation + (like reading a password from the user). + + - \refdir{lib/memarea} -- A fast + "arena" style allocator, where the data is freed all at once. + + - \refdir{lib/encoding} -- Encoding + data in various formats, datatypes, and transformations. + + - \refdir{lib/dispatch} -- A general-purpose in-process + message delivery system. + + - \refdir{lib/sandbox} -- Our Linux seccomp2 sandbox + implementation. + + - \refdir{lib/pubsub} -- A publish/subscribe message passing system. + + - \refdir{lib/fs} -- Files, filenames, directories, etc. + + - \refdir{lib/confmgt} -- Parse, encode, and manipulate configuration files. + + - \refdir{lib/crypt_ops} -- Cryptographic operations. + + - \refdir{lib/meminfo} -- Functions for inspecting our + memory usage, if the malloc implementation exposes that to us. + + - \refdir{lib/time} -- Higher level time functions, including + fine-grained and monotonic timers. + + - \refdir{lib/math} -- Floating-point mathematical utilities. + + - \refdir{lib/buf} -- An efficient byte queue. + + - \refdir{lib/net} -- Networking code, including address + manipulation, compatibility wrappers, etc. + + - \refdir{lib/compress} -- Wraps several compression libraries. + + - \refdir{lib/geoip} -- IP-to-country mapping. + + - \refdir{lib/tls} -- TLS library wrappers. + + - \refdir{lib/evloop} -- Low-level event-loop. + + - \refdir{lib/process} -- Launch and manage subprocesses. + +### What belongs in lib? + +In general, if you can imagine some program wanting the functionality +you're writing, even if that program had nothing to do with Tor, your +functionality belongs in lib. + +If it falls into one of the existing "lib" categories, your +functionality belongs in lib. + +If you are using platform-specific `ifdef`s to manage compatibility +issues among platforms, you should probably consider whether you can +put your code into lib.
**/ diff --git a/src/lib/lock/lib_lock.dox b/src/lib/lock/lib_lock.dox index 44693e7a69..868b5ba7d4 100644 --- a/src/lib/lock/lib_lock.dox +++ b/src/lib/lock/lib_lock.dox @@ -1,4 +1,8 @@ /** -@dir lib/lock -@brief lib/lock +@dir /lib/lock +@brief lib/lock: Simple locking support. + +This module is more low-level than the rest of the threading code, since it +is needed by more intermediate-level modules. + **/ diff --git a/src/lib/log/lib_log.dox b/src/lib/log/lib_log.dox index 915d652407..a772dc3207 100644 --- a/src/lib/log/lib_log.dox +++ b/src/lib/log/lib_log.dox @@ -1,4 +1,12 @@ /** -@dir lib/log -@brief lib/log +@dir /lib/log +@brief lib/log: Log messages to files, syslogs, etc. + +You can think of this as the logical "midpoint" of the +\refdir{lib} code: much of the higher-level code is higher-level +_because_ it uses the logging module, and much of the lower-level code is +specifically written to avoid having to log, because the logging module +depends on it. + + **/ diff --git a/src/lib/malloc/lib_malloc.dox b/src/lib/malloc/lib_malloc.dox index 4923f14463..c05e4c6473 100644 --- a/src/lib/malloc/lib_malloc.dox +++ b/src/lib/malloc/lib_malloc.dox @@ -1,4 +1,78 @@ /** -@dir lib/malloc -@brief lib/malloc +@dir /lib/malloc +@brief lib/malloc: Wrappers and utilities for memory management. + + +Tor imposes a few light wrappers over C's native malloc and free +functions, to improve convenience, and to allow wholesale replacement +of malloc and free as needed. + +You should never use 'malloc', 'calloc', 'realloc', or 'free' on their +own; always use the variants prefixed with 'tor_'. +They are the same as the standard C functions, with the following +exceptions: + + * `tor_free(NULL)` is a no-op. + * `tor_free()` is a macro that takes an lvalue as an argument and sets it to + NULL after freeing it. To avoid this behavior, you can use `tor_free_()` + instead.
+ * `tor_malloc()` and friends fail with an assertion if they are asked to + allocate a value so large that it is probably an underflow. + * It is always safe to `tor_malloc(0)`, regardless of whether your libc + allows it. + * `tor_malloc()`, `tor_realloc()`, and friends are never allowed to fail. + Instead, Tor will die with an assertion. This means that you never + need to check their return values. See the next subsection for + information on why we think this is a good idea. + +We define additional general-purpose memory allocation functions as well: + + * `tor_malloc_zero(x)` behaves as `calloc(1, x)`, except that it makes clear + the intent to allocate a single zeroed-out value. + * `tor_reallocarray(x,y)` behaves as the OpenBSD reallocarray function. + Use it for cases when you need to realloc() in a multiplication-safe + way. + +And specific-purpose functions as well: + + * `tor_strdup()` and `tor_strndup()` behave as the underlying libc + functions, but use `tor_malloc()` instead of the underlying function. + * `tor_memdup()` copies a chunk of memory of a given size. + * `tor_memdup_nulterm()` copies a chunk of memory of a given size, then + NUL-terminates it just to be safe. + +#### Why assert on allocation failure? + +Why don't we allow `tor_malloc()` and its allies to return NULL? + +First, it's error-prone. Many programmers forget to check for NULL return +values, and testing for `malloc()` failures is a major pain. + +Second, it's not necessarily a great way to handle OOM conditions. It's +probably better (we think) to have a memory target where we dynamically free +things ahead of time in order to stay under the target. Trying to respond to +an OOM at the point of `tor_malloc()` failure, on the other hand, would involve +a rare operation invoked from deep in the call stack. (Again, that's +error-prone and hard to debug.)
+ +Third, thanks to the rise of Linux and other operating systems that allow +memory to be overcommitted, you can't actually ever rely on getting a NULL +from `malloc()` when you're out of memory; instead you have to use an approach +closer to tracking the total memory usage. + +#### Conventions for your own allocation functions. + +Whenever you create a new type, the convention is to give it a pair of +`x_new()` and `x_free_()` functions, named after the type. + +Calling `x_free(NULL)` should always be a no-op. + +There should additionally be an `x_free()` macro, defined in terms of +`x_free_()`. This macro should set its lvalue to NULL. You can define it +using the FREE_AND_NULL macro, as follows: + +``` +#define x_free(ptr) FREE_AND_NULL(x_t, x_free_, (ptr)) +``` + **/ diff --git a/src/lib/math/lib_math.dox b/src/lib/math/lib_math.dox index c2e121dc8c..f20d7092b3 100644 --- a/src/lib/math/lib_math.dox +++ b/src/lib/math/lib_math.dox @@ -1,4 +1,8 @@ /** -@dir lib/math -@brief lib/math +@dir /lib/math +@brief lib/math: Floating-point math utilities. + +This module includes a bunch of floating-point compatibility code, and +implementations for several probability distributions. + **/ diff --git a/src/lib/memarea/lib_memarea.dox b/src/lib/memarea/lib_memarea.dox index dbd98de5ec..041191482d 100644 --- a/src/lib/memarea/lib_memarea.dox +++ b/src/lib/memarea/lib_memarea.dox @@ -1,4 +1,30 @@ /** -@dir lib/memarea -@brief lib/memarea +@dir /lib/memarea +@brief lib/memarea: A fast arena-style allocator. + +This module has a fast "arena" style allocator, where memory is freed all at +once. This kind of allocation is very fast and avoids fragmentation, at the +expense of requiring all the data to be freed at the same time. We use this +for parsing and diff calculations. + +It's often handy to allocate a large number of tiny objects, all of which +need to disappear at the same time. 
You can do this in tor using the +memarea.c abstraction, which uses a set of grow-only buffers for allocation, +and only supports a single "free" operation at the end. + +Using memareas also helps you avoid memory fragmentation. You see, some libc +malloc implementations perform badly on the case where a large number of +small temporary objects are allocated at the same time as a few long-lived +objects of similar size. But if you use tor_malloc() for the long-lived ones +and a memarea for the temporary object, the malloc implementation is likelier +to do better. + +To create a new memarea, use `memarea_new()`. To drop all the storage from a +memarea, and invalidate its pointers, use `memarea_drop_all()`. + +The allocation functions `memarea_alloc()`, `memarea_alloc_zero()`, +`memarea_memdup()`, `memarea_strdup()`, and `memarea_strndup()` are analogous +to the similarly-named malloc() functions. There is intentionally no +`memarea_free()` or `memarea_realloc()`. + **/ diff --git a/src/lib/meminfo/lib_meminfo.dox b/src/lib/meminfo/lib_meminfo.dox index c8def7e2f9..b57e60525e 100644 --- a/src/lib/meminfo/lib_meminfo.dox +++ b/src/lib/meminfo/lib_meminfo.dox @@ -1,4 +1,7 @@ /** -@dir lib/meminfo -@brief lib/meminfo +@dir /lib/meminfo +@brief lib/meminfo: Inspecting malloc() usage. + +Only available when malloc() provides mallinfo() or something similar. + **/ diff --git a/src/lib/net/lib_net.dox b/src/lib/net/lib_net.dox index 03783c12aa..b4c00405d7 100644 --- a/src/lib/net/lib_net.dox +++ b/src/lib/net/lib_net.dox @@ -1,4 +1,8 @@ /** -@dir lib/net -@brief lib/net +@dir /lib/net +@brief lib/net: Low-level network-related code. + +This module includes address manipulation, compatibility wrappers, +convenience functions, and so on. 
+ **/ diff --git a/src/lib/osinfo/lib_osinfo.dox b/src/lib/osinfo/lib_osinfo.dox index 7733755f20..4d9b1a6d76 100644 --- a/src/lib/osinfo/lib_osinfo.dox +++ b/src/lib/osinfo/lib_osinfo.dox @@ -1,4 +1,10 @@ /** -@dir lib/osinfo -@brief lib/osinfo +@dir /lib/osinfo +@brief lib/osinfo: For inspecting the OS version and capabilities. + +In general, we use this module when we're telling the user what operating +system they are running. We shouldn't make decisions based on the output of +these checks: instead, we should have more specific checks, either at compile +time or run time, based on the observed system behavior. + **/ diff --git a/src/lib/process/lib_process.dox b/src/lib/process/lib_process.dox index efb1adc091..723c9f193d 100644 --- a/src/lib/process/lib_process.dox +++ b/src/lib/process/lib_process.dox @@ -1,4 +1,4 @@ /** -@dir lib/process -@brief lib/process +@dir /lib/process +@brief lib/process: Launch and manage subprocesses. **/ diff --git a/src/lib/pubsub/lib_pubsub.dox b/src/lib/pubsub/lib_pubsub.dox index 9a3fc6dfac..c033660121 100644 --- a/src/lib/pubsub/lib_pubsub.dox +++ b/src/lib/pubsub/lib_pubsub.dox @@ -1,4 +1,16 @@ /** -@dir lib/pubsub -@brief lib/pubsub +@dir /lib/pubsub +@brief lib/pubsub: Publish-subscribe message passing. + +This module wraps the \refdir{lib/dispatch} module, to provide a more +ergonomic and type-safe approach to message passing. + +In general, we favor this mechanism for cases where higher-level modules +need to be notified when something happens in lower-level modules. (The +alternative would be calling up from the lower-level modules, which +would be error-prone; or maintaining lists of function-pointers, which +would be clumsy and tend to complicate the call graph.) + +See pubsub.c for more information. 
+ **/ diff --git a/src/lib/sandbox/lib_sandbox.dox b/src/lib/sandbox/lib_sandbox.dox index eb42d97589..48eddac685 100644 --- a/src/lib/sandbox/lib_sandbox.dox +++ b/src/lib/sandbox/lib_sandbox.dox @@ -1,4 +1,17 @@ /** -@dir lib/sandbox -@brief lib/sandbox +@dir /lib/sandbox +@brief lib/sandbox: Linux seccomp2-based sandbox. + +This module uses Linux's seccomp2 facility via the +[`libseccomp` library](https://github.com/seccomp/libseccomp), to restrict +the set of system calls that Tor is allowed to invoke while it is running. + +Because there are many libc versions that invoke different system calls, and +because handling strings is quite complex, this module is more complex and +less portable than it needs to be. + +A better architecture would put the responsibility for invoking tricky system +calls (like open()) in another, less restricted process, and give that +process responsibility for enforcing our sandbox rules. + **/ diff --git a/src/lib/smartlist_core/lib_smartlist_core.dox b/src/lib/smartlist_core/lib_smartlist_core.dox index 507d0fe92f..73c3b69056 100644 --- a/src/lib/smartlist_core/lib_smartlist_core.dox +++ b/src/lib/smartlist_core/lib_smartlist_core.dox @@ -1,4 +1,12 @@ /** -@dir lib/smartlist_core -@brief lib/smartlist_core +@dir /lib/smartlist_core +@brief lib/smartlist_core: Minimal dynamic array implementation + +A `smartlist_t` is a dynamic array type for holding `void *`. We use it +throughout the rest of the codebase. + +There are higher-level pieces in \refdir{lib/container} but +the ones in lib/smartlist_core are used by the logging code, and therefore +cannot use the logging code. 
+ **/ diff --git a/src/lib/stats/lib_stats.dox b/src/lib/stats/lib_stats.dox deleted file mode 100644 index 897c41418f..0000000000 --- a/src/lib/stats/lib_stats.dox +++ /dev/null @@ -1,4 +0,0 @@ -/** -@dir lib/stats -@brief lib/stats -**/ diff --git a/src/lib/string/lib_string.dox b/src/lib/string/lib_string.dox index 3e038ea072..c8793ddf91 100644 --- a/src/lib/string/lib_string.dox +++ b/src/lib/string/lib_string.dox @@ -1,4 +1,15 @@ /** -@dir lib/string -@brief lib/string +@dir /lib/string +@brief lib/string: Low-level string manipulation. + +We have a number of compatibility functions here: some are for handling +functionality that is not implemented (or not implemented the same) on every +platform; some are for providing locale-independent versions of libc +functions that would otherwise be defined differently for different users. + +Other functions here are for common string-manipulation operations that we do +in the rest of the codebase. + +Any string function high-level enough to need logging belongs in a +higher-level module. **/ diff --git a/src/lib/subsys/lib_subsys.dox b/src/lib/subsys/lib_subsys.dox index f9cd5eeb81..1a22a2d808 100644 --- a/src/lib/subsys/lib_subsys.dox +++ b/src/lib/subsys/lib_subsys.dox @@ -1,4 +1,34 @@ /** -@dir lib/subsys -@brief lib/subsys +@dir /lib/subsys +@brief lib/subsys: Types for declaring a "subsystem". + +## Subsystems in Tor + +A subsystem is a module with support for initialization, shutdown, +configuration, and so on. + +Many parts of Tor can be initialized, cleaned up, and configured somewhat +independently through a table-driven mechanism. Each such part is called a +"subsystem". + +To declare a subsystem, make a global `const` instance of the `subsys_fns_t` +type, filling in the function pointer fields that you require with ones +corresponding to your subsystem. Any function pointers left as "NULL" will +be a no-op. 
Each system must have a name and a "level", which corresponds to +the order in which it is initialized. (See `app/main/subsystem_list.c` for a +list of current subsystems and their levels.) + +Then, insert your subsystem in the list in `app/main/subsystem_list.c`. It +will need to occupy a position corresponding to its level. + +At this point, your subsystem will be handled like the others: it will get +initialized at startup, torn down at exit, and so on. + +Historical note: Not all of Tor's code is currently handled as +subsystems. As you work with older code, you may see some parts of the code +that are initialized from `tor_init()` or `run_tor_main_loop()` or +`tor_run_main()`; and torn down from `tor_cleanup()`. We aim to migrate +these to subsystems over time; please don't add any new code that follows +this pattern. + **/ diff --git a/src/lib/term/lib_term.dox b/src/lib/term/lib_term.dox index 2bc5125839..3bf2f960ab 100644 --- a/src/lib/term/lib_term.dox +++ b/src/lib/term/lib_term.dox @@ -1,4 +1,4 @@ /** -@dir lib/term -@brief lib/term +@dir /lib/term +@brief lib/term: Terminal operations (password input). **/ diff --git a/src/lib/testsupport/lib_testsupport.dox b/src/lib/testsupport/lib_testsupport.dox index 63ccc47d34..c09c32e478 100644 --- a/src/lib/testsupport/lib_testsupport.dox +++ b/src/lib/testsupport/lib_testsupport.dox @@ -1,4 +1,4 @@ /** -@dir lib/testsupport -@brief lib/testsupport +@dir /lib/testsupport +@brief lib/testsupport: Helpers for test-only code and for function mocking. **/ diff --git a/src/lib/thread/lib_thread.dox b/src/lib/thread/lib_thread.dox index 68937ef793..2773aa009d 100644 --- a/src/lib/thread/lib_thread.dox +++ b/src/lib/thread/lib_thread.dox @@ -1,4 +1,9 @@ /** -@dir lib/thread -@brief lib/thread +@dir /lib/thread +@brief lib/thread: Mid-level threading. 
+ +This module contains compatibility and convenience code for multithreading, +except for low-level locks (which are in \refdir{lib/lock}) and +workqueue/threadpool code (which belongs in \refdir{lib/evloop}). + **/ diff --git a/src/lib/time/lib_time.dox b/src/lib/time/lib_time.dox index 50abf072f7..b76a31fb97 100644 --- a/src/lib/time/lib_time.dox +++ b/src/lib/time/lib_time.dox @@ -1,4 +1,11 @@ /** -@dir lib/time -@brief lib/time +@dir /lib/time +@brief lib/time: Higher-level time functions + +This includes both fine-grained timers and monotonic timers, along with +wrappers for them to try to improve efficiency. + +For "what time is it" in UTC, see \refdir{lib/wallclock}. For parsing and +encoding times and dates, see \refdir{lib/encoding}. + **/ diff --git a/src/lib/tls/lib_tls.dox b/src/lib/tls/lib_tls.dox index 40b7b2c27e..f0dba269e8 100644 --- a/src/lib/tls/lib_tls.dox +++ b/src/lib/tls/lib_tls.dox @@ -1,4 +1,13 @@ /** -@dir lib/tls -@brief lib/tls +@dir /lib/tls +@brief lib/tls: TLS library wrappers + +This module has compatibility wrappers around the library (NSS or OpenSSL, +depending on configuration) that Tor uses to implement the TLS link security +protocol. + +It also implements the logic for some legacy TLS protocol usage we used to +support in old versions of Tor, involving conditional delivery of certificate +chains (v1 link protocol) and conditional renegotiation (v2 link protocol). + **/ diff --git a/src/lib/trace/lib_trace.dox b/src/lib/trace/lib_trace.dox index a1ae256506..64f762bc3e 100644 --- a/src/lib/trace/lib_trace.dox +++ b/src/lib/trace/lib_trace.dox @@ -1,4 +1,8 @@ /** -@dir lib/trace -@brief lib/trace +@dir /lib/trace +@brief lib/trace: Function-tracing functionality API. + +This module is used for adding "trace" support (low-granularity function +logging) to Tor. Right now it doesn't have many users.
+ **/ diff --git a/src/lib/version/lib_version.dox b/src/lib/version/lib_version.dox index 213e1a1ae8..93d2fb6b9b 100644 --- a/src/lib/version/lib_version.dox +++ b/src/lib/version/lib_version.dox @@ -1,4 +1,4 @@ /** -@dir lib/version -@brief lib/version +@dir /lib/version +@brief lib/version: holds the current version of Tor. **/ diff --git a/src/lib/wallclock/lib_wallclock.dox b/src/lib/wallclock/lib_wallclock.dox index 7bb2b075d1..7d43fa6129 100644 --- a/src/lib/wallclock/lib_wallclock.dox +++ b/src/lib/wallclock/lib_wallclock.dox @@ -1,4 +1,13 @@ /** -@dir lib/wallclock -@brief lib/wallclock +@dir /lib/wallclock +@brief lib/wallclock: Inspect and manipulate the current time. + +This module handles our concept of "what time is it" or "what time does the +world agree it is?" Generally, if you want something derived from UTC, this +is the module for you. + +For versions of the time that are more local, more monotonic, or more +accurate, see \refdir{lib/time}. For parsing and encoding times and dates, +see \refdir{lib/encoding}. + **/ diff --git a/src/mainpage.dox b/src/mainpage.dox index 84eea3c526..02ce8675e7 100644 --- a/src/mainpage.dox +++ b/src/mainpage.dox @@ -1,11 +1,122 @@ /** @mainpage Tor source reference -@section intro Getting to know Tor +@section intro Welcome to Tor -Welcome to the Tor source code documentation! Here we have documentation for -nearly every function, type, and module in the Tor source code. The high-level -documentation is a work in progress. For now, have a look at the source code -overview in doc/HACKING/design. +This documentation describes the general structure of the Tor codebase, how +it fits together, what functionality is available for extending Tor, and +gives some notes on how Tor got that way. It also includes a reference for +nearly every function, type, file, and module in the Tor source code. The +high-level documentation is a work in progress. 
+ +Tor itself remains a work in progress too: We've been working on it for +nearly two decades, and we've learned a lot about good coding since we first +started. This means, however, that some of the older pieces of Tor will have +some "code smell" in them that could stand a brisk refactoring. So when we +describe a piece of code, we'll sometimes give a note on how it got that way, +and whether we still think that's a good idea. + +This document is not an overview of the Tor protocol. For that, see the +design paper and the specifications at https://spec.torproject.org/ . + +For more information about Tor's coding standards and some helpful +development tools, see +[doc/HACKING](https://gitweb.torproject.org/tor.git/tree/doc/HACKING) in the +Tor repository. + +@section highlevel The very high level + +Ultimately, Tor runs as an event-driven network daemon: it responds to +network events, signals, and timers by sending and receiving things over +the network. Clients, relays, and directory authorities all use the +same codebase: the Tor process will run as a client, relay, or authority +depending on its configuration. + +Tor has a few major dependencies, including Libevent (used to tell which +sockets are readable and writable), OpenSSL or NSS (used for many encryption +functions, and to implement the TLS protocol), and zlib (used to +compress and uncompress directory information). + +Most of Tor's work today is done in a single event-driven main thread. +Tor also spawns one or more worker threads to handle CPU-intensive +tasks. (Right now, this only includes circuit encryption and the more +expensive compression algorithms.) + +On startup, Tor initializes its libraries, reads and responds to its +configuration files, and launches a main event loop. At first, the only +events that Tor listens for are a few signals (like TERM and HUP), and +one or more listener sockets (for different kinds of incoming +connections). 
Tor also configures several timers to handle periodic +events. As Tor runs over time, other events will open, and new events +will be scheduled. + +The codebase is divided into a few top-level subdirectories, each of +which contains several sub-modules. + + - `ext` -- Code maintained elsewhere that we include in the Tor + source distribution. + + - \refdir{lib} -- Lower-level utility code, not necessarily + tor-specific. + + - `trunnel` -- Automatically generated code (from the Trunnel + tool): used to parse and encode binary formats. + + - \refdir{core} -- Networking code that implements the central + parts of the Tor protocol and main loop. + + - \refdir{feature} -- Aspects of Tor (like directory management, + running a relay, running a directory authority, managing a list of + nodes, running and using onion services) that are built on top of the + mainloop code. + + - \refdir{app} -- Highest-level functionality; responsible for setting + up and configuring the Tor daemon, making sure all the lower-level + modules start up when required, and so on. + + - \refdir{tools} -- Binaries other than Tor that we produce. + Currently this is tor-resolve, tor-gencert, and the tor_runner.o helper + module. + + - `test` -- unit tests, regression tests, and a few integration + tests. + +In theory, the above parts of the codebase are sorted from highest-level to +lowest-level, where high-level code is only allowed to invoke lower-level +code, and lower-level code never includes or depends on code of a higher +level. In practice, this refactoring is incomplete: The modules in +\refdir{lib} are well-factored, but there are many layer violations ("upward +dependencies") in \refdir{core} and \refdir{feature}. +We aim to eliminate those over time. + +@section keyabstractions Some key high-level abstractions + +The most important abstractions at Tor's high level are Connections, +Channels, Circuits, and Nodes.
+ +A 'Connection' (connection_t) represents a stream-based information flow. +Most connections are TCP connections to remote Tor servers and clients. (But +as a shortcut, a relay will sometimes make a connection to itself without +actually using a TCP connection. More details later on.) Connections exist +in different varieties, depending on what functionality they provide. The +principal types of connection are edge_connection_t (e.g., a socks connection or +a connection from an exit relay to a destination), or_connection_t (a TLS +stream connecting to a relay), dir_connection_t (an HTTP connection to learn +about the network), and control_connection_t (a connection from a +controller). + +A 'Circuit' (circuit_t) is a persistent tunnel through the Tor network, +established with public-key cryptography, and used to send cells one or more +hops. Clients keep track of multi-hop circuits (origin_circuit_t), and the +cryptography associated with each hop. Relays, on the other hand, keep track +only of their hop of each circuit (or_circuit_t). + +A 'Channel' (channel_t) is an abstract view of sending cells to and from a +Tor relay. Currently, all channels are implemented using OR connections +(channel_tls_t). If we switch to other strategies in the future, we'll have +more connection types. + +A 'Node' (node_t) is a view of a Tor instance's current knowledge and opinions +about a Tor relay or bridge. **/ diff --git a/src/tools/tools.dox b/src/tools/tools.dox index 54aa4df48e..1168ed5bad 100644 --- a/src/tools/tools.dox +++ b/src/tools/tools.dox @@ -1,5 +1,5 @@ /** -@dir tools +@dir /tools @brief tools: other command-line tools for use with Tor. The "tools" directory has a few other programs that use Tor, but are not part |