Diffstat (limited to 'doc')
-rw-r--r-- | doc/HACKING/CodingStandards.md | 57
-rw-r--r-- | doc/HACKING/Module.md | 29
-rw-r--r-- | doc/HACKING/design/00-overview.md | 143
-rw-r--r-- | doc/HACKING/design/01.00-lib-overview.md | 171
-rw-r--r-- | doc/HACKING/design/01a-memory.md | 103
-rw-r--r-- | doc/HACKING/design/01b-collections.md | 45
-rw-r--r-- | doc/HACKING/design/01c-time.md | 75
-rw-r--r-- | doc/HACKING/design/01d-crypto.md | 169
-rw-r--r-- | doc/HACKING/design/01e-os-compat.md | 50
-rw-r--r-- | doc/HACKING/design/01f-threads.md | 26
-rw-r--r-- | doc/HACKING/design/01g-strings.md | 95
-rw-r--r-- | doc/HACKING/design/02-dataflow.md | 236
-rw-r--r-- | doc/HACKING/design/03-modules.md | 247
-rw-r--r-- | doc/HACKING/design/Makefile | 34
-rw-r--r-- | doc/HACKING/design/this-not-that.md | 51
-rw-r--r-- | doc/include.am | 1
-rw-r--r-- | doc/tor-doxygen.css | 10
-rw-r--r-- | doc/tor.1.txt | 564
18 files changed, 373 insertions, 1733 deletions
diff --git a/doc/HACKING/CodingStandards.md b/doc/HACKING/CodingStandards.md index 74db2a39a3..34d1dd52b5 100644 --- a/doc/HACKING/CodingStandards.md +++ b/doc/HACKING/CodingStandards.md @@ -212,6 +212,9 @@ deviations from our C whitespace style. Generally, we use: - No space between a function name and an opening paren. `puts(x)`, not `puts (x)`. - Function declarations at the start of the line. + - Use `void foo(void)` to declare a function with no arguments. Saying + `void foo()` is C++ syntax. + - Use `const` for new APIs. If you use an editor that has plugins for editorconfig.org, the file `.editorconfig` will help you to conform this coding style. @@ -228,20 +231,49 @@ We have some wrapper functions like `tor_malloc`, `tor_free`, `tor_strdup`, and `tor_gettimeofday;` use them instead of their generic equivalents. (They always succeed or exit.) +Specifically, Don't use `malloc`, `realloc`, `calloc`, `free`, or +`strdup`. Use `tor_malloc`, `tor_realloc`, `tor_calloc`, `tor_free`, or +`tor_strdup`. + +Don't use `tor_realloc(x, y\*z)`. Use `tor_reallocarray(x, y, z)` instead.; + You can get a full list of the compatibility functions that Tor provides by looking through `src/lib/*/*.h`. You can see the available containers in `src/lib/containers/*.h`. You should probably familiarize yourself with these modules before you write too much code, or else you'll wind up reinventing the wheel. -We don't use `strcat` or `strcpy` or `sprintf` of any of those notoriously broken -old C functions. Use `strlcat`, `strlcpy`, or `tor_snprintf/tor_asprintf` instead. + +We don't use `strcat` or `strcpy` or `sprintf` of any of those notoriously +broken old C functions. We also avoid `strncat` and `strncpy`. Use +`strlcat`, `strlcpy`, or `tor_snprintf/tor_asprintf` instead. We don't call `memcmp()` directly. Use `fast_memeq()`, `fast_memneq()`, -`tor_memeq()`, or `tor_memneq()` for most purposes. +`tor_memeq()`, or `tor_memneq()` for most purposes. 
If you really need a +tristate return value, use `tor_memcmp()` or `fast_memcmp()`. + +Don't call `assert()` directly. For hard asserts, use `tor_assert()`. For +soft asserts, use `tor_assert_nonfatal()` or `BUG()`. If you need to print +debug information in assert error message, consider using `tor_assertf()` and +`tor_assertf_nonfatal()`. If you are writing code that is too low-level to +use the logging subsystem, use `raw_assert()`. + +Don't use `toupper()` and `tolower()` functions. Use `TOR_TOUPPER` and +`TOR_TOLOWER` macros instead. Similarly, use `TOR_ISALPHA`, `TOR_ISALNUM` et. +al. instead of `isalpha()`, `isalnum()`, etc. + +When allocating new string to be added to a smartlist, use +`smartlist_add_asprintf()` to do both at once. + +Avoid calling BSD socket functions directly. Use portable wrappers to work +with sockets and socket addresses. Also, sockets should be of type +`tor_socket_t`. + +Don't use any of these functions: they aren't portable. Use the +version prefixed with `tor_` instead: strtok_r, memmem, memstr, +asprintf, localtime_r, gmtime_r, inet_aton, inet_ntop, inet_pton, +getpass, ntohll, htonll. (This list is incomplete.) -Also see a longer list of functions to avoid in: -https://people.torproject.org/~nickm/tor-auto/internal/this-not-that.html What code can use what other code? ---------------------------------- @@ -331,8 +363,16 @@ definitions when necessary.) Assignment operators shouldn't nest inside other expressions. (You can ignore this inside macro definitions when necessary.) -Functions not to write ----------------------- +Binary data and wire formats +---------------------------- + +Use pointer to `char` when representing NUL-terminated string. To represent +arbitrary binary data, use pointer to `uint8_t`. (Many older Tor APIs ignore +this rule.) + +Refrain from attempting to encode integers by casting their pointers to byte +arrays. Use something like `set_uint32()`/`get_uint32()` instead and don't +forget about endianness. 
Try to never hand-write new code to parse or generate binary formats. Instead, use trunnel if at all possible. See @@ -451,6 +491,9 @@ to use it as a function callback), define it with a name like abc_free_(obj); } +When deallocating, don't say e.g. `if (x) tor_free(x)`. The convention is to +have deallocators do nothing when NULL pointer is passed. + Doxygen comment conventions --------------------------- diff --git a/doc/HACKING/Module.md b/doc/HACKING/Module.md index 9cf36090b4..3a07d0c639 100644 --- a/doc/HACKING/Module.md +++ b/doc/HACKING/Module.md @@ -8,13 +8,18 @@ module in Tor. In the context of the tor code base, a module is a subsystem that we can selectively enable or disable, at `configure` time. -Currently, there is only one module: +Currently, tor has these modules: + - Relay subsystem (relay) - Directory Authority subsystem (dirauth) -It is located in its own directory in `src/feature/dirauth/`. To disable it, -one need to pass `--disable-module-dirauth` at configure time. All modules -are currently enabled by default. +dirauth is located in its own directory in `src/feature/dirauth/`. + +Relay is located in directories named `src/*/*relay` and `src/*/*dircache`, +which are being progressively refactored and disabled. + +To disable a module, pass `--disable-module-{dirauth,relay}` at configure +time. All modules are currently enabled by default. ## Build System ## @@ -24,7 +29,7 @@ The changes to the build system are pretty straightforward. contains a list (white-space separated) of the module in tor. Add yours to the list. -2. Use the `AC_ARG_ENABLE([module-dirauth]` template for your new module. We +2. Use the `AC_ARG_ENABLE([module-relay]` template for your new module. We use the "disable module" approach instead of enabling them one by one. So, by default, tor will build all the modules. @@ -32,7 +37,7 @@ The changes to the build system are pretty straightforward. the C code to conditionally compile things for your module. 
And the `BUILD_MODULE_<name>` is also defined for automake files (e.g: include.am). -3. In the `src/core/include.am` file, locate the `MODULE_DIRAUTH_SOURCES` +3. In the `src/core/include.am` file, locate the `MODULE_RELAY_SOURCES` value. You need to create your own `_SOURCES` variable for your module and then conditionally add the it to `LIBTOR_A_SOURCES` if you should build the module. @@ -40,18 +45,14 @@ The changes to the build system are pretty straightforward. It is then **very** important to add your SOURCES variable to `src_or_libtor_testing_a_SOURCES` so the tests can build it. -4. Do the same for header files, locate `ORHEADERS +=` which always add all - headers of all modules so the symbol can be found for the module entry - points. - Finally, your module will automatically be included in the -`TOR_MODULES_ALL_ENABLED` variable which is used to build the unit tests. They -always build everything in order to tests everything. +`TOR_MODULES_ALL_ENABLED` variable which is used to build the unit tests. +They always build everything in order to test everything. ## Coding ## -As mentioned above, a module must be isolated in its own directory (name of -the module) in `src/feature/`. +As mentioned above, a module should be isolated in its own directories, +suffixed with the name of the module, in `src/*/`. There are couples of "rules" you want to follow: diff --git a/doc/HACKING/design/00-overview.md b/doc/HACKING/design/00-overview.md deleted file mode 100644 index ff40a566be..0000000000 --- a/doc/HACKING/design/00-overview.md +++ /dev/null @@ -1,143 +0,0 @@ - -## Overview ## - -This document describes the general structure of the Tor codebase, how -it fits together, what functionality is available for extending Tor, -and gives some notes on how Tor got that way. - -Tor remains a work in progress: We've been working on it for nearly two -decades, and we've learned a lot about good coding since we first -started. 
This means, however, that some of the older pieces of Tor will -have some "code smell" in them that could stand a brisk -refactoring. So when I describe a piece of code, I'll sometimes give a -note on how it got that way, and whether I still think that's a good -idea. - -The first drafts of this document were written in the Summer and Fall of -2015, when Tor 0.2.6 was the most recent stable version, and Tor 0.2.7 -was under development. There is a revision in progress (as of late -2019), to bring it up to pace with Tor as of version 0.4.2. If you're -reading this far in the future, some things may have changed. Caveat -haxxor! - -This document is not an overview of the Tor protocol. For that, see the -design paper and the specifications at https://spec.torproject.org/ . - -For more information about Tor's coding standards and some helpful -development tools, see doc/HACKING in the Tor repository. - - -### The very high level ### - -Ultimately, Tor runs as an event-driven network daemon: it responds to -network events, signals, and timers by sending and receiving things over -the network. Clients, relays, and directory authorities all use the -same codebase: the Tor process will run as a client, relay, or authority -depending on its configuration. - -Tor has a few major dependencies, including Libevent (used to tell which -sockets are readable and writable), OpenSSL or NSS (used for many encryption -functions, and to implement the TLS protocol), and zlib (used to -compress and uncompress directory information). - -Most of Tor's work today is done in a single event-driven main thread. -Tor also spawns one or more worker threads to handle CPU-intensive -tasks. (Right now, this only includes circuit encryption and the more -expensive compression algorithms.) - -On startup, Tor initializes its libraries, reads and responds to its -configuration files, and launches a main event loop. 
At first, the only -events that Tor listens for are a few signals (like TERM and HUP), and -one or more listener sockets (for different kinds of incoming -connections). Tor also configures several timers to handle periodic -events. As Tor runs over time, other events will open, and new events -will be scheduled. - -The codebase is divided into a few top-level subdirectories, each of -which contains several sub-modules. - - * `src/ext` -- Code maintained elsewhere that we include in the Tor - source distribution. - - * src/lib` -- Lower-level utility code, not necessarily tor-specific. - - * `src/trunnel` -- Automatically generated code (from the Trunnel - tool): used to parse and encode binary formats. - - * `src/core` -- Networking code that is implements the central parts of - the Tor protocol and main loop. - - * `src/feature` -- Aspects of Tor (like directory management, running a - relay, running a directory authorities, managing a list of nodes, - running and using onion services) that are built on top of the - mainloop code. - - * `src/app` -- Highest-level functionality; responsible for setting up - and configuring the Tor daemon, making sure all the lower-level - modules start up when required, and so on. - - * `src/tools` -- Binaries other than Tor that we produce. Currently this - is tor-resolve, tor-gencert, and the tor_runner.o helper module. - - * `src/test` -- unit tests, regression tests, and a few integration - tests. - -In theory, the above parts of the codebase are sorted from highest-level to -lowest-level, where high-level code is only allowed to invoke lower-level -code, and lower-level code never includes or depends on code of a higher -level. In practice, this refactoring is incomplete: The modules in `src/lib` -are well-factored, but there are many layer violations ("upward -dependencies") in `src/core` and `src/feature`. We aim to eliminate those -over time. 
- -### Some key high-level abstractions ### - -The most important abstractions at Tor's high-level are Connections, -Channels, Circuits, and Nodes. - -A 'Connection' represents a stream-based information flow. Most -connections are TCP connections to remote Tor servers and clients. (But -as a shortcut, a relay will sometimes make a connection to itself -without actually using a TCP connection. More details later on.) -Connections exist in different varieties, depending on what -functionality they provide. The principle types of connection are -"edge" (eg a socks connection or a connection from an exit relay to a -destination), "OR" (a TLS stream connecting to a relay), "Directory" (an -HTTP connection to learn about the network), and "Control" (a connection -from a controller). - -A 'Circuit' is persistent tunnel through the Tor network, established -with public-key cryptography, and used to send cells one or more hops. -Clients keep track of multi-hop circuits, and the cryptography -associated with each hop. Relays, on the other hand, keep track only of -their hop of each circuit. - -A 'Channel' is an abstract view of sending cells to and from a Tor -relay. Currently, all channels are implemented using OR connections. -If we switch to other strategies in the future, we'll have more -connection types. - -A 'Node' is a view of a Tor instance's current knowledge and opinions -about a Tor relay or bridge. - -### The rest of this document. ### - -> **Note**: This section describes the eventual organization of this -> document, which is not yet complete. - -We'll begin with an overview of the facilities provided by the modules -in src/lib. Knowing about these is key to writing portable, simple code -in Tor. - -Then we'll move on to a discussion of how parts of the Tor codebase are -initialized, finalized, configured, and managed. - -Then we'll go on and talk about the main data-flow of the Tor network: -how Tor generates and responds to network traffic. 
This will occupy a -chapter for the main overview, with other chapters for special topics. - -After that, we'll mention the main modules in src/features and describe the -functions of each. - -We'll close with a meandering overview of important pending issues in -the Tor codebase, and how they affect the future of the Tor software. diff --git a/doc/HACKING/design/01.00-lib-overview.md b/doc/HACKING/design/01.00-lib-overview.md deleted file mode 100644 index 58a92f4062..0000000000 --- a/doc/HACKING/design/01.00-lib-overview.md +++ /dev/null @@ -1,171 +0,0 @@ - -## Library code in Tor. - -Most of Tor's utility code is in modules in the `src/lib` subdirectory. In -general, this code is not necessarily Tor-specific, but is instead possibly -useful for other applications. - -This code includes: - - * Compatibility wrappers, to provide a uniform API across different - platforms. - - * Library wrappers, to provide a tor-like API over different libraries - that Tor uses for things like compression and cryptography. - - * Containers, to implement some general-purpose data container types. - -The modules in `src/lib` are currently well-factored: each one depends -only on lower-level modules. You can see an up-to-date list of the -modules sorted from lowest to highest level by running -`./scripts/maint/practracker/includes.py --toposort`. - -As of this writing, the library modules are (from lowest to highest -level): - - * `lib/cc` -- Macros for managing the C compiler and - language. Includes macros for improving compatibility and clarity - across different C compilers. - - * `lib/version` -- Holds the current version of Tor. - - * `lib/testsupport` -- Helpers for making test-only code and test - mocking support. - - * `lib/defs` -- Lowest-level constants used in many places across the - code. - - * `lib/subsys` -- Types used for declaring a "subsystem". A subsystem - is a module with support for initialization, shutdown, - configuration, and so on. 
- - * `lib/conf` -- Types and macros used for declaring configuration - options. - - * `lib/arch` -- Compatibility functions and macros for handling - differences in CPU architecture. - - * `lib/err` -- Lowest-level error handling code: responsible for - generating stack traces, handling raw assertion failures, and - otherwise reporting problems that might not be safe to report - via the regular logging module. - - * `lib/malloc` -- Wrappers and utilities for memory management. - - * `lib/intmath` -- Utilities for integer mathematics. - - * `lib/fdio` -- Utilities and compatibility code for reading and - writing data on file descriptors (and on sockets, for platforms - where a socket is not a kind of fd). - - * `lib/lock` -- Compatibility code for declaring and using locks. - Lower-level than the rest of the threading code. - - * `lib/ctime` -- Constant-time implementations for data comparison - and table lookup, used to avoid timing side-channels from standard - implementations of memcmp() and so on. - - * `lib/string` -- Low-level compatibility wrappers and utility - functions for string manipulation. - - * `lib/wallclock` -- Compatibility and utility functions for - inspecting and manipulating the current (UTC) time. - - * `lib/osinfo` -- Functions for inspecting the version and - capabilities of the operating system. - - * `lib/smartlist_core` -- The bare-bones pieces of our dynamic array - ("smartlist") implementation. There are higher-level pieces, but - these ones are used by (and therefore cannot use) the logging code. - - * `lib/log` -- Implements the logging system used by all higher-level - Tor code. You can think of this as the logical "midpoint" of the - library code: much of the higher-level code is higher-level - _because_ it uses the logging module, and much of the lower-level - code is specifically written to avoid having to log, because the - logging module depends on it. 
- - * `lib/container` -- General purpose containers, including dynamic arrays - ("smartlists"), hashtables, bit arrays, weak-reference-like "handles", - bloom filters, and a bit more. - - * `lib/trace` -- A general-purpose API for introducing - function-tracing functionality into Tor. Currently not much used. - - * `lib/thread` -- Threading compatibility and utility functionality, - other than low-level locks (which are in `lib/lock`) and - workqueue/threadpool code (which belongs in `lib/evloop`). - - * `lib/term` -- Code for terminal manipulation functions (like - reading a password from the user). - - * `lib/memarea` -- A data structure for a fast "arena" style allocator, - where the data is freed all at once. Used for parsing. - - * `lib/encoding` -- Implementations for encoding data in various - formats, datatypes, and transformations. - - * `lib/dispatch` -- A general-purpose in-process message delivery - system. Used by `lib/pubsub` to implement our inter-module - publish/subscribe system. - - * `lib/sandbox` -- Our Linux seccomp2 sandbox implementation. - - * `lib/pubsub` -- Code and macros to implement our publish/subscribe - message passing system. - - * `lib/fs` -- Utility and compatibility code for manipulating files, - filenames, directories, and so on. - - * `lib/confmgt` -- Code to parse, encode, and manipulate our - configuration files, state files, and so forth. - - * `lib/crypt_ops` -- Cryptographic operations. This module contains - wrappers around the cryptographic libraries that we support, - and implementations for some higher-level cryptographic - constructions that we use. - - * `lib/meminfo` -- Functions for inspecting our memory usage, if the - malloc implementation exposes that to us. - - * `lib/time` -- Higher level time functions, including fine-gained and - monotonic timers. - - * `lib/math` -- Floating-point mathematical utilities, including - compatibility code, and probability distributions. 
- - * `lib/buf` -- A general purpose queued buffer implementation, - similar to the BSD kernel's "mbuf" structure. - - * `lib/net` -- Networking code, including address manipulation, - compatibility wrappers, - - * `lib/compress` -- A compatibility wrapper around several - compression libraries, currently including zlib, zstd, and lzma. - - * `lib/geoip` -- Utilities to manage geoip (IP to country) lookups - and formats. - - * `lib/tls` -- Compatibility wrappers around the library (NSS or - OpenSSL, depending on configuration) that Tor uses to implement the - TLS link security protocol. - - * `lib/evloop` -- Tools to manage the event loop and related - functionality, in order to implement asynchronous networking, - timers, periodic events, and other scheduling tasks. - - * `lib/process` -- Utilities and compatibility code to launch and - manage subprocesses. - -### What belongs in lib? - -In general, if you can imagine some program wanting the functionality -you're writing, even if that program had nothing to do with Tor, your -functionality belongs in lib. - -If it falls into one of the existing "lib" categories, your -functionality belongs in lib. - -If you are using platform-specific `#ifdef`s to manage compatibility -issues among platforms, you should probably consider whether you can -put your code into lib. diff --git a/doc/HACKING/design/01a-memory.md b/doc/HACKING/design/01a-memory.md deleted file mode 100644 index 4c6bb09018..0000000000 --- a/doc/HACKING/design/01a-memory.md +++ /dev/null @@ -1,103 +0,0 @@ - -## Memory management - -### Heap-allocation functions: lib/malloc/malloc.h - -Tor imposes a few light wrappers over C's native malloc and free -functions, to improve convenience, and to allow wholescale replacement -of malloc and free as needed. - -You should never use 'malloc', 'calloc', 'realloc, or 'free' on their -own; always use the variants prefixed with 'tor_'. 
-They are the same as the standard C functions, with the following -exceptions: - - * `tor_free(NULL)` is a no-op. - * `tor_free()` is a macro that takes an lvalue as an argument and sets it to - NULL after freeing it. To avoid this behavior, you can use `tor_free_()` - instead. - * tor_malloc() and friends fail with an assertion if they are asked to - allocate a value so large that it is probably an underflow. - * It is always safe to `tor_malloc(0)`, regardless of whether your libc - allows it. - * `tor_malloc()`, `tor_realloc()`, and friends are never allowed to fail. - Instead, Tor will die with an assertion. This means that you never - need to check their return values. See the next subsection for - information on why we think this is a good idea. - -We define additional general-purpose memory allocation functions as well: - - * `tor_malloc_zero(x)` behaves as `calloc(1, x)`, except the it makes clear - the intent to allocate a single zeroed-out value. - * `tor_reallocarray(x,y)` behaves as the OpenBSD reallocarray function. - Use it for cases when you need to realloc() in a multiplication-safe - way. - -And specific-purpose functions as well: - - * `tor_strdup()` and `tor_strndup()` behaves as the underlying libc - functions, but use `tor_malloc()` instead of the underlying function. - * `tor_memdup()` copies a chunk of memory of a given size. - * `tor_memdup_nulterm()` copies a chunk of memory of a given size, then - NUL-terminates it just to be safe. - -#### Why assert on allocation failure? - -Why don't we allow `tor_malloc()` and its allies to return NULL? - -First, it's error-prone. Many programmers forget to check for NULL return -values, and testing for `malloc()` failures is a major pain. - -Second, it's not necessarily a great way to handle OOM conditions. It's -probably better (we think) to have a memory target where we dynamically free -things ahead of time in order to stay under the target. 
Trying to respond to -an OOM at the point of `tor_malloc()` failure, on the other hand, would involve -a rare operation invoked from deep in the call stack. (Again, that's -error-prone and hard to debug.) - -Third, thanks to the rise of Linux and other operating systems that allow -memory to be overcommitted, you can't actually ever rely on getting a NULL -from `malloc()` when you're out of memory; instead you have to use an approach -closer to tracking the total memory usage. - -#### Conventions for your own allocation functions. - -Whenever you create a new type, the convention is to give it a pair of -`x_new()` and `x_free_()` functions, named after the type. - -Calling `x_free(NULL)` should always be a no-op. - -There should additionally be an `x_free()` macro, defined in terms of -`x_free_()`. This macro should set its lvalue to NULL. You can define it -using the FREE_AND_NULL macro, as follows: - -``` -#define x_free(ptr) FREE_AND_NULL(x_t, x_free_, (ptr)) -``` - - -### Grow-only memory allocation: lib/memarea - -It's often handy to allocate a large number of tiny objects, all of which -need to disappear at the same time. You can do this in tor using the -memarea.c abstraction, which uses a set of grow-only buffers for allocation, -and only supports a single "free" operation at the end. - -Using memareas also helps you avoid memory fragmentation. You see, some libc -malloc implementations perform badly on the case where a large number of -small temporary objects are allocated at the same time as a few long-lived -objects of similar size. But if you use tor_malloc() for the long-lived ones -and a memarea for the temporary object, the malloc implementation is likelier -to do better. - -To create a new memarea, use `memarea_new()`. To drop all the storage from a -memarea, and invalidate its pointers, use `memarea_drop_all()`. 
- -The allocation functions `memarea_alloc()`, `memarea_alloc_zero()`, -`memarea_memdup()`, `memarea_strdup()`, and `memarea_strndup()` are analogous -to the similarly-named malloc() functions. There is intentionally no -`memarea_free()` or `memarea_realloc()`. - -### Special allocation: lib/malloc/map_anon.h - -TODO: WRITEME. diff --git a/doc/HACKING/design/01b-collections.md b/doc/HACKING/design/01b-collections.md deleted file mode 100644 index ed6fdc9071..0000000000 --- a/doc/HACKING/design/01b-collections.md +++ /dev/null @@ -1,45 +0,0 @@ - -## Collections in tor - -### Smartlists: Neither lists, nor especially smart. - -For historical reasons, we call our dynamic-allocated array type -`smartlist_t`. It can grow or shrink as elements are added and removed. - -All smartlists hold an array of `void *`. Whenever you expose a smartlist -in an API you *must* document which types its pointers actually hold. - -<!-- It would be neat to fix that, wouldn't it? -NM --> - -Smartlists are created empty with `smartlist_new()` and freed with -`smartlist_free()`. See the `containers.h` module documentation for more -information; there are many convenience functions for commonly needed -operations. - -<!-- TODO: WRITE more about what you can do with smartlists. --> - -### Digest maps, string maps, and more. - -Tor makes frequent use of maps from 160-bit digests, 256-bit digests, -or nul-terminated strings to `void *`. These types are `digestmap_t`, -`digest256map_t`, and `strmap_t` respectively. See the containers.h -module documentation for more information. - -### Intrusive lists and hashtables - -For performance-sensitive cases, we sometimes want to use "intrusive" -collections: ones where the bookkeeping pointers are stuck inside the -structures that belong to the collection. If you've used the -BSD-style sys/queue.h macros, you'll be familiar with these. 
- -Unfortunately, the `sys/queue.h` macros vary significantly between the -platforms that have them, so we provide our own variants in -`src/ext/tor_queue.h`. - -We also provide an intrusive hashtable implementation in `src/ext/ht.h`. -When you're using it, you'll need to define your own hash -functions. If attacker-induced collisions are a worry here, use the -cryptographic siphash24g function to extract hashes. - -<!-- TODO: WRITE about bloom filters, namemaps, bit-arrays, order functions. ---> diff --git a/doc/HACKING/design/01c-time.md b/doc/HACKING/design/01c-time.md deleted file mode 100644 index 5cd0b354fd..0000000000 --- a/doc/HACKING/design/01c-time.md +++ /dev/null @@ -1,75 +0,0 @@ - -## Time in tor ## - -### What time is it? ### - -We have several notions of the current time in Tor. - -The *wallclock time* is available from time(NULL) with -second-granularity and tor_gettimeofday() with microsecond -granularity. It corresponds most closely to "the current time and date". - -The *monotonic time* is available with the set of monotime_\* -functions declared in compat_time.h. Unlike the wallclock time, it -can only move forward. It does not necessarily correspond to a real -world time, and it is not portable between systems. - -The *coarse monotonic time* is available from the set of -monotime_coarse_\* functions in compat_time.h. It is the same as -monotime_\* on some platforms. On others, it gives a monotonic timer -with less precision, but which it's more efficient to access. - -### Cached views of time. ### - -On some systems (like Linux), many time functions use a VDSO to avoid -the overhead of a system call. But on other systems, gettimeofday() -and time() can be costly enough that you wouldn't want to call them -tens of thousands of times. To get a recent, but not especially -accurate, view of the current time, see approx_time() and -tor_gettimeofday_cached(). 
- - -### Parsing and encoding time values ### - -Tor has functions to parse and format time in these formats: - - * RFC1123 format. ("Fri, 29 Sep 2006 15:54:20 GMT"). For this, - use format_rfc1123_time() and parse_rfc1123_time. - - * ISO8601 format. ("2006-10-29 10:57:20") For this, use - format_local_iso_time and format_iso_time. We also support the - variant format "2006-10-29T10:57:20" with format_iso_time_nospace, and - "2006-10-29T10:57:20.123456" with format_iso_time_nospace_usec. - - * HTTP format collections (preferably "Mon, 25 Jul 2016 04:01:11 - GMT" or possibly "Wed Jun 30 21:49:08 1993" or even "25-Jul-16 - 04:01:11 GMT"). For this, use parse_http_time. Don't generate anything - but the first format. - -Some of these functions use struct tm. You can use the standard -tor_localtime_r and tor_gmtime_r() to wrap these in a safe way. We -also have a tor_timegm() function. - -### Scheduling events ### - -The main way to schedule a not-too-frequent periodic event with -respect to the Tor mainloop is via the mechanism in periodic.c. -There's a big table of periodic_events in main.c, each of which gets -invoked on its own schedule. You should not expect more than about -one second of accuracy with these timers. - -You can create an independent timer using libevent directly, or using -the periodic_timer_new() function. But you should avoid doing this -for per-connection or per-circuit timers: Libevent's internal timer -implementation uses a min-heap, and those tend to start scaling poorly -once you have a few thousand entries. - -If you need to create a large number of fine-grained timers for some -purpose, you should consider the mechanism in src/common/timers.c, -which is optimized for the case where you have a large number of -timers with not-too-long duration, many of which will be deleted -before they actually expire. These timers should be reasonably -accurate within a handful of milliseconds -- possibly better on some -platforms. 
(The timers.c module uses William Ahern's timeout.c -implementation as its backend, which is based on a hierarchical timing -wheel algorithm. It's cool stuff; check it out.) diff --git a/doc/HACKING/design/01d-crypto.md b/doc/HACKING/design/01d-crypto.md deleted file mode 100644 index d4def947d1..0000000000 --- a/doc/HACKING/design/01d-crypto.md +++ /dev/null @@ -1,169 +0,0 @@ - -## Lower-level cryptography functionality in Tor ## - -Generally speaking, Tor code shouldn't be calling OpenSSL (or any -other crypto library) directly. Instead, we should indirect through -one of the functions in src/common/crypto\*.c or src/common/tortls.c. - -Cryptography functionality that's available is described below. - -### RNG facilities ### - -The most basic RNG capability in Tor is the crypto_rand() family of -functions. These currently use OpenSSL's RAND_() backend, but may use -something faster in the future. - -In addition to crypto_rand(), which fills in a buffer with random -bytes, we also have functions to produce random integers in certain -ranges; to produce random hostnames; to produce random doubles, etc. - -When you're creating a long-term cryptographic secret, you might want -to use crypto_strongest_rand() instead of crypto_rand(). It takes the -operating system's entropy source and combines it with output from -crypto_rand(). This is a pure paranoia measure, but it might help us -someday. - -You can use smartlist_choose() to pick a random element from a smartlist -and smartlist_shuffle() to randomize the order of a smartlist. Both are -potentially a bit slow. - -### Cryptographic digests and related functions ### - -We treat digests as separate types based on the length of their -outputs. We support one 160-bit digest (SHA1), two 256-bit digests -(SHA256 and SHA3-256), and two 512-bit digests (SHA512 and SHA3-512). - -You should not use SHA1 for anything new. - -The crypto_digest\*() family of functions manipulates digests. 
You -can either compute a digest of a chunk of memory all at once using -crypto_digest(), crypto_digest256(), or crypto_digest512(). Or you -can create a crypto_digest_t object with -crypto_digest{,256,512}_new(), feed information to it in chunks using -crypto_digest_add_bytes(), and then extract the final digest using -crypto_digest_get_digest(). You can copy the state of one of these -objects using crypto_digest_dup() or crypto_digest_assign(). - -We support the HMAC hash-based message authentication code -instantiated using SHA256. See crypto_hmac_sha256. (You should not -add any HMAC users with SHA1, and HMAC is not necessary with SHA3.) - -We also support the SHA3 cousins, SHAKE128 and SHAKE256. Unlike -digests, these are extendable output functions (or XOFs) where you can -get any amount of output. Use the crypto_xof_\*() functions to access -these. - -We have several ways to derive keys from cryptographically strong secret -inputs (like diffie-hellman outputs). The old -crypto_expand_key_material-TAP() performs an ad-hoc KDF based on SHA1 -- you -shouldn't use it for implementing anything but old versions of the Tor -protocol. You can use HKDF-SHA256 (as defined in RFC5869) for more modern -protocols. Also consider SHAKE256. - -If your input is potentially weak, like a password or passphrase, use a salt -along with the secret_to_key() functions as defined in crypto_s2k.c. Prefer -scrypt over other hashing methods when possible. If you're using a password -to encrypt something, see the "boxed file storage" section below. - -Finally, in order to store objects in hash tables, Tor includes the -randomized SipHash 2-4 function. Call it via the siphash24g() function in -src/ext/siphash.h whenever you're creating a hashtable whose keys may be -manipulated by an attacker in order to DoS you with collisions. - - -### Stream ciphers ### - -You can create instances of a stream cipher using crypto_cipher_new(). -These are stateful objects of type crypto_cipher_t. 
Note that these -objects only support AES-128 right now; a future version should add -support for AES-128 and/or ChaCha20. - -You can encrypt/decrypt with crypto_cipher_encrypt or -crypto_cipher_decrypt. The crypto_cipher_crypt_inplace function performs -an encryption without a copy. - -Note that sensible people should not use raw stream ciphers; they should -probably be using some kind of AEAD. Sorry. - -### Public key functionality ### - -We support four public key algorithms: DH1024, RSA, Curve25519, and -Ed25519. - -We support DH1024 over two prime groups. You access these via the -crypto_dh_\*() family of functions. - -We support RSA in many bit sizes for signing and encryption. You access -it via the crypto_pk_*() family of functions. Note that a crypto_pk_t -may or may not include a private key. See the crypto_pk_* functions in -crypto.c for a full list of functions here. - -For Curve25519 functionality, see the functions and types in -crypto_curve25519.c. Curve25519 is generally suitable for when you need -a secure fast elliptic-curve diffie hellman implementation. When -designing new protocols, prefer it over DH in Z_p. - -For Ed25519 functionality, see the functions and types in -crypto_ed25519.c. Ed25519 is a generally suitable as a secure fast -elliptic curve signature method. For new protocols, prefer it over RSA -signatures. - -### Metaformats for storage ### - -When OpenSSL manages the storage of some object, we use whatever format -OpenSSL provides -- typically, some kind of PEM-wrapped base 64 encoding -that starts with "----- BEGIN CRYPTOGRAPHIC OBJECT ----". - -When we manage the storage of some cryptographic object, we prefix the -object with 32-byte NUL-padded prefix in order to avoid accidental -object confusion; see the crypto_read_tagged_contents_from_file() and -crypto_write_tagged_contents_to_file() functions for manipulating -these. 
The prefix is "== type: tag ==", where type describes the object -and its encoding, and tag indicates which one it is. - -### Boxed-file storage ### - -When managing keys, you frequently want to have some way to write a -secret object to disk, encrypted with a passphrase. The crypto_pwbox -and crypto_unpwbox functions do so in a way that's likely to be -readable by future versions of Tor. - -### Certificates ### - -We have, alas, several certificate types in Tor. - -The tor_x509_cert_t type represents an X.509 certificate. This document -won't explain X.509 to you -- possibly, no document can. (OTOH, Peter -Gutmann's "x.509 style guide", though severely dated, does a good job of -explaining how awful x.509 can be.) Do not introduce any new usages of -X.509. Right now we only use it in places where TLS forces us to do so. - -The authority_cert_t type is used only for directory authority keys. It -has a medium-term signing key (which the authorities actually keep -online) signed by a long-term identity key (which the authority operator -had really better be keeping offline). Don't use it for any new kind of -certificate. - -For new places where you need a certificate, consider tor_cert_t: it -represents a typed and dated _something_ signed by an Ed25519 key. The -format is described in tor-spec. Unlike x.509, you can write it on a -napkin. - -(Additionally, the Tor directory design uses a fairly wide variety of -documents that include keys and which are signed by keys. You can -consider these documents to be an additional kind of certificate if you -want.) - -### TLS ### - -Tor's TLS implementation is more tightly coupled to OpenSSL than we'd -prefer. You can read most of it in tortls.c. - -Unfortunately, TLS's state machine and our requirement for nonblocking -IO support means that using TLS in practice is a bit hairy, since -logical writes can block on a physical reads, and vice versa. - -If you are lucky, you will never have to look at the code here. 
- - - diff --git a/doc/HACKING/design/01e-os-compat.md b/doc/HACKING/design/01e-os-compat.md deleted file mode 100644 index 072e95bc8a..0000000000 --- a/doc/HACKING/design/01e-os-compat.md +++ /dev/null @@ -1,50 +0,0 @@ - -## OS compatibility functions ## - -We've got a bunch of functions to wrap differences between various -operating systems where we run. - -### The filesystem ### - -We wrap the most important filesystem functions with load-file, -save-file, and map-file abstractions declared in util.c or compat.c. If -you're messing about with file descriptors yourself, you might be doing -things wrong. Most of the time, write_str_to_file() and -read_str_from_file() are all you need. - -Use the check_private_directory() function to create or verify the -presence of directories, and tor_listdir() to list the files in a -directory. - -Those modules also have functions for manipulating paths a bit. - -### Networking ### - -Nearly all the world is on a Berkeley sockets API, except for -windows, whose version of the Berkeley API was corrupted by late-90s -insistence on backward compatibility with the -sort-of-berkeley-sort-of-not add-on *thing* that was WinSocks. - -What's more, everybody who implemented sockets realized that select() -wasn't a very good way to do nonblocking IO... and then the various -implementations all decided to so something different. - -You can forget about most of these differences, fortunately: We use -libevent to hide most of the differences between the various networking -backends, and we add a few of our own functions to hide the differences -that Libevent doesn't. - -To create a network connection, the right level of abstraction to look -at is probably the connection_t system in connection.c. Most of the -lower level work has already been done for you. 
If you need to -instantiate something that doesn't fit well with connection_t, you -should see whether you can instantiate it with connection_t anyway -- or -you might need to refactor connection.c a little. - -Whenever possible, represent network addresses as tor_addr_t. - -### Process launch and monitoring ### - -Launching and/or monitoring a process is tricky business. You can use -the mechanisms in procmon.c and tor_spawn_background(), but they're both -a bit wonky. A refactoring would not be out of order. diff --git a/doc/HACKING/design/01f-threads.md b/doc/HACKING/design/01f-threads.md deleted file mode 100644 index a0dfa2d40e..0000000000 --- a/doc/HACKING/design/01f-threads.md +++ /dev/null @@ -1,26 +0,0 @@ - -## Threads in Tor ## - -Tor is based around a single main thread and one or more worker -threads. We aim (with middling success) to use worker threads for -CPU-intensive activities and the main thread for our networking. -Fortunately (?) we have enough cryptography that moving what we can of the -cryptographic processes to the workers should achieve good parallelism under most -loads. Unfortunately, we only have a small fraction of our -cryptography done in our worker threads right now. - -Our threads-and-workers abstraction is defined in workqueue.c, which -combines a work queue with a thread pool, and integrates the -signalling with libevent. Tor main instance of a work queue is -instantiated in cpuworker.c. It will probably need some refactoring -as more types of work are added. - -On a lower level, we provide locks with tor_mutex_t, conditions with -tor_cond_t, and thread-local storage with tor_threadlocal_t, all of -which are specified in compat_threads.h and implemented in an OS- -specific compat_\*threads.h module. - -Try to minimize sharing between threads: it is usually best to simply -make the worker "own" all the data it needs while the work is in -progress, and to give up ownership when it's complete. 
- diff --git a/doc/HACKING/design/01g-strings.md b/doc/HACKING/design/01g-strings.md deleted file mode 100644 index 145a35cd6f..0000000000 --- a/doc/HACKING/design/01g-strings.md +++ /dev/null @@ -1,95 +0,0 @@ - -## String processing in Tor ## - -Since you're reading about a C program, you probably expected this -section: it's full of functions for manipulating the (notoriously -dubious) C string abstraction. I'll describe some often-missed -highlights here. - -### Comparing strings and memory chunks ### - -We provide strcmpstart() and strcmpend() to perform a strcmp with the start -or end of a string. - - tor_assert(!strcmpstart("Hello world","Hello")); - tor_assert(!strcmpend("Hello world","world")); - - tor_assert(!strcasecmpstart("HELLO WORLD","Hello")); - tor_assert(!strcasecmpend("HELLO WORLD","world")); - -To compare two string pointers, either of which might be NULL, use -strcmp_opt(). - -To search for a string or a chunk of memory within a non-null -terminated memory block, use tor_memstr or tor_memmem respectively. - -We avoid using memcmp() directly, since it tends to be used in cases -when having a constant-time operation would be better. Instead, we -recommend tor_memeq() and tor_memneq() for when you need a -constant-time operation. In cases when you need a fast comparison, -and timing leaks are not a danger, you can use fast_memeq() and -fast_memneq(). - -It's a common pattern to take a string representing one or more lines -of text, and search within it for some other string, at the start of a -line. You could search for "\\ntarget", but that would miss the first -line. Instead, use find_str_at_start_of_line. - -### Parsing text ### - -Over the years, we have accumulated lots of ways to parse text -- -probably too many. Refactoring them to be safer and saner could be a -good project! The one that seems most error-resistant is tokenizing -text with smartlist_split_strings(). 
This function takes a smartlist, -a string, and a separator, and splits the string along occurrences of -the separator, adding new strings for the sub-elements to the given -smartlist. - -To handle time, you can use one of the functions mentioned above in -"Parsing and encoding time values". - -For numbers in general, use the tor_parse_{long,ulong,double,uint64} -family of functions. Each of these can be called in a few ways. The -most general is as follows: - - const int BASE = 10; - const int MINVAL = 10, MAXVAL = 10000; - const char *next; - int ok; - long lng = tor_parse_long("100", BASE, MINVAL, MAXVAL, &ok, &next); - -The return value should be ignored if "ok" is set to false. The input -string needs to contain an entire number, or it's considered -invalid... unless the "next" pointer is available, in which case extra -characters at the end are allowed, and "next" is set to point to the -first such character. - -### Generating blocks of text ### - -For not-too-large blocks of text, we provide tor_asprintf(), which -behaves like other members of the sprintf() family, except that it -always allocates enough memory on the heap for its output. - -For larger blocks: Rather than using strlcat and strlcpy to build -text, or keeping pointers to the interior of a memory block, we -recommend that you use the smartlist_* functions to build a smartlist -full of substrings in order. Then you can concatenate them into a -single string with smartlist_join_strings(), which also takes optional -separator and terminator arguments. - -As a convenience, we provide smartlist_add_asprintf(), which combines -the two methods above together. Many of the cryptographic digest -functions also accept a not-yet-concatenated smartlist of strings. - -### Logging helpers ### - -Often we'd like to log a value that comes from an untrusted source. -To do this, use escaped() to escape the nonprintable characters and -other confusing elements in a string, and surround it by quotes. 
(Use -esc_for_log() if you need to allocate a new string.) - -It's also handy to put memory chunks into hexadecimal before logging; -you can use hex_str(memory, length) for that. - -The escaped() and hex_str() functions both provide outputs that are -only valid till they are next invoked; they are not threadsafe. diff --git a/doc/HACKING/design/02-dataflow.md b/doc/HACKING/design/02-dataflow.md deleted file mode 100644 index 39f21a908c..0000000000 --- a/doc/HACKING/design/02-dataflow.md +++ /dev/null @@ -1,236 +0,0 @@ - -## Data flow in the Tor process ## - -We read bytes from the network, we write bytes to the network. For the -most part, the bytes we write correspond roughly to bytes we have read, -with bits of cryptography added in. - -The rest is a matter of details. - -![Diagram of main data flows in Tor](./diagrams/02/02-dataflow.png "Diagram of main data flows in Tor") - -### Connections and buffers: reading, writing, and interpreting. ### - -At a low level, Tor's networking code is based on "connections". Each -connection represents an object that can send or receive network-like -events. For the most part, each connection has a single underlying TCP -stream (I'll discuss counterexamples below). - -A connection that behaves like a TCP stream has an input buffer and an -output buffer. Incoming data is -written into the input buffer ("inbuf"); data to be written to the -network is queued on an output buffer ("outbuf"). - -Buffers are implemented in buffers.c. Each of these buffers is -implemented as a linked queue of memory extents, in the style of classic -BSD mbufs, or Linux skbufs. - -A connection's reading and writing can be enabled or disabled. Under -the hood, this functionality is implemented using libevent events: one -for reading, one for writing. These events are turned on/off in -main.c, in the functions connection_{start,stop}_{reading,writing}. 
- -When a read or write event is turned on, the main libevent loop polls -the kernel, asking which sockets are ready to read or write. (This -polling happens in the event_base_loop() call in run_main_loop_once() -in main.c.) When libevent finds a socket that's ready to read or write, -it invokes conn_{read,write}_callback(), also in main.c - -These callback functions delegate to connection_handle_read() and -connection_handle_write() in connection.c, which read or write on the -network as appropriate, possibly delegating to openssl. - -After data is read or written, or other event occurs, these -connection_handle_read_write() functions call logic functions whose job is -to respond to the information. Some examples included: - - * connection_flushed_some() -- called after a connection writes any - amount of data from its outbuf. - * connection_finished_flushing() -- called when a connection has - emptied its outbuf. - * connection_finished_connecting() -- called when an in-process connection - finishes making a remote connection. - * connection_reached_eof() -- called after receiving a FIN from the - remote server. - * connection_process_inbuf() -- called when more data arrives on - the inbuf. - -These functions then call into specific implementations depending on -the type of the connection. For example, if the connection is an -edge_connection_t, connection_reached_eof() will call -connection_edge_reached_eof(). - -> **Note:** "Also there are bufferevents!" We have vestigial -> code for an alternative low-level networking -> implementation, based on Libevent's evbuffer and bufferevent -> code. These two object types take on (most of) the roles of -> buffers and connections respectively. It isn't working in today's -> Tor, due to code rot and possible lingering libevent bugs. More -> work is needed; it would be good to get this working efficiently -> again, to have IOCP support on Windows. 
- - -#### Controlling connections #### - -A connection can have reading or writing enabled or disabled for a -wide variety of reasons, including: - - * Writing is disabled when there is no more data to write - * For some connection types, reading is disabled when the inbuf is - too full. - * Reading/writing is temporarily disabled on connections that have - recently read/written enough data up to their bandwidth - * Reading is disabled on connections when reading more data from them - would require that data to be buffered somewhere else that is - already full. - -Currently, these conditions are checked in a diffuse set of -increasingly complex conditional expressions. In the future, it could -be helpful to transition to a unified model for handling temporary -read/write suspensions. - -#### Kinds of connections #### - -Today Tor has the following connection and pseudoconnection types. -For the most part, each type of channel has an associated C module -that implements its underlying logic. - -**Edge connections** receive data from and deliver data to points -outside the onion routing network. See `connection_edge.c`. They fall into two types: - -**Entry connections** are a type of edge connection. They receive data -from the user running a Tor client, and deliver data to that user. -They are used to implement SOCKSPort, TransPort, NATDPort, and so on. -Sometimes they are called "AP" connections for historical reasons (it -used to stand for "Application Proxy"). - -**Exit connections** are a type of edge connection. They exist at an -exit node, and transmit traffic to and from the network. - -(Entry connections and exit connections are also used as placeholders -when performing a remote DNS request; they are not decoupled from the -notion of "stream" in the Tor protocol. This is implemented partially -in `connection_edge.c`, and partially in `dnsserv.c` and `dns.c`.) 
- -**OR connections** send and receive Tor cells over TLS, using some -version of the Tor link protocol. Their implementation is spread -across `connection_or.c`, with a bit of logic in `command.c`, -`relay.c`, and `channeltls.c`. - -**Extended OR connections** are a type of OR connection for use on -bridges using pluggable transports, so that the PT can tell the bridge -some information about the incoming connection before passing on its -data. They are implemented in `ext_orport.c`. - -**Directory connections** are server-side or client-side connections -that implement Tor's HTTP-based directory protocol. These are -instantiated using a socket when Tor is making an unencrypted HTTP -connection. When Tor is tunneling a directory request over a Tor -circuit, directory connections are implemented using a linked -connection pair (see below). Directory connections are implemented in -`directory.c`; some of the server-side logic is implemented in -`dirserver.c`. - -**Controller connections** are local connections to a controller -process implementing the controller protocol from -control-spec.txt. These are in `control.c`. - -**Listener connections** are not stream oriented! Rather, they wrap a -listening socket in order to detect new incoming connections. They -bypass most of stream logic. They don't have associated buffers. -They are implemented in `connection.c`. - -![structure hierarchy for connection types](./diagrams/02/02-connection-types.png "structure hierarchy for connection types") - ->**Note**: "History Time!" You might occasionally find reference to a couple types of connections -> which no longer exist in modern Tor. A *CPUWorker connection* ->connected the main Tor process to a thread or process used for ->computation. (Nowadays we use in-process communication.) Even more ->anciently, a *DNSWorker connection* connected the main tor process to ->a separate thread or process used for running `gethostbyname()` or ->`getaddrinfo()`. 
(Nowadays we use Libevent's evdns facility to ->perform DNS requests asynchronously.) - -#### Linked connections #### - -Sometimes two channels are joined together, such that data which the -Tor process sends on one should immediately be received by the same -Tor process on the other. (For example, when Tor makes a tunneled -directory connection, this is implemented on the client side as a -directory connection whose output goes, not to the network, but to a -local entry connection. And when a directory receives a tunnelled -directory connection, this is implemented as an exit connection whose -output goes, not to the network, but to a local directory connection.) - -The earliest versions of Tor to support linked connections used -socketpairs for the purpose. But using socketpairs forced us to copy -data through kernelspace, and wasted limited file descriptors. So -instead, a pair of connections can be linked in-process. Each linked -connection has a pointer to the other, such that data written on one -is immediately readable on the other, and vice versa. - -### From connections to channels ### - -There's an abstraction layer above OR connections (the ones that -handle cells) and below cells called **Channels**. A channel's -purpose is to transmit authenticated cells from one Tor instance -(relay or client) to another. - -Currently, only one implementation exists: Channel_tls, which sends -and receiveds cells over a TLS-based OR connection. - -Cells are sent on a channel using -`channel_write_{,packed_,var_}cell()`. Incoming cells arrive on a -channel from its backend using `channel_queue*_cell()`, and are -immediately processed using `channel_process_cells()`. - -Some cell types are handled below the channel layer, such as those -that affect handshaking only. And some others are passed up to the -generic cross-channel code in `command.c`: cells like `DESTROY` and -`CREATED` are all trivial to handle. But relay cells -require special handling... 
- -### From channels through circuits ### - -When a relay cell arrives on an existing circuit, it is handled in -`circuit_receive_relay_cell()` -- one of the innermost functions in -Tor. This function encrypts or decrypts the relay cell as -appropriate, and decides whether the cell is intended for the current -hop of the circuit. - -If the cell *is* intended for the current hop, we pass it to -`connection_edge_process_relay_cell()` in `relay.c`, which acts on it -based on its relay command, and (possibly) queues its data on an -`edge_connection_t`. - -If the cell *is not* intended for the current hop, we queue it for the -next channel in sequence with `append cell_to_circuit_queue()`. This -places the cell on a per-circuit queue for cells headed out on that -particular channel. - -### Sending cells on circuits: the complicated bit. ### - -Relay cells are queued onto circuits from one of two (main) sources: -reading data from edge connections, and receiving a cell to be relayed -on a circuit. Both of these sources place their cells on cell queue: -each circuit has one cell queue for each direction that it travels. - -A naive implementation would skip using cell queues, and instead write -each outgoing relay cell. (Tor did this in its earlier versions.) -But such an approach tends to give poor performance, because it allows -high-volume circuits to clog channels, and it forces the Tor server to -send data queued on a circuit even after that circuit has been closed. - -So by using queues on each circuit, we can add cells to each channel -on a just-in-time basis, choosing the cell at each moment based on -a performance-aware algorithm. - -This logic is implemented in two main modules: `scheduler.c` and -`circuitmux*.c`. The scheduler code is responsible for determining -globally, across all channels that could write cells, which one should -next receive queued cells. 
The circuitmux code determines, for all -of the circuits with queued cells for a channel, which one should -queue the next cell. - -(This logic applies to outgoing relay cells only; incoming relay cells -are processed as they arrive.) diff --git a/doc/HACKING/design/03-modules.md b/doc/HACKING/design/03-modules.md deleted file mode 100644 index 93eb9d3089..0000000000 --- a/doc/HACKING/design/03-modules.md +++ /dev/null @@ -1,247 +0,0 @@ - -## Tor's modules ## - -### Generic modules ### - -`buffers.c` -: Implements the `buf_t` buffered data type for connections, and several -low-level data handling functions to handle network protocols on it. - -`channel.c` -: Generic channel implementation. Channels handle sending and receiving cells -among tor nodes. - -`channeltls.c` -: Channel implementation for TLS-based OR connections. Uses `connection_or.c`. - -`circuitbuild.c` -: Code for constructing circuits and choosing their paths. (*Note*: -this module could plausibly be split into handling the client side, -the server side, and the path generation aspects of circuit building.) - -`circuitlist.c` -: Code for maintaining and navigating the global list of circuits. - -`circuitmux.c` -: Generic circuitmux implementation. A circuitmux handles deciding, for a -particular channel, which circuit should write next. - -`circuitmux_ewma.c` -: A circuitmux implementation based on the EWMA (exponentially -weighted moving average) algorithm. - -`circuituse.c` -: Code to actually send and receive data on circuits. - -`command.c` -: Handles incoming cells on channels. - -`config.c` -: Parses options from torrc, and uses them to configure the rest of Tor. - -`confparse.c` -: Generic torrc-style parser. Used to parse torrc and state files. - -`connection.c` -: Generic and common connection tools, and implementation for the simpler -connection types. - -`connection_edge.c` -: Implementation for entry and exit connections. 
- -`connection_or.c` -: Implementation for OR connections (the ones that send cells over TLS). - -`main.c` -: Principal entry point, main loops, scheduled events, and network -management for Tor. - -`ntmain.c` -: Implements Tor as a Windows service. (Not very well.) - -`onion.c` -: Generic code for generating and responding to CREATE and CREATED -cells, and performing the appropriate onion handshakes. Also contains -code to manage the server-side onion queue. - -`onion_fast.c` -: Implements the old SHA1-based CREATE_FAST/CREATED_FAST circuit -creation handshake. (Now deprecated.) - -`onion_ntor.c` -: Implements the Curve25519-based NTOR circuit creation handshake. - -`onion_tap.c` -: Implements the old RSA1024/DH1024-based TAP circuit creation handshake. (Now -deprecated.) - -`relay.c` -: Handles particular types of relay cells, and provides code to receive, -encrypt, route, and interpret relay cells. - -`scheduler.c` -: Decides which channel/circuit pair is ready to receive the next cell. - -`statefile.c` -: Handles loading and storing Tor's state file. - -`tor_main.c` -: Contains the actual `main()` function. (This is placed in a separate -file so that the unit tests can have their own `main()`.) - - -### Node-status modules ### - -`directory.c` -: Implements the HTTP-based directory protocol, including sending, -receiving, and handling most request types. (*Note*: The client parts -of this, and the generic-HTTP parts of this, could plausibly be split -off.) - -`microdesc.c` -: Implements the compact "microdescriptor" format for keeping track of -what we know about a router. - -`networkstatus.c` -: Code for fetching, storing, and interpreting consensus vote documents. - -`nodelist.c` -: Higher-level view of our knowledge of which Tor servers exist. Each -`node_t` corresponds to a router we know about. - -`routerlist.c` -: Code for storing and retrieving router descriptors and extrainfo -documents. 
- -`routerparse.c` -: Generic and specific code for parsing all Tor directory information -types. - -`routerset.c` -: Parses and interprets a specification for a set of routers (by IP -range, fingerprint, nickname (deprecated), or country). - - -### Client modules ### - -`addressmap.c` -: Handles client-side associations between one address and another. -These are used to implement client-side DNS caching (NOT RECOMMENDED), -MapAddress directives, Automapping, and more. - -`circpathbias.c` -: Path bias attack detection for circuits: tracks whether -connections made through a particular guard have an unusually high failure rate. - -`circuitstats.c` -: Code to track circuit performance statistics in order to adapt our behavior. -Notably includes an algorithm to track circuit build times. - -`dnsserv.c` -: Implements DNSPort for clients. (Note that in spite of the word -"server" in this module's name, it is used for Tor clients. It -implements a DNS server, not DNS for servers.) - -`entrynodes.c` -: Chooses, monitors, and remembers guard nodes. Also contains some -bridge-related code. - -`torcert.c` -: Code to interpret and generate Ed25519-based certificates. - -### Server modules ### - -`dns.c` -: Server-side DNS code. Handles sending and receiving DNS requests on -exit nodes, and implements the server-side DNS cache. - -`dirserv.c` -: Implements part of directory caches that handles responding to -client requests. - -`ext_orport.c` -: Implements the extended ORPort protocol for communication between -server-side pluggable transports and Tor servers. - -`hibernate.c` -: Performs bandwidth accounting, and puts Tor relays into hibernation -when their bandwidth is exhausted. - -`router.c` -: Management code for running a Tor server. In charge of RSA key -maintenance, descriptor generation and uploading. - -`routerkeys.c` -: Key handling code for a Tor server. (Currently handles only the -Ed25519 keys, but the RSA keys could be moved here too.) 
-
-
-### Onion service modules ###
-
-`rendcache.c`
-: Stores onion service descriptors.
-
-`rendclient.c`
-: Client-side implementation of the onion service protocol.
-
-`rendcommon.c`
-: Parts of the onion service protocol that are shared by clients,
-services, and/or Tor servers.
-
-`rendmid.c`
-: Tor-server-side implementation of the onion service protocol. (Handles
-acting as an introduction point or a rendezvous point.)
-
-`rendservice.c`
-: Service-side implementation of the onion service protocol.
-
-`replaycache.c`
-: Backend to check introduce2 requests for replay attempts.
-
-
-### Authority modules ###
-
-`dircollate.c`
-: Helper for `dirvote.c`: Given a set of votes, each containing a list
-of Tor nodes, determines which entries across all the votes correspond
-to the same nodes, and yields them in a useful order.
-
-`dirvote.c`
-: Implements the directory voting algorithms that authorities use.
-
-`keypin.c`
-: Implements a persistent key-pinning mechanism to tie RSA1024
-identities to ed25519 identities.
-
-### Miscellaneous modules ###
-
-`control.c`
-: Implements the Tor controller protocol.
-
-`cpuworker.c`
-: Implements the inner work queue function. We use this to move the
-work of circuit creation (on server-side) to other CPUs.
-
-`fp_pair.c`
-: Types for handling 2-tuples of 20-byte fingerprints.
-
-`geoip.c`
-: Parses geoip files (which map IP addresses to country codes), and
-performs lookups on the internal geoip table. Also stores some
-geoip-related statistics.
-
-`policies.c`
-: Parses and implements Tor exit policies.
-
-`reasons.c`
-: Maps internal reason-codes to human-readable strings.
-
-`rephist.c`
-: Tracks Tor servers' performance over time.
-
-`status.c`
-: Writes periodic "heartbeat" status messages about the state of the Tor
-process.
-
-`transports.c`
-: Implements management for the pluggable transports subsystem.
diff --git a/doc/HACKING/design/Makefile b/doc/HACKING/design/Makefile
deleted file mode 100644
index e126130970..0000000000
--- a/doc/HACKING/design/Makefile
+++ /dev/null
@@ -1,34 +0,0 @@
-
-
-
-HTML= \
-	00-overview.html \
-	01-common-utils.html \
-	01a-memory.html \
-	01b-collections.html \
-	01c-time.html \
-	01d-crypto.html \
-	01e-os-compat.html \
-	01f-threads.html \
-	01g-strings.html \
-	02-dataflow.html \
-	03-modules.html \
-	this-not-that.html
-
-PNG = \
-	diagrams/02/02-dataflow.png \
-	diagrams/02/02-connection-types.png
-
-all: generated
-
-generated: $(HTML) $(PNG)
-
-%.html: %.md
-	maruku $< -o $@
-
-%.png: %.dia
-	dia $< --export=$@
-
-clean:
-	rm -f $(HTML)
-	rm -f $(PNG)
diff --git a/doc/HACKING/design/this-not-that.md b/doc/HACKING/design/this-not-that.md
deleted file mode 100644
index 815c7b2fbc..0000000000
--- a/doc/HACKING/design/this-not-that.md
+++ /dev/null
@@ -1,51 +0,0 @@
-
-Don't use memcmp. Use {tor,fast}_{memeq,memneq,memcmp}.
-
-Don't use assert. Use tor_assert or tor_assert_nonfatal or BUG. Prefer
-nonfatal assertions or BUG()s.
-
-Don't use sprintf or snprintf. Use tor_asprintf or tor_snprintf.
-
-Don't write hand-written binary parsers. Use trunnel.
-
-Don't use malloc, realloc, calloc, free, strdup, etc. Use tor_malloc,
-tor_realloc, tor_calloc, tor_free, tor_strdup, etc.
-
-Don't use tor_realloc(x, y\*z). Use tor_reallocarray(x, y, z);
-
-Don't say "if (x) foo_free(x)". Just foo_free(x) and make sure that
-foo_free(NULL) is a no-op.
-
-Don't use toupper or tolower; use TOR_TOUPPER and TOR_TOLOWER.
-
-Don't use isalpha, isalnum, etc. Instead use TOR_ISALPHA, TOR_ISALNUM, etc.
-
-Don't use strcat, strcpy, strncat, or strncpy. Use strlcat and strlcpy
-instead.
-
-Don't use tor_asprintf then smartlist_add; use smartlist_add_asprintf.
-
-Don't use any of these functions: they aren't portable. Use the
-version prefixed with `tor_` instead: strtok_r, memmem, memstr,
-asprintf, localtime_r, gmtime_r, inet_aton, inet_ntop, inet_pton,
-getpass, ntohll, htonll, strdup, (This list is incomplete.)
-
-Don't create or close sockets directly. Instead use the wrappers in
-compat.h.
-
-When creating new APIs, only use 'char \*' to represent 'pointer to a
-nul-terminated string'. Represent 'pointer to a chunk of memory' as
-'uint8_t \*'. (Many older Tor APIs ignore this rule.)
-
-Don't encode/decode u32, u64, or u16 to byte arrays by casting
-pointers. That can crash if the pointers aren't aligned, and can cause
-endianness problems. Instead say something more like set_uint32(ptr,
-htonl(foo)) to encode, and ntohl(get_uint32(ptr)) to decode.
-
-Don't declare a 0-argument function with "void foo()". That's C++
-syntax. In C you say "void foo(void)".
-
-When creating new APIs, use const everywhere you reasonably can.
-
-Sockets should have type tor_socket_t, not int.
-
diff --git a/doc/include.am b/doc/include.am
index a9d3fa1c98..8651f845eb 100644
--- a/doc/include.am
+++ b/doc/include.am
@@ -47,6 +47,7 @@ EXTRA_DIST+= doc/asciidoc-helper.sh \
 	$(html_in) $(man_in) $(txt_in) \
 	doc/state-contents.txt \
 	doc/torrc_format.txt \
+	doc/tor-doxygen.css \
 	doc/TUNING \
 	doc/HACKING/README.1st.md \
 	doc/HACKING/CodingStandards.md \
diff --git a/doc/tor-doxygen.css b/doc/tor-doxygen.css
new file mode 100644
index 0000000000..97cd1886db
--- /dev/null
+++ b/doc/tor-doxygen.css
@@ -0,0 +1,10 @@
+
+p.definition {
+  font-size: small;
+  padding-left: 1.5em;
+}
+
+p.reference {
+  font-size: small;
+  padding-left: 1.5em;
+}
diff --git a/doc/tor.1.txt b/doc/tor.1.txt
index 2b09da6737..ed9efb6fca 100644
--- a/doc/tor.1.txt
+++ b/doc/tor.1.txt
@@ -18,145 +18,174 @@ SYNOPSIS
 DESCRIPTION
 -----------
-Tor is a connection-oriented anonymizing communication
-service. Users choose a source-routed path through a set of nodes, and
-negotiate a "virtual circuit" through the network, in which each node
-knows its predecessor and successor, but no others. Traffic flowing down
-the circuit is unwrapped by a symmetric key at each node, which reveals
-the downstream node. +
-
-Basically, Tor provides a distributed network of servers or relays ("onion routers").
-Users bounce their TCP streams -- web traffic, ftp, ssh, etc. -- around the
-network, and recipients, observers, and even the relays themselves have
-difficulty tracking the source of the stream.
-
-By default, **tor** will act as a client only. To help the network
-by providing bandwidth as a relay, change the **ORPort** configuration
-option -- see below. Please also consult the documentation on the Tor
-Project's website.
+
+Tor is a connection-oriented anonymizing communication service. Users
+choose a source-routed path through a set of nodes, and negotiate a
+"virtual circuit" through the network. Each node in a virtual circuit
+knows its predecessor and successor nodes, but no other nodes. Traffic
+flowing down the circuit is unwrapped by a symmetric key at each node,
+which reveals the downstream node. +
+
+Basically, Tor provides a distributed network of servers or relays
+("onion routers"). Users bounce their TCP streams, including web
+traffic, ftp, ssh, etc., around the network, so that recipients,
+observers, and even the relays themselves have difficulty tracking the
+source of the stream.
+
+[NOTE]
+By default, **tor** acts as a client only. To help the network by
+providing bandwidth as a relay, change the **ORPort** configuration
+option as mentioned below. Please also consult the documentation on
+the Tor Project's website.
 COMMAND-LINE OPTIONS
 --------------------
-[[opt-h]] **-h**, **--help**::
+
+Tor has a powerful command-line interface. This section lists optional
+arguments you can specify at the command line using the **`tor`**
+command.
+ +Configuration options can be specified on the command line in the +format **`--`**_OptionName_ _OptionValue_, on the command line in the +format _OptionName_ _OptionValue_, or in a configuration file. For +instance, you can tell Tor to start listening for SOCKS connections on +port 9999 by passing either **`--SocksPort 9999`** or **`SocksPort +9999`** on the command line, or by specifying **`SocksPort 9999`** in +the configuration file. On the command line, quote option values that +contain spaces. For instance, if you want Tor to log all debugging +messages to **`debug.log`**, you must specify **`--Log "debug file +debug.log"`**. + +NOTE: Configuration options on the command line override those in +configuration files. See **<<conf-format,THE CONFIGURATION FILE +FORMAT>>** for more information. + +The following options in this section are only recognized on the +**`tor`** command line, not in a configuration file. + +[[opt-h]] **`-h`**, **`--help`**:: Display a short help message and exit. -[[opt-f]] **-f** __FILE__:: +[[opt-f]] **`-f`** __FILE__:: Specify a new configuration file to contain further Tor configuration - options OR pass *-* to make Tor read its configuration from standard - input. (Default: @CONFDIR@/torrc, or $HOME/.torrc if that file is not - found) + options, or pass *-* to make Tor read its configuration from standard + input. (Default: **`@CONFDIR@/torrc`**, or **`$HOME/.torrc`** if + that file is not found) -[[opt-allow-missing-torrc]] **--allow-missing-torrc**:: - Do not require that configuration file specified by **-f** exist if - default torrc can be accessed. +[[opt-allow-missing-torrc]] **`--allow-missing-torrc`**:: + Allow the configuration file specified by **`-f`** to be missing, + if the defaults-torrc file (see below) is accessible. -[[opt-defaults-torrc]] **--defaults-torrc** __FILE__:: +[[opt-defaults-torrc]] **`--defaults-torrc`** __FILE__:: Specify a file in which to find default values for Tor options. 
The contents of this file are overridden by those in the regular configuration file, and by those on the command line. (Default: - @CONFDIR@/torrc-defaults.) + **`@CONFDIR@/torrc-defaults`**.) -[[opt-ignore-missing-torrc]] **--ignore-missing-torrc**:: - Specifies that Tor should treat a missing torrc file as though it +[[opt-ignore-missing-torrc]] **`--ignore-missing-torrc`**:: + Specify that Tor should treat a missing torrc file as though it were empty. Ordinarily, Tor does this for missing default torrc files, but not for those specified on the command line. -[[opt-hash-password]] **--hash-password** __PASSWORD__:: - Generates a hashed password for control port access. +[[opt-hash-password]] **`--hash-password`** __PASSWORD__:: + Generate a hashed password for control port access. -[[opt-list-fingerprint]] **--list-fingerprint**:: +[[opt-list-fingerprint]] **`--list-fingerprint`**:: Generate your keys and output your nickname and fingerprint. -[[opt-verify-config]] **--verify-config**:: - Verify the configuration file is valid. +[[opt-verify-config]] **`--verify-config`**:: + Verify whether the configuration file is valid. -[[opt-serviceinstall]] **--service install** [**--options** __command-line options__]:: +[[opt-dump-config]] **`--dump-config`** **`short`**|**`full`**|**`non-builtin`**:: + Write a complete list of Tor's configured options to standard output. + When the `short` flag is selected, only write the options that + are different from their default values. When `non-builtin` is selected, + write options that are not zero or the empty string. + When `full` is selected, write every option. + +[[opt-serviceinstall]] **`--service install`** [**`--options`** __command-line options__]:: Install an instance of Tor as a Windows service, with the provided command-line options. 
Current instructions can be found at https://www.torproject.org/docs/faq#NTService -[[opt-service]] **--service** **remove**|**start**|**stop**:: +[[opt-service]] **`--service`** **`remove`**|**`start`**|**`stop`**:: Remove, start, or stop a configured Tor Windows service. -[[opt-nt-service]] **--nt-service**:: +[[opt-nt-service]] **`--nt-service`**:: Used internally to implement a Windows service. -[[opt-list-torrc-options]] **--list-torrc-options**:: +[[opt-list-torrc-options]] **`--list-torrc-options`**:: List all valid options. -[[opt-list-deprecated-options]] **--list-deprecated-options**:: +[[opt-list-deprecated-options]] **`--list-deprecated-options`**:: List all valid options that are scheduled to become obsolete in a future version. (This is a warning, not a promise.) -[[opt-list-modules]] **--list-modules**:: - For each optional module, list whether or not it has been compiled - into Tor. (Any module not listed is not optional in this version of Tor.) +[[opt-list-modules]] **`--list-modules`**:: + List whether each optional module has been compiled into Tor. + (Any module not listed is not optional in this version of Tor.) -[[opt-version]] **--version**:: +[[opt-version]] **`--version`**:: Display Tor version and exit. The output is a single line of the format "Tor version [version number]." (The version number format is as specified in version-spec.txt.) -[[opt-quiet]] **--quiet**|**--hush**:: - Override the default console log. By default, Tor starts out logging - messages at level "notice" and higher to the console. It stops doing so - after it parses its configuration, if the configuration tells it to log - anywhere else. You can override this behavior with the **--hush** option, - which tells Tor to only send warnings and errors to the console, or with - the **--quiet** option, which tells Tor not to log to the console at all. 
- -[[opt-keygen]] **--keygen** [**--newpass**]:: - Running "tor --keygen" creates a new ed25519 master identity key for a - relay, or only a fresh temporary signing key and certificate, if you - already have a master key. Optionally you can encrypt the master identity - key with a passphrase: Tor will ask you for one. If you don't want to - encrypt the master key, just don't enter any passphrase when asked. + - + - The **--newpass** option should be used with --keygen only when you need - to add, change, or remove a passphrase on an existing ed25519 master - identity key. You will be prompted for the old passphase (if any), - and the new passphrase (if any). + - + - When generating a master key, you will probably want to use - **--DataDirectory** to control where the keys - and certificates will be stored, and **--SigningKeyLifetime** to - control their lifetimes. Their behavior is as documented in the - server options section below. (You must have write access to the specified - DataDirectory.) + - + - To use the generated files, you must copy them to the DataDirectory/keys - directory of your Tor daemon, and make sure that they are owned by the - user actually running the Tor daemon on your system. - -**--passphrase-fd** __FILEDES__:: - Filedescriptor to read the passphrase from. Note that unlike with the +[[opt-quiet]] **`--quiet`**|**`--hush`**:: + Override the default console logging behavior. By default, Tor + starts out logging messages at level "notice" and higher to the + console. It stops doing so after it parses its configuration, if + the configuration tells it to log anywhere else. These options + override the default console logging behavior. Use the + **`--hush`** option if you want Tor to log only warnings and + errors to the console, or use the **`--quiet`** option if you want + Tor not to log to the console at all. 
+ +[[opt-keygen]] **`--keygen`** [**`--newpass`**]:: + Running **`tor --keygen`** creates a new ed25519 master identity key + for a relay, or only a fresh temporary signing key and + certificate, if you already have a master key. Optionally, you + can encrypt the master identity key with a passphrase. When Tor + asks you for a passphrase and you don't want to encrypt the master + key, just don't enter any passphrase when asked. + + + + Use the **`--newpass`** option with **`--keygen`** only when you + need to add, change, or remove a passphrase on an existing ed25519 + master identity key. You will be prompted for the old passphase + (if any), and the new passphrase (if any). ++ +[NOTE] + When generating a master key, you may want to use + **`--DataDirectory`** to control where the keys and certificates + will be stored, and **`--SigningKeyLifetime`** to control their + lifetimes. See the server options section to learn more about the + behavior of these options. You must have write access to the + specified DataDirectory. ++ +[normal] + To use the generated files, you must copy them to the + __DataDirectory__/**`keys`** directory of your Tor daemon, and + make sure that they are owned by the user actually running the Tor + daemon on your system. + +**`--passphrase-fd`** __FILEDES__:: + File descriptor to read the passphrase from. Note that unlike with the tor-gencert program, the entire file contents are read and used as the passphrase, including any trailing newlines. - Default: read from the terminal. + If the file descriptor is not specified, the passphrase is read + from the terminal by default. -[[opt-key-expiration]] **--key-expiration** [**purpose**]:: - The **purpose** specifies which type of key certificate to determine - the expiration of. The only currently recognised **purpose** is +[[opt-key-expiration]] **`--key-expiration`** [__purpose__]:: + The __purpose__ specifies which type of key certificate to determine + the expiration of. 
The only currently recognised __purpose__ is "sign". + + - Running "tor --key-expiration sign" will attempt to find your signing - key certificate and will output, both in the logs as well as to stdout, - the signing key certificate's expiration time in ISO-8601 format. - For example, the output sent to stdout will be of the form: - "signing-cert-expiry: 2017-07-25 08:30:15 UTC" - -Other options can be specified on the command-line in the format "--option -value", in the format "option value", or in a configuration file. For -instance, you can tell Tor to start listening for SOCKS connections on port -9999 by passing --SocksPort 9999 or SocksPort 9999 to it on the command line, -or by putting "SocksPort 9999" in the configuration file. You will need to -quote options with spaces in them: if you want Tor to log all debugging -messages to debug.log, you will probably need to say **--Log** `"debug file -debug.log"`. - -Options on the command line override those in configuration files. See the -next section for more information. + Running **`tor --key-expiration sign`** will attempt to find your + signing key certificate and will output, both in the logs as well + as to stdout, the signing key certificate's expiration time in + ISO-8601 format. For example, the output sent to stdout will be + of the form: "signing-cert-expiry: 2017-07-25 08:30:15 UTC" +[[conf-format]] THE CONFIGURATION FILE FORMAT ----------------------------- @@ -211,10 +240,14 @@ GENERAL OPTIONS Note that this option, and other bandwidth-limiting options, apply to TCP data only: They do not count TCP headers or DNS traffic. + + + Tor uses powers of two, not powers of ten, so 1 GByte is + 1024*1024*1024 bytes as opposed to 1 billion bytes. + + + With this option, and in other options that take arguments in bytes, KBytes, and so on, other formats are also supported. 
Notably, "KBytes" can also be written as "kilobytes" or "kb"; "MBytes" can be written as "megabytes" or "MB"; "kbits" can be written as "kilobits"; and so forth. + Case doesn't matter. Tor also accepts "byte" and "bit" in the singular. The prefixes "tera" and "T" are also recognized. If no units are given, we default to bytes. @@ -269,27 +302,28 @@ GENERAL OPTIONS client launches the pluggable transport proxy executable in __path-to-binary__ using __options__ as its command-line options, and forwards its traffic to it. It's the duty of that proxy to properly forward - the traffic to the bridge. + the traffic to the bridge. (Default: none) [[ServerTransportPlugin]] **ServerTransportPlugin** __transport__ exec __path-to-binary__ [options]:: The Tor relay launches the pluggable transport proxy in __path-to-binary__ using __options__ as its command-line options, and expects to receive - proxied client traffic from it. + proxied client traffic from it. (Default: none) [[ServerTransportListenAddr]] **ServerTransportListenAddr** __transport__ __IP__:__PORT__:: When this option is set, Tor will suggest __IP__:__PORT__ as the listening address of any pluggable transport proxy that tries to launch __transport__. (IPv4 addresses should written as-is; IPv6 - addresses should be wrapped in square brackets.) + addresses should be wrapped in square brackets.) (Default: none) [[ServerTransportOptions]] **ServerTransportOptions** __transport__ __k=v__ __k=v__ ...:: When this option is set, Tor will pass the __k=v__ parameters to any pluggable transport proxy that tries to launch __transport__. + - (Example: ServerTransportOptions obfs45 shared-secret=bridgepasswd cache=/var/lib/tor/cache) + (Example: ServerTransportOptions obfs45 shared-secret=bridgepasswd cache=/var/lib/tor/cache) (Default: none) [[ExtORPort]] **ExtORPort** \['address':]__port__|**auto**:: Open this port to listen for Extended ORPort connections from your - pluggable transports. + pluggable transports. 
+ + (Default: **DataDirectory**/extended_orport_auth_cookie) [[ExtORPortCookieAuthFile]] **ExtORPortCookieAuthFile** __Path__:: If set, this option overrides the default location and file name @@ -816,6 +850,9 @@ GENERAL OPTIONS engine of this name. This must be used for any dynamic hardware engine. Names can be verified with the openssl engine command. Can not be changed while tor is running. + + + If the engine name is prefixed with a "!", then Tor will exit if the + engine cannot be loaded. [[AccelDir]] **AccelDir** __DIR__:: Specify this option if using dynamic hardware acceleration and the engine @@ -1990,10 +2027,6 @@ is non-zero): would like its bridge address to be given out. Set it to "none" if you want BridgeDB to avoid distributing your bridge address, or "any" to let BridgeDB decide. (Default: any) - + - Note: as of Oct 2017, the BridgeDB part of this option is not yet - implemented. Until BridgeDB is updated to obey this option, your - bridge will make this request, but it will not (yet) be obeyed. [[ContactInfo]] **ContactInfo** __email_address__:: Administrative contact information for this relay or bridge. This line @@ -2330,9 +2363,9 @@ is non-zero): using a given calculation rule (see: AccountingStart, AccountingRule). Useful if you need to stay under a specific bandwidth. By default, the number used for calculation is the max of either the bytes sent or - received. For example, with AccountingMax set to 1 GByte, a server - could send 900 MBytes and receive 800 MBytes and continue running. - It will only hibernate once one of the two reaches 1 GByte. This can + received. For example, with AccountingMax set to 1 TByte, a server + could send 900 GBytes and receive 800 GBytes and continue running. + It will only hibernate once one of the two reaches 1 TByte. This can be changed to use the sum of the both bytes received and sent by setting the AccountingRule option to "sum" (total bandwidth in/out). 
When the number of bytes remaining gets low, Tor will stop accepting new connections @@ -2343,7 +2376,12 @@ is non-zero): enabling hibernation is preferable to setting a low bandwidth, since it provides users with a collection of fast servers that are up some of the time, which is more useful than a set of slow servers that are - always "available". + always "available". + + + + Note that (as also described in the Bandwidth section) Tor uses + powers of two, not powers of ten: 1 GByte is 1024*1024*1024, not + one billion. Be careful: some internet service providers might count + GBytes differently. [[AccountingRule]] **AccountingRule** **sum**|**max**|**in**|**out**:: How we determine when our AccountingMax has been reached (when we @@ -3429,256 +3467,248 @@ Tor catches the following signals: FILES ----- -**@CONFDIR@/torrc**:: - The configuration file, which contains "option value" pairs. +**`@CONFDIR@/torrc`**:: + Default location of the configuration file. -**$HOME/.torrc**:: +**`$HOME/.torrc`**:: Fallback location for torrc, if @CONFDIR@/torrc is not found. -**@LOCALSTATEDIR@/lib/tor/**:: +**`@LOCALSTATEDIR@/lib/tor/`**:: The tor process stores keys and other data here. +__CacheDirectory__/**`cached-certs`**:: + Contains downloaded directory key certificates that are used to verify + authenticity of documents generated by the Tor directory authorities. -__CacheDirectory__**/cached-certs**:: - This file holds downloaded directory key certificates that are used to - verify authenticity of documents generated by Tor directory authorities. - -__CacheDirectory__**/cached-consensus** and/or **cached-microdesc-consensus**:: +__CacheDirectory__/**`cached-consensus`** and/or **`cached-microdesc-consensus`**:: The most recent consensus network status document we've downloaded. -__CacheDirectory__**/cached-descriptors** and **cached-descriptors.new**:: - These files hold downloaded router statuses. 
Some routers may appear more - than once; if so, the most recently published descriptor is used. Lines - beginning with @-signs are annotations that contain more information about - a given router. The ".new" file is an append-only journal; when it gets - too large, all entries are merged into a new cached-descriptors file. - -__CacheDirectory__**/cached-extrainfo** and **cached-extrainfo.new**:: - As "cached-descriptors", but holds optionally-downloaded "extra-info" - documents. Relays use these documents to send inessential information - about statistics, bandwidth history, and network health to the - authorities. They aren't fetched by default; see the DownloadExtraInfo - option for more info. - -__CacheDirectory__**/cached-microdescs** and **cached-microdescs.new**:: +__CacheDirectory__/**`cached-descriptors`** and **`cached-descriptors.new`**:: + These files contain the downloaded router statuses. Some routers may appear + more than once; if so, the most recently published descriptor is + used. Lines beginning with **`@`**-signs are annotations that contain more + information about a given router. The **`.new`** file is an append-only + journal; when it gets too large, all entries are merged into a new + cached-descriptors file. + +__CacheDirectory__/**`cached-extrainfo`** and **`cached-extrainfo.new`**:: + Similar to **cached-descriptors**, but holds optionally-downloaded + "extra-info" documents. Relays use these documents to send inessential + information about statistics, bandwidth history, and network health to the + authorities. They aren't fetched by default. See the DownloadExtraInfo + option for more information. + +__CacheDirectory__/**`cached-microdescs`** and **`cached-microdescs.new`**:: These files hold downloaded microdescriptors. Lines beginning with - @-signs are annotations that contain more information about a given - router. 
The ".new" file is an append-only journal; when it gets too + **`@`**-signs are annotations that contain more information about a given + router. The **`.new`** file is an append-only journal; when it gets too large, all entries are merged into a new cached-microdescs file. -__DataDirectory__**/state**:: - A set of persistent key-value mappings. These are documented in - the file. These include: - - The current entry guards and their status. - - The current bandwidth accounting values. - - When the file was last written - - What version of Tor generated the state file - - A short history of bandwidth usage, as produced in the server - descriptors. - -__DataDirectory__**/sr-state**:: - Authority only. State file used to record information about the current +__DataDirectory__/**`state`**:: + Contains a set of persistent key-value mappings. These include: + - the current entry guards and their status. + - the current bandwidth accounting values. + - when the file was last written + - what version of Tor generated the state file + - a short history of bandwidth usage, as produced in the server + descriptors. + +__DataDirectory__/**`sr-state`**:: + _Authority only_. This file is used to record information about the current status of the shared-random-value voting state. -__CacheDirectory__**/diff-cache**:: - Directory cache only. Holds older consensuses, and diffs from older - consensuses to the most recent consensus of each type, compressed - in various ways. Each file contains a set of key-value arguments - describing its contents, followed by a single NUL byte, followed by the - main file contents. - -__DataDirectory__**/bw_accounting**:: - Used to track bandwidth accounting values (when the current period starts - and ends; how much has been read and written so far this period). This file - is obsolete, and the data is now stored in the \'state' file instead. - -__DataDirectory__**/control_auth_cookie**:: - Used for cookie authentication with the controller. 
Location can be - overridden by the CookieAuthFile config option. Regenerated on startup. See +__CacheDirectory__/**`diff-cache`**:: + _Directory cache only_. Holds older consensuses and diffs from oldest to + the most recent consensus of each type compressed in various ways. Each + file contains a set of key-value arguments describing its contents, + followed by a single NUL byte, followed by the main file contents. + +__DataDirectory__/**`bw_accounting`**:: + This file is obsolete and the data is now stored in the **`state`** file + instead. Used to track bandwidth accounting values (when the current period + starts and ends; how much has been read and written so far this period). + +__DataDirectory__/**`control_auth_cookie`**:: + This file can be used only when cookie authentication is enabled. Used for + cookie authentication with the controller. Location can be overridden by + the `CookieAuthFile` configuration option. Regenerated on startup. See control-spec.txt in https://spec.torproject.org/[torspec] for details. - Only used when cookie authentication is enabled. -__DataDirectory__**/lock**:: - This file is used to prevent two Tor instances from using same data - directory. If access to this file is locked, data directory is already - in use by Tor. +__DataDirectory__/**`lock`**:: + This file is used to prevent two Tor instances from using the same data + directory. If access to this file is locked, data directory is already in + use by Tor. -__DataDirectory__**/key-pinning-journal**:: +__DataDirectory__/**`key-pinning-journal`**:: Used by authorities. A line-based file that records mappings between - RSA1024 identity keys and Ed25519 identity keys. Authorities enforce - these mappings, so that once a relay has picked an Ed25519 key, stealing - or factoring the RSA1024 key will no longer let an attacker impersonate - the relay. + RSA1024 and Ed25519 identity keys. 
Authorities enforce these mappings, so + that once a relay has picked an Ed25519 key, stealing or factoring the + RSA1024 key will no longer let an attacker impersonate the relay. -__KeyDirectory__**/authority_identity_key**:: +__KeyDirectory__/**`authority_identity_key`**:: A v3 directory authority's master identity key, used to authenticate its signing key. Tor doesn't use this while it's running. The tor-gencert - program uses this. If you're running an authority, you should keep this - key offline, and not actually put it here. + program uses this. If you're running an authority, you should keep this key + offline, and not put it in this file. -__KeyDirectory__**/authority_certificate**:: - A v3 directory authority's certificate, which authenticates the authority's - current vote- and consensus-signing key using its master identity key. - Only directory authorities use this file. +__KeyDirectory__/**`authority_certificate`**:: + Only directory authorities use this file. A v3 directory authority's + certificate which authenticates the authority's current vote- and + consensus-signing key using its master identity key. -__KeyDirectory__**/authority_signing_key**:: - A v3 directory authority's signing key, used to sign votes and consensuses. - Only directory authorities use this file. Corresponds to the +__KeyDirectory__/**`authority_signing_key`**:: + Only directory authorities use this file. A v3 directory authority's + signing key that is used to sign votes and consensuses. Corresponds to the **authority_certificate** cert. -__KeyDirectory__**/legacy_certificate**:: - As authority_certificate: used only when V3AuthUseLegacyKey is set. - See documentation for V3AuthUseLegacyKey. +__KeyDirectory__/**`legacy_certificate`**:: + As authority_certificate; used only when `V3AuthUseLegacyKey` is set. See + documentation for V3AuthUseLegacyKey. -__KeyDirectory__**/legacy_signing_key**:: - As authority_signing_key: used only when V3AuthUseLegacyKey is set. 
- See documentation for V3AuthUseLegacyKey. +__KeyDirectory__/**`legacy_signing_key`**:: + As authority_signing_key: used only when `V3AuthUseLegacyKey` is set. See + documentation for V3AuthUseLegacyKey. -__KeyDirectory__**/secret_id_key**:: +__KeyDirectory__/**`secret_id_key`**:: A relay's RSA1024 permanent identity key, including private and public - components. Used to sign router descriptors, and to sign other keys. + components. Used to sign router descriptors, and to sign other keys. -__KeyDirectory__**/ed25519_master_id_public_key**:: +__KeyDirectory__/**`ed25519_master_id_public_key`**:: The public part of a relay's Ed25519 permanent identity key. -__KeyDirectory__**/ed25519_master_id_secret_key**:: - The private part of a relay's Ed25519 permanent identity key. This key - is used to sign the medium-term ed25519 signing key. This file can be - kept offline, or kept encrypted. If so, Tor will not be able to generate - new signing keys itself; you'll need to use tor --keygen yourself to do - so. +__KeyDirectory__/**`ed25519_master_id_secret_key`**:: + The private part of a relay's Ed25519 permanent identity key. This key is + used to sign the medium-term ed25519 signing key. This file can be kept + offline or encrypted. If so, Tor will not be able to generate new signing + keys automatically; you'll need to use `tor --keygen` to do so. -__KeyDirectory__**/ed25519_signing_secret_key**:: +__KeyDirectory__/**`ed25519_signing_secret_key`**:: The private and public components of a relay's medium-term Ed25519 signing - key. This key is authenticated by the Ed25519 master key, in turn + key. This key is authenticated by the Ed25519 master key, which in turn authenticates other keys (and router descriptors). -__KeyDirectory__**/ed25519_signing_cert**:: - The certificate which authenticates "ed25519_signing_secret_key" as - having been signed by the Ed25519 master key. 
+__KeyDirectory__/**`ed25519_signing_cert`**:: + The certificate which authenticates "ed25519_signing_secret_key" as having + been signed by the Ed25519 master key. -__KeyDirectory__**/secret_onion_key** and **secret_onion_key.old**:: +__KeyDirectory__/**`secret_onion_key`** and **`secret_onion_key.old`**:: A relay's RSA1024 short-term onion key. Used to decrypt old-style ("TAP") - circuit extension requests. The ".old" file holds the previously - generated key, which the relay uses to handle any requests that were - made by clients that didn't have the new one. + circuit extension requests. The **`.old`** file holds the previously + generated key, which the relay uses to handle any requests that were made + by clients that didn't have the new one. -__KeyDirectory__**/secret_onion_key_ntor** and **secret_onion_key_ntor.old**:: +__KeyDirectory__/**`secret_onion_key_ntor`** and **`secret_onion_key_ntor.old`**:: A relay's Curve25519 short-term onion key. Used to handle modern ("ntor") - circuit extension requests. The ".old" file holds the previously - generated key, which the relay uses to handle any requests that were - made by clients that didn't have the new one. + circuit extension requests. The **`.old`** file holds the previously + generated key, which the relay uses to handle any requests that were made + by clients that didn't have the new one. -__DataDirectory__**/fingerprint**:: - Only used by servers. Holds the fingerprint of the server's identity key. +__DataDirectory__/**`fingerprint`**:: + Only used by servers. Contains the fingerprint of the server's identity key. -__DataDirectory__**/hashed-fingerprint**:: - Only used by bridges. Holds the hashed fingerprint of the bridge's +__DataDirectory__/**`hashed-fingerprint`**:: + Only used by bridges. Contains the hashed fingerprint of the bridge's identity key. (That is, the hash of the hash of the identity key.) -__DataDirectory__**/approved-routers**:: - Only used by authoritative directory servers. 
This file lists - the status of routers by their identity fingerprint. - Each line lists a status and a fingerprint separated by - whitespace. See your **fingerprint** file in the __DataDirectory__ for an - example line. If the status is **!reject** then descriptors from the - given identity (fingerprint) are rejected by this server. If it is - **!invalid** then descriptors are accepted but marked in the directory as - not valid, that is, not recommended. - -__DataDirectory__**/v3-status-votes**:: - Only for v3 authoritative directory servers. This file contains - status votes from all the authoritative directory servers. - -__CacheDirectory__**/unverified-consensus**:: - This file contains a network consensus document that has been downloaded, - but which we didn't have the right certificates to check yet. - -__CacheDirectory__**/unverified-microdesc-consensus**:: - This file contains a microdescriptor-flavored network consensus document - that has been downloaded, but which we didn't have the right certificates - to check yet. - -__DataDirectory__**/unparseable-desc**:: +__DataDirectory__/**`approved-routers`**:: + Only used by authoritative directory servers. This file lists the status of + routers by their identity fingerprint. Each line lists a status and a + fingerprint separated by whitespace. See your **`fingerprint`** file in the + __DataDirectory__ for an example line. If the status is **!reject**, then + the descriptors from the given identity (fingerprint) are rejected by this + server. If it is **!invalid**, then the descriptors are accepted but marked + in the directory as not valid, that is, not recommended. + +__DataDirectory__/**`v3-status-votes`**:: + Only for v3 authoritative directory servers. This file contains status + votes from all the authoritative directory servers. 
+ +__CacheDirectory__/**`unverified-consensus`**:: + Contains a network consensus document that has been downloaded, but which + we didn't have the right certificates to check yet. + +__CacheDirectory__/**`unverified-microdesc-consensus`**:: + Contains a microdescriptor-flavored network consensus document that has + been downloaded, but which we didn't have the right certificates to check + yet. + +__DataDirectory__/**`unparseable-desc`**:: Onion server descriptors that Tor was unable to parse are dumped to this file. Only used for debugging. -__DataDirectory__**/router-stability**:: +__DataDirectory__/**`router-stability`**:: Only used by authoritative directory servers. Tracks measurements for - router mean-time-between-failures so that authorities have a good idea of + router mean-time-between-failures so that authorities have a fair idea of how to set their Stable flags. -__DataDirectory__**/stats/dirreq-stats**:: +__DataDirectory__/**`stats/dirreq-stats`**:: Only used by directory caches and authorities. This file is used to collect directory request statistics. -__DataDirectory__**/stats/entry-stats**:: +__DataDirectory__/**`stats/entry-stats`**:: Only used by servers. This file is used to collect incoming connection statistics by Tor entry nodes. -__DataDirectory__**/stats/bridge-stats**:: +__DataDirectory__/**`stats/bridge-stats`**:: Only used by servers. This file is used to collect incoming connection statistics by Tor bridges. -__DataDirectory__**/stats/exit-stats**:: +__DataDirectory__/**`stats/exit-stats`**:: Only used by servers. This file is used to collect outgoing connection statistics by Tor exit routers. -__DataDirectory__**/stats/buffer-stats**:: +__DataDirectory__/**`stats/buffer-stats`**:: Only used by servers. This file is used to collect buffer usage history. -__DataDirectory__**/stats/conn-stats**:: +__DataDirectory__/**`stats/conn-stats`**:: Only used by servers. 
This file is used to collect approximate connection history (number of active connections over time). -__DataDirectory__**/stats/hidserv-stats**:: +__DataDirectory__/**`stats/hidserv-stats`**:: Only used by servers. This file is used to collect approximate counts of what fraction of the traffic is hidden service rendezvous traffic, and approximately how many hidden services the relay has seen. -__DataDirectory__**/networkstatus-bridges**:: +__DataDirectory__/**networkstatus-bridges`**:: Only used by authoritative bridge directories. Contains information about bridges that have self-reported themselves to the bridge authority. -__DataDirectory__**/approved-routers**:: - Authorities only. This file is used to configure which relays are - known to be valid, invalid, and so forth. - -__HiddenServiceDirectory__**/hostname**:: +__HiddenServiceDirectory__/**`hostname`**:: The <base32-encoded-fingerprint>.onion domain name for this hidden service. If the hidden service is restricted to authorized clients only, this file also contains authorization data for all clients. - + - Note that clients will ignore any extra subdomains prepended to a hidden - service hostname. So if you have "xyz.onion" as your hostname, you - can tell clients to connect to "www.xyz.onion" or "irc.xyz.onion" ++ +[NOTE] + The clients will ignore any extra subdomains prepended to a hidden + service hostname. Supposing you have "xyz.onion" as your hostname, you + can ask your clients to connect to "www.xyz.onion" or "irc.xyz.onion" for virtual-hosting purposes. -__HiddenServiceDirectory__**/private_key**:: - The private key for this hidden service. +__HiddenServiceDirectory__/**`private_key`**:: + Contains the private key for this hidden service. -__HiddenServiceDirectory__**/client_keys**:: - Authorization data for a hidden service that is only accessible by +__HiddenServiceDirectory__/**`client_keys`**:: + Contains authorization data for a hidden service that is only accessible by authorized clients. 
-__HiddenServiceDirectory__**/onion_service_non_anonymous**:: +__HiddenServiceDirectory__/**`onion_service_non_anonymous`**:: This file is present if a hidden service key was created in **HiddenServiceNonAnonymousMode**. SEE ALSO -------- -**torsocks**(1), **torify**(1) + - -**https://www.torproject.org/** -**torspec: https://spec.torproject.org ** +For more information, refer to the Tor Project website at +https://www.torproject.org/ and the Tor specifications at +https://spec.torproject.org. See also **torsocks**(1) and **torify**(1). BUGS ---- -Plenty, probably. Tor is still in development. Please report them at https://trac.torproject.org/. +Because Tor is still under development, there may be plenty of bugs. Please +report them at https://trac.torproject.org/. AUTHORS -------