diff options
-rw-r--r-- | changes/ticket31849 | 5 | ||||
-rw-r--r-- | doc/HACKING/design/00-overview.md | 124 | ||||
-rw-r--r-- | doc/HACKING/design/01-common-utils.md | 121 | ||||
-rw-r--r-- | doc/HACKING/design/01a-memory.md | 93 | ||||
-rw-r--r-- | doc/HACKING/design/01b-collections.md | 43 | ||||
-rw-r--r-- | doc/HACKING/design/01c-time.md | 75 | ||||
-rw-r--r-- | doc/HACKING/design/01d-crypto.md | 169 | ||||
-rw-r--r-- | doc/HACKING/design/01e-os-compat.md | 50 | ||||
-rw-r--r-- | doc/HACKING/design/01f-threads.md | 26 | ||||
-rw-r--r-- | doc/HACKING/design/01g-strings.md | 95 | ||||
-rw-r--r-- | doc/HACKING/design/02-dataflow.md | 236 | ||||
-rw-r--r-- | doc/HACKING/design/03-modules.md | 247 | ||||
-rw-r--r-- | doc/HACKING/design/Makefile | 34 | ||||
-rw-r--r-- | doc/HACKING/design/this-not-that.md | 51 |
14 files changed, 1369 insertions, 0 deletions
diff --git a/changes/ticket31849 b/changes/ticket31849 new file mode 100644 index 0000000000..9d12d938c4 --- /dev/null +++ b/changes/ticket31849 @@ -0,0 +1,5 @@ + o Documentation: + - The Tor source code repository now includes a (somewhat dated) + description of Tor's modular architecture, in doc/HACKING/design. + This is based on the old "tor-guts.git" repository, which we are + adopting and superseding. Closes ticket 31849. diff --git a/doc/HACKING/design/00-overview.md b/doc/HACKING/design/00-overview.md new file mode 100644 index 0000000000..2103a9062a --- /dev/null +++ b/doc/HACKING/design/00-overview.md @@ -0,0 +1,124 @@ + +## Overview ## + +This document describes the general structure of the Tor codebase, how +it fits together, what functionality is available for extending Tor, +and gives some notes on how Tor got that way. + +Tor remains a work in progress: We've been working on it for more than a +decade, and we've learned a lot about good coding since we first +started. This means, however, that some of the older pieces of Tor will +have some "code smell" in them that could sure stand a brisk +refactoring. So when I describe a piece of code, I'll sometimes give a +note on how it got that way, and whether I still think that's a good +idea. + +The first drafts of this document were written in the Summer and Fall of +2015, when Tor 0.2.6 was the most recent stable version, and Tor 0.2.7 +was under development. If you're reading this far in the future, some +things may have changed. Caveat haxxor! + +This document is not an overview of the Tor protocol. For that, see the +design paper and the specifications at https://spec.torproject.org/ . + +For more information about Tor's coding standards and some helpful +development tools, see doc/HACKING in the Tor repository. + +For more information about writing tests, see doc/HACKING/WritingTests.txt +in the Tor repository. + +### The very high level ### + +Ultimately, Tor runs as an event-driven network daemon: it responds to +network events, signals, and timers by sending and receiving things over +the network. Clients, relays, and directory authorities all use the +same codebase: the Tor process will run as a client, relay, or authority +depending on its configuration. + +Tor has a few major dependencies, including Libevent (used to tell which +sockets are readable and writable), OpenSSL (used for many encryption +functions, and to implement the TLS protocol), and zlib (used to +compress and uncompress directory information). + +Most of Tor's work today is done in a single event-driven main thread. +Tor also spawns one or more worker threads to handle CPU-intensive +tasks. (Right now, this only includes circuit encryption.) + +On startup, Tor initializes its libraries, reads and responds to its +configuration files, and launches a main event loop. At first, the only +events that Tor listens for are a few signals (like TERM and HUP), and +one or more listener sockets (for different kinds of incoming +connections). Tor also configures a timer function to run once per +second to handle periodic events. As Tor runs over time, other events +will open, and new events will be scheduled. + +The codebase is divided into a few main subdirectories: + + src/common -- utility functions, not necessarily tor-specific. + + src/or -- implements the Tor protocols. + + src/test -- unit and regression tests + + src/ext -- Code maintained elsewhere that we include in the Tor + source distribution. + + src/trunnel -- automatically generated code (from the Trunnel) + tool: used to parse and encode binary formats. + +### Some key high-level abstractions ### + +The most important abstractions at Tor's high-level are Connections, +Channels, Circuits, and Nodes. + +A 'Connection' represents a stream-based information flow. Most +connections are TCP connections to remote Tor servers and clients. (But +as a shortcut, a relay will sometimes make a connection to itself +without actually using a TCP connection. More details later on.) +Connections exist in different varieties, depending on what +functionality they provide. The principle types of connection are +"edge" (eg a socks connection or a connection from an exit relay to a +destination), "OR" (a TLS stream connecting to a relay), "Directory" (an +HTTP connection to learn about the network), and "Control" (a connection +from a controller). + +A 'Circuit' is persistent tunnel through the Tor network, established +with public-key cryptography, and used to send cells one or more hops. +Clients keep track of multi-hop circuits, and the cryptography +associated with each hop. Relays, on the other hand, keep track only of +their hop of each circuit. + +A 'Channel' is an abstract view of sending cells to and from a Tor +relay. Currently, all channels are implemented using OR connections. +If we switch to other strategies in the future, we'll have more +connection types. + +A 'Node' is a view of a Tor instance's current knowledge and opinions +about a Tor relay orbridge. + +### The rest of this document. ### + +> **Note**: This section describes the eventual organization of this +> document, which is not yet complete. + +We'll begin with an overview of the various utility functions available +in Tor's 'common' directory. Knowing about these is key to writing +portable, simple code in Tor. + +Then we'll go on and talk about the main data-flow of the Tor network: +how Tor generates and responds to network traffic. This will occupy a +chapter for the main overview, with other chapters for special topics. + +After that, we'll mention the main modules in Tor, and describe the +function of each. + +We'll cover the directory subsystem next: how Tor learns about other +relays, and how relays advertise themselves. + +Then we'll cover a few specialized modules, such as hidden services, +sandboxing, hibernation, accounting, statistics, guards, path +generation, pluggable transports, and how they integrate with the rest of Tor. + +We'll close with a meandering overview of important pending issues in +the Tor codebase, and how they affect the future of the Tor software. + diff --git a/doc/HACKING/design/01-common-utils.md b/doc/HACKING/design/01-common-utils.md new file mode 100644 index 0000000000..79a6a7b7d3 --- /dev/null +++ b/doc/HACKING/design/01-common-utils.md @@ -0,0 +1,121 @@ + +## Utility code in Tor + +Most of Tor's utility code is in modules in the src/common subdirectory. + +These are divided, broadly, into _compatibility_ functions, _utility_ +functions, _containers_, and _cryptography_. (Someday in the future, it +would be great to split these modules into separate directories. Also, some +functions are probably put in the wrong modules) + +### Compatibility code + +These functions live in src/common/compat\*.c; some corresponding macros live +in src/common/compat\*.h. They serve as wrappers around platform-specific or +compiler-specific logic functionality. + +In general, the rest of the Tor code *should not* be calling platform-specific +or otherwise non-portable functions. Instead, they should call wrappers from +compat.c, which implement a common cross-platform API. (If you don't know +whether a function is portable, it's usually good enough to see whether it +exists on OSX, Linux, and Windows.) + +Other compatibility modules include backtrace.c, which generates stack traces +for crash reporting; sandbox.c, which implements the Linux seccomp2 sandbox; +and procmon.c, which handles monitoring a child process. + +Parts of address.c are compatibility code for handling network addressing +issues; other parts are in util.c. + +Notable compatibility areas are: + + * mmap support for mapping files into the address space (read-only) + + * Code to work around the intricacies + + * Workaround code for Windows's horrible winsock incompatibilities and + Linux's intricate socket extensions. + + * Helpful string functions like memmem, memstr, asprintf, strlcpy, and + strlcat that not all platforms have. + + * Locale-ignoring variants of the ctypes functions. + + * Time-manipulation functions + + * File locking function + + * IPv6 functions for platforms that don't have enough IPv6 support + + * Endianness functions + + * OS functions + + * Threading and locking functions. + +=== Utility functions + +General-purpose utilities are in util.c; they include higher-level wrappers +around many of the compatibility functions to provide things like +file-at-once access, memory management functions, math, string manipulation, +time manipulation, filesystem manipulation, etc. + +(Some functionality, like daemon-launching, would be better off in a +compatibility module.) + +In util_format.c, we have code to implement stuff like base-32 and base-64 +encoding. + +The address.c module interfaces with the system resolver and implements +address parsing and formatting functions. It converts sockaddrs to and from +a more compact tor_addr_t type. + +The di_ops.c module provides constant-time comparison and associative-array +operations, for side-channel avoidance. + +The logging subsystem in log.c supports logging to files, to controllers, to +stdout/stderr, or to the system log. + +The abstraction in memarea.c is used in cases when a large amount of +temporary objects need to be allocated, and they can all be freed at the same +time. + +The torgzip.c module wraps the zlib library to implement compression. + +Workqueue.c provides a simple multithreaded work-queue implementation. + +### Containers + +The container.c module defines these container types, used throughout the Tor +codebase. + +There is a dynamic array called **smartlist**, used as our general resizeable +array type. It supports sorting, searching, common set operations, and so +on. It has specialized functions for smartlists of strings, and for +heap-based priority queues. + +There's a bit-array type. + +A set of mapping types to map strings, 160-bit digests, and 256-bit digests +to void \*. These are what we generally use when we want O(1) lookup. + +Additionally, for containers, we use the ht.h and tor_queue.h headers, in +src/ext. These provide intrusive hashtable and linked-list macros. + +### Cryptography + +Once, we tried to keep our cryptography code in a single "crypto.c" file, +with an "aes.c" module containing an AES implementation for use with older +OpenSSLs. + +Now, our practice has become to introduce crypto_\*.c modules when adding new +cryptography backend code. We have modules for Ed25519, Curve25519, +secret-to-key algorithms, and password-based boxed encryption. + +Our various TLS compatibility code, wrappers, and hacks are kept in +tortls.c, which is probably too full of Tor-specific kludges. I'm +hoping we can eliminate most of those kludges when we finally remove +support for older versions of our TLS handshake. + + + diff --git a/doc/HACKING/design/01a-memory.md b/doc/HACKING/design/01a-memory.md new file mode 100644 index 0000000000..9a20782962 --- /dev/null +++ b/doc/HACKING/design/01a-memory.md @@ -0,0 +1,93 @@ + +## Memory management + +### Heap-allocation functions + +Tor imposes a few light wrappers over C's native malloc and free +functions, to improve convenience, and to allow wholescale replacement +of malloc and free as needed. + +You should never use 'malloc', 'calloc', 'realloc, or 'free' on their +own; always use the variants prefixed with 'tor_'. +They are the same as the standard C functions, with the following +exceptions: + + * tor_free(NULL) is a no-op. + * tor_free() is a macro that takes an lvalue as an argument and sets it to + NULL after freeing it. To avoid this behavior, you can use tor_free_() + instead. + * tor_malloc() and friends fail with an assertion if they are asked to + allocate a value so large that it is probably an underflow. + * It is always safe to tor_malloc(0), regardless of whether your libc + allows it. + * tor_malloc(), tor_realloc(), and friends are never allowed to fail. + Instead, Tor will die with an assertion. This means that you never + need to check their return values. See the next subsection for + information on why we think this is a good idea. + +We define additional general-purpose memory allocation functions as well: + + * tor_malloc_zero(x) behaves as calloc(1, x), except the it makes clear + the intent to allocate a single zeroed-out value. + * tor_reallocarray(x,y) behaves as the OpenBSD reallocarray function. + Use it for cases when you need to realloc() in a multiplication-safe + way. + +And specific-purpose functions as well: + + * tor_strdup() and tor_strndup() behaves as the underlying libc functions, + but use tor_malloc() instead of the underlying function. + * tor_memdup() copies a chunk of memory of a given size. + * tor_memdup_nulterm() copies a chunk of memory of a given size, then + NUL-terminates it just to be safe. + +#### Why assert on failure? + +Why don't we allow tor_malloc() and its allies to return NULL? + +First, it's error-prone. Many programmers forget to check for NULL return +values, and testing for malloc() failures is a major pain. + +Second, it's not necessarily a great way to handle OOM conditions. It's +probably better (we think) to have a memory target where we dynamically free +things ahead of time in order to stay under the target. Trying to respond to +an OOM at the point of tor_malloc() failure, on the other hand, would involve +a rare operation invoked from deep in the call stack. (Again, that's +error-prone and hard to debug.) + +Third, thanks to the rise of Linux and other operating systems that allow +memory to be overcommitted, you can't actually ever rely on getting a NULL +from malloc() when you're out of memory; instead you have to use an approach +closer to tracking the total memory usage. + +#### Conventions for your own allocation functions. + +Whenever you create a new type, the convention is to give it a pair of +x_new() and x_free() functions, named after the type. + +Calling x_free(NULL) should always be a no-op. + + +### Grow-only memory allocation: memarea.c + +It's often handy to allocate a large number of tiny objects, all of which +need to disappear at the same time. You can do this in tor using the +memarea.c abstraction, which uses a set of grow-only buffers for allocation, +and only supports a single "free" operation at the end. + +Using memareas also helps you avoid memory fragmentation. You see, some libc +malloc implementations perform badly on the case where a large number of +small temporary objects are allocated at the same time as a few long-lived +objects of similar size. But if you use tor_malloc() for the long-lived ones +and a memarea for the temporary object, the malloc implementation is likelier +to do better. + +To create a new memarea, use memarea_new(). To drop all the storage from a +memarea, and invalidate its pointers, use memarea_drop_all(). + +The allocation functions memarea_alloc(), memarea_alloc_zero(), +memarea_memdup(), memarea_strdup(), and memarea_strndup() are analogous to +the similarly-named malloc() functions. There is intentionally no +memarea_free() or memarea_realloc(). + + diff --git a/doc/HACKING/design/01b-collections.md b/doc/HACKING/design/01b-collections.md new file mode 100644 index 0000000000..def60b0f15 --- /dev/null +++ b/doc/HACKING/design/01b-collections.md @@ -0,0 +1,43 @@ + +## Collections in tor + +### Smartlists: Neither lists, nor especially smart. + +For historical reasons, we call our dynamic-allocated array type +"smartlist_t". It can grow or shrink as elements are added and removed. + +All smartlists hold an array of void \*. Whenever you expose a smartlist +in an API you *must* document which types its pointers actually hold. + +<!-- It would be neat to fix that, wouldn't it? -NM --> + +Smartlists are created empty with smartlist_new() and freed with +smartlist_free(). See the containers.h module documentation for more +information; there are many convenience functions for commonly needed +operations. + + +### Digest maps, string maps, and more. + +Tor makes frequent use of maps from 160-bit digests, 256-bit digests, +or nul-terminated strings to void \*. These types are digestmap_t, +digest256map_t, and strmap_t respectively. See the containers.h +module documentation for more information. + + +### Intrusive lists and hashtables + +For performance-sensitive cases, we sometimes want to use "intrusive" +collections: ones where the bookkeeping pointers are stuck inside the +structures that belong to the collection. If you've used the +BSD-style sys/queue.h macros, you'll be familiar with these. + +Unfortunately, the sys/queue.h macros vary significantly between the +platforms that have them, so we provide our own variants in +src/ext/tor_queue.h . + +We also provide an intrusive hashtable implementation in src/ext/ht.h +. When you're using it, you'll need to define your own hash +functions. If attacker-induced collisions are a worry here, use the +cryptographic siphash24g function to extract hashes. + diff --git a/doc/HACKING/design/01c-time.md b/doc/HACKING/design/01c-time.md new file mode 100644 index 0000000000..5cd0b354fd --- /dev/null +++ b/doc/HACKING/design/01c-time.md @@ -0,0 +1,75 @@ + +## Time in tor ## + +### What time is it? ### + +We have several notions of the current time in Tor. + +The *wallclock time* is available from time(NULL) with +second-granularity and tor_gettimeofday() with microsecond +granularity. It corresponds most closely to "the current time and date". + +The *monotonic time* is available with the set of monotime_\* +functions declared in compat_time.h. Unlike the wallclock time, it +can only move forward. It does not necessarily correspond to a real +world time, and it is not portable between systems. + +The *coarse monotonic time* is available from the set of +monotime_coarse_\* functions in compat_time.h. It is the same as +monotime_\* on some platforms. On others, it gives a monotonic timer +with less precision, but which it's more efficient to access. + +### Cached views of time. ### + +On some systems (like Linux), many time functions use a VDSO to avoid +the overhead of a system call. But on other systems, gettimeofday() +and time() can be costly enough that you wouldn't want to call them +tens of thousands of times. To get a recent, but not especially +accurate, view of the current time, see approx_time() and +tor_gettimeofday_cached(). + + +### Parsing and encoding time values ### + +Tor has functions to parse and format time in these formats: + + * RFC1123 format. ("Fri, 29 Sep 2006 15:54:20 GMT"). For this, + use format_rfc1123_time() and parse_rfc1123_time. + + * ISO8601 format. ("2006-10-29 10:57:20") For this, use + format_local_iso_time and format_iso_time. We also support the + variant format "2006-10-29T10:57:20" with format_iso_time_nospace, and + "2006-10-29T10:57:20.123456" with format_iso_time_nospace_usec. + + * HTTP format collections (preferably "Mon, 25 Jul 2016 04:01:11 + GMT" or possibly "Wed Jun 30 21:49:08 1993" or even "25-Jul-16 + 04:01:11 GMT"). For this, use parse_http_time. Don't generate anything + but the first format. + +Some of these functions use struct tm. You can use the standard +tor_localtime_r and tor_gmtime_r() to wrap these in a safe way. We +also have a tor_timegm() function. + +### Scheduling events ### + +The main way to schedule a not-too-frequent periodic event with +respect to the Tor mainloop is via the mechanism in periodic.c. +There's a big table of periodic_events in main.c, each of which gets +invoked on its own schedule. You should not expect more than about +one second of accuracy with these timers. + +You can create an independent timer using libevent directly, or using +the periodic_timer_new() function. But you should avoid doing this +for per-connection or per-circuit timers: Libevent's internal timer +implementation uses a min-heap, and those tend to start scaling poorly +once you have a few thousand entries. + +If you need to create a large number of fine-grained timers for some +purpose, you should consider the mechanism in src/common/timers.c, +which is optimized for the case where you have a large number of +timers with not-too-long duration, many of which will be deleted +before they actually expire. These timers should be reasonably +accurate within a handful of milliseconds -- possibly better on some +platforms. (The timers.c module uses William Ahern's timeout.c +implementation as its backend, which is based on a hierarchical timing +wheel algorithm. It's cool stuff; check it out.) diff --git a/doc/HACKING/design/01d-crypto.md b/doc/HACKING/design/01d-crypto.md new file mode 100644 index 0000000000..d4def947d1 --- /dev/null +++ b/doc/HACKING/design/01d-crypto.md @@ -0,0 +1,169 @@ + +## Lower-level cryptography functionality in Tor ## + +Generally speaking, Tor code shouldn't be calling OpenSSL (or any +other crypto library) directly. Instead, we should indirect through +one of the functions in src/common/crypto\*.c or src/common/tortls.c. + +Cryptography functionality that's available is described below. + +### RNG facilities ### + +The most basic RNG capability in Tor is the crypto_rand() family of +functions. These currently use OpenSSL's RAND_() backend, but may use +something faster in the future. + +In addition to crypto_rand(), which fills in a buffer with random +bytes, we also have functions to produce random integers in certain +ranges; to produce random hostnames; to produce random doubles, etc. + +When you're creating a long-term cryptographic secret, you might want +to use crypto_strongest_rand() instead of crypto_rand(). It takes the +operating system's entropy source and combines it with output from +crypto_rand(). This is a pure paranoia measure, but it might help us +someday. + +You can use smartlist_choose() to pick a random element from a smartlist +and smartlist_shuffle() to randomize the order of a smartlist. Both are +potentially a bit slow. + +### Cryptographic digests and related functions ### + +We treat digests as separate types based on the length of their +outputs. We support one 160-bit digest (SHA1), two 256-bit digests +(SHA256 and SHA3-256), and two 512-bit digests (SHA512 and SHA3-512). + +You should not use SHA1 for anything new. + +The crypto_digest\*() family of functions manipulates digests. You +can either compute a digest of a chunk of memory all at once using +crypto_digest(), crypto_digest256(), or crypto_digest512(). Or you +can create a crypto_digest_t object with +crypto_digest{,256,512}_new(), feed information to it in chunks using +crypto_digest_add_bytes(), and then extract the final digest using +crypto_digest_get_digest(). You can copy the state of one of these +objects using crypto_digest_dup() or crypto_digest_assign(). + +We support the HMAC hash-based message authentication code +instantiated using SHA256. See crypto_hmac_sha256. (You should not +add any HMAC users with SHA1, and HMAC is not necessary with SHA3.) + +We also support the SHA3 cousins, SHAKE128 and SHAKE256. Unlike +digests, these are extendable output functions (or XOFs) where you can +get any amount of output. Use the crypto_xof_\*() functions to access +these. + +We have several ways to derive keys from cryptographically strong secret +inputs (like diffie-hellman outputs). The old +crypto_expand_key_material-TAP() performs an ad-hoc KDF based on SHA1 -- you +shouldn't use it for implementing anything but old versions of the Tor +protocol. You can use HKDF-SHA256 (as defined in RFC5869) for more modern +protocols. Also consider SHAKE256. + +If your input is potentially weak, like a password or passphrase, use a salt +along with the secret_to_key() functions as defined in crypto_s2k.c. Prefer +scrypt over other hashing methods when possible. If you're using a password +to encrypt something, see the "boxed file storage" section below. + +Finally, in order to store objects in hash tables, Tor includes the +randomized SipHash 2-4 function. Call it via the siphash24g() function in +src/ext/siphash.h whenever you're creating a hashtable whose keys may be +manipulated by an attacker in order to DoS you with collisions. + + +### Stream ciphers ### + +You can create instances of a stream cipher using crypto_cipher_new(). +These are stateful objects of type crypto_cipher_t. Note that these +objects only support AES-128 right now; a future version should add +support for AES-128 and/or ChaCha20. + +You can encrypt/decrypt with crypto_cipher_encrypt or +crypto_cipher_decrypt. The crypto_cipher_crypt_inplace function performs +an encryption without a copy. + +Note that sensible people should not use raw stream ciphers; they should +probably be using some kind of AEAD. Sorry. + +### Public key functionality ### + +We support four public key algorithms: DH1024, RSA, Curve25519, and +Ed25519. + +We support DH1024 over two prime groups. You access these via the +crypto_dh_\*() family of functions. + +We support RSA in many bit sizes for signing and encryption. You access +it via the crypto_pk_*() family of functions. Note that a crypto_pk_t +may or may not include a private key. See the crypto_pk_* functions in +crypto.c for a full list of functions here. + +For Curve25519 functionality, see the functions and types in +crypto_curve25519.c. Curve25519 is generally suitable for when you need +a secure fast elliptic-curve diffie hellman implementation. When +designing new protocols, prefer it over DH in Z_p. + +For Ed25519 functionality, see the functions and types in +crypto_ed25519.c. Ed25519 is a generally suitable as a secure fast +elliptic curve signature method. For new protocols, prefer it over RSA +signatures. + +### Metaformats for storage ### + +When OpenSSL manages the storage of some object, we use whatever format +OpenSSL provides -- typically, some kind of PEM-wrapped base 64 encoding +that starts with "----- BEGIN CRYPTOGRAPHIC OBJECT ----". + +When we manage the storage of some cryptographic object, we prefix the +object with 32-byte NUL-padded prefix in order to avoid accidental +object confusion; see the crypto_read_tagged_contents_from_file() and +crypto_write_tagged_contents_to_file() functions for manipulating +these. The prefix is "== type: tag ==", where type describes the object +and its encoding, and tag indicates which one it is. + +### Boxed-file storage ### + +When managing keys, you frequently want to have some way to write a +secret object to disk, encrypted with a passphrase. The crypto_pwbox +and crypto_unpwbox functions do so in a way that's likely to be +readable by future versions of Tor. + +### Certificates ### + +We have, alas, several certificate types in Tor. + +The tor_x509_cert_t type represents an X.509 certificate. This document +won't explain X.509 to you -- possibly, no document can. (OTOH, Peter +Gutmann's "x.509 style guide", though severely dated, does a good job of +explaining how awful x.509 can be.) Do not introduce any new usages of +X.509. Right now we only use it in places where TLS forces us to do so. + +The authority_cert_t type is used only for directory authority keys. It +has a medium-term signing key (which the authorities actually keep +online) signed by a long-term identity key (which the authority operator +had really better be keeping offline). Don't use it for any new kind of +certificate. + +For new places where you need a certificate, consider tor_cert_t: it +represents a typed and dated _something_ signed by an Ed25519 key. The +format is described in tor-spec. Unlike x.509, you can write it on a +napkin. + +(Additionally, the Tor directory design uses a fairly wide variety of +documents that include keys and which are signed by keys. You can +consider these documents to be an additional kind of certificate if you +want.) + +### TLS ### + +Tor's TLS implementation is more tightly coupled to OpenSSL than we'd +prefer. You can read most of it in tortls.c. + +Unfortunately, TLS's state machine and our requirement for nonblocking +IO support means that using TLS in practice is a bit hairy, since +logical writes can block on a physical reads, and vice versa. + +If you are lucky, you will never have to look at the code here. + + + diff --git a/doc/HACKING/design/01e-os-compat.md b/doc/HACKING/design/01e-os-compat.md new file mode 100644 index 0000000000..072e95bc8a --- /dev/null +++ b/doc/HACKING/design/01e-os-compat.md @@ -0,0 +1,50 @@ + +## OS compatibility functions ## + +We've got a bunch of functions to wrap differences between various +operating systems where we run. + +### The filesystem ### + +We wrap the most important filesystem functions with load-file, +save-file, and map-file abstractions declared in util.c or compat.c. If +you're messing about with file descriptors yourself, you might be doing +things wrong. Most of the time, write_str_to_file() and +read_str_from_file() are all you need. + +Use the check_private_directory() function to create or verify the +presence of directories, and tor_listdir() to list the files in a +directory. + +Those modules also have functions for manipulating paths a bit. + +### Networking ### + +Nearly all the world is on a Berkeley sockets API, except for +windows, whose version of the Berkeley API was corrupted by late-90s +insistence on backward compatibility with the +sort-of-berkeley-sort-of-not add-on *thing* that was WinSocks. + +What's more, everybody who implemented sockets realized that select() +wasn't a very good way to do nonblocking IO... and then the various +implementations all decided to so something different. + +You can forget about most of these differences, fortunately: We use +libevent to hide most of the differences between the various networking +backends, and we add a few of our own functions to hide the differences +that Libevent doesn't. + +To create a network connection, the right level of abstraction to look +at is probably the connection_t system in connection.c. Most of the +lower level work has already been done for you. If you need to +instantiate something that doesn't fit well with connection_t, you +should see whether you can instantiate it with connection_t anyway -- or +you might need to refactor connection.c a little. + +Whenever possible, represent network addresses as tor_addr_t. + +### Process launch and monitoring ### + +Launching and/or monitoring a process is tricky business. You can use +the mechanisms in procmon.c and tor_spawn_background(), but they're both +a bit wonky. A refactoring would not be out of order. diff --git a/doc/HACKING/design/01f-threads.md b/doc/HACKING/design/01f-threads.md new file mode 100644 index 0000000000..a0dfa2d40e --- /dev/null +++ b/doc/HACKING/design/01f-threads.md @@ -0,0 +1,26 @@ + +## Threads in Tor ## + +Tor is based around a single main thread and one or more worker +threads. We aim (with middling success) to use worker threads for +CPU-intensive activities and the main thread for our networking. +Fortunately (?) we have enough cryptography that moving what we can of the +cryptographic processes to the workers should achieve good parallelism under most +loads. Unfortunately, we only have a small fraction of our +cryptography done in our worker threads right now. + +Our threads-and-workers abstraction is defined in workqueue.c, which +combines a work queue with a thread pool, and integrates the +signalling with libevent. Tor main instance of a work queue is +instantiated in cpuworker.c. It will probably need some refactoring +as more types of work are added. + +On a lower level, we provide locks with tor_mutex_t, conditions with +tor_cond_t, and thread-local storage with tor_threadlocal_t, all of +which are specified in compat_threads.h and implemented in an OS- +specific compat_\*threads.h module. + +Try to minimize sharing between threads: it is usually best to simply +make the worker "own" all the data it needs while the work is in +progress, and to give up ownership when it's complete. + diff --git a/doc/HACKING/design/01g-strings.md b/doc/HACKING/design/01g-strings.md new file mode 100644 index 0000000000..145a35cd6f --- /dev/null +++ b/doc/HACKING/design/01g-strings.md @@ -0,0 +1,95 @@ + +## String processing in Tor ## + +Since you're reading about a C program, you probably expected this +section: it's full of functions for manipulating the (notoriously +dubious) C string abstraction. I'll describe some often-missed +highlights here. + +### Comparing strings and memory chunks ### + +We provide strcmpstart() and strcmpend() to perform a strcmp with the start +or end of a string. + + tor_assert(!strcmpstart("Hello world","Hello")); + tor_assert(!strcmpend("Hello world","world")); + + tor_assert(!strcasecmpstart("HELLO WORLD","Hello")); + tor_assert(!strcasecmpend("HELLO WORLD","world")); + +To compare two string pointers, either of which might be NULL, use +strcmp_opt(). + +To search for a string or a chunk of memory within a non-null +terminated memory block, use tor_memstr or tor_memmem respectively. + +We avoid using memcmp() directly, since it tends to be used in cases +when having a constant-time operation would be better. Instead, we +recommend tor_memeq() and tor_memneq() for when you need a +constant-time operation. In cases when you need a fast comparison, +and timing leaks are not a danger, you can use fast_memeq() and +fast_memneq(). + +It's a common pattern to take a string representing one or more lines +of text, and search within it for some other string, at the start of a +line. You could search for "\\ntarget", but that would miss the first +line. Instead, use find_str_at_start_of_line. + +### Parsing text ### + +Over the years, we have accumulated lots of ways to parse text -- +probably too many. Refactoring them to be safer and saner could be a +good project! The one that seems most error-resistant is tokenizing +text with smartlist_split_strings(). This function takes a smartlist, +a string, and a separator, and splits the string along occurrences of +the separator, adding new strings for the sub-elements to the given +smartlist. + +To handle time, you can use one of the functions mentioned above in +"Parsing and encoding time values". + +For numbers in general, use the tor_parse_{long,ulong,double,uint64} +family of functions. Each of these can be called in a few ways. The +most general is as follows: + + const int BASE = 10; + const int MINVAL = 10, MAXVAL = 10000; + const char *next; + int ok; + long lng = tor_parse_long("100", BASE, MINVAL, MAXVAL, &ok, &next); + +The return value should be ignored if "ok" is set to false. The input +string needs to contain an entire number, or it's considered +invalid... unless the "next" pointer is available, in which case extra +characters at the end are allowed, and "next" is set to point to the +first such character. + +### Generating blocks of text ### + +For not-too-large blocks of text, we provide tor_asprintf(), which +behaves like other members of the sprintf() family, except that it +always allocates enough memory on the heap for its output. + +For larger blocks: Rather than using strlcat and strlcpy to build +text, or keeping pointers to the interior of a memory block, we +recommend that you use the smartlist_* functions to build a smartlist +full of substrings in order. Then you can concatenate them into a +single string with smartlist_join_strings(), which also takes optional +separator and terminator arguments. + +As a convenience, we provide smartlist_add_asprintf(), which combines +the two methods above together. Many of the cryptographic digest +functions also accept a not-yet-concatenated smartlist of strings. + +### Logging helpers ### + +Often we'd like to log a value that comes from an untrusted source. +To do this, use escaped() to escape the nonprintable characters and +other confusing elements in a string, and surround it by quotes. (Use +esc_for_log() if you need to allocate a new string.) + +It's also handy to put memory chunks into hexadecimal before logging; +you can use hex_str(memory, length) for that. + +The escaped() and hex_str() functions both provide outputs that are +only valid till they are next invoked; they are not threadsafe. diff --git a/doc/HACKING/design/02-dataflow.md b/doc/HACKING/design/02-dataflow.md new file mode 100644 index 0000000000..39f21a908c --- /dev/null +++ b/doc/HACKING/design/02-dataflow.md @@ -0,0 +1,236 @@ + +## Data flow in the Tor process ## + +We read bytes from the network, we write bytes to the network. For the +most part, the bytes we write correspond roughly to bytes we have read, +with bits of cryptography added in. + +The rest is a matter of details. + +![Diagram of main data flows in Tor](./diagrams/02/02-dataflow.png "Diagram of main data flows in Tor") + +### Connections and buffers: reading, writing, and interpreting. ### + +At a low level, Tor's networking code is based on "connections". Each +connection represents an object that can send or receive network-like +events. For the most part, each connection has a single underlying TCP +stream (I'll discuss counterexamples below). + +A connection that behaves like a TCP stream has an input buffer and an +output buffer. Incoming data is +written into the input buffer ("inbuf"); data to be written to the +network is queued on an output buffer ("outbuf"). + +Buffers are implemented in buffers.c. Each of these buffers is +implemented as a linked queue of memory extents, in the style of classic +BSD mbufs, or Linux skbufs. + +A connection's reading and writing can be enabled or disabled. Under +the hood, this functionality is implemented using libevent events: one +for reading, one for writing. These events are turned on/off in +main.c, in the functions connection_{start,stop}_{reading,writing}. + +When a read or write event is turned on, the main libevent loop polls +the kernel, asking which sockets are ready to read or write. (This +polling happens in the event_base_loop() call in run_main_loop_once() +in main.c.) When libevent finds a socket that's ready to read or write, +it invokes conn_{read,write}_callback(), also in main.c + +These callback functions delegate to connection_handle_read() and +connection_handle_write() in connection.c, which read or write on the +network as appropriate, possibly delegating to openssl. + +After data is read or written, or other event occurs, these +connection_handle_read_write() functions call logic functions whose job is +to respond to the information. Some examples included: + + * connection_flushed_some() -- called after a connection writes any + amount of data from its outbuf. + * connection_finished_flushing() -- called when a connection has + emptied its outbuf. + * connection_finished_connecting() -- called when an in-process connection + finishes making a remote connection. + * connection_reached_eof() -- called after receiving a FIN from the + remote server. + * connection_process_inbuf() -- called when more data arrives on + the inbuf. + +These functions then call into specific implementations depending on +the type of the connection. For example, if the connection is an +edge_connection_t, connection_reached_eof() will call +connection_edge_reached_eof(). + +> **Note:** "Also there are bufferevents!" We have vestigial +> code for an alternative low-level networking +> implementation, based on Libevent's evbuffer and bufferevent +> code. These two object types take on (most of) the roles of +> buffers and connections respectively. It isn't working in today's +> Tor, due to code rot and possible lingering libevent bugs. More +> work is needed; it would be good to get this working efficiently +> again, to have IOCP support on Windows. + + +#### Controlling connections #### + +A connection can have reading or writing enabled or disabled for a +wide variety of reasons, including: + + * Writing is disabled when there is no more data to write + * For some connection types, reading is disabled when the inbuf is + too full. + * Reading/writing is temporarily disabled on connections that have + recently read/written enough data up to their bandwidth + * Reading is disabled on connections when reading more data from them + would require that data to be buffered somewhere else that is + already full. + +Currently, these conditions are checked in a diffuse set of +increasingly complex conditional expressions. In the future, it could +be helpful to transition to a unified model for handling temporary +read/write suspensions. + +#### Kinds of connections #### + +Today Tor has the following connection and pseudoconnection types. +For the most part, each type of channel has an associated C module +that implements its underlying logic. + +**Edge connections** receive data from and deliver data to points +outside the onion routing network. See `connection_edge.c`. They fall into two types: + +**Entry connections** are a type of edge connection. They receive data +from the user running a Tor client, and deliver data to that user. +They are used to implement SOCKSPort, TransPort, NATDPort, and so on. +Sometimes they are called "AP" connections for historical reasons (it +used to stand for "Application Proxy"). + +**Exit connections** are a type of edge connection. They exist at an +exit node, and transmit traffic to and from the network. + +(Entry connections and exit connections are also used as placeholders +when performing a remote DNS request; they are not decoupled from the +notion of "stream" in the Tor protocol. This is implemented partially +in `connection_edge.c`, and partially in `dnsserv.c` and `dns.c`.) + +**OR connections** send and receive Tor cells over TLS, using some +version of the Tor link protocol. Their implementation is spread +across `connection_or.c`, with a bit of logic in `command.c`, +`relay.c`, and `channeltls.c`. + +**Extended OR connections** are a type of OR connection for use on +bridges using pluggable transports, so that the PT can tell the bridge +some information about the incoming connection before passing on its +data. They are implemented in `ext_orport.c`. + +**Directory connections** are server-side or client-side connections +that implement Tor's HTTP-based directory protocol. These are +instantiated using a socket when Tor is making an unencrypted HTTP +connection. When Tor is tunneling a directory request over a Tor +circuit, directory connections are implemented using a linked +connection pair (see below). Directory connections are implemented in +`directory.c`; some of the server-side logic is implemented in +`dirserver.c`. + +**Controller connections** are local connections to a controller +process implementing the controller protocol from +control-spec.txt. These are in `control.c`. + +**Listener connections** are not stream oriented! Rather, they wrap a +listening socket in order to detect new incoming connections. They +bypass most of stream logic. They don't have associated buffers. +They are implemented in `connection.c`. + +![structure hierarchy for connection types](./diagrams/02/02-connection-types.png "structure hierarchy for connection types") + +>**Note**: "History Time!" You might occasionally find reference to a couple types of connections +> which no longer exist in modern Tor. A *CPUWorker connection* +>connected the main Tor process to a thread or process used for +>computation. (Nowadays we use in-process communication.) Even more +>anciently, a *DNSWorker connection* connected the main tor process to +>a separate thread or process used for running `gethostbyname()` or +>`getaddrinfo()`. (Nowadays we use Libevent's evdns facility to +>perform DNS requests asynchronously.) + +#### Linked connections #### + +Sometimes two channels are joined together, such that data which the +Tor process sends on one should immediately be received by the same +Tor process on the other. (For example, when Tor makes a tunneled +directory connection, this is implemented on the client side as a +directory connection whose output goes, not to the network, but to a +local entry connection. And when a directory receives a tunnelled +directory connection, this is implemented as an exit connection whose +output goes, not to the network, but to a local directory connection.) + +The earliest versions of Tor to support linked connections used +socketpairs for the purpose. But using socketpairs forced us to copy +data through kernelspace, and wasted limited file descriptors. So +instead, a pair of connections can be linked in-process. Each linked +connection has a pointer to the other, such that data written on one +is immediately readable on the other, and vice versa. + +### From connections to channels ### + +There's an abstraction layer above OR connections (the ones that +handle cells) and below cells called **Channels**. A channel's +purpose is to transmit authenticated cells from one Tor instance +(relay or client) to another. + +Currently, only one implementation exists: Channel_tls, which sends +and receiveds cells over a TLS-based OR connection. + +Cells are sent on a channel using +`channel_write_{,packed_,var_}cell()`. Incoming cells arrive on a +channel from its backend using `channel_queue*_cell()`, and are +immediately processed using `channel_process_cells()`. + +Some cell types are handled below the channel layer, such as those +that affect handshaking only. And some others are passed up to the +generic cross-channel code in `command.c`: cells like `DESTROY` and +`CREATED` are all trivial to handle. But relay cells +require special handling... + +### From channels through circuits ### + +When a relay cell arrives on an existing circuit, it is handled in +`circuit_receive_relay_cell()` -- one of the innermost functions in +Tor. This function encrypts or decrypts the relay cell as +appropriate, and decides whether the cell is intended for the current +hop of the circuit. + +If the cell *is* intended for the current hop, we pass it to +`connection_edge_process_relay_cell()` in `relay.c`, which acts on it +based on its relay command, and (possibly) queues its data on an +`edge_connection_t`. + +If the cell *is not* intended for the current hop, we queue it for the +next channel in sequence with `append cell_to_circuit_queue()`. This +places the cell on a per-circuit queue for cells headed out on that +particular channel. + +### Sending cells on circuits: the complicated bit. ### + +Relay cells are queued onto circuits from one of two (main) sources: +reading data from edge connections, and receiving a cell to be relayed +on a circuit. Both of these sources place their cells on cell queue: +each circuit has one cell queue for each direction that it travels. + +A naive implementation would skip using cell queues, and instead write +each outgoing relay cell. (Tor did this in its earlier versions.) +But such an approach tends to give poor performance, because it allows +high-volume circuits to clog channels, and it forces the Tor server to +send data queued on a circuit even after that circuit has been closed. + +So by using queues on each circuit, we can add cells to each channel +on a just-in-time basis, choosing the cell at each moment based on +a performance-aware algorithm. + +This logic is implemented in two main modules: `scheduler.c` and +`circuitmux*.c`. The scheduler code is responsible for determining +globally, across all channels that could write cells, which one should +next receive queued cells. The circuitmux code determines, for all +of the circuits with queued cells for a channel, which one should +queue the next cell. + +(This logic applies to outgoing relay cells only; incoming relay cells +are processed as they arrive.) diff --git a/doc/HACKING/design/03-modules.md b/doc/HACKING/design/03-modules.md new file mode 100644 index 0000000000..93eb9d3089 --- /dev/null +++ b/doc/HACKING/design/03-modules.md @@ -0,0 +1,247 @@ + +## Tor's modules ## + +### Generic modules ### + +`buffers.c` +: Implements the `buf_t` buffered data type for connections, and several +low-level data handling functions to handle network protocols on it. + +`channel.c` +: Generic channel implementation. Channels handle sending and receiving cells +among tor nodes. + +`channeltls.c` +: Channel implementation for TLS-based OR connections. Uses `connection_or.c`. + +`circuitbuild.c` +: Code for constructing circuits and choosing their paths. (*Note*: +this module could plausibly be split into handling the client side, +the server side, and the path generation aspects of circuit building.) + +`circuitlist.c` +: Code for maintaining and navigating the global list of circuits. + +`circuitmux.c` +: Generic circuitmux implementation. A circuitmux handles deciding, for a +particular channel, which circuit should write next. + +`circuitmux_ewma.c` +: A circuitmux implementation based on the EWMA (exponentially +weighted moving average) algorithm. + +`circuituse.c` +: Code to actually send and receive data on circuits. + +`command.c` +: Handles incoming cells on channels. + +`config.c` +: Parses options from torrc, and uses them to configure the rest of Tor. + +`confparse.c` +: Generic torrc-style parser. Used to parse torrc and state files. + +`connection.c` +: Generic and common connection tools, and implementation for the simpler +connection types. + +`connection_edge.c` +: Implementation for entry and exit connections. + +`connection_or.c` +: Implementation for OR connections (the ones that send cells over TLS). + +`main.c` +: Principal entry point, main loops, scheduled events, and network +management for Tor. + +`ntmain.c` +: Implements Tor as a Windows service. (Not very well.) + +`onion.c` +: Generic code for generating and responding to CREATE and CREATED +cells, and performing the appropriate onion handshakes. Also contains +code to manage the server-side onion queue. + +`onion_fast.c` +: Implements the old SHA1-based CREATE_FAST/CREATED_FAST circuit +creation handshake. (Now deprecated.) + +`onion_ntor.c` +: Implements the Curve25519-based NTOR circuit creation handshake. + +`onion_tap.c` +: Implements the old RSA1024/DH1024-based TAP circuit creation handshake. (Now +deprecated.) + +`relay.c` +: Handles particular types of relay cells, and provides code to receive, +encrypt, route, and interpret relay cells. + +`scheduler.c` +: Decides which channel/circuit pair is ready to receive the next cell. + +`statefile.c` +: Handles loading and storing Tor's state file. + +`tor_main.c` +: Contains the actual `main()` function. (This is placed in a separate +file so that the unit tests can have their own `main()`.) + + +### Node-status modules ### + +`directory.c` +: Implements the HTTP-based directory protocol, including sending, +receiving, and handling most request types. (*Note*: The client parts +of this, and the generic-HTTP parts of this, could plausibly be split +off.) + +`microdesc.c` +: Implements the compact "microdescriptor" format for keeping track of +what we know about a router. + +`networkstatus.c` +: Code for fetching, storing, and interpreting consensus vote documents. + +`nodelist.c` +: Higher-level view of our knowledge of which Tor servers exist. Each +`node_t` corresponds to a router we know about. + +`routerlist.c` +: Code for storing and retrieving router descriptors and extrainfo +documents. + +`routerparse.c` +: Generic and specific code for parsing all Tor directory information +types. + +`routerset.c` +: Parses and interprets a specification for a set of routers (by IP +range, fingerprint, nickname (deprecated), or country). + + +### Client modules ### + +`addressmap.c` +: Handles client-side associations between one address and another. +These are used to implement client-side DNS caching (NOT RECOMMENDED), +MapAddress directives, Automapping, and more. + +`circpathbias.c` +: Path bias attack detection for circuits: tracks whether +connections made through a particular guard have an unusually high failure rate. + +`circuitstats.c` +: Code to track circuit performance statistics in order to adapt our behavior. +Notably includes an algorithm to track circuit build times. + +`dnsserv.c` +: Implements DNSPort for clients. (Note that in spite of the word +"server" in this module's name, it is used for Tor clients. It +implements a DNS server, not DNS for servers.) + +`entrynodes.c` +: Chooses, monitors, and remembers guard nodes. Also contains some +bridge-related code. + +`torcert.c` +: Code to interpret and generate Ed25519-based certificates. + +### Server modules ### + +`dns.c` +: Server-side DNS code. Handles sending and receiving DNS requests on +exit nodes, and implements the server-side DNS cache. + +`dirserv.c` +: Implements part of directory caches that handles responding to +client requests. + +`ext_orport.c` +: Implements the extended ORPort protocol for communication between +server-side pluggable transports and Tor servers. + +`hibernate.c` +: Performs bandwidth accounting, and puts Tor relays into hibernation +when their bandwidth is exhausted. + +`router.c` +: Management code for running a Tor server. In charge of RSA key +maintenance, descriptor generation and uploading. + +`routerkeys.c` +: Key handling code for a Tor server. (Currently handles only the +Ed25519 keys, but the RSA keys could be moved here too.) + + +### Onion service modules ### + +`rendcache.c` +: Stores onion service descriptors. + +`rendclient.c` +: Client-side implementation of the onion service protocol. + +`rendcommon.c` +: Parts of the onion service protocol that are shared by clients, +services, and/or Tor servers. + +`rendmid.c` +: Tor-server-side implementation of the onion service protocol. (Handles +acting as an introduction point or a rendezvous point.) + +`rendservice.c` +: Service-side implementation of the onion service protocol. + +`replaycache.c` +: Backend to check introduce2 requests for replay attempts. + + +### Authority modules ### + +`dircollate.c` +: Helper for `dirvote.c`: Given a set of votes, each containing a list +of Tor nodes, determines which entries across all the votes correspond +to the same nodes, and yields them in a useful order. + +`dirvote.c` +: Implements the directory voting algorithms that authorities use. + +`keypin.c` +: Implements a persistent key-pinning mechanism to tie RSA1024 +identities to ed25519 identities. + +### Miscellaneous modules ### + +`control.c` +: Implements the Tor controller protocol. + +`cpuworker.c` +: Implements the inner work queue function. We use this to move the +work of circuit creation (on server-side) to other CPUs. + +`fp_pair.c` +: Types for handling 2-tuples of 20-byte fingerprints. + +`geoip.c` +: Parses geoip files (which map IP addresses to country codes), and +performs lookups on the internal geoip table. Also stores some +geoip-related statistics. + +`policies.c` +: Parses and implements Tor exit policies. + +`reasons.c` +: Maps internal reason-codes to human-readable strings. + +`rephist.c` +: Tracks Tor servers' performance over time. + +`status.c` +: Writes periodic "heartbeat" status messages about the state of the Tor +process. + +`transports.c` +: Implements management for the pluggable transports subsystem. diff --git a/doc/HACKING/design/Makefile b/doc/HACKING/design/Makefile new file mode 100644 index 0000000000..e126130970 --- /dev/null +++ b/doc/HACKING/design/Makefile @@ -0,0 +1,34 @@ + + + +HTML= \ + 00-overview.html \ + 01-common-utils.html \ + 01a-memory.html \ + 01b-collections.html \ + 01c-time.html \ + 01d-crypto.html \ + 01e-os-compat.html \ + 01f-threads.html \ + 01g-strings.html \ + 02-dataflow.html \ + 03-modules.html \ + this-not-that.html + +PNG = \ + diagrams/02/02-dataflow.png \ + diagrams/02/02-connection-types.png + +all: generated + +generated: $(HTML) $(PNG) + +%.html: %.md + maruku $< -o $@ + +%.png: %.dia + dia $< --export=$@ + +clean: + rm -f $(HTML) + rm -f $(PNG) diff --git a/doc/HACKING/design/this-not-that.md b/doc/HACKING/design/this-not-that.md new file mode 100644 index 0000000000..815c7b2fbc --- /dev/null +++ b/doc/HACKING/design/this-not-that.md @@ -0,0 +1,51 @@ + +Don't use memcmp. Use {tor,fast}_{memeq,memneq,memcmp}. + +Don't use assert. Use tor_assert or tor_assert_nonfatal or BUG. Prefer +nonfatal assertions or BUG()s. + +Don't use sprintf or snprintf. Use tor_asprintf or tor_snprintf. + +Don't write hand-written binary parsers. Use trunnel. + +Don't use malloc, realloc, calloc, free, strdup, etc. Use tor_malloc, +tor_realloc, tor_calloc, tor_free, tor_strdup, etc. + +Don't use tor_realloc(x, y\*z). Use tor_reallocarray(x, y, z); + +Don't say "if (x) foo_free(x)". Just foo_free(x) and make sure that +foo_free(NULL) is a no-op. + +Don't use toupper or tolower; use TOR_TOUPPER and TOR_TOLOWER. + +Don't use isalpha, isalnum, etc. Instead use TOR_ISALPHA, TOR_ISALNUM, etc. + +Don't use strcat, strcpy, strncat, or strncpy. Use strlcat and strlcpy +instead. + +Don't use tor_asprintf then smartlist_add; use smartlist_add_asprintf. + +Don't use any of these functions: they aren't portable. Use the +version prefixed with `tor_` instead: strtok_r, memmem, memstr, +asprintf, localtime_r, gmtime_r, inet_aton, inet_ntop, inet_pton, +getpass, ntohll, htonll, strdup, (This list is incomplete.) + +Don't create or close sockets directly. Instead use the wrappers in +compat.h. + +When creating new APIs, only use 'char \*' to represent 'pointer to a +nul-terminated string'. Represent 'pointer to a chunk of memory' as +'uint8_t \*'. (Many older Tor APIs ignore this rule.) + +Don't encode/decode u32, u64, or u16 to byte arrays by casting +pointers. That can crash if the pointers aren't aligned, and can cause +endianness problems. Instead say something more like set_uint32(ptr, +htonl(foo)) to encode, and ntohl(get_uint32(ptr)) to decode. + +Don't declare a 0-argument function with "void foo()". That's C++ +syntax. In C you say "void foo(void)". + +When creating new APIs, use const everywhere you reasonably can. + +Sockets should have type tor_socket_t, not int. + |