Diffstat (limited to 'doc/HACKING/design/00-overview.md')
-rw-r--r-- | doc/HACKING/design/00-overview.md | 124 |
1 files changed, 124 insertions, 0 deletions
diff --git a/doc/HACKING/design/00-overview.md b/doc/HACKING/design/00-overview.md
new file mode 100644
index 0000000000..2103a9062a
--- /dev/null
+++ b/doc/HACKING/design/00-overview.md
@@ -0,0 +1,124 @@
+
+## Overview ##
+
+This document describes the general structure of the Tor codebase, how
+it fits together, what functionality is available for extending Tor,
+and gives some notes on how Tor got that way.
+
+Tor remains a work in progress: We've been working on it for more than a
+decade, and we've learned a lot about good coding since we first
+started. This means, however, that some of the older pieces of Tor will
+have some "code smell" in them that could sure stand a brisk
+refactoring. So when I describe a piece of code, I'll sometimes give a
+note on how it got that way, and whether I still think that's a good
+idea.
+
+The first drafts of this document were written in the Summer and Fall of
+2015, when Tor 0.2.6 was the most recent stable version, and Tor 0.2.7
+was under development. If you're reading this far in the future, some
+things may have changed. Caveat haxxor!
+
+This document is not an overview of the Tor protocol. For that, see the
+design paper and the specifications at https://spec.torproject.org/ .
+
+For more information about Tor's coding standards and some helpful
+development tools, see doc/HACKING in the Tor repository.
+
+For more information about writing tests, see doc/HACKING/WritingTests.txt
+in the Tor repository.
+
+### The very high level ###
+
+Ultimately, Tor runs as an event-driven network daemon: it responds to
+network events, signals, and timers by sending and receiving things over
+the network. Clients, relays, and directory authorities all use the
+same codebase: the Tor process will run as a client, relay, or authority
+depending on its configuration.
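To make "event-driven" concrete, here is a minimal sketch of a single-threaded loop that reacts to socket readiness and to a periodic timer. It is an illustration only, not Tor's code: Tor is written in C and relies on Libevent, while this sketch uses Python's standard `selectors` module to stay short. The names `TinyEventLoop`, `add_reader`, `add_periodic`, `run_once`, and `demo` are hypothetical and do not correspond to anything in the Tor source.

```python
import selectors
import socket
import time


class TinyEventLoop:
    """Hypothetical single-threaded loop: socket readiness plus timers."""

    def __init__(self):
        self.selector = selectors.DefaultSelector()
        self.timers = []  # entries are [next_deadline, interval, callback]

    def add_reader(self, sock, callback):
        # Run callback(sock) whenever sock becomes readable.
        self.selector.register(sock, selectors.EVENT_READ, callback)

    def add_periodic(self, interval, callback):
        # Run callback() roughly every `interval` seconds.
        self.timers.append([time.monotonic() + interval, interval, callback])

    def run_once(self):
        # Block until a socket is readable or the earliest timer is due.
        timeout = None
        if self.timers:
            earliest = min(timer[0] for timer in self.timers)
            timeout = max(0.0, earliest - time.monotonic())
        for key, _mask in self.selector.select(timeout):
            key.data(key.fileobj)
        # Fire every timer whose deadline has passed, then re-arm it.
        now = time.monotonic()
        for timer in self.timers:
            if timer[0] <= now:
                timer[2]()
                timer[0] = now + timer[1]

    def close(self):
        self.selector.close()


def demo():
    """Drive the loop until both a read event and a timer tick were seen."""
    log = []
    loop = TinyEventLoop()
    sender, receiver = socket.socketpair()
    loop.add_reader(receiver, lambda s: log.append(("read", s.recv(1024))))
    loop.add_periodic(0.01, lambda: log.append(("tick", None)))
    sender.sendall(b"hello")  # makes `receiver` readable right away
    while ("tick", None) not in log:
        loop.run_once()
    loop.close()
    sender.close()
    receiver.close()
    return log
```

Calling `demo()` returns a log whose first entry is the read event, followed by at least one timer tick. The point of the design is that nothing runs unless an event fires: all work hangs off callbacks registered with the loop, which is the general shape the paragraph above describes.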
+
+Tor has a few major dependencies, including Libevent (used to tell which
+sockets are readable and writable), OpenSSL (used for many encryption
+functions, and to implement the TLS protocol), and zlib (used to
+compress and uncompress directory information).
+
+Most of Tor's work today is done in a single event-driven main thread.
+Tor also spawns one or more worker threads to handle CPU-intensive
+tasks. (Right now, this only includes circuit encryption.)
+
+On startup, Tor initializes its libraries, reads and responds to its
+configuration files, and launches a main event loop. At first, the only
+events that Tor listens for are a few signals (like TERM and HUP), and
+one or more listener sockets (for different kinds of incoming
+connections). Tor also configures a timer function to run once per
+second to handle periodic events. As Tor runs over time, other events
+will open, and new events will be scheduled.
+
+The codebase is divided into a few main subdirectories:
+
+   src/common -- utility functions, not necessarily Tor-specific.
+
+   src/or -- implements the Tor protocols.
+
+   src/test -- unit and regression tests.
+
+   src/ext -- Code maintained elsewhere that we include in the Tor
+              source distribution.
+
+   src/trunnel -- automatically generated code (from the Trunnel
+              tool): used to parse and encode binary formats.
+
+### Some key high-level abstractions ###
+
+The most important abstractions at Tor's high level are Connections,
+Channels, Circuits, and Nodes.
+
+A 'Connection' represents a stream-based information flow. Most
+connections are TCP connections to remote Tor servers and clients. (But
+as a shortcut, a relay will sometimes make a connection to itself
+without actually using a TCP connection. More details later on.)
+Connections exist in different varieties, depending on what
+functionality they provide. The principal types of connection are
+"edge" (e.g. a SOCKS connection or a connection from an exit relay to a
+destination), "OR" (a TLS stream connecting to a relay), "Directory" (an
+HTTP connection to learn about the network), and "Control" (a connection
+from a controller).
+
+A 'Circuit' is a persistent tunnel through the Tor network, established
+with public-key cryptography, and used to send cells one or more hops.
+Clients keep track of multi-hop circuits, and the cryptography
+associated with each hop. Relays, on the other hand, keep track only of
+their hop of each circuit.
+
+A 'Channel' is an abstract view of sending cells to and from a Tor
+relay. Currently, all channels are implemented using OR connections.
+If we switch to other strategies in the future, we'll have more
+connection types.
+
+A 'Node' is a view of a Tor instance's current knowledge and opinions
+about a Tor relay or bridge.
+
+### The rest of this document. ###
+
+> **Note**: This section describes the eventual organization of this
+> document, which is not yet complete.
+
+We'll begin with an overview of the various utility functions available
+in Tor's 'common' directory. Knowing about these is key to writing
+portable, simple code in Tor.
+
+Then we'll go on and talk about the main data-flow of the Tor network:
+how Tor generates and responds to network traffic. This will occupy a
+chapter for the main overview, with other chapters for special topics.
+
+After that, we'll mention the main modules in Tor, and describe the
+function of each.
+
+We'll cover the directory subsystem next: how Tor learns about other
+relays, and how relays advertise themselves.
+
+Then we'll cover a few specialized modules, such as hidden services,
+sandboxing, hibernation, accounting, statistics, guards, path
+generation, pluggable transports, and how they integrate with the rest of Tor.
+
+We'll close with a meandering overview of important pending issues in
+the Tor codebase, and how they affect the future of the Tor software.
+