Diffstat (limited to 'doc/HACKING')
-rw-r--r-- doc/HACKING 213
1 files changed, 109 insertions, 104 deletions
diff --git a/doc/HACKING b/doc/HACKING
index e6a9e8157a..00177f2e37 100644
--- a/doc/HACKING
+++ b/doc/HACKING
@@ -6,108 +6,113 @@ the code, add features, fix bugs, etc.
Read the README file first, so you can get familiar with the basics.
-1. The programs.
-
-1.1. "or". This is the main program here. It functions as either a server
-or a client, depending on which config file you give it.
-
-1.2. "orkeygen". Use "orkeygen file-for-privkey file-for-pubkey" to
-generate key files for an onion router.
-
-2. The pieces.
-
-2.1. Routers. Onion routers, as far as the 'or' program is concerned,
-are a bunch of data items that are loaded into the router_array when
-the program starts. Periodically it downloads a new set of routers
-from a directory server, and updates the router_array. When a new OR
-connection is started (see below), the relevant information is copied
-from the router struct to the connection struct.
-
-2.2. Connections. A connection is a long-standing tcp socket between
-nodes. A connection is named based on what it's connected to -- an "OR
-connection" has an onion router on the other end, an "OP connection" has
-an onion proxy on the other end, an "exit connection" has a website or
-other server on the other end, and an "AP connection" has an application
-proxy (and thus a user) on the other end.
-
-2.3. Circuits. A circuit is a path over the onion routing
-network. Applications can connect to one end of the circuit, and can
-create exit connections at the other end of the circuit. AP and exit
-connections have only one circuit associated with them (and thus these
-connection types are closed when the circuit is closed), whereas OP and
-OR connections multiplex many circuits at once, and stay standing even
-when there are no circuits running over them.
-
-2.4. Topics. Topics are specific conversations between an AP and an exit.
-Topics are multiplexed over circuits.
-
-2.4. Cells. Some connections, specifically OR and OP connections, speak
-"cells". This means that data over that connection is bundled into 256
-byte packets (8 bytes of header and 248 bytes of payload). Each cell has
-a type, or "command", which indicates what it's for.
-
-
-3. Important parameters in the code.
-
-
-
-4. Robustness features.
-
-4.1. Bandwidth throttling. Each cell-speaking connection has a maximum
-bandwidth it can use, as specified in the routers.or file. Bandwidth
-throttling can occur on both the sender side and the receiving side. If
-the LinkPadding option is on, the sending side sends cells at regularly
-spaced intervals (e.g., a connection with a bandwidth of 25600B/s would
-queue a cell every 10ms). The receiving side protects against misbehaving
-servers that send cells more frequently, by using a simple token bucket:
-
-Each connection has a token bucket with a specified capacity. Tokens are
-added to the bucket each second (when the bucket is full, new tokens
-are discarded.) Each token represents permission to receive one byte
-from the network --- to receive a byte, the connection must remove a
-token from the bucket. Thus if the bucket is empty, that connection must
-wait until more tokens arrive. The number of tokens we add enforces a
-longterm average rate of incoming bytes, yet we still permit short-term
-bursts above the allowed bandwidth. Currently bucket sizes are set to
-ten seconds worth of traffic.
-
-The bandwidth throttling uses TCP to push back when we stop reading.
-We extend it with token buckets to allow more flexibility for traffic
-bursts.
-
-4.2. Data congestion control. Even with the above bandwidth throttling,
-we still need to worry about congestion, either accidental or intentional.
-If a lot of people make circuits into same node, and they all come out
-through the same connection, then that connection may become saturated
-(be unable to send out data cells as quickly as it wants to). An adversary
-can make a 'put' request through the onion routing network to a webserver
-he owns, and then refuse to read any of the bytes at the webserver end
-of the circuit. These bottlenecks can propagate back through the entire
-network, mucking up everything.
-
-(See the tor-spec.txt document for details of how congestion control
-works.)
-
-In practice, all the nodes in the circuit maintain a receive window
-close to maximum except the exit node, which stays around 0, periodically
-receiving a sendme and reading more data cells from the webserver.
-In this way we can use pretty much all of the available bandwidth for
-data, but gracefully back off when faced with multiple circuits (a new
-sendme arrives only after some cells have traversed the entire network),
-stalled network connections, or attacks.
-
-We don't need to reimplement full tcp windows, with sequence numbers,
-the ability to drop cells when we're full etc, because the tcp streams
-already guarantee in-order delivery of each cell. Rather than trying
-to build some sort of tcp-on-tcp scheme, we implement this minimal data
-congestion control; so far it's enough.
-
-4.3. Router twins. In many cases when we ask for a router with a given
-address and port, we really mean a router who knows a given key. Router
-twins are two or more routers that share the same private key. We thus
-give routers extra flexibility in choosing the next hop in the circuit: if
-some of the twins are down or slow, it can choose the more available ones.
-
-Currently the code tries for the primary router first, and if it's down,
-chooses the first available twin.
+The pieces.
+
+ Routers. Onion routers, as far as the 'tor' program is concerned,
+ are a bunch of data items that are loaded into the router_array when
+ the program starts. Periodically it downloads a new set of routers
+ from a directory server, and updates the router_array. When a new OR
+ connection is started (see below), the relevant information is copied
+ from the router struct to the connection struct.
+
+ Connections. A connection is a long-standing TCP socket between
+ nodes. A connection is named based on what it's connected to -- an "OR
+ connection" has an onion router on the other end, an "OP connection" has
+ an onion proxy on the other end, an "exit connection" has a website or
+ other server on the other end, and an "AP connection" has an application
+ proxy (and thus a user) on the other end.
+
+ Circuits. A circuit is a path over the onion routing
+ network. Applications can connect to one end of the circuit, and can
+ create exit connections at the other end of the circuit. AP and exit
+ connections have only one circuit associated with them (and thus these
+ connection types are closed when the circuit is closed), whereas OP and
+ OR connections multiplex many circuits at once, and stay standing even
+ when there are no circuits running over them.
+
+ Streams. Streams are specific conversations between an AP and an exit.
+ Streams are multiplexed over circuits.
+
+ Cells. Some connections, specifically OR and OP connections, speak
+ "cells". This means that data over that connection is bundled into
+ 256-byte packets (8 bytes of header and 248 bytes of payload). Each cell has
+ a type, or "command", which indicates what it's for.
+
+Robustness features.
+
+[XXX no longer up to date]
+ Bandwidth throttling. Each cell-speaking connection has a maximum
+ bandwidth it can use, as specified in the routers.or file. Bandwidth
+ throttling can occur on both the sender side and the receiving side. If
+ the LinkPadding option is on, the sending side sends cells at regularly
+ spaced intervals (e.g., a connection with a bandwidth of 25600B/s would
+ queue a cell every 10ms). The receiving side protects against misbehaving
+ servers that send cells more frequently, by using a simple token bucket:
+
+ Each connection has a token bucket with a specified capacity. Tokens are
+ added to the bucket each second (when the bucket is full, new tokens
+ are discarded). Each token represents permission to receive one byte
+ from the network --- to receive a byte, the connection must remove a
+ token from the bucket. Thus if the bucket is empty, that connection must
+ wait until more tokens arrive. The number of tokens we add enforces a
+ long-term average rate of incoming bytes, yet we still permit short-term
+ bursts above the allowed bandwidth. Currently bucket sizes are set to
+ ten seconds' worth of traffic.
+
+ The bandwidth throttling uses TCP to push back when we stop reading.
+ We extend it with token buckets to allow more flexibility for traffic
+ bursts.
+
+ Data congestion control. Even with the above bandwidth throttling,
+ we still need to worry about congestion, either accidental or intentional.
+ If a lot of people make circuits into the same node, and they all come out
+ through the same connection, then that connection may become saturated
+ (be unable to send out data cells as quickly as it wants to). An adversary
+ can make a 'put' request through the onion routing network to a webserver
+ he owns, and then refuse to read any of the bytes at the webserver end
+ of the circuit. These bottlenecks can propagate back through the entire
+ network, mucking up everything.
+
+ (See the tor-spec.txt document for details of how congestion control
+ works.)
+
+ In practice, all the nodes in the circuit maintain a receive window
+ close to maximum except the exit node, which stays around 0, periodically
+ receiving a sendme and reading more data cells from the webserver.
+ In this way we can use pretty much all of the available bandwidth for
+ data, but gracefully back off when faced with multiple circuits (a new
+ sendme arrives only after some cells have traversed the entire network),
+ stalled network connections, or attacks.
+
+ We don't need to reimplement full TCP windows, with sequence numbers,
+ the ability to drop cells when we're full, etc., because the TCP streams
+ already guarantee in-order delivery of each cell. Rather than trying
+ to build some sort of TCP-on-TCP scheme, we implement this minimal data
+ congestion control; so far it's enough.
+
+ Router twins. In many cases when we ask for a router with a given
+ address and port, we really mean a router that knows a given key. Router
+ twins are two or more routers that share the same private key. We thus
+ give routers extra flexibility in choosing the next hop in the circuit: if
+ some of the twins are down or slow, they can choose the more available ones.
+
+ Currently the code tries for the primary router first, and if it's down,
+ chooses the first available twin.
+
+Coding conventions:
+
+ Log convention: use only these four log severities.
+
+ ERR means something fatal just happened.
+ WARNING means something bad happened, but we're still running. The
+ bad thing is either a bug in the code, an attack or buggy
+ protocol/implementation of the remote peer, etc. The operator should
+ examine the bad thing and try to correct it.
+ (No error or warning messages should be expected. I expect most people
+ to run on -l warning eventually. If a library function is currently
+ called such that failure always means ERR, then the library function
+ should log WARNING and let the caller log ERR.)
+ INFO means something happened (maybe bad, maybe ok), but there's nothing
+ you need to (or can) do about it.
+ DEBUG is for everything louder than INFO.
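
The receive-side token bucket described under "Bandwidth throttling" in
the patch above can be sketched in C roughly as follows. This is only an
illustration of the text, not code from the tree: the names
token_bucket_t, bucket_init, bucket_refill and bucket_consume are
hypothetical, and starting with a full bucket is an assumption the text
does not state. The 25600 B/s rate in the usage note is the figure from
the document's own example, and the ten-second capacity follows the
"ten seconds' worth of traffic" sentence.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-connection token bucket; one token = one byte. */
typedef struct {
  size_t rate;      /* tokens added per second (allowed bytes/sec) */
  size_t capacity;  /* maximum number of tokens the bucket can hold */
  size_t tokens;    /* tokens currently available */
} token_bucket_t;

/* Set up a bucket sized to 'burst_seconds' worth of traffic.
 * Assumption: the bucket starts full. */
void bucket_init(token_bucket_t *b, size_t rate, size_t burst_seconds)
{
  assert(b != NULL);
  b->rate = rate;
  b->capacity = rate * burst_seconds;
  b->tokens = b->capacity;
}

/* Call once per second: add 'rate' tokens; tokens that would
 * overflow the bucket are discarded. */
void bucket_refill(token_bucket_t *b)
{
  assert(b != NULL);
  if (b->capacity - b->tokens < b->rate)
    b->tokens = b->capacity;
  else
    b->tokens += b->rate;
}

/* Ask to read 'wanted' bytes. Returns how many bytes may be read
 * right now and removes that many tokens. A return value of 0 means
 * the connection must wait for the next refill. */
size_t bucket_consume(token_bucket_t *b, size_t wanted)
{
  size_t allowed;
  assert(b != NULL);
  allowed = wanted < b->tokens ? wanted : b->tokens;
  b->tokens -= allowed;
  return allowed;
}
```

With bucket_init(&b, 25600, 10) the bucket holds 256000 tokens, so a
peer can burst up to 256000 bytes at once, after which reads are limited
to what each bucket_refill() call adds back. The long-term average stays
at the configured rate because tokens beyond the capacity are dropped
rather than accumulated.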