summaryrefslogtreecommitdiff
path: root/doc/HACKING/design/02-dataflow.md
blob: 39f21a908cec0ff6d6b4d5def9f4c75c7c32a9a5 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
## Data flow in the Tor process ##

We read bytes from the network, we write bytes to the network.  For the
most part, the bytes we write correspond roughly to bytes we have read,
with bits of cryptography added in.

The rest is a matter of details.

![Diagram of main data flows in Tor](./diagrams/02/02-dataflow.png "Diagram of main data flows in Tor")

### Connections and buffers: reading, writing, and interpreting. ###

At a low level, Tor's networking code is based on "connections".  Each
connection represents an object that can send or receive network-like
events.  For the most part, each connection has a single underlying TCP
stream (I'll discuss counterexamples below).

A connection that behaves like a TCP stream has an input buffer and an
output buffer.  Incoming data is
written into the input buffer ("inbuf"); data to be written to the
network is queued on an output buffer ("outbuf").

Buffers are implemented in buffers.c.  Each of these buffers is
implemented as a linked queue of memory extents, in the style of classic
BSD mbufs, or Linux skbufs.

A connection's reading and writing can be enabled or disabled.  Under
the hood, this functionality is implemented using libevent events: one
for reading, one for writing.  These events are turned on/off in
main.c, in the functions connection_{start,stop}_{reading,writing}.

When a read or write event is turned on, the main libevent loop polls
the kernel, asking which sockets are ready to read or write.  (This
polling happens in the event_base_loop() call in run_main_loop_once()
in main.c.)  When libevent finds a socket that's ready to read or write,
it invokes conn_{read,write}_callback(), also in main.c

These callback functions delegate to connection_handle_read() and
connection_handle_write() in connection.c, which read or write on the
network as appropriate, possibly delegating to openssl.

After data is read or written, or other event occurs, these
connection_handle_read_write() functions call logic functions whose job is
to respond to the information.  Some examples included:

   * connection_flushed_some() -- called after a connection writes any
     amount of data from its outbuf.
   * connection_finished_flushing() -- called when a connection has
     emptied its outbuf.
   * connection_finished_connecting() -- called when an in-process connection
     finishes making a remote connection.
   * connection_reached_eof() -- called after receiving a FIN from the
     remote server.
   * connection_process_inbuf() -- called when more data arrives on
     the inbuf.

These functions then call into specific implementations depending on
the type of the connection.  For example, if the connection is an
edge_connection_t, connection_reached_eof() will call
connection_edge_reached_eof().

> **Note:** "Also there are bufferevents!"  We have vestigial
> code for an alternative low-level networking
> implementation, based on Libevent's evbuffer and bufferevent
> code.  These two object types take on (most of) the roles of
> buffers and connections respectively. It isn't working in today's
> Tor, due to code rot and possible lingering libevent bugs.  More
> work is needed; it would be good to get this working efficiently
> again, to have IOCP support on Windows.


#### Controlling connections ####

A connection can have reading or writing enabled or disabled for a
wide variety of reasons, including:

   * Writing is disabled when there is no more data to write
   * For some connection types, reading is disabled when the inbuf is
     too full.
   * Reading/writing is temporarily disabled on connections that have
     recently read/written enough data up to their bandwidth 
   * Reading is disabled on connections when reading more data from them
     would require that data to be buffered somewhere else that is
     already full.

Currently, these conditions are checked in a diffuse set of
increasingly complex conditional expressions.  In the future, it could
be helpful to transition to a unified model for handling temporary
read/write suspensions.

#### Kinds of connections ####

Today Tor has the following connection and pseudoconnection types.
For the most part, each type of channel has an associated C module
that implements its underlying logic.

**Edge connections** receive data from and deliver data to points
outside the onion routing network.  See `connection_edge.c`. They fall into two types:

**Entry connections** are a type of edge connection. They receive data
from the user running a Tor client, and deliver data to that user.
They are used to implement SOCKSPort, TransPort, NATDPort, and so on.
Sometimes they are called "AP" connections for historical reasons (it
used to stand for "Application Proxy").

**Exit connections** are a type of edge connection. They exist at an
exit node, and transmit traffic to and from the network.

(Entry connections and exit connections are also used as placeholders
when performing a remote DNS request; they are not decoupled from the
notion of "stream" in the Tor protocol. This is implemented partially
in `connection_edge.c`, and partially in `dnsserv.c` and `dns.c`.)

**OR connections** send and receive Tor cells over TLS, using some
version of the Tor link protocol.  Their implementation is spread
across `connection_or.c`, with a bit of logic in `command.c`,
`relay.c`, and `channeltls.c`.

**Extended OR connections** are a type of OR connection for use on
bridges using pluggable transports, so that the PT can tell the bridge
some information about the incoming connection before passing on its
data.  They are implemented in `ext_orport.c`.

**Directory connections** are server-side or client-side connections
that implement Tor's HTTP-based directory protocol.  These are
instantiated using a socket when Tor is making an unencrypted HTTP
connection.  When Tor is tunneling a directory request over a Tor
circuit, directory connections are implemented using a linked
connection pair (see below).  Directory connections are implemented in
`directory.c`; some of the server-side logic is implemented in
`dirserver.c`.

**Controller connections** are local connections to a controller
process implementing the controller protocol from
control-spec.txt. These are in `control.c`.

**Listener connections** are not stream oriented!  Rather, they wrap a
listening socket in order to detect new incoming connections.  They
bypass most of stream logic.  They don't have associated buffers.
They are implemented in `connection.c`.

![structure hierarchy for connection types](./diagrams/02/02-connection-types.png "structure hierarchy for connection types")

>**Note**: "History Time!" You might occasionally find reference to a couple types of connections
> which no longer exist in modern Tor.  A *CPUWorker connection*
>connected the main Tor process to a thread or process used for
>computation.  (Nowadays we use in-process communication.)  Even more
>anciently, a *DNSWorker connection* connected the main tor process to
>a separate thread or process used for running `gethostbyname()` or
>`getaddrinfo()`.  (Nowadays we use Libevent's evdns facility to
>perform DNS requests asynchronously.)

#### Linked connections ####

Sometimes two channels are joined together, such that data which the
Tor process sends on one should immediately be received by the same
Tor process on the other.  (For example, when Tor makes a tunneled
directory connection, this is implemented on the client side as a
directory connection whose output goes, not to the network, but to a
local entry connection. And when a directory receives a tunnelled
directory connection, this is implemented as an exit connection whose
output goes, not to the network, but to a local directory connection.)

The earliest versions of Tor to support linked connections used
socketpairs for the purpose.  But using socketpairs forced us to copy
data through kernelspace, and wasted limited file descriptors.  So
instead, a pair of connections can be linked in-process.  Each linked
connection has a pointer to the other, such that data written on one
is immediately readable on the other, and vice versa.

### From connections to channels ###

There's an abstraction layer above OR connections (the ones that
handle cells) and below cells called **Channels**.  A channel's
purpose is to transmit authenticated cells from one Tor instance
(relay or client) to another.

Currently, only one implementation exists: Channel_tls, which sends
and receiveds cells over a TLS-based OR connection.

Cells are sent on a channel using
`channel_write_{,packed_,var_}cell()`. Incoming cells arrive on a
channel from its backend using `channel_queue*_cell()`, and are
immediately processed using `channel_process_cells()`.

Some cell types are handled below the channel layer, such as those
that affect handshaking only.  And some others are passed up to the
generic cross-channel code in `command.c`: cells like `DESTROY` and
`CREATED` are all trivial to handle.  But relay cells
require special handling...

### From channels through circuits ###

When a relay cell arrives on an existing circuit, it is handled in
`circuit_receive_relay_cell()` -- one of the innermost functions in
Tor.  This function encrypts or decrypts the relay cell as
appropriate, and decides whether the cell is intended for the current
hop of the circuit.

If the cell *is* intended for the current hop, we pass it to
`connection_edge_process_relay_cell()` in `relay.c`, which acts on it
based on its relay command, and (possibly) queues its data on an
`edge_connection_t`.

If the cell *is not* intended for the current hop, we queue it for the
next channel in sequence with `append cell_to_circuit_queue()`.  This
places the cell on a per-circuit queue for cells headed out on that
particular channel.

### Sending cells on circuits: the complicated bit. ###

Relay cells are queued onto circuits from one of two (main) sources:
reading data from edge connections, and receiving a cell to be relayed
on a circuit.  Both of these sources place their cells on cell queue:
each circuit has one cell queue for each direction that it travels.

A naive implementation would skip using cell queues, and instead write
each outgoing relay cell.  (Tor did this in its earlier versions.)
But such an approach tends to give poor performance, because it allows
high-volume circuits to clog channels, and it forces the Tor server to
send data queued on a circuit even after that circuit has been closed.

So by using queues on each circuit, we can add cells to each channel
on a just-in-time basis, choosing the cell at each moment based on
a performance-aware algorithm.

This logic is implemented in two main modules: `scheduler.c` and
`circuitmux*.c`.  The scheduler code is responsible for determining
globally, across all channels that could write cells, which one should
next receive queued cells.  The circuitmux code determines, for all
of the circuits with queued cells for a channel, which one should
queue the next cell.

(This logic applies to outgoing relay cells only; incoming relay cells
are processed as they arrive.)