
# A short introduction to Tor {#tor-intro}

### Basic functionality {#basics}

Tor is a distributed overlay network designed to anonymize
low-latency TCP-based applications
such as web browsing, secure shell, and instant messaging.
The network is built of a number of servers, called **relays**
(also called "onion routers" or "ORs" in some older documentation).

To connect to the network,
a client needs to download an up-to-date signed directory
of the relays on the network.
These directory documents are generated and signed
by a set of semi-trusted **directory authority** servers,
and are cached by the relays themselves.
(If a client does not yet have a directory,
it finds a cache by looking at a list of stable cache locations,
distributed along with its source code.)

> For more information on the directory subsystem,
> see the [directory protocol specification](../dir-spec).

After the client knows the relays on the network,
it can pick one of them and open a [**channel**](../tor-spec/channels.md) to it.
A channel is an encrypted, reliable, non-anonymous transport
between a client and a relay, or between two relays,
used to transmit messages called [**cells**](../tor-spec/cell-packet-format.md).
(Under the hood, a channel is just a TLS connection over TCP,
with a specified encoding for cells.)
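
As a rough illustration of what an encoding for fixed-length cells can look like,
the sketch below packs and unpacks a single cell.
The layout used here
(a 4-byte circuit ID, a 1-byte command, and a zero-padded 509-byte body)
and the function names `pack_cell` and `unpack_cell`
are assumptions made for this example;
the cell format specification linked above is authoritative.

```python
import struct
from typing import Tuple

# Sizes assumed for illustration only; see the cell format specification.
CIRC_ID_LEN = 4
BODY_LEN = 509
CELL_LEN = CIRC_ID_LEN + 1 + BODY_LEN  # 514 bytes in this sketch


def pack_cell(circ_id: int, command: int, body: bytes) -> bytes:
    """Encode one fixed-length cell, zero-padding the body."""
    if len(body) > BODY_LEN:
        raise ValueError("body too long for a single cell")
    # "!IB": network byte order, 4-byte unsigned int, 1-byte unsigned int.
    header = struct.pack("!IB", circ_id, command)
    return header + body.ljust(BODY_LEN, b"\x00")


def unpack_cell(cell: bytes) -> Tuple[int, int, bytes]:
    """Decode a cell into (circuit ID, command, padded body)."""
    if len(cell) != CELL_LEN:
        raise ValueError("cells are fixed-length")
    circ_id, command = struct.unpack("!IB", cell[:CIRC_ID_LEN + 1])
    return circ_id, command, cell[CIRC_ID_LEN + 1:]
```

Because every cell has the same length,
a receiver can split the byte stream from the TLS connection
into cells without any extra framing.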

To anonymize its traffic,
a client chooses a **path**—a sequence of relays on the network—
and opens a channel to the first relay on the path
(if it does not already have a channel open to that relay).
The client then uses that channel to build
a multi-hop cryptographic structure
called a [**circuit**](../tor-spec/circuit-management.md).
A circuit is built over a sequence of relays (typically three).
Every relay in the circuit knows its predecessor and successor,
but no other relays in the circuit.
Many circuits can be multiplexed over a single channel.
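
The "predecessor and successor" property can be made concrete with a small sketch.
The function `hop_views` and the relay names are hypothetical;
the code only models who is adjacent to whom on a path,
not how a circuit is actually built.

```python
from typing import Dict, List, Optional, Tuple


def hop_views(client: str, path: List[str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each relay to the (predecessor, successor) pair it can see."""
    nodes = [client] + path
    views = {}
    for i, relay in enumerate(path, start=1):
        # The last relay on the path has no successor within the circuit.
        successor = nodes[i + 1] if i + 1 < len(nodes) else None
        views[relay] = (nodes[i - 1], successor)
    return views


views = hop_views("client", ["relay_a", "relay_b", "relay_c"])
```

In this model, only `relay_a` sees the client,
and no single relay sees both the client and the last relay on the path.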

> For more information on how paths are selected,
> see the [path specification](../path-spec).
> The first hop on a path,
> also called a **guard node**,
> has complicated rules for its selection;
> for more on those, see the [guard specification](../guard-spec).

Once a circuit exists,
the client can use it to exchange fixed-length
[**relay cells**](../tor-spec/relay-cells.md)
with any relay on the circuit.
These relay cells are wrapped in multiple layers of encryption:
as part of building the circuit,
the client [negotiates](../tor-spec/create-created-cells.md)
a separate set of symmetric keys
with each relay on the circuit.
Each relay removes (or adds)
a [single layer of encryption](../tor-spec/routing-relay-cells.md)
for each relay cell before passing it on.
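
The layering described above can be sketched with a toy cipher.
This is a minimal illustration, not Tor's actual cryptography:
`toy_layer` is a hypothetical XOR keystream built from SHAKE-256,
and the per-hop keys are random bytes
rather than keys negotiated during circuit construction.

```python
import hashlib
import os
from typing import List


def toy_layer(key: bytes, data: bytes) -> bytes:
    """XOR data with a keystream derived from key.

    Toy stand-in for a real stream cipher: applying it twice with the
    same key restores the input. Not secure, and not Tor's actual cipher.
    """
    keystream = hashlib.shake_256(key).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, keystream))


def client_wrap(hop_keys: List[bytes], body: bytes) -> bytes:
    """Add one layer per hop, starting with the layer for the last hop."""
    for key in reversed(hop_keys):
        body = toy_layer(key, body)
    return body


# One key per relay on the circuit (assumed random here for illustration).
hop_keys = [os.urandom(32) for _ in range(3)]
cell_body = client_wrap(hop_keys, b"hello, exit")

# Each relay in turn removes exactly one layer before passing the body on.
for key in hop_keys:
    cell_body = toy_layer(key, cell_body)

print(cell_body)  # b'hello, exit'
```

For traffic in the other direction the roles are reversed:
each relay adds its layer, and the client removes all of them.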

A client uses these relay cells
to exchange [**relay messages**](../tor-spec/relay-cells.md) with relays on a circuit.
These "relay messages" in turn are used
to actually deliver traffic over the network.
In the [simplest use case](../tor-spec/opening-streams.md),
the client sends a `BEGIN` message
to tell the last relay on the circuit
(called the **exit node**)
to create a new session, or **stream**,
and associate that stream
with a new TCP connection to a target host.
The exit node replies with a `CONNECTED` message
to say that the TCP connection has succeeded.
Then the client and the exit exchange `DATA` messages
to represent the contents of the anonymized stream.
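
The `BEGIN`/`CONNECTED`/`DATA` exchange above can be sketched as a toy exit node.
Everything here is hypothetical scaffolding for illustration:
the `ToyExit` class, the integer stream identifiers,
and the `host:port` body format are assumptions,
and no real TCP connection is opened.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ToyExit:
    """Toy exit node: tracks streams but opens no real TCP connections."""
    streams: Dict[int, str] = field(default_factory=dict)
    delivered: List[Tuple[str, bytes]] = field(default_factory=list)

    def handle(self, stream_id: int, command: str, body: bytes) -> Optional[str]:
        """Process one relay message; return the name of the reply, if any."""
        if command == "BEGIN":
            # A real exit would open a TCP connection to the target here.
            self.streams[stream_id] = body.decode()
            return "CONNECTED"
        if command == "DATA":
            if stream_id not in self.streams:
                raise ValueError("DATA for a stream that was never opened")
            # Record the stream contents as if forwarded to the target.
            self.delivered.append((self.streams[stream_id], body))
            return None
        raise ValueError("unsupported command in this sketch")


exit_node = ToyExit()
reply = exit_node.handle(1, "BEGIN", b"example.com:80")
exit_node.handle(1, "DATA", b"GET / HTTP/1.0\r\n\r\n")
```

After these two calls, `reply` is `"CONNECTED"`
and `exit_node.delivered` holds the request bytes
paired with the target that stream 1 was associated with.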

> Note that as of 2023,
> the specifications do not perfectly distinguish
> between relay cells and relay messages.
> This is because, until recently,
> there was a 1-to-1 relationship between the two:
> every relay cell held a single relay message.
> As [proposal 340](../proposals/340-packed-and-fragmented.md) is implemented,
> we will revise the specifications
> for improved clarity on this point.

Other kinds of relay messages can be used
for more advanced functionality.

<!-- TODO: I'm not so sure about the vocabulary in this part. -->

Using a system called **conflux**,
a client can build multiple circuits to the _same_ exit node,
and associate those circuits within a **conflux set**.
Once this is done,
relay messages can be sent over _either_ circuit in the set,
depending on capacity and performance.
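
When messages from one set travel over different circuits,
they may arrive out of order.
The sketch below shows one way a receiving end could restore the order,
assuming each message carries a sequence number.
That assumption, and the `ToyConfluxReceiver` class,
are illustrative only;
proposal 329 describes the actual design.

```python
import heapq
from typing import List, Tuple


class ToyConfluxReceiver:
    """Reassemble messages that arrived over different circuits."""

    def __init__(self) -> None:
        self._next_seq = 0
        self._pending: List[Tuple[int, bytes]] = []

    def receive(self, seq: int, message: bytes) -> List[bytes]:
        """Accept one message; return any messages now deliverable in order."""
        heapq.heappush(self._pending, (seq, message))
        ready = []
        # Release messages only while the lowest pending number is the next one due.
        while self._pending and self._pending[0][0] == self._next_seq:
            ready.append(heapq.heappop(self._pending)[1])
            self._next_seq += 1
        return ready


rx = ToyConfluxReceiver()
out = []
# Messages 0 and 2 travel over one circuit, message 1 over the other.
for seq, msg in [(0, b"a"), (2, b"c"), (1, b"b")]:
    out.extend(rx.receive(seq, msg))

print(out)  # [b'a', b'b', b'c']
```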

> For more on conflux,
> which has been integrated into the C tor implementation,
> but not yet (as of 2023) into this document,
> see [proposal 329](../proposals/329-traffic-splitting.txt).

### Advanced topics: Onion services and responder anonymity {#onions}

In addition to _initiating_ anonymous communications,
clients can also arrange to _receive_ communications
without revealing their identity or location.
This is called **responder anonymity**,
and the mechanism Tor uses to achieve it
is called **onion services**
(or "hidden services" or "rendezvous services"
in some older documentation).

> For the details on onion services,
> see the [Tor Rendezvous Specification](../rend-spec).

### Advanced topics: Censorship resistance {#anticensorship}

In some places, Tor is censored.
Typically, censors do this by blocking connections
to the addresses of the known Tor relays,
and by blocking traffic that resembles Tor.

To resist this censorship,
some Tor relays, called **bridges**,
are unlisted in the public directory:
their addresses are distributed by [other means](../bridgedb-spec.md).
(To distinguish ordinary published relays from bridges,
we sometimes call the former **public relays**.)

Additionally, Tor clients and bridges can use extension programs,
called [**pluggable transports**](../pt-spec),
that obfuscate their traffic to make it harder to detect.