Diffstat (limited to 'spec/intro')
-rw-r--r-- spec/intro/conventions.md 112
-rw-r--r-- spec/intro/index.md 140
2 files changed, 252 insertions, 0 deletions
diff --git a/spec/intro/conventions.md b/spec/intro/conventions.md
new file mode 100644
index 0000000..8a92521
--- /dev/null
+++ b/spec/intro/conventions.md
@@ -0,0 +1,112 @@
+# Notation and conventions
+
+These conventions apply,
+at least in theory,
+to all of the specification documents
+unless stated otherwise.
+
+> Remember, our specification documents
+> were once a collection of separate text files,
+> written separately
+> and edited over the course of years.
+>
+> While we are trying (as of 2023)
+> to edit them into consistency,
+> you should be aware that these conventions
+> are not now followed uniformly everywhere.
+
+## MUST, SHOULD, and so on {#rfc2119}
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
+NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
+"OPTIONAL" in this document are to be interpreted as described in
+[RFC 2119](https://datatracker.ietf.org/doc/html/rfc2119).
+
+## Data lengths {#data-lengths}
+
+Unless otherwise stated,
+all lengths are given as a number of 8-bit bytes.
+
+> All bytes are 8 bits long.
+> We sometimes call them "octets";
+> the terms as used here are interchangeable.
+
+When referring to longer lengths,
+we use [SI binary prefixes](https://en.wikipedia.org/wiki/Binary_prefix)
+(as in "kibibytes", "mebibytes", and so on)
+to refer unambiguously to increments of 1024<sup>N</sup> bytes.
+
+> If you encounter a reference
+> to "kilobytes", "megabytes", and so on,
+> you cannot safely infer whether the author intended
+> a decimal (1000<sup>N</sup>) or binary (1024<sup>N</sup>) interpretation.
+> In these cases, it is better to revise the specifications.
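+
+As a quick worked illustration of why the distinction matters
+(a sketch in Python; the constant names are ours for illustration,
+not names defined by the specification):
+
+```python
+# Binary prefixes: powers of 1024.
+KIBIBYTE = 1024 ** 1
+MEBIBYTE = 1024 ** 2
+
+# Decimal prefixes: powers of 1000.
+KILOBYTE = 1000 ** 1
+MEGABYTE = 1000 ** 2
+
+# The two readings of "one megabyte" differ by 48576 bytes (nearly 5%).
+difference = MEBIBYTE - MEGABYTE
+```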
+
+<a id="tor-spec.txt-0.1.1"></a>
+
+## Integer encoding {#integers}
+
+Unless otherwise stated,
+all multi-byte integers are encoded
+in big-endian ("network") order.
+
+> For example, 4660 (0x1234),
+> when encoded as a two-byte integer,
+> is the byte 0x12 followed by the byte 0x34 (\[12 34\]).
+>
+> When encoded as a four-byte integer,
+> it is the byte 0x00, the byte 0x00, the byte 0x12, and the byte 0x34
+> (\[00 00 12 34\]).
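+
+The same example can be checked mechanically.
+This is a minimal sketch using Python's standard library;
+the choice of Python and the variable names are ours,
+and nothing here is mandated by the specification.
+
+```python
+import struct
+
+# ">" selects big-endian ("network") order;
+# "H" is an unsigned 16-bit integer, "I" an unsigned 32-bit integer.
+two_bytes = struct.pack(">H", 0x1234)
+four_bytes = struct.pack(">I", 0x1234)
+
+# int.to_bytes and int.from_bytes express the same convention.
+same_two_bytes = (4660).to_bytes(2, "big")
+decoded = int.from_bytes(four_bytes, "big")
+```
+
+Here `two_bytes` is \[12 34\], `four_bytes` is \[00 00 12 34\],
+and `decoded` is 4660 again.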
+
+## Binary-as-text encodings {#binascii}
+
+When we refer to "base64", "base32", or "base16",
+we mean the encodings described in
+[RFC 4648](https://datatracker.ietf.org/doc/html/rfc4648),
+with the following notes:
+
+- In base32, we never insert linefeeds,
+ and we omit trailing `=` padding characters.
+- In base64,
+ we _sometimes_ omit trailing `=` padding characters,
+ and we do not insert linefeeds unless explicitly noted.
+- We do not insert any other whitespace,
+ except as specifically noted.
+
+Base16 and base32 are case-insensitive.
+Unless otherwise stated,
+implementations should accept either case,
+and should produce a single uniform case.
+
+We sometimes refer to base16 as "hex" or "hexadecimal".
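+
+The notes above might be implemented along these lines.
+This is a hedged sketch on top of Python's `base64` module:
+the helper names are hypothetical,
+and the choice of lowercase as the "single uniform case" is our assumption,
+since the text above does not say which case to produce.
+
+```python
+import base64
+
+def b32_encode(data: bytes) -> str:
+    """Base32 with no '=' padding, in one uniform case (lowercase here)."""
+    return base64.b32encode(data).decode("ascii").rstrip("=").lower()
+
+def b32_decode(text: str) -> bytes:
+    """Accept either case, restoring the padding that was omitted."""
+    padded = text + "=" * (-len(text) % 8)
+    return base64.b32decode(padded, casefold=True)
+
+def b64_decode(text: str) -> bytes:
+    """Accept base64 with or without trailing '=' padding."""
+    stripped = text.rstrip("=")
+    return base64.b64decode(stripped + "=" * (-len(stripped) % 4))
+
+def b16_decode(text: str) -> bytes:
+    """Accept hexadecimal in either case."""
+    return base64.b16decode(text, casefold=True)
+```
+
+Python's decoders expect padded input,
+which is why the decoding helpers re-add `=` characters before calling them.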
+
+> Note that as of 2023, in some places, the specs are not always
+> explicit about:
+>
+> - which base64 strings are multiline
+> - which base32 strings and base16 strings
+> should be generated in what case.
+>
+> This is something we should correct.
+
+## Notation {#notation}
+
+### Operations on byte strings {#ops}
+
+* `A | B` represents the concatenation of two binary strings `A` and `B`.
+
+### Binary literals {#binary-literals}
+
+When we write a series of one-byte hexadecimal literals
+in square brackets,
+it represents a multi-byte binary string.
+
+> For example,
+> `[6f 6e 69 6f 6e 20 72 6f 75 74 69 6e 67]`
+> is a 13-byte sequence representing the unterminated ASCII string
+> `onion routing`.
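+
+To make the notation concrete,
+here is a small sketch that reads such a literal in Python.
+The `parse_literal` helper is hypothetical and exists only for illustration.
+Note that the `A | B` notation above corresponds to `+` on Python byte strings,
+not to Python's `|` operator.
+
+```python
+def parse_literal(text: str) -> bytes:
+    """Turn a bracketed literal such as "[12 34]" into a byte string."""
+    inner = text.strip()
+    if not (inner.startswith("[") and inner.endswith("]")):
+        raise ValueError("expected a literal in square brackets")
+    # bytes.fromhex skips the spaces between the one-byte literals.
+    return bytes.fromhex(inner[1:-1])
+
+message = parse_literal("[6f 6e 69 6f 6e 20 72 6f 75 74 69 6e 67]")
+
+# Concatenation: [6f 6e 69 6f 6e] | [20] | [72 6f 75 74 69 6e 67]
+joined = (
+    parse_literal("[6f 6e 69 6f 6e]")
+    + parse_literal("[20]")
+    + parse_literal("[72 6f 75 74 69 6e 67]")
+)
+```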
+
+
+
+
+
diff --git a/spec/intro/index.md b/spec/intro/index.md
new file mode 100644
index 0000000..894d7e0
--- /dev/null
+++ b/spec/intro/index.md
@@ -0,0 +1,140 @@
+# A short introduction to Tor {#tor-intro}
+
+### Basic functionality {#basics}
+
+Tor is a distributed overlay network designed to anonymize
+low-latency TCP-based applications
+such as web browsing, secure shell, and instant messaging.
+The network is built of a number of servers, called **relays**
+(also called "onion routers" or "ORs" in some older documentation).
+
+To connect to the network,
+a client needs to download an up-to-date signed directory
+of the relays on the network.
+These directory documents are generated and signed
+by a set of semi-trusted **directory authority** servers,
+and are cached by the relays themselves.
+(If a client does not yet have a directory,
+it finds a cache by looking at a list of stable cache locations,
+distributed along with its source code.)
+
+> For more information on the directory subsystem,
+> see the [directory protocol specification](../dir-spec).
+
+After the client knows the relays on the network,
+it can pick one of them and open a [**channel**](../tor-spec/channels.md)
+to it.
+A channel is an encrypted reliable non-anonymous transport
+between a client and a relay, or between two relays,
+used to transmit messages called [**cells**](../tor-spec/cell-packet-format.md).
+(Under the hood, a channel is just a TLS connection over TCP,
+with a specified encoding for cells.)
+
+To anonymize its traffic,
+a client chooses a **path**—a sequence of relays on the network—
+and opens a channel to the first relay on the path
+(if it does not already have a channel open to that relay).
+The client then uses that channel to build
+a multi-hop cryptographic structure
+called a [**circuit**](../tor-spec/circuit-management.md).
+A circuit is built over a sequence of relays (typically three).
+Every relay in the circuit knows its predecessor and successor,
+but no other relays in the circuit.
+Many circuits can be multiplexed over a single channel.
+
+> For more information on how paths are selected,
+> see the [path specification](../path-spec).
+> The first hop on a path,
+> also called a **guard node**,
+> has complicated rules for its selection;
+> for more on those, see the [guard specification](../guard-spec).
+
+Once a circuit exists,
+the client can use it to exchange fixed-length
+[**relay cells**](../tor-spec/relay-cells.md)
+with any relay on the circuit.
+These relay cells are wrapped in multiple layers of encryption:
+as part of building the circuit,
+the client [negotiates](../tor-spec/create-created-cells.md)
+a separate set of symmetric keys
+with each relay on the circuit.
+Each relay removes (or adds)
+a [single layer of encryption](../tor-spec/routing-relay-cells.md)
+for each relay cell before passing it on.
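+
+The layering idea can be shown with a deliberately simplified toy.
+Everything in this sketch is an assumption made for illustration:
+the keys are made-up constants rather than negotiated ones,
+and the XOR keystream built from SHA-256 is a stand-in,
+not the cipher or key schedule that Tor actually uses.
+
+```python
+import hashlib
+
+def keystream(key: bytes, length: int) -> bytes:
+    """Toy keystream (SHA-256 over key and counter). NOT Tor's real cipher."""
+    out = b""
+    counter = 0
+    while len(out) < length:
+        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
+        counter += 1
+    return out[:length]
+
+def xor_layer(key: bytes, data: bytes) -> bytes:
+    """Add or remove one layer; with XOR the two operations are identical."""
+    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))
+
+# Hypothetical per-relay keys, standing in for the negotiated ones.
+hop_keys = [b"first-hop-key", b"second-hop-key", b"third-hop-key"]
+payload = b"relay cell body"
+
+# The client adds one layer per relay on the circuit.
+wrapped = payload
+for key in reversed(hop_keys):
+    wrapped = xor_layer(key, wrapped)
+
+# Each relay in turn removes only its own layer before passing the cell on.
+cell = wrapped
+for key in hop_keys:
+    cell = xor_layer(key, cell)
+```
+
+After the last relay has removed its layer, `cell` equals `payload` again;
+no single relay's key is enough to recover it from `wrapped`.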
+
+A client uses these relay cells
+to exchange [**relay messages**](../tor-spec/relay-cells.md) with relays on a circuit.
+These "relay messages" in turn are used
+to actually deliver traffic over the network.
+In the [simplest use case](../tor-spec/opening-streams.md),
+the client sends a `BEGIN` message
+to tell the last relay on the circuit
+(called the **exit node**)
+to create a new session, or **stream**,
+and associate that stream
+with a new TCP connection to a target host.
+The exit node replies with a `CONNECTED` message
+to say that the TCP connection has succeeded.
+Then the client and the exit exchange `DATA` messages
+to represent the contents of the anonymized stream.
+
+> Note that as of 2023,
+> the specifications do not perfectly distinguish
+> between relay cells and relay messages.
+> This is because, until recently,
+> there was a 1-to-1 relationship between the two:
+> every relay cell held a single relay message.
+> As [proposal 340](../proposals/340-packed-and-fragmented.md) is implemented,
+> we will revise the specifications
+> for improved clarity on this point.
+
+Other kinds of relay messages can be used
+for more advanced functionality.
+
+<!-- TODO: I'm not so sure about the vocabulary in this part. -->
+
+Using a system called **conflux**,
+a client can build multiple circuits to the _same_ exit node,
+and associate those circuits within a **conflux set**.
+Once this is done,
+relay messages can be sent over _either_ circuit in the set,
+depending on capacity and performance.
+
+> For more on conflux,
+> which has been integrated into the C tor implementation,
+> but not yet (as of 2023) into this document,
+> see [proposal 329](../proposals/329-traffic-splitting.txt).
+
+### Advanced topics: Onion services and responder anonymity {#onions}
+
+In addition to _initiating_ anonymous communications,
+clients can also arrange to _receive_ communications
+without revealing their identity or location.
+This is called **responder anonymity**,
+and the mechanism Tor uses to achieve it
+is called **onion services**
+(or "hidden services" or "rendezvous services"
+in some older documentation).
+
+> For the details on onion services,
+> see the [Tor Rendezvous Specification](../rend-spec).
+
+### Advanced topics: Censorship resistance {#anticensorship}
+
+In some places, Tor is censored.
+Typically, censors do this by blocking connections
+to the addresses of the known Tor relays,
+and by blocking traffic that resembles Tor.
+
+To resist this censorship,
+some Tor relays, called **bridges**,
+are unlisted in the public directory:
+their addresses are distributed by [other means](../bridgedb-spec.md).
+(To distinguish ordinary published relays from bridges,
+we sometimes call the former **public relays**.)
+
+Additionally, Tor clients and bridges can use extension programs,
+called [**pluggable transports**](../pt-spec),
+that obfuscate their traffic to make it harder to detect.
+
+