author | Nick Mathewson <nickm@torproject.org> | 2023-11-09 16:25:38 +0000 |
---|---|---|
committer | Nick Mathewson <nickm@torproject.org> | 2023-11-09 16:25:38 +0000 |
commit | 22f5b0ecdb9d21448e1a04ba2687b8f84d52724b (patch) | |
tree | 0b62f272469ed0bece8f854674aec9f37722d3af | |
parent | eab09d4a520b6d7d44fa73d120a5be889fdc9e51 (diff) | |
parent | 5063c0e80e4aeb62c47fe8eb406ed500431dae6e (diff) | |
Merge branch 'preliminaries' into 'main'
Start an "introductions" sections with a "conventions" subsection to span all the specs
See merge request tpo/core/torspec!217
-rw-r--r-- | spec/SUMMARY.md | 7 |
-rw-r--r-- | spec/intro/conventions.md | 112 |
-rw-r--r-- | spec/intro/index.md (renamed from spec/overview.md) | 32 |
-rw-r--r-- | spec/tor-spec/index.md | 1 |
-rw-r--r-- | spec/tor-spec/opening-streams.md | 5 |
-rw-r--r-- | spec/tor-spec/preliminaries.md | 27 |
-rw-r--r-- | spec/tor-spec/system-overview.md | 10 |
7 files changed, 137 insertions, 57 deletions
diff --git a/spec/SUMMARY.md b/spec/SUMMARY.md index 6a11961..c425ee6 100644 --- a/spec/SUMMARY.md +++ b/spec/SUMMARY.md @@ -1,13 +1,16 @@ # Summary [About these specifications](./README.md) -[A short introduction to Tor](./overview.md) + +# Introduction + +- [A short introduction to Tor](./intro/index.md) + - [Notation and conventions](./intro/conventions.md) # The core Tor protocol - [`Tor Protocol Specification`](./tor-spec/index.md) - [Preliminaries](./tor-spec/preliminaries.md) - - [System overview](./tor-spec/system-overview.md) - [Relay keys and identities](./tor-spec/relay-keys.md) - [Channels](./tor-spec/channels.md) - [Negotiating and initializing channels](./tor-spec/negotiating-channels.md) diff --git a/spec/intro/conventions.md b/spec/intro/conventions.md new file mode 100644 index 0000000..8a92521 --- /dev/null +++ b/spec/intro/conventions.md @@ -0,0 +1,112 @@ +# Notation and conventions + +These conventions apply, +at least in theory, +to all of the specification documents +unless stated otherwise. + +> Remember, our specification documents +> were once a collection of separate text files, +> written separately +> and edited over the course of years. +> +> While we are trying (as of 2023) +> to edit them into consistency, +> you should be aware that these conventions +> are not now followed uniformly everywhere. + +## MUST, SHOULD, and so on {#rfc2119} + +The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL +NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and +"OPTIONAL" in this document are to be interpreted as described in +[RFC 2119](https://datatracker.ietf.org/doc/html/rfc2119). + +## Data lengths {#data-lengths} + +Unless otherwise stated, +all lengths are given as a number of 8-bit bytes. + +> All bytes are 8 bits long. +> We sometimes call them "octets"; +> the terms as used here are interchangeable. 
+ +When referring to longer lengths, +we use [SI binary prefixes](https://en.wikipedia.org/wiki/Binary_prefix) +(as in "kibibytes", "mebibytes", and so on) +to refer unambiguously to increments of 1024<sup>X</sup> bytes. + +> If you encounter a reference +> to "kilobytes", "megabytes", or so on, +> you cannot safely infer whether the author intended +> a decimal (1000<sup>N</sup>) or binary (1024<sup>N</sup>) interpretation. +> In these cases, it is better to revise the specifications. + +<a id="tor-spec.txt-0.1.1"></a> + +## Integer encoding {#integers} + +Unless otherwise stated, +all multi-byte integers are encoded +in big-endian ("network") order. + +> For example, 4660 (0x1234), +> when encoded as a two-byte integer, +> is the byte 0x12 followed by the byte 0x34. (\[12 34\]) +> +> When encoded as a four-byte integer, +> it is the byte 0x00, the byte 0x00, the byte 0x12, and the byte 0x34. +> (\[00 00 12 34\]). + +## Binary-as-text encodings {#binascii} + +When we refer to "base64", "base32", or "base16", +we mean the encodings described in +[RFC 4648](https://datatracker.ietf.org/doc/html/rfc4648), +with the following notes: + +- In base32, we never insert linefeeds in base32, + and we omit trailing `=` padding characters. +- In base64, + we _sometimes_ omit trailing `=` padding characters, + and we do not insert linefeeds unless explicitly noted. +- We do not insert any other whitespace, + except as specifically noted. + +Base 16 and base 32 are case-insensitive. +Unless otherwise stated, +implementations should accept any cases, +and should produce a single uniform case. + +We sometimes refer to base16 as "hex" or "hexadecimal". + +> Note that as of 2023, in some places, the specs are not always +> explicit about: +> +> - which base64 strings are multiline +> - which base32 strings and base16 strings +> should be generated in what case. +> +> This is something we should correct. 
+ +## Notation {#notation} + +### Operations on byte strings {#ops} + +* `A | B` represents the concatenation of two binary strings `A` and `B`. + +### Binary literals {#binary-literals} + +When we write a series of one-byte hexadecimal literals +in square brackets, +it represents a multi-byte binary string. + +> For example, +> `[6f 6e 69 6f 6e 20 72 6f 75 74 69 6e 67]` +> is a 13-byte sequence representing the unterminated ASCII string, +> `onion routing`. + + + + + diff --git a/spec/overview.md b/spec/intro/index.md index f6f3d5d..894d7e0 100644 --- a/spec/overview.md +++ b/spec/intro/index.md @@ -19,14 +19,14 @@ it finds a cache by looking at a list of stable cache locations, distributed along with its source code.) > For more information on the directory subsystem, -> see the [directory protocol specification](./dir-spec). +> see the [directory protocol specification](../dir-spec). After the client knows the relays on the network, -it can pick a relay and open a [**channel**](./tor-spec/channels.md) +it can pick a relay and open a [**channel**](../tor-spec/channels.md) to one of these relays. A channel is an encrypted reliable non-anonymous transport between a client and a relay or a relay and a relay, -used to transmit messages called [**cells**](./tor-spec/cell-packet-format.md). +used to transmit messages called [**cells**](../tor-spec/cell-packet-format.md). (Under the hood, a channel is just a TLS connection over TCP, with a specified encoding for cells.) @@ -36,37 +36,37 @@ and opens a channel to the first relay on the path (if it does not already have a channel open to that relay). The client then uses that channel to build a multi-hop cryptographic structure -called a [**circuit**](./tor-spec/circuit-management.md). +called a [**circuit**](../tor-spec/circuit-management.md). A circuit is built over a sequence of relays (typically three). Every relay in the circuit knows its precessor and successor, but no other relays in the circuit. 
Many circuits can be multiplexed over a single channel. > For more information on how paths are selected, -> see the [path specification](./path-spec). +> see the [path specification](../path-spec). > The first hop on a path, > also called a **guard node**, > has complicated rules for its selection; -> for more on those, see the [guard specification](./guard-spec). +> for more on those, see the [guard specification](../guard-spec). Once a circuit exists, the client can use it to exchange fixed-length -[**relay cells**](./tor-spec/relay-cells.md) +[**relay cells**](../tor-spec/relay-cells.md) with any relay on the circuit. These relay cells are wrapped in multiple layers of encryption: as part of building the circuit, -the client [negotiates](./tor-spec/create-created-cells.md) +the client [negotiates](../tor-spec/create-created-cells.md) a separate set of symmetric keys with each relay on the circuit. Each relay removes (or adds) -a [single layer of encryption](./tor-spec/routing-relay-cells.md) +a [single layer of encryption](../tor-spec/routing-relay-cells.md) for each relay cell before passing it on. A client uses these relay cells -to exchange [**relay messages**](./tor-spec/relay-cells.md) with relays on a circuit. +to exchange [**relay messages**](../tor-spec/relay-cells.md) with relays on a circuit. These "relay messages" in turn are used to actually deliver traffic over the network. -In the [simplest use case](./tor-spec/opening-streams.md), +In the [simplest use case](../tor-spec/opening-streams.md), the client sends a `BEGIN` message to tell the last relay on the circuit (called the **exit node**) @@ -84,7 +84,7 @@ to represent the contents of the anonymized stream. > This is because, until recently, > there was a 1-to-1 relationship between the two: > every relay cell held a single relay message. 
-> As [proposal 340](proposals/340-packed-and-fragmented.md) is implemented, +> As [proposal 340](../proposals/340-packed-and-fragmented.md) is implemented, > we will revise the specifications > for improved clarify on this point. @@ -103,7 +103,7 @@ depending on capacity and performance. > For more on conflux, > which has been integrated into the C tor implementation, > but not yet (as of 2023) into this document, -> see [proposal 329](proposals/329-traffic-splitting.txt). +> see [proposal 329](../proposals/329-traffic-splitting.txt). ### Advanced topics: Onion services and responder anonymity {#onions} @@ -117,7 +117,7 @@ is called **onion services** in some older documentation). > For the details on onion services, -> see the [Tor Rendezvous Specification](./rend-spec). +> see the [Tor Rendezvous Specification](../rend-spec). ### Advanced topics: Censorship resistence {#anticensorship} @@ -129,12 +129,12 @@ and by blocking traffic that resembles Tor. To resist this censorship, some Tor relays, called **bridges**, are unlisted in the public directory: -their addresses are distributed by [other means](./bridgedb-spec.md). +their addresses are distributed by [other means](../bridgedb-spec.md). (To distinguish ordinary published relays from bridges, we sometimes call them **public relays**.) Additionally, Tor clients and bridges can use extension programs, -called [**pluggable transports**](./pt-spec), +called [**pluggable transports**](../pt-spec), that obfuscate their traffic to make it harder to detect. diff --git a/spec/tor-spec/index.md b/spec/tor-spec/index.md index a81e1b9..219ea1d 100644 --- a/spec/tor-spec/index.md +++ b/spec/tor-spec/index.md @@ -9,3 +9,4 @@ Tor as they become obsolete. This specification is not a design document; most design criteria are not examined. For more information on why Tor acts as it does, see tor-design.pdf. 
+ diff --git a/spec/tor-spec/opening-streams.md b/spec/tor-spec/opening-streams.md index d59e26b..757f776 100644 --- a/spec/tor-spec/opening-streams.md +++ b/spec/tor-spec/opening-streams.md @@ -26,8 +26,9 @@ fingerprinting. Implementations MUST accept strings in any case. The FLAGS value has one or more of the following bits set, where "bit 1" is the LSB of the 32-bit value, and "bit 32" is the MSB. -(Remember that all values in Tor are big-endian (see -["Preliminaries » Encoding integers"](./preliminaries.md#encoding)), so +(Remember that +[all integers in Tor are big-endian](../intro/conventions.md), +so the MSB of a 4-byte value is the MSB of the first byte, and the LSB of a 4-byte value is the LSB of its last byte.) diff --git a/spec/tor-spec/preliminaries.md b/spec/tor-spec/preliminaries.md index 8cc92b7..78e6e80 100644 --- a/spec/tor-spec/preliminaries.md +++ b/spec/tor-spec/preliminaries.md @@ -2,13 +2,6 @@ # Preliminaries -```text - The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL - NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and - "OPTIONAL" in this document are to be interpreted as described in - RFC 2119. -``` - <a id="tor-spec.txt-0.1"></a> ## Notation and encoding{#notation-and-encoding} @@ -19,30 +12,10 @@ K -- a key for a symmetric cipher. N -- a "nonce", a random value, usually deterministically chosen from other inputs using hashing. - - a|b -- concatenation of 'a' and 'b'. ``` -\[A0 B1 C2\] -- a three-byte sequence, containing the bytes with -hexadecimal values A0, B1, and C2, in that order. - H(m) -- a cryptographic hash of m. -We use "byte" and "octet" interchangeably. Possibly we shouldn't. - -Some specs mention "base32". This means RFC4648, without "=" padding. - -<a id="tor-spec.txt-0.1.1"></a> - -### Encoding integers{#encoding} - -Unless we explicitly say otherwise below, all numeric values in the -Tor protocol are encoded in network (big-endian) order.
So a "32-bit -integer" means a big-endian 32-bit integer; a "2-byte" integer means -a big-endian 16-bit integer, and so forth. - -<a id="tor-spec.txt-0.2"></a> - ## Security parameters Tor uses a stream cipher, a public-key cipher, the Diffie-Hellman diff --git a/spec/tor-spec/system-overview.md b/spec/tor-spec/system-overview.md deleted file mode 100644 index 17cc1a4..0000000 --- a/spec/tor-spec/system-overview.md +++ /dev/null @@ -1,10 +0,0 @@ -<a id="tor-spec.txt-1"></a> - -# System overview - -Tor is a distributed overlay network designed to anonymize -low-latency TCP-based applications such as web browsing, secure shell, -and instant messaging. Clients choose a path through the network and build a `circuit'', in which each node (or`onion router'' or `OR'') in the path knows its predecessor and successor, but no other nodes in the circuit. Traffic flowing down the circuit is sent in fixed-size `cells'', which are unwrapped by a symmetric key at each node (like -the layers of an onion) and relayed downstream. -
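As a quick check of the integer-encoding rule that this commit moves into `spec/intro/conventions.md` (multi-byte integers are big-endian, or "network" order), the following Python sketch encodes the 4660 (0x1234) example from that text. It relies only on the standard `struct` module. The function names are illustrative and do not come from any Tor implementation.

```python
import struct

def encode_u16(value: int) -> bytes:
    """Encode an unsigned integer as two big-endian bytes (">H")."""
    return struct.pack(">H", value)

def encode_u32(value: int) -> bytes:
    """Encode an unsigned integer as four big-endian bytes (">I")."""
    return struct.pack(">I", value)

def decode_u32(data: bytes) -> int:
    """Decode four big-endian bytes into an unsigned integer."""
    (value,) = struct.unpack(">I", data)
    return value

# 4660 == 0x1234, the example used in the conventions text.
print(encode_u16(4660).hex(" "))  # 12 34
print(encode_u32(4660).hex(" "))  # 00 00 12 34
```

The `>` prefix in each format string selects big-endian byte order. `int.to_bytes(4, "big")` and `int.from_bytes(data, "big")` would work equally well.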
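The binary-as-text notes in the new conventions text call for some decoder leniency. Base32 never carries `=` padding. Base64 *sometimes* omits it. Base16 and base32 are case-insensitive on input. This Python sketch shows one way to express those rules with the standard `base64` module. The helper names are hypothetical. Emitting upper case is also an assumption, since the conventions ask only for a single uniform case.

```python
import base64

def b32_encode_nopad(data: bytes) -> str:
    """RFC 4648 base32: upper case, no linefeeds, trailing '=' padding removed."""
    return base64.b32encode(data).decode("ascii").rstrip("=")

def b32_decode_nopad(text: str) -> bytes:
    """Accept base32 in any case, restoring padding to a multiple of 8 characters."""
    padded = text + "=" * (-len(text) % 8)
    return base64.b32decode(padded, casefold=True)

def b64_decode_lenient(text: str) -> bytes:
    """Accept single-line base64 with or without trailing '=' padding."""
    padded = text + "=" * (-len(text) % 4)
    # validate=True rejects whitespace and other non-alphabet characters.
    return base64.b64decode(padded, validate=True)

def b16_decode_any_case(text: str) -> bytes:
    """Accept hex ("base16") in upper, lower, or mixed case."""
    return base64.b16decode(text, casefold=True)

print(b32_encode_nopad(b"hi"))    # NBUQ
print(b32_decode_nopad("nbuq"))   # b'hi'
print(b64_decode_lenient("aGk"))  # b'hi'
```

Malformed input, such as a base32 string whose length can never be valid, still raises `binascii.Error` from the underlying decoder.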
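The new "Notation" section defines two things: the bracketed binary-literal syntax and the `A | B` concatenation operator. Both map directly onto Python `bytes`. The parser below is a hypothetical helper for turning spec examples into test vectors, not something the specs define. It checks the 13-byte `onion routing` example and the `[A0 B1 C2]` example removed from `preliminaries.md`.

```python
def parse_binary_literal(literal: str) -> bytes:
    """Parse the spec's "[6f 6e ...]" notation into bytes (hypothetical helper)."""
    text = literal.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError("binary literals are written in square brackets")
    # bytes.fromhex ignores the whitespace between byte pairs and accepts either case.
    return bytes.fromhex(text[1:-1])

def concat(a: bytes, b: bytes) -> bytes:
    """The spec's `A | B`. In Python this is `a + b`; `|` is not defined on bytes."""
    return a + b

onion = parse_binary_literal("[6f 6e 69 6f 6e 20 72 6f 75 74 69 6e 67]")
print(len(onion), onion)  # 13 b'onion routing'
print(concat(parse_binary_literal("[A0 B1]"), parse_binary_literal("[C2]")).hex(" "))  # a0 b1 c2
```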
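The `opening-streams.md` hunk uses the big-endian rule to fix the bit numbering of the 32-bit FLAGS value: "bit 1" is the LSB and "bit 32" is the MSB. Because the value is big-endian, bit 32 sits in the first byte and bit 1 in the last. Here is a minimal Python sketch of that numbering. The function name is hypothetical. The sketch deliberately assigns no meaning to individual bits, which `opening-streams.md` defines.

```python
import struct

def flag_bit_is_set(flags_field: bytes, bit: int) -> bool:
    """Test "bit N" of a 4-byte FLAGS field, numbering bits 1 (LSB) to 32 (MSB)."""
    if len(flags_field) != 4:
        raise ValueError("FLAGS is a 4-byte field")
    if not 1 <= bit <= 32:
        raise ValueError("bits are numbered 1 through 32")
    # Decode big-endian: the MSB of the first byte becomes bit 32.
    (value,) = struct.unpack(">I", flags_field)
    return ((value >> (bit - 1)) & 1) == 1

print(flag_bit_is_set(bytes([0x00, 0x00, 0x00, 0x01]), 1))   # True
print(flag_bit_is_set(bytes([0x80, 0x00, 0x00, 0x00]), 32))  # True
print(flag_bit_is_set(bytes([0x80, 0x00, 0x00, 0x00]), 1))   # False
```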