author: Nick Mathewson <nickm@torproject.org> 2023-11-09 16:25:38 +0000
committer: Nick Mathewson <nickm@torproject.org> 2023-11-09 16:25:38 +0000
commit: 22f5b0ecdb9d21448e1a04ba2687b8f84d52724b
tree: 0b62f272469ed0bece8f854674aec9f37722d3af
parent: eab09d4a520b6d7d44fa73d120a5be889fdc9e51
parent: 5063c0e80e4aeb62c47fe8eb406ed500431dae6e
Merge branch 'preliminaries' into 'main'

Start an "introductions" sections with a "conventions" subsection to span all the specs

See merge request tpo/core/torspec!217

-rw-r--r-- spec/SUMMARY.md | 7
-rw-r--r-- spec/intro/conventions.md | 112
-rw-r--r-- spec/intro/index.md (renamed from spec/overview.md) | 32
-rw-r--r-- spec/tor-spec/index.md | 1
-rw-r--r-- spec/tor-spec/opening-streams.md | 5
-rw-r--r-- spec/tor-spec/preliminaries.md | 27
-rw-r--r-- spec/tor-spec/system-overview.md | 10
7 files changed, 137 insertions, 57 deletions
diff --git a/spec/SUMMARY.md b/spec/SUMMARY.md
index 6a11961..c425ee6 100644
--- a/spec/SUMMARY.md
+++ b/spec/SUMMARY.md
@@ -1,13 +1,16 @@
# Summary
[About these specifications](./README.md)
-[A short introduction to Tor](./overview.md)
+
+# Introduction
+
+- [A short introduction to Tor](./intro/index.md)
+ - [Notation and conventions](./intro/conventions.md)
# The core Tor protocol
- [`Tor Protocol Specification`](./tor-spec/index.md)
- [Preliminaries](./tor-spec/preliminaries.md)
- - [System overview](./tor-spec/system-overview.md)
- [Relay keys and identities](./tor-spec/relay-keys.md)
- [Channels](./tor-spec/channels.md)
- [Negotiating and initializing channels](./tor-spec/negotiating-channels.md)
diff --git a/spec/intro/conventions.md b/spec/intro/conventions.md
new file mode 100644
index 0000000..8a92521
--- /dev/null
+++ b/spec/intro/conventions.md
@@ -0,0 +1,112 @@
+# Notation and conventions
+
+These conventions apply,
+at least in theory,
+to all of the specification documents
+unless stated otherwise.
+
+> Remember, our specification documents
+> were once a collection of separate text files,
+> written separately
+> and edited over the course of years.
+>
+> While we are trying (as of 2023)
+> to edit them into consistency,
+> you should be aware that these conventions
+> are not now followed uniformly everywhere.
+
+## MUST, SHOULD, and so on {#rfc2119}
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
+NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
+"OPTIONAL" in this document are to be interpreted as described in
+[RFC 2119](https://datatracker.ietf.org/doc/html/rfc2119).
+
+## Data lengths {#data-lengths}
+
+Unless otherwise stated,
+all lengths are given as a number of 8-bit bytes.
+
+> All bytes are 8 bits long.
+> We sometimes call them "octets";
+> the terms as used here are interchangeable.
+
+When referring to longer lengths,
+we use [SI binary prefixes](https://en.wikipedia.org/wiki/Binary_prefix)
+(as in "kibibytes", "mebibytes", and so on)
+to refer unambiguously to increments of 1024<sup>X</sup> bytes.
+
+> If you encounter a reference
+> to "kilobytes", "megabytes", or so on,
+> you cannot safely infer whether the author intended
+> a decimal (1000<sup>N</sup>) or binary (1024<sup>N</sup>) interpretation.
+> In these cases, it is better to revise the specifications.
+
+<a id="tor-spec.txt-0.1.1"></a>
+
+## Integer encoding {#integers}
+
+Unless otherwise stated,
+all multi-byte integers are encoded
+in big-endian ("network") order.
+
+> For example, 4660 (0x1234),
+> when encoded as a two-byte integer,
+> is the byte 0x12 followed by the byte 0x34. (\[12 34\])
+>
+> When encoded as a four-byte integer,
+> it is the byte 0x00, the byte 0x00, the byte 0x12, and the byte 0x34.
+> (\[00 00 12 34\]).
+
+## Binary-as-text encodings {#binascii}
+
+When we refer to "base64", "base32", or "base16",
+we mean the encodings described in
+[RFC 4648](https://datatracker.ietf.org/doc/html/rfc4648),
+with the following notes:
+
+- In base32, we never insert linefeeds in base32,
+ and we omit trailing `=` padding characters.
+- In base64,
+ we _sometimes_ omit trailing `=` padding characters,
+ and we do not insert linefeeds unless explicitly noted.
+- We do not insert any other whitespace,
+ except as specifically noted.
+
+Base 16 and base 32 are case-insensitive.
+Unless otherwise stated,
+implementations should accept any cases,
+and should produce a single uniform case.
+
+We sometimes refer to base16 as "hex" or "hexadecimal".
+
+> Note that as of 2023, in some places, the specs are not always
+> explicit about:
+>
+> - which base64 strings are multiline
+> - which base32 strings and base16 strings
+> should be generated in what case.
+>
+> This is something we should correct.
+
+## Notation {#notation}
+
+### Operations on byte strings {#ops}
+
+* `A | B` represents the concatenation of two binary strings `A` and `B`.
+
+### Binary literals {#binary-literals}
+
+When we write a series of one-byte hexadecimal literals
+in square brackets,
+it represents a multi-byte binary string.
+
+> For example,
+> `[6f 6e 69 6f 6e 20 72 6f 75 74 69 6e 67]`
+> is a 13-byte sequence representing the unterminated ASCII string,
+> `onion routing`.
+
+
+
+
+
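
Review note on the new `conventions.md`: the integer-encoding, binary-literal, and base32 conventions added above can be checked against each other with a short Python sketch. The helper names (`encode_uint`, `parse_literal`, `base32_unpadded`) are hypothetical and do not come from any Tor implementation. The choice of lowercase output for base32 is an assumption made only for illustration; the text asks only for "a single uniform case".

```python
import base64


def encode_uint(value: int, width: int) -> bytes:
    """Encode a non-negative integer in big-endian ("network") order."""
    return value.to_bytes(width, "big")


def parse_literal(text: str) -> bytes:
    """Parse a bracketed literal such as "[12 34]" into a byte string.

    This is lenient: bytes.fromhex() also accepts hex digits that are
    not separated by spaces.
    """
    inner = text.strip()
    if not (inner.startswith("[") and inner.endswith("]")):
        raise ValueError("expected a literal in square brackets")
    return bytes.fromhex(inner[1:-1])


def base32_unpadded(data: bytes) -> str:
    """RFC 4648 base32 with trailing '=' padding removed.

    Lowercase output is an assumption for this sketch, not a rule
    stated in the specification.
    """
    return base64.b32encode(data).decode("ascii").rstrip("=").lower()


if __name__ == "__main__":
    # 4660 (0x1234) as a two-byte and as a four-byte integer.
    print(encode_uint(4660, 2).hex(" "))
    print(encode_uint(4660, 4).hex(" "))
    # `A | B` in the spec's notation is plain byte-string concatenation.
    print(parse_literal("[6f 6e 69 6f 6e]") + parse_literal("[20]"))
    print(base32_unpadded(b"f"))
```

The two-byte and four-byte encodings of 4660 should match the `[12 34]` and `[00 00 12 34]` literals given in the "Integer encoding" example.
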
diff --git a/spec/overview.md b/spec/intro/index.md
index f6f3d5d..894d7e0 100644
--- a/spec/overview.md
+++ b/spec/intro/index.md
@@ -19,14 +19,14 @@ it finds a cache by looking at a list of stable cache locations,
distributed along with its source code.)
> For more information on the directory subsystem,
-> see the [directory protocol specification](./dir-spec).
+> see the [directory protocol specification](../dir-spec).
After the client knows the relays on the network,
-it can pick a relay and open a [**channel**](./tor-spec/channels.md)
+it can pick a relay and open a [**channel**](../tor-spec/channels.md)
to one of these relays.
A channel is an encrypted reliable non-anonymous transport
between a client and a relay or a relay and a relay,
-used to transmit messages called [**cells**](./tor-spec/cell-packet-format.md).
+used to transmit messages called [**cells**](../tor-spec/cell-packet-format.md).
(Under the hood, a channel is just a TLS connection over TCP,
with a specified encoding for cells.)
@@ -36,37 +36,37 @@ and opens a channel to the first relay on the path
(if it does not already have a channel open to that relay).
The client then uses that channel to build
a multi-hop cryptographic structure
-called a [**circuit**](./tor-spec/circuit-management.md).
+called a [**circuit**](../tor-spec/circuit-management.md).
A circuit is built over a sequence of relays (typically three).
Every relay in the circuit knows its precessor and successor,
but no other relays in the circuit.
Many circuits can be multiplexed over a single channel.
> For more information on how paths are selected,
-> see the [path specification](./path-spec).
+> see the [path specification](../path-spec).
> The first hop on a path,
> also called a **guard node**,
> has complicated rules for its selection;
-> for more on those, see the [guard specification](./guard-spec).
+> for more on those, see the [guard specification](../guard-spec).
Once a circuit exists,
the client can use it to exchange fixed-length
-[**relay cells**](./tor-spec/relay-cells.md)
+[**relay cells**](../tor-spec/relay-cells.md)
with any relay on the circuit.
These relay cells are wrapped in multiple layers of encryption:
as part of building the circuit,
-the client [negotiates](./tor-spec/create-created-cells.md)
+the client [negotiates](../tor-spec/create-created-cells.md)
a separate set of symmetric keys
with each relay on the circuit.
Each relay removes (or adds)
-a [single layer of encryption](./tor-spec/routing-relay-cells.md)
+a [single layer of encryption](../tor-spec/routing-relay-cells.md)
for each relay cell before passing it on.
A client uses these relay cells
-to exchange [**relay messages**](./tor-spec/relay-cells.md) with relays on a circuit.
+to exchange [**relay messages**](../tor-spec/relay-cells.md) with relays on a circuit.
These "relay messages" in turn are used
to actually deliver traffic over the network.
-In the [simplest use case](./tor-spec/opening-streams.md),
+In the [simplest use case](../tor-spec/opening-streams.md),
the client sends a `BEGIN` message
to tell the last relay on the circuit
(called the **exit node**)
@@ -84,7 +84,7 @@ to represent the contents of the anonymized stream.
> This is because, until recently,
> there was a 1-to-1 relationship between the two:
> every relay cell held a single relay message.
-> As [proposal 340](proposals/340-packed-and-fragmented.md) is implemented,
+> As [proposal 340](../proposals/340-packed-and-fragmented.md) is implemented,
> we will revise the specifications
> for improved clarify on this point.
@@ -103,7 +103,7 @@ depending on capacity and performance.
> For more on conflux,
> which has been integrated into the C tor implementation,
> but not yet (as of 2023) into this document,
-> see [proposal 329](proposals/329-traffic-splitting.txt).
+> see [proposal 329](../proposals/329-traffic-splitting.txt).
### Advanced topics: Onion services and responder anonymity {#onions}
@@ -117,7 +117,7 @@ is called **onion services**
in some older documentation).
> For the details on onion services,
-> see the [Tor Rendezvous Specification](./rend-spec).
+> see the [Tor Rendezvous Specification](../rend-spec).
### Advanced topics: Censorship resistence {#anticensorship}
@@ -129,12 +129,12 @@ and by blocking traffic that resembles Tor.
To resist this censorship,
some Tor relays, called **bridges**,
are unlisted in the public directory:
-their addresses are distributed by [other means](./bridgedb-spec.md).
+their addresses are distributed by [other means](../bridgedb-spec.md).
(To distinguish ordinary published relays from bridges,
we sometimes call them **public relays**.)
Additionally, Tor clients and bridges can use extension programs,
-called [**pluggable transports**](./pt-spec),
+called [**pluggable transports**](../pt-spec),
that obfuscate their traffic to make it harder to detect.
diff --git a/spec/tor-spec/index.md b/spec/tor-spec/index.md
index a81e1b9..219ea1d 100644
--- a/spec/tor-spec/index.md
+++ b/spec/tor-spec/index.md
@@ -9,3 +9,4 @@ Tor as they become obsolete.
This specification is not a design document; most design criteria
are not examined. For more information on why Tor acts as it does,
see tor-design.pdf.
+
diff --git a/spec/tor-spec/opening-streams.md b/spec/tor-spec/opening-streams.md
index d59e26b..757f776 100644
--- a/spec/tor-spec/opening-streams.md
+++ b/spec/tor-spec/opening-streams.md
@@ -26,8 +26,9 @@ fingerprinting. Implementations MUST accept strings in any case.
The FLAGS value has one or more of the following bits set, where
"bit 1" is the LSB of the 32-bit value, and "bit 32" is the MSB.
-(Remember that all values in Tor are big-endian (see
-["Preliminaries » Encoding integers"](./preliminaries.md#encoding)), so
+(Remember that
+[all integers in Tor are big-endian](../intro/conventions.md),
+so
the MSB of a 4-byte value is the MSB of the first byte, and the LSB
of a 4-byte value is the LSB of its last byte.)
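
Review note on the FLAGS hunk above: the bit numbering it describes (bit 1 is the LSB, bit 32 is the MSB, big-endian on the wire) can be sketched in Python as follows. The function name `flag_is_set` is hypothetical, and the sketch assumes the FLAGS field has already been extracted as exactly four bytes.

```python
def flag_is_set(flags_field: bytes, bit: int) -> bool:
    """Test a 1-indexed bit of a 4-byte big-endian FLAGS value.

    Bit 1 is the least significant bit of the last byte on the wire;
    bit 32 is the most significant bit of the first byte.
    """
    if len(flags_field) != 4:
        raise ValueError("FLAGS must be exactly 4 bytes")
    if not 1 <= bit <= 32:
        raise ValueError("bit must be in the range 1..32")
    value = int.from_bytes(flags_field, "big")
    return bool((value >> (bit - 1)) & 1)


if __name__ == "__main__":
    print(flag_is_set(bytes([0x00, 0x00, 0x00, 0x01]), 1))
    print(flag_is_set(bytes([0x80, 0x00, 0x00, 0x00]), 32))
```
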
diff --git a/spec/tor-spec/preliminaries.md b/spec/tor-spec/preliminaries.md
index 8cc92b7..78e6e80 100644
--- a/spec/tor-spec/preliminaries.md
+++ b/spec/tor-spec/preliminaries.md
@@ -2,13 +2,6 @@
# Preliminaries
-```text
- The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
- NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
- "OPTIONAL" in this document are to be interpreted as described in
- RFC 2119.
-```
-
<a id="tor-spec.txt-0.1"></a>
## Notation and encoding{#notation-and-encoding}
@@ -19,30 +12,10 @@
K -- a key for a symmetric cipher.
N -- a "nonce", a random value, usually deterministically chosen
from other inputs using hashing.
-
- a|b -- concatenation of 'a' and 'b'.
```
-\[A0 B1 C2\] -- a three-byte sequence, containing the bytes with
-hexadecimal values A0, B1, and C2, in that order.
-
H(m) -- a cryptographic hash of m.
-We use "byte" and "octet" interchangeably. Possibly we shouldn't.
-
-Some specs mention "base32". This means RFC4648, without "=" padding.
-
-<a id="tor-spec.txt-0.1.1"></a>
-
-### Encoding integers{#encoding}
-
-Unless we explicitly say otherwise below, all numeric values in the
-Tor protocol are encoded in network (big-endian) order. So a "32-bit
-integer" means a big-endian 32-bit integer; a "2-byte" integer means
-a big-endian 16-bit integer, and so forth.
-
-<a id="tor-spec.txt-0.2"></a>
-
## Security parameters
Tor uses a stream cipher, a public-key cipher, the Diffie-Hellman
diff --git a/spec/tor-spec/system-overview.md b/spec/tor-spec/system-overview.md
deleted file mode 100644
index 17cc1a4..0000000
--- a/spec/tor-spec/system-overview.md
+++ /dev/null
@@ -1,10 +0,0 @@
-<a id="tor-spec.txt-1"></a>
-
-# System overview
-
-Tor is a distributed overlay network designed to anonymize
-low-latency TCP-based applications such as web browsing, secure shell,
-and instant messaging. Clients choose a path through the network and
-build a `circuit'', in which each node (or`onion router'' or `OR'') in the path knows its predecessor and successor, but no other nodes in the circuit. Traffic flowing down the circuit is sent in fixed-size `cells'', which are unwrapped by a symmetric key at each node (like
-the layers of an onion) and relayed downstream.
-