# Notation and conventions

These conventions apply,
at least in theory,
to all of the specification documents
unless stated otherwise.

> Remember, our specification documents
> were once a collection of separate text files,
> written separately
> and edited over the course of years.
>
> While we are trying (as of 2023)
> to edit them into consistency,
> you should be aware that these conventions
> are not yet followed uniformly everywhere.

## MUST, SHOULD, and so on {#rfc2119}

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in
[RFC 2119](https://datatracker.ietf.org/doc/html/rfc2119).

## Data lengths {#data-lengths}

Unless otherwise stated,
all lengths are given as a number of 8-bit bytes.

> All bytes are 8 bits long.
> We sometimes call them "octets";
> the terms as used here are interchangeable.

When referring to longer lengths,
we use [IEC binary prefixes](https://en.wikipedia.org/wiki/Binary_prefix)
(as in "kibibytes", "mebibytes", and so on)
to refer unambiguously to increments of 1024<sup>N</sup> bytes.

> If you encounter a reference
> to "kilobytes", "megabytes", and so on,
> you cannot safely infer whether the author intended
> a decimal (1000<sup>N</sup>) or binary (1024<sup>N</sup>) interpretation.
> In these cases, it is better to revise the specifications.
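
As a rough illustration of why the distinction matters,
the following Python sketch computes a few binary-prefixed sizes
and compares one of them with its decimal counterpart.
The constant names are chosen for this example only;
they are not defined anywhere in the specifications.

```python
# Binary prefixes denote powers of 1024.
KIBIBYTE = 1024 ** 1
MEBIBYTE = 1024 ** 2
GIBIBYTE = 1024 ** 3

# A decimal "megabyte", shown for comparison only.
DECIMAL_MEGABYTE = 1000 ** 2

# The gap between the two possible readings of "mega", in bytes.
MEGA_GAP = MEBIBYTE - DECIMAL_MEGABYTE
```

The gap grows with each prefix,
which is why an ambiguous "megabytes" or "gigabytes" cannot be safely guessed at.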

<a id="tor-spec.txt-0.1.1"></a>

## Integer encoding {#integers}

Unless otherwise stated,
all multi-byte integers are encoded
in big-endian ("network") order.

> For example, 4660 (0x1234),
> when encoded as a two-byte integer,
> is the byte 0x12 followed by the byte 0x34
> (\[12 34\]).
>
> When encoded as a four-byte integer,
> it is the byte 0x00, the byte 0x00, the byte 0x12, and the byte 0x34
> (\[00 00 12 34\]).
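
The same example can be reproduced in Python
with the built-in `int.to_bytes` and `int.from_bytes` methods.
This is only a sketch of the convention;
the variable names are illustrative and do not come from the specifications.

```python
# Big-endian ("network") encoding of 4660 (0x1234).
value = 4660

# Most significant byte first, at two different widths.
two_bytes = value.to_bytes(2, "big")
four_bytes = value.to_bytes(4, "big")

# Decoding with the same byte order recovers the original integer.
decoded = int.from_bytes(four_bytes, "big")
```

Here `two_bytes` is \[12 34\] and `four_bytes` is \[00 00 12 34\],
matching the example above.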

## Binary-as-text encodings {#binascii}

When we refer to "base64", "base32", or "base16",
we mean the encodings described in
[RFC 4648](https://datatracker.ietf.org/doc/html/rfc4648),
with the following notes:

- In base32, we never insert linefeeds,
  and we omit trailing `=` padding characters.
- In base64,
  we _sometimes_ omit trailing `=` padding characters,
  and we do not insert linefeeds unless explicitly noted.
- We do not insert any other whitespace,
  except as specifically noted.

Base16 and base32 are case-insensitive.
Unless otherwise stated,
implementations should accept either case,
and should produce a single uniform case.

We sometimes refer to base16 as "hex" or "hexadecimal".
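
A minimal Python sketch of these rules,
using only the standard `base64` module, might look like the following.
The helper names are hypothetical,
and the choice of lowercase output is an assumption made for this example:
the text above requires only that output use a single uniform case.

```python
import base64


def b32_encode_unpadded(data: bytes) -> str:
    """Base32 with no linefeeds and no trailing '=' padding (lowercase here)."""
    return base64.b32encode(data).decode("ascii").rstrip("=").lower()


def b32_decode_any_case(text: str) -> bytes:
    """Accept either case, restoring the padding that was omitted."""
    padded = text.upper() + "=" * (-len(text) % 8)
    return base64.b32decode(padded)


def b64_decode_optional_padding(text: str) -> bytes:
    """Accept base64 with or without its trailing '=' padding."""
    padded = text + "=" * (-len(text) % 4)
    return base64.b64decode(padded)


def b16_encode(data: bytes) -> str:
    """Base16 ("hex") in a single uniform case (lowercase here)."""
    return data.hex()


def b16_decode_any_case(text: str) -> bytes:
    """bytes.fromhex accepts upper-case and lower-case digits alike."""
    return bytes.fromhex(text)
```

For instance, `b32_encode_unpadded(b"hi")` yields `nbuq`,
and `b32_decode_any_case` accepts both `nbuq` and `NBUQ`.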

> Note that as of 2023, the specs are not always
> explicit about:
>
>   - which base64 strings are multiline
>   - which base32 strings and base16 strings
>     should be generated in what case.
>
> This is something we should correct.

## Notation {#notation}

### Operations on byte strings {#ops}

* `A | B` represents the concatenation of two binary strings `A` and `B`.
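
In Python, for illustration,
byte-string concatenation is written with `+`;
Python's own `|` operator is not defined for `bytes`.
The variable names below are placeholders for this sketch.

```python
a = bytes([0x12, 0x34])
b = bytes([0x56])

# Specification notation: A | B
concatenated = a + b
```

The result is the three-byte string \[12 34 56\]:
all of `A`, followed immediately by all of `B`.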

### Binary literals {#binary-literals}

When we write a series of one-byte hexadecimal literals
in square brackets,
it represents a multi-byte binary string.

> For example,
> `[6f 6e 69 6f 6e 20 72 6f 75 74 69 6e 67]`
> is a 13-byte sequence representing the unterminated ASCII string
> `onion routing`.
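
To make the notation concrete,
here is a small Python sketch that converts such a bracketed literal
into a byte string.
The function `parse_binary_literal` is hypothetical,
written for this example, and is not part of any Tor implementation.

```python
def parse_binary_literal(text: str) -> bytes:
    """Convert a literal such as "[6f 6e]" into the bytes it denotes."""
    inner = text.strip()
    if not (inner.startswith("[") and inner.endswith("]")):
        raise ValueError("literal must be enclosed in square brackets")

    # Each whitespace-separated token is one byte: exactly two hex digits.
    tokens = inner[1:-1].split()
    if any(len(token) != 2 for token in tokens):
        raise ValueError("each byte must be written as two hex digits")

    # bytes.fromhex rejects any token that is not valid hexadecimal.
    return bytes.fromhex("".join(tokens))


literal = "[6f 6e 69 6f 6e 20 72 6f 75 74 69 6e 67]"
parsed = parse_binary_literal(literal)
```

With the literal from the example above,
`parsed` is the 13-byte string `b"onion routing"`.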