Diffstat (limited to 'spec/intro/conventions.md')
-rw-r--r-- | spec/intro/conventions.md | 112 |
1 files changed, 112 insertions, 0 deletions
diff --git a/spec/intro/conventions.md b/spec/intro/conventions.md
new file mode 100644
index 0000000..8a92521
--- /dev/null
+++ b/spec/intro/conventions.md
@@ -0,0 +1,112 @@
+# Notation and conventions
+
+These conventions apply,
+at least in theory,
+to all of the specification documents
+unless stated otherwise.
+
+> Remember, our specification documents
+> were once a collection of separate text files,
+> written separately
+> and edited over the course of years.
+>
+> While we are trying (as of 2023)
+> to edit them into consistency,
+> you should be aware that these conventions
+> are not yet followed uniformly everywhere.
+
+## MUST, SHOULD, and so on {#rfc2119}
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
+NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
+"OPTIONAL" in this document are to be interpreted as described in
+[RFC 2119](https://datatracker.ietf.org/doc/html/rfc2119).
+
+## Data lengths {#data-lengths}
+
+Unless otherwise stated,
+all lengths are given as a number of 8-bit bytes.
+
+> All bytes are 8 bits long.
+> We sometimes call them "octets";
+> the terms as used here are interchangeable.
+
+When referring to longer lengths,
+we use [SI binary prefixes](https://en.wikipedia.org/wiki/Binary_prefix)
+(as in "kibibytes", "mebibytes", and so on)
+to refer unambiguously to increments of 1024<sup>N</sup> bytes.
+
+> If you encounter a reference
+> to "kilobytes", "megabytes", and so on,
+> you cannot safely infer whether the author intended
+> a decimal (1000<sup>N</sup>) or binary (1024<sup>N</sup>) interpretation.
+> In these cases, it is better to revise the specifications.
+
+<a id="tor-spec.txt-0.1.1"></a>
+
+## Integer encoding {#integers}
+
+Unless otherwise stated,
+all multi-byte integers are encoded
+in big-endian ("network") order.
+
+> For example, 4660 (0x1234),
+> when encoded as a two-byte integer,
+> is the byte 0x12 followed by the byte 0x34 (\[12 34\]).
+>
+> When encoded as a four-byte integer,
+> it is the byte 0x00, the byte 0x00, the byte 0x12, and the byte 0x34
+> (\[00 00 12 34\]).
+
+## Binary-as-text encodings {#binascii}
+
+When we refer to "base64", "base32", or "base16",
+we mean the encodings described in
+[RFC 4648](https://datatracker.ietf.org/doc/html/rfc4648),
+with the following notes:
+
+- In base32, we never insert linefeeds,
+  and we omit trailing `=` padding characters.
+- In base64,
+  we _sometimes_ omit trailing `=` padding characters,
+  and we do not insert linefeeds unless explicitly noted.
+- We do not insert any other whitespace,
+  except as specifically noted.
+
+Base16 and base32 are case-insensitive.
+Unless otherwise stated,
+implementations should accept any case,
+and should produce a single uniform case.
+
+We sometimes refer to base16 as "hex" or "hexadecimal".
+
+> Note that as of 2023, in some places, the specs are not always
+> explicit about:
+>
+> - which base64 strings are multiline
+> - which base32 strings and base16 strings
+>   should be generated in which case.
+>
+> This is something we should correct.
+
+## Notation {#notation}
+
+### Operations on byte strings {#ops}
+
+* `A | B` represents the concatenation of two binary strings `A` and `B`.
+
+### Binary literals {#binary-literals}
+
+When we write a series of one-byte hexadecimal literals
+in square brackets,
+it represents a multi-byte binary string.
+
+> For example,
+> `[6f 6e 69 6f 6e 20 72 6f 75 74 69 6e 67]`
+> is a 13-byte sequence representing the unterminated ASCII string,
+> `onion routing`.
+
+
+
+
+
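The conventions introduced by this file can be checked with a short Python sketch. It is not part of the patch, and the helper names (`encode_uint`, `parse_literal`, `base32_unpadded`, `base32_decode_any_case`) are hypothetical, chosen only for illustration. Emitting lowercase base32 is likewise an assumption: the text asks only for a single uniform case. The sketch shows big-endian integer encoding, the bracketed binary-literal notation, concatenation (`A | B` in the spec, `+` on `bytes` in Python), and unpadded base32.

```python
import base64


def encode_uint(value: int, length: int) -> bytes:
    """Encode a non-negative integer as `length` bytes in big-endian order."""
    return value.to_bytes(length, "big")


def parse_literal(text: str) -> bytes:
    """Turn a bracketed literal such as "[12 34]" into a byte string."""
    # bytes.fromhex() ignores the spaces between the one-byte hex literals.
    return bytes.fromhex(text.strip().strip("[]"))


def base32_unpadded(data: bytes) -> str:
    """RFC 4648 base32 with no linefeeds and no trailing '=' padding.

    Lowercase output is an assumption made for this sketch.
    """
    return base64.b32encode(data).decode("ascii").rstrip("=").lower()


def base32_decode_any_case(text: str) -> bytes:
    """Accept base32 in any case, restoring the padding the encoder dropped."""
    padded = text.upper() + "=" * (-len(text) % 8)
    return base64.b32decode(padded)


# 4660 (0x1234) as a two-byte and as a four-byte integer.
two_bytes = encode_uint(4660, 2)
four_bytes = encode_uint(4660, 4)
assert two_bytes == parse_literal("[12 34]")
assert four_bytes == parse_literal("[00 00 12 34]")

# Concatenation: the spec's `A | B` is `A + B` on Python bytes.
assert parse_literal("[12]") + parse_literal("[34]") == two_bytes

# The 13-byte literal from the binary-literals example.
onion = parse_literal("[6f 6e 69 6f 6e 20 72 6f 75 74 69 6e 67]")
assert onion == b"onion routing"

# Unpadded base32 round-trips regardless of input case.
encoded = base32_unpadded(onion)
assert base32_decode_any_case(encoded.upper()) == onion
```

Restoring the padding before decoding is needed only because Python's `base64.b32decode` expects it. Other base32 decoders may accept unpadded input directly.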