aboutsummaryrefslogtreecommitdiff
path: root/spec/intro/conventions.md
diff options
context:
space:
mode:
Diffstat (limited to 'spec/intro/conventions.md')
-rw-r--r--spec/intro/conventions.md112
1 files changed, 112 insertions, 0 deletions
diff --git a/spec/intro/conventions.md b/spec/intro/conventions.md
new file mode 100644
index 0000000..8a92521
--- /dev/null
+++ b/spec/intro/conventions.md
@@ -0,0 +1,112 @@
+# Notation and conventions
+
+These conventions apply,
+at least in theory,
+to all of the specification documents
+unless stated otherwise.
+
+> Remember, our specification documents
+> were once a collection of separate text files,
+> written separately
+> and edited over the course of years.
+>
+> While we are trying (as of 2023)
+> to edit them into consistency,
+> you should be aware that these conventions
+> are not now followed uniformly everywhere.
+
+## MUST, SHOULD, and so on {#rfc2119}
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
+NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
+"OPTIONAL" in this document are to be interpreted as described in
+[RFC 2119](https://datatracker.ietf.org/doc/html/rfc2119).
+
+## Data lengths {#data-lengths}
+
+Unless otherwise stated,
+all lengths are given as a number of 8-bit bytes.
+
+> All bytes are 8 bits long.
+> We sometimes call them "octets";
+> the terms as used here are interchangeable.
+
+When referring to longer lengths,
+we use [SI binary prefixes](https://en.wikipedia.org/wiki/Binary_prefix)
+(as in "kibibytes", "mebibytes", and so on)
+to refer unambiguously to increments of 1024<sup>X</sup> bytes.
+
+> If you encounter a reference
+> to "kilobytes", "megabytes", or so on,
+> you cannot safely infer whether the author intended
+> a decimal (1000<sup>N</sup>) or binary (1024<sup>N</sup>) interpretation.
+> In these cases, it is better to revise the specifications.
+
+<a id="tor-spec.txt-0.1.1"></a>
+
+## Integer encoding {#integers}
+
+Unless otherwise stated,
+all multi-byte integers are encoded
+in big-endian ("network") order.
+
+> For example, 4660 (0x1234),
+> when encoded as a two-byte integer,
+> is the byte 0x12 followed by the byte 0x34. (\[12 34\])
+>
+> When encoded as a four-byte integer,
+> it is the byte 0x00, the byte 0x00, the byte 0x12, and the byte 0x34.
+> (\[00 00 12 34\]).
+
+## Binary-as-text encodings {#binascii}
+
+When we refer to "base64", "base32", or "base16",
+we mean the encodings described in
+[RFC 4648](https://datatracker.ietf.org/doc/html/rfc4648),
+with the following notes:
+
+- In base32, we never insert linefeeds in base32,
+ and we omit trailing `=` padding characters.
+- In base64,
+ we _sometimes_ omit trailing `=` padding characters,
+ and we do not insert linefeeds unless explicitly noted.
+- We do not insert any other whitespace,
+ except as specifically noted.
+
+Base 16 and base 32 are case-insensitive.
+Unless otherwise stated,
+implementations should accept any cases,
+and should produce a single uniform case.
+
+We sometimes refer to base16 as "hex" or "hexadecimal".
+
+> Note that as of 2023, in some places, the specs are not always
+> explicit about:
+>
+> - which base64 strings are multiline
+> - which base32 strings and base16 strings
+> should be generated in what case.
+>
+> This is something we should correct.
+
+## Notation {#notation}
+
+### Operations on byte strings {#ops}
+
+* `A | B` represents the concatenation of two binary strings `A` and `B`.
+
+### Binary literals {#binary-literals}
+
+When we write a series of one-byte hexadecimal literals
+in square brackets,
+it represents a multi-byte binary string.
+
+> For example,
+> `[6f 6e 69 6f 6e 20 72 6f 75 74 69 6e 67]`
+> is a 13-byte sequence representing the unterminated ASCII string,
+> `onion routing`.
+
+
+
+
+