<a id="control-spec.txt-2"></a>

# Message format

<a id="control-spec.txt-2.1"></a>

## Description format

The message formats listed below use ABNF as described in RFC 2234.
The protocol itself is loosely based on SMTP (see RFC 2821).

We use the following nonterminals from RFC 2822: atom, qcontent.

We define the following general-use nonterminals:

QuotedString = DQUOTE \*qcontent DQUOTE

There are explicitly no limits on line length.  All 8-bit characters
are permitted unless explicitly disallowed.  In QuotedStrings,
backslashes and quotes must be escaped; other characters need not be
escaped.
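
As an illustration of the escaping rule above, a controller might build a
QuotedString as in the following minimal sketch.  The function name
`quote_string` is hypothetical and not part of this specification.

```python
def quote_string(value: str) -> str:
    """Wrap value in double quotes, escaping backslashes and double quotes.

    Backslashes are escaped first so that the backslashes added in front of
    quotes are not themselves escaped a second time.
    """
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'
```

Other characters are passed through unchanged, since they need not be escaped.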

Wherever CRLF is specified to be accepted from the controller, Tor MAY also
accept LF.  Tor, however, MUST NOT generate LF instead of CRLF.
Controllers SHOULD always send CRLF.

<a id="control-spec.txt-2.1.1"></a>

### Notes on an escaping bug

CString = DQUOTE \*qcontent DQUOTE

Note that although these nonterminals have the same grammar, they
are interpreted differently.  In a QuotedString, a backslash
followed by any character represents that character.  But
in a CString, the escapes "\\n", "\\t", "\\r", and the octal escapes
"\\0" ... "\\377" represent newline, tab, carriage return, and the
256 possible octet values respectively.

The uses of CString in this document reflect a bug in Tor;
they should have been QuotedStrings instead.  In the future, they
may migrate to QuotedString.  If they do, the
QuotedString implementation will never place a backslash before an
"n", "t", "r", or digit, to ensure that old controllers don't get
confused.

For future-proofing, controller implementors MAY use the following
rules to be compatible with buggy Tor implementations and with
future ones that implement the spec as intended:

```text
    Read \n \t \r and \0 ... \377 as C escapes.
    Treat a backslash followed by any other character as that character.
```
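
The two rules above could be implemented roughly as follows.  This is a
sketch under assumptions, not a reference decoder: the name `decode_compat`
is hypothetical, the input is assumed to be the complete quoted token
(including both double quotes) as raw bytes, and an octal escape is assumed
to consume at most three octal digits with a value no larger than 377.

```python
BACKSLASH = 0x5C
C_ESCAPES = {ord("n"): 0x0A, ord("t"): 0x09, ord("r"): 0x0D}
OCTAL_DIGITS = b"01234567"


def decode_compat(quoted: bytes) -> bytes:
    """Decode a quoted token using the compatibility rules."""
    if len(quoted) < 2 or not (quoted.startswith(b'"') and quoted.endswith(b'"')):
        raise ValueError("expected a double-quoted string")
    body = quoted[1:-1]
    out = bytearray()
    i = 0
    while i < len(body):
        byte = body[i]
        i += 1
        if byte != BACKSLASH:
            out.append(byte)
            continue
        if i == len(body):
            raise ValueError("dangling backslash")
        esc = body[i]
        if esc in C_ESCAPES:
            # \n, \t, \r are read as C escapes.
            out.append(C_ESCAPES[esc])
            i += 1
        elif esc in OCTAL_DIGITS:
            # Octal escape: take up to three octal digits (assumption).
            end = i + 1
            while end < len(body) and end - i < 3 and body[end] in OCTAL_DIGITS:
                end += 1
            if int(body[i:end], 8) > 0o377:
                end -= 1  # keep the value within one octet
            out.append(int(body[i:end], 8))
            i = end
        else:
            # A backslash followed by any other character is that character.
            out.append(esc)
            i += 1
    return bytes(out)
```

The sketch does not reject an unescaped double quote inside the body; a
stricter parser may want to treat that as an error.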

Currently, many of the QuotedString instances below that Tor
outputs are in fact CStrings.  We intend to fix this in future
versions of Tor, and document which ones were broken.  (See
bugtracker ticket #14555 for a bit more information.)

Note that this bug exists only in strings generated by Tor for the
Tor controller; Tor should parse input QuotedStrings from the
controller correctly.

<a id="control-spec.txt-2.2"></a>

## Commands from controller to Tor { #commands }

```text
    Command = Keyword OptArguments CRLF / "+" Keyword OptArguments CRLF CmdData
    Keyword = 1*ALPHA
    OptArguments = [ SP *(SP / VCHAR) ]
```

A command is either a single line containing a Keyword and arguments, or a
multiline command whose initial keyword is prefixed with "+" and whose data
section ends with a single "." on a line of its own.  (We use a special
character to distinguish multiline commands so that Tor can correctly parse
multiline commands that it does not recognize.)  Specific commands and
their arguments are described below in section 3.
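
The following sketch shows one way a controller might serialize both command
forms.  The helper `format_command` is hypothetical, the keywords in the usage
note are only illustrations, and encoding the arguments as UTF-8 is an
assumption.  The argument check is deliberately loose: it only rejects CR and
LF rather than enforcing the OptArguments grammar.

```python
from typing import Optional

CRLF = b"\r\n"


def format_command(keyword: str, arguments: str = "",
                   data: Optional[bytes] = None) -> bytes:
    """Serialize a command; pass data (possibly empty) for the "+" form."""
    if not (keyword.isascii() and keyword.isalpha()):
        raise ValueError("Keyword must be one or more ASCII letters")
    if "\r" in arguments or "\n" in arguments:
        raise ValueError("arguments must not contain CR or LF")
    line = keyword.encode("ascii")
    if arguments:
        line += b" " + arguments.encode("utf-8")
    if data is None:
        return line + CRLF

    # Multiline form: split the payload into lines, escape leading periods,
    # and finish with a "." on a line of its own.
    lines = data.split(CRLF)
    if lines[-1] == b"":
        lines.pop()  # payload was empty or already ended with CRLF
    body = b""
    for data_line in lines:
        if data_line.startswith(b"."):
            data_line = b"." + data_line
        body += data_line + CRLF
    return b"+" + line + CRLF + body + b"." + CRLF
```

For example, `format_command("GETINFO", "version")` yields
`b"GETINFO version\r\n"`, and passing `data=b"a\r\n.b"` with keyword
`"LOADCONF"` yields `b"+LOADCONF\r\na\r\n..b\r\n.\r\n"`.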

<a id="control-spec.txt-2.3"></a>

## Replies from Tor to the controller { #replies }

```text
    Reply = SyncReply / AsyncReply
    SyncReply = *(MidReplyLine / DataReplyLine) EndReplyLine
    AsyncReply = *(MidReplyLine / DataReplyLine) EndReplyLine

    MidReplyLine = StatusCode "-" ReplyLine
    DataReplyLine = StatusCode "+" ReplyLine CmdData
    EndReplyLine = StatusCode SP ReplyLine
    ReplyLine = [ReplyText] CRLF
    ReplyText = XXXX
    StatusCode = 3DIGIT
```

Unless specified otherwise, multiple lines in a single reply from
Tor to the controller are guaranteed to share the same status
code. Specific replies are mentioned below in section 3, and
described more fully in section 4.

\[Compatibility note:  versions of Tor before 0.2.0.3-alpha sometimes
generate AsyncReplies of the form "\*(MidReplyLine / DataReplyLine)".
This is incorrect, but controllers that need to work with these
versions of Tor should be prepared to get multi-line AsyncReplies with
the final line (usually "650 OK") omitted.\]
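
A controller might group reply lines according to the grammar above as in the
following sketch.  The names `ReplyLine` and `read_reply` are hypothetical,
the input is assumed to be an iterable of lines with the trailing CRLF already
removed, and the compatibility case of a missing final line is not handled.

```python
from typing import Iterable, List, NamedTuple, Optional


class ReplyLine(NamedTuple):
    status: int
    text: bytes
    data: Optional[List[bytes]]  # data lines following a "+" line, else None


def read_reply(lines: Iterable[bytes]) -> List[ReplyLine]:
    """Consume lines up to and including the EndReplyLine of one reply."""
    it = iter(lines)
    reply: List[ReplyLine] = []
    for line in it:
        if len(line) < 4 or not line[:3].isdigit():
            raise ValueError(f"malformed reply line: {line!r}")
        status, sep, text = int(line[:3]), line[3:4], line[4:]
        if sep == b" ":
            # EndReplyLine: the reply is complete.
            reply.append(ReplyLine(status, text, None))
            return reply
        if sep == b"-":
            reply.append(ReplyLine(status, text, None))
        elif sep == b"+":
            # DataReplyLine: read the data section up to the lone ".".
            data: List[bytes] = []
            for data_line in it:
                if data_line == b".":
                    break
                # Undo the escaping of leading periods.
                data.append(data_line[1:] if data_line.startswith(b".") else data_line)
            else:
                raise ValueError("data section not terminated")
            reply.append(ReplyLine(status, text, data))
        else:
            raise ValueError(f"unknown separator in reply line: {line!r}")
    raise ValueError("reply ended without an EndReplyLine")
```

Because the function stops at the first EndReplyLine, any lines that follow
are left in the iterator for the next reply.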

<a id="control-spec.txt-2.4"></a>

## General-use tokens { #tokens }

; CRLF means, "the ASCII Carriage Return character (decimal value 13)
; followed by the ASCII Linefeed character (decimal value 10)."
CRLF = CR LF

; How a controller tells Tor about a particular OR.  There are four
; possible formats:
;    $Fingerprint -- The router whose identity key hashes to the fingerprint.
;        This is the preferred way to refer to an OR.
;    $Fingerprint~Nickname -- The router whose identity key hashes to the
;        given fingerprint, but only if the router has the given nickname.
;    $Fingerprint=Nickname -- The router whose identity key hashes to the
;        given fingerprint, but only if the router is Named and has the given
;        nickname.
;    Nickname -- The Named router with the given nickname, or, if no such
;        router exists, any router whose nickname matches the one given.
;        This is not a safe way to refer to routers, since Named status
;        could under some circumstances change over time.
;
; The tokens that implement the above follow:

ServerSpec = LongName / Nickname
LongName   = Fingerprint \[ "~" Nickname \]

; For Tor versions older than 0.3.1.3-alpha, LongName may have included an
; equal sign ("=") in lieu of a tilde ("~").  The presence of an equal sign
; denoted that the OR possessed the "Named" flag:

LongName   = Fingerprint \[ ( "=" / "~" ) Nickname \]

Fingerprint = "$" 40HEXDIG
NicknameChar = "a"-"z" / "A"-"Z" / "0" - "9"
Nickname = 1*19 NicknameChar
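
As an illustration, the ServerSpec forms could be recognized with a regular
expression as sketched below.  The names `ServerSpec` (as a Python type) and
`parse_server_spec` are hypothetical, and the sketch assumes a fingerprint of
exactly 40 hexadecimal digits.  It accepts the older "=" separator as well as
"~".

```python
import re
from typing import NamedTuple, Optional

_NICK = r"[A-Za-z0-9]{1,19}"
_SERVER_SPEC = re.compile(
    r"(?:\$(?P<fp>[0-9A-Fa-f]{40})(?:(?P<sep>[~=])(?P<nick1>" + _NICK + r"))?"
    r"|(?P<nick2>" + _NICK + r"))"
)


class ServerSpec(NamedTuple):
    fingerprint: Optional[str]  # 40 hex digits without the leading "$"
    nickname: Optional[str]
    named: bool                 # True only for the older "=" form


def parse_server_spec(spec: str) -> ServerSpec:
    """Parse $Fingerprint, $Fingerprint~Nickname, $Fingerprint=Nickname, or Nickname."""
    match = _SERVER_SPEC.fullmatch(spec)
    if match is None:
        raise ValueError(f"not a valid ServerSpec: {spec!r}")
    return ServerSpec(
        fingerprint=match["fp"],
        nickname=match["nick1"] or match["nick2"],
        named=match["sep"] == "=",
    )
```

A bare nickname can never be mistaken for a fingerprint here, since a
fingerprint always starts with "$", which is not a NicknameChar.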

; What follows is an outdated way to refer to ORs.
; Feature VERBOSE_NAMES replaces ServerID with LongName in events and
; GETINFO results. VERBOSE_NAMES can be enabled starting in Tor version
; 0.1.2.2-alpha and it is always-on in 0.2.2.1-alpha and later.
ServerID = Nickname / Fingerprint

; Unique identifiers for streams or circuits.  Currently, Tor only
; uses digits, but this may change.
StreamID = 1*16 IDChar
CircuitID = 1*16 IDChar
ConnID = 1*16 IDChar
QueueID = 1*16 IDChar
IDChar = ALPHA / DIGIT

Address = ip4-address / ip6-address / hostname   (XXXX Define these)

; A "CmdData" section is a sequence of octets concluded by the terminating
; sequence CRLF "." CRLF.  The terminating sequence may not appear in the
; body of the data.  Leading periods on lines in the data are escaped with
; an additional leading period as in RFC 2821 section 4.5.2.
CmdData = *DataLine "." CRLF
DataLine = CRLF / "." 1*LineItem CRLF / NonDotItem *LineItem CRLF
LineItem = NonCR / 1*CR NonCRLF
NonDotItem = NonDotCR / 1\*CR NonCRLF
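
Decoding a received CmdData section might look like the following sketch.
The name `decode_cmd_data` is hypothetical, and the input is assumed to be
the complete section as bytes, including the final "." CRLF.  Bare CR
characters inside lines are passed through without further checks.

```python
from typing import List

CRLF = b"\r\n"


def decode_cmd_data(raw: bytes) -> List[bytes]:
    """Return the data lines of a CmdData section with leading-period escaping undone."""
    if not raw.endswith(b"." + CRLF):
        raise ValueError("CmdData must end with '.' CRLF")
    body = raw[:-3]  # drop the terminating "." CRLF
    if body and not body.endswith(CRLF):
        raise ValueError("terminating '.' is not on a line of its own")
    lines = body.split(CRLF)[:-1]  # the last element is always empty
    decoded = []
    for line in lines:
        if line == b".":
            raise ValueError("terminating sequence appears inside the data")
        # Remove the extra leading period added by the sender.
        decoded.append(line[1:] if line.startswith(b".") else line)
    return decoded
```

For example, `decode_cmd_data(b"a\r\n..b\r\n.\r\n")` returns `[b"a", b".b"]`,
and an empty section `b".\r\n"` returns an empty list.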

; ISOTime, ISOTime2, and ISOTime2Frac are time formats as specified in
; ISO8601.
;  example ISOTime:      "2012-01-11 12:15:33"
;  example ISOTime2:     "2012-01-11T12:15:33"
;  example ISOTime2Frac: "2012-01-11T12:15:33.51"
IsoDatePart = 4DIGIT "-" 2DIGIT "-" 2DIGIT
IsoTimePart = 2DIGIT ":" 2DIGIT ":" 2DIGIT
ISOTime  = IsoDatePart " " IsoTimePart
ISOTime2 = IsoDatePart "T" IsoTimePart
ISOTime2Frac = ISOTime2 \[ "." 1\*DIGIT \]
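
The three time formats can be parsed with Python's standard library as in the
sketch below.  The name `parse_iso_time` is hypothetical.  Note two
limitations: `strptime` is somewhat more lenient than the grammar (for
example, it accepts single-digit months), and its `%f` directive accepts at
most six fractional digits.  The result is a naive `datetime` with no time
zone attached.

```python
from datetime import datetime

_FORMATS = (
    "%Y-%m-%d %H:%M:%S",     # ISOTime
    "%Y-%m-%dT%H:%M:%S",     # ISOTime2
    "%Y-%m-%dT%H:%M:%S.%f",  # ISOTime2Frac with a fractional part
)


def parse_iso_time(text: str) -> datetime:
    """Try each format in turn and return the first successful parse."""
    for fmt in _FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"not an ISOTime, ISOTime2, or ISOTime2Frac: {text!r}")
```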

; Numbers
LeadingDigit = "1" - "9"
UInt = LeadingDigit \*DIGIT