<a id="tor-spec.txt-7"></a>

# Flow control{#flow-control}

<a id="tor-spec.txt-7.1"></a>

## Link throttling

Each client or relay should do appropriate bandwidth throttling to
keep its user happy.

Communicants rely on TCP's default flow control to push back when they
stop reading.

The mainline Tor implementation uses token buckets (one for reads,
one for writes) for the rate limiting.
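
As an illustration of the token-bucket idea only, the following minimal Python sketch shows one way a bucket could grant bytes. The class name, the units (bytes and seconds), and the rate and burst values are assumptions made for this example; they are not taken from the mainline Tor implementation.

```python
class TokenBucket:
    """Hypothetical token bucket; names and units are illustrative."""

    def __init__(self, rate, burst):
        self.rate = rate      # bytes added to the bucket per second
        self.burst = burst    # maximum number of bytes the bucket can hold
        self.tokens = burst   # the bucket starts full
        self.last = 0.0       # time of the last refill, in seconds

    def refill(self, now):
        # Add tokens for the elapsed time, never exceeding the burst size.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now

    def take(self, want, now):
        """Return how many of `want` bytes may be transferred right now."""
        self.refill(now)
        granted = min(want, int(self.tokens))
        self.tokens -= granted
        return granted


# One bucket per direction, as described above (values are made up).
read_bucket = TokenBucket(rate=1000, burst=2000)
write_bucket = TokenBucket(rate=1000, burst=2000)
```

A caller would ask the read bucket before reading from a socket and the write bucket before writing, transferring only the granted number of bytes.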

Since 0.2.0.x, Tor has let the user specify an additional pair of
token buckets for "relayed" traffic, so people can deploy a Tor relay
with strict rate limiting, but also use the same Tor as a client. To
avoid partitioning concerns we combine both classes of traffic over a
given OR connection, and keep track of the last time we read or wrote
a high-priority (non-relayed) cell. If it's been less than N seconds
(currently N=30), we give the whole connection high priority, else we
give the whole connection low priority. We also give low priority
to reads and writes for connections that are serving directory
information. See [proposal 111] for details.
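
The priority rule above can be sketched as a small Python function. The function name, its arguments, and the choice to check the directory case first are assumptions for illustration; only the 30-second threshold comes from the text.

```python
HIGH_PRIORITY_WINDOW_SECS = 30  # "N" in the text above


def connection_priority(now, last_non_relayed_cell, serves_directory):
    """Return "high" or "low" for a whole OR connection.

    `last_non_relayed_cell` is the time (in seconds) at which a
    high-priority (non-relayed) cell was last read or written, or None
    if no such cell has been seen. All names here are hypothetical.
    """
    if serves_directory:
        # Connections serving directory information get low priority.
        return "low"
    if (last_non_relayed_cell is not None
            and now - last_non_relayed_cell < HIGH_PRIORITY_WINDOW_SECS):
        return "high"
    return "low"
```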

[proposal 111]: ../proposals/111-local-traffic-priority.txt

<a id="tor-spec.txt-7.2"></a>

## Link padding{#link-padding}

Link padding can be created by sending PADDING or VPADDING cells
along the connection; relay messages of type "DROP" can be used for
long-range padding.  The bodies of PADDING cells, VPADDING cells, and DROP
messages are filled with padding bytes.
See [Cell Packet format](./cell-packet-format.md#cell-packet-format).

If the link protocol is version 5 or higher, link level padding is
enabled as per padding-spec.txt. On these connections, clients may
negotiate the use of padding with a PADDING_NEGOTIATE command
whose format is as follows:

```text
         Version           [1 byte]
         Command           [1 byte]
         ito_low_ms        [2 bytes]
         ito_high_ms       [2 bytes]
```

Currently, only version 0 of this cell is defined. In it, the command
field is either 1 (stop padding) or 2 (start padding). For the start
padding command, a pair of timeout values giving the low and high
bounds of the range for randomized padding timeouts may be specified as
unsigned integer values in milliseconds. The ito_low_ms field should not be
lower than the current consensus parameter value for nf_ito_low (default:
1500).  The ito_high_ms field should not be lower than ito_low_ms.
(If a party receives an out-of-range value, it clamps the value so
that it is in range.)

For the stop padding command, the timeout fields should be sent as
zero (to avoid client distinguishability) and ignored by the recipient.
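
The following Python sketch shows how the version 0 body might be encoded, decoded, and clamped. It assumes big-endian (network order) integer fields; the function and constant names are hypothetical and are not taken from any Tor implementation.

```python
import struct

PADDING_STOP = 1
PADDING_START = 2
NF_ITO_LOW_DEFAULT = 1500  # default of the nf_ito_low consensus parameter


def encode_padding_negotiate(command, ito_low_ms=0, ito_high_ms=0):
    """Build a version 0 body: Version, Command, ito_low_ms, ito_high_ms."""
    if command == PADDING_STOP:
        # Timeout fields are sent as zero for the stop command.
        ito_low_ms = ito_high_ms = 0
    return struct.pack(">BBHH", 0, command, ito_low_ms, ito_high_ms)


def clamp_timeouts(ito_low_ms, ito_high_ms, nf_ito_low=NF_ITO_LOW_DEFAULT):
    """Clamp out-of-range timeouts so that nf_ito_low <= low <= high."""
    low = max(ito_low_ms, nf_ito_low)
    high = max(ito_high_ms, low)
    return low, high


def decode_padding_negotiate(body, nf_ito_low=NF_ITO_LOW_DEFAULT):
    """Return (command, timeouts); timeouts is None for the stop command."""
    version, command, low, high = struct.unpack(">BBHH", body[:6])
    if version != 0:
        raise ValueError("only version 0 is defined")
    if command == PADDING_STOP:
        return command, None  # timeout fields are ignored
    if command == PADDING_START:
        return command, clamp_timeouts(low, high, nf_ito_low)
    raise ValueError("unknown command")
```

In this sketch the recipient clamps rather than rejects, which follows the parenthetical rule above.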

For more details on padding behavior, see padding-spec.txt.

<a id="tor-spec.txt-7.3"></a>

## Circuit-level flow control

To control a circuit's bandwidth usage, each OR keeps track of two
'windows', consisting of how many DATA-bearing relay cells it is allowed to
originate or willing to consume.

(For the purposes of flow control,
we call a relay cell "DATA-bearing"
if it holds a DATA relay message.
Note that this design does _not_ limit relay cells that don't contain
a DATA message;
this limitation may be addressed in the future.)

These two windows are respectively named: the package window (packaged for
transmission) and the deliver window (delivered for local streams).

Because of our leaky-pipe topology, every relay on the circuit has a pair
of windows, and the OP has a pair of windows for every relay on the
circuit.
These windows apply only to _originated_ and _consumed_ cells.
They do not, however, apply to _relayed_ cells,
and a relay
that is never used for streams will never decrement its windows or cause the
client to decrement a window.

Each 'window' value is initially set based on the consensus parameter
'circwindow' in the directory (see dir-spec.txt), or to 1000
DATA-bearing relay cells if
no 'circwindow' value is given. In each direction, cells that are not
RELAY_DATA cells do not affect the window.

An OR or OP (depending on the stream direction) sends a RELAY_SENDME message
to indicate that it is willing to receive more DATA-bearing cells when its deliver
window has dropped by a full increment (100). For example, if the window
started at 1000, it should send a RELAY_SENDME when it reaches 900.

When an OR or OP receives a RELAY_SENDME, it increments its package window
by a value of 100 (circuit window increment) and resumes sending any
remaining DATA-bearing cells.

If a package window reaches 0, the OR or OP stops reading from TCP
connections for all streams on the corresponding circuit, and sends no more
DATA-bearing cells until receiving a RELAY_SENDME message.

If a deliver window goes below 0, the circuit should be torn down.
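
The window rules above can be sketched in Python as follows. The class and method names are hypothetical. The sketch also assumes that the deliver window is credited by one increment whenever a SENDME is sent, which the text implies but does not state.

```python
CIRCWINDOW_START = 1000      # default when no 'circwindow' value is given
CIRCWINDOW_INCREMENT = 100


class TearDownCircuit(Exception):
    """Raised when the peer violates the deliver window."""


class CircuitWindows:
    def __init__(self, start=CIRCWINDOW_START):
        self.start = start
        self.package = start   # DATA-bearing cells we may still originate
        self.deliver = start   # DATA-bearing cells we will still consume

    def can_package(self):
        # At 0, stop reading from streams until a SENDME arrives.
        return self.package > 0

    def on_data_cell_sent(self):
        if self.package <= 0:
            raise RuntimeError("package window exhausted")
        self.package -= 1

    def on_sendme_received(self):
        self.package += CIRCWINDOW_INCREMENT

    def on_data_cell_received(self):
        """Return True if a RELAY_SENDME should be sent now."""
        self.deliver -= 1
        if self.deliver < 0:
            raise TearDownCircuit("deliver window went below 0")
        if self.deliver <= self.start - CIRCWINDOW_INCREMENT:
            self.deliver += CIRCWINDOW_INCREMENT  # assumed credit on send
            return True
        return False
```

With the default start of 1000, receiving 250 DATA-bearing cells triggers two SENDMEs in this sketch and leaves the deliver window at 950.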

Starting with tor-0.4.1.1-alpha, authenticated SENDMEs are supported
(version 1, see below). This means that both the OR and OP need to remember
the rolling digest of the relay cell that precedes (triggers) a RELAY_SENDME.
The sender can identify such a cell because sending it brings the package
window to a multiple of the circuit window increment (100).

When a version 1 RELAY_SENDME arrives, it contains a digest that MUST
match the one remembered. This serves as proof that the endpoint of the
circuit saw the relay cells that were sent. On failure to match, the circuit
should be torn down.
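
A minimal Python sketch of the sender-side bookkeeping follows. The class name, the FIFO queue of pending digests, and the constant-time comparison are assumptions chosen for this example, not a description of any particular implementation.

```python
import hmac
from collections import deque

CIRCWINDOW_INCREMENT = 100
SENDME_DIGEST_LEN = 20


class SendmeDigestTracker:
    """Hypothetical tracker for digests that authenticated SENDMEs must match."""

    def __init__(self):
        self.pending = deque()  # digests awaiting a SENDME, oldest first

    def on_data_cell_sent(self, package_window_after, rolling_digest):
        # A cell that brings the package window to a multiple of the
        # increment is one that will trigger a SENDME at the other end.
        if package_window_after % CIRCWINDOW_INCREMENT == 0:
            self.pending.append(rolling_digest[:SENDME_DIGEST_LEN])

    def on_sendme_v1(self, digest):
        """Return True if the digest matches; False means tear down."""
        if not self.pending:
            return False  # a SENDME arrived that nothing triggered
        expected = self.pending.popleft()
        return hmac.compare_digest(expected, digest)
```

A caller would tear down the circuit whenever `on_sendme_v1` returns False.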

To ensure unpredictability, random bytes should be added to at least one
RELAY_DATA cell within one increment window. In other words,
in every 100 DATA-bearing cells (one increment),
random bytes should be introduced in at least one cell.

<a id="tor-spec.txt-7.3.1"></a>

### SENDME Message Format

A circuit-level RELAY_SENDME message always has its StreamID=0.

An OR or OP must obey these two consensus parameters in order to know which
version to emit and accept.

```text
      'sendme_emit_min_version': Minimum version to emit.
      'sendme_accept_min_version': Minimum version to accept.
```

If a RELAY_SENDME version is received that is below the minimum accepted
version, the circuit should be closed.

The body of a RELAY_SENDME message contains the following:

```text
      VERSION     [1 byte]
      DATA_LEN    [2 bytes]
      DATA        [DATA_LEN bytes]
```

The VERSION tells us what is expected in the DATA section of length
DATA_LEN and how to handle it. The recognized values are:

0x00: The rest of the message should be ignored.

0x01: Authenticated SENDME. The DATA section MUST contain:

DIGEST   \[20 bytes\]

```text
         If the DATA_LEN value is less than 20 bytes, the message should be
         dropped and the circuit closed. If the value is more than 20 bytes,
         then the first 20 bytes should be read to get the DIGEST value.

         The DIGEST is the rolling digest value from the DATA-bearing relay cell that
         immediately preceded (triggered) this RELAY_SENDME. The other side
         compares this value against the digest that it remembered when it
         sent that cell.

         (Note that if the digest in use has an output length greater than 20
         bytes—as is the case for the hop of an onion service rendezvous
         circuit created by the hs_ntor handshake—we truncate the digest
         to 20 bytes here.)
```

If the VERSION is unrecognized or below the minimum accepted version (taken
from the consensus), the circuit should be torn down.
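
The parsing rules in this section can be sketched in Python as follows. The sketch assumes a big-endian DATA_LEN field and treats an empty body as version 0; both are assumptions of this example, and the function and exception names are hypothetical.

```python
import struct

SENDME_DIGEST_LEN = 20


class CloseCircuit(Exception):
    """Raised when the circuit should be torn down."""


def parse_sendme_body(body, accept_min_version=0):
    """Return (version, digest); digest is None for version 0.

    `accept_min_version` stands in for 'sendme_accept_min_version'.
    """
    # Assumption: an empty body is handled like version 0.
    version = body[0] if body else 0
    if version < accept_min_version or version not in (0, 1):
        raise CloseCircuit("unrecognized or disallowed SENDME version")
    if version == 0:
        return 0, None  # the rest of the message is ignored
    if len(body) < 3:
        raise CloseCircuit("truncated SENDME")
    (data_len,) = struct.unpack(">H", body[1:3])
    data = body[3:3 + data_len]
    if data_len < SENDME_DIGEST_LEN or len(data) < data_len:
        raise CloseCircuit("DATA too short for a digest")
    # If DATA is longer than 20 bytes, only the first 20 are the DIGEST.
    return 1, data[:SENDME_DIGEST_LEN]
```

The returned digest would then be compared against the remembered rolling digest, as described under circuit-level flow control.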

<a id="tor-spec.txt-7.4"></a>

## Stream-level flow control

Edge nodes use RELAY_SENDME messages to implement end-to-end flow
control for individual connections across circuits. Similarly to
circuit-level flow control, edge nodes begin with a window of
DATA-bearing cells
(500) per stream, and increment the window by a fixed value (50)
upon receiving a RELAY_SENDME message. Edge nodes initiate RELAY_SENDME
messages when both a) the window is \<= 450, and b) there are fewer than
ten cells' worth of data remaining to be flushed at that edge.

Stream-level RELAY_SENDME messages are distinguished by having nonzero
StreamID. They are still empty; the body still SHOULD be ignored.
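
The two conditions for sending a stream-level SENDME can be written as a small Python predicate. The function and constant names are hypothetical, and the amount of unflushed data is assumed to be measured in whole cells.

```python
STREAMWINDOW_START = 500
STREAMWINDOW_INCREMENT = 50
MAX_CELLS_PENDING_FLUSH = 10


def should_send_stream_sendme(window, cells_pending_flush):
    """Return True when both conditions (a) and (b) above hold."""
    window_low = window <= STREAMWINDOW_START - STREAMWINDOW_INCREMENT
    nearly_flushed = cells_pending_flush < MAX_CELLS_PENDING_FLUSH
    return window_low and nearly_flushed
```

For example, a stream whose window is 450 with nothing left to flush qualifies, while one with ten or more cells still queued does not.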