```
Filename: 340-packed-and-fragmented.md
Title: Packed and fragmented relay messages
Author: Nick Mathewson
Created: 31 May 2022
Status: Open
```

# Introduction

Tor sends long-distance messages on circuits via _relay cells_.  The
current relay cell format allows one _relay message_ (e.g., "BEGIN" or
"DATA" or "END") per relay cell. We want to relax this 1:1 requirement,
between messages and cells, for two reasons:

 * To support relay messages that are longer than the current 498-byte
   limit.  Applications would include wider handshake messages for
   postquantum crypto, UDP messages, and SNIP transfer in walking
   onions.

 * To transmit small messages more efficiently.  Several message types
   (notably `SENDME`, `XON`, `XOFF`, and several types from
   [proposal 329](./329-traffic-splitting.txt)) are much smaller than
   the relay cell size, and could be sent comparatively often.
   We also want to be able to *hide* the transmission of small control messages
   by packing them into what would have been the padding of other messages,
   making them effectively invisible to a network traffic observer.

In this proposal, we describe a way to decouple relay cells from relay
messages.  Relay messages can now be packed into multiple cells or split
across multiple cells.

This proposal combines ideas from
[proposal 319](./319-wide-everything.md) (fragmentation) and
[proposal 325](./325-packed-relay-cells.md) (packed cells).  It requires
[ntor v3](./332-ntor-v3-with-extra-data.md) and prepares for
[next-generation relay cryptography](./308-counter-galois-onion.md).

Additionally, this proposal has been revised to incorporate another
protocol change, and move StreamId from the relay cell header into a new,
optional header.

## A preliminary change: Relay encryption, version 1.5

We are fairly sure that, whatever we do for our next batch of relay
cryptography, we will want to increase the size of the data used to
authenticate relay cells to 128 bits.  (Currently it uses a 4-byte tag
plus 2 bytes of zeros.)

To avoid proliferating formats, I'm going to suggest that we make the
other changes in this proposal changes concurrently with a change in our
relay cryptography, so that we do not have too many incompatible cell
formats going on at the same time.

The new format for a decrypted relay _cell_ will be:

```text
recognized [2 bytes]
digest     [14 bytes]
body       [509 - 16 = 493 bytes]
```

The `recognized` and `digest` fields are computed as before; the only
difference is that they occur _before_ the rest of the cell, and that `digest`
is truncated to 14 bytes instead of 4.

If we are lucky, we won't have to build this encryption at all, and we
can just move to some version of GCM-UIV or other RPRP that reserves 16
bytes for an authentication tag or similar cryptographic object.

The `body` MUST contain exactly 493 bytes as relay cells have a fixed size.

## New relay message packing

We define this new format for a relay message.  We require that
both header parts fit in a single RELAY cell.
However, the body can be split across many relay cells:

```
  Message Header
    command         u8
    length          u16
  Message Routing Header (optional)
    stream_id       u16
  Message Body
    data            u8[length]
```

One big change from the current tor protocol is something that has become
optional: we have moved `stream_id` into a separate inner header that only
appears sometimes named the Message Routing Header. The command value tells us
if the header is to be expected or not.

The following message types take required stream IDs: `BEGIN`, `DATA`, `END`,
`CONNECTED`, `RESOLVE`, `RESOLVED`, and `BEGIN_DIR`, `XON`, `XOFF`.

The following message types from proposal 339 (UDP) take required stream IDs:
`CONNECT_UDP`, `CONNECTED_UDP` and `DATAGRAM`.

No other current message types take stream IDs. The `stream_id` field, when
present, MUST NOT be zero.

Messages can be split across relay cells; multiple messages can occur in
a single relay cell.  We enforce the following rules:

  * Headers may not be split across cells.
  * If a 0 byte follows a message body, there are no more messages.
  * A message body is permitted to end at exactly the end of a relay cell,
    without a 0 byte afterwards.
  * A relay cell may not be "empty": it must have at least some part of
    some message.

Unless specified elsewhere, **all** message types may be packed, and
**all** message types may be fragmented.

Every command has an associated maximum length for its messages.  If not
specified elsewhere, the maximum length for every message is 498 bytes (for
legacy reasons).

Receivers MUST validate that the cell `header` and the `message header` are
well-formed and have valid lengths while handling the cell in which the header
is encoded. If any of them is invalid, the circuit MUST be destroyed.

A message header with an unrecognized `command` is considered invalid and thus
MUST result in the circuit being immediately destroyed (without waiting for the
rest of the relay message to arrive, in the case of a fragmented message).

## New subprotocol `RelayCell`

We introduce a new subprotocol `RelayCell` to specify the relay cell ABI. The
new format specified in this proposal, supporting packing and fragmentation,
corresponds to `RelayCell` version 1. The ABI prior to this proposal is
`RelayCell` version 0.

All clients and relays implicitly support `RelayCell` version 0.

> XXX: Do we want to consider some migration path for eventually removing
> support for `RelayCell` version 0? e.g. maybe this should be something like
> "Support for any of `Relay` versions 1-5 imply support for `RelayCell`
> version 0"?

We reserve the protocol ID 13 for binary encoding of this subprotocol with
respect to [proposal 346][prop346] and [proposal 323][prop323].

To use `RelayCell` version 1 or greater with a given relay on a given circuit,
the client negotiates it using an `ntor_v3` extension, as per [proposal
346][prop346]. This implies that the relay must advertise support for `Relay`
version 5 (`ntor_v3` circuit extensions) as well as the target `RelayCell`
version (1 for the format introduced in this proposal).

Circuits using mixed `RelayCell` versions are permitted. e.g. we anticipate
some of the use-cases for packing and fragmentation to only need the exit-relay
to support it. Not requiring `RelayCell=1` for other relays in the circuit
provides a larger pool of candidate relays. While an intermediate relay using a
different `RelayCell` version than the destination relay of a given relay cell
will look at the wrong bytes for the `recognized` and `digest` fields, they
will reach the correct conclusion that the cell is not intended for them and
pass it to the next hop in the circuit.

## Migration

> Note: This differs from what we decided was our new best-practices.
> Should we make this disableable at all?

We add a consensus parameter, "streamed-relay-messages", with default
value 0, minimum value 0, and maximum value 1.

If this value is 0, then clients will not (by default) negotiate this
relay protocol.  If it is 1, then clients will negotiate it when relays
support it.

For testing, clients can override this setting.  Once enough relays
support this proposal, we'll change the consensus parameter to 1.
Later, we'll change the default to 1 as well.

## Packing decisions

We specify the following greedy algorithm for making decisions about
fragmentation and packing.  Other algorithms are possible, but this one
is fairly simple, and using it will help avoid distinguishability
issues:

Whenever a client or relay is about to send a cell that would leave
at least 32 bytes unused in a relay cell, it checks to see whether there
is any pending data to be sent in the same circuit (in a data cell).  If
there is, then it adds a DATA message to the end of the current cell,
with as much data as possible.  Otherwise, the client sends the cell
with no packed data.

> XXX: This isn't quite right. What was actually implemented in tor, and
> what we want in arti, is to defer sending some "control" messages like
> confluence switch and (non-first) xon, until they can be invisibly packed
> into a cell for a DATA message.
>
> dgoulet: Could you update this section with the concrete details, and exactly
> what property we're trying to achieve? e.g.:
>
> If we have data to send, but the corresponding DATA messages don't leave
> enough room to pack in the deferred control message(s), what do we do? If we
> continue deferring could we end up deferring forever if the application
> always writes in chunks that happen to align this way?
>
> Since cells containing any part of a DATA message is subject to congestion
> windows, does that mean if our congestion window is empty we can't send these
> control messages either (until the window becomes non-empty)?

## Onion services

Negotiating this for onion services will happen in a separate proposal;
it is not a current priority, since there is nothing sent over
rendezvous circuits that we currently need to fragment or pack.

## Miscellany

### Handling `RELAY_EARLY`

The `RELAY_EARLY` status for a command is determined based on the relay
cell in which the command's _header_ appeared.

Thus, a relay MUST close a circuit if the cell containing the _first_
fragment of an `EXTEND` message is not `RELAY_EARLY`, and MUST allow
but NOT require `RELAY_EARLY` to be set on other cells.

This implies that a client only _needs_ to set `RELAY_EARLY` on the cell
containing the _first_ fragment of an EXTEND message, but that it MAY
set RELAY_EARLY on other cells, in order to prevent traffic fingerprinting.

(Note: As now, relays and clients MUST destroy any circuit upon seeing a
`RELAY_EARLY` message in the inbound direction.)

> In our implementation, clients will continue to set `RELAY_EARLY` on
> the first N cells of each circuit, as we do today.

> Note that this description allows us to take two approaches when
> we eventually _do_ support fragmented `EXTEND` messages.  We can
> either set the `RELAY_EARLY` flag on the cell containing the first
> fragment only, or we can continue to set it on the first N cells
> sent on each circuit.  Both will work fine, assuming that the
> limit of `RELAY_EARLY` cells is adjustable.  This brings us to:

#### Making the `RELAY_EARLY` limit adjustable

We add the following parameter, to support an eventual migration to
longer extend cells, in case we decide to take the second approach in
our note above.

    "max_early_per_circ" -- Relays MUST destroy any circuit on
    which they see more than this number of RELAY_EARLY cells.
    Min: 5. Max: 65535. Default: 8.


### Handling `SENDME`s

SENDME messages may not be fragmented; the body and the command must
appear in the same cell.  (This is necessary so authenticated sendmes
can have a reasonable implementation.)

### Interaction with Conflux

Fragmented messages may be used together with Conflux, but we do not
allow fragments from a single method to be sent on separate legs of
a single circuit bundle.

That is to say, it is an error to send a CONFLUX_SWITCH message if
the SeqNum would leave any other circuit with an incomplete message
where not all framgents have arrived. Upon receiving such an
erroneous message, parties SHOULD destroy all circuits in the
conflux bundle.

### An exception for `DATA`.

Data messages may not be fragmented. When packing data into a cell containing
other messages is desired, the application can instead construct a DATA message
of an appropriate size to fit into the remaining space.

While relaxing this could simplify the implementation of opportunistic packing
somewhat (by allowing code that constructs `DATA` messages not to have to know
about packing or fragmentation), doing so would have several downsides.

First, on the receiver side a naive implementation that receives the first cell
of a fragmented `DATA` message would not be able to pass the data in that
fragment on to the application until the remaining cells of that message are
received. An optimized implementation might choose to do so, but that
complexity seems worse than the complexity we'd be avoiding by allowing `DATA`
fragmentation in the first place.

Second, as with any sort of flexibility permitted to implementations, allowing
flexibility here adds opportunities for fingerprinting and covert channels.

### Extending message-length maxima

For now, the maximum length for every message body is 493 bytes, except as
follows:

   - `DATAGRAM` messages (see proposal 339) have a maximum body length
     of 1967 bytes.  (This works out to four relay cells, and
     accommodates most reasonable MTU choices)

Any increase in maximum length for any other message type requires a new
`RelayCell` subprotocol version.  (For example, if we later want to allow
EXTEND2 messages to be 2000 bytes long, we need to add a new proposal
saying so, and reserving a new subprotocol version.)

### `SENDME` window accounting

`SENDME` windows count relay *cells* rather than relay *messages*.

A cell counts towards the circuit's `SENDME` window if it contains any part of
any message that would normally count towards `SENDME` windows (currently only
`DATA`).

A cell counts towards the `SENDME` window of every stream that it contains
part of a message for, whose command counts towards `SENDME` windows.

Examples:

* A cell containing a `SENDME` message and a `RESOLVE` message currently
  wouldn't count towards any windows, since neither of those commands currently
  counts towards windows.
* A cell containing a `SENDME` message and a `DATA` message would count towards
  the circuit window and the `DATA` message's stream's window.
* A cell containing two `DATA` messages, for different streams, would count
  towards the circuit-level window and both stream-level windows.
* A cell containing two `DATA` messages for the *same* stream counts
  *once* towards the circuit-level and stream-level windows.
* If `DATAGRAM` messages (proposal 339) are implemented, and count towards
  windows, then every cell containing a fragment of a `DATAGRAM` message counts
  towards windows.

# Appendix: Example cells

Here is an example of the simplest case: one message, sent in one relay cell:

```
  Cell 1:
    header:
       circid         ..                [4 bytes]
       command        RELAY             [1 byte]
    relay cell header:
       recognized     0                 [2 bytes]
       digest         (...)             [14 bytes]
    message header:
       command        BEGIN             [1 byte]
       length         23                [2 bytes]
    message routing header:
       stream_id      42                [2 bytes]
    message body:
      "www.torproject.org:443\0"        [23 bytes]
    end-of-messages marker:
      0                                 [1 byte]
    padding up to end of cell:
      random                            [464 bytes]
```

Total of 514 bytes which is the absolute maximum relay cell size.

A message whose body ends at exactly the end of a relay cell has no
corresponding end-of-messages marker.

```
  Cell 1:
    header:
       circid         ..                [4 bytes]
       command        RELAY             [1 byte]
    relay cell header:
       recognized     0                 [2 bytes]
       digest         (...)             [14 bytes]
    message header:
       command        DATA              [1 byte]
       length         488               [2 bytes]
    message routing header:
       stream_id      42                [2 bytes]
    message body:
       (data)                           [488 bytes]
```

Here's an example with fragmentation only: a large EXTEND2 message split
across two relay cells.

```
  Cell 1:
    header:
       circid         ..               [4 bytes]
       command        RELAY_EARLY      [1 byte]
    relay cell header:
       recognized     0                [2 bytes]
       digest         (...)            [14 bytes]
    message header:
       command        EXTEND           [1 byte]
       length         800              [2 bytes]
    message body:
       (extend body, part 1)           [490 bytes]

  Cell 2:
    header:
       circid         ..               [4 bytes]
       command        RELAY            [1 byte]
    relay cell header:
      recognized     0                 [2 bytes]
      digest         (...)             [14 bytes]
    message body, continued:
      (extend body, part 2)            [310 bytes] (310+490=800)
    end-of-messages marker:
      0                                [1 byte]
    padding up to end of cell:
      random                           [182 bytes]
```

Each cells are 514 bytes for a message body totalling 800 bytes.

Here is an example with packing only: A `BEGIN_DIR` message and a data message
in the same cell.

```
  Cell 1:
    header:
       circid         ..                [4 bytes]
       command        RELAY             [1 byte]
    relay cell header:
       recognized     0                 [2 bytes]
       digest         (...)             [14 bytes]

    # First relay message
    message header:
       command        BEGIN_DIR         [1 byte]
       length         0                 [2 bytes]
    message routing header:
       stream_id      32                [2 bytes]

    # Second relay message
    message header:
       command        DATA              [1 byte]
       length         25                [2 bytes]
    message routing header:
       stream_id      32                [2 bytes]
    message body:
       "HTTP/1.0 GET /tor/foo\r\n\r\n"  [25 bytes]

    end-of-messages marker:
      0                                 [1 byte]
    padding up to end of cell:
      random                            [457 bytes]

```

Here is an example with packing and fragmentation: a large DATAGRAM cell, a
SENDME cell, and an XON cell.

(Note that this sequence of cells would not actually be generated by the
algorithm described in "Packing decisions" above; this is only an example of
what parties need to accept.)

```
  Cell 1:
    header:
       circid         ..               [4 bytes]
       command        RELAY            [1 byte]
    relay cell header:
       recognized     0                [2 bytes]
       digest         (...)            [14 bytes]

    # First message
    message header:
       command        DATAGRAM         [1 byte]
       length         1200             [2 bytes]
    message routing header:
       stream_id      99               [2 bytes]
    message body:
       (datagram body, part 1)         [488 bytes]

  Cell 2:
    header:
       circid         ..               [4 bytes]
       command        RELAY            [1 byte]
    relay cell header:
      recognized     0                 [2 bytes]
      digest         (...)             [14 bytes]
    message body, continued:
      (datagram body, part 2)          [493 bytes]

  Cell 3:
    header:
       circid         ..               [4 bytes]
       command        RELAY            [1 byte]
    relay cell header:
      recognized     0                 [2 bytes]
      digest         (...)             [14 bytes]
    message body, continued:
      (datagram body, part 3)          [219 bytes] (488+493+219=1200)

    # Second message
    message header:
       command        SENDME           [1 byte]
       length         23               [2 bytes]
    message body:
       version        1                [1 byte]
       datalen        20               [2 bytes]
       data           (digest to ack)  [20 bytes]

    # Third message
    message header:
       command        XON              [1 byte]
       length         1                [2 bytes]
    message routing header:
       stream_id      50               [2 bytes]
    message body:
       version        1                [1 byte]

    end-of-messages marker:
      0                                [1 byte]
    padding up to end of cell:
      random                           [241 bytes]
```

[prop323]: ./323-walking-onions-full.md "Specification for Walking Onions"
[prop346]: ./346-protovers-again.md "Clarifying and extending the use of protocol versioning"