```
Filename: 349-command-state-validation.md
Title: Client-Side Command Acceptance Validation
Author: Mike Perry
Created: 2023-08-17
Status: Draft
```

# Introduction

The ability of relays to inject end-to-end relay cells that are ignored by
clients allows malicious relays to create a covert channel to verify that they
are present in multiple positions of a path. This covert channel allows a
Guard to deanonymize 100% of its traffic, or just all the traffic of a
particular client IP address.

This attack was first documented in [DROPMARK]. Proposal 344 describes the
severity of this attack, and how this kind of end-to-end covert channel leads
to full deanonymization, in a reliable way, in practice. (Recall that dropped
cell attacks are most severe when an adversary can inject arbitrary end-to-end
data patterns at times when the circuit is known to be idle, before it is used
for traffic; injection at this point enables path bias attacks which can
ensure that only malicious Guard+Exit relays are present in all circuits used
by a particular target client IP address. For further details, see Proposal
344.)

This proposal targets arti-client, not C-Tor, and is specific to client-side
checks of relay cells and relay messages. Its primary change to behavior is
the definition of state machines that enforce which relay message commands are
acceptable on a given circuit, and when.

By applying and enforcing these state machine rules, we prevent the end-to-end
transmission of arbitrary amounts of data, and ensure that predictable periods
of the protocol are happening as expected, and not filled with side channel
packet patterns.


## Overview of dropped cell types

Dropped cells are cells that a relay can inject that end up ignored and
discarded by a Tor client.

These include:
  1. Unparsable cells
  2. Invalid relay commands
  3. Unrecognized cells (i.e., wrong source hop, or decrypt failures)
  4. Unsupported (or consensus-disabled) relay commands or extensions
  5. Out-of-context relay commands
  6. Duplicate relay commands
  7. Relay commands that hit any error codepaths
  8. Relay commands for an invalid or already-closed stream ID
  9. Semantically void relay cells (including relay data len == 0, or PING)
  10. Onion descriptor-appended junk
Items 1-4 and 8 are handled by the existing relay command parsers in arti. In
these cases, arti closes the circuit already.

> XXX: Arti's relay parser is lazy; see https://gitlab.torproject.org/tpo/core/arti/-/merge_requests/1978
> Does this mean that individual components need to properly propagate error
> information in order for circuits to get closed, when a command does not
> parse?

The state machines of this proposal handle 5-7 in a rigorous way. (In many
cases of out-of-context relay cells, arti already closes the circuit;
our goal here is to centralize this validation so that we can ensure that
it is not possible for any relay commands to omit checks or allow unbounded
activity.)

> XXX: Does arti allow extra onion-descriptor junk to be appended after the
> descriptor signature? C-Tor does...


# Architectural Patterns and Behavior

Ideally, the handling of invalid protocol behavior should be centralized,
so that validation can happen in one easy-to-audit place, rather than spread
across the codebase (as it currently is with C-Tor).

For some narrow cases of invalid protocol activity, this is trivial. The relay
command acceptance is centralized in arti, which allows arti to immediately
reject unknown or disabled relay commands. This kind of validation is
necessary, but not sufficient, in order to prevent dropped cell vectors.

Things quickly get complicated when handling parsable relay cells sent at
an inappropriate time, or other activity such as duplicate relay commands,
semantically void cells, or commands that hit an error condition or a lazy
parsing failure deep in the code and are then silently accepted without
closing the circuit.

To handle such cases, we propose adding a relay command message state machine
pattern. Each relay protocol, when it becomes active on a circuit, must
register a state machine that handles validating its messages.

Because multiple relay protocols can be active at a time, multiple validation
state machines can be attached to a circuit. This also allows protocols to
create their own validation without needing to modify the entire validation
process. Relay messages that are not accepted by any active protocol
validation handler MUST result in circuit close.


## Architectural Patterns

In order to handle these cases, we rely on some architectural patterns:
  1. No relay message command may be sent to the client unless it is
     explicitly allowed by the specification, advertised as supported, and
     negotiated on a particular channel or circuit. (Prop#346)
  2. Any relay commands or extension fields not successfully negotiated
     on a circuit are invalid. This includes cells from intermediate hops,
     which must also negotiate their use (example: padding machine
     negotiation to middles).
  3. By following the above principles, state machines can be developed
     that govern when a relay command is acceptable. This covers the
     majority of protocol activity. See Section 3.
  4. For some commands, additional checks must be performed by using
     context of the protocol itself.

The following relay commands require additional module state to enforce
limitations, beyond what is known by a state machine, for #4:
  - RELAY_COMMAND_SENDME
    - Requires checking that the auth digest hash is accurate
  - RELAY_COMMAND_XOFF and RELAY_COMMAND_XON
    - Context and rate limiting is stream-dependent
    - Packing enforcement via prop#340 is context-dependent
  - RELAY_COMMAND_CONFLUX_SWITCH
    - Packing enforcement via prop#340 is context-dependent
  - RELAY_COMMAND_DROP
    - This can only be accepted from a hop if there is a padding
      machine at that hop.
  - RELAY_COMMAND_INTRODUCE2
    - Requires inspecting replay cache (however, circuits should not get
      closed because replays can come from the client)

## Behavior

When an invalid relay cell or relay message is encountered, the corresponding 
circuit should be immediately closed.

Initially, this can be accomplished by sending a DESTROY cell to the Guard
relay.

Additionally, when closing circuits in this way, clients must take care not to
allow cases of adversarially-induced infinite circuit creation in non-onion
service protocols that are not protected by Vanguards/Vanguards-lite, by
limiting the number of retries they perform. (One such example of this is a
malicious conflux exit that repeatedly kills only one leg by injecting dropped
cells to close the circuit.)
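
A bounded retry counter is one simple way to express this limit. The sketch
below is illustrative only: `RetryBudget` and `try_consume` are hypothetical
names, and this proposal does not specify what the maximum should be.

```rust
/// Hypothetical retry budget for circuits closed by command validation.
/// The proposal does not fix a number; callers choose `max_retries`.
#[derive(Debug)]
pub struct RetryBudget {
    remaining: u32,
}

impl RetryBudget {
    pub fn new(max_retries: u32) -> Self {
        Self { remaining: max_retries }
    }

    /// Record that a circuit was closed by validation.
    /// Returns true if the caller may build a replacement circuit.
    pub fn try_consume(&mut self) -> bool {
        if self.remaining == 0 {
            return false;
        }
        self.remaining -= 1;
        true
    }
}
```

A caller would keep one budget per logical request (for example, per conflux
set) and give up, rather than rebuild, once `try_consume` returns false.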

While we also specify some cases where the channel to the Guard should be
closed, this is not necessary in the general case.

> XXX: I can't think of any issues severe enough to actually warrant the
> following, but Florentin pointed it out as a possibility: A malicious Guard
> may withhold the DESTROY, and still allow full identifier transmission before
> the circuit is closed. While this does not directly allow full deanonymization
> because the client won't actually use the circuit, it may still be enough to
> make the vector useful for other attacks. For completeness against this
> vector, we may want to consider sending a new RELAY_DESTROY command to the
> middle node, such that it has responsibility for tearing down a circuit by
> sending its own DESTROY cells in both directions, and then have the client
> send its own DESTROY if the client does not get a DESTROY from the Guard.
>
> See torspec#220: https://gitlab.torproject.org/tpo/core/torspec/-/issues/220


# State machine descriptions

These state machines apply only at the client. (There is no information leak
from extra cells in the protocol on the relay side, so we will not be specifying
relay-side enforcement, or implementing it for C-Tor.)

There are multiple state machines, describing particular circuit purposes
and/or components of the Tor relay protocol.

Each state machine has a "Trigger" and a "Message Scope". The "Trigger" is
the condition, relay command, or action that causes the state machine to get
added to a circuit's command state validator set. The "Message Scope" is where
the state machine applies: to a specific hop number, stream ID, or both.

A circuit can have multiple state machines attached at one time.
  * If no state machine accepts a relay command, then the circuit MUST be
    closed.
  * When we say "receive X" we mean "receive a _valid_ cell of
    type X".  If the cell is invalid, we MUST kill the circuit.

## Relay message handlers

The state machines process enveloped relay message commands. I.e., with respect
to prop#340, they operate on the message bodies, with the associated stream ID.

With respect to Proposal #340, the calls to state machine validation would go
after converting cells to messages, but before parsing the message body
itself, to still minimize exposure of the parser attack surfaces.

> XXX: Again, some validation will require early parsing, not lazy parsing

There are multiple relay message handlers that can be registered with each
circuit ID, for a specific hop on that circuit ID, depending on the protocols
that are in use on that circuit with that hop, as well as the streams to that
hop.

Each handler has a Message Scope that acts as a filter, such that only relay
command messages from this scope are processed by that handler.

If a message is not accepted by any active handler, the circuit MUST be
closed.
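
As a rough illustration of this dispatch rule, the following Rust sketch shows
one way a per-circuit validator set could filter by Message Scope and close
the circuit when no handler accepts. All names (`ValidatorSet`, `Handler`,
`Verdict`, `DropFromHop`) are hypothetical and are not arti's API. Treating the
first `Accept` as final is also an assumption of the sketch.

```rust
/// Where a relay message came from (hypothetical type for this sketch).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct MsgSource {
    pub hop: u8,
    pub stream_id: u16, // 0 means circuit-level (no stream)
}

/// A few relay commands, enough for the example.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Cmd { Sendme, Data, Drop }

/// Result of offering a message to one handler.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Verdict { Accept, NotMine, Reject }

#[derive(Debug, PartialEq, Eq)]
pub enum CircuitAction { Continue, Close }

pub trait Handler {
    /// Message Scope filter: does this handler apply to `src` at all?
    fn in_scope(&self, src: MsgSource) -> bool;
    /// Validate `cmd` against the handler's current state.
    fn on_receive(&mut self, cmd: Cmd) -> Verdict;
}

pub struct ValidatorSet {
    handlers: Vec<Box<dyn Handler>>,
}

impl ValidatorSet {
    pub fn new() -> Self {
        Self { handlers: Vec::new() }
    }

    pub fn register(&mut self, h: Box<dyn Handler>) {
        self.handlers.push(h);
    }

    /// Offer a message to every in-scope handler.
    pub fn validate(&mut self, src: MsgSource, cmd: Cmd) -> CircuitAction {
        for h in self.handlers.iter_mut() {
            if !h.in_scope(src) {
                continue;
            }
            match h.on_receive(cmd) {
                Verdict::Accept => return CircuitAction::Continue,
                Verdict::Reject => return CircuitAction::Close,
                Verdict::NotMine => {}
            }
        }
        // No active handler accepted the message: the circuit MUST be closed.
        CircuitAction::Close
    }
}

/// Example handler: accepts DROP only from one hop (as with active padding).
pub struct DropFromHop { pub hop: u8 }

impl Handler for DropFromHop {
    fn in_scope(&self, src: MsgSource) -> bool {
        src.hop == self.hop && src.stream_id == 0
    }
    fn on_receive(&mut self, cmd: Cmd) -> Verdict {
        if cmd == Cmd::Drop { Verdict::Accept } else { Verdict::NotMine }
    }
}
```

With only `DropFromHop { hop: 2 }` registered, a DROP from hop 2 yields
`Continue`, while a DROP from hop 3, or a DATA from hop 2, yields `Close`.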


### Base Handler

Purpose: This handler validates commands for circuit construction and
circuit-level SENDME activity.

Trigger: Creation of a circuit; ntor handshake success for a hop

Message Scope: The circuit ID and hop number must match for this handler to
apply. (Because of leaky pipes, each hop of the circuit has a base handler
added when that hop completes an ntor handshake and is added to the circuit.)

```text
START:
  Upon sending EXTEND:
     Enter EXTEND_SENT.

  Receive SENDME:
     Ensure expected auth digest matches; close circuit otherwise
     No transition.

EXTEND_SENT:
  Receiving EXTENDED:
     Enter START.

  Receive SENDME:
     Ensure expected auth digest matches; close circuit otherwise
     No transition.
```
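
The transitions above could be encoded as a pure step function, sketched here
in Rust. The names (`BaseState`, `BaseEvent`, `base_step`) are hypothetical.
The sketch also assumes that any event not listed in the state machine closes
the circuit, and that the SENDME digest comparison has already been done by
the caller and is passed in as `digest_ok`.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum BaseState { Start, ExtendSent }

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum BaseEvent {
    SendExtend,
    RecvExtended,
    /// A SENDME arrived; the flag says whether its auth digest matched.
    RecvSendme { digest_ok: bool },
}

/// Signals that the circuit must be closed.
#[derive(Debug, PartialEq, Eq)]
pub struct CloseCircuit;

/// One step of the Base Handler, for a single hop.
pub fn base_step(state: BaseState, ev: BaseEvent) -> Result<BaseState, CloseCircuit> {
    use BaseEvent::*;
    use BaseState::*;
    match (state, ev) {
        (Start, SendExtend) => Ok(ExtendSent),
        (ExtendSent, RecvExtended) => Ok(Start),
        // SENDME is valid in either state, but only with a matching digest.
        (s, RecvSendme { digest_ok: true }) => Ok(s),
        // Everything else (assumption: including unlisted events) closes.
        _ => Err(CloseCircuit),
    }
}
```

For example, an EXTENDED that arrives in `Start` (that is, without a prior
EXTEND) returns `Err(CloseCircuit)`, which is the out-of-context case this
handler exists to catch.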

### Client Introducing Handler

Purpose: Circuits used by clients to connect to a service introduction point
have this handler attached.

Trigger: Usage of a circuit for client introduction

Message Scope: Circuit ID and hop number must match

```text
CLIENT_INTRO_START:
  Upon sending INTRODUCE1:
    Enter CLIENT_INTRO_WAIT

CLIENT_INTRO_WAIT:
  Receive INTRODUCE_ACK:
    Accept
    Transition to CLIENT_INTRO_END

CLIENT_INTRO_END:
  No transitions possible
  - XXX: Enforce that no new handlers can be added? We may still have padding
    handlers though.
```


### Service Introduce Handler

Purpose: Service-side onion service introduction circuits have this handler
attached.

Trigger: Onion service establishing an introduction point circuit

Message Scope: Circuit ID and hop number must match

```text
SERVICE_INTRO_START:
  Upon sending ESTABLISH_INTRO:
    Enter SERVICE_INTRO_ESTABLISH

SERVICE_INTRO_ESTABLISH:
  Receiving INTRO_ESTABLISHED:
    Enter SERVICE_INTRO_ESTABLISHED

SERVICE_INTRO_ESTABLISHED:
  Receiving INTRODUCE2:
    Accept
```


### Client Rendezvous Handler

Purpose: Circuits used by clients to build a rendezvous point have this handler
attached.

Trigger: Client rendezvous initiation

Message Scope: Circuit ID and hop number must match

```text
CLIENT_REND_START:
  Upon sending RENDEZVOUS1:
    Enter CLIENT_REND_WAIT

CLIENT_REND_WAIT:
  Receive RENDEZVOUS2:
    Enter CLIENT_REND_ESTABLISHED

CLIENT_REND_ESTABLISHED:
  Remain in this state; launch TCP, UDP, or Conflux handlers for streams
```


### Service Rendezvous Handler

Purpose: Circuits used by services to connect to a rendezvous point have this
handler attached.

Trigger: Incoming introduce cell/service rend initiation

Message Scope: Circuit ID and hop number must match

```text
SERVICE_REND_START:
  Upon sending ESTABLISH_RENDEZVOUS:
    Enter SERVICE_REND_WAIT

SERVICE_REND_WAIT:
  Receive RENDEZVOUS_ESTABLISHED:
    Enter SERVICE_REND_ESTABLISHED

SERVICE_REND_ESTABLISHED:
  Remain in this state; launch TCP, UDP, or Conflux handlers for streams
```


### CircPad Handler

Purpose: Circuit-level padding is negotiated with a particular hop in the
circuit; when it is negotiated, we need to allow padding cells from that hop.

Trigger: Negotiation of a circuit padding machine

Message Scope: Circuit ID and hop must match; padding machine must be active

```text
PADDING_START:
  Upon sending PADDING_NEGOTIATE:
    Enter PADDING_NEGOTIATING

PADDING_NEGOTIATING:
  Receiving PADDING_NEGOTIATED:
    Enter PADDING_ACTIVE

PADDING_ACTIVE:
  Receiving DROP:
    Accept (if from correct hop)
    - XXX: We could perform more sophisticated rate limiting accounting here
      too?
```

### Resolve Stream Handler

Purpose: This handler is created on circuits when a resolve happens.

Trigger: RESOLVE message

Message Scope: Circuit ID, stream ID, and hop number must all match

```text
RESOLVE_START:
  Send a RESOLVE message:
    Enter RESOLVE_SENT

RESOLVE_SENT:
  Receive a RESOLVED or an END:
    Enter RESOLVE_START.
```


### TCP Stream handler

Purpose: This handler is created when the client creates a new stream ID, using either
BEGIN or BEGIN_DIR.

Trigger: New AP or DirConn stream

Message Scope: Circuit ID, stream ID, and hop number must all match; stream ID
must be open or half-open (half-open is END_SENT).

```text
TCP_STREAM_START:
  Send a BEGIN or BEGIN_DIR message:
    Enter BEGIN_SENT.

BEGIN_SENT:
  Receive an END:
    Enter TCP_STREAM_START.
  Receive a CONNECTED:
    Enter STREAM_OPEN.

STREAM_OPEN:
  Receive DATA:
    Verify length is > 0
    XXX: Handle [HSDIRINFLATION] here?
    Process.

  Receive XOFF:
    Enter STREAM_XOFF

  Send END:
    Enter END_SENT.

  Receive END:
    Enter TCP_STREAM_START

STREAM_XOFF:
  Receive DATA:
    Verify length is > 0
    XXX: Handle [HSDIRINFLATION] here?
    Process.
 
  Send END:
    Enter END_SENT.

  Receive XON:
    Enter STREAM_XON

  Receive END:
    Enter TCP_STREAM_START

STREAM_XON:
  Receive DATA:
    Verify length is > 0
    XXX: Handle [HSDIRINFLATION] here?
    Process.

  Receive XOFF:
    If prop#340 is enabled, verify packed with SENDME
    Enter STREAM_XOFF

  Receive XON:
    If prop#340 is enabled, verify packed with SENDME
    Verify rate has changed

  Send END:
    Enter END_SENT.

  Receive END:
    Enter TCP_STREAM_START

END_SENT:
  Same as STREAM_OPEN, except do not actually deliver data.
  Only remain in this state for one RTT_max, or until END_ACK.
```
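
A condensed Rust sketch of this handler follows. It is an interpretation, not
a reference implementation: the names are hypothetical, the prop#340 packing
checks and the XON rate-change check are omitted, the `END_SENT` timeout is
not modeled, and `END_SENT` is assumed to tolerate DATA, XOFF, and END while
staying put (or returning to the start state on END).

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum StreamState { Start, BeginSent, Open, Xoff, Xon, EndSent }

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum StreamEvent {
    SendBegin, // BEGIN or BEGIN_DIR
    SendEnd,
    RecvConnected,
    RecvEnd,
    RecvData { len: usize },
    RecvXoff,
    RecvXon,
}

/// Signals that the circuit must be closed.
#[derive(Debug, PartialEq, Eq)]
pub struct CloseCircuit;

/// One step of the TCP stream handler for a single stream ID.
pub fn stream_step(state: StreamState, ev: StreamEvent) -> Result<StreamState, CloseCircuit> {
    use StreamEvent::*;
    use StreamState::*;
    match (state, ev) {
        (Start, SendBegin) => Ok(BeginSent),
        (BeginSent, RecvConnected) => Ok(Open),
        (BeginSent, RecvEnd) => Ok(Start),
        // Zero-length DATA is semantically void and falls through to close.
        (Open | Xoff | Xon | EndSent, RecvData { len }) if len > 0 => Ok(state),
        (Open | Xon, RecvXoff) => Ok(Xoff),
        (Xoff | Xon, RecvXon) => Ok(Xon), // rate-change check omitted
        (Open | Xoff | Xon, SendEnd) => Ok(EndSent),
        (Open | Xoff | Xon | EndSent, RecvEnd) => Ok(Start),
        // Assumption: XOFF while half-open is tolerated without a transition.
        (EndSent, RecvXoff) => Ok(EndSent),
        _ => Err(CloseCircuit),
    }
}
```

Note how an XON received in `Open` (with no preceding XOFF) and a CONNECTED
received in `Start` both fall through to `Err(CloseCircuit)`.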


### Conflux Handler

Purpose: Circuits that are a part of a conflux set have a conflux handler, associated
with the last hop.

Trigger: Creation of a conflux set

Message Scope: Circuit ID and hop number must match
 - XXX: Linked circuits must accept stream ids from either circuit for other
   handlers :/

```text
CONFLUX_START: (all conflux leg circuits start here)
  Upon sending CONFLUX_LINK:
     Enter CONFLUX_LINKING

CONFLUX_LINKING:
  Receiving CONFLUX_LINKED:
     Send CONFLUX_LINKED_ACK
     Enter CONFLUX_LINKED

CONFLUX_LINKED:
  Receiving CONFLUX_SWITCH:
     If prop#340 is negotiated, ensure packed with a DATA cell
```
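
The conflux transitions, including the context-dependent packing check on
CONFLUX_SWITCH, might be sketched as below. The names are hypothetical, and
the caller is assumed to already know whether prop#340 was negotiated and
whether the SWITCH arrived packed with a DATA message.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ConfluxState { Start, Linking, Linked }

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ConfluxEvent {
    SendLink,
    RecvLinked,
    RecvSwitch { packed_with_data: bool },
}

/// What the client should send in response, if anything.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Reply { Nothing, SendLinkedAck }

/// Signals that the circuit must be closed.
#[derive(Debug, PartialEq, Eq)]
pub struct CloseCircuit;

/// One step of the conflux handler for a single leg.
pub fn conflux_step(
    state: ConfluxState,
    ev: ConfluxEvent,
    prop340_negotiated: bool,
) -> Result<(ConfluxState, Reply), CloseCircuit> {
    use ConfluxEvent::*;
    use ConfluxState::*;
    match (state, ev) {
        (Start, SendLink) => Ok((Linking, Reply::Nothing)),
        (Linking, RecvLinked) => Ok((Linked, Reply::SendLinkedAck)),
        // With prop#340 negotiated, SWITCH must be packed with DATA.
        (Linked, RecvSwitch { packed_with_data })
            if packed_with_data || !prop340_negotiated =>
        {
            Ok((Linked, Reply::Nothing))
        }
        // Duplicate LINKED, early SWITCH, unpacked SWITCH: close.
        _ => Err(CloseCircuit),
    }
}
```

Because `Linked` has no transition for `RecvLinked`, a second CONFLUX_LINKED
on the same leg is rejected, which matches the "no previous LINKED cell" rule.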


### UDP Stream Handler

Purpose: Circuits that carry UDP streams via prop#339 have this handler
attached for each UDP stream.

Trigger: UDP stream creation

Message Scope: Circuit ID, hop number, and stream-id must match

```text
UDP_STREAM_START:
  If no other udp streams used on circuit:
    Send CONNECT_UDP for any stream, enter UDP_CONNECTING
  else:
    Immediately enter UDP_CONNECTING
    (CONNECTED_UDP MAY arrive without a CONNECT_UDP, after the first UDP
     stream on a circuit is established)

UDP_CONNECTING:
  Upon receipt of CONNECTED_UDP, enter UDP_CONNECTED

UDP_CONNECTED:
  Receive DATAGRAM:
    Verify length > 0
    Verify Prop#339 NAT rules are obeyed, including srcport and stream limits
    Process.

  Send END:
    Enter UDP_END_SENT

UDP_END_SENT:
  Same as UDP_CONNECTED, except do not actually deliver data.
  Only remain in this state for one RTT_max, or until END_ACK,
  then transition to UDP_STREAM_START.
```


# HSDIR Inflation  { #HSDIRINFLATION }

XXX: This can be folded into the state machines and/or rend-spec.. The state
machines should actually be able to handle this, once they are ready for it.

One of the most common questions about dropped cells is "what about data cells
with a 1 byte payload?". As Prop#344 makes clear, this is not a dropped cell
attack, but is instead an instance of an Active Traffic Manipulation Covert
Channel, described in Section 1.3.2 of that proposal. The lower severity of
active traffic manipulation is due to the fact that it cannot be used to
deanonymize 100% of a target client's circuits, whereas the combination of
path bias and pre-usage dropped cells can.

However, there is one case where one can construct a potent attack from this
Active Traffic Manipulation: by making use of onion service circuits being
built on demand by an application. Further, because the onion service
handshake is uniquely fingerprintable (see Section 1.2.1 of Prop#344), it is
possible to use this vector in this specific case to encode an identifier in
the timing and traffic patterns of the onion service descriptor download,
similar to how the CMU attack operated, and use both the onion service
fingerprint and descriptor traffic pattern to transmit the fact that a
particular onion service was visited, to the Guard or possibly even a local
network observer.

A normal hidden service descriptor occupies only ~10 cells (with a hard max of
30KB, or ~60 cells). This is not enough to reliably encode the full address of
the onion service in a timing-based covert channel.

However, there are two ways to cause this descriptor download to transmit
enough data to encode such a covert channel, and replicate the CMU attack
using timing information of this data.

First, the actual descriptor payload can be spread across many DATA cells that
are filled only partially with data (which does not happen if the HSDIR is
honest and well-behaved, because it always has the full descriptor on hand).

Second, in C-Tor, additional junk can be appended at the end of an onion service
descriptor document that does not count against the 30KB maximum, which the
client will happily download and then ignore.

Neither of these behaviors needs to be preserved, and neither can happen in
normal operation. They can either be addressed directly by checks on
HSDIR-based RELAY_COMMAND_DATA lengths and descriptor parsing, or by simply
enforcing that circuits used to fetch service descriptors can *only* receive
as many bytes as the maximum descriptor size, before being closed.
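
Both mitigations can be combined in a small per-circuit counter, sketched here
in Rust. Everything named below is hypothetical. `FULL_DATA_LEN` assumes the
498-byte DATA payload of the current relay cell format (the value would differ
under prop#340), and a real byte budget would also need to allow for whatever
framing surrounds the descriptor in the directory response.

```rust
/// Assumed capacity of a full DATA payload (509-byte body minus 11-byte header).
pub const FULL_DATA_LEN: usize = 498;

/// Signals that the circuit must be closed.
#[derive(Debug, PartialEq, Eq)]
pub struct CloseCircuit;

/// Hypothetical accounting for a circuit used to fetch one descriptor.
pub struct DescFetchBudget {
    remaining: usize,
    seen_partial: bool,
}

impl DescFetchBudget {
    /// `max_bytes` is the most data this circuit may ever receive.
    pub fn new(max_bytes: usize) -> Self {
        Self { remaining: max_bytes, seen_partial: false }
    }

    /// Account for one received DATA message of `len` payload bytes.
    pub fn on_data(&mut self, len: usize) -> Result<(), CloseCircuit> {
        // Empty, oversized, or over-budget payloads are never valid.
        if len == 0 || len > FULL_DATA_LEN || len > self.remaining {
            return Err(CloseCircuit);
        }
        // Only the final DATA message may be non-full.
        if self.seen_partial {
            return Err(CloseCircuit);
        }
        if len < FULL_DATA_LEN {
            self.seen_partial = true;
        }
        self.remaining -= len;
        Ok(())
    }
}
```

With this shape, an HSDIR that dribbles the descriptor out in many small DATA
messages is caught by the `seen_partial` check on the second short message,
and appended junk is caught once `remaining` is exhausted.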

XXX: Consider RELAY_COMMAND_END_ACK also..
  - https://gitlab.torproject.org/tpo/core/torspec/-/issues/196

XXX: Tickets to grovel through for other stuff:
https://gitlab.torproject.org/tpo/core/torspec/-/issues/38
https://gitlab.torproject.org/tpo/core/torspec/-/issues/39
https://gitlab.torproject.org/tpo/core/arti/-/issues/525


# Command Allowlist enumeration { #CTORALLOWLIST }

XXX: We are planning to remove this section after we finish the state
machines; keeping it for reference until then for cross-checking.

Formerly, in C-Tor, we took the approach of performing a series of ad-hoc
checks for each message command. Here are those rules, for spot-checking that
the above state machines cover them.

All relay commands are rejected by clients and services unless a rule says
they are OK.

Here's a list of those rules, by relay command:

  - RELAY_COMMAND_DATA 2
    - This command MUST only arrive for valid open or half-open stream ID
    - This command MUST have data length > 0
    - On HSDIR circuits, ONLY ONE command is allowed to have a non-full
      payload (the last command). See Section 4.

  - RELAY_COMMAND_END 3
    - This command MUST only arrive ONCE for each valid open or half-open
      stream ID

  - RELAY_COMMAND_CONNECTED 4
    - This command MUST ONLY be accepted ONCE by clients if they sent a BEGIN
      or BEGIN_DIR
    - The stream ID MUST match the stream ID from BEGIN (or BEGIN_DIR)

  - RELAY_COMMAND_DROP 10
    - This command is accepted by clients from any hop that they
      have negotiated an active circuit padding machine with

  - RELAY_COMMAND_CONFLUX_LINKED 20
    - Ensure that a LINK cell was sent to the hop that sent this
    - Ensure that no previous LINKED cell has arrived on this circuit

  - RELAY_COMMAND_CONFLUX_SWITCH 22
    - Ensure that conflux is enabled and linked
    - If Prop#340 is in use, this cell MUST be packed with a valid
      multiplexed RELAY_COMMAND_DATA cell.

  - RELAY_COMMAND_INTRODUCE2 35
    - Services MUST check:
      - The intro is for a valid service identity and auth
      - The command has a valid sub-credential
      - The command is not a replay (possibly not close circuit?)

  - RELAY_COMMAND_RENDEZVOUS2 37
    - This command MUST ONLY arrive ONCE in response to a sent REND1 cell,
      on the appropriate circuit
    - The ntor handshake must succeed with MAC validation

  - RELAY_COMMAND_INTRO_ESTABLISHED 38
    - Services MUST check:
      - This cell MUST ONLY come ONCE in response to
        RELAY_COMMAND_ESTABLISH_INTRO, for the appropriate service identity

  - RELAY_COMMAND_RENDEZVOUS_ESTABLISHED 39
    - This command MUST ONLY be accepted ONCE in response to
      RELAY_COMMAND_ESTABLISH_RENDEZVOUS

  - RELAY_COMMAND_INTRODUCE_ACK 40
    - This command MUST ONLY be accepted ONCE by clients, in response to
      RELAY_COMMAND_INTRODUCE1

  - RELAY_COMMAND_PADDING_NEGOTIATED 42
    - This command MUST ONLY be accepted by clients in response to
      PADDING_NEGOTIATE

  - RELAY_COMMAND_XOFF 43
    - Ensure that congestion control is enabled and negotiated
    - Ensure that the stream id is either opened or half-open
    - Ensure that the stream id is in "XON" state

  - RELAY_COMMAND_XON 44
    - Ensure that congestion control is enabled and negotiated
    - Ensure that the stream id is either opened or half-open
    - Enforce always packing this to a SENDME with Prop#340?

  - RELAY_COMMAND_CONNECTED_UDP
    - The stream id in this command MUST match that from
      RELAY_COMMAND_CONNECT_UDP
    - This command is only accepted once per UDP stream id

  - RELAY_COMMAND_DATAGRAM
    - This command MUST only arrive for valid open or half-open stream ID
    - This command MUST have data length > 0


References:

[DROPMARK]: https://petsymposium.org/2018/files/papers/issue2/popets-2018-0011.pdf