From c5e82549c5b97a487d8f0c11daaefd894c655f30 Mon Sep 17 00:00:00 2001
From: Mike Perry <mikeperry-git@torproject.org>
Date: Fri, 18 Aug 2023 20:18:34 +0000
Subject: Prop#349: Command state validation

---
 proposals/349-command-state-validation.md | 666 ++++++++++++++++++++++++++++++
 1 file changed, 666 insertions(+)
 create mode 100644 proposals/349-command-state-validation.md

diff --git a/proposals/349-command-state-validation.md b/proposals/349-command-state-validation.md
new file mode 100644
index 0000000..47598e6
--- /dev/null
+++ b/proposals/349-command-state-validation.md
@@ -0,0 +1,666 @@
+```
+Filename: 349-command-state-validation.md
+Title: Client-Side Command Acceptance Validation
+Author: Mike Perry
+Created: 2023-08-17
+Status: Draft
+```
+
+# Introduction
+
+The ability of relays to inject end-to-end relay cells that are ignored by
+clients allows malicious relays to create a covert channel to verify that they
+are present in multiple positions of a path. This covert channel allows a
+Guard to deanonymize 100% of its traffic, or just all the traffic of a
+particular client IP address.
+
+This attack was first documented in [DROPMARK]. Proposal 344 describes the
+severity of this attack, and how this kind of end-to-end covert channel leads
+to full deanonymization, in a reliable way, in practice. (Recall that dropped
+cell attacks are most severe when an adversary can inject arbitrary end-to-end
+data patterns at times when the circuit is known to be idle, before it is used
+for traffic; injection at this point enables path bias attacks which can
+ensure that only malicious Guard+Exit relays are present in all circuits used
+by a particular target client IP address. For further details, see Proposal
+344.)
+
+This proposal is targeting arti-client, not C-Tor. This proposal is specific
+to client-side checks of relay cells and relay messages. Its primary change to
+behavior is the definition of state machines that enforce what relay message
+commands are acceptable on a given circuit, and when.
+
+By applying and enforcing these state machine rules, we prevent the end-to-end
+transmission of arbitrary amounts of data, and ensure that predictable periods
+of the protocol are happening as expected, and not filled with side channel
+packet patterns.
+
+
+
+## Overview of dropped cell types
+
+Dropped cells are cells that a relay can inject that end up ignored and
+discarded by a Tor client.
+
+These include:
+  1. Unparsable cells
+  2. Invalid relay commands
+  3. Unrecognized cells (i.e., wrong source hop, or decrypt failures)
+  4. Unsupported (or consensus-disabled) relay commands or extensions
+  5. Out-of-context relay commands
+  6. Duplicate relay commands
+  7. Relay commands that hit any error codepaths
+  8. Relay commands for an invalid or already-closed stream ID
+  9. Semantically void relay cells (including relay data len == 0, or PING)
+  10. Onion descriptor-appended junk
+
+Items 1-4 and 8 are handled by the existing relay command parsers in arti. In
+these cases, arti closes the circuit already.
+
+> XXX: Arti's relay parser is lazy; see https://gitlab.torproject.org/tpo/core/arti/-/merge_requests/1978
+> Does this mean that individual components need to properly propagate error
+> information in order for circuits to get closed, when a command does not
+> parse?
+
+The state machines of this proposal handle 5-7 in a rigorous way. (In many
+cases of out-of-context relay cells, arti already closes the circuit;
+our goal here is to centralize this validation so that we can ensure that
+it is not possible for any relay commands to omit checks or allow unbounded
+activity.)
+
+> XXX: Does arti allow extra onion-descriptor junk to be appended after the
+> descriptor signature? C-Tor does...
+
+
+# Architectural Patterns and Behavior
+
+Ideally, the handling of invalid protocol behavior should be centralized,
+so that validation can happen in one easy-to-audit place, rather than spread
+across the codebase (as it currently is with C-Tor).
+
+For some narrow cases of invalid protocol activity, this is trivial. The relay
+command acceptance is centralized in arti, which allows arti to immediately
+reject unknown or disabled relay commands. This kind of validation is
+necessary, but not sufficient, in order to prevent dropped cell vectors.
+
+Things quickly get complicated for parsable relay cells sent at an
+inappropriate time, and for other activity such as duplicate relay commands,
+semantically void cells, or commands that hit an error condition or a lazy
+parsing failure deep in the code, where they may be silently accepted without
+closing the circuit.
+
+To handle such cases, we propose adding a relay command message state machine
+pattern. Each relay protocol, when it becomes active on a circuit, must
+register a state machine that handles validating its messages.
+
+Because multiple relay protocols can be active at a time, multiple validation
+state machines can be attached to a circuit. This also allows protocols to
+create their own validation without needing to modify the entire validation
+process. Relay messages that are not accepted by any active protocol
+validation handler MUST result in circuit close.
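+
+In Rust (arti's language), that dispatch rule might look like the minimal
+sketch below. All names here (`RelayCmd`, `RelayMsg`, `Verdict`, `MsgHandler`,
+`CircuitValidator`, `DropFromHop`) are hypothetical and are not arti's API.
+Each handler reports whether a message is outside its scope, accepted, or
+rejected; a single rejection, or no acceptance at all, closes the circuit.
+
+```rust
+/// Hypothetical types for illustration; not arti's API.
+#[derive(Clone, Copy, Debug, PartialEq, Eq)]
+pub enum RelayCmd {
+    Sendme,
+    Drop,
+}
+
+#[derive(Clone, Copy, Debug, PartialEq, Eq)]
+pub struct RelayMsg {
+    pub hop: u8,
+    pub stream_id: u16,
+    pub cmd: RelayCmd,
+}
+
+/// One handler's opinion about one incoming message.
+#[derive(Clone, Copy, Debug, PartialEq, Eq)]
+pub enum Verdict {
+    /// In scope and valid in the handler's current state.
+    Accept,
+    /// Outside this handler's Message Scope or protocol.
+    NotMine,
+    /// In scope, but invalid (wrong state, duplicate, bad contents).
+    Reject,
+}
+
+pub trait MsgHandler {
+    fn on_receive(&mut self, msg: &RelayMsg) -> Verdict;
+}
+
+#[derive(Debug, PartialEq, Eq)]
+pub struct CloseCircuit;
+
+/// The set of handlers attached to one circuit.
+#[derive(Default)]
+pub struct CircuitValidator {
+    handlers: Vec<Box<dyn MsgHandler>>,
+}
+
+impl CircuitValidator {
+    /// Called when a protocol's Trigger fires on this circuit.
+    pub fn register(&mut self, handler: Box<dyn MsgHandler>) {
+        self.handlers.push(handler);
+    }
+
+    /// Ok only if some handler accepts the message and none rejects it.
+    pub fn validate(&mut self, msg: &RelayMsg) -> Result<(), CloseCircuit> {
+        let mut accepted = false;
+        for handler in self.handlers.iter_mut() {
+            match handler.on_receive(msg) {
+                Verdict::Accept => accepted = true,
+                Verdict::NotMine => {}
+                Verdict::Reject => return Err(CloseCircuit),
+            }
+        }
+        if accepted { Ok(()) } else { Err(CloseCircuit) }
+    }
+}
+
+/// Toy handler for the usage example: accepts DROP from a single hop.
+pub struct DropFromHop(pub u8);
+
+impl MsgHandler for DropFromHop {
+    fn on_receive(&mut self, msg: &RelayMsg) -> Verdict {
+        if msg.hop == self.0 && msg.cmd == RelayCmd::Drop {
+            Verdict::Accept
+        } else {
+            Verdict::NotMine
+        }
+    }
+}
+```
+
+Returning `NotMine` for anything outside a handler's Message Scope is what
+lets several protocols share one circuit without knowing about each other.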
+
+
+## Architectural Patterns
+
+In order to handle these cases, we rely on some architectural patterns:
+  1. No relay message command may be sent to the client unless it is
+     explicitly allowed by the specification, advertised as supported, and
+     negotiated on a particular channel or circuit. (Prop#346)
+  2. Any relay commands or extension fields not successfully negotiated
+     on a circuit are invalid. This includes cells from intermediate hops,
+     which must also negotiate their use (example: padding machine
+     negotiation to middles).
+  3. By following the above principles, state machines can be developed
+     that govern when a relay command is acceptable. This covers the
+     majority of protocol activity. See the state machine descriptions below.
+  4. For some commands, additional checks must be performed by using
+     context of the protocol itself.
+
+The following relay commands require additional module state to enforce
+limitations, beyond what is known by a state machine, for #4:
+  - RELAY_COMMAND_SENDME
+    - Requires checking that the auth digest hash is accurate
+  - RELAY_COMMAND_XOFF and RELAY_COMMAND_XON
+    - Context and rate limiting are stream-dependent
+    - Packing enforcement via prop#340 is context-dependent
+  - RELAY_COMMAND_CONFLUX_SWITCH
+    - Packing enforcement via prop#340 is context-dependent
+  - RELAY_COMMAND_DROP:
+    - This can only be accepted from a hop if there is a padding
+      machine at that hop.
+  - RELAY_COMMAND_INTRODUCE2
+    - Requires inspecting replay cache (however, circuits should not get
+      closed because replays can come from the client)
+
+## Behavior
+
+When an invalid relay cell or relay message is encountered, the corresponding
+circuit should be immediately closed.
+
+Initially, this can be accomplished by sending a DESTROY cell to the Guard
+relay.
+
+Additionally, when closing circuits in this way, clients must take care not to
+allow cases of adversarially-induced infinite circuit creation in non-onion
+service protocols that are not protected by Vanguards/Vanguards-lite, by
+limiting the number of retries they perform. (One such example of this is a
+malicious conflux exit that repeatedly kills only one leg by injecting dropped
+cells to close the circuit.)
+
+While we also specify some cases where the channel to the Guard should be
+closed, this is not necessary in the general case.
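+
+A per-purpose retry counter is one simple way to bound this. The sketch below
+is illustrative only: the `RetryBudget` type, its limit, and the choice to
+reset it after a successful use are assumptions, not part of this proposal.
+
+```rust
+/// Hypothetical per-purpose cap on replacement circuits after validation closes.
+pub struct RetryBudget {
+    max_retries: u32,
+    used: u32,
+}
+
+impl RetryBudget {
+    pub fn new(max_retries: u32) -> Self {
+        Self { max_retries, used: 0 }
+    }
+
+    /// Call when validation closed a circuit; true if a replacement may be built.
+    pub fn try_retry(&mut self) -> bool {
+        if self.used >= self.max_retries {
+            return false;
+        }
+        self.used += 1;
+        true
+    }
+
+    /// Assumed policy: a circuit that completes its task restores the budget.
+    pub fn reset(&mut self) {
+        self.used = 0;
+    }
+}
+```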
+
+> XXX: I can't think of any issues severe enough to actually warrant the
+> following, but Florentin pointed it out as a possibility: A malicious Guard
+> may withhold the DESTROY, and still allow full identifier transmission before
+> the circuit is closed. While this does not directly allow full deanonymization
+> because the client won't actually use the circuit, it may still be enough to
+> make the vector useful for other attacks. For completeness against this
+> vector, we may want to consider sending a new RELAY_DESTROY command to the
+> middle node, such that it has responsibility for tearing down a circuit by
+> sending its own DESTROY cells in both directions, and then have the client send its
+> own DESTROY if the client does not get a DESTROY from the Guard.
+> See torspec#220: https://gitlab.torproject.org/tpo/core/torspec/-/issues/220
+
+
+
+# State machine descriptions
+
+These state machines apply only at the client. (There is no information leak
+from extra cells in the protocol on the relay side, so we will not be specifying
+relay-side enforcement, or implementing it for C-Tor.)
+
+There are multiple state machines, describing particular circuit purposes
+and/or components of the Tor relay protocol.
+
+Each state machine has a "Trigger" and a "Message Scope". The "Trigger" is
+the condition, relay command, or action that causes the state machine to be
+added to a circuit's command state validator set. The Message Scope is where the
+state machine applies: to a specific hop number, stream ID, or both.
+
+A circuit can have multiple state machines attached at one time.
+  * If no state machine accepts a relay command, then the circuit MUST be
+    closed.
+  * When we say "receive X" we mean "receive a _valid_ cell of
+    type X".  If the cell is invalid, we MUST kill the circuit.
+
+## Relay message handlers
+
+The state machines process enveloped relay message commands. I.e., with respect
+to prop#340, they operate on the message bodies, with the associated stream ID.
+
+In terms of Proposal #340, the calls to state machine validation would go
+after converting cells to messages, but before parsing the message body
+itself, so as to still minimize exposure of the parser attack surfaces.
+
+> XXX: Again, some validation will require early parsing, not lazy parsing
+
+Multiple relay message handlers can be registered for each circuit ID and for
+each hop on that circuit, depending on the protocols in use with that hop, as
+well as the streams to that hop.
+
+Each handler has a Message Scope, which acts as a filter so that only relay
+command messages from this scope are processed by that handler.
+
+If a message is not accepted by any active handler, the circuit MUST be
+closed.
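+
+A Message Scope can be expressed as a small filter type. In this sketch
+(hypothetical `Scope` type), the circuit ID is implicit because one validator
+is attached per circuit, and circuit-level scopes match only stream ID 0, which
+is the stream ID tor-spec uses for relay messages that affect the whole circuit.
+
+```rust
+/// Hypothetical Message Scope: a hop, plus a stream ID for stream handlers.
+#[derive(Clone, Copy, Debug, PartialEq, Eq)]
+pub struct Scope {
+    pub hop: u8,
+    /// `None` for circuit-level handlers, which see only stream ID 0.
+    pub stream_id: Option<u16>,
+}
+
+impl Scope {
+    pub fn circuit(hop: u8) -> Self {
+        Scope { hop, stream_id: None }
+    }
+
+    pub fn stream(hop: u8, stream_id: u16) -> Self {
+        Scope { hop, stream_id: Some(stream_id) }
+    }
+
+    /// True if a message from `hop` carrying `stream_id` is in this scope.
+    pub fn matches(&self, hop: u8, stream_id: u16) -> bool {
+        hop == self.hop
+            && match self.stream_id {
+                None => stream_id == 0,
+                Some(s) => stream_id == s,
+            }
+    }
+}
+```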
+
+
+### Base Handler
+
+Purpose: This handler validates commands for circuit construction and
+circuit-level SENDME activity.
+
+Trigger: Creation of a circuit; ntor handshake success for a hop
+
+Message Scope: The circuit ID and hop number must match for this handler to
+apply. (Because of leaky pipes, each hop of the circuit has a base handler
+added when that hop completes an ntor handshake and is added to the circuit.)
+
+```text
+START:
+  Upon sending EXTEND:
+     Enter EXTEND_SENT.
+
+  Receive SENDME:
+     Ensure expected auth digest matches; close circuit otherwise
+     No transition.
+
+EXTEND_SENT:
+  Receiving EXTENDED:
+     Enter START.
+
+  Receive SENDME:
+     Ensure expected auth digest matches; close circuit otherwise
+     No transition.
+```
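+
+A minimal Rust sketch of the Base Handler follows. `BaseMsg` and
+`expect_sendme` are hypothetical: the queue of 20-byte digests stands in for
+whatever record the sending path keeps of the digests that authenticated
+(v1) SENDMEs must echo.
+
+```rust
+use std::collections::VecDeque;
+
+#[derive(Clone, Copy, Debug, PartialEq, Eq)]
+pub enum BaseState {
+    Start,
+    ExtendSent,
+}
+
+/// Hypothetical parsed forms of the two messages this handler owns.
+pub enum BaseMsg {
+    Extended,
+    Sendme { auth_digest: [u8; 20] },
+}
+
+#[derive(Clone, Copy, Debug, PartialEq, Eq)]
+pub enum Verdict {
+    Accept,
+    Reject,
+}
+
+pub struct BaseHandler {
+    state: BaseState,
+    /// Digests that future SENDMEs must echo, oldest first (hypothetical).
+    expected_digests: VecDeque<[u8; 20]>,
+}
+
+impl BaseHandler {
+    pub fn new() -> Self {
+        Self { state: BaseState::Start, expected_digests: VecDeque::new() }
+    }
+
+    pub fn on_send_extend(&mut self) {
+        self.state = BaseState::ExtendSent;
+    }
+
+    /// Called by the sending path when it records a digest a SENDME must echo.
+    pub fn expect_sendme(&mut self, digest: [u8; 20]) {
+        self.expected_digests.push_back(digest);
+    }
+
+    pub fn on_receive(&mut self, msg: &BaseMsg) -> Verdict {
+        match (self.state, msg) {
+            (BaseState::ExtendSent, BaseMsg::Extended) => {
+                self.state = BaseState::Start;
+                Verdict::Accept
+            }
+            // SENDME is allowed in both states, but must echo the next expected digest.
+            (_, BaseMsg::Sendme { auth_digest }) => match self.expected_digests.pop_front() {
+                Some(expected) if expected == *auth_digest => Verdict::Accept,
+                _ => Verdict::Reject,
+            },
+            // EXTENDED without an outstanding EXTEND.
+            (BaseState::Start, BaseMsg::Extended) => Verdict::Reject,
+        }
+    }
+}
+```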
+
+### Client Introducing Handler
+
+Purpose: Circuits used by clients to connect to a service introduction point
+have this handler attached.
+
+Trigger: Usage of a circuit for client introduction
+
+Message Scope: Circuit ID and hop number must match
+
+```text
+CLIENT_INTRO_START:
+  Upon sending INTRODUCE1:
+    Enter CLIENT_INTRO_WAIT
+
+CLIENT_INTRO_WAIT:
+  Receive INTRODUCE_ACK:
+    Accept
+    Enter CLIENT_INTRO_END
+
+CLIENT_INTRO_END:
+  No transitions possible
+  - XXX: Enforce that no new handlers can be added? We may still have padding
+    handlers though.
+```
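+
+The client introduction handler is small enough to sketch in full. The names
+are hypothetical; the point is that INTRODUCE_ACK is accepted exactly once,
+and only after INTRODUCE1 was sent.
+
+```rust
+#[derive(Clone, Copy, Debug, PartialEq, Eq)]
+pub enum ClientIntroState {
+    Start,
+    Wait,
+    End,
+}
+
+#[derive(Clone, Copy, Debug, PartialEq, Eq)]
+pub enum Verdict {
+    Accept,
+    Reject,
+}
+
+pub struct ClientIntroHandler {
+    state: ClientIntroState,
+}
+
+impl ClientIntroHandler {
+    pub fn new() -> Self {
+        Self { state: ClientIntroState::Start }
+    }
+
+    pub fn state(&self) -> ClientIntroState {
+        self.state
+    }
+
+    /// INTRODUCE1 may be sent once, from the start state only.
+    pub fn on_send_introduce1(&mut self) -> Result<(), &'static str> {
+        if self.state != ClientIntroState::Start {
+            return Err("INTRODUCE1 already sent on this circuit");
+        }
+        self.state = ClientIntroState::Wait;
+        Ok(())
+    }
+
+    /// INTRODUCE_ACK is the only message this handler ever accepts.
+    pub fn on_introduce_ack(&mut self) -> Verdict {
+        if self.state == ClientIntroState::Wait {
+            self.state = ClientIntroState::End;
+            Verdict::Accept
+        } else {
+            // Unsolicited (before INTRODUCE1) or duplicate (after END).
+            Verdict::Reject
+        }
+    }
+}
+```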
+
+
+### Service Introduce Handler
+
+Purpose: Service-side onion service introduction circuits have this handler
+attached.
+
+Trigger: Onion service establishing an introduction point circuit
+
+Message Scope: Circuit ID and hop number must match
+
+```text
+SERVICE_INTRO_START:
+  Upon sending ESTABLISH_INTRO:
+    Enter SERVICE_INTRO_ESTABLISH
+
+SERVICE_INTRO_ESTABLISH:
+  Receiving INTRO_ESTABLISHED:
+    Enter SERVICE_INTRO_ESTABLISHED
+
+SERVICE_INTRO_ESTABLISHED:
+  Receiving INTRODUCE2:
+    Accept
+```
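+
+A sketch of the service side, with hypothetical names. The `replay_key` field
+stands in for whatever value the service's replay cache is keyed on. Following
+the INTRODUCE2 note earlier in this proposal, a replay is reported separately
+from a protocol violation so that it can be ignored without closing the circuit.
+
+```rust
+use std::collections::HashSet;
+
+#[derive(Clone, Copy, Debug, PartialEq, Eq)]
+pub enum SvcIntroState {
+    Start,
+    Establishing,
+    Established,
+}
+
+pub enum SvcIntroMsg {
+    IntroEstablished,
+    /// `replay_key` is a hypothetical stand-in for the replay-cache key.
+    Introduce2 { replay_key: [u8; 32] },
+}
+
+#[derive(Clone, Copy, Debug, PartialEq, Eq)]
+pub enum Verdict {
+    Accept,
+    /// Valid in this state, but a replay: ignore it, do not close the circuit.
+    Replay,
+    Reject,
+}
+
+pub struct ServiceIntroHandler {
+    state: SvcIntroState,
+    seen: HashSet<[u8; 32]>,
+}
+
+impl ServiceIntroHandler {
+    pub fn new() -> Self {
+        Self { state: SvcIntroState::Start, seen: HashSet::new() }
+    }
+
+    pub fn on_send_establish_intro(&mut self) {
+        if self.state == SvcIntroState::Start {
+            self.state = SvcIntroState::Establishing;
+        }
+    }
+
+    pub fn on_receive(&mut self, msg: &SvcIntroMsg) -> Verdict {
+        match (self.state, msg) {
+            (SvcIntroState::Establishing, SvcIntroMsg::IntroEstablished) => {
+                self.state = SvcIntroState::Established;
+                Verdict::Accept
+            }
+            (SvcIntroState::Established, SvcIntroMsg::Introduce2 { replay_key }) => {
+                // HashSet::insert returns false if the key was already present.
+                if self.seen.insert(*replay_key) {
+                    Verdict::Accept
+                } else {
+                    Verdict::Replay
+                }
+            }
+            // A second INTRO_ESTABLISHED, or INTRODUCE2 before establishment.
+            _ => Verdict::Reject,
+        }
+    }
+}
+```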
+
+
+### Client Rendezvous Handler
+
+Purpose: Circuits used by clients to establish a rendezvous point have this
+handler attached.
+
+Trigger: Client rendezvous initiation
+
+Message Scope: Circuit ID and hop number must match
+
+```text
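+CLIENT_REND_START:
+  Upon sending ESTABLISH_RENDEZVOUS:
+    Enter CLIENT_REND_ESTABLISHING
+
+CLIENT_REND_ESTABLISHING:
+  Receive RENDEZVOUS_ESTABLISHED:
+    Enter CLIENT_REND_WAIT
+
+CLIENT_REND_WAIT:
+  Receive RENDEZVOUS2:
+    Enter CLIENT_REND_ESTABLISHED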
+
+CLIENT_REND_ESTABLISHED:
+  Remain in this state; launch TCP, UDP, or Conflux handlers for streams
+```
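+
+The sketch below follows the client's message order in rend-spec-v3: send
+ESTABLISH_RENDEZVOUS, receive RENDEZVOUS_ESTABLISHED, then receive RENDEZVOUS2
+once the service has sent RENDEZVOUS1 to the rendezvous point. The type names
+are hypothetical. Once `Established` is reached, stream handlers take over and
+this handler rejects any further rendezvous messages.
+
+```rust
+#[derive(Clone, Copy, Debug, PartialEq, Eq)]
+pub enum ClientRendState {
+    Start,
+    Establishing,
+    AwaitRendezvous2,
+    Established,
+}
+
+#[derive(Clone, Copy, Debug, PartialEq, Eq)]
+pub enum RendMsg {
+    RendezvousEstablished,
+    Rendezvous2,
+}
+
+#[derive(Clone, Copy, Debug, PartialEq, Eq)]
+pub enum Verdict {
+    Accept,
+    Reject,
+}
+
+pub struct ClientRendHandler {
+    state: ClientRendState,
+}
+
+impl ClientRendHandler {
+    pub fn new() -> Self {
+        Self { state: ClientRendState::Start }
+    }
+
+    pub fn state(&self) -> ClientRendState {
+        self.state
+    }
+
+    pub fn on_send_establish_rendezvous(&mut self) {
+        if self.state == ClientRendState::Start {
+            self.state = ClientRendState::Establishing;
+        }
+    }
+
+    /// Each message is accepted exactly once, and only in order.
+    pub fn on_receive(&mut self, msg: RendMsg) -> Verdict {
+        use ClientRendState as S;
+        let next = match (self.state, msg) {
+            (S::Establishing, RendMsg::RendezvousEstablished) => S::AwaitRendezvous2,
+            (S::AwaitRendezvous2, RendMsg::Rendezvous2) => S::Established,
+            _ => return Verdict::Reject,
+        };
+        self.state = next;
+        Verdict::Accept
+    }
+}
+```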
+
+
+### Service Rendezvous Handler
+
+Purpose: Circuits used by services to connect to a rendezvous point have this
+handler attached.
+
+Trigger: Incoming introduce cell/service rend initiation
+
+Message Scope: Circuit ID and hop number must match
+
+```text
+SERVICE_REND_START:
+  Upon sending RENDEZVOUS1:
+    Enter SERVICE_REND_ESTABLISHED
+    (The rendezvous point sends no reply to RENDEZVOUS1; it relays the
+     handshake to the client as RENDEZVOUS2.)
+
+SERVICE_REND_ESTABLISHED:
+  Remain in this state; launch TCP, UDP, or Conflux handlers for streams
+```
+
+
+### CircPad Handler
+
+Purpose: Circuit-level padding is negotiated with a particular hop in the
+circuit; when it is negotiated, we need to allow padding cells from that hop.
+
+Trigger: Negotiation of a circuit padding machine
+
+Message Scope: Circuit ID and hop must match; padding machine must be active
+
+```text
+PADDING_START:
+  Upon sending PADDING_NEGOTIATE:
+    Enter PADDING_NEGOTIATING
+
+PADDING_NEGOTIATING:
+  Receiving PADDING_NEGOTIATED:
+    Enter PADDING_ACTIVE
+
+PADDING_ACTIVE:
+  Receiving DROP:
+    Accept (if from correct hop)
+    - XXX: We could perform more sophisticated rate limiting accounting here
+      too?
+```
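+
+A sketch of the CircPad Handler with hypothetical names. DROP is accepted only
+while the machine is active and only from the negotiated hop; a DROP from any
+other hop is reported as out of scope, so it closes the circuit unless some
+other handler accepts it. Rate limiting, mentioned above as an open question,
+is not modeled.
+
+```rust
+#[derive(Clone, Copy, Debug, PartialEq, Eq)]
+pub enum PadState {
+    Start,
+    Negotiating,
+    Active,
+}
+
+#[derive(Clone, Copy, Debug, PartialEq, Eq)]
+pub enum PadMsg {
+    PaddingNegotiated,
+    Drop,
+}
+
+#[derive(Clone, Copy, Debug, PartialEq, Eq)]
+pub enum Verdict {
+    Accept,
+    NotMine,
+    Reject,
+}
+
+pub struct CircPadHandler {
+    hop: u8,
+    state: PadState,
+}
+
+impl CircPadHandler {
+    pub fn new(hop: u8) -> Self {
+        Self { hop, state: PadState::Start }
+    }
+
+    pub fn on_send_padding_negotiate(&mut self) {
+        if self.state == PadState::Start {
+            self.state = PadState::Negotiating;
+        }
+    }
+
+    pub fn on_receive(&mut self, from_hop: u8, msg: PadMsg) -> Verdict {
+        // Messages from other hops belong to other handlers' scopes.
+        if from_hop != self.hop {
+            return Verdict::NotMine;
+        }
+        match (self.state, msg) {
+            (PadState::Negotiating, PadMsg::PaddingNegotiated) => {
+                self.state = PadState::Active;
+                Verdict::Accept
+            }
+            (PadState::Active, PadMsg::Drop) => Verdict::Accept,
+            // DROP before negotiation, or an unsolicited PADDING_NEGOTIATED.
+            _ => Verdict::Reject,
+        }
+    }
+}
+```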
+
+### Resolve Stream Handler
+
+Purpose: This handler is created on circuits when a resolve happens.
+
+Trigger: RESOLVE message
+
+Message Scope: Circuit ID, stream ID, and hop number must all match
+
+```text
+RESOLVE_START:
+  Send a RESOLVE message:
+    Enter RESOLVE_SENT
+
+RESOLVE_SENT:
+  Receive a RESOLVED or an END:
+    Enter RESOLVE_START.
+```
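+
+A sketch of the Resolve Stream Handler (hypothetical names). Each RESOLVE
+allows exactly one RESOLVED or END in reply; an unsolicited or duplicate reply
+is rejected.
+
+```rust
+#[derive(Clone, Copy, Debug, PartialEq, Eq)]
+pub enum ResolveState {
+    Start,
+    Sent,
+}
+
+#[derive(Clone, Copy, Debug, PartialEq, Eq)]
+pub enum ResolveMsg {
+    Resolved,
+    End,
+}
+
+#[derive(Clone, Copy, Debug, PartialEq, Eq)]
+pub enum Verdict {
+    Accept,
+    Reject,
+}
+
+pub struct ResolveHandler {
+    state: ResolveState,
+}
+
+impl ResolveHandler {
+    pub fn new() -> Self {
+        Self { state: ResolveState::Start }
+    }
+
+    pub fn on_send_resolve(&mut self) {
+        self.state = ResolveState::Sent;
+    }
+
+    pub fn on_receive(&mut self, msg: ResolveMsg) -> Verdict {
+        match (self.state, msg) {
+            // Exactly one RESOLVED or END answers each RESOLVE.
+            (ResolveState::Sent, ResolveMsg::Resolved | ResolveMsg::End) => {
+                self.state = ResolveState::Start;
+                Verdict::Accept
+            }
+            // Nothing outstanding: any reply is unsolicited.
+            (ResolveState::Start, _) => Verdict::Reject,
+        }
+    }
+}
+```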
+
+
+### TCP Stream handler
+
+Purpose: This handler is created when the client creates a new stream ID, using either
+BEGIN or BEGIN_DIR.
+
+Trigger: New AP or DirConn stream
+
+Message Scope: Circuit ID, stream ID, and hop number must all match; stream ID
+must be open or half-open (half-open is END_SENT).
+
+```text
+TCP_STREAM_START:
+  Send a BEGIN or BEGIN_DIR message:
+    Enter BEGIN_SENT.
+
+BEGIN_SENT:
+  Receive an END:
+    Enter TCP_STREAM_START.
+  Receive a CONNECTED:
+    Enter STREAM_OPEN.
+
+STREAM_OPEN:
+  Receive DATA:
+    Verify length is > 0
+    XXX: Handle [HSDIRINFLATION] here?
+    Process.
+
+  Receive XOFF:
+    Enter STREAM_XOFF
+
+  Send END:
+    Enter END_SENT.
+
+  Receive END:
+    Enter TCP_STREAM_START
+
+STREAM_XOFF:
+  Receive DATA:
+    Verify length is > 0
+    XXX: Handle [HSDIRINFLATION] here?
+    Process.
+
+  Send END:
+    Enter END_SENT.
+
+  Receive XON:
+    Enter STREAM_XON
+
+  Receive END:
+    Enter TCP_STREAM_START
+
+STREAM_XON:
+  Receive DATA:
+    Verify length is > 0
+    XXX: Handle [HSDIRINFLATION] here?
+    Process.
+
+  Receive XOFF:
+    If prop#340 is enabled, verify packed with SENDME
+    Enter STREAM_XOFF
+
+  Receive XON:
+    If prop#340 is enabled, verify packed with SENDME
+    Verify rate has changed
+
+  Send END:
+    Enter END_SENT.
+
+  Receive END:
+    Enter TCP_STREAM_START
+
+END_SENT:
+  Same as STREAM_OPEN, except do not actually deliver data.
+  Only remain in this state for one RTT_max, or until END_ACK.
+```
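+
+The TCP stream machine is the largest, so a Rust sketch helps check that the
+states compose. All names are hypothetical. Assumptions: the caller passes the
+current time so that the END_SENT window of one RTT_max can be enforced,
+END_ACK is not modeled, and the prop#340 packing checks on XON/XOFF are
+omitted. DATA is `Deliver`ed in open states but only `Accept`ed (and dropped)
+after we sent END.
+
+```rust
+use std::time::{Duration, Instant};
+
+#[derive(Clone, Copy, Debug, PartialEq, Eq)]
+pub enum StreamState {
+    Start,
+    BeginSent,
+    Open,
+    Xoff,
+    /// XON received, with the most recently advertised rate.
+    Xon { rate: u32 },
+    /// Half-open until `deadline` (one RTT_max after we sent END).
+    EndSent { deadline: Instant },
+}
+
+#[derive(Clone, Copy, Debug, PartialEq, Eq)]
+pub enum StreamMsg {
+    Connected,
+    Data { len: usize },
+    Xoff,
+    Xon { rate: u32 },
+    End,
+}
+
+#[derive(Clone, Copy, Debug, PartialEq, Eq)]
+pub enum Verdict {
+    Deliver,
+    Accept,
+    Reject,
+}
+
+pub struct TcpStreamHandler {
+    state: StreamState,
+}
+
+impl TcpStreamHandler {
+    pub fn new() -> Self {
+        Self { state: StreamState::Start }
+    }
+
+    pub fn on_send_begin(&mut self) {
+        if self.state == StreamState::Start {
+            self.state = StreamState::BeginSent;
+        }
+    }
+
+    pub fn on_send_end(&mut self, now: Instant, rtt_max: Duration) {
+        self.state = StreamState::EndSent { deadline: now + rtt_max };
+    }
+
+    pub fn on_receive(&mut self, msg: StreamMsg, now: Instant) -> Verdict {
+        use StreamMsg as M;
+        use StreamState as S;
+        // Zero-length DATA is semantically void in every state.
+        if msg == (M::Data { len: 0 }) {
+            return Verdict::Reject;
+        }
+        let (next, verdict) = match (self.state, msg) {
+            (S::BeginSent, M::Connected) => (S::Open, Verdict::Accept),
+            (S::BeginSent, M::End) => (S::Start, Verdict::Accept),
+            (S::Open | S::Xoff | S::Xon { .. }, M::Data { .. }) => (self.state, Verdict::Deliver),
+            (S::Open | S::Xon { .. }, M::Xoff) => (S::Xoff, Verdict::Accept),
+            (S::Xoff, M::Xon { rate }) => (S::Xon { rate }, Verdict::Accept),
+            // A repeated XON must advertise a changed rate.
+            (S::Xon { rate: old }, M::Xon { rate }) if rate != old => (S::Xon { rate }, Verdict::Accept),
+            (S::Open | S::Xoff | S::Xon { .. }, M::End) => (S::Start, Verdict::Accept),
+            // Half-open: like Open, but nothing is delivered, and only until the deadline.
+            (S::EndSent { deadline }, _) if now > deadline => return Verdict::Reject,
+            (S::EndSent { .. }, M::Data { .. } | M::Xoff) => (self.state, Verdict::Accept),
+            (S::EndSent { .. }, M::End) => (S::Start, Verdict::Accept),
+            _ => return Verdict::Reject,
+        };
+        self.state = next;
+        verdict
+    }
+}
+```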
+
+
+### Conflux Handler
+
+Purpose: Circuits that are a part of a conflux set have a conflux handler, associated
+with the last hop.
+
+Trigger: Creation of a conflux set
+
+Message Scope: Circuit ID and hop number must match
+ - XXX: Linked circuits must accept stream ids from either circuit for other
+   handlers :/
+
+```text
+CONFLUX_START: (all conflux leg circuits start here)
+  Upon sending CONFLUX_LINK:
+     Enter CONFLUX_LINKING
+
+CONFLUX_LINKING:
+  Receiving CONFLUX_LINKED:
+     Send CONFLUX_LINKED_ACK
+     Enter CONFLUX_LINKED
+
+CONFLUX_LINKED:
+  Receiving CONFLUX_SWITCH:
+     If prop#340 is negotiated, ensure packed with a DATA cell
+```
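+
+A sketch of a single conflux leg's handler, with hypothetical names. The
+`AcceptAndAck` verdict tells the caller to send CONFLUX_LINKED_ACK, and
+`packed_with_data` stands in for however a prop#340 message container would
+report that a SWITCH arrived together with DATA.
+
+```rust
+#[derive(Clone, Copy, Debug, PartialEq, Eq)]
+pub enum ConfluxState {
+    Start,
+    Linking,
+    Linked,
+}
+
+#[derive(Clone, Copy, Debug, PartialEq, Eq)]
+pub enum ConfluxMsg {
+    Linked,
+    Switch { packed_with_data: bool },
+}
+
+#[derive(Clone, Copy, Debug, PartialEq, Eq)]
+pub enum Verdict {
+    Accept,
+    /// Accept, and the caller must now send CONFLUX_LINKED_ACK.
+    AcceptAndAck,
+    Reject,
+}
+
+pub struct ConfluxHandler {
+    state: ConfluxState,
+    prop340_negotiated: bool,
+}
+
+impl ConfluxHandler {
+    pub fn new(prop340_negotiated: bool) -> Self {
+        Self { state: ConfluxState::Start, prop340_negotiated }
+    }
+
+    pub fn on_send_link(&mut self) {
+        if self.state == ConfluxState::Start {
+            self.state = ConfluxState::Linking;
+        }
+    }
+
+    pub fn on_receive(&mut self, msg: ConfluxMsg) -> Verdict {
+        match (self.state, msg) {
+            (ConfluxState::Linking, ConfluxMsg::Linked) => {
+                self.state = ConfluxState::Linked;
+                Verdict::AcceptAndAck
+            }
+            (ConfluxState::Linked, ConfluxMsg::Switch { packed_with_data }) => {
+                // With prop#340 negotiated, a SWITCH must travel with DATA.
+                if self.prop340_negotiated && !packed_with_data {
+                    Verdict::Reject
+                } else {
+                    Verdict::Accept
+                }
+            }
+            // LINKED without LINK, a duplicate LINKED, or SWITCH before linking.
+            _ => Verdict::Reject,
+        }
+    }
+}
+```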
+
+
+### UDP Stream Handler
+
+Purpose: Circuits carrying UDP streams (Prop#339) have this handler attached.
+
+Trigger: UDP stream creation
+
+Message Scope: Circuit ID, hop number, and stream-id must match
+
+```text
+UDP_STREAM_START:
+  If no other udp streams used on circuit:
+    Send CONNECT_UDP for any stream, enter UDP_CONNECTING
+  else:
+    Immediately enter UDP_CONNECTING
+    (CONNECTED_UDP MAY arrive without a CONNECT_UDP, after the first UDP
+     stream on a circuit is established)
+
+UDP_CONNECTING:
+  Upon receipt of CONNECTED_UDP, enter UDP_CONNECTED
+
+UDP_CONNECTED:
+  Receive DATAGRAM:
+    Verify length > 0
+    Verify Prop#339 NAT rules are obeyed, including srcport and stream limits
+    Process.
+
+  Send END:
+    Enter UDP_END_SENT
+
+UDP_END_SENT:
+  Same as UDP_CONNECTED, except do not actually deliver data.
+  Only remain in this state for one RTT_max, or until END_ACK,
+  then transition to UDP_STREAM_START.
+```
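+
+A sketch of the UDP Stream Handler with hypothetical names. The caller tells
+`open` whether other UDP streams already exist on the circuit, which decides
+whether CONNECT_UDP must be sent; in both cases the stream then waits for
+CONNECTED_UDP. The NAT-rule checks are left as a comment.
+
+```rust
+#[derive(Clone, Copy, Debug, PartialEq, Eq)]
+pub enum UdpState {
+    Start,
+    Connecting,
+    Connected,
+    EndSent,
+}
+
+#[derive(Clone, Copy, Debug, PartialEq, Eq)]
+pub enum UdpMsg {
+    ConnectedUdp,
+    Datagram { len: usize },
+}
+
+#[derive(Clone, Copy, Debug, PartialEq, Eq)]
+pub enum Verdict {
+    Deliver,
+    Accept,
+    Reject,
+}
+
+pub struct UdpStreamHandler {
+    state: UdpState,
+}
+
+impl UdpStreamHandler {
+    pub fn new() -> Self {
+        Self { state: UdpState::Start }
+    }
+
+    /// Moves to Connecting. Returns true if the caller must send CONNECT_UDP,
+    /// i.e. if no other UDP stream has been used on this circuit.
+    pub fn open(&mut self, other_udp_streams_on_circuit: bool) -> bool {
+        self.state = UdpState::Connecting;
+        !other_udp_streams_on_circuit
+    }
+
+    pub fn on_send_end(&mut self) {
+        if self.state == UdpState::Connected {
+            self.state = UdpState::EndSent;
+        }
+    }
+
+    /// Called once RTT_max has elapsed (or END_ACK arrived) after our END.
+    pub fn on_end_sent_done(&mut self) {
+        if self.state == UdpState::EndSent {
+            self.state = UdpState::Start;
+        }
+    }
+
+    pub fn on_receive(&mut self, msg: UdpMsg) -> Verdict {
+        match (self.state, msg) {
+            // Empty datagrams are semantically void in every state.
+            (_, UdpMsg::Datagram { len: 0 }) => Verdict::Reject,
+            (UdpState::Connecting, UdpMsg::ConnectedUdp) => {
+                self.state = UdpState::Connected;
+                Verdict::Accept
+            }
+            // NAT-rule checks (source port, stream limits) are omitted here.
+            (UdpState::Connected, UdpMsg::Datagram { .. }) => Verdict::Deliver,
+            // Half-closed: accept, but do not deliver.
+            (UdpState::EndSent, UdpMsg::Datagram { .. }) => Verdict::Accept,
+            _ => Verdict::Reject,
+        }
+    }
+}
+```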
+
+
+# HSDIR Inflation  { #HSDIRINFLATION }
+
+XXX: This can be folded into the state machines and/or rend-spec. The state
+machines should actually be able to handle this, once they are ready for it.
+
+One of the most common questions about dropped cells is "what about data cells
+with a 1 byte payload?". As Prop#344 makes clear, this is not a dropped cell
+attack, but is instead an instance of an Active Traffic Manipulation Covert
+Channel, described in Section 1.3.2. The lower severity of active traffic
+manipulation is due to the fact that it cannot be used to deanonymize 100% of
+a target client's circuits, whereas the combination of path bias and
+pre-usage dropped cells can.
+
+However, there is one case where one can construct a potent attack from this
+Active Traffic Manipulation: by making use of onion service circuits being
+built on demand by an application. Further, because the onion service
+handshake is uniquely fingerprintable (see Section 1.2.1 of Prop#344), it is
+possible to use this vector in this specific case to encode an identifier in
+the timing and traffic patterns of the onion service descriptor download,
+similar to how the CMU attack operated, and use both the onion service
+fingerprint and descriptor traffic pattern to transmit the fact that a
+particular onion service was visited, to the Guard or possibly even a local
+network observer.
+
+A normal hidden service descriptor occupies only ~10 cells (with a hard max of
+30KB, or ~60 cells). This is not enough to reliably encode the full address of
+the onion service in a timing-based covert channel.
+
+However, there are two ways to cause this descriptor download to transmit
+enough data to encode such a covert channel, and replicate the CMU attack
+using timing information of this data.
+
+First, the actual descriptor payload can be spread across many DATA cells that
+are filled only partially with data (which does not happen if the HSDIR is
+honest and well-behaved, because it always has the full descriptor on hand).
+
+Second, in C-Tor, additional junk can be appended at the end of an onion service
+descriptor document that does not count against the 30KB maximum, which the
+client will happily download and then ignore.
+
+Neither behavior needs to be preserved, and neither can happen in
+normal operation. They can either be addressed directly by checks on
+HSDIR-based RELAY_COMMAND_DATA lengths and descriptor parsing, or by simply
+enforcing that circuits used to fetch service descriptors can *only* receive
+as many bytes as the maximum descriptor size, before being closed.
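+
+As a sketch combining both approaches, the hypothetical `HsDescFetchGuard`
+below tracks DATA lengths on a descriptor-fetch stream. It assumes the current
+498-byte relay data capacity (not prop#340 packing) and takes the byte limit as
+a parameter; a real limit would need headroom for the HTTP response headers as
+well as the descriptor. It also applies the rule that only the last DATA cell
+of an honest download may be short.
+
+```rust
+/// Relay data capacity per cell in the current format: a 509-byte cell body
+/// minus the 11-byte relay header. (Prop#340 packing would change this.)
+pub const MAX_DATA_LEN: usize = 498;
+
+#[derive(Debug, PartialEq, Eq)]
+pub enum Violation {
+    EmptyData,
+    DataAfterShortCell,
+    OverBudget,
+}
+
+/// Hypothetical per-stream guard for an onion service descriptor fetch.
+pub struct HsDescFetchGuard {
+    max_bytes: usize,
+    received: usize,
+    saw_short_cell: bool,
+}
+
+impl HsDescFetchGuard {
+    pub fn new(max_bytes: usize) -> Self {
+        Self { max_bytes, received: 0, saw_short_cell: false }
+    }
+
+    /// Check one incoming DATA message of `len` bytes.
+    pub fn on_data(&mut self, len: usize) -> Result<(), Violation> {
+        if len == 0 {
+            return Err(Violation::EmptyData);
+        }
+        // An honest HSDir has the whole descriptor on hand, so only the
+        // final DATA cell can be short.
+        if self.saw_short_cell {
+            return Err(Violation::DataAfterShortCell);
+        }
+        if len < MAX_DATA_LEN {
+            self.saw_short_cell = true;
+        }
+        self.received += len;
+        if self.received > self.max_bytes {
+            return Err(Violation::OverBudget);
+        }
+        Ok(())
+    }
+}
+```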
+
+XXX: Consider RELAY_COMMAND_END_ACK also.
+  - https://gitlab.torproject.org/tpo/core/torspec/-/issues/196
+
+XXX: Tickets to grovel through for other stuff:
+https://gitlab.torproject.org/tpo/core/torspec/-/issues/38
+https://gitlab.torproject.org/tpo/core/torspec/-/issues/39
+https://gitlab.torproject.org/tpo/core/arti/-/issues/525
+
+
+# Command Allowlist enumeration { #CTORALLOWLIST }
+
+XXX: We are planning to remove this section after we finish the state
+machines; keeping it for reference until then for cross-checking.
+
+Formerly, in C-Tor, we took the approach of performing a series of checks for
+each message command, ad-hoc. Here are those rules, for spot-checking that the
+above state machines cover them.
+
+All relay commands are rejected by clients and services unless a rule says
+they are OK.
+
+Here's a list of those rules, by relay command:
+
+  - RELAY_COMMAND_DATA 2
+    - This command MUST only arrive for valid open or half-open stream ID
+    - This command MUST have data length > 0
+    - On HSDIR circuits, ONLY ONE command is allowed to have a non-full
+      payload (the last command). See the HSDIR Inflation section.
+
+  - RELAY_COMMAND_END 3
+    - This command MUST only arrive ONCE for each valid open or half-open
+      stream ID
+
+  - RELAY_COMMAND_CONNECTED 4
+    - This command MUST ONLY be accepted ONCE by clients if they sent a BEGIN
+      or BEGIN_DIR
+    - The stream ID MUST match the stream ID from BEGIN (or BEGIN_DIR)
+
+  - RELAY_COMMAND_DROP 10
+    - This command is accepted by clients from any hop that they
+      have negotiated an active circuit padding machine with
+
+  - RELAY_COMMAND_CONFLUX_LINKED 20
+    - Ensure that a LINK cell was sent to the hop that sent this
+    - Ensure that no previous LINKED cell has arrived on this circuit
+
+  - RELAY_COMMAND_CONFLUX_SWITCH 22
+    - Ensure that conflux is enabled and linked
+    - If Prop#340 is in use, this cell MUST be packed with a valid
+      multiplexed RELAY_COMMAND_DATA cell.
+
+  - RELAY_COMMAND_INTRODUCE2 35
+    - Services MUST check:
+      - The intro is for a valid service identity and auth
+      - The command has a valid sub-credential
+      - The command is not a replay (possibly not close circuit?)
+
+  - RELAY_COMMAND_RENDEZVOUS2 37
+    - This command MUST ONLY arrive ONCE, on the client's rendezvous circuit
+      after RENDEZVOUS_ESTABLISHED (the rendezvous point relays it in response
+      to the service's RENDEZVOUS1 cell)
+    - The ntor handshake must succeed with MAC validation
+
+  - RELAY_COMMAND_INTRO_ESTABLISHED 38
+    - Services MUST check:
+      - This cell MUST ONLY come ONCE in response to
+        RELAY_COMMAND_ESTABLISH_INTRO, for the appropriate service identity
+
+  - RELAY_COMMAND_RENDEZVOUS_ESTABLISHED 39
+    - This command MUST ONLY be accepted ONCE in response to
+      RELAY_COMMAND_ESTABLISH_RENDEZVOUS
+
+  - RELAY_COMMAND_INTRODUCE_ACK 40
+    - This command MUST ONLY be accepted ONCE by clients, in response to
+      RELAY_COMMAND_INTRODUCE1
+
+  - RELAY_COMMAND_PADDING_NEGOTIATED 42
+    - This command MUST ONLY be accepted by clients in response to
+      PADDING_NEGOTIATE
+
+  - RELAY_COMMAND_XOFF 43
+    - Ensure that congestion control is enabled and negotiated
+    - Ensure that the stream id is either opened or half-open
+    - Ensure that the stream id is in "XON" state
+
+  - RELAY_COMMAND_XON 44
+    - Ensure that congestion control is enabled and negotiated
+    - Ensure that the stream id is either opened or half-open
+    - Enforce always packing this to a SENDME with Prop#340?
+
+  - RELAY_COMMAND_CONNECTED_UDP
+    - The stream id in this command MUST match that from
+      RELAY_COMMAND_CONNECT_UDP
+    - This command is only accepted once per UDP stream id
+
+  - RELAY_COMMAND_DATAGRAM
+    - This command MUST only arrive for valid open or half-open stream ID
+    - This command MUST have data length > 0
+
+
+
+References:
+
+[DROPMARK]: https://petsymposium.org/2018/files/papers/issue2/popets-2018-0011.pdf