```
Filename: 349-command-state-validation.md
Title: Client-Side Command Acceptance Validation
Author: Mike Perry
Created: 2023-08-17
Status: Draft
```

# Introduction

The ability of relays to inject end-to-end relay cells that are ignored by clients allows malicious relays to create a covert channel to verify that they are present in multiple positions of a path. This covert channel allows a Guard to deanonymize 100% of its traffic, or just all the traffic of a particular client IP address.

This attack was first documented in [DROPMARK]. Proposal 344 describes the severity of this attack. It also explains how this kind of end-to-end covert channel reliably leads to full deanonymization in practice.

(Recall that dropped cell attacks are most severe when an adversary can inject arbitrary end-to-end data patterns while the circuit is known to be idle, before it is used for traffic. Injection at this point enables path bias attacks. These can ensure that only malicious Guard+Exit relays are present in all circuits used by a particular target client IP address. For further details, see Proposal 344.)

This proposal targets arti-client, not C-Tor.

This proposal is specific to client-side checks of relay cells and relay messages. Its primary change to behavior is the definition of state machines that enforce which relay message commands are acceptable on a given circuit, and when.

By applying and enforcing these state machine rules, we prevent the end-to-end transmission of arbitrary amounts of data. We also ensure that predictable periods of the protocol happen as expected, rather than being filled with side channel packet patterns.

## Overview of dropped cell types

Dropped cells are cells that a relay can inject that end up ignored and discarded by a Tor client. These include:

  1. Unparsable cells
  2. Invalid relay commands
  3. Unrecognized cells (i.e., wrong source hop, or decrypt failures)
  4. Unsupported (or consensus-disabled) relay commands or extensions
  5. Out-of-context relay commands
  6. Duplicate relay commands
  7. Relay commands that hit any error codepaths
  8. Relay commands for an invalid or already-closed stream ID
  9. Semantically void relay cells (including relay data len == 0, or PING)
  10. Onion descriptor-appended junk

Items 1-4 and 8 are handled by the existing relay command parsers in arti. In these cases, arti already closes the circuit.

> XXX: Arti's relay parser is lazy; see https://gitlab.torproject.org/tpo/core/arti/-/merge_requests/1978
> Does this mean that individual components need to properly propagate error
> information in order for circuits to get closed, when a command does not
> parse?

The state machines of this proposal handle items 5-7 in a rigorous way. In many cases of out-of-context relay cells, arti already closes the circuit. Our goal here is to centralize this validation, so that no relay command can omit checks or allow unbounded activity.

> XXX: Does arti allow extra onion-descriptor junk to be appended after the
> descriptor signature? C-Tor does...

# Architectural Patterns and Behavior

Ideally, the handling of invalid protocol behavior should be centralized. Validation can then happen in one easy-to-audit place, rather than being spread across the codebase (as it currently is with C-Tor).

For some narrow cases of invalid protocol activity, this is trivial. Relay command acceptance is centralized in arti, which allows arti to immediately reject unknown or disabled relay commands. This kind of validation is necessary, but not sufficient, to prevent dropped cell vectors.

Things quickly get complicated with parsable relay cells sent at an inappropriate time. The same holds for duplicate relay commands and semantically void cells. It also holds for commands that hit an error condition or a lazy parsing failure deep in the code, and are silently accepted without the circuit being closed.
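The following minimal Rust sketch makes this gap concrete. It contrasts a stateless, command-level allowlist with a per-stream state check. All types and function names here (`RelayCmd`, `StreamState`, `stateless_accept`, `stateful_accept`) are hypothetical illustrations, not arti's API.

The stateless check rejects unrecognized commands. However, it accepts a CONNECTED the client never asked for, and it accepts a zero-length DATA. Only the state-aware check catches these.

```rust
// Hypothetical types for illustration only; not arti's actual API.

/// A few relay commands a client might receive on a stream.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum RelayCmd {
    Connected,
    Data { len: usize },
    End,
    Unrecognized,
}

/// Client-side state of one stream.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum StreamState {
    Idle,      // no BEGIN sent yet
    BeginSent, // waiting for CONNECTED or END
    Open,
}

/// Stateless, command-level check: rejects only unrecognized commands.
fn stateless_accept(cmd: RelayCmd) -> bool {
    !matches!(cmd, RelayCmd::Unrecognized)
}

/// Stateful check: a command is acceptable only in the right stream state,
/// and empty DATA (a semantically void cell) is rejected.
fn stateful_accept(state: StreamState, cmd: RelayCmd) -> bool {
    match (state, cmd) {
        (StreamState::BeginSent, RelayCmd::Connected | RelayCmd::End) => true,
        (StreamState::Open, RelayCmd::Data { len }) => len > 0,
        (StreamState::Open, RelayCmd::End) => true,
        _ => false,
    }
}

fn main() {
    // Unrecognized commands are already caught by the stateless check.
    assert!(!stateless_accept(RelayCmd::Unrecognized));

    // A relay injects CONNECTED on a stream the client never opened:
    // it passes the allowlist, but the state check rejects it.
    assert!(stateless_accept(RelayCmd::Connected));
    assert!(!stateful_accept(StreamState::Idle, RelayCmd::Connected));
    assert!(stateful_accept(StreamState::BeginSent, RelayCmd::Connected));

    // Zero-length DATA passes the allowlist, but the state check rejects it.
    assert!(stateless_accept(RelayCmd::Data { len: 0 }));
    assert!(!stateful_accept(StreamState::Open, RelayCmd::Data { len: 0 }));
    assert!(stateful_accept(StreamState::Open, RelayCmd::End));
}
```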
To handle such cases, we propose adding a relay command message state machine pattern. Each relay protocol, when it becomes active on a circuit, must register a state machine that validates its messages.

Because multiple relay protocols can be active at a time, multiple validation state machines can be attached to a circuit. This also allows protocols to create their own validation without modifying the entire validation process. Relay messages that are not accepted by any active protocol validation handler MUST result in circuit close.

## Architectural Patterns

In order to handle these cases, we rely on some architectural patterns:

  1. No relay message command may be sent to the client unless it is explicitly allowed by the specification, advertised as supported, and negotiated on a particular channel or circuit. (Prop#346)
  2. Any relay commands or extension fields not successfully negotiated on a circuit are invalid. This includes cells from intermediate hops, which must also negotiate their use (example: padding machine negotiation to middles).
  3. By following the above principles, we can develop state machines that govern when a relay command is acceptable. This covers the majority of protocol activity. See Section 3.
  4. For some commands, additional checks must be performed using the context of the protocol itself.

For pattern #4, the following relay commands require additional module state to enforce limitations, beyond what a state machine knows:

  - RELAY_COMMAND_SENDME
    - Requires checking that the auth digest hash is accurate
  - RELAY_COMMAND_XOFF and RELAY_COMMAND_XON
    - Context and rate limiting are stream-dependent
    - Packing enforcement via prop#340 is context-dependent
  - RELAY_COMMAND_CONFLUX_SWITCH
    - Packing enforcement via prop#340 is context-dependent
  - RELAY_COMMAND_DROP
    - This can only be accepted from a hop if there is a padding machine at that hop.
  - RELAY_COMMAND_INTRODUCE2
    - Requires inspecting the replay cache (however, circuits should not be closed on replays, because replays can come from the client)

## Behavior

When an invalid relay cell or relay message is encountered, the corresponding circuit should be closed immediately. Initially, this can be done by sending a DESTROY cell to the Guard relay.

Some protocols are not onion service protocols and are not protected by Vanguards/Vanguards-lite. When closing such circuits in this way, clients must not allow an adversary to induce infinite circuit creation. They prevent this by limiting the number of retries they perform. (One example is a malicious conflux exit that repeatedly kills only one leg, by injecting dropped cells to get that circuit closed.)

We also specify some cases where the channel to the Guard should be closed, but this is not necessary in the general case.

> XXX: I can't think of any issues severe enough to actually warrant the
> following, but Florentin pointed it out as a possibility: A malicious Guard
> may withhold the DESTROY, and still allow full identifier transmission before
> the circuit is closed. While this does not directly allow full deanonymization
> because the client won't actually use the circuit, it may still be enough to
> make the vector useful for other attacks. For completeness against this
> vector, we may want to consider sending a new RELAY_DESTROY command to the
> middle node, such that it has responsibility for tearing down a circuit by
> sending its own DESTROYs in both directions, and then have the client send its
> own DESTROY if the client does not get a DESTROY from the Guard.
>
>>> See torspec#220: https://gitlab.torproject.org/tpo/core/torspec/-/issues/220

# State machine descriptions

These state machines apply only at the client. (Extra cells in the protocol leak no information on the relay side. We therefore will not specify relay-side enforcement, or implement it for C-Tor.)
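As a rough illustration of the validator-set idea, here is a minimal Rust sketch. Each active protocol registers its own handler. Each handler only sees messages from its own hop, optionally narrowed to one stream. A message that no handler accepts closes the circuit.

The `Handler` trait and the `Scope`, `Msg`, `CircuitValidator` and `PaddingHandler` types are all hypothetical, not arti APIs. `PaddingHandler` is a toy stand-in for a padding handler. It accepts circuit-level DROP (relay command 10, per the allowlist in this proposal) only from the hop with an active padding machine.

```rust
// Hypothetical sketch of a per-circuit validator set; not arti's API.

/// Relay command numbers used below (values from this proposal's allowlist).
const RELAY_DATA: u8 = 2;
const RELAY_DROP: u8 = 10;

/// Which messages a handler sees: those from one hop, optionally
/// narrowed to one stream ID (`None` = not filtered by stream).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Scope {
    hop: u8,
    stream_id: Option<u16>,
}

/// A relay message after cell-to-message conversion, body still unparsed.
#[derive(Clone, Copy, Debug)]
struct Msg {
    hop: u8,
    stream_id: u16, // 0 for circuit-level messages
    cmd: u8,
}

/// One protocol's validation state machine.
trait Handler {
    fn scope(&self) -> Scope;
    /// Returns true if `msg` is acceptable in the current state,
    /// advancing the state machine as a side effect.
    fn accept(&mut self, msg: &Msg) -> bool;
}

#[derive(Debug, PartialEq, Eq)]
enum Verdict {
    Accept,
    CloseCircuit,
}

struct CircuitValidator {
    handlers: Vec<Box<dyn Handler>>,
}

impl CircuitValidator {
    fn validate(&mut self, msg: &Msg) -> Verdict {
        for handler in self.handlers.iter_mut() {
            let scope = handler.scope();
            let in_scope = scope.hop == msg.hop
                && scope.stream_id.map_or(true, |id| id == msg.stream_id);
            // Stop at the first in-scope handler that accepts the message.
            if in_scope && handler.accept(msg) {
                return Verdict::Accept;
            }
        }
        // No active handler accepted the message: the circuit MUST be closed.
        Verdict::CloseCircuit
    }
}

/// Toy stand-in for a padding handler; `active` models PADDING_ACTIVE.
struct PaddingHandler {
    hop: u8,
    active: bool,
}

impl Handler for PaddingHandler {
    fn scope(&self) -> Scope {
        Scope { hop: self.hop, stream_id: None }
    }
    fn accept(&mut self, msg: &Msg) -> bool {
        self.active && msg.stream_id == 0 && msg.cmd == RELAY_DROP
    }
}

fn main() {
    let mut v = CircuitValidator { handlers: Vec::new() };
    v.handlers.push(Box::new(PaddingHandler { hop: 2, active: true }));

    // DROP from the hop with an active padding machine is accepted.
    assert_eq!(v.validate(&Msg { hop: 2, stream_id: 0, cmd: RELAY_DROP }), Verdict::Accept);
    // DROP from a hop with no padding machine: close the circuit.
    assert_eq!(v.validate(&Msg { hop: 3, stream_id: 0, cmd: RELAY_DROP }), Verdict::CloseCircuit);
    // DATA that no registered handler accepts: close the circuit.
    assert_eq!(v.validate(&Msg { hop: 2, stream_id: 1, cmd: RELAY_DATA }), Verdict::CloseCircuit);
}
```

Stopping at the first accepting handler means at most one state machine advances per message. This choice is an assumption of the sketch, not something the proposal specifies.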
There are multiple state machines, describing particular circuit purposes and/or components of the Tor relay protocol.

Each state machine has a "Trigger" and a "Message Scope".

- The "Trigger" is the condition, relay command, or action that causes the state machine to be added to a circuit's command state validator set.
- The "Message Scope" is where the state machine applies: to a specific hop number, stream ID, or both.

A circuit can have multiple state machines attached at one time.

  * If no state machine accepts a relay command, then the circuit MUST be closed.
  * When we say "receive X" we mean "receive a _valid_ cell of type X". If the cell is invalid, we MUST kill the circuit.

## Relay message handlers

The state machines process enveloped relay message commands. That is, with respect to prop#340, they operate on the message bodies, with their associated stream ID.

With respect to Proposal #340, the calls to state machine validation would go after converting cells to messages, but before parsing the message body itself. This still minimizes exposure of the parser attack surfaces.

> XXX: Again, some validation will require early parsing, not lazy parsing

Multiple relay message handlers can be registered with each circuit ID, for a specific hop on that circuit ID. Which handlers are registered depends on the protocols in use on that circuit with that hop, as well as on the streams to that hop.

Each handler has a Message Scope that acts as a filter, so that a handler only processes relay command messages from its own scope.

If a message is not accepted by any active handler, the circuit MUST be closed.

### Base Handler

Purpose: This handler validates commands for circuit construction and circuit-level SENDME activity.

Trigger: Creation of a circuit; ntor handshake success for a hop

Message Scope: The circuit ID and hop number must match for this handler to apply.
(Because of leaky pipes, each hop of the circuit gets a base handler when that hop completes an ntor handshake and is added to the circuit.)

```text
START:
  Upon sending EXTEND:
    Enter EXTEND_SENT.
  Receive SENDME:
    Ensure expected auth digest matches; close circuit otherwise
    No transition.

EXTEND_SENT:
  Receiving EXTENDED:
    Enter START.
  Receive SENDME:
    Ensure expected auth digest matches; close circuit otherwise
    No transition.
```

### Client Introducing Handler

Purpose: Circuits used by clients to connect to a service introduction point have this handler attached.

Trigger: Usage of a circuit for client introduction

Message Scope: Circuit ID and hop number must match

```text
CLIENT_INTRO_START:
  Upon sending INTRODUCE1:
    Enter CLIENT_INTRO_WAIT

CLIENT_INTRO_WAIT:
  Receive INTRODUCE_ACK:
    Accept
    Transition to CLIENT_INTRO_END

CLIENT_INTRO_END:
  No transitions possible

  - XXX: Enforce that no new handlers can be added? We may still have
    padding handlers though.
```

### Service Introduce Handler

Purpose: Service-side onion service introduction circuits have this handler attached.

Trigger: Onion service establishing an introduction point circuit

Message Scope: Circuit ID and hop number must match

```text
SERVICE_INTRO_START:
  Upon sending ESTABLISH_INTRO:
    Enter SERVICE_INTRO_ESTABLISH

SERVICE_INTRO_ESTABLISH:
  Receiving INTRO_ESTABLISHED:
    Enter SERVICE_INTRO_ESTABLISHED

SERVICE_INTRO_ESTABLISHED:
  Receiving INTRODUCE2:
    Accept
```

### Client Rendezvous Handler

Purpose: Circuits used by clients to build a rendezvous point have this handler attached.
Trigger: Client rendezvous initiation

Message Scope: Circuit ID and hop number must match

```text
CLIENT_REND_START:
  Upon sending ESTABLISH_RENDEZVOUS:
    Enter CLIENT_REND_ESTABLISHING

CLIENT_REND_ESTABLISHING:
  Receive RENDEZVOUS_ESTABLISHED:
    Enter CLIENT_REND_WAIT

CLIENT_REND_WAIT:
  Receive RENDEZVOUS2:
    Enter CLIENT_REND_ESTABLISHED

CLIENT_REND_ESTABLISHED:
  Remain in this state; launch TCP, UDP, or Conflux handlers for streams
```

### Service Rendezvous Handler

Purpose: Circuits used by services to connect to a rendezvous point have this handler attached.

Trigger: Incoming introduce cell/service rend initiation

Message Scope: Circuit ID and hop number must match

```text
SERVICE_REND_START:
  Upon sending RENDEZVOUS1:
    Enter SERVICE_REND_ESTABLISHED

SERVICE_REND_ESTABLISHED:
  Remain in this state; launch TCP, UDP, or Conflux handlers for streams
```

### CircPad Handler

Purpose: Circuit-level padding is negotiated with a particular hop in the circuit; when it is negotiated, we need to allow padding cells from that hop.

Trigger: Negotiation of a circuit padding machine

Message Scope: Circuit ID and hop must match; padding machine must be active

```text
PADDING_START:
  Upon sending PADDING_NEGOTIATE:
    Enter PADDING_NEGOTIATING

PADDING_NEGOTIATING:
  Receiving PADDING_NEGOTIATED:
    Enter PADDING_ACTIVE

PADDING_ACTIVE:
  Receiving DROP:
    Accept (if from correct hop)
    - XXX: We could perform more sophisticated rate limiting accounting here too?
```

### Resolve Stream Handler

Purpose: This handler is created on circuits when a resolve happens.

Trigger: RESOLVE message

Message Scope: Circuit ID, stream ID, and hop number must all match

```text
RESOLVE_START:
  Send a RESOLVE message:
    Enter RESOLVE_SENT

RESOLVE_SENT:
  Receive a RESOLVED or an END:
    Enter RESOLVE_START.
```

### TCP Stream handler

Purpose: This handler is created when the client creates a new stream ID, using either BEGIN or BEGIN_DIR.
Trigger: New AP or DirConn stream

Message Scope: Circuit ID, stream ID, and hop number must all match; stream ID must be open or half-open (half-open is END_SENT).

```text
TCP_STREAM_START:
  Send a BEGIN or BEGIN_DIR message:
    Enter BEGIN_SENT.

BEGIN_SENT:
  Receive an END:
    Enter TCP_STREAM_START.
  Receive a CONNECTED:
    Enter STREAM_OPEN.

STREAM_OPEN:
  Receive DATA:
    Verify length is > 0
    XXX: Handle [HSDIRINFLATION] here?
    Process.
  Receive XOFF:
    Enter STREAM_XOFF
  Send END:
    Enter END_SENT.
  Receive END:
    Enter TCP_STREAM_START

STREAM_XOFF:
  Receive DATA:
    Verify length is > 0
    XXX: Handle [HSDIRINFLATION] here?
    Process.
  Send END:
    Enter END_SENT.
  Receive XON:
    Enter STREAM_XON
  Receive END:
    Enter TCP_STREAM_START

STREAM_XON:
  Receive DATA:
    Verify length is > 0
    XXX: Handle [HSDIRINFLATION] here?
    Process.
  Receive XOFF:
    If prop#340 is enabled, verify packed with SENDME
    Enter STREAM_XOFF
  Receive XON:
    If prop#340 is enabled, verify packed with SENDME
    Verify rate has changed
  Send END:
    Enter END_SENT.
  Receive END:
    Enter TCP_STREAM_START

END_SENT:
  Same as STREAM_OPEN, except do not actually deliver data.
  Only remain in this state for one RTT_max, or until END_ACK.
```

### Conflux Handler

Purpose: Circuits that are part of a conflux set have a conflux handler, associated with the last hop.
Trigger: Creation of a conflux set

Message Scope: Circuit ID and hop number must match

  - XXX: Linked circuits must accept stream ids from either circuit for other handlers :/

```text
CONFLUX_START:
  (all conflux leg circuits start here)
  Upon sending CONFLUX_LINK:
    Enter CONFLUX_LINKING

CONFLUX_LINKING:
  Receiving CONFLUX_LINKED:
    Send CONFLUX_LINKED_ACK
    Enter CONFLUX_LINKED

CONFLUX_LINKED:
  Receiving CONFLUX_SWITCH:
    If prop#340 is negotiated, ensure packed with a DATA cell
```

### UDP Stream Handler

Purpose: Circuits that carry prop#339 UDP streams have this handler attached.

Trigger: UDP stream creation

Message Scope: Circuit ID, hop number, and stream ID must match

```text
UDP_STREAM_START:
  If no other UDP streams are used on the circuit:
    Send CONNECT_UDP for any stream, enter UDP_CONNECTING
  else:
    Immediately enter UDP_CONNECTING
    (CONNECTED_UDP MAY arrive without a CONNECT_UDP, after the first UDP
     stream on a circuit is established)

UDP_CONNECTING:
  Upon receipt of CONNECTED_UDP, enter UDP_CONNECTED

UDP_CONNECTED:
  Receive DATAGRAM:
    Verify length > 0
    Verify Prop#344 NAT rules are obeyed, including srcport and stream limits
    Process.
  Send END:
    Enter UDP_END_SENT

UDP_END_SENT:
  Same as UDP_CONNECTED, except do not actually deliver data.
  Only remain in this state for one RTT_max, or until END_ACK, then
  transition to UDP_STREAM_START.
```

# HSDIR Inflation { #HSDIRINFLATION }

XXX: This can be folded into the state machines and/or rend-spec. The state machines should actually be able to handle this, once they are ready for it.

One of the most common questions about dropped cells is "what about data cells with a 1 byte payload?". As Prop#344 makes clear, this is not a dropped cell attack. It is an instance of an Active Traffic Manipulation Covert Channel, described in Section 1.3.2 of that proposal. Active traffic manipulation is less severe because it cannot be used to deanonymize 100% of a target client's circuits. The combination of path bias and pre-usage dropped cells can.
However, there is one case where this Active Traffic Manipulation can be turned into a potent attack: onion service circuits that an application builds on demand. The onion service handshake is uniquely fingerprintable (see Section 1.2.1 of Prop#344). This makes it possible, in this specific case, to encode an identifier in the timing and traffic patterns of the onion service descriptor download, similar to how the CMU attack operated. Together, the onion service fingerprint and the descriptor traffic pattern then reveal that a particular onion service was visited. They reveal it to the Guard, or possibly even to a local network observer.

A normal onion service descriptor occupies only ~10 cells (with a hard max of 30KB, or ~60 cells). This is not enough to reliably encode the full address of the onion service in a timing-based covert channel.

However, there are two ways to make this descriptor download carry enough data for such a covert channel, and so replicate the CMU attack using the timing of this data.

First, the actual descriptor payload can be spread across many DATA cells that are only partially filled. An honest, well-behaved HSDIR never does this, because it always has the full descriptor on hand.

Second, in C-Tor, additional junk can be appended at the end of an onion service descriptor document without counting against the 30KB maximum. The client will happily download this junk and then ignore it.

Neither of these behaviors needs to be preserved, and neither happens in normal operation. There are two ways to address them:

- directly, by checking the lengths of HSDIR-based RELAY_COMMAND_DATA messages and by stricter descriptor parsing;
- or simply, by enforcing that a circuit used to fetch service descriptors can *only* receive up to the maximum descriptor size in bytes, and is closed after that.

XXX: Consider RELAY_COMMAND_END_ACK also.
  - https://gitlab.torproject.org/tpo/core/torspec/-/issues/196

XXX: Tickets to grovel through for other stuff:

  - https://gitlab.torproject.org/tpo/core/torspec/-/issues/38
  - https://gitlab.torproject.org/tpo/core/torspec/-/issues/39
  - https://gitlab.torproject.org/tpo/core/arti/-/issues/525

# Command Allowlist enumeration { #CTORALLOWLIST }

XXX: We are planning to remove this section after we finish the state machines; keeping it for reference until then for cross-checking.

Formerly, in C-Tor, we performed a series of ad-hoc checks for each message command. Here are those rules, for spot-checking that the above state machines cover them.

Clients and services reject all relay commands unless a rule says they are OK. Here is a list of those rules, by relay command:

  - RELAY_COMMAND_DATA 2
    - This command MUST only arrive for a valid open or half-open stream ID
    - This command MUST have data length > 0
    - On HSDIR circuits, ONLY ONE command is allowed to have a non-full payload (the last command). See Section 4.
  - RELAY_COMMAND_END 3
    - This command MUST only arrive ONCE for each valid open or half-open stream ID
  - RELAY_COMMAND_CONNECTED 4
    - This command MUST ONLY be accepted ONCE by clients if they sent a BEGIN or BEGIN_DIR
    - The stream ID MUST match the stream ID from BEGIN (or BEGIN_DIR)
  - RELAY_COMMAND_DROP 10
    - Clients accept this command from any hop with which they have negotiated an active circuit padding machine
  - RELAY_COMMAND_CONFLUX_LINKED 20
    - Ensure that a LINK cell was sent to the hop that sent this
    - Ensure that no previous LINKED cell has arrived on this circuit
  - RELAY_COMMAND_CONFLUX_SWITCH 22
    - Ensure that conflux is enabled and linked
    - If Prop#340 is in use, this cell MUST be packed with a valid multiplexed RELAY_COMMAND_DATA cell.
  - RELAY_COMMAND_INTRODUCE2 35
    - Services MUST check:
      - The intro is for a valid service identity and auth
      - The command has a valid sub-credential
      - The command is not a replay (possibly not close circuit?)
  - RELAY_COMMAND_RENDEZVOUS2 37
    - This command MUST ONLY arrive ONCE, on the appropriate rendezvous circuit, carrying the handshake from the service's RENDEZVOUS1 cell
    - The ntor handshake must succeed with MAC validation
  - RELAY_COMMAND_INTRO_ESTABLISHED 38
    - Services MUST check:
      - This cell MUST ONLY come ONCE in response to RELAY_COMMAND_ESTABLISH_INTRO, for the appropriate service identity
  - RELAY_COMMAND_RENDEZVOUS_ESTABLISHED 39
    - This command MUST ONLY be accepted ONCE in response to RELAY_COMMAND_ESTABLISH_RENDEZVOUS
  - RELAY_COMMAND_INTRODUCE_ACK 40
    - This command MUST ONLY be accepted ONCE by clients, in response to RELAY_COMMAND_INTRODUCE1
  - RELAY_COMMAND_PADDING_NEGOTIATED 42
    - This command MUST ONLY be accepted by clients in response to PADDING_NEGOTIATE
  - RELAY_COMMAND_XOFF 43
    - Ensure that congestion control is enabled and negotiated
    - Ensure that the stream ID is either open or half-open
    - Ensure that the stream ID is in "XON" state
  - RELAY_COMMAND_XON 44
    - Ensure that congestion control is enabled and negotiated
    - Ensure that the stream ID is either open or half-open
    - Enforce always packing this to a SENDME with Prop#340?
  - RELAY_COMMAND_CONNECTED_UDP
    - The stream ID in this command MUST match that from RELAY_COMMAND_CONNECT_UDP
    - This command is only accepted once per UDP stream ID
  - RELAY_COMMAND_DATAGRAM
    - This command MUST only arrive for a valid open or half-open stream ID
    - This command MUST have data length > 0

References:

[DROPMARK]: https://petsymposium.org/2018/files/papers/issue2/popets-2018-0011.pdf