From c5e82549c5b97a487d8f0c11daaefd894c655f30 Mon Sep 17 00:00:00 2001
From: Mike Perry
Date: Fri, 18 Aug 2023 20:18:34 +0000
Subject: Prop#349: Command state validation

---
 proposals/349-command-state-validation.md | 666 ++++++++++++++++++++++++++++++
 1 file changed, 666 insertions(+)
 create mode 100644 proposals/349-command-state-validation.md

diff --git a/proposals/349-command-state-validation.md b/proposals/349-command-state-validation.md
new file mode 100644
index 0000000..47598e6
--- /dev/null
+++ b/proposals/349-command-state-validation.md
@@ -0,0 +1,666 @@
```
Filename: 349-command-state-validation.md
Title: Client-Side Command Acceptance Validation
Author: Mike Perry
Created: 2023-08-17
Status: Draft
```

# Introduction

The ability of relays to inject end-to-end relay cells that are ignored by
clients allows malicious relays to create a covert channel to verify that they
are present in multiple positions of a path. This covert channel allows a
Guard to deanonymize 100% of its traffic, or just all the traffic of a
particular client IP address.

This attack was first documented in [DROPMARK]. Proposal 344 describes the
severity of this attack, and how this kind of end-to-end covert channel leads
to full deanonymization, in a reliable way, in practice. (Recall that dropped
cell attacks are most severe when an adversary can inject arbitrary end-to-end
data patterns at times when the circuit is known to be idle, before it is used
for traffic; injection at this point enables path bias attacks which can
ensure that only malicious Guard+Exit relays are present in all circuits used
by a particular target client IP address. For further details, see Proposal
344.)

This proposal is targeting arti-client, not C-Tor. This proposal is specific
to client-side checks of relay cells and relay messages.
Its primary change to
behavior is the definition of state machines that enforce which relay message
commands are acceptable on a given circuit, and when.

By applying and enforcing these state machine rules, we prevent the end-to-end
transmission of arbitrary amounts of data, and ensure that predictable periods
of the protocol happen as expected, rather than being filled with side-channel
packet patterns.



## Overview of dropped cell types

Dropped cells are cells that a relay can inject and that a Tor client then
ignores and discards.

These include:
 1. Unparsable cells
 2. Invalid relay commands
 3. Unrecognized cells (i.e., wrong source hop, or decryption failures)
 4. Unsupported (or consensus-disabled) relay commands or extensions
 5. Out-of-context relay commands
 6. Duplicate relay commands
 7. Relay commands that hit any error codepaths
 8. Relay commands for an invalid or already-closed stream ID
 9. Semantically void relay cells (including relay DATA with length 0, or PING)
 10. Junk appended to onion service descriptors

Items 1-4 and 8 are handled by the existing relay command parsers in arti. In
these cases, arti already closes the circuit.

> XXX: Arti's relay parser is lazy; see https://gitlab.torproject.org/tpo/core/arti/-/merge_requests/1978
> Does this mean that individual components need to properly propagate error
> information in order for circuits to get closed, when a command does not
> parse?

The state machines of this proposal handle items 5-7 rigorously. (In many
cases of out-of-context relay cells, arti already closes the circuit;
our goal here is to centralize this validation so that we can ensure that
it is not possible for any relay commands to omit checks or allow unbounded
activity.)

> XXX: Does arti allow extra onion-descriptor junk to be appended after the
> descriptor signature? C-Tor does...
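
The division of responsibility described above can be written down as a small lookup table. The Rust sketch below encodes it; `DroppedCell` and `Coverage` are hypothetical names, not arti types. Items 9 and 10 are marked `Elsewhere` because this overview does not assign them to either mechanism (the stream handlers' DATA length checks and the HSDIR Inflation section touch on them).

```rust
/// The dropped-cell categories from the list above (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DroppedCell {
    Unparsable,       // 1
    InvalidCommand,   // 2
    Unrecognized,     // 3
    Unsupported,      // 4
    OutOfContext,     // 5
    Duplicate,        // 6
    ErrorPath,        // 7
    InvalidStream,    // 8
    SemanticallyVoid, // 9
    DescriptorJunk,   // 10
}

/// Which mechanism is responsible for rejecting a category.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Coverage {
    /// Existing arti relay command parsers (items 1-4 and 8).
    ExistingParser,
    /// Per-circuit validation state machines from this proposal (items 5-7).
    StateMachine,
    /// Not assigned by the overview (items 9 and 10).
    Elsewhere,
}

fn coverage(kind: DroppedCell) -> Coverage {
    match kind {
        DroppedCell::Unparsable
        | DroppedCell::InvalidCommand
        | DroppedCell::Unrecognized
        | DroppedCell::Unsupported
        | DroppedCell::InvalidStream => Coverage::ExistingParser,
        DroppedCell::OutOfContext | DroppedCell::Duplicate | DroppedCell::ErrorPath => {
            Coverage::StateMachine
        }
        DroppedCell::SemanticallyVoid | DroppedCell::DescriptorJunk => Coverage::Elsewhere,
    }
}
```

An exhaustive `match` has the useful property that adding a new category fails to compile until someone decides which mechanism owns it.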


# Architectural Patterns and Behavior

Ideally, the handling of invalid protocol behavior should be centralized,
so that validation can happen in one easy-to-audit place, rather than spread
across the codebase (as it currently is with C-Tor).

For some narrow cases of invalid protocol activity, this is trivial. The relay
command acceptance is centralized in arti, which allows arti to immediately
reject unknown or disabled relay commands. This kind of validation is
necessary, but not sufficient, to prevent dropped cell vectors.

Things quickly get complicated when handling parsable relay cells sent at
an inappropriate time, or other activity such as duplicate relay commands,
semantically void cells, or commands that hit an error condition or a lazy
parsing failure deep in the code, where they may be silently accepted without
closing the circuit.

To handle such cases, we propose adding a relay command message state machine
pattern. Each relay protocol, when it becomes active on a circuit, must
register a state machine that validates its messages.

Because multiple relay protocols can be active at a time, multiple validation
state machines can be attached to a circuit. This also allows protocols to
create their own validation without needing to modify the entire validation
process. Relay messages that are not accepted by any active protocol
validation handler MUST result in circuit close.


## Architectural Patterns

In order to handle these cases, we rely on some architectural patterns:
 1. No relay message command may be sent to the client unless it is
    explicitly allowed by the specification, advertised as supported, and
    negotiated on a particular channel or circuit. (Prop#346)
 2. Any relay commands or extension fields not successfully negotiated
    on a circuit are invalid.
    This includes cells from intermediate hops,
    which must also negotiate their use (for example, padding machine
    negotiation with middle relays).
 3. By following the above principles, state machines can be developed
    that govern when a relay command is acceptable. This covers the
    majority of protocol activity. See Section 3 (State machine descriptions).
 4. For some commands, additional checks must be performed using context
    from the protocol itself.

For pattern #4, the following relay commands require additional module state,
beyond what a state machine knows, to enforce their limitations:
  - RELAY_COMMAND_SENDME
    - Requires checking that the auth digest hash is accurate
  - RELAY_COMMAND_XOFF and RELAY_COMMAND_XON
    - Context and rate limiting are stream-dependent
    - Packing enforcement via prop#340 is context-dependent
  - RELAY_COMMAND_CONFLUX_SWITCH
    - Packing enforcement via prop#340 is context-dependent
  - RELAY_COMMAND_DROP
    - This can only be accepted from a hop if there is a padding
      machine at that hop.
  - RELAY_COMMAND_INTRODUCE2
    - Requires inspecting the replay cache (however, circuits should not get
      closed, because replays can come from the client)

## Behavior

When an invalid relay cell or relay message is encountered, the corresponding
circuit should be immediately closed.

Initially, this can be accomplished by sending a DESTROY cell to the Guard
relay.

Additionally, when closing circuits in this way, clients must limit the number
of retries they perform, so that an adversary cannot induce infinite circuit
creation in non-onion-service protocols that are not protected by
Vanguards/Vanguards-lite. (One example is a malicious conflux exit that
repeatedly kills only one leg by injecting dropped cells to close that
circuit.)

While we also specify some cases where the channel to the Guard should be
closed, closing the channel is not necessary in the general case.
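
The validator-set pattern from this section (each active protocol registers a handler, handlers filter by Message Scope, and a message that no handler accepts closes the circuit) can be sketched in Rust as follows. Every name here (`Validator`, `CircuitValidators`, `CircPadHandler`, `Verdict`, and the `RelayCmd` subset) is a hypothetical illustration, not arti's API, and the padding handler is a simplified version of the CircPad state machine described later. `Verdict::CloseCircuit` stands in for tearing down the circuit, e.g. by sending a DESTROY to the Guard.

```rust
/// Illustrative subset of relay commands (hypothetical, not arti's type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RelayCmd {
    PaddingNegotiate,
    PaddingNegotiated,
    Drop,
    Data,
}

/// A relay message after cell-to-message conversion, before body parsing.
#[derive(Debug, Clone, Copy)]
struct RelayMsg {
    hop: u8,
    stream_id: Option<u16>,
    cmd: RelayCmd,
}

#[derive(Debug, PartialEq, Eq)]
enum Verdict {
    Accept,
    CloseCircuit,
}

/// One protocol's validation state machine.
trait Validator {
    /// Message Scope: should this handler see `msg` at all?
    fn in_scope(&self, msg: &RelayMsg) -> bool;
    /// Is `msg` acceptable in the current state? May change state.
    fn accept(&mut self, msg: &RelayMsg) -> bool;
    /// Notification of a command the client sent (default: ignore).
    fn on_sent(&mut self, _hop: u8, _cmd: RelayCmd) {}
}

/// The set of validators attached to one circuit.
#[derive(Default)]
struct CircuitValidators {
    handlers: Vec<Box<dyn Validator>>,
}

impl CircuitValidators {
    /// Called when a protocol's Trigger fires on this circuit.
    fn register(&mut self, v: Box<dyn Validator>) {
        self.handlers.push(v);
    }

    fn sent(&mut self, hop: u8, cmd: RelayCmd) {
        for h in self.handlers.iter_mut() {
            h.on_sent(hop, cmd);
        }
    }

    /// A message that no in-scope handler accepts MUST close the circuit.
    fn check(&mut self, msg: &RelayMsg) -> Verdict {
        for h in self.handlers.iter_mut() {
            if h.in_scope(msg) && h.accept(msg) {
                return Verdict::Accept;
            }
        }
        Verdict::CloseCircuit
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PadState {
    Start,
    Negotiating,
    Active,
}

/// Simplified CircPad handler bound to a single hop.
struct CircPadHandler {
    hop: u8,
    state: PadState,
}

impl Validator for CircPadHandler {
    fn in_scope(&self, msg: &RelayMsg) -> bool {
        msg.hop == self.hop && msg.stream_id.is_none()
    }

    fn accept(&mut self, msg: &RelayMsg) -> bool {
        match (self.state, msg.cmd) {
            (PadState::Negotiating, RelayCmd::PaddingNegotiated) => {
                self.state = PadState::Active;
                true
            }
            (PadState::Active, RelayCmd::Drop) => true,
            // Everything else, including a duplicate PADDING_NEGOTIATED.
            _ => false,
        }
    }

    fn on_sent(&mut self, hop: u8, cmd: RelayCmd) {
        if hop == self.hop && self.state == PadState::Start && cmd == RelayCmd::PaddingNegotiate {
            self.state = PadState::Negotiating;
        }
    }
}
```

In this sketch the first in-scope handler that accepts a message wins; it does not address what should happen if two handlers could both accept the same message.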

> XXX: I can't think of any issues severe enough to actually warrant the
> following, but Florentin pointed it out as a possibility: A malicious Guard
> may withhold the DESTROY, and still allow full identifier transmission before
> the circuit is closed. While this does not directly allow full deanonymization
> because the client won't actually use the circuit, it may still be enough to
> make the vector useful for other attacks. For completeness against this
> vector, we may want to consider sending a new RELAY_DESTROY command to the
> middle node, such that it has responsibility for tearing down a circuit by
> sending its own DESTROY cells in both directions, and then have the client
> send its own DESTROY if the client does not get a DESTROY from the Guard.
> See torspec#220: https://gitlab.torproject.org/tpo/core/torspec/-/issues/220



# State machine descriptions

These state machines apply only at the client. (Extra cells arriving on the
relay side do not leak information, so we do not specify relay-side
enforcement, and will not implement it for C-Tor.)

There are multiple state machines, describing particular circuit purposes
and/or components of the Tor relay protocol.

Each state machine has a "Trigger" and a "Message Scope". The "Trigger" is
the condition, relay command, or action that causes the state machine to be
added to a circuit's command state validator set. The "Message Scope" is where
the state machine applies: to a specific hop number, stream ID, or both.

A circuit can have multiple state machines attached at one time.
  * If no state machine accepts a relay command, then the circuit MUST be
    closed.
  * When we say "receive X" we mean "receive a _valid_ cell of
    type X". If the cell is invalid, we MUST kill the circuit.

## Relay message handlers

The state machines process enveloped relay message commands.
That is, with respect
to Prop#340, they operate on the message bodies, along with the associated
stream ID.

In terms of Proposal #340, the calls to state machine validation would go
after cells are converted to messages, but before the message body itself is
parsed, to minimize the exposed parser attack surface.

> XXX: Again, some validation will require early parsing, not lazy parsing

Multiple relay message handlers can be registered for each circuit ID, each
for a specific hop on that circuit, depending on the protocols in use on that
circuit with that hop, as well as the streams to that hop.

Each handler has a Message Scope that acts as a filter, so that only relay
command messages from this scope are processed by that handler.

If a message is not accepted by any active handler, the circuit MUST be
closed.


### Base Handler

Purpose: This handler validates commands for circuit construction and
circuit-level SENDME activity.

Trigger: Creation of a circuit; ntor handshake success for a hop

Message Scope: The circuit ID and hop number must match for this handler to
apply. (Because of leaky pipes, each hop of the circuit has a base handler
added when that hop completes an ntor handshake and is added to the circuit.)

```text
START:
  Upon sending EXTEND:
    Enter EXTEND_SENT.

  Receive SENDME:
    Ensure expected auth digest matches; close circuit otherwise
    No transition.

EXTEND_SENT:
  Receive EXTENDED:
    Enter START.

  Receive SENDME:
    Ensure expected auth digest matches; close circuit otherwise
    No transition.
```

### Client Introducing Handler

Purpose: Circuits used by clients to connect to a service introduction point
have this handler attached.

Trigger: Usage of a circuit for client introduction

Message Scope: Circuit ID and hop number must match

```text
CLIENT_INTRO_START:
  Upon sending INTRODUCE1:
    Enter CLIENT_INTRO_WAIT

CLIENT_INTRO_WAIT:
  Receive INTRODUCE_ACK:
    Accept
    Enter CLIENT_INTRO_END

CLIENT_INTRO_END:
  No transitions possible
  - XXX: Enforce that no new handlers can be added? We may still have padding
    handlers though.
```


### Service Introduce Handler

Purpose: Service-side onion service introduction circuits have this handler
attached.

Trigger: Onion service establishing an introduction point circuit

Message Scope: Circuit ID and hop number must match

```text
SERVICE_INTRO_START:
  Upon sending ESTABLISH_INTRO:
    Enter SERVICE_INTRO_ESTABLISH

SERVICE_INTRO_ESTABLISH:
  Receive INTRO_ESTABLISHED:
    Enter SERVICE_INTRO_ESTABLISHED

SERVICE_INTRO_ESTABLISHED:
  Receive INTRODUCE2:
    Accept
```


### Client Rendezvous Handler

Purpose: Circuits used by clients to establish a rendezvous point have this
handler attached.

Trigger: Client rendezvous initiation

Message Scope: Circuit ID and hop number must match

```text
CLIENT_REND_START:
  Upon sending ESTABLISH_RENDEZVOUS:
    Enter CLIENT_REND_ESTABLISHING

CLIENT_REND_ESTABLISHING:
  Receive RENDEZVOUS_ESTABLISHED:
    Enter CLIENT_REND_WAIT

CLIENT_REND_WAIT:
  Receive RENDEZVOUS2:
    Enter CLIENT_REND_ESTABLISHED

CLIENT_REND_ESTABLISHED:
  Remain in this state; launch TCP, UDP, or Conflux handlers for streams
```


### Service Rendezvous Handler

Purpose: Circuits used by services to connect to a rendezvous point have this
handler attached.

Trigger: Incoming introduce cell/service rend initiation

Message Scope: Circuit ID and hop number must match

```text
SERVICE_REND_START:
  Upon sending RENDEZVOUS1:
    Enter SERVICE_REND_ESTABLISHED

SERVICE_REND_ESTABLISHED:
  Remain in this state; launch TCP, UDP, or Conflux handlers for streams
```


### CircPad Handler

Purpose: Circuit-level padding is negotiated with a particular hop in the
circuit; when it is negotiated, we need to allow padding cells from that hop.

Trigger: Negotiation of a circuit padding machine

Message Scope: Circuit ID and hop must match; padding machine must be active

```text
PADDING_START:
  Upon sending PADDING_NEGOTIATE:
    Enter PADDING_NEGOTIATING

PADDING_NEGOTIATING:
  Receive PADDING_NEGOTIATED:
    Enter PADDING_ACTIVE

PADDING_ACTIVE:
  Receive DROP:
    Accept (if from correct hop)
    - XXX: We could perform more sophisticated rate limiting accounting here
      too?
```

### Resolve Stream Handler

Purpose: This handler is created on circuits when a resolve happens.

Trigger: RESOLVE message

Message Scope: Circuit ID, stream ID, and hop number must all match

```text
RESOLVE_START:
  Send a RESOLVE message:
    Enter RESOLVE_SENT

RESOLVE_SENT:
  Receive a RESOLVED or an END:
    Enter RESOLVE_START.
```


### TCP Stream Handler

Purpose: This handler is created when the client creates a new stream ID, using
either BEGIN or BEGIN_DIR.

Trigger: New AP or DirConn stream

Message Scope: Circuit ID, stream ID, and hop number must all match; stream ID
must be open or half-open (half-open is END_SENT).

```text
TCP_STREAM_START:
  Send a BEGIN or BEGIN_DIR message:
    Enter BEGIN_SENT.

BEGIN_SENT:
  Receive an END:
    Enter TCP_STREAM_START.
  Receive a CONNECTED:
    Enter STREAM_OPEN.

STREAM_OPEN:
  Receive DATA:
    Verify length is > 0
    XXX: Handle [HSDIRINFLATION] here?
    Process.

  Receive XOFF:
    Enter STREAM_XOFF

  Send END:
    Enter END_SENT.

  Receive END:
    Enter TCP_STREAM_START

STREAM_XOFF:
  Receive DATA:
    Verify length is > 0
    XXX: Handle [HSDIRINFLATION] here?
    Process.

  Send END:
    Enter END_SENT.

  Receive XON:
    Enter STREAM_XON

  Receive END:
    Enter TCP_STREAM_START

STREAM_XON:
  Receive DATA:
    Verify length is > 0
    XXX: Handle [HSDIRINFLATION] here?
    Process.

  Receive XOFF:
    If prop#340 is enabled, verify packed with SENDME
    Enter STREAM_XOFF

  Receive XON:
    If prop#340 is enabled, verify packed with SENDME
    Verify rate has changed

  Send END:
    Enter END_SENT.

  Receive END:
    Enter TCP_STREAM_START

END_SENT:
  Same as STREAM_OPEN, except do not actually deliver data.
  Only remain in this state for one RTT_max, or until END_ACK.
```


### Conflux Handler

Purpose: Circuits that are part of a conflux set have a conflux handler,
associated with the last hop.

Trigger: Creation of a conflux set

Message Scope: Circuit ID and hop number must match
  - XXX: Linked circuits must accept stream IDs from either circuit for other
    handlers :/

```text
CONFLUX_START: (all conflux leg circuits start here)
  Upon sending CONFLUX_LINK:
    Enter CONFLUX_LINKING

CONFLUX_LINKING:
  Receive CONFLUX_LINKED:
    Send CONFLUX_LINKED_ACK
    Enter CONFLUX_LINKED

CONFLUX_LINKED:
  Receive CONFLUX_SWITCH:
    If prop#340 is negotiated, ensure packed with a DATA cell
```


### UDP Stream Handler

Purpose: Circuits that carry UDP streams using prop#339 have this handler
attached.

Trigger: UDP stream creation

Message Scope: Circuit ID, hop number, and stream ID must match

```text
UDP_STREAM_START:
  If no other UDP streams are in use on the circuit:
    Send CONNECT_UDP for any stream, enter UDP_CONNECTING
  else:
    Immediately enter UDP_CONNECTING
    (CONNECTED_UDP MAY arrive without a CONNECT_UDP, after the first UDP
    stream on a circuit is established)

UDP_CONNECTING:
  Upon receipt of CONNECTED_UDP, enter UDP_CONNECTED

UDP_CONNECTED:
  Receive DATAGRAM:
    Verify length > 0
    Verify Prop#339 NAT rules are obeyed, including srcport and stream limits
    Process.

  Send END:
    Enter UDP_END_SENT

UDP_END_SENT:
  Same as UDP_CONNECTED, except do not actually deliver data.
  Only remain in this state for one RTT_max, or until END_ACK,
  then transition to UDP_STREAM_START.
```


# HSDIR Inflation { #HSDIRINFLATION }

XXX: This can be folded into the state machines and/or rend-spec. The state
machines should actually be able to handle this, once they are ready for it.

One of the most common questions about dropped cells is "what about data cells
with a 1 byte payload?". As Prop#344 makes clear, this is not a dropped cell
attack, but is instead an instance of an Active Traffic Manipulation Covert
Channel, described in Section 1.3.2 of that proposal. Active traffic
manipulation is less severe because it cannot be used to deanonymize 100% of
a target client's circuits, whereas the combination of path bias and
pre-usage dropped cells can.

However, there is one case where one can construct a potent attack from this
Active Traffic Manipulation: by making use of onion service circuits being
built on demand by an application. Further, because the onion service
handshake is uniquely fingerprintable (see Section 1.2.1 of Prop#344), it is
possible to use this vector in this specific case to encode an identifier in
the timing and traffic patterns of the onion service descriptor download,
similar to how the CMU attack operated, and to use both the onion service
fingerprint and the descriptor traffic pattern to signal to the Guard, or
possibly even to a local network observer, that a particular onion service
was visited.

A normal hidden service descriptor occupies only ~10 cells (with a hard max of
30KB, or ~60 cells). This is not enough to reliably encode the full address of
the onion service in a timing-based covert channel.
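
The cell counts above follow from the maximum DATA payload per relay cell. The Rust sketch below assumes the 498-byte DATA payload of the current relay cell format (a 509-byte cell body minus the 11-byte relay header) and reads "30KB" as 30,000 bytes; both constants are assumptions for illustration.

```rust
/// Maximum DATA payload per relay cell: 509-byte cell body minus the
/// 11-byte relay header (assumed current relay cell format).
const RELAY_DATA_MAX: usize = 498;

/// Descriptor size limit, reading "30KB" as 30,000 bytes (assumption).
const MAX_DESC_BYTES: usize = 30_000;

/// Number of DATA cells needed to carry `bytes` at `per_cell` bytes each.
fn cells_needed(bytes: usize, per_cell: usize) -> usize {
    // Ceiling division; sizes here are far from overflowing usize.
    (bytes + per_cell - 1) / per_cell
}
```

Under these assumptions, `cells_needed(MAX_DESC_BYTES, RELAY_DATA_MAX)` is 61, in line with the "~60 cells" above, while the same 30,000 bytes sent one byte per DATA cell would take 30,000 cells.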

However, there are two ways to cause this descriptor download to transmit
enough data to encode such a covert channel, and replicate the CMU attack
using timing information of this data.

First, the actual descriptor payload can be spread across many DATA cells that
are filled only partially with data (which does not happen if the HSDIR is
honest and well-behaved, because it always has the full descriptor on hand).

Second, in C-Tor, additional junk can be appended at the end of an onion
service descriptor document that does not count against the 30KB maximum,
which the client will happily download and then ignore.

Neither of these behaviors needs to be preserved, and neither occurs in
normal operation. They can be addressed either directly, by checks on
HSDIR-based RELAY_COMMAND_DATA lengths and descriptor parsing, or simply by
enforcing that circuits used to fetch service descriptors can *only* receive
as many bytes as the maximum descriptor size before being closed.

XXX: Also consider RELAY_COMMAND_END_ACK.
  - https://gitlab.torproject.org/tpo/core/torspec/-/issues/196

XXX: Tickets to grovel through for other stuff:
https://gitlab.torproject.org/tpo/core/torspec/-/issues/38
https://gitlab.torproject.org/tpo/core/torspec/-/issues/39
https://gitlab.torproject.org/tpo/core/arti/-/issues/525


# Command Allowlist enumeration { #CTORALLOWLIST }

XXX: We are planning to remove this section after we finish the state
machines; keeping it for reference until then for cross-checking.

Formerly, in C-Tor, we took the approach of performing a series of ad hoc
checks for each message command. Here are those rules, for spot-checking that
the above state machines cover them.

All relay commands are rejected by clients and services unless a rule says
they are OK.
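
Under this default-deny rule, a per-command check accepts a message only when every listed condition holds. As a sketch, the Rust code below applies the RELAY_COMMAND_DATA rules for HSDIR circuits from the list that follows (non-zero length, and only the last DATA payload may be non-full), together with the per-fetch byte budget suggested in the HSDIR Inflation section. `HsDirDataCheck`, `Decision`, and the 498-byte payload constant are assumptions for illustration; stream-state checks are left to the TCP stream handler.

```rust
/// Maximum DATA payload per relay cell (assumed: 509 - 11 bytes).
const RELAY_DATA_MAX: usize = 498;

#[derive(Debug, PartialEq, Eq)]
enum Decision {
    Accept,
    Reject,
}

/// Hypothetical per-stream checker for DATA on an HSDIR descriptor fetch.
/// Default-deny: a DATA message is accepted only if every rule passes.
struct HsDirDataCheck {
    /// Byte budget for the whole fetch. A real limit would also have to
    /// allow for the HTTP response headers; this sketch ignores them.
    budget: usize,
    received: usize,
    /// Set once a non-full DATA payload has been seen.
    saw_partial: bool,
}

impl HsDirDataCheck {
    fn new(budget: usize) -> Self {
        HsDirDataCheck { budget, received: 0, saw_partial: false }
    }

    fn on_data(&mut self, len: usize) -> Decision {
        // DATA MUST have length > 0 and fit in one relay cell.
        if len == 0 || len > RELAY_DATA_MAX {
            return Decision::Reject;
        }
        // Only the last DATA may be non-full, so nothing may follow one.
        if self.saw_partial {
            return Decision::Reject;
        }
        // Never accept more than the budget in total.
        let total = self.received + len;
        if total > self.budget {
            return Decision::Reject;
        }
        self.received = total;
        if len < RELAY_DATA_MAX {
            self.saw_partial = true;
        }
        Decision::Accept
    }
}
```

A `Decision::Reject` here would translate into closing the circuit, as described in the Behavior section.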
+ +Here's a list of those rules, by relay command: + + - RELAY_COMMAND_DATA 2 + - This command MUST only arrive for valid open or half-open stream ID + - This command MUST have data length > 0 + - On HSDIR circuits, ONLY ONE command is allowed to have a non-full + payload (the last command). See Section 4. + + - RELAY_COMMAND_END 3 + - This command MUST only arrive ONCE for each valid open or half-open + stream ID + + - RELAY_COMMAND_CONNECTED 4 + - This command MUST ONLY be accepted ONCE by clients if they sent a BEGIN + or BEGIN_DIR + - The stream ID MUST match the stream ID from BEGIN (or BEGIN_DIR) + + - RELAY_COMMAND_DROP 10 + - This command is accepted by clients from any hop that they + have negotiated an active circuit padding machine with + + - RELAY_COMMAND_CONFLUX_LINKED 20 + - Ensure that a LINK cell was sent to the hop that sent this + - Ensure that no previous LINKED cell has arrived on this circuit + + - RELAY_COMMAND_CONFLUX_SWITCH 22 + - Ensure that conflux is enabled and linked + - If Prop#340 is in use, this cell MUST be packed with a valid + multiplexed RELAY_COMMAND_DATA cell. + + - RELAY_COMMAND_INTRODUCE2 35 + - Services MUST check: + - The intro is for a valid service identity and auth + - The command has a valid sub-credential + - The command is not a replay (possibly not close circuit?) 

  - RELAY_COMMAND_RENDEZVOUS2 37
    - This command MUST ONLY arrive ONCE, on the client's rendezvous circuit
      after RENDEZVOUS_ESTABLISHED (it carries the service's RENDEZVOUS1
      handshake, relayed by the rendezvous point)
    - The ntor handshake must succeed with MAC validation

  - RELAY_COMMAND_INTRO_ESTABLISHED 38
    - Services MUST check:
      - This cell MUST ONLY come ONCE in response to
        RELAY_COMMAND_ESTABLISH_INTRO, for the appropriate service identity

  - RELAY_COMMAND_RENDEZVOUS_ESTABLISHED 39
    - This command MUST ONLY be accepted ONCE in response to
      RELAY_COMMAND_ESTABLISH_RENDEZVOUS

  - RELAY_COMMAND_INTRODUCE_ACK 40
    - This command MUST ONLY be accepted ONCE by clients, in response to
      RELAY_COMMAND_INTRODUCE1

  - RELAY_COMMAND_PADDING_NEGOTIATED 42
    - This command MUST ONLY be accepted by clients in response to
      PADDING_NEGOTIATE

  - RELAY_COMMAND_XOFF 43
    - Ensure that congestion control is enabled and negotiated
    - Ensure that the stream ID is either open or half-open
    - Ensure that the stream ID is in "XON" state

  - RELAY_COMMAND_XON 44
    - Ensure that congestion control is enabled and negotiated
    - Ensure that the stream ID is either open or half-open
    - Enforce always packing this to a SENDME with Prop#340?

  - RELAY_COMMAND_CONNECTED_UDP
    - The stream ID in this command MUST match that from
      RELAY_COMMAND_CONNECT_UDP
    - This command is only accepted once per UDP stream ID

  - RELAY_COMMAND_DATAGRAM
    - This command MUST only arrive for a valid open or half-open stream ID
    - This command MUST have data length > 0



References:

[DROPMARK]: https://petsymposium.org/2018/files/papers/issue2/popets-2018-0011.pdf