author     Mike Perry <mikeperry-git@torproject.org> 2023-08-18 20:18:34 +0000
committer  Mike Perry <mikeperry-git@torproject.org> 2024-03-06 21:20:00 +0000
commit     c5e82549c5b97a487d8f0c11daaefd894c655f30
tree       a580ae69b6535c749ee21d9cb350888014db1b17
parent     8a9c8bf0c0641487bffd99b5fb7bd8775d022170
Prop#349: Command state validation
-rw-r--r-- proposals/349-command-state-validation.md 666
1 files changed, 666 insertions, 0 deletions
diff --git a/proposals/349-command-state-validation.md b/proposals/349-command-state-validation.md
new file mode 100644
index 0000000..47598e6
--- /dev/null
+++ b/proposals/349-command-state-validation.md
@@ -0,0 +1,666 @@
+```
+Filename: 349-command-state-validation.md
+Title: Client-Side Command Acceptance Validation
+Author: Mike Perry
+Created: 2023-08-17
+Status: Draft
+```
+
+# Introduction
+
+The ability of relays to inject end-to-end relay cells that are ignored by
+clients allows malicious relays to create a covert channel to verify that they
+are present in multiple positions of a path. This covert channel allows a
+Guard to deanonymize 100% of its traffic, or just all the traffic of a
+particular client IP address.
+
+This attack was first documented in [DROPMARK]. Proposal 344 describes the
+severity of this attack, and how this kind of end-to-end covert channel
+reliably leads to full deanonymization in practice. (Recall that dropped
+cell attacks are most severe when an adversary can inject arbitrary end-to-end
+data patterns at times when the circuit is known to be idle, before it is used
+for traffic; injection at this point enables path bias attacks which can
+ensure that only malicious Guard+Exit relays are present in all circuits used
+by a particular target client IP address. For further details, see Proposal
+344.)
+
+This proposal targets arti-client, not C-Tor, and is specific to client-side
+checks of relay cells and relay messages. Its primary behavioral change is the
+definition of state machines that enforce which relay message commands are
+acceptable on a given circuit, and when.
+
+By applying and enforcing these state machine rules, we prevent the end-to-end
+transmission of arbitrary amounts of data, and ensure that predictable phases
+of the protocol proceed as expected rather than being filled with side-channel
+packet patterns.
+
+
+
+## Overview of dropped cell types
+
+Dropped cells are cells that a relay can inject that end up ignored and
+discarded by a Tor client.
+
+These include:
+ 1. Unparsable cells
+ 2. Invalid relay commands
+ 3. Unrecognized cells (i.e., wrong source hop, or decryption failures)
+ 4. Unsupported (or consensus-disabled) relay commands or extensions
+ 5. Out-of-context relay commands
+ 6. Duplicate relay commands
+ 7. Relay commands that hit any error codepaths
+ 8. Relay commands for an invalid or already-closed stream ID
+ 9. Semantically void relay cells (including relay data len == 0, or PING)
+ 10. Onion descriptor-appended junk
+
+Items 1-4 and 8 are handled by the existing relay command parsers in arti. In
+these cases, arti already closes the circuit.
+
+> XXX: Arti's relay parser is lazy; see https://gitlab.torproject.org/tpo/core/arti/-/merge_requests/1978
+> Does this mean that individual components need to properly propagate error
+> information in order for circuits to get closed, when a command does not
+> parse?
+
+The state machines of this proposal rigorously handle items 5-7. (In many
+cases of out-of-context relay cells, arti already closes the circuit;
+our goal here is to centralize this validation so that no relay command can
+skip these checks or allow unbounded activity.)
+
+> XXX: Does arti allow extra onion-descriptor junk to be appended after the
+> descriptor signature? C-Tor does...
+
+
+# Architectural Patterns and Behavior
+
+Ideally, the handling of invalid protocol behavior should be centralized,
+so that validation can happen in one easy-to-audit place, rather than spread
+across the codebase (as it currently is with C-Tor).
+
+For some narrow cases of invalid protocol activity, this is trivial. The relay
+command acceptance is centralized in arti, which allows arti to immediately
+reject unknown or disabled relay commands. This kind of validation is
+necessary, but not sufficient, in order to prevent dropped cell vectors.
+
+Things quickly get complicated when handling parsable relay cells sent at an
+inappropriate time, or other activity such as duplicate relay commands,
+semantically void cells, or commands that hit an error condition or a lazy
+parsing failure deep in the code, and are then silently accepted without
+closing the circuit.
+
+To handle such cases, we propose adding a relay command message state machine
+pattern. Each relay protocol, when it becomes active on a circuit, must
+register a state machine that handles validating its messages.
+
+Because multiple relay protocols can be active at a time, multiple validation
+state machines can be attached to a circuit. This also allows protocols to
+create their own validation without needing to modify the entire validation
+process. Relay messages that are not accepted by any active protocol
+validation handler MUST result in circuit close.
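+
+A minimal Rust sketch of this registration pattern follows. All names
+(`RelayCmd`, `CommandValidator`, `ValidatorSet`, `DropOnly`) are hypothetical
+and not part of arti's API, and the command list is deliberately truncated.
+Each active protocol registers a validator, and a message that no validator
+accepts yields a close-circuit verdict.
+
+```rust
+/// Hypothetical, truncated set of relay commands.
+#[derive(Clone, Copy, Debug, PartialEq, Eq)]
+pub enum RelayCmd {
+    Data,
+    Sendme,
+    Drop,
+}
+
+#[derive(Debug, PartialEq, Eq)]
+pub enum Verdict {
+    Accept,
+    CloseCircuit,
+}
+
+/// One validator per relay protocol active on the circuit.
+pub trait CommandValidator {
+    /// Returns true if this validator accepts `cmd` (updating its own state).
+    fn accept(&mut self, cmd: RelayCmd) -> bool;
+}
+
+/// All validators currently attached to a single circuit.
+#[derive(Default)]
+pub struct ValidatorSet {
+    validators: Vec<Box<dyn CommandValidator>>,
+}
+
+impl ValidatorSet {
+    /// Called when a protocol becomes active on the circuit.
+    pub fn register(&mut self, v: Box<dyn CommandValidator>) {
+        self.validators.push(v);
+    }
+
+    /// Offers `cmd` to validators in registration order, stopping at the
+    /// first that accepts it. If none accepts, the circuit must be closed.
+    pub fn validate(&mut self, cmd: RelayCmd) -> Verdict {
+        if self.validators.iter_mut().any(|v| v.accept(cmd)) {
+            Verdict::Accept
+        } else {
+            Verdict::CloseCircuit
+        }
+    }
+}
+
+/// Example validator: accepts only DROP (e.g. while a padding machine is active).
+pub struct DropOnly;
+
+impl CommandValidator for DropOnly {
+    fn accept(&mut self, cmd: RelayCmd) -> bool {
+        cmd == RelayCmd::Drop
+    }
+}
+```
+
+In this sketch the first validator that accepts a message consumes it; whether
+several validators should see the same message is a design choice the proposal
+leaves open.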
+
+
+## Architectural Patterns
+
+In order to handle these cases, we rely on some architectural patterns:
+ 1. No relay message command may be sent to the client unless it is
+    explicitly allowed by the specification, advertised as supported, and
+    negotiated on a particular channel or circuit. (Prop#346)
+ 2. Any relay commands or extension fields not successfully negotiated
+    on a circuit are invalid. This includes cells from intermediate hops,
+    which must also negotiate their use (for example, padding machine
+    negotiation with middle relays).
+ 3. By following the above principles, state machines can be developed
+    that govern when a relay command is acceptable. This covers the
+    majority of protocol activity. See Section 3 (State machine descriptions).
+ 4. For some commands, additional checks must be performed using the
+    context of the protocol itself.
+
+The following relay commands require additional module state, beyond what is
+known by a state machine, to enforce their limitations (pattern #4 above):
+ - RELAY_COMMAND_SENDME
+   - Requires checking that the auth digest hash is accurate
+ - RELAY_COMMAND_XOFF and RELAY_COMMAND_XON
+   - Context and rate limiting are stream-dependent
+   - Packing enforcement via prop#340 is context-dependent
+ - RELAY_COMMAND_CONFLUX_SWITCH
+   - Packing enforcement via prop#340 is context-dependent
+ - RELAY_COMMAND_DROP
+   - This can only be accepted from a hop if there is a padding
+     machine at that hop.
+ - RELAY_COMMAND_INTRODUCE2
+   - Requires inspecting the replay cache (however, circuits should not be
+     closed on replays, because replays can come from the client)
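+
+The SENDME check can be sketched as a FIFO of expected digests. This assumes
+the authenticated-SENDME scheme in which the sender records the digest of each
+cell that completes a SENDME window, and each incoming SENDME must echo the
+oldest outstanding digest. The 20-byte length and the `SendmeTracker` type are
+illustrative assumptions, not arti's implementation.
+
+```rust
+use std::collections::VecDeque;
+
+/// Assumed digest length, for illustration.
+pub const DIGEST_LEN: usize = 20;
+
+/// Hypothetical FIFO of digests we expect SENDMEs to echo back.
+#[derive(Default)]
+pub struct SendmeTracker {
+    expected: VecDeque<[u8; DIGEST_LEN]>,
+}
+
+impl SendmeTracker {
+    /// Record the digest of a sent cell that completes a SENDME window.
+    pub fn record_sent(&mut self, digest: [u8; DIGEST_LEN]) {
+        self.expected.push_back(digest);
+    }
+
+    /// Returns true if this SENDME is acceptable. A SENDME with nothing
+    /// outstanding, or with the wrong digest, means the circuit must be closed.
+    pub fn check_sendme(&mut self, auth_digest: &[u8; DIGEST_LEN]) -> bool {
+        match self.expected.pop_front() {
+            Some(want) => want == *auth_digest,
+            None => false,
+        }
+    }
+}
+```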
+
+## Behavior
+
+When an invalid relay cell or relay message is encountered, the corresponding
+circuit should be immediately closed.
+
+Initially, this can be accomplished by sending a DESTROY cell to the Guard
+relay.
+
+Additionally, when closing circuits in this way, clients must limit the number
+of retries they perform, so that an adversary cannot induce infinite circuit
+creation in non-onion-service protocols that are not protected by
+Vanguards/Vanguards-lite. (One example is a malicious conflux exit that
+repeatedly kills only one leg by injecting dropped cells, causing the client to
+close that circuit.)
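+
+One hypothetical way to bound retries is a small per-request budget. The cap
+value and the reset-on-success policy below are assumptions; the proposal only
+requires that the number of retries be limited.
+
+```rust
+/// Hypothetical retry budget: after `max_retries` closes caused by invalid
+/// protocol behavior, stop rebuilding instead of retrying forever.
+pub struct RetryBudget {
+    max_retries: u32,
+    used: u32,
+}
+
+impl RetryBudget {
+    pub fn new(max_retries: u32) -> Self {
+        RetryBudget { max_retries, used: 0 }
+    }
+
+    /// Call after a circuit is closed for invalid protocol behavior.
+    /// Returns true if another circuit may be built.
+    pub fn may_retry(&mut self) -> bool {
+        if self.used < self.max_retries {
+            self.used += 1;
+            true
+        } else {
+            false
+        }
+    }
+
+    /// Assumed policy: reset once a request completes successfully.
+    pub fn reset(&mut self) {
+        self.used = 0;
+    }
+}
+```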
+
+While we also specify some cases where the channel to the Guard should be
+closed, this is not necessary in the general case.
+
+> XXX: I can't think of any issues severe enough to actually warrant the
+> following, but Florentin pointed it out as a possibility: A malicious Guard
+> may withhold the DESTROY, and still allow full identifier transmission before
+> the circuit is closed. While this does not directly allow full deanonymization
+> because the client won't actually use the circuit, it may still be enough to
+> make the vector useful for other attacks. For completeness against this
+> vector, we may want to consider sending a new RELAY_DESTROY command to the
+> middle node, such that it has responsibility for tearing down a circuit by
+> sending its own DESTROY cells in both directions, and then have the client send its
+> own DESTROY if the client does not get a DESTROY from the Guard.
+> >>> See torspec#220: https://gitlab.torproject.org/tpo/core/torspec/-/issues/220
+
+
+
+# State machine descriptions
+
+These state machines apply only at the client. (There is no information leak
+from extra cells in the protocol on the relay side, so we will not be specifying
+relay-side enforcement, or implementing it for C-Tor.)
+
+There are multiple state machines, describing particular circuit purposes
+and/or components of the Tor relay protocol.
+
+Each state machine has a "Trigger" and a "Message Scope". The "Trigger" is
+the condition, relay command, or action that causes the state machine to be
+added to a circuit's command state validator set. The "Message Scope" is where
+the state machine applies: to a specific hop number, stream ID, or both.
+
+A circuit can have multiple state machines attached at one time.
+ * If no state machine accepts a relay command, then the circuit MUST be
+ closed.
+ * When we say "receive X" we mean "receive a _valid_ cell of
+   type X". If the cell is invalid, we MUST kill the circuit.
+
+## Relay message handlers
+
+The state machines process enveloped relay message commands. That is, with
+respect to Prop#340, they operate on message bodies together with their
+associated stream ID.
+
+In the Prop#340 processing pipeline, the state machine validation calls go
+after cells are converted to messages, but before the message body itself is
+parsed, to minimize exposure of the parsers' attack surface.
+
+> XXX: Again, some validation will require early parsing, not lazy parsing
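+
+The ordering can be sketched as follows: the validator sees only the command
+and stream ID, and the body parser runs only after validation succeeds. The
+`UnparsedMsg` type and the `validate`/`parse` closures are hypothetical
+stand-ins for the Prop#340 message layer, the handler set, and the per-command
+parsers.
+
+```rust
+/// Hypothetical enveloped message: command and stream ID are known,
+/// but the body is still unparsed bytes.
+pub struct UnparsedMsg<'a> {
+    pub cmd: u8,
+    pub stream_id: u16,
+    pub body: &'a [u8],
+}
+
+#[derive(Debug, PartialEq, Eq)]
+pub enum MsgError {
+    /// No active handler accepted this command: close the circuit.
+    Rejected,
+    /// The body failed to parse: also close the circuit.
+    Malformed,
+}
+
+/// Validate first, parse second.
+pub fn handle<T>(
+    msg: UnparsedMsg<'_>,
+    validate: impl FnOnce(u8, u16) -> bool,
+    parse: impl FnOnce(&[u8]) -> Option<T>,
+) -> Result<T, MsgError> {
+    if !validate(msg.cmd, msg.stream_id) {
+        // The parser never sees the body of an unexpected command.
+        return Err(MsgError::Rejected);
+    }
+    parse(msg.body).ok_or(MsgError::Malformed)
+}
+```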
+
+There are multiple relay message handlers that can be registered with each
+circuit ID, for a specific hop on that circuit ID, depending on the protocols
+that are in use on that circuit with that hop, as well as the streams to that
+hop.
+
+Each handler has a Message Scope, which acts as a filter such that only relay
+command messages from this scope are processed by that handler.
+
+If a message is not accepted by any active handler, the circuit MUST be
+closed.
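+
+A Message Scope filter might look like the sketch below. The type is
+hypothetical, the circuit ID is implied by storing handlers per circuit, and it
+assumes that a hop-only scope covers circuit-level messages (stream ID 0).
+
+```rust
+/// Hypothetical Message Scope: a hop, and optionally one stream on that hop.
+#[derive(Clone, Copy, Debug)]
+pub struct MessageScope {
+    pub hop: u8,
+    /// `None` is assumed to mean circuit-level messages only (stream ID 0).
+    pub stream_id: Option<u16>,
+}
+
+impl MessageScope {
+    /// True if a message from `hop` carrying `stream_id` is in this scope.
+    pub fn matches(&self, hop: u8, stream_id: u16) -> bool {
+        if hop != self.hop {
+            return false;
+        }
+        match self.stream_id {
+            Some(id) => id == stream_id,
+            None => stream_id == 0,
+        }
+    }
+}
+```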
+
+
+### Base Handler
+
+Purpose: This handler validates commands for circuit construction and
+circuit-level SENDME activity.
+
+Trigger: Creation of a circuit; ntor handshake success for a hop
+
+Message Scope: The circuit ID and hop number must match for this handler to
+apply. (Because of leaky pipes, each hop of the circuit has a base handler
+added when that hop completes an ntor handshake and is added to the circuit.)
+
+```text
+START:
+  Upon sending EXTEND:
+    Enter EXTEND_SENT.
+
+  Receive SENDME:
+    Ensure expected auth digest matches; close circuit otherwise
+    No transition.
+
+EXTEND_SENT:
+  Receiving EXTENDED:
+    Enter START.
+
+  Receive SENDME:
+    Ensure expected auth digest matches; close circuit otherwise
+    No transition.
+```
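+
+The Base Handler transitions can be expressed as a pure Rust function. This is
+a sketch with hypothetical names; `digest_ok` is assumed to be the result of
+the separate SENDME auth-digest check, and any (state, event) pair not listed
+above is rejected.
+
+```rust
+#[derive(Clone, Copy, Debug, PartialEq, Eq)]
+pub enum BaseState {
+    Start,
+    ExtendSent,
+}
+
+#[derive(Clone, Copy, Debug, PartialEq, Eq)]
+pub enum BaseEvent {
+    SentExtend,
+    RecvExtended,
+    /// `digest_ok`: result of the separate SENDME auth-digest check.
+    RecvSendme { digest_ok: bool },
+}
+
+/// Returns the next state, or `None` if the event is not acceptable here
+/// (for received messages, the circuit must then be closed).
+pub fn base_step(state: BaseState, ev: BaseEvent) -> Option<BaseState> {
+    use BaseEvent::*;
+    use BaseState::*;
+    match (state, ev) {
+        (Start, SentExtend) => Some(ExtendSent),
+        (ExtendSent, RecvExtended) => Some(Start),
+        // A SENDME is valid in either state, without a transition,
+        // only if its auth digest matches.
+        (s, RecvSendme { digest_ok: true }) => Some(s),
+        _ => None,
+    }
+}
+```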
+
+### Client Introducing Handler
+
+Purpose: Circuits used by clients to connect to a service introduction point
+have this handler attached.
+
+Trigger: Usage of a circuit for client introduction
+
+Message Scope: Circuit ID and hop number must match
+
+```text
+CLIENT_INTRO_START:
+  Upon sending INTRODUCE1:
+    Enter CLIENT_INTRO_WAIT
+
+CLIENT_INTRO_WAIT:
+  Receive INTRODUCE_ACK:
+    Accept
+    Transition to CLIENT_INTRO_END
+
+CLIENT_INTRO_END:
+  No transitions possible
+  - XXX: Enforce that no new handlers can be added? We may still have padding
+    handlers though.
+```
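+
+A hypothetical Rust rendering of the Client Introducing Handler: it accepts
+exactly one INTRODUCE_ACK, and only after INTRODUCE1 has been sent. Type and
+method names are illustrative.
+
+```rust
+#[derive(Clone, Copy, Debug, PartialEq, Eq)]
+pub enum ClientIntroState {
+    Start,
+    Wait,
+    End,
+}
+
+/// Hypothetical client-side introduction handler.
+pub struct ClientIntroHandler {
+    pub state: ClientIntroState,
+}
+
+impl ClientIntroHandler {
+    pub fn new() -> Self {
+        ClientIntroHandler { state: ClientIntroState::Start }
+    }
+
+    /// Called when we send INTRODUCE1.
+    pub fn on_send_introduce1(&mut self) {
+        if self.state == ClientIntroState::Start {
+            self.state = ClientIntroState::Wait;
+        }
+    }
+
+    /// Returns true if an incoming INTRODUCE_ACK is acceptable. Only one is
+    /// ever accepted; in the END state this handler accepts nothing.
+    pub fn on_recv_introduce_ack(&mut self) -> bool {
+        if self.state == ClientIntroState::Wait {
+            self.state = ClientIntroState::End;
+            true
+        } else {
+            false
+        }
+    }
+}
+```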
+
+
+### Service Introduce Handler
+
+Purpose: Service-side onion service introduction circuits have this handler
+attached.
+
+Trigger: Onion service establishing an introduction point circuit
+
+Message Scope: Circuit ID and hop number must match
+
+```text
+SERVICE_INTRO_START:
+  Upon sending ESTABLISH_INTRO:
+    Enter SERVICE_INTRO_ESTABLISH
+
+SERVICE_INTRO_ESTABLISH:
+  Receiving INTRO_ESTABLISHED:
+    Enter SERVICE_INTRO_ESTABLISHED
+
+SERVICE_INTRO_ESTABLISHED:
+  Receiving INTRODUCE2:
+    Accept
+```
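+
+The sketch below adds the replay-cache context from the INTRODUCE2 rule
+earlier in this proposal: an out-of-state INTRODUCE2 closes the circuit, while
+a replay is ignored without closing it, since replays can come from the
+client. The replay key and the `HashSet` cache are hypothetical
+simplifications of a real replay cache.
+
+```rust
+use std::collections::HashSet;
+
+#[derive(Clone, Copy, Debug, PartialEq, Eq)]
+pub enum SvcIntroState {
+    Start,
+    Establish,
+    Established,
+}
+
+#[derive(Debug, PartialEq, Eq)]
+pub enum Intro2Outcome {
+    /// Hand the INTRODUCE2 to the service.
+    Process,
+    /// Replay: drop the request but keep the circuit open.
+    IgnoreReplay,
+    /// Out of state: close the circuit.
+    CloseCircuit,
+}
+
+/// Hypothetical service-side introduction handler.
+pub struct ServiceIntroHandler {
+    pub state: SvcIntroState,
+    /// Simplified replay cache keyed by a hypothetical per-request value.
+    seen: HashSet<Vec<u8>>,
+}
+
+impl ServiceIntroHandler {
+    pub fn new() -> Self {
+        ServiceIntroHandler { state: SvcIntroState::Start, seen: HashSet::new() }
+    }
+
+    pub fn on_send_establish_intro(&mut self) {
+        if self.state == SvcIntroState::Start {
+            self.state = SvcIntroState::Establish;
+        }
+    }
+
+    /// INTRO_ESTABLISHED is accepted once, only after ESTABLISH_INTRO.
+    pub fn on_recv_intro_established(&mut self) -> bool {
+        if self.state == SvcIntroState::Establish {
+            self.state = SvcIntroState::Established;
+            true
+        } else {
+            false
+        }
+    }
+
+    pub fn on_recv_introduce2(&mut self, replay_key: &[u8]) -> Intro2Outcome {
+        if self.state != SvcIntroState::Established {
+            return Intro2Outcome::CloseCircuit;
+        }
+        // `insert` returns false if the key was already present (a replay).
+        if self.seen.insert(replay_key.to_vec()) {
+            Intro2Outcome::Process
+        } else {
+            Intro2Outcome::IgnoreReplay
+        }
+    }
+}
+```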
+
+
+### Client Rendezvous Handler
+
+Purpose: Circuits used by clients to build a rendezvous point have this handler
+attached.
+
+Trigger: Client rendezvous initiation
+
+Message Scope: Circuit ID and hop number must match
+
+```text
+CLIENT_REND_START:
+  Upon sending ESTABLISH_RENDEZVOUS:
+    Enter CLIENT_REND_ESTABLISHING
+
+CLIENT_REND_ESTABLISHING:
+  Receive RENDEZVOUS_ESTABLISHED:
+    Enter CLIENT_REND_WAIT
+
+CLIENT_REND_WAIT:
+  Receive RENDEZVOUS2:
+    Enter CLIENT_REND_ESTABLISHED
+
+CLIENT_REND_ESTABLISHED:
+  Remain in this state; launch TCP, UDP, or Conflux handlers for streams
+```
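+
+A table-driven sketch of the client-side rendezvous handler, following the
+rend-spec message flow: the client sends ESTABLISH_RENDEZVOUS, receives
+RENDEZVOUS_ESTABLISHED, and then receives RENDEZVOUS2 once the service's
+RENDEZVOUS1 reaches the rendezvous point. The state and event names are
+hypothetical.
+
+```rust
+#[derive(Clone, Copy, Debug, PartialEq, Eq)]
+pub enum RendState {
+    Start,
+    Establishing,
+    Wait,
+    Established,
+}
+
+#[derive(Clone, Copy, Debug, PartialEq, Eq)]
+pub enum RendEvent {
+    SentEstablishRendezvous,
+    RecvRendezvousEstablished,
+    RecvRendezvous2,
+}
+
+/// Allowed (from, event, to) transitions for the client side.
+const CLIENT_REND: &[(RendState, RendEvent, RendState)] = &[
+    (RendState::Start, RendEvent::SentEstablishRendezvous, RendState::Establishing),
+    (RendState::Establishing, RendEvent::RecvRendezvousEstablished, RendState::Wait),
+    (RendState::Wait, RendEvent::RecvRendezvous2, RendState::Established),
+];
+
+/// Looks up the transition; `None` means the event is not acceptable here
+/// and, for received messages, the circuit must be closed.
+pub fn rend_step(state: RendState, ev: RendEvent) -> Option<RendState> {
+    CLIENT_REND
+        .iter()
+        .find(|(from, e, _)| *from == state && *e == ev)
+        .map(|&(_, _, to)| to)
+}
+```
+
+Keeping the transitions in a table makes the accepted set easy to audit in one
+place, which is the property this proposal asks for.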
+
+
+### Service Rendezvous Handler
+
+Purpose: Circuits used by services to connect to a rendezvous point have this
+handler attached.
+
+Trigger: Incoming introduce cell/service rend initiation
+
+Message Scope: Circuit ID and hop number must match
+
+```text
+SERVICE_REND_START:
+  Upon sending RENDEZVOUS1:
+    Enter SERVICE_REND_ESTABLISHED
+
+SERVICE_REND_ESTABLISHED:
+  Remain in this state; launch TCP, UDP, or Conflux handlers for streams
+```
+
+
+### CircPad Handler
+
+Purpose: Circuit-level padding is negotiated with a particular hop in the
+circuit; when it is negotiated, we need to allow padding cells from that hop.
+
+Trigger: Negotiation of a circuit padding machine
+
+Message Scope: Circuit ID and hop must match; padding machine must be active
+
+```text
+PADDING_START:
+  Upon sending PADDING_NEGOTIATE:
+    Enter PADDING_NEGOTIATING
+
+PADDING_NEGOTIATING:
+  Receiving PADDING_NEGOTIATED:
+    Enter PADDING_ACTIVE
+
+PADDING_ACTIVE:
+  Receiving DROP:
+    Accept (if from correct hop)
+    - XXX: We could perform more sophisticated rate limiting accounting here
+      too?
+```
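+
+A sketch of the CircPad Handler, assuming the handler is bound to one hop
+index. Names are hypothetical, and the rate limiting mentioned in the XXX note
+is not modeled.
+
+```rust
+#[derive(Clone, Copy, Debug, PartialEq, Eq)]
+pub enum PadState {
+    Start,
+    Negotiating,
+    Active,
+}
+
+/// Hypothetical padding handler bound to one hop of the circuit.
+pub struct CircPadHandler {
+    pub hop: u8,
+    pub state: PadState,
+}
+
+impl CircPadHandler {
+    pub fn new(hop: u8) -> Self {
+        CircPadHandler { hop, state: PadState::Start }
+    }
+
+    pub fn on_send_negotiate(&mut self) {
+        if self.state == PadState::Start {
+            self.state = PadState::Negotiating;
+        }
+    }
+
+    /// PADDING_NEGOTIATED is accepted once, from our hop, after we asked.
+    pub fn on_recv_negotiated(&mut self, from_hop: u8) -> bool {
+        if from_hop == self.hop && self.state == PadState::Negotiating {
+            self.state = PadState::Active;
+            true
+        } else {
+            false
+        }
+    }
+
+    /// DROP is accepted only from the hop running an active padding machine.
+    pub fn on_recv_drop(&self, from_hop: u8) -> bool {
+        from_hop == self.hop && self.state == PadState::Active
+    }
+}
+```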
+
+### Resolve Stream Handler
+
+Purpose: This handler is created on circuits when a resolve happens.
+
+Trigger: RESOLVE message
+
+Message Scope: Circuit ID, stream ID, and hop number must all match
+
+```text
+RESOLVE_START:
+  Send a RESOLVE message:
+    Enter RESOLVE_SENT
+
+RESOLVE_SENT:
+  Receive a RESOLVED or an END:
+    Enter RESOLVE_START.
+```
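+
+Because each RESOLVE opens a two-state machine keyed by stream ID, one
+hypothetical way to track them is a set of outstanding stream IDs per hop: a
+RESOLVED or END is accepted only if it removes a pending entry. This is
+equivalent to one RESOLVE_START/RESOLVE_SENT machine per stream ID; the type
+is illustrative.
+
+```rust
+use std::collections::HashSet;
+
+#[derive(Clone, Copy, Debug, PartialEq, Eq)]
+pub enum ResolveReply {
+    Resolved,
+    End,
+}
+
+/// Hypothetical tracker of outstanding RESOLVEs to one hop. A stream ID in
+/// `pending` is in RESOLVE_SENT; otherwise it is in RESOLVE_START.
+#[derive(Default)]
+pub struct ResolveHandler {
+    pending: HashSet<u16>,
+}
+
+impl ResolveHandler {
+    pub fn on_send_resolve(&mut self, stream_id: u16) {
+        self.pending.insert(stream_id);
+    }
+
+    /// A RESOLVED or END is accepted once per outstanding RESOLVE, returning
+    /// that stream ID to RESOLVE_START; anything else must close the circuit.
+    pub fn on_recv(&mut self, stream_id: u16, _reply: ResolveReply) -> bool {
+        self.pending.remove(&stream_id)
+    }
+}
+```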
+
+
+### TCP Stream handler
+
+Purpose: This handler is created when the client creates a new stream ID, using either
+BEGIN or BEGIN_DIR.
+
+Trigger: New AP or DirConn stream
+
+Message Scope: Circuit ID, stream ID, and hop number must all match; stream ID
+must be open or half-open (half-open is END_SENT).
+
+```text
+TCP_STREAM_START:
+  Send a BEGIN or BEGIN_DIR message:
+    Enter BEGIN_SENT.
+
+BEGIN_SENT:
+  Receive an END:
+    Enter TCP_STREAM_START.
+  Receive a CONNECTED:
+    Enter STREAM_OPEN.
+
+STREAM_OPEN:
+  Receive DATA:
+    Verify length is > 0
+    XXX: Handle [HSDIRINFLATION] here?
+    Process.
+
+  Receive XOFF:
+    Enter STREAM_XOFF
+
+  Send END:
+    Enter END_SENT.
+
+  Receive END:
+    Enter TCP_STREAM_START
+
+STREAM_XOFF:
+  Receive DATA:
+    Verify length is > 0
+    XXX: Handle [HSDIRINFLATION] here?
+    Process.
+
+  Send END:
+    Enter END_SENT.
+
+  Receive XON:
+    Enter STREAM_XON
+
+  Receive END:
+    Enter TCP_STREAM_START
+
+STREAM_XON:
+  Receive DATA:
+    Verify length is > 0
+    XXX: Handle [HSDIRINFLATION] here?
+    Process.
+
+  Receive XOFF:
+    If prop#340 is enabled, verify packed with SENDME
+    Enter STREAM_XOFF
+
+  Receive XON:
+    If prop#340 is enabled, verify packed with SENDME
+    Verify rate has changed
+
+  Send END:
+    Enter END_SENT.
+
+  Receive END:
+    Enter TCP_STREAM_START
+
+END_SENT:
+  Same as STREAM_OPEN, except do not actually deliver data.
+  Only remain in this state for one RTT_max, or until END_ACK.
+```
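+
+The receive side of the TCP Stream handler can be sketched as one `match` over
+(state, message). The packing and rate-change flags are assumed to be computed
+elsewhere (the Prop#340 decoder and XON parsing). Treating an XOFF in END_SENT
+as ignored is an assumption, and the RTT_max timer and END_ACK are not
+modeled; all names are hypothetical.
+
+```rust
+/// Hypothetical per-stream states mirroring the TCP Stream handler above.
+#[derive(Clone, Copy, Debug, PartialEq, Eq)]
+pub enum TcpState {
+    Start,
+    BeginSent,
+    Open,
+    Xoff,
+    Xon,
+    EndSent,
+}
+
+/// Summary of a received message; flags are assumed to come from other modules.
+#[derive(Clone, Copy, Debug)]
+pub enum Recv {
+    Connected,
+    End,
+    Data { len: usize },
+    Xoff { packed_with_sendme: bool },
+    Xon { packed_with_sendme: bool, rate_changed: bool },
+}
+
+#[derive(Debug, PartialEq, Eq)]
+pub enum Action {
+    /// Accept; `deliver` says whether DATA goes to the application.
+    Accept { next: TcpState, deliver: bool },
+    CloseCircuit,
+}
+
+pub fn tcp_recv(state: TcpState, msg: Recv, prop340: bool) -> Action {
+    use TcpState::*;
+    // With Prop#340 enabled, XON/XOFF in the XON state must be packed with a SENDME.
+    let packing_ok = |packed: bool| !prop340 || packed;
+    let next = match (state, msg) {
+        (BeginSent, Recv::End) => Start,
+        (BeginSent, Recv::Connected) => Open,
+        // Zero-length DATA is semantically void, so it falls through to close.
+        (Open | Xoff | Xon | EndSent, Recv::Data { len }) if len > 0 => state,
+        (Open | Xoff | Xon | EndSent, Recv::End) => Start,
+        (Open, Recv::Xoff { .. }) => Xoff,
+        (Xoff, Recv::Xon { .. }) => Xon,
+        (Xon, Recv::Xoff { packed_with_sendme }) if packing_ok(packed_with_sendme) => Xoff,
+        (Xon, Recv::Xon { packed_with_sendme, rate_changed })
+            if packing_ok(packed_with_sendme) && rate_changed =>
+        {
+            Xon
+        }
+        // Assumption: an XOFF arriving while END_SENT is tolerated and ignored.
+        (EndSent, Recv::Xoff { .. }) => EndSent,
+        _ => return Action::CloseCircuit,
+    };
+    // Data is delivered only on a live stream, never while END_SENT.
+    let deliver = matches!(msg, Recv::Data { .. }) && state != EndSent;
+    Action::Accept { next, deliver }
+}
+
+/// Sending BEGIN (or BEGIN_DIR) from TCP_STREAM_START.
+pub fn tcp_send_begin(state: TcpState) -> TcpState {
+    if state == TcpState::Start { TcpState::BeginSent } else { state }
+}
+
+/// Sending END from any open state enters END_SENT.
+pub fn tcp_send_end(state: TcpState) -> TcpState {
+    match state {
+        TcpState::Open | TcpState::Xoff | TcpState::Xon => TcpState::EndSent,
+        other => other,
+    }
+}
+```
+
+A caller would also drop the stream's handler once RTT_max elapses in
+END_SENT; that timer is outside this sketch.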
+
+
+### Conflux Handler
+
+Purpose: Circuits that are a part of a conflux set have a conflux handler, associated
+with the last hop.
+
+Trigger: Creation of a conflux set
+
+Message Scope: Circuit ID and hop number must match
+ - XXX: Linked circuits must accept stream ids from either circuit for other
+ handlers :/
+
+```text
+CONFLUX_START: (all conflux leg circuits start here)
+  Upon sending CONFLUX_LINK:
+    Enter CONFLUX_LINKING
+
+CONFLUX_LINKING:
+  Receiving CONFLUX_LINKED:
+    Send CONFLUX_LINKED_ACK
+    Enter CONFLUX_LINKED
+
+CONFLUX_LINKED:
+  Receiving CONFLUX_SWITCH:
+    If prop#340 is negotiated, ensure packed with a DATA cell
+```
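+
+A hypothetical per-leg Conflux Handler. `packed_with_data` is assumed to be
+reported by the Prop#340 decoder, and accepting LINKED returns an action that
+tells the caller to send CONFLUX_LINKED_ACK.
+
+```rust
+#[derive(Clone, Copy, Debug, PartialEq, Eq)]
+pub enum ConfluxState {
+    Start,
+    Linking,
+    Linked,
+}
+
+#[derive(Debug, PartialEq, Eq)]
+pub enum ConfluxAction {
+    /// Accept, and reply with CONFLUX_LINKED_ACK.
+    AcceptAndSendLinkedAck,
+    Accept,
+    CloseCircuit,
+}
+
+/// Hypothetical handler for one conflux leg (bound to the last hop).
+pub struct ConfluxLeg {
+    pub state: ConfluxState,
+    /// Whether Prop#340 packing was negotiated on this circuit.
+    pub prop340: bool,
+}
+
+impl ConfluxLeg {
+    pub fn new(prop340: bool) -> Self {
+        ConfluxLeg { state: ConfluxState::Start, prop340 }
+    }
+
+    pub fn on_send_link(&mut self) {
+        if self.state == ConfluxState::Start {
+            self.state = ConfluxState::Linking;
+        }
+    }
+
+    /// LINKED is accepted only once, and only after we sent LINK.
+    pub fn on_recv_linked(&mut self) -> ConfluxAction {
+        if self.state == ConfluxState::Linking {
+            self.state = ConfluxState::Linked;
+            ConfluxAction::AcceptAndSendLinkedAck
+        } else {
+            ConfluxAction::CloseCircuit
+        }
+    }
+
+    /// SWITCH requires a linked leg and, under Prop#340, a packed DATA message.
+    pub fn on_recv_switch(&self, packed_with_data: bool) -> ConfluxAction {
+        if self.state == ConfluxState::Linked && (!self.prop340 || packed_with_data) {
+            ConfluxAction::Accept
+        } else {
+            ConfluxAction::CloseCircuit
+        }
+    }
+}
+```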
+
+
+### UDP Stream Handler
+
+Purpose: Circuits that are using UDP streams (Prop#339) have this handler
+attached.
+
+Trigger: UDP stream creation
+
+Message Scope: Circuit ID, hop number, and stream-id must match
+
+```text
+UDP_STREAM_START:
+  If no other udp streams used on circuit:
+    Send CONNECT_UDP for any stream, enter UDP_CONNECTING
+  else:
+    Immediately enter UDP_CONNECTING
+    (CONNECTED_UDP MAY arrive without a CONNECT_UDP, after the first UDP
+    stream on a circuit is established)
+
+UDP_CONNECTING:
+  Upon receipt of CONNECTED_UDP, enter UDP_CONNECTED
+
+UDP_CONNECTED:
+  Receive DATAGRAM:
+    Verify length > 0
+    Verify Prop#344 NAT rules are obeyed, including srcport and stream limits
+    Process.
+
+  Send END:
+    Enter UDP_END_SENT
+
+UDP_END_SENT:
+  Same as UDP_CONNECTED, except do not actually deliver data.
+  Only remain in this state for one RTT_max, or until END_ACK,
+  then transition to UDP_STREAM_START.
+```
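+
+A sketch of the UDP Stream Handler with hypothetical names. `nat_ok` stands in
+for the NAT-rule check, which is not modeled here; `open_stream` reports
+whether CONNECT_UDP must be sent (only for the first UDP stream on the
+circuit), and the RTT_max timer for UDP_END_SENT is omitted.
+
+```rust
+#[derive(Clone, Copy, Debug, PartialEq, Eq)]
+pub enum UdpState {
+    Connecting,
+    Connected,
+    EndSent,
+}
+
+/// Hypothetical per-circuit UDP bookkeeping.
+pub struct UdpCircuit {
+    any_udp_stream: bool,
+}
+
+impl UdpCircuit {
+    pub fn new() -> Self {
+        UdpCircuit { any_udp_stream: false }
+    }
+
+    /// UDP_STREAM_START: returns the new stream's state and whether a
+    /// CONNECT_UDP must be sent (only for the first UDP stream).
+    pub fn open_stream(&mut self) -> (UdpState, bool) {
+        let must_send_connect = !self.any_udp_stream;
+        self.any_udp_stream = true;
+        (UdpState::Connecting, must_send_connect)
+    }
+}
+
+/// CONNECTED_UDP is accepted once per stream, while connecting.
+pub fn udp_recv_connected(state: UdpState) -> Option<UdpState> {
+    (state == UdpState::Connecting).then_some(UdpState::Connected)
+}
+
+/// Returns Some(deliver) if the DATAGRAM is acceptable, or None if the
+/// circuit must be closed.
+pub fn udp_recv_datagram(state: UdpState, len: usize, nat_ok: bool) -> Option<bool> {
+    match state {
+        UdpState::Connected | UdpState::EndSent if len > 0 && nat_ok => {
+            Some(state == UdpState::Connected)
+        }
+        _ => None,
+    }
+}
+
+pub fn udp_send_end(state: UdpState) -> UdpState {
+    if state == UdpState::Connected { UdpState::EndSent } else { state }
+}
+```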
+
+
+# HSDIR Inflation { #HSDIRINFLATION }
+
+XXX: This can be folded into the state machines and/or rend-spec. The state
+machines should actually be able to handle this, once they are ready for it.
+
+One of the most common questions about dropped cells is "what about data cells
+with a 1 byte payload?". As Prop#344 makes clear, this is not a dropped cell
+attack, but is instead an instance of an Active Traffic Manipulation Covert
+Channel, described in Section 1.3.2. The lower severity of active traffic
+manipulation is due to the fact that it cannot be used to deanonymize 100% of
+a target client's circuits, whereas the combination of path bias and
+pre-usage dropped cells can.
+
+However, there is one case where one can construct a potent attack from this
+Active Traffic Manipulation: by making use of onion service circuits being
+built on demand by an application. Further, because the onion service
+handshake is uniquely fingerprintable (see Section 1.2.1 of Prop#344), it is
+possible to use this vector in this specific case to encode an identifier in
+the timing and traffic patterns of the onion service descriptor download,
+similar to how the CMU attack operated, and use both the onion service
+fingerprint and descriptor traffic pattern to transmit the fact that a
+particular onion service was visited, to the Guard or possibly even a local
+network observer.
+
+A normal hidden service descriptor occupies only ~10 cells (with a hard max of
+30KB, or ~60 cells). This is not enough to reliably encode the full address of
+the onion service in a timing-based covert channel.
+
+However, there are two ways to cause this descriptor download to transmit
+enough data to encode such a covert channel, and replicate the CMU attack
+using timing information of this data.
+
+First, the actual descriptor payload can be spread across many DATA cells that
+are filled only partially with data (which does not happen if the HSDIR is
+honest and well-behaved, because it always has the full descriptor on hand).
+
+Second, in C-tor, additional junk can be appended at the end of an onion service
+descriptor document that does not count against the 30KB maximum, which the
+client will happily download and then ignore.
+
+Neither of these behaviors needs to be preserved, and neither can happen in
+normal operation. They can be addressed either directly, by checks on
+HSDIR-based RELAY_COMMAND_DATA lengths and descriptor parsing, or by simply
+enforcing that circuits used to fetch service descriptors can *only* receive
+as many bytes as the maximum descriptor size before being closed.
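+
+Both mitigations can be combined in a per-fetch guard, sketched below. The
+498-byte maximum DATA payload follows from the 509-byte relay cell body minus
+the 11-byte relay header. The byte budget (about 30KB plus a hypothetical 2KB
+allowance for HTTP response headers) and the type name are assumptions. The
+short-cell rule mirrors the rule, in the C-Tor allowlist section, that only the
+last DATA message on an HSDIR circuit may be partially filled.
+
+```rust
+/// Max data bytes in one RELAY_DATA message (509-byte body minus 11-byte header).
+pub const MAX_DATA_LEN: usize = 498;
+/// Assumed budget: ~30KB descriptor limit plus a hypothetical header allowance.
+pub const HSDESC_BUDGET: usize = 30 * 1024 + 2048;
+
+/// Hypothetical guard attached to a descriptor-fetch stream.
+#[derive(Default)]
+pub struct HsDirFetchGuard {
+    received: usize,
+    saw_short_cell: bool,
+}
+
+impl HsDirFetchGuard {
+    /// Returns false if the circuit must be closed.
+    pub fn on_data(&mut self, len: usize) -> bool {
+        // Reject empty or oversized DATA, and any DATA after a partial one.
+        if len == 0 || len > MAX_DATA_LEN || self.saw_short_cell {
+            return false;
+        }
+        if len < MAX_DATA_LEN {
+            self.saw_short_cell = true;
+        }
+        self.received += len;
+        // Reject once the total exceeds the descriptor byte budget.
+        self.received <= HSDESC_BUDGET
+    }
+}
+```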
+
+XXX: Consider RELAY_COMMAND_END_ACK also..
+ - https://gitlab.torproject.org/tpo/core/torspec/-/issues/196
+
+XXX: Tickets to grovel through for other stuff:
+https://gitlab.torproject.org/tpo/core/torspec/-/issues/38
+https://gitlab.torproject.org/tpo/core/torspec/-/issues/39
+https://gitlab.torproject.org/tpo/core/arti/-/issues/525
+
+
+# Command Allowlist enumeration { #CTORALLOWLIST }
+
+XXX: We are planning to remove this section after we finish the state
+machines; keeping it for reference until then for cross-checking.
+
+Formerly, in C-Tor, we took the approach of performing a series of ad-hoc
+checks for each message command. Here are those rules, for spot-checking that
+the above state machines cover them.
+
+All relay commands are rejected by clients and services unless a rule says
+they are OK.
+
+Here's a list of those rules, by relay command:
+
+ - RELAY_COMMAND_DATA 2
+   - This command MUST only arrive for a valid open or half-open stream ID
+   - This command MUST have data length > 0
+   - On HSDIR circuits, ONLY ONE command is allowed to have a non-full
+     payload (the last command). See Section 4.
+
+ - RELAY_COMMAND_END 3
+   - This command MUST only arrive ONCE for each valid open or half-open
+     stream ID
+
+ - RELAY_COMMAND_CONNECTED 4
+   - This command MUST ONLY be accepted ONCE by clients if they sent a BEGIN
+     or BEGIN_DIR
+   - The stream ID MUST match the stream ID from BEGIN (or BEGIN_DIR)
+
+ - RELAY_COMMAND_DROP 10
+   - This command is accepted by clients from any hop that they
+     have negotiated an active circuit padding machine with
+
+ - RELAY_COMMAND_CONFLUX_LINKED 20
+   - Ensure that a LINK cell was sent to the hop that sent this
+   - Ensure that no previous LINKED cell has arrived on this circuit
+
+ - RELAY_COMMAND_CONFLUX_SWITCH 22
+   - Ensure that conflux is enabled and linked
+   - If Prop#340 is in use, this cell MUST be packed with a valid
+     multiplexed RELAY_COMMAND_DATA cell.
+
+ - RELAY_COMMAND_INTRODUCE2 35
+   - Services MUST check:
+     - The intro is for a valid service identity and auth
+     - The command has a valid sub-credential
+     - The command is not a replay (possibly not close circuit?)
+
+ - RELAY_COMMAND_RENDEZVOUS2 37
+   - This command MUST ONLY arrive ONCE in response to a sent REND1 cell,
+     on the appropriate circuit
+   - The ntor handshake must succeed with MAC validation
+
+ - RELAY_COMMAND_INTRO_ESTABLISHED 38
+   - Services MUST check:
+     - This cell MUST ONLY come ONCE in response to
+       RELAY_COMMAND_ESTABLISH_INTRO, for the appropriate service identity
+
+ - RELAY_COMMAND_RENDEZVOUS_ESTABLISHED 39
+   - This command MUST ONLY be accepted ONCE in response to
+     RELAY_COMMAND_ESTABLISH_RENDEZVOUS
+
+ - RELAY_COMMAND_INTRODUCE_ACK 40
+   - This command MUST ONLY be accepted ONCE by clients, in response to
+     RELAY_COMMAND_INTRODUCE1
+
+ - RELAY_COMMAND_PADDING_NEGOTIATED 42
+   - This command MUST ONLY be accepted by clients in response to
+     PADDING_NEGOTIATE
+
+ - RELAY_COMMAND_XOFF 43
+   - Ensure that congestion control is enabled and negotiated
+   - Ensure that the stream id is either opened or half-open
+   - Ensure that the stream id is in "XON" state
+
+ - RELAY_COMMAND_XON 44
+   - Ensure that congestion control is enabled and negotiated
+   - Ensure that the stream id is either opened or half-open
+   - Enforce always packing this to a SENDME with Prop#340?
+
+ - RELAY_COMMAND_CONNECTED_UDP
+   - The stream id in this command MUST match that from
+     RELAY_COMMAND_CONNECT_UDP
+   - This command is only accepted once per UDP stream id
+
+ - RELAY_COMMAND_DATAGRAM
+   - This command MUST only arrive for a valid open or half-open stream ID
+   - This command MUST have data length > 0
+
+
+References:
+
+[DROPMARK]: https://petsymposium.org/2018/files/papers/issue2/popets-2018-0011.pdf