Table of contents, spell check, additional fixes

author: Micah Elizabeth Scott <beth@torproject.org> 2024-01-11 09:13:52 -0800
committer: Micah Elizabeth Scott <beth@torproject.org> 2024-01-25 08:56:48 -0800
commit: 679138c337affa080ce1ebb47da43e8fdc69d3ce (patch)
tree: 5cbf03c9d22005f8308550164e3c469d5b3a7a20
parent: 25feebf9d0a0de3e36fcf3331e41e0f9e9d16afa (diff)
1 files changed, 147 insertions, 106 deletions
diff --git a/proposals/XXX-udp-app-support.md b/proposals/XXX-udp-app-support.md
index 1acc2ef..c338f17 100644
--- a/proposals/XXX-udp-app-support.md
+++ b/proposals/XXX-udp-app-support.md
@@ -1,12 +1,52 @@
-```
+# UDP Application Support in Tor
+
+```text
 Filename: XXX-udp-app-support.md
-Title: Support for UDP Applications over Tor
+Title: UDP Application Support in Tor
 Author: Micah Elizabeth Scott
 Created: December 2023
 Status: Draft
 ```
 
-# Introduction
+## Table of Contents
+
+- [Introduction](#introduction)
+  - [History](#history)
+  - [Scope](#scope)
+- [UDP Traffic Models](#udp-traffic-models)
+  - [User Datagram Protocol (RFC768)](#user-datagram-protocol-rfc768)
+  - [Socket Layer](#socket-layer)
+  - [Network Address Translation (NAT)](#network-address-translation-nat)
+    - [Mapping and Filtering Behaviors](#mapping-and-filtering-behaviors)
+  - [Common Protocols](#common-protocols)
+    - [QUIC](#quic)
+    - [WebRTC](#webrtc)
+  - [Common Applications](#common-applications)
+- [Overview of Possible Solutions](#overview-of-possible-solutions)
+  - [Datagram Routing](#datagram-routing)
+    - [Intentional UDP Leak](#intentional-udp-leak)
+    - [3rd Party Implementations](#3rd-party-implementations)
+    - [Future Work on Tor](#future-work-on-tor)
+  - [Tunneling](#tunneling)
+    - [TURN Encapsulated in a Tor Stream](#turn-encapsulated-in-a-tor-stream)
+    - [Tor Stream Tunnel to an Exit](#tor-stream-tunnel-to-an-exit)
+    - [Tor Stream Tunnel to a Rendezvous Point](#tor-stream-tunnel-to-a-rendezvous-point)
+- [Specific Designs Using Tor Streams](#specific-designs-using-tor-streams)
+  - [One Stream per Tunnel](#one-stream-per-tunnel)
+  - [One Stream per Socket](#one-stream-per-socket)
+  - [One Stream per Flow](#one-stream-per-flow)
+  - [One Stream per Mapping](#one-stream-per-mapping)
+  - [Hybrid Mapping and Flow Approach](#hybrid-mapping-and-flow-approach)
+- [Risks](#risks)
+  - [Behavior Regressions](#behavior-regressions)
+  - [Bandwidth Usage](#bandwidth-usage)
+  - [Malicious Traffic](#malicious-traffic)
+  - [Local Port Usage](#local-port-usage)
+  - [Traffic Injection](#traffic-injection)
+  - [Peer-to-Peer Deanonymization](#peer-to-peer-deanonymization)
+  - [Additional Risks to Anonymity](#additional-risks-to-anonymity)
+
+## Introduction
 
 This proposal takes a fresh look at the problem of implementing support in Tor for applications which require UDP/IP communication.
 
@@ -15,11 +55,11 @@ This work is being done with the sponsorship and goals of the [Tor VPN Client fo
 We start out by defining how this proposal compares to previous work, and the specific problem space we are addressing.
 This leads into an analysis that references appropriate standards and proposes some specific solutions with properties we can compare.
 
-## History
+### History
 
 There have already been multiple attempts over Tor's history to define some type of UDP extension.
 
-### 2006
+#### 2006
 
 [Proposal 100](https://spec.torproject.org/proposals/100-tor-spec-udp.html) by Marc Liberatore in 2006 suggested a way to "add support for tunneling unreliable datagrams through tor with as few modifications to the protocol as possible."
 This proposal suggested extending the existing TLS+TCP protocol with a new DTLS+UDP link mode.
@@ -35,7 +75,7 @@ This value we will see is much too small for most applications.
 It's possible these UDP protocol details would have been elaborated during design, but the proposal hit a snag elsewhere:
 there was no agreement on a way to avoid facilitating new attacks against anonymity.
 
-### 2018
+#### 2018
 
 In 2018, Nick Mathewson and Mike Perry wrote a
 [summary of the side-channel issues with unreliable transports for Tor](https://research.torproject.org/techreports/side-channel-analysis-2018-11-27.pdf).
@@ -43,7 +83,7 @@ In 2018, Nick Mathewson and Mike Perry wrote a
 The focus of this document is on the communication between Tor relays, but there is considerable overlap between the attack space explored here and the potential risks of any application-level UDP support.
 Attacks that are described here, such as drops and injections, may be applied by malicious exits or some types of third parties even in an implementation using only present-day reliable Tor transports.
 
-### 2020
+#### 2020
 
 [Proposal 339](https://spec.torproject.org/proposals/339-udp-over-tor.html) by Nick Mathewson in 2020 introduced a simpler UDP encapsulation design which had similar stream mapping properties as in proposal 100, but with the unreliable transport omitted. Datagrams are tunneled over a new type of Tor stream using a new type of Tor message.
 As a prerequisite, it depends on [proposal 319](https://spec.torproject.org/proposals/319-wide-everything.html) to support messages that may be larger than a cell, extending the MTU to support arbitrarily large UDP datagrams.
@@ -52,7 +92,7 @@ In proposal 339 the property of binding a stream both to a local port and to a r
 The single-peer *connected socket* behavior would be referred to as an *endpoint-dependent mapping* in RFC4787.
 This type works fine for client/server apps but precludes the use of NAT traversal for peer-to-peer transfer.
 
-## Scope
+### Scope
 
 This proposal aims to allow Tor applications and Tor-based VPNs to provide compatibility with applications that require UDP/IP communications.
 
@@ -80,11 +120,11 @@ We do not plan to support applications which accept arbitrary incoming datagrams
 RFC4787 calls this *endpoint-independent filtering*.
 It's unnecessary for running peer-to-peer apps, and it facilitates an extremely easy traffic injection attack.
 
-# UDP Traffic Models
+## UDP Traffic Models
 
 To better specify the role of a UDP extension for Tor, we will look at a few frameworks for describing UDP applications.
 
-## User Datagram Protocol (RFC768)
+### User Datagram Protocol (RFC768)
 
 The "User Interface" suggested by [RFC768](https://www.rfc-editor.org/rfc/rfc768) for the protocol is a rough sketch, suggesting that applications have some way to allocate a local port for receiving datagrams and to transmit datagrams with arbitrary headers.
 
@@ -95,7 +135,7 @@ On IPv4, this requires sending packets with the "Don't Fragment" flag set, and m
 
 Note that many applications have their own requirements for path MTU. For example, QUIC and common implementations of WebRTC require an MTU no smaller than 1200 bytes, but they can discover larger MTUs when available.
 
-## Socket Layer
+### Socket Layer
 
 In practice the straightforward "User Interface" from RFC768, capable of arbitrary local address, is only available to privileged users.
 
@@ -113,7 +153,7 @@ It's better to think of one socket as one allocated source port.
 A typical application may allocate only a single port (one socket) for talking to many peers.
 Every datagram sent or received on the socket may have a different peer address.
 
-## Network Address Translation (NAT)
+### Network Address Translation (NAT)
 
 Much of the real-world complexity in UDP applications comes from their strategies to detect and overcome the effects of NAT.
 
@@ -146,7 +186,7 @@ For example, in analyzing Tor as a type of carrier-grade NAT, we may consider th
 Tor by necessity must carefully limit how predictable these mappings can ever be, to preserve its anonymity properties.
 A literal application of RFC6888 would find trouble in REQ-2 and REQ-9, as well as the various per-subscriber limiting requirements.
 
-### Mapping and Filtering Behaviors
+#### Mapping and Filtering Behaviors
 
 RFC4787 defines a framework for understanding the behavior of NAT by analyzing both its "mapping" and "filtering" behavior separately.
 Mappings are the NAT's unit of state tracking.
@@ -163,7 +203,7 @@ We can gain some additional insight by looking at requirements that come from ou
 
   In the context of Tor, we can likely rule out this technique entirely.
   It makes traffic injection attacks possible from any source address, provided you can guess the UDP port number used at an exit.
-  It also makes possible clearnet hosting of UDP servers using an exit node's IP, which may have undesirable abuse properties.
+  It also makes possible clear-net hosting of UDP servers using an exit node's IP, which may have undesirable abuse properties.
 
   It precludes "Port overlapping" behavior as defined in RFC7857 section 3, which may be necessary in order to achieve sufficient utilization of local port numbers on exit nodes.
 
@@ -202,17 +242,17 @@ We can gain some additional insight by looking at requirements that come from ou
 RFC4787 recommends that filtering style be configurable.
 We would like to implement that advice, but we are also looking for opportunities to make design decisions that give us the best network and end-user behaviors.
 
-## Common Protocols
+### Common Protocols
 
 Applications that want to use UDP are increasingly making use of higher-level protocols to avoid creating bespoke solutions for problems like NAT traversal, connection establishment, and reliable delivery.
 
 We will analyze how these protocols affect Tor's UDP traffic requirements.
 
-### QUIC
+#### QUIC
 
 [RFC9000](https://www.rfc-editor.org/rfc/rfc9000.html) defines QUIC, a multiplexed secure point-to-point protocol which supports reliable and unreliable delivery. The most common use is as an optional HTTP replacement, especially among Google services.
 
-QUIC does not normally try to traverse NAT; as an HTTP replacement, the server is expected to have a routable address.
+QUIC does not normally try to traverse NAT; as an HTTP replacement, the server is expected to have an address reachable without any prior connection setup.
 
 QUIC provides its own flexible connection lifetimes which may outlive individual network links or NAT mappings.
 The intention is to provide transparent roaming as mobile users change networks.
@@ -223,7 +263,7 @@ In these cases we are not looking for any specific compatibility enhancement, si
 
 In cases where QUIC is used as a primary protocol without TCP fallback, we expect UDP support to be vital. These applications are currently niche but we expect they may rise in popularity.
 
-### WebRTC
+#### WebRTC
 
 WebRTC is a large collection of protocols tuned to work together for media transport and NAT traversal.
 It is increasingly common, both for browser-based telephony and for peer to peer data transfer.
@@ -234,13 +274,13 @@ Of particular importance to us, WebRTC uses the Interactive Connection Establish
 Any generalized solution to connection establishment, like ICE, will require sending connectivity test probes. These have an inherent hazard to anonymity: assuming no delays are inserted intentionally, the result is a broadcast of similar traffic across all available network interfaces. This could form a convenient correlation beacon for an attacker attempting to deanonymize users who use WebRTC over a Tor VPN.
 
 See
-[RFC8825](https://www.rfc-editor.org/rfc/rfc8825.html) _Overview: Real-Time Protocols for Browser-Based Applications_,
-[RFC8445](https://www.rfc-editor.org/rfc/rfc8445.html) _Interactive Connectivity Establishment (ICE): A Protocol for Network Address Translator (NAT) Traversal_,
-[RFC8838](https://www.rfc-editor.org/rfc/rfc8838.html) _Trickle ICE: Incremental Provisioning of Candidates for the Interactive Connectivity Establishment (ICE) Protocol_,
-[RFC5389](https://www.rfc-editor.org/rfc/rfc5389.html) _Session Traversal Utilities for NAT (STUN)_,
+[RFC8825](https://www.rfc-editor.org/rfc/rfc8825.html) *Overview: Real-Time Protocols for Browser-Based Applications*,
+[RFC8445](https://www.rfc-editor.org/rfc/rfc8445.html) *Interactive Connectivity Establishment (ICE): A Protocol for Network Address Translator (NAT) Traversal*,
+[RFC8838](https://www.rfc-editor.org/rfc/rfc8838.html) *Trickle ICE: Incremental Provisioning of Candidates for the Interactive Connectivity Establishment (ICE) Protocol*,
+[RFC5389](https://www.rfc-editor.org/rfc/rfc5389.html) *Session Traversal Utilities for NAT (STUN)*,
 and others.
 
-## Common Applications
+### Common Applications
 
 With applications exhibiting such a wide variety of behaviors, how do we know what to expect from a good implementation?
 How do we know which compatibility decisions will be most important to users?
@@ -267,7 +307,7 @@ In alphabetical order:
 | WiFi Calling           | Telecom        | IPsec tunnel                      | Out of scope       | Still out of scope                                 |
 | Zoom                   | Telecom        | client/server or P2P, UDP/TCP     | Works              | Slight latency improvement                         |
 
-# High level approaches
+## Overview of Possible Solutions
 
 Now that we've defined some categories of UDP traffic we are interested in handling, this section starts to examine different high-level implementation techniques we could adopt.
 
@@ -275,11 +315,11 @@ We can broadly split these into *datagram routing* and *tunneling*.
 
 Ideally we would be choosing a design that solves problems we have in the near-term while also providing a solid foundation for future enhancements to Tor, including changes which may add full support for unreliable delivery of datagrams. If we proceed down that path with insufficient understanding of the long-term goal, there's a risk that we will choose to adopt complexity in service of future goals while failing to serve them adequately when the time comes.
 
-## Datagram routing
+### Datagram Routing
 
 These approaches seek to use a network that can directly route datagrams from place to place. These approaches are the most obviously suitable for implementing UDP, but they also form the widest departure from classic Tor.
 
-### Intentional UDP leak
+#### Intentional UDP Leak
 
 The simplest approach would be to allow UDP traffic to bypass the anonymity layer. This is an unacceptable loss of anonymity in many cases, given that the client's real IP address is made visible to web application providers.
 
@@ -287,13 +327,13 @@ In other cases, this is an acceptable or even preferable approach. For example,
 
 In threat models where application vendors are more trustworthy than the least trustworthy Tor exits, it may be more appropriate to allow direct peer-to-peer connections than to trust Tor exits with unencrypted connection establishment traffic.
 
-### 3rd party implementations
+#### 3rd Party Implementations
 
 Another option would be to use an unrelated anonymizer system for datagram traffic. It's not clear that a suitable system already exists. I2P provides a technical solution for routing anonymized datagrams, but not a Tor-style infrastructure of exit node operators.
 
 This points to the key weakness of relying on a separate network for UDP: Tor has an especially well-developed community of volunteers running relays. Any UDP solution that is inconvenient for relay operators has little chance of adoption.
 
-### Future proofing
+#### Future Work on Tor
 
 This is likely where we would seek to expand Tor's design in order to add end-to-end support for unreliable delivery in the future.
 A specific design is out of the scope of this document.
@@ -303,7 +343,7 @@ We may find a need for an abstraction similar to a network routing table, allowi
 
 Even without bringing any new network configurations to Tor, achieving interoperable support for both exit nodes and onion services in a Tor UDP implementation requires some attention to how multiple UDP providers can coexist.
 
-## Tunneling
+### Tunneling
 
 The approaches in this section add a new construct which does not exist in UDP itself: a point to point tunnel between clients and some other location at which they establish the capability to send and receive UDP datagrams.
 
@@ -313,9 +353,9 @@ We would like this to come as an extension of Tor's existing process for distrib
 We expect exit policies for UDP to have limited practical amounts of diversity.
 VPN implementations will need to know ahead of time which tunnel circuits to build, or they will suffer a significant spike in latency for the first outgoing datagram to a new peer.
 Additionally, it's common for UDP port numbers to be randomly assigned.
-This makes highly specific exit policies even less useful and even higher overhead than they are with TCP.
+This would make highly specific Tor exit policies even less useful and even higher overhead than they are with TCP.
 
-### Using TURN encapsulated in a Tor stream
+#### TURN Encapsulated in a Tor Stream
 
 The scope of this tunnel is quite similar to the existing TURN relays, used commonly by WebRTC applications to implement fallbacks for clients who cannot find a more direct connection path.
 
@@ -325,28 +365,30 @@ TURN was designed to be a set of modular and extensible pieces, which may be too
 
 TURN has a popular embeddable C-language implementation, [coturn](https://github.com/coturn/coturn), which may be suitable for including alongside or inside C tor.
 
-### Using Tor streams to an exit
+#### Tor Stream Tunnel to an Exit
 
 Most of the discussion on UDP implementation in Tor so far has assumed this approach. Essentially it's the same strategy as TCP exits, but for UDP. When the OP initializes support for UDP, it pre-builds circuits to exits that support required UDP exit policies. These pre-built circuits can then be used as tunnels for UDP datagrams.
 
 Within this overall approach, there are various ways we could choose to assign Tor *streams* for the UDP traffic. This will be considered below.
 
-### Using Tor streams to a rendezvous point
+#### Tor Stream Tunnel to a Rendezvous Point
 
 To implement onion services which advertise UDP, we may consider using multiple simultaneous tunnels.
 In addition to exit nodes, clients could establish the ability to allocate virtual UDP ports on a rendezvous node of some kind.
 
-The most immediate challenge in UDP rendezvous would then become application support. Protocols like STUN and ICE deal directly with IPv4 and IPv6 formats in order to advertise a routable address to their peer. Supporting onion services in WebRTC would require protocol extensions and software modifications for STUN, TURN, ICE, and SDP at minimum.
+The most immediate challenge in UDP rendezvous would then become application support. Protocols like STUN and ICE deal directly with IPv4 and IPv6 formats in order to advertise a reachable address to their peer. Supporting onion services in WebRTC would require protocol extensions and software modifications for STUN, TURN, ICE, and SDP at minimum.
 
 UDP-like rendezvous extensions would have limited meaning unless they form part of a long-term strategy to forward datagrams in some new way for enhanced performance or compatibility. Otherwise, application authors might as well stick with Tor's existing TCP-like rendezvous functionality.
 
-# Specific designs using Tor streams
+## Specific Designs Using Tor Streams
 
 Let's look more closely at Tor *streams*, the multiplexing layer right below circuits.
 
 Streams have a 16-bit identifier, allocated arbitrarily by clients. Stream lifetimes are subject to some ambiguity still in the Tor spec. They are allocated by clients, but may be destroyed by either peer.
 
-## Stream per tunnel
+We have an opportunity to use this additional existing multiplexing layer to serve a useful function in the new protocol, or we can opt to interact with streams as little as possible in order to keep the protocol features more orthogonal.
+
+### One Stream per Tunnel
 
 The fewest new streams would be a single stream for all of UDP. This is what we get if we choose an off-the-shelf protocol like TURN as our UDP proxy.
 
@@ -358,34 +400,34 @@ This approach would require only a single new Tor message type:
 
 Note that RFC8656 requires authentication before data can be relayed, which is a good default best practice for the internet perhaps but is the opposite of what Tor is trying to do. We would either deviate from the specification to relax this auth requirement, or we would provide a way for clients to discover credentials: perhaps by fixing them ahead of time or by including them in the relay descriptor.
 
-## Stream per socket
+### One Stream per Socket
 
 One stream **per socket** was the approach suggested in [Proposal 339](https://spec.torproject.org/proposals/339-udp-over-tor.html) by Nick Mathewson in 2020.
 
 In proposal 339, there would be one new type of stream and three new message types: `CONNECT_UDP`, `CONNECTED_UDP`, and `DATAGRAM`.
 
-Each stream's lifetime would match the lifetime of a source port allocation.
-There would be a single peer `(remote address, remote port)` allowed per `local port` allocation.
+Each stream's lifetime would match the lifetime of a local port allocation.
+There would be a single peer `(remote address, remote port)` allowed per `local port`.
 This matches the usage of BSD-style sockets on which `connect()` has completed.
 It's incompatible with many of the applications analyzed.
 Multiple peers are typically needed for a variety of reasons, like connectivity checks or multi-region servers.
 
 This approach would be simplest to implement and specify, especially in the existing C tor implementation.
-
 It also unfortunately has very limited compatibility, and no clear path toward incremental upgrades if we wish to improve compatibility later.
 
-## Stream per flow
+A simple one-to-one mapping between streams and sockets would preclude the optimizations necessary to address [local port usage](#local-port-usage) risks below. Solutions under this design are possible, but only by decoupling logical protocol-level sockets from the ultimate implementation-level sockets and reintroducing much of the complexity that we attempted to avoid by choosing this design.
+
+### One Stream per Flow
 
 One stream **per flow** has also been suggested.
-In particular, Mike Perry brought up this approach during our conversations about UDP earlier and we spent some time analyzing it.
+Specifically, Mike Perry brought this up during our conversations about UDP recently and we spent some time analyzing it from an RFC4787 perspective.
+We will see below it has interesting properties but also some hidden complexity.
 
 This would assign a stream ID to the tuple consisting of at least `(local port, remote address, remote port)`. Additional flags may be included for features like transmit and receive filtering, IPv4/v6 choice, and IP *Don't Fragment*.
 
 This has advantages in keeping the datagram cells simple, with no additional IDs beyond the existing circuit ID.
 It may also have advantages in DoS-prevention and in privacy analysis.
 
-By decoupling protocol features from the lifetime of sockets on the exit side, we facilitate implementing the desirable "Port overlapping" NAT behavior mentioned above.
-
 Stream lifetimes, in this case, would not have any specific meaning other than the lifetime of the ID itself.
 The bundle of flows associated with one source port would still all be limited to the lifetime of a Tor circuit, by scoping the source port identifier to be contained within the lifetime of its circuit.
 
@@ -405,46 +447,45 @@ Even with the stricter **address and port-dependent filtering** we may still be
 
 This approach thus requires some attention to either correctly allocating stream IDs on both sides of the circuit, or choosing a filtering strategy and filter/mapping lifetime that does not ever leave stream IDs undefined when expecting incoming datagrams.
 
-## Stream per mapping
+### One Stream per Mapping
 
-One stream **per mapping** is an alternative which attemps to reduce the number of edge cases by merging the lifetimes of one stream and one **endpoint-independent mapping**.
+One stream **per mapping** is an alternative which attempts to reduce the number of edge cases by merging the lifetimes of one stream and one **endpoint-independent mapping**.
 
 A mapping would always be allocated from the OP side.
 It could explicitly specify a filtering style, if we wish to allow applications to request non-port-dependent filtering for compatibility.
 Each datagram within the stream would still need to be tagged with a peer address/port in some way.
 
-This approach would involve a single new type of stream, two new messages that pertain to these *flow* streams:
+This approach would involve a single new type of stream and two new messages that pertain to these *mapping* streams:
 
 - `NEW_UDP_MAPPING`
 
-  - Always client-to-exit
-  - Creates a new mapping, with a specified stream ID
+  - Always client-to-exit.
+  - Creates a new mapping, with a specified stream ID.
   - Succeeds instantly; no reply is expected, early data is ok.
-  - Externally-visible local port number is arbitrary, and must be determined through interaction with other endpoints
-  - Might contain an IP "don't fragment" flag
-  - Might contain a requested filtering mode
-  - Lifetime is until circuit teardown or `END` message
+  - Externally-visible local port number is arbitrary, and must be determined through interaction with other endpoints.
+  - Might contain an IP "don't fragment" flag.
+  - Might contain a requested filtering mode.
+  - Lifetime is until circuit teardown or `END` message.
 
 - `UDP_MAPPING_DATAGRAM`
 
-  - conveys one datagram on a stream defined by `NEW_UDP_MAPPING`.
-  - Includes peer address (IPv4/IPv6) as well as datagram content
+  - Conveys one datagram on a stream previously defined by `NEW_UDP_MAPPING`.
+  - Includes peer address (IPv4/IPv6) as well as datagram content.
 
-This puts us in a very similar design space to TURN, RFC8656.
-In TURN, "allocations" are made explicitly on request, and assigned a random relayed port.
-TURN also uses its "allocations" as an opportunity to support a *Don't Fragment* flag.
+This puts us in a very similar design space to TURN (RFC8656).
+In that protocol, "allocations" are made explicitly on request, and assigned a random relayed port.
+TURN also uses its allocations as an opportunity to support a *Don't Fragment* flag.
 
-The principal disadvantage of this approach is in space overhead, especially the proportional overhead on small datagrams which must still carry a full-size address.
+The principal disadvantage of this approach is in space overhead, especially the proportional overhead on small datagrams which must each carry a full-size address.
 
-## Hybrid stream appoach
-
-We can extend the approach above with an optimization that addresses the undesiarable space overhead from redundant address headers.
+### Hybrid Mapping and Flow Approach
 
+We can extend the approach above with an optimization that addresses the undesirable space overhead from redundant address headers.
 This uses two new types of stream, in order to have streams **per mapping** and **per flow** at the same time.
 
 The per-mapping stream remains the sole interface for managing the lifetime of a mapped UDP port. Mappings are created explicitly by the client. As an optimization, within the lifetime of a mapping there may exist some number of *flows*, each assigned their own ID.
 
-This tries to combine the strengths of both approaches, using the lifetime of one stream to define a mapping and to carry otherwise-unbundled traffic while also allowing additional streams to bundle datagrams that would otherwise have repetetive headers.
+This tries to combine the strengths of both approaches, using the lifetime of one stream to define a mapping and to carry otherwise-unbundled traffic while also allowing additional streams to bundle datagrams that would otherwise have repetitive headers.
 It avoids the space overhead of a purely **per mapping** approach and avoids the ID allocation and lifetime complexity introduced with **per flow**.
 
 This approach takes some inspiration from TURN, where commonly used peers will be defined as a "channel" with an especially short header.
@@ -454,26 +495,26 @@ The implementation here could be a strict superset of the **per mapping** implem
 
 - `NEW_UDP_MAPPING`
 
-  - Same as above
+  - Same as above.
 
 - `UDP_MAPPING_DATAGRAM`
 
-  - Same as above
+  - Same as above.
 
 - `NEW_UDP_FLOW`
 
-  - Lifetime is <= lifetime of UDP mapping.
-  - stream ID for parent mapping, and new flow within that mapping
-  - ended on mapping end or by explicit `END`
-  - Includes peer address (IPv4/IPv6)
+  - Allocates a stream ID as a *flow*, given the ID to be allocated and the ID of its parent *mapping* stream.
+  - Includes a peer address (IPv4/IPv6).
+  - The *flow* has a lifetime strictly bounded by the outer *mapping*. It is deleted by an explicit `END` or when the mapping is de-allocated for any reason.
 
 - `UDP_FLOW_DATAGRAM`
 
-  - Datagram contents without address, for flow streams.
+  - Datagram contents only, without address.
+  - Only appears on *flow* streams.
 
 We must consider the traffic marking opportunities we open when allowing an exit to represent one incoming datagram as either a *flow* or *mapping* datagram.
 
-It's possible this traffic injection potential is not worse than the baseline amount of injection potential than every UDP protocol presents. See more on risks below. For this hybrid stream approach specifically, there's a limited mitigation we can use to allow exits only a bounded amount of leaked information per UDP peer:
+It's possible this traffic injection potential is not worse than the baseline amount of injection potential that every UDP protocol presents. See more on [risks](#risks) below. For this hybrid stream approach specifically, there's a limited mitigation we can use to allow exits only a bounded amount of leaked information per UDP peer:
 
 We would like to state that exits may not choose to send a `UDP_MAPPING_DATAGRAM` when they could have sent a `UDP_FLOW_DATAGRAM`.
 Sometimes it is genuinely unclear though: an exit may have received this datagram in-between processing `NEW_UDP_MAPPING` and `NEW_UDP_FLOW`.
@@ -484,12 +525,12 @@ Mappings that do not request port-specific filtering may always get unexpected `
 We may wish for `NEW_UDP_MAPPING` to have an option requiring that only `UDP_FLOW_DATAGRAM` is to be used, never `UDP_MAPPING_DATAGRAM`.
 This would remove the potential for ambiguity, but costs in compatibility as it's no longer possible to implement non-port-specific filtering.
 
-# Risks
+## Risks
 
 Any proposed UDP support involves significant risks to user privacy and software maintainability.
 We will try to elaborate some of these risks here, so they can be compared against the expected benefits.
 
-## Behavior regressions
+### Behavior Regressions
 
 In some applications it is possible that Tor's implementation of a UDP compatibility layer will cause a regression in the ultimate level of performance or security.
 
@@ -504,49 +545,49 @@ Privacy and security regressions have more severe consequences and they can be m
 There are straightforward downgrades, like WebRTC apps that give up TURN-over-TLS for plaintext TURN-over-UDP.
 More subtly, the act of centralizing connection establishment traffic in Tor exit nodes can make users an easier target for other attacks.
 
-## Bandwidth usage
+### Bandwidth Usage
 
 We expect an increase in overall exit bandwidth requirements due to peer-to-peer file sharing applications.
 
 Current users attempting to use BitTorrent over Tor are hampered by the lack of UDP compatibility. Interoperability with common file-sharing peers would make Tor more appealing to users with a large and sustained appetite for anonymized bandwidth.
 
-## Malicious traffic
+### Malicious Traffic
 
 We expect UDP compatibility in Tor will give malicious actors additional opportunities to transmit unwanted traffic.
 
-- Amplification attacks against arbitrary targets
+#### Amplification attacks against arbitrary targets
 
-  These are possible only in limited circumstances where the protocol allows an arbitrary reply address, like SIP.
-  The peer is often at fault for having an overly permissive configuration.
-  Nevertheless, any of these *easy* amplification targets can be exploited from Tor with little consequence, creating a nuisance for the ultimate target and for exit operators.
+These are possible only in limited circumstances where the protocol allows an arbitrary reply address, like SIP.
+The peer is often at fault for having an overly permissive configuration.
+Nevertheless, any of these *easy* amplification targets can be exploited from Tor with little consequence, creating a nuisance for the ultimate target and for exit operators.
 
-- Amplification attacks against exit relay
+#### Amplification attacks against exit relay
 
-  An amplification peer which doesn't allow arbitrary destinations can still be used to attack the exit relay itself or other users of that relay.
-  This is essentially the same attack that is possible against any NAT the attacker is behind.
+An amplification peer which doesn't allow arbitrary destinations can still be used to attack the exit relay itself or other users of that relay.
+This is essentially the same attack that is possible against any NAT the attacker is behind.
 
-- Malicious fragmented traffic
+#### Malicious fragmented traffic
 
-  If we allow sending large UDP datagrams over IPv4 without the *Don't Fragment* flag set, we allow attackers to generate fragmented IP datagrams.
-  This is not itself a problem, but it has historically been a common source of inconsistencies in firewall behavior.
+If we allow sending large UDP datagrams over IPv4 without the *Don't Fragment* flag set, we allow attackers to generate fragmented IP datagrams.
+This is not itself a problem, but it has historically been a common source of inconsistencies in firewall behavior.
 
-- Excessive sends to an uninterested peer
+#### Excessive sends to an uninterested peer
 
-  Whereas TCP mandates a successful handshake, UDP will happily send unlimited amounts of traffic to a peer that has never responded.
-  To prevent denial of service attacks we have an opportunity and perhaps a responsibility to define our supported subset of UDP to include true bidirectional traffic but exclude continued sends to peers who do not respond.
+Whereas TCP mandates a successful handshake, UDP will happily send unlimited amounts of traffic to a peer that has never responded.
+To prevent denial of service attacks we have an opportunity and perhaps a responsibility to define our supported subset of UDP to include true bidirectional traffic but exclude continued sends to peers who do not respond.
 
-  See also [RFC7675](https://www.rfc-editor.org/rfc/rfc7675.html) and STUN's concept of "Send consent".
+See also [RFC7675](https://www.rfc-editor.org/rfc/rfc7675.html) and STUN's concept of "Send consent".
 
-- Excessive number of peers
+#### Excessive number of peers
 
-  We may want to place conservative limits on the maximum number of peers per mapping or per circuit, in order to make bulk scanning of UDP port space less convenient.
+We may want to place conservative limits on the maximum number of peers per mapping or per circuit, in order to make bulk scanning of UDP port space less convenient.
 
-  The limit does need to be on peers, not stream IDs as we presently do for TCP.
+The limit does need to be on peers, not stream IDs as we presently do for TCP.
 
-  In this proposal stream IDs are not necessarily meaningful except as a representational choice made by clients.
-  Strategies like the *per-mapping* stream assignment like we have for TCP.
+In this proposal stream IDs are not necessarily meaningful except as a representational choice made by clients.
+Strategies like the *per-mapping* stream assignment like we have for TCP.
 
-## Local port usage
+### Local Port Usage
 
 Exit routers will have a limited number of local UDP ports. In the most constrained scenario, an exit may have a single IP with 16384 or fewer ephemeral ports available. These ports could each be allocated by one client for an unbounded amount of exclusive use.
 
@@ -556,22 +597,14 @@ An attacker who allocates ports for only this minimum duration of 2 minutes woul
 
 The expanded definition of "Port overlapping" from [RFC7857 section 3](https://datatracker.ietf.org/doc/html/rfc7857#section-3) may form at least a partial mitigation:
 
-    This document clarifies that this port overlapping behavior may be extended to connections originating from different internal source IP addresses and ports as long as their destinations are different.
+> This document clarifies that this port overlapping behavior may be extended to connections originating from different internal source IP addresses and ports as long as their destinations are different.
 
 This gives us an opportunity for a vast reduction in the number of required ports and file descriptors. Practically, though, it does require us to make a guess about which potential peers one source port may communicate with.
 
 Our UDP implementation will need to choose a port assignment based on knowledge of only the first peer the app is sending to.
 Heuristically, we can make this work. The first peer in practice will be less unique than subsequent peers. Applications will contact centralized services before contacting peers. This ordering is necessary in the general case of ICE-like connection establishment.
 
-## Additional risks to anonymity
-
-TODO: ICE connectivity checks, as mentioned elsewhere.
-
-TODO: Are there plaintext identifiers in these telecom apps?
-
-TODO: Is there any chance we make the anonymity risk worse by providing UDP exits than it would be with an application-provided TCP relay server?
-
-### Traffic injection attacks
+#### Traffic Injection
 
 Some forms of UDP support would have obvious and severe traffic injection vulnerabilities. For example, the very permissive *endpoint-independent filtering* strategy would allow any host on the internet to send datagrams in bulk to all available local ports on a Tor exit in order to map that traffic's effect on any guards they control.
 
@@ -583,7 +616,7 @@ Of particular interest is the plaintext STUN, TURN, and ICE traffic used by most
 
 These attacks are not fully unique to the proposed UDP support, but UDP may increase exposure. In cases where the application already has a fallback using TURN-over-TLS, the proposal is a clear regression over previous behaviors. Even when we are comparing plaintext to plaintext, there may be a serious downside to centralizing all connection establishment traffic through a small number of exit IPs. Depending on your threat model, it could very well be more private to allow the UDP traffic to bypass Tor entirely.
 
-### Peer to peer deanonymization attacks
+#### Peer-to-Peer Deanonymization
 
 One of our goals was to achieve the compatibility and perhaps performance benefits of allowing "peer to peer" (in our case really exit-to-exit) UDP connections. We expect this to enable the subset of applications that lack a fallback path which loops traffic through an app-provided server.
 
@@ -592,3 +625,11 @@ This goal may be at odds with our privacy requirements. At minimum, a pool of ma
 TODO: Seems likely applications do often leak enough information through the plaintext portions of their UDP traffic in order to facilitate fingerprinting, I should look closer at this and confirm or deny.
 
 Even if the application traffic itself is fingerprint-resistant, this is easily combined with the above traffic injection attacks in order to mark specific communicating peers.
+
+### Additional Risks to Anonymity
+
+TODO: ICE connectivity checks, as mentioned elsewhere.
+
+TODO: Are there plaintext identifiers in these telecom apps?
+
+TODO: Is there any chance we make the anonymity risk worse by providing UDP exits than it would be with an application-provided TCP relay server?
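
As an illustration of the mitigations discussed under *Excessive sends to an uninterested peer* and *Excessive number of peers*, here is a minimal sketch of how an exit might gate outbound datagrams per mapping. Everything named here is hypothetical: `UdpMapping`, `PeerState`, `may_send`, `on_receive`, and both limit values are assumptions made for the example and are not defined by the proposal. It also uses a simple counter of unanswered sends, which is not the timed consent-freshness mechanism of RFC7675.

```python
from dataclasses import dataclass, field

# Hypothetical limits, chosen only for illustration.
# The proposal does not specify any concrete values.
MAX_PEERS_PER_MAPPING = 8
MAX_UNANSWERED_SENDS = 4


@dataclass
class PeerState:
    # Datagrams sent to this peer since it last sent anything back.
    unanswered_sends: int = 0


@dataclass
class UdpMapping:
    # Keyed by (host, port). The limit is counted on peers, not stream IDs.
    peers: dict = field(default_factory=dict)

    def may_send(self, peer):
        """Return True and record the send if one more datagram is allowed."""
        state = self.peers.get(peer)
        if state is None:
            # New peer: refuse once the mapping already tracks too many peers.
            if len(self.peers) >= MAX_PEERS_PER_MAPPING:
                return False
            state = self.peers[peer] = PeerState()
        # Known peer: refuse continued sends if it has never answered
        # (or has stopped answering) within the allowed budget.
        if state.unanswered_sends >= MAX_UNANSWERED_SENDS:
            return False
        state.unanswered_sends += 1
        return True

    def on_receive(self, peer):
        """Record an inbound datagram; any reply restores the send budget."""
        state = self.peers.get(peer)
        if state is None:
            # Unknown sender. Filtering policy is out of scope for this sketch.
            return False
        state.unanswered_sends = 0
        return True
```

With these assumed limits, a client could send four datagrams to a silent peer before `may_send` starts returning `False`, and a single reply from that peer resets the budget. Counting peers rather than streams keeps the cap meaningful even when a client chooses an unusual stream assignment.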