Thoughts on stream allocation pitfalls, and a suggestion

author: Micah Elizabeth Scott <beth@torproject.org> 2023-12-07 21:20:02 -0800
committer: Micah Elizabeth Scott <beth@torproject.org> 2024-01-25 08:56:48 -0800
commit: c1c238f4fa5bf318517091eab8695ba09dc3368c (patch)
tree: 74c118281fcde5cc73f4b779661d8bf73a1899c3
parent: 094e0071c7fa7d23ac8a32cf3615b7b4b416d079 (diff)
download: torspec-c1c238f4fa5bf318517091eab8695ba09dc3368c.tar.gz torspec-c1c238f4fa5bf318517091eab8695ba09dc3368c.zip
1 files changed, 68 insertions, 3 deletions
diff --git a/proposals/339-udp-over-tor.md b/proposals/339-udp-over-tor.md
index 3395b0b..2db0341 100644
--- a/proposals/339-udp-over-tor.md
+++ b/proposals/339-udp-over-tor.md
@@ -110,11 +110,10 @@ For example, in analyzing Tor as a type of carrier-grade NAT, we may consider th
 Tor by necessity must carefully limit how predictable these mappings can ever be, to preserve its anonymity properties.
 A literal application of RFC6888 would find trouble in REQ-2 and REQ-9, as well as the various per-subscriber limiting requirements.
 
-Some edge cases must be carefully considered.
 RFC4787 defines a framework for understanding the behavior of NAT by analyzing both its "mapping" and "filtering" behavior separately.
 Mappings are the NAT's unit of state tracking.
 Filters are layered on top of mappings, potentially rejecting incoming datagrams that don't match an already-expected address.
-Both RFC4787 and the demands of peer to peer applications make a good case for always using an "Endpoint-Independent Mapping".
+Both RFC4787 and the demands of peer to peer applications make a good case for always using an **Endpoint-Independent Mapping**.
 
 Choice of filtering strategy is left open by the BEHAVE-WG recommendations.
 RFC4787 defines three types with different properties, and does not make one single recommendation for all circumstances.
@@ -239,8 +238,74 @@ TODO: Various kinds of traffic we want to avoid
 - Excessive number of peers (makes port scanning too much easier)
 
 
-# Tor protocol specification
+# Tor protocol design
+
+Using the specification- and application-based goals above, here we will briefly discuss the design constraints as they relate to Tor's protocol.
+
+## Stream usage
+
+An early design juncture in this project is the particular choice of scope for one *stream* in the existing Tor protocol.
+
+- One stream **per socket** was the approach suggested in an earlier version of this proposal.
+
+  Each stream would match the lifetime of a source port allocation.
+  There would be a single peer address/port allowed per allocation.
+  This matches the usage of BSD-style sockets on which `connect()` has completed.
+  It's incompatible with many of the applications analyzed.
+  Multiple peers are typically needed for a variety of reasons, like connectivity checks or multi-region servers.
+
+  This approach would be simplest to implement and specify.
+  It also unfortunately has very limited compatibility, and no clear path toward incremental upgrades if we wish to improve compatibility later.
+  
+- One stream **per flow** has also been suggested. This would assign a stream ID to the combination of an allocated source port and a remote peer address/port. The flows may contain additional flags, like transmit and receive filtering, IPv4/v6, and *Don't Fragment*.
+
+  This has advantages in keeping the datagram cells simple, with no additional IDs beyond the existing circuit ID.
+  It may also have advantages in DoS-prevention and in privacy analysis.
+  
+  By decoupling protocol features from the lifetime of sockets on the exit side, we facilitate implementing the desirable "Port overlapping" NAT behavior as mentioned above.
+
+  Stream lifetimes, in this case, would not have any specific meaning other than the lifetime of the ID itself.
+  The bundle of flows associated with one source port would still all be limited to the lifetime of a Tor circuit, by scoping the source port identifier to be contained within the lifetime of its circuit.
+
+  It would be necessary to allocate a new stream ID any time a new set of parameters (source port, remote address, remote port) is seen.
+  This would most commonly happen as a result of a first datagram sent to a new peer, coinciding with the establishment of a NAT-style mapping and the possible allocation of a socket on the exit.
+  A less common case needs to be considered too: what if the parameter tuple first occurs on the exit side?
+  We don't yet have a way to allocate stream IDs from either end of a circuit.
+  This would need to be considered, and the simplest solution might just be to partition the stream ID space into a half that can be allocated by each side.
+  This leaves a quite unfair split in the common case where streams are almost always allocated by the OP side.
+
+  When is this exit-originated stream ID allocation potentially needed?
+  It is clearly needed when using **address-dependent filtering**.
+  An incoming datagram from a previously-unseen peer port is expected to be deliverable, and the exit would need to allocate an ID for it.
 
+  Even with the stricter **address and port-dependent filtering** we may still be exposed to exit-originated stream IDs if there are mismatches in the lifetime of the filter and the stream.
+
+  This approach thus requires some attention to either correctly allocating stream IDs on both sides of the circuit, or choosing a filtering strategy and filter/mapping lifetime that does not ever leave stream IDs undefined when expecting incoming datagrams.
+
+- One stream **per mapping** is an alternative which attempts to reduce the number of edge cases by merging the lifetimes of one stream and one **endpoint-independent mapping**.
+
+  A mapping would always be allocated from the OP side.
+  It could explicitly specify a filtering style, if we wish to allow applications to request non-port-dependent filtering for compatibility.
+  Each datagram within the stream would still need to be tagged with a peer address/port in some way.
+
+  This puts us in a very similar design space to TURN, RFC8656.
+  In TURN, "allocations" are made explicitly on request, and assigned a random relayed port.
+  TURN also uses its "allocations" as an opportunity to support a *Don't Fragment* flag.
+  
+  Within the scope of one allocation, datagrams each still may have an arbitrary peer address. This takes up additional space in every datagram.
+
+- Hybrid approach, one stream **per mapping** and **per flow** at the same time
+
+  We could try to combine the strengths of both approaches, using the lifetime of one stream to define a mapping and to carry otherwise-unbundled traffic, while also allowing additional streams to bundle datagrams that would otherwise have repetitive headers.
+
+  This avoids the space overhead of a purely **per mapping** approach and avoids the ID allocation and lifetime complexity introduced with **per flow**.
+  
+  This takes more inspiration from TURN, where commonly used peers will be defined as a "channel" with an especially short header.
+  In TURN, a "channel" is only ever allocated from the originating side of the connection.
+  Incoming datagrams with no channel can always be represented in the long form, so TURN never has to allocate channels unexpectedly.
+
+
+# Tor protocol specification
 
 ## Overview
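
Note on the **per flow** option in the patch above: the suggestion to "partition the stream ID space into a half that can be allocated by each side" can be sketched roughly as follows. This is only an illustration, not part of the proposal. It assumes the existing 16-bit stream ID with zero reserved, and it assumes the split is made on the high bit; the names `EXIT_FLAG`, `StreamIdAllocator`, and `allocated_by_exit` are hypothetical.

```python
from dataclasses import dataclass, field

STREAM_ID_BITS = 16
# Hypothetical split point: IDs with the high bit set belong to the exit side.
EXIT_FLAG = 1 << (STREAM_ID_BITS - 1)  # 0x8000


@dataclass
class StreamIdAllocator:
    """Allocates stream IDs from one half of the ID space only."""

    is_exit: bool
    in_use: set = field(default_factory=set)

    def _candidates(self) -> range:
        # Stream ID 0 is never handed out; the OP half starts at 1.
        if self.is_exit:
            return range(EXIT_FLAG, 1 << STREAM_ID_BITS)
        return range(1, EXIT_FLAG)

    def allocate(self) -> int:
        """Return the lowest free ID in this side's half."""
        for candidate in self._candidates():
            if candidate not in self.in_use:
                self.in_use.add(candidate)
                return candidate
        raise RuntimeError("this side's half of the stream ID space is exhausted")

    def release(self, stream_id: int) -> None:
        self.in_use.discard(stream_id)


def allocated_by_exit(stream_id: int) -> bool:
    """Either side can tell who allocated an ID without any negotiation."""
    return bool(stream_id & EXIT_FLAG)


# Each end of the circuit keeps its own allocator, so IDs never collide.
op_side = StreamIdAllocator(is_exit=False)
exit_side = StreamIdAllocator(is_exit=True)

# A flow is keyed by (source port, remote address, remote port).
flows = {
    (40000, "192.0.2.1", 3478): op_side.allocate(),       # first datagram sent by the OP
    (40000, "198.51.100.7", 5000): exit_side.allocate(),  # tuple first seen on the exit
}
```

The sketch also makes the unfairness mentioned in the patch visible: the exit's half stays reserved even though, in the common case, nearly every flow is created from the OP side.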