author Nick Mathewson <nickm@torproject.org> 2010-11-29 14:29:47 -0500
committer Nick Mathewson <nickm@torproject.org> 2010-12-07 11:00:10 -0500
commit 5d96921aae11ca4716b11bc4b478ac3bc5ab0e11
tree ff3e20365c9b8be690228ead4a37350a6964da89 /proposals/171-separate-streams.txt
parent 38c4046759b2094139eef9c65379279fc608ff93
Revise proposal 171 from start to finish
The big semantic change is to make the IsolateFoo options exist on a per-client-port basis.
Diffstat (limited to 'proposals/171-separate-streams.txt')
-rw-r--r-- proposals/171-separate-streams.txt 393
1 files changed, 322 insertions, 71 deletions
diff --git a/proposals/171-separate-streams.txt b/proposals/171-separate-streams.txt
index d71a5c2..958b8f7 100644
--- a/proposals/171-separate-streams.txt
+++ b/proposals/171-separate-streams.txt
@@ -1,99 +1,350 @@
-Filename: 171-separate-streams-by-port-or-host.txt
-Title: Separate streams across circuits by destination port or destination host
-Author: Robert Hogan, Jacob Appelbaum, Damon McCoy
+Filename: 171-separate-streams.txt
+Title: Separate streams across circuits by connection metadata
+Author: Robert Hogan, Jacob Appelbaum, Damon McCoy, Nick Mathewson
Created: 21-Oct-2008
-Modified: 30-Aug-2010
-Status: Draft
+Modified: 7-Dec-2010
+Status: Open
+
+Summary:
+
+ We propose a new set of options to isolate unrelated streams from one
+ another, putting them on separate circuits so that semantically
+ unrelated traffic is not inadvertently made linkable.
Motivation:
-Streams are currently attached to circuits without regard to their content,
-destination host, or destination port. We propose three options,
-IsolateBySOCKSUser, IsolateStreamsByPort and IsolateStreamsByHost to change the
-default behavior.
+ Currently, Tor attaches regular streams (that is, ones not carrying
+ rendezvous or directory traffic) to circuits based only on whether the
+ circuit's current exit node supports the destination, and whether the
+ circuit has been dirty (that is, in use) for too long.
+
+ This means that traffic that would otherwise be unrelated sometimes
+ gets sent over the same circuit, allowing the exit node to link such
+ streams with certainty, and allowing other parties to link such
+ streams probabilistically.
+
+ Older versions of onion routing tried to address this problem by
+ sending every stream over a separate circuit; performance issues made
+ this unfeasible. Moreover, in the presence of a localized adversary,
+ separating streams by circuits increases the odds that, for any given
+ linked set of streams, at least one will go over a compromised
+ circuit.
+
+ Therefore we ought to look for ways to allow streams that ought to be
+ linked to travel over a single circuit, while keeping streams that
+ ought not be linked isolated to separate circuits.
+
+Discussion:
+
+ Let's call a series of inherently-linked streams (like a set of
+ streams downloading objects from the same webpage, or a browsing
+ session where the user requests several related webpages) a "Session".
+
+ "Sessions" are necessarily a fuzzy concept. While users typically
+ consider some activities as wholly unrelated to each other ("My IM
+ session has nothing to do with my web browsing!"), the boundaries
+ between activities are sometimes hard to determine. If I'm reading
+ lolcats in one browser tab and reading about treatments for an
+ embarrassing disease in another, those are probably separate sessions.
+ If I search for a forum, log in, read it for a while, and post a few
+ messages on unrelated topics, that's probably all the same session.
+
+ So with the proviso that no automated process can identify sessions
+ 100% accurately, let's see which options we have available.
+
+ Generally, all the streams on a session come from a single
+ application. Unfortunately, isolating streams by application
+ automatically isn't feasible, given the lack of any nice
+ cross-platform way to tell which local process originated a given
+ connection. (Yes, lsof works. But a quick review of the lsof code
+ should be sufficient to scare you away from thinking there is a
+ portable option, much less a portable O(1) option.) So instead, we'll
+ have to use some other aspect of a Tor request as a proxy for the
+ application.
+
+ Generally, traffic from separate applications is not in the same
+ session.
+
+ With some applications (IRC, for example), each stream is a session.
-The contents of some streams will always have revealing plain text information;
-these streams should be treated differently than other streams that may or may
-not have unencrypted PII content. DNS, with the exception of DNSCurve, is
-always unencrypted. It is reasonable to assume that other protocols may exist
-that have a similar issue and may cause user concern. It is also the case that
-we must balance network load issues and stream privacy. The Tor network will not
-currently scale to one circuit per application connection nor should it anytime
-soon.
+ Some applications (most notably web browsing) can't be meaningfully
+ split into sessions without inspecting the traffic itself and
+ maintaining a lot of state.
-Circuits are currently created with a few constraints and are rotated within
-a reasonable time window. This allows a rogue exit node to correlate all
-streams on a given circuit.
+ How well do ports correspond to sessions? Early versions of this
+ proposal focused on using destination ports as a proxy for
+ application, since a connection to port 22 for SSH is probably not in
+ the same session as one to port 80. This works with some
+ applications better than others, though: while SSH users typically
+ know when they're on port 22 and when they aren't, a web browser can
+ be coaxed (through img URLs or any number of related tricks) into
+ connecting to any port at all. Moreover, when Tor gets a DNS lookup
+ request, it doesn't know in advance which port the resulting address
+ will be used to connect to.
+
+ So in summary, each kind of traffic wants to follow different rules,
+ and assuming the existence of a web browser and a hostile web page or
+ exit node, we can't tell one kind of traffic from another by simply
+ looking at the destination:port of the traffic.
+
+ Fortunately, we're not doomed.
Design:
-We propose two options for isolation of streams that lessen the observability
-and linkability of the Tor client's traffic.
+ When a stream arrives at Tor, we have the following data to examine:
+ 1) The destination address
+ 2) The destination port (unless this is a DNS lookup)
+ 3) The protocol used by the application to send the stream to Tor:
+ SOCKS4, SOCKS4A, SOCKS5, or whatever local "transparent proxy"
+ mechanism the kernel gives us.
+ 4) The port used by the application to send the stream to Tor --
+ that is, the SOCKSListenAddress or TransListenAddress that the
+ application used, if we have more than one.
+ 5) The SOCKS username and password, if any.
+ 6) The source address and port for the application.
-IsolateStreamsByPort will take a list of ports or optionally the keyword 'All'
-in place of a port list. The use of the keyword 'All' will ensure that all
-application connections attached to streams will be isolated to separate
-circuits by port number.
+ We propose to use 3, 4, and 5 as a backchannel for applications to
+ tell Tor about different sessions. Rather than running only one
+ SOCKSPort, a Tor user who would prefer better session isolation should
+ run multiple SOCKSPorts/TransPorts, and configure different
+ applications to use separate ports. Applications that support SOCKS
+ authentication can further be separated on a single port by their
+ choice of username/password. Streams sent to separate ports or using
+ different authentication information should never be sent over the
+ same circuit. We allow each port to have its own settings for
+ isolation based on destination port, destination address, or both.
-IsolateStreamsByHost will take a boolean value. When enabled, all application
-connections, regardless of port number will be isolated with separate circuits
-per host. If this option is enabled, we should ensure that the client has a
-reasonable number of pre-built circuits to ensure perceived performance. This
-should also intentionally limit the total number of circuits a client will
-build to ten circuits to prevent abuse and load on the network. This is a
-trade-off of performance for anonymity. Tor will issue a warning if a client
-encounters this limit.
+ Handling DNS can be a challenge. We can get hostnames by one of three
+ means:
-IsolateBySOCKSUser will take a boolean value. When enabled, all application
-connections, regardless of port number will be isolated with separate circuits
-per SOCKS username. This options ensures that any two streams that were created
-with different SOCKS usernames will be sent over different circuits. The empty
-username will be treated as its own username different from all other usernames.
+ A) A SOCKS4a request, or a SOCKS5 request with a hostname. This
+ case is handled trivially using the rules above.
+ B) A RESOLVE request on a SOCKSPort. This case is handled using the
+ rules above, except that port isolation can't work to isolate
+ RESOLVE requests into a proper session, since we don't know which
+ port will eventually be used when we connect to the returned
+ address.
+ C) A request on a DNSPort. We have no way of knowing which
+ address/port will be used to connect to the requested address.
-Security implications:
+ When B or C is required but problematic, we could favor the use of
+ AutomapHostsOnResolve.
-It is believed that the proposed changes will improve the anonymity for end
-user stream privacy. The end user will no longer link all streams at a single
-exit node during a given time window.
+Interface:
-There is a possible attack where a hostile web page possibly in collusion with
-an exit node contains image links for images at (say) "evil.example.com:53" and
-"evil.example.com:31337", and thereby (if they're lucky) correlate port-80
-circuits with port-53 and port-31337 circuits.
+ We propose that {SOCKS,Natd,Trans,DNS}ListenAddress be deprecated in
+ favor of an expanded {SOCKS,Natd,Trans,DNS}Port syntax:
-Specification:
+ ClientPortLine = OptionName SP (Addr ":")? Port (SP Options)?
+ OptionName = "SOCKSPort" / "NatdPort" / "TransPort" / "DNSPort"
+ Addr = An IPv4 address / an IPv6 address surrounded by brackets.
+ If omitted, we default to 127.0.0.1
+ Port = An integer from 1 through 65535 inclusive
+ Options = Option
+ Options = Options SP Option
+ Option = IsolateOption / GroupOption
+ GroupOption = "SessionGroup=" UINT
+ IsolateOption = OptNo ("IsolateDestPort" / "IsolateDestAddr" /
+ "IsolateSOCKSUser"/ "IsolateClientProtocol" /
+ "IsolateClientAddr") OptPlural
+ OptNo = "No" ?
+ OptPlural = "s" ?
+ SP = " "
+ UINT = An unsigned integer
+
+ All options are case-insensitive.
+
+ The "IsolateSOCKSUser" and "IsolateClientAddr" options are on by
+ default; "NoIsolateSOCKSUser" and "NoIsolateClientAddr" respectively
+ turn them off. The IsolateDestPort and IsolateDestAddr and
+ IsolateClientProtocol options are off by default. NoIsolateDestPort and
+ NoIsolateDestAddr and NoIsolateClientProtocol have no effect.
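A minimal Python sketch of how a line in the proposed syntax might be parsed, given only as an illustration of the grammar and defaults above. The names `ClientPort` and `parse_client_port_line` are hypothetical and are not part of Tor; the sketch accepts the optional "No" prefix and the optional plural "s", and treats option names case-insensitively as the text specifies. It does not validate the address itself.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Defaults as stated in the proposal: SOCKSUser and ClientAddr isolation are
# on, the other three are off. Keys are lowercased for case-insensitive lookup.
DEFAULTS = {
    "isolatedestport": False,
    "isolatedestaddr": False,
    "isolatesocksuser": True,
    "isolateclientprotocol": False,
    "isolateclientaddr": True,
}
PORT_OPTIONS = {"socksport", "natdport", "transport", "dnsport"}


@dataclass
class ClientPort:  # hypothetical container, not a Tor data structure
    kind: str
    addr: str
    port: int
    session_group: Optional[int] = None
    isolate: Dict[str, bool] = field(default_factory=lambda: dict(DEFAULTS))


def parse_client_port_line(line: str) -> ClientPort:
    tokens = line.split()
    if len(tokens) < 2 or tokens[0].lower() not in PORT_OPTIONS:
        raise ValueError(f"not a client port line: {line!r}")
    # Split on the last ":" so a bracketed IPv6 address stays intact.
    addr, sep, port_text = tokens[1].rpartition(":")
    port = int(port_text)
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    cp = ClientPort(tokens[0].lower(), addr if sep else "127.0.0.1", port)
    for token in tokens[2:]:
        name = token.lower()
        if name.startswith("sessiongroup="):
            group = int(name.split("=", 1)[1])
            if group < 0:
                raise ValueError("SessionGroup must be unsigned")
            cp.session_group = group
            continue
        enabled = not name.startswith("no")   # OptNo
        if not enabled:
            name = name[2:]
        if name not in cp.isolate and name.endswith("s"):
            name = name[:-1]                  # OptPlural
        if name not in cp.isolate:
            raise ValueError(f"unknown option: {token!r}")
        cp.isolate[name] = enabled
    return cp
```

For instance, `parse_client_port_line("SOCKSPort 9051 IsolateDestAddr IsolateDestPorts")` would yield a port bound to the default address with both destination-based options enabled and the two default-on options left untouched.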
+
+ Given a set of ClientPortLines, streams must NOT be placed on the same
+ circuit if ANY of the following hold:
+
+ * They were sent to two different client ports, unless the two
+ client ports both specify a "SessionGroup" option with the same
+ integer value.
+ * At least one was sent to a client port with IsolateDestPort
+ active, and they have different destination ports.
+ * At least one was sent to a client port with IsolateDestAddr
+ active, and they have different destination addresses.
+ * At least one was sent to a client port with IsolateClientProtocol
+ active, and they use different protocols (where SOCKS4, SOCKS4a,
+ SOCKS5, TransPort, NatdPort, and DNS are the protocols in question)
+ * At least one was sent to a client port with IsolateSOCKSUser
+ active, and they have different SOCKS username/password
+ configurations. (For the purposes of this option, the
+ username/password pair of ""/"" is distinct from SOCKS without
+ authentication, and both are distinct from any non-SOCKS client's
+ non-authentication.)
+ * At least one was sent to a client port with IsolateClientAddr
+ active, and they came from different client addresses. (For the
+ purpose of this option, any local interface counts as the same
+ address. So if the host is configured with addresses 10.0.0.1,
+ 192.0.32.10, and 127.0.0.1, then traffic from those addresses can
+ leave on the same circuit, but traffic from 10.0.0.2 (for
+ example) could not share a circuit with any of them.)
+
+ These rules apply regardless of whether the streams are active at the
+ same time. In other words, if the rules say that streams A and B must
+ not be on the same circuit, and stream A is attached to circuit X,
+ then stream B must never be attached to circuit X, even if stream A is
+ closed first.
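The rules above can be sketched as a pairwise predicate plus a circuit that remembers every stream it has ever carried. This is only an illustration under stated assumptions: `Stream`, `Circuit`, and `must_isolate` are hypothetical names, not Tor's implementation. The sketch assumes the caller has already normalized every local interface address to one value, and it models a DNS lookup as having a destination port of `None`.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class Stream:  # hypothetical record of the metadata the rules inspect
    listener: str                    # which client port received it
    session_group: Optional[int]     # SessionGroup of that port, if any
    flags: FrozenSet[str]            # Isolate* options active on that port
    dest_addr: str
    dest_port: Optional[int]         # None for a DNS lookup (assumption)
    protocol: str                    # "SOCKS4", "SOCKS5", "TransPort", ...
    socks_auth: Optional[Tuple[str, str]]  # None means no authentication
    client_addr: str                 # local interfaces pre-normalized

    def auth_key(self):
        # ("", "") differs from SOCKS without auth, and both differ from
        # a non-SOCKS client's lack of authentication.
        return (self.protocol.upper().startswith("SOCKS"), self.socks_auth)


def must_isolate(a: Stream, b: Stream) -> bool:
    same_group = a.session_group is not None and a.session_group == b.session_group
    if a.listener != b.listener and not same_group:
        return True
    active = a.flags | b.flags       # "at least one" port has the option active
    checks = [
        ("IsolateDestPort", a.dest_port != b.dest_port),
        ("IsolateDestAddr", a.dest_addr != b.dest_addr),
        ("IsolateClientProtocol", a.protocol != b.protocol),
        ("IsolateSOCKSUser", a.auth_key() != b.auth_key()),
        ("IsolateClientAddr", a.client_addr != b.client_addr),
    ]
    return any(flag in active and differs for flag, differs in checks)


class Circuit:
    def __init__(self) -> None:
        # Streams are never removed on close, so the rules keep applying
        # to streams that are no longer active.
        self.history: List[Stream] = []

    def can_attach(self, s: Stream) -> bool:
        return not any(must_isolate(s, old) for old in self.history)

    def attach(self, s: Stream) -> None:
        if not self.can_attach(s):
            raise ValueError("stream must not share this circuit")
        self.history.append(s)
```

Keeping the full history on the circuit is what enforces the last paragraph above: closing stream A does not make circuit X eligible for stream B.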
+
+Alternative Interface:
+
+ We're cramming a lot onto one line in the design above. Perhaps
+ instead it would be a better idea to have grouped lines of the form:
+
+ StreamGroup 1
+ SOCKSPort 9050
+ TransPort 9051
+ IsolateDestPort 1
+ IsolateClientProtocol 0
+ EndStreamGroup
+
+ StreamGroup 2
+ SOCKSPort 9052
+ DNSPort 9053
+ IsolateDestAddr 1
+ EndStreamGroup
-The Tor client circuit selection process is not entirely specified. Any client
-circuit specification must take these changes into account.
+ This would be equivalent to:
+ SOCKSPort 9050 SessionGroup=1 IsolateDestPort NoIsolateClientProtocol
+ TransPort 9051 SessionGroup=1 IsolateDestPort NoIsolateClientProtocol
+ SOCKSPort 9052 SessionGroup=2 IsolateDestAddr
+ DNSPort 9053 SessionGroup=2 IsolateDestAddr
-Compatibility:
+ But it would let us extend the range of allowed options later without
+ having client port lines grow without bound. For example, we might
+ give different circuit building parameters to different session
+ groups.
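To make the claimed equivalence concrete, here is a small Python sketch that expands the grouped form into per-port lines. It is an illustration only: `expand_stream_groups` is a hypothetical helper, the grouped syntax is the tentative one shown above, and the sketch assumes each Isolate option inside a group takes a value of 0 or 1.

```python
PORT_OPTIONS = {"SOCKSPort", "NatdPort", "TransPort", "DNSPort"}


def expand_stream_groups(text: str) -> list:
    """Turn StreamGroup ... EndStreamGroup blocks into client port lines."""
    lines, group, ports, opts = [], None, [], []
    for raw in text.splitlines():
        words = raw.split()
        if not words:
            continue
        key = words[0]
        if key == "StreamGroup":
            group, ports, opts = int(words[1]), [], []
        elif group is None:
            raise ValueError(f"line outside a StreamGroup: {raw!r}")
        elif key == "EndStreamGroup":
            # Every port in the group gets the group's id and options.
            for port in ports:
                lines.append(" ".join([port, f"SessionGroup={group}", *opts]))
            group = None
        elif key in PORT_OPTIONS:
            ports.append(f"{key} {words[1]}")
        elif key.startswith("Isolate"):
            # A value of 0 maps onto the "No" prefix of the one-line syntax.
            opts.append(key if words[1] == "1" else "No" + key)
        else:
            raise ValueError(f"unexpected line: {raw!r}")
    return lines
```

Fed the two example groups, this produces the same four lines listed above, in the same order.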
-The proposed changes should not create any compatibility issues. New Tor clients
-will be able to take advantage of this without any modification to the network.
+Example of use:
-Implementation:
+ Suppose that we want to use a web browser, an IRC client, and an SSH
+ client all at the same time. Let's assume that we want web traffic to
+ be isolated from all other traffic, even if the browser makes
+ connections to ports usually used for IRC or SSH. Let's also assume
+ that IRC and SSH are both used for relatively long-lived connections,
+ and we want to keep all IRC/SSH sessions separate from one another.
-It is further proposed that IsolateStreamsByPort will be enabled by default
-for port 22, 53, and port 80.
+ In this case, we could say:
-It is further proposed that IsolateStreamsByHost will be disabled by default.
+ SOCKSPort 9050
+ SOCKSPort 9051 IsolateDestAddr IsolateDestPort
+
+ We would then configure our browser to use 9050 and our IRC/SSH
+ clients to use 9051.
+
+Advanced example of use, #1:
+
+ Suppose that we have a bunch of applications, and we launch them all
+ using torsocks, and we want to keep each application isolated from
+ one another. We just create a shell script, "torlaunch":
+ #!/bin/bash
+ export TORSOCKS_USERNAME="$1"
+ exec torsocks "$@"
+ And we configure our SOCKSPort with IsolateSOCKSUser.
+
+ Or if we're on Linux and we want to isolate by application invocation,
+ we would change the TORSOCKS_USERNAME line to:
+
+ export TORSOCKS_USERNAME="`cat /proc/sys/kernel/random/uuid`"
+
+Advanced example of use, #2:
+
+ Now suppose that we want to achieve the benefits of the first example
+ of use, but we are stuck using transparent proxies. Let's suppose
+ this is Linux.
+
+ TransPort 9090
+ TransPort 9091 IsolateDestAddr IsolateDestPort
+ DNSPort 5353
+ AutomapHostsOnResolve 1
+
+ Here we use the iptables --cmd-owner filter to distinguish which
+ command is originating the packets, directing traffic from our irc
+ client and our SSH client to port 9091, and directing other traffic to
+ 9090. Using AutomapHostsOnResolve will confuse ssh in its default
+ configuration; we'll need to find a way around that.
+
+Security Risks:
+
+ Disabling IsolateClientAddr is a pretty bad idea.
+
+ Setting up a set of applications to use this system effectively is a
+ big problem. It's likely that lots of people who try to do this will
+ mess it up. We should try to see which setups are sensible, and see
+ if we can provide good feedback to explain which streams are isolated
+ how.
+
+Performance Risks:
+
+ This proposal will result in clients building many more circuits than
+ they do today. To avoid accidentally hammering the network, we should
+ have in-process limits on the maximum circuit creation rate and the
+ total maximum client circuits.
+
+Specification:
+
+ The Tor client circuit selection process is not entirely specified.
+ Any client circuit specification must take these changes into account.
Implementation notes:
-The implementation of this option may want to consider cases where the same
-exit node is shared by two or more circuits and IsolateStreamsByPort is in
-force. Since the purpose of the option is to reduce the opportunity of Exit
-Nodes to attack traffic from the same source on multiple ports, the
-implementation may need to ensure that circuits reserved for the exclusive use
-of given ports do not share the same exit node.
+ The more obvious ways to implement the "find a good circuit to attach
+ to" part of this proposal involve doing an O(n_circuits) operation
+ every time we have a stream to attach. We already do such an
+ operation, so it's not as if we need to hunt for fancy ways to make it
+ O(1). What will be harder is implementing the "launch circuits as
+ needed" part of the proposal. Still, it should come down to "a simple
+ matter of programming."
+
+ The SOCKS4 spec has the client provide authentication info when it
+ connects; accepting such info is no problem. But the SOCKS5 spec has
+ the client send a list of known auth methods, then has the server send
+ back the authentication method it chooses. We'll need to update the
+ SOCKS5 implementation so it can accept user/password authentication if
+ it's offered.
+
+ If we use the second syntax for describing these options, we'll want
+ to add a new "section-based" entry type for the configuration parser.
+ Not a huge deal; we already have kludged up something similar for
+ hidden service configurations.
+
+ Opening circuits for predicted ports has the potential to get a little
+ more complicated; we can probably get away with the existing
+ algorithm at first, though, and use it to see where its weak points
+ are before looking for better ones.
+
+ Perhaps we can get our next-gen HTTP proxy to communicate browser tab
+ or session info to Tor via authentication, or have Torbutton do it
+ directly. More design is needed here, though.
+
+Alternative designs:
+
+ The implementation of this option may want to consider cases where the
+ same exit node is shared by two or more circuits and
+ IsolateStreamsByPort is in force. Since one possible use of the option
+ is to reduce the opportunity of Exit Nodes to attack traffic from the
+ same source on multiple ports, the implementation may need to ensure
+ that circuits reserved for the exclusive use of given ports do not
+ share the same exit node. On the other hand, if our goal is only that
+ streams should be unlinkable, deliberately shunting them to different
+ exit nodes is unnecessary and slightly counterproductive.
-Circuits should not be shared by unique clients. Tor should check to ensure
-that peer IP addresses are identical when they connect to the SOCKS listener or
-the TransPort listener before sharing a circuit. If the addresses are not
-identical, Tor should ensure that the circuits are not shared.
+ Earlier versions of this design included a mechanism to isolate
+ _particular_ destination ports and addresses, so that traffic sent to,
+ say, port 22 would never share a circuit with any traffic *not* sent to
+ port 22. You can achieve this here by having all applications that
+ send traffic to one of these ports use a separate SOCKSPort, and
+ then setting IsolateDestPorts on that SOCKSPort.
-Performance and scalability notes:
+Lingering questions:
-It is further proposed that IsolateStreamsByPort will be enabled by default for
-all ports after a reasonable assessment is performed. Specifically, we should
-determine the impact this option has on Tor clients and the Tor network.
+ I suspect there are issues remaining with DNS and TransPort users, and
+ that my "just use AutomapHostsOnResolve" suggestion may be
+ insufficient.