From 3431377d86e0891a41ce0882069c16f3f87d2142 Mon Sep 17 00:00:00 2001 From: Paul Syverson Date: Fri, 1 Jun 2007 19:34:58 +0000 Subject: First stab at magazine article. Must be at most half this long. svn:r10442 --- doc/design-paper/blocking.pdf | Bin 199671 -> 199683 bytes doc/design-paper/challenges2.tex | 1593 ++++++++++++++++++++++++++++++++++++++ 2 files changed, 1593 insertions(+) create mode 100644 doc/design-paper/challenges2.tex diff --git a/doc/design-paper/blocking.pdf b/doc/design-paper/blocking.pdf index e5d38e4d75..8ab25ac440 100644 Binary files a/doc/design-paper/blocking.pdf and b/doc/design-paper/blocking.pdf differ diff --git a/doc/design-paper/challenges2.tex b/doc/design-paper/challenges2.tex new file mode 100644 index 0000000000..03c4ec50cc --- /dev/null +++ b/doc/design-paper/challenges2.tex @@ -0,0 +1,1593 @@ +\documentclass{llncs} + +\usepackage{url} +\usepackage{amsmath} +\usepackage{epsfig} + +\setlength{\textwidth}{5.9in} +\setlength{\textheight}{8.4in} +\setlength{\topmargin}{.5cm} +\setlength{\oddsidemargin}{1cm} +\setlength{\evensidemargin}{1cm} + +\newenvironment{tightlist}{\begin{list}{$\bullet$}{ + \setlength{\itemsep}{0mm} + \setlength{\parsep}{0mm} + % \setlength{\labelsep}{0mm} + % \setlength{\labelwidth}{0mm} + % \setlength{\topsep}{0mm} + }}{\end{list}} + + +\newcommand{\workingnote}[1]{} % The version that hides the note. +%\newcommand{\workingnote}[1]{(**#1)} % The version that makes the note visible. + + +\begin{document} + +\title{Design challenges and social factors in deploying low-latency anonymity} + +\author{Roger Dingledine\inst{1} \and +Nick Mathewson\inst{1} \and +Paul Syverson\inst{2}} +\institute{The Free Haven Project \email{<\{arma,nickm\}@freehaven.net>} \and +Naval Research Laboratory \email{}} + +\maketitle +\pagestyle{plain} + +\begin{abstract} + There are many unexpected or unexpectedly difficult obstacles to + deploying anonymous communications. We describe the design + philosophy of Tor (the third-generation onion routing network), and, + drawing on our experiences deploying Tor, we describe social + challenges and related technical issues that must be faced in + building, deploying, and sustaining a scalable, distributed, + low-latency anonymity network. +\end{abstract} + +\section{Introduction} +% Your network is not practical unless it is sustainable and distributed. +Anonymous communication is full of surprises. This article describes +Tor, a low-latency general-purpose anonymous communication system, and +discusses some unexpected challenges arising from our experiences +deploying Tor. We will discuss +some of the difficulties we have experienced and how we have met them (or how +we plan to meet them, if we know). +% We also discuss some less +% troublesome open problems that we must nevertheless eventually address. +%We will describe both those future challenges that we intend to explore and +%those that we have decided not to explore and why. + +Tor is an overlay network for anonymizing TCP streams over the +Internet~\cite{tor-design}. It addresses limitations in earlier Onion +Routing designs~\cite{or-ih96,or-jsac98,or-discex00,or-pet00} by adding +perfect forward secrecy, congestion control, directory servers, data +integrity, +%configurable exit policies, Huh? That was part of the gen. 1 design -PFS +and a revised design for location-hidden services using +rendezvous points. 
Tor works on the real-world Internet, requires no special +privileges or kernel modifications, requires little synchronization or +coordination between nodes, and provides a reasonable trade-off between +anonymity, usability, and efficiency. + +We deployed the public Tor network in October 2003; since then it has +grown to over nine hundred volunteer-operated nodes worldwide +and over 100 megabytes average traffic per second from hundreds of +thousands of concurrent users. +Tor's research strategy has focused on deploying +a network to as many users as possible; thus, we have resisted designs that +would compromise deployability by imposing high resource demands on node +operators, and designs that would compromise usability by imposing +unacceptable restrictions on which applications we support. Although this +strategy has drawbacks (including a weakened threat model, as +discussed below), it has made it possible for Tor to serve many +hundreds of thousands of users and attract funding from diverse +sources whose goals range from security on a national scale down to +individual liberties. + +In~\cite{tor-design} we gave an overall view of Tor's design and +goals. Here we review that design at a higher level and describe +some policy and social issues that we face as +we continue deployment. Though we will discuss technical responses to +these, we do not in this article discuss purely technical challenges +facing Tor (e.g., transport protocol, resource scaling issues, moving +to non-clique topologies, performance, etc.), nor do we even cover +all of the social issues: we simply touch on some of the most salient of these. +Also, rather than providing complete solutions to every problem, we +instead lay out the challenges and constraints that we have observed while +deploying Tor. In doing so, we aim to provide a research agenda +of general interest to projects attempting to build +and deploy practical, usable anonymity networks in the wild. + +%While the Tor design paper~\cite{tor-design} gives an overall view its +%design and goals, +%this paper describes the policy and technical issues that Tor faces as +%we continue deployment. Rather than trying to provide complete solutions +%to every problem here, we lay out the assumptions and constraints +%that we have observed through deploying Tor in the wild. In doing so, we +%aim to create a research agenda for others to +%help in addressing these issues. +% Section~\ref{sec:what-is-tor} gives an +%overview of the Tor +%design and ours goals. Sections~\ref{sec:crossroads-policy} +%and~\ref{sec:crossroads-design} go on to describe the practical challenges, +%both policy and technical respectively, +%that stand in the way of moving +%from a practical useful network to a practical useful anonymous network. + +%\section{What Is Tor} +\section{Background} +Here we give a basic overview of the Tor design and its properties, and +compare Tor to other low-latency anonymity designs. + +\subsection{Tor, threat models, and distributed trust} +\label{sec:what-is-tor} + +%Here we give a basic overview of the Tor design and its properties. For +%details on the design, assumptions, and security arguments, we refer +%the reader to the Tor design paper~\cite{tor-design}. + +Tor provides \emph{forward privacy}, so that users can connect to +Internet sites without revealing their logical or physical locations +to those sites or to observers. 
It also provides \emph{location-hidden
+services}, so that servers can support authorized users without
+giving an effective vector for physical or online attackers.
+Tor provides these protections even when a portion of its
+infrastructure is compromised.
+
+To connect to a remote server via Tor, the client software learns a signed
+list of Tor nodes from one of several central \emph{directory servers}, and
+incrementally creates a private pathway or \emph{circuit} of encrypted
+connections through authenticated Tor nodes on the network, negotiating a
+separate set of encryption keys for each hop along the circuit.  The circuit
+is extended one node at a time, and each node along the way knows only the
+immediately previous and following nodes in the circuit, so no individual Tor
+node knows the complete path that each fixed-size data packet (or
+\emph{cell}) will take.
+%Because each node sees no more than one hop in the
+%circuit,
+Thus, neither an eavesdropper nor a compromised node can
+see both the connection's source and destination.  Later requests use a new
+circuit, to complicate long-term linkability between different actions by
+a single user.
+
+Tor also helps servers hide their locations while
+providing services such as web publishing or instant
+messaging.  Using ``rendezvous points'', other Tor users can
+connect to these authenticated hidden services, neither one learning the
+other's network identity.
+
+Tor attempts to anonymize the transport layer, not the application layer.
+This approach is useful for applications such as SSH
+where authenticated communication is desired. However, when anonymity from
+those with whom we communicate is desired,
+application protocols that include personally identifying information need
+additional application-level scrubbing proxies, such as
+Privoxy~\cite{privoxy} for HTTP\@.  Furthermore, Tor does not relay arbitrary
+IP packets; it only anonymizes TCP streams and DNS requests.
+%, and only supports
+%connections via SOCKS
+%(but see Section~\ref{subsec:tcp-vs-ip}).
+
+Most node operators do not want to allow arbitrary TCP traffic. % to leave
+%their server.
+To address this, Tor provides \emph{exit policies} so
+each exit node can block the IP addresses and ports it is unwilling to allow.
+Tor nodes advertise their exit policies to the directory servers, so that
+clients can tell which nodes will support their connections.
+
+As of this writing, the Tor network has grown to around nine hundred nodes
+on four continents, with a total average load exceeding 100 MB/s and
+a total capacity exceeding %1Gbit/s.
+\\***What's the current capacity? -PFS***\\
+%Appendix A
+%shows a graph of the number of working nodes over time, as well as a
+%graph of the number of bytes being handled by the network over time.
+%The network is now sufficiently diverse for further development
+%and testing; but of course we always encourage new nodes
+%to join.
+
+Building from earlier versions of onion routing developed at NRL,
+Tor was researched and developed by NRL and the Free Haven Project under
+funding by ONR and DARPA for use in securing government
+communications. Continuing development and deployment have also been
+funded by the Omidyar Network, the Electronic Frontier Foundation for use
+in maintaining civil liberties for ordinary citizens online, and the
+International Broadcasting Bureau and Reporters without Borders to combat
+blocking and censorship on the Internet.
As we will see below, +this wide variety of interests helps maintain both the stability and +the security of the network. + +% The Tor +%protocol was chosen +%for the anonymizing layer in the European Union's PRIME directive to +%help maintain privacy in Europe. +%The AN.ON project in Germany +%has integrated an independent implementation of the Tor protocol into +%their popular Java Anon Proxy anonymizing client. + +\medskip +\noindent +{\bf Threat models and design philosophy.} +The ideal Tor network would be practical, useful and anonymous. When +trade-offs arise between these properties, Tor's research strategy has been +to remain useful enough to attract many users, +and practical enough to support them. Only subject to these +constraints do we try to maximize +anonymity.\footnote{This is not the only possible +direction in anonymity research: designs exist that provide more anonymity +than Tor at the expense of significantly increased resource requirements, or +decreased flexibility in application support (typically because of increased +latency). Such research does not typically abandon aspirations toward +deployability or utility, but instead tries to maximize deployability and +utility subject to a certain degree of structural anonymity (structural because +usability and practicality affect usage which affects the actual anonymity +provided by the network \cite{econymics,back01}).} +%{We believe that these +%approaches can be promising and useful, but that by focusing on deploying a +%usable system in the wild, Tor helps us experiment with the actual parameters +%of what makes a system ``practical'' for volunteer operators and ``useful'' +%for home users, and helps illuminate undernoticed issues which any deployed +%volunteer anonymity network will need to address.} +Because of our strategy, Tor has a weaker threat model than many designs in +the literature. In particular, because we +support interactive communications without impractically expensive padding, +we fall prey to a variety +of intra-network~\cite{back01,attack-tor-oak05,flow-correlation04,hs-attack} +and +end-to-end~\cite{danezis:pet2004,SS03} anonymity-breaking attacks. + +Tor does not attempt to defend against a global observer. In general, an +attacker who can measure both ends of a connection through the Tor network +% I say 'measure' rather than 'observe', to encompass murdoch-danezis +% style attacks. -RD +can correlate the timing and volume of data on that connection as it enters +and leaves the network, and so link communication partners. +Known solutions to this attack would seem to require introducing a +prohibitive degree of traffic padding between the user and the network, or +introducing an unacceptable degree of latency. +Also, it is not clear that these methods would +work at all against a minimally active adversary who could introduce timing +patterns or additional traffic. Thus, Tor only attempts to defend against +external observers who cannot observe both sides of a user's connections. + +Against internal attackers who sign up Tor nodes, the situation is more +complicated. In the simplest case, if an adversary has compromised $c$ of +$n$ nodes on the Tor network, then the adversary will be able to compromise +a random circuit with probability $\frac{c^2}{n^2}$~\cite{or-pet00} +(since the circuit +initiator chooses hops randomly). 
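For concreteness, suppose the adversary runs $c=45$ of $n=900$ nodes,
+or 5\% of the network: he can then expect to observe both ends of roughly
+$(45/900)^2$ of a user's circuits, about one circuit in 400.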
But there are
+complicating factors:
+(1)~If the user continues to build random circuits over time, an adversary
+  is pretty certain to see a statistical sample of the user's traffic, and
+  thereby can build an increasingly accurate profile of her behavior.
+(2)~An adversary who controls a popular service outside the Tor network
+  can be certain to observe all connections to that service; he
+  can therefore trace connections to that service with probability
+  $\frac{c}{n}$.
+(3)~Users do not in fact choose nodes with uniform probability; they
+  favor nodes with high bandwidth or uptime, and exit nodes that
+  permit connections to their favorite services.
+We demonstrated the severity of these problems in experiments on the
+live Tor network in 2006~\cite{hs-attack} and introduced \emph{entry
+  guards} as a means to curtail them. By choosing entry nodes from
+a small persistent subset, it becomes difficult for an adversary to
+increase the number of circuits observed entering the network from any
+given client simply by causing
+numerous connections or by watching compromised nodes over time.% (See
+%also Section~\ref{subsec:routing-zones} for discussion of larger
+%adversaries and our dispersal goals.)
+
+
+% I'm trying to make this paragraph work without reference to the
+% analysis/confirmation distinction, which we haven't actually introduced
+% yet, and which we realize isn't very stable anyway. Also, I don't want to
+% deprecate these attacks if we can't demonstrate that they don't work, since
+% in case they *do* turn out to work well against Tor, we'll look pretty
+% foolish. -NM
+More powerful attacks may exist. In \cite{hintz-pet02} it was
+shown that an attacker who can catalog data volumes of popular
+responder destinations (say, websites with consistent data volumes) may not
+need to observe both ends of a stream to learn source-destination links for
+those responders. Entry guards should complicate such attacks as well.
+Similarly, the latencies of various routes through the network can be
+cataloged~\cite{back01} to connect endpoints.
+% Also, \cite{kesdogan:pet2002} takes the
+% attack another level further, to narrow down where you could be
+% based on an intersection attack on subpages in a website. -RD
+It has not yet been shown whether these attacks will succeed or fail
+in the presence of the variability and volume quantization introduced by the
+Tor network, but it seems likely that these factors will at best increase
+the time and data needed for success
+rather than prevent the attacks completely.
+
+\workingnote{
+Along similar lines, the same paper suggests a ``clogging
+attack'' in which the throughput on a circuit is observed to slow
+down when an adversary clogs the right nodes with his own traffic.
+To determine the nodes in a circuit this attack requires the ability
+to continuously monitor the traffic exiting the network on a circuit
+that is up long enough to probe all network nodes in binary fashion.
+% Though somewhat related, clogging and interference are really different
+% attacks with different assumptions about adversary distribution and
+% capabilities as well as different techniques. -pfs
+Murdoch and Danezis~\cite{attack-tor-oak05} show a practical
+interference attack against portions of
+the fifty-node Tor network as deployed in mid 2004.
+An outside attacker can actively trace a circuit through the Tor network
+by observing changes in the latency of his
+own traffic sent through various Tor nodes.
This can be done +simultaneously at multiple nodes; however, like clogging, +this attack only reveals +the Tor nodes in the circuit, not initiator and responder addresses, +so it is still necessary to discover the endpoints to complete an +effective attack. The the size and diversity of the Tor network have +increased many fold since then, and it is unknown if the attacks +can scale to the current Tor network. +} + + +%discuss $\frac{c^2}{n^2}$, except how in practice the chance of owning +%the last hop is not $c/n$ since that doesn't take the destination (website) +%into account. so in cases where the adversary does not also control the +%final destination we're in good shape, but if he *does* then we'd be better +%off with a system that lets each hop choose a path. +% +%Isn't it more accurate to say ``If the adversary _always_ controls the final +% dest, we would be just as well off with such as system.'' ? If not, why +% not? -nm +% Sure. In fact, better off, since they seem to scale more easily. -rd + +%Murdoch and Danezis describe an attack +%\cite{attack-tor-oak05} that lets an attacker determine the nodes used +%in a circuit; yet s/he cannot identify the initiator or responder, +%e.g., client or web server, through this attack. So the endpoints +%remain secure, which is the goal. It is conceivable that an +%adversary could attack or set up observation of all connections +%to an arbitrary Tor node in only a few minutes. If such an adversary +%were to exist, s/he could use this probing to remotely identify a node +%for further attack. Of more likely immediate practical concern +%an adversary with active access to the responder traffic +%wants to keep a circuit alive long enough to attack an identified +%node. Thus it is important to prevent the responding end of the circuit +%from keeping it open indefinitely. +%Also, someone could identify nodes in this way and if in their +%jurisdiction, immediately get a subpoena (if they even need one) +%telling the node operator(s) that she must retain all the active +%circuit data she now has. +%Further, the enclave model, which had previously looked to be the most +%generally secure, seems particularly threatened by this attack, since +%it identifies endpoints when they're also nodes in the Tor network: +%see Section~\ref{subsec:helper-nodes} for discussion of some ways to +%address this issue. + +\medskip +\noindent +{\bf Distributed trust.} +In practice Tor's threat model is based on +dispersal and diversity. +Our defense lies in having a diverse enough set of nodes +to prevent most real-world +adversaries from being in the right places to attack users, +by distributing each transaction +over several nodes in the network. This ``distributed trust'' approach +means the Tor network can be safely operated and used by a wide variety +of mutually distrustful users, providing sustainability and security. +%than some previous attempts at anonymizing networks. + +No organization can achieve this security on its own. If a single +corporation or government agency were to build a private network to +protect its operations, any connections entering or leaving that network +would be obviously linkable to the controlling organization. The members +and operations of that agency would be easier, not harder, to distinguish. 
+ +Instead, to protect our networks from traffic analysis, we must +collaboratively blend the traffic from many organizations and private +citizens, so that an eavesdropper can't tell which users are which, +and who is looking for what information. %By bringing more users onto +%the network, all users become more secure~\cite{econymics}. +%[XXX I feel uncomfortable saying this last sentence now. -RD] +%[So, I took it out. I think we can do without it. -PFS] +The Tor network has a broad range of users, including ordinary citizens +concerned about their privacy, corporations +who don't want to reveal information to their competitors, and law +enforcement and government intelligence agencies who need +to do operations on the Internet without being noticed. +Naturally, organizations will not want to depend on others for their +security. If most participating providers are reliable, Tor tolerates +some hostile infiltration of the network. For maximum protection, +the Tor design includes an enclave approach that lets data be encrypted +(and authenticated) end-to-end, so high-sensitivity users can be sure it +hasn't been read or modified. This even works for Internet services that +don't have built-in encryption and authentication, such as unencrypted +HTTP or chat, and it requires no modification of those services. + +%\subsection{Related work} +Tor differs from other deployed systems for traffic analysis resistance +in its security and flexibility. Mix networks such as +Mixmaster~\cite{mixmaster-spec} or its successor Mixminion~\cite{minion-design} +gain the highest degrees of anonymity at the expense of introducing highly +variable delays, making them unsuitable for applications such as web +browsing. Commercial single-hop +proxies~\cite{anonymizer} can provide good performance, but +a single compromise can expose all users' traffic, and a single-point +eavesdropper can perform traffic analysis on the entire network. +%Also, their proprietary implementations place any infrastructure that +%depends on these single-hop solutions at the mercy of their providers' +%financial health as well as network security. +The Java +Anon Proxy (JAP)~\cite{web-mix} provides similar functionality to Tor but +handles only web browsing rather than all TCP\@. Because all traffic +passes through fixed ``cascades'' for which the endpoints are predictable, +an adversary can know where to watch for traffic analysis from particular +clients or to particular web servers. The design calls for padding to +complicate this, although it does not appear to be implemented. +%Some peer-to-peer file-sharing overlay networks such as +%Freenet~\cite{freenet} and Mute~\cite{mute} +The Freedom +network from Zero-Knowledge Systems~\cite{freedom21-security} +was even more flexible than Tor in +transporting arbitrary IP packets, and also supported +pseudonymity in addition to anonymity; but it had +a different approach to sustainability (collecting money from users +and paying ISPs to run Tor nodes), and was eventually shut down due to financial +load. Finally, %potentially more scalable +% [I had added 'potentially' because the scalability of these designs +% is not established, and I am uncomfortable making the +% bolder unmodified assertion. Roger took 'potentially' out. +% Here's an attempt at more neutral wording -pfs] +peer-to-peer designs that are intended to be more scalable, +for example Tarzan~\cite{tarzan:ccs02} and +MorphMix~\cite{morphmix:fc04}, have been proposed in the literature but +have not been fielded. 
These systems differ somewhat +in threat model and presumably practical resistance to threats. +Note that MorphMix differs from Tor only in +node discovery and circuit setup; so Tor's architecture is flexible +enough to contain a MorphMix experiment. Recently, +Tor has adopted from MorphMix the approach of making it harder to +own both ends of a circuit by requiring that nodes be chosen from +different /16 subnets. This requires +an adversary to own nodes in multiple address ranges to even have the +possibility of observing both ends of a circuit. We direct the +interested reader to~\cite{tor-design} for a more in-depth review of +related work. + +%XXXX six-four. crowds. i2p. + +%XXXX +%have a serious discussion of morphmix's assumptions, since they would +%seem to be the direct competition. in fact tor is a flexible architecture +%that would encompass morphmix, and they're nearly identical except for +%path selection and node discovery. and the trust system morphmix has +%seems overkill (and/or insecure) based on the threat model we've picked. +% this para should probably move to the scalability / directory system. -RD +% Nope. Cut for space, except for small comment added above -PFS + +\section{Social challenges} + +Many of the issues the Tor project needs to address extend beyond +system design and technology development. In particular, the +Tor project's \emph{image} with respect to its users and the rest of +the Internet impacts the security it can provide. +With this image issue in mind, this section discusses the Tor user base and +Tor's interaction with other services on the Internet. + +\subsection{Communicating security} + +Usability for anonymity systems +contributes to their security, because usability +affects the possible anonymity set~\cite{econymics,back01}. +Conversely, an unusable system attracts few users and thus can't provide +much anonymity. + +This phenomenon has a second-order effect: knowing this, users should +choose which anonymity system to use based in part on how usable +and secure +\emph{others} will find it, in order to get the protection of a larger +anonymity set. Thus we might supplement the adage ``usability is a security +parameter''~\cite{back01} with a new one: ``perceived usability is a +security parameter.'' From here we can better understand the effects +of publicity on security: the more convincing your +advertising, the more likely people will believe you have users, and thus +the more users you will attract. Perversely, over-hyped systems (if they +are not too broken) may be a better choice than modestly promoted ones, +if the hype attracts more users~\cite{usability-network-effect}. + +%So it follows that we should come up with ways to accurately communicate +%the available security levels to the user, so she can make informed +%decisions. +%JAP aims to do this by including a +%comforting `anonymity meter' dial in the software's graphical interface, +%giving the user an impression of the level of protection for her current +%traffic. + +However, there's a catch. For users to share the same anonymity set, +they need to act like each other. An attacker who can distinguish +a given user's traffic from the rest of the traffic will not be +distracted by anonymity set size. For high-latency systems like +Mixminion, where the threat model is based on mixing messages with each +other, there's an arms race between end-to-end statistical attacks and +counter-strategies~\cite{statistical-disclosure,minion-design,e2e-traffic,trickle02}. 
+But for low-latency systems like Tor, end-to-end \emph{traffic +correlation} attacks~\cite{danezis:pet2004,defensive-dropping,SS03,hs-attack} +allow an attacker who can observe both ends of a communication +to correlate packet timing and volume, quickly linking +the initiator to her destination. + +\workingnote{ +Like Tor, the current JAP implementation does not pad connections +apart from using small fixed-size cells for transport. In fact, +JAP's cascade-based network topology may be more vulnerable to these +attacks, because its network has fewer edges. JAP was born out of +the ISDN mix design~\cite{isdn-mixes}, where padding made sense because +every user had a fixed bandwidth allocation and altering the timing +pattern of packets could be immediately detected. But in its current context +as an Internet web anonymizer, adding sufficient padding to JAP +would probably be prohibitively expensive and ineffective against a +minimally active attacker.\footnote{Even if JAP could +fund higher-capacity nodes indefinitely, our experience +suggests that many users would not accept the increased per-user +bandwidth requirements, leading to an overall much smaller user base.} +Therefore, since under this threat +model the number of concurrent users does not seem to have much impact +on the anonymity provided, we suggest that JAP's anonymity meter is not +accurately communicating security levels to its users. +} + +On the other hand, while the number of active concurrent users may not +matter as much as we'd like, it still helps to have some other users +on the network, in particular different types of users. +We investigate this issue next. + +\subsection{Reputability and perceived social value} +Another factor impacting the network's security is its reputability: +the perception of its social value based on its current user base. If Alice is +the only user who has ever downloaded the software, it might be socially +accepted, but she's not getting much anonymity. Add a thousand +activists, and she's anonymous, but everyone thinks she's an activist too. +Add a thousand +diverse citizens (cancer survivors, privacy enthusiasts, and so on) +and now she's harder to profile. + +Furthermore, the network's reputability affects its operator base: more people +are willing to run a service if they believe it will be used by human rights +workers than if they believe it will be used exclusively for disreputable +ends. This effect becomes stronger if node operators themselves think they +will be associated with their users' disreputable ends. + +So the more cancer survivors on Tor, the better for the human rights +activists. The more malicious hackers, the worse for the normal users. Thus, +reputability is an anonymity issue for two reasons. First, it impacts +the sustainability of the network: a network that's always about to be +shut down has difficulty attracting and keeping adequate nodes. +Second, a disreputable network is more vulnerable to legal and +political attacks, since it will attract fewer supporters. + +While people therefore have an incentive for the network to be used for +``more reputable'' activities than their own, there are still trade-offs +involved when it comes to anonymity. To follow the above example, a +network used entirely by cancer survivors might welcome file sharers +onto the network, though of course they'd prefer a wider +variety of users. 
+ +Reputability becomes even more tricky in the case of privacy networks, +since the good uses of the network (such as publishing by journalists in +dangerous countries) are typically kept private, whereas network abuses +or other problems tend to be more widely publicized. + +The impact of public perception on security is especially important +during the bootstrapping phase of the network, where the first few +widely publicized uses of the network can dictate the types of users it +attracts next. +As an example, some U.S.~Department of Energy +penetration testing engineers are tasked with compromising DoE computers +from the outside. They only have a limited number of ISPs from which to +launch their attacks, and they found that the defenders were recognizing +attacks because they came from the same IP space. These engineers wanted +to use Tor to hide their tracks. First, from a technical standpoint, +Tor does not support the variety of IP packets one would like to use in +such attacks.% (see Section~\ref{subsec:tcp-vs-ip}). +But aside from this, we also decided that it would probably be poor +precedent to encourage such use---even legal use that improves +national security---and managed to dissuade them. + +%% "outside of academia, jap has just lost, permanently". (That is, +%% even though the crime detection issues are resolved and are unlikely +%% to go down the same way again, public perception has not been kind.) + +\subsection{Sustainability and incentives} +One of the unsolved problems in low-latency anonymity designs is +how to keep the nodes running. ZKS's Freedom network +depended on paying third parties to run its servers; the JAP project's +bandwidth depends on grants to pay for its bandwidth and +administrative expenses. In Tor, bandwidth and administrative costs are +distributed across the volunteers who run Tor nodes, so we at least have +reason to think that the Tor network could survive without continued research +funding.\footnote{It also helps that Tor is implemented with free and open + source software that can be maintained by anybody with the ability and + inclination.} But why are these volunteers running nodes, and what can we +do to encourage more volunteers to do so? + +We have not formally surveyed Tor node operators to learn why they are +running nodes, but +from the information they have provided, it seems that many of them run Tor +nodes for reasons of personal interest in privacy issues. It is possible +that others are running Tor nodes to protect their own +anonymity, but of course they are +hardly likely to tell us specifics if they are. +%Significantly, Tor's threat model changes the anonymity incentives for running +%a node. In a high-latency mix network, users can receive additional +%anonymity by running their own node, since doing so obscures when they are +%injecting messages into the network. But, anybody observing all I/O to a Tor +%node can tell when the node is generating traffic that corresponds to +%none of its incoming traffic. +% +%I didn't buy the above for reason's subtle enough that I just cut it -PFS +Tor exit node operators do attain a degree of +``deniability'' for traffic that originates at that exit node. For + example, it is likely in practice that HTTP requests from a Tor node's IP + will be assumed to be from the Tor network. + More significantly, people and organizations who use Tor for + anonymity depend on the + continued existence of the Tor network to do so; running a node helps to + keep the network operational. 
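+To make the operator's side of this concrete, the sketch below shows the
+kind of configuration file a volunteer node might run. It is only a sketch:
+the option names follow Tor's manual page, while the nickname, contact
+address, bandwidth caps, accounting period, and exit policy are invented for
+illustration and are not a recommendation.
+
+\begin{verbatim}
+# Illustrative torrc for a volunteer Tor node; all values are examples.
+Nickname        exampleRelay
+ORPort          9001
+ContactInfo     operator@example.net
+
+# Donate spare capacity, but cap the sustained rate and allow short bursts.
+BandwidthRate   200 KB
+BandwidthBurst  500 KB
+
+# Hibernation: serve at most 40 GB per accounting period (here, a month),
+# then go dormant until the next period.
+AccountingMax   40 GB
+AccountingStart month 1 00:00
+
+# Exit policy: refuse SMTP and other abuse-prone ports, allow the rest.
+ExitPolicy reject *:25
+ExitPolicy reject *:119
+ExitPolicy accept *:*
+\end{verbatim}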
+%\item Local Tor entry and exit nodes allow users on a network to run in an
+%  `enclave' configuration. [XXXX need to resolve this. They would do this
+%   for E2E encryption + auth?]
+
+
+%We must try to make the costs of running a Tor node easily minimized.
+Since Tor is run by volunteers, the most crucial software usability issue is
+usability by operators: when an operator leaves, the network becomes less
+usable by everybody. To keep operators pleased, we must try to keep Tor's
+resource and administrative demands as low as possible.
+
+Because of ISP billing structures, many Tor operators have underused capacity
+that they are willing to donate to the network, at no additional monetary
+cost to them. Features to limit bandwidth have been essential to adoption.
+Also useful has been a ``hibernation'' feature that allows a Tor node that
+wants to provide high bandwidth, but no more than a certain amount in a
+given billing cycle, to become dormant once its bandwidth is exhausted, and
+to reawaken at a random offset into the next billing cycle. This feature has
+interesting policy implications, however; see the next section.
+Exit policies help to limit administrative costs by limiting the frequency of
+abuse complaints (see Section~\ref{subsec:tor-and-blacklists}).
+% We discuss
+%technical incentive mechanisms in Section~\ref{subsec:incentives-by-design}.
+
+%[XXXX say more. Why else would you run a node? What else can we do/do we
+% already do to make running a node more attractive?]
+%[We can enforce incentives; see Section 6.1. We can rate-limit clients.
+% We can put "top bandwidth nodes lists" up a la seti@home.]
+
+\workingnote{
+\subsection{Bandwidth and file-sharing}
+\label{subsec:bandwidth-and-file-sharing}
+%One potentially problematical area with deploying Tor has been our response
+%to file-sharing applications.
+Once users have configured their applications to work with Tor, the largest
+remaining usability issue is performance. Users begin to suffer
+when websites ``feel slow.''
+Clients currently try to build their connections through nodes that they
+guess will have enough bandwidth. But even if capacity is allocated
+optimally, it seems unlikely that the current network architecture will have
+enough capacity to provide every user with as much bandwidth as she would
+receive if she weren't using Tor, unless far more nodes join the network.
+
+%Limited capacity does not destroy the network, however. Instead, usage tends
+%towards an equilibrium: when performance suffers, users who value performance
+%over anonymity tend to leave the system, thus freeing capacity until the
+%remaining users on the network are exactly those willing to use that capacity
+%there is.
+
+Many of Tor's recent bandwidth difficulties have come from file-sharing
+applications. These applications provide two challenges to
+any anonymizing network: their intensive bandwidth requirements, and the
+degree to which they are associated (correctly or not) with copyright
+infringement.
+
+High-bandwidth protocols can make the network unresponsive,
+but tend to be somewhat self-correcting as lack of bandwidth drives away
+users who need it. Issues of copyright violation,
+however, are more interesting. Typical exit node operators want to help
+people achieve private and anonymous speech, not to help people (say) host
+Vin Diesel movies for download; and typical ISPs would rather not
+deal with customers who draw menacing letters
+from the MPAA\@.
While it is quite likely that the operators are doing nothing +illegal, many ISPs have policies of dropping users who get repeated legal +threats regardless of the merits of those threats, and many operators would +prefer to avoid receiving even meritless legal threats. +So when letters arrive, operators are likely to face +pressure to block file-sharing applications entirely, in order to avoid the +hassle. + +But blocking file-sharing is not easy: popular +protocols have evolved to run on non-standard ports to +get around other port-based bans. Thus, exit node operators who want to +block file-sharing would have to find some way to integrate Tor with a +protocol-aware exit filter. This could be a technically expensive +undertaking, and one with poor prospects: it is unlikely that Tor exit nodes +would succeed where so many institutional firewalls have failed. Another +possibility for sensitive operators is to run a restrictive node that +only permits exit connections to a restricted range of ports that are +not frequently associated with file sharing. There are increasingly few such +ports. + +Other possible approaches might include rate-limiting connections, especially +long-lived connections or connections to file-sharing ports, so that +high-bandwidth connections do not flood the network. We might also want to +give priority to cells on low-bandwidth connections to keep them interactive, +but this could have negative anonymity implications. + +For the moment, it seems that Tor's bandwidth issues have rendered it +unattractive for bulk file-sharing traffic; this may continue to be so in the +future. Nevertheless, Tor will likely remain attractive for limited use in +file-sharing protocols that have separate control and data channels. + +%[We should say more -- but what? That we'll see a similar +% equilibriating effect as with bandwidth, where sensitive ops switch to +% middleman, and we become less useful for file-sharing, so the file-sharing +% people back off, so we get more ops since there's less file-sharing, so the +% file-sharers come back, etc.] + +%XXXX +%in practice, plausible deniability is hypothetical and doesn't seem very +%convincing. if ISPs find the activity antisocial, they don't care *why* +%your computer is doing that behavior. +} + +\subsection{Tor and blacklists} +\label{subsec:tor-and-blacklists} + +It was long expected that, alongside legitimate users, Tor would also +attract troublemakers who exploit Tor to abuse services on the +Internet with vandalism, rude mail, and so on. +Our initial answer to this situation was to use ``exit policies'' +to allow individual Tor nodes to block access to specific IP/port ranges. +This approach aims to make operators more willing to run Tor by allowing +them to prevent their nodes from being used for abusing particular +services. For example, by default Tor nodes block SMTP (port 25), +to avoid the issue of spam. Note that for spammers, Tor would be +a step back, a much less effective means of distributing spam than +those currently available. This is thus primarily an unmistakable +answer to those confused about Internet communication who might raise +spam as an issue. + +Exit policies are useful, but they are insufficient: if not all nodes +block a given service, that service may try to block Tor instead. +While being blockable is important to being good netizens, we would like +to encourage services to allow anonymous access. 
Services should not
+need to decide between blocking legitimate anonymous use and allowing
+unlimited abuse. For the time being, blocking by IP address is
+an expedient strategy, even if it undermines Internet stability and
+functionality in the long run~\cite{netauth}.
+
+This is potentially a bigger problem than it may appear.
+On the one hand, services should be allowed to refuse connections from
+sources of possible abuse.
+But when a Tor node administrator decides whether he prefers to be able
+to post to Wikipedia from his IP address, or to allow people to read
+Wikipedia anonymously through his Tor node, he is making the decision
+for others as well. (For a while, Wikipedia
+blocked all posting from all Tor nodes based on IP addresses.) If
+the Tor node shares an address with a campus or corporate NAT,
+then the decision can prevent the entire population from posting.
+Similarly, whether intended or not, such blocking supports
+repression of free speech. In many locations where Internet access
+of various kinds is censored or even punished by imprisonment,
+Tor is a path both to the outside world and to others inside.
+Blocking posts from Tor makes the job of censoring authorities easier.
+This is a loss for both Tor
+and Wikipedia: we don't want to compete for (or divvy up) the
+NAT-protected entities of the world.
+This is also unfortunate because there are relatively simple technical
+solutions.
+Various schemes for escrowing anonymous posts until they are reviewed
+by editors would both prevent abuse and remove the incentive to attempt it.
+Further, pseudonymous reputation tracking of posters through Tor
+would allow those who establish adequate reputation to post without
+escrow. Software to support pseudonymous access via Tor, designed precisely
+to interact with Wikipedia's access mechanism, has even been developed
+and proposed to Wikimedia by Jason Holt~\cite{nym}, but it has not been
+taken up.
+
+
+Perhaps worse, many IP blacklists are coarse-grained: they ignore Tor's exit
+policies, partly because that is easier to implement and partly
+to punish all Tor nodes. One IP blacklist even bans
+every class C network that contains a Tor node, and recommends banning SMTP
+from these networks even though Tor does not allow SMTP at all. This
+strategic decision aims to discourage the
+operation of anything resembling an open proxy by encouraging its neighbors
+to shut it down to get unblocked themselves. This pressure even
+affects Tor nodes running in middleman mode (disallowing all exits) when
+those nodes are blacklisted too.
+% Perception of Tor as an abuse vector
+%is also partly driven by multiple base-rate fallacies~\cite{axelsson00}.
+
+Problems of abuse occur mainly with services such as IRC networks and
+Wikipedia, which rely on IP blocking to ban abusive users. While at first
+blush this practice might seem to depend on the anachronistic assumption that
+each IP is an identifier for a single user, it is actually more reasonable in
+practice: it assumes that non-proxy IPs are a costly resource, and that an
+abuser cannot change IPs at will. By blocking IPs which are used by Tor
+nodes, open proxies, and service abusers, these systems hope to make
+ongoing abuse difficult. Although the system is imperfect, it works
+tolerably well for them in practice.
+
+Of course, we would prefer that legitimate anonymous users be able to
+access abuse-prone services.
One conceivable approach would require
+would-be IRC users, for instance, to register accounts if they want to
+access the IRC network from Tor.  In practice this would not
+significantly impede abuse if creating new accounts were easily automatable;
+this is why services use IP blocking.  To deter abuse, pseudonymous
+identities need to require a significant switching cost in resources or human
+time.  Some popular webmail applications
+impose cost with Reverse Turing Tests, but this step may not deter all
+abusers.  Freedom used blind signatures to limit
+the number of pseudonyms for each paying account, but Tor has neither the
+ability nor the desire to collect payment.
+
+We stress that as far as we can tell, most Tor uses are not
+abusive. Most services have not complained, and others are actively
+working to find ways besides banning to cope with the abuse. For example,
+the Freenode IRC network had a problem with a coordinated group of
+abusers joining channels and subtly taking over the conversation; but
+when they labelled all users coming from Tor IPs as ``anonymous users,''
+removing the ability of the abusers to blend in, the abuse stopped.
+This is an illustration of how simple technical mechanisms can remove
+the ability to abuse anonymously without undermining the ability
+to communicate anonymously, and can thus remove the incentive to attempt
+abuse in this way.
+
+%The use of squishy IP-based ``authentication'' and ``authorization''
+%has not broken down even to the level that SSNs used for these
+%purposes have in commercial and public record contexts. Externalities
+%and misplaced incentives cause a continued focus on fighting identity
+%theft by protecting SSNs rather than developing better authentication
+%and incentive schemes \cite{price-privacy}. Similarly we can expect a
+%continued use of identification by IP number as long as there is no
+%workable alternative.
+
+%[XXX Mention correct DNS-RBL implementation. -NM]
+
+\workingnote{
+\section{Design choices}
+
+In addition to social issues, Tor also faces some design trade-offs that must
+be investigated as the network develops.
+
+\subsection{Transporting the stream vs transporting the packets}
+\label{subsec:stream-vs-packet}
+\label{subsec:tcp-vs-ip}
+
+Tor transports streams; it does not tunnel packets.
+It has often been suggested that like the old Freedom
+network~\cite{freedom21-security}, Tor should
+``obviously'' anonymize IP traffic
+at the IP layer. Before this could be done, many issues need to be resolved:
+
+\begin{enumerate}
+\setlength{\itemsep}{0mm}
+\setlength{\parsep}{0mm}
+\item \emph{IP packets reveal OS characteristics.}  We would still need to do
+IP-level packet normalization, to stop things like TCP fingerprinting
+attacks. %There likely exist libraries that can help with this.
+This is unlikely to be a trivial task, given the diversity and complexity of
+TCP stacks.
+\item \emph{Application-level streams still need scrubbing.} We still need
+Tor to be easy to integrate with user-level application-specific proxies
+such as Privoxy. So it's not just a matter of capturing packets and
+anonymizing them at the IP layer.
+\item \emph{Certain protocols will still leak information.} For example, we
+must rewrite DNS requests so they are delivered to an unlinkable DNS server
+rather than the DNS server at a user's ISP; thus, we must understand the
+protocols we are transporting.
+\item \emph{The crypto is unspecified.} First we need a block-level encryption +approach that can provide security despite +packet loss and out-of-order delivery. Freedom allegedly had one, but it was +never publicly specified. +Also, TLS over UDP is not yet implemented or +specified, though some early work has begun~\cite{dtls}. +\item \emph{We'll still need to tune network parameters.} Since the above +encryption system will likely need sequence numbers (and maybe more) to do +replay detection, handle duplicate frames, and so on, we will be reimplementing +a subset of TCP anyway---a notoriously tricky path. +\item \emph{Exit policies for arbitrary IP packets mean building a secure +IDS\@.} Our node operators tell us that exit policies are one of +the main reasons they're willing to run Tor. +Adding an Intrusion Detection System to handle exit policies would +increase the security complexity of Tor, and would likely not work anyway, +as evidenced by the entire field of IDS and counter-IDS papers. Many +potential abuse issues are resolved by the fact that Tor only transports +valid TCP streams (as opposed to arbitrary IP including malformed packets +and IP floods), so exit policies become even \emph{more} important as +we become able to transport IP packets. We also need to compactly +describe exit policies so clients can predict +which nodes will allow which packets to exit. +\item \emph{The Tor-internal name spaces would need to be redesigned.} We +support hidden service {\tt{.onion}} addresses (and other special addresses, +like {\tt{.exit}} which lets the user request a particular exit node), +by intercepting the addresses when they are passed to the Tor client. +Doing so at the IP level would require a more complex interface between +Tor and the local DNS resolver. +\end{enumerate} + +This list is discouragingly long, but being able to transport more +protocols obviously has some advantages. It would be good to learn which +items are actual roadblocks and which are easier to resolve than we think. + +To be fair, Tor's stream-based approach has run into +stumbling blocks as well. While Tor supports the SOCKS protocol, +which provides a standardized interface for generic TCP proxies, many +applications do not support SOCKS\@. For them we already need to +replace the networking system calls with SOCKS-aware +versions, or run a SOCKS tunnel locally, neither of which is +easy for the average user. %---even with good instructions. +Even when applications can use SOCKS, they often make DNS requests +themselves before handing an IP address to Tor, which advertises +where the user is about to connect. +We are still working on more usable solutions. + +%So to actually provide good anonymity, we need to make sure that +%users have a practical way to use Tor anonymously. Possibilities include +%writing wrappers for applications to anonymize them automatically; improving +%the applications' support for SOCKS; writing libraries to help application +%writers use Tor properly; and implementing a local DNS proxy to reroute DNS +%requests to Tor so that applications can simply point their DNS resolvers at +%localhost and continue to use SOCKS for data only. + +\subsection{Mid-latency} +\label{subsec:mid-latency} + +Some users need to resist traffic correlation attacks. Higher-latency +mix-networks introduce variability into message +arrival times: as timing variance increases, timing correlation attacks +require increasingly more data~\cite{e2e-traffic}. 
Can we improve Tor's +resistance without losing too much usability? + +We need to learn whether we can trade a small increase in latency +for a large anonymity increase, or if we'd end up trading a lot of +latency for only a minimal security gain. A trade-off might be worthwhile +even if we +could only protect certain use cases, such as infrequent short-duration +transactions. % To answer this question +We might adapt the techniques of~\cite{e2e-traffic} to a lower-latency mix +network, where the messages are batches of cells in temporally clustered +connections. These large fixed-size batches can also help resist volume +signature attacks~\cite{hintz-pet02}. We could also experiment with traffic +shaping to get a good balance of throughput and security. +%Other padding regimens might supplement the +%mid-latency option; however, we should continue the caution with which +%we have always approached padding lest the overhead cost us too much +%performance or too many volunteers. + +We must keep usability in mind too. How much can latency increase +before we drive users away? We've already been forced to increase +latency slightly, as our growing network incorporates more DSL and +cable-modem nodes and more nodes in distant continents. Perhaps we can +harness this increased latency to improve anonymity rather than just +reduce usability. Further, if we let clients label certain circuits as +mid-latency as they are constructed, we could handle both types of traffic +on the same network, giving users a choice between speed and security---and +giving researchers a chance to experiment with parameters to improve the +quality of those choices. + +\subsection{Enclaves and helper nodes} +\label{subsec:helper-nodes} + +It has long been thought that users can improve their anonymity by +running their own node~\cite{tor-design,or-ih96,or-pet00}, and using +it in an \emph{enclave} configuration, where all their circuits begin +at the node under their control. Running Tor clients or servers at +the enclave perimeter is useful when policy or other requirements +prevent individual machines within the enclave from running Tor +clients~\cite{or-jsac98,or-discex00}. + +Of course, Tor's default path length of +three is insufficient for these enclaves, since the entry and/or exit +% [edit war: without the ``and/'' the natural reading here +% is aut rather than vel. And the use of the plural verb does not work -pfs] +themselves are sensitive. Tor thus increments path length by one +for each sensitive endpoint in the circuit. +Enclaves also help to protect against end-to-end attacks, since it's +possible that traffic coming from the node has simply been relayed from +elsewhere. However, if the node has recognizable behavior patterns, +an attacker who runs nodes in the network can triangulate over time to +gain confidence that it is in fact originating the traffic. Wright et +al.~\cite{wright03} introduce the notion of a \emph{helper node}---a +single fixed entry node for each user---to combat this \emph{predecessor +attack}. + +However, the attack in~\cite{attack-tor-oak05} shows that simply adding +to the path length, or using a helper node, may not protect an enclave +node. A hostile web server can send constant interference traffic to +all nodes in the network, and learn which nodes are involved in the +circuit (though at least in the current attack, he can't learn their +order). 
Using randomized path lengths may help somewhat, since the attacker
+will never be certain he has identified all nodes in the path unless
+he probes the entire network, but as
+long as the network remains small this attack will still be feasible.
+
+Helper nodes also aim to help Tor clients, because choosing entry and exit
+points randomly and changing them frequently allows an attacker who controls
+even a few nodes to eventually link some of their destinations. The goal
+is to take the risk of choosing a bad entry node once and for all,
+rather than taking a new risk with each new circuit. (Choosing fixed
+exit nodes is less useful, since even an honest exit node still doesn't
+protect against a hostile website.) But obstacles remain before
+we can implement helper nodes.
+For one, the literature does not describe how to choose helpers from a list
+of nodes that changes over time. If Alice is forced to choose a new entry
+helper every $d$ days and $c$ of the $n$ nodes are bad, she can expect
+to choose a compromised node around
+every $dn/c$ days. (For example, with $n=900$ nodes of which $c=45$ are
+bad and $d=30$ days, that is roughly once every 600 days.)
+Statistically over time this approach only helps
+if she is better at choosing honest helper nodes than she is at choosing
+honest nodes in general. Worse, an attacker with the ability to DoS nodes
+could force users to switch helper nodes more frequently, or remove
+other candidate helpers.
+
+%Do general DoS attacks have anonymity implications? See e.g. Adam
+%Back's IH paper, but I think there's more to be pointed out here. -RD
+% Not sure what you want to say here. -NM
+
+%Game theory for helper nodes: if Alice offers a hidden service on a
+%server (enclave model), and nobody ever uses helper nodes, then against
+%George+Steven's attack she's totally nailed. If only Alice uses a helper
+%node, then she's still identified as the source of the data. If everybody
+%uses a helper node (including Alice), then the attack identifies the
+%helper node and also Alice, and knows which one is which. If everybody
+%uses a helper node (but not Alice), then the attacker figures the real
+%source was a client that is using Alice as a helper node. [How's my
+%logic here?] -RD
+%
+% Not sure about the logic. For the attack to work with helper nodes, the
+%attacker needs to guess that Alice is running the hidden service, right?
+%Otherwise, how can he know to measure her traffic specifically? -NM
+%
+% In the Murdoch-Danezis attack, the adversary measures all servers. -RD
+
+%point to routing-zones section re: helper nodes to defend against
+%big stuff.
+
+\subsection{Location-hidden services}
+\label{subsec:hidden-services}
+
+% This section is first up against the wall when the revolution comes.
+
+Tor's \emph{rendezvous points}
+let users provide TCP services to other Tor users without revealing
+the service's location. Since this feature is relatively recent, we describe
+here a couple of our early observations from its deployment.
+
+First, our implementation of hidden services seems less hidden than we'd
+like, since they build a different rendezvous circuit for each user,
+and an external adversary can induce them to
+produce traffic. This insecurity means that they may not be suitable as
+a building block for Free Haven~\cite{freehaven-berk} or other anonymous
+publishing systems that aim to provide long-term security, though helper
+nodes, as discussed above, would seem to help.
+ +\emph{Hot-swap} hidden services, where more than one location can +provide the service and loss of any one location does not imply a +change in service, would help foil intersection and observation attacks +where an adversary monitors availability of a hidden service and also +monitors whether certain users or servers are online. The design +challenges in providing such services without otherwise compromising +the hidden service's anonymity remain an open problem; +however, see~\cite{move-ndss05}. + +In practice, hidden services are used for more than just providing private +access to a web server or IRC server. People are using hidden services +as a poor man's VPN and firewall-buster. Many people want to be able +to connect to the computers in their private network via secure shell, +and rather than playing with dyndns and trying to pierce holes in their +firewall, they run a hidden service on the inside and then rendezvous +with that hidden service externally. + +News sites like Bloggers Without Borders (www.b19s.org) are advertising +a hidden-service address on their front page. Doing this can provide +increased robustness if they use the dual-IP approach we describe +in~\cite{tor-design}, +but in practice they do it to increase visibility +of the Tor project and their support for privacy, and to offer +a way for their users, using unmodified software, to get end-to-end +encryption and authentication to their website. + +\subsection{Location diversity and ISP-class adversaries} +\label{subsec:routing-zones} + +Anonymity networks have long relied on diversity of node location for +protection against attacks---typically an adversary who can observe a +larger fraction of the network can launch a more effective attack. One +way to achieve dispersal involves growing the network so a given adversary +sees less. Alternately, we can arrange the topology so traffic can enter +or exit at many places (for example, by using a free-route network +like Tor rather than a cascade network like JAP). Lastly, we can use +distributed trust to spread each transaction over multiple jurisdictions. +But how do we decide whether two nodes are in related locations? + +Feamster and Dingledine defined a \emph{location diversity} metric +in~\cite{feamster:wpes2004}, and began investigating a variant of location +diversity based on the fact that the Internet is divided into thousands of +independently operated networks called {\em autonomous systems} (ASes). +The key insight from their paper is that while we typically think of a +connection as going directly from the Tor client to the first Tor node, +actually it traverses many different ASes on each hop. An adversary at +any of these ASes can monitor or influence traffic. Specifically, given +plausible initiators and recipients, and given random path selection, +some ASes in the simulation were able to observe 10\% to 30\% of the +transactions (that is, learn both the origin and the destination) on +the deployed Tor network (33 nodes as of June 2004). + +The paper concludes that for best protection against the AS-level +adversary, nodes should be in ASes that have the most links to other ASes: +Tier-1 ISPs such as AT\&T and Abovenet. Further, a given transaction +is safest when it starts or ends in a Tier-1 ISP\@. Therefore, assuming +initiator and responder are both in the U.S., it actually \emph{hurts} +our location diversity to use far-flung nodes in +continents like Asia or South America. +% it's not just entering or exiting from them. 
using them as the middle
% hop reduces your effective path length, which you presumably don't
% want because you chose that path length for a reason.
%
% Not sure I buy that argument. Two end nodes in the right ASs to
% discourage linking are still not known to each other. If some
% adversary in a single AS can bridge the middle node, it shouldn't
% therefore be able to identify initiator or responder; although it could
% contribute to further attacks given more assumptions.
% Nonetheless, no change to the actual text for now.

Many open questions remain. First, it will be an immense engineering
challenge to get an entire BGP routing table to each Tor client, or to
summarize it sufficiently. Without a local copy, clients won't be
able to safely predict what ASes will be traversed on the various paths
through the Tor network to the final destination. Tarzan~\cite{tarzan:ccs02}
and MorphMix~\cite{morphmix:fc04} suggest that we compare IP prefixes to
determine location diversity; but the above paper showed that in practice
many of the Mixmaster nodes that share a single AS have entirely different
IP prefixes. When the network has scaled to thousands of nodes, does IP
prefix comparison become a more useful approximation? % Alternatively, can
%relevant parts of the routing tables be summarized centrally and delivered to
%clients in a less verbose format?
%% i already said "or to summarize is sufficiently" above. is that not
%% enough? -RD
%
Second, we can take advantage of caching certain content at the
exit nodes, to limit the number of requests that need to leave the
network at all. What about taking advantage of caches like Akamai or
Google~\cite{shsm03}? (Note that they're also well-positioned as global
adversaries.)
%
Third, if we follow the recommendations in~\cite{feamster:wpes2004}
 and tailor path selection
to avoid choosing endpoints in similar locations, how much are we hurting
anonymity against larger real-world adversaries who can take advantage
of knowing our algorithm?
%
Fourth, can we use this knowledge to figure out which gaps in our network
most affect our robustness to this class of attack, and go recruit
new nodes with those ASes in mind?

%Tor's security relies in large part on the dispersal properties of its
%network. We need to be more aware of the anonymity properties of various
%approaches so we can make better design decisions in the future.

\subsection{The anti-censorship problem}
\label{subsec:china}

Citizens in a variety of countries, most recently China and
Iran, are blocked from accessing various sites outside
their country. These users try to find whatever tools are available to
get around these firewalls. Some anonymity networks, such as
Six-Four~\cite{six-four}, are designed specifically with this goal in
mind; others like the Anonymizer~\cite{anonymizer} are paid by sponsors
such as Voice of America to encourage Internet
freedom. Even though Tor wasn't
designed with ubiquitous access to the network in mind, thousands of
users across the world are now using it for exactly this purpose.
% Academic and NGO organizations, peacefire, \cite{berkman}, etc

Anti-censorship networks hoping to bridge country-level blocks face
a variety of challenges. One of these is that they need to find enough
exit nodes---servers on the `free' side that are willing to relay
traffic from users to their final destinations.
Anonymizing
networks like Tor are well-suited to this task since we have
already gathered a set of exit nodes that are willing to tolerate some
political heat.

The other main challenge is to distribute a list of reachable relays
to the users inside the country, and give them software to use those relays,
without letting the censors also enumerate this list and block each
relay. Anonymizer solves this by buying lots of seemingly-unrelated IP
addresses (or having them donated), abandoning old addresses as they are
`used up,' and telling a few users about the new ones. Distributed
anonymizing networks again have an advantage here, in that we already
have tens of thousands of separate IP addresses whose users might
volunteer to provide this service since they've already installed and use
the software for their own privacy~\cite{koepsell:wpes2004}. Because
the Tor protocol separates routing from network discovery~\cite{tor-design},
volunteers could configure their Tor clients
to generate node descriptors and send them to a special directory
server that gives them out to dissidents who need to get around blocks.

Of course, this still doesn't prevent the adversary
from enumerating and preemptively blocking the volunteer relays.
Perhaps a tiered-trust system could be built where a few individuals are
given relays' locations. They could then recommend other individuals
by telling them
those addresses, thus providing a built-in incentive to avoid letting the
adversary intercept them. Max-flow trust algorithms~\cite{advogato}
might help to bound the number of IP addresses leaked to the adversary. Groups
like the W3C are looking into using Tor as a component in an overall system to
help address censorship; we wish them success.

%\cite{infranet}


\section{Scaling}
\label{sec:scaling}

Tor is running today with hundreds of nodes and hundreds of thousands of
users, but it will certainly not scale to millions.
Scaling Tor involves four main challenges. First, to get a
large set of nodes, we must address incentives for
users to carry traffic for others. Next is safe node discovery, both
while bootstrapping (Tor clients must robustly find an initial
node list) and later (Tor clients must learn about a fair sample
of honest nodes and not let the adversary control circuits).
We must also detect and handle node speed and reliability as the network
becomes increasingly heterogeneous: since the speed and reliability
of a circuit are limited by its worst link, we must learn to track and
predict performance. Finally, we must stop assuming that all points on
the network can connect to all other points.

\subsection{Incentives by Design}
\label{subsec:incentives-by-design}

There are three behaviors we need to encourage for each Tor node: relaying
traffic; providing good throughput and reliability while doing it;
and allowing traffic to exit the network from that node.

We encourage these behaviors through \emph{indirect} incentives: that
is, by designing the system and educating users in such a way that users
with certain goals will choose to relay traffic. One
main incentive for running a Tor node is social: volunteers
altruistically donate their bandwidth and time. We encourage this with
public rankings of the throughput and reliability of nodes, much like
seti@home. We further explain to users that they can get
deniability for any traffic emerging from the same address as a Tor
exit node, and they can use their own Tor node
as an entry or exit point with confidence that it's not run by an adversary.
Further, users may run a node simply because they need such a network
to be persistently available and usable, and the value of supporting this
exceeds any countervailing costs.
Finally, we can encourage operators by improving the usability and feature
set of the software:
rate limiting support and easy packaging decrease the hassle of
maintaining a node, and our configurable exit policies allow each
operator to advertise a policy describing the hosts and ports to which
he feels comfortable connecting.
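
As a concrete illustration of both features (a minimal sketch; the rates and
ports are examples, not recommendations), an operator who wants to cap the
bandwidth his node relays and to allow only web traffic to exit from it
might add lines like these to his torrc:

{\small
\begin{verbatim}
## Illustrative torrc excerpt for a node operator (values are examples).
## Cap the long-term average relayed bandwidth and the short-term burst:
BandwidthRate 100 KB
BandwidthBurst 200 KB
## Advertise an exit policy: allow web traffic out, refuse everything else.
ExitPolicy accept *:80
ExitPolicy accept *:443
ExitPolicy reject *:*
\end{verbatim}
}

The exit policy is published in the node's descriptor, so clients simply
avoid choosing this node as the last hop for any other kind of traffic.
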
To date these incentives appear to have been adequate. As the system scales
or as new issues emerge, however, we may also need to provide
 \emph{direct} incentives:
providing payment or other resources in return for high-quality service.
Paying actual money is problematic: decentralized e-cash systems are
not yet practical, and a centralized collection system not only reduces
robustness, but also has failed in the past (the history of commercial
anonymizing networks is littered with failed attempts). A more promising
option is to use a tit-for-tat incentive scheme, where nodes provide better
service to nodes that have provided good service to them.

Unfortunately, such an approach introduces new anonymity problems.
There are many surprising ways for nodes to game the incentive and
reputation system to undermine anonymity---such systems are typically
designed to encourage fairness in storage or bandwidth usage, not
fairness of provided anonymity. An adversary can attract more traffic
by performing well, or can target individual users by performing well
for them selectively, to undermine their anonymity. Typically a user who
chooses evenly from all nodes is most resistant to an adversary
targeting him, but that approach hampers the efficient use
of heterogeneous nodes.

%When a node (call him Steve) performs well for Alice, does Steve gain
%reputation with the entire system, or just with Alice? If the entire
%system, how does Alice tell everybody about her experience in a way that
%prevents her from lying about it yet still protects her identity? If
%Steve's behavior only affects Alice's behavior, does this allow Steve to
%selectively perform only for Alice, and then break her anonymity later
%when somebody (presumably Alice) routes through his node?

A possible solution is a simplified approach to the tit-for-tat
incentive scheme based on two rules: (1) each node should measure the
service it receives from adjacent nodes, and provide service relative
to the received service, but (2) when a node is making decisions that
affect its own security (such as building a circuit for its own
application connections), it should choose evenly from a sufficiently
large set of nodes that meet some minimum service
threshold~\cite{casc-rep}. This approach allows us to discourage
bad service
without opening Alice up as much to attacks. All of this requires
further study.
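
To make the two-rule policy concrete, the sketch below (Python-style
pseudocode; the service metric, threshold value, and data structures are
placeholders we have invented for illustration, not part of any deployed
design) separates the performance-driven decision from the security-critical
one:

{\small
\begin{verbatim}
import random

SERVICE_THRESHOLD = 0.5   # illustrative minimum acceptable service score

def allocate_relay_bandwidth(total_bw, service_received):
    """Rule 1: share relaying capacity among adjacent nodes in
    proportion to the service each has recently provided to us."""
    if not service_received:
        return {}
    total = sum(service_received.values())
    if total == 0:
        equal = total_bw / len(service_received)
        return {node: equal for node in service_received}
    return {node: total_bw * score / total
            for node, score in service_received.items()}

def choose_node_for_own_circuit(service_received, nodes):
    """Rule 2: for circuits carrying our own traffic, choose uniformly
    among all nodes above a minimum threshold, so an adversary cannot
    attract our circuits just by performing slightly better."""
    acceptable = [n for n in nodes
                  if service_received.get(n, 0.0) >= SERVICE_THRESHOLD]
    return random.choice(acceptable if acceptable else nodes)
\end{verbatim}
}

The point of the second function is precisely that it ignores the
fine-grained ranking used by the first: selective high performance buys an
attacker more relayed traffic, but not a larger share of anyone's
security-critical circuits.
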
\subsection{Trust and discovery}
\label{subsec:trust-and-discovery}

The published Tor design is deliberately simplistic in how
new nodes are authorized and how clients are informed about Tor
nodes and their status.

All nodes periodically upload a signed description
of their locations, keys, and capabilities to each of several well-known {\it
 directory servers}. These directory servers construct a signed summary
of all known Tor nodes (a ``directory''), and a signed statement of which
nodes they
currently believe to be operational (a ``network status''). Clients
periodically download a directory to learn the latest nodes and
keys, and more frequently download a network status to learn which nodes are
likely to be running. Tor nodes also operate as directory caches, to
lighten the bandwidth on the directory servers.

To prevent Sybil attacks (wherein an adversary signs up many
purportedly independent nodes to increase her network view),
this design
requires the directory server operators to manually
approve new nodes. Unapproved nodes are included in the directory,
but clients
do not use them at the start or end of their circuits. In practice,
directory administrators perform little actual verification, and tend to
approve any Tor node whose operator can compose a coherent email.
This procedure
may prevent trivial automated Sybil attacks, but will do little
against a clever and determined attacker.

There are a number of flaws in this system that need to be addressed as we
move forward. First,
each directory server represents an independent point of failure: any
compromised directory server could start recommending only compromised
nodes.
Second, as more nodes join the network, %the more unreasonable it
%becomes to expect clients to know about them all.
directories
become infeasibly large, and downloading the list of nodes becomes
burdensome.
Third, the validation scheme may do as much harm as it does good. It
does not prevent clever attackers from mounting Sybil attacks,
and it may deter node operators from joining the network---if
they expect the validation process to be difficult, or they do not share
any languages in common with the directory server operators.

We could try to move the system in several directions, depending on our
choice of threat model and requirements. If we did not need to increase
network capacity to support more users, we could simply
 adopt even stricter validation requirements, and reduce the number of
nodes in the network to a trusted minimum.
But we can only do that if we can simultaneously make node capacity
scale much more than we anticipate to be feasible soon, and if we can find
entities willing to run such nodes, an equally daunting prospect.

In order to address the first two issues, it seems wise to move to a system
including a number of semi-trusted directory servers, no one of which can
compromise a user on its own. Ultimately, of course, we cannot escape the
problem of a first introducer: since most users will run Tor in whatever
configuration the software ships with, the Tor distribution itself will
remain a single point of failure so long as it includes the seed
keys for directory servers, a list of directory servers, or any other means
to learn which nodes are on the network. But omitting this information
from the Tor distribution would only delegate the trust problem to each
individual user. %, most of whom are presumably less informed about how to make
%trust decisions than the Tor developers.
A well-publicized, widely available, authoritatively and independently
endorsed and signed list of initial directory servers and their keys
is a possible solution.
But, setting that up properly is itself a large +bootstrapping task. + +%Network discovery, sybil, node admission, scaling. It seems that the code +%will ship with something and that's our trust root. We could try to get +%people to build a web of trust, but no. Where we go from here depends +%on what threats we have in mind. Really decentralized if your threat is +%RIAA; less so if threat is to application data or individuals or... + + +\subsection{Measuring performance and capacity} +\label{subsec:performance} + +One of the paradoxes with engineering an anonymity network is that we'd like +to learn as much as we can about how traffic flows so we can improve the +network, but we want to prevent others from learning how traffic flows in +order to trace users' connections through the network. Furthermore, many +mechanisms that help Tor run efficiently +require measurements about the network. + +Currently, nodes try to deduce their own available bandwidth (based on how +much traffic they have been able to transfer recently) and include this +information in the descriptors they upload to the directory. Clients +choose servers weighted by their bandwidth, neglecting really slow +servers and capping the influence of really fast ones. +% +This is, of course, eminently cheatable. A malicious node can get a +disproportionate amount of traffic simply by claiming to have more bandwidth +than it does. But better mechanisms have their problems. If bandwidth data +is to be measured rather than self-reported, it is usually possible for +nodes to selectively provide better service for the measuring party, or +sabotage the measured value of other nodes. Complex solutions for +mix networks have been proposed, but do not address the issues +completely~\cite{mix-acc,casc-rep}. + +Even with no cheating, network measurement is complex. It is common +for views of a node's latency and/or bandwidth to vary wildly between +observers. Further, it is unclear whether total bandwidth is really +the right measure; perhaps clients should instead be considering nodes +based on unused bandwidth or observed throughput. +%How to measure performance without letting people selectively deny service +%by distinguishing pings. Heck, just how to measure performance at all. In +%practice people have funny firewalls that don't match up to their exit +%policies and Tor doesn't deal. +% +%Network investigation: Is all this bandwidth publishing thing a good idea? +%How can we collect stats better? Note weasel's smokeping, at +%http://seppia.noreply.org/cgi-bin/smokeping.cgi?target=Tor +%which probably gives george and steven enough info to break tor? +% +And even if we can collect and use this network information effectively, +we must ensure +that it is not more useful to attackers than to us. While it +seems plausible that bandwidth data alone is not enough to reveal +sender-recipient connections under most circumstances, it could certainly +reveal the path taken by large traffic flows under low-usage circumstances. + +\subsection{Non-clique topologies} + +Tor's comparatively weak threat model may allow easier scaling than +other +designs. High-latency mix networks need to avoid partitioning attacks, where +network splits let an attacker distinguish users in different partitions. +Since Tor assumes the adversary cannot cheaply observe nodes at will, +a network split may not decrease protection much. +Thus, one option when the scale of a Tor network +exceeds some size is simply to split it. 
Nodes could be allocated into
partitions in a way that makes it hard for collaborating hostile nodes to
take over a single partition~\cite{casc-rep}.
Clients could switch between
networks, even on a per-circuit basis.
%Future analysis may uncover
%other dangers beyond those affecting mix-nets.

More conservatively, we can try to scale a single Tor network. Likely
problems with adding more servers to a single Tor network include an
explosion in the number of sockets needed on each server as more servers
join, and increased coordination overhead to keep each user's view of
the network consistent. As we grow, we will also have more instances of
servers that can't reach each other simply due to Internet topology or
routing problems.

%include restricting the number of sockets and the amount of bandwidth
%used by each node. The number of sockets is determined by the network's
%connectivity and the number of users, while bandwidth capacity is determined
%by the total bandwidth of nodes on the network. The simplest solution to
%bandwidth capacity is to add more nodes, since adding a Tor node of any
%feasible bandwidth will increase the traffic capacity of the network. So as
%a first step to scaling, we should focus on making the network tolerate more
%nodes, by reducing the interconnectivity of the nodes; later we can reduce
%overhead associated with directories, discovery, and so on.

We can address these points by reducing the network's connectivity.
Danezis~\cite{danezis:pet2003} considers
the anonymity implications of restricting routes on mix networks and
recommends an approach based on expander graphs (where any subgraph is likely
to have many neighbors). It is not immediately clear that this approach will
extend to Tor, which has a weaker threat model but higher performance
requirements: instead of analyzing the
probability of an attacker's viewing whole paths, we will need to examine the
attacker's likelihood of compromising the endpoints.
%
Tor may not need an expander graph per se: it
may be enough to have a single central subnet that is highly connected, like
an Internet backbone. % As an
%example, assume fifty nodes of relatively high traffic capacity. This
%\emph{center} forms a clique. Assume each center node can
%handle 200 connections to other nodes (including the other ones in the
%center). Assume every noncenter node connects to three nodes in the
%center and anyone out of the center that they want to. Then the
%network easily scales to c. 2500 nodes with commensurate increase in
%bandwidth.
There are many open questions: how to distribute connectivity information
(presumably nodes will learn about the central nodes
when they download Tor), whether central nodes
will need to function as a `backbone', and so on. As above,
this could reduce the amount of anonymity available from a mix-net,
but for a low-latency network where anonymity derives largely from
the edges, it may be feasible.

%In a sense, Tor already has a non-clique topology.
%Individuals can set up and run Tor nodes without informing the
%directory servers. This allows groups to run a
%local Tor network of private nodes that connects to the public Tor
%network. This network is hidden behind the Tor network, and its
%only visible connection to Tor is at those points where it connects.
%As far as the public network, or anyone observing it, is concerned,
%they are running clients.
+} + +\section{The Future} +\label{sec:conclusion} + +Tor is the largest and most diverse low-latency anonymity network +available, but we are still in the beginning stages of deployment. Several +major questions remain. + +First, will our volunteer-based approach to sustainability work in the +long term? As we add more features and destabilize the network, the +developers spend a lot of time keeping the server operators happy. Even +though Tor is free software, the network would likely stagnate and die at +this stage if the developers stopped actively working on it. We may get +an unexpected boon from the fact that we're a general-purpose overlay +network: as Tor grows more popular, other groups who need an overlay +network on the Internet are starting to adapt Tor to their needs. +% +Second, Tor is only one of many components that preserve privacy online. +For applications where it is desirable to +keep identifying information out of application traffic, someone must build +more and better protocol-aware proxies that are usable by ordinary people. +% +Third, we need to gain a reputation for social good, and learn how to +coexist with the variety of Internet services and their established +authentication mechanisms. We can't just keep escalating the blacklist +standoff forever. +% +Fourth, the current Tor +architecture does not scale even to handle current user demand. We must +find designs and incentives to let some clients relay traffic too, without +sacrificing too much anonymity. + +These are difficult and open questions. Yet choosing not to solve them +means leaving most users to a less secure network or no anonymizing +network at all. + +\bibliographystyle{plain} \bibliography{tor-design} + +\end{document} + +\clearpage +\appendix + +\begin{figure}[t] +%\unitlength=1in +\centering +%\begin{picture}(6.0,2.0) +%\put(3,1){\makebox(0,0)[c]{\epsfig{figure=graphnodes,width=6in}}} +%\end{picture} +\mbox{\epsfig{figure=graphnodes,width=5in}} +\caption{Number of Tor nodes over time, through January 2005. Lowest +line is number of exit +nodes that allow connections to port 80. Middle line is total number of +verified (registered) Tor nodes. The line above that represents nodes +that are running but not yet registered.} +\label{fig:graphnodes} +\end{figure} + +\begin{figure}[t] +\centering +\mbox{\epsfig{figure=graphtraffic,width=5in}} +\caption{The sum of traffic reported by each node over time, through +January 2005. The bottom +pair show average throughput, and the top pair represent the largest 15 +minute burst in each 4 hour period.} +\label{fig:graphtraffic} +\end{figure} + + + +%Making use of nodes with little bandwidth, or high latency/packet loss. + +%Running Tor nodes behind NATs, behind great-firewalls-of-China, etc. +%Restricted routes. How to propagate to everybody the topology? BGP +%style doesn't work because we don't want just *one* path. Point to +%Geoff's stuff. + -- cgit v1.2.3-54-g00ecf