diff options
Diffstat (limited to 'doc/design-paper/challenges.tex')
-rw-r--r-- | doc/design-paper/challenges.tex | 1505 |
1 files changed, 0 insertions, 1505 deletions
diff --git a/doc/design-paper/challenges.tex b/doc/design-paper/challenges.tex deleted file mode 100644 index 6949693bf0..0000000000 --- a/doc/design-paper/challenges.tex +++ /dev/null @@ -1,1505 +0,0 @@ -\documentclass{llncs} - -\usepackage{url} -\usepackage{amsmath} -\usepackage{epsfig} - -\setlength{\textwidth}{5.9in} -\setlength{\textheight}{8.4in} -\setlength{\topmargin}{.5cm} -\setlength{\oddsidemargin}{1cm} -\setlength{\evensidemargin}{1cm} - -\newenvironment{tightlist}{\begin{list}{$\bullet$}{ - \setlength{\itemsep}{0mm} - \setlength{\parsep}{0mm} - % \setlength{\labelsep}{0mm} - % \setlength{\labelwidth}{0mm} - % \setlength{\topsep}{0mm} - }}{\end{list}} - -\begin{document} - -\title{Challenges in deploying low-latency anonymity (DRAFT)} - -\author{Roger Dingledine\inst{1} \and -Nick Mathewson\inst{1} \and -Paul Syverson\inst{2}} -\institute{The Free Haven Project \email{<\{arma,nickm\}@freehaven.net>} \and -Naval Research Laboratory \email{<syverson@itd.nrl.navy.mil>}} - -\maketitle -\pagestyle{plain} - -\begin{abstract} - There are many unexpected or unexpectedly difficult obstacles to - deploying anonymous communications. Drawing on our experiences deploying - Tor (the second-generation onion routing network), we describe social - challenges and technical issues that must be faced - in building, deploying, and sustaining a scalable, distributed, low-latency - anonymity network. -\end{abstract} - -\section{Introduction} -% Your network is not practical unless it is sustainable and distributed. -Anonymous communication is full of surprises. This paper discusses some -unexpected challenges arising from our experiences deploying Tor, a -low-latency general-purpose anonymous communication system. We will discuss -some of the difficulties we have experienced and how we have met them (or how -we plan to meet them, if we know). We also discuss some less -troublesome open problems that we must nevertheless eventually address. -%We will describe both those future challenges that we intend to explore and -%those that we have decided not to explore and why. - -Tor is an overlay network for anonymizing TCP streams over the -Internet~\cite{tor-design}. It addresses limitations in earlier Onion -Routing designs~\cite{or-ih96,or-jsac98,or-discex00,or-pet00} by adding -perfect forward secrecy, congestion control, directory servers, data -integrity, configurable exit policies, and location-hidden services using -rendezvous points. Tor works on the real-world Internet, requires no special -privileges or kernel modifications, requires little synchronization or -coordination between nodes, and provides a reasonable trade-off between -anonymity, usability, and efficiency. - -We deployed the public Tor network in October 2003; since then it has -grown to over a hundred volunteer-operated nodes -and as much as 80 megabits of -average traffic per second. Tor's research strategy has focused on deploying -a network to as many users as possible; thus, we have resisted designs that -would compromise deployability by imposing high resource demands on node -operators, and designs that would compromise usability by imposing -unacceptable restrictions on which applications we support. Although this -strategy has -drawbacks (including a weakened threat model, as discussed below), it has -made it possible for Tor to serve many thousands of users and attract -funding from diverse sources whose goals range from security on a -national scale down to individual liberties. - -In~\cite{tor-design} we gave an overall view of Tor's -design and goals. Here we describe some policy, social, and technical -issues that we face as we continue deployment. -Rather than providing complete solutions to every problem, we -instead lay out the challenges and constraints that we have observed while -deploying Tor. In doing so, we aim to provide a research agenda -of general interest to projects attempting to build -and deploy practical, usable anonymity networks in the wild. - -%While the Tor design paper~\cite{tor-design} gives an overall view its -%design and goals, -%this paper describes the policy and technical issues that Tor faces as -%we continue deployment. Rather than trying to provide complete solutions -%to every problem here, we lay out the assumptions and constraints -%that we have observed through deploying Tor in the wild. In doing so, we -%aim to create a research agenda for others to -%help in addressing these issues. -% Section~\ref{sec:what-is-tor} gives an -%overview of the Tor -%design and ours goals. Sections~\ref{sec:crossroads-policy} -%and~\ref{sec:crossroads-design} go on to describe the practical challenges, -%both policy and technical respectively, -%that stand in the way of moving -%from a practical useful network to a practical useful anonymous network. - -%\section{What Is Tor} -\section{Background} -Here we give a basic overview of the Tor design and its properties, and -compare Tor to other low-latency anonymity designs. - -\subsection{Tor, threat models, and distributed trust} -\label{sec:what-is-tor} - -%Here we give a basic overview of the Tor design and its properties. For -%details on the design, assumptions, and security arguments, we refer -%the reader to the Tor design paper~\cite{tor-design}. - -Tor provides \emph{forward privacy}, so that users can connect to -Internet sites without revealing their logical or physical locations -to those sites or to observers. It also provides \emph{location-hidden -services}, so that servers can support authorized users without -giving an effective vector for physical or online attackers. -Tor provides these protections even when a portion of its -infrastructure is compromised. - -To connect to a remote server via Tor, the client software learns a signed -list of Tor nodes from one of several central \emph{directory servers}, and -incrementally creates a private pathway or \emph{circuit} of encrypted -connections through authenticated Tor nodes on the network, negotiating a -separate set of encryption keys for each hop along the circuit. The circuit -is extended one node at a time, and each node along the way knows only the -immediately previous and following nodes in the circuit, so no individual Tor -node knows the complete path that each fixed-sized data packet (or -\emph{cell}) will take. -%Because each node sees no more than one hop in the -%circuit, -Thus, neither an eavesdropper nor a compromised node can -see both the connection's source and destination. Later requests use a new -circuit, to complicate long-term linkability between different actions by -a single user. - -Tor also helps servers hide their locations while -providing services such as web publishing or instant -messaging. Using ``rendezvous points'', other Tor users can -connect to these authenticated hidden services, neither one learning the -other's network identity. - -Tor attempts to anonymize the transport layer, not the application layer. -This approach is useful for applications such as SSH -where authenticated communication is desired. However, when anonymity from -those with whom we communicate is desired, -application protocols that include personally identifying information need -additional application-level scrubbing proxies, such as -Privoxy~\cite{privoxy} for HTTP\@. Furthermore, Tor does not relay arbitrary -IP packets; it only anonymizes TCP streams and DNS requests -%, and only supports -%connections via SOCKS -(but see Section~\ref{subsec:tcp-vs-ip}). - -Most node operators do not want to allow arbitrary TCP traffic. % to leave -%their server. -To address this, Tor provides \emph{exit policies} so -each exit node can block the IP addresses and ports it is unwilling to allow. -Tor nodes advertise their exit policies to the directory servers, so that -clients can tell which nodes will support their connections. - -As of January 2005, the Tor network has grown to around a hundred nodes -on four continents, with a total capacity exceeding 1Gbit/s. Appendix A -shows a graph of the number of working nodes over time, as well as a -graph of the number of bytes being handled by the network over time. -The network is now sufficiently diverse for further development -and testing; but of course we always encourage new nodes -to join. - -Tor research and development has been funded by ONR and DARPA -for use in securing government -communications, and by the Electronic Frontier Foundation for use -in maintaining civil liberties for ordinary citizens online. The Tor -protocol is one of the leading choices -for the anonymizing layer in the European Union's PRIME directive to -help maintain privacy in Europe. -The AN.ON project in Germany -has integrated an independent implementation of the Tor protocol into -their popular Java Anon Proxy anonymizing client. -% This wide variety of -%interests helps maintain both the stability and the security of the -%network. - -\medskip -\noindent -{\bf Threat models and design philosophy.} -The ideal Tor network would be practical, useful and anonymous. When -trade-offs arise between these properties, Tor's research strategy has been -to remain useful enough to attract many users, -and practical enough to support them. Only subject to these -constraints do we try to maximize -anonymity.\footnote{This is not the only possible -direction in anonymity research: designs exist that provide more anonymity -than Tor at the expense of significantly increased resource requirements, or -decreased flexibility in application support (typically because of increased -latency). Such research does not typically abandon aspirations toward -deployability or utility, but instead tries to maximize deployability and -utility subject to a certain degree of structural anonymity (structural because -usability and practicality affect usage which affects the actual anonymity -provided by the network \cite{econymics,back01}).} -%{We believe that these -%approaches can be promising and useful, but that by focusing on deploying a -%usable system in the wild, Tor helps us experiment with the actual parameters -%of what makes a system ``practical'' for volunteer operators and ``useful'' -%for home users, and helps illuminate undernoticed issues which any deployed -%volunteer anonymity network will need to address.} -Because of our strategy, Tor has a weaker threat model than many designs in -the literature. In particular, because we -support interactive communications without impractically expensive padding, -we fall prey to a variety -of intra-network~\cite{back01,attack-tor-oak05,flow-correlation04} and -end-to-end~\cite{danezis:pet2004,SS03} anonymity-breaking attacks. - -Tor does not attempt to defend against a global observer. In general, an -attacker who can measure both ends of a connection through the Tor network -% I say 'measure' rather than 'observe', to encompass murdoch-danezis -% style attacks. -RD -can correlate the timing and volume of data on that connection as it enters -and leaves the network, and so link communication partners. -Known solutions to this attack would seem to require introducing a -prohibitive degree of traffic padding between the user and the network, or -introducing an unacceptable degree of latency (but see Section -\ref{subsec:mid-latency}). Also, it is not clear that these methods would -work at all against a minimally active adversary who could introduce timing -patterns or additional traffic. Thus, Tor only attempts to defend against -external observers who cannot observe both sides of a user's connections. - - -Against internal attackers who sign up Tor nodes, the situation is more -complicated. In the simplest case, if an adversary has compromised $c$ of -$n$ nodes on the Tor network, then the adversary will be able to compromise -a random circuit with probability $\frac{c^2}{n^2}$ (since the circuit -initiator chooses hops randomly). But there are -complicating factors: -(1)~If the user continues to build random circuits over time, an adversary - is pretty certain to see a statistical sample of the user's traffic, and - thereby can build an increasingly accurate profile of her behavior. (See - Section~\ref{subsec:helper-nodes} for possible solutions.) -(2)~An adversary who controls a popular service outside the Tor network - can be certain to observe all connections to that service; he - can therefore trace connections to that service with probability - $\frac{c}{n}$. -(3)~Users do not in fact choose nodes with uniform probability; they - favor nodes with high bandwidth or uptime, and exit nodes that - permit connections to their favorite services. -(See Section~\ref{subsec:routing-zones} for discussion of larger -adversaries and our dispersal goals.) - -% I'm trying to make this paragraph work without reference to the -% analysis/confirmation distinction, which we haven't actually introduced -% yet, and which we realize isn't very stable anyway. Also, I don't want to -% deprecate these attacks if we can't demonstrate that they don't work, since -% in case they *do* turn out to work well against Tor, we'll look pretty -% foolish. -NM -More powerful attacks may exist. In \cite{hintz-pet02} it was -shown that an attacker who can catalog data volumes of popular -responder destinations (say, websites with consistent data volumes) may not -need to -observe both ends of a stream to learn source-destination links for those -responders. -Similarly, latencies of going through various routes can be -cataloged~\cite{back01} to connect endpoints. -% Also, \cite{kesdogan:pet2002} takes the -% attack another level further, to narrow down where you could be -% based on an intersection attack on subpages in a website. -RD -It has not yet been shown whether these attacks will succeed or fail -in the presence of the variability and volume quantization introduced by the -Tor network, but it seems likely that these factors will at best delay -rather than halt the attacks in the cases where they succeed. -Along similar lines, the same paper suggests a ``clogging -attack'' in which the throughput on a circuit is observed to slow -down when an adversary clogs the right nodes with his own traffic. -To determine the nodes in a circuit this attack requires the ability -to continuously monitor the traffic exiting the network on a circuit -that is up long enough to probe all network nodes in binary fashion. -% Though somewhat related, clogging and interference are really different -% attacks with different assumptions about adversary distribution and -% capabilities as well as different techniques. -pfs -Murdoch and Danezis~\cite{attack-tor-oak05} show a practical -interference attack against portions of -the fifty node Tor network as deployed in mid 2004. -An outside attacker can actively trace a circuit through the Tor network -by observing changes in the latency of his -own traffic sent through various Tor nodes. This can be done -simultaneously at multiple nodes; however, like clogging, -this attack only reveals -the Tor nodes in the circuit, not initiator and responder addresses, -so it is still necessary to discover the endpoints to complete an -effective attack. Increasing the size and diversity of the Tor network may -help counter these attacks. - -%discuss $\frac{c^2}{n^2}$, except how in practice the chance of owning -%the last hop is not $c/n$ since that doesn't take the destination (website) -%into account. so in cases where the adversary does not also control the -%final destination we're in good shape, but if he *does* then we'd be better -%off with a system that lets each hop choose a path. -% -%Isn't it more accurate to say ``If the adversary _always_ controls the final -% dest, we would be just as well off with such as system.'' ? If not, why -% not? -nm -% Sure. In fact, better off, since they seem to scale more easily. -rd - -%Murdoch and Danezis describe an attack -%\cite{attack-tor-oak05} that lets an attacker determine the nodes used -%in a circuit; yet s/he cannot identify the initiator or responder, -%e.g., client or web server, through this attack. So the endpoints -%remain secure, which is the goal. It is conceivable that an -%adversary could attack or set up observation of all connections -%to an arbitrary Tor node in only a few minutes. If such an adversary -%were to exist, s/he could use this probing to remotely identify a node -%for further attack. Of more likely immediate practical concern -%an adversary with active access to the responder traffic -%wants to keep a circuit alive long enough to attack an identified -%node. Thus it is important to prevent the responding end of the circuit -%from keeping it open indefinitely. -%Also, someone could identify nodes in this way and if in their -%jurisdiction, immediately get a subpoena (if they even need one) -%telling the node operator(s) that she must retain all the active -%circuit data she now has. -%Further, the enclave model, which had previously looked to be the most -%generally secure, seems particularly threatened by this attack, since -%it identifies endpoints when they're also nodes in the Tor network: -%see Section~\ref{subsec:helper-nodes} for discussion of some ways to -%address this issue. - -\medskip -\noindent -{\bf Distributed trust.} -In practice Tor's threat model is based on -dispersal and diversity. -Our defense lies in having a diverse enough set of nodes -to prevent most real-world -adversaries from being in the right places to attack users, -by distributing each transaction -over several nodes in the network. This ``distributed trust'' approach -means the Tor network can be safely operated and used by a wide variety -of mutually distrustful users, providing sustainability and security. -%than some previous attempts at anonymizing networks. - -No organization can achieve this security on its own. If a single -corporation or government agency were to build a private network to -protect its operations, any connections entering or leaving that network -would be obviously linkable to the controlling organization. The members -and operations of that agency would be easier, not harder, to distinguish. - -Instead, to protect our networks from traffic analysis, we must -collaboratively blend the traffic from many organizations and private -citizens, so that an eavesdropper can't tell which users are which, -and who is looking for what information. %By bringing more users onto -%the network, all users become more secure~\cite{econymics}. -%[XXX I feel uncomfortable saying this last sentence now. -RD] -%[So, I took it out. I think we can do without it. -PFS] -The Tor network has a broad range of users, including ordinary citizens -concerned about their privacy, corporations -who don't want to reveal information to their competitors, and law -enforcement and government intelligence agencies who need -to do operations on the Internet without being noticed. -Naturally, organizations will not want to depend on others for their -security. If most participating providers are reliable, Tor tolerates -some hostile infiltration of the network. For maximum protection, -the Tor design includes an enclave approach that lets data be encrypted -(and authenticated) end-to-end, so high-sensitivity users can be sure it -hasn't been read or modified. This even works for Internet services that -don't have built-in encryption and authentication, such as unencrypted -HTTP or chat, and it requires no modification of those services. - -\subsection{Related work} -Tor differs from other deployed systems for traffic analysis resistance -in its security and flexibility. Mix networks such as -Mixmaster~\cite{mixmaster-spec} or its successor Mixminion~\cite{minion-design} -gain the highest degrees of anonymity at the expense of introducing highly -variable delays, making them unsuitable for applications such as web -browsing. Commercial single-hop -proxies~\cite{anonymizer} can provide good performance, but -a single compromise can expose all users' traffic, and a single-point -eavesdropper can perform traffic analysis on the entire network. -%Also, their proprietary implementations place any infrastructure that -%depends on these single-hop solutions at the mercy of their providers' -%financial health as well as network security. -The Java -Anon Proxy~\cite{web-mix} provides similar functionality to Tor but -handles only web browsing rather than all TCP\@. -%Some peer-to-peer file-sharing overlay networks such as -%Freenet~\cite{freenet} and Mute~\cite{mute} -The Freedom -network from Zero-Knowledge Systems~\cite{freedom21-security} -was even more flexible than Tor in -transporting arbitrary IP packets, and also supported -pseudonymity in addition to anonymity; but it had -a different approach to sustainability (collecting money from users -and paying ISPs to run Tor nodes), and was eventually shut down due to financial -load. Finally, %potentially more scalable -% [I had added 'potentially' because the scalability of these designs -% is not established, and I am uncomfortable making the -% bolder unmodified assertion. Roger took 'potentially' out. -% Here's an attempt at more neutral wording -pfs] -peer-to-peer designs that are intended to be more scalable, -for example Tarzan~\cite{tarzan:ccs02} and -MorphMix~\cite{morphmix:fc04}, have been proposed in the literature but -have not been fielded. These systems differ somewhat -in threat model and presumably practical resistance to threats. -Note that MorphMix differs from Tor only in -node discovery and circuit setup; so Tor's architecture is flexible -enough to contain a MorphMix experiment. -We direct the interested reader -to~\cite{tor-design} for a more in-depth review of related work. - -%XXXX six-four. crowds. i2p. - -%XXXX -%have a serious discussion of morphmix's assumptions, since they would -%seem to be the direct competition. in fact tor is a flexible architecture -%that would encompass morphmix, and they're nearly identical except for -%path selection and node discovery. and the trust system morphmix has -%seems overkill (and/or insecure) based on the threat model we've picked. -% this para should probably move to the scalability / directory system. -RD -% Nope. Cut for space, except for small comment added above -PFS - -\section{Social challenges} - -Many of the issues the Tor project needs to address extend beyond -system design and technology development. In particular, the -Tor project's \emph{image} with respect to its users and the rest of -the Internet impacts the security it can provide. -With this image issue in mind, this section discusses the Tor user base and -Tor's interaction with other services on the Internet. - -\subsection{Communicating security} - -Usability for anonymity systems -contributes to their security, because usability -affects the possible anonymity set~\cite{econymics,back01}. -Conversely, an unusable system attracts few users and thus can't provide -much anonymity. - -This phenomenon has a second-order effect: knowing this, users should -choose which anonymity system to use based in part on how usable -and secure -\emph{others} will find it, in order to get the protection of a larger -anonymity set. Thus we might supplement the adage ``usability is a security -parameter''~\cite{back01} with a new one: ``perceived usability is a -security parameter.'' From here we can better understand the effects -of publicity on security: the more convincing your -advertising, the more likely people will believe you have users, and thus -the more users you will attract. Perversely, over-hyped systems (if they -are not too broken) may be a better choice than modestly promoted ones, -if the hype attracts more users~\cite{usability-network-effect}. - -So it follows that we should come up with ways to accurately communicate -the available security levels to the user, so she can make informed -decisions. JAP aims to do this by including a -comforting `anonymity meter' dial in the software's graphical interface, -giving the user an impression of the level of protection for her current -traffic. - -However, there's a catch. For users to share the same anonymity set, -they need to act like each other. An attacker who can distinguish -a given user's traffic from the rest of the traffic will not be -distracted by anonymity set size. For high-latency systems like -Mixminion, where the threat model is based on mixing messages with each -other, there's an arms race between end-to-end statistical attacks and -counter-strategies~\cite{statistical-disclosure,minion-design,e2e-traffic,trickle02}. -But for low-latency systems like Tor, end-to-end \emph{traffic -correlation} attacks~\cite{danezis:pet2004,defensive-dropping,SS03} -allow an attacker who can observe both ends of a communication -to correlate packet timing and volume, quickly linking -the initiator to her destination. - -Like Tor, the current JAP implementation does not pad connections -apart from using small fixed-size cells for transport. In fact, -JAP's cascade-based network topology may be more vulnerable to these -attacks, because its network has fewer edges. JAP was born out of -the ISDN mix design~\cite{isdn-mixes}, where padding made sense because -every user had a fixed bandwidth allocation and altering the timing -pattern of packets could be immediately detected. But in its current context -as an Internet web anonymizer, adding sufficient padding to JAP -would probably be prohibitively expensive and ineffective against a -minimally active attacker.\footnote{Even if JAP could -fund higher-capacity nodes indefinitely, our experience -suggests that many users would not accept the increased per-user -bandwidth requirements, leading to an overall much smaller user base. But -see Section~\ref{subsec:mid-latency}.} Therefore, since under this threat -model the number of concurrent users does not seem to have much impact -on the anonymity provided, we suggest that JAP's anonymity meter is not -accurately communicating security levels to its users. - -On the other hand, while the number of active concurrent users may not -matter as much as we'd like, it still helps to have some other users -on the network. We investigate this issue next. - -\subsection{Reputability and perceived social value} -Another factor impacting the network's security is its reputability: -the perception of its social value based on its current user base. If Alice is -the only user who has ever downloaded the software, it might be socially -accepted, but she's not getting much anonymity. Add a thousand -activists, and she's anonymous, but everyone thinks she's an activist too. -Add a thousand -diverse citizens (cancer survivors, privacy enthusiasts, and so on) -and now she's harder to profile. - -Furthermore, the network's reputability affects its operator base: more people -are willing to run a service if they believe it will be used by human rights -workers than if they believe it will be used exclusively for disreputable -ends. This effect becomes stronger if node operators themselves think they -will be associated with their users' disreputable ends. - -So the more cancer survivors on Tor, the better for the human rights -activists. The more malicious hackers, the worse for the normal users. Thus, -reputability is an anonymity issue for two reasons. First, it impacts -the sustainability of the network: a network that's always about to be -shut down has difficulty attracting and keeping adequate nodes. -Second, a disreputable network is more vulnerable to legal and -political attacks, since it will attract fewer supporters. - -While people therefore have an incentive for the network to be used for -``more reputable'' activities than their own, there are still trade-offs -involved when it comes to anonymity. To follow the above example, a -network used entirely by cancer survivors might welcome file sharers -onto the network, though of course they'd prefer a wider -variety of users. - -Reputability becomes even more tricky in the case of privacy networks, -since the good uses of the network (such as publishing by journalists in -dangerous countries) are typically kept private, whereas network abuses -or other problems tend to be more widely publicized. - -The impact of public perception on security is especially important -during the bootstrapping phase of the network, where the first few -widely publicized uses of the network can dictate the types of users it -attracts next. -As an example, some U.S.~Department of Energy -penetration testing engineers are tasked with compromising DoE computers -from the outside. They only have a limited number of ISPs from which to -launch their attacks, and they found that the defenders were recognizing -attacks because they came from the same IP space. These engineers wanted -to use Tor to hide their tracks. First, from a technical standpoint, -Tor does not support the variety of IP packets one would like to use in -such attacks (see Section~\ref{subsec:tcp-vs-ip}). But aside from this, -we also decided that it would probably be poor precedent to encourage -such use---even legal use that improves national security---and managed -to dissuade them. - -%% "outside of academia, jap has just lost, permanently". (That is, -%% even though the crime detection issues are resolved and are unlikely -%% to go down the same way again, public perception has not been kind.) - -\subsection{Sustainability and incentives} -One of the unsolved problems in low-latency anonymity designs is -how to keep the nodes running. ZKS's Freedom network -depended on paying third parties to run its servers; the JAP project's -bandwidth depends on grants to pay for its bandwidth and -administrative expenses. In Tor, bandwidth and administrative costs are -distributed across the volunteers who run Tor nodes, so we at least have -reason to think that the Tor network could survive without continued research -funding.\footnote{It also helps that Tor is implemented with free and open - source software that can be maintained by anybody with the ability and - inclination.} But why are these volunteers running nodes, and what can we -do to encourage more volunteers to do so? - -We have not formally surveyed Tor node operators to learn why they are -running nodes, but -from the information they have provided, it seems that many of them run Tor -nodes for reasons of personal interest in privacy issues. It is possible -that others are running Tor nodes to protect their own -anonymity, but of course they are -hardly likely to tell us specifics if they are. -%Significantly, Tor's threat model changes the anonymity incentives for running -%a node. In a high-latency mix network, users can receive additional -%anonymity by running their own node, since doing so obscures when they are -%injecting messages into the network. But, anybody observing all I/O to a Tor -%node can tell when the node is generating traffic that corresponds to -%none of its incoming traffic. -% -%I didn't buy the above for reason's subtle enough that I just cut it -PFS -Tor exit node operators do attain a degree of -``deniability'' for traffic that originates at that exit node. For - example, it is likely in practice that HTTP requests from a Tor node's IP - will be assumed to be from the Tor network. - More significantly, people and organizations who use Tor for - anonymity depend on the - continued existence of the Tor network to do so; running a node helps to - keep the network operational. -%\item Local Tor entry and exit nodes allow users on a network to run in an -% `enclave' configuration. [XXXX need to resolve this. They would do this -% for E2E encryption + auth?] - - -%We must try to make the costs of running a Tor node easily minimized. -Since Tor is run by volunteers, the most crucial software usability issue is -usability by operators: when an operator leaves, the network becomes less -usable by everybody. To keep operators pleased, we must try to keep Tor's -resource and administrative demands as low as possible. - -Because of ISP billing structures, many Tor operators have underused capacity -that they are willing to donate to the network, at no additional monetary -cost to them. Features to limit bandwidth have been essential to adoption. -Also useful has been a ``hibernation'' feature that allows a Tor node that -wants to provide high bandwidth, but no more than a certain amount in a -giving billing cycle, to become dormant once its bandwidth is exhausted, and -to reawaken at a random offset into the next billing cycle. This feature has -interesting policy implications, however; see -the next section below. -Exit policies help to limit administrative costs by limiting the frequency of -abuse complaints (see Section~\ref{subsec:tor-and-blacklists}). We discuss -technical incentive mechanisms in Section~\ref{subsec:incentives-by-design}. - -%[XXXX say more. Why else would you run a node? What else can we do/do we -% already do to make running a node more attractive?] -%[We can enforce incentives; see Section 6.1. We can rate-limit clients. -% We can put "top bandwidth nodes lists" up a la seti@home.] - -\subsection{Bandwidth and file-sharing} -\label{subsec:bandwidth-and-file-sharing} -%One potentially problematical area with deploying Tor has been our response -%to file-sharing applications. -Once users have configured their applications to work with Tor, the largest -remaining usability issue is performance. Users begin to suffer -when websites ``feel slow.'' -Clients currently try to build their connections through nodes that they -guess will have enough bandwidth. But even if capacity is allocated -optimally, it seems unlikely that the current network architecture will have -enough capacity to provide every user with as much bandwidth as she would -receive if she weren't using Tor, unless far more nodes join the network. - -%Limited capacity does not destroy the network, however. Instead, usage tends -%towards an equilibrium: when performance suffers, users who value performance -%over anonymity tend to leave the system, thus freeing capacity until the -%remaining users on the network are exactly those willing to use that capacity -%there is. - -Much of Tor's recent bandwidth difficulties have come from file-sharing -applications. These applications provide two challenges to -any anonymizing network: their intensive bandwidth requirement, and the -degree to which they are associated (correctly or not) with copyright -infringement. - -High-bandwidth protocols can make the network unresponsive, -but tend to be somewhat self-correcting as lack of bandwidth drives away -users who need it. Issues of copyright violation, -however, are more interesting. Typical exit node operators want to help -people achieve private and anonymous speech, not to help people (say) host -Vin Diesel movies for download; and typical ISPs would rather not -deal with customers who draw menacing letters -from the MPAA\@. While it is quite likely that the operators are doing nothing -illegal, many ISPs have policies of dropping users who get repeated legal -threats regardless of the merits of those threats, and many operators would -prefer to avoid receiving even meritless legal threats. -So when letters arrive, operators are likely to face -pressure to block file-sharing applications entirely, in order to avoid the -hassle. - -But blocking file-sharing is not easy: popular -protocols have evolved to run on non-standard ports to -get around other port-based bans. Thus, exit node operators who want to -block file-sharing would have to find some way to integrate Tor with a -protocol-aware exit filter. This could be a technically expensive -undertaking, and one with poor prospects: it is unlikely that Tor exit nodes -would succeed where so many institutional firewalls have failed. Another -possibility for sensitive operators is to run a restrictive node that -only permits exit connections to a restricted range of ports that are -not frequently associated with file sharing. There are increasingly few such -ports. - -Other possible approaches might include rate-limiting connections, especially -long-lived connections or connections to file-sharing ports, so that -high-bandwidth connections do not flood the network. We might also want to -give priority to cells on low-bandwidth connections to keep them interactive, -but this could have negative anonymity implications. - -For the moment, it seems that Tor's bandwidth issues have rendered it -unattractive for bulk file-sharing traffic; this may continue to be so in the -future. Nevertheless, Tor will likely remain attractive for limited use in -file-sharing protocols that have separate control and data channels. - -%[We should say more -- but what? That we'll see a similar -% equilibriating effect as with bandwidth, where sensitive ops switch to -% middleman, and we become less useful for file-sharing, so the file-sharing -% people back off, so we get more ops since there's less file-sharing, so the -% file-sharers come back, etc.] - -%XXXX -%in practice, plausible deniability is hypothetical and doesn't seem very -%convincing. if ISPs find the activity antisocial, they don't care *why* -%your computer is doing that behavior. - -\subsection{Tor and blacklists} -\label{subsec:tor-and-blacklists} - -It was long expected that, alongside legitimate users, Tor would also -attract troublemakers who exploit Tor to abuse services on the -Internet with vandalism, rude mail, and so on. -Our initial answer to this situation was to use ``exit policies'' -to allow individual Tor nodes to block access to specific IP/port ranges. -This approach aims to make operators more willing to run Tor by allowing -them to prevent their nodes from being used for abusing particular -services. For example, all Tor nodes currently block SMTP (port 25), -to avoid being used for spam. - -Exit policies are useful, but they are insufficient: if not all nodes -block a given service, that service may try to block Tor instead. -While being blockable is important to being good netizens, we would like -to encourage services to allow anonymous access. Services should not -need to decide between blocking legitimate anonymous use and allowing -unlimited abuse. - -This is potentially a bigger problem than it may appear. -On the one hand, services should be allowed to refuse connections from -sources of possible abuse. -But when a Tor node administrator decides whether he prefers to be able -to post to Wikipedia from his IP address, or to allow people to read -Wikipedia anonymously through his Tor node, he is making the decision -for others as well. (For a while, Wikipedia -blocked all posting from all Tor nodes based on IP addresses.) If -the Tor node shares an address with a campus or corporate NAT, -then the decision can prevent the entire population from posting. -This is a loss for both Tor -and Wikipedia: we don't want to compete for (or divvy up) the -NAT-protected entities of the world. - -Worse, many IP blacklists are coarse-grained: they ignore Tor's exit -policies, partly because it's easier to implement and partly -so they can punish -all Tor nodes. One IP blacklist even bans -every class C network that contains a Tor node, and recommends banning SMTP -from these networks even though Tor does not allow SMTP at all. This -strategic decision aims to discourage the -operation of anything resembling an open proxy by encouraging its neighbors -to shut it down to get unblocked themselves. This pressure even -affects Tor nodes running in middleman mode (disallowing all exits) when -those nodes are blacklisted too. - -Problems of abuse occur mainly with services such as IRC networks and -Wikipedia, which rely on IP blocking to ban abusive users. While at first -blush this practice might seem to depend on the anachronistic assumption that -each IP is an identifier for a single user, it is actually more reasonable in -practice: it assumes that non-proxy IPs are a costly resource, and that an -abuser can not change IPs at will. By blocking IPs which are used by Tor -nodes, open proxies, and service abusers, these systems hope to make -ongoing abuse difficult. Although the system is imperfect, it works -tolerably well for them in practice. - -Of course, we would prefer that legitimate anonymous users be able to -access abuse-prone services. One conceivable approach would require -would-be IRC users, for instance, to register accounts if they want to -access the IRC network from Tor. In practice this would not -significantly impede abuse if creating new accounts were easily automatable; -this is why services use IP blocking. To deter abuse, pseudonymous -identities need to require a significant switching cost in resources or human -time. Some popular webmail applications -impose cost with Reverse Turing Tests, but this step may not deter all -abusers. Freedom used blind signatures to limit -the number of pseudonyms for each paying account, but Tor has neither the -ability nor the desire to collect payment. - -We stress that as far as we can tell, most Tor uses are not -abusive. Most services have not complained, and others are actively -working to find ways besides banning to cope with the abuse. For example, -the Freenode IRC network had a problem with a coordinated group of -abusers joining channels and subtly taking over the conversation; but -when they labelled all users coming from Tor IPs as ``anonymous users,'' -removing the ability of the abusers to blend in, the abuse stopped. - -%The use of squishy IP-based ``authentication'' and ``authorization'' -%has not broken down even to the level that SSNs used for these -%purposes have in commercial and public record contexts. Externalities -%and misplaced incentives cause a continued focus on fighting identity -%theft by protecting SSNs rather than developing better authentication -%and incentive schemes \cite{price-privacy}. Similarly we can expect a -%continued use of identification by IP number as long as there is no -%workable alternative. - -%[XXX Mention correct DNS-RBL implementation. -NM] - -\section{Design choices} - -In addition to social issues, Tor also faces some design trade-offs that must -be investigated as the network develops. - -\subsection{Transporting the stream vs transporting the packets} -\label{subsec:stream-vs-packet} -\label{subsec:tcp-vs-ip} - -Tor transports streams; it does not tunnel packets. -It has often been suggested that like the old Freedom -network~\cite{freedom21-security}, Tor should -``obviously'' anonymize IP traffic -at the IP layer. Before this could be done, many issues need to be resolved: - -\begin{enumerate} -\setlength{\itemsep}{0mm} -\setlength{\parsep}{0mm} -\item \emph{IP packets reveal OS characteristics.} We would still need to do -IP-level packet normalization, to stop things like TCP fingerprinting -attacks. %There likely exist libraries that can help with this. -This is unlikely to be a trivial task, given the diversity and complexity of -TCP stacks. -\item \emph{Application-level streams still need scrubbing.} We still need -Tor to be easy to integrate with user-level application-specific proxies -such as Privoxy. So it's not just a matter of capturing packets and -anonymizing them at the IP layer. -\item \emph{Certain protocols will still leak information.} For example, we -must rewrite DNS requests so they are delivered to an unlinkable DNS server -rather than the DNS server at a user's ISP; thus, we must understand the -protocols we are transporting. -\item \emph{The crypto is unspecified.} First we need a block-level encryption -approach that can provide security despite -packet loss and out-of-order delivery. Freedom allegedly had one, but it was -never publicly specified. -Also, TLS over UDP is not yet implemented or -specified, though some early work has begun~\cite{dtls}. -\item \emph{We'll still need to tune network parameters.} Since the above -encryption system will likely need sequence numbers (and maybe more) to do -replay detection, handle duplicate frames, and so on, we will be reimplementing -a subset of TCP anyway---a notoriously tricky path. -\item \emph{Exit policies for arbitrary IP packets mean building a secure -IDS\@.} Our node operators tell us that exit policies are one of -the main reasons they're willing to run Tor. -Adding an Intrusion Detection System to handle exit policies would -increase the security complexity of Tor, and would likely not work anyway, -as evidenced by the entire field of IDS and counter-IDS papers. Many -potential abuse issues are resolved by the fact that Tor only transports -valid TCP streams (as opposed to arbitrary IP including malformed packets -and IP floods), so exit policies become even \emph{more} important as -we become able to transport IP packets. We also need to compactly -describe exit policies so clients can predict -which nodes will allow which packets to exit. -\item \emph{The Tor-internal name spaces would need to be redesigned.} We -support hidden service {\tt{.onion}} addresses (and other special addresses, -like {\tt{.exit}} which lets the user request a particular exit node), -by intercepting the addresses when they are passed to the Tor client. -Doing so at the IP level would require a more complex interface between -Tor and the local DNS resolver. -\end{enumerate} - -This list is discouragingly long, but being able to transport more -protocols obviously has some advantages. It would be good to learn which -items are actual roadblocks and which are easier to resolve than we think. - -To be fair, Tor's stream-based approach has run into -stumbling blocks as well. While Tor supports the SOCKS protocol, -which provides a standardized interface for generic TCP proxies, many -applications do not support SOCKS\@. For them we already need to -replace the networking system calls with SOCKS-aware -versions, or run a SOCKS tunnel locally, neither of which is -easy for the average user. %---even with good instructions. -Even when applications can use SOCKS, they often make DNS requests -themselves before handing an IP address to Tor, which advertises -where the user is about to connect. -We are still working on more usable solutions. - -%So to actually provide good anonymity, we need to make sure that -%users have a practical way to use Tor anonymously. Possibilities include -%writing wrappers for applications to anonymize them automatically; improving -%the applications' support for SOCKS; writing libraries to help application -%writers use Tor properly; and implementing a local DNS proxy to reroute DNS -%requests to Tor so that applications can simply point their DNS resolvers at -%localhost and continue to use SOCKS for data only. - -\subsection{Mid-latency} -\label{subsec:mid-latency} - -Some users need to resist traffic correlation attacks. Higher-latency -mix-networks introduce variability into message -arrival times: as timing variance increases, timing correlation attacks -require increasingly more data~\cite{e2e-traffic}. Can we improve Tor's -resistance without losing too much usability? - -We need to learn whether we can trade a small increase in latency -for a large anonymity increase, or if we'd end up trading a lot of -latency for only a minimal security gain. A trade-off might be worthwhile -even if we -could only protect certain use cases, such as infrequent short-duration -transactions. % To answer this question -We might adapt the techniques of~\cite{e2e-traffic} to a lower-latency mix -network, where the messages are batches of cells in temporally clustered -connections. These large fixed-size batches can also help resist volume -signature attacks~\cite{hintz-pet02}. We could also experiment with traffic -shaping to get a good balance of throughput and security. -%Other padding regimens might supplement the -%mid-latency option; however, we should continue the caution with which -%we have always approached padding lest the overhead cost us too much -%performance or too many volunteers. - -We must keep usability in mind too. How much can latency increase -before we drive users away? We've already been forced to increase -latency slightly, as our growing network incorporates more DSL and -cable-modem nodes and more nodes in distant continents. Perhaps we can -harness this increased latency to improve anonymity rather than just -reduce usability. Further, if we let clients label certain circuits as -mid-latency as they are constructed, we could handle both types of traffic -on the same network, giving users a choice between speed and security---and -giving researchers a chance to experiment with parameters to improve the -quality of those choices. - -\subsection{Enclaves and helper nodes} -\label{subsec:helper-nodes} - -It has long been thought that users can improve their anonymity by -running their own node~\cite{tor-design,or-ih96,or-pet00}, and using -it in an \emph{enclave} configuration, where all their circuits begin -at the node under their control. Running Tor clients or servers at -the enclave perimeter is useful when policy or other requirements -prevent individual machines within the enclave from running Tor -clients~\cite{or-jsac98,or-discex00}. - -Of course, Tor's default path length of -three is insufficient for these enclaves, since the entry and/or exit -% [edit war: without the ``and/'' the natural reading here -% is aut rather than vel. And the use of the plural verb does not work -pfs] -themselves are sensitive. Tor thus increments path length by one -for each sensitive endpoint in the circuit. -Enclaves also help to protect against end-to-end attacks, since it's -possible that traffic coming from the node has simply been relayed from -elsewhere. However, if the node has recognizable behavior patterns, -an attacker who runs nodes in the network can triangulate over time to -gain confidence that it is in fact originating the traffic. Wright et -al.~\cite{wright03} introduce the notion of a \emph{helper node}---a -single fixed entry node for each user---to combat this \emph{predecessor -attack}. - -However, the attack in~\cite{attack-tor-oak05} shows that simply adding -to the path length, or using a helper node, may not protect an enclave -node. A hostile web server can send constant interference traffic to -all nodes in the network, and learn which nodes are involved in the -circuit (though at least in the current attack, he can't learn their -order). Using randomized path lengths may help some, since the attacker -will never be certain he has identified all nodes in the path unless -he probes the entire network, but as -long as the network remains small this attack will still be feasible. - -Helper nodes also aim to help Tor clients, because choosing entry and exit -points -randomly and changing them frequently allows an attacker who controls -even a few nodes to eventually link some of their destinations. The goal -is to take the risk once and for all about choosing a bad entry node, -rather than taking a new risk for each new circuit. (Choosing fixed -exit nodes is less useful, since even an honest exit node still doesn't -protect against a hostile website.) But obstacles remain before -we can implement helper nodes. -For one, the literature does not describe how to choose helpers from a list -of nodes that changes over time. If Alice is forced to choose a new entry -helper every $d$ days and $c$ of the $n$ nodes are bad, she can expect -to choose a compromised node around -every $dc/n$ days. Statistically over time this approach only helps -if she is better at choosing honest helper nodes than at choosing -honest nodes. Worse, an attacker with the ability to DoS nodes could -force users to switch helper nodes more frequently, or remove -other candidate helpers. - -%Do general DoS attacks have anonymity implications? See e.g. Adam -%Back's IH paper, but I think there's more to be pointed out here. -RD -% Not sure what you want to say here. -NM - -%Game theory for helper nodes: if Alice offers a hidden service on a -%server (enclave model), and nobody ever uses helper nodes, then against -%George+Steven's attack she's totally nailed. If only Alice uses a helper -%node, then she's still identified as the source of the data. If everybody -%uses a helper node (including Alice), then the attack identifies the -%helper node and also Alice, and knows which one is which. If everybody -%uses a helper node (but not Alice), then the attacker figures the real -%source was a client that is using Alice as a helper node. [How's my -%logic here?] -RD -% -% Not sure about the logic. For the attack to work with helper nodes, the -%attacker needs to guess that Alice is running the hidden service, right? -%Otherwise, how can he know to measure her traffic specifically? -NM -% -% In the Murdoch-Danezis attack, the adversary measures all servers. -RD - -%point to routing-zones section re: helper nodes to defend against -%big stuff. - -\subsection{Location-hidden services} -\label{subsec:hidden-services} - -% This section is first up against the wall when the revolution comes. - -Tor's \emph{rendezvous points} -let users provide TCP services to other Tor users without revealing -the service's location. Since this feature is relatively recent, we describe -here -a couple of our early observations from its deployment. - -First, our implementation of hidden services seems less hidden than we'd -like, since they build a different rendezvous circuit for each user, -and an external adversary can induce them to -produce traffic. This insecurity means that they may not be suitable as -a building block for Free Haven~\cite{freehaven-berk} or other anonymous -publishing systems that aim to provide long-term security, though helper -nodes, as discussed above, would seem to help. - -\emph{Hot-swap} hidden services, where more than one location can -provide the service and loss of any one location does not imply a -change in service, would help foil intersection and observation attacks -where an adversary monitors availability of a hidden service and also -monitors whether certain users or servers are online. The design -challenges in providing such services without otherwise compromising -the hidden service's anonymity remain an open problem; -however, see~\cite{move-ndss05}. - -In practice, hidden services are used for more than just providing private -access to a web server or IRC server. People are using hidden services -as a poor man's VPN and firewall-buster. Many people want to be able -to connect to the computers in their private network via secure shell, -and rather than playing with dyndns and trying to pierce holes in their -firewall, they run a hidden service on the inside and then rendezvous -with that hidden service externally. - -News sites like Bloggers Without Borders (www.b19s.org) are advertising -a hidden-service address on their front page. Doing this can provide -increased robustness if they use the dual-IP approach we describe -in~\cite{tor-design}, -but in practice they do it to increase visibility -of the Tor project and their support for privacy, and to offer -a way for their users, using unmodified software, to get end-to-end -encryption and authentication to their website. - -\subsection{Location diversity and ISP-class adversaries} -\label{subsec:routing-zones} - -Anonymity networks have long relied on diversity of node location for -protection against attacks---typically an adversary who can observe a -larger fraction of the network can launch a more effective attack. One -way to achieve dispersal involves growing the network so a given adversary -sees less. Alternately, we can arrange the topology so traffic can enter -or exit at many places (for example, by using a free-route network -like Tor rather than a cascade network like JAP). Lastly, we can use -distributed trust to spread each transaction over multiple jurisdictions. -But how do we decide whether two nodes are in related locations? - -Feamster and Dingledine defined a \emph{location diversity} metric -in~\cite{feamster:wpes2004}, and began investigating a variant of location -diversity based on the fact that the Internet is divided into thousands of -independently operated networks called {\em autonomous systems} (ASes). -The key insight from their paper is that while we typically think of a -connection as going directly from the Tor client to the first Tor node, -actually it traverses many different ASes on each hop. An adversary at -any of these ASes can monitor or influence traffic. Specifically, given -plausible initiators and recipients, and given random path selection, -some ASes in the simulation were able to observe 10\% to 30\% of the -transactions (that is, learn both the origin and the destination) on -the deployed Tor network (33 nodes as of June 2004). - -The paper concludes that for best protection against the AS-level -adversary, nodes should be in ASes that have the most links to other ASes: -Tier-1 ISPs such as AT\&T and Abovenet. Further, a given transaction -is safest when it starts or ends in a Tier-1 ISP\@. Therefore, assuming -initiator and responder are both in the U.S., it actually \emph{hurts} -our location diversity to use far-flung nodes in -continents like Asia or South America. -% it's not just entering or exiting from them. using them as the middle -% hop reduces your effective path length, which you presumably don't -% want because you chose that path length for a reason. -% -% Not sure I buy that argument. Two end nodes in the right ASs to -% discourage linking are still not known to each other. If some -% adversary in a single AS can bridge the middle node, it shouldn't -% therefore be able to identify initiator or responder; although it could -% contribute to further attacks given more assumptions. -% Nonetheless, no change to the actual text for now. - -Many open questions remain. First, it will be an immense engineering -challenge to get an entire BGP routing table to each Tor client, or to -summarize it sufficiently. Without a local copy, clients won't be -able to safely predict what ASes will be traversed on the various paths -through the Tor network to the final destination. Tarzan~\cite{tarzan:ccs02} -and MorphMix~\cite{morphmix:fc04} suggest that we compare IP prefixes to -determine location diversity; but the above paper showed that in practice -many of the Mixmaster nodes that share a single AS have entirely different -IP prefixes. When the network has scaled to thousands of nodes, does IP -prefix comparison become a more useful approximation? % Alternatively, can -%relevant parts of the routing tables be summarized centrally and delivered to -%clients in a less verbose format? -%% i already said "or to summarize is sufficiently" above. is that not -%% enough? -RD -% -Second, we can take advantage of caching certain content at the -exit nodes, to limit the number of requests that need to leave the -network at all. What about taking advantage of caches like Akamai or -Google~\cite{shsm03}? (Note that they're also well-positioned as global -adversaries.) -% -Third, if we follow the recommendations in~\cite{feamster:wpes2004} - and tailor path selection -to avoid choosing endpoints in similar locations, how much are we hurting -anonymity against larger real-world adversaries who can take advantage -of knowing our algorithm? -% -Fourth, can we use this knowledge to figure out which gaps in our network -most affect our robustness to this class of attack, and go recruit -new nodes with those ASes in mind? - -%Tor's security relies in large part on the dispersal properties of its -%network. We need to be more aware of the anonymity properties of various -%approaches so we can make better design decisions in the future. - -\subsection{The Anti-censorship problem} -\label{subsec:china} - -Citizens in a variety of countries, such as most recently China and -Iran, are blocked from accessing various sites outside -their country. These users try to find any tools available to allow -them to get around these firewalls. Some anonymity networks, such as -Six-Four~\cite{six-four}, are designed specifically with this goal in -mind; others like the Anonymizer~\cite{anonymizer} are paid by sponsors -such as Voice of America to encourage Internet -freedom. Even though Tor wasn't -designed with ubiquitous access to the network in mind, thousands of -users across the world are now using it for exactly this purpose. -% Academic and NGO organizations, peacefire, \cite{berkman}, etc - -Anti-censorship networks hoping to bridge country-level blocks face -a variety of challenges. One of these is that they need to find enough -exit nodes---servers on the `free' side that are willing to relay -traffic from users to their final destinations. Anonymizing -networks like Tor are well-suited to this task since we have -already gathered a set of exit nodes that are willing to tolerate some -political heat. - -The other main challenge is to distribute a list of reachable relays -to the users inside the country, and give them software to use those relays, -without letting the censors also enumerate this list and block each -relay. Anonymizer solves this by buying lots of seemingly-unrelated IP -addresses (or having them donated), abandoning old addresses as they are -`used up,' and telling a few users about the new ones. Distributed -anonymizing networks again have an advantage here, in that we already -have tens of thousands of separate IP addresses whose users might -volunteer to provide this service since they've already installed and use -the software for their own privacy~\cite{koepsell:wpes2004}. Because -the Tor protocol separates routing from network discovery \cite{tor-design}, -volunteers could configure their Tor clients -to generate node descriptors and send them to a special directory -server that gives them out to dissidents who need to get around blocks. - -Of course, this still doesn't prevent the adversary -from enumerating and preemptively blocking the volunteer relays. -Perhaps a tiered-trust system could be built where a few individuals are -given relays' locations. They could then recommend other individuals -by telling them -those addresses, thus providing a built-in incentive to avoid letting the -adversary intercept them. Max-flow trust algorithms~\cite{advogato} -might help to bound the number of IP addresses leaked to the adversary. Groups -like the W3C are looking into using Tor as a component in an overall system to -help address censorship; we wish them success. - -%\cite{infranet} - -\section{Scaling} -\label{sec:scaling} - -Tor is running today with hundreds of nodes and tens of thousands of -users, but it will certainly not scale to millions. -Scaling Tor involves four main challenges. First, to get a -large set of nodes, we must address incentives for -users to carry traffic for others. Next is safe node discovery, both -while bootstrapping (Tor clients must robustly find an initial -node list) and later (Tor clients must learn about a fair sample -of honest nodes and not let the adversary control circuits). -We must also detect and handle node speed and reliability as the network -becomes increasingly heterogeneous: since the speed and reliability -of a circuit is limited by its worst link, we must learn to track and -predict performance. Finally, we must stop assuming that all points on -the network can connect to all other points. - -\subsection{Incentives by Design} -\label{subsec:incentives-by-design} - -There are three behaviors we need to encourage for each Tor node: relaying -traffic; providing good throughput and reliability while doing it; -and allowing traffic to exit the network from that node. - -We encourage these behaviors through \emph{indirect} incentives: that -is, by designing the system and educating users in such a way that users -with certain goals will choose to relay traffic. One -main incentive for running a Tor node is social: volunteers -altruistically donate their bandwidth and time. We encourage this with -public rankings of the throughput and reliability of nodes, much like -seti@home. We further explain to users that they can get -deniability for any traffic emerging from the same address as a Tor -exit node, and they can use their own Tor node -as an entry or exit point with confidence that it's not run by an adversary. -Further, users may run a node simply because they need such a network -to be persistently available and usable, and the value of supporting this -exceeds any countervening costs. -Finally, we can encourage operators by improving the usability and feature -set of the software: -rate limiting support and easy packaging decrease the hassle of -maintaining a node, and our configurable exit policies allow each -operator to advertise a policy describing the hosts and ports to which -he feels comfortable connecting. - -To date these incentives appear to have been adequate. As the system scales -or as new issues emerge, however, we may also need to provide - \emph{direct} incentives: -providing payment or other resources in return for high-quality service. -Paying actual money is problematic: decentralized e-cash systems are -not yet practical, and a centralized collection system not only reduces -robustness, but also has failed in the past (the history of commercial -anonymizing networks is littered with failed attempts). A more promising -option is to use a tit-for-tat incentive scheme, where nodes provide better -service to nodes that have provided good service for them. - -Unfortunately, such an approach introduces new anonymity problems. -There are many surprising ways for nodes to game the incentive and -reputation system to undermine anonymity---such systems are typically -designed to encourage fairness in storage or bandwidth usage, not -fairness of provided anonymity. An adversary can attract more traffic -by performing well or can target individual users by selectively -performing, to undermine their anonymity. Typically a user who -chooses evenly from all nodes is most resistant to an adversary -targeting him, but that approach hampers the efficient use -of heterogeneous nodes. - -%When a node (call him Steve) performs well for Alice, does Steve gain -%reputation with the entire system, or just with Alice? If the entire -%system, how does Alice tell everybody about her experience in a way that -%prevents her from lying about it yet still protects her identity? If -%Steve's behavior only affects Alice's behavior, does this allow Steve to -%selectively perform only for Alice, and then break her anonymity later -%when somebody (presumably Alice) routes through his node? - -A possible solution is a simplified approach to the tit-for-tat -incentive scheme based on two rules: (1) each node should measure the -service it receives from adjacent nodes, and provide service relative -to the received service, but (2) when a node is making decisions that -affect its own security (such as building a circuit for its own -application connections), it should choose evenly from a sufficiently -large set of nodes that meet some minimum service -threshold~\cite{casc-rep}. This approach allows us to discourage -bad service -without opening Alice up as much to attacks. All of this requires -further study. - -\subsection{Trust and discovery} -\label{subsec:trust-and-discovery} - -The published Tor design is deliberately simplistic in how -new nodes are authorized and how clients are informed about Tor -nodes and their status. -All nodes periodically upload a signed description -of their locations, keys, and capabilities to each of several well-known {\it - directory servers}. These directory servers construct a signed summary -of all known Tor nodes (a ``directory''), and a signed statement of which -nodes they -believe to be operational then (a ``network status''). Clients -periodically download a directory to learn the latest nodes and -keys, and more frequently download a network status to learn which nodes are -likely to be running. Tor nodes also operate as directory caches, to -lighten the bandwidth on the directory servers. - -To prevent Sybil attacks (wherein an adversary signs up many -purportedly independent nodes to increase her network view), -this design -requires the directory server operators to manually -approve new nodes. Unapproved nodes are included in the directory, -but clients -do not use them at the start or end of their circuits. In practice, -directory administrators perform little actual verification, and tend to -approve any Tor node whose operator can compose a coherent email. -This procedure -may prevent trivial automated Sybil attacks, but will do little -against a clever and determined attacker. - -There are a number of flaws in this system that need to be addressed as we -move forward. First, -each directory server represents an independent point of failure: any -compromised directory server could start recommending only compromised -nodes. -Second, as more nodes join the network, %the more unreasonable it -%becomes to expect clients to know about them all. -directories -become infeasibly large, and downloading the list of nodes becomes -burdensome. -Third, the validation scheme may do as much harm as it does good. It -does not prevent clever attackers from mounting Sybil attacks, -and it may deter node operators from joining the network---if -they expect the validation process to be difficult, or they do not share -any languages in common with the directory server operators. - -We could try to move the system in several directions, depending on our -choice of threat model and requirements. If we did not need to increase -network capacity to support more users, we could simply - adopt even stricter validation requirements, and reduce the number of -nodes in the network to a trusted minimum. -But, we can only do that if we can simultaneously make node capacity -scale much more than we anticipate to be feasible soon, and if we can find -entities willing to run such nodes, an equally daunting prospect. - -In order to address the first two issues, it seems wise to move to a system -including a number of semi-trusted directory servers, no one of which can -compromise a user on its own. Ultimately, of course, we cannot escape the -problem of a first introducer: since most users will run Tor in whatever -configuration the software ships with, the Tor distribution itself will -remain a single point of failure so long as it includes the seed -keys for directory servers, a list of directory servers, or any other means -to learn which nodes are on the network. But omitting this information -from the Tor distribution would only delegate the trust problem to each -individual user. %, most of whom are presumably less informed about how to make -%trust decisions than the Tor developers. -A well publicized, widely available, authoritatively and independently -endorsed and signed list of initial directory servers and their keys -is a possible solution. But, setting that up properly is itself a large -bootstrapping task. - -%Network discovery, sybil, node admission, scaling. It seems that the code -%will ship with something and that's our trust root. We could try to get -%people to build a web of trust, but no. Where we go from here depends -%on what threats we have in mind. Really decentralized if your threat is -%RIAA; less so if threat is to application data or individuals or... - -\subsection{Measuring performance and capacity} -\label{subsec:performance} - -One of the paradoxes with engineering an anonymity network is that we'd like -to learn as much as we can about how traffic flows so we can improve the -network, but we want to prevent others from learning how traffic flows in -order to trace users' connections through the network. Furthermore, many -mechanisms that help Tor run efficiently -require measurements about the network. - -Currently, nodes try to deduce their own available bandwidth (based on how -much traffic they have been able to transfer recently) and include this -information in the descriptors they upload to the directory. Clients -choose servers weighted by their bandwidth, neglecting really slow -servers and capping the influence of really fast ones. -% -This is, of course, eminently cheatable. A malicious node can get a -disproportionate amount of traffic simply by claiming to have more bandwidth -than it does. But better mechanisms have their problems. If bandwidth data -is to be measured rather than self-reported, it is usually possible for -nodes to selectively provide better service for the measuring party, or -sabotage the measured value of other nodes. Complex solutions for -mix networks have been proposed, but do not address the issues -completely~\cite{mix-acc,casc-rep}. - -Even with no cheating, network measurement is complex. It is common -for views of a node's latency and/or bandwidth to vary wildly between -observers. Further, it is unclear whether total bandwidth is really -the right measure; perhaps clients should instead be considering nodes -based on unused bandwidth or observed throughput. -%How to measure performance without letting people selectively deny service -%by distinguishing pings. Heck, just how to measure performance at all. In -%practice people have funny firewalls that don't match up to their exit -%policies and Tor doesn't deal. -% -%Network investigation: Is all this bandwidth publishing thing a good idea? -%How can we collect stats better? Note weasel's smokeping, at -%http://seppia.noreply.org/cgi-bin/smokeping.cgi?target=Tor -%which probably gives george and steven enough info to break tor? -% -And even if we can collect and use this network information effectively, -we must ensure -that it is not more useful to attackers than to us. While it -seems plausible that bandwidth data alone is not enough to reveal -sender-recipient connections under most circumstances, it could certainly -reveal the path taken by large traffic flows under low-usage circumstances. - -\subsection{Non-clique topologies} - -Tor's comparatively weak threat model may allow easier scaling than -other -designs. High-latency mix networks need to avoid partitioning attacks, where -network splits let an attacker distinguish users in different partitions. -Since Tor assumes the adversary cannot cheaply observe nodes at will, -a network split may not decrease protection much. -Thus, one option when the scale of a Tor network -exceeds some size is simply to split it. Nodes could be allocated into -partitions while hampering collaborating hostile nodes from taking over -a single partition~\cite{casc-rep}. -Clients could switch between -networks, even on a per-circuit basis. -%Future analysis may uncover -%other dangers beyond those affecting mix-nets. - -More conservatively, we can try to scale a single Tor network. Likely -problems with adding more servers to a single Tor network include an -explosion in the number of sockets needed on each server as more servers -join, and increased coordination overhead to keep each users' view of -the network consistent. As we grow, we will also have more instances of -servers that can't reach each other simply due to Internet topology or -routing problems. - -%include restricting the number of sockets and the amount of bandwidth -%used by each node. The number of sockets is determined by the network's -%connectivity and the number of users, while bandwidth capacity is determined -%by the total bandwidth of nodes on the network. The simplest solution to -%bandwidth capacity is to add more nodes, since adding a Tor node of any -%feasible bandwidth will increase the traffic capacity of the network. So as -%a first step to scaling, we should focus on making the network tolerate more -%nodes, by reducing the interconnectivity of the nodes; later we can reduce -%overhead associated with directories, discovery, and so on. - -We can address these points by reducing the network's connectivity. -Danezis~\cite{danezis:pet2003} considers -the anonymity implications of restricting routes on mix networks and -recommends an approach based on expander graphs (where any subgraph is likely -to have many neighbors). It is not immediately clear that this approach will -extend to Tor, which has a weaker threat model but higher performance -requirements: instead of analyzing the -probability of an attacker's viewing whole paths, we will need to examine the -attacker's likelihood of compromising the endpoints. -% -Tor may not need an expander graph per se: it -may be enough to have a single central subnet that is highly connected, like -an Internet backbone. % As an -%example, assume fifty nodes of relatively high traffic capacity. This -%\emph{center} forms a clique. Assume each center node can -%handle 200 connections to other nodes (including the other ones in the -%center). Assume every noncenter node connects to three nodes in the -%center and anyone out of the center that they want to. Then the -%network easily scales to c. 2500 nodes with commensurate increase in -%bandwidth. -There are many open questions: how to distribute connectivity information -(presumably nodes will learn about the central nodes -when they download Tor), whether central nodes -will need to function as a `backbone', and so on. As above, -this could reduce the amount of anonymity available from a mix-net, -but for a low-latency network where anonymity derives largely from -the edges, it may be feasible. - -%In a sense, Tor already has a non-clique topology. -%Individuals can set up and run Tor nodes without informing the -%directory servers. This allows groups to run a -%local Tor network of private nodes that connects to the public Tor -%network. This network is hidden behind the Tor network, and its -%only visible connection to Tor is at those points where it connects. -%As far as the public network, or anyone observing it, is concerned, -%they are running clients. - -\section{The Future} -\label{sec:conclusion} - -Tor is the largest and most diverse low-latency anonymity network -available, but we are still in the beginning stages of deployment. Several -major questions remain. - -First, will our volunteer-based approach to sustainability work in the -long term? As we add more features and destabilize the network, the -developers spend a lot of time keeping the server operators happy. Even -though Tor is free software, the network would likely stagnate and die at -this stage if the developers stopped actively working on it. We may get -an unexpected boon from the fact that we're a general-purpose overlay -network: as Tor grows more popular, other groups who need an overlay -network on the Internet are starting to adapt Tor to their needs. -% -Second, Tor is only one of many components that preserve privacy online. -For applications where it is desirable to -keep identifying information out of application traffic, someone must build -more and better protocol-aware proxies that are usable by ordinary people. -% -Third, we need to gain a reputation for social good, and learn how to -coexist with the variety of Internet services and their established -authentication mechanisms. We can't just keep escalating the blacklist -standoff forever. -% -Fourth, the current Tor -architecture does not scale even to handle current user demand. We must -find designs and incentives to let some clients relay traffic too, without -sacrificing too much anonymity. - -These are difficult and open questions. Yet choosing not to solve them -means leaving most users to a less secure network or no anonymizing -network at all. - -\bibliographystyle{plain} \bibliography{tor-design} - -\clearpage -\appendix - -\begin{figure}[t] -%\unitlength=1in -\centering -%\begin{picture}(6.0,2.0) -%\put(3,1){\makebox(0,0)[c]{\epsfig{figure=graphnodes,width=6in}}} -%\end{picture} -\mbox{\epsfig{figure=graphnodes,width=5in}} -\caption{Number of Tor nodes over time, through January 2005. Lowest -line is number of exit -nodes that allow connections to port 80. Middle line is total number of -verified (registered) Tor nodes. The line above that represents nodes -that are running but not yet registered.} -\label{fig:graphnodes} -\end{figure} - -\begin{figure}[t] -\centering -\mbox{\epsfig{figure=graphtraffic,width=5in}} -\caption{The sum of traffic reported by each node over time, through -January 2005. The bottom -pair show average throughput, and the top pair represent the largest 15 -minute burst in each 4 hour period.} -\label{fig:graphtraffic} -\end{figure} - -\end{document} - -%Making use of nodes with little bandwidth, or high latency/packet loss. - -%Running Tor nodes behind NATs, behind great-firewalls-of-China, etc. -%Restricted routes. How to propagate to everybody the topology? BGP -%style doesn't work because we don't want just *one* path. Point to -%Geoff's stuff. - |