revamp secs 8 and 9

i haven't figured out what order the paragraphs in sec8 should go. or sec9, for that matter. please fix it. :) svn:r756
author: Roger Dingledine <arma@torproject.org> 2003-11-04 14:59:58 +0000
committer: Roger Dingledine <arma@torproject.org> 2003-11-04 14:59:58 +0000
commit: cb97bf8c069c9ce6e2dfc85f61c2e43256277ef4 (patch)
tree: 10e5c60a55e9b8bd938d9df319b35db470840788 /doc/tor-design.tex
parent: 55057a3929e368829fceddaa7ea893b8a00cb8f8 (diff)
download: tor-cb97bf8c069c9ce6e2dfc85f61c2e43256277ef4.tar.gz
tor-cb97bf8c069c9ce6e2dfc85f61c2e43256277ef4.zip
1 files changed, 139 insertions, 225 deletions
diff --git a/doc/tor-design.tex b/doc/tor-design.tex
index c12662a43b..d38d151c38 100644
--- a/doc/tor-design.tex
+++ b/doc/tor-design.tex
@@ -1351,11 +1351,6 @@ acknowledge his existence.
 \Section{Attacks and Defenses}
 \label{sec:attacks}
 
-% XXX In sec9 we should talk about bandwidth classes, which will
-%     enable us to accept a lot more ORs than if we continue to
-%     require 10mbit connections for all ORs. -RD
-
-  
 Below we summarize a variety of attacks, and discuss how well our
 design withstands them.
 
@@ -1647,236 +1642,157 @@ by the session key shared by the client and server.
 \Section{Open Questions in Low-latency Anonymity}
 \label{sec:maintaining-anonymity}
  
-% There must be a better intro than this! -NM
 In addition to the open problems discussed in
-Section~\ref{subsec:non-goals}, many other questions remain to be
-solved by future research before we can be confident of our security.
-
-Many of these open issues are questions of balance.  For example,
-how often should users rotate to fresh circuits?  Too-frequent
-rotation is inefficient, expensive, and may lead to intersection attacks
-and predecessor attacks \cite{wright03},
-but too-infrequent rotation
-makes the user's traffic linkable.   Along with opening a fresh
-circuit, clients can also limit linkability by exiting from a middle point
-of the circuit, or by truncating and re-extending the circuit; but
-more analysis is needed to determine the proper trade-off.
-
-A similar question surrounds timing of directory operations:
-how often should directories be updated?  With too-infrequent
-updates clients receive an inaccurate picture of the network; with
-too-frequent updates the directory servers are overloaded.
-
-%do different exit policies at different exit nodes trash anonymity sets,
-%or not mess with them much?
-%% Why would they?  By routing traffic to certain nodes preferentially?
-
-%[XXX Choosing paths and path lengths: I'm not writing this bit till
-%  Arma's pathselection stuff is in. -NM]
-%Alice always chooses her path to contain at least
-%three nodes unrelated to herself and her destination, choosing the
-%number of nodes beyond the third from a geometric distribution;
-%explain why. -NM
-
-%%%% Roger said that he'd put a path selection paragraph into section
-%%%% 4 that would replace this.
-%
-%I probably should have noted that this means loops will be on at least
-%five hop routes, which should be rare given the distribution.  I'm    
-%realizing that this is reproducing some of the thought that led to a  
-%default of five hops in the original onion routing design.  There were
-%some different assumptions, which I won't spell out now.  Note that   
-%enclave level protections really change these assumptions.  If most   
-%circuits are just two hops, then just a single link observer will be  
-%able to tell that two enclaves are communicating with high probability.
-%So, it would seem that enclaves should have a four node minimum circuit
-%to prevent trivial circuit insider identification of the whole circuit,
-%and three hop minimum for circuits from an enclave to some nonclave    
-%responder. But then... we would have to make everyone obey these rules 
-%or a node that through timing inferred it was on a four hop circuit    
-%would know that it was probably carrying enclave to enclave traffic.   
-%Which... if there were even a moderate number of bad nodes in the      
-%network would make it advantageous to break the connection to conduct  
-%a reformation intersection attack. Ahhh! I gotta stop thinking         
-%about this and work on the paper some before the family wakes up.  
-%On Sat, Oct 25, 2003 at 06:57:12AM -0400, Paul Syverson wrote:
-%> Which... if there were even a moderate number of bad nodes in the
-%> network would make it advantageous to break the connection to conduct
-%> a reformation intersection attack. Ahhh! I gotta stop thinking
-%> about this and work on the paper some before the family wakes up. 
-%This is the sort of issue that should go in the 'maintaining anonymity
-%with tor' section towards the end. :)
-%Email from between roger and me to beginning of section above. Fix and move.
+Section~\ref{subsec:non-goals}, many other questions must be solved
+before we can be confident of Tor's security.
+
+Many of these open issues are questions of balance. For example,
+how often should users rotate to fresh circuits? Frequent rotation
+is inefficient, expensive, and may lead to intersection attacks and
+predecessor attacks \cite{wright03}, but infrequent rotation makes the
+user's traffic linkable. Along with opening a fresh circuit, clients can
+also limit linkability by exiting from a middle point of the circuit,
+or by truncating and re-extending the circuit; but more analysis is
+needed to determine the proper trade-off.
+
+A similar question surrounds timing of directory operations: how often
+should directories be updated?  Clients that update infrequently receive
+an inaccurate picture of the network, but frequent updates can overload
+the directory servers. More generally, we must find more
+decentralized yet practical ways to distribute up-to-date snapshots of
+network status without introducing new attacks.
+
+How should we choose path lengths? If she uses only two hops, then both
+these nodes are certain that by colluding they will learn about Alice
+and Bob. Our current approach is that Alice always chooses at least three
+nodes unrelated to herself and her destination. Thus normally she chooses
+three nodes, but if she is running an OR and her destination is on an OR,
+she uses five. Should Alice choose a nondeterministic path length (say,
+increasing it from a geometric distribution), to foil an attacker who
+uses timing to learn that he is the fifth hop and thus concludes that
+both Alice and the responder are on ORs?
 
 Throughout this paper, we have assumed that end-to-end traffic
 confirmation will immediately and automatically defeat a low-latency
-anonymity system. Even high-latency anonymity
-systems can be vulnerable to end-to-end traffic confirmation, if the
-traffic volumes are high enough, and if users' habits are sufficiently
-distinct \cite{limits-open,statistical-disclosure}.  \emph{Can
-  anything be done to make low-latency systems resist these attacks as
-  well as high-latency systems?}
-Tor already makes some effort to conceal the starts and
-ends of streams by wrapping all long-range control commands in
-identical-looking relay cells, but more analysis is needed.  Link
-padding could frustrate passive observers who count packets; long-range
-padding could work against observers who own the first hop in a
-circuit.  But more research needs to be done in order to find an
-efficient and practical approach.  Volunteers prefer not to run
-constant-bandwidth padding; but more sophisticated traffic shaping
-approaches remain somewhat unanalyzed. 
-%[XXX is this so?] 
-Recent work
-on long-range padding \cite{defensive-dropping} shows promise.  One
-could also try to reduce correlation in packet timing by batching and
-re-ordering packets, but it is unclear whether this could improve
-anonymity without introducing so much latency as to render the
+anonymity system. Even high-latency anonymity systems can be
+vulnerable to end-to-end traffic confirmation, if the traffic volumes
+are high enough, and if users' habits are sufficiently distinct
+\cite{limits-open,statistical-disclosure}. Can anything be done to
+make low-latency systems resist these attacks as well as high-latency
+systems? Tor already makes some effort to conceal the starts and ends of
+streams by wrapping all long-range control commands in identical-looking
+relay cells. Link padding could frustrate passive observers who count
+packets; long-range padding could work against observers who own the
+first hop in a circuit. But more research remains to find an efficient
+and practical approach. Volunteers prefer not to run constant-bandwidth
+padding; but no convincing traffic shaping approach has ever been
+specified. Recent work on long-range padding \cite{defensive-dropping}
+shows promise. One could also try to reduce correlation in packet timing
+by batching and re-ordering packets, but it is unclear whether this could
+improve anonymity without introducing so much latency as to render the
 network unusable.
 
-Even if passive timing attacks were wholly solved, active timing
-attacks would remain.  \emph{What can
-  be done to address attackers who can introduce timing patterns into
-  a user's traffic?}  % [XXX mention likely approaches]
-
-%%% I think we cover this by framing the problem as ``Can we make 
-%%% end-to-end characteristics of low-latency systems as good as
-%%% those of high-latency systems?''  Eliminating long-term
-%%% intersection is a hard problem.
-%
-%Even regardless of link padding from Alice to the cloud, there will be
-%times when Alice is simply not online. Link padding, at the edges or
-%inside the cloud, does not help for this.
-
-In order to scale to many users, and to prevent an
-attacker from observing the whole network at once, it may be necessary
-for low-latency anonymity systems to support far more servers than Tor
-currently anticipates.  This introduces several issues.  First, if
-approval by a centralized set of directory servers is no longer
-feasible, what mechanism should be used to prevent adversaries from
-signing up many spurious servers? 
-Second, if clients can no longer have a complete
-picture of the network at all times, how can they perform
-discovery while preventing attackers from manipulating or exploiting
-gaps in client knowledge?  Third, if there are too many servers
-for every server to constantly communicate with every other, what kind
-of non-clique topology should the network use?   Restricted-route
-topologies promise comparable anonymity with better scalability
-\cite{danezis-pets03}, but whatever topology we choose, we need some
-way to keep attackers from manipulating their position within it.
-Fourth, since no centralized authority is tracking server reliability,
-How do we prevent unreliable servers from rendering the network
-unusable?  Fifth, do clients receive so much anonymity benefit from
-running their own servers that we should expect them all to do so, or
-do we need to find another incentive structure to motivate them?
-(Tarzan and MorphMix present possible solutions.)
-
-% [[ XXX how to approve new nodes (advogato, sybil, captcha (RTT));]
-
-Alternatively, it may be the case that one of these problems proves
-intractable, or that the drawbacks to many-server systems prove
-greater than the benefits.  Nevertheless, we may still do well to
-consider non-clique topologies.  A cascade topology may provide more
-defense against traffic confirmation.
-% XXX Why would it?   Cite.  -NM
-%
-% Huh? Do you mean for simple attacks just because of larger anonymity
-% sets? -PS
-Does the hydra topology (many input nodes, few output nodes) work
-better? Are we going to get a hydra anyway because most nodes will be
-middleman nodes?
-
-As mentioned in Section~\ref{subsec:dos}, Tor could improve its
-robustness against node failure by buffering transmitted stream data
-at the network's edges until the data has been acknowledged by the
-other end of the stream.  The efficacy of this approach remains to be
-tested, however, and there may be more effective means for ensuring
-reliable connections in the presence of unreliable nodes.
-
-%%% Keeping this original paragraph for a little while, since it 
-%%% is not the same as what's written there now.
-%
-%Because Tor depends on TLS and TCP to provide a reliable transport,
-%when one of the servers goes down, all the circuits (and thus streams)
-%traveling over that server must break.  This reduces anonymity because
-%everybody needs to reconnect right then (does it? how much?)  and
-%because exit connections all break at the same time, and it also harms
-%usability. It seems the problem is even worse in a peer-to-peer
-%environment, because so far such systems don't really provide an
-%incentive for nodes to stay connected when they're done browsing, so
-%we would expect a much higher churn rate than for onion routing.
-%there ways of allowing streams to survive the loss of a node in the
-%path?
-
-% Roger or Paul suggested that we say something about incentives,
-% too, but I think that's a better candidate for our future work
-% section.  After all, we will doubtlessly learn very much about why
-% people do or don't run and use Tor in the near future. -NM
-
-%We should run a squid at each exit node, to provide comparable anonymity
-%to private exit nodes for cache hits, to speed everything up, and to
-%have a buffer for funny stuff coming out of port 80.
-% on the other hand, it hampers PFS, because ORs have pages in the cache.
-%I previously elsewhere suggested bulk transfer proxies to carve
-%up big things so that they could be downloaded in less noticeable
-%pieces over several normal looking connections. We could suggest
-%similarly one or a handful of squid nodes that might serve up
-%some of the more sensitive but common material, especially if
-%the relevant sites didn't want to or couldn't run their own OR.
-%This would be better than having everyone run a squid which would
-%just help identify after the fact the different history of that
-%node's activity. All this kind of speculation needs to move to
-%future work section I guess. -PS]
+Common wisdom suggests that Alice should run her own onion router for best
+anonymity, because traffic coming through her node could plausibly have
+come from elsewhere. How much mixing do we need before this is actually
+effective, or is it immediately beneficial because many real-world
+adversaries won't be able to observe Alice's router?
+
+To scale to many users, and to prevent an attacker from observing the
+whole network at once, it may be necessary for low-latency anonymity
+systems to support far more servers than Tor currently anticipates.
+This introduces several issues.  First, if approval by a centralized set
+of directory servers is no longer feasible, what mechanism should be used
+to prevent adversaries from signing up many colluding servers? Second,
+if clients can no longer have a complete picture of the network at all
+times, how can they perform discovery while preventing attackers from
+manipulating or exploiting gaps in client knowledge?  Third, if there
+are too many servers for every server to constantly communicate with
+every other, what kind of non-clique topology should the network use?
+Restricted-route topologies promise comparable anonymity with better
+scalability \cite{danezis-pets03}, but whatever topology we choose, we
+need some way to keep attackers from manipulating their position within
+it \cite{casc-rep}. Fourth, since no centralized authority is tracking
+server reliability, How do we prevent unreliable servers from rendering
+the network unusable?  Fifth, do clients receive so much anonymity benefit
+from running their own servers that we should expect them all to do so
+\cite{econymics}, or do we need to find another incentive structure to
+motivate them?  Tarzan and MorphMix present possible solutions.
+
+% advogato, captcha
+
+A cascade topology with long-range padding and mixing may provide more
+defense against traffic confirmation against a large adversary, because
+it aggregates many users. Does the hydra topology (many input nodes,
+few output nodes) work better against some adversaries? Are we going to
+get a hydra anyway because most nodes will be middleman nodes?
+
+When a Tor node goes down, all its circuits (and thus streams) must break.
+Do users abandon the system because of this brittleness? How well
+does the method in Section~\ref{subsec:dos} allow streams to survive
+node failure? If affected users rebuild circuits immediately, how much
+anonymity is lost? It seems the problem is even worse in a peer-to-peer
+environment---so far such systems don't provide an incentive for peers to
+stay connected when they're done retrieving content, so we would expect
+a higher churn rate.
 
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 
 \Section{Future Directions}
 \label{sec:conclusion}
 
-Tor brings together many innovations into
-a unified deployable system. But there are still several attacks that
-work quite well, as well as a number of sustainability and run-time
-issues remaining to be ironed out. In particular:
-
-% Many of these (Scalability, cover traffic, morphmix) 
-% are duplicates from open problems.
-%
-
-\emph{Scalability:} Tor's emphasis on design simplicity and
-deployability has led us to adopt a clique topology, a
-semi-centralized model for directories and trusts, and a
-full-network-visibility model for client knowledge.  None of these
-properties will scale to more than a few hundred servers.
-Promising approaches to better scalability exist (see
-Section~\ref{sec:maintaining-anonymity}), but more deployment
-experience would be helpful in learning the relative importance of
-these bottlenecks.
-
-\emph{Incentives:} Volunteers may want to run nodes for publicity
-or better anonymity \cite{econymics}. 
-more users -> more anonymity
-
-\emph{Cover traffic:} Currently we avoid cover traffic because
-whereas its costs in performance and bandwidth are clear, and because its
-security benefits are not well understood. With more research
-\cite{SS03,defensive-dropping}, this price/value ratio may change,
-both for link-level cover traffic and also long-range cover traffic.
-
-\emph{Better directory distribution:} Even with the threshold
-directory agreement algorithm described in Section~\ref{subsec:dirservers},
-directory distribution is still performance-critical. We must find more
-decentralized yet practical ways to distribute up-to-date snapshots of
-network status without introducing new attacks.  Also, directory
-retrieval presents a scaling problem, since clients currently
-download a description of the entire network state every 15
-minutes.  As the state grows larger and clients more numerous, we
-may need to move to a solution in which clients only receive
-incremental updates to directory state.
-
-\emph{Implementing location-hidden servers:} While
-Section~\ref{sec:rendezvous} describes a design for rendezvous
-points and location-hidden servers, these features have not yet been
-implemented.  While doing so we are likely to encounter additional
-issues that must be resolved, both in terms of usability and anonymity.
+Tor brings together many innovations into a unified deployable system. The
+immediate next steps include:
+
+\emph{Scalability:} Tor's emphasis on design simplicity and deployability
+has led us to adopt a clique topology, a semi-centralized model for
+directories and trusts, and a full-network-visibility model for client
+knowledge. These properties will not scale past a few hundred servers.
+Section~\ref{sec:maintaining-anonymity} describes some promising
+approaches, but more deployment experience will be helpful in learning
+the relative importance of these bottlenecks.
+
+\emph{Bandwidth classes:} In this paper we assume all onion routers have
+good bandwidth and latency. We should adapt the Morphmix model,
+where nodes advertise their bandwidth level (DSL, T1, T3), and
+Alice avoids bottlenecks in her path by choosing nodes that match or
+exceed her bandwidth. In this way DSL users can join the Tor network.
+
+\emph{Incentives:} Volunteers who run nodes are rewarded with publicity
+and possibly better anonymity \cite{econymics}. More nodes means increased
+scalability, and more users can mean more anonymity. We need to continue
+examining the incentive structures for participating in Tor.
+
+\emph{Cover traffic:} Currently Tor avoids cover traffic because its costs
+in performance and bandwidth are clear, whereas its security benefits are
+not well-understood. We must pursue more research on both link-level cover
+traffic and long-range cover traffic to determine some simple padding
+schemes that offer provable protection against our chosen adversary.
+
+%%\emph{Offer two relay cell sizes:} Traffic on the Internet tends to be
+%%large for bulk transfers and small for interactive traffic. One cell
+%%size cannot be optimal for both types of traffic.
+% This should go in the spec and todo, but not the paper yet. -RD
+
+\emph{Caching at exit nodes:} We should run a caching web proxy at each
+exit node, to provide anonymity for cached pages (Alice's request never
+leaves the Tor network), to improve speed, and to reduce bandwidth cost.
+%XXX and to have a layer to block to block funny stuff out of port 80.
+% is that a useful thing to say?
+On the other hand, forward security is weakened because routers have the
+pages in their cache. We must find the right balance between usability
+and security.
+
+\emph{Better directory distribution:} Directory retrieval presents
+a scaling problem, since clients currently download a description of
+the entire network state every 15 minutes. As the state grows larger
+and clients more numerous, we may need to move to a solution in which
+clients only receive incremental updates to directory state.
+
+\emph{Implement location-hidden services:} The design in
+Section~\ref{sec:rendezvous} has not yet been implemented.  While doing
+so we are likely to encounter additional issues that must be resolved,
+both in terms of usability and anonymity.
 
 \emph{Further specification review:} Although we have a public,
 byte-level specification for the Tor protocols, this document has
@@ -1889,7 +1805,6 @@ designer of MorphMix to make the common elements of our two systems
 share a common specification and implementation. So far, this seems
 to be relatively straightforward.  Interoperability will allow testing
 and direct comparison of the two designs for trust and scalability.
-% XXXX Bandwidth classes.
 
 \emph{Wider-scale deployment:} The original goal of Tor was to
 gain experience in deploying an anonymizing overlay network, and
@@ -1900,7 +1815,6 @@ able to evaluate some of our design decisions, including our
 robustness/latency trade-offs, our performance trade-offs (including
 cell size), our abuse-prevention mechanisms, and
 our overall usability.
-% XXX large and small cells on same network.
 
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
author	Roger Dingledine <arma@torproject.org>	2003-11-04 14:59:58 +0000
committer	Roger Dingledine <arma@torproject.org>	2003-11-04 14:59:58 +0000
commit	cb97bf8c069c9ce6e2dfc85f61c2e43256277ef4 (patch)
tree	10e5c60a55e9b8bd938d9df319b35db470840788 /doc/tor-design.tex
parent	55057a3929e368829fceddaa7ea893b8a00cb8f8 (diff)
download	tor-cb97bf8c069c9ce6e2dfc85f61c2e43256277ef4.tar.gz tor-cb97bf8c069c9ce6e2dfc85f61c2e43256277ef4.zip