early iteration of an 'experiences' section

tweaks throughout svn:r994
author: Roger Dingledine <arma@torproject.org> 2004-01-14 11:10:29 +0000
committer: Roger Dingledine <arma@torproject.org> 2004-01-14 11:10:29 +0000
commit: 619fdd1ea7ad6364360957a3f484e3451f24dc85 (patch)
tree: 0d96050753c27a4c72ac89cfb004003ae13a7613
parent: 9a53afb45b20fc06b5f9fca17608465f061ee048 (diff)
download: tor-619fdd1ea7ad6364360957a3f484e3451f24dc85.tar.gz
tor-619fdd1ea7ad6364360957a3f484e3451f24dc85.zip
2 files changed, 112 insertions, 54 deletions
diff --git a/doc/tor-design.bib b/doc/tor-design.bib
index 337712b51a..76610b4d17 100644
--- a/doc/tor-design.bib
+++ b/doc/tor-design.bib
@@ -843,7 +843,7 @@
   volume =       1,
   number =       1,
   pages =        {66--92},
-  month =        {November},
+  month =        {June},
 }
   %note =         {\url{http://citeseer.nj.nec.com/284739.html}}
 
diff --git a/doc/tor-design.tex b/doc/tor-design.tex
index ecd1999291..ef90595c5f 100644
--- a/doc/tor-design.tex
+++ b/doc/tor-design.tex
@@ -75,7 +75,7 @@ close with a list of open problems in anonymous communication.
 \label{sec:intro}
 
 Onion Routing is a distributed overlay network designed to anonymize
-low-latency TCP-based applications such as web browsing, secure shell,
+TCP-based applications such as web browsing, secure shell,
 and instant messaging. Clients choose a path through the network and
 build a \emph{circuit}, in which each node (or ``onion router'' or ``OR'')
 in the path knows its predecessor and successor, but no other nodes in
@@ -164,7 +164,7 @@ that can be unreliable and complex. % open to partitioning attacks.
 Tor takes a simplified view toward distributing such
 information. Certain more trusted nodes act as \emph{directory
 servers}: they provide signed directories that describe known
-routers and their availability. Users periodically download the
+routers and their current state. Users periodically download the
 directories via HTTP.
 
 \textbf{Variable exit policies:} Tor provides a consistent mechanism
@@ -180,15 +180,17 @@ circuit could change the contents of data cells as they passed by---for
 example, to alter a connection request so it would connect
 to a different webserver, or to `tag' encrypted traffic and look for
 corresponding corrupted traffic at the network edges \cite{minion-design}.
-Tor hampers these attacks by checking data integrity before it leaves
+Tor hampers these attacks by verifying data integrity before it leaves
 the network.
 
-\textbf{Improved robustness to failed nodes:} A failed node
-in the old design meant that circuit building failed, but thanks to
-Tor's step-by-step circuit building, users notice failed nodes
-while building circuits and route around them. Additionally, liveness
-information from directories allows users to avoid unreliable nodes in
-the first place.
+%\textbf{Improved robustness to failed nodes:} A failed node
+%in the old design meant that circuit building failed, but thanks to
+%Tor's step-by-step circuit building, users notice failed nodes
+%while building circuits and route around them. Additionally, liveness
+%information from directories allows users to avoid unreliable nodes in
+%the first place.
+%% Can't really claim this, now that we've found so many variants of
+%% attack on partial-circuit-building. -RD
 
 \textbf{Rendezvous points and hidden services:}
 Tor provides an integrated mechanism for responder anonymity via
@@ -205,9 +207,13 @@ TCP streams. Not requiring patches (or built-in support) in an
 operating system's network stack has been valuable to Tor's
 portability and deployability.
 
-We have implemented most of the above features. Our source code is
-available under a free license, and, as far as we know, is unencumbered by
-patents. We have recently begun deploying a wide-area alpha network
+We have implemented all of the above features except rendezvous
+points. Our source code is
+available under a free license, and Tor
+%, as far as we know, is unencumbered by patents.
+is not covered by the patent that affected distribution and use of
+earlier versions of Onion Routing.
+We have recently begun deploying a wide-area alpha network
 to test the design in practice, to get more experience with usability
 and users, and to provide a research platform for experimentation.
 
@@ -216,7 +222,8 @@ our goals and assumptions in Section~\ref{sec:assumptions},
 and then address the above list of improvements in
 Sections~\ref{sec:design}-\ref{sec:rendezvous}. We summarize
 in Section~\ref{sec:attacks} how our design stands up to
-known attacks, and conclude with a list of open problems in
+known attacks, and talk about our early deployment experiences in
+Section~\ref{sec:in-the-wild}. We conclude with a list of open problems in
 Section~\ref{sec:maintaining-anonymity} and future work for the Onion
 Routing project in Section~\ref{sec:conclusion}.
 
@@ -234,31 +241,26 @@ decrypts, delays, and re-orders messages, before relaying them toward
 their destinations.
 
 Subsequent relay-based anonymity designs have diverged in two
-main directions.  Some have tried to maximize anonymity at
-the cost of introducing comparatively large and variable latencies,
-including {\bf Babel} \cite{babel}, {\bf Mixmaster}
-\cite{mixmaster-spec}, and
-{\bf Mixminion} \cite{minion-design}.  Because of this
-decision, these \emph{high-latency} networks resist strong global
-adversaries,
+main directions. Systems like {\bf Babel} \cite{babel}, {\bf Mixmaster}
+\cite{mixmaster-spec}, and {\bf Mixminion} \cite{minion-design} have tried
+to maximize anonymity at the cost of introducing comparatively large and
+variable latencies. Because of this decision, these \emph{high-latency}
+networks resist strong global adversaries,
 but introduce too much lag for interactive tasks like web browsing,
 Internet chat, or SSH connections.
 
 Tor belongs to the second category: \emph{low-latency} designs that
 try to anonymize interactive network traffic. These systems handle
-a variety of bidirectional protocols.
-They also provide more convenient
+a variety of bidirectional protocols. They also provide more convenient
 mail delivery than the high-latency anonymous email
 networks, because the remote mail server provides explicit and timely
-delivery confirmation.
-But because these designs typically
+delivery confirmation. But because these designs typically
 involve many packets that must be delivered quickly, it is
 difficult for them to prevent an attacker who can eavesdrop both ends of the
 communication from correlating the timing and volume
 of traffic entering the anonymity network with traffic leaving it.  These
-protocols are also vulnerable against active attacks in which an
-adversary introduces timing patterns into traffic entering the network and
-looks
+protocols are similarly vulnerable to an active adversary who introduces
+timing patterns into traffic entering the network and looks
 for correlated patterns among exiting traffic.
 Although some work has been done to frustrate
 these attacks, %\footnote{
@@ -269,13 +271,13 @@ these attacks, %\footnote{
 %}
 % Point in the footnote is covered above, yes? -PS
  most designs protect primarily against traffic analysis rather than traffic
-confirmation (cf.\ Section~\ref{subsec:threat-model}).
+confirmation (see Section~\ref{subsec:threat-model}).
 
 The simplest low-latency designs are single-hop proxies such as the
-{\bf Anonymizer} \cite{anonymizer}, wherein a single trusted server strips the
+{\bf Anonymizer} \cite{anonymizer}: a single trusted server strips the
 data's origin before relaying it.  These designs are easy to
 analyze, but users must trust the anonymizing proxy.
-Concentrating the traffic to a single point increases the anonymity set
+Concentrating the traffic to this single point increases the anonymity set
 (the people a given user is hiding among), but it is vulnerable if the
 adversary can observe all traffic going into and out of the proxy.
 
@@ -305,6 +307,7 @@ stronger anonymity at the cost of allowing a single user to shut
 down the network simply by not sending. Systems like {\bf ISDN mixes}
 \cite{isdn-mixes} were designed for other environments with
 different assumptions.
+%XXX please can we fix this sentence to something less demeaning
 
 In P2P designs like {\bf Tarzan} \cite{tarzan:ccs02} and {\bf MorphMix}
 \cite{morphmix:fc04}, all participants both generate traffic and relay
@@ -313,7 +316,7 @@ whether a given peer originated a request
 or just relayed it from another peer. While Tarzan and MorphMix use
 layered encryption as above, {\bf Crowds} \cite{crowds-tissec} simply assumes
 an adversary who cannot observe the initiator: it uses no public-key
-encryption, so any node on a circuit can read that circuit's traffic.
+encryption, so any node on a circuit can read the circuit's traffic.
 
 {\bf Hordes} \cite{hordes-jcs} is based on Crowds but also uses multicast
 responses to hide the initiator. {\bf Herbivore} \cite{herbivore} and
@@ -346,7 +349,7 @@ and anonymity.  For example, a system that understands HTTP, such as Crowds,
 can strip
 identifying information from those requests, can take advantage of caching
 to limit the number of requests that leave the network, and can batch
-or encode those requests in order to minimize the number of connections.
+or encode those requests to minimize the number of connections.
 On the other hand, an IP-level anonymizer can handle nearly any protocol,
 even ones unforeseen by its designers (though these systems require
 kernel-level modifications to some operating systems, and so are more
@@ -407,7 +410,7 @@ therefore not
 require modifying applications; should not introduce prohibitive delays;
 and should require users to make as few configuration decisions
 as possible.  Finally, Tor should be easily implemented on all common
-platforms; we cannot require users to change their operating system in order
+platforms; we cannot require users to change their operating system
 to be anonymous.  (The current Tor implementation runs on Windows and
 assorted Unix clones including Linux, FreeBSD, and MacOS X.)
 
@@ -452,9 +455,9 @@ as Privoxy to hide differences between clients, and expunge protocol
 features that leak identity.
 Note that by this separation Tor can also provide services that
 are anonymous to the network yet authenticated to the responder, like
-SSH. Similarly, Tor does not currently integrate
-tunneling for non-stream-based protocols like UDP; this too must be
-provided by an external service.
+SSH. Similarly, Tor does not integrate
+tunneling for non-stream-based protocols like UDP; this must be
+provided by an external service if appropriate.
 
 \textbf{Not steganographic:} Tor does not try to conceal who is connected
 to the network.
@@ -497,10 +500,9 @@ network stops; or by introducing patterns into traffic that can later be
 detected. The adversary might subvert the directory servers to give users
 differing views of network state. Additionally, he can try to decrease
 the network's reliability by attacking nodes or by performing antisocial
-activities from reliable servers and trying to get them taken down;
-making the network unreliable flushes users to other less anonymous
-systems, where they may be easier to attack.
-We summarize
+activities from reliable nodes and trying to get them taken down---making
+the network unreliable flushes users to other less anonymous
+systems, where they may be easier to attack. We summarize
 in Section~\ref{sec:attacks} how well the Tor design defends against
 each of these attacks.
 
@@ -863,8 +865,8 @@ Volunteers are generally more willing to run services that can limit
 their own bandwidth usage. To accommodate them, Tor servers use a
 token bucket approach \cite{tannenbaum96} to
 enforce a long-term average rate of incoming bytes, while still
-permitting short-term bursts above the allowed bandwidth. Current bucket
-sizes are set to ten seconds' worth of traffic.
+permitting short-term bursts above the allowed bandwidth.
+% Current bucket sizes are set to ten seconds' worth of traffic.
 
 %Further, we want to avoid starving any Tor streams. Entire circuits
 %could starve if we read greedily from connections and one connection
@@ -878,13 +880,13 @@ Because the Tor protocol generates roughly the same number of outgoing
 bytes as incoming bytes, it is sufficient in practice to limit only
 incoming bytes.
 With TCP streams, however, the correspondence is not one-to-one:
-relaying a single incoming byte can require an entire 256-byte cell.
+relaying a single incoming byte can require an entire 512-byte cell.
 (We can't just wait for more bytes, because the local application may
 be waiting for a reply.) Therefore, we treat this case as if the entire
 cell size had been read, regardless of the fullness of the cell.
 
 Further, inspired by Rennhard et al's design in \cite{anonnet}, a
-circuit's edges heuristically distinguish interactive streams from bulk
+circuit's edges can heuristically distinguish interactive streams from bulk
 streams by comparing the frequency with which they supply cells.  We can
 provide good latency for interactive streams by giving them preferential
 service, while still giving good overall throughput to the bulk
@@ -950,10 +952,8 @@ Currently, non-data relay cells do not affect the windows. Thus we
 avoid potential deadlock issues, for example, arising because a stream
 can't send a \emph{relay sendme} cell when its packaging window is empty.
 
-These arbitrarily chosen parameters
-%are probably not optimal; more
-%research remains to find which parameters
-seem to give tolerable throughput and delay; more research remains.
+These arbitrarily chosen parameters seem to give tolerable throughput
+and delay; see Section~\ref{sec:in-the-wild}.
 
 \Section{Other design decisions}
 
@@ -984,8 +984,8 @@ TLS handshakes or accepting \emph{create} cells.  So long as these
 tokens are easy to verify and computationally expensive to produce, this
 approach limits the attack multiplier.  Additionally, ORs may limit
 the rate at which they accept create cells and TLS connections, so that
-the computational work of processing them does not drown out the (comparatively
-inexpensive) work of symmetric cryptography needed to keep cells
+the computational work of processing them does not drown out the
+symmetric cryptography operations that keep cells
 flowing.  This rate limiting could, however, allow an attacker
 to slow down other users when they build new circuits.
 
@@ -1189,8 +1189,8 @@ Using directory servers is simpler and more flexible than flooding.
 Flooding is expensive, and complicates the analysis when we
 start experimenting with non-clique network topologies. Signed
 directories can be cached by other
-onion routers.
-Thus directory servers are not a performance
+onion routers,
+so directory servers are not a performance
 bottleneck when we have many users, and do not aid traffic analysis by
 forcing clients to periodically announce their existence to any
 central point.
@@ -1414,7 +1414,7 @@ been shown to be effective against SafeWeb \cite{hintz-pet02}.
 It may be less effective against Tor, since
 streams are multiplexed within the same circuit, and
 fingerprinting will be limited to
-the granularity of cells (currently 256 bytes). Additional
+the granularity of cells (currently 512 bytes). Additional
 defenses could include
 larger cell sizes, padding schemes to group websites
 into large sets, and link
@@ -1610,6 +1610,64 @@ point is no more sensitive than any other OR on
 a circuit, since all data passing through the rendezvous is encrypted
 with a session key shared by Alice and Bob.
 
+\Section{Early experiences: Tor in the Wild}
+\label{sec:in-the-wild}
+
+The current Tor network, as of mid January 2004, consists of 16 nodes
+(14 in the US, 2 in Europe), and we're adding more each week as the code
+gets more robust.\footnote{For comparison, the current remailer network
+has about 30 nodes.} Each node has at least a 768k/768k connection, and
+most have 10Mb. The number of users varies (and of course, it's hard to
+tell for sure), but we sometimes have several hundred users---admins at
+several companies have started putting their entire department's web
+traffic through Tor, to block snooping admins in other divisions of
+their company from reading the traffic. Tor users have reported using
+the network for web browsing, ftp, IRC, AIM, Kazaa, and ssh.
+
+As of mid January, each Tor node was processing roughly 800,000 relay
+cells (a bit under half a gigabyte) per week. On average, about 80\%
+of each 500-byte payload is full for cells going back to the client,
+whereas about 40\% is full for cells coming from the client. (They are
+difference because most of our traffic is web browsing.) Interactive
+traffic like ssh brings down the average a lot---once we have more
+experience, and assuming we can resolve the anonymity issues, we will
+consider partitioning traffic into two relay cell sizes: one to handle
+bulk traffic and one for interactive traffic.
+
+We haven't asked to use PlanetLab \cite{planetlab} to provide more nodes,
+because their AUP excludes projects like Tor (see also \cite{darkside}. On
+the other hand, we have had no abuse issues since the network was deployed
+in October 2003. Our default exit policy rejects smtp requests, to block
+spamming even before it becomes an issue. For now we're happy with our
+slow growth rate, while we add features, resolve bugs, and get a feel for
+what users actually want from an anonymity system. We are not eager to
+attract the Kazaa or warez communities, even though they would greatly
+bolster the anonymity sets---we must build a reputation of being for
+privacy, human rights, research, and other entirely legitimate activities.
+
+As for performance, profiling shows that almost all the CPU time for the
+Tor program itself is spent in AES (which is fast). Thus latency comes
+from two factors. First, network latency is a critical factor: we are
+intentionally bouncing traffic around the world several times. Second,
+our end-to-end congestion control algorithm focuses on protecting our
+volunteer servers from accidental DoS rather than providing maximum
+performance. Right now the first $500*500B=250KB$ of the stream arrives
+quickly, and after that throughput depends on the rate that \emph{relay
+sendme} acknowledgements arrive. We can tweak the congestion control
+parameters to provide faster throughput at the expense of requiring
+larger buffers at each node; adding the heuristics mentioned in
+Section~\ref{subsec:rate-limit} to give better speed to low-volume
+streams will change the equation too. More research remains to find the
+right balance.
+
+%performs badly on lossy networks. may need airhook or something else as
+%transport alternative?
+
+With the current network's topology and load, users can typically
+get 1-2 megabits sustained transfer rate. Overall, this performance is
+sufficient. The Tor design focuses on security; usability and performance
+just have to not suck too much.
+
 \Section{Open Questions in Low-latency Anonymity}
 \label{sec:maintaining-anonymity}
author	Roger Dingledine <arma@torproject.org>	2004-01-14 11:10:29 +0000
committer	Roger Dingledine <arma@torproject.org>	2004-01-14 11:10:29 +0000
commit	619fdd1ea7ad6364360957a3f484e3451f24dc85 (patch)
tree	0d96050753c27a4c72ac89cfb004003ae13a7613
parent	9a53afb45b20fc06b5f9fca17608465f061ee048 (diff)
download	tor-619fdd1ea7ad6364360957a3f484e3451f24dc85.tar.gz tor-619fdd1ea7ad6364360957a3f484e3451f24dc85.zip