diff options
Diffstat (limited to 'doc/design-paper')
-rw-r--r-- | doc/design-paper/roadmap-2007.pdf | bin | 119462 -> 0 bytes | |||
-rw-r--r-- | doc/design-paper/roadmap-2007.tex | 690 | ||||
-rw-r--r-- | doc/design-paper/roadmap-future.pdf | bin | 72297 -> 0 bytes | |||
-rw-r--r-- | doc/design-paper/roadmap-future.tex | 895 |
4 files changed, 0 insertions, 1585 deletions
diff --git a/doc/design-paper/roadmap-2007.pdf b/doc/design-paper/roadmap-2007.pdf Binary files differdeleted file mode 100644 index 2422c05888..0000000000 --- a/doc/design-paper/roadmap-2007.pdf +++ /dev/null diff --git a/doc/design-paper/roadmap-2007.tex b/doc/design-paper/roadmap-2007.tex deleted file mode 100644 index cebe4a5905..0000000000 --- a/doc/design-paper/roadmap-2007.tex +++ /dev/null @@ -1,690 +0,0 @@ -\documentclass{article} - -\usepackage{url} - -\newenvironment{tightlist}{\begin{list}{$\bullet$}{ - \setlength{\itemsep}{0mm} - \setlength{\parsep}{0mm} - % \setlength{\labelsep}{0mm} - % \setlength{\labelwidth}{0mm} - % \setlength{\topsep}{0mm} - }}{\end{list}} -\newcommand{\tmp}[1]{{\bf #1} [......] \\} -\newcommand{\plan}[1]{ {\bf (#1)}} - -\begin{document} - -\title{Tor Development Roadmap: Wishlist for Nov 2006--Dec 2007} -\author{Roger Dingledine \and Nick Mathewson \and Shava Nerad} - -\maketitle -\pagestyle{plain} - -% TO DO: -% add cites -% add time estimates - - -\section{Introduction} -%Hi, Roger! Hi, Shava. This paragraph should get deleted soon. Right now, -%this document goes into about as much detail as I'd like to go into for a -%technical audience, since that's the audience I know best. It doesn't have -%time estimates everywhere. It isn't well prioritized, and it doesn't -%distinguish well between things that need lots of research and things that -%don't. The breakdowns don't all make sense. There are lots of things where -%I don't make it clear how they fit into larger goals, and lots of larger -%goals that don't break down into little things. It isn't all stuff we can do -%for sure, and it isn't even all stuff we can do for sure in 2007. The -%tmp\{\} macro indicates stuff I haven't said enough about. That said, here -%plangoes... - -Tor (the software) and Tor (the overall software/network/support/document -suite) are now experiencing all the crises of success. Over the next year, -we're probably going to grow more in terms of users, developers, and funding -than before. This gives us the opportunity to perform long-neglected -maintenance tasks. - -\section{Code and design infrastructure} - -\subsection{Protocol revision} -To maintain backward compatibility, we've postponed major protocol -changes and redesigns for a long time. Because of this, there are a number -of sensible revisions we've been putting off until we could deploy several of -them at once. To do each of these, we first need to discuss design -alternatives with other cryptographers and outside collaborators to -make sure that our choices are secure. - -First of all, our protocol needs better {\bf versioning support} so that we -can make backward-incompatible changes to our core protocol. There are -difficult anonymity issues here, since many naive designs would make it easy -to tell clients apart (and then track them) based on their supported versions. - -With protocol versioning support would come the ability to {\bf future-proof - our ciphersuites}. For example, not only our OR protocol, but also our -directory protocol, is pretty firmly tied to the SHA-1 hash function, which -though not yet known to be insecure for our purposes, has begun to show -its age. We should -remove assumptions throughout our design based on the assumption that public -keys, secret keys, or digests will remain any particular size indefinitely. - -Our OR {\bf authentication protocol}, though provably -secure\cite{tap:pet2006}, relies more on particular aspects of RSA and our -implementation thereof than we had initially believed. To future-proof -against changes, we should replace it with a less delicate approach. - -\plan{For all the above: 2 person-months to specify, spread over several - months with time for interaction with external participants. One - person-month to implement. Start specifying in early 2007.} - -We might design a {\bf stream migration} feature so that streams tunneled -over Tor could be more resilient to dropped connections and changed IPs. -\plan{Not in 2007.} - -A new protocol could support {\bf multiple cell sizes}. Right now, all data -passes through the Tor network divided into 512-byte cells. This is -efficient for high-bandwidth protocols, but inefficient for protocols -like SSH or AIM that send information in small chunks. Of course, we need to -investigate the extent to which multiple sizes could make it easier for an -adversary to fingerprint a traffic pattern. \plan{Not in 2007.} - -As a part of our design, we should investigate possible {\bf cipher modes} -other than counter mode. For example, a mode with built-in integrity -checking, error propagation, and random access could simplify our protocol -significantly. Sadly, many of these are patented and unavailable for us. -\plan{Not in 2007.} - -\subsection{Scalability} - -\subsubsection{Improved directory efficiency} -Right now, clients download a statement of the {\bf network status} made by -each directory authority. We could reduce network bandwidth significantly by -having the authorities jointly sign a statement reflecting their vote on the -current network status. This would save clients up to 160K per hour, and -make their view of the network more uniform. Of course, we'd need to make -sure the voting process was secure and resilient to failures in the -network.\plan{Must do; specify in 2006. 2 weeks to specify, 3-4 weeks to - implement.} - -We should {\bf shorten router descriptors}, since the current format includes -a great deal of information that's only of interest to the directory -authorities, and not of interest to clients. We can do this by having each -router upload a short-form and a long-form signed descriptor, and having -clients download only the short form. Even a naive version of this would -save about 40\% of the bandwidth currently spent by clients downloading -descriptors.\plan{Must do; specify in 2006. 3-4 weeks.} - -We should {\bf have routers upload their descriptors even less often}, so -that clients do not need to download replacements every 18 hours whether any -information has changed or not. (As of Tor 0.1.2.3-alpha, clients tolerate -routers that don't upload often, but routers still upload at least every 18 -hours to support older clients.) \plan{Must do, but not until 0.1.1.x is -deprecated in mid 2007. 1 week.} - -\subsubsection{Non-clique topology} -Our current network design achieves a certain amount of its anonymity by -making clients act like each other through the simple expedient of making -sure that all clients know all servers, and that any server can talk to any -other server. But as the number of servers increases to serve an -ever-greater number of clients, these assumptions become impractical. - -At worst, if these scalability issues become troubling before a solution is -found, we can design and build a solution to {\bf split the network into -multiple slices} until a better solution comes along. This is not ideal, -since rather than looking like all other users from a point of view of path -selection, users would ``only'' look like 200,000--300,000 other -users.\plan{Not unless needed.} - -We are in the process of designing {\bf improved schemes for network - scalability}. Some approaches focus on limiting what an adversary can know -about what a user knows; others focus on reducing the extent to which an -adversary can exploit this knowledge. These are currently in their infancy, -and will probably not be needed in 2007, but they must be designed in 2007 if -they are to be deployed in 2008.\plan{Design in 2007; unknown difficulty. - Write a paper.} - -\subsubsection{Relay incentives} -To support more users on the network, we need to get more servers. So far, -we've relied on volunteerism to attract server operators, and so far it's -served us well. But in the long run, we need to {\bf design incentives for - users to run servers} and relay traffic for others. Most obviously, we -could try to build the network so that servers offered improved service for -other servers, but we would need to do so without weakening anonymity and -making it obvious which connections originate from users running servers. We -have some preliminary designs~\cite{incentives-txt,tor-challenges}, -but need to perform -some more research to make sure they would be safe and effective.\plan{Write - a draft paper; 2 person-months.} - -\subsection{Portability} -Our {\bf Windows implementation}, though much improved, continues to lag -behind Unix and Mac OS X, especially when running as a server. We hope to -merge promising patches from Mike Chiussi to address this point, and bring -Windows performance on par with other platforms.\plan{Do in 2007; 1.5 months - to integrate not counting Mike's work.} - -We should have {\bf better support for portable devices}, including modes of -operation that require less RAM, and that write to disk less frequently (to -avoid wearing out flash RAM).\plan{Optional; 2 weeks.} - -We should {\bf stop using socketpair on Windows}; instead, we can use -in-memory structures to communicate between cpuworkers and the main thread, -and between connections.\plan{Optional; 1 week.} - -\subsection{Performance: resource usage} -We've been working on {\bf using less RAM}, especially on servers. This has -paid off a lot for directory caches in the 0.1.2, which in some cases are -using 90\% less memory than they used to require. But we can do better, -especially in the area around our buffer management algorithms, by using an -approach more like the BSD and Linux kernels use instead of our current ring -buffer approach. (For OR connections, we can just use queues of cell-sized -chunks produced with a specialized allocator.) This could potentially save -around 25 to 50\% of the memory currently allocated for network buffers, and -make Tor a more attractive proposition for restricted-memory environments -like old computers, mobile devices, and the like.\plan{Do in 2007; 2-3 weeks - plus one week measurement.} - -We should improve our {\bf bandwidth limiting}. The current system has been -crucial in making users willing to run servers: nobody is willing to run a -server if it might use an unbounded amount of bandwidth, especially if they -are charged for their usage. We can make our system better by letting users -configure bandwidth limits independently for their own traffic and traffic -relayed for others; and by adding write limits for users running directory -servers.\plan{Do in 2006; 2-3 weeks.} - -On many hosts, sockets are still in short supply, and will be until we can -migrate our protocol to UDP. We can {\bf use fewer sockets} by making our -self-to-self connections happen internally to the code rather than involving -the operating system's socket implementation.\plan{Optional; 1 week.} - -\subsection{Performance: network usage} -We know too little about how well our current path -selection algorithms actually spread traffic around the network in practice. -We should {\bf research the efficacy of our traffic allocation} and either -assure ourselves that it is close enough to optimal as to need no improvement -(unlikely) or {\bf identify ways to improve network usage}, and get more -users' traffic delivered faster. Performing this research will require -careful thought about anonymity implications. - -We should also {\bf examine the efficacy of our congestion control - algorithm}, and see whether we can improve client performance in the -presence of a congested network through dynamic `sendme' window sizes or -other means. This will have anonymity implications too if we aren't careful. - -\plan{For both of the above: research, design and write - a measurement tool in 2007: 1 month. See if we can interest a graduate - student.} - -We should work on making Tor's cell-based protocol perform better on -networks with low bandwidth -and high packet loss.\plan{Do in 2007 if we're funded to do it; 4-6 weeks.} - -\subsection{Performance scenario: one Tor client, many users} -We should {\bf improve Tor's performance when a single Tor handles many - clients}. Many organizations want to manage a single Tor client on their -firewall for many users, rather than having each user install a separate -Tor client. We haven't optimized for this scenario, and it is likely that -there are some code paths in the current implementation that become -inefficient when a single Tor is servicing hundreds or thousands of client -connections. (Additionally, it is likely that such clients have interesting -anonymity requirements the we should investigate.) We should profile Tor -under appropriate loads, identify bottlenecks, and fix them.\plan{Do in 2007 - if we're funded to do it; 4-8 weeks.} - -\subsection{Tor servers on asymmetric bandwidth} - -Tor should work better on servers that have asymmetric connections like cable -or DSL. Because Tor has separate TCP connections between each -hop, if the incoming bytes are arriving just fine and the outgoing bytes are -all getting dropped on the floor, the TCP push-back mechanisms don't really -transmit this information back to the incoming streams.\plan{Do in 2007 since - related to bandwidth limiting. 3-4 weeks.} - -\subsection{Running Tor as both client and server} - -Many performance tradeoffs and balances that might need more attention. -We first need to track and fix whatever bottlenecks emerge; but we also -need to invent good algorithms for prioritizing the client's traffic -without starving the server's traffic too much.\plan{No idea; try -profiling and improving things in 2007.} - -\subsection{Protocol redesign for UDP} -Tor has relayed only TCP traffic since its first versions, and has used -TLS-over-TCP to do so. This approach has proved reliable and flexible, but -in the long term we will need to allow UDP traffic on the network, and switch -some or all of the network to using a UDP transport. {\bf Supporting UDP - traffic} will make Tor more suitable for protocols that require UDP, such -as many VOIP protocols. {\bf Using a UDP transport} could greatly reduce -resource limitations on servers, and make the network far less interruptible -by lossy connections. Either of these protocol changes would require a great -deal of design work, however. We hope to be able to enlist the aid of a few -talented graduate students to assist with the initial design and -specification, but the actual implementation will require significant testing -of different reliable transport approaches.\plan{Maybe do a design in 2007 if -we find an interested academic. Ian or Ben L might be good partners here.} - -\section{Blocking resistance} - -\subsection{Design for blocking resistance} -We have written a design document explaining our general approach to blocking -resistance. We should workshop it with other experts in the field to get -their ideas about how we can improve Tor's efficacy as an anti-censorship -tool. - -\subsection{Implementation: client-side and bridges-side} - -Our anticensorship design calls for some nodes to act as ``bridges'' -that are outside a national firewall, and others inside the firewall to -act as pure clients. This part of the design is quite clear-cut; we're -probably ready to begin implementing it. To {\bf implement bridges}, we -need to have servers publish themselves as limited-availability relays -to a special bridge authority if they judge they'd make good servers. -We will also need to help provide documentation for port forwarding, -and an easy configuration tool for running as a bridge. - -To {\bf implement clients}, we need to provide a flexible interface to -learn about bridges and to act on knowledge of bridges. We also need -to teach them how to know to use bridges as their first hop, and how to -fetch directory information from both classes of directory authority. - -Clients also need to {\bf use the encrypted directory variant} added in Tor -0.1.2.3-alpha. This will let them retrieve directory information over Tor -once they've got their initial bridges. We may want to get the rest of the -Tor user base to begin using this encrypted directory variant too, to -provide cover. - -Bridges will want to be able to {\bf listen on multiple addresses and ports} -if they can, to give the adversary more ports to block. - -\subsection{Research: anonymity implications from becoming a bridge} - -\subsection{Implementation: bridge authority} - -The design here is also reasonably clear-cut: we need to run some -directory authorities with a slightly modified protocol that doesn't leak -the entire list of bridges. Thus users can learn up-to-date information -for bridges they already know about, but they can't learn about arbitrary -new bridges. - -\subsection{Normalizing the Tor protocol on the wire} -Additionally, we should {\bf resist content-based filters}. Though an -adversary can't see what users are saying, some aspects of our protocol are -easy to fingerprint {\em as} Tor. We should correct this where possible. - -Look like Firefox; or look like nothing? -Future research: investigate timing similarities with other protocols. - -\subsection{Access control for bridges} -Design/impl: password-protecting bridges, in light of above. -And/or more general access control. - -\subsection{Research: scanning-resistance} - -\subsection{Research/Design/Impl: how users discover bridges} -Our design anticipates an arms race between discovery methods and censors. -We need to begin the infrastructure on our side quickly, preferably in a -flexible language like Python, so we can adapt quickly to censorship. - -phase one: personal bridges -phase two: families of personal bridges -phase three: more structured social network -phase four: bag of tricks -Research: phase five... - -Integration with Psiphon, etc? - -\subsection{Document best practices for users} -Document best practices for various activities common among -blocked users (e.g. WordPress use). - -\subsection{Research: how to know if a bridge has been blocked?} - -\subsection{GeoIP maintenance, and "private" user statistics} -How to know if the whole idea is working? - -\subsection{Research: hiding whether the user is reading or publishing?} - -\subsection{Research: how many bridges do you need to know to maintain -reachability?} - -\subsection{Resisting censorship of the Tor website, docs, and mirrors} - -We should take some effort to consider {\bf initial distribution of Tor and - related information} in countries where the Tor website and mirrors are -censored. (Right now, most countries that block access to Tor block only the -main website and leave mirrors and the network itself untouched.) Falling -back on word-of-mouth is always a good last resort, but we should also take -steps to make sure it's relatively easy for users to get ahold of a copy. - -\section{Security} - -\subsection{Security research projects} - -We should investigate approaches with some promise to help Tor resist -end-to-end traffic correlation attacks. It's an open research question -whether (and to what extent) {\bf mixed-latency} networks, {\bf low-volume - long-distance padding}, or other approaches can resist these attacks, which -are currently some of the most effective against careful Tor users. We -should research these questions and perform simulations to identify -opportunities for strengthening our design without dropping performance to -unacceptable levels. %Cite something -\plan{Start doing this in 2007; write a paper. 8-16 weeks.} - -We've got some preliminary results suggesting that {\bf a topology-aware - routing algorithm}~\cite{feamster:wpes2004} could reduce Tor users' -vulnerability against local or ISP-level adversaries, by ensuring that they -are never in a position to watch both ends of a connection. We need to -examine the effects of this approach in more detail and consider side-effects -on anonymity against other kinds of adversaries. If the approach still looks -promising, we should investigate ways for clients to implement it (or an -approximation of it) without having to download routing tables for the whole -Internet. \plan{Not in 2007 unless a graduate student wants to do it.} - -%\tmp{defenses against end-to-end correlation} We don't expect any to work -%right now, but it would be useful to learn that one did. Alternatively, -%proving that one didn't would free up researchers in the field to go work on -%other things. -% -% See above; I think I got this. - -We should research the efficacy of {\bf website fingerprinting} attacks, -wherein an adversary tries to match the distinctive traffic and timing -pattern of the resources constituting a given website to the traffic pattern -of a user's client. These attacks work great in simulations, but in -practice we hear they don't work nearly as well. We should get some actual -numbers to investigate the issue, and figure out what's going on. If we -resist these attacks, or can improve our design to resist them, we should. -% add cites -\plan{Possibly part of end-to-end correlation paper. Otherwise, not in 2007 - unless a graduate student is interested.} - -\subsection{Implementation security} -Right now, each Tor node stores its keys unencrypted. We should {\bf encrypt - more Tor keys} so that Tor authorities can require a startup password. We -should look into adding intermediary medium-term ``signing keys'' between -identity keys and onion keys, so that a password could be required to replace -a signing key, but not to start Tor. This would improve Tor's long-term -security, especially in its directory authority infrastructure.\plan{Design this - as a part of the revised ``v2.1'' directory protocol; implement it in - 2007. 3-4 weeks.} - -We should also {\bf mark RAM that holds key material as non-swappable} so -that there is no risk of recovering key material from a hard disk -compromise. This would require submitting patches upstream to OpenSSL, where -support for marking memory as sensitive is currently in a very preliminary -state.\plan{Nice to do, but not in immediate Tor scope.} - -There are numerous tools for identifying trouble spots in code (such as -Coverity or even VS2005's code analysis tool) and we should convince somebody -to run some of them against the Tor codebase. Ideally, we could figure out a -way to get our code checked periodically rather than just once.\plan{Almost - no time once we talk somebody into it.} - -We should try {\bf protocol fuzzing} to identify errors in our -implementation.\plan{Not in 2007 unless we find a grad student or - undergraduate who wants to try.} - -Our guard nodes help prevent an attacker from being able to become a chosen -client's entry point by having each client choose a few favorite entry points -as ``guards'' and stick to them. We should implement a {\bf directory - guards} feature to keep adversaries from enumerating Tor users by acting as -a directory cache.\plan{Do in 2007; 2 weeks.} - -\subsection{Detect corrupt exits and other servers} -With the success of our network, we've attracted servers in many locations, -operated by many kinds of people. Unfortunately, some of these locations -have compromised or defective networks, and some of these people are -untrustworthy or incompetent. Our current design relies on authority -administrators to identify bad nodes and mark them as nonfunctioning. We -should {\bf automate the process of identifying malfunctioning nodes} as -follows: - -We should create a generic {\bf feedback mechanism for add-on tools} like -Mike Perry's ``Snakes on a Tor'' to report failing nodes to authorities. -\plan{Do in 2006; 1-2 weeks.} - -We should write tools to {\bf detect more kinds of innocent node failure}, -such as nodes whose network providers intercept SSL, nodes whose network -providers censor popular websites, and so on. We should also try to detect -{\bf routers that snoop traffic}; we could do this by launching connections -to throwaway accounts, and seeing which accounts get used.\plan{Do in 2007; - ask Mike Perry if he's interested. 4-6 weeks.} - -We should add {\bf an efficient way for authorities to mark a set of servers - as probably collaborating} though not necessarily otherwise dishonest. -This happens when an administrator starts multiple routers, but doesn't mark -them as belonging to the same family.\plan{Do during v2.1 directory protocol - redesign; 1-2 weeks to implement.} - -To avoid attacks where an adversary claims good performance in order to -attract traffic, we should {\bf have authorities measure node performance} -(including stability and bandwidth) themselves, and not simply believe what -they're told. Measuring stability can be done by tracking MTBF. Measuring -bandwidth can be tricky, since it's hard to distinguish between a server with -low capacity, and a high-capacity server with most of its capacity in -use.\plan{Do ``Stable'' in 2007; 2-3 weeks. ``Fast'' will be harder; do it - if we can interest a grad student.} - -{\bf Operating a directory authority should be easier.} We rely on authority -operators to keep the network running well, but right now their job involves -too much busywork and administrative overhead. A better interface for them -to use could free their time to work on exception cases rather than on -adding named nodes to the network.\plan{Do in 2007; 4-5 weeks.} - -\subsection{Protocol security} - -In addition to other protocol changes discussed above, -% And should we move some of them down here? -NM -we should add {\bf hooks for denial-of-service resistance}; we have some -preliminary designs, but we shouldn't postpone them until we really need them. -If somebody tries a DDoS attack against the Tor network, we won't want to -wait for all the servers and clients to upgrade to a new -version.\plan{Research project; do this in 2007 if funded.} - -\section{Development infrastructure} - -\subsection{Build farm} -We've begun to deploy a cross-platform distributed build farm of hosts -that build and test the Tor source every time it changes in our development -repository. - -We need to {\bf get more participants}, so that we can test a larger variety -of platforms. (Previously, we've only found out when our code had broken on -obscure platforms when somebody got around to building it.) - -We need also to {\bf add our dependencies} to the build farm, so that we can -ensure that libraries we need (especially libevent) do not stop working on -any important platform between one release and the next. - -\plan{This is ongoing as more buildbots arrive.} - -\subsection{Improved testing harness} -Currently, our {\bf unit tests} cover only about 20\% of the code base. This -is uncomfortably low; we should write more and switch to a more flexible -testing framework.\plan{Ongoing basis, time permitting.} - -We should also write flexible {\bf automated single-host deployment tests} so -we can more easily verify that the current codebase works with the -network.\plan{Worthwhile in 2007; would save lots of time. 2-4 weeks.} - -We should build automated {\bf stress testing} frameworks so we can see which -realistic loads cause Tor to perform badly, and regularly profile Tor against -these loads. This would give us {\it in vitro} performance values to -supplement our deployment experience.\plan{Worthwhile in 2007; 2-6 weeks.} - -We should improve our memory profiling code.\plan{...} - - -\subsection{Centralized build system} -We currently rely on a separate packager to maintain the packaging system and -to build Tor on each platform for which we distribute binaries. Separate -package maintainers is sensible, but separate package builders has meant -long turnaround times between source releases and package releases. We -should create the necessary infrastructure for us to produce binaries for all -major packages within an hour or so of source release.\plan{We should - brainstorm this at least in 2007.} - -\subsection{Improved metrics} -We need a way to {\bf measure the network's health, capacity, and degree of - utilization}. Our current means for doing this are ad hoc and not -completely accurate - -We need better ways to {\bf tell which countries are users are coming from, - and how many there are}. A good perspective of the network helps us -allocate resources and identify trouble spots, but our current approaches -will work less and less well as we make it harder for adversaries to -enumerate users. We'll probably want to shift to a smarter, statistical -approach rather than our current ``count and extrapolate'' method. - -\plan{All of this in 2007 if funded; 4-8 weeks} - -% \tmp{We'd like to know how much of the network is getting used.} -% I think this is covered above -NM - -\subsection{Controller library} -We've done lots of design and development on our controller interface, which -allows UI applications and other tools to interact with Tor. We could -encourage the development of more such tools by releasing a {\bf - general-purpose controller library}, ideally with API support for several -popular programming languages.\plan{2006 or 2007; 1-2 weeks.} - -\section{User experience} - -\subsection{Get blocked less, get blocked less broadly} -Right now, some services block connections from the Tor network because -they don't have a better -way to keep vandals from abusing them than blocking IP addresses associated -with vandalism. Our approach so far has been to educate them about better -solutions that currently exist, but we should also {\bf create better -solutions for limiting vandalism by anonymous users} like credential and -blind-signature based implementations, and encourage their use. Other -promising starting points including writing a patch and explanation for -Wikipedia, and helping Freenode to document, maintain, and expand its -current Tor-friendly position.\plan{Do a writeup here in 2007; 1-2 weeks.} - -Those who do block Tor users also block overbroadly, sometimes blacklisting -operators of Tor servers that do not permit exit to their services. We could -obviate innocent reasons for doing so by designing a {\bf narrowly-targeted Tor - RBL service} so that those who wanted to overblock Tor could no longer -plead incompetence.\plan{Possibly in 2007 if we decide it's a good idea; 3 - weeks.} - -\subsection{All-in-one bundle} -We need a well-tested, well-documented bundle of Tor and supporting -applications configured to use it correctly. We have an initial -implementation well under way, but it will need additional work in -identifying requisite Firefox extensions, identifying security threats, -improving user experience, and so on. This will need significantly more work -before it's ready for a general public release. - -\subsection{LiveCD Tor} -We need a nice bootable livecd containing a minimal OS and a few applications -configured to use it correctly. The Anonym.OS project demonstrated that this -is quite feasible, but their project is not currently maintained. - -\subsection{A Tor client in a VM} -\tmp{a.k.a JanusVM} which is quite related to the firewall-level deployment -section below. JanusVM is a Linux kernel running in VMWare. It gets an IP -address from the network, and serves as a DHCP server for its host Windows -machine. It intercepts all outgoing traffic and redirects it into Privoxy, -Tor, etc. This Linux-in-Windows approach may help us with scalability in -the short term, and it may also be a good long-term solution rather than -accepting all security risks in Windows. - -%\subsection{Interface improvements} -%\tmp{Allow controllers to manipulate server status.} -% (Why is this in the User Experience section?) -RD -% I think it's better left to a generic ``make controller iface better'' item. - -\subsection{Firewall-level deployment} -Another useful deployment mode for some users is using {\bf Tor in a firewall - configuration}, and directing all their traffic through Tor. This can be a -little tricky to set up currently, but it's an effective way to make sure no -traffic leaves the host un-anonymized. To achieve this, we need to {\bf - improve and port our new TransPort} feature which allows Tor to be used -without SOCKS support; to {\bf add an anonymizing DNS proxy} feature to Tor; -and to {\bf construct a recommended set of firewall configurations} to redirect -traffic to Tor. - -This is an area where {\bf deployment via a livecd}, or an installation -targeted at specialized home routing hardware, could be useful. - -\subsection{Assess software and configurations for anonymity risks} -Right now, users and packagers are more or less on their own when selecting -Firefox extensions. We should {\bf assemble a recommended list of browser - extensions} through experiment, and include this in the application bundles -we distribute. - -We should also describe {\bf best practices for using Tor with each class of - application}. For example, Ethan Zuckerman has written a detailed -tutorial on how to use Tor, Firefox, GMail, and Wordpress to blog with -improved safety. There are many other cases on the Internet where anonymity -would be helpful, and there are a lot of ways to screw up using Tor. - -The Foxtor and Torbutton extensions serve similar purposes; we should pick a -favorite, and merge in the useful features of the other. - -%\tmp{clean up our own bundled software: -%E.g. Merge the good features of Foxtor into Torbutton} -% -% What else did you have in mind? -NM - -\subsection{Localization} -Right now, most of our user-facing code is internationalized. We need to -internationalize the last few hold-outs (like the Tor expert installer), and get -more translations for the parts that are already internationalized. - -Also, we should look into a {\bf unified translator's solution}. Currently, -since different tools have been internationalized using the -framework-appropriate method, different tools require translators to localize -them via different interfaces. Inasmuch as possible, we should make -translators only need to use a single tool to translate the whole Tor suite. - -\section{Support} - -It would be nice to set up some {\bf user support infrastructure} and -{\bf contributor support infrastructure}, especially focusing on server -operators and on coordinating volunteers. - -This includes intuitive and easy ticket systems for bug reports and -feature suggestions (not just mailing lists with a half dozen people -and no clear roles for who answers what), but it also includes a more -personalized and efficient framework for interaction so we keep the -attention and interest of the contributors, and so we make them feel -helpful and wanted. - -\section{Documentation} - -\subsection{Unified documentation scheme} - -We need to {\bf inventory our documentation.} Our documentation so far has -been mostly produced on an {\it ad hoc} basis, in response to particular -needs and requests. We should figure out what documentation we have, which of -it (if any) should get priority, and whether we can't put it all into a -single format. - -We could {\bf unify the docs} into a single book-like thing. This will also -help us identify what sections of the ``book'' are missing. - -\subsection{Missing technical documentation} - -We should {\bf revise our design paper} to reflect the new decisions and -research we've made since it was published in 2004. This will help other -researchers evaluate and suggest improvements to Tor's current design. - -Other projects sometimes implement the client side of our protocol. We -encourage this, but we should write {\bf a document about how to avoid -excessive resource use}, so we don't need to worry that they will do so -without regard to the effect of their choices on server resources. - -\subsection{Missing user documentation} - -Our documentation falls into two broad categories: some is `discoursive' and -explains in detail why users should take certain actions, and other -documentation is `comprehensive' and describes all of Tor's features. Right -now, we have no document that is both deep, readable, and thorough. We -should correct this by identifying missing spots in our design. - -\bibliographystyle{plain} \bibliography{tor-design} - -\end{document} - diff --git a/doc/design-paper/roadmap-future.pdf b/doc/design-paper/roadmap-future.pdf Binary files differdeleted file mode 100644 index 8300ce19c9..0000000000 --- a/doc/design-paper/roadmap-future.pdf +++ /dev/null diff --git a/doc/design-paper/roadmap-future.tex b/doc/design-paper/roadmap-future.tex deleted file mode 100644 index 4ab240f977..0000000000 --- a/doc/design-paper/roadmap-future.tex +++ /dev/null @@ -1,895 +0,0 @@ -\documentclass{article} - -\usepackage{url} -\usepackage{fullpage} - -\newenvironment{tightlist}{\begin{list}{$\bullet$}{ - \setlength{\itemsep}{0mm} - \setlength{\parsep}{0mm} - % \setlength{\labelsep}{0mm} - % \setlength{\labelwidth}{0mm} - % \setlength{\topsep}{0mm} - }}{\end{list}} -\newcommand{\tmp}[1]{{\bf #1} [......] \\} -\newcommand{\plan}[1]{ {\bf (#1)}} - -\begin{document} - -\title{Tor Development Roadmap: Wishlist for 2008 and beyond} -\author{Roger Dingledine \and Nick Mathewson} -\date{} - -\maketitle -\pagestyle{plain} - -\section{Introduction} - -Tor (the software) and Tor (the overall software/network/support/document -suite) are now experiencing all the crises of success. Over the next -years, we're probably going to grow even more in terms of users, developers, -and funding than before. This document attempts to lay out all the -well-understood next steps that Tor needs to take. We should periodically -reorganize it to reflect current and intended priorities. - -\section{Everybody can be a relay} - -We've made a lot of progress towards letting an ordinary Tor client also -serve as a Tor relay. But these issues remain. - -\subsection{UPNP} - -We should teach Vidalia how to speak UPNP to automatically open and -forward ports on common (e.g. Linksys) routers. There are some promising -Qt-based UPNP libs out there, and in any case there are others (e.g. in -Perl) that we can base it on. - -\subsection{``ORPort auto'' to look for a reachable port} - -Vidalia defaults to port 443 on Windows and port 8080 elsewhere. But if -that port is already in use, or the ISP filters incoming connections -on that port (some cablemodem providers filter 443 inbound), the user -needs to learn how to notice this, and then pick a new one and type it -into Vidalia. - -We should add a new option ``auto'' that cycles through a set of preferred -ports, testing bindability and reachability for each of them, and only -complains to the user once it's given up on the common choices. - -\subsection{Incentives design} - -Roger has been working with researchers at Rice University to simulate -and analyze a new design where the directory authorities assign gold -stars to well-behaving relays, and then all the relays give priority -to traffic from gold-starred relays. The great feature of the design is -that not only does it provide the (explicit) incentive to run a relay, -but it also aims to grow the overall capacity of the network, so even -non-relays will benefit. - -It needs more analysis, and perhaps more design work, before we try -deploying it. - -\subsection{Windows libevent} - -Tor relays still don't work well or reliably on Windows XP or Windows -Vista, because we don't use the Windows-native ``overlapped IO'' -approach. Christian King made a good start at teaching libevent about -overlapped IO during Google Summer of Code 2007, and next steps are -to a) finish that, b) teach Tor to do openssl calls on buffers rather -than directly to the network, and c) teach Tor to use the new libevent -buffers approach. - -\subsection{Network scaling} - -If we attract many more relays, we will need to handle the growing pains -in terms of getting all the directory information to all the users. - -The first piece of this issue is a practical question: since the -directory size scales linearly with more relays, at some point it -will no longer be practical for every client to learn about every -relay. We can try to reduce the amount of information each client needs -to fetch (e.g. based on fetching less information preemptively as in -Section~\ref{subsec:fewer-descriptor-fetches} below), but eventually -clients will need to learn about only a subset of the network, and we -will need to design good ways to divide up the network information. - -The second piece is an anonymity question that arises from this -partitioning: if Tor's security comes from having all the clients -behaving in similar ways, yet we are now giving different clients -different directory information, how can we minimize the new anonymity -attacks we introduce? - -\subsection{Using fewer sockets} - -Since in the current network every Tor relay can reach every other Tor -relay, and we have many times more users than relays, pretty much every -possible link in the network is in use. That is, the current network -is a clique in practice. - -And since each of these connections requires a TCP socket, it's going -to be hard for the network to grow much larger: many systems come with -a default of 1024 file descriptors allowed per process, and raising -that ulimit is hard for end users. Worse, many low-end gateway/firewall -routers can't handle this many connections in their routing table. - -One approach is a restricted-route topology~\cite{danezis:pet2003}: -predefine which relays can reach which other relays, and communicate -these restrictions to the relays and the clients. We need to compute -which links are acceptable in a way that's decentralized yet scalable, -and in a way that achieves a small-worlds property; and we -need an efficient (compact) way to characterize the topology information -so all the users could keep up to date. - -Another approach would be to switch to UDP-based transport between -relays, so we don't need to keep the TCP sockets open at all. Needs more -investigation too. - -\subsection{Auto bandwidth detection and rate limiting, especially for - asymmetric connections.} - - -\subsection{Better algorithms for giving priority to local traffic} - -Proposal 111 made a lot of progress at separating local traffic from -relayed traffic, so Tor users can rate limit the relayed traffic at a -stricter level. But since we want to pass both traffic classes over the -same TCP connection, we can't keep them entirely separate. The current -compromise is that we treat all bytes to/from a given connectin as -local traffic if any of the bytes within the past N seconds were local -bytes. But a) we could use some more intelligent heuristics, and b) -this leaks information to an active attacker about when local traffic -was sent/received. - -\subsection{Tolerate absurdly wrong clocks, even for relays} - -Many of our users are on Windows, running with a clock several days or -even several years off from reality. Some of them are even intentionally -in this state so they can run software that will only run in the past. - -Before Tor 0.1.1.x, Tor clients would still function if their clock was -wildly off --- they simply got a copy of the directory and believed it. -Starting in Tor 0.1.1.x (and even moreso in Tor 0.2.0.x), the clients -only use networkstatus documents that they believe to be recent, so -clients with extremely wrong clocks no longer work. (This bug has been -an unending source of vague and confusing bug reports.) - -The first step is for clients to recognize when all the directory material -they're fetching has roughly the same offset from their current time, -and then automatically correct for it. - -Once that's working well, clients who opt to become bridge relays should -be able to use the same approach to serve accurate directory information -to their bridge users. - -\subsection{Risks from being a relay} - -Three different research -papers~\cite{back01,clog-the-queue,attack-tor-oak05} describe ways to -identify the nodes in a circuit by running traffic through candidate nodes -and looking for dips in the traffic while the circuit is active. These -clogging attacks are not that scary in the Tor context so long as relays -are never clients too. But if we're trying to encourage more clients to -turn on relay functionality too (whether as bridge relays or as normal -relays), then we need to understand this threat better and learn how to -mitigate it. - -One promising research direction is to investigate the RelayBandwidthRate -feature that lets Tor rate limit relayed traffic differently from local -traffic. Since the attacker's ``clogging'' traffic is not in the same -bandwidth class as the traffic initiated by the user, it may be harder -to detect interference. Or it may not be. - -\subsection{First a bridge, then a public relay?} - -Once enough of the items in this section are done, I want all clients -to start out automatically detecting their reachability and opting -to be bridge relays. - -Then if they realize they have enough consistency and bandwidth, they -should automatically upgrade to being non-exit relays. - -What metrics should we use for deciding when we're fast enough -and stable enough to switch? Given that the list of bridge relays needs -to be kept secret, it doesn't make much sense to switch back. - -\section{Tor on low resources / slow links} -\subsection{Reducing directory fetches further} -\label{subsec:fewer-descriptor-fetches} -\subsection{AvoidDiskWrites} -\subsection{Using less ram} -\subsection{Better DoS resistance for tor servers / authorities} -\section{Blocking resistance} -\subsection{Better bridge-address-distribution strategies} -\subsection{Get more volunteers running bridges} -\subsection{Handle multiple bridge authorities} -\subsection{Anonymity for bridge users: second layer of entry guards, etc?} -\subsection{More TLS normalization} -\subsection{Harder to block Tor software distribution} -\subsection{Integration with Psiphon} -\section{Packaging} -\subsection{Switch Privoxy out for Polipo} - - Make Vidalia able to launch more programs itself -\subsection{Continue Torbutton improvements} - especially better docs -\subsection{Vidalia and stability (especially wrt ongoing Windows problems)} - learn how to get useful crash reports (tracebacks) from Windows users -\subsection{Polipo support on Windows} -\subsection{Auto update for Tor, Vidalia, others} -\subsection{Tor browser bundle for USB and standalone use} -\subsection{LiveCD solution} -\subsection{VM-based solution} -\subsection{Tor-on-enclave-firewall configuration} -\subsection{General tutorials on what common applications are Tor-friendly} -\subsection{Controller libraries (torctl) plus documentation} -\subsection{Localization and translation (Vidalia, Torbutton, web pages)} -\section{Interacting better with Internet sites} -\subsection{Make tordnsel (tor exitlist) better and more well-known} -\subsection{Nymble} -\subsection{Work with Wikipedia, Slashdot, Google(, IRC networks)} -\subsection{IPv6 support for exit destinations} -\section{Network health} -\subsection{torflow / soat to detect bad relays} -\subsection{make authorities more automated} -\subsection{torstatus pages and better trend tracking} -\subsection{better metrics for assessing network health / growth} - - geoip usage-by-country reporting and aggregation - (Once that's working, switch to Directory guards) -\section{Performance research} -\subsection{Load balance better} -\subsection{Improve our congestion control algorithms} -\subsection{Two-hops vs Three-hops} -\subsection{Transport IP packets end-to-end} -\section{Outreach and user education} -\subsection{"Who uses Tor" use cases} -\subsection{Law enforcement contacts} - - "Was this IP address a Tor relay recently?" database -\subsection{Commercial/enterprise outreach. Help them use Tor well and - not fear it.} -\subsection{NGO outreach and training.} - - "How to be a safe blogger" -\subsection{More activist coordinators, more people to answer user questions} -\subsection{More people to hold hands of server operators} -\subsection{Teaching the media about Tor} -\subsection{The-dangers-of-plaintext awareness} -\subsection{check.torproject.org and other "privacy checkers"} -\subsection{Stronger legal FAQ for US} -\subsection{Legal FAQs for other countries} -\section{Anonymity research} -\subsection{estimate relay bandwidth more securely} -\subsection{website fingerprinting attacks} -\subsection{safer e2e defenses} -\subsection{Using Tor when you really need anonymity. Can you compose it - with other steps, like more trusted guards or separate proxies?} -\subsection{Topology-aware routing; routing-zones, steven's pet2007 paper.} -\subsection{Exactly what do guard nodes provide?} - -Entry guards seem to defend against all sorts of attacks. Can we work -through all the benefits they provide? Papers like Nikita's CCS 2007 -paper make me think their value is not well-understood by the research -community. - -\section{Organizational growth and stability} -\subsection{A contingency plan if Roger gets hit by a bus} - - Get a new executive director -\subsection{More diversity of funding} - - Don't rely on any one funder as much - - Don't rely on any sector or funder category as much -\subsection{More Tor-funded people who are skilled at peripheral apps like - Vidalia, Torbutton, Polipo, etc} -\subsection{More coordinated media handling and strategy} -\subsection{Clearer and more predictable trademark behavior} -\subsection{More outside funding for internships, etc e.g. GSoC.} -\section{Hidden services} -\subsection{Scaling: how to handle many hidden services} -\subsection{Performance: how to rendezvous with them quickly} -\subsection{Authentication/authorization: how to tolerate DoS / load} -\section{Tor as a general overlay network} -\subsection{Choose paths / exit by country} -\subsection{Easier to run your own private servers and have Tor use them - anywhere in the path} -\subsection{Easier to run an independent Tor network} -\section{Code security/correctness} -\subsection{veracode} -\subsection{code audit} -\subsection{more fuzzing tools} -\subsection{build farm, better testing harness} -\subsection{Long-overdue code refactoring and cleanup} -\section{Protocol security} -\subsection{safer circuit handshake} -\subsection{protocol versioning for future compatibility} -\subsection{cell sizes} -\subsection{adapt to new key sizes, etc} - -\bibliographystyle{plain} \bibliography{tor-design} - -\end{document} - - - - -\section{Code and design infrastructure} - -\subsection{Protocol revision} -To maintain backward compatibility, we've postponed major protocol -changes and redesigns for a long time. Because of this, there are a number -of sensible revisions we've been putting off until we could deploy several of -them at once. To do each of these, we first need to discuss design -alternatives with other cryptographers and outside collaborators to -make sure that our choices are secure. - -First of all, our protocol needs better {\bf versioning support} so that we -can make backward-incompatible changes to our core protocol. There are -difficult anonymity issues here, since many naive designs would make it easy -to tell clients apart (and then track them) based on their supported versions. - -With protocol versioning support would come the ability to {\bf future-proof - our ciphersuites}. For example, not only our OR protocol, but also our -directory protocol, is pretty firmly tied to the SHA-1 hash function, which -though not yet known to be insecure for our purposes, has begun to show -its age. We should -remove assumptions throughout our design based on the assumption that public -keys, secret keys, or digests will remain any particular size indefinitely. - -Our OR {\bf authentication protocol}, though provably -secure\cite{tap:pet2006}, relies more on particular aspects of RSA and our -implementation thereof than we had initially believed. To future-proof -against changes, we should replace it with a less delicate approach. - -\plan{For all the above: 2 person-months to specify, spread over several - months with time for interaction with external participants. One - person-month to implement. Start specifying in early 2007.} - -We might design a {\bf stream migration} feature so that streams tunneled -over Tor could be more resilient to dropped connections and changed IPs. -\plan{Not in 2007.} - -A new protocol could support {\bf multiple cell sizes}. Right now, all data -passes through the Tor network divided into 512-byte cells. This is -efficient for high-bandwidth protocols, but inefficient for protocols -like SSH or AIM that send information in small chunks. Of course, we need to -investigate the extent to which multiple sizes could make it easier for an -adversary to fingerprint a traffic pattern. \plan{Not in 2007.} - -As a part of our design, we should investigate possible {\bf cipher modes} -other than counter mode. For example, a mode with built-in integrity -checking, error propagation, and random access could simplify our protocol -significantly. Sadly, many of these are patented and unavailable for us. -\plan{Not in 2007.} - -\subsection{Scalability} - -\subsubsection{Improved directory efficiency} - -We should {\bf have routers upload their descriptors even less often}, so -that clients do not need to download replacements every 18 hours whether any -information has changed or not. (As of Tor 0.1.2.3-alpha, clients tolerate -routers that don't upload often, but routers still upload at least every 18 -hours to support older clients.) \plan{Must do, but not until 0.1.1.x is -deprecated in mid 2007. 1 week.} - -\subsubsection{Non-clique topology} -Our current network design achieves a certain amount of its anonymity by -making clients act like each other through the simple expedient of making -sure that all clients know all servers, and that any server can talk to any -other server. But as the number of servers increases to serve an -ever-greater number of clients, these assumptions become impractical. - -At worst, if these scalability issues become troubling before a solution is -found, we can design and build a solution to {\bf split the network into -multiple slices} until a better solution comes along. This is not ideal, -since rather than looking like all other users from a point of view of path -selection, users would ``only'' look like 200,000--300,000 other -users.\plan{Not unless needed.} - -We are in the process of designing {\bf improved schemes for network - scalability}. Some approaches focus on limiting what an adversary can know -about what a user knows; others focus on reducing the extent to which an -adversary can exploit this knowledge. These are currently in their infancy, -and will probably not be needed in 2007, but they must be designed in 2007 if -they are to be deployed in 2008.\plan{Design in 2007; unknown difficulty. - Write a paper.} - -\subsubsection{Relay incentives} -To support more users on the network, we need to get more servers. So far, -we've relied on volunteerism to attract server operators, and so far it's -served us well. But in the long run, we need to {\bf design incentives for - users to run servers} and relay traffic for others. Most obviously, we -could try to build the network so that servers offered improved service for -other servers, but we would need to do so without weakening anonymity and -making it obvious which connections originate from users running servers. We -have some preliminary designs~\cite{incentives-txt,tor-challenges}, -but need to perform -some more research to make sure they would be safe and effective.\plan{Write - a draft paper; 2 person-months.} -(XXX we did that) - -\subsection{Portability} -Our {\bf Windows implementation}, though much improved, continues to lag -behind Unix and Mac OS X, especially when running as a server. We hope to -merge promising patches from Christian King to address this point, and bring -Windows performance on par with other platforms.\plan{Do in 2007; 1.5 months - to integrate not counting Mike's work.} - -We should have {\bf better support for portable devices}, including modes of -operation that require less RAM, and that write to disk less frequently (to -avoid wearing out flash RAM).\plan{Optional; 2 weeks.} - -\subsection{Performance: resource usage} -We've been working on {\bf using less RAM}, especially on servers. This has -paid off a lot for directory caches in the 0.1.2, which in some cases are -using 90\% less memory than they used to require. But we can do better, -especially in the area around our buffer management algorithms, by using an -approach more like the BSD and Linux kernels use instead of our current ring -buffer approach. (For OR connections, we can just use queues of cell-sized -chunks produced with a specialized allocator.) This could potentially save -around 25 to 50\% of the memory currently allocated for network buffers, and -make Tor a more attractive proposition for restricted-memory environments -like old computers, mobile devices, and the like.\plan{Do in 2007; 2-3 weeks - plus one week measurement.} (XXX We did this, but we need to do something -more/else.) - -\subsection{Performance: network usage} -We know too little about how well our current path -selection algorithms actually spread traffic around the network in practice. -We should {\bf research the efficacy of our traffic allocation} and either -assure ourselves that it is close enough to optimal as to need no improvement -(unlikely) or {\bf identify ways to improve network usage}, and get more -users' traffic delivered faster. Performing this research will require -careful thought about anonymity implications. - -We should also {\bf examine the efficacy of our congestion control - algorithm}, and see whether we can improve client performance in the -presence of a congested network through dynamic `sendme' window sizes or -other means. This will have anonymity implications too if we aren't careful. - -\plan{For both of the above: research, design and write - a measurement tool in 2007: 1 month. See if we can interest a graduate - student.} - -We should work on making Tor's cell-based protocol perform better on -networks with low bandwidth -and high packet loss.\plan{Do in 2007 if we're funded to do it; 4-6 weeks.} - -\subsection{Performance scenario: one Tor client, many users} -We should {\bf improve Tor's performance when a single Tor handles many - clients}. Many organizations want to manage a single Tor client on their -firewall for many users, rather than having each user install a separate -Tor client. We haven't optimized for this scenario, and it is likely that -there are some code paths in the current implementation that become -inefficient when a single Tor is servicing hundreds or thousands of client -connections. (Additionally, it is likely that such clients have interesting -anonymity requirements the we should investigate.) We should profile Tor -under appropriate loads, identify bottlenecks, and fix them.\plan{Do in 2007 - if we're funded to do it; 4-8 weeks.} - -\subsection{Tor servers on asymmetric bandwidth} - -Tor should work better on servers that have asymmetric connections like cable -or DSL. Because Tor has separate TCP connections between each -hop, if the incoming bytes are arriving just fine and the outgoing bytes are -all getting dropped on the floor, the TCP push-back mechanisms don't really -transmit this information back to the incoming streams.\plan{Do in 2007 since - related to bandwidth limiting. 3-4 weeks.} - -\subsection{Running Tor as both client and server} - -Many performance tradeoffs and balances that might need more attention. -We first need to track and fix whatever bottlenecks emerge; but we also -need to invent good algorithms for prioritizing the client's traffic -without starving the server's traffic too much.\plan{No idea; try -profiling and improving things in 2007.} - -\subsection{Protocol redesign for UDP} -Tor has relayed only TCP traffic since its first versions, and has used -TLS-over-TCP to do so. This approach has proved reliable and flexible, but -in the long term we will need to allow UDP traffic on the network, and switch -some or all of the network to using a UDP transport. {\bf Supporting UDP - traffic} will make Tor more suitable for protocols that require UDP, such -as many VOIP protocols. {\bf Using a UDP transport} could greatly reduce -resource limitations on servers, and make the network far less interruptible -by lossy connections. Either of these protocol changes would require a great -deal of design work, however. We hope to be able to enlist the aid of a few -talented graduate students to assist with the initial design and -specification, but the actual implementation will require significant testing -of different reliable transport approaches.\plan{Maybe do a design in 2007 if -we find an interested academic. Ian or Ben L might be good partners here.} - -\section{Blocking resistance} - -\subsection{Design for blocking resistance} -We have written a design document explaining our general approach to blocking -resistance. We should workshop it with other experts in the field to get -their ideas about how we can improve Tor's efficacy as an anti-censorship -tool. - -\subsection{Implementation: client-side and bridges-side} - -Bridges will want to be able to {\bf listen on multiple addresses and ports} -if they can, to give the adversary more ports to block. - -\subsection{Research: anonymity implications from becoming a bridge} - -see arma's bridge proposal; e.g. should bridge users use a second layer of -entry guards? - -\subsection{Implementation: bridge authority} - -we run some -directory authorities with a slightly modified protocol that doesn't leak -the entire list of bridges. Thus users can learn up-to-date information -for bridges they already know about, but they can't learn about arbitrary -new bridges. - -we need a design for distributing the bridge authority over more than one -server - -\subsection{Normalizing the Tor protocol on the wire} -Additionally, we should {\bf resist content-based filters}. Though an -adversary can't see what users are saying, some aspects of our protocol are -easy to fingerprint {\em as} Tor. We should correct this where possible. - -Look like Firefox; or look like nothing? -Future research: investigate timing similarities with other protocols. - -\subsection{Research: scanning-resistance} - -\subsection{Research/Design/Impl: how users discover bridges} -Our design anticipates an arms race between discovery methods and censors. -We need to begin the infrastructure on our side quickly, preferably in a -flexible language like Python, so we can adapt quickly to censorship. - -phase one: personal bridges -phase two: families of personal bridges -phase three: more structured social network -phase four: bag of tricks -Research: phase five... - -Integration with Psiphon, etc? - -\subsection{Document best practices for users} -Document best practices for various activities common among -blocked users (e.g. WordPress use). - -\subsection{Research: how to know if a bridge has been blocked?} - -\subsection{GeoIP maintenance, and "private" user statistics} -How to know if the whole idea is working? - -\subsection{Research: hiding whether the user is reading or publishing?} - -\subsection{Research: how many bridges do you need to know to maintain -reachability?} - -\subsection{Resisting censorship of the Tor website, docs, and mirrors} - -We should take some effort to consider {\bf initial distribution of Tor and - related information} in countries where the Tor website and mirrors are -censored. (Right now, most countries that block access to Tor block only the -main website and leave mirrors and the network itself untouched.) Falling -back on word-of-mouth is always a good last resort, but we should also take -steps to make sure it's relatively easy for users to get ahold of a copy. - -\section{Security} - -\subsection{Security research projects} - -We should investigate approaches with some promise to help Tor resist -end-to-end traffic correlation attacks. It's an open research question -whether (and to what extent) {\bf mixed-latency} networks, {\bf low-volume - long-distance padding}, or other approaches can resist these attacks, which -are currently some of the most effective against careful Tor users. We -should research these questions and perform simulations to identify -opportunities for strengthening our design without dropping performance to -unacceptable levels. %Cite something -\plan{Start doing this in 2007; write a paper. 8-16 weeks.} - -We've got some preliminary results suggesting that {\bf a topology-aware - routing algorithm}~\cite{feamster:wpes2004} could reduce Tor users' -vulnerability against local or ISP-level adversaries, by ensuring that they -are never in a position to watch both ends of a connection. We need to -examine the effects of this approach in more detail and consider side-effects -on anonymity against other kinds of adversaries. If the approach still looks -promising, we should investigate ways for clients to implement it (or an -approximation of it) without having to download routing tables for the whole -Internet. \plan{Not in 2007 unless a graduate student wants to do it.} - -%\tmp{defenses against end-to-end correlation} We don't expect any to work -%right now, but it would be useful to learn that one did. Alternatively, -%proving that one didn't would free up researchers in the field to go work on -%other things. -% -% See above; I think I got this. - -We should research the efficacy of {\bf website fingerprinting} attacks, -wherein an adversary tries to match the distinctive traffic and timing -pattern of the resources constituting a given website to the traffic pattern -of a user's client. These attacks work great in simulations, but in -practice we hear they don't work nearly as well. We should get some actual -numbers to investigate the issue, and figure out what's going on. If we -resist these attacks, or can improve our design to resist them, we should. -% add cites -\plan{Possibly part of end-to-end correlation paper. Otherwise, not in 2007 - unless a graduate student is interested.} - -\subsection{Implementation security} - -We should also {\bf mark RAM that holds key material as non-swappable} so -that there is no risk of recovering key material from a hard disk -compromise. This would require submitting patches upstream to OpenSSL, where -support for marking memory as sensitive is currently in a very preliminary -state.\plan{Nice to do, but not in immediate Tor scope.} - -There are numerous tools for identifying trouble spots in code (such as -Coverity or even VS2005's code analysis tool) and we should convince somebody -to run some of them against the Tor codebase. Ideally, we could figure out a -way to get our code checked periodically rather than just once.\plan{Almost - no time once we talk somebody into it.} - -We should try {\bf protocol fuzzing} to identify errors in our -implementation.\plan{Not in 2007 unless we find a grad student or - undergraduate who wants to try.} - -Our guard nodes help prevent an attacker from being able to become a chosen -client's entry point by having each client choose a few favorite entry points -as ``guards'' and stick to them. We should implement a {\bf directory - guards} feature to keep adversaries from enumerating Tor users by acting as -a directory cache.\plan{Do in 2007; 2 weeks.} - -\subsection{Detect corrupt exits and other servers} -With the success of our network, we've attracted servers in many locations, -operated by many kinds of people. Unfortunately, some of these locations -have compromised or defective networks, and some of these people are -untrustworthy or incompetent. Our current design relies on authority -administrators to identify bad nodes and mark them as nonfunctioning. We -should {\bf automate the process of identifying malfunctioning nodes} as -follows: - -We should create a generic {\bf feedback mechanism for add-on tools} like -Mike Perry's ``Snakes on a Tor'' to report failing nodes to authorities. -\plan{Do in 2006; 1-2 weeks.} - -We should write tools to {\bf detect more kinds of innocent node failure}, -such as nodes whose network providers intercept SSL, nodes whose network -providers censor popular websites, and so on. We should also try to detect -{\bf routers that snoop traffic}; we could do this by launching connections -to throwaway accounts, and seeing which accounts get used.\plan{Do in 2007; - ask Mike Perry if he's interested. 4-6 weeks.} - -We should add {\bf an efficient way for authorities to mark a set of servers - as probably collaborating} though not necessarily otherwise dishonest. -This happens when an administrator starts multiple routers, but doesn't mark -them as belonging to the same family.\plan{Do during v2.1 directory protocol - redesign; 1-2 weeks to implement.} - -To avoid attacks where an adversary claims good performance in order to -attract traffic, we should {\bf have authorities measure node performance} -(including stability and bandwidth) themselves, and not simply believe what -they're told. We also measure stability by tracking MTBF. Measuring -bandwidth will be tricky, since it's hard to distinguish between a server with -low capacity, and a high-capacity server with most of its capacity in -use. See also Nikita's NDSS 2008 paper.\plan{Do it if we can interest -a grad student.} - -{\bf Operating a directory authority should be easier.} We rely on authority -operators to keep the network running well, but right now their job involves -too much busywork and administrative overhead. A better interface for them -to use could free their time to work on exception cases rather than on -adding named nodes to the network.\plan{Do in 2007; 4-5 weeks.} - -\subsection{Protocol security} - -In addition to other protocol changes discussed above, -% And should we move some of them down here? -NM -we should add {\bf hooks for denial-of-service resistance}; we have some -preliminary designs, but we shouldn't postpone them until we really need them. -If somebody tries a DDoS attack against the Tor network, we won't want to -wait for all the servers and clients to upgrade to a new -version.\plan{Research project; do this in 2007 if funded.} - -\section{Development infrastructure} - -\subsection{Build farm} -We've begun to deploy a cross-platform distributed build farm of hosts -that build and test the Tor source every time it changes in our development -repository. - -We need to {\bf get more participants}, so that we can test a larger variety -of platforms. (Previously, we've only found out when our code had broken on -obscure platforms when somebody got around to building it.) - -We need also to {\bf add our dependencies} to the build farm, so that we can -ensure that libraries we need (especially libevent) do not stop working on -any important platform between one release and the next. - -\plan{This is ongoing as more buildbots arrive.} - -\subsection{Improved testing harness} -Currently, our {\bf unit tests} cover only about 20\% of the code base. This -is uncomfortably low; we should write more and switch to a more flexible -testing framework.\plan{Ongoing basis, time permitting.} - -We should also write flexible {\bf automated single-host deployment tests} so -we can more easily verify that the current codebase works with the -network.\plan{Worthwhile in 2007; would save lots of time. 2-4 weeks.} - -We should build automated {\bf stress testing} frameworks so we can see which -realistic loads cause Tor to perform badly, and regularly profile Tor against -these loads. This would give us {\it in vitro} performance values to -supplement our deployment experience.\plan{Worthwhile in 2007; 2-6 weeks.} - -We should improve our memory profiling code.\plan{...} - - -\subsection{Centralized build system} -We currently rely on a separate packager to maintain the packaging system and -to build Tor on each platform for which we distribute binaries. Separate -package maintainers is sensible, but separate package builders has meant -long turnaround times between source releases and package releases. We -should create the necessary infrastructure for us to produce binaries for all -major packages within an hour or so of source release.\plan{We should - brainstorm this at least in 2007.} - -\subsection{Improved metrics} -We need a way to {\bf measure the network's health, capacity, and degree of - utilization}. Our current means for doing this are ad hoc and not -completely accurate - -We need better ways to {\bf tell which countries are users are coming from, - and how many there are}. A good perspective of the network helps us -allocate resources and identify trouble spots, but our current approaches -will work less and less well as we make it harder for adversaries to -enumerate users. We'll probably want to shift to a smarter, statistical -approach rather than our current ``count and extrapolate'' method. - -\plan{All of this in 2007 if funded; 4-8 weeks} - -% \tmp{We'd like to know how much of the network is getting used.} -% I think this is covered above -NM - -\subsection{Controller library} -We've done lots of design and development on our controller interface, which -allows UI applications and other tools to interact with Tor. We could -encourage the development of more such tools by releasing a {\bf - general-purpose controller library}, ideally with API support for several -popular programming languages.\plan{2006 or 2007; 1-2 weeks.} - -\section{User experience} - -\subsection{Get blocked less, get blocked less broadly} -Right now, some services block connections from the Tor network because -they don't have a better -way to keep vandals from abusing them than blocking IP addresses associated -with vandalism. Our approach so far has been to educate them about better -solutions that currently exist, but we should also {\bf create better -solutions for limiting vandalism by anonymous users} like credential and -blind-signature based implementations, and encourage their use. Other -promising starting points including writing a patch and explanation for -Wikipedia, and helping Freenode to document, maintain, and expand its -current Tor-friendly position.\plan{Do a writeup here in 2007; 1-2 weeks.} - -Those who do block Tor users also block overbroadly, sometimes blacklisting -operators of Tor servers that do not permit exit to their services. We could -obviate innocent reasons for doing so by designing a {\bf narrowly-targeted Tor - RBL service} so that those who wanted to overblock Tor could no longer -plead incompetence.\plan{Possibly in 2007 if we decide it's a good idea; 3 - weeks.} - -\subsection{All-in-one bundle} -We need a well-tested, well-documented bundle of Tor and supporting -applications configured to use it correctly. We have an initial -implementation well under way, but it will need additional work in -identifying requisite Firefox extensions, identifying security threats, -improving user experience, and so on. This will need significantly more work -before it's ready for a general public release. - -\subsection{LiveCD Tor} -We need a nice bootable livecd containing a minimal OS and a few applications -configured to use it correctly. The Anonym.OS project demonstrated that this -is quite feasible, but their project is not currently maintained. - -\subsection{A Tor client in a VM} -\tmp{a.k.a JanusVM} which is quite related to the firewall-level deployment -section below. JanusVM is a Linux kernel running in VMWare. It gets an IP -address from the network, and serves as a DHCP server for its host Windows -machine. It intercepts all outgoing traffic and redirects it into Privoxy, -Tor, etc. This Linux-in-Windows approach may help us with scalability in -the short term, and it may also be a good long-term solution rather than -accepting all security risks in Windows. - -%\subsection{Interface improvements} -%\tmp{Allow controllers to manipulate server status.} -% (Why is this in the User Experience section?) -RD -% I think it's better left to a generic ``make controller iface better'' item. - -\subsection{Firewall-level deployment} -Another useful deployment mode for some users is using {\bf Tor in a firewall - configuration}, and directing all their traffic through Tor. This can be a -little tricky to set up currently, but it's an effective way to make sure no -traffic leaves the host un-anonymized. To achieve this, we need to {\bf - improve and port our new TransPort} feature which allows Tor to be used -without SOCKS support; to {\bf add an anonymizing DNS proxy} feature to Tor; -and to {\bf construct a recommended set of firewall configurations} to redirect -traffic to Tor. - -This is an area where {\bf deployment via a livecd}, or an installation -targeted at specialized home routing hardware, could be useful. - -\subsection{Assess software and configurations for anonymity risks} -Right now, users and packagers are more or less on their own when selecting -Firefox extensions. We should {\bf assemble a recommended list of browser - extensions} through experiment, and include this in the application bundles -we distribute. - -We should also describe {\bf best practices for using Tor with each class of - application}. For example, Ethan Zuckerman has written a detailed -tutorial on how to use Tor, Firefox, GMail, and Wordpress to blog with -improved safety. There are many other cases on the Internet where anonymity -would be helpful, and there are a lot of ways to screw up using Tor. - -The Foxtor and Torbutton extensions serve similar purposes; we should pick a -favorite, and merge in the useful features of the other. - -%\tmp{clean up our own bundled software: -%E.g. Merge the good features of Foxtor into Torbutton} -% -% What else did you have in mind? -NM - -\subsection{Localization} -Right now, most of our user-facing code is internationalized. We need to -internationalize the last few hold-outs (like the Tor expert installer), and get -more translations for the parts that are already internationalized. - -Also, we should look into a {\bf unified translator's solution}. Currently, -since different tools have been internationalized using the -framework-appropriate method, different tools require translators to localize -them via different interfaces. Inasmuch as possible, we should make -translators only need to use a single tool to translate the whole Tor suite. - -\section{Support} - -It would be nice to set up some {\bf user support infrastructure} and -{\bf contributor support infrastructure}, especially focusing on server -operators and on coordinating volunteers. - -This includes intuitive and easy ticket systems for bug reports and -feature suggestions (not just mailing lists with a half dozen people -and no clear roles for who answers what), but it also includes a more -personalized and efficient framework for interaction so we keep the -attention and interest of the contributors, and so we make them feel -helpful and wanted. - -\section{Documentation} - -\subsection{Unified documentation scheme} - -We need to {\bf inventory our documentation.} Our documentation so far has -been mostly produced on an {\it ad hoc} basis, in response to particular -needs and requests. We should figure out what documentation we have, which of -it (if any) should get priority, and whether we can't put it all into a -single format. - -We could {\bf unify the docs} into a single book-like thing. This will also -help us identify what sections of the ``book'' are missing. - -\subsection{Missing technical documentation} - -We should {\bf revise our design paper} to reflect the new decisions and -research we've made since it was published in 2004. This will help other -researchers evaluate and suggest improvements to Tor's current design. - -Other projects sometimes implement the client side of our protocol. We -encourage this, but we should write {\bf a document about how to avoid -excessive resource use}, so we don't need to worry that they will do so -without regard to the effect of their choices on server resources. - -\subsection{Missing user documentation} - -Our documentation falls into two broad categories: some is `discoursive' and -explains in detail why users should take certain actions, and other -documentation is `comprehensive' and describes all of Tor's features. Right -now, we have no document that is both deep, readable, and thorough. We -should correct this by identifying missing spots in our design. - -\bibliographystyle{plain} \bibliography{tor-design} - -\end{document} - |