\documentclass{article}

\newenvironment{tightlist}{\begin{list}{$\bullet$}{
  \setlength{\itemsep}{0mm}
    \setlength{\parsep}{0mm}
    %  \setlength{\labelsep}{0mm}
    %  \setlength{\labelwidth}{0mm}
    %  \setlength{\topsep}{0mm}
    }}{\end{list}}
\newcommand{\tmp}[1]{{\bf #1} [......] \\}

\begin{document}

\title{Tor Development Roadmap: Wishlist for Nov 2006--Dec 2007}
\author{Roger Dingledine \and Nick Mathewson \and Shava Nerad}

\maketitle
\pagestyle{plain}

\section{Introduction}
Hi, Roger!  Hi, Shava.  This paragraph should get deleted soon.  Right now,
this document goes into about as much detail as I'd like to go into for a
technical audience, since that's the audience I know best.  It doesn't have
time estimates everywhere.  It isn't well prioritized, and it doesn't
distinguish well between things that need lots of research and things that
don't.  The breakdowns don't all make sense.  There are lots of things where
I don't make it clear how they fit into larger goals, and lots of larger
goals that don't break down into little things. It isn't all stuff we can do
for sure, and it isn't even all stuff we can do for sure in 2007.  The
\texttt{\textbackslash tmp\{\}} macro indicates stuff I haven't said enough about.  That said, here
goes...

Tor (the software) and Tor (the overall software/network/support/document
suite) are now experiencing all the crises of success.  Over the next year,
we're probably going to grow more in terms of users, developers, and funding
than before.  This gives us the opportunity to perform long-neglected
maintenance tasks.

\section{Code and design infrastructure}

\subsection{Protocol revision}
To maintain backward compatibility, we've postponed major protocol
changes and redesigns for a long time.  Because of this, there are a number
of sensible revisions we've been putting off until we could deploy several of
them at once.  To do each of these, we first need to discuss design
alternatives with cryptographers and other outside collaborators to
make sure that our choices are secure.

First of all, our protocol needs better {\bf versioning support} so that we
can make backward-incompatible changes to our core protocol.  There are
difficult anonymity issues here, since many naive designs would make it easy
to tell clients apart based on their supported versions.
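
As an illustration only, the Python sketch below assumes that each side of a
connection advertises the set of protocol versions it supports and that both
then use the highest version they share; the function names are hypothetical,
not Tor's.  The second function shows where the anonymity issue comes from:
every distinct set of advertised versions puts clients into a group that an
observer can tell apart.

\begin{verbatim}
def negotiate_version(ours, theirs):
    """Return the highest protocol version both sides support, or None.

    Hypothetical sketch: each side advertises an iterable of integer
    versions, in any order.
    """
    common = set(ours) & set(theirs)
    return max(common) if common else None


def version_groups(clients):
    """Group clients by the exact set of versions they advertise.

    An observer can tell the groups apart, so fewer and larger groups
    are better for anonymity.
    """
    groups = {}
    for name, versions in clients.items():
        groups.setdefault(frozenset(versions), []).append(name)
    return groups
\end{verbatim}

For example, \texttt{negotiate\_version([1, 2, 3], [2, 3, 4])} returns 3, and
three clients advertising $\{1,2\}$, $\{1,2\}$, and $\{1,2,3\}$ form two
distinguishable groups.  Keeping the number of distinct version sets small
therefore matters as much as the negotiation itself.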

With protocol versioning support would come the ability to {\bf future-proof
  our ciphersuites}.  For example, not only our OR protocol but also our
directory protocol is firmly tied to the SHA-1 hash function, which, though
not insecure for our purposes, has begun to show its age.  We should remove
assumptions throughout our design that public keys, secret keys, or digests
will remain any particular size indefinitely.
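
One hypothetical way to drop the fixed-size assumption is to make every digest
self-describing: tag it with its algorithm and derive the expected length from
that algorithm rather than hard-coding SHA-1's 20 bytes.  The sketch below uses
Python's standard \texttt{hashlib}.  The \texttt{algorithm:hex} text encoding
and the function names are assumptions for the example, not a Tor format.

\begin{verbatim}
import hashlib


def tagged_digest(algorithm, data):
    """Return a self-describing digest string such as 'sha1:aaf4...'."""
    return algorithm + ":" + hashlib.new(algorithm, data).hexdigest()


def verify_tagged_digest(tagged, data):
    """Check a tagged digest without assuming any particular length."""
    algorithm, _, hex_value = tagged.partition(":")
    h = hashlib.new(algorithm)  # raises ValueError for unknown algorithms
    # The expected length comes from the algorithm, not from a constant.
    if len(hex_value) != 2 * h.digest_size:
        return False
    h.update(data)
    return h.hexdigest() == hex_value.lower()
\end{verbatim}

Under this scheme, moving from SHA-1 to a longer hash changes the tag, not
every parser that assumed 20-byte digests.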

A new protocol could support {\bf multiple cell sizes}.  Right now, all data
passes through the Tor network divided into 512-byte cells.  This is
efficient for high-bandwidth protocols, but inefficient for protocols
like SSH or AIM that send information in small chunks.  Of course, we need to
investigate the extent to which multiple sizes could make it easier for an
adversary to fingerprint a traffic pattern.
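
To make the inefficiency concrete, the sketch below computes how many bytes
reach the wire when a message is split into fixed-size cells.  For simplicity
it assumes that all 512 bytes of a cell carry data, ignoring cell and relay
headers, so the real overhead is somewhat higher.

\begin{verbatim}
import math


def wire_bytes(message_len, cell_size=512):
    """Bytes actually sent when message_len bytes are packed into cells.

    Simplification: every byte of a cell is treated as payload, so only
    the padding in the final cell counts as overhead.
    """
    cells = max(1, math.ceil(message_len / cell_size))
    return cells * cell_size


def efficiency(message_len, cell_size=512):
    """Fraction of transmitted bytes that carry application data."""
    return message_len / wire_bytes(message_len, cell_size)
\end{verbatim}

A 40-byte interactive message thus has an efficiency of
$40/512 \approx 7.8\%$, while a 102,400-byte transfer fills its 200 cells
exactly.  With a hypothetical 128-byte cell the same 40-byte message would
reach $40/128 = 31.25\%$, which is the kind of gain multiple cell sizes could
buy, at some possible cost in fingerprintability.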

Our OR {\bf authentication protocol}, though provably
secure~\cite{goldberg-tap}, relies more on particular aspects of RSA and our
implementation thereof than we had initially believed.  To future-proof
against changes, we should replace it with a less delicate approach.

\tmp{Stream migration?}

\subsection{Scalability}

\subsubsection{Improved directory performance}
Right now, clients download a statement of the {\bf network status} made by
each directory authority.  We could reduce network bandwidth significantly by
having the authorities jointly sign a statement reflecting their vote on the
current network status.  This would save clients up to 160\,KB per hour, and
make their view of the network more uniform.  Of course, we'd need to make
sure the voting process was secure and resilient to failures in the network.
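
The voting rules are still to be designed.  Purely as an illustration, the
sketch below assumes that each authority's vote maps router identities to the
set of flags it assigns, and that the joint statement lists a router only if a
strict majority of authorities list it, with each flag that a strict majority
assign.  Signing is omitted, and $n$ counts only the votes actually received;
a real design would have to decide how missing authorities count.

\begin{verbatim}
from collections import Counter


def compute_consensus(votes):
    """Combine authority votes into one network-status view.

    votes: list of dicts, one per authority, mapping a router identity
    to the set of flags that authority assigns it (hypothetical format).
    Returns a dict mapping each router listed by a strict majority to
    the flags a strict majority of authorities assigned it.
    """
    n = len(votes)
    listed = Counter()
    flag_counts = {}
    for vote in votes:
        for router, flags in vote.items():
            listed[router] += 1
            flag_counts.setdefault(router, Counter()).update(flags)
    consensus = {}
    for router, count in listed.items():
        if 2 * count > n:  # strict majority of received votes
            consensus[router] = {
                flag for flag, c in flag_counts[router].items() if 2 * c > n
            }
    return consensus
\end{verbatim}

Because every client would then fetch the same document, their views of the
network converge, which is the uniformity benefit described above.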

We should {\bf shorten router descriptors}, since the current format includes
a great deal of information that's only of interest to the directory
authorities, and not of interest to clients.  We can do this by having each
router upload a short-form and a long-form signed descriptor, and having
clients download only the short form.  Even a naive version of this would
save about 40\% of the bandwidth currently spent on descriptors.
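
A naive split could be as simple as filtering the line-oriented descriptor by
keyword.  In the sketch below, the set of keywords clients are assumed to need
is a guess for illustration, the sample descriptor uses placeholder values, and
the separate signatures each form would need are not shown.

\begin{verbatim}
# Hypothetical split: descriptor keywords that clients are assumed to need.
CLIENT_KEYWORDS = {"router", "published", "bandwidth", "accept", "reject",
                   "family"}

# Placeholder descriptor; values are illustrative, not from a real router.
SAMPLE = """router example 192.0.2.1 9001 0 0
platform Tor 0.1.2.3-alpha on Linux
published 2006-11-01 00:00:00
bandwidth 20480 40960 10240
opt read-history 2006-11-01 00:00:00 (900 s) 1024,2048,4096
opt write-history 2006-11-01 00:00:00 (900 s) 1024,2048,4096
contact someone at example dot com
accept *:80
reject *:*
"""


def keyword_of(line):
    """Return a descriptor line's keyword, skipping any 'opt' prefix."""
    words = line.split()
    if len(words) > 1 and words[0] == "opt":
        return words[1]
    return words[0] if words else ""


def short_form(descriptor, keep=CLIENT_KEYWORDS):
    """Keep only the lines whose keyword is in `keep`."""
    kept = [ln for ln in descriptor.splitlines() if keyword_of(ln) in keep]
    return "\n".join(kept) + "\n"
\end{verbatim}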

We should {\bf have routers upload their descriptors even less often}, so
that clients do not need to download replacements every 18 hours whether any
information has changed or not.  (As of Tor 0.1.2.3-alpha, clients tolerate
routers that don't upload often, but routers still upload at least every 18
hours to support older clients.)
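
The upload rule might look like the following sketch: a router republishes
when information clients rely on has changed, or when a maximum interval has
passed so that its descriptor never goes stale.  The \texttt{max\_interval}
parameter defaults to today's 18 hours; raising it once older clients have
upgraded is the change proposed here.  How a router decides that something
has ``changed'' is left open, and the function name is hypothetical.

\begin{verbatim}
from datetime import datetime, timedelta


def should_upload(last_upload, now, changed,
                  max_interval=timedelta(hours=18)):
    """Decide whether a router should publish a new descriptor.

    last_upload: datetime of the previous upload, or None if never.
    changed: True if information clients rely on has changed.
    """
    if last_upload is None or changed:
        return True
    # Otherwise republish only when the descriptor would go stale.
    return now - last_upload >= max_interval
\end{verbatim}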

\subsubsection{Non-clique topology}
Our current network design achieves a certain amount of its anonymity by
making clients act like each other through the simple expedient of making
sure that all clients know all servers, and that any server can talk to any
other server.  But as the number of servers increases to serve an
ever-greater number of clients, these assumptions become impractical.

At worst, if these scalability issues become troubling before a solution is
found, we can design and build a way to {\bf split the network into
multiple slices} until a better solution comes along.  This is not ideal,
since rather than looking like all other users from the point of view of path
selection, users would ``only'' look like 200,000--300,000 other users.
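
A minimal sketch of slicing follows.  It assumes, hypothetically, that servers
are assigned to a slice by reducing their identity fingerprint (already a
SHA-1 hash of the identity key) modulo the number of slices, and that each
client builds all of its paths within one slice.  With $U$ users divided
evenly among $k$ slices, each user's path-selection anonymity set shrinks from
$U$ to about $U/k$.

\begin{verbatim}
def slice_of(fingerprint, num_slices):
    """Map a server's hex identity fingerprint to a slice number.

    Fingerprints are already hashes of identity keys, so reducing them
    modulo num_slices spreads servers roughly evenly over the slices.
    """
    return int(fingerprint, 16) % num_slices


def partition(fingerprints, num_slices):
    """Split the server list into num_slices disjoint slices."""
    slices = [[] for _ in range(num_slices)]
    for fp in fingerprints:
        slices[slice_of(fp, num_slices)].append(fp)
    return slices
\end{verbatim}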

We are in the process of designing {\bf improved schemes for network
  scalability}.  Some approaches focus on limiting what an adversary can know
about what a user knows; others focus on reducing the extent to which an
adversary can exploit this knowledge.  These are currently in their infancy,
and will probably not be needed in 2007, but they must be designed in 2007 if
they are to be deployed in 2008.

\subsubsection{Relay incentives}

\tmp{We need incentives to relay.}

\subsection{Portability}
Our {\bf Windows implementation}, though much improved, continues to lag
behind Unix and Mac OS X, especially when running as a server.  We hope to
merge promising patches from Mike Chiussi to address this point, and bring
Windows performance on par with other platforms.

We should have {\bf better support for portable devices}, including modes of
operation that require less RAM and that write to disk less frequently (to
avoid wearing out flash memory).

\subsection{Performance: resource usage}

\tmp{Use less RAM when we have little.  Make buffer code smarter}

\tmp{Allow separate bandwidth buckets for different bandwidth classes}  This
would make more users willing to run servers.
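
A textbook token bucket per traffic class is sketched below.  The class names,
rates, and burst sizes are hypothetical, and the code is a generic
illustration rather than Tor's implementation; \texttt{now} is any monotonic
timestamp in seconds supplied by the caller.

\begin{verbatim}
class TokenBucket:
    """Classic token bucket: `rate` bytes/second, holding at most `burst`."""

    def __init__(self, rate, burst):
        self.rate = rate
        self.burst = burst
        self.tokens = burst
        self.last = 0.0

    def refill(self, now):
        """Add tokens for the time elapsed since the last refill."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now

    def consume(self, nbytes, now):
        """Spend tokens for nbytes if available; return True on success."""
        self.refill(now)
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False


class ClassedLimiter:
    """One independent bucket per (hypothetical) bandwidth class."""

    def __init__(self, limits):
        # limits: dict mapping class name -> (rate, burst)
        self.buckets = {name: TokenBucket(rate, burst)
                        for name, (rate, burst) in limits.items()}

    def allow(self, traffic_class, nbytes, now):
        return self.buckets[traffic_class].consume(nbytes, now)
\end{verbatim}

Because each class has its own bucket, exhausting one class's budget leaves the
others unaffected, so an operator could cap one kind of traffic tightly
without starving the rest.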

\tmp{Write-limiting for directory servers}

\tmp{Don't use so many sockets} We can save some for hidden services and for
  encrypted directories.

\subsection{Performance: network usage}

\tmp{Do research to figure out how well capacity is actually used.}

\tmp{Tune pathgen algorithms to use it better.}
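
As a starting point for this research, the sketch below picks routers with
probability proportional to their advertised bandwidth and then measures how
often each one is actually chosen, which can be compared with each router's
share of capacity.  It is a generic weighted choice, not necessarily Tor's
exact path-selection rule, and the function names are hypothetical.

\begin{verbatim}
import random
from collections import Counter


def pick_router(routers, rng=random):
    """Choose a router name with probability proportional to bandwidth.

    routers: list of (name, advertised_bandwidth) pairs.
    """
    names = [name for name, _ in routers]
    weights = [bw for _, bw in routers]
    return rng.choices(names, weights=weights, k=1)[0]


def observed_share(routers, trials=10000, seed=1):
    """Estimate how often each router is picked, to compare with capacity."""
    rng = random.Random(seed)
    counts = Counter(pick_router(routers, rng) for _ in range(trials))
    return {name: counts[name] / trials for name, _ in routers}
\end{verbatim}

Tuning would then mean changing the weights (for example, discounting routers
that are already saturated) and checking whether selection shares track real
spare capacity more closely.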


\subsection{Blue-sky: UDP}

\tmp{Support UDP}

\tmp{Use UDP as a transport}




\section{Blocking resistance}

\subsection{Design for blocking resistance}
We have written a design document explaining our general approach to blocking
resistance.  We should workshop it with other experts in the field to get
their ideas about how we can improve Tor's efficacy as an anti-censorship
tool.


\subsection{Implementation: client-side and bridges-side}
Our anticensorship design calls for some nodes to act as ``bridges'' that can
circumvent a national firewall, and others inside the firewall to act as pure
clients.  The design here is quite clear-cut; we're probably ready to begin
implementing it.  To implement bridges, we need only to have servers publish
themselves as limited-availability relays to a special bridge authority if
they judge they'd make good servers.  Clients need a flexible interface to
learn about bridges and to act on knowledge of bridges.

Clients also need to {\bf use the encrypted directory variant} added in Tor
0.1.2.3-alpha.  This will let them retrieve directory information over Tor
once they've got their initial bridges.

Bridges will want to be able to {\bf listen on multiple addresses and ports}
if they can, to give the adversary more ports to block.

Additionally, we should {\bf resist content-based filters}.  Though an
adversary can't see what users are saying, some aspects of our protocol are
easy to fingerprint {\em as} Tor.  We should correct this where possible.

\subsection{Implementation: bridge authorities}
Our design anticipates an arms race between discovery methods and censors.
We need to begin building the infrastructure on our side soon, preferably in
a flexible language like Python, so that we can adapt quickly to censorship.
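
Because the distribution strategy will have to change as censors adapt, it
helps to keep it small and easy to replace.  The Python sketch below shows one
hypothetical strategy: each requester identifier (for instance an address
block or an email account) maps deterministically to a few bridges for the
current time period, ranked by a keyed hash that only the bridge authority can
compute.  Repeated queries from one identity then reveal nothing new, so
enumerating many bridges requires many identities or many periods.  The
function name, the use of HMAC-SHA-256, and the default of three bridges are
assumptions.

\begin{verbatim}
import hashlib
import hmac


def bridges_for(requester, period, bridges, secret, count=3):
    """Return `count` bridges for this requester in this time period.

    The same requester asking again in the same period gets the same
    answer; a different requester or a later period usually gets a
    different subset.  secret: bytes known only to the bridge authority.
    """
    def rank(bridge):
        msg = "{}|{}|{}".format(requester, period, bridge).encode()
        return hmac.new(secret, msg, hashlib.sha256).digest()

    return sorted(bridges, key=rank)[:count]
\end{verbatim}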

\section{Security}

\subsection{Security research projects}

\tmp{Mixed-latency}

\tmp{long-distance padding}

\tmp{router-zones}

\tmp{defenses against end-to-end correlation}  We don't expect any to work
right now, but it would be useful to learn that one did.  Alternatively,
proving that one didn't would free up researchers in the field to go work on
other things.

\subsection{Implementation security}

\tmp{Encrypt more keys}

\tmp{Talk Coverity or somebody with a copy of Visual Studio 2005 into running
  static-analysis tools on our code}

\tmp{Directory guards}

\subsection{Detect corrupt exits and other servers}

\tmp{Improved feedback mechanism for tools like SOAT to use}

\tmp{More tools like SOAT: check for routers that bork SSL, routers that
  sniff (and use) passwords...}
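
Whatever feedback mechanism we settle on, the core check in an exit scanner is
a comparison between content fetched directly and the same content fetched
through each exit.  A minimal sketch, assuming the fetching has already been
done and the resource is static (dynamic pages would cause false positives),
is shown below; the function name is hypothetical.

\begin{verbatim}
import hashlib


def suspicious_exits(direct_content, via_exit):
    """Return exits whose copy of a static resource differs from ours.

    direct_content: bytes fetched without Tor.
    via_exit: dict mapping exit nickname -> bytes fetched through it.
    """
    expected = hashlib.sha256(direct_content).digest()
    return sorted(name for name, body in via_exit.items()
                  if hashlib.sha256(body).digest() != expected)
\end{verbatim}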

\tmp{Add a way for authorities to declare families.}

\tmp{Make authority administration simpler so authority ops spend less time
  on random junk and more time on care and feeding of the network.}

\tmp{Authorities should measure Stable (and maybe Fast) themselves, and not
  just believe declared router uptime.}
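
An authority could instead derive Stable from its own reachability tests, as
in the sketch below.  The format of the test log and the 24-hour threshold are
assumptions for illustration; the point is that uptime is computed only from
what the authority observed, so a router gains nothing by overstating its
uptime.

\begin{verbatim}
from datetime import datetime, timedelta


def observed_uptime(checks, now):
    """Length of the most recent unbroken run of successful checks.

    checks: list of (datetime, reachable) pairs from the authority's own
    reachability tests, in chronological order.
    """
    start = None
    for when, reachable in checks:
        if reachable:
            if start is None:
                start = when  # beginning of the current run
        else:
            start = None  # a failed test breaks the run
    return now - start if start is not None else timedelta(0)


def is_stable(checks, now, threshold=timedelta(hours=24)):
    """Assign Stable based on observed, not self-reported, uptime."""
    return observed_uptime(checks, now) >= threshold
\end{verbatim}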

\subsection{Protocol security}

\tmp{Build in hooks for DoS-resistance: when we need it, we'll really need
  it.}


\section{Development infrastructure}

\subsection{Build farm}
We've begun to deploy a cross-platform distributed build farm of hosts
that build and test the Tor source every time it changes in our development
repository.

We need to {\bf get more participants}, so that we can test a larger variety
of platforms.  (Previously, we found out that our code had broken on an
obscure platform only when somebody got around to building it there.)

We also need to {\bf add our dependencies} to the build farm, so that we can
ensure that libraries we need (especially libevent) do not stop working on
any important platform between one release and the next.

\subsection{Improved testing harness}
Currently, our {\bf unit tests} cover only about XX\% of the code base.  This
is uncomfortably low; we should write more and switch to a more flexible
testing framework.

We should also write flexible {\bf automated single-host deployment tests} so
we can more easily verify that the current codebase works with the network.

\subsection{Centralized build system}
We currently rely on a separate packager to maintain the packaging system and
to build Tor on each platform for which we distribute binaries.  Having
separate package maintainers is sensible, but having separate package builders
has meant long turnaround times between source releases and package releases.
We should create the necessary infrastructure for us to produce binaries for
all major packages within an hour or so of a source release.

\subsection{Improved metrics}
\tmp{We'd like to know how the network is doing.}

\tmp{We'd like to know where users are in an even less intrusive way.}

\tmp{We'd like to know how much of the network is getting used.}

\subsection{Controller library}
We've done lots of design and development on our controller interface, which
allows UI applications and other tools to interact with Tor.  We could
encourage the development of more such tools by releasing a {\bf
  general-purpose controller library}, ideally with API support for several
popular programming languages.
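
Much of such a library is plumbing that every controller now reimplements.  For
example, the sketch below parses one synchronous reply in the line format of
Tor's version~1 control protocol, where each line starts with a three-digit
status code followed by \texttt{-} on intermediate lines and a space on the
final line, as in the two-line reply to \texttt{GETINFO version}.  It ignores
data replies (marked with \texttt{+}) and asynchronous events, and the
function names are hypothetical.

\begin{verbatim}
def parse_reply(lines):
    """Parse one synchronous reply from Tor's control port.

    lines: the reply's lines with the trailing CRLF already removed.
    Returns (status_code, texts). Raises ValueError on malformed input.
    Data replies (a '+' after the status code) are not handled here.
    """
    texts = []
    for i, line in enumerate(lines):
        if len(line) < 4 or not line[:3].isdigit():
            raise ValueError("malformed reply line: %r" % line)
        code, sep, text = line[:3], line[3], line[4:]
        if sep not in "- ":
            raise ValueError("unsupported separator %r" % sep)
        texts.append(text)
        if sep == " ":  # a space marks the final line of the reply
            if i != len(lines) - 1:
                raise ValueError("extra lines after end of reply")
            return int(code), texts
    raise ValueError("reply was not terminated")


def getinfo_values(texts):
    """Turn 'key=value' reply texts (as from GETINFO) into a dict."""
    return dict(t.split("=", 1) for t in texts if "=" in t)
\end{verbatim}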

\section{User experience}

\subsection{Get blocked less, get blocked less hard}
Right now, some services block access to Tor because they don't have a better
way to keep vandals from abusing them than blocking IP addresses associated
with vandalism.  Our approach so far has been to educate them about better
solutions that currently exist, but we should also {\bf create better
solutions for limiting vandalism by anonymous users}, such as schemes based
on credentials or blind signatures, and encourage their use.

Those who do block Tor users also block overbroadly, sometimes blacklisting
operators of Tor servers that do not permit exits to their services.  We could
remove any innocent reason for doing so by designing a {\bf narrowly-targeted
  Tor RBL service}, so that those who wanted to overblock Tor could no longer
plead incompetence.
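
Such a service would answer not ``is this address a Tor server?'' but ``could
a Tor exit at this address reach my service?''.  That question follows from
the exit policy each server publishes, which Tor evaluates in order with the
first matching rule winning.  The sketch below evaluates a simplified policy
(IPv4 address or network and port patterns only) with Python's
\texttt{ipaddress} module.  Treating an address that matches no rule as
rejected is a simplifying assumption, the function names are hypothetical, and
how answers would be served to site operators is left open.

\begin{verbatim}
import ipaddress


def _matches(pattern, addr, port):
    """Does an 'ADDR[/BITS]:PORT[-PORT]' pattern cover addr:port?"""
    addr_part, port_part = pattern.rsplit(":", 1)
    if addr_part != "*":
        if addr not in ipaddress.ip_network(addr_part, strict=False):
            return False
    if port_part == "*":
        return True
    low, _, high = port_part.partition("-")
    return int(low) <= port <= int(high or low)


def exit_allows(policy, target_ip, target_port):
    """Evaluate an exit policy first-match-wins; no match means reject.

    policy: list of strings like 'accept *:80' or 'reject 10.0.0.0/8:*'.
    """
    addr = ipaddress.ip_address(target_ip)
    for rule in policy:
        action, pattern = rule.split(None, 1)
        if _matches(pattern.strip(), addr, target_port):
            return action == "accept"
    return False
\end{verbatim}

A site operator would then block only those servers for which
\texttt{exit\_allows} returns true for the site's own address and port.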

\subsection{All-in-one bundle}
\tmp{a.k.a.\ ``Torpedo'', but rename this.}

\subsection{LiveCD Tor}
\tmp{a.k.a.\ anonym.os done right}

\subsection{Interface improvements}
\tmp{Allow controllers to manipulate server status.}


\subsection{Firewall-level deployment}
Another useful deployment mode for some users is using {\bf Tor in a firewall
  configuration}, and directing all their traffic through Tor.  This can be a
little tricky to set up currently, but it's an effective way to make sure no
traffic leaves the host un-anonymized.  To achieve this, we need to {\bf
  improve and port our new TransPort} feature, which allows Tor to be used
without SOCKS support; to {\bf add an anonymizing DNS proxy} feature to Tor;
and to {\bf construct a recommended set of firewall configurations} to redirect
traffic to Tor.

This is an area where {\bf deployment via a LiveCD}, or an installation
targeted at specialized home routing hardware, could be useful.

\subsection{Localization}
Right now, most of our user-facing code is internationalized.  We need to
internationalize the last few hold-outs (like the Tor installer), and get
more translations for the parts that are already internationalized.

Also, we should look into a {\bf unified translator's solution}.  Currently,
since different tools have been internationalized using the
framework-appropriate method, different tools require translators to localize
them via different interfaces.  As far as possible, translators should need
only a single tool to translate the whole Tor suite.

\section{Documentation}

\subsection{Unified documentation scheme}

We need to {\bf inventory our documentation.}  Our documentation so far has
been mostly produced on an {\it ad hoc} basis, in response to particular
needs and requests.  We should figure out what documentation we have, which of
it (if any) should get priority, and whether we can put it all into a
single format.

We could {\bf unify the docs} into a single book-like thing.  This will also
help us identify what sections of the ``book'' are missing.

\subsection{Missing technical documentation}

We should {\bf revise our design paper} to reflect the design decisions we've
made and the research we've done since it was published in 2004.  This will
help other researchers evaluate and suggest improvements to Tor's current
design.

Other projects sometimes implement the client side of our protocol.  We
encourage this, but we should write {\bf a document about how to avoid
excessive resource use}, so that we need not worry that they will do so
without regard to the effect of their choices on server resources.
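
One guideline such a document would likely contain is to back off after failed
requests instead of retrying in a tight loop.  The sketch below computes a
generic capped exponential backoff with random jitter; the base delay, the cap,
and the function name are placeholders, not values or APIs that Tor
prescribes.

\begin{verbatim}
import random


def retry_delays(attempts, base=60.0, cap=3600.0, rng=random):
    """Seconds to wait before each retry: capped exponential with jitter.

    Each delay is drawn uniformly from [d/2, d], where d doubles per
    attempt until it reaches `cap`.
    """
    delays = []
    for n in range(attempts):
        d = min(cap, base * 2 ** n)
        delays.append(rng.uniform(d / 2, d))
    return delays
\end{verbatim}

The jitter keeps many clients that failed at the same moment from retrying in
lockstep against the same servers.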

\subsection{Missing user documentation}

\tmp{Discursive and comprehensive docs}

\end{document}