diff options
Diffstat (limited to 'doc/design-paper/blocking.tex')
-rw-r--r-- | doc/design-paper/blocking.tex | 1894 |
1 files changed, 0 insertions, 1894 deletions
diff --git a/doc/design-paper/blocking.tex b/doc/design-paper/blocking.tex deleted file mode 100644 index 3b7d05ca57..0000000000 --- a/doc/design-paper/blocking.tex +++ /dev/null @@ -1,1894 +0,0 @@ -%\documentclass{llncs} -\documentclass{usenixsubmit} -%\documentclass[twocolumn]{article} -%usepackage{usenix} - -\usepackage{url} -\usepackage{amsmath} -\usepackage{epsfig} - -\setlength{\textwidth}{6.0in} -\setlength{\textheight}{8.5in} -\setlength{\topmargin}{.5cm} -\setlength{\oddsidemargin}{1cm} -\setlength{\evensidemargin}{1cm} - -\newenvironment{tightlist}{\begin{list}{$\bullet$}{ - \setlength{\itemsep}{0mm} - \setlength{\parsep}{0mm} - % \setlength{\labelsep}{0mm} - % \setlength{\labelwidth}{0mm} - % \setlength{\topsep}{0mm} - }}{\end{list}} - -\newcommand{\workingnote}[1]{} % The version that hides the note. -%\newcommand{\workingnote}[1]{(**#1)} % makes the note visible. - -\date{} -\title{Design of a blocking-resistant anonymity system\\DRAFT} - -%\author{Roger Dingledine\inst{1} \and Nick Mathewson\inst{1}} -\author{Roger Dingledine \\ The Tor Project \\ arma@torproject.org \and -Nick Mathewson \\ The Tor Project \\ nickm@torproject.org} - -\begin{document} -\maketitle -\pagestyle{plain} - -\begin{abstract} - -Internet censorship is on the rise as websites around the world are -increasingly blocked by government-level firewalls. Although popular -anonymizing networks like Tor were originally designed to keep attackers from -tracing people's activities, many people are also using them to evade local -censorship. But if the censor simply denies access to the Tor network -itself, blocked users can no longer benefit from the security Tor offers. - -Here we describe a design that builds upon the current Tor network -to provide an anonymizing network that resists blocking -by government-level attackers. We have implemented and deployed this -design, and talk briefly about early use. - -\end{abstract} - -\section{Introduction} - -Anonymizing networks like Tor~\cite{tor-design} bounce traffic around a -network of encrypting relays. Unlike encryption, which hides only {\it what} -is said, these networks also aim to hide who is communicating with whom, which -users are using which websites, and so on. These systems have a -broad range of users, including ordinary citizens who want to avoid being -profiled for targeted advertisements, corporations who don't want to reveal -information to their competitors, and law enforcement and government -intelligence agencies who need to do operations on the Internet without being -noticed. - -Historical anonymity research has focused on an -attacker who monitors the user (call her Alice) and tries to discover her -activities, yet lets her reach any piece of the network. In more modern -threat models such as Tor's, the adversary is allowed to perform active -attacks such as modifying communications to trick Alice -into revealing her destination, or intercepting some connections -to run a man-in-the-middle attack. But these systems still assume that -Alice can eventually reach the anonymizing network. - -An increasing number of users are using the Tor software -less for its anonymity properties than for its censorship -resistance properties---if they use Tor to access Internet sites like -Wikipedia -and Blogspot, they are no longer affected by local censorship -and firewall rules. In fact, an informal user study -%(described in Appendix~\ref{app:geoip}) -showed that a few hundred thousand users people access the Tor network -each day, with about 20\% of them coming from China~\cite{something}. - -The current Tor design is easy to block if the attacker controls Alice's -connection to the Tor network---by blocking the directory authorities, -by blocking all the relay IP addresses in the directory, or by filtering -based on the network fingerprint of the Tor TLS handshake. Here we -describe an -extended design that builds upon the current Tor network to provide an -anonymizing -network that resists censorship as well as anonymity-breaking attacks. -In section~\ref{sec:adversary} we discuss our threat model---that is, -the assumptions we make about our adversary. Section~\ref{sec:current-tor} -describes the components of the current Tor design and how they can be -leveraged for a new blocking-resistant design. Section~\ref{sec:related} -explains the features and drawbacks of the currently deployed solutions. -In sections~\ref{sec:bridges} through~\ref{sec:discovery}, we explore the -components of our designs in detail. Section~\ref{sec:security} considers -security implications and Section~\ref{sec:reachability} presents other -issues with maintaining connectivity and sustainability for the design. -%Section~\ref{sec:future} speculates about future more complex designs, -Finally section~\ref{sec:conclusion} summarizes our next steps and -recommendations. - -% The other motivation is for places where we're concerned they will -% try to enumerate a list of Tor users. So even if they're not blocking -% the Tor network, it may be smart to not be visible as connecting to it. - -%And adding more different classes of users and goals to the Tor network -%improves the anonymity for all Tor users~\cite{econymics,usability:weis2006}. - -% Adding use classes for countering blocking as well as anonymity has -% benefits too. Should add something about how providing undetected -% access to Tor would facilitate people talking to, e.g., govt. authorities -% about threats to public safety etc. in an environment where Tor use -% is not otherwise widespread and would make one stand out. - -\section{Adversary assumptions} -\label{sec:adversary} - -To design an effective anti-censorship tool, we need a good model for the -goals and resources of the censors we are evading. Otherwise, we risk -spending our effort on keeping the adversaries from doing things they have no -interest in doing, and thwarting techniques they do not use. -The history of blocking-resistance designs is littered with conflicting -assumptions about what adversaries to expect and what problems are -in the critical path to a solution. Here we describe our best -understanding of the current situation around the world. - -In the traditional security style, we aim to defeat a strong -attacker---if we can defend against this attacker, we inherit protection -against weaker attackers as well. After all, we want a general design -that will work for citizens of China, Thailand, and other censored -countries; for -whistleblowers in firewalled corporate networks; and for people in -unanticipated oppressive situations. In fact, by designing with -a variety of adversaries in mind, we can take advantage of the fact that -adversaries will be in different stages of the arms race at each location, -so an address blocked in one locale can still be useful in others. -We focus on an attacker with somewhat complex goals: - -\begin{tightlist} -\item The attacker would like to restrict the flow of certain kinds of - information, particularly when this information is seen as embarrassing to - those in power (such as information about rights violations or corruption), - or when it enables or encourages others to oppose them effectively (such as - information about opposition movements or sites that are used to organize - protests). -\item As a second-order effect, censors aim to chill citizens' behavior by - creating an impression that their online activities are monitored. -\item In some cases, censors make a token attempt to block a few sites for - obscenity, blasphemy, and so on, but their efforts here are mainly for - show. In other cases, they really do try hard to block such content. -\item Complete blocking (where nobody at all can ever download censored - content) is not a - goal. Attackers typically recognize that perfect censorship is not only - impossible, it is unnecessary: if ``undesirable'' information is known only - to a small few, further censoring efforts can be focused elsewhere. -\item Similarly, the censors do not attempt to shut down or block {\it - every} anti-censorship tool---merely the tools that are popular and - effective (because these tools impede the censors' information restriction - goals) and those tools that are highly visible (thus making the censors - look ineffectual to their citizens and their bosses). -\item Reprisal against {\it most} passive consumers of {\it most} kinds of - blocked information is also not a goal, given the broadness of most - censorship regimes. This seems borne out by fact.\footnote{So far in places - like China, the authorities mainly go after people who publish materials - and coordinate organized movements~\cite{mackinnon-personal}. - If they find that a - user happens to be reading a site that should be blocked, the typical - response is simply to block the site. Of course, even with an encrypted - connection, the adversary may be able to distinguish readers from - publishers by observing whether Alice is mostly downloading bytes or mostly - uploading them---we discuss this issue more in - Section~\ref{subsec:upload-padding}.} -\item Producers and distributors of targeted information are in much - greater danger than consumers; the attacker would like to not only block - their work, but identify them for reprisal. -\item The censors (or their governments) would like to have a working, useful - Internet. There are economic, political, and social factors that prevent - them from ``censoring'' the Internet by outlawing it entirely, or by - blocking access to all but a tiny list of sites. - Nevertheless, the censors {\it are} willing to block innocuous content - (like the bulk of a newspaper's reporting) in order to censor other content - distributed through the same channels (like that newspaper's coverage of - the censored country). -\end{tightlist} - -We assume there are three main technical network attacks in use by censors -currently~\cite{clayton:pet2006}: - -\begin{tightlist} -\item Block a destination or type of traffic by automatically searching for - certain strings or patterns in TCP packets. Offending packets can be - dropped, or can trigger a response like closing the - connection. -\item Block certain IP addresses or destination ports at a - firewall or other routing control point. -\item Intercept DNS requests and give bogus responses for certain - destination hostnames. -\end{tightlist} - -We assume the network firewall has limited CPU and memory per -connection~\cite{clayton:pet2006}. Against an adversary who could carefully -examine the contents of every packet and correlate the packets in every -stream on the network, we would need some stronger mechanism such as -steganography, which introduces its own -problems~\cite{active-wardens,tcpstego}. But we make a ``weak -steganography'' assumption here: to remain unblocked, it is necessary to -remain unobservable only by computational resources on par with a modern -router, firewall, proxy, or IDS. - -We assume that while various different regimes can coordinate and share -notes, there will be a time lag between one attacker learning how to overcome -a facet of our design and other attackers picking it up. (The most common -vector of transmission seems to be commercial providers of censorship tools: -once a provider adds a feature to meet one country's needs or requests, the -feature is available to all of the provider's customers.) Conversely, we -assume that insider attacks become a higher risk only after the early stages -of network development, once the system has reached a certain level of -success and visibility. - -We do not assume that government-level attackers are always uniform -across the country. For example, users of different ISPs in China -experience different censorship policies and mechanisms~\cite{china-ccs07}. -%there is no single centralized place in China -%that coordinates its specific censorship decisions and steps. - -We assume that the attacker may be able to use political and economic -resources to secure the cooperation of extraterritorial or multinational -corporations and entities in investigating information sources. -For example, the censors can threaten the service providers of -troublesome blogs with economic reprisals if they do not reveal the -authors' identities. - -We assume that our users have control over their hardware and -software---they don't have any spyware installed, there are no -cameras watching their screens, etc. Unfortunately, in many situations -these threats are real~\cite{zuckerman-threatmodels}; yet -software-based security systems like ours are poorly equipped to handle -a user who is entirely observed and controlled by the adversary. See -Section~\ref{subsec:cafes-and-livecds} for more discussion of what little -we can do about this issue. - -Similarly, we assume that the user will be able to fetch a genuine -version of Tor, rather than one supplied by the adversary; see -Section~\ref{subsec:trust-chain} for discussion on helping the user -confirm that he has a genuine version and that he can connect to the -real Tor network. - -\section{Adapting the current Tor design to anti-censorship} -\label{sec:current-tor} - -Tor is popular and sees a lot of use---it's the largest anonymity -network of its kind, and has -attracted more than 1500 volunteer-operated routers from around the -world. Tor protects each user by routing their traffic through a multiply -encrypted ``circuit'' built of a few randomly selected relay, each of which -can remove only a single layer of encryption. Each relay sees only the step -before it and the step after it in the circuit, and so no single relay can -learn the connection between a user and her chosen communication partners. -In this section, we examine some of the reasons why Tor has become popular, -with particular emphasis to how we can take advantage of these properties -for a blocking-resistance design. - -Tor aims to provide three security properties: -\begin{tightlist} -\item 1. A local network attacker can't learn, or influence, your -destination. -\item 2. No single router in the Tor network can link you to your -destination. -\item 3. The destination, or somebody watching the destination, -can't learn your location. -\end{tightlist} - -For blocking-resistance, we care most clearly about the first -property. But as the arms race progresses, the second property -will become important---for example, to discourage an adversary -from volunteering a relay in order to learn that Alice is reading -or posting to certain websites. The third property helps keep users safe from -collaborating websites: consider websites and other Internet services -that have been pressured -recently into revealing the identity of bloggers -%~\cite{arrested-bloggers} -or treating clients differently depending on their network -location~\cite{netauth}. -%~\cite{google-geolocation}. - -The Tor design provides other features as well that are not typically -present in manual or ad hoc circumvention techniques. - -First, Tor has a well-analyzed and well-understood way to distribute -information about relay. -Tor directory authorities automatically aggregate, test, -and publish signed summaries of the available Tor routers. Tor clients -can fetch these summaries to learn which routers are available and -which routers are suitable for their needs. Directory information is cached -throughout the Tor network, so once clients have bootstrapped they never -need to interact with the authorities directly. (To tolerate a minority -of compromised directory authorities, we use a threshold trust scheme--- -see Section~\ref{subsec:trust-chain} for details.) - -Second, the list of directory authorities is not hard-wired. -Clients use the default authorities if no others are specified, -but it's easy to start a separate (or even overlapping) Tor network just -by running a different set of authorities and convincing users to prefer -a modified client. For example, we could launch a distinct Tor network -inside China; some users could even use an aggregate network made up of -both the main network and the China network. (But we should not be too -quick to create other Tor networks---part of Tor's anonymity comes from -users behaving like other users, and there are many unsolved anonymity -questions if different users know about different pieces of the network.) - -Third, in addition to automatically learning from the chosen directories -which Tor routers are available and working, Tor takes care of building -paths through the network and rebuilding them as needed. So the user -never has to know how paths are chosen, never has to manually pick -working proxies, and so on. More generally, at its core the Tor protocol -is simply a tool that can build paths given a set of routers. Tor is -quite flexible about how it learns about the routers and how it chooses -the paths. Harvard's Blossom project~\cite{blossom-thesis} makes this -flexibility more concrete: Blossom makes use of Tor not for its security -properties but for its reachability properties. It runs a separate set -of directory authorities, its own set of Tor routers (called the Blossom -network), and uses Tor's flexible path-building to let users view Internet -resources from any point in the Blossom network. - -Fourth, Tor separates the role of \emph{internal relay} from the -role of \emph{exit relay}. That is, some volunteers choose just to relay -traffic between Tor users and Tor routers, and others choose to also allow -connections to external Internet resources. Because we don't force all -volunteers to play both roles, we end up with more relays. This increased -diversity in turn is what gives Tor its security: the more options the -user has for her first hop, and the more options she has for her last hop, -the less likely it is that a given attacker will be watching both ends -of her circuit~\cite{tor-design}. As a bonus, because our design attracts -more internal relays that want to help out but don't want to deal with -being an exit relay, we end up providing more options for the first -hop---the one most critical to being able to reach the Tor network. - -Fifth, Tor is sustainable. Zero-Knowledge Systems offered the commercial -but now defunct Freedom Network~\cite{freedom21-security}, a design with -security comparable to Tor's, but its funding model relied on collecting -money from users to pay relay operators. Modern commercial proxy systems -similarly -need to keep collecting money to support their infrastructure. On the -other hand, Tor has built a self-sustaining community of volunteers who -donate their time and resources. This community trust is rooted in Tor's -open design: we tell the world exactly how Tor works, and we provide all -the source code. Users can decide for themselves, or pay any security -expert to decide, whether it is safe to use. Further, Tor's modularity -as described above, along with its open license, mean that its impact -will continue to grow. - -Sixth, Tor has an established user base of hundreds of -thousands of people from around the world. This diversity of -users contributes to sustainability as above: Tor is used by -ordinary citizens, activists, corporations, law enforcement, and -even government and military users, -%\footnote{\url{https://www.torproject.org/overview}} -and they can -only achieve their security goals by blending together in the same -network~\cite{econymics,usability:weis2006}. This user base also provides -something else: hundreds of thousands of different and often-changing -addresses that we can leverage for our blocking-resistance design. - -Finally and perhaps most importantly, Tor provides anonymity and prevents any -single relay from linking users to their communication partners. Despite -initial appearances, {\it distributed-trust anonymity is critical for -anti-censorship efforts}. If any single relay can expose dissident bloggers -or compile a list of users' behavior, the censors can profitably compromise -that relay's operator, perhaps by applying economic pressure to their -employers, -breaking into their computer, pressuring their family (if they have relatives -in the censored area), or so on. Furthermore, in designs where any relay can -expose its users, the censors can spread suspicion that they are running some -of the relays and use this belief to chill use of the network. - -We discuss and adapt these components further in -Section~\ref{sec:bridges}. But first we examine the strengths and -weaknesses of other blocking-resistance approaches, so we can expand -our repertoire of building blocks and ideas. - -\section{Current proxy solutions} -\label{sec:related} - -Relay-based blocking-resistance schemes generally have two main -components: a relay component and a discovery component. The relay part -encompasses the process of establishing a connection, sending traffic -back and forth, and so on---everything that's done once the user knows -where she's going to connect. Discovery is the step before that: the -process of finding one or more usable relays. - -For example, we can divide the pieces of Tor in the previous section -into the process of building paths and sending -traffic over them (relay) and the process of learning from the directory -authorities about what routers are available (discovery). With this -distinction -in mind, we now examine several categories of relay-based schemes. - -\subsection{Centrally-controlled shared proxies} - -Existing commercial anonymity solutions (like Anonymizer.com) are based -on a set of single-hop proxies. In these systems, each user connects to -a single proxy, which then relays traffic between the user and her -destination. These public proxy -systems are typically characterized by two features: they control and -operate the proxies centrally, and many different users get assigned -to each proxy. - -In terms of the relay component, single proxies provide weak security -compared to systems that distribute trust over multiple relays, since a -compromised proxy can trivially observe all of its users' actions, and -an eavesdropper only needs to watch a single proxy to perform timing -correlation attacks against all its users' traffic and thus learn where -everyone is connecting. Worse, all users -need to trust the proxy company to have good security itself as well as -to not reveal user activities. - -On the other hand, single-hop proxies are easier to deploy, and they -can provide better performance than distributed-trust designs like Tor, -since traffic only goes through one relay. They're also more convenient -from the user's perspective---since users entirely trust the proxy, -they can just use their web browser directly. - -Whether public proxy schemes are more or less scalable than Tor is -still up for debate: commercial anonymity systems can use some of their -revenue to provision more bandwidth as they grow, whereas volunteer-based -anonymity systems can attract thousands of fast relays to spread the load. - -The discovery piece can take several forms. Most commercial anonymous -proxies have one or a handful of commonly known websites, and their users -log in to those websites and relay their traffic through them. When -these websites get blocked (generally soon after the company becomes -popular), if the company cares about users in the blocked areas, they -start renting lots of disparate IP addresses and rotating through them -as they get blocked. They notify their users of new addresses (by email, -for example). It's an arms race, since attackers can sign up to receive the -email too, but operators have one nice trick available to them: because they -have a list of paying subscribers, they can notify certain subscribers -about updates earlier than others. - -Access control systems on the proxy let them provide service only to -users with certain characteristics, such as paying customers or people -from certain IP address ranges. - -Discovery in the face of a government-level firewall is a complex and -unsolved -topic, and we're stuck in this same arms race ourselves; we explore it -in more detail in Section~\ref{sec:discovery}. But first we examine the -other end of the spectrum---getting volunteers to run the proxies, -and telling only a few people about each proxy. - -\subsection{Independent personal proxies} - -Personal proxies such as Circumventor~\cite{circumventor} and -CGIProxy~\cite{cgiproxy} use the same technology as the public ones as -far as the relay component goes, but they use a different strategy for -discovery. Rather than managing a few centralized proxies and constantly -getting new addresses for them as the old addresses are blocked, they -aim to have a large number of entirely independent proxies, each managing -its own (much smaller) set of users. - -As the Circumventor site explains, ``You don't -actually install the Circumventor \emph{on} the computer that is blocked -from accessing Web sites. You, or a friend of yours, has to install the -Circumventor on some \emph{other} machine which is not censored.'' - -This tactic has great advantages in terms of blocking-resistance---recall -our assumption in Section~\ref{sec:adversary} that the attention -a system attracts from the attacker is proportional to its number of -users and level of publicity. If each proxy only has a few users, and -there is no central list of proxies, most of them will never get noticed by -the censors. - -On the other hand, there's a huge scalability question that so far has -prevented these schemes from being widely useful: how does the fellow -in China find a person in Ohio who will run a Circumventor for him? In -some cases he may know and trust some people on the outside, but in many -cases he's just out of luck. Just as hard, how does a new volunteer in -Ohio find a person in China who needs it? - -% another key feature of a proxy run by your uncle is that you -% self-censor, so you're unlikely to bring abuse complaints onto -% your uncle. self-censoring clearly has a downside too, though. - -This challenge leads to a hybrid design---centrally-distributed -personal proxies---which we will investigate in more detail in -Section~\ref{sec:discovery}. - -\subsection{Open proxies} - -Yet another currently used approach to bypassing firewalls is to locate -open and misconfigured proxies on the Internet. A quick Google search -for ``open proxy list'' yields a wide variety of freely available lists -of HTTP, HTTPS, and SOCKS proxies. Many small companies have sprung up -providing more refined lists to paying customers. - -There are some downsides to using these open proxies though. First, -the proxies are of widely varying quality in terms of bandwidth and -stability, and many of them are entirely unreachable. Second, unlike -networks of volunteers like Tor, the legality of routing traffic through -these proxies is questionable: it's widely believed that most of them -don't realize what they're offering, and probably wouldn't allow it if -they realized. Third, in many cases the connection to the proxy is -unencrypted, so firewalls that filter based on keywords in IP packets -will not be hindered. Fourth, in many countries (including China), the -firewall authorities hunt for open proxies as well, to preemptively -block them. And last, many users are suspicious that some -open proxies are a little \emph{too} convenient: are they run by the -adversary, in which case they get to monitor all the user's requests -just as single-hop proxies can? - -A distributed-trust design like Tor resolves each of these issues for -the relay component, but a constantly changing set of thousands of open -relays is clearly a useful idea for a discovery component. For example, -users might be able to make use of these proxies to bootstrap their -first introduction into the Tor network. - -\subsection{Blocking resistance and JAP} - -K\"{o}psell and Hilling's Blocking Resistance -design~\cite{koepsell:wpes2004} is probably -the closest related work, and is the starting point for the design in this -paper. In this design, the JAP anonymity system~\cite{web-mix} is used -as a base instead of Tor. Volunteers operate a large number of access -points that relay traffic to the core JAP -network, which in turn anonymizes users' traffic. The software to run these -relays is, as in our design, included in the JAP client software and enabled -only when the user decides to enable it. Discovery is handled with a -CAPTCHA-based mechanism; users prove that they aren't an automated process, -and are given the address of an access point. (The problem of a determined -attacker with enough manpower to launch many requests and enumerate all the -access points is not considered in depth.) There is also some suggestion -that information about access points could spread through existing social -networks. - -\subsection{Infranet} - -The Infranet design~\cite{infranet} uses one-hop relays to deliver web -content, but disguises its communications as ordinary HTTP traffic. Requests -are split into multiple requests for URLs on the relay, which then encodes -its responses in the content it returns. The relay needs to be an actual -website with plausible content and a number of URLs which the user might want -to access---if the Infranet software produced its own cover content, it would -be far easier for censors to identify. To keep the censors from noticing -that cover content changes depending on what data is embedded, Infranet needs -the cover content to have an innocuous reason for changing frequently: the -paper recommends watermarked images and webcams. - -The attacker and relay operators in Infranet's threat model are significantly -different than in ours. Unlike our attacker, Infranet's censor can't be -bypassed with encrypted traffic (presumably because the censor blocks -encrypted traffic, or at least considers it suspicious), and has more -computational resources to devote to each connection than ours (so it can -notice subtle patterns over time). Unlike our bridge operators, Infranet's -operators (and users) have more bandwidth to spare; the overhead in typical -steganography schemes is far higher than Tor's. - -The Infranet design does not include a discovery element. Discovery, -however, is a critical point: if whatever mechanism allows users to learn -about relays also allows the censor to do so, he can trivially discover and -block their addresses, even if the steganography would prevent mere traffic -observation from revealing the relays' addresses. - -\subsection{RST-evasion and other packet-level tricks} - -In their analysis of China's firewall's content-based blocking, Clayton, -Murdoch and Watson discovered that rather than blocking all packets in a TCP -streams once a forbidden word was noticed, the firewall was simply forging -RST packets to make the communicating parties believe that the connection was -closed~\cite{clayton:pet2006}. They proposed altering operating systems -to ignore forged RST packets. This approach might work in some cases, but -in practice it appears that many firewalls start filtering by IP address -once a sufficient number of RST packets have been sent. - -Other packet-level responses to filtering include splitting -sensitive words across multiple TCP packets, so that the censors' -firewalls can't notice them without performing expensive stream -reconstruction~\cite{ptacek98insertion}. This technique relies on the -same insight as our weak steganography assumption. - -%\subsection{Internal caching networks} - -%Freenet~\cite{freenet-pets00} is an anonymous peer-to-peer data store. -%Analyzing Freenet's security can be difficult, as its design is in flux as -%new discovery and routing mechanisms are proposed, and no complete -%specification has (to our knowledge) been written. Freenet servers relay -%requests for specific content (indexed by a digest of the content) -%``toward'' the server that hosts it, and then cache the content as it -%follows the same path back to -%the requesting user. If Freenet's routing mechanism is successful in -%allowing nodes to learn about each other and route correctly even as some -%node-to-node links are blocked by firewalls, then users inside censored areas -%can ask a local Freenet server for a piece of content, and get an answer -%without having to connect out of the country at all. Of course, operators of -%servers inside the censored area can still be targeted, and the addresses of -%external servers can still be blocked. - -%\subsection{Skype} - -%The popular Skype voice-over-IP software uses multiple techniques to tolerate -%restrictive networks, some of which allow it to continue operating in the -%presence of censorship. By switching ports and using encryption, Skype -%attempts to resist trivial blocking and content filtering. Even if no -%encryption were used, it would still be expensive to scan all voice -%traffic for sensitive words. Also, most current keyloggers are unable to -%store voice traffic. Nevertheless, Skype can still be blocked, especially at -%its central login server. - -%*sjmurdoch* "we consider the login server to be the only central component in -%the Skype p2p network." -%*sjmurdoch* http://www1.cs.columbia.edu/~salman/publications/skype1_4.pdf -%-> *sjmurdoch* ok. what is the login server's role? -%-> *sjmurdoch* and do you need to reach it directly to use skype? -%*sjmurdoch* It checks the username and password -%*sjmurdoch* It is necessary in the current implementation, but I don't know if -%it is a fundemental limitation of the architecture - -\subsection{Tor itself} - -And last, we include Tor itself in the list of current solutions -to firewalls. Tens of thousands of people use Tor from countries that -routinely filter their Internet. Tor's website has been blocked in most -of them. But why hasn't the Tor network been blocked yet? - -We have several theories. The first is the most straightforward: tens of -thousands of people are simply too few to matter. It may help that Tor is -perceived to be for experts only, and thus not worth attention yet. The -more subtle variant on this theory is that we've positioned Tor in the -public eye as a tool for retaining civil liberties in more free countries, -so perhaps blocking authorities don't view it as a threat. (We revisit -this idea when we consider whether and how to publicize a Tor variant -that improves blocking-resistance---see Section~\ref{subsec:publicity} -for more discussion.) - -The broader explanation is that the maintenance of most government-level -filters is aimed at stopping widespread information flow and appearing to be -in control, not by the impossible goal of blocking all possible ways to bypass -censorship. Censors realize that there will always -be ways for a few people to get around the firewall, and as long as Tor -has not publically threatened their control, they see no urgent need to -block it yet. - -We should recognize that we're \emph{already} in the arms race. These -constraints can give us insight into the priorities and capabilities of -our various attackers. - -\section{The relay component of our blocking-resistant design} -\label{sec:bridges} - -Section~\ref{sec:current-tor} describes many reasons why Tor is -well-suited as a building block in our context, but several changes will -allow the design to resist blocking better. The most critical changes are -to get more relay addresses, and to distribute them to users differently. - -%We need to address three problems: -%- adapting the relay component of Tor so it resists blocking better. -%- Discovery. -%- Tor's network fingerprint. - -%Here we describe the new pieces we need to add to the current Tor design. - -\subsection{Bridge relays} - -Today, Tor relays operate on a few thousand distinct IP addresses; -an adversary -could enumerate and block them all with little trouble. To provide a -means of ingress to the network, we need a larger set of entry points, most -of which an adversary won't be able to enumerate easily. Fortunately, we -have such a set: the Tor users. - -Hundreds of thousands of people around the world use Tor. We can leverage -our already self-selected user base to produce a list of thousands of -frequently-changing IP addresses. Specifically, we can give them a little -button in the GUI that says ``Tor for Freedom'', and users who click -the button will turn into \emph{bridge relays} (or just \emph{bridges} -for short). They can rate limit relayed connections to 10 KB/s (almost -nothing for a broadband user in a free country, but plenty for a user -who otherwise has no access at all), and since they are just relaying -bytes back and forth between blocked users and the main Tor network, they -won't need to make any external connections to Internet sites. Because -of this separation of roles, and because we're making use of software -that the volunteers have already installed for their own use, we expect -our scheme to attract and maintain more volunteers than previous schemes. - -As usual, there are new anonymity and security implications from running a -bridge relay, particularly from letting people relay traffic through your -Tor client; but we leave this discussion for Section~\ref{sec:security}. - -%...need to outline instructions for a Tor config that will publish -%to an alternate directory authority, and for controller commands -%that will do this cleanly. - -\subsection{The bridge directory authority} - -How do the bridge relays advertise their existence to the world? We -introduce a second new component of the design: a specialized directory -authority that aggregates and tracks bridges. Bridge relays periodically -publish relay descriptors (summaries of their keys, locations, etc, -signed by their long-term identity key), just like the relays in the -``main'' Tor network, but in this case they publish them only to the -bridge directory authorities. - -The main difference between bridge authorities and the directory -authorities for the main Tor network is that the main authorities provide -a list of every known relay, but the bridge authorities only give -out a relay descriptor if you already know its identity key. That is, -you can keep up-to-date on a bridge's location and other information -once you know about it, but you can't just grab a list of all the bridges. - -The identity key, IP address, and directory port for each bridge -authority ship by default with the Tor software, so the bridge relays -can be confident they're publishing to the right location, and the -blocked users can establish an encrypted authenticated channel. See -Section~\ref{subsec:trust-chain} for more discussion of the public key -infrastructure and trust chain. - -Bridges use Tor to publish their descriptors privately and securely, -so even an attacker monitoring the bridge directory authority's network -can't make a list of all the addresses contacting the authority. -Bridges may publish to only a subset of the -authorities, to limit the potential impact of an authority compromise. - - -%\subsection{A simple matter of engineering} -% -%Although we've described bridges and bridge authorities in simple terms -%above, some design modifications and features are needed in the Tor -%codebase to add them. We describe the four main changes here. -% -%Firstly, we need to get smarter about rate limiting: -%Bandwidth classes -% -%Secondly, while users can in fact configure which directory authorities -%they use, we need to add a new type of directory authority and teach -%bridges to fetch directory information from the main authorities while -%publishing relay descriptors to the bridge authorities. We're most of -%the way there, since we can already specify attributes for directory -%authorities: -%add a separate flag named ``blocking''. -% -%Thirdly, need to build paths using bridges as the first -%hop. One more hole in the non-clique assumption. -% -%Lastly, since bridge authorities don't answer full network statuses, -%we need to add a new way for users to learn the current status for a -%single relay or a small set of relays---to answer such questions as -%``is it running?'' or ``is it behaving correctly?'' We describe in -%Section~\ref{subsec:enclave-dirs} a way for the bridge authority to -%publish this information without resorting to signing each answer -%individually. - -\subsection{Putting them together} -\label{subsec:relay-together} - -If a blocked user knows the identity keys of a set of bridge relays, and -he has correct address information for at least one of them, he can use -that one to make a secure connection to the bridge authority and update -his knowledge about the other bridge relays. He can also use it to make -secure connections to the main Tor network and directory authorities, so he -can build circuits and connect to the rest of the Internet. All of these -updates happen in the background: from the blocked user's perspective, -he just accesses the Internet via his Tor client like always. - -So now we've reduced the problem from how to circumvent the firewall -for all transactions (and how to know that the pages you get have not -been modified by the local attacker) to how to learn about a working -bridge relay. - -There's another catch though. We need to make sure that the network -traffic we generate by simply connecting to a bridge relay doesn't stand -out too much. - -%The following section describes ways to bootstrap knowledge of your first -%bridge relay, and ways to maintain connectivity once you know a few -%bridge relays. - -% (See Section~\ref{subsec:first-bridge} for a discussion -%of exactly what information is sufficient to characterize a bridge relay.) - - - -\section{Hiding Tor's network fingerprint} -\label{sec:network-fingerprint} -\label{subsec:enclave-dirs} - -Currently, Tor uses two protocols for its network communications. The -main protocol uses TLS for encrypted and authenticated communication -between Tor instances. The second protocol is standard HTTP, used for -fetching directory information. All Tor relays listen on their ``ORPort'' -for TLS connections, and some of them opt to listen on their ``DirPort'' -as well, to serve directory information. Tor relays choose whatever port -numbers they like; the relay descriptor they publish to the directory -tells users where to connect. - -One format for communicating address information about a bridge relay is -its IP address and DirPort. From there, the user can ask the bridge's -directory cache for an up-to-date copy of its relay descriptor, and -learn its current circuit keys, its ORPort, and so on. - -However, connecting directly to the directory cache involves a plaintext -HTTP request. A censor could create a network fingerprint (known as a -\emph{signature} in the intrusion detection field) for the request -and/or its response, thus preventing these connections. To resolve this -vulnerability, we've modified the Tor protocol so that users can connect -to the directory cache via the main Tor port---they establish a TLS -connection with the bridge as normal, and then send a special ``begindir'' -relay command to establish an internal connection to its directory cache. - -Therefore a better way to summarize a bridge's address is by its IP -address and ORPort, so all communications between the client and the -bridge will use ordinary TLS. But there are other details that need -more investigation. - -What port should bridges pick for their ORPort? We currently recommend -that they listen on port 443 (the default HTTPS port) if they want to -be most useful, because clients behind standard firewalls will have -the best chance to reach them. Is this the best choice in all cases, -or should we encourage some fraction of them pick random ports, or other -ports commonly permitted through firewalls like 53 (DNS) or 110 -(POP)? Or perhaps we should use other ports where TLS traffic is -expected, like 993 (IMAPS) or 995 (POP3S). We need more research on our -potential users, and their current and anticipated firewall restrictions. - -Furthermore, we need to look at the specifics of Tor's TLS handshake. -Right now Tor uses some predictable strings in its TLS handshakes. For -example, it sets the X.509 organizationName field to ``Tor'', and it puts -the Tor relay's nickname in the certificate's commonName field. We -should tweak the handshake protocol so it doesn't rely on any unusual details -in the certificate, yet it remains secure; the certificate itself -should be made to resemble an ordinary HTTPS certificate. We should also try -to make our advertised cipher-suites closer to what an ordinary web server -would support. - -Tor's TLS handshake uses two-certificate chains: one certificate -contains the self-signed identity key for -the router, and the second contains a current TLS key, signed by the -identity key. We use these to authenticate that we're talking to the right -router, and to limit the impact of TLS-key exposure. Most (though far from -all) consumer-oriented HTTPS services provide only a single certificate. -These extra certificates may help identify Tor's TLS handshake; instead, -bridges should consider using only a single TLS key certificate signed by -their identity key, and providing the full value of the identity key in an -early handshake cell. More significantly, Tor currently has all clients -present certificates, so that clients are harder to distinguish from relays. -But in a blocking-resistance environment, clients should not present -certificates at all. - -Last, what if the adversary starts observing the network traffic even -more closely? Even if our TLS handshake looks innocent, our traffic timing -and volume still look different than a user making a secure web connection -to his bank. The same techniques used in the growing trend to build tools -to recognize encrypted Bittorrent traffic -%~\cite{bt-traffic-shaping} -could be used to identify Tor communication and recognize bridge -relays. Rather than trying to look like encrypted web traffic, we may be -better off trying to blend with some other encrypted network protocol. The -first step is to compare typical network behavior for a Tor client to -typical network behavior for various other protocols. This statistical -cat-and-mouse game is made more complex by the fact that Tor transports a -variety of protocols, and we'll want to automatically handle web browsing -differently from, say, instant messaging. - -% Tor cells are 512 bytes each. So TLS records will be roughly -% multiples of this size? How bad is this? -RD -% Look at ``Inferring the Source of Encrypted HTTP Connections'' -% by Marc Liberatore and Brian Neil Levine (CCS 2006) -% They substantially flesh out the numbers for the web fingerprinting -% attack. -PS -% Yes, but I meant detecting the fingerprint of Tor traffic itself, not -% learning what websites we're going to. I wouldn't be surprised to -% learn that these are related problems, but it's not obvious to me. -RD - -\subsection{Identity keys as part of addressing information} -\label{subsec:id-address} - -We have described a way for the blocked user to bootstrap into the -network once he knows the IP address and ORPort of a bridge. What about -local spoofing attacks? That is, since we never learned an identity -key fingerprint for the bridge, a local attacker could intercept our -connection and pretend to be the bridge we had in mind. It turns out -that giving false information isn't that bad---since the Tor client -ships with trusted keys for the bridge directory authority and the Tor -network directory authorities, the user can learn whether he's being -given a real connection to the bridge authorities or not. (After all, -if the adversary intercepts every connection the user makes and gives -him a bad connection each time, there's nothing we can do.) - -What about anonymity-breaking attacks from observing traffic, if the -blocked user doesn't start out knowing the identity key of his intended -bridge? The vulnerabilities aren't so bad in this case either---the -adversary could do similar attacks just by monitoring the network -traffic. -% cue paper by steven and george - -Once the Tor client has fetched the bridge's relay descriptor, it should -remember the identity key fingerprint for that bridge relay. Thus if -the bridge relay moves to a new IP address, the client can query the -bridge directory authority to look up a fresh relay descriptor using -this fingerprint. - -So we've shown that it's \emph{possible} to bootstrap into the network -just by learning the IP address and ORPort of a bridge, but are there -situations where it's more convenient or more secure to learn the bridge's -identity fingerprint as well as instead, while bootstrapping? We keep -that question in mind as we next investigate bootstrapping and discovery. - -\section{Discovering working bridge relays} -\label{sec:discovery} - -Tor's modular design means that we can develop a better relay component -independently of developing the discovery component. This modularity's -great promise is that we can pick any discovery approach we like; but the -unfortunate fact is that we have no magic bullet for discovery. We're -in the same arms race as all the other designs we described in -Section~\ref{sec:related}. - -In this section we describe a variety of approaches to adding discovery -components for our design. - -\subsection{Bootstrapping: finding your first bridge.} -\label{subsec:first-bridge} - -In Section~\ref{subsec:relay-together}, we showed that a user who knows -a working bridge address can use it to reach the bridge authority and -to stay connected to the Tor network. But how do new users reach the -bridge authority in the first place? After all, the bridge authority -will be one of the first addresses that a censor blocks. - -First, we should recognize that most government firewalls are not -perfect. That is, they may allow connections to Google cache or some -open proxy servers, or they let file-sharing traffic, Skype, instant -messaging, or World-of-Warcraft connections through. Different users will -have different mechanisms for bypassing the firewall initially. Second, -we should remember that most people don't operate in a vacuum; users will -hopefully know other people who are in other situations or have other -resources available. In the rest of this section we develop a toolkit -of different options and mechanisms, so that we can enable users in a -diverse set of contexts to bootstrap into the system. - -(For users who can't use any of these techniques, hopefully they know -a friend who can---for example, perhaps the friend already knows some -bridge relay addresses. If they can't get around it at all, then we -can't help them---they should go meet more people or learn more about -the technology running the firewall in their area.) - -By deploying all the schemes in the toolkit at once, we let bridges and -blocked users employ the discovery approach that is most appropriate -for their situation. - -\subsection{Independent bridges, no central discovery} - -The first design is simply to have no centralized discovery component at -all. Volunteers run bridges, and we assume they have some blocked users -in mind and communicate their address information to them out-of-band -(for example, through Gmail). This design allows for small personal -bridges that have only one or a handful of users in mind, but it can -also support an entire community of users. For example, Citizen Lab's -upcoming Psiphon single-hop proxy tool~\cite{psiphon} plans to use this -\emph{social network} approach as its discovery component. - -There are several ways to do bootstrapping in this design. In the simple -case, the operator of the bridge informs each chosen user about his -bridge's address information and/or keys. A different approach involves -blocked users introducing new blocked users to the bridges they know. -That is, somebody in the blocked area can pass along a bridge's address to -somebody else they trust. This scheme brings in appealing but complex game -theoretic properties: the blocked user making the decision has an incentive -only to delegate to trustworthy people, since an adversary who learns -the bridge's address and filters it makes it unavailable for both of them. -Also, delegating known bridges to members of your social network can be -dangerous: an the adversary who can learn who knows which bridges may -be able to reconstruct the social network. - -Note that a central set of bridge directory authorities can still be -compatible with a decentralized discovery process. That is, how users -first learn about bridges is entirely up to the bridges, but the process -of fetching up-to-date descriptors for them can still proceed as described -in Section~\ref{sec:bridges}. Of course, creating a central place that -knows about all the bridges may not be smart, especially if every other -piece of the system is decentralized. Further, if a user only knows -about one bridge and he loses track of it, it may be quite a hassle to -reach the bridge authority. We address these concerns next. - -\subsection{Families of bridges, no central discovery} - -Because the blocked users are running our software too, we have many -opportunities to improve usability or robustness. Our second design builds -on the first by encouraging volunteers to run several bridges at once -(or coordinate with other bridge volunteers), such that some -of the bridges are likely to be available at any given time. - -The blocked user's Tor client would periodically fetch an updated set of -recommended bridges from any of the working bridges. Now the client can -learn new additions to the bridge pool, and can expire abandoned bridges -or bridges that the adversary has blocked, without the user ever needing -to care. To simplify maintenance of the community's bridge pool, each -community could run its own bridge directory authority---reachable via -the available bridges, and also mirrored at each bridge. - -\subsection{Public bridges with central discovery} - -What about people who want to volunteer as bridges but don't know any -suitable blocked users? What about people who are blocked but don't -know anybody on the outside? Here we describe how to make use of these -\emph{public bridges} in a way that still makes it hard for the attacker -to learn all of them. - -The basic idea is to divide public bridges into a set of pools based on -identity key. Each pool corresponds to a \emph{distribution strategy}: -an approach to distributing its bridge addresses to users. Each strategy -is designed to exercise a different scarce resource or property of -the user. - -How do we divide bridges between these strategy pools such that they're -evenly distributed and the allocation is hard to influence or predict, -but also in a way that's amenable to creating more strategies later -on without reshuffling all the pools? We assign a given bridge -to a strategy pool by hashing the bridge's identity key along with a -secret that only the bridge authority knows: the first $n$ bits of this -hash dictate the strategy pool number, where $n$ is a parameter that -describes how many strategy pools we want at this point. We choose $n=3$ -to start, so we divide bridges between 8 pools; but as we later invent -new distribution strategies, we can increment $n$ to split the 8 into -16. Since a bridge can't predict the next bit in its hash, it can't -anticipate which identity key will correspond to a certain new pool -when the pools are split. Further, since the bridge authority doesn't -provide any feedback to the bridge about which strategy pool it's in, -an adversary who signs up bridges with the goal of filling a certain -pool~\cite{casc-rep} will be hindered. - -% This algorithm is not ideal. When we split pools, each existing -% pool is cut in half, where half the bridges remain with the -% old distribution policy, and half will be under what the new one -% is. So the new distribution policy inherits a bunch of blocked -% bridges if the old policy was too loose, or a bunch of unblocked -% bridges if its policy was still secure. -RD -% -% I think it should be more chordlike. -% Bridges are allocated to wherever on the ring which is divided -% into arcs (buckets). -% If a bucket gets too full, you can just split it. -% More on this below. -PFS - -The first distribution strategy (used for the first pool) publishes bridge -addresses in a time-release fashion. The bridge authority divides the -available bridges into partitions, and each partition is deterministically -available only in certain time windows. That is, over the course of a -given time slot (say, an hour), each requester is given a random bridge -from within that partition. When the next time slot arrives, a new set -of bridges from the pool are available for discovery. Thus some bridge -address is always available when a new -user arrives, but to learn about all bridges the attacker needs to fetch -all new addresses at every new time slot. By varying the length of the -time slots, we can make it harder for the attacker to guess when to check -back. We expect these bridges will be the first to be blocked, but they'll -help the system bootstrap until they \emph{do} get blocked. Further, -remember that we're dealing with different blocking regimes around the -world that will progress at different rates---so this pool will still -be useful to some users even as the arms races progress. - -The second distribution strategy publishes bridge addresses based on the IP -address of the requesting user. Specifically, the bridge authority will -divide the available bridges in the pool into a bunch of partitions -(as in the first distribution scheme), hash the requester's IP address -with a secret of its own (as in the above allocation scheme for creating -pools), and give the requester a random bridge from the appropriate -partition. To raise the bar, we should discard the last octet of the -IP address before inputting it to the hash function, so an attacker -who only controls a single ``/24'' network only counts as one user. A -large attacker like China will still be able to control many addresses, -but the hassle of establishing connections from each network (or spoofing -TCP connections) may still slow them down. Similarly, as a special case, -we should treat IP addresses that are Tor exit nodes as all being on -the same network. - -The third strategy combines the time-based and location-based -strategies to further constrain and rate-limit the available bridge -addresses. Specifically, the bridge address provided in a given time -slot to a given network location is deterministic within the partition, -rather than chosen randomly each time from the partition. Thus, repeated -requests during that time slot from a given network are given the same -bridge address as the first request. - -The fourth strategy is based on Circumventor's discovery strategy. -The Circumventor project, realizing that its adoption will remain limited -if it has no central coordination mechanism, has started a mailing list to -distribute new proxy addresses every few days. From experimentation it -seems they have concluded that sending updates every three or four days -is sufficient to stay ahead of the current attackers. - -The fifth strategy provides an alternative approach to a mailing list: -users provide an email address and receive an automated response -listing an available bridge address. We could limit one response per -email address. To further rate limit queries, we could require a CAPTCHA -solution -%~\cite{captcha} -in each case too. In fact, we wouldn't need to -implement the CAPTCHA on our side: if we only deliver bridge addresses -to Yahoo or GMail addresses, we can leverage the rate-limiting schemes -that other parties already impose for account creation. - -The sixth strategy ties in the social network design with public -bridges and a reputation system. We pick some seeds---trusted people in -blocked areas---and give them each a few dozen bridge addresses and a few -\emph{delegation tokens}. We run a website next to the bridge authority, -where users can log in (they connect via Tor, and they don't need to -provide actual identities, just persistent pseudonyms). Users can delegate -trust to other people they know by giving them a token, which can be -exchanged for a new account on the website. Accounts in ``good standing'' -then accrue new bridge addresses and new tokens. As usual, reputation -schemes bring in a host of new complexities~\cite{rep-anon}: how do we -decide that an account is in good standing? We could tie reputation -to whether the bridges they're told about have been blocked---see -Section~\ref{subsec:geoip} below for initial thoughts on how to discover -whether bridges have been blocked. We could track reputation between -accounts (if you delegate to somebody who screws up, it impacts you too), -or we could use blinded delegation tokens~\cite{chaum-blind} to prevent -the website from mapping the seeds' social network. We put off deeper -discussion of the social network reputation strategy for future work. - -Pools seven and eight are held in reserve, in case our currently deployed -tricks all fail at once and the adversary blocks all those bridges---so -we can adapt and move to new approaches quickly, and have some bridges -immediately available for the new schemes. New strategies might be based -on some other scarce resource, such as relaying traffic for others or -other proof of energy spent. (We might also worry about the incentives -for bridges that sign up and get allocated to the reserve pools: will they -be unhappy that they're not being used? But this is a transient problem: -if Tor users are bridges by default, nobody will mind not being used yet. -See also Section~\ref{subsec:incentives}.) - -%Is it useful to load balance which bridges are handed out? The above -%pool concept makes some bridges wildly popular and others less so. -%But I guess that's the point. - -\subsection{Public bridges with coordinated discovery} - -We presented the above discovery strategies in the context of a single -bridge directory authority, but in practice we will want to distribute the -operations over several bridge authorities---a single point of failure -or attack is a bad move. The first answer is to run several independent -bridge directory authorities, and bridges gravitate to one based on -their identity key. The better answer would be some federation of bridge -authorities that work together to provide redundancy but don't introduce -new security issues. We could even imagine designs where the bridge -authorities have encrypted versions of the bridge's relay descriptors, -and the users learn a decryption key that they keep private when they -first hear about the bridge---this way the bridge authorities would not -be able to learn the IP address of the bridges. - -We leave this design question for future work. - -\subsection{Assessing whether bridges are useful} - -Learning whether a bridge is useful is important in the bridge authority's -decision to include it in responses to blocked users. For example, if -we end up with a list of thousands of bridges and only a few dozen of -them are reachable right now, most blocked users will not end up knowing -about working bridges. - -There are three components for assessing how useful a bridge is. First, -is it reachable from the public Internet? Second, what proportion of -the time is it available? Third, is it blocked in certain jurisdictions? - -The first component can be tested just as we test reachability of -ordinary Tor relays. Specifically, the bridges do a self-test---connect -to themselves via the Tor network---before they are willing to -publish their descriptor, to make sure they're not obviously broken or -misconfigured. Once the bridges publish, the bridge authority also tests -reachability to make sure they're not confused or outright lying. - -The second component can be measured and tracked by the bridge authority. -By doing periodic reachability tests, we can get a sense of how often the -bridge is available. More complex tests will involve bandwidth-intensive -checks to force the bridge to commit resources in order to be counted as -available. We need to evaluate how the relationship of uptime percentage -should weigh into our choice of which bridges to advertise. We leave -this to future work. - -The third component is perhaps the trickiest: with many different -adversaries out there, how do we keep track of which adversaries have -blocked which bridges, and how do we learn about new blocks as they -occur? We examine this problem next. - -\subsection{How do we know if a bridge relay has been blocked?} -\label{subsec:geoip} - -There are two main mechanisms for testing whether bridges are reachable -from inside each blocked area: active testing via users, and passive -testing via bridges. - -In the case of active testing, certain users inside each area -sign up as testing relays. The bridge authorities can then use a -Blossom-like~\cite{blossom-thesis} system to build circuits through them -to each bridge and see if it can establish the connection. But how do -we pick the users? If we ask random users to do the testing (or if we -solicit volunteers from the users), the adversary should sign up so he -can enumerate the bridges we test. Indeed, even if we hand-select our -testers, the adversary might still discover their location and monitor -their network activity to learn bridge addresses. - -Another answer is not to measure directly, but rather let the bridges -report whether they're being used. -%If they periodically report to their -%bridge directory authority how much use they're seeing, perhaps the -%authority can make smart decisions from there. -Specifically, bridges should install a GeoIP database such as the public -IP-To-Country list~\cite{ip-to-country}, and then periodically report to the -bridge authorities which countries they're seeing use from. This data -would help us track which countries are making use of the bridge design, -and can also let us learn about new steps the adversary has taken in -the arms race. (The compressed GeoIP database is only several hundred -kilobytes, and we could even automate the update process by serving it -from the bridge authorities.) -More analysis of this passive reachability -testing design is needed to resolve its many edge cases: for example, -if a bridge stops seeing use from a certain area, does that mean the -bridge is blocked or does that mean those users are asleep? - -There are many more problems with the general concept of detecting whether -bridges are blocked. First, different zones of the Internet are blocked -in different ways, and the actual firewall jurisdictions do not match -country borders. Our bridge scheme could help us map out the topology -of the censored Internet, but this is a huge task. More generally, -if a bridge relay isn't reachable, is that because of a network block -somewhere, because of a problem at the bridge relay, or just a temporary -outage somewhere in between? And last, an attacker could poison our -bridge database by signing up already-blocked bridges. In this case, -if we're stingy giving out bridge addresses, users in that country won't -learn working bridges. - -All of these issues are made more complex when we try to integrate this -testing into our social network reputation system above. -Since in that case we punish or reward users based on whether bridges -get blocked, the adversary has new attacks to trick or bog down the -reputation tracking. Indeed, the bridge authority doesn't even know -what zone the blocked user is in, so do we blame him for any possible -censored zone, or what? - -Clearly more analysis is required. The eventual solution will probably -involve a combination of passive measurement via GeoIP and active -measurement from trusted testers. More generally, we can use the passive -feedback mechanism to track usage of the bridge network as a whole---which -would let us respond to attacks and adapt the design, and it would also -let the general public track the progress of the project. - -%Worry: the adversary could choose not to block bridges but just record -%connections to them. So be it, I guess. - -\subsection{Advantages of deploying all solutions at once} - -For once, we're not in the position of the defender: we don't have to -defend against every possible filtering scheme; we just have to defend -against at least one. On the flip side, the attacker is forced to guess -how to allocate his resources to defend against each of these discovery -strategies. So by deploying all of our strategies at once, we not only -increase our chances of finding one that the adversary has difficulty -blocking, but we actually make \emph{all} of the strategies more robust -in the face of an adversary with limited resources. - -%\subsection{Remaining unsorted notes} - -%In the first subsection we describe how to find a first bridge. - -%Going to be an arms race. Need a bag of tricks. Hard to say -%which ones will work. Don't spend them all at once. - -%Some techniques are sufficient to get us an IP address and a port, -%and others can get us IP:port:key. Lay out some plausible options -%for how users can bootstrap into learning their first bridge. - -%\section{The account / reputation system} -%\section{Social networks with directory-side support} -%\label{sec:accounts} - -%One answer is to measure based on whether the bridge addresses -%we give it end up blocked. But how do we decide if they get blocked? - -%Perhaps each bridge should be known by a single bridge directory -%authority. This makes it easier to trace which users have learned about -%it, so easier to blame or reward. It also makes things more brittle, -%since loss of that authority means its bridges aren't advertised until -%they switch, and means its bridge users are sad too. -%(Need a slick hash algorithm that will map our identity key to a -%bridge authority, in a way that's sticky even when we add bridge -%directory authorities, but isn't sticky when our authority goes -%away. Does this exist?) -% [[Ian says: What about just using something like hash table chaining? -% This should work, so long as the client knows which authorities currently -% exist.]] - -%\subsection{Discovery based on social networks} - -%A token that can be exchanged at the bridge authority (assuming you -%can reach it) for a new bridge address. - -%The account server runs as a Tor controller for the bridge authority. - -%Users can establish reputations, perhaps based on social network -%connectivity, perhaps based on not getting their bridge relays blocked, - -%Probably the most critical lesson learned in past work on reputation -%systems in privacy-oriented environments~\cite{rep-anon} is the need for -%verifiable transactions. That is, the entity computing and advertising -%reputations for participants needs to actually learn in a convincing -%way that a given transaction was successful or unsuccessful. - -%(Lesson from designing reputation systems~\cite{rep-anon}: easy to -%reward good behavior, hard to punish bad behavior. - -\section{Security considerations} -\label{sec:security} - -\subsection{Possession of Tor in oppressed areas} - -Many people speculate that installing and using a Tor client in areas with -particularly extreme firewalls is a high risk---and the risk increases -as the firewall gets more restrictive. This notion certainly has merit, but -there's -a counter pressure as well: as the firewall gets more restrictive, more -ordinary people behind it end up using Tor for more mainstream activities, -such as learning -about Wall Street prices or looking at pictures of women's ankles. So -as the restrictive firewall pushes up the number of Tor users, the -``typical'' Tor user becomes more mainstream, and therefore mere -use or possession of the Tor software is not so surprising. - -It's hard to say which of these pressures will ultimately win out, -but we should keep both sides of the issue in mind. - -%Nick, want to rewrite/elaborate on this section? - -%Ian suggests: -% Possession of Tor: this is totally of-the-cuff, and there are lots of -% security issues to think about, but what about an ActiveX version of -% Tor? The magic you learn (as opposed to a bridge address) is a plain -% old HTTPS server, which feeds you an ActiveX applet pre-configured with -% some bridge address (possibly on the same machine). For bonus points, -% somehow arrange that (a) the applet is signed in some way the user can -% reliably check, but (b) don't end up with anything like an incriminating -% long-term cert stored on the user's computer. This may be marginally -% useful in some Internet-cafe situations as well, though (a) is even -% harder to get right there. - - -\subsection{Observers can tell who is publishing and who is reading} -\label{subsec:upload-padding} - -Tor encrypts traffic on the local network, and it obscures the eventual -destination of the communication, but it doesn't do much to obscure the -traffic volume. In particular, a user publishing a home video will have a -different network fingerprint than a user reading an online news article. -Based on our assumption in Section~\ref{sec:adversary} that users who -publish material are in more danger, should we work to improve Tor's -security in this situation? - -In the general case this is an extremely challenging task: -effective \emph{end-to-end traffic confirmation attacks} -are known where the adversary observes the origin and the -destination of traffic and confirms that they are part of the -same communication~\cite{danezis:pet2004,e2e-traffic}. Related are -\emph{website fingerprinting attacks}, where the adversary downloads -a few hundred popular websites, makes a set of "fingerprints" for each -site, and then observes the target Tor client's traffic to look for -a match~\cite{pet05-bissias,defensive-dropping}. But can we do better -against a limited adversary who just does coarse-grained sweeps looking -for unusually prolific publishers? - -One answer is for bridge users to automatically send bursts of padding -traffic periodically. (This traffic can be implemented in terms of -long-range drop cells, which are already part of the Tor specification.) -Of course, convincingly simulating an actual human publishing interesting -content is a difficult arms race, but it may be worthwhile to at least -start the race. More research remains. - -\subsection{Anonymity effects from acting as a bridge relay} - -Against some attacks, relaying traffic for others can improve -anonymity. The simplest example is an attacker who owns a small number -of Tor relays. He will see a connection from the bridge, but he won't -be able to know whether the connection originated there or was relayed -from somebody else. More generally, the mere uncertainty of whether the -traffic originated from that user may be helpful. - -There are some cases where it doesn't seem to help: if an attacker can -watch all of the bridge's incoming and outgoing traffic, then it's easy -to learn which connections were relayed and which started there. (In this -case he still doesn't know the final destinations unless he is watching -them too, but in this case bridges are no better off than if they were -an ordinary client.) - -There are also some potential downsides to running a bridge. First, while -we try to make it hard to enumerate all bridges, it's still possible to -learn about some of them, and for some people just the fact that they're -running one might signal to an attacker that they place a higher value -on their anonymity. Second, there are some more esoteric attacks on Tor -relays that are not as well-understood or well-tested---for example, an -attacker may be able to ``observe'' whether the bridge is sending traffic -even if he can't actually watch its network, by relaying traffic through -it and noticing changes in traffic timing~\cite{attack-tor-oak05}. On -the other hand, it may be that limiting the bandwidth the bridge is -willing to relay will allow this sort of attacker to determine if it's -being used as a bridge but not easily learn whether it is adding traffic -of its own. - -We also need to examine how entry guards fit in. Entry guards -(a small set of nodes that are always used for the first -step in a circuit) help protect against certain attacks -where the attacker runs a few Tor relays and waits for -the user to choose these relays as the beginning and end of her -circuit\footnote{\url{http://wiki.noreply.org/noreply/TheOnionRouter/TorFAQ#EntryGuards}}. -If the blocked user doesn't use the bridge's entry guards, then the bridge -doesn't gain as much cover benefit. On the other hand, what design changes -are needed for the blocked user to use the bridge's entry guards without -learning what they are (this seems hard), and even if we solve that, -do they then need to use the guards' guards and so on down the line? - -It is an open research question whether the benefits of running a bridge -outweigh the risks. A lot of the decision rests on which attacks the -users are most worried about. For most users, we don't think running a -bridge relay will be that damaging, and it could help quite a bit. - -\subsection{Trusting local hardware: Internet cafes and LiveCDs} -\label{subsec:cafes-and-livecds} - -Assuming that users have their own trusted hardware is not -always reasonable. - -For Internet cafe Windows computers that let you attach your own USB key, -a USB-based Tor image would be smart. There's Torpark, and hopefully -there will be more thoroughly analyzed and trustworthy options down the -road. Worries remain about hardware or software keyloggers and other -spyware, as well as physical surveillance. - -If the system lets you boot from a CD or from a USB key, you can gain -a bit more security by bringing a privacy LiveCD with you. (This -approach isn't foolproof either of course, since hardware -keyloggers and physical surveillance are still a worry). - -In fact, LiveCDs are also useful if it's your own hardware, since it's -easier to avoid leaving private data and logs scattered around the -system. - -%\subsection{Forward compatibility and retiring bridge authorities} -% -%Eventually we'll want to change the identity key and/or location -%of a bridge authority. How do we do this mostly cleanly? - -\subsection{The trust chain} -\label{subsec:trust-chain} - -Tor's ``public key infrastructure'' provides a chain of trust to -let users verify that they're actually talking to the right relays. -There are four pieces to this trust chain. - -First, when Tor clients are establishing circuits, at each step -they demand that the next Tor relay in the path prove knowledge of -its private key~\cite{tor-design}. This step prevents the first node -in the path from just spoofing the rest of the path. Second, the -Tor directory authorities provide a signed list of relays along with -their public keys---so unless the adversary can control a threshold -of directory authorities, he can't trick the Tor client into using other -Tor relays. Third, the location and keys of the directory authorities, -in turn, is hard-coded in the Tor source code---so as long as the user -got a genuine version of Tor, he can know that he is using the genuine -Tor network. And last, the source code and other packages are signed -with the GPG keys of the Tor developers, so users can confirm that they -did in fact download a genuine version of Tor. - -In the case of blocked users contacting bridges and bridge directory -authorities, the same logic applies in parallel: the blocked users fetch -information from both the bridge authorities and the directory authorities -for the `main' Tor network, and they combine this information locally. - -How can a user in an oppressed country know that he has the correct -key fingerprints for the developers? As with other security systems, it -ultimately comes down to human interaction. The keys are signed by dozens -of people around the world, and we have to hope that our users have met -enough people in the PGP web of trust -%~\cite{pgp-wot} -that they can learn -the correct keys. For users that aren't connected to the global security -community, though, this question remains a critical weakness. - -%\subsection{Security through obscurity: publishing our design} - -%Many other schemes like dynaweb use the typical arms race strategy of -%not publishing their plans. Our goal here is to produce a design---a -%framework---that can be public and still secure. Where's the tradeoff? - -%\section{Performance improvements} -%\label{sec:performance} -% -%\subsection{Fetch relay descriptors just-in-time} -% -%I guess we should encourage most places to do this, so blocked -%users don't stand out. -% -% -%network-status and directory optimizations. caching better. partitioning -%issues? - -\section{Maintaining reachability} -\label{sec:reachability} - -\subsection{How many bridge relays should you know about?} - -The strategies described in Section~\ref{sec:discovery} talked about -learning one bridge address at a time. But if most bridges are ordinary -Tor users on cable modem or DSL connection, many of them will disappear -and/or move periodically. How many bridge relays should a blocked user -know about so that she is likely to have at least one reachable at any -given point? This is already a challenging problem if we only consider -natural churn: the best approach is to see what bridges we attract in -reality and measure their churn. We may also need to factor in a parameter -for how quickly bridges get discovered and blocked by the attacker; -we leave this for future work after we have more deployment experience. - -A related question is: if the bridge relays change IP addresses -periodically, how often does the blocked user need to fetch updates in -order to keep from being cut out of the loop? - -Once we have more experience and intuition, we should explore technical -solutions to this problem too. For example, if the discovery strategies -give out $k$ bridge addresses rather than a single bridge address, perhaps -we can improve robustness from the user perspective without significantly -aiding the adversary. Rather than giving out a new random subset of $k$ -addresses at each point, we could bind them together into \emph{bridge -families}, so all users that learn about one member of the bridge family -are told about the rest as well. - -This scheme may also help defend against attacks to map the set of -bridges. That is, if all blocked users learn a random subset of bridges, -the attacker should learn about a few bridges, monitor the country-level -firewall for connections to them, then watch those users to see what -other bridges they use, and repeat. By segmenting the bridge address -space, we can limit the exposure of other users. - -\subsection{Cablemodem users don't usually provide important websites} -\label{subsec:block-cable} - -Another attacker we might be concerned about is that the attacker could -just block all DSL and cablemodem network addresses, on the theory that -they don't run any important services anyway. If most of our bridges -are on these networks, this attack could really hurt. - -The first answer is to aim to get volunteers both from traditionally -``consumer'' networks and also from traditionally ``producer'' networks. -Since bridges don't need to be Tor exit nodes, as we improve our usability -it seems quite feasible to get a lot of websites helping out. - -The second answer (not as practical) would be to encourage more use of -consumer networks for popular and useful Internet services. -%(But P2P exists; -%minor websites exist; gaming exists; IM exists; ...) - -A related attack we might worry about is based on large countries putting -economic pressure on companies that want to expand their business. For -example, what happens if Verizon wants to sell services in China, and -China pressures Verizon to discourage its users in the free world from -running bridges? - -\subsection{Scanning resistance: making bridges more subtle} - -If it's trivial to verify that a given address is operating as a bridge, -and most bridges run on a predictable port, then it's conceivable our -attacker could scan the whole Internet looking for bridges. (In fact, -he can just concentrate on scanning likely networks like cablemodem -and DSL services---see Section~\ref{subsec:block-cable} above for -related attacks.) It would be nice to slow down this attack. It would -be even nicer to make it hard to learn whether we're a bridge without -first knowing some secret. We call this general property \emph{scanning -resistance}, and it goes along with normalizing Tor's TLS handshake and -network fingerprint. - -We could provide a password to the blocked user, and she (or her Tor -client) provides a nonced hash of this password when she connects. We'd -need to give her an ID key for the bridge too (in addition to the IP -address and port---see Section~\ref{subsec:id-address}), and wait to -present the password until we've finished the TLS handshake, else it -would look unusual. If Alice can authenticate the bridge before she -tries to send her password, we can resist an adversary who pretends -to be the bridge and launches a man-in-the-middle attack to learn the -password. But even if she can't, we still resist against widespread -scanning. - -How should the bridge behave if accessed without the correct -authorization? Perhaps it should act like an unconfigured HTTPS server -(``welcome to the default Apache page''), or maybe it should mirror -and act like common websites, or websites randomly chosen from Google. - -We might assume that the attacker can recognize HTTPS connections that -use self-signed certificates. (This process would be resource-intensive -but not out of the realm of possibility.) But even in this case, many -popular websites around the Internet use self-signed or just plain broken -SSL certificates. - -%to unknown servers. It can then attempt to connect to them and block -%connections to servers that seem suspicious. It may be that password -%protected web sites will not be suspicious in general, in which case -%that may be the easiest way to give controlled access to the bridge. -%If such sites that have no other overt features are automatically -%blocked when detected, then we may need to be more subtle. -%Possibilities include serving an innocuous web page if a TLS encrypted -%request is received without the authorization needed to access the Tor -%network and only responding to a requested access to the Tor network -%of proper authentication is given. If an unauthenticated request to -%access the Tor network is sent, the bridge should respond as if -%it has received a message it does not understand (as would be the -%case were it not a bridge). - -% Ian suggests a ``socialist millionaires'' protocol here, for something. - -% Did we once mention knocking here? it's a good idea, but we should clarify -% what we mean. Ian also notes that knocking itself is very fingerprintable, -% and we should beware. - -\subsection{How to motivate people to run bridge relays} -\label{subsec:incentives} - -One of the traditional ways to get people to run software that benefits -others is to give them motivation to install it themselves. An often -suggested approach is to install it as a stunning screensaver so everybody -will be pleased to run it. We take a similar approach here, by leveraging -the fact that these users are already interested in protecting their -own Internet traffic, so they will install and run the software. - -Eventually, we may be able to make all Tor users become bridges if they -pass their self-reachability tests---the software and installers need -more work on usability first, but we're making progress. - -In the mean time, we can make a snazzy network graph with -Vidalia\footnote{\url{http://vidalia-project.net/}} that -emphasizes the connections the bridge user is currently relaying. -%(Minor -%anonymity implications, but hey.) (In many cases there won't be much -%activity, so this may backfire. Or it may be better suited to full-fledged -%Tor relay.) - -% Also consider everybody-a-relay. Many of the scalability questions -% are easier when you're talking about making everybody a bridge. - -%\subsection{What if the clients can't install software?} - -%[this section should probably move to the related work section, -%or just disappear entirely.] - -%Bridge users without Tor software - -%Bridge relays could always open their socks proxy. This is bad though, -%first -%because bridges learn the bridge users' destinations, and second because -%we've learned that open socks proxies tend to attract abusive users who -%have no idea they're using Tor. - -%Bridges could require passwords in the socks handshake (not supported -%by most software including Firefox). Or they could run web proxies -%that require authentication and then pass the requests into Tor. This -%approach is probably a good way to help bootstrap the Psiphon network, -%if one of its barriers to deployment is a lack of volunteers willing -%to exit directly to websites. But it clearly drops some of the nice -%anonymity and security features Tor provides. - -%A hybrid approach where the user gets his anonymity from Tor but his -%software-less use from a web proxy running on a trusted machine on the -%free side. - -\subsection{Publicity attracts attention} -\label{subsec:publicity} - -Many people working on this field want to publicize the existence -and extent of censorship concurrently with the deployment of their -circumvention software. The easy reason for this two-pronged push is -to attract volunteers for running proxies in their systems; but in many -cases their main goal is not to focus on getting more users signed up, -but rather to educate the rest of the world about the -censorship. The media also tries to do its part by broadcasting the -existence of each new circumvention system. - -But at the same time, this publicity attracts the attention of the -censors. We can slow down the arms race by not attracting as much -attention, and just spreading by word of mouth. If our goal is to -establish a solid social network of bridges and bridge users before -the adversary gets involved, does this extra attention work to our -disadvantage? - -\subsection{The Tor website: how to get the software} - -One of the first censoring attacks against a system like ours is to -block the website and make the software itself hard to find. Our system -should work well once the user is running an authentic -copy of Tor and has found a working bridge, but to get to that point -we rely on their individual skills and ingenuity. - -Right now, most countries that block access to Tor block only the main -website and leave mirrors and the network itself untouched. -Falling back on word-of-mouth is always a good last resort, but we should -also take steps to make sure it's relatively easy for users to get a copy, -such as publicizing the mirrors more and making copies available through -other media. We might also mirror the latest version of the software on -each bridge, so users who hear about an honest bridge can get a good -copy. -See Section~\ref{subsec:first-bridge} for more discussion. - -% Ian suggests that we have every tor relay distribute a signed copy of the -% software. - -\section{Next Steps} -\label{sec:conclusion} - -Technical solutions won't solve the whole censorship problem. After all, -the firewalls in places like China are \emph{socially} very -successful, even if technologies and tricks exist to get around them. -However, having a strong technical solution is still necessary as one -important piece of the puzzle. - -In this paper, we have shown that Tor provides a great set of building -blocks to start from. The next steps are to deploy prototype bridges and -bridge authorities, implement some of the proposed discovery strategies, -and then observe the system in operation and get more intuition about -the actual requirements and adversaries we're up against. - -\bibliographystyle{plain} \bibliography{tor-design} - -%\appendix - -%\section{Counting Tor users by country} -%\label{app:geoip} - -\end{document} - - - -\section{Future designs} -\label{sec:future} - -\subsection{Bridges inside the blocked network too} - -Assuming actually crossing the firewall is the risky part of the -operation, can we have some bridge relays inside the blocked area too, -and more established users can use them as relays so they don't need to -communicate over the firewall directly at all? A simple example here is -to make new blocked users into internal bridges also---so they sign up -on the bridge authority as part of doing their query, and we give out -their addresses -rather than (or along with) the external bridge addresses. This design -is a lot trickier because it brings in the complexity of whether the -internal bridges will remain available, can maintain reachability with -the outside world, etc. - -More complex future designs involve operating a separate Tor network -inside the blocked area, and using \emph{hidden service bridges}---bridges -that can be accessed by users of the internal Tor network but whose -addresses are not published or findable, even by these users---to get -from inside the firewall to the rest of the Internet. But this design -requires directory authorities to run inside the blocked area too, -and they would be a fine target to take down the network. - -% Hidden services as bridge directory authorities. - - ------------------------------------------- - -ship geoip db to bridges. they look up users who tls to them in the db, -and upload a signed list of countries and number-of-users each day. the -bridge authority aggregates them and publishes stats. - -bridge relays have buddies -they ask a user to test the reachability of their buddy. -leaks O(1) bridges, but not O(n). - -we should not be blockable by ordinary cisco censorship features. -that is, if they want to block our new design, they will need to -add a feature to block exactly this. -strategically speaking, this may come in handy. - -Bridges come in clumps of 4 or 8 or whatever. If you know one bridge -in a clump, the authority will tell you the rest. Now bridges can -ask users to test reachability of their buddies. - -Giving out clumps helps with dynamic IP addresses too. Whether it -should be 4 or 8 depends on our churn. - -the account server. let's call it a database, it doesn't have to -be a thing that human interacts with. - -so how do we reward people for being good? - -\subsubsection{Public Bridges with Coordinated Discovery} - -****Pretty much this whole subsubsection will probably need to be -deferred until ``later'' and moved to after end document, but I'm leaving -it here for now in case useful.****** - -Rather than be entirely centralized, we can have a coordinated -collection of bridge authorities, analogous to how Tor network -directory authorities now work. - -Key components -``Authorities'' will distribute caches of what they know to overlapping -collections of nodes so that no one node is owned by one authority. -Also so that it is impossible to DoS info maintained by one authority -simply by making requests to it. - -Where a bridge gets assigned is not predictable by the bridge? - -If authorities don't know the IP addresses of the bridges they -are responsible for, they can't abuse that info (or be attacked for -having it). But, they also can't, e.g., control being sent massive -lists of nodes that were never good. This raises another question. -We generally decry use of IP address for location, etc. but we -need to do that to limit the introduction of functional but useless -IP addresses because, e.g., they are in China and the adversary -owns massive chunks of the IP space there. - -We don't want an arbitrary someone to be able to contact the -authorities and say an IP address is bad because it would be easy -for an adversary to take down all the suspicious bridges -even if they provide good cover websites, etc. Only the bridge -itself and/or the directory authority can declare a bridge blocked -from somewhere. - - -9. Bridge directories must not simply be a handful of nodes that -provide the list of bridges. They must flood or otherwise distribute -information out to other Tor nodes as mirrors. That way it becomes -difficult for censors to flood the bridge directory authorities with -requests, effectively denying access for others. But, there's lots of -churn and a much larger size than Tor directories. We are forced to -handle the directory scaling problem here much sooner than for the -network in general. Authorities can pass their bridge directories -(and policy info) to some moderate number of unidentified Tor nodes. -Anyone contacting one of those nodes can get bridge info. the nodes -must remain somewhat synched to prevent the adversary from abusing, -e.g., a timed release policy or the distribution to those nodes must -be resilient even if they are not coordinating. - -I think some kind of DHT like scheme would work here. A Tor node is -assigned a chunk of the directory. Lookups in the directory should be -via hashes of keys (fingerprints) and that should determine the Tor -nodes responsible. Ordinary directories can publish lists of Tor nodes -responsible for fingerprint ranges. Clients looking to update info on -some bridge will make a Tor connection to one of the nodes responsible -for that address. Instead of shutting down a circuit after getting -info on one address, extend it to another that is responsible for that -address (the node from which you are extending knows you are doing so -anyway). Keep going. This way you can amortize the Tor connection. - -10. We need some way to give new identity keys out to those who need -them without letting those get immediately blocked by authorities. One -way is to give a fingerprint that gets you more fingerprints, as -already described. These are meted out/updated periodically but allow -us to keep track of which sources are compromised: if a distribution -fingerprint repeatedly leads to quickly blocked bridges, it should be -suspect, dropped, etc. Since we're using hashes, there shouldn't be a -correlation with bridge directory mirrors, bridges, portions of the -network observed, etc. It should just be that the authorities know -about that key that leads to new addresses. - -This last point is very much like the issues in the valet nodes paper, -which is essentially about blocking resistance wrt exiting the Tor network, -while this paper is concerned with blocking the entering to the Tor network. -In fact the tickets used to connect to the IPo (Introduction Point), -could serve as an example, except that instead of authorizing -a connection to the Hidden Service, it's authorizing the downloading -of more fingerprints. - -Also, the fingerprints can follow the hash(q + '1' + cookie) scheme of -that paper (where q = hash(PK + salt) gave the q.onion address). This -allows us to control and track which fingerprint was causing problems. - -Note that, unlike many settings, the reputation problem should not be -hard here. If a bridge says it is blocked, then it might as well be. -If an adversary can say that the bridge is blocked wrt -$\mathit{censor}_i$, then it might as well be, since -$\mathit{censor}_i$ can presumably then block that bridge if it so -chooses. - -11. How much damage can the adversary do by running nodes in the Tor -network and watching for bridge nodes connecting to it? (This is -analogous to an Introduction Point watching for Valet Nodes connecting -to it.) What percentage of the network do you need to own to do how -much damage. Here the entry-guard design comes in helpfully. So we -need to have bridges use entry-guards, but (cf. 3 above) not use -bridges as entry-guards. Here's a serious tradeoff (again akin to the -ratio of valets to IPos) the more bridges/client the worse the -anonymity of that client. The fewer bridges/client the worse the -blocking resistance of that client. - - - |