```
Filename: 203-https-frontend.txt
Title: Avoiding censorship by impersonating an HTTPS server
Author: Nick Mathewson
Created: 24 Jun 2012
Status: Obsolete

Note: Obsoleted-by pluggable transports.

Overview:

   One frequently proposed approach for censorship resistance is that
   Tor bridges ought to act like another TLS-based service, and deliver
   traffic to Tor only if the client can demonstrate some shared
   knowledge with the bridge.

   In this document, I discuss some design considerations for building
   such systems, and propose a few possible architectures and designs.

Background:

   Most of our previous work on censorship resistance has focused on
   preventing passive attackers from identifying Tor bridges, or from
   doing so cheaply.  But active attackers exist, and exist in the wild:
   right now, the most sophisticated censors use their anti-Tor passive
   attacks only as a first round of filtering before launching a
   secondary active attack to confirm suspected Tor nodes.

   One idea we've been talking about for a while is that of having a
   service that looks like an HTTPS service unless a client does some
   particular secret thing to prove it is allowed to use it as a Tor
   bridge.  Such a system would still succumb to passive traffic
   analysis attacks (since the packet timings and sizes for HTTPS don't
   look that much like Tor), but it would be enough to beat many
   current censors.

Goals and requirements:

   We should make it impossible for a passive attacker who examines
   only a few packets at a time to distinguish Tor->Bridge traffic from
   an HTTPS client talking to an HTTPS server.

   We should make it impossible for an active attacker talking to the
   server to tell a Tor bridge server from a regular HTTPS server.

   We should make it impossible for an active attacker who can MITM the
   server to learn from the client whether it thought it was connecting
   to an HTTPS server or a Tor bridge.
   (This implies that an MITM attacker shouldn't be able to learn
   anything that would help it convince the server to act like a
   bridge.)

   It would be nice to minimize the required code changes to Tor, and
   the required code changes to any other software.

   It would be good to avoid any requirement of close integration with
   any particular HTTP or HTTPS implementation.

   If we're replacing our own profile with that of an HTTPS service, we
   should do so in a way that lets us use the profile of a popular
   HTTPS implementation.

   Efficiency would be good: layering TLS inside TLS is best avoided if
   we can.

Discussion:

   We need an actual web server; HTTP and HTTPS are so complicated that
   there's no practical way to behave in a bug-compatible way with any
   popular webserver short of running that webserver.

   More obviously, we need a TLS implementation (or we can't implement
   HTTPS), and we need a Tor bridge (since that's the whole point of
   this exercise).

   So from a top-level point of view, the question becomes: how shall
   we wire these together?

   There are three obvious ways; I'll discuss them in turn below.

Design #1: TLS in Tor

   Under this design, Tor accepts HTTPS connections, decides which ones
   don't look like the Tor protocol, and relays them to a webserver.

                +---------------------------------------+
   +------+ TLS |  +------------+  http  +-----------+  |
   | User |<------>| Tor Bridge |<------>| Webserver |  |
   +------+     |  +------------+        +-----------+  |
                |  trusted host/network                 |
                +---------------------------------------+

   This approach would let us use a completely unmodified webserver
   implementation, but would require the most extensive changes in Tor:
   we'd need to add yet another flavor to Tor's TLS ice cream parlor,
   and try to emulate a popular webserver's TLS behavior even more
   thoroughly.

   To authenticate, we would need to take a hybrid approach, and begin
   forwarding traffic to the webserver as soon as a webserver might
   respond to the traffic.
   This could be pretty complicated, since it requires us to have a
   model of how the webserver would respond to any given set of bytes.
   As a workaround, we might try relaying _all_ input to the webserver,
   and only replying as Tor in the cases where the website hasn't
   replied.  (This would likely create recognizable timing patterns,
   though.)

   The authentication itself could use a system akin to Tor proposals
   189/190, where an early AUTHORIZE cell shows knowledge of a shared
   secret if the client is a Tor client.

Design #2: TLS in the web server

                +-----------------------------------+
   +------+ TLS |  +-----------+  tor0  +-----+    |
   | User |<------>| Webserver |<------>| Tor |    |
   +------+     |  +-----------+        +-----+    |
                |  trusted host/network             |
                +-----------------------------------+

   In this design, we write an Apache module or something that can
   recognize an authenticator of some kind in an HTTPS header, or
   recognize a valid AUTHORIZE cell, and respond by forwarding the
   traffic to a Tor instance.

   To avoid the efficiency issue of doing an extra local
   encrypt/decrypt, we need to have the webserver talk to Tor over a
   local unencrypted connection.  (I've denoted this as "tor0" in the
   diagram above.)  For implementation convenience, we might want to
   implement that as a NULL TLS connection, so that the Tor server code
   wouldn't have to change except to allow local NULL TLS connections
   in this configuration.

   For the Tor handshake to work properly here, we'll need a way for
   the Tor instance to know which public key the webserver is
   configured to use.

   We wouldn't need to support the parts of the Tor link protocol used
   to authenticate clients to servers: relays shouldn't be using this
   subsystem at all.

   The Tor client would need to connect and prove its status as a Tor
   client.

   If the client uses some means other than AUTHORIZE cells, or if we
   want to do the authentication in a pluggable transport, and we
   therefore decide to offload the responsibility for TLS itself to the
   pluggable transport, that would scare me: supporting pluggable
   transports that have the responsibility for TLS would make it fairly
   easy to mess up the crypto, and I'd rather not have it be so easy to
   write a pluggable transport that accidentally makes Tor less secure.

Design #3: Reverse proxy

                +------------------------------------+
                |  +-------+  http  +-----------+    |
                |  |       |<------>| Webserver |    |
   +------+ TLS |  |       |        +-----------+    |
   | User |<------>| Proxy |                         |
   +------+     |  |       |  tor0  +-----------+    |
                |  |       |<------>|    Tor    |    |
                |  +-------+        +-----------+    |
                |  trusted host/network              |
                +------------------------------------+

   In this design, we write a server-side proxy to sit in front of Tor
   and a webserver, or repurpose some existing HTTPS proxy.  Its role
   will be to do TLS, and then forward connections to Tor or the
   webserver as appropriate.  (In the web world, this kind of thing is
   called a "reverse proxy", so that's the term I'm using here.)

   To avoid fingerprinting, we should choose a proxy that's already in
   common use as a TLS front-end for webservers -- nginx, perhaps.
   Unfortunately, the more popular tools here seem to be pretty
   complex, and the simpler tools less widely deployed.  More
   investigation would be needed.

   The authorization considerations would be as in Design #2 above; for
   the reasons discussed there, it's probably a good idea to build the
   necessary authorization into Tor itself.

   I generally like this design best: it lets us isolate the "Check for
   a valid authenticator and/or a valid or invalid HTTP header, and
   react accordingly" question to a single program.

How to authenticate: the easiest way

   Designing a good MITM-resistant AUTHORIZE cell, or an equivalent
   HTTP header, is an open problem that we should solve in proposals
   190 and 191 and their successors.
   I'm calling it out-of-scope here; please see those proposals, their
   attendant discussion, and their eventual successors.

How to authenticate: a slightly harder way

   Some proposals in this vein have in the past suggested a special
   HTTP header to distinguish Tor connections from non-Tor connections.
   This could work too, though it would require substantially larger
   changes on the Tor client's part, would still require the client to
   take measures to avoid MITM attacks, and would also require the
   client to implement a particular browser's HTTP profile.

Some considerations on distinguishability

   Against a passive eavesdropper, the easiest way to avoid
   distinguishability in server responses will be to use an actual web
   server or reverse web proxy's TLS implementation.
   (Distinguishability based on client TLS use is another topic
   entirely.)

   Against an active non-MITM attacker, the best probing attacks will
   be ones designed to provoke the system into acting in ways different
   from those in which a webserver would act: responding earlier than a
   web server would respond, or later, or differently.  We need to make
   sure that, whatever the front-end program is, it answers anything
   that would qualify as a well-formed or ill-formed HTTP request
   whenever the web server would.

   This must mean, for example, that whatever the correct form of
   client authorization turns out to be, no prefix of that
   authorization is ever something that the webserver would respond
   to.  With some web servers (I believe), that's as easy as making
   sure that any valid authenticator isn't too long, and doesn't
   contain a CR or LF character.  With others, the authenticator would
   need to be a valid HTTP request, with all the attendant difficulty
   that would raise.

   Against an attacker who can MITM the bridge, the best attacks will
   be to wait for clients to connect and see how they behave.
   In this case, the client probably needs to be able to authenticate
   the bridge certificate as presented in the initial TLS handshake --
   or some other aspect of the TLS handshake if we're feeling insane.
   If the certificate or handshake isn't as expected, the client should
   behave as a web browser that's just received a bad TLS certificate.
   (The alternative there would be to try to impersonate an HTTPS
   client that has just accepted a self-signed certificate.  But that
   would probably require the Tor client to impersonate a full web
   browser, which isn't realistic.)

Side note: What to put on the webserver?

   To credibly pretend not to be ourselves, we must pretend to be
   something else in particular -- and something not easily
   identifiable or inherently worthless.  We should not, for example,
   have all deployments of this kind use a fixed website, even if that
   website is the default "Welcome to Apache" configuration: a censor
   would probably feel that they weren't breaking anything important by
   blocking all unconfigured websites with nothing on them.

   Therefore, we should probably conceive of a system like this as
   "Something to add to your HTTPS website" rather than as a standalone
   installation.

Related work:

   meek [1] is a pluggable transport that uses HTTP for carrying bytes
   and TLS for obfuscation.  Traffic is relayed through a third-party
   server (Google App Engine).  It uses a trick to talk to the third
   party so that it looks like it is talking to an unblocked server.

   meek itself is not really about HTTP at all.  It uses HTTP only
   because it's convenient and the big Internet services we use as
   cover also use HTTP.  meek uses HTTP as a transport, and TLS for
   obfuscation, but the key idea is really "domain fronting," where it
   appears to the censor that you are talking to one domain
   (www.google.com), but behind the scenes you are talking to another
   (meek-reflect.appspot.com).  The meek-server program is an ordinary
   HTTP (not necessarily even HTTPS!)
   server, whose communication is easily fingerprintable; but that
   doesn't matter because the censor never sees that part of the
   communication, only the communication between the client and the
   CDN.

   One way to think about the difference: if a censor (somehow) learns
   the IP address of a bridge as described in this proposal, it's easy
   and low-cost for the censor to block that bridge by IP address.
   meek aims to make it much more expensive: even if you know a domain
   is being used (in part) for circumvention, in order to block it you
   have to block something important like the Google frontend or
   CloudFlare (high collateral damage).

1. https://trac.torproject.org/projects/tor/wiki/doc/meek
```
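The Python sketch below shows one way the front end in Design #3 could make its routing decision under the constraints in "Some considerations on distinguishability". Everything in it is hypothetical, a sketch under stated assumptions rather than anything this proposal specifies:

* The proxy has already terminated TLS and sees the decrypted client bytes.
* The authenticator is a 64-byte hex-encoded HMAC-SHA256 of a fixed label under a shared secret. It is short and contains no CR or LF, so no prefix of it should be something a webserver responds to.
* The names `make_authenticator`, `classify`, and `dispatch` are invented for illustration.

A fixed token like this can be replayed by a MITM. It is therefore not the MITM-resistant authenticator that the proposal defers to proposals 190 and 191.

```python
import hashlib
import hmac
from enum import Enum


class Route(Enum):
    WAIT = "wait"  # too few bytes to decide; keep buffering
    TOR = "tor"    # valid authenticator: hand the rest of the stream to Tor
    WEB = "web"    # anything else: pass every buffered byte to the webserver


def make_authenticator(shared_secret: bytes) -> bytes:
    """HYPOTHETICAL authenticator: hex-encoded HMAC-SHA256 of a fixed label.

    Hex encoding means the 64-byte token never contains CR or LF, and it is
    far shorter than typical request-line limits.  A fixed token is
    replayable by a MITM, so this is NOT a MITM-resistant scheme.
    """
    mac = hmac.new(shared_secret, b"hypothetical-bridge-auth", hashlib.sha256)
    return mac.hexdigest().encode("ascii")


def classify(buffered: bytes, expected: bytes) -> Route:
    """Decide where the decrypted client bytes received so far belong."""
    # Only the first len(expected) bytes can be authenticator; Tor link
    # data after a valid authenticator may contain any byte value.
    head = buffered[: len(expected)]
    # A line ending might make the webserver respond, so never hold it back.
    if b"\r" in head or b"\n" in head:
        return Route.WEB
    if len(head) < len(expected):
        return Route.WAIT
    if hmac.compare_digest(head, expected):
        return Route.TOR
    return Route.WEB


def dispatch(chunks, expected: bytes):
    """Feed decrypted chunks in arrival order; return (route, bytes for backend)."""
    buffered = b""
    for chunk in chunks:
        buffered += chunk
        route = classify(buffered, expected)
        if route is Route.TOR:
            # Strip the authenticator: Tor sees only what follows it.
            return route, buffered[len(expected):]
        if route is Route.WEB:
            # The webserver must see exactly what the client sent.
            return route, buffered
    return Route.WAIT, buffered


if __name__ == "__main__":
    token = make_authenticator(b"example shared secret")  # hypothetical secret
    route, _ = dispatch([b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"], token)
    print(route.value)  # web
    route, payload = dispatch([token[:10], token[10:] + b"\x00\x07"], token)
    print(route.value, payload)  # tor b'\x00\x07'
```

The CR/LF test looks only at the authenticator-sized head of the stream, because the Tor traffic that follows a valid authenticator can contain any byte. The sketch does not model time. A real front end would also have to reproduce the webserver's idle-timeout behavior for a prober that sends a short prefix with no line ending and then stalls, since that is exactly the "responding earlier or later" difference the proposal warns about.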