path: root/doc/spec/proposals/ideas
Diffstat (limited to 'doc/spec/proposals/ideas')
-rw-r--r--  doc/spec/proposals/ideas/xxx-auto-update.txt                        39
-rw-r--r--  doc/spec/proposals/ideas/xxx-bridge-disbursement.txt               174
-rw-r--r--  doc/spec/proposals/ideas/xxx-bwrate-algs.txt                       106
-rw-r--r--  doc/spec/proposals/ideas/xxx-choosing-crypto-in-tor-protocol.txt   138
-rw-r--r--  doc/spec/proposals/ideas/xxx-controllers-intercept-extends.txt      44
-rw-r--r--  doc/spec/proposals/ideas/xxx-encrypted-services.txt                 18
-rw-r--r--  doc/spec/proposals/ideas/xxx-exit-scanning-outline.txt              44
-rw-r--r--  doc/spec/proposals/ideas/xxx-geoip-survey-plan.txt                 137
-rw-r--r--  doc/spec/proposals/ideas/xxx-grand-scaling-plan.txt                 97
-rw-r--r--  doc/spec/proposals/ideas/xxx-hide-platform.txt                      37
-rw-r--r--  doc/spec/proposals/ideas/xxx-port-knocking.txt                      91
-rw-r--r--  doc/spec/proposals/ideas/xxx-rate-limit-exits.txt                   63
-rw-r--r--  doc/spec/proposals/ideas/xxx-separate-streams-by-port.txt           59
-rw-r--r--  doc/spec/proposals/ideas/xxx-using-spdy.txt                        143
-rw-r--r--  doc/spec/proposals/ideas/xxx-what-uses-sha1.txt                    247
15 files changed, 0 insertions(+), 1437 deletions(-)
diff --git a/doc/spec/proposals/ideas/xxx-auto-update.txt b/doc/spec/proposals/ideas/xxx-auto-update.txt
deleted file mode 100644
index dc9a857c1e..0000000000
--- a/doc/spec/proposals/ideas/xxx-auto-update.txt
+++ /dev/null
@@ -1,39 +0,0 @@
-
-Notes on an auto updater:
-
-steve wants a "latest" symlink so he can always just fetch that.
-
-roger worries that this will exacerbate the "what version are you
-using?" "latest." problem.
-
-weasel suggests putting the latest recommended version in dns. then
-we don't have to hit the website. it's got caching, it's lightweight,
-it scales. just put it in a TXT record or something.
-
-but, no dnssec.
-
-roger suggests a file on the https website that lists the latest
-recommended version (or filename or url or something like that).
-
-(steve seems to already be doing this with xerobank. he additionally
-suggests a little blurb that can be displayed to the user to describe
-what's new.)
-
-how to verify you're getting the right file?
-a) it's https.
-b) ship with a signing key, and use some openssl functions to verify.
-c) both
-
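A minimal sketch of option (b): the updater ships with a pinned public key and verifies a detached signature over the downloaded package. Ed25519, the Python cryptography library, and the demo values are illustrative assumptions, not something the notes specify.

    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives.asymmetric.ed25519 import (
        Ed25519PrivateKey, Ed25519PublicKey)
    from cryptography.hazmat.primitives.serialization import Encoding, PublicFormat

    def update_is_authentic(pinned_pubkey, package, signature):
        # pinned_pubkey ships with the bundle; package and signature are downloaded
        key = Ed25519PublicKey.from_public_bytes(pinned_pubkey)
        try:
            key.verify(signature, package)   # raises InvalidSignature on mismatch
            return True
        except InvalidSignature:
            return False

    # stand-in demo for the key that would ship with the installer
    signer = Ed25519PrivateKey.generate()
    pkg = b"tor-installer-bytes"
    pub = signer.public_key().public_bytes(Encoding.Raw, PublicFormat.Raw)
    assert update_is_authentic(pub, pkg, signer.sign(pkg))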
-andrew reminds us that we have a "recommended versions" line in the
-consensus directory already.
-
-if only we had some way to point out the "latest stable recommendation"
-from this list. we could list it first, or something.
-
-the recommended versions line also doesn't take into account which
-packages are available -- e.g. on Windows one version might be the best
-available, and on OS X it might be a different one.
-
-aren't there existing solutions to this? surely there is a beautiful,
-efficient, crypto-correct auto updater lib out there. even for windows.
-
diff --git a/doc/spec/proposals/ideas/xxx-bridge-disbursement.txt b/doc/spec/proposals/ideas/xxx-bridge-disbursement.txt
deleted file mode 100644
index 6c9a3c71ed..0000000000
--- a/doc/spec/proposals/ideas/xxx-bridge-disbursement.txt
+++ /dev/null
@@ -1,174 +0,0 @@
-
-How to hand out bridges.
-
-Divide bridges into 'strategies' as they come in. Do this uniformly
-at random for now.
-
-For each strategy, we'll hand out bridges in a different way to
-clients. This document describes two strategies: email-based and
-IP-based.
-
-0. Notation:
-
- HMAC(k,v) : an HMAC of v using the key k.
-
- A|B: The string A concatenated with the string B.
-
-
-1. Email-based.
-
- Goal: bootstrap based on one or more popular email service's sybil
- prevention algorithms.
-
-
- Parameters:
- HMAC -- an HMAC function
- P -- a time period
- K -- the number of bridges to send in a period.
-
- Setup: Generate two nonces, N and M.
-
- As bridges arrive, put them into a ring according to HMAC(N,ID)
- where ID is the bridge's identity digest.
-
- Divide time into divisions of length P.
-
- When we get an email:
-
- If it's not from a supported email service, reject it.
-
- If we already sent a response to that email address (normalized)
- in this period, send _exactly_ the same response.
-
- If it is from a supported service, generate X = HMAC(M,PS|E) where E
- is the lowercased normalized email address for the user, and
- where PS is the start of the current period. Send
- the first K bridges in the ring after point X.
-
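A rough sketch (not deployed distributor code) of the ring lookup described above: bridges are ordered by HMAC(N, ID), the request is mapped to a point X = HMAC(M, PS|E), and the first K bridges after X are returned. HMAC-SHA256 and the "|" separator are illustrative choices.

    import hashlib, hmac
    from bisect import bisect_left

    def H(key, msg):
        return hmac.new(key, msg, hashlib.sha256).digest()

    def bridges_for_email(bridge_ids, N, M, period_start, email, K=3):
        ring = sorted((H(N, bid), bid) for bid in bridge_ids)  # ring keyed by HMAC(N, ID)
        E = email.strip().lower().encode()                     # normalized address
        X = H(M, period_start + b"|" + E)
        i = bisect_left(ring, (X, b""))                        # first point at or after X
        return [ring[(i + j) % len(ring)][1] for j in range(min(K, len(ring)))]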
- [If we want to make sure that repeat queries are given exactly the
- same results, then we can't let the ring change during the
- time period. For a long time period like a month, that's quite a
- hassle. How about instead just keeping a replay cache of addresses
- that have been answered, and sending them a "sorry, you already got
- your addresses for the time period; perhaps you should try these
- other fine distribution strategies while you wait?" response? This
- approach would also resolve the "Make sure you can't construct a
- distinct address to match an existing one" note below. -RD]
-
- [I think, if we get a replay, we need to send back the same
- answer as we did the first time, not say "try again."
- Otherwise we need to worry that an attacker can keep people
- from getting bridges by preemptively asking for them,
- or that an attacker may force them to prove they haven't
- gotten any bridges by asking. -NM]
-
- [While we're at it, if we do the replay cache thing and don't need
- repeatable answers, we could just pick K random answers from the
- pool. Is it beneficial that a bridge user who knows about a clump of
- nodes will be sharing them with other users who know about a similar
- (overlapping) clump? One good aspect is against an adversary who
- learns about a clump this way and watches those bridges to learn
- other users and discover *their* bridges: he doesn't learn about
- as many new bridges as he might if they were randomly distributed.
- A drawback is against an adversary who happens to pick two email
- addresses in P that include overlapping answers: he can measure
- the difference in clumps and estimate how quickly the bridge pool
- is growing. -RD]
-
- [Random is one more darn thing to implement; rings are already
- there. -NM]
-
- [If we make the period P be mailbox-specific, and make it a random
- value around some mean, then we make it harder for an attacker to
- know when to try using his small army of gmail addresses to gather
- another harvest. But we also make it harder for users to know when
- they can try again. -RD]
-
- [Letting the users know about when they can try again seems
- worthwhile. Otherwise users and attackers will all probe and
- probe and probe until they get an answer. No additional
- security will be achieved, but bandwidth will be lost. -NM]
-
- To normalize an email address:
- Start with the RFC822 address. Consider only the mailbox {???}
- portion of the address (username@domain). Put this into lowercase
- ascii.
-
- Questions:
- What to do with weird character encodings? Look up the RFC.
-
- Notes:
- Make sure that you can't force a single email address to appear
- in lots of different ways. IOW, if nickm@freehaven.net and
- NICKM@freehaven.net aren't treated the same, then I can get lots
- more bridges than I should.
-
- Make sure you can't construct a distinct address to match an
- existing one. IOW, if we treat nickm@X and nickm@Y as the same
- user, then anybody can register nickm@Z and use it to tell which
- bridges nickm@X got (or would get).
-
- Make sure that we actually check headers so we can't be trivially
- used to spam people.
-
-
-2. IP-based.
-
- Goal: avoid handing out all the bridges to users in a similar IP
- space and time.
-
- Parameters:
-
- T_Flush -- how long it should take a user on a single network to
- see a whole cluster of bridges.
-
- N_C
-
- K -- the number of bridges we hand out in response to a single
- request.
-
- Setup: using an AS map or a geoip map or some other flawed input
- source, divide IP space into "areas" such that surveying a large
- collection of "areas" is hard. For v0, use /24 address blocks.
-
- Group areas into N_C clusters.
-
- Generate secrets L, M, N.
-
- Set the period P such that P*(bridges-per-cluster/K) = T_flush.
- Don't set P to greater than a week, or less than three hours.
-
- When we get a bridge:
-
- Based on HMAC(L,ID), assign the bridge to a cluster. Within each
- cluster, keep the bridges in a ring based on HMAC(M,ID).
-
- [Should we re-sort the rings for each new time period, so the ring
- for a given cluster is based on HMAC(M,PS|ID)? -RD]
-
- When we get a connection:
-
- If it's http, redirect it to https.
-
- Let area be the incoming IP network. Let PS be the current
- period. Compute X = HMAC(N, PS|area). Return the next K bridges
- in the ring after X.
-
- [Don't we want to compute C = HMAC(key, area) to learn what cluster
- to answer from, and then X = HMAC(key, PS|area) to pick a point in
- that ring? -RD]
-
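A sketch along the lines of the note above: HMAC(L, area) selects the cluster serving a /24 "area", and HMAC(N, PS|area) selects the starting point on that cluster's ring (keyed by HMAC(M, ID)). HMAC-SHA256 and the separators are again illustrative.

    import hashlib, hmac
    from bisect import bisect_left

    def H(key, msg):
        return hmac.new(key, msg, hashlib.sha256).digest()

    def bridges_for_ip(clusters, L, M, N, period_start, client_ip, K=3):
        area = ".".join(client_ip.split(".")[:3]).encode()     # v0: /24 address block
        c = int.from_bytes(H(L, area), "big") % len(clusters)  # which cluster answers
        ring = sorted((H(M, bid), bid) for bid in clusters[c])
        X = H(N, period_start + b"|" + area)
        i = bisect_left(ring, (X, b""))
        return [ring[(i + j) % len(ring)][1] for j in range(min(K, len(ring)))]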
-
- Need to clarify that some HMACs are for rings, and some are for
- partitions. How rings scale is clear. How do we grow the number of
- partitions? Looking at successive bits from the HMAC output is one way.
-
-3. Open issues
-
- Denial of service attacks
- A good view of network topology
-
-at some point we should learn some reliability stats on our bridges. when
-we say above 'give out k bridges', we might give out 2 reliable ones and
-k-2 others. we count around the ring the same way we do now, to find them.
-
diff --git a/doc/spec/proposals/ideas/xxx-bwrate-algs.txt b/doc/spec/proposals/ideas/xxx-bwrate-algs.txt
deleted file mode 100644
index 757f5bc55e..0000000000
--- a/doc/spec/proposals/ideas/xxx-bwrate-algs.txt
+++ /dev/null
@@ -1,106 +0,0 @@
-# The following are two candidate algorithms for enforcing the bandwidth rate
-
-
-# Algorithm 1
-# TODO: Burst and Relay/Regular differentiation
-
-BwRate = Bandwidth Rate in Bytes Per Second
-GlobalWriteBucket = 0
-GlobalReadBucket = 0
-Epoch = Token Fill Rate in seconds: suggest 50ms=.050
-SecondCounter = 0
-MinWriteBytes = Minimum number of bytes per write
-
-Every Epoch Seconds:
- UseMinWriteBytes = MinWriteBytes
- WriteCnt = 0
- ReadCnt = 0
- BytesRead = 0
-
- For Each Open OR Conn with pending write data:
- WriteCnt++
- For Each Open OR Conn:
- ReadCnt++
-
- BytesToRead = (BwRate*Epoch + GlobalReadBucket)/ReadCnt if ReadCnt else 0
- BytesToWrite = (BwRate*Epoch + GlobalWriteBucket)/WriteCnt if WriteCnt else 0
-
- if WriteCnt and BwRate/WriteCnt < MinWriteBytes:
- # If we aren't likely to accumulate enough bytes in a second to
- # send a whole cell for our connections, send partials
- Log(NOTICE, "Too many ORCons to write full blocks. Sending short packets.")
- UseMinWriteBytes = 1
- # Other option: We could switch to plan 2 here
-
- # Service each writable ORConn. If there are any partial writes,
- # return remaining bytes from this epoch to the global pool
- For Each Open OR Conn with pending write data:
- ORConn->write_bucket += BytesToWrite
- if ORConn->write_bucket > UseMinWriteBytes:
- w = write(ORConn, MIN(len(ORConn->write_data), ORConn->write_bucket))
- # possible that w < ORConn->write_data here due to TCP pushback.
- # We should restore the rest of the write_bucket to the global
- # buffer
- GlobalWriteBucket += (ORConn->write_bucket - w)
- ORConn->write_bucket = 0
-
- For Each Open OR Conn:
- r = read_nonblock(ORConn, BytesToRead)
- BytesRead += r
-
- SecondCounter += Epoch
- if SecondCounter < 1:
- # Save unused bytes from this epoch to be used later in the second
- GlobalReadBucket += (BwRate*Epoch - BytesRead)
- else:
- SecondCounter = 0
- GlobalReadBucket = 0
- GlobalWriteBucket = 0
- For Each ORConn:
- ORConn->write_bucket = 0
-
-
-
-# Alternate plan for Writing fairly. Reads would still be covered
-# by plan 1 as there is no additional network overhead for short reads,
-# so we don't need to try to avoid them.
-#
-# I think this is actually pretty similar to what we do now, but
-# with the addition that the bytes accumulate up to the second mark
-# and we try to keep track of our position in the write list here
-# (unless libevent is doing that for us already and I just don't see it)
-#
-# TODO: Burst and Relay/Regular differentiation
-
-# XXX: The inability to send single cells will cause us to block
-# on EXTEND cells for low-bandwidth node pairs..
-BwRate = Bandwidth Rate in Bytes Per Second
-WriteBytes = Bytes per write
-Epoch = MAX(MIN(WriteBytes/BwRate, .333s), .050s)
-
-SecondCounter = 0
-GlobalWriteBucket = 0
-
-# New connections are inserted at Head-1 (the 'tail' of this circular list)
-# This is not 100% fifo for all node data, but it is the best we can do
-# without insane amounts of additional queueing complexity.
-WriteConnList = List of Open OR Conns with pending write data > WriteBytes
-WriteConnHead = 0
-
-Every Epoch Seconds:
- GlobalWriteBucket += BwRate*Epoch
- WriteListEnd = WriteConnHead
-
- do
- ORConn = WriteConnList[WriteConnHead]
- w = write(ORConn, WriteBytes)
- GlobalWriteBucket -= w
- WriteConnHead = (WriteConnHead + 1) % len(WriteConnList)
- while GlobalWriteBucket > 0 and WriteConnHead != WriteListEnd
-
- SecondCounter += Epoch
- if SecondCounter >= 1:
- SecondCounter = 0
- GlobalWriteBucket = 0
-
-
diff --git a/doc/spec/proposals/ideas/xxx-choosing-crypto-in-tor-protocol.txt b/doc/spec/proposals/ideas/xxx-choosing-crypto-in-tor-protocol.txt
deleted file mode 100644
index e8489570f7..0000000000
--- a/doc/spec/proposals/ideas/xxx-choosing-crypto-in-tor-protocol.txt
+++ /dev/null
@@ -1,138 +0,0 @@
-Filename: xxx-choosing-crypto-in-tor-protocol.txt
-Title: Picking cryptographic standards in the Tor wire protocol
-Author: Marian
-Created: 2009-05-16
-Status: Draft
-
-Motivation:
-
- SHA-1 is horribly outdated and not suited for security critical
- purposes. SHA-2, RIPEMD-160, Whirlpool, and Tiger are good options
- for a short-term replacement, but in the long run, we will
- probably want to upgrade to the winner or a semi-finalist of the
- SHA-3 competition.
-
- For a 2006 comparison of different hash algorithms, read:
- http://www.sane.nl/sane2006/program/final-papers/R10.pdf
-
- Other reading about SHA-1:
- http://www.schneier.com/blog/archives/2005/02/sha1_broken.html
- http://www.schneier.com/blog/archives/2005/08/new_cryptanalyt.html
- http://www.schneier.com/paper-preimages.html
-
- Additionally, AES has been theoretically broken for years. While
- the attack is still not efficient enough that the public sector
- has been able to prove that it works, we should probably consider
- the time between a theoretical attack and a practical attack as an
- opportunity to figure out how to upgrade to a better algorithm,
- such as Twofish.
-
- See:
- http://schneier.com/crypto-gram-0209.html#1
-
-Design:
-
- I suggest that nodes should publish in directories which
- cryptographic standards, such as hash algorithms and ciphers,
- they support. Clients communicating with nodes will then
- pick whichever of those cryptographic standards they prefer
- the most. In the case that the node does not publish which
- cryptographic standards it supports, the client should assume
- that the server supports the older standards, such as SHA-1
- and AES, until such time as we choose to desupport those
- standards.
-
- Node to node communications could work similarly. However, in
- case they both support a set of algorithms but have different
- preferences, the disagreement would have to be resolved
- somehow. Two possibilities include:
- * the node requesting communications presents which
- cryptographic standards it supports in the request. The
- other node picks.
- * both nodes send each other lists of what they support and
- what version of Tor they are using. The newer node picks,
- based on the assumption that the newer node has the most up
- to date information about which hash algorithm is the best.
- Of course, the node could lie about its version, but then
- again, it could also maliciously choose only to support older
- algorithms.
-
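A minimal sketch of the client-side choice described above, assuming nodes advertise their supported digests in the directory; the hardcoded preference list and algorithm names are purely illustrative.

    CLIENT_DIGEST_PREFS = ["sha3-256", "sha256", "whirlpool", "ripemd160", "sha1"]

    def pick_digest(node_advertised):
        if not node_advertised:               # old node: assume legacy SHA-1 only
            return "sha1"
        for alg in CLIENT_DIGEST_PREFS:       # hardcoded order, most preferred first
            if alg in node_advertised:
                return alg
        return None                           # nothing in common: refuse the node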
- Using this method, we could potentially add server side support
- to hash algorithms and ciphers before we instruct clients to
- begin preferring those hash algorithms and ciphers. In this way,
- the clients could upgrade and the servers would already support
- the newly preferred hash algorithms and ciphers, even if the
- servers were still using older versions of Tor, so long as the
- older versions of Tor were at least new enough to have server
- side support.
-
- This would make quickly upgrading to new hash algorithms and
- ciphers easier. This could be very useful when new attacks
- are published.
-
- One concern is that client preferences could expose the client
- to segmentation attacks. To mitigate this, we suggest hardcoding
- preferences in the client, to prevent the client from choosing
- to use a new hash algorithm or cipher that no one else is using
- yet. While offering a preference might be useful in case a client
- with an older version of Tor wants to start using the newer hash
- algorithm or cipher that everyone else is using, if the client
- cares enough, he or she can just upgrade Tor.
-
- We may also have to worry about nodes which, through laziness or
- maliciousness, refuse to start supporting new hash algorithms or
- ciphers. This must be balanced with the need to maintain
- backward compatibility so the client will have a large selection
- of nodes to pick from. Adding new hash algorithms and ciphers
- long before we suggest nodes start using them can help mitigate
- this. However, eventually, once sufficient nodes support new
- standards, client side support for older standards should be
- disabled, particularly if there are practical rather than merely
- theoretical attacks.
-
- Server side support for older standards can be kept much longer
- than client side support, since clients using older hashes and
- ciphers are really only hurting themselves.
-
- If server side support for a hash algorithm or cipher is added
- but never preferred before we decide we don't really want it,
- support can be removed without having to worry about backward
- compatibility.
-
-Security implications:
- Improving cryptography will improve Tor's security. However, if
- clients pick different cryptographic standards, they could be
- partitioned based on their cryptographic preferences. We also
- need to worry about nodes refusing to support new standards.
- These issues are detailed above.
-
-Specification:
-
- Todo. Need better understanding of how Tor currently works or
- help from someone who does.
-
-Compatibility:
-
- This idea is intended to allow easier upgrading of cryptographic
- hash algorithms and ciphers while maintaining backwards
- compatibility. However, at some point, backwards compatibility
- with very old hashes and ciphers should be dropped for security
- reasons.
-
-Implementation:
-
- Todo.
-
-Performance and scalability notes:
-
- Better hashes and ciphers are sometimes a little more CPU intensive
- than weaker ones. For instance, on most computers AES is a little
- faster than Twofish. However, in that example, I consider Twofish's
- additional security worth the tradeoff.
-
-Acknowledgements:
-
- Discussed this on IRC with a few people, mostly Nick Mathewson.
- Nick was particularly helpful in explaining how Tor works,
- explaining goals, and providing various links to Tor
- specifications.
diff --git a/doc/spec/proposals/ideas/xxx-controllers-intercept-extends.txt b/doc/spec/proposals/ideas/xxx-controllers-intercept-extends.txt
deleted file mode 100644
index 76ba5c84b5..0000000000
--- a/doc/spec/proposals/ideas/xxx-controllers-intercept-extends.txt
+++ /dev/null
@@ -1,44 +0,0 @@
-Author: Geoff Goodell
-Title: Allow controller to manage circuit extensions
-Date: 12 March 2006
-
-History:
-
- This was once bug 268. Moving it into the proposal system for posterity.
-
-Text:
-
-Tor controllers should have a means of learning more about circuits built
-through Tor routers. Specifically, if a Tor controller is connected to a Tor
-router, it should be able to subscribe to a new class of events, perhaps
-"onion" or "router" events. A Tor router SHOULD then ensure that the
-controller is informed:
-
-(a) (NEW) when it receives a connection from some other location, in which
-case it SHOULD indicate (1) a unique identifier for the circuit, and (2) a
-ServerID in the event of an OR connection from another Tor router, and
-Hostname otherwise.
-
-(b) (REQUEST) when it receives a request to extend an existing circuit to a
-successive Tor router, in which case it SHOULD provide (1) the unique
-identifier for the circuit, (2) a Hostname (or, if possible, ServerID) of the
-previous Tor router in the circuit, and (3) a ServerID for the requested
-successive Tor router in the circuit;
-
-(c) (EXTEND) Tor will attempt to extend the circuit to some other router, in
-which case it SHOULD provide the same fields as provided for REQUEST.
-
-(d) (SUCCEEDED) The circuit has been successfully extended to some other
-router, in which case it SHOULD provide the same fields as provided for
-REQUEST.
-
-We also need a new configuration option analogous to _leavestreamsunattached,
-specifying whether the controller is to manage circuit extensions or not.
-Perhaps we can call it "_leavecircuitsunextended". When set to 0, Tor
-manages everything as usual. When set to 1, a circuit received by the Tor
-router cannot transition from "REQUEST" to "EXTEND" state without being
-directed by a new controller command. The controller command probably does
-not need any arguments, since circuits are extended per client source
-routing, and all that the controller does is accept or reject the extension.
-
-This feature can be used as a basis for enforcing routing policy.
diff --git a/doc/spec/proposals/ideas/xxx-encrypted-services.txt b/doc/spec/proposals/ideas/xxx-encrypted-services.txt
deleted file mode 100644
index 3414f3c4fb..0000000000
--- a/doc/spec/proposals/ideas/xxx-encrypted-services.txt
+++ /dev/null
@@ -1,18 +0,0 @@
-
-the basic idea might be to generate a keypair, and sign little statements
-like "this key corresponds to this relay id", and publish them on karsten's
-hs dht.
-
-so if you want to talk to it, you look it up, then go to that exit.
-and by 'go to' i mean 'build a tor circuit like normal except you're sure
-where to exit'
-
-connecting to it is slower than usual, but once you're connected, it's no
-slower than normal tor.
-and you get what wikileaks wants from its hidden service, which is really
-just the UI piece.
-indymedia also wants this.
-
-might be interesting to let an encrypted service list more than one relay,
-too.
-
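A rough sketch of the keypair-plus-statement idea, using Ed25519 via the Python cryptography library; the statement format and the placeholder relay fingerprint are assumptions for illustration.

    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
    from cryptography.hazmat.primitives.serialization import Encoding, PublicFormat

    service_key = Ed25519PrivateKey.generate()
    relay_id = "$" + "0" * 40                      # placeholder relay fingerprint
    statement = ("service-key binds to relay " + relay_id).encode()
    record = {                                     # what would be published on the DHT
        "pubkey": service_key.public_key().public_bytes(Encoding.Raw, PublicFormat.Raw),
        "statement": statement,
        "signature": service_key.sign(statement),
    }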
diff --git a/doc/spec/proposals/ideas/xxx-exit-scanning-outline.txt b/doc/spec/proposals/ideas/xxx-exit-scanning-outline.txt
deleted file mode 100644
index d84094400a..0000000000
--- a/doc/spec/proposals/ideas/xxx-exit-scanning-outline.txt
+++ /dev/null
@@ -1,44 +0,0 @@
-1. Scanning process
- A. Non-HTML/JS HTTP mime types compared via SHA1 hash (see the sketch
- after this outline)
- B. Dynamic HTTP content filtered at 4 levels:
- 1. IP change+Tor cookie utilization
- - Tor cookies replayed with new IP in case of changes
- 2. HTML Tag+Attribute+JS comparison
- - Comparisons made based only on "relevant" HTML tags
- and attributes
- 3. HTML Tag+Attribute+JS diffing
- - Tags, attributes and JS AST nodes that change during
- Non-Tor fetches pruned from comparison
- 4. URLS with > N% of node failures removed
- - results purged from filesystem at end of scan loop
- C. SSL scanning handles some forms of dynamic certs
- 1. Catalogs certs for all IPs resolved locally
- by getaddrinfo over the duration of the scan.
- - Updated each test.
- 2. If the domain presents a new cert for each IP, this
- is noted on the failure result for the node
- 3. If the same IP presents two different certs locally,
- the cert list is first refreshed, and if it happens
- again, discarded
- 4. A N% node failure filter also applies
- D. Scanner can be restarted from any point in the event
- of scanner or system crashes, or graceful shutdown.
- - Results+scan state pickled to filesystem continuously
-2. Cron job checks results periodically for reporting
- A. Divide failures into three types of BadExit based on type
- and frequency over time and incident rate
- B. write reject lines to approved-routers for those three types:
- 1. ID Hex based (for misconfig/network problems easily fixed)
- 2. IP based (for content modification)
- 3. IP+mask based (for continuous/egregious content modification)
- C. Emails results to tor-scanners@freehaven.net
-3. Human Review and Appeal
- A. ID Hex-based BadExit is meant to be possible to remove easily
- without needing to beg us.
- - Should this behavior be encouraged?
- B. Optionally can reserve IP based badexits for human review
- 1. Results are encapsulated fully on the filesystem and can be
- reviewed without network access
- 2. Soat has --rescan to rescan failed nodes from a data directory
- - New set of URLs used
-
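A small sketch of check 1.A above: fetch a non-dynamic resource directly and through the exit under test, and flag a mismatch of the SHA1 digests. The tor_fetch callable (a fetch routed through the exit) is assumed to be supplied by the scanner.

    import hashlib
    import urllib.request

    def sha1_hex(data):
        return hashlib.sha1(data).hexdigest()

    def exit_modifies_content(url, tor_fetch):
        direct = urllib.request.urlopen(url, timeout=30).read()
        via_exit = tor_fetch(url)              # same URL, fetched through the exit
        return sha1_hex(direct) != sha1_hex(via_exit)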
diff --git a/doc/spec/proposals/ideas/xxx-geoip-survey-plan.txt b/doc/spec/proposals/ideas/xxx-geoip-survey-plan.txt
deleted file mode 100644
index 49c6615a66..0000000000
--- a/doc/spec/proposals/ideas/xxx-geoip-survey-plan.txt
+++ /dev/null
@@ -1,137 +0,0 @@
-
-
-Abstract
-
- This document explains how to estimate how many Tor users there
- are, and how many there are in each country. Statistics are
- involved.
-
-Motivation
-
- There are a few reasons we need to keep track of which countries
- Tor users (in aggregate) are coming from:
-
- - Resource allocation. Knowing about underserved countries with
- lots of users can let us know about where we need to direct
- translation and outreach efforts.
-
- - Anticensorship. Sudden drops in usage on a national basis can
- indicate the arrival of a censorious firewall.
-
- - Sponsor outreach and self-evaluation. Many people and
- organizations who are interested in funding The Tor Project's
- work want to know that we're successfully serving parts of the
- world they're interested in, and that efforts to expand our
- userbase are actually succeeding. So do we.
-
-Goals
-
- We want to know approximately how many Tor users there are, and which
- countries they're in, even in the presence of a hypothetical
- "directory guard" feature. Some uncertainty is okay, but we'd like
- to be able to put a bound on the uncertainty.
-
- We need to make sure this information isn't exposed in a way that
- helps an adversary.
-
-Methods for current clients:
-
- Every client downloads network status documents. There are
- currently three methods (one hypothetical) for clients to get them.
- - 0.1.2.x clients (and earlier) fetch a v2 networkstatus
- document about every NETWORKSTATUS_CLIENT_DL_INTERVAL [30
- minutes].
-
- - 0.2.0.x clients fetch a v3 networkstatus consensus document
- at a random interval between when their current document is no
- longer freshest, and when their current document is about to
- expire.
-
- [In both of the above cases, clients choose a running
- directory cache at random with odds roughly proportional to
- its bandwidth. If they're just starting, they know a XXXX FIXME -NM]
-
- - In some future version, clients will choose directory caches
- to serve as their "directory guards" to avoid profiling
- attacks, similarly to how clients currently start all their
- circuits at guard nodes.
-
- We assume that a directory cache can tell which of these three
- categories a client is in by the format of its status request.
-
- A directory cache can be made to count distinct client IP
- addresses that make a certain request of it in a given timeframe,
- and total requests made to it over that timeframe. For the first
- two cases, a cache can get a picture of the overall
- number and countries of users in the network by dividing the IP
- count by the probability with which they (as a cache) would be
- chosen. Assuming that our listed bandwidth is such that we expect
- to be chosen with probability P for any given request, and we've
- been counting IPs for long enough that we expect the average
- client to have made N requests, they will have visited us at least
- once with probability P' = 1-(1-P)^N, and so we divide the IP
- counts we've seen by P' for our estimate. To estimate total
- number of clients of a given type, determine how many requests a
- client of that type will make over that time, and assume we'll
- have seen P of them.
-
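A small sketch of that arithmetic: with per-request selection probability P and an average of N requests per client over the window, a cache sees each client with probability P' = 1-(1-P)^N, so the distinct-IP count scales up by 1/P'. The numbers in the example are made up.

    def estimate_total_clients(distinct_ips_seen, P, N):
        P_prime = 1.0 - (1.0 - P) ** N
        return distinct_ips_seen / P_prime

    # e.g. a cache chosen 0.5% of the time, v2 clients (2 fetches/hour) over 24 hours
    print(estimate_total_clients(1200, P=0.005, N=48))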
- Both of these numbers are useful: the IP counts will give the
- total number of IPs connecting to the network, and the request
- counts will give the total number of users on the network at any
- given time.
-
- Notes:
- - [Over H hours, the N for V2 clients is 2*H, and the N for V3
- clients is currently around H/2 or H/3.]
-
- - (We should only count requests that we actually intend to answer;
- 503 requests shouldn't count.)
-
- - These measurements should also be taken at a directory
- authority if possible: their picture of the network is skewed
- by clients that fetch from them directly. These clients,
- however, are all the clients that are just bootstrapping
- (assuming that the fallback-consensus feature isn't yet used
- much).
-
- - These measurements also overestimate the V2 download rate if
- some downloads fail and clients retry them later after backing
- off.
-
-Methods for directory guards:
-
- If directory guards are in use, directory guards get a picture of
- all those users who chose them as a guard when they were listed
- as a good choice for a guard, and who are also on the network
- now. The cleanest data here will come from nodes that were listed
- as good new-guards choices for a while, and have not been so for a
- while longer (to study decay rates); nodes that have been listed
- as good new-guard choices consistently for a long time (to get a
- sample of the network); and nodes that have been listed as good
- new-guard choices only recently (to get a sample of new users and
- users whose guards have died out.)
-
- Since directory guards are currently unspecified, we'll need to
- make some guesses about how they'll turn out to work. Here are
- a couple of approaches that could work.
- - We could have clients pick completely new directory guards on
- a rolling basis every two months or so. This would ensure
- that staying as a guard for a while would be sufficient to
- see a sample of users. This is potentially advantageous for
- load-balancing the network as well, though it might lose some
- of the benefits of directory guard. We need to quantify the
- impact of this; it might not actually make stuff worse in
- practice, if most guards don't stay good guards for a month
- or two.
-
- - We could try to collect statistics at several directory
- guards and combine their statistics, but we would need to make
- sure that for all time, at least one of the directory guards
- had been recommended as a good choice for new guards. By
- looking at new-IP rates for guards, we could get an idea of
- user uptake; by looking at old-IP decay rates, we could get
- an idea of turnover. This approach would entail significant
- complexity, and we'd probably need to record more information
- than we'd really like to.
-
-
diff --git a/doc/spec/proposals/ideas/xxx-grand-scaling-plan.txt b/doc/spec/proposals/ideas/xxx-grand-scaling-plan.txt
deleted file mode 100644
index 336798cc0f..0000000000
--- a/doc/spec/proposals/ideas/xxx-grand-scaling-plan.txt
+++ /dev/null
@@ -1,97 +0,0 @@
-
-Right now as I understand it, there are n big scaling problems heading
-our way:
-
-1) Clients need to learn all the relay descriptors they could use. That's
-a lot of bytes through a potentially small pipe.
-2) Relays need to hold open TCP connections to most other relays.
-3) Clients need to learn the whole networkstatus. Even using v3, as
-the network grows that will become unwieldy.
-4) Dir mirrors need to mirror all the relay descriptors; eventually this
-will get big too.
-
-Here's my plan.
-
---------------------------------------------------------------------
-
-Piece one: download O(1) descriptors rather than O(n) descriptors.
-
-We need to change our circuit extend protocol so it fetches a relay
-descriptor at every 'extend' operation:
- - Client fetches networkstatus, picks guards, connects to one.
- - Client picks middle hop out of networkstatus, asks guard for
- its descriptor, then extends to it.
- - Client picks exit hop out of networkstatus, asks middle hop
- for its descriptor, then extends to it. Done.
-
-The client needs to ask for the descriptor even if it already has a
-copy, because otherwise we leak too much. Also, the descriptor needs to
-be padded to some large (but not too large) size to prevent the middle
-hops from guessing about it.
-
-The first step towards this is to instrument the current code to see
-how much of a win this would actually be -- I am guessing it is already
-a win even with the current number of descriptors.
-
-We also would need to assign the 'Exit' flag more usefully, and make
-clients pay attention to it when picking their last hop, since they
-don't actually know the exit policies of the relays they're choosing from.
-
-We also need to think harder about other implications -- for example,
-a relay with a tiny exit policy won't get the Exit flag, and thus won't
-ever get picked as an exit relay. Plus, our "enclave exit" model is out
-the window unless we figure out a cool trick.
-
-More generally, we'll probably want to compress the descriptors that we
-send back; maybe 8k is a good upper bound? I wonder if we could ask for
-several descriptors, and bundle back all of the ones that fit in the 8k?
-
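A sketch of the bundling idea, assuming zlib compression and the suggested 8k reply budget; descriptors that don't fit would be fetched in a later request.

    import zlib

    REPLY_BUDGET = 8 * 1024            # "maybe 8k is a good upper bound?"

    def bundle_descriptors(descriptors):
        reply, used = [], 0
        for desc in descriptors:       # desc: raw descriptor bytes
            blob = zlib.compress(desc, 9)
            if used + len(blob) > REPLY_BUDGET:
                break                  # doesn't fit this reply; ask again later
            reply.append(blob)
            used += len(blob)
        return reply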
-We'd also want to put the load balancing weights into the networkstatus,
-so clients can choose fast nodes more often without needing to see the
-descriptors. This is a good opportunity for the authorities to be able
-to put "more accurate" weights in if they learn to detect attacks. It
-also means we should consider running automated audits to make sure the
-authorities aren't trying to snooker everybody.
-
-I'm aiming to get Peter Palfrader to tackle this problem in mid 2008,
-but I bet he could use some help.
-
---------------------------------------------------------------------
-
-Piece two: inter-relay communication uses UDP
-
-If relays send packets to/from other relays via UDP, they don't need a
-new descriptor for each such link. Thus we'll still need to keep state
-for each link, but we won't max out on sockets.
-
-Clearly a lot more work needs to be done here. Ian Goldberg has a student
-who has been working on it, and if all goes well we'll be chipping in
-some funding to continue that. Also, Camilo Viecco has been doing his
-PhD thesis on it.
-
---------------------------------------------------------------------
-
-Piece three: networkstatus documents get partitioned
-
-While the authorities should be expected to be able to handle learning
-about all the relays, there's no reason the clients or the mirrors need
-to. Authorities should put a cap on the number of relays listed in a
-single networkstatus, and split them when they get too big.
-
-We'd need a good way to have each authority come to the same conclusion
-about which partition a given relay goes into.
-
-Directory mirrors would then mirror all the relay descriptors in their
-partition. This is compatible with 'piece one' above, since clients in
-a given partition will only ask about descriptors in that partition.
-
-More complex versions of this design would involve overlapping partitions,
-but that would seem to start contradicting other parts of this proposal
-right quick.
-
-Nobody is working on this piece yet. It's hard to say when we'll need
-it, but it would be nice to have some more thought on it before the week
-that we need it.
-
---------------------------------------------------------------------
-
diff --git a/doc/spec/proposals/ideas/xxx-hide-platform.txt b/doc/spec/proposals/ideas/xxx-hide-platform.txt
deleted file mode 100644
index ad19fb1fd4..0000000000
--- a/doc/spec/proposals/ideas/xxx-hide-platform.txt
+++ /dev/null
@@ -1,37 +0,0 @@
-Filename: xxx-hide-platform.txt
-Title: Hide Tor Platform Information
-Author: Jacob Appelbaum
-Created: 24-July-2008
-Status: Draft
-
-
- Hiding Tor Platform Information
-
-0.0 Introduction
-
-The current Tor program publishes its specific Tor version and related OS
-platform information. This information could be misused by an attacker.
-
-0.1 Current Implementation
-
-Currently, the Tor binary sends data that looks like the following:
-
- Tor 0.2.0.26-rc (r14597) on Darwin Power Macintosh
- Tor 0.1.2.19 on Windows XP Service Pack 3 [workstation] {terminal services,
- single user}
-
-1.0 Suggested changes
-
-It would be useful to allow a user to configure the disclosure of such
-information. Such a change would be an option in the torrc file like so:
-
- HidePlatform Yes
-
-1.1 Suggested default behavior in the future
-
-If a user would like to disclose this information, they could configure their
-Tor to do so.
-
- HidePlatform No
-
-
diff --git a/doc/spec/proposals/ideas/xxx-port-knocking.txt b/doc/spec/proposals/ideas/xxx-port-knocking.txt
deleted file mode 100644
index 85c27ec52d..0000000000
--- a/doc/spec/proposals/ideas/xxx-port-knocking.txt
+++ /dev/null
@@ -1,91 +0,0 @@
-Filename: xxx-port-knocking.txt
-Title: Port knocking for bridge scanning resistance
-Author: Jacob Appelbaum
-Created: 19-April-2009
-Status: Draft
-
- Port knocking for bridge scanning resistance
-
-0.0 Introduction
-
-This document is a collection of ideas relating to improving scanning
-resistance for private bridge relays. This is intented to stop opportunistic
-network scanning and subsequent discovery of private bridge relays.
-
-
-0.1 Current Implementation
-
-Currently private bridges are only hidden by their obscurity. If you know
-a bridge ip address, the bridge can be detected trivially and added to a block
-list.
-
-0.2 Configuring an external port knocking program to control the firewall
-
-It is currently possible for bridge operators to configure a port knocking
-daemon that controls access to the incoming OR port. This is currently out of
-scope for Tor and Tor configuration. This process requires the firewall to know
-the current nodes in the Tor network.
-
-1.0 Suggested changes
-
-Private bridge operators should be able to configure a method of hiding their
-relay. Only authorized users should be able to communicate with the private
-bridge. This should be done with Tor and if possible without the help of the
-firewall. It should be possible for a Tor user to enter a secret key into
-Tor or optionally Vidalia on a per bridge basis. This secret key should be
-used to authenticate the bridge user to the private bridge.
-
-1.x Issues with low ports and bind() for ORPort
-
-Tor opens low numbered ports during startup and then drops privileges. It is
-no longer possible to rebind to those lower ports after they are closed.
-
-1.x Issues with OS level packet filtering
-
-Tor does not know about any OS level packet filtering. Currently there are no
-packet filters that understand the Tor network in real time.
-
-1.x Possible partitioning of users by bridge operator
-
-Depending on implementation, it may be possible for bridge operators to
-uniquely identify users. This appears to be a general bridge issue when a
-bridge operator uniquely deploys bridges per user.
-
-2.0 Implementation ideas
-
-This is a suggested set of methods for port knocking.
-
-2.x Using SPA port knocking
-
-Single Packet Authentication port knocking encodes all required data into a
-single UDP packet. Improperly formatted packets may be simply discarded.
-Properly formatted packets should be processed and appropriate actions taken.
-
-2.x Using DNS as a transport for SPA
-
-It should be possible for Tor to bind to port 53 at startup and merely drop all
-packets that are not valid. UDP does not require a response and invalid packets
-will not trigger a response from Tor. With base32 encoding it should be
-possible to encode SPA as valid DNS requests. This should allow use of the
-public DNS infrastructure for authorization requests if desired.
-
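A rough sketch of such an SPA knock carried in a DNS query name: the client base32-encodes a nonce, a timestamp and an HMAC tag as a label, and the bridge silently drops anything that fails to verify. HMAC-SHA256, the field sizes and the example domain are assumptions; replay caching is omitted.

    import base64, hashlib, hmac, os, struct, time

    def make_knock(secret):
        nonce = os.urandom(8)
        ts = struct.pack("!Q", int(time.time()))
        mac = hmac.new(secret, nonce + ts, hashlib.sha256).digest()[:16]
        label = base64.b32encode(nonce + ts + mac).decode().rstrip("=").lower()
        return label + ".example.com"          # looks like an ordinary DNS name

    def knock_is_valid(secret, name, skew=60):
        label = name.split(".")[0].upper()
        try:
            raw = base64.b32decode(label + "=" * (-len(label) % 8))
        except Exception:
            return False                       # malformed: drop silently
        nonce, ts, mac = raw[:8], raw[8:16], raw[16:32]
        good = hmac.new(secret, nonce + ts, hashlib.sha256).digest()[:16]
        fresh = abs(time.time() - struct.unpack("!Q", ts)[0]) < skew
        return fresh and hmac.compare_digest(mac, good)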
-2.x Ghetto firewalling with opportunistic connection closing
-
-Until a user has authenticated with Tor, Tor only has a UDP listener. This
-listener should never send data in response, it should only open an ORPort
-when a user has successfully authenticated. After a user has authenticated
-with Tor to open an ORPort, only users who have authenticated will be able
-to use it. All other users as identified by their ip address will have their
-connection closed before any data is sent or received. This should be
-accomplished with an access policy. By default, the access policy should block
-all access to the ORPort.
-
-2.x Timing and reset of access policies
-
-Access to the ORPort is sensitive. The bridge should remove any exceptions
-to its access policy regularly when the ORPort is unused. Valid users should
-reauthenticate if they do not use the ORPort within a given time frame.
-
-2.x Additional considerations
-
-There are many. A format of the packet and the crypto involved is a good start.
diff --git a/doc/spec/proposals/ideas/xxx-rate-limit-exits.txt b/doc/spec/proposals/ideas/xxx-rate-limit-exits.txt
deleted file mode 100644
index 81fed20af8..0000000000
--- a/doc/spec/proposals/ideas/xxx-rate-limit-exits.txt
+++ /dev/null
@@ -1,63 +0,0 @@
-
-1. Overview
-
- We should rate limit the volume of stream creations at exits:
-
-2.1. Per-circuit limits
-
- If a given circuit opens more than N streams in X seconds, further
- stream requests over the next Y seconds should fail with the reason
- 'resourcelimit'. Clients will automatically notice this and switch to
- a new circuit.
-
- The goal is to limit the effects of port scans on a given exit relay,
- so the relay's ISP won't get hassled as much.
-
- First thoughts for parameters would be N=100 streams in X=5 seconds
- causes 30 seconds of fails; and N=300 streams in X=30 seconds causes
- 30 seconds of fails.
-
- We could simplify by, instead of having a "for 30 seconds" parameter,
- just marking the circuit as forever failing new requests. (We don't want
- to just close the circuit because it may still have open streams on it.)
-
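A minimal sketch of the per-circuit limit, using a sliding window over stream-open times and the first-guess parameters above (N=100, X=5s, Y=30s).

    import time
    from collections import deque

    class CircuitStreamLimiter:
        def __init__(self, n=100, x=5.0, y=30.0):
            self.n, self.x, self.y = n, x, y
            self.opens = deque()
            self.fail_until = 0.0

        def allow_stream(self, now=None):
            now = time.time() if now is None else now
            if now < self.fail_until:
                return False                   # still failing with 'resourcelimit'
            while self.opens and now - self.opens[0] > self.x:
                self.opens.popleft()           # forget opens older than X seconds
            self.opens.append(now)
            if len(self.opens) > self.n:
                self.fail_until = now + self.y # refuse new streams for Y seconds
                return False
            return True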
-2.2. Per-destination limits
-
- If a given circuit opens more than N1 streams in X seconds to a single
- IP address, or all the circuits combined open more than N2 streams,
- then we should fail further attempts to reach that address for a while.
-
- The goal is to limit the abuse that Tor exit relays can dish out
- to a single target either for socket DoS or for web crawling, in
- the hopes of a) not triggering their automated defenses, and b) not
- making them upset at Tor. Hopefully these self-imposed bans will be
- much shorter-lived than bans or barriers put up by the websites.
-
-3. Issues
-
-3.1. Circuit-creation overload
-
- Making clients move to new circuits more often will cause more circuit
- creation requests.
-
-3.2. How to pick the parameters?
-
- If we pick the numbers too low, then popular sites are effectively
- cut out of Tor. If we pick them too high, we don't do much good.
-
- Worse, picking them wrong isn't easy to fix, since the deployed Tor
- servers will ship with a certain set of numbers.
-
- We could put numbers (or "general settings") in the networkstatus
- consensus, and Tor exits would adapt more dynamically.
-
- We could also have a local config option about how aggressive this
- server should be with its parameters.
-
-4. Client-side limitations
-
- Perhaps the clients should have built-in rate limits too, so they avoid
- harassing the servers by default?
-
- Tricky if we want to get Tor clients in use at large enclaves.
-
diff --git a/doc/spec/proposals/ideas/xxx-separate-streams-by-port.txt b/doc/spec/proposals/ideas/xxx-separate-streams-by-port.txt
deleted file mode 100644
index f26c1e580f..0000000000
--- a/doc/spec/proposals/ideas/xxx-separate-streams-by-port.txt
+++ /dev/null
@@ -1,59 +0,0 @@
-Filename: xxx-separate-streams-by-port.txt
-Title: Separate streams across circuits by destination port
-Author: Robert Hogan
-Created: 21-Oct-2008
-Status: Draft
-
-Here's a patch Robert Hogan wrote to use only one destination port per
-circuit. It's based on a wishlist item Roger wrote, to never send AIM
-usernames over the same circuit that we're hoping to browse anonymously
-through. The remaining open question is: how many extra circuits does this
-cause an ordinary user to create? My guess is not very many, but I'm wary
-of putting this in until we have some better estimate. On the other hand,
-not putting it in means that we have a known security flaw. Hm.
-
-Index: src/or/or.h
-===================================================================
---- src/or/or.h (revision 17143)
-+++ src/or/or.h (working copy)
-@@ -1874,6 +1874,7 @@
-
- uint8_t state; /**< Current status of this circuit. */
- uint8_t purpose; /**< Why are we creating this circuit? */
-+ uint16_t service; /**< Port conn must have to use this circuit. */
-
- /** How many relay data cells can we package (read from edge streams)
- * on this circuit before we receive a circuit-level sendme cell asking
-Index: src/or/circuituse.c
-===================================================================
---- src/or/circuituse.c (revision 17143)
-+++ src/or/circuituse.c (working copy)
-@@ -62,10 +62,16 @@
- return 0;
- }
-
-- if (purpose == CIRCUIT_PURPOSE_C_GENERAL)
-+ if (purpose == CIRCUIT_PURPOSE_C_GENERAL) {
- if (circ->timestamp_dirty &&
- circ->timestamp_dirty+get_options()->MaxCircuitDirtiness <= now)
- return 0;
-+ /* If the circuit is dirty and used for services on another port,
-+ then it is not suitable. */
-+ if (circ->service && conn->socks_request->port &&
-+ (circ->service != conn->socks_request->port))
-+ return 0;
-+ }
-
- /* decide if this circ is suitable for this conn */
-
-@@ -1351,7 +1357,9 @@
- if (connection_ap_handshake_send_resolve(conn) < 0)
- return -1;
- }
--
-+ if (conn->socks_request->port
-+ && (TO_CIRCUIT(circ)->purpose == CIRCUIT_PURPOSE_C_GENERAL))
-+ TO_CIRCUIT(circ)->service = conn->socks_request->port;
- return 1;
- }
-
diff --git a/doc/spec/proposals/ideas/xxx-using-spdy.txt b/doc/spec/proposals/ideas/xxx-using-spdy.txt
deleted file mode 100644
index d733a84b69..0000000000
--- a/doc/spec/proposals/ideas/xxx-using-spdy.txt
+++ /dev/null
@@ -1,143 +0,0 @@
-Filename: xxx-using-spdy.txt
-Title: Using the SPDY protocol to improve Tor performance
-Author: Steven J. Murdoch
-Created: 03-Feb-2010
-Status: Draft
-Target:
-
-1. Overview
-
- The SPDY protocol [1] is an alternative method for transferring
- web content over TCP, designed to improve efficiency and
- performance. A SPDY-aware browser can already communicate with
- a SPDY-aware web server over Tor, because this only requires a TCP
- stream to be set up. However, a SPDY-aware browser cannot
- communicate with a non-SPDY-aware web server. This proposal
- outlines how Tor could support this latter case, and why it
- may be good for performance.
-
-2. Motivation
-
- About 90% of Tor traffic, by connection, is HTTP [2], but
- users report subjective performance to be poor. It would
- therefore be desirable to improve this situation. SPDY was
- designed to offer better performance than HTTP, in
- high-latency and/or low-bandwidth situations, and is therefore
- an option worth examining.
-
- If a user wishes to access a SPDY-enabled web server over Tor,
- all they need to do is to configure their SPDY-enabled browser
- (e.g. Google Chrome) to use Tor. However, there are few
- SPDY-enabled web servers, and even if there was high demand
- from Tor users, there would be little motivation for server
- operators to upgrade, for the benefit of only a small
- proportion of their users.
-
- The motivation of this proposal is to allow only the user to
- install a SPDY-enabled browser, and permit web servers to
- remain unmodified. Essentially, Tor would incorporate a proxy
- on the exit node, which communicates SPDY to the web browser
- and normal HTTP to the web server. This proxy would translate
- between the two transport protocols, and possibly perform
- other optimizations.
-
- SPDY currently offers five optimizations:
-
- 1) Multiplexed streams:
- An unlimited number of resources can be transferred
- concurrently, over a single TCP connection.
-
- 2) Request prioritization:
- The client can set a priority on each resource, to assist
- the server in re-ordering responses.
-
- 3) Compression:
- Both HTTP header and resource content can be compressed.
-
- 4) Server push:
- The server can offer the client resources which have not
- been requested, but which the server believes will be.
-
- 5) Server hint:
- The server can suggest that the client request further
- resources, before the main content is transferred.
-
- Tor currently effectively implements (1), by being able to put
- multiple streams on one circuit. SPDY however requires fewer
- round-trips to do the same. The other features are not
- implemented by Tor. Therefore it is reasonable to expect that
- an HTTP <-> SPDY proxy may improve Tor performance, by some
- amount.
-
- The consequences on caching need to be considered carefully.
- Most of the optimizations SPDY offers have no effect because
- the existing HTTP cache control headers are transmitted without
- modification. Server push is more problematic, because here
- the server may push a resource that the client already has.
-
-3. Design outline
-
- One way to implement the SPDY proxy is for Tor exit nodes to
- advertise this capability in their descriptor. The OP would
- then preferentially select these nodes when routing streams
- destined for port 80.
-
- Then, rather than sending the usual RELAY_BEGIN cell, the OP
- would send a RELAY_BEGIN_TRANSFORMED cell, with a parameter to
- indicate that the exit node should translate between SPDY and
- HTTP. The rest of the connection process would operate as
- usual.
-
- There would need to be some way of elegantly handling non-HTTP
- traffic which goes over port 80.
-
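A small sketch of that selection rule, assuming a descriptor capability flag named "spdy-proxy" (hypothetical) and a bandwidth-weighted chooser supplied by the caller.

    import random

    def pick_exit(exits, dest_port, choose=random.choice):
        # exits: list of (nickname, set_of_capabilities)
        if dest_port == 80:
            spdy_capable = [e for e in exits if "spdy-proxy" in e[1]]
            if spdy_capable:
                return choose(spdy_capable)    # prefer exits that can translate
        return choose(exits)

    print(pick_exit([("exitA", {"spdy-proxy"}), ("exitB", set())], 80))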
-4. Implementation status
-
- SPDY is under active development and both the specification
- and implementations are in a state of flux. Initial
- experiments with Google Chrome in SPDY-mode and server
- libraries indicate that more work is needed before they are
- production-ready. There is no indication that browsers other
- than Google Chrome will support SPDY (and no official
- statement as to whether Google Chrome will eventually enable
- SPDY by default).
-
- Implementing a full SPDY proxy would be non-trivial. Stream
- multiplexing and compression are supported by existing
- libraries and would be fairly simple to implement. Request
- prioritization would require some form of caching on the
- proxy-side. Server push and server hint would require content
- parsing to identify resources which should be treated
- specially.
-
-5. Security and policy implications
-
- A SPDY proxy would be a significant amount of code, and may
- pull in external libraries. This code will process potentially
- malicious data, both at the SPDY and HTTP sides. This proposal
- therefore increases the risk that exit nodes will be
- compromised by exploiting a bug in the proxy.
-
- This proposal would also be the first way in which Tor is
- modifying TCP stream data. Arguably this is still meta-data
- (HTTP headers), but there may be some concern that Tor should
- not be doing this.
-
- Torbutton only works with Firefox, but SPDY only works with
- Google Chrome. We should be careful not to recommend that
- users adopt a browser which harms their privacy in other ways.
-
-6. Open questions:
-
- - How difficult would this be to implement?
-
- - How much performance improvement would it actually result in?
-
- - Is there some way to rapidly develop a prototype which would
- answer the previous question?
-
-[1] SPDY: An experimental protocol for a faster web
- http://dev.chromium.org/spdy/spdy-whitepaper
-[2] Shining Light in Dark Places: Understanding the Tor Network Damon McCoy,
- Kevin Bauer, Dirk Grunwald, Tadayoshi Kohno, Douglas Sicker
- http://www.cs.washington.edu/homes/yoshi/papers/Tor/PETS2008_37.pdf
diff --git a/doc/spec/proposals/ideas/xxx-what-uses-sha1.txt b/doc/spec/proposals/ideas/xxx-what-uses-sha1.txt
deleted file mode 100644
index b3ca3eea5a..0000000000
--- a/doc/spec/proposals/ideas/xxx-what-uses-sha1.txt
+++ /dev/null
@@ -1,247 +0,0 @@
-Filename: xxx-what-uses-sha1.txt
-Title: Where does Tor use SHA-1 today?
-Authors: Nick Mathewson, Marian
-Created: 30-Dec-2008
-Status: Meta
-
-
-Introduction:
-
- Tor uses SHA-1 as a message digest. SHA-1 is showing its age:
- theoretical attacks for finding collisions against it get better
- every year or two, and it will likely be broken in practice before
- too long.
-
- According to smart crypto people, the SHA-2 functions (SHA-256, etc)
- share too much of SHA-1's structure to be very good. RIPEMD-160 is
- also based on flawed past hashes. Some people think other hash
- functions (e.g. Whirlpool and Tiger) are not as bad; most of these
- have not seen enough analysis to be used yet.
-
- Here is a 2006 paper about hash algorithms.
- http://www.sane.nl/sane2006/program/final-papers/R10.pdf
-
- (Todo: Ask smart crypto people.)
-
- By 2012, the NIST SHA-3 competition will be done, and with luck we'll
- have something good to switch to. But it's probably a bad idea to
- wait until 2012 to figure out _how_ to migrate to a new hash
- function, for two reasons:
- 1) It's not inconceivable we'll want to migrate in a hurry
- some time before then.
- 2) It's likely that migrating to a new hash function will
- require protocol changes, and it's easiest to make protocol
- changes backward compatible if we lay the groundwork in
- advance. It would suck to have to break compatibility with
- a big hard-to-test "flag day" protocol change.
-
- This document attempts to list everything Tor uses SHA-1 for today.
- This is the first step in getting all the design work done to switch
- to something else.
-
- This document SHOULD NOT be a clearinghouse of what to do about our
- use of SHA-1. That's better left for other individual proposals.
-
-
-Why now?
-
- The recent publication of "MD5 considered harmful today: Creating a
- rogue CA certificate" by Alexander Sotirov, Marc Stevens, Jacob
- Appelbaum, Arjen Lenstra, David Molnar, Dag Arne Osvik, and Benne de
- Weger has reminded me that:
-
- * You can't rely on theoretical attacks to stay theoretical.
- * It's quite unpleasant when theoretical attacks become practical
- and public on days you were planning to leave for vacation.
- * Broken hash functions (which SHA-1 is not quite yet AFAIU)
- should be dropped like hot potatoes. Failure to do so can make
- one look silly.
-
-
-Triage
-
- How severe are these problems? Let's divide them into these
- categories, where H(x) is the SHA-1 hash of x:
- PREIMAGE -- find any x such that H(x) has a chosen value
- -- A SHA-1 usage that only depends on preimage
- resistance
- * Also SECOND PREIMAGE. Given x, find a y not equal to
- x such that H(x) = H(y)
- COLLISION<role> -- A SHA-1 usage that depends on collision
- resistance, but the only party who could mount a
- collision-based attack is already in a trusted role
- (like a distribution signer or a directory authority).
- COLLISION -- find any x and y such that H(x) = H(y) -- A
- SHA-1 usage that depends on collision resistance
- and doesn't need the attacker to have any special keys.
-
- There is no need to put much effort into fixing PREIMAGE and SECOND
- PREIMAGE usages in the near-term: while there have been some
- theoretical results doing these attacks against SHA-1, they don't
- seem to be close to practical yet. To fix COLLISION<code-signing>
- usages is not too important either, since anyone who has the key to
- sign the code can mount far worse attacks. It would be good to fix
- COLLISION<authority> usages, since we try to resist bad authorities
- to a limited extent. The COLLISION usages are the most important
- to fix.
-
- Kelsey and Schneier published a theoretical second preimage attack
- against SHA-1 in 2005, so it would be a good idea to fix PREIMAGE
- and SECOND PREIMAGE usages after fixing COLLISION usages or where fixes
- require minimal effort.
-
- http://www.schneier.com/paper-preimages.html
-
- Additionally, we need to consider the impact of a successful attack
- in each of these cases. SHA-1 collisions are still expensive even
- if recent results are verified, and anybody with the resources to
- compute one also has the resources to mount a decent Sybil attack.
-
- Let's be pessimistic, and not assume that producing collisions of
- a given format is actually any harder than producing collisions at
- all.
-
-
-What Tor uses hashes for today:
-
-1. Infrastructure.
-
- A. Our X.509 certificates are signed with SHA-1.
- COLLISION
- B. TLS uses SHA-1 (and MD5) internally to generate keys.
- PREIMAGE?
- * At least breaking SHA-1 and MD5 simultaneously is
- much more difficult than breaking either
- independently.
- C. Some of the TLS ciphersuites we allow use SHA-1.
- PREIMAGE?
- D. When we sign our code with GPG, it might be using SHA-1.
- COLLISION<code-signing>
- * GPG 1.4 and up have writing support for SHA-2 hashes.
- This blog has help for converting:
- http://www.schwer.us/journal/2005/02/19/sha-1-broken-and-gnupg-gpg/
- E. Our GPG keys might be authenticated with SHA-1.
- COLLISION<code-signing-key-signing>
- F. OpenSSL's random number generator uses SHA-1, I believe.
- PREIMAGE
-
-2. The Tor protocol
-
- A. Everything we sign, we sign using SHA-1-based OAEP-MGF1.
- PREIMAGE?
- B. Our CREATE cell format uses SHA-1 for: OAEP padding.
- PREIMAGE?
- C. Our EXTEND cells use SHA-1 to hash the identity key of the
- target server.
- COLLISION
- D. Our CREATED cells use SHA-1 to hash the derived key data.
- ??
- E. The data we use in CREATE_FAST cells to generate a key is the
- length of a SHA-1.
- NONE
- F. The data we send back in a CREATED/CREATED_FAST cell is the length
- of a SHA-1.
- NONE
- G. We use SHA-1 to derive our circuit keys from the negotiated g^xy
- value.
- NONE
- H. We use SHA-1 to derive the digest field of each RELAY cell, but that's
- used more as a checksum than as a strong digest.
- NONE
-
-3. Directory services
-
- [All are COLLISION or COLLISION<authority> ]
-
- A. All signatures are generated on the SHA-1 of their corresponding
- documents, using PKCS1 padding.
- * In dir-spec.txt, section 1.3, it states,
- "SIGNATURE" Object contains a signature (using the signing key)
- of the PKCS1-padded digest of the entire document, taken from
- the beginning of the Initial item, through the newline after
- the Signature Item's keyword and its arguments."
- So our attacker, Malcom, could generate a collision for the hash
- that is signed. Thus, a second pre-image attack is possible.
- Vulnerable to regular collision attack only if key is stolen.
- If the key is stolen, Malcom could distribute two different
- copies of the document which have the same hash. Maybe useful
- for a partitioning attack?
- B. Router descriptors identify their corresponding extra-info documents
- by their SHA-1 digest.
- * A third party might use a second pre-image attack to generate a
- false extra-info document that has the same hash. The router
- itself might use a regular collision attack to generate multiple
- extra-info documents with the same hash, which might be useful
- for a partitioning attack.
- C. Fingerprints in router descriptors are taken using SHA-1.
- * The fingerprint must match the public key. Not sure what would
- happen if two routers had different public keys but the same
- fingerprint. There could perhaps be unpredictable behaviour.
- D. In router descriptors, routers in the same "Family" may be listed
- by server nicknames or hexdigests.
- * Does not seem critical.
- E. Fingerprints in authority certs are taken using SHA-1.
- F. Fingerprints in dir-source lines of votes and consensuses are taken
- using SHA-1.
- G. Networkstatuses refer to routers identity keys and descriptors by their
- SHA-1 digests.
- H. Directory-signature lines identify which key is doing the signing by
- the SHA-1 digests of the authority's signing key and its identity key.
- I. The following items are downloaded by the SHA-1 of their contents:
- XXXX list them
- J. The following items are downloaded by the SHA-1 of an identity key:
- XXXX list them too.
-
-4. The rendezvous protocol
-
- A. Hidden servers use SHA-1 to establish introduction points on relays,
- and relays use SHA-1 to check incoming introduction point
- establishment requests.
- B. Hidden servers use SHA-1 in multiple places when generating hidden
- service descriptors.
- * The permanent-id is the first 80 bits of the SHA-1 hash of the
- public key
- ** time-period performs calculations using the permanent-id
- * The secret-id-part is the SHA-1 hash of the time period, the
- descriptor-cookie, and replica.
- * Hash of introduction point's identity key.
- C. Hidden servers performing basic-type client authorization for their
- services use SHA-1 when encrypting introduction points contained in
- hidden service descriptors.
- D. Hidden service directories use SHA-1 to check whether a given hidden
- service descriptor may be published under a given descriptor
- identifier or not.
- E. Hidden servers use SHA-1 to derive .onion addresses of their
- services (see the sketch after this list).
- * What's worse, it only uses the first 80 bits of the SHA-1 hash.
- However, the rend-spec.txt says we aren't worried about arbitrary
- collisions?
- F. Clients use SHA-1 to generate the current hidden service descriptor
- identifiers for a given .onion address.
- G. Hidden servers use SHA-1 to remember digests of the first parts of
- Diffie-Hellman handshakes contained in introduction requests in order
- to detect replays. See the RELAY_ESTABLISH_INTRO cell. We seem to be
- taking a hash of a hash here.
- H. Hidden servers use SHA-1 during the Diffie-Hellman key exchange with
- a connecting client.
-
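For item E, a sketch of the derivation: a v2 .onion address is the base32 encoding of the first 80 bits (10 bytes) of the SHA-1 of the service's DER-encoded public key.

    import base64, hashlib

    def onion_address(der_pubkey):
        digest = hashlib.sha1(der_pubkey).digest()
        return base64.b32encode(digest[:10]).decode().lower() + ".onion"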
-5. The bridge protocol
-
- XXXX write me
-
- A. Client may attempt to query for bridges where he knows a digest
- (probably SHA-1) before a direct query.
-
-6. The Tor user interface
-
- A. We log information about servers based on SHA-1 hashes of their
- identity keys.
- COLLISION
- B. The controller identifies servers based on SHA-1 hashes of their
- identity keys.
- COLLISION
- C. Nearly all of our configuration options that list servers allow SHA-1
- hashes of their identity keys.
- COLLISION
- E. The deprecated .exit notation uses SHA-1 hashes of identity keys
- COLLISION