From 9d2645e84a7e4e44e6d44b4f003b156ae6a0f1e1 Mon Sep 17 00:00:00 2001 From: Nick Mathewson Date: Sat, 28 Jun 2008 19:25:39 +0000 Subject: Add proposal 143: Improvements of Distributed Storage for Tor Hidden Service Descriptors svn:r15552 --- proposals/143-distributed-storage-improvements.txt | 195 +++++++++++++++++++++ 1 file changed, 195 insertions(+) create mode 100644 proposals/143-distributed-storage-improvements.txt (limited to 'proposals/143-distributed-storage-improvements.txt') diff --git a/proposals/143-distributed-storage-improvements.txt b/proposals/143-distributed-storage-improvements.txt new file mode 100644 index 0000000..ebfa122 --- /dev/null +++ b/proposals/143-distributed-storage-improvements.txt @@ -0,0 +1,195 @@ +Filename: 143-distributed-storage-improvements.txt +Title: Improvements of Distributed Storage for Tor Hidden Service Descriptors +Version: $Revision$ +Last-Modified: $Date$ +Author: Karsten Loesing +Created: 28-Jun-2008 +Status: Open + +Change history: + + 28-Jun-2008 Initial proposal for or-dev + +Overview: + + An evaluation of the distributed storage for Tor hidden service + descriptors and subsequent discussions have brought up a few improvements + to proposal 114. All improvements are backwards compatible to the + implementation of proposal 114. + +Design: + + 1. Report Bad Directory Nodes + + Bad hidden service directory nodes could deny existence of previously + stored descriptors. A bad directory node that does this with all stored + descriptors causes harm to the distributed storage in general, but + replication will cope with this problem in most cases. However, an + adversary that attempts to make a specific hidden service unavailable by + running relays that become responsible for all of a service's + descriptors poses a more serious threat. The distributed storage needs to + defend against this attack by detecting and removing bad directory nodes. + + As a countermeasure hidden services try to download their descriptors + every hour at random times from the hidden service directories that are + responsible for storing it. If a directory node replies with 404 (Not + found), the hidden service reports the supposedly bad directory node to + a random selection of half of the directory authorities (with version + numbers equal to or higher than the first version that implements this + proposal). The hidden service posts a complaint message using HTTP 'POST' + to a URL "/tor/rendezvous/complain" with the following message format: + + "hidden-service-directory-complaint" identifier NL + + [At start, exactly once] + + The identifier of the hidden service directory node to be + investigated. + + "rendezvous-service-descriptor" descriptor NL + + [At end, Excatly once] + + The hidden service descriptor that the supposedly bad directory node + does not serve. + + The directory authority checks if the descriptor is valid and the hidden + service directory responsible for storing it. It waits for a random time + of up to 30 minutes before posting the descriptor to the hidden service + directory. If the publication is acknowledged, the directory authority + waits another random time of up to 30 minutes before attempting to + request the descriptor that it has posted. If the directory node replies + with 404 (Not found), it will be blacklisted for being a hidden service + directory node for the next 48 hours. + + A blacklisted hidden service directory is assigned the new flag BadHSDir + instead of the HSDir flag in the vote that a directory authority creates. + In a consensus a relay is only assigned a HSDir flag if the majority of + votes contains a HSDir flag and no more than one third of votes contains + a BadHSDir flag. As a result, clients do not have to learn about the + BadHSDir flag. A blacklisted directory node will simply not be assigned + the HSDir flag in the consensus. + + In order to prevent an attacker from setting up new nodes as replacement + for blacklisted directory nodes, all directory nodes in the same /24 + subnet are blacklisted, too. Furthermore, if two or more directory nodes + are blacklisted in the same /16 subnet concurrently, all other directory + nodes in that /16 subnet are blacklisted, too. Blacklisting holds for at + most 48 hours. + + 2. Publish Fewer Replicas + + The evaluation has shown that the probability of a directory node to + serve a previously stored descriptor is 85.7% (more precisely, this is + the 0.001-quantile of the empirical distribution with the rationale that + it holds for 99.9% of all empirical cases). If descriptors are replicated + to x directory nodes, the probability of at least one of the replicas to + be available for clients is 1 - (1 - 85.7%) ^ x. In order to achieve an + overall availability of 99.9%, x = 3.55 replicas need to be stored. From + this follows that 4 replicas are sufficient, rather than the currently + stored 6 replicas. + + Further, the current design stores 2 sets of descriptors on 3 directory + nodes with consecutive identities. Originally, this was meant to + facilitate replication between directory nodes, which has not been and + will not be implemented (the selection criterion of 24 hours uptime does + not make it necessary). As a result, storing descriptors on directory + nodes with consecutive identities is not required. In fact it should be + avoided to enable an attacker to create "black holes" in the identifier + ring. + + Hidden services should store their descriptors on 4 non-consecutive + directory nodes, and clients should request descriptors from these + directory nodes only. For compatibility reasons, hidden services also + store their descriptors on 2 consecutive directory nodes. Hence, 0.2.0.x + clients will be able to retrieve 4 out of 6 descriptors, but will fail + for the remaining 2 descriptors, which is sufficient for reliability. As + soon as 0.2.0.x is deprecated, hidden services can stop publishing the + additional 2 replicas. + + 3. Change Default Value of Being Hidden Service Directory + + The requirements for becoming a hidden service directory node are an open + directory port and an uptime of at least 24 hours. The evaluation has + shown that there are 300 hidden service directory candidates in the mean, + but only 6 of them are configured to act as hidden service directories. + This is bad, because those 6 nodes need to serve a large share of all + hidden service descriptors. Optimally, there should be hundreds of hidden + service directories. Having a large number of 0.2.1.x directory nodes + also has a positive effect on 0.2.0.x hidden services and clients. + + Therefore, the new default of HidServDirectoryV2 should be 1, so that a + Tor relay that has an open directory port automatically accepts and + serves v2 hidden service descriptors. A relay operator can still opt-out + running a hidden service directory by changing HidServDirectoryV2 to 0. + The additional bandwidth requirements for running a hidden service + directory node in addition to being a directory cache are negligible. + + 4. Make Descriptors Persistent on Directory Nodes + + Hidden service directories that are restarted by their operators or after + a failure will not be selected as hidden service directories within the + next 24 hours. However, some clients might still think that these nodes + are responsible for certain descriptors, because they work on the basis + of network consensuses that are up to three hours old. The directory + nodes should be able to serve the previously received descriptors to + these clients. Therefore, directory nodes make all received descriptors + persistent and load previously received descriptors on startup. + + 5. Store and Serve Descriptors Regardless of Responsibility + + Currently, directory nodes only accept descriptors for which they think + they are responsible. This may lead to problems when a directory node + uses an older or newer network consensus than hidden service or client + or when a directory node has been restarted recently. In fact, there are + no security issues in storing or serving descriptors for which a + directory node thinks it is not responsible. To the contrary, doing so + may improve reliability in border cases. As a result, a directory node + does not pay attention to responsibilty when receiving a publication or + fetch request, but stores or serves the requested descriptor. Likewise, + the directory node does not remove descriptors when it thinks it is not + responsible for them any more. + + 6. Avoid Periodic Descriptor Re-Publication + + In the current implementation a hidden service re-publishes its + descriptor either when its content changes or an hour elapses. However, + the evaluation has shown that failures of hidden service directory nodes, + i.e. of nodes that have not failed within the last 24 hours, are very + rare. Together with making descriptors persistent on directory nodes, + there is no necessity to re-publish descriptors hourly. + + The only two events leading to descriptor re-publication should be a + change of the descriptor content and a new directory node becoming + responsible for the descriptor. Hidden services should therefore consider + re-publication every time they learn about a new network consensus + instead of hourly. + + 7. Discard Expired Descriptors + + The current implementation lets directory nodes keep a descriptor for two + days before discarding it. However, with the v2 design, descriptors are + only valid for at most one day. Directory nodes should determine the + validity of stored descriptors and discard them one hour after they have + expired (to compensate wrong clocks on clients). + + 8. Shorten Client-Side Descriptor Fetch History + + When clients try to download a hidden service descriptor, they memorize + fetch requests to directory nodes for up to 15 minutes. This allows them + to request all replicas of a descriptor to avoid bad or failing directory + nodes, but without querying the same directory node twice. + + The downside is that a client that has requested a descriptor without + success, will not be able to find a hidden service that has been started + during the following 15 minutes after the client's last request. + + This can be improved by shortening the fetch history to only 5 minutes. + This time should be sufficient to complete requests for all replicas of a + descriptor, but without ending in an infinite request loop. + +Compatibility: + + All proposed improvements are compatible to the currently implemented + design as described in proposal 114. + -- cgit v1.2.3-54-g00ecf