aboutsummaryrefslogtreecommitdiff
path: root/proposals/143-distributed-storage-improvements.txt
diff options
context:
space:
mode:
authorNick Mathewson <nickm@torproject.org>2008-06-28 19:25:39 +0000
committerNick Mathewson <nickm@torproject.org>2008-06-28 19:25:39 +0000
commit9d2645e84a7e4e44e6d44b4f003b156ae6a0f1e1 (patch)
tree361ed56254b73cbfc7f6e03f0fc24283b448e5c2 /proposals/143-distributed-storage-improvements.txt
parentbdadf561644c749078f6f264a76d6b45da222a9b (diff)
downloadtorspec-9d2645e84a7e4e44e6d44b4f003b156ae6a0f1e1.tar.gz
torspec-9d2645e84a7e4e44e6d44b4f003b156ae6a0f1e1.zip
Add proposal 143: Improvements of Distributed Storage for Tor Hidden Service Descriptors
svn:r15552
Diffstat (limited to 'proposals/143-distributed-storage-improvements.txt')
-rw-r--r--proposals/143-distributed-storage-improvements.txt195
1 files changed, 195 insertions, 0 deletions
diff --git a/proposals/143-distributed-storage-improvements.txt b/proposals/143-distributed-storage-improvements.txt
new file mode 100644
index 0000000..ebfa122
--- /dev/null
+++ b/proposals/143-distributed-storage-improvements.txt
@@ -0,0 +1,195 @@
+Filename: 143-distributed-storage-improvements.txt
+Title: Improvements of Distributed Storage for Tor Hidden Service Descriptors
+Version: $Revision$
+Last-Modified: $Date$
+Author: Karsten Loesing
+Created: 28-Jun-2008
+Status: Open
+
+Change history:
+
+ 28-Jun-2008 Initial proposal for or-dev
+
+Overview:
+
+ An evaluation of the distributed storage for Tor hidden service
+ descriptors and subsequent discussions have brought up a few improvements
+ to proposal 114. All improvements are backwards compatible to the
+ implementation of proposal 114.
+
+Design:
+
+ 1. Report Bad Directory Nodes
+
+ Bad hidden service directory nodes could deny existence of previously
+ stored descriptors. A bad directory node that does this with all stored
+ descriptors causes harm to the distributed storage in general, but
+ replication will cope with this problem in most cases. However, an
+ adversary that attempts to make a specific hidden service unavailable by
+ running relays that become responsible for all of a service's
+ descriptors poses a more serious threat. The distributed storage needs to
+ defend against this attack by detecting and removing bad directory nodes.
+
+ As a countermeasure hidden services try to download their descriptors
+ every hour at random times from the hidden service directories that are
+ responsible for storing it. If a directory node replies with 404 (Not
+ found), the hidden service reports the supposedly bad directory node to
+ a random selection of half of the directory authorities (with version
+ numbers equal to or higher than the first version that implements this
+ proposal). The hidden service posts a complaint message using HTTP 'POST'
+ to a URL "/tor/rendezvous/complain" with the following message format:
+
+ "hidden-service-directory-complaint" identifier NL
+
+ [At start, exactly once]
+
+ The identifier of the hidden service directory node to be
+ investigated.
+
+ "rendezvous-service-descriptor" descriptor NL
+
+ [At end, Excatly once]
+
+ The hidden service descriptor that the supposedly bad directory node
+ does not serve.
+
+ The directory authority checks if the descriptor is valid and the hidden
+ service directory responsible for storing it. It waits for a random time
+ of up to 30 minutes before posting the descriptor to the hidden service
+ directory. If the publication is acknowledged, the directory authority
+ waits another random time of up to 30 minutes before attempting to
+ request the descriptor that it has posted. If the directory node replies
+ with 404 (Not found), it will be blacklisted for being a hidden service
+ directory node for the next 48 hours.
+
+ A blacklisted hidden service directory is assigned the new flag BadHSDir
+ instead of the HSDir flag in the vote that a directory authority creates.
+ In a consensus a relay is only assigned a HSDir flag if the majority of
+ votes contains a HSDir flag and no more than one third of votes contains
+ a BadHSDir flag. As a result, clients do not have to learn about the
+ BadHSDir flag. A blacklisted directory node will simply not be assigned
+ the HSDir flag in the consensus.
+
+ In order to prevent an attacker from setting up new nodes as replacement
+ for blacklisted directory nodes, all directory nodes in the same /24
+ subnet are blacklisted, too. Furthermore, if two or more directory nodes
+ are blacklisted in the same /16 subnet concurrently, all other directory
+ nodes in that /16 subnet are blacklisted, too. Blacklisting holds for at
+ most 48 hours.
+
+ 2. Publish Fewer Replicas
+
+ The evaluation has shown that the probability of a directory node to
+ serve a previously stored descriptor is 85.7% (more precisely, this is
+ the 0.001-quantile of the empirical distribution with the rationale that
+ it holds for 99.9% of all empirical cases). If descriptors are replicated
+ to x directory nodes, the probability of at least one of the replicas to
+ be available for clients is 1 - (1 - 85.7%) ^ x. In order to achieve an
+ overall availability of 99.9%, x = 3.55 replicas need to be stored. From
+ this follows that 4 replicas are sufficient, rather than the currently
+ stored 6 replicas.
+
+ Further, the current design stores 2 sets of descriptors on 3 directory
+ nodes with consecutive identities. Originally, this was meant to
+ facilitate replication between directory nodes, which has not been and
+ will not be implemented (the selection criterion of 24 hours uptime does
+ not make it necessary). As a result, storing descriptors on directory
+ nodes with consecutive identities is not required. In fact it should be
+ avoided to enable an attacker to create "black holes" in the identifier
+ ring.
+
+ Hidden services should store their descriptors on 4 non-consecutive
+ directory nodes, and clients should request descriptors from these
+ directory nodes only. For compatibility reasons, hidden services also
+ store their descriptors on 2 consecutive directory nodes. Hence, 0.2.0.x
+ clients will be able to retrieve 4 out of 6 descriptors, but will fail
+ for the remaining 2 descriptors, which is sufficient for reliability. As
+ soon as 0.2.0.x is deprecated, hidden services can stop publishing the
+ additional 2 replicas.
+
+ 3. Change Default Value of Being Hidden Service Directory
+
+ The requirements for becoming a hidden service directory node are an open
+ directory port and an uptime of at least 24 hours. The evaluation has
+ shown that there are 300 hidden service directory candidates in the mean,
+ but only 6 of them are configured to act as hidden service directories.
+ This is bad, because those 6 nodes need to serve a large share of all
+ hidden service descriptors. Optimally, there should be hundreds of hidden
+ service directories. Having a large number of 0.2.1.x directory nodes
+ also has a positive effect on 0.2.0.x hidden services and clients.
+
+ Therefore, the new default of HidServDirectoryV2 should be 1, so that a
+ Tor relay that has an open directory port automatically accepts and
+ serves v2 hidden service descriptors. A relay operator can still opt-out
+ running a hidden service directory by changing HidServDirectoryV2 to 0.
+ The additional bandwidth requirements for running a hidden service
+ directory node in addition to being a directory cache are negligible.
+
+ 4. Make Descriptors Persistent on Directory Nodes
+
+ Hidden service directories that are restarted by their operators or after
+ a failure will not be selected as hidden service directories within the
+ next 24 hours. However, some clients might still think that these nodes
+ are responsible for certain descriptors, because they work on the basis
+ of network consensuses that are up to three hours old. The directory
+ nodes should be able to serve the previously received descriptors to
+ these clients. Therefore, directory nodes make all received descriptors
+ persistent and load previously received descriptors on startup.
+
+ 5. Store and Serve Descriptors Regardless of Responsibility
+
+ Currently, directory nodes only accept descriptors for which they think
+ they are responsible. This may lead to problems when a directory node
+ uses an older or newer network consensus than hidden service or client
+ or when a directory node has been restarted recently. In fact, there are
+ no security issues in storing or serving descriptors for which a
+ directory node thinks it is not responsible. To the contrary, doing so
+ may improve reliability in border cases. As a result, a directory node
+ does not pay attention to responsibilty when receiving a publication or
+ fetch request, but stores or serves the requested descriptor. Likewise,
+ the directory node does not remove descriptors when it thinks it is not
+ responsible for them any more.
+
+ 6. Avoid Periodic Descriptor Re-Publication
+
+ In the current implementation a hidden service re-publishes its
+ descriptor either when its content changes or an hour elapses. However,
+ the evaluation has shown that failures of hidden service directory nodes,
+ i.e. of nodes that have not failed within the last 24 hours, are very
+ rare. Together with making descriptors persistent on directory nodes,
+ there is no necessity to re-publish descriptors hourly.
+
+ The only two events leading to descriptor re-publication should be a
+ change of the descriptor content and a new directory node becoming
+ responsible for the descriptor. Hidden services should therefore consider
+ re-publication every time they learn about a new network consensus
+ instead of hourly.
+
+ 7. Discard Expired Descriptors
+
+ The current implementation lets directory nodes keep a descriptor for two
+ days before discarding it. However, with the v2 design, descriptors are
+ only valid for at most one day. Directory nodes should determine the
+ validity of stored descriptors and discard them one hour after they have
+ expired (to compensate wrong clocks on clients).
+
+ 8. Shorten Client-Side Descriptor Fetch History
+
+ When clients try to download a hidden service descriptor, they memorize
+ fetch requests to directory nodes for up to 15 minutes. This allows them
+ to request all replicas of a descriptor to avoid bad or failing directory
+ nodes, but without querying the same directory node twice.
+
+ The downside is that a client that has requested a descriptor without
+ success, will not be able to find a hidden service that has been started
+ during the following 15 minutes after the client's last request.
+
+ This can be improved by shortening the fetch history to only 5 minutes.
+ This time should be sufficient to complete requests for all replicas of a
+ descriptor, but without ending in an infinite request loop.
+
+Compatibility:
+
+ All proposed improvements are compatible to the currently implemented
+ design as described in proposal 114.
+