From 1bd6007e3a801b8cee240678866282163831e49b Mon Sep 17 00:00:00 2001
From: Roger Dingledine
Date: Sun, 18 Jan 2009 18:57:20 +0000
Subject: move my microdescriptors proposal into slot 158

svn:r18172
---
 proposals/158-microdescriptors.txt | 207 +++++++++++++++++++++++++++++++++++++
 1 file changed, 207 insertions(+)
 create mode 100644 proposals/158-microdescriptors.txt

(limited to 'proposals/158-microdescriptors.txt')

diff --git a/proposals/158-microdescriptors.txt b/proposals/158-microdescriptors.txt
new file mode 100644
index 0000000..f478a3c
--- /dev/null
+++ b/proposals/158-microdescriptors.txt
@@ -0,0 +1,207 @@
+Filename: 158-microdescriptors.txt
+Title: Clients download consensus + microdescriptors
+Version: $Revision$
+Last-Modified: $Date$
+Author: Roger Dingledine
+Created: 17-Jan-2009
+Status: Open
+
+1. Overview
+
+  This proposal replaces section 3.2 of proposal 141, which was
+  called "Fetching descriptors on demand". Rather than modifying the
+  circuit-building protocol to fetch a server descriptor inline at each
+  circuit extend, we instead put all of the information that clients need
+  either into the consensus itself, or into a new set of data about each
+  relay called a microdescriptor. The microdescriptor is a direct
+  transform from the relay descriptor, so relays don't even need to know
+  this is happening.
+
+  Descriptor elements that are small and frequently changing should go
+  in the consensus itself, and descriptor elements that are small and
+  relatively static should go in the microdescriptor. If we ever end up
+  with descriptor elements that aren't small yet clients need to know
+  them, we'll need to resume considering some design like the one in
+  proposal 141.
+
+2. Motivation
+
+  See
+  http://archives.seul.org/or/dev/Nov-2008/msg00000.html and
+  http://archives.seul.org/or/dev/Nov-2008/msg00001.html and especially
+  http://archives.seul.org/or/dev/Nov-2008/msg00007.html
+  for a discussion of the options and why this is currently the best
+  approach.
+
+3. Design
+
+  There are three pieces to the proposal. First, authorities will list in
+  their votes (and thus in the consensus) what relay descriptor elements
+  are included in the microdescriptor, and also list the expected hash
+  of microdescriptor for each relay. Second, directory mirrors will serve
+  microdescriptors. Third, clients will ask for them and cache them.
+
+3.1. Consensus changes
+
+  V3 votes should include a new line:
+    microdescriptor-elements bar baz foo
+  listing each descriptor element (sorted alphabetically) that authority
+  included when it calculated its expected microdescriptor hashes.
+
+  We also need to include the hash of each expected microdescriptor in
+  the routerstatus section. I suggest a new "m" line for each stanza,
+  with the base64 of the hash of the elements that the authority voted
+  for above.
+
+  The consensus microdescriptor-elements and "m" lines are then computed
+  as described in Section 3.1.2 below.
+
+  I believe that means we need a new consensus-method "6" that knows
+  how to compute the microdescriptor-elements and add "m" lines.
+
+3.1.1. Descriptor elements to include for now
+
+  To start, the element list that authorities suggest should be
+    family onion-key
+
+  (Note that the or-dev posts above only mention onion-key, but if
+  we don't also include family then clients will never learn it. It
+  seemed like it should be relatively static, so putting it in the
+  microdescriptor is smarter than trying to fit it into the consensus.)
+
+  We could imagine a config option "family,onion-key" so authorities
+  could change their voted preferences without needing to upgrade.
+
+3.1.2. Computing consensus for microdescriptor-elements and "m" lines
+
+  One approach is for the consensus microdescriptor-elements line to
+  include every element listed by a majority of authorities, sorted. The
+  problem here is that it will no longer be deterministic what the correct
+  hash for the "m" line should be. We could imagine telling the authority
+  to go look in its descriptor and produce the right hash itself, but
+  we don't want consensus calculation to be based on external data like
+  that. (Plus, the authority may not have the descriptor that everybody
+  else voted to use.)
+
+  The better approach is to take the exact set that has the most votes
+  (breaking ties by the set that has the most elements, and breaking
+  ties after that by whichever is alphabetically first). That will
+  increase the odds that we actually get a microdescriptor hash that
+  is both a) for the descriptor we're putting in the consensus, and b)
+  over the elements that we're declaring it should be for.
+
+  Then the "m" line for a given relay is the one that gets the most votes
+  from authorities that both a) voted for the microdescriptor-elements
+  line we're using, and b) voted for the descriptor we're using.
+
+  (If there's a tie, use the smaller hash. But really, if there are
+  multiple such votes and they differ about a microdescriptor, we caught
+  one of them lying or being buggy. We should log it to track down why.)
+
+  If there are no such votes, then we leave out the "m" line for that
+  relay. That means clients should avoid it for this time period. (As
+  an extension it could instead mean that clients should fetch the
+  descriptor and figure out its microdescriptor themselves. But let's
+  not get ahead of ourselves.)
+
+  It would be nice to have a more foolproof way to agree on what
+  microdescriptor hash each authority should vote for, so we can avoid
+  missing "m" lines. Just switching to a new consensus-method each time
+  we change the set of microdescriptor-elements won't help though, since
+  each authority will still have to decide what hash to vote for before
+  knowing what consensus-method will be used.
+
+  Here's one way we could do it. Each vote / consensus includes
+  the microdescriptor-elements that were used to compute the hashes,
+  and also a preferred-microdescriptor-elements set. If an authority
+  has a consensus from the previous period, then it should use the
+  consensus preferred-microdescriptor-elements when computing its votes
+  for microdescriptor-elements and the appropriate hashes in the upcoming
+  period. (If it has no previous consensus, then it just writes its
+  own preferences in both lines.)
+
+3.2. Directory mirrors serve microdescriptors
+
+  Directory mirrors should then read the microdescriptor-elements line
+  from the consensus, and learn how to answer requests. (Directory mirrors
+  continue to serve normal relay descriptors too, a) to serve old clients
+  and b) to be able to construct microdescriptors on the fly.)
+
+  The microdescriptors with hashes ,, should be available at:
+    http:///tor/micro/d/++.z
+
+  All the microdescriptors from the current consensus should also be
+  available at:
+    http:///tor/micro/all.z
+  so a client that's bootstrapping doesn't need to send a 70KB URL just
+  to name every microdescriptor it's looking for.
+
+  The format of a microdescriptor is the header line
+    "microdescriptor-header"
+  followed by each element (keyword and body), alphabetically. There's
+  no need to mention what hash it's for, since it's self-identifying:
+  you can hash the elements to learn this.
+
+  (Do we need a footer line to show that it's over, or is the next
+  microdescriptor line or EOF enough of a hint? A footer line wouldn't
+  hurt much. Also, no fair voting for the microdescriptor-element
+  "microdescriptor-header".)
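A minimal sketch of the microdescriptor format described above, assuming a relay descriptor has already been parsed into a mapping from keyword to element text. The function names and the sample data are hypothetical, skipping elements that a relay lacks is an assumption, and SHA-256 is only a placeholder since the proposal does not name a hash function. Following the proposal, the hash covers the concatenated elements and leaves out the header line.

```python
import base64
import hashlib

HEADER = "microdescriptor-header"


def concatenated_elements(descriptor, wanted):
    """Join the wanted elements in alphabetical keyword order.

    `descriptor` maps a keyword to that element's full text (keyword line
    plus any body lines). Elements the relay lacks are skipped, which is
    an assumption; the proposal does not cover that case.
    """
    parts = []
    for keyword in sorted(set(wanted)):
        if keyword == HEADER:
            # The proposal rules out voting for the header as an element.
            raise ValueError("the header keyword may not be an element")
        if keyword in descriptor:
            parts.append(descriptor[keyword].rstrip("\n") + "\n")
    return "".join(parts)


def build_microdescriptor(descriptor, wanted):
    """Return the header line followed by the selected elements."""
    return HEADER + "\n" + concatenated_elements(descriptor, wanted)


def microdescriptor_hash(descriptor, wanted):
    """Base64 digest over the elements only, excluding the header line.

    SHA-256 is a placeholder choice, not something the proposal specifies.
    """
    data = concatenated_elements(descriptor, wanted).encode("ascii")
    return base64.b64encode(hashlib.sha256(data).digest()).decode("ascii")


# Made-up sample data; the key body below is not a real key.
relay = {
    "bandwidth": "bandwidth 1000 2000 1500",
    "family": "family relayA relayB",
    "onion-key": (
        "onion-key\n"
        "-----BEGIN RSA PUBLIC KEY-----\n"
        "AAAA\n"
        "-----END RSA PUBLIC KEY-----"
    ),
}

print(build_microdescriptor(relay, ["onion-key", "family"]))
```

Because the elements are sorted before they are joined, the order in which an authority lists them does not change the resulting hash.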
+
+  The hash of the microdescriptor is simply the hash of the concatenated
+  elements -- not counting the header line or hypothetical footer line.
+  Unless you prefer that?
+
+  Is there a reasonable way to version these things? We could say that
+  the microdescriptor-header line can contain arguments which clients
+  must ignore if they don't understand them. Any better ways?
+
+  Directory mirrors should check to make sure that the microdescriptors
+  they're about to serve match the right hashes (either the hashes from
+  the fetch URL or the hashes from the consensus, respectively).
+
+  We will probably want to consider some sort of smart data structure to
+  be able to quickly convert microdescriptor hashes into the appropriate
+  microdescriptor. Clients will want this anyway when they load their
+  microdescriptor cache and want to match it up with the consensus to
+  see what's missing.
+
+3.3. Clients fetch them and cache them
+
+  When a client gets a new consensus, it looks to see if there are any
+  microdescriptors it needs to learn. If it needs to learn more than
+  some threshold of the microdescriptors (half?), it requests 'all',
+  else it requests only the missing ones.
+
+  Clients maintain a cache of microdescriptors along with metadata like
+  when it was last referenced by a consensus. They keep a microdescriptor
+  until it hasn't been mentioned in any consensus for a week. Future
+  clients might cache them for longer or shorter times.
+
+3.3.1. Information leaks from clients
+
+  If a client asks you for a set of microdescs, then you know she didn't
+  have them cached before. How much does that leak? What about when
+  we're all using our entry guards as directory guards, and we've seen
+  that user make a bunch of circuits already?
+
+  Fetching "all" when you need at least half is a good first order fix,
+  but might not be all there is to it.
+
+  Another future option would be to fetch some of the microdescriptors
+  anonymously (via a Tor circuit).
+
+4. Transition and deployment
+
+  Phase one, the directory authorities should start voting on
+  microdescriptors and microdescriptor elements, and putting them in the
+  consensus. This should happen during the 0.2.1.x series, and should
+  be relatively easy to do.
+
+  Phase two, directory mirrors should learn how to serve them, and learn
+  how to read the consensus to find out what they should be serving. This
+  phase could be done either in 0.2.1.x or early in 0.2.2.x, depending
+  on how messy it turns out to be and how quickly we get around to it.
+
+  Phase three, clients should start fetching and caching them instead
+  of normal descriptors. This should happen post 0.2.1.x.
+
-- 
cgit v1.2.3-54-g00ecf