Filename: 275-md-published-time-is-silly.txt
Title: Stop including meaningful "published" time in microdescriptor consensus
Author: Nick Mathewson
Created: 20-Feb-2017
Status: Closed
Target: 0.3.1.x-alpha
Implemented-In: 0.4.7.3-alpha

0. Status:

   As of 0.2.9.11 / 0.3.0.7 / 0.3.1.1-alpha, Tor no longer takes any
   special action on "future" published times, as proposed in section 4.

   As of 0.4.0.1-alpha, we implemented a better mechanism for relays to
   know when to publish. (See proposal 293.)

1. Overview

   This document proposes that, in order to limit the bandwidth needed
   for networkstatus diffs, we remove the "published" part of the "r"
   lines in microdescriptor consensuses.

   The more extreme, compatibility-breaking version of this idea will
   reduce ed consensus diff download volume by approximately 55-75%. A
   less-extreme interim version would still reduce volume by
   approximately 5-6%.

2. Motivation

   The current microdescriptor consensus "r" line format is:

     r Nickname Identity Published IP ORPort DirPort

   as in:

     r moria1 lpXfw1/+uGEym58asExGOXAgzjE 2017-01-10 07:59:25 \
        128.31.0.34 9101 9131

   As I'll show below, there's not much use for the "Published" part of
   these lines. By omitting it or replacing it with something more
   compressible, we can save space.

   What's more, changes in the Published field are among the most
   frequent changes between successive networkstatus consensus
   documents. If we were to remove this field, then networkstatus diffs
   (see proposal 140) would be smaller.

3. Compatibility notes

   Above I've talked about "removing" the published field. But of
   course, doing this would make all existing consensus consumers stop
   parsing the consensus successfully.

   Instead, let's look at how this field is used currently in Tor, and
   see if we can replace the value with something else.

   * Published is used in the voting process to decide which descriptor
     should be considered. But that is taken from vote networkstatus
     documents, not consensuses.
   * Published is used in mark_my_descriptor_dirty_if_too_old() to
     decide whether to upload a new router descriptor. If the published
     time in the consensus is more than 18 hours in the past, we upload
     a new descriptor. (Relays are potentially looking at the microdesc
     consensus now, since #6769 was merged in 0.3.0.1-alpha.) Relays
     have plenty of other ways to notice that they should upload new
     descriptors.

   * Published is used in client_would_use_router() to decide whether a
     routerstatus is one that we might possibly use. We say that a
     routerstatus is not usable if its published time is more than
     OLD_ROUTER_DESC_MAX_AGE (5 days) in the past, or if it is not at
     least TestingEstimatedDescriptorPropagationTime (10 minutes) in
     the future. [***] Note that this is the only case where anything
     is rejected because it comes from the future.

       * client_would_use_router() decides whether we should download a
         router descriptor (not a microdescriptor) in routerlist.c

       * client_would_use_router() is used from
         count_usable_descriptors() to decide which relays are
         potentially usable, thereby forming the denominator of our
         "have descriptors / usable relays" fraction.

   So we have fairly limited constraints on which Published values we
   can safely advertise with today's Tor implementations. If we
   advertise anything more than 10 minutes in the future,
   client_would_use_router() will consider routerstatuses unusable. If
   we advertise anything more than 18 hours in the past, relays will
   upload their descriptors far too often.

4. Proposal

   Immediately, in 0.2.9.x-stable (our LTS release series), we should
   stop caring about published_on dates in the future. This is a
   two-line change.

   As an interim solution: We should add a new consensus method number
   that changes the process by which Published fields in consensuses
   are generated. It should set all Published fields in the consensus
   to be the same value.
   This shared value should rotate every 15 hours: take the consensus
   valid-after time and round it down to the nearest multiple of 15
   hours since the epoch.

   As a longer-term solution: Once all Tor versions earlier than 0.2.9.x
   are obsolete (in mid 2018), we can update with a new consensus
   method, and set the published_on date to some safe time in the
   future.

5. Analysis

   To consider the impact on consensus diffs: I analyzed consensus
   changes over the month of January 2017, using scripts at [1].

   With the interim solution in place, compressed diff sizes fell by
   2-7% at all measured intervals except 12 hours, where they increased
   by about 4%. Savings of 5-6% were most typical.

   With the longer-term solution in place, and all published times held
   constant permanently, the compressed diff sizes were uniformly at
   least 56% smaller.

   With this in mind, I think we may want to plan to support only the
   longer-term solution.

   [1] https://github.com/nmathewson/consensus-diff-analysis
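
Illustration of the interim rule in section 4: the sketch below rounds a
consensus valid-after time down to a multiple of 15 hours since the Unix
epoch. It is a minimal sketch, not Tor's implementation. The names
interim_published_time, format_published, and ROTATION_INTERVAL are
hypothetical, and the input is assumed to be a timezone-aware UTC datetime.

```python
from datetime import datetime, timezone

# 15 hours in seconds, the rotation period named in section 4.
# (Hypothetical constant name; not an identifier from the Tor source.)
ROTATION_INTERVAL = 15 * 60 * 60


def interim_published_time(valid_after: datetime) -> datetime:
    """Round valid_after down to a multiple of 15 hours since the epoch.

    Assumes valid_after is timezone-aware; a naive datetime would be
    interpreted in local time by timestamp().
    """
    seconds = int(valid_after.timestamp())
    rounded = seconds - (seconds % ROTATION_INTERVAL)
    return datetime.fromtimestamp(rounded, tz=timezone.utc)


def format_published(t: datetime) -> str:
    """Format a time the way the Published field appears in an "r" line."""
    return t.strftime("%Y-%m-%d %H:%M:%S")


if __name__ == "__main__":
    va = datetime(2017, 1, 10, 8, 0, 0, tzinfo=timezone.utc)
    print(format_published(interim_published_time(va)))
```

For example, a valid-after time of 2017-01-10 08:00:00 UTC rounds down to
2017-01-10 06:00:00, and every consensus with a valid-after time before
21:00:00 that day would carry the same value. The rounded value is never
later than valid-after and always less than 15 hours before it.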