Diffstat (limited to 'proposals/275-md-published-time-is-silly.txt')
-rw-r--r-- proposals/275-md-published-time-is-silly.txt 119
1 files changed, 119 insertions, 0 deletions
diff --git a/proposals/275-md-published-time-is-silly.txt b/proposals/275-md-published-time-is-silly.txt
new file mode 100644
index 0000000..b23e747
--- /dev/null
+++ b/proposals/275-md-published-time-is-silly.txt
@@ -0,0 +1,119 @@
+Filename: 275-md-published-time-is-silly.txt
+Title: Stop including meaningful "published" time in microdescriptor consensus
+Author: Nick Mathewson
+Created: 20-Feb-2017
+Status: Open
+Target: 0.3.1.x-alpha
+
+1. Overview
+
+ This document proposes that, in order to limit the bandwidth needed
+ for networkstatus diffs, we remove the "published" part of the "r" lines
+ in microdescriptor consensuses.
+
+ The more extreme, compatibility-breaking version of this idea will
+ reduce ed-style consensus diff download volume by approximately 55-75%. A
+ less-extreme interim version would still reduce volume by
+ approximately 5-6%.
+
+2. Motivation
+
+ The current microdescriptor consensus "r" line format is:
+ r Nickname Identity Published IP ORPort DirPort
+ as in:
+ r moria1 lpXfw1/+uGEym58asExGOXAgzjE 2017-01-10 07:59:25 \
+ 128.31.0.34 9101 9131
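
To make the field layout concrete, here is a minimal Python sketch that splits an "r" line of the format shown above into its parts. The names RLine and parse_r_line are hypothetical and are not taken from Tor's source; the sketch also assumes the Published date and time are in UTC and that the line has already been joined onto a single line.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class RLine(NamedTuple):
    """Hypothetical container for one microdesc consensus "r" line."""
    nickname: str
    identity: str
    published: datetime
    ip: str
    or_port: int
    dir_port: int


def parse_r_line(line: str) -> RLine:
    """Split an "r" line into fields; Published spans two tokens."""
    fields = line.split()
    if len(fields) != 8 or fields[0] != "r":
        raise ValueError("not a microdesc consensus r line: %r" % line)
    _, nickname, identity, date, clock, ip, or_port, dir_port = fields
    # Assumption: the Published timestamp is expressed in UTC.
    published = datetime.strptime(
        date + " " + clock, "%Y-%m-%d %H:%M:%S"
    ).replace(tzinfo=timezone.utc)
    return RLine(nickname, identity, published, ip, int(or_port), int(dir_port))


example = parse_r_line(
    "r moria1 lpXfw1/+uGEym58asExGOXAgzjE 2017-01-10 07:59:25 "
    "128.31.0.34 9101 9131"
)
```

The Published value occupies two of the eight whitespace-separated tokens, which is part of why it is worth making more compressible.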
+
+ As I'll show below, there's not much use for the "Published" part
+ of these lines. By omitting it or replacing it with
+ something more compressible, we can save space.
+
+ What's more, changes in the Published field are one of the most
+ frequent changes between successive networkstatus consensus
+ documents. If we were to remove this field, then networkstatus diffs
+ (see proposal 140) would be smaller.
+
+3. Compatibility notes
+
+ Above I've talked about "removing" the published field. But of
+ course, doing this would make all existing consensus consumers
+ stop parsing the consensus successfully.
+
+ Instead, let's look at how this field is used currently in Tor,
+ and see if we can replace the value with something else.
+
+ * Published is used in the voting process to decide which
+ descriptor should be considered. But that is taken from
+ vote networkstatus documents, not consensuses.
+
+ * Published is used in mark_my_descriptor_dirty_if_too_old()
+ to decide whether to upload a new router descriptor. If the
+ published time in the consensus is more than 18 hours in the
+ past, we upload a new descriptor. (Relays are potentially
+ looking at the microdesc consensus now, since #6769 was
+ merged in 0.3.0.1-alpha.) Relays have plenty of other ways
+ to notice that they should upload new descriptors.
+
+ * Published is used in client_would_use_router() to decide
+ whether a routerstatus is one that we might possibly use.
+ We say that a routerstatus is not usable if its published
+ time is more than OLD_ROUTER_DESC_MAX_AGE (5 days) in the
+ past, or if it is not at least
+ TestingEstimatedDescriptorPropagationTime (10 minutes) in
+ the future. [***] Note that this is the only case where anything
+ is rejected because it comes from the future.
+
+ * In routerlist.c, client_would_use_router() decides whether
+ we should download a router descriptor (not a
+ microdescriptor).
+
+ * client_would_use_router() is used from
+ count_usable_descriptors() to decide which relays are
+ potentially usable, thereby forming the denominator of
+ our "have descriptors / usable relays" fraction.
+
+ So we have fairly limited constraints on which Published values
+ we can safely advertise with today's Tor implementations. If we
+ advertise anything more than 10 minutes in the future,
+ client_would_use_router() will consider routerstatuses unusable.
+ If we advertise anything more than 18 hours in the past, relays
+ will upload their descriptors far too often.
+
+4. Proposal
+
+ Immediately, in 0.2.9.x-stable (our LTS release series), we
+ should stop caring about published_on dates in the future. This
+ is a two-line change.
+
+ As an interim solution: We should add a new consensus method number
+ that changes the process by which Published fields in consensuses are
+ generated. Under this method, all Published fields in the consensus
+ should be set to the same value. This value should rotate every 15
+ hours: take the consensus valid-after time and round it down to the
+ nearest multiple of 15 hours since the epoch.
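
The rounding rule for the interim solution amounts to a single modulo operation on the valid-after time. The following Python sketch assumes times are handled as integer seconds since the Unix epoch; the function name interim_published and the constant name are hypothetical.

```python
import calendar

ROTATION_SECONDS = 15 * 60 * 60  # 15 hours, as proposed above


def interim_published(valid_after: int) -> int:
    """Round a valid-after time (epoch seconds) down to a 15-hour multiple."""
    return valid_after - (valid_after % ROTATION_SECONDS)


# Illustrative valid-after time: 2017-02-20 13:00:00 UTC.
valid_after = calendar.timegm((2017, 2, 20, 13, 0, 0))
published = interim_published(valid_after)
age_at_valid_after = valid_after - published
```

Every consensus whose valid-after time falls in the same 15-hour bucket gets an identical Published value, so successive consensuses differ in this field only when the bucket rolls over. The result is never more than 15 hours older than valid-after, which is under the 18-hour threshold discussed in section 3.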
+
+ As a longer-term solution: Once all Tor versions earlier than 0.2.9.x
+ are obsolete (in mid 2018), we can update with a new consensus
+ method, and set the published_on date to some safe time in the
+ future.
+
+5. Analysis
+
+ To consider the impact on consensus diffs: I analyzed consensus
+ changes over the month of January 2017, using scripts at [1].
+
+ With the interim solution in place, compressed diff sizes fell by
+ 2-7% at all measured intervals except 12 hours, where they increased
+ by about 4%. Savings of 5-6% were most typical.
+
+ With the longer-term solution in place, and all published times held
+ constant permanently, the compressed diff sizes were uniformly at
+ least 56% smaller.
+
+ With this in mind, I think we might want to only plan to support the
+ longer-term solution.
+
+ [1] https://github.com/nmathewson/consensus-diff-analysis
+
+
+