Filename: 276-lower-bw-granularity.txt
Title: Report bandwidth with lower granularity in consensus documents
Author: Nick Mathewson
Created: 20-Feb-2017
Status: Open
Target: 0.3.1.x-alpha

1. Overview

   This document proposes that, in order to limit the bandwidth needed for
   networkstatus diffs, we lower the granularity with which bandwidth is
   reported in consensus documents.

   Making this change will reduce the total compressed ed diff download
   volume by around 10%.

2. Motivation

   Consensus documents currently report bandwidth values as the median
   of the measured bandwidth values in the votes.  (Or as the median of
   all votes' values if there are not enough measurements.)  And when
   voting, in turn, authorities simply report whatever measured value
   they most recently encountered, clipped to 3 significant base-10
   figures.

   This means that, from one consensus to the next, these weights change
   very often and with little significance:  A large fraction of
   bandwidth transitions are under 2% in magnitude.

   As we begin to use consensus diffs, each change will take space to
   transmit.  So reducing the number of changes will lower client
   bandwidth requirements significantly.

3. Proposal

   I propose that we round the bandwidth values as they are placed in
   the votes to no more than two significant digits.  In addition, for
   values beginning with decimal "2" through "4", we should round the
   first two digits to the nearest multiple of 2.  For values beginning
   with decimal "5" through "9", we should round the first two digits
   to the nearest multiple of 5.
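
   The rounding rule above can be sketched in Python as follows.  This
   is an illustration only, not the code Tor uses.  The function name
   round_bandwidth is hypothetical, and two details are assumptions
   that this proposal does not specify: half-way cases round up, and
   values below 10 (which have fewer than two digits) are left
   unchanged.

```python
def round_bandwidth(value: int) -> int:
    """Round a non-negative bandwidth value as sketched in Section 3.

    Assumptions (not fixed by the proposal): half-way cases round up,
    and values below 10 are returned unchanged.
    """
    if value < 0:
        raise ValueError("bandwidth must be non-negative")
    if value < 10:
        return value  # assumption: too few digits to round

    # scale is the place value of the second significant digit, so
    # value // scale is the two-digit prefix (10..99).
    scale = 10 ** (len(str(value)) - 2)
    leading_digit = value // (scale * 10)

    if leading_digit == 1:
        step = 1  # plain rounding to two significant digits
    elif leading_digit <= 4:
        step = 2  # "2" through "4": multiples of 2
    else:
        step = 5  # "5" through "9": multiples of 5

    # Round to the nearest multiple of unit, with ties going up.
    unit = step * scale
    return (value + unit // 2) // unit * unit


if __name__ == "__main__":
    for bw in (7, 23, 1234, 2345, 5678, 999):
        print(bw, "->", round_bandwidth(bw))
```

   Under these assumptions, 1234 becomes 1200, 2345 becomes 2400, and
   5678 becomes 5500.  A value such as 999 rounds up across a digit
   boundary to 1000.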

   This change does not require a consensus method; it will take effect
   once enough authorities have upgraded.

4. Analysis

   The rounding proposed above will not round any value by more than
   5%, so the overall impact on bandwidth balancing should be small.
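
   The 5% bound can be checked by brute force.  The sketch below
   restates the Section 3 rule in a hypothetical helper,
   round_bandwidth, under two assumptions that the proposal does not
   fix (half-way cases round up; values below 10 are left unchanged),
   and then measures the largest relative change over a range of
   inputs.  The range limit is arbitrary.

```python
def round_bandwidth(value: int) -> int:
    # Hypothetical reading of the Section 3 rule.  Assumptions:
    # ties round up, and values below 10 are left unchanged.
    if value < 10:
        return value
    scale = 10 ** (len(str(value)) - 2)  # place of the 2nd digit
    lead = value // (scale * 10)         # leading decimal digit
    step = 1 if lead == 1 else 2 if lead <= 4 else 5
    unit = step * scale
    return (value + unit // 2) // unit * unit


def worst_relative_change(limit: int) -> float:
    """Largest |rounded - v| / v over 1 <= v < limit."""
    return max(abs(round_bandwidth(v) - v) / v for v in range(1, limit))


if __name__ == "__main__":
    print(f"{worst_relative_change(100_000):.4%}")
```

   Under this reading the worst case is a little under 5%: for
   example, 21 rounds to 22, a change of 1/21.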

   In order to assess the bandwidth savings of this approach, I
   smoothed the January 2017 consensus documents' Bandwidth fields,
   using scripts from [1].  I found that if clients download
   consensus diffs once an hour, they can expect 11-13% mean savings
   after xz or gz compression.  For two-hour intervals, the savings
   is 8-10%; for three-hour or four-hour intervals, the savings is
   only 6-8%.  After that point, we start seeing diminishing returns,
   with only 1-2% savings on a 72-hour interval's diff.

    [1] https://github.com/nmathewson/consensus-diff-analysis

5. Open questions

   Is there a greedier smoothing algorithm that would produce better
   results?

   Is there any reason to think this amount of smoothing would not
   be safe?

   Would a time-aware smoothing mechanism work better?