```
Filename: 278-directory-compression-scheme-negotiation.txt
Title: Directory Compression Scheme Negotiation
Author: Alexander Færøy
Created: 2017-03-06
Status: Closed
Implemented-In: 0.3.1.1-alpha

0. Overview

  This document describes a method to provide and use different
  compression schemes in Tor's directory specification[0], leaving it
  up to the client and server to negotiate a mutually supported scheme
  using the semantics of the HTTP protocol.

  Furthermore, this proposal extends Tor's directory protocol with
  support for the LZMA and Zstandard compression schemes.

1. Motivation

  Currently, Tor serves each of its directory document flavours either
  uncompressed or, if the client appends a ".z" suffix to the URL file
  path, as a zlib-compressed document.

  This has historically not been a problem, but it prevents us from
  easily extending the set of supported compression schemes.

  Some of the problems this proposal tries to address:

    - We currently only support zlib-based compression schemes and there
      is no way for directory servers or clients to announce which
      compression schemes they support. Zlib might not be the ideal
      compression scheme for all purposes.

    - It is not easily possible to add support for additional
      compression schemes without adding additional file extensions or
      flavours of the directory documents.

    - In low-bandwidth and/or low-memory client scenarios, it is useful
      to be able to limit the number of supported compression schemes,
      so that a client supports only the most efficient compression
      scheme for its use case while directory servers support the most
      commonly available compression schemes used throughout the
      network.

    - We add support for the LZMA compression scheme, which yields
      better compressed size and decompression time at the expense of
      higher compression time and higher memory usage.

    - We add support for the Zstandard compression scheme, which yields
      a better compression ratio than GZip, though slightly worse than
      LZMA, with a smaller CPU and memory footprint than LZMA.

2. Analysis

  We investigated the compression ratio, memory usage, memory allocation
  strategies, and execution time for compression and decompression of
  the GZip, BZip2, LZMA, and Zstandard compression schemes at
  compression levels 1 through 9.

  The data used in this analysis can be found in [1] and the `bench`
  tool for generating the data can be found in [2].

  During the preparation of this proposal, Nick analysed compressing
  consensus diffs using GZip, LZMA, and Zstandard. The results of
  Nick's analysis can be found in [3].

  We must continue to support "gzip", "deflate", and "identity", which
  are the compression schemes currently available in the Tor network.

  To further improve the compression ratio, Nick has also worked on
  proposals #274 (Rotate onion keys less frequently), #275 (Stop
  including meaningful "published" time in microdescriptor consensus),
  #276 (Report bandwidth with lower granularity in consensus documents),
  and #277 (Detect multiple relay instances running with same ID), all
  of which help make our consensus documents less dynamic.

3. Proposal

  We extend directory client requests to include an "Accept-Encoding"
  header. The "Accept-Encoding" header should contain a comma-separated
  list of the names of the compression schemes that the client
  supports.

  For example:

    GET / HTTP/1.0
    Accept-Encoding: x-zstd, x-tor-lzma, gzip, deflate

  When a directory server receives a request that includes the
  "Accept-Encoding" header, for either the ".z"-compressed or the
  uncompressed version of any given document, it must decide on a
  mutually supported compression scheme and add the "Content-Encoding"
  header to its response, thereby notifying the client of its decision.
  The "Content-Encoding" header can contain at most one supported
  compression scheme. If no mutual compression scheme can be negotiated,
  the server must respond with the HTTP error status code 406
  "Not Acceptable".

  For example:

    HTTP/1.0 200 OK
    Content-Length: 1337
    Connection: close
    Content-Encoding: x-zstd

  Currently supported compression scheme names include "identity",
  "gzip", and "deflate". This proposal adds two additional compression
  schemes, named "x-tor-lzma" (LZMA) and "x-zstd" (Zstandard).

  All compression scheme names are case-insensitive.

  The "deflate", "gzip", and "identity" compression schemes must be
  supported by directory servers for backwards compatibility.

  We use the name "x-tor-lzma" instead of just "x-lzma" because we
  require a defined upper bound on the memory available for
  decompressing LZMA-compressed data. This upper bound is defined as
  16 MB. This currently means that we will not use the LZMA compression
  scheme with a "preset" value higher than 6.

  Additionally, when a client that supports this proposal requests a
  directory document with the ".z" suffix, it must send an ordered list
  of supported compression schemes whose last elements are the
  compression schemes supported by all currently available Tor nodes
  ("gzip", "deflate", "identity"). This way, older relays will simply
  respond with the document compressed using zlib deflate, without any
  prior knowledge of the newly added compression schemes.

  If a directory server receives a request for a document with the
  ".z" suffix, and the client does not include an "Accept-Encoding"
  header, the server should respond with the zlib-compressed version of
  the document, for backwards compatibility with clients that do not
  support this proposal.

  The "Content-Length" header contains the number of compressed bytes
  sent to the client.

  The new compression schemes will be available for directory clients
  over both clearnet and BEGIN_DIR-style connections.

4. Security Implications

4.1 Compression and Decompression Bombs

  We currently detect compression and decompression "bombs" and must
  continue to do so with any additional compression schemes that we add.

  The detection of compression and decompression bombs is handled in
  `is_compression_bomb()` in torgzip.c, and the same functionality is
  used for both compression and decompression. This function must be
  extended to support LZMA and Zstandard.

4.2 Detection of Compression Algorithms

  To ensure that we do not pass compressed data received from another
  peer through the wrong decompression handler, Tor tries to detect the
  compression scheme in `detect_compression_method()` in torgzip.c.
  This function should be extended to also detect the LZMA and
  Zstandard formats. Possible references for implementing this
  detection include XZ Utils, zstd's CLI, and the libmagic 'compress'
  module.

4.3 Fingerprinting

  All clients should aim to support the same set of compression
  schemes to avoid fingerprinting.

5. Compatibility

  This proposal does not break any backwards compatibility.

  Tor will continue to support serving uncompressed and zlib-compressed
  objects using the method defined in the directory specification[0],
  but will allow newer clients to negotiate a mutually supported
  compression scheme.

6. Performance and Scalability

  Each newly added compression scheme adds to the compression cache of a
  relay, which increases the memory requirements of a relay.

  The LZMA compression scheme yields better compression ratio at the
  expense of higher memory and CPU requirements for compression and
  slightly higher memory and CPU requirements for decompression.

  The Zstandard compression scheme yields better compression ratio than
  GZip does, but does not suffer from the same high CPU and memory
  requirements for compression as LZMA does.

  Because of LZMA's high CPU and memory requirements, we might not
  support this scheme for all available documents, or we might only
  support it in situations where the compressed document can be
  pre-computed and cached.

7. References

  [0]: https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt
  [1]: https://docs.google.com/spreadsheets/d/1devQlUOzMPStqUl9mPawFWP99xSsRM8xWv7DNcqjFdo
  [2]: https://gitlab.com/ahf/tor-sponsor4-compression
  [3]: https://github.com/nmathewson/consensus-diff-analysis

8. Acknowledgements

  This research was supported in part by NSF grants CNS-1111539,
  CNS-1314637, CNS-1526306, CNS-1619454, and CNS-1640548.

```
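Section 3 of the proposal leaves the choice of a mutually supported scheme to the directory server. The following is a minimal C sketch, not Tor's implementation, of one way a server could apply those rules. It rests on several assumptions:

- The server honours the client's ordering, taking the first listed scheme it supports.
- Any `;q=` parameters are ignored.
- The `server_schemes` table and the `negotiate_encoding()` helper are hypothetical.

A `NULL` result means the server should answer 406 "Not Acceptable".

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>

/* Hypothetical table of schemes this server can produce (canonical lowercase names). */
static const char *const server_schemes[] = {
  "x-zstd", "x-tor-lzma", "gzip", "deflate", "identity"
};

/* Return the index of a case-insensitive match for name[0..len), or -1. */
static int server_scheme_index(const char *name, size_t len)
{
  assert(name != NULL);
  for (size_t i = 0; i < sizeof(server_schemes) / sizeof(server_schemes[0]); i++) {
    if (strlen(server_schemes[i]) == len &&
        strncasecmp(server_schemes[i], name, len) == 0)
      return (int)i;
  }
  return -1;
}

/* Pick the response encoding for one request (hypothetical helper).
 * accept_encoding: value of the Accept-Encoding header, or NULL if absent.
 * has_z_suffix:    nonzero if the requested URL path ends in ".z".
 * Returns a canonical scheme name, or NULL if the server should reply 406. */
static const char *negotiate_encoding(const char *accept_encoding, int has_z_suffix)
{
  /* No header: keep the pre-proposal behaviour (".z" means zlib deflate). */
  if (accept_encoding == NULL)
    return has_z_suffix ? "deflate" : "identity";

  const char *p = accept_encoding;
  while (*p != '\0') {
    /* Skip list separators and whitespace before the next token. */
    while (*p == ',' || isspace((unsigned char)*p))
      p++;
    const char *start = p;
    /* The scheme name ends at a comma, a parameter, or whitespace. */
    while (*p != '\0' && *p != ',' && *p != ';' && !isspace((unsigned char)*p))
      p++;
    size_t len = (size_t)(p - start);
    /* Ignore any parameters such as ";q=0.5" up to the next comma. */
    while (*p != '\0' && *p != ',')
      p++;
    /* Honour the client's order: the first scheme we support wins. */
    if (len > 0) {
      int idx = server_scheme_index(start, len);
      if (idx >= 0)
        return server_schemes[idx];
    }
  }
  return NULL; /* No mutually supported scheme. */
}
```

Some example results:

- `negotiate_encoding("x-zstd, x-tor-lzma, gzip, deflate", 1)` returns `"x-zstd"`.
- A `.z` request without the header falls back to `"deflate"`.
- `"br, compress"` yields `NULL`, which means 406.

Honouring the client's order keeps the ordered-list rule for `.z` requests meaningful. A client places the universally supported schemes last, so they are chosen only when nothing better is shared.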
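Section 4.1 of the proposal requires the bomb check to cover the new schemes. A ratio check needs only byte counts. So one way to keep it scheme-agnostic is for every backend (zlib, LZMA, Zstandard) to feed its running totals into a single function.

The sketch below assumes that design. Its names, `compression_bomb_check()` and `stream_totals_t`, are hypothetical. Its limits are illustrative assumptions, not values taken from the proposal:

- a 25x expansion ratio;
- enforced only once 64 KiB of uncompressed data exist.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative limits (assumptions, not taken from the proposal). */
#define MAX_EXPANSION_FACTOR 25                     /* uncompressed/compressed ratio */
#define CHECK_BOMB_AFTER_BYTES ((size_t)64 * 1024)  /* ignore small outputs */

/* Return 1 if the uncompressed size is implausibly large relative to the
 * compressed size. Only byte counts are used, so the same check applies to
 * zlib, LZMA and Zstandard streams alike. */
static int compression_bomb_check(size_t compressed_len, size_t uncompressed_len)
{
  if (compressed_len == 0 || uncompressed_len < CHECK_BOMB_AFTER_BYTES)
    return 0;
  return uncompressed_len / compressed_len > MAX_EXPANSION_FACTOR;
}

/* Hypothetical running totals that any backend could maintain per stream. */
typedef struct {
  size_t compressed;    /* compressed bytes seen so far */
  size_t uncompressed;  /* uncompressed bytes seen so far */
} stream_totals_t;

/* Add one chunk's byte counts and report whether the stream now looks like a bomb. */
static int stream_account(stream_totals_t *t, size_t compressed_chunk,
                          size_t uncompressed_chunk)
{
  assert(t != NULL);
  t->compressed += compressed_chunk;
  t->uncompressed += uncompressed_chunk;
  return compression_bomb_check(t->compressed, t->uncompressed);
}
```

The counts are passed as (compressed, uncompressed) rather than (input, output). That lets the same helper be called from both the compression and decompression paths, as Section 4.1 describes.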
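Section 4.2 of the proposal suggests extending `detect_compression_method()` to recognise LZMA and Zstandard. A minimal sketch of magic-byte sniffing follows; `sniff_compression_method()` and its enum are hypothetical names. The signatures it checks are:

- LZMA: assumed to be carried in the `.xz` container, magic `FD 37 7A 58 5A 00`. A different container, such as the legacy `.lzma` format, has no fixed magic and would need another heuristic.
- Zstandard: frames begin with the magic number `0xFD2FB528`, stored little-endian.
- gzip: `1F 8B`.
- zlib (RFC 1950): a two-byte header whose big-endian value is a multiple of 31.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result type; Tor's own enum may differ. */
typedef enum {
  METHOD_UNKNOWN = 0,
  METHOD_GZIP,
  METHOD_ZLIB,
  METHOD_XZ,   /* LZMA in the .xz container (assumption) */
  METHOD_ZSTD
} compress_method_t;

/* Guess the compression format of a buffer from its leading bytes. */
static compress_method_t sniff_compression_method(const unsigned char *in, size_t len)
{
  static const unsigned char gzip_magic[] = { 0x1f, 0x8b };
  static const unsigned char xz_magic[] = { 0xfd, '7', 'z', 'X', 'Z', 0x00 };
  /* 0xFD2FB528 stored little-endian. */
  static const unsigned char zstd_magic[] = { 0x28, 0xb5, 0x2f, 0xfd };

  assert(in != NULL || len == 0);

  if (len >= sizeof(gzip_magic) && memcmp(in, gzip_magic, sizeof(gzip_magic)) == 0)
    return METHOD_GZIP;
  if (len >= sizeof(xz_magic) && memcmp(in, xz_magic, sizeof(xz_magic)) == 0)
    return METHOD_XZ;
  if (len >= sizeof(zstd_magic) && memcmp(in, zstd_magic, sizeof(zstd_magic)) == 0)
    return METHOD_ZSTD;
  /* zlib: CM nibble is 8 (deflate) and CMF*256 + FLG is a multiple of 31.
   * This is the weakest check, so it runs last. */
  if (len >= 2 && (in[0] & 0x0f) == 8 && ((in[0] << 8) | in[1]) % 31 == 0)
    return METHOD_ZLIB;
  return METHOD_UNKNOWN;
}
```

The zlib test runs last because any two bytes that happen to satisfy its checksum will match it. The fixed magic numbers are therefore given priority.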