From ce09d266b817e4466f823f821a9ff8fbe0740b96 Mon Sep 17 00:00:00 2001 From: Nick Mathewson Date: Mon, 7 Aug 2017 13:39:27 -0400 Subject: Add my proposal-280 draft (privcount in tor) --- proposals/280-privcount-in-tor.txt | 332 +++++++++++++++++++++++++++++++++++++ 1 file changed, 332 insertions(+) create mode 100644 proposals/280-privcount-in-tor.txt (limited to 'proposals/280-privcount-in-tor.txt') diff --git a/proposals/280-privcount-in-tor.txt b/proposals/280-privcount-in-tor.txt new file mode 100644 index 0000000..c51eb33 --- /dev/null +++ b/proposals/280-privcount-in-tor.txt @@ -0,0 +1,332 @@ +Filename: 280-privcount-in-tor.txt +Title: Privacy-Preseving Statistics with Privcount in Tor +Author: Nick Mathewson, Tim Wilson-Brown +Created: 02-Aug-2017 +Status: Draft + +0. Acknowledgments + + Tariq Elahi, George Danezis, and Ian Goldberg designed and implemented + the PrivEx blinding scheme. Rob Jansen and Aaron Johnson extended + PrivEx's differential privacy guarantees to multiple counters in + PrivCount: + + https://github.com/privcount/privcount/blob/master/README.markdown#research-background + + Rob Jansen and Tim Wilson-Brown wrote the majority of the experimental + PrivCount code, based on the PrivEx secret-sharing variant. This + implementation includes contributions from the PrivEx authors, and + others: + + https://github.com/privcount/privcount/blob/master/CONTRIBUTORS.markdown + +1. Introduction and scope + + PrivCount is a privacy-preserving way to collect aggregate statistics + about the Tor network without exposing the statistics from any single + Tor relay. + + This document describes the behavior of the in-Tor portion of the + PrivCount system. It DOES NOT describe the counter configurations, + or any other parts of the system. (These will be covered in separate + proposals.) + +2. PrivCount overview + + Here follows an oversimplified summary of PrivCount, with enough + information to explain the Tor side of things. The actual operation + of the non-Tor components is trickier than described below. + + All values in the scheme below are 64-bit unsigned integers; addition + and subtraction are modulo 2^64. + + In PrivCount, a Data Collector (in this case a Tor relay) shares + numeric data with N different Tally Reporters. (A Tally Reporter + performs the summing and unblinding roles of the Tally Server and Share + Keeper from experimental PrivCount.) + + All N Tally Reporters together can reconstruct the original data, but + no (N-1)-sized subset of the Tally Reporters can learn anything about + the data. + + (In reality, the Tally Reporters don't reconstruct the original data + at all! Instead, they will reconstruct a _sum_ of the original data + across all participating relays.) + + To share data, for each value X to be shared, the relay generates + random values B_1 though B_n, and shares each B_i secretly with a + single Tally Reporter. The relay then publishes Y = X + SUM(B_i) + Z, + where Z is a noise value taken at random from a gaussian distribution. + The Tally Reporters can reconstruct X+Z by securely computing SUM(B_i) + across all contributing Data Collectors. (Tally Reporters MUST NOT + share individual B_i values: that would expose the underlying relay + totals.) + + In order to prevent bogus data from corrupting the tally, the Tor + relays and the Tally Reporters perform multiple "instances" of this + algorithm, randomly sampling in each relays. The relay sends multiple + Y values for each measurement, built with different sets of B_i. + These "instances" are numbered in order from 1 to R. + + So that the system will still produce results in the event of a single + Tally Reporter failure, these instances are distributed across multiple + subsets of Tally Reporters. + + Below we describe a data format for this. + +3. The document format + + This document format builds on the line-based directory format used + for other tor documents, described in Tor's dir-spec.txt. + + Using this format, we describe two kinds of documents here: a + "counters" document that publishes all the Y values, and a "blinding" + document that describes the B_i values. But see "An optimized + alternative" below. + + The "counters" document has these elements: + + "privctr-dump-format" SP VERSION SP SigningKey + + [At start, exactly once] + + Describes the version of the dump format, and provides an ed25519 + signing key to identify the relay. The signing key is encoded in + base64 with padding stripped. VERSION is "alpha" now, but should + be "1" once this document is finalized. + + [[[TODO: Do we need a counter version as well? + + Noise is distributed across a particular set of counters, + to provide differential privacy guarantees for those counters. + Reducing noise requires a break in the collection. + Adding counters is ok if the noise on each counter + monotonically increases. (Removing counters always reduces + noise.) + + We also need to work out how to handle instances with mixed + Tor versions, where some Data Collectors report different + counters to other Data Collectors. (The blinding works if we + substitute zeroes for missing counters on Tally Reporters. + But we also need to add noise in this case.) + + -teor + ]]] + + "starting-at" SP IsoTime + + [Exactly once] + + The start of the time period when the statistics here were + collected. + + "ending-at" SP IsoTime + + [Exactly once] + + The end of the time period when the statistics here were + collected. + + "num-instances" SP Number + + [Exactly once] + + The number of "instances" that the relay used (see above.) + + "tally-reporter" SP Identifier SP Key SP InstanceNumbers + + [At least twice] + + The curve25519 public key of each Tally Reporter that the relay + believes in. (If the list does not match the list of + participating tally reporters, they won't be able to find the + relay's values correctly.) The identifiers are non-space, + non-nul character sequences. The Key values are encoded in + base64 with padding stripped; they must be unique within each + counters document. The InstanceNumbers are comma-separated lists + of decimal integers from 0 to (num-instances - 1), in ascending + order. + + Keyword ":" SP Int SP Int SP Int ... + + [Any number of times] + + The Y values for a single measurement. There are num-instances + such Y values for each measurement. They are 64-bit unsigned + integers, expressed in decimal. + + The "Keyword" denotes which measurement is being shared. Keyword + MAY be any sequence of characters other than colon, nul, space, + and newline, though implementators SHOULD avoid getting too + creative here. Keywords MUST be unique within a single document. + Tally Reporters MUST handle unrecognized keywords. Keywords MAY + appear in any order. + + It is safe to send the blinded totals for each instance to every + Tally Reporter. To unblind the totals, a Tally Reporter needs: + * a blinding document from each relay in the instance, and + * the per-counter blinding sums from the other Tally Reporters + in their instance. + + [[[TODO: But is it safer to create a per-instance counters + document? -- teor]]] + + The semantics of individual measurements are not specified here. + + "signature" SP Signature + + [At end, exactly once] + + The Ed25519 signature of all the fields in the document, from the + first byte, up to but not including the "signature" keyword here. + The signature is encoded in base64 with padding stripped. + + + The "blinding" document has these elements: + + "privctr-secret-offsets" SP VERSION SP SigningKey + + [At start, exactly once.] + + The VERSION and SigningKey parameters are the same as for + "privctr-dump-format". + + "instances" SP Numbers + + [Exactly once] + + The instances that this Tally Reporter handles. + They are given as comma-separated decimal integers, as in the + "tally-reporter" entry in the counters document. They MUST + match the instances listed in the counters document. + + [[[TODO: this is redundant. Specify the constraint instead? --teor]]] + + "num-counters" SP Number + + [Exactly once] + + The number of counters that the relay used in its counters + document. This MUST be equal to the number of keywords in the + counters document. + + [[[TODO: this is redundant. Specify the constraint instead? --teor]]] + + "tally-reporter-pubkey" SP Key + + [Exactly once] + + The curve25519 public key of the tally reporter who is intended + to receive an decrypt this document. The key is base64-encoded + with padding stripped. + + "count-document-digest" SP "sha3" Digest NL + "-----BEGIN ENCRYPTED DATA-----" NL + Data + "-----END ENCRYPTED DATA-----" NL + + [Exactly once] + + The SHA3-256 digest of the count document corresponding to this + blinding document. The digest is base64-encoded with padding + stripped. The data encodes the blinding values (See "The + Blinding Values") below, and is encrypted to the tally reporter's + public key using the hybrid encryption algorithm described below. + + "signature" SP Signature + + [At end, exactly once] + + The Ed25519 signature of all the fields in the document, from the + first byte, up to but not including the "signature" keyword here. + The signature is encoded in base64 with padding stripped. + + +4. The Blinding Values + + The "Data" field of the blinding documents above, when decrypted, + yields a sequence of 64-bit binary values, encoded in network + (big-endian) order. There are C * R such values, where C is the number + of keywords in the count document, and R is the number of instances + that the Tally Reporter participates in. The client generates all of + these values uniformly at random. + + For each keyword in the count document, in the order specified by the + count document, the decrypted data holds R*8 bytes for the specified + instance of that keyword's blinded counter. + + For example: if the count document lists the keywords "b", "x", "g", + and "a" (in that order), and lists instances "0", and "2", then the + decrypted data will hold the blinding values in this order: + b, instance 0 + b, instance 2 + x, instance 0 + x, instance 2 + g, instance 0 + g, instance 2 + a, instance 0 + a, instance 2 + + +4. Implementation Notes + + A relay should, when starting a new round, generate all the blinding + values and noise values in advance. The relay should then use these + values to compute Y_0 = SUM(B_i) + Z for each instance of each + counter. Having done this, the relay MUST encrypt the blinding values + to the public key of each tally reporter, and wipe them from memory. + + +5. The hybrid encryption algorithm + + We use a hybrid encryption scheme above, where items can be encrypted + to a public key. We instantiate it as follows, using curve25519 + public keys. + + To encrypt a plaintext M to a public key PK1 + 1. the sender generates a new ephemeral keypair sk2, PK2. + 2. The sender computes the shared diffie hellman secret + SEED = (sk2 * PK1). + + 3. The sender derives 64 bytes of key material as + SHAKE256(TEXT | SEED)[...64] + where "TEXT" is "Expand curve25519 for privcount encryption". + + The first 32 bytes of this is an aes key K1; + the second 32 bytes are a mac key K2. + + 4. The sender computes a ciphertext C as AES256_CTR(K1, M) + + 5. The sender computes a MAC as + SHA3_256([00 00 00 00 00 00 00 20] | K2 | C) + + 6. The hybrid-encrypted text is PK2 | MAC | C. + + +6. An optimized alternative + + As an alternative, the sequences of blinding values is NOT transmitted + to the tally reporters. Instead the client generates a single + ephemeral keypair sk_c, PK_c, and places the public key in its counts + document. It does this each time a new round begins. + + For each tally reporter with public key PK_i, the client then does + the handshake sk_c * PK_i to compute SEED_i. + + The client then generates the blinding values for that tally reporter + as SHAKE256(SEED_i)[...R*C*8]. + + After initializing the counters to Y_0, the client can discard the + blinding values and sk_c. + + Later, the tally reporters can reconstruct the blinding values as + SHAKE256(sk_i * PK_c)[...] + + This alternative allows the client to transmit only a single public + key, when previously it would need to transmit a complete set of + blinding factors for each tally reporter. Further, the alternative + does away with the need for blinding documents altogether. It is, + however, more sensitive to any defects in SHAKE256 than the design + above. Like the rest of this design, it would need rethinking if we + want to expand this scheme to work with anonymous data collectors, + such as Tor clients. -- cgit v1.2.3-54-g00ecf