proposals/ideas/xxx-draft-spec-for-TLS-normalization.txt


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330

Filename: xxx-draft-spec-for-TLS-normalization.txt
Title: Draft spec for TLS certificate and handshake normalization
Author: Jacob Appelbaum
Created: 16-Feb-2011
Status: Draft


        Draft spec for TLS certificate and handshake normalization


                                    Overview

Scope

This is a document that proposes improvements to problems with Tor's
current TLS (Transport Layer Security) certificates and handshake that will
reduce the distinguishability of Tor traffic from other encrypted traffic that
uses TLS.  It also addresses some of the possible fingerprinting attacks
possible against the current Tor TLS protocol setup process.

Motivation and history

Censorship is an arms race and this is a step forward in the defense
of Tor.  This proposal outlines ideas to make it more difficult to
fingerprint and block Tor traffic.

Goals

This proposal intends to normalize or remove easy-to-predict or static
values in the Tor TLS certificates and with the Tor TLS setup process.
These values can be used as criteria for the automated classification of
encrypted traffic as Tor traffic. Network observers should not be able
to trivially detect Tor merely by receiving or observing the certificate
used or advertised by a Tor relay. I also propose the creation of
a hard-to-detect covert channel through which a server can signal that it
supports the third version ("V3") of the Tor handshake protocol.

Non-Goals

This document is not intended to solve all of the possible active or passive
Tor fingerprinting problems. This document focuses on removing distinctive
and predictable features of TLS protocol negotiation; we do not attempt to
make guarantees about resisting other kinds of fingerprinting of Tor
traffic, such as fingerprinting techniques related to timing or volume of
transmitted data.

                                Implementation details


Certificate Issues

The CN or commonName ASN1 field

Tor generates certificates with a predictable commonName field; the
field is within a given range of values that is specific to Tor.
Additionally, the generated host names have other undesirable properties.
The host names typically do not resolve in the DNS because the domain
names referred to are generated at random. Although they are syntatically
valid, they usually refer to domains that have never been registered by
any domain name registrar.

An example of the current commonName field: CN=www.s4ku5skci.net

An example of OpenSSL’s asn1parse over a typical Tor certificate:

   0:d=0  hl=4 l= 438 cons: SEQUENCE
    4:d=1  hl=4 l= 287 cons: SEQUENCE
    8:d=2  hl=2 l=   3 cons: cont [ 0 ]
   10:d=3  hl=2 l=   1 prim: INTEGER           :02
   13:d=2  hl=2 l=   4 prim: INTEGER           :4D3C763A
   19:d=2  hl=2 l=  13 cons: SEQUENCE
   21:d=3  hl=2 l=   9 prim: OBJECT            :sha1WithRSAEncryption
   32:d=3  hl=2 l=   0 prim: NULL
   34:d=2  hl=2 l=  35 cons: SEQUENCE
   36:d=3  hl=2 l=  33 cons: SET
   38:d=4  hl=2 l=  31 cons: SEQUENCE
   40:d=5  hl=2 l=   3 prim: OBJECT            :commonName
   45:d=5  hl=2 l=  24 prim: PRINTABLESTRING   :www.vsbsvwu5b4soh4wg.net
   71:d=2  hl=2 l=  30 cons: SEQUENCE
   73:d=3  hl=2 l=  13 prim: UTCTIME           :110123184058Z
   88:d=3  hl=2 l=  13 prim: UTCTIME           :110123204058Z
  103:d=2  hl=2 l=  28 cons: SEQUENCE
  105:d=3  hl=2 l=  26 cons: SET
  107:d=4  hl=2 l=  24 cons: SEQUENCE
  109:d=5  hl=2 l=   3 prim: OBJECT            :commonName
  114:d=5  hl=2 l=  17 prim: PRINTABLESTRING   :www.s4ku5skci.net
  133:d=2  hl=3 l= 159 cons: SEQUENCE
  136:d=3  hl=2 l=  13 cons: SEQUENCE
  138:d=4  hl=2 l=   9 prim: OBJECT            :rsaEncryption
  149:d=4  hl=2 l=   0 prim: NULL
  151:d=3  hl=3 l= 141 prim: BIT STRING
  295:d=1  hl=2 l=  13 cons: SEQUENCE
  297:d=2  hl=2 l=   9 prim: OBJECT            :sha1WithRSAEncryption
  308:d=2  hl=2 l=   0 prim: NULL
  310:d=1  hl=3 l= 129 prim: BIT STRING

I propose that the commonName field be generated to match a specific property
of the server in question. It is reasonable to set the commonName element to
match either the hostname of the relay, the detected IP address of the relay,
or for the relay operator to override certificate generation entirely by
loading a custom certificate. For custom certificates, see the Custom
Certificates section.

I propose that the value for the commonName field be populated with the
fully qualified host name as detected by reverse and forward resolution of the
IP address of the relay. If the host name is in the DNS, this host name should
be set as the common name. When forward and reverse DNS is not available, I
propose that the IP address alone be used.

The commonName field for the issuer should be set to known issuer names,
random words or omitted entirely.

Since some host names may themselves trigger censorship keyword filters,
it may be reasonable to provide an option to override the defaults and
force certain values in the commonName field.

Considerations for commonName normalization

Any host name supplied for the commonName field should resolve - even if it
does not resolve to the IP address of the relay[0]. If the commonName field
does include an IP address, it should be the current IP address of the relay as
seen by other Internet hosts.

Certificate serial numbers

Currently our generated certificate serial number is set to the number of
seconds since the epoch at the time of the certificate's creation. I propose
that we should ensure that our serial numbers are unrelated to the epoch,
since the generation methods are potentially recognizable as Tor-related.
Instead, I propose that we use a randomly generated number that is
subsequently hashed with SHA-512 and then truncated to a length chosen at
random within a finite set of bounds. The length of the serial number should be
chosen randomly at certificate generation time; it should be bound between the
most commonly found bit lengths[1] in the wild. Random sixteen byte values
appear to be the high bound for serial number as issued by Verisign and
DigiCert.  RapidSSL appears to be three bytes in length. Others common byte
lengths appear to be between one and four bytes. I propose that we choose a
byte length that is either 3, 4, or 16 bytes at certificate generation time.

This randomly generated field may now serve as a covert channel that signals to
the client that the OR will not support TLS renegotiation; this means that the
client can expect to perform a V3 TLS handshake setup. Otherwise, if the serial
number is a reasonable time since the epoch, we should assume the OR is
using an earlier protocol version and hence that it expects renegotiation.

As a security note, care must be taken to ensure that supporting this
covert channel will not lead to an attacker having a method to downgrade client
behavior.

Certificate fingerprinting issues expressed as base64 encoding

It appears that all deployed Tor certificates have the following strings in
common:

MIIB
CCA
gAwIBAgIETU
ANBgkqhkiG9w0BAQUFADA
YDVQQDEx
3d3cu

As expected these values correspond to specific ASN.1 OBJECT IDENTIFIER (OID)
properties (sha1WithRSAEncryption, commonName, etc) of how we generate our
certificates.

As an illustrated example of the common bytes of all certificates used within
the Tor network within a single one hour window, I have replaced the actual
value with a wild card ('.') character here:

-----BEGIN CERTIFICATE-----
MIIB..CCA..gAwIBAgIETU....ANBgkqhkiG9w0BAQUFADA.M..w..YDVQQDEx.3
d3cu............................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
........................... <--- Variable length and padding
-----END CERTIFICATE-----

This fine ascii art only illustrates the bytes that absolutely match in all
cases.  In many cases, it's likely that there is a high probability for a given
byte to be only a small subset of choices.

Using the above strings, the EFF's certificate observatory may trivially
discover all known relays, known bridges and unknown bridges in a single SQL
query.  I propose that we ensure that we test our certificates to ensure that
they do not have these kinds of statistical similarities without ensuring
overlap with a very large cross section of the internet's certificates.

Other certificate fields

It may be advantageous to also generate values for the O, L, ST, C, and OU
certificate fields. The C and ST fields may be populated from GeoIP information
that is already available to Tor to reflect a plausible geographic location
for the OR. The other fields should contain some semblance of a word or
grouping of words. It has been suggested[2] that we should look to guides for
certificate generation that use OpenSSL as a reasonable baseline for
understanding these fields, as well as other certificate properties.

Certificate dating and validity issues

TLS certificates found in the wild are generally found to be long-lived;
they are frequently old and often even expired. The current Tor certificate
validity time is a very small time window starting at generation time and
ending shortly thereafter, as defined in or.h by MAX_SSL_KEY_LIFETIME
(2*60*60).

I propose that the certificate validity time length is extended to a period of
twelve Earth months, possibly with a small random skew to be determined by the
implementer. Tor should randomly set the start date in the past or some
currently unspecified window of time before the current date. This would
more closely track the typical distribution of non-Tor TLS certificate
expiration times.

The certificate values, such as expiration, should not be used for anything
relating to security; for example, if the OR presents an expired TLS
certificate, this does not imply that the client should terminate the
connection (as would be appropriate for an ordinary TLS implementation).
Rather, I propose we use a TOFU style expiration policy - the certificate
should never be trusted for more than a two hour window from first sighting.

This policy should have two major impacts. The first is that an adversary will
have to perform a differential analysis of all certificates for a given IP
address rather than a single check. The second is that the server expiration
time is enforced by the client and confirmed by keys rotating in the consensus.

The expiration time should not be a fixed time that is simple to calculate by
any Deep Packet Inspection device or it will become a new Tor TLS setup
fingerprint.

Custom Certificates

It should be possible for a Tor relay operator to use a specifically supplied
certificate and secret key. This will allow a relay or bridge operator to use a
certificate signed by any member of any geographically relevant certificate
authority racket; it will also allow for any other user-supplied certificate.
This may be desirable in some kinds of filtered networks or when attempting to
avoid attracting suspicion by blending in with the TLS web server certificate
crowd.

Problematic Diffie–Hellman parameters

We currently send a static Diffie–Hellman parameter, prime p (or “prime p
outlaw”) as specified in RFC2409 as part of the TLS Server Hello response.

The use of this prime in TLS negotiations may, as a result, be filtered and
effectively banned by certain networks. We do not have to use this particular
prime in all cases.

While amusing to have the power to make specific prime numbers into a new class
of numbers (cf. imaginary, irrational, illegal [3]) - our new friend prime p
outlaw is not required.

The use of this prime in TLS negotiations may, as a result, be filtered and
effectively banned by certain networks. We do not have to use this particular
prime in all cases.

I propose that the function to initialize and generate DH parameters be
split into two functions.

First, init_dh_param() should be used only for OR-to-OR DH setup and
communication. Second, it is proposed that we create a new function
init_tls_dh_param() that will have a two-stage development process.

The first stage init_tls_dh_param() will use the same prime that
Apache2.x [4] sends (or “dh1024_apache_p”), and this change should be
made immediately. This is a known good and safe prime number (p-1 / 2
is also prime) that is currently not known to be blocked.

The second stage init_tls_dh_param() should randomly generate a new prime on a
regular basis; this is designed to make the prime difficult to outlaw or
filter.  Call this a shape-shifting or "Rakshasa" prime.  This should be added
to the 0.2.3.x branch of Tor. This prime can be generated at setup or execution
time and probably does not need to be stored on disk. Rakshasa primes only
need to be generated by Tor relays as Tor clients will never send them. Such
a prime should absolutely not be shared between different Tor relays nor
should it ever be static after the 0.2.3.x release.

As a security precaution, care must be taken to ensure that we do not generate
weak primes or known filtered primes. Both weak and filtered primes will
undermine the TLS connection security properties. OpenSSH solves this issue
dynamically in RFC 4419 [5] and may provide a solution that works reasonably
well for Tor. More research in this area including the applicability of
Miller-Rabin or AKS primality tests[6] will need to be analyzed and probably
added to Tor.

Practical key size

Currently we use a 1024 bit long RSA modulus. I propose that we increase the
RSA key size to 2048 as an additional channel to signal support for the V3
handshake setup.  2048 appears to be the most common key size[0] above 1024.
Additionally, the increase in modulus size provides a reasonable security boost
with regard to key security properties.

The implementer should increase the 1024 bit RSA modulus to 2048 bits.

Possible future filtering nightmares

At some point it may cost effective or politically feasible for a network
filter to simply block all signed or self-signed certificates without a known
valid CA trust chain. This will break many applications on the internet and
hopefully, our option for custom certificates will ensure that this step is
simply avoided by the censors.

The Rakshasa prime approach may cause censors to specifically allow only
certain known and accepted DH parameters.

Appendix: Other issues

What other obvious TLS certificate issues exist? What other static values are
present in the Tor TLS setup process?

[0] http://archives.seul.org/or/dev/Jan-2011/msg00051.html
[1] http://archives.seul.org/or/dev/Feb-2011/msg00016.html
[2] http://archives.seul.org/or/dev/Feb-2011/msg00039.html
[3] To be fair this is hardly a new class of numbers. History is rife with
    similar examples of inane authoritarian attempts at mathematical secrecy.
    Probably the most dramatic example is the story of the pupil Hipassus of
    Metapontum, pupil of the famous Pythagoras, who, legend goes, proved the
    fact that Root2 cannot be expressed as a fraction of whole numbers (now
    called an irrational number) and was assassinated for revealing this
    secret.  Further reading on the subject may be found on the Wikipedia:
    http://en.wikipedia.org/wiki/Hippasus

[4] httpd-2.2.17/modules/ss/ssl_engine_dh.c
[5] http://tools.ietf.org/html/rfc4419
[6] http://archives.seul.org/or/dev/Jan-2011/msg00037.html