start a draft for "new crypto ops"

author: Nick Mathewson <nickm@torproject.org> 2011-10-18 21:43:36 -0400
committer: Nick Mathewson <nickm@torproject.org> 2011-10-19 19:59:47 -0400
commit: d55883abf351b0df0a46ee6f5adc17dad61da9de (patch)
tree: e643c7ccc8ebec704cde6c7139266626f0124c58 /proposals/ideas
parent: ef5a3c93402000dcaa5eff4878010cbb36a01205 (diff)
download: torspec-d55883abf351b0df0a46ee6f5adc17dad61da9de.tar.gz
torspec-d55883abf351b0df0a46ee6f5adc17dad61da9de.zip
2 files changed, 614 insertions, 4 deletions
diff --git a/proposals/ideas/xxx-new-crypto-sketch.txt b/proposals/ideas/xxx-new-crypto-sketch.txt
new file mode 100644
index 0000000..a4893e1
--- /dev/null
+++ b/proposals/ideas/xxx-new-crypto-sketch.txt
@@ -0,0 +1,601 @@
+Title: Design sketch for new crypto ops
+Date: 19 Oct 2011
+Author: Nick Mathewson
+
+0. Overview
+
+  The point of this document is to discuss what crypto we ought to be using.
+  See "Initial Thoughts on Migrating Tor to New Cryptography" from last year
+  for general guidelines and principles.
+
+  In broad strokes, the parts of our crypto are:
+
+    IDENTITY KEYS AND FINGERPRINTS
+       Addressed here in Section 2.
+    LINK CRYPTO (TLS) --
+       Addressed in proposls 176, 184.  We say a little here though in
+       section 5.
+    CREATE/EXTEND CRYPTO --
+       Addressed in xxx-ntor-handshake.txt and rransom's extend draft
+    RELAY CRYPTO
+       Addressed here in Section 6
+    DIRECTORY SYSTEM
+       Addressed here.
+    HIDDEN SERVICE SYSTEM
+       Addressed in a forthcoming document by rransom.
+
+1. Base algorithm choice
+
+  There seem to be two main candidate algorithms for signatures: RSA with big
+  keys (hereinafter "RSA>1024"); and Ed25519, which is DSA with the sharp
+  edges filed off on an Edwards curve related to DJB's Curve25519.  We can
+  look at other ECC groups too.  {But see ECC Notes.}
+
+  For Diffie Hellman: Curve25519 seems like a decent choice; failing that,
+  another .  DH
+  on Z_p with big groups (hereinafter "DH>1024").  {But see ECC notes.}
+
+  For a hash function: SHA256, switching to SHA3 in 2012 when it comes out.
+  It might be worthwhile waiting for SHA3 in most places, and skipping over
+  the sha256 stage entirely.
+
+  For a stream cipher: AES-CTR is in one sense a conservative choice inasmuch
+  as AES is well-analyzed, but AES's well-known issues with cache-based
+  timing attacks are pretty worrisome.  We can mitigate that some by using
+  random secret IVs for AES-CTR, so that we will be encrypting neither
+  attacker-chosen nor attacker-known plaintext with out AES cipher, but
+  that's a bit kludgy.  Salsa20 is what rransom likes these days, but IMO we
+  aren't competent to tell whether it looks good or not; the existing attacks
+  against it don't look like very bad news to me, but who knows whether it's
+  getting enough attention that we can read.  See also ChaCha; see also the
+  other EuroCrypt {XXXX} winners/finalists; see also SHA3 if the SHA3 winner
+  specifies a way to use it as a stream cipher, or specifies an underlying
+  stream/block cipher.
+
+  For a random number generator: We currently use OpenSSL seeded with
+  RAND_poll and with platform entropy.  OpenSSL uses RC4 (XXX check).  The
+  platform entropy management can be messy, obscure, or both.  I suggest
+  that:
+    * we should seed our PRNG with more entropy sources if we can find some
+      promising code with an appropriate license
+    * instead of just using OpenSSL's PRNG, we should use OpenSSL's PRNG
+      xor'd with some other good PRNG.  (Is Yarrow still cool?  And is there
+      a combine operation better than xor?)
+    * We should re-seed the RNG before and after very sensitive operations,
+      like private key generation.
+
+
+1.1. ECC notes
+
+  ECC is the brave new[*] crypto of the future!  It's faster[**] than doing
+  crypto in Z_n (as we do for RSA and DH now) for equivalent levels of
+  security, and the resulting outputs are much shorter.
+
+  As near as I can tell as a layman, Certicom is muddying the waters as much
+  as possible wrt claiming that it's nigh impractical to deploy ECC without
+  licensing their patents.  This is rather like the silliness that PKP used
+  to pull back in the day, where they claimed that their patents covered not
+  only the existing public key cryptography algorithms, but also the very
+  idea of public key cryptography itself.
+
+  DJB claims that for every patent he's aware of, either that patent doesn't
+  cover his code, or that patent is invalid because of prior art.  I'm not
+  going to try to evaluate these claims, since I'm not supposed to be reading
+  patents for typical "let's avoid the appearance of knowing infringement"
+  reasons.  But before we dive into the world of ECC, we should see if we can
+  ask any friendly patent attorneys and ECC experts for a second or third
+  opinion here.
+
+  I note in passing that nearly all of the patents that DJB mentions in his
+  list would appear to expire over the next 12 months or so.
+
+  Additionally, there are ECC groups out there less fast than DJB's, but more
+  widely available and analyzed.  We should consider some of those too.
+
+  One final issue to investigate is whether using these algorithms will make
+  any major free software distribution decide not to include us.  I seem to
+  recall seeing that one or two of the big ones had at one point decided to
+  ship OpenSSL only with ECC disabled, either because of real patent
+  concerns, or because of an opinion that the certicom license for ECC use in
+  TLS was problematic for free software, or something like that.  We should
+  check that out.
+
+  [*] Actually, it's older than onion routing, and older than some members of
+  the Tor project.
+
+  [**] Actually, because of the common practice of choosing a small-ish prime
+  value (65537) for e in RSA, RSA public key operations can be a little
+  faster than equivalent-security ECDH or ECDSA operations.  The private key
+  operations in RSA are still much much slower.
+
+2. New identities
+
+  Identity keys and their fingerprints are used:
+    - To sign router descriptors
+    - To identify nodes in consensus directories
+    - To make sure we're talking to the right node in the link handshake
+    - To make sure that the extending node is talking to the right next node
+      when sending an extend cell
+    - To identify particular nodes in the hidden service subsystem
+    - To identify nodes in the UI in various places
+    - Internally, to identify a node uniquely in the codebase.
+    - To determine which part of the circuit ID space to use on a Tor
+      instance's links.
+
+2.1. New identities, option 1: RSA>1024, slow migration.
+
+  We use RSA for identity keys indefinitely.  Nearly all operations done with
+  an identity key are signature checking; signing happens only a few times an
+  hour per node even with pathological cases.  Since signature checking is
+  really cheap with RSA, there's no speed advantage for ECC here.  (There is
+  a space advantage, since the keys are much smaller.)
+
+  The easiest way to migrate to longer identity keys is to tell all Tors to
+  begin accepting longer identity keys now, and to tweak all our protocols so
+  that longer RSA identity keys are understood.  We should then have a pair
+  of parameters in the consensus that determines the largest and smallest
+  acceptable identity key size in the network.  Clients and servers should
+  reject any keys longer or shorter than specified.  Once all versions of Tor
+  can accept long identity keys, we raise the maximum size from 1024 to
+  somewhere in the 2048-4096 range.
+
+2.2. New identities option 2: RSA>1024, faster migration.
+
+  We use RSA for identity keys indefinitely as above.  But we allow nodes to
+  begin having longer identities now, even though older Tors won't understand
+  them.  This implies, of course, that every such node needs to have at least
+  2 identities: one RSA1024 identity for backward compatibility, one RSA>1024
+  identity for more secure identification.
+
+  We would have these identities cross-certify as follows: All keys would be
+  listed in the router descriptor.  RSA>1024 keys would be called something
+  other than identity-key, so as not to confuse older clients.  A signature
+  with the RSA>1024 key would appear right before the current RSA1024
+  signature.  This way, signed material would include both keys, and would be
+  signed by both keys.
+
+     [In other words, descriptors would look something like:
+
+      router foo...
+      ...
+      identity-key
+      -----BEGIN RSA KEY-----
+      1024-bit RSA key here
+      -----END RSA KEY-----
+      ext-identity-key
+      -----BEGIN RSA KEY-----
+      3072-bit RSA key here
+      -----END RSA KEY-----
+      ...
+      ext-signature
+      -----BEGIN SIGNATURE-----
+      signature of everything through "ext-signature\n", using the long key
+      -----END SIGNATURE-----
+      router-signature
+      -----BEGIN SIGNATURE-----
+      signature of everything through "router-signature\n", using the short key
+      -----END SIGNATURE-----
+
+     ]
+
+  See "UI notes" in the "new fingerprints" section below for some of the
+  implications of letting nodes have multiple identity keys.
+
+  We'll need to advertise these new identities in consensus directories too;
+  see XXXX below for more info there.
+
+2.3. New identities option 3: RSA>1024 and/or Ed25519, faster migration
+
+  As in option 2 above, but new keys can also be Ed25519.  If we expect that
+  not all installations will allow Ed25519 (see "ECC Notes", section 1.1),
+  we'll need to say that every server with an Ed25519 key must also have an
+  RSA>1024 key.
+
+2.4. Implications for current use of identity keys
+
+  Let's review our use of identity keys again and make sure that we can
+  handle all of them with the ideas above.
+
+    - To sign router descriptors
+
+  We discussed this in 2.2.
+
+    - To make sure we're talking to the right node in the link handshake
+
+  The current v3 link handshake can handle presenting multiple identity
+  certificates in the CERT cell.  We should consider ourselves to be
+  connected to a node with identity X if _any_ of the identity certificates
+  that it presents in its authenticated CERT cell has identity X.  To handle
+  EXTEND cells correctly, we should verify every identity we can.
+
+    - To make sure that the extending node is talking to the right next node
+      when sending an extend cell
+
+  The new extend cell format needs to allow the client to tell the extending
+  node about some identity for the destination node that the extending node
+  will be able to understand.  This is a capability of the extending node
+  that the client needs to be able to check. (Also, the extend cell needs to
+  hash that identity in a form the extending node can understand; but that's
+  a fingerprint issue.)
+
+    - To determine which part of the circuit ID space to use on a Tor
+      instance's links.
+
+  We can continue to use RSA1024 identity key comparison here by default.  We
+  can also use some other parameter of the v3 handshake, or introduce a new
+  link protocol where if the initiator authenticates, the initiator always
+  gets the low circIDs and the responder always gets the high ones.
+
+    - To identity nodes in consensus directories
+    - To identify nodes in the UI in various places
+    - Internally, to identify a node uniquely in the codebase.
+
+  See sections 3 and 4 below.
+
+    - To identify particular nodes in the hidden service subsystem
+
+  Out of scope.
+
+2.5. Migrating away from short ID keys entirely
+
+  Eventually no version of Tor that requires 1024-bit identity keys will
+  remain.  When that happens, we should stop using them entirely.  That means
+  that if we take any path other than the "slow migration" path of 2.1, we'll
+  need to make everything that looks at a node's identity also accept nodes
+  with _only_ a RSA>1024/Ed25519 identity.
+
+  At the directory service level, we should have an option to allow nodes
+  without RSA1024 identity keys (off until all clients and nodes accept new
+  identity keys).
+
+2.6. Implications for directory information
+
+  Clients must know a hash for each node's identity key, or else they can't
+  make an authenticated connection to the node or tell ORs how to extend to
+  the node.
+
+
+3. New fingerprints
+
+  Right now we compute fingerprints by taking the SHA1 hash of an ASN1
+  encoding of the RSA1024 identity key.  We encode this in hex almost
+  everywhere, and prefix it with a $.
+
+  I propose that fingerprints of the future be determined by taking a digest
+  using SHA256 or SHA3 of:
+
+      "Hash Algorithm Name", "Key Type Name", encoded key
+
+  When representing these internally, we should include the hash algorithm
+  that was used.  When representing them in the UI, we should use the
+  notation %b64, where b64 is a base-64 encoding, omitting the trailing =s.
+
+  (Other plausible characters to use are @, ?, +, ~, =, etc.  I like %, but
+  can be persuaded.  Bikeshed bikeshed bikeshed.)
+
+  Since 43 base-64 characters is enough to represent a 256-bit digest, with 2
+  bits left over, I propose that the b64 value encode
+      hh | D(hash algorithm name, key type, encoded key),
+  where hh is a 2-bit value, with one of the values:
+      00 -- sha256
+      01 -- sha3
+      10 -- to be determined
+      11 -- reserved.
+
+  We should investigate in the interface whether it's plausible to allow a
+  prefix of a node ID where the full ID would otherwise be required.  That
+  seems risky for short prefixes, though.
+
+3.1. How many fingerprints is that anyway?!
+
+  Suppose that we allow sha256 and sha3 as hash algorithms, and we allow each
+  node to have 3 identity keys: one RSA1024, one RSA>1024, and one ECC.  Then
+  we would have 7 fingerprints (6 plus the legacy SHA1(RSA1024) fingerprint),
+  for a total of 20+6*32==212 bytes per node.
+
+  It's not a horrible problem to accept them all in the UI, but the UI isn't
+  the only place that needs to know fingerprints.  Instead, let's say that
+  RSA1024 identities are only identified with SHA1 hashes.  This limits our
+  fingerprint load to a more manageable 20+32*2 == 84 bytes per node.  Still
+  not great, though.
+
+3.2. What does this imply for the UI?
+
+  In the UI we'll lose the property that no node has more than one
+  fingerprint: I do not believe that this actually hurts us.
+
+
+4. Directory changes
+
+4.1. Better cross-referencing
+
+  In some places, directory objects cross-reference one another by SHA1
+  hash.  They should use a better hash algorithm instead.
+
+  XXX
+
+4.2. More fingerprints
+
+  Because extending from node A to node B requires that we have node B's
+  fingerprint in a way that node A will understand, it is not enough to get a
+  set of identity fingerprints for each node in the format _we_ like best.
+  Instead, we must .
+
+
+4.x. An option: compound signatures on directory objects
+
+   In Tor 0.2.2.x and later, when we check a signature on a directory object
+   (not including hidden service descriptors), we only look at the first
+   DIGEST_LEN bytes of the RSA-signed data.  Once 0.2.1.x is obsolete, or on
+   any types of signatures not checked in 0.2.1.x, we can use the rest of the
+   space.  (We're using PKCS1 padding on our signatures, which has an
+   overhead of 11 bytes.  Signing a SHA1 hash with a 1024-bit key therefore
+   leaves 128-11-20==97 more bytes we could use for a SHA2 or a SHA3 hash.)
+
+4.x.
+
+
+5. Link crypto changes
+
+
+
+
+6. Relay crypto changes
+
+There are a few things we might want out of improved relay crypto.  They
+include:
+   - Resistance to end-to-end bitwise tagging attacks.
+   - Better resistance to malleability.
+   - If using counter mode, no block-cipher operations on any value known
+     to the attacker.
+
+I'll try to provide these in increasing order of difficulty.  None of these
+is necessarily correct; I should look for a security proof or a better
+construction for any that we seem likely to use.
+
+Rationales: Our existing malleability resistance is a kludge.  Doing no
+block-cipher ops on attacker-known values increases our security margins a
+little.  Our arguments about tagging attacks hold that an attacker who
+controls both ends has plenty of ways to win even if tagging attacks are
+foiled; nonetheless, most of these ways are technically slightly more
+difficult than xor-based tagging, and it could be useful to boost our
+defense-in-depth a little bit, just in case other active end-to-end attacks
+turn out to be harder than we'd though.)
+
+Option 1: Use AES-CTR in a less scary mode
+
+   When doing key expansion, in addition to establishing Kf, Kb, Df, and Db,
+   also establish IVf and IVb.  Use the current relay crypto, except instead
+   of starting the counters at 0, start them at IVf and IVb.  This way, an
+   attacker doesn't have any known plaintexts to work with, which makes AES a
+   little more robust.
+
+Option 2: As 1, but tagging attacks garble the circuit after one block.
+
+   Keep an HMAC of all previously received encrypted cells on a circuit.
+   When decrypting a cell, use this HMAC value to determine the first 64 bits
+   of the counter; increment the low 64 bits of the counter as usual.
+
+   This way, if an adversary flips any bits before passing the stream through
+   an honest node, no _subsequent_ block will be recoverable.
+
+   To prevent any part of the stream from being re-used, close any circuit if
+   the low 64 bits of the counter would ever wrap (that is, around 295
+   million terabytes).
+
+   (If we're using a stream cipher with fast re-key, then we can just have
+   the key used for each block be an HMAC of all previously received
+   ciphertext.)
+
+Option 3: As 1, but tagging attacks garble the circuit in the same block.
+
+   Use a large-block cipher mode, such as BEAR or LIONESS (depending on
+   whether we need a PRP or SPRP).  Base the key material for each block on
+   an HMAC of all previous blocks' ciphertexts.
+
+   This way, if an adversary makes any alteration in a block, that block and
+   all subsequent blocks will be garbled.  It's more expensive than 2,
+   though, especially if we need to use a LIONESS construction.
+
+   {I considered IGE here, with a trick where odd-numbered nodes on a circuit
+   start from the front of the block and even-numbered nodes start from the
+   end, but it didn't seem much better.  We should investigate relative
+   performance, though.}
+
+Option 4: Shall we have middle nodes be able to fast-stop bad data?
+
+   In all the above options, if a cell is altered, the middle node can at
+   best turn that cell and the rest of the cells on the circuit into garbage,
+   which the last node won't deliver (if honest) or can't deliver (if dishonest).
+
+   Might we prefer to do as in mixnets, and have nodes kill circuits upon
+   receiving altered cells?
+
+   I'm not so sure.  It's relatively expensive in per-cell overhead, and the
+   next-best attack to the one it prevents isn't so great.
+
+   Consider that if option 3 is in place, an end-to-end attacker who wants to
+   do a tagging attack at one node can garble the rest of the circuit, and
+   see if the output is garbled at the exit node.  But such an attacker could
+   just as easily close the circuit at one of those nodes and watch for a
+   corresponding close event, or even better -- simply pause traffic on that
+   circuit for a while and watch for a corresponding gap at the exit.  The
+   only advantage of the garbling attack would be that garbled cells are
+   presumably rarer than circuit closes or traffic pauses, and thus easier to
+   use to distinguish target circuits.  But that's still bunk; the other
+   attacks win fine, and the pause attack doesn't risk detection so much.
+
+   Still, to do this one, we'd want to have outgoing and incoming circuits
+   treated differently.  Incoming cells could work as in 1 or 2 or 3 above;
+   outgoing cells would want to have a header portion as in mixmaster, where
+   each node checks that the first N bits of the header match a MAC of all
+   data seen so far, *including the rest of the cell*.  They'd then decrypt
+   the rest of the cell, remove the first N bits of the header, and re-pad
+   the header with N bits at the end, taken from a PRNG whose seed is shared
+   with the client.  (This is basically how mixmaster works, and how
+   mixminion works in the common case.)
+
+   The space overhead here is kind of large: N bits per cell per node.  In
+   the most paranoid case, if we used 256-byte HMACs on 3-node paths, that's
+   96 bytes per cell, which is more than 20% of the total length.  But we can
+   probably do better if we let the CREATE operation also tell the node some
+   N to check.  For example, the first node doesn't need to check any bits.
+   The second and third nodes could check 64 bits apiece; that only has 16
+   bytes overhead, and high probability of catching any changes. (Birthday
+   attacks don't matter here, and an attacker who mounts this attack for long
+   enough to accidentally find a 64-bit mac will break so many circuits in
+   the process as to become totally unreliable.
+
+   But all of this leaks the path lengths and position on the path to various
+   nodes.  We might open ourselves up to partitioning attacks if different
+   clients choose different numbers of bits.  What's more, we might leak the
+   length of the path to the last node by how much junk there is at the end
+   of the cell.
+
+   Here's a simple construction for this format, to be concrete:
+
+     The CREATE operation's KDF produces the following outputs:
+           Kf, IVf  (stream cipher key, and possibly IV, for forward direction)
+           Kb, IVb  (stream cipher key, and possibly IV, for reverse direction)
+           Mf       (MAC key for forward direction)
+           Mb       (MAC key for reverse direction)
+           SEEDf    (PRNG key for forward direction)
+     And it also sets the following user-selected parameter:
+           MACBYTESf (an integer between 0 and 32 inclusive)
+           MACBYTESb (an integer between 0 and 32 inclusive)
+           CANEXIT   (boolean: can we exit from this hop?)
+
+     Let Kf[i], Mf[i], etc denote the parameter Kf, Mf, etc as shared between
+     the client and the i'th node in its circuit.
+
+     Relay cells sent towards the client have the following plaintext format:
+
+         Body:
+           Content:
+             Relay Command [1 byte]
+             StreamID      [2 bytes]
+             Length        [2 bytes]
+             Data          [Up to CELL_DATA_LEN-5-MACBYTESb bytes]
+             Padding       [randomly generated as needed to fill the cell]
+           MAC(All previous encrypted content + encrypted content,
+                                  Mb)[:MACBYTESb]   [MACBYTESb bytes]
+
+     The originator of the client-bound cell encrypts the content with the
+     next part of its Kb,IVb stream, then appends the MAC.
+
+     Non-clients receiving a client-bound relay cell encrypt the entire cell
+     body, MAC included, with the next part of the stream cipher that was
+     keyed with Kb,IVb.
+
+     When the client receives a relay cell body, it iteratively does:
+
+       For node i in circuit from 1..N:
+           Let cells_i = all previous cells which we decided were from i,
+           and let cellbody = the body of the cell, except for the last
+           MACBYTESb[i] bytes.
+           {XXXX I'd be more comfortable if the MAC covered all cells
+            passed by every node on the circuit.}
+
+           If MACBYTESb[i]>0, check whether MAC(cells_i | cellbody,
+           Mb[i])[:MACBYTESb[i]] the last MACBYTESb[i] bytes of the cell.  If
+           so, this cell is from node i.
+
+           If this cell is from node i, decrypt the first
+           CELL_DATA_LEN-MACBYTESb[i] bytes of the cell using the stream
+           keyed with Kb[i],IVb[i].  Act on it as a relay cell.
+
+           Otherwise decrypt the entire cell, MAC included, with the stream
+           keyed with Kb[i], IVb[i].
+
+      If no node sent this cell: it's junk and somebody is probably
+      messing with us!  Destroy the circuit.
+
+     When the client *sends* a cell outbound to node N:
+
+         Let cells[i] start at "" for all i in 1...N initially, and get
+         updated as below.
+
+         Let MACLEN = SUM(MACBYTESf[1...N])
+
+         Let Body =
+             Relay Command [1 byte]
+             StreamID      [2 bytes]
+             Length        [2 bytes]
+             Data          [Up to CELL_DATA_LEN-5-MACLEN bytes]
+             Padding       [randomly generated,
+                            CELL_DATA_LEN-5-MACLEN-len(Data) bytes]
+
+         Let PAD[i] = the next MACBYTESf[i] bytes from the PRNG keyed
+         with SEEDf[i], for i in 1...N.
+
+         Let STREAM[i] = the next CELL_DATA_LEN-MACBYTESf[i] bytes of
+           the stream keyed by Kf[i],IV[i], for i in 1...N.
+
+         Let PADSEEN[1] == ""
+
+         For i in 2...N:
+             Let L = len(PADSEEN[i-1])
+             Let PADSEEN[i] = (PADSEEN[i-1] xor STREAM[i-1][CELL_DATA_LEN-L:]) | PAD[i-1]
+
+         For i in N down to 1:
+
+            Let Encbody = Body xor STREAM[i][len(body)]
+            Let extra = "RECOGNIZED" if i == N, "OK" otherwise
+            Let cells[i] = cells[i] | Body | PADSEEN[i]
+            Let M = MAC(cells[i] | extra , Mf[i])
+
+            Let Body = M[:MACBYTESf[i]] | EncBody
+
+
+     To receive an outbound cell:
+
+         Let M be the first MACBYTESf bytes of the cell, let REST be the rest
+         of the cell, and let "cells" be all previous cells on this circuit.
+         If CANEXIT, and M = MAC(cells|rest|"RECOGNIZED", Mb)[:MACBYTESf],
+         and MACBYTESf > 0, this cell is for us.  If M = MAC(cells|rest|"OK",
+         Mb)[:MACBYTESf], this cell is not for us, but is valid.  Otherwise,
+         destroy the circuit.
+
+         Decrypt REST using the stream cipher keyed with Kf,IVf.  If this
+         cell is for us, act on it as a relay cell.  Otherwise, let
+         PAD = the next MACBYTESf[i] bytes of the PRNG keyed with SEEDf,
+         and relay REST | PAD.
+
+     ANOTHER VARIANT:
+
+         If we restrict MACBYTESf values to range 0..HL/2, where HL is the
+         length of the MAC output, we can replace
+           MAC(x | "RECOGIZED")[:MACBYTESf] and MAC(x | "OK")[:MACBYTESf]
+         with
+           MAC(x)[:MACBYTESf] and MAC(x)[HL-MACBYTESf:]
+
+     PICKING MACBYTESf,MACBYTESb.
+
+         We don't need to worry about birthday attacks:
+            Because we're using a MAC, only the parties who are making the
+            MACs could try to do a brute-force search for a collision, but
+            they have no reason to do so.
+
+            If a collision occurs accidentally, an adversary can't substitute
+            an earlier-seen cell for a later one with the same MAC, since the
+            MAC covers not only the cell, but all previous cells on the
+            circuit.
+         So 16 bytes is about the most we should ever do, given our usual
+         security parameters.  Let me moot the number 8 for MACBYTESb.
+
+         For outbound cells, for any hop we can exit from, choosing
+         MACBYTESf=6 gets us the current security level.  For the first hop,
+         assuming we don't exit from it, choosing MACBYTESf=0 is totally
+         safe, since the link crypto guarantees that nothing was corrupted on
+         the way.
+
+         In general, to prevent an end-to-end tagging attack, it seems
+         sufficient to do something like setting MACBYTES=8 for the last
+         hop, and MACBYTES=8 for one hop in the middle.
+
+
+ACKS
+
+   Lots of the good ideas and concerns here are due to Robert Ransom.
+
+   Michael Stone helped some with "relay option 4" above.
diff --git a/proposals/ideas/xxx-ntor-handshake.txt b/proposals/ideas/xxx-ntor-handshake.txt
index 41af5c7..1f988fc 100644
--- a/proposals/ideas/xxx-ntor-handshake.txt
+++ b/proposals/ideas/xxx-ntor-handshake.txt
@@ -67,13 +67,20 @@ Protocol:
     NODEID:     ID             -- H_LENGTH bytes
     KEYID:      KEYID(B)       -- H_LENGTH bytes
     CLIENT_PK:  X              -- G_LENGTH bytes
+    PARAMSLEN:                 -- 2 bytes
+    PARMS:                     -- PARAMSLEN byets
+
+  (The "PARAMS" component is used to encode any additional authenticated
+  information that's needed for establishing the right kind of circuit.)
 
   The server generates a keypair of y,Y = KEYGEN(), and computes
 
-    secret_input = EXP(X,y) | EXP(X,b) | ID | B | X | Y | PROTOID
+    secret_input = EXP(X,y) | EXP(X,b) | ID | B | X | Y | PARAMSLEN | PARAMS
+                     | PROTOID
     KEY_SEED = H(secret_input, t_key)
     verify = H(secret_input, t_verify)
-    auth_input = verify | ID | B | Y | X | PROTOID | "Server"
+    auth_input = verify | ID | B | Y | X | PARAMSLEN | PARAMS | PROTOID
+                     | "Server"
 
   The server sends a CREATED cell containing:
 
@@ -82,10 +89,12 @@ Protocol:
 
   The client then checks Y is in G^* [see below], and computes
 
-    secret_input = EXP(Y,x) | EXP(B,x) | ID | B | X | Y | PROTOID
+    secret_input = EXP(Y,x) | EXP(B,x) | ID | B | X | Y | PARAMSLEN | PARAMS
+                     | PROTOID
     KEY_SEED = H(secret_input, t_key)
     verify = H(secret_input, t_verify)
-    auth_input = verify | ID | B | Y | X | PROTOID | "Server"
+    auth_input = verify | ID | B | Y | X | PARAMLENS | PARAMS | PROTOID
+                     | "Server"
 
     The client verifies that AUTH == H(auth_input, t_mac).
author	Nick Mathewson <nickm@torproject.org>	2011-10-18 21:43:36 -0400
committer	Nick Mathewson <nickm@torproject.org>	2011-10-19 19:59:47 -0400
commit	d55883abf351b0df0a46ee6f5adc17dad61da9de (patch)
tree	e643c7ccc8ebec704cde6c7139266626f0124c58 /proposals/ideas
parent	ef5a3c93402000dcaa5eff4878010cbb36a01205 (diff)
download	torspec-d55883abf351b0df0a46ee6f5adc17dad61da9de.tar.gz torspec-d55883abf351b0df0a46ee6f5adc17dad61da9de.zip