``` Filename: 308-counter-galois-onion.txt Title: Counter Galois Onion: A New Proposal for Forward-Secure Relay Cryptography Authors: Jean Paul Degabriele, Alessandro Melloni, Martijn Stam Created: 13 Sep 2019 Last-Modified: 13 Sep 2019 Status: Superseded NOTE: This proposal is superseded by an improved version of the Counter Galois Onion design based on the authors' forthcoming paper, "Counter Galois Onion: Fast, Forward-Secure, and Non-Malleable Onion Encryption for Tor". The improved proposal will be publicly available once the paper is closer to being ready for publication. -nickm 1. Background and Motivation In Proposal 202, Mathewson expressed the need to update Tor's Relay cryptography and protect against tagging attacks. Towards this goal he outlined two possible approaches for constructing an onion encryption scheme that should be able to withstand tagging attacks. Later, in Proposal 261, Mathewson proposed a concrete scheme based on the tweakable wide-block cipher AEZ. The security of Proposal 261 was analysed in [DS18]. An alternative scheme was suggested in Proposal 295 which combines an instantiation of the PIV construction from [ST14] and a variant of the GCM-RUP construction from [ADL17]. In this document we propose yet another scheme, Counter Galois Onion (CGO) which improves over proposals 261 and 295 in a number of ways. CGO has a minimalistic design requiring only a block cipher in counter-mode and a universal hash function. To take advantage of Intel's AES-NI and PCLMULQDQ instructions we recommend using AES and POLYVAL [GLL18]. In terms of security, it protects against tagging attacks while simultaneously providing forward security with respect to end-to-end authenticity and confidentiality. Furthermore CGO performs better than proposal 295 in terms of efficiency and its support of "leaky pipes". 1.2 Design Overview CGO makes due with a universal hash function while simultaneously satisfying forward security. It employs two distinct types of encryption, a dynamic encryption scheme DEnc and a static encryption scheme SEnc. DEnc is used for end-to-end encryption (layer n) and SEnc is used for the intermediate layers (n-1 to 1). DEnc is a Forward- Secure Authenticated Encryption scheme for securing end-to-end communication and SEnc provides the non-malleability for protecting against tagging attacks. In order to provide forward security, the key material in DEnc is updated with every encryption whereas in SEnc the key material is static. To support leaky pipes, in the forward direction each OR first attempts a partial decryption using DEnc and if it fails it reverts to decrypting using SEnc. The rest of the document describes the scheme's operation in terms of the low-level primitives and we make no further mention of DEnc and SEnc. However, on an intuitive level it can be helpful to think of: a) the combinations of E(KSf_I, *) and PH(HSf_I, *) as well as E(KDf_I, *) and PH(HDf_I, *) as two instances of a tweakable block cipher, b) the operation E(Sf_I, <0>) | E(Sf_I, <1>) | E(Sf_I, <2>) | ... as a PRG with seed Sf_I, c) and E(JSf_I, ) | E(JSf_I, ) | ... | E(JSf_I, ) as counter-mode encryption with as the initial vector. 2. Preliminaries 2.1. Notation Symbol Meaning ------ ------- M Plaintext Sf_I PRG Seed, forward direction, layer I Sb_I PRG Seed, backward direction, layer I Cf_I Ciphertext, forward direction, layer I Cb_I Ciphertext, backward direction, layer I Tf_I Tag, forward direction, layer I LTf_I Last Tag, forward direction, layer I Tb_I Tag, backward direction, layer I LTb_I Last Tag, backward direction, layer I Nf_I Nonce, forward direction, layer I LNf_I Last Nonce, forward direction, layer I Nb_I Nonce, backward direction, layer I LNb_I Last Nonce, backward direction, layer I JSf_I Static Block Cipher Key, forward direction, layer I JSb_I Static Block Cipher Key, backward direction, layer I KSf_I Static Block Cipher Key, forward direction, layer I KSb_I Static Block Cipher Key, backward direction, layer I KDf_I Dynamic Block Cipher Key, forward direction, layer I KDb_I Dynamic Block Cipher Key, backward direction, layer I HSf_I Static Poly-Hash Key, forward direction, layer I HSb_I Static Poly-Hash Key, backward direction, layer I HDf_I Dynamic Poly-Hash Key, forward direction, layer I HDb_I Dynamic Poly-Hash Key, backward direction, layer I ^ Bitwise XOR operator | Concatenation && Logical AND operator Z[a, b] For a string Z, the substring from byte a to byte b (indexing starts at 1) INT(X) Translate string X into an unsigned integer 2.2. Security parameters POLY_HASH_LEN -- The length of the polynomial hash function's output, in bytes. For POLYVAL, POLY_HASH_LEN = 16. PAYLOAD_LEN -- The longest allowable cell payload, in bytes (509). HASH_KEY_LEN -- The key length used to digest messages in bytes. For POLYVAL, DIG_KEY_LEN = 16. BC_KEY_LEN -- The key length, in bytes, of the block cipher used. For AES we recommend ENC_KEY_LEN = 16. BC_BLOCK_LEN -- The block length, in bytes, of the block cipher used. For AES, BC_BLOCK_LEN = 16. 2.3. Primitives The polynomial hash function is POLYVAL with a HASH_KEY_LEN-byte key. We write this as PH(H, M) where H is the key and M the message to be hashed. We use AES with a BC_KEY_LEN-byte key. For AES encryption (resp., decryption) we write E(K, X) (resp., D(K, X)) where K is a BC_KEY_LEN-byte key and X the block to be encrypted (resp., decrypted). For an integer j, we use to denote the string of length BC_BLOCK_LEN representing that integer. 2.4 Key derivation and initialisation (replaces Section 5.2.2) For newer KDF needs, Tor uses the key derivation function HKDF from RFC5869, instantiated with SHA256. (This is due to a construction from Krawczyk.) The generated key material is: K = K_1 | K_2 | K_3 | ... Where H(x, t) is HMAC_SHA256 with value x and key t and K_1 = H(m_expand | INT8(1) , KEY_SEED ) and K_(i+1) = H(K_i | m_expand | INT8(i+1) , KEY_SEED ) and m_expand is an arbitrarily chosen value, and INT8(i) is an octet with the value "i". In RFC5869's vocabulary, this is HKDF-SHA256 with info == m_expand, salt == t_key, and IKM == secret_input. 2.4.1. Key derivation using the KDF When used in the ntor handshake, for each layer I, the key material is split into the following sequence of contiguous values: Length Purpose Notation ------ ------- -------- BC_KEY_LEN forward Seed Sf_I BC_KEY_LEN backward Seed Sb_I if (I < n) in addition derive the following static keys: BC_KEY_LEN forward BC Key KSf_I BC_KEY_LEN backward BC Key KSb_I BC_KEY_LEN forward CTR Key JSf_I BC_KEY_LEN backward CTR Key JSb_I HASH_KEY_LEN forward poly hash key HSf_I HASH_KEY_LEN backward poly hash key HSb_I Excess bytes from K are discarded. 2.4.2. Initialisation from Seed For each layer I compute E(Sf_I, <0>) | E(Sf_I, <1>) | E(Sf_I, <2>) | ... and parse the output as: Length Purpose Notation ------ ------- -------- BC_BLOCK_LEN forward Nonce Nf_I BC_KEY_LEN forward BC Key KDf_I HASH_KEY_LEN forward poly hash key HDf_I BC_KEY_LEN new forward Seed Sf'_I Discard excess bytes, replace Sf_I with Sf'_I, and set LNf_n and LTf_I to the zero string. Similarly for the backward direction, compute E(Sb_I, <0>) | E(Sb_I, <1>) | E(Sb_I, <2>) | ... and parse the output as: Length Purpose Notation ------ ------- -------- BC_BLOCK_LEN backward Nonce Nb_I BC_KEY_LEN forward BC Key KDb_I HASH_KEY_LEN forward poly hash key HDb_I BC_KEY_LEN new backward Seed Sb'_I Discard excess bytes, replace Sb_I with Sb'_I, and set LNb_n and LTb_I to the zero string. NOTE: For layers n-1 to 1 the values Nf_I, KDf_I, HDf_I, Sf_I and their backward counterparts are only required in order to support leaky pipes. If leaky pipes is not required these values can be safely omitted. 3. Routing relay cells Let n denote the number of nodes in the circuit. Then encryption layer n corresponds to the encryption between the OP and the exit/destination node. 3.1. Forward Direction The forward direction is the direction that CREATE/CREATE2 cells are sent. 3.1.1. Routing From the Origin When an OP sends a relay cell, the cell is produced as follows: The OP computes E(Sf_n, <0>) | E(Sf_n, <1>) | E(Sf_n, <2>) | ... and parses the output as Length Purpose Notation ------ ------- -------- 509 encryption pad Z BC_BLOCK_LEN backward Nonce Nf'_I BC_KEY_LEN forward BC Key KDf'_I HASH_KEY_LEN forward poly hash key HDf'_I BC_KEY_LEN new forward Seed Sf'_I Excess bytes are discarded. It then computes the n'th layer ciphertext (Tf_n, Cf_n) as follows: Cf_n = M ^ Z X_n = PH(HDf_n, (LNf_n | Cf_n)) Y_n = Nf_n ^ X_n Tf_n = E(KDf_n, Y_n) ^ X_n and updates its state by overwriting the old variables with the new ones. LNf_n = Nf_n Nf_n = Nf'_n KDf_n = KDf'_n HDf_n = HDf'_n Sf_n = Sf'_n It then applies the remaining n-1 layers of encryption to (Tf_n, Cf_n) as follows: For I = n-1 to 1: IV = INT(Tf_{I+1}) Z = E(JSf_I, ) | E(JSf_I, ) | ... | E(JSf_I, ) % BC_BLOCK_LEN = 16 Cf_I = Cf_{I+1} ^ Z[1, 509] X_I = PH(HSf_n, (LTf_{I+1} | Cf_I)) Y_I = Tf_{I+1} ^ X_I Tf_I = E(KSf_I, Y_I) ^ X_I LTf_{I+1} = Tf_{I+1} Upon completion the OP sends (Tf_1, Cf_1) to node 1. 3.1.2. Relaying Forward at Onion Routers When a forward relay cell (Tf_I, Cf_I) is received by OR I, it decrypts it performs the following set of steps: 'Forward' relay cell: X_I = PH(HDf_n, (LNf_I | Cf_I)) Y_I = Tf_I ^ X_I if (Nf_I == D(KDf_I, Y_I) ^ X_I) % cell recognized and authenticated compute E(Sf_I, <0>) | E(Sf_I, <1>) | E(Sf_I, <2>) | ... and parse the output as Z, Nf'_I, KDf'_I, HDf'_I, Sf'_I M = Cf_n ^ Z LNf_I = Nf_I Nf_I = Nf'_I KDf_I = KDf'_I HDf_I = HDf'_I Sf_I = Sf'_I return M else if (I == n) % last node, decryption has failed send DESTROY cell to tear down the circuit else % decrypt and forward cell X_I = PH(HSf_I, (LTf_{I+1} | Cf_I)) Y_I = Tf_I ^ X_I Tf_{I+1} = D(KSf_I, Y_I) ^ X_I IV = INT(Tf_{I+1}) Z = E(JSf_I, ) | E(JSf_I, ) | ... | E(JSf_I, ) % BC_BLOCK_LEN = 16 Cf_{I+1} = Cf_I ^ Z[1, 509] forward (Tf_{I+1}, Cf_{I+1}) to OR I+1 3.2. Backward Direction The backward direction is the opposite direction from CREATE/CREATE2 cells. 3.2.1. Routing From the Exit Node At OR n encryption proceeds as follows: It computes E(Sb_n, <0>) | E(Sb_n, <1>) | E(Sb_n, <2>) | ... and parses the output as Length Purpose Notation ------ ------- -------- 509 encryption pad Z BC_BLOCK_LEN backward Nonce Nb'_I BC_KEY_LEN forward BC Key KDb'_I HASH_KEY_LEN forward poly hash key HDb'_I BC_KEY_LEN new forward Seed Sb'_I Excess bytes are discarded. It then computes the ciphertext (Tf_n, Cf_n) as follows: Cb_n = M ^ Z X_n = PH(HDb_n, (LNb_n | Cb_n)) Y_n = Nb_n ^ X_n Tb_n = E(KDb_n, Y_n) ^ X_n) and updates its state by overwriting the old variables with the new ones. LNb_n = Nb_n Nb_n = Nb'_n KDb_n = KDb'_n HDb_n = HDb'_n Sb_n = Sb'_n 3.2.2. Relaying Backward at the Onion Routers At OR I (for I < n) when a ciphertext (Tb_I, Cb_I) in the backward direction is received it is processed as follows: X_I = PH(HSb_n, (LTb_{I-1} | Cb_I)) Y_I = Tb_I ^ X_I Tb_{I-1} = D(KSb_I, Y_I) ^ X_I IV = INT(Tb_{I-1}) Z = E(JSb_I, ) | E(JSb_I, ) | ... | E(JSb_I, ) % BC_BLOCK_LEN = 16 Cb_{I-1} = Cb_I ^ Z[1, 509] The ciphertext (Tb_I, Cb_I) is then passed along the circuit towards the OP. 3.2.2. Routing to the Origin When a ciphertext (Tb_1, Cb_1) arrives at an OP, the OP decrypts it in two stages. It first reverses the layers from 1 to n-1 as follows: For I = 1 to n-1: X_I = PH(HSb_I, (LTb_{I+1} | Cb_I)) Y_I = Tb_I ^ X_I Tb_{I+1} = E(KSb_I, Y_I) ^ X_I IV = INT(Tb_{I+1}) Z = E(JSb_I, ) | E(JSb_I, ) | ... | E(JSb_I, ) % BC_BLOCK_LEN = 16 Cb_{I+1} = Cb_I ^ Z[1, 509] Upon completion the n'th layer of encryption is removed as follows: X_n = PH(HDb_n, (LNb_n | Cb_n)) Y_n = Tb_n ^ X_n if (Nb_n = D(KDb_n, Y_n) ^ X_n) % authentication is successful compute E(Sb_n, <0>) | E(Sb_n, <1>) | E(Sb_n, <2>) | and parse the output as Z, Nb'_n, KDb'_n, HDb'_n, Sb'_n M = Cb_n ^ Z LNb_n = Nb_n Nb_n = Nb'_n KDb_n = KDb'_n HDb_n = HDb'_n Sb_n = Sb'_n return M else send DESTROY cell to tear down the circuit 4. Application connections and stream management 4.1. Amendments to the Relay Cell Format Within a circuit, the OP and the end node use the contents of RELAY packets to tunnel end-to-end commands and TCP connections ("Streams") across circuits. End-to-end commands can be initiated by either edge; streams are initiated by the OP. The payload of each unencrypted RELAY cell consists of: Relay command [1 byte] StreamID [2 bytes] Length [2 bytes] Data [PAYLOAD_LEN-21 bytes] The old Digest field is removed since sufficient information for authentication is now included in the nonce part of the payload. The old 'Recognized' field is removed. Instead a cell is recognized via a partial decryption using the node's dynamic keys - namely the following steps (already included in Section 3): Forward direction: X_I = PH(HDf_n, (LNf_I | Cf_I)) Y_I = Tf_I ^ X_I if (Nf_I == D(KDf_I, Y_I) ^ X_I) % cell is recognized and authenticated Backward direction (executed by the OP): If the OP is aware of the number of layers present in the cell there is no need to attempt to recognize the cell. Otherwise the OP can, for each layer, first attempt a partial decryption using the dynamic keys for that layer as follows: X_I = PH(HDb_I, (LNb_I | Cb_I)) Y_I = Tb_I ^ X_I if (Nb_I = D(KDb_I, Y_I) ^ X_I) % cell is recognized and authenticated The 'Length' field of a relay cell contains the number of bytes in the relay payload which contain real payload data. The remainder of the payload is padding bytes. 4.2. Appending the encrypted nonce and dealing with version-homogenic and version-heterogenic circuits When a cell is prepared to be routed from the origin (see Section 3.1.1) the encrypted nonce N is appended to the encrypted cell (occupying the last 16 bytes of the cell). If the cell is prepared to be sent to a node supporting the new protocol, S is combined with other sources to generate the layer's nonce. Otherwise, if the node only supports the old protocol, n is still appended to the encrypted cell (so that following nodes can still recover their nonce), but a synchronized nonce (as per the old protocol) is used in CTR-mode. When a cell is sent along the circuit in the 'backward' direction, nodes supporting the new protocol always assume that the last 16 bytes of the input are the nonce used by the previous node, which they process as per Section 3.2.1. If the previous node also supports the new protocol, these cells are indeed the nonce. If the previous node only supports the old protocol, these bytes are either encrypted padding bytes or encrypted data. 5. Security and Design Rationale We are currently working on a security proof to better substantiate our security claims. Below is a short informal summary on the security of CGO and its design rationale. 5.1. Resistance to crypto-tagging attacks Protection against crypto-tagging attacks is provided by layers n-1 to 1. This part of the scheme is based on the paradigm from [ADL17] which has the property that if any single bit of the OR's input is changed then all of the OR's output will be randomised. Specifically, if (Tf_I, Cf_I) is travelling in the forward direction and is processed by an honest node I, a single bit flip to either Tf_I or Cf_I will result in both Tf_{I+1} and Cf_{I+1} being completely randomised. In addition, the processing of (Tf_I, Cf_I) includes LTf_{I+1} so that any modification to (Tf_I, Cf_I) at time j will in turn randomise the value (Tf_{I+1}, Cf_{I+1}) at any time >= j . Thus once a circuit is tampered with it is not possible to recover from it at a later stage. This helps to protect against the standard crypto-tagging attack and variations thereof (Section 5.2 in [DS18]). A similar argument holds in the backward direction. 5.2. End-to-end authenticated encryption Layer n provides end-to-end authenticated encryption. Similar to the old protocol, this proposal only offers end-to-end authentication rather than per-hop authentication. However, CGO provides 128-bit authentication as opposed to the 32-bit authentication provided by the old protocol. A main observation underpinning the design of CGO is that the n'th layer does not need to be secure against the release of unverified plaintext (RUP). RUP security is only needed to protect against tagging attacks and the n'th layer does not help in that regard (but the layers below do). Consequently we employ a different scheme at the n'th layer which is designed to provide forward-secure authenticated encryption. 5.3 Forward Security As mentioned in the previous section CGO provides end-to-end authenticated encryption that is also forward secure. Our notion of forward security follows the definitions of Bellare and Yee [BY03] for both confidentiality and authenticity. Forward-secure confidentiality says that upon corrupting either the sender (or the receiver), the secrecy of the messages that have already been sent (or received) is still guaranteed. As for forward-secure authentication, upon corrupting the sender the authenticity of previously authenticated messages is still guaranteed (even if they have not yet been received). In order to achieve forward-secure authenticated encryption, CGO updates the key material of the n'th layer encryption with every cell that is processed. In order to support leaky pipes the lower layers also need to maintain a set of dynamic keys that are used to recognize cells that are intended for them. This key material is only used for partial processing, i.e. recognizing the cell, and is only updated if verification is successful. If the cell is not recognized, the node reverts to processing the cell with the static key material. If support for leaky-pipes is not required this extra processing can be omitted. 6. Efficiency Considerations Although we have not carried out any experiments to verify this, we expect CGO to perform relatively well in terms of efficiency. Firstly, it manages to achieve forward security with just a universal hash as opposed to other proposals which suggested the use of SHA2 or SHA3. In this respect we recommend using POLYVAL [GLL18], a variant of GHASH that is more compatible with Intel's PCMULQDQ instruction. Furthermore CGO admits a certain degree of parallelisability. Supporting leaky pipes requires an OR to first verify the cell using the the dynamic key material and if the cell is unrecognised it goes on to process the cell with the static key material. The important thing to note (see for instance Section 3.1.2) is that the initial processing of the cell using the static key material is almost identical to the verification using the dynamic key material, and the two computations are independent of each other. As such, although in Section 3 these were described as being evaluated sequentially, they can in fact be computed in parallel. In particular the two polynomial hashes could be computed in parallel by using the new vectorised VPCMULQDQ instruction. We are currently looking into further optimisations of the scheme as presented here. One such optimisation is the possibility of removing KDf_I and KDb_I while retaining forward security. This would further improve the efficiency of the scheme by reducing the amount of dynamic key material that needs to be updated with every cell that is processed. References [ADL17] Tomer Ashur, Orr Dunkelman, Atul Luykx, "Boosting Authenticated Encryption Robustness with Minimal Modifications", CRYPTO 2017. [BY03] Mihir Bellare, Bennett Yee, "Forward-Security in Private-Key Cryptography", CT-RSA 2003. [DS18] Jean Paul Degabriele, Martijn Stam, "Untagging Tor: A Formal Treatment of Onion Encryption", EUROCRYPT 2018. [GLL18] Shay Gueron, Adam Langley, Yehuda Lindell, "AES-GCM-SIV: Nonce Misuse-Resistant Authenticated Encryption", RFC 8452, April 2019. [ST13] Thomas Shrimpton, R. Seth Terashima, "A Modular Framework for Building Variable-Input Length Tweakable Ciphers", ASIACRYPT 2013. ```