-rw-r--r--  address-spec.txt                        |    2
-rw-r--r--  control-spec.txt                        |    5
-rw-r--r--  dir-spec.txt                            |    2
-rw-r--r--  proposals/000-index.txt                 |   14
-rw-r--r--  proposals/319-wide-everything.md        |    4
-rw-r--r--  proposals/325-packed-relay-cells.md     |    4
-rw-r--r--  proposals/327-pow-over-intro.txt        |  523
-rw-r--r--  proposals/329-traffic-splitting.txt     |  811
-rw-r--r--  proposals/340-packed-and-fragmented.md  |  257
-rw-r--r--  proposals/343-rend-caa.txt              |  111
-rw-r--r--  proposals/BY_INDEX.md                   |    7
-rw-r--r--  proposals/README.md                     |    7
-rw-r--r--  rend-spec-v3.txt                        |  124
-rw-r--r--  tor-spec.txt                            |   52
14 files changed, 1227 insertions, 696 deletions
diff --git a/address-spec.txt b/address-spec.txt
index 98e7068..1e90e6e 100644
--- a/address-spec.txt
+++ b/address-spec.txt
@@ -1,4 +1,3 @@
-
Special Hostnames in Tor
Nick Mathewson
@@ -71,6 +70,7 @@ Table of Contents
- PUBKEY is the 32 bytes ed25519 master pubkey of the onion service.
- VERSION is a one byte version field (default value '\x03')
- ".onion checksum" is a constant string
+ - H is SHA3-256
- CHECKSUM is truncated to two bytes before inserting it in onion_address
When Tor sees an address in this format, it tries to look up and connect to
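The checksum construction documented in this hunk (H = SHA3-256, truncated to two bytes, version byte '\x03') can be sketched as below. The exact concatenation order follows the surrounding spec text that is not shown in this hunk, so treat this as an illustrative sketch rather than normative code:

```python
import base64
import hashlib

def onion_checksum(pubkey: bytes, version: int = 3) -> bytes:
    # CHECKSUM = H(".onion checksum" | PUBKEY | VERSION), H = SHA3-256,
    # truncated to two bytes before insertion into onion_address
    digest = hashlib.sha3_256(b".onion checksum" + pubkey + bytes([version])).digest()
    return digest[:2]

def onion_address(pubkey: bytes) -> str:
    # onion_address = base32(PUBKEY | CHECKSUM | VERSION) + ".onion"
    assert len(pubkey) == 32  # 32-byte ed25519 master pubkey
    body = pubkey + onion_checksum(pubkey) + b"\x03"
    return base64.b32encode(body).decode("ascii").lower() + ".onion"
```

Since 35 bytes is a multiple of 5, the base32 encoding is exactly 56 characters with no padding.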
diff --git a/control-spec.txt b/control-spec.txt
index 9295580..52e11a0 100644
--- a/control-spec.txt
+++ b/control-spec.txt
@@ -2260,6 +2260,7 @@ Table of Contents
[SP "REASON=" Reason [SP "REMOTE_REASON=" Reason]]
[SP "SOCKS_USERNAME=" EscapedUsername]
[SP "SOCKS_PASSWORD=" EscapedPassword]
+ [SP "HS_POW=" HSPoW ]
CRLF
CircStatus =
@@ -2293,6 +2294,10 @@ Table of Contents
"HSSI_CONNECTING" / "HSSI_ESTABLISHED" /
"HSSR_CONNECTING" / "HSSR_JOINED"
+ HSPoWType = "v1"
+ HSPoWEffort = 1*DIGIT
+ HSPoW = HSPoWType "," HSPoWEffort
+
EscapedUsername = QuotedString
EscapedPassword = QuotedString
diff --git a/dir-spec.txt b/dir-spec.txt
index b6a7c8e..d6b58cd 100644
--- a/dir-spec.txt
+++ b/dir-spec.txt
@@ -1032,7 +1032,7 @@ Table of Contents
statistics.
"geoip-start-time" and "geoip-client-origins" have been replaced by
- "bridge-stats-end" and "bridge-stats-ips" in 0.2.2.4-alpha. The
+ "bridge-stats-end" and "bridge-ips" in 0.2.2.4-alpha. The
reason is that the measurement interval with "geoip-stats" as
determined by subtracting "geoip-start-time" from "published" could
have had a variable length, whereas the measurement interval in
diff --git a/proposals/000-index.txt b/proposals/000-index.txt
index a838b94..160ce4e 100644
--- a/proposals/000-index.txt
+++ b/proposals/000-index.txt
@@ -239,17 +239,17 @@ Proposals by number:
316 FlashFlow: A Secure Speed Test for Tor (Parent Proposal) [DRAFT]
317 Improve security aspects of DNS name resolution [NEEDS-REVISION]
318 Limit protover values to 0-63 [CLOSED]
-319 RELAY_FRAGMENT cells [OPEN]
+319 RELAY_FRAGMENT cells [OBSOLETE]
320 Removing TAP usage from v2 onion services [REJECTED]
321 Better performance and usability for the MyFamily option (v2) [ACCEPTED]
322 Extending link specifiers to include the directory port [OPEN]
323 Specification for Walking Onions [OPEN]
324 RTT-based Congestion Control for Tor [OPEN]
-325 Packed relay cells: saving space on small commands [OPEN]
+325 Packed relay cells: saving space on small commands [OBSOLETE]
326 The "tor-relay" Well-Known Resource Identifier [OPEN]
327 A First Take at PoW Over Introduction Circuits [DRAFT]
328 Make Relays Report When They Are Overloaded [CLOSED]
-329 Overcoming Tor's Bottlenecks with Traffic Splitting [DRAFT]
+329 Overcoming Tor's Bottlenecks with Traffic Splitting [NEEDS-REVISION]
330 Modernizing authority contact entries [OPEN]
331 Res tokens: Anonymous Credentials for Onion Service DoS Resilience [DRAFT]
332 Ntor protocol with extra data, version 3 [FINISHED]
@@ -263,6 +263,7 @@ Proposals by number:
340 Packed and fragmented relay messages [OPEN]
341 A better algorithm for out-of-sockets eviction [OPEN]
342 Decoupling hs_interval and SRV lifetime [DRAFT]
+343 CAA Extensions for the Tor Rendezvous Specification [OPEN]
Proposals by status:
@@ -271,7 +272,6 @@ Proposals by status:
294 TLS 1.3 Migration
316 FlashFlow: A Secure Speed Test for Tor (Parent Proposal)
327 A First Take at PoW Over Introduction Circuits
- 329 Overcoming Tor's Bottlenecks with Traffic Splitting
331 Res tokens: Anonymous Credentials for Onion Service DoS Resilience
342 Decoupling hs_interval and SRV lifetime
NEEDS-REVISION:
@@ -283,6 +283,7 @@ Proposals by status:
279 A Name System API for Tor Onion Services
291 The move to two guard nodes
317 Improve security aspects of DNS name resolution
+ 329 Overcoming Tor's Bottlenecks with Traffic Splitting
OPEN:
239 Consensus Hash Chaining
240 Early signing key revocation for directory authorities
@@ -296,15 +297,14 @@ Proposals by status:
306 A Tor Implementation of IPv6 Happy Eyeballs
308 Counter Galois Onion: A New Proposal for Forward-Secure Relay Cryptography
309 Optimistic SOCKS Data
- 319 RELAY_FRAGMENT cells
322 Extending link specifiers to include the directory port
323 Specification for Walking Onions
324 RTT-based Congestion Control for Tor
- 325 Packed relay cells: saving space on small commands
326 The "tor-relay" Well-Known Resource Identifier
330 Modernizing authority contact entries
340 Packed and fragmented relay messages
341 A better algorithm for out-of-sockets eviction
+ 343 CAA Extensions for the Tor Rendezvous Specification
ACCEPTED:
265 Load Balancing with Overhead Parameters [for 0.2.9.x]
282 Remove "Named" and "Unnamed" handling from consensus voting [for 0.3.3.x]
@@ -508,6 +508,8 @@ Proposals by status:
263 Request to change key exchange protocol for handshake v1.2
268 New Guard Selection Behaviour
270 RebelAlliance: A Post-Quantum Secure Hybrid Handshake Based on NewHope
+ 319 RELAY_FRAGMENT cells
+ 325 Packed relay cells: saving space on small commands
RESERVE:
133 Incorporate Unreachable ORs into the Tor Network
172 GETINFO controller option for circuit information
diff --git a/proposals/319-wide-everything.md b/proposals/319-wide-everything.md
index 0de6676..06173ed 100644
--- a/proposals/319-wide-everything.md
+++ b/proposals/319-wide-everything.md
@@ -3,9 +3,11 @@ Filename: 319-wide-everything.md
Title: RELAY_FRAGMENT cells
Author: Nick Mathewson
Created: 11 May 2020
-Status: Open
+Status: Obsolete
```
+(Proposal superseded by proposal 340)
+
(This proposal is part of the Walking Onions spec project.)
# Introduction
diff --git a/proposals/325-packed-relay-cells.md b/proposals/325-packed-relay-cells.md
index 6498d4c..7a88840 100644
--- a/proposals/325-packed-relay-cells.md
+++ b/proposals/325-packed-relay-cells.md
@@ -3,9 +3,11 @@ Filename: 325-packed-relay-cells.md
Title: Packed relay cells: saving space on small commands
Author: Nick Mathewson
Created: 10 July 2020
-Status: Open
+Status: Obsolete
```
+(Proposal superseded by proposal 340)
+
# Introduction
In proposal 319 I suggested a way to fragment long commands across
diff --git a/proposals/327-pow-over-intro.txt b/proposals/327-pow-over-intro.txt
index ca7f7cd..6ce610e 100644
--- a/proposals/327-pow-over-intro.txt
+++ b/proposals/327-pow-over-intro.txt
@@ -13,12 +13,11 @@ Status: Draft
So far our attempts at limiting the impact of introduction flooding DoS
attacks on onion services has been focused on horizontal scaling with
- Onionbalance, optimizing the CPU usage of Tor and applying congestion control
- using rate limiting. While these measures move the goalpost forward, a core
- problem with onion service DoS is that building rendezvous circuits is a
- costly procedure both for the service and for the network. For more
- information on the limitations of rate-limiting when defending against DDoS,
- see [REF_TLS_1].
+ Onionbalance, optimizing the CPU usage of Tor and applying rate limiting.
+ While these measures move the goalpost forward, a core problem with onion
+ service DoS is that building rendezvous circuits is a costly procedure both
+ for the service and for the network. For more information on the limitations
+ of rate-limiting when defending against DDoS, see [REF_TLS_1].
If we ever hope to have truly reachable global onion services, we need to
make it harder for attackers to overload the service with introduction
@@ -56,8 +55,8 @@ Status: Draft
The small botnet is a bunch of computers lined up to do an introduction
flooding attack. Assuming 500 medium-range computers, we are talking about
- an attacker with total access to 10 Thz of CPU and 10 TB of RAM. We consider
- the upfront cost for this attacker to be about $400.
+ an attacker with total access to 10 THz of CPU and 10 TB of RAM. We
+ consider the upfront cost for this attacker to be about $400.
"The large botnet"
@@ -68,7 +67,7 @@ Status: Draft
We hope that this proposal can help us defend against the script-kiddie
attacker and small botnets. To defend against a large botnet we would need
- more tools in our disposal (see [FUTURE_DESIGNS]).
+ more tools at our disposal (see [FUTURE_DESIGNS]).
1.2.2. User profiles [USER_MODEL]
@@ -78,16 +77,15 @@ Status: Draft
This is a standard laptop/desktop user who is trying to browse the
web. They don't know how these defences work and they don't care to
- configure or tweak them. They are gonna use the default values and if the
- site doesn't load, they are gonna close their browser and be sad at Tor.
- They run a 2Ghz computer with 4GB of RAM.
+ configure or tweak them. If the site doesn't load, they are gonna close
+ their browser and be sad at Tor. They run a 2GHz computer with 4GB of RAM.
"The motivated user"
This is a user that really wants to reach their destination. They don't
care about the journey; they just want to get there. They know what's going
- on; they are willing to tweak the default values and make their computer do
- expensive multi-minute PoW computations to get where they want to be.
+ on; they are willing to make their computer do expensive multi-minute PoW
+ computations to get where they want to be.
"The mobile user"
@@ -104,8 +102,8 @@ Status: Draft
This proposal is not perfect and it does not cover all the use cases. Still,
we think that by covering some use cases and giving reachability to the
people who really need it, we will severely demotivate the attackers from
- continuing the DoS attacks and hence stop the DoS threat all
- together. Furthermore, by increasing the cost to launch a DoS attack, a big
+ continuing the DoS attacks and hence stop the DoS threat altogether.
+ Furthermore, by increasing the cost to launch a DoS attack, a big
class of DoS attackers will disappear from the map, since the expected ROI
will decrease.
@@ -135,33 +133,73 @@ Status: Draft
introduction phase of the onion service protocol.
The system described in this proposal is not meant to be on all the time, and
- should only be enabled by services when under duress. The percentage of
- clients receiving puzzles can also be configured based on the load of the
- service.
+ it can be entirely disabled for services that do not experience DoS attacks.
- In summary, the following steps are taken for the protocol to complete:
+ When the subsystem is enabled, suggested effort is continuously adjusted and
+ the computational puzzle can be bypassed entirely when the effort reaches
+ zero. In these cases, the proof-of-work subsystem can be dormant but still
+ provide the necessary parameters for clients to voluntarily contribute effort
+ in order to get better placement in the priority queue.
+
+ The protocol involves the following major steps:
1) Service encodes PoW parameters in descriptor [DESC_POW]
2) Client fetches descriptor and computes PoW [CLIENT_POW]
3) Client completes PoW and sends results in INTRO1 cell [INTRO1_POW]
- 4) Service verifies PoW and queues introduction based on PoW effort [SERVICE_VERIFY]
+ 4) Service verifies PoW and queues introduction based on PoW effort
+ [SERVICE_VERIFY]
+ 5) Requests are continuously drained from the queue, highest effort first,
+ subject to multiple constraints on speed [HANDLE_QUEUE]
2.2. Proof-of-work overview
-2.2.1. Primitives
-
- For our proof-of-work function we will use the 'equix' scheme by tevador
- [REF_EQUIX]. Equix is an assymetric PoW function based on Equihash<60,3>. It
- features lightning fast verification speed, and also aims to minimize the
- assymetry between CPU and GPU. Furthermore, it's designed for this particular
- use-case and hence cryptocurrency miners are not incentivized to make
- optimized ASICs for it.
-
- The Equix scheme provides two functions that will be used in this proposal:
- - equix_solve(seed, nonce, effort) which solves a puzzle instance.
- - equix_verify(seed, nonce, effort, solution) which verifies a puzzle solution.
-
- We tune equix in section [EQUIX_TUNING].
+2.2.1. Algorithm overview
+
+ For our proof-of-work function we will use the Equi-X scheme by tevador
+ [REF_EQUIX]. Equi-X is an asymmetric PoW function based on Equihash<60,3>,
+ using HashX as the underlying layer. It features lightning fast verification
+ speed, and also aims to minimize the asymmetry between CPU and GPU.
+ Furthermore, it's designed for this particular use-case and hence
+ cryptocurrency miners are not incentivized to make optimized ASICs for it.
+
+ The overall scheme consists of several layers that provide different pieces
+ of this functionality:
+
+ 1) At the lowest layers, blake2b and siphash are used as hashing and PRNG
+ algorithms that are well suited to common 64-bit CPUs.
+ 2) A custom hash function, HashX, uses dynamically generated functions that
+ are tuned to be a good match for pipelined integer and floating point
+ performance on current 64-bit CPUs. This layer provides the strongest ASIC
+ resistance, since a reimplementation in hardware would need to implement
+ much of a CPU to compute these functions efficiently.
+ 3) The Equi-X layer itself builds on HashX and adds an algorithmic puzzle
+ that's designed to be strongly asymmetric and to require RAM to solve
+ efficiently.
+ 4) The PoW protocol itself builds on this Equi-X function with a particular
+ construction of the challenge input and particular constraints on the
+ allowed blake2b hash of the solution. This layer provides a linearly
+ adjustable effort that we can verify.
+ 5) Above the level of individual PoW handshakes, the client and service
+ form a closed-loop system that adjusts the effort of future handshakes.
+
+ The Equi-X scheme provides two functions that will be used in this proposal:
+ - equix_solve(challenge) which solves a puzzle instance, returning
+ a variable number of solutions per invocation depending on the specific
+ challenge value.
+ - equix_verify(challenge, solution) which verifies a puzzle solution
+ quickly. Verification still depends on executing the HashX function,
+ but far fewer times than when searching for a solution.
+
+ For the purposes of this proposal, all cryptographic algorithms are assumed
+ to produce and consume byte strings, even if internally they operate on
+ some other data type like 64-bit words. This is conventionally little endian
+ order for blake2b, which contrasts with Tor's typical use of big endian.
+ HashX itself is configured with an 8-byte output but its input is a single
+ 64-bit word of undefined byte order, of which only the low 16 bits are used
+ by Equi-X in its solution output. We treat Equi-X solution arrays as byte
+ arrays using their packed little endian 16-bit representation.
+
+ We tune Equi-X in section [EQUIX_TUNING].
2.2.2. Dynamic PoW
@@ -184,21 +222,31 @@ Status: Draft
2.2.3. PoW effort
- For our dynamic PoW system to work, we will need to be able to compare PoW
- tokens with each other. To do so we define a function:
+ It's common for proof-of-work systems to define an exponential effort
+ function based on a particular number of leading zero bits or equivalent.
+ For the benefit of our effort estimation system, it's quite useful if we
+ instead have a linear scale. We use the first 32 bits of a hashed version
+ of the Equi-X solution as compared to the full 32-bit range.
+
+ Conceptually we could define a function:
unsigned effort(uint8_t *token)
- which takes as its argument a hash output token, interprets it as a
+ which takes as its argument a hashed solution, interprets it as a
bitstring, and returns the quotient of dividing a bitstring of 1s by it.
So for example:
- effort(00000001100010101101) = 11111111111111111111 / 00000001100010101101
+ effort(00000001100010101101) = 11111111111111111111
+ / 00000001100010101101
or the same in decimal:
effort(6317) = 1048575 / 6317 = 165.
- This definition of effort has the advantage of directly expressing the
- expected number of hashes that the client had to calculate to reach the
- effort. This is in contrast to the (cheaper) exponential effort definition of
- taking the number of leading zero bits.
+ In practice we can avoid even having to perform this division, performing
+ just one multiply instead to see if a request's claimed effort is supported
+ by the smallness of the resulting 32-bit hash prefix. This assumes we send
+ the desired effort explicitly as part of each PoW solution. We do want to
+ force clients to pick a specific effort before looking for a solution,
+ otherwise a client could opportunistically claim a very large effort any
+ time a lucky hash prefix comes up. Thus the effort is communicated explicitly
+ in our protocol, and it forms part of the concatenated Equi-X challenge.
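The multiply-instead-of-divide check described above can be sketched as follows (hypothetical helper name, not code from the proposal); for any nonzero 32-bit hash prefix R and claimed effort E, the single multiply is equivalent to the division form of the conceptual effort() function:

```python
UINT32_MAX = 0xFFFFFFFF

def effort_supported(R: int, E: int) -> bool:
    # R is the 32-bit hashed-solution prefix; E is the claimed effort.
    # R * E <= UINT32_MAX  <=>  E <= UINT32_MAX // R  (for R > 0),
    # so no division is needed at verification time.
    return R * E <= UINT32_MAX

# The 20-bit worked example from the text:
assert 1048575 // 6317 == 165
```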
3. Protocol specification
@@ -219,14 +267,16 @@ Status: Draft
without trailing padding.
suggested-effort: An unsigned integer specifying an effort value that
- clients should aim for when contacting the service. See
+ clients should aim for when contacting the service. Can be
+ zero to mean that PoW is available but not currently
+ suggested for a first connection attempt. See
[EFFORT_ESTIMATION] for more details here.
- expiration-time: A timestamp in "YYYY-MM-DD SP HH:MM:SS" format after
- which the above seed expires and is no longer valid as
- the input for PoW. It's needed so that the size of our
- replay cache does not grow infinitely. It should be
- set to RAND_TIME(now+7200, 900) seconds.
+ expiration-time: A timestamp in "YYYY-MM-DDTHH:MM:SS" format (ISO time
+ with no space) after which the above seed expires and
+ is no longer valid as the input for PoW. It's needed
+ so that our replay cache does not grow infinitely. It
+ should be set to RAND_TIME(now+7200, 900) seconds.
The service should refresh its seed when expiration-time passes. The service
SHOULD keep its previous seed in memory and accept PoWs using it to avoid
@@ -240,7 +290,8 @@ Status: Draft
3.2. Client fetches descriptor and computes PoW [CLIENT_POW]
If a client receives a descriptor with "pow-params", it should assume that
- the service is expecting a PoW input as part of the introduction protocol.
+ the service is prepared to receive PoW solutions as part of the introduction
+ protocol.
The client parses the descriptor and extracts the PoW parameters. It makes
sure that the <expiration-time> has not expired and if it has, it needs to
@@ -248,25 +299,44 @@ Status: Draft
The client should then extract the <suggested-effort> field to configure its
PoW 'target' (see [REF_TARGET]). The client SHOULD NOT accept 'target' values
- that will cause an infinite PoW computation. {XXX: How to enforce this?}
+ that will cause unacceptably long PoW computation.
+
+ The client uses a "personalization string" P equal to the following
+ nul-terminated ascii string: "Tor hs intro v1\0".
+
+ The client looks up `ID`, the current 32-byte blinded public ID
+ (KP_hs_blind_id) for the onion service.
To complete the PoW the client follows the following logic:
- a) Client selects a target effort E.
- b) Client generates a random 16-byte nonce N.
+ a) Client selects a target effort E, based on <suggested-effort> and past
+ connection attempt history.
+ b) Client generates a secure random 16-byte nonce N, as the starting
+ point for the solution search.
c) Client derives seed C by decoding 'seed-b64'.
- d) Client calculates S = equix_solve(C || N || E)
- e) Client calculates R = blake2b(C || N || E || S)
+ d) Client calculates S = equix_solve(P || ID || C || N || E)
+ e) Client calculates R = ntohl(blake2b_32(P || ID || C || N || E || S))
f) Client checks if R * E <= UINT32_MAX.
- f1) If yes, success! The client can submit N, E, the first 4 bytes of C
- and S.
+ f1) If yes, success! The client can submit N, E, the first 4 bytes of
+ C, and S.
f2) If no, fail! The client interprets N as a 16-byte little-endian
- integer, increments it by 1 and goes back to step d).
+ integer, increments it by 1 and goes back to step d).
+
+ Note that the blake2b hash includes the output length parameter in its
+ initial state vector, so a blake2b_32 is not equivalent to the prefix of a
+ blake2b_512. We calculate the 32-bit blake2b specifically, and interpret it
+ in network byte order as an unsigned integer.
At the end of the above procedure, the client should have S as the solution
- of the Equix puzzle with N as the nonce, C as the seed. How quickly this
+ of the Equi-X puzzle with N as the nonce and C as the seed. How quickly this
happens depends solely on the target effort E parameter.
+ The algorithm as described is suitable for single-threaded computation.
+ Optionally, a client may choose multiple nonces and attempt several solutions
+ in parallel on separate CPU cores. The specific choice of nonce is entirely
+ up to the client, so parallelization choices like this do not impact the
+ network protocol's interoperability at all.
+
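The solve loop in steps a) through f) can be sketched as below. `equix_solve` is passed in as a stand-in for the external Equi-X library, and the byte order of E inside the concatenated challenge is an assumption here (big-endian, matching the network byte order of the POW_EFFORT cell field):

```python
import hashlib

P = b"Tor hs intro v1\x00"  # personalization string, nul-terminated
UINT32_MAX = 0xFFFFFFFF

def blake2b_32(data: bytes) -> int:
    # Native 4-byte blake2b digest (not a truncated blake2b-512),
    # interpreted in network (big-endian) byte order
    return int.from_bytes(hashlib.blake2b(data, digest_size=4).digest(), "big")

def solve(ID: bytes, C: bytes, E: int, N: bytes, equix_solve):
    # Steps d)-f): search nonces until the hashed solution supports effort E
    E_bytes = E.to_bytes(4, "big")  # assumed encoding of E in the challenge
    while True:
        challenge = P + ID + C + N + E_bytes
        S = equix_solve(challenge)                 # d)
        R = blake2b_32(challenge + S)              # e)
        if R * E <= UINT32_MAX:                    # f1) success
            return N, S
        # f2) interpret N as a 16-byte little-endian integer, increment it
        N = ((int.from_bytes(N, "little") + 1) % 2**128).to_bytes(16, "little")
```

A real client would seed N from a secure RNG and may run several nonce ranges in parallel, as the text notes.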
3.3. Client sends PoW in INTRO1 cell [INTRO1_POW]
Now that the client has an answer to the puzzle it's time to encode it into
@@ -292,6 +362,7 @@ Status: Draft
POW_VERSION is 1 for the protocol specified in this proposal
POW_NONCE is the nonce 'N' from the section above
+ POW_EFFORT is the 32-bit integer effort value, in network byte order
POW_SEED is the first 4 bytes of the seed used
This will increase the INTRODUCE1 payload size by 43 bytes since the
@@ -303,10 +374,10 @@ Status: Draft
3.4. Service verifies PoW and handles the introduction [SERVICE_VERIFY]
When a service receives an INTRODUCE1 with the PROOF_OF_WORK extension, it
- should check its configuration on whether proof-of-work is required to
- complete the introduction. If it's not required, the extension SHOULD BE
- ignored. If it is required, the service follows the procedure detailed in
- this section.
+ should check its configuration on whether proof-of-work is enabled on the
+ service. If it's not enabled, the extension SHOULD BE ignored. If enabled,
+ even if the suggested effort is currently zero, the service follows the
+ procedure detailed in this section.
If the service requires the PROOF_OF_WORK extension but received an
INTRODUCE1 cell without any embedded proof-of-work, the service SHOULD
@@ -319,12 +390,12 @@ Status: Draft
a) Find a valid seed C that starts with POW_SEED. Fail if no such seed
exists.
- b) Fail if E = POW_EFFORT is lower than the minimum effort.
- c) Fail if N = POW_NONCE is present in the replay cache (see [REPLAY_PROTECTION[)
- d) Calculate R = blake2b(C || N || E || S)
- e) Fail if R * E > UINT32_MAX
- f) Fail if equix_verify(C || N || E, S) != EQUIX_OK
- g) Put the request in the queue with a priority of E
+ b) Fail if N = POW_NONCE is present in the replay cache
+ (see [REPLAY_PROTECTION])
+ c) Calculate R = ntohl(blake2b_32(P || ID || C || N || E || S))
+ d) Fail if R * E > UINT32_MAX
+ e) Fail if equix_verify(P || ID || C || N || E, S) != EQUIX_OK
+ f) Put the request in the queue with a priority of E
If any of these steps fail the service MUST ignore this introduction request
and abort the protocol.
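A minimal sketch of the service-side checks, with a plain set standing in for the replay-cache bloom filter and `equix_verify` standing in for the external library; like the step list above, it runs the cheap checks before the full Equi-X verification:

```python
import hashlib

P = b"Tor hs intro v1\x00"
UINT32_MAX = 0xFFFFFFFF

def blake2b_32(data: bytes) -> int:
    return int.from_bytes(hashlib.blake2b(data, digest_size=4).digest(), "big")

def verify_intro(pow_seed4, N, E, S, ID, seeds, replay_cache, equix_verify):
    C = seeds.get(pow_seed4)            # a) find a seed starting with POW_SEED
    if C is None:
        return False
    if N in replay_cache:               # b) replay protection
        return False
    challenge = P + ID + C + N + E.to_bytes(4, "big")
    R = blake2b_32(challenge + S)       # c)
    if R * E > UINT32_MAX:              # d) effort check (one multiply)
        return False
    if not equix_verify(challenge, S):  # e) full Equi-X verification, last
        return False
    replay_cache.add(N)
    return True                         # f) caller enqueues with priority E
```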
@@ -349,7 +420,9 @@ Status: Draft
will flag some connections as replays even if they are not; with this false
positive probability increasing as the number of entries increase. However,
with the right parameter tuning this probability should be negligible and
- well handled by clients. {TODO: Figure bloom filter}
+ well handled by clients.
+
+ {TODO: Design and specify a suitable bloom filter for this purpose.}
3.4.2. The Introduction Queue [INTRO_QUEUE]
@@ -365,11 +438,11 @@ Status: Draft
structure. Each element in that priority queue is an introduction request,
and its priority is the effort put into its PoW:
- When a verified introduction comes through, the service uses the effort()
- function with the solution S as its input, and uses the output to place requests
- into the right position of the priority_queue: The bigger the effort, the
- more priority it gets in the queue. If two elements have the same effort, the
- older one has priority over the newer one.
+ When a verified introduction comes through, the service uses its included
+ effort commitment value to place each request into the right position of the
+ priority_queue: The bigger the effort, the more priority it gets in the
+ queue. If two elements have the same effort, the older one has priority over
+ the newer one.
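The priority queue described here can be sketched with a binary heap: negating the effort turns Python's min-heap into a max-effort queue, and an arrival counter breaks ties in favor of older requests (names are illustrative, not from the proposal):

```python
import heapq
import itertools

class IntroQueue:
    """Max-effort priority queue; equal efforts dequeue oldest-first."""
    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # arrival order for the tie-break

    def add(self, effort: int, request):
        # Smaller tuples pop first, so negate effort for max-effort order
        heapq.heappush(self._heap, (-effort, next(self._seq), request))

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)
```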
3.4.2.2. Handling introductions from the introduction queue [HANDLE_QUEUE]
@@ -380,43 +453,103 @@ Status: Draft
3.4.3. PoW effort estimation [EFFORT_ESTIMATION]
- The service starts with a default suggested-effort value of 5000 (see
- [EQUIX_DIFFICULTY] section for more info).
+3.4.3.1. High-level description of the effort estimation process
+
+ The service starts with a default suggested-effort value of 0, which keeps
+ the PoW defenses dormant until we notice signs of overload.
+
+ The overall process of determining effort can be thought of as a set of
+ multiple coupled feedback loops. Clients perform their own effort
+ adjustments via [CLIENT_TIMEOUT] atop a base effort suggested by the service.
+ That suggestion incorporates the service's control adjustments atop a base
+ effort calculated using a sum of currently-queued client effort.
+
+ Each feedback loop has an opportunity to cover different time scales. Clients
+ can make adjustments at every single circuit creation request, whereas
+ services are limited by the extra load that frequent updates would place on
+ HSDir nodes.
+
+ In the combined client/service system these client-side increases are
+ expected to provide the most effective quick response to an emerging DoS
+ attack. After early clients increase the effort using [CLIENT_TIMEOUT],
+ later clients will benefit from the service detecting this increased queued
+ effort and offering a larger suggested_effort.
+
+ Effort increases and decreases both have an intrinsic cost. Increasing effort
+ will make the service more expensive to contact, and decreasing effort makes
+ new requests likely to become backlogged behind older requests. The steady
+ state condition is preferable to either of these side-effects, but ultimately
+ it's expected that the control loop always oscillates to some degree.
+
+3.4.3.2. Service-side effort estimation
+
+ Services keep an internal effort estimation which updates on a regular
+ periodic timer in response to measurements made on the queueing behavior
+ in the previous period. These internal effort changes can optionally trigger
+ client-visible suggested_effort changes when the difference is great enough
+ to warrant republishing to the HSDir.
+
+ This evaluation and update period is referred to as HS_UPDATE_PERIOD.
+ The service side effort estimation takes inspiration from TCP congestion
+ control's additive increase / multiplicative decrease approach, but unlike
+ a typical AIMD this algorithm is fixed-rate and doesn't update immediately
+ in response to events.
+
+ {TODO: HS_UPDATE_PERIOD is hardcoded to 300 (5 minutes) currently, but it
+ should be configurable in some way. Is it more appropriate to use the
+ service's torrc here or a consensus parameter?}
+
+3.4.3.3. Per-period service state
+
+ During each update period, the service maintains some state:
- Then during its operation the service continuously keeps track of the
- received PoW cell efforts to inform its clients of the effort they should put
- in their introduction to get service. The service informs the clients by
- using the <suggested-effort> field in the descriptor.
+ 1. TOTAL_EFFORT, a sum of all effort values for rendezvous requests that
+ were successfully validated and enqueued.
- Everytime the service handles or trims an introduction request from the
- priority queue in [HANDLE_QUEUE], the service adds the request's effort to a
- sorted list.
+ 2. REND_HANDLED, a count of rendezvous requests that were actually
+ launched. Requests that made it to dequeueing but were too old to launch
+ by then are not included.
- Then every HS_UPDATE_PERIOD seconds (which is controlled through a consensus
- parameter and has a default value of 300 seconds) and while the DoS feature
- is enabled, the service updates its <suggested-effort> value as follows:
+ 3. HAD_QUEUE, a flag which is set if at any time in the update period we
+ saw the priority queue filled with more than a minimum amount of work,
+ greater than we would expect to process in approximately 1/4 second
+ using the configured dequeue rate.
- 1. Set TOTAL_EFFORT to the sum of the effort of all valid requests that
- have been received since the last HS descriptor update (this includes
- all handled requests, trimmed requests and requests still in the queue)
+ 4. MAX_TRIMMED_EFFORT, the largest observed single request effort that we
+ discarded during the period. Requests are discarded either due to age
+ (timeout) or during culling events that discard the bottom half of the
+ entire queue when it's too full.
- 2. Set SUGGESTED_EFFORT = TOTAL_EFFORT / (SVC_BOTTOM_CAPACITY * HS_UPDATE_PERIOD).
- The denominator above is the max number of requests that the service
- could have handled during that time.
+3.4.3.4. Service AIMD conditions
- 3. Set <suggested-effort> to max(MIN_EFFORT, SUGGESTED_EFFORT).
+ At the end of each period, the service may decide to increase effort,
+ decrease effort, or make no changes, based on these accumulated state values:
- During the above procedure we use the following default values:
- - MIN_EFFORT = 1000, as the result of a simulation experiment [REF_TEVADOR_SIM]
- - SVC_BOTTOM_CAPACITY = 100, which is the number of introduction requests
- that can be handled by the service per second. This was computed in
- [POW_DIFFICULTY_TOR] as 180, but we reduced it to 100 to account for
- slower computers and networks.
+ 1. If MAX_TRIMMED_EFFORT > our previous internal suggested_effort,
+ always INCREASE. Requests that follow our latest advice are being
+ dropped.
- The above algorithm is meant to balance the suggested effort based on the
- effort of all received requests. It attempts to dynamically adjust the
- suggested effort so that it increases when an attack is received, and tones
- down when the attack has stopped.
+ 2. If the HAD_QUEUE flag was set and the queue still contains at least
+ one item with effort >= our previous internal suggested_effort,
+ INCREASE. Even if we haven't yet reached the point of dropping requests,
+ this signal indicates that our latest suggestion isn't high enough
+ and requests will build up in the queue.
+
+ 3. If neither condition (1) or (2) are taking place and the queue is below
+ a level we would expect to process in approximately 1/4 second, choose
+ to DECREASE.
+
+ 4. If none of these conditions match, the suggested effort is unchanged.
+
+ When we INCREASE, the internal suggested_effort is increased to either its
+ previous value + 1, or (TOTAL_EFFORT / REND_HANDLED), whichever is larger.
+
+ When we DECREASE, the internal suggested_effort is scaled by 2/3rds.
+
+ Over time, this will continue to decrease our effort suggestion any time the
+ service is fully processing its request queue. If the queue stays empty, the
+ effort suggestion decreases to zero and clients should no longer submit a
+ proof-of-work solution with their first connection attempt.
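The per-period decision described above can be sketched as a pure function of the accumulated state; boolean arguments summarize the queue measurements, and all names are illustrative:

```python
def update_suggested_effort(prev, total_effort, rend_handled,
                            had_queue, max_trimmed_effort,
                            queue_has_high_effort, queue_nearly_empty):
    """One HS_UPDATE_PERIOD step of the fixed-rate AIMD-style control loop."""
    if max_trimmed_effort > prev or (had_queue and queue_has_high_effort):
        # INCREASE (conditions 1 and 2): requests following our advice are
        # being dropped, or are building up in the queue
        avg = total_effort // rend_handled if rend_handled else 0
        return max(prev + 1, avg)
    if queue_nearly_empty:
        # DECREASE (condition 3): queue below ~1/4 second of work
        return (prev * 2) // 3
    return prev  # condition 4: no change
```

Repeated DECREASE steps with an empty queue drive the suggestion toward zero, matching the dormant state described in the text.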
It's worth noting that the suggested-effort is not a hard limit to the
efforts that are accepted by the service, and it's only meant to serve as a
@@ -424,33 +557,17 @@ Status: Draft
to the service. The service still adds requests with lower effort than
suggested-effort to the priority queue in [ADD_QUEUE].
- Finally, the above algorithm will never reset back to zero suggested-effort,
- even if the attack is completely over. That's because in that case it would
- be impossible to determine the total computing power of connecting
- clients. Instead it will reset back to MIN_EFFORT, and the operator will have
- to manually shut down the anti-DoS mechanism.
-
- {XXX: SVC_BOTTOM_CAPACITY is static above and will not be accurate for all
- boxes. Ideally we should calculate SVC_BOTTOM_CAPACITY dynamically based on
- the resources of every onion service while the algorithm is running.}
+3.4.3.5. Updating descriptor with new suggested effort
-3.4.3.1. Updating descriptor with new suggested effort
+ The service descriptors may be updated for multiple reasons, including
+ introduction point rotation common to all v3 onion services, the scheduled
+ seed rotations described in [DESC_POW], and updates to the effort suggestion.
+ Even though the internal effort estimate updates on a regular timer, we avoid
+ propagating those changes into the descriptor and the HSDir hosts unless
+ there is a significant change.
- Every HS_UPDATE_PERIOD seconds the service should upload a new descriptor
- with a new suggested-effort value.
-
- The service should avoid uploading descriptors too often to avoid overwhelming
- the HSDirs. The service SHOULD NOT upload descriptors more often than
- HS_UPDATE_PERIOD. The service SHOULD NOT upload a new descriptor if the
- suggested-effort value changes by less than 15%.
-
- {XXX: Is this too often? Perhaps we can set different limits for when the
- difficulty goes up and different for when it goes down. It's more important
- to update the descriptor when the difficulty goes up.}
-
- {XXX: What attacks are possible here? Can the attacker intentionally hit this
- rate-limit and then influence the suggested effort so that clients do not
- learn about the new effort?}
+ If the PoW params otherwise match but the suggested effort has changed by
+ less than 15 percent, services SHOULD NOT upload a new descriptor.
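The republish decision could look like the following sketch, assuming the
15-percent threshold applies to the suggested effort (seeds rotate wholesale
rather than changing fractionally). Function and parameter names here are
hypothetical:

```python
def should_upload_descriptor(old_effort, new_effort, seed_rotated,
                             other_params_changed):
    """Decide whether an effort change justifies a descriptor upload."""
    if other_params_changed or seed_rotated:
        return True
    if old_effort == 0:
        # Any nonzero suggestion is news when we previously advertised zero.
        return new_effort != 0
    # Skip the upload when the effort moved by less than 15 percent.
    return abs(new_effort - old_effort) / old_effort >= 0.15
```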
4. Client behavior [CLIENT_BEHAVIOR]
@@ -463,8 +580,10 @@ Status: Draft
not allow the service to inform the client that the rendezvous is never gonna
occur.
- For this reason we need to define some client behaviors to work around these
- issues.
+ From the client's perspective there's no way to attribute this failure to
+ the service itself rather than the introduction point, so error accounting
+ is performed separately for each introduction point. Existing mechanisms
+ will discard an introduction point that has required too many retries.
4.1. Clients handling timeouts [CLIENT_TIMEOUT]
@@ -477,31 +596,35 @@ Status: Draft
If the rendezvous request times out, the client SHOULD fetch a new descriptor
for the service to make sure that it's using the right suggested-effort for
- the PoW and the right PoW seed. The client SHOULD NOT fetch service
- descriptors more often than every 'hs-pow-desc-fetch-rate-limit' seconds
- (which is controlled through a consensus parameter and has a default value of
- 600 seconds).
+ the PoW and the right PoW seed. If the fetched descriptor includes a new
+ suggested effort or seed, it should first retry the request with these
+ parameters.
+
+ {TODO: This is not actually implemented yet, but we should do it. How often
+ should clients at most try to fetch new descriptors? Determined by a
+ consensus parameter? This change will also allow clients to retry
+ effectively in cases where the service has just been reconfigured to
+ enable PoW defenses.}
+
+ Every time the client retries the connection, it will count these failures
+ per-introduction-point. These counts of previous retries are combined with
+ the service's suggested_effort when calculating the actual effort to spend
+ on any individual request to a service that advertises PoW support, even
+ when the currently advertised suggested_effort is zero.
- {XXX: Is this too rare? Too often?}
+ On each retry, the client modifies its solver effort:
- When the client fetches a new descriptor, it should try connecting to the
- service with the new suggested-effort and PoW seed. If that doesn't work, it
- should double the effort and retry. The client should keep on
- doubling-and-retrying until it manages to get service, or its able to fetch a
- new descriptor again.
+ 1. If the effort is below (CLIENT_POW_EFFORT_DOUBLE_UNTIL = 1000)
+ it will be doubled.
- {XXX: This means that the client will keep on spinning and
- doubling-and-retrying for a service under this situation. There will never be
- a "Client connection timed out" page for the user. Is this good? Is this bad?
- Should we stop doubling-and-retrying after some iterations? Or should we
- throw a custom error page to the user, and ask the user to stop spinning
- whenever they want?}
+ 2. Otherwise, multiply the effort by (CLIENT_POW_RETRY_MULTIPLIER = 1.5).
-4.3. Other descriptor issues
+ 3. Constrain the new effort to be at least
+ (CLIENT_MIN_RETRY_POW_EFFORT = 8) and no greater than
+ (CLIENT_MAX_POW_EFFORT = 10000).
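The retry schedule in steps 1-3 can be sketched directly from the named
constants (a minimal illustration, not the C-Tor implementation):

```python
CLIENT_POW_EFFORT_DOUBLE_UNTIL = 1000
CLIENT_POW_RETRY_MULTIPLIER = 1.5
CLIENT_MIN_RETRY_POW_EFFORT = 8
CLIENT_MAX_POW_EFFORT = 10000

def next_retry_effort(effort):
    """Escalate solver effort for one connection retry."""
    if effort < CLIENT_POW_EFFORT_DOUBLE_UNTIL:
        effort *= 2                                        # step 1: double
    else:
        effort = int(effort * CLIENT_POW_RETRY_MULTIPLIER)  # step 2: x1.5
    # Step 3: constrain to [CLIENT_MIN_RETRY_POW_EFFORT, CLIENT_MAX_POW_EFFORT].
    return max(CLIENT_MIN_RETRY_POW_EFFORT,
               min(effort, CLIENT_MAX_POW_EFFORT))
```

Note that starting from a zero advertised effort, the clamp in step 3 is what
produces the initial nonzero retry effort of 8.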
- Another race condition here is if the service enables PoW, while a client has
- a cached descriptor. How will the client notice that PoW is needed? Does it
- need to fetch a new descriptor? Should there be another feedback mechanism?
+ {TODO: These hardcoded limits should be replaced by timed limits and/or
+ an unlimited solver with robust cancellation. This is issue tor#40787}
5. Attacker strategies [ATTACK_META]
@@ -520,7 +643,8 @@ Status: Draft
that this attack is not possible: we tune this PoW parameter in section
[POW_TUNING_VERIFICATION].
-5.1.2. Overwhelm rendezvous capacity (aka "Overwhelm bottom half") [ATTACK_BOTTOM_HALF]
+5.1.2. Overwhelm rendezvous capacity (aka "Overwhelm bottom half")
+ [ATTACK_BOTTOM_HALF]
Given the way the introduction queue works (see [HANDLE_QUEUE]), a very
effective strategy for the attacker is to totally overwhelm the queue
@@ -588,14 +712,20 @@ Status: Draft
turn into a DoS vector of its own. We will do this tuning in a way that's
agnostic to the chosen PoW function.
- We will then move towards analyzing the default difficulty setting for our
- PoW system. That defines the expected time for clients to succeed in our
- system, and the expected time for attackers to overwhelm our system. Same as
- above we will do this in a way that's agnostic to the chosen PoW function.
+ We will then move towards analyzing the client starting difficulty setting
+ for our PoW system. That defines the expected time for clients to succeed in
+ our system, and the expected time for attackers to overwhelm our system. Same
+ as above we will do this in a way that's agnostic to the chosen PoW function.
+
+ Currently, we have hardcoded the initial client starting difficulty at 8,
+ but this may be too low to ramp up quickly to various on and off attack
+ patterns. A higher initial difficulty may be needed for these, depending on
+ their severity. This section gives us an idea of how large such attacks can
+ be.
Finally, using those two pieces we will tune our PoW function and pick the
- right default difficulty setting. At the end of this section we will know the
- resources that an attacker needs to overwhelm the onion service, the
+ right client starting difficulty setting. At the end of this section we will
+ know the resources that an attacker needs to overwhelm the onion service, the
resources that the service needs to verify introduction requests, and the
resources that legitimate clients need to get to the onion service.
@@ -673,10 +803,10 @@ Status: Draft
the "target". However, since our system is dynamic, we define "success" as an
abstract high-effort computation.
- Our system is dynamic but we still need a default difficulty settings that
- will define the metagame and be used for bootstrapping the system. The client
- and attacker can still aim higher or lower but for UX purposes and for
- analysis purposes we do need to define a default difficulty.
+ Our system is dynamic but we still need a starting difficulty setting that
+ will be used for bootstrapping the system. The client and attacker can still
+ aim higher or lower but for UX purposes and for analysis purposes we do need
+ to define a starting difficulty, to minimize retries by clients.
6.2.1. Analysis based on adversary power
@@ -730,16 +860,13 @@ Status: Draft
successes per second, then a legitimate client with a single box should be
expected to spend 1 seconds getting a single success.
- With the above table we can create some profiles for default values of our
- PoW difficulty. So for example, we can use the last case as the default
- parameter for Tor Browser, and then create three more profiles for more
- expensive cases, scaling up to the first case which could be hardest since
- the client is expected to spend 15 minutes for a single introduction.
+ With the above table we can create some profiles for starting values of our
+ PoW difficulty.
6.2.2. Analysis based on Tor's performance [POW_DIFFICULTY_TOR]
To go deeper here, we can use the performance measurements from
- [TOR_MEASUREMENTS] to get a more specific intuition on the default
+ [TOR_MEASUREMENTS] to get a more specific intuition on the starting
difficulty. In particular, we learned that completely handling an
introduction cell takes 5.55 msecs in average. Using that value, we can
compute the following table, that describes the number of introduction cells
@@ -772,7 +899,7 @@ Status: Draft
64 high-effort introduction cells per second to succeed in a
[ATTACK_BOTTOM_HALF] attack.
- We can use this table to specify a default difficulty that won't allow our
+ We can use this table to specify a starting difficulty that won't allow our
target adversary to succeed in an [ATTACK_BOTTOM_HALF] attack.
Of course, when it comes to this table, the same disclaimer as in section
@@ -781,26 +908,6 @@ Status: Draft
since they depend on auxiliary processing overheads, and on the network's
capacity.
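As a quick check of the arithmetic behind that table, using the 5.55 msec
per-introduction figure from [TOR_MEASUREMENTS]:

```python
# One introduction cell takes about 5.55 msecs to handle end-to-end,
# so a service at full capacity handles roughly 180 introductions/sec
# (matching the figure cited from [POW_DIFFICULTY_TOR]).
MSEC_PER_INTRO = 5.55

def intros_per_second(fraction_of_capacity=1.0):
    """Introduction cells handled per second at a given CPU fraction."""
    return fraction_of_capacity * 1000.0 / MSEC_PER_INTRO
```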
-6.3. Tuning equix difficulty [EQUIX_DIFFICULTY]
-
- The above two sections were not depending on a particular PoW scheme. They
- gave us an intuition on the values we are aiming for in terms of verification
- speed and PoW difficulty. Now we need to make things concrete:
-
- As described in section [EFFORT_ESTIMATION] we start the service with a
- default suggested-effort value of 5000. Given the benchmarks of EquiX
- [REF_EQUIX] this should take about 2 to 3 seconds on a modern CPU.
-
- With this default difficulty setting and given the table in
- [POW_DIFFICULTY_ANALYSIS] this means that an attacker with 50 boxes will be
- able to get about 20 successful PoWs per second, and an attacker with 100
- boxes about 40 successful PoWs per second.
-
- Then using the table in [POW_DIFFICULTY_TOR] we can see that the number of
- attacker's successes is not enough to overwhelm the service through an
- [ATTACK_BOTTOM_HALF] attack. That is, an attacker would need to do about 152
- introductions per second to overwhelm the service, whereas they can only do
- 40 with 100 boxes.
7. Discussion
@@ -808,35 +915,13 @@ Status: Draft
This proposal has user facing UX consequences.
- Here is some UX improvements that don't need user-input:
-
- - Primarily, there should be a way for Tor Browser to display to users that
- additional time (and resources) will be needed to access a service that is
- under attack. Depending on the design of the system, it might even be
- possible to estimate how much time it will take.
-
- And here are a few UX approaches that will need user-input and have an
- increasing engineering difficulty. Ideally this proposal will not need
- user-input and the default behavior should work for almost all cases.
-
- a) Tor Browser needs a "range field" which the user can use to specify how
- much effort they want to spend in PoW if this ever occurs while they are
- browsing. The ranges could be from "Easy" to "Difficult", or we could try
- to estimate time using an average computer. This setting is in the Tor
- Browser settings and users need to find it.
-
- b) We start with a default effort setting, and then we use the new onion
- errors (see #19251) to estimate when an onion service connection has
- failed because of DoS, and only then we present the user a "range field"
- which they can set dynamically. Detecting when an onion service connection
- has failed because of DoS can be hard because of the lack of feedback (see
- [CLIENT_BEHAVIOR])
-
- c) We start with a default effort setting, and if things fail we
- automatically try to figure out an effort setting that will work for the
- user by doing some trial-and-error connections with different effort
- values. Until the connection succeeds we present a "Service is
- overwhelmed, please wait" message to the user.
+ When the client first attempts a PoW, it can note how long iterations of
+ the hash function take, and then use this to estimate the duration of the
+ PoW. This estimate could be communicated via the control
+ port or other mechanism, such that the browser could display how long the
+ PoW is expected to take on their device. If the device is a mobile platform,
+ and this time estimation is large, it could recommend that the user try from
+ a desktop machine.
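The estimation step above can be sketched as follows, assuming (as with
Equi-X-style puzzles) that the expected number of solver attempts scales
roughly linearly with the suggested effort; the callback and its name are
illustrative only:

```python
import time

def estimate_pow_seconds(suggested_effort, one_attempt, probe_attempts=1000):
    """Estimate wall-clock PoW duration by timing a probe of solver attempts.

    one_attempt: callable standing in for a single solver iteration
    (hypothetical; a real client would time its actual PoW inner loop).
    """
    start = time.monotonic()
    for _ in range(probe_attempts):
        one_attempt()
    per_attempt = (time.monotonic() - start) / probe_attempts
    # Expected attempts ~ suggested_effort, so scale the measured cost.
    return suggested_effort * per_attempt
```

A browser could compare this estimate against a threshold before deciding
whether to recommend trying from a desktop machine.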
7.2. Future work [FUTURE_WORK]
@@ -1127,4 +1212,4 @@ A.2. References
[REF_TLS_1]: https://www.ietf.org/archive/id/draft-nygren-tls-client-puzzles-02.txt
[REF_TEVADOR_1]: https://lists.torproject.org/pipermail/tor-dev/2020-May/014268.html
[REF_TEVADOR_2]: https://lists.torproject.org/pipermail/tor-dev/2020-June/014358.html
- [REF_TEVADOR_SIM]: https://github.com/tevador/scratchpad/blob/master/tor-pow/effort_sim.md
+ [REF_TEVADOR_SIM]: https://github.com/mikeperry-tor/scratchpad/blob/master/tor-pow/effort_sim.py#L57
diff --git a/proposals/329-traffic-splitting.txt b/proposals/329-traffic-splitting.txt
index b9f5dce..93655f0 100644
--- a/proposals/329-traffic-splitting.txt
+++ b/proposals/329-traffic-splitting.txt
@@ -2,14 +2,16 @@ Filename: 329-traffic-splitting.txt
Title: Overcoming Tor's Bottlenecks with Traffic Splitting
Author: David Goulet, Mike Perry
Created: 2020-11-25
-Status: Draft
+Status: Needs-Revision
0. Status
This proposal describes the Conflux [CONFLUX] system developed by
Mashael AlSabah, Kevin Bauer, Tariq Elahi, and Ian Goldberg. It aims at
improving Tor client network performance by dynamically splitting
- traffic between two circuits.
+ traffic between two circuits. We have made several additional improvements
+ to the original Conflux design, by making use of congestion control
+ information, as well as updates from Multipath TCP literature.
1. Overview
@@ -36,22 +38,27 @@ Status: Draft
Tor relay queues, and not with any other bottlenecks (such as
intermediate Internet routers), we can avoid this complexity merely by
specifying that any paths that are constructed SHOULD NOT share any
- relays. In this way, we can proceed to use the exact same congestion
- control as specified in Proposal 324, for each path.
-
- For this reason, this proposal will focus on the traffic scheduling
- algorithms, rather than coupling. We propose three candidate algorithms
- that have been studied in the literature, and will compare their
- performance using simulation and consensus parameters.
+ relays (except for the exit). This assumption is valid, because non-relay
+ bottlenecks are managed by TCP on client-to-relay and relay-to-relay OR
+ connections, and not Tor's circuit-level congestion control. In this way,
+ we can proceed to use the exact same congestion control as specified in
+ [PROP324], for each path.
+
+ For this reason, this proposal will focus on protocol specification, and
+ the traffic scheduling algorithms, rather than coupling. Note that the
+ scheduling algorithms are currently in flux, and will be subject to
+ change as we tune them in Shadow, on the live network, and for future
+ UDP implementation (see [PROP339]). This proposal will be kept up to
+ date with the current implementation.
1.2. Divergence from the initial Conflux design
The initial [CONFLUX] paper doesn't provide any indications on how to
handle the size of out-of-order cell queue, which we consider a
potential dangerous memory DoS vector (see [MEMORY_DOS]). It also used
- RTT as the sole heuristic for selecting which circuit to send on, which
+ RTT as the sole heuristic for selecting which circuit to send on (which
may vary depending on the geographical locations of the participant
- relays, without considering their actual available circuit capacity
+ relays), without considering their actual available circuit capacity
(which will be available to us via Proposal 324). Additionally, since
the publication of [CONFLUX], more modern packet scheduling algorithms
have been developed, which aim to reduce out-of-order queue size.
@@ -62,13 +69,11 @@ Status: Draft
side channel, and traffic analysis risks and benefits in [RESUMPTION],
[SIDE_CHANNELS] and [TRAFFIC_ANALYSIS].
+1.3. Design Overview
-2. Design
-
- The following section describes the Conflux design. Each sub-section is
- a building block to the multipath design that Conflux proposes.
+ The following section describes the Conflux design.
- The circuit construction is as follow:
+ The circuit construction is as follows:
Primary Circuit (lower RTT)
+-------+ +--------+
@@ -91,15 +96,24 @@ Status: Draft
performance.
Then, the OP needs to link the two circuits together, as described in
- [LINKING_CIRCUITS], [LINKING_EXIT], and [LINKING_SERVICE].
+ [CONFLUX_HANDSHAKE].
+
+ For ease of explanation, the primary circuit is the circuit that is
+ more desirable to use, as per the scheduling algorithm, and the secondary
+ circuit is used after the primary is blocked by congestion control. Note
+ that for some algorithms, this selection becomes fuzzy, but all of them
+ favor the circuit with lower RTT, at the beginning of transmission.
- For ease of explanation, the primary circuit is the circuit with lower
- RTT, and the secondary circuit is the circuit with higher RTT. Initial
- RTT is measured during circuit linking, as described in
- [LINKING_CIRCUITS]. RTT is continually measured using SENDME timing, as
- in Proposal 324. This means that during use, the primary circuit and
- secondary circuit may switch roles, depending on unrelated network
- congestion caused by other Tor clients.
+ Note also that this notion of primary vs secondary is a local property
+ of the current sender: each endpoint may have different notions of
+ primary, secondary, and current sending circuit. They also may use
+ different scheduling algorithms to determine this.
+
+ Initial RTT is measured during circuit linking, as described in
+ [CONFLUX_HANDSHAKE]. After the initial link, RTT is continually measured
+ using SENDME timing, as in Proposal 324. This means that during use,
+ the primary circuit and secondary circuit may switch roles, depending on
+ unrelated network congestion caused by other Tor clients.
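The primary/secondary notion above can be sketched as a selection over legs
(a minimal illustration with a hypothetical leg representation; real
schedulers also weigh congestion-window and queue state, per [SCHEDULING]):

```python
def pick_sending_leg(legs):
    """legs: list of (rtt, cwnd_blocked) tuples, one per linked circuit.

    Returns the index of the leg to send on: the lowest-RTT leg that is
    not blocked by congestion control, or None if all legs are blocked.
    """
    usable = [i for i, (rtt, blocked) in enumerate(legs) if not blocked]
    if not usable:
        return None
    return min(usable, key=lambda i: legs[i][0])
```

Because RTT is re-measured continually via SENDME timing, the result of this
selection (and hence which leg is "primary") can change during use.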
We also support linking onion service circuits together. In this case,
only two rendezvous circuits are linked. Each of these RP circuits will
@@ -107,14 +121,14 @@ Status: Draft
constraints apply to each half of the circuits (no shared relays between
the legs). If, by chance, the service and the client sides end up
sharing some relays, this is not catastrophic. Multipath TCP researchers
- we have consulted (see [ACKNOWLEDGEMENTS]), believe Tor's congestion
+ we have consulted (see [ACKNOWLEDGMENTS]), believe Tor's congestion
control from Proposal 324 to be sufficient in this rare case.
- Only two circuits SHOULD be linked together. However, implementations
- SHOULD make it easy for researchers to *test* more than two paths, as
- this has been shown to assist in traffic analysis resistance[WTF_SPLIT].
- At minimum, this means not hardcoding only two circuits in the
- implementation.
+ In the algorithms we recommend here, only two circuits will be linked
+ together at a time. However, implementations SHOULD support more than
+ two paths, as this has been shown to assist in traffic analysis
+ resistance[WTF_SPLIT], and will also be useful for maintaining a desired
+ target RTT, for UDP VoIP applications.
If the number of circuits exceeds the current number of guard relays,
guard relays MAY be re-used, but implementations SHOULD use the same
@@ -123,6 +137,9 @@ Status: Draft
Linked circuits MUST NOT be extended further once linked (ie:
'cannibalization' is not supported).
+
+2. Protocol Mechanics
+
2.1. Advertising support for conflux
2.1.1 Relay
@@ -130,26 +147,29 @@ Status: Draft
We propose a new protocol version in order to advertise support for
circuit linking on the relay side:
- "Relay=5" -- Relay supports Conflux as in linking circuits together using
- the new LINK, LINKED and SWITCH relay command.
+ "Conflux=1" -- Relay supports Conflux as in linking circuits together using
+ the new LINK, LINKED and SWITCH relay command.
2.1.2 Onion Service
We propose to add a new line in order to advertise conflux support in the
- onion service descriptor:
+ encrypted section of the onion service descriptor:
- "conflux" SP max-num-circ NL
+ "conflux" SP max-num-circ SP desired-ux NL
The "max-num-circ" value indicate the maximum number of rendezvous
circuits that are allowed to be linked together.
- XXX: We should let the service specify the conflux algorithm to use.
- Some services may prefer latency (LowRTT), where as some may prefer
- throughput (BLEST).
+ We let the service specify the conflux algorithm to use, when sending data
+ to the service. Some services may prefer latency, whereas some may prefer
+ throughput. However, clients also have the ability to request their own UX
+ for data that the service sends, in the LINK handshake below, in part
+ because the high-throughput algorithms will require more out-of-order queue
+ memory, which may be infeasible on mobile.
The next section describes how the circuits are linked together.
-2.2. Linking circuits [LINKING_CIRCUITS]
+2.2. Conflux Handshake [CONFLUX_HANDSHAKE]
To link circuits, we propose new relay commands that are sent on both
circuits, as well as a response to confirm the join, and an ack of this
@@ -161,8 +181,9 @@ Status: Draft
linked.
When packed cells are a reality (proposal 340), these cells SHOULD be
- combined with the initial RELAY_BEGIN cell on the faster circuit leg. See
- [LINKING_EXIT] and [LINKING_SERVICE] for more details on setup in each case.
+ combined with the initial RELAY_BEGIN cell on the faster circuit leg.
+ This combination also allows better enforcement against side channels.
+ (See [SIDE_CHANNELS]).
There are other ways to do this linking that we have considered, but they
seem not to be significantly better than this method, especially since we can
@@ -183,7 +204,8 @@ Status: Draft
Sent from the exit/service to the OP, to confirm the circuits were
linked.
- These cells have the following contents:
+ The contents of these two cells are exactly the same. They have the following
+ contents:
VERSION [1 byte]
PAYLOAD [variable, up to end of relay payload]
@@ -197,18 +219,7 @@ Status: Draft
NONCE [32 bytes]
LAST_SEQNO_SENT [8 bytes]
LAST_SEQNO_RECV [8 bytes]
- ALGORITHM [1 byte]
-
- XXX: Should we let endpoints specify their preferred [SCHEDULING] alg
- here, to override consensus params? This has benefits: eg low-memory
- mobile clients can ask for an alg that is better for their reorder
- queues. But it also has complexity risk, if the other endpoint does not
- want to support it, because of its own memory issues.
- - YES. At least for Exit circuits, we *will* want to let clients
- request LowRTT or BLEST/CWND scheduling. So we need an algorithm
- field here.
- - XXX: We need to define rules for negotiation then, for onions and
- exits vs consensus.
+ DESIRED_UX [1 byte]
The NONCE contains a random 256-bit secret, used to associate the two
circuits together. The nonce MUST NOT be shared outside of the circuit
@@ -216,7 +227,28 @@ Status: Draft
MUST NOT be logged to disk.
The two sequence number fields are 0 upon initial link, but non-zero in
- the case of a resumption attempt (See [RESUMPTION]).
+ the case of a reattach or resumption attempt (See [CONFLUX_SET_MANAGEMENT]
+ and [RESUMPTION]).
+
+ The DESIRED_UX field allows the endpoint to request the UX properties
+ it wants. The other endpoint SHOULD select the best known scheduling
+ algorithm, for these properties. The endpoints do not need to agree
+ on which UX style they prefer.
+
+ The UX properties are:
+
+ 0 - NO_OPINION
+ 1 - MIN_LATENCY
+ 2 - LOW_MEM_LATENCY
+ 3 - HIGH_THROUGHPUT
+ 4 - LOW_MEM_THROUGHPUT
+
+ The algorithm choice is performed by the *sender* of data (ie: the
+ receiver of the PAYLOAD). The receiver of data (sender of the PAYLOAD)
+ does not need to be aware of the exact algorithm in use, but MAY enforce
+ expected properties (particularly low queue usage, in the case of requesting
+ either LOW_MEM_LATENCY or LOW_MEM_THROUGHPUT). The receiver MAY close the
+ entire conflux set if these properties are violated.
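A sender-side sketch of this negotiation follows. The DESIRED_UX wire values
are from the handshake above, but the mapping to concrete algorithm names is
the sender's own choice, so the "lowrtt"/"cwnd" labels here are illustrative:

```python
# DESIRED_UX wire values, as defined in the LINK/LINKED payload.
NO_OPINION, MIN_LATENCY, LOW_MEM_LATENCY = 0, 1, 2
HIGH_THROUGHPUT, LOW_MEM_THROUGHPUT = 3, 4

def pick_algorithm(desired_ux):
    """Choose a scheduling algorithm for the peer's requested UX.

    The endpoints need not agree: each sender picks independently, and the
    receiver may only enforce properties such as low reorder-queue usage.
    """
    return {
        MIN_LATENCY: "lowrtt",
        LOW_MEM_LATENCY: "lowrtt",      # plus tight reorder-queue bounds
        HIGH_THROUGHPUT: "cwnd",
        LOW_MEM_THROUGHPUT: "cwnd",     # plus tight reorder-queue bounds
    }.get(desired_ux, "lowrtt")         # NO_OPINION/unknown: sender default
```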
If either circuit does not receive a RELAY_CONFLUX_LINKED response, both
circuits MUST be closed.
@@ -229,40 +261,51 @@ Status: Draft
Sent from the OP to the exit/service, to provide initial RTT
measurement for the exit/service.
- For timeout of the handshake, clients SHOULD use the normal SOCKS/stream
- timeout already in use for RELAY_BEGIN.
-
- These three relay commands are send on *each* leg, to allow each endpoint to
+ These three relay commands are sent on *each* leg, to allow each endpoint to
measure the initial RTT of each leg.
- The circuit SHOULD be closed if at least one of these conditions is met:
+ The client SHOULD abandon and close the circuit if the LINKED message takes
+ too long to arrive. This timeout MUST be no larger than the normal SOCKS/stream
+ timeout in use for RELAY_BEGIN, but MAY be the Circuit Build Timeout value,
+ instead. (The C-Tor implementation currently uses Circuit Build Timeout).
- - Once a LINK is received, if the next cell relay command is not a
- LINKED_ACK, unless the command is in a packed cell.
- - Once a LINKED_ACK is received, receiving any other command than these:
- * BEGIN, DATA, END, CONNECTED, RESOLVE, RESOLVED, XON, XOFF, SWITCH
- - Receiving a LINKED without a LINK.
- - Receiving a LINKED_ACK without having sent a LINKED.
-
- XXX Must define our LINK rate limiting parameters.
+ See [SIDE_CHANNELS] for rules for when to reject unexpected handshake cells.
2.2. Linking Circuits from OP to Exit [LINKING_EXIT]
- To link exit circuits, two circuits to the same exit are built. The
- client records the circuit build time of each.
-
- If the circuits are being built on-demand, for immediate use, the circuit
- with the lower build time SHOULD use Proposal 340 to append its first RELAY
- cell to the RELAY_CONFLUX_LINK, on the circuit with the lower circuit build
- time. The exit MUST respond on this same leg. After that, actual RTT
- measurements MUST be used to determine future transmissions, as specified in
- [SCHEDULING].
+ To link exit circuits, two circuits to the same exit are built, with
+ additional restrictions such that they do not share Guard or Middle
+ relays. This restriction applies via specific relay identity keys,
+ and not IP addresses, families, or networks. (This is because the purpose
+ of it is to avoid sharing a bottleneck *inside* relay circuit queues;
+ bottlenecks caused by a shared network are handled by TCP's congestion
+ control on the OR conns).
+
+ Bridges also are subject to the same constraint as Guard relays;
+ the C-Tor codebase emits a warning if only one bridge is configured, unless
+ that bridge has transport "snowflake". Snowflake is exempt from this
+ Guard restriction because it is actually backed by many bridges. In the
+ bridge case, we only warn, and do not refuse to build conflux circuits,
+ because it is not catastrophic that Bridges are shared, it is just
+ sub-optimal for performance and congestion.
+
+ When each circuit is opened, we ensure that congestion control
+ has been negotiated. If congestion control negotiation has failed, the
+ circuit MUST be closed. After this, the linking handshake begins.
The RTT times between RELAY_CONFLUX_LINK and RELAY_CONFLUX_LINKED are
- measured by the client, to determine each circuit RTT to determine primary vs
- secondary circuit use, and for packet scheduling. Similarly, the exit
- measures the RTT times between RELAY_CONFLUX_LINKED and
- RELAY_CONFLUX_LINKED_ACK, for the same purpose.
+ measured by the client, to determine primary vs secondary circuit use,
+ and for packet scheduling. Similarly, the exit measures the RTT times
+ between RELAY_CONFLUX_LINKED and RELAY_CONFLUX_LINKED_ACK, for the same
+ purpose.
+
+ Because of the race between initial data and the RELAY_CONFLUX_LINKED_ACK
+ cell, conditions can arise where an Exit needs to send data before the
+ slowest circuit delivers this ACK. In these cases, it should prefer sending
+ data on the circuit that has delivered the ACK (which will arrive immediately
+ prior to any data from the client). This circuit will be the lower RTT
+ circuit anyway, but the code needs to handle the fact that in this case,
+ there won't yet be an RTT for the second circuit.
2.3. Linking circuits to an onion service [LINKING_SERVICE]
@@ -274,7 +317,20 @@ Status: Draft
meet the client at two separate rendezvous points. These introduce
requests MUST be sent to the same intropoint (due to potential use of
onionbalance), and SHOULD be sent back-to-back on the same intro
- circuit. They MAY be combined with Proposal 340.
+ circuit. They MAY be combined with Proposal 340. (Note that if we do not
+ use Prop340, we will have to raise the limit on the number of intros per
+ client circuit to 2 at intropoints).
+
+ When rendezvous circuits are built, they should use the same Guard,
+ Bridge, and Middle restrictions as specified in 2.2, for Exits. These
+ restrictions SHOULD also take into consideration all Middles in the path,
+ including the rendezvous point. All relay identities should be unique
+ (again, except for when the Snowflake transport is in use). The one
+ special case here is if the chosen rendezvous points by a client
+ are the same as the service's guards. In this case, the service SHOULD
+ NOT use different guards, but instead stick with the guards it has.
+ The reason for this is that we do not want to create the ability
+ for a client to force a service to use different guards.
The first rendezvous circuit to get joined SHOULD use Proposal 340 to
append the RELAY_BEGIN command, and the service MUST answer on this
@@ -283,18 +339,110 @@ Status: Draft
Once both circuits are linked and RTT is measured, packet scheduling
MUST be used, as per [SCHEDULING].
-2.4. Congestion Control Application [CONGESTION_CONTROL]
+2.4. Conflux Set Management [CONFLUX_SET_MANAGEMENT]
+
+ When managing legs, it is useful to separate sets that have completed the
+ link handshake from legs that are still performing the handshake. Linked
+ sets MAY have additional unlinked legs on the way, but these should not
+ be used for sending data until the handshake is complete.
+
+ It is also useful to enforce various additional conditions on the handshake,
+ depending on whether [RESUMPTION] is supported, and whether a leg has been
+ launched because of an early failure, or due to a desire for replacement.
+
+2.4.1. Pre-Building Sets
- The SENDMEs for congestion control are performed per-leg. As data
- arrives, regardless of its ordering, it is counted towards SENDME
- delivery. In this way, 'cwnd - package_window' of each leg always
- reflects the available data to send on each leg. This is important for
+ In C-Tor, conflux is only used via circuit prebuilding. Pre-built conflux
+ sets are preferred over other pre-built circuits, but if the pre-built pool
+ ends up empty, normal pre-built circuits are used. If those run out, regular
+ non-conflux circuits are built. In other words, in C-Tor, conflux sets are
+ never built on-demand, but this is strictly an implementation decision, to
+ simplify dealing with the C-Tor codebase.
+
+ The consensus parameter 'cfx_max_prebuilt_set' specifies the number of
+ sets to pre-build.
+
+ During upgrade, the consensus parameter 'cfx_low_exit_threshold' will be
+ used, so that if there is a low number of conflux-supporting exits, only
+ one conflux set will be built.
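As an illustration, the pool-size gate described above could be computed as
follows. This is a minimal sketch with hypothetical helper and argument names;
only 'cfx_low_exit_threshold' and 'cfx_max_prebuilt_set' come from this proposal.

```python
# Hypothetical sketch of the 'cfx_low_exit_threshold' gate; not C-Tor's API.

def allowed_prebuilt_sets(supporting_exits: int, total_exits: int,
                          cfx_low_exit_threshold: int,
                          cfx_max_prebuilt_set: int) -> int:
    # The threshold is a fraction expressed out of 10000 (see [CONSENSUS]).
    frac = (supporting_exits * 10000) // max(total_exits, 1)
    if frac < cfx_low_exit_threshold:
        return 1  # too few conflux-capable exits: pre-build only one set
    return cfx_max_prebuilt_set
```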
+
+2.4.2. Set construction
+
+ When a set is launched, legs begin the handshake in the unlinked state.
+ As handshakes complete, finalization is attempted, to create a linked set.
+ On the client, this finalization happens upon receipt of the LINKED cell.
+ On the exit/service, this finalization happens upon *sending* the LINKED_ACK.
+
+ The initiator of this handshake considers the set fully linked once the
+ RELAY_CONFLUX_LINKED_ACK is sent (roughly upon receipt of the LINKED cell).
+ Because of the potential race between LINKED_ACK, and initial data sent by
+ the client, the receiver of the handshake must consider a leg linked at
+ the time of *sending* a LINKED_ACK cell.
+
+ This means that exit legs may not have an RTT measurement, if data on the
+ faster leg beats the LINKED_ACK on the slower leg. The implementation MUST
+ account for this, by treating unmeasured legs as having infinite RTT.
+
+ When attempting to finalize a set, this finalization should not complete
+ if any unlinked legs are still pending.
+
+2.4.3. Closing circuits
+
+ For circuits that are unlinked, the origin SHOULD immediately relaunch a new
+ leg when it is closed, subject to the limits in [SIDE_CHANNELS].
+
+ In C-Tor, we do not support arbitrary resumption. Therefore, we perform
+ some additional checks upon closing circuits, to decide if we should
+ immediately tear down the entire set:
+ - If the closed leg was the current sending leg, close the set
+ - If the closed leg had the highest non-zero last_seq_recv/sent, close the set
+ - If data was in progress on a closed leg (inflight > cc_sendme_inc), then
+ all legs must be closed
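A rough sketch of these checks follows. The `Leg` structure and
`should_close_set` helper are illustrative, not C-Tor's actual data
structures; 31 is the current 'cc_sendme_inc' consensus value.

```python
from dataclasses import dataclass

CC_SENDME_INC = 31  # current consensus value of 'cc_sendme_inc'

@dataclass
class Leg:
    is_current_sender: bool
    last_seq_recv: int
    last_seq_sent: int
    inflight: int

def should_close_set(closed: Leg, remaining: list[Leg]) -> bool:
    """True if losing 'closed' forces tearing down the whole set."""
    if closed.is_current_sender:
        return True
    # Closed leg held the highest non-zero sequence numbers.
    max_recv = max((leg.last_seq_recv for leg in remaining), default=0)
    max_sent = max((leg.last_seq_sent for leg in remaining), default=0)
    if (closed.last_seq_recv != 0 and closed.last_seq_recv > max_recv) or \
       (closed.last_seq_sent != 0 and closed.last_seq_sent > max_sent):
        return True
    # Data in progress beyond one SENDME increment means data loss.
    return closed.inflight > CC_SENDME_INC
```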
+
+2.4.4. Reattaching Legs
+
+ While C-Tor does not support arbitrary resumption, new legs *can* be
+ attached, so long as there is no risk of data loss from a closed leg.
+ This enables latency probing, which will be important for UDP VoIP.
+
+ Currently, the C-Tor codebase checks for data loss by verifying that
+ the LINK/LINKED cell has a lower last_seq_sent than all current
+ legs' maximum last_seq_recv, and a lower last_seq_recv than all
+ current legs' last_seq_sent.
+
+ This check is performed at finalization, not upon receipt of the first
+ handshake cell. This gives the data additional time to arrive.
+
+2.5. Congestion Control Application [CONGESTION_CONTROL]
+
+ The SENDMEs for congestion control are performed per-leg. As soon as
+ data arrives, regardless of its ordering, it is counted towards SENDME
+ delivery. In this way, 'cwnd - inflight' of each leg always reflects
+ the available data to send on each leg. This is important for
[SCHEDULING].
The Congestion control Stream XON/XOFF can be sent on either leg, and
applies to the stream's transmission on both legs.
-2.5. Sequencing [SEQUENCING]
+ In C-Tor, streams used to become blocked as soon as the OR conn
+ of their circuit was blocked. Because conflux can send on the other
+ circuit, which uses a different OR conn, this form of stream blocking
+ has been decoupled from the OR conn status, and only happens when
+ congestion control has decided that all circuits are blocked. (Congestion
+ control considers a circuit blocked when either 'cwnd - inflight <= 0',
+ *or* when its local OR conn is blocked; thus, if all local OR conns of a
+ set are blocked, the stream will block that way.)
+
+ Note also that because congestion control only covers RELAY_COMMAND_DATA
+ cells, for all algorithms, a special case must be made such that if no
+ circuit is available to send on due to congestion control blocking,
+ commands other than RELAY_COMMAND_DATA MUST be sent on the current
+ circuit, even if the cell scheduler believes that no circuit is available.
+ Depending on the code structure of Arti, this special case may or may
+ not be necessary. It arises in C-Tor because nothing can block the
+ sending of arbitrary non-DATA relay command cells.
+
+2.6. Sequencing [SEQUENCING]
With multiple paths for data, the problem of data re-ordering appears.
In other words, cells can arrive out of order from the two circuits
@@ -315,8 +463,10 @@ Status: Draft
22 -- RELAY_CONFLUX_SWITCH
- Sent from the client to the exit/service when switching leg in an
- already linked circuit construction.
+ Sent from a sending endpoint when switching leg in an
+ already linked circuit construction. This message is sent on the leg
+ that will be used for new traffic, and tells the receiver the size of
+ the gap since the last data (if any) sent on that leg.
The cell payload format is:
@@ -348,66 +498,39 @@ Status: Draft
the leg should be switched in order to reset that relative sequence number to
fit within 4 bytes.
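For illustration, the relative SeqNum carried in a SWITCH could be derived
as below. All names are hypothetical; only the 4-byte limit and the "gap
since last data sent on that leg" semantics come from the proposal.

```python
MAX_RELATIVE_SEQNUM = 2**32 - 1  # SeqNum must fit in 4 bytes

def switch_gap(absolute_sent: int, last_sent_on_new_leg: int) -> int:
    """Size of the gap since the last cell sent on the leg switched to."""
    gap = absolute_sent - last_sent_on_new_leg
    if not 0 <= gap <= MAX_RELATIVE_SEQNUM:
        raise ValueError("switch legs before the relative seqnum overflows")
    return gap
```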
- In order to rate limit the use of SWITCH to prevent its use as a DropMark
- side channel, the circuit SHOULD be closed if at least one of these
- conditions is met:
-
- - The SeqNum value is below the "cc_sendme_inc" which is currently set
- at 31.
- - If immediately after receiving a SWITCH, another one is received.
+ For a discussion of rules to rate limit the usage of SWITCH as a side
+ channel, see [SIDE_CHANNELS].
- XXX: We should define our rate limiting.
-
- - If we are NOT an exit circuit.
- - If the SeqNum makes our absolute sequence number to overflow.
-
-2.6. Resumption [RESUMPTION]
+2.7. Resumption [RESUMPTION]
    In the event that a circuit leg is destroyed, it MAY be resumed.
+ Full resumption is not supported in C-Tor, but it is possible to implement,
+ at the expense of always storing roughly a congestion window of
+ already-transmitted data on each endpoint, in the worst case. Simpler
+ forms of resumption, where there is no data loss, are supported. This
+ is important to support latency probing, for ensuring UDP VoIP minimum
+ RTT requirements are met (roughly 300-500ms, depending on VoIP
+ implementation).
Resumption is achieved by re-using the NONCE to the same endpoint
(either [LINKING_EXIT] or [LINKING_SERVICE]). The resumed path need
not use the same middle and guard relays as the destroyed leg(s), but
    SHOULD NOT share any relays with any existing leg(s).
- To provide resumption, endpoints store an absolute 64bit cell counter of
- the last cell they have sent on a conflux pair (their LAST_SEQNO_SENT),
- as well the last sequence number they have delivered in-order to edge
- connections corresponding to a conflux pair (their LAST_SEQNO_RECV).
- Additionally, endpoints MAY store the entire contents of unacked
- inflight cells (ie the 'package_window' from proposal 324), for each
- leg, along with information corresponding to those cells' absolute
- sequence numbers.
-
- These 64 bit absolute counters can wrap without issue, as congestion
- windows will never grow to 2^64 cells until well past the Singularity.
- However, it is possible that extremely long, bulk circuits could exceed
- 2^64 total sent or received cells, so endpoints SHOULD handle wrapped
- sequence numbers for purposes of computing retransmit information. (But
- even this case is unlikely to happen within the next decade or so).
-
- Upon resumption, the LAST_SEQNO_SENT and LAST_SEQNO_RECV fields are used
- to convey the sequence numbers of the last cell the relay sent and
- received on that leg. The other endpoint can use these sequence numbers
- to determine if it received the in-flight data or not, or sent more data
- since that point, up to and including this absolute sequence number. If
- LAST_SEQNO_SENT has not been received, the endpoint MAY transmit the
- missing data, if it still has it buffered.
-
- Because both endpoints get information about the other side's absolute
- SENT sequence number, they will know exactly how many re-transmitted
- packets to expect, if the circuit is successfully resumed.
-
- Re-transmitters MUST NOT re-increment their absolute sent fields
- while re-transmitting.
+ If data loss has been detected upon a link handshake, resumption can be
+ achieved by sending a switch cell, which is immediately followed by the
+ missing data. Roughly, each endpoint must check:
+ - if cell.last_seq_recv <
+ min(max(legs.last_seq_sent),max(closed_legs.last_seq_sent)):
+ - send a switch cell immediately with missing data:
+ (last_seq_sent - cell.last_seq_recv)
- If it does not have this missing data due to memory pressure, that
- endpoint MUST destroy *both* legs, as this represents unrecoverable
+ If an endpoint does not have this missing data due to memory pressure,
+ that endpoint MUST destroy *both* legs, as this represents unrecoverable
data loss.
- Otherwise, the new circuit can be re-joined, and its RTT can be compared
- to the remaining circuit to determine if the new leg is primary or
- secondary.
+ Re-transmitters MUST NOT re-increment their absolute sent fields
+ while re-transmitting.
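The data-loss check above might look roughly like the following. This is a
hedged sketch with illustrative names; the min/max structure follows the
rules quoted above.

```python
def retransmit_range(cell_last_seq_recv: int,
                     leg_last_seq_sent: list[int],
                     closed_leg_last_seq_sent: list[int]):
    """Return the (start, end] range of cells to resend, or None."""
    highest_sent = min(max(leg_last_seq_sent, default=0),
                       max(closed_leg_last_seq_sent, default=0))
    if cell_last_seq_recv < highest_sent:
        # Send a SWITCH, then the (highest_sent - cell_last_seq_recv)
        # missing cells, without re-incrementing absolute sent counters.
        return (cell_last_seq_recv, highest_sent)
    return None  # no data loss; the leg reattaches without retransmission
```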
It is even possible to resume conflux circuits where both legs have been
collapsed using this scheme, if endpoints continue to buffer their
@@ -418,60 +541,102 @@ Status: Draft
given priority to be freed in any oomkiller invocation. See [MEMORY_DOS]
for more oomkiller information.
+2.8. Data transmission
+
+ Most cells in Tor are circuit-specific, and should only be sent on a
+ circuit, even if that circuit is part of a conflux set. Cells that
+ are not multiplexed do not count towards the conflux sequence numbers.
+
+ However, in addition to the obvious RELAY_COMMAND_DATA, a subset of cells
+ MUST ALSO be multiplexed, so that their ordering is preserved when they
+ arrive at the other end. These cells do count towards conflux sequence
+ numbers, and are handled in the out-of-order queue, to preserve ordered
+ delivery:
+ RELAY_COMMAND_BEGIN
+ RELAY_COMMAND_DATA
+ RELAY_COMMAND_END
+ RELAY_COMMAND_CONNECTED
+ RELAY_COMMAND_RESOLVE
+ RELAY_COMMAND_RESOLVED
+ RELAY_COMMAND_XOFF
+ RELAY_COMMAND_XON
+
+ Currently, this set is the same as the set of cells that have stream ID,
+ but the property that leads to this requirement is not stream usage by
+ itself, it is that these cells must be ordered with respect to all data
+ on the circuit. It is not impossible that future relay commands could be
+ invented that don't have stream IDs, but yet must still arrive in order
+ with respect to circuit data cells. Prop#253 is one possible example of
+ such a thing (though we won't be implementing that proposal).
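The set above can be expressed as a trivial membership check. Command names
are as in tor-spec; the helper itself is illustrative.

```python
# Relay commands that count toward conflux sequence numbers and pass
# through the out-of-order queue, per section 2.8.
MULTIPLEXED_RELAY_COMMANDS = frozenset({
    "BEGIN", "DATA", "END", "CONNECTED",
    "RESOLVE", "RESOLVED", "XOFF", "XON",
})

def counts_toward_conflux_seqnums(command: str) -> bool:
    return command in MULTIPLEXED_RELAY_COMMANDS
```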
+
3. Traffic Scheduling [SCHEDULING]
In order to load balance the traffic between the two circuits, the
original conflux paper used only RTT. However, with Proposal 324, we
will have accurate information on the instantaneous available bandwidth
- of each circuit leg, as 'cwnd - package_window' (see Section 3 of
- Proposal 324).
-
- Some additional RTT optimizations are also useful, to improve
- responsiveness and minimize out-of-order queue sizes.
+ of each circuit leg, as 'cwnd - inflight' (see Section 3 of
+ Proposal 324). We also have the TCP block state of the local OR
+ connection.
We specify two traffic schedulers from the multipath literature and
- adapt them to Tor: [LOWRTT_TOR] and [BLEST_TOR]. [LOWRTT_TOR] also has
- three variants, with different trade offs.
+ adapt them to Tor: [MINRTT_TOR], and [LOWRTT_TOR]. Additionally,
+ we create low-memory variants of these that aim to minimize the
+ out-of-order queue size at the receiving endpoint.
- However, see the [TRAFFIC_ANALYSIS] sections of this proposal for
+ Additionally, see the [TRAFFIC_ANALYSIS] sections of this proposal for
important details on how this selection can be changed, to reduce
website traffic fingerprinting.
-3.1. LowRTT Scheduling [LOWRTT_TOR]
+3.1. MinRTT scheduling [MINRTT_TOR]
- This scheduling algorithm is based on the original [CONFLUX] paper, with
- ideas from [MPTCP]'s minRTT/LowRTT scheduler.
+ This scheduling algorithm is used for the MIN_LATENCY user experience.
- In this algorithm, endpoints send cells on the circuit with lower RTT
- (primary circuit). This continues while the congestion window on the
- circuit has available room: ie whenever cwnd - package_window > 0.
+ It works by always and only sending on the circuit with the current minimum
+ RTT. With this algorithm, conflux should effectively stay on the circuit with
+ the lowest initial RTT, unless that circuit's RTT raises above the RTT of the
+ other circuit (due to relay load or congestion). When the circuit's congestion
+ window is full (ie: cwnd - inflight <= 0), or if the local OR conn blocks,
+ the conflux set stops transmitting and stops reading on edge connections,
+ rather than switch.
- Whenever the primary circuit's congestion window becomes full, the
- secondary circuit is used. We stop reading on the send window source
- (edge connection) when both congestion windows become full.
+ This should result in low out-of-order queues in most situations, unless
+ the initial RTTs of the two circuits are very close (basically within the
+ Vegas RTT bounds of queue variance, 'alpha' and 'beta').
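A minimal sketch of this scheduler, under illustrative names; unmeasured
legs carry infinite RTT, as required by section 2.4.2.

```python
import math
from dataclasses import dataclass

@dataclass
class Leg:
    rtt: float            # math.inf until measured (see 2.4.2)
    cwnd: int
    inflight: int
    orconn_blocked: bool

def minrtt_pick(legs: list[Leg]):
    """Send only on the min-RTT leg; if it cannot send, pause, don't switch."""
    best = min(legs, key=lambda leg: leg.rtt)
    can_send = not best.orconn_blocked and best.cwnd - best.inflight > 0
    return best if can_send else None
```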
+
+3.2. LowRTT Scheduling [LOWRTT_TOR]
+
+ This scheduling algorithm is based on [MPTCP]'s LowRTT scheduler. This
+ algorithm is used for the UX choice of HIGH_THROUGHPUT.
+
+ In this algorithm, endpoints send cells on the circuit with lowest RTT that
+ has an unblocked local OR connection, and room in its congestion window (ie:
+ cwnd - inflight > 0). We stop reading on edge connections only when both
+ congestion windows become full, or when both local OR connections are blocked.
In this way, unlike original conflux, we switch to the secondary circuit
- without causing congestion on the primary circuit. This improves both
- load times, and overall throughput.
+ without causing congestion either locally, or on either circuit. This
+ improves both load times, and overall throughput. Given a large enough
+ transmission, both circuits are used to their full capacity,
+ simultaneously.
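A minimal sketch of this scheduler, under the same illustrative `Leg` shape
as the MinRTT sketch; the key difference is that a full or blocked min-RTT
leg causes a switch rather than a pause.

```python
from dataclasses import dataclass

@dataclass
class Leg:
    rtt: float            # float('inf') until measured
    cwnd: int
    inflight: int
    orconn_blocked: bool

def lowrtt_pick(legs: list[Leg]):
    """Lowest-RTT leg with cwnd room and an unblocked OR conn; else None."""
    sendable = [leg for leg in legs
                if not leg.orconn_blocked and leg.cwnd - leg.inflight > 0]
    return min(sendable, key=lambda leg: leg.rtt) if sendable else None
```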
+
+3.3. MinRTT Low-Memory Scheduling [MINRTT_LOWMEM_TOR]
- This behavior matches minRTT from [MPTCP], sometimes called LowRTT.
+ The low memory version of the MinRTT scheduler ensures that we do not
+ perform a switch more often than once per congestion window worth of data.
- It may be better to stop reading on the edge connection when the primary
- congestion window becomes full, rather than switch to the secondary
- circuit as soon as the primary congestion window becomes full. (Ie: only
- switch if the RTTs themselves change which circuit is primary). This is
- what was done in the original Conflux paper. This behavior effectively
- causes us to optimize for responsiveness and congestion avoidance,
- rather than throughput. For evaluation, we will control this switching
- behavior with a consensus parameter (see [CONSENSUS_PARAMETERS]).
+ XXX: Other rate limiting, such as not switching unless the RTT changes by
+ more than X%, may be useful here.
- Because of potential side channel risk (see [SIDE_CHANNELS]), a third
- variant of this algorithm, where the primary circuit is chosen during
- the [LINKING_CIRCUITS] handshake and never changed, is also possible
- to control via consensus parameter.
+3.4. BLEST Scheduling [BLEST_TOR]
-3.2. BLEST Scheduling [BLEST_TOR]
+ XXX: Something like this might be useful for minimizing OOQ for the UX
+ choice of LOW_MEM_THROUGHPUT, but we might just be able to reduce switching
+ frequency instead.
+
+ XXX: We want an algorithm that only uses cwnd instead. This algorithm
+ has issues if the primary cwnd grows while the secondary does not.
+ Expect this section to change.
[BLEST] attempts to predict the availability of the primary circuit, and
use this information to reorder transmitted data, to minimize
@@ -528,51 +693,6 @@ Status: Draft
blocking occurs. Because it is expensive and takes significant time to
signal this over Tor, we omit this.
- XXX: We may want a third algorithm that only uses cwnd, for comparison.
- The above algorithm may have issues if the primary cwnd grows while the
- secondary does not. Expect this section to change.
-
- XXX: See [REORDER_SIGNALING] section if we want this lambda feedback.
-
-3.3. Reorder queue signaling [REORDER_SIGNALING]
-
- Reordering is fairly simple task. By following using the sequence
- number field in [SEQUENCING], endpoints can know how many cells are
- still in flight on the other leg.
-
- To reorder them properly, a buffer of out of order cells needs to be
- kept. On the Exit side, this can quickly become overwhelming
- considering ten of thousands of possible circuits can be held open
- leading to gigabytes of memory being used. There is a clear potential
- memory DoS vector in this case, covered in more detail in
- [MEMORY_DOS].
-
- Luckily, [BLEST_TOR] and the form of [LOWRTT_TOR] that only uses the
- primary circuit will minimize or eliminate this out-of-order buffer.
-
- XXX: The remainder of this section may be over-complicating things... We
- only need these concepts if we want to use BLEST's lambda feedback. Though
- turning this into some kind of receive window that indicates remaining
- reorder buffer size may also help with the total_send_window also noted
- in BLEST_TOR.
-
- The default for this queue size is governed by the 'cflx_reorder_client'
- and 'cflx_reorder_srv' consensus parameters (see [CONSENSUS_PARAMS]).
- 'cflx_reorder_srv' applies to Exits and onion services. Both parameters
- can be overridden by Torrc, to larger or smaller than the consensus
- parameter. (Low memory clients may want to lower it; SecureDrop onion
- services or other high-upload services may want to raise it).
-
- When the reorder queue hits this size, a RELAY_CONFLUX_XOFF is sent down
- the circuit leg that has data waiting in the queue and use of that leg
- SHOULD cease, until it drains to half of this value, at which point an
- RELAY_CONFLUX_XON is sent. Note that this is different than the stream
- XON/XOFF from Proposal 324.
-
- XXX: [BLEST] actually does not cease use of a path in this case, but
- instead uses this signal to adjust the lambda parameter, which biases
- traffic away from that leg.
-
4. Security Considerations
@@ -586,67 +706,122 @@ Status: Draft
pressure. This prevents resumption while data is in flight, but will not
otherwise harm operation.
- For reorder buffers, adversaries can potentially impact this at any
- point, but most obviously and most severely from the client position.
-
- In particular, clients can lie about sequence numbers, sending cells
- with sequence numbers such that the next expected sequence number is
- never sent. They can do this repeatedly on many circuits, to exhaust
- memory at exits.
-
- One option is to only allow actual traffic splitting in the downstream
- direction, towards clients, and always use the primary circuit for
- everything in the upstream direction. However, the ability to support
- conflux from the client to the exit shows promise against traffic
- analysis (see [WTF_SPLIT]).
-
- The other option is to use [BLEST_TOR] from clients to exits, as it has
- predictable interleaved cell scheduling, and minimizes reorder queues at
- exits. If the ratios prescribed by that algorithm are not followed
- within some bounds, the other endpoint can close both circuits, and free
- the queue memory.
-
- This still leaves the possibility that intermediate relays may block a
- leg, allowing cells to traverse only one leg, thus still accumulating at
- the reorder queue. Clients can also spoof sequence numbers similarly, to
- make it appear that they are following [BLEST_TOR], without actually
- sending any data on one of the legs.
-
- To handle either of these cases, when a relay is under memory pressure,
- the circuit OOM killer SHOULD free and close circuits with the oldest
- reorder queue data, first. This heuristic was shown to be best during
- the [SNIPER] attack OOM killer iteration cycle.
-
-4.2. Side Channels [SIDE_CHANNELS]
-
- Two potential side channels may be introduced by the use of Conflux:
- 1. RTT leg-use bias by altering SENDME latency
+ In terms of adversarial issues, clients can lie about sequence numbers,
+ sending cells with sequence numbers such that the next expected sequence
+ number is never sent. They can do this repeatedly on many circuits, to
+ exhaust memory at exits. Intermediate relays may also block a leg, allowing
+ cells to traverse only one leg, thus still accumulating at the reorder queue.
+
+ In C-Tor we will mitigate this in three ways: via the OOM killer, by the
+ ability for exits to request that clients use the LOW_MEM_LATENCY UX
+ behavior, and by rate limiting the frequency of switching under the
+ LOW_MEM_LATENCY UX style.
+
+ When a relay is under memory pressure, the circuit OOM killer SHOULD free
+ and close circuits with the oldest reorder queue data, first. This heuristic
+ was shown to be best during the [SNIPER] attack OOM killer iteration cycle.
+
+ The rate limiting under LOW_MEM_LATENCY will be heuristic-driven, based
+ on data from Shadow simulations and live network testing. Other
+ algorithms may also be amenable to similar rate limiting.
+
+4.2. Protocol Side Channels [SIDE_CHANNELS]
+
+ To understand the decisions we make below with respect to handling
+ potential side channels, it is important to understand a bit of the history
+ of the Tor threat model.
+
+ Tor's original threat model completely disregarded all traffic analysis,
+ including protocol side channels, assuming that they were all equally
+ effective, and that diversity of relays was what provided protection.
+ Numerous attack papers have proven this to be an over-generalization.
+
+ Protocol side channels are most severe when a circuit is known to be silent,
+ because stateful protocol behavior prevents other normal cells from ever being
+ sent. In these cases, it is trivial to inject a packet count pattern that has
+ zero false positives. These kinds of side channels are made use of in the
+ Guard discovery literature, such as [ONION_FOUND], and [DROPMARK]. It is even
+ more trivial to manipulate the AES-CTR cipherstream, as per [RACCOON23], until
+ we implement [PROP308].
+
+ However, because we do not want to make this problem worse, it is extremely
+ important to be mindful of ways that an adversary can inject new cell
+ commands, as well as ways that the adversary can spawn new circuits
+ arbitrarily.
+
+ It is also important, though slightly less so, to be mindful of the uniqueness
+ of new handshakes, as handshakes can be used to classify usage (such as via
+ Onion Service Circuit Fingerprinting). Handshake side channels are only
+ weakly defended, via padding machines for onion services. These padding
+ machines will need to be improved, and this is also scheduled for arti.
+
+ Finally, usage-based traffic analysis needs to be considered. This includes
+ things like website traffic fingerprinting, and is covered in
+ [TRAFFIC_ANALYSIS].
+
+4.2.1. Cell Injection Side Channel Mitigations
+
+ To avoid [DROPMARK] attacks, several checks must be performed, depending
+ on the cell type. The circuit MUST be closed if any of these checks fail.
+
+ RELAY_CONFLUX_LINK:
+ - Ensure conflux is enabled
+ - Ensure the circuit is an Exit (or Service Rend) circuit
+ - Ensure that no previous LINK cell has arrived on this circuit
+
+ RELAY_CONFLUX_LINKED:
+ - Ensure conflux is enabled
+ - Ensure the circuit is client-side
+ - Ensure this is an unlinked circuit that sent a LINK command
+ - Ensure that the nonce matches the nonce used in the LINK command
+ - Ensure that the cell came from the expected hop
+
+ RELAY_CONFLUX_LINKED_ACK:
+ - Ensure conflux is enabled
+ - Ensure that this circuit is not client-side
+ - Ensure that the circuit has successfully received its LINK cell
+ - Ensure that this circuit has not received a LINKED_ACK yet
+
+ RELAY_CONFLUX_SWITCH:
+ - If Prop#340 is in use, this cell MUST be packed with a valid
+ multiplexed RELAY_COMMAND cell.
+ - XXX: Additional rate limiting per algorithm, after tuning.
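As an illustration, the client-side checks for RELAY_CONFLUX_LINKED could be
collapsed into a single predicate. All argument names are hypothetical; a
False result means the circuit MUST be closed.

```python
def linked_cell_ok(conflux_enabled: bool, is_client_circ: bool,
                   sent_link: bool, already_linked: bool,
                   cell_nonce: bytes, link_nonce: bytes,
                   cell_hop: int, expected_hop: int) -> bool:
    """All RELAY_CONFLUX_LINKED checks; on failure, close the circuit."""
    return (conflux_enabled
            and is_client_circ
            and sent_link and not already_linked   # unlinked circ that sent LINK
            and cell_nonce == link_nonce           # nonce matches LINK's
            and cell_hop == expected_hop)          # cell came from expected hop
```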
+
+4.2.2. Guard Discovery Side Channel Mitigations
+
+ In order to mitigate potential guard discovery by malicious exits,
+ clients MUST NOT retry failed unlinked circuit legs for a set more than
+ 'cfx_max_unlinked_leg_retry' times.
+
+4.2.3. Usage-Based Side Channel Discussion
+
+ After we have solved all of the zero false positive protocol side
+ channels in Tor, our attention can turn to more subtle, usage-based
+ side channels.
+
+ Two potential usage side channels may be introduced by the use of Conflux:
+ 1. Delay-based side channels, by manipulating switching
2. Location info leaks through the use of both leg's latencies
- For RTT and leg-use bias, Guard relays could delay legs to introduce a
- pattern into the delivery of cells at the exit relay, by varying the
- latency of SENDME cells (every 100th cell) to change the distribution of
- traffic to send information. This attack could be performed in either
- direction of traffic, to bias traffic load off of a particular Guard.
- If an adversary controls both Guards, it could in theory send a binary
- signal more easily, by alternating delays on each.
-
- However, this risk weighs against the potential benefits against traffic
- fingerprinting, as per [WTF_SPLIT]. Additionally, even ignoring
- cryptographic tagging attacks, this side channel provides significantly
- lower information over time than inter-packet-delay based side channels
- that are already available to Guards and routers along the path to the
- Guard.
-
- Tor currently provides no defenses against already existing
- single-circuit delay-based side channels, though both circuit padding
- and [BACKLIT] are potential options it could conceivably deploy. The
- [BACKLIT] paper also has an excellent review of the various methods that
- have been studied for such single circuit side channels, and the
- [BACKLIT] style RTT monitoring could be used to protect against these
- conflux side channels as well. Circuit padding can also help to obscure
- which cells are SENDMEs, since circuit padding is not counted towards
- SENDME totals.
+ To perform delay-based side channels, Exits can simply disregard the RTT
+ or cwnd when deciding to switch legs, thus introducing a pattern of gaps that
+ the Guard node can detect. Guard relays can also delay legs to introduce a
+ pattern into the delivery of cells at the exit relay, by varying the latency
+ of SENDME cells (every 31st cell) to change the distribution of traffic to
+ send information. This attack could be performed in either direction of
+ traffic, to bias traffic load off of a particular Guard. If an adversary
+ controls both Guards, it could in theory send a binary signal, by
+ alternating delays on each.
+
+ However, Tor currently provides no defenses against already existing
+ single-circuit delay-based (or stop-and-start) side channels. It is already
+ the case that on a single circuit, either the Guard or the Exit can simply
+ withhold sending traffic in a recognizable pattern. This class of
+ attacks, and a possible defense for them, is discussed in [BACKLIT].
+
+ That said, circuit padding can also help to obscure these side channels,
+ even if tuned for website fingerprinting. See [TRAFFIC_ANALYSIS] for more
+ details there.
The second class of side channel is where the Exit relay may be able to
use the two legs to further infer more information about client
@@ -658,29 +833,17 @@ Status: Draft
    or if it proves possible to mitigate single-circuit side
channels, but not conflux side channels.
- In all cases, all of these side channels appear less severe for onion
- service traffic, due to the higher path variability due to relay
- selection, as well as the end-to-end nature of conflux in that case.
- Thus, we separate our ability to enable/disable conflux for onion
- services from Exits.
-
4.3. Traffic analysis [TRAFFIC_ANALYSIS]
Even though conflux shows benefits against traffic analysis in
[WTF_SPLIT], these gains may be moot if the adversary is able to perform
packet counting and timing analysis at guards to guess which specific
- circuits are linked. In particular, the 3 way handshake in
+ circuits are linked. In particular, the 3 way handshake in
[LINKING_CIRCUITS] may be quite noticeable.
- As one countermeasure, it may be possible to eliminate the third leg
- (RELAY_CIRCUIT_LINKED_ACK) by computing the exit/service RTT via
- measuring the time between CREATED/REND_JOINED and RELAY_CIRCUIT_LINK,
- but this will introduce cross-component complexity into Tor's protocol
- that could quickly become unwieldy and fragile.
-
Additionally, the conflux handshake may make onion services stand out
more, regardless of the number of stages in the handshake. For this
- reason, it may be more wise to simply address these issues with circuit
+ reason, it may be wise to simply address these issues with circuit
padding machines during circuit setup (see padding-spec.txt).
Additional traffic analysis considerations arise when combining conflux
@@ -698,9 +861,10 @@ Status: Draft
capability. [RESUMPTION] with buffering of the inflight unacked
package_window data, for retransmit, is a partial mitigation, if
endpoints buffer this data for retransmission for a brief time even if
- both legs close. This seems more feasible for onion services, which are
- more vulnerable to this attack. However, if the adversary controls the
- client, they will notice the resumption re-link, and still obtain
+ both legs close. This buffering seems more feasible for onion services,
+ which are more vulnerable to this attack. However, if the adversary
+ controls the client and is attacking the service in this way, they
+ will notice the resumption re-link at their client, and still obtain
confirmation that way.
It seems the only way to fully mitigate these kinds of attacks is with
@@ -713,29 +877,42 @@ Status: Draft
provide similar RST injection resistance, and resumption at Guard/Bridge
nodes, as well.
+5. Consensus Parameters [CONSENSUS]
-5. System Interactions
+ - cfx_enabled
+ - Values: 0=off, 1=on
+ - Description: Emergency off switch, in case major issues are discovered.
- - congestion control
- - EWMA and KIST
- - CBT and number of guards
- - Onion service circ obfuscation
- - Future UDP (may increase need for UDP to buffer before dropping)
- - Padding (no sequence numbers on padding cells, as per [SEQUENCING])
- - Also, any padding machines may need re-tuning
- - No 'cannibalization' of linked circuits
+ - cfx_low_exit_threshold
+ - Range: 0-10000
+ - Description: The fraction of exits, expressed out of 10000, that must
+ support protover 5. If the actual fraction is below this amount, the
+ number of pre-built sets is restricted to 1.
+
+ - cfx_max_linked_set
+ - Range: 0-255
+ - Description: The total number of linked sets that can be created. 255
+ means "unlimited".
+ - cfx_max_prebuilt_set
+ - Range: 0-255
+ - Description: The maximum number of pre-built conflux sets to make.
+ This value is overridden by the 'cfx_low_exit_threshold' criteria.
-6. Consensus and Torrc Parameters [CONSENSUS]
+ - cfx_max_unlinked_leg_retry
+ - Range: 0-255
+ - Description: The maximum number of times to retry an unlinked leg that
+ fails during build or link, to mitigate guard discovery attacks.
- - conflux_circs
- - Number of conflux circuits
+ - cfx_num_legs_set
+ - Range: 0-255
+ - Description: The number of legs to link in a set.
- - conflux_sched_exits, conflux_sched_clients, conflux_sched_service
- - Three forms of LOWRTT_TOR, and BLEST_TOR
+ - cfx_send_pct
+ - XXX: Experimental tuning parameter. Subject to change/removal.
- - ConfluxOnionService
- - ConfluxOnionCircs
+ - cfx_drain_pct
+ - XXX: Experimental tuning parameter. Subject to change/removal.
7. Tuning Experiments [EXPERIMENTS]
@@ -807,7 +984,7 @@ A.2. Alternative RTT measurement [ALTERNATIVE_RTT]
We should not add more.
-Appendix B: Acknowledgments [ACKNOWLEDGEMENTS]
+Appendix B: Acknowledgments [ACKNOWLEDGMENTS]
Thanks to Per Hurtig for helping us with the framing of the MPTCP
problem space.
@@ -856,3 +1033,21 @@ References:
[DROPMARK]
https://www.petsymposium.org/2018/files/papers/issue2/popets-2018-0011.pdf
+
+[RACCOON23]
+ https://archives.seul.org/or/dev/Mar-2012/msg00019.html
+
+[ONION_FOUND]
+ https://www.researchgate.net/publication/356421302_From_Onion_Not_Found_to_Guard_Discovery/fulltext/619be24907be5f31b7ac194a/From-Onion-Not-Found-to-Guard-Discovery.pdf
+
+[VANGUARDS_ADDON]
+ https://github.com/mikeperry-tor/vanguards/blob/master/README_TECHNICAL.md
+
+[PROP324]
+ https://gitlab.torproject.org/tpo/core/torspec/-/blob/main/proposals/324-rtt-congestion-control.txt
+
+[PROP339]
+ https://gitlab.torproject.org/tpo/core/torspec/-/blob/main/proposals/339-udp-over-tor.md
+
+[PROP308]
+ https://gitlab.torproject.org/tpo/core/torspec/-/blob/main/proposals/308-counter-galois-onion.txt
diff --git a/proposals/340-packed-and-fragmented.md b/proposals/340-packed-and-fragmented.md
index 6fb1ca5..74af5ca 100644
--- a/proposals/340-packed-and-fragmented.md
+++ b/proposals/340-packed-and-fragmented.md
@@ -34,7 +34,8 @@ This proposal combines ideas from
[next-generation relay cryptography](./308-counter-galois-onion).
Additionally, this proposal has been revised to incorporate another
-protocol change, and remove StreamId from the relay cell header.
+protocol change, and move StreamId from the relay cell header into a new,
+optional header.
## A preliminary change: Relay encryption, version 1.5
@@ -54,36 +55,50 @@ The new format for a decrypted relay _cell_ will be:
digest [14 bytes]
body [509 - 16 = 493 bytes]
-"Digest" and "recognized" are computed as before; the only difference
-is that they occur _before_ the rest of the cell, and that "digest" is
-truncated to 14 bytes instead of 4.
+The `recognized` and `digest` fields are computed as before; the only
+difference is that they occur _before_ the rest of the cell, and that `digest`
+is truncated to 14 bytes instead of 4.
If we are lucky, we won't have to build this encryption at all, and we
can just move to some version of GCM-UIV or other RPRP that reserves 16
bytes for an authentication tag or similar cryptographic object.
+The `body` MUST contain exactly 493 bytes, since cells have a fixed size.
+
## New relay message packing
-Relay _messages_ now follow the following format:
+We define this new format for a relay message. Its headers have to fit within
+one relay cell, but the body can be split across many relay cells:
+
+```
+ Message Header
+ command u8
+ length u16
+ Message Routing Header (optional)
+ stream_id u16
+ Message Body
+ data u8[length]
+```
+
+One big change from the current tor protocol is that `stream_id` has become
+optional: it has moved into a separate inner header, the Message Routing
+Header, which appears only for some commands. The command value tells us
+whether to expect this header.
- Header
- command u8
- length u16
- Body
- data u8[length]
+The following message types take required stream IDs: `BEGIN`, `DATA`, `END`,
+`CONNECTED`, `RESOLVE`, `RESOLVED`, `BEGIN_DIR`, `XON`, and `XOFF`.
-We require that "command" is never 0.
+The following message types from proposal 339 (UDP) take required stream IDs:
+`CONNECT_UDP`, `CONNECTED_UDP` and `DATAGRAM`.
-One big change from the current tor protocol is something that is _not_
-here: we have moved `stream_id` into the body of the relay message of
-those commands that need it.
+No other message types take stream IDs. The `stream_id` field, when present,
+MUST NOT be zero.
Messages can be split across relay cells; multiple messages can occur in
a single relay cell. We enforce the following rules:
* Headers may not be split across cells.
- * If a 0 byte follows any relay message, there are no more messages in
- that cell.
+ * If a 0 byte follows a message body, there are no more messages.
* A relay cell may not be "empty": it must have at least some part of
some message.
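The packing rules above can be sketched as a parser over one decrypted relay cell body. This is a non-normative sketch: it only handles a cell that starts at a message boundary (continuation cells for fragments carry no headers), and the command-code subset shown is illustrative:

```python
import struct

# Illustrative subset of relay commands whose messages carry a
# Message Routing Header (stream_id); codes as in tor-spec.
STREAM_COMMANDS = {1, 2, 3, 4, 11, 12, 13}  # BEGIN, DATA, END, CONNECTED,
                                            # RESOLVE, RESOLVED, BEGIN_DIR

def parse_packed_body(body: bytes):
    """Yield (command, stream_id, data) for each message packed into one
    relay cell body. A fragmented message (length running past the end of
    this cell) is returned with a partial body; the caller must buffer the
    remainder from the following cells."""
    msgs, off = [], 0
    while off < len(body):
        cmd = body[off]
        if cmd == 0:                       # 0 byte after a message: padding
            break
        (length,) = struct.unpack_from(">H", body, off + 1)
        off += 3
        stream_id = None
        if cmd in STREAM_COMMANDS:         # optional routing header
            (stream_id,) = struct.unpack_from(">H", body, off)
            if stream_id == 0:
                raise ValueError("stream_id must not be zero")
            off += 2
        data = body[off:off + length]      # may be only a fragment
        off += len(data)
        msgs.append((cmd, stream_id, data))
    return msgs
```

A receiver implementing the rules above would additionally destroy the circuit on any malformed header or unrecognized command.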
@@ -91,65 +106,44 @@ Unless specified elsewhere, **all** message types may be packed, and
**all** message types may be fragmented.
Every command has an associated maximum length for its messages. If not
-specified elsewhere, the maximum length for every message is 498 bytes
-(for legacy reasons).
-
-Receivers MUST validate that headers are well-formed and have valid
-lengths while handling the cell in which the header is encoded. If the
-header is invalid, the receiver must destroy the circuit.
-
+specified elsewhere, the maximum length for every message is 498 bytes (for
+legacy reasons).
+Receivers MUST validate that the cell `header` and the `message header` are
+well-formed and have valid lengths while handling the cell in which the header
+is encoded. If any of them is invalid, the circuit MUST be destroyed.
+An unrecognized `command` is considered invalid and thus MUST result in the
+circuit being immediately destroyed.
## Negotiation
-This message format requires a new `Relay` subprotocol version to
-indicate support. If a client wants to indicate support for this
-format, it sends the following extension as part of its `ntor3`
-handshake:
-
- RELAY_PROTOCOL
- version u8
-
-The `version` field is the `Relay` subprotocol version that the client
-wants to use. The relay must send back the same extension in its ntor3
-handshake to acknowledge support.
-
+This message format requires a new `Relay` subprotocol version to indicate
+support. If a client wants to indicate support for this format, it sends the
+following extension as part of its `ntor3` handshake:
-## Specifying the message format with moved stream ID.
+ EXT_FIELD_TYPE:
-Here, we'll specify how to adjust tor-spec to describe the `stream_id`
-move correctly.
+ [03] -- Packed and Fragmented Cell Request
-Below, suppose that `Relay=V` denotes whatever version of the relay
-message subprotocol denotes support for this proposal.
+This extension has a zero-length payload. Its presence signifies that the
+client wants to use packed and fragmented cells on the circuit.
-For all relay message types that include a stream ID, we insert
-the following at the beginning of their fields:
+The Exit side ntorv3 response payload is encoded as:
->```
-> STREAM_ID [2 bytes] (Present when Relay protocol version >= V).
->```
+ EXT_FIELD_TYPE:
-We add a note saying:
+ [04] -- Packed and Fragmented Cell Response
-> STREAM_ID is part of many message types, when using Relay protocol
-> version V or later. Earlier versions of the Relay protocol put
-> STREAM_ID in the RELAY header. In those versions, the field is
-> STREAM_ID omitted from the message.
->
-> Except when noted, STREAM_ID may not be zero.
+Again, the presence of the extension indicates to the client that the Exit
+has acknowledged the feature and is ready to use it. If the extension is not
+present, the client MUST NOT use the packed and fragmented feature, even if
+the Exit has advertised the correct protover.
-The following message types take required stream IDs: `BEGIN`, `DATA`, `END`,
-`CONNECTED`, `RESOLVE`, `RESOLVED`, and `BEGIN_DIR`, `XON`, `XOFF`.
-
-The following message type takes an *optional* stream ID: `SENDME`.
-(*Stream-level sendmes are not a thing anymore with proposal 324, but I
-want to give implementations the freedom to implement prop324 and this
-proposal in either order.*)
+The client MUST reject the handshake and thus close the circuit if:
-The following message types from proposal 339 (UDP) take required stream
-IDs: `CONNECT_UDP`, `CONNECTED_UDP` and `DATAGRAM`.
+ - The response extension is seen for a non-ntorv3 handshake.
+ - The response extension is seen but no request was made initially.
## Migration
@@ -204,7 +198,7 @@ this.)
### Extending message-length maxima
-For now, the maximum length for every message is 498 bytes, except as
+For now, the maximum length for every message body is 493 bytes, except as
follows:
- `DATAGRAM` messages (see proposal 339) have a maximum body length
@@ -218,149 +212,166 @@ saying so, and reserving a new subprotocol version.)
# Appendix: Example cells
-
-Here is an example of the simplest case: one message, sent in one relay
-cell. Here it is a BEGIN message.
+Here is an example of the simplest case: one message, sent in one relay cell:
```
Cell 1:
- Cell header
+ header:
circid .. [4 bytes]
command RELAY [1 byte]
- relay cell header
+ relay cell header:
recognized 0 [2 bytes]
digest (...) [14 bytes]
message header:
command BEGIN [1 byte]
length 23 [2 bytes]
- message body
- stream_id (...) [2 bytes]
+ message routing header:
+ stream_id 42 [2 bytes]
+ message body:
"www.torproject.org:443\0" [23 bytes]
- end-of-messages marker
+ end-of-messages marker:
0 [1 byte]
- padding up to end of cell
+ padding up to end of cell:
random [464 bytes]
-
```
+This makes a total of 514 bytes, which is the absolute maximum cell size.
+
Here's an example with fragmentation only: a large EXTEND2 message split
across two relay cells.
```
Cell 1:
- Cell header
- circid .. [4 bytes]
- command RELAY_EARLY [1 byte]
- relay cell header
- recognized 0 [2 bytes]
- digest (...) [14 bytes]
+ header:
+ circid .. [4 bytes]
+ command RELAY_EARLY [1 byte]
+ relay cell header:
+ recognized 0 [2 bytes]
+ digest (...) [14 bytes]
message header:
- command EXTEND [1 byte]
- length 800 [2 bytes]
- message body
- stream_id 0 [2 bytes]
- (extend body, part 1) [488 bytes]
+ command EXTEND [1 byte]
+ length 800 [2 bytes]
+ message body:
+ (extend body, part 1) [490 bytes]
Cell 2:
- Cell header
- circid .. [4 bytes]
- command RELAY [1 byte]
- relay cell header
+ header:
+ circid .. [4 bytes]
+ command RELAY [1 byte]
+ relay cell header:
recognized 0 [2 bytes]
digest (...) [14 bytes]
- message body, continued
- (extend body, part 2) [312 bytes]
- end-of-messages marker
+ message body, continued:
+ (extend body, part 2) [310 bytes] (310+490=800)
+ end-of-messages marker:
0 [1 byte]
- padding up to end of cell
- random [180 bytes]
+ padding up to end of cell:
+ random [182 bytes]
```
-Here is an example with packing only: A begin_dir message and a data
-message in the same cell.
+Each cell is 514 bytes, carrying a message body totalling 800 bytes.
+
+Here is an example with packing only: A `BEGIN_DIR` message and a data message
+in the same cell.
```
Cell 1:
- Cell header
+ header:
circid .. [4 bytes]
command RELAY [1 byte]
- relay cell header
+ relay cell header:
recognized 0 [2 bytes]
digest (...) [14 bytes]
+
+ # First relay message
message header:
command BEGIN_DIR [1 byte]
length 0 [2 bytes]
- message body:
+ message routing header:
stream_id 32 [2 bytes]
+
+ # Second relay message
message header:
command DATA [1 byte]
length 25 [2 bytes]
- message body:
+ message routing header:
stream_id 32 [2 bytes]
+ message body:
"HTTP/1.0 GET /tor/foo\r\n\r\n" [25 bytes]
- end-of-messages marker
+
+ end-of-messages marker:
0 [1 byte]
- padding up to end of cell
+ padding up to end of cell:
random [457 bytes]
```
Here is an example with packing and fragmentation: a large DATAGRAM cell, a
-SENDME cell, and an XON cell. (Note that this sequence of cells would not
-actually be generated by the algorithm described in "Packing decisions"
-above; this is only an example of what parties need to accept.)
+SENDME cell, and an XON cell.
+
+(Note that this sequence of cells would not actually be generated by the
+algorithm described in "Packing decisions" above; this is only an example of
+what parties need to accept.)
```
Cell 1:
- Cell header
- circid .. [4 bytes]
- command RELAY [1 byte]
- relay cell header
+ header:
+ circid .. [4 bytes]
+ command RELAY [1 byte]
+ relay cell header:
recognized 0 [2 bytes]
digest (...) [14 bytes]
+
+ # First message
message header:
command DATAGRAM [1 byte]
length 1200 [2 bytes]
- message body
+ message routing header:
stream_id 99 [2 bytes]
+ message body:
(datagram body, part 1) [488 bytes]
Cell 2:
- Cell header
- circid .. [4 bytes]
- command RELAY [1 byte]
- relay cell header
+ header:
+ circid .. [4 bytes]
+ command RELAY [1 byte]
+ relay cell header:
recognized 0 [2 bytes]
digest (...) [14 bytes]
- message body, continued
+ message body, continued:
(datagram body, part 2) [493 bytes]
Cell 3:
- Cell header
- circid .. [4 bytes]
- command RELAY [1 byte]
- relay cell header
+ header:
+ circid .. [4 bytes]
+ command RELAY [1 byte]
+ relay cell header:
recognized 0 [2 bytes]
digest (...) [14 bytes]
- message body, continued
+ message body, continued:
(datagram body, part 3) [219 bytes] (488+493+219=1200)
+
+ # Second message
message header:
command SENDME [1 byte]
length 23 [2 bytes]
message body:
- stream_id 0 [2 bytes]
version 1 [1 byte]
datalen 20 [2 bytes]
data (digest to ack) [20 bytes]
+
+ # Third message
message header:
command XON [1 byte]
length 1 [2 bytes]
- message body:
+ message routing header:
stream_id 50 [2 bytes]
+ message body:
version 1 [1 byte]
- end-of-messages marker
+
+ end-of-messages marker:
0 [1 byte]
- padding up to end of cell
- random [239 bytes]
+ padding up to end of cell:
+ random [241 bytes]
```
diff --git a/proposals/343-rend-caa.txt b/proposals/343-rend-caa.txt
new file mode 100644
index 0000000..0859690
--- /dev/null
+++ b/proposals/343-rend-caa.txt
@@ -0,0 +1,111 @@
+Filename: 343-rend-caa.txt
+Title: CAA Extensions for the Tor Rendezvous Specification
+Author: Q Misell <q@as207960.net>
+Created: 2023-04-25
+Status: Open
+Ticket: https://gitlab.torproject.org/tpo/core/tor/-/merge_requests/716
+
+Overview:
+ This document defines extensions to the Tor Rendezvous Specification Hidden
+ Service descriptor format that allow the attachment of DNS-style CAA records
+ to Tor hidden services, providing the same security benefits as CAA does in
+ the DNS.
+
+Motivation:
+ As part of the work on draft-misell-acme-onion [I-D.misell-acme-onion] at the
+ IETF it was felt necessary to define a method to incorporate CAA records
+ [RFC8659] into Tor hidden services.
+
+ CAA records in the DNS provide a mechanism to indicate which Certificate
+ Authorities are permitted to issue certificates for a given domain name, and
+ restrict which validation methods are permitted for certificate validation.
+
+ As Tor hidden service domains are not in the DNS another way to provide the
+ same security benefits as CAA does in the DNS needed to be devised.
+
+ It is important to note that a hidden service is not required to publish a CAA
+ record to obtain a certificate, as is the case in the DNS.
+
+ More information about this project in general can be found at
+ https://acmeforonions.org.
+
+Specification:
+ To enable maximal code re-use in CA codebases the same CAA record format is
+ used in Tor hidden services as in the DNS. To this end a new field is added to
+ the second layer hidden service descriptor [tor-rend-spec-v3] § 2.5.2.2.
+ with the following format:
+
+ "caa" SP flags SP tag SP value NL
+ [Any number of times]
+
+ The contents of "flag", "tag", and "value" are as per [RFC8659] § 4.1.1.
+ Multiple CAA records may be present, as is the case in the DNS.
+
+ A hidden service's second layer descriptor using CAA may look
+ something like the following:
+
+ create2-formats 2
+ single-onion-service
+ caa 0 issue "example.com"
+ caa 0 iodef "mailto:security@example.com"
+ caa 128 validationmethods "onion-csr-01"
+ introduction-point AwAGsAk5nSMpAhRqhMHbTFCTSlfhP8f5PqUhe6DatgMgk7kSL3KHCZ...
+
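Since the "caa" line grammar above mirrors RFC 8659, extracting the records from a decrypted second-layer descriptor is straightforward. A minimal sketch (the helper name and the assumption that values are always double-quoted, as in the example, are mine):

```python
import re

# "caa" SP flags SP tag SP value NL, with flags 0-255 and a quoted value.
CAA_LINE = re.compile(r'^caa (\d+) ([A-Za-z0-9]+) "([^"]*)"$')

def parse_caa_lines(second_layer_descriptor: str):
    """Return (flags, tag, value) tuples for every "caa" line,
    in the order they appear; unknown lines are ignored."""
    records = []
    for line in second_layer_descriptor.splitlines():
        m = CAA_LINE.match(line)
        if m:
            flags, tag, value = m.groups()
            records.append((int(flags), tag, value))
    return records
```

Run against the example descriptor above, this yields the three records `issue`, `iodef`, and `validationmethods` with flags 0, 0, and 128.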
+ Because the CAA records are in the second layer descriptor, and (in the case
+ of a hidden service requiring client authentication) cannot be read without
+ the hidden service trusting a CA's public key, a method is required to
+ signal that CAA records are present without revealing their contents, which
+ may disclose unwanted information about the hidden service operator to
+ third parties. This allows a CA to know that it must attempt to check
+ CAA records before issuance, and fail if it is unable to do so.
+
+ To this end a new field is added to the first layer hidden service descriptor
+ [tor-rend-spec-v3] § 2.5.1.2. with the following format:
+
+ "caa-critical" NL
+ [At most once]
+
+Security Considerations:
+ The second layer descriptor is signed, encrypted and MACed in a way that only
+ a party with access to the secret key of the hidden service could manipulate
+ what is published there. Therefore, Tor CAA records have at least the same
+ security as those in the DNS secured by DNSSEC.
+
+ The "caa-critical" flag is visible to anyone with knowledge of the hidden
+ service's public key, however it reveals no information that could be used to
+ de-anonymize the hidden service operator.
+
+ The CAA flags in the second layer descriptor may reveal information about the
+ hidden service operator if they choose to publish an "iodef", "contactemail",
+ or "contactphone" tag. These, however, are not required for the primary goal
+ of CAA, that is, to restrict which CAs may issue certificates for a given
+ domain name.
+
+ No more information is revealed by the "issue" nor "issuewild" tags than would
+ be revealed by someone making a connection to the hidden service and noting
+ which certificate is presented.
+
+Compatibility:
+ The hidden service spec [tor-rend-spec-v3] already requires that clients
+ ignore unknown lines when decoding hidden service descriptors, so this change
+ should not cause any compatibility issues. Additionally, in testing, no
+ compatibility issues were found with existing Tor implementations.
+
+ A hidden service with CAA records published in its descriptor is available at
+ znkiu4wogurrktkqqid2efdg4nvztm7d2jydqenrzeclfgv3byevnbid.onion, to allow
+ further compatibility testing.
+
+References:
+ [I-D.misell-acme-onion]
+ Misell, Q., "Automated Certificate Management Environment (ACME)
+ Extensions for ".onion" Domain Names", Internet-Draft
+ draft-misell-acme-onion-02, April 2023,
+ <https://datatracker.ietf.org/doc/html/draft-misell-acme-onion-02>.
+
+ [RFC8659] Hallam-Baker, P., Stradling, R., and J. Hoffman-Andrews,
+ "DNS Certification Authority Authorization (CAA) Resource
+ Record", RFC 8659, DOI 10.17487/RFC8659, November 2019,
+ <https://www.rfc-editor.org/info/rfc8659>.
+
+ [tor-rend-spec-v3]
+ The Tor Project, "Tor Rendezvous Specification - Version 3",
+ <https://spec.torproject.org/rend-spec-v3>. \ No newline at end of file
diff --git a/proposals/BY_INDEX.md b/proposals/BY_INDEX.md
index d0b1214..f48ef31 100644
--- a/proposals/BY_INDEX.md
+++ b/proposals/BY_INDEX.md
@@ -236,17 +236,17 @@ Below are a list of proposals sorted by their proposal number. See
* [`316-flashflow.md`](/proposals/316-flashflow.md): FlashFlow: A Secure Speed Test for Tor (Parent Proposal) [DRAFT]
* [`317-secure-dns-name-resolution.txt`](/proposals/317-secure-dns-name-resolution.txt): Improve security aspects of DNS name resolution [NEEDS-REVISION]
* [`318-limit-protovers.md`](/proposals/318-limit-protovers.md): Limit protover values to 0-63 [CLOSED]
-* [`319-wide-everything.md`](/proposals/319-wide-everything.md): RELAY_FRAGMENT cells [OPEN]
+* [`319-wide-everything.md`](/proposals/319-wide-everything.md): RELAY_FRAGMENT cells [OBSOLETE]
* [`320-tap-out-again.md`](/proposals/320-tap-out-again.md): Removing TAP usage from v2 onion services [REJECTED]
* [`321-happy-families.md`](/proposals/321-happy-families.md): Better performance and usability for the MyFamily option (v2) [ACCEPTED]
* [`322-dirport-linkspec.md`](/proposals/322-dirport-linkspec.md): Extending link specifiers to include the directory port [OPEN]
* [`323-walking-onions-full.md`](/proposals/323-walking-onions-full.md): Specification for Walking Onions [OPEN]
* [`324-rtt-congestion-control.txt`](/proposals/324-rtt-congestion-control.txt): RTT-based Congestion Control for Tor [OPEN]
-* [`325-packed-relay-cells.md`](/proposals/325-packed-relay-cells.md): Packed relay cells: saving space on small commands [OPEN]
+* [`325-packed-relay-cells.md`](/proposals/325-packed-relay-cells.md): Packed relay cells: saving space on small commands [OBSOLETE]
* [`326-tor-relay-well-known-uri-rfc8615.md`](/proposals/326-tor-relay-well-known-uri-rfc8615.md): The "tor-relay" Well-Known Resource Identifier [OPEN]
* [`327-pow-over-intro.txt`](/proposals/327-pow-over-intro.txt): A First Take at PoW Over Introduction Circuits [DRAFT]
* [`328-relay-overload-report.md`](/proposals/328-relay-overload-report.md): Make Relays Report When They Are Overloaded [CLOSED]
-* [`329-traffic-splitting.txt`](/proposals/329-traffic-splitting.txt): Overcoming Tor's Bottlenecks with Traffic Splitting [DRAFT]
+* [`329-traffic-splitting.txt`](/proposals/329-traffic-splitting.txt): Overcoming Tor's Bottlenecks with Traffic Splitting [NEEDS-REVISION]
* [`330-authority-contact.md`](/proposals/330-authority-contact.md): Modernizing authority contact entries [OPEN]
* [`331-res-tokens-for-anti-dos.md`](/proposals/331-res-tokens-for-anti-dos.md): Res tokens: Anonymous Credentials for Onion Service DoS Resilience [DRAFT]
* [`332-ntor-v3-with-extra-data.md`](/proposals/332-ntor-v3-with-extra-data.md): Ntor protocol with extra data, version 3 [FINISHED]
@@ -260,4 +260,5 @@ Below are a list of proposals sorted by their proposal number. See
* [`340-packed-and-fragmented.md`](/proposals/340-packed-and-fragmented.md): Packed and fragmented relay messages [OPEN]
* [`341-better-oos.md`](/proposals/341-better-oos.md): A better algorithm for out-of-sockets eviction [OPEN]
* [`342-decouple-hs-interval.md`](/proposals/342-decouple-hs-interval.md): Decoupling hs_interval and SRV lifetime [DRAFT]
+* [`343-rend-caa.txt`](/proposals/343-rend-caa.txt): CAA Extensions for the Tor Rendezvous Specification [OPEN]
diff --git a/proposals/README.md b/proposals/README.md
index 0461d6a..9ff4f18 100644
--- a/proposals/README.md
+++ b/proposals/README.md
@@ -32,15 +32,14 @@ for discussion.
* [`306-ipv6-happy-eyeballs.txt`](/proposals/306-ipv6-happy-eyeballs.txt): A Tor Implementation of IPv6 Happy Eyeballs
* [`308-counter-galois-onion.txt`](/proposals/308-counter-galois-onion.txt): Counter Galois Onion: A New Proposal for Forward-Secure Relay Cryptography
* [`309-optimistic-socks-in-tor.txt`](/proposals/309-optimistic-socks-in-tor.txt): Optimistic SOCKS Data
-* [`319-wide-everything.md`](/proposals/319-wide-everything.md): RELAY_FRAGMENT cells
* [`322-dirport-linkspec.md`](/proposals/322-dirport-linkspec.md): Extending link specifiers to include the directory port
* [`323-walking-onions-full.md`](/proposals/323-walking-onions-full.md): Specification for Walking Onions
* [`324-rtt-congestion-control.txt`](/proposals/324-rtt-congestion-control.txt): RTT-based Congestion Control for Tor
-* [`325-packed-relay-cells.md`](/proposals/325-packed-relay-cells.md): Packed relay cells: saving space on small commands
* [`326-tor-relay-well-known-uri-rfc8615.md`](/proposals/326-tor-relay-well-known-uri-rfc8615.md): The "tor-relay" Well-Known Resource Identifier
* [`330-authority-contact.md`](/proposals/330-authority-contact.md): Modernizing authority contact entries
* [`340-packed-and-fragmented.md`](/proposals/340-packed-and-fragmented.md): Packed and fragmented relay messages
* [`341-better-oos.md`](/proposals/341-better-oos.md): A better algorithm for out-of-sockets eviction
+* [`343-rend-caa.txt`](/proposals/343-rend-caa.txt): CAA Extensions for the Tor Rendezvous Specification
## ACCEPTED proposals: slated for implementation
@@ -107,7 +106,6 @@ discussion.
* [`294-tls-1.3.txt`](/proposals/294-tls-1.3.txt): TLS 1.3 Migration
* [`316-flashflow.md`](/proposals/316-flashflow.md): FlashFlow: A Secure Speed Test for Tor (Parent Proposal)
* [`327-pow-over-intro.txt`](/proposals/327-pow-over-intro.txt): A First Take at PoW Over Introduction Circuits
-* [`329-traffic-splitting.txt`](/proposals/329-traffic-splitting.txt): Overcoming Tor's Bottlenecks with Traffic Splitting
* [`331-res-tokens-for-anti-dos.md`](/proposals/331-res-tokens-for-anti-dos.md): Res tokens: Anonymous Credentials for Onion Service DoS Resilience
* [`342-decouple-hs-interval.md`](/proposals/342-decouple-hs-interval.md): Decoupling hs_interval and SRV lifetime
@@ -125,6 +123,7 @@ certain changes.
* [`279-naming-layer-api.txt`](/proposals/279-naming-layer-api.txt): A Name System API for Tor Onion Services
* [`291-two-guard-nodes.txt`](/proposals/291-two-guard-nodes.txt): The move to two guard nodes
* [`317-secure-dns-name-resolution.txt`](/proposals/317-secure-dns-name-resolution.txt): Improve security aspects of DNS name resolution
+* [`329-traffic-splitting.txt`](/proposals/329-traffic-splitting.txt): Overcoming Tor's Bottlenecks with Traffic Splitting
## NEEDS-RESEARCH proposals: blocking on research
@@ -357,7 +356,9 @@ longer relevant (the proposal is OBSOLETE).
* [`270-newhope-hybrid-handshake.txt`](/proposals/270-newhope-hybrid-handshake.txt): RebelAlliance: A Post-Quantum Secure Hybrid Handshake Based on NewHope [OBSOLETE]
* [`276-lower-bw-granularity.txt`](/proposals/276-lower-bw-granularity.txt): Report bandwidth with lower granularity in consensus documents [DEAD]
* [`286-hibernation-api.txt`](/proposals/286-hibernation-api.txt): Controller APIs for hibernation access on mobile [REJECTED]
+* [`319-wide-everything.md`](/proposals/319-wide-everything.md): RELAY_FRAGMENT cells [OBSOLETE]
* [`320-tap-out-again.md`](/proposals/320-tap-out-again.md): Removing TAP usage from v2 onion services [REJECTED]
+* [`325-packed-relay-cells.md`](/proposals/325-packed-relay-cells.md): Packed relay cells: saving space on small commands [OBSOLETE]
diff --git a/rend-spec-v3.txt b/rend-spec-v3.txt
index 1a02cec..04e5c4c 100644
--- a/rend-spec-v3.txt
+++ b/rend-spec-v3.txt
@@ -238,9 +238,10 @@ Table of contents:
LSPEC (Link specifier) [LSLEN bytes]
Link specifier types are as described in tor-spec.txt. Every set of
- link specifiers MUST include at minimum specifiers of type [00]
+ link specifiers SHOULD include at minimum specifiers of type [00]
(TLS-over-TCP, IPv4), [02] (legacy node identity) and [03] (ed25519
- identity key).
+ identity key). Sets of link specifiers without these three types
+ SHOULD be rejected.
As of 0.4.1.1-alpha, Tor includes both IPv4 and IPv6 link specifiers
in v3 onion service protocol link specifier lists. All available
@@ -1380,7 +1381,7 @@ Table of contents:
point section]
The link-specifiers is a base64 encoding of a link specifier
- block in the format described in BUILDING-BLOCKS.
+ block in the format described in [BUILDING-BLOCKS] above.
As of 0.4.1.1-alpha, services include both IPv4 and IPv6 link
specifiers in descriptors. All available addresses SHOULD be
@@ -1392,11 +1393,20 @@ Table of contents:
recognize; instead, it should use them verbatim in its EXTEND
request to the introduction point.
- The client MAY perform basic validity checks on the link
- specifiers in the descriptor. These checks SHOULD NOT leak
+ The client SHOULD perform the basic validity checks on the link
+ specifiers in the descriptor, described in `tor-spec.txt`
+ section 5.1.2. These checks SHOULD NOT leak
detailed information about the client's version, configuration,
or consensus. (See 3.3 for service link specifier handling.)
+ When connecting to the introduction point, the client SHOULD send
+ this list of link specifiers verbatim, in the same order as given
+ here.
+
+ The client MAY reject the list of link specifiers if it is
+ inconsistent with relay information from the directory, but SHOULD
+ NOT modify it.
+
"onion-key" SP "ntor" SP key NL
[Exactly once per introduction point]
@@ -1908,8 +1918,15 @@ Table of contents:
The hidden service should handle invalid or unrecognised link specifiers
the same way as clients do in section 2.5.2.2. In particular, services
- MAY perform basic validity checks on link specifiers, and SHOULD NOT
+ SHOULD perform basic validity checks on link specifiers, and SHOULD NOT
reject unrecognised link specifiers, to avoid information leaks.
+ The list of link specifiers received here SHOULD either be rejected, or
+ sent verbatim when extending to the rendezvous point, in the same order
+ received.
+
+ The service MAY reject the list of link specifiers if it is
+ inconsistent with relay information from the directory, but SHOULD
+ NOT modify it.
The ONION_KEY_TYPE field is:
@@ -2068,7 +2085,8 @@ Table of contents:
The hidden service host now also knows the keys generated by the
handshake, which it will use to encrypt and authenticate data
end-to-end between the client and the server. These keys are as
- computed in tor-spec.txt section 5.1.4.
+ computed in tor-spec.txt section 5.1.4, except that instead of using
+ AES-128 and SHA1 for this hop, we use AES-256 and SHA3-256.
3.4. Authentication during the introduction phase. [INTRO-AUTH]
@@ -2711,3 +2729,95 @@ Appendix F. Two methods for managing revision counters.
Similarly, implementations SHOULD NOT let the revision counter
increase forever without resetting it -- doing so links the service
across changes in the blinded public key.
+
+Appendix G. Test vectors
+
+ G.1. Test vectors for hs-ntor / NTOR-WITH-EXTRA-DATA
+
+ Here is a set of test values for the hs-ntor handshake, called
+ [NTOR-WITH-EXTRA-DATA] in this document. They were generated by
+ instrumenting Tor's code to dump the values for an INTRODUCE/RENDEZVOUS
+ handshake, and then by running that code on a Chutney network.
+
+ We assume an onion service with:
+
+ KP_hs_ipd_sid = 34E171E4358E501BFF21ED907E96AC6B
+ FEF697C779D040BBAF49ACC30FC5D21F
+ KP_hss_ntor = 8E5127A40E83AABF6493E41F142B6EE3
+ 604B85A3961CD7E38D247239AFF71979
+ KS_hss_ntor = A0ED5DBF94EEB2EDB3B514E4CF6ABFF6
+ 022051CC5F103391F1970A3FCD15296A
+ N_hs_subcred = 0085D26A9DEBA252263BF0231AEAC59B
+ 17CA11BAD8A218238AD6487CBAD68B57
+
+ The client wants to make an INTRODUCE request. It generates
+ the following header (everything before the ENCRYPTED portion)
+ of its INTRODUCE1 cell:
+
+ H = 000000000000000000000000000000000000000002002034E171E4358E501BFF
+ 21ED907E96AC6BFEF697C779D040BBAF49ACC30FC5D21F00
+
+ It generates the following plaintext body to encrypt. (This
+ is the "decrypted plaintext body" from [PROCESS_INTRO2].)
+
+ P = 6BD364C12638DD5C3BE23D76ACA05B04E6CE932C0101000100200DE6130E4FCA
+ C4EDDA24E21220CC3EADAE403EF6B7D11C8273AC71908DE565450300067F0000
+ 0113890214F823C4F8CC085C792E0AEE0283FE00AD7520B37D0320728D5DF39B
+ 7B7077A0118A900FF4456C382F0041300ACF9C58E51C392795EF870000000000
+ 0000000000000000000000000000000000000000000000000000000000000000
+ 000000000000000000000000000000000000000000000000000000000000
+
+ The client now begins the hs-ntor handshake. It generates
+ a curve25519 keypair:
+
+ x = 60B4D6BF5234DCF87A4E9D7487BDF3F4
+ A69B6729835E825CA29089CFDDA1E341
+ X = BF04348B46D09AED726F1D66C618FDEA
+ 1DE58E8CB8B89738D7356A0C59111D5D
+
+ Then it calculates:
+
+ ENC_KEY = 9B8917BA3D05F3130DACCE5300C3DC27
+ F6D012912F1C733036F822D0ED238706
+ MAC_KEY = FC4058DA59D4DF61E7B40985D122F502
+ FD59336BC21C30CAF5E7F0D4A2C38FD5
+
+ With these, it encrypts the plaintext body P with ENC_KEY, getting
+ an encrypted value C. It computes MAC(MAC_KEY, H | X | C),
+ getting a MAC value M. It then assembles the final INTRODUCE1
+ body as H | X | C | M:
+
+ 000000000000000000000000000000000000000002002034E171E4358E501BFF
+ 21ED907E96AC6BFEF697C779D040BBAF49ACC30FC5D21F00BF04348B46D09AED
+ 726F1D66C618FDEA1DE58E8CB8B89738D7356A0C59111D5DADBECCCB38E37830
+ 4DCC179D3D9E437B452AF5702CED2CCFEC085BC02C4C175FA446525C1B9D5530
+ 563C362FDFFB802DAB8CD9EBC7A5EE17DA62E37DEEB0EB187FBB48C63298B0E8
+ 3F391B7566F42ADC97C46BA7588278273A44CE96BC68FFDAE31EF5F0913B9A9C
+ 7E0F173DBC0BDDCD4ACB4C4600980A7DDD9EAEC6E7F3FA3FC37CD95E5B8BFB3E
+ 35717012B78B4930569F895CB349A07538E42309C993223AEA77EF8AEA64F25D
+ DEE97DA623F1AEC0A47F150002150455845C385E5606E41A9A199E7111D54EF2
+ D1A51B7554D8B3692D85AC587FB9E69DF990EFB776D8
+
+ Later the service receives that body in an INTRODUCE2 cell. It
+ processes it according to the hs-ntor handshake, and recovers
+ the client's plaintext P. To continue the hs-ntor handshake,
+ the service chooses a curve25519 keypair:
+
+ y = 68CB5188CA0CD7924250404FAB54EE13
+ 92D3D2B9C049A2E446513875952F8F55
+ Y = 8FBE0DB4D4A9C7FF46701E3E0EE7FD05
+ CD28BE4F302460ADDEEC9E93354EE700
+
+ From this and the client's input, it computes:
+
+ AUTH_INPUT_MAC = 4A92E8437B8424D5E5EC279245D5C72B
+ 25A0327ACF6DAF902079FCB643D8B208
+ NTOR_KEY_SEED = 4D0C72FE8AFF35559D95ECC18EB5A368
+ 83402B28CDFD48C8A530A5A3D7D578DB
+
+ The service sends back Y | AUTH_INPUT_MAC in its RENDEZVOUS1 cell
+ body. From these, the client finishes the handshake, validates
+ AUTH_INPUT_MAC, and computes the same NTOR_KEY_SEED.
+
+ Now that both parties have the same NTOR_KEY_SEED, they can derive
+ the shared key material they will use for their circuit.
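The H | X | C | M layout of the INTRODUCE1 body above can be checked mechanically against the published vectors. Below is an illustrative Python sketch (not part of the spec itself) that splits the final body from this section back into its fields, assuming the 56-byte header and the 32-byte key and MAC widths shown in the vectors:

```python
# Test vectors copied from this section; field widths are taken from
# the surrounding text (56-byte header H, 32-byte X, 32-byte MAC M).
H_hex = (
    "000000000000000000000000000000000000000002002034E171E4358E501BFF"
    "21ED907E96AC6BFEF697C779D040BBAF49ACC30FC5D21F00"
)
X_hex = (
    "BF04348B46D09AED726F1D66C618FDEA"
    "1DE58E8CB8B89738D7356A0C59111D5D"
)
body_hex = (
    "000000000000000000000000000000000000000002002034E171E4358E501BFF"
    "21ED907E96AC6BFEF697C779D040BBAF49ACC30FC5D21F00BF04348B46D09AED"
    "726F1D66C618FDEA1DE58E8CB8B89738D7356A0C59111D5DADBECCCB38E37830"
    "4DCC179D3D9E437B452AF5702CED2CCFEC085BC02C4C175FA446525C1B9D5530"
    "563C362FDFFB802DAB8CD9EBC7A5EE17DA62E37DEEB0EB187FBB48C63298B0E8"
    "3F391B7566F42ADC97C46BA7588278273A44CE96BC68FFDAE31EF5F0913B9A9C"
    "7E0F173DBC0BDDCD4ACB4C4600980A7DDD9EAEC6E7F3FA3FC37CD95E5B8BFB3E"
    "35717012B78B4930569F895CB349A07538E42309C993223AEA77EF8AEA64F25D"
    "DEE97DA623F1AEC0A47F150002150455845C385E5606E41A9A199E7111D54EF2"
    "D1A51B7554D8B3692D85AC587FB9E69DF990EFB776D8"
)

body = bytes.fromhex(body_hex)
H = body[:56]          # fixed INTRODUCE1 header portion
X = body[56:88]        # client's curve25519 public key
C = body[88:-32]       # encrypted plaintext body (same length as P)
M = body[-32:]         # MAC(MAC_KEY, H | X | C)

assert H == bytes.fromhex(H_hex)
assert X == bytes.fromhex(X_hex)
assert len(C) == 190   # matches the length of plaintext P above
```

Recomputing C and M themselves requires the full hs-ntor key derivation; the split above only confirms the field layout and lengths.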
diff --git a/tor-spec.txt b/tor-spec.txt
index 34a3b44..8ab16d8 100644
--- a/tor-spec.txt
+++ b/tor-spec.txt
@@ -1068,28 +1068,28 @@ see tor-design.pdf.
5.1.1. Choosing circuit IDs in create cells
- The CircID for a CREATE/CREATE2 cell is an arbitrarily chosen
- nonzero integer, selected by the node (OP or OR) that sends the
- CREATE/CREATE2 cell. In link protocol 3 or lower, CircIDs are 2
- bytes long; in protocol 4 or higher, CircIDs are 4 bytes long.
-
- To prevent CircID collisions, when one node sends a CREATE/CREATE2
- cell to another, it chooses from only one half of the possible
- values based on the ORs' public identity keys.
-
- In link protocol version 3 or lower, if the sending node has a lower
- key, it chooses a CircID with an MSB of 0; otherwise, it chooses a
- CircID with an MSB of 1. (Public keys are compared numerically by
- modulus.) With protocol version 3 or lower, a client with no public key
- MAY choose any CircID it wishes, since clients never need to process a
- CREATE/CREATE2 cell.
+   The CircID for a CREATE/CREATE2 cell is a nonzero integer, selected
+   by the node (OP or OR) that sends the CREATE/CREATE2 cell.
+   Depending on the link protocol version, there are rules for choosing
+   the value of CircID which MUST be obeyed, since implementations MAY
+   refuse cells that violate them. In link protocol 3 or lower, CircIDs
+   are 2 bytes long; in protocol 4 or higher, CircIDs are 4 bytes long.
+
+ In link protocol version 3 or lower, the nodes choose from only one
+ half of the possible values based on the ORs' public identity keys,
+ in order to avoid collisions. If the sending node has a lower key,
+ it chooses a CircID with an MSB of 0; otherwise, it chooses a CircID
+ with an MSB of 1. (Public keys are compared numerically by modulus.)
+ A client with no public key MAY choose any CircID it wishes, since
+ clients never need to process CREATE/CREATE2 cells.
In link protocol version 4 or higher, whichever node initiated the
- connection sets its MSB to 1, and whichever node didn't initiate the
- connection sets its MSB to 0.
+ connection MUST set its MSB to 1, and whichever node didn't initiate
+ the connection MUST set its MSB to 0.
The CircID value 0 is specifically reserved for cells that do not
- belong to any circuit: CircID 0 must not be used for circuits. No
+        belong to any circuit: CircID 0 MUST NOT be used for circuits. No
other CircID value, including 0x8000 or 0x80000000, is reserved.
Existing Tor implementations choose their CircID values at random from
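For link protocol 4 or higher, the MSB rule above can be sketched in a few lines. This is an illustrative helper (the name `choose_circ_id` is hypothetical, not taken from any Tor implementation), picking a random nonzero 4-byte CircID whose MSB reflects which side initiated the connection:

```python
import os

CIRCID_MSB_V4 = 0x80000000  # MSB of a 4-byte CircID (link protocol 4+)

def choose_circ_id(initiated_connection: bool) -> int:
    """Pick a random nonzero 4-byte CircID obeying the MSB rule:
    the connection initiator sets the MSB to 1, the responder to 0."""
    while True:
        # Random 31-bit value; 0 is reserved for connection-level cells.
        low = int.from_bytes(os.urandom(4), "big") & 0x7FFFFFFF
        if low == 0:
            continue
        return low | CIRCID_MSB_V4 if initiated_connection else low

# The initiator's IDs always have the high bit set, the responder's never do,
# so the two sides draw from disjoint halves of the ID space.
assert choose_circ_id(True) >> 31 == 1
assert choose_circ_id(False) >> 31 == 0
```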
@@ -1101,7 +1101,7 @@ see tor-design.pdf.
5.1.2. EXTEND and EXTENDED cells
To extend an existing circuit, the client sends an EXTEND or EXTEND2
- relay cell to the last node in the circuit.
+ RELAY_EARLY cell to the last node in the circuit.
An EXTEND2 cell's relay payload contains:
@@ -1128,7 +1128,9 @@ see tor-design.pdf.
be listed.
Nodes MUST ignore unrecognized specifiers, and MUST accept multiple
- instances of specifiers other than 'legacy identity'.
+ instances of specifiers other than 'legacy identity' and
+ 'Ed25519 identity'. (Nodes SHOULD reject link specifier lists
+ that include multiple instances of either one of those specifiers.)
For purposes of indistinguishability, implementations SHOULD send
these link specifiers, if using them, in this order: [00], [02], [03],
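The duplicate-specifier rule above can be expressed as a small validation sketch. This is illustrative only (the helper name `check_link_specifiers` is hypothetical); the type codes are the link specifier types from the EXTEND2 description:

```python
# Link specifier type codes from the EXTEND2 cell format.
LS_IPV4 = 0x00
LS_IPV6 = 0x01
LS_LEGACY_ID = 0x02
LS_ED25519_ID = 0x03

def check_link_specifiers(ls_types):
    """Reject a link specifier list that contains more than one
    'legacy identity' or more than one 'Ed25519 identity' specifier.
    Unrecognized types are ignored, and other types may repeat."""
    for t in (LS_LEGACY_ID, LS_ED25519_ID):
        if ls_types.count(t) > 1:
            return False
    return True

assert check_link_specifiers([LS_IPV4, LS_LEGACY_ID, LS_ED25519_ID])
assert not check_link_specifiers([LS_LEGACY_ID, LS_LEGACY_ID])
assert check_link_specifiers([LS_IPV4, LS_IPV4, 0x7F])  # dupes/unknown OK
```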
@@ -1154,7 +1156,7 @@ see tor-design.pdf.
target OR did not prove its ownership of any such identity key.
If only one identity key is provided, but the extending OR knows
the other (from directory information), then the OR SHOULD also
- enforce that key.
+ enforce the key in the directory.
If an extending OR has a channel with a given Ed25519 ID and RSA
identity, and receives a request for that Ed25519 ID and a
@@ -1258,13 +1260,12 @@ see tor-design.pdf.
t_mac = PROTOID | ":mac"
t_key = PROTOID | ":key_extract"
t_verify = PROTOID | ":verify"
- MULT(a,b) = the multiplication of the curve25519 point 'a' by the
- scalar 'b'.
G = The preferred base point for curve25519 ([9])
KEYGEN() = The curve25519 key generation algorithm, returning
a private/public keypair.
m_expand = PROTOID | ":key_expand"
KEYID(A) = A
+ EXP(a, b) = The ECDH algorithm for establishing a shared secret.
To perform the handshake, the client needs to know an identity key
digest for the server, and an ntor onion key (a curve25519 public
@@ -2174,6 +2175,11 @@ see tor-design.pdf.
matched on the other side from the previous cell sent that the OR/OP
must remember.
+   (Note that if the digest in use has an output length greater than 20
+   bytes, as is the case for the hop of an onion service rendezvous
+   circuit created by the hs_ntor handshake, we truncate the digest
+   to 20 bytes here.)
+
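The truncation in the note above is mechanical. As a hedged sketch (the helper name `truncated_digest` is hypothetical, and SHA3-256 is used here because it is the wide-output digest the hs_ntor handshake negotiates), the 32-byte digest output is simply cut down to the 20 bytes the relay-cell digest field expects:

```python
import hashlib

def truncated_digest(data: bytes) -> bytes:
    """Illustrative: a SHA3-256 digest (32 bytes) truncated to the
    20 bytes used for the running relay-cell digest check."""
    return hashlib.sha3_256(data).digest()[:20]

d = truncated_digest(b"example relay cell bytes")
assert len(d) == 20
```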
If the VERSION is unrecognized or below the minimum accepted version (taken
from the consensus), the circuit should be torn down.