Unmodified copy of final proposal 327

This will be broken up into sections by future commits.
author: Micah Elizabeth Scott <beth@torproject.org> 2023-11-07 18:51:36 -0800
committer: Micah Elizabeth Scott <beth@torproject.org> 2023-11-09 14:16:27 -0800
commit: 93fed8bb455d1d09a4cf0fb6c79e3ed67c302f6e (patch)
tree: 282a52387d99fc6049b6d0592373409719878e02 /spec/hspow-spec
parent: 49ecdc696262876c3113c48d86c659ce64697b0e (diff)
download: torspec-93fed8bb455d1d09a4cf0fb6c79e3ed67c302f6e.tar.gz
torspec-93fed8bb455d1d09a4cf0fb6c79e3ed67c302f6e.zip
1 files changed, 1217 insertions, 0 deletions
diff --git a/spec/hspow-spec/index.md b/spec/hspow-spec/index.md
new file mode 100644
index 0000000..d728781
--- /dev/null
+++ b/spec/hspow-spec/index.md
@@ -0,0 +1,1217 @@
+```
+Filename: 327-pow-over-intro.txt
+Title: A First Take at PoW Over Introduction Circuits
+Author: George Kadianakis, Mike Perry, David Goulet, tevador
+Created: 2 April 2020
+Status: Closed
+
+0. Abstract
+
+  This proposal aims to thwart introduction flooding DoS attacks by introducing
+  a dynamic Proof-Of-Work protocol that occurs over introduction circuits.
+
+1. Motivation
+
+  So far our attempts at limiting the impact of introduction flooding DoS
+  attacks on onion services has been focused on horizontal scaling with
+  Onionbalance, optimizing the CPU usage of Tor and applying rate limiting.
+  While these measures move the goalpost forward, a core problem with onion
+  service DoS is that building rendezvous circuits is a costly procedure both
+  for the service and for the network. For more information on the limitations
+  of rate-limiting when defending against DDoS, see [REF_TLS_1].
+
+  If we ever hope to have truly reachable global onion services, we need to
+  make it harder for attackers to overload the service with introduction
+  requests. This proposal achieves this by allowing onion services to specify
+  an optional dynamic proof-of-work scheme that its clients need to participate
+  in if they want to get served.
+
+  With the right parameters, this proof-of-work scheme acts as a gatekeeper to
+  block amplification attacks by attackers while letting legitimate clients
+  through.
+
+1.1. Related work
+
+  For a similar concept, see the three internet drafts that have been proposed
+  for defending against TLS-based DDoS attacks using client puzzles [REF_TLS].
+
+1.2. Threat model [THREAT_MODEL]
+
+1.2.1. Attacker profiles [ATTACKER_MODEL]
+
+  This proposal is written to thwart specific attackers. A simple PoW proposal
+  cannot defend against all and every DoS attack on the Internet, but there are
+  adversary models we can defend against.
+
+  Let's start with some adversary profiles:
+
+  "The script-kiddie"
+
+    The script-kiddie has a single computer and pushes it to its
+    limits. Perhaps it also has a VPS and a pwned server. We are talking about
+    an attacker with total access to 10 GHz of CPU and 10 GB of RAM. We
+    consider the total cost for this attacker to be zero $.
+
+  "The small botnet"
+
+    The small botnet is a bunch of computers lined up to do an introduction
+    flooding attack. Assuming 500 medium-range computers, we are talking about
+    an attacker with total access to 10 THz of CPU and 10 TB of RAM. We
+    consider the upfront cost for this attacker to be about $400.
+
+  "The large botnet"
+
+    The large botnet is a serious operation with many thousands of computers
+    organized to do this attack. Assuming 100k medium-range computers, we are
+    talking about an attacker with total access to 200 THz of CPU and 200 TB of
+    RAM. The upfront cost for this attacker is about $36k.
+
+  We hope that this proposal can help us defend against the script-kiddie
+  attacker and small botnets. To defend against a large botnet we would need
+  more tools at our disposal (see [FUTURE_DESIGNS]).
+
+1.2.2. User profiles [USER_MODEL]
+
+  We have attackers and we have users. Here are a few user profiles:
+
+  "The standard web user"
+
+    This is a standard laptop/desktop user who is trying to browse the
+    web. They don't know how these defences work and they don't care to
+    configure or tweak them. If the site doesn't load, they are gonna close
+    their browser and be sad at Tor. They run a 2GHz computer with 4GB of RAM.
+
+  "The motivated user"
+
+    This is a user that really wants to reach their destination. They don't
+    care about the journey; they just want to get there. They know what's going
+    on; they are willing to make their computer do expensive multi-minute PoW
+    computations to get where they want to be.
+
+  "The mobile user"
+
+    This is a motivated user on a mobile phone. Even tho they want to read the
+    news article, they don't have much leeway on stressing their machine to do
+    more computation.
+
+  We hope that this proposal will allow the motivated user to always connect
+  where they want to connect to, and also give more chances to the other user
+  groups to reach the destination.
+
+1.2.3. The DoS Catch-22 [CATCH22]
+
+  This proposal is not perfect and it does not cover all the use cases. Still,
+  we think that by covering some use cases and giving reachability to the
+  people who really need it, we will severely demotivate the attackers from
+  continuing the DoS attacks and hence stop the DoS threat all together.
+  Furthermore, by increasing the cost to launch a DoS attack, a big
+  class of DoS attackers will disappear from the map, since the expected ROI
+  will decrease.
+
+2. System Overview
+
+2.1. Tor protocol overview
+
+                                          +----------------------------------+
+                                          |          Onion Service           |
+   +-------+ INTRO1  +-----------+ INTRO2 +--------+                         |
+   |Client |-------->|Intro Point|------->|  PoW   |-----------+             |
+   +-------+         +-----------+        |Verifier|           |             |
+                                          +--------+           |             |
+                                          |                    |             |
+                                          |                    |             |
+                                          |         +----------v---------+   |
+                                          |         |Intro Priority Queue|   |
+                                          +---------+--------------------+---+
+                                                           |  |  |
+                                                Rendezvous |  |  |
+                                                  circuits |  |  |
+                                                           v  v  v
+
+
+
+  The proof-of-work scheme specified in this proposal takes place during the
+  introduction phase of the onion service protocol.
+
+  The system described in this proposal is not meant to be on all the time, and
+  it can be entirely disabled for services that do not experience DoS attacks.
+
+  When the subsystem is enabled, suggested effort is continuously adjusted and
+  the computational puzzle can be bypassed entirely when the effort reaches
+  zero. In these cases, the proof-of-work subsystem can be dormant but still
+  provide the necessary parameters for clients to voluntarily provide effort
+  in order to get better placement in the priority queue.
+
+  The protocol involves the following major steps:
+
+  1) Service encodes PoW parameters in descriptor [DESC_POW]
+  2) Client fetches descriptor and computes PoW [CLIENT_POW]
+  3) Client completes PoW and sends results in INTRO1 cell [INTRO1_POW]
+  4) Service verifies PoW and queues introduction based on PoW effort
+     [SERVICE_VERIFY]
+  5) Requests are continuously drained from the queue, highest effort first,
+     subject to multiple constraints on speed [HANDLE_QUEUE]
+
+2.2. Proof-of-work overview
+
+2.2.1. Algorithm overview
+
+  For our proof-of-work function we will use the Equi-X scheme by tevador
+  [REF_EQUIX].  Equi-X is an asymmetric PoW function based on Equihash<60,3>,
+  using HashX as the underlying layer. It features lightning fast verification
+  speed, and also aims to minimize the asymmetry between CPU and GPU.
+  Furthermore, it's designed for this particular use-case and hence
+  cryptocurrency miners are not incentivized to make optimized ASICs for it.
+
+  The overall scheme consists of several layers that provide different pieces
+  of this functionality:
+
+  1) At the lowest layers, blake2b and siphash are used as hashing and PRNG
+     algorithms that are well suited to common 64-bit CPUs.
+  2) A custom hash function family, HashX, randomizes its implementation for
+     each new seed value. These functions are tuned to utilize the pipelined
+     integer performance on a modern 64-bit CPU. This layer provides the
+     strongest ASIC resistance, since a hardware reimplementation would need
+     to include a CPU-like pipelined execution unit to keep up.
+  3) The Equi-X layer itself builds on HashX and adds an algorithmic puzzle
+     that's designed to be strongly asymmetric and to require RAM to solve
+     efficiently.
+  4) The PoW protocol itself builds on this Equi-X function with a particular
+     construction of the challenge input and particular constraints on the
+     allowed blake2b hash of the solution. This layer provides a linearly
+     adjustable effort that we can verify.
+  5) Above the level of individual PoW handshakes, the client and service
+     form a closed-loop system that adjusts the effort of future handshakes.
+
+  The Equi-X scheme provides two functions that will be used in this proposal:
+      - equix_solve(challenge) which solves a puzzle instance, returning
+        a variable number of solutions per invocation depending on the specific
+        challenge value.
+      - equix_verify(challenge, solution) which verifies a puzzle solution
+        quickly. Verification still depends on executing the HashX function,
+        but far fewer times than when searching for a solution.
+
+  For the purposes of this proposal, all cryptographic algorithms are assumed
+  to produce and consume byte strings, even if internally they operate on
+  some other data type like 64-bit words. This is conventionally little endian
+  order for blake2b, which contrasts with Tor's typical use of big endian.
+  HashX itself is configured with an 8-byte output but its input is a single
+  64-bit word of undefined byte order, of which only the low 16 bits are used
+  by Equi-X in its solution output. We treat Equi-X solution arrays as byte
+  arrays using their packed little endian 16-bit representation.
+
+  We tune Equi-X in section [EQUIX_TUNING].
+
+2.2.2. Dynamic PoW
+
+  DoS is a dynamic problem where the attacker's capabilities constantly change,
+  and hence we want our proof-of-work system to be dynamic and not stuck with a
+  static difficulty setting. Hence, instead of forcing clients to go below a
+  static target like in Bitcoin to be successful, we ask clients to "bid" using
+  their PoW effort. Effectively, a client gets higher priority the higher
+  effort they put into their proof-of-work. This is similar to how
+  proof-of-stake works but instead of staking coins, you stake work.
+
+  The benefit here is that legitimate clients who really care about getting
+  access can spend a big amount of effort into their PoW computation, which
+  should guarantee access to the service given reasonable adversary models. See
+  [PARAM_TUNING] for more details about these guarantees and tradeoffs.
+
+  As a way to improve reachability and UX, the service tries to estimate the
+  effort needed for clients to get access at any given time and places it in
+  the descriptor. See [EFFORT_ESTIMATION] for more details.
+
+2.2.3. PoW effort
+
+  It's common for proof-of-work systems to define an exponential effort
+  function based on a particular number of leading zero bits or equivalent.
+  For the benefit of our effort estimation system, it's quite useful if we
+  instead have a linear scale. We use the first 32 bits of a hashed version
+  of the Equi-X solution as compared to the full 32-bit range.
+
+  Conceptually we could define a function:
+         unsigned effort(uint8_t *token)
+  which takes as its argument a hashed solution, interprets it as a
+  bitstring, and returns the quotient of dividing a bitstring of 1s by it.
+
+  So for example:
+         effort(00000001100010101101) = 11111111111111111111
+                                          / 00000001100010101101
+  or the same in decimal:
+         effort(6317) = 1048575 / 6317 = 165.
+
+  In practice we can avoid even having to perform this division, performing
+  just one multiply instead to see if a request's claimed effort is supported
+  by the smallness of the resulting 32-bit hash prefix. This assumes we send
+  the desired effort explicitly as part of each PoW solution. We do want to
+  force clients to pick a specific effort before looking for a solution,
+  otherwise a client could opportunistically claim a very large effort any
+  time a lucky hash prefix comes up. Thus the effort is communicated explicitly
+  in our protocol, and it forms part of the concatenated Equi-X challenge.
+
+3. Protocol specification
+
+3.1. Service encodes PoW parameters in descriptor [DESC_POW]
+
+  This whole protocol starts with the service encoding the PoW parameters in
+  the 'encrypted' (inner) part of the v3 descriptor. As follows:
+
+       "pow-params" SP type SP seed-b64 SP suggested-effort
+                    SP expiration-time NL
+
+        [At most once]
+
+        type: The type of PoW system used. We call the one specified here "v1"
+
+        seed-b64: A random seed that should be used as the input to the PoW
+                  hash function. Should be 32 random bytes encoded in base64
+                  without trailing padding.
+
+        suggested-effort: An unsigned integer specifying an effort value that
+                  clients should aim for when contacting the service. Can be
+                  zero to mean that PoW is available but not currently
+                  suggested for a first connection attempt. See
+                  [EFFORT_ESTIMATION] for more details here.
+
+        expiration-time: A timestamp in "YYYY-MM-DDTHH:MM:SS" format (iso time
+                         with no space) after which the above seed expires and
+                         is no longer valid as the input for PoW. It's needed
+                         so that our replay cache does not grow infinitely. It
+                         should be set to RAND_TIME(now+7200, 900) seconds.
+
+   The service should refresh its seed when expiration-time passes. The service
+   SHOULD keep its previous seed in memory and accept PoWs using it to avoid
+   race-conditions with clients that have an old seed. The service SHOULD avoid
+   generating two consequent seeds that have a common 4 bytes prefix. See
+   [INTRO1_POW] for more info.
+
+   By RAND_TIME(ts, interval) we mean a time between ts-interval and ts, chosen
+   uniformly at random.
+
+3.2. Client fetches descriptor and computes PoW [CLIENT_POW]
+
+  If a client receives a descriptor with "pow-params", it should assume that
+  the service is prepared to receive PoW solutions as part of the introduction
+  protocol.
+
+  The client parses the descriptor and extracts the PoW parameters. It makes
+  sure that the <expiration-time> has not expired and if it has, it needs to
+  fetch a new descriptor.
+
+  The client should then extract the <suggested-effort> field to configure its
+  PoW 'target' (see [REF_TARGET]). The client SHOULD NOT accept 'target' values
+  that will cause unacceptably long PoW computation.
+
+  The client uses a "personalization string" P equal to the following
+  nul-terminated ASCII string: "Tor hs intro v1\0".
+
+  The client looks up `ID`, the current 32-byte blinded public ID
+  (KP_hs_blind_id) for the onion service.
+
+  To complete the PoW the client follows the following logic:
+
+      a) Client selects a target effort E, based on <suggested-effort> and past
+         connection attempt history.
+      b) Client generates a secure random 16-byte nonce N, as the starting
+         point for the solution search.
+      c) Client derives seed C by decoding 'seed-b64'.
+      d) Client calculates S = equix_solve(P || ID || C || N || E)
+      e) Client calculates R = ntohl(blake2b_32(P || ID || C || N || E || S))
+      f) Client checks if R * E <= UINT32_MAX.
+        f1) If yes, success! The client can submit N, E, the first 4 bytes of
+        C, and S.
+        f2) If no, fail! The client interprets N as a 16-byte little-endian
+        integer, increments it by 1 and goes back to step d).
+
+  Note that the blake2b hash includes the output length parameter in its
+  initial state vector, so a blake2b_32 is not equivalent to the prefix of a
+  blake2b_512. We calculate the 32-bit blake2b specifically, and interpret it
+  in network byte order as an unsigned integer.
+
+  At the end of the above procedure, the client should have S as the solution
+  of the Equix-X puzzle with N as the nonce, C as the seed. How quickly this
+  happens depends solely on the target effort E parameter.
+
+  The algorithm as described is suitable for single-threaded computation.
+  Optionally, a client may choose multiple nonces and attempt several solutions
+  in parallel on separate CPU cores. The specific choice of nonce is entirely
+  up to the client, so parallelization choices like this do not impact the
+  network protocol's interoperability at all.
+
+3.3. Client sends PoW in INTRO1 cell [INTRO1_POW]
+
+  Now that the client has an answer to the puzzle it's time to encode it into
+  an INTRODUCE1 cell. To do so the client adds an extension to the encrypted
+  portion of the INTRODUCE1 cell by using the EXTENSIONS field (see
+  [PROCESS_INTRO2] section in rend-spec-v3.txt). The encrypted portion of the
+  INTRODUCE1 cell only gets read by the onion service and is ignored by the
+  introduction point.
+
+  We propose a new EXT_FIELD_TYPE value:
+
+     [02] -- PROOF_OF_WORK
+
+   The EXT_FIELD content format is:
+
+        POW_VERSION    [1 byte]
+        POW_NONCE      [16 bytes]
+        POW_EFFORT     [4 bytes]
+        POW_SEED       [4 bytes]
+        POW_SOLUTION   [16 bytes]
+
+   where:
+
+    POW_VERSION is 1 for the protocol specified in this proposal
+    POW_NONCE is the nonce 'N' from the section above
+    POW_EFFORT is the 32-bit integer effort value, in network byte order
+    POW_SEED is the first 4 bytes of the seed used
+
+   This will increase the INTRODUCE1 payload size by 43 bytes since the
+   extension type and length is 2 extra bytes, the N_EXTENSIONS field is always
+   present and currently set to 0 and the EXT_FIELD is 41 bytes. According to
+   ticket #33650, INTRODUCE1 cells currently have more than 200 bytes
+   available.
+
+3.4. Service verifies PoW and handles the introduction  [SERVICE_VERIFY]
+
+   When a service receives an INTRODUCE1 with the PROOF_OF_WORK extension, it
+   should check its configuration on whether proof-of-work is enabled on the
+   service. If it's not enabled, the extension SHOULD BE ignored. If enabled,
+   even if the suggested effort is currently zero, the service follows the
+   procedure detailed in this section.
+
+   If the service requires the PROOF_OF_WORK extension but received an
+   INTRODUCE1 cell without any embedded proof-of-work, the service SHOULD
+   consider this cell as a zero-effort introduction for the purposes of the
+   priority queue (see section [INTRO_QUEUE]).
+
+3.4.1. PoW verification [POW_VERIFY]
+
+   To verify the client's proof-of-work the service MUST do the following steps:
+
+      a) Find a valid seed C that starts with POW_SEED. Fail if no such seed
+         exists.
+      b) Fail if N = POW_NONCE is present in the replay cache
+              (see [REPLAY_PROTECTION])
+      c) Calculate R = ntohl(blake2b_32(P || ID || C || N || E || S))
+      d) Fail if R * E > UINT32_MAX
+      e) Fail if equix_verify(P || ID || C || N || E, S) != EQUIX_OK
+      f) Put the request in the queue with a priority of E
+
+   If any of these steps fail the service MUST ignore this introduction request
+   and abort the protocol.
+
+   In this proposal we call the above steps the "top half" of introduction
+   handling. If all the steps of the "top half" have passed, then the circuit
+   is added to the introduction queue as detailed in section [INTRO_QUEUE].
+
+3.4.1.1. Replay protection [REPLAY_PROTECTION]
+
+  The service MUST NOT accept introduction requests with the same (seed, nonce)
+  tuple. For this reason a replay protection mechanism must be employed.
+
+  The simplest way is to use a simple hash table to check whether a (seed,
+  nonce) tuple has been used before for the active duration of a
+  seed. Depending on how long a seed stays active this might be a viable
+  solution with reasonable memory/time overhead.
+
+  If there is a worry that we might get too many introductions during the
+  lifetime of a seed, we can use a Bloom filter as our replay cache
+  mechanism. The probabilistic nature of Bloom filters means that sometimes we
+  will flag some connections as replays even if they are not; with this false
+  positive probability increasing as the number of entries increase. However,
+  with the right parameter tuning this probability should be negligible and
+  well handled by clients.
+
+  {TODO: Design and specify a suitable bloom filter for this purpose.}
+
+3.4.2. The Introduction Queue  [INTRO_QUEUE]
+
+3.4.2.1. Adding introductions to the introduction queue [ADD_QUEUE]
+
+  When PoW is enabled and a verified introduction comes through, the service
+  instead of jumping straight into rendezvous, queues it and prioritizes it
+  based on how much effort was devoted by the client to PoW. This means that
+  introduction requests with high effort should be prioritized over those with
+  low effort.
+
+  To do so, the service maintains an "introduction priority queue" data
+  structure. Each element in that priority queue is an introduction request,
+  and its priority is the effort put into its PoW:
+
+  When a verified introduction comes through, the service uses its included
+  effort commitment value to place each request into the right position of the
+  priority_queue: The bigger the effort, the more priority it gets in the
+  queue. If two elements have the same effort, the older one has priority over
+  the newer one.
+
+3.4.2.2. Handling introductions from the introduction queue [HANDLE_QUEUE]
+
+  The service should handle introductions by pulling from the introduction
+  queue. We call this part of introduction handling the "bottom half" because
+  most of the computation happens in this stage. For a description of how we
+  expect such a system to work in Tor, see [TOR_SCHEDULER] section.
+
+3.4.3. PoW effort estimation [EFFORT_ESTIMATION]
+
+3.4.3.1. High-level description of the effort estimation process
+
+  The service starts with a default suggested-effort value of 0, which keeps
+  the PoW defenses dormant until we notice signs of overload.
+
+  The overall process of determining effort can be thought of as a set of
+  multiple coupled feedback loops. Clients perform their own effort
+  adjustments via [CLIENT_TIMEOUT] atop a base effort suggested by the service.
+  That suggestion incorporates the service's control adjustments atop a base
+  effort calculated using a sum of currently-queued client effort.
+
+  Each feedback loop has an opportunity to cover different time scales. Clients
+  can make adjustments at every single circuit creation request, whereas
+  services are limited by the extra load that frequent updates would place on
+  HSDir nodes.
+
+  In the combined client/service system these client-side increases are
+  expected to provide the most effective quick response to an emerging DoS
+  attack. After early clients increase the effort using [CLIENT_TIMEOUT],
+  later clients will benefit from the service detecting this increased queued
+  effort and offering a larger suggested_effort.
+
+  Effort increases and decreases both have an intrinsic cost. Increasing effort
+  will make the service more expensive to contact, and decreasing effort makes
+  new requests likely to become backlogged behind older requests. The steady
+  state condition is preferable to either of these side-effects, but ultimately
+  it's expected that the control loop always oscillates to some degree.
+
+3.4.3.2. Service-side effort estimation
+
+  Services keep an internal effort estimation which updates on a regular
+  periodic timer in response to measurements made on the queueing behavior
+  in the previous period. These internal effort changes can optionally trigger
+  client-visible suggested_effort changes when the difference is great enough
+  to warrant republishing to the HSDir.
+
+  This evaluation and update period is referred to as HS_UPDATE_PERIOD.
+  The service side effort estimation takes inspiration from TCP congestion
+  control's additive increase / multiplicative decrease approach, but unlike
+  a typical AIMD this algorithm is fixed-rate and doesn't update immediately
+  in response to events.
+
+  {TODO: HS_UPDATE_PERIOD is hardcoded to 300 (5 minutes) currently, but it
+   should be configurable in some way. Is it more appropriate to use the
+   service's torrc here or a consensus parameter?}
+
+3.4.3.3. Per-period service state
+
+  During each update period, the service maintains some state:
+
+    1. TOTAL_EFFORT, a sum of all effort values for rendezvous requests that
+       were successfully validated and enqueued.
+
+    2. REND_HANDLED, a count of rendezvous requests that were actually
+       launched. Requests that made it to dequeueing but were too old to launch
+       by then are not included.
+
+    3. HAD_QUEUE, a flag which is set if at any time in the update period we
+       saw the priority queue filled with more than a minimum amount of work,
+       greater than we would expect to process in approximately 1/4 second
+       using the configured dequeue rate.
+
+    4. MAX_TRIMMED_EFFORT, the largest observed single request effort that we
+       discarded during the period. Requests are discarded either due to age
+       (timeout) or during culling events that discard the bottom half of the
+       entire queue when it's too full.
+
+3.4.3.4. Service AIMD conditions
+
+  At the end of each period, the service may decide to increase effort,
+  decrease effort, or make no changes, based on these accumulated state values:
+
+    1. If MAX_TRIMMED_EFFORT > our previous internal suggested_effort,
+       always INCREASE. Requests that follow our latest advice are being
+       dropped.
+
+    2. If the HAD_QUEUE flag was set and the queue still contains at least
+       one item with effort >= our previous internal suggested_effort,
+       INCREASE. Even if we haven't yet reached the point of dropping requests,
+       this signal indicates that the our latest suggestion isn't high enough
+       and requests will build up in the queue.
+
+    3. If neither condition (1) or (2) are taking place and the queue is below
+       a level we would expect to process in approximately 1/4 second, choose
+       to DECREASE.
+
+    4. If none of these conditions match, the suggested effort is unchanged.
+
+  When we INCREASE, the internal suggested_effort is increased to either its
+  previous value + 1, or (TOTAL_EFFORT / REND_HANDLED), whichever is larger.
+
+  When we DECREASE, the internal suggested_effort is scaled by 2/3rds.
+
+  Over time, this will continue to decrease our effort suggestion any time the
+  service is fully processing its request queue. If the queue stays empty, the
+  effort suggestion decreases to zero and clients should no longer submit a
+  proof-of-work solution with their first connection attempt.
+
+  It's worth noting that the suggested-effort is not a hard limit to the
+  efforts that are accepted by the service, and it's only meant to serve as a
+  guideline for clients to reduce the number of unsuccessful requests that get
+  to the service. The service still adds requests with lower effort than
+  suggested-effort to the priority queue in [ADD_QUEUE].
+
+3.4.3.5. Updating descriptor with new suggested effort
+
+  The service descriptors may be updated for multiple reasons including
+  introduction point rotation common to all v3 onion services, the scheduled
+  seed rotations described in [DESC_POW], and updates to the effort suggestion.
+  Even though the internal effort estimate updates on a regular timer, we avoid
+  propagating those changes into the descriptor and the HSDir hosts unless
+  there is a significant change.
+
+  If the PoW params otherwise match but the seed has changed by less than 15
+  percent, services SHOULD NOT upload a new descriptor.
+
+4. Client behavior [CLIENT_BEHAVIOR]
+
+  This proposal introduces a bunch of new ways where a legitimate client can
+  fail to reach the onion service.
+
+  Furthermore, there is currently no end-to-end way for the onion service to
+  inform the client that the introduction failed. The INTRO_ACK cell is not
+  end-to-end (it's from the introduction point to the client) and hence it does
+  not allow the service to inform the client that the rendezvous is never gonna
+  occur.
+
+  From the client's perspective there's no way to attribute this failure to
+  the service itself rather than the introduction point, so error accounting
+  is performed separately for each introduction-point. Existing mechanisms
+  will discard an introduction point that's required too many retries.
+
+4.1. Clients handling timeouts [CLIENT_TIMEOUT]
+
+  Alice can fail to reach the onion service if her introduction request gets
+  trimmed off the priority queue in [HANDLE_QUEUE], or if the service does not
+  get through its priority queue in time and the connection times out.
+
+  This section presents a heuristic method for the client getting service even
+  in such scenarios.
+
+  If the rendezvous request times out, the client SHOULD fetch a new descriptor
+  for the service to make sure that it's using the right suggested-effort for
+  the PoW and the right PoW seed. If the fetched descriptor includes a new
+  suggested effort or seed, it should first retry the request with these
+  parameters.
+
+  {TODO: This is not actually implemented yet, but we should do it. How often
+     should clients at most try to fetch new descriptors? Determined by a
+     consensus parameter? This change will also allow clients to retry
+     effectively in cases where the service has just been reconfigured to
+     enable PoW defenses.}
+
+  Every time the client retries the connection, it will count these failures
+  per-introduction-point. These counts of previous retries are combined with
+  the service's suggested_effort when calculating the actual effort to spend
+  on any individual request to a service that advertises PoW support, even
+  when the currently advertised suggested_effort is zero.
+
+  On each retry, the client modifies its solver effort:
+
+    1. If the effort is below (CLIENT_POW_EFFORT_DOUBLE_UNTIL = 1000)
+       it will be doubled.
+
+    2. Otherwise, multiply the effort by (CLIENT_POW_RETRY_MULTIPLIER = 1.5).
+
+    3. Constrain the new effort to be at least
+       (CLIENT_MIN_RETRY_POW_EFFORT = 8) and no greater than
+       (CLIENT_MAX_POW_EFFORT = 10000)
+
+  {TODO: These hardcoded limits should be replaced by timed limits and/or
+      an unlimited solver with robust cancellation. This is issue tor#40787}
+
+5. Attacker strategies [ATTACK_META]
+
+  Now that we defined our protocol we need to start tweaking the various
+  knobs. But before we can do that, we first need to understand a few
+  high-level attacker strategies to see what we are fighting against.
+
+5.1.1. Overwhelm PoW verification (aka "Overwhelm top half") [ATTACK_TOP_HALF]
+
+  A basic attack here is the adversary spamming with bogus INTRO cells so that
+  the service does not have computing capacity to even verify the
+  proof-of-work. This adversary tries to overwhelm the procedure in the
+  [POW_VERIFY] section.
+
+  That's why we need the PoW algorithm to have a cheap verification time so
+  that this attack is not possible: we tune this PoW parameter in section
+  [POW_TUNING_VERIFICATION].
+
+5.1.2. Overwhelm rendezvous capacity (aka "Overwhelm bottom half")
+       [ATTACK_BOTTOM_HALF]
+
+  Given the way the introduction queue works (see [HANDLE_QUEUE]), a very
+  effective strategy for the attacker is to totally overwhelm the queue
+  processing by sending more high-effort introductions than the onion service
+  can handle at any given tick. This adversary tries to overwhelm the procedure
+  in the [HANDLE_QUEUE] section.
+
+  To do so, the attacker would have to send at least 20 high-effort
+  introduction cells every 100ms, where high-effort is a PoW which is above the
+  estimated level of "the motivated user" (see [USER_MODEL]).
+
+  An easier attack for the adversary, is the same strategy but with
+  introduction cells that are all above the comfortable level of "the standard
+  user" (see [USER_MODEL]). This would block out all standard users and only
+  allow motivated users to pass.
+
+5.1.3. Hybrid overwhelm strategy [ATTACK_HYBRID]
+
+  If both the top- and bottom- halves are processed by the same thread, this
+  opens up the possibility for a "hybrid" attack. Given the performance figures
+  for the bottom half (0.31 ms/req.) and the top half (5.5 ms/req.), the
+  attacker can optimally deny service by submitting 91 high-effort requests and
+  1520 invalid requests per second. This will completely saturate the main loop
+  because:
+
+  0.31*(1520+91) ~ 0.5 sec.
+  5.5*91         ~ 0.5 sec.
+
+  This attack only has half the bandwidth requirement of [ATTACK_TOP_HALF] and
+  half the compute requirement of [ATTACK_BOTTOM_HALF].
+
+  Alternatively, the attacker can adjust the ratio between invalid and
+  high-effort requests depending on their bandwidth and compute capabilities.
+
+5.1.4. Gaming the effort estimation logic [ATTACK_EFFORT]
+
+  Another way to beat this system is for the attacker to game the effort
+  estimation logic (see [EFFORT_ESTIMATION]). Essentially, there are two attacks
+  that we are trying to avoid:
+
+  - Attacker sets descriptor suggested-effort to a very high value effectively
+    making it impossible for most clients to produce a PoW token in a
+    reasonable timeframe.
+  - Attacker sets descriptor suggested-effort to a very small value so that
+    most clients aim for a small value while the attacker comfortably launches
+    an [ATTACK_BOTTOM_HALF] using medium effort PoW (see [REF_TEVADOR_1])
+
+5.1.4. Precomputed PoW attack
+
+  The attacker may precompute many valid PoW nonces and submit them all at once
+  before the current seed expires, overwhelming the service temporarily even
+  using a single computer. The current scheme gives the attackers 4 hours to
+  launch this attack since each seed lasts 2 hours and the service caches two
+  seeds.
+
+  An attacker with this attack might be aiming to DoS the service for a limited
+  amount of time, or to cause an [ATTACK_EFFORT] attack.
+
+6. Parameter tuning [POW_TUNING]
+
+  There are various parameters in this PoW system that need to be tuned:
+
+  We first start by tuning the time it takes to verify a PoW token. We do this
+  first because it's fundamental to the performance of onion services and can
+  turn into a DoS vector of its own. We will do this tuning in a way that's
+  agnostic to the chosen PoW function.
+
+  We will then move towards analyzing the client starting difficulty setting
+  for our PoW system. That defines the expected time for clients to succeed in
+  our system, and the expected time for attackers to overwhelm our system. Same
+  as above we will do this in a way that's agnostic to the chosen PoW function.
+
+  Currently, we have hardcoded the initial client starting difficulty at 8,
+  but this may be too low to ramp up quickly to various on and off attack
+  patterns. A higher initial difficulty may be needed for these, depending on
+  their severity. This section gives us an idea of how large such attacks can
+  be.
+
+  Finally, using those two pieces we will tune our PoW function and pick the
+  right client starting difficulty setting. At the end of this section we will
+  know the resources that an attacker needs to overwhelm the onion service, the
+  resources that the service needs to verify introduction requests, and the
+  resources that legitimate clients need to get to the onion service.
+
+6.1. PoW verification [POW_TUNING_VERIFICATION]
+
+  Verifying a PoW token is the first thing that a service does when it receives
+  an INTRODUCE2 cell and it's detailed in section [POW_VERIFY]. This
+  verification happens during the "top half" part of the process. Every
+  millisecond spent verifying PoW adds overhead to the already existing "top
+  half" part of handling an introduction cell. Hence we should be careful to
+  add minimal overhead here so that we don't enable attacks like [ATTACK_TOP_HALF].
+
+  During our performance measurements in [TOR_MEASUREMENTS] we learned that the
+  "top half" takes about 0.26 msecs in average, without doing any sort of PoW
+  verification. Using that value we compute the following table, that describes
+  the number of cells we can queue per second (aka times we can perform the
+  "top half" process) for different values of PoW verification time:
+
+      +---------------------+-----------------------+--------------+
+      |PoW Verification Time| Total "top half" time | Cells Queued |
+      |                     |                       |  per second  |
+      |---------------------|-----------------------|--------------|
+      |    0     msec       |    0.26      msec     |    3846      |
+      |    1     msec       |    1.26      msec     |    793       |
+      |    2     msec       |    2.26      msec     |    442       |
+      |    3     msec       |    3.26      msec     |    306       |
+      |    4     msec       |    4.26      msec     |    234       |
+      |    5     msec       |    5.26      msec     |    190       |
+      |    6     msec       |    6.26      msec     |    159       |
+      |    7     msec       |    7.26      msec     |    137       |
+      |    8     msec       |    8.26      msec     |    121       |
+      |    9     msec       |    9.26      msec     |    107       |
+      |    10    msec       |    10.26     msec     |    97        |
+      +---------------------+-----------------------+--------------+
+
+  Here is how you can read the table above:
+
+  - For a PoW function with a 1ms verification time, an attacker needs to send
+    793 dummy introduction cells per second to succeed in a [ATTACK_TOP_HALF] attack.
+
+  - For a PoW function with a 2ms verification time, an attacker needs to send
+    442 dummy introduction cells per second to succeed in a [ATTACK_TOP_HALF] attack.
+
+  - For a PoW function with a 10ms verification time, an attacker needs to send
+    97 dummy introduction cells per second to succeed in a [ATTACK_TOP_HALF] attack.
+
+  Whether an attacker can succeed at that depends on the attacker's resources,
+  but also on the network's capacity.
+
+  Our purpose here is to have the smallest PoW verification overhead possible
+  that also allows us to achieve all our other goals.
+
+  [Note that the table above is simply the result of a naive multiplication and
+  does not take into account all the auxiliary overheads that happen every
+  second like the time to invoke the mainloop, the bottom-half processes, or
+  pretty much anything other than the "top-half" processing.
+
+  During our measurements the time to handle INTRODUCE2 cells dominates any
+  other action time: There might be events that require a long processing time,
+  but these are pretty infrequent (like uploading a new HS descriptor) and
+  hence over a long time they smooth out. Hence extrapolating the total cells
+  queued per second based on a single "top half" time seems like good enough to
+  get some initial intuition. That said, the values of "Cells queued per
+  second" from the table above, are likely much smaller than displayed above
+  because of all the auxiliary overheads.]
+
+6.2. PoW difficulty analysis [POW_DIFFICULTY_ANALYSIS]
+
+  The difficulty setting of our PoW basically dictates how difficult it should
+  be to get a success in our PoW system. An attacker who can get many successes
+  per second can pull a successful [ATTACK_BOTTOM_HALF] attack against our
+  system.
+
+  In classic PoW systems, "success" is defined as getting a hash output below
+  the "target". However, since our system is dynamic, we define "success" as an
+  abstract high-effort computation.
+
+  Our system is dynamic but we still need a starting difficulty setting that
+  will be used for bootstrapping the system. The client and attacker can still
+  aim higher or lower but for UX purposes and for analysis purposes we do need
+  to define a starting difficulty, to minimize retries by clients.
+
+6.2.1. Analysis based on adversary power
+
+  In this section we will try to do an analysis of PoW difficulty without using
+  any sort of Tor-related or PoW-related benchmark numbers.
+
+  We created the table (see [REF_TABLE]) below which shows how much time a
+  legitimate client with a single machine should expect to burn before they get
+  a single success. The x-axis is how many successes we want the attacker to be
+  able to do per second: the more successes we allow the adversary, the more
+  they can overwhelm our introduction queue. The y-axis is how many machines
+  the adversary has in her disposal, ranging from just 5 to 1000.
+
+       ===============================================================
+       |    Expected Time (in seconds) Per Success For One Machine   |
+ ===========================================================================
+ |                                                                          |
+ |   Attacker Succeses        1       5       10      20      30      50    |
+ |       per second                                                         |
+ |                                                                          |
+ |            5               5       1       0       0       0       0     |
+ |            50              50      10      5       2       1       1     |
+ |            100             100     20      10      5       3       2     |
+ | Attacker   200             200     40      20      10      6       4     |
+ |  Boxes     300             300     60      30      15      10      6     |
+ |            400             400     80      40      20      13      8     |
+ |            500             500     100     50      25      16      10    |
+ |            1000            1000    200     100     50      33      20    |
+ |                                                                          |
+ ============================================================================
+
+  Here is how you can read the table above:
+
+  - If an adversary has a botnet with 1000 boxes, and we want to limit her to 1
+    success per second, then a legitimate client with a single box should be
+    expected to spend 1000 seconds getting a single success.
+
+  - If an adversary has a botnet with 1000 boxes, and we want to limit her to 5
+    successes per second, then a legitimate client with a single box should be
+    expected to spend 200 seconds getting a single success.
+
+  - If an adversary has a botnet with 500 boxes, and we want to limit her to 5
+    successes per second, then a legitimate client with a single box should be
+    expected to spend 100 seconds getting a single success.
+
+  - If an adversary has access to 50 boxes, and we want to limit her to 5
+    successes per second, then a legitimate client with a single box should be
+    expected to spend 10 seconds getting a single success.
+
+  - If an adversary has access to 5 boxes, and we want to limit her to 5
+    successes per second, then a legitimate client with a single box should be
+    expected to spend 1 seconds getting a single success.
+
+  With the above table we can create some profiles for starting values of our
+  PoW difficulty.
+
+6.2.2. Analysis based on Tor's performance [POW_DIFFICULTY_TOR]
+
+  To go deeper here, we can use the performance measurements from
+  [TOR_MEASUREMENTS] to get a more specific intuition on the starting
+  difficulty. In particular, we learned that completely handling an
+  introduction cell takes 5.55 msecs in average. Using that value, we can
+  compute the following table, that describes the number of introduction cells
+  we can handle per second for different values of PoW verification:
+
+      +---------------------+-----------------------+--------------+
+      |PoW Verification Time| Total time to handle  | Cells handled|
+      |                     |   introduction cell   |  per second  |
+      |---------------------|-----------------------|--------------|
+      |    0      msec      |    5.55        msec   |    180.18    |
+      |    1      msec      |    6.55        msec   |    152.67    |
+      |    2      msec      |    7.55        msec   |    132.45    |
+      |    3      msec      |    8.55        msec   |    116.96    |
+      |    4      msec      |    9.55        mesc   |    104.71    |
+      |    5      msec      |    10.55       msec   |    94.79     |
+      |    6      msec      |    11.55       msec   |    86.58     |
+      |    7      msec      |    12.55       msec   |    79.68     |
+      |    8      msec      |    13.55       msec   |    73.80     |
+      |    9      msec      |    14.55       msec   |    68.73     |
+      |    10     msec      |    15.55       msec   |    64.31     |
+      +---------------------+-----------------------+--------------+
+
+  Here is how you can read the table above:
+
+  - For a PoW function with a 1ms verification time, an attacker needs to send
+    152 high-effort introduction cells per second to succeed in a
+    [ATTACK_BOTTOM_HALF] attack.
+
+  - For a PoW function with a 10ms verification time, an attacker needs to send
+    64 high-effort introduction cells per second to succeed in a
+    [ATTACK_BOTTOM_HALF] attack.
+
+  We can use this table to specify a starting difficulty that won't allow our
+  target adversary to succeed in an [ATTACK_BOTTOM_HALF] attack.
+
+  Of course, when it comes to this table, the same disclaimer as in section
+  [POW_TUNING_VERIFICATION] is valid. That is, the above table is just a
+  theoretical extrapolation and we expect the real values to be much lower
+  since they depend on auxiliary processing overheads, and on the network's
+  capacity.
+
+
+7. Discussion
+
+7.1. UX
+
+  This proposal has user facing UX consequences.
+
+  When the client first attempts a pow, it can note how long iterations of the
+  hash function take, and then use this to determine an estimation of the
+  duration of the PoW. This estimation could be communicated via the control
+  port or other mechanism, such that the browser could display how long the
+  PoW is expected to take on their device. If the device is a mobile platform,
+  and this time estimation is large, it could recommend that the user try from
+  a desktop machine.
+
+7.2. Future work [FUTURE_WORK]
+
+7.2.1. Incremental improvements to this proposal
+
+  There are various improvements that can be done in this proposal, and while
+  we are trying to keep this v1 version simple, we need to keep the design
+  extensible so that we build more features into it. In particular:
+
+  - End-to-end introduction ACKs
+
+    This proposal suffers from various UX issues because there is no end-to-end
+    mechanism for an onion service to inform the client about its introduction
+    request. If we had end-to-end introduction ACKs many of the problems from
+    [CLIENT_BEHAVIOR] would be alleviated. The problem here is that end-to-end
+    ACKs require modifications on the introduction point code and a network
+    update which is a lengthy process.
+
+  - Multithreading scheduler
+
+    Our scheduler is pretty limited by the fact that Tor has a single-threaded
+    design. If we improve our multithreading support we could handle a much
+    greater amount of introduction requests per second.
+
+7.2.2. Future designs [FUTURE_DESIGNS]
+
+  This is just the beginning in DoS defences for Tor and there are various
+  future designs and schemes that we can investigate. Here is a brief summary
+  of these:
+
+  "More advanced PoW schemes" -- We could use more advanced memory-hard PoW
+         schemes like MTP-argon2 or Itsuku to make it even harder for
+         adversaries to create successful PoWs. Unfortunately these schemes
+         have much bigger proof sizes, and they won't fit in INTRODUCE1 cells.
+         See #31223 for more details.
+
+  "Third-party anonymous credentials" -- We can use anonymous credentials and a
+         third-party token issuance server on the clearnet to issue tokens
+         based on PoW or CAPTCHA and then use those tokens to get access to the
+         service. See [REF_CREDS] for more details.
+
+  "PoW + Anonymous Credentials" -- We can make a hybrid of the above ideas
+         where we present a hard puzzle to the user when connecting to the
+         onion service, and if they solve it we then give the user a bunch of
+         anonymous tokens that can be used in the future. This can all happen
+         between the client and the service without a need for a third party.
+
+  All of the above approaches are much more complicated than this proposal, and
+  hence we want to start easy before we get into more serious projects.
+
+7.3. Environment
+
+  We love the environment! We are concerned of how PoW schemes can waste energy
+  by doing useless hash iterations. Here is a few reasons we still decided to
+  pursue a PoW approach here:
+
+  "We are not making things worse" -- DoS attacks are already happening and
+      attackers are already burning energy to carry them out both on the
+      attacker side, on the service side and on the network side. We think that
+      asking legitimate clients to carry out PoW computations is not gonna
+      affect the equation too much, since an attacker right now can very
+      quickly cause the same damage that hundreds of legitimate clients do a
+      whole day.
+
+  "We hope to make things better" -- The hope is that proposals like this will
+      make the DoS actors go away and hence the PoW system will not be used. As
+      long as DoS is happening there will be a waste of energy, but if we
+      manage to demotivate them with technical means, the network as a whole
+      will less wasteful. Also see [CATCH22] for a similar argument.
+
+8. Acknowledgements
+
+  Thanks a lot to tevador for the various improvements to the proposal and for
+  helping us understand and tweak the RandomX scheme.
+
+  Thanks to Solar Designer for the help in understanding the current PoW
+  landscape, the various approaches we could take, and teaching us a few neat
+  tricks.
+
+Appendix A.  Little-t tor introduction scheduler
+
+  This section describes how we will implement this proposal in the "tor"
+  software (little-t tor).
+
+  The following should be read as if tor is an onion service and thus the end
+  point of all inbound data.
+
+A.1. The Main Loop [MAIN_LOOP]
+
+  Tor uses libevent for its mainloop. For network I/O operations, a mainloop
+  event is used to inform tor if it can read on a certain socket, or a
+  connection object in tor.
+
+  From there, this event will empty the connection input buffer (inbuf) by
+  extracting and processing a cell at a time. The mainloop is single threaded
+  and thus each cell is handled sequentially.
+
+  Processing an INTRODUCE2 cell at the onion service means a series of
+  operations (in order):
+
+    1) Unpack cell from inbuf to local buffer.
+
+    2) Decrypt cell (AES operations).
+
+    3) Parse cell header and process it depending on its RELAY_COMMAND.
+
+    4) INTRODUCE2 cell handling which means building a rendezvous circuit:
+        i)  Path selection
+        ii) Launch circuit to first hop.
+
+    5) Return to mainloop event which essentially means back to step (1).
+
+  Tor will read at most 32 cells out of the inbuf per mainloop round.
+
+A.2. Requirements for PoW
+
+  With this proposal, in order to prioritize cells by the amount of PoW work
+  it has done, cells can _not_ be processed sequentially as described above.
+
+  Thus, we need a way to queue a certain number of cells, prioritize them and
+  then process some cell(s) from the top of the queue (that is, the cells that
+  have done the most PoW effort).
+
+  We thus require a new cell processing flow that is _not_ compatible with
+  current tor design. The elements are:
+
+    - Validate PoW and place cells in a priority queue of INTRODUCE2 cells (as
+      described in section [INTRO_QUEUE]).
+
+    - Defer "bottom half" INTRO2 cell processing for after cells have been
+      queued into the priority queue.
+
+A.3. Proposed scheduler [TOR_SCHEDULER]
+
+  The intuitive way to address the A.2 requirements would be to do this
+  simple and naive approach:
+
+    1) Mainloop: Empty inbuf INTRODUCE2 cells into priority queue
+
+    2) Process all cells in pqueue
+
+    3) Goto (1)
+
+  However, we are worried that handling all those cells before returning to the
+  mainloop opens possibilities of attack by an adversary since the priority
+  queue is not gonna be kept up to date while we process all those cells. This
+  means that we might spend lots of time dealing with introductions that don't
+  deserve it. See [BOTTOM_HALF_SCHEDULER] for more details.
+
+  We thus propose to split the INTRODUCE2 handling into two different steps:
+  "top half" and "bottom half" process, as also mentioned in [POW_VERIFY]
+  section above.
+
+A.3.1. Top half and bottom half scheduler
+
+  The top half process is responsible for queuing introductions into the
+  priority queue as follows:
+
+    a) Unpack cell from inbuf to local buffer.
+
+    b) Decrypt cell (AES operations).
+
+    c) Parse INTRODUCE2 cell header and validate PoW.
+
+    d) Return to mainloop event which essentially means step (1).
+
+  The top-half basically does all operations of section [MAIN_LOOP] except from (4).
+
+  An then, the bottom-half process is responsible for handling introductions
+  and doing rendezvous. To achieve this we introduce a new mainloop event to
+  process the priority queue _after_ the top-half event has completed. This new
+  event would do these operations sequentially:
+
+    a) Pop INTRODUCE2 cell from priority queue.
+
+    b) Parse and process INTRODUCE2 cell.
+
+    c) End event and yield back to mainloop.
+
+A.3.2. Scheduling the bottom half process [BOTTOM_HALF_SCHEDULER]
+
+  The question now becomes: when should the "bottom half" event get triggered
+  from the mainloop?
+
+  We propose that this event is scheduled in when the network I/O event
+  queues at least 1 cell into the priority queue. Then, as long as it has a
+  cell in the queue, it would re-schedule itself for immediate execution
+  meaning at the next mainloop round, it would execute again.
+
+  The idea is to try to empty the queue as fast as it can in order to provide a
+  fast response time to an introduction request but always leave a chance for
+  more cells to appear between cell processing by yielding back to the
+  mainloop. With this we are aiming to always have the most up-to-date version
+  of the priority queue when we are completing introductions: this way we are
+  prioritizing clients that spent a lot of time and effort completing their PoW.
+
+  If the size of the queue drops to 0, it stops scheduling itself in order to
+  not create a busy loop. The network I/O event will re-schedule it in time.
+
+  Notice that the proposed solution will make the service handle 1 single
+  introduction request at every main loop event. However, when we do
+  performance measurements we might learn that it's preferable to bump the
+  number of cells in the future from 1 to N where N <= 32.
+
+A.4 Performance measurements
+
+  This section will detail the performance measurements we've done on tor.git
+  for handling an INTRODUCE2 cell and then a discussion on how much more CPU
+  time we can add (for PoW validation) before it badly degrades our
+  performance.
+
+A.4.1 Tor measurements [TOR_MEASUREMENTS]
+
+  In this section we will derive measurement numbers for the "top half" and
+  "bottom half" parts of handling an introduction cell.
+
+  These measurements have been done on tor.git at commit
+  80031db32abebaf4d0a91c01db258fcdbd54a471.
+
+  We've measured several set of actions of the INTRODUCE2 cell handling process
+  on Intel(R) Xeon(R) CPU E5-2650 v4. Our service was accessed by an array of
+  clients that sent introduction requests for a period of 60 seconds.
+
+  1. Full Mainloop Event
+
+     We start by measuring the full time it takes for a mainloop event to
+     process an inbuf containing INTRODUCE2 cells. The mainloop event processed
+     2.42 cells per invocation on average during our measurements.
+
+     Total measurements: 3279
+
+       Min: 0.30 msec - 1st Q.: 5.47 msec - Median: 5.91 msec
+       Mean: 13.43 msec - 3rd Q.: 16.20 msec - Max: 257.95 msec
+
+  2. INTRODUCE2 cell processing (bottom-half)
+
+     We also measured how much time the "bottom half" part of the process
+     takes. That's the heavy part of processing an introduction request as seen
+     in step (4) of the [MAIN_LOOP] section:
+
+     Total measurements: 7931
+
+       Min: 0.28 msec - 1st Q.: 5.06 msec - Median: 5.33 msec
+       Mean: 5.29 msec - 3rd Q.: 5.57 msec - Max: 14.64 msec
+
+  3. Connection data read (top half)
+
+     Now that we have the above pieces, we can use them to measure just the
+     "top half" part of the procedure. That's when bytes are taken from the
+     connection inbound buffer and parsed into an INTRODUCE2 cell where basic
+     validation is done.
+
+     There is an average of 2.42 INTRODUCE2 cells per mainloop event and so we
+     divide that by the full mainloop event mean time to get the time for one
+     cell. From that we subtract the "bottom half" mean time to get how much
+     the "top half" takes:
+
+        => 13.43 / (7931 / 3279) = 5.55
+        => 5.55 - 5.29 = 0.26
+
+        Mean: 0.26 msec
+
+  To summarize, during our measurements the average number of INTRODUCE2 cells
+  a mainloop event processed is ~2.42 cells (7931 cells for 3279 mainloop
+  invocations).
+
+  This means that, taking the mean of mainloop event times, it takes ~5.55msec
+  (13.43/2.42) to completely process an INTRODUCE2 cell. Then if we look deeper
+  we see that the "top half" of INTRODUCE2 cell processing takes 0.26 msec in
+  average, whereas the "bottom half" takes around 5.33 msec.
+
+  The heavyness of the "bottom half" is to be expected since that's where 95%
+  of the total work takes place: in particular the rendezvous path selection
+  and circuit launch.
+
+A.2. References
+
+    [REF_EQUIX]: https://github.com/tevador/equix
+                 https://github.com/tevador/equix/blob/master/devlog.md
+    [REF_TABLE]: The table is based on the script below plus some manual editing for readability:
+                 https://gist.github.com/asn-d6/99a936b0467b0cef88a677baaf0bbd04
+    [REF_BOTNET]: https://media.kasperskycontenthub.com/wp-content/uploads/sites/43/2009/07/01121538/ynam_botnets_0907_en.pdf
+    [REF_CREDS]: https://lists.torproject.org/pipermail/tor-dev/2020-March/014198.html
+    [REF_TARGET]: https://en.bitcoin.it/wiki/Target
+    [REF_TLS]: https://www.ietf.org/archive/id/draft-nygren-tls-client-puzzles-02.txt
+               https://tools.ietf.org/id/draft-nir-tls-puzzles-00.html
+               https://tools.ietf.org/html/draft-ietf-ipsecme-ddos-protection-10
+    [REF_TLS_1]: https://www.ietf.org/archive/id/draft-nygren-tls-client-puzzles-02.txt
+    [REF_TEVADOR_1]: https://lists.torproject.org/pipermail/tor-dev/2020-May/014268.html
+    [REF_TEVADOR_2]: https://lists.torproject.org/pipermail/tor-dev/2020-June/014358.html
+    [REF_TEVADOR_SIM]: https://github.com/mikeperry-tor/scratchpad/blob/master/tor-pow/effort_sim.py#L57
+```
author	Micah Elizabeth Scott <beth@torproject.org>	2023-11-07 18:51:36 -0800
committer	Micah Elizabeth Scott <beth@torproject.org>	2023-11-09 14:16:27 -0800
commit	93fed8bb455d1d09a4cf0fb6c79e3ed67c302f6e (patch)
tree	282a52387d99fc6049b6d0592373409719878e02 /spec/hspow-spec
parent	49ecdc696262876c3113c48d86c659ce64697b0e (diff)
download	torspec-93fed8bb455d1d09a4cf0fb6c79e3ed67c302f6e.tar.gz torspec-93fed8bb455d1d09a4cf0fb6c79e3ed67c302f6e.zip