Split the hspow-spec into sections without making edits

This establishes a new outline for the hspow-spec. Otherwise, it's just cut and paste without any content edits. This also does not yet fully disentangling the common and v1-specific pieces of the document.
author: Micah Elizabeth Scott <beth@torproject.org> 2023-11-08 11:14:31 -0800
committer: Micah Elizabeth Scott <beth@torproject.org> 2023-11-09 14:16:27 -0800
commit: 8489561ae79210fd7561c5a799bd28c6654ffbab (patch)
tree: 9991e1f73938d7f6f975be4ce819702585f07302 /spec/hspow-spec
parent: dbf045b6d38fa99f19041c926a988a7bc08f59e6 (diff)
download: torspec-8489561ae79210fd7561c5a799bd28c6654ffbab.tar.gz
torspec-8489561ae79210fd7561c5a799bd28c6654ffbab.zip
6 files changed, 1225 insertions, 1212 deletions
diff --git a/spec/hspow-spec/analysis-discussion.md b/spec/hspow-spec/analysis-discussion.md
new file mode 100644
index 0000000..2c9d43f
--- /dev/null
+++ b/spec/hspow-spec/analysis-discussion.md
@@ -0,0 +1,591 @@
+```text
+
+5. Attacker strategies [ATTACK_META]
+
+  Now that we defined our protocol we need to start tweaking the various
+  knobs. But before we can do that, we first need to understand a few
+  high-level attacker strategies to see what we are fighting against.
+
+5.1.1. Overwhelm PoW verification (aka "Overwhelm top half") [ATTACK_TOP_HALF]
+
+  A basic attack here is the adversary spamming with bogus INTRO cells so that
+  the service does not have computing capacity to even verify the
+  proof-of-work. This adversary tries to overwhelm the procedure in the
+  [POW_VERIFY] section.
+
+  That's why we need the PoW algorithm to have a cheap verification time so
+  that this attack is not possible: we tune this PoW parameter in section
+  [POW_TUNING_VERIFICATION].
+
+5.1.2. Overwhelm rendezvous capacity (aka "Overwhelm bottom half")
+       [ATTACK_BOTTOM_HALF]
+
+  Given the way the introduction queue works (see [HANDLE_QUEUE]), a very
+  effective strategy for the attacker is to totally overwhelm the queue
+  processing by sending more high-effort introductions than the onion service
+  can handle at any given tick. This adversary tries to overwhelm the procedure
+  in the [HANDLE_QUEUE] section.
+
+  To do so, the attacker would have to send at least 20 high-effort
+  introduction cells every 100ms, where high-effort is a PoW which is above the
+  estimated level of "the motivated user" (see [USER_MODEL]).
+
+  An easier attack for the adversary, is the same strategy but with
+  introduction cells that are all above the comfortable level of "the standard
+  user" (see [USER_MODEL]). This would block out all standard users and only
+  allow motivated users to pass.
+
+5.1.3. Hybrid overwhelm strategy [ATTACK_HYBRID]
+
+  If both the top- and bottom- halves are processed by the same thread, this
+  opens up the possibility for a "hybrid" attack. Given the performance figures
+  for the bottom half (0.31 ms/req.) and the top half (5.5 ms/req.), the
+  attacker can optimally deny service by submitting 91 high-effort requests and
+  1520 invalid requests per second. This will completely saturate the main loop
+  because:
+
+  0.31*(1520+91) ~ 0.5 sec.
+  5.5*91         ~ 0.5 sec.
+
+  This attack only has half the bandwidth requirement of [ATTACK_TOP_HALF] and
+  half the compute requirement of [ATTACK_BOTTOM_HALF].
+
+  Alternatively, the attacker can adjust the ratio between invalid and
+  high-effort requests depending on their bandwidth and compute capabilities.
+
+5.1.4. Gaming the effort estimation logic [ATTACK_EFFORT]
+
+  Another way to beat this system is for the attacker to game the effort
+  estimation logic (see [EFFORT_ESTIMATION]). Essentially, there are two attacks
+  that we are trying to avoid:
+
+  - Attacker sets descriptor suggested-effort to a very high value effectively
+    making it impossible for most clients to produce a PoW token in a
+    reasonable timeframe.
+  - Attacker sets descriptor suggested-effort to a very small value so that
+    most clients aim for a small value while the attacker comfortably launches
+    an [ATTACK_BOTTOM_HALF] using medium effort PoW (see [REF_TEVADOR_1])
+
+5.1.4. Precomputed PoW attack
+
+  The attacker may precompute many valid PoW nonces and submit them all at once
+  before the current seed expires, overwhelming the service temporarily even
+  using a single computer. The current scheme gives the attackers 4 hours to
+  launch this attack since each seed lasts 2 hours and the service caches two
+  seeds.
+
+  An attacker with this attack might be aiming to DoS the service for a limited
+  amount of time, or to cause an [ATTACK_EFFORT] attack.
+
+6. Parameter tuning [POW_TUNING]
+
+  There are various parameters in this PoW system that need to be tuned:
+
+  We first start by tuning the time it takes to verify a PoW token. We do this
+  first because it's fundamental to the performance of onion services and can
+  turn into a DoS vector of its own. We will do this tuning in a way that's
+  agnostic to the chosen PoW function.
+
+  We will then move towards analyzing the client starting difficulty setting
+  for our PoW system. That defines the expected time for clients to succeed in
+  our system, and the expected time for attackers to overwhelm our system. Same
+  as above we will do this in a way that's agnostic to the chosen PoW function.
+
+  Currently, we have hardcoded the initial client starting difficulty at 8,
+  but this may be too low to ramp up quickly to various on and off attack
+  patterns. A higher initial difficulty may be needed for these, depending on
+  their severity. This section gives us an idea of how large such attacks can
+  be.
+
+  Finally, using those two pieces we will tune our PoW function and pick the
+  right client starting difficulty setting. At the end of this section we will
+  know the resources that an attacker needs to overwhelm the onion service, the
+  resources that the service needs to verify introduction requests, and the
+  resources that legitimate clients need to get to the onion service.
+
+6.1. PoW verification [POW_TUNING_VERIFICATION]
+
+  Verifying a PoW token is the first thing that a service does when it receives
+  an INTRODUCE2 cell and it's detailed in section [POW_VERIFY]. This
+  verification happens during the "top half" part of the process. Every
+  millisecond spent verifying PoW adds overhead to the already existing "top
+  half" part of handling an introduction cell. Hence we should be careful to
+  add minimal overhead here so that we don't enable attacks like [ATTACK_TOP_HALF].
+
+  During our performance measurements in [TOR_MEASUREMENTS] we learned that the
+  "top half" takes about 0.26 msecs in average, without doing any sort of PoW
+  verification. Using that value we compute the following table, that describes
+  the number of cells we can queue per second (aka times we can perform the
+  "top half" process) for different values of PoW verification time:
+
+      +---------------------+-----------------------+--------------+
+      |PoW Verification Time| Total "top half" time | Cells Queued |
+      |                     |                       |  per second  |
+      |---------------------|-----------------------|--------------|
+      |    0     msec       |    0.26      msec     |    3846      |
+      |    1     msec       |    1.26      msec     |    793       |
+      |    2     msec       |    2.26      msec     |    442       |
+      |    3     msec       |    3.26      msec     |    306       |
+      |    4     msec       |    4.26      msec     |    234       |
+      |    5     msec       |    5.26      msec     |    190       |
+      |    6     msec       |    6.26      msec     |    159       |
+      |    7     msec       |    7.26      msec     |    137       |
+      |    8     msec       |    8.26      msec     |    121       |
+      |    9     msec       |    9.26      msec     |    107       |
+      |    10    msec       |    10.26     msec     |    97        |
+      +---------------------+-----------------------+--------------+
+
+  Here is how you can read the table above:
+
+  - For a PoW function with a 1ms verification time, an attacker needs to send
+    793 dummy introduction cells per second to succeed in a [ATTACK_TOP_HALF] attack.
+
+  - For a PoW function with a 2ms verification time, an attacker needs to send
+    442 dummy introduction cells per second to succeed in a [ATTACK_TOP_HALF] attack.
+
+  - For a PoW function with a 10ms verification time, an attacker needs to send
+    97 dummy introduction cells per second to succeed in a [ATTACK_TOP_HALF] attack.
+
+  Whether an attacker can succeed at that depends on the attacker's resources,
+  but also on the network's capacity.
+
+  Our purpose here is to have the smallest PoW verification overhead possible
+  that also allows us to achieve all our other goals.
+
+  [Note that the table above is simply the result of a naive multiplication and
+  does not take into account all the auxiliary overheads that happen every
+  second like the time to invoke the mainloop, the bottom-half processes, or
+  pretty much anything other than the "top-half" processing.
+
+  During our measurements the time to handle INTRODUCE2 cells dominates any
+  other action time: There might be events that require a long processing time,
+  but these are pretty infrequent (like uploading a new HS descriptor) and
+  hence over a long time they smooth out. Hence extrapolating the total cells
+  queued per second based on a single "top half" time seems like good enough to
+  get some initial intuition. That said, the values of "Cells queued per
+  second" from the table above, are likely much smaller than displayed above
+  because of all the auxiliary overheads.]
+
+6.2. PoW difficulty analysis [POW_DIFFICULTY_ANALYSIS]
+
+  The difficulty setting of our PoW basically dictates how difficult it should
+  be to get a success in our PoW system. An attacker who can get many successes
+  per second can pull a successful [ATTACK_BOTTOM_HALF] attack against our
+  system.
+
+  In classic PoW systems, "success" is defined as getting a hash output below
+  the "target". However, since our system is dynamic, we define "success" as an
+  abstract high-effort computation.
+
+  Our system is dynamic but we still need a starting difficulty setting that
+  will be used for bootstrapping the system. The client and attacker can still
+  aim higher or lower but for UX purposes and for analysis purposes we do need
+  to define a starting difficulty, to minimize retries by clients.
+
+6.2.1. Analysis based on adversary power
+
+  In this section we will try to do an analysis of PoW difficulty without using
+  any sort of Tor-related or PoW-related benchmark numbers.
+
+  We created the table (see [REF_TABLE]) below which shows how much time a
+  legitimate client with a single machine should expect to burn before they get
+  a single success. The x-axis is how many successes we want the attacker to be
+  able to do per second: the more successes we allow the adversary, the more
+  they can overwhelm our introduction queue. The y-axis is how many machines
+  the adversary has in her disposal, ranging from just 5 to 1000.
+
+       ===============================================================
+       |    Expected Time (in seconds) Per Success For One Machine   |
+ ===========================================================================
+ |                                                                          |
+ |   Attacker Succeses        1       5       10      20      30      50    |
+ |       per second                                                         |
+ |                                                                          |
+ |            5               5       1       0       0       0       0     |
+ |            50              50      10      5       2       1       1     |
+ |            100             100     20      10      5       3       2     |
+ | Attacker   200             200     40      20      10      6       4     |
+ |  Boxes     300             300     60      30      15      10      6     |
+ |            400             400     80      40      20      13      8     |
+ |            500             500     100     50      25      16      10    |
+ |            1000            1000    200     100     50      33      20    |
+ |                                                                          |
+ ============================================================================
+
+  Here is how you can read the table above:
+
+  - If an adversary has a botnet with 1000 boxes, and we want to limit her to 1
+    success per second, then a legitimate client with a single box should be
+    expected to spend 1000 seconds getting a single success.
+
+  - If an adversary has a botnet with 1000 boxes, and we want to limit her to 5
+    successes per second, then a legitimate client with a single box should be
+    expected to spend 200 seconds getting a single success.
+
+  - If an adversary has a botnet with 500 boxes, and we want to limit her to 5
+    successes per second, then a legitimate client with a single box should be
+    expected to spend 100 seconds getting a single success.
+
+  - If an adversary has access to 50 boxes, and we want to limit her to 5
+    successes per second, then a legitimate client with a single box should be
+    expected to spend 10 seconds getting a single success.
+
+  - If an adversary has access to 5 boxes, and we want to limit her to 5
+    successes per second, then a legitimate client with a single box should be
+    expected to spend 1 seconds getting a single success.
+
+  With the above table we can create some profiles for starting values of our
+  PoW difficulty.
+
+6.2.2. Analysis based on Tor's performance [POW_DIFFICULTY_TOR]
+
+  To go deeper here, we can use the performance measurements from
+  [TOR_MEASUREMENTS] to get a more specific intuition on the starting
+  difficulty. In particular, we learned that completely handling an
+  introduction cell takes 5.55 msecs in average. Using that value, we can
+  compute the following table, that describes the number of introduction cells
+  we can handle per second for different values of PoW verification:
+
+      +---------------------+-----------------------+--------------+
+      |PoW Verification Time| Total time to handle  | Cells handled|
+      |                     |   introduction cell   |  per second  |
+      |---------------------|-----------------------|--------------|
+      |    0      msec      |    5.55        msec   |    180.18    |
+      |    1      msec      |    6.55        msec   |    152.67    |
+      |    2      msec      |    7.55        msec   |    132.45    |
+      |    3      msec      |    8.55        msec   |    116.96    |
+      |    4      msec      |    9.55        mesc   |    104.71    |
+      |    5      msec      |    10.55       msec   |    94.79     |
+      |    6      msec      |    11.55       msec   |    86.58     |
+      |    7      msec      |    12.55       msec   |    79.68     |
+      |    8      msec      |    13.55       msec   |    73.80     |
+      |    9      msec      |    14.55       msec   |    68.73     |
+      |    10     msec      |    15.55       msec   |    64.31     |
+      +---------------------+-----------------------+--------------+
+
+  Here is how you can read the table above:
+
+  - For a PoW function with a 1ms verification time, an attacker needs to send
+    152 high-effort introduction cells per second to succeed in a
+    [ATTACK_BOTTOM_HALF] attack.
+
+  - For a PoW function with a 10ms verification time, an attacker needs to send
+    64 high-effort introduction cells per second to succeed in a
+    [ATTACK_BOTTOM_HALF] attack.
+
+  We can use this table to specify a starting difficulty that won't allow our
+  target adversary to succeed in an [ATTACK_BOTTOM_HALF] attack.
+
+  Of course, when it comes to this table, the same disclaimer as in section
+  [POW_TUNING_VERIFICATION] is valid. That is, the above table is just a
+  theoretical extrapolation and we expect the real values to be much lower
+  since they depend on auxiliary processing overheads, and on the network's
+  capacity.
+
+
+7. Discussion
+
+7.1. UX
+
+  This proposal has user facing UX consequences.
+
+  When the client first attempts a pow, it can note how long iterations of the
+  hash function take, and then use this to determine an estimation of the
+  duration of the PoW. This estimation could be communicated via the control
+  port or other mechanism, such that the browser could display how long the
+  PoW is expected to take on their device. If the device is a mobile platform,
+  and this time estimation is large, it could recommend that the user try from
+  a desktop machine.
+
+7.2. Future work [FUTURE_WORK]
+
+7.2.1. Incremental improvements to this proposal
+
+  There are various improvements that can be done in this proposal, and while
+  we are trying to keep this v1 version simple, we need to keep the design
+  extensible so that we build more features into it. In particular:
+
+  - End-to-end introduction ACKs
+
+    This proposal suffers from various UX issues because there is no end-to-end
+    mechanism for an onion service to inform the client about its introduction
+    request. If we had end-to-end introduction ACKs many of the problems from
+    [CLIENT_BEHAVIOR] would be alleviated. The problem here is that end-to-end
+    ACKs require modifications on the introduction point code and a network
+    update which is a lengthy process.
+
+  - Multithreading scheduler
+
+    Our scheduler is pretty limited by the fact that Tor has a single-threaded
+    design. If we improve our multithreading support we could handle a much
+    greater amount of introduction requests per second.
+
+7.2.2. Future designs [FUTURE_DESIGNS]
+
+  This is just the beginning in DoS defences for Tor and there are various
+  future designs and schemes that we can investigate. Here is a brief summary
+  of these:
+
+  "More advanced PoW schemes" -- We could use more advanced memory-hard PoW
+         schemes like MTP-argon2 or Itsuku to make it even harder for
+         adversaries to create successful PoWs. Unfortunately these schemes
+         have much bigger proof sizes, and they won't fit in INTRODUCE1 cells.
+         See #31223 for more details.
+
+  "Third-party anonymous credentials" -- We can use anonymous credentials and a
+         third-party token issuance server on the clearnet to issue tokens
+         based on PoW or CAPTCHA and then use those tokens to get access to the
+         service. See [REF_CREDS] for more details.
+
+  "PoW + Anonymous Credentials" -- We can make a hybrid of the above ideas
+         where we present a hard puzzle to the user when connecting to the
+         onion service, and if they solve it we then give the user a bunch of
+         anonymous tokens that can be used in the future. This can all happen
+         between the client and the service without a need for a third party.
+
+  All of the above approaches are much more complicated than this proposal, and
+  hence we want to start easy before we get into more serious projects.
+
+7.3. Environment
+
+  We love the environment! We are concerned of how PoW schemes can waste energy
+  by doing useless hash iterations. Here is a few reasons we still decided to
+  pursue a PoW approach here:
+
+  "We are not making things worse" -- DoS attacks are already happening and
+      attackers are already burning energy to carry them out both on the
+      attacker side, on the service side and on the network side. We think that
+      asking legitimate clients to carry out PoW computations is not gonna
+      affect the equation too much, since an attacker right now can very
+      quickly cause the same damage that hundreds of legitimate clients do a
+      whole day.
+
+  "We hope to make things better" -- The hope is that proposals like this will
+      make the DoS actors go away and hence the PoW system will not be used. As
+      long as DoS is happening there will be a waste of energy, but if we
+      manage to demotivate them with technical means, the network as a whole
+      will less wasteful. Also see [CATCH22] for a similar argument.
+
+8. Acknowledgements
+
+  Thanks a lot to tevador for the various improvements to the proposal and for
+  helping us understand and tweak the RandomX scheme.
+
+  Thanks to Solar Designer for the help in understanding the current PoW
+  landscape, the various approaches we could take, and teaching us a few neat
+  tricks.
+
+Appendix A.  Little-t tor introduction scheduler
+
+  This section describes how we will implement this proposal in the "tor"
+  software (little-t tor).
+
+  The following should be read as if tor is an onion service and thus the end
+  point of all inbound data.
+
+A.1. The Main Loop [MAIN_LOOP]
+
+  Tor uses libevent for its mainloop. For network I/O operations, a mainloop
+  event is used to inform tor if it can read on a certain socket, or a
+  connection object in tor.
+
+  From there, this event will empty the connection input buffer (inbuf) by
+  extracting and processing a cell at a time. The mainloop is single threaded
+  and thus each cell is handled sequentially.
+
+  Processing an INTRODUCE2 cell at the onion service means a series of
+  operations (in order):
+
+    1) Unpack cell from inbuf to local buffer.
+
+    2) Decrypt cell (AES operations).
+
+    3) Parse cell header and process it depending on its RELAY_COMMAND.
+
+    4) INTRODUCE2 cell handling which means building a rendezvous circuit:
+        i)  Path selection
+        ii) Launch circuit to first hop.
+
+    5) Return to mainloop event which essentially means back to step (1).
+
+  Tor will read at most 32 cells out of the inbuf per mainloop round.
+
+A.2. Requirements for PoW
+
+  With this proposal, in order to prioritize cells by the amount of PoW work
+  it has done, cells can _not_ be processed sequentially as described above.
+
+  Thus, we need a way to queue a certain number of cells, prioritize them and
+  then process some cell(s) from the top of the queue (that is, the cells that
+  have done the most PoW effort).
+
+  We thus require a new cell processing flow that is _not_ compatible with
+  current tor design. The elements are:
+
+    - Validate PoW and place cells in a priority queue of INTRODUCE2 cells (as
+      described in section [INTRO_QUEUE]).
+
+    - Defer "bottom half" INTRO2 cell processing for after cells have been
+      queued into the priority queue.
+
+A.3. Proposed scheduler [TOR_SCHEDULER]
+
+  The intuitive way to address the A.2 requirements would be to do this
+  simple and naive approach:
+
+    1) Mainloop: Empty inbuf INTRODUCE2 cells into priority queue
+
+    2) Process all cells in pqueue
+
+    3) Goto (1)
+
+  However, we are worried that handling all those cells before returning to the
+  mainloop opens possibilities of attack by an adversary since the priority
+  queue is not gonna be kept up to date while we process all those cells. This
+  means that we might spend lots of time dealing with introductions that don't
+  deserve it. See [BOTTOM_HALF_SCHEDULER] for more details.
+
+  We thus propose to split the INTRODUCE2 handling into two different steps:
+  "top half" and "bottom half" process, as also mentioned in [POW_VERIFY]
+  section above.
+
+A.3.1. Top half and bottom half scheduler
+
+  The top half process is responsible for queuing introductions into the
+  priority queue as follows:
+
+    a) Unpack cell from inbuf to local buffer.
+
+    b) Decrypt cell (AES operations).
+
+    c) Parse INTRODUCE2 cell header and validate PoW.
+
+    d) Return to mainloop event which essentially means step (1).
+
+  The top-half basically does all operations of section [MAIN_LOOP] except from (4).
+
+  An then, the bottom-half process is responsible for handling introductions
+  and doing rendezvous. To achieve this we introduce a new mainloop event to
+  process the priority queue _after_ the top-half event has completed. This new
+  event would do these operations sequentially:
+
+    a) Pop INTRODUCE2 cell from priority queue.
+
+    b) Parse and process INTRODUCE2 cell.
+
+    c) End event and yield back to mainloop.
+
+A.3.2. Scheduling the bottom half process [BOTTOM_HALF_SCHEDULER]
+
+  The question now becomes: when should the "bottom half" event get triggered
+  from the mainloop?
+
+  We propose that this event is scheduled in when the network I/O event
+  queues at least 1 cell into the priority queue. Then, as long as it has a
+  cell in the queue, it would re-schedule itself for immediate execution
+  meaning at the next mainloop round, it would execute again.
+
+  The idea is to try to empty the queue as fast as it can in order to provide a
+  fast response time to an introduction request but always leave a chance for
+  more cells to appear between cell processing by yielding back to the
+  mainloop. With this we are aiming to always have the most up-to-date version
+  of the priority queue when we are completing introductions: this way we are
+  prioritizing clients that spent a lot of time and effort completing their PoW.
+
+  If the size of the queue drops to 0, it stops scheduling itself in order to
+  not create a busy loop. The network I/O event will re-schedule it in time.
+
+  Notice that the proposed solution will make the service handle 1 single
+  introduction request at every main loop event. However, when we do
+  performance measurements we might learn that it's preferable to bump the
+  number of cells in the future from 1 to N where N <= 32.
+
+A.4 Performance measurements
+
+  This section will detail the performance measurements we've done on tor.git
+  for handling an INTRODUCE2 cell and then a discussion on how much more CPU
+  time we can add (for PoW validation) before it badly degrades our
+  performance.
+
+A.4.1 Tor measurements [TOR_MEASUREMENTS]
+
+  In this section we will derive measurement numbers for the "top half" and
+  "bottom half" parts of handling an introduction cell.
+
+  These measurements have been done on tor.git at commit
+  80031db32abebaf4d0a91c01db258fcdbd54a471.
+
+  We've measured several set of actions of the INTRODUCE2 cell handling process
+  on Intel(R) Xeon(R) CPU E5-2650 v4. Our service was accessed by an array of
+  clients that sent introduction requests for a period of 60 seconds.
+
+  1. Full Mainloop Event
+
+     We start by measuring the full time it takes for a mainloop event to
+     process an inbuf containing INTRODUCE2 cells. The mainloop event processed
+     2.42 cells per invocation on average during our measurements.
+
+     Total measurements: 3279
+
+       Min: 0.30 msec - 1st Q.: 5.47 msec - Median: 5.91 msec
+       Mean: 13.43 msec - 3rd Q.: 16.20 msec - Max: 257.95 msec
+
+  2. INTRODUCE2 cell processing (bottom-half)
+
+     We also measured how much time the "bottom half" part of the process
+     takes. That's the heavy part of processing an introduction request as seen
+     in step (4) of the [MAIN_LOOP] section:
+
+     Total measurements: 7931
+
+       Min: 0.28 msec - 1st Q.: 5.06 msec - Median: 5.33 msec
+       Mean: 5.29 msec - 3rd Q.: 5.57 msec - Max: 14.64 msec
+
+  3. Connection data read (top half)
+
+     Now that we have the above pieces, we can use them to measure just the
+     "top half" part of the procedure. That's when bytes are taken from the
+     connection inbound buffer and parsed into an INTRODUCE2 cell where basic
+     validation is done.
+
+     There is an average of 2.42 INTRODUCE2 cells per mainloop event and so we
+     divide that by the full mainloop event mean time to get the time for one
+     cell. From that we subtract the "bottom half" mean time to get how much
+     the "top half" takes:
+
+        => 13.43 / (7931 / 3279) = 5.55
+        => 5.55 - 5.29 = 0.26
+
+        Mean: 0.26 msec
+
+  To summarize, during our measurements the average number of INTRODUCE2 cells
+  a mainloop event processed is ~2.42 cells (7931 cells for 3279 mainloop
+  invocations).
+
+  This means that, taking the mean of mainloop event times, it takes ~5.55msec
+  (13.43/2.42) to completely process an INTRODUCE2 cell. Then if we look deeper
+  we see that the "top half" of INTRODUCE2 cell processing takes 0.26 msec in
+  average, whereas the "bottom half" takes around 5.33 msec.
+
+  The heavyness of the "bottom half" is to be expected since that's where 95%
+  of the total work takes place: in particular the rendezvous path selection
+  and circuit launch.
+
+A.2. References
+
+    [REF_EQUIX]: https://github.com/tevador/equix
+                 https://github.com/tevador/equix/blob/master/devlog.md
+    [REF_TABLE]: The table is based on the script below plus some manual editing for readability:
+                 https://gist.github.com/asn-d6/99a936b0467b0cef88a677baaf0bbd04
+    [REF_BOTNET]: https://media.kasperskycontenthub.com/wp-content/uploads/sites/43/2009/07/01121538/ynam_botnets_0907_en.pdf
+    [REF_CREDS]: https://lists.torproject.org/pipermail/tor-dev/2020-March/014198.html
+    [REF_TARGET]: https://en.bitcoin.it/wiki/Target
+    [REF_TLS]: https://www.ietf.org/archive/id/draft-nygren-tls-client-puzzles-02.txt
+               https://tools.ietf.org/id/draft-nir-tls-puzzles-00.html
+               https://tools.ietf.org/html/draft-ietf-ipsecme-ddos-protection-10
+    [REF_TLS_1]: https://www.ietf.org/archive/id/draft-nygren-tls-client-puzzles-02.txt
+    [REF_TEVADOR_1]: https://lists.torproject.org/pipermail/tor-dev/2020-May/014268.html
+    [REF_TEVADOR_2]: https://lists.torproject.org/pipermail/tor-dev/2020-June/014358.html
+    [REF_TEVADOR_SIM]: https://github.com/mikeperry-tor/scratchpad/blob/master/tor-pow/effort_sim.py#L57
+
+```
diff --git a/spec/hspow-spec/common-protocol.md b/spec/hspow-spec/common-protocol.md
new file mode 100644
index 0000000..0aa9df5
--- /dev/null
+++ b/spec/hspow-spec/common-protocol.md
@@ -0,0 +1,227 @@
+```text
+
+3. Protocol specification
+
+3.4.1.1. Replay protection [REPLAY_PROTECTION]
+
+  The service MUST NOT accept introduction requests with the same (seed, nonce)
+  tuple. For this reason a replay protection mechanism must be employed.
+
+  The simplest way is to use a simple hash table to check whether a (seed,
+  nonce) tuple has been used before for the active duration of a
+  seed. Depending on how long a seed stays active this might be a viable
+  solution with reasonable memory/time overhead.
+
+  If there is a worry that we might get too many introductions during the
+  lifetime of a seed, we can use a Bloom filter as our replay cache
+  mechanism. The probabilistic nature of Bloom filters means that sometimes we
+  will flag some connections as replays even if they are not; with this false
+  positive probability increasing as the number of entries increase. However,
+  with the right parameter tuning this probability should be negligible and
+  well handled by clients.
+
+  {TODO: Design and specify a suitable bloom filter for this purpose.}
+
+3.4.2. The Introduction Queue  [INTRO_QUEUE]
+
+3.4.2.1. Adding introductions to the introduction queue [ADD_QUEUE]
+
+  When PoW is enabled and a verified introduction comes through, the service
+  instead of jumping straight into rendezvous, queues it and prioritizes it
+  based on how much effort was devoted by the client to PoW. This means that
+  introduction requests with high effort should be prioritized over those with
+  low effort.
+
+  To do so, the service maintains an "introduction priority queue" data
+  structure. Each element in that priority queue is an introduction request,
+  and its priority is the effort put into its PoW:
+
+  When a verified introduction comes through, the service uses its included
+  effort commitment value to place each request into the right position of the
+  priority_queue: The bigger the effort, the more priority it gets in the
+  queue. If two elements have the same effort, the older one has priority over
+  the newer one.
+
+3.4.2.2. Handling introductions from the introduction queue [HANDLE_QUEUE]
+
+  The service should handle introductions by pulling from the introduction
+  queue. We call this part of introduction handling the "bottom half" because
+  most of the computation happens in this stage. For a description of how we
+  expect such a system to work in Tor, see [TOR_SCHEDULER] section.
+
+3.4.3. PoW effort estimation [EFFORT_ESTIMATION]
+
+3.4.3.1. High-level description of the effort estimation process
+
+  The service starts with a default suggested-effort value of 0, which keeps
+  the PoW defenses dormant until we notice signs of overload.
+
+  The overall process of determining effort can be thought of as a set of
+  multiple coupled feedback loops. Clients perform their own effort
+  adjustments via [CLIENT_TIMEOUT] atop a base effort suggested by the service.
+  That suggestion incorporates the service's control adjustments atop a base
+  effort calculated using a sum of currently-queued client effort.
+
+  Each feedback loop has an opportunity to cover different time scales. Clients
+  can make adjustments at every single circuit creation request, whereas
+  services are limited by the extra load that frequent updates would place on
+  HSDir nodes.
+
+  In the combined client/service system these client-side increases are
+  expected to provide the most effective quick response to an emerging DoS
+  attack. After early clients increase the effort using [CLIENT_TIMEOUT],
+  later clients will benefit from the service detecting this increased queued
+  effort and offering a larger suggested_effort.
+
+  Effort increases and decreases both have an intrinsic cost. Increasing effort
+  will make the service more expensive to contact, and decreasing effort makes
+  new requests likely to become backlogged behind older requests. The steady
+  state condition is preferable to either of these side-effects, but ultimately
+  it's expected that the control loop always oscillates to some degree.
+
+3.4.3.2. Service-side effort estimation
+
+  Services keep an internal effort estimation which updates on a regular
+  periodic timer in response to measurements made on the queueing behavior
+  in the previous period. These internal effort changes can optionally trigger
+  client-visible suggested_effort changes when the difference is great enough
+  to warrant republishing to the HSDir.
+
+  This evaluation and update period is referred to as HS_UPDATE_PERIOD.
+  The service side effort estimation takes inspiration from TCP congestion
+  control's additive increase / multiplicative decrease approach, but unlike
+  a typical AIMD this algorithm is fixed-rate and doesn't update immediately
+  in response to events.
+
+  {TODO: HS_UPDATE_PERIOD is hardcoded to 300 (5 minutes) currently, but it
+   should be configurable in some way. Is it more appropriate to use the
+   service's torrc here or a consensus parameter?}
+
+3.4.3.3. Per-period service state
+
+  During each update period, the service maintains some state:
+
+    1. TOTAL_EFFORT, a sum of all effort values for rendezvous requests that
+       were successfully validated and enqueued.
+
+    2. REND_HANDLED, a count of rendezvous requests that were actually
+       launched. Requests that made it to dequeueing but were too old to launch
+       by then are not included.
+
+    3. HAD_QUEUE, a flag which is set if at any time in the update period we
+       saw the priority queue filled with more than a minimum amount of work,
+       greater than we would expect to process in approximately 1/4 second
+       using the configured dequeue rate.
+
+    4. MAX_TRIMMED_EFFORT, the largest observed single request effort that we
+       discarded during the period. Requests are discarded either due to age
+       (timeout) or during culling events that discard the bottom half of the
+       entire queue when it's too full.
+
+3.4.3.4. Service AIMD conditions
+
+  At the end of each period, the service may decide to increase effort,
+  decrease effort, or make no changes, based on these accumulated state values:
+
+    1. If MAX_TRIMMED_EFFORT > our previous internal suggested_effort,
+       always INCREASE. Requests that follow our latest advice are being
+       dropped.
+
+    2. If the HAD_QUEUE flag was set and the queue still contains at least
+       one item with effort >= our previous internal suggested_effort,
+       INCREASE. Even if we haven't yet reached the point of dropping requests,
+       this signal indicates that the our latest suggestion isn't high enough
+       and requests will build up in the queue.
+
+    3. If neither condition (1) or (2) are taking place and the queue is below
+       a level we would expect to process in approximately 1/4 second, choose
+       to DECREASE.
+
+    4. If none of these conditions match, the suggested effort is unchanged.
+
+  When we INCREASE, the internal suggested_effort is increased to either its
+  previous value + 1, or (TOTAL_EFFORT / REND_HANDLED), whichever is larger.
+
+  When we DECREASE, the internal suggested_effort is scaled by 2/3rds.
+
+  Over time, this will continue to decrease our effort suggestion any time the
+  service is fully processing its request queue. If the queue stays empty, the
+  effort suggestion decreases to zero and clients should no longer submit a
+  proof-of-work solution with their first connection attempt.
+
+  It's worth noting that the suggested-effort is not a hard limit to the
+  efforts that are accepted by the service, and it's only meant to serve as a
+  guideline for clients to reduce the number of unsuccessful requests that get
+  to the service. The service still adds requests with lower effort than
+  suggested-effort to the priority queue in [ADD_QUEUE].
+
+3.4.3.5. Updating descriptor with new suggested effort
+
+  The service descriptors may be updated for multiple reasons including
+  introduction point rotation common to all v3 onion services, the scheduled
+  seed rotations described in [DESC_POW], and updates to the effort suggestion.
+  Even though the internal effort estimate updates on a regular timer, we avoid
+  propagating those changes into the descriptor and the HSDir hosts unless
+  there is a significant change.
+
+  If the PoW params otherwise match but the seed has changed by less than 15
+  percent, services SHOULD NOT upload a new descriptor.
+
+4. Client behavior [CLIENT_BEHAVIOR]
+
+  This proposal introduces a bunch of new ways where a legitimate client can
+  fail to reach the onion service.
+
+  Furthermore, there is currently no end-to-end way for the onion service to
+  inform the client that the introduction failed. The INTRO_ACK cell is not
+  end-to-end (it's from the introduction point to the client) and hence it does
+  not allow the service to inform the client that the rendezvous is never gonna
+  occur.
+
+  From the client's perspective there's no way to attribute this failure to
+  the service itself rather than the introduction point, so error accounting
+  is performed separately for each introduction-point. Existing mechanisms
+  will discard an introduction point that's required too many retries.
+
+4.1. Clients handling timeouts [CLIENT_TIMEOUT]
+
+  Alice can fail to reach the onion service if her introduction request gets
+  trimmed off the priority queue in [HANDLE_QUEUE], or if the service does not
+  get through its priority queue in time and the connection times out.
+
+  This section presents a heuristic method for the client getting service even
+  in such scenarios.
+
+  If the rendezvous request times out, the client SHOULD fetch a new descriptor
+  for the service to make sure that it's using the right suggested-effort for
+  the PoW and the right PoW seed. If the fetched descriptor includes a new
+  suggested effort or seed, it should first retry the request with these
+  parameters.
+
+  {TODO: This is not actually implemented yet, but we should do it. How often
+     should clients at most try to fetch new descriptors? Determined by a
+     consensus parameter? This change will also allow clients to retry
+     effectively in cases where the service has just been reconfigured to
+     enable PoW defenses.}
+
+  Every time the client retries the connection, it will count these failures
+  per-introduction-point. These counts of previous retries are combined with
+  the service's suggested_effort when calculating the actual effort to spend
+  on any individual request to a service that advertises PoW support, even
+  when the currently advertised suggested_effort is zero.
+
+  On each retry, the client modifies its solver effort:
+
+    1. If the effort is below (CLIENT_POW_EFFORT_DOUBLE_UNTIL = 1000)
+       it will be doubled.
+
+    2. Otherwise, multiply the effort by (CLIENT_POW_RETRY_MULTIPLIER = 1.5).
+
+    3. Constrain the new effort to be at least
+       (CLIENT_MIN_RETRY_POW_EFFORT = 8) and no greater than
+       (CLIENT_MAX_POW_EFFORT = 10000)
+
+  {TODO: These hardcoded limits should be replaced by timed limits and/or
+      an unlimited solver with robust cancellation. This is issue tor#40787}
+
+```
+\ No newline at end of file
diff --git a/spec/hspow-spec/index.md b/spec/hspow-spec/index.md
index aea9c46..14db248 100644
--- a/spec/hspow-spec/index.md
+++ b/spec/hspow-spec/index.md
@@ -3,1215 +3,3 @@
 The overall denial-of-service prevention strategies in Tor are described in the [Denial-of-service prevention mechanisms in Tor](../dos-spec/index.md) document. This document describes one specific mitigation, the proof-of-work client puzzle for onion service introduction.
 
 This was originally [proposal 327, A First Take at PoW Over Introduction Circuits](../proposals/327-pow-over-intro.txt) authored by George Kadianakis, Mike Perry, David Goulet, and tevador.
-
-```text
-0. Abstract
-
-  This proposal aims to thwart introduction flooding DoS attacks by introducing
-  a dynamic Proof-Of-Work protocol that occurs over introduction circuits.
-
-1. Motivation
-
-  So far our attempts at limiting the impact of introduction flooding DoS
-  attacks on onion services has been focused on horizontal scaling with
-  Onionbalance, optimizing the CPU usage of Tor and applying rate limiting.
-  While these measures move the goalpost forward, a core problem with onion
-  service DoS is that building rendezvous circuits is a costly procedure both
-  for the service and for the network. For more information on the limitations
-  of rate-limiting when defending against DDoS, see [REF_TLS_1].
-
-  If we ever hope to have truly reachable global onion services, we need to
-  make it harder for attackers to overload the service with introduction
-  requests. This proposal achieves this by allowing onion services to specify
-  an optional dynamic proof-of-work scheme that its clients need to participate
-  in if they want to get served.
-
-  With the right parameters, this proof-of-work scheme acts as a gatekeeper to
-  block amplification attacks by attackers while letting legitimate clients
-  through.
-
-1.1. Related work
-
-  For a similar concept, see the three internet drafts that have been proposed
-  for defending against TLS-based DDoS attacks using client puzzles [REF_TLS].
-
-1.2. Threat model [THREAT_MODEL]
-
-1.2.1. Attacker profiles [ATTACKER_MODEL]
-
-  This proposal is written to thwart specific attackers. A simple PoW proposal
-  cannot defend against all and every DoS attack on the Internet, but there are
-  adversary models we can defend against.
-
-  Let's start with some adversary profiles:
-
-  "The script-kiddie"
-
-    The script-kiddie has a single computer and pushes it to its
-    limits. Perhaps it also has a VPS and a pwned server. We are talking about
-    an attacker with total access to 10 GHz of CPU and 10 GB of RAM. We
-    consider the total cost for this attacker to be zero $.
-
-  "The small botnet"
-
-    The small botnet is a bunch of computers lined up to do an introduction
-    flooding attack. Assuming 500 medium-range computers, we are talking about
-    an attacker with total access to 10 THz of CPU and 10 TB of RAM. We
-    consider the upfront cost for this attacker to be about $400.
-
-  "The large botnet"
-
-    The large botnet is a serious operation with many thousands of computers
-    organized to do this attack. Assuming 100k medium-range computers, we are
-    talking about an attacker with total access to 200 THz of CPU and 200 TB of
-    RAM. The upfront cost for this attacker is about $36k.
-
-  We hope that this proposal can help us defend against the script-kiddie
-  attacker and small botnets. To defend against a large botnet we would need
-  more tools at our disposal (see [FUTURE_DESIGNS]).
-
-1.2.2. User profiles [USER_MODEL]
-
-  We have attackers and we have users. Here are a few user profiles:
-
-  "The standard web user"
-
-    This is a standard laptop/desktop user who is trying to browse the
-    web. They don't know how these defences work and they don't care to
-    configure or tweak them. If the site doesn't load, they are gonna close
-    their browser and be sad at Tor. They run a 2GHz computer with 4GB of RAM.
-
-  "The motivated user"
-
-    This is a user that really wants to reach their destination. They don't
-    care about the journey; they just want to get there. They know what's going
-    on; they are willing to make their computer do expensive multi-minute PoW
-    computations to get where they want to be.
-
-  "The mobile user"
-
-    This is a motivated user on a mobile phone. Even tho they want to read the
-    news article, they don't have much leeway on stressing their machine to do
-    more computation.
-
-  We hope that this proposal will allow the motivated user to always connect
-  where they want to connect to, and also give more chances to the other user
-  groups to reach the destination.
-
-1.2.3. The DoS Catch-22 [CATCH22]
-
-  This proposal is not perfect and it does not cover all the use cases. Still,
-  we think that by covering some use cases and giving reachability to the
-  people who really need it, we will severely demotivate the attackers from
-  continuing the DoS attacks and hence stop the DoS threat all together.
-  Furthermore, by increasing the cost to launch a DoS attack, a big
-  class of DoS attackers will disappear from the map, since the expected ROI
-  will decrease.
-
-2. System Overview
-
-2.1. Tor protocol overview
-
-                                          +----------------------------------+
-                                          |          Onion Service           |
-   +-------+ INTRO1  +-----------+ INTRO2 +--------+                         |
-   |Client |-------->|Intro Point|------->|  PoW   |-----------+             |
-   +-------+         +-----------+        |Verifier|           |             |
-                                          +--------+           |             |
-                                          |                    |             |
-                                          |                    |             |
-                                          |         +----------v---------+   |
-                                          |         |Intro Priority Queue|   |
-                                          +---------+--------------------+---+
-                                                           |  |  |
-                                                Rendezvous |  |  |
-                                                  circuits |  |  |
-                                                           v  v  v
-
-
-
-  The proof-of-work scheme specified in this proposal takes place during the
-  introduction phase of the onion service protocol.
-
-  The system described in this proposal is not meant to be on all the time, and
-  it can be entirely disabled for services that do not experience DoS attacks.
-
-  When the subsystem is enabled, suggested effort is continuously adjusted and
-  the computational puzzle can be bypassed entirely when the effort reaches
-  zero. In these cases, the proof-of-work subsystem can be dormant but still
-  provide the necessary parameters for clients to voluntarily provide effort
-  in order to get better placement in the priority queue.
-
-  The protocol involves the following major steps:
-
-  1) Service encodes PoW parameters in descriptor [DESC_POW]
-  2) Client fetches descriptor and computes PoW [CLIENT_POW]
-  3) Client completes PoW and sends results in INTRO1 cell [INTRO1_POW]
-  4) Service verifies PoW and queues introduction based on PoW effort
-     [SERVICE_VERIFY]
-  5) Requests are continuously drained from the queue, highest effort first,
-     subject to multiple constraints on speed [HANDLE_QUEUE]
-
-2.2. Proof-of-work overview
-
-2.2.1. Algorithm overview
-
-  For our proof-of-work function we will use the Equi-X scheme by tevador
-  [REF_EQUIX].  Equi-X is an asymmetric PoW function based on Equihash<60,3>,
-  using HashX as the underlying layer. It features lightning fast verification
-  speed, and also aims to minimize the asymmetry between CPU and GPU.
-  Furthermore, it's designed for this particular use-case and hence
-  cryptocurrency miners are not incentivized to make optimized ASICs for it.
-
-  The overall scheme consists of several layers that provide different pieces
-  of this functionality:
-
-  1) At the lowest layers, blake2b and siphash are used as hashing and PRNG
-     algorithms that are well suited to common 64-bit CPUs.
-  2) A custom hash function family, HashX, randomizes its implementation for
-     each new seed value. These functions are tuned to utilize the pipelined
-     integer performance on a modern 64-bit CPU. This layer provides the
-     strongest ASIC resistance, since a hardware reimplementation would need
-     to include a CPU-like pipelined execution unit to keep up.
-  3) The Equi-X layer itself builds on HashX and adds an algorithmic puzzle
-     that's designed to be strongly asymmetric and to require RAM to solve
-     efficiently.
-  4) The PoW protocol itself builds on this Equi-X function with a particular
-     construction of the challenge input and particular constraints on the
-     allowed blake2b hash of the solution. This layer provides a linearly
-     adjustable effort that we can verify.
-  5) Above the level of individual PoW handshakes, the client and service
-     form a closed-loop system that adjusts the effort of future handshakes.
-
-  The Equi-X scheme provides two functions that will be used in this proposal:
-      - equix_solve(challenge) which solves a puzzle instance, returning
-        a variable number of solutions per invocation depending on the specific
-        challenge value.
-      - equix_verify(challenge, solution) which verifies a puzzle solution
-        quickly. Verification still depends on executing the HashX function,
-        but far fewer times than when searching for a solution.
-
-  For the purposes of this proposal, all cryptographic algorithms are assumed
-  to produce and consume byte strings, even if internally they operate on
-  some other data type like 64-bit words. This is conventionally little endian
-  order for blake2b, which contrasts with Tor's typical use of big endian.
-  HashX itself is configured with an 8-byte output but its input is a single
-  64-bit word of undefined byte order, of which only the low 16 bits are used
-  by Equi-X in its solution output. We treat Equi-X solution arrays as byte
-  arrays using their packed little endian 16-bit representation.
-
-  We tune Equi-X in section [EQUIX_TUNING].
-
-2.2.2. Dynamic PoW
-
-  DoS is a dynamic problem where the attacker's capabilities constantly change,
-  and hence we want our proof-of-work system to be dynamic and not stuck with a
-  static difficulty setting. Hence, instead of forcing clients to go below a
-  static target like in Bitcoin to be successful, we ask clients to "bid" using
-  their PoW effort. Effectively, a client gets higher priority the higher
-  effort they put into their proof-of-work. This is similar to how
-  proof-of-stake works but instead of staking coins, you stake work.
-
-  The benefit here is that legitimate clients who really care about getting
-  access can spend a big amount of effort into their PoW computation, which
-  should guarantee access to the service given reasonable adversary models. See
-  [PARAM_TUNING] for more details about these guarantees and tradeoffs.
-
-  As a way to improve reachability and UX, the service tries to estimate the
-  effort needed for clients to get access at any given time and places it in
-  the descriptor. See [EFFORT_ESTIMATION] for more details.
-
-2.2.3. PoW effort
-
-  It's common for proof-of-work systems to define an exponential effort
-  function based on a particular number of leading zero bits or equivalent.
-  For the benefit of our effort estimation system, it's quite useful if we
-  instead have a linear scale. We use the first 32 bits of a hashed version
-  of the Equi-X solution as compared to the full 32-bit range.
-
-  Conceptually we could define a function:
-         unsigned effort(uint8_t *token)
-  which takes as its argument a hashed solution, interprets it as a
-  bitstring, and returns the quotient of dividing a bitstring of 1s by it.
-
-  So for example:
-         effort(00000001100010101101) = 11111111111111111111
-                                          / 00000001100010101101
-  or the same in decimal:
-         effort(6317) = 1048575 / 6317 = 165.
-
-  In practice we can avoid even having to perform this division, performing
-  just one multiply instead to see if a request's claimed effort is supported
-  by the smallness of the resulting 32-bit hash prefix. This assumes we send
-  the desired effort explicitly as part of each PoW solution. We do want to
-  force clients to pick a specific effort before looking for a solution,
-  otherwise a client could opportunistically claim a very large effort any
-  time a lucky hash prefix comes up. Thus the effort is communicated explicitly
-  in our protocol, and it forms part of the concatenated Equi-X challenge.
-
-3. Protocol specification
-
-3.1. Service encodes PoW parameters in descriptor [DESC_POW]
-
-  This whole protocol starts with the service encoding the PoW parameters in
-  the 'encrypted' (inner) part of the v3 descriptor. As follows:
-
-       "pow-params" SP type SP seed-b64 SP suggested-effort
-                    SP expiration-time NL
-
-        [At most once]
-
-        type: The type of PoW system used. We call the one specified here "v1"
-
-        seed-b64: A random seed that should be used as the input to the PoW
-                  hash function. Should be 32 random bytes encoded in base64
-                  without trailing padding.
-
-        suggested-effort: An unsigned integer specifying an effort value that
-                  clients should aim for when contacting the service. Can be
-                  zero to mean that PoW is available but not currently
-                  suggested for a first connection attempt. See
-                  [EFFORT_ESTIMATION] for more details here.
-
-        expiration-time: A timestamp in "YYYY-MM-DDTHH:MM:SS" format (iso time
-                         with no space) after which the above seed expires and
-                         is no longer valid as the input for PoW. It's needed
-                         so that our replay cache does not grow infinitely. It
-                         should be set to RAND_TIME(now+7200, 900) seconds.
-
-   The service should refresh its seed when expiration-time passes. The service
-   SHOULD keep its previous seed in memory and accept PoWs using it to avoid
-   race-conditions with clients that have an old seed. The service SHOULD avoid
-   generating two consequent seeds that have a common 4 bytes prefix. See
-   [INTRO1_POW] for more info.
-
-   By RAND_TIME(ts, interval) we mean a time between ts-interval and ts, chosen
-   uniformly at random.
-
-3.2. Client fetches descriptor and computes PoW [CLIENT_POW]
-
-  If a client receives a descriptor with "pow-params", it should assume that
-  the service is prepared to receive PoW solutions as part of the introduction
-  protocol.
-
-  The client parses the descriptor and extracts the PoW parameters. It makes
-  sure that the <expiration-time> has not expired and if it has, it needs to
-  fetch a new descriptor.
-
-  The client should then extract the <suggested-effort> field to configure its
-  PoW 'target' (see [REF_TARGET]). The client SHOULD NOT accept 'target' values
-  that will cause unacceptably long PoW computation.
-
-  The client uses a "personalization string" P equal to the following
-  nul-terminated ASCII string: "Tor hs intro v1\0".
-
-  The client looks up `ID`, the current 32-byte blinded public ID
-  (KP_hs_blind_id) for the onion service.
-
-  To complete the PoW the client follows the following logic:
-
-      a) Client selects a target effort E, based on <suggested-effort> and past
-         connection attempt history.
-      b) Client generates a secure random 16-byte nonce N, as the starting
-         point for the solution search.
-      c) Client derives seed C by decoding 'seed-b64'.
-      d) Client calculates S = equix_solve(P || ID || C || N || E)
-      e) Client calculates R = ntohl(blake2b_32(P || ID || C || N || E || S))
-      f) Client checks if R * E <= UINT32_MAX.
-        f1) If yes, success! The client can submit N, E, the first 4 bytes of
-        C, and S.
-        f2) If no, fail! The client interprets N as a 16-byte little-endian
-        integer, increments it by 1 and goes back to step d).
-
-  Note that the blake2b hash includes the output length parameter in its
-  initial state vector, so a blake2b_32 is not equivalent to the prefix of a
-  blake2b_512. We calculate the 32-bit blake2b specifically, and interpret it
-  in network byte order as an unsigned integer.
-
-  At the end of the above procedure, the client should have S as the solution
-  of the Equix-X puzzle with N as the nonce, C as the seed. How quickly this
-  happens depends solely on the target effort E parameter.
-
-  The algorithm as described is suitable for single-threaded computation.
-  Optionally, a client may choose multiple nonces and attempt several solutions
-  in parallel on separate CPU cores. The specific choice of nonce is entirely
-  up to the client, so parallelization choices like this do not impact the
-  network protocol's interoperability at all.
-
-3.3. Client sends PoW in INTRO1 cell [INTRO1_POW]
-
-  Now that the client has an answer to the puzzle it's time to encode it into
-  an INTRODUCE1 cell. To do so the client adds an extension to the encrypted
-  portion of the INTRODUCE1 cell by using the EXTENSIONS field (see
-  [PROCESS_INTRO2] section in rend-spec-v3.txt). The encrypted portion of the
-  INTRODUCE1 cell only gets read by the onion service and is ignored by the
-  introduction point.
-
-  We propose a new EXT_FIELD_TYPE value:
-
-     [02] -- PROOF_OF_WORK
-
-   The EXT_FIELD content format is:
-
-        POW_VERSION    [1 byte]
-        POW_NONCE      [16 bytes]
-        POW_EFFORT     [4 bytes]
-        POW_SEED       [4 bytes]
-        POW_SOLUTION   [16 bytes]
-
-   where:
-
-    POW_VERSION is 1 for the protocol specified in this proposal
-    POW_NONCE is the nonce 'N' from the section above
-    POW_EFFORT is the 32-bit integer effort value, in network byte order
-    POW_SEED is the first 4 bytes of the seed used
-
-   This will increase the INTRODUCE1 payload size by 43 bytes since the
-   extension type and length is 2 extra bytes, the N_EXTENSIONS field is always
-   present and currently set to 0 and the EXT_FIELD is 41 bytes. According to
-   ticket #33650, INTRODUCE1 cells currently have more than 200 bytes
-   available.
-
-3.4. Service verifies PoW and handles the introduction  [SERVICE_VERIFY]
-
-   When a service receives an INTRODUCE1 with the PROOF_OF_WORK extension, it
-   should check its configuration on whether proof-of-work is enabled on the
-   service. If it's not enabled, the extension SHOULD BE ignored. If enabled,
-   even if the suggested effort is currently zero, the service follows the
-   procedure detailed in this section.
-
-   If the service requires the PROOF_OF_WORK extension but received an
-   INTRODUCE1 cell without any embedded proof-of-work, the service SHOULD
-   consider this cell as a zero-effort introduction for the purposes of the
-   priority queue (see section [INTRO_QUEUE]).
-
-3.4.1. PoW verification [POW_VERIFY]
-
-   To verify the client's proof-of-work the service MUST do the following steps:
-
-      a) Find a valid seed C that starts with POW_SEED. Fail if no such seed
-         exists.
-      b) Fail if N = POW_NONCE is present in the replay cache
-              (see [REPLAY_PROTECTION])
-      c) Calculate R = ntohl(blake2b_32(P || ID || C || N || E || S))
-      d) Fail if R * E > UINT32_MAX
-      e) Fail if equix_verify(P || ID || C || N || E, S) != EQUIX_OK
-      f) Put the request in the queue with a priority of E
-
-   If any of these steps fail the service MUST ignore this introduction request
-   and abort the protocol.
-
-   In this proposal we call the above steps the "top half" of introduction
-   handling. If all the steps of the "top half" have passed, then the circuit
-   is added to the introduction queue as detailed in section [INTRO_QUEUE].
-
-3.4.1.1. Replay protection [REPLAY_PROTECTION]
-
-  The service MUST NOT accept introduction requests with the same (seed, nonce)
-  tuple. For this reason a replay protection mechanism must be employed.
-
-  The simplest way is to use a simple hash table to check whether a (seed,
-  nonce) tuple has been used before for the active duration of a
-  seed. Depending on how long a seed stays active this might be a viable
-  solution with reasonable memory/time overhead.
-
-  If there is a worry that we might get too many introductions during the
-  lifetime of a seed, we can use a Bloom filter as our replay cache
-  mechanism. The probabilistic nature of Bloom filters means that sometimes we
-  will flag some connections as replays even if they are not; with this false
-  positive probability increasing as the number of entries increase. However,
-  with the right parameter tuning this probability should be negligible and
-  well handled by clients.
-
-  {TODO: Design and specify a suitable bloom filter for this purpose.}
-
-3.4.2. The Introduction Queue  [INTRO_QUEUE]
-
-3.4.2.1. Adding introductions to the introduction queue [ADD_QUEUE]
-
-  When PoW is enabled and a verified introduction comes through, the service
-  instead of jumping straight into rendezvous, queues it and prioritizes it
-  based on how much effort was devoted by the client to PoW. This means that
-  introduction requests with high effort should be prioritized over those with
-  low effort.
-
-  To do so, the service maintains an "introduction priority queue" data
-  structure. Each element in that priority queue is an introduction request,
-  and its priority is the effort put into its PoW:
-
-  When a verified introduction comes through, the service uses its included
-  effort commitment value to place each request into the right position of the
-  priority_queue: The bigger the effort, the more priority it gets in the
-  queue. If two elements have the same effort, the older one has priority over
-  the newer one.
-
-3.4.2.2. Handling introductions from the introduction queue [HANDLE_QUEUE]
-
-  The service should handle introductions by pulling from the introduction
-  queue. We call this part of introduction handling the "bottom half" because
-  most of the computation happens in this stage. For a description of how we
-  expect such a system to work in Tor, see [TOR_SCHEDULER] section.
-
-3.4.3. PoW effort estimation [EFFORT_ESTIMATION]
-
-3.4.3.1. High-level description of the effort estimation process
-
-  The service starts with a default suggested-effort value of 0, which keeps
-  the PoW defenses dormant until we notice signs of overload.
-
-  The overall process of determining effort can be thought of as a set of
-  multiple coupled feedback loops. Clients perform their own effort
-  adjustments via [CLIENT_TIMEOUT] atop a base effort suggested by the service.
-  That suggestion incorporates the service's control adjustments atop a base
-  effort calculated using a sum of currently-queued client effort.
-
-  Each feedback loop has an opportunity to cover different time scales. Clients
-  can make adjustments at every single circuit creation request, whereas
-  services are limited by the extra load that frequent updates would place on
-  HSDir nodes.
-
-  In the combined client/service system these client-side increases are
-  expected to provide the most effective quick response to an emerging DoS
-  attack. After early clients increase the effort using [CLIENT_TIMEOUT],
-  later clients will benefit from the service detecting this increased queued
-  effort and offering a larger suggested_effort.
-
-  Effort increases and decreases both have an intrinsic cost. Increasing effort
-  will make the service more expensive to contact, and decreasing effort makes
-  new requests likely to become backlogged behind older requests. The steady
-  state condition is preferable to either of these side-effects, but ultimately
-  it's expected that the control loop always oscillates to some degree.
-
-3.4.3.2. Service-side effort estimation
-
-  Services keep an internal effort estimation which updates on a regular
-  periodic timer in response to measurements made on the queueing behavior
-  in the previous period. These internal effort changes can optionally trigger
-  client-visible suggested_effort changes when the difference is great enough
-  to warrant republishing to the HSDir.
-
-  This evaluation and update period is referred to as HS_UPDATE_PERIOD.
-  The service side effort estimation takes inspiration from TCP congestion
-  control's additive increase / multiplicative decrease approach, but unlike
-  a typical AIMD this algorithm is fixed-rate and doesn't update immediately
-  in response to events.
-
-  {TODO: HS_UPDATE_PERIOD is hardcoded to 300 (5 minutes) currently, but it
-   should be configurable in some way. Is it more appropriate to use the
-   service's torrc here or a consensus parameter?}
-
-3.4.3.3. Per-period service state
-
-  During each update period, the service maintains some state:
-
-    1. TOTAL_EFFORT, a sum of all effort values for rendezvous requests that
-       were successfully validated and enqueued.
-
-    2. REND_HANDLED, a count of rendezvous requests that were actually
-       launched. Requests that made it to dequeueing but were too old to launch
-       by then are not included.
-
-    3. HAD_QUEUE, a flag which is set if at any time in the update period we
-       saw the priority queue filled with more than a minimum amount of work,
-       greater than we would expect to process in approximately 1/4 second
-       using the configured dequeue rate.
-
-    4. MAX_TRIMMED_EFFORT, the largest observed single request effort that we
-       discarded during the period. Requests are discarded either due to age
-       (timeout) or during culling events that discard the bottom half of the
-       entire queue when it's too full.
-
-3.4.3.4. Service AIMD conditions
-
-  At the end of each period, the service may decide to increase effort,
-  decrease effort, or make no changes, based on these accumulated state values:
-
-    1. If MAX_TRIMMED_EFFORT > our previous internal suggested_effort,
-       always INCREASE. Requests that follow our latest advice are being
-       dropped.
-
-    2. If the HAD_QUEUE flag was set and the queue still contains at least
-       one item with effort >= our previous internal suggested_effort,
-       INCREASE. Even if we haven't yet reached the point of dropping requests,
-       this signal indicates that the our latest suggestion isn't high enough
-       and requests will build up in the queue.
-
-    3. If neither condition (1) or (2) are taking place and the queue is below
-       a level we would expect to process in approximately 1/4 second, choose
-       to DECREASE.
-
-    4. If none of these conditions match, the suggested effort is unchanged.
-
-  When we INCREASE, the internal suggested_effort is increased to either its
-  previous value + 1, or (TOTAL_EFFORT / REND_HANDLED), whichever is larger.
-
-  When we DECREASE, the internal suggested_effort is scaled by 2/3rds.
-
-  Over time, this will continue to decrease our effort suggestion any time the
-  service is fully processing its request queue. If the queue stays empty, the
-  effort suggestion decreases to zero and clients should no longer submit a
-  proof-of-work solution with their first connection attempt.
-
-  It's worth noting that the suggested-effort is not a hard limit to the
-  efforts that are accepted by the service, and it's only meant to serve as a
-  guideline for clients to reduce the number of unsuccessful requests that get
-  to the service. The service still adds requests with lower effort than
-  suggested-effort to the priority queue in [ADD_QUEUE].
-
-3.4.3.5. Updating descriptor with new suggested effort
-
-  The service descriptors may be updated for multiple reasons including
-  introduction point rotation common to all v3 onion services, the scheduled
-  seed rotations described in [DESC_POW], and updates to the effort suggestion.
-  Even though the internal effort estimate updates on a regular timer, we avoid
-  propagating those changes into the descriptor and the HSDir hosts unless
-  there is a significant change.
-
-  If the PoW params otherwise match but the seed has changed by less than 15
-  percent, services SHOULD NOT upload a new descriptor.
-
-4. Client behavior [CLIENT_BEHAVIOR]
-
-  This proposal introduces a bunch of new ways where a legitimate client can
-  fail to reach the onion service.
-
-  Furthermore, there is currently no end-to-end way for the onion service to
-  inform the client that the introduction failed. The INTRO_ACK cell is not
-  end-to-end (it's from the introduction point to the client) and hence it does
-  not allow the service to inform the client that the rendezvous is never gonna
-  occur.
-
-  From the client's perspective there's no way to attribute this failure to
-  the service itself rather than the introduction point, so error accounting
-  is performed separately for each introduction-point. Existing mechanisms
-  will discard an introduction point that's required too many retries.
-
-4.1. Clients handling timeouts [CLIENT_TIMEOUT]
-
-  Alice can fail to reach the onion service if her introduction request gets
-  trimmed off the priority queue in [HANDLE_QUEUE], or if the service does not
-  get through its priority queue in time and the connection times out.
-
-  This section presents a heuristic method for the client getting service even
-  in such scenarios.
-
-  If the rendezvous request times out, the client SHOULD fetch a new descriptor
-  for the service to make sure that it's using the right suggested-effort for
-  the PoW and the right PoW seed. If the fetched descriptor includes a new
-  suggested effort or seed, it should first retry the request with these
-  parameters.
-
-  {TODO: This is not actually implemented yet, but we should do it. How often
-     should clients at most try to fetch new descriptors? Determined by a
-     consensus parameter? This change will also allow clients to retry
-     effectively in cases where the service has just been reconfigured to
-     enable PoW defenses.}
-
-  Every time the client retries the connection, it will count these failures
-  per-introduction-point. These counts of previous retries are combined with
-  the service's suggested_effort when calculating the actual effort to spend
-  on any individual request to a service that advertises PoW support, even
-  when the currently advertised suggested_effort is zero.
-
-  On each retry, the client modifies its solver effort:
-
-    1. If the effort is below (CLIENT_POW_EFFORT_DOUBLE_UNTIL = 1000)
-       it will be doubled.
-
-    2. Otherwise, multiply the effort by (CLIENT_POW_RETRY_MULTIPLIER = 1.5).
-
-    3. Constrain the new effort to be at least
-       (CLIENT_MIN_RETRY_POW_EFFORT = 8) and no greater than
-       (CLIENT_MAX_POW_EFFORT = 10000)
-
-  {TODO: These hardcoded limits should be replaced by timed limits and/or
-      an unlimited solver with robust cancellation. This is issue tor#40787}
-
-5. Attacker strategies [ATTACK_META]
-
-  Now that we defined our protocol we need to start tweaking the various
-  knobs. But before we can do that, we first need to understand a few
-  high-level attacker strategies to see what we are fighting against.
-
-5.1.1. Overwhelm PoW verification (aka "Overwhelm top half") [ATTACK_TOP_HALF]
-
-  A basic attack here is the adversary spamming with bogus INTRO cells so that
-  the service does not have computing capacity to even verify the
-  proof-of-work. This adversary tries to overwhelm the procedure in the
-  [POW_VERIFY] section.
-
-  That's why we need the PoW algorithm to have a cheap verification time so
-  that this attack is not possible: we tune this PoW parameter in section
-  [POW_TUNING_VERIFICATION].
-
-5.1.2. Overwhelm rendezvous capacity (aka "Overwhelm bottom half")
-       [ATTACK_BOTTOM_HALF]
-
-  Given the way the introduction queue works (see [HANDLE_QUEUE]), a very
-  effective strategy for the attacker is to totally overwhelm the queue
-  processing by sending more high-effort introductions than the onion service
-  can handle at any given tick. This adversary tries to overwhelm the procedure
-  in the [HANDLE_QUEUE] section.
-
-  To do so, the attacker would have to send at least 20 high-effort
-  introduction cells every 100ms, where high-effort is a PoW which is above the
-  estimated level of "the motivated user" (see [USER_MODEL]).
-
-  An easier attack for the adversary, is the same strategy but with
-  introduction cells that are all above the comfortable level of "the standard
-  user" (see [USER_MODEL]). This would block out all standard users and only
-  allow motivated users to pass.
-
-5.1.3. Hybrid overwhelm strategy [ATTACK_HYBRID]
-
-  If both the top- and bottom- halves are processed by the same thread, this
-  opens up the possibility for a "hybrid" attack. Given the performance figures
-  for the bottom half (0.31 ms/req.) and the top half (5.5 ms/req.), the
-  attacker can optimally deny service by submitting 91 high-effort requests and
-  1520 invalid requests per second. This will completely saturate the main loop
-  because:
-
-  0.31*(1520+91) ~ 0.5 sec.
-  5.5*91         ~ 0.5 sec.
-
-  This attack only has half the bandwidth requirement of [ATTACK_TOP_HALF] and
-  half the compute requirement of [ATTACK_BOTTOM_HALF].
-
-  Alternatively, the attacker can adjust the ratio between invalid and
-  high-effort requests depending on their bandwidth and compute capabilities.
-
-5.1.4. Gaming the effort estimation logic [ATTACK_EFFORT]
-
-  Another way to beat this system is for the attacker to game the effort
-  estimation logic (see [EFFORT_ESTIMATION]). Essentially, there are two attacks
-  that we are trying to avoid:
-
-  - Attacker sets descriptor suggested-effort to a very high value effectively
-    making it impossible for most clients to produce a PoW token in a
-    reasonable timeframe.
-  - Attacker sets descriptor suggested-effort to a very small value so that
-    most clients aim for a small value while the attacker comfortably launches
-    an [ATTACK_BOTTOM_HALF] using medium effort PoW (see [REF_TEVADOR_1])
-
-5.1.4. Precomputed PoW attack
-
-  The attacker may precompute many valid PoW nonces and submit them all at once
-  before the current seed expires, overwhelming the service temporarily even
-  using a single computer. The current scheme gives the attackers 4 hours to
-  launch this attack since each seed lasts 2 hours and the service caches two
-  seeds.
-
-  An attacker with this attack might be aiming to DoS the service for a limited
-  amount of time, or to cause an [ATTACK_EFFORT] attack.
-
-6. Parameter tuning [POW_TUNING]
-
-  There are various parameters in this PoW system that need to be tuned:
-
-  We first start by tuning the time it takes to verify a PoW token. We do this
-  first because it's fundamental to the performance of onion services and can
-  turn into a DoS vector of its own. We will do this tuning in a way that's
-  agnostic to the chosen PoW function.
-
-  We will then move towards analyzing the client starting difficulty setting
-  for our PoW system. That defines the expected time for clients to succeed in
-  our system, and the expected time for attackers to overwhelm our system. Same
-  as above we will do this in a way that's agnostic to the chosen PoW function.
-
-  Currently, we have hardcoded the initial client starting difficulty at 8,
-  but this may be too low to ramp up quickly to various on and off attack
-  patterns. A higher initial difficulty may be needed for these, depending on
-  their severity. This section gives us an idea of how large such attacks can
-  be.
-
-  Finally, using those two pieces we will tune our PoW function and pick the
-  right client starting difficulty setting. At the end of this section we will
-  know the resources that an attacker needs to overwhelm the onion service, the
-  resources that the service needs to verify introduction requests, and the
-  resources that legitimate clients need to get to the onion service.
-
-6.1. PoW verification [POW_TUNING_VERIFICATION]
-
-  Verifying a PoW token is the first thing that a service does when it receives
-  an INTRODUCE2 cell and it's detailed in section [POW_VERIFY]. This
-  verification happens during the "top half" part of the process. Every
-  millisecond spent verifying PoW adds overhead to the already existing "top
-  half" part of handling an introduction cell. Hence we should be careful to
-  add minimal overhead here so that we don't enable attacks like [ATTACK_TOP_HALF].
-
-  During our performance measurements in [TOR_MEASUREMENTS] we learned that the
-  "top half" takes about 0.26 msecs in average, without doing any sort of PoW
-  verification. Using that value we compute the following table, that describes
-  the number of cells we can queue per second (aka times we can perform the
-  "top half" process) for different values of PoW verification time:
-
-      +---------------------+-----------------------+--------------+
-      |PoW Verification Time| Total "top half" time | Cells Queued |
-      |                     |                       |  per second  |
-      |---------------------|-----------------------|--------------|
-      |    0     msec       |    0.26      msec     |    3846      |
-      |    1     msec       |    1.26      msec     |    793       |
-      |    2     msec       |    2.26      msec     |    442       |
-      |    3     msec       |    3.26      msec     |    306       |
-      |    4     msec       |    4.26      msec     |    234       |
-      |    5     msec       |    5.26      msec     |    190       |
-      |    6     msec       |    6.26      msec     |    159       |
-      |    7     msec       |    7.26      msec     |    137       |
-      |    8     msec       |    8.26      msec     |    121       |
-      |    9     msec       |    9.26      msec     |    107       |
-      |    10    msec       |    10.26     msec     |    97        |
-      +---------------------+-----------------------+--------------+
-
-  Here is how you can read the table above:
-
-  - For a PoW function with a 1ms verification time, an attacker needs to send
-    793 dummy introduction cells per second to succeed in a [ATTACK_TOP_HALF] attack.
-
-  - For a PoW function with a 2ms verification time, an attacker needs to send
-    442 dummy introduction cells per second to succeed in a [ATTACK_TOP_HALF] attack.
-
-  - For a PoW function with a 10ms verification time, an attacker needs to send
-    97 dummy introduction cells per second to succeed in a [ATTACK_TOP_HALF] attack.
-
-  Whether an attacker can succeed at that depends on the attacker's resources,
-  but also on the network's capacity.
-
-  Our purpose here is to have the smallest PoW verification overhead possible
-  that also allows us to achieve all our other goals.
-
-  [Note that the table above is simply the result of a naive multiplication and
-  does not take into account all the auxiliary overheads that happen every
-  second like the time to invoke the mainloop, the bottom-half processes, or
-  pretty much anything other than the "top-half" processing.
-
-  During our measurements the time to handle INTRODUCE2 cells dominates any
-  other action time: There might be events that require a long processing time,
-  but these are pretty infrequent (like uploading a new HS descriptor) and
-  hence over a long time they smooth out. Hence extrapolating the total cells
-  queued per second based on a single "top half" time seems like good enough to
-  get some initial intuition. That said, the values of "Cells queued per
-  second" from the table above, are likely much smaller than displayed above
-  because of all the auxiliary overheads.]
-
-6.2. PoW difficulty analysis [POW_DIFFICULTY_ANALYSIS]
-
-  The difficulty setting of our PoW basically dictates how difficult it should
-  be to get a success in our PoW system. An attacker who can get many successes
-  per second can pull a successful [ATTACK_BOTTOM_HALF] attack against our
-  system.
-
-  In classic PoW systems, "success" is defined as getting a hash output below
-  the "target". However, since our system is dynamic, we define "success" as an
-  abstract high-effort computation.
-
-  Our system is dynamic but we still need a starting difficulty setting that
-  will be used for bootstrapping the system. The client and attacker can still
-  aim higher or lower but for UX purposes and for analysis purposes we do need
-  to define a starting difficulty, to minimize retries by clients.
-
-6.2.1. Analysis based on adversary power
-
-  In this section we will try to do an analysis of PoW difficulty without using
-  any sort of Tor-related or PoW-related benchmark numbers.
-
-  We created the table (see [REF_TABLE]) below which shows how much time a
-  legitimate client with a single machine should expect to burn before they get
-  a single success. The x-axis is how many successes we want the attacker to be
-  able to do per second: the more successes we allow the adversary, the more
-  they can overwhelm our introduction queue. The y-axis is how many machines
-  the adversary has in her disposal, ranging from just 5 to 1000.
-
-       ===============================================================
-       |    Expected Time (in seconds) Per Success For One Machine   |
- ===========================================================================
- |                                                                          |
- |   Attacker Succeses        1       5       10      20      30      50    |
- |       per second                                                         |
- |                                                                          |
- |            5               5       1       0       0       0       0     |
- |            50              50      10      5       2       1       1     |
- |            100             100     20      10      5       3       2     |
- | Attacker   200             200     40      20      10      6       4     |
- |  Boxes     300             300     60      30      15      10      6     |
- |            400             400     80      40      20      13      8     |
- |            500             500     100     50      25      16      10    |
- |            1000            1000    200     100     50      33      20    |
- |                                                                          |
- ============================================================================
-
-  Here is how you can read the table above:
-
-  - If an adversary has a botnet with 1000 boxes, and we want to limit her to 1
-    success per second, then a legitimate client with a single box should be
-    expected to spend 1000 seconds getting a single success.
-
-  - If an adversary has a botnet with 1000 boxes, and we want to limit her to 5
-    successes per second, then a legitimate client with a single box should be
-    expected to spend 200 seconds getting a single success.
-
-  - If an adversary has a botnet with 500 boxes, and we want to limit her to 5
-    successes per second, then a legitimate client with a single box should be
-    expected to spend 100 seconds getting a single success.
-
-  - If an adversary has access to 50 boxes, and we want to limit her to 5
-    successes per second, then a legitimate client with a single box should be
-    expected to spend 10 seconds getting a single success.
-
-  - If an adversary has access to 5 boxes, and we want to limit her to 5
-    successes per second, then a legitimate client with a single box should be
-    expected to spend 1 seconds getting a single success.
-
-  With the above table we can create some profiles for starting values of our
-  PoW difficulty.
-
-6.2.2. Analysis based on Tor's performance [POW_DIFFICULTY_TOR]
-
-  To go deeper here, we can use the performance measurements from
-  [TOR_MEASUREMENTS] to get a more specific intuition on the starting
-  difficulty. In particular, we learned that completely handling an
-  introduction cell takes 5.55 msecs in average. Using that value, we can
-  compute the following table, that describes the number of introduction cells
-  we can handle per second for different values of PoW verification:
-
-      +---------------------+-----------------------+--------------+
-      |PoW Verification Time| Total time to handle  | Cells handled|
-      |                     |   introduction cell   |  per second  |
-      |---------------------|-----------------------|--------------|
-      |    0      msec      |    5.55        msec   |    180.18    |
-      |    1      msec      |    6.55        msec   |    152.67    |
-      |    2      msec      |    7.55        msec   |    132.45    |
-      |    3      msec      |    8.55        msec   |    116.96    |
-      |    4      msec      |    9.55        mesc   |    104.71    |
-      |    5      msec      |    10.55       msec   |    94.79     |
-      |    6      msec      |    11.55       msec   |    86.58     |
-      |    7      msec      |    12.55       msec   |    79.68     |
-      |    8      msec      |    13.55       msec   |    73.80     |
-      |    9      msec      |    14.55       msec   |    68.73     |
-      |    10     msec      |    15.55       msec   |    64.31     |
-      +---------------------+-----------------------+--------------+
-
-  Here is how you can read the table above:
-
-  - For a PoW function with a 1ms verification time, an attacker needs to send
-    152 high-effort introduction cells per second to succeed in a
-    [ATTACK_BOTTOM_HALF] attack.
-
-  - For a PoW function with a 10ms verification time, an attacker needs to send
-    64 high-effort introduction cells per second to succeed in a
-    [ATTACK_BOTTOM_HALF] attack.
-
-  We can use this table to specify a starting difficulty that won't allow our
-  target adversary to succeed in an [ATTACK_BOTTOM_HALF] attack.
-
-  Of course, when it comes to this table, the same disclaimer as in section
-  [POW_TUNING_VERIFICATION] is valid. That is, the above table is just a
-  theoretical extrapolation and we expect the real values to be much lower
-  since they depend on auxiliary processing overheads, and on the network's
-  capacity.
-
-
-7. Discussion
-
-7.1. UX
-
-  This proposal has user facing UX consequences.
-
-  When the client first attempts a pow, it can note how long iterations of the
-  hash function take, and then use this to determine an estimation of the
-  duration of the PoW. This estimation could be communicated via the control
-  port or other mechanism, such that the browser could display how long the
-  PoW is expected to take on their device. If the device is a mobile platform,
-  and this time estimation is large, it could recommend that the user try from
-  a desktop machine.
-
-7.2. Future work [FUTURE_WORK]
-
-7.2.1. Incremental improvements to this proposal
-
-  There are various improvements that can be done in this proposal, and while
-  we are trying to keep this v1 version simple, we need to keep the design
-  extensible so that we build more features into it. In particular:
-
-  - End-to-end introduction ACKs
-
-    This proposal suffers from various UX issues because there is no end-to-end
-    mechanism for an onion service to inform the client about its introduction
-    request. If we had end-to-end introduction ACKs many of the problems from
-    [CLIENT_BEHAVIOR] would be alleviated. The problem here is that end-to-end
-    ACKs require modifications on the introduction point code and a network
-    update which is a lengthy process.
-
-  - Multithreading scheduler
-
-    Our scheduler is pretty limited by the fact that Tor has a single-threaded
-    design. If we improve our multithreading support we could handle a much
-    greater amount of introduction requests per second.
-
-7.2.2. Future designs [FUTURE_DESIGNS]
-
-  This is just the beginning in DoS defences for Tor and there are various
-  future designs and schemes that we can investigate. Here is a brief summary
-  of these:
-
-  "More advanced PoW schemes" -- We could use more advanced memory-hard PoW
-         schemes like MTP-argon2 or Itsuku to make it even harder for
-         adversaries to create successful PoWs. Unfortunately these schemes
-         have much bigger proof sizes, and they won't fit in INTRODUCE1 cells.
-         See #31223 for more details.
-
-  "Third-party anonymous credentials" -- We can use anonymous credentials and a
-         third-party token issuance server on the clearnet to issue tokens
-         based on PoW or CAPTCHA and then use those tokens to get access to the
-         service. See [REF_CREDS] for more details.
-
-  "PoW + Anonymous Credentials" -- We can make a hybrid of the above ideas
-         where we present a hard puzzle to the user when connecting to the
-         onion service, and if they solve it we then give the user a bunch of
-         anonymous tokens that can be used in the future. This can all happen
-         between the client and the service without a need for a third party.
-
-  All of the above approaches are much more complicated than this proposal, and
-  hence we want to start easy before we get into more serious projects.
-
-7.3. Environment
-
-  We love the environment! We are concerned of how PoW schemes can waste energy
-  by doing useless hash iterations. Here is a few reasons we still decided to
-  pursue a PoW approach here:
-
-  "We are not making things worse" -- DoS attacks are already happening and
-      attackers are already burning energy to carry them out both on the
-      attacker side, on the service side and on the network side. We think that
-      asking legitimate clients to carry out PoW computations is not gonna
-      affect the equation too much, since an attacker right now can very
-      quickly cause the same damage that hundreds of legitimate clients do a
-      whole day.
-
-  "We hope to make things better" -- The hope is that proposals like this will
-      make the DoS actors go away and hence the PoW system will not be used. As
-      long as DoS is happening there will be a waste of energy, but if we
-      manage to demotivate them with technical means, the network as a whole
-      will less wasteful. Also see [CATCH22] for a similar argument.
-
-8. Acknowledgements
-
-  Thanks a lot to tevador for the various improvements to the proposal and for
-  helping us understand and tweak the RandomX scheme.
-
-  Thanks to Solar Designer for the help in understanding the current PoW
-  landscape, the various approaches we could take, and teaching us a few neat
-  tricks.
-
-Appendix A.  Little-t tor introduction scheduler
-
-  This section describes how we will implement this proposal in the "tor"
-  software (little-t tor).
-
-  The following should be read as if tor is an onion service and thus the end
-  point of all inbound data.
-
-A.1. The Main Loop [MAIN_LOOP]
-
-  Tor uses libevent for its mainloop. For network I/O operations, a mainloop
-  event is used to inform tor if it can read on a certain socket, or a
-  connection object in tor.
-
-  From there, this event will empty the connection input buffer (inbuf) by
-  extracting and processing a cell at a time. The mainloop is single threaded
-  and thus each cell is handled sequentially.
-
-  Processing an INTRODUCE2 cell at the onion service means a series of
-  operations (in order):
-
-    1) Unpack cell from inbuf to local buffer.
-
-    2) Decrypt cell (AES operations).
-
-    3) Parse cell header and process it depending on its RELAY_COMMAND.
-
-    4) INTRODUCE2 cell handling which means building a rendezvous circuit:
-        i)  Path selection
-        ii) Launch circuit to first hop.
-
-    5) Return to mainloop event which essentially means back to step (1).
-
-  Tor will read at most 32 cells out of the inbuf per mainloop round.
-
-A.2. Requirements for PoW
-
-  With this proposal, in order to prioritize cells by the amount of PoW work
-  it has done, cells can _not_ be processed sequentially as described above.
-
-  Thus, we need a way to queue a certain number of cells, prioritize them and
-  then process some cell(s) from the top of the queue (that is, the cells that
-  have done the most PoW effort).
-
-  We thus require a new cell processing flow that is _not_ compatible with
-  current tor design. The elements are:
-
-    - Validate PoW and place cells in a priority queue of INTRODUCE2 cells (as
-      described in section [INTRO_QUEUE]).
-
-    - Defer "bottom half" INTRO2 cell processing for after cells have been
-      queued into the priority queue.
-
-A.3. Proposed scheduler [TOR_SCHEDULER]
-
-  The intuitive way to address the A.2 requirements would be to do this
-  simple and naive approach:
-
-    1) Mainloop: Empty inbuf INTRODUCE2 cells into priority queue
-
-    2) Process all cells in pqueue
-
-    3) Goto (1)
-
-  However, we are worried that handling all those cells before returning to the
-  mainloop opens possibilities of attack by an adversary since the priority
-  queue is not gonna be kept up to date while we process all those cells. This
-  means that we might spend lots of time dealing with introductions that don't
-  deserve it. See [BOTTOM_HALF_SCHEDULER] for more details.
-
-  We thus propose to split the INTRODUCE2 handling into two different steps:
-  "top half" and "bottom half" process, as also mentioned in [POW_VERIFY]
-  section above.
-
-A.3.1. Top half and bottom half scheduler
-
-  The top half process is responsible for queuing introductions into the
-  priority queue as follows:
-
-    a) Unpack cell from inbuf to local buffer.
-
-    b) Decrypt cell (AES operations).
-
-    c) Parse INTRODUCE2 cell header and validate PoW.
-
-    d) Return to mainloop event which essentially means step (1).
-
-  The top-half basically does all operations of section [MAIN_LOOP] except from (4).
-
-  An then, the bottom-half process is responsible for handling introductions
-  and doing rendezvous. To achieve this we introduce a new mainloop event to
-  process the priority queue _after_ the top-half event has completed. This new
-  event would do these operations sequentially:
-
-    a) Pop INTRODUCE2 cell from priority queue.
-
-    b) Parse and process INTRODUCE2 cell.
-
-    c) End event and yield back to mainloop.
-
-A.3.2. Scheduling the bottom half process [BOTTOM_HALF_SCHEDULER]
-
-  The question now becomes: when should the "bottom half" event get triggered
-  from the mainloop?
-
-  We propose that this event is scheduled in when the network I/O event
-  queues at least 1 cell into the priority queue. Then, as long as it has a
-  cell in the queue, it would re-schedule itself for immediate execution
-  meaning at the next mainloop round, it would execute again.
-
-  The idea is to try to empty the queue as fast as it can in order to provide a
-  fast response time to an introduction request but always leave a chance for
-  more cells to appear between cell processing by yielding back to the
-  mainloop. With this we are aiming to always have the most up-to-date version
-  of the priority queue when we are completing introductions: this way we are
-  prioritizing clients that spent a lot of time and effort completing their PoW.
-
-  If the size of the queue drops to 0, it stops scheduling itself in order to
-  not create a busy loop. The network I/O event will re-schedule it in time.
-
-  Notice that the proposed solution will make the service handle 1 single
-  introduction request at every main loop event. However, when we do
-  performance measurements we might learn that it's preferable to bump the
-  number of cells in the future from 1 to N where N <= 32.
-
-A.4 Performance measurements
-
-  This section will detail the performance measurements we've done on tor.git
-  for handling an INTRODUCE2 cell and then a discussion on how much more CPU
-  time we can add (for PoW validation) before it badly degrades our
-  performance.
-
-A.4.1 Tor measurements [TOR_MEASUREMENTS]
-
-  In this section we will derive measurement numbers for the "top half" and
-  "bottom half" parts of handling an introduction cell.
-
-  These measurements have been done on tor.git at commit
-  80031db32abebaf4d0a91c01db258fcdbd54a471.
-
-  We've measured several set of actions of the INTRODUCE2 cell handling process
-  on Intel(R) Xeon(R) CPU E5-2650 v4. Our service was accessed by an array of
-  clients that sent introduction requests for a period of 60 seconds.
-
-  1. Full Mainloop Event
-
-     We start by measuring the full time it takes for a mainloop event to
-     process an inbuf containing INTRODUCE2 cells. The mainloop event processed
-     2.42 cells per invocation on average during our measurements.
-
-     Total measurements: 3279
-
-       Min: 0.30 msec - 1st Q.: 5.47 msec - Median: 5.91 msec
-       Mean: 13.43 msec - 3rd Q.: 16.20 msec - Max: 257.95 msec
-
-  2. INTRODUCE2 cell processing (bottom-half)
-
-     We also measured how much time the "bottom half" part of the process
-     takes. That's the heavy part of processing an introduction request as seen
-     in step (4) of the [MAIN_LOOP] section:
-
-     Total measurements: 7931
-
-       Min: 0.28 msec - 1st Q.: 5.06 msec - Median: 5.33 msec
-       Mean: 5.29 msec - 3rd Q.: 5.57 msec - Max: 14.64 msec
-
-  3. Connection data read (top half)
-
-     Now that we have the above pieces, we can use them to measure just the
-     "top half" part of the procedure. That's when bytes are taken from the
-     connection inbound buffer and parsed into an INTRODUCE2 cell where basic
-     validation is done.
-
-     There is an average of 2.42 INTRODUCE2 cells per mainloop event and so we
-     divide that by the full mainloop event mean time to get the time for one
-     cell. From that we subtract the "bottom half" mean time to get how much
-     the "top half" takes:
-
-        => 13.43 / (7931 / 3279) = 5.55
-        => 5.55 - 5.29 = 0.26
-
-        Mean: 0.26 msec
-
-  To summarize, during our measurements the average number of INTRODUCE2 cells
-  a mainloop event processed is ~2.42 cells (7931 cells for 3279 mainloop
-  invocations).
-
-  This means that, taking the mean of mainloop event times, it takes ~5.55msec
-  (13.43/2.42) to completely process an INTRODUCE2 cell. Then if we look deeper
-  we see that the "top half" of INTRODUCE2 cell processing takes 0.26 msec in
-  average, whereas the "bottom half" takes around 5.33 msec.
-
-  The heavyness of the "bottom half" is to be expected since that's where 95%
-  of the total work takes place: in particular the rendezvous path selection
-  and circuit launch.
-
-A.2. References
-
-    [REF_EQUIX]: https://github.com/tevador/equix
-                 https://github.com/tevador/equix/blob/master/devlog.md
-    [REF_TABLE]: The table is based on the script below plus some manual editing for readability:
-                 https://gist.github.com/asn-d6/99a936b0467b0cef88a677baaf0bbd04
-    [REF_BOTNET]: https://media.kasperskycontenthub.com/wp-content/uploads/sites/43/2009/07/01121538/ynam_botnets_0907_en.pdf
-    [REF_CREDS]: https://lists.torproject.org/pipermail/tor-dev/2020-March/014198.html
-    [REF_TARGET]: https://en.bitcoin.it/wiki/Target
-    [REF_TLS]: https://www.ietf.org/archive/id/draft-nygren-tls-client-puzzles-02.txt
-               https://tools.ietf.org/id/draft-nir-tls-puzzles-00.html
-               https://tools.ietf.org/html/draft-ietf-ipsecme-ddos-protection-10
-    [REF_TLS_1]: https://www.ietf.org/archive/id/draft-nygren-tls-client-puzzles-02.txt
-    [REF_TEVADOR_1]: https://lists.torproject.org/pipermail/tor-dev/2020-May/014268.html
-    [REF_TEVADOR_2]: https://lists.torproject.org/pipermail/tor-dev/2020-June/014358.html
-    [REF_TEVADOR_SIM]: https://github.com/mikeperry-tor/scratchpad/blob/master/tor-pow/effort_sim.py#L57
-```
diff --git a/spec/hspow-spec/motivation.md b/spec/hspow-spec/motivation.md
new file mode 100644
index 0000000..1c77f58
--- /dev/null
+++ b/spec/hspow-spec/motivation.md
@@ -0,0 +1,106 @@
+```text
+
+0. Abstract
+
+  This proposal aims to thwart introduction flooding DoS attacks by introducing
+  a dynamic Proof-Of-Work protocol that occurs over introduction circuits.
+
+1. Motivation
+
+  So far our attempts at limiting the impact of introduction flooding DoS
+  attacks on onion services has been focused on horizontal scaling with
+  Onionbalance, optimizing the CPU usage of Tor and applying rate limiting.
+  While these measures move the goalpost forward, a core problem with onion
+  service DoS is that building rendezvous circuits is a costly procedure both
+  for the service and for the network. For more information on the limitations
+  of rate-limiting when defending against DDoS, see [REF_TLS_1].
+
+  If we ever hope to have truly reachable global onion services, we need to
+  make it harder for attackers to overload the service with introduction
+  requests. This proposal achieves this by allowing onion services to specify
+  an optional dynamic proof-of-work scheme that its clients need to participate
+  in if they want to get served.
+
+  With the right parameters, this proof-of-work scheme acts as a gatekeeper to
+  block amplification attacks by attackers while letting legitimate clients
+  through.
+
+1.1. Related work
+
+  For a similar concept, see the three internet drafts that have been proposed
+  for defending against TLS-based DDoS attacks using client puzzles [REF_TLS].
+
+1.2. Threat model [THREAT_MODEL]
+
+1.2.1. Attacker profiles [ATTACKER_MODEL]
+
+  This proposal is written to thwart specific attackers. A simple PoW proposal
+  cannot defend against all and every DoS attack on the Internet, but there are
+  adversary models we can defend against.
+
+  Let's start with some adversary profiles:
+
+  "The script-kiddie"
+
+    The script-kiddie has a single computer and pushes it to its
+    limits. Perhaps it also has a VPS and a pwned server. We are talking about
+    an attacker with total access to 10 GHz of CPU and 10 GB of RAM. We
+    consider the total cost for this attacker to be zero $.
+
+  "The small botnet"
+
+    The small botnet is a bunch of computers lined up to do an introduction
+    flooding attack. Assuming 500 medium-range computers, we are talking about
+    an attacker with total access to 10 THz of CPU and 10 TB of RAM. We
+    consider the upfront cost for this attacker to be about $400.
+
+  "The large botnet"
+
+    The large botnet is a serious operation with many thousands of computers
+    organized to do this attack. Assuming 100k medium-range computers, we are
+    talking about an attacker with total access to 200 THz of CPU and 200 TB of
+    RAM. The upfront cost for this attacker is about $36k.
+
+  We hope that this proposal can help us defend against the script-kiddie
+  attacker and small botnets. To defend against a large botnet we would need
+  more tools at our disposal (see [FUTURE_DESIGNS]).
+
+1.2.2. User profiles [USER_MODEL]
+
+  We have attackers and we have users. Here are a few user profiles:
+
+  "The standard web user"
+
+    This is a standard laptop/desktop user who is trying to browse the
+    web. They don't know how these defences work and they don't care to
+    configure or tweak them. If the site doesn't load, they are gonna close
+    their browser and be sad at Tor. They run a 2GHz computer with 4GB of RAM.
+
+  "The motivated user"
+
+    This is a user that really wants to reach their destination. They don't
+    care about the journey; they just want to get there. They know what's going
+    on; they are willing to make their computer do expensive multi-minute PoW
+    computations to get where they want to be.
+
+  "The mobile user"
+
+    This is a motivated user on a mobile phone. Even tho they want to read the
+    news article, they don't have much leeway on stressing their machine to do
+    more computation.
+
+  We hope that this proposal will allow the motivated user to always connect
+  where they want to connect to, and also give more chances to the other user
+  groups to reach the destination.
+
+1.2.3. The DoS Catch-22 [CATCH22]
+
+  This proposal is not perfect and it does not cover all the use cases. Still,
+  we think that by covering some use cases and giving reachability to the
+  people who really need it, we will severely demotivate the attackers from
+  continuing the DoS attacks and hence stop the DoS threat all together.
+  Furthermore, by increasing the cost to launch a DoS attack, a big
+  class of DoS attackers will disappear from the map, since the expected ROI
+  will decrease.
+
+```
+\ No newline at end of file
diff --git a/spec/hspow-spec/overview.md b/spec/hspow-spec/overview.md
new file mode 100644
index 0000000..cb4ea7e
--- /dev/null
+++ b/spec/hspow-spec/overview.md
@@ -0,0 +1,68 @@
+```text
+
+2. System Overview
+
+2.1. Tor protocol overview
+
+                                          +----------------------------------+
+                                          |          Onion Service           |
+   +-------+ INTRO1  +-----------+ INTRO2 +--------+                         |
+   |Client |-------->|Intro Point|------->|  PoW   |-----------+             |
+   +-------+         +-----------+        |Verifier|           |             |
+                                          +--------+           |             |
+                                          |                    |             |
+                                          |                    |             |
+                                          |         +----------v---------+   |
+                                          |         |Intro Priority Queue|   |
+                                          +---------+--------------------+---+
+                                                           |  |  |
+                                                Rendezvous |  |  |
+                                                  circuits |  |  |
+                                                           v  v  v
+
+
+
+  The proof-of-work scheme specified in this proposal takes place during the
+  introduction phase of the onion service protocol.
+
+  The system described in this proposal is not meant to be on all the time, and
+  it can be entirely disabled for services that do not experience DoS attacks.
+
+  When the subsystem is enabled, suggested effort is continuously adjusted and
+  the computational puzzle can be bypassed entirely when the effort reaches
+  zero. In these cases, the proof-of-work subsystem can be dormant but still
+  provide the necessary parameters for clients to voluntarily provide effort
+  in order to get better placement in the priority queue.
+
+  The protocol involves the following major steps:
+
+  1) Service encodes PoW parameters in descriptor [DESC_POW]
+  2) Client fetches descriptor and computes PoW [CLIENT_POW]
+  3) Client completes PoW and sends results in INTRO1 cell [INTRO1_POW]
+  4) Service verifies PoW and queues introduction based on PoW effort
+     [SERVICE_VERIFY]
+  5) Requests are continuously drained from the queue, highest effort first,
+     subject to multiple constraints on speed [HANDLE_QUEUE]
+
+2.2. Proof-of-work overview
+
+2.2.2. Dynamic PoW
+
+  DoS is a dynamic problem where the attacker's capabilities constantly change,
+  and hence we want our proof-of-work system to be dynamic and not stuck with a
+  static difficulty setting. Hence, instead of forcing clients to go below a
+  static target like in Bitcoin to be successful, we ask clients to "bid" using
+  their PoW effort. Effectively, a client gets higher priority the higher
+  effort they put into their proof-of-work. This is similar to how
+  proof-of-stake works but instead of staking coins, you stake work.
+
+  The benefit here is that legitimate clients who really care about getting
+  access can spend a big amount of effort into their PoW computation, which
+  should guarantee access to the service given reasonable adversary models. See
+  [PARAM_TUNING] for more details about these guarantees and tradeoffs.
+
+  As a way to improve reachability and UX, the service tries to estimate the
+  effort needed for clients to get access at any given time and places it in
+  the descriptor. See [EFFORT_ESTIMATION] for more details.
+
+```
diff --git a/spec/hspow-spec/v1-equix.md b/spec/hspow-spec/v1-equix.md
new file mode 100644
index 0000000..2756e39
--- /dev/null
+++ b/spec/hspow-spec/v1-equix.md
@@ -0,0 +1,233 @@
+```text
+
+2.2.1. Algorithm overview
+
+  For our proof-of-work function we will use the Equi-X scheme by tevador
+  [REF_EQUIX].  Equi-X is an asymmetric PoW function based on Equihash<60,3>,
+  using HashX as the underlying layer. It features lightning fast verification
+  speed, and also aims to minimize the asymmetry between CPU and GPU.
+  Furthermore, it's designed for this particular use-case and hence
+  cryptocurrency miners are not incentivized to make optimized ASICs for it.
+
+  The overall scheme consists of several layers that provide different pieces
+  of this functionality:
+
+  1) At the lowest layers, blake2b and siphash are used as hashing and PRNG
+     algorithms that are well suited to common 64-bit CPUs.
+  2) A custom hash function family, HashX, randomizes its implementation for
+     each new seed value. These functions are tuned to utilize the pipelined
+     integer performance on a modern 64-bit CPU. This layer provides the
+     strongest ASIC resistance, since a hardware reimplementation would need
+     to include a CPU-like pipelined execution unit to keep up.
+  3) The Equi-X layer itself builds on HashX and adds an algorithmic puzzle
+     that's designed to be strongly asymmetric and to require RAM to solve
+     efficiently.
+  4) The PoW protocol itself builds on this Equi-X function with a particular
+     construction of the challenge input and particular constraints on the
+     allowed blake2b hash of the solution. This layer provides a linearly
+     adjustable effort that we can verify.
+  5) Above the level of individual PoW handshakes, the client and service
+     form a closed-loop system that adjusts the effort of future handshakes.
+
+  The Equi-X scheme provides two functions that will be used in this proposal:
+      - equix_solve(challenge) which solves a puzzle instance, returning
+        a variable number of solutions per invocation depending on the specific
+        challenge value.
+      - equix_verify(challenge, solution) which verifies a puzzle solution
+        quickly. Verification still depends on executing the HashX function,
+        but far fewer times than when searching for a solution.
+
+  For the purposes of this proposal, all cryptographic algorithms are assumed
+  to produce and consume byte strings, even if internally they operate on
+  some other data type like 64-bit words. This is conventionally little endian
+  order for blake2b, which contrasts with Tor's typical use of big endian.
+  HashX itself is configured with an 8-byte output but its input is a single
+  64-bit word of undefined byte order, of which only the low 16 bits are used
+  by Equi-X in its solution output. We treat Equi-X solution arrays as byte
+  arrays using their packed little endian 16-bit representation.
+
+  We tune Equi-X in section [EQUIX_TUNING].
+
+2.2.3. PoW effort
+
+  It's common for proof-of-work systems to define an exponential effort
+  function based on a particular number of leading zero bits or equivalent.
+  For the benefit of our effort estimation system, it's quite useful if we
+  instead have a linear scale. We use the first 32 bits of a hashed version
+  of the Equi-X solution as compared to the full 32-bit range.
+
+  Conceptually we could define a function:
+         unsigned effort(uint8_t *token)
+  which takes as its argument a hashed solution, interprets it as a
+  bitstring, and returns the quotient of dividing a bitstring of 1s by it.
+
+  So for example:
+         effort(00000001100010101101) = 11111111111111111111
+                                          / 00000001100010101101
+  or the same in decimal:
+         effort(6317) = 1048575 / 6317 = 165.
+
+  In practice we can avoid even having to perform this division, performing
+  just one multiply instead to see if a request's claimed effort is supported
+  by the smallness of the resulting 32-bit hash prefix. This assumes we send
+  the desired effort explicitly as part of each PoW solution. We do want to
+  force clients to pick a specific effort before looking for a solution,
+  otherwise a client could opportunistically claim a very large effort any
+  time a lucky hash prefix comes up. Thus the effort is communicated explicitly
+  in our protocol, and it forms part of the concatenated Equi-X challenge.
+
+3.1. Service encodes PoW parameters in descriptor [DESC_POW]
+
+  This whole protocol starts with the service encoding the PoW parameters in
+  the 'encrypted' (inner) part of the v3 descriptor. As follows:
+
+       "pow-params" SP type SP seed-b64 SP suggested-effort
+                    SP expiration-time NL
+
+        [At most once]
+
+        type: The type of PoW system used. We call the one specified here "v1"
+
+        seed-b64: A random seed that should be used as the input to the PoW
+                  hash function. Should be 32 random bytes encoded in base64
+                  without trailing padding.
+
+        suggested-effort: An unsigned integer specifying an effort value that
+                  clients should aim for when contacting the service. Can be
+                  zero to mean that PoW is available but not currently
+                  suggested for a first connection attempt. See
+                  [EFFORT_ESTIMATION] for more details here.
+
+        expiration-time: A timestamp in "YYYY-MM-DDTHH:MM:SS" format (iso time
+                         with no space) after which the above seed expires and
+                         is no longer valid as the input for PoW. It's needed
+                         so that our replay cache does not grow infinitely. It
+                         should be set to RAND_TIME(now+7200, 900) seconds.
+
+   The service should refresh its seed when expiration-time passes. The service
+   SHOULD keep its previous seed in memory and accept PoWs using it to avoid
+   race-conditions with clients that have an old seed. The service SHOULD avoid
+   generating two consequent seeds that have a common 4 bytes prefix. See
+   [INTRO1_POW] for more info.
+
+   By RAND_TIME(ts, interval) we mean a time between ts-interval and ts, chosen
+   uniformly at random.
+
+3.2. Client fetches descriptor and computes PoW [CLIENT_POW]
+
+  If a client receives a descriptor with "pow-params", it should assume that
+  the service is prepared to receive PoW solutions as part of the introduction
+  protocol.
+
+  The client parses the descriptor and extracts the PoW parameters. It makes
+  sure that the <expiration-time> has not expired and if it has, it needs to
+  fetch a new descriptor.
+
+  The client should then extract the <suggested-effort> field to configure its
+  PoW 'target' (see [REF_TARGET]). The client SHOULD NOT accept 'target' values
+  that will cause unacceptably long PoW computation.
+
+  The client uses a "personalization string" P equal to the following
+  nul-terminated ASCII string: "Tor hs intro v1\0".
+
+  The client looks up `ID`, the current 32-byte blinded public ID
+  (KP_hs_blind_id) for the onion service.
+
+  To complete the PoW the client follows the following logic:
+
+      a) Client selects a target effort E, based on <suggested-effort> and past
+         connection attempt history.
+      b) Client generates a secure random 16-byte nonce N, as the starting
+         point for the solution search.
+      c) Client derives seed C by decoding 'seed-b64'.
+      d) Client calculates S = equix_solve(P || ID || C || N || E)
+      e) Client calculates R = ntohl(blake2b_32(P || ID || C || N || E || S))
+      f) Client checks if R * E <= UINT32_MAX.
+        f1) If yes, success! The client can submit N, E, the first 4 bytes of
+        C, and S.
+        f2) If no, fail! The client interprets N as a 16-byte little-endian
+        integer, increments it by 1 and goes back to step d).
+
+  Note that the blake2b hash includes the output length parameter in its
+  initial state vector, so a blake2b_32 is not equivalent to the prefix of a
+  blake2b_512. We calculate the 32-bit blake2b specifically, and interpret it
+  in network byte order as an unsigned integer.
+
+  At the end of the above procedure, the client should have S as the solution
+  of the Equix-X puzzle with N as the nonce, C as the seed. How quickly this
+  happens depends solely on the target effort E parameter.
+
+  The algorithm as described is suitable for single-threaded computation.
+  Optionally, a client may choose multiple nonces and attempt several solutions
+  in parallel on separate CPU cores. The specific choice of nonce is entirely
+  up to the client, so parallelization choices like this do not impact the
+  network protocol's interoperability at all.
+
+3.3. Client sends PoW in INTRO1 cell [INTRO1_POW]
+
+  Now that the client has an answer to the puzzle it's time to encode it into
+  an INTRODUCE1 cell. To do so the client adds an extension to the encrypted
+  portion of the INTRODUCE1 cell by using the EXTENSIONS field (see
+  [PROCESS_INTRO2] section in rend-spec-v3.txt). The encrypted portion of the
+  INTRODUCE1 cell only gets read by the onion service and is ignored by the
+  introduction point.
+
+  We propose a new EXT_FIELD_TYPE value:
+
+     [02] -- PROOF_OF_WORK
+
+   The EXT_FIELD content format is:
+
+        POW_VERSION    [1 byte]
+        POW_NONCE      [16 bytes]
+        POW_EFFORT     [4 bytes]
+        POW_SEED       [4 bytes]
+        POW_SOLUTION   [16 bytes]
+
+   where:
+
+    POW_VERSION is 1 for the protocol specified in this proposal
+    POW_NONCE is the nonce 'N' from the section above
+    POW_EFFORT is the 32-bit integer effort value, in network byte order
+    POW_SEED is the first 4 bytes of the seed used
+
+   This will increase the INTRODUCE1 payload size by 43 bytes since the
+   extension type and length is 2 extra bytes, the N_EXTENSIONS field is always
+   present and currently set to 0 and the EXT_FIELD is 41 bytes. According to
+   ticket #33650, INTRODUCE1 cells currently have more than 200 bytes
+   available.
+
+3.4. Service verifies PoW and handles the introduction  [SERVICE_VERIFY]
+
+   When a service receives an INTRODUCE1 with the PROOF_OF_WORK extension, it
+   should check its configuration on whether proof-of-work is enabled on the
+   service. If it's not enabled, the extension SHOULD BE ignored. If enabled,
+   even if the suggested effort is currently zero, the service follows the
+   procedure detailed in this section.
+
+   If the service requires the PROOF_OF_WORK extension but received an
+   INTRODUCE1 cell without any embedded proof-of-work, the service SHOULD
+   consider this cell as a zero-effort introduction for the purposes of the
+   priority queue (see section [INTRO_QUEUE]).
+
+3.4.1. PoW verification [POW_VERIFY]
+
+   To verify the client's proof-of-work the service MUST do the following steps:
+
+      a) Find a valid seed C that starts with POW_SEED. Fail if no such seed
+         exists.
+      b) Fail if N = POW_NONCE is present in the replay cache
+              (see [REPLAY_PROTECTION])
+      c) Calculate R = ntohl(blake2b_32(P || ID || C || N || E || S))
+      d) Fail if R * E > UINT32_MAX
+      e) Fail if equix_verify(P || ID || C || N || E, S) != EQUIX_OK
+      f) Put the request in the queue with a priority of E
+
+   If any of these steps fail the service MUST ignore this introduction request
+   and abort the protocol.
+
+   In this proposal we call the above steps the "top half" of introduction
+   handling. If all the steps of the "top half" have passed, then the circuit
+   is added to the introduction queue as detailed in section [INTRO_QUEUE].
+
+```
author	Micah Elizabeth Scott <beth@torproject.org>	2023-11-08 11:14:31 -0800
committer	Micah Elizabeth Scott <beth@torproject.org>	2023-11-09 14:16:27 -0800
commit	8489561ae79210fd7561c5a799bd28c6654ffbab (patch)
tree	9991e1f73938d7f6f975be4ce819702585f07302 /spec/hspow-spec
parent	dbf045b6d38fa99f19041c926a988a7bc08f59e6 (diff)
download	torspec-8489561ae79210fd7561c5a799bd28c6654ffbab.tar.gz torspec-8489561ae79210fd7561c5a799bd28c6654ffbab.zip