aboutsummaryrefslogtreecommitdiff
path: root/spec/hspow-spec
diff options
context:
space:
mode:
authorMicah Elizabeth Scott <beth@torproject.org>2023-11-08 11:14:31 -0800
committerMicah Elizabeth Scott <beth@torproject.org>2023-11-09 14:16:27 -0800
commit8489561ae79210fd7561c5a799bd28c6654ffbab (patch)
tree9991e1f73938d7f6f975be4ce819702585f07302 /spec/hspow-spec
parentdbf045b6d38fa99f19041c926a988a7bc08f59e6 (diff)
downloadtorspec-8489561ae79210fd7561c5a799bd28c6654ffbab.tar.gz
torspec-8489561ae79210fd7561c5a799bd28c6654ffbab.zip
Split the hspow-spec into sections without making edits
This establishes a new outline for the hspow-spec. Otherwise, it's just cut and paste without any content edits. This also does not yet fully disentangling the common and v1-specific pieces of the document.
Diffstat (limited to 'spec/hspow-spec')
-rw-r--r--spec/hspow-spec/analysis-discussion.md591
-rw-r--r--spec/hspow-spec/common-protocol.md227
-rw-r--r--spec/hspow-spec/index.md1212
-rw-r--r--spec/hspow-spec/motivation.md106
-rw-r--r--spec/hspow-spec/overview.md68
-rw-r--r--spec/hspow-spec/v1-equix.md233
6 files changed, 1225 insertions, 1212 deletions
diff --git a/spec/hspow-spec/analysis-discussion.md b/spec/hspow-spec/analysis-discussion.md
new file mode 100644
index 0000000..2c9d43f
--- /dev/null
+++ b/spec/hspow-spec/analysis-discussion.md
@@ -0,0 +1,591 @@
+```text
+
+5. Attacker strategies [ATTACK_META]
+
+ Now that we defined our protocol we need to start tweaking the various
+ knobs. But before we can do that, we first need to understand a few
+ high-level attacker strategies to see what we are fighting against.
+
+5.1.1. Overwhelm PoW verification (aka "Overwhelm top half") [ATTACK_TOP_HALF]
+
+ A basic attack here is the adversary spamming with bogus INTRO cells so that
+ the service does not have computing capacity to even verify the
+ proof-of-work. This adversary tries to overwhelm the procedure in the
+ [POW_VERIFY] section.
+
+ That's why we need the PoW algorithm to have a cheap verification time so
+ that this attack is not possible: we tune this PoW parameter in section
+ [POW_TUNING_VERIFICATION].
+
+5.1.2. Overwhelm rendezvous capacity (aka "Overwhelm bottom half")
+ [ATTACK_BOTTOM_HALF]
+
+ Given the way the introduction queue works (see [HANDLE_QUEUE]), a very
+ effective strategy for the attacker is to totally overwhelm the queue
+ processing by sending more high-effort introductions than the onion service
+ can handle at any given tick. This adversary tries to overwhelm the procedure
+ in the [HANDLE_QUEUE] section.
+
+ To do so, the attacker would have to send at least 20 high-effort
+ introduction cells every 100ms, where high-effort is a PoW which is above the
+ estimated level of "the motivated user" (see [USER_MODEL]).
+
+ An easier attack for the adversary, is the same strategy but with
+ introduction cells that are all above the comfortable level of "the standard
+ user" (see [USER_MODEL]). This would block out all standard users and only
+ allow motivated users to pass.
+
+5.1.3. Hybrid overwhelm strategy [ATTACK_HYBRID]
+
+ If both the top- and bottom- halves are processed by the same thread, this
+ opens up the possibility for a "hybrid" attack. Given the performance figures
+ for the bottom half (0.31 ms/req.) and the top half (5.5 ms/req.), the
+ attacker can optimally deny service by submitting 91 high-effort requests and
+ 1520 invalid requests per second. This will completely saturate the main loop
+ because:
+
+ 0.31*(1520+91) ~ 0.5 sec.
+ 5.5*91 ~ 0.5 sec.
+
+ This attack only has half the bandwidth requirement of [ATTACK_TOP_HALF] and
+ half the compute requirement of [ATTACK_BOTTOM_HALF].
+
+ Alternatively, the attacker can adjust the ratio between invalid and
+ high-effort requests depending on their bandwidth and compute capabilities.
+
+5.1.4. Gaming the effort estimation logic [ATTACK_EFFORT]
+
+ Another way to beat this system is for the attacker to game the effort
+ estimation logic (see [EFFORT_ESTIMATION]). Essentially, there are two attacks
+ that we are trying to avoid:
+
+ - Attacker sets descriptor suggested-effort to a very high value effectively
+ making it impossible for most clients to produce a PoW token in a
+ reasonable timeframe.
+ - Attacker sets descriptor suggested-effort to a very small value so that
+ most clients aim for a small value while the attacker comfortably launches
+ an [ATTACK_BOTTOM_HALF] using medium effort PoW (see [REF_TEVADOR_1])
+
+5.1.4. Precomputed PoW attack
+
+ The attacker may precompute many valid PoW nonces and submit them all at once
+ before the current seed expires, overwhelming the service temporarily even
+ using a single computer. The current scheme gives the attackers 4 hours to
+ launch this attack since each seed lasts 2 hours and the service caches two
+ seeds.
+
+ An attacker with this attack might be aiming to DoS the service for a limited
+ amount of time, or to cause an [ATTACK_EFFORT] attack.
+
+6. Parameter tuning [POW_TUNING]
+
+ There are various parameters in this PoW system that need to be tuned:
+
+ We first start by tuning the time it takes to verify a PoW token. We do this
+ first because it's fundamental to the performance of onion services and can
+ turn into a DoS vector of its own. We will do this tuning in a way that's
+ agnostic to the chosen PoW function.
+
+ We will then move towards analyzing the client starting difficulty setting
+ for our PoW system. That defines the expected time for clients to succeed in
+ our system, and the expected time for attackers to overwhelm our system. Same
+ as above we will do this in a way that's agnostic to the chosen PoW function.
+
+ Currently, we have hardcoded the initial client starting difficulty at 8,
+ but this may be too low to ramp up quickly to various on and off attack
+ patterns. A higher initial difficulty may be needed for these, depending on
+ their severity. This section gives us an idea of how large such attacks can
+ be.
+
+ Finally, using those two pieces we will tune our PoW function and pick the
+ right client starting difficulty setting. At the end of this section we will
+ know the resources that an attacker needs to overwhelm the onion service, the
+ resources that the service needs to verify introduction requests, and the
+ resources that legitimate clients need to get to the onion service.
+
+6.1. PoW verification [POW_TUNING_VERIFICATION]
+
+ Verifying a PoW token is the first thing that a service does when it receives
+ an INTRODUCE2 cell and it's detailed in section [POW_VERIFY]. This
+ verification happens during the "top half" part of the process. Every
+ millisecond spent verifying PoW adds overhead to the already existing "top
+ half" part of handling an introduction cell. Hence we should be careful to
+ add minimal overhead here so that we don't enable attacks like [ATTACK_TOP_HALF].
+
+ During our performance measurements in [TOR_MEASUREMENTS] we learned that the
+ "top half" takes about 0.26 msecs in average, without doing any sort of PoW
+ verification. Using that value we compute the following table, that describes
+ the number of cells we can queue per second (aka times we can perform the
+ "top half" process) for different values of PoW verification time:
+
+ +---------------------+-----------------------+--------------+
+ |PoW Verification Time| Total "top half" time | Cells Queued |
+ | | | per second |
+ |---------------------|-----------------------|--------------|
+ | 0 msec | 0.26 msec | 3846 |
+ | 1 msec | 1.26 msec | 793 |
+ | 2 msec | 2.26 msec | 442 |
+ | 3 msec | 3.26 msec | 306 |
+ | 4 msec | 4.26 msec | 234 |
+ | 5 msec | 5.26 msec | 190 |
+ | 6 msec | 6.26 msec | 159 |
+ | 7 msec | 7.26 msec | 137 |
+ | 8 msec | 8.26 msec | 121 |
+ | 9 msec | 9.26 msec | 107 |
+ | 10 msec | 10.26 msec | 97 |
+ +---------------------+-----------------------+--------------+
+
+ Here is how you can read the table above:
+
+ - For a PoW function with a 1ms verification time, an attacker needs to send
+ 793 dummy introduction cells per second to succeed in a [ATTACK_TOP_HALF] attack.
+
+ - For a PoW function with a 2ms verification time, an attacker needs to send
+ 442 dummy introduction cells per second to succeed in a [ATTACK_TOP_HALF] attack.
+
+ - For a PoW function with a 10ms verification time, an attacker needs to send
+ 97 dummy introduction cells per second to succeed in a [ATTACK_TOP_HALF] attack.
+
+ Whether an attacker can succeed at that depends on the attacker's resources,
+ but also on the network's capacity.
+
+ Our purpose here is to have the smallest PoW verification overhead possible
+ that also allows us to achieve all our other goals.
+
+ [Note that the table above is simply the result of a naive multiplication and
+ does not take into account all the auxiliary overheads that happen every
+ second like the time to invoke the mainloop, the bottom-half processes, or
+ pretty much anything other than the "top-half" processing.
+
+ During our measurements the time to handle INTRODUCE2 cells dominates any
+ other action time: There might be events that require a long processing time,
+ but these are pretty infrequent (like uploading a new HS descriptor) and
+ hence over a long time they smooth out. Hence extrapolating the total cells
+ queued per second based on a single "top half" time seems like good enough to
+ get some initial intuition. That said, the values of "Cells queued per
+ second" from the table above, are likely much smaller than displayed above
+ because of all the auxiliary overheads.]
+
+6.2. PoW difficulty analysis [POW_DIFFICULTY_ANALYSIS]
+
+ The difficulty setting of our PoW basically dictates how difficult it should
+ be to get a success in our PoW system. An attacker who can get many successes
+ per second can pull a successful [ATTACK_BOTTOM_HALF] attack against our
+ system.
+
+ In classic PoW systems, "success" is defined as getting a hash output below
+ the "target". However, since our system is dynamic, we define "success" as an
+ abstract high-effort computation.
+
+ Our system is dynamic but we still need a starting difficulty setting that
+ will be used for bootstrapping the system. The client and attacker can still
+ aim higher or lower but for UX purposes and for analysis purposes we do need
+ to define a starting difficulty, to minimize retries by clients.
+
+6.2.1. Analysis based on adversary power
+
+ In this section we will try to do an analysis of PoW difficulty without using
+ any sort of Tor-related or PoW-related benchmark numbers.
+
+ We created the table (see [REF_TABLE]) below which shows how much time a
+ legitimate client with a single machine should expect to burn before they get
+ a single success. The x-axis is how many successes we want the attacker to be
+ able to do per second: the more successes we allow the adversary, the more
+ they can overwhelm our introduction queue. The y-axis is how many machines
+ the adversary has in her disposal, ranging from just 5 to 1000.
+
+ ===============================================================
+ | Expected Time (in seconds) Per Success For One Machine |
+ ===========================================================================
+ | |
+ | Attacker Succeses 1 5 10 20 30 50 |
+ | per second |
+ | |
+ | 5 5 1 0 0 0 0 |
+ | 50 50 10 5 2 1 1 |
+ | 100 100 20 10 5 3 2 |
+ | Attacker 200 200 40 20 10 6 4 |
+ | Boxes 300 300 60 30 15 10 6 |
+ | 400 400 80 40 20 13 8 |
+ | 500 500 100 50 25 16 10 |
+ | 1000 1000 200 100 50 33 20 |
+ | |
+ ============================================================================
+
+ Here is how you can read the table above:
+
+ - If an adversary has a botnet with 1000 boxes, and we want to limit her to 1
+ success per second, then a legitimate client with a single box should be
+ expected to spend 1000 seconds getting a single success.
+
+ - If an adversary has a botnet with 1000 boxes, and we want to limit her to 5
+ successes per second, then a legitimate client with a single box should be
+ expected to spend 200 seconds getting a single success.
+
+ - If an adversary has a botnet with 500 boxes, and we want to limit her to 5
+ successes per second, then a legitimate client with a single box should be
+ expected to spend 100 seconds getting a single success.
+
+ - If an adversary has access to 50 boxes, and we want to limit her to 5
+ successes per second, then a legitimate client with a single box should be
+ expected to spend 10 seconds getting a single success.
+
+ - If an adversary has access to 5 boxes, and we want to limit her to 5
+ successes per second, then a legitimate client with a single box should be
+ expected to spend 1 seconds getting a single success.
+
+ With the above table we can create some profiles for starting values of our
+ PoW difficulty.
+
+6.2.2. Analysis based on Tor's performance [POW_DIFFICULTY_TOR]
+
+ To go deeper here, we can use the performance measurements from
+ [TOR_MEASUREMENTS] to get a more specific intuition on the starting
+ difficulty. In particular, we learned that completely handling an
+ introduction cell takes 5.55 msecs in average. Using that value, we can
+ compute the following table, that describes the number of introduction cells
+ we can handle per second for different values of PoW verification:
+
+ +---------------------+-----------------------+--------------+
+ |PoW Verification Time| Total time to handle | Cells handled|
+ | | introduction cell | per second |
+ |---------------------|-----------------------|--------------|
+ | 0 msec | 5.55 msec | 180.18 |
+ | 1 msec | 6.55 msec | 152.67 |
+ | 2 msec | 7.55 msec | 132.45 |
+ | 3 msec | 8.55 msec | 116.96 |
+ | 4 msec | 9.55 mesc | 104.71 |
+ | 5 msec | 10.55 msec | 94.79 |
+ | 6 msec | 11.55 msec | 86.58 |
+ | 7 msec | 12.55 msec | 79.68 |
+ | 8 msec | 13.55 msec | 73.80 |
+ | 9 msec | 14.55 msec | 68.73 |
+ | 10 msec | 15.55 msec | 64.31 |
+ +---------------------+-----------------------+--------------+
+
+ Here is how you can read the table above:
+
+ - For a PoW function with a 1ms verification time, an attacker needs to send
+ 152 high-effort introduction cells per second to succeed in a
+ [ATTACK_BOTTOM_HALF] attack.
+
+ - For a PoW function with a 10ms verification time, an attacker needs to send
+ 64 high-effort introduction cells per second to succeed in a
+ [ATTACK_BOTTOM_HALF] attack.
+
+ We can use this table to specify a starting difficulty that won't allow our
+ target adversary to succeed in an [ATTACK_BOTTOM_HALF] attack.
+
+ Of course, when it comes to this table, the same disclaimer as in section
+ [POW_TUNING_VERIFICATION] is valid. That is, the above table is just a
+ theoretical extrapolation and we expect the real values to be much lower
+ since they depend on auxiliary processing overheads, and on the network's
+ capacity.
+
+
+7. Discussion
+
+7.1. UX
+
+ This proposal has user facing UX consequences.
+
+ When the client first attempts a pow, it can note how long iterations of the
+ hash function take, and then use this to determine an estimation of the
+ duration of the PoW. This estimation could be communicated via the control
+ port or other mechanism, such that the browser could display how long the
+ PoW is expected to take on their device. If the device is a mobile platform,
+ and this time estimation is large, it could recommend that the user try from
+ a desktop machine.
+
+7.2. Future work [FUTURE_WORK]
+
+7.2.1. Incremental improvements to this proposal
+
+ There are various improvements that can be done in this proposal, and while
+ we are trying to keep this v1 version simple, we need to keep the design
+ extensible so that we build more features into it. In particular:
+
+ - End-to-end introduction ACKs
+
+ This proposal suffers from various UX issues because there is no end-to-end
+ mechanism for an onion service to inform the client about its introduction
+ request. If we had end-to-end introduction ACKs many of the problems from
+ [CLIENT_BEHAVIOR] would be alleviated. The problem here is that end-to-end
+ ACKs require modifications on the introduction point code and a network
+ update which is a lengthy process.
+
+ - Multithreading scheduler
+
+ Our scheduler is pretty limited by the fact that Tor has a single-threaded
+ design. If we improve our multithreading support we could handle a much
+ greater amount of introduction requests per second.
+
+7.2.2. Future designs [FUTURE_DESIGNS]
+
+ This is just the beginning in DoS defences for Tor and there are various
+ future designs and schemes that we can investigate. Here is a brief summary
+ of these:
+
+ "More advanced PoW schemes" -- We could use more advanced memory-hard PoW
+ schemes like MTP-argon2 or Itsuku to make it even harder for
+ adversaries to create successful PoWs. Unfortunately these schemes
+ have much bigger proof sizes, and they won't fit in INTRODUCE1 cells.
+ See #31223 for more details.
+
+ "Third-party anonymous credentials" -- We can use anonymous credentials and a
+ third-party token issuance server on the clearnet to issue tokens
+ based on PoW or CAPTCHA and then use those tokens to get access to the
+ service. See [REF_CREDS] for more details.
+
+ "PoW + Anonymous Credentials" -- We can make a hybrid of the above ideas
+ where we present a hard puzzle to the user when connecting to the
+ onion service, and if they solve it we then give the user a bunch of
+ anonymous tokens that can be used in the future. This can all happen
+ between the client and the service without a need for a third party.
+
+ All of the above approaches are much more complicated than this proposal, and
+ hence we want to start easy before we get into more serious projects.
+
+7.3. Environment
+
+ We love the environment! We are concerned of how PoW schemes can waste energy
+ by doing useless hash iterations. Here is a few reasons we still decided to
+ pursue a PoW approach here:
+
+ "We are not making things worse" -- DoS attacks are already happening and
+ attackers are already burning energy to carry them out both on the
+ attacker side, on the service side and on the network side. We think that
+ asking legitimate clients to carry out PoW computations is not gonna
+ affect the equation too much, since an attacker right now can very
+ quickly cause the same damage that hundreds of legitimate clients do a
+ whole day.
+
+ "We hope to make things better" -- The hope is that proposals like this will
+ make the DoS actors go away and hence the PoW system will not be used. As
+ long as DoS is happening there will be a waste of energy, but if we
+ manage to demotivate them with technical means, the network as a whole
+ will less wasteful. Also see [CATCH22] for a similar argument.
+
+8. Acknowledgements
+
+ Thanks a lot to tevador for the various improvements to the proposal and for
+ helping us understand and tweak the RandomX scheme.
+
+ Thanks to Solar Designer for the help in understanding the current PoW
+ landscape, the various approaches we could take, and teaching us a few neat
+ tricks.
+
+Appendix A. Little-t tor introduction scheduler
+
+ This section describes how we will implement this proposal in the "tor"
+ software (little-t tor).
+
+ The following should be read as if tor is an onion service and thus the end
+ point of all inbound data.
+
+A.1. The Main Loop [MAIN_LOOP]
+
+ Tor uses libevent for its mainloop. For network I/O operations, a mainloop
+ event is used to inform tor if it can read on a certain socket, or a
+ connection object in tor.
+
+ From there, this event will empty the connection input buffer (inbuf) by
+ extracting and processing a cell at a time. The mainloop is single threaded
+ and thus each cell is handled sequentially.
+
+ Processing an INTRODUCE2 cell at the onion service means a series of
+ operations (in order):
+
+ 1) Unpack cell from inbuf to local buffer.
+
+ 2) Decrypt cell (AES operations).
+
+ 3) Parse cell header and process it depending on its RELAY_COMMAND.
+
+ 4) INTRODUCE2 cell handling which means building a rendezvous circuit:
+ i) Path selection
+ ii) Launch circuit to first hop.
+
+ 5) Return to mainloop event which essentially means back to step (1).
+
+ Tor will read at most 32 cells out of the inbuf per mainloop round.
+
+A.2. Requirements for PoW
+
+ With this proposal, in order to prioritize cells by the amount of PoW work
+ it has done, cells can _not_ be processed sequentially as described above.
+
+ Thus, we need a way to queue a certain number of cells, prioritize them and
+ then process some cell(s) from the top of the queue (that is, the cells that
+ have done the most PoW effort).
+
+ We thus require a new cell processing flow that is _not_ compatible with
+ current tor design. The elements are:
+
+ - Validate PoW and place cells in a priority queue of INTRODUCE2 cells (as
+ described in section [INTRO_QUEUE]).
+
+ - Defer "bottom half" INTRO2 cell processing for after cells have been
+ queued into the priority queue.
+
+A.3. Proposed scheduler [TOR_SCHEDULER]
+
+ The intuitive way to address the A.2 requirements would be to do this
+ simple and naive approach:
+
+ 1) Mainloop: Empty inbuf INTRODUCE2 cells into priority queue
+
+ 2) Process all cells in pqueue
+
+ 3) Goto (1)
+
+ However, we are worried that handling all those cells before returning to the
+ mainloop opens possibilities of attack by an adversary since the priority
+ queue is not gonna be kept up to date while we process all those cells. This
+ means that we might spend lots of time dealing with introductions that don't
+ deserve it. See [BOTTOM_HALF_SCHEDULER] for more details.
+
+ We thus propose to split the INTRODUCE2 handling into two different steps:
+ "top half" and "bottom half" process, as also mentioned in [POW_VERIFY]
+ section above.
+
+A.3.1. Top half and bottom half scheduler
+
+ The top half process is responsible for queuing introductions into the
+ priority queue as follows:
+
+ a) Unpack cell from inbuf to local buffer.
+
+ b) Decrypt cell (AES operations).
+
+ c) Parse INTRODUCE2 cell header and validate PoW.
+
+ d) Return to mainloop event which essentially means step (1).
+
+ The top-half basically does all operations of section [MAIN_LOOP] except from (4).
+
+ An then, the bottom-half process is responsible for handling introductions
+ and doing rendezvous. To achieve this we introduce a new mainloop event to
+ process the priority queue _after_ the top-half event has completed. This new
+ event would do these operations sequentially:
+
+ a) Pop INTRODUCE2 cell from priority queue.
+
+ b) Parse and process INTRODUCE2 cell.
+
+ c) End event and yield back to mainloop.
+
+A.3.2. Scheduling the bottom half process [BOTTOM_HALF_SCHEDULER]
+
+ The question now becomes: when should the "bottom half" event get triggered
+ from the mainloop?
+
+ We propose that this event is scheduled in when the network I/O event
+ queues at least 1 cell into the priority queue. Then, as long as it has a
+ cell in the queue, it would re-schedule itself for immediate execution
+ meaning at the next mainloop round, it would execute again.
+
+ The idea is to try to empty the queue as fast as it can in order to provide a
+ fast response time to an introduction request but always leave a chance for
+ more cells to appear between cell processing by yielding back to the
+ mainloop. With this we are aiming to always have the most up-to-date version
+ of the priority queue when we are completing introductions: this way we are
+ prioritizing clients that spent a lot of time and effort completing their PoW.
+
+ If the size of the queue drops to 0, it stops scheduling itself in order to
+ not create a busy loop. The network I/O event will re-schedule it in time.
+
+ Notice that the proposed solution will make the service handle 1 single
+ introduction request at every main loop event. However, when we do
+ performance measurements we might learn that it's preferable to bump the
+ number of cells in the future from 1 to N where N <= 32.
+
+A.4 Performance measurements
+
+ This section will detail the performance measurements we've done on tor.git
+ for handling an INTRODUCE2 cell and then a discussion on how much more CPU
+ time we can add (for PoW validation) before it badly degrades our
+ performance.
+
+A.4.1 Tor measurements [TOR_MEASUREMENTS]
+
+ In this section we will derive measurement numbers for the "top half" and
+ "bottom half" parts of handling an introduction cell.
+
+ These measurements have been done on tor.git at commit
+ 80031db32abebaf4d0a91c01db258fcdbd54a471.
+
+ We've measured several set of actions of the INTRODUCE2 cell handling process
+ on Intel(R) Xeon(R) CPU E5-2650 v4. Our service was accessed by an array of
+ clients that sent introduction requests for a period of 60 seconds.
+
+ 1. Full Mainloop Event
+
+ We start by measuring the full time it takes for a mainloop event to
+ process an inbuf containing INTRODUCE2 cells. The mainloop event processed
+ 2.42 cells per invocation on average during our measurements.
+
+ Total measurements: 3279
+
+ Min: 0.30 msec - 1st Q.: 5.47 msec - Median: 5.91 msec
+ Mean: 13.43 msec - 3rd Q.: 16.20 msec - Max: 257.95 msec
+
+ 2. INTRODUCE2 cell processing (bottom-half)
+
+ We also measured how much time the "bottom half" part of the process
+ takes. That's the heavy part of processing an introduction request as seen
+ in step (4) of the [MAIN_LOOP] section:
+
+ Total measurements: 7931
+
+ Min: 0.28 msec - 1st Q.: 5.06 msec - Median: 5.33 msec
+ Mean: 5.29 msec - 3rd Q.: 5.57 msec - Max: 14.64 msec
+
+ 3. Connection data read (top half)
+
+ Now that we have the above pieces, we can use them to measure just the
+ "top half" part of the procedure. That's when bytes are taken from the
+ connection inbound buffer and parsed into an INTRODUCE2 cell where basic
+ validation is done.
+
+ There is an average of 2.42 INTRODUCE2 cells per mainloop event and so we
+ divide that by the full mainloop event mean time to get the time for one
+ cell. From that we subtract the "bottom half" mean time to get how much
+ the "top half" takes:
+
+ => 13.43 / (7931 / 3279) = 5.55
+ => 5.55 - 5.29 = 0.26
+
+ Mean: 0.26 msec
+
+ To summarize, during our measurements the average number of INTRODUCE2 cells
+ a mainloop event processed is ~2.42 cells (7931 cells for 3279 mainloop
+ invocations).
+
+ This means that, taking the mean of mainloop event times, it takes ~5.55msec
+ (13.43/2.42) to completely process an INTRODUCE2 cell. Then if we look deeper
+ we see that the "top half" of INTRODUCE2 cell processing takes 0.26 msec in
+ average, whereas the "bottom half" takes around 5.33 msec.
+
+ The heavyness of the "bottom half" is to be expected since that's where 95%
+ of the total work takes place: in particular the rendezvous path selection
+ and circuit launch.
+
+A.2. References
+
+ [REF_EQUIX]: https://github.com/tevador/equix
+ https://github.com/tevador/equix/blob/master/devlog.md
+ [REF_TABLE]: The table is based on the script below plus some manual editing for readability:
+ https://gist.github.com/asn-d6/99a936b0467b0cef88a677baaf0bbd04
+ [REF_BOTNET]: https://media.kasperskycontenthub.com/wp-content/uploads/sites/43/2009/07/01121538/ynam_botnets_0907_en.pdf
+ [REF_CREDS]: https://lists.torproject.org/pipermail/tor-dev/2020-March/014198.html
+ [REF_TARGET]: https://en.bitcoin.it/wiki/Target
+ [REF_TLS]: https://www.ietf.org/archive/id/draft-nygren-tls-client-puzzles-02.txt
+ https://tools.ietf.org/id/draft-nir-tls-puzzles-00.html
+ https://tools.ietf.org/html/draft-ietf-ipsecme-ddos-protection-10
+ [REF_TLS_1]: https://www.ietf.org/archive/id/draft-nygren-tls-client-puzzles-02.txt
+ [REF_TEVADOR_1]: https://lists.torproject.org/pipermail/tor-dev/2020-May/014268.html
+ [REF_TEVADOR_2]: https://lists.torproject.org/pipermail/tor-dev/2020-June/014358.html
+ [REF_TEVADOR_SIM]: https://github.com/mikeperry-tor/scratchpad/blob/master/tor-pow/effort_sim.py#L57
+
+```
diff --git a/spec/hspow-spec/common-protocol.md b/spec/hspow-spec/common-protocol.md
new file mode 100644
index 0000000..0aa9df5
--- /dev/null
+++ b/spec/hspow-spec/common-protocol.md
@@ -0,0 +1,227 @@
+```text
+
+3. Protocol specification
+
+3.4.1.1. Replay protection [REPLAY_PROTECTION]
+
+ The service MUST NOT accept introduction requests with the same (seed, nonce)
+ tuple. For this reason a replay protection mechanism must be employed.
+
+ The simplest way is to use a simple hash table to check whether a (seed,
+ nonce) tuple has been used before for the active duration of a
+ seed. Depending on how long a seed stays active this might be a viable
+ solution with reasonable memory/time overhead.
+
+ If there is a worry that we might get too many introductions during the
+ lifetime of a seed, we can use a Bloom filter as our replay cache
+ mechanism. The probabilistic nature of Bloom filters means that sometimes we
+ will flag some connections as replays even if they are not; with this false
+ positive probability increasing as the number of entries increase. However,
+ with the right parameter tuning this probability should be negligible and
+ well handled by clients.
+
+ {TODO: Design and specify a suitable bloom filter for this purpose.}
+
+3.4.2. The Introduction Queue [INTRO_QUEUE]
+
+3.4.2.1. Adding introductions to the introduction queue [ADD_QUEUE]
+
+ When PoW is enabled and a verified introduction comes through, the service
+ instead of jumping straight into rendezvous, queues it and prioritizes it
+ based on how much effort was devoted by the client to PoW. This means that
+ introduction requests with high effort should be prioritized over those with
+ low effort.
+
+ To do so, the service maintains an "introduction priority queue" data
+ structure. Each element in that priority queue is an introduction request,
+ and its priority is the effort put into its PoW:
+
+ When a verified introduction comes through, the service uses its included
+ effort commitment value to place each request into the right position of the
+ priority_queue: The bigger the effort, the more priority it gets in the
+ queue. If two elements have the same effort, the older one has priority over
+ the newer one.
+
+3.4.2.2. Handling introductions from the introduction queue [HANDLE_QUEUE]
+
+ The service should handle introductions by pulling from the introduction
+ queue. We call this part of introduction handling the "bottom half" because
+ most of the computation happens in this stage. For a description of how we
+ expect such a system to work in Tor, see [TOR_SCHEDULER] section.
+
+3.4.3. PoW effort estimation [EFFORT_ESTIMATION]
+
+3.4.3.1. High-level description of the effort estimation process
+
+ The service starts with a default suggested-effort value of 0, which keeps
+ the PoW defenses dormant until we notice signs of overload.
+
+ The overall process of determining effort can be thought of as a set of
+ multiple coupled feedback loops. Clients perform their own effort
+ adjustments via [CLIENT_TIMEOUT] atop a base effort suggested by the service.
+ That suggestion incorporates the service's control adjustments atop a base
+ effort calculated using a sum of currently-queued client effort.
+
+ Each feedback loop has an opportunity to cover different time scales. Clients
+ can make adjustments at every single circuit creation request, whereas
+ services are limited by the extra load that frequent updates would place on
+ HSDir nodes.
+
+ In the combined client/service system these client-side increases are
+ expected to provide the most effective quick response to an emerging DoS
+ attack. After early clients increase the effort using [CLIENT_TIMEOUT],
+ later clients will benefit from the service detecting this increased queued
+ effort and offering a larger suggested_effort.
+
+ Effort increases and decreases both have an intrinsic cost. Increasing effort
+ will make the service more expensive to contact, and decreasing effort makes
+ new requests likely to become backlogged behind older requests. The steady
+ state condition is preferable to either of these side-effects, but ultimately
+ it's expected that the control loop always oscillates to some degree.
+
+3.4.3.2. Service-side effort estimation
+
+ Services keep an internal effort estimation which updates on a regular
+ periodic timer in response to measurements made on the queueing behavior
+ in the previous period. These internal effort changes can optionally trigger
+ client-visible suggested_effort changes when the difference is great enough
+ to warrant republishing to the HSDir.
+
+ This evaluation and update period is referred to as HS_UPDATE_PERIOD.
+ The service side effort estimation takes inspiration from TCP congestion
+ control's additive increase / multiplicative decrease approach, but unlike
+ a typical AIMD this algorithm is fixed-rate and doesn't update immediately
+ in response to events.
+
+ {TODO: HS_UPDATE_PERIOD is hardcoded to 300 (5 minutes) currently, but it
+ should be configurable in some way. Is it more appropriate to use the
+ service's torrc here or a consensus parameter?}
+
+3.4.3.3. Per-period service state
+
+ During each update period, the service maintains some state:
+
+ 1. TOTAL_EFFORT, a sum of all effort values for rendezvous requests that
+ were successfully validated and enqueued.
+
+ 2. REND_HANDLED, a count of rendezvous requests that were actually
+ launched. Requests that made it to dequeueing but were too old to launch
+ by then are not included.
+
+ 3. HAD_QUEUE, a flag which is set if at any time in the update period we
+ saw the priority queue filled with more than a minimum amount of work,
+ greater than we would expect to process in approximately 1/4 second
+ using the configured dequeue rate.
+
+ 4. MAX_TRIMMED_EFFORT, the largest observed single request effort that we
+ discarded during the period. Requests are discarded either due to age
+ (timeout) or during culling events that discard the bottom half of the
+ entire queue when it's too full.
+
+3.4.3.4. Service AIMD conditions
+
+ At the end of each period, the service may decide to increase effort,
+ decrease effort, or make no changes, based on these accumulated state values:
+
+ 1. If MAX_TRIMMED_EFFORT > our previous internal suggested_effort,
+ always INCREASE. Requests that follow our latest advice are being
+ dropped.
+
+ 2. If the HAD_QUEUE flag was set and the queue still contains at least
+ one item with effort >= our previous internal suggested_effort,
+ INCREASE. Even if we haven't yet reached the point of dropping requests,
+ this signal indicates that the our latest suggestion isn't high enough
+ and requests will build up in the queue.
+
+ 3. If neither condition (1) or (2) are taking place and the queue is below
+ a level we would expect to process in approximately 1/4 second, choose
+ to DECREASE.
+
+ 4. If none of these conditions match, the suggested effort is unchanged.
+
+ When we INCREASE, the internal suggested_effort is increased to either its
+ previous value + 1, or (TOTAL_EFFORT / REND_HANDLED), whichever is larger.
+
+ When we DECREASE, the internal suggested_effort is scaled by 2/3rds.
+
+ Over time, this will continue to decrease our effort suggestion any time the
+ service is fully processing its request queue. If the queue stays empty, the
+ effort suggestion decreases to zero and clients should no longer submit a
+ proof-of-work solution with their first connection attempt.
+
+ It's worth noting that the suggested-effort is not a hard limit to the
+ efforts that are accepted by the service, and it's only meant to serve as a
+ guideline for clients to reduce the number of unsuccessful requests that get
+ to the service. The service still adds requests with lower effort than
+ suggested-effort to the priority queue in [ADD_QUEUE].
+
+3.4.3.5. Updating descriptor with new suggested effort
+
+ The service descriptors may be updated for multiple reasons including
+ introduction point rotation common to all v3 onion services, the scheduled
+ seed rotations described in [DESC_POW], and updates to the effort suggestion.
+ Even though the internal effort estimate updates on a regular timer, we avoid
+ propagating those changes into the descriptor and the HSDir hosts unless
+ there is a significant change.
+
+ If the PoW params otherwise match but the seed has changed by less than 15
+ percent, services SHOULD NOT upload a new descriptor.
+
+4. Client behavior [CLIENT_BEHAVIOR]
+
+ This proposal introduces a bunch of new ways where a legitimate client can
+ fail to reach the onion service.
+
+ Furthermore, there is currently no end-to-end way for the onion service to
+ inform the client that the introduction failed. The INTRO_ACK cell is not
+ end-to-end (it's from the introduction point to the client) and hence it does
+ not allow the service to inform the client that the rendezvous is never gonna
+ occur.
+
+ From the client's perspective there's no way to attribute this failure to
+ the service itself rather than the introduction point, so error accounting
+ is performed separately for each introduction-point. Existing mechanisms
+ will discard an introduction point that's required too many retries.
+
+4.1. Clients handling timeouts [CLIENT_TIMEOUT]
+
+ Alice can fail to reach the onion service if her introduction request gets
+ trimmed off the priority queue in [HANDLE_QUEUE], or if the service does not
+ get through its priority queue in time and the connection times out.
+
+ This section presents a heuristic method for the client getting service even
+ in such scenarios.
+
+ If the rendezvous request times out, the client SHOULD fetch a new descriptor
+ for the service to make sure that it's using the right suggested-effort for
+ the PoW and the right PoW seed. If the fetched descriptor includes a new
+ suggested effort or seed, it should first retry the request with these
+ parameters.
+
+ {TODO: This is not actually implemented yet, but we should do it. How often
+ should clients at most try to fetch new descriptors? Determined by a
+ consensus parameter? This change will also allow clients to retry
+ effectively in cases where the service has just been reconfigured to
+ enable PoW defenses.}
+
+ Every time the client retries the connection, it will count these failures
+ per-introduction-point. These counts of previous retries are combined with
+ the service's suggested_effort when calculating the actual effort to spend
+ on any individual request to a service that advertises PoW support, even
+ when the currently advertised suggested_effort is zero.
+
+ On each retry, the client modifies its solver effort:
+
+ 1. If the effort is below (CLIENT_POW_EFFORT_DOUBLE_UNTIL = 1000)
+ it will be doubled.
+
+ 2. Otherwise, multiply the effort by (CLIENT_POW_RETRY_MULTIPLIER = 1.5).
+
+ 3. Constrain the new effort to be at least
+ (CLIENT_MIN_RETRY_POW_EFFORT = 8) and no greater than
+ (CLIENT_MAX_POW_EFFORT = 10000)
+
+ {TODO: These hardcoded limits should be replaced by timed limits and/or
+ an unlimited solver with robust cancellation. This is issue tor#40787}
+
+``` \ No newline at end of file
diff --git a/spec/hspow-spec/index.md b/spec/hspow-spec/index.md
index aea9c46..14db248 100644
--- a/spec/hspow-spec/index.md
+++ b/spec/hspow-spec/index.md
@@ -3,1215 +3,3 @@
The overall denial-of-service prevention strategies in Tor are described in the [Denial-of-service prevention mechanisms in Tor](../dos-spec/index.md) document. This document describes one specific mitigation, the proof-of-work client puzzle for onion service introduction.
This was originally [proposal 327, A First Take at PoW Over Introduction Circuits](../proposals/327-pow-over-intro.txt) authored by George Kadianakis, Mike Perry, David Goulet, and tevador.
-
-```text
-0. Abstract
-
- This proposal aims to thwart introduction flooding DoS attacks by introducing
- a dynamic Proof-Of-Work protocol that occurs over introduction circuits.
-
-1. Motivation
-
- So far our attempts at limiting the impact of introduction flooding DoS
- attacks on onion services has been focused on horizontal scaling with
- Onionbalance, optimizing the CPU usage of Tor and applying rate limiting.
- While these measures move the goalpost forward, a core problem with onion
- service DoS is that building rendezvous circuits is a costly procedure both
- for the service and for the network. For more information on the limitations
- of rate-limiting when defending against DDoS, see [REF_TLS_1].
-
- If we ever hope to have truly reachable global onion services, we need to
- make it harder for attackers to overload the service with introduction
- requests. This proposal achieves this by allowing onion services to specify
- an optional dynamic proof-of-work scheme that its clients need to participate
- in if they want to get served.
-
- With the right parameters, this proof-of-work scheme acts as a gatekeeper to
- block amplification attacks by attackers while letting legitimate clients
- through.
-
-1.1. Related work
-
- For a similar concept, see the three internet drafts that have been proposed
- for defending against TLS-based DDoS attacks using client puzzles [REF_TLS].
-
-1.2. Threat model [THREAT_MODEL]
-
-1.2.1. Attacker profiles [ATTACKER_MODEL]
-
- This proposal is written to thwart specific attackers. A simple PoW proposal
- cannot defend against all and every DoS attack on the Internet, but there are
- adversary models we can defend against.
-
- Let's start with some adversary profiles:
-
- "The script-kiddie"
-
- The script-kiddie has a single computer and pushes it to its
- limits. Perhaps it also has a VPS and a pwned server. We are talking about
- an attacker with total access to 10 GHz of CPU and 10 GB of RAM. We
- consider the total cost for this attacker to be zero $.
-
- "The small botnet"
-
- The small botnet is a bunch of computers lined up to do an introduction
- flooding attack. Assuming 500 medium-range computers, we are talking about
- an attacker with total access to 10 THz of CPU and 10 TB of RAM. We
- consider the upfront cost for this attacker to be about $400.
-
- "The large botnet"
-
- The large botnet is a serious operation with many thousands of computers
- organized to do this attack. Assuming 100k medium-range computers, we are
- talking about an attacker with total access to 200 THz of CPU and 200 TB of
- RAM. The upfront cost for this attacker is about $36k.
-
- We hope that this proposal can help us defend against the script-kiddie
- attacker and small botnets. To defend against a large botnet we would need
- more tools at our disposal (see [FUTURE_DESIGNS]).
-
-1.2.2. User profiles [USER_MODEL]
-
- We have attackers and we have users. Here are a few user profiles:
-
- "The standard web user"
-
- This is a standard laptop/desktop user who is trying to browse the
- web. They don't know how these defences work and they don't care to
- configure or tweak them. If the site doesn't load, they are gonna close
- their browser and be sad at Tor. They run a 2GHz computer with 4GB of RAM.
-
- "The motivated user"
-
- This is a user that really wants to reach their destination. They don't
- care about the journey; they just want to get there. They know what's going
- on; they are willing to make their computer do expensive multi-minute PoW
- computations to get where they want to be.
-
- "The mobile user"
-
- This is a motivated user on a mobile phone. Even tho they want to read the
- news article, they don't have much leeway on stressing their machine to do
- more computation.
-
- We hope that this proposal will allow the motivated user to always connect
- where they want to connect to, and also give more chances to the other user
- groups to reach the destination.
-
-1.2.3. The DoS Catch-22 [CATCH22]
-
- This proposal is not perfect and it does not cover all the use cases. Still,
- we think that by covering some use cases and giving reachability to the
- people who really need it, we will severely demotivate the attackers from
- continuing the DoS attacks and hence stop the DoS threat all together.
- Furthermore, by increasing the cost to launch a DoS attack, a big
- class of DoS attackers will disappear from the map, since the expected ROI
- will decrease.
-
-2. System Overview
-
-2.1. Tor protocol overview
-
- +----------------------------------+
- | Onion Service |
- +-------+ INTRO1 +-----------+ INTRO2 +--------+ |
- |Client |-------->|Intro Point|------->| PoW |-----------+ |
- +-------+ +-----------+ |Verifier| | |
- +--------+ | |
- | | |
- | | |
- | +----------v---------+ |
- | |Intro Priority Queue| |
- +---------+--------------------+---+
- | | |
- Rendezvous | | |
- circuits | | |
- v v v
-
-
-
- The proof-of-work scheme specified in this proposal takes place during the
- introduction phase of the onion service protocol.
-
- The system described in this proposal is not meant to be on all the time, and
- it can be entirely disabled for services that do not experience DoS attacks.
-
- When the subsystem is enabled, suggested effort is continuously adjusted and
- the computational puzzle can be bypassed entirely when the effort reaches
- zero. In these cases, the proof-of-work subsystem can be dormant but still
- provide the necessary parameters for clients to voluntarily provide effort
- in order to get better placement in the priority queue.
-
- The protocol involves the following major steps:
-
- 1) Service encodes PoW parameters in descriptor [DESC_POW]
- 2) Client fetches descriptor and computes PoW [CLIENT_POW]
- 3) Client completes PoW and sends results in INTRO1 cell [INTRO1_POW]
- 4) Service verifies PoW and queues introduction based on PoW effort
- [SERVICE_VERIFY]
- 5) Requests are continuously drained from the queue, highest effort first,
- subject to multiple constraints on speed [HANDLE_QUEUE]
-
-2.2. Proof-of-work overview
-
-2.2.1. Algorithm overview
-
- For our proof-of-work function we will use the Equi-X scheme by tevador
- [REF_EQUIX]. Equi-X is an asymmetric PoW function based on Equihash<60,3>,
- using HashX as the underlying layer. It features lightning fast verification
- speed, and also aims to minimize the asymmetry between CPU and GPU.
- Furthermore, it's designed for this particular use-case and hence
- cryptocurrency miners are not incentivized to make optimized ASICs for it.
-
- The overall scheme consists of several layers that provide different pieces
- of this functionality:
-
- 1) At the lowest layers, blake2b and siphash are used as hashing and PRNG
- algorithms that are well suited to common 64-bit CPUs.
- 2) A custom hash function family, HashX, randomizes its implementation for
- each new seed value. These functions are tuned to utilize the pipelined
- integer performance on a modern 64-bit CPU. This layer provides the
- strongest ASIC resistance, since a hardware reimplementation would need
- to include a CPU-like pipelined execution unit to keep up.
- 3) The Equi-X layer itself builds on HashX and adds an algorithmic puzzle
- that's designed to be strongly asymmetric and to require RAM to solve
- efficiently.
- 4) The PoW protocol itself builds on this Equi-X function with a particular
- construction of the challenge input and particular constraints on the
- allowed blake2b hash of the solution. This layer provides a linearly
- adjustable effort that we can verify.
- 5) Above the level of individual PoW handshakes, the client and service
- form a closed-loop system that adjusts the effort of future handshakes.
-
- The Equi-X scheme provides two functions that will be used in this proposal:
- - equix_solve(challenge) which solves a puzzle instance, returning
- a variable number of solutions per invocation depending on the specific
- challenge value.
- - equix_verify(challenge, solution) which verifies a puzzle solution
- quickly. Verification still depends on executing the HashX function,
- but far fewer times than when searching for a solution.
-
- For the purposes of this proposal, all cryptographic algorithms are assumed
- to produce and consume byte strings, even if internally they operate on
- some other data type like 64-bit words. This is conventionally little endian
- order for blake2b, which contrasts with Tor's typical use of big endian.
- HashX itself is configured with an 8-byte output but its input is a single
- 64-bit word of undefined byte order, of which only the low 16 bits are used
- by Equi-X in its solution output. We treat Equi-X solution arrays as byte
- arrays using their packed little endian 16-bit representation.
-
- We tune Equi-X in section [EQUIX_TUNING].
-
-2.2.2. Dynamic PoW
-
- DoS is a dynamic problem where the attacker's capabilities constantly change,
- and hence we want our proof-of-work system to be dynamic and not stuck with a
- static difficulty setting. Hence, instead of forcing clients to go below a
- static target like in Bitcoin to be successful, we ask clients to "bid" using
- their PoW effort. Effectively, a client gets higher priority the higher
- effort they put into their proof-of-work. This is similar to how
- proof-of-stake works but instead of staking coins, you stake work.
-
- The benefit here is that legitimate clients who really care about getting
- access can spend a big amount of effort into their PoW computation, which
- should guarantee access to the service given reasonable adversary models. See
- [PARAM_TUNING] for more details about these guarantees and tradeoffs.
-
- As a way to improve reachability and UX, the service tries to estimate the
- effort needed for clients to get access at any given time and places it in
- the descriptor. See [EFFORT_ESTIMATION] for more details.
-
-2.2.3. PoW effort
-
- It's common for proof-of-work systems to define an exponential effort
- function based on a particular number of leading zero bits or equivalent.
- For the benefit of our effort estimation system, it's quite useful if we
- instead have a linear scale. We use the first 32 bits of a hashed version
- of the Equi-X solution as compared to the full 32-bit range.
-
- Conceptually we could define a function:
- unsigned effort(uint8_t *token)
- which takes as its argument a hashed solution, interprets it as a
- bitstring, and returns the quotient of dividing a bitstring of 1s by it.
-
- So for example:
- effort(00000001100010101101) = 11111111111111111111
- / 00000001100010101101
- or the same in decimal:
- effort(6317) = 1048575 / 6317 = 165.
-
- In practice we can avoid even having to perform this division, performing
- just one multiply instead to see if a request's claimed effort is supported
- by the smallness of the resulting 32-bit hash prefix. This assumes we send
- the desired effort explicitly as part of each PoW solution. We do want to
- force clients to pick a specific effort before looking for a solution,
- otherwise a client could opportunistically claim a very large effort any
- time a lucky hash prefix comes up. Thus the effort is communicated explicitly
- in our protocol, and it forms part of the concatenated Equi-X challenge.
-
-3. Protocol specification
-
-3.1. Service encodes PoW parameters in descriptor [DESC_POW]
-
- This whole protocol starts with the service encoding the PoW parameters in
- the 'encrypted' (inner) part of the v3 descriptor. As follows:
-
- "pow-params" SP type SP seed-b64 SP suggested-effort
- SP expiration-time NL
-
- [At most once]
-
- type: The type of PoW system used. We call the one specified here "v1"
-
- seed-b64: A random seed that should be used as the input to the PoW
- hash function. Should be 32 random bytes encoded in base64
- without trailing padding.
-
- suggested-effort: An unsigned integer specifying an effort value that
- clients should aim for when contacting the service. Can be
- zero to mean that PoW is available but not currently
- suggested for a first connection attempt. See
- [EFFORT_ESTIMATION] for more details here.
-
- expiration-time: A timestamp in "YYYY-MM-DDTHH:MM:SS" format (iso time
- with no space) after which the above seed expires and
- is no longer valid as the input for PoW. It's needed
- so that our replay cache does not grow infinitely. It
- should be set to RAND_TIME(now+7200, 900) seconds.
-
- The service should refresh its seed when expiration-time passes. The service
- SHOULD keep its previous seed in memory and accept PoWs using it to avoid
- race-conditions with clients that have an old seed. The service SHOULD avoid
- generating two consequent seeds that have a common 4 bytes prefix. See
- [INTRO1_POW] for more info.
-
- By RAND_TIME(ts, interval) we mean a time between ts-interval and ts, chosen
- uniformly at random.
-
-3.2. Client fetches descriptor and computes PoW [CLIENT_POW]
-
- If a client receives a descriptor with "pow-params", it should assume that
- the service is prepared to receive PoW solutions as part of the introduction
- protocol.
-
- The client parses the descriptor and extracts the PoW parameters. It makes
- sure that the <expiration-time> has not expired and if it has, it needs to
- fetch a new descriptor.
-
- The client should then extract the <suggested-effort> field to configure its
- PoW 'target' (see [REF_TARGET]). The client SHOULD NOT accept 'target' values
- that will cause unacceptably long PoW computation.
-
- The client uses a "personalization string" P equal to the following
- nul-terminated ASCII string: "Tor hs intro v1\0".
-
- The client looks up `ID`, the current 32-byte blinded public ID
- (KP_hs_blind_id) for the onion service.
-
- To complete the PoW the client follows the following logic:
-
- a) Client selects a target effort E, based on <suggested-effort> and past
- connection attempt history.
- b) Client generates a secure random 16-byte nonce N, as the starting
- point for the solution search.
- c) Client derives seed C by decoding 'seed-b64'.
- d) Client calculates S = equix_solve(P || ID || C || N || E)
- e) Client calculates R = ntohl(blake2b_32(P || ID || C || N || E || S))
- f) Client checks if R * E <= UINT32_MAX.
- f1) If yes, success! The client can submit N, E, the first 4 bytes of
- C, and S.
- f2) If no, fail! The client interprets N as a 16-byte little-endian
- integer, increments it by 1 and goes back to step d).
-
- Note that the blake2b hash includes the output length parameter in its
- initial state vector, so a blake2b_32 is not equivalent to the prefix of a
- blake2b_512. We calculate the 32-bit blake2b specifically, and interpret it
- in network byte order as an unsigned integer.
-
- At the end of the above procedure, the client should have S as the solution
- of the Equix-X puzzle with N as the nonce, C as the seed. How quickly this
- happens depends solely on the target effort E parameter.
-
- The algorithm as described is suitable for single-threaded computation.
- Optionally, a client may choose multiple nonces and attempt several solutions
- in parallel on separate CPU cores. The specific choice of nonce is entirely
- up to the client, so parallelization choices like this do not impact the
- network protocol's interoperability at all.
-
-3.3. Client sends PoW in INTRO1 cell [INTRO1_POW]
-
- Now that the client has an answer to the puzzle it's time to encode it into
- an INTRODUCE1 cell. To do so the client adds an extension to the encrypted
- portion of the INTRODUCE1 cell by using the EXTENSIONS field (see
- [PROCESS_INTRO2] section in rend-spec-v3.txt). The encrypted portion of the
- INTRODUCE1 cell only gets read by the onion service and is ignored by the
- introduction point.
-
- We propose a new EXT_FIELD_TYPE value:
-
- [02] -- PROOF_OF_WORK
-
- The EXT_FIELD content format is:
-
- POW_VERSION [1 byte]
- POW_NONCE [16 bytes]
- POW_EFFORT [4 bytes]
- POW_SEED [4 bytes]
- POW_SOLUTION [16 bytes]
-
- where:
-
- POW_VERSION is 1 for the protocol specified in this proposal
- POW_NONCE is the nonce 'N' from the section above
- POW_EFFORT is the 32-bit integer effort value, in network byte order
- POW_SEED is the first 4 bytes of the seed used
-
- This will increase the INTRODUCE1 payload size by 43 bytes since the
- extension type and length is 2 extra bytes, the N_EXTENSIONS field is always
- present and currently set to 0 and the EXT_FIELD is 41 bytes. According to
- ticket #33650, INTRODUCE1 cells currently have more than 200 bytes
- available.
-
-3.4. Service verifies PoW and handles the introduction [SERVICE_VERIFY]
-
- When a service receives an INTRODUCE1 with the PROOF_OF_WORK extension, it
- should check its configuration on whether proof-of-work is enabled on the
- service. If it's not enabled, the extension SHOULD BE ignored. If enabled,
- even if the suggested effort is currently zero, the service follows the
- procedure detailed in this section.
-
- If the service requires the PROOF_OF_WORK extension but received an
- INTRODUCE1 cell without any embedded proof-of-work, the service SHOULD
- consider this cell as a zero-effort introduction for the purposes of the
- priority queue (see section [INTRO_QUEUE]).
-
-3.4.1. PoW verification [POW_VERIFY]
-
- To verify the client's proof-of-work the service MUST do the following steps:
-
- a) Find a valid seed C that starts with POW_SEED. Fail if no such seed
- exists.
- b) Fail if N = POW_NONCE is present in the replay cache
- (see [REPLAY_PROTECTION])
- c) Calculate R = ntohl(blake2b_32(P || ID || C || N || E || S))
- d) Fail if R * E > UINT32_MAX
- e) Fail if equix_verify(P || ID || C || N || E, S) != EQUIX_OK
- f) Put the request in the queue with a priority of E
-
- If any of these steps fail the service MUST ignore this introduction request
- and abort the protocol.
-
- In this proposal we call the above steps the "top half" of introduction
- handling. If all the steps of the "top half" have passed, then the circuit
- is added to the introduction queue as detailed in section [INTRO_QUEUE].
-
-3.4.1.1. Replay protection [REPLAY_PROTECTION]
-
- The service MUST NOT accept introduction requests with the same (seed, nonce)
- tuple. For this reason a replay protection mechanism must be employed.
-
- The simplest way is to use a simple hash table to check whether a (seed,
- nonce) tuple has been used before for the active duration of a
- seed. Depending on how long a seed stays active this might be a viable
- solution with reasonable memory/time overhead.
-
- If there is a worry that we might get too many introductions during the
- lifetime of a seed, we can use a Bloom filter as our replay cache
- mechanism. The probabilistic nature of Bloom filters means that sometimes we
- will flag some connections as replays even if they are not; with this false
- positive probability increasing as the number of entries increase. However,
- with the right parameter tuning this probability should be negligible and
- well handled by clients.
-
- {TODO: Design and specify a suitable bloom filter for this purpose.}
-
-3.4.2. The Introduction Queue [INTRO_QUEUE]
-
-3.4.2.1. Adding introductions to the introduction queue [ADD_QUEUE]
-
- When PoW is enabled and a verified introduction comes through, the service
- instead of jumping straight into rendezvous, queues it and prioritizes it
- based on how much effort was devoted by the client to PoW. This means that
- introduction requests with high effort should be prioritized over those with
- low effort.
-
- To do so, the service maintains an "introduction priority queue" data
- structure. Each element in that priority queue is an introduction request,
- and its priority is the effort put into its PoW:
-
- When a verified introduction comes through, the service uses its included
- effort commitment value to place each request into the right position of the
- priority_queue: The bigger the effort, the more priority it gets in the
- queue. If two elements have the same effort, the older one has priority over
- the newer one.
-
-3.4.2.2. Handling introductions from the introduction queue [HANDLE_QUEUE]
-
- The service should handle introductions by pulling from the introduction
- queue. We call this part of introduction handling the "bottom half" because
- most of the computation happens in this stage. For a description of how we
- expect such a system to work in Tor, see [TOR_SCHEDULER] section.
-
-3.4.3. PoW effort estimation [EFFORT_ESTIMATION]
-
-3.4.3.1. High-level description of the effort estimation process
-
- The service starts with a default suggested-effort value of 0, which keeps
- the PoW defenses dormant until we notice signs of overload.
-
- The overall process of determining effort can be thought of as a set of
- multiple coupled feedback loops. Clients perform their own effort
- adjustments via [CLIENT_TIMEOUT] atop a base effort suggested by the service.
- That suggestion incorporates the service's control adjustments atop a base
- effort calculated using a sum of currently-queued client effort.
-
- Each feedback loop has an opportunity to cover different time scales. Clients
- can make adjustments at every single circuit creation request, whereas
- services are limited by the extra load that frequent updates would place on
- HSDir nodes.
-
- In the combined client/service system these client-side increases are
- expected to provide the most effective quick response to an emerging DoS
- attack. After early clients increase the effort using [CLIENT_TIMEOUT],
- later clients will benefit from the service detecting this increased queued
- effort and offering a larger suggested_effort.
-
- Effort increases and decreases both have an intrinsic cost. Increasing effort
- will make the service more expensive to contact, and decreasing effort makes
- new requests likely to become backlogged behind older requests. The steady
- state condition is preferable to either of these side-effects, but ultimately
- it's expected that the control loop always oscillates to some degree.
-
-3.4.3.2. Service-side effort estimation
-
- Services keep an internal effort estimation which updates on a regular
- periodic timer in response to measurements made on the queueing behavior
- in the previous period. These internal effort changes can optionally trigger
- client-visible suggested_effort changes when the difference is great enough
- to warrant republishing to the HSDir.
-
- This evaluation and update period is referred to as HS_UPDATE_PERIOD.
- The service side effort estimation takes inspiration from TCP congestion
- control's additive increase / multiplicative decrease approach, but unlike
- a typical AIMD this algorithm is fixed-rate and doesn't update immediately
- in response to events.
-
- {TODO: HS_UPDATE_PERIOD is hardcoded to 300 (5 minutes) currently, but it
- should be configurable in some way. Is it more appropriate to use the
- service's torrc here or a consensus parameter?}
-
-3.4.3.3. Per-period service state
-
- During each update period, the service maintains some state:
-
- 1. TOTAL_EFFORT, a sum of all effort values for rendezvous requests that
- were successfully validated and enqueued.
-
- 2. REND_HANDLED, a count of rendezvous requests that were actually
- launched. Requests that made it to dequeueing but were too old to launch
- by then are not included.
-
- 3. HAD_QUEUE, a flag which is set if at any time in the update period we
- saw the priority queue filled with more than a minimum amount of work,
- greater than we would expect to process in approximately 1/4 second
- using the configured dequeue rate.
-
- 4. MAX_TRIMMED_EFFORT, the largest observed single request effort that we
- discarded during the period. Requests are discarded either due to age
- (timeout) or during culling events that discard the bottom half of the
- entire queue when it's too full.
-
-3.4.3.4. Service AIMD conditions
-
- At the end of each period, the service may decide to increase effort,
- decrease effort, or make no changes, based on these accumulated state values:
-
- 1. If MAX_TRIMMED_EFFORT > our previous internal suggested_effort,
- always INCREASE. Requests that follow our latest advice are being
- dropped.
-
- 2. If the HAD_QUEUE flag was set and the queue still contains at least
- one item with effort >= our previous internal suggested_effort,
- INCREASE. Even if we haven't yet reached the point of dropping requests,
- this signal indicates that the our latest suggestion isn't high enough
- and requests will build up in the queue.
-
- 3. If neither condition (1) or (2) are taking place and the queue is below
- a level we would expect to process in approximately 1/4 second, choose
- to DECREASE.
-
- 4. If none of these conditions match, the suggested effort is unchanged.
-
- When we INCREASE, the internal suggested_effort is increased to either its
- previous value + 1, or (TOTAL_EFFORT / REND_HANDLED), whichever is larger.
-
- When we DECREASE, the internal suggested_effort is scaled by 2/3rds.
-
- Over time, this will continue to decrease our effort suggestion any time the
- service is fully processing its request queue. If the queue stays empty, the
- effort suggestion decreases to zero and clients should no longer submit a
- proof-of-work solution with their first connection attempt.
-
- It's worth noting that the suggested-effort is not a hard limit to the
- efforts that are accepted by the service, and it's only meant to serve as a
- guideline for clients to reduce the number of unsuccessful requests that get
- to the service. The service still adds requests with lower effort than
- suggested-effort to the priority queue in [ADD_QUEUE].
-
-3.4.3.5. Updating descriptor with new suggested effort
-
- The service descriptors may be updated for multiple reasons including
- introduction point rotation common to all v3 onion services, the scheduled
- seed rotations described in [DESC_POW], and updates to the effort suggestion.
- Even though the internal effort estimate updates on a regular timer, we avoid
- propagating those changes into the descriptor and the HSDir hosts unless
- there is a significant change.
-
- If the PoW params otherwise match but the seed has changed by less than 15
- percent, services SHOULD NOT upload a new descriptor.
-
-4. Client behavior [CLIENT_BEHAVIOR]
-
- This proposal introduces a bunch of new ways where a legitimate client can
- fail to reach the onion service.
-
- Furthermore, there is currently no end-to-end way for the onion service to
- inform the client that the introduction failed. The INTRO_ACK cell is not
- end-to-end (it's from the introduction point to the client) and hence it does
- not allow the service to inform the client that the rendezvous is never gonna
- occur.
-
- From the client's perspective there's no way to attribute this failure to
- the service itself rather than the introduction point, so error accounting
- is performed separately for each introduction-point. Existing mechanisms
- will discard an introduction point that's required too many retries.
-
-4.1. Clients handling timeouts [CLIENT_TIMEOUT]
-
- Alice can fail to reach the onion service if her introduction request gets
- trimmed off the priority queue in [HANDLE_QUEUE], or if the service does not
- get through its priority queue in time and the connection times out.
-
- This section presents a heuristic method for the client getting service even
- in such scenarios.
-
- If the rendezvous request times out, the client SHOULD fetch a new descriptor
- for the service to make sure that it's using the right suggested-effort for
- the PoW and the right PoW seed. If the fetched descriptor includes a new
- suggested effort or seed, it should first retry the request with these
- parameters.
-
- {TODO: This is not actually implemented yet, but we should do it. How often
- should clients at most try to fetch new descriptors? Determined by a
- consensus parameter? This change will also allow clients to retry
- effectively in cases where the service has just been reconfigured to
- enable PoW defenses.}
-
- Every time the client retries the connection, it will count these failures
- per-introduction-point. These counts of previous retries are combined with
- the service's suggested_effort when calculating the actual effort to spend
- on any individual request to a service that advertises PoW support, even
- when the currently advertised suggested_effort is zero.
-
- On each retry, the client modifies its solver effort:
-
- 1. If the effort is below (CLIENT_POW_EFFORT_DOUBLE_UNTIL = 1000)
- it will be doubled.
-
- 2. Otherwise, multiply the effort by (CLIENT_POW_RETRY_MULTIPLIER = 1.5).
-
- 3. Constrain the new effort to be at least
- (CLIENT_MIN_RETRY_POW_EFFORT = 8) and no greater than
- (CLIENT_MAX_POW_EFFORT = 10000)
-
- {TODO: These hardcoded limits should be replaced by timed limits and/or
- an unlimited solver with robust cancellation. This is issue tor#40787}
-
-5. Attacker strategies [ATTACK_META]
-
- Now that we defined our protocol we need to start tweaking the various
- knobs. But before we can do that, we first need to understand a few
- high-level attacker strategies to see what we are fighting against.
-
-5.1.1. Overwhelm PoW verification (aka "Overwhelm top half") [ATTACK_TOP_HALF]
-
- A basic attack here is the adversary spamming with bogus INTRO cells so that
- the service does not have computing capacity to even verify the
- proof-of-work. This adversary tries to overwhelm the procedure in the
- [POW_VERIFY] section.
-
- That's why we need the PoW algorithm to have a cheap verification time so
- that this attack is not possible: we tune this PoW parameter in section
- [POW_TUNING_VERIFICATION].
-
-5.1.2. Overwhelm rendezvous capacity (aka "Overwhelm bottom half")
- [ATTACK_BOTTOM_HALF]
-
- Given the way the introduction queue works (see [HANDLE_QUEUE]), a very
- effective strategy for the attacker is to totally overwhelm the queue
- processing by sending more high-effort introductions than the onion service
- can handle at any given tick. This adversary tries to overwhelm the procedure
- in the [HANDLE_QUEUE] section.
-
- To do so, the attacker would have to send at least 20 high-effort
- introduction cells every 100ms, where high-effort is a PoW which is above the
- estimated level of "the motivated user" (see [USER_MODEL]).
-
- An easier attack for the adversary, is the same strategy but with
- introduction cells that are all above the comfortable level of "the standard
- user" (see [USER_MODEL]). This would block out all standard users and only
- allow motivated users to pass.
-
-5.1.3. Hybrid overwhelm strategy [ATTACK_HYBRID]
-
- If both the top- and bottom- halves are processed by the same thread, this
- opens up the possibility for a "hybrid" attack. Given the performance figures
- for the bottom half (0.31 ms/req.) and the top half (5.5 ms/req.), the
- attacker can optimally deny service by submitting 91 high-effort requests and
- 1520 invalid requests per second. This will completely saturate the main loop
- because:
-
- 0.31*(1520+91) ~ 0.5 sec.
- 5.5*91 ~ 0.5 sec.
-
- This attack only has half the bandwidth requirement of [ATTACK_TOP_HALF] and
- half the compute requirement of [ATTACK_BOTTOM_HALF].
-
- Alternatively, the attacker can adjust the ratio between invalid and
- high-effort requests depending on their bandwidth and compute capabilities.
-
-5.1.4. Gaming the effort estimation logic [ATTACK_EFFORT]
-
- Another way to beat this system is for the attacker to game the effort
- estimation logic (see [EFFORT_ESTIMATION]). Essentially, there are two attacks
- that we are trying to avoid:
-
- - Attacker sets descriptor suggested-effort to a very high value effectively
- making it impossible for most clients to produce a PoW token in a
- reasonable timeframe.
- - Attacker sets descriptor suggested-effort to a very small value so that
- most clients aim for a small value while the attacker comfortably launches
- an [ATTACK_BOTTOM_HALF] using medium effort PoW (see [REF_TEVADOR_1])
-
-5.1.4. Precomputed PoW attack
-
- The attacker may precompute many valid PoW nonces and submit them all at once
- before the current seed expires, overwhelming the service temporarily even
- using a single computer. The current scheme gives the attackers 4 hours to
- launch this attack since each seed lasts 2 hours and the service caches two
- seeds.
-
- An attacker with this attack might be aiming to DoS the service for a limited
- amount of time, or to cause an [ATTACK_EFFORT] attack.
-
-6. Parameter tuning [POW_TUNING]
-
- There are various parameters in this PoW system that need to be tuned:
-
- We first start by tuning the time it takes to verify a PoW token. We do this
- first because it's fundamental to the performance of onion services and can
- turn into a DoS vector of its own. We will do this tuning in a way that's
- agnostic to the chosen PoW function.
-
- We will then move towards analyzing the client starting difficulty setting
- for our PoW system. That defines the expected time for clients to succeed in
- our system, and the expected time for attackers to overwhelm our system. Same
- as above we will do this in a way that's agnostic to the chosen PoW function.
-
- Currently, we have hardcoded the initial client starting difficulty at 8,
- but this may be too low to ramp up quickly to various on and off attack
- patterns. A higher initial difficulty may be needed for these, depending on
- their severity. This section gives us an idea of how large such attacks can
- be.
-
- Finally, using those two pieces we will tune our PoW function and pick the
- right client starting difficulty setting. At the end of this section we will
- know the resources that an attacker needs to overwhelm the onion service, the
- resources that the service needs to verify introduction requests, and the
- resources that legitimate clients need to get to the onion service.
-
-6.1. PoW verification [POW_TUNING_VERIFICATION]
-
- Verifying a PoW token is the first thing that a service does when it receives
- an INTRODUCE2 cell and it's detailed in section [POW_VERIFY]. This
- verification happens during the "top half" part of the process. Every
- millisecond spent verifying PoW adds overhead to the already existing "top
- half" part of handling an introduction cell. Hence we should be careful to
- add minimal overhead here so that we don't enable attacks like [ATTACK_TOP_HALF].
-
- During our performance measurements in [TOR_MEASUREMENTS] we learned that the
- "top half" takes about 0.26 msecs in average, without doing any sort of PoW
- verification. Using that value we compute the following table, that describes
- the number of cells we can queue per second (aka times we can perform the
- "top half" process) for different values of PoW verification time:
-
- +---------------------+-----------------------+--------------+
- |PoW Verification Time| Total "top half" time | Cells Queued |
- | | | per second |
- |---------------------|-----------------------|--------------|
- | 0 msec | 0.26 msec | 3846 |
- | 1 msec | 1.26 msec | 793 |
- | 2 msec | 2.26 msec | 442 |
- | 3 msec | 3.26 msec | 306 |
- | 4 msec | 4.26 msec | 234 |
- | 5 msec | 5.26 msec | 190 |
- | 6 msec | 6.26 msec | 159 |
- | 7 msec | 7.26 msec | 137 |
- | 8 msec | 8.26 msec | 121 |
- | 9 msec | 9.26 msec | 107 |
- | 10 msec | 10.26 msec | 97 |
- +---------------------+-----------------------+--------------+
-
- Here is how you can read the table above:
-
- - For a PoW function with a 1ms verification time, an attacker needs to send
- 793 dummy introduction cells per second to succeed in a [ATTACK_TOP_HALF] attack.
-
- - For a PoW function with a 2ms verification time, an attacker needs to send
- 442 dummy introduction cells per second to succeed in a [ATTACK_TOP_HALF] attack.
-
- - For a PoW function with a 10ms verification time, an attacker needs to send
- 97 dummy introduction cells per second to succeed in a [ATTACK_TOP_HALF] attack.
-
- Whether an attacker can succeed at that depends on the attacker's resources,
- but also on the network's capacity.
-
- Our purpose here is to have the smallest PoW verification overhead possible
- that also allows us to achieve all our other goals.
-
- [Note that the table above is simply the result of a naive multiplication and
- does not take into account all the auxiliary overheads that happen every
- second like the time to invoke the mainloop, the bottom-half processes, or
- pretty much anything other than the "top-half" processing.
-
- During our measurements the time to handle INTRODUCE2 cells dominates any
- other action time: There might be events that require a long processing time,
- but these are pretty infrequent (like uploading a new HS descriptor) and
- hence over a long time they smooth out. Hence extrapolating the total cells
- queued per second based on a single "top half" time seems like good enough to
- get some initial intuition. That said, the values of "Cells queued per
- second" from the table above, are likely much smaller than displayed above
- because of all the auxiliary overheads.]
-
-6.2. PoW difficulty analysis [POW_DIFFICULTY_ANALYSIS]
-
- The difficulty setting of our PoW basically dictates how difficult it should
- be to get a success in our PoW system. An attacker who can get many successes
- per second can pull a successful [ATTACK_BOTTOM_HALF] attack against our
- system.
-
- In classic PoW systems, "success" is defined as getting a hash output below
- the "target". However, since our system is dynamic, we define "success" as an
- abstract high-effort computation.
-
- Our system is dynamic but we still need a starting difficulty setting that
- will be used for bootstrapping the system. The client and attacker can still
- aim higher or lower but for UX purposes and for analysis purposes we do need
- to define a starting difficulty, to minimize retries by clients.
-
-6.2.1. Analysis based on adversary power
-
- In this section we will try to do an analysis of PoW difficulty without using
- any sort of Tor-related or PoW-related benchmark numbers.
-
- We created the table (see [REF_TABLE]) below which shows how much time a
- legitimate client with a single machine should expect to burn before they get
- a single success. The x-axis is how many successes we want the attacker to be
- able to do per second: the more successes we allow the adversary, the more
- they can overwhelm our introduction queue. The y-axis is how many machines
- the adversary has in her disposal, ranging from just 5 to 1000.
-
- ===============================================================
- | Expected Time (in seconds) Per Success For One Machine |
- ===========================================================================
- | |
- | Attacker Succeses 1 5 10 20 30 50 |
- | per second |
- | |
- | 5 5 1 0 0 0 0 |
- | 50 50 10 5 2 1 1 |
- | 100 100 20 10 5 3 2 |
- | Attacker 200 200 40 20 10 6 4 |
- | Boxes 300 300 60 30 15 10 6 |
- | 400 400 80 40 20 13 8 |
- | 500 500 100 50 25 16 10 |
- | 1000 1000 200 100 50 33 20 |
- | |
- ============================================================================
-
- Here is how you can read the table above:
-
- - If an adversary has a botnet with 1000 boxes, and we want to limit her to 1
- success per second, then a legitimate client with a single box should be
- expected to spend 1000 seconds getting a single success.
-
- - If an adversary has a botnet with 1000 boxes, and we want to limit her to 5
- successes per second, then a legitimate client with a single box should be
- expected to spend 200 seconds getting a single success.
-
- - If an adversary has a botnet with 500 boxes, and we want to limit her to 5
- successes per second, then a legitimate client with a single box should be
- expected to spend 100 seconds getting a single success.
-
- - If an adversary has access to 50 boxes, and we want to limit her to 5
- successes per second, then a legitimate client with a single box should be
- expected to spend 10 seconds getting a single success.
-
- - If an adversary has access to 5 boxes, and we want to limit her to 5
- successes per second, then a legitimate client with a single box should be
- expected to spend 1 seconds getting a single success.
-
- With the above table we can create some profiles for starting values of our
- PoW difficulty.
-
-6.2.2. Analysis based on Tor's performance [POW_DIFFICULTY_TOR]
-
- To go deeper here, we can use the performance measurements from
- [TOR_MEASUREMENTS] to get a more specific intuition on the starting
- difficulty. In particular, we learned that completely handling an
- introduction cell takes 5.55 msecs in average. Using that value, we can
- compute the following table, that describes the number of introduction cells
- we can handle per second for different values of PoW verification:
-
- +---------------------+-----------------------+--------------+
- |PoW Verification Time| Total time to handle | Cells handled|
- | | introduction cell | per second |
- |---------------------|-----------------------|--------------|
- | 0 msec | 5.55 msec | 180.18 |
- | 1 msec | 6.55 msec | 152.67 |
- | 2 msec | 7.55 msec | 132.45 |
- | 3 msec | 8.55 msec | 116.96 |
- | 4 msec | 9.55 mesc | 104.71 |
- | 5 msec | 10.55 msec | 94.79 |
- | 6 msec | 11.55 msec | 86.58 |
- | 7 msec | 12.55 msec | 79.68 |
- | 8 msec | 13.55 msec | 73.80 |
- | 9 msec | 14.55 msec | 68.73 |
- | 10 msec | 15.55 msec | 64.31 |
- +---------------------+-----------------------+--------------+
-
- Here is how you can read the table above:
-
- - For a PoW function with a 1ms verification time, an attacker needs to send
- 152 high-effort introduction cells per second to succeed in a
- [ATTACK_BOTTOM_HALF] attack.
-
- - For a PoW function with a 10ms verification time, an attacker needs to send
- 64 high-effort introduction cells per second to succeed in a
- [ATTACK_BOTTOM_HALF] attack.
-
- We can use this table to specify a starting difficulty that won't allow our
- target adversary to succeed in an [ATTACK_BOTTOM_HALF] attack.
-
- Of course, when it comes to this table, the same disclaimer as in section
- [POW_TUNING_VERIFICATION] is valid. That is, the above table is just a
- theoretical extrapolation and we expect the real values to be much lower
- since they depend on auxiliary processing overheads, and on the network's
- capacity.
-
-
-7. Discussion
-
-7.1. UX
-
- This proposal has user facing UX consequences.
-
- When the client first attempts a pow, it can note how long iterations of the
- hash function take, and then use this to determine an estimation of the
- duration of the PoW. This estimation could be communicated via the control
- port or other mechanism, such that the browser could display how long the
- PoW is expected to take on their device. If the device is a mobile platform,
- and this time estimation is large, it could recommend that the user try from
- a desktop machine.
-
-7.2. Future work [FUTURE_WORK]
-
-7.2.1. Incremental improvements to this proposal
-
- There are various improvements that can be done in this proposal, and while
- we are trying to keep this v1 version simple, we need to keep the design
- extensible so that we build more features into it. In particular:
-
- - End-to-end introduction ACKs
-
- This proposal suffers from various UX issues because there is no end-to-end
- mechanism for an onion service to inform the client about its introduction
- request. If we had end-to-end introduction ACKs many of the problems from
- [CLIENT_BEHAVIOR] would be alleviated. The problem here is that end-to-end
- ACKs require modifications on the introduction point code and a network
- update which is a lengthy process.
-
- - Multithreading scheduler
-
- Our scheduler is pretty limited by the fact that Tor has a single-threaded
- design. If we improve our multithreading support we could handle a much
- greater amount of introduction requests per second.
-
-7.2.2. Future designs [FUTURE_DESIGNS]
-
- This is just the beginning in DoS defences for Tor and there are various
- future designs and schemes that we can investigate. Here is a brief summary
- of these:
-
- "More advanced PoW schemes" -- We could use more advanced memory-hard PoW
- schemes like MTP-argon2 or Itsuku to make it even harder for
- adversaries to create successful PoWs. Unfortunately these schemes
- have much bigger proof sizes, and they won't fit in INTRODUCE1 cells.
- See #31223 for more details.
-
- "Third-party anonymous credentials" -- We can use anonymous credentials and a
- third-party token issuance server on the clearnet to issue tokens
- based on PoW or CAPTCHA and then use those tokens to get access to the
- service. See [REF_CREDS] for more details.
-
- "PoW + Anonymous Credentials" -- We can make a hybrid of the above ideas
- where we present a hard puzzle to the user when connecting to the
- onion service, and if they solve it we then give the user a bunch of
- anonymous tokens that can be used in the future. This can all happen
- between the client and the service without a need for a third party.
-
- All of the above approaches are much more complicated than this proposal, and
- hence we want to start easy before we get into more serious projects.
-
-7.3. Environment
-
- We love the environment! We are concerned of how PoW schemes can waste energy
- by doing useless hash iterations. Here is a few reasons we still decided to
- pursue a PoW approach here:
-
- "We are not making things worse" -- DoS attacks are already happening and
- attackers are already burning energy to carry them out both on the
- attacker side, on the service side and on the network side. We think that
- asking legitimate clients to carry out PoW computations is not gonna
- affect the equation too much, since an attacker right now can very
- quickly cause the same damage that hundreds of legitimate clients do a
- whole day.
-
- "We hope to make things better" -- The hope is that proposals like this will
- make the DoS actors go away and hence the PoW system will not be used. As
- long as DoS is happening there will be a waste of energy, but if we
- manage to demotivate them with technical means, the network as a whole
- will less wasteful. Also see [CATCH22] for a similar argument.
-
-8. Acknowledgements
-
- Thanks a lot to tevador for the various improvements to the proposal and for
- helping us understand and tweak the RandomX scheme.
-
- Thanks to Solar Designer for the help in understanding the current PoW
- landscape, the various approaches we could take, and teaching us a few neat
- tricks.
-
-Appendix A. Little-t tor introduction scheduler
-
- This section describes how we will implement this proposal in the "tor"
- software (little-t tor).
-
- The following should be read as if tor is an onion service and thus the end
- point of all inbound data.
-
-A.1. The Main Loop [MAIN_LOOP]
-
- Tor uses libevent for its mainloop. For network I/O operations, a mainloop
- event is used to inform tor if it can read on a certain socket, or a
- connection object in tor.
-
- From there, this event will empty the connection input buffer (inbuf) by
- extracting and processing a cell at a time. The mainloop is single threaded
- and thus each cell is handled sequentially.
-
- Processing an INTRODUCE2 cell at the onion service means a series of
- operations (in order):
-
- 1) Unpack cell from inbuf to local buffer.
-
- 2) Decrypt cell (AES operations).
-
- 3) Parse cell header and process it depending on its RELAY_COMMAND.
-
- 4) INTRODUCE2 cell handling which means building a rendezvous circuit:
- i) Path selection
- ii) Launch circuit to first hop.
-
- 5) Return to mainloop event which essentially means back to step (1).
-
- Tor will read at most 32 cells out of the inbuf per mainloop round.
-
-A.2. Requirements for PoW
-
- With this proposal, in order to prioritize cells by the amount of PoW work
- it has done, cells can _not_ be processed sequentially as described above.
-
- Thus, we need a way to queue a certain number of cells, prioritize them and
- then process some cell(s) from the top of the queue (that is, the cells that
- have done the most PoW effort).
-
- We thus require a new cell processing flow that is _not_ compatible with
- current tor design. The elements are:
-
- - Validate PoW and place cells in a priority queue of INTRODUCE2 cells (as
- described in section [INTRO_QUEUE]).
-
- - Defer "bottom half" INTRO2 cell processing for after cells have been
- queued into the priority queue.
-
-A.3. Proposed scheduler [TOR_SCHEDULER]
-
- The intuitive way to address the A.2 requirements would be to do this
- simple and naive approach:
-
- 1) Mainloop: Empty inbuf INTRODUCE2 cells into priority queue
-
- 2) Process all cells in pqueue
-
- 3) Goto (1)
-
- However, we are worried that handling all those cells before returning to the
- mainloop opens possibilities of attack by an adversary since the priority
- queue is not gonna be kept up to date while we process all those cells. This
- means that we might spend lots of time dealing with introductions that don't
- deserve it. See [BOTTOM_HALF_SCHEDULER] for more details.
-
- We thus propose to split the INTRODUCE2 handling into two different steps:
- "top half" and "bottom half" process, as also mentioned in [POW_VERIFY]
- section above.
-
-A.3.1. Top half and bottom half scheduler
-
- The top half process is responsible for queuing introductions into the
- priority queue as follows:
-
- a) Unpack cell from inbuf to local buffer.
-
- b) Decrypt cell (AES operations).
-
- c) Parse INTRODUCE2 cell header and validate PoW.
-
- d) Return to mainloop event which essentially means step (1).
-
- The top-half basically does all operations of section [MAIN_LOOP] except from (4).
-
- An then, the bottom-half process is responsible for handling introductions
- and doing rendezvous. To achieve this we introduce a new mainloop event to
- process the priority queue _after_ the top-half event has completed. This new
- event would do these operations sequentially:
-
- a) Pop INTRODUCE2 cell from priority queue.
-
- b) Parse and process INTRODUCE2 cell.
-
- c) End event and yield back to mainloop.
-
-A.3.2. Scheduling the bottom half process [BOTTOM_HALF_SCHEDULER]
-
- The question now becomes: when should the "bottom half" event get triggered
- from the mainloop?
-
- We propose that this event is scheduled in when the network I/O event
- queues at least 1 cell into the priority queue. Then, as long as it has a
- cell in the queue, it would re-schedule itself for immediate execution
- meaning at the next mainloop round, it would execute again.
-
- The idea is to try to empty the queue as fast as it can in order to provide a
- fast response time to an introduction request but always leave a chance for
- more cells to appear between cell processing by yielding back to the
- mainloop. With this we are aiming to always have the most up-to-date version
- of the priority queue when we are completing introductions: this way we are
- prioritizing clients that spent a lot of time and effort completing their PoW.
-
- If the size of the queue drops to 0, it stops scheduling itself in order to
- not create a busy loop. The network I/O event will re-schedule it in time.
-
- Notice that the proposed solution will make the service handle 1 single
- introduction request at every main loop event. However, when we do
- performance measurements we might learn that it's preferable to bump the
- number of cells in the future from 1 to N where N <= 32.
-
-A.4 Performance measurements
-
- This section will detail the performance measurements we've done on tor.git
- for handling an INTRODUCE2 cell and then a discussion on how much more CPU
- time we can add (for PoW validation) before it badly degrades our
- performance.
-
-A.4.1 Tor measurements [TOR_MEASUREMENTS]
-
- In this section we will derive measurement numbers for the "top half" and
- "bottom half" parts of handling an introduction cell.
-
- These measurements have been done on tor.git at commit
- 80031db32abebaf4d0a91c01db258fcdbd54a471.
-
- We've measured several set of actions of the INTRODUCE2 cell handling process
- on Intel(R) Xeon(R) CPU E5-2650 v4. Our service was accessed by an array of
- clients that sent introduction requests for a period of 60 seconds.
-
- 1. Full Mainloop Event
-
- We start by measuring the full time it takes for a mainloop event to
- process an inbuf containing INTRODUCE2 cells. The mainloop event processed
- 2.42 cells per invocation on average during our measurements.
-
- Total measurements: 3279
-
- Min: 0.30 msec - 1st Q.: 5.47 msec - Median: 5.91 msec
- Mean: 13.43 msec - 3rd Q.: 16.20 msec - Max: 257.95 msec
-
- 2. INTRODUCE2 cell processing (bottom-half)
-
- We also measured how much time the "bottom half" part of the process
- takes. That's the heavy part of processing an introduction request as seen
- in step (4) of the [MAIN_LOOP] section:
-
- Total measurements: 7931
-
- Min: 0.28 msec - 1st Q.: 5.06 msec - Median: 5.33 msec
- Mean: 5.29 msec - 3rd Q.: 5.57 msec - Max: 14.64 msec
-
- 3. Connection data read (top half)
-
- Now that we have the above pieces, we can use them to measure just the
- "top half" part of the procedure. That's when bytes are taken from the
- connection inbound buffer and parsed into an INTRODUCE2 cell where basic
- validation is done.
-
- There is an average of 2.42 INTRODUCE2 cells per mainloop event and so we
- divide that by the full mainloop event mean time to get the time for one
- cell. From that we subtract the "bottom half" mean time to get how much
- the "top half" takes:
-
- => 13.43 / (7931 / 3279) = 5.55
- => 5.55 - 5.29 = 0.26
-
- Mean: 0.26 msec
-
- To summarize, during our measurements the average number of INTRODUCE2 cells
- a mainloop event processed is ~2.42 cells (7931 cells for 3279 mainloop
- invocations).
-
- This means that, taking the mean of mainloop event times, it takes ~5.55msec
- (13.43/2.42) to completely process an INTRODUCE2 cell. Then if we look deeper
- we see that the "top half" of INTRODUCE2 cell processing takes 0.26 msec in
- average, whereas the "bottom half" takes around 5.33 msec.
-
- The heavyness of the "bottom half" is to be expected since that's where 95%
- of the total work takes place: in particular the rendezvous path selection
- and circuit launch.
-
-A.2. References
-
- [REF_EQUIX]: https://github.com/tevador/equix
- https://github.com/tevador/equix/blob/master/devlog.md
- [REF_TABLE]: The table is based on the script below plus some manual editing for readability:
- https://gist.github.com/asn-d6/99a936b0467b0cef88a677baaf0bbd04
- [REF_BOTNET]: https://media.kasperskycontenthub.com/wp-content/uploads/sites/43/2009/07/01121538/ynam_botnets_0907_en.pdf
- [REF_CREDS]: https://lists.torproject.org/pipermail/tor-dev/2020-March/014198.html
- [REF_TARGET]: https://en.bitcoin.it/wiki/Target
- [REF_TLS]: https://www.ietf.org/archive/id/draft-nygren-tls-client-puzzles-02.txt
- https://tools.ietf.org/id/draft-nir-tls-puzzles-00.html
- https://tools.ietf.org/html/draft-ietf-ipsecme-ddos-protection-10
- [REF_TLS_1]: https://www.ietf.org/archive/id/draft-nygren-tls-client-puzzles-02.txt
- [REF_TEVADOR_1]: https://lists.torproject.org/pipermail/tor-dev/2020-May/014268.html
- [REF_TEVADOR_2]: https://lists.torproject.org/pipermail/tor-dev/2020-June/014358.html
- [REF_TEVADOR_SIM]: https://github.com/mikeperry-tor/scratchpad/blob/master/tor-pow/effort_sim.py#L57
-```
diff --git a/spec/hspow-spec/motivation.md b/spec/hspow-spec/motivation.md
new file mode 100644
index 0000000..1c77f58
--- /dev/null
+++ b/spec/hspow-spec/motivation.md
@@ -0,0 +1,106 @@
+```text
+
+0. Abstract
+
+ This proposal aims to thwart introduction flooding DoS attacks by introducing
+ a dynamic Proof-Of-Work protocol that occurs over introduction circuits.
+
+1. Motivation
+
+ So far our attempts at limiting the impact of introduction flooding DoS
+ attacks on onion services has been focused on horizontal scaling with
+ Onionbalance, optimizing the CPU usage of Tor and applying rate limiting.
+ While these measures move the goalpost forward, a core problem with onion
+ service DoS is that building rendezvous circuits is a costly procedure both
+ for the service and for the network. For more information on the limitations
+ of rate-limiting when defending against DDoS, see [REF_TLS_1].
+
+ If we ever hope to have truly reachable global onion services, we need to
+ make it harder for attackers to overload the service with introduction
+ requests. This proposal achieves this by allowing onion services to specify
+ an optional dynamic proof-of-work scheme that its clients need to participate
+ in if they want to get served.
+
+ With the right parameters, this proof-of-work scheme acts as a gatekeeper to
+ block amplification attacks by attackers while letting legitimate clients
+ through.
+
+1.1. Related work
+
+ For a similar concept, see the three internet drafts that have been proposed
+ for defending against TLS-based DDoS attacks using client puzzles [REF_TLS].
+
+1.2. Threat model [THREAT_MODEL]
+
+1.2.1. Attacker profiles [ATTACKER_MODEL]
+
+ This proposal is written to thwart specific attackers. A simple PoW proposal
+ cannot defend against all and every DoS attack on the Internet, but there are
+ adversary models we can defend against.
+
+ Let's start with some adversary profiles:
+
+ "The script-kiddie"
+
+ The script-kiddie has a single computer and pushes it to its
+ limits. Perhaps it also has a VPS and a pwned server. We are talking about
+ an attacker with total access to 10 GHz of CPU and 10 GB of RAM. We
+ consider the total cost for this attacker to be zero $.
+
+ "The small botnet"
+
+ The small botnet is a bunch of computers lined up to do an introduction
+ flooding attack. Assuming 500 medium-range computers, we are talking about
+ an attacker with total access to 10 THz of CPU and 10 TB of RAM. We
+ consider the upfront cost for this attacker to be about $400.
+
+ "The large botnet"
+
+ The large botnet is a serious operation with many thousands of computers
+ organized to do this attack. Assuming 100k medium-range computers, we are
+ talking about an attacker with total access to 200 THz of CPU and 200 TB of
+ RAM. The upfront cost for this attacker is about $36k.
+
+ We hope that this proposal can help us defend against the script-kiddie
+ attacker and small botnets. To defend against a large botnet we would need
+ more tools at our disposal (see [FUTURE_DESIGNS]).
+
+1.2.2. User profiles [USER_MODEL]
+
+ We have attackers and we have users. Here are a few user profiles:
+
+ "The standard web user"
+
+ This is a standard laptop/desktop user who is trying to browse the
+ web. They don't know how these defences work and they don't care to
+ configure or tweak them. If the site doesn't load, they are gonna close
+ their browser and be sad at Tor. They run a 2GHz computer with 4GB of RAM.
+
+ "The motivated user"
+
+ This is a user that really wants to reach their destination. They don't
+ care about the journey; they just want to get there. They know what's going
+ on; they are willing to make their computer do expensive multi-minute PoW
+ computations to get where they want to be.
+
+ "The mobile user"
+
+ This is a motivated user on a mobile phone. Even tho they want to read the
+ news article, they don't have much leeway on stressing their machine to do
+ more computation.
+
+ We hope that this proposal will allow the motivated user to always connect
+ where they want to connect to, and also give more chances to the other user
+ groups to reach the destination.
+
+1.2.3. The DoS Catch-22 [CATCH22]
+
+ This proposal is not perfect and it does not cover all the use cases. Still,
+ we think that by covering some use cases and giving reachability to the
+ people who really need it, we will severely demotivate the attackers from
+ continuing the DoS attacks and hence stop the DoS threat all together.
+ Furthermore, by increasing the cost to launch a DoS attack, a big
+ class of DoS attackers will disappear from the map, since the expected ROI
+ will decrease.
+
+``` \ No newline at end of file
diff --git a/spec/hspow-spec/overview.md b/spec/hspow-spec/overview.md
new file mode 100644
index 0000000..cb4ea7e
--- /dev/null
+++ b/spec/hspow-spec/overview.md
@@ -0,0 +1,68 @@
+```text
+
+2. System Overview
+
+2.1. Tor protocol overview
+
+ +----------------------------------+
+ | Onion Service |
+ +-------+ INTRO1 +-----------+ INTRO2 +--------+ |
+ |Client |-------->|Intro Point|------->| PoW |-----------+ |
+ +-------+ +-----------+ |Verifier| | |
+ +--------+ | |
+ | | |
+ | | |
+ | +----------v---------+ |
+ | |Intro Priority Queue| |
+ +---------+--------------------+---+
+ | | |
+ Rendezvous | | |
+ circuits | | |
+ v v v
+
+
+
+ The proof-of-work scheme specified in this proposal takes place during the
+ introduction phase of the onion service protocol.
+
+ The system described in this proposal is not meant to be on all the time, and
+ it can be entirely disabled for services that do not experience DoS attacks.
+
+ When the subsystem is enabled, suggested effort is continuously adjusted and
+ the computational puzzle can be bypassed entirely when the effort reaches
+ zero. In these cases, the proof-of-work subsystem can be dormant but still
+ provide the necessary parameters for clients to voluntarily provide effort
+ in order to get better placement in the priority queue.
+
+ The protocol involves the following major steps:
+
+ 1) Service encodes PoW parameters in descriptor [DESC_POW]
+ 2) Client fetches descriptor and computes PoW [CLIENT_POW]
+ 3) Client completes PoW and sends results in INTRO1 cell [INTRO1_POW]
+ 4) Service verifies PoW and queues introduction based on PoW effort
+ [SERVICE_VERIFY]
+ 5) Requests are continuously drained from the queue, highest effort first,
+ subject to multiple constraints on speed [HANDLE_QUEUE]
+
+2.2. Proof-of-work overview
+
+2.2.2. Dynamic PoW
+
+ DoS is a dynamic problem where the attacker's capabilities constantly change,
+ and hence we want our proof-of-work system to be dynamic and not stuck with a
+ static difficulty setting. Hence, instead of forcing clients to go below a
+ static target like in Bitcoin to be successful, we ask clients to "bid" using
+ their PoW effort. Effectively, a client gets higher priority the higher
+ effort they put into their proof-of-work. This is similar to how
+ proof-of-stake works but instead of staking coins, you stake work.
+
+ The benefit here is that legitimate clients who really care about getting
+ access can spend a big amount of effort into their PoW computation, which
+ should guarantee access to the service given reasonable adversary models. See
+ [PARAM_TUNING] for more details about these guarantees and tradeoffs.
+
+ As a way to improve reachability and UX, the service tries to estimate the
+ effort needed for clients to get access at any given time and places it in
+ the descriptor. See [EFFORT_ESTIMATION] for more details.
+
+```
diff --git a/spec/hspow-spec/v1-equix.md b/spec/hspow-spec/v1-equix.md
new file mode 100644
index 0000000..2756e39
--- /dev/null
+++ b/spec/hspow-spec/v1-equix.md
@@ -0,0 +1,233 @@
+```text
+
+2.2.1. Algorithm overview
+
+ For our proof-of-work function we will use the Equi-X scheme by tevador
+ [REF_EQUIX]. Equi-X is an asymmetric PoW function based on Equihash<60,3>,
+ using HashX as the underlying layer. It features lightning fast verification
+ speed, and also aims to minimize the asymmetry between CPU and GPU.
+ Furthermore, it's designed for this particular use-case and hence
+ cryptocurrency miners are not incentivized to make optimized ASICs for it.
+
+ The overall scheme consists of several layers that provide different pieces
+ of this functionality:
+
+ 1) At the lowest layers, blake2b and siphash are used as hashing and PRNG
+ algorithms that are well suited to common 64-bit CPUs.
+ 2) A custom hash function family, HashX, randomizes its implementation for
+ each new seed value. These functions are tuned to utilize the pipelined
+ integer performance on a modern 64-bit CPU. This layer provides the
+ strongest ASIC resistance, since a hardware reimplementation would need
+ to include a CPU-like pipelined execution unit to keep up.
+ 3) The Equi-X layer itself builds on HashX and adds an algorithmic puzzle
+ that's designed to be strongly asymmetric and to require RAM to solve
+ efficiently.
+ 4) The PoW protocol itself builds on this Equi-X function with a particular
+ construction of the challenge input and particular constraints on the
+ allowed blake2b hash of the solution. This layer provides a linearly
+ adjustable effort that we can verify.
+ 5) Above the level of individual PoW handshakes, the client and service
+ form a closed-loop system that adjusts the effort of future handshakes.
+
+ The Equi-X scheme provides two functions that will be used in this proposal:
+ - equix_solve(challenge) which solves a puzzle instance, returning
+ a variable number of solutions per invocation depending on the specific
+ challenge value.
+ - equix_verify(challenge, solution) which verifies a puzzle solution
+ quickly. Verification still depends on executing the HashX function,
+ but far fewer times than when searching for a solution.
+
+ For the purposes of this proposal, all cryptographic algorithms are assumed
+ to produce and consume byte strings, even if internally they operate on
+ some other data type like 64-bit words. This is conventionally little endian
+ order for blake2b, which contrasts with Tor's typical use of big endian.
+ HashX itself is configured with an 8-byte output but its input is a single
+ 64-bit word of undefined byte order, of which only the low 16 bits are used
+ by Equi-X in its solution output. We treat Equi-X solution arrays as byte
+ arrays using their packed little endian 16-bit representation.
+
+ We tune Equi-X in section [EQUIX_TUNING].
+
+2.2.3. PoW effort
+
+ It's common for proof-of-work systems to define an exponential effort
+ function based on a particular number of leading zero bits or equivalent.
+ For the benefit of our effort estimation system, it's quite useful if we
+ instead have a linear scale. We use the first 32 bits of a hashed version
+ of the Equi-X solution as compared to the full 32-bit range.
+
+ Conceptually we could define a function:
+ unsigned effort(uint8_t *token)
+ which takes as its argument a hashed solution, interprets it as a
+ bitstring, and returns the quotient of dividing a bitstring of 1s by it.
+
+ So for example:
+ effort(00000001100010101101) = 11111111111111111111
+ / 00000001100010101101
+ or the same in decimal:
+ effort(6317) = 1048575 / 6317 = 165.
+
+ In practice we can avoid even having to perform this division, performing
+ just one multiply instead to see if a request's claimed effort is supported
+ by the smallness of the resulting 32-bit hash prefix. This assumes we send
+ the desired effort explicitly as part of each PoW solution. We do want to
+ force clients to pick a specific effort before looking for a solution,
+ otherwise a client could opportunistically claim a very large effort any
+ time a lucky hash prefix comes up. Thus the effort is communicated explicitly
+ in our protocol, and it forms part of the concatenated Equi-X challenge.
+
+3.1. Service encodes PoW parameters in descriptor [DESC_POW]
+
+ This whole protocol starts with the service encoding the PoW parameters in
+ the 'encrypted' (inner) part of the v3 descriptor. As follows:
+
+ "pow-params" SP type SP seed-b64 SP suggested-effort
+ SP expiration-time NL
+
+ [At most once]
+
+ type: The type of PoW system used. We call the one specified here "v1"
+
+ seed-b64: A random seed that should be used as the input to the PoW
+ hash function. Should be 32 random bytes encoded in base64
+ without trailing padding.
+
+ suggested-effort: An unsigned integer specifying an effort value that
+ clients should aim for when contacting the service. Can be
+ zero to mean that PoW is available but not currently
+ suggested for a first connection attempt. See
+ [EFFORT_ESTIMATION] for more details here.
+
+ expiration-time: A timestamp in "YYYY-MM-DDTHH:MM:SS" format (iso time
+ with no space) after which the above seed expires and
+ is no longer valid as the input for PoW. It's needed
+ so that our replay cache does not grow infinitely. It
+ should be set to RAND_TIME(now+7200, 900) seconds.
+
+ The service should refresh its seed when expiration-time passes. The service
+ SHOULD keep its previous seed in memory and accept PoWs using it to avoid
+ race-conditions with clients that have an old seed. The service SHOULD avoid
+ generating two consequent seeds that have a common 4 bytes prefix. See
+ [INTRO1_POW] for more info.
+
+ By RAND_TIME(ts, interval) we mean a time between ts-interval and ts, chosen
+ uniformly at random.
+
+3.2. Client fetches descriptor and computes PoW [CLIENT_POW]
+
+ If a client receives a descriptor with "pow-params", it should assume that
+ the service is prepared to receive PoW solutions as part of the introduction
+ protocol.
+
+ The client parses the descriptor and extracts the PoW parameters. It makes
+ sure that the <expiration-time> has not expired and if it has, it needs to
+ fetch a new descriptor.
+
+ The client should then extract the <suggested-effort> field to configure its
+ PoW 'target' (see [REF_TARGET]). The client SHOULD NOT accept 'target' values
+ that will cause unacceptably long PoW computation.
+
+ The client uses a "personalization string" P equal to the following
+ nul-terminated ASCII string: "Tor hs intro v1\0".
+
+ The client looks up `ID`, the current 32-byte blinded public ID
+ (KP_hs_blind_id) for the onion service.
+
+ To complete the PoW the client follows the following logic:
+
+ a) Client selects a target effort E, based on <suggested-effort> and past
+ connection attempt history.
+ b) Client generates a secure random 16-byte nonce N, as the starting
+ point for the solution search.
+ c) Client derives seed C by decoding 'seed-b64'.
+ d) Client calculates S = equix_solve(P || ID || C || N || E)
+ e) Client calculates R = ntohl(blake2b_32(P || ID || C || N || E || S))
+ f) Client checks if R * E <= UINT32_MAX.
+ f1) If yes, success! The client can submit N, E, the first 4 bytes of
+ C, and S.
+ f2) If no, fail! The client interprets N as a 16-byte little-endian
+ integer, increments it by 1 and goes back to step d).
+
+ Note that the blake2b hash includes the output length parameter in its
+ initial state vector, so a blake2b_32 is not equivalent to the prefix of a
+ blake2b_512. We calculate the 32-bit blake2b specifically, and interpret it
+ in network byte order as an unsigned integer.
+
+ At the end of the above procedure, the client should have S as the solution
+ of the Equix-X puzzle with N as the nonce, C as the seed. How quickly this
+ happens depends solely on the target effort E parameter.
+
+ The algorithm as described is suitable for single-threaded computation.
+ Optionally, a client may choose multiple nonces and attempt several solutions
+ in parallel on separate CPU cores. The specific choice of nonce is entirely
+ up to the client, so parallelization choices like this do not impact the
+ network protocol's interoperability at all.
+
+3.3. Client sends PoW in INTRO1 cell [INTRO1_POW]
+
+ Now that the client has an answer to the puzzle it's time to encode it into
+ an INTRODUCE1 cell. To do so the client adds an extension to the encrypted
+ portion of the INTRODUCE1 cell by using the EXTENSIONS field (see
+ [PROCESS_INTRO2] section in rend-spec-v3.txt). The encrypted portion of the
+ INTRODUCE1 cell only gets read by the onion service and is ignored by the
+ introduction point.
+
+ We propose a new EXT_FIELD_TYPE value:
+
+ [02] -- PROOF_OF_WORK
+
+ The EXT_FIELD content format is:
+
+ POW_VERSION [1 byte]
+ POW_NONCE [16 bytes]
+ POW_EFFORT [4 bytes]
+ POW_SEED [4 bytes]
+ POW_SOLUTION [16 bytes]
+
+ where:
+
+ POW_VERSION is 1 for the protocol specified in this proposal
+ POW_NONCE is the nonce 'N' from the section above
+ POW_EFFORT is the 32-bit integer effort value, in network byte order
+ POW_SEED is the first 4 bytes of the seed used
+
+ This will increase the INTRODUCE1 payload size by 43 bytes since the
+ extension type and length is 2 extra bytes, the N_EXTENSIONS field is always
+ present and currently set to 0 and the EXT_FIELD is 41 bytes. According to
+ ticket #33650, INTRODUCE1 cells currently have more than 200 bytes
+ available.
+
+3.4. Service verifies PoW and handles the introduction [SERVICE_VERIFY]
+
+ When a service receives an INTRODUCE1 with the PROOF_OF_WORK extension, it
+ should check its configuration on whether proof-of-work is enabled on the
+ service. If it's not enabled, the extension SHOULD BE ignored. If enabled,
+ even if the suggested effort is currently zero, the service follows the
+ procedure detailed in this section.
+
+ If the service requires the PROOF_OF_WORK extension but received an
+ INTRODUCE1 cell without any embedded proof-of-work, the service SHOULD
+ consider this cell as a zero-effort introduction for the purposes of the
+ priority queue (see section [INTRO_QUEUE]).
+
+3.4.1. PoW verification [POW_VERIFY]
+
+ To verify the client's proof-of-work the service MUST do the following steps:
+
+ a) Find a valid seed C that starts with POW_SEED. Fail if no such seed
+ exists.
+ b) Fail if N = POW_NONCE is present in the replay cache
+ (see [REPLAY_PROTECTION])
+ c) Calculate R = ntohl(blake2b_32(P || ID || C || N || E || S))
+ d) Fail if R * E > UINT32_MAX
+ e) Fail if equix_verify(P || ID || C || N || E, S) != EQUIX_OK
+ f) Put the request in the queue with a priority of E
+
+ If any of these steps fail the service MUST ignore this introduction request
+ and abort the protocol.
+
+ In this proposal we call the above steps the "top half" of introduction
+ handling. If all the steps of the "top half" have passed, then the circuit
+ is added to the introduction queue as detailed in section [INTRO_QUEUE].
+
+```