This started as a response to ticket #40792 where Coverity is
complaining about a potential year 2038 bug where we cast time_t from
approx_time() to uint32_t for use in token_bucket_ctr.
There was a larger can of worms though, since token_bucket really
doesn't want to be using wallclock time here. I audited the call sites
for approx_time() and changed any that used a 32-bit cast or made
inappropriate use of wallclock time. Things like certificate lifetime,
consensus intervals, etc. need wallclock time. Measurements of rates
over time, however, are better served by a monotonic timer that never
tries to sync with the wallclock.
Looking closer at token_bucket, its design is a bit odd because it was
initially intended for use with tick units but later forked into
token_bucket_rw which uses ticks to count bytes per second, and
token_bucket_ctr which uses seconds to count slower events. The rates
represented by either token bucket can't be lower than 1 per second, so
the slower timer in 'ctr' is necessary to represent the slower rates of
things like connections or introduction packets or rendezvous attempts.
I considered modifying token_bucket to use 64-bit timestamps overall
instead of 32-bit, but that seemed like an unnecessarily invasive change
that would grant some peace of mind but probably not help much. I was
more interested in removing the dependency on wallclock time. The
token_bucket_rw timer already uses monotonic time. This patch converts
token_bucket_ctr to use monotonic time as well. It introduces a new
monotime_coarse_absolute_sec(), which is currently the same as nsec
divided by a billion but could be optimized easily if we ever need to.
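As a sketch, the new helper amounts to the following (building on the
existing monotime_coarse_absolute_nsec()):

  uint64_t
  monotime_coarse_absolute_sec(void)
  {
    /* Just the coarse nsec clock scaled down for now; this could be
     * optimized later to skip the 64-bit division if it matters. */
    return monotime_coarse_absolute_nsec() / 1000000000;
  }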
This patch also might fix a rollover bug. I haven't tested this
extensively but I don't think the previous version of the rollover code
on either token bucket was correct, and I would expect it to get stuck
after the first rollover.
Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
|
|
This is a protocol breaking change that implements nickm's
changes to prop 327 to add an algorithm personalization string
and blinded HS id to the EquiX challenge string for our onion
service client puzzle.
This corresponds with the spec changes in torspec!130,
and it fixes a proposed vulnerability documented in
ticket tor#40789.
Clients and services prior to this patch will no longer
be compatible with the proposed "v1" proof-of-work protocol.
Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
|
|
This lets controller apps see the outgoing PoW effort on client
circuits, and the validated effort received on an incoming service
circuit.
Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
|
|
This dequeue path has been through a few revisions by now: first
limiting us to a fixed number of requests per event loop callback, then
adding a token bucket limit on top of that, and now the current
version, which has only the token bucket.
The thinking behind processing multiple requests per callback was to
optimize our usage of libevent, but in effect this creates a
prioritization problem. I think even a small fixed limit would be less
reliable than just backing out this optimization and always allowing
other callbacks to interrupt us in between dequeues.
With this patch I'm seeing much smoother queueing behavior when I add
artificial delays to the main thread in testing.
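A rough sketch of the resulting callback shape (function and field
names hypothetical):

  static void
  rend_pqueue_cb(mainloop_event_t *ev, void *arg)
  {
    hs_pow_state_t *state = arg;
    if (!bucket_has_token(&state->dequeue_bucket))
      return;                         /* wait for the next refill */
    dequeue_one_rend_request(state);  /* exactly one per callback */
    if (!pqueue_is_empty(state))
      mainloop_event_activate(ev);    /* yield to other callbacks first */
  }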
Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
|
|
This centralizes the logic for deciding on these magic thresholds,
and tries to reduce them to just two: a min and max. The min should be a
"nearly empty" threshold, indicating that the queue only contains work
we expect to be able to complete very soon. The max level triggers a
bulk culling process that reduces the queue to half that amount.
This patch calculates both thresholds based on the torrc pqueue rate
settings if they're present, and uses generic defaults if the user asked
for an unlimited dequeue rate in torrc.
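A sketch of that calculation (constants illustrative, not the actual
defaults):

  static void
  pqueue_thresholds(unsigned rate, unsigned burst,
                    unsigned *low_out, unsigned *high_out)
  {
    if (rate == 0) {
      /* Unlimited dequeue rate in torrc: use generic defaults. */
      *low_out = 16;
      *high_out = 16384;
    } else {
      *low_out = burst;       /* "nearly empty": finishable very soon */
      *high_out = 8 * burst;  /* bulk culling trims back to high/2 */
    }
  }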
Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
|
|
Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
|
|
Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
|
|
The goal of this patch is to add an additional mechanism for adjusting
PoW effort upwards, where clients rather than services can choose to
solve their puzzles at a higher effort than what was suggested in the
descriptor.
I wanted to use hs_cache's existing unreachability stats to drive this
effort bump, but this revealed some cases where a circuit (intro or
rend) that is closed early can end up in hs_cache with an all-zero intro
point key, where nobody will find it. This moves intro_auth_pk
initialization earlier in a couple of places and adds nonfatal asserts to
catch the problem if it shows up elsewhere.
The actual effort adjustment method I chose is to multiply the suggested
effort by (1 + unresponsive_count), then ensure the result is at least
1. If a service has suggested effort of 0 but we fail to connect,
retries will all use an effort of 1. If the suggestion was 50, we'll try
50, 100, 150, 200, etc. This is bounded both by our client effort limit
and by the limit on unresponsive_count (currently 5).
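In sketch form (the cap name is hypothetical):

  uint32_t
  effort_for_retry(uint32_t suggested, unsigned unresponsive_count)
  {
    uint64_t effort = (uint64_t)suggested * (1 + unresponsive_count);
    if (effort < 1)
      effort = 1;                    /* a retry always uses at least 1 */
    if (effort > MAX_CLIENT_EFFORT)  /* hypothetical client-side cap */
      effort = MAX_CLIENT_EFFORT;
    return (uint32_t)effort;
  }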
Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
|
|
hs_pow_free_service_state
ASan catches this pretty readily when ending a service gracefully while
a DoS is in progress and the queue is full of items that haven't yet
timed out.
The module boundaries in hs_circuit are quite fuzzy here, but I'm trying
to follow the vibe of the existing hs_pow code.
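The cleanup amounts to something like this (names hypothetical, using
tor's real SMARTLIST_FOREACH macro):

  if (state->rend_request_pqueue) {
    SMARTLIST_FOREACH(state->rend_request_pqueue, pending_rend_t *, req,
                      free_pending_rend(req));
    smartlist_free(state->rend_request_pqueue);
  }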
Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
|
|
I don't think the concept of "minimum effort" is really useful to us,
so this patch removes it entirely and consequently changes the way
that "total" effort is calculated, so that we don't rely on any minimum
and instead ramp up effort no faster than necessary.
If at least some portion of the attack is conducted by clients that
avoid PoW or provide incorrect solutions, those (potentially very
cheap) attacks will end up keeping the pqueue full. Prior to this patch,
that would cause suggested efforts to be unnecessarily high, because
rounding these very cheap requests up to even a minimum of 1 will
overestimate how much actual attack effort is being spent.
The result is that this patch is both a simplification and a slower
start: PoW effort now jumps up either by a single unit or by an
amount calculated from the actual effort in the queue.
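One way to read the resulting update rule, as a sketch (names
illustrative; the real calculation lives in the descriptor update
path):

  uint64_t avg_queued = total_queued_effort / MAX(1, n_queued);
  suggested_effort = MAX(suggested_effort + 1, avg_queued);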
Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
|
|
This adds a new "pow" module for the user-visible proof
of work support in ./configure, making it possible to disable
src/feature/hs/hs_pow at compile-time.
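With the module split out, builds can presumably exclude it via the
existing module convention (flag name inferred from tor's
--disable-module-* pattern):

  ./configure --disable-module-pow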
Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
|
|
This adds a token bucket ratelimiter on the dequeue side
of hs_pow's priority queue, along with config options
(HiddenServicePoWQueueRate/Burst) and docs for them.
I'm testing this as a way to limit the overhead of circuit
creation when we're experiencing a flood of rendezvous requests.
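For example (values illustrative, not recommendations):

  HiddenServicePoWQueueRate 250
  HiddenServicePoWQueueBurst 2500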
Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
|
|
Adds two new metrics for hs_pow, and an internal mechanism within
hs_metrics for implementing gauge parameters that reset before
every update.
Signed-off-by: Micah Elizabeth Scott <beth@torproject.org>
|
|
top_of_rend_pqueue_is_worthwhile requires a nonempty queue.
|
|
Now, pow should auto-enable and auto-disable itself.
|
|
This allows us to more accurately estimate effort, based on real bottom-half
throughput over the duration of a descriptor update.
|
|
|
|
our pqueue implementation does bizarre unspecified things with
ordering of elements that are equal. it certainly doesn't provide
the "first in, first out" behavior i was expecting.
now we make it explicit by saying that "equal-effort, added-earlier"
is higher priority.
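the comparator now looks something like this (struct and field names
hypothetical):

  static int
  compare_rend_request_by_effort_(const void *a_, const void *b_)
  {
    const pending_rend_t *a = a_, *b = b_;
    if (a->pow_effort != b->pow_effort) {
      /* higher effort sorts first */
      return a->pow_effort > b->pow_effort ? -1 : 1;
    }
    /* equal effort: the request added earlier sorts first */
    if (a->enqueued_ts != b->enqueued_ts)
      return a->enqueued_ts < b->enqueued_ts ? -1 : 1;
    return 0;
  }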
|
|
specifically, if we have 16 in-flight rend circs, and the next
one at the top of the pqueue has a lower effort than our suggested
effort, then don't launch it yet.
this way we always launch adequate-effort requests immediately, and
we always handle *some* low-effort requests, but we are ready at any
moment to handle a few new adequate-effort requests.
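roughly, the new check (names hypothetical):

  if (n_rend_circs_in_flight >= 16 &&
      top_request_effort(pqueue) < suggested_effort) {
    return;  /* leave it queued until we have spare capacity */
  }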
|
|
this change makes us reach the callback *after* each mainloop
run, rather than as the next event to run immediately after
activation.
with the old behavior, we were starving everything else to drain the
pqueue entirely, each time we got a new intro2 cell.
now we will at least get to other activities as well.
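concretely, the switch amounts to something like this (variable names
illustrative; mainloop_event_postloop_new() is the existing
compat_libevent API for events that run after each mainloop pass):

  - ev = mainloop_event_new(handle_rend_pqueue_cb, pow_state);
  + ev = mainloop_event_postloop_new(handle_rend_pqueue_cb, pow_state);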
|
|
not used in decision-making yet, but it's all ready to use in a
"don't dequeue any more if we have too many in-flight" kind of way
|
|
i.e. we were putting higher effort intro2 cells at the *end*
|
|
|
|
now we let ourselves queue up to twice as many as we expect, and when
we get to the limit we make a new pqueue and move over the first n
elements that we like most.
(the old approach, of calling SMARTLIST_DEL_CURRENT_KEEPORDER() on
elements in a pqueue, will destroy its heap property.)
we also discard elements that are too old, either during the trimming
process or if they come up as the next request to respond to.
lastly, fix a fencepost error on how many rend reqs we would handle
per iteration.
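the trim, sketched with the real smartlist_pqueue_* API (other names
hypothetical):

  smartlist_t *new_pq = smartlist_new();
  while (smartlist_len(new_pq) < keep_count && smartlist_len(old_pq) > 0) {
    pending_rend_t *req = smartlist_pqueue_pop(old_pq, compare_fn, idx_off);
    if (request_is_too_old(req))
      free_pending_rend(req);     /* discard stale entries as we go */
    else
      smartlist_pqueue_add(new_pq, compare_fn, idx_off, req);
  }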
|
|
should help with unit testing
|
|
Signed-off-by: David Goulet <dgoulet@torproject.org>
|
|
Signed-off-by: David Goulet <dgoulet@torproject.org>
|
|
If PoW is enabled, use a priority queue ordered by effort for the
rendezvous requests hooked into the mainloop.
Signed-off-by: David Goulet <dgoulet@torproject.org>
|
|
When parsing an INTRODUCE2 cell, we extract data in order to launch the
rendezvous circuit. This commit creates a data structure just for that
data, so it can be used by future commits for prop327 to copy that data
into a priority queue, instead of using the whole intro data structure,
which contains pointers that could disappear.
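For illustration, the shape of such a structure (name and fields
hypothetical):

  typedef struct rend_request_data_t {
    curve25519_public_key_t onion_pk;  /* rendezvous handshake key */
    uint8_t rendezvous_cookie[20];     /* copied by value */
    smartlist_t *link_specifiers;      /* owned copy, not a borrow */
    /* ...everything needed to launch the circuit, nothing borrowed. */
  } rend_request_data_t;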
Signed-off-by: David Goulet <dgoulet@torproject.org>
|
|
As of this commit, the tor main loop solves it. We might consider moving
this to the CPU pool at some point.
Signed-off-by: David Goulet <dgoulet@torproject.org>
|
|
This adds a `reason` label to the `hs_intro_rejected_intro_req_count` and
`hs_rdv_error_count` metrics introduced in #40755.
Metric lookup and initialization is now a bit more involved. This may be
fine for now, but it will become unwieldy if/when we add more labels (and
as such will need to be refactored).
Also, in the future, we may want to introduce finer-grained `reason` labels.
For example, the `invalid_introduce2` label actually covers multiple types of
errors that can happen during the processing of an INTRODUCE2 cell (such as
cell parse errors, replays, decryption errors).
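For example, the exported samples might look like this (the second
reason value is illustrative):

  hs_intro_rejected_intro_req_count{reason="invalid_introduce2"} 3
  hs_rdv_error_count{reason="rdv_circuit_failure"} 1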
Signed-off-by: Gabriela Moldovan <gabi@torproject.org>
|
|
This introduces a couple of new service side metrics:
* `hs_intro_rejected_intro_req_count`, which counts the number of introduction
requests rejected by the hidden service
* `hs_rdv_error_count`, which counts the number of rendezvous errors as seen by
the hidden service (this includes circuit establishment failures, failed
retries, and end-to-end circuit setup failures)
Closes #40755. This partially addresses #40717.
Signed-off-by: Gabriela Moldovan <gabi@torproject.org>
|
|
|
|
This can happen if our measurement subsystem decides to snatch it.
Fixes #40696
Signed-off-by: David Goulet <dgoulet@torproject.org>
|
|
Move the retry from circuit_expire_building() to when the offending
circuit is being closed.
Fixes #40695
Signed-off-by: David Goulet <dgoulet@torproject.org>
|
|
The logic is too convoluted and we can't efficiently apply a specific
timeout depending on the purpose.
Remove it and rely on the right circuit cutoff rather than keeping
this flagged circuit open forever.
Part of #40694
Signed-off-by: David Goulet <dgoulet@torproject.org>
|
|
We had 3 call sites setting up circuit congestion control, so this
commit consolidates all 3 into 1 function.
Related to #40586
Signed-off-by: David Goulet <dgoulet@torproject.org>
|
|
Once the cpath is finalized and e2e encryption is set up, transfer the
ccontrol from the rendezvous circuit to the cpath.
This allows the congestion control subsystem to properly function for
both upload and download side of onion services.
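In sketch form, the handoff amounts to:

  /* Hand the congestion control instance over to the cpath. */
  cpath->ccontrol = circ->ccontrol;
  circ->ccontrol = NULL;  /* the circuit no longer owns it */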
Closes #40586
Signed-off-by: David Goulet <dgoulet@torproject.org>
|
|
|
|
These parameters will vary depending on path length, especially for onions.
|
|
Signed-off-by: David Goulet <dgoulet@torproject.org>
|
|
Signed-off-by: David Goulet <dgoulet@torproject.org>
|
|
Move it to extension.trunnel instead so that extension ABI construction
can be used in parts of tor beyond just HS cells.
Specifically, we'll use it in the ntorv3 data payload and make a
congestion control parameter extension using that binary structure.
Only rename. No code behavior changes.
Signed-off-by: David Goulet <dgoulet@torproject.org>
|
|
|
|
This is unfortunately massive but both functionalities were extremely
intertwined and it would have required us to actually change the HSv2 code in
order to be able to split this into multiple commits.
After this commit, there are still artefacts of v2 in the code, but there
is no more support for v2 services, intro points, or HSDirs.
The v2 support for rendezvous circuits is still available, since that
code is the same as for v3, and we will leave it in so that a client
able to rendezvous on v2 can still transfer traffic. Once the entire
network has moved away from v2, we can remove v2 rendezvous point support.
Related to #40266
Signed-off-by: David Goulet <dgoulet@torproject.org>
|
|
Related to #40266
Signed-off-by: David Goulet <dgoulet@torproject.org>
|
|
|
|
Typos found with codespell.
Please keep in mind that this does have an impact on actual code
and must be carefully evaluated:
src/core/or/lttng_circuit.inc
- ctf_enum_value("CONTROLER", CIRCUIT_PURPOSE_CONTROLLER)
+ ctf_enum_value("CONTROLLER", CIRCUIT_PURPOSE_CONTROLLER)
|
|
The total number of rendezvous circuits created, and the number of
established ones, which is a gauge that decreases so the count stays
up to date.
Related to #40063
Signed-off-by: David Goulet <dgoulet@torproject.org>
|
|
|