From ea41a664476e7bc3690f080fbd3c13e2e32629fc Mon Sep 17 00:00:00 2001 From: Nick Mathewson Date: Fri, 22 Oct 2021 17:36:04 -0400 Subject: Add proposals 336 and 337. --- proposals/336-randomize-guard-retries.md | 87 +++++++++++++++++++ proposals/337-simpler-guard-usability.md | 138 +++++++++++++++++++++++++++++++ 2 files changed, 225 insertions(+) create mode 100644 proposals/336-randomize-guard-retries.md create mode 100644 proposals/337-simpler-guard-usability.md (limited to 'proposals') diff --git a/proposals/336-randomize-guard-retries.md b/proposals/336-randomize-guard-retries.md new file mode 100644 index 0000000..5ee5b71 --- /dev/null +++ b/proposals/336-randomize-guard-retries.md @@ -0,0 +1,87 @@ +``` +Filename: 336-randomize-guard-retries.md +Title: Randomized schedule for guard retries +Author: Nick Mathewson +Created: 2021-10-22 +Status: Open +``` + +# Introduction + +When we notice that a guard isn't working, we don't mark it as retriable +until a certain interval has passed. Currently, these intervals are +fixed, as described in the documentation for `GUARDS_RETRY_SCHED` in +`guard-spec` appendix A.1. Here we propose using a randomized retry +interval instead, based on the same decorrelated-jitter algorithm we use +for directory retries. + +The upside of this approach is that it makes our behavior in +the presence of an unreliable network a bit harder for an attacker to +predict. It also means that if a guard goes down for a while, its +clients will notice that it is up at staggered times, rather than +probing it in lock-step. + +The downside of this approach is that we can, if we get unlucky +enough, completely fail to notice that a preferred guard is online when +we would otherwise have noticed sooner. + +Note that when a guard is marked retriable, it isn't necessarily retried +immediately. Instead, its status is changed from "Unreachable" to +"Unknown", which will cause it to get retried. + +For reference, our previous schedule was: + +``` + {param:PRIMARY_GUARDS_RETRY_SCHED} + -- every 10 minutes for the first six hours, + -- every 90 minutes for the next 90 hours, + -- every 4 hours for the next 3 days, + -- every 9 hours thereafter. + + {param:GUARDS_RETRY_SCHED} -- + -- every hour for the first six hours, + -- every 4 hours for the next 90 hours, + -- every 18 hours for the next 3 days, + -- every 36 hours thereafter. +``` + +# The new algorithm + +We re-use the decorrelated-jitter algorithm from `dir-spec` section 5.5. +The specific formula used to compute the 'i+1'th delay is: + +``` +Delay_{i+1} = MIN(cap, random_between(lower_bound, upper_bound)) +where upper_bound = MAX(lower_bound+1, Delay_i * 3) + lower_bound = MAX(1, base_delay). +``` + +For primary guards, we set base_delay to 30 seconds and cap to 6 hours. + +For non-primary guards, we set base_delay to 10 minutes and cap to 36 +hours. + +(These parameters were selected by simulating the results of using them +until they looked "a bit more aggressive" than the current algorithm, but +not too much.) + +The average behavior for the new primary schedule is: + +``` +First 1.0 hours: 10.14283 attempts. (Avg delay 4m 47.41s) +First 6.0 hours: 19.02377 attempts. (Avg delay 15m 36.95s) +First 96.0 hours: 56.11173 attempts. (Avg delay 1h 40m 3.13s) +First 168.0 hours: 83.67091 attempts. (Avg delay 1h 58m 43.16s) +Steady state: 2h 36m 44.63s between attempts. +``` + +The average behavior for the new non-primary schedule is: + +``` +First 1.0 hours: 3.08069 attempts. (Avg delay 14m 26.08s) +First 6.0 hours: 8.1473 attempts. (Avg delay 35m 25.27s) +First 96.0 hours: 22.57442 attempts. (Avg delay 3h 49m 32.16s) +First 168.0 hours: 29.02873 attempts. (Avg delay 5h 27m 2.36s) +Steady state: 11h 15m 28.47s between attempts. +``` + diff --git a/proposals/337-simpler-guard-usability.md b/proposals/337-simpler-guard-usability.md new file mode 100644 index 0000000..d75c4d8 --- /dev/null +++ b/proposals/337-simpler-guard-usability.md @@ -0,0 +1,138 @@ +``` +Filename: 337-simpler-guard-usability.md +Title: A simpler way to decide, "Is this guard usable?" +Author: Nick Mathewson +Created: 2021-10-22 +Status: Open +``` + +# Introduction + +The current `guard-spec` describes a mechanism for how to behave when +our primary guards are unreachable, and we don't know which other guards +are reachable. This proposal describes a simpler method, currently +implemented in [Arti](https://gitlab.torproject.org/tpo/core/arti/). + +(Note that this method might not actually give different results: its +only advantage is that it is much simpler to implement.) + +## The task at hand + +For illustration, we'll assume that our primary guards are P1, P2, and +P3, and our subsequent guards (in preference order) are G1, G2, G3, and +so on. The status of each guard is Reachable (we think we can connect +to it), Unreachable (we think it's down), or Unknown (we haven't tried +it recently). + +The question becomes, "What should we do when P1, P2, and P3 are +Unreachable, and G1, G2, ... are all Unknown"? + +In this circumstance, we _could_ say that we only build circuits to G1, +wait for them to succeed or fail, and only try G2 if we see that the +circuits to G1 have failed completely. But that delays in the case that +G1 is down. + +Instead, the first time we get a circuit request, we try to build one +circuit to G1. On the next circuit request, if the circuit to G1 isn't +done yet, we launch a circuit to G2 instead. The next request (if the +G1 and G2 circuits are still pending) goes to G3, and so on. But +(here's the critical part!) we don't actually _use_ the circuit to G2 +unless the circuit to G1 fails, and we don't actually _use_ the circuit +to G3 unless the circuits to G1 and G2 both fail. + +This approach causes Tor clients to check the status of multiple +possible guards in parallel, while not actually _using_ any guard until +we're sure that all the guards we'd rather use are down. + +## The current algorithm and its drawbacks + +For the current algorithm, see `guard-spec` section 4.9: circuits are +exploratory if they are not using a primary guard. If such an +exploratory circuit is `waiting_for_better_guard`, then we advance it +(or not) depending on the status of all other _circuits_ using guards that +we'd rather be using. + +In other words, the current algorithm is described in terms of actions +to take with given circuits. + +For Arti (and for other modular Tor implementations), however, this +algorithm is a bit of a pain: it introduces dependencies between the +guard code and the circuit handling code, requiring each one to mess +with the other. + +# Proposal + +I suggest that we describe an alternative algorithm for handing circuits +to non-primary guards, to be used in preference to the current +algorithm. Unlike the existing approach, it isolates the guard logic a +bit better from the circuit logic. + +## Handling exploratory circuits + +When all primary guards are Unreachable, we need to try non-primary +guards. We select the first such guard (in preference order) that is +neither Unreachable nor Pending. Whenever we give out such a guard, if +the guard's status is Unknown, then we call that guard "Pending" until +the attempt to use it succeeds or fails. We remember when the guard +became Pending. + +> Aside: None of the above is a change from our existing specification. + +After completing a circuit, the implementation must check whether +its guard is usable. A guard is usable according to these rules: + +Primary guards are always usable. + +Non-primary guards are usable for a given circuit if every guard earlier +in the preference list is either unsuitable for that circuit +(e.g. because of family restrictions), or marked as Unreachable, or has +been pending for at least `{NONPRIMARY_GUARD_CONNECT_TIMEOUT}`. + +Non-primary guards are unusable for a given circuit if some guard earlier +in the preference list is suitable for the circuit _and_ Reachable. + +Non-primary guards are unusable if they have not become usable after +`{NONPRIMARY_GUARD_IDLE_TIMEOUT}` seconds. + +If a circuit's guard is neither usable nor unusable immediately, the +circuit is not discarded; instead, it is kept (but not used) until it +becomes usable or unusable. + +> I am not 100% sure whether this description produces the same behavior +> as the current guard-spec, but it is simpler to describe, and has +> proven to be simpler to implement. + +## Implications for program design. + +(This entire section is implementation detail to explain why this is a +simplification from the previous algorithm. It is for explanatory +purposes only and is not part of the spec.) + +With this algorithm, we cut down the interaction between the guard code +and the circuit code considerably, but we do not remove it entirely. +Instead, there remains (in Arti terms) a pair of communication channels +between the circuit manager and the guard manager: + + * Whenever a guard is given to the circuit manager, the circuit manager + receives the write end of a single-use channel to + report whether the guard has succeeded or failed. + + * Whenever a non-primary guard is given to the circuit manager, the + circuit receives the read end of a single-use channel that will tell + it whether the guard is usable or unusable. This channel doesn't + report anything until the guard has one status or the other. + +With this design, the circuit manager never needs to look at the list of +guards, and the guard manager never needs to look at the list of +circuits. + +## Subtleties concerning "guard success" + +Note that the above definitions of a Reachable guard depend on reporting +when the _guard_ is successful or failed. This is not necessarily the +same as reporting whether the _circuit_ is successful or failed. For +example, a circuit that fails after the first hop does not necessarily +indicate that there's anything wrong with the guard. Similarly, we can +reasonably conclude that the guard is working (at least somewhat) as +long as we have an open channel to it. + -- cgit v1.2.3-54-g00ecf