From ea41a664476e7bc3690f080fbd3c13e2e32629fc Mon Sep 17 00:00:00 2001
From: Nick Mathewson <nickm@torproject.org>
Date: Fri, 22 Oct 2021 17:36:04 -0400
Subject: Add proposals 336 and 337.

---
 proposals/337-simpler-guard-usability.md | 138 +++++++++++++++++++++++++++++++
 1 file changed, 138 insertions(+)
 create mode 100644 proposals/337-simpler-guard-usability.md

(limited to 'proposals/337-simpler-guard-usability.md')

diff --git a/proposals/337-simpler-guard-usability.md b/proposals/337-simpler-guard-usability.md
new file mode 100644
index 0000000..d75c4d8
--- /dev/null
+++ b/proposals/337-simpler-guard-usability.md
@@ -0,0 +1,138 @@
+```
+Filename: 337-simpler-guard-usability.md
+Title: A simpler way to decide, "Is this guard usable?"
+Author: Nick Mathewson
+Created: 2021-10-22
+Status: Open
+```
+
+# Introduction
+
+The current `guard-spec` describes a mechanism for how to behave when
+our primary guards are unreachable, and we don't know which other guards
+are reachable.  This proposal describes a simpler method, currently
+implemented in [Arti](https://gitlab.torproject.org/tpo/core/arti/).
+
+(Note that this method might not actually give different results: its
+only advantage is that it is much simpler to implement.)
+
+## The task at hand
+
+For illustration, we'll assume that our primary guards are P1, P2, and
+P3, and our subsequent guards (in preference order) are G1, G2, G3, and
+so on.  The status of each guard is Reachable (we think we can connect
+to it), Unreachable (we think it's down), or Unknown (we haven't tried
+it recently).
+
+The question becomes, "What should we do when P1, P2, and P3 are
+Unreachable, and G1, G2, ... are all Unknown"?
+
+In this circumstance, we _could_ say that we only build circuits to G1,
+wait for them to succeed or fail, and only try G2 if we see that the
+circuits to G1 have failed completely.  But that causes needless delay
+when G1 is down.
+
+Instead, the first time we get a circuit request, we try to build one
+circuit to G1.  On the next circuit request, if the circuit to G1 isn't
+done yet, we launch a circuit to G2 instead.  The next request (if the
+G1 and G2 circuits are still pending) goes to G3, and so on.  But
+(here's the critical part!) we don't actually _use_ the circuit to G2
+unless the circuit to G1 fails, and we don't actually _use_ the circuit
+to G3 unless the circuits to G1 and G2 both fail.
+
+This approach causes Tor clients to check the status of multiple
+possible guards in parallel, while not actually _using_ any guard until
+we're sure that all the guards we'd rather use are down.
+
+## The current algorithm and its drawbacks
+
+For the current algorithm, see `guard-spec` section 4.9: circuits are
+exploratory if they are not using a primary guard.  If such an
+exploratory circuit is `waiting_for_better_guard`, then we advance it
+(or not) depending on the status of all other _circuits_ using guards that
+we'd rather be using.
+
+In other words, the current algorithm is described in terms of actions
+to take with given circuits.
+
+For Arti (and for other modular Tor implementations), however, this
+algorithm is a bit of a pain: it introduces dependencies between the
+guard code and the circuit handling code, requiring each one to mess
+with the other.
+
+# Proposal
+
+I suggest that we describe an alternative algorithm for handling circuits
+to non-primary guards, to be used in preference to the current
+algorithm.  Unlike the existing approach, it isolates the guard logic a
+bit better from the circuit logic.
+
+## Handling exploratory circuits
+
+When all primary guards are Unreachable, we need to try non-primary
+guards.  We select the first such guard (in preference order) that is
+neither Unreachable nor Pending.  Whenever we give out such a guard, if
+the guard's status is Unknown, then we call that guard "Pending" until
+the attempt to use it succeeds or fails.  We remember when the guard
+became Pending.
+
+> Aside: None of the above is a change from our existing specification.
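+
+A minimal sketch of this selection step, in Rust since that is what Arti
+is written in.  The `Status` enum and the `pick_guard` function are
+hypothetical names chosen for illustration; they are not taken from Arti
+or from `guard-spec`.
+
+```rust
+use std::time::Instant;
+
+// Hypothetical per-guard status, following the terms used in this proposal.
+#[derive(Clone, Copy, Debug, PartialEq)]
+enum Status {
+    Reachable,
+    Unreachable,
+    Unknown,
+    Pending { since: Instant },
+}
+
+// Pick the first guard (in preference order) that is neither Unreachable
+// nor Pending.  If its status is Unknown, mark it Pending and remember when.
+fn pick_guard(guards: &mut [Status], now: Instant) -> Option<usize> {
+    let idx = guards
+        .iter()
+        .position(|s| !matches!(s, Status::Unreachable | Status::Pending { .. }))?;
+    if guards[idx] == Status::Unknown {
+        guards[idx] = Status::Pending { since: now };
+    }
+    Some(idx)
+}
+```
+
+With P1 through P3 Unreachable and G1, G2, G3 Unknown, successive calls
+would hand out G1, then G2, then G3, marking each one Pending as it goes.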
+
+After completing a circuit, the implementation must check whether
+its guard is usable.  A guard is usable according to these rules:
+
+Primary guards are always usable.
+
+Non-primary guards are usable for a given circuit if every guard earlier
+in the preference list is either unsuitable for that circuit
+(e.g. because of family restrictions), or marked as Unreachable, or has
+been Pending for at least `{NONPRIMARY_GUARD_CONNECT_TIMEOUT}`.
+
+Non-primary guards are unusable for a given circuit if some guard earlier
+in the preference list is suitable for the circuit _and_ Reachable.
+
+Non-primary guards are unusable if they have not become usable after
+`{NONPRIMARY_GUARD_IDLE_TIMEOUT}` seconds.
+
+If a circuit's guard is neither usable nor unusable immediately, the
+circuit is not discarded; instead, it is kept (but not used) until it
+becomes usable or unusable.
+
+> I am not 100% sure whether this description produces the same behavior
+> as the current guard-spec, but it is simpler to describe, and has
+> proven to be simpler to implement.
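+
+As an illustration only, the usability rules above might be sketched in
+Rust as follows.  Every name here (`Guard`, `Usability`,
+`nonprimary_usability`) is hypothetical, the timeout values are
+placeholders rather than the values defined in `guard-spec`, and the
+sketch assumes that the idle timeout is measured from the moment the
+circuit was completed.  Primary guards are always usable, so the function
+only covers non-primary guards.
+
+```rust
+use std::time::{Duration, Instant};
+
+// Placeholder values for illustration; guard-spec defines the real ones.
+const NONPRIMARY_GUARD_CONNECT_TIMEOUT: Duration = Duration::from_secs(15);
+const NONPRIMARY_GUARD_IDLE_TIMEOUT: Duration = Duration::from_secs(600);
+
+#[derive(Clone, Copy, Debug, PartialEq)]
+enum Status {
+    Reachable,
+    Unreachable,
+    Unknown,
+    Pending { since: Instant },
+}
+
+#[derive(Clone, Copy, Debug, PartialEq)]
+enum Usability {
+    Usable,
+    Unusable,
+    Undecided,
+}
+
+struct Guard {
+    status: Status,
+    // False if this guard is ruled out for the circuit (e.g. family rules).
+    suitable: bool,
+}
+
+// `guards` is the full preference list; `idx` names a non-primary guard.
+// `waiting_since` is when the circuit to that guard was completed
+// (an assumption of this sketch).
+fn nonprimary_usability(
+    guards: &[Guard],
+    idx: usize,
+    waiting_since: Instant,
+    now: Instant,
+) -> Usability {
+    let earlier = &guards[..idx];
+    // Unusable: some earlier guard is suitable and Reachable.
+    if earlier.iter().any(|g| g.suitable && g.status == Status::Reachable) {
+        return Usability::Unusable;
+    }
+    // Usable: every earlier guard is unsuitable, Unreachable, or has been
+    // Pending for at least the connect timeout.
+    let all_ruled_out = earlier.iter().all(|g| {
+        if !g.suitable {
+            return true;
+        }
+        match g.status {
+            Status::Unreachable => true,
+            Status::Pending { since } => {
+                now.saturating_duration_since(since) >= NONPRIMARY_GUARD_CONNECT_TIMEOUT
+            }
+            Status::Reachable | Status::Unknown => false,
+        }
+    });
+    if all_ruled_out {
+        Usability::Usable
+    } else if now.saturating_duration_since(waiting_since) >= NONPRIMARY_GUARD_IDLE_TIMEOUT {
+        // Never became usable within the idle timeout.
+        Usability::Unusable
+    } else {
+        // Keep the circuit, but do not use it yet.
+        Usability::Undecided
+    }
+}
+```
+
+The `Undecided` result corresponds to the case where the circuit is kept
+but not used; the caller would re-evaluate it as guard statuses change or
+as time passes.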
+
+## Implications for program design
+
+(This entire section is implementation detail to explain why this is a
+simplification from the previous algorithm. It is for explanatory
+purposes only and is not part of the spec.)
+
+With this algorithm, we cut down the interaction between the guard code
+and the circuit code considerably, but we do not remove it entirely.
+Instead, there remains (in Arti terms) a pair of communication channels
+between the circuit manager and the guard manager:
+
+ * Whenever a guard is given to the circuit manager, the circuit manager
+   receives the write end of a single-use channel to
+   report whether the guard has succeeded or failed.
+
+ * Whenever a non-primary guard is given to the circuit manager, the
+   circuit manager receives the read end of a single-use channel that will
+   it whether the guard is usable or unusable.  This channel doesn't
+   report anything until the guard has one status or the other.
+
+With this design, the circuit manager never needs to look at the list of
+guards, and the guard manager never needs to look at the list of
+circuits.
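+
+A rough sketch of this pair of channels follows.  It uses
+`std::sync::mpsc` as a stand-in for a single-use channel, since the Rust
+standard library has no one-shot channel type; Arti itself may well use
+something different.  The names `GuardHandle`, `GuardMonitor`, and
+`hand_out_guard` are hypothetical and exist only for this illustration.
+
+```rust
+use std::sync::mpsc::{channel, Receiver, Sender};
+
+#[derive(Debug, PartialEq)]
+enum GuardOutcome {
+    Succeeded,
+    Failed,
+}
+
+#[derive(Debug, PartialEq)]
+enum Usability {
+    Usable,
+    Unusable,
+}
+
+// What the circuit manager receives along with a guard.
+struct GuardHandle {
+    // Write end: report whether the guard succeeded or failed.
+    report: Sender<GuardOutcome>,
+    // Read end: learn whether the guard is usable.  None for primary
+    // guards, which are always usable.
+    usable: Option<Receiver<Usability>>,
+}
+
+// What the guard manager keeps for the same guard.
+struct GuardMonitor {
+    outcome: Receiver<GuardOutcome>,
+    verdict: Option<Sender<Usability>>,
+}
+
+// Create both halves when a guard is handed to the circuit manager.
+fn hand_out_guard(primary: bool) -> (GuardHandle, GuardMonitor) {
+    let (report, outcome) = channel();
+    let (verdict, usable) = if primary {
+        (None, None)
+    } else {
+        let (tx, rx) = channel();
+        (Some(tx), Some(rx))
+    };
+    (GuardHandle { report, usable }, GuardMonitor { outcome, verdict })
+}
+```
+
+In this sketch the circuit manager only ever touches its `GuardHandle`,
+and the guard manager only ever touches its `GuardMonitor`, which mirrors
+the separation described above.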
+
+## Subtleties concerning "guard success"
+
+Note that the above definitions of a Reachable guard depend on reporting
+when the _guard_ is successful or failed. This is not necessarily the
+same as reporting whether the _circuit_ is successful or failed.  For
+example, a circuit that fails after the first hop does not necessarily
+indicate that there's anything wrong with the guard.  Similarly, we can
+reasonably conclude that the guard is working (at least somewhat) as
+long as we have an open channel to it.
+
-- 
cgit v1.2.3-54-g00ecf