author Nick Mathewson <nickm@torproject.org> 2021-10-22 17:36:04 -0400
committer Nick Mathewson <nickm@torproject.org> 2021-10-22 17:36:04 -0400
commit ea41a664476e7bc3690f080fbd3c13e2e32629fc
tree 95674a08081e89563a4c4104c54ffa87ecac5c51 /proposals/336-randomize-guard-retries.md
parent 52a5e7152726d279fc73a437d38a6c23eb9fcf73
Add proposals 336 and 337.
Diffstat (limited to 'proposals/336-randomize-guard-retries.md')
-rw-r--r-- proposals/336-randomize-guard-retries.md | 87
1 files changed, 87 insertions, 0 deletions
diff --git a/proposals/336-randomize-guard-retries.md b/proposals/336-randomize-guard-retries.md
new file mode 100644
index 0000000..5ee5b71
--- /dev/null
+++ b/proposals/336-randomize-guard-retries.md
@@ -0,0 +1,87 @@
+```
+Filename: 336-randomize-guard-retries.md
+Title: Randomized schedule for guard retries
+Author: Nick Mathewson
+Created: 2021-10-22
+Status: Open
+```
+
+# Introduction
+
+When we notice that a guard isn't working, we don't mark it as retriable
+until a certain interval has passed. Currently, these intervals are
+fixed, as described in the documentation for `GUARDS_RETRY_SCHED` in
+`guard-spec` appendix A.1. Here we propose using a randomized retry
+interval instead, based on the same decorrelated-jitter algorithm we use
+for directory retries.
+
+The upside of this approach is that it makes our behavior in
+the presence of an unreliable network a bit harder for an attacker to
+predict. It also means that if a guard goes down for a while, its
+clients will notice that it is up at staggered times, rather than
+probing it in lock-step.
+
+The downside of this approach is that, if we get unlucky enough, we may
+not notice that a preferred guard is back online until later than we
+would have under the fixed schedule.
+
+Note that when a guard is marked retriable, it isn't necessarily retried
+immediately. Instead, its status is changed from "Unreachable" to
+"Unknown", which will cause it to get retried.
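The status change described above can be sketched as follows. This is a minimal illustration, not Tor's implementation: the `Guard` class, the `Reachable` enum, and the `mark_failed`/`update_retriable` methods are hypothetical names, and the caller supplies the retry delay (under this proposal, it would come from the randomized schedule).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Reachable(Enum):
    """Hypothetical names for a guard's reachability status."""
    REACHABLE = "Reachable"
    UNREACHABLE = "Unreachable"
    UNKNOWN = "Unknown"


@dataclass
class Guard:
    """Hypothetical guard record; not Tor's actual data structure."""
    name: str
    reachable: Reachable = Reachable.UNKNOWN
    retry_at: Optional[float] = None  # when the guard becomes retriable

    def mark_failed(self, now: float, delay: float) -> None:
        """Record a failure; the guard becomes retriable after `delay` seconds."""
        self.reachable = Reachable.UNREACHABLE
        self.retry_at = now + delay

    def update_retriable(self, now: float) -> bool:
        """Move Unreachable -> Unknown once `retry_at` has passed.

        No connection is attempted here: the guard is only relabeled, and
        the actual retry happens later, when the guard is next considered.
        """
        if (
            self.reachable is Reachable.UNREACHABLE
            and self.retry_at is not None
            and now >= self.retry_at
        ):
            self.reachable = Reachable.UNKNOWN
            self.retry_at = None
            return True
        return False
```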
+
+For reference, our previous schedule was:
+
+```
+ {param:PRIMARY_GUARDS_RETRY_SCHED}
+ -- every 10 minutes for the first six hours,
+ -- every 90 minutes for the next 90 hours,
+ -- every 4 hours for the next 3 days,
+ -- every 9 hours thereafter.
+
+ {param:GUARDS_RETRY_SCHED}
+ -- every hour for the first six hours,
+ -- every 4 hours for the next 90 hours,
+ -- every 18 hours for the next 3 days,
+ -- every 36 hours thereafter.
+```
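For comparison, the fixed schedule above can be written as a lookup from "time since the guard was first marked unreachable" to the retry interval currently in force. This is a sketch under stated assumptions: the function names are hypothetical, and `attempts_within` assumes each retry happens exactly one interval after the previous one, starting from a failure at time 0.

```python
from typing import List, Optional, Tuple

MINUTE = 60
HOUR = 60 * MINUTE
DAY = 24 * HOUR

# Each phase is (phase length in seconds, retry interval in seconds);
# a length of None means "thereafter".
Schedule = List[Tuple[Optional[int], int]]

PRIMARY_GUARDS_RETRY_SCHED: Schedule = [
    (6 * HOUR, 10 * MINUTE),
    (90 * HOUR, 90 * MINUTE),
    (3 * DAY, 4 * HOUR),
    (None, 9 * HOUR),
]
GUARDS_RETRY_SCHED: Schedule = [
    (6 * HOUR, 1 * HOUR),
    (90 * HOUR, 4 * HOUR),
    (3 * DAY, 18 * HOUR),
    (None, 36 * HOUR),
]


def fixed_retry_interval(elapsed: float, primary: bool) -> int:
    """Interval in force `elapsed` seconds after the guard first failed."""
    sched = PRIMARY_GUARDS_RETRY_SCHED if primary else GUARDS_RETRY_SCHED
    phase_end = 0
    for length, interval in sched:
        if length is None:
            return interval
        phase_end += length
        if elapsed < phase_end:
            return interval
    raise ValueError("schedule must end with an open-ended phase")


def attempts_within(horizon: float, primary: bool) -> int:
    """Count retries at or before `horizon`, assuming a failure at time 0."""
    t, count = 0, 0
    while True:
        t += fixed_retry_interval(t, primary)
        if t > horizon:
            return count
        count += 1
```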
+
+# The new algorithm
+
+We re-use the decorrelated-jitter algorithm from `dir-spec` section 5.5.
+The specific formula used to compute the (i+1)th delay is:
+
+```
+Delay_{i+1} = MIN(cap, random_between(lower_bound, upper_bound))
+where upper_bound = MAX(lower_bound+1, Delay_i * 3)
+      lower_bound = MAX(1, base_delay).
+```
+
+For primary guards, we set base_delay to 30 seconds and cap to 6 hours.
+
+For non-primary guards, we set base_delay to 10 minutes and cap to 36
+hours.
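One step of the formula, with the two parameter sets above, might look like this in Python. It is a sketch, not the `dir-spec` reference: it assumes `random_between` draws a uniform integer with both bounds inclusive and that the first delay (`Delay_0`) equals `base_delay`. Both are assumptions, as are the names `next_delay` and `delay_schedule`.

```python
import random
from typing import List

MINUTE = 60
HOUR = 60 * MINUTE

# Parameter sets from the text above (seconds).
PRIMARY = {"base_delay": 30, "cap": 6 * HOUR}
NON_PRIMARY = {"base_delay": 10 * MINUTE, "cap": 36 * HOUR}


def next_delay(prev_delay: int, base_delay: int, cap: int,
               rng: random.Random) -> int:
    """Delay_{i+1} = MIN(cap, random_between(lower_bound, upper_bound))."""
    lower_bound = max(1, base_delay)
    upper_bound = max(lower_bound + 1, prev_delay * 3)
    # Assumption: random_between is uniform over integers, both ends inclusive.
    return min(cap, rng.randint(lower_bound, upper_bound))


def delay_schedule(base_delay: int, cap: int, n: int,
                   rng: random.Random) -> List[int]:
    """Return the first `n` retry delays for one guard.

    Assumption: Delay_0 is base_delay.
    """
    delays = []
    delay = base_delay
    for _ in range(n):
        delays.append(delay)
        delay = next_delay(delay, base_delay, cap, rng)
    return delays
```

For example, `delay_schedule(n=10, rng=random.Random(0), **PRIMARY)` yields ten delays that start at 30 seconds and never exceed six hours. Because `upper_bound` depends on the previous delay, a large draw widens the next range while a small draw narrows it again, and each client draws independently.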
+
+(These parameters were selected by simulating the results of using them
+until they looked "a bit more aggressive" than the current algorithm, but
+not too much.)
+
+The average behavior for the new primary schedule is:
+
+```
+First 1.0 hours: 10.14283 attempts. (Avg delay 4m 47.41s)
+First 6.0 hours: 19.02377 attempts. (Avg delay 15m 36.95s)
+First 96.0 hours: 56.11173 attempts. (Avg delay 1h 40m 3.13s)
+First 168.0 hours: 83.67091 attempts. (Avg delay 1h 58m 43.16s)
+Steady state: 2h 36m 44.63s between attempts.
+```
+
+The average behavior for the new non-primary schedule is:
+
+```
+First 1.0 hours: 3.08069 attempts. (Avg delay 14m 26.08s)
+First 6.0 hours: 8.1473 attempts. (Avg delay 35m 25.27s)
+First 96.0 hours: 22.57442 attempts. (Avg delay 3h 49m 32.16s)
+First 168.0 hours: 29.02873 attempts. (Avg delay 5h 27m 2.36s)
+Steady state: 11h 15m 28.47s between attempts.
+```
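The averages above come from a simulation whose code is not part of this proposal. A Monte Carlo sketch along the same lines follows. Its conventions are assumptions: the guard fails at time 0, the first retry comes after `base_delay`, an attempt counts if it happens within the window, and "avg delay" is taken to be the mean gap between counted attempts. Its numbers may therefore not match the tables exactly.

```python
import random
import statistics


def next_delay(prev, base_delay, cap, rng):
    """Decorrelated jitter, as in the formula above (inclusive bounds assumed)."""
    lower = max(1, base_delay)
    upper = max(lower + 1, prev * 3)
    return min(cap, rng.randint(lower, upper))


def simulate(base_delay, cap, horizon, trials=10_000, seed=0):
    """Estimate (mean attempts, mean delay) within `horizon` seconds."""
    rng = random.Random(seed)
    counts = []
    delays = []
    for _ in range(trials):
        t, delay, n = 0, base_delay, 0
        # Count every retry that lands at or before the end of the window.
        while t + delay <= horizon:
            t += delay
            n += 1
            delays.append(delay)
            delay = next_delay(delay, base_delay, cap, rng)
        counts.append(n)
    mean_delay = statistics.fmean(delays) if delays else float("nan")
    return statistics.fmean(counts), mean_delay


if __name__ == "__main__":
    HOUR = 3600
    for label, base, cap in [("primary", 30, 6 * HOUR),
                             ("non-primary", 600, 36 * HOUR)]:
        for hours in (1, 6, 96, 168):
            attempts, avg = simulate(base, cap, hours * HOUR, trials=1000)
            print(f"{label}: first {hours}h: {attempts:.2f} attempts, "
                  f"avg delay {avg / 60:.1f} min")
```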
+