diff options
Diffstat (limited to 'spec/guard-spec')
-rw-r--r-- | spec/guard-spec/algorithm.md | 672 | ||||
-rw-r--r-- | spec/guard-spec/appendices.md | 152 | ||||
-rw-r--r-- | spec/guard-spec/guard-selection/index.md | 72 | ||||
-rw-r--r-- | spec/guard-spec/index.md | 47 | ||||
-rw-r--r-- | spec/guard-spec/state-instances.md | 49 |
5 files changed, 992 insertions, 0 deletions
diff --git a/spec/guard-spec/algorithm.md b/spec/guard-spec/algorithm.md new file mode 100644 index 0000000..ba1c725 --- /dev/null +++ b/spec/guard-spec/algorithm.md @@ -0,0 +1,672 @@ +<a id="guard-spec.txt-4"></a> + +# The algorithm + +<a id="guard-spec.txt-4.0"></a> + +## The guards listed in the current consensus. {#GUARDS} + +By {set:GUARDS} we mean the set of all guards in the current +consensus that are usable for all circuits and directory +requests. (They must have the flags: Stable, Fast, V2Dir, Guard.) + +**Rationale** + +We require all guards to have the flags that we potentially need +from any guard, so that all guards are usable for all circuits. + +<a id="guard-spec.txt-4.1"></a> + +## The Sampled Guard Set. {#SAMPLED} + +We maintain a set, {set:SAMPLED_GUARDS}, that persists across +invocations of Tor. It is a subset of the nodes ordered by a sample idx that +we have seen listed as a guard in the consensus at some point. +For each such guard, we record persistently: + +```text + - {pvar:ADDED_ON_DATE}: The date on which it was added to + sampled_guards. + + We set this value to a point in the past, using + RAND(now, {GUARD_LIFETIME}/10). See + Appendix [RANDOM] below. + + - {pvar:ADDED_BY_VERSION}: The version of Tor that added it to + sampled_guards. + + - {pvar:IS_LISTED}: Whether it was listed as a usable Guard in + the _most recent_ consensus we have seen. + + - {pvar:FIRST_UNLISTED_AT}: If IS_LISTED is false, the publication date + of the earliest consensus in which this guard was listed such that we + have not seen it listed in any later consensus. Otherwise "None." + We randomize this to a point in the past, based on + RAND(added_at_time, {REMOVE_UNLISTED_GUARDS_AFTER} / 5) +``` + +For each guard in {SAMPLED_GUARDS}, we also record this data, +non-persistently: + +```text + - {tvar:last_tried_connect}: A 'last tried to connect at' + time. Default 'never'. + + - {tvar:is_reachable}: an "is reachable" tristate, with + possible values { <state:yes>, <state:no>, <state:maybe> }. + Default '<maybe>.' + + [Note: "yes" is not strictly necessary, but I'm + making it distinct from "maybe" anyway, to make our + logic clearer. A guard is "maybe" reachable if it's + worth trying. A guard is "yes" reachable if we tried + it and succeeded.] + + [Note 2: This variable is, in fact, a combination + of different context-sensitive variables depending + on the _purpose_ for which we are selecting a guard. + When we are selecing a guard for an ordinary + circuit, we look at the regular {is_reachable}. + But when we are selecting the guard for a one-hop + directory circuit, we also look at an instance + of {is_reachable} that tracks whether + downloads of the types we are making have failed + recently.] + + - {tvar:failing_since}: The first time when we failed to + connect to this guard. Defaults to "never". Reset to + "never" when we successfully connect to this guard. + + - {tvar:is_pending} A "pending" flag. This indicates that we + are trying to build an exploratory circuit through the + guard, and we don't know whether it will succeed. + + - {tvar:pending_since}: A timestamp. Set whenever we set + {tvar:is_pending} to true; cleared whenever we set + {tvar:is_pending} to false. NOTE +``` + +We require that {SAMPLED_GUARDS} contain at least +{MIN_FILTERED_SAMPLE} guards from the consensus (if possible), +but not more than {MAX_SAMPLE_THRESHOLD} of the number of guards +in the consensus, and not more than {MAX_SAMPLE_SIZE} in total. +(But if the maximum would be smaller than {MIN_FILTERED_SAMPLE}, we +set the maximum at {MIN_FILTERED_SAMPLE}.) + +To add a new guard to {SAMPLED_GUARDS}, pick an entry at random from +({GUARDS} - {SAMPLED_GUARDS}), according to the path selection rules. + +We remove an entry from {SAMPLED_GUARDS} if: + +```text + * We have a live consensus, and {IS_LISTED} is false, and + {FIRST_UNLISTED_AT} is over {REMOVE_UNLISTED_GUARDS_AFTER} + days in the past. + + OR + + * We have a live consensus, and {ADDED_ON_DATE} is over + {GUARD_LIFETIME} ago, *and* {CONFIRMED_ON_DATE} is either + "never", or over {GUARD_CONFIRMED_MIN_LIFETIME} ago. +``` + +Note that {SAMPLED_GUARDS} does not depend on our configuration. +It is possible that we can't actually connect to any of these +guards. + +**Rationale** + +The {SAMPLED_GUARDS} set is meant to limit the total number of +guards that a client will connect to in a given period. The +upper limit on its size prevents us from considering too many +guards. + +The first expiration mechanism is there so that our +{SAMPLED_GUARDS} list does not accumulate so many dead +guards that we cannot add new ones. + +The second expiration mechanism makes us rotate our guards slowly +over time. + +Ordering the {SAMPLED_GUARDS} set in the order in which we sampled those +guards and picking guards from that set according to this ordering improves +load-balancing. It is closer to offer the expected usage of the guard nodes +as per the path selection rules. + +The ordering also improves on another objective of this proposal: trying to +resist an adversary pushing clients over compromised guards, since the +adversary would need the clients to exhaust all their initial +{SAMPLED_GUARDS} set before having a chance to use a newly deployed +adversary node. + +<a id="guard-spec.txt-4.2"></a> + +## The Usable Sample {#FILTERED} + +We maintain another set, {set:FILTERED_GUARDS}, that does not +persist. It is derived from: + +```text + - {SAMPLED_GUARDS} + - our current configuration, + - the path bias information. +``` + +A guard is a member of {set:FILTERED_GUARDS} if and only if all +of the following are true: + +```text + - It is a member of {SAMPLED_GUARDS}, with {IS_LISTED} set to + true. + - It is not disabled because of path bias issues. + - It is not disabled because of ReachableAddresses policy, + the ClientUseIPv4 setting, the ClientUseIPv6 setting, + the FascistFirewall setting, or some other + option that prevents using some addresses. + - It is not disabled because of ExcludeNodes. + - It is a bridge if UseBridges is true; or it is not a + bridge if UseBridges is false. + - Is included in EntryNodes if EntryNodes is set and + UseBridges is not. (But see 2.B above). +``` + +We have an additional subset, {set:USABLE_FILTERED_GUARDS}, which +is defined to be the subset of {FILTERED_GUARDS} where +{is_reachable} is `<yes>` or `<maybe>`. + +We try to maintain a requirement that {USABLE_FILTERED_GUARDS} +contain at least {MIN_FILTERED_SAMPLE} elements: + +```text + Whenever we are going to sample from {USABLE_FILTERED_GUARDS}, + and it contains fewer than {MIN_FILTERED_SAMPLE} elements, we + add new elements to {SAMPLED_GUARDS} until one of the following + is true: + + * {USABLE_FILTERED_GUARDS} is large enough, + OR + * {SAMPLED_GUARDS} is at its maximum size. + + ** Rationale ** +``` + +These filters are applied _after_ sampling: if we applied them +before the sampling, then our sample would reflect the set of +filtering restrictions that we had in the past. + +<a id="guard-spec.txt-4.3"></a> + +## The confirmed-guard list. {#CONFIRMED} + +\[formerly USED_GUARDS\] + +We maintain a persistent ordered list, {list:CONFIRMED_GUARDS}. +It contains guards that we have used before, in our preference +order of using them. It is a subset of {SAMPLED_GUARDS}. For +each guard in this list, we store persistently: + +- {pvar:IDENTITY} Its fingerprint. + +```text + - {pvar:CONFIRMED_ON_DATE} When we added this guard to + {CONFIRMED_GUARDS}. + + Randomized to a point in the past as RAND(now, {GUARD_LIFETIME}/10). +``` + +We append new members to {CONFIRMED_GUARDS} when we mark a circuit +built through a guard as "for user traffic." + +Whenever we remove a member from {SAMPLED_GUARDS}, we also remove +it from {CONFIRMED_GUARDS}. + +```text + [Note: You can also regard the {CONFIRMED_GUARDS} list as a + total ordering defined over a subset of {SAMPLED_GUARDS}.] +``` + +Definition: we call Guard A "higher priority" than another Guard B +if, when A and B are both reachable, we would rather use A. We +define priority as follows: + +```text + * Every guard in {CONFIRMED_GUARDS} has a higher priority + than every guard not in {CONFIRMED_GUARDS}. + + * Among guards in {CONFIRMED_GUARDS}, the one appearing earlier + on the {CONFIRMED_GUARDS} list has a higher priority. + + * Among guards that do not appear in {CONFIRMED_GUARDS}, + {is_pending}==true guards have higher priority. + + * Among those, the guard with earlier {last_tried_connect} time + has higher priority. + + * Finally, among guards that do not appear in + {CONFIRMED_GUARDS} with {is_pending==false}, all have equal + priority. + + ** Rationale ** +``` + +We add elements to this ordering when we have actually used them +for building a usable circuit. We could mark them at some other +time (such as when we attempt to connect to them, or when we +actually connect to them), but this approach keeps us from +committing to a guard before we actually use it for sensitive +traffic. + +<a id="guard-spec.txt-4.4"></a> + +## The Primary guards {#PRIMARY} + +We keep a run-time non-persistent ordered list of +{list:PRIMARY_GUARDS}. It is a subset of {FILTERED_GUARDS}. It +contains {N_PRIMARY_GUARDS} elements. + +To compute primary guards, take the ordered intersection of +{CONFIRMED_GUARDS} and {FILTERED_GUARDS}, and take the first +{N_PRIMARY_GUARDS} elements. If there are fewer than +{N_PRIMARY_GUARDS} elements, append additional elements to +PRIMARY_GUARDS chosen from ({FILTERED_GUARDS} - {CONFIRMED_GUARDS}), +ordered in "sample order" (that is, by {ADDED_ON_DATE}). + +Once an element has been added to {PRIMARY_GUARDS}, we do not remove it +until it is replaced by some element from {CONFIRMED_GUARDS}. +That is: if a non-primary guard becomes confirmed and not every primary +guard is confirmed, then the list of primary guards list is regenerated, +first from the confirmed guards (as before), and then from any +non-confirmed primary guards. + +Note that {PRIMARY_GUARDS} do not have to be in +{USABLE_FILTERED_GUARDS}: they might be unreachable. + +**Rationale** + +These guards are treated differently from other guards. If one of +them is usable, then we use it right away. For other guards +{FILTERED_GUARDS}, if it's usable, then before using it we might +first double-check whether perhaps one of the primary guards is +usable after all. + +<a id="guard-spec.txt-4.5"></a> + +## Retrying guards. {#RETRYING} + +(We run this process as frequently as needed. It can be done once +a second, or just-in-time.) + +If a primary sampled guard's {is_reachable} status is `<no>`, then +we decide whether to update its {is_reachable} status to `<maybe>` +based on its {last_tried_connect} time, its {failing_since} time, +and the {PRIMARY_GUARDS_RETRY_SCHED} schedule. + +If a non-primary sampled guard's {is_reachable} status is `<no>`, then +we decide whether to update its {is_reachable} status to `<maybe>` +based on its {last_tried_connect} time, its {failing_since} time, +and the {GUARDS_RETRY_SCHED} schedule. + +**Rationale** + +An observation that a guard has been 'unreachable' only lasts for +a given amount of time, since we can't infer that it's unreachable +now from the fact that it was unreachable a few minutes ago. + +## Circuit status {#CIRC-STATUS} + +Sometimes the guard selection algorithm will return a guard that we +would always be happy to use; but in other cases, the guard +selection algorithm can return a guard that we shouldn't use +without gathering additional information. + +From the point of view of guard selection, +every circuit we build is considered to be in one of these states: + + * `<state:usable_on_completion>` + * `<state:usable_if_no_better_guard>` + * `<state:waiting_for_better_guard>` + * `<state:complete>` + +You may only attach streams to `<complete>` circuits. +(Additionally, you may only send RENDEZVOUS messages, ESTABLISH_INTRO +messages, and INTRODUCE messages on `<complete>` circuits.) + +The per-circuit state machine is: + + * New circuits are `<usable_on_completion>` or + `<usable_if_no_better_guard>`. + + * A `<usable_on_completion>` circuit may become `<complete>`, or may + fail. + + * A `<usable_if_no_better_guard>` circuit may become + `<usable_on_completion>`; may become `<waiting_for_better_guard>`; or may + fail. + + * A `<waiting_for_better_guard>` circuit will become `<complete>`, or will + be closed, or will fail. + + * A `<complete>` circuit remains `<complete>` until it fails or is + closed. + +```mermaid +stateDiagram-v2 + [*] --> usable_on_completion + [*] --> usable_if_no_better_guard + usable_on_completion --> complete + usable_on_completion --> failed + usable_if_no_better_guard --> usable_on_completion + usable_if_no_better_guard --> waiting_for_better_guard + usable_if_no_better_guard --> failed + waiting_for_better_guard --> complete + waiting_for_better_guard --> failed + waiting_for_better_guard --> closed + complete --> failed + complete --> closed + failed --> [*] + closed --> [*] +``` +<!-- TODO: The above diagram does not yet render. Fix that. --> + +Each of these transitions is described in sections below. + +<a id="guard-spec.txt-4.6"></a> + +## Selecting guards for circuits. \[Section:SELECTING\] {#SELECTING} + +Now that we have described the various lists of guards, we can explain how +guards are chosen for each circuit. + +We keep, as global transient state: + + * {tvar:last_time_on_internet} -- the last time at which we + successfully used a circuit or connected to a guard. At + startup we set this to "infinitely far in the past." + +As an input to the algorithm, we take a list of _restrictions_. +These may include specific relays or families that we need to avoid. + +Here is the algorithm. It is given as a series of sub-algorithms, +in decreasing order of preference from best case to worst case. +When we want to build a circuit, and we need to pick a guard: + + * In the base case, if any entry in PRIMARY_GUARDS has {is_reachable} status of + `<maybe>` or `<yes>`, consider only such guards. Call them the + "reachable primary guards". + + Start by considering the the first {NUM_USABLE_PRIMARY_GUARDS} (or + {NUM_USABLE_PRIMARY_DIRECTORY_GUARDS}) reachable primary guards. + Remove any guards that do not follow our path selection + restrictions, to build a temporary list. If that temporary list contains at + least one guard, choose a guard from that list uniformly at + random. + + If the temporary list contains no guards, return the first guard + from our reachable primary guard that does obey the path restriction. + + When selecting a guard according to this approach, its circuit + is `<usable_on_completion>`. + + \[Note: We do not use {is_pending} on primary guards, since we + are willing to try to build multiple circuits through them + before we know for sure whether they work, and since we will + not use any non-primary guards until we are sure that the + primary guards are all down. (XX is this good?)\] + + * Otherwise, if the ordered intersection of {CONFIRMED_GUARDS} + and {USABLE_FILTERED_GUARDS} is nonempty, return the first + entry in that intersection that has {is_pending} set to + false. Set its value of {is_pending} to true, + and set its {pending_since} to the current time. + The circuit + is now `<usable_if_no_better_guard>`. (If all entries have + {is_pending} true, pick the first one.) + + * Otherwise, if there is no such entry, select a member from + {USABLE_FILTERED_GUARDS} in sample order. Set its {is_pending} field to + true, and set its {pending_since} to the current time. + The circuit is `<usable_if_no_better_guard>`. + + * Otherwise, in the worst case, if USABLE_FILTERED_GUARDS is empty, we have exhausted + all the sampled guards. In this case we proceed by marking all guards + as `<maybe>` reachable so that we can keep on trying circuits. + +Whenever we select a guard for a new circuit attempt, we update the +{last_tried_connect} time for the guard to 'now.' + +In some cases (for example, when we need a certain directory feature, +or when we need to avoid using a certain exit as a guard), we need to +restrict the guards that we use for a single circuit. When this happens, we +remember the restrictions that applied when choosing the guard for +that circuit, since we will need them later (see \[UPDATE_WAITING\].). + +**Rationale** + +We're getting to the core of the algorithm here. Our main goals are to +make sure that + + 1. If it's possible to use a primary guard, we do. + 2. We probably use the first primary guard. + +So we only try non-primary guards if we're pretty sure that all +the primary guards are down, and we only try a given primary guard +if the earlier primary guards seem down. + +When we _do_ try non-primary guards, however, we only build one +circuit through each, to give it a chance to succeed or fail. If +ever such a circuit succeeds, we don't use it until we're pretty +sure that it's the best guard we're getting. (see below). + +\[XXX timeout.\] + +<a id="guard-spec.txt-4.7"></a> + +## When a circuit fails. {#ON_FAIL} + +When a circuit fails in a way that makes us conclude that a guard +is not reachable, we take the following steps: + +```text + * Set the guard's {is_reachable} status to <no>. If it had + {is_pending} set to true, we make it non-pending and clear + {pending_since}. + + * Close the circuit, of course. (This removes it from + consideration by the algorithm in [UPDATE_WAITING].) + + * Update the list of waiting circuits. (See [UPDATE_WAITING] + below.) +``` + +\[Note: the existing Tor logic will cause us to create more +circuits in response to some of these steps; and also see +\[ON_CONSENSUS\].\] + +\[Note 2: In the case of a one-hop circuit made for a directory +request, it is possible for the request to fail _after_ the circuit +is built: for example, if we ask for the latest consensus and we are +told "404". In this case, we mark the appropriate directory-specific +`{is_reachable}` instance for that guard to `<no>`.\] + +> C tor implements the above "note 2" by treating requests for +> directory guards for as if they had an +> extra type of restriction, rather than a separate instance of +> `{is_reachable}`. (For more on restrictions, see ["Selecting +> Guards"](#SELECTING) above.) This requires the C tor +> impementation to special-case this restriction type, so that +> it is treated the same way as an `{is_reachable}` variable. + + +**Rationale** + +See \[SELECTING\] above for rationale. + +<a id="guard-spec.txt-4.8"></a> + +## When a circuit succeeds {#ON_SUCCESS} + +When a circuit succeeds in a way that makes us conclude that a +guard _was_ reachable, we take these steps: + +```text + * We set its {is_reachable} status to <yes>. + * We set its {failing_since} to "never". + * If the guard was {is_pending}, we clear the {is_pending} flag + and set {pending_since} to false. + * If the guard was not a member of {CONFIRMED_GUARDS}, we add + it to the end of {CONFIRMED_GUARDS}. + + * If this circuit was <usable_on_completion>, this circuit is + now <complete>. You may attach streams to this circuit, + and use it for hidden services. + + * If this circuit was <usable_if_no_better_guard>, it is now + <waiting_for_better_guard>. You may not yet attach streams to it. + Then check whether the {last_time_on_internet} is more than + {INTERNET_LIKELY_DOWN_INTERVAL} seconds ago: + + * If it is, then mark all {PRIMARY_GUARDS} as "maybe" + reachable. + + * If it is not, update the list of waiting circuits. (See + [UPDATE_WAITING] below) +``` + +\[Note: the existing Tor logic will cause us to create more +circuits in response to some of these steps; and see +\[ON_CONSENSUS\].\] + +**Rationale** + +See \[SELECTING\] above for rationale. + +<a id="guard-spec.txt-4.9"></a> + +## Updating the list of waiting circuits {#UPDATE_WAITING} + +We run this procedure whenever it's possible that a +`<waiting_for_better_guard>` circuit might be ready to be called +`<complete>`. + +```text + * If any circuit C1 is <waiting_for_better_guard>, AND: + * All primary guards have reachable status of <no>. + * There is no circuit C2 that "blocks" C1. + Then, upgrade C1 to <complete>. + + Definition: In the algorithm above, C2 "blocks" C1 if: + * C2 obeys all the restrictions that C1 had to obey, AND + * C2 has higher priority than C1, AND + * Either C2 is <complete>, or C2 is <waiting_for_better_guard>, + or C2 has been <usable_if_no_better_guard> for no more than + {NONPRIMARY_GUARD_CONNECT_TIMEOUT} seconds. + + We run this procedure periodically: + + * If any circuit stays in <waiting_for_better_guard> + for more than {NONPRIMARY_GUARD_IDLE_TIMEOUT} seconds, + time it out. + + **Rationale** +``` + +If we open a connection to a guard, we might want to use it +immediately (if we're sure that it's the best we can do), or we +might want to wait a little while to see if some other circuit +which we like better will finish. + +When we mark a circuit `<complete>`, we don't close the +lower-priority circuits immediately: we might decide to use +them after all if the `<complete>` circuit goes down before +{NONPRIMARY_GUARD_IDLE_TIMEOUT} seconds. + +<a id="guard-spec.txt-4.9.1"></a> + +### Without a list of waiting circuits {#NO_CIRCLIST} + +As an alternative to the section \[SECTION:UPDATE_WAITING\] above, +this section presents a new way to maintain guard status +independently of tracking individual circuit status. This +formulation gives a result equivalent or similar to the approach +above, but simplifies the necessary communications between the +guard and circuit subsystems. + +As before, when all primary guards are Unreachable, we need to +try non-primary guards. We select the first such guard (in +preference order) that is neither Unreachable nor Pending. +Whenever we give out such a guard, if the guard's status is +Unknown, then we call that guard "Pending" with its {is_pending} +flag, until the attempt to use it succeeds or fails. We remember +when the guard became Pending with the {pending_since variable}. + +After completing a circuit, the implementation must check whether +its guard is usable. A guard's usability status may be "usable", +"unusable", or "unknown". A guard is usable according to +these rules: + +1. Primary guards are always usable. + +1. Non-primary guards are usable _for a given circuit_ if every + guard earlier in the preference list is either unsuitable for + that circuit (e.g. because of family restrictions), or marked as + Unreachable, or has been pending for at least + `{NONPRIMARY_GUARD_CONNECT_TIMEOUT}`. + + Non-primary guards are not usable _for a given circuit_ if some + guard earlier in the preference list is suitable for the circuit + _and_ Reachable. + + Non-primary guards are unusable if they have not become + usable after `{NONPRIMARY_GUARD_IDLE_TIMEOUT}` seconds. + +1. If a circuit's guard is not usable or unusable immediately, the + circuit is not discarded; instead, it is kept (but not used) until the + guard becomes usable or unusable. + +<a id="guard-spec.txt-4.10"></a> + +## Whenever we get a new consensus. {#ON_CONSENSUS} + +We update {GUARDS}. + +For every guard in {SAMPLED_GUARDS}, we update {IS_LISTED} and +{FIRST_UNLISTED_AT}. + +\[\*\*\] We remove entries from {SAMPLED_GUARDS} if appropriate, +according to the sampled-guards expiration rules. If they were +in {CONFIRMED_GUARDS}, we also remove them from +{CONFIRMED_GUARDS}. + +We recompute {FILTERED_GUARDS}, and everything that derives from +it, including {USABLE_FILTERED_GUARDS}, and {PRIMARY_GUARDS}. + +(Whenever one of the configuration options that affects the +filter is updated, we repeat the process above, starting at the +\[\*\*\] line.) + +```text +4.11. Deciding whether to generate a new circuit. + [Section:NEW_CIRCUIT_NEEDED] +``` + +We generate a new circuit when we don't have +enough circuits either built or in-progress to handle a given +stream, or an expected stream. + +For the purpose of this rule, we say that `<waiting_for_better_guard>` +circuits are neither built nor in-progress; that `<complete>` +circuits are built; and that the other states are in-progress. + +```text +4.12. When we are missing descriptors. + [Section:MISSING_DESCRIPTORS] +``` + +We need either a router descriptor or a microdescriptor in order +to build a circuit through a guard. If we do not have such a +descriptor for a guard, we can still use the guard for one-hop +directory fetches, but not for longer circuits. + +(Also, when we are missing descriptors for our first +{NUM_USABLE_PRIMARY_GUARDS} primary guards, we don't build +circuits at all until we have fetched them.) diff --git a/spec/guard-spec/appendices.md b/spec/guard-spec/appendices.md new file mode 100644 index 0000000..3aaa4e3 --- /dev/null +++ b/spec/guard-spec/appendices.md @@ -0,0 +1,152 @@ +<a id="guard-spec.txt-A"></a> + +# Appendices + +<a id="guard-spec.txt-A.0"></a> + +## Acknowledgements + +This research was supported in part by NSF grants CNS-1111539, +CNS-1314637, CNS-1526306, CNS-1619454, and CNS-1640548. + +<a id="guard-spec.txt-A.1"></a> + +## Parameters with suggested values. {#PARAM_VALS} + +(All suggested values chosen arbitrarily) + +{param:MAX_SAMPLE_THRESHOLD} -- 20% + +{param:MAX_SAMPLE_SIZE} -- 60 + +{param:GUARD_LIFETIME} -- 120 days + +```text + {param:REMOVE_UNLISTED_GUARDS_AFTER} -- 20 days + [previously ENTRY_GUARD_REMOVE_AFTER] + + {param:MIN_FILTERED_SAMPLE} -- 20 + + {param:N_PRIMARY_GUARDS} -- 3 + + {param:PRIMARY_GUARDS_RETRY_SCHED} + + We recommend the following schedule, which is the one + used in Arti: + + -- Use the "decorrelated-jitter" algorithm from "dir-spec.txt" + section 5.5 where `base_delay` is 30 seconds and `cap` + is 6 hours. + + This legacy schedule is the one used in C tor: + + -- every 10 minutes for the first six hours, + -- every 90 minutes for the next 90 hours, + -- every 4 hours for the next 3 days, + -- every 9 hours thereafter. + + {param:GUARDS_RETRY_SCHED} -- + + We recommend the following schedule, which is the one + used in Arti: + + -- Use the "decorrelated-jitter" algorithm from "dir-spec.txt" + section 5.5 where `base_delay` is 10 minutes and `cap` + is 36 hours. + + This legacy schedule is the one used in C tor: + + -- every hour for the first six hours, + -- every 4 hours for the 90 hours, + -- every 18 hours for the next 3 days, + -- every 36 hours thereafter. + + {param:INTERNET_LIKELY_DOWN_INTERVAL} -- 10 minutes + + {param:NONPRIMARY_GUARD_CONNECT_TIMEOUT} -- 15 seconds + + {param:NONPRIMARY_GUARD_IDLE_TIMEOUT} -- 10 minutes + + {param:MEANINGFUL_RESTRICTION_FRAC} -- .2 + + {param:EXTREME_RESTRICTION_FRAC} -- .01 + + {param:GUARD_CONFIRMED_MIN_LIFETIME} -- 60 days + + {param:NUM_USABLE_PRIMARY_GUARDS} -- 1 + + {param:NUM_USABLE_PRIMARY_DIRECTORY_GUARDS} -- 3 +``` + +<a id="guard-spec.txt-A.2"></a> + +## Random values {#RANDOM} + +Frequently, we want to randomize the expiration time of something +so that it's not easy for an observer to match it to its start +time. We do this by randomizing its start date a little, so that +we only need to remember a fixed expiration interval. + +By RAND(now, INTERVAL) we mean a time between now and INTERVAL in +the past, chosen uniformly at random. + +<a id="guard-spec.txt-A.5"></a> + +## Persistent state format {#persistent-state} + +The persistent state format doesn't need to be part of this +specification, since different implementations can do it +differently. Nonetheless, here's the one Tor uses: + +The "state" file contains one Guard entry for each sampled guard +in each instance of the guard state (see section 2). The value +of this Guard entry is a set of space-separated K=V entries, +where K contains any nonspace character except =, and V contains +any nonspace characters. + +Implementations must retain any unrecognized K=V entries for a +sampled guard when they regenerate the state file. + +The order of K=V entries is not allowed to matter. + +Recognized fields (values of K) are: + +```text + "in" -- the name of the guard state instance that this + sampled guard is in. If a sampled guard is in two guard + states instances, it appears twice, with a different "in" + field each time. Required. + + "rsa_id" -- the RSA id digest for this guard, encoded in + hex. Required. + + "bridge_addr" -- If the guard is a bridge, its configured address and + port (this can be the ORPort or a pluggable transport port). Optional. + + "nickname" -- the guard's nickname, if any. Optional. + + "sampled_on" -- the date when the guard was sampled. Required. + + "sampled_by" -- the Tor version that sampled this guard. + Optional. + + "unlisted_since" -- the date since which the guard has been + unlisted. Optional. + + "listed" -- 0 if the guard is not listed; 1 if it is. Required. + + "confirmed_on" -- date when the guard was + confirmed. Optional. + + "confirmed_idx" -- position of the guard in the confirmed + list. Optional. + + "pb_use_attempts", "pb_use_successes", "pb_circ_attempts", + "pb_circ_successes", "pb_successful_circuits_closed", + "pb_collapsed_circuits", "pb_unusable_circuits", + "pb_timeouts" -- state for the circuit path bias algorithm, + given in decimal fractions. Optional. +``` + +All dates here are given as a (spaceless) ISO8601 combined date +and time in UTC (e.g., 2016-11-29T19:39:31). diff --git a/spec/guard-spec/guard-selection/index.md b/spec/guard-spec/guard-selection/index.md new file mode 100644 index 0000000..c3738d4 --- /dev/null +++ b/spec/guard-spec/guard-selection/index.md @@ -0,0 +1,72 @@ +<a id="guard-spec.txt-3"></a> + +# Circuit Creation, Entry Guard Selection (1000 foot view) {#1000-foot} + +A circuit in Tor is a path through the network connecting a client to +its destination. At a high-level, a three-hop exit circuit will look +like this: + +Client \<-> Entry Guard \<-> Middle Node \<-> Exit Node \<-> Destination + +Entry guards are the only nodes which a client will connect to +directly. Exit relays are the nodes by which traffic exits the +Tor network in order to connect to an external destination. + +## Path selection + +For any multi-hop circuit, at least one entry guard and middle node(s) are +required. An exit node is required if traffic will exit the Tor +network. Depending on its configuration, a relay listed in a +consensus could be used for any of these roles. However, this +specification defines how entry guards specifically should be selected and +managed, as opposed to middle or exit nodes. + +### Managing entry guards + +At a high level, a relay listed in a consensus will move through the +following states in the process from initial selection to eventual +usage as an entry guard: + +```text + relays listed in consensus + | + sampled + | | + confirmed filtered + | | | + primary usable_filtered +``` + +Relays listed in the latest consensus can be sampled for guard usage +if they have the "Guard" flag. Sampling is random but weighted by +a measured bandwidth multiplied by bandwidth-weights (Wgg if guard only, +Wgd if guard+exit flagged). + +Once a path is built and a circuit established using this guard, it +is marked as confirmed. Until this point, guards are first sampled +and then filtered based on information such as our current +configuration (see SAMPLED and FILTERED sections) and later marked as +usable_filtered if the guard is not primary but can be reached. + +It is always preferable to use a primary guard when building a new +circuit in order to reduce guard churn; only on failure to connect to +existing primary guards will new guards be used. + +### Middle and exit node selection + +Middle nodes are selected at random from relays listed in the latest +consensus, weighted by bandwidth and bandwidth-weights. Exit nodes are +chosen similarly but restricted to relays with a sufficiently permissive +exit policy. + +## Circuit Building + +Once a path is chosen, Tor will use this path to build a new circuit. + +If the circuit is built successfully, Tor will either use it +immediately, or Tor will wait for a circuit with a more preferred +guard if there's a good chance that it will be able to make one. + +If the circuit fails in a way that makes us conclude that a guard +is not reachable, the guard is marked as unreachable, the circuit is +closed, and waiting circuits are updated. diff --git a/spec/guard-spec/index.md b/spec/guard-spec/index.md new file mode 100644 index 0000000..e9167bf --- /dev/null +++ b/spec/guard-spec/index.md @@ -0,0 +1,47 @@ +# Tor Guard Specification + +<a id="guard-spec.txt-1"></a> + +## Introduction and motivation + +Tor uses entry guards to prevent an attacker who controls some +fraction of the network from observing a fraction of every user's +traffic. If users chose their entries and exits uniformly at +random from the list of servers every time they build a circuit, +then an adversary who had (k/N) of the network would deanonymize +F=(k/N)^2 of all circuits... and after a given user had built C +circuits, the attacker would see them at least once with +probability 1-(1-F)^C. With large C, the attacker would get a +sample of every user's traffic with probability 1. + +To prevent this from happening, Tor clients choose a small number +of guard nodes (e.g. 3). These guard nodes are the only +nodes that the client will connect to directly. If they are not +compromised, the user's paths are not compromised. + +This specification outlines Tor's guard housekeeping algorithm, +which tries to meet the following goals: + +```text + - Heuristics and algorithms for determining how and which guards + are chosen should be kept as simple and easy to understand as + possible. + + - Clients in censored regions or who are behind a fascist + firewall who connect to the Tor network should not experience + any significant disadvantage in terms of reachability or + usability. + + - Tor should make a best attempt at discovering the most + appropriate behavior, with as little user input and + configuration as possible. + + - Tor clients should discover usable guards without too much + delay. + + - Tor clients should resist (to the extent possible) attacks + that try to force them onto compromised guards. + + - Should maintain the load-balancing offered by the path selection + algorithm +``` diff --git a/spec/guard-spec/state-instances.md b/spec/guard-spec/state-instances.md new file mode 100644 index 0000000..5449648 --- /dev/null +++ b/spec/guard-spec/state-instances.md @@ -0,0 +1,49 @@ +<a id="guard-spec.txt-2"></a> + +# State instances + +In the algorithm below, we describe a set of persistent and +non-persistent state variables. These variables should be +treated as an object, of which multiple instances can exist. + +In particular, we specify the use of three particular instances: + +A. UseBridges + +```text + If UseBridges is set, then we replace the {GUARDS} set in + [Sec:GUARDS] below with the list of configured + bridges. We maintain a separate persistent instance of + {SAMPLED_GUARDS} and {CONFIRMED_GUARDS} and other derived + values for the UseBridges case. + + In this case, we impose no upper limit on the sample size. + + B. EntryNodes / ExcludeNodes / Reachable*Addresses / + FascistFirewall / ClientUseIPv4=0 + + If one of the above options is set, and UseBridges is not, + then we compare the fraction of usable guards in the consensus + to the total number of guards in the consensus. + + If this fraction is less than {MEANINGFUL_RESTRICTION_FRAC}, + we use a separate instance of the state. + + (While Tor is running, we do not change back and forth between + the separate instance of the state and the default instance + unless the fraction of usable guards is 5% higher than, or 5% + lower than, {MEANINGFUL_RESTRICTION_FRAC}. This prevents us + from flapping back and forth between instances if we happen to + hit {MEANINGFUL_RESTRICTION_FRAC} exactly. + + If this fraction is less than {EXTREME_RESTRICTION_FRAC}, we use a + separate instance of the state, and warn the user. + + [TODO: should we have a different instance for each set of heavily + restricted options?] + + C. Default + + If neither of the above variant-state instances is used, + we use a default instance. +``` |