aboutsummaryrefslogtreecommitdiff
path: root/proposals/268-guard-selection.txt
diff options
context:
space:
mode:
authorNick Mathewson <nickm@torproject.org>2016-07-07 10:08:45 -0400
committerNick Mathewson <nickm@torproject.org>2016-07-07 10:08:45 -0400
commitf6f897c365044bcb293726b4579358826ceb038c (patch)
treeb52850d3bb78c1d9f5096c428599afd9df4fd82b /proposals/268-guard-selection.txt
parent8104db8f26f64a5436425dd07b0ebd34d34e969e (diff)
downloadtorspec-f6f897c365044bcb293726b4579358826ceb038c.tar.gz
torspec-f6f897c365044bcb293726b4579358826ceb038c.zip
Add the twstrike guard selection draft as a separate proposal
Taken from https://raw.githubusercontent.com/twstrike/torspec/review/proposals/259-guard-selection.txt See editorial note for comment on why I'm not just dropping this in over prop259.
Diffstat (limited to 'proposals/268-guard-selection.txt')
-rw-r--r--proposals/268-guard-selection.txt507
1 files changed, 507 insertions, 0 deletions
diff --git a/proposals/268-guard-selection.txt b/proposals/268-guard-selection.txt
new file mode 100644
index 0000000..9277dd8
--- /dev/null
+++ b/proposals/268-guard-selection.txt
@@ -0,0 +1,507 @@
+Filename: 268-guard-selection.txt
+Title: New Guard Selection Behaviour
+Author: Isis Lovecruft, George Kadianakis, [Ola Bini]
+Created: 2015-10-28
+Status: Draft
+
+ (Editorial note: this was origianlly written as a revision of
+ proposal 259, but it diverges so substantially that it seemed
+ better to assign it a new number for reference, so that we
+ aren't always talking about "The old 259" and "the new 259". -NM)
+
+§1. Overview
+
+ Tor uses entry guards to prevent an attacker who controls some
+ fraction of the network from observing a fraction of every user's
+ traffic. If users chose their entries and exits uniformly at
+ random from the list of servers every time they build a circuit,
+ then an adversary who had (k/N) of the network would deanonymize
+ F=(k/N)^2 of all circuits... and after a given user had built C
+ circuits, the attacker would see them at least once with
+ probability 1-(1-F)^C. With large C, the attacker would get a
+ sample of every user's traffic with probability 1.
+
+ To prevent this from happening, Tor clients choose a small number of
+ guard nodes (currently 3). These guard nodes are the only nodes
+ that the client will connect to directly. If they are not
+ compromised, the user's paths are not compromised.
+
+ But attacks remain. Consider an attacker who can run a firewall
+ between a target user and the Tor network, and make
+ many of the guards they don't control appear to be unreachable.
+ Or consider an attacker who can identify a user's guards, and mount
+ denial-of-service attacks on them until the user picks a guard
+ that the attacker controls.
+
+ In the presence of these attacks, we can't continue to connect to
+ the Tor network unconditionally. Doing so would eventually result
+ in the user choosing a hostile node as their guard, and losing
+ anonymity.
+
+ This proposal outlines a new entry guard selection algorithm, which
+ addresses the following concerns:
+
+ - Heuristics and algorithms for determining how and which guard(s)
+ is(/are) chosen should be kept as simple and easy to understand
+ as possible.
+
+ - Clients in censored regions or who are behind a fascist firewall
+ who connect to the Tor network should not experience any
+ significant disadvantage in terms of reachability or usability.
+
+ - Tor should make a best attempt at discovering the most
+ appropriate behaviour, with as little user input and
+ configuration as possible.
+
+
+§2. Design
+
+ Alice, an OP attempting to connect to the Tor network, should
+ undertake the following steps to determine information about the
+ local network and to select (some) appropriate entry guards. In the
+ following scenario, it is assumed that Alice has already obtained a
+ recent, valid, and verifiable consensus document.
+
+ The algorithm is divided into four components such that the full
+ algorithm is implemented by first invoking START, then repeatedly
+ calling NEXT while adviced it SHOULD_CONTINUE and finally calling
+ END. For an example usage see §A. Appendix.
+
+ Several components of NEXT can be invoked asynchronously. SHOULD_CONTINUE
+ is used for the algorithm to be able to tell the caller whether we
+ consider the work done or not - this can be used to retry primary
+ guards when we finally are able to connect to a guard after a long
+ network outage, for example.
+
+ This algorithm keeps track of the unreachability status for guards
+ in state global to the system, so that repeated runs will not have
+ to rediscover unreachability over and over again. However, this
+ state does not need to be persisted permanently - it is purely an
+ optimization.
+
+ The algorithm expects several arguments to guide its behavior. These
+ will be defined in §2.1.
+
+ The goal of this algorithm is to strongly prefer connecting to the
+ same guards we have connected to before, while also trying to detect
+ conditions such as a network outage. The way it does this is by keeping
+ track of how many guards we have exposed ourselves to, and if we have
+ connected to too many we will fall back to only retrying the ones we have
+ already tried. The algorithm also decides on sample set that should
+ be persisted - in order to minimize the risk of an attacker forcing
+ enumeration of the whole network by triggering rebuilding of
+ circuits.
+
+
+§2.1. Definitions
+
+ Bad guard: a guard is considered bad if it conforms with the function IS_BAD
+ (see §G. Appendix for details).
+
+ Dead guard: a guard is considered dead if it conforms with the function
+ IS_DEAD (see §H. Appendix for details).
+
+ Obsolete guard: a guard is considered obsolete if it conforms with the
+ function IS_OBSOLETE (see §I. Appendix for details).
+
+ Live entry guard: a guard is considered live if it conforms with the function
+ IS_LIVE (see §D. Appendix for details).
+
+§2.1. The START algorithm
+
+ In order to start choosing an entry guard, use the START
+ algorithm. This takes four arguments that can be used to fine tune
+ the workings:
+
+ USED_GUARDS
+ This is a list that contains all the guards that have been used
+ before by this client. We will prioritize using guards from this
+ list in order to minimize our exposure. The list is expected to
+ be sorted based on priority, where the first entry will have the
+ highest priority.
+
+ SAMPLED_GUARDS
+ This is a set that contains all guards that should be considered
+ for connection. This set should be persisted between runs. It
+ should be filled by using NEXT_BY_BANDWIDTH with GUARDS as an
+ argument if it's empty, or if it contains less than SAMPLE_SET_THRESHOLD
+ guards after winnowing out older guards.
+
+ N_PRIMARY_GUARDS
+ The number of guards we should consider our primary
+ guards. These guards will be retried more frequently and will
+ take precedence in most situations. By default the primary
+ guards will be the first N_PRIMARY_GUARDS guards from USED_GUARDS.
+ When the algorith is used in constrained mode (have bridges or entry
+ nodes in the configuration file), this value should be 1 otherwise the
+ proposed value is 3.
+
+ DIR
+ If this argument is set, we should only consider guards that can
+ be directory guards. If not set, we will consider all guards.
+
+ The primary work of START is to initialize the state machine depicted
+ in §2.2. The initial state of the machine is defined by:
+
+ GUARDS
+ This is a set of all guards from the consensus. It will primarily be used
+ to fill in SAMPLED_GUARDS
+
+ FILTERED_SAMPLED
+ This is a set that contains all guards that we are willing to connect to.
+ It will be obtained from calling FILTER_SET with SAMPLED_GUARDS as
+ argument.
+
+ REMAINING_GUARDS
+ This is a running set of the guards we have not yet tried to connect to.
+ It should be initialized to be FILTERED_SAMPLED without USED_GUARDS.
+
+ STATE
+ A variable that keeps track of which state in the state
+ machine we are currently in. It should be initialized to
+ STATE_PRIMARY_GUARDS.
+
+ PRIMARY_GUARDS
+ This list keeps track of our primary guards. These are guards
+ that we will prioritize when trying to connect, and will also
+ retry more often in case of failure with other guards.
+ It should be initialized by calling algorithm
+ NEXT_PRIMARY_GUARD repeatedly until PRIMARY_GUARDS contains
+ N_PRIMARY_GUARDS elements.
+
+
+§2.2. The NEXT algorithm
+
+ The NEXT algorithm is composed of several different possibly flows. The
+ first one is a simple state machine that can transfer between two
+ different states. Every time NEXT is invoked, it will resume at the
+ state where it left off previously. In the course of selecting an
+ entry guard, a new consensus can arrive. When that happens we need
+ to update the data structures used, but nothing else should change.
+
+ Before jumping in to the state machine, we should first check if it
+ was at least PRIMARY_GUARDS_RETRY_INTERVAL minutes since we tried
+ any of the PRIMARY_GUARDS. If this is the case, and we are not in
+ STATE_PRIMARY_GUARDS, we should save the previous state and set the
+ state to STATE_PRIMARY_GUARDS.
+
+
+§2.2.1. The STATE_PRIMARY_GUARDS state
+
+ Return each entry in PRIMARY_GUARDS in turn. For each entry, if the
+ guard should be retried and considered suitable use it. A guard is
+ considered to eligible to retry if is marked for retry or is live
+ and id not bad. Also, a guard is considered to be suitable if is
+ live and, if is a directory it should not be a cache.
+
+ If all entries have been tried transition to STATE_TRY_REMAINING.
+
+§2.2.2. The STATE_TRY_REMAINING state
+
+ Return each entry in USED_GUARDS that is not in PRIMARY_GUARDS in
+ turn.For each entry, if a guard is found return it.
+
+ Return each entry from REMAINING_GUARDS in turn.
+ For each entry, if the guard should be retried and considered
+ suitable use it and mark it as unreachable. A guard is
+ considered to eligible to retry if is marked for retry or is live
+ and id not bad. Also, a guard is considered to be suitable if is
+ live and, if is a directory it should not be a cache.
+
+ If no entries remain in REMAINING_GUARDS, transition to
+ STATE_PRIMARY_GUARDS.
+
+
+§2.2.3. ON_NEW_CONSENSUS
+
+ First, ensure that all guard profiles are updated with information
+ about whether they were in the newest consensus or not.
+
+ Update the bad status for all guards in USED_GUARDS and SAMPLED_GUARDS.
+ Remove all dead guards from USED_GUARDS and SAMPLED_GUARDS.
+ Remove all obsolete guards from USED_GUARDS and SAMPLED_GUARDS.
+
+§2.3. The SHOULD_CONTINUE algorithm
+
+ This algorithm takes as an argument a boolean indicating whether the
+ circuit was successfully built or not.
+
+ After the caller have tried to build a circuit with a returned
+ guard, they should invoke SHOULD_CONTINUE to understand if the
+ algorithm is finished or not. SHOULD_CONTINUE will always return
+ true if the circuit failed. If the circuit succeeded,
+ SHOULD_CONTINUE will always return false, unless the guard that
+ succeeded was the first guard to succeed after
+ INTERNET_LIKELY_DOWN_INTERVAL minutes - in that case it will set the
+ state to STATE_PRIMARY_GUARDS and return true.
+
+
+§2.4. The END algorithm
+
+ The goal of this algorithm is simply to make sure that we keep track
+ of successful connections made. This algorithm should be invoked
+ with the guard that was used to correctly set up a circuit.
+
+ Once invoked, this algorithm will mark the guard as used, and make
+ sure it is in USED_GUARDS, by adding it at the end if it was not there.
+
+
+§2.5. Helper algorithms
+
+ These algorithms are used in the above algorithms, but have been
+ separated out here in order to make the flow clearer.
+
+ NEXT_PRIMARY_GUARD
+ - Return the first entry from USED_GUARDS that is not in
+ PRIMARY_GUARDS and that is in the most recent consensus.
+ - If USED_GUARDS is empty, use NEXT_BY_BANDWIDTH with
+ REMAINING_GUARDS as the argument.
+
+ NEXT_BY_BANDWIDTH
+ - Takes G as an argument, which should be a set of guards to
+ choose from.
+ - Return a randomly select element from G, weighted by bandwidth.
+
+ FILTER_SET
+ - Takes G as an argument, which should be a set of guards to filter.
+ - Filter out guards in G that don't comply with IS_LIVE (see
+ §D. Appendix for details).
+ - If the filtered set is smaller than MINIMUM_FILTERED_SAMPLE_SIZE and G
+ is smaller than MAXIMUM_SAMPLE_SIZE_THRESHOLD, expand G and try to
+ filter out again. G is expanded by adding one new guard at a time using
+ NEXT_BY_BANDWIDTH with GUARDS as an argument.
+ - If G is not smaller than MAXIMUM_SAMPLE_SIZE_THRESHOLD, G should not be
+ expanded. Abort execution of this function by returning null and report
+ an error to the user.
+
+
+§3. Consensus Parameters, & Configurable Variables
+
+ This proposal introduces several new parameters that ideally should
+ be set in the consensus but that should also be possible to
+ set or override in the client configuration file. Some of these have
+ proposed values, but for others more simulation and trial needs to
+ happen.
+
+ PRIMARY_GUARDS_RETRY_INTERVAL
+ In order to make it more likely we connect to a primary guard,
+ we would like to retry the primary guards more often than other
+ types of guards. This parameter controls how many minutes should
+ pass before we consider retrying primary guards again. The
+ proposed value is 3.
+
+ SAMPLE_SET_THRESHOLD
+ In order to allow us to recognize completely unreachable network,
+ we would like to avoid connecting to too many guards before switching
+ modes. We also want to avoid exposing ourselves to too many nodes in a
+ potentially hostile situation. This parameter, expressed as a
+ fraction, determines the number of guards we should keep as the
+ sampled set of the only guards we will consider connecting
+ to. It will be used as a fraction for the sampled set.
+ If we assume there are 1900 guards, a setting of 0.02
+ means we will have a sample set of 38 guards.
+ This limits our total exposure. Proposed value is 0.02.
+
+ MINIMUM_FILTERED_SAMPLE_SIZE
+ The minimum size of the sampled set after filtering out nodes based on
+ client configuration (FILTERED_SAMPLED). Proposed value is ???.
+
+ MAXIMUM_SAMPLE_SIZE_THRESHOLD
+ In order to guarantee a minimum size of guards after filtering,
+ we expand SAMPLED_GUARDS until a limit. This fraction of GUARDS will be
+ used as an upper bound when expanding SAMPLED_GUARDS.
+ Proposed value is 0.03.
+
+ INTERNET_LIKELY_DOWN_INTERVAL
+ The number of minutes since we started trying to find an entry
+ guard before we should consider the network down and consider
+ retrying primary guards before using a functioning guard
+ found. Proposed value 5.
+
+§4. Security properties and behavior under various conditions
+
+ Under normal conditions, this algorithm will allow us to quickly
+ connect and use guards we have used before with high likelihood of
+ working. Assuming the first primary guard is reachable and in the
+ consensus, this algorithm will deterministically always return that
+ guard.
+
+ Under dystopic conditions (when a firewall is in place that blocks
+ all ports except for potentially port 80 and 443), this algorithm
+ will try to connect to 2% of all guards before switching modes to try
+ dystopic guards. Currently, that means trying to connect to circa 40
+ guards before getting a successful connection. If we assume a
+ connection try will take maximum 10 seconds, that means it will take
+ up to 6 minutes to get a working connection.
+
+ When the network is completely down, we will try to connect to 2% of
+ all guards plus 2% of all dystopic guards before realizing we are
+ down. This means circa 50 guards tried assuming there are 1900 guards
+ in the network.
+
+ In terms of exposure, we will connect to a maximum of 2% of all
+ guards plus 2% of all dystopic guards, or 3% of all guards,
+ whichever is lower. If N is the number of guards, and k is the
+ number of guards an attacker controls, that means an attacker would
+ have a probability of 1-(1-(k/N)^2)^(N * 0.03) to have one of their
+ guards selected before we fall back. In real terms, this means an
+ attacker would need to control over 10% of all guards in order to
+ have a larger than 50% chance of controlling a guard for any given client.
+
+ In addition, since the sampled set changes slowly (the suggestion
+ here is that guards in it expire every month) it is not possible for
+ an attacker to force a connection to an entry guard that isn't
+ already in the users sampled set.
+
+
+§A. Appendix: An example usage
+
+ In order to clarify how this algorithm is supposed to be used, this
+ pseudo code illustrates the building of a circuit:
+
+ ESTABLISH_CIRCUIT:
+
+ if chosen_entry_node = NULL
+ if context = NULL
+ context = ALGO_CHOOSE_ENTRY_GUARD_START(used_guards,
+ sampled_guards=[],
+ options,
+ n_primary_guards=3,
+ dir=false,
+ guards_in_consensus)
+
+ chosen_entry_node = ALGO_CHOOSE_ENTRY_GUARD_NEXT(context)
+ if not IS_SUITABLE(chosen_entry_node)
+ try another entry guard
+
+ circuit = composeCircuit(chosen_entry_node)
+ return circuit
+
+ ON_FIRST_HOP_CALLBACK(channel):
+
+ if !SHOULD_CONTINUE:
+ ALGO_CHOOSE_ENTRY_GUARD_END(entryGuard)
+ else
+ chosen_entry_node = NULL
+
+
+§B. Appendix: Entry Points in Tor
+
+ In order to clarify how this algorithm is supposed to be integrated with
+ Tor, here are some entry points to trigger actions mentioned in spec:
+
+ When establish_circuit:
+
+ If *chosen_entry_node* doesn't exist
+ If *context* exist, populate the first one as *context*
+ Otherwise, use ALGO_CHOOSE_ENTRY_GUARD_START to initalize a new *context*.
+
+ After this when we want to choose_good_entry_server, we will use
+ ALGO_CHOOSE_ENTRY_GUARD_NEXT to get a candidate.
+
+ Use chosen_entry_node to build_circuit and handle_first_hop,
+ return this circuit
+
+ When entry_guard_register_connect_status(should_continue):
+
+ if !should_continue:
+ Call ALGO_CHOOSE_ENTRY_GUARD_END(chosen_entry_node)
+ else:
+ Set chosen_entry_node to NULL
+
+ When new directory_info_has_arrived:
+
+ Do ON_NEW_CONSENSUS
+
+
+§C. Appendix: IS_SUITABLE helper function
+
+ A guard is suitable if it satisfies all of the folowing conditions:
+ - It's considered to be live, according to IS_LIVE.
+ - It's a directory cache if a directory guard is requested.
+ - It's not the chosen exit node.
+ - It's not in the family of the chosen exit node.
+
+ This conforms to the existing conditions in "populate_live_entry_guards()".
+
+
+§D. Appendix: IS_LIVE helper function
+
+ A guard is considered live if it satisfies all of the folowing conditions:
+ - It's not disabled because of path bias issues (path_bias_disabled).
+ - It was not observed to become unusable according to the directory or
+ the user configuration (bad_since).
+ - It's marked for retry (can_retry) or it's been unreachable for some
+ time (unreachable_since) but enough time has passed since we last tried
+ to connect to it (entry_is_time_to_retry).
+ - It's in our node list, meaninig it's present in the latest consensus.
+ - It has a usable descriptor (either a routerdescriptor or a
+ microdescriptor) unless a directory guard is requested.
+ - It's a general-purpose router unless UseBridges is configured.
+ - It's reachable by the configuration (fascist_firewall_allows_node).
+
+ This conforms to the existing conditions in "entry_is_live()".
+
+ A guard is observed to become unusable according to the directory or the
+ user configuration if it satisfies any of the following conditions:
+ - It's not in our node list, meaninig it's present in the latest consensus.
+ - It's not currently running (is_running).
+ - It's not a bridge and not a configured bridge
+ (node_is_a_configured_bridge) and UseBridges is True.
+ - It's not a possible guard and is not in EntryNodes and UseBridges is
+ False.
+ - It's in ExcludeNodes. Nevertheless this is ignored when
+ loading from config.
+ - It's not reachable by the configuration (fascist_firewall_allows_node).
+ - It's disabled because of path bias issues (path_bias_disabled).
+
+ This conforms to the existing conditions in "entry_guards_compute_status()".
+
+§E. Appendix: UseBridges and Bridges configurations
+
+ This is mutually exclusive with EntryNodes.
+
+ If options->UseBridges OR options->EntryNodes:
+ - guards = populate_live_entry_guards() - this is the "bridge flavour" of
+ IS_SUITABLE as mentioned before.
+ - return node_sl_choose_by_bandwidth(guards, WEIGHT_FOR_GUARD)
+ This is "choose a guard from S by bandwidth weight".
+
+ UseBridges and Bridges must be set together. Bridges go to bridge_list (via
+ bridge_add_from_config()), but how is it used?
+ learned_bridge_descriptor() adds the bridge to the global entry_guards if
+ UseBridges = True.
+
+ We either keep the existing global entry_guards OR incorporate bridges in the
+ proposal (remove non bridges from USED_GUARDS, and REMAINING_GUARDS = bridges?)
+
+ If UseBridges is set as true, we need to fill the SAMPLED_GUARDS
+ with bridges specified and learned from consensus.
+
+§F. Appendix: EntryNodes configuration
+
+ This is mutually exclusive with Bridges.
+
+ The global entry_guards will be updated with entries in EntryNodes
+ (see entry_guards_set_from_config()).
+
+ If EntryNodes is set, we need to fill the SAMPLED_GUARDS with
+ EntryNodes specified in options.
+
+§G. Appendix: IS_BAD helper function
+
+ A guard is considered bad if is not included in the newest
+ consensus.
+
+§H. Appendix: IS_DEAD helper function
+
+ A guard is considered dead if it's marked as bad for
+ ENTRY_GUARD_REMOVE_AFTER period (30 days) unless they have been disabled
+ because of path bias issues (path_bias_disabled).
+
+§I. Appendix: IS_OBSOLETE helper function
+
+ A guard is considered obsolete if it was chosen by an Tor
+ version we can't recognize or it was chosen more than GUARD_LIFETIME ago.
+
+-*- coding: utf-8 -*-