author Tom van der Woerdt <info@tvdw.eu> 2015-10-12 20:05:51 +0200
committer Tom van der Woerdt <info@tvdw.eu> 2015-10-12 20:34:32 +0200
commit bc6855ecce9af8335329bc39f9196c1e6e1ede01
tree 40f1f80b81915c8aee8d8ef63be83604168d2e54 /proposals/255-hs-load-balancing.txt
parent 8401f6bc692532a048d42dc48af932d6ea379ea8
Add proposal for load-balancing hidden services
Diffstat (limited to 'proposals/255-hs-load-balancing.txt')
-rw-r--r-- proposals/255-hs-load-balancing.txt 157
1 files changed, 157 insertions, 0 deletions
diff --git a/proposals/255-hs-load-balancing.txt b/proposals/255-hs-load-balancing.txt
new file mode 100644
index 0000000..eaab035
--- /dev/null
+++ b/proposals/255-hs-load-balancing.txt
@@ -0,0 +1,157 @@
+Filename: 255-hs-load-balancing.txt
+Title: Controller features to allow for load-balancing hidden services
+Author: Tom van der Woerdt
+Created: 2015-10-12
+Status: draft
+
+1. Overview and motivation
+
+To address scaling concerns with the onion web, we want to be able to
+spread the load of hidden services across multiple machines.
+OnionBalance is a great stab at this, and it can currently give us 60x
+the capacity by publishing 6 separate descriptors, each with 10
+introduction points, but more is better. This proposal aims to address
+hidden service scaling up to a point where we can handle millions of
+concurrent connections.
+
+The basic idea involves splitting the 'introduce' from the
+'rendezvous' in the tor implementation, and adding a new event and a
+new command to the control specification to allow intercepting
+introductions and transmitting them to different nodes, which will then
+take care of the actual rendezvous. External controller code could
+relay the data to another node or a pool of nodes, all of which are run
+by the hidden service operator, effectively distributing the load of
+hidden services over multiple processes.
+
+By cleverly utilizing the current descriptor methods through
+OnionBalance, we could publish up to sixty unique introduction points,
+which could translate to many thousands of parallel tor workers after
+implementing this proposal. This should allow hidden services to go
+multi-threaded with a few small changes, and continue scaling for a
+long time.
+
+
+2. Specification
+
+We propose two additions to the control specification: a new event and
+a new command. We also introduce two new configuration options.
+
+
+2.1. HiddenServiceAutomaticRendezvous configuration option
+
+The syntax is:
+ "HiddenServiceAutomaticRendezvous" SP [1|0] CRLF
+
+This configuration option is defined to be a boolean toggle which, if
+zero, stops the tor implementation from automatically doing a rendezvous
+when an INTRODUCE2 cell is received. Instead, an event will be sent to
+the controllers. If no controllers are present, the introduction cell
+should be dropped, as acting on it instead of dropping it could open a
+window for a DoS.
+
+This configuration option can be specified on a per-hidden service
+level, and can be set through the controller for ephemeral hidden
+services as well.
+
+
+2.2. HiddenServiceTag configuration option
+
+The syntax is:
+ "HiddenServiceTag" SP [a-zA-Z0-9]+ CRLF
+
+To identify groups of hidden services more easily across nodes, a
+name/tag can be given to a hidden service. The tag defaults to the
+storage path of the hidden service (HiddenServiceDir).
+
+
+2.3. The "INTRODUCE" event
+
+The syntax is:
+ "650" SP "INTRODUCE" SP HSTag SP RendezvousData CRLF
+
+ HSTag = the tag of the hidden service
+ RendezvousData = implementation-specific, but must not contain
+ whitespace, must only contain human-readable
+ characters, and should be no longer than 2048 bytes
+
+The INTRODUCE event should contain sufficient data to allow continuing
+the rendezvous from another Tor instance. The exact format is
+unspecified and left up to the implementation. It follows that only
+matching versions of the implementation can safely be used together to
+coordinate the rendezvous of hidden service connections.
+
+
+2.4. "PERFORM-RENDEZVOUS" command
+
+The syntax is:
+ "PERFORM-RENDEZVOUS" SP HSTag SP RendezvousData CRLF
+
+This command allows a controller to perform a rendezvous using data
+received through an INTRODUCE event. The format of RendezvousData is
+not specified other than that it must not contain whitespace, and
+should be no longer than 2048 bytes.
+
+
+2.5. The RendezvousData blob
+
+The "RendezvousData" blob is opaque to the controller; the tor
+implementation, however, must know how to interpret it. Its contents
+are the minimal amount of data required to process the INTRODUCE2 cell
+on another machine.
+
+Before proposal 224 is implemented, this could consist of the
+INTRODUCE2 cell payload, the key to decrypt the cell with if the cell
+is not already decrypted (which may be preferable, for performance
+reasons), and data necessary for other machines to recognize what to do
+with the cell.
+
+After proposal 224 is implemented, the blob would contain any
+additional keys needed to perform the rendezvous handshake.
+
+Implementations do not need to handle blobs generated by other versions
+of the software. Because of this, it is recommended to include a
+version number which can be used to verify that the blob is from a
+compatible implementation.
+
+
+3. Compatibility and security
+
+The implementation of these methods should, ideally, not change
+anything in the network, and all control changes are opt-in, so this
+proposal is fully backwards compatible.
+
+Controllers handling this data must be careful to not leak rendezvous
+data to untrusted parties, as it could be used to intercept and
+manipulate hidden service traffic.
+
+
+4. Example
+
+Let's take an example where a client (Alice) tries to contact Bob's
+hidden service. To do this, Bob follows the normal hidden service
+specification, except that he sets up ten servers instead of one. One
+of these publishes the descriptor; the others have publishing disabled.
+When the INTRODUCE2 cell arrives at the node which published the
+descriptor, that node does not immediately try to perform the
+rendezvous, but instead sends an INTRODUCE event to its controller.
+Through an out-of-band process this event is relayed to the controller
+of another of Bob's nodes, which sends the "PERFORM-RENDEZVOUS" command
+to that node. This node finally performs the rendezvous and continues
+to serve data to Alice, whose client no longer has to talk to the
+introduction point.
+
+
+5. Other considerations
+
+We have left the actual format of the rendezvous data in the control
+protocol unspecified, so that controllers do not need to worry about
+the various types of hidden service connections, most notably proposal
+224.
+
+The decision to not implement the actual cell relaying in the tor
+implementation itself was taken to allow more advanced configurations,
+and to leave the actual load-balancing algorithm to the implementor of
+the controller. The developer of the tor implementation should not
+have to choose between a round-robin algorithm and something that could
+pull CPU load averages from a centralized monitoring system.
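
Illustrative note (not part of the patch): the controller-side flow from
sections 2.3, 2.4 and 4 can be sketched roughly as below. This is a minimal
sketch under stated assumptions, not a reference implementation. It only
handles the text of the event and command lines; how the lines travel between
control connections is left out. The names parse_introduce_event,
perform_rendezvous_command and RoundRobinDispatcher are hypothetical, the
backend identifiers are placeholders, and round-robin is just one of the
policies the proposal leaves to the controller author. Rejecting blobs over
2048 bytes is a policy choice here, since the proposal only says "should".

```python
import itertools

# Section 2.3 says RendezvousData "should be no longer than 2048 bytes".
MAX_BLOB_LEN = 2048


def parse_introduce_event(line):
    """Split '650 INTRODUCE <HSTag> <RendezvousData>' into (tag, blob).

    The blob is treated as opaque, as section 2.5 requires.
    """
    parts = line.rstrip("\r\n").split(" ")
    if len(parts) != 4 or parts[0] != "650" or parts[1] != "INTRODUCE":
        raise ValueError("not an INTRODUCE event")
    tag, blob = parts[2], parts[3]
    if not tag or not blob:
        raise ValueError("empty HSTag or RendezvousData")
    if any(ch.isspace() for ch in blob):
        raise ValueError("RendezvousData must not contain whitespace")
    if len(blob.encode("utf-8")) > MAX_BLOB_LEN:
        # Policy choice of this sketch: treat the 'should' as a hard limit.
        raise ValueError("RendezvousData longer than 2048 bytes")
    return tag, blob


def perform_rendezvous_command(tag, blob):
    """Build the command line from section 2.4 for a backend node."""
    return "PERFORM-RENDEZVOUS {} {}\r\n".format(tag, blob)


class RoundRobinDispatcher:
    """Pick the next backend in turn for every intercepted introduction."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("at least one backend is required")
        self._cycle = itertools.cycle(list(backends))

    def dispatch(self, event_line):
        """Return (backend, command_line) for one INTRODUCE event line."""
        tag, blob = parse_introduce_event(event_line)
        return next(self._cycle), perform_rendezvous_command(tag, blob)
```

A controller on the descriptor-publishing node would feed each INTRODUCE
event line to dispatch() and forward the returned command to the chosen
backend's control connection. Swapping RoundRobinDispatcher for a load-aware
policy would not change the parsing or command-building helpers.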