```
Filename: 255-hs-load-balancing.txt
Title: Controller features to allow for load-balancing hidden services
Author: Tom van der Woerdt
Created: 2015-10-12
Status: Reserve

1. Overview and motivation

  To address scaling concerns with the onion web, we want to be able to
  spread the load of hidden services across multiple machines.
  OnionBalance is a great stab at this, and it can currently give us 60x
  the capacity by publishing 6 separate descriptors, each with 10
  introduction points, but more is better.

  This proposal aims to address hidden service scaling up to a point
  where we can handle millions of concurrent connections.

  The basic idea involves splitting the 'introduce' from the
  'rendezvous' in the tor implementation, and adding new events and
  commands to the control specification to allow intercepting
  introductions and transmitting them to different nodes, which will
  then take care of the actual rendezvous.

  External controller code could relay the data to another node or a
  pool of nodes, all of which are run by the hidden service operator,
  effectively distributing the load of hidden services over multiple
  processes.

  By cleverly utilizing the current descriptor methods through
  OnionBalance, we could publish up to sixty unique introduction points,
  which could translate to many thousands of parallel tor workers after
  implementing this proposal. This should allow hidden services to go
  multi-threaded with a few small changes, and continue scaling for a
  long time.

2. Specification

  We propose two additions to the control specification, of which one is
  an event and the other is a new command. We also introduce two new
  configuration options.

2.1. HiddenServiceAutomaticRendezvous configuration option

  The syntax is:
    "HiddenServiceAutomaticRendezvous" SP [1|0] CRLF

  This configuration option is defined to be a boolean toggle which, if
  zero, stops the tor implementation from automatically doing a
  rendezvous when an INTRODUCE2 cell is received. Instead, an event will
  be sent to the controllers. If no controllers are present, the
  introduction cell should be dropped, as acting on it instead of
  dropping it could open a window for a DoS.

  This configuration option can be specified on a per-hidden service
  level, and can be set through the controller for ephemeral hidden
  services as well.

2.2. HiddenServiceTag configuration option

  The syntax is:
    "HiddenServiceTag" SP [a-zA-Z0-9] CRLF

  To identify groups of hidden services more easily across nodes, a
  name/tag can be given to a hidden service. Defaults to the storage
  path of the hidden service (HiddenServiceDir).

2.3. The "INTRODUCE" event

  The syntax is:
    "650" SP "INTRODUCE" SP HSTag SP RendezvousData CRLF

    HSTag = the tag of the hidden service
    RendezvousData = implementation-specific, but must not contain
                     whitespace, must only contain human-readable
                     characters, and should be no longer than 2048 bytes

  The INTRODUCE event should contain sufficient data to allow continuing
  the rendezvous from another Tor instance. The exact format is left
  unspecified and left up to the implementation. It follows that only
  matching versions can be used safely to coordinate the rendezvous of
  hidden service connections.

2.4. "PERFORM-RENDEZVOUS" command

  The syntax is:
    "PERFORM-RENDEZVOUS" SP HSTag SP RendezvousData CRLF

  This command allows a controller to perform a rendezvous using data
  received through an INTRODUCE event. The format of RendezvousData is
  not specified other than that it must not contain whitespace, and
  should be no longer than 2048 bytes.

2.5. The RendezvousData blob

  The "RendezvousData" blob is opaque to the controller; however, the
  tor implementation should of course know how to deal with it. Its
  contents are the minimal amount of data required to process the
  INTRODUCE2 cell on another machine.

  Before proposal 224 is implemented, this could consist of the
  INTRODUCE2 cell payload, the key to decrypt the cell if the cell is
  not already decrypted (which may be preferable, for performance
  reasons), and data necessary for other machines to recognize what to
  do with the cell.

  After proposal 224 is implemented, the blob would contain any
  additional keys needed to perform the rendezvous handshake.

  Implementations do not need to handle blobs generated by other
  versions of the software. Because of this, it is recommended to
  include a version number which can be used to verify that the blob is
  from a compatible implementation.

3. Compatibility and security

  The implementation of these methods should, ideally, not change
  anything in the network, and all control changes are opt-in, so this
  proposal is fully backwards compatible.

  Controllers handling this data must be careful to not leak rendezvous
  data to untrusted parties, as it could be used to intercept and
  manipulate hidden service traffic.

4. Example

  Let's take an example where a client (Alice) tries to contact Bob's
  hidden service. To do this, Bob follows the normal hidden service
  specification, except he sets up ten servers to do this. One of these
  publishes the descriptor, the others have this disabled. When the
  INTRODUCE2 cell arrives at the node which published the descriptor, it
  does not immediately try to perform the rendezvous, but instead
  outputs this to the controller. Through an out-of-band process this
  message is relayed to a controller of another node of Bob's, and this
  controller transmits the "PERFORM-RENDEZVOUS" command to that node.
  This node performs the rendezvous, and will continue to serve data to
  Alice, whose client will now not have to talk to the introduction
  point anymore.

5. Other considerations

  We have left the actual format of the rendezvous data in the control
  protocol unspecified, so that controllers do not need to worry about
  the various types of hidden service connections, most notably
  proposal 224.

  The decision to not implement the actual cell relaying in the tor
  implementation itself was taken to allow more advanced configurations,
  and to leave the actual load-balancing algorithm to the implementor of
  the controller. The developer of the tor implementation should not
  have to choose between a round-robin algorithm and something that
  could pull CPU load averages from a centralized monitoring system.
```
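
The controller-side flow in sections 2.3, 2.4 and 4 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not an implementation of anything that exists in tor. The function and class names (`parse_introduce_event`, `build_perform_rendezvous`, `RoundRobinDispatcher`) and the worker names are hypothetical. "Human-readable characters" is read here as printable ASCII without whitespace, which is only one possible interpretation. Round-robin is used purely because section 5 names it as one candidate policy. Actually sending the command to a worker's control port is left out.

```python
import itertools
import string
from typing import Iterable, Tuple

# Sections 2.3 and 2.4: the blob "should be no longer than 2048 bytes".
MAX_RENDEZVOUS_DATA_BYTES = 2048

# Assumption: "human-readable" means printable ASCII, excluding whitespace.
_PRINTABLE_NO_SPACE = frozenset(
    string.ascii_letters + string.digits + string.punctuation
)


def validate_rendezvous_data(blob: str) -> None:
    """Raise ValueError if the blob breaks the constraints of section 2.3."""
    if not blob:
        raise ValueError("RendezvousData is empty")
    if not all(ch in _PRINTABLE_NO_SPACE for ch in blob):
        raise ValueError("RendezvousData has whitespace or non-printable bytes")
    # The blob is ASCII-only at this point, so characters equal bytes.
    if len(blob) > MAX_RENDEZVOUS_DATA_BYTES:
        raise ValueError("RendezvousData exceeds 2048 bytes")


def parse_introduce_event(line: str) -> Tuple[str, str]:
    """Split '650 INTRODUCE HSTag RendezvousData CRLF' into (tag, blob)."""
    fields = line.rstrip("\r\n").split(" ")
    if len(fields) != 4 or fields[0] != "650" or fields[1] != "INTRODUCE":
        raise ValueError(f"not an INTRODUCE event: {line!r}")
    tag, blob = fields[2], fields[3]
    if not tag or any(ch.isspace() for ch in tag):
        raise ValueError("HSTag is empty or contains whitespace")
    validate_rendezvous_data(blob)
    return tag, blob


def build_perform_rendezvous(tag: str, blob: str) -> str:
    """Format the command from section 2.4; the blob is passed on untouched."""
    validate_rendezvous_data(blob)
    return f"PERFORM-RENDEZVOUS {tag} {blob}\r\n"


class RoundRobinDispatcher:
    """Pick the next worker in turn for each intercepted introduction."""

    def __init__(self, workers: Iterable[str]) -> None:
        worker_list = list(workers)
        if not worker_list:
            raise ValueError("at least one worker is required")
        self._cycle = itertools.cycle(worker_list)

    def dispatch(self, event_line: str) -> Tuple[str, str]:
        # Parse first, so a malformed event does not consume a worker slot.
        tag, blob = parse_introduce_event(event_line)
        return next(self._cycle), build_perform_rendezvous(tag, blob)


if __name__ == "__main__":
    dispatcher = RoundRobinDispatcher(["worker-a", "worker-b"])
    worker, command = dispatcher.dispatch("650 INTRODUCE shop v1:QUJD\r\n")
    print(worker, repr(command))
```

The controller treats the blob as opaque, as section 2.5 requires: it checks only the framing constraints and forwards the data unchanged. A different policy, such as one driven by load averages, would only need to replace `RoundRobinDispatcher`.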