roger's grand plan for how to handle 100000 relays.

still needs to be fleshed out a bit. ;) svn:r14315
author: Roger Dingledine <arma@torproject.org> 2008-04-08 08:59:02 +0000
committer: Roger Dingledine <arma@torproject.org> 2008-04-08 08:59:02 +0000
commit: 8543ef05b34e0a21fd205480efe75c633e3c85cf (patch)
tree: 29885dc9d4b5d52fe73dff935a8cb0ffef917346 /proposals/ideas/xxx-grand-scaling-plan.txt
parent: 04e23fd70ae6fa7582244836febdee3808b7e0ca (diff)
download: torspec-8543ef05b34e0a21fd205480efe75c633e3c85cf.tar.gz
torspec-8543ef05b34e0a21fd205480efe75c633e3c85cf.zip
1 files changed, 96 insertions, 0 deletions
diff --git a/proposals/ideas/xxx-grand-scaling-plan.txt b/proposals/ideas/xxx-grand-scaling-plan.txt
new file mode 100644
index 0000000..e012574
--- /dev/null
+++ b/proposals/ideas/xxx-grand-scaling-plan.txt
@@ -0,0 +1,96 @@
+
+Right now as I understand it, there are n big scaling problems heading
+our way:
+
+1) Clients need to learn all the relay descriptors they could use. That's
+a lot of bytes through a potentially small pipe.
+2) Relays need to hold open TCP connections to most other relays.
+3) Clients need to learn the whole networkstatus. Even using v3, as
+the network grows that will become unwieldy.
+4) Dir mirrors need to mirror all the relay descriptors; eventually this
+will get big too.
+
+Here's my plan.
+
+--------------------------------------------------------------------
+
+Piece one: download O(1) descriptors rather than O(n) descriptors.
+
+We need to change our circuit extend protocol so it fetches a relay
+descriptor at every 'extend' operation:
+  - Client fetches networkstatus, picks guards, connects to one.
+  - Client picks middle hop out of networkstatus, asks guard for
+    its descriptor, then extends to it.
+  - Clients picks exit hop out of networkstatus, asks middle hop
+    for its descriptor, the extends to it. Done.
+The client needs to ask for the descriptor even if it already has a
+copy, because otherwise we leak too much. Also, the descriptor needs to
+be padded to some large (but not too large) size to prevent the middle
+hops from guessing about it.
+
+The first step towards this is to instrument the current code to see
+how much of a win this would actually be -- I am guessing it is already
+a win even with the current number of descriptors.
+
+We also would need to assign the 'Exit' flag more usefully, and make
+clients pay attention to it when picking their last hop, since they
+don't actually know the exit policies of the relays they're choosing from.
+
+We also need to think harder about other implications -- for example,
+a relay with a tiny exit policy won't get the Exit flag, and thus won't
+ever get picked as an exit relay. Plus, our "enclave exit" model is out
+the window unless we figure out a cool trick.
+
+More generally, we'll probably want to compress the descriptors that we
+send back; maybe 8k is a good upper bound? I wonder if we could ask for
+several descriptors, and bundle back all of the ones that fit in the 8k?
+
+We'd also want to put the load balancing weights into the networkstatus,
+so clients can choose fast nodes more often without needing to see the
+descriptors. This is a good opportunity for the authorities to be able
+to put "more accurate" weights in if they learn to detect attacks. It
+also means we should consider running automated audits to make sure the
+authorities aren't trying to snooker everybody.
+
+I'm aiming to get Peter Palfrader to tackle this problem in mid 2008,
+but I bet he could use some help.
+
+--------------------------------------------------------------------
+
+Piece two: inter-relay communication uses UDP
+
+If relays send packets to/from other relays via UDP, they don't need a
+new descriptor for each such link. Thus we'll still need to keep state
+for each link, but we won't max out on sockets.
+
+Clearly a lot more work needs to be done here. Ian Goldberg has a student
+who has been working on it, and if all goes well we'll be chipping in
+some funding to continue that. Also, Camilo Viecco has been doing his
+PhD thesis on it.
+
+--------------------------------------------------------------------
+
+Piece three: networkstatus documents get partitioned
+
+While the authorities should be expected to be able to handle learning
+about all the relays, there's no reason the clients or the mirrors need
+to. Authorities should put a cap on the number of relays listed in a
+single networkstatus, and split them when they get too big.
+
+We'd need a good way to have each authority come to the same conclusion
+about which partition a given relay goes into.
+
+Directory mirrors would then mirror all the relay descriptors in their
+partition. This is compatible with 'piece one' above, since clients in
+a given partition will only ask about descriptors in that partition.
+
+More complex versions of this design would involve overlapping partitions,
+but that would seem to start contradicting other parts of this proposal
+right quick.
+
+Nobody is working on this piece yet. It's hard to say when we'll need
+it, but it would be nice to have some more thought on it before the week
+that we need it.
+
+--------------------------------------------------------------------
+
author	Roger Dingledine <arma@torproject.org>	2008-04-08 08:59:02 +0000
committer	Roger Dingledine <arma@torproject.org>	2008-04-08 08:59:02 +0000
commit	8543ef05b34e0a21fd205480efe75c633e3c85cf (patch)
tree	29885dc9d4b5d52fe73dff935a8cb0ffef917346 /proposals/ideas/xxx-grand-scaling-plan.txt
parent	04e23fd70ae6fa7582244836febdee3808b7e0ca (diff)
download	torspec-8543ef05b34e0a21fd205480efe75c633e3c85cf.tar.gz torspec-8543ef05b34e0a21fd205480efe75c633e3c85cf.zip