diff options
Diffstat (limited to 'doc/spec/proposals/ideas/xxx-grand-scaling-plan.txt')
-rw-r--r-- | doc/spec/proposals/ideas/xxx-grand-scaling-plan.txt | 97 |
1 files changed, 0 insertions, 97 deletions
diff --git a/doc/spec/proposals/ideas/xxx-grand-scaling-plan.txt b/doc/spec/proposals/ideas/xxx-grand-scaling-plan.txt deleted file mode 100644 index 336798cc0f..0000000000 --- a/doc/spec/proposals/ideas/xxx-grand-scaling-plan.txt +++ /dev/null @@ -1,97 +0,0 @@ - -Right now as I understand it, there are n big scaling problems heading -our way: - -1) Clients need to learn all the relay descriptors they could use. That's -a lot of bytes through a potentially small pipe. -2) Relays need to hold open TCP connections to most other relays. -3) Clients need to learn the whole networkstatus. Even using v3, as -the network grows that will become unwieldy. -4) Dir mirrors need to mirror all the relay descriptors; eventually this -will get big too. - -Here's my plan. - --------------------------------------------------------------------- - -Piece one: download O(1) descriptors rather than O(n) descriptors. - -We need to change our circuit extend protocol so it fetches a relay -descriptor at every 'extend' operation: - - Client fetches networkstatus, picks guards, connects to one. - - Client picks middle hop out of networkstatus, asks guard for - its descriptor, then extends to it. - - Clients picks exit hop out of networkstatus, asks middle hop - for its descriptor, then extends to it. Done. - -The client needs to ask for the descriptor even if it already has a -copy, because otherwise we leak too much. Also, the descriptor needs to -be padded to some large (but not too large) size to prevent the middle -hops from guessing about it. - -The first step towards this is to instrument the current code to see -how much of a win this would actually be -- I am guessing it is already -a win even with the current number of descriptors. - -We also would need to assign the 'Exit' flag more usefully, and make -clients pay attention to it when picking their last hop, since they -don't actually know the exit policies of the relays they're choosing from. - -We also need to think harder about other implications -- for example, -a relay with a tiny exit policy won't get the Exit flag, and thus won't -ever get picked as an exit relay. Plus, our "enclave exit" model is out -the window unless we figure out a cool trick. - -More generally, we'll probably want to compress the descriptors that we -send back; maybe 8k is a good upper bound? I wonder if we could ask for -several descriptors, and bundle back all of the ones that fit in the 8k? - -We'd also want to put the load balancing weights into the networkstatus, -so clients can choose fast nodes more often without needing to see the -descriptors. This is a good opportunity for the authorities to be able -to put "more accurate" weights in if they learn to detect attacks. It -also means we should consider running automated audits to make sure the -authorities aren't trying to snooker everybody. - -I'm aiming to get Peter Palfrader to tackle this problem in mid 2008, -but I bet he could use some help. - --------------------------------------------------------------------- - -Piece two: inter-relay communication uses UDP - -If relays send packets to/from other relays via UDP, they don't need a -new descriptor for each such link. Thus we'll still need to keep state -for each link, but we won't max out on sockets. - -Clearly a lot more work needs to be done here. Ian Goldberg has a student -who has been working on it, and if all goes well we'll be chipping in -some funding to continue that. Also, Camilo Viecco has been doing his -PhD thesis on it. - --------------------------------------------------------------------- - -Piece three: networkstatus documents get partitioned - -While the authorities should be expected to be able to handle learning -about all the relays, there's no reason the clients or the mirrors need -to. Authorities should put a cap on the number of relays listed in a -single networkstatus, and split them when they get too big. - -We'd need a good way to have each authority come to the same conclusion -about which partition a given relay goes into. - -Directory mirrors would then mirror all the relay descriptors in their -partition. This is compatible with 'piece one' above, since clients in -a given partition will only ask about descriptors in that partition. - -More complex versions of this design would involve overlapping partitions, -but that would seem to start contradicting other parts of this proposal -right quick. - -Nobody is working on this piece yet. It's hard to say when we'll need -it, but it would be nice to have some more thought on it before the week -that we need it. - --------------------------------------------------------------------- - |