``` Filename: 265-load-balancing-with-overhead.txt Title: Load Balancing with Overhead Parameters Authors: Mike Perry Created: 01 January 2016 Status: Open Target: arti-dirauth NOTE: This is one way to address several load balancing problems in Tor, including padding overhead and Exit+Guard issues. However, before attempting this, we should see if we can simplify the equations further by changing how we assign Guard, Fast and Stable flags in the first place. If we assign Guard flags such that Guards are properly allocated wrt Middle and Fast, and avoid assigning Guard to Exit, this will become simpler. Unfortunately, this is literally impossible to fix with C-Tor. In adition to numerous overrides and disparate safety checks that prevent changes, several bugs mean that Guard, Stable, and Fast flags are randomly assigned: See: https://gitlab.torproject.org/tpo/core/tor/-/issues/40230 https://gitlab.torproject.org/tpo/core/tor/-/issues/40395 https://gitlab.torproject.org/tpo/core/tor/-/issues/19162 https://gitlab.torproject.org/tpo/core/tor/-/issues/40733 https://gitlab.torproject.org/tpo/network-health/analysis/-/issues/45 https://gitlab.torproject.org/tpo/core/torspec/-/issues/100 https://gitlab.torproject.org/tpo/core/torspec/-/issues/160 https://gitlab.torproject.org/tpo/core/torspec/-/issues/158 Other approaches to flag equations that have been proposed: https://github.com/frochet/wf_proposal/blob/master/waterfilling-balancing-with-max-diversity.txt https://petsymposium.org/popets/2023/popets-2023-0127.pdf 0. Motivation In order to properly load balance in the presence of padding and non-negligible amounts of directory and hidden service traffic, the load balancing equations in Section 3.8.3 of dir-spec.txt are in need of some modifications. In addition to supporting the idea of overhead, the load balancing equations can also be simplified by treating Guard+Exit nodes as Exit nodes in all cases. This causes the 9 sub-cases of the current load balancing equations to consolidate into a single solution, which also will greatly simplify the consensus process, and eliminate edge cases such as #16255[1]. 1. Overview For padding overhead due to Proposals 251 and 254, and changes to hidden service path selection in Proposal 247, it will be useful to be able to specify a pair of parameters that represents the additional traffic present on Guard and Middle nodes due to these changes. The current load balancing equations unfortunately make this excessively complicated. With overhead factors included, each of the 9 subcases goes from being a short solution to over a page of calculations for each subcase. Moreover, out of 8751 hourly consensus documents produced in 2015[2], only 78 of them had a non-zero weight for using Guard+Exit nodes in the Guard position (weight Wgd), and most of those were well under 1%. The highest weight for using Guard+Exits in the Guard position recorded in 2015 was 2.62% (on December 10th, 2015). This means clients that chose a Guard node during that particular hour used only 2.62% of Guard+Exit flagged nodes' bandwidth when performing a bandwidth-weighted Guard selection. All clients that chose a Guard node during any other hour did not consider Guard+Exit nodes at all as potential candidates for their Guards. This indicates that we can greatly simplify these load balancing equations with little to no change in diversity to the network. 2. Simplified Load Balancing Equations Recall that the point of the load balancing equations in section 3.8.3 of dir-spec.txt is to ensure that an equal amount of client traffic is distributed between Guards, Middles, Exits, and Guard+Exits, where each flag type can occupy one or more positions in a path. This allocation is accomplished by solving a system of equations for weights for flag position selection to ensure equal allocation of client traffic for each position in a circuit. If we ignore overhead for the moment and treat Guard+Exit nodes as Exit nodes, then this allows the simplified system of equations to become: Wgg*G == M + Wme*E + Wmg*G # Guard position == middle position Wgg*G == Wee*E # Guard position == equals exit position Wmg*G + Wgg*G == G # Guard allocation weights sum to 1 Wme*E + Wee*E == E # Exit allocation weights sum to 1 This system has four equations and four unknowns, and by transitivity we ensure that allocated capacity for guard, middle, and exit positions are all equal. Unlike the equations in 3.8.3 of dir-spec.txt, there are no special cases to the solutions of these equations because there is no shortage of constraints and no decision points for allocation based on scarcity. Thus, there is only one solution. Using SymPy's symbolic equation solver (see attached script) we obtain: E + G + M E + G + M 2*E - G - M 2*G - E - M Wee: ---------, Wgg: ---------, Wme: -----------, Wmg: ------------ 3*E 3*G 3*E 3*G For the rest of the flags weights, we will do the following: Dual-flagged (Guard+Exit) nodes should be treated as Exits: Wgd = 0, Wmd = Wme, Wed = Wee Directory requests use middle weights: Wbd=Wmd, Wbg=Wmg, Wbe=Wme, Wbm=Wmm Handle bridges and strange exit policies: Wgm=Wgg, Wem=Wee, Weg=Wed 2.1. Checking for underflow and overflow In the old load balancing equations, we required a case-by-case proof to guard against overflow and underflow, and to decide what to do in the event of various overflow and underflow conditions[3]. Even still, the system proved fragile to changes, such as the implementation of Guard uptime fractions[1]. Here, with the simplified equations, we can plainly see that the only time that a negative weight can arise is in Wme and Wmg, when 2*E < G+M or when 2*G < E+M. In other words, only when Exits or Guards are scarce. Similarly, the only time that a weight exceeding 1.0 can arise is in Wee and Wgg, which also happens when 2*E < G+M or 2*G < E+M. This means that parameters will always overflow in pairs (Wee and Wme, and/or Wgg and Wmg). In both these cases, simply clipping the parameters at 1 and 0 provides as close of a balancing condition as is possible, given the scarcity. 3. Load balancing with Overhead Parameters Intuitively, overhead due to padding and path selection changes can be represented as missing capacity in the relevant position. This means that in the presence of a Guard overhead fraction of G_o and a Middle overhead fraction of M_o, the total fraction of actual client traffic carried in those positions is (1-G_o) and (1-M_o), respectively. Then, to achieve a balanced allocation of traffic, we consider only the actual client capacity carried in each position: # Guard position minus overhead matches middle position minus overhead: (1-G_o)*(Wgg*G) == (1-M_o)*(M + Wme*E + Wmg*G) # Guard position minus overhead matches exit position: (1-G_o)*(Wgg*G) == 1*(Wee*E) # Guard weights still sum to 1: Wmg*G + Wgg*G == G # Exit weights still sum to 1: Wme*E + Wee*E == E Solving this system with SymPy unfortunately yields some unintuitively simplified results. For each weight, we first show the SymPy solution, and then factor that solution into a form analogous to Section 2: -(G_o - 1)*(M_o - 1)*(E + G + M) Wee: --------------------------------------- E*(G_o + M_o - (G_o - 1)*(M_o - 1) - 2) (1 - G_o)*(1 - M_o)*(E + G + M) Wee: --------------------------------------- E*(2 - G_o - M_o + (1 - G_o)*(1 - M_o)) (M_o - 1)*(E + G + M) Wgg: --------------------------------------- G*(G_o + M_o - (G_o - 1)*(M_o - 1) - 2) (1 - M_o)*(E + G + M) Wgg: --------------------------------------- G*(2 - G_o - M_o + (1 - G_o)*(1- M_o)) -E*(M_o - 1) + G*(G_o - 1)*(-M_o + 2) - M*(M_o - 1) Wmg: --------------------------------------------------- G*(G_o + M_o - (G_o - 1)*(M_o - 1) - 2) (2 - M_o)*G*(1 - G_o) - M*(1 - M_o) - E*(1 - M_o) Wmg: --------------------------------------------------- G*(2 - G_o - M_o + (1 - G_o )*(1 - M_o)) E*(G_o + M_o - 2) + G*(G_o - 1)*(M_o - 1) + M*(G_o - 1)*(M_o - 1) Wme: ----------------------------------------------------------------- E*(G_o + M_o - (G_o - 1)*(M_o - 1) - 2) (2 - G_o - M_o)*E - G*(1 - G_o)*(1 - M_o) - M*(1 - G_o)*(1 - M_o) Wme: ----------------------------------------------------------------- E*(2 - G_o - M_o + (1 - G_o)*(1 - M_o)) A simple spot check with G_o = M_o = 0 shows us that with zero overhead, these solutions become identical to the solutions in Section 2 of this proposal. The final condition that we need to ensure is that these weight values never become negative or greater than 1.0[3]. 3.1. Ensuring against underflow and overflow Note that if M_o = G_o = 0, then the solutions and the overflow conditions are the same as in Section 2. Unfortunately, SymPy is unable to solve multivariate inequalities, which prevents us from directly deriving overflow conditions for each variable independently (at least easily and without mistakes). Wolfram Alpha is able to derive closed form solutions to some degree for this, but they are more complicated than checking the weights for underflow and overflow directly. However, for all overflow and underflow cases, simply warning in the event of overflow or underflow in the weight variable solutions above is equivalent anyway. Optimal load balancing given this scarcity should still result if we clip the resulting solutions to [0, 1.0]. It will be wise in the implementation to test the overflow conditions with M_o = G_o = 0, and with their actual values. This will allow us to know if the overflow is a result of inherent unbalancing, or due to input overhead values that are too large (and need to be reduced by, for example, reducing padding). 4. Consensus integration 4.1. Making use of the Overhead Factors In order to keep the consensus process simple on the Directory Authorities, the overhead parameters represent the combined overhead from many factors. The G_o variable is meant to account for sum of directory overhead, netflow padding overhead, future two-hop padding overhead, and future hidden service overhead (for cases where Guard->Middle->Exit circuits are not used). The M_o variable is meant to account for multi-hop padding overhead, hidden service overhead, as well as an overhead for any future two-hop directory connections (so that we can consolidate Guards and Directory guard functionality into a single Guard node). There is no need for an E_o variable, because even if there were Exit-specific overhead, it could be represented by an equivalent reductions in both G_o and M_o instead. Since all relevant padding and directory overhead information is included in the extra-info documents for each relay, the M_o and G_o variables could be computed automatically from these extra-info documents during the consensus process. However, it is probably wiser to keep humans in the loop and set them manually as consensus parameters instead, especially since we have not previously had to deal with serious adversarial consequences from malicious extra-info reporting. For clarity, though, it may be a good idea to separate all of the components of M_o and G_o into separate consensus parameters, and combine them (via addition) in the final equations. That way it will be easier to pinpoint the source of any potential overflow issues. This separation will also enable us to potentially govern padding's contribution to the overhead via a single tunable value. 4.2. Accounting for hidden service overhead with Prop 247 XXX: Hidden service path selection and 247 complicates this. With 247, we want paths only of G M M, where the Ms exclude Guard-flaged nodes. This means that M_o needs to add the total hidden service *network bytecount* overhead (2X the hidden service end-to-end traffic bytecount). We also need to *subtract* 4*Wmg*hs_e2e_bytecount from the G_o overhead, to account for not using Guard-flagged nodes for the four M's in full prop-247 G M M M M G circuits. 4.3. Accounting for RSOS overhead XXX: We also need to separately account for RSOS (and maybe SOS?) path usage in M_o. This will require separate acocunting for these service types in extra-info descriptors. 4.4 Integration with Guardfraction The GuardFraction changes in Proposal 236 and #16255 should continue to work with these new equations, so long as the total T, G, and M values are counted after the GuardFraction multiplier has been applied. 4.5. Guard flag assignment Ideally, the Guard flag assignment process would also not count Exit-flagged nodes when determining the Guard flag uptime and bandwidth cutoffs, since we will not be using Guard+Exit flagged nodes as Guard nodes at all when this change is applied. This will result in more accurate thresholds for Guard node status, as well as better control over the true total amount of Guard bandwidth in the consensus. 4.6. Cannibalization XXX: It sucks and complicates everything. kill it, except for hsdirs. 1. https://trac.torproject.org/projects/tor/ticket/16255 2. https://collector.torproject.org/archive/relay-descriptors/consensuses/ 3. http://tor-dev.torproject.narkive.com/17H9FewJ/correctness-proof-for-new-bandwidth-weights-bug-1952 Appendix A: SymPy Script for Balancing Equation Solutions #!/usr/bin/python from sympy.solvers import solve from sympy import simplify, Symbol, init_printing, pprint # Sympy variable declarations (G,M,E,D) = (Symbol('G'),Symbol('M'),Symbol('E'),Symbol('D')) (Wgd,Wmd,Wed,Wme,Wmg,Wgg,Wee) = (Symbol('Wgd'),Symbol('Wmd'),Symbol('Wed'), Symbol('Wme'),Symbol('Wmg'),Symbol('Wgg'), Symbol('Wee')) (G_o, M_o) = (Symbol('G_o'),Symbol('M_o')) print "Current Load Balancing Equation Solutions, Case 1:" pprint(solve( [Wgg*G + Wgd*D - (M + Wmd*D + Wme*E + Wmg*G), Wgg*G + Wgd*D - (Wee*E + Wed*D), Wed*D + Wmd*D + Wgd*D - D, Wmg*G + Wgg*G - G, Wme*E + Wee*E - E, Wmg - Wmd, 3*Wed - 1], Wgd, Wmd, Wed, Wme, Wmg, Wgg, Wee)) print print "Case 1 with guard and middle overhead: " pprint(solve( [(1-G_o)*(Wgg*G + Wgd*D) - (1-M_o)*(M + Wmd*D + Wme*E + Wmg*G), (1-G_o)*(Wgg*G + Wgd*D) - (Wee*E + Wed*D), Wed*D + Wmd*D + Wgd*D - D, Wmg*G + Wgg*G - G, Wme*E + Wee*E - E, Wmg - Wmd, 3*Wed - 1], Wgd, Wmd, Wed, Wme, Wmg, Wgg, Wee)) print "\n\n" print "Elimination of combined Guard+Exit flags (no overhead): " pprint(solve( [(Wgg*G) - (M + Wme*E + Wmg*G), (Wgg*G) - 1*(Wee*E), Wmg*G + Wgg*G - G, Wme*E + Wee*E - E], Wme, Wmg, Wgg, Wee)) print print "Elimination of combined Guard+Exit flags (Guard+middle overhead): " combined = solve( [(1-G_o)*(Wgg*G) - (1-M_o)*(M + Wme*E + Wmg*G), (1-G_o)*(Wgg*G) - 1*(Wee*E), Wmg*G + Wgg*G - G, Wme*E + Wee*E - E], Wme, Wmg, Wgg, Wee) pprint(combined) ```