$Id$

                  Tor directory protocol for 0.1.1.x series

0. Scope and preliminaries

   This document should eventually be merged into tor-spec.txt and replace
   the existing notes on directories.

   This is not a finalized version; what we actually wind up implementing
   may be very different from the system described here.

0.1. Goals

   There are several problems with the way Tor handles directories right
   now:
      1. Directories are very large and use a lot of bandwidth.
      2. Every directory server is a single point of failure.
      3. Requiring every client to know every server won't scale.
      4. Requiring every directory cache to know every server won't scale.
      5. Our current "verified server" system is kind of nonsensical.
      6. Getting more directory servers adds more points of failure and
         worsens possible partitioning attacks.

   This design tries to solve every problem except problems 3 and 4, and to
   be compatible with likely eventual solutions to problems 3 and 4.

1. Outline

   There is no longer any such thing as a "signed directory".  Instead,
   directory servers sign a very compressed 'network status' object that
   lists the current descriptors and their status, and router descriptors
   continue to be self-signed by servers.  Clients download network status
   listings periodically, and download router descriptors as needed.  ORs
   upload descriptors relatively infrequently.

   There are multiple directory servers.  Rather than having the directory
   servers do anything complicated to coordinate among themselves, clients
   simply rotate through them in order, and only use servers that most of
   the last several directory servers like.

2. Router descriptors

   The router descriptor format is unchanged from tor-spec.txt.

   ORs SHOULD generate a new router descriptor whenever any of the
   following events have occurred:

      - A period of time (18 hrs by default) has passed since the last
        time a descriptor was generated.

      - A descriptor field other than bandwidth or uptime has changed.

      - Bandwidth has changed by more than 50% (up or down) since the last
        time a descriptor was generated, and at least a given interval of
        time (20 mins by default) has passed since then.

      - Uptime has been reset.

   After generating a descriptor, ORs upload it to every directory
   server they know.
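   The regeneration rules above can be sketched as a single predicate.  This
   is an illustrative sketch, not Tor's implementation: `DescriptorFields`
   and `should_regenerate` are hypothetical names, and detecting an uptime
   reset by comparing the reported uptime with the time since the last
   descriptor is an assumption, since the text does not say how a reset is
   noticed.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Defaults from the text above.
MAX_DESCRIPTOR_AGE = 18 * 60 * 60    # 18 hours, in seconds
MIN_BANDWIDTH_INTERVAL = 20 * 60     # 20 minutes, in seconds
BANDWIDTH_CHANGE = 0.5               # a 50% change in either direction


@dataclass
class DescriptorFields:
    """Hypothetical snapshot of what an OR would put in its descriptor."""
    bandwidth: int
    uptime: int                                  # seconds since the OR started
    other: dict = field(default_factory=dict)    # every other descriptor field


def should_regenerate(last: DescriptorFields, current: DescriptorFields,
                      last_published: float, now: float) -> bool:
    """Decide whether to build and upload a fresh descriptor."""
    age = now - last_published
    if age >= MAX_DESCRIPTOR_AGE:
        return True
    if current.other != last.other:
        return True
    if last.bandwidth > 0 and age >= MIN_BANDWIDTH_INTERVAL:
        if abs(current.bandwidth - last.bandwidth) / last.bandwidth > BANDWIDTH_CHANGE:
            return True
    # Assumption: had the OR kept running, its uptime would be at least
    # `age` by now; a smaller value means it restarted (uptime was reset).
    if current.uptime < age:
        return True
    return False
```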

3. Network status

   Directory servers generate, sign, and compress a network-status document
   as needed.  As an optimization, they may rate-limit generation of these
   documents to once every few seconds.  At a minimum, directory servers
   should rate-limit so that these documents are generated no more than
   once per second.
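   One way to honor that limit is to cache the last generated document and
   rebuild it on request only when something has changed and the minimum
   interval has elapsed.  A minimal sketch; `RateLimitedStatus`,
   `mark_dirty`, and the injectable clock are hypothetical, and the
   one-second default mirrors the minimum stated above.

```python
import time
from typing import Callable, Optional


class RateLimitedStatus:
    """Serve a cached document, regenerating at most once per min_interval."""

    def __init__(self, generate: Callable[[], bytes], min_interval: float = 1.0,
                 clock: Callable[[], float] = time.monotonic):
        self._generate = generate
        self._min_interval = min_interval
        self._clock = clock
        self._cached: Optional[bytes] = None
        self._generated_at: Optional[float] = None
        self._dirty = True

    def mark_dirty(self) -> None:
        # Called whenever a descriptor or router status changes.
        self._dirty = True

    def get(self) -> bytes:
        now = self._clock()
        too_soon = (self._generated_at is not None
                    and now - self._generated_at < self._min_interval)
        if self._cached is None or (self._dirty and not too_soon):
            self._cached = self._generate()
            self._generated_at = now
            self._dirty = False
        return self._cached
```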

   The network status document contains a preamble, a set of router status
   entries, and a signature, in that order.

   We use the same meta-format as used for directories and router descriptors
   in "tor-spec.txt".

   The preamble contains:

      "network-status-version" -- A document format version.  For this
         specification, the version is "2".
      "dir-source" -- The hostname, current IP address, and directory
         port of the directory server, separated by spaces.
      "fingerprint" -- A base16-encoded hash of the signing key's
         fingerprint, with no additional spaces added.
      "contact" -- An arbitrary string describing how to contact the
         directory server's administrator.  Administrators should include at
         least an email address and a PGP fingerprint.
      "dir-signing-key" -- The directory server's public signing key.
      "client-versions" -- A comma-separated list of recommended client versions.
      "server-versions" -- A comma-separated list of recommended server versions.
      "published" -- The publication time for this network-status object.
      "dir-options" -- A set of flags separated by spaces:
          "Names" if this directory server performs name bindings.
          "Versions" if this directory server recommends software versions.

   The dir-options entry is optional.  The "-versions" entries are required if
   the "Versions" flag is present.  The other entries are required and must
   appear exactly once. The "network-status-version" entry must appear first;
   the others may appear in any order.
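   The ordering and cardinality rules for the preamble can be checked
   mechanically.  This sketch assumes the preamble has already been split
   by a meta-format parser into (keyword, arguments) pairs in document
   order; that input shape and the name `check_preamble` are hypothetical.

```python
from __future__ import annotations

REQUIRED_ONCE = ("network-status-version", "dir-source", "fingerprint",
                 "contact", "dir-signing-key", "published")
VERSION_KEYWORDS = ("client-versions", "server-versions")


def check_preamble(entries: list[tuple[str, list[str]]]) -> list[str]:
    """Return a list of problems; an empty list means the preamble is valid."""
    keywords = [kw for kw, _ in entries]
    problems = []
    if not keywords or keywords[0] != "network-status-version":
        problems.append("network-status-version must appear first")
    for kw in REQUIRED_ONCE:
        count = keywords.count(kw)
        if count != 1:
            problems.append(f"{kw} appears {count} times; expected exactly once")
    # dir-options is optional; collect whatever flags it carries.
    options = set()
    for kw, args in entries:
        if kw == "dir-options":
            options.update(args)
    if "Versions" in options:
        for kw in VERSION_KEYWORDS:
            if kw not in keywords:
                problems.append(f"{kw} is required when dir-options has Versions")
    return problems
```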

   For each router, the router entry contains:  (This format is designed for
   conciseness.)

      "r" -- followed by the following elements, separated by spaces:
          - The OR's nickname.
          - A hash of its identity key, encoded in base64, with trailing =
            signs removed.
          - A hash of its most recent descriptor, encoded in base64, with
            trailing = signs removed.  (The hash is calculated as for
            computing the signature of a descriptor.)
          - The publication time of its most recent descriptor.
          - An IP address.
          - An OR port.
          - A directory port (or "0" for none).
      "s" -- A series of space-separated status flags:
          "Exit" if the router is useful for building general-purpose exit
             circuits.
          "Stable" if the router tends to stay up for a long time.
          "Fast" if the router has high bandwidth.
          "Running" if the router is currently usable.
          "Named" if the router's identity-nickname mapping is canonical.
          "Valid" if the router has been 'validated'.
          "Authority" if the router is a directory authority.

      The "r" entry for each router must appear first and is required.  The
      "s" entry is optional.  Unrecognized flags and extra elements on the
      "r" line must be ignored.
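   A router entry can be parsed as sketched below.  The text does not fix
   the timestamp syntax, so this sketch assumes the publication time is
   written "YYYY-MM-DD HH:MM:SS" (two space-separated tokens).  Since
   trailing "=" signs are stripped from the base64 digests, padding is
   restored before decoding.  `RouterStatus` and `parse_router_entry` are
   hypothetical names.

```python
from __future__ import annotations

import base64
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class RouterStatus:
    nickname: str
    identity: bytes
    descriptor_digest: bytes
    published: datetime
    address: str
    or_port: int
    dir_port: int
    flags: set[str] = field(default_factory=set)


def unb64(s: str) -> bytes:
    # Restore the '=' padding the format strips, then decode strictly.
    return base64.b64decode(s + "=" * (-len(s) % 4), validate=True)


def parse_router_entry(lines: list[str]) -> RouterStatus:
    r = lines[0].split()
    if not r or r[0] != "r":
        raise ValueError("router entry must start with an 'r' line")
    # Assumed layout; any elements after the dir port are ignored.
    nick, ident, digest, day, clock, ip, orport, dirport = r[1:9]
    status = RouterStatus(
        nickname=nick,
        identity=unb64(ident),
        descriptor_digest=unb64(digest),
        published=datetime.strptime(f"{day} {clock}", "%Y-%m-%d %H:%M:%S"),
        address=ip, or_port=int(orport), dir_port=int(dirport))
    for line in lines[1:]:
        parts = line.split()
        if parts and parts[0] == "s":
            # Unrecognized flags are kept; consumers ignore ones they don't know.
            status.flags.update(parts[1:])
    return status
```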

   The signature section contains:

      "directory-signature" -- A signature of the rest of the document using
         the directory server's signing key.

   We compress the network status list with zlib before transmitting it.

4. Directory server operation

   By default, directory servers remember all non-expired, non-superseded OR
   descriptors that they have seen.

   For each OR, a directory server remembers whether the OR was running and
   functional the last time the directory server tried to connect to it, and
   possibly other liveness information.

   Directory server administrators may label some servers or IPs as
   blacklisted, and elect not to include them in their network-status lists.

   Thus, the network-status list includes all non-blacklisted,
   non-expired, non-superseded descriptors for ORs that the directory has
   observed at least once to be running.

   Directory server administrators may decide to support name binding.  If
   they do, then they must maintain a file of nickname-to-identity-key
   mappings, and try to keep this file consistent with other directory
   servers.  If they don't, they act as clients, and report bindings made by
   other directory servers (name X is bound to identity Y if at least one
   binding directory lists it, and no directory binds X to some other Y'.)
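   The rule a non-binding directory server uses to report bindings can be
   sketched as follows: collect every identity each name is bound to, and
   keep only names bound to exactly one identity.  The input shape (one
   nickname-to-identity dict per binding directory) and the name
   `bound_names` are hypothetical.

```python
from __future__ import annotations


def bound_names(bindings_per_dir: list[dict[str, str]]) -> dict[str, str]:
    """Combine name bindings reported by several binding directories."""
    candidates: dict[str, set[str]] = {}
    for bindings in bindings_per_dir:
        for name, identity in bindings.items():
            candidates.setdefault(name, set()).add(identity)
    # Bound only if at least one directory binds it and none disagrees.
    return {name: ids.pop() for name, ids in candidates.items() if len(ids) == 1}
```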

   The authoritative network-status published by a host should be available at:
      http://<hostname>/tor/status/authority.z

   An authoritative network-status published by another host with fingerprint
   <F> should be available at:
      http://<hostname>/tor/status/fp/<F>.z

   An authoritative network-status published by other hosts with fingerprints
   <F1>,<F2>,<F3> should be available at:
      http://<hostname>/tor/status/fp/<F1>+<F2>+<F3>.z

   The most recent network-status documents from all known authoritative
   directories, concatenated, should be available at:
      http://<hostname>/tor/status/all.z

   The most recent descriptor for a server whose identity key has a
   fingerprint of <F> should be available at:
      http://<hostname>/tor/server/fp/<F>.z

   The most recent descriptors for servers whose identity keys have
   fingerprints <F1>,<F2>,<F3> should be available at:
      http://<hostname>/tor/server/fp/<F1>+<F2>+<F3>.z

   The most recent descriptor for this server should be available at:
      http://<hostname>/tor/server/authority.z

   A concatenated set of the most recent descriptors for all known servers
   should be available at:
      http://<hostname>/tor/server/all.z

   For debugging, directories MAY expose non-compressed objects at URLs like
   the above, but without the final ".z".
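   The URL templates above follow one pattern: a document kind, then either
   "authority", "all", or "fp/" followed by "+"-joined fingerprints, then an
   optional ".z".  A minimal builder, where `dir_url` and its parameters are
   hypothetical and `hostname` is whatever goes in <hostname> above:

```python
from __future__ import annotations


def dir_url(hostname: str, kind: str, fingerprints: list[str] | None = None,
            which: str = "authority", compressed: bool = True) -> str:
    """kind is "status" or "server"; which is "authority" or "all"
    and is only used when no fingerprints are given."""
    if kind not in ("status", "server"):
        raise ValueError(f"unknown document kind: {kind}")
    if fingerprints:
        tail = "fp/" + "+".join(fingerprints)
    elif which in ("authority", "all"):
        tail = which
    else:
        raise ValueError(f"unknown selector: {which}")
    # Uncompressed variants (no ".z") are an optional debugging aid.
    suffix = ".z" if compressed else ""
    return f"http://{hostname}/tor/{kind}/{tail}{suffix}"
```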

   Clients MUST handle compressed concatenated information in two forms:
     - A concatenated list of zlib-compressed objects.
     - A zlib-compressed concatenated list of objects.
   Directory servers MAY generate either format: the former requires less
   CPU, but the latter requires less bandwidth.
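   A client can accept both forms with one loop: decompress one zlib stream,
   then continue with whatever bytes follow it.  A single compressed stream
   simply leaves no trailing bytes.  This sketch uses Python's standard
   `zlib` module; `decompress_concatenated` is a hypothetical name.

```python
import zlib


def decompress_concatenated(data: bytes) -> bytes:
    """Decode a single zlib stream or several zlib streams back to back."""
    out = []
    while data:
        d = zlib.decompressobj()
        out.append(d.decompress(data))
        out.append(d.flush())
        if not d.eof:
            raise ValueError("truncated zlib stream")
        # Bytes past the end of this stream belong to the next object.
        data = d.unused_data
    return b"".join(out)
```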

4.1. Caching

   Directory caches (most ORs) regularly download network status documents,
   and republish them at a URL based on the directory server's identity key:
      http://<hostname>/tor/status/<identity fingerprint>.z

   A concatenated list of all network-status documents should be available at:
      http://<hostname>/tor/status/all.z

4.2. Compression


5. Client operation

   Every OP or OR, including directory servers, acts as a client to the
   directory protocol.

   Each client maintains a list of trusted directory servers.  Periodically
   (currently every 20 minutes), the client downloads a new network status. It
   chooses the directory server from which its current information is most
   out-of-date, and retries on failure until it finds a running server.

   When choosing ORs to build circuits, clients proceed as follows:
     - A server is "listed" if it is listed by more than half of the "live"
       network status documents the client has downloaded.  (A network
       status is "live" if it is the most recently downloaded network status
       document for a given directory server, the server is a directory
       server trusted by the client, and the network-status document is no
       more than D (say, 10) days old.)
     - A server is "valid" if it is listed as valid by more than half of the
       "live" downloaded network-status documents.
     - A server is "running" if it is listed as running by more than
       half of the "recent" downloaded network-status documents.
       (A network status is "recent" if it was published in the last
       60 minutes.  If there are fewer than 3 such documents, the most
       recently published 3 are "recent."  If there are fewer than 3 in all,
       all are "recent.")
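   The "live", "recent", and majority rules can be sketched as below.  The
   text does not say which pool "recent" documents are drawn from; this
   sketch assumes they are chosen among the live ones.  It also assumes the
   client keeps only its most recently downloaded status per authority, so
   the input is a dict keyed by authority.  `NetworkStatus`, `live`,
   `recent`, and `majority` are hypothetical names; D is set to 10 days.

```python
from __future__ import annotations

from dataclasses import dataclass, field

LIVE_MAX_AGE = 10 * 24 * 3600   # "D (say, 10) days"
RECENT_WINDOW = 60 * 60         # 60 minutes
MIN_RECENT = 3


@dataclass
class NetworkStatus:
    authority: str       # fingerprint of the signing directory server
    published: float     # seconds since the epoch
    flags: dict[str, set[str]] = field(default_factory=dict)  # router -> flags


def live(latest: dict[str, NetworkStatus], trusted: set[str],
         now: float) -> list[NetworkStatus]:
    return [d for auth, d in latest.items()
            if auth in trusted and now - d.published <= LIVE_MAX_AGE]


def recent(docs: list[NetworkStatus], now: float) -> list[NetworkStatus]:
    by_age = sorted(docs, key=lambda d: d.published, reverse=True)
    fresh = [d for d in by_age if now - d.published <= RECENT_WINDOW]
    if len(fresh) >= MIN_RECENT:
        return fresh
    # Fewer than 3 fresh ones: take the 3 newest (or all, if fewer exist).
    return by_age[:MIN_RECENT]


def majority(docs: list[NetworkStatus], router: str, flag: str | None = None) -> bool:
    votes = sum(1 for d in docs
                if router in d.flags and (flag is None or flag in d.flags[router]))
    return 2 * votes > len(docs)
```

   A router would then be "listed" if `majority(live_docs, r)`, "valid" if
   `majority(live_docs, r, "Valid")`, and "running" if
   `majority(recent_docs, r, "Running")`.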


   Clients store network status documents so long as they are live.

5.1. Scheduling network status downloads

   This download scheduling algorithm implements the approach described above
   in a relatively low-state fashion.  It reflects the current Tor
   implementation.

   Clients maintain a list of authorities; each client tries to keep the same
   list, in the same order.

   Periodically, on startup, and on HUP, clients check whether they need to
   download fresh network status documents.  The approach is as follows:
     - If we have fewer than X network status documents newer than OLD, we
       choose a member of the list at random and try to download XX
       documents, starting with that member's.
     - Otherwise, if we have no network status documents newer than NEW, we
       check to see which authority's document we retrieved most recently,
       and try to retrieve the next authority's document.  If we can't, we
       try the next authority in sequence, and so on.
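   The two rules can be sketched as a function that returns the ordered list
   of authorities to try.  X, XX, OLD, and NEW are left unspecified above, so
   here they are plain parameters (`x`, `xx`, `old`, `new`), with OLD and NEW
   read as maximum document ages in seconds; that reading, and the name
   `plan_downloads`, are assumptions.

```python
from __future__ import annotations

import random


def plan_downloads(authorities: list[str], newest_published: dict[str, float],
                   last_fetched: str | None, now: float,
                   x: int, xx: int, old: float, new: float,
                   rng: random.Random | None = None) -> list[str]:
    """Return authorities to try, in order; an empty list means do nothing."""
    if not authorities:
        return []
    rng = rng or random.Random()
    n = len(authorities)
    newer_than_old = sum(1 for a in authorities
                         if a in newest_published and now - newest_published[a] < old)
    if newer_than_old < x:
        # Rule 1: XX consecutive authorities from a random starting point.
        start = rng.randrange(n)
        return [authorities[(start + i) % n] for i in range(min(xx, n))]
    if not any(now - t < new for t in newest_published.values()):
        # Rule 2: the authority after the one fetched most recently, then onward.
        start = (authorities.index(last_fetched) + 1) % n if last_fetched in authorities else 0
        return [authorities[(start + i) % n] for i in range(n)]
    return []
```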

5.2. Managing naming

   In order to provide human-memorable names for individual server
   identities, some directory servers bind names to IDs.  Clients handle
   names in two ways:

   If a client is encountering a name it has not mapped before:

      If all the "binding" network-status documents the client has so far
      received claim that the name binds to some identity X, and the
      client has received at least three network-status documents, the client
      maps the name to X.

   If a client is encountering a name it has mapped before:

      It uses the last-mapped identity value, unless all of the "binding"
      network status documents bind the name to some other identity.
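   Both cases reduce to asking whether the binding documents are unanimous
   about a name.  In this sketch, `claims` holds, for each binding
   network-status document received, the identity it binds the name to (or
   None if it does not bind the name); that input shape and the name
   `resolve_name` are hypothetical.

```python
from __future__ import annotations


def resolve_name(claims: list[str | None], previous: str | None,
                 docs_received: int, min_docs: int = 3) -> str | None:
    """Return the identity the client maps this name to, or None."""
    unanimous = claims[0] if claims and all(c == claims[0] for c in claims) else None
    if previous is None:
        # New name: every binding document agrees, and we have enough documents.
        if unanimous is not None and docs_received >= min_docs:
            return unanimous
        return None
    # Known name: switch only if every binding document names another identity.
    if unanimous is not None and unanimous != previous:
        return unanimous
    return previous
```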

5.3. Notes on what we do now.

   THIS SECTION SHOULD BE FOLDED INTO THE EARLIER SECTIONS; THEY ARE WRONG;
   THIS IS RIGHT.

   All downloaded networkstatuses are discarded once they are 10 days old (by
   published date).

   Authdirs download each other's networkstatus every
   AUTHORITY_NS_CACHE_INTERVAL minutes (currently 10).

   Directory caches download authorities' networkstatus every
   NONAUTHORITY_NS_CACHE_INTERVAL minutes (currently 10).

   Clients always try to replace any networkstatus received more than
   NETWORKSTATUS_MAX_VALIDITY (currently 2 days) ago.  Also, when the most
   recently received networkstatus is more than
   NETWORKSTATUS_CLIENT_DL_INTERVAL (30 minutes) old, and we do not have any
   open directory connections fetching a networkstatus, clients try to
   download the networkstatus on their list after the most recently received
   networkstatus, skipping failed networkstatuses.  A networkstatus is
   "failed" if NETWORKSTATUS_N_ALLOWABLE_FAILURES (3) attempts in a row have
   all failed.
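   The client's choice of which networkstatuses to request can be sketched
   as below, using the constants named above.  The input shapes (receipt
   times and consecutive-failure counts keyed by authority) and the name
   `networkstatus_to_fetch` are hypothetical; the sketch applies the
   failure skip only to the second rule, as the text does.

```python
from __future__ import annotations

NETWORKSTATUS_MAX_VALIDITY = 2 * 24 * 3600
NETWORKSTATUS_CLIENT_DL_INTERVAL = 30 * 60
NETWORKSTATUS_N_ALLOWABLE_FAILURES = 3


def networkstatus_to_fetch(authorities: list[str], received: dict[str, float],
                           failures: dict[str, int], fetch_in_progress: bool,
                           now: float) -> list[str]:
    """Return the authorities whose networkstatus a client would request now."""
    def failed(a: str) -> bool:
        return failures.get(a, 0) >= NETWORKSTATUS_N_ALLOWABLE_FAILURES

    # Rule 1: replace anything received more than MAX_VALIDITY ago.
    wanted = [a for a in authorities
              if a in received and now - received[a] > NETWORKSTATUS_MAX_VALIDITY]
    # Rule 2: if the newest one is stale and nothing is in flight, fetch the
    # next authority on the list after it, skipping failed ones.
    if received and not fetch_in_progress:
        newest = max(received, key=received.get)
        if newest in authorities and now - received[newest] > NETWORKSTATUS_CLIENT_DL_INTERVAL:
            i = authorities.index(newest)
            for step in range(1, len(authorities) + 1):
                cand = authorities[(i + step) % len(authorities)]
                if not failed(cand):
                    if cand not in wanted:
                        wanted.append(cand)
                    break
    return wanted
```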

   We do not update router statuses if we have fewer than half of the
   networkstatuses.

   A networkstatus is "live" if it is the most recent we have received signed
   by a given trusted authority.

   A networkstatus is "recent" if it is "live" and:
       - it was received in the last DEFAULT_RUNNING_INTERVAL (currently 60
         minutes)
   OR  - it was one of the MIN_TO_INFLUENCE_RUNNING (3) most recently received
         networkstatuses.

   Authorities always believe their own opinion as to a router's status.  For
   other tors:
     - a router is valid if more than half of the live networkstatuses think
       it's valid.
     - a router is named if more than half of the live networkstatuses from
       naming authorities think it's named, and they all think it has the
       same name.
     - a router is running if more than half of the recent networkstatuses
       think it's running.
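   The "named" rule is the only one above that combines a majority with an
   agreement check, so it is sketched here on its own (an authority would
   skip it and use its own opinion).  Each dict stands for one live
   networkstatus from a naming authority and maps router identity to
   nickname for routers that status marks as Named; this shape and the name
   `named_as` are hypothetical.

```python
from __future__ import annotations


def named_as(router: str, naming_statuses: list[dict[str, str]]) -> str | None:
    """Return the nickname bound to `router`, or None if it is not named."""
    names = [status[router] for status in naming_statuses if router in status]
    # More than half must say "Named", and all of those must agree on the name.
    if 2 * len(names) > len(naming_statuses) and len(set(names)) == 1:
        return names[0]
    return None
```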

   Everyone downloads router descriptors as follows:

     - If any networkstatus lists a more recently published routerdesc with a
       different descriptor digest, and no more than
       MAX_ROUTERDESC_DOWNLOAD_FAILURES attempts to retrieve that routerdesc
       have failed, then that routerdesc is "downloadable".

     - Every DirFetchInterval, or whenever a request for routerdescs returns
       no routerdescs, we launch a set of requests for all downloadable
       routerdescs.  We divide the downloadable routerdescs into groups of no
       more than DL_PER_REQUEST, and send a request for each group to
       directory servers chosen independently.

     - We also launch a request as above when a request for routerdescs
       fails and we have no directory connections fetching routerdescs.
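   The "downloadable" test and the request splitting can be sketched as two
   small functions.  MAX_ROUTERDESC_DOWNLOAD_FAILURES and DL_PER_REQUEST
   are not given values above, so they are parameters here.  Treating a
   router for which we hold no descriptor at all as downloadable is an
   assumption, as are the names `Listing`, `downloadable`, and
   `plan_requests`.

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass
class Listing:
    """One router as listed by some networkstatus (hypothetical shape)."""
    identity: str
    digest: str       # descriptor digest
    published: float  # descriptor publication time


def downloadable(listings: list[Listing], have: dict[str, tuple[str, float]],
                 failures: dict[str, int], max_failures: int) -> list[str]:
    """Digests worth requesting; `have` maps identity -> (digest, published)."""
    wanted = set()
    for entry in listings:
        held = have.get(entry.identity)
        newer = held is None or (entry.published > held[1] and entry.digest != held[0])
        if newer and failures.get(entry.digest, 0) <= max_failures:
            wanted.add(entry.digest)
    return sorted(wanted)


def plan_requests(digests: list[str], dirservers: list[str], per_request: int,
                  rng: random.Random) -> list[tuple[str, list[str]]]:
    """Split into groups of at most per_request; pick a server per group."""
    groups = [digests[i:i + per_request] for i in range(0, len(digests), per_request)]
    return [(rng.choice(dirservers), group) for group in groups]
```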

   TODO Specify here:
    - When to 0-out failure count for networkstatus?

    - Drop fallback to download-all.  Also, always split download.

    - For versions: if you're listed by more than half of live versioning
      networkstatuses, good.  If less than half of networkstatuses are live,
      don't do anything.  If half are live, and half or less of the
      versioning ones list you, warn.  Only warn once every 24 hours.

    - For names: warn if an unnamed router is specified by nickname.
      Rate-limit these warnings.
      - Also, don't believe N->K if another naming authdir says N->K'.
      - Revise naming rule: N->K is true if any naming directory says N->K,
        and no other naming directory says N->K' or N'->K.

    - Minimum info to build circuits.

    - Revise: always split requests when we have too little info to build
      circuits.

    - Describe when router is "out of date".  (Any dirserver says so.)

    - Change rule from "do not launch new connections when one exists" to
      "do not request any fingerprint that we're currently requesting."

    - Launch new connections every minute, plus whenever a download fails.
    - Reset routerdesc failure count after 60 minutes, or when the
      network comes back on after absence.
    - Make "I didn't get the one I thought was most recent" a failure.
      - Retry these every 5 minutes if you're a client.
      - Mirrors should retry these harder and more often.
    - If we have a routerdesc for Bob, and he says, "I'm 0.1.0.x", don't
      fetch a new one if it was published in the last 2 hours. (??)

    


6. Remaining issues

   Client-knowledge partitioning is worrisome.  Most versions of this don't
   seem to be worse than the Danezis-Murdoch tracing attack, since an
   attacker can't do more than deduce probable exits from entries (or vice
   versa).  But what about when the client connects to A and B but in a
   different order?  How bad can it be partitioned based on its knowledge?


================================================================================
Everything below this line is obsolete.
--------------------------------------------------------------------------------

                      Tor network discovery protocol

0. Scope

This document proposes a way of doing more distributed network discovery
while maintaining some amount of admission control. We don't recommend
you implement this as-is; it needs more discussion.

Terminology:
  - Client: The Tor component that chooses paths.
  - Server: A relay node that passes traffic along.

1. Goals.

We want more decentralized discovery for network topology and status.
In particular:

1a. We want to let clients learn about new servers from anywhere
    and build circuits through them if they wish. This means that
    Tor nodes need to be able to Extend to nodes they don't already
    know about.

1b. We want to let servers limit the addresses and ports they're
    willing to extend to. This is necessary e.g. for middleman nodes
    who have jerks trying to extend from them to badmafia.com:80 all
    day long and it's drawing attention.

1b'. While we're at it, we also want to handle servers that *can't*
    extend to some addresses/ports, e.g. because they're behind NAT or
    otherwise firewalled. (See section 5 below.)

1c. We want to provide a robust (available) and not-too-centralized
    mechanism for tracking network status (which nodes are up and working)
    and admission (which nodes are "recommended" for certain uses).

2. Assumptions.

2a. People get the code from us, and they trust us (or our gpg keys, or
    something down the trust chain that's equivalent).

2b. Even if the software allows humans to change the client configuration,
    most of them will use the default that's provided, so we should
    provide one that is the right balance of robust and safe. That is,
    we need to hard-code enough "first introduction" locations that new
    clients will always have an available way to get connected.

2c. Assume that the current "ask them to email us and see if it seems
    suspiciously related to previous emails" approach will not catch
    the strong Sybil attackers. Therefore, assume the Sybil attackers
    we do want to defend against can produce only a limited number of
    not-obviously-on-the-same-subnet nodes.

2d. Roger has only a limited amount of time for approving nodes; shouldn't
    be the time bottleneck anyway; and is doing a poor job at keeping
    out some adversaries.

2e. Some people would be willing to offer servers but will be put off
    by the need to send us mail and identify themselves.
2e'. Some evil people will avoid doing evil things based on the perception
    (however true or false) that there are humans monitoring the network
    and discouraging evil behavior.
2e''. Some people will trust the network, and the code, more if they
    have the perception that there are trustworthy humans guiding the
    deployed network.

2f. We can trust servers to accurately report their characteristics
    (uptime, capacity, exit policies, etc), as long as we have some
    mechanism for notifying clients when we notice that they're lying.

2g. There exists a "main" core Internet in which most locations can access
    most locations. We'll focus on it (first).

3. Some notes on how to achieve.

Piece one: (required)

  We ship with N (e.g. 20) directory server locations and fingerprints.

  Directory servers serve signed network-status pages, listing their
  opinions of network status and which routers are good (see 4a below).

  Dirservers collect and provide server descriptors as well. These don't
  need to be signed by the dirservers, since they're self-certifying
  and timestamped.

  (In theory the dirservers don't need to be the ones serving the
  descriptors, but in practice the dirservers would need to point people
  at the place that does, so for simplicity let's assume that they do.)

  Clients then get network-status pages from a threshold of dirservers,
  fetch enough of the corresponding server descriptors to make them happy,
  and proceed as now.

Piece two: (optional)

  We ship with S (e.g. 3) seed keys (trust anchors), and ship with
  signed timestamped certs for each dirserver. Dirservers also serve a
  list of certs, maybe including a "publish all certs since time foo"
  functionality. If at least two seeds agree about something, then it
  is so.

  Now dirservers can be added, and revoked, without requiring users to
  upgrade to a new version. If we only ship with dirserver locations
  and not fingerprints, it also means that dirservers can rotate their
  signing keys transparently.

  But, keeping track of the seed keys becomes a critical security issue.
  And rotating them in a backward-compatible way adds complexity. Also,
  dirserver locations must be at least somewhat static, since each lost
  dirserver degrades reachability for old clients. So as the dirserver
  list rolls over we have no choice but to put out new versions.


Piece three: (optional)

  Notice that this doesn't preclude other approaches to discovering
  different concurrent Tor networks. For example, a Tor network inside
  China could ship Tor with a different torrc and poof, they're using
  a different set of dirservers. Some smarter clients could be made to
  learn about both networks, and be told which nodes bridge the networks.
  ...

4. Unresolved issues.

4a. How do the dirservers decide whether to recommend a server? We
    could have them do it based on contact from the human, but by
    assumptions 2c and 2d above, that's going to be less effective, and
    more of a hassle, as we scale up. Thus I propose that they simply
    do some basic automatic measuring themselves, starting with the
    current "are they connected to me" measurement, and that's all
    that is done.

    We could blacklist as we notice evil servers, but then we're in
    the same boat all the irc networks are in. We could whitelist as we
    notice new servers, and stop whitelisting (maybe rolling back a bit)
    once an attack is in progress. If we assume humans aren't particularly
    good at this anyway, we could just do automated delayed whitelisting,
    and have a "you're under attack" switch the human can enable for a
    while to start acting more conservatively.

    Once upon a time we collected contact info for servers, which was
    mainly used to remind people that their servers are down and could
    they please restart. Now that we have a critical mass of servers,
    I've stopped doing that reminding. So contact info is less important.

4b. What do we do about recommended-versions? Do we need a threshold of
    dirservers to claim that your version is obsolete before you believe
    them? Or do we make it have less effect -- e.g. print a warning but
    never actually quit? Coordinating all the humans to upgrade their
    recommended-version strings at once seems bad. Maybe if we have
    seeds, the seeds can sign a recommended-version and upload it to
    the dirservers.

4c. What does it mean to bind a nickname to a key? What if each dirserver
    does it differently, so one nickname corresponds to several keys?
    Maybe the solution is that nickname<=>key bindings should be
    individually configured by clients in their torrc (if they want to
    refer to nicknames in their torrc), and we stop thinking of nicknames
    as globally unique.

4d. What new features need to be added to server descriptors so they
    remain compact yet support new functionality? Section 5 is a start
    of discussion of one answer to this.



5. Regarding "Blossom: an unstructured overlay network for end-to-end
connectivity."

SECTION 5A: Blossom Architecture

Define "transport domain" as a set of nodes who can all mutually name each
other directly, using transport-layer (e.g. HOST:PORT) naming.

Define "clique" as a set of nodes who can all mutually contact each other directly,
using transport-layer (e.g. HOST:PORT) naming.

Neither transport domains nor cliques form a partition of the set of all nodes.
Just as cliques may overlap in theoretical graphs, transport domains and
cliques may overlap in the context of Blossom.

In this section we address possible solutions to the problem of how to allow
Tor routers in different transport domains to communicate.

First, we presume that for every interface between transport domains A and B,
one Tor router T_A exists in transport domain A, one Tor router T_B exists in
transport domain B, and (without loss of generality) T_A can open a persistent
connection to T_B.  Any Tor traffic between the two routers will occur over
this connection, which effectively renders the routers equal partners in
bridging between the two transport domains.  We refer to the established link
between two transport domains as a "bridge" (we use this term because there is
no serious possibility of confusion with the notion of a layer 2 bridge).

Next, suppose that the universe consists of transport domains connected by
persistent connections in this manner.  An individual router can open multiple
connections to routers within the same foreign transport domain, and it can
establish separate connections to routers within multiple foreign transport
domains.

As in regular Tor, each Blossom router pushes its descriptor to directory
servers.  These directory servers can be within the same transport domain, but
they need not be.  The trick is that if a directory server is in another
transport domain, then that directory server must know through which Tor
routers to send messages destined for the Tor router in question.

Blossom routers can advertise themselves to other transport domains in two
ways:

(1) Directly push the descriptor to a directory server in the other transport
domain.  This probably works particularly well if the other transport domain is
"the Internet", or if there are hard-coded directory servers in "the Internet".
The router has the responsibility to inform the directory server about which
routers can be used to reach it.

(2) Push the descriptor to a directory server in the same transport domain.
This is the easiest solution for the router, but it relies upon the existence
of a directory server in the same transport domain that is capable of
communicating with directory servers in the remote transport domain.  In order
for this to work, some individual Tor routers must have published their
descriptors in remote transport domains (i.e. followed the first option) in
order to provide a link by which directory servers can communicate
bidirectionally.

If all directory servers are within the same transport domain, then approach
(1) is sufficient: routers can exist within multiple transport domains, and as
long as the network of transport domains is fully connected by bridges, any
router will be able to access any other router in a foreign transport domain
simply by extending along the path specified by the directory server.  However,
we want the system to be truly decentralized, which means not electing any
particular transport domain to be the master domain in which entries are
published.

This is the explanation for (2): in order for a directory server to share
information with a directory server in a foreign transport domain to which it
cannot speak directly, it must use Tor, which means referring to the other
directory server by using a router in the foreign transport domain.  However,
in order to use Tor, it must be able to reach that router, which means that a
descriptor for that router must exist in its table, along with a means of
reaching it.  Therefore, in order for a mutual exchange of information between
routers in transport domain A and those in transport domain B to be possible,
when routers in transport domain A cannot establish direct connections with
routers in transport domain B, then some router in transport domain B must have
pushed its descriptor to a directory server in transport domain A, so that the
directory server in transport domain A can use that router to reach the
directory server in transport domain B.

Descriptors for Blossom routers are read-only, as for regular Tor routers, so
directory servers cannot modify them.  However, Tor directory servers also
publish a "network-status" page that provides information about which nodes are
up and which are not.  Directory servers could provide an additional field for
Blossom nodes: for each Blossom node, the directory server specifies a set of
one or more paths through the overlay (i.e. ordered lists of router names/IDs)
to a router in a foreign transport domain.

A new router publishing to a directory server in a foreign transport domain
should include a list of routers.  This list should be either:

a. ...a list of routers to which the router has persistent connections, or, if
the new router does not have any persistent connections,

b. ...a (not necessarily exhaustive) list of fellow routers that are in the
same transport domain.

The directory server will be able to use this information to derive a set of
paths to the new router, as follows.  If the new router used approach (a), then
the directory server will define the set of paths to the new router as the
union of the sets of paths to the routers on the list, with the listed router
itself appended as the last hop of each of its paths.  If the new router used
approach (b), then the directory server will define the paths to the new router
as the union of the sets of paths to the routers on the list.  The directory
server will then insert the newly defined paths into the field in the
network-status page for the new router.
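The derivation rule above can be sketched as follows.  This is a minimal
illustration under stated assumptions: the shape of the "known" mapping (router
name to the set of paths already listed for it, with a directly reachable
router mapped to the empty path) and the name derive_paths() are hypothetical,
and "the last hop" in approach (a) is read as the listed router itself.

```python
from typing import Dict, List, Set, Tuple

Path = Tuple[str, ...]  # ordered router names/IDs through the overlay


def derive_paths(known: Dict[str, Set[Path]], listed: List[str],
                 persistent: bool) -> Set[Path]:
    """Derive the set of paths to a newly published Blossom router.

    known:      paths the dirserver already lists for each router
                (a router reachable with no intermediate hops maps to {()}).
    listed:     routers named in the new router's list.
    persistent: True for approach (a), False for approach (b).
    """
    paths: Set[Path] = set()
    for router in listed:
        for path in known.get(router, set()):
            if persistent:
                # (a) The listed router holds a persistent connection to the
                # new router, so it is appended as the final hop.
                paths.add(path + (router,))
            else:
                # (b) The listed router shares the new router's transport
                # domain, so a path that reaches it reaches the new router.
                paths.add(path)
    return paths


known = {"r1": {("x", "y")}, "r2": {("z",)}}
print(sorted(derive_paths(known, ["r1", "r2"], persistent=True)))
# [('x', 'y', 'r1'), ('z', 'r2')]
print(sorted(derive_paths(known, ["r1", "r2"], persistent=False)))
# [('x', 'y'), ('z',)]
```

Listed routers for which the directory server knows no path simply contribute
nothing, so the derived set can be empty.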

When confronted with a choice among multiple paths to the same router, Blossom
nodes may use a route selection protocol similar in design to that used by BGP
in order to choose the best one.  This may be a simple distance-vector procedure
that only takes path length into account, or a more complex one that avoids
loops, caches results, etc.
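The simplest variant mentioned above, selection by path length with loop
avoidance, might look like the sketch below.  It is not BGP and not a
specified Blossom procedure; the function name select_path() and the
lexicographic tie-break are assumptions chosen so that nodes given the same
candidates make the same choice.

```python
from typing import Iterable, Optional, Tuple

Path = Tuple[str, ...]


def select_path(candidates: Iterable[Path]) -> Optional[Path]:
    """Return the shortest loop-free candidate path, or None if there is none.

    Ties on length are broken by comparing router names, so every node
    given the same candidates makes the same choice.
    """
    # A path that names the same router twice contains a loop; skip it.
    loop_free = [p for p in candidates if len(set(p)) == len(p)]
    if not loop_free:
        return None
    return min(loop_free, key=lambda p: (len(p), p))


print(select_path([("a", "b", "c"), ("f", "f"), ("e", "d")]))  # ('e', 'd')
```

A more elaborate policy (caching, bandwidth, or reliability weighting) would
only change the filter and the sort key.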

If a .exit name is not provided, then a path will be chosen whose nodes are all
among the set of nodes provided by the directory server that are believed to be
in the same transport domain (i.e. no explicit path).  Thus, there should be no
surprises for the client.  All routers should define their exit policies
carefully, with the knowledge that clients from potentially any transport
domain could access anything that is not explicitly restricted.
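The default behaviour when no .exit name is given can be illustrated with a
small sketch.  It assumes a hypothetical mapping from router name to the
transport domain the directory server believes it is in, a path length of
three, and uniform random choice; real Tor path selection weights routers by
bandwidth and applies further constraints, so choose_local_path() is only an
illustration of the "same transport domain" restriction.

```python
import random
from typing import Dict, List, Optional


def choose_local_path(node_domains: Dict[str, str], client_domain: str,
                      length: int = 3,
                      rng: Optional[random.Random] = None) -> List[str]:
    """Pick `length` distinct routers, all believed to be in client_domain."""
    # Only routers the directory server places in the client's own transport
    # domain are eligible, so no explicit cross-domain path is involved.
    local = sorted(n for n, d in node_domains.items() if d == client_domain)
    if len(local) < length:
        raise ValueError("not enough routers in the client's transport domain")
    return (rng or random.Random()).sample(local, length)


nodes = {"a": "D1", "b": "D1", "c": "D1", "d": "D1", "x": "D2"}
print(choose_local_path(nodes, "D1", rng=random.Random(7)))
```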

SECTION 5B: Tor+Blossom desiderata

The interests of Blossom would be best served by implementing the following
modifications to Tor:

I. CLIENTS

Objectives: Ultimately, we want Blossom requests to be indistinguishable in
format from non-Blossom .exit requests, i.e. hostname.forwarder.exit.
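Because Blossom requests would share the hostname.forwarder.exit format, a
client or controller needs to split such an address into its hostname and
forwarder parts.  A minimal sketch follows; parse_exit_request() is a
hypothetical helper, not a Tor function.

```python
from typing import Optional, Tuple


def parse_exit_request(address: str) -> Optional[Tuple[str, str]]:
    """Split 'hostname.forwarder.exit' into (hostname, forwarder).

    Returns None if the address is not a .exit request.
    """
    labels = address.rstrip(".").split(".")
    # Need at least one hostname label, the forwarder, and the "exit" suffix.
    if len(labels) < 3 or labels[-1].lower() != "exit":
        return None
    return ".".join(labels[:-2]), labels[-2]


print(parse_exit_request("www.example.com.blossomrouter.exit"))
# ('www.example.com', 'blossomrouter')
```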

Proposal: Blossom is a process that manipulates Tor, so it should be
implemented as a Tor controller, extending control-spec.txt.  For each request,
Tor uses the control protocol to ask the Blossom process whether it (the
Blossom process) wants to build or assign a particular circuit to service the
request.  Blossom chooses one of the following responses:

a. (Blossom exit node, circuit cached) "use this circuit" -- provides a circuit
ID

b. (Blossom exit node, circuit not cached) "I will build one" -- provides a
list of routers, gets a circuit ID.

c. (Regular (non-Blossom) exit node) "No, do it yourself" -- provides nothing.
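The three possible answers can be modelled as a small decision function.  This
sketch does not use any real control-protocol command; the response classes,
the blossom_decide() name, and its inputs (a hypothetical map from each Blossom
exit node to a path of routers leading to it, and a cache of circuit IDs) are
all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class UseCircuit:    # (a) Blossom exit node, circuit cached
    circuit_id: int


@dataclass
class WillBuild:     # (b) Blossom exit node, circuit not cached; Tor then
    routers: List[str]  # builds over these routers and returns a circuit ID


@dataclass
class DoItYourself:  # (c) regular exit node: Tor handles the request
    pass


Response = Union[UseCircuit, WillBuild, DoItYourself]


def blossom_decide(forwarder: str, blossom_paths: Dict[str, List[str]],
                   circuit_cache: Dict[str, int]) -> Response:
    """Decide how to service a request whose .exit forwarder is `forwarder`."""
    if forwarder not in blossom_paths:
        return DoItYourself()
    if forwarder in circuit_cache:
        return UseCircuit(circuit_cache[forwarder])
    # Route through the known overlay path, ending at the Blossom exit itself.
    return WillBuild(blossom_paths[forwarder] + [forwarder])


print(blossom_decide("bl", {"bl": ["p1", "p2"]}, {}))
```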

II. ROUTERS

Objectives: Blossom routers are like regular Tor routers, except that Blossom
routers need these features as well:

a. the ability to open persistent connections,

b. the ability to know whether they should use a persistent connection to reach
another router,

c. the ability to define a set of routers to which to establish persistent
connections, as readable from a configuration file, and

d. the ability to tell a directory server that (1) it is Blossom-enabled, and
(2) it can be reached by some set of routers to which it explicitly establishes
persistent connections.

Proposal: Address the aforementioned points as follows.

a. We need the ability to open a specified number of persistent connections.
This can be accomplished by implementing generic should_i_close_this_conn() and
which_conns_should_i_try_to_open_even_when_i_dont_need_them() functions.

b. The Tor design already supports this, but we must be sure to establish the
persistent connections explicitly, re-establish them when they are lost, and
not close them unnecessarily.

c. We must modify Tor to add a new configuration option, allowing either (a)
explicit specification of the set of routers to which to establish persistent
connections, or (b) a random choice of some nodes to which to establish
persistent connections, chosen from the set of nodes local to the transport
domain of the specified directory server (for example).
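The two generic functions named in (a), together with the configuration choice
in (c), might be sketched as below.  Only the two function names come from the
text above; their parameters, the idle timeout of 300 seconds, and
choose_persistent_peers() are hypothetical, and real Tor connection management
is considerably more involved.

```python
import random
from typing import List, Optional, Set


def should_i_close_this_conn(peer: str, idle_seconds: float, has_circuits: bool,
                             persistent_peers: Set[str],
                             idle_timeout: float = 300.0) -> bool:
    # Persistent connections are never closed merely for being idle.
    if peer in persistent_peers:
        return False
    # Other connections close once unused and idle past the timeout.
    return not has_circuits and idle_seconds >= idle_timeout


def which_conns_should_i_try_to_open_even_when_i_dont_need_them(
        persistent_peers: Set[str], open_peers: Set[str]) -> List[str]:
    # (Re-)establish every persistent connection that is currently missing.
    return sorted(persistent_peers - open_peers)


def choose_persistent_peers(explicit: Optional[List[str]],
                            local_candidates: List[str], count: int,
                            rng: random.Random) -> Set[str]:
    # (c)(a): an explicit list from the configuration file wins;
    # (c)(b): otherwise pick `count` routers local to the chosen domain.
    if explicit:
        return set(explicit)
    k = min(count, len(local_candidates))
    return set(rng.sample(local_candidates, k))
```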

III. DIRSERVERS

Objective: Blossom directory servers may provide extra fields in their
network-status pages.  Blossom directory servers may communicate with Blossom
clients/routers in nonstandard ways in addition to standard ways.

Proposal: Geoff should be able to implement a directory server according to the
Tor specification (dir-spec.txt).