diff options
Diffstat (limited to 'proposals/ideas/old/xxx-exit-scanning-outline.txt')
-rw-r--r-- | proposals/ideas/old/xxx-exit-scanning-outline.txt | 44 |
1 files changed, 44 insertions, 0 deletions
diff --git a/proposals/ideas/old/xxx-exit-scanning-outline.txt b/proposals/ideas/old/xxx-exit-scanning-outline.txt new file mode 100644 index 0000000..d840944 --- /dev/null +++ b/proposals/ideas/old/xxx-exit-scanning-outline.txt @@ -0,0 +1,44 @@ +1. Scanning process + A. Non-HTML/JS HTTP mime types compared via SHA1 hash + B. Dynamic HTTP content filtered at 4 levels: + 1. IP change+Tor cookie utilization + - Tor cookies replayed with new IP in case of changes + 2. HTML Tag+Attribute+JS comparison + - Comparisons made based only on "relevant" HTML tags + and attributes + 3. HTML Tag+Attribute+JS diffing + - Tags, attributes and JS AST nodes that change during + Non-Tor fetches pruned from comparison + 4. URLS with > N% of node failures removed + - results purged from filesystem at end of scan loop + C. SSL scanning handles some forms of dynamic certs + 1. Catalogs certs for all IPs resolved locally + by getaddrinfo over the duration of the scan. + - Updated each test. + 2. If the domain presents a new cert for each IP, this + is noted on the failure result for the node + 3. If the same IP presents two different certs locally, + the cert list is first refreshed, and if it happens + again, discarded + 4. A N% node failure filter also applies + D. Scanner can be restarted from any point in the event + of scanner or system crashes, or graceful shutdown. + - Results+scan state pickled to filesystem continuously +2. Cron job checks results periodically for reporting + A. Divide failures into three types of BadExit based on type + and frequency over time and incident rate + B. write reject lines to approved-routers for those three types: + 1. ID Hex based (for misconfig/network problems easily fixed) + 2. IP based (for content modification) + 3. IP+mask based (for continuous/egregious content modification) + C. Emails results to tor-scanners@freehaven.net +3. Human Review and Appeal + A. ID Hex-based BadExit is meant to be possible to removed easily + without needing to beg us. + - Should this behavior be encouraged? + B. Optionally can reserve IP based badexits for human review + 1. Results are encapsulated fully on the filesystem and can be + reviewed without network access + 2. Soat has --rescan to rescan failed nodes from a data directory + - New set of URLs used + |