Filename: 159-exit-scanning.txt
Title: Exit Scanning
Author: Mike Perry
Created: 13-Feb-2009
Status: Informational

Overview:

  This proposal describes the implementation and integration of an automated exit node scanner for scanning the Tor network for malicious, misconfigured, firewalled or filtered nodes.

Motivation:

  Tor exit nodes can be run by anyone with an Internet connection. Often, these users aren't fully aware of the limitations of their networking setup. Content filters, antivirus software, advertisements injected by their service providers, malicious upstream providers, and the resource limitations of their computer or networking equipment have all been observed on the current Tor network.

  It is also possible that some nodes exist purely for malicious purposes. In the past, there have been intermittent instances of nodes spoofing SSH keys, as well as nodes being used for purposes of plaintext surveillance. While it is not realistic to expect to catch extremely targeted or completely passive malicious adversaries, the goal is to prevent adversaries from deploying dragnet attacks against large segments of the Tor userbase.

Scanning methodology:

  The first scans to be implemented are HTTP, HTML, Javascript, and SSL scans.

  The HTTP scan scrapes Google for URLs of common filetypes such as exe, msi, doc, and dmg. It then fetches these URLs through Non-Tor and Tor, and compares the SHA1 hashes of the resulting content.

  The SSL scan downloads certificates for all IPs a domain will locally resolve to and compares these certificates to those seen over Tor. The scanner notes in the results for each scan whether a domain had rotated certificates locally.

  The HTML scan checks HTML, Javascript, and plugin content for modifications.

  Because of the dynamic nature of most of the web, the scanner has a number of built-in mechanisms, applied when a change is noticed between Tor and Non-Tor, to filter out false positives. All tests also share a URL-based false positive filter that automatically removes results retroactively if the number of failures exceeds a certain percentage of the nodes tested with the URL.
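  The core comparison performed by the HTTP scan can be sketched in a few lines of Python. The fragment below is illustrative only and is not part of this specification: it assumes a local Tor client listening on its default SOCKS port (127.0.0.1:9050) and the third-party "requests" library installed with SOCKS support (requests[socks]). The actual scanner also selects which exit node to test via the control port and applies the false positive filters described above; both are omitted here.

    # Minimal sketch: compare the SHA1 digest of a URL fetched directly
    # with the digest of the same URL fetched through Tor.  Assumes Tor's
    # SOCKS proxy is listening at the default 127.0.0.1:9050.
    import hashlib
    import requests

    # "socks5h" makes DNS resolution happen through Tor as well.
    TOR_PROXIES = {
        "http": "socks5h://127.0.0.1:9050",
        "https": "socks5h://127.0.0.1:9050",
    }

    def sha1_of_url(url, proxies=None):
        """Fetch a URL and return the SHA1 hex digest of its body."""
        resp = requests.get(url, proxies=proxies, timeout=60)
        resp.raise_for_status()
        return hashlib.sha1(resp.content).hexdigest()

    def url_matches_over_tor(url):
        """Return True if the content seen over Tor matches a direct fetch.

        A mismatch is only a candidate failure: the real scanner still
        applies its dynamic-content and URL-based false positive filters,
        and records which exit node was in use, before flagging anything.
        """
        return sha1_of_url(url) == sha1_of_url(url, proxies=TOR_PROXIES)

    if __name__ == "__main__":
        test_url = "https://example.com/sample.exe"  # placeholder URL only
        print("match" if url_matches_over_tor(test_url) else "MISMATCH")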
Deployment Stages:

  To avoid instances where bugs cause us to mark exit nodes as BadExit improperly, it is proposed that we begin use of the scanner in stages:

  1. Manual Review: In the first stage, basic scans will be run by a small number of people while we stabilize the scanner. The scanner has the ability to resume crashed scans, and to rescan nodes that fail various tests.

  2. Human Review: In the second stage, results will be automatically mailed to an email list of interested parties for review. We will also begin classifying failure types into three to four different severity levels, based on both the reliability of the test and the nature of the failure.

  3. Automatic BadExit Marking: In the final stage, the scanner will begin marking exits, depending on the failure severity level, in one of three different ways: by node idhex, by node IP, or by node IP mask. A potential fourth, less severe category of results may still be delivered via email only for review.

  BadExit markings will be delivered in batches upon completion of whole-network scans, so that the final false positive filter has an opportunity to filter out URLs that exhibit dynamic content beyond what we can filter.

Specification of Exit Marking:

  Technically, BadExit could be marked via SETCONF AuthDirBadExit over the control port, but this would give the scanner full access to the directory authority's configuration and operation. The approved-routers file could also be used, but currently it only supports fingerprints, and it also contains other data unrelated to exit scanning that would be difficult to coordinate.

  Instead, we propose a new badexit-routers file with two keywords:

     BadExitNet 1*[exitpattern from 2.3 in dir-spec.txt]
     BadExitFP 1*[hexdigest from 2.3 in dir-spec.txt]

  (An illustrative example appears at the end of this proposal.)

  BadExitNet lines would follow the codepaths used by AuthDirBadExit to set authdir_badexit_policy, and BadExitFP lines would follow the codepaths used for approved-routers' !badexit lines. The scanner would have the exclusive ability to write, append to, rewrite, and modify this file. Prior to building a new consensus vote, a participating Tor authority would read in a fresh copy.

Security Implications:

  Aside from evading the scanner's detection, there are two additional high-level security considerations:

  1. Ensure nodes cannot be marked BadExit by an adversary at will.

     It is possible that individual website owners will be able to target certain Tor nodes, but once they attempt to fail more than the URL filter percentage of the exits, their sites will be automatically discarded. Failing specific nodes is possible, but scanned results are fully reproducible, and BadExits should be rare enough that humans are never fully removed from the loop. State (cookies, cache, etc.) does not otherwise persist in the scanner between exit nodes, so one exit node cannot bias the results of a later one.

  2. Ensure that scanner compromise does not yield authority compromise.

     Having a separate file that is under the exclusive control of the scanner allows us to heavily isolate the scanner from the Tor authority, potentially even running them on separate machines.
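  As an illustration of the badexit-routers format given under "Specification of Exit Marking" above, such a file might contain entries like the following. Both the network pattern and the fingerprint below are placeholders invented for this example rather than real nodes:

     BadExitNet 10.0.0.0/8:*
     BadExitFP 0123456789ABCDEF0123456789ABCDEF01234567

  A BadExitNet line of this form would cause a participating authority to vote BadExit for any exit whose address matches the given pattern, while a BadExitFP line targets a single router by its identity fingerprint.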