Diffstat (limited to 'doc/HACKING')
 doc/HACKING/CircuitPaddingQuickStart.md |  22
 doc/HACKING/CodeStructure.md            | 123
 doc/HACKING/CodingStandards.md          | 129
 doc/HACKING/CodingStandardsRust.md      | 262
 doc/HACKING/Fuzzing.md                  |  38
 doc/HACKING/GettingStarted.md           |   8
 doc/HACKING/GettingStartedRust.md       |  40
 doc/HACKING/HelpfulTools.md             | 130
 doc/HACKING/Module.md                   |   6
 doc/HACKING/README.1st.md               |   4
 doc/HACKING/ReleasingTor.md             |  15
 doc/HACKING/Tracing.md                  |  91
 doc/HACKING/WritingTests.md             | 160
 doc/HACKING/android/Simpleperf.md       |   8
 doc/HACKING/tracing/EventsCircuit.md    | 139
 doc/HACKING/tracing/README.md           | 163
 16 files changed, 802 insertions(+), 536 deletions(-)
diff --git a/doc/HACKING/CircuitPaddingQuickStart.md b/doc/HACKING/CircuitPaddingQuickStart.md index 2780b5c6ea..2b01dae074 100644 --- a/doc/HACKING/CircuitPaddingQuickStart.md +++ b/doc/HACKING/CircuitPaddingQuickStart.md @@ -18,20 +18,20 @@ The quick and dirty plan is to: ## Clone and compile tor -```bash -git clone https://git.torproject.org/tor.git -cd tor -git checkout tor-0.4.1.5 +```console +$ git clone https://git.torproject.org/tor.git +$ cd tor +$ git checkout tor-0.4.1.5 ``` Above we use the tag for tor-0.4.1.5 where the circuit padding framework was released. Note that this version of the framework is missing many features and fixes that have since been merged to origin/master. If you need the newest framework features, you should use that master instead. -```bash -sh autogen.sh -./configure -make +```console +$ sh autogen.sh +$ ./configure +$ make ``` When you run `./configure` you'll be told of missing dependencies and packages to install on debian-based distributions. Important: if you plan to run `tor` on @@ -186,9 +186,9 @@ We also have to modify `circpad_machines_init()` in `circuitpadding.c` to register our machines: ```c - /* Register machines for the APE WF defense */ - circpad_machine_client_wf_ape(origin_padding_machines); - circpad_machine_relay_wf_ape(relay_padding_machines); +/* Register machines for the APE WF defense */ +circpad_machine_client_wf_ape(origin_padding_machines); +circpad_machine_relay_wf_ape(relay_padding_machines); ``` We run `make` to get a new `tor` binary and copy it to our local TB. diff --git a/doc/HACKING/CodeStructure.md b/doc/HACKING/CodeStructure.md deleted file mode 100644 index d387018f9b..0000000000 --- a/doc/HACKING/CodeStructure.md +++ /dev/null @@ -1,123 +0,0 @@ -# Code Structure - -TODO: revise this to talk about how things are, rather than how things -have changed. - -For quite a while now, the program *tor* has been built from source -code in just two directories: **src/common** and **src/or**. - -This has become more-or-less untenable, for a few reasons -- most -notably of which is that it has led our code to become more -spaghetti-ish than I can endorse with a clean conscience. - -So to fix that, we've gone and done a huge code movement in our git -master branch, which will land in a release once Tor `0.3.5.1-alpha` is -out. - -Here's what we did: - - * **src/common** has been turned into a set of static libraries. These -all live in the **src/lib/*** directories. The dependencies between -these libraries should have no cycles. The libraries are: - - - **arch** -- Headers to handle architectural differences - - **cc** -- headers to handle differences among compilers - - **compress** -- wraps zlib, zstd, lzma - - **container** -- high-level container types - - **crypt_ops** -- Cryptographic operations. Planning to split this into -a higher and lower level library - - **ctime** -- Operations that need to run in constant-time. (Properly, -data-invariant time) - - **defs** -- miscelaneous definitions needed throughout Tor. - - **encoding** -- transforming one data type into another, and various -data types into strings. - - **err** -- lowest-level error handling, in cases where we can't use -the logs because something that the logging system needs has broken. - - **evloop** -- Generic event-loop handling logic - - **fdio** -- Low-level IO wrapper functions for file descriptors. 
- - **fs** -- Operations on the filesystem - - **intmath** -- low-level integer math and misc bit-twiddling hacks - - **lock** -- low-level locking code - - **log** -- Tor's logging module. This library sits roughly halfway up -the library dependency diagram, since everything it depends on has to -be carefully crafted to *not* log. - - **malloc** -- Low-level wrappers for the platform memory allocation functions. - - **math** -- Higher-level mathematical functions, and floating-point math - - **memarea** -- An arena allocator - - **meminfo** -- Functions for querying the current process's memory -status and resources - - **net** -- Networking compatibility and convenience code - - **osinfo** -- Querying information about the operating system - - **process** -- Launching and querying the status of other processes - - **sandbox** -- Backend for the linux seccomp2 sandbox - - **smartlist_core** -- The lowest-level of the smartlist_t data type. -Separated from the rest of the containers library because the logging -subsystem depends on it. - - **string** -- Compatibility and convenience functions for manipulating -C strings. - - **term** -- Terminal-related functions (currently limited to a getpass -function). - - **testsupport** -- Macros for mocking, unit tests, etc. - - **thread** -- Higher-level thread compatibility code - - **time** -- Higher-level time management code, including format -conversions and monotonic time - - **tls** -- Our wrapper around our TLS library - - **trace** -- Formerly src/trace -- a generic event tracing API - - **wallclock** -- Low-level time code, used by the log module. - - * To ensure that the dependency graph in **src/common** remains under -control, there is a tool that you can run called `make -check-includes`. It verifies that each module in Tor only includes -the headers that it is permitted to include, using a per-directory -*.may_include* file. - - * The **src/or/or.h** header has been split into numerous smaller -headers. Notably, many important structures are now declared in a -header called *foo_st.h*, where "foo" is the name of the structure. - - * The **src/or** directory, which had most of Tor's code, had been split -up into several directories. This is still a work in progress: This -code has not itself been refactored, and its dependency graph is still -a tangled web. I hope we'll be working on that over the coming -releases, but it will take a while to do. - - - The new top-level source directories are: - - **src/core** -- Code necessary to actually perform or use onion routing. - - **src/feature** -- Code used only by some onion routing -configurations, or only for a special purpose. - - **src/app** -- Top-level code to run, invoke, and configure the -lower-level code - - - The new second-level source directories are: - - **src/core/crypto** -- High-level cryptographic protocols used in Tor - - **src/core/mainloop** -- Tor's event loop, connection-handling, and -traffic-routing code. 
- - **src/core/or** -- Parts related to handling onion routing itself - - **src/core/proto** -- support for encoding and decoding different -wire protocols - - **src/feature/api** -- Support for making Tor embeddable - - **src/feature/client** -- Functionality which only Tor clients need - - **src/feature/control** -- Controller implementation - - **src/feature/dirauth** -- Directory authority - - **src/feature/dircache** -- Directory cache - - **src/feature/dirclient** -- Directory client - - **src/feature/dircommon** -- Shared code between the other directory modules - - **src/feature/hibernate** -- Hibernating when Tor is out of bandwidth -or shutting down - - **src/feature/hs** -- v3 onion service implementation - - **src/feature/hs_common** -- shared code between both onion service -implementations - - **src/feature/nodelist** -- storing and accessing the list of relays on -the network. - - **src/feature/relay** -- code that only relay servers and exit servers need. - - **src/feature/rend** -- v2 onion service implementation - - **src/feature/stats** -- statistics and history - - **src/app/config** -- configuration and state for Tor - - **src/app/main** -- Top-level functions to invoke the rest or Tor. - - * The `tor` executable is now built in **src/app/tor** rather than **src/or/tor**. - - * There are more static libraries than before that you need to build -into your application if you want to embed Tor. Rather than -maintaining this list yourself, I recommend that you run `make -show-libs` to have Tor emit a list of what you need to link. diff --git a/doc/HACKING/CodingStandards.md b/doc/HACKING/CodingStandards.md index 150acf1852..cd3417d0b5 100644 --- a/doc/HACKING/CodingStandards.md +++ b/doc/HACKING/CodingStandards.md @@ -118,27 +118,36 @@ instance of the feature (--reverse). For example, for #30224, we wanted to know when the bridge-distribution-request feature was introduced into Tor: - $ git log -S bridge-distribution-request --reverse - commit ebab521525 - Author: Roger Dingledine <arma@torproject.org> - Date: Sun Nov 13 02:39:16 2016 -0500 - Add new BridgeDistribution config option +```console +$ git log -S bridge-distribution-request --reverse commit ebab521525 +Author: Roger Dingledine <arma@torproject.org> +Date: Sun Nov 13 02:39:16 2016 -0500 - $ git describe --contains ebab521525 - tor-0.3.2.3-alpha~15^2~4 + Add new BridgeDistribution config option + +$ git describe --contains ebab521525 +tor-0.3.2.3-alpha~15^2~4 +``` If you need to know all the Tor versions that contain a commit, use: - $ git tag --contains 9f2efd02a1 | sort -V - tor-0.2.5.16 - tor-0.2.8.17 - tor-0.2.9.14 - tor-0.2.9.15 - ... - tor-0.3.0.13 - tor-0.3.1.9 - tor-0.3.1.10 - ... + +```console +$ git tag --contains 9f2efd02a1 | sort -V +tor-0.2.5.16 +tor-0.2.8.17 +tor-0.2.9.14 +tor-0.2.9.15 +... +tor-0.3.0.13 +tor-0.3.1.9 +tor-0.3.1.10 +... +``` + +If a bug was introduced before the oldest currently supported release series +of Tor, and it's hard to track down where it was introduced, you may say +"bugfix on all supported versions of Tor." If at all possible, try to create the changes file in the same commit where you are making the change. Please give it a distinctive name that no other @@ -180,6 +189,14 @@ What needs a changes file? What does not need a changes file? * Bugfixes for code that hasn't shipped in any released version of Tor + * Any change to a file that is not distributed in the tarball. This + includes: + * Any change to our CI configuration that does not affect the distributed + source. 
+ * Any change to developer-only tools, unless those tools are distributed + in the tarball. + * Non-functional code movement. + * Identifier re-namings, comment edits, spelling fixes, and so on. Why use changes files instead of Git commit messages? @@ -438,8 +455,10 @@ use `tor_assert_nonfatal()` in place of `tor_assert()`. If you'd like to write a conditional that incorporates a nonfatal assertion, use the `BUG()` macro, as in: - if (BUG(ptr == NULL)) - return -1; +```c +if (BUG(ptr == NULL)) + return -1; +``` ## Allocator conventions @@ -451,33 +470,39 @@ Also, a type named `abc_t` should be freed by a function named `abc_free_()`. Don't call this `abc_free_()` function directly -- instead, wrap it in a macro called `abc_free()`, using the `FREE_AND_NULL` macro: - void abc_free_(abc_t *obj); - #define abc_free(obj) FREE_AND_NULL(abc_t, abc_free_, (obj)) +```c +void abc_free_(abc_t *obj); +#define abc_free(obj) FREE_AND_NULL(abc_t, abc_free_, (obj)) +``` This macro will free the underlying `abc_t` object, and will also set the object pointer to NULL. You should define all `abc_free_()` functions to accept NULL inputs: - void - abc_free_(abc_t *obj) - { - if (!obj) - return; - tor_free(obj->name); - thing_free(obj->thing); - tor_free(obj); - } +```c +void +abc_free_(abc_t *obj) +{ + if (!obj) + return; + tor_free(obj->name); + thing_free(obj->thing); + tor_free(obj); +} +``` If you need a free function that takes a `void *` argument (for example, to use it as a function callback), define it with a name like `abc_free_void()`: - static void - abc_free_void_(void *obj) - { - abc_free_(obj); - } +```c +static void +abc_free_void_(void *obj) +{ + abc_free_(obj); +} +``` When deallocating, don't say e.g. `if (x) tor_free(x)`. The convention is to have deallocators do nothing when NULL pointer is passed. @@ -488,24 +513,28 @@ Say what functions do as a series of one or more imperative sentences, as though you were telling somebody how to be the function. In other words, DO NOT say: - /** The strtol function parses a number. - * - * nptr -- the string to parse. It can include whitespace. - * endptr -- a string pointer to hold the first thing that is not part - * of the number, if present. - * base -- the numeric base. - * returns: the resulting number. - */ - long strtol(const char *nptr, char **nptr, int base); +```c +/** The strtol function parses a number. + * + * nptr -- the string to parse. It can include whitespace. + * endptr -- a string pointer to hold the first thing that is not part + * of the number, if present. + * base -- the numeric base. + * returns: the resulting number. + */ +long strtol(const char *nptr, char **nptr, int base); +``` Instead, please DO say: - /** Parse a number in radix <b>base</b> from the string <b>nptr</b>, - * and return the result. Skip all leading whitespace. If - * <b>endptr</b> is not NULL, set *<b>endptr</b> to the first character - * after the number parsed. - **/ - long strtol(const char *nptr, char **nptr, int base); +```c +/** Parse a number in radix <b>base</b> from the string <b>nptr</b>, + * and return the result. Skip all leading whitespace. If + * <b>endptr</b> is not NULL, set *<b>endptr</b> to the first character + * after the number parsed. 
+ **/ +long strtol(const char *nptr, char **nptr, int base); +``` Doxygen comments are the contract in our abstraction-by-contract world: if the functions that call your function rely on it doing something, then your diff --git a/doc/HACKING/CodingStandardsRust.md b/doc/HACKING/CodingStandardsRust.md index 36a0dcda2a..c821465173 100644 --- a/doc/HACKING/CodingStandardsRust.md +++ b/doc/HACKING/CodingStandardsRust.md @@ -22,20 +22,26 @@ For example, in a hypothetical `tor_addition` Rust module: In `src/rust/tor_addition/addition.rs`: - pub fn get_sum(a: i32, b: i32) -> i32 { - a + b - } +```rust +pub fn get_sum(a: i32, b: i32) -> i32 { + a + b +} +``` In `src/rust/tor_addition/lib.rs`: - pub use addition::*; +```rust +pub use addition::*; +``` In `src/rust/tor_addition/ffi.rs`: - #[no_mangle] - pub extern "C" fn tor_get_sum(a: c_int, b: c_int) -> c_int { - get_sum(a, b) - } +```rust +#[no_mangle] +pub extern "C" fn tor_get_sum(a: c_int, b: c_int) -> c_int { + get_sum(a, b) +} +``` If your Rust code must call out to parts of Tor's C code, you must declare the functions you are calling in the `external` crate, located @@ -129,16 +135,18 @@ crate. Unittests SHOULD go into their own module inside the module they are testing, e.g. in `src/rust/tor_addition/addition.rs` you should put: - #[cfg(test)] - mod test { - use super::*; +```rust +#[cfg(test)] +mod test { + use super::*; - #[test] - fn addition_with_zero() { - let sum: i32 = get_sum(5i32, 0i32); - assert_eq!(sum, 5); - } +#[test] + fn addition_with_zero() { + let sum: i32 = get_sum(5i32, 0i32); + assert_eq!(sum, 5); } +} +``` ## Benchmarking @@ -151,13 +159,17 @@ benchmarks in the following manner. If you wish to benchmark some of your Rust code, you MUST put the following in the `[features]` section of your crate's `Cargo.toml`: - [features] - bench = [] +```toml +[features] +bench = [] +``` Next, in your crate's `lib.rs` you MUST put: - #[cfg(all(test, feature = "bench"))] - extern crate test; +```rust +#[cfg(all(test, feature = "bench"))] +extern crate test; +``` This ensures that the external crate `test`, which contains utilities for basic benchmarks, is only used when running benchmarks via `cargo @@ -166,16 +178,18 @@ bench --features bench`. Finally, to write your benchmark code, in `src/rust/tor_addition/addition.rs` you SHOULD put: - #[cfg(all(test, features = "bench"))] - mod bench { - use test::Bencher; - use super::*; +```rust +#[cfg(all(test, features = "bench"))] +mod bench { + use test::Bencher; + use super::*; - #[bench] - fn addition_small_integers(b: &mut Bencher) { - b.iter(| | get_sum(5i32, 0i32)); - } +#[bench] + fn addition_small_integers(b: &mut Bencher) { + b.iter(| | get_sum(5i32, 0i32)); } +} +``` ## Fuzzing @@ -247,39 +261,47 @@ Here are some additional bits of advice and rules: potential error with the eel operator, `?` or another non panicking way. For example, consider a function which parses a string into an integer: - fn parse_port_number(config_string: &str) -> u16 { - u16::from_str_radix(config_string, 10).unwrap() - } + ```rust + fn parse_port_number(config_string: &str) -> u16 { + u16::from_str_radix(config_string, 10).unwrap() + } + ``` There are numerous ways this can fail, and the `unwrap()` will cause the whole program to byte the dust! 
Instead, either you SHOULD use `ok()` (or another equivalent function which will return an `Option` or a `Result`) and change the return type to be compatible: - fn parse_port_number(config_string: &str) -> Option<u16> { - u16::from_str_radix(config_string, 10).ok() - } + ```rust + fn parse_port_number(config_string: &str) -> Option<u16> { + u16::from_str_radix(config_string, 10).ok() + } + ``` or you SHOULD use `or()` (or another similar method): - fn parse_port_number(config_string: &str) -> Option<u16> { - u16::from_str_radix(config_string, 10).or(Err("Couldn't parse port into a u16") - } + ```rust + fn parse_port_number(config_string: &str) -> Option<u16> { + u16::from_str_radix(config_string, 10).or(Err("Couldn't parse port into a u16") + } + ``` Using methods like `or()` can be particularly handy when you must do something afterwards with the data, for example, if we wanted to guarantee that the port is high. Combining these methods with the eel operator (`?`) makes this even easier: - fn parse_port_number(config_string: &str) -> Result<u16, Err> { - let port = u16::from_str_radix(config_string, 10).or(Err("Couldn't parse port into a u16"))?; + ```rust + fn parse_port_number(config_string: &str) -> Result<u16, Err> { + let port = u16::from_str_radix(config_string, 10).or(Err("Couldn't parse port into a u16"))?; - if port > 1024 { - return Ok(port); - } else { - return Err("Low ports not allowed"); - } + if port > 1024 { + return Ok(port); + } else { + return Err("Low ports not allowed"); } + } + ``` 2. `unsafe` @@ -292,25 +314,29 @@ Here are some additional bits of advice and rules: When creating an FFI in Rust for C code to call, it is NOT REQUIRED to declare the entire function `unsafe`. For example, rather than doing: - #[no_mangle] - pub unsafe extern "C" fn increment_and_combine_numbers(mut numbers: [u8; 4]) -> u32 { - for number in &mut numbers { - *number += 1; - } - std::mem::transmute::<[u8; 4], u32>(numbers) + ```rust + #[no_mangle] + pub unsafe extern "C" fn increment_and_combine_numbers(mut numbers: [u8; 4]) -> u32 { + for number in &mut numbers { + *number += 1; } + std::mem::transmute::<[u8; 4], u32>(numbers) + } + ``` You SHOULD instead do: - #[no_mangle] - pub extern "C" fn increment_and_combine_numbers(mut numbers: [u8; 4]) -> u32 { - for index in 0..numbers.len() { - numbers[index] += 1; - } - unsafe { - std::mem::transmute::<[u8; 4], u32>(numbers) - } + ```rust + #[no_mangle] + pub extern "C" fn increment_and_combine_numbers(mut numbers: [u8; 4]) -> u32 { + for index in 0..numbers.len() { + numbers[index] += 1; } + unsafe { + std::mem::transmute::<[u8; 4], u32>(numbers) + } + } + ``` 3. Pass only C-compatible primitive types and bytes over the boundary @@ -385,45 +411,51 @@ Here are some additional bits of advice and rules: rather than using an untyped mapping between strings and integers like so: - use std::collections::HashMap; + ```rust + use std::collections::HashMap; - pub fn get_elements_with_over_9000_points(map: &HashMap<String, usize>) -> Vec<String> { - ... - } + pub fn get_elements_with_over_9000_points(map: &HashMap<String, usize>) -> Vec<String> { + ... 
+ } + ``` It would be safer to define a new type, such that some other usage of `HashMap<String, usize>` cannot be confused for this type: - pub struct DragonBallZPowers(pub HashMap<String, usize>); + ```rust + pub struct DragonBallZPowers(pub HashMap<String, usize>); - impl DragonBallZPowers { - pub fn over_nine_thousand<'a>(&'a self) -> Vec<&'a String> { - let mut powerful_enough: Vec<&'a String> = Vec::with_capacity(5); + impl DragonBallZPowers { + pub fn over_nine_thousand<'a>(&'a self) -> Vec<&'a String> { + let mut powerful_enough: Vec<&'a String> = Vec::with_capacity(5); - for (character, power) in &self.0 { - if *power > 9000 { - powerful_enough.push(character); - } - } - powerful_enough - } - } + for (character, power) in &self.0 { + if *power > 9000 { + powerful_enough.push(character); + } + } + powerful_enough + } + } + ``` Note the following code, which uses Rust's type aliasing, is valid but it does NOT meet the desired type safety goals: - pub type Power = usize; + ```rust + pub type Power = usize; - pub fn over_nine_thousand(power: &Power) -> bool { - if *power > 9000 { - return true; - } - false + pub fn over_nine_thousand(power: &Power) -> bool { + if *power > 9000 { + return true; } + false + } - // We can still do the following: - let his_power: usize = 9001; - over_nine_thousand(&his_power); + // We can still do the following: + let his_power: usize = 9001; + over_nine_thousand(&his_power); + ``` 7. Unsafe mucking around with lifetimes @@ -431,15 +463,17 @@ Here are some additional bits of advice and rules: family of types, individual lifetimes can be treated as types. For example, one can arbitrarily extend and shorten lifetime using `std::mem::transmute`: - struct R<'a>(&'a i32); + ```rust + struct R<'a>(&'a i32); - unsafe fn extend_lifetime<'b>(r: R<'b>) -> R<'static> { - std::mem::transmute::<R<'b>, R<'static>>(r) - } + unsafe fn extend_lifetime<'b>(r: R<'b>) -> R<'static> { + std::mem::transmute::<R<'b>, R<'static>>(r) + } - unsafe fn shorten_invariant_lifetime<'b, 'c>(r: &'b mut R<'static>) -> &'b mut R<'c> { - std::mem::transmute::<&'b mut R<'static>, &'b mut R<'c>>(r) - } + unsafe fn shorten_invariant_lifetime<'b, 'c>(r: &'b mut R<'static>) -> &'b mut R<'c> { + std::mem::transmute::<&'b mut R<'static>, &'b mut R<'c>>(r) + } + ``` Calling `extend_lifetime()` would cause an `R` passed into it to live forever for the life of the program (the `'static` lifetime). Similarly, @@ -460,12 +494,14 @@ Here are some additional bits of advice and rules: For example, `std::mem::transmute` can be abused in ways where casting with `as` would be both simpler and safer: - // Don't do this - let ptr = &0; - let ptr_num_transmute = unsafe { std::mem::transmute::<&i32, usize>(ptr)}; + ```rust + // Don't do this + let ptr = &0; + let ptr_num_transmute = unsafe { std::mem::transmute::<&i32, usize>(ptr)}; - // Use an `as` cast instead - let ptr_num_cast = ptr as *const i32 as usize; + // Use an `as` cast instead + let ptr_num_cast = ptr as *const i32 as usize; + ``` In fact, using `std::mem::transmute` for *any* reason is a code smell and as such SHOULD be avoided. @@ -475,8 +511,10 @@ Here are some additional bits of advice and rules: This is generally fine to do, but it has some behaviours which you should be aware of. 
Casting down chops off the high bits, e.g.: - let x: u32 = 4294967295; - println!("{}", x as u16); // prints 65535 + ```rust + let x: u32 = 4294967295; + println!("{}", x as u16); // prints 65535 + ``` Some cases which you MUST NOT do include: @@ -487,24 +525,28 @@ Here are some additional bits of advice and rules: * Casting between integers and floats when the thing being cast cannot fit into the type it is being casted into, e.g.: - println!("{}", 42949.0f32 as u8); // prints 197 in debug mode and 0 in release - println!("{}", 1.04E+17 as u8); // prints 0 in both modes - println!("{}", (0.0/0.0) as i64); // prints whatever the heck LLVM wants + ```rust + println!("{}", 42949.0f32 as u8); // prints 197 in debug mode and 0 in release + println!("{}", 1.04E+17 as u8); // prints 0 in both modes + println!("{}", (0.0/0.0) as i64); // prints whatever the heck LLVM wants + ``` Because this behaviour is undefined, it can even produce segfaults in safe Rust code. For example, the following program built in release mode segfaults: - #[inline(never)] - pub fn trigger_ub(sl: &[u8; 666]) -> &[u8] { - // Note that the float is out of the range of `usize`, invoking UB when casting. - let idx = 1e99999f64 as usize; - &sl[idx..] // The bound check is elided due to `idx` being of an undefined value. - } - - fn main() { - println!("{}", trigger_ub(&[1; 666])[999999]); // ~ out of bound - } + ```rust + #[inline(never)] + pub fn trigger_ub(sl: &[u8; 666]) -> &[u8] { + // Note that the float is out of the range of `usize`, invoking UB when casting. + let idx = 1e99999f64 as usize; + &sl[idx..] // The bound check is elided due to `idx` being of an undefined value. + } + + fn main() { + println!("{}", trigger_ub(&[1; 666])[999999]); // ~ out of bound + } + ``` And in debug mode panics with: diff --git a/doc/HACKING/Fuzzing.md b/doc/HACKING/Fuzzing.md index 487716bb6d..d9e133008a 100644 --- a/doc/HACKING/Fuzzing.md +++ b/doc/HACKING/Fuzzing.md @@ -6,7 +6,10 @@ Check out fuzzing-corpora, and set TOR_FUZZ_CORPORA to point to the place where you checked it out. To run the fuzzing test cases in a deterministic fashion, use: - make test-fuzz-corpora + +```console +$ make test-fuzz-corpora +``` This won't actually fuzz Tor! It will just run all the fuzz binaries on our existing set of testcases for the fuzzer. @@ -58,11 +61,13 @@ machine you care about, anyway. To Build: Get AFL from http://lcamtuf.coredump.cx/afl/ and unpack it - cd afl - make - cd ../tor - PATH=$PATH:../afl/ CC="../afl/afl-gcc" ./configure --enable-expensive-hardening - AFL_HARDEN=1 make clean fuzzers + ```console + $ cd afl + $ make + $ cd ../tor + $ PATH=$PATH:../afl/ CC="../afl/afl-gcc" ./configure --enable-expensive-hardening + $ AFL_HARDEN=1 make clean fuzzers + ``` To Find The ASAN Memory Limit: (64-bit only) @@ -75,10 +80,12 @@ Read afl/docs/notes_for_asan.txt for more details. Download recidivm from https://jwilk.net/software/recidivm Download the signature Check the signature - tar xvzf recidivm*.tar.gz - cd recidivm* - make - /path/to/recidivm -v src/test/fuzz/fuzz-http + ```console + $ tar xvzf recidivm*.tar.gz + $ cd recidivm* + $ make + $ /path/to/recidivm -v src/test/fuzz/fuzz-http + ``` Use the final "ok" figure as the input to -m when calling afl-fuzz (Normally, recidivm would output a figure automatically, but in some cases, the fuzzing harness will hang when the memory limit is too small.) @@ -88,9 +95,11 @@ don't care about memory limits. 
To Run: - mkdir -p src/test/fuzz/fuzz_http_findings - ../afl/afl-fuzz -i ${TOR_FUZZ_CORPORA}/http -o src/test/fuzz/fuzz_http_findings -m <asan-memory-limit> -- src/test/fuzz/fuzz-http +```console +$ mkdir -p src/test/fuzz/fuzz_http_findings +$ ../afl/afl-fuzz -i ${TOR_FUZZ_CORPORA}/http -o src/test/fuzz/fuzz_http_findings -m <asan-memory-limit> -- src/test/fuzz/fuzz-http +``` AFL has a multi-core mode, check the documentation for details. You might find the included fuzz-multi.sh script useful for this. @@ -109,7 +118,10 @@ valid inputs may take a second or so, particularly with the fuzzer and sanitizers enabled. To see what fuzz-http is doing with a test case, call it like this: - src/test/fuzz/fuzz-http --debug < /path/to/test.case + +```console +$ src/test/fuzz/fuzz-http --debug < /path/to/test.case +``` (Logging is disabled while fuzzing to increase fuzzing speed.) diff --git a/doc/HACKING/GettingStarted.md b/doc/HACKING/GettingStarted.md index 633a7f0417..ee3da10a4b 100644 --- a/doc/HACKING/GettingStarted.md +++ b/doc/HACKING/GettingStarted.md @@ -37,13 +37,17 @@ Once you've reached this point, here's what you need to know. We keep our source under version control in Git. To get the latest version, run: - git clone https://git.torproject.org/git/tor + ```console + $ git clone https://git.torproject.org/git/tor + ``` This will give you a checkout of the master branch. If you're going to fix a bug that appears in a stable version, check out the appropriate "maint" branch, as in: - git checkout maint-0.4.3 + ```console + $ git checkout maint-0.4.3 + ``` 2. Find your way around the source. diff --git a/doc/HACKING/GettingStartedRust.md b/doc/HACKING/GettingStartedRust.md index af80018f4e..adacf8afc2 100644 --- a/doc/HACKING/GettingStartedRust.md +++ b/doc/HACKING/GettingStartedRust.md @@ -54,7 +54,9 @@ fetching dependencies from Cargo or specifying a local directory. **Fetch dependencies from Cargo** - ./configure --enable-rust --enable-cargo-online-mode +```console +$ ./configure --enable-rust --enable-cargo-online-mode +``` **Using a local dependency cache** @@ -66,13 +68,17 @@ We vendor our Rust dependencies in a separate repo using [cargo-vendor](https://github.com/alexcrichton/cargo-vendor). To use them, do: - git submodule init - git submodule update +```console +$ git submodule init +$ git submodule update +``` To specify the local directory containing the dependencies, (assuming you are in the top level of the repository) configure tor with: - TOR_RUST_DEPENDENCIES='path_to_dependencies_directory' ./configure --enable-rust +```console +$ TOR_RUST_DEPENDENCIES='path_to_dependencies_directory' ./configure --enable-rust +``` (Note that `TOR_RUST_DEPENDENCIES` must be the full path to the directory; it cannot be relative.) @@ -80,7 +86,9 @@ cannot be relative.) Assuming you used the above `git submodule` commands and you're in the topmost directory of the repository, this would be: - TOR_RUST_DEPENDENCIES=`pwd`/src/ext/rust/crates ./configure --enable-rust +```console +$ TOR_RUST_DEPENDENCIES=`pwd`/src/ext/rust/crates ./configure --enable-rust +``` ## Identifying which modules to rewrite @@ -102,10 +110,12 @@ areas of responsibility. A good first step is to build a module-level callgraph to understand how interconnected your target module is. 
- git clone https://git.torproject.org/user/nickm/calltool.git - cd tor - CFLAGS=0 ./configure - ../calltool/src/main.py module_callgraph +```console +$ git clone https://git.torproject.org/user/nickm/calltool.git +$ cd tor +$ CFLAGS=0 ./configure +$ ../calltool/src/main.py module_callgraph +``` The output will tell you each module name, along with a set of every module that the module calls. Modules which call fewer other modules are better targets. @@ -156,15 +166,21 @@ run on your crate. Configure Tor's build system to build with Rust enabled: - ./configure --enable-fatal-warnings --enable-rust --enable-cargo-online-mode +```console +$ ./configure --enable-fatal-warnings --enable-rust --enable-cargo-online-mode +``` Tor's test should be run by doing: - make check +```console +$ make check +``` Tor's integration tests should also pass: - make test-stem +```console +$ make test-stem +``` ## Submitting a patch diff --git a/doc/HACKING/HelpfulTools.md b/doc/HACKING/HelpfulTools.md index 15bd153318..0ce59576f0 100644 --- a/doc/HACKING/HelpfulTools.md +++ b/doc/HACKING/HelpfulTools.md @@ -43,7 +43,9 @@ Builds should show up on the web at jenkins.torproject.org and on IRC at ## Valgrind - valgrind --leak-check=yes --error-limit=no --show-reachable=yes src/app/tor +```console +$ valgrind --leak-check=yes --error-limit=no --show-reachable=yes src/app/tor +``` (Note that if you get a zillion openssl warnings, you will also need to pass `--undef-value-errors=no` to valgrind, or rebuild your openssl @@ -77,10 +79,12 @@ we wish to permit are also documented in the blacklist file. Lcov is a utility that generates pretty HTML reports of test code coverage. To generate such a report: - ./configure --enable-coverage - make - make coverage-html - $BROWSER ./coverage_html/index.html +```console +$ ./configure --enable-coverage +$ make +$ make coverage-html +$ $BROWSER ./coverage_html/index.html +``` This will run the tor unit test suite `./src/test/test` and generate the HTML coverage code report under the directory `./coverage_html/`. To change the @@ -93,36 +97,48 @@ investigated (as of July 2014). To quickly run all the tests distributed with Tor: - make check +```console +$ make check +``` To run the fast unit tests only: - make test +```console +$ make test +``` To selectively run just some tests (the following can be combined arbitrarily): - ./src/test/test <name_of_test> [<name of test 2>] ... - ./src/test/test <prefix_of_name_of_test>.. [<prefix_of_name_of_test2>..] ... - ./src/test/test :<name_of_excluded_test> [:<name_of_excluded_test2]... +```console +$ ./src/test/test <name_of_test> [<name of test 2>] ... +$ ./src/test/test <prefix_of_name_of_test>.. [<prefix_of_name_of_test2>..] ... +$ ./src/test/test :<name_of_excluded_test> [:<name_of_excluded_test2]... +``` To run all tests, including those based on Stem or Chutney: - make test-full +```console +$ make test-full +``` To run all tests, including those based on Stem or Chutney that require a working connection to the internet: - make test-full-online +```console +$ make test-full-online +``` ## Running gcov for unit test coverage - ./configure --enable-coverage - make - make check - # or--- make test-full ? make test-full-online? - mkdir coverage-output - ./scripts/test/coverage coverage-output +```console +$ ./configure --enable-coverage +$ make +$ make check +$ # or--- make test-full ? make test-full-online? 
+$ mkdir coverage-output +$ ./scripts/test/coverage coverage-output +``` (On OSX, you'll need to start with `--enable-coverage CC=clang`.) @@ -145,7 +161,9 @@ you can run `make reset-gcov` to clear the intermediary gcov output. If you have two different `coverage-output` directories, and you want to see a meaningful diff between them, you can run: - ./scripts/test/cov-diff coverage-output1 coverage-output2 | less +```console +$ ./scripts/test/cov-diff coverage-output1 coverage-output2 | less +``` In this diff, any lines that were visited at least once will have coverage "1", and line numbers are deleted. This lets you inspect what you (probably) really @@ -313,12 +331,16 @@ that you're using the emacs-specific version of `etags` (bundled under the If you're using vim or emacs, you can also use Universal Ctags to build a tag file using the syntax: - ctags -R -D 'MOCK_IMPL(r,h,a)=r h a' . +```console +$ ctags -R -D 'MOCK_IMPL(r,h,a)=r h a' . +``` If you're using an older version of Universal Ctags, you can use the following instead: - ctags -R --mline-regex-c='/MOCK_IMPL\([^,]+,\W*([a-zA-Z0-9_]+)\W*,/\1/f/{mgroup=1}' . +```console +ctags -R --mline-regex-c='/MOCK_IMPL\([^,]+,\W*([a-zA-Z0-9_]+)\W*,/\1/f/{mgroup=1}' . +``` A vim-compatible tag file will be generated by default. If you use emacs, add the `-e` flag to generate an emacs-compatible tag file. @@ -330,50 +352,58 @@ source code. Here's how to use it: 1. Begin every file that should be documented with - /** - * \file filename.c - * \brief Short description of the file. - */ +``` + /** + * \file filename.c + * \brief Short description of the file. + */ +``` - (Doxygen will recognize any comment beginning with /** as special.) + (Doxygen will recognize any comment beginning with /** as special.) 2. Before any function, structure, #define, or variable you want to document, add a comment of the form: - /** Describe the function's actions in imperative sentences. - * - * Use blank lines for paragraph breaks - * - and - * - hyphens - * - for - * - lists. - * - * Write <b>argument_names</b> in boldface. - * - * \code - * place_example_code(); - * between_code_and_endcode_commands(); - * \endcode - */ +``` +/** Describe the function's actions in imperative sentences. + * + * Use blank lines for paragraph breaks + * - and + * - hyphens + * - for + * - lists. + * + * Write <b>argument_names</b> in boldface. + * + * \code + * place_example_code(); + * between_code_and_endcode_commands(); + * \endcode + */ +``` 3. Make sure to escape the characters `<`, `>`, `\`, `%` and `#` as `\<`, `\>`, `\\`, `\%` and `\#`. 4. To document structure members, you can use two forms: - struct foo { - /** You can put the comment before an element; */ - int a; - int b; /**< Or use the less-than symbol to put the comment - * after the element. */ - }; +```c +struct foo { + /** You can put the comment before an element; */ + int a; + int b; /**< Or use the less-than symbol to put the comment + * after the element. */ +}; +``` 5. To generate documentation from the Tor source code, type: - $ doxygen -g +```console +$ doxygen -g +``` - to generate a file called `Doxyfile`. Edit that file and run - `doxygen` to generate the API documentation. + to generate a file called `Doxyfile`. Edit that file and run + `doxygen` to generate the API documentation. 6. See the Doxygen manual for more information; this summary just scratches the surface. 
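To tie together the Doxygen conventions documented in the HelpfulTools.md hunks above, here is a brief, hedged sketch of a Tor-style documented file; the struct `relay_tally_t` and function `count_live_relays()` are invented for illustration and are not part of this diff or the real code base.

```c
/**
 * \file relay_tally.c
 * \brief Hypothetical example of the Doxygen style described above.
 */

/** Running tally of relays we have examined. */
typedef struct relay_tally_t {
  /** You can document a member before it appears; */
  int n_total;
  int n_live; /**< or after it, using the less-than form. */
} relay_tally_t;

/** Return the number of live relays recorded in <b>tally</b>.
 * Return 0 if <b>tally</b> is NULL.
 **/
static int
count_live_relays(const relay_tally_t *tally)
{
  if (!tally)
    return 0;
  return tally->n_live;
}
```
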
diff --git a/doc/HACKING/Module.md b/doc/HACKING/Module.md index f8a9773d47..b9d3a654eb 100644 --- a/doc/HACKING/Module.md +++ b/doc/HACKING/Module.md @@ -70,7 +70,7 @@ There are couples of "rules" you want to follow: base. Every entry point should have a second definition if the module is disabled. For instance: - ``` + ```c #ifdef HAVE_MODULE_DIRAUTH int sr_init(int save_to_disk); @@ -109,7 +109,9 @@ There are couples of "rules" you want to follow: * When you include headers from the module, **always** use the full module path in your statement. Example: - `#include "feature/dirauth/dirvote.h"` +```c +#include "feature/dirauth/dirvote.h"` +``` The main reason is that we do **not** add the module include path by default so it needs to be specified. But also, it helps our human brain understand diff --git a/doc/HACKING/README.1st.md b/doc/HACKING/README.1st.md index 2278a61d6c..4bc3298c67 100644 --- a/doc/HACKING/README.1st.md +++ b/doc/HACKING/README.1st.md @@ -32,7 +32,9 @@ For an explanation of how to change Tor's design to work differently, look at For the latest version of the code, get a copy of git, and - git clone https://git.torproject.org/git/tor +```console +$ git clone https://git.torproject.org/git/tor +``` ## Stay in touch diff --git a/doc/HACKING/ReleasingTor.md b/doc/HACKING/ReleasingTor.md index 2464d8afb4..d9b8d9d823 100644 --- a/doc/HACKING/ReleasingTor.md +++ b/doc/HACKING/ReleasingTor.md @@ -38,9 +38,9 @@ new Tor release: 3. Run checks that aren't covered above, including: - * clang scan-build. (See the script in ./scripts/test/scan_build.sh) + * `clang scan-build`. (See the script in ./scripts/test/scan_build.sh) - * make test-network and make test-network-all (with + * `make test-network` and make `test-network-all` (with --enable-fragile-hardening) * Running Tor yourself and making sure that it actually works for you. @@ -57,8 +57,7 @@ new Tor release: of them and reordering to focus on what users and funders would find interesting and understandable. - To do this, run - `./scripts/maint/sortChanges.py changes/* > changelog.in` + To do this, run `./scripts/maint/sortChanges.py changes/* > changelog.inx` to combine headings and sort the entries. Copy the changelog.in file into the ChangeLog. Run 'format_changelog.py' (see below) to clean up the line breaks. @@ -164,9 +163,11 @@ new Tor release: 1. Sign the tarball, then sign and push the git tag: - gpg -ba <the_tarball> - git tag -s tor-0.4.x.y-<status> - git push origin tag tor-0.4.x.y-<status> +```console +$ gpg -ba <the_tarball> +$ git tag -s tor-0.4.x.y-<status> +$ git push origin tag tor-0.4.x.y-<status> +``` (You must do this before you update the website: the website scripts rely on finding the version by tag.) diff --git a/doc/HACKING/Tracing.md b/doc/HACKING/Tracing.md deleted file mode 100644 index e1e97abe6d..0000000000 --- a/doc/HACKING/Tracing.md +++ /dev/null @@ -1,91 +0,0 @@ -# Tracing - -This document describes how the event tracing subsystem works in tor so -developers can add events to the code base but also hook them to an event -tracing framework. - -## Basics - -Event tracing is separated in two concepts, trace events and a tracer. The -tracing subsystem can be found in `src/trace`. The `events.h` header file is -the main file that maps the different tracers to trace events. - -### Events - -A trace event is basically a function from which we can pass any data that -we want to collect. In addition, we specify a context for the event such as -a subsystem and an event name. 
- -A trace event in tor has the following standard format: - - tor_trace(subsystem, event\_name, args...) - -The `subsystem` parameter is the name of the subsytem the trace event is in. -For example that could be "scheduler" or "vote" or "hs". The idea is to add -some context to the event so when we collect them we know where it's coming -from. The `event_name` is the name of the event which helps a lot with -adding some semantic to the event. Finally, `args` is any number of -arguments we want to collect. - -Here is an example of a possible tracepoint in main(): - - tor_trace(main, init_phase, argc) - -The above is a tracepoint in the `main` subsystem with `init_phase` as the -event name and the `int argc` is passed to the event as well. - -How `argc` is collected or used has nothing to do with the instrumentation -(adding trace events to the code). It is the work of the tracer so this is why -the trace events and collection framework (tracer) are decoupled. You _can_ -have trace events without a tracer. - -### Tracer - -In `src/trace/events.h`, we map the `tor_trace()` function to the right -tracer. A tracer support is only enabled at compile time. For instance, the -file `src/trace/debug.h` contains the mapping of the generic tracing function -`tor_trace()` to the `log_debug()` function. More specialized function can be -mapped depending on the tracepoint. - -## Build System - -This section describes how it is integrated into the build system of tor. - -By default, every tracing events are disabled in tor that is `tor_trace()` -is a NOP. - -To enable a tracer, there is a configure option on the form of: - - --enable-tracing-<tracer> - -We have an option that will send every trace events to a `log_debug()` (as -mentionned above) which will print you the subsystem and name of the event but -not the arguments for technical reasons. This is useful if you want to quickly -see if your trace event is being hit or well written. To do so, use this -configure option: - - --enable-tracing-debug - -## Instrument Tor - -This is pretty easy. Let's say you want to add a trace event in -`src/feature/rend/rendcache.c`, you only have to add this include statement: - - #include "trace/events.h" - -Once done, you can add as many as you want `tor_trace()` that you need. -Please use the right subsystem (here it would be `hs`) and a unique name that -tells what the event is for. For example: - - tor_trace(hs, store_desc_as_client, desc, desc_id); - -If you look in `src/trace/events.h`, you'll see that if tracing is enabled it -will be mapped to a function called: - - tor_trace_hs_store_desc_as_client(desc, desc_id) - -And the point of all this is for that function to be defined in a new file -that you might want to add named `src/trace/hs.{c|h}` which would defined how -to collect the data for the `tor_trace_hs_store_desc_as_client()` function -like for instance sending it to a `log_debug()` or do more complex operations -or use a userspace tracer like LTTng (https://lttng.org). diff --git a/doc/HACKING/WritingTests.md b/doc/HACKING/WritingTests.md index d212020525..e1497a77c2 100644 --- a/doc/HACKING/WritingTests.md +++ b/doc/HACKING/WritingTests.md @@ -107,7 +107,9 @@ covered or uncovered. 
To count new or modified uncovered lines in D2, you can run: - ./scripts/test/cov-diff ${D1} ${D2}" | grep '^+ *\#' | wc -l +```console +$ ./scripts/test/cov-diff ${D1} ${D2}" | grep '^+ *\#' | wc -l +``` ## Marking lines as unreachable by tests @@ -163,28 +165,30 @@ I use the term "unit test" and "regression tests" very sloppily here. Here's an example of a test function for a simple function in util.c: - static void - test_util_writepid(void *arg) - { - (void) arg; +```c +static void +test_util_writepid(void *arg) +{ + (void) arg; - char *contents = NULL; - const char *fname = get_fname("tmp_pid"); - unsigned long pid; - char c; + char *contents = NULL; + const char *fname = get_fname("tmp_pid"); + unsigned long pid; + char c; - write_pidfile(fname); + write_pidfile(fname); - contents = read_file_to_str(fname, 0, NULL); - tt_assert(contents); + contents = read_file_to_str(fname, 0, NULL); + tt_assert(contents); - int n = sscanf(contents, "%lu\n%c", &pid, &c); - tt_int_op(n, OP_EQ, 1); - tt_int_op(pid, OP_EQ, getpid()); + int n = sscanf(contents, "%lu\n%c", &pid, &c); + tt_int_op(n, OP_EQ, 1); + tt_int_op(pid, OP_EQ, getpid()); - done: - tor_free(contents); - } +done: + tor_free(contents); +} +``` This should look pretty familiar to you if you've read the tinytest manual. One thing to note here is that we use the testing-specific @@ -214,10 +218,12 @@ macro-protected declaration of the function in the module's header. For example, `crypto_curve25519.h` contains: - #ifdef CRYPTO_CURVE25519_PRIVATE - STATIC int curve25519_impl(uint8_t *output, const uint8_t *secret, - const uint8_t *basepoint); - #endif +```c +#ifdef CRYPTO_CURVE25519_PRIVATE +STATIC int curve25519_impl(uint8_t *output, const uint8_t *secret, + const uint8_t *basepoint); +#endif +``` The `crypto_curve25519.c` file and the `test_crypto.c` file both define `CRYPTO_CURVE25519_PRIVATE`, so they can see this declaration. @@ -231,28 +237,29 @@ the test _really tests_ the code. For example, here is a _bad_ test for the unlink() function (which is supposed to remove a file). - static void - test_unlink_badly(void *arg) - { - (void) arg; - int r; +```c +static void +test_unlink_badly(void *arg) +{ + (void) arg; + int r; - const char *fname = get_fname("tmpfile"); + const char *fname = get_fname("tmpfile"); - /* If the file isn't there, unlink returns -1 and sets ENOENT */ - r = unlink(fname); - tt_int_op(n, OP_EQ, -1); - tt_int_op(errno, OP_EQ, ENOENT); + /* If the file isn't there, unlink returns -1 and sets ENOENT */ + r = unlink(fname); + tt_int_op(n, OP_EQ, -1); + tt_int_op(errno, OP_EQ, ENOENT); - /* If the file DOES exist, unlink returns 0. */ - write_str_to_file(fname, "hello world", 0); - r = unlink(fnme); - tt_int_op(r, OP_EQ, 0); - - done: - tor_free(contents); - } + /* If the file DOES exist, unlink returns 0. */ + write_str_to_file(fname, "hello world", 0); + r = unlink(fnme); + tt_int_op(r, OP_EQ, 0); +done: + tor_free(contents); +} +``` This test might get very high coverage on unlink(). So why is it a bad test? Because it doesn't check that unlink() *actually removes the @@ -273,20 +280,25 @@ To write tests for this case, you can replace the underlying functions with testing stubs while your unit test is running. 
You need to declare the underlying function as 'mockable', as follows: - MOCK_DECL(returntype, functionname, (argument list)); +```c +MOCK_DECL(returntype, functionname, (argument list)); +``` and then later implement it as: - MOCK_IMPL(returntype, functionname, (argument list)) - { - /* implementation here */ - } +```c +MOCK_IMPL(returntype, functionname, (argument list)) +{ + /* implementation here */ +} +``` For example, if you had a 'connect to remote server' function, you could declare it as: - - MOCK_DECL(int, connect_to_remote, (const char *name, status_t *status)); +```c +MOCK_DECL(int, connect_to_remote, (const char *name, status_t *status)); +``` When you declare a function this way, it will be declared as normal in regular builds, but when the module is built for testing, it is declared @@ -295,11 +307,15 @@ as a function pointer initialized to the actual implementation. In your tests, if you want to override the function with a temporary replacement, you say: - MOCK(functionname, replacement_function_name); +```c +MOCK(functionname, replacement_function_name); +``` And later, you can restore the original function with: - UNMOCK(functionname); +```c +UNMOCK(functionname); +``` For more information, see the definitions of this mocking logic in `testsupport.h`. @@ -324,11 +340,13 @@ cases and failure csaes. For example, consider testing this function: - /** Remove all elements E from sl such that E==element. Preserve - * the order of any elements before E, but elements after E can be - * rearranged. - */ - void smartlist_remove(smartlist_t *sl, const void *element); +```c +/** Remove all elements E from sl such that E==element. Preserve + * the order of any elements before E, but elements after E can be + * rearranged. + */ +void smartlist_remove(smartlist_t *sl, const void *element); +``` In order to test it well, you should write tests for at least all of the following cases. (These would be black-box tests, since we're only looking @@ -355,19 +373,21 @@ When you consider edge cases, you might try: Now let's look at the implementation: - void - smartlist_remove(smartlist_t *sl, const void *element) - { - int i; - if (element == NULL) +```c +void +smartlist_remove(smartlist_t *sl, const void *element) +{ + int i; + if (element == NULL) return; - for (i=0; i < sl->num_used; i++) + for (i=0; i < sl->num_used; i++) if (sl->list[i] == element) { - sl->list[i] = sl->list[--sl->num_used]; /* swap with the end */ - i--; /* so we process the new i'th element */ - sl->list[sl->num_used] = NULL; + sl->list[i] = sl->list[--sl->num_used]; /* swap with the end */ + i--; /* so we process the new i'th element */ + sl->list[sl->num_used] = NULL; } - } +} +``` Based on the implementation, we now see three more edge cases to test: @@ -484,3 +504,15 @@ targets in `Makefile.am`. (Adding new kinds of program to chutney will still require hacking the code.) + +## Other integration tests + +It's fine to write tests that use a POSIX shell to invoke Tor or test other +aspects of the system. When you do this, have a look at our existing tests +of this kind in `src/test/` to make sure that you haven't forgotten anything +important. For example: it can be tricky to make sure you're invoking Tor at +the right path in various build scenarios. + +We use a POSIX shell whenever possible here, and we use the shellcheck tool +to make sure that our scripts portable. We should only require bash for +scripts that are developer-only. 
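Before moving on, here is a hedged sketch of how the MOCK()/UNMOCK() pattern from the WritingTests.md hunks above might look in an actual test. The replacement `connect_to_remote_fake()`, the function under test `fetch_greeting()`, and the test body are invented for illustration; `MOCK_DECL`, `MOCK`, `UNMOCK`, and the tinytest macros come from the document itself.

```c
/* Assumes the mockable declaration shown above:
 *   MOCK_DECL(int, connect_to_remote, (const char *name, status_t *status));
 * and some code under test, fetch_greeting(), that calls connect_to_remote().
 */

/** Replacement used while the test runs: report success without
 * touching the network. */
static int
connect_to_remote_fake(const char *name, status_t *status)
{
  (void) name;
  (void) status;
  return 0;
}

static void
test_fetch_greeting_offline(void *arg)
{
  (void) arg;

  /* Swap in the fake for the duration of this test... */
  MOCK(connect_to_remote, connect_to_remote_fake);

  tt_int_op(fetch_greeting(), OP_EQ, 0);

 done:
  /* ...and always restore the real implementation afterwards. */
  UNMOCK(connect_to_remote);
}
```
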
diff --git a/doc/HACKING/android/Simpleperf.md b/doc/HACKING/android/Simpleperf.md index c7e63a7c86..ed640f912e 100644 --- a/doc/HACKING/android/Simpleperf.md +++ b/doc/HACKING/android/Simpleperf.md @@ -29,7 +29,9 @@ the Android Software Development Kit (SDK) and Native Development Kit 3. Install the Android Package you generated in step 1: +```bash $ adb install /path/to/your/app-fullperm-debug.apk +``` 4. Check on your device that the newly installed Orbot actually works and behaves in the way you expect it to. @@ -76,10 +78,12 @@ was spend on the call. To access binaries, `torrc` files, and other useful information on the device do the following: +```console $ adb shell (device):/ $ run-as org.torproject.android (device):/data/data/org.torproject.android $ ls app_bin app_data cache databases files lib shared_prefs +``` Descriptors, control authentication cookie, state, and other files can be found in the `app_data` directory. The `torrc` can be found in the `app_bin/` @@ -88,10 +92,14 @@ was spend on the call. - You can enable logging in Tor via the syslog (or android) log mechanism with: +```console $ adb shell (device):/ $ run-as org.torproject.android (device):/data/data/org.torproject.android $ echo -e "\nLog info syslog" >> app_bin/torrc +``` Start Tor the normal way via Orbot and collect the logs from your computer using +```console $ adb logcat +``` diff --git a/doc/HACKING/tracing/EventsCircuit.md b/doc/HACKING/tracing/EventsCircuit.md new file mode 100644 index 0000000000..42abdda856 --- /dev/null +++ b/doc/HACKING/tracing/EventsCircuit.md @@ -0,0 +1,139 @@ +# Circuit Subsystem Trace Events + +The circuit subsystem emits a series of tracing events related to a circuit +object life cycle and its state change. + +This document describes each event as in what data they record and what they +represent. + +## Background + +There are two types of circuits: origin and OR (onion router). Both of them +are derived from a base object called a general circuit. + +- Origin circuits are the ones initiated by tor itself so client or onion + service circuits for instance. + +- OR circuits are the ones going through us that we have not initiated and + thus only seen by relays. + +Many operations are done on the base (general) circuit, and some are specific +to an origin or OR. The following section describes each of them by circuit +type. + +## Trace Events + +For the LTTng tracer, the subsystem name of these events is: `tor_circuit`. + +Also, unless specified otherwise, every event emits a common set of parameters +thus they should always be expected in the following order: + +- `circ_id`: For an origin circuit, this is the global circuit identifier used + in a cell. For an OR circuit, the value is 0. + +- `purpose`: Purpose of the circuit as in what it is used for. Note that this + can change during the lifetime of a circuit. See `CIRCUIT_PURPOSE_*` in + `core/or/circuitlist.h` for an exhaustive list of the possible values. + +- `state`: State of a circuit. This changes during the lifetime of a circuit. + See `CIRCUIT_STATE_*` in `core/or/circuitlist.h` for an exhaustive list of + the possible values. + +Now, the tracing events. + +### General Circuit (`circuit_t`) + +The following events are triggered for the base circuit object and thus apply +to all types of circuits. + + * `free`: A circuit object is freed that is memory is released and not + usable anymore. After this event, no more events will be emitted for the + specific circuit object. 
+ + * `mark_for_close`: A circuit object is marked for close that is scheduled + to be closed in a later mainloop periodic event. + + Extra parameters: + + - `end_reason`: Reason why the circuit is closed. Tor often changes that + reason to something generic sometimes in order to avoid leaking internal + reasons to the end point. Thus, this value can be different from + orig_close_reason. + + - `orig_close_reason`: Original reason why the circuit is closed. That + value never changes and contains the internal reason why we close it. It + is **never** this reason that is sent back on the circuit. + + * `change_purpose`: Purpose change. + + Extra parameters: + + (`purpose` parameter is not present) + + - `old_purpose`: Previous purpose that is no longer. + + - `new_purpose`: New purpose assigned to the circuit. + + * `change_state`: State change. + + Extra parameters: + + (`state` parameter is not present) + + - `old_state`: Previous state that is no longer. + + - `new_state`: New state assigned to the circuit. + +### Origin Circuit (`origin_circuit_t`) + +The following events are triggered only for origin circuits. + + * `new_origin`: New origin circuit has been created meaning it has been + newly allocated, initialized and added to the global list. + + * `establish`: Circuit is being established. This is the initial first step + where the path was selected and a connection to the first hop has been + launched. + + * `cannibalized`: Circuit has been cannibalized. This happens when we have + an already opened unused circuit (preemptive circuits) and it was picked. + + * `first_onion_skin`: First onion skin was sent that is the handshake with + the first hop. + + Extra parameters: + + - `fingerprint`: Identity digest (RSA) of the first hop. + + * `intermediate_onion_skin`: An intermediate onion skin was sent which can + be why any hops after the first one. There is thus `N - 1` of these events + where `N` is the total number of hops in the path. + + Extra parameters: + + - `fingerprint`: Identity digest (RSA) of the next hop. + + * `opened`: Circuit just became opened which means that all hops down the + path have negotiated the handshake between them and us and the circuit is + now ready to send cells. + + * `timeout`: Circuit has timed out that is we waited too long for the + circuit to be built. + + * `idle_timeout`: Circuit has timed out due to idleness. This is controlled + by the MaxCircuitDirtiness parameter which is 10 min by default. + +For the common use case of a 3-hop circuit, the following events should be +seen in this order: + + `new_origin` -> `establish` -> `first_onion_skin` -> + `intermediate_onion_skin` -> `intermediate_onion_skin` -> `opened` + +### OR Circuit (`or_circuit_t`) + +The following events are triggered only for OR circuits. For each of them, the +`circ_id` parameter is not present since it would always be 0. The `purpose` +and `state` remain. + + * `new_or`: New OR circuit has been created meaning it has been newly + allocated, initialized and added to the global list. diff --git a/doc/HACKING/tracing/README.md b/doc/HACKING/tracing/README.md new file mode 100644 index 0000000000..d9fb2e5341 --- /dev/null +++ b/doc/HACKING/tracing/README.md @@ -0,0 +1,163 @@ +# Tracing + +This document describes how the event tracing subsystem works in tor so +developers can add events to the code base but also hook them to an event +tracing framework (i.e. tracer). 
+ +## WARNING ## + +Tracing the tor daemon **always** generates sensitive data if used in +production (on the public network). + +It **is** ethical for researchers to use tracing for their own tor client (for +example: building paths, timings, or performance). + +It is **NOT** ethical to archive, publish or keep data containing other users' +activity such as relay data or anything that handles users' traffic. This +of course includes any logs below notice level. + +Publishing analysis of tracing data containing user traffic is **NOT** safe +either. + +In other words, tracing data that contains other users's activity is **NOT** +safe to publish in any form. + +## Basics ### + +Tracing is separated in two different concepts. The tracing API and the +tracing probes. + +The API is in `src/lib/trace/` which defines how to call tracepoints in the +tor code. Every C files should include `src/lib/trace/events.h` if they want +to call a tracepoint. + +The probes are what actually record the tracepoint data. Because they often +need to access specific subsystem objects, the probes are within each +subsystem. They are defined in the `trace-probes-<subsystem>.c` files. + +### Events + +A trace event is basically a function from which we can pass any data that we +want to collect. In addition, we specify a context for the event such as the +subsystem and an event name. + +A trace event in tor has the following standard format: + +```c +tor_trace(subsystem, event_name, args...); +``` + +The `subsystem` parameter is the name of the subsytem the trace event is in. +For example that could be "scheduler" or "vote" or "hs". The idea is to add +some context to the event so when we collect them we know where it's coming +from. + +The `event_name` is the name of the event which adds better semantic to the +event. + +The `args` can be any number of arguments we want to collect. + +Here is an example of a possible tracepoint in main(): + +```c +tor_trace(main, init_phase, argc); +``` + +The above is a tracepoint in the `main` subsystem with `init_phase` as the +event name and the `int argc` is passed to the event as one argument. + +How `argc` is collected or used has nothing to do with the instrumentation +(adding trace events to the code). It is the work of the tracer so this is why +the trace events and collection framework (tracer) are decoupled. You _can_ +have trace events without a tracer. + +### Instrumentation ### + +In `src/lib/trace/events.h`, we map the high level `tor_trace()` macro to one +or many enabled instrumentation. + +Currently, we have 3 types of possible instrumentation: + +1. Debug + + This will map every tracepoint to `log_debug()`. However, none of the + arguments will be passed on because we don't know their type nor the string + format of the debug log. The output is standardized like this: + +``` +[debug] __FUNC__: Tracepoint <event_name> from subsystem <subsystem> hit. +``` + +2. USDT + + User Statically-Defined Tracing (USDT) is a kind of probe which can be + handled by a variety of tracers such as SystemTap, DTrace, perf, eBPF and + ftrace. + + For each tracer, one will need to define the ABI in order for the tracer to + be able to extract the data from the tracepoint objects. For instance, the + tracer needs to know how to print the circuit state of a `circuit_t` + object. + +3. LTTng-UST + + LTTng Userspace is a tracer that has it own type of instrumentation. The + probe definitions are created within the C code and is strongly typed. 
+ + For more information, see https://lttng.org/docs. + +## Build System + +This section describes how the instrumentation is integrated into the build +system of tor. + +By default, every tracing events are disabled in tor that is `tor_trace()` is +a NOP thus has no execution cost time. + +To enable a specific instrumentation, there are configure options: + +1. Debug: `--enable-tracing-instrumentation-debug` + +2. USDT: `--enable-tracing-instrumentation-usdt` + +3. LTTng: `--enable-tracing-instrumentation-lttng` + +They can all be used together or independently. If one of them is set, +`HAVE_TRACING` define is set. And for each instrumentation, a +`USE_TRACING_INSTRUMENTATION_<type>` is set. + +## Adding a Tracepoint ## + +This is pretty easy. Let's say you want to add a trace event in +`src/feature/rend/rendcache.c`, you first need to include this file: + +```c +#include "lib/trace/events.h" +``` + +Then, the `tor_trace()` macro can be used with the specific format detailled +before in a previous section. As an example: + +```c +tor_trace(hs, store_desc_as_client, desc, desc_id); +``` + +For `Debug` instrumentation, you have nothing else to do. + +For `USDT`, instrumentation, you will need to define the probes in a way the +specific tracer can understand. For instance, SystemTap requires you to define +a `tapset` for each tracepoints. + +For `LTTng`, you will need to define the probes in the +`trace-probes-<subsystem>.{c|h}` file. See the `trace-probes-circuit.{c|h}` +file as an example and https://lttng.org/docs/v2.11/#doc-instrumenting. + +## Performance ## + +A word about performance when a tracepoint is enabled. One of the goal of a +tracepoint (USDT, LTTng-UST, ...) is that they can be enabled or disabled. By +default, they are disabled which means the tracer will not record the data but +it has to do a check thus the cost is basically the one of a `branch`. + +If enabled, then the performance depends on the tracer. In the case of +LTTng-UST, the event costs around 110nsec. |
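
As a closing illustration of the "Debug" instrumentation behaviour described in the new tracing/README.md above (each tracepoint collapses to a `log_debug()` line naming only the subsystem and the event), one could imagine a fallback definition roughly like the following. This is a hedged sketch, not the actual contents of `src/lib/trace/events.h`; the `lib/log/log.h` include path and the `LD_GENERAL` domain are assumptions about the surrounding code base.

```c
/* Illustrative sketch only -- not the real src/lib/trace implementation. */
#include "lib/log/log.h"  /* assumed location of log_debug() and LD_GENERAL */

/* With debug instrumentation enabled, a tracepoint just logs its subsystem
 * and event name; the arguments are dropped because their types and format
 * strings are not known at this level. */
#define tor_trace(subsystem, event_name, ...)                          \
  log_debug(LD_GENERAL, "Tracepoint \"" #event_name "\" from "         \
            "subsystem \"" #subsystem "\" hit.")

/* Example call site, matching the document's rendcache example:
 *   tor_trace(hs, store_desc_as_client, desc, desc_id);
 */
```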