diff options
author | Nick Mathewson <nickm@torproject.org> | 2017-09-04 11:44:56 -0400 |
---|---|---|
committer | Nick Mathewson <nickm@torproject.org> | 2017-09-04 11:44:56 -0400 |
commit | a229d6c2f836ea38d97310b16490a94b13965958 (patch) | |
tree | 6d74913f8b7320c8ec7632e52b4ed3e619dd4639 | |
parent | e2e13e7c8a2a40074805a19651d7837484d3a8b8 (diff) | |
parent | 1645c5503d0832c23a8ea68df8a6a5fcd12b3383 (diff) | |
download | tor-a229d6c2f836ea38d97310b16490a94b13965958.tar.gz tor-a229d6c2f836ea38d97310b16490a94b13965958.zip |
Merge branch 'bug22818_squashed'
-rw-r--r-- | doc/HACKING/CodingStandardsRust.md | 469 | ||||
-rw-r--r-- | doc/HACKING/GettingStartedRust.md | 158 |
2 files changed, 627 insertions, 0 deletions
diff --git a/doc/HACKING/CodingStandardsRust.md b/doc/HACKING/CodingStandardsRust.md new file mode 100644 index 0000000000..d0b17c1604 --- /dev/null +++ b/doc/HACKING/CodingStandardsRust.md @@ -0,0 +1,469 @@ + + Rust Coding Standards +======================= + +You MUST follow the standards laid out in `.../doc/HACKING/CodingStandards.md`, +where applicable. + + Module/Crate Declarations +--------------------------- + +Each Tor C module which is being rewritten MUST be in its own crate. +See the structure of `.../src/rust` for examples. + +In your crate, you MUST use `lib.rs` ONLY for pulling in external +crates (e.g. `extern crate libc;`) and exporting public objects from +other Rust modules (e.g. `pub use mymodule::foo;`). For example, if +you create a crate in `.../src/rust/yourcrate`, your Rust code should +live in `.../src/rust/yourcrate/yourcode.rs` and the public interface +to it should be exported in `.../src/rust/yourcrate/lib.rs`. + +If your code is to be called from Tor C code, you MUST define a safe +`ffi.rs`. See the "Safety" section further down for more details. + +For example, in a hypothetical `tor_addition` Rust module: + +In `.../src/rust/tor_addition/addition.rs`: + + pub fn get_sum(a: i32, b: i32) -> i32 { + a + b + } + +In `.../src/rust/tor_addition/lib.rs`: + + pub use addition::*; + +In `.../src/rust/tor_addition/ffi.rs`: + + #[no_mangle] + pub extern "C" fn tor_get_sum(a: c_int, b: c_int) -> c_int { + get_sum(a, b) + } + +If your Rust code must call out to parts of Tor's C code, you must +declare the functions you are calling in the `external` crate, located +at `.../src/rust/external`. + +<!-- XXX get better examples of how to declare these externs, when/how they --> +<!-- XXX are unsafe, what they are expected to do —isis --> + +Modules should strive to be below 500 lines (tests excluded). Single +responsibility and limited dependencies should be a guiding standard. + +If you have any external modules as dependencies (e.g. `extern crate +libc;`), you MUST declare them in your crate's `lib.rs` and NOT in any +other module. + + Dependencies +-------------- + +In general, we use modules from only the Rust standard library +whenever possible. We will review including external crates on a +case-by-case basis. + + Documentation +--------------- + +You MUST include `#[deny(missing_docs)]` in your crate. + +For function/method comments, you SHOULD include a one-sentence, "first person" +description of function behaviour (see requirements for documentation as +described in `.../src/HACKING/CodingStandards.md`), then an `# Inputs` section +for inputs or initialisation values, a `# Returns` section for return +values/types, a `# Warning` section containing warnings for unsafe behaviours or +panics that could happen. For publicly accessible +types/constants/objects/functions/methods, you SHOULD also include an +`# Examples` section with runnable doctests. + +You MUST document your module with _module docstring_ comments, +i.e. `//!` at the beginning of each line. + + Style +------- + +You SHOULD consider breaking up large literal numbers with `_` when it makes it +more human readable to do so, e.g. `let x: u64 = 100_000_000_000`. + + Testing +--------- + +All code MUST be unittested and integration tested. + +Public functions/objects exported from a crate SHOULD include doctests +describing how the function/object is expected to be used. + +Integration tests SHOULD go into a `tests/` directory inside your +crate. Unittests SHOULD go into their own module inside the module +they are testing, e.g. in `.../src/rust/tor_addition/addition.rs` you +should put: + + #[cfg(test)] + mod test { + use super::*; + + #[test] + fn addition_with_zero() { + let sum: i32 = get_sum(5i32, 0i32); + assert_eq!(sum, 5); + } + } + + Benchmarking +-------------- + +The external `test` crate can be used for most benchmarking. However, using +this crate requires nightly Rust. Since we may want to switch to a more +stable Rust compiler eventually, we shouldn't do things which will automatically +break builds for stable compilers. Therefore, you MUST feature-gate your +benchmarks in the following manner. + +If you wish to benchmark some of your Rust code, you MUST put the +following in the `[features]` section of your crate's `Cargo.toml`: + + [features] + bench = [] + +Next, in your crate's `lib.rs` you MUST put: + + #[cfg(all(test, feature = "bench"))] + extern crate test; + +This ensures that the external crate `test`, which contains utilities +for basic benchmarks, is only used when running benchmarks via `cargo +bench --features bench`. + +Finally, to write your benchmark code, in +`.../src/rust/tor_addition/addition.rs` you SHOULD put: + + #[cfg(all(test, features = "bench"))] + mod bench { + use test::Bencher; + use super::*; + + #[bench] + fn addition_small_integers(b: &mut Bencher) { + b.iter(| | get_sum(5i32, 0i32)); + } + } + + Fuzzing +--------- + +If you wish to fuzz parts of your code, please see the +[`cargo fuzz`](https://github.com/rust-fuzz/cargo-fuzz) crate, which uses +[libfuzzer-sys](https://github.com/rust-fuzz/libfuzzer-sys). + + Whitespace & Formatting +------------------------- + +You MUST run `rustfmt` (https://github.com/rust-lang-nursery/rustfmt) +on your code before your code will be merged. You can install rustfmt +by doing `cargo install rustfmt-nightly` and then run it with `cargo +fmt`. + + Safety +-------- + +You SHOULD read [the nomicon](https://doc.rust-lang.org/nomicon/) before writing +Rust FFI code. It is *highly advised* that you read and write normal Rust code +before attempting to write FFI or any other unsafe code. + +Here are some additional bits of advice and rules: + +0. Any behaviours which Rust considers to be undefined are forbidden + + From https://doc.rust-lang.org/reference/behavior-considered-undefined.html: + + > Behavior considered undefined + > + > The following is a list of behavior which is forbidden in all Rust code, + > including within unsafe blocks and unsafe functions. Type checking provides the + > guarantee that these issues are never caused by safe code. + > + > * Data races + > * Dereferencing a null/dangling raw pointer + > * Reads of [undef](http://llvm.org/docs/LangRef.html#undefined-values) + > (uninitialized) memory + > * Breaking the + > [pointer aliasing rules](http://llvm.org/docs/LangRef.html#pointer-aliasing-rules) + > with raw pointers (a subset of the rules used by C) + > * `&mut T` and `&T` follow LLVM’s scoped noalias model, except if the `&T` + > contains an `UnsafeCell<U>`. Unsafe code must not violate these aliasing + > guarantees. + > * Mutating non-mutable data (that is, data reached through a shared + > reference or data owned by a `let` binding), unless that data is + > contained within an `UnsafeCell<U>`. + > * Invoking undefined behavior via compiler intrinsics: + > - Indexing outside of the bounds of an object with + > `std::ptr::offset` (`offset` intrinsic), with the exception of + > one byte past the end which is permitted. + > - Using `std::ptr::copy_nonoverlapping_memory` (`memcpy32`/`memcpy64` + > intrinsics) on overlapping buffers + > * Invalid values in primitive types, even in private fields/locals: + > - Dangling/null references or boxes + > - A value other than `false` (0) or `true` (1) in a `bool` + > - A discriminant in an `enum` not included in the type definition + > - A value in a `char` which is a surrogate or above `char::MAX` + > - Non-UTF-8 byte sequences in a `str` + > * Unwinding into Rust from foreign code or unwinding from Rust into foreign + > code. Rust's failure system is not compatible with exception handling in other + > languages. Unwinding must be caught and handled at FFI boundaries. + +1. `unwrap()` + + If you call `unwrap()`, anywhere, even in a test, you MUST include + an inline comment stating how the unwrap will either 1) never fail, + or 2) should fail (i.e. in a unittest). + + You SHOULD NOT use `unwrap()` anywhere in which it is possible to handle the + potential error with either `expect()` or the eel operator, `?`. + For example, consider a function which parses a string into an integer: + + fn parse_port_number(config_string: &str) -> u16 { + u16::from_str_radix(config_string, 10).unwrap() + } + + There are numerous ways this can fail, and the `unwrap()` will cause the + whole program to byte the dust! Instead, either you SHOULD use `expect()` + (or another equivalent function which will return an `Option` or a `Result`) + and change the return type to be compatible: + + fn parse_port_number(config_string: &str) -> Option<u16> { + u16::from_str_radix(config_string, 10).expect("Couldn't parse port into a u16") + } + + or you SHOULD use `or()` (or another similar method): + + fn parse_port_number(config_string: &str) -> Option<u16> { + u16::from_str_radix(config_string, 10).or(Err("Couldn't parse port into a u16") + } + + Using methods like `or()` can be particularly handy when you must do + something afterwards with the data, for example, if we wanted to guarantee + that the port is high. Combining these methods with the eel operator (`?`) + makes this even easier: + + fn parse_port_number(config_string: &str) -> Result<u16, Err> { + let port = u16::from_str_radix(config_string, 10).or(Err("Couldn't parse port into a u16"))?; + + if port > 1024 { + return Ok(port); + } else { + return Err("Low ports not allowed"); + } + } + +2. `unsafe` + + If you use `unsafe`, you MUST describe a contract in your + documentation which describes how and when the unsafe code may + fail, and what expectations are made w.r.t. the interfaces to + unsafe code. This is also REQUIRED for major pieces of FFI between + C and Rust. + + When creating an FFI in Rust for C code to call, it is NOT REQUIRED + to declare the entire function `unsafe`. For example, rather than doing: + + #[no_mangle] + pub unsafe extern "C" fn increment_and_combine_numbers(mut numbers: [u8; 4]) -> u32 { + for number in &mut numbers { + *number += 1; + } + std::mem::transmute::<[u8; 4], u32>(numbers) + } + + You SHOULD instead do: + + #[no_mangle] + pub extern "C" fn increment_and_combine_numbers(mut numbers: [u8; 4]) -> u32 { + for index in 0..numbers.len() { + numbers[index] += 1; + } + unsafe { + std::mem::transmute::<[u8; 4], u32>(numbers) + } + } + +3. Pass only integer types and bytes over the boundary + + The only non-integer type which may cross the FFI boundary is + bytes, e.g. `&[u8]`. This SHOULD be done on the Rust side by + passing a pointer (`*mut libc::c_char`) and a length + (`libc::size_t`). + + One might be tempted to do this via doing + `CString::new("blah").unwrap().into_raw()`. This has several problems: + + a) If you do `CString::new("bl\x00ah")` then the unwrap() will fail + due to the additional NULL terminator, causing a dangling + pointer to be returned (as well as a potential use-after-free). + + b) Returning the raw pointer will cause the CString to run its deallocator, + which causes any C code which tries to access the contents to dereference a + NULL pointer. + + c) If we were to do `as_raw()` this would result in a potential double-free + since the Rust deallocator would run and possibly Tor's deallocator. + + d) Calling `into_raw()` without later using the same pointer in Rust to call + `from_raw()` and then deallocate in Rust can result in a + [memory leak](https://doc.rust-lang.org/std/ffi/struct.CString.html#method.into_raw). + + [It was determined](https://github.com/rust-lang/rust/pull/41074) that this + is safe to do if you use the same allocator in C and Rust and also specify + the memory alignment for CString (except that there is no way to specify + the alignment for CString). It is believed that the alignment is always 1, + which would mean it's safe to dealloc the resulting `*mut c_char` in Tor's + C code. However, the Rust developers are not willing to guarantee the + stability of, or a contract for, this behaviour, citing concerns that this + is potentially extremely and subtly unsafe. + +4. Perform an allocation on the other side of the boundary + + After crossing the boundary, the other side MUST perform an + allocation to copy the data and is therefore responsible for + freeing that memory later. + +5. No touching other language's enums + + Rust enums should never be touched from C (nor can they be safely + `#[repr(C)]`) nor vice versa: + + > "The chosen size is the default enum size for the target platform's C + > ABI. Note that enum representation in C is implementation defined, so this is + > really a "best guess". In particular, this may be incorrect when the C code + > of interest is compiled with certain flags." + + (from https://gankro.github.io/nomicon/other-reprs.html) + +6. Type safety + + Wherever possible and sensical, you SHOULD create new types in a + manner which prevents type confusion or misuse. For example, + rather than using an untyped mapping between strings and integers + like so: + + use std::collections::HashMap; + + pub fn get_elements_with_over_9000_points(map: &HashMap<String, usize>) -> Vec<String> { + ... + } + + It would be safer to define a new type, such that some other usage + of `HashMap<String, usize>` cannot be confused for this type: + + pub struct DragonBallZPowers(pub HashMap<String, usize>); + + impl DragonBallZPowers { + pub fn over_nine_thousand<'a>(&'a self) -> Vec<&'a String> { + let mut powerful_enough: Vec<&'a String> = Vec::with_capacity(5); + + for (character, power) in &self.0 { + if *power > 9000 { + powerful_enough.push(character); + } + } + powerful_enough + } + } + + Note the following code, which uses Rust's type aliasing, is valid + but it does NOT meet the desired type safety goals: + + pub type Power = usize; + + pub fn over_nine_thousand(power: &Power) -> bool { + if *power > 9000 { + return true; + } + false + } + + // We can still do the following: + let his_power: usize = 9001; + over_nine_thousand(&his_power); + +7. Unsafe mucking around with lifetimes + + Because lifetimes are technically, in type theory terms, a kind, i.e. a + family of types, individual lifetimes can be treated as types. For example, + one can arbitrarily extend and shorten lifetime using `std::mem::transmute`: + + struct R<'a>(&'a i32); + + unsafe fn extend_lifetime<'b>(r: R<'b>) -> R<'static> { + std::mem::transmute::<R<'b>, R<'static>>(r) + } + + unsafe fn shorten_invariant_lifetime<'b, 'c>(r: &'b mut R<'static>) -> &'b mut R<'c> { + std::mem::transmute::<&'b mut R<'static>, &'b mut R<'c>>(r) + } + + Calling `extend_lifetime()` would cause an `R` passed into it to live forever + for the life of the program (the `'static` lifetime). Similarly, + `shorten_invariant_lifetime()` could be used to take something meant to live + forever, and cause it to disappear! This is incredibly unsafe. If you're + going to be mucking around with lifetimes like this, first, you better have + an extremely good reason, and second, you may as be honest and explicit about + it, and for ferris' sake just use a raw pointer. + + In short, just because lifetimes can be treated like types doesn't mean you + should do it. + +8. Doing excessively unsafe things when there's a safer alternative + + Similarly to #7, often there are excessively unsafe ways to do a task and a + simpler, safer way. You MUST choose the safer option where possible. + + For example, `std::mem::transmute` can be abused in ways where casting with + `as` would be both simpler and safer: + + // Don't do this + let ptr = &0; + let ptr_num_transmute = unsafe { std::mem::transmute::<&i32, usize>(ptr)}; + + // Use an `as` cast instead + let ptr_num_cast = ptr as *const i32 as usize; + + In fact, using `std::mem::transmute` for *any* reason is a code smell and as + such SHOULD be avoided. + +9. Casting integers with `as` + + This is generally fine to do, but it has some behaviours which you should be + aware of. Casting down chops off the high bits, e.g.: + + let x: u32 = 4294967295; + println!("{}", x as u16); // prints 65535 + + Some cases which you MUST NOT do include: + + * Casting an `u128` down to an `f32` or vice versa (e.g. + `u128::MAX as f32` but this isn't only a problem with overflowing + as it is also undefined behaviour for `42.0f32 as u128`), + + * Casting between integers and floats when the thing being cast + cannot fit into the type it is being casted into, e.g.: + + println!("{}", 42949.0f32 as u8); // prints 197 in debug mode and 0 in release + println!("{}", 1.04E+17 as u8); // prints 0 in both modes + println!("{}", (0.0/0.0) as i64); // prints whatever the heck LLVM wants + + Because this behaviour is undefined, it can even produce segfaults in + safe Rust code. For example, the following program built in release + mode segfaults: + + #[inline(never)] + pub fn trigger_ub(sl: &[u8; 666]) -> &[u8] { + // Note that the float is out of the range of `usize`, invoking UB when casting. + let idx = 1e99999f64 as usize; + &sl[idx..] // The bound check is elided due to `idx` being of an undefined value. + } + + fn main() { + println!("{}", trigger_ub(&[1; 666])[999999]); // ~ out of bound + } + + And in debug mode panics with: + + thread 'main' panicked at 'slice index starts at 140721821254240 but ends at 666', /checkout/src/libcore/slice/mod.rs:754:4 diff --git a/doc/HACKING/GettingStartedRust.md b/doc/HACKING/GettingStartedRust.md new file mode 100644 index 0000000000..cfb5196aa4 --- /dev/null +++ b/doc/HACKING/GettingStartedRust.md @@ -0,0 +1,158 @@ + + Hacking on Rust in Tor +======================== + + Getting Started +----------------- + +Please read or review our documentation on Rust coding standards +(`.../doc/HACKING/CodingStandardsRust.md`) before doing anything. + +Please also read +[the Rust Code of Conduct](https://www.rust-lang.org/en-US/conduct.html). We aim +to follow the good example set by the Rust community and be excellent to one +another. Let's be careful with each other, so we can be memory-safe together! + +Next, please contact us before rewriting anything! Rust in Tor is still an +experiment. It is an experiment that we very much want to see succeed, so we're +going slowly and carefully. For the moment, it's also a completely +volunteer-driven effort: while many, if not most, of us are paid to work on Tor, +we are not yet funded to write Rust code for Tor. Please be patient with the +other people who are working on getting more Rust code into Tor, because they +are graciously donating their free time to contribute to this effort. + + Resources for learning Rust +----------------------------- + +**Beginning resources** + +The primary resource for learning Rust is +[The Book](https://doc.rust-lang.org/book/). If you'd like to start writing +Rust immediately, without waiting for anything to install, there is +[an interactive browser-based playground](https://play.rust-lang.org/). + +**Advanced resources** + +If you're interested in playing with various Rust compilers and viewing a very +nicely displayed output of the generated assembly, there is +[the Godbolt compiler explorer](https://rust.godbolt.org/) + +For learning how to write unsafe Rust, read +[The Rustonomicon](https://doc.rust-lang.org/nomicon/). + +For learning everything you ever wanted to know about Rust macros, there is +[The Little Book of Rust Macros](https://danielkeep.github.io/tlborm/book/index.html). + + Compiling Tor with Rust enabled +--------------------------------- + +You will need to run the `configure` script with the `--enable-rust` flag to +explicitly build with Rust. Additionally, you will need to specify where to +fetch Rust dependencies, as we allow for either fetching dependencies from Cargo +or specifying a local directory. + +**Fetch dependencies from Cargo** + + ./configure --enable-rust --enable-cargo-online-mode + +**Using a local dependency cache** + +**NOTE**: local dependency caches which were not *originally* created via + `--enable-cargo-online-mode` are broken. See https://bugs.torproject.org/22907 + +To specify a local directory: + + RUST_DEPENDENCIES='path_to_dependencies_directory' ./configure --enable-rust + +(Note that RUST_DEPENDENCIES must be the full path to the directory; it cannot +be relative.) + +You'll need the following Rust dependencies (as of this writing): + + libc==0.2.22 + +To get them, do: + + mkdir path_to_dependencies_directory + cd path_to_dependencies_directory + git clone https://github.com/rust-lang/libc + cd libc + git checkout 0.2.22 + cargo package + cd .. + ln -s libc/target/package/libc-0.2.22 libc-0.2.22 + + + Identifying which modules to rewrite +====================================== + +The places in the Tor codebase that are good candidates for porting to Rust are: + +1. loosely coupled to other Tor submodules, +2. have high test coverage, and +3. would benefit from being implemented in a memory safe language. + +Help in either identifying places such as this, or working to improve existing +areas of the C codebase by adding regression tests and simplifying dependencies, +would be really helpful. + +Furthermore, as submodules in C are implemented in Rust, this is a good +opportunity to refactor, add more tests, and split modules into smaller areas of +responsibility. + +A good first step is to build a module-level callgraph to understand how +interconnected your target module is. + + git clone https://git.torproject.org/user/nickm/calltool.git + cd tor + CFLAGS=0 ./configure + ../calltool/src/main.py module_callgraph + +The output will tell you each module name, along with a set of every module that +the module calls. Modules which call fewer other modules are better targets. + + Writing your Rust module +========================== + +Strive to change the C API as little as possible. + +We are currently targetting Rust nightly, *for now*. We expect this to change +moving forward, as we understand more about which nightly features we need. It +is on our TODO list to try to cultivate good standing with various distro +maintainers of `rustc` and `cargo`, in order to ensure that whatever version we +solidify on is readily available. + + Adding your Rust module to Tor's build system +----------------------------------------------- + +0. Your translation of the C module should live in its own crate(s) + in the `.../tor/src/rust/` directory. +1. Add your crate to `.../tor/src/rust/Cargo.toml`, in the + `[workspace.members]` section. +2. Append your crate's static library to the `rust_ldadd` definition + (underneath `if USE_RUST`) in `.../tor/Makefile.am`. + + How to test your Rust code +---------------------------- + +Everything should be tested full stop. Even non-public functionality. + +Be sure to edit `.../tor/src/test/test_rust.sh` to add the name of your crate to +the `crates` variable! This will ensure that `cargo test` is run on your crate. + +Configure Tor's build system to build with Rust enabled: + + ./configure --enable-fatal-warnings --enable-rust --enable-cargo-online-mode + +Tor's test should be run by doing: + + make check + +Tor's integration tests should also pass: + + make test-stem + + Submitting a patch +===================== + +Please follow the instructions in `.../doc/HACKING/GettingStarted.md`. |