aboutsummaryrefslogtreecommitdiff
path: root/doc/HACKING/CodingStandardsRust.md
diff options
context:
space:
mode:
Diffstat (limited to 'doc/HACKING/CodingStandardsRust.md')
-rw-r--r--doc/HACKING/CodingStandardsRust.md330
1 files changed, 180 insertions, 150 deletions
diff --git a/doc/HACKING/CodingStandardsRust.md b/doc/HACKING/CodingStandardsRust.md
index fc562816db..c821465173 100644
--- a/doc/HACKING/CodingStandardsRust.md
+++ b/doc/HACKING/CodingStandardsRust.md
@@ -1,48 +1,51 @@
+# Rust Coding Standards
- Rust Coding Standards
-=======================
-
-You MUST follow the standards laid out in `.../doc/HACKING/CodingStandards.md`,
+You MUST follow the standards laid out in `doc/HACKING/CodingStandards.md`,
where applicable.
- Module/Crate Declarations
----------------------------
+## Module/Crate Declarations
Each Tor C module which is being rewritten MUST be in its own crate.
-See the structure of `.../src/rust` for examples.
+See the structure of `src/rust` for examples.
In your crate, you MUST use `lib.rs` ONLY for pulling in external
crates (e.g. `extern crate libc;`) and exporting public objects from
other Rust modules (e.g. `pub use mymodule::foo;`). For example, if
-you create a crate in `.../src/rust/yourcrate`, your Rust code should
-live in `.../src/rust/yourcrate/yourcode.rs` and the public interface
-to it should be exported in `.../src/rust/yourcrate/lib.rs`.
+you create a crate in `src/rust/yourcrate`, your Rust code should
+live in `src/rust/yourcrate/yourcode.rs` and the public interface
+to it should be exported in `src/rust/yourcrate/lib.rs`.
If your code is to be called from Tor C code, you MUST define a safe
`ffi.rs`. See the "Safety" section further down for more details.
For example, in a hypothetical `tor_addition` Rust module:
-In `.../src/rust/tor_addition/addition.rs`:
+In `src/rust/tor_addition/addition.rs`:
- pub fn get_sum(a: i32, b: i32) -> i32 {
- a + b
- }
+```rust
+pub fn get_sum(a: i32, b: i32) -> i32 {
+ a + b
+}
+```
-In `.../src/rust/tor_addition/lib.rs`:
+In `src/rust/tor_addition/lib.rs`:
- pub use addition::*;
+```rust
+pub use addition::*;
+```
-In `.../src/rust/tor_addition/ffi.rs`:
+In `src/rust/tor_addition/ffi.rs`:
- #[no_mangle]
- pub extern "C" fn tor_get_sum(a: c_int, b: c_int) -> c_int {
- get_sum(a, b)
- }
+```rust
+#[no_mangle]
+pub extern "C" fn tor_get_sum(a: c_int, b: c_int) -> c_int {
+ get_sum(a, b)
+}
+```
If your Rust code must call out to parts of Tor's C code, you must
declare the functions you are calling in the `external` crate, located
-at `.../src/rust/external`.
+at `src/rust/external`.
<!-- XXX get better examples of how to declare these externs, when/how they -->
<!-- XXX are unsafe, what they are expected to do —isis -->
@@ -54,8 +57,7 @@ If you have any external modules as dependencies (e.g. `extern crate
libc;`), you MUST declare them in your crate's `lib.rs` and NOT in any
other module.
- Dependencies and versions
----------------------------
+## Dependencies and versions
In general, we use modules from only the Rust standard library
whenever possible. We will review including external crates on a
@@ -81,8 +83,7 @@ Currently, Tor requires that you use the latest stable Rust version. At
some point in the future, we will freeze on a given stable Rust version,
to ensure backward compatibility with stable distributions that ship it.
- Updating/Adding Dependencies
-------------------------------
+## Updating/Adding Dependencies
To add/remove/update dependencies, first add your dependencies,
exactly specifying their versions, into the appropriate *crate-level*
@@ -101,14 +102,13 @@ Next, run `/scripts/maint/updateRustDependencies.sh`. Then, go into
`src/ext/rust` and commit the changes to the `tor-rust-dependencies`
repo.
- Documentation
----------------
+## Documentation
You MUST include `#![deny(missing_docs)]` in your crate.
For function/method comments, you SHOULD include a one-sentence, "first person"
description of function behaviour (see requirements for documentation as
-described in `.../src/HACKING/CodingStandards.md`), then an `# Inputs` section
+described in `src/HACKING/CodingStandards.md`), then an `# Inputs` section
for inputs or initialisation values, a `# Returns` section for return
values/types, a `# Warning` section containing warnings for unsafe behaviours or
panics that could happen. For publicly accessible
@@ -118,14 +118,12 @@ types/constants/objects/functions/methods, you SHOULD also include an
You MUST document your module with _module docstring_ comments,
i.e. `//!` at the beginning of each line.
- Style
--------
+## Style
You SHOULD consider breaking up large literal numbers with `_` when it makes it
more human readable to do so, e.g. `let x: u64 = 100_000_000_000`.
- Testing
----------
+## Testing
All code MUST be unittested and integration tested.
@@ -134,22 +132,23 @@ describing how the function/object is expected to be used.
Integration tests SHOULD go into a `tests/` directory inside your
crate. Unittests SHOULD go into their own module inside the module
-they are testing, e.g. in `.../src/rust/tor_addition/addition.rs` you
+they are testing, e.g. in `src/rust/tor_addition/addition.rs` you
should put:
- #[cfg(test)]
- mod test {
- use super::*;
+```rust
+#[cfg(test)]
+mod test {
+ use super::*;
- #[test]
- fn addition_with_zero() {
- let sum: i32 = get_sum(5i32, 0i32);
- assert_eq!(sum, 5);
- }
+#[test]
+ fn addition_with_zero() {
+ let sum: i32 = get_sum(5i32, 0i32);
+ assert_eq!(sum, 5);
}
+}
+```
- Benchmarking
---------------
+## Benchmarking
The external `test` crate can be used for most benchmarking. However, using
this crate requires nightly Rust. Since we may want to switch to a more
@@ -160,49 +159,52 @@ benchmarks in the following manner.
If you wish to benchmark some of your Rust code, you MUST put the
following in the `[features]` section of your crate's `Cargo.toml`:
- [features]
- bench = []
+```toml
+[features]
+bench = []
+```
Next, in your crate's `lib.rs` you MUST put:
- #[cfg(all(test, feature = "bench"))]
- extern crate test;
+```rust
+#[cfg(all(test, feature = "bench"))]
+extern crate test;
+```
This ensures that the external crate `test`, which contains utilities
for basic benchmarks, is only used when running benchmarks via `cargo
bench --features bench`.
Finally, to write your benchmark code, in
-`.../src/rust/tor_addition/addition.rs` you SHOULD put:
+`src/rust/tor_addition/addition.rs` you SHOULD put:
- #[cfg(all(test, features = "bench"))]
- mod bench {
- use test::Bencher;
- use super::*;
+```rust
+#[cfg(all(test, features = "bench"))]
+mod bench {
+ use test::Bencher;
+ use super::*;
- #[bench]
- fn addition_small_integers(b: &mut Bencher) {
- b.iter(| | get_sum(5i32, 0i32));
- }
+#[bench]
+ fn addition_small_integers(b: &mut Bencher) {
+ b.iter(| | get_sum(5i32, 0i32));
}
+}
+```
- Fuzzing
----------
+## Fuzzing
If you wish to fuzz parts of your code, please see the
-[`cargo fuzz`](https://github.com/rust-fuzz/cargo-fuzz) crate, which uses
+[cargo fuzz](https://github.com/rust-fuzz/cargo-fuzz) crate, which uses
[libfuzzer-sys](https://github.com/rust-fuzz/libfuzzer-sys).
- Whitespace & Formatting
--------------------------
+## Whitespace & Formatting
You MUST run `rustfmt` (https://github.com/rust-lang-nursery/rustfmt)
on your code before your code will be merged. You can install rustfmt
by doing `cargo install rustfmt-nightly` and then run it with `cargo
fmt`.
- Safety
---------
+## Safety
You SHOULD read [the nomicon](https://doc.rust-lang.org/nomicon/) before writing
Rust FFI code. It is *highly advised* that you read and write normal Rust code
@@ -222,10 +224,10 @@ Here are some additional bits of advice and rules:
>
> * Data races
> * Dereferencing a null/dangling raw pointer
- > * Reads of [undef](http://llvm.org/docs/LangRef.html#undefined-values)
+ > * Reads of [undef](https://llvm.org/docs/LangRef.html#undefined-values)
> (uninitialized) memory
> * Breaking the
- > [pointer aliasing rules](http://llvm.org/docs/LangRef.html#pointer-aliasing-rules)
+ > [pointer aliasing rules](https://llvm.org/docs/LangRef.html#pointer-aliasing-rules)
> with raw pointers (a subset of the rules used by C)
> * `&mut T` and `&T` follow LLVM’s scoped noalias model, except if the `&T`
> contains an `UnsafeCell<U>`. Unsafe code must not violate these aliasing
@@ -256,42 +258,50 @@ Here are some additional bits of advice and rules:
or 2) should fail (i.e. in a unittest).
You SHOULD NOT use `unwrap()` anywhere in which it is possible to handle the
- potential error with either `expect()` or the eel operator, `?`.
+ potential error with the eel operator, `?` or another non panicking way.
For example, consider a function which parses a string into an integer:
- fn parse_port_number(config_string: &str) -> u16 {
- u16::from_str_radix(config_string, 10).unwrap()
- }
+ ```rust
+ fn parse_port_number(config_string: &str) -> u16 {
+ u16::from_str_radix(config_string, 10).unwrap()
+ }
+ ```
There are numerous ways this can fail, and the `unwrap()` will cause the
- whole program to byte the dust! Instead, either you SHOULD use `expect()`
+ whole program to byte the dust! Instead, either you SHOULD use `ok()`
(or another equivalent function which will return an `Option` or a `Result`)
and change the return type to be compatible:
- fn parse_port_number(config_string: &str) -> Option<u16> {
- u16::from_str_radix(config_string, 10).expect("Couldn't parse port into a u16")
- }
+ ```rust
+ fn parse_port_number(config_string: &str) -> Option<u16> {
+ u16::from_str_radix(config_string, 10).ok()
+ }
+ ```
or you SHOULD use `or()` (or another similar method):
- fn parse_port_number(config_string: &str) -> Option<u16> {
- u16::from_str_radix(config_string, 10).or(Err("Couldn't parse port into a u16")
- }
+ ```rust
+ fn parse_port_number(config_string: &str) -> Option<u16> {
+ u16::from_str_radix(config_string, 10).or(Err("Couldn't parse port into a u16")
+ }
+ ```
Using methods like `or()` can be particularly handy when you must do
something afterwards with the data, for example, if we wanted to guarantee
that the port is high. Combining these methods with the eel operator (`?`)
makes this even easier:
- fn parse_port_number(config_string: &str) -> Result<u16, Err> {
- let port = u16::from_str_radix(config_string, 10).or(Err("Couldn't parse port into a u16"))?;
+ ```rust
+ fn parse_port_number(config_string: &str) -> Result<u16, Err> {
+ let port = u16::from_str_radix(config_string, 10).or(Err("Couldn't parse port into a u16"))?;
- if port > 1024 {
- return Ok(port);
- } else {
- return Err("Low ports not allowed");
- }
+ if port > 1024 {
+ return Ok(port);
+ } else {
+ return Err("Low ports not allowed");
}
+ }
+ ```
2. `unsafe`
@@ -304,25 +314,29 @@ Here are some additional bits of advice and rules:
When creating an FFI in Rust for C code to call, it is NOT REQUIRED
to declare the entire function `unsafe`. For example, rather than doing:
- #[no_mangle]
- pub unsafe extern "C" fn increment_and_combine_numbers(mut numbers: [u8; 4]) -> u32 {
- for number in &mut numbers {
- *number += 1;
- }
- std::mem::transmute::<[u8; 4], u32>(numbers)
+ ```rust
+ #[no_mangle]
+ pub unsafe extern "C" fn increment_and_combine_numbers(mut numbers: [u8; 4]) -> u32 {
+ for number in &mut numbers {
+ *number += 1;
}
+ std::mem::transmute::<[u8; 4], u32>(numbers)
+ }
+ ```
You SHOULD instead do:
- #[no_mangle]
- pub extern "C" fn increment_and_combine_numbers(mut numbers: [u8; 4]) -> u32 {
- for index in 0..numbers.len() {
- numbers[index] += 1;
- }
- unsafe {
- std::mem::transmute::<[u8; 4], u32>(numbers)
- }
+ ```rust
+ #[no_mangle]
+ pub extern "C" fn increment_and_combine_numbers(mut numbers: [u8; 4]) -> u32 {
+ for index in 0..numbers.len() {
+ numbers[index] += 1;
}
+ unsafe {
+ std::mem::transmute::<[u8; 4], u32>(numbers)
+ }
+ }
+ ```
3. Pass only C-compatible primitive types and bytes over the boundary
@@ -397,45 +411,51 @@ Here are some additional bits of advice and rules:
rather than using an untyped mapping between strings and integers
like so:
- use std::collections::HashMap;
+ ```rust
+ use std::collections::HashMap;
- pub fn get_elements_with_over_9000_points(map: &HashMap<String, usize>) -> Vec<String> {
- ...
- }
+ pub fn get_elements_with_over_9000_points(map: &HashMap<String, usize>) -> Vec<String> {
+ ...
+ }
+ ```
It would be safer to define a new type, such that some other usage
of `HashMap<String, usize>` cannot be confused for this type:
- pub struct DragonBallZPowers(pub HashMap<String, usize>);
+ ```rust
+ pub struct DragonBallZPowers(pub HashMap<String, usize>);
- impl DragonBallZPowers {
- pub fn over_nine_thousand<'a>(&'a self) -> Vec<&'a String> {
- let mut powerful_enough: Vec<&'a String> = Vec::with_capacity(5);
+ impl DragonBallZPowers {
+ pub fn over_nine_thousand<'a>(&'a self) -> Vec<&'a String> {
+ let mut powerful_enough: Vec<&'a String> = Vec::with_capacity(5);
- for (character, power) in &self.0 {
- if *power > 9000 {
- powerful_enough.push(character);
- }
- }
- powerful_enough
- }
- }
+ for (character, power) in &self.0 {
+ if *power > 9000 {
+ powerful_enough.push(character);
+ }
+ }
+ powerful_enough
+ }
+ }
+ ```
Note the following code, which uses Rust's type aliasing, is valid
but it does NOT meet the desired type safety goals:
- pub type Power = usize;
+ ```rust
+ pub type Power = usize;
- pub fn over_nine_thousand(power: &Power) -> bool {
- if *power > 9000 {
- return true;
- }
- false
+ pub fn over_nine_thousand(power: &Power) -> bool {
+ if *power > 9000 {
+ return true;
}
+ false
+ }
- // We can still do the following:
- let his_power: usize = 9001;
- over_nine_thousand(&his_power);
+ // We can still do the following:
+ let his_power: usize = 9001;
+ over_nine_thousand(&his_power);
+ ```
7. Unsafe mucking around with lifetimes
@@ -443,15 +463,17 @@ Here are some additional bits of advice and rules:
family of types, individual lifetimes can be treated as types. For example,
one can arbitrarily extend and shorten lifetime using `std::mem::transmute`:
- struct R<'a>(&'a i32);
+ ```rust
+ struct R<'a>(&'a i32);
- unsafe fn extend_lifetime<'b>(r: R<'b>) -> R<'static> {
- std::mem::transmute::<R<'b>, R<'static>>(r)
- }
+ unsafe fn extend_lifetime<'b>(r: R<'b>) -> R<'static> {
+ std::mem::transmute::<R<'b>, R<'static>>(r)
+ }
- unsafe fn shorten_invariant_lifetime<'b, 'c>(r: &'b mut R<'static>) -> &'b mut R<'c> {
- std::mem::transmute::<&'b mut R<'static>, &'b mut R<'c>>(r)
- }
+ unsafe fn shorten_invariant_lifetime<'b, 'c>(r: &'b mut R<'static>) -> &'b mut R<'c> {
+ std::mem::transmute::<&'b mut R<'static>, &'b mut R<'c>>(r)
+ }
+ ```
Calling `extend_lifetime()` would cause an `R` passed into it to live forever
for the life of the program (the `'static` lifetime). Similarly,
@@ -472,12 +494,14 @@ Here are some additional bits of advice and rules:
For example, `std::mem::transmute` can be abused in ways where casting with
`as` would be both simpler and safer:
- // Don't do this
- let ptr = &0;
- let ptr_num_transmute = unsafe { std::mem::transmute::<&i32, usize>(ptr)};
+ ```rust
+ // Don't do this
+ let ptr = &0;
+ let ptr_num_transmute = unsafe { std::mem::transmute::<&i32, usize>(ptr)};
- // Use an `as` cast instead
- let ptr_num_cast = ptr as *const i32 as usize;
+ // Use an `as` cast instead
+ let ptr_num_cast = ptr as *const i32 as usize;
+ ```
In fact, using `std::mem::transmute` for *any* reason is a code smell and as
such SHOULD be avoided.
@@ -487,8 +511,10 @@ Here are some additional bits of advice and rules:
This is generally fine to do, but it has some behaviours which you should be
aware of. Casting down chops off the high bits, e.g.:
- let x: u32 = 4294967295;
- println!("{}", x as u16); // prints 65535
+ ```rust
+ let x: u32 = 4294967295;
+ println!("{}", x as u16); // prints 65535
+ ```
Some cases which you MUST NOT do include:
@@ -499,24 +525,28 @@ Here are some additional bits of advice and rules:
* Casting between integers and floats when the thing being cast
cannot fit into the type it is being casted into, e.g.:
- println!("{}", 42949.0f32 as u8); // prints 197 in debug mode and 0 in release
- println!("{}", 1.04E+17 as u8); // prints 0 in both modes
- println!("{}", (0.0/0.0) as i64); // prints whatever the heck LLVM wants
+ ```rust
+ println!("{}", 42949.0f32 as u8); // prints 197 in debug mode and 0 in release
+ println!("{}", 1.04E+17 as u8); // prints 0 in both modes
+ println!("{}", (0.0/0.0) as i64); // prints whatever the heck LLVM wants
+ ```
Because this behaviour is undefined, it can even produce segfaults in
safe Rust code. For example, the following program built in release
mode segfaults:
- #[inline(never)]
- pub fn trigger_ub(sl: &[u8; 666]) -> &[u8] {
- // Note that the float is out of the range of `usize`, invoking UB when casting.
- let idx = 1e99999f64 as usize;
- &sl[idx..] // The bound check is elided due to `idx` being of an undefined value.
- }
-
- fn main() {
- println!("{}", trigger_ub(&[1; 666])[999999]); // ~ out of bound
- }
+ ```rust
+ #[inline(never)]
+ pub fn trigger_ub(sl: &[u8; 666]) -> &[u8] {
+ // Note that the float is out of the range of `usize`, invoking UB when casting.
+ let idx = 1e99999f64 as usize;
+ &sl[idx..] // The bound check is elided due to `idx` being of an undefined value.
+ }
+
+ fn main() {
+ println!("{}", trigger_ub(&[1; 666])[999999]); // ~ out of bound
+ }
+ ```
And in debug mode panics with: