summaryrefslogtreecommitdiff
path: root/doc/HACKING/CodingStandardsRust.md
diff options
context:
space:
mode:
Diffstat (limited to 'doc/HACKING/CodingStandardsRust.md')
-rw-r--r--doc/HACKING/CodingStandardsRust.md262
1 files changed, 152 insertions, 110 deletions
diff --git a/doc/HACKING/CodingStandardsRust.md b/doc/HACKING/CodingStandardsRust.md
index 97026c9b7c..fb2c93bbea 100644
--- a/doc/HACKING/CodingStandardsRust.md
+++ b/doc/HACKING/CodingStandardsRust.md
@@ -22,20 +22,26 @@ For example, in a hypothetical `tor_addition` Rust module:
In `src/rust/tor_addition/addition.rs`:
- pub fn get_sum(a: i32, b: i32) -> i32 {
- a + b
- }
+```rust
+pub fn get_sum(a: i32, b: i32) -> i32 {
+ a + b
+}
+```
In `src/rust/tor_addition/lib.rs`:
- pub use addition::*;
+```rust
+pub use addition::*;
+```
In `src/rust/tor_addition/ffi.rs`:
- #[no_mangle]
- pub extern "C" fn tor_get_sum(a: c_int, b: c_int) -> c_int {
- get_sum(a, b)
- }
+```rust
+#[no_mangle]
+pub extern "C" fn tor_get_sum(a: c_int, b: c_int) -> c_int {
+ get_sum(a, b)
+}
+```
If your Rust code must call out to parts of Tor's C code, you must
declare the functions you are calling in the `external` crate, located
@@ -129,16 +135,18 @@ crate. Unittests SHOULD go into their own module inside the module
they are testing, e.g. in `src/rust/tor_addition/addition.rs` you
should put:
- #[cfg(test)]
- mod test {
- use super::*;
+```rust
+#[cfg(test)]
+mod test {
+ use super::*;
- #[test]
- fn addition_with_zero() {
- let sum: i32 = get_sum(5i32, 0i32);
- assert_eq!(sum, 5);
- }
+#[test]
+ fn addition_with_zero() {
+ let sum: i32 = get_sum(5i32, 0i32);
+ assert_eq!(sum, 5);
}
+}
+```
## Benchmarking
@@ -151,13 +159,17 @@ benchmarks in the following manner.
If you wish to benchmark some of your Rust code, you MUST put the
following in the `[features]` section of your crate's `Cargo.toml`:
- [features]
- bench = []
+```toml
+[features]
+bench = []
+```
Next, in your crate's `lib.rs` you MUST put:
- #[cfg(all(test, feature = "bench"))]
- extern crate test;
+```rust
+#[cfg(all(test, feature = "bench"))]
+extern crate test;
+```
This ensures that the external crate `test`, which contains utilities
for basic benchmarks, is only used when running benchmarks via `cargo
@@ -166,16 +178,18 @@ bench --features bench`.
Finally, to write your benchmark code, in
`src/rust/tor_addition/addition.rs` you SHOULD put:
- #[cfg(all(test, features = "bench"))]
- mod bench {
- use test::Bencher;
- use super::*;
+```rust
+#[cfg(all(test, features = "bench"))]
+mod bench {
+ use test::Bencher;
+ use super::*;
- #[bench]
- fn addition_small_integers(b: &mut Bencher) {
- b.iter(| | get_sum(5i32, 0i32));
- }
+#[bench]
+ fn addition_small_integers(b: &mut Bencher) {
+ b.iter(| | get_sum(5i32, 0i32));
}
+}
+```
## Fuzzing
@@ -247,39 +261,47 @@ Here are some additional bits of advice and rules:
potential error with the eel operator, `?` or another non panicking way.
For example, consider a function which parses a string into an integer:
- fn parse_port_number(config_string: &str) -> u16 {
- u16::from_str_radix(config_string, 10).unwrap()
- }
+ ```rust
+ fn parse_port_number(config_string: &str) -> u16 {
+ u16::from_str_radix(config_string, 10).unwrap()
+ }
+ ```
There are numerous ways this can fail, and the `unwrap()` will cause the
whole program to byte the dust! Instead, either you SHOULD use `ok()`
(or another equivalent function which will return an `Option` or a `Result`)
and change the return type to be compatible:
- fn parse_port_number(config_string: &str) -> Option<u16> {
- u16::from_str_radix(config_string, 10).ok()
- }
+ ```rust
+ fn parse_port_number(config_string: &str) -> Option<u16> {
+ u16::from_str_radix(config_string, 10).ok()
+ }
+ ```
or you SHOULD use `or()` (or another similar method):
- fn parse_port_number(config_string: &str) -> Option<u16> {
- u16::from_str_radix(config_string, 10).or(Err("Couldn't parse port into a u16")
- }
+ ```rust
+ fn parse_port_number(config_string: &str) -> Option<u16> {
+ u16::from_str_radix(config_string, 10).or(Err("Couldn't parse port into a u16")
+ }
+ ```
Using methods like `or()` can be particularly handy when you must do
something afterwards with the data, for example, if we wanted to guarantee
that the port is high. Combining these methods with the eel operator (`?`)
makes this even easier:
- fn parse_port_number(config_string: &str) -> Result<u16, Err> {
- let port = u16::from_str_radix(config_string, 10).or(Err("Couldn't parse port into a u16"))?;
+ ```rust
+ fn parse_port_number(config_string: &str) -> Result<u16, Err> {
+ let port = u16::from_str_radix(config_string, 10).or(Err("Couldn't parse port into a u16"))?;
- if port > 1024 {
- return Ok(port);
- } else {
- return Err("Low ports not allowed");
- }
+ if port > 1024 {
+ return Ok(port);
+ } else {
+ return Err("Low ports not allowed");
}
+ }
+ ```
2. `unsafe`
@@ -292,25 +314,29 @@ Here are some additional bits of advice and rules:
When creating an FFI in Rust for C code to call, it is NOT REQUIRED
to declare the entire function `unsafe`. For example, rather than doing:
- #[no_mangle]
- pub unsafe extern "C" fn increment_and_combine_numbers(mut numbers: [u8; 4]) -> u32 {
- for number in &mut numbers {
- *number += 1;
- }
- std::mem::transmute::<[u8; 4], u32>(numbers)
+ ```rust
+ #[no_mangle]
+ pub unsafe extern "C" fn increment_and_combine_numbers(mut numbers: [u8; 4]) -> u32 {
+ for number in &mut numbers {
+ *number += 1;
}
+ std::mem::transmute::<[u8; 4], u32>(numbers)
+ }
+ ```
You SHOULD instead do:
- #[no_mangle]
- pub extern "C" fn increment_and_combine_numbers(mut numbers: [u8; 4]) -> u32 {
- for index in 0..numbers.len() {
- numbers[index] += 1;
- }
- unsafe {
- std::mem::transmute::<[u8; 4], u32>(numbers)
- }
+ ```rust
+ #[no_mangle]
+ pub extern "C" fn increment_and_combine_numbers(mut numbers: [u8; 4]) -> u32 {
+ for index in 0..numbers.len() {
+ numbers[index] += 1;
}
+ unsafe {
+ std::mem::transmute::<[u8; 4], u32>(numbers)
+ }
+ }
+ ```
3. Pass only C-compatible primitive types and bytes over the boundary
@@ -385,45 +411,51 @@ Here are some additional bits of advice and rules:
rather than using an untyped mapping between strings and integers
like so:
- use std::collections::HashMap;
+ ```rust
+ use std::collections::HashMap;
- pub fn get_elements_with_over_9000_points(map: &HashMap<String, usize>) -> Vec<String> {
- ...
- }
+ pub fn get_elements_with_over_9000_points(map: &HashMap<String, usize>) -> Vec<String> {
+ ...
+ }
+ ```
It would be safer to define a new type, such that some other usage
of `HashMap<String, usize>` cannot be confused for this type:
- pub struct DragonBallZPowers(pub HashMap<String, usize>);
+ ```rust
+ pub struct DragonBallZPowers(pub HashMap<String, usize>);
- impl DragonBallZPowers {
- pub fn over_nine_thousand<'a>(&'a self) -> Vec<&'a String> {
- let mut powerful_enough: Vec<&'a String> = Vec::with_capacity(5);
+ impl DragonBallZPowers {
+ pub fn over_nine_thousand<'a>(&'a self) -> Vec<&'a String> {
+ let mut powerful_enough: Vec<&'a String> = Vec::with_capacity(5);
- for (character, power) in &self.0 {
- if *power > 9000 {
- powerful_enough.push(character);
- }
- }
- powerful_enough
- }
- }
+ for (character, power) in &self.0 {
+ if *power > 9000 {
+ powerful_enough.push(character);
+ }
+ }
+ powerful_enough
+ }
+ }
+ ```
Note the following code, which uses Rust's type aliasing, is valid
but it does NOT meet the desired type safety goals:
- pub type Power = usize;
+ ```rust
+ pub type Power = usize;
- pub fn over_nine_thousand(power: &Power) -> bool {
- if *power > 9000 {
- return true;
- }
- false
+ pub fn over_nine_thousand(power: &Power) -> bool {
+ if *power > 9000 {
+ return true;
}
+ false
+ }
- // We can still do the following:
- let his_power: usize = 9001;
- over_nine_thousand(&his_power);
+ // We can still do the following:
+ let his_power: usize = 9001;
+ over_nine_thousand(&his_power);
+ ```
7. Unsafe mucking around with lifetimes
@@ -431,15 +463,17 @@ Here are some additional bits of advice and rules:
family of types, individual lifetimes can be treated as types. For example,
one can arbitrarily extend and shorten lifetime using `std::mem::transmute`:
- struct R<'a>(&'a i32);
+ ```rust
+ struct R<'a>(&'a i32);
- unsafe fn extend_lifetime<'b>(r: R<'b>) -> R<'static> {
- std::mem::transmute::<R<'b>, R<'static>>(r)
- }
+ unsafe fn extend_lifetime<'b>(r: R<'b>) -> R<'static> {
+ std::mem::transmute::<R<'b>, R<'static>>(r)
+ }
- unsafe fn shorten_invariant_lifetime<'b, 'c>(r: &'b mut R<'static>) -> &'b mut R<'c> {
- std::mem::transmute::<&'b mut R<'static>, &'b mut R<'c>>(r)
- }
+ unsafe fn shorten_invariant_lifetime<'b, 'c>(r: &'b mut R<'static>) -> &'b mut R<'c> {
+ std::mem::transmute::<&'b mut R<'static>, &'b mut R<'c>>(r)
+ }
+ ```
Calling `extend_lifetime()` would cause an `R` passed into it to live forever
for the life of the program (the `'static` lifetime). Similarly,
@@ -460,12 +494,14 @@ Here are some additional bits of advice and rules:
For example, `std::mem::transmute` can be abused in ways where casting with
`as` would be both simpler and safer:
- // Don't do this
- let ptr = &0;
- let ptr_num_transmute = unsafe { std::mem::transmute::<&i32, usize>(ptr)};
+ ```rust
+ // Don't do this
+ let ptr = &0;
+ let ptr_num_transmute = unsafe { std::mem::transmute::<&i32, usize>(ptr)};
- // Use an `as` cast instead
- let ptr_num_cast = ptr as *const i32 as usize;
+ // Use an `as` cast instead
+ let ptr_num_cast = ptr as *const i32 as usize;
+ ```
In fact, using `std::mem::transmute` for *any* reason is a code smell and as
such SHOULD be avoided.
@@ -475,8 +511,10 @@ Here are some additional bits of advice and rules:
This is generally fine to do, but it has some behaviours which you should be
aware of. Casting down chops off the high bits, e.g.:
- let x: u32 = 4294967295;
- println!("{}", x as u16); // prints 65535
+ ```rust
+ let x: u32 = 4294967295;
+ println!("{}", x as u16); // prints 65535
+ ```
Some cases which you MUST NOT do include:
@@ -487,24 +525,28 @@ Here are some additional bits of advice and rules:
* Casting between integers and floats when the thing being cast
cannot fit into the type it is being casted into, e.g.:
- println!("{}", 42949.0f32 as u8); // prints 197 in debug mode and 0 in release
- println!("{}", 1.04E+17 as u8); // prints 0 in both modes
- println!("{}", (0.0/0.0) as i64); // prints whatever the heck LLVM wants
+ ```rust
+ println!("{}", 42949.0f32 as u8); // prints 197 in debug mode and 0 in release
+ println!("{}", 1.04E+17 as u8); // prints 0 in both modes
+ println!("{}", (0.0/0.0) as i64); // prints whatever the heck LLVM wants
+ ```
Because this behaviour is undefined, it can even produce segfaults in
safe Rust code. For example, the following program built in release
mode segfaults:
- #[inline(never)]
- pub fn trigger_ub(sl: &[u8; 666]) -> &[u8] {
- // Note that the float is out of the range of `usize`, invoking UB when casting.
- let idx = 1e99999f64 as usize;
- &sl[idx..] // The bound check is elided due to `idx` being of an undefined value.
- }
-
- fn main() {
- println!("{}", trigger_ub(&[1; 666])[999999]); // ~ out of bound
- }
+ ```rust
+ #[inline(never)]
+ pub fn trigger_ub(sl: &[u8; 666]) -> &[u8] {
+ // Note that the float is out of the range of `usize`, invoking UB when casting.
+ let idx = 1e99999f64 as usize;
+ &sl[idx..] // The bound check is elided due to `idx` being of an undefined value.
+ }
+
+ fn main() {
+ println!("{}", trigger_ub(&[1; 666])[999999]); // ~ out of bound
+ }
+ ```
And in debug mode panics with: