diff options
Diffstat (limited to 'doc/HACKING')
-rw-r--r-- | doc/HACKING/CodeStructure.md | 129 | ||||
-rw-r--r-- | doc/HACKING/CodingStandards.md | 43 | ||||
-rw-r--r-- | doc/HACKING/CodingStandardsRust.md | 24 | ||||
-rw-r--r-- | doc/HACKING/HelpfulTools.md | 46 | ||||
-rw-r--r-- | doc/HACKING/Module.md | 111 | ||||
-rw-r--r-- | doc/HACKING/ReleasingTor.md | 4 | ||||
-rw-r--r-- | doc/HACKING/Tracing.md | 2 |
7 files changed, 337 insertions, 22 deletions
diff --git a/doc/HACKING/CodeStructure.md b/doc/HACKING/CodeStructure.md new file mode 100644 index 0000000000..736d6cd484 --- /dev/null +++ b/doc/HACKING/CodeStructure.md @@ -0,0 +1,129 @@ + +TODO: revise this to talk about how things are, rather than how things +have changed. + +TODO: Make this into good markdown. + + + +For quite a while now, the program "tor" has been built from source +code in just two directories: src/common and src/or. + +This has become more-or-less untenable, for a few reasons -- most +notably of which is that it has led our code to become more +spaghetti-ish than I can endorse with a clean conscience. + +So to fix that, we've gone and done a huge code movement in our git +master branch, which will land in a release once Tor 0.3.5.1-alpha is +out. + +Here's what we did: + + * src/common has been turned into a set of static libraries. These +all live in the "src/lib/*" directories. The dependencies between +these libraries should have no cycles. The libraries are: + + arch -- Headers to handle architectural differences + cc -- headers to handle differences among compilers + compress -- wraps zlib, zstd, lzma + container -- high-level container types + crypt_ops -- Cryptographic operations. Planning to split this into +a higher and lower level library + ctime -- Operations that need to run in constant-time. (Properly, +data-invariant time) + defs -- miscelaneous definitions needed throughout Tor. + encoding -- transforming one data type into another, and various +data types into strings. + err -- lowest-level error handling, in cases where we can't use +the logs because something that the logging system needs has broken. + evloop -- Generic event-loop handling logic + fdio -- Low-level IO wrapper functions for file descriptors. + fs -- Operations on the filesystem + intmath -- low-level integer math and misc bit-twiddling hacks + lock -- low-level locking code + log -- Tor's logging module. This library sits roughly halfway up +the library dependency diagram, since everything it depends on has to +be carefully crafted to *not* log. + malloc -- Low-level wrappers for the platform memory allocation functions. + math -- Higher-level mathematical functions, and floating-point math + memarea -- An arena allocator + meminfo -- Functions for querying the current process's memory +status and resources + net -- Networking compatibility and convenience code + osinfo -- Querying information about the operating system + process -- Launching and querying the status of other processes + sandbox -- Backend for the linux seccomp2 sandbox + smartlist_core -- The lowest-level of the smartlist_t data type. +Separated from the rest of the containers library because the logging +subsystem depends on it. + string -- Compatibility and convenience functions for manipulating +C strings. + term -- Terminal-related functions (currently limited to a getpass +function). + testsupport -- Macros for mocking, unit tests, etc. + thread -- Higher-level thread compatibility code + time -- Higher-level time management code, including format +conversions and monotonic time + tls -- Our wrapper around our TLS library + trace -- Formerly src/trace -- a generic event tracing API + wallclock -- Low-level time code, used by the log module. + + * To ensure that the dependency graph in src/common remains under +control, there is a tool that you can run called "make +check-includes". It verifies that each module in Tor only includes +the headers that it is permitted to include, using a per-directory +".may_include" file. + + * The src/or/or.h header has been split into numerous smaller +headers. Notably, many important structures are now declared in a +header called foo_st.h, where "foo" is the name of the structure. + + * The src/or directory, which had most of Tor's code, had been split +up into several directories. This is still a work in progress: This +code has not itself been refactored, and its dependency graph is still +a tangled web. I hope we'll be working on that over the coming +releases, but it will take a while to do. + + The new top-level source directories are: + + src/core -- Code necessary to actually perform or use onion routing. + src/feature -- Code used only by some onion routing +configurations, or only for a special purpose. + src/app -- Top-level code to run, invoke, and configure the +lower-level code + + The new second-level source directories are: + src/core/crypto -- High-level cryptographic protocols used in Tor + src/core/mainloop -- Tor's event loop, connection-handling, and +traffic-routing code. + src/core/or -- Parts related to handling onion routing itself + src/core/proto -- support for encoding and decoding different +wire protocols + + src/feature/api -- Support for making Tor embeddable + src/feature/client -- Functionality which only Tor clients need + src/feature/control -- Controller implementation + src/feature/dirauth -- Directory authority + src/feature/dircache -- Directory cache + src/feature/dirclient -- Directory client + src/feature/dircommon -- Shared code between the other directory modules + src/feature/hibernate -- Hibernating when Tor is out of bandwidth +or shutting down + src/feature/hs -- v3 onion service implementation + src/feature/hs_common -- shared code between both onion service +implementations + src/feature/nodelist -- storing and accessing the list of relays on +the network. + src/feature/relay -- code that only relay servers and exit servers need. + src/feature/rend -- v2 onion service implementation + src/feature/stats -- statistics and history + + src/app/config -- configuration and state for Tor + src/app/main -- Top-level functions to invoke the rest or Tor. + + * The "tor" executable is now built in src/app/tor rather than src/or/tor. + + * There are more static libraries than before that you need to build +into your application if you want to embed Tor. Rather than +maintaining this list yourself, I recommend that you run "make +show-libs" to have Tor emit a list of what you need to link. diff --git a/doc/HACKING/CodingStandards.md b/doc/HACKING/CodingStandards.md index 79a6a9f0ce..4f229348e4 100644 --- a/doc/HACKING/CodingStandards.md +++ b/doc/HACKING/CodingStandards.md @@ -42,6 +42,23 @@ If you have changed build system components: - For example, if you have changed Makefiles, autoconf files, or anything else that affects the build system. +License issues +============== + +Tor is distributed under the license terms in the LICENSE -- in +brief, the "3-clause BSD license". If you send us code to +distribute with Tor, it needs to be code that we can distribute +under those terms. Please don't send us patches unless you agree +to allow this. + +Some compatible licenses include: + + - 3-clause BSD + - 2-clause BSD + - CC0 Public Domain Dedication + + + How we use Git branches ======================= @@ -155,7 +172,6 @@ deviations from our C whitespace style. Generally, we use: - Unix-style line endings - K&R-style indentation - No space before newlines - - A blank line at the end of each file - Never more than one blank line in a row - Always spaces, never tabs - No more than 79-columns per line. @@ -168,6 +184,9 @@ deviations from our C whitespace style. Generally, we use: `puts (x)`. - Function declarations at the start of the line. +If you use an editor that has plugins for editorconfig.org, the file +`.editorconfig` will help you to conform this coding style. + We try hard to build without warnings everywhere. In particular, if you're using gcc, you should invoke the configure script with the option `--enable-fatal-warnings`. This will tell the compiler @@ -181,8 +200,8 @@ We have some wrapper functions like `tor_malloc`, `tor_free`, `tor_strdup`, and always succeed or exit.) You can get a full list of the compatibility functions that Tor provides by -looking through `src/common/util*.h` and `src/common/compat*.h`. You can see the -available containers in `src/common/containers*.h`. You should probably +looking through `src/lib/*/*.h`. You can see the +available containers in `src/lib/containers/*.h`. You should probably familiarize yourself with these modules before you write too much code, or else you'll wind up reinventing the wheel. @@ -195,6 +214,24 @@ We don't call `memcmp()` directly. Use `fast_memeq()`, `fast_memneq()`, Also see a longer list of functions to avoid in: https://people.torproject.org/~nickm/tor-auto/internal/this-not-that.html +What code can use what other code? +---------------------------------- + +We're trying to simplify Tor's structure over time. In the long run, we want +Tor to be structured as a set of modules with *no circular dependencies*. + +This property is currently provided by the modules in src/lib, but not +throughout the rest of Tor. In general, higher-level libraries may use +lower-level libraries, but never the reverse. + +To prevent new circular dependencies from landing, we have a tool that +you can invoke with `make check-includes`, and which is run +automatically as part of `make check`. This tool will verify that, for +every source directory with a `.may_include` file, no local headers are +included except those specifically permitted by the `.may_include` file. +When editing one of these files, please make sure that you are not +introducing any cycles into Tor's dependency graph. + Floating point math is hard --------------------------- diff --git a/doc/HACKING/CodingStandardsRust.md b/doc/HACKING/CodingStandardsRust.md index 7c6405e624..fc562816db 100644 --- a/doc/HACKING/CodingStandardsRust.md +++ b/doc/HACKING/CodingStandardsRust.md @@ -104,7 +104,7 @@ repo. Documentation --------------- -You MUST include `#[deny(missing_docs)]` in your crate. +You MUST include `#![deny(missing_docs)]` in your crate. For function/method comments, you SHOULD include a one-sentence, "first person" description of function behaviour (see requirements for documentation as @@ -324,12 +324,26 @@ Here are some additional bits of advice and rules: } } -3. Pass only integer types and bytes over the boundary +3. Pass only C-compatible primitive types and bytes over the boundary - The only non-integer type which may cross the FFI boundary is + Rust's C-compatible primitive types are integers and floats. + These types are declared in the [libc crate](https://doc.rust-lang.org/libc/x86_64-unknown-linux-gnu/libc/index.html#types). + Most Rust objects have different [representations](https://doc.rust-lang.org/libc/x86_64-unknown-linux-gnu/libc/index.html#types) + in C and Rust, so they can't be passed using FFI. + + Tor currently uses the following Rust primitive types from libc for FFI: + * defined-size integers: `uint32_t` + * native-sized integers: `c_int` + * native-sized floats: `c_double` + * native-sized raw pointers: `* c_void`, `* c_char`, `** c_char` + + TODO: C smartlist to Stringlist conversion using FFI + + The only non-primitive type which may cross the FFI boundary is bytes, e.g. `&[u8]`. This SHOULD be done on the Rust side by - passing a pointer (`*mut libc::c_char`) and a length - (`libc::size_t`). + passing a pointer (`*mut libc::c_char`). The length can be passed + explicitly (`libc::size_t`), or the string can be NUL-byte terminated + C string. One might be tempted to do this via doing `CString::new("blah").unwrap().into_raw()`. This has several problems: diff --git a/doc/HACKING/HelpfulTools.md b/doc/HACKING/HelpfulTools.md index f919d08ec1..d499238526 100644 --- a/doc/HACKING/HelpfulTools.md +++ b/doc/HACKING/HelpfulTools.md @@ -4,25 +4,49 @@ Useful tools These aren't strictly necessary for hacking on Tor, but they can help track down bugs. +Travis/Appveyor CI +------------------ +It's CI. + +Looks like this: +* https://travis-ci.org/torproject/tor +* https://ci.appveyor.com/project/torproject/tor + +Travis builds and runs tests on Linux, and eventually macOS (#24629). +Appveyor builds and runs tests on Windows (using Windows Services for Linux). + +Runs automatically on Pull Requests sent to torproject/tor. You can set it up +for your fork to build commits outside of PRs too: + +1. sign up for GitHub: https://github.com/join +2. fork https://github.com/torproject/tor: + https://help.github.com/articles/fork-a-repo/ +3. follow https://docs.travis-ci.com/user/getting-started/#To-get-started-with-Travis-CI. + skip steps involving `.travis.yml` (we already have one). +4. go to https://ci.appveyor.com/login , log in with github, and select + "NEW PROJECT" + +Builds should show up on the web at travis-ci.com and on IRC at #tor-ci on +OFTC. If they don't, ask #tor-dev (also on OFTC). + Jenkins ------- - https://jenkins.torproject.org +It's CI/builders. Looks like this: https://jenkins.torproject.org -Dmalloc -------- +Runs automatically on commits merged to git.torproject.org. We CI the +master branch and all supported tor versions. We also build nightly debian +packages from master. -The dmalloc library will keep track of memory allocation, so you can find out -if we're leaking memory, doing any double-frees, or so on. +Builds Linux and Windows cross-compilation. Runs Linux tests. - dmalloc -l -/dmalloc.log - (run the commands it tells you) - ./configure --with-dmalloc +Builds should show up on the web at jenkins.torproject.org and on IRC at +#tor-bots on OFTC. If they don't, ask #tor-dev (also on OFTC). Valgrind -------- - valgrind --leak-check=yes --error-limit=no --show-reachable=yes src/or/tor + valgrind --leak-check=yes --error-limit=no --show-reachable=yes src/app/tor (Note that if you get a zillion openssl warnings, you will also need to pass `--undef-value-errors=no` to valgrind, or rebuild your openssl @@ -226,10 +250,10 @@ Beforehand, install google-perftools. Now you can run Tor with profiling enabled, and use the pprof utility to look at performance! See the gperftools manual for more info, but basically: -2. Run `env CPUPROFILE=/tmp/profile src/or/tor -f <path/torrc>`. The profile file +2. Run `env CPUPROFILE=/tmp/profile src/app/tor -f <path/torrc>`. The profile file is not written to until Tor finishes execuction. -3. Run `pprof src/or/tor /tm/profile` to start the REPL. +3. Run `pprof src/app/tor /tm/profile` to start the REPL. Generating and analyzing a callgraph ------------------------------------ diff --git a/doc/HACKING/Module.md b/doc/HACKING/Module.md new file mode 100644 index 0000000000..9cf36090b4 --- /dev/null +++ b/doc/HACKING/Module.md @@ -0,0 +1,111 @@ +# Modules in Tor # + +This document describes the build system and coding standards when writing a +module in Tor. + +## What is a module? ## + +In the context of the tor code base, a module is a subsystem that we can +selectively enable or disable, at `configure` time. + +Currently, there is only one module: + + - Directory Authority subsystem (dirauth) + +It is located in its own directory in `src/feature/dirauth/`. To disable it, +one need to pass `--disable-module-dirauth` at configure time. All modules +are currently enabled by default. + +## Build System ## + +The changes to the build system are pretty straightforward. + +1. Locate in the `configure.ac` file this define: `m4_define(MODULES`. It + contains a list (white-space separated) of the module in tor. Add yours to + the list. + +2. Use the `AC_ARG_ENABLE([module-dirauth]` template for your new module. We + use the "disable module" approach instead of enabling them one by one. So, + by default, tor will build all the modules. + + This will define the `HAVE_MODULE_<name>` statement which can be used in + the C code to conditionally compile things for your module. And the + `BUILD_MODULE_<name>` is also defined for automake files (e.g: include.am). + +3. In the `src/core/include.am` file, locate the `MODULE_DIRAUTH_SOURCES` + value. You need to create your own `_SOURCES` variable for your module + and then conditionally add the it to `LIBTOR_A_SOURCES` if you should + build the module. + + It is then **very** important to add your SOURCES variable to + `src_or_libtor_testing_a_SOURCES` so the tests can build it. + +4. Do the same for header files, locate `ORHEADERS +=` which always add all + headers of all modules so the symbol can be found for the module entry + points. + +Finally, your module will automatically be included in the +`TOR_MODULES_ALL_ENABLED` variable which is used to build the unit tests. They +always build everything in order to tests everything. + +## Coding ## + +As mentioned above, a module must be isolated in its own directory (name of +the module) in `src/feature/`. + +There are couples of "rules" you want to follow: + +* Minimize as much as you can the number of entry points into your module. + Less is always better but of course that doesn't work out for every use + case. However, it is a good thing to always keep that in mind. + +* Do **not** use the `HAVE_MODULE_<name>` define outside of the module code + base. Every entry point should have a second definition if the module is + disabled. For instance: + + ``` + #ifdef HAVE_MODULE_DIRAUTH + + int sr_init(int save_to_disk); + + #else /* HAVE_MODULE_DIRAUTH */ + + static inline int + sr_init(int save_to_disk) + { + (void) save_to_disk; + return 0; + } + + #endif /* HAVE_MODULE_DIRAUTH */ + + ``` + + The main reason for this approach is to avoid having conditional code + everywhere in the code base. It should be centralized as much as possible + which helps maintainability but also avoids conditional spaghetti code + making the code much more difficult to follow/understand. + +* It is possible that you end up with code that needs to be used by the rest + of the code base but is still part of your module. As a good example, if + you look at `src/feature/shared_random_client.c`: it contains code needed + by the hidden service subsystem but mainly related to the shared random + subsystem very specific to the dirauth module. + + This is fine but try to keep it as lean as possible and never use the same + filename as the one in the module. For example, this is a bad idea and + should never be done: + + - `src/feature/dirclient/shared_random.c` + - `src/feature/dirauth/shared_random.c` + +* When you include headers from the module, **always** use the full module + path in your statement. Example: + + `#include "feature/dirauth/dirvote.h"` + + The main reason is that we do **not** add the module include path by default + so it needs to be specified. But also, it helps our human brain understand + which part comes from a module or not. + + Even **in** the module itself, use the full include path like above. diff --git a/doc/HACKING/ReleasingTor.md b/doc/HACKING/ReleasingTor.md index 6c8fa1331f..55a40fc89b 100644 --- a/doc/HACKING/ReleasingTor.md +++ b/doc/HACKING/ReleasingTor.md @@ -7,7 +7,7 @@ new Tor release: === 0. Preliminaries -1. Get at least three of weasel/arma/Sebastian/Sina to put the new +1. Get at least two of weasel/arma/Sebastian to put the new version number in their approved versions list. Give them a few days to do this if you can. @@ -34,7 +34,7 @@ new Tor release: What about Coverity Scan? - What about clan scan-build? + What about clang scan-build? Does 'make distcheck' complain? diff --git a/doc/HACKING/Tracing.md b/doc/HACKING/Tracing.md index 349aade23a..24fa761310 100644 --- a/doc/HACKING/Tracing.md +++ b/doc/HACKING/Tracing.md @@ -69,7 +69,7 @@ configure option: ## Instrument Tor ## This is pretty easy. Let's say you want to add a trace event in -`src/or/rendcache.c`, you only have to add this include statement: +`src/feature/rend/rendcache.c`, you only have to add this include statement: #include "trace/events.h" |