Diffstat (limited to 'doc/HACKING')
-rw-r--r-- doc/HACKING 447
1 files changed, 299 insertions, 148 deletions
diff --git a/doc/HACKING b/doc/HACKING
index 50b5d80d18..b612953743 100644
--- a/doc/HACKING
+++ b/doc/HACKING
@@ -1,41 +1,189 @@
+Hacking Tor: An Incomplete Guide
+================================
-0. Useful tools.
+Getting started
+---------------
-0.0 The buildbot.
+For full information on how Tor is supposed to work, look at the files in
+https://gitweb.torproject.org/torspec.git/tree
- http://tor-buildbot.freehaven.net:8010/
+For an explanation of how to change Tor's design to work differently, look at
+https://gitweb.torproject.org/torspec.git/blob_plain/HEAD:/proposals/001-process.txt
- - Down because nickm isn't running services at home any more. ioerror says
- he will resurrect it.
+For the latest version of the code, get a copy of git, and
-0.1. Useful command-lines that are non-trivial to reproduce but can
-help with tracking bugs or leaks.
+ git clone git://git.torproject.org/git/tor .
-dmalloc -l ~/dmalloc.log
-(run the commands it tells you)
-./configure --with-dmalloc
+We talk about Tor on the tor-talk mailing list. Design proposals and
+discussion belong on the tor-dev mailing list. We hang around on
+irc.oftc.net, with general discussion happening on #tor and development
+happening on #tor-dev.
+
+How we use Git branches
+-----------------------
+
+Each main development series (like 0.2.1, 0.2.2, etc) has its main work
+applied to a single branch. At most one series can be the development series
+at a time; all other series are maintenance series that get bug-fixes only.
+The development series is built in a git branch called "master"; the
+maintenance series are built in branches called "maint-0.2.0", "maint-0.2.1",
+and so on. We regularly merge the active maint branches forward.
+
+For all series except the development series, we also have a "release" branch
+(as in "release-0.2.1"). The release series is based on the corresponding
+maintenance series, except that it deliberately lags the maint series for
+most of its patches, so that bugfix patches are not typically included in a
+maintenance release until they've been tested for a while in a development
+release. Occasionally, we'll merge an urgent bugfix into the release branch
+before it gets merged into maint, but that's rare.
+
+If you're working on a bugfix for a bug that occurs in a particular version,
+base your bugfix branch on the "maint" branch for the first _actively
+developed_ series that has that bug. (Right now, that's 0.2.1.) If you're
+working on a new feature, base it on the master branch.
+
+
+How we log changes
+------------------
+
+When you do a commit that needs a ChangeLog entry, add a new file to
+the "changes" toplevel subdirectory. It should have the format of a
+one-entry changelog section from the current ChangeLog file, as in
+
+ o Major bugfixes:
+ - Fix a potential buffer overflow. Fixes bug 9999; bugfix on
+ 0.3.1.4-beta.
+
+To write a changes file, first categorize the change. Some common categories
+are: Minor bugfixes, Major bugfixes, Minor features, Major features, Code
+simplifications and refactoring. Then say what the change does. If
+it's a bugfix, mention what bug it fixes and when the bug was
+introduced. To find out which Git tag the change was introduced in,
+you can use "git describe --contains <sha1 of commit>".
+
+If at all possible, try to create this file in the same commit where
+you are making the change. Please give it a distinctive name that no
+other branch will use for the lifetime of your change.
+
+When we go to make a release, we will concatenate all the entries
+in changes to make a draft changelog, and clear the directory. We'll
+then edit the draft changelog into a nice readable format.
+
+What needs a changes file?::
+ A non-exhaustive list: Anything that might change user-visible
+ behavior. Anything that changes internals, documentation, or the build
+ system enough that somebody could notice. Big or interesting code
+ rewrites. Anything about which somebody might plausibly wonder "when
+ did that happen, and/or why did we do that" 6 months down the line.
+
+Why use changes files instead of Git commit messages?::
+ Git commit messages are written for developers, not users, and they
+ are nigh-impossible to revise after the fact.
+
+Why use changes files instead of entries in the ChangeLog?::
+ Having every single commit touch the ChangeLog file tended to create
+ zillions of merge conflicts.
+
+Useful tools
+------------
+
+These aren't strictly necessary for hacking on Tor, but they can help track
+down bugs.
+
+The buildbot
+~~~~~~~~~~~~
+
+https://buildbot.vidalia-project.net/one_line_per_build
+
+Dmalloc
+~~~~~~~
+
+The dmalloc library will keep track of memory allocation, so you can find out
+if we're leaking memory, doing any double-frees, or so on.
+
+ dmalloc -l ~/dmalloc.log
+ (run the commands it tells you)
+ ./configure --with-dmalloc
+
+Valgrind
+~~~~~~~~
valgrind --leak-check=yes --error-limit=no --show-reachable=yes src/or/tor
-0.2. Running gcov for unit test coverage
+(Note that if you get a zillion openssl warnings, you will also need to
+pass --undef-value-errors=no to valgrind, or rebuild your openssl
+with -DPURIFY.)
+Running gcov for unit test coverage
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+-----
make clean
make CFLAGS='-g -fprofile-arcs -ftest-coverage'
- ./src/or/test
+ ./src/test/test
cd src/common; gcov *.[ch]
cd ../or; gcov *.[ch]
+-----
+
+Then, look at the .gcov files. '-' before a line means that the
+compiler generated no code for that line. '######' means that the
+line was never reached. Lines with numbers were called that number
+of times.
+
+Profiling Tor with oprofile
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
- Then, look at the .gcov files. '-' before a line means that the
- compiler generated no code for that line. '######' means that the
- line was never reached. Lines with numbers were called that number
- of times.
+The oprofile tool runs (on Linux only!) to tell you what functions Tor is
+spending its CPU time in, so we can identify performance bottlenecks.
-1. Coding conventions
+Here are some basic instructions:
-1.0. Whitespace and C conformance
+ - Build tor with debugging symbols (you probably already have, unless
+ you messed with CFLAGS during the build process).
+ - Build all the libraries you care about with debugging symbols
+ (probably you only care about libssl, maybe zlib and Libevent).
+ - Copy this tor to a new directory
+ - Copy all the libraries it uses to that dir too (ldd ./tor will
+ tell you)
+ - Set LD_LIBRARY_PATH to include that dir. ldd ./tor should now
+ show you it's using the libs in that dir
+ - Run that tor
+ - Reset oprofile's counters/start it
+ * "opcontrol --reset; opcontrol --start", if Nick remembers right.
+ - After a while, have it dump the stats on tor and all the libs
+ in that dir you created.
+ * "opcontrol --dump;"
+ * "opreport -l that_dir/*"
+ - Profit
+
+
+Coding conventions
+------------------
+
+Patch checklist
+~~~~~~~~~~~~~~~
+
+If possible, send your patch as one of these (in descending order of
+preference):
+
+ - A git branch we can pull from
+ - Patches generated by git format-patch
+ - A unified diff
+
+Did you remember...
+
+ - To build your code while configured with --enable-gcc-warnings?
+ - To run "make check-spaces" on your code?
+ - To write unit tests, where possible?
+ - To base your code on the appropriate branch?
+ - To include a file in the "changes" directory as appropriate?
+
+Whitespace and C conformance
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Invoke "make check-spaces" from time to time, so it can tell you about
+deviations from our C whitespace style. Generally, we use:
- Invoke "make check-spaces" from time to time, so it can tell you about
- deviations from our C whitespace style. Generally, we use:
- Unix-style line endings
- K&R-style indentation
- No space before newlines
@@ -52,15 +200,17 @@ valgrind --leak-check=yes --error-limit=no --show-reachable=yes src/or/tor
"puts (x)".
- Function declarations at the start of the line.
- We try hard to build without warnings everywhere. In particular, if you're
- using gcc, you should invoke the configure script with the option
- "--enable-gcc-warnings". This will give a bunch of extra warning flags to
- the compiler, and help us find divergences from our preferred C style.
+We try hard to build without warnings everywhere. In particular, if you're
+using gcc, you should invoke the configure script with the option
+"--enable-gcc-warnings". This will give a bunch of extra warning flags to
+the compiler, and help us find divergences from our preferred C style.
+
+Getting emacs to edit Tor source properly
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-1.0.1. Getting emacs to edit Tor source properly.
+Nick likes to put the following snippet in his .emacs file:
- Hi, folks! Nick here. I like to put the following snippet in my .emacs
- file:
+-----
(add-hook 'c-mode-hook
(lambda ()
(font-lock-mode 1)
@@ -80,90 +230,99 @@ valgrind --leak-check=yes --error-limit=no --show-reachable=yes src/or/tor
(set-variable 'c-basic-offset 8)
(set-variable 'tab-width 8))
))))
+-----
- You'll note that it defaults to showing all trailing whitespace. The
- "cond" test detects whether the file is one of a few C free software
- projects that I often edit, and sets up the indentation level and tab
- preferences to match what they want.
+You'll note that it defaults to showing all trailing whitespace. The "cond"
+test detects whether the file is one of a few C free software projects that
+Nick often edits, and sets up the indentation level and tab preferences to
+match what they want.
- If you want to try this out, you'll need to change the filename regex
- patterns to match where you keep your Tor files.
+If you want to try this out, you'll need to change the filename regex
+patterns to match where you keep your Tor files.
- If you *only* use emacs to edit Tor, you could always just say:
+If you use emacs for editing Tor and nothing else, you could always just say:
- (add-hook 'c-mode-hook
+-----
+ (add-hook 'c-mode-hook
(lambda ()
(font-lock-mode 1)
(set-variable 'show-trailing-whitespace t)
(set-variable 'indent-tabs-mode nil)
(set-variable 'c-basic-offset 2)))
+-----
+
+There is probably a better way to do this. No, we are probably not going
+to clutter the files with emacs stuff.
- There is probably a better way to do this. No, we are probably not going
- to clutter the files with emacs stuff.
-1.1. Details
+Functions to use
+~~~~~~~~~~~~~~~~
- Use tor_malloc, tor_free, tor_strdup, and tor_gettimeofday instead of their
- generic equivalents. (They always succeed or exit.)
+We have some wrapper functions like tor_malloc, tor_free, tor_strdup, and
+tor_gettimeofday; use them instead of their generic equivalents. (They
+always succeed or exit.)
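To make "always succeed or exit" concrete, here is a minimal sketch of the
pattern. The names sketch_malloc and sketch_strdup are hypothetical stand-ins
invented for this illustration; they are not Tor's implementations, and the
real wrappers may differ in how they report the failure before exiting.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a tor_malloc-style wrapper (not Tor's code):
 * either return usable memory or terminate the process. */
void *
sketch_malloc(size_t size)
{
  void *result;
  if (size == 0)
    size = 1; /* malloc(0) may legally return NULL; avoid that ambiguity. */
  result = malloc(size);
  if (!result) {
    fprintf(stderr, "Out of memory on malloc(). Dying.\n");
    exit(1);
  }
  return result;
}

/* Hypothetical strdup-style wrapper built on sketch_malloc: callers never
 * need to check the result for NULL. */
char *
sketch_strdup(const char *s)
{
  size_t len;
  char *dup;
  assert(s);
  len = strlen(s) + 1; /* include the terminating NUL */
  dup = sketch_malloc(len);
  memcpy(dup, s, len);
  return dup;
}
```

A caller can then write "char *copy = sketch_strdup(name);" and use the result
directly, which is the main reason to prefer such wrappers: error handling for
allocation failure lives in one place instead of at every call site.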
- You can get a full list of the compatibility functions that Tor provides by
- looking through src/common/util.h and src/common/compat.h. You can see the
- available containers in src/common/containers.h. You should probably
- familiarize yourself with these modules before you write too much code,
- or else you'll wind up reinventing the wheel.
+You can get a full list of the compatibility functions that Tor provides by
+looking through src/common/util.h and src/common/compat.h. You can see the
+available containers in src/common/containers.h. You should probably
+familiarize yourself with these modules before you write too much code, or
+else you'll wind up reinventing the wheel.
- Use 'INLINE' instead of 'inline', so that we work properly on Windows.
+Use 'INLINE' instead of 'inline', so that we work properly on Windows.
-1.2. Calling and naming conventions
+Calling and naming conventions
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- Whenever possible, functions should return -1 on error and 0 on success.
+Whenever possible, functions should return -1 on error and 0 on success.
- For multi-word identifiers, use lowercase words combined with
- underscores. (e.g., "multi_word_identifier"). Use ALL_CAPS for macros and
- constants.
+For multi-word identifiers, use lowercase words combined with
+underscores (e.g., "multi_word_identifier"). Use ALL_CAPS for macros and
+constants.
- Typenames should end with "_t".
+Typenames should end with "_t".
- Function names should be prefixed with a module name or object name. (In
- general, code to manipulate an object should be a module with the same
- name as the object, so it's hard to tell which convention is used.)
+Function names should be prefixed with a module name or object name. (In
+general, code to manipulate an object should be a module with the same name
+as the object, so it's hard to tell which convention is used.)
- Functions that do things should have imperative-verb names
- (e.g. buffer_clear, buffer_resize); functions that return booleans should
- have predicate names (e.g. buffer_is_empty, buffer_needs_resizing).
+Functions that do things should have imperative-verb names
+(e.g. buffer_clear, buffer_resize); functions that return booleans should
+have predicate names (e.g. buffer_is_empty, buffer_needs_resizing).
- If you find that you have four or more possible return code values, it's
- probably time to create an enum. If you find that you are passing three or
- more flags to a function, it's probably time to create a flags argument
- that takes a bitfield.
+If you find that you have four or more possible return code values, it's
+probably time to create an enum. If you find that you are passing three or
+more flags to a function, it's probably time to create a flags argument that
+takes a bitfield.
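The conventions above can be sketched in one small module. Everything here
(counter_t, the counter_* functions, and the COUNTER_* macros) is hypothetical
and exists only to illustrate the naming and calling rules; none of it is
taken from the Tor source.

```c
#include <assert.h>

/* ALL_CAPS for macros and constants. */
#define COUNTER_MAX_VALUE 1000

/* Three options packed into one flags argument, instead of three separate
 * int parameters. */
#define COUNTER_FLAG_SATURATE    (1u<<0)
#define COUNTER_FLAG_ALLOW_ZERO  (1u<<1)
#define COUNTER_FLAG_RESET_FIRST (1u<<2)

/* Typename ends with "_t". */
typedef struct counter_t {
  int value;
} counter_t;

/* Predicate name for a function that returns a boolean. */
int
counter_is_empty(const counter_t *counter)
{
  assert(counter);
  return counter->value == 0;
}

/* Imperative-verb name, prefixed with the object name.
 * Return 0 on success and -1 on error. */
int
counter_add(counter_t *counter, int amount, unsigned flags)
{
  assert(counter);
  if (amount < 0)
    return -1;
  if (amount == 0 && !(flags & COUNTER_FLAG_ALLOW_ZERO))
    return -1;
  if (flags & COUNTER_FLAG_RESET_FIRST)
    counter->value = 0;
  if (counter->value > COUNTER_MAX_VALUE - amount) {
    /* The sum would exceed the maximum. */
    if (!(flags & COUNTER_FLAG_SATURATE))
      return -1;
    counter->value = COUNTER_MAX_VALUE;
    return 0;
  }
  counter->value += amount;
  return 0;
}
```

A call such as "counter_add(&c, 5, COUNTER_FLAG_SATURATE|COUNTER_FLAG_ALLOW_ZERO)"
reads more clearly than a call with several bare 0/1 arguments, and new
options can be added later without changing the function's signature.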
-1.3. What To Optimize
+What To Optimize
+~~~~~~~~~~~~~~~~
- Don't optimize anything if it's not in the critical path. Right now,
- the critical path seems to be AES, logging, and the network itself.
- Feel free to do your own profiling to determine otherwise.
+Don't optimize anything if it's not in the critical path. Right now, the
+critical path seems to be AES, logging, and the network itself. Feel free to
+do your own profiling to determine otherwise.
-1.4. Log conventions
+Log conventions
+~~~~~~~~~~~~~~~
- http://wiki.noreply.org/noreply/TheOnionRouter/TorFAQ#LogLevels
+https://wiki.torproject.org/noreply/TheOnionRouter/TorFAQ#LogLevels
- No error or warning messages should be expected during normal OR or OP
- operation.
+No error or warning messages should be expected during normal OR or OP
+operation.
- If a library function is currently called such that failure always
- means ERR, then the library function should log WARN and let the caller
- log ERR.
+If a library function is currently called such that failure always means ERR,
+then the library function should log WARN and let the caller log ERR.
- [XXX Proposed convention: every message of severity INFO or higher should
- either (A) be intelligible to end-users who don't know the Tor source; or
- (B) somehow inform the end-users that they aren't expected to understand
- the message (perhaps with a string like "internal error"). Option (A) is
- to be preferred to option (B). -NM]
+Every message of severity INFO or higher should either (A) be intelligible
+to end-users who don't know the Tor source; or (B) somehow inform the
+end-users that they aren't expected to understand the message (perhaps
+with a string like "internal error"). Option (A) is to be preferred to
+option (B).
-1.5. Doxygen
+Doxygen
+~~~~~~~
- We use the 'doxygen' utility to generate documentation from our
- source code. Here's how to use it:
+We use the 'doxygen' utility to generate documentation from our
+source code. Here's how to use it:
1. Begin every file that should be documented with
/**
@@ -214,11 +373,12 @@ valgrind --leak-check=yes --error-limit=no --show-reachable=yes src/or/tor
6. See the Doxygen manual for more information; this summary just
scratches the surface.
-1.5.1. Doxygen comment conventions
+Doxygen comment conventions
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
- Say what functions do as a series of one or more imperative sentences, as
- though you were telling somebody how to be the function. In other words,
- DO NOT say:
+Say what functions do as a series of one or more imperative sentences, as
+though you were telling somebody how to be the function. In other words, DO
+NOT say:
/** The strtol function parses a number.
*
@@ -230,7 +390,7 @@ valgrind --leak-check=yes --error-limit=no --show-reachable=yes src/or/tor
*/
long strtol(const char *nptr, char **endptr, int base);
- Instead, please DO say:
+Instead, please DO say:
/** Parse a number in radix <b>base</b> from the string <b>nptr</b>,
* and return the result. Skip all leading whitespace. If
@@ -239,66 +399,57 @@ valgrind --leak-check=yes --error-limit=no --show-reachable=yes src/or/tor
**/
long strtol(const char *nptr, char **endptr, int base);
- Doxygen comments are the contract in our abstraction-by-contract world: if
- the functions that call your function rely on it doing something, then your
- function should mention that it does that something in the documentation.
- If you rely on a function doing something beyond what is in its
- documentation, then you should watch out, or it might do something else
- later.
-
-2. Code notes
-
-2.1. Dataflows
-
-2.1.1. How Incoming data is handled
-
-There are two paths for data arriving at Tor over the network: regular
-TCP data, and DNS.
-
-2.1.1.1. TCP.
-
-When Tor takes information over the network, it uses the functions
-read_to_buf() and read_to_buf_tls() in buffers.c. These read from a
-socket or an SSL* into a buffer_t, which is an mbuf-style linkedlist
-of memory chunks.
-
-read_to_buf() and read_to_buf_tls() are called only from
-connection_read_to_buf() in connection.c. It takes a connection_t
-pointer, and reads data into it over the network, up to the
-connection's current bandwidth limits. It places that data into the
-"inbuf" field of the connection, and then:
- - Adjusts the connection's want-to-read/want-to-write status as
- appropriate.
- - Increments the read and written counts for the connection as
- appropriate.
- - Adjusts bandwidth buckets as appropriate.
-
-connection_read_to_buf() is called only from connection_handle_read().
-The connection_handle_read() function is called whenever libevent
-decides (based on select, poll, epoll, kqueue, etc) that there is data
-to read from a connection. If any data is read,
-connection_handle_read() calls connection_process_inbuf() to see if
-any of the data can be processed. If the connection was closed,
-connection_handle_read() calls connection_reached_eof().
-
-Connection_process_inbuf() and connection_reached_eof() both dispatch
-based on the connection type to determine what to do with the data
-that's just arrived on the connection's inbuf field. Each type of
-connection has its own version of these functions. For example,
-directory connections process incoming data in
-connection_dir_process_inbuf(), while OR connections process incoming
-data in connection_or_process_inbuf(). These
-connection_*_process_inbuf() functions extract data from the
-connection's inbuf field (a buffer_t), using functions from buffers.c.
-Some of these accessor functions are straightforward data extractors
-(like fetch_from_buf()); others do protocol-specific parsing.
-
-
-2.1.1.2. DNS
-
-Tor launches (and optionally accepts) DNS requests using the code in
-eventdns.c, which is a copy of libevent's evdns.c. (We don't use
-libevent's version because it is not yet in the versions of libevent
-all our users have.) DNS replies are read in nameserver_read();
-DNS queries are read in server_port_read().
+Doxygen comments are the contract in our abstraction-by-contract world: if
+the functions that call your function rely on it doing something, then your
+function should mention that it does that something in the documentation. If
+you rely on a function doing something beyond what is in its documentation,
+then you should watch out, or it might do something else later.
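As a small illustration of a comment acting as the contract, consider the
sketch below. The function sketch_copy_string is hypothetical and was written
only for this example. Its comment uses imperative sentences and states the
two things callers are likely to rely on: the destination is always
NUL-terminated, and truncation is reported through the return value.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/** Copy at most <b>dest_len</b>-1 bytes of <b>src</b> into <b>dest</b>, and
 * always NUL-terminate <b>dest</b>. Return 0 if all of <b>src</b> fit, and
 * -1 if it was truncated. Require that <b>dest_len</b> is at least 1.
 *
 * (Hypothetical helper, shown only to illustrate the comment style.)
 **/
int
sketch_copy_string(char *dest, size_t dest_len, const char *src)
{
  size_t src_len;
  assert(dest && src && dest_len >= 1);
  src_len = strlen(src);
  if (src_len >= dest_len) {
    /* Not enough room: copy what fits and terminate explicitly. */
    memcpy(dest, src, dest_len - 1);
    dest[dest_len - 1] = '\0';
    return -1;
  }
  memcpy(dest, src, src_len + 1); /* includes the terminating NUL */
  return 0;
}
```

Because the NUL-termination guarantee is written down, a caller may pass the
result straight to strlen() or printf(). Had the comment been silent on that
point, relying on it would be exactly the kind of undocumented assumption the
paragraph above warns about.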
+
+Putting out a new release
+-------------------------
+
+Here are the steps Roger takes when putting out a new Tor release:
+
+1) Use it for a while, as a client, as a relay, as a hidden service,
+and as a directory authority. See if it has any obvious bugs, and
+resolve those.
+
+2) Gather the changes/* files into a changelog entry, rewriting many
+of them and reordering to focus on what users and funders would find
+interesting and understandable.
+
+3) Compose a short release blurb to highlight the user-facing
+changes. Insert said release blurb into the ChangeLog stanza. If it's
+a stable release, add it to the ReleaseNotes file too. If we're adding
+to a release-0.2.x branch, manually commit the changelogs to the later
+git branches too.
+
+4) Bump the version number in configure.in and rebuild.
+
+5) Make dist, put the tarball up somewhere, and tell #tor about it. Wait
+a while to see if anybody has problems building it. Try to get Sebastian
+or somebody to try building it on Windows.
+
+6) Get at least two of weasel/arma/karsten to put the new version number
+in their approved versions list.
+
+7) Sign and push the tarball to the website in the dist/ directory. Sign
+and push the git tag.
+
+8) Edit include/versions.wmi to note the new version. Rebuild and push
+the website.
+
+9) Email Erinn and weasel (cc'ing tor-assistants) that a new tarball
+is up. This step should probably change to mailing more packagers.
+
+10) Add the version number to Trac. To do this, go to Trac, log in,
+select "Admin" near the top of the screen, then select "Versions" from
+the menu on the left. At the right, there will be an "Add version"
+box. By convention, we enter the version in the form "Tor:
+0.2.2.23-alpha" (or whatever the version is), and we select the date as
+the date in the ChangeLog.
+
+11) Wait up to a day or two (for a development release), or until most
+packages are up (for a stable release), and mail the release blurb and
+changelog to tor-talk or tor-announce.