From 03b48352c6b8416faee50e769bf226b06958e253 Mon Sep 17 00:00:00 2001 From: "Rob G. Jansen" Date: Tue, 28 Feb 2012 18:19:49 -0500 Subject: Fix busy Libevent loops (infinite loops in Shadow) There is a bug causing busy loops in Libevent and infinite loops in the Shadow simulator. A connection that is marked for close, wants to flush, is held open to flush, but is rate limited (the token bucket is empty) triggers the bug. This commit fixes the bug. Details are below. This currently happens on read and write callbacks when the active socket is marked for close. In this case, Tor doesn't actually try to complete the read or write (it returns from those methods when marked), but instead tries to clear the connection with conn_close_if_marked(). Tor will not close a marked connection that contains data: it must be flushed first. The bug occurs when this flush operation on the marked connection can not occur because the connection is rate-limited (its write token bucket is empty). The fix is to detect when rate limiting is preventing a marked connection from properly flushing. In this case, it should be flagged as read/write_blocked_on_bandwidth and the read/write events de-registered from Libevent. When the token bucket gets refilled, it will check the associated read/write_blocked_on_bandwidth flag, and add the read/write event back to Libevent, which will cause it to fire. This time, it will be properly flushed and closed. The reason that both read and write events are both de-registered when the marked connection can not flush is because both result in the same behavior. Both read/write events on marked connections will never again do any actual reads/writes, and are only useful to trigger the flush and close the connection. By setting the associated read/write_blocked_on_bandwidth flag, we ensure that the event will get added back to Libevent, properly flushed, and closed. Why is this important? Every Shadow event occurs at a discrete time instant. If Tor does not properly deregister Libevent events that fire but result in Tor essentially doing nothing, Libevent will repeatedly fire the event. In Shadow this means infinite loop, outside of Shadow this means wasted CPU cycles. --- src/or/main.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/src/or/main.c b/src/or/main.c index 163dd5d0fb..a5ad1d9648 100644 --- a/src/or/main.c +++ b/src/or/main.c @@ -848,6 +848,20 @@ conn_close_if_marked(int i) "Holding conn (fd %d) open for more flushing.", (int)conn->s)); conn->timestamp_lastwritten = now; /* reset so we can flush more */ + } else if (sz == 0) { /* retval is also 0 */ + /* Connection must flush before closing, but it's being rate-limited. + Let's remove from Libevent, and mark it as blocked on bandwidth so it + will be re-added on next token bucket refill. Prevents busy Libevent + loops where we keep ending up here and returning 0 until we are no + longer blocked on bandwidth. */ + if (connection_is_reading(conn)) { + conn->read_blocked_on_bw = 1; + connection_stop_reading(conn); + } + if (connection_is_writing(conn)) { + conn->write_blocked_on_bw = 1; + connection_stop_writing(conn); + } } return 0; } -- cgit v1.2.3-54-g00ecf From 9282c88998f80a027b5260472bf21125693a15bc Mon Sep 17 00:00:00 2001 From: Nick Mathewson Date: Tue, 15 May 2012 10:22:17 -0400 Subject: Add rate-limited log message to bug5263 fix Initially I said, "I claim that we shouldn't be reading and marked; let's see if I'm right." But Rob finds that it does. --- src/or/main.c | 18 ++++++++++++++---- 1 file changed, 14 insertions(+), 4 deletions(-) diff --git a/src/or/main.c b/src/or/main.c index a5ad1d9648..3e913d968d 100644 --- a/src/or/main.c +++ b/src/or/main.c @@ -854,14 +854,24 @@ conn_close_if_marked(int i) will be re-added on next token bucket refill. Prevents busy Libevent loops where we keep ending up here and returning 0 until we are no longer blocked on bandwidth. */ - if (connection_is_reading(conn)) { - conn->read_blocked_on_bw = 1; - connection_stop_reading(conn); - } if (connection_is_writing(conn)) { conn->write_blocked_on_bw = 1; connection_stop_writing(conn); } + if (connection_is_reading(conn)) { +#define MARKED_READING_RATE 180 + static ratelim_t marked_read_lim = RATELIM_INIT(MARKED_READING_RATE); + char *m; + if ((m = rate_limit_log(&marked_read_lim, now))) { + log_warn(LD_BUG, "Marked connection (fd %d, type %s, state %s) " + "is still reading; that shouldn't happen.%s", + (int)conn->s, conn_type_to_string(conn->type), + conn_state_to_string(conn->type, conn->state), m); + tor_free(m); + } + conn->read_blocked_on_bw = 1; + connection_stop_reading(conn); + } } return 0; } -- cgit v1.2.3-54-g00ecf From 9c588730598c8da6c8ad8f0829e443235699a37d Mon Sep 17 00:00:00 2001 From: Nick Mathewson Date: Tue, 5 Jun 2012 12:11:08 -0400 Subject: Add changes file for bug5263 --- changes/bug5263 | 5 +++++ 1 file changed, 5 insertions(+) create mode 100644 changes/bug5263 diff --git a/changes/bug5263 b/changes/bug5263 new file mode 100644 index 0000000000..25793f141b --- /dev/null +++ b/changes/bug5263 @@ -0,0 +1,5 @@ + o Minor bugfixes: + - Disable writing on marked-for-close connections when they are + blocked on bandwidth, to prevent busy-looping in Libevent. Fixes + bug 5263; bugfix on 0.0.2pre13, where we first added a special + case for flushing marked connections. -- cgit v1.2.3-54-g00ecf