diff options
author | Matt Traudt <sirmatt@ksu.edu> | 2017-09-27 16:11:05 -0400 |
---|---|---|
committer | David Goulet <dgoulet@torproject.org> | 2017-09-29 11:06:31 -0400 |
commit | 3ef7e6f1870bf6bc2c26076c6abb545a576f9a42 (patch) | |
tree | 2b9759e1a993becc1fdbbefe8fa6046e870b6ba9 /src/or/scheduler_kist.c | |
parent | fc6c0b46fb2741d58e4c8afc22800c76c304b0dc (diff) | |
download | tor-3ef7e6f1870bf6bc2c26076c6abb545a576f9a42.tar.gz tor-3ef7e6f1870bf6bc2c26076c6abb545a576f9a42.zip |
sched: Don't get KIST stuck in an infinite loop
When a channel is scheduled and flush cells returns 0 that is no cells to
flush, we flag it back in waiting for cells so it doesn't get stuck in a
possible infinite loop.
It has been observed on moria1 where a closed channel end up in the scheduler
where the flush process returned 0 cells but it was ultimately kept in the
scheduling loop forever. We suspect that this is due to a more deeper problem
in tor where the channel_more_to_flush() is actually looking at the wrong
queue and was returning 1 for an empty channel thus putting the channel in the
"Case 4" of the scheduler which is to go back in pending state thus
re-considered at the next iteration.
This is a fix that allows the KIST scheduler to recover properly from a not
entirelly diagnosed problem in tor.
Fixes #23676
Signed-off-by: David Goulet <dgoulet@torproject.org>
Diffstat (limited to 'src/or/scheduler_kist.c')
-rw-r--r-- | src/or/scheduler_kist.c | 17 |
1 files changed, 16 insertions, 1 deletions
diff --git a/src/or/scheduler_kist.c b/src/or/scheduler_kist.c index 9e960cdef9..2dddd3e9a4 100644 --- a/src/or/scheduler_kist.c +++ b/src/or/scheduler_kist.c @@ -591,8 +591,23 @@ kist_scheduler_run(void) if (flush_result > 0) { update_socket_written(&socket_table, chan, flush_result * (CELL_MAX_NETWORK_SIZE + TLS_PER_CELL_OVERHEAD)); + } else { + /* XXX: This can happen because tor sometimes does flush in an + * opportunistic way cells from the circuit to the outbuf so the + * channel can end up here without having anything to flush nor needed + * to write to the kernel. Hopefully we'll fix that soon but for now + * we have to handle this case which happens kind of often. */ + log_debug(LD_SCHED, + "We didn't flush anything on a chan that we think " + "can write and wants to write. The channel's state is '%s' " + "and in scheduler state %d. We're going to mark it as " + "waiting_for_cells (as that's most likely the issue) and " + "stop scheduling it this round.", + channel_state_to_string(chan->state), + chan->scheduler_state); + chan->scheduler_state = SCHED_CHAN_WAITING_FOR_CELLS; + continue; } - /* XXX What if we didn't flush? */ } /* Decide what to do with the channel now */ |