author | Michael Pratt <mpratt@google.com> | 2020-10-05 18:12:35 -0400 |
---|---|---|
committer | Michael Pratt <mpratt@google.com> | 2020-10-28 17:54:13 +0000 |
commit | fc116b69e2004c159d0f2563c6e91ac75a79f872 (patch) | |
tree | 9eff578e2c87dfd496aa00a9ca2ce869e689ed1e /src/runtime/runtime2.go | |
parent | 642329fdd55aabafc67b3a7c50902e29125621ab (diff) | |
runtime: try to elide timer stealing if P has no timers

Following golang.org/cl/259578, findrunnable still must touch every
other P in checkTimers in order to look for timers to steal. This scales
poorly with GOMAXPROCS and potentially performs poorly by pulling remote
Ps into cache.

Add timerpMask, a bitmask that tracks whether each P may have any timers
on its timer heap.
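
To make the "one bit per P, atomic reads and writes" idea concrete, here is a minimal standalone sketch of such a bitmask. It is not the runtime's implementation: the name newPMask and the compare-and-swap loops are assumptions chosen so the example builds outside the runtime using only sync/atomic.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// pMask is an illustrative bitmask with one bit per P, packed 32 bits
// per word. This is a sketch, not the runtime's own type.
type pMask []uint32

// newPMask (hypothetical helper) allocates enough words for nprocs bits.
func newPMask(nprocs int) pMask {
	return make(pMask, (nprocs+31)/32)
}

// read reports whether the bit for P id is set.
func (m pMask) read(id uint32) bool {
	word, bit := id/32, uint32(1)<<(id%32)
	return atomic.LoadUint32(&m[word])&bit != 0
}

// set atomically sets the bit for P id, retrying if another
// goroutine changed the same word in the meantime.
func (m pMask) set(id uint32) {
	word, bit := id/32, uint32(1)<<(id%32)
	for {
		old := atomic.LoadUint32(&m[word])
		if old&bit != 0 || atomic.CompareAndSwapUint32(&m[word], old, old|bit) {
			return
		}
	}
}

// clear atomically clears the bit for P id.
func (m pMask) clear(id uint32) {
	word, bit := id/32, uint32(1)<<(id%32)
	for {
		old := atomic.LoadUint32(&m[word])
		if old&bit == 0 || atomic.CompareAndSwapUint32(&m[word], old, old&^bit) {
			return
		}
	}
}

func main() {
	m := newPMask(64)
	m.set(3)
	m.set(40)
	fmt.Println(m.read(3), m.read(4), m.read(40)) // true false true
	m.clear(3)
	fmt.Println(m.read(3), m.read(40)) // false true
}
```

Because several Ps share each word, every update has to be atomic even though each P only ever changes its own bit.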

Ideally we would update this field on any timer add / remove to always
keep it up to date. Unfortunately, updating a shared global structure is
antithetical to sharding timers by P, and doing so approximately doubles
the cost of addtimer / deltimer in microbenchmarks.

Instead we only (potentially) clear the mask when the P goes idle. This
covers the best case of avoiding looking at a P _at all_ when it is idle
and has no timers. See the comment on updateTimerPMask for more details
on the trade-off. Future CLs may be able to expand the cases in which we
can avoid looking at the timers.
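
The policy described above can be modeled in a few lines. This is a single-threaded sketch only: the proc and sched types, the method names, and the plain bool slice standing in for the bitmask are all hypothetical, atomicity and locking are ignored, and setting the bit again on wake-up is an assumption about how the bit gets restored, since the message only describes the clearing side.

```go
package main

import "fmt"

// proc is a hypothetical stand-in for a P; only its timer count matters.
type proc struct {
	id     int
	timers int
}

// sched holds the Ps plus a bool slice standing in for timerpMask.
type sched struct {
	procs         []*proc
	mayHaveTimers []bool
}

// newSched starts every P with its bit set, since a running P may
// add a timer at any moment.
func newSched(n int) *sched {
	s := &sched{mayHaveTimers: make([]bool, n)}
	for i := 0; i < n; i++ {
		s.procs = append(s.procs, &proc{id: i})
		s.mayHaveTimers[i] = true
	}
	return s
}

// goIdle clears the bit only if the P has no timers. Timer add and
// remove never touch the shared mask, which keeps them cheap.
func (s *sched) goIdle(p *proc) {
	if p.timers == 0 {
		s.mayHaveTimers[p.id] = false
	}
}

// wake conservatively sets the bit again (assumed behavior).
func (s *sched) wake(p *proc) {
	s.mayHaveTimers[p.id] = true
}

// stealCandidates lists the Ps a stealer would still need to inspect.
func (s *sched) stealCandidates(self int) []int {
	var ids []int
	for _, p := range s.procs {
		if p.id != self && s.mayHaveTimers[p.id] {
			ids = append(ids, p.id)
		}
	}
	return ids
}

func main() {
	s := newSched(4)
	s.procs[1].timers++  // P1 has a pending timer
	s.goIdle(s.procs[1]) // bit stays set: P1 still has a timer
	s.goIdle(s.procs[2]) // bit cleared: P2 is idle with no timers
	fmt.Println(s.stealCandidates(0)) // [1 3]
	s.wake(s.procs[2])
	fmt.Println(s.stealCandidates(0)) // [1 2 3]
}
```

The mask is deliberately conservative: a set bit means "may have timers", so a stealer can skip a P only when its bit is clear.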

Note that the addition of idlepMask to p.init is a no-op. The zero value
of the mask is the correct init value so it is not necessary, but it is
included for clarity.

Benchmark results from WakeupParallel/syscall/pair/race/1ms (see
golang.org/cl/228577). Note that these are on top of golang.org/cl/259578:

name                    old msec      new msec      delta
Perf-task-clock-8       244 ± 4%      246 ± 4%         ~     (p=0.841 n=5+5)
Perf-task-clock-16      247 ±11%      252 ± 4%         ~     (p=1.000 n=5+5)
Perf-task-clock-32      270 ± 1%      268 ± 2%         ~     (p=0.548 n=5+5)
Perf-task-clock-64      302 ± 3%      296 ± 1%         ~     (p=0.222 n=5+5)
Perf-task-clock-128     358 ± 3%      352 ± 2%         ~     (p=0.310 n=5+5)
Perf-task-clock-256     483 ± 3%      458 ± 1%       -5.16%  (p=0.008 n=5+5)
Perf-task-clock-512     663 ± 1%      612 ± 4%       -7.61%  (p=0.008 n=5+5)
Perf-task-clock-1024    1.06k ± 1%    0.95k ± 2%    -10.24%  (p=0.008 n=5+5)

Updates #28808
Updates #18237

Change-Id: I4239cd89f21ad16dfbbef58d81981da48acd0605
Reviewed-on: https://go-review.googlesource.com/c/go/+/264477
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Trust: Michael Pratt <mpratt@google.com>
Diffstat (limited to 'src/runtime/runtime2.go')
-rw-r--r-- | src/runtime/runtime2.go | 17 |
1 files changed, 14 insertions, 3 deletions
diff --git a/src/runtime/runtime2.go b/src/runtime/runtime2.go
index a2e4411c7d..2dbc0efca3 100644
--- a/src/runtime/runtime2.go
+++ b/src/runtime/runtime2.go
@@ -1052,15 +1052,26 @@ var (
 	sched schedt
 	newprocs int32
 
-	// allpLock protects P-less reads and size changes of allp and
-	// idlepMask, and all writes to allp.
+	// allpLock protects P-less reads and size changes of allp, idlepMask,
+	// and timerpMask, and all writes to allp.
 	allpLock mutex
 	// len(allp) == gomaxprocs; may change at safe points, otherwise
 	// immutable.
 	allp []*p
 	// Bitmask of Ps in _Pidle list, one bit per P. Reads and writes must
 	// be atomic. Length may change at safe points.
-	idlepMask pIdleMask
+	//
+	// Each P must update only its own bit. In order to maintain
+	// consistency, a P going idle must the idle mask simultaneously with
+	// updates to the idle P list under the sched.lock, otherwise a racing
+	// pidleget may clear the mask before pidleput sets the mask,
+	// corrupting the bitmap.
+	//
+	// N.B., procresize takes ownership of all Ps in stopTheWorldWithSema.
+	idlepMask pMask
+	// Bitmask of Ps that may have a timer, one bit per P. Reads and writes
+	// must be atomic. Length may change at safe points.
+	timerpMask pMask
 
 	// Information about what cpu features are available.
 	// Packages outside the runtime should not use these