Age | Commit message (Collapse) | Author |
|
Use a bool instead of markKind;
it doesn't save space, but the semantics are more obvious.
Move type markKind closer to its only remaining use.
Change-Id: I9945a7baaeb764295a2709f83120ce3a82fa3beb
Reviewed-on: https://go-review.googlesource.com/c/go/+/177880
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
|
|
name old alloc/op new alloc/op delta
Template 37.1MB ± 0% 36.8MB ± 0% -0.57% (p=0.008 n=5+5)
Unicode 28.1MB ± 0% 28.1MB ± 0% -0.07% (p=0.008 n=5+5)
GoTypes 125MB ± 0% 124MB ± 0% -0.61% (p=0.008 n=5+5)
Compiler 571MB ± 0% 568MB ± 0% -0.60% (p=0.008 n=5+5)
SSA 1.88GB ± 0% 1.86GB ± 0% -0.82% (p=0.008 n=5+5)
Flate 22.9MB ± 0% 22.8MB ± 0% -0.59% (p=0.008 n=5+5)
GoParser 27.5MB ± 0% 27.3MB ± 0% -0.53% (p=0.008 n=5+5)
Reflect 79.8MB ± 0% 79.5MB ± 0% -0.40% (p=0.008 n=5+5)
Tar 34.9MB ± 0% 34.7MB ± 0% -0.44% (p=0.008 n=5+5)
XML 45.7MB ± 0% 45.4MB ± 0% -0.58% (p=0.008 n=5+5)
[Geo mean] 80.3MB 79.9MB -0.52%
name old allocs/op new allocs/op delta
Template 380k ± 0% 378k ± 0% -0.57% (p=0.008 n=5+5)
Unicode 340k ± 0% 340k ± 0% -0.08% (p=0.008 n=5+5)
GoTypes 1.36M ± 0% 1.36M ± 0% -0.44% (p=0.008 n=5+5)
Compiler 5.52M ± 0% 5.49M ± 0% -0.45% (p=0.008 n=5+5)
SSA 17.6M ± 0% 17.5M ± 0% -0.42% (p=0.008 n=5+5)
Flate 235k ± 0% 234k ± 0% -0.65% (p=0.008 n=5+5)
GoParser 302k ± 0% 300k ± 0% -0.70% (p=0.008 n=5+5)
Reflect 982k ± 0% 978k ± 0% -0.40% (p=0.008 n=5+5)
Tar 353k ± 0% 351k ± 0% -0.53% (p=0.008 n=5+5)
XML 437k ± 0% 435k ± 0% -0.48% (p=0.008 n=5+5)
[Geo mean] 844k 840k -0.47%
Updates #27739
Change-Id: I5d533013270cbbd7c0bad1b43da96c8499be76f5
Reviewed-on: https://go-review.googlesource.com/c/go/+/177917
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
Allocate two more ssa local worklists on the stack. The initial sizes
are chosen to cover >99% of the calls.
name old time/op new time/op delta
Template 281ms ± 2% 283ms ± 5% ~ (p=0.443 n=18+19)
Unicode 136ms ± 4% 135ms ± 7% ~ (p=0.277 n=20+20)
GoTypes 886ms ± 2% 885ms ± 2% ~ (p=0.862 n=20+20)
Compiler 4.03s ± 2% 4.02s ± 1% ~ (p=0.270 n=19+20)
SSA 9.66s ± 1% 9.64s ± 2% ~ (p=0.253 n=20+20)
Flate 186ms ± 5% 183ms ± 6% ~ (p=0.174 n=20+20)
GoParser 222ms ± 4% 219ms ± 4% ~ (p=0.081 n=20+20)
Reflect 569ms ± 2% 568ms ± 2% ~ (p=0.686 n=19+19)
Tar 258ms ± 4% 256ms ± 3% ~ (p=0.211 n=20+20)
XML 319ms ± 2% 317ms ± 3% ~ (p=0.158 n=18+20)
name old user-time/op new user-time/op delta
Template 396ms ± 6% 392ms ± 6% ~ (p=0.211 n=20+20)
Unicode 212ms ±10% 211ms ± 9% ~ (p=0.904 n=20+20)
GoTypes 1.21s ± 3% 1.21s ± 2% ~ (p=0.183 n=20+20)
Compiler 5.60s ± 2% 5.62s ± 2% ~ (p=0.355 n=18+18)
SSA 14.0s ± 6% 13.9s ± 5% ~ (p=0.678 n=20+20)
Flate 250ms ± 8% 245ms ± 6% ~ (p=0.166 n=19+20)
GoParser 305ms ± 6% 304ms ± 5% ~ (p=0.659 n=20+20)
Reflect 760ms ± 3% 758ms ± 4% ~ (p=0.758 n=20+20)
Tar 362ms ± 6% 357ms ± 5% ~ (p=0.108 n=20+20)
XML 429ms ± 4% 429ms ± 4% ~ (p=0.799 n=20+20)
name old alloc/op new alloc/op delta
Template 39.0MB ± 0% 38.8MB ± 0% -0.55% (p=0.000 n=20+20)
Unicode 29.1MB ± 0% 29.1MB ± 0% -0.06% (p=0.000 n=20+20)
GoTypes 116MB ± 0% 115MB ± 0% -0.50% (p=0.000 n=20+20)
Compiler 493MB ± 0% 491MB ± 0% -0.46% (p=0.000 n=19+20)
SSA 1.40GB ± 0% 1.40GB ± 0% -0.31% (p=0.000 n=19+20)
Flate 25.0MB ± 0% 24.9MB ± 0% -0.60% (p=0.000 n=19+19)
GoParser 30.9MB ± 0% 30.7MB ± 0% -0.66% (p=0.000 n=20+20)
Reflect 77.5MB ± 0% 77.1MB ± 0% -0.52% (p=0.000 n=20+20)
Tar 39.2MB ± 0% 39.0MB ± 0% -0.47% (p=0.000 n=20+20)
XML 44.8MB ± 0% 44.6MB ± 0% -0.45% (p=0.000 n=20+19)
name old allocs/op new allocs/op delta
Template 382k ± 0% 379k ± 0% -0.69% (p=0.000 n=20+19)
Unicode 337k ± 0% 336k ± 0% -0.09% (p=0.000 n=20+20)
GoTypes 1.19M ± 0% 1.18M ± 0% -0.64% (p=0.000 n=20+20)
Compiler 4.60M ± 0% 4.58M ± 0% -0.57% (p=0.000 n=20+20)
SSA 11.5M ± 0% 11.4M ± 0% -0.42% (p=0.000 n=19+20)
Flate 235k ± 0% 233k ± 0% -0.74% (p=0.000 n=20+19)
GoParser 316k ± 0% 313k ± 0% -0.69% (p=0.000 n=20+20)
Reflect 953k ± 0% 946k ± 0% -0.81% (p=0.000 n=20+20)
Tar 391k ± 0% 388k ± 0% -0.61% (p=0.000 n=20+19)
XML 413k ± 0% 411k ± 0% -0.56% (p=0.000 n=20+20)
Change-Id: I7378174e3550b47df4368b24cf24c8ce1b85c906
Reviewed-on: https://go-review.googlesource.com/104656
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
|
|
Replace derecursed postorder computation with one that
mimics DFS traversal.
Corrected outerinner function in loopfinder
Leave enhanced checks in place.
Change-Id: I657ba5e89c88941028d6d4c72e9f9056e30f1ce8
Reviewed-on: https://go-review.googlesource.com/40872
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
This makes ssa.Func, ssa.Cache, and ssa.Config fulfill
the roles laid out for them in CL 38160.
The only non-trivial change in this CL is how cached
values and blocks get IDs. Prior to this CL, their IDs were
assigned as part of resetting the cache, and only modified
IDs were reset. This required knowing how many values and
blocks were modified, which required a tight coupling between
ssa.Func and ssa.Config. To eliminate that coupling,
we now zero values and blocks during reset,
and assign their IDs when they are used.
Since unused values and blocks have ID == 0,
we can efficiently find the last used value/block,
to avoid zeroing everything.
Bulk zeroing is efficient, but not efficient enough
to obviate the need to avoid zeroing everything every time.
As a happy side-effect, ssa.Func.Free is no longer necessary.
DebugHashMatch and friends now belong in func.go.
They have been left in place for clarity and review.
I will move them in a subsequent CL.
Passes toolstash -cmp. No compiler performance impact.
No change in 'go test cmd/compile/internal/ssa' execution time.
Change-Id: I2eb7af58da067ef6a36e815a6f386cfe8634d098
Reviewed-on: https://go-review.googlesource.com/38167
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Previously we always issued a spill right after the op
that was being spilled. This CL pushes spills father away
from the generator, hopefully pushing them into unlikely branches.
For example:
x = ...
if unlikely {
call ...
}
... use x ...
Used to compile to
x = ...
spill x
if unlikely {
call ...
restore x
}
It now compiles to
x = ...
if unlikely {
spill x
call ...
restore x
}
This is particularly useful for code which appends, as the only
call is an unlikely call to growslice. It also helps for the
spills needed around write barrier calls.
The basic algorithm is walk down the dominator tree following a
path where the block still dominates all of the restores. We're
looking for a block that:
1) dominates all restores
2) has the value being spilled in a register
3) has a loop depth no deeper than the value being spilled
The walking-down code is iterative. I was forced to limit it to
searching 100 blocks so it doesn't become O(n^2). Maybe one day
we'll find a better way.
I had to delete most of David's code which pushed spills out of loops.
I suspect this CL subsumes most of the cases that his code handled.
Generally positive performance improvements, but hard to tell for sure
with all the noise. (compilebench times are unchanged.)
name old time/op new time/op delta
BinaryTree17-12 2.91s ±15% 2.80s ±12% ~ (p=0.063 n=10+10)
Fannkuch11-12 3.47s ± 0% 3.30s ± 4% -4.91% (p=0.000 n=9+10)
FmtFprintfEmpty-12 48.0ns ± 1% 47.4ns ± 1% -1.32% (p=0.002 n=9+9)
FmtFprintfString-12 85.6ns ±11% 79.4ns ± 3% -7.27% (p=0.005 n=10+10)
FmtFprintfInt-12 91.8ns ±10% 85.9ns ± 4% ~ (p=0.203 n=10+9)
FmtFprintfIntInt-12 135ns ±13% 127ns ± 1% -5.72% (p=0.025 n=10+9)
FmtFprintfPrefixedInt-12 167ns ± 1% 168ns ± 2% ~ (p=0.580 n=9+10)
FmtFprintfFloat-12 249ns ±11% 230ns ± 1% -7.32% (p=0.000 n=10+10)
FmtManyArgs-12 504ns ± 7% 506ns ± 1% ~ (p=0.198 n=9+9)
GobDecode-12 6.95ms ± 1% 7.04ms ± 1% +1.37% (p=0.001 n=10+10)
GobEncode-12 6.32ms ±13% 6.04ms ± 1% ~ (p=0.063 n=10+10)
Gzip-12 233ms ± 1% 235ms ± 0% +1.01% (p=0.000 n=10+9)
Gunzip-12 40.1ms ± 1% 39.6ms ± 0% -1.12% (p=0.000 n=10+8)
HTTPClientServer-12 227µs ± 9% 221µs ± 5% ~ (p=0.114 n=9+8)
JSONEncode-12 16.1ms ± 2% 15.8ms ± 1% -2.09% (p=0.002 n=9+8)
JSONDecode-12 61.8ms ±11% 57.9ms ± 1% -6.30% (p=0.000 n=10+9)
Mandelbrot200-12 4.30ms ± 3% 4.28ms ± 1% ~ (p=0.203 n=10+8)
GoParse-12 3.18ms ± 2% 3.18ms ± 2% ~ (p=0.579 n=10+10)
RegexpMatchEasy0_32-12 76.7ns ± 1% 77.5ns ± 1% +0.92% (p=0.002 n=9+8)
RegexpMatchEasy0_1K-12 239ns ± 3% 239ns ± 1% ~ (p=0.204 n=10+10)
RegexpMatchEasy1_32-12 71.4ns ± 1% 70.6ns ± 0% -1.15% (p=0.000 n=10+9)
RegexpMatchEasy1_1K-12 383ns ± 2% 390ns ±10% ~ (p=0.181 n=8+9)
RegexpMatchMedium_32-12 114ns ± 0% 113ns ± 1% -0.88% (p=0.000 n=9+8)
RegexpMatchMedium_1K-12 36.3µs ± 1% 36.8µs ± 1% +1.59% (p=0.000 n=10+8)
RegexpMatchHard_32-12 1.90µs ± 1% 1.90µs ± 1% ~ (p=0.341 n=10+10)
RegexpMatchHard_1K-12 59.4µs ±11% 57.8µs ± 1% ~ (p=0.968 n=10+9)
Revcomp-12 461ms ± 1% 462ms ± 1% ~ (p=1.000 n=9+9)
Template-12 67.5ms ± 1% 66.3ms ± 1% -1.77% (p=0.000 n=10+8)
TimeParse-12 314ns ± 3% 309ns ± 0% -1.56% (p=0.000 n=9+8)
TimeFormat-12 340ns ± 2% 331ns ± 1% -2.79% (p=0.000 n=10+10)
The go binary is 0.2% larger. Not really sure why the size
would change.
Change-Id: Ia5116e53a3aeb025ef350ffc51c14ae5cc17871c
Reviewed-on: https://go-review.googlesource.com/34822
Reviewed-by: David Chase <drchase@google.com>
|
|
We compute a lot of stuff based off the CFG: postorder traversal,
dominators, dominator tree, loop nest. Multiple phases use this
information and we end up recomputing some of it. Add a cache
for this information so if the CFG hasn't changed, we can reuse
the previous computation.
Change-Id: I9b5b58af06830bd120afbee9cfab395a0a2f74b2
Reviewed-on: https://go-review.googlesource.com/29356
Reviewed-by: David Chase <drchase@google.com>
|
|
This adds a sparse method for locating nearest ancestors
in a dominator tree, and checks blocks with more than one
predecessor for differences and inserts phi functions where
there are.
Uses reversed post order to cut number of passes, running
it from first def to last use ("last use" for paramout and
mem is end-of-program; last use for a phi input from a
backedge is the source of the back edge)
Includes a cutover from old algorithm to new to avoid paying
large constant factor for small programs. This keeps normal
builds running at about the same time, while not running
over-long on large machine-generated inputs.
Add "phase" flags for ssa/build -- ssa/build/stats prints
number of blocks, values (before and after linking references
and inserting phis, so expansion can be measured), and their
product; the product governs the cutover, where a good value
seems to be somewhere between 1 and 5 million.
Among the files compiled by make.bash, this is the shape of
the tail of the distribution for #blocks, #vars, and their
product:
#blocks #vars product
max 6171 28180 173,898,780
99.9% 1641 6548 10,401,878
99% 463 1909 873,721
95% 152 639 95,235
90% 84 359 30,021
The old algorithm is indeed usually fastest, for 99%ile
values of usually.
The fix to LookupVarOutgoing
( https://go-review.googlesource.com/#/c/22790/ )
deals with some of the same problems addressed by this CL,
but on at least one bug ( #15537 ) this change is still
a significant help.
With this CL:
/tmp/gopath$ rm -rf pkg bin
/tmp/gopath$ time go get -v -gcflags -memprofile=y.mprof \
github.com/gogo/protobuf/test/theproto3/combos/...
...
real 4m35.200s
user 13m16.644s
sys 0m36.712s
and pprof reports 3.4GB allocated in one of the larger profiles
With tip:
/tmp/gopath$ rm -rf pkg bin
/tmp/gopath$ time go get -v -gcflags -memprofile=y.mprof \
github.com/gogo/protobuf/test/theproto3/combos/...
...
real 10m36.569s
user 25m52.286s
sys 4m3.696s
and pprof reports 8.3GB allocated in the same larger profile
With this CL, most of the compilation time on the benchmarked
input is spent in register/stack allocation (cumulative 53%)
and in the sparse lookup algorithm itself (cumulative 20%).
Fixes #15537.
Change-Id: Ia0299dda6a291534d8b08e5f9883216ded677a00
Reviewed-on: https://go-review.googlesource.com/22342
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Provide indexes along with block pointers for Preds
and Succs arrays. This allows us to splice edges in
and out of those arrays in constant time.
Fixes worst-case O(n^2) behavior in deadcode and fuse.
benchmark old ns/op new ns/op delta
BenchmarkFuse1-8 2065 2057 -0.39%
BenchmarkFuse10-8 9408 9073 -3.56%
BenchmarkFuse100-8 105238 76277 -27.52%
BenchmarkFuse1000-8 3982562 1026750 -74.22%
BenchmarkFuse10000-8 301220329 12824005 -95.74%
BenchmarkDeadCode1-8 1588 1566 -1.39%
BenchmarkDeadCode10-8 4333 4250 -1.92%
BenchmarkDeadCode100-8 32031 32574 +1.70%
BenchmarkDeadCode1000-8 590407 468275 -20.69%
BenchmarkDeadCode10000-8 17822890 5000818 -71.94%
BenchmarkDeadCode100000-8 1388706640 78021127 -94.38%
BenchmarkDeadCode200000-8 5372518479 168598762 -96.86%
Change-Id: Iccabdbb9343fd1c921ba07bbf673330a1c36ee17
Reviewed-on: https://go-review.googlesource.com/22589
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Here, "fix" means "replace". The new dominator computation
is the "simple" algorithm from Lengauer and Tarjan's TOPLAS
paper, with minimal changes.
Also included is a test that tweaks the fixed error.
Change-Id: I0abdf53d5d64df1e67e4e62f55e88957045cd63b
Reviewed-on: https://go-review.googlesource.com/22401
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Replaced incorrect recursion-free rendering of DFS with
something that was correct. Enhanced test with all
permutations of IF successors to ensure that all possible
DFS traversals are exercised.
Test is improved version of
https://go-review.googlesource.com/#/c/22334
Update 15084.
Change-Id: I6e944c41244e47fe5f568dfc2b360ff93b94079e
Reviewed-on: https://go-review.googlesource.com/22347
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: David Chase <drchase@google.com>
|
|
These passes do not modify the dominator tree too much.
% benchstat old.txt new.txt
name old time/op new time/op delta
Template 335ms ± 3% 325ms ± 8% ~ (p=0.074 n=8+9)
GoTypes 1.05s ± 1% 1.05s ± 3% ~ (p=0.095 n=9+10)
Compiler 5.37s ± 4% 5.29s ± 1% -1.42% (p=0.022 n=9+10)
MakeBash 34.9s ± 3% 34.4s ± 2% ~ (p=0.095 n=9+10)
name old alloc/op new alloc/op delta
Template 55.4MB ± 0% 54.9MB ± 0% -0.81% (p=0.000 n=10+10)
GoTypes 179MB ± 0% 178MB ± 0% -0.89% (p=0.000 n=10+10)
Compiler 807MB ± 0% 798MB ± 0% -1.10% (p=0.000 n=10+10)
name old allocs/op new allocs/op delta
Template 498k ± 0% 496k ± 0% -0.29% (p=0.000 n=9+9)
GoTypes 1.42M ± 0% 1.41M ± 0% -0.24% (p=0.000 n=10+10)
Compiler 5.61M ± 0% 5.60M ± 0% -0.12% (p=0.000 n=10+10)
Change-Id: I4cd20cfba3f132ebf371e16046ab14d7e42799ec
Reviewed-on: https://go-review.googlesource.com/21806
Run-TryBot: Alexandru Moșoi <alexandru@mosoi.ro>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
|
|
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Change-Id: I91873aaebf79bdf1c00d38aacc1a1fb8d79656a7
Reviewed-on: https://go-review.googlesource.com/21433
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
This commit replaces some of
for i := len(x) - 1; i >= 0; i-- {...}
style loops, which do not rely on reverse iteration order.
Change-Id: I5542834286562da058200c06e7a173b13760e54d
Reviewed-on: https://go-review.googlesource.com/21044
Reviewed-by: Keith Randall <khr@golang.org>
|
|
The tree's pretty inconsistent about single space vs double space
after a period in documentation. Make it consistently a single space,
per earlier decisions. This means contributors won't be confused by
misleading precedence.
This CL doesn't use go/doc to parse. It only addresses // comments.
It was generated with:
$ perl -i -npe 's,^(\s*// .+[a-z]\.) +([A-Z]),$1 $2,' $(git grep -l -E '^\s*//(.+\.) +([A-Z])')
$ go test go/doc -update
Change-Id: Iccdb99c37c797ef1f804a94b22ba5ee4b500c4f7
Reviewed-on: https://go-review.googlesource.com/20022
Reviewed-by: Rob Pike <r@golang.org>
Reviewed-by: Dave Day <djd@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Spotted a minor source of excess allocation in the register
allocator. Rearranged the dominator tree code to pull its
scratch memory from a reused buffer attached to Config.
Change-Id: I6da6e7b112f7d3eb1fd00c58faa8214cdea44e38
Reviewed-on: https://go-review.googlesource.com/19450
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Converted working slices of pointer into slices of pointer
index. Half the size (on 64-bit machine) and no pointers
to trace if GC occurs while they're live.
TODO - could expose slice mapping ID->*Block; some dom
clients also construct these.
Minor optimization in regalloc that cuts allocation count.
Minor optimization in compile.go that cuts calls to Sprintf.
Change-Id: I28f0bfed422b7344af333dc52ea272441e28e463
Reviewed-on: https://go-review.googlesource.com/19104
Run-TryBot: Todd Neal <todd@tneal.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Todd Neal <todd@tneal.org>
|
|
Use faulting loads instead of test/jeq to do nil checks.
Fold nil checks into a following load/store if possible.
Makes binaries about 2% smaller.
Change-Id: I54af0f0a93c853f37e34e0ce7e3f01dd2ac87f64
Reviewed-on: https://go-review.googlesource.com/16287
Reviewed-by: David Chase <drchase@google.com>
|
|
Move to implicit (mostly) instead of explicit exit blocks.
RET and RETJMP have no outgoing edges - they implicitly exit.
CALL only has one outgoing edge, as its exception edge is
implicit as well.
Exit blocks are only used for unconditionally panicking code,
like the failed branches of nil and bounds checks.
There may now be more than one exit block. No merges happen
at exit blocks.
The only downside is it is harder to find all the places code
can exit the method. See the reverse dominator code for an
example.
Change-Id: I42e2fd809a4bf81301ab993e29ad9f203ce48eb0
Reviewed-on: https://go-review.googlesource.com/14462
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
|
|
Implements the simple Lengauer-Tarjan algorithm for dominator
and post-dominator calculation.
benchmark old ns/op new ns/op delta
BenchmarkDominatorsLinear-8 1403862 1292741 -7.92%
BenchmarkDominatorsFwdBack-8 1270633 1428285 +12.41%
BenchmarkDominatorsManyPred-8 225932354 1530886 -99.32%
BenchmarkDominatorsMaxPred-8 445994225 1393612 -99.69%
BenchmarkDominatorsMaxPredVal-8 447235248 1246899 -99.72%
BenchmarkNilCheckDeep1-8 829 1259 +51.87%
BenchmarkNilCheckDeep10-8 2199 2397 +9.00%
BenchmarkNilCheckDeep100-8 57325 29405 -48.70%
BenchmarkNilCheckDeep1000-8 6625837 2933151 -55.73%
BenchmarkNilCheckDeep10000-8 763559787 319105541 -58.21%
benchmark old MB/s new MB/s speedup
BenchmarkDominatorsLinear-8 7.12 7.74 1.09x
BenchmarkDominatorsFwdBack-8 7.87 7.00 0.89x
BenchmarkDominatorsManyPred-8 0.04 6.53 163.25x
BenchmarkDominatorsMaxPred-8 0.02 7.18 359.00x
BenchmarkDominatorsMaxPredVal-8 0.02 8.02 401.00x
BenchmarkNilCheckDeep1-8 1.21 0.79 0.65x
BenchmarkNilCheckDeep10-8 4.55 4.17 0.92x
BenchmarkNilCheckDeep100-8 1.74 3.40 1.95x
BenchmarkNilCheckDeep1000-8 0.15 0.34 2.27x
BenchmarkNilCheckDeep10000-8 0.01 0.03 3.00x
Change-Id: Icec3d774422a9bc64914779804c8c0ab73aa72bf
Reviewed-on: https://go-review.googlesource.com/11971
Reviewed-by: Keith Randall <khr@golang.org>
|
|
This change has some tests verifying functionality and an assortment of
benchmarks of various block lists. It modifies NewBlock to allocate in
contiguous blocks improving the performance of intersect() for extremely
large graphs by 30-40%.
benchmark old ns/op new ns/op delta
BenchmarkDominatorsLinear-8 1185619 901154 -23.99%
BenchmarkDominatorsFwdBack-8 1302138 863537 -33.68%
BenchmarkDominatorsManyPred-8 404670521 247450911 -38.85%
BenchmarkDominatorsMaxPred-8 455809002 471675119 +3.48%
BenchmarkDominatorsMaxPredVal-8 819315864 468257300 -42.85%
BenchmarkNilCheckDeep1-8 766 706 -7.83%
BenchmarkNilCheckDeep10-8 2553 2209 -13.47%
BenchmarkNilCheckDeep100-8 58606 57545 -1.81%
BenchmarkNilCheckDeep1000-8 7753012 8025750 +3.52%
BenchmarkNilCheckDeep10000-8 1224165946 789995184 -35.47%
Change-Id: Id3d6bc9cb1138e8177934441073ac7873ddf7ade
Reviewed-on: https://go-review.googlesource.com/11716
Reviewed-by: Keith Randall <khr@golang.org>
|
|
These benchmarks demonstrate that
the nilcheckelim pass is roughly O(n^2):
BenchmarkNilCheckDeep1 2000000 741 ns/op 1.35 MB/s
BenchmarkNilCheckDeep10 1000000 2237 ns/op 4.47 MB/s
BenchmarkNilCheckDeep100 20000 60713 ns/op 1.65 MB/s
BenchmarkNilCheckDeep1000 200 7925198 ns/op 0.13 MB/s
BenchmarkNilCheckDeep10000 1 1220104252 ns/op 0.01 MB/s
Profiling suggests that building the
dominator tree is also O(n^2),
and before size factors take over,
considerably more expensive than nilcheckelim.
Change-Id: If966b38ec52243a25f355dab871300d29db02e16
Reviewed-on: https://go-review.googlesource.com/11520
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Requested in CL 11380.
Change-Id: Icf0d23fb8d383c76272401e363cc9b2169d11403
Reviewed-on: https://go-review.googlesource.com/11450
Reviewed-by: Alan Donovan <adonovan@google.com>
|
|
The SSA implementation logs for three purposes:
* debug logging
* fatal errors
* unimplemented features
Separating these three uses lets us attempt an SSA
implementation for all functions, not just
_ssa functions. This turns the entire standard
library into a compilation test, and makes it
easy to figure out things like
"how much coverage does SSA have now" and
"what should we do next to get more coverage?".
Functions called _ssa are still special.
They log profusely by default and
the output of the SSA implementation
is used. For all other functions,
logging is off, and the implementation
is built and discarded, due to lack of
support for the runtime.
While we're here, fix a few minor bugs and
add some extra Unimplementeds to allow
all.bash to pass.
As of now, SSA handles 20.79% of the functions
in the standard library (689 of 3314).
The top missing features are:
10.03% 2597 SSA unimplemented: zero for type error not implemented
7.79% 2016 SSA unimplemented: addr: bad op DOTPTR
7.33% 1898 SSA unimplemented: unhandled expr EQ
6.10% 1579 SSA unimplemented: unhandled expr OROR
4.91% 1271 SSA unimplemented: unhandled expr NE
4.49% 1163 SSA unimplemented: unhandled expr LROT
4.00% 1036 SSA unimplemented: unhandled expr LEN
3.56% 923 SSA unimplemented: unhandled stmt CALLFUNC
2.37% 615 SSA unimplemented: zero for type []byte not implemented
1.90% 492 SSA unimplemented: unhandled stmt CALLMETH
1.74% 450 SSA unimplemented: unhandled expr CALLINTER
1.74% 450 SSA unimplemented: unhandled expr DOT
1.71% 444 SSA unimplemented: unhandled expr ANDAND
1.65% 426 SSA unimplemented: unhandled expr CLOSUREVAR
1.54% 400 SSA unimplemented: unhandled expr CALLMETH
1.51% 390 SSA unimplemented: unhandled stmt SWITCH
1.47% 380 SSA unimplemented: unhandled expr CONV
1.33% 345 SSA unimplemented: addr: bad op *
1.30% 336 SSA unimplemented: unhandled OLITERAL 6
Change-Id: I4ca07951e276714dc13c31de28640aead17a1be7
Reviewed-on: https://go-review.googlesource.com/11160
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Semi-regular merge of tip to dev.ssa.
Complicated a bit by the move of cmd/internal/* to cmd/compile/internal/*.
Change-Id: I1c66d3c29bb95cce4a53c5a3476373aa5245303d
|