aboutsummaryrefslogtreecommitdiff
path: root/src/cmd/compile/internal/ssa/dom.go
AgeCommit message (Collapse)Author
2019-08-29cmd/compile: simplify postorderJosh Bleecher Snyder
Use a bool instead of markKind; it doesn't save space, but the semantics are more obvious. Move type markKind closer to its only remaining use. Change-Id: I9945a7baaeb764295a2709f83120ce3a82fa3beb Reviewed-on: https://go-review.googlesource.com/c/go/+/177880 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2019-05-17cmd/compile: optimize postorderJosh Bleecher Snyder
name old alloc/op new alloc/op delta Template 37.1MB ± 0% 36.8MB ± 0% -0.57% (p=0.008 n=5+5) Unicode 28.1MB ± 0% 28.1MB ± 0% -0.07% (p=0.008 n=5+5) GoTypes 125MB ± 0% 124MB ± 0% -0.61% (p=0.008 n=5+5) Compiler 571MB ± 0% 568MB ± 0% -0.60% (p=0.008 n=5+5) SSA 1.88GB ± 0% 1.86GB ± 0% -0.82% (p=0.008 n=5+5) Flate 22.9MB ± 0% 22.8MB ± 0% -0.59% (p=0.008 n=5+5) GoParser 27.5MB ± 0% 27.3MB ± 0% -0.53% (p=0.008 n=5+5) Reflect 79.8MB ± 0% 79.5MB ± 0% -0.40% (p=0.008 n=5+5) Tar 34.9MB ± 0% 34.7MB ± 0% -0.44% (p=0.008 n=5+5) XML 45.7MB ± 0% 45.4MB ± 0% -0.58% (p=0.008 n=5+5) [Geo mean] 80.3MB 79.9MB -0.52% name old allocs/op new allocs/op delta Template 380k ± 0% 378k ± 0% -0.57% (p=0.008 n=5+5) Unicode 340k ± 0% 340k ± 0% -0.08% (p=0.008 n=5+5) GoTypes 1.36M ± 0% 1.36M ± 0% -0.44% (p=0.008 n=5+5) Compiler 5.52M ± 0% 5.49M ± 0% -0.45% (p=0.008 n=5+5) SSA 17.6M ± 0% 17.5M ± 0% -0.42% (p=0.008 n=5+5) Flate 235k ± 0% 234k ± 0% -0.65% (p=0.008 n=5+5) GoParser 302k ± 0% 300k ± 0% -0.70% (p=0.008 n=5+5) Reflect 982k ± 0% 978k ± 0% -0.40% (p=0.008 n=5+5) Tar 353k ± 0% 351k ± 0% -0.53% (p=0.008 n=5+5) XML 437k ± 0% 435k ± 0% -0.48% (p=0.008 n=5+5) [Geo mean] 844k 840k -0.47% Updates #27739 Change-Id: I5d533013270cbbd7c0bad1b43da96c8499be76f5 Reviewed-on: https://go-review.googlesource.com/c/go/+/177917 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-04-04cmd/compile: stack-allocate 2 worklists in order, dom passesAlberto Donizetti
Allocate two more ssa local worklists on the stack. The initial sizes are chosen to cover >99% of the calls. name old time/op new time/op delta Template 281ms ± 2% 283ms ± 5% ~ (p=0.443 n=18+19) Unicode 136ms ± 4% 135ms ± 7% ~ (p=0.277 n=20+20) GoTypes 886ms ± 2% 885ms ± 2% ~ (p=0.862 n=20+20) Compiler 4.03s ± 2% 4.02s ± 1% ~ (p=0.270 n=19+20) SSA 9.66s ± 1% 9.64s ± 2% ~ (p=0.253 n=20+20) Flate 186ms ± 5% 183ms ± 6% ~ (p=0.174 n=20+20) GoParser 222ms ± 4% 219ms ± 4% ~ (p=0.081 n=20+20) Reflect 569ms ± 2% 568ms ± 2% ~ (p=0.686 n=19+19) Tar 258ms ± 4% 256ms ± 3% ~ (p=0.211 n=20+20) XML 319ms ± 2% 317ms ± 3% ~ (p=0.158 n=18+20) name old user-time/op new user-time/op delta Template 396ms ± 6% 392ms ± 6% ~ (p=0.211 n=20+20) Unicode 212ms ±10% 211ms ± 9% ~ (p=0.904 n=20+20) GoTypes 1.21s ± 3% 1.21s ± 2% ~ (p=0.183 n=20+20) Compiler 5.60s ± 2% 5.62s ± 2% ~ (p=0.355 n=18+18) SSA 14.0s ± 6% 13.9s ± 5% ~ (p=0.678 n=20+20) Flate 250ms ± 8% 245ms ± 6% ~ (p=0.166 n=19+20) GoParser 305ms ± 6% 304ms ± 5% ~ (p=0.659 n=20+20) Reflect 760ms ± 3% 758ms ± 4% ~ (p=0.758 n=20+20) Tar 362ms ± 6% 357ms ± 5% ~ (p=0.108 n=20+20) XML 429ms ± 4% 429ms ± 4% ~ (p=0.799 n=20+20) name old alloc/op new alloc/op delta Template 39.0MB ± 0% 38.8MB ± 0% -0.55% (p=0.000 n=20+20) Unicode 29.1MB ± 0% 29.1MB ± 0% -0.06% (p=0.000 n=20+20) GoTypes 116MB ± 0% 115MB ± 0% -0.50% (p=0.000 n=20+20) Compiler 493MB ± 0% 491MB ± 0% -0.46% (p=0.000 n=19+20) SSA 1.40GB ± 0% 1.40GB ± 0% -0.31% (p=0.000 n=19+20) Flate 25.0MB ± 0% 24.9MB ± 0% -0.60% (p=0.000 n=19+19) GoParser 30.9MB ± 0% 30.7MB ± 0% -0.66% (p=0.000 n=20+20) Reflect 77.5MB ± 0% 77.1MB ± 0% -0.52% (p=0.000 n=20+20) Tar 39.2MB ± 0% 39.0MB ± 0% -0.47% (p=0.000 n=20+20) XML 44.8MB ± 0% 44.6MB ± 0% -0.45% (p=0.000 n=20+19) name old allocs/op new allocs/op delta Template 382k ± 0% 379k ± 0% -0.69% (p=0.000 n=20+19) Unicode 337k ± 0% 336k ± 0% -0.09% (p=0.000 n=20+20) GoTypes 1.19M ± 0% 1.18M ± 0% -0.64% (p=0.000 n=20+20) Compiler 4.60M ± 0% 4.58M ± 0% -0.57% (p=0.000 n=20+20) SSA 11.5M ± 0% 11.4M ± 0% -0.42% (p=0.000 n=19+20) Flate 235k ± 0% 233k ± 0% -0.74% (p=0.000 n=20+19) GoParser 316k ± 0% 313k ± 0% -0.69% (p=0.000 n=20+20) Reflect 953k ± 0% 946k ± 0% -0.81% (p=0.000 n=20+20) Tar 391k ± 0% 388k ± 0% -0.61% (p=0.000 n=20+19) XML 413k ± 0% 411k ± 0% -0.56% (p=0.000 n=20+20) Change-Id: I7378174e3550b47df4368b24cf24c8ce1b85c906 Reviewed-on: https://go-review.googlesource.com/104656 Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
2017-04-19cmd/compile: enhance postorder computation and repair loop finderDavid Chase
Replace derecursed postorder computation with one that mimics DFS traversal. Corrected outerinner function in loopfinder Leave enhanced checks in place. Change-Id: I657ba5e89c88941028d6d4c72e9f9056e30f1ce8 Reviewed-on: https://go-review.googlesource.com/40872 Run-TryBot: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@golang.org>
2017-03-17cmd/compile: rearrange fields between ssa.Func, ssa.Cache, and ssa.ConfigJosh Bleecher Snyder
This makes ssa.Func, ssa.Cache, and ssa.Config fulfill the roles laid out for them in CL 38160. The only non-trivial change in this CL is how cached values and blocks get IDs. Prior to this CL, their IDs were assigned as part of resetting the cache, and only modified IDs were reset. This required knowing how many values and blocks were modified, which required a tight coupling between ssa.Func and ssa.Config. To eliminate that coupling, we now zero values and blocks during reset, and assign their IDs when they are used. Since unused values and blocks have ID == 0, we can efficiently find the last used value/block, to avoid zeroing everything. Bulk zeroing is efficient, but not efficient enough to obviate the need to avoid zeroing everything every time. As a happy side-effect, ssa.Func.Free is no longer necessary. DebugHashMatch and friends now belong in func.go. They have been left in place for clarity and review. I will move them in a subsequent CL. Passes toolstash -cmp. No compiler performance impact. No change in 'go test cmd/compile/internal/ssa' execution time. Change-Id: I2eb7af58da067ef6a36e815a6f386cfe8634d098 Reviewed-on: https://go-review.googlesource.com/38167 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-03-15cmd/compile: put spills in better placesDavid Chase
Previously we always issued a spill right after the op that was being spilled. This CL pushes spills father away from the generator, hopefully pushing them into unlikely branches. For example: x = ... if unlikely { call ... } ... use x ... Used to compile to x = ... spill x if unlikely { call ... restore x } It now compiles to x = ... if unlikely { spill x call ... restore x } This is particularly useful for code which appends, as the only call is an unlikely call to growslice. It also helps for the spills needed around write barrier calls. The basic algorithm is walk down the dominator tree following a path where the block still dominates all of the restores. We're looking for a block that: 1) dominates all restores 2) has the value being spilled in a register 3) has a loop depth no deeper than the value being spilled The walking-down code is iterative. I was forced to limit it to searching 100 blocks so it doesn't become O(n^2). Maybe one day we'll find a better way. I had to delete most of David's code which pushed spills out of loops. I suspect this CL subsumes most of the cases that his code handled. Generally positive performance improvements, but hard to tell for sure with all the noise. (compilebench times are unchanged.) name old time/op new time/op delta BinaryTree17-12 2.91s ±15% 2.80s ±12% ~ (p=0.063 n=10+10) Fannkuch11-12 3.47s ± 0% 3.30s ± 4% -4.91% (p=0.000 n=9+10) FmtFprintfEmpty-12 48.0ns ± 1% 47.4ns ± 1% -1.32% (p=0.002 n=9+9) FmtFprintfString-12 85.6ns ±11% 79.4ns ± 3% -7.27% (p=0.005 n=10+10) FmtFprintfInt-12 91.8ns ±10% 85.9ns ± 4% ~ (p=0.203 n=10+9) FmtFprintfIntInt-12 135ns ±13% 127ns ± 1% -5.72% (p=0.025 n=10+9) FmtFprintfPrefixedInt-12 167ns ± 1% 168ns ± 2% ~ (p=0.580 n=9+10) FmtFprintfFloat-12 249ns ±11% 230ns ± 1% -7.32% (p=0.000 n=10+10) FmtManyArgs-12 504ns ± 7% 506ns ± 1% ~ (p=0.198 n=9+9) GobDecode-12 6.95ms ± 1% 7.04ms ± 1% +1.37% (p=0.001 n=10+10) GobEncode-12 6.32ms ±13% 6.04ms ± 1% ~ (p=0.063 n=10+10) Gzip-12 233ms ± 1% 235ms ± 0% +1.01% (p=0.000 n=10+9) Gunzip-12 40.1ms ± 1% 39.6ms ± 0% -1.12% (p=0.000 n=10+8) HTTPClientServer-12 227µs ± 9% 221µs ± 5% ~ (p=0.114 n=9+8) JSONEncode-12 16.1ms ± 2% 15.8ms ± 1% -2.09% (p=0.002 n=9+8) JSONDecode-12 61.8ms ±11% 57.9ms ± 1% -6.30% (p=0.000 n=10+9) Mandelbrot200-12 4.30ms ± 3% 4.28ms ± 1% ~ (p=0.203 n=10+8) GoParse-12 3.18ms ± 2% 3.18ms ± 2% ~ (p=0.579 n=10+10) RegexpMatchEasy0_32-12 76.7ns ± 1% 77.5ns ± 1% +0.92% (p=0.002 n=9+8) RegexpMatchEasy0_1K-12 239ns ± 3% 239ns ± 1% ~ (p=0.204 n=10+10) RegexpMatchEasy1_32-12 71.4ns ± 1% 70.6ns ± 0% -1.15% (p=0.000 n=10+9) RegexpMatchEasy1_1K-12 383ns ± 2% 390ns ±10% ~ (p=0.181 n=8+9) RegexpMatchMedium_32-12 114ns ± 0% 113ns ± 1% -0.88% (p=0.000 n=9+8) RegexpMatchMedium_1K-12 36.3µs ± 1% 36.8µs ± 1% +1.59% (p=0.000 n=10+8) RegexpMatchHard_32-12 1.90µs ± 1% 1.90µs ± 1% ~ (p=0.341 n=10+10) RegexpMatchHard_1K-12 59.4µs ±11% 57.8µs ± 1% ~ (p=0.968 n=10+9) Revcomp-12 461ms ± 1% 462ms ± 1% ~ (p=1.000 n=9+9) Template-12 67.5ms ± 1% 66.3ms ± 1% -1.77% (p=0.000 n=10+8) TimeParse-12 314ns ± 3% 309ns ± 0% -1.56% (p=0.000 n=9+8) TimeFormat-12 340ns ± 2% 331ns ± 1% -2.79% (p=0.000 n=10+10) The go binary is 0.2% larger. Not really sure why the size would change. Change-Id: Ia5116e53a3aeb025ef350ffc51c14ae5cc17871c Reviewed-on: https://go-review.googlesource.com/34822 Reviewed-by: David Chase <drchase@google.com>
2016-09-19cmd/compile: cache CFG-dependent computationsKeith Randall
We compute a lot of stuff based off the CFG: postorder traversal, dominators, dominator tree, loop nest. Multiple phases use this information and we end up recomputing some of it. Add a cache for this information so if the CFG hasn't changed, we can reuse the previous computation. Change-Id: I9b5b58af06830bd120afbee9cfab395a0a2f74b2 Reviewed-on: https://go-review.googlesource.com/29356 Reviewed-by: David Chase <drchase@google.com>
2016-05-16cmd/compile: use sparse algorithm for phis in large programDavid Chase
This adds a sparse method for locating nearest ancestors in a dominator tree, and checks blocks with more than one predecessor for differences and inserts phi functions where there are. Uses reversed post order to cut number of passes, running it from first def to last use ("last use" for paramout and mem is end-of-program; last use for a phi input from a backedge is the source of the back edge) Includes a cutover from old algorithm to new to avoid paying large constant factor for small programs. This keeps normal builds running at about the same time, while not running over-long on large machine-generated inputs. Add "phase" flags for ssa/build -- ssa/build/stats prints number of blocks, values (before and after linking references and inserting phis, so expansion can be measured), and their product; the product governs the cutover, where a good value seems to be somewhere between 1 and 5 million. Among the files compiled by make.bash, this is the shape of the tail of the distribution for #blocks, #vars, and their product: #blocks #vars product max 6171 28180 173,898,780 99.9% 1641 6548 10,401,878 99% 463 1909 873,721 95% 152 639 95,235 90% 84 359 30,021 The old algorithm is indeed usually fastest, for 99%ile values of usually. The fix to LookupVarOutgoing ( https://go-review.googlesource.com/#/c/22790/ ) deals with some of the same problems addressed by this CL, but on at least one bug ( #15537 ) this change is still a significant help. With this CL: /tmp/gopath$ rm -rf pkg bin /tmp/gopath$ time go get -v -gcflags -memprofile=y.mprof \ github.com/gogo/protobuf/test/theproto3/combos/... ... real 4m35.200s user 13m16.644s sys 0m36.712s and pprof reports 3.4GB allocated in one of the larger profiles With tip: /tmp/gopath$ rm -rf pkg bin /tmp/gopath$ time go get -v -gcflags -memprofile=y.mprof \ github.com/gogo/protobuf/test/theproto3/combos/... ... real 10m36.569s user 25m52.286s sys 4m3.696s and pprof reports 8.3GB allocated in the same larger profile With this CL, most of the compilation time on the benchmarked input is spent in register/stack allocation (cumulative 53%) and in the sparse lookup algorithm itself (cumulative 20%). Fixes #15537. Change-Id: Ia0299dda6a291534d8b08e5f9883216ded677a00 Reviewed-on: https://go-review.googlesource.com/22342 Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-05-05cmd/compile: enable constant-time CFG editingKeith Randall
Provide indexes along with block pointers for Preds and Succs arrays. This allows us to splice edges in and out of those arrays in constant time. Fixes worst-case O(n^2) behavior in deadcode and fuse. benchmark old ns/op new ns/op delta BenchmarkFuse1-8 2065 2057 -0.39% BenchmarkFuse10-8 9408 9073 -3.56% BenchmarkFuse100-8 105238 76277 -27.52% BenchmarkFuse1000-8 3982562 1026750 -74.22% BenchmarkFuse10000-8 301220329 12824005 -95.74% BenchmarkDeadCode1-8 1588 1566 -1.39% BenchmarkDeadCode10-8 4333 4250 -1.92% BenchmarkDeadCode100-8 32031 32574 +1.70% BenchmarkDeadCode1000-8 590407 468275 -20.69% BenchmarkDeadCode10000-8 17822890 5000818 -71.94% BenchmarkDeadCode100000-8 1388706640 78021127 -94.38% BenchmarkDeadCode200000-8 5372518479 168598762 -96.86% Change-Id: Iccabdbb9343fd1c921ba07bbf673330a1c36ee17 Reviewed-on: https://go-review.googlesource.com/22589 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com> Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-26cmd/compile: fix another bug in dominator computationDavid Chase
Here, "fix" means "replace". The new dominator computation is the "simple" algorithm from Lengauer and Tarjan's TOPLAS paper, with minimal changes. Also included is a test that tweaks the fixed error. Change-Id: I0abdf53d5d64df1e67e4e62f55e88957045cd63b Reviewed-on: https://go-review.googlesource.com/22401 Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-04-22cmd/compile: in a Tarjan algorithm, DFS should really be DFSDavid Chase
Replaced incorrect recursion-free rendering of DFS with something that was correct. Enhanced test with all permutations of IF successors to ensure that all possible DFS traversals are exercised. Test is improved version of https://go-review.googlesource.com/#/c/22334 Update 15084. Change-Id: I6e944c41244e47fe5f568dfc2b360ff93b94079e Reviewed-on: https://go-review.googlesource.com/22347 Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: David Chase <drchase@google.com>
2016-04-12cmd/compile: share dominator tree among many passesAlexandru Moșoi
These passes do not modify the dominator tree too much. % benchstat old.txt new.txt name old time/op new time/op delta Template 335ms ± 3% 325ms ± 8% ~ (p=0.074 n=8+9) GoTypes 1.05s ± 1% 1.05s ± 3% ~ (p=0.095 n=9+10) Compiler 5.37s ± 4% 5.29s ± 1% -1.42% (p=0.022 n=9+10) MakeBash 34.9s ± 3% 34.4s ± 2% ~ (p=0.095 n=9+10) name old alloc/op new alloc/op delta Template 55.4MB ± 0% 54.9MB ± 0% -0.81% (p=0.000 n=10+10) GoTypes 179MB ± 0% 178MB ± 0% -0.89% (p=0.000 n=10+10) Compiler 807MB ± 0% 798MB ± 0% -1.10% (p=0.000 n=10+10) name old allocs/op new allocs/op delta Template 498k ± 0% 496k ± 0% -0.29% (p=0.000 n=9+9) GoTypes 1.42M ± 0% 1.41M ± 0% -0.24% (p=0.000 n=10+10) Compiler 5.61M ± 0% 5.60M ± 0% -0.12% (p=0.000 n=10+10) Change-Id: I4cd20cfba3f132ebf371e16046ab14d7e42799ec Reviewed-on: https://go-review.googlesource.com/21806 Run-TryBot: Alexandru Moșoi <alexandru@mosoi.ro> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2016-04-03all: fix spelling mistakesEric Engestrom
Signed-off-by: Eric Engestrom <eric@engestrom.ch> Change-Id: I91873aaebf79bdf1c00d38aacc1a1fb8d79656a7 Reviewed-on: https://go-review.googlesource.com/21433 Reviewed-by: Ian Lance Taylor <iant@golang.org> Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-23cmd/compile: prettify loop iterationsMarvin Stenger
This commit replaces some of for i := len(x) - 1; i >= 0; i-- {...} style loops, which do not rely on reverse iteration order. Change-Id: I5542834286562da058200c06e7a173b13760e54d Reviewed-on: https://go-review.googlesource.com/21044 Reviewed-by: Keith Randall <khr@golang.org>
2016-03-02all: single space after period.Brad Fitzpatrick
The tree's pretty inconsistent about single space vs double space after a period in documentation. Make it consistently a single space, per earlier decisions. This means contributors won't be confused by misleading precedence. This CL doesn't use go/doc to parse. It only addresses // comments. It was generated with: $ perl -i -npe 's,^(\s*// .+[a-z]\.) +([A-Z]),$1 $2,' $(git grep -l -E '^\s*//(.+\.) +([A-Z])') $ go test go/doc -update Change-Id: Iccdb99c37c797ef1f804a94b22ba5ee4b500c4f7 Reviewed-on: https://go-review.googlesource.com/20022 Reviewed-by: Rob Pike <r@golang.org> Reviewed-by: Dave Day <djd@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-02-22[dev.ssa] cmd/compile: memory allocation tweaks to regalloc and domDavid Chase
Spotted a minor source of excess allocation in the register allocator. Rearranged the dominator tree code to pull its scratch memory from a reused buffer attached to Config. Change-Id: I6da6e7b112f7d3eb1fd00c58faa8214cdea44e38 Reviewed-on: https://go-review.googlesource.com/19450 Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-02-02[dev.ssa] cmd/compile: reducing alloc footprint of dominator calcDavid Chase
Converted working slices of pointer into slices of pointer index. Half the size (on 64-bit machine) and no pointers to trace if GC occurs while they're live. TODO - could expose slice mapping ID->*Block; some dom clients also construct these. Minor optimization in regalloc that cuts allocation count. Minor optimization in compile.go that cuts calls to Sprintf. Change-Id: I28f0bfed422b7344af333dc52ea272441e28e463 Reviewed-on: https://go-review.googlesource.com/19104 Run-TryBot: Todd Neal <todd@tneal.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Todd Neal <todd@tneal.org>
2015-10-25[dev.ssa] cmd/compile: optimize nil checksKeith Randall
Use faulting loads instead of test/jeq to do nil checks. Fold nil checks into a following load/store if possible. Makes binaries about 2% smaller. Change-Id: I54af0f0a93c853f37e34e0ce7e3f01dd2ac87f64 Reviewed-on: https://go-review.googlesource.com/16287 Reviewed-by: David Chase <drchase@google.com>
2015-09-11[dev.ssa] cmd/compile/internal/ssa: simplify how exit blocks are usedKeith Randall
Move to implicit (mostly) instead of explicit exit blocks. RET and RETJMP have no outgoing edges - they implicitly exit. CALL only has one outgoing edge, as its exception edge is implicit as well. Exit blocks are only used for unconditionally panicking code, like the failed branches of nil and bounds checks. There may now be more than one exit block. No merges happen at exit blocks. The only downside is it is harder to find all the places code can exit the method. See the reverse dominator code for an example. Change-Id: I42e2fd809a4bf81301ab993e29ad9f203ce48eb0 Reviewed-on: https://go-review.googlesource.com/14462 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2015-07-15[dev.ssa] cmd/compile/internal : Implement Lengauer-Tarjan for dominatorsTodd Neal
Implements the simple Lengauer-Tarjan algorithm for dominator and post-dominator calculation. benchmark old ns/op new ns/op delta BenchmarkDominatorsLinear-8 1403862 1292741 -7.92% BenchmarkDominatorsFwdBack-8 1270633 1428285 +12.41% BenchmarkDominatorsManyPred-8 225932354 1530886 -99.32% BenchmarkDominatorsMaxPred-8 445994225 1393612 -99.69% BenchmarkDominatorsMaxPredVal-8 447235248 1246899 -99.72% BenchmarkNilCheckDeep1-8 829 1259 +51.87% BenchmarkNilCheckDeep10-8 2199 2397 +9.00% BenchmarkNilCheckDeep100-8 57325 29405 -48.70% BenchmarkNilCheckDeep1000-8 6625837 2933151 -55.73% BenchmarkNilCheckDeep10000-8 763559787 319105541 -58.21% benchmark old MB/s new MB/s speedup BenchmarkDominatorsLinear-8 7.12 7.74 1.09x BenchmarkDominatorsFwdBack-8 7.87 7.00 0.89x BenchmarkDominatorsManyPred-8 0.04 6.53 163.25x BenchmarkDominatorsMaxPred-8 0.02 7.18 359.00x BenchmarkDominatorsMaxPredVal-8 0.02 8.02 401.00x BenchmarkNilCheckDeep1-8 1.21 0.79 0.65x BenchmarkNilCheckDeep10-8 4.55 4.17 0.92x BenchmarkNilCheckDeep100-8 1.74 3.40 1.95x BenchmarkNilCheckDeep1000-8 0.15 0.34 2.27x BenchmarkNilCheckDeep10000-8 0.01 0.03 3.00x Change-Id: Icec3d774422a9bc64914779804c8c0ab73aa72bf Reviewed-on: https://go-review.googlesource.com/11971 Reviewed-by: Keith Randall <khr@golang.org>
2015-07-07[dev.ssa] cmd/compile/ssa: dominator tests and benchmarksTodd Neal
This change has some tests verifying functionality and an assortment of benchmarks of various block lists. It modifies NewBlock to allocate in contiguous blocks improving the performance of intersect() for extremely large graphs by 30-40%. benchmark old ns/op new ns/op delta BenchmarkDominatorsLinear-8 1185619 901154 -23.99% BenchmarkDominatorsFwdBack-8 1302138 863537 -33.68% BenchmarkDominatorsManyPred-8 404670521 247450911 -38.85% BenchmarkDominatorsMaxPred-8 455809002 471675119 +3.48% BenchmarkDominatorsMaxPredVal-8 819315864 468257300 -42.85% BenchmarkNilCheckDeep1-8 766 706 -7.83% BenchmarkNilCheckDeep10-8 2553 2209 -13.47% BenchmarkNilCheckDeep100-8 58606 57545 -1.81% BenchmarkNilCheckDeep1000-8 7753012 8025750 +3.52% BenchmarkNilCheckDeep10000-8 1224165946 789995184 -35.47% Change-Id: Id3d6bc9cb1138e8177934441073ac7873ddf7ade Reviewed-on: https://go-review.googlesource.com/11716 Reviewed-by: Keith Randall <khr@golang.org>
2015-06-29[dev.ssa] cmd/compile/ssa: add nilcheckelim benchmarksJosh Bleecher Snyder
These benchmarks demonstrate that the nilcheckelim pass is roughly O(n^2): BenchmarkNilCheckDeep1 2000000 741 ns/op 1.35 MB/s BenchmarkNilCheckDeep10 1000000 2237 ns/op 4.47 MB/s BenchmarkNilCheckDeep100 20000 60713 ns/op 1.65 MB/s BenchmarkNilCheckDeep1000 200 7925198 ns/op 0.13 MB/s BenchmarkNilCheckDeep10000 1 1220104252 ns/op 0.01 MB/s Profiling suggests that building the dominator tree is also O(n^2), and before size factors take over, considerably more expensive than nilcheckelim. Change-Id: If966b38ec52243a25f355dab871300d29db02e16 Reviewed-on: https://go-review.googlesource.com/11520 Reviewed-by: Keith Randall <khr@golang.org>
2015-06-24[dev.ssa] cmd/compile/ssa: add -f suffix to logging methodsJosh Bleecher Snyder
Requested in CL 11380. Change-Id: Icf0d23fb8d383c76272401e363cc9b2169d11403 Reviewed-on: https://go-review.googlesource.com/11450 Reviewed-by: Alan Donovan <adonovan@google.com>
2015-06-21[dev.ssa] cmd/compile/ssa: separate logging, work in progress, and fatal errorsJosh Bleecher Snyder
The SSA implementation logs for three purposes: * debug logging * fatal errors * unimplemented features Separating these three uses lets us attempt an SSA implementation for all functions, not just _ssa functions. This turns the entire standard library into a compilation test, and makes it easy to figure out things like "how much coverage does SSA have now" and "what should we do next to get more coverage?". Functions called _ssa are still special. They log profusely by default and the output of the SSA implementation is used. For all other functions, logging is off, and the implementation is built and discarded, due to lack of support for the runtime. While we're here, fix a few minor bugs and add some extra Unimplementeds to allow all.bash to pass. As of now, SSA handles 20.79% of the functions in the standard library (689 of 3314). The top missing features are: 10.03% 2597 SSA unimplemented: zero for type error not implemented 7.79% 2016 SSA unimplemented: addr: bad op DOTPTR 7.33% 1898 SSA unimplemented: unhandled expr EQ 6.10% 1579 SSA unimplemented: unhandled expr OROR 4.91% 1271 SSA unimplemented: unhandled expr NE 4.49% 1163 SSA unimplemented: unhandled expr LROT 4.00% 1036 SSA unimplemented: unhandled expr LEN 3.56% 923 SSA unimplemented: unhandled stmt CALLFUNC 2.37% 615 SSA unimplemented: zero for type []byte not implemented 1.90% 492 SSA unimplemented: unhandled stmt CALLMETH 1.74% 450 SSA unimplemented: unhandled expr CALLINTER 1.74% 450 SSA unimplemented: unhandled expr DOT 1.71% 444 SSA unimplemented: unhandled expr ANDAND 1.65% 426 SSA unimplemented: unhandled expr CLOSUREVAR 1.54% 400 SSA unimplemented: unhandled expr CALLMETH 1.51% 390 SSA unimplemented: unhandled stmt SWITCH 1.47% 380 SSA unimplemented: unhandled expr CONV 1.33% 345 SSA unimplemented: addr: bad op * 1.30% 336 SSA unimplemented: unhandled OLITERAL 6 Change-Id: I4ca07951e276714dc13c31de28640aead17a1be7 Reviewed-on: https://go-review.googlesource.com/11160 Reviewed-by: Keith Randall <khr@golang.org>
2015-05-28[dev.ssa] Merge remote-tracking branch 'origin/master' into mergebranchKeith Randall
Semi-regular merge of tip to dev.ssa. Complicated a bit by the move of cmd/internal/* to cmd/compile/internal/*. Change-Id: I1c66d3c29bb95cce4a53c5a3476373aa5245303d