aboutsummaryrefslogtreecommitdiff
path: root/src/cmd/compile/internal/ssa/cse.go
AgeCommit message (Collapse)Author
2021-02-26cmd/compile: change StaticCall to return a "Results"David Chase
StaticLECall (multiple value in +mem, multiple value result +mem) -> StaticCall (multiple ergister value in +mem, multiple register-sized-value result +mem) -> ARCH CallStatic (multiple ergister value in +mem, multiple register-sized-value result +mem) But the architecture-dependent stuff is indifferent to whether it is mem->mem or (mem)->(mem) until Prog generation. Deal with OpSelectN -> Prog in ssagen/ssa.go, others, as they appear. For #40724. Change-Id: I1d0436f6371054f1881862641d8e7e418e4a6a16 Reviewed-on: https://go-review.googlesource.com/c/go/+/293391 Trust: David Chase <drchase@google.com> Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-12-08[dev.regabi] cmd/compile: add ssa.Aux tag interface for Value.AuxMatthew Dempsky
It's currently hard to automate refactorings around the Value.Aux field, because we don't have any static typing information for it. Adding a tag interface will make subsequent CLs easier and safer. Passes buildall w/ toolstash -cmp. Updates #42982. Change-Id: I41ae8e411a66bda3195a0957b60c2fe8a8002893 Reviewed-on: https://go-review.googlesource.com/c/go/+/275756 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Trust: Matthew Dempsky <mdempsky@google.com>
2020-06-10cmd/compile: always tighten and de-duplicate tuple selectorsMichael Munday
The scheduler assumes two special invariants that apply to tuple selectors (Select0 and Select1 ops): 1. There is only one tuple selector of each type per generator. 2. Tuple selectors and generators reside in the same block. Prior to this CL the assumption was that these invariants would only be broken by the CSE pass. The CSE pass therefore contained code to move and de-duplicate selectors to fix these invariants. However it is also possible to write relatively basic optimization rules that cause these invariants to be broken. For example: (A (Select0 (B))) -> (Select1 (B)) This rule could result in the newly added selector (Select1) being in a different block to the tuple generator (see issue #38356). It could also result in duplicate selectors if this rule matches multiple times for the same tuple generator (see issue #39472). The CSE pass will 'fix' these invariants. However it will only do so when optimizations are enabled (since disabling optimizations disables the CSE pass). This CL moves the CSE tuple selector fixup code into its own pass and makes it mandatory even when optimizations are disabled. This allows tuple selectors to be treated like normal ops for most of the compilation pipeline until after the new pass has run, at which point we need to be careful to maintain the invariant again. Fixes #39472. Change-Id: Ia3f79e09d9c65ac95f897ce37e967ee1258a080b Reviewed-on: https://go-review.googlesource.com/c/go/+/237118 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2020-05-14cmd/compile: fix tuple selector bug in CSE passMichael Munday
When tuple generators and selectors are eliminated as part of the CSE pass we may end up with tuple selectors that are in different blocks to the tuple generators that they correspond to. This breaks the invariant that tuple generators and their corresponding selectors must be in the same block. Therefore after CSE this situation must be corrected. Unfortunately the fixup code did not take into account that selectors could be eliminated by CSE. It assumed that only the tuple generators could be eliminated. In some situations this meant that it got into a state where it was replacing references to selectors with references to dead selectors in the wrong block. To fix this we move the fixup code after the CSE rewrites have been applied. This removes any difficult-to-reason-about interactions with the CSE rewriter. Fixes #38916. Change-Id: I2211982dcdba399d03299f0a819945b3eb93b291 Reviewed-on: https://go-review.googlesource.com/c/go/+/233857 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-11-05cmd/compile: fix liveness for open-coded defer args for infinite loopsDan Scales
Once defined, a stack slot holding an open-coded defer arg should always be marked live, since it may be used at any time if there is a panic. These stack slots are typically kept live naturally by the open-defer code inlined at each return/exit point. However, we need to do extra work to make sure that they are kept live if a function has an infinite loop or a panic exit. For this fix, only in the case of a function that is using open-coded defers, we compute the set of blocks (most often empty) that cannot reach a return or a BlockExit (panic) because of an infinite loop. Then, for each block b which cannot reach a return or BlockExit or is a BlockExit block, we mark each defer arg slot as live, as long as the definition of the defer arg slot dominates block b. For this change, had to export (*Func).sdom (-> Sdom) and SparseTree.isAncestorEq (-> IsAncestorEq) Updates #35277 Change-Id: I7b53c9bd38ba384a3794386dd0eb94e4cbde4eb1 Reviewed-on: https://go-review.googlesource.com/c/go/+/204802 Run-TryBot: Dan Scales <danscales@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-10-02cmd/compile: allow multiple SSA block control valuesMichael Munday
Control values are used to choose which successor of a block is jumped to. Typically a control value takes the form of a 'flags' value that represents the result of a comparison. Some architectures however use a variable in a register as a control value. Up until now we have managed with a single control value per block. However some architectures (e.g. s390x and riscv64) have combined compare-and-branch instructions that take two variables in registers as parameters. To generate these instructions we need to support 2 control values per block. This CL allows up to 2 control values to be used in a block in order to support the addition of compare-and-branch instructions. I have implemented s390x compare-and-branch instructions in a different CL. Passes toolstash-check -all. Results of compilebench: name old time/op new time/op delta Template 208ms ± 1% 209ms ± 1% ~ (p=0.289 n=20+20) Unicode 83.7ms ± 1% 83.3ms ± 3% -0.49% (p=0.017 n=18+18) GoTypes 748ms ± 1% 748ms ± 0% ~ (p=0.460 n=20+18) Compiler 3.47s ± 1% 3.48s ± 1% ~ (p=0.070 n=19+18) SSA 11.5s ± 1% 11.7s ± 1% +1.64% (p=0.000 n=19+18) Flate 130ms ± 1% 130ms ± 1% ~ (p=0.588 n=19+20) GoParser 160ms ± 1% 161ms ± 1% ~ (p=0.211 n=20+20) Reflect 465ms ± 1% 467ms ± 1% +0.42% (p=0.007 n=20+20) Tar 184ms ± 1% 185ms ± 2% ~ (p=0.087 n=18+20) XML 253ms ± 1% 253ms ± 1% ~ (p=0.377 n=20+18) LinkCompiler 769ms ± 2% 774ms ± 2% ~ (p=0.070 n=19+19) ExternalLinkCompiler 3.59s ±11% 3.68s ± 6% ~ (p=0.072 n=20+20) LinkWithoutDebugCompiler 446ms ± 5% 454ms ± 3% +1.79% (p=0.002 n=19+20) StdCmd 26.0s ± 2% 26.0s ± 2% ~ (p=0.799 n=20+20) name old user-time/op new user-time/op delta Template 238ms ± 5% 240ms ± 5% ~ (p=0.142 n=20+20) Unicode 105ms ±11% 106ms ±10% ~ (p=0.512 n=20+20) GoTypes 876ms ± 2% 873ms ± 4% ~ (p=0.647 n=20+19) Compiler 4.17s ± 2% 4.19s ± 1% ~ (p=0.093 n=20+18) SSA 13.9s ± 1% 14.1s ± 1% +1.45% (p=0.000 n=18+18) Flate 145ms ±13% 146ms ± 5% ~ (p=0.851 n=20+18) GoParser 185ms ± 5% 188ms ± 7% ~ (p=0.174 n=20+20) Reflect 534ms ± 3% 538ms ± 2% ~ (p=0.105 n=20+18) Tar 215ms ± 4% 211ms ± 9% ~ (p=0.079 n=19+20) XML 295ms ± 6% 295ms ± 5% ~ (p=0.968 n=20+20) LinkCompiler 832ms ± 4% 837ms ± 7% ~ (p=0.707 n=17+20) ExternalLinkCompiler 1.58s ± 8% 1.60s ± 4% ~ (p=0.296 n=20+19) LinkWithoutDebugCompiler 478ms ±12% 489ms ±10% ~ (p=0.429 n=20+20) name old object-bytes new object-bytes delta Template 559kB ± 0% 559kB ± 0% ~ (all equal) Unicode 216kB ± 0% 216kB ± 0% ~ (all equal) GoTypes 2.03MB ± 0% 2.03MB ± 0% ~ (all equal) Compiler 8.07MB ± 0% 8.07MB ± 0% -0.06% (p=0.000 n=20+20) SSA 27.1MB ± 0% 27.3MB ± 0% +0.89% (p=0.000 n=20+20) Flate 343kB ± 0% 343kB ± 0% ~ (all equal) GoParser 441kB ± 0% 441kB ± 0% ~ (all equal) Reflect 1.36MB ± 0% 1.36MB ± 0% ~ (all equal) Tar 487kB ± 0% 487kB ± 0% ~ (all equal) XML 632kB ± 0% 632kB ± 0% ~ (all equal) name old export-bytes new export-bytes delta Template 18.5kB ± 0% 18.5kB ± 0% ~ (all equal) Unicode 7.92kB ± 0% 7.92kB ± 0% ~ (all equal) GoTypes 35.0kB ± 0% 35.0kB ± 0% ~ (all equal) Compiler 109kB ± 0% 110kB ± 0% +0.72% (p=0.000 n=20+20) SSA 137kB ± 0% 138kB ± 0% +0.58% (p=0.000 n=20+20) Flate 4.89kB ± 0% 4.89kB ± 0% ~ (all equal) GoParser 8.49kB ± 0% 8.49kB ± 0% ~ (all equal) Reflect 11.4kB ± 0% 11.4kB ± 0% ~ (all equal) Tar 10.5kB ± 0% 10.5kB ± 0% ~ (all equal) XML 16.7kB ± 0% 16.7kB ± 0% ~ (all equal) name old text-bytes new text-bytes delta HelloSize 761kB ± 0% 761kB ± 0% ~ (all equal) CmdGoSize 10.8MB ± 0% 10.8MB ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 10.7kB ± 0% 10.7kB ± 0% ~ (all equal) CmdGoSize 312kB ± 0% 312kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 122kB ± 0% 122kB ± 0% ~ (all equal) CmdGoSize 146kB ± 0% 146kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.13MB ± 0% 1.13MB ± 0% ~ (all equal) CmdGoSize 15.1MB ± 0% 15.1MB ± 0% ~ (all equal) Change-Id: I3cc2f9829a109543d9a68be4a21775d2d3e9801f Reviewed-on: https://go-review.googlesource.com/c/go/+/196557 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Daniel Martí <mvdan@mvdan.cc> Reviewed-by: Keith Randall <khr@golang.org>
2018-05-14cmd/compile: assign and preserve statement boundaries.David Chase
A new pass run after ssa building (before any other optimization) identifies the "first" ssa node for each statement. Other "noise" nodes are tagged as being never appropriate for a statement boundary (e.g., VarKill, VarDef, Phi). Rewrite, deadcode, cse, and nilcheck are modified to move the statement boundaries forward whenever possible if a boundary-tagged ssa value is removed; never-boundary nodes are ignored in this search (some operations involving constants are also tagged as never-boundary and also ignored because they are likely to be moved or removed during optimization). Code generation treats all nodes except those explicitly marked as statement boundaries as "not statement" nodes, and floats statement boundaries to the beginning of each same-line run of instructions found within a basic block. Line number html conversion was modified to make statement boundary nodes a bit more obvious by prepending a "+". The code in fuse.go that glued together the value slices of two blocks produced a result that depended on the former capacities (not lengths) of the two slices. This causes differences in the 386 bootstrap, and also can sometimes put values into an order that does a worse job of preserving statement boundaries when values are removed. Portions of two delve tests that had caught problems were incorporated into ssa/debug_test.go. There are some opportunities to do better with optimized code, but the next-ing is not lying or overly jumpy. Over 4 CLs, compilebench geomean measured binary size increase of 3.5% and compile user time increase of 3.8% (this is after optimization to reuse a sparse map instead of creating multiple maps.) This CL worsens the optimized-debugging experience with Delve; we need to work with the delve team so that they can use the is_stmt marks that we're emitting now. The reference output changes from time to time depending on other changes in the compiler, sometimes better, sometimes worse. This CL now includes a test ensuring that 99+% of the lines in the Go command itself (a handy optimized binary) include is_stmt markers. Change-Id: I359c94e06843f1eb41f9da437bd614885aa9644a Reviewed-on: https://go-review.googlesource.com/102435 Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Austin Clements <austin@google.com>
2017-05-19cmd/compile/internal/ssa: fix spelling mistakeKevin Burke
Change-Id: I4b8f1b61c10f60ddb3687759af0be1641c1f78ce Reviewed-on: https://go-review.googlesource.com/43111 Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
2017-05-09cmd/compile: ignore types when considering tuple select for CSETodd Neal
Fixes #20097 Change-Id: I3c9626ccc8cd0c46a7081ea8650b2ff07a5d4fcd Reviewed-on: https://go-review.googlesource.com/41505 Run-TryBot: Todd Neal <todd@tneal.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-05-09cmd/compile: change ssa.Type into *types.TypeJosh Bleecher Snyder
When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-02-09cmd/compile: remove some allocs from CSEJosh Bleecher Snyder
Pick up a few pennies: * CSE gets run twice for each function, but the set of Aux values doesn't change. Avoid populating it twice. * Don't bother populating auxmap for values that can't be CSE'd anyway. name old alloc/op new alloc/op delta Template 41.0MB ± 0% 40.7MB ± 0% -0.61% (p=0.008 n=5+5) Unicode 32.3MB ± 0% 32.3MB ± 0% -0.22% (p=0.008 n=5+5) GoTypes 122MB ± 0% 121MB ± 0% -0.55% (p=0.008 n=5+5) Compiler 482MB ± 0% 479MB ± 0% -0.58% (p=0.008 n=5+5) SSA 865MB ± 0% 862MB ± 0% -0.35% (p=0.008 n=5+5) Flate 26.5MB ± 0% 26.5MB ± 0% ~ (p=0.056 n=5+5) GoParser 32.6MB ± 0% 32.4MB ± 0% -0.58% (p=0.008 n=5+5) Reflect 84.2MB ± 0% 83.8MB ± 0% -0.57% (p=0.008 n=5+5) Tar 27.7MB ± 0% 27.6MB ± 0% -0.37% (p=0.008 n=5+5) XML 44.7MB ± 0% 44.5MB ± 0% -0.53% (p=0.008 n=5+5) name old allocs/op new allocs/op delta Template 373k ± 0% 373k ± 1% ~ (p=1.000 n=5+5) Unicode 326k ± 0% 325k ± 0% ~ (p=0.548 n=5+5) GoTypes 1.16M ± 0% 1.16M ± 0% ~ (p=0.841 n=5+5) Compiler 4.16M ± 0% 4.15M ± 0% ~ (p=0.222 n=5+5) SSA 7.57M ± 0% 7.56M ± 0% -0.22% (p=0.008 n=5+5) Flate 238k ± 1% 239k ± 1% ~ (p=0.690 n=5+5) GoParser 304k ± 0% 304k ± 0% ~ (p=1.000 n=5+5) Reflect 1.01M ± 0% 1.00M ± 0% -0.31% (p=0.016 n=4+5) Tar 245k ± 0% 245k ± 1% ~ (p=0.548 n=5+5) XML 393k ± 0% 391k ± 1% ~ (p=0.095 n=5+5) Change-Id: I78f1ffe129bd8fd590b7511717dd2bf9f5ecbd6d Reviewed-on: https://go-review.googlesource.com/36690 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Daniel Martí <mvdan@mvdan.cc> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-02-02cmd/compile: fix CSE with commutative opsKeith Randall
CSE opportunities were being missed for commutative ops. We used to order the args of commutative ops (by arg ID) once at the start of CSE. But that may not be enough. i1 = (Load ptr mem) i2 = (Load ptr mem) x1 = (Add i1 j) x2 = (Add i2 j) Equivalent commutative ops x1 and x2 may not get their args ordered in the same way because because at the start of CSE, we don't know that the i values will be CSEd. If x1 and x2 get opposite orders we won't CSE them. Instead, (re)order the args of commutative operations by their equivalence class IDs each time we partition an equivalence class. Change-Id: Ic609fa83b85299782a5e85bf93dc6023fccf4b0c Reviewed-on: https://go-review.googlesource.com/33632 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Todd Neal <todd@tneal.org>
2016-11-18cmd/compile: in cse, allow for new ssa valuesPhilip Hofer
The table of rewrites in ssa/cse is not sized appropriately for ssa IDs that are created during copying of selects into new blocks. Fixes #17918 Change-Id: I65fe86c6aab5efa679aa473aadc4ee6ea882cd41 Reviewed-on: https://go-review.googlesource.com/33240 Reviewed-by: Cherry Zhang <cherryyz@google.com> Reviewed-by: David Chase <drchase@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-10-27cmd/compile: reuse sort helpersJosh Bleecher Snyder
sort.Sort's argument always escapes. cse generates many calls to sort.Sort. Set up a single escaping variable and re-use it across loops. name old alloc/op new alloc/op delta Template 40.7MB ± 0% 40.2MB ± 0% -1.24% (p=0.000 n=15+15) Unicode 33.4MB ± 0% 33.3MB ± 0% -0.09% (p=0.000 n=15+15) GoTypes 121MB ± 0% 119MB ± 0% -1.48% (p=0.000 n=14+15) Compiler 474MB ± 0% 465MB ± 0% -1.94% (p=0.000 n=14+15) name old allocs/op new allocs/op delta Template 405k ± 0% 394k ± 0% -2.64% (p=0.000 n=15+15) Unicode 350k ± 0% 350k ± 0% -0.14% (p=0.000 n=14+15) GoTypes 1.21M ± 0% 1.18M ± 0% -3.07% (p=0.000 n=15+14) Compiler 4.37M ± 0% 4.18M ± 0% -4.39% (p=0.000 n=15+15) Change-Id: I68cf56dafa0f3ea778826eea19908bd761556154 Reviewed-on: https://go-review.googlesource.com/32220 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-10-05cmd/compile: make CSE fasterKeith Randall
To refine a set of possibly equivalent values, the old CSE algorithm picked one value, compared it against all the others, and made two sets out of the results (the values that match the picked value and the values that didn't). Unfortunately, this leads to O(n^2) behavior. The picked value ends up being equal to no other values, we make size 1 and size n-1 sets, and then recurse on the size n-1 set. Instead, sort the set by the equivalence classes of its arguments. Then we just look for spots in the sorted list where the equivalence classes of the arguments change. This lets us do a multi-way split for O(n lg n) time. This change makes cmpDepth unnecessary. The refinement portion used to call the type comparator. That is unnecessary as the type was already part of the initial partition. Lowers time of 16361 from 8 sec to 3 sec. Lowers time of 15112 from 282 sec to 20 sec. That's kind of unfair, as CL 30257 changed it from 21 sec to 282 sec. But that CL fixed other bad compile times (issue #17127) by large factors, so net still a big win. Fixes #15112 Fixes #16361 Change-Id: I351ce111bae446608968c6d48710eeb6a3d8e527 Reviewed-on: https://go-review.googlesource.com/30354 Reviewed-by: Todd Neal <todd@tneal.org>
2016-10-04cmd/compile: lower cse comparison depthKeith Randall
Todd originally set cmpDepth to 4. Quoting: I picked a depth of 4 by timing tests of `go tool compile arithConst_ssa.go` and `go test -c net/http`. 3.89 / 3.92 CL w/cmpDepth = 1 3.78 / 3.92 CL w/cmpDepth = 2 3.44 / 3.96 CL w/cmpDepth = 3 3.29 / 3.9 CL w/cmpDepth = 4 3.3 / 3.93 CL w/cmpDepth = 5 3.29 / 3.92 CL w/cmpDepth = 10 I don't see the same behavior now, differences in those two benchmarks are in the noise (between 1 and 4). In issue 17127, CSE takes a really long time. Lowering cmpDepth from 4 to 1 lowers compile time from 8 minutes to 1 minute. Fixes #17127 Change-Id: I6dc544bbcf2a9dca73637d0182d3de1a5ae6c944 Reviewed-on: https://go-review.googlesource.com/30257 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2016-09-19cmd/compile: cache CFG-dependent computationsKeith Randall
We compute a lot of stuff based off the CFG: postorder traversal, dominators, dominator tree, loop nest. Multiple phases use this information and we end up recomputing some of it. Add a cache for this information so if the CFG hasn't changed, we can reuse the previous computation. Change-Id: I9b5b58af06830bd120afbee9cfab395a0a2f74b2 Reviewed-on: https://go-review.googlesource.com/29356 Reviewed-by: David Chase <drchase@google.com>
2016-08-17cmd/compile: CSE copied tuple selectorsCherry Zhang
In CSE if a tuple generator is CSE'd to a different block, its selectors are copied to the same block. In this case, also CES the copied selectors. Test copied from Keith's CL 27202. Fixes #16741. Change-Id: I2fc8b9513d430f10d6104275cfff5fb75d3ef3d9 Reviewed-on: https://go-review.googlesource.com/27236 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@golang.org>
2016-07-18[dev.ssa] cmd/compile: clean up tuple types and selectsKeith Randall
Make tuple types and their SelectX ops fully generic. These ops no longer need to be lowered. Regalloc understands them and their tuple-generating arguments. We can now have opcodes returning arbitrary pairs of results. (And it would be easy to move to >2 results if needed.) Update arm implementation to the new standard. Implement just enough in 386 port to do 64-bit add. Change-Id: I370ed5aacce219c82e1954c61d1f63af76c16f79 Reviewed-on: https://go-review.googlesource.com/24976 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-06-24[dev.ssa] cmd/compile: move tuple selectors to generator's block in CSECherry Zhang
CSE may substitute a tuple generator with another one in a different block. In this case, since we want tuple selectors to stay together with the tuple generator, copy the selector to the new generator's block and rewrite its use. Op.isTupleGenerator and Op.isTupleSelector are introduced to assert tuple ops. Use it in tighten as well. Updates #15365. Change-Id: Ia9e8c734b9cc3bc9fca4a2750041eef9cdfac5a5 Reviewed-on: https://go-review.googlesource.com/24137 Reviewed-by: David Chase <drchase@google.com>
2016-05-26cmd/compile: improve domorder documentationJosh Bleecher Snyder
domorder has some non-obvious useful properties that we’re relying on in cse. Document them and provide an argument that they hold. While we’re here, do some minor renaming. The argument is a re-working of a private email exchange with Todd Neal and David Chase. Change-Id: Ie154e0521bde642f5f11e67fc542c5eb938258be Reviewed-on: https://go-review.googlesource.com/23449 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-05-16cmd/compile: use sparse algorithm for phis in large programDavid Chase
This adds a sparse method for locating nearest ancestors in a dominator tree, and checks blocks with more than one predecessor for differences and inserts phi functions where there are. Uses reversed post order to cut number of passes, running it from first def to last use ("last use" for paramout and mem is end-of-program; last use for a phi input from a backedge is the source of the back edge) Includes a cutover from old algorithm to new to avoid paying large constant factor for small programs. This keeps normal builds running at about the same time, while not running over-long on large machine-generated inputs. Add "phase" flags for ssa/build -- ssa/build/stats prints number of blocks, values (before and after linking references and inserting phis, so expansion can be measured), and their product; the product governs the cutover, where a good value seems to be somewhere between 1 and 5 million. Among the files compiled by make.bash, this is the shape of the tail of the distribution for #blocks, #vars, and their product: #blocks #vars product max 6171 28180 173,898,780 99.9% 1641 6548 10,401,878 99% 463 1909 873,721 95% 152 639 95,235 90% 84 359 30,021 The old algorithm is indeed usually fastest, for 99%ile values of usually. The fix to LookupVarOutgoing ( https://go-review.googlesource.com/#/c/22790/ ) deals with some of the same problems addressed by this CL, but on at least one bug ( #15537 ) this change is still a significant help. With this CL: /tmp/gopath$ rm -rf pkg bin /tmp/gopath$ time go get -v -gcflags -memprofile=y.mprof \ github.com/gogo/protobuf/test/theproto3/combos/... ... real 4m35.200s user 13m16.644s sys 0m36.712s and pprof reports 3.4GB allocated in one of the larger profiles With tip: /tmp/gopath$ rm -rf pkg bin /tmp/gopath$ time go get -v -gcflags -memprofile=y.mprof \ github.com/gogo/protobuf/test/theproto3/combos/... ... real 10m36.569s user 25m52.286s sys 4m3.696s and pprof reports 8.3GB allocated in the same larger profile With this CL, most of the compilation time on the benchmarked input is spent in register/stack allocation (cumulative 53%) and in the sparse lookup algorithm itself (cumulative 20%). Fixes #15537. Change-Id: Ia0299dda6a291534d8b08e5f9883216ded677a00 Reviewed-on: https://go-review.googlesource.com/22342 Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-05-03cmd/compile: never CSE two memoriesKeith Randall
It never makes sense to CSE two ops that generate memory. We might as well start those ops off in their own partition. Fixes #15520 Change-Id: I0091ed51640f2c10cd0117f290b034dde7a86721 Reviewed-on: https://go-review.googlesource.com/22741 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com> Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-17cmd/compile/internal/ssa: use Compare instead of EqualJosh Bleecher Snyder
They have different semantics. Equal is stricter and is designed for the front-end. Compare is looser and cheaper and is designed for the back-end. To avoid possible regression, remove Equal from ssa.Type. Updates #15043 Change-Id: Ie23ce75ff6b4d01b7982e0a89e6f81b5d099d8d6 Reviewed-on: https://go-review.googlesource.com/21483 Reviewed-by: David Chase <drchase@google.com> Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
2016-04-15cmd/compile: speed up dom checking in cseTodd Neal
Process a slice of equivalent values by setting replaced values to nil instead of removing them from the slice to eliminate copying. Also take advantage of the entry number sort to break early once we reach a value in a block that is not dominated. For the code in issue #15112: Before: real 0m52.603s user 0m56.957s sys 0m1.213s After: real 0m22.048s user 0m26.445s sys 0m0.939s Updates #15112 Change-Id: I06d9e1e1f1ad85d7fa196c5d51f0dc163907376d Reviewed-on: https://go-review.googlesource.com/22068 Reviewed-by: David Chase <drchase@google.com>
2016-04-13cmd/compile: sort partitions by dom to speed up cseTodd Neal
We do two O(n) scans of all values in an eqclass when computing substitutions for CSE. In unfortunate cases, like those found in #15112, we can have a large eqclass composed of values found in blocks none of whom dominate the other. This leads to O(n^2) behavior. The elements are removed one at a time, with O(n) scans each time. This CL removes the linear scan by sorting the eqclass so that dominant values will be sorted first. As long as we also ensure we don't disturb the sort order, then we no longer need to scan for the maximally dominant value. For the code in issue #15112: Before: real 1m26.094s user 1m30.776s sys 0m1.125s Aefter: real 0m52.099s user 0m56.829s sys 0m1.092s Updates #15112 Change-Id: Ic4f8680ed172e716232436d31963209c146ef850 Reviewed-on: https://go-review.googlesource.com/21981 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com> Run-TryBot: Todd Neal <todd@tneal.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-13cmd/compile: use shared dom tree for cse, tooAlexandru Moșoi
Missed this in the previous CL where the shared dom tree was introduced. Change-Id: If0bd85d4b4567d7e87814ed511603b1303ab3903 Reviewed-on: https://go-review.googlesource.com/21970 Run-TryBot: Alexandru Moșoi <alexandru@mosoi.ro> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2016-04-13cmd/compile: teach CSE that new objects are bespokeJosh Bleecher Snyder
runtime.newobject never returns the same thing twice, so the resulting value will never be a common subexpression. This helps when compiling large static data structures that include pointers, such as maps and slices. No clear performance impact on other code. (See below.) For the code in issue #15112: Before: real 1m14.238s user 1m18.985s sys 0m0.787s After: real 0m47.172s user 0m52.248s sys 0m0.767s For the code in issue #15235, size 10k: Before: real 0m44.916s user 0m46.577s sys 0m0.304s After: real 0m7.703s user 0m9.041s sys 0m0.316s Still more work to be done, particularly for #15112. Updates #15112 Updates #15235 name old time/op new time/op delta Template 330ms ±11% 333ms ±13% ~ (p=0.749 n=20+19) Unicode 148ms ± 6% 152ms ± 8% ~ (p=0.072 n=18+20) GoTypes 1.01s ± 7% 1.01s ± 3% ~ (p=0.583 n=20+20) Compiler 5.04s ± 2% 5.06s ± 2% ~ (p=0.314 n=20+20) name old user-ns/op new user-ns/op delta Template 444user-ms ±11% 445user-ms ±10% ~ (p=0.738 n=20+20) Unicode 215user-ms ± 5% 218user-ms ± 5% ~ (p=0.239 n=18+18) GoTypes 1.45user-s ± 3% 1.45user-s ± 4% ~ (p=0.620 n=20+20) Compiler 7.23user-s ± 2% 7.22user-s ± 2% ~ (p=0.901 n=20+19) name old alloc/op new alloc/op delta Template 55.0MB ± 0% 55.0MB ± 0% ~ (p=0.547 n=20+20) Unicode 37.6MB ± 0% 37.6MB ± 0% ~ (p=0.301 n=20+20) GoTypes 177MB ± 0% 177MB ± 0% ~ (p=0.065 n=20+19) Compiler 798MB ± 0% 797MB ± 0% -0.05% (p=0.000 n=19+20) name old allocs/op new allocs/op delta Template 492k ± 0% 493k ± 0% +0.03% (p=0.030 n=20+20) Unicode 377k ± 0% 377k ± 0% ~ (p=0.423 n=20+19) GoTypes 1.40M ± 0% 1.40M ± 0% ~ (p=0.102 n=20+20) Compiler 5.53M ± 0% 5.53M ± 0% ~ (p=0.094 n=17+18) name old text-bytes new text-bytes delta HelloSize 561k ± 0% 561k ± 0% ~ (all samples are equal) CmdGoSize 6.13M ± 0% 6.13M ± 0% ~ (all samples are equal) name old data-bytes new data-bytes delta HelloSize 128k ± 0% 128k ± 0% ~ (all samples are equal) CmdGoSize 306k ± 0% 306k ± 0% ~ (all samples are equal) name old exe-bytes new exe-bytes delta HelloSize 905k ± 0% 905k ± 0% ~ (all samples are equal) CmdGoSize 9.64M ± 0% 9.64M ± 0% ~ (all samples are equal) Change-Id: Id774e2901d7701a3ec7a1c1d1cf1d9327a4107fc Reviewed-on: https://go-review.googlesource.com/21937 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Todd Neal <todd@tneal.org>
2016-03-17cmd/compile: keep value use counts in SSAKeith Randall
Keep track of how many uses each Value has. Each appearance in Value.Args and in Block.Control counts once. The number of uses of a value is generically useful to constrain rewrite rules. For instance, we might want to prevent merging index operations into loads if the same index expression is used lots of times. But I have one use in particular for which the use count is required. We must make sure we don't combine ops with loads if the load has more than one use. Otherwise, we may split a single load into multiple loads and that breaks perceived behavior in the presence of races. In particular, the load of m.state in sync/mutex.go:Lock can't be done twice. (I have a separate CL which triggers the mutex failure. This CL has a test which demonstrates a similar failure.) Change-Id: Icaafa479239f48632a069d0c3f624e6ebc6b1f0e Reviewed-on: https://go-review.googlesource.com/20790 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Todd Neal <todd@tneal.org>
2016-03-02all: single space after period.Brad Fitzpatrick
The tree's pretty inconsistent about single space vs double space after a period in documentation. Make it consistently a single space, per earlier decisions. This means contributors won't be confused by misleading precedence. This CL doesn't use go/doc to parse. It only addresses // comments. It was generated with: $ perl -i -npe 's,^(\s*// .+[a-z]\.) +([A-Z]),$1 $2,' $(git grep -l -E '^\s*//(.+\.) +([A-Z])') $ go test go/doc -update Change-Id: Iccdb99c37c797ef1f804a94b22ba5ee4b500c4f7 Reviewed-on: https://go-review.googlesource.com/20022 Reviewed-by: Rob Pike <r@golang.org> Reviewed-by: Dave Day <djd@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-02-25[dev.ssa] cmd/compile: enhance command line option processing for SSADavid Chase
The -d compiler flag can also specify ssa phase and flag, for example -d=ssa/generic_cse/time,ssa/generic_cse/stats Spaces in the phase names can be specified with an underscore. Flags currently parsed (not necessarily recognized by the phases yet) are: on, off, mem, time, debug, stats, and test On, off and time are handled in the harness, debug, stats, and test are interpreted by the phase itself. The pass is now attached to the Func being compiled, and a new method logStats(key, ...value) on *Func to encourage a semi-standardized format for that output. Output fields are separated by tabs to ease digestion by awk and spreadsheets. For example, if f.pass.stats > 0 { f.logStat("CSE REWRITES", rewrites) } Change-Id: I16db2b5af64c50ca9a47efeb51d961147a903abc Reviewed-on: https://go-review.googlesource.com/19885 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Todd Neal <todd@tneal.org>
2016-02-24[dev.ssa] cmd/compile: speed up cseTodd Neal
Construct better initial partitions by recursively comparing values and their arguments. This saves one second on compile of arithConst_ssa.go (4.3s to 3.3s) and shows a 3-5% increase with compilebench. name old time/op new time/op delta Template 266ms ± 3% 253ms ± 4% -5.08% (p=0.032 n=5+5) GoTypes 927ms ± 3% 885ms ± 2% -4.55% (p=0.016 n=5+5) Compiler 3.91s ± 3% 3.73s ± 2% -4.49% (p=0.008 n=5+5) MakeBash 31.6s ± 1% 30.5s ± 3% -3.51% (p=0.016 n=5+5) Change-Id: I6ede31ff459131ccfed69531acfbd06b19837700 Reviewed-on: https://go-review.googlesource.com/19838 Reviewed-by: David Chase <drchase@google.com>
2016-02-22[dev.ssa] cmd/compile: double speed of CSE phaseDavid Chase
Replaced comparison based on (*Type).String() with an allocation-free structural comparison. Roughly doubles speed of CSE, also reduces allocations. Checked that roughly the same number of CSEs were detected during make.bash (about a million) and that "new" CSEs were caused by the effect described above. Change-Id: Id205a9f6986efd518043e12d651f0b01206aeb1b Reviewed-on: https://go-review.googlesource.com/19471 Reviewed-by: Keith Randall <khr@golang.org>
2016-02-22[dev.ssa] cmd/compile/internal/ssa: handle commutative operations in cseAlexandru Moșoi
* If a operation is commutative order the parameters in a canonical way. Size of pkg/tool/linux_amd64/* excluding compile: before: 95882288 after: 95868152 change: 14136 ~0.015% I tried something similar with Leq and Geq, but the results were not great because it confuses the 'lowered cse' pass too much which can no longer remove redundant comparisons from IsInBounds. Change-Id: I2f928663a11320bfc51c7fa47e384b7411c420ba Reviewed-on: https://go-review.googlesource.com/19727 Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Alexandru Moșoi <alexandru@mosoi.ro> TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-02-22[dev.ssa] cmd/compile: add a zero arg cse passTodd Neal
Add an initial cse pass that only operates on zero argument values. This removes the need for a special case in cse for removing OpSB and speeds up arithConst_ssa.go compilation by 9% while slowing "test -c net/http" by 1.5%. Change-Id: Id1500482485426f66c6c2eba75eeaf4f19c8a889 Reviewed-on: https://go-review.googlesource.com/19454 Run-TryBot: Todd Neal <todd@tneal.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-02-08[dev.ssa] cmd/compile: fix for bug in cse speed improvementsDavid Chase
Problem was caused by use of Args[].Aux differences in early partitioning. This artificially separated two equivalent expressions because sort ignores the Aux field, hence things can end with equal things separated by unequal things and thus the equal things are split into more than one partition. For example: SliceLen(a), SliceLen(b), SliceLen(a). Fix: don't use Args[].Aux in initial partitioning. Left in a debugging flag and some debugging Fprintf's; not sure if that is house style or not. We'll probably want to be more systematic in our naming conventions, e.g. ssa.cse, ssa.scc, etc. Change-Id: Ib1412539cc30d91ea542c0ac7b2f9b504108ca7f Reviewed-on: https://go-review.googlesource.com/19316 Reviewed-by: Keith Randall <khr@golang.org>
2016-02-08[dev.ssa] cmd/compile: speed up cseTodd Neal
Examine both Aux and AuxInt to form more precise initial partitions. Restructure loop to avoid repeated type.Equal() call. Speeds up compilation of testdata/gen/arithConst_ssa by 25%. Change-Id: I3cfb1d254adf0601ee69239e1885b0cf2a23575b Reviewed-on: https://go-review.googlesource.com/19313 Run-TryBot: Todd Neal <todd@tneal.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-02-04[dev.ssa] cmd/compile: remove dead codeTodd Neal
Change-Id: I1738e3af7de0972c54d74325d80781059d0796d8 Reviewed-on: https://go-review.googlesource.com/19186 Run-TryBot: Todd Neal <todd@tneal.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-01-29[dev.ssa] cmd/compile: fix -N buildKeith Randall
The OpSB hack didn't quite work. We need to really CSE these ops to make regalloc happy. Change-Id: I9f4d7bfb0929407c84ee60c9e25ff0c0fbea84af Reviewed-on: https://go-review.googlesource.com/19083 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2016-01-28[dev.ssa] cmd/compile: make cse fasterKeith Randall
It is one of the slowest compiler phases right now, and we run two of them. Instead of using a map to make the initial partition, use a sort. It is much less memory intensive. Do a few optimizations to avoid work for size-1 equivalence classes. Implement -N. Change-Id: I1d2d85d3771abc918db4dd7cc30b0b2d854b15e1 Reviewed-on: https://go-review.googlesource.com/19024 Reviewed-by: David Chase <drchase@google.com>
2015-12-12[dev.ssa] cmd/compile: allow control values to be CSEdKeith Randall
With the separate flagalloc pass, it should be fine to allow CSE of control values. The worst that can happen is that the comparison gets un-CSEd by flagalloc. Fix bug in flagalloc where flag restores were getting clobbered by rematerialization during register allocation. Change-Id: If476cf98b69973e8f1a8eb29441136dd12fab8ad Reviewed-on: https://go-review.googlesource.com/17760 Reviewed-by: David Chase <drchase@google.com> Run-TryBot: Keith Randall <khr@golang.org>
2015-09-11[dev.ssa] cmd/compile: minor CSE cleanupJosh Bleecher Snyder
Remove unnecessary local var split. Change-Id: I907ef682b5fd9b3a67771edd1fe90c558f8937ea Reviewed-on: https://go-review.googlesource.com/14523 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> Reviewed-by: Keith Randall <khr@golang.org>
2015-09-09[dev.ssa] cmd/compile: fix N^2 dominator queries in CSEDavid Chase
Added tree numbering data structure. Changed dominator query in CSE. Removed skip-for-too-big patch in CSE. Passes all.bash. Change-Id: I98d7c61b6015c81f5edab553615db17bc7a58d68 Reviewed-on: https://go-review.googlesource.com/14326 Reviewed-by: Keith Randall <khr@golang.org>
2015-09-06[dev.ssa] cmd/compile: fix buildJosh Bleecher Snyder
CL 14337 made SSA support fixedbugs/issue9604b.go. That test contains > 40k blocks. This made the O(n^2) dom algorithm fail to terminate in a reasonable length of time, breaking the build. For the moment, cap the number of blocks to fix the build. This will be reverted when a more efficient dom algorithm is put in place, which will be soon. Change-Id: Ia66c2629481d29d06655ec54d1deff076b0422c6 Reviewed-on: https://go-review.googlesource.com/14342 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2015-09-04[dev.ssa] cmd/compile: store floats in AuxIntTodd Neal
Store floats in AuxInt to reduce allocations. Change-Id: I101e6322530b4a0b2ea3591593ad022c992e8df8 Reviewed-on: https://go-review.googlesource.com/14320 Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@golang.org>
2015-09-03[dev.ssa] cmd/compile: cse should treat -0.0 and 0.0 as differentTodd Neal
cse was incorrectly classifying -0.0 and 0.0 as equivalent. This lead to invalid code as ssa uses PXOR -0.0, reg to negate a floating point. Fixes math. Change-Id: Id7eb10c71749eaed897f29b02c33891cf5820acf Reviewed-on: https://go-review.googlesource.com/14205 Reviewed-by: Keith Randall <khr@golang.org>
2015-07-23[dev.ssa] cmd/compile: don't alloc new CSE classesJosh Bleecher Snyder
This reduces the time to compile test/slice3.go on my laptop from ~12s to ~3.8s. It reduces the max memory use from ~4.8gb to ~450mb. This is still considerably worse than tip, at 1s and 300mb respectively, but it's getting closer. Hopefully this will fix the build at long last. Change-Id: Iac26b52023f408438cba3ea1b81dcd82ca402b90 Reviewed-on: https://go-review.googlesource.com/12566 Reviewed-by: Keith Randall <khr@golang.org>
2015-07-23[dev.ssa] cmd/compile: use v.Args[x].Op in CSE keyJosh Bleecher Snyder
Experimentally, the Ops of v.Args do a good job of differentiating values that will end up in different partitions. Most values have at most two args, so use them. This reduces the wall time to run test/slice3.go on my laptop from ~20s to ~12s. Credit to Todd Neal for the idea. Change-Id: I55d08f09eb678bbe8366924ca2fabcd32526bf41 Reviewed-on: https://go-review.googlesource.com/12565 Reviewed-by: Keith Randall <khr@golang.org>
2015-07-23[dev.ssa] cmd/compile: speed up cseJosh Bleecher Snyder
By walking only the current set of partitions at any given point, the cse pass ended up doing lots of extraneous, effectively O(n^2) work. Using a regular for loop allows each cse pass to make as much progress as possible by processing each new class as it is introduced. This can and should be optimized further, but it already reduces by 75% cse time on test/slice3.go. The overall time to compile test/slice3.go is still dominated by the O(n^2) work in the liveness pass. However, Keith is rewriting regalloc anyway. Change-Id: I8be020b2f69352234587eeadeba923481bf43fcc Reviewed-on: https://go-review.googlesource.com/12244 Reviewed-by: Keith Randall <khr@golang.org>
2015-07-23[dev.ssa] cmd/compile: don't combine phi vars from different blocks in CSEJosh Bleecher Snyder
Here is a concrete case in which this goes wrong. func f_ssa() int { var n int Next: for j := 0; j < 3; j++ { for i := 0; i < 10; i++ { if i == 6 { continue Next } n = i } n += j + j + j + j + j + j + j + j + j + j // j * 10 } return n } What follows is the function printout before and after CSE. Note blocks b8 and b10 in the before case. b8 is the inner loop's condition: i < 10. b10 is the inner loop's increment: i++. v82 is i. On entry to b8, it is either 0 (v19) the first time, or the result of incrementing v82, by way of v29. The CSE pass considered v82 and v49 to be common subexpressions, and eliminated v82 in favor of v49. In the after case, v82 is now dead and will shortly be eliminated. As a result, v29 is also dead, and we have lost the increment. The loop runs forever. BEFORE CSE f_ssa <nil> b1: v1 = Arg <mem> v2 = SP <uint64> v4 = Addr <*int> {~r0} v2 v13 = Zero <mem> [8] v4 v1 v14 = Const <int> v15 = Const <int> v17 = Const <int> [3] v19 = Const <int> v21 = Const <int> [10] v24 = Const <int> [6] v28 = Const <int> [1] v43 = Const <int> [1] Plain -> b3 b2: <- b7 Exit v47 b3: <- b1 Plain -> b4 b4: <- b3 b6 v49 = Phi <int> v15 v44 v68 = Phi <int> v14 v67 v81 = Phi <mem> v13 v81 v18 = Less <bool> v49 v17 If v18 -> b5 b7 b5: <- b4 Plain -> b8 b6: <- b12 b11 v67 = Phi <int> v66 v41 v44 = Add <int> v49 v43 Plain -> b4 b7: <- b4 v47 = Store <mem> v4 v68 v81 Plain -> b2 b8: <- b5 b10 v66 = Phi <int> v68 v82 v82 = Phi <int> v19 v29 v22 = Less <bool> v82 v21 If v22 -> b9 b11 b9: <- b8 v25 = Eq <bool> v82 v24 If v25 -> b12 b13 b10: <- b13 v29 = Add <int> v82 v28 Plain -> b8 b11: <- b8 v32 = Add <int> v49 v49 v33 = Add <int> v32 v49 v34 = Add <int> v33 v49 v35 = Add <int> v34 v49 v36 = Add <int> v35 v49 v37 = Add <int> v36 v49 v38 = Add <int> v37 v49 v39 = Add <int> v38 v49 v40 = Add <int> v39 v49 v41 = Add <int> v66 v40 Plain -> b6 b12: <- b9 Plain -> b6 b13: <- b9 Plain -> b10 AFTER CSE f_ssa <nil> b1: v1 = Arg <mem> v2 = SP <uint64> v4 = Addr <*int> {~r0} v2 v13 = Zero <mem> [8] v4 v1 v14 = Const <int> v15 = Const <int> v17 = Const <int> [3] v19 = Const <int> v21 = Const <int> [10] v24 = Const <int> [6] v28 = Const <int> [1] v43 = Const <int> [1] Plain -> b3 b2: <- b7 Exit v47 b3: <- b1 Plain -> b4 b4: <- b3 b6 v49 = Phi <int> v19 v44 v68 = Phi <int> v19 v67 v81 = Phi <mem> v13 v81 v18 = Less <bool> v49 v17 If v18 -> b5 b7 b5: <- b4 Plain -> b8 b6: <- b12 b11 v67 = Phi <int> v66 v41 v44 = Add <int> v49 v43 Plain -> b4 b7: <- b4 v47 = Store <mem> v4 v68 v81 Plain -> b2 b8: <- b5 b10 v66 = Phi <int> v68 v49 v82 = Phi <int> v19 v29 v22 = Less <bool> v49 v21 If v22 -> b9 b11 b9: <- b8 v25 = Eq <bool> v49 v24 If v25 -> b12 b13 b10: <- b13 v29 = Add <int> v49 v43 Plain -> b8 b11: <- b8 v32 = Add <int> v49 v49 v33 = Add <int> v32 v49 v34 = Add <int> v33 v49 v35 = Add <int> v34 v49 v36 = Add <int> v35 v49 v37 = Add <int> v36 v49 v38 = Add <int> v37 v49 v39 = Add <int> v38 v49 v40 = Add <int> v39 v49 v41 = Add <int> v66 v40 Plain -> b6 b12: <- b9 Plain -> b6 b13: <- b9 Plain -> b10 Change-Id: I16fc4ec527ec63f24f7d0d79d1a4a59bf37269de Reviewed-on: https://go-review.googlesource.com/12444 Reviewed-by: Keith Randall <khr@golang.org>