Age | Commit message (Collapse) | Author |
|
StaticLECall (multiple value in +mem, multiple value result +mem) ->
StaticCall (multiple ergister value in +mem,
multiple register-sized-value result +mem) ->
ARCH CallStatic (multiple ergister value in +mem,
multiple register-sized-value result +mem)
But the architecture-dependent stuff is indifferent to whether
it is mem->mem or (mem)->(mem) until Prog generation.
Deal with OpSelectN -> Prog in ssagen/ssa.go, others, as they
appear.
For #40724.
Change-Id: I1d0436f6371054f1881862641d8e7e418e4a6a16
Reviewed-on: https://go-review.googlesource.com/c/go/+/293391
Trust: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
It's currently hard to automate refactorings around the Value.Aux
field, because we don't have any static typing information for it.
Adding a tag interface will make subsequent CLs easier and safer.
Passes buildall w/ toolstash -cmp.
Updates #42982.
Change-Id: I41ae8e411a66bda3195a0957b60c2fe8a8002893
Reviewed-on: https://go-review.googlesource.com/c/go/+/275756
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Trust: Matthew Dempsky <mdempsky@google.com>
|
|
The scheduler assumes two special invariants that apply to tuple
selectors (Select0 and Select1 ops):
1. There is only one tuple selector of each type per generator.
2. Tuple selectors and generators reside in the same block.
Prior to this CL the assumption was that these invariants would
only be broken by the CSE pass. The CSE pass therefore contained
code to move and de-duplicate selectors to fix these invariants.
However it is also possible to write relatively basic optimization
rules that cause these invariants to be broken. For example:
(A (Select0 (B))) -> (Select1 (B))
This rule could result in the newly added selector (Select1) being
in a different block to the tuple generator (see issue #38356). It
could also result in duplicate selectors if this rule matches
multiple times for the same tuple generator (see issue #39472).
The CSE pass will 'fix' these invariants. However it will only do
so when optimizations are enabled (since disabling optimizations
disables the CSE pass).
This CL moves the CSE tuple selector fixup code into its own pass
and makes it mandatory even when optimizations are disabled. This
allows tuple selectors to be treated like normal ops for most of
the compilation pipeline until after the new pass has run, at which
point we need to be careful to maintain the invariant again.
Fixes #39472.
Change-Id: Ia3f79e09d9c65ac95f897ce37e967ee1258a080b
Reviewed-on: https://go-review.googlesource.com/c/go/+/237118
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
When tuple generators and selectors are eliminated as part of the
CSE pass we may end up with tuple selectors that are in different
blocks to the tuple generators that they correspond to. This breaks
the invariant that tuple generators and their corresponding
selectors must be in the same block. Therefore after CSE this
situation must be corrected.
Unfortunately the fixup code did not take into account that selectors
could be eliminated by CSE. It assumed that only the tuple generators
could be eliminated. In some situations this meant that it got into
a state where it was replacing references to selectors with references
to dead selectors in the wrong block.
To fix this we move the fixup code after the CSE rewrites have been
applied. This removes any difficult-to-reason-about interactions
with the CSE rewriter.
Fixes #38916.
Change-Id: I2211982dcdba399d03299f0a819945b3eb93b291
Reviewed-on: https://go-review.googlesource.com/c/go/+/233857
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
Once defined, a stack slot holding an open-coded defer arg should always be marked
live, since it may be used at any time if there is a panic. These stack slots are
typically kept live naturally by the open-defer code inlined at each return/exit point.
However, we need to do extra work to make sure that they are kept live if a
function has an infinite loop or a panic exit.
For this fix, only in the case of a function that is using open-coded defers, we
compute the set of blocks (most often empty) that cannot reach a return or a
BlockExit (panic) because of an infinite loop. Then, for each block b which
cannot reach a return or BlockExit or is a BlockExit block, we mark each defer arg
slot as live, as long as the definition of the defer arg slot dominates block b.
For this change, had to export (*Func).sdom (-> Sdom) and SparseTree.isAncestorEq
(-> IsAncestorEq)
Updates #35277
Change-Id: I7b53c9bd38ba384a3794386dd0eb94e4cbde4eb1
Reviewed-on: https://go-review.googlesource.com/c/go/+/204802
Run-TryBot: Dan Scales <danscales@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Control values are used to choose which successor of a block is
jumped to. Typically a control value takes the form of a 'flags'
value that represents the result of a comparison. Some
architectures however use a variable in a register as a control
value.
Up until now we have managed with a single control value per block.
However some architectures (e.g. s390x and riscv64) have combined
compare-and-branch instructions that take two variables in registers
as parameters. To generate these instructions we need to support 2
control values per block.
This CL allows up to 2 control values to be used in a block in
order to support the addition of compare-and-branch instructions.
I have implemented s390x compare-and-branch instructions in a
different CL.
Passes toolstash-check -all.
Results of compilebench:
name old time/op new time/op delta
Template 208ms ± 1% 209ms ± 1% ~ (p=0.289 n=20+20)
Unicode 83.7ms ± 1% 83.3ms ± 3% -0.49% (p=0.017 n=18+18)
GoTypes 748ms ± 1% 748ms ± 0% ~ (p=0.460 n=20+18)
Compiler 3.47s ± 1% 3.48s ± 1% ~ (p=0.070 n=19+18)
SSA 11.5s ± 1% 11.7s ± 1% +1.64% (p=0.000 n=19+18)
Flate 130ms ± 1% 130ms ± 1% ~ (p=0.588 n=19+20)
GoParser 160ms ± 1% 161ms ± 1% ~ (p=0.211 n=20+20)
Reflect 465ms ± 1% 467ms ± 1% +0.42% (p=0.007 n=20+20)
Tar 184ms ± 1% 185ms ± 2% ~ (p=0.087 n=18+20)
XML 253ms ± 1% 253ms ± 1% ~ (p=0.377 n=20+18)
LinkCompiler 769ms ± 2% 774ms ± 2% ~ (p=0.070 n=19+19)
ExternalLinkCompiler 3.59s ±11% 3.68s ± 6% ~ (p=0.072 n=20+20)
LinkWithoutDebugCompiler 446ms ± 5% 454ms ± 3% +1.79% (p=0.002 n=19+20)
StdCmd 26.0s ± 2% 26.0s ± 2% ~ (p=0.799 n=20+20)
name old user-time/op new user-time/op delta
Template 238ms ± 5% 240ms ± 5% ~ (p=0.142 n=20+20)
Unicode 105ms ±11% 106ms ±10% ~ (p=0.512 n=20+20)
GoTypes 876ms ± 2% 873ms ± 4% ~ (p=0.647 n=20+19)
Compiler 4.17s ± 2% 4.19s ± 1% ~ (p=0.093 n=20+18)
SSA 13.9s ± 1% 14.1s ± 1% +1.45% (p=0.000 n=18+18)
Flate 145ms ±13% 146ms ± 5% ~ (p=0.851 n=20+18)
GoParser 185ms ± 5% 188ms ± 7% ~ (p=0.174 n=20+20)
Reflect 534ms ± 3% 538ms ± 2% ~ (p=0.105 n=20+18)
Tar 215ms ± 4% 211ms ± 9% ~ (p=0.079 n=19+20)
XML 295ms ± 6% 295ms ± 5% ~ (p=0.968 n=20+20)
LinkCompiler 832ms ± 4% 837ms ± 7% ~ (p=0.707 n=17+20)
ExternalLinkCompiler 1.58s ± 8% 1.60s ± 4% ~ (p=0.296 n=20+19)
LinkWithoutDebugCompiler 478ms ±12% 489ms ±10% ~ (p=0.429 n=20+20)
name old object-bytes new object-bytes delta
Template 559kB ± 0% 559kB ± 0% ~ (all equal)
Unicode 216kB ± 0% 216kB ± 0% ~ (all equal)
GoTypes 2.03MB ± 0% 2.03MB ± 0% ~ (all equal)
Compiler 8.07MB ± 0% 8.07MB ± 0% -0.06% (p=0.000 n=20+20)
SSA 27.1MB ± 0% 27.3MB ± 0% +0.89% (p=0.000 n=20+20)
Flate 343kB ± 0% 343kB ± 0% ~ (all equal)
GoParser 441kB ± 0% 441kB ± 0% ~ (all equal)
Reflect 1.36MB ± 0% 1.36MB ± 0% ~ (all equal)
Tar 487kB ± 0% 487kB ± 0% ~ (all equal)
XML 632kB ± 0% 632kB ± 0% ~ (all equal)
name old export-bytes new export-bytes delta
Template 18.5kB ± 0% 18.5kB ± 0% ~ (all equal)
Unicode 7.92kB ± 0% 7.92kB ± 0% ~ (all equal)
GoTypes 35.0kB ± 0% 35.0kB ± 0% ~ (all equal)
Compiler 109kB ± 0% 110kB ± 0% +0.72% (p=0.000 n=20+20)
SSA 137kB ± 0% 138kB ± 0% +0.58% (p=0.000 n=20+20)
Flate 4.89kB ± 0% 4.89kB ± 0% ~ (all equal)
GoParser 8.49kB ± 0% 8.49kB ± 0% ~ (all equal)
Reflect 11.4kB ± 0% 11.4kB ± 0% ~ (all equal)
Tar 10.5kB ± 0% 10.5kB ± 0% ~ (all equal)
XML 16.7kB ± 0% 16.7kB ± 0% ~ (all equal)
name old text-bytes new text-bytes delta
HelloSize 761kB ± 0% 761kB ± 0% ~ (all equal)
CmdGoSize 10.8MB ± 0% 10.8MB ± 0% ~ (all equal)
name old data-bytes new data-bytes delta
HelloSize 10.7kB ± 0% 10.7kB ± 0% ~ (all equal)
CmdGoSize 312kB ± 0% 312kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 122kB ± 0% 122kB ± 0% ~ (all equal)
CmdGoSize 146kB ± 0% 146kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.13MB ± 0% 1.13MB ± 0% ~ (all equal)
CmdGoSize 15.1MB ± 0% 15.1MB ± 0% ~ (all equal)
Change-Id: I3cc2f9829a109543d9a68be4a21775d2d3e9801f
Reviewed-on: https://go-review.googlesource.com/c/go/+/196557
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
A new pass run after ssa building (before any other
optimization) identifies the "first" ssa node for each
statement. Other "noise" nodes are tagged as being never
appropriate for a statement boundary (e.g., VarKill, VarDef,
Phi).
Rewrite, deadcode, cse, and nilcheck are modified to move
the statement boundaries forward whenever possible if a
boundary-tagged ssa value is removed; never-boundary nodes
are ignored in this search (some operations involving
constants are also tagged as never-boundary and also ignored
because they are likely to be moved or removed during
optimization).
Code generation treats all nodes except those explicitly
marked as statement boundaries as "not statement" nodes,
and floats statement boundaries to the beginning of each
same-line run of instructions found within a basic block.
Line number html conversion was modified to make statement
boundary nodes a bit more obvious by prepending a "+".
The code in fuse.go that glued together the value slices
of two blocks produced a result that depended on the
former capacities (not lengths) of the two slices. This
causes differences in the 386 bootstrap, and also can
sometimes put values into an order that does a worse job
of preserving statement boundaries when values are removed.
Portions of two delve tests that had caught problems were
incorporated into ssa/debug_test.go. There are some
opportunities to do better with optimized code, but the
next-ing is not lying or overly jumpy.
Over 4 CLs, compilebench geomean measured binary size
increase of 3.5% and compile user time increase of 3.8%
(this is after optimization to reuse a sparse map instead
of creating multiple maps.)
This CL worsens the optimized-debugging experience with
Delve; we need to work with the delve team so that
they can use the is_stmt marks that we're emitting now.
The reference output changes from time to time depending
on other changes in the compiler, sometimes better,
sometimes worse.
This CL now includes a test ensuring that 99+% of the lines
in the Go command itself (a handy optimized binary) include
is_stmt markers.
Change-Id: I359c94e06843f1eb41f9da437bd614885aa9644a
Reviewed-on: https://go-review.googlesource.com/102435
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
|
|
Change-Id: I4b8f1b61c10f60ddb3687759af0be1641c1f78ce
Reviewed-on: https://go-review.googlesource.com/43111
Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
|
|
Fixes #20097
Change-Id: I3c9626ccc8cd0c46a7081ea8650b2ff07a5d4fcd
Reviewed-on: https://go-review.googlesource.com/41505
Run-TryBot: Todd Neal <todd@tneal.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
When package ssa was created, Type was in package gc.
To avoid circular dependencies, we used an interface (ssa.Type)
to represent type information in SSA.
In the Go 1.9 cycle, gri extricated the Type type from package gc.
As a result, we can now use it in package ssa.
Now, instead of package types depending on package ssa,
it is the other way.
This is a more sensible dependency tree,
and helps compiler performance a bit.
Though this is a big CL, most of the changes are
mechanical and uninteresting.
Interesting bits:
* Add new singleton globals to package types for the special
SSA types Memory, Void, Invalid, Flags, and Int128.
* Add two new Types, TSSA for the special types,
and TTUPLE, for SSA tuple types.
ssa.MakeTuple is now types.NewTuple.
* Move type comparison result constants CMPlt, CMPeq, and CMPgt
to package types.
* We had picked the name "types" in our rules for the handy
list of types provided by ssa.Config. That conflicted with
the types package name, so change it to "typ".
* Update the type comparison routine to handle tuples and special
types inline.
* Teach gc/fmt.go how to print special types.
* We can now eliminate ElemTypes in favor of just Elem,
and probably also some other duplicated Type methods
designed to return ssa.Type instead of *types.Type.
* The ssa tests were using their own dummy types,
and they were not particularly careful about types in general.
Of necessity, this CL switches them to use *types.Type;
it does not make them more type-accurate.
Unfortunately, using types.Type means initializing a bit
of the types universe.
This is prime for refactoring and improvement.
This shrinks ssa.Value; it now fits in a smaller size class
on 64 bit systems. This doesn't have a giant impact,
though, since most Values are preallocated in a chunk.
name old alloc/op new alloc/op delta
Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8)
Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10)
GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10)
Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10)
GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9)
Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8)
Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10)
XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10)
[Geo mean] 40.5MB 40.3MB -0.68%
name old allocs/op new allocs/op delta
Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9)
Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10)
GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10)
Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10)
GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9)
Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8)
Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10)
XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10)
[Geo mean] 428k 428k -0.01%
Removing all the interface calls helps non-trivially with CPU, though.
name old time/op new time/op delta
Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96)
Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96)
GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96)
Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99)
GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97)
Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99)
Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94)
XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95)
[Geo mean] 178ms 173ms -2.65%
name old user-time/op new user-time/op delta
Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99)
Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95)
GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99)
Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96)
GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100)
Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92)
Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100)
XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97)
[Geo mean] 220ms 213ms -2.76%
Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1
Reviewed-on: https://go-review.googlesource.com/42145
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Pick up a few pennies:
* CSE gets run twice for each function,
but the set of Aux values doesn't change.
Avoid populating it twice.
* Don't bother populating auxmap for values
that can't be CSE'd anyway.
name old alloc/op new alloc/op delta
Template 41.0MB ± 0% 40.7MB ± 0% -0.61% (p=0.008 n=5+5)
Unicode 32.3MB ± 0% 32.3MB ± 0% -0.22% (p=0.008 n=5+5)
GoTypes 122MB ± 0% 121MB ± 0% -0.55% (p=0.008 n=5+5)
Compiler 482MB ± 0% 479MB ± 0% -0.58% (p=0.008 n=5+5)
SSA 865MB ± 0% 862MB ± 0% -0.35% (p=0.008 n=5+5)
Flate 26.5MB ± 0% 26.5MB ± 0% ~ (p=0.056 n=5+5)
GoParser 32.6MB ± 0% 32.4MB ± 0% -0.58% (p=0.008 n=5+5)
Reflect 84.2MB ± 0% 83.8MB ± 0% -0.57% (p=0.008 n=5+5)
Tar 27.7MB ± 0% 27.6MB ± 0% -0.37% (p=0.008 n=5+5)
XML 44.7MB ± 0% 44.5MB ± 0% -0.53% (p=0.008 n=5+5)
name old allocs/op new allocs/op delta
Template 373k ± 0% 373k ± 1% ~ (p=1.000 n=5+5)
Unicode 326k ± 0% 325k ± 0% ~ (p=0.548 n=5+5)
GoTypes 1.16M ± 0% 1.16M ± 0% ~ (p=0.841 n=5+5)
Compiler 4.16M ± 0% 4.15M ± 0% ~ (p=0.222 n=5+5)
SSA 7.57M ± 0% 7.56M ± 0% -0.22% (p=0.008 n=5+5)
Flate 238k ± 1% 239k ± 1% ~ (p=0.690 n=5+5)
GoParser 304k ± 0% 304k ± 0% ~ (p=1.000 n=5+5)
Reflect 1.01M ± 0% 1.00M ± 0% -0.31% (p=0.016 n=4+5)
Tar 245k ± 0% 245k ± 1% ~ (p=0.548 n=5+5)
XML 393k ± 0% 391k ± 1% ~ (p=0.095 n=5+5)
Change-Id: I78f1ffe129bd8fd590b7511717dd2bf9f5ecbd6d
Reviewed-on: https://go-review.googlesource.com/36690
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
CSE opportunities were being missed for commutative ops. We used to
order the args of commutative ops (by arg ID) once at the start of CSE.
But that may not be enough.
i1 = (Load ptr mem)
i2 = (Load ptr mem)
x1 = (Add i1 j)
x2 = (Add i2 j)
Equivalent commutative ops x1 and x2 may not get their args ordered in
the same way because because at the start of CSE, we don't know that
the i values will be CSEd. If x1 and x2 get opposite orders we won't
CSE them.
Instead, (re)order the args of commutative operations by their
equivalence class IDs each time we partition an equivalence class.
Change-Id: Ic609fa83b85299782a5e85bf93dc6023fccf4b0c
Reviewed-on: https://go-review.googlesource.com/33632
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Todd Neal <todd@tneal.org>
|
|
The table of rewrites in ssa/cse is not sized appropriately for
ssa IDs that are created during copying of selects into new blocks.
Fixes #17918
Change-Id: I65fe86c6aab5efa679aa473aadc4ee6ea882cd41
Reviewed-on: https://go-review.googlesource.com/33240
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
sort.Sort's argument always escapes.
cse generates many calls to sort.Sort.
Set up a single escaping variable
and re-use it across loops.
name old alloc/op new alloc/op delta
Template 40.7MB ± 0% 40.2MB ± 0% -1.24% (p=0.000 n=15+15)
Unicode 33.4MB ± 0% 33.3MB ± 0% -0.09% (p=0.000 n=15+15)
GoTypes 121MB ± 0% 119MB ± 0% -1.48% (p=0.000 n=14+15)
Compiler 474MB ± 0% 465MB ± 0% -1.94% (p=0.000 n=14+15)
name old allocs/op new allocs/op delta
Template 405k ± 0% 394k ± 0% -2.64% (p=0.000 n=15+15)
Unicode 350k ± 0% 350k ± 0% -0.14% (p=0.000 n=14+15)
GoTypes 1.21M ± 0% 1.18M ± 0% -3.07% (p=0.000 n=15+14)
Compiler 4.37M ± 0% 4.18M ± 0% -4.39% (p=0.000 n=15+15)
Change-Id: I68cf56dafa0f3ea778826eea19908bd761556154
Reviewed-on: https://go-review.googlesource.com/32220
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
To refine a set of possibly equivalent values, the old CSE algorithm
picked one value, compared it against all the others, and made two sets
out of the results (the values that match the picked value and the
values that didn't). Unfortunately, this leads to O(n^2) behavior. The
picked value ends up being equal to no other values, we make size 1 and
size n-1 sets, and then recurse on the size n-1 set.
Instead, sort the set by the equivalence classes of its arguments. Then
we just look for spots in the sorted list where the equivalence classes
of the arguments change. This lets us do a multi-way split for O(n lg
n) time.
This change makes cmpDepth unnecessary.
The refinement portion used to call the type comparator. That is
unnecessary as the type was already part of the initial partition.
Lowers time of 16361 from 8 sec to 3 sec.
Lowers time of 15112 from 282 sec to 20 sec. That's kind of unfair, as
CL 30257 changed it from 21 sec to 282 sec. But that CL fixed other bad
compile times (issue #17127) by large factors, so net still a big win.
Fixes #15112
Fixes #16361
Change-Id: I351ce111bae446608968c6d48710eeb6a3d8e527
Reviewed-on: https://go-review.googlesource.com/30354
Reviewed-by: Todd Neal <todd@tneal.org>
|
|
Todd originally set cmpDepth to 4. Quoting:
I picked a depth of 4 by timing tests of `go tool compile arithConst_ssa.go` and `go test -c net/http`.
3.89 / 3.92 CL w/cmpDepth = 1
3.78 / 3.92 CL w/cmpDepth = 2
3.44 / 3.96 CL w/cmpDepth = 3
3.29 / 3.9 CL w/cmpDepth = 4
3.3 / 3.93 CL w/cmpDepth = 5
3.29 / 3.92 CL w/cmpDepth = 10
I don't see the same behavior now, differences in those two benchmarks
are in the noise (between 1 and 4).
In issue 17127, CSE takes a really long time. Lowering cmpDepth
from 4 to 1 lowers compile time from 8 minutes to 1 minute.
Fixes #17127
Change-Id: I6dc544bbcf2a9dca73637d0182d3de1a5ae6c944
Reviewed-on: https://go-review.googlesource.com/30257
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
|
|
We compute a lot of stuff based off the CFG: postorder traversal,
dominators, dominator tree, loop nest. Multiple phases use this
information and we end up recomputing some of it. Add a cache
for this information so if the CFG hasn't changed, we can reuse
the previous computation.
Change-Id: I9b5b58af06830bd120afbee9cfab395a0a2f74b2
Reviewed-on: https://go-review.googlesource.com/29356
Reviewed-by: David Chase <drchase@google.com>
|
|
In CSE if a tuple generator is CSE'd to a different block, its
selectors are copied to the same block. In this case, also CES
the copied selectors.
Test copied from Keith's CL 27202.
Fixes #16741.
Change-Id: I2fc8b9513d430f10d6104275cfff5fb75d3ef3d9
Reviewed-on: https://go-review.googlesource.com/27236
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Make tuple types and their SelectX ops fully generic.
These ops no longer need to be lowered.
Regalloc understands them and their tuple-generating arguments.
We can now have opcodes returning arbitrary pairs of results.
(And it would be easy to move to >2 results if needed.)
Update arm implementation to the new standard.
Implement just enough in 386 port to do 64-bit add.
Change-Id: I370ed5aacce219c82e1954c61d1f63af76c16f79
Reviewed-on: https://go-review.googlesource.com/24976
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
CSE may substitute a tuple generator with another one in a different
block. In this case, since we want tuple selectors to stay together
with the tuple generator, copy the selector to the new generator's
block and rewrite its use.
Op.isTupleGenerator and Op.isTupleSelector are introduced to assert
tuple ops. Use it in tighten as well.
Updates #15365.
Change-Id: Ia9e8c734b9cc3bc9fca4a2750041eef9cdfac5a5
Reviewed-on: https://go-review.googlesource.com/24137
Reviewed-by: David Chase <drchase@google.com>
|
|
domorder has some non-obvious useful properties
that we’re relying on in cse.
Document them and provide an argument that they hold.
While we’re here, do some minor renaming.
The argument is a re-working of a private email
exchange with Todd Neal and David Chase.
Change-Id: Ie154e0521bde642f5f11e67fc542c5eb938258be
Reviewed-on: https://go-review.googlesource.com/23449
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
This adds a sparse method for locating nearest ancestors
in a dominator tree, and checks blocks with more than one
predecessor for differences and inserts phi functions where
there are.
Uses reversed post order to cut number of passes, running
it from first def to last use ("last use" for paramout and
mem is end-of-program; last use for a phi input from a
backedge is the source of the back edge)
Includes a cutover from old algorithm to new to avoid paying
large constant factor for small programs. This keeps normal
builds running at about the same time, while not running
over-long on large machine-generated inputs.
Add "phase" flags for ssa/build -- ssa/build/stats prints
number of blocks, values (before and after linking references
and inserting phis, so expansion can be measured), and their
product; the product governs the cutover, where a good value
seems to be somewhere between 1 and 5 million.
Among the files compiled by make.bash, this is the shape of
the tail of the distribution for #blocks, #vars, and their
product:
#blocks #vars product
max 6171 28180 173,898,780
99.9% 1641 6548 10,401,878
99% 463 1909 873,721
95% 152 639 95,235
90% 84 359 30,021
The old algorithm is indeed usually fastest, for 99%ile
values of usually.
The fix to LookupVarOutgoing
( https://go-review.googlesource.com/#/c/22790/ )
deals with some of the same problems addressed by this CL,
but on at least one bug ( #15537 ) this change is still
a significant help.
With this CL:
/tmp/gopath$ rm -rf pkg bin
/tmp/gopath$ time go get -v -gcflags -memprofile=y.mprof \
github.com/gogo/protobuf/test/theproto3/combos/...
...
real 4m35.200s
user 13m16.644s
sys 0m36.712s
and pprof reports 3.4GB allocated in one of the larger profiles
With tip:
/tmp/gopath$ rm -rf pkg bin
/tmp/gopath$ time go get -v -gcflags -memprofile=y.mprof \
github.com/gogo/protobuf/test/theproto3/combos/...
...
real 10m36.569s
user 25m52.286s
sys 4m3.696s
and pprof reports 8.3GB allocated in the same larger profile
With this CL, most of the compilation time on the benchmarked
input is spent in register/stack allocation (cumulative 53%)
and in the sparse lookup algorithm itself (cumulative 20%).
Fixes #15537.
Change-Id: Ia0299dda6a291534d8b08e5f9883216ded677a00
Reviewed-on: https://go-review.googlesource.com/22342
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
It never makes sense to CSE two ops that generate memory.
We might as well start those ops off in their own partition.
Fixes #15520
Change-Id: I0091ed51640f2c10cd0117f290b034dde7a86721
Reviewed-on: https://go-review.googlesource.com/22741
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
They have different semantics.
Equal is stricter and is designed for the front-end.
Compare is looser and cheaper and is designed for the back-end.
To avoid possible regression, remove Equal from ssa.Type.
Updates #15043
Change-Id: Ie23ce75ff6b4d01b7982e0a89e6f81b5d099d8d6
Reviewed-on: https://go-review.googlesource.com/21483
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
|
|
Process a slice of equivalent values by setting replaced values to nil
instead of removing them from the slice to eliminate copying. Also take
advantage of the entry number sort to break early once we reach a value
in a block that is not dominated.
For the code in issue #15112:
Before:
real 0m52.603s
user 0m56.957s
sys 0m1.213s
After:
real 0m22.048s
user 0m26.445s
sys 0m0.939s
Updates #15112
Change-Id: I06d9e1e1f1ad85d7fa196c5d51f0dc163907376d
Reviewed-on: https://go-review.googlesource.com/22068
Reviewed-by: David Chase <drchase@google.com>
|
|
We do two O(n) scans of all values in an eqclass when computing
substitutions for CSE.
In unfortunate cases, like those found in #15112, we can have a large
eqclass composed of values found in blocks none of whom dominate the
other. This leads to O(n^2) behavior. The elements are removed one at a
time, with O(n) scans each time.
This CL removes the linear scan by sorting the eqclass so that dominant
values will be sorted first. As long as we also ensure we don't disturb
the sort order, then we no longer need to scan for the maximally
dominant value.
For the code in issue #15112:
Before:
real 1m26.094s
user 1m30.776s
sys 0m1.125s
Aefter:
real 0m52.099s
user 0m56.829s
sys 0m1.092s
Updates #15112
Change-Id: Ic4f8680ed172e716232436d31963209c146ef850
Reviewed-on: https://go-review.googlesource.com/21981
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Run-TryBot: Todd Neal <todd@tneal.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Missed this in the previous CL where the shared
dom tree was introduced.
Change-Id: If0bd85d4b4567d7e87814ed511603b1303ab3903
Reviewed-on: https://go-review.googlesource.com/21970
Run-TryBot: Alexandru Moșoi <alexandru@mosoi.ro>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
|
|
runtime.newobject never returns the same thing twice,
so the resulting value will never be a common subexpression.
This helps when compiling large static data structures
that include pointers, such as maps and slices.
No clear performance impact on other code. (See below.)
For the code in issue #15112:
Before:
real 1m14.238s
user 1m18.985s
sys 0m0.787s
After:
real 0m47.172s
user 0m52.248s
sys 0m0.767s
For the code in issue #15235, size 10k:
Before:
real 0m44.916s
user 0m46.577s
sys 0m0.304s
After:
real 0m7.703s
user 0m9.041s
sys 0m0.316s
Still more work to be done, particularly for #15112.
Updates #15112
Updates #15235
name old time/op new time/op delta
Template 330ms ±11% 333ms ±13% ~ (p=0.749 n=20+19)
Unicode 148ms ± 6% 152ms ± 8% ~ (p=0.072 n=18+20)
GoTypes 1.01s ± 7% 1.01s ± 3% ~ (p=0.583 n=20+20)
Compiler 5.04s ± 2% 5.06s ± 2% ~ (p=0.314 n=20+20)
name old user-ns/op new user-ns/op delta
Template 444user-ms ±11% 445user-ms ±10% ~ (p=0.738 n=20+20)
Unicode 215user-ms ± 5% 218user-ms ± 5% ~ (p=0.239 n=18+18)
GoTypes 1.45user-s ± 3% 1.45user-s ± 4% ~ (p=0.620 n=20+20)
Compiler 7.23user-s ± 2% 7.22user-s ± 2% ~ (p=0.901 n=20+19)
name old alloc/op new alloc/op delta
Template 55.0MB ± 0% 55.0MB ± 0% ~ (p=0.547 n=20+20)
Unicode 37.6MB ± 0% 37.6MB ± 0% ~ (p=0.301 n=20+20)
GoTypes 177MB ± 0% 177MB ± 0% ~ (p=0.065 n=20+19)
Compiler 798MB ± 0% 797MB ± 0% -0.05% (p=0.000 n=19+20)
name old allocs/op new allocs/op delta
Template 492k ± 0% 493k ± 0% +0.03% (p=0.030 n=20+20)
Unicode 377k ± 0% 377k ± 0% ~ (p=0.423 n=20+19)
GoTypes 1.40M ± 0% 1.40M ± 0% ~ (p=0.102 n=20+20)
Compiler 5.53M ± 0% 5.53M ± 0% ~ (p=0.094 n=17+18)
name old text-bytes new text-bytes delta
HelloSize 561k ± 0% 561k ± 0% ~ (all samples are equal)
CmdGoSize 6.13M ± 0% 6.13M ± 0% ~ (all samples are equal)
name old data-bytes new data-bytes delta
HelloSize 128k ± 0% 128k ± 0% ~ (all samples are equal)
CmdGoSize 306k ± 0% 306k ± 0% ~ (all samples are equal)
name old exe-bytes new exe-bytes delta
HelloSize 905k ± 0% 905k ± 0% ~ (all samples are equal)
CmdGoSize 9.64M ± 0% 9.64M ± 0% ~ (all samples are equal)
Change-Id: Id774e2901d7701a3ec7a1c1d1cf1d9327a4107fc
Reviewed-on: https://go-review.googlesource.com/21937
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Todd Neal <todd@tneal.org>
|
|
Keep track of how many uses each Value has. Each appearance in
Value.Args and in Block.Control counts once.
The number of uses of a value is generically useful to
constrain rewrite rules. For instance, we might want to
prevent merging index operations into loads if the same
index expression is used lots of times.
But I have one use in particular for which the use count is required.
We must make sure we don't combine ops with loads if the load has
more than one use. Otherwise, we may split a single load
into multiple loads and that breaks perceived behavior in
the presence of races. In particular, the load of m.state
in sync/mutex.go:Lock can't be done twice. (I have a separate
CL which triggers the mutex failure. This CL has a test which
demonstrates a similar failure.)
Change-Id: Icaafa479239f48632a069d0c3f624e6ebc6b1f0e
Reviewed-on: https://go-review.googlesource.com/20790
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Todd Neal <todd@tneal.org>
|
|
The tree's pretty inconsistent about single space vs double space
after a period in documentation. Make it consistently a single space,
per earlier decisions. This means contributors won't be confused by
misleading precedence.
This CL doesn't use go/doc to parse. It only addresses // comments.
It was generated with:
$ perl -i -npe 's,^(\s*// .+[a-z]\.) +([A-Z]),$1 $2,' $(git grep -l -E '^\s*//(.+\.) +([A-Z])')
$ go test go/doc -update
Change-Id: Iccdb99c37c797ef1f804a94b22ba5ee4b500c4f7
Reviewed-on: https://go-review.googlesource.com/20022
Reviewed-by: Rob Pike <r@golang.org>
Reviewed-by: Dave Day <djd@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
The -d compiler flag can also specify ssa phase and flag,
for example -d=ssa/generic_cse/time,ssa/generic_cse/stats
Spaces in the phase names can be specified with an
underscore. Flags currently parsed (not necessarily
recognized by the phases yet) are:
on, off, mem, time, debug, stats, and test
On, off and time are handled in the harness,
debug, stats, and test are interpreted by the phase itself.
The pass is now attached to the Func being compiled, and a
new method logStats(key, ...value) on *Func to encourage a
semi-standardized format for that output. Output fields
are separated by tabs to ease digestion by awk and
spreadsheets. For example,
if f.pass.stats > 0 {
f.logStat("CSE REWRITES", rewrites)
}
Change-Id: I16db2b5af64c50ca9a47efeb51d961147a903abc
Reviewed-on: https://go-review.googlesource.com/19885
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Todd Neal <todd@tneal.org>
|
|
Construct better initial partitions by recursively comparing values and
their arguments. This saves one second on compile of arithConst_ssa.go
(4.3s to 3.3s) and shows a 3-5% increase with compilebench.
name old time/op new time/op delta
Template 266ms ± 3% 253ms ± 4% -5.08% (p=0.032 n=5+5)
GoTypes 927ms ± 3% 885ms ± 2% -4.55% (p=0.016 n=5+5)
Compiler 3.91s ± 3% 3.73s ± 2% -4.49% (p=0.008 n=5+5)
MakeBash 31.6s ± 1% 30.5s ± 3% -3.51% (p=0.016 n=5+5)
Change-Id: I6ede31ff459131ccfed69531acfbd06b19837700
Reviewed-on: https://go-review.googlesource.com/19838
Reviewed-by: David Chase <drchase@google.com>
|
|
Replaced comparison based on (*Type).String() with an
allocation-free structural comparison. Roughly doubles
speed of CSE, also reduces allocations.
Checked that roughly the same number of CSEs were detected
during make.bash (about a million) and that "new" CSEs
were caused by the effect described above.
Change-Id: Id205a9f6986efd518043e12d651f0b01206aeb1b
Reviewed-on: https://go-review.googlesource.com/19471
Reviewed-by: Keith Randall <khr@golang.org>
|
|
* If a operation is commutative order the parameters
in a canonical way.
Size of pkg/tool/linux_amd64/* excluding compile:
before: 95882288
after: 95868152
change: 14136 ~0.015%
I tried something similar with Leq and Geq, but the results were
not great because it confuses the 'lowered cse' pass too much
which can no longer remove redundant comparisons from IsInBounds.
Change-Id: I2f928663a11320bfc51c7fa47e384b7411c420ba
Reviewed-on: https://go-review.googlesource.com/19727
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Alexandru Moșoi <alexandru@mosoi.ro>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Add an initial cse pass that only operates on zero argument
values. This removes the need for a special case in cse for removing
OpSB and speeds up arithConst_ssa.go compilation by 9% while slowing
"test -c net/http" by 1.5%.
Change-Id: Id1500482485426f66c6c2eba75eeaf4f19c8a889
Reviewed-on: https://go-review.googlesource.com/19454
Run-TryBot: Todd Neal <todd@tneal.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Problem was caused by use of Args[].Aux differences
in early partitioning. This artificially separated
two equivalent expressions because sort ignores the
Aux field, hence things can end with equal things
separated by unequal things and thus the equal things
are split into more than one partition. For example:
SliceLen(a), SliceLen(b), SliceLen(a).
Fix: don't use Args[].Aux in initial partitioning.
Left in a debugging flag and some debugging Fprintf's;
not sure if that is house style or not. We'll probably
want to be more systematic in our naming conventions,
e.g. ssa.cse, ssa.scc, etc.
Change-Id: Ib1412539cc30d91ea542c0ac7b2f9b504108ca7f
Reviewed-on: https://go-review.googlesource.com/19316
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Examine both Aux and AuxInt to form more precise initial partitions.
Restructure loop to avoid repeated type.Equal() call. Speeds up
compilation of testdata/gen/arithConst_ssa by 25%.
Change-Id: I3cfb1d254adf0601ee69239e1885b0cf2a23575b
Reviewed-on: https://go-review.googlesource.com/19313
Run-TryBot: Todd Neal <todd@tneal.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Change-Id: I1738e3af7de0972c54d74325d80781059d0796d8
Reviewed-on: https://go-review.googlesource.com/19186
Run-TryBot: Todd Neal <todd@tneal.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
The OpSB hack didn't quite work. We need to really
CSE these ops to make regalloc happy.
Change-Id: I9f4d7bfb0929407c84ee60c9e25ff0c0fbea84af
Reviewed-on: https://go-review.googlesource.com/19083
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
|
|
It is one of the slowest compiler phases right now, and we
run two of them.
Instead of using a map to make the initial partition, use a sort.
It is much less memory intensive.
Do a few optimizations to avoid work for size-1 equivalence classes.
Implement -N.
Change-Id: I1d2d85d3771abc918db4dd7cc30b0b2d854b15e1
Reviewed-on: https://go-review.googlesource.com/19024
Reviewed-by: David Chase <drchase@google.com>
|
|
With the separate flagalloc pass, it should be fine to
allow CSE of control values. The worst that can happen
is that the comparison gets un-CSEd by flagalloc.
Fix bug in flagalloc where flag restores were getting
clobbered by rematerialization during register allocation.
Change-Id: If476cf98b69973e8f1a8eb29441136dd12fab8ad
Reviewed-on: https://go-review.googlesource.com/17760
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
|
|
Remove unnecessary local var split.
Change-Id: I907ef682b5fd9b3a67771edd1fe90c558f8937ea
Reviewed-on: https://go-review.googlesource.com/14523
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Added tree numbering data structure.
Changed dominator query in CSE.
Removed skip-for-too-big patch in CSE.
Passes all.bash.
Change-Id: I98d7c61b6015c81f5edab553615db17bc7a58d68
Reviewed-on: https://go-review.googlesource.com/14326
Reviewed-by: Keith Randall <khr@golang.org>
|
|
CL 14337 made SSA support fixedbugs/issue9604b.go.
That test contains > 40k blocks.
This made the O(n^2) dom algorithm fail to terminate
in a reasonable length of time, breaking the build.
For the moment, cap the number of blocks
to fix the build.
This will be reverted when a more efficient
dom algorithm is put in place,
which will be soon.
Change-Id: Ia66c2629481d29d06655ec54d1deff076b0422c6
Reviewed-on: https://go-review.googlesource.com/14342
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
|
|
Store floats in AuxInt to reduce allocations.
Change-Id: I101e6322530b4a0b2ea3591593ad022c992e8df8
Reviewed-on: https://go-review.googlesource.com/14320
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
cse was incorrectly classifying -0.0 and 0.0 as equivalent. This lead
to invalid code as ssa uses PXOR -0.0, reg to negate a floating point.
Fixes math.
Change-Id: Id7eb10c71749eaed897f29b02c33891cf5820acf
Reviewed-on: https://go-review.googlesource.com/14205
Reviewed-by: Keith Randall <khr@golang.org>
|
|
This reduces the time to compile
test/slice3.go on my laptop from ~12s to ~3.8s.
It reduces the max memory use from ~4.8gb to
~450mb.
This is still considerably worse than tip,
at 1s and 300mb respectively, but it's
getting closer.
Hopefully this will fix the build at long last.
Change-Id: Iac26b52023f408438cba3ea1b81dcd82ca402b90
Reviewed-on: https://go-review.googlesource.com/12566
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Experimentally, the Ops of v.Args do a good job
of differentiating values that will end up in
different partitions.
Most values have at most two args, so use them.
This reduces the wall time to run test/slice3.go
on my laptop from ~20s to ~12s.
Credit to Todd Neal for the idea.
Change-Id: I55d08f09eb678bbe8366924ca2fabcd32526bf41
Reviewed-on: https://go-review.googlesource.com/12565
Reviewed-by: Keith Randall <khr@golang.org>
|
|
By walking only the current set of partitions
at any given point, the cse pass ended up doing
lots of extraneous, effectively O(n^2) work.
Using a regular for loop allows each cse pass to
make as much progress as possible by processing
each new class as it is introduced.
This can and should be optimized further,
but it already reduces by 75% cse time on test/slice3.go.
The overall time to compile test/slice3.go is still
dominated by the O(n^2) work in the liveness pass.
However, Keith is rewriting regalloc anyway.
Change-Id: I8be020b2f69352234587eeadeba923481bf43fcc
Reviewed-on: https://go-review.googlesource.com/12244
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Here is a concrete case in which this goes wrong.
func f_ssa() int {
var n int
Next:
for j := 0; j < 3; j++ {
for i := 0; i < 10; i++ {
if i == 6 {
continue Next
}
n = i
}
n += j + j + j + j + j + j + j + j + j + j // j * 10
}
return n
}
What follows is the function printout before and after CSE.
Note blocks b8 and b10 in the before case.
b8 is the inner loop's condition: i < 10.
b10 is the inner loop's increment: i++.
v82 is i. On entry to b8, it is either 0 (v19) the first time,
or the result of incrementing v82, by way of v29.
The CSE pass considered v82 and v49 to be common subexpressions,
and eliminated v82 in favor of v49.
In the after case, v82 is now dead and will shortly be eliminated.
As a result, v29 is also dead, and we have lost the increment.
The loop runs forever.
BEFORE CSE
f_ssa <nil>
b1:
v1 = Arg <mem>
v2 = SP <uint64>
v4 = Addr <*int> {~r0} v2
v13 = Zero <mem> [8] v4 v1
v14 = Const <int>
v15 = Const <int>
v17 = Const <int> [3]
v19 = Const <int>
v21 = Const <int> [10]
v24 = Const <int> [6]
v28 = Const <int> [1]
v43 = Const <int> [1]
Plain -> b3
b2: <- b7
Exit v47
b3: <- b1
Plain -> b4
b4: <- b3 b6
v49 = Phi <int> v15 v44
v68 = Phi <int> v14 v67
v81 = Phi <mem> v13 v81
v18 = Less <bool> v49 v17
If v18 -> b5 b7
b5: <- b4
Plain -> b8
b6: <- b12 b11
v67 = Phi <int> v66 v41
v44 = Add <int> v49 v43
Plain -> b4
b7: <- b4
v47 = Store <mem> v4 v68 v81
Plain -> b2
b8: <- b5 b10
v66 = Phi <int> v68 v82
v82 = Phi <int> v19 v29
v22 = Less <bool> v82 v21
If v22 -> b9 b11
b9: <- b8
v25 = Eq <bool> v82 v24
If v25 -> b12 b13
b10: <- b13
v29 = Add <int> v82 v28
Plain -> b8
b11: <- b8
v32 = Add <int> v49 v49
v33 = Add <int> v32 v49
v34 = Add <int> v33 v49
v35 = Add <int> v34 v49
v36 = Add <int> v35 v49
v37 = Add <int> v36 v49
v38 = Add <int> v37 v49
v39 = Add <int> v38 v49
v40 = Add <int> v39 v49
v41 = Add <int> v66 v40
Plain -> b6
b12: <- b9
Plain -> b6
b13: <- b9
Plain -> b10
AFTER CSE
f_ssa <nil>
b1:
v1 = Arg <mem>
v2 = SP <uint64>
v4 = Addr <*int> {~r0} v2
v13 = Zero <mem> [8] v4 v1
v14 = Const <int>
v15 = Const <int>
v17 = Const <int> [3]
v19 = Const <int>
v21 = Const <int> [10]
v24 = Const <int> [6]
v28 = Const <int> [1]
v43 = Const <int> [1]
Plain -> b3
b2: <- b7
Exit v47
b3: <- b1
Plain -> b4
b4: <- b3 b6
v49 = Phi <int> v19 v44
v68 = Phi <int> v19 v67
v81 = Phi <mem> v13 v81
v18 = Less <bool> v49 v17
If v18 -> b5 b7
b5: <- b4
Plain -> b8
b6: <- b12 b11
v67 = Phi <int> v66 v41
v44 = Add <int> v49 v43
Plain -> b4
b7: <- b4
v47 = Store <mem> v4 v68 v81
Plain -> b2
b8: <- b5 b10
v66 = Phi <int> v68 v49
v82 = Phi <int> v19 v29
v22 = Less <bool> v49 v21
If v22 -> b9 b11
b9: <- b8
v25 = Eq <bool> v49 v24
If v25 -> b12 b13
b10: <- b13
v29 = Add <int> v49 v43
Plain -> b8
b11: <- b8
v32 = Add <int> v49 v49
v33 = Add <int> v32 v49
v34 = Add <int> v33 v49
v35 = Add <int> v34 v49
v36 = Add <int> v35 v49
v37 = Add <int> v36 v49
v38 = Add <int> v37 v49
v39 = Add <int> v38 v49
v40 = Add <int> v39 v49
v41 = Add <int> v66 v40
Plain -> b6
b12: <- b9
Plain -> b6
b13: <- b9
Plain -> b10
Change-Id: I16fc4ec527ec63f24f7d0d79d1a4a59bf37269de
Reviewed-on: https://go-review.googlesource.com/12444
Reviewed-by: Keith Randall <khr@golang.org>
|