Age | Commit message (Collapse) | Author |
|
For #40724.
Change-Id: I7d1e76139d187cd15a6e0df9d19542b7200589f6
Reviewed-on: https://go-review.googlesource.com/c/go/+/297911
Trust: David Chase <drchase@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
The list of conflicted files for this merge is:
src/cmd/compile/internal/gc/inl.go
src/cmd/compile/internal/gc/order.go
src/cmd/compile/internal/gc/ssa.go
test/fixedbugs/issue20415.go
test/fixedbugs/issue22822.go
test/fixedbugs/issue28079b.go
inl.go was updated for changes on dev.regabi: namely that OSELRECV has
been removed, and that OSELRECV2 now only uses List, rather than both
Left and List.
order.go was updated IsAutoTmp is now a standalone function, rather
than a method on Node.
ssa.go was similarly updated for new APIs involving package ir.
The tests are all merging upstream additions for gccgo error messages
with changes to cmd/compile's error messages on the dev.regabi branch.
Change-Id: Icaaf186d69da791b5994dbb6688ec989caabec42
|
|
The type of the shift count must be an unsigned integer. Some s390x
rules for shift have their auxint type being int8. This results in a
compilation failure on s390x with an invalid operation when running
make.bash using older versions of go (e.g: go1.10.4).
This CL adds an auxint type of uint8 and changes the ops for shift and
rotate to use auxint with type uint8. The related rules are also
modified to address this change.
Fixes #43090
Change-Id: I594274b6e3d9b23092fc9e9f4b354870164f2f19
Reviewed-on: https://go-review.googlesource.com/c/go/+/277078
Reviewed-by: Keith Randall <khr@golang.org>
Trust: Dmitri Shuralyov <dmitshur@golang.org>
|
|
It's currently hard to automate refactorings around the Value.Aux
field, because we don't have any static typing information for it.
Adding a tag interface will make subsequent CLs easier and safer.
Passes buildall w/ toolstash -cmp.
Updates #42982.
Change-Id: I41ae8e411a66bda3195a0957b60c2fe8a8002893
Reviewed-on: https://go-review.googlesource.com/c/go/+/275756
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Trust: Matthew Dempsky <mdempsky@google.com>
|
|
Change-Id: I2233a6a157ec8feffaefd6a8ee65b1c38778c1cd
Reviewed-on: https://go-review.googlesource.com/c/go/+/255238
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Giovanni Bajo <rasky@develer.com>
|
|
This is prerequisite to moving call expansion later into SSA,
and probably a good idea anyway. Passes tests.
This is the first minimal CL that does a 1-for-1 substitution
of *ssa.AuxCall for *obj.LSym. Next step (next CL) is to make
this change for all calls so that additional information can
be stored in AuxCall.
Change-Id: Ia3a7715648fd9fb1a176850767a726e6f5b959eb
Reviewed-on: https://go-review.googlesource.com/c/go/+/237680
Trust: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
instructions in AuxInt
The current implementation stores the comparison pseudo-ops of arm64
conditional instructions (CSEL/CSEL0) in Aux, this patch modifies it
and stores it in AuxInt, which can avoid the allocation.
Change-Id: I0b69e51f63acd84c6878c6a59ccf6417501a8cfc
Reviewed-on: https://go-review.googlesource.com/c/go/+/252517
Run-TryBot: fannie zhang <Fannie.Zhang@arm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Encode the flag results in an auxint field instead of having
one opcode per flag state. This helps us handle the new *noov
branches in a unified manner.
This is only for arm, arm64 is in a subsequent CL.
We could extend to other architectures as well, athough it would
only be cleanup, no behavioral change.
Update #39505
Change-Id: Ia46cea596faad540d1496c5915ab1274571543f0
Reviewed-on: https://go-review.googlesource.com/c/go/+/238077
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
This first pass makes the rules using the condition code mask
(CCMask) and rotate parameters (RotateParams) aux values strongly
typed. This required adding strongly typed aux handling to the
block rulegen.
More CLs like this to follow, but this is probably the most
complex.
Passes toolstash-check -all.
Change-Id: Ie513b07d527f0c1b398d7748331442dcb5f7b17d
Reviewed-on: https://go-review.googlesource.com/c/go/+/228518
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Right now the Aux and AuxInt fields of ssa.Values are typed as
interface{} and int64, respectively. Each rule that uses these values
must cast them to the type they actually are (*obj.LSym, or int32, or
ValAndOff, etc.), use them, and then cast them back to interface{} or
int64.
We know for each opcode what the types of the Aux and AuxInt fields
should be. So let's modify the rule generator to declare the types to
be what we know they should be, autoconverting to and from the generic
types for us. That way we can make the rules more type safe.
It's difficult to make a single CL for this, so I've coopted the "=>"
token to indicate a rule that is strongly typed. "->" rules are
processed as before. That will let us migrate a few rules at a time in
separate CLs. Hopefully we can reach a state where all rules are
strongly typed and we can drop the distinction.
This CL changes just a few rules to get a feel for what this
transition would look like.
I've decided not to put explicit types in the rules. I think it
makes the rules somewhat clearer, but definitely more verbose.
In particular, the passthrough rules that don't modify the fields
in question are verbose for no real reason.
Change-Id: I63a1b789ac5702e7caf7934cd49f784235d1d73d
Reviewed-on: https://go-review.googlesource.com/c/go/+/190197
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
|
|
Trying this CL again, with a fixed test that allows platforms
to disagree on the exact behavior of converting NaNs.
We store 32-bit floating point constants in a 64-bit field, by
converting that 32-bit float to 64-bit float to store it, and convert
it back to use it.
That works for *almost* all floating-point constants. The exception is
signaling NaNs. The round trip described above means we can't represent
a 32-bit signaling NaN, because conversions strip the signaling bit.
To fix this issue, just forbid NaNs as floating-point constants in SSA
form. This shouldn't affect any real-world code, as people seldom
constant-propagate NaNs (except in test code).
Additionally, NaNs are somewhat underspecified (which of the many NaNs
do you get when dividing 0/0?), so when cross-compiling there's a
danger of using the compiler machine's NaN regime for some math, and
the target machine's NaN regime for other math. Better to use the
target machine's NaN regime always.
Update #36400
Change-Id: Idf203b688a15abceabbd66ba290d4e9f63619ecb
Reviewed-on: https://go-review.googlesource.com/c/go/+/221790
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
|
|
The goal here is improved AuxInt printing in ssa.html.
Instead of displaying an inscrutable encoded integer,
it displays something like
v25 (28) = UBFX <int> [lsb=4,width=8] v52
which is much nicer for debugging.
Change-Id: I40713ff7f4a857c4557486cdf73c2dff137511ca
Reviewed-on: https://go-review.googlesource.com/c/go/+/221420
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
This reverts CL 213477.
Reason for revert: tests are failing on linux-mips*-rtrk builders.
Change-Id: I8168f7450890233f1bd7e53930b73693c26d4dc0
Reviewed-on: https://go-review.googlesource.com/c/go/+/220897
Run-TryBot: Bryan C. Mills <bcmills@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
We store 32-bit floating point constants in a 64-bit field, by
converting that 32-bit float to 64-bit float to store it, and convert
it back to use it.
That works for *almost* all floating-point constants. The exception is
signaling NaNs. The round trip described above means we can't represent
a 32-bit signaling NaN, because conversions strip the signaling bit.
To fix this issue, just forbid NaNs as floating-point constants in SSA
form. This shouldn't affect any real-world code, as people seldom
constant-propagate NaNs (except in test code).
Additionally, NaNs are somewhat underspecified (which of the many NaNs
do you get when dividing 0/0?), so when cross-compiling there's a
danger of using the compiler machine's NaN regime for some math, and
the target machine's NaN regime for other math. Better to use the
target machine's NaN regime always.
This has been a bug since 1.10, and there's an easy workaround
(declare a global varaible containing the signaling NaN pattern, and
use that as the argument to math.Float32frombits) so we'll fix it in
1.15.
Fixes #36400
Update #36399
Change-Id: Icf155e743281560eda2eed953d19a829552ccfda
Reviewed-on: https://go-review.googlesource.com/c/go/+/213477
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
|
|
Once defined, a stack slot holding an open-coded defer arg should always be marked
live, since it may be used at any time if there is a panic. These stack slots are
typically kept live naturally by the open-defer code inlined at each return/exit point.
However, we need to do extra work to make sure that they are kept live if a
function has an infinite loop or a panic exit.
For this fix, only in the case of a function that is using open-coded defers, we
compute the set of blocks (most often empty) that cannot reach a return or a
BlockExit (panic) because of an infinite loop. Then, for each block b which
cannot reach a return or BlockExit or is a BlockExit block, we mark each defer arg
slot as live, as long as the definition of the defer arg slot dominates block b.
For this change, had to export (*Func).sdom (-> Sdom) and SparseTree.isAncestorEq
(-> IsAncestorEq)
Updates #35277
Change-Id: I7b53c9bd38ba384a3794386dd0eb94e4cbde4eb1
Reviewed-on: https://go-review.googlesource.com/c/go/+/204802
Run-TryBot: Dan Scales <danscales@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Control values are used to choose which successor of a block is
jumped to. Typically a control value takes the form of a 'flags'
value that represents the result of a comparison. Some
architectures however use a variable in a register as a control
value.
Up until now we have managed with a single control value per block.
However some architectures (e.g. s390x and riscv64) have combined
compare-and-branch instructions that take two variables in registers
as parameters. To generate these instructions we need to support 2
control values per block.
This CL allows up to 2 control values to be used in a block in
order to support the addition of compare-and-branch instructions.
I have implemented s390x compare-and-branch instructions in a
different CL.
Passes toolstash-check -all.
Results of compilebench:
name old time/op new time/op delta
Template 208ms ± 1% 209ms ± 1% ~ (p=0.289 n=20+20)
Unicode 83.7ms ± 1% 83.3ms ± 3% -0.49% (p=0.017 n=18+18)
GoTypes 748ms ± 1% 748ms ± 0% ~ (p=0.460 n=20+18)
Compiler 3.47s ± 1% 3.48s ± 1% ~ (p=0.070 n=19+18)
SSA 11.5s ± 1% 11.7s ± 1% +1.64% (p=0.000 n=19+18)
Flate 130ms ± 1% 130ms ± 1% ~ (p=0.588 n=19+20)
GoParser 160ms ± 1% 161ms ± 1% ~ (p=0.211 n=20+20)
Reflect 465ms ± 1% 467ms ± 1% +0.42% (p=0.007 n=20+20)
Tar 184ms ± 1% 185ms ± 2% ~ (p=0.087 n=18+20)
XML 253ms ± 1% 253ms ± 1% ~ (p=0.377 n=20+18)
LinkCompiler 769ms ± 2% 774ms ± 2% ~ (p=0.070 n=19+19)
ExternalLinkCompiler 3.59s ±11% 3.68s ± 6% ~ (p=0.072 n=20+20)
LinkWithoutDebugCompiler 446ms ± 5% 454ms ± 3% +1.79% (p=0.002 n=19+20)
StdCmd 26.0s ± 2% 26.0s ± 2% ~ (p=0.799 n=20+20)
name old user-time/op new user-time/op delta
Template 238ms ± 5% 240ms ± 5% ~ (p=0.142 n=20+20)
Unicode 105ms ±11% 106ms ±10% ~ (p=0.512 n=20+20)
GoTypes 876ms ± 2% 873ms ± 4% ~ (p=0.647 n=20+19)
Compiler 4.17s ± 2% 4.19s ± 1% ~ (p=0.093 n=20+18)
SSA 13.9s ± 1% 14.1s ± 1% +1.45% (p=0.000 n=18+18)
Flate 145ms ±13% 146ms ± 5% ~ (p=0.851 n=20+18)
GoParser 185ms ± 5% 188ms ± 7% ~ (p=0.174 n=20+20)
Reflect 534ms ± 3% 538ms ± 2% ~ (p=0.105 n=20+18)
Tar 215ms ± 4% 211ms ± 9% ~ (p=0.079 n=19+20)
XML 295ms ± 6% 295ms ± 5% ~ (p=0.968 n=20+20)
LinkCompiler 832ms ± 4% 837ms ± 7% ~ (p=0.707 n=17+20)
ExternalLinkCompiler 1.58s ± 8% 1.60s ± 4% ~ (p=0.296 n=20+19)
LinkWithoutDebugCompiler 478ms ±12% 489ms ±10% ~ (p=0.429 n=20+20)
name old object-bytes new object-bytes delta
Template 559kB ± 0% 559kB ± 0% ~ (all equal)
Unicode 216kB ± 0% 216kB ± 0% ~ (all equal)
GoTypes 2.03MB ± 0% 2.03MB ± 0% ~ (all equal)
Compiler 8.07MB ± 0% 8.07MB ± 0% -0.06% (p=0.000 n=20+20)
SSA 27.1MB ± 0% 27.3MB ± 0% +0.89% (p=0.000 n=20+20)
Flate 343kB ± 0% 343kB ± 0% ~ (all equal)
GoParser 441kB ± 0% 441kB ± 0% ~ (all equal)
Reflect 1.36MB ± 0% 1.36MB ± 0% ~ (all equal)
Tar 487kB ± 0% 487kB ± 0% ~ (all equal)
XML 632kB ± 0% 632kB ± 0% ~ (all equal)
name old export-bytes new export-bytes delta
Template 18.5kB ± 0% 18.5kB ± 0% ~ (all equal)
Unicode 7.92kB ± 0% 7.92kB ± 0% ~ (all equal)
GoTypes 35.0kB ± 0% 35.0kB ± 0% ~ (all equal)
Compiler 109kB ± 0% 110kB ± 0% +0.72% (p=0.000 n=20+20)
SSA 137kB ± 0% 138kB ± 0% +0.58% (p=0.000 n=20+20)
Flate 4.89kB ± 0% 4.89kB ± 0% ~ (all equal)
GoParser 8.49kB ± 0% 8.49kB ± 0% ~ (all equal)
Reflect 11.4kB ± 0% 11.4kB ± 0% ~ (all equal)
Tar 10.5kB ± 0% 10.5kB ± 0% ~ (all equal)
XML 16.7kB ± 0% 16.7kB ± 0% ~ (all equal)
name old text-bytes new text-bytes delta
HelloSize 761kB ± 0% 761kB ± 0% ~ (all equal)
CmdGoSize 10.8MB ± 0% 10.8MB ± 0% ~ (all equal)
name old data-bytes new data-bytes delta
HelloSize 10.7kB ± 0% 10.7kB ± 0% ~ (all equal)
CmdGoSize 312kB ± 0% 312kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 122kB ± 0% 122kB ± 0% ~ (all equal)
CmdGoSize 146kB ± 0% 146kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.13MB ± 0% 1.13MB ± 0% ~ (all equal)
CmdGoSize 15.1MB ± 0% 15.1MB ± 0% ~ (all equal)
Change-Id: I3cc2f9829a109543d9a68be4a21775d2d3e9801f
Reviewed-on: https://go-review.googlesource.com/c/go/+/196557
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Prior to this CL conditional branches on s390x always used an
extended mnemonic such as BNE, BLT and so on to represent branch
instructions with different condition code masks. This CL adds
support for numeric condition code masks to the s390x SSA backend
so that we can encode the condition under which a Block's
successor is chosen as a field in that Block rather than in its
type.
This change will be useful as we come to add support for combined
compare-and-branch instructions. Rather than trying to add extended
mnemonics for every possible combination of mask and compare-and-
branch instruction we can instead use a single mnemonic for each
instruction.
Change-Id: Idb7458f187b50906877d683695c291dff5279553
Reviewed-on: https://go-review.googlesource.com/c/go/+/197178
Reviewed-by: Keith Randall <khr@golang.org>
|
|
We never used it, might as well get rid of it.
Change-Id: I5c23c93e90173bff9ac1fc1b8ae1e2025215d6eb
Reviewed-on: https://go-review.googlesource.com/c/go/+/191938
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
Signalling NaNs were being converted to quiet NaNs during constant
propagation through integer <-> float store-to-load forwarding.
This occurs because we store float32 constants as float64
values and CPU hardware 'quietens' NaNs during conversion between
the two.
Eventually we want to move to using float32 values to store float32
constants, however this will be a big change since both the compiler
and the assembler expect float64 values. So for now this is a small
change that will fix the immediate issue.
Fixes #27193.
Change-Id: Iac54bd8c13abe26f9396712bc71f9b396f842724
Reviewed-on: https://go-review.googlesource.com/132956
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Lack of a well-defined order between VarDef and related
address operations sometimes causes problems with store order
and write barrier transformations; glitches in the order are
made irreparable (by later optimizations) if the two parts of
the glitch straddle a split in the original block caused by
insertion of a write barrier diamond.
Fix this by creating a LocalAddr for addresses of locals
(what VarDef matters for) that takes a memory input to
help make the order explicit. Addr is modified to only
be legal for SB operand, so there is no overlap between
Addr and LocalAddr uses (there may be some downstream
cleanup from this).
Changes to generic.rules and rewrite.go ensure that codegen
tests continue to pass; CSE of LocalAddr is impaired, not
quite sure of the cost.
Fixes #26105.
Change-Id: Id4192b4440aa4e9d7ba54a465c456df9b530b515
Reviewed-on: https://go-review.googlesource.com/122483
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
gc/ssa.go initilizes SP and SB values with TUINTPTR type.
Assign same type in SSA tests and modify check.go to catch
mismatching types for those ops.
This makes SSA tests more consistent.
Change-Id: I798440d57d00fb949d1a0cd796759c9b82a934bd
Reviewed-on: https://go-review.googlesource.com/106658
Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Introduce a new SSA pass to generate CondSelect intstrutions,
and add CondSelect lowering rules for arm64.
In order to make the CSEL instruction easier to optimize,
and to simplify the introduction of CSNEG, CSINC, and CSINV
in the future, modify the CSEL instruction to accept a condition
code in the aux field.
Notably, this change makes the go1 Gzip benchmark
more than 10% faster.
Benchmarks on a Cavium ThunderX:
name old time/op new time/op delta
BinaryTree17-96 15.9s ± 6% 16.0s ± 4% ~ (p=0.968 n=10+9)
Fannkuch11-96 7.17s ± 0% 7.00s ± 0% -2.43% (p=0.000 n=8+9)
FmtFprintfEmpty-96 208ns ± 1% 207ns ± 0% ~ (p=0.152 n=10+8)
FmtFprintfString-96 379ns ± 0% 375ns ± 0% -0.95% (p=0.000 n=10+9)
FmtFprintfInt-96 385ns ± 0% 383ns ± 0% -0.52% (p=0.000 n=9+10)
FmtFprintfIntInt-96 591ns ± 0% 586ns ± 0% -0.85% (p=0.006 n=7+9)
FmtFprintfPrefixedInt-96 656ns ± 0% 667ns ± 0% +1.71% (p=0.000 n=10+10)
FmtFprintfFloat-96 967ns ± 0% 984ns ± 0% +1.78% (p=0.000 n=10+10)
FmtManyArgs-96 2.35µs ± 0% 2.25µs ± 0% -4.63% (p=0.000 n=9+8)
GobDecode-96 31.0ms ± 0% 30.8ms ± 0% -0.36% (p=0.006 n=9+9)
GobEncode-96 24.4ms ± 0% 24.5ms ± 0% +0.30% (p=0.000 n=9+9)
Gzip-96 1.60s ± 0% 1.43s ± 0% -10.58% (p=0.000 n=9+10)
Gunzip-96 167ms ± 0% 169ms ± 0% +0.83% (p=0.000 n=8+9)
HTTPClientServer-96 311µs ± 1% 308µs ± 0% -0.75% (p=0.000 n=10+10)
JSONEncode-96 65.0ms ± 0% 64.8ms ± 0% -0.25% (p=0.000 n=9+8)
JSONDecode-96 262ms ± 1% 261ms ± 1% ~ (p=0.579 n=10+10)
Mandelbrot200-96 18.0ms ± 0% 18.1ms ± 0% +0.17% (p=0.000 n=8+10)
GoParse-96 14.0ms ± 0% 14.1ms ± 1% +0.42% (p=0.003 n=9+10)
RegexpMatchEasy0_32-96 644ns ± 2% 645ns ± 2% ~ (p=0.836 n=10+10)
RegexpMatchEasy0_1K-96 3.70µs ± 0% 3.49µs ± 0% -5.58% (p=0.000 n=10+10)
RegexpMatchEasy1_32-96 662ns ± 2% 657ns ± 2% ~ (p=0.137 n=10+10)
RegexpMatchEasy1_1K-96 4.47µs ± 0% 4.31µs ± 0% -3.48% (p=0.000 n=10+10)
RegexpMatchMedium_32-96 844ns ± 2% 849ns ± 1% ~ (p=0.208 n=10+10)
RegexpMatchMedium_1K-96 179µs ± 0% 182µs ± 0% +1.20% (p=0.000 n=10+10)
RegexpMatchHard_32-96 10.0µs ± 0% 10.1µs ± 0% +0.48% (p=0.000 n=10+9)
RegexpMatchHard_1K-96 297µs ± 0% 297µs ± 0% -0.14% (p=0.000 n=10+10)
Revcomp-96 3.08s ± 0% 3.13s ± 0% +1.56% (p=0.000 n=9+9)
Template-96 276ms ± 2% 275ms ± 1% ~ (p=0.393 n=10+10)
TimeParse-96 1.37µs ± 0% 1.36µs ± 0% -0.53% (p=0.000 n=10+7)
TimeFormat-96 1.40µs ± 0% 1.42µs ± 0% +0.97% (p=0.000 n=10+10)
[Geo mean] 264µs 262µs -0.77%
Change-Id: Ie54eee4b3092af53e6da3baa6d1755098f57f3a2
Reviewed-on: https://go-review.googlesource.com/55670
Run-TryBot: Philip Hofer <phofer@umich.edu>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
When we go from a branch block to a plain block, reset the
branch prediction bit. Downstream passes asssume that if the
branch prediction is set, then the block has 2 successors.
Fixes #23504
Change-Id: I2898ec002228b2e34fe80ce420c6939201c0a5aa
Reviewed-on: https://go-review.googlesource.com/88955
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
|
|
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
|
|
Updates #18162 (mostly fixes)
Change-Id: I35bcb8a688bdaa432adb0ddbb73a2f7adda47b9e
Reviewed-on: https://go-review.googlesource.com/37958
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
Relax the 'phi after non-phi' SSA sanity check to allow
RegKill ops interspersed with phi ops in a block. This fixes
a sanity check failure when -dwarflocationlists is enabled.
Updates #22694.
Change-Id: Iaae604ab6f1a8b150664dd120003727a6fb2f698
Reviewed-on: https://go-review.googlesource.com/77610
Run-TryBot: Than McIntosh <thanm@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
|
|
The loop finder doesn't return good information if it
encounters an irreducible loop. Make a start on improving
this, and set a function-level flag to indicate when there
is such a loop (and the returned information might be flaky).
Use that flag to prevent the loop rotater from getting
confused; the existing code seems to depend on artifacts
of the previous loop-finding algorithm. (There is one
irreducible loop in the go library, in "inflate.go").
Change-Id: If6e26feab38d9b009d2252d556e1470c803bde40
Reviewed-on: https://go-review.googlesource.com/42150
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
This CL adds generic SSA rules to propagate constants through raw bits
conversions between floats and integers. This allows constants to
propagate through some math functions. For example, math.Copysign(0, -1)
is now constant folded to a load of -0.0.
Requires a fix to the ARM assembler which loaded -0.0 as +0.0.
Change-Id: I52649a4691077c7414f19d17bb599a6743c23ac2
Reviewed-on: https://go-review.googlesource.com/62250
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
Update #20178
Change-Id: I603f77268ed38afdd84228c775efe006f08f14a7
Reviewed-on: https://go-review.googlesource.com/45018
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
|
|
Enhance the one-live-memory-at-a-time check to run during many
more phases of the SSA backend. Also make it work in an interblock
fashion.
Change types.IsMemory to return true for tuples containing a memory type.
Fix trim pass to build the merged phi correctly. Doesn't affect
code but allows the check to pass after trim runs.
Switch the AddTuple* ops to take the memory-containing tuple argument second.
Update #20335
Change-Id: I5b03ef3606b75a9e4f765276bb8b183cdc172b43
Reviewed-on: https://go-review.googlesource.com/43495
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
Tuple ops are weird. They are essentially a pair of ops,
one which consumes a mem and one which generates a mem (the Select1).
The schedule pass didn't handle these quite right.
Fix the scheduler to include both parts of the paired op in
the store chain. That makes sure that loads are correctly ordered
with respect to the first of the pair.
Add a check for the ssacheck builder, that there is only one
live store at a time. I thought we already had such a check, but
apparently not...
Fixes #20335
Change-Id: I59eb3446a329100af38d22820b1ca2190ca46a78
Reviewed-on: https://go-review.googlesource.com/43294
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
Replace derecursed postorder computation with one that
mimics DFS traversal.
Corrected outerinner function in loopfinder
Leave enhanced checks in place.
Change-Id: I657ba5e89c88941028d6d4c72e9f9056e30f1ce8
Reviewed-on: https://go-review.googlesource.com/40872
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Remove size AuxInt in Store, and alignment in Move/Zero. We still
pass size AuxInt to Move/Zero, as it is used for partial Move/Zero
lowering (e.g. cmd/compile/internal/ssa/gen/386.rules:288).
SizeAndAlign is gone.
Passes "toolstash -cmp" on std.
Change-Id: I1ca34652b65dd30de886940e789fcf41d521475d
Reviewed-on: https://go-review.googlesource.com/38150
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
When the compiler insert write barriers, the frontend makes
conservative decisions at an early stage. This may have false
positives which result in write barriers for stack writes.
A new phase, writebarrier, is added to the SSA backend, to delay
the decision and eliminate false positives. The frontend still
makes conservative decisions. When building SSA, instead of
emitting runtime calls directly, it emits WB ops (StoreWB,
MoveWB, etc.), which will be expanded to branches and runtime
calls in writebarrier phase. Writes to static locations on stack
are detected and write barriers are removed.
All write barriers of stack writes found by the script from
issue #17330 are eliminated (except two false positives).
Fixes #17330.
Change-Id: I9bd66333da9d0ceb64dcaa3c6f33502798d1a0f8
Reviewed-on: https://go-review.googlesource.com/31131
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
We compute a lot of stuff based off the CFG: postorder traversal,
dominators, dominator tree, loop nest. Multiple phases use this
information and we end up recomputing some of it. Add a cache
for this information so if the CFG hasn't changed, we can reuse
the previous computation.
Change-Id: I9b5b58af06830bd120afbee9cfab395a0a2f74b2
Reviewed-on: https://go-review.googlesource.com/29356
Reviewed-by: David Chase <drchase@google.com>
|
|
Get rid of BlockCheck. Josh goaded me into it, and I went
down a rabbithole making it happen.
NilCheck now panics if the pointer is nil and returns void, as before.
BlockCheck is gone, and NilCheck is no longer a Control value for
any block. It just exists (and deadcode knows not to throw it away).
I rewrote the nilcheckelim pass to handle this case. In particular,
there can now be multiple NilCheck ops per block.
I moved all of the arch-dependent nil check elimination done as
part of ssaGenValue into its own proper pass, so we don't have to
duplicate that code for every architecture.
Making the arch-dependent nil check its own pass means I needed
to add a bunch of flags to the opcode table so I could write
the code without arch-dependent ops everywhere.
Change-Id: I419f891ac9b0de313033ff09115c374163416a9f
Reviewed-on: https://go-review.googlesource.com/29120
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
|
|
No need for it, we can treat calls as (mostly) normal values
that take a memory and return a memory.
Lowers the number of basic blocks needed to represent a function.
"go test -c net/http" uses 27% fewer basic blocks.
Probably doesn't affect generated code much, but should help
various passes whose running time and/or space depends on
the number of basic blocks.
Fixes #15631
Change-Id: I0bf21e123f835e2cfa382753955a4f8bce03dfa6
Reviewed-on: https://go-review.googlesource.com/28950
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
|
|
Makes the AuxInt arg to Move/Zero print in a readable format.
Change-Id: I12295959b00ff7c1638d35836cc6d64d112c11ca
Reviewed-on: https://go-review.googlesource.com/28271
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
This adds a sparse method for locating nearest ancestors
in a dominator tree, and checks blocks with more than one
predecessor for differences and inserts phi functions where
there are.
Uses reversed post order to cut number of passes, running
it from first def to last use ("last use" for paramout and
mem is end-of-program; last use for a phi input from a
backedge is the source of the back edge)
Includes a cutover from old algorithm to new to avoid paying
large constant factor for small programs. This keeps normal
builds running at about the same time, while not running
over-long on large machine-generated inputs.
Add "phase" flags for ssa/build -- ssa/build/stats prints
number of blocks, values (before and after linking references
and inserting phis, so expansion can be measured), and their
product; the product governs the cutover, where a good value
seems to be somewhere between 1 and 5 million.
Among the files compiled by make.bash, this is the shape of
the tail of the distribution for #blocks, #vars, and their
product:
#blocks #vars product
max 6171 28180 173,898,780
99.9% 1641 6548 10,401,878
99% 463 1909 873,721
95% 152 639 95,235
90% 84 359 30,021
The old algorithm is indeed usually fastest, for 99%ile
values of usually.
The fix to LookupVarOutgoing
( https://go-review.googlesource.com/#/c/22790/ )
deals with some of the same problems addressed by this CL,
but on at least one bug ( #15537 ) this change is still
a significant help.
With this CL:
/tmp/gopath$ rm -rf pkg bin
/tmp/gopath$ time go get -v -gcflags -memprofile=y.mprof \
github.com/gogo/protobuf/test/theproto3/combos/...
...
real 4m35.200s
user 13m16.644s
sys 0m36.712s
and pprof reports 3.4GB allocated in one of the larger profiles
With tip:
/tmp/gopath$ rm -rf pkg bin
/tmp/gopath$ time go get -v -gcflags -memprofile=y.mprof \
github.com/gogo/protobuf/test/theproto3/combos/...
...
real 10m36.569s
user 25m52.286s
sys 4m3.696s
and pprof reports 8.3GB allocated in the same larger profile
With this CL, most of the compilation time on the benchmarked
input is spent in register/stack allocation (cumulative 53%)
and in the sparse lookup algorithm itself (cumulative 20%).
Fixes #15537.
Change-Id: Ia0299dda6a291534d8b08e5f9883216ded677a00
Reviewed-on: https://go-review.googlesource.com/22342
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Provide indexes along with block pointers for Preds
and Succs arrays. This allows us to splice edges in
and out of those arrays in constant time.
Fixes worst-case O(n^2) behavior in deadcode and fuse.
benchmark old ns/op new ns/op delta
BenchmarkFuse1-8 2065 2057 -0.39%
BenchmarkFuse10-8 9408 9073 -3.56%
BenchmarkFuse100-8 105238 76277 -27.52%
BenchmarkFuse1000-8 3982562 1026750 -74.22%
BenchmarkFuse10000-8 301220329 12824005 -95.74%
BenchmarkDeadCode1-8 1588 1566 -1.39%
BenchmarkDeadCode10-8 4333 4250 -1.92%
BenchmarkDeadCode100-8 32031 32574 +1.70%
BenchmarkDeadCode1000-8 590407 468275 -20.69%
BenchmarkDeadCode10000-8 17822890 5000818 -71.94%
BenchmarkDeadCode100000-8 1388706640 78021127 -94.38%
BenchmarkDeadCode200000-8 5372518479 168598762 -96.86%
Change-Id: Iccabdbb9343fd1c921ba07bbf673330a1c36ee17
Reviewed-on: https://go-review.googlesource.com/22589
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Fixes #15510
Change-Id: I2e0568778ef90cf29712753b8c42109ef84a0256
Reviewed-on: https://go-review.googlesource.com/22784
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Fixes #15488
Change-Id: I054eb1e1c859de315e3cdbdef5428682bce693fd
Reviewed-on: https://go-review.googlesource.com/22609
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
|
|
It is unused, remove the clutter.
Change-Id: I51a44326b125ef79241459c463441f76a289cc08
Reviewed-on: https://go-review.googlesource.com/22586
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
Make sure ops have the right number of args, set
aux and auxint only if allowed, etc.
Normalize error reporting format.
Change-Id: Ie545fcc5990c8c7d62d40d9a0a55885f941eb645
Reviewed-on: https://go-review.googlesource.com/22320
Reviewed-by: David Chase <drchase@google.com>
|
|
Ancestor comparison was the wrong way around, effectively
disabling the def-must-dominate-use check.
Update #15084
Change-Id: Ic56d674c5000569d2cc855bbb000a60eae517c7c
Reviewed-on: https://go-review.googlesource.com/22330
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
|
|
This is a fix for the ssacheck builder
http://build.golang.org/log/baa00f70c34e41186051cfe90568de3d91f115d7
after CL 21307 for sinking spills down loop exits
https://go-review.googlesource.com/#/c/21037/
The fix is to reuse (move) the original spill, thus preserving
the definition of the variable and its use count. Original and
copy both use the same stack slot, but ssacheck needs to see
a definition for the variable itself.
Fixes #15279.
Change-Id: I286285490193dc211b312d64dbc5a54867730bd6
Reviewed-on: https://go-review.googlesource.com/21995
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Previously if we were only using the low bits of AuxInt,
the high bits were ignored and could be junk. This CL
changes that behavior to define the high bits to be the
sign-extended version of the low bits for all cases.
There are 2 main benefits:
- Deterministic representation. This helps with CSE.
(Const8 [0x1]) and (Const8 [0x101]) used to be the same "value"
but CSE couldn't see them as such.
- Testability. We can check that all ops leave AuxInt in a state
consistent with the new rule. In the old scheme, it was hard
to check whether a rule correctly used only the low-order bits.
Side benefits:
- ==0 and !=0 tests are easier.
Drawbacks:
- This differs from the runtime representation in registers,
where it is important that we allow upper bits to be undefined
(so we're not sign/zero-extending all the time).
- Ops that treat AuxInt as unsigned (shifts, mostly) need to be
a bit more careful.
Change-Id: I9a685ff27e36dc03287c9ab1cecd6c0b4045c819
Reviewed-on: https://go-review.googlesource.com/21256
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
|
|
Keep track of how many uses each Value has. Each appearance in
Value.Args and in Block.Control counts once.
The number of uses of a value is generically useful to
constrain rewrite rules. For instance, we might want to
prevent merging index operations into loads if the same
index expression is used lots of times.
But I have one use in particular for which the use count is required.
We must make sure we don't combine ops with loads if the load has
more than one use. Otherwise, we may split a single load
into multiple loads and that breaks perceived behavior in
the presence of races. In particular, the load of m.state
in sync/mutex.go:Lock can't be done twice. (I have a separate
CL which triggers the mutex failure. This CL has a test which
demonstrates a similar failure.)
Change-Id: Icaafa479239f48632a069d0c3f624e6ebc6b1f0e
Reviewed-on: https://go-review.googlesource.com/20790
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Todd Neal <todd@tneal.org>
|
|
Split the auxFloat type into 32/64 bit versions and perform checking for
exactly representable float32 values. Perform const folding on
float32/64. Comment out some const negation rules that the frontend
already performs.
Change-Id: Ib3f8d59fa8b30e50fe0267786cfb3c50a06169d2
Reviewed-on: https://go-review.googlesource.com/20568
Run-TryBot: Todd Neal <todd@tneal.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Make sure we do any just-before-return cleanup on all paths out of a
function, including when recovering. Each exit path should include
deferreturn (if there are any defers) and then the exit
code (e.g. copying heap-escaping return values back to the stack).
Introduce a Defer SSA block type which has two outgoing edges - one the
fallthrough edge (the defer was queued successfully) and one which
immediately returns (the defer had a successful recover() call and
normal execution should resume at the return point).
Fixes #14725
Change-Id: Iad035c9fd25ef8b7a74dafbd7461cf04833d981f
Reviewed-on: https://go-review.googlesource.com/20486
Reviewed-by: David Chase <drchase@google.com>
|