Add a little more detail to the ssa README relating to GOSSAFUNC.
Update the -d=ssa help section to give a little more detail on what
to expect when applying the /debug=X qualifier to a phase.
Change-Id: I7027735f1f2955dbb5b9be36d9a648e8dc655048
Reviewed-on: https://go-review.googlesource.com/c/go/+/315229
Trust: Than McIntosh <thanm@google.com>
Run-TryBot: Than McIntosh <thanm@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
|
|
The go/build package needs access to this configuration,
so move it into a new package available to the standard library.
Change-Id: I868a94148b52350c76116451f4ad9191246adcff
Reviewed-on: https://go-review.googlesource.com/c/go/+/310731
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Jay Conrod <jayconrod@google.com>
|
|
This moves all remaining GOEXPERIMENT flags into the objabi.Experiment
struct, drops the "_enabled" from their name, and makes them all bool
typed.
We also drop DebugFlags.Fieldtrack because the previous CL shifted the
one test that used it to use GOEXPERIMENT instead.
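A minimal sketch of what "all bool typed, no _enabled suffix" buys: a comma-separated GOEXPERIMENT-style string can be applied to the struct generically. The struct name, its fields, the reflection-based parser, and the "no" prefix handling are illustrative assumptions, not the objabi code.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// Experiments is a hypothetical stand-in for objabi.Experiment: every
// experiment is a plain bool field, with no "_enabled" suffix.
type Experiments struct {
	FieldTrack        bool
	PreemptibleLoops  bool
	StaticLockRanking bool
}

// parseExperiments applies a comma-separated list such as
// "fieldtrack,nopreemptibleloops". Names match field names
// case-insensitively; a "no" prefix clears the flag instead of setting it.
func parseExperiments(list string) (Experiments, error) {
	var exp Experiments
	rv := reflect.ValueOf(&exp).Elem()
	rt := rv.Type()
	for _, name := range strings.Split(list, ",") {
		name = strings.TrimSpace(name)
		if name == "" {
			continue
		}
		val := true
		if strings.HasPrefix(name, "no") {
			val, name = false, name[2:]
		}
		found := false
		for i := 0; i < rt.NumField(); i++ {
			if strings.EqualFold(rt.Field(i).Name, name) {
				rv.Field(i).SetBool(val)
				found = true
				break
			}
		}
		if !found {
			return exp, fmt.Errorf("unknown experiment %q", name)
		}
	}
	return exp, nil
}

func main() {
	exp, err := parseExperiments("fieldtrack,nopreemptibleloops")
	fmt.Printf("%+v %v\n", exp, err)
}
```

Because every field has the same type, one loop handles all experiments; adding an experiment only means adding a field.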
Change-Id: I3406fe62b1c300bb4caeaffa6ca5ce56a70497fe
Reviewed-on: https://go-review.googlesource.com/c/go/+/302389
Trust: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
|
|
It's no longer conditional.
Change-Id: I697bb0e9ffe9644ec4d2766f7e8be8b82d3b0638
Reviewed-on: https://go-review.googlesource.com/c/go/+/286013
Trust: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
|
|
Change-Id: Ibf68e663f29a5cb3b64a7d923c005c16da647769
Reviewed-on: https://go-review.googlesource.com/c/go/+/266537
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
|
|
As it says, delay expansion of OpArg to the expand_calls phase,
to enable (eventually) interprocedural SSA optimizations, and
(sooner) change to a register ABI.
Includes a round of cleanup to function names and comments,
largely to match the expanded scope of the functions.
This CL removes the per-function dependence on GOSSAHASH,
but the go116lateCallExpansion kill switch remains (and was
tested locally to ensure it worked).
Two functions in expand_calls.go that did overlapping work
were combined into a single function that is called
twice.
Fixes #42236.
For #40724.
Change-Id: Icbb78947eaa39f17f2c1210d5c2caef20abd6571
Reviewed-on: https://go-review.googlesource.com/c/go/+/262117
Trust: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
This adds a pass to detect common selection operations,
to avoid generating duplicates. Duplicate offsets are
also detected.
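The idea of detecting common selections can be sketched with a map keyed by what a selection reads. The key fields (source value, byte offset, result type) and the function below are hypothetical simplifications, not the expand_calls data structures.

```go
package main

import "fmt"

// selectKey identifies a selection in this hypothetical model.
type selectKey struct {
	source int    // ID of the aggregate value being selected from
	offset int64  // byte offset of the selected part
	typ    string // type of the selected part
}

// dedupSelects returns, for each selection, the index of the canonical
// (first-seen) selection with the same key, so a duplicate can be
// replaced by a reference to the canonical value.
func dedupSelects(sels []selectKey) []int {
	first := make(map[selectKey]int)
	canon := make([]int, len(sels))
	for i, k := range sels {
		if j, ok := first[k]; ok {
			canon[i] = j
			continue
		}
		first[k] = i
		canon[i] = i
	}
	return canon
}

func main() {
	sels := []selectKey{
		{source: 7, offset: 0, typ: "*byte"},
		{source: 7, offset: 8, typ: "int"},
		{source: 7, offset: 0, typ: "*byte"}, // duplicate of the first
	}
	fmt.Println(dedupSelects(sels))
}
```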
All aggregate types are now handled; there is some freedom in where
expand_calls is run, though it must run before softfloat.
Debug-name-maintenance is now incremental both in decompose builtin
and in expand_calls; it might be good to push this into all the
decompose passes.
(this is a smash of 5 CLs that rewrote some of the same code several
times to deal with phase-ordering problems, and included an abandoned
attempt.)
For #40724.
Change-Id: I2a0c32f20660bf8b99e2bcecd33545d97d2bd3c6
Reviewed-on: https://go-review.googlesource.com/c/go/+/249458
Trust: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
This was useful for debugging failures occurring during make.bash.
The added flush also ensures that any hints in the GOSSAFUNC output
are flushed before fatal exit.
The environment variable GOSSADIR specifies where the SSA html debugging
files should be placed. To avoid collisions, each one is written to
[package].[functionOrMethod].html, where [package] is the package path
with its components separated by the filepath separator, and
[functionOrMethod] is either the function name or, for a method,
(*Type).Method or Type.Method, as appropriate. Directories
are created as necessary to make this work.
Change-Id: I420927426b618b633bb1ffc51cf0f223b8f6d49c
Reviewed-on: https://go-review.googlesource.com/c/go/+/252338
Trust: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
arguments
This change incorporates the decision that it should be possible to
run call expansion relatively late in the optimization chain, so that
(1) calls themselves can be exposed to useful optimizations
(2) the effect of selectors on aggregates is seen at the rewrite,
so that assignment of parts into registers is less complicated
(at least I hope it works that way).
That means that selectors feeding into SelectN need to be processed,
and Make* feeding into call parameters need to be processed.
This does however require that call expansion run before decompose
builtins.
This doesn't yet handle rewrites of strings, slices, interfaces,
and complex numbers.
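Requirements such as "call expansion must run before decompose builtin" (or before softfloat) are pass-ordering constraints. The sketch below checks a pass list against such pairs; the constraint type and checker are modeled loosely on the idea, and the pass names are used only for illustration.

```go
package main

import "fmt"

// constraint says pass a must run before pass b.
type constraint struct {
	a, b string
}

// checkOrder returns an error for the first violated constraint;
// constraints that name a pass missing from the list are ignored.
func checkOrder(passes []string, order []constraint) error {
	idx := make(map[string]int, len(passes))
	for i, p := range passes {
		idx[p] = i
	}
	for _, c := range order {
		i, okA := idx[c.a]
		j, okB := idx[c.b]
		if okA && okB && i >= j {
			return fmt.Errorf("pass %q must run before %q", c.a, c.b)
		}
	}
	return nil
}

func main() {
	order := []constraint{
		{"expand calls", "decompose builtin"},
		{"expand calls", "softfloat"},
	}
	passes := []string{"opt", "expand calls", "decompose builtin", "softfloat"}
	fmt.Println(checkOrder(passes, order))
}
```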
Passes run.bash and race.bash
Change-Id: I71ff23d3c491043beb30e926949970c4f63ef1a4
Reviewed-on: https://go-review.googlesource.com/c/go/+/245133
Trust: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
Still needs to generate the calls that will need lowering.
Change-Id: Ifd4e510193441a5e27c462c1f1d704f07bf6dec3
Reviewed-on: https://go-review.googlesource.com/c/go/+/242359
Trust: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
I noticed a TODO comment here. This variable is only used to build the file name when dumping a function's SSA pass results in detail. That is fine when dumping a single function, but the variable could be modified by more than one goroutine if multiple functions were dumped at the same time (although currently it appears that only one function's SSA passes are dumped). As far as I can tell, this variable can be a field of the Func struct. I'm not sure whether this change is necessary; I look forward to your advice. Thank you very much.
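A minimal sketch of the proposal: each Func carries its own dump sequence number, so two functions no longer share one package-level counter. The field name and the file-name format are illustrative assumptions.

```go
package main

import "fmt"

// Func is a hypothetical, reduced SSA function.
type Func struct {
	Name        string
	dumpFileSeq uint8 // dump files written so far for this function
}

// dumpFileName returns the next dump file name for phase.
func (f *Func) dumpFileName(phase string) string {
	name := fmt.Sprintf("%s_%02d__%s.dump", f.Name, f.dumpFileSeq, phase)
	f.dumpFileSeq++
	return name
}

func main() {
	f, g := &Func{Name: "f"}, &Func{Name: "g"}
	fmt.Println(f.dumpFileName("opt"), g.dumpFileName("opt"), f.dumpFileName("cse"))
}
```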
Change-Id: I35dd7247889e0cc7f19c0b400b597206592dee75
Reviewed-on: https://go-review.googlesource.com/c/go/+/244918
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
|
|
The scheduler assumes two special invariants that apply to tuple
selectors (Select0 and Select1 ops):
1. There is only one tuple selector of each type per generator.
2. Tuple selectors and generators reside in the same block.
Prior to this CL the assumption was that these invariants would
only be broken by the CSE pass. The CSE pass therefore contained
code to move and de-duplicate selectors to fix these invariants.
However it is also possible to write relatively basic optimization
rules that cause these invariants to be broken. For example:
(A (Select0 (B))) -> (Select1 (B))
This rule could result in the newly added selector (Select1) being
in a different block to the tuple generator (see issue #38356). It
could also result in duplicate selectors if this rule matches
multiple times for the same tuple generator (see issue #39472).
The CSE pass will 'fix' these invariants. However it will only do
so when optimizations are enabled (since disabling optimizations
disables the CSE pass).
This CL moves the CSE tuple selector fixup code into its own pass
and makes it mandatory even when optimizations are disabled. This
allows tuple selectors to be treated like normal ops for most of
the compilation pipeline until after the new pass has run, at which
point we need to be careful to maintain the invariant again.
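The fixup can be sketched over a simplified value model: keep the first selector of each kind per generator, turn later duplicates into copies of it, and move the kept selector into its generator's block. Because the generator's block dominates every use of the generator, the moved selector still dominates the copies. The Value type and function are hypothetical, not the compiler's implementation.

```go
package main

import "fmt"

// Value is a hypothetical, stripped-down SSA value.
type Value struct {
	ID    int
	Op    string // e.g. "Add64carry" (a tuple generator), "Select0", "Select1", "Copy"
	Block int    // ID of the block that contains the value
	Args  []*Value
}

// fixupTupleSelectors restores the scheduler's two invariants:
// (1) at most one selector of each kind per tuple generator, and
// (2) every selector lives in its generator's block.
func fixupTupleSelectors(values []*Value) {
	type key struct {
		gen int    // generator ID
		op  string // "Select0" or "Select1"
	}
	seen := make(map[key]*Value)
	for _, v := range values {
		if v.Op != "Select0" && v.Op != "Select1" {
			continue
		}
		gen := v.Args[0]
		k := key{gen.ID, v.Op}
		if first, ok := seen[k]; ok {
			// Invariant 1: a duplicate becomes a copy of the first selector.
			v.Op, v.Args = "Copy", []*Value{first}
			continue
		}
		seen[k] = v
		// Invariant 2: move the selector next to its generator.
		v.Block = gen.Block
	}
}

func main() {
	gen := &Value{ID: 1, Op: "Add64carry", Block: 1}
	s0 := &Value{ID: 2, Op: "Select0", Block: 2, Args: []*Value{gen}}
	dup := &Value{ID: 3, Op: "Select0", Block: 3, Args: []*Value{gen}}
	fixupTupleSelectors([]*Value{gen, s0, dup})
	fmt.Println(s0.Block, dup.Op, dup.Args[0].ID)
}
```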
Fixes #39472.
Change-Id: Ia3f79e09d9c65ac95f897ce37e967ee1258a080b
Reviewed-on: https://go-review.googlesource.com/c/go/+/237118
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
If the final pass(es) are identical during ssa.html generation,
they are persisted in-memory as "pendingPhases" but never get
written as a column in the html. This change flushes those
in-memory phases.
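The bug and its fix can be modeled in a few lines: phases whose output matches the previous column are held back as pending and later written as one collapsed column, so the writer must also flush them when it is closed. The htmlWriter type and its methods are a hypothetical model, not the real HTMLWriter.

```go
package main

import (
	"fmt"
	"strings"
)

// htmlWriter models the ssa.html writer as a list of column titles.
type htmlWriter struct {
	prevHTML string
	pending  []string // phases whose output matched prevHTML
	columns  []string // column titles written so far
}

func (w *htmlWriter) WritePhase(phase, html string) {
	if len(w.columns) > 0 && html == w.prevHTML {
		w.pending = append(w.pending, phase)
		return
	}
	w.flushPending()
	w.columns = append(w.columns, phase)
	w.prevHTML = html
}

// flushPending writes any held-back phases as one collapsed column.
func (w *htmlWriter) flushPending() {
	if len(w.pending) == 0 {
		return
	}
	w.columns = append(w.columns, strings.Join(w.pending, " + "))
	w.pending = nil
}

// Close must flush too; otherwise identical final phases are lost.
func (w *htmlWriter) Close() {
	w.flushPending()
}

func main() {
	var w htmlWriter
	w.WritePhase("start", "A")
	w.WritePhase("opt", "B")
	w.WritePhase("cse", "B")
	w.WritePhase("lower", "B")
	w.Close()
	fmt.Println(w.columns)
}
```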
Fixes #38242
Change-Id: Id13477dcbe7b419a818bb457861b2422ba5ef4bc
Reviewed-on: https://go-review.googlesource.com/c/go/+/227182
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Replace HTMLWriter's Logger field with a *Func. Implement a Fatalf method
for HTMLWriter which gets the Frontend() from the Func and calls down
into its Fatalf method, passing the msg and args along. Replace
remaining calls to the old Logger with calls to logging methods on
the Func.
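The delegation pattern can be sketched as follows. The Frontend interface here is simplified (the real one also takes a source position), and the recorder type exists only so the example can run without exiting.

```go
package main

import "fmt"

// Frontend is a hypothetical, simplified frontend interface.
type Frontend interface {
	Fatalf(msg string, args ...interface{})
}

type Func struct{ fe Frontend }

func (f *Func) Frontend() Frontend { return f.fe }

// HTMLWriter holds a *Func instead of a separate Logger.
type HTMLWriter struct {
	Func *Func
}

// Fatalf forwards to the frontend of the function being written.
func (w *HTMLWriter) Fatalf(msg string, args ...interface{}) {
	w.Func.Frontend().Fatalf(msg, args...)
}

// recorder is a stand-in frontend that records messages instead of exiting.
type recorder struct{ msgs []string }

func (r *recorder) Fatalf(msg string, args ...interface{}) {
	r.msgs = append(r.msgs, fmt.Sprintf(msg, args...))
}

func main() {
	r := &recorder{}
	w := &HTMLWriter{Func: &Func{fe: r}}
	w.Fatalf("cannot write %s", "ssa.html")
	fmt.Println(r.msgs)
}
```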
Change-Id: I966342ef9997396f3416fb152fa52d60080ebecb
Reviewed-on: https://go-review.googlesource.com/c/go/+/227277
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Use a separate compiler pass to introduce complicated x86 addressing
modes. Loads in the normal architecture rules (for x86 and all other
platforms) can have constant offsets (AuxInt values) and symbols (Aux
values), but no more.
The complex addressing modes (x+y, x+2*y, etc.) are introduced in a
separate pass that combines loads with LEAQx ops.
Organizing rewrites this way reduces the number of rewrites
required, as there are lots of different rule orderings that would
otherwise have to be specified to ensure these complex addressing modes
are always found when they are possible.
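One combination such a pass performs can be sketched on a toy value model: a plain load whose address is an LEAQ1/LEAQ8 is folded into an indexed load, provided the summed displacement still fits in 32 bits. The Value type, the table and the function are hypothetical simplifications (symbols are omitted); only the op names are taken from the amd64 backend.

```go
package main

import "fmt"

// Value is a hypothetical, stripped-down SSA value (symbols omitted).
type Value struct {
	Op     string
	AuxInt int64 // constant offset
	Args   []*Value
}

// idxLoad maps an address computation to the indexed 8-byte load that
// can absorb it: LEAQ1 computes base+idx, LEAQ8 computes base+8*idx.
var idxLoad = map[string]string{
	"LEAQ1": "MOVQloadidx1",
	"LEAQ8": "MOVQloadidx8",
}

func is32Bit(n int64) bool { return n == int64(int32(n)) }

// combineAddr rewrites
//
//	(MOVQload [off1] (LEAQx [off2] base idx) mem)
//
// into an indexed load [off1+off2] base idx mem when the combined
// offset still fits in a 32-bit displacement.
func combineAddr(v *Value) bool {
	if v.Op != "MOVQload" {
		return false
	}
	lea := v.Args[0]
	op, ok := idxLoad[lea.Op]
	if !ok || !is32Bit(v.AuxInt+lea.AuxInt) {
		return false
	}
	v.Op = op
	v.AuxInt += lea.AuxInt
	v.Args = []*Value{lea.Args[0], lea.Args[1], v.Args[1]}
	return true
}

func main() {
	base, idx, mem := &Value{Op: "Arg"}, &Value{Op: "Arg"}, &Value{Op: "InitMem"}
	lea := &Value{Op: "LEAQ8", AuxInt: 16, Args: []*Value{base, idx}}
	load := &Value{Op: "MOVQload", AuxInt: 8, Args: []*Value{lea, mem}}
	fmt.Println(combineAddr(load), load.Op, load.AuxInt)
}
```

Keeping this in one pass means the ordinary rules never need to match every ordering of adds and shifts that could form such an address.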
Update #36468
Change-Id: I5b4bf7b03a1e731d6dfeb9ef19b376175f3b4b44
Reviewed-on: https://go-review.googlesource.com/c/go/+/217097
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
|
|
This CL incorporates code from CL 201206 by Josh Bleecher Snyder
(thanks Josh).
This CL restores the integer-in-range optimizations in the SSA
backend. The fuse pass is enhanced to detect inequalities that
could be merged and fuse their associated blocks while the generic
rules optimize them into a single unsigned comparison.
For example, the inequality `x >= 0 && x < 10` will now be optimized
to `unsigned(x) < 10`.
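The rewrite is sound because converting a negative int to uint wraps it to a value of at least 2^63, which can never be below 10, so a single unsigned compare rejects both ends of the range. The sketch below shows the source form, the fused form, and (as an illustrative generalization, not necessarily what the generic rules emit) the shifted form for lo <= x && x < hi, which requires lo <= hi.

```go
package main

import "fmt"

// inRangeSigned is the form written in source.
func inRangeSigned(x int) bool { return x >= 0 && x < 10 }

// inRangeUnsigned is the single-comparison form.
func inRangeUnsigned(x int) bool { return uint(x) < 10 }

// inRangeGeneral handles lo <= x && x < hi, assuming lo <= hi; both
// subtractions wrap, and the unsigned compare still gives the right answer.
func inRangeGeneral(x, lo, hi int) bool { return uint(x-lo) < uint(hi-lo) }

func main() {
	fmt.Println(inRangeUnsigned(-1), inRangeUnsigned(9), inRangeGeneral(5, 3, 8))
}
```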
Overall, this has a fairly positive impact on binary sizes.
name old time/op new time/op delta
Template 192ms ± 1% 192ms ± 1% ~ (p=0.757 n=17+18)
Unicode 76.6ms ± 2% 76.5ms ± 2% ~ (p=0.603 n=19+19)
GoTypes 694ms ± 1% 693ms ± 1% ~ (p=0.569 n=19+20)
Compiler 3.26s ± 0% 3.27s ± 0% +0.25% (p=0.000 n=20+20)
SSA 7.41s ± 0% 7.49s ± 0% +1.10% (p=0.000 n=17+19)
Flate 120ms ± 1% 120ms ± 1% +0.38% (p=0.003 n=19+19)
GoParser 152ms ± 1% 152ms ± 1% ~ (p=0.061 n=17+19)
Reflect 422ms ± 1% 425ms ± 2% +0.76% (p=0.001 n=18+20)
Tar 167ms ± 1% 167ms ± 0% ~ (p=0.730 n=18+19)
XML 233ms ± 4% 231ms ± 1% ~ (p=0.752 n=20+17)
LinkCompiler 927ms ± 8% 928ms ± 8% ~ (p=0.857 n=19+20)
ExternalLinkCompiler 1.81s ± 2% 1.81s ± 2% ~ (p=0.513 n=19+20)
LinkWithoutDebugCompiler 556ms ±10% 583ms ±13% +4.95% (p=0.007 n=20+20)
[Geo mean] 478ms 481ms +0.52%
name old user-time/op new user-time/op delta
Template 270ms ± 5% 269ms ± 7% ~ (p=0.925 n=20+20)
Unicode 134ms ± 7% 131ms ±14% ~ (p=0.593 n=18+20)
GoTypes 981ms ± 3% 987ms ± 2% +0.63% (p=0.049 n=19+18)
Compiler 4.50s ± 2% 4.50s ± 1% ~ (p=0.588 n=19+20)
SSA 10.6s ± 2% 10.6s ± 1% ~ (p=0.141 n=20+19)
Flate 164ms ± 8% 165ms ±10% ~ (p=0.738 n=20+20)
GoParser 202ms ± 5% 203ms ± 6% ~ (p=0.820 n=20+20)
Reflect 587ms ± 6% 597ms ± 3% ~ (p=0.087 n=20+18)
Tar 230ms ± 6% 228ms ± 8% ~ (p=0.569 n=19+20)
XML 311ms ± 6% 314ms ± 5% ~ (p=0.369 n=20+20)
LinkCompiler 878ms ± 8% 887ms ± 7% ~ (p=0.289 n=20+20)
ExternalLinkCompiler 1.60s ± 7% 1.60s ± 7% ~ (p=0.820 n=20+20)
LinkWithoutDebugCompiler 498ms ±12% 489ms ±11% ~ (p=0.398 n=20+20)
[Geo mean] 611ms 611ms +0.05%
name old alloc/op new alloc/op delta
Template 36.1MB ± 0% 36.0MB ± 0% -0.32% (p=0.000 n=20+20)
Unicode 28.3MB ± 0% 28.3MB ± 0% -0.03% (p=0.000 n=19+20)
GoTypes 121MB ± 0% 121MB ± 0% ~ (p=0.226 n=16+20)
Compiler 563MB ± 0% 563MB ± 0% ~ (p=0.166 n=20+19)
SSA 1.32GB ± 0% 1.33GB ± 0% +0.88% (p=0.000 n=20+19)
Flate 22.7MB ± 0% 22.7MB ± 0% -0.02% (p=0.033 n=19+20)
GoParser 27.9MB ± 0% 27.9MB ± 0% -0.02% (p=0.001 n=20+20)
Reflect 78.3MB ± 0% 78.2MB ± 0% -0.01% (p=0.019 n=20+20)
Tar 34.0MB ± 0% 34.0MB ± 0% -0.04% (p=0.000 n=20+20)
XML 43.9MB ± 0% 43.9MB ± 0% -0.07% (p=0.000 n=20+19)
LinkCompiler 205MB ± 0% 205MB ± 0% +0.44% (p=0.000 n=20+18)
ExternalLinkCompiler 223MB ± 0% 223MB ± 0% +0.03% (p=0.000 n=20+20)
LinkWithoutDebugCompiler 139MB ± 0% 142MB ± 0% +1.75% (p=0.000 n=20+20)
[Geo mean] 93.7MB 93.9MB +0.20%
name old allocs/op new allocs/op delta
Template 363k ± 0% 361k ± 0% -0.58% (p=0.000 n=20+19)
Unicode 329k ± 0% 329k ± 0% -0.06% (p=0.000 n=19+20)
GoTypes 1.28M ± 0% 1.28M ± 0% -0.01% (p=0.000 n=20+20)
Compiler 5.40M ± 0% 5.40M ± 0% -0.01% (p=0.000 n=20+20)
SSA 12.7M ± 0% 12.8M ± 0% +0.80% (p=0.000 n=20+20)
Flate 228k ± 0% 228k ± 0% ~ (p=0.194 n=20+20)
GoParser 295k ± 0% 295k ± 0% -0.04% (p=0.000 n=20+20)
Reflect 949k ± 0% 949k ± 0% -0.01% (p=0.000 n=20+20)
Tar 337k ± 0% 337k ± 0% -0.06% (p=0.000 n=20+20)
XML 418k ± 0% 417k ± 0% -0.17% (p=0.000 n=20+20)
LinkCompiler 553k ± 0% 554k ± 0% +0.22% (p=0.000 n=20+19)
ExternalLinkCompiler 1.52M ± 0% 1.52M ± 0% +0.27% (p=0.000 n=20+20)
LinkWithoutDebugCompiler 186k ± 0% 186k ± 0% +0.06% (p=0.000 n=20+20)
[Geo mean] 723k 723k +0.03%
name old text-bytes new text-bytes delta
HelloSize 828kB ± 0% 828kB ± 0% -0.01% (p=0.000 n=20+20)
name old data-bytes new data-bytes delta
HelloSize 13.4kB ± 0% 13.4kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 180kB ± 0% 180kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.23MB ± 0% 1.23MB ± 0% -0.33% (p=0.000 n=20+20)
file before after Δ %
addr2line 4320075 4311883 -8192 -0.190%
asm 5191932 5187836 -4096 -0.079%
buildid 2835338 2831242 -4096 -0.144%
compile 20531717 20569099 +37382 +0.182%
cover 5322511 5318415 -4096 -0.077%
dist 3723749 3719653 -4096 -0.110%
doc 4743515 4739419 -4096 -0.086%
fix 3413960 3409864 -4096 -0.120%
link 6690119 6686023 -4096 -0.061%
nm 4269616 4265520 -4096 -0.096%
pprof 14942189 14929901 -12288 -0.082%
trace 11807164 11790780 -16384 -0.139%
vet 8384104 8388200 +4096 +0.049%
go 15339076 15334980 -4096 -0.027%
total 132258257 132226007 -32250 -0.024%
Fixes #30645.
Change-Id: If551ac5996097f3685870d083151b5843170aab0
Reviewed-on: https://go-review.googlesource.com/c/go/+/165998
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
This change adds the option to run the ssa checker with a random seed.
The current system uses a completely fixed seed,
which is good for reproducibility but bad for exploring the state space.
Preserve what we have, but also provide a way for the caller
to provide a seed. The caller can report the seed
alongside any failures.
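A minimal sketch of keeping the fixed seed as the default while letting a caller supply one and report it. The convention used here (0 means the fixed seed, a negative value means pick a fresh one) and the constant's value are hypothetical.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// fixedSeed preserves the old, reproducible behavior (value is illustrative).
const fixedSeed = 12345

// checkerRand returns the RNG for one checker run and the seed actually
// used, so the caller can print it alongside any failure.
func checkerRand(seed int64) (*rand.Rand, int64) {
	switch {
	case seed == 0:
		seed = fixedSeed
	case seed < 0:
		seed = time.Now().UnixNano()
	}
	return rand.New(rand.NewSource(seed)), seed
}

func main() {
	r, seed := checkerRand(-1)
	fmt.Println("seed:", seed, "first draw:", r.Intn(100))
}
```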
Change-Id: I2676a8112d8260e6cac86d95d2e8db4d3221aeeb
Reviewed-on: https://go-review.googlesource.com/c/go/+/216418
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Add an internal mode to simplify debugging of posets
by checking the integrity after every mutation. Turn
it on within SSA checked builds.
Change-Id: Idaa8277f58e5bce3753702e212cea4d698de30ca
Reviewed-on: https://go-review.googlesource.com/c/go/+/196780
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
|
|
Nilcheck would move statement marks from NilCheck values to other
values that turned out to be already dead, which led to lost
statements. It is better to eliminate the dead code first.
One "error" is removed from test/prove.go because the code is
actually dead, and the additional deadcode pass removes it before
prove can run.
Change-Id: If75926ca1acbb59c7ab9c8ef14d60a02a0a94f8b
Reviewed-on: https://go-review.googlesource.com/c/go/+/198479
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Jeremy Faller <jeremy@golang.org>
|
|
While working on #30645, I noticed that many instances
in which the walkinrange optimization could apply
were not even being considered.
This was because of extraneous blocks in the CFG,
of the type that shortcircuit normally removes.
The change improves the shortcircuit pass to handle
most of those cases. (There are a few that can only be
reasonably detected later in compilation, after other
optimizations have been run, but not enough to be worth chasing.)
Notable changes:
* Instead of calculating live-across-blocks values, use v.Uses == 1.
This is cheaper and more straightforward.
v.Uses did not exist when this pass was initially written.
* Incorporate a fusePlain and loop until stable.
This is necessary to find many of the instances.
* Allow Copy and Not wrappers around Phi values.
This significantly increases effectiveness.
* Allow removal of all preds, creating a dead block.
The previous pass stopped unnecessarily at one pred.
* Use phielimValue during cleanup instead of manually
setting the op to OpCopy.
The result is marginally faster compilation and smaller code.
name old time/op new time/op delta
Template 213ms ± 2% 212ms ± 2% -0.63% (p=0.002 n=49+48)
Unicode 90.0ms ± 2% 89.8ms ± 2% ~ (p=0.122 n=48+48)
GoTypes 710ms ± 3% 711ms ± 2% ~ (p=0.433 n=45+49)
Compiler 3.23s ± 2% 3.22s ± 2% ~ (p=0.124 n=47+49)
SSA 10.0s ± 1% 10.0s ± 1% -0.43% (p=0.000 n=48+50)
Flate 135ms ± 3% 135ms ± 2% ~ (p=0.311 n=49+49)
GoParser 158ms ± 2% 158ms ± 2% ~ (p=0.757 n=48+48)
Reflect 447ms ± 2% 447ms ± 2% ~ (p=0.815 n=49+48)
Tar 189ms ± 2% 189ms ± 3% ~ (p=0.530 n=47+49)
XML 251ms ± 3% 250ms ± 1% -0.75% (p=0.002 n=49+48)
[Geo mean] 427ms 426ms -0.25%
name old user-time/op new user-time/op delta
Template 265ms ± 2% 265ms ± 2% ~ (p=0.969 n=48+50)
Unicode 119ms ± 6% 119ms ± 6% ~ (p=0.738 n=50+50)
GoTypes 923ms ± 2% 925ms ± 2% ~ (p=0.057 n=43+47)
Compiler 4.37s ± 2% 4.37s ± 2% ~ (p=0.691 n=50+46)
SSA 13.4s ± 1% 13.4s ± 1% ~ (p=0.282 n=42+49)
Flate 162ms ± 2% 162ms ± 2% ~ (p=0.774 n=48+50)
GoParser 186ms ± 2% 186ms ± 3% ~ (p=0.213 n=47+47)
Reflect 572ms ± 2% 573ms ± 3% ~ (p=0.303 n=50+49)
Tar 240ms ± 3% 240ms ± 2% ~ (p=0.939 n=46+44)
XML 302ms ± 2% 302ms ± 2% ~ (p=0.399 n=47+47)
[Geo mean] 540ms 541ms +0.07%
name old alloc/op new alloc/op delta
Template 36.8MB ± 0% 36.7MB ± 0% -0.42% (p=0.008 n=5+5)
Unicode 28.1MB ± 0% 28.1MB ± 0% ~ (p=0.151 n=5+5)
GoTypes 124MB ± 0% 124MB ± 0% -0.26% (p=0.008 n=5+5)
Compiler 571MB ± 0% 566MB ± 0% -0.84% (p=0.008 n=5+5)
SSA 1.86GB ± 0% 1.85GB ± 0% -0.58% (p=0.008 n=5+5)
Flate 22.8MB ± 0% 22.8MB ± 0% -0.17% (p=0.008 n=5+5)
GoParser 27.3MB ± 0% 27.3MB ± 0% -0.20% (p=0.008 n=5+5)
Reflect 79.5MB ± 0% 79.3MB ± 0% -0.20% (p=0.008 n=5+5)
Tar 34.7MB ± 0% 34.6MB ± 0% -0.42% (p=0.008 n=5+5)
XML 45.4MB ± 0% 45.3MB ± 0% -0.29% (p=0.008 n=5+5)
[Geo mean] 80.0MB 79.7MB -0.34%
name old allocs/op new allocs/op delta
Template 378k ± 0% 377k ± 0% -0.22% (p=0.008 n=5+5)
Unicode 339k ± 0% 339k ± 0% ~ (p=0.643 n=5+5)
GoTypes 1.36M ± 0% 1.36M ± 0% -0.10% (p=0.008 n=5+5)
Compiler 5.51M ± 0% 5.50M ± 0% -0.13% (p=0.008 n=5+5)
SSA 17.5M ± 0% 17.5M ± 0% -0.14% (p=0.008 n=5+5)
Flate 234k ± 0% 234k ± 0% -0.04% (p=0.008 n=5+5)
GoParser 299k ± 0% 299k ± 0% -0.05% (p=0.008 n=5+5)
Reflect 978k ± 0% 979k ± 0% +0.02% (p=0.016 n=5+5)
Tar 351k ± 0% 351k ± 0% -0.04% (p=0.008 n=5+5)
XML 435k ± 0% 435k ± 0% -0.11% (p=0.008 n=5+5)
[Geo mean] 840k 840k -0.08%
file before after Δ %
go 14794788 14770212 -24576 -0.166%
addr2line 4203688 4199592 -4096 -0.097%
api 5954056 5941768 -12288 -0.206%
asm 4862704 4846320 -16384 -0.337%
cgo 4778920 4770728 -8192 -0.171%
compile 24001568 23923792 -77776 -0.324%
cover 5198440 5190248 -8192 -0.158%
dist 3595248 3587056 -8192 -0.228%
doc 4618504 4610312 -8192 -0.177%
fix 3337416 3333320 -4096 -0.123%
link 6120408 6116312 -4096 -0.067%
nm 4149064 4140872 -8192 -0.197%
objdump 4555608 4547416 -8192 -0.180%
pprof 14616324 14595844 -20480 -0.140%
test2json 2766328 2762232 -4096 -0.148%
trace 11638844 11622460 -16384 -0.141%
vet 8274936 8258552 -16384 -0.198%
total 132520780 132270972 -249808 -0.189%
Change-Id: Ifcd235a2a6e5f13ed5c93e62523e2ef61321fccf
Reviewed-on: https://go-review.googlesource.com/c/go/+/178197
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
CSE can make dead values live again.
Running deadcode first avoids that;
it also makes CSE more efficient.
file before after Δ %
api 5970616 5966520 -4096 -0.069%
asm 4867088 4846608 -20480 -0.421%
compile 23988320 23935072 -53248 -0.222%
link 6084376 6080280 -4096 -0.067%
nm 4165736 4161640 -4096 -0.098%
objdump 4572216 4568120 -4096 -0.090%
pprof 14452996 14457092 +4096 +0.028%
trace 11467292 11471388 +4096 +0.036%
total 132181100 132099180 -81920 -0.062%
Compiler performance impact is negligible:
name old alloc/op new alloc/op delta
Template 38.8MB ± 0% 38.8MB ± 0% -0.04% (p=0.008 n=5+5)
Unicode 28.2MB ± 0% 28.2MB ± 0% ~ (p=1.000 n=5+5)
GoTypes 131MB ± 0% 131MB ± 0% -0.14% (p=0.008 n=5+5)
Compiler 606MB ± 0% 606MB ± 0% -0.05% (p=0.008 n=5+5)
SSA 2.14GB ± 0% 2.13GB ± 0% -0.26% (p=0.008 n=5+5)
Flate 24.0MB ± 0% 24.0MB ± 0% -0.18% (p=0.008 n=5+5)
GoParser 28.8MB ± 0% 28.8MB ± 0% -0.15% (p=0.008 n=5+5)
Reflect 83.8MB ± 0% 83.7MB ± 0% -0.11% (p=0.008 n=5+5)
Tar 36.4MB ± 0% 36.4MB ± 0% -0.09% (p=0.008 n=5+5)
XML 47.9MB ± 0% 47.8MB ± 0% -0.15% (p=0.008 n=5+5)
[Geo mean] 84.6MB 84.5MB -0.12%
name old allocs/op new allocs/op delta
Template 379k ± 0% 380k ± 0% +0.15% (p=0.008 n=5+5)
Unicode 340k ± 0% 340k ± 0% ~ (p=0.738 n=5+5)
GoTypes 1.36M ± 0% 1.36M ± 0% +0.05% (p=0.008 n=5+5)
Compiler 5.49M ± 0% 5.49M ± 0% +0.12% (p=0.008 n=5+5)
SSA 17.5M ± 0% 17.5M ± 0% -0.18% (p=0.008 n=5+5)
Flate 235k ± 0% 235k ± 0% ~ (p=0.079 n=5+5)
GoParser 302k ± 0% 302k ± 0% ~ (p=0.310 n=5+5)
Reflect 976k ± 0% 977k ± 0% +0.08% (p=0.008 n=5+5)
Tar 352k ± 0% 352k ± 0% +0.12% (p=0.008 n=5+5)
XML 436k ± 0% 436k ± 0% -0.05% (p=0.008 n=5+5)
[Geo mean] 842k 842k +0.03%
Change-Id: I53e8faed1859885ca5c4a5d45067a50984f3eff1
Reviewed-on: https://go-review.googlesource.com/c/go/+/175879
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
noteRule is useful when you're trying to debug
a particular rule, or get a general sense for
how often a rule fires overall.
It is less useful if you're trying to figure
out which functions might be useful to benchmark
to ascertain the impact of a newly added rule.
Enter countRule. You use it like noteRule,
except that you get per-function summaries.
Sample output:
# runtime
(*mspan).sweep: idx1=1
evacuate_faststr: idx1=1
evacuate_fast32: idx1=1
evacuate: idx1=2
evacuate_fast64: idx1=1
sweepone: idx1=1
purgecachedstats: idx1=1
mProf_Free: idx1=1
This suggests that the map benchmarks
might be good to run for this added rule.
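The per-function tally behind output like the sample above can be sketched with a nested map. The ruleCounts type and its methods are hypothetical; returning true lets a counting call sit in a rule condition without changing whether the rule matches.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// ruleCounts maps function name -> rule key -> number of firings.
type ruleCounts map[string]map[string]int

// countRule records one firing of key in fn and always returns true.
func (rc ruleCounts) countRule(fn, key string) bool {
	m := rc[fn]
	if m == nil {
		m = make(map[string]int)
		rc[fn] = m
	}
	m[key]++
	return true
}

// summary formats one function's line, e.g. "evacuate: idx1=2".
func (rc ruleCounts) summary(fn string) string {
	m := rc[fn]
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	parts := make([]string, len(keys))
	for i, k := range keys {
		parts[i] = fmt.Sprintf("%s=%d", k, m[k])
	}
	return fn + ": " + strings.Join(parts, " ")
}

func main() {
	rc := ruleCounts{}
	rc.countRule("evacuate", "idx1")
	rc.countRule("evacuate", "idx1")
	rc.countRule("sweepone", "idx1")
	fmt.Println(rc.summary("evacuate"))
	fmt.Println(rc.summary("sweepone"))
}
```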
Change-Id: Id471c3231f1736165f2020f6979ff01c29677808
Reviewed-on: https://go-review.googlesource.com/c/go/+/167088
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Change-Id: I4a70f4a52f84cf50f99939351319504b1c5dff76
Reviewed-on: https://go-review.googlesource.com/c/go/+/175777
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
The phi tighten pass moves rematerializable phi args
to the immediate predecessor of the phis.
This reduces value lifetimes for regalloc.
However, the critical edge removal pass can introduce
new blocks, which can change what a block's
immediate predecessor is. This can result in tightened
phi args being spilled unnecessarily.
This change moves the phi tighten pass after the
critical edge pass, when the block structure is stable.
This improves the code generated for
func f(s string) bool { return s == "abcde" }
Before this change:
"".f STEXT nosplit size=44 args=0x18 locals=0x0
0x0000 00000 (x.go:3) MOVQ "".s+16(SP), AX
0x0005 00005 (x.go:3) CMPQ AX, $5
0x0009 00009 (x.go:3) JNE 40
0x000b 00011 (x.go:3) MOVQ "".s+8(SP), AX
0x0010 00016 (x.go:3) CMPL (AX), $1684234849
0x0016 00022 (x.go:3) JNE 36
0x0018 00024 (x.go:3) CMPB 4(AX), $101
0x001c 00028 (x.go:3) SETEQ AL
0x001f 00031 (x.go:3) MOVB AL, "".~r1+24(SP)
0x0023 00035 (x.go:3) RET
0x0024 00036 (x.go:3) XORL AX, AX
0x0026 00038 (x.go:3) JMP 31
0x0028 00040 (x.go:3) XORL AX, AX
0x002a 00042 (x.go:3) JMP 31
Observe the duplicated blocks at the end.
After this change:
"".f STEXT nosplit size=40 args=0x18 locals=0x0
0x0000 00000 (x.go:3) MOVQ "".s+16(SP), AX
0x0005 00005 (x.go:3) CMPQ AX, $5
0x0009 00009 (x.go:3) JNE 36
0x000b 00011 (x.go:3) MOVQ "".s+8(SP), AX
0x0010 00016 (x.go:3) CMPL (AX), $1684234849
0x0016 00022 (x.go:3) JNE 36
0x0018 00024 (x.go:3) CMPB 4(AX), $101
0x001c 00028 (x.go:3) SETEQ AL
0x001f 00031 (x.go:3) MOVB AL, "".~r1+24(SP)
0x0023 00035 (x.go:3) RET
0x0024 00036 (x.go:3) XORL AX, AX
0x0026 00038 (x.go:3) JMP 31
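The immediates in both listings can be checked by hand. After the length check (CMPQ AX, $5), the 5-byte constant is compared as a 4-byte word (CMPL against $1684234849) plus one byte (CMPB against $101). This sketch, with a hypothetical helper name and assuming a little-endian target such as amd64, decodes "abcde" the same way.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// stringCompareImmediates returns the word and byte immediates used to
// compare a 5-byte string constant on a little-endian target.
func stringCompareImmediates(s string) (word uint32, last byte) {
	return binary.LittleEndian.Uint32([]byte(s[:4])), s[4]
}

func main() {
	fmt.Println(stringCompareImmediates("abcde"))
}
```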
Change-Id: I12c81aa53b89456cb5809aa5396378245f3beda9
Reviewed-on: https://go-review.googlesource.com/c/go/+/172597
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
I forgot how to pull up the ssa debug options help, so instead of
writing -d=ssa/help, I just wrote -d=ssa/. Much to my amusement, the
compiler just crashed, as shown below. Fix that.
panic: runtime error: index out of range
goroutine 1 [running]:
cmd/compile/internal/ssa.PhaseOption(0x7ffc375d2b70, 0x0, 0xdbff91, 0x5, 0x1, 0x0, 0x0, 0x1, 0x1)
/home/mvdan/tip/src/cmd/compile/internal/ssa/compile.go:327 +0x1876
cmd/compile/internal/gc.Main(0xde7bd8)
/home/mvdan/tip/src/cmd/compile/internal/gc/main.go:411 +0x41d0
main.main()
/home/mvdan/tip/src/cmd/compile/main.go:51 +0xab
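The crash comes from indexing into the parts of the option string before checking that they exist. A defensive parse of the "ssa/<phase>/<flag>[=<value>]" shape can be sketched like this; the function is hypothetical and is not the compiler's PhaseOption.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// parseSSAOption checks each part before using it, so inputs such as
// "ssa/" produce an error instead of an index-out-of-range panic.
func parseSSAOption(s string) (phase, flag, value string, err error) {
	if !strings.HasPrefix(s, "ssa/") {
		return "", "", "", errors.New("not an ssa option")
	}
	parts := strings.SplitN(strings.TrimPrefix(s, "ssa/"), "/", 2)
	phase = parts[0]
	if phase == "" {
		return "", "", "", errors.New("missing phase name; try -d=ssa/help")
	}
	if phase == "help" {
		return phase, "", "", nil
	}
	if len(parts) < 2 || parts[1] == "" {
		return "", "", "", fmt.Errorf("missing flag for phase %q", phase)
	}
	flag = parts[1]
	if i := strings.IndexByte(flag, '='); i >= 0 {
		flag, value = flag[:i], flag[i+1:]
	}
	return phase, flag, value, nil
}

func main() {
	fmt.Println(parseSSAOption("ssa/"))
	fmt.Println(parseSSAOption("ssa/prove/debug=2"))
}
```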
Change-Id: Ia2ad394382ddf8f4498b16b5cfb49be0317fc1aa
Reviewed-on: https://go-review.googlesource.com/c/154421
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
A little bit of compiler stress testing. Randomize the order
of the values in a block before every phase. This randomization
makes sure that we're not implicitly depending on that order.
Currently the random seed is a hash of the function name.
It provides determinism, but sacrifices some coverage.
Other arrangements are possible (env var, ...) but require
more setup.
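Seeding the shuffle with a hash of the function name can be sketched as below: a given function always gets the same order, while different functions get different orders. FNV-1a and the string-slice stand-in for a block's values are assumptions for illustration.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math/rand"
)

// shuffleValues permutes a block's values deterministically per function.
func shuffleValues(fnName string, values []string) {
	h := fnv.New64a()
	h.Write([]byte(fnName))
	r := rand.New(rand.NewSource(int64(h.Sum64())))
	r.Shuffle(len(values), func(i, j int) {
		values[i], values[j] = values[j], values[i]
	})
}

func main() {
	vals := []string{"v1", "v2", "v3", "v4", "v5"}
	shuffleValues("main.f", vals)
	fmt.Println(vals)
}
```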
Fixes #20178
Change-Id: Idae792a23264bd9a3507db6ba49b6d591a608e83
Reviewed-on: https://go-review.googlesource.com/c/33909
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
This makes it easier to track names of function arguments
for debugging purposes.
Change-Id: Ic34856fe0b910005e1c7bc051d769d489a4b158e
Reviewed-on: https://go-review.googlesource.com/c/150098
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
The branchelim pass works better after fuse.
Running fuse before branchelim also increases
the stability of generated code amidst other compiler changes,
which was the original motivation behind this change.
The fuse pass is not cheap enough to run in its entirety
before branchelim, but the most important half of it is.
This change makes it possible to run "plain fuse" independently
and does so before branchelim.
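"Plain fuse" merges a block that ends in an unconditional jump with its successor when that successor has no other predecessor. A minimal sketch on a toy CFG (a block with exactly one successor stands in for a plain block; phis and control values are ignored), repeated until nothing changes:

```go
package main

import "fmt"

// Block is a hypothetical CFG node.
type Block struct {
	ID     int
	Values []string
	Succs  []*Block
	Preds  []*Block
}

// fusePlain merges b's single successor c into b when c has no other
// predecessor. It reports whether a merge happened.
func fusePlain(b *Block) bool {
	if len(b.Succs) != 1 {
		return false
	}
	c := b.Succs[0]
	if c == b || len(c.Preds) != 1 {
		return false
	}
	b.Values = append(b.Values, c.Values...)
	b.Succs = c.Succs
	for _, s := range c.Succs {
		for i, p := range s.Preds {
			if p == c {
				s.Preds[i] = b
			}
		}
	}
	c.Values, c.Succs, c.Preds = nil, nil, nil
	return true
}

func main() {
	a := &Block{ID: 1, Values: []string{"x = 1"}}
	b := &Block{ID: 2, Values: []string{"y = x + 1"}}
	c := &Block{ID: 3, Values: []string{"ret y"}}
	a.Succs, b.Preds = []*Block{b}, []*Block{a}
	b.Succs, c.Preds = []*Block{c}, []*Block{b}
	for fusePlain(a) {
		// repeat until no more merges are possible
	}
	fmt.Println(a.Values)
}
```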
During make.bash, elimIf occurrences increase from 4244 to 4288 (1%),
and elimIfElse occurrences increase from 989 to 1079 (9%).
Toolspeed impact is marginal; plain fuse pays for itself.
name old time/op new time/op delta
Template 189ms ± 2% 189ms ± 2% ~ (p=0.890 n=45+46)
Unicode 93.2ms ± 5% 93.4ms ± 7% ~ (p=0.790 n=48+48)
GoTypes 662ms ± 4% 660ms ± 4% ~ (p=0.186 n=48+49)
Compiler 2.89s ± 4% 2.91s ± 3% +0.89% (p=0.050 n=49+44)
SSA 8.23s ± 2% 8.21s ± 1% ~ (p=0.165 n=46+44)
Flate 123ms ± 4% 123ms ± 3% +0.58% (p=0.031 n=47+49)
GoParser 154ms ± 4% 154ms ± 4% ~ (p=0.492 n=49+48)
Reflect 430ms ± 4% 429ms ± 4% ~ (p=1.000 n=48+48)
Tar 171ms ± 3% 170ms ± 4% ~ (p=0.122 n=48+48)
XML 232ms ± 3% 232ms ± 2% ~ (p=0.850 n=46+49)
[Geo mean] 394ms 394ms +0.02%
name old user-time/op new user-time/op delta
Template 236ms ± 5% 236ms ± 4% ~ (p=0.934 n=50+50)
Unicode 132ms ± 7% 130ms ± 9% ~ (p=0.087 n=50+50)
GoTypes 861ms ± 3% 867ms ± 4% ~ (p=0.124 n=48+50)
Compiler 3.93s ± 4% 3.94s ± 3% ~ (p=0.584 n=49+44)
SSA 12.2s ± 2% 12.3s ± 1% ~ (p=0.610 n=46+45)
Flate 149ms ± 4% 150ms ± 4% ~ (p=0.194 n=48+49)
GoParser 193ms ± 5% 191ms ± 6% ~ (p=0.239 n=49+50)
Reflect 553ms ± 5% 556ms ± 5% ~ (p=0.091 n=49+49)
Tar 218ms ± 5% 218ms ± 5% ~ (p=0.359 n=49+50)
XML 299ms ± 5% 298ms ± 4% ~ (p=0.482 n=50+49)
[Geo mean] 516ms 516ms -0.01%
name old alloc/op new alloc/op delta
Template 36.3MB ± 0% 36.3MB ± 0% -0.02% (p=0.000 n=49+49)
Unicode 29.7MB ± 0% 29.7MB ± 0% ~ (p=0.270 n=50+50)
GoTypes 126MB ± 0% 126MB ± 0% -0.34% (p=0.000 n=50+49)
Compiler 534MB ± 0% 531MB ± 0% -0.50% (p=0.000 n=50+50)
SSA 1.98GB ± 0% 1.98GB ± 0% -0.06% (p=0.000 n=49+49)
Flate 24.6MB ± 0% 24.6MB ± 0% -0.29% (p=0.000 n=50+50)
GoParser 29.5MB ± 0% 29.4MB ± 0% -0.15% (p=0.000 n=49+50)
Reflect 87.3MB ± 0% 87.2MB ± 0% -0.13% (p=0.000 n=49+50)
Tar 35.6MB ± 0% 35.5MB ± 0% -0.17% (p=0.000 n=50+50)
XML 48.2MB ± 0% 48.0MB ± 0% -0.30% (p=0.000 n=48+50)
[Geo mean] 83.1MB 82.9MB -0.20%
name old allocs/op new allocs/op delta
Template 352k ± 0% 352k ± 0% -0.01% (p=0.004 n=49+49)
Unicode 341k ± 0% 341k ± 0% ~ (p=0.341 n=48+50)
GoTypes 1.28M ± 0% 1.28M ± 0% -0.03% (p=0.000 n=50+49)
Compiler 4.96M ± 0% 4.96M ± 0% -0.05% (p=0.000 n=50+49)
SSA 15.5M ± 0% 15.5M ± 0% -0.01% (p=0.000 n=50+49)
Flate 233k ± 0% 233k ± 0% +0.01% (p=0.032 n=49+49)
GoParser 294k ± 0% 294k ± 0% ~ (p=0.052 n=46+48)
Reflect 1.04M ± 0% 1.04M ± 0% ~ (p=0.171 n=50+47)
Tar 343k ± 0% 343k ± 0% -0.03% (p=0.000 n=50+50)
XML 429k ± 0% 429k ± 0% -0.04% (p=0.000 n=50+50)
[Geo mean] 812k 812k -0.02%
Object files grow slightly; branchelim often increases binary size, at least on amd64.
name old object-bytes new object-bytes delta
Template 509kB ± 0% 509kB ± 0% -0.01% (p=0.008 n=5+5)
Unicode 224kB ± 0% 224kB ± 0% ~ (all equal)
GoTypes 1.84MB ± 0% 1.84MB ± 0% +0.00% (p=0.008 n=5+5)
Compiler 6.71MB ± 0% 6.71MB ± 0% +0.01% (p=0.008 n=5+5)
SSA 21.2MB ± 0% 21.2MB ± 0% +0.01% (p=0.008 n=5+5)
Flate 324kB ± 0% 324kB ± 0% -0.00% (p=0.008 n=5+5)
GoParser 404kB ± 0% 404kB ± 0% -0.02% (p=0.008 n=5+5)
Reflect 1.40MB ± 0% 1.40MB ± 0% +0.09% (p=0.008 n=5+5)
Tar 452kB ± 0% 452kB ± 0% +0.06% (p=0.008 n=5+5)
XML 596kB ± 0% 596kB ± 0% +0.00% (p=0.008 n=5+5)
[Geo mean] 1.04MB 1.04MB +0.01%
Change-Id: I535c711b85380ff657fc0f022bebd9cb14ddd07f
Reviewed-on: https://go-review.googlesource.com/c/129378
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Since we print almost everything to ssa.html in the GOSSAFUNC mode,
there is a need to stop spamming stdout when the user just wants to see
ssa.html.
This change cleans up the output of the GOSSAFUNC debug mode.
To enable the dump of the debug data to stdout, one must
append the suffix + to the function name, like this:
GOSSAFUNC=Foo+
Otherwise gc will not print the IR and ASM to stdout after each phase.
AST IR is still sent to stdout because it is not included
in ssa.html. It will be fixed in a separate change.
The change also prints the full path to the ssa.html file.
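Interpreting the trailing "+" amounts to a suffix check on the GOSSAFUNC value. A sketch with a hypothetical helper name:

```go
package main

import (
	"fmt"
	"strings"
)

// parseGOSSAFUNC splits a GOSSAFUNC value into the function name and
// whether per-phase output should also go to stdout (trailing "+").
func parseGOSSAFUNC(v string) (name string, toStdout bool) {
	if strings.HasSuffix(v, "+") {
		return strings.TrimSuffix(v, "+"), true
	}
	return v, false
}

func main() {
	fmt.Println(parseGOSSAFUNC("Foo+"))
	fmt.Println(parseGOSSAFUNC("Foo"))
}
```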
Updates #25942
Change-Id: I711e145e05f0443c7df5459ca528dced273a62ee
Reviewed-on: https://go-review.googlesource.com/126603
Run-TryBot: Yury Smolsky <yury@smolsky.by>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Late opt pass may generate dead stores, which messes up store
chain calculation in later passes. Run generic deadcode even
in -N mode to remove them.
Fixes #26163.
Change-Id: I8276101717bb978d5980e6c7998f53fd8d0ae10f
Reviewed-on: https://go-review.googlesource.com/121856
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Display only a few columns in ssa.html; the other
columns can be expanded by clicking on a collapsed column.
Use a sans-serif font for the text, and a slightly smaller font
size for non-program text.
Fixes #25286
Change-Id: I1094695135401602d90b97b69e42f6dda05871a2
Reviewed-on: https://go-review.googlesource.com/117275
Run-TryBot: Yury Smolsky <yury@smolsky.by>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
A new pass run after ssa building (before any other
optimization) identifies the "first" ssa node for each
statement. Other "noise" nodes are tagged as being never
appropriate for a statement boundary (e.g., VarKill, VarDef,
Phi).
Rewrite, deadcode, cse, and nilcheck are modified to move
the statement boundaries forward whenever possible if a
boundary-tagged ssa value is removed; never-boundary nodes
are ignored in this search (some operations involving
constants are also tagged as never-boundary and also ignored
because they are likely to be moved or removed during
optimization).
Code generation treats all nodes except those explicitly
marked as statement boundaries as "not statement" nodes,
and floats statement boundaries to the beginning of each
same-line run of instructions found within a basic block.
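The floating step can be sketched on a simplified instruction list. The `Inst` type and `floatStmtMarks` are hypothetical and model a single basic block. They are not the compiler's data structures. If any instruction in a run of consecutive same-line instructions carries the mark, the sketch moves that mark to the first instruction of the run.

```go
package main

import "fmt"

// Inst is a hypothetical stand-in for an instruction: its source line and
// whether it carries a statement-boundary (is_stmt) mark.
type Inst struct {
	Line   int
	IsStmt bool
}

// floatStmtMarks moves any statement mark within a run of consecutive
// same-line instructions to the first instruction of that run and clears
// the mark on the others. The block is modified in place.
func floatStmtMarks(block []Inst) {
	for start := 0; start < len(block); {
		end := start
		marked := false
		for end < len(block) && block[end].Line == block[start].Line {
			marked = marked || block[end].IsStmt
			block[end].IsStmt = false
			end++
		}
		block[start].IsStmt = marked
		start = end
	}
}

func main() {
	blk := []Inst{{10, false}, {10, true}, {11, false}, {12, false}, {12, true}}
	floatStmtMarks(blk)
	fmt.Println(blk)
}
```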
The line number HTML conversion was modified to make statement
boundary nodes a bit more obvious by prepending a "+".
The code in fuse.go that glued together the value slices
of two blocks produced a result that depended on the
former capacities (not lengths) of the two slices. This
caused differences in the 386 bootstrap, and could also
sometimes put values into an order that did a worse job
of preserving statement boundaries when values are removed.
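The difference between a capacity-dependent merge and a fixed-order merge can be illustrated with plain string slices standing in for value slices. Both functions are hypothetical. `capacityDependent` reconstructs the kind of bug described above. It is not the original fuse.go code.

```go
package main

import "fmt"

// capacityDependent appends into whichever slice has more spare capacity,
// so the merged order can flip between [a..., b...] and [b..., a...].
// Hypothetical illustration of the problem, not the original fuse.go code.
func capacityDependent(a, b []string) []string {
	if cap(b)-len(b) > cap(a)-len(a) {
		return append(b, a...) // b's values end up first
	}
	return append(a, b...)
}

// fixedOrder always returns a's values followed by b's values, whatever
// the capacities of the inputs.
func fixedOrder(a, b []string) []string {
	out := make([]string, 0, len(a)+len(b))
	out = append(out, a...)
	return append(out, b...)
}

func main() {
	a := []string{"v1"}
	b := make([]string, 1, 8) // the second block happens to have spare capacity
	b[0] = "v2"
	fmt.Println(capacityDependent(a, b), fixedOrder(a, b))
}
```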
Portions of two delve tests that had caught problems were
incorporated into ssa/debug_test.go. There are some
opportunities to do better with optimized code, but the
next-ing is not lying or overly jumpy.
Over 4 CLs, compilebench geomean measured binary size
increase of 3.5% and compile user time increase of 3.8%
(this is after optimization to reuse a sparse map instead
of creating multiple maps.)
This CL worsens the optimized-debugging experience with
Delve; we need to work with the delve team so that
they can use the is_stmt marks that we're emitting now.
The reference output changes from time to time depending
on other changes in the compiler, sometimes better,
sometimes worse.
This CL now includes a test ensuring that 99+% of the lines
in the Go command itself (a handy optimized binary) include
is_stmt markers.
Change-Id: I359c94e06843f1eb41f9da437bd614885aa9644a
Reviewed-on: https://go-review.googlesource.com/102435
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
|
|
Propagate values through some wide Zero/Move operations. Among
other things this allows us to optimize some kinds of array
initialization. For example, the following code no longer
requires a temporary to be allocated on the stack. Instead, it
writes the values directly into the return value.
	func f(i uint32) [4]uint32 {
		return [4]uint32{i, i+1, i+2, i+3}
	}
The return value is unnecessarily cleared but removing that is
probably a task for dead store analysis (I think it needs to
be able to match multiple Store ops to wide Zero ops).
In order to reliably remove stack variables that are rendered
unnecessary by these new rules I've added a new generic version
of the unread autos elimination pass.
These rules are triggered more than 5000 times when building and
testing the standard library.
Updates #15925 (fixes for arrays of up to 4 elements).
Updates #24386 (fixes for up to 4 kept elements).
Updates #24416.
compilebench results:
name old time/op new time/op delta
Template 353ms ± 5% 359ms ± 3% ~ (p=0.143 n=10+10)
Unicode 219ms ± 1% 217ms ± 4% ~ (p=0.740 n=7+10)
GoTypes 1.26s ± 1% 1.26s ± 2% ~ (p=0.549 n=9+10)
Compiler 6.00s ± 1% 6.08s ± 1% +1.42% (p=0.000 n=9+8)
SSA 15.3s ± 2% 15.6s ± 1% +2.43% (p=0.000 n=10+10)
Flate 237ms ± 2% 240ms ± 2% +1.31% (p=0.015 n=10+10)
GoParser 285ms ± 1% 285ms ± 1% ~ (p=0.878 n=8+8)
Reflect 797ms ± 3% 807ms ± 2% ~ (p=0.065 n=9+10)
Tar 334ms ± 0% 335ms ± 4% ~ (p=0.460 n=8+10)
XML 419ms ± 0% 423ms ± 1% +0.91% (p=0.001 n=7+9)
StdCmd 46.0s ± 0% 46.4s ± 0% +0.85% (p=0.000 n=9+9)
name old user-time/op new user-time/op delta
Template 337ms ± 3% 346ms ± 5% ~ (p=0.053 n=9+10)
Unicode 205ms ±10% 205ms ± 8% ~ (p=1.000 n=10+10)
GoTypes 1.22s ± 2% 1.21s ± 3% ~ (p=0.436 n=10+10)
Compiler 5.85s ± 1% 5.93s ± 0% +1.46% (p=0.000 n=10+8)
SSA 14.9s ± 1% 15.3s ± 1% +2.62% (p=0.000 n=10+10)
Flate 229ms ± 4% 228ms ± 6% ~ (p=0.796 n=10+10)
GoParser 271ms ± 3% 275ms ± 4% ~ (p=0.165 n=10+10)
Reflect 779ms ± 5% 775ms ± 2% ~ (p=0.971 n=10+10)
Tar 317ms ± 4% 319ms ± 5% ~ (p=0.853 n=10+10)
XML 404ms ± 4% 409ms ± 5% ~ (p=0.436 n=10+10)
name old alloc/op new alloc/op delta
Template 34.9MB ± 0% 35.0MB ± 0% +0.26% (p=0.000 n=10+10)
Unicode 29.3MB ± 0% 29.3MB ± 0% +0.02% (p=0.000 n=10+10)
GoTypes 115MB ± 0% 115MB ± 0% +0.30% (p=0.000 n=10+10)
Compiler 519MB ± 0% 521MB ± 0% +0.30% (p=0.000 n=10+10)
SSA 1.55GB ± 0% 1.57GB ± 0% +1.34% (p=0.000 n=10+9)
Flate 24.1MB ± 0% 24.2MB ± 0% +0.10% (p=0.000 n=10+10)
GoParser 28.1MB ± 0% 28.1MB ± 0% +0.07% (p=0.000 n=10+10)
Reflect 78.7MB ± 0% 78.7MB ± 0% +0.03% (p=0.000 n=8+10)
Tar 34.4MB ± 0% 34.5MB ± 0% +0.12% (p=0.000 n=10+10)
XML 43.2MB ± 0% 43.2MB ± 0% +0.13% (p=0.000 n=10+10)
name old allocs/op new allocs/op delta
Template 330k ± 0% 330k ± 0% -0.01% (p=0.017 n=10+10)
Unicode 337k ± 0% 337k ± 0% +0.01% (p=0.000 n=9+10)
GoTypes 1.15M ± 0% 1.15M ± 0% +0.03% (p=0.000 n=10+10)
Compiler 4.77M ± 0% 4.77M ± 0% +0.03% (p=0.000 n=9+10)
SSA 12.5M ± 0% 12.6M ± 0% +1.16% (p=0.000 n=10+10)
Flate 221k ± 0% 221k ± 0% +0.05% (p=0.000 n=9+10)
GoParser 275k ± 0% 275k ± 0% +0.01% (p=0.014 n=10+9)
Reflect 944k ± 0% 944k ± 0% -0.02% (p=0.000 n=10+10)
Tar 324k ± 0% 323k ± 0% -0.12% (p=0.000 n=10+10)
XML 384k ± 0% 384k ± 0% -0.01% (p=0.001 n=10+10)
name old object-bytes new object-bytes delta
Template 476kB ± 0% 476kB ± 0% -0.04% (p=0.000 n=10+10)
Unicode 218kB ± 0% 218kB ± 0% ~ (all equal)
GoTypes 1.58MB ± 0% 1.58MB ± 0% -0.04% (p=0.000 n=10+10)
Compiler 6.25MB ± 0% 6.24MB ± 0% -0.09% (p=0.000 n=10+10)
SSA 15.9MB ± 0% 16.1MB ± 0% +1.22% (p=0.000 n=10+10)
Flate 304kB ± 0% 304kB ± 0% -0.13% (p=0.000 n=10+10)
GoParser 370kB ± 0% 370kB ± 0% -0.00% (p=0.000 n=10+10)
Reflect 1.27MB ± 0% 1.27MB ± 0% -0.12% (p=0.000 n=10+10)
Tar 421kB ± 0% 419kB ± 0% -0.64% (p=0.000 n=10+10)
XML 518kB ± 0% 517kB ± 0% -0.12% (p=0.000 n=10+10)
name old export-bytes new export-bytes delta
Template 16.7kB ± 0% 16.7kB ± 0% ~ (all equal)
Unicode 6.52kB ± 0% 6.52kB ± 0% ~ (all equal)
GoTypes 29.2kB ± 0% 29.2kB ± 0% ~ (all equal)
Compiler 88.0kB ± 0% 88.0kB ± 0% ~ (all equal)
SSA 109kB ± 0% 109kB ± 0% ~ (all equal)
Flate 4.49kB ± 0% 4.49kB ± 0% ~ (all equal)
GoParser 8.10kB ± 0% 8.10kB ± 0% ~ (all equal)
Reflect 7.71kB ± 0% 7.71kB ± 0% ~ (all equal)
Tar 9.15kB ± 0% 9.15kB ± 0% ~ (all equal)
XML 12.3kB ± 0% 12.3kB ± 0% ~ (all equal)
name old text-bytes new text-bytes delta
HelloSize 676kB ± 0% 672kB ± 0% -0.59% (p=0.000 n=10+10)
CmdGoSize 7.26MB ± 0% 7.24MB ± 0% -0.18% (p=0.000 n=10+10)
name old data-bytes new data-bytes delta
HelloSize 10.2kB ± 0% 10.2kB ± 0% ~ (all equal)
CmdGoSize 248kB ± 0% 248kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 125kB ± 0% 125kB ± 0% ~ (all equal)
CmdGoSize 145kB ± 0% 145kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.46MB ± 0% 1.45MB ± 0% -0.31% (p=0.000 n=10+10)
CmdGoSize 14.7MB ± 0% 14.7MB ± 0% -0.17% (p=0.000 n=10+10)
Change-Id: Ic72b0c189dd542f391e1c9ab88a76e9148dc4285
Reviewed-on: https://go-review.googlesource.com/106495
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Change the help doc of
go tool compile -d=ssa/help
from this:
compile: GcFlag -d=ssa/<phase>/<flag>[=<value>|<function_name>]
<phase> is one of:
check, all, build, intrinsics, early_phielim, early_copyelim
early_deadcode, short_circuit, decompose_user, opt, zero_arg_cse
opt_deadcode, generic_cse, phiopt, nilcheckelim, prove, loopbce
decompose_builtin, softfloat, late_opt, generic_deadcode, check_bce
fuse, dse, writebarrier, insert_resched_checks, tighten, lower
lowered_cse, elim_unread_autos, lowered_deadcode, checkLower
late_phielim, late_copyelim, phi_tighten, late_deadcode, critical
likelyadjust, layout, schedule, late_nilcheck, flagalloc, regalloc
loop_rotate, stackframe, trim
<flag> is one of on, off, debug, mem, time, test, stats, dump
<value> defaults to 1
<function_name> is required for "dump", specifies name of function to dump after <phase>
Except for dump, output is directed to standard out; dump appears in a file.
Phase "all" supports flags "time", "mem", and "dump".
Phases "intrinsics" supports flags "on", "off", and "debug".
Interpretation of the "debug" value depends on the phase.
Dump files are named <phase>__<function_name>_<seq>.dump.
To this:
compile: PhaseOptions usage:
go tool compile -d=ssa/<phase>/<flag>[=<value>|<function_name>]
where:
- <phase> is one of:
check, all, build, intrinsics, early_phielim, early_copyelim
early_deadcode, short_circuit, decompose_user, opt, zero_arg_cse
opt_deadcode, generic_cse, phiopt, nilcheckelim, prove
decompose_builtin, softfloat, late_opt, generic_deadcode, check_bce
branchelim, fuse, dse, writebarrier, insert_resched_checks, lower
lowered_cse, elim_unread_autos, lowered_deadcode, checkLower
late_phielim, late_copyelim, tighten, phi_tighten, late_deadcode
critical, likelyadjust, layout, schedule, late_nilcheck, flagalloc
regalloc, loop_rotate, stackframe, trim
- <flag> is one of:
on, off, debug, mem, time, test, stats, dump
- <value> defaults to 1
- <function_name> is required for the "dump" flag, and specifies the
name of function to dump after <phase>
Phase "all" supports flags "time", "mem", and "dump".
Phase "intrinsics" supports flags "on", "off", and "debug".
If the "dump" flag is specified, the output is written on a file named
<phase>__<function_name>_<seq>.dump; otherwise it is directed to stdout.
Also add a few examples at the bottom.
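The option syntax from the help text, ssa/<phase>/<flag>[=<value>|<function_name>], can be parsed roughly as follows. `PhaseOption` and `parsePhaseOption` are hypothetical names. The compiler's real parser may accept more forms or validate phase names differently.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// PhaseOption is a hypothetical parsed form of
// -d=ssa/<phase>/<flag>[=<value>|<function_name>].
type PhaseOption struct {
	Phase, Flag string
	Value       int    // defaults to 1 when no value is given
	FuncName    string // required (and only set) for the "dump" flag
}

func parsePhaseOption(s string) (PhaseOption, error) {
	if !strings.HasPrefix(s, "ssa/") {
		return PhaseOption{}, fmt.Errorf("missing ssa/ prefix: %q", s)
	}
	rest := strings.TrimPrefix(s, "ssa/")
	val := ""
	if i := strings.Index(rest, "="); i >= 0 {
		rest, val = rest[:i], rest[i+1:]
	}
	parts := strings.Split(rest, "/")
	if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
		return PhaseOption{}, fmt.Errorf("want ssa/<phase>/<flag>, got %q", s)
	}
	opt := PhaseOption{Phase: parts[0], Flag: parts[1], Value: 1}
	switch {
	case opt.Flag == "dump":
		if val == "" {
			return PhaseOption{}, fmt.Errorf("dump requires a function name")
		}
		opt.FuncName = val
	case val != "":
		n, err := strconv.Atoi(val)
		if err != nil {
			return PhaseOption{}, fmt.Errorf("bad value %q: %v", val, err)
		}
		opt.Value = n
	}
	return opt, nil
}

func main() {
	for _, s := range []string{"ssa/check/on", "ssa/prove/debug=2", "ssa/lower/dump=Foo"} {
		opt, err := parsePhaseOption(s)
		fmt.Printf("%s -> %+v %v\n", s, opt, err)
	}
}
```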
Fixes #20349
Change-Id: I334799e951e7b27855b3ace5d2d966c4d6ec4cff
Reviewed-on: https://go-review.googlesource.com/110062
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
|
|
prove is now able to do what loopbce used to do.
Passes toolstash -cmp.
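Here is a typical loop whose bounds check a pass like prove should be able to remove. The loop condition alone establishes 0 <= i < len(s). The function is illustrative only. Whether the check is actually dropped depends on the compiler version. Building with -gcflags=-d=ssa/check_bce/debug=1 should report any bounds checks that remain.

```go
package main

import "fmt"

// sum indexes s[i] only for 0 <= i < len(s), a range that can be derived
// from the loop condition, so s[i] should not need a runtime bounds check.
func sum(s []int) int {
	total := 0
	for i := 0; i < len(s); i++ {
		total += s[i]
	}
	return total
}

func main() {
	fmt.Println(sum([]int{1, 2, 3}))
}
```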
Compilebench of the whole series (master 9967582f770f6):
name old time/op new time/op delta
Template 208ms ±18% 198ms ± 4% ~ (p=0.690 n=5+5)
Unicode 99.1ms ±19% 96.5ms ± 4% ~ (p=0.548 n=5+5)
GoTypes 623ms ± 1% 633ms ± 1% ~ (p=0.056 n=5+5)
Compiler 2.94s ± 2% 3.02s ± 4% ~ (p=0.095 n=5+5)
SSA 6.77s ± 1% 7.11s ± 2% +4.94% (p=0.008 n=5+5)
Flate 129ms ± 1% 136ms ± 0% +4.87% (p=0.016 n=5+4)
GoParser 152ms ± 3% 156ms ± 1% ~ (p=0.095 n=5+5)
Reflect 380ms ± 2% 392ms ± 1% +3.30% (p=0.008 n=5+5)
Tar 185ms ± 6% 184ms ± 2% ~ (p=0.690 n=5+5)
XML 223ms ± 2% 228ms ± 3% ~ (p=0.095 n=5+5)
StdCmd 26.8s ± 2% 28.0s ± 5% +4.46% (p=0.032 n=5+5)
name old user-ns/op new user-ns/op delta
Template 252M ± 5% 248M ± 3% ~ (p=1.000 n=5+5)
Unicode 118M ± 7% 121M ± 4% ~ (p=0.548 n=5+5)
GoTypes 790M ± 2% 793M ± 2% ~ (p=0.690 n=5+5)
Compiler 3.78G ± 3% 3.91G ± 4% ~ (p=0.056 n=5+5)
SSA 8.98G ± 2% 9.52G ± 3% +6.08% (p=0.008 n=5+5)
Flate 155M ± 1% 160M ± 0% +3.47% (p=0.016 n=5+4)
GoParser 185M ± 4% 187M ± 2% ~ (p=0.310 n=5+5)
Reflect 469M ± 1% 481M ± 1% +2.52% (p=0.016 n=5+5)
Tar 222M ± 4% 222M ± 2% ~ (p=0.841 n=5+5)
XML 269M ± 1% 274M ± 2% +1.88% (p=0.032 n=5+5)
name old text-bytes new text-bytes delta
HelloSize 664k ± 0% 664k ± 0% ~ (all equal)
CmdGoSize 7.23M ± 0% 7.22M ± 0% -0.06% (p=0.008 n=5+5)
name old data-bytes new data-bytes delta
HelloSize 134k ± 0% 134k ± 0% ~ (all equal)
CmdGoSize 390k ± 0% 390k ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.39M ± 0% 1.39M ± 0% ~ (all equal)
CmdGoSize 14.4M ± 0% 14.4M ± 0% -0.06% (p=0.008 n=5+5)
Go1 benchmarks of the whole series:
name old time/op new time/op delta
BinaryTree17-16 5.40s ± 6% 5.38s ± 4% ~ (p=1.000 n=12+10)
Fannkuch11-16 4.04s ± 3% 3.81s ± 3% -5.70% (p=0.000 n=11+11)
FmtFprintfEmpty-16 60.7ns ± 2% 60.2ns ± 3% ~ (p=0.136 n=11+10)
FmtFprintfString-16 115ns ± 2% 114ns ± 4% ~ (p=0.175 n=11+10)
FmtFprintfInt-16 118ns ± 2% 125ns ± 2% +5.76% (p=0.000 n=11+10)
FmtFprintfIntInt-16 196ns ± 2% 204ns ± 3% +4.42% (p=0.000 n=10+11)
FmtFprintfPrefixedInt-16 207ns ± 2% 214ns ± 2% +3.23% (p=0.000 n=10+11)
FmtFprintfFloat-16 364ns ± 3% 357ns ± 2% -1.88% (p=0.002 n=11+11)
FmtManyArgs-16 773ns ± 2% 775ns ± 1% ~ (p=0.457 n=11+10)
GobDecode-16 11.2ms ± 4% 11.0ms ± 3% -1.51% (p=0.022 n=10+9)
GobEncode-16 9.91ms ± 6% 9.81ms ± 5% ~ (p=0.699 n=11+11)
Gzip-16 339ms ± 1% 338ms ± 1% ~ (p=0.438 n=11+11)
Gunzip-16 64.4ms ± 1% 65.2ms ± 1% +1.28% (p=0.001 n=10+11)
HTTPClientServer-16 157µs ± 7% 160µs ± 5% ~ (p=0.133 n=11+11)
JSONEncode-16 22.3ms ± 4% 23.2ms ± 4% +3.79% (p=0.000 n=11+11)
JSONDecode-16 96.7ms ± 3% 96.6ms ± 1% ~ (p=0.562 n=11+11)
Mandelbrot200-16 6.42ms ± 1% 6.40ms ± 1% ~ (p=0.365 n=11+11)
GoParse-16 5.59ms ± 7% 5.42ms ± 5% -3.07% (p=0.020 n=11+10)
RegexpMatchEasy0_32-16 113ns ± 2% 113ns ± 3% ~ (p=0.968 n=11+10)
RegexpMatchEasy0_1K-16 417ns ± 1% 416ns ± 3% ~ (p=0.742 n=11+10)
RegexpMatchEasy1_32-16 106ns ± 1% 107ns ± 3% ~ (p=0.223 n=11+11)
RegexpMatchEasy1_1K-16 654ns ± 2% 657ns ± 1% ~ (p=0.672 n=11+8)
RegexpMatchMedium_32-16 176ns ± 3% 177ns ± 1% ~ (p=0.664 n=11+9)
RegexpMatchMedium_1K-16 56.3µs ± 3% 56.7µs ± 3% ~ (p=0.171 n=11+11)
RegexpMatchHard_32-16 2.83µs ± 5% 2.83µs ± 4% ~ (p=0.735 n=11+11)
RegexpMatchHard_1K-16 82.7µs ± 2% 82.7µs ± 2% ~ (p=0.853 n=10+10)
Revcomp-16 679ms ± 9% 782ms ±29% +15.16% (p=0.031 n=9+11)
Template-16 118ms ± 1% 109ms ± 2% -7.49% (p=0.000 n=11+11)
TimeParse-16 474ns ± 1% 462ns ± 1% -2.59% (p=0.000 n=11+11)
TimeFormat-16 482ns ± 1% 494ns ± 1% +2.49% (p=0.000 n=10+11)
name old speed new speed delta
GobDecode-16 68.7MB/s ± 4% 69.8MB/s ± 3% +1.52% (p=0.022 n=10+9)
GobEncode-16 77.6MB/s ± 6% 78.3MB/s ± 5% ~ (p=0.699 n=11+11)
Gzip-16 57.2MB/s ± 1% 57.3MB/s ± 1% ~ (p=0.428 n=11+11)
Gunzip-16 301MB/s ± 2% 298MB/s ± 1% -1.07% (p=0.007 n=11+11)
JSONEncode-16 86.9MB/s ± 4% 83.7MB/s ± 4% -3.63% (p=0.000 n=11+11)
JSONDecode-16 20.1MB/s ± 3% 20.1MB/s ± 1% ~ (p=0.529 n=11+11)
GoParse-16 10.4MB/s ± 6% 10.7MB/s ± 4% +3.12% (p=0.020 n=11+10)
RegexpMatchEasy0_32-16 282MB/s ± 2% 282MB/s ± 3% ~ (p=0.756 n=11+10)
RegexpMatchEasy0_1K-16 2.45GB/s ± 1% 2.46GB/s ± 2% ~ (p=0.705 n=11+10)
RegexpMatchEasy1_32-16 299MB/s ± 1% 297MB/s ± 2% ~ (p=0.151 n=11+11)
RegexpMatchEasy1_1K-16 1.56GB/s ± 2% 1.56GB/s ± 1% ~ (p=0.717 n=11+8)
RegexpMatchMedium_32-16 5.67MB/s ± 4% 5.63MB/s ± 1% ~ (p=0.538 n=11+9)
RegexpMatchMedium_1K-16 18.2MB/s ± 3% 18.1MB/s ± 3% ~ (p=0.156 n=11+11)
RegexpMatchHard_32-16 11.3MB/s ± 5% 11.3MB/s ± 4% ~ (p=0.711 n=11+11)
RegexpMatchHard_1K-16 12.4MB/s ± 1% 12.4MB/s ± 2% ~ (p=0.535 n=9+10)
Revcomp-16 370MB/s ± 5% 332MB/s ±24% ~ (p=0.062 n=8+11)
Template-16 16.5MB/s ± 1% 17.8MB/s ± 2% +8.11% (p=0.000 n=11+11)
Change-Id: I41e46f375ee127785c6491f7ef5bd35581261ae6
Reviewed-on: https://go-review.googlesource.com/104039
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
|
|
Moving tighten after lowering benefits from the removal of values by
lowering and lowered CSE. It lets us make better decisions about
which values are rematerializable and which generate flags.
Empirically, it lowers stack usage (by avoiding spills)
and generates slightly smaller and faster binaries.
Fixes #19853
Fixes #21041
name old time/op new time/op delta
Template 195ms ± 4% 193ms ± 4% -1.33% (p=0.000 n=92+97)
Unicode 94.1ms ± 9% 92.5ms ± 8% -1.66% (p=0.002 n=97+95)
GoTypes 572ms ± 5% 566ms ± 7% -0.92% (p=0.001 n=95+98)
Compiler 2.56s ± 4% 2.52s ± 3% -1.41% (p=0.000 n=94+97)
SSA 6.52s ± 2% 6.47s ± 3% -0.82% (p=0.000 n=96+94)
Flate 117ms ± 5% 116ms ± 7% -0.72% (p=0.018 n=97+97)
GoParser 148ms ± 6% 146ms ± 4% -0.97% (p=0.002 n=98+95)
Reflect 370ms ± 7% 363ms ± 6% -1.79% (p=0.000 n=99+98)
Tar 175ms ± 6% 173ms ± 6% -1.11% (p=0.001 n=94+95)
XML 204ms ± 6% 201ms ± 5% -1.49% (p=0.000 n=97+96)
[Geo mean] 363ms 359ms -1.22%
name old user-time/op new user-time/op delta
Template 251ms ± 5% 245ms ± 5% -2.40% (p=0.000 n=97+93)
Unicode 131ms ±10% 128ms ± 9% -1.93% (p=0.001 n=100+99)
GoTypes 760ms ± 4% 752ms ± 4% -0.96% (p=0.000 n=97+95)
Compiler 3.51s ± 3% 3.48s ± 2% -1.04% (p=0.000 n=96+95)
SSA 9.57s ± 4% 9.52s ± 2% -0.50% (p=0.004 n=97+96)
Flate 149ms ± 6% 147ms ± 6% -1.46% (p=0.000 n=98+96)
GoParser 184ms ± 5% 181ms ± 7% -1.84% (p=0.000 n=98+97)
Reflect 469ms ± 6% 461ms ± 6% -1.69% (p=0.000 n=100+98)
Tar 219ms ± 8% 217ms ± 7% -0.90% (p=0.035 n=96+96)
XML 255ms ± 5% 251ms ± 6% -1.48% (p=0.000 n=98+98)
[Geo mean] 476ms 469ms -1.42%
name old alloc/op new alloc/op delta
Template 37.8MB ± 0% 37.8MB ± 0% -0.17% (p=0.000 n=100+100)
Unicode 28.8MB ± 0% 28.8MB ± 0% -0.02% (p=0.000 n=100+95)
GoTypes 112MB ± 0% 112MB ± 0% -0.20% (p=0.000 n=100+97)
Compiler 466MB ± 0% 464MB ± 0% -0.27% (p=0.000 n=100+100)
SSA 1.49GB ± 0% 1.49GB ± 0% -0.08% (p=0.000 n=100+99)
Flate 24.4MB ± 0% 24.3MB ± 0% -0.25% (p=0.000 n=98+99)
GoParser 30.7MB ± 0% 30.6MB ± 0% -0.26% (p=0.000 n=99+100)
Reflect 76.4MB ± 0% 76.4MB ± 0% ~ (p=0.253 n=100+100)
Tar 38.9MB ± 0% 38.8MB ± 0% -0.20% (p=0.000 n=100+97)
XML 41.5MB ± 0% 41.4MB ± 0% -0.19% (p=0.000 n=100+98)
[Geo mean] 77.5MB 77.4MB -0.16%
name old allocs/op new allocs/op delta
Template 381k ± 0% 381k ± 0% -0.15% (p=0.000 n=100+100)
Unicode 342k ± 0% 342k ± 0% -0.01% (p=0.000 n=100+98)
GoTypes 1.19M ± 0% 1.18M ± 0% -0.24% (p=0.000 n=100+100)
Compiler 4.52M ± 0% 4.50M ± 0% -0.29% (p=0.000 n=100+100)
SSA 12.3M ± 0% 12.3M ± 0% -0.11% (p=0.000 n=100+100)
Flate 234k ± 0% 234k ± 0% -0.26% (p=0.000 n=99+96)
GoParser 318k ± 0% 317k ± 0% -0.21% (p=0.000 n=99+100)
Reflect 974k ± 0% 974k ± 0% -0.03% (p=0.000 n=100+100)
Tar 392k ± 0% 391k ± 0% -0.17% (p=0.000 n=100+99)
XML 404k ± 0% 403k ± 0% -0.24% (p=0.000 n=99+99)
[Geo mean] 794k 792k -0.17%
name old object-bytes new object-bytes delta
Template 393kB ± 0% 392kB ± 0% -0.19% (p=0.008 n=5+5)
Unicode 207kB ± 0% 207kB ± 0% ~ (all equal)
GoTypes 1.23MB ± 0% 1.22MB ± 0% -0.11% (p=0.008 n=5+5)
Compiler 4.34MB ± 0% 4.33MB ± 0% -0.15% (p=0.008 n=5+5)
SSA 9.85MB ± 0% 9.85MB ± 0% -0.07% (p=0.008 n=5+5)
Flate 235kB ± 0% 234kB ± 0% -0.59% (p=0.008 n=5+5)
GoParser 297kB ± 0% 296kB ± 0% -0.22% (p=0.008 n=5+5)
Reflect 1.03MB ± 0% 1.03MB ± 0% -0.00% (p=0.008 n=5+5)
Tar 332kB ± 0% 331kB ± 0% -0.15% (p=0.008 n=5+5)
XML 413kB ± 0% 412kB ± 0% -0.19% (p=0.008 n=5+5)
[Geo mean] 728kB 727kB -0.17%
Change-Id: I9b5cdb668ed102a001897a05e833105acba220a2
Reviewed-on: https://go-review.googlesource.com/95995
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Introduce a new SSA pass to generate CondSelect instructions,
and add CondSelect lowering rules for arm64.
In order to make the CSEL instruction easier to optimize,
and to simplify the introduction of CSNEG, CSINC, and CSINV
in the future, modify the CSEL instruction to accept a condition
code in the aux field.
Notably, this change makes the go1 Gzip benchmark
more than 10% faster.
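The following is the kind of branchy selection that a branch-elimination pass may turn into a conditional select, such as CSEL on arm64, instead of a compare-and-jump. `max64` is an illustrative name. Whether this particular function is converted depends on the compiler version and its cost heuristics. Inspecting the output of GOARCH=arm64 go build -gcflags=-S is one way to check.

```go
package main

import "fmt"

// max64 selects one of two values based on a comparison. The diamond-shaped
// control flow (if/else merging into r) is what a CondSelect-generating
// pass looks for; the result is the same either way.
func max64(a, b int64) int64 {
	r := b
	if a > b {
		r = a
	}
	return r
}

func main() {
	fmt.Println(max64(3, 5), max64(-1, -7))
}
```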
Benchmarks on a Cavium ThunderX:
name old time/op new time/op delta
BinaryTree17-96 15.9s ± 6% 16.0s ± 4% ~ (p=0.968 n=10+9)
Fannkuch11-96 7.17s ± 0% 7.00s ± 0% -2.43% (p=0.000 n=8+9)
FmtFprintfEmpty-96 208ns ± 1% 207ns ± 0% ~ (p=0.152 n=10+8)
FmtFprintfString-96 379ns ± 0% 375ns ± 0% -0.95% (p=0.000 n=10+9)
FmtFprintfInt-96 385ns ± 0% 383ns ± 0% -0.52% (p=0.000 n=9+10)
FmtFprintfIntInt-96 591ns ± 0% 586ns ± 0% -0.85% (p=0.006 n=7+9)
FmtFprintfPrefixedInt-96 656ns ± 0% 667ns ± 0% +1.71% (p=0.000 n=10+10)
FmtFprintfFloat-96 967ns ± 0% 984ns ± 0% +1.78% (p=0.000 n=10+10)
FmtManyArgs-96 2.35µs ± 0% 2.25µs ± 0% -4.63% (p=0.000 n=9+8)
GobDecode-96 31.0ms ± 0% 30.8ms ± 0% -0.36% (p=0.006 n=9+9)
GobEncode-96 24.4ms ± 0% 24.5ms ± 0% +0.30% (p=0.000 n=9+9)
Gzip-96 1.60s ± 0% 1.43s ± 0% -10.58% (p=0.000 n=9+10)
Gunzip-96 167ms ± 0% 169ms ± 0% +0.83% (p=0.000 n=8+9)
HTTPClientServer-96 311µs ± 1% 308µs ± 0% -0.75% (p=0.000 n=10+10)
JSONEncode-96 65.0ms ± 0% 64.8ms ± 0% -0.25% (p=0.000 n=9+8)
JSONDecode-96 262ms ± 1% 261ms ± 1% ~ (p=0.579 n=10+10)
Mandelbrot200-96 18.0ms ± 0% 18.1ms ± 0% +0.17% (p=0.000 n=8+10)
GoParse-96 14.0ms ± 0% 14.1ms ± 1% +0.42% (p=0.003 n=9+10)
RegexpMatchEasy0_32-96 644ns ± 2% 645ns ± 2% ~ (p=0.836 n=10+10)
RegexpMatchEasy0_1K-96 3.70µs ± 0% 3.49µs ± 0% -5.58% (p=0.000 n=10+10)
RegexpMatchEasy1_32-96 662ns ± 2% 657ns ± 2% ~ (p=0.137 n=10+10)
RegexpMatchEasy1_1K-96 4.47µs ± 0% 4.31µs ± 0% -3.48% (p=0.000 n=10+10)
RegexpMatchMedium_32-96 844ns ± 2% 849ns ± 1% ~ (p=0.208 n=10+10)
RegexpMatchMedium_1K-96 179µs ± 0% 182µs ± 0% +1.20% (p=0.000 n=10+10)
RegexpMatchHard_32-96 10.0µs ± 0% 10.1µs ± 0% +0.48% (p=0.000 n=10+9)
RegexpMatchHard_1K-96 297µs ± 0% 297µs ± 0% -0.14% (p=0.000 n=10+10)
Revcomp-96 3.08s ± 0% 3.13s ± 0% +1.56% (p=0.000 n=9+9)
Template-96 276ms ± 2% 275ms ± 1% ~ (p=0.393 n=10+10)
TimeParse-96 1.37µs ± 0% 1.36µs ± 0% -0.53% (p=0.000 n=10+7)
TimeFormat-96 1.40µs ± 0% 1.42µs ± 0% +0.97% (p=0.000 n=10+10)
[Geo mean] 264µs 262µs -0.77%
Change-Id: Ie54eee4b3092af53e6da3baa6d1755098f57f3a2
Reviewed-on: https://go-review.googlesource.com/55670
Run-TryBot: Philip Hofer <phofer@umich.edu>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Change dump file names to group them alphabetically in directory
listings, in pass run order.
Change-Id: I8070578a5b4a3a7983dcc527ea1cfdb10a6d7d24
Reviewed-on: https://go-review.googlesource.com/83958
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Updates #18162 (mostly fixes)
Change-Id: I35bcb8a688bdaa432adb0ddbb73a2f7adda47b9e
Reviewed-on: https://go-review.googlesource.com/37958
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
For structs, slices, strings, interfaces, etc, propagation of
names to their components (e.g., complex.real, complex.imag)
is fragile (depends on phase ordering) and not done right
for the "dec" pass.
The dec pass is subsumed into decomposeBuiltin,
and then names are pushed into the args of all
OpFooMake opcodes.
compile/ssa/debug_test.go was fixed to pay attention to
variable values, and the reference files include checks
for the fixes in this CL (which make debugging better).
Change-Id: Ic2591ebb1698d78d07292b92c53667e6c37fa0cd
Reviewed-on: https://go-review.googlesource.com/73210
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Heschi Kreinick <heschi@google.com>
|
|
This is a crude compiler pass to eliminate stores to auto variables
that are only ever written to.
Eliminates an unnecessary store to x from the following code:
	func f() int {
		x := 1
		return *(&x)
	}
Fixes #19765.
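A crude version of the idea can be sketched on a toy list of operations. The `Op` type and `elimUnreadAutos` are hypothetical. They treat any load or address-taking of an auto as a possible read, and they drop stores to every other auto. This is a whole-function check, not the compiler's actual pass.

```go
package main

import "fmt"

// Op is a hypothetical, simplified operation on a named stack slot (auto).
type Op struct {
	Kind string // "store", "load", or "addr"
	Auto string
}

// elimUnreadAutos removes stores to autos that are never loaded and whose
// address is never taken (an escaping address could be used to read them).
func elimUnreadAutos(ops []Op) []Op {
	read := map[string]bool{}
	for _, o := range ops {
		if o.Kind == "load" || o.Kind == "addr" {
			read[o.Auto] = true
		}
	}
	var out []Op
	for _, o := range ops {
		if o.Kind == "store" && !read[o.Auto] {
			continue // store to a write-only auto: dead
		}
		out = append(out, o)
	}
	return out
}

func main() {
	ops := []Op{{"store", "x"}, {"store", "y"}, {"load", "y"}}
	fmt.Println(elimUnreadAutos(ops))
}
```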
Change-Id: If2c63a8ae67b8c590b6e0cc98a9610939a3eeffa
Reviewed-on: https://go-review.googlesource.com/38746
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
This avoids generating writeBarrier.enabled
blocks for dead stores.
Change-Id: Ib11d8e2ba952f3f1f01d16776e40a7200a7683cf
Reviewed-on: https://go-review.googlesource.com/42012
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
Old loops look like this:
	loop:
		CMPQ ...
		JGE exit
		...
		JMP loop
	exit:
New loops look like this:
		JMP entry
	loop:
		...
	entry:
		CMPQ ...
		JLT loop
This removes one instruction (the unconditional jump) from
the inner loop.
Kinda surprisingly, it matters.
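The two loop shapes can be mimicked at the Go source level with goto. The function names are hypothetical, and this only illustrates the control flow. The compiler produces the rotated form itself from an ordinary for loop. In the rotated version, each iteration runs the body and then a single conditional backward jump, with no unconditional jump.

```go
package main

import "fmt"

// sumTopTest mirrors the old shape: test at the top, unconditional jump
// back at the bottom of every iteration.
func sumTopTest(s []int) int {
	i, sum := 0, 0
loop:
	if i >= len(s) {
		goto exit
	}
	sum += s[i]
	i++
	goto loop
exit:
	return sum
}

// sumRotated mirrors the new shape: jump once to the test, then the test
// at the bottom branches back to the body while the condition holds.
func sumRotated(s []int) int {
	i, sum := 0, 0
	goto entry
loop:
	sum += s[i]
	i++
entry:
	if i < len(s) {
		goto loop
	}
	return sum
}

func main() {
	s := []int{1, 2, 3, 4}
	fmt.Println(sumTopTest(s), sumRotated(s))
}
```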
This is a bit different from the peeling that the old obj
library did in that we don't duplicate the loop exit test.
We just jump to the test. I'm not sure if it is better or
worse to do that (peeling gets rid of the JMP but means more
code duplication), but this CL is certainly a much simpler
compiler change, so I'll try this way first.
The obj library used to do peeling before
CL https://go-review.googlesource.com/c/36205 turned it off.
Fixes #15837 (remove obj instruction reordering)
The reordering is already removed, this CL implements the only
part of that reordering that we'd like to keep.
Fixes #14758 (append loop)
name old time/op new time/op delta
Foo-12 817ns ± 4% 538ns ± 0% -34.08% (p=0.000 n=10+9)
Bar-12 850ns ±11% 570ns ±13% -32.88% (p=0.000 n=10+10)
Update #19595 (BLAS slowdown)
name old time/op new time/op delta
DgemvMedMedNoTransIncN-12 13.2µs ± 9% 10.2µs ± 1% -22.26% (p=0.000 n=9+9)
Fixes #19633 (append loop)
name old time/op new time/op delta
Foo-12 810ns ± 1% 540ns ± 0% -33.30% (p=0.000 n=8+9)
Update #18977 (Fannkuch11 regression)
name old time/op new time/op delta
Fannkuch11-8 2.80s ± 0% 3.01s ± 0% +7.47% (p=0.000 n=9+10)
This one makes no sense. There's strictly 1 less instruction in the
inner loop (17 instead of 18). They are exactly the same instructions
except for the JMP that has been elided.
go1 benchmarks generally don't look very impressive. But the gains for the
specific issues above make this CL still probably worth it.
name old time/op new time/op delta
BinaryTree17-8 2.32s ± 0% 2.34s ± 0% +1.14% (p=0.000 n=9+7)
Fannkuch11-8 2.80s ± 0% 3.01s ± 0% +7.47% (p=0.000 n=9+10)
FmtFprintfEmpty-8 44.1ns ± 1% 46.1ns ± 1% +4.53% (p=0.000 n=10+10)
FmtFprintfString-8 67.8ns ± 0% 74.4ns ± 1% +9.80% (p=0.000 n=10+9)
FmtFprintfInt-8 74.9ns ± 0% 78.4ns ± 0% +4.67% (p=0.000 n=8+10)
FmtFprintfIntInt-8 117ns ± 1% 123ns ± 1% +4.69% (p=0.000 n=9+10)
FmtFprintfPrefixedInt-8 160ns ± 1% 146ns ± 0% -8.22% (p=0.000 n=8+10)
FmtFprintfFloat-8 214ns ± 0% 206ns ± 0% -3.91% (p=0.000 n=8+8)
FmtManyArgs-8 468ns ± 0% 497ns ± 1% +6.09% (p=0.000 n=8+10)
GobDecode-8 6.16ms ± 0% 6.21ms ± 1% +0.76% (p=0.000 n=9+10)
GobEncode-8 4.90ms ± 0% 4.92ms ± 1% +0.37% (p=0.028 n=9+10)
Gzip-8 209ms ± 0% 212ms ± 0% +1.33% (p=0.000 n=10+10)
Gunzip-8 36.6ms ± 0% 38.0ms ± 1% +4.03% (p=0.000 n=9+9)
HTTPClientServer-8 84.2µs ± 0% 86.0µs ± 1% +2.14% (p=0.000 n=9+9)
JSONEncode-8 13.6ms ± 3% 13.8ms ± 1% +1.55% (p=0.003 n=9+10)
JSONDecode-8 53.2ms ± 5% 52.9ms ± 0% ~ (p=0.280 n=10+10)
Mandelbrot200-8 3.78ms ± 0% 3.78ms ± 1% ~ (p=0.661 n=10+9)
GoParse-8 2.89ms ± 0% 2.94ms ± 2% +1.50% (p=0.000 n=10+10)
RegexpMatchEasy0_32-8 68.5ns ± 2% 68.9ns ± 1% ~ (p=0.136 n=10+10)
RegexpMatchEasy0_1K-8 220ns ± 1% 225ns ± 1% +2.41% (p=0.000 n=10+10)
RegexpMatchEasy1_32-8 64.7ns ± 0% 64.5ns ± 0% -0.28% (p=0.042 n=10+10)
RegexpMatchEasy1_1K-8 348ns ± 1% 355ns ± 0% +1.90% (p=0.000 n=10+10)
RegexpMatchMedium_32-8 102ns ± 1% 105ns ± 1% +2.95% (p=0.000 n=10+10)
RegexpMatchMedium_1K-8 33.1µs ± 3% 32.5µs ± 0% -1.75% (p=0.000 n=10+10)
RegexpMatchHard_32-8 1.71µs ± 1% 1.70µs ± 1% -0.84% (p=0.002 n=10+9)
RegexpMatchHard_1K-8 51.1µs ± 0% 50.8µs ± 1% -0.48% (p=0.004 n=10+10)
Revcomp-8 411ms ± 1% 402ms ± 0% -2.22% (p=0.000 n=10+9)
Template-8 61.8ms ± 1% 59.7ms ± 0% -3.44% (p=0.000 n=9+9)
TimeParse-8 306ns ± 0% 318ns ± 0% +3.83% (p=0.000 n=10+10)
TimeFormat-8 320ns ± 0% 318ns ± 1% -0.53% (p=0.012 n=7+10)
Change-Id: Ifaf29abbe5874e437048e411ba8f7cfbc9e1c94b
Reviewed-on: https://go-review.googlesource.com/38431
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
|
|
Now only cmd/asm and cmd/compile depend on cmd/internal/obj. Changing
the assembler backends no longer requires reinstalling cmd/link or
cmd/addr2line.
There's also now one canonical definition of the object file format in
cmd/internal/objabi/doc.go, with a warning to update all three
implementations.
objabi is still something of a grab bag of unrelated code (e.g., flag
and environment variable handling probably belong in a separate "tool"
package), but this is still progress.
Fixes #15165.
Fixes #20026.
Change-Id: Ic4b92fac7d0d35438e0d20c9579aad4085c5534c
Reviewed-on: https://go-review.googlesource.com/40972
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
|
|
Suggested by mdempsky in CL 38232.
This allows us to use the Frontend field
to associate frontend state and information
with a function.
See the following CL in the series for examples.
This is a giant CL, but it is almost entirely routine refactoring.
The ssa test API is starting to feel a bit unwieldy.
I will clean it up separately, once the dust has settled.
Passes toolstash -cmp.
Updates #15756
Change-Id: I71c573bd96ff7251935fce1391b06b1f133c3caf
Reviewed-on: https://go-review.googlesource.com/38327
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
|
|
This makes ssa.Func, ssa.Cache, and ssa.Config fulfill
the roles laid out for them in CL 38160.
The only non-trivial change in this CL is how cached
values and blocks get IDs. Prior to this CL, their IDs were
assigned as part of resetting the cache, and only modified
IDs were reset. This required knowing how many values and
blocks were modified, which required a tight coupling between
ssa.Func and ssa.Config. To eliminate that coupling,
we now zero values and blocks during reset,
and assign their IDs when they are used.
Since unused values and blocks have ID == 0,
we can efficiently find the last used value/block,
to avoid zeroing everything.
Bulk zeroing is efficient, but not efficient enough
to obviate the need to avoid zeroing everything every time.
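The reset scheme can be sketched with hypothetical `Cache`, `Func` and `Value` types, which are not the ssa package's definitions. Values are handed out in order, and their IDs are assigned on use starting at 1. The used entries therefore form a prefix of nonzero IDs, and a binary search can find the end of that prefix without a separate counter.

```go
package main

import (
	"fmt"
	"sort"
)

// Value is a hypothetical cached SSA value; ID == 0 means "not in use".
type Value struct {
	ID  int
	Aux string
}

// Cache owns preallocated Values but does not count how many were used.
type Cache struct {
	values []Value
}

// Func hands out cached values in order and assigns IDs when they are used.
type Func struct {
	cache  *Cache
	nextID int
}

// NewValue assumes the cache is large enough for this sketch.
func (f *Func) NewValue(aux string) *Value {
	v := &f.cache.values[f.nextID]
	f.nextID++
	v.ID = f.nextID // IDs start at 1, so 0 keeps meaning "unused"
	v.Aux = aux
	return v
}

// Reset zeroes only the used prefix and reports how long it was.
func (c *Cache) Reset() int {
	n := sort.Search(len(c.values), func(i int) bool { return c.values[i].ID == 0 })
	for i := 0; i < n; i++ {
		c.values[i] = Value{}
	}
	return n
}

func main() {
	c := &Cache{values: make([]Value, 8)}
	f := &Func{cache: c}
	f.NewValue("a")
	f.NewValue("b")
	fmt.Println("zeroed", c.Reset(), "values")
}
```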
As a happy side-effect, ssa.Func.Free is no longer necessary.
DebugHashMatch and friends now belong in func.go.
They have been left in place for clarity and review.
I will move them in a subsequent CL.
Passes toolstash -cmp. No compiler performance impact.
No change in 'go test cmd/compile/internal/ssa' execution time.
Change-Id: I2eb7af58da067ef6a36e815a6f386cfe8634d098
Reviewed-on: https://go-review.googlesource.com/38167
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
The compiler's -d flag accepts string-valued flags, but currently only
for SSA debug flags. Extend it to support string values for other
flags. This also makes the syntax somewhat more sane so flag=value and
flag:value now both accept integers and strings.
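The accepted syntax can be sketched as a parser for a comma-separated -d list. `debugSetting` and `parseDebug` are hypothetical, and so are the flag names in the example. As an assumption, a bare name means 1, a value that parses as an integer is stored as one, and any other value is kept as a string. Both = and : are accepted as separators.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// debugSetting is a hypothetical parsed -d entry.
type debugSetting struct {
	Name  string
	Int   int    // integer value (1 for a bare name)
	Str   string // raw value text, if any
	IsInt bool
}

// parseDebug parses entries of the form name, name=value or name:value.
func parseDebug(s string) ([]debugSetting, error) {
	var out []debugSetting
	for _, item := range strings.Split(s, ",") {
		if item == "" {
			continue
		}
		name, val := item, ""
		if i := strings.IndexAny(item, "=:"); i >= 0 {
			name, val = item[:i], item[i+1:]
		}
		if name == "" {
			return nil, fmt.Errorf("empty flag name in %q", item)
		}
		d := debugSetting{Name: name, Str: val}
		if val == "" {
			d.Int, d.IsInt = 1, true
		} else if n, err := strconv.Atoi(val); err == nil {
			d.Int, d.IsInt = n, true
		}
		out = append(out, d)
	}
	return out, nil
}

func main() {
	ds, err := parseDebug("foo,bar:2,baz=qux")
	fmt.Printf("%+v %v\n", ds, err)
}
```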
Change-Id: Idd144d8479a430970cc1688f824bffe0a56ed2df
Reviewed-on: https://go-review.googlesource.com/37345
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
|
|
Change-Id: I7715581a04e513dcda9918e853fa6b1ddc703770
|
|
XPos is a compact (8 instead of 16 bytes on a 64-bit machine) source
position representation. There is a 1:1 correspondence between each
XPos and each regular Pos, translated via a global table.
In some sense this brings back the LineHist, though positions can
track line and column information; there is an O(1) translation
between the representations (no binary search), and the translation
is factored out.
This brings the size increase from the prior change back down, and
compiler speed is in line with the master repo (measured on
the same "quiet" machine as for the prior change):
name        old time/op     new time/op     delta
Template     256ms ± 1%      262ms ± 2%       ~     (p=0.063 n=5+4)
Unicode      132ms ± 1%      135ms ± 2%       ~     (p=0.063 n=5+4)
GoTypes      891ms ± 1%      871ms ± 1%     -2.28%  (p=0.016 n=5+4)
Compiler     3.84s ± 2%      3.89s ± 2%       ~     (p=0.413 n=5+4)
MakeBash     47.1s ± 1%      46.2s ± 2%       ~     (p=0.095 n=5+5)

name        old user-ns/op  new user-ns/op  delta
Template      309M ± 1%       314M ± 2%       ~     (p=0.111 n=5+4)
Unicode       165M ± 1%       172M ± 9%       ~     (p=0.151 n=5+5)
GoTypes      1.14G ± 2%      1.12G ± 1%       ~     (p=0.063 n=5+4)
Compiler     5.00G ± 1%      4.96G ± 1%       ~     (p=0.286 n=5+4)
Change-Id: Icc570cc60ab014d8d9af6976f1f961ab8828cc47
Reviewed-on: https://go-review.googlesource.com/34506
Run-TryBot: Robert Griesemer <gri@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Loop breaking with a counter. Benchmarked (see comments) and
eyeball-checked for sanity on popular loops. This code
ought to handle loops in general, and it properly inserts phi
functions in cases where the earlier version might not have.
Includes a test, plus modifications to test/run.go to handle
timeouts and to kill a looping test. Tests that break because
of the extra code the checks add (branch frequency and live
vars) turn check insertion off.
If GOEXPERIMENT=preemptibleloops, the compiler inserts reschedule
checks on every backedge of every reducible loop. Alternatively,
specifying GO_GCFLAGS=-d=ssa/insert_resched_checks/on will
enable it for a single compilation, but because the core Go
libraries contain some loops that may run long, this is less
likely to have the desired effect.
This is intended as a tool to help in the study and diagnosis
of GC and other latency problems, now that goal STW GC latency
is on the order of 100 microseconds or less.
Updates #17831.
Updates #10958.
Change-Id: I6206c163a5b0248e3f21eb4fc65f73a179e1f639
Reviewed-on: https://go-review.googlesource.com/33910
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|