path: root/src/cmd/compile/internal/ssa/compile.go
Age | Commit message | Author
2021-04-29 | cmd/compile: minor doc enhancements | Than McIntosh
Add a little more detail to the ssa README relating to GOSSAFUNC. Update the -d=ssa help section to give a little more detail on what to expect with applying the /debug=X qualifier to a phase. Change-Id: I7027735f1f2955dbb5b9be36d9a648e8dc655048 Reviewed-on: https://go-review.googlesource.com/c/go/+/315229 Trust: Than McIntosh <thanm@google.com> Run-TryBot: Than McIntosh <thanm@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2021-04-16 | internal/buildcfg: move build configuration out of cmd/internal/objabi | Russ Cox
The go/build package needs access to this configuration, so move it into a new package available to the standard library. Change-Id: I868a94148b52350c76116451f4ad9191246adcff Reviewed-on: https://go-review.googlesource.com/c/go/+/310731 Trust: Russ Cox <rsc@golang.org> Run-TryBot: Russ Cox <rsc@golang.org> Reviewed-by: Austin Clements <austin@google.com> Reviewed-by: Jay Conrod <jayconrod@google.com>
2021-03-18 | cmd: move experiment flags into objabi.Experiment | Austin Clements
This moves all remaining GOEXPERIMENT flags into the objabi.Experiment struct, drops the "_enabled" from their name, and makes them all bool typed. We also drop DebugFlags.Fieldtrack because the previous CL shifted the one test that used it to use GOEXPERIMENT instead. Change-Id: I3406fe62b1c300bb4caeaffa6ca5ce56a70497fe Reviewed-on: https://go-review.googlesource.com/c/go/+/302389 Trust: Austin Clements <austin@google.com> Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2021-01-26 | [dev.regabi] cmd/compile: remove leftover code from late call lowering work | David Chase
It's no longer conditional. Change-Id: I697bb0e9ffe9644ec4d2766f7e8be8b82d3b0638 Reviewed-on: https://go-review.googlesource.com/c/go/+/286013 Trust: David Chase <drchase@google.com> Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2020-10-30 | cmd/compile: code cleanup | Michele Di Pede
Change-Id: Ibf68e663f29a5cb3b64a7d923c005c16da647769 Reviewed-on: https://go-review.googlesource.com/c/go/+/266537 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com> Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Go Bot <gobot@golang.org>
2020-10-29 | cmd/compile: delay expansion of OpArg until expand_calls | David Chase
As it says, delay expansion of OpArg to the expand_calls phase, to enable (eventually) interprocedural SSA optimizations, and (sooner) change to a register ABI. Includes a round of cleanup to function names and comments, largely to match the expanded scope of the functions. This CL removes the per-function dependence on GOSSAHASH, but the go116lateCallExpansion kill switch remains (and was tested locally to ensure it worked). Two functions in expand_calls.go that performed overlapping things were combined into a single function that is called twice. Fixes #42236. For #40724. Change-Id: Icbb78947eaa39f17f2c1210d5c2caef20abd6571 Reviewed-on: https://go-review.googlesource.com/c/go/+/262117 Trust: David Chase <drchase@google.com> Run-TryBot: David Chase <drchase@google.com> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-10-23 | cmd/compile: avoid generating CSEs; do all aggregates; maintain debug names | David Chase
This adds a pass to detect common selection operations, to avoid generating duplicates. Duplicate offsets are also detected. All aggregate types are now handled; there is some freedom in where expand_calls is run, though it must run before softfloat. Debug-name-maintenance is now incremental both in decompose builtin and in expand_calls; it might be good to push this into all the decompose passes. (this is a smash of 5 CLs that rewrote some of the same code several times to deal with phase-ordering problems, and included an abandoned attempt.) For #40724. Change-Id: I2a0c32f20660bf8b99e2bcecd33545d97d2bd3c6 Reviewed-on: https://go-review.googlesource.com/c/go/+/249458 Trust: David Chase <drchase@google.com> Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-10-01 | cmd/compile: allow directory specification for GOSSAFUNC output | David Chase
This was useful for debugging failures occurring during make.bash. The added flush also ensures that any hints in the GOSSAFUNC output are flushed before fatal exit. The environment variable GOSSADIR specifies where the SSA html debugging files should be placed. To avoid collisions, each one is written into the [package].[functionOrMethod].html, where [package] is the filepath separator separated package name, function is the function name, and method is either (*Type).Method, or Type.Method, as appropriate. Directories are created as necessary to make this work. Change-Id: I420927426b618b633bb1ffc51cf0f223b8f6d49c Reviewed-on: https://go-review.googlesource.com/c/go/+/252338 Trust: David Chase <drchase@google.com> Run-TryBot: David Chase <drchase@google.com> Reviewed-by: Cherry Zhang <cherryyz@google.com>
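As a rough illustration of the naming scheme described above, the sketch below builds an output path of the form [package].[functionOrMethod].html under a chosen directory. The helper name `ssaDumpPath` and its parameters are hypothetical; this is not the compiler's code, only a reading of the commit message.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// ssaDumpPath is a hypothetical helper, not taken from the compiler.
// dir plays the role of GOSSADIR, pkgPath is a slash-separated package
// path, and fn is a function name or a method such as "(*T).M" or "T.M".
func ssaDumpPath(dir, pkgPath, fn string) string {
	// Convert the package path to the platform's separator so that each
	// package element becomes a directory.
	pkg := filepath.FromSlash(pkgPath)
	return filepath.Join(dir, pkg+"."+fn+".html")
}

func main() {
	fmt.Println(ssaDumpPath("ssa-out", "cmd/compile/internal/ssa", "(*Func).String"))
	// A real writer would also create the parent directories first, for
	// example with os.MkdirAll(filepath.Dir(path), 0o755).
}
```

Keeping the package path as nested directories is what avoids collisions between same-named functions in different packages.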
2020-10-01 | cmd/compile: fix late call expansion for SSA-able aggregate results and arguments | David Chase
This change incorporates the decision that it should be possible to run call expansion relatively late in the optimization chain, so that (1) calls themselves can be exposed to useful optimizations (2) the effect of selectors on aggregates is seen at the rewrite, so that assignment of parts into registers is less complicated (at least I hope it works that way). That means that selectors feeding into SelectN need to be processed, and Make* feeding into call parameters need to be processed. This does however require that call expansion run before decompose builtins. This doesn't yet handle rewrites of strings, slices, interfaces, and complex numbers. Passes run.bash and race.bash Change-Id: I71ff23d3c491043beb30e926949970c4f63ef1a4 Reviewed-on: https://go-review.googlesource.com/c/go/+/245133 Trust: David Chase <drchase@google.com> Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-09-18 | cmd/compile: add code to expand calls just before late opt | David Chase
Still needs to generate the calls that will need lowering. Change-Id: Ifd4e510193441a5e27c462c1f1d704f07bf6dec3 Reviewed-on: https://go-review.googlesource.com/c/go/+/242359 Trust: David Chase <drchase@google.com> Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-08-17 | cmd/compile: move dumpFileSeq | surechen
I noticed that there is a TODO comment here. This variable is only used to build the filename when dumping the detailed results of a function's SSA passes. Dumping a single function is fine, but the variable could be modified by more than one goroutine if multiple functions were dumped at the same time, although it looks like only one function's SSA passes are dumped at present. In my view this variable can be a field of the Func struct. I'm not sure whether this change is necessary. I look forward to your advice; thank you very much. Change-Id: I35dd7247889e0cc7f19c0b400b597206592dee75 Reviewed-on: https://go-review.googlesource.com/c/go/+/244918 Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Keith Randall <khr@golang.org>
2020-06-10 | cmd/compile: always tighten and de-duplicate tuple selectors | Michael Munday
The scheduler assumes two special invariants that apply to tuple selectors (Select0 and Select1 ops): 1. There is only one tuple selector of each type per generator. 2. Tuple selectors and generators reside in the same block. Prior to this CL the assumption was that these invariants would only be broken by the CSE pass. The CSE pass therefore contained code to move and de-duplicate selectors to fix these invariants. However it is also possible to write relatively basic optimization rules that cause these invariants to be broken. For example: (A (Select0 (B))) -> (Select1 (B)) This rule could result in the newly added selector (Select1) being in a different block to the tuple generator (see issue #38356). It could also result in duplicate selectors if this rule matches multiple times for the same tuple generator (see issue #39472). The CSE pass will 'fix' these invariants. However it will only do so when optimizations are enabled (since disabling optimizations disables the CSE pass). This CL moves the CSE tuple selector fixup code into its own pass and makes it mandatory even when optimizations are disabled. This allows tuple selectors to be treated like normal ops for most of the compilation pipeline until after the new pass has run, at which point we need to be careful to maintain the invariant again. Fixes #39472. Change-Id: Ia3f79e09d9c65ac95f897ce37e967ee1258a080b Reviewed-on: https://go-review.googlesource.com/c/go/+/237118 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2020-04-05 | cmd/compile: restore missing columns in ssa.html | Bradford Lamson-Scribner
If the final pass(es) are identical during ssa.html generation, they are persisted in-memory as "pendingPhases" but never get written as a column in the html. This change flushes those in-memory phases. Fixes #38242 Change-Id: Id13477dcbe7b419a818bb457861b2422ba5ef4bc Reviewed-on: https://go-review.googlesource.com/c/go/+/227182 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com> Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
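The bug class described above can be shown with a toy writer that groups consecutive phases with identical output into one column. This is a minimal sketch under assumed names (`columnWriter`, `WritePhase`, `Close`); it is not the HTMLWriter from the compiler. The point is that buffered phases reach the output only if something flushes them at the end.

```go
package main

import (
	"fmt"
	"strings"
)

// columnWriter is a hypothetical stand-in for a writer that merges
// consecutive phases whose dumps are identical into a single column.
type columnWriter struct {
	pending  []string // phases buffered because their dump matched lastDump
	lastDump string
	columns  []string // column titles that have actually been emitted
}

// WritePhase buffers the phase; a change in the dump emits the buffered group.
func (w *columnWriter) WritePhase(name, dump string) {
	if len(w.pending) > 0 && dump != w.lastDump {
		w.flush()
	}
	w.pending = append(w.pending, name)
	w.lastDump = dump
}

// flush emits the buffered phases as one column.
func (w *columnWriter) flush() {
	if len(w.pending) == 0 {
		return
	}
	w.columns = append(w.columns, strings.Join(w.pending, " + "))
	w.pending = nil
}

// Close must flush; without it the final group of phases is silently lost.
func (w *columnWriter) Close() { w.flush() }

func main() {
	w := &columnWriter{}
	w.WritePhase("opt", "v1 = Add")
	w.WritePhase("deadcode", "v1 = Add")
	w.WritePhase("lower", "v1 = ADDQ")
	w.WritePhase("trim", "v1 = ADDQ")
	w.Close()
	fmt.Println(w.columns)
}
```

If `Close` did not call `flush`, the last group would stay in `pending` and never appear as a column, which is the symptom the commit message reports.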
2020-04-05 | cmd/compile: refactor around HTMLWriter removing logger in favor of Func | Bradford Lamson-Scribner
Replace HTMLWriter's Logger field with a *Func. Implement Fatalf method for HTMLWriter which gets the Frontend() from the Func and calls down into its Fatalf method, passing the msg and args along. Replace remaining calls to the old Logger with calls to logging methods on the Func. Change-Id: I966342ef9997396f3416fb152fa52d60080ebecb Reviewed-on: https://go-review.googlesource.com/c/go/+/227277 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com> Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2020-03-10 | cmd/compile: insert complicated x86 addressing modes as a separate pass | Keith Randall
Use a separate compiler pass to introduce complicated x86 addressing modes. Loads in the normal architecture rules (for x86 and all other platforms) can have constant offsets (AuxInt values) and symbols (Aux values), but no more. The complex addressing modes (x+y, x+2*y, etc.) are introduced in a separate pass that combines loads with LEAQx ops. Organizing rewrites this way simplifies the number of rewrites required, as there are lots of different rule orderings that have to be specified to ensure these complex addressing modes are always found if they are possible. Update #36468 Change-Id: I5b4bf7b03a1e731d6dfeb9ef19b376175f3b4b44 Reviewed-on: https://go-review.googlesource.com/c/go/+/217097 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2020-03-03 | cmd/compile: optimize integer-in-range checks | Michael Munday
This CL incorporates code from CL 201206 by Josh Bleecher Snyder (thanks Josh). This CL restores the integer-in-range optimizations in the SSA backend. The fuse pass is enhanced to detect inequalities that could be merged and fuse their associated blocks while the generic rules optimize them into a single unsigned comparison. For example, the inequality `x >= 0 && x < 10` will now be optimized to `unsigned(x) < 10`. Overall has a fairly positive impact on binary sizes.

name                      old time/op   new time/op   delta
Template                  192ms ± 1%    192ms ± 1%    ~       (p=0.757 n=17+18)
Unicode                   76.6ms ± 2%   76.5ms ± 2%   ~       (p=0.603 n=19+19)
GoTypes                   694ms ± 1%    693ms ± 1%    ~       (p=0.569 n=19+20)
Compiler                  3.26s ± 0%    3.27s ± 0%    +0.25%  (p=0.000 n=20+20)
SSA                       7.41s ± 0%    7.49s ± 0%    +1.10%  (p=0.000 n=17+19)
Flate                     120ms ± 1%    120ms ± 1%    +0.38%  (p=0.003 n=19+19)
GoParser                  152ms ± 1%    152ms ± 1%    ~       (p=0.061 n=17+19)
Reflect                   422ms ± 1%    425ms ± 2%    +0.76%  (p=0.001 n=18+20)
Tar                       167ms ± 1%    167ms ± 0%    ~       (p=0.730 n=18+19)
XML                       233ms ± 4%    231ms ± 1%    ~       (p=0.752 n=20+17)
LinkCompiler              927ms ± 8%    928ms ± 8%    ~       (p=0.857 n=19+20)
ExternalLinkCompiler      1.81s ± 2%    1.81s ± 2%    ~       (p=0.513 n=19+20)
LinkWithoutDebugCompiler  556ms ±10%    583ms ±13%    +4.95%  (p=0.007 n=20+20)
[Geo mean]                478ms         481ms         +0.52%

name                      old user-time/op  new user-time/op  delta
Template                  270ms ± 5%        269ms ± 7%        ~       (p=0.925 n=20+20)
Unicode                   134ms ± 7%        131ms ±14%        ~       (p=0.593 n=18+20)
GoTypes                   981ms ± 3%        987ms ± 2%        +0.63%  (p=0.049 n=19+18)
Compiler                  4.50s ± 2%        4.50s ± 1%        ~       (p=0.588 n=19+20)
SSA                       10.6s ± 2%        10.6s ± 1%        ~       (p=0.141 n=20+19)
Flate                     164ms ± 8%        165ms ±10%        ~       (p=0.738 n=20+20)
GoParser                  202ms ± 5%        203ms ± 6%        ~       (p=0.820 n=20+20)
Reflect                   587ms ± 6%        597ms ± 3%        ~       (p=0.087 n=20+18)
Tar                       230ms ± 6%        228ms ± 8%        ~       (p=0.569 n=19+20)
XML                       311ms ± 6%        314ms ± 5%        ~       (p=0.369 n=20+20)
LinkCompiler              878ms ± 8%        887ms ± 7%        ~       (p=0.289 n=20+20)
ExternalLinkCompiler      1.60s ± 7%        1.60s ± 7%        ~       (p=0.820 n=20+20)
LinkWithoutDebugCompiler  498ms ±12%        489ms ±11%        ~       (p=0.398 n=20+20)
[Geo mean]                611ms             611ms             +0.05%

name                      old alloc/op  new alloc/op  delta
Template                  36.1MB ± 0%   36.0MB ± 0%   -0.32%  (p=0.000 n=20+20)
Unicode                   28.3MB ± 0%   28.3MB ± 0%   -0.03%  (p=0.000 n=19+20)
GoTypes                   121MB ± 0%    121MB ± 0%    ~       (p=0.226 n=16+20)
Compiler                  563MB ± 0%    563MB ± 0%    ~       (p=0.166 n=20+19)
SSA                       1.32GB ± 0%   1.33GB ± 0%   +0.88%  (p=0.000 n=20+19)
Flate                     22.7MB ± 0%   22.7MB ± 0%   -0.02%  (p=0.033 n=19+20)
GoParser                  27.9MB ± 0%   27.9MB ± 0%   -0.02%  (p=0.001 n=20+20)
Reflect                   78.3MB ± 0%   78.2MB ± 0%   -0.01%  (p=0.019 n=20+20)
Tar                       34.0MB ± 0%   34.0MB ± 0%   -0.04%  (p=0.000 n=20+20)
XML                       43.9MB ± 0%   43.9MB ± 0%   -0.07%  (p=0.000 n=20+19)
LinkCompiler              205MB ± 0%    205MB ± 0%    +0.44%  (p=0.000 n=20+18)
ExternalLinkCompiler      223MB ± 0%    223MB ± 0%    +0.03%  (p=0.000 n=20+20)
LinkWithoutDebugCompiler  139MB ± 0%    142MB ± 0%    +1.75%  (p=0.000 n=20+20)
[Geo mean]                93.7MB        93.9MB        +0.20%

name                      old allocs/op  new allocs/op  delta
Template                  363k ± 0%      361k ± 0%      -0.58%  (p=0.000 n=20+19)
Unicode                   329k ± 0%      329k ± 0%      -0.06%  (p=0.000 n=19+20)
GoTypes                   1.28M ± 0%     1.28M ± 0%     -0.01%  (p=0.000 n=20+20)
Compiler                  5.40M ± 0%     5.40M ± 0%     -0.01%  (p=0.000 n=20+20)
SSA                       12.7M ± 0%     12.8M ± 0%     +0.80%  (p=0.000 n=20+20)
Flate                     228k ± 0%      228k ± 0%      ~       (p=0.194 n=20+20)
GoParser                  295k ± 0%      295k ± 0%      -0.04%  (p=0.000 n=20+20)
Reflect                   949k ± 0%      949k ± 0%      -0.01%  (p=0.000 n=20+20)
Tar                       337k ± 0%      337k ± 0%      -0.06%  (p=0.000 n=20+20)
XML                       418k ± 0%      417k ± 0%      -0.17%  (p=0.000 n=20+20)
LinkCompiler              553k ± 0%      554k ± 0%      +0.22%  (p=0.000 n=20+19)
ExternalLinkCompiler      1.52M ± 0%     1.52M ± 0%     +0.27%  (p=0.000 n=20+20)
LinkWithoutDebugCompiler  186k ± 0%      186k ± 0%      +0.06%  (p=0.000 n=20+20)
[Geo mean]                723k           723k           +0.03%

name       old text-bytes  new text-bytes  delta
HelloSize  828kB ± 0%      828kB ± 0%      -0.01%  (p=0.000 n=20+20)

name       old data-bytes  new data-bytes  delta
HelloSize  13.4kB ± 0%     13.4kB ± 0%     ~       (all equal)

name       old bss-bytes   new bss-bytes   delta
HelloSize  180kB ± 0%      180kB ± 0%      ~       (all equal)

name       old exe-bytes   new exe-bytes   delta
HelloSize  1.23MB ± 0%     1.23MB ± 0%     -0.33%  (p=0.000 n=20+20)

file       before     after      Δ       %
addr2line  4320075    4311883    -8192   -0.190%
asm        5191932    5187836    -4096   -0.079%
buildid    2835338    2831242    -4096   -0.144%
compile    20531717   20569099   +37382  +0.182%
cover      5322511    5318415    -4096   -0.077%
dist       3723749    3719653    -4096   -0.110%
doc        4743515    4739419    -4096   -0.086%
fix        3413960    3409864    -4096   -0.120%
link       6690119    6686023    -4096   -0.061%
nm         4269616    4265520    -4096   -0.096%
pprof      14942189   14929901   -12288  -0.082%
trace      11807164   11790780   -16384  -0.139%
vet        8384104    8388200    +4096   +0.049%
go         15339076   15334980   -4096   -0.027%
total      132258257  132226007  -32250  -0.024%

Fixes #30645. Change-Id: If551ac5996097f3685870d083151b5843170aab0 Reviewed-on: https://go-review.googlesource.com/c/go/+/165998 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
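The equivalence behind the integer-in-range rewrite can be checked directly in Go. The sketch below is only an illustration of why `x >= 0 && x < 10` and a single unsigned comparison agree; the function names are made up for this example, and it says nothing about how the fuse pass or the generic rules are implemented.

```go
package main

import "fmt"

// inRangeSigned is the form a programmer typically writes.
func inRangeSigned(x int) bool { return x >= 0 && x < 10 }

// inRangeUnsigned is the single-comparison form. Converting a negative
// int to uint wraps it to a very large value, so one unsigned compare
// rejects both x < 0 and x >= 10.
func inRangeUnsigned(x int) bool { return uint(x) < 10 }

func main() {
	for _, x := range []int{-100, -1, 0, 5, 9, 10, 100} {
		fmt.Println(x, inRangeSigned(x), inRangeUnsigned(x))
	}
}
```

The two functions return the same result for every input, which is what lets the compiler replace two branches with one.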
2020-03-02 | cmd/compile: add -d=ssa/check/seed=SEED | Josh Bleecher Snyder
This change adds the option to run the ssa checker with a random seed. The current system uses a completely fixed seed, which is good for reproducibility but bad for exploring the state space. Preserve what we have, but also provide a way for the caller to provide a seed. The caller can report the seed alongside any failures. Change-Id: I2676a8112d8260e6cac86d95d2e8db4d3221aeeb Reviewed-on: https://go-review.googlesource.com/c/go/+/216418 Reviewed-by: Keith Randall <khr@golang.org>
2019-10-14 | cmd/compile: add debugging mode for poset | Giovanni Bajo
Add an internal mode to simplify debugging of posets by checking the integrity after every mutation. Turn it on within SSA checked builds. Change-Id: Idaa8277f58e5bce3753702e212cea4d698de30ca Reviewed-on: https://go-review.googlesource.com/c/go/+/196780 Run-TryBot: Giovanni Bajo <rasky@develer.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: David Chase <drchase@google.com>
2019-10-03 | cmd/compile: run deadcode before nilcheck for better statement relocation | David Chase
Nilcheck would move statements from NilCheck values to others that turned out were already dead, which leads to lost statements. Better to eliminate the dead code first. One "error" is removed from test/prove.go because the code is actually dead, and the additional deadcode pass removes it before prove can run. Change-Id: If75926ca1acbb59c7ab9c8ef14d60a02a0a94f8b Reviewed-on: https://go-review.googlesource.com/c/go/+/198479 Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Jeremy Faller <jeremy@golang.org>
2019-08-27 | cmd/compile: improve shortcircuit pass | Josh Bleecher Snyder
While working on #30645, I noticed that many instances in which the walkinrange optimization could apply were not even being considered. This was because of extraneous blocks in the CFG, of the type that shortcircuit normally removes. The change improves the shortcircuit pass to handle most of those cases. (There are a few that can only be reasonably detected later in compilation, after other optimizations have been run, but not enough to be worth chasing.)

Notable changes:
* Instead of calculating live-across-blocks values, use v.Uses == 1. This is cheaper and more straightforward. v.Uses did not exist when this pass was initially written.
* Incorporate a fusePlain and loop until stable. This is necessary to find many of the instances.
* Allow Copy and Not wrappers around Phi values. This significantly increases effectiveness.
* Allow removal of all preds, creating a dead block. The previous pass stopped unnecessarily at one pred.
* Use phielimValue during cleanup instead of manually setting the op to OpCopy.

The result is marginally faster compilation and smaller code.

name        old time/op   new time/op   delta
Template    213ms ± 2%    212ms ± 2%    -0.63%  (p=0.002 n=49+48)
Unicode     90.0ms ± 2%   89.8ms ± 2%   ~       (p=0.122 n=48+48)
GoTypes     710ms ± 3%    711ms ± 2%    ~       (p=0.433 n=45+49)
Compiler    3.23s ± 2%    3.22s ± 2%    ~       (p=0.124 n=47+49)
SSA         10.0s ± 1%    10.0s ± 1%    -0.43%  (p=0.000 n=48+50)
Flate       135ms ± 3%    135ms ± 2%    ~       (p=0.311 n=49+49)
GoParser    158ms ± 2%    158ms ± 2%    ~       (p=0.757 n=48+48)
Reflect     447ms ± 2%    447ms ± 2%    ~       (p=0.815 n=49+48)
Tar         189ms ± 2%    189ms ± 3%    ~       (p=0.530 n=47+49)
XML         251ms ± 3%    250ms ± 1%    -0.75%  (p=0.002 n=49+48)
[Geo mean]  427ms         426ms         -0.25%

name        old user-time/op  new user-time/op  delta
Template    265ms ± 2%        265ms ± 2%        ~       (p=0.969 n=48+50)
Unicode     119ms ± 6%        119ms ± 6%        ~       (p=0.738 n=50+50)
GoTypes     923ms ± 2%        925ms ± 2%        ~       (p=0.057 n=43+47)
Compiler    4.37s ± 2%        4.37s ± 2%        ~       (p=0.691 n=50+46)
SSA         13.4s ± 1%        13.4s ± 1%        ~       (p=0.282 n=42+49)
Flate       162ms ± 2%        162ms ± 2%        ~       (p=0.774 n=48+50)
GoParser    186ms ± 2%        186ms ± 3%        ~       (p=0.213 n=47+47)
Reflect     572ms ± 2%        573ms ± 3%        ~       (p=0.303 n=50+49)
Tar         240ms ± 3%        240ms ± 2%        ~       (p=0.939 n=46+44)
XML         302ms ± 2%        302ms ± 2%        ~       (p=0.399 n=47+47)
[Geo mean]  540ms             541ms             +0.07%

name        old alloc/op  new alloc/op  delta
Template    36.8MB ± 0%   36.7MB ± 0%   -0.42%  (p=0.008 n=5+5)
Unicode     28.1MB ± 0%   28.1MB ± 0%   ~       (p=0.151 n=5+5)
GoTypes     124MB ± 0%    124MB ± 0%    -0.26%  (p=0.008 n=5+5)
Compiler    571MB ± 0%    566MB ± 0%    -0.84%  (p=0.008 n=5+5)
SSA         1.86GB ± 0%   1.85GB ± 0%   -0.58%  (p=0.008 n=5+5)
Flate       22.8MB ± 0%   22.8MB ± 0%   -0.17%  (p=0.008 n=5+5)
GoParser    27.3MB ± 0%   27.3MB ± 0%   -0.20%  (p=0.008 n=5+5)
Reflect     79.5MB ± 0%   79.3MB ± 0%   -0.20%  (p=0.008 n=5+5)
Tar         34.7MB ± 0%   34.6MB ± 0%   -0.42%  (p=0.008 n=5+5)
XML         45.4MB ± 0%   45.3MB ± 0%   -0.29%  (p=0.008 n=5+5)
[Geo mean]  80.0MB        79.7MB        -0.34%

name        old allocs/op  new allocs/op  delta
Template    378k ± 0%      377k ± 0%      -0.22%  (p=0.008 n=5+5)
Unicode     339k ± 0%      339k ± 0%      ~       (p=0.643 n=5+5)
GoTypes     1.36M ± 0%     1.36M ± 0%     -0.10%  (p=0.008 n=5+5)
Compiler    5.51M ± 0%     5.50M ± 0%     -0.13%  (p=0.008 n=5+5)
SSA         17.5M ± 0%     17.5M ± 0%     -0.14%  (p=0.008 n=5+5)
Flate       234k ± 0%      234k ± 0%      -0.04%  (p=0.008 n=5+5)
GoParser    299k ± 0%      299k ± 0%      -0.05%  (p=0.008 n=5+5)
Reflect     978k ± 0%      979k ± 0%      +0.02%  (p=0.016 n=5+5)
Tar         351k ± 0%      351k ± 0%      -0.04%  (p=0.008 n=5+5)
XML         435k ± 0%      435k ± 0%      -0.11%  (p=0.008 n=5+5)
[Geo mean]  840k           840k           -0.08%

file       before     after      Δ        %
go         14794788   14770212   -24576   -0.166%
addr2line  4203688    4199592    -4096    -0.097%
api        5954056    5941768    -12288   -0.206%
asm        4862704    4846320    -16384   -0.337%
cgo        4778920    4770728    -8192    -0.171%
compile    24001568   23923792   -77776   -0.324%
cover      5198440    5190248    -8192    -0.158%
dist       3595248    3587056    -8192    -0.228%
doc        4618504    4610312    -8192    -0.177%
fix        3337416    3333320    -4096    -0.123%
link       6120408    6116312    -4096    -0.067%
nm         4149064    4140872    -8192    -0.197%
objdump    4555608    4547416    -8192    -0.180%
pprof      14616324   14595844   -20480   -0.140%
test2json  2766328    2762232    -4096    -0.148%
trace      11638844   11622460   -16384   -0.141%
vet        8274936    8258552    -16384   -0.198%
total      132520780  132270972  -249808  -0.189%

Change-Id: Ifcd235a2a6e5f13ed5c93e62523e2ef61321fccf Reviewed-on: https://go-review.googlesource.com/c/go/+/178197 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-08-27 | cmd/compile: run deadcode before lowered CSE | Josh Bleecher Snyder
CSE can make dead values live again. Running deadcode first avoids that; it also makes CSE more efficient.

file     before     after      Δ       %
api      5970616    5966520    -4096   -0.069%
asm      4867088    4846608    -20480  -0.421%
compile  23988320   23935072   -53248  -0.222%
link     6084376    6080280    -4096   -0.067%
nm       4165736    4161640    -4096   -0.098%
objdump  4572216    4568120    -4096   -0.090%
pprof    14452996   14457092   +4096   +0.028%
trace    11467292   11471388   +4096   +0.036%
total    132181100  132099180  -81920  -0.062%

Compiler performance impact is negligible:

name        old alloc/op  new alloc/op  delta
Template    38.8MB ± 0%   38.8MB ± 0%   -0.04%  (p=0.008 n=5+5)
Unicode     28.2MB ± 0%   28.2MB ± 0%   ~       (p=1.000 n=5+5)
GoTypes     131MB ± 0%    131MB ± 0%    -0.14%  (p=0.008 n=5+5)
Compiler    606MB ± 0%    606MB ± 0%    -0.05%  (p=0.008 n=5+5)
SSA         2.14GB ± 0%   2.13GB ± 0%   -0.26%  (p=0.008 n=5+5)
Flate       24.0MB ± 0%   24.0MB ± 0%   -0.18%  (p=0.008 n=5+5)
GoParser    28.8MB ± 0%   28.8MB ± 0%   -0.15%  (p=0.008 n=5+5)
Reflect     83.8MB ± 0%   83.7MB ± 0%   -0.11%  (p=0.008 n=5+5)
Tar         36.4MB ± 0%   36.4MB ± 0%   -0.09%  (p=0.008 n=5+5)
XML         47.9MB ± 0%   47.8MB ± 0%   -0.15%  (p=0.008 n=5+5)
[Geo mean]  84.6MB        84.5MB        -0.12%

name        old allocs/op  new allocs/op  delta
Template    379k ± 0%      380k ± 0%      +0.15%  (p=0.008 n=5+5)
Unicode     340k ± 0%      340k ± 0%      ~       (p=0.738 n=5+5)
GoTypes     1.36M ± 0%     1.36M ± 0%     +0.05%  (p=0.008 n=5+5)
Compiler    5.49M ± 0%     5.49M ± 0%     +0.12%  (p=0.008 n=5+5)
SSA         17.5M ± 0%     17.5M ± 0%     -0.18%  (p=0.008 n=5+5)
Flate       235k ± 0%      235k ± 0%      ~       (p=0.079 n=5+5)
GoParser    302k ± 0%      302k ± 0%      ~       (p=0.310 n=5+5)
Reflect     976k ± 0%      977k ± 0%      +0.08%  (p=0.008 n=5+5)
Tar         352k ± 0%      352k ± 0%      +0.12%  (p=0.008 n=5+5)
XML         436k ± 0%      436k ± 0%      -0.05%  (p=0.008 n=5+5)
[Geo mean]  842k           842k           +0.03%

Change-Id: I53e8faed1859885ca5c4a5d45067a50984f3eff1 Reviewed-on: https://go-review.googlesource.com/c/go/+/175879 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-08 | cmd/compile: add countRule rewrite rule helper | Josh Bleecher Snyder
noteRule is useful when you're trying to debug a particular rule, or get a general sense for how often a rule fires overall. It is less useful if you're trying to figure out which functions might be useful to benchmark to ascertain the impact of a newly added rule. Enter countRule. You use it like noteRule, except that you get per-function summaries.

Sample output:

	# runtime
	(*mspan).sweep: idx1=1
	evacuate_faststr: idx1=1
	evacuate_fast32: idx1=1
	evacuate: idx1=2
	evacuate_fast64: idx1=1
	sweepone: idx1=1
	purgecachedstats: idx1=1
	mProf_Free: idx1=1

This suggests that the map benchmarks might be good to run for this added rule. Change-Id: Id471c3231f1736165f2020f6979ff01c29677808 Reviewed-on: https://go-review.googlesource.com/c/go/+/167088 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-05-07 | cmd/compile: note that some rules know the name of the opt pass | Josh Bleecher Snyder
Change-Id: I4a70f4a52f84cf50f99939351319504b1c5dff76 Reviewed-on: https://go-review.googlesource.com/c/go/+/175777 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> Reviewed-by: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-04-19 | cmd/compile: move phi tighten after critical | Josh Bleecher Snyder
The phi tighten pass moves rematerializable phi args to the immediate predecessor of the phis. This reduces value lifetimes for regalloc. However, the critical edge removal pass can introduce new blocks, which can change what a block's immediate predecessor is. This can result in tightened phi args being spilled unnecessarily. This change moves the phi tighten pass after the critical edge pass, when the block structure is stable.

This improves the code generated for

	func f(s string) bool { return s == "abcde" }

Before this change:

	"".f STEXT nosplit size=44 args=0x18 locals=0x0
		0x0000 00000 (x.go:3) MOVQ "".s+16(SP), AX
		0x0005 00005 (x.go:3) CMPQ AX, $5
		0x0009 00009 (x.go:3) JNE 40
		0x000b 00011 (x.go:3) MOVQ "".s+8(SP), AX
		0x0010 00016 (x.go:3) CMPL (AX), $1684234849
		0x0016 00022 (x.go:3) JNE 36
		0x0018 00024 (x.go:3) CMPB 4(AX), $101
		0x001c 00028 (x.go:3) SETEQ AL
		0x001f 00031 (x.go:3) MOVB AL, "".~r1+24(SP)
		0x0023 00035 (x.go:3) RET
		0x0024 00036 (x.go:3) XORL AX, AX
		0x0026 00038 (x.go:3) JMP 31
		0x0028 00040 (x.go:3) XORL AX, AX
		0x002a 00042 (x.go:3) JMP 31

Observe the duplicated blocks at the end. After this change:

	"".f STEXT nosplit size=40 args=0x18 locals=0x0
		0x0000 00000 (x.go:3) MOVQ "".s+16(SP), AX
		0x0005 00005 (x.go:3) CMPQ AX, $5
		0x0009 00009 (x.go:3) JNE 36
		0x000b 00011 (x.go:3) MOVQ "".s+8(SP), AX
		0x0010 00016 (x.go:3) CMPL (AX), $1684234849
		0x0016 00022 (x.go:3) JNE 36
		0x0018 00024 (x.go:3) CMPB 4(AX), $101
		0x001c 00028 (x.go:3) SETEQ AL
		0x001f 00031 (x.go:3) MOVB AL, "".~r1+24(SP)
		0x0023 00035 (x.go:3) RET
		0x0024 00036 (x.go:3) XORL AX, AX
		0x0026 00038 (x.go:3) JMP 31

Change-Id: I12c81aa53b89456cb5809aa5396378245f3beda9 Reviewed-on: https://go-review.googlesource.com/c/go/+/172597 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
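For readers puzzling over the constants in the listings above: $5 is the string length, $1684234849 is the first four bytes "abcd" read as a little-endian 32-bit word, and $101 is the byte 'e'. The sketch below reproduces that three-step comparison by hand; `equalsABCDE` is a name made up for this example and is only a model of what the generated code checks, not compiler output.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// equalsABCDE mirrors the three comparisons in the assembly listing:
// length, then a 4-byte word, then the final byte.
func equalsABCDE(s string) bool {
	if len(s) != 5 { // CMPQ AX, $5
		return false
	}
	// CMPL (AX), $1684234849: "abcd" loaded as a little-endian uint32.
	if binary.LittleEndian.Uint32([]byte(s[:4])) != 1684234849 {
		return false
	}
	return s[4] == 101 // CMPB 4(AX), $101, where 101 is 'e'
}

func main() {
	fmt.Println(binary.LittleEndian.Uint32([]byte("abcd"))) // 1684234849
	fmt.Println(equalsABCDE("abcde"), equalsABCDE("abcdf"), equalsABCDE("abcd"))
}
```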
2019-02-26 | cmd/compile: don't crash on -d=ssa/ | Daniel Martí
I forgot how to pull up the ssa debug options help, so instead of writing -d=ssa/help, I just wrote -d=ssa/. Much to my amusement, the compiler just crashed, as shown below. Fix that.

	panic: runtime error: index out of range

	goroutine 1 [running]:
	cmd/compile/internal/ssa.PhaseOption(0x7ffc375d2b70, 0x0, 0xdbff91, 0x5, 0x1, 0x0, 0x0, 0x1, 0x1)
		/home/mvdan/tip/src/cmd/compile/internal/ssa/compile.go:327 +0x1876
	cmd/compile/internal/gc.Main(0xde7bd8)
		/home/mvdan/tip/src/cmd/compile/internal/gc/main.go:411 +0x41d0
	main.main()
		/home/mvdan/tip/src/cmd/compile/main.go:51 +0xab

Change-Id: Ia2ad394382ddf8f4498b16b5cfb49be0317fc1aa Reviewed-on: https://go-review.googlesource.com/c/154421 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
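The crash above is the familiar pattern of indexing into a string that turns out to be empty. The sketch below shows the general shape of a guard, using a hypothetical parser `splitPhaseOption`; it is not the compiler's PhaseOption function, and the leading-'~' convention in it is invented purely to give the example a reason to index the first byte.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// splitPhaseOption is a hypothetical parser for strings such as
// "prove/debug=2". It is an illustration only, not the real PhaseOption.
func splitPhaseOption(arg string) (phase, flag string, pattern bool, err error) {
	phase, flag, _ = strings.Cut(arg, "/")
	if phase == "" {
		// Without this check, phase[0] below would panic with
		// "index out of range" for an argument like "".
		return "", "", false, errors.New("missing phase name")
	}
	// Assumed convention for this sketch: a leading '~' marks a pattern.
	pattern = phase[0] == '~'
	return phase, flag, pattern, nil
}

func main() {
	fmt.Println(splitPhaseOption("prove/debug=2"))
	fmt.Println(splitPhaseOption(""))
}
```

Returning an error for the empty case lets the caller print usage help instead of crashing.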
2018-11-28 | cmd/compile: randomize value order in block for testing | Keith Randall
A little bit of compiler stress testing. Randomize the order of the values in a block before every phase. This randomization makes sure that we're not implicitly depending on that order. Currently the random seed is a hash of the function name. It provides determinism, but sacrifices some coverage. Other arrangements are possible (env var, ...) but require more setup. Fixes #20178 Change-Id: Idae792a23264bd9a3507db6ba49b6d591a608e83 Reviewed-on: https://go-review.googlesource.com/c/33909 Reviewed-by: Cherry Zhang <cherryyz@google.com>
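The trade-off described above, determinism from a name-derived seed, can be sketched in a few lines. This is an assumption-laden illustration: the helper `shuffleValues`, the use of CRC-32 as the hash, and string slices standing in for SSA values are all choices made for this example, not details taken from the compiler.

```go
package main

import (
	"fmt"
	"hash/crc32"
	"math/rand"
)

// shuffleValues permutes values in place using a seed derived from the
// function name, so the same function always gets the same order.
// CRC-32 is an arbitrary hash chosen for this sketch.
func shuffleValues(funcName string, values []string) {
	seed := int64(crc32.ChecksumIEEE([]byte(funcName)))
	rng := rand.New(rand.NewSource(seed))
	rng.Shuffle(len(values), func(i, j int) {
		values[i], values[j] = values[j], values[i]
	})
}

func main() {
	a := []string{"v1", "v2", "v3", "v4", "v5"}
	b := []string{"v1", "v2", "v3", "v4", "v5"}
	shuffleValues("main.f", a)
	shuffleValues("main.f", b)
	fmt.Println(a)
	fmt.Println(b) // same order as a: the seed depends only on the name
}
```

A failure found this way reproduces on every run, at the cost of only ever exploring one ordering per function name.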
2018-11-23 | cmd/compile: decompose composite OpArg before decomposeUser | David Chase
This makes it easier to track names of function arguments for debugging purposes. Change-Id: Ic34856fe0b910005e1c7bc051d769d489a4b158e Reviewed-on: https://go-review.googlesource.com/c/150098 Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2018-10-15 | cmd/compile: fuse before branchelim | Josh Bleecher Snyder
The branchelim pass works better after fuse. Running fuse before branchelim also increases the stability of generated code amidst other compiler changes, which was the original motivation behind this change. The fuse pass is not cheap enough to run in its entirety before branchelim, but the most important half of it is. This change makes it possible to run "plain fuse" independently and does so before branchelim. During make.bash, elimIf occurrences increase from 4244 to 4288 (1%), and elimIfElse occurrences increase from 989 to 1079 (9%). Toolspeed impact is marginal; plain fuse pays for itself.

name        old time/op   new time/op   delta
Template    189ms ± 2%    189ms ± 2%    ~       (p=0.890 n=45+46)
Unicode     93.2ms ± 5%   93.4ms ± 7%   ~       (p=0.790 n=48+48)
GoTypes     662ms ± 4%    660ms ± 4%    ~       (p=0.186 n=48+49)
Compiler    2.89s ± 4%    2.91s ± 3%    +0.89%  (p=0.050 n=49+44)
SSA         8.23s ± 2%    8.21s ± 1%    ~       (p=0.165 n=46+44)
Flate       123ms ± 4%    123ms ± 3%    +0.58%  (p=0.031 n=47+49)
GoParser    154ms ± 4%    154ms ± 4%    ~       (p=0.492 n=49+48)
Reflect     430ms ± 4%    429ms ± 4%    ~       (p=1.000 n=48+48)
Tar         171ms ± 3%    170ms ± 4%    ~       (p=0.122 n=48+48)
XML         232ms ± 3%    232ms ± 2%    ~       (p=0.850 n=46+49)
[Geo mean]  394ms         394ms         +0.02%

name        old user-time/op  new user-time/op  delta
Template    236ms ± 5%        236ms ± 4%        ~       (p=0.934 n=50+50)
Unicode     132ms ± 7%        130ms ± 9%        ~       (p=0.087 n=50+50)
GoTypes     861ms ± 3%        867ms ± 4%        ~       (p=0.124 n=48+50)
Compiler    3.93s ± 4%        3.94s ± 3%        ~       (p=0.584 n=49+44)
SSA         12.2s ± 2%        12.3s ± 1%        ~       (p=0.610 n=46+45)
Flate       149ms ± 4%        150ms ± 4%        ~       (p=0.194 n=48+49)
GoParser    193ms ± 5%        191ms ± 6%        ~       (p=0.239 n=49+50)
Reflect     553ms ± 5%        556ms ± 5%        ~       (p=0.091 n=49+49)
Tar         218ms ± 5%        218ms ± 5%        ~       (p=0.359 n=49+50)
XML         299ms ± 5%        298ms ± 4%        ~       (p=0.482 n=50+49)
[Geo mean]  516ms             516ms             -0.01%

name        old alloc/op  new alloc/op  delta
Template    36.3MB ± 0%   36.3MB ± 0%   -0.02%  (p=0.000 n=49+49)
Unicode     29.7MB ± 0%   29.7MB ± 0%   ~       (p=0.270 n=50+50)
GoTypes     126MB ± 0%    126MB ± 0%    -0.34%  (p=0.000 n=50+49)
Compiler    534MB ± 0%    531MB ± 0%    -0.50%  (p=0.000 n=50+50)
SSA         1.98GB ± 0%   1.98GB ± 0%   -0.06%  (p=0.000 n=49+49)
Flate       24.6MB ± 0%   24.6MB ± 0%   -0.29%  (p=0.000 n=50+50)
GoParser    29.5MB ± 0%   29.4MB ± 0%   -0.15%  (p=0.000 n=49+50)
Reflect     87.3MB ± 0%   87.2MB ± 0%   -0.13%  (p=0.000 n=49+50)
Tar         35.6MB ± 0%   35.5MB ± 0%   -0.17%  (p=0.000 n=50+50)
XML         48.2MB ± 0%   48.0MB ± 0%   -0.30%  (p=0.000 n=48+50)
[Geo mean]  83.1MB        82.9MB        -0.20%

name        old allocs/op  new allocs/op  delta
Template    352k ± 0%      352k ± 0%      -0.01%  (p=0.004 n=49+49)
Unicode     341k ± 0%      341k ± 0%      ~       (p=0.341 n=48+50)
GoTypes     1.28M ± 0%     1.28M ± 0%     -0.03%  (p=0.000 n=50+49)
Compiler    4.96M ± 0%     4.96M ± 0%     -0.05%  (p=0.000 n=50+49)
SSA         15.5M ± 0%     15.5M ± 0%     -0.01%  (p=0.000 n=50+49)
Flate       233k ± 0%      233k ± 0%      +0.01%  (p=0.032 n=49+49)
GoParser    294k ± 0%      294k ± 0%      ~       (p=0.052 n=46+48)
Reflect     1.04M ± 0%     1.04M ± 0%     ~       (p=0.171 n=50+47)
Tar         343k ± 0%      343k ± 0%      -0.03%  (p=0.000 n=50+50)
XML         429k ± 0%      429k ± 0%      -0.04%  (p=0.000 n=50+50)
[Geo mean]  812k           812k           -0.02%

Object files grow slightly; branchelim often increases binary size, at least on amd64.

name        old object-bytes  new object-bytes  delta
Template    509kB ± 0%        509kB ± 0%        -0.01%  (p=0.008 n=5+5)
Unicode     224kB ± 0%        224kB ± 0%        ~       (all equal)
GoTypes     1.84MB ± 0%       1.84MB ± 0%       +0.00%  (p=0.008 n=5+5)
Compiler    6.71MB ± 0%       6.71MB ± 0%       +0.01%  (p=0.008 n=5+5)
SSA         21.2MB ± 0%       21.2MB ± 0%       +0.01%  (p=0.008 n=5+5)
Flate       324kB ± 0%        324kB ± 0%        -0.00%  (p=0.008 n=5+5)
GoParser    404kB ± 0%        404kB ± 0%        -0.02%  (p=0.008 n=5+5)
Reflect     1.40MB ± 0%       1.40MB ± 0%       +0.09%  (p=0.008 n=5+5)
Tar         452kB ± 0%        452kB ± 0%        +0.06%  (p=0.008 n=5+5)
XML         596kB ± 0%        596kB ± 0%        +0.00%  (p=0.008 n=5+5)
[Geo mean]  1.04MB            1.04MB            +0.01%

Change-Id: I535c711b85380ff657fc0f022bebd9cb14ddd07f Reviewed-on: https://go-review.googlesource.com/c/129378 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2018-08-23cmd/compile: clean the output of GOSSAFUNCYury Smolsky
Since we print almost everything to ssa.html in GOSSAFUNC mode, there is a need to stop spamming stdout when the user just wants to see ssa.html.

This change cleans up the output of the GOSSAFUNC debug mode. To enable the dump of the debug data to stdout, one must put the suffix + after the function name, like this:

    GOSSAFUNC=Foo+

Otherwise gc will not print the IR and ASM to stdout after each phase. The AST IR is still sent to stdout because it is not included in ssa.html. It will be fixed in a separate change.

The change also prints the full path to the ssa.html file.

Updates #25942
Change-Id: I711e145e05f0443c7df5459ca528dced273a62ee
Reviewed-on: https://go-review.googlesource.com/126603
Run-TryBot: Yury Smolsky <yury@smolsky.by>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-07-02cmd/compile: run generic deadcode in -N modeCherry Zhang
Late opt pass may generate dead stores, which messes up store chain calculation in later passes. Run generic deadcode even in -N mode to remove them. Fixes #26163. Change-Id: I8276101717bb978d5980e6c7998f53fd8d0ae10f Reviewed-on: https://go-review.googlesource.com/121856 Run-TryBot: Cherry Zhang <cherryyz@google.com> Reviewed-by: Keith Randall <khr@golang.org>
2018-06-13cmd/compile: use expandable columns in ssa.htmlYury Smolsky
Display just a few columns in ssa.html; other columns can be expanded by clicking on a collapsed column. Use a sans serif font for the text, and a slightly smaller font size for non-program text. Fixes #25286 Change-Id: I1094695135401602d90b97b69e42f6dda05871a2 Reviewed-on: https://go-review.googlesource.com/117275 Run-TryBot: Yury Smolsky <yury@smolsky.by> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2018-05-14cmd/compile: assign and preserve statement boundaries.David Chase
A new pass run after ssa building (before any other optimization) identifies the "first" ssa node for each statement. Other "noise" nodes are tagged as being never appropriate for a statement boundary (e.g., VarKill, VarDef, Phi). Rewrite, deadcode, cse, and nilcheck are modified to move the statement boundaries forward whenever possible if a boundary-tagged ssa value is removed; never-boundary nodes are ignored in this search (some operations involving constants are also tagged as never-boundary and also ignored because they are likely to be moved or removed during optimization). Code generation treats all nodes except those explicitly marked as statement boundaries as "not statement" nodes, and floats statement boundaries to the beginning of each same-line run of instructions found within a basic block. Line number html conversion was modified to make statement boundary nodes a bit more obvious by prepending a "+". The code in fuse.go that glued together the value slices of two blocks produced a result that depended on the former capacities (not lengths) of the two slices. This causes differences in the 386 bootstrap, and also can sometimes put values into an order that does a worse job of preserving statement boundaries when values are removed. Portions of two delve tests that had caught problems were incorporated into ssa/debug_test.go. There are some opportunities to do better with optimized code, but the next-ing is not lying or overly jumpy. Over 4 CLs, compilebench geomean measured binary size increase of 3.5% and compile user time increase of 3.8% (this is after optimization to reuse a sparse map instead of creating multiple maps.) This CL worsens the optimized-debugging experience with Delve; we need to work with the delve team so that they can use the is_stmt marks that we're emitting now. The reference output changes from time to time depending on other changes in the compiler, sometimes better, sometimes worse. 
This CL now includes a test ensuring that 99+% of the lines in the Go command itself (a handy optimized binary) include is_stmt markers. Change-Id: I359c94e06843f1eb41f9da437bd614885aa9644a Reviewed-on: https://go-review.googlesource.com/102435 Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Austin Clements <austin@google.com>
2018-05-08cmd/compile: add some generic composite type optimizationsMichael Munday
Propagate values through some wide Zero/Move operations. Among other things this allows us to optimize some kinds of array initialization. For example, the following code no longer requires a temporary be allocated on the stack. Instead it writes the values directly into the return value. func f(i uint32) [4]uint32 { return [4]uint32{i, i+1, i+2, i+3} } The return value is unnecessarily cleared but removing that is probably a task for dead store analysis (I think it needs to be able to match multiple Store ops to wide Zero ops). In order to reliably remove stack variables that are rendered unnecessary by these new rules I've added a new generic version of the unread autos elimination pass. These rules are triggered more than 5000 times when building and testing the standard library. Updates #15925 (fixes for arrays of up to 4 elements). Updates #24386 (fixes for up to 4 kept elements). Updates #24416. compilebench results: name old time/op new time/op delta Template 353ms ± 5% 359ms ± 3% ~ (p=0.143 n=10+10) Unicode 219ms ± 1% 217ms ± 4% ~ (p=0.740 n=7+10) GoTypes 1.26s ± 1% 1.26s ± 2% ~ (p=0.549 n=9+10) Compiler 6.00s ± 1% 6.08s ± 1% +1.42% (p=0.000 n=9+8) SSA 15.3s ± 2% 15.6s ± 1% +2.43% (p=0.000 n=10+10) Flate 237ms ± 2% 240ms ± 2% +1.31% (p=0.015 n=10+10) GoParser 285ms ± 1% 285ms ± 1% ~ (p=0.878 n=8+8) Reflect 797ms ± 3% 807ms ± 2% ~ (p=0.065 n=9+10) Tar 334ms ± 0% 335ms ± 4% ~ (p=0.460 n=8+10) XML 419ms ± 0% 423ms ± 1% +0.91% (p=0.001 n=7+9) StdCmd 46.0s ± 0% 46.4s ± 0% +0.85% (p=0.000 n=9+9) name old user-time/op new user-time/op delta Template 337ms ± 3% 346ms ± 5% ~ (p=0.053 n=9+10) Unicode 205ms ±10% 205ms ± 8% ~ (p=1.000 n=10+10) GoTypes 1.22s ± 2% 1.21s ± 3% ~ (p=0.436 n=10+10) Compiler 5.85s ± 1% 5.93s ± 0% +1.46% (p=0.000 n=10+8) SSA 14.9s ± 1% 15.3s ± 1% +2.62% (p=0.000 n=10+10) Flate 229ms ± 4% 228ms ± 6% ~ (p=0.796 n=10+10) GoParser 271ms ± 3% 275ms ± 4% ~ (p=0.165 n=10+10) Reflect 779ms ± 5% 775ms ± 2% ~ (p=0.971 n=10+10) Tar 317ms ± 4% 319ms ± 5% 
~ (p=0.853 n=10+10) XML 404ms ± 4% 409ms ± 5% ~ (p=0.436 n=10+10) name old alloc/op new alloc/op delta Template 34.9MB ± 0% 35.0MB ± 0% +0.26% (p=0.000 n=10+10) Unicode 29.3MB ± 0% 29.3MB ± 0% +0.02% (p=0.000 n=10+10) GoTypes 115MB ± 0% 115MB ± 0% +0.30% (p=0.000 n=10+10) Compiler 519MB ± 0% 521MB ± 0% +0.30% (p=0.000 n=10+10) SSA 1.55GB ± 0% 1.57GB ± 0% +1.34% (p=0.000 n=10+9) Flate 24.1MB ± 0% 24.2MB ± 0% +0.10% (p=0.000 n=10+10) GoParser 28.1MB ± 0% 28.1MB ± 0% +0.07% (p=0.000 n=10+10) Reflect 78.7MB ± 0% 78.7MB ± 0% +0.03% (p=0.000 n=8+10) Tar 34.4MB ± 0% 34.5MB ± 0% +0.12% (p=0.000 n=10+10) XML 43.2MB ± 0% 43.2MB ± 0% +0.13% (p=0.000 n=10+10) name old allocs/op new allocs/op delta Template 330k ± 0% 330k ± 0% -0.01% (p=0.017 n=10+10) Unicode 337k ± 0% 337k ± 0% +0.01% (p=0.000 n=9+10) GoTypes 1.15M ± 0% 1.15M ± 0% +0.03% (p=0.000 n=10+10) Compiler 4.77M ± 0% 4.77M ± 0% +0.03% (p=0.000 n=9+10) SSA 12.5M ± 0% 12.6M ± 0% +1.16% (p=0.000 n=10+10) Flate 221k ± 0% 221k ± 0% +0.05% (p=0.000 n=9+10) GoParser 275k ± 0% 275k ± 0% +0.01% (p=0.014 n=10+9) Reflect 944k ± 0% 944k ± 0% -0.02% (p=0.000 n=10+10) Tar 324k ± 0% 323k ± 0% -0.12% (p=0.000 n=10+10) XML 384k ± 0% 384k ± 0% -0.01% (p=0.001 n=10+10) name old object-bytes new object-bytes delta Template 476kB ± 0% 476kB ± 0% -0.04% (p=0.000 n=10+10) Unicode 218kB ± 0% 218kB ± 0% ~ (all equal) GoTypes 1.58MB ± 0% 1.58MB ± 0% -0.04% (p=0.000 n=10+10) Compiler 6.25MB ± 0% 6.24MB ± 0% -0.09% (p=0.000 n=10+10) SSA 15.9MB ± 0% 16.1MB ± 0% +1.22% (p=0.000 n=10+10) Flate 304kB ± 0% 304kB ± 0% -0.13% (p=0.000 n=10+10) GoParser 370kB ± 0% 370kB ± 0% -0.00% (p=0.000 n=10+10) Reflect 1.27MB ± 0% 1.27MB ± 0% -0.12% (p=0.000 n=10+10) Tar 421kB ± 0% 419kB ± 0% -0.64% (p=0.000 n=10+10) XML 518kB ± 0% 517kB ± 0% -0.12% (p=0.000 n=10+10) name old export-bytes new export-bytes delta Template 16.7kB ± 0% 16.7kB ± 0% ~ (all equal) Unicode 6.52kB ± 0% 6.52kB ± 0% ~ (all equal) GoTypes 29.2kB ± 0% 29.2kB ± 0% ~ (all equal) Compiler 88.0kB ± 
0% 88.0kB ± 0% ~ (all equal) SSA 109kB ± 0% 109kB ± 0% ~ (all equal) Flate 4.49kB ± 0% 4.49kB ± 0% ~ (all equal) GoParser 8.10kB ± 0% 8.10kB ± 0% ~ (all equal) Reflect 7.71kB ± 0% 7.71kB ± 0% ~ (all equal) Tar 9.15kB ± 0% 9.15kB ± 0% ~ (all equal) XML 12.3kB ± 0% 12.3kB ± 0% ~ (all equal) name old text-bytes new text-bytes delta HelloSize 676kB ± 0% 672kB ± 0% -0.59% (p=0.000 n=10+10) CmdGoSize 7.26MB ± 0% 7.24MB ± 0% -0.18% (p=0.000 n=10+10) name old data-bytes new data-bytes delta HelloSize 10.2kB ± 0% 10.2kB ± 0% ~ (all equal) CmdGoSize 248kB ± 0% 248kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 125kB ± 0% 125kB ± 0% ~ (all equal) CmdGoSize 145kB ± 0% 145kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.46MB ± 0% 1.45MB ± 0% -0.31% (p=0.000 n=10+10) CmdGoSize 14.7MB ± 0% 14.7MB ± 0% -0.17% (p=0.000 n=10+10) Change-Id: Ic72b0c189dd542f391e1c9ab88a76e9148dc4285 Reviewed-on: https://go-review.googlesource.com/106495 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
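The array-initialization case described in this commit can be tried with a small standalone program. This is only a sketch: the function `f` is taken from the commit message, while the `main` wrapper is added here for illustration. Whether the temporary is actually avoided depends on the compiler version and target, so the generated code would need to be inspected to confirm it.

```go
package main

import "fmt"

// f is the example from the commit message: it builds a small array from
// consecutive values and returns it by value. According to the commit, the
// new generic rules let the compiler write the elements directly into the
// return value instead of going through a stack temporary.
func f(i uint32) [4]uint32 {
	return [4]uint32{i, i + 1, i + 2, i + 3}
}

func main() {
	fmt.Println(f(10)) // [10 11 12 13]
}
```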
2018-04-29cmd/compile: better formatting for ssa phases options docAlberto Donizetti
Change the help doc of

    go tool compile -d=ssa/help

from this:

    compile: GcFlag -d=ssa/<phase>/<flag>[=<value>|<function_name>]
    <phase> is one of:
    check, all, build, intrinsics, early_phielim, early_copyelim
    early_deadcode, short_circuit, decompose_user, opt, zero_arg_cse
    opt_deadcode, generic_cse, phiopt, nilcheckelim, prove, loopbce
    decompose_builtin, softfloat, late_opt, generic_deadcode, check_bce
    fuse, dse, writebarrier, insert_resched_checks, tighten, lower
    lowered_cse, elim_unread_autos, lowered_deadcode, checkLower
    late_phielim, late_copyelim, phi_tighten, late_deadcode, critical
    likelyadjust, layout, schedule, late_nilcheck, flagalloc, regalloc
    loop_rotate, stackframe, trim
    <flag> is one of on, off, debug, mem, time, test, stats, dump
    <value> defaults to 1
    <function_name> is required for "dump", specifies name of function to dump after <phase>
    Except for dump, output is directed to standard out; dump appears in a file.
    Phase "all" supports flags "time", "mem", and "dump".
    Phases "intrinsics" supports flags "on", "off", and "debug".
    Interpretation of the "debug" value depends on the phase.
    Dump files are named <phase>__<function_name>_<seq>.dump.

To this:

    compile: PhaseOptions usage:

        go tool compile -d=ssa/<phase>/<flag>[=<value>|<function_name>]

    where:

    - <phase> is one of:
        check, all, build, intrinsics, early_phielim, early_copyelim
        early_deadcode, short_circuit, decompose_user, opt, zero_arg_cse
        opt_deadcode, generic_cse, phiopt, nilcheckelim, prove
        decompose_builtin, softfloat, late_opt, generic_deadcode, check_bce
        branchelim, fuse, dse, writebarrier, insert_resched_checks, lower
        lowered_cse, elim_unread_autos, lowered_deadcode, checkLower
        late_phielim, late_copyelim, tighten, phi_tighten, late_deadcode
        critical, likelyadjust, layout, schedule, late_nilcheck, flagalloc
        regalloc, loop_rotate, stackframe, trim

    - <flag> is one of:
        on, off, debug, mem, time, test, stats, dump

    - <value> defaults to 1

    - <function_name> is required for the "dump" flag, and specifies the
      name of function to dump after <phase>

    Phase "all" supports flags "time", "mem", and "dump".
    Phase "intrinsics" supports flags "on", "off", and "debug".

    If the "dump" flag is specified, the output is written on a file named
    <phase>__<function_name>_<seq>.dump; otherwise it is directed to stdout.

Also add a few examples at the bottom.

Fixes #20349
Change-Id: I334799e951e7b27855b3ace5d2d966c4d6ec4cff
Reviewed-on: https://go-review.googlesource.com/110062
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
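To make the option grammar in the help text above concrete, here is a minimal sketch of how a string of the form ssa/<phase>/<flag>[=<value>|<function_name>] could be split into its parts. The names `phaseOption` and `parsePhaseOption` are hypothetical and are not the compiler's own code; the sketch only applies the rules the help text states (the value defaults to 1, and "dump" needs a function name).

```go
package main

import (
	"fmt"
	"strings"
)

// phaseOption is a hypothetical holder for the parts of
// ssa/<phase>/<flag>[=<value>|<function_name>].
type phaseOption struct {
	Phase string
	Flag  string
	Value string // "1" when omitted, as the help text documents
}

// parsePhaseOption splits one option string. It does not check that the
// phase or flag is one of the names listed in the help text.
func parsePhaseOption(s string) (phaseOption, error) {
	parts := strings.SplitN(s, "/", 3)
	if len(parts) != 3 || parts[0] != "ssa" {
		return phaseOption{}, fmt.Errorf("want ssa/<phase>/<flag>, got %q", s)
	}
	opt := phaseOption{Phase: parts[1], Flag: parts[2], Value: "1"}
	hasValue := false
	if i := strings.Index(parts[2], "="); i >= 0 {
		opt.Flag, opt.Value = parts[2][:i], parts[2][i+1:]
		hasValue = true
	}
	if opt.Phase == "" || opt.Flag == "" {
		return phaseOption{}, fmt.Errorf("empty phase or flag in %q", s)
	}
	if opt.Flag == "dump" && !hasValue {
		return phaseOption{}, fmt.Errorf("dump requires a function name in %q", s)
	}
	return opt, nil
}

func main() {
	for _, s := range []string{"ssa/prove/debug=2", "ssa/all/time", "ssa/lower/dump=Foo"} {
		opt, err := parsePhaseOption(s)
		fmt.Println(opt, err)
	}
}
```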
2018-04-29cmd/compile: remove loopbce passGiovanni Bajo
prove is now able to do what loopbce used to do.

Passes toolstash -cmp.

Compilebench of the whole series (master 9967582f770f6):

name      old time/op     new time/op     delta
Template  208ms ±18%      198ms ± 4%      ~        (p=0.690 n=5+5)
Unicode   99.1ms ±19%     96.5ms ± 4%     ~        (p=0.548 n=5+5)
GoTypes   623ms ± 1%      633ms ± 1%      ~        (p=0.056 n=5+5)
Compiler  2.94s ± 2%      3.02s ± 4%      ~        (p=0.095 n=5+5)
SSA       6.77s ± 1%      7.11s ± 2%      +4.94%   (p=0.008 n=5+5)
Flate     129ms ± 1%      136ms ± 0%      +4.87%   (p=0.016 n=5+4)
GoParser  152ms ± 3%      156ms ± 1%      ~        (p=0.095 n=5+5)
Reflect   380ms ± 2%      392ms ± 1%      +3.30%   (p=0.008 n=5+5)
Tar       185ms ± 6%      184ms ± 2%      ~        (p=0.690 n=5+5)
XML       223ms ± 2%      228ms ± 3%      ~        (p=0.095 n=5+5)
StdCmd    26.8s ± 2%      28.0s ± 5%      +4.46%   (p=0.032 n=5+5)

name      old user-ns/op  new user-ns/op  delta
Template  252M ± 5%       248M ± 3%       ~        (p=1.000 n=5+5)
Unicode   118M ± 7%       121M ± 4%       ~        (p=0.548 n=5+5)
GoTypes   790M ± 2%       793M ± 2%       ~        (p=0.690 n=5+5)
Compiler  3.78G ± 3%      3.91G ± 4%      ~        (p=0.056 n=5+5)
SSA       8.98G ± 2%      9.52G ± 3%      +6.08%   (p=0.008 n=5+5)
Flate     155M ± 1%       160M ± 0%       +3.47%   (p=0.016 n=5+4)
GoParser  185M ± 4%       187M ± 2%       ~        (p=0.310 n=5+5)
Reflect   469M ± 1%       481M ± 1%       +2.52%   (p=0.016 n=5+5)
Tar       222M ± 4%       222M ± 2%       ~        (p=0.841 n=5+5)
XML       269M ± 1%       274M ± 2%       +1.88%   (p=0.032 n=5+5)

name       old text-bytes  new text-bytes  delta
HelloSize  664k ± 0%       664k ± 0%       ~        (all equal)
CmdGoSize  7.23M ± 0%      7.22M ± 0%      -0.06%   (p=0.008 n=5+5)

name       old data-bytes  new data-bytes  delta
HelloSize  134k ± 0%       134k ± 0%       ~        (all equal)
CmdGoSize  390k ± 0%       390k ± 0%       ~        (all equal)

name       old exe-bytes   new exe-bytes   delta
HelloSize  1.39M ± 0%      1.39M ± 0%      ~        (all equal)
CmdGoSize  14.4M ± 0%      14.4M ± 0%      -0.06%   (p=0.008 n=5+5)

Go1 of the whole series:

name                      old time/op  new time/op  delta
BinaryTree17-16           5.40s ± 6%   5.38s ± 4%   ~        (p=1.000 n=12+10)
Fannkuch11-16             4.04s ± 3%   3.81s ± 3%   -5.70%   (p=0.000 n=11+11)
FmtFprintfEmpty-16        60.7ns ± 2%  60.2ns ± 3%  ~        (p=0.136 n=11+10)
FmtFprintfString-16       115ns ± 2%   114ns ± 4%   ~        (p=0.175 n=11+10)
FmtFprintfInt-16          118ns ± 2%   125ns ± 2%   +5.76%   (p=0.000 n=11+10)
FmtFprintfIntInt-16       196ns ± 2%   204ns ± 3%   +4.42%   (p=0.000 n=10+11)
FmtFprintfPrefixedInt-16  207ns ± 2%   214ns ± 2%   +3.23%   (p=0.000 n=10+11)
FmtFprintfFloat-16        364ns ± 3%   357ns ± 2%   -1.88%   (p=0.002 n=11+11)
FmtManyArgs-16            773ns ± 2%   775ns ± 1%   ~        (p=0.457 n=11+10)
GobDecode-16              11.2ms ± 4%  11.0ms ± 3%  -1.51%   (p=0.022 n=10+9)
GobEncode-16              9.91ms ± 6%  9.81ms ± 5%  ~        (p=0.699 n=11+11)
Gzip-16                   339ms ± 1%   338ms ± 1%   ~        (p=0.438 n=11+11)
Gunzip-16                 64.4ms ± 1%  65.2ms ± 1%  +1.28%   (p=0.001 n=10+11)
HTTPClientServer-16       157µs ± 7%   160µs ± 5%   ~        (p=0.133 n=11+11)
JSONEncode-16             22.3ms ± 4%  23.2ms ± 4%  +3.79%   (p=0.000 n=11+11)
JSONDecode-16             96.7ms ± 3%  96.6ms ± 1%  ~        (p=0.562 n=11+11)
Mandelbrot200-16          6.42ms ± 1%  6.40ms ± 1%  ~        (p=0.365 n=11+11)
GoParse-16                5.59ms ± 7%  5.42ms ± 5%  -3.07%   (p=0.020 n=11+10)
RegexpMatchEasy0_32-16    113ns ± 2%   113ns ± 3%   ~        (p=0.968 n=11+10)
RegexpMatchEasy0_1K-16    417ns ± 1%   416ns ± 3%   ~        (p=0.742 n=11+10)
RegexpMatchEasy1_32-16    106ns ± 1%   107ns ± 3%   ~        (p=0.223 n=11+11)
RegexpMatchEasy1_1K-16    654ns ± 2%   657ns ± 1%   ~        (p=0.672 n=11+8)
RegexpMatchMedium_32-16   176ns ± 3%   177ns ± 1%   ~        (p=0.664 n=11+9)
RegexpMatchMedium_1K-16   56.3µs ± 3%  56.7µs ± 3%  ~        (p=0.171 n=11+11)
RegexpMatchHard_32-16     2.83µs ± 5%  2.83µs ± 4%  ~        (p=0.735 n=11+11)
RegexpMatchHard_1K-16     82.7µs ± 2%  82.7µs ± 2%  ~        (p=0.853 n=10+10)
Revcomp-16                679ms ± 9%   782ms ±29%   +15.16%  (p=0.031 n=9+11)
Template-16               118ms ± 1%   109ms ± 2%   -7.49%   (p=0.000 n=11+11)
TimeParse-16              474ns ± 1%   462ns ± 1%   -2.59%   (p=0.000 n=11+11)
TimeFormat-16             482ns ± 1%   494ns ± 1%   +2.49%   (p=0.000 n=10+11)

name                      old speed      new speed      delta
GobDecode-16              68.7MB/s ± 4%  69.8MB/s ± 3%  +1.52%   (p=0.022 n=10+9)
GobEncode-16              77.6MB/s ± 6%  78.3MB/s ± 5%  ~        (p=0.699 n=11+11)
Gzip-16                   57.2MB/s ± 1%  57.3MB/s ± 1%  ~        (p=0.428 n=11+11)
Gunzip-16                 301MB/s ± 2%   298MB/s ± 1%   -1.07%   (p=0.007 n=11+11)
JSONEncode-16             86.9MB/s ± 4%  83.7MB/s ± 4%  -3.63%   (p=0.000 n=11+11)
JSONDecode-16             20.1MB/s ± 3%  20.1MB/s ± 1%  ~        (p=0.529 n=11+11)
GoParse-16                10.4MB/s ± 6%  10.7MB/s ± 4%  +3.12%   (p=0.020 n=11+10)
RegexpMatchEasy0_32-16    282MB/s ± 2%   282MB/s ± 3%   ~        (p=0.756 n=11+10)
RegexpMatchEasy0_1K-16    2.45GB/s ± 1%  2.46GB/s ± 2%  ~        (p=0.705 n=11+10)
RegexpMatchEasy1_32-16    299MB/s ± 1%   297MB/s ± 2%   ~        (p=0.151 n=11+11)
RegexpMatchEasy1_1K-16    1.56GB/s ± 2%  1.56GB/s ± 1%  ~        (p=0.717 n=11+8)
RegexpMatchMedium_32-16   5.67MB/s ± 4%  5.63MB/s ± 1%  ~        (p=0.538 n=11+9)
RegexpMatchMedium_1K-16   18.2MB/s ± 3%  18.1MB/s ± 3%  ~        (p=0.156 n=11+11)
RegexpMatchHard_32-16     11.3MB/s ± 5%  11.3MB/s ± 4%  ~        (p=0.711 n=11+11)
RegexpMatchHard_1K-16     12.4MB/s ± 1%  12.4MB/s ± 2%  ~        (p=0.535 n=9+10)
Revcomp-16                370MB/s ± 5%   332MB/s ±24%   ~        (p=0.062 n=8+11)
Template-16               16.5MB/s ± 1%  17.8MB/s ± 2%  +8.11%   (p=0.000 n=11+11)

Change-Id: I41e46f375ee127785c6491f7ef5bd35581261ae6
Reviewed-on: https://go-review.googlesource.com/104039
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
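For readers unfamiliar with what loopbce covered, the sketch below shows the kind of loop it was concerned with: an index that runs from 0 up to len(a), so the access a[i] can never be out of range. The function name `sum` is illustrative. Whether the bounds check is actually removed depends on the compiler version; one way to look, assuming the check_bce phase and debug flag listed in the ssa help text elsewhere in this log, is to build with -gcflags=-d=ssa/check_bce/debug=1 and see which checks are reported as remaining.

```go
package main

import "fmt"

// sum indexes the slice with an induction variable that is bounded by
// len(a), which is the pattern a bounds-check-elimination pass can prove
// to be safe.
func sum(a []int) int {
	s := 0
	for i := 0; i < len(a); i++ {
		s += a[i]
	}
	return s
}

func main() {
	fmt.Println(sum([]int{1, 2, 3})) // 6
}
```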
2018-02-27cmd/compile: tighten after loweringJosh Bleecher Snyder
Moving tighten after lowering benefits from the removal of values by lowering and lowered CSE. It lets us make better decisions about which values are rematerializable and which generate flags. Empirically, it lowers stack usage (by avoiding spills) and generates slightly smaller and faster binaries. Fixes #19853 Fixes #21041 name old time/op new time/op delta Template 195ms ± 4% 193ms ± 4% -1.33% (p=0.000 n=92+97) Unicode 94.1ms ± 9% 92.5ms ± 8% -1.66% (p=0.002 n=97+95) GoTypes 572ms ± 5% 566ms ± 7% -0.92% (p=0.001 n=95+98) Compiler 2.56s ± 4% 2.52s ± 3% -1.41% (p=0.000 n=94+97) SSA 6.52s ± 2% 6.47s ± 3% -0.82% (p=0.000 n=96+94) Flate 117ms ± 5% 116ms ± 7% -0.72% (p=0.018 n=97+97) GoParser 148ms ± 6% 146ms ± 4% -0.97% (p=0.002 n=98+95) Reflect 370ms ± 7% 363ms ± 6% -1.79% (p=0.000 n=99+98) Tar 175ms ± 6% 173ms ± 6% -1.11% (p=0.001 n=94+95) XML 204ms ± 6% 201ms ± 5% -1.49% (p=0.000 n=97+96) [Geo mean] 363ms 359ms -1.22% name old user-time/op new user-time/op delta Template 251ms ± 5% 245ms ± 5% -2.40% (p=0.000 n=97+93) Unicode 131ms ±10% 128ms ± 9% -1.93% (p=0.001 n=100+99) GoTypes 760ms ± 4% 752ms ± 4% -0.96% (p=0.000 n=97+95) Compiler 3.51s ± 3% 3.48s ± 2% -1.04% (p=0.000 n=96+95) SSA 9.57s ± 4% 9.52s ± 2% -0.50% (p=0.004 n=97+96) Flate 149ms ± 6% 147ms ± 6% -1.46% (p=0.000 n=98+96) GoParser 184ms ± 5% 181ms ± 7% -1.84% (p=0.000 n=98+97) Reflect 469ms ± 6% 461ms ± 6% -1.69% (p=0.000 n=100+98) Tar 219ms ± 8% 217ms ± 7% -0.90% (p=0.035 n=96+96) XML 255ms ± 5% 251ms ± 6% -1.48% (p=0.000 n=98+98) [Geo mean] 476ms 469ms -1.42% name old alloc/op new alloc/op delta Template 37.8MB ± 0% 37.8MB ± 0% -0.17% (p=0.000 n=100+100) Unicode 28.8MB ± 0% 28.8MB ± 0% -0.02% (p=0.000 n=100+95) GoTypes 112MB ± 0% 112MB ± 0% -0.20% (p=0.000 n=100+97) Compiler 466MB ± 0% 464MB ± 0% -0.27% (p=0.000 n=100+100) SSA 1.49GB ± 0% 1.49GB ± 0% -0.08% (p=0.000 n=100+99) Flate 24.4MB ± 0% 24.3MB ± 0% -0.25% (p=0.000 n=98+99) GoParser 30.7MB ± 0% 30.6MB ± 0% -0.26% (p=0.000 n=99+100) Reflect 
76.4MB ± 0% 76.4MB ± 0% ~ (p=0.253 n=100+100) Tar 38.9MB ± 0% 38.8MB ± 0% -0.20% (p=0.000 n=100+97) XML 41.5MB ± 0% 41.4MB ± 0% -0.19% (p=0.000 n=100+98) [Geo mean] 77.5MB 77.4MB -0.16% name old allocs/op new allocs/op delta Template 381k ± 0% 381k ± 0% -0.15% (p=0.000 n=100+100) Unicode 342k ± 0% 342k ± 0% -0.01% (p=0.000 n=100+98) GoTypes 1.19M ± 0% 1.18M ± 0% -0.24% (p=0.000 n=100+100) Compiler 4.52M ± 0% 4.50M ± 0% -0.29% (p=0.000 n=100+100) SSA 12.3M ± 0% 12.3M ± 0% -0.11% (p=0.000 n=100+100) Flate 234k ± 0% 234k ± 0% -0.26% (p=0.000 n=99+96) GoParser 318k ± 0% 317k ± 0% -0.21% (p=0.000 n=99+100) Reflect 974k ± 0% 974k ± 0% -0.03% (p=0.000 n=100+100) Tar 392k ± 0% 391k ± 0% -0.17% (p=0.000 n=100+99) XML 404k ± 0% 403k ± 0% -0.24% (p=0.000 n=99+99) [Geo mean] 794k 792k -0.17% name old object-bytes new object-bytes delta Template 393kB ± 0% 392kB ± 0% -0.19% (p=0.008 n=5+5) Unicode 207kB ± 0% 207kB ± 0% ~ (all equal) GoTypes 1.23MB ± 0% 1.22MB ± 0% -0.11% (p=0.008 n=5+5) Compiler 4.34MB ± 0% 4.33MB ± 0% -0.15% (p=0.008 n=5+5) SSA 9.85MB ± 0% 9.85MB ± 0% -0.07% (p=0.008 n=5+5) Flate 235kB ± 0% 234kB ± 0% -0.59% (p=0.008 n=5+5) GoParser 297kB ± 0% 296kB ± 0% -0.22% (p=0.008 n=5+5) Reflect 1.03MB ± 0% 1.03MB ± 0% -0.00% (p=0.008 n=5+5) Tar 332kB ± 0% 331kB ± 0% -0.15% (p=0.008 n=5+5) XML 413kB ± 0% 412kB ± 0% -0.19% (p=0.008 n=5+5) [Geo mean] 728kB 727kB -0.17% Change-Id: I9b5cdb668ed102a001897a05e833105acba220a2 Reviewed-on: https://go-review.googlesource.com/95995 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2018-02-20cmd/compile/internal/ssa: emit csel on arm64philhofer
Introduce a new SSA pass to generate CondSelect instructions, and add CondSelect lowering rules for arm64.

In order to make the CSEL instruction easier to optimize, and to simplify the introduction of CSNEG, CSINC, and CSINV in the future, modify the CSEL instruction to accept a condition code in the aux field.

Notably, this change makes the go1 Gzip benchmark more than 10% faster.

Benchmarks on a Cavium ThunderX:

name                      old time/op  new time/op  delta
BinaryTree17-96           15.9s ± 6%   16.0s ± 4%   ~        (p=0.968 n=10+9)
Fannkuch11-96             7.17s ± 0%   7.00s ± 0%   -2.43%   (p=0.000 n=8+9)
FmtFprintfEmpty-96        208ns ± 1%   207ns ± 0%   ~        (p=0.152 n=10+8)
FmtFprintfString-96       379ns ± 0%   375ns ± 0%   -0.95%   (p=0.000 n=10+9)
FmtFprintfInt-96          385ns ± 0%   383ns ± 0%   -0.52%   (p=0.000 n=9+10)
FmtFprintfIntInt-96       591ns ± 0%   586ns ± 0%   -0.85%   (p=0.006 n=7+9)
FmtFprintfPrefixedInt-96  656ns ± 0%   667ns ± 0%   +1.71%   (p=0.000 n=10+10)
FmtFprintfFloat-96        967ns ± 0%   984ns ± 0%   +1.78%   (p=0.000 n=10+10)
FmtManyArgs-96            2.35µs ± 0%  2.25µs ± 0%  -4.63%   (p=0.000 n=9+8)
GobDecode-96              31.0ms ± 0%  30.8ms ± 0%  -0.36%   (p=0.006 n=9+9)
GobEncode-96              24.4ms ± 0%  24.5ms ± 0%  +0.30%   (p=0.000 n=9+9)
Gzip-96                   1.60s ± 0%   1.43s ± 0%   -10.58%  (p=0.000 n=9+10)
Gunzip-96                 167ms ± 0%   169ms ± 0%   +0.83%   (p=0.000 n=8+9)
HTTPClientServer-96       311µs ± 1%   308µs ± 0%   -0.75%   (p=0.000 n=10+10)
JSONEncode-96             65.0ms ± 0%  64.8ms ± 0%  -0.25%   (p=0.000 n=9+8)
JSONDecode-96             262ms ± 1%   261ms ± 1%   ~        (p=0.579 n=10+10)
Mandelbrot200-96          18.0ms ± 0%  18.1ms ± 0%  +0.17%   (p=0.000 n=8+10)
GoParse-96                14.0ms ± 0%  14.1ms ± 1%  +0.42%   (p=0.003 n=9+10)
RegexpMatchEasy0_32-96    644ns ± 2%   645ns ± 2%   ~        (p=0.836 n=10+10)
RegexpMatchEasy0_1K-96    3.70µs ± 0%  3.49µs ± 0%  -5.58%   (p=0.000 n=10+10)
RegexpMatchEasy1_32-96    662ns ± 2%   657ns ± 2%   ~        (p=0.137 n=10+10)
RegexpMatchEasy1_1K-96    4.47µs ± 0%  4.31µs ± 0%  -3.48%   (p=0.000 n=10+10)
RegexpMatchMedium_32-96   844ns ± 2%   849ns ± 1%   ~        (p=0.208 n=10+10)
RegexpMatchMedium_1K-96   179µs ± 0%   182µs ± 0%   +1.20%   (p=0.000 n=10+10)
RegexpMatchHard_32-96     10.0µs ± 0%  10.1µs ± 0%  +0.48%   (p=0.000 n=10+9)
RegexpMatchHard_1K-96     297µs ± 0%   297µs ± 0%   -0.14%   (p=0.000 n=10+10)
Revcomp-96                3.08s ± 0%   3.13s ± 0%   +1.56%   (p=0.000 n=9+9)
Template-96               276ms ± 2%   275ms ± 1%   ~        (p=0.393 n=10+10)
TimeParse-96              1.37µs ± 0%  1.36µs ± 0%  -0.53%   (p=0.000 n=10+7)
TimeFormat-96             1.40µs ± 0%  1.42µs ± 0%  +0.97%   (p=0.000 n=10+10)
[Geo mean]                264µs        262µs        -0.77%

Change-Id: Ie54eee4b3092af53e6da3baa6d1755098f57f3a2
Reviewed-on: https://go-review.googlesource.com/55670
Run-TryBot: Philip Hofer <phofer@umich.edu>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
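As an illustration of the source shape a conditional-select pass is aimed at, the sketch below uses a small "if" whose only effect is to pick between two already-computed values. The function name `pick` is hypothetical. Whether the compiler actually emits CSEL for it is an assumption that depends on the target architecture and compiler version; the commit above only states that the pass and the arm64 lowering rules were added.

```go
package main

import "fmt"

// pick chooses between two values with a branch that has no side effects.
// A branch of this shape can in principle be replaced by a single
// conditional-select instruction instead of a jump.
func pick(cond bool, a, b int) int {
	r := a
	if cond {
		r = b
	}
	return r
}

func main() {
	fmt.Println(pick(true, 1, 2), pick(false, 1, 2)) // 2 1
}
```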
2017-12-14cmd/compile/internal/ssa: group dump files alphabeticallyGeoff Berry
Change dump file names to group them alphabetically in directory listings, in pass run order. Change-Id: I8070578a5b4a3a7983dcc527ea1cfdb10a6d7d24 Reviewed-on: https://go-review.googlesource.com/83958 Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-11-30cmd/compile: use soft-float routines for soft-float targetsVladimir Stefanovic
Updates #18162 (mostly fixes) Change-Id: I35bcb8a688bdaa432adb0ddbb73a2f7adda47b9e Reviewed-on: https://go-review.googlesource.com/37958 Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-11-05cmd/compile: repair name propagation into aggregate partsDavid Chase
For structs, slices, strings, interfaces, etc, propagation of names to their components (e.g., complex.real, complex.imag) is fragile (depends on phase ordering) and not done right for the "dec" pass. The dec pass is subsumed into decomposeBuiltin, and then names are pushed into the args of all OpFooMake opcodes. compile/ssa/debug_test.go was fixed to pay attention to variable values, and the reference files include checks for the fixes in this CL (which make debugging better). Change-Id: Ic2591ebb1698d78d07292b92c53667e6c37fa0cd Reviewed-on: https://go-review.googlesource.com/73210 Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Heschi Kreinick <heschi@google.com>
2017-08-24cmd/compile: eliminate stores to unread auto variablesMichael Munday
This is a crude compiler pass to eliminate stores to auto variables that are only ever written to.

Eliminates an unnecessary store to x from the following code:

func f() int {
	x := 1
	return *(&x)
}

Fixes #19765.

Change-Id: If2c63a8ae67b8c590b6e0cc98a9610939a3eeffa
Reviewed-on: https://go-review.googlesource.com/38746
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-04-29cmd/compile: move writebarrier pass after dseJosh Bleecher Snyder
This avoids generating writeBarrier.enabled blocks for dead stores. Change-Id: Ib11d8e2ba952f3f1f01d16776e40a7200a7683cf Reviewed-on: https://go-review.googlesource.com/42012 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-04-24cmd/compile: rotate loops so conditional branch is at the endKeith Randall
Old loops look like this:

   loop:
     CMPQ ...
     JGE exit
     ...
     JMP loop
   exit:

New loops look like this:

     JMP entry
   loop:
     ...
   entry:
     CMPQ ...
     JLT loop

This removes one instruction (the unconditional jump) from the inner loop. Kinda surprisingly, it matters.

This is a bit different than the peeling that the old obj library did in that we don't duplicate the loop exit test. We just jump to the test. I'm not sure if it is better or worse to do that (peeling gets rid of the JMP but means more code duplication), but this CL is certainly a much simpler compiler change, so I'll try this way first.

The obj library used to do peeling before CL https://go-review.googlesource.com/c/36205 turned it off.

Fixes #15837 (remove obj instruction reordering)
The reordering is already removed, this CL implements the only part of that reordering that we'd like to keep.

Fixes #14758 (append loop)
name    old time/op  new time/op  delta
Foo-12  817ns ± 4%   538ns ± 0%   -34.08%  (p=0.000 n=10+9)
Bar-12  850ns ±11%   570ns ±13%   -32.88%  (p=0.000 n=10+10)

Update #19595 (BLAS slowdown)
name                       old time/op  new time/op  delta
DgemvMedMedNoTransIncN-12  13.2µs ± 9%  10.2µs ± 1%  -22.26%  (p=0.000 n=9+9)

Fixes #19633 (append loop)
name    old time/op  new time/op  delta
Foo-12  810ns ± 1%   540ns ± 0%   -33.30%  (p=0.000 n=8+9)

Update #18977 (Fannkuch11 regression)
name          old time/op  new time/op  delta
Fannkuch11-8  2.80s ± 0%   3.01s ± 0%   +7.47%  (p=0.000 n=9+10)

This one makes no sense. There's strictly 1 less instruction in the inner loop (17 instead of 18). They are exactly the same instructions except for the JMP that has been elided.

go1 benchmarks generally don't look very impressive. But the gains for the specific issues above make this CL still probably worth it.
name old time/op new time/op delta BinaryTree17-8 2.32s ± 0% 2.34s ± 0% +1.14% (p=0.000 n=9+7) Fannkuch11-8 2.80s ± 0% 3.01s ± 0% +7.47% (p=0.000 n=9+10) FmtFprintfEmpty-8 44.1ns ± 1% 46.1ns ± 1% +4.53% (p=0.000 n=10+10) FmtFprintfString-8 67.8ns ± 0% 74.4ns ± 1% +9.80% (p=0.000 n=10+9) FmtFprintfInt-8 74.9ns ± 0% 78.4ns ± 0% +4.67% (p=0.000 n=8+10) FmtFprintfIntInt-8 117ns ± 1% 123ns ± 1% +4.69% (p=0.000 n=9+10) FmtFprintfPrefixedInt-8 160ns ± 1% 146ns ± 0% -8.22% (p=0.000 n=8+10) FmtFprintfFloat-8 214ns ± 0% 206ns ± 0% -3.91% (p=0.000 n=8+8) FmtManyArgs-8 468ns ± 0% 497ns ± 1% +6.09% (p=0.000 n=8+10) GobDecode-8 6.16ms ± 0% 6.21ms ± 1% +0.76% (p=0.000 n=9+10) GobEncode-8 4.90ms ± 0% 4.92ms ± 1% +0.37% (p=0.028 n=9+10) Gzip-8 209ms ± 0% 212ms ± 0% +1.33% (p=0.000 n=10+10) Gunzip-8 36.6ms ± 0% 38.0ms ± 1% +4.03% (p=0.000 n=9+9) HTTPClientServer-8 84.2µs ± 0% 86.0µs ± 1% +2.14% (p=0.000 n=9+9) JSONEncode-8 13.6ms ± 3% 13.8ms ± 1% +1.55% (p=0.003 n=9+10) JSONDecode-8 53.2ms ± 5% 52.9ms ± 0% ~ (p=0.280 n=10+10) Mandelbrot200-8 3.78ms ± 0% 3.78ms ± 1% ~ (p=0.661 n=10+9) GoParse-8 2.89ms ± 0% 2.94ms ± 2% +1.50% (p=0.000 n=10+10) RegexpMatchEasy0_32-8 68.5ns ± 2% 68.9ns ± 1% ~ (p=0.136 n=10+10) RegexpMatchEasy0_1K-8 220ns ± 1% 225ns ± 1% +2.41% (p=0.000 n=10+10) RegexpMatchEasy1_32-8 64.7ns ± 0% 64.5ns ± 0% -0.28% (p=0.042 n=10+10) RegexpMatchEasy1_1K-8 348ns ± 1% 355ns ± 0% +1.90% (p=0.000 n=10+10) RegexpMatchMedium_32-8 102ns ± 1% 105ns ± 1% +2.95% (p=0.000 n=10+10) RegexpMatchMedium_1K-8 33.1µs ± 3% 32.5µs ± 0% -1.75% (p=0.000 n=10+10) RegexpMatchHard_32-8 1.71µs ± 1% 1.70µs ± 1% -0.84% (p=0.002 n=10+9) RegexpMatchHard_1K-8 51.1µs ± 0% 50.8µs ± 1% -0.48% (p=0.004 n=10+10) Revcomp-8 411ms ± 1% 402ms ± 0% -2.22% (p=0.000 n=10+9) Template-8 61.8ms ± 1% 59.7ms ± 0% -3.44% (p=0.000 n=9+9) TimeParse-8 306ns ± 0% 318ns ± 0% +3.83% (p=0.000 n=10+10) TimeFormat-8 320ns ± 0% 318ns ± 1% -0.53% (p=0.012 n=7+10) Change-Id: Ifaf29abbe5874e437048e411ba8f7cfbc9e1c94b Reviewed-on: 
https://go-review.googlesource.com/38431 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
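The two loop layouts in the commit above can be mimicked at the source level with labels and goto, which may make the control flow easier to follow. This is only an illustration: the function names `sumTopTest` and `sumRotated` are hypothetical, and the real transformation happens on SSA blocks inside the compiler rather than on Go source. Note how the rotated form executes one conditional jump per iteration, while the top-test form executes a conditional jump and an unconditional one.

```go
package main

import "fmt"

// sumTopTest follows the old layout: the exit test sits at the top of the
// loop and every iteration ends with an unconditional jump back to it.
func sumTopTest(xs []int) int {
	i, s := 0, 0
loop:
	if i >= len(xs) {
		goto exit
	}
	s += xs[i]
	i++
	goto loop
exit:
	return s
}

// sumRotated follows the new layout: one jump to the test before the loop
// starts, then a single conditional jump at the bottom of each iteration.
func sumRotated(xs []int) int {
	i, s := 0, 0
	goto entry
loop:
	s += xs[i]
	i++
entry:
	if i < len(xs) {
		goto loop
	}
	return s
}

func main() {
	xs := []int{1, 2, 3}
	fmt.Println(sumTopTest(xs), sumRotated(xs)) // 6 6
}
```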
2017-04-19cmd/internal/objabi: extract shared functionality from objMatthew Dempsky
Now only cmd/asm and cmd/compile depend on cmd/internal/obj. Changing the assembler backends no longer requires reinstalling cmd/link or cmd/addr2line. There's also now one canonical definition of the object file format in cmd/internal/objabi/doc.go, with a warning to update all three implementations. objabi is still something of a grab bag of unrelated code (e.g., flag and environment variable handling probably belong in a separate "tool" package), but this is still progress. Fixes #15165. Fixes #20026. Change-Id: Ic4b92fac7d0d35438e0d20c9579aad4085c5534c Reviewed-on: https://go-review.googlesource.com/40972 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-03-17cmd/compile: move Frontend field from ssa.Config to ssa.FuncJosh Bleecher Snyder
Suggested by mdempsky in CL 38232. This allows us to use the Frontend field to associate frontend state and information with a function. See the following CL in the series for examples. This is a giant CL, but it is almost entirely routine refactoring. The ssa test API is starting to feel a bit unwieldy. I will clean it up separately, once the dust has settled. Passes toolstash -cmp. Updates #15756 Change-Id: I71c573bd96ff7251935fce1391b06b1f133c3caf Reviewed-on: https://go-review.googlesource.com/38327 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-03-17cmd/compile: rearrange fields between ssa.Func, ssa.Cache, and ssa.ConfigJosh Bleecher Snyder
This makes ssa.Func, ssa.Cache, and ssa.Config fulfill the roles laid out for them in CL 38160. The only non-trivial change in this CL is how cached values and blocks get IDs. Prior to this CL, their IDs were assigned as part of resetting the cache, and only modified IDs were reset. This required knowing how many values and blocks were modified, which required a tight coupling between ssa.Func and ssa.Config. To eliminate that coupling, we now zero values and blocks during reset, and assign their IDs when they are used. Since unused values and blocks have ID == 0, we can efficiently find the last used value/block, to avoid zeroing everything. Bulk zeroing is efficient, but not efficient enough to obviate the need to avoid zeroing everything every time. As a happy side-effect, ssa.Func.Free is no longer necessary. DebugHashMatch and friends now belong in func.go. They have been left in place for clarity and review. I will move them in a subsequent CL. Passes toolstash -cmp. No compiler performance impact. No change in 'go test cmd/compile/internal/ssa' execution time. Change-Id: I2eb7af58da067ef6a36e815a6f386cfe8634d098 Reviewed-on: https://go-review.googlesource.com/38167 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
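The reset scheme described in this commit (zero on reset, assign IDs on use, find the used prefix through ID == 0) can be sketched as follows. All type and function names here (`value`, `cache`, `fn`, `newValue`, `reset`) are hypothetical stand-ins, not the compiler's definitions, and the use of a binary search to find the first unused slot is an assumption about how "efficiently find the last used value" might be done.

```go
package main

import (
	"fmt"
	"sort"
)

// value is a stand-in for a cached SSA value. ID 0 means the slot has not
// been handed out since the last reset.
type value struct {
	ID  int
	Arg int
}

// cache owns preallocated storage that is reused across functions.
type cache struct {
	values [8]value
}

// fn is a stand-in for a function being compiled with a shared cache.
type fn struct {
	c      *cache
	nextID int
}

func newFn(c *cache) *fn { return &fn{c: c, nextID: 1} }

// newValue takes the next cached slot if one is left and assigns its ID at
// the time of use, so the cache itself never needs to know about IDs.
func (f *fn) newValue(arg int) *value {
	var v *value
	if i := f.nextID - 1; i < len(f.c.values) {
		v = &f.c.values[i]
	} else {
		v = new(value) // cache exhausted: fall back to the heap
	}
	v.ID = f.nextID
	v.Arg = arg
	f.nextID++
	return v
}

// reset zeroes only the used prefix. Used slots are contiguous and have
// ID != 0, so a binary search finds the first unused slot. It returns the
// number of slots that were zeroed.
func (c *cache) reset() int {
	n := sort.Search(len(c.values), func(i int) bool { return c.values[i].ID == 0 })
	for i := 0; i < n; i++ {
		c.values[i] = value{}
	}
	return n
}

func main() {
	c := &cache{}
	f := newFn(c)
	f.newValue(10)
	f.newValue(20)
	f.newValue(30)
	fmt.Println(c.reset()) // 3
}
```

Because the count of used slots is recovered from the storage itself, the cache does not need to be told how many values the function consumed, which is the decoupling the commit message describes.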
2017-03-03cmd/compile: accept string debug flagsAustin Clements
The compiler's -d flag accepts string-valued flags, but currently only for SSA debug flags. Extend it to support string values for other flags. This also makes the syntax somewhat more sane so flag=value and flag:value now both accept integers and strings. Change-Id: Idd144d8479a430970cc1688f824bffe0a56ed2df Reviewed-on: https://go-review.googlesource.com/37345 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
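As a rough illustration of the syntax described above, the sketch below parses a comma-separated debug string in which each entry may be `name`, `name=value`, or `name:value`, and the value may be an integer or a string. The `debugSetting` type, the `parseDebug` function, and the flag names in `main` are hypothetical; treating a bare name as the integer 1 is an assumption of this sketch, and none of it is the compiler's actual parser.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// debugSetting is a hypothetical parsed form of one debug entry.
type debugSetting struct {
	Name  string
	Str   string // raw value text
	Int   int    // numeric value, meaningful only when IsInt is true
	IsInt bool
}

// parseDebug splits arg on commas and accepts both "=" and ":" as the
// separator between a flag name and its value.
func parseDebug(arg string) []debugSetting {
	var out []debugSetting
	for _, item := range strings.Split(arg, ",") {
		if item == "" {
			continue
		}
		// Assumption: a bare name means "on", represented as 1.
		s := debugSetting{Name: item, Str: "1", Int: 1, IsInt: true}
		if i := strings.IndexAny(item, "=:"); i >= 0 {
			s.Name, s.Str = item[:i], item[i+1:]
			n, err := strconv.Atoi(s.Str)
			s.Int, s.IsInt = n, err == nil
		}
		out = append(out, s)
	}
	return out
}

func main() {
	// The flag names below are made up for the example.
	for _, s := range parseDebug("alpha=2,beta:text,gamma") {
		fmt.Printf("%+v\n", s)
	}
}
```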
2017-02-01all: merge dev.inline into masterRuss Cox
Change-Id: I7715581a04e513dcda9918e853fa6b1ddc703770
2017-01-09[dev.inline] cmd/internal/src: introduce compact source position representationRobert Griesemer
XPos is a compact (8 instead of 16 bytes on a 64-bit machine) source position representation. There is a 1:1 correspondence between each XPos and each regular Pos, translated via a global table. In some sense this brings back the LineHist, though positions can track line and column information; there is an O(1) translation between the representations (no binary search), and the translation is factored out. The size increase with the prior change is brought down again and the compiler speed is in line with the master repo (measured on the same "quiet" machine as for prior change):

name      old time/op     new time/op     delta
Template  256ms ± 1%      262ms ± 2%      ~       (p=0.063 n=5+4)
Unicode   132ms ± 1%      135ms ± 2%      ~       (p=0.063 n=5+4)
GoTypes   891ms ± 1%      871ms ± 1%      -2.28%  (p=0.016 n=5+4)
Compiler  3.84s ± 2%      3.89s ± 2%      ~       (p=0.413 n=5+4)
MakeBash  47.1s ± 1%      46.2s ± 2%      ~       (p=0.095 n=5+5)

name      old user-ns/op  new user-ns/op  delta
Template  309M ± 1%       314M ± 2%       ~       (p=0.111 n=5+4)
Unicode   165M ± 1%       172M ± 9%       ~       (p=0.151 n=5+5)
GoTypes   1.14G ± 2%      1.12G ± 1%      ~       (p=0.063 n=5+4)
Compiler  5.00G ± 1%      4.96G ± 1%      ~       (p=0.286 n=5+4)

Change-Id: Icc570cc60ab014d8d9af6976f1f961ab8828cc47 Reviewed-on: https://go-review.googlesource.com/34506 Run-TryBot: Robert Griesemer <gri@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-01-09cmd/compile: insert scheduling checks on loop backedgesDavid Chase
Loop breaking with a counter. Benchmarked (see comments), eyeball checked for sanity on popular loops. This code ought to handle loops in general, and properly inserts phi functions in cases where the earlier version might not have. Includes test, plus modifications to test/run.go to deal with timeout and killing looping test. Tests broken by the addition of extra code (branch frequency and live vars) for added checks turn the check insertion off. If GOEXPERIMENT=preemptibleloops, the compiler inserts reschedule checks on every backedge of every reducible loop. Alternately, specifying GO_GCFLAGS=-d=ssa/insert_resched_checks/on will enable it for a single compilation, but because the core Go libraries contain some loops that may run long, this is less likely to have the desired effect. This is intended as a tool to help in the study and diagnosis of GC and other latency problems, now that goal STW GC latency is on the order of 100 microseconds or less. Updates #17831. Updates #10958. Change-Id: I6206c163a5b0248e3f21eb4fc65f73a179e1f639 Reviewed-on: https://go-review.googlesource.com/33910 Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
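A source-level analogue of a counter-based check on a loop backedge might look like the sketch below. It is only an analogy: the compiler inserts its check during SSA compilation rather than in source code, and the `reschedInterval` constant, the `sumWithChecks` function, and the call to `runtime.Gosched` are assumptions made for illustration, not what the generated code actually does.

```go
package main

import (
	"fmt"
	"runtime"
)

// reschedInterval is a hypothetical number of backedges between checks.
const reschedInterval = 1000

// sumWithChecks sums xs and, once every reschedInterval iterations,
// yields to the scheduler. It also reports how often it yielded.
func sumWithChecks(xs []int) (sum, yields int) {
	counter := reschedInterval
	for i := 0; i < len(xs); i++ {
		sum += xs[i]

		// This is the point corresponding to the loop backedge:
		// count down, and give other goroutines a chance at zero.
		counter--
		if counter == 0 {
			runtime.Gosched()
			yields++
			counter = reschedInterval
		}
	}
	return sum, yields
}

func main() {
	xs := make([]int, 2500)
	for i := range xs {
		xs[i] = 1
	}
	sum, yields := sumWithChecks(xs)
	fmt.Println("sum:", sum, "yields:", yields)
}
```

The counter keeps the common path cheap: most iterations pay only a decrement and a branch, and the more expensive yield happens rarely.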