Age | Commit message (Collapse) | Author |
|
This doesn't change any behavior, but should help the compiler realise
that these funcs really do nothing at all.
Change-Id: Ib26c02ef264691acac983538ec300f91d6ff98db
Reviewed-on: https://go-review.googlesource.com/c/go/+/280314
Trust: Daniel Martí <mvdan@mvdan.cc>
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
The ValAndOff type is a 64bit integer holding a 32bit value and a
32bit offset in each half, but for historical reasons its Val and Off
methods returned an int64. This was convenient when AuxInt was always
an int64, but now that AuxInts are typed we can return int32 from Val
and Off and get rid of a several casts and now unnecessary range
checks.
This change:
- changes the Val and Off methods to return an int32 (from int64);
- adds Val64 and Off64 methods for convenience in the few remaining
places (in the ssa.go files) where Val and Off are stored in int64
fields;
- deletes makeValAndOff64, renames makeValAndOff32 to makeValAndOff
- deletes a few ValAndOff methods that are now unused;
- removes several validOff/validValAndOff check that will always
return true.
Passes:
GOARCH=amd64 gotip build -toolexec 'toolstash -cmp' -a std
GOARCH=386 gotip build -toolexec 'toolstash -cmp' -a std
GOARCH=s390x gotip build -toolexec 'toolstash -cmp' -a std
(the three GOARCHs with SSA rules files impacted by the change).
Change-Id: I2abbbf42188c798631b94d3a55ca44256f140be7
Reviewed-on: https://go-review.googlesource.com/c/go/+/299149
Trust: Alberto Donizetti <alb.donizetti@gmail.com>
Trust: Keith Randall <khr@golang.org>
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Passes toolstash-check -all.
Change-Id: Ia441582f7f67184eb831e184f9c3c0e3c11001bd
Reviewed-on: https://go-review.googlesource.com/c/go/+/229698
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
This shrinks the compiler without impacting performance.
(The performance-sensitive part of rewrite rules is the non-match case.)
Passes toolstash-check -all.
Executable size:
file before after Δ %
compile 20356168 20163960 -192208 -0.944%
total 115599376 115407168 -192208 -0.166%
Text size:
file before after Δ %
cmd/compile/internal/ssa.s 3928309 3778774 -149535 -3.807%
total 18862943 18713408 -149535 -0.793%
Memory allocated compiling package SSA:
SSA 12.7M ± 0% 12.5M ± 0% -1.74% (p=0.008 n=5+5)
Compiler speed impact:
name old time/op new time/op delta
Template 211ms ± 1% 211ms ± 2% ~ (p=0.832 n=49+49)
Unicode 82.8ms ± 2% 83.2ms ± 2% +0.44% (p=0.022 n=46+49)
GoTypes 726ms ± 1% 728ms ± 2% ~ (p=0.076 n=46+48)
Compiler 3.39s ± 2% 3.40s ± 2% ~ (p=0.633 n=48+49)
SSA 7.71s ± 1% 7.65s ± 1% -0.78% (p=0.000 n=45+44)
Flate 134ms ± 1% 134ms ± 1% ~ (p=0.195 n=50+49)
GoParser 167ms ± 1% 167ms ± 1% ~ (p=0.390 n=47+47)
Reflect 453ms ± 3% 452ms ± 2% ~ (p=0.492 n=48+49)
Tar 184ms ± 3% 184ms ± 2% ~ (p=0.862 n=50+48)
XML 248ms ± 2% 248ms ± 2% ~ (p=0.096 n=49+47)
[Geo mean] 415ms 415ms -0.03%
name old user-time/op new user-time/op delta
Template 273ms ± 1% 273ms ± 2% ~ (p=0.711 n=48+48)
Unicode 117ms ± 6% 117ms ± 5% ~ (p=0.633 n=50+50)
GoTypes 972ms ± 2% 974ms ± 1% +0.29% (p=0.016 n=47+49)
Compiler 4.46s ± 6% 4.51s ± 6% ~ (p=0.093 n=50+50)
SSA 10.4s ± 1% 10.3s ± 2% -0.94% (p=0.000 n=45+50)
Flate 166ms ± 2% 167ms ± 2% ~ (p=0.148 n=49+48)
GoParser 202ms ± 1% 202ms ± 2% -0.28% (p=0.014 n=47+49)
Reflect 594ms ± 2% 594ms ± 2% ~ (p=0.717 n=48+49)
Tar 224ms ± 2% 224ms ± 2% ~ (p=0.805 n=50+49)
XML 311ms ± 1% 310ms ± 1% ~ (p=0.177 n=49+48)
[Geo mean] 537ms 537ms +0.01%
Change-Id: I562b9f349b34ddcff01771769e6dbbc80604da7a
Reviewed-on: https://go-review.googlesource.com/c/go/+/221237
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
We added chunking of rewrite rules to speed up compiling package SSA.
This series of changes has significantly shrunk the number of
rewrite rules, and they are no longer being added nearly as fast.
Now that we are sharing v.Args across multiple rewrite rules,
there is additional benefit to having more rules in a single function.
Removing chunking now has an incidental impact on compiling package SSA,
marginally speeds up other compilation, shrinks the cmd/compile binary,
and simplifies the code.
name old time/op new time/op delta
Template 211ms ± 2% 210ms ± 2% -0.50% (p=0.000 n=91+97)
Unicode 81.9ms ± 3% 81.8ms ± 3% ~ (p=0.179 n=96+91)
GoTypes 731ms ± 2% 731ms ± 1% ~ (p=0.442 n=94+96)
Compiler 3.43s ± 2% 3.41s ± 2% -0.36% (p=0.001 n=98+94)
SSA 8.30s ± 2% 8.32s ± 2% +0.19% (p=0.034 n=94+95)
Flate 135ms ± 2% 134ms ± 1% -0.30% (p=0.006 n=98+94)
GoParser 167ms ± 1% 167ms ± 1% -0.22% (p=0.001 n=92+94)
Reflect 453ms ± 2% 453ms ± 3% ~ (p=0.306 n=98+97)
Tar 184ms ± 2% 183ms ± 2% -0.31% (p=0.012 n=94+94)
XML 249ms ± 2% 248ms ± 1% -0.26% (p=0.002 n=96+92)
[Geo mean] 419ms 418ms -0.21%
name old user-time/op new user-time/op delta
Template 273ms ± 2% 272ms ± 2% -0.46% (p=0.000 n=93+96)
Unicode 116ms ± 4% 117ms ± 4% ~ (p=0.433 n=98+98)
GoTypes 977ms ± 2% 977ms ± 1% ~ (p=0.971 n=92+99)
Compiler 4.56s ± 6% 4.53s ± 6% ~ (p=0.081 n=100+100)
SSA 11.1s ± 2% 11.1s ± 2% ~ (p=0.064 n=99+96)
Flate 167ms ± 2% 167ms ± 1% -0.24% (p=0.004 n=95+96)
GoParser 203ms ± 1% 203ms ± 2% -0.14% (p=0.049 n=96+97)
Reflect 595ms ± 2% 595ms ± 2% ~ (p=0.544 n=95+92)
Tar 225ms ± 2% 224ms ± 2% ~ (p=0.562 n=99+99)
XML 312ms ± 2% 311ms ± 1% ~ (p=0.050 n=97+93)
[Geo mean] 543ms 542ms -0.13%
Change-Id: I8d34ab59f154b28f20c6f9e416b976bfce339baa
Reviewed-on: https://go-review.googlesource.com/c/go/+/216220
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
CL 213703 converted generated rewrite rules for commutative ops
to use loops instead of duplicated code.
However, it loaded args using expressions like
v.Args[i] and v.Args[i^1], which the compiler could
not eliminate bounds for (including with all outstanding
prove CLs).
Also, given a series of separate rewrite rules for the same op,
we generated bounds checks for every rewrite rule, even though
we were repeatedly loading the same set of args.
This change reduces both sets of bounds checks.
Instead of loading v.Args[i] and v.Args[i^1] for commutative loops,
we now preload v.Args[0] and v.Args[1] into local variables,
and then swap them (as needed) in the commutative loop post statement.
And we now load all top level v.Args into local variables
at the beginning of every rewrite rule function.
The second optimization is the more significant,
but the first helps a little, and they play together
nicely from the perspective of generating the code.
This does increase register pressure, but the reduced bounds
checks more than compensate.
Note that the vast majority of rewrite rules evaluated
are not applied, so the prologue is the most important
part of the rewrite rules.
There is one subtle aspect to the new generated code.
Because the top level v.Args are shared across rewrite rules,
and rule evaluation can swap v_0 and v_1, v_0 and v_1
can end up being swapped from one rule to the next.
That is OK, because any time a rule does not get applied,
they will have been swapped exactly twice.
Passes toolstash-check -all.
name old time/op new time/op delta
Template 213ms ± 2% 211ms ± 2% -0.85% (p=0.000 n=92+96)
Unicode 83.5ms ± 2% 83.2ms ± 2% -0.41% (p=0.004 n=95+90)
GoTypes 737ms ± 2% 733ms ± 2% -0.51% (p=0.000 n=91+94)
Compiler 3.45s ± 2% 3.43s ± 2% -0.44% (p=0.000 n=99+100)
SSA 8.54s ± 1% 8.32s ± 2% -2.56% (p=0.000 n=96+99)
Flate 136ms ± 2% 135ms ± 1% -0.47% (p=0.000 n=96+96)
GoParser 169ms ± 1% 168ms ± 1% -0.33% (p=0.000 n=96+93)
Reflect 456ms ± 3% 455ms ± 3% ~ (p=0.261 n=95+94)
Tar 186ms ± 2% 185ms ± 2% -0.48% (p=0.000 n=94+95)
XML 251ms ± 1% 250ms ± 1% -0.51% (p=0.000 n=91+94)
[Geo mean] 424ms 421ms -0.68%
name old user-time/op new user-time/op delta
Template 275ms ± 1% 274ms ± 2% -0.55% (p=0.000 n=95+98)
Unicode 118ms ± 4% 118ms ± 4% ~ (p=0.642 n=98+90)
GoTypes 983ms ± 1% 980ms ± 1% -0.30% (p=0.000 n=93+93)
Compiler 4.56s ± 6% 4.52s ± 6% -0.72% (p=0.003 n=100+100)
SSA 11.4s ± 1% 11.1s ± 1% -2.50% (p=0.000 n=96+97)
Flate 168ms ± 1% 167ms ± 1% -0.49% (p=0.000 n=92+92)
GoParser 204ms ± 1% 204ms ± 2% -0.27% (p=0.003 n=99+96)
Reflect 599ms ± 2% 598ms ± 2% ~ (p=0.116 n=95+92)
Tar 227ms ± 2% 225ms ± 2% -0.57% (p=0.000 n=95+98)
XML 313ms ± 2% 312ms ± 1% -0.37% (p=0.000 n=89+95)
[Geo mean] 547ms 544ms -0.61%
file before after Δ %
compile 21113112 21109016 -4096 -0.019%
total 131704940 131700844 -4096 -0.003%
Change-Id: Id6c39e0367e597c0c75b8a4b1eb14cc3cbd11956
Reviewed-on: https://go-review.googlesource.com/c/go/+/216218
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
First, renove unnecessary "// cond:" lines from the generated files.
This shaves off about ~7k lines.
Second, join "if cond { break }" statements via "||", which allows us to
deduplicate a large number of them. This shaves off another ~25k lines.
This change is not for readability or simplicity; but rather, to avoid
unnecessary verbosity that makes the generated files larger. All in all,
git reports that the generated files overall weigh ~200KiB less, or
about 2.7% less.
While at it, add a -trace flag to rulegen.
Updates #33644.
Change-Id: I3fac0290a6066070cc62400bf970a4ae0929470a
Reviewed-on: https://go-review.googlesource.com/c/go/+/196498
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
First, add cpu and memory profiling flags, as these are useful to see
where rulegen is spending its time. It now takes many seconds to run on
a recent laptop, so we have to keep an eye on what it's doing.
Second, stop writing '_ = var' lines to keep imports and variables used
at all times. Now that rulegen removes all such unused names, they're
unnecessary.
To perform the removal, lean on go/types to first detect what names are
unused. We can configure it to give us all the type-checking errors in a
file, so we can collect all "declared but not used" errors in a single
pass.
We then use astutil.Apply to remove the relevant nodes based on the line
information from each unused error. This allows us to apply the changes
without having to do extra parser+printer roundtrips to plaintext, which
are far too expensive.
We need to do multiple such passes, as removing an unused variable
declaration might then make another declaration unused. Two passes are
enough to clean every file at the moment, so add a limit of three passes
for now to avoid eating cpu uncontrollably by accident.
The resulting performance of the changes above is a ~30% loss across the
table, since go/types is fairly expensive. The numbers were obtained
with 'benchcmd Rulegen go run *.go', which involves compiling rulegen
itself, but that seems reflective of how the program is used.
name old time/op new time/op delta
Rulegen 5.61s ± 0% 7.36s ± 0% +31.17% (p=0.016 n=5+4)
name old user-time/op new user-time/op delta
Rulegen 7.20s ± 1% 9.92s ± 1% +37.76% (p=0.016 n=5+4)
name old sys-time/op new sys-time/op delta
Rulegen 135ms ±19% 169ms ±17% +25.66% (p=0.032 n=5+5)
name old peak-RSS-bytes new peak-RSS-bytes delta
Rulegen 71.0MB ± 2% 85.6MB ± 2% +20.56% (p=0.008 n=5+5)
We can live with a bit more resource usage, but the time/op getting
close to 10s isn't good. To win that back, introduce concurrency in
main.go. This further increases resource usage a bit, but the real time
on this quad-core laptop is greatly reduced. The final benchstat is as
follows:
name old time/op new time/op delta
Rulegen 5.61s ± 0% 3.97s ± 1% -29.26% (p=0.008 n=5+5)
name old user-time/op new user-time/op delta
Rulegen 7.20s ± 1% 13.91s ± 1% +93.09% (p=0.008 n=5+5)
name old sys-time/op new sys-time/op delta
Rulegen 135ms ±19% 269ms ± 9% +99.17% (p=0.008 n=5+5)
name old peak-RSS-bytes new peak-RSS-bytes delta
Rulegen 71.0MB ± 2% 226.3MB ± 1% +218.72% (p=0.008 n=5+5)
It might be possible to reduce the cpu or memory usage in the future,
such as configuring go/types to do less work, or taking shortcuts to
avoid having to run it many times. For now, ~2x cpu and ~4x memory usage
seems like a fair trade for a faster and better rulegen.
Finally, we can remove the old code that tried to remove some unused
variables in a hacky and unmaintainable way.
Change-Id: Iff9e83e3f253babf5a1bd48cc993033b8550cee6
Reviewed-on: https://go-review.googlesource.com/c/go/+/189798
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
A lot of the naked for loops begin like:
for {
v := b.Control
if v.Op != OpConstBool {
break
}
...
return true
}
Instead, write them out in a more compact and readable way:
for v.Op == OpConstBool {
...
return true
}
This requires the addition of two bytes.Buffer writers, as this helps us
make a decision based on future pieces of generated code. This probably
makes rulegen slightly slower, but that's not noticeable; the code
generation still takes ~3.5s on my laptop, excluding build time.
The "v := b.Control" declaration can be moved to the top of each
function. Even though the rules can modify b.Control when firing, they
also make the function return, so v can't be used again.
While at it, remove three unnecessary lines from the top of each
rewriteBlock func.
In total, this results in ~4k lines removed from the generated code, and
a slight improvement in readability.
Change-Id: I317e4c6a4842c64df506f4513375475fad2aeec5
Reviewed-on: https://go-review.googlesource.com/c/go/+/167399
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Change-Id: I7e921b7b4665ff76dc8bae493b2a49318690770b
Reviewed-on: https://go-review.googlesource.com/c/go/+/168637
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Munday <mike.munday@ibm.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
Flagalloc has the unenviable task of splitting
flag-generating ops that have been merged with loads
when the flags need to "spilled" (i.e. regenerated).
Since there weren't very many of them, there was a hard-coded list
of ops and bespoke code written to split them.
This change migrates load splitting into rewrite rules,
to make them easier to maintain.
Change-Id: I7750eafb888a802206c410f9c341b3133e7748b8
Reviewed-on: https://go-review.googlesource.com/c/go/+/166978
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|