Age | Commit message (Collapse) | Author |
|
Change-Id: Iafd72fb06a491075f7f996a6684e0d495c96aee5
Reviewed-on: https://go-review.googlesource.com/c/go/+/264342
Trust: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Fix a longstanding TODO.
Provides widespread, minor improvements.
Negligible compiler cost.
Because the freeze nears, put in a safety flag to easily disable.
Change-Id: I338812181ab6d806fecf22afd3c3502e2c94f7a0
Reviewed-on: https://go-review.googlesource.com/c/go/+/229600
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Control values are used to choose which successor of a block is
jumped to. Typically a control value takes the form of a 'flags'
value that represents the result of a comparison. Some
architectures however use a variable in a register as a control
value.
Up until now we have managed with a single control value per block.
However some architectures (e.g. s390x and riscv64) have combined
compare-and-branch instructions that take two variables in registers
as parameters. To generate these instructions we need to support 2
control values per block.
This CL allows up to 2 control values to be used in a block in
order to support the addition of compare-and-branch instructions.
I have implemented s390x compare-and-branch instructions in a
different CL.
Passes toolstash-check -all.
Results of compilebench:
name old time/op new time/op delta
Template 208ms ± 1% 209ms ± 1% ~ (p=0.289 n=20+20)
Unicode 83.7ms ± 1% 83.3ms ± 3% -0.49% (p=0.017 n=18+18)
GoTypes 748ms ± 1% 748ms ± 0% ~ (p=0.460 n=20+18)
Compiler 3.47s ± 1% 3.48s ± 1% ~ (p=0.070 n=19+18)
SSA 11.5s ± 1% 11.7s ± 1% +1.64% (p=0.000 n=19+18)
Flate 130ms ± 1% 130ms ± 1% ~ (p=0.588 n=19+20)
GoParser 160ms ± 1% 161ms ± 1% ~ (p=0.211 n=20+20)
Reflect 465ms ± 1% 467ms ± 1% +0.42% (p=0.007 n=20+20)
Tar 184ms ± 1% 185ms ± 2% ~ (p=0.087 n=18+20)
XML 253ms ± 1% 253ms ± 1% ~ (p=0.377 n=20+18)
LinkCompiler 769ms ± 2% 774ms ± 2% ~ (p=0.070 n=19+19)
ExternalLinkCompiler 3.59s ±11% 3.68s ± 6% ~ (p=0.072 n=20+20)
LinkWithoutDebugCompiler 446ms ± 5% 454ms ± 3% +1.79% (p=0.002 n=19+20)
StdCmd 26.0s ± 2% 26.0s ± 2% ~ (p=0.799 n=20+20)
name old user-time/op new user-time/op delta
Template 238ms ± 5% 240ms ± 5% ~ (p=0.142 n=20+20)
Unicode 105ms ±11% 106ms ±10% ~ (p=0.512 n=20+20)
GoTypes 876ms ± 2% 873ms ± 4% ~ (p=0.647 n=20+19)
Compiler 4.17s ± 2% 4.19s ± 1% ~ (p=0.093 n=20+18)
SSA 13.9s ± 1% 14.1s ± 1% +1.45% (p=0.000 n=18+18)
Flate 145ms ±13% 146ms ± 5% ~ (p=0.851 n=20+18)
GoParser 185ms ± 5% 188ms ± 7% ~ (p=0.174 n=20+20)
Reflect 534ms ± 3% 538ms ± 2% ~ (p=0.105 n=20+18)
Tar 215ms ± 4% 211ms ± 9% ~ (p=0.079 n=19+20)
XML 295ms ± 6% 295ms ± 5% ~ (p=0.968 n=20+20)
LinkCompiler 832ms ± 4% 837ms ± 7% ~ (p=0.707 n=17+20)
ExternalLinkCompiler 1.58s ± 8% 1.60s ± 4% ~ (p=0.296 n=20+19)
LinkWithoutDebugCompiler 478ms ±12% 489ms ±10% ~ (p=0.429 n=20+20)
name old object-bytes new object-bytes delta
Template 559kB ± 0% 559kB ± 0% ~ (all equal)
Unicode 216kB ± 0% 216kB ± 0% ~ (all equal)
GoTypes 2.03MB ± 0% 2.03MB ± 0% ~ (all equal)
Compiler 8.07MB ± 0% 8.07MB ± 0% -0.06% (p=0.000 n=20+20)
SSA 27.1MB ± 0% 27.3MB ± 0% +0.89% (p=0.000 n=20+20)
Flate 343kB ± 0% 343kB ± 0% ~ (all equal)
GoParser 441kB ± 0% 441kB ± 0% ~ (all equal)
Reflect 1.36MB ± 0% 1.36MB ± 0% ~ (all equal)
Tar 487kB ± 0% 487kB ± 0% ~ (all equal)
XML 632kB ± 0% 632kB ± 0% ~ (all equal)
name old export-bytes new export-bytes delta
Template 18.5kB ± 0% 18.5kB ± 0% ~ (all equal)
Unicode 7.92kB ± 0% 7.92kB ± 0% ~ (all equal)
GoTypes 35.0kB ± 0% 35.0kB ± 0% ~ (all equal)
Compiler 109kB ± 0% 110kB ± 0% +0.72% (p=0.000 n=20+20)
SSA 137kB ± 0% 138kB ± 0% +0.58% (p=0.000 n=20+20)
Flate 4.89kB ± 0% 4.89kB ± 0% ~ (all equal)
GoParser 8.49kB ± 0% 8.49kB ± 0% ~ (all equal)
Reflect 11.4kB ± 0% 11.4kB ± 0% ~ (all equal)
Tar 10.5kB ± 0% 10.5kB ± 0% ~ (all equal)
XML 16.7kB ± 0% 16.7kB ± 0% ~ (all equal)
name old text-bytes new text-bytes delta
HelloSize 761kB ± 0% 761kB ± 0% ~ (all equal)
CmdGoSize 10.8MB ± 0% 10.8MB ± 0% ~ (all equal)
name old data-bytes new data-bytes delta
HelloSize 10.7kB ± 0% 10.7kB ± 0% ~ (all equal)
CmdGoSize 312kB ± 0% 312kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 122kB ± 0% 122kB ± 0% ~ (all equal)
CmdGoSize 146kB ± 0% 146kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.13MB ± 0% 1.13MB ± 0% ~ (all equal)
CmdGoSize 15.1MB ± 0% 15.1MB ± 0% ~ (all equal)
Change-Id: I3cc2f9829a109543d9a68be4a21775d2d3e9801f
Reviewed-on: https://go-review.googlesource.com/c/go/+/196557
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Flagalloc has the unenviable task of splitting
flag-generating ops that have been merged with loads
when the flags need to "spilled" (i.e. regenerated).
Since there weren't very many of them, there was a hard-coded list
of ops and bespoke code written to split them.
This change migrates load splitting into rewrite rules,
to make them easier to maintain.
Change-Id: I7750eafb888a802206c410f9c341b3133e7748b8
Reviewed-on: https://go-review.googlesource.com/c/go/+/166978
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
CMPL/CMPW/CMPB can take a memory operand on 386, and this CL
implements that optimization.
1. The total size of pkg/linux_386 decreases about 45KB, excluding
cmd/compile.
2. The go1 benchmark shows a little improvement.
name old time/op new time/op delta
BinaryTree17-4 3.36s ± 2% 3.37s ± 3% ~ (p=0.537 n=40+40)
Fannkuch11-4 3.59s ± 1% 3.53s ± 2% -1.58% (p=0.000 n=40+40)
FmtFprintfEmpty-4 46.0ns ± 3% 45.8ns ± 3% ~ (p=0.249 n=40+40)
FmtFprintfString-4 80.0ns ± 4% 78.8ns ± 3% -1.49% (p=0.001 n=40+40)
FmtFprintfInt-4 89.7ns ± 2% 90.3ns ± 2% +0.74% (p=0.003 n=40+40)
FmtFprintfIntInt-4 144ns ± 3% 143ns ± 3% -0.95% (p=0.003 n=40+40)
FmtFprintfPrefixedInt-4 181ns ± 4% 180ns ± 2% ~ (p=0.103 n=40+40)
FmtFprintfFloat-4 412ns ± 3% 408ns ± 4% -0.97% (p=0.018 n=40+40)
FmtManyArgs-4 607ns ± 4% 605ns ± 4% ~ (p=0.148 n=40+40)
GobDecode-4 7.19ms ± 4% 7.24ms ± 5% ~ (p=0.340 n=40+40)
GobEncode-4 7.04ms ± 9% 6.99ms ± 9% ~ (p=0.289 n=40+40)
Gzip-4 400ms ± 6% 398ms ± 5% ~ (p=0.168 n=40+40)
Gunzip-4 41.2ms ± 3% 41.7ms ± 3% +1.40% (p=0.001 n=40+40)
HTTPClientServer-4 62.5µs ± 1% 62.1µs ± 2% -0.61% (p=0.000 n=37+37)
JSONEncode-4 20.7ms ± 4% 20.4ms ± 3% -1.60% (p=0.000 n=40+40)
JSONDecode-4 69.4ms ± 4% 69.2ms ± 6% ~ (p=0.177 n=40+40)
Mandelbrot200-4 5.22ms ± 6% 5.21ms ± 3% ~ (p=0.531 n=40+40)
GoParse-4 3.29ms ± 3% 3.28ms ± 3% ~ (p=0.321 n=40+39)
RegexpMatchEasy0_32-4 104ns ± 4% 103ns ± 7% -0.89% (p=0.040 n=40+40)
RegexpMatchEasy0_1K-4 852ns ± 3% 853ns ± 2% ~ (p=0.357 n=40+40)
RegexpMatchEasy1_32-4 113ns ± 8% 113ns ± 3% ~ (p=0.906 n=40+40)
RegexpMatchEasy1_1K-4 1.03µs ± 4% 1.03µs ± 5% ~ (p=0.326 n=40+40)
RegexpMatchMedium_32-4 136ns ± 3% 133ns ± 3% -2.31% (p=0.000 n=40+40)
RegexpMatchMedium_1K-4 44.0µs ± 3% 43.7µs ± 3% ~ (p=0.053 n=40+40)
RegexpMatchHard_32-4 2.27µs ± 3% 2.26µs ± 4% ~ (p=0.391 n=40+40)
RegexpMatchHard_1K-4 68.0µs ± 3% 68.9µs ± 3% +1.28% (p=0.000 n=40+40)
Revcomp-4 1.86s ± 5% 1.86s ± 2% ~ (p=0.950 n=40+40)
Template-4 73.4ms ± 4% 69.9ms ± 7% -4.78% (p=0.000 n=40+40)
TimeParse-4 449ns ± 4% 441ns ± 5% -1.76% (p=0.000 n=40+40)
TimeFormat-4 416ns ± 3% 417ns ± 4% ~ (p=0.304 n=40+40)
[Geo mean] 67.7µs 67.3µs -0.55%
name old speed new speed delta
GobDecode-4 107MB/s ± 4% 106MB/s ± 5% ~ (p=0.336 n=40+40)
GobEncode-4 109MB/s ± 5% 110MB/s ± 9% ~ (p=0.142 n=38+40)
Gzip-4 48.5MB/s ± 5% 48.8MB/s ± 5% ~ (p=0.172 n=40+40)
Gunzip-4 472MB/s ± 3% 465MB/s ± 3% -1.39% (p=0.001 n=40+40)
JSONEncode-4 93.6MB/s ± 4% 95.1MB/s ± 3% +1.61% (p=0.000 n=40+40)
JSONDecode-4 28.0MB/s ± 3% 28.1MB/s ± 6% ~ (p=0.181 n=40+40)
GoParse-4 17.6MB/s ± 3% 17.7MB/s ± 3% ~ (p=0.350 n=40+39)
RegexpMatchEasy0_32-4 308MB/s ± 4% 311MB/s ± 6% +0.96% (p=0.025 n=40+40)
RegexpMatchEasy0_1K-4 1.20GB/s ± 3% 1.20GB/s ± 2% ~ (p=0.317 n=40+40)
RegexpMatchEasy1_32-4 282MB/s ± 7% 282MB/s ± 3% ~ (p=0.516 n=40+40)
RegexpMatchEasy1_1K-4 994MB/s ± 4% 991MB/s ± 5% ~ (p=0.319 n=40+40)
RegexpMatchMedium_32-4 7.31MB/s ± 3% 7.49MB/s ± 3% +2.46% (p=0.000 n=40+40)
RegexpMatchMedium_1K-4 23.3MB/s ± 3% 23.4MB/s ± 3% ~ (p=0.052 n=40+40)
RegexpMatchHard_32-4 14.1MB/s ± 3% 14.1MB/s ± 4% ~ (p=0.391 n=40+40)
RegexpMatchHard_1K-4 15.1MB/s ± 3% 14.9MB/s ± 3% -1.27% (p=0.000 n=40+40)
Revcomp-4 137MB/s ± 5% 137MB/s ± 2% ~ (p=0.942 n=40+40)
Template-4 26.5MB/s ± 4% 27.8MB/s ± 7% +5.03% (p=0.000 n=40+40)
[Geo mean] 78.6MB/s 79.0MB/s +0.57%
Change-Id: Idcacc6881ef57cd7dc33aa87b711282842b72a53
Reviewed-on: https://go-review.googlesource.com/126618
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Some *mem ops are loads, some are stores, some are modifications.
Replace mem->load for the loads.
Replace mem->store for the stores.
Replace mem->modify for the load-modify-stores.
The only semantic change in this CL is to mark
ADD(Q|L)constmodify (which used to be ADD(Q|L)constmem) as
both a read and a write, instead of just a write. This is arguably
a bug fix, but the bug isn't triggerable at the moment, see CL 112157.
Change-Id: Iccb45aea817b606adb2d712ff99b10ee28e4616a
Reviewed-on: https://go-review.googlesource.com/112159
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Allow the compiler to generate code like CMPQ 16(AX), $7
It's tricky because it's difficult to spill such a comparison during
flagalloc, because the same memory state might not be available at
the restore locations.
Solve this problem by decomposing the compare+load back into its parts
if it needs to be spilled.
The big win is that the write barrier test goes from:
MOVL runtime.writeBarrier(SB), CX
TESTL CX, CX
JNE 60
to
CMPL runtime.writeBarrier(SB), $0
JNE 59
It's one instruction and one byte smaller.
Fixes #19485
Fixes #15245
Update #22460
Binaries are about 0.15% smaller.
Change-Id: I4fd8d1111b6b9924d52f9a0901ca1b2e5cce0836
Reviewed-on: https://go-review.googlesource.com/86035
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
|
|
We compute a lot of stuff based off the CFG: postorder traversal,
dominators, dominator tree, loop nest. Multiple phases use this
information and we end up recomputing some of it. Add a cache
for this information so if the CFG hasn't changed, we can reuse
the previous computation.
Change-Id: I9b5b58af06830bd120afbee9cfab395a0a2f74b2
Reviewed-on: https://go-review.googlesource.com/29356
Reviewed-by: David Chase <drchase@google.com>
|
|
If an op generates a tuple, and part of that tuple is of flags type,
then treat the op as clobbering flags.
Normally this doesn't matter because we do:
v1 = ADDS <int32, flags>
v2 = Select0 v1 <int32>
v3 = Select1 v1 <flags>
And v3 will do the right clobbering of flags. But in the rare
cases where we issue a tuple-with-flag op and the flag portion
is dead, then we never issue a Select1. But v1 still clobbers flags,
so we need to respect that.
Fixes builder failure in CL 28950.
Change-Id: I589089fd81aaeaaa9750bb8d85e7b10199aaa002
Reviewed-on: https://go-review.googlesource.com/29083
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
Reg allocator skips flag-typed values. Flag allocator uses the type
and whether the op has "clobberFlags" set.
Tested on AMD64, ARM, ARM64, 386. Passed 'toolstash -cmp' on AMD64.
PPC64 is coded blindly.
Change-Id: Ib1cc27efecef6a1bb27f7d7ed035a582660d244f
Reviewed-on: https://go-review.googlesource.com/25480
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
|
|
We want tuple-reading ops immediately follow tuple-generating op, so
that tuple values will not be spilled/copied.
The mechanism introduced in the previous CL cannot really avoid tuples
interleaving. In this CL we always emit tuple and their selectors together.
Maybe remove the tuple scores if it does not help on performance (todo).
Also let tighten not move tuple-reading ops across blocks.
In the previous CL a special case of regenerating flags with tuple-reading
pseudo-op is added, but it did not cover end-of-block case. This is fixed
in this CL and the condition is generalized.
Progress on SSA backend for ARM. Still not complete.
Updates #15365.
Change-Id: I8980b34e7a64eb98153540e9e19a3782e20406ff
Reviewed-on: https://go-review.googlesource.com/23792
Reviewed-by: David Chase <drchase@google.com>
|
|
Introduce dec64 rules to (generically) decompose 64-bit integer on
32-bit architectures. 64-bit integer is composed/decomposed with
Int64Make/Hi/Lo ops, as for complex types.
The idea of dealing with Add64 is the following:
(Add64 (Int64Make xh xl) (Int64Make yh yl))
->
(Int64Make
(Add32withcarry xh yh (Select0 (Add32carry xl yl)))
(Select1 (Add32carry xl yl)))
where Add32carry returns a tuple (flags,uint32). Select0 and Select1
read the first and the second component of the tuple, respectively.
The two Add32carry will be CSE'd.
Similarly for multiplication, Mul32uhilo returns a tuple (hi, lo).
Also add support of KeepAlive, to fix build after merge.
Tests addressed_ssa.go, array_ssa.go, break_ssa.go, chan_ssa.go,
cmp_ssa.go, ctl_ssa.go, map_ssa.go, and string_ssa.go in
cmd/compile/internal/gc/testdata passed.
Progress on SSA for ARM. Still not complete.
Updates #15365.
Change-Id: I7867c76785a456312de5d8398a6b3f7ca5a4f7ec
Reviewed-on: https://go-review.googlesource.com/23213
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Fix hardcoded flag register mask in ssa/flagalloc.go by auto-generating
the mask.
Also fix a mistake (in previous CL) about conditional branches.
Progress on SSA backend for ARM. Still not complete. Now "container/ring"
package compiles and tests passed.
Updates #15365.
Change-Id: Id7c8805c30dbb8107baedb485ed0f71f59ed6ea8
Reviewed-on: https://go-review.googlesource.com/23093
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Provide indexes along with block pointers for Preds
and Succs arrays. This allows us to splice edges in
and out of those arrays in constant time.
Fixes worst-case O(n^2) behavior in deadcode and fuse.
benchmark old ns/op new ns/op delta
BenchmarkFuse1-8 2065 2057 -0.39%
BenchmarkFuse10-8 9408 9073 -3.56%
BenchmarkFuse100-8 105238 76277 -27.52%
BenchmarkFuse1000-8 3982562 1026750 -74.22%
BenchmarkFuse10000-8 301220329 12824005 -95.74%
BenchmarkDeadCode1-8 1588 1566 -1.39%
BenchmarkDeadCode10-8 4333 4250 -1.92%
BenchmarkDeadCode100-8 32031 32574 +1.70%
BenchmarkDeadCode1000-8 590407 468275 -20.69%
BenchmarkDeadCode10000-8 17822890 5000818 -71.94%
BenchmarkDeadCode100000-8 1388706640 78021127 -94.38%
BenchmarkDeadCode200000-8 5372518479 168598762 -96.86%
Change-Id: Iccabdbb9343fd1c921ba07bbf673330a1c36ee17
Reviewed-on: https://go-review.googlesource.com/22589
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Keep track of how many uses each Value has. Each appearance in
Value.Args and in Block.Control counts once.
The number of uses of a value is generically useful to
constrain rewrite rules. For instance, we might want to
prevent merging index operations into loads if the same
index expression is used lots of times.
But I have one use in particular for which the use count is required.
We must make sure we don't combine ops with loads if the load has
more than one use. Otherwise, we may split a single load
into multiple loads and that breaks perceived behavior in
the presence of races. In particular, the load of m.state
in sync/mutex.go:Lock can't be done twice. (I have a separate
CL which triggers the mutex failure. This CL has a test which
demonstrates a similar failure.)
Change-Id: Icaafa479239f48632a069d0c3f624e6ebc6b1f0e
Reviewed-on: https://go-review.googlesource.com/20790
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Todd Neal <todd@tneal.org>
|
|
Make sure we do any just-before-return cleanup on all paths out of a
function, including when recovering. Each exit path should include
deferreturn (if there are any defers) and then the exit
code (e.g. copying heap-escaping return values back to the stack).
Introduce a Defer SSA block type which has two outgoing edges - one the
fallthrough edge (the defer was queued successfully) and one which
immediately returns (the defer had a successful recover() call and
normal execution should resume at the return point).
Fixes #14725
Change-Id: Iad035c9fd25ef8b7a74dafbd7461cf04833d981f
Reviewed-on: https://go-review.googlesource.com/20486
Reviewed-by: David Chase <drchase@google.com>
|
|
The tree's pretty inconsistent about single space vs double space
after a period in documentation. Make it consistently a single space,
per earlier decisions. This means contributors won't be confused by
misleading precedence.
This CL doesn't use go/doc to parse. It only addresses // comments.
It was generated with:
$ perl -i -npe 's,^(\s*// .+[a-z]\.) +([A-Z]),$1 $2,' $(git grep -l -E '^\s*//(.+\.) +([A-Z])')
$ go test go/doc -update
Change-Id: Iccdb99c37c797ef1f804a94b22ba5ee4b500c4f7
Reviewed-on: https://go-review.googlesource.com/20022
Reviewed-by: Rob Pike <r@golang.org>
Reviewed-by: Dave Day <djd@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Flagalloc was recalculating flags is some situations
when it didn't need to. Fixed by using the same name
for the original flag calculation instruction throughout.
Change-Id: Ic0bf58f728a8d87748434dd25a67b0708755e1f8
Reviewed-on: https://go-review.googlesource.com/19237
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
|
|
Rename StoreConst to ValAndOff so we can use it for other ops.
Make ValAndOff print nicely.
Add some notes & checks related to my aborted attempt to
implement combined CMP+load ops.
Change-Id: I2f901d12d42bc5a82879af0334806aa184a97e27
Reviewed-on: https://go-review.googlesource.com/18947
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: David Chase <drchase@google.com>
|
|
The x86 backend automatically rewrites MOV $0, AX to
XOR AX, AX. That rewrite isn't ok when the flags register
is live across the MOV. Keep track of which moves care
about preserving flags, then disable this rewrite for them.
On x86, Prog.Mark was being used to hold the length of the
instruction. We already store that in Prog.Isize, so no
need to store it in Prog.Mark also. This frees up Prog.Mark
to hold a bitmask on x86 just like all the other architectures.
Update #12405
Change-Id: Ibad8a8f41fc6222bec1e4904221887d3cc3ca029
Reviewed-on: https://go-review.googlesource.com/18861
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Russ Cox <rsc@golang.org>
|
|
Reorder how register & stack allocation is done. We used to allocate
registers, then fix up merge edges, then allocate stack slots. This
lead to lots of unnecessary copies on merge edges:
v2 = LoadReg v1
v3 = StoreReg v2
If v1 and v3 are allocated to the same stack slot, then this code is
unnecessary. But at regalloc time we didn't know the homes of v1 and
v3.
To fix this problem, allocate all the stack slots before fixing up the
merge edges. That way, we know what stack slots values use so we know
what copies are required.
Use a good technique for shuffling values around on merge edges.
Improves performance of the go1 TimeParse benchmark by ~12%
Change-Id: I731f43e4ff1a7e0dc4cd4aa428fcdb97812b86fa
Reviewed-on: https://go-review.googlesource.com/17915
Reviewed-by: David Chase <drchase@google.com>
|
|
With the separate flagalloc pass, it should be fine to
allow CSE of control values. The worst that can happen
is that the comparison gets un-CSEd by flagalloc.
Fix bug in flagalloc where flag restores were getting
clobbered by rematerialization during register allocation.
Change-Id: If476cf98b69973e8f1a8eb29441136dd12fab8ad
Reviewed-on: https://go-review.googlesource.com/17760
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
|
|
Spilling/restoring flag values is a pain to do during regalloc.
Instead, allocate the flag register in a separate pass. Regalloc then
operates normally on any flag recomputation instructions.
Change-Id: Ia1c3d9e6eff678861193093c0b48a00f90e4156b
Reviewed-on: https://go-review.googlesource.com/17694
Reviewed-by: David Chase <drchase@google.com>
|