path: root/src/cmd/compile/internal/ssa/flagalloc.go
2020-10-22  cmd/compile: remove go115flagallocdeadcode  (Cherry Zhang)

Change-Id: Iafd72fb06a491075f7f996a6684e0d495c96aee5
Reviewed-on: https://go-review.googlesource.com/c/go/+/264342
Trust: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2020-04-23  cmd/compile: remove dead values after flagalloc  (Josh Bleecher Snyder)

Fix a longstanding TODO. Provides widespread, minor improvements at negligible compiler cost. Because the freeze is near, put in a safety flag so the change can easily be disabled.

Change-Id: I338812181ab6d806fecf22afd3c3502e2c94f7a0
Reviewed-on: https://go-review.googlesource.com/c/go/+/229600
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2019-10-02  cmd/compile: allow multiple SSA block control values  (Michael Munday)

Control values are used to choose which successor of a block is jumped to. Typically a control value takes the form of a 'flags' value that represents the result of a comparison. Some architectures however use a variable in a register as a control value.

Up until now we have managed with a single control value per block. However some architectures (e.g. s390x and riscv64) have combined compare-and-branch instructions that take two variables in registers as parameters. To generate these instructions we need to support 2 control values per block.

This CL allows up to 2 control values to be used in a block in order to support the addition of compare-and-branch instructions. I have implemented s390x compare-and-branch instructions in a different CL.

Passes toolstash-check -all. Results of compilebench:

    name                      old time/op       new time/op       delta
    Template                  208ms ± 1%        209ms ± 1%        ~       (p=0.289 n=20+20)
    Unicode                   83.7ms ± 1%       83.3ms ± 3%       -0.49%  (p=0.017 n=18+18)
    GoTypes                   748ms ± 1%        748ms ± 0%        ~       (p=0.460 n=20+18)
    Compiler                  3.47s ± 1%        3.48s ± 1%        ~       (p=0.070 n=19+18)
    SSA                       11.5s ± 1%        11.7s ± 1%        +1.64%  (p=0.000 n=19+18)
    Flate                     130ms ± 1%        130ms ± 1%        ~       (p=0.588 n=19+20)
    GoParser                  160ms ± 1%        161ms ± 1%        ~       (p=0.211 n=20+20)
    Reflect                   465ms ± 1%        467ms ± 1%        +0.42%  (p=0.007 n=20+20)
    Tar                       184ms ± 1%        185ms ± 2%        ~       (p=0.087 n=18+20)
    XML                       253ms ± 1%        253ms ± 1%        ~       (p=0.377 n=20+18)
    LinkCompiler              769ms ± 2%        774ms ± 2%        ~       (p=0.070 n=19+19)
    ExternalLinkCompiler      3.59s ±11%        3.68s ± 6%        ~       (p=0.072 n=20+20)
    LinkWithoutDebugCompiler  446ms ± 5%        454ms ± 3%        +1.79%  (p=0.002 n=19+20)
    StdCmd                    26.0s ± 2%        26.0s ± 2%        ~       (p=0.799 n=20+20)

    name                      old user-time/op  new user-time/op  delta
    Template                  238ms ± 5%        240ms ± 5%        ~       (p=0.142 n=20+20)
    Unicode                   105ms ±11%        106ms ±10%        ~       (p=0.512 n=20+20)
    GoTypes                   876ms ± 2%        873ms ± 4%        ~       (p=0.647 n=20+19)
    Compiler                  4.17s ± 2%        4.19s ± 1%        ~       (p=0.093 n=20+18)
    SSA                       13.9s ± 1%        14.1s ± 1%        +1.45%  (p=0.000 n=18+18)
    Flate                     145ms ±13%        146ms ± 5%        ~       (p=0.851 n=20+18)
    GoParser                  185ms ± 5%        188ms ± 7%        ~       (p=0.174 n=20+20)
    Reflect                   534ms ± 3%        538ms ± 2%        ~       (p=0.105 n=20+18)
    Tar                       215ms ± 4%        211ms ± 9%        ~       (p=0.079 n=19+20)
    XML                       295ms ± 6%        295ms ± 5%        ~       (p=0.968 n=20+20)
    LinkCompiler              832ms ± 4%        837ms ± 7%        ~       (p=0.707 n=17+20)
    ExternalLinkCompiler      1.58s ± 8%        1.60s ± 4%        ~       (p=0.296 n=20+19)
    LinkWithoutDebugCompiler  478ms ±12%        489ms ±10%        ~       (p=0.429 n=20+20)

    name                      old object-bytes  new object-bytes  delta
    Template                  559kB ± 0%        559kB ± 0%        ~       (all equal)
    Unicode                   216kB ± 0%        216kB ± 0%        ~       (all equal)
    GoTypes                   2.03MB ± 0%       2.03MB ± 0%       ~       (all equal)
    Compiler                  8.07MB ± 0%       8.07MB ± 0%       -0.06%  (p=0.000 n=20+20)
    SSA                       27.1MB ± 0%       27.3MB ± 0%       +0.89%  (p=0.000 n=20+20)
    Flate                     343kB ± 0%        343kB ± 0%        ~       (all equal)
    GoParser                  441kB ± 0%        441kB ± 0%        ~       (all equal)
    Reflect                   1.36MB ± 0%       1.36MB ± 0%       ~       (all equal)
    Tar                       487kB ± 0%        487kB ± 0%        ~       (all equal)
    XML                       632kB ± 0%        632kB ± 0%        ~       (all equal)

    name                      old export-bytes  new export-bytes  delta
    Template                  18.5kB ± 0%       18.5kB ± 0%       ~       (all equal)
    Unicode                   7.92kB ± 0%       7.92kB ± 0%       ~       (all equal)
    GoTypes                   35.0kB ± 0%       35.0kB ± 0%       ~       (all equal)
    Compiler                  109kB ± 0%        110kB ± 0%        +0.72%  (p=0.000 n=20+20)
    SSA                       137kB ± 0%        138kB ± 0%        +0.58%  (p=0.000 n=20+20)
    Flate                     4.89kB ± 0%       4.89kB ± 0%       ~       (all equal)
    GoParser                  8.49kB ± 0%       8.49kB ± 0%       ~       (all equal)
    Reflect                   11.4kB ± 0%       11.4kB ± 0%       ~       (all equal)
    Tar                       10.5kB ± 0%       10.5kB ± 0%       ~       (all equal)
    XML                       16.7kB ± 0%       16.7kB ± 0%       ~       (all equal)

    name                      old text-bytes    new text-bytes    delta
    HelloSize                 761kB ± 0%        761kB ± 0%        ~       (all equal)
    CmdGoSize                 10.8MB ± 0%       10.8MB ± 0%       ~       (all equal)

    name                      old data-bytes    new data-bytes    delta
    HelloSize                 10.7kB ± 0%       10.7kB ± 0%       ~       (all equal)
    CmdGoSize                 312kB ± 0%        312kB ± 0%        ~       (all equal)

    name                      old bss-bytes     new bss-bytes     delta
    HelloSize                 122kB ± 0%        122kB ± 0%        ~       (all equal)
    CmdGoSize                 146kB ± 0%        146kB ± 0%        ~       (all equal)

    name                      old exe-bytes     new exe-bytes     delta
    HelloSize                 1.13MB ± 0%       1.13MB ± 0%       ~       (all equal)
    CmdGoSize                 15.1MB ± 0%       15.1MB ± 0%       ~       (all equal)

Change-Id: I3cc2f9829a109543d9a68be4a21775d2d3e9801f
Reviewed-on: https://go-review.googlesource.com/c/go/+/196557
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Keith Randall <khr@golang.org>
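For illustration only, a minimal Go sketch of a block carrying up to two control values; the types are simplified stand-ins, not the compiler's actual definitions:

    // Sketch: the real cmd/compile/internal/ssa Block has many more fields.
    type Value struct{ /* op, args, type, ... */ }

    type Block struct {
        // Controls holds up to two control values; unused slots are nil.
        // Two slots allow combined compare-and-branch instructions that
        // compare two registers (e.g. on s390x or riscv64).
        Controls [2]*Value
    }

    // NumControls reports how many control values the block uses.
    func (b *Block) NumControls() int {
        n := 0
        for _, c := range b.Controls {
            if c != nil {
                n++
            }
        }
        return n
    }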
2019-03-19  cmd/compile: move flagalloc op splitting to rewrite rules  (Josh Bleecher Snyder)

Flagalloc has the unenviable task of splitting flag-generating ops that have been merged with loads when the flags need to be "spilled" (i.e. regenerated). Since there weren't very many of them, there was a hard-coded list of ops and bespoke code written to split them. This change migrates load splitting into rewrite rules, to make them easier to maintain.

Change-Id: I7750eafb888a802206c410f9c341b3133e7748b8
Reviewed-on: https://go-review.googlesource.com/c/go/+/166978
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-08-20  cmd/compile: optimize 386's comparison  (Ben Shi)

CMPL/CMPW/CMPB can take a memory operand on 386, and this CL implements that optimization.

1. The total size of pkg/linux_386 decreases about 45KB, excluding cmd/compile.

2. The go1 benchmark shows a little improvement.

    name                     old time/op    new time/op    delta
    BinaryTree17-4           3.36s ± 2%     3.37s ± 3%     ~       (p=0.537 n=40+40)
    Fannkuch11-4             3.59s ± 1%     3.53s ± 2%     -1.58%  (p=0.000 n=40+40)
    FmtFprintfEmpty-4        46.0ns ± 3%    45.8ns ± 3%    ~       (p=0.249 n=40+40)
    FmtFprintfString-4       80.0ns ± 4%    78.8ns ± 3%    -1.49%  (p=0.001 n=40+40)
    FmtFprintfInt-4          89.7ns ± 2%    90.3ns ± 2%    +0.74%  (p=0.003 n=40+40)
    FmtFprintfIntInt-4       144ns ± 3%     143ns ± 3%     -0.95%  (p=0.003 n=40+40)
    FmtFprintfPrefixedInt-4  181ns ± 4%     180ns ± 2%     ~       (p=0.103 n=40+40)
    FmtFprintfFloat-4        412ns ± 3%     408ns ± 4%     -0.97%  (p=0.018 n=40+40)
    FmtManyArgs-4            607ns ± 4%     605ns ± 4%     ~       (p=0.148 n=40+40)
    GobDecode-4              7.19ms ± 4%    7.24ms ± 5%    ~       (p=0.340 n=40+40)
    GobEncode-4              7.04ms ± 9%    6.99ms ± 9%    ~       (p=0.289 n=40+40)
    Gzip-4                   400ms ± 6%     398ms ± 5%     ~       (p=0.168 n=40+40)
    Gunzip-4                 41.2ms ± 3%    41.7ms ± 3%    +1.40%  (p=0.001 n=40+40)
    HTTPClientServer-4       62.5µs ± 1%    62.1µs ± 2%    -0.61%  (p=0.000 n=37+37)
    JSONEncode-4             20.7ms ± 4%    20.4ms ± 3%    -1.60%  (p=0.000 n=40+40)
    JSONDecode-4             69.4ms ± 4%    69.2ms ± 6%    ~       (p=0.177 n=40+40)
    Mandelbrot200-4          5.22ms ± 6%    5.21ms ± 3%    ~       (p=0.531 n=40+40)
    GoParse-4                3.29ms ± 3%    3.28ms ± 3%    ~       (p=0.321 n=40+39)
    RegexpMatchEasy0_32-4    104ns ± 4%     103ns ± 7%     -0.89%  (p=0.040 n=40+40)
    RegexpMatchEasy0_1K-4    852ns ± 3%     853ns ± 2%     ~       (p=0.357 n=40+40)
    RegexpMatchEasy1_32-4    113ns ± 8%     113ns ± 3%     ~       (p=0.906 n=40+40)
    RegexpMatchEasy1_1K-4    1.03µs ± 4%    1.03µs ± 5%    ~       (p=0.326 n=40+40)
    RegexpMatchMedium_32-4   136ns ± 3%     133ns ± 3%     -2.31%  (p=0.000 n=40+40)
    RegexpMatchMedium_1K-4   44.0µs ± 3%    43.7µs ± 3%    ~       (p=0.053 n=40+40)
    RegexpMatchHard_32-4     2.27µs ± 3%    2.26µs ± 4%    ~       (p=0.391 n=40+40)
    RegexpMatchHard_1K-4     68.0µs ± 3%    68.9µs ± 3%    +1.28%  (p=0.000 n=40+40)
    Revcomp-4                1.86s ± 5%     1.86s ± 2%     ~       (p=0.950 n=40+40)
    Template-4               73.4ms ± 4%    69.9ms ± 7%    -4.78%  (p=0.000 n=40+40)
    TimeParse-4              449ns ± 4%     441ns ± 5%     -1.76%  (p=0.000 n=40+40)
    TimeFormat-4             416ns ± 3%     417ns ± 4%     ~       (p=0.304 n=40+40)
    [Geo mean]               67.7µs         67.3µs         -0.55%

    name                     old speed      new speed      delta
    GobDecode-4              107MB/s ± 4%   106MB/s ± 5%   ~       (p=0.336 n=40+40)
    GobEncode-4              109MB/s ± 5%   110MB/s ± 9%   ~       (p=0.142 n=38+40)
    Gzip-4                   48.5MB/s ± 5%  48.8MB/s ± 5%  ~       (p=0.172 n=40+40)
    Gunzip-4                 472MB/s ± 3%   465MB/s ± 3%   -1.39%  (p=0.001 n=40+40)
    JSONEncode-4             93.6MB/s ± 4%  95.1MB/s ± 3%  +1.61%  (p=0.000 n=40+40)
    JSONDecode-4             28.0MB/s ± 3%  28.1MB/s ± 6%  ~       (p=0.181 n=40+40)
    GoParse-4                17.6MB/s ± 3%  17.7MB/s ± 3%  ~       (p=0.350 n=40+39)
    RegexpMatchEasy0_32-4    308MB/s ± 4%   311MB/s ± 6%   +0.96%  (p=0.025 n=40+40)
    RegexpMatchEasy0_1K-4    1.20GB/s ± 3%  1.20GB/s ± 2%  ~       (p=0.317 n=40+40)
    RegexpMatchEasy1_32-4    282MB/s ± 7%   282MB/s ± 3%   ~       (p=0.516 n=40+40)
    RegexpMatchEasy1_1K-4    994MB/s ± 4%   991MB/s ± 5%   ~       (p=0.319 n=40+40)
    RegexpMatchMedium_32-4   7.31MB/s ± 3%  7.49MB/s ± 3%  +2.46%  (p=0.000 n=40+40)
    RegexpMatchMedium_1K-4   23.3MB/s ± 3%  23.4MB/s ± 3%  ~       (p=0.052 n=40+40)
    RegexpMatchHard_32-4     14.1MB/s ± 3%  14.1MB/s ± 4%  ~       (p=0.391 n=40+40)
    RegexpMatchHard_1K-4     15.1MB/s ± 3%  14.9MB/s ± 3%  -1.27%  (p=0.000 n=40+40)
    Revcomp-4                137MB/s ± 5%   137MB/s ± 2%   ~       (p=0.942 n=40+40)
    Template-4               26.5MB/s ± 4%  27.8MB/s ± 7%  +5.03%  (p=0.000 n=40+40)
    [Geo mean]               78.6MB/s       79.0MB/s       +0.57%

Change-Id: Idcacc6881ef57cd7dc33aa87b711282842b72a53
Reviewed-on: https://go-review.googlesource.com/126618
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-05-08  cmd/compile: rename memory-using operations  (Keith Randall)

Some *mem ops are loads, some are stores, some are modifications. Replace mem->load for the loads. Replace mem->store for the stores. Replace mem->modify for the load-modify-stores.

The only semantic change in this CL is to mark ADD(Q|L)constmodify (which used to be ADD(Q|L)constmem) as both a read and a write, instead of just a write. This is arguably a bug fix, but the bug isn't triggerable at the moment, see CL 112157.

Change-Id: Iccb45aea817b606adb2d712ff99b10ee28e4616a
Reviewed-on: https://go-review.googlesource.com/112159
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-02-26  cmd/compile: implement comparisons directly with memory  (Keith Randall)

Allow the compiler to generate code like

    CMPQ 16(AX), $7

It's tricky because it's difficult to spill such a comparison during flagalloc, because the same memory state might not be available at the restore locations. Solve this problem by decomposing the compare+load back into its parts if it needs to be spilled.

The big win is that the write barrier test goes from:

    MOVL runtime.writeBarrier(SB), CX
    TESTL CX, CX
    JNE 60

to

    CMPL runtime.writeBarrier(SB), $0
    JNE 59

It's one instruction and one byte smaller.

Fixes #19485
Fixes #15245
Update #22460

Binaries are about 0.15% smaller.

Change-Id: I4fd8d1111b6b9924d52f9a0901ca1b2e5cce0836
Reviewed-on: https://go-review.googlesource.com/86035
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
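As a hedged illustration (the variable names below are made up, and whether the fusion fires depends on the architecture and compiler version), Go code of this shape is what benefits: the branch can test the global directly instead of loading it into a register first.

    package main

    // enabled stands in for a global flag such as the runtime's write-barrier flag.
    var enabled uint32

    // needSlowPath can compile to a single memory-operand compare such as
    //     CMPL enabled(SB), $0
    // followed by a conditional jump, rather than a MOVL + TESTL pair.
    func needSlowPath() bool {
        return enabled != 0
    }

    func main() {
        if needSlowPath() {
            println("slow path")
        }
    }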
2016-09-19  cmd/compile: cache CFG-dependent computations  (Keith Randall)

We compute a lot of stuff based off the CFG: postorder traversal, dominators, dominator tree, loop nest. Multiple phases use this information and we end up recomputing some of it. Add a cache for this information so if the CFG hasn't changed, we can reuse the previous computation.

Change-Id: I9b5b58af06830bd120afbee9cfab395a0a2f74b2
Reviewed-on: https://go-review.googlesource.com/29356
Reviewed-by: David Chase <drchase@google.com>
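A minimal sketch of the caching idea; the types and helper below are simplified stand-ins, not the compiler's own:

    // Sketch only; Block, Func, and computePostorder stand in for the real types.
    type Block struct{ succs []*Block }
    type Func struct{ entry *Block }

    func computePostorder(f *Func) []*Block {
        var order []*Block
        seen := map[*Block]bool{}
        var walk func(b *Block)
        walk = func(b *Block) {
            if b == nil || seen[b] {
                return
            }
            seen[b] = true
            for _, s := range b.succs {
                walk(s)
            }
            order = append(order, b)
        }
        walk(f.entry)
        return order
    }

    // cfgCache memoizes data derived from the CFG; any pass that edits the
    // CFG must call invalidate so the next user recomputes it.
    type cfgCache struct {
        valid     bool
        postorder []*Block
        // dominators, dominator tree, loop nest, ... would be cached similarly.
    }

    func (c *cfgCache) invalidate() { c.valid = false }

    func (c *cfgCache) postorderOf(f *Func) []*Block {
        if !c.valid {
            c.postorder = computePostorder(f)
            c.valid = true
        }
        return c.postorder
    }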
2016-09-12  cmd/compile: fix tuple-generating flag ops as clobbering flags  (Keith Randall)

If an op generates a tuple, and part of that tuple is of flags type, then treat the op as clobbering flags. Normally this doesn't matter because we do:

    v1 = ADDS <int32, flags>
    v2 = Select0 v1 <int32>
    v3 = Select1 v1 <flags>

And v3 will do the right clobbering of flags. But in the rare cases where we issue a tuple-with-flag op and the flag portion is dead, then we never issue a Select1. But v1 still clobbers flags, so we need to respect that.

Fixes builder failure in CL 28950.

Change-Id: I589089fd81aaeaaa9750bb8d85e7b10199aaa002
Reviewed-on: https://go-review.googlesource.com/29083
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2016-08-07  [dev.ssa] cmd/compile: remove flags from regMask  (Cherry Zhang)

Reg allocator skips flag-typed values. Flag allocator uses the type and whether the op has "clobberFlags" set.

Tested on AMD64, ARM, ARM64, 386. Passed 'toolstash -cmp' on AMD64. PPC64 is coded blindly.

Change-Id: Ib1cc27efecef6a1bb27f7d7ed035a582660d244f
Reviewed-on: https://go-review.googlesource.com/25480
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-06-08  [dev.ssa] cmd/compile: fix scheduling of tuple ops  (Cherry Zhang)

We want tuple-reading ops to immediately follow the tuple-generating op, so that tuple values will not be spilled/copied. The mechanism introduced in the previous CL cannot really avoid tuples interleaving. In this CL we always emit tuples and their selectors together. Maybe remove the tuple scores if it does not help performance (TODO). Also let tighten not move tuple-reading ops across blocks.

In the previous CL a special case of regenerating flags with a tuple-reading pseudo-op was added, but it did not cover the end-of-block case. This is fixed in this CL and the condition is generalized.

Progress on SSA backend for ARM. Still not complete.

Updates #15365.

Change-Id: I8980b34e7a64eb98153540e9e19a3782e20406ff
Reviewed-on: https://go-review.googlesource.com/23792
Reviewed-by: David Chase <drchase@google.com>
2016-06-02  [dev.ssa] cmd/compile: decompose 64-bit integer on ARM  (Cherry Zhang)

Introduce dec64 rules to (generically) decompose 64-bit integer on 32-bit architectures. 64-bit integer is composed/decomposed with Int64Make/Hi/Lo ops, as for complex types.

The idea of dealing with Add64 is the following:

    (Add64 (Int64Make xh xl) (Int64Make yh yl)) ->
        (Int64Make
            (Add32withcarry xh yh (Select0 (Add32carry xl yl)))
            (Select1 (Add32carry xl yl)))

where Add32carry returns a tuple (flags, uint32). Select0 and Select1 read the first and the second component of the tuple, respectively. The two Add32carry will be CSE'd. Similarly for multiplication, Mul32uhilo returns a tuple (hi, lo).

Also add support of KeepAlive, to fix build after merge.

Tests addressed_ssa.go, array_ssa.go, break_ssa.go, chan_ssa.go, cmp_ssa.go, ctl_ssa.go, map_ssa.go, and string_ssa.go in cmd/compile/internal/gc/testdata passed.

Progress on SSA for ARM. Still not complete.

Updates #15365.

Change-Id: I7867c76785a456312de5d8398a6b3f7ca5a4f7ec
Reviewed-on: https://go-review.googlesource.com/23213
Reviewed-by: Keith Randall <khr@golang.org>
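The same decomposition can be written out by hand in Go, which makes the carry flow explicit. This is a worked example of the idea above, not the compiler's implementation: the low-half add produces a carry (Add32carry) that feeds the high-half add (Add32withcarry).

    package main

    import "fmt"

    // add64 adds two 64-bit values using only 32-bit pieces.
    func add64(xh, xl, yh, yl uint32) (hi, lo uint32) {
        lo = xl + yl
        carry := uint32(0)
        if lo < xl { // unsigned overflow of the low add sets the carry
            carry = 1
        }
        hi = xh + yh + carry
        return hi, lo
    }

    func main() {
        // 0x00000001_FFFFFFFF + 0x00000000_00000001 = 0x00000002_00000000
        hi, lo := add64(0x00000001, 0xFFFFFFFF, 0x00000000, 0x00000001)
        fmt.Printf("%08x%08x\n", hi, lo)
    }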
2016-05-19  [dev.ssa] cmd/compile: handle boolean values for SSA on ARM  (Cherry Zhang)

Fix hardcoded flag register mask in ssa/flagalloc.go by auto-generating the mask. Also fix a mistake (in previous CL) about conditional branches.

Progress on SSA backend for ARM. Still not complete. Now "container/ring" package compiles and tests passed.

Updates #15365.

Change-Id: Id7c8805c30dbb8107baedb485ed0f71f59ed6ea8
Reviewed-on: https://go-review.googlesource.com/23093
Reviewed-by: Keith Randall <khr@golang.org>
2016-05-05  cmd/compile: enable constant-time CFG editing  (Keith Randall)

Provide indexes along with block pointers for Preds and Succs arrays. This allows us to splice edges in and out of those arrays in constant time.

Fixes worst-case O(n^2) behavior in deadcode and fuse.

    benchmark                     old ns/op     new ns/op     delta
    BenchmarkFuse1-8              2065          2057          -0.39%
    BenchmarkFuse10-8             9408          9073          -3.56%
    BenchmarkFuse100-8            105238        76277         -27.52%
    BenchmarkFuse1000-8           3982562       1026750       -74.22%
    BenchmarkFuse10000-8          301220329     12824005      -95.74%
    BenchmarkDeadCode1-8          1588          1566          -1.39%
    BenchmarkDeadCode10-8         4333          4250          -1.92%
    BenchmarkDeadCode100-8        32031         32574         +1.70%
    BenchmarkDeadCode1000-8       590407        468275        -20.69%
    BenchmarkDeadCode10000-8      17822890      5000818       -71.94%
    BenchmarkDeadCode100000-8     1388706640    78021127      -94.38%
    BenchmarkDeadCode200000-8     5372518479    168598762     -96.86%

Change-Id: Iccabdbb9343fd1c921ba07bbf673330a1c36ee17
Reviewed-on: https://go-review.googlesource.com/22589
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
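A simplified Go sketch of the idea (not the exact code in the compiler): each edge stores the index of its reverse edge, so deleting an entry is a swap-with-last plus one index fix-up, i.e. O(1).

    // Edge records the block on the other end of an edge and the index of
    // the reverse edge in that block's corresponding slice.
    type Edge struct {
        b *Block
        i int
    }

    type Block struct {
        Preds []Edge // incoming edges; Preds[j].b.Succs[Preds[j].i] points back here
        Succs []Edge // outgoing edges; Succs[k].b.Preds[Succs[k].i] points back here
    }

    // removePred deletes the i'th predecessor edge in constant time by
    // moving the last entry into the hole and fixing its back-pointer.
    // (Removing a successor edge is symmetric.)
    func (b *Block) removePred(i int) {
        n := len(b.Preds) - 1
        if i != n {
            e := b.Preds[n]
            b.Preds[i] = e
            // The moved edge now lives at index i; update the other end.
            e.b.Succs[e.i].i = i
        }
        b.Preds[n] = Edge{}
        b.Preds = b.Preds[:n]
    }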
2016-03-17  cmd/compile: keep value use counts in SSA  (Keith Randall)

Keep track of how many uses each Value has. Each appearance in Value.Args and in Block.Control counts once.

The number of uses of a value is generically useful to constrain rewrite rules. For instance, we might want to prevent merging index operations into loads if the same index expression is used lots of times.

But I have one use in particular for which the use count is required. We must make sure we don't combine ops with loads if the load has more than one use. Otherwise, we may split a single load into multiple loads and that breaks perceived behavior in the presence of races. In particular, the load of m.state in sync/mutex.go:Lock can't be done twice. (I have a separate CL which triggers the mutex failure. This CL has a test which demonstrates a similar failure.)

Change-Id: Icaafa479239f48632a069d0c3f624e6ebc6b1f0e
Reviewed-on: https://go-review.googlesource.com/20790
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Todd Neal <todd@tneal.org>
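As a hedged sketch (simplified types, not the compiler's), the bookkeeping amounts to bumping a counter whenever a value gains a reference, which rewrite rules can then consult:

    // Value is a stand-in for the SSA value type; only the fields needed
    // for the sketch are shown.
    type Value struct {
        Args []*Value
        Uses int32 // number of appearances in Args slices and block controls
    }

    // AddArg makes w an argument of v and records the new use.
    func (v *Value) AddArg(w *Value) {
        v.Args = append(v.Args, w)
        w.Uses++
    }

    // canFoldLoad reports whether a load may be merged into its single
    // user without duplicating the memory access.
    func canFoldLoad(load *Value) bool {
        return load.Uses == 1
    }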
2016-03-10  cmd/compile: fix defer/deferreturn  (Keith Randall)

Make sure we do any just-before-return cleanup on all paths out of a function, including when recovering. Each exit path should include deferreturn (if there are any defers) and then the exit code (e.g. copying heap-escaping return values back to the stack).

Introduce a Defer SSA block type which has two outgoing edges - one the fallthrough edge (the defer was queued successfully) and one which immediately returns (the defer had a successful recover() call and normal execution should resume at the return point).

Fixes #14725

Change-Id: Iad035c9fd25ef8b7a74dafbd7461cf04833d981f
Reviewed-on: https://go-review.googlesource.com/20486
Reviewed-by: David Chase <drchase@google.com>
2016-03-02  all: single space after period.  (Brad Fitzpatrick)

The tree's pretty inconsistent about single space vs double space after a period in documentation. Make it consistently a single space, per earlier decisions. This means contributors won't be confused by misleading precedence.

This CL doesn't use go/doc to parse. It only addresses // comments. It was generated with:

    $ perl -i -npe 's,^(\s*// .+[a-z]\.) +([A-Z]),$1 $2,' $(git grep -l -E '^\s*//(.+\.) +([A-Z])')
    $ go test go/doc -update

Change-Id: Iccdb99c37c797ef1f804a94b22ba5ee4b500c4f7
Reviewed-on: https://go-review.googlesource.com/20022
Reviewed-by: Rob Pike <r@golang.org>
Reviewed-by: Dave Day <djd@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-02-11  [dev.ssa] cmd/compile: remove redundant compare ops  (Keith Randall)

Flagalloc was recalculating flags in some situations when it didn't need to. Fixed by using the same name for the original flag calculation instruction throughout.

Change-Id: Ic0bf58f728a8d87748434dd25a67b0708755e1f8
Reviewed-on: https://go-review.googlesource.com/19237
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-01-29  [dev.ssa] cmd/compile: prepare for some load+op combining  (Keith Randall)

Rename StoreConst to ValAndOff so we can use it for other ops. Make ValAndOff print nicely.

Add some notes & checks related to my aborted attempt to implement combined CMP+load ops.

Change-Id: I2f901d12d42bc5a82879af0334806aa184a97e27
Reviewed-on: https://go-review.googlesource.com/18947
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: David Chase <drchase@google.com>
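The packing trick behind ValAndOff can be sketched in a few lines of Go; the field layout here is illustrative, not necessarily the compiler's exact encoding:

    // ValAndOff packs a small constant value and a memory offset into one
    // int64 aux field: value in the high 32 bits, offset in the low 32 bits.
    type ValAndOff int64

    func makeValAndOff(val, off int32) ValAndOff {
        return ValAndOff(int64(val)<<32 | int64(uint32(off)))
    }

    func (x ValAndOff) Val() int32 { return int32(x >> 32) }
    func (x ValAndOff) Off() int32 { return int32(x) }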
2016-01-26  [dev.ssa] cmd/compile: disable xor clearing when flags must be preserved  (Keith Randall)

The x86 backend automatically rewrites MOV $0, AX to XOR AX, AX. That rewrite isn't ok when the flags register is live across the MOV. Keep track of which moves care about preserving flags, then disable this rewrite for them.

On x86, Prog.Mark was being used to hold the length of the instruction. We already store that in Prog.Isize, so no need to store it in Prog.Mark also. This frees up Prog.Mark to hold a bitmask on x86 just like all the other architectures.

Update #12405

Change-Id: Ibad8a8f41fc6222bec1e4904221887d3cc3ca029
Reviewed-on: https://go-review.googlesource.com/18861
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-12-21  [dev.ssa] cmd/compile: better register allocator  (Keith Randall)

Reorder how register & stack allocation is done. We used to allocate registers, then fix up merge edges, then allocate stack slots. This led to lots of unnecessary copies on merge edges:

    v2 = LoadReg v1
    v3 = StoreReg v2

If v1 and v3 are allocated to the same stack slot, then this code is unnecessary. But at regalloc time we didn't know the homes of v1 and v3.

To fix this problem, allocate all the stack slots before fixing up the merge edges. That way, we know what stack slots values use so we know what copies are required. Use a good technique for shuffling values around on merge edges.

Improves performance of the go1 TimeParse benchmark by ~12%

Change-Id: I731f43e4ff1a7e0dc4cd4aa428fcdb97812b86fa
Reviewed-on: https://go-review.googlesource.com/17915
Reviewed-by: David Chase <drchase@google.com>
2015-12-12  [dev.ssa] cmd/compile: allow control values to be CSEd  (Keith Randall)

With the separate flagalloc pass, it should be fine to allow CSE of control values. The worst that can happen is that the comparison gets un-CSEd by flagalloc.

Fix bug in flagalloc where flag restores were getting clobbered by rematerialization during register allocation.

Change-Id: If476cf98b69973e8f1a8eb29441136dd12fab8ad
Reviewed-on: https://go-review.googlesource.com/17760
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
2015-12-11  [dev.ssa] cmd/compile: allocate the flag register in a separate pass  (Keith Randall)

Spilling/restoring flag values is a pain to do during regalloc. Instead, allocate the flag register in a separate pass. Regalloc then operates normally on any flag recomputation instructions.

Change-Id: Ia1c3d9e6eff678861193093c0b48a00f90e4156b
Reviewed-on: https://go-review.googlesource.com/17694
Reviewed-by: David Chase <drchase@google.com>
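A conceptual Go sketch of such a flag-allocation walk, under the assumption that there is a single flags register: track which flags value currently occupies it and re-issue the comparison right before any use that needs a different one. The types and the copyValue callback are simplified stand-ins, not the pass's real code.

    // Sketch only: simplified stand-ins for the real SSA types.
    type Value struct {
        isFlags       bool     // produces a flags result
        clobbersFlags bool     // destroys the flags register as a side effect
        Args          []*Value // arguments, some of which may be flags values
    }

    type Block struct{ Values []*Value }

    // allocFlags keeps track of which flags value is live in the flags
    // register within each block and regenerates a comparison (via
    // copyValue) whenever a use needs a flags value that is not live.
    func allocFlags(blocks []*Block, copyValue func(*Value) *Value) {
        for _, b := range blocks {
            var live *Value // flags value currently in the flag register
            for _, v := range b.Values {
                for i, a := range v.Args {
                    if a.isFlags && a != live {
                        c := copyValue(a) // recompute the comparison here
                        v.Args[i] = c
                        live = c
                    }
                }
                if v.clobbersFlags {
                    live = nil
                }
                if v.isFlags {
                    live = v
                }
            }
        }
    }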