diff options
author | Xiangdong Ji <xiangdong.ji@arm.com> | 2020-05-06 09:54:40 +0000 |
---|---|---|
committer | Keith Randall <khr@golang.org> | 2020-05-29 15:39:54 +0000 |
commit | e8f5a33191b6b2690fdfda770272a650f4df631d (patch) | |
tree | 8df9f6abacde97b935ee85ffd4be4d26d7503947 /test/codegen | |
parent | 65f514edfb0ca5208e961318306eeddfdf79fda7 (diff) | |
download | go-e8f5a33191b6b2690fdfda770272a650f4df631d.tar.gz go-e8f5a33191b6b2690fdfda770272a650f4df631d.zip |
cmd/compile: fix incorrect rewriting to if condition
Some ARM64 rewriting rules convert 'comparing to zero' conditions of if
statements to a simplified version utilizing CMN and CMP instructions to
branch over condition flags, in order to save one Add or Sub caculation.
Such optimizations lead to wrong branching in case an overflow/underflow
occurs when executing CMN or CMP.
Fix the issue by introducing new block opcodes that don't honor the
overflow/underflow flag, in the following categories:
Block-Op Meaning ARM condition codes
1. LTnoov less than MI
2. GEnoov greater than or equal PL
3. LEnoov less than or equal MI || EQ
4. GTnoov greater than NEQ & PL
The backend generates two consecutive branch instructions for 'LEnoov'
and 'GTnoov' to model their expected behavior. A slight change to 'gc'
and amd64/386 backends is made to unify the code generation.
Add a test 'TestCondRewrite' as justification, it covers 32 incorrect rules
identified on arm64, more might be needed on other arches, like 32-bit arm.
Add two benchmarks profiling the aforementioned category 1&2 and category
3&4 separetely, we expect the first two categories will show performance
improvement and the second will not result in visible regression compared with
the non-optimized version.
This change also updates TestFormats to support using %#x.
Examples exhibiting where does the issue come from:
1: 'if x + 3 < 0' might be converted to:
before:
CMN $3, R0
BGE <else branch> // wrong branch is taken if 'x+3' overflows
after:
CMN $3, R0
BPL <else branch>
2: 'if y - 3 > 0' might be converted to:
before:
CMP $3, R0
BLE <else branch> // wrong branch is taken if 'y-3' underflows
after:
CMP $3, R0
BMI <else branch>
BEQ <else branch>
Benchmark data from different kinds of arm64 servers, 'old' is the non-optimized
version (not the parent commit), generally the optimization version outperforms.
S1:
name old time/op new time/op delta
CondRewrite/SoloJump 13.6ns ± 0% 12.9ns ± 0% -5.15% (p=0.000 n=10+10)
CondRewrite/CombJump 13.8ns ± 1% 12.9ns ± 0% -6.32% (p=0.000 n=10+10)
S2:
name old time/op new time/op delta
CondRewrite/SoloJump 11.6ns ± 0% 10.9ns ± 0% -6.03% (p=0.000 n=10+10)
CondRewrite/CombJump 11.4ns ± 0% 10.8ns ± 1% -5.53% (p=0.000 n=10+10)
S3:
name old time/op new time/op delta
CondRewrite/SoloJump 7.36ns ± 0% 7.50ns ± 0% +1.79% (p=0.000 n=9+10)
CondRewrite/CombJump 7.35ns ± 0% 7.75ns ± 0% +5.51% (p=0.000 n=8+9)
S4:
name old time/op new time/op delta
CondRewrite/SoloJump-224 11.5ns ± 1% 10.9ns ± 0% -4.97% (p=0.000 n=10+10)
CondRewrite/CombJump-224 11.9ns ± 0% 11.5ns ± 0% -2.95% (p=0.000 n=10+10)
S5:
name old time/op new time/op delta
CondRewrite/SoloJump 10.0ns ± 0% 10.0ns ± 0% -0.45% (p=0.000 n=9+10)
CondRewrite/CombJump 9.93ns ± 0% 9.77ns ± 0% -1.53% (p=0.000 n=10+9)
Go1 perf. data:
name old time/op new time/op delta
BinaryTree17 6.29s ± 1% 6.30s ± 1% ~ (p=1.000 n=5+5)
Fannkuch11 5.40s ± 0% 5.40s ± 0% ~ (p=0.841 n=5+5)
FmtFprintfEmpty 97.9ns ± 0% 98.9ns ± 3% ~ (p=0.937 n=4+5)
FmtFprintfString 171ns ± 3% 171ns ± 2% ~ (p=0.754 n=5+5)
FmtFprintfInt 212ns ± 0% 217ns ± 6% +2.55% (p=0.008 n=5+5)
FmtFprintfIntInt 296ns ± 1% 297ns ± 2% ~ (p=0.516 n=5+5)
FmtFprintfPrefixedInt 371ns ± 2% 374ns ± 7% ~ (p=1.000 n=5+5)
FmtFprintfFloat 435ns ± 1% 439ns ± 2% ~ (p=0.056 n=5+5)
FmtManyArgs 1.37µs ± 1% 1.36µs ± 1% ~ (p=0.730 n=5+5)
GobDecode 14.6ms ± 4% 14.4ms ± 4% ~ (p=0.690 n=5+5)
GobEncode 11.8ms ±20% 11.6ms ±15% ~ (p=1.000 n=5+5)
Gzip 507ms ± 0% 491ms ± 0% -3.22% (p=0.008 n=5+5)
Gunzip 73.8ms ± 0% 73.9ms ± 0% ~ (p=0.690 n=5+5)
HTTPClientServer 116µs ± 0% 116µs ± 0% ~ (p=0.686 n=4+4)
JSONEncode 21.8ms ± 1% 21.6ms ± 2% ~ (p=0.151 n=5+5)
JSONDecode 104ms ± 1% 103ms ± 1% -1.08% (p=0.016 n=5+5)
Mandelbrot200 9.53ms ± 0% 9.53ms ± 0% ~ (p=0.421 n=5+5)
GoParse 7.55ms ± 1% 7.51ms ± 1% ~ (p=0.151 n=5+5)
RegexpMatchEasy0_32 158ns ± 0% 158ns ± 0% ~ (all equal)
RegexpMatchEasy0_1K 606ns ± 1% 608ns ± 3% ~ (p=0.937 n=5+5)
RegexpMatchEasy1_32 143ns ± 0% 144ns ± 1% ~ (p=0.095 n=5+4)
RegexpMatchEasy1_1K 927ns ± 2% 944ns ± 2% ~ (p=0.056 n=5+5)
RegexpMatchMedium_32 16.0ns ± 0% 16.0ns ± 0% ~ (all equal)
RegexpMatchMedium_1K 69.3µs ± 2% 69.7µs ± 0% ~ (p=0.690 n=5+5)
RegexpMatchHard_32 3.73µs ± 0% 3.73µs ± 1% ~ (p=0.984 n=5+5)
RegexpMatchHard_1K 111µs ± 1% 110µs ± 0% ~ (p=0.151 n=5+5)
Revcomp 1.91s ±47% 1.77s ±68% ~ (p=1.000 n=5+5)
Template 138ms ± 1% 138ms ± 1% ~ (p=1.000 n=5+5)
TimeParse 787ns ± 2% 785ns ± 1% ~ (p=0.540 n=5+5)
TimeFormat 729ns ± 1% 726ns ± 1% ~ (p=0.151 n=5+5)
Updates #38740
Change-Id: I06c604874acdc1e63e66452dadee5df053045222
Reviewed-on: https://go-review.googlesource.com/c/go/+/233097
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
Diffstat (limited to 'test/codegen')
-rw-r--r-- | test/codegen/comparisons.go | 136 |
1 files changed, 136 insertions, 0 deletions
diff --git a/test/codegen/comparisons.go b/test/codegen/comparisons.go index c020ea8eb7..eb2f3317c9 100644 --- a/test/codegen/comparisons.go +++ b/test/codegen/comparisons.go @@ -248,3 +248,139 @@ func CmpLogicalToZero(a, b, c uint32, d, e uint64) uint64 { } return 0 } + +// The following CmpToZero_ex* check that cmp|cmn with bmi|bpl are generated for +// 'comparing to zero' expressions + +// var + const +func CmpToZero_ex1(a int64, e int32) int { + // arm64:`CMN`,-`ADD`,`(BMI|BPL)` + if a+3 < 0 { + return 1 + } + + // arm64:`CMN`,-`ADD`,`BEQ`,`(BMI|BPL)` + if a+5 <= 0 { + return 1 + } + + // arm64:`CMN`,-`ADD`,`(BMI|BPL)` + if a+13 >= 0 { + return 2 + } + + // arm64:`CMP`,-`SUB`,`(BMI|BPL)` + if a-7 < 0 { + return 3 + } + + // arm64:`CMP`,-`SUB`,`(BMI|BPL)` + if a-11 >= 0 { + return 4 + } + + // arm64:`CMP`,-`SUB`,`BEQ`,`(BMI|BPL)` + if a-19 > 0 { + return 4 + } + + // arm64:`CMNW`,-`ADDW`,`(BMI|BPL)` + if e+3 < 0 { + return 5 + } + + // arm64:`CMNW`,-`ADDW`,`(BMI|BPL)` + if e+13 >= 0 { + return 6 + } + + // arm64:`CMPW`,-`SUBW`,`(BMI|BPL)` + if e-7 < 0 { + return 7 + } + + // arm64:`CMPW`,-`SUBW`,`(BMI|BPL)` + if e-11 >= 0 { + return 8 + } + + return 0 +} + +// var + var +// TODO: optimize 'var - var' +func CmpToZero_ex2(a, b, c int64, e, f, g int32) int { + // arm64:`CMN`,-`ADD`,`(BMI|BPL)` + if a+b < 0 { + return 1 + } + + // arm64:`CMN`,-`ADD`,`BEQ`,`(BMI|BPL)` + if a+c <= 0 { + return 1 + } + + // arm64:`CMN`,-`ADD`,`(BMI|BPL)` + if b+c >= 0 { + return 2 + } + + // arm64:`CMNW`,-`ADDW`,`(BMI|BPL)` + if e+f < 0 { + return 5 + } + + // arm64:`CMNW`,-`ADDW`,`(BMI|BPL)` + if f+g >= 0 { + return 6 + } + return 0 +} + +// var + var*var +func CmpToZero_ex3(a, b, c, d int64, e, f, g, h int32) int { + // arm64:`CMN`,-`MADD`,`MUL`,`(BMI|BPL)` + if a+b*c < 0 { + return 1 + } + + // arm64:`CMN`,-`MADD`,`MUL`,`(BMI|BPL)` + if b+c*d >= 0 { + return 2 + } + + // arm64:`CMNW`,-`MADDW`,`MULW`,`BEQ`,`(BMI|BPL)` + if e+f*g > 0 { + return 5 + } + + // arm64:`CMNW`,-`MADDW`,`MULW`,`BEQ`,`(BMI|BPL)` + if f+g*h <= 0 { + return 6 + } + return 0 +} + +// var - var*var +func CmpToZero_ex4(a, b, c, d int64, e, f, g, h int32) int { + // arm64:`CMP`,-`MSUB`,`MUL`,`BEQ`,`(BMI|BPL)` + if a-b*c > 0 { + return 1 + } + + // arm64:`CMP`,-`MSUB`,`MUL`,`(BMI|BPL)` + if b-c*d >= 0 { + return 2 + } + + // arm64:`CMPW`,-`MSUBW`,`MULW`,`(BMI|BPL)` + if e-f*g < 0 { + return 5 + } + + // arm64:`CMPW`,-`MSUBW`,`MULW`,`(BMI|BPL)` + if f-g*h >= 0 { + return 6 + } + return 0 +} |