|
non-standard calls on ARM64
On ARM64, an (external) linker-generated trampoline may clobber R16
and R17. In CL 183842 we changed Duff's devices not to use those
registers. However, this is not enough. The register allocator
also needs to know that these registers may be clobbered in any
calls that don't follow the standard Go calling convention. This
includes Duff's devices and the write barrier.
Fixes #46927.
Updates #32773.
Change-Id: Ia52a891d9bbb8515c927617dd53aee5af5bd9aa4
Reviewed-on: https://go-review.googlesource.com/c/go/+/184437
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Meng Zhuo <mzh@golangcn.org>
Reviewed-by: Keith Randall <khr@golang.org>
Trust: Meng Zhuo <mzh@golangcn.org>
(cherry picked from commit 11b4aee05bfe83513cf08f83091e5aef8b33e766)
Reviewed-on: https://go-review.googlesource.com/c/go/+/331030
Trust: Cherry Mui <cherryyz@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
|
|
barrier call on PPC64
When external linking, for large binaries, the external linker
may insert a trampoline for the write barrier call, which looks like:
0000000005a98cc8 <__long_branch_runtime.gcWriteBarrier>:
5a98cc8: 86 01 82 3d addis r12,r2,390
5a98ccc: d8 bd 8c e9 ld r12,-16936(r12)
5a98cd0: a6 03 89 7d mtctr r12
5a98cd4: 20 04 80 4e bctr
It clobbers R12 (and CTR, which is never live across a call).
As at compile time we don't know whether the binary is big and
what link mode will be used, I think we need to mark R12 as
clobbered for the write barrier call. For extra safety (future-proofing),
we also mark caller-saved registers that cannot be used for function
arguments, which include R11, as potentially clobbered.
Updates #40851.
Fixes #40868.
Change-Id: Iedd901c5072f1127cc59b0a48cfeb4aaec81b519
Reviewed-on: https://go-review.googlesource.com/c/go/+/248917
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
(cherry picked from commit b58d29741650c7bf10b17f455666e2727e1cdd2e)
Reviewed-on: https://go-review.googlesource.com/c/go/+/249019
|
|
They were missed as part of the refactoring to use a separate
addressing modes pass.
Fixes #40426
Change-Id: Ie0418b2fac4ba1ffe720644ac918f6d728d5e420
Reviewed-on: https://go-review.googlesource.com/c/go/+/244859
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
Fixes the *noov opcodes so they handle a constant argument properly.
Most of the infrastructure for this CL is in CL 238077 (the arm32 one).
Fixes #39505
Change-Id: Id424a4e18964b848f05aa42f4d78e5f2e2cdf43b
Reviewed-on: https://go-review.googlesource.com/c/go/+/237999
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
Encode the flag results in an auxint field instead of having
one opcode per flag state. This helps us handle the new *noov
branches in a unified manner.
This is only for arm; arm64 is in a subsequent CL.
We could extend this to other architectures as well, although it would
only be cleanup, with no behavioral change.
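To illustrate the idea of carrying flag results as data rather than as one opcode per flag state, here is a minimal sketch. The `flagState` type, its bit layout, and the helper names are assumptions made for this example; they are not the compiler's actual representation.

```go
package main

import "fmt"

// flagState is a hypothetical packed representation of the four ARM
// condition flags. The bit layout is an assumption made for illustration.
type flagState uint8

const (
	flagN flagState = 1 << iota // result was negative
	flagZ                       // result was zero
	flagC                       // no borrow (unsigned x >= y)
	flagV                       // signed overflow
)

// subFlags computes the flags a 32-bit compare (x - y) would produce.
func subFlags(x, y int32) flagState {
	var f flagState
	r := x - y // wraps on overflow, like the hardware
	if r < 0 {
		f |= flagN
	}
	if r == 0 {
		f |= flagZ
	}
	if uint32(x) >= uint32(y) {
		f |= flagC
	}
	// Overflow: operands have different signs and the result's sign differs from x.
	if (x < 0) != (y < 0) && (r < 0) != (x < 0) {
		f |= flagV
	}
	return f
}

// lt evaluates a signed "less than" branch (N != V).
func (f flagState) lt() bool { return (f&flagN != 0) != (f&flagV != 0) }

// ltNoov evaluates the noov variant, which looks at N only.
func (f flagState) ltNoov() bool { return f&flagN != 0 }

func main() {
	f := subFlags(1, 2)
	fmt.Printf("flags=%04b lt=%v ltNoov=%v\n", f, f.lt(), f.ltNoov())
}
```

With a single packed value, each branch kind (including the noov ones) becomes a small predicate over the same data, which is the kind of unified handling the message describes.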
Update #39505
Change-Id: Ia46cea596faad540d1496c5915ab1274571543f0
Reviewed-on: https://go-review.googlesource.com/c/go/+/238077
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
These conversion instructions set the condition code and so should
be marked as clobbering flags.
Fixes #39651.
Change-Id: I91cc9687ea70ef0551bb3139c1875071c349d43e
Reviewed-on: https://go-review.googlesource.com/c/go/+/238628
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
Some ARM rewriting rules convert 'comparing to zero' conditions of if
statements to a simplified version utilizing CMN and CMP instructions to
branch over condition flags, in order to save one Add or Sub calculation.
Such optimizations lead to wrong branching in case an overflow/underflow
occurs when executing CMN or CMP.
Fix the issue by introducing new block opcodes that don't honor the
overflow/underflow flag:
    Block-Op   Meaning                 ARM condition codes
 1. LTnoov     less than               MI
 2. GEnoov     greater than or equal   PL
 3. LEnoov     less than or equal      MI || EQ
 4. GTnoov     greater than            NE && PL
The patch also adds a few test cases to cover scenarios that are specific
to ARM and fine-tunes the code generation tests for 'x-const'.
For more details please refer to the previous fix on 64-bit ARM:
https://go-review.googlesource.com/c/go/+/233097
Go1 benchmark results. 'old' is the non-optimized version, that is, with all
the affected rewriting rules removed.
name old time/op new time/op delta
BinaryTree17-8 7.73s ± 0% 7.81s ± 0% +0.97% (p=0.000 n=7+8)
Fannkuch11-8 7.06s ± 0% 7.00s ± 0% -0.83% (p=0.000 n=8+8)
FmtFprintfEmpty-8 181ns ± 1% 183ns ± 1% +1.31% (p=0.001 n=8+8)
FmtFprintfString-8 319ns ± 1% 325ns ± 2% +1.71% (p=0.009 n=7+8)
FmtFprintfInt-8 358ns ± 1% 359ns ± 1% ~ (p=0.293 n=7+7)
FmtFprintfIntInt-8 459ns ± 3% 456ns ± 1% ~ (p=0.869 n=8+8)
FmtFprintfPrefixedInt-8 535ns ± 4% 538ns ± 4% ~ (p=0.572 n=8+8)
FmtFprintfFloat-8 1.01µs ± 2% 1.01µs ± 2% ~ (p=0.625 n=8+8)
FmtManyArgs-8 1.93µs ± 2% 1.93µs ± 1% ~ (p=0.979 n=8+7)
GobDecode-8 16.1ms ± 1% 16.5ms ± 1% +2.32% (p=0.000 n=8+8)
GobEncode-8 15.9ms ± 0% 15.8ms ± 1% -1.00% (p=0.000 n=8+7)
Gzip-8 690ms ± 1% 670ms ± 0% -2.90% (p=0.000 n=8+8)
Gunzip-8 109ms ± 1% 109ms ± 1% ~ (p=0.694 n=7+8)
HTTPClientServer-8 149µs ± 3% 146µs ± 2% -1.70% (p=0.028 n=8+8)
JSONEncode-8 50.5ms ± 1% 49.2ms ± 0% -2.60% (p=0.001 n=7+7)
JSONDecode-8 135ms ± 2% 137ms ± 1% ~ (p=0.054 n=8+7)
Mandelbrot200-8 951ms ± 0% 952ms ± 0% ~ (p=0.852 n=6+8)
GoParse-8 9.47ms ± 1% 9.66ms ± 1% +2.01% (p=0.000 n=8+8)
RegexpMatchEasy0_32-8 288ns ± 2% 277ns ± 2% -3.61% (p=0.000 n=8+8)
RegexpMatchEasy0_1K-8 1.66µs ± 1% 1.69µs ± 2% +2.21% (p=0.001 n=7+7)
RegexpMatchEasy1_32-8 334ns ± 1% 305ns ± 2% -8.86% (p=0.000 n=8+8)
RegexpMatchEasy1_1K-8 2.14µs ± 2% 2.15µs ± 0% ~ (p=0.099 n=8+8)
RegexpMatchMedium_32-8 13.3ns ± 1% 13.3ns ± 0% ~ (p=1.000 n=7+7)
RegexpMatchMedium_1K-8 81.1µs ± 3% 80.7µs ± 1% ~ (p=0.955 n=7+8)
RegexpMatchHard_32-8 4.26µs ± 0% 4.26µs ± 0% ~ (p=0.933 n=7+8)
RegexpMatchHard_1K-8 124µs ± 0% 124µs ± 0% +0.31% (p=0.000 n=8+8)
Revcomp-8 14.7ms ± 2% 14.5ms ± 1% -1.66% (p=0.003 n=8+8)
Template-8 197ms ± 2% 200ms ± 3% +1.62% (p=0.021 n=8+8)
TimeParse-8 1.33µs ± 1% 1.30µs ± 1% -1.86% (p=0.002 n=8+8)
TimeFormat-8 3.04µs ± 1% 3.02µs ± 0% -0.60% (p=0.000 n=8+8)
name old speed new speed delta
GobDecode-8 47.6MB/s ± 1% 46.5MB/s ± 1% -2.28% (p=0.000 n=8+8)
GobEncode-8 48.1MB/s ± 0% 48.6MB/s ± 1% +1.02% (p=0.000 n=8+7)
Gzip-8 28.1MB/s ± 1% 29.0MB/s ± 0% +2.97% (p=0.000 n=8+8)
Gunzip-8 178MB/s ± 1% 179MB/s ± 2% ~ (p=0.694 n=7+8)
JSONEncode-8 38.4MB/s ± 1% 39.4MB/s ± 0% +2.67% (p=0.001 n=7+7)
JSONDecode-8 14.3MB/s ± 2% 14.2MB/s ± 1% -0.81% (p=0.043 n=8+7)
GoParse-8 6.12MB/s ± 1% 5.99MB/s ± 1% -2.00% (p=0.000 n=8+8)
RegexpMatchEasy0_32-8 111MB/s ± 2% 115MB/s ± 2% +3.77% (p=0.000 n=8+8)
RegexpMatchEasy0_1K-8 618MB/s ± 1% 604MB/s ± 2% -2.16% (p=0.001 n=7+7)
RegexpMatchEasy1_32-8 95.7MB/s ± 1% 105.1MB/s ± 2% +9.76% (p=0.000 n=8+8)
RegexpMatchEasy1_1K-8 479MB/s ± 2% 477MB/s ± 0% ~ (p=0.105 n=8+8)
RegexpMatchMedium_32-8 75.2MB/s ± 1% 75.2MB/s ± 0% ~ (p=0.247 n=7+7)
RegexpMatchMedium_1K-8 12.6MB/s ± 3% 12.7MB/s ± 1% ~ (p=0.538 n=7+8)
RegexpMatchHard_32-8 7.52MB/s ± 0% 7.52MB/s ± 0% ~ (p=0.968 n=7+8)
RegexpMatchHard_1K-8 8.26MB/s ± 0% 8.24MB/s ± 0% -0.30% (p=0.001 n=8+8)
Revcomp-8 173MB/s ± 2% 176MB/s ± 1% +1.68% (p=0.003 n=8+8)
Template-8 9.85MB/s ± 2% 9.69MB/s ± 3% -1.59% (p=0.021 n=8+8)
Fixes #39303
Updates #38740
Change-Id: I0a5f87bfda679f66414c0041ace2ca2e28363f36
Reviewed-on: https://go-review.googlesource.com/c/go/+/236637
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Some ARM64 rewriting rules convert 'comparing to zero' conditions of if
statements to a simplified version utilizing CMN and CMP instructions to
branch over condition flags, in order to save one Add or Sub calculation.
Such optimizations lead to wrong branching in case an overflow/underflow
occurs when executing CMN or CMP.
Fix the issue by introducing new block opcodes that don't honor the
overflow/underflow flag, in the following categories:
    Block-Op   Meaning                 ARM condition codes
 1. LTnoov     less than               MI
 2. GEnoov     greater than or equal   PL
 3. LEnoov     less than or equal      MI || EQ
 4. GTnoov     greater than            NE && PL
The backend generates two consecutive branch instructions for 'LEnoov'
and 'GTnoov' to model their expected behavior. A slight change to 'gc'
and amd64/386 backends is made to unify the code generation.
Add a test, 'TestCondRewrite', as justification. It covers 32 incorrect rules
identified on arm64; more might be needed on other arches, like 32-bit arm.
Add two benchmarks that profile the aforementioned categories 1&2 and categories
3&4 separately. We expect the first two categories to show a performance
improvement and the second two to show no visible regression compared with
the non-optimized version.
This change also updates TestFormats to support using %#x.
Examples showing where the issue comes from:
1: 'if x + 3 < 0' might be converted to:
   before:
     CMN $3, R0
     BGE <else branch> // wrong branch is taken if 'x+3' overflows
   after:
     CMN $3, R0
     BPL <else branch>
2: 'if y - 3 > 0' might be converted to:
   before:
     CMP $3, R0
     BLE <else branch> // wrong branch is taken if 'y-3' underflows
   after:
     CMP $3, R0
     BMI <else branch>
     BEQ <else branch>
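The first example can be modelled in Go to see why the overflow-sensitive branch goes wrong. This is only a sketch: `cmnFlags`, `takesElseBGE` and `takesElseBPL` are hypothetical helpers that mimic the N and V flags for 64-bit operands, not compiler or runtime code.

```go
package main

import (
	"fmt"
	"math"
)

// cmnFlags models the N and V flags that "CMN $c, x" (flags of x + c)
// would set for 64-bit operands. The helper name is illustrative only.
func cmnFlags(x, c int64) (n, v bool) {
	sum := x + c // wraps on overflow
	n = sum < 0
	// Signed overflow: both operands share a sign that the sum does not.
	v = (x < 0) == (c < 0) && (sum < 0) != (x < 0)
	return n, v
}

// takesElseBGE reports whether BGE (N == V) jumps to the else branch.
func takesElseBGE(x, c int64) bool {
	n, v := cmnFlags(x, c)
	return n == v
}

// takesElseBPL reports whether BPL (N clear) jumps to the else branch.
func takesElseBPL(x, c int64) bool {
	n, _ := cmnFlags(x, c)
	return !n
}

func main() {
	for _, x := range []int64{-10, 0, math.MaxInt64} {
		// Go semantics: the else branch runs when the wrapped x+3 < 0 is false.
		want := !(x+3 < 0)
		fmt.Printf("x=%d else(Go)=%v else(BGE)=%v else(BPL)=%v\n",
			x, want, takesElseBGE(x, 3), takesElseBPL(x, 3))
	}
}
```

For inputs that do not overflow, both branch kinds agree with the Go semantics. For x = math.MaxInt64 the sum wraps to a negative value, so Go takes the "then" branch, while the modelled BGE still jumps to the else branch because N and V are both set.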
Benchmark data from different kinds of arm64 servers. 'old' is the non-optimized
version (not the parent commit); the optimized version generally outperforms it.
S1:
name old time/op new time/op delta
CondRewrite/SoloJump 13.6ns ± 0% 12.9ns ± 0% -5.15% (p=0.000 n=10+10)
CondRewrite/CombJump 13.8ns ± 1% 12.9ns ± 0% -6.32% (p=0.000 n=10+10)
S2:
name old time/op new time/op delta
CondRewrite/SoloJump 11.6ns ± 0% 10.9ns ± 0% -6.03% (p=0.000 n=10+10)
CondRewrite/CombJump 11.4ns ± 0% 10.8ns ± 1% -5.53% (p=0.000 n=10+10)
S3:
name old time/op new time/op delta
CondRewrite/SoloJump 7.36ns ± 0% 7.50ns ± 0% +1.79% (p=0.000 n=9+10)
CondRewrite/CombJump 7.35ns ± 0% 7.75ns ± 0% +5.51% (p=0.000 n=8+9)
S4:
name old time/op new time/op delta
CondRewrite/SoloJump-224 11.5ns ± 1% 10.9ns ± 0% -4.97% (p=0.000 n=10+10)
CondRewrite/CombJump-224 11.9ns ± 0% 11.5ns ± 0% -2.95% (p=0.000 n=10+10)
S5:
name old time/op new time/op delta
CondRewrite/SoloJump 10.0ns ± 0% 10.0ns ± 0% -0.45% (p=0.000 n=9+10)
CondRewrite/CombJump 9.93ns ± 0% 9.77ns ± 0% -1.53% (p=0.000 n=10+9)
Go1 perf. data:
name old time/op new time/op delta
BinaryTree17 6.29s ± 1% 6.30s ± 1% ~ (p=1.000 n=5+5)
Fannkuch11 5.40s ± 0% 5.40s ± 0% ~ (p=0.841 n=5+5)
FmtFprintfEmpty 97.9ns ± 0% 98.9ns ± 3% ~ (p=0.937 n=4+5)
FmtFprintfString 171ns ± 3% 171ns ± 2% ~ (p=0.754 n=5+5)
FmtFprintfInt 212ns ± 0% 217ns ± 6% +2.55% (p=0.008 n=5+5)
FmtFprintfIntInt 296ns ± 1% 297ns ± 2% ~ (p=0.516 n=5+5)
FmtFprintfPrefixedInt 371ns ± 2% 374ns ± 7% ~ (p=1.000 n=5+5)
FmtFprintfFloat 435ns ± 1% 439ns ± 2% ~ (p=0.056 n=5+5)
FmtManyArgs 1.37µs ± 1% 1.36µs ± 1% ~ (p=0.730 n=5+5)
GobDecode 14.6ms ± 4% 14.4ms ± 4% ~ (p=0.690 n=5+5)
GobEncode 11.8ms ±20% 11.6ms ±15% ~ (p=1.000 n=5+5)
Gzip 507ms ± 0% 491ms ± 0% -3.22% (p=0.008 n=5+5)
Gunzip 73.8ms ± 0% 73.9ms ± 0% ~ (p=0.690 n=5+5)
HTTPClientServer 116µs ± 0% 116µs ± 0% ~ (p=0.686 n=4+4)
JSONEncode 21.8ms ± 1% 21.6ms ± 2% ~ (p=0.151 n=5+5)
JSONDecode 104ms ± 1% 103ms ± 1% -1.08% (p=0.016 n=5+5)
Mandelbrot200 9.53ms ± 0% 9.53ms ± 0% ~ (p=0.421 n=5+5)
GoParse 7.55ms ± 1% 7.51ms ± 1% ~ (p=0.151 n=5+5)
RegexpMatchEasy0_32 158ns ± 0% 158ns ± 0% ~ (all equal)
RegexpMatchEasy0_1K 606ns ± 1% 608ns ± 3% ~ (p=0.937 n=5+5)
RegexpMatchEasy1_32 143ns ± 0% 144ns ± 1% ~ (p=0.095 n=5+4)
RegexpMatchEasy1_1K 927ns ± 2% 944ns ± 2% ~ (p=0.056 n=5+5)
RegexpMatchMedium_32 16.0ns ± 0% 16.0ns ± 0% ~ (all equal)
RegexpMatchMedium_1K 69.3µs ± 2% 69.7µs ± 0% ~ (p=0.690 n=5+5)
RegexpMatchHard_32 3.73µs ± 0% 3.73µs ± 1% ~ (p=0.984 n=5+5)
RegexpMatchHard_1K 111µs ± 1% 110µs ± 0% ~ (p=0.151 n=5+5)
Revcomp 1.91s ±47% 1.77s ±68% ~ (p=1.000 n=5+5)
Template 138ms ± 1% 138ms ± 1% ~ (p=1.000 n=5+5)
TimeParse 787ns ± 2% 785ns ± 1% ~ (p=0.540 n=5+5)
TimeFormat 729ns ± 1% 726ns ± 1% ~ (p=0.151 n=5+5)
Updates #38740
Change-Id: I06c604874acdc1e63e66452dadee5df053045222
Reviewed-on: https://go-review.googlesource.com/c/go/+/233097
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
|
|
This CL changes the arm64 TBZ/TBNZ block from using Aux to using
a (typed) AuxInt. The corresponding rules have also been changed
to be typed.
Passes
GOARCH=arm64 gotip build -toolexec 'toolstash -cmp' -a std
Change-Id: I98d0cd2a791948f1db13259c17fb1b9b2807a043
Reviewed-on: https://go-review.googlesource.com/c/go/+/230839
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
name old time/op new time/op delta
Modify-16 404ns ± 1% 365ns ± 1% -9.73% (p=0.000 n=10+10)
ConstModify-16 407ns ± 0% 385ns ± 2% -5.56% (p=0.000 n=9+10)
Seems to generally help generated code.
Binary size change is in the noise.
Change-Id: I57891bfaf0f7dfc5d143bb9f7ebafc7079d2614f
Reviewed-on: https://go-review.googlesource.com/c/go/+/228098
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
|
|
name old time/op new time/op delta
LoadAdd-16 545ns ± 0% 456ns ± 0% -16.31% (p=0.000 n=10+10)
Update #36468
Change-Id: I84f390d55490648fa1f58cdbc24fd74c4f1bc8c1
Reviewed-on: https://go-review.googlesource.com/c/go/+/227960
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
|
|
PanicBounds and PanicExtend are lowered to runtime calls (with a
non-Go ABI), but are not currently marked as calls. Since liveness
analysis only emits stack maps at calls in the runtime, this means
these panic call sites in the runtime won't get a stack map. These
almost immediately turn into throws in the runtime, but there's still
a chance they'll try to grow the stack first, which would lead to a
different panic.
To fix this, mark these operations as calls.
Outside the runtime, we currently emit stack maps for everything that
isn't an unsafe-point, so these panic calls get stack maps by default.
However, we're about to move to emitting stack maps only at call
sites, at which point this will start to matter outside the runtime as
well.
I confirmed that this has no effect on anything but PCDATA/FUNCDATA in
runtime and net/http.
For #36365.
Change-Id: Ic5bb463fd152cc320c815dc04cf62005261ae169
Reviewed-on: https://go-review.googlesource.com/c/go/+/230539
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
This updates the PPC64.rules file to use the MOD instructions
that are available in power9. Prior to power9 this is done
using a longer sequence with multiply and divide.
Included in this change is removal of the REM* opcode variations
that set the CC or OV bits since their settings are based
on the DIV and are not appropriate for the REM.
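As a rough illustration of the longer sequence mentioned above, a remainder can be computed with a divide, a multiply and a subtract. The function name `remViaDiv` is made up for this sketch; it shows the arithmetic identity only, not the generated PPC64 code.

```go
package main

import "fmt"

// remViaDiv computes x % y with a divide, a multiply and a subtract,
// the shape of the longer pre-power9 sequence described above.
// The function name is illustrative; y must be non-zero.
func remViaDiv(x, y int64) int64 {
	q := x / y     // truncated quotient
	return x - q*y // remainder has the sign of x
}

func main() {
	for _, p := range [][2]int64{{7, 3}, {-7, 3}, {7, -3}, {-7, -3}} {
		fmt.Println(p[0], p[1], remViaDiv(p[0], p[1]), p[0]%p[1])
	}
}
```

A single MOD instruction replaces all three steps, which is why it is preferable when the target supports it.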
Change-Id: Iceed9ce33e128e1911c15592ee674276ce8ba3fa
Reviewed-on: https://go-review.googlesource.com/c/go/+/229761
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
Implement multi-control branches for riscv64, switching to using the BNEZ
pseudo-instruction when rewriting conditionals. This will allow for further
branch optimisations to later be performed via rewrites.
Change-Id: I7f2c69f3c77494b403f26058c6bc8432d8070ad0
Reviewed-on: https://go-review.googlesource.com/c/go/+/226399
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Joel Sing <joel@sing.id.au>
|
|
This first pass makes the rules using the condition code mask
(CCMask) and rotate parameters (RotateParams) aux values strongly
typed. This required adding strongly typed aux handling to the
block rulegen.
More CLs like this to follow, but this is probably the most
complex.
Passes toolstash-check -all.
Change-Id: Ie513b07d527f0c1b398d7748331442dcb5f7b17d
Reviewed-on: https://go-review.googlesource.com/c/go/+/228518
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
On s390x, some floating point arithmetic instructions (FSUB, FADD) also generate a flag.
This patch allows the related SSA ops to return a tuple, where the second element of
the tuple is the generated flag. We can use the flag and remove the
subsequent comparison instruction (e.g. LTDBR).
This CL also reduces the .text section for math.test binary by 0.4KB.
Benchmarks:
name old time/op new time/op delta
Acos-18 12.1ns ± 0% 12.1ns ± 0% ~ (all equal)
Acosh-18 18.5ns ± 0% 18.5ns ± 0% ~ (all equal)
Asin-18 13.1ns ± 0% 13.1ns ± 0% ~ (all equal)
Asinh-18 19.4ns ± 0% 19.5ns ± 1% ~ (p=0.444 n=5+5)
Atan-18 10.0ns ± 0% 10.0ns ± 0% ~ (all equal)
Atanh-18 19.1ns ± 1% 19.2ns ± 2% ~ (p=0.841 n=5+5)
Atan2-18 16.4ns ± 0% 16.4ns ± 0% ~ (all equal)
Cbrt-18 14.8ns ± 0% 14.8ns ± 0% ~ (all equal)
Ceil-18 0.78ns ± 0% 0.78ns ± 0% ~ (all equal)
Copysign-18 0.80ns ± 0% 0.80ns ± 0% ~ (all equal)
Cos-18 7.19ns ± 0% 7.19ns ± 0% ~ (p=0.556 n=4+5)
Cosh-18 12.4ns ± 0% 12.4ns ± 0% ~ (all equal)
Erf-18 10.8ns ± 0% 10.8ns ± 0% ~ (all equal)
Erfc-18 11.0ns ± 0% 11.0ns ± 0% ~ (all equal)
Erfinv-18 23.0ns ±16% 26.8ns ± 1% +16.90% (p=0.008 n=5+5)
Erfcinv-18 23.3ns ±15% 26.1ns ± 7% ~ (p=0.087 n=5+5)
Exp-18 8.67ns ± 0% 8.67ns ± 0% ~ (p=1.000 n=4+4)
ExpGo-18 50.8ns ± 3% 52.4ns ± 2% ~ (p=0.063 n=5+5)
Expm1-18 9.49ns ± 1% 9.47ns ± 0% ~ (p=1.000 n=5+5)
Exp2-18 52.7ns ± 1% 50.5ns ± 3% -4.10% (p=0.024 n=5+5)
Exp2Go-18 50.6ns ± 1% 48.4ns ± 3% -4.39% (p=0.008 n=5+5)
Abs-18 0.67ns ± 0% 0.67ns ± 0% ~ (p=0.444 n=5+5)
Dim-18 1.02ns ± 0% 1.03ns ± 0% +0.98% (p=0.008 n=5+5)
Floor-18 0.78ns ± 0% 0.78ns ± 0% ~ (all equal)
Max-18 3.09ns ± 1% 3.05ns ± 0% -1.42% (p=0.008 n=5+5)
Min-18 3.32ns ± 1% 3.30ns ± 0% -0.72% (p=0.016 n=5+4)
Mod-18 62.3ns ± 1% 65.8ns ± 3% +5.55% (p=0.008 n=5+5)
Frexp-18 5.05ns ± 2% 4.98ns ± 0% ~ (p=0.683 n=5+5)
Gamma-18 24.4ns ± 0% 24.1ns ± 0% -1.23% (p=0.008 n=5+5)
Hypot-18 10.3ns ± 0% 10.3ns ± 0% ~ (all equal)
HypotGo-18 10.2ns ± 0% 10.2ns ± 0% ~ (all equal)
Ilogb-18 3.56ns ± 1% 3.54ns ± 0% ~ (p=0.595 n=5+5)
J0-18 113ns ± 0% 108ns ± 1% -4.42% (p=0.016 n=4+5)
J1-18 115ns ± 0% 109ns ± 1% -4.87% (p=0.016 n=4+5)
Jn-18 240ns ± 0% 230ns ± 2% -4.41% (p=0.008 n=5+5)
Ldexp-18 6.19ns ± 0% 6.19ns ± 0% ~ (p=0.444 n=5+5)
Lgamma-18 32.2ns ± 0% 32.2ns ± 0% ~ (all equal)
Log-18 13.1ns ± 0% 13.1ns ± 0% ~ (all equal)
Logb-18 4.23ns ± 0% 4.22ns ± 0% ~ (p=0.444 n=5+5)
Log1p-18 12.7ns ± 0% 12.7ns ± 0% ~ (all equal)
Log10-18 18.1ns ± 0% 18.2ns ± 0% ~ (p=0.167 n=5+5)
Log2-18 14.0ns ± 0% 14.0ns ± 0% ~ (all equal)
Modf-18 10.4ns ± 0% 10.5ns ± 0% +0.96% (p=0.016 n=4+5)
Nextafter32-18 11.3ns ± 0% 11.3ns ± 0% ~ (all equal)
Nextafter64-18 4.01ns ± 1% 3.97ns ± 0% ~ (p=0.333 n=5+4)
PowInt-18 32.7ns ± 0% 32.7ns ± 0% ~ (all equal)
PowFrac-18 33.2ns ± 0% 33.1ns ± 0% ~ (p=0.095 n=4+5)
Pow10Pos-18 1.58ns ± 0% 1.58ns ± 0% ~ (all equal)
Pow10Neg-18 5.81ns ± 0% 5.81ns ± 0% ~ (all equal)
Round-18 0.78ns ± 0% 0.78ns ± 0% ~ (all equal)
RoundToEven-18 0.78ns ± 0% 0.78ns ± 0% ~ (all equal)
Remainder-18 40.6ns ± 0% 40.7ns ± 0% ~ (p=0.238 n=5+4)
Signbit-18 1.57ns ± 0% 1.57ns ± 0% ~ (all equal)
Sin-18 6.75ns ± 0% 6.74ns ± 0% ~ (p=0.333 n=5+4)
Sincos-18 29.5ns ± 0% 29.5ns ± 0% ~ (all equal)
Sinh-18 14.4ns ± 0% 14.4ns ± 0% ~ (all equal)
SqrtIndirect-18 3.97ns ± 0% 4.15ns ± 0% +4.59% (p=0.008 n=5+5)
SqrtLatency-18 8.01ns ± 0% 8.01ns ± 0% ~ (all equal)
SqrtIndirectLatency-18 11.6ns ± 0% 11.6ns ± 0% ~ (all equal)
SqrtGoLatency-18 44.7ns ± 0% 45.0ns ± 0% +0.67% (p=0.008 n=5+5)
SqrtPrime-18 1.26µs ± 0% 1.27µs ± 0% +0.63% (p=0.029 n=4+4)
Tan-18 11.1ns ± 0% 11.1ns ± 0% ~ (all equal)
Tanh-18 15.8ns ± 0% 15.8ns ± 0% ~ (all equal)
Trunc-18 0.78ns ± 0% 0.78ns ± 0% ~ (all equal)
Y0-18 113ns ± 2% 108ns ± 3% -5.11% (p=0.008 n=5+5)
Y1-18 112ns ± 3% 107ns ± 0% -4.29% (p=0.000 n=5+4)
Yn-18 229ns ± 0% 220ns ± 1% -3.76% (p=0.016 n=4+5)
Float64bits-18 1.09ns ± 0% 1.09ns ± 0% ~ (all equal)
Float64frombits-18 0.55ns ± 0% 0.55ns ± 0% ~ (all equal)
Float32bits-18 0.96ns ±16% 0.86ns ± 0% ~ (p=0.563 n=5+5)
Float32frombits-18 1.03ns ±28% 0.84ns ± 0% ~ (p=0.167 n=5+5)
FMA-18 1.60ns ± 0% 1.60ns ± 0% ~ (all equal)
[Geo mean] 10.0ns 9.9ns -0.41%
Change-Id: Ief7e63ea5a8ba404b0a4696e12b9b7e0b05a9a03
Reviewed-on: https://go-review.googlesource.com/c/go/+/209160
Reviewed-by: Michael Munday <mike.munday@ibm.com>
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Extend CL 220417 (which removed the integer Greater and Geq ops) to
floating point comparisons. Greater and Geq can always be
implemented using Less and Leq.
Fixes #37316.
Change-Id: Ieaddb4877dd0ff9037a1dd11d0a9a9e45ced71e7
Reviewed-on: https://go-review.googlesource.com/c/go/+/222397
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
This change includes the following:
- Generate LXV/STXV sequences instead of LXVD2X/STXVD2X on power9.
These instructions do not require an index register, which
allows more loads and stores within a loop without initializing
multiple index registers. The LoweredQuadXXX generate LXV/STXV.
- Create LoweredMoveXXXShort and LoweredZeroXXXShort for short
moves that don't generate loops, and therefore don't clobber the
address registers or flags.
- Use registers other than R3 and R4 so as not to conflict with
registers that have already been allocated, which avoids unnecessary
register moves.
- Eliminate the use of R14 as scratch register and use R31
instead.
- Add PCALIGN when the LoweredMoveXXX or LoweredZeroXXX generates a
loop with more than 3 iterations.
This performance opportunity was noticed in github.com/golang/snappy
benchmarks. Results on power9:
WordsDecode1e1 54.1ns ± 0% 53.8ns ± 0% -0.51% (p=0.029 n=4+4)
WordsDecode1e2 287ns ± 0% 282ns ± 1% -1.83% (p=0.029 n=4+4)
WordsDecode1e3 3.98µs ± 0% 3.64µs ± 0% -8.52% (p=0.029 n=4+4)
WordsDecode1e4 66.9µs ± 0% 67.0µs ± 0% +0.20% (p=0.029 n=4+4)
WordsDecode1e5 723µs ± 0% 723µs ± 0% -0.01% (p=0.200 n=4+4)
WordsDecode1e6 7.21ms ± 0% 7.21ms ± 0% -0.02% (p=1.000 n=4+4)
WordsEncode1e1 29.9ns ± 0% 29.4ns ± 0% -1.51% (p=0.029 n=4+4)
WordsEncode1e2 2.12µs ± 0% 1.75µs ± 0% -17.70% (p=0.029 n=4+4)
WordsEncode1e3 11.7µs ± 0% 11.2µs ± 0% -4.61% (p=0.029 n=4+4)
WordsEncode1e4 119µs ± 0% 120µs ± 0% +0.36% (p=0.029 n=4+4)
WordsEncode1e5 1.21ms ± 0% 1.22ms ± 0% +0.41% (p=0.029 n=4+4)
WordsEncode1e6 12.0ms ± 0% 12.0ms ± 0% +0.57% (p=0.029 n=4+4)
RandomEncode 286µs ± 0% 203µs ± 0% -28.82% (p=0.029 n=4+4)
ExtendMatch 47.4µs ± 0% 47.0µs ± 0% -0.85% (p=0.029 n=4+4)
Change-Id: Iecad3a39ae55280286e42760a5c9d5c1168f5858
Reviewed-on: https://go-review.googlesource.com/c/go/+/226539
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
Before using some CPU instructions, we must check for their presence.
We use global variables in the runtime package to record features.
Prior to this CL, we issued a regular memory load for these features.
The downside to this is that, because it is a regular memory load,
it cannot be hoisted out of loops or otherwise reordered with other loads.
This CL introduces a new intrinsic just for checking cpu features.
It still ends up resulting in a memory load, but that memory load can
now be floated to the entry block and rematerialized as needed.
One downside is that the regular load could be combined with the comparison
into a CMPBconstload+NE. This new intrinsic cannot; it generates MOVB+TESTB+NE.
(It is possible that MOVBQZX+TESTQ+NE would be better.)
This CL does only amd64. It is easy to extend to other architectures.
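The effect of hoisting can be shown by hand in ordinary Go. This is a sketch under assumptions: `hasFMA` is a stand-in variable invented for the example, not the runtime's feature flag, and both functions expect `ys` to be at least as long as `xs`.

```go
package main

import (
	"fmt"
	"math"
)

// hasFMA stands in for a runtime feature flag; the name and value are
// assumptions for illustration, not the runtime's actual variable.
var hasFMA = true

// dotPerIteration reads the flag on every iteration, like a regular
// memory load that stays inside the loop.
func dotPerIteration(xs, ys []float64) float64 {
	var s float64
	for i := range xs {
		if hasFMA {
			s = math.FMA(xs[i], ys[i], s)
		} else {
			s += xs[i] * ys[i]
		}
	}
	return s
}

// dotHoisted reads the flag once before the loop, which is the shape
// the hoisted check makes possible.
func dotHoisted(xs, ys []float64) float64 {
	useFMA := hasFMA // single read, done before the loop
	var s float64
	for i := range xs {
		if useFMA {
			s = math.FMA(xs[i], ys[i], s)
		} else {
			s += xs[i] * ys[i]
		}
	}
	return s
}

func main() {
	xs := []float64{1, 2, 3}
	ys := []float64{4, 5, 6}
	fmt.Println(dotPerIteration(xs, ys), dotHoisted(xs, ys))
}
```

Both functions compute the same value; the difference is only where the flag is read, which matters when the loop body is small.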
For the benchmark in #36196, on my machine, this offers a mild speedup.
name old time/op new time/op delta
FMA-8 1.39ns ± 6% 1.29ns ± 9% -7.19% (p=0.000 n=97+96)
NonFMA-8 2.03ns ±11% 2.04ns ±12% ~ (p=0.618 n=99+98)
Updates #15808
Updates #36196
Change-Id: I75e2fcfcf5a6df1bdb80657a7143bed69fca6deb
Reviewed-on: https://go-review.googlesource.com/c/go/+/212360
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
|
|
Store multiple instructions can clobber flags on s390x when the
offset passed into the assembler is outside the range representable
with a signed 20 bit integer. This is because the assembler uses
the agfi instruction to implement the large offset. The assembler
could use a different sequence of instructions, but for now just
mark the instruction as 'clobberFlags' since this is risk free.
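For reference, the range in question can be written as a small predicate. The helper name `fitsSigned20` is invented for this sketch and is not an assembler function.

```go
package main

import "fmt"

// fitsSigned20 reports whether off is representable as a signed 20-bit
// integer, i.e. lies in [-2^19, 2^19-1]. The helper name is illustrative.
func fitsSigned20(off int64) bool {
	return off >= -(1<<19) && off < 1<<19
}

func main() {
	for _, off := range []int64{0, 524287, 524288, -524288, -524289} {
		fmt.Println(off, fitsSigned20(off))
	}
}
```

Offsets for which this predicate is false are the ones that need the extra instruction described above, and so are the ones where flags may be clobbered.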
Noticed while investigating #38195.
No test yet since I'm not sure how to get this bug to trigger and
I haven't seen it affect real code.
Change-Id: I4a6ab96455a3ef8ffacb76ef0166b97eb40ff925
Reviewed-on: https://go-review.googlesource.com/c/go/+/226759
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
Things like CMPQ 4(AX)(BX*8), CX
Fixes #37955
Change-Id: Icbed430f65c91a0e3f38a633d8321d79433ad8b3
Reviewed-on: https://go-review.googlesource.com/c/go/+/224219
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
|
|
The compiler-inserted write barrier calls use a special ABI
for speed and to minimize the binary size impact.
runtime.gcWriteBarrier takes its args in DI and AX.
This change adds gcWriteBarrier wrapper functions,
varying only in the register used for the second argument.
(Allowing variation in the first argument doesn't offer improvements,
which is convenient, as it avoids quadratic API growth.)
This reduces the number of register copies.
The goal is a smaller binary through reduced register pressure and fewer copies.
One downside to this change is that when the write barrier is on,
we may bounce through several different write barrier wrappers,
which is bad for the instruction cache.
Package runtime write barrier benchmarks for this change:
name old time/op new time/op delta
WriteBarrier-8 16.6ns ± 6% 15.6ns ± 6% -5.73% (p=0.000 n=97+99)
BulkWriteBarrier-8 4.37ns ± 7% 4.22ns ± 8% -3.45% (p=0.000 n=96+99)
However, I don't particularly trust these numbers.
I ran runtime.BenchmarkWriteBarrier multiple times as I rebased
this change, and noticed that the results have high variance
depending on the parent change, perhaps due to alignment.
This change was stress tested with GOGC=1 GODEBUG=gccheckmark=1 go test std.
This change reduces binary sizes:
file before after Δ %
addr2line 4308720 4296688 -12032 -0.279%
api 5965592 5945368 -20224 -0.339%
asm 5148088 5025464 -122624 -2.382%
buildid 2848760 2844904 -3856 -0.135%
cgo 4828968 4812840 -16128 -0.334%
compile 19754720 19529744 -224976 -1.139%
cover 5256840 5236600 -20240 -0.385%
dist 3670312 3658264 -12048 -0.328%
doc 4669608 4657576 -12032 -0.258%
fix 3377976 3365944 -12032 -0.356%
link 6614888 6586472 -28416 -0.430%
nm 4258368 4254528 -3840 -0.090%
objdump 4656336 4644304 -12032 -0.258%
pack 2295176 2295432 +256 +0.011%
pprof 14762356 14709364 -52992 -0.359%
test2json 2824456 2820600 -3856 -0.137%
trace 11684404 11643700 -40704 -0.348%
vet 8284760 8252248 -32512 -0.392%
total 115210328 114580040 -630288 -0.547%
This change improves compiler performance:
name old time/op new time/op delta
Template 208ms ± 3% 207ms ± 3% -0.40% (p=0.030 n=43+44)
Unicode 80.2ms ± 3% 81.3ms ± 3% +1.25% (p=0.000 n=41+44)
GoTypes 699ms ± 3% 694ms ± 2% -0.71% (p=0.016 n=42+37)
Compiler 3.26s ± 2% 3.23s ± 2% -0.86% (p=0.000 n=43+45)
SSA 6.97s ± 1% 6.93s ± 1% -0.63% (p=0.000 n=43+45)
Flate 134ms ± 3% 133ms ± 2% ~ (p=0.139 n=45+42)
GoParser 165ms ± 2% 164ms ± 1% -0.79% (p=0.000 n=45+40)
Reflect 434ms ± 4% 435ms ± 4% ~ (p=0.937 n=44+44)
Tar 181ms ± 2% 181ms ± 2% ~ (p=0.702 n=43+45)
XML 244ms ± 2% 244ms ± 2% ~ (p=0.237 n=45+44)
[Geo mean] 403ms 402ms -0.29%
name old user-time/op new user-time/op delta
Template 271ms ± 2% 268ms ± 1% -1.40% (p=0.000 n=42+42)
Unicode 117ms ± 3% 116ms ± 5% ~ (p=0.066 n=45+45)
GoTypes 948ms ± 2% 936ms ± 2% -1.30% (p=0.000 n=41+40)
Compiler 4.26s ± 1% 4.21s ± 2% -1.25% (p=0.000 n=37+45)
SSA 9.52s ± 2% 9.41s ± 1% -1.18% (p=0.000 n=44+45)
Flate 167ms ± 2% 165ms ± 2% -1.15% (p=0.000 n=44+41)
GoParser 201ms ± 2% 198ms ± 1% -1.40% (p=0.000 n=43+43)
Reflect 563ms ± 8% 560ms ± 7% ~ (p=0.206 n=45+44)
Tar 224ms ± 2% 222ms ± 2% -0.81% (p=0.000 n=45+45)
XML 308ms ± 2% 304ms ± 1% -1.17% (p=0.000 n=42+43)
[Geo mean] 525ms 519ms -1.08%
name old alloc/op new alloc/op delta
Template 36.3MB ± 0% 36.3MB ± 0% ~ (p=0.421 n=5+5)
Unicode 28.4MB ± 0% 28.3MB ± 0% ~ (p=0.056 n=5+5)
GoTypes 121MB ± 0% 121MB ± 0% -0.14% (p=0.008 n=5+5)
Compiler 567MB ± 0% 567MB ± 0% -0.06% (p=0.016 n=4+5)
SSA 1.26GB ± 0% 1.26GB ± 0% -0.07% (p=0.008 n=5+5)
Flate 22.9MB ± 0% 22.8MB ± 0% ~ (p=0.310 n=5+5)
GoParser 28.0MB ± 0% 27.9MB ± 0% -0.09% (p=0.008 n=5+5)
Reflect 78.4MB ± 0% 78.4MB ± 0% -0.03% (p=0.008 n=5+5)
Tar 34.2MB ± 0% 34.2MB ± 0% -0.05% (p=0.008 n=5+5)
XML 44.4MB ± 0% 44.4MB ± 0% -0.04% (p=0.016 n=5+5)
[Geo mean] 76.4MB 76.3MB -0.05%
name old allocs/op new allocs/op delta
Template 356k ± 0% 356k ± 0% -0.13% (p=0.008 n=5+5)
Unicode 326k ± 0% 326k ± 0% -0.07% (p=0.008 n=5+5)
GoTypes 1.24M ± 0% 1.24M ± 0% -0.24% (p=0.008 n=5+5)
Compiler 5.30M ± 0% 5.28M ± 0% -0.34% (p=0.008 n=5+5)
SSA 11.9M ± 0% 11.9M ± 0% -0.16% (p=0.008 n=5+5)
Flate 226k ± 0% 225k ± 0% -0.12% (p=0.008 n=5+5)
GoParser 287k ± 0% 286k ± 0% -0.29% (p=0.008 n=5+5)
Reflect 930k ± 0% 929k ± 0% -0.05% (p=0.008 n=5+5)
Tar 332k ± 0% 331k ± 0% -0.12% (p=0.008 n=5+5)
XML 411k ± 0% 411k ± 0% -0.12% (p=0.008 n=5+5)
[Geo mean] 771k 770k -0.16%
For some packages, this change significantly reduces the size of executable text.
Examples:
file before after Δ %
cmd/internal/obj/arm.s 68658 66855 -1803 -2.626%
cmd/internal/obj/mips.s 57486 56272 -1214 -2.112%
cmd/internal/obj/arm64.s 152107 147163 -4944 -3.250%
cmd/internal/obj/ppc64.s 125544 120456 -5088 -4.053%
cmd/vendor/golang.org/x/tools/go/cfg.s 31699 30742 -957 -3.019%
Full listing:
file before after Δ %
container/ring.s 1890 1870 -20 -1.058%
container/list.s 5366 5390 +24 +0.447%
internal/cpu.s 3298 3295 -3 -0.091%
internal/testlog.s 1507 1501 -6 -0.398%
image/color.s 8281 8248 -33 -0.399%
runtime.s 480970 480075 -895 -0.186%
sync.s 16497 16408 -89 -0.539%
internal/singleflight.s 2591 2577 -14 -0.540%
math/rand.s 10456 10438 -18 -0.172%
cmd/go/internal/par.s 2801 2790 -11 -0.393%
internal/reflectlite.s 28477 28417 -60 -0.211%
errors.s 2750 2736 -14 -0.509%
internal/oserror.s 446 434 -12 -2.691%
sort.s 17061 17046 -15 -0.088%
io.s 17063 16999 -64 -0.375%
vendor/golang.org/x/crypto/hkdf.s 1962 1936 -26 -1.325%
text/tabwriter.s 9617 9574 -43 -0.447%
hash/crc64.s 3414 3408 -6 -0.176%
hash/crc32.s 6657 6651 -6 -0.090%
bytes.s 31932 31863 -69 -0.216%
strconv.s 53158 52799 -359 -0.675%
strings.s 42829 42665 -164 -0.383%
encoding/ascii85.s 4833 4791 -42 -0.869%
vendor/golang.org/x/text/transform.s 16810 16724 -86 -0.512%
path.s 6848 6845 -3 -0.044%
encoding/base32.s 9658 9592 -66 -0.683%
bufio.s 23051 22908 -143 -0.620%
compress/bzip2.s 11773 11764 -9 -0.076%
image.s 37565 37502 -63 -0.168%
syscall.s 82359 82279 -80 -0.097%
regexp/syntax.s 83573 82930 -643 -0.769%
image/jpeg.s 36535 36490 -45 -0.123%
regexp.s 64396 64214 -182 -0.283%
time.s 82724 82622 -102 -0.123%
plugin.s 6539 6536 -3 -0.046%
context.s 10959 10865 -94 -0.858%
internal/poll.s 24286 24270 -16 -0.066%
reflect.s 168304 167927 -377 -0.224%
internal/fmtsort.s 7416 7376 -40 -0.539%
os.s 52465 51787 -678 -1.292%
cmd/go/internal/lockedfile/internal/filelock.s 2326 2317 -9 -0.387%
os/signal.s 4657 4648 -9 -0.193%
runtime/debug.s 6040 5998 -42 -0.695%
encoding/binary.s 30838 30801 -37 -0.120%
vendor/golang.org/x/net/route.s 23694 23491 -203 -0.857%
path/filepath.s 17895 17889 -6 -0.034%
cmd/vendor/golang.org/x/sys/unix.s 78125 78109 -16 -0.020%
io/ioutil.s 6999 6996 -3 -0.043%
encoding/base64.s 12094 12007 -87 -0.719%
crypto/cipher.s 20466 20372 -94 -0.459%
cmd/go/internal/robustio.s 2672 2669 -3 -0.112%
encoding/pem.s 9302 9286 -16 -0.172%
internal/obscuretestdata.s 1719 1695 -24 -1.396%
crypto/aes.s 11014 11002 -12 -0.109%
os/exec.s 29388 29231 -157 -0.534%
cmd/internal/browser.s 2266 2260 -6 -0.265%
internal/goroot.s 4601 4592 -9 -0.196%
vendor/golang.org/x/crypto/chacha20poly1305.s 8945 8942 -3 -0.034%
cmd/vendor/golang.org/x/crypto/ssh/terminal.s 27226 27195 -31 -0.114%
index/suffixarray.s 36431 36411 -20 -0.055%
fmt.s 77017 76709 -308 -0.400%
encoding/hex.s 6241 6154 -87 -1.394%
compress/lzw.s 7133 7069 -64 -0.897%
database/sql/driver.s 18888 18877 -11 -0.058%
net/url.s 29838 29739 -99 -0.332%
debug/plan9obj.s 8329 8279 -50 -0.600%
encoding/csv.s 12986 12902 -84 -0.647%
debug/gosym.s 25403 25330 -73 -0.287%
compress/flate.s 51192 50970 -222 -0.434%
vendor/golang.org/x/net/dns/dnsmessage.s 86769 86208 -561 -0.647%
compress/gzip.s 9791 9758 -33 -0.337%
compress/zlib.s 7310 7277 -33 -0.451%
archive/zip.s 42356 42166 -190 -0.449%
debug/dwarf.s 108259 107730 -529 -0.489%
encoding/json.s 106378 105910 -468 -0.440%
os/user.s 14751 14724 -27 -0.183%
database/sql.s 99011 98404 -607 -0.613%
log.s 9466 9423 -43 -0.454%
debug/pe.s 31272 31182 -90 -0.288%
debug/macho.s 32764 32608 -156 -0.476%
encoding/gob.s 136976 136517 -459 -0.335%
vendor/golang.org/x/text/unicode/bidi.s 27318 27276 -42 -0.154%
archive/tar.s 71416 70975 -441 -0.618%
vendor/golang.org/x/net/http2/hpack.s 23892 23848 -44 -0.184%
vendor/golang.org/x/text/secure/bidirule.s 3354 3351 -3 -0.089%
mime/quotedprintable.s 5960 5925 -35 -0.587%
net/http/internal.s 5874 5853 -21 -0.358%
math/big.s 184147 183692 -455 -0.247%
debug/elf.s 63775 63567 -208 -0.326%
mime.s 39802 39709 -93 -0.234%
encoding/xml.s 111038 110713 -325 -0.293%
crypto/dsa.s 6044 6029 -15 -0.248%
go/token.s 12139 12077 -62 -0.511%
crypto/rand.s 6889 6866 -23 -0.334%
go/scanner.s 19030 19008 -22 -0.116%
flag.s 22320 22236 -84 -0.376%
vendor/golang.org/x/text/unicode/norm.s 66652 66391 -261 -0.392%
crypto/rsa.s 31671 31650 -21 -0.066%
crypto/elliptic.s 51553 51403 -150 -0.291%
internal/xcoff.s 22950 22822 -128 -0.558%
go/constant.s 43750 43689 -61 -0.139%
encoding/asn1.s 57086 57035 -51 -0.089%
runtime/trace.s 2609 2603 -6 -0.230%
crypto/x509/pkix.s 10458 10471 +13 +0.124%
image/gif.s 27544 27385 -159 -0.577%
vendor/golang.org/x/net/idna.s 24558 24502 -56 -0.228%
image/png.s 42775 42685 -90 -0.210%
vendor/golang.org/x/crypto/cryptobyte.s 33616 33493 -123 -0.366%
go/ast.s 80684 80449 -235 -0.291%
net/internal/socktest.s 16571 16535 -36 -0.217%
crypto/ecdsa.s 11948 11936 -12 -0.100%
text/template/parse.s 95138 94002 -1136 -1.194%
runtime/pprof.s 59702 59639 -63 -0.106%
testing.s 68427 68088 -339 -0.495%
internal/testenv.s 5620 5596 -24 -0.427%
testing/internal/testdeps.s 3312 3294 -18 -0.543%
internal/trace.s 78473 78239 -234 -0.298%
testing/iotest.s 4968 4908 -60 -1.208%
os/signal/internal/pty.s 3011 2990 -21 -0.697%
testing/quick.s 12179 12125 -54 -0.443%
cmd/internal/bio.s 9286 9274 -12 -0.129%
cmd/internal/src.s 17684 17663 -21 -0.119%
cmd/internal/goobj2.s 12588 12558 -30 -0.238%
cmd/internal/objabi.s 16408 16390 -18 -0.110%
go/printer.s 77417 77308 -109 -0.141%
go/parser.s 80045 79113 -932 -1.164%
go/format.s 5434 5419 -15 -0.276%
cmd/internal/goobj.s 26146 25954 -192 -0.734%
runtime/pprof/internal/profile.s 102518 102178 -340 -0.332%
text/template.s 95343 94935 -408 -0.428%
cmd/internal/dwarf.s 31718 31572 -146 -0.460%
cmd/vendor/golang.org/x/arch/arm/armasm.s 45240 45151 -89 -0.197%
internal/lazytemplate.s 1470 1457 -13 -0.884%
cmd/vendor/golang.org/x/arch/ppc64/ppc64asm.s 37253 37220 -33 -0.089%
cmd/asm/internal/flags.s 2593 2590 -3 -0.116%
cmd/asm/internal/lex.s 25068 24921 -147 -0.586%
cmd/internal/buildid.s 18536 18263 -273 -1.473%
cmd/vendor/golang.org/x/arch/x86/x86asm.s 80209 80105 -104 -0.130%
go/doc.s 75140 74585 -555 -0.739%
cmd/internal/edit.s 3893 3899 +6 +0.154%
html/template.s 89377 88809 -568 -0.636%
cmd/vendor/golang.org/x/arch/arm64/arm64asm.s 117998 117824 -174 -0.147%
cmd/internal/obj.s 115015 114290 -725 -0.630%
go/build.s 69379 68862 -517 -0.745%
cmd/internal/objfile.s 48106 47982 -124 -0.258%
cmd/cover.s 46239 46113 -126 -0.272%
cmd/addr2line.s 2845 2833 -12 -0.422%
cmd/internal/obj/arm.s 68658 66855 -1803 -2.626%
cmd/internal/obj/mips.s 57486 56272 -1214 -2.112%
cmd/internal/obj/riscv.s 63834 63006 -828 -1.297%
cmd/compile/internal/syntax.s 146582 145456 -1126 -0.768%
cmd/internal/obj/wasm.s 44117 44066 -51 -0.116%
cmd/cgo.s 242645 241653 -992 -0.409%
cmd/internal/obj/arm64.s 152107 147163 -4944 -3.250%
net.s 295972 292010 -3962 -1.339%
go/types.s 321371 319432 -1939 -0.603%
vendor/golang.org/x/net/http/httpproxy.s 9450 9423 -27 -0.286%
net/textproto.s 19455 19406 -49 -0.252%
cmd/internal/obj/ppc64.s 125544 120456 -5088 -4.053%
go/internal/srcimporter.s 6475 6409 -66 -1.019%
log/syslog.s 8017 7929 -88 -1.098%
cmd/compile/internal/logopt.s 10183 10162 -21 -0.206%
net/mail.s 24085 23948 -137 -0.569%
mime/multipart.s 21527 21420 -107 -0.497%
cmd/internal/obj/s390x.s 127610 127757 +147 +0.115%
go/internal/gcimporter.s 34913 34548 -365 -1.045%
vendor/golang.org/x/net/nettest.s 28103 28016 -87 -0.310%
cmd/go/internal/cfg.s 9967 9916 -51 -0.512%
cmd/api.s 39703 39603 -100 -0.252%
go/internal/gccgoimporter.s 56470 56120 -350 -0.620%
go/importer.s 2077 2056 -21 -1.011%
cmd/compile/internal/types.s 48202 47282 -920 -1.909%
cmd/go/internal/str.s 4341 4320 -21 -0.484%
cmd/internal/obj/x86.s 89440 88625 -815 -0.911%
cmd/go/internal/base.s 12667 12580 -87 -0.687%
cmd/go/internal/cache.s 30754 30571 -183 -0.595%
cmd/doc.s 62976 62755 -221 -0.351%
cmd/go/internal/search.s 20114 19993 -121 -0.602%
cmd/vendor/golang.org/x/xerrors.s 17923 17855 -68 -0.379%
cmd/go/internal/lockedfile.s 16451 16415 -36 -0.219%
cmd/vendor/golang.org/x/mod/sumdb/note.s 18200 18150 -50 -0.275%
cmd/vendor/golang.org/x/mod/module.s 17869 17851 -18 -0.101%
cmd/asm/internal/arch.s 37533 37482 -51 -0.136%
cmd/fix.s 87728 87492 -236 -0.269%
cmd/vendor/golang.org/x/mod/sumdb/tlog.s 36394 36367 -27 -0.074%
cmd/vendor/golang.org/x/mod/sumdb/dirhash.s 4990 4963 -27 -0.541%
cmd/go/internal/imports.s 16499 16469 -30 -0.182%
cmd/vendor/golang.org/x/mod/zip.s 18816 18745 -71 -0.377%
cmd/go/internal/cmdflag.s 5126 5123 -3 -0.059%
cmd/internal/test2json.s 9540 9452 -88 -0.922%
cmd/go/internal/tool.s 3629 3623 -6 -0.165%
cmd/go/internal/version.s 11232 11220 -12 -0.107%
cmd/go/internal/mvs.s 25383 25179 -204 -0.804%
cmd/nm.s 5815 5803 -12 -0.206%
cmd/dist.s 210146 209140 -1006 -0.479%
cmd/asm/internal/asm.s 68655 68549 -106 -0.154%
cmd/vendor/golang.org/x/mod/modfile.s 72974 72510 -464 -0.636%
cmd/go/internal/load.s 107548 106861 -687 -0.639%
cmd/link/internal/sym.s 18708 18581 -127 -0.679%
cmd/asm.s 3367 3343 -24 -0.713%
cmd/gofmt.s 30795 30698 -97 -0.315%
cmd/link/internal/objfile.s 21828 21630 -198 -0.907%
cmd/pack.s 14878 14869 -9 -0.060%
cmd/vendor/github.com/google/pprof/internal/elfexec.s 6788 6782 -6 -0.088%
cmd/test2json.s 1647 1641 -6 -0.364%
cmd/link/internal/loader.s 48677 48483 -194 -0.399%
cmd/vendor/golang.org/x/tools/go/analysis/internal/analysisflags.s 16783 16773 -10 -0.060%
cmd/link/internal/loadelf.s 35464 35126 -338 -0.953%
cmd/link/internal/loadmacho.s 29438 29180 -258 -0.876%
cmd/link/internal/loadpe.s 16440 16371 -69 -0.420%
cmd/vendor/golang.org/x/tools/go/analysis/passes/internal/analysisutil.s 2106 2100 -6 -0.285%
cmd/link/internal/loadxcoff.s 11711 11615 -96 -0.820%
cmd/vendor/golang.org/x/tools/go/analysis/internal/facts.s 14954 14883 -71 -0.475%
cmd/vendor/golang.org/x/tools/go/ast/inspector.s 5394 5374 -20 -0.371%
cmd/vendor/golang.org/x/tools/go/analysis/passes/asmdecl.s 37029 36822 -207 -0.559%
cmd/vendor/golang.org/x/tools/go/analysis/passes/inspect.s 340 337 -3 -0.882%
cmd/vendor/golang.org/x/tools/go/analysis/passes/cgocall.s 9919 9858 -61 -0.615%
cmd/vendor/golang.org/x/tools/go/analysis/passes/bools.s 6705 6690 -15 -0.224%
cmd/vendor/golang.org/x/tools/go/analysis/passes/copylock.s 9783 9741 -42 -0.429%
cmd/vendor/golang.org/x/tools/go/cfg.s 31699 30742 -957 -3.019%
cmd/vendor/golang.org/x/tools/go/analysis/passes/ifaceassert.s 2768 2762 -6 -0.217%
cmd/vendor/golang.org/x/tools/go/analysis/passes/loopclosure.s 3031 2998 -33 -1.089%
cmd/vendor/golang.org/x/tools/go/analysis/passes/shift.s 4382 4376 -6 -0.137%
cmd/vendor/golang.org/x/tools/go/analysis/passes/stdmethods.s 8654 8642 -12 -0.139%
cmd/vendor/golang.org/x/tools/go/analysis/passes/stringintconv.s 3458 3446 -12 -0.347%
cmd/vendor/golang.org/x/tools/go/analysis/passes/structtag.s 8011 7995 -16 -0.200%
cmd/vendor/golang.org/x/tools/go/analysis/passes/tests.s 6205 6193 -12 -0.193%
cmd/vendor/golang.org/x/tools/go/ast/astutil.s 66183 65861 -322 -0.487%
cmd/vendor/github.com/google/pprof/profile.s 150844 150261 -583 -0.386%
cmd/vendor/golang.org/x/tools/go/analysis/passes/unreachable.s 8057 8054 -3 -0.037%
cmd/vendor/golang.org/x/tools/go/analysis/passes/unusedresult.s 3670 3667 -3 -0.082%
cmd/vendor/github.com/google/pprof/internal/measurement.s 10464 10440 -24 -0.229%
cmd/vendor/golang.org/x/tools/go/types/typeutil.s 12319 12274 -45 -0.365%
cmd/vendor/golang.org/x/tools/go/analysis/unitchecker.s 13503 13342 -161 -1.192%
cmd/vendor/golang.org/x/tools/go/analysis/passes/ctrlflow.s 5261 5218 -43 -0.817%
cmd/vendor/golang.org/x/tools/go/analysis/passes/errorsas.s 1462 1459 -3 -0.205%
cmd/vendor/golang.org/x/tools/go/analysis/passes/lostcancel.s 9594 9582 -12 -0.125%
cmd/vendor/golang.org/x/tools/go/analysis/passes/printf.s 34397 34338 -59 -0.172%
cmd/vendor/github.com/google/pprof/internal/graph.s 53225 52936 -289 -0.543%
cmd/vendor/github.com/ianlancetaylor/demangle.s 177450 175329 -2121 -1.195%
crypto/x509.s 147892 147388 -504 -0.341%
cmd/go/internal/work.s 306465 304950 -1515 -0.494%
cmd/go/internal/run.s 4664 4657 -7 -0.150%
crypto/tls.s 313130 311833 -1297 -0.414%
net/http/httptrace.s 3979 3905 -74 -1.860%
net/smtp.s 14413 14344 -69 -0.479%
cmd/link/internal/ld.s 545343 542279 -3064 -0.562%
cmd/link/internal/mips.s 6218 6215 -3 -0.048%
cmd/link/internal/mips64.s 6108 6103 -5 -0.082%
cmd/link/internal/amd64.s 18154 18112 -42 -0.231%
cmd/link/internal/arm64.s 22527 22494 -33 -0.146%
cmd/link/internal/arm.s 22574 22494 -80 -0.354%
cmd/link/internal/s390x.s 20779 20746 -33 -0.159%
cmd/link/internal/wasm.s 16531 16493 -38 -0.230%
cmd/link/internal/x86.s 18906 18849 -57 -0.301%
cmd/link/internal/ppc64.s 26856 26778 -78 -0.290%
net/http.s 559101 556513 -2588 -0.463%
net/http/cookiejar.s 15912 15885 -27 -0.170%
expvar.s 9531 9525 -6 -0.063%
net/http/httptest.s 16616 16475 -141 -0.849%
net/http/cgi.s 23624 23458 -166 -0.703%
cmd/go/internal/web.s 16546 16489 -57 -0.344%
cmd/vendor/golang.org/x/mod/sumdb.s 33197 33117 -80 -0.241%
net/http/fcgi.s 19266 19169 -97 -0.503%
net/http/httputil.s 39875 39728 -147 -0.369%
cmd/vendor/github.com/google/pprof/internal/symbolz.s 5888 5867 -21 -0.357%
net/rpc.s 34154 34003 -151 -0.442%
cmd/vendor/github.com/google/pprof/internal/transport.s 2746 2716 -30 -1.092%
cmd/vendor/github.com/google/pprof/internal/binutils.s 35999 35875 -124 -0.344%
net/rpc/jsonrpc.s 6637 6598 -39 -0.588%
cmd/vendor/github.com/google/pprof/internal/symbolizer.s 11533 11458 -75 -0.650%
cmd/go/internal/get.s 62921 62803 -118 -0.188%
cmd/vendor/github.com/google/pprof/internal/report.s 80364 80058 -306 -0.381%
cmd/go/internal/modfetch/codehost.s 89680 89066 -614 -0.685%
cmd/trace.s 117171 116701 -470 -0.401%
cmd/vendor/github.com/google/pprof/internal/driver.s 144268 143297 -971 -0.673%
cmd/go/internal/modfetch.s 126299 125860 -439 -0.348%
cmd/vendor/github.com/google/pprof/driver.s 9042 9000 -42 -0.464%
cmd/go/internal/modconv.s 17947 17889 -58 -0.323%
cmd/pprof.s 12399 12326 -73 -0.589%
cmd/go/internal/modload.s 151182 150389 -793 -0.525%
cmd/go/internal/generate.s 11738 11636 -102 -0.869%
cmd/go/internal/help.s 6571 6531 -40 -0.609%
cmd/go/internal/clean.s 11174 11142 -32 -0.286%
cmd/go/internal/vet.s 7897 7867 -30 -0.380%
cmd/go/internal/envcmd.s 22176 22095 -81 -0.365%
cmd/go/internal/list.s 15216 15067 -149 -0.979%
cmd/go/internal/modget.s 38698 38519 -179 -0.463%
cmd/go/internal/modcmd.s 46674 46441 -233 -0.499%
cmd/go/internal/test.s 64664 64456 -208 -0.322%
cmd/go.s 6730 6703 -27 -0.401%
cmd/compile/internal/ssa.s 3592565 3582500 -10065 -0.280%
cmd/compile/internal/gc.s 1549123 1537123 -12000 -0.775%
cmd/compile/internal/riscv64.s 14579 14483 -96 -0.658%
cmd/compile/internal/mips.s 20578 20419 -159 -0.773%
cmd/compile/internal/ppc64.s 25524 25359 -165 -0.646%
cmd/compile/internal/mips64.s 19795 19636 -159 -0.803%
cmd/compile/internal/wasm.s 13329 13290 -39 -0.293%
cmd/compile/internal/s390x.s 28097 27892 -205 -0.730%
cmd/compile/internal/arm.s 31489 31321 -168 -0.534%
cmd/compile/internal/arm64.s 29803 29590 -213 -0.715%
cmd/compile/internal/amd64.s 32961 33221 +260 +0.789%
cmd/compile/internal/x86.s 31029 30878 -151 -0.487%
total 18534966 18440341 -94625 -0.511%
Change-Id: I830d37364f14f0297800adc42c99f60a74c51aca
Reviewed-on: https://go-review.googlesource.com/c/go/+/226367
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
The load and test instructions compare the given value
against zero and will produce a condition code indicating
one of the following scenarios:
0: Result is zero
1: Result is less than zero
2: Result is greater than zero
3: Result is not a number (NaN)
The instruction can be used to simplify floating point comparisons
against zero, which can enable further optimizations.
This CL also reduces the size of the .text section of the math.test binary by
around 0.7 KB (in hexadecimal, from 1358f0 to 135620).
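As an illustration only, the four condition codes listed above can be modelled in plain Go. The function name conditionCode is hypothetical and this is not how the compiler emits the instruction; it is just a sketch of the classification the instruction performs:

```go
package main

import (
	"fmt"
	"math"
)

// conditionCode is a hypothetical helper that mirrors the four outcomes
// described above for a comparison against zero.
func conditionCode(x float64) int {
	switch {
	case math.IsNaN(x):
		return 3 // not a number
	case x == 0:
		return 0 // zero (also true for negative zero)
	case x < 0:
		return 1 // less than zero
	default:
		return 2 // greater than zero
	}
}

func main() {
	for _, x := range []float64{0, -1.5, 2.5, math.NaN()} {
		fmt.Println(x, conditionCode(x))
	}
}
```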
Change-Id: I33cb714f0c6feebac7a1c46dfcc735e7daceff9c
Reviewed-on: https://go-review.googlesource.com/c/go/+/209159
Reviewed-by: Michael Munday <mike.munday@ibm.com>
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Provide Add32, Add64, Cas32, Cas64, Exchange32 and Exchange64 atomic
intrinsics on riscv64.
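For readers unfamiliar with these operations, here is a minimal sketch of their semantics using the public sync/atomic package as a stand-in. The mapping of Add32, Cas32 and Exchange32 onto AddUint32, CompareAndSwapUint32 and SwapUint32 is an assumption made for illustration, and atomicDemo is a hypothetical name:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// atomicDemo exercises add, compare-and-swap and exchange on one word.
func atomicDemo() (sum uint32, swapped bool, old uint32, final uint32) {
	v := uint32(1)
	sum = atomic.AddUint32(&v, 2)                    // add: returns the new value
	swapped = atomic.CompareAndSwapUint32(&v, 3, 10) // cas: succeeds only if v == 3
	old = atomic.SwapUint32(&v, 20)                  // exchange: returns the previous value
	final = atomic.LoadUint32(&v)
	return
}

func main() {
	fmt.Println(atomicDemo())
}
```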
Updates #36765
Change-Id: I9a3b7d2ce3d49f699171fd76a0fed891d149a6bb
Reviewed-on: https://go-review.googlesource.com/c/go/+/223559
Run-TryBot: Joel Sing <joel@sing.id.au>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Updates #36765
Change-Id: Id5ce5c5f60112e4f4cf9eec1b1ec120994934950
Reviewed-on: https://go-review.googlesource.com/c/go/+/223558
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Because of the index, these ops can't guarantee faulting if arg0 is nil.
Clean up the PPC64 index ops - they can't take a sym or an offset.
Noticed while debugging #37881. I don't think it is the cause, but I guess
there is a chance.
Update #37881
Change-Id: Ic22925250bf7b1ba64e3cea1a65638bc4bab390c
Reviewed-on: https://go-review.googlesource.com/c/go/+/224457
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
Updates #36765
Change-Id: Ieeb6bbc54e4841a1348ad50e80342ec4bc675e07
Reviewed-on: https://go-review.googlesource.com/c/go/+/223557
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Also rewrite subtraction from zero to NEG/NEGW.
Change-Id: I216e286d1860055f2a07fe2f772cd50f366ea097
Reviewed-on: https://go-review.googlesource.com/c/go/+/221691
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
Change-Id: I24a72c3fb8d72a47cfded4b523c5d7aa2d40419d
Reviewed-on: https://go-review.googlesource.com/c/go/+/221690
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
This commit adds a new cmd/compile flag -spectre,
which accepts a comma-separated list of possible
Spectre mitigations to apply, or the empty string (none),
or "all". The only known mitigation right now is "index",
which uses conditional moves to ensure that x86-64 CPUs
do not speculate past index bounds checks.
Speculating past index bounds checks may be problematic
on systems running privileged servers that accept requests
from untrusted users who can execute their own programs
on the same machine. (And some more constraints that
make it even more unlikely in practice.)
The cases this protects against are analogous to the ones
Microsoft explains in the "Array out of bounds load/store feeding ..."
sections here:
https://docs.microsoft.com/en-us/cpp/security/developer-guidance-speculative-execution?view=vs-2019#array-out-of-bounds-load-feeding-an-indirect-branch
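To give a feel for what a branch-free index clamp looks like, here is a sketch in plain Go. It is not the compiler's implementation (which, per the description above, uses conditional moves); clampIndex is a hypothetical helper, and forcing an out-of-range index to zero is an assumption made for the example:

```go
package main

import (
	"fmt"
	"math/bits"
)

// clampIndex returns i when 0 <= i < n and 0 otherwise, without branching
// on the comparison. n is assumed to be non-negative (a slice length).
func clampIndex(i, n int) int {
	// borrow is 1 exactly when uint64(i) < uint64(n); a negative i
	// becomes a huge unsigned value and therefore never borrows.
	_, borrow := bits.Sub64(uint64(i), uint64(n), 0)
	mask := -borrow // all ones when in range, zero otherwise
	return int(uint64(i) & mask)
}

func main() {
	s := []int{10, 20, 30}
	for _, i := range []int{1, 3, -1} {
		fmt.Println(i, s[clampIndex(i, len(s))])
	}
}
```

In real code the ordinary bounds check still runs; the clamp only limits what a mispredicted path could load.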
Change-Id: Ib7532d7e12466b17e04c4e2075c2a456dc98f610
Reviewed-on: https://go-review.googlesource.com/c/go/+/222660
Reviewed-by: Keith Randall <khr@golang.org>
|
|
This allows for zero stores to be performed using the zero register, rather
than loading a separate register with zero.
Change-Id: Ic81d8dbcdacbb2ca2c3f77682ff5ad7cdc33d18d
Reviewed-on: https://go-review.googlesource.com/c/go/+/221684
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Const64 gets lowered to MOVDconst.
Change rules using interior Const64 to use MOVDconst instead,
to be less dependent on rule application order.
As a result of doing this, some of the rules end up being
exact duplicates; remove those.
We had those exact duplicates because of the order dependency;
ppc64 had no way to optimize away shifts by a constant
if the initial lowering didn't catch it.
Add those optimizations as well.
The outcome is the same, but this makes the overall rules more robust.
Change-Id: Iadd97a9fe73d52358d571d022ace145e506d160b
Reviewed-on: https://go-review.googlesource.com/c/go/+/220877
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
|
|
Add rules for lowering float <-> unsigned int on s390x.
During compilation,
Cvt64Uto64F rule triggers around 80 times,
Cvt64Fto64U rule triggers around 20 times,
Cvt64Uto32F rule triggers around 5 times.
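For reference, these are the kinds of source-level conversions that correspond to the three ops counted above. The pairing of each function with an op name is inferred from the op names, and the function names are hypothetical:

```go
package main

import "fmt"

// u64ToF64 is the kind of conversion expected to become Cvt64Uto64F.
func u64ToF64(u uint64) float64 { return float64(u) }

// f64ToU64 is the kind of conversion expected to become Cvt64Fto64U.
func f64ToU64(f float64) uint64 { return uint64(f) }

// u64ToF32 is the kind of conversion expected to become Cvt64Uto32F.
func u64ToF32(u uint64) float32 { return float32(u) }

func main() {
	fmt.Println(u64ToF64(1<<63), f64ToU64(3.9), u64ToF32(16))
}
```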
Change-Id: If4c9d128b9132fce8c0bea9abc09cb43a5df7989
Reviewed-on: https://go-review.googlesource.com/c/go/+/209177
Reviewed-by: Michael Munday <mike.munday@ibm.com>
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
We try to preserve type correctness of generic ops.
phiopt modified a bool to be an int without a conversion.
Add a conversion. There are a few random fluctuations in the
generated code as a result, but nothing noteworthy or systematic.
no binary size changes
file before after Δ %
math.s 35966 35961 -5 -0.014%
debug/dwarf.s 108141 108147 +6 +0.006%
crypto/dsa.s 6047 6044 -3 -0.050%
image/png.s 42882 42885 +3 +0.007%
go/parser.s 80281 80278 -3 -0.004%
cmd/internal/obj.s 115116 115113 -3 -0.003%
go/types.s 322130 322118 -12 -0.004%
cmd/internal/obj/arm64.s 151679 151685 +6 +0.004%
go/internal/gccgoimporter.s 56487 56493 +6 +0.011%
cmd/test2json.s 1650 1647 -3 -0.182%
cmd/link/internal/loadelf.s 35442 35443 +1 +0.003%
cmd/go/internal/work.s 305039 305035 -4 -0.001%
cmd/link/internal/ld.s 544835 544834 -1 -0.000%
net/http.s 558777 558774 -3 -0.001%
cmd/compile/internal/ssa.s 3926551 3926994 +443 +0.011%
cmd/compile/internal/gc.s 1552320 1552321 +1 +0.000%
total 18862241 18862670 +429 +0.002%
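A sketch of the source pattern this concerns, assuming it is the familiar bool-to-int idiom that phiopt collapses into a direct use of the bool. The name boolToInt is hypothetical:

```go
package main

import "fmt"

// boolToInt selects between the constants 1 and 0 based on b. After
// optimization the branch can disappear and x is derived from b directly,
// which is where an explicit bool-to-integer conversion is needed to keep
// the types consistent.
func boolToInt(b bool) int {
	x := 0
	if b {
		x = 1
	}
	return x
}

func main() {
	fmt.Println(boolToInt(true), boolToInt(false))
}
```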
Change-Id: I4289e773be6be534ea3f907d68f614441b8f9b46
Reviewed-on: https://go-review.googlesource.com/c/go/+/221607
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
The goal here is improved AuxInt printing in ssa.html.
Instead of displaying an inscrutable encoded integer,
it displays something like
v25 (28) = UBFX <int> [lsb=4,width=8] v52
which is much nicer for debugging.
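A minimal sketch of such a decoder follows. The packing used here (width in the low 8 bits, lsb in the bits above) is an assumption for illustration, and formatBitField is a hypothetical name rather than the function added by this change:

```go
package main

import "fmt"

// formatBitField renders a packed bitfield AuxInt in readable form.
// Assumed encoding: aux = width | lsb<<8.
func formatBitField(aux int64) string {
	lsb := aux >> 8
	width := aux & 0xff
	return fmt.Sprintf("lsb=%d,width=%d", lsb, width)
}

func main() {
	fmt.Println(formatBitField(4<<8 | 8))
}
```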
Change-Id: I40713ff7f4a857c4557486cdf73c2dff137511ca
Reviewed-on: https://go-review.googlesource.com/c/go/+/221420
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
SignExt32to64 can be implemented with a single ADDIW instruction, rather than
the two shifts that are in use currently.
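The equivalence can be checked in plain Go. This sketch only models the arithmetic; signExtShifts and signExtDirect are hypothetical names standing in for the two-shift sequence and the single-instruction form:

```go
package main

import "fmt"

// signExtShifts models the old lowering: shift left by 32, then
// arithmetic shift right by 32.
func signExtShifts(x uint64) int64 { return int64(x<<32) >> 32 }

// signExtDirect models sign-extending the low 32 bits in one step.
func signExtDirect(x uint64) int64 { return int64(int32(x)) }

func main() {
	for _, x := range []uint64{5, 0xFFFFFFFF, 0x180000000} {
		fmt.Println(signExtShifts(x), signExtDirect(x))
	}
}
```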
Change-Id: Ie1bbaef4018f1ba5162773fc64fa5a887457cfc9
Reviewed-on: https://go-review.googlesource.com/c/go/+/220922
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Use SUBW to perform a 32-bit subtraction, rather than zero extending from
32 to 64 bits. This reduces Eq32 and Neq32 to two instructions, rather than
the four instructions required previously.
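A sketch of why this works: two values are equal as 32-bit quantities exactly when their 32-bit difference is zero, regardless of what the upper register bits hold. The name eq32 is hypothetical and the code models the arithmetic only, not the emitted instructions:

```go
package main

import "fmt"

// eq32 compares the low 32 bits of two 64-bit register values by taking
// a 32-bit difference and testing it against zero.
func eq32(a, b uint64) bool {
	return uint32(a-b) == 0
}

func main() {
	fmt.Println(eq32(0x100000005, 5), eq32(5, 6))
}
```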
Change-Id: Ib2798324881e9db842c864e91a0c1b1e48c4b67b
Reviewed-on: https://go-review.googlesource.com/c/go/+/220921
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
The generic Greater and Geq ops can always be replaced with the Less and
Leq ops. This CL therefore removes them. This simplifies the compiler since
it reduces the number of operations that need handling in both code and in
rewrite rules. This will be especially true when adding control flow
optimizations such as the integer-in-range optimizations in CL 165998.
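The replacement amounts to swapping the operands, as this small sketch shows. The function names are hypothetical; the identity also holds for NaN operands, since both forms are then false:

```go
package main

import "fmt"

// greater expresses a > b using only a less-than comparison.
func greater(a, b float64) bool { return b < a }

// geq expresses a >= b using only a less-or-equal comparison.
func geq(a, b float64) bool { return b <= a }

func main() {
	fmt.Println(greater(2, 1), greater(1, 2), geq(2, 2))
}
```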
Change-Id: If0648b2b19998ac1bddccbf251283f3be4ec3040
Reviewed-on: https://go-review.googlesource.com/c/go/+/220417
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
And use this newfound power to more precisely describe some PPC64 ops.
Change-Id: Idb2b669d74fbab5f3508edf19f7e3347306b0daf
Reviewed-on: https://go-review.googlesource.com/c/go/+/217002
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
|
|
Updates #36223
(Might fix #36223. I'm not sure whether there are more outstanding.)
This helps a bit, but not as much as I'd expected/hoped.
file before after Δ %
runtime.s 477286 477256 -30 -0.006%
bytes.s 31089 31085 -4 -0.013%
time.s 83561 83547 -14 -0.017%
strings.s 43284 43280 -4 -0.009%
compress/flate.s 51374 51295 -79 -0.154%
math/big.s 184283 184256 -27 -0.015%
crypto/elliptic.s 51649 51577 -72 -0.139%
crypto/sha512.s 8661 8644 -17 -0.196%
crypto/sha1.s 6975 6959 -16 -0.229%
crypto/sha256.s 6412 6393 -19 -0.296%
vendor/golang.org/x/text/unicode/bidi.s 27158 27146 -12 -0.044%
vendor/golang.org/x/text/unicode/norm.s 66802 66788 -14 -0.021%
net/http.s 560936 560929 -7 -0.001%
text/template.s 96475 96467 -8 -0.008%
go/parser.s 80284 80280 -4 -0.005%
text/tabwriter.s 9618 9611 -7 -0.073%
go/printer.s 78502 78499 -3 -0.004%
go/types.s 321815 321807 -8 -0.002%
internal/xcoff.s 23175 23171 -4 -0.017%
image/jpeg.s 36609 36587 -22 -0.060%
cmd/vendor/golang.org/x/arch/x86/x86asm.s 81274 81001 -273 -0.336%
cmd/internal/obj.s 115184 115126 -58 -0.050%
cmd/internal/obj/arm64.s 151502 151487 -15 -0.010%
cmd/internal/obj/s390x.s 128054 128046 -8 -0.006%
cmd/internal/obj/wasm.s 44295 44291 -4 -0.009%
cmd/compile/internal/ssa.s 4201992 4209504 +7512 +0.179%
cmd/compile/internal/gc.s 1555029 1555011 -18 -0.001%
total 9792875 9799640 +6765 +0.069%
Change-Id: If4a857c0953a766578e68aa299b112a20d9b2b86
Reviewed-on: https://go-review.googlesource.com/c/go/+/213704
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
HMUL is commutative. However, it has asymmetric register requirements.
There are existing rewrite rules to place arguments in preferable slots.
Due to a bug, the existing rulegen commutativity engine doesn't generate
the commuted form of the HMUL rules.
The commuted form of those rewrite rules causes infinite loops.
In order to fix the rulegen commutativity bug,
we need to choose between eliminating
those rewrite rules and marking HMUL ops as not commutative.
This change chooses the latter, since doing so yields better
optimization results on std+cmd.
Removing the rewrite rules yields only text size regressions:
file before after Δ %
runtime.s 477257 477269 +12 +0.003%
time.s 83552 83612 +60 +0.072%
encoding/asn1.s 57378 57382 +4 +0.007%
cmd/go/internal/modfetch/codehost.s 89822 89829 +7 +0.008%
cmd/internal/test2json.s 9459 9466 +7 +0.074%
cmd/go/internal/test.s 57665 57678 +13 +0.023%
Marking HMUL as not commutative actually yields (mostly) improvements:
file before after Δ %
runtime.s 477257 477247 -10 -0.002%
math.s 35985 35992 +7 +0.019%
strconv.s 53486 53462 -24 -0.045%
syscall.s 82483 82446 -37 -0.045%
time.s 83552 83561 +9 +0.011%
os.s 52691 52684 -7 -0.013%
archive/zip.s 42285 42272 -13 -0.031%
encoding/asn1.s 57378 57329 -49 -0.085%
encoding/base64.s 12156 12094 -62 -0.510%
net.s 296286 296276 -10 -0.003%
encoding/base32.s 9720 9658 -62 -0.638%
net/http.s 560931 560907 -24 -0.004%
net/smtp.s 14421 14411 -10 -0.069%
cmd/vendor/golang.org/x/sys/unix.s 74307 74266 -41 -0.055%
The regressions are minor, and are in functions math.cbrt,
time.Time.String, and time.Date.
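For context, a high multiply returns the upper half of a double-width product, so swapping the operands cannot change the result. A small sketch using math/bits (hmul is a hypothetical name for the example):

```go
package main

import (
	"fmt"
	"math/bits"
)

// hmul returns the high 64 bits of the 128-bit product x*y.
func hmul(x, y uint64) uint64 {
	hi, _ := bits.Mul64(x, y)
	return hi
}

func main() {
	fmt.Println(hmul(1<<63, 4), hmul(4, 1<<63))
}
```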
Change-Id: I9f6d9ee71654e5b70381cac77b0ac26011f4ea12
Reviewed-on: https://go-review.googlesource.com/c/go/+/213701
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Based on riscv-go port.
Updates #27532
Change-Id: Ia329daa243db63ff334053b8807ea96b97ce3acf
Reviewed-on: https://go-review.googlesource.com/c/go/+/204631
Run-TryBot: Joel Sing <joel@sing.id.au>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Count Values with side effects but no use as live, and don't fuse
branches that contain such Values. (This can happen e.g. when it
is followed by an infinite loop.) Otherwise this may lead to
miscompilation (side effect fired at wrong condition) or ICE (two
stores live simultaneously).
Fixes #36005.
Change-Id: If202eae4b37cb7f0311d6ca120ffa46609925157
Reviewed-on: https://go-review.googlesource.com/c/go/+/210179
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Intrinsify these functions to match other platforms. Update the
sequence of instructions used in the assembly implementations to
match the intrinsics.
Also, add a micro benchmark so we can more easily measure the
performance of these two functions:
name old time/op new time/op delta
And8-8 5.33ns ± 7% 2.55ns ± 8% -52.12% (p=0.000 n=20+20)
And8Parallel-8 7.39ns ± 5% 3.74ns ± 4% -49.34% (p=0.000 n=20+20)
Or8-8 4.84ns ±15% 2.64ns ±11% -45.50% (p=0.000 n=20+20)
Or8Parallel-8 7.27ns ± 3% 3.84ns ± 4% -47.10% (p=0.000 n=19+20)
By using a 'rotate then xor selected bits' instruction combined with
either a 'load and and' or a 'load and or' instruction we can
implement And8 and Or8 with far fewer instructions. Replacing
'compare and swap' with atomic instructions may also improve
performance when there is contention.
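The underlying idea can be sketched in Go: an 8-bit operation is turned into a 32-bit one on the containing aligned word by building a suitable mask. This is only an illustration of the mask arithmetic on a big-endian machine, not the instruction sequence used by this CL, and all names are hypothetical:

```go
package main

import "fmt"

// wordShift returns how far a byte at addr must be shifted left to line up
// with its position in the containing aligned big-endian 32-bit word.
func wordShift(addr uintptr) uint {
	return uint(3-(addr&3)) * 8
}

// or8Mask places v in the target byte and zeros elsewhere, so OR-ing the
// word with it changes only that byte.
func or8Mask(addr uintptr, v uint8) uint32 {
	return uint32(v) << wordShift(addr)
}

// and8Mask places v in the target byte and ones elsewhere, so AND-ing the
// word with it changes only that byte.
func and8Mask(addr uintptr, v uint8) uint32 {
	return ^(uint32(^v) << wordShift(addr))
}

func main() {
	fmt.Printf("%#08x %#08x\n", or8Mask(0, 0x0f), and8Mask(3, 0xf0))
}
```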
Change-Id: I28bb8032052b73ae8ccdf6e4c612d2877085fa01
Reviewed-on: https://go-review.googlesource.com/c/go/+/204277
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
We'll use CTR as a scratch register for call injection. Mark code
sequences that use CTR as unsafe for async preemption. Currently
it is only used in LoweredZero and LoweredMove. It is unfortunate
that they are nonpreemptible. But I think it is still better than
using LR for call injection and marking all leaf functions
nonpreemptible.
Also mark the prologue of large frame functions nonpreemptible,
as we write below SP.
Change-Id: I05a75431499f3f4b2f23651a7b17f7fcf2afbe06
Reviewed-on: https://go-review.googlesource.com/c/go/+/203823
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Mark atomic LL/SC loops as unsafe for async preemption, as they
use REGTMP.
Change-Id: I5be7f93ad3ee337049ec7c3efd6fdc30eef87d97
Reviewed-on: https://go-review.googlesource.com/c/go/+/203719
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
|
|
This API was added for #25819, where it was discussed as math.FMA.
The commit adding it used math.Fma, presumably for consistency
with the rest of the unusual names in package math
(Sincos, Acosh, Erfcinv, Float32bits, etc).
I believe that using an idiomatic Go name is more important here
than consistency with these other names, most of which are historical
baggage from C's standard library.
Early additions like Float32frombits happened before "uppercase for export"
(so they were originally like "float32frombits") and they were not properly
reconsidered when we uppercased the symbols to export them.
That's a mistake we live with.
The names of functions we have added since then, and even a few
that were legacy, are more properly Go-cased, such as IsNaN, IsInf,
and RoundToEven, rather than Isnan, Isinf, and Roundtoeven.
And also constants like MaxFloat32.
For new API, we should keep using proper Go-cased symbols
instead of minimally-upper-cased-C symbols.
So math.FMA, not math.Fma.
This API has not yet been released, so this change does not break
the compatibility promise.
This CL also modifies cmd/compile, since the compiler knows
the name of the function. I could have stopped at changing the
string constants, but it seemed to make more sense to use a
consistent casing everywhere.
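A minimal usage example with the new name. math.FMA(x, y, z) computes x*y + z with a single rounding; mulAdd is just a hypothetical wrapper for the example:

```go
package main

import (
	"fmt"
	"math"
)

// mulAdd returns x*y + z computed as a fused multiply-add.
func mulAdd(x, y, z float64) float64 {
	return math.FMA(x, y, z)
}

func main() {
	fmt.Println(mulAdd(2, 3, 4))
}
```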
Change-Id: I0f6f3407f41e99bfa8239467345c33945088896e
Reviewed-on: https://go-review.googlesource.com/c/go/+/205317
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
Introduce a mechanism for marking architecture-specific Ops
unsafe. And mark ones that use REGTMP on ARM64, as for async
preemption we will be using REGTMP as a temporary register in the
injected call.
Change-Id: I8ff22e87d8f9cb10d02a2f0af7c12ad6d7d58f54
Reviewed-on: https://go-review.googlesource.com/c/go/+/203459
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Austin Clements <austin@google.com>
|
|
CL 203284 added compiler intrinsics for atomic Load8 and Store8 on
several architectures, but missed the lowering on MIPS. This CL fixes
that.
Updates #10958, #24543.
Change-Id: I82e88971554fe8c33ad2bf195a633c44b9ac4cf7
Reviewed-on: https://go-review.googlesource.com/c/go/+/203977
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
For #10958, #24543, but makes sense on its own.
Change-Id: I2a87dab66b82a1863e4b6512b1f8def51463ce2a
Reviewed-on: https://go-review.googlesource.com/c/go/+/203284
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|