path: root/src/cmd/compile/internal/ssa/opGen.go
author: Josh Bleecher Snyder <josharian@gmail.com> 2019-12-19 10:58:28 -0800
committer: Josh Bleecher Snyder <josharian@gmail.com> 2020-04-04 01:01:04 +0000
commit: fff7509d472778cae5e652dbe2479929c666c24f
tree: a8c8bc9e9396f7230310aa3c5fa1f2ee2a75a646 /src/cmd/compile/internal/ssa/opGen.go
parent: ed7a8332c413f41d466db3bfc9606025e0c264d8
cmd/compile: add intrinsic HasCPUFeature for checking cpu features

Before using some CPU instructions, we must check for their presence.
We use global variables in the runtime package to record features.

Prior to this CL, we issued a regular memory load for these features.
The downside to this is that, because it is a regular memory load,
it cannot be hoisted out of loops or otherwise reordered with other loads.

This CL introduces a new intrinsic just for checking cpu features.
It still ends up resulting in a memory load, but that memory load can
now be floated to the entry block and rematerialized as needed.

One downside is that the regular load could be combined with the comparison
into a CMPBconstload+NE. This new intrinsic cannot; it generates MOVB+TESTB+NE.
(It is possible that MOVBQZX+TESTQ+NE would be better.)

This CL does only amd64. It is easy to extend to other architectures.

For the benchmark in #36196, on my machine, this offers a mild speedup.

name      old time/op  new time/op  delta
FMA-8     1.39ns ± 6%  1.29ns ± 9%  -7.19%  (p=0.000 n=97+96)
NonFMA-8  2.03ns ±11%  2.04ns ±12%    ~     (p=0.618 n=99+98)

Updates #15808
Updates #36196

Change-Id: I75e2fcfcf5a6df1bdb80657a7143bed69fca6deb
Reviewed-on: https://go-review.googlesource.com/c/go/+/212360
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
Diffstat (limited to 'src/cmd/compile/internal/ssa/opGen.go')
-rw-r--r-- src/cmd/compile/internal/ssa/opGen.go 21
1 files changed, 21 insertions, 0 deletions
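
The output register mask 65519 in the `LoweredHasCPUFeature` entry of the diff is a bitmask over register numbers. A small sketch of decoding it follows, assuming the amd64 numbering implied by the comment in the diff (AX at bit 0 through R15 at bit 15, with SP at bit 4). The `regNames` table and `decodeMask` helper are illustrative and are not the compiler's own code.

```go
package main

import (
	"fmt"
	"strings"
)

// regNames is an illustrative table of the 16 amd64 general-purpose
// registers, indexed by the bit position assumed for the mask.
var regNames = []string{
	"AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI",
	"R8", "R9", "R10", "R11", "R12", "R13", "R14", "R15",
}

// decodeMask returns the names of the registers whose bits are set.
func decodeMask(mask uint64) string {
	var names []string
	for i, name := range regNames {
		if mask&(1<<uint(i)) != 0 {
			names = append(names, name)
		}
	}
	return strings.Join(names, " ")
}

func main() {
	// 65519 == 0xFFEF: all of the low 16 bits except bit 4 (SP).
	fmt.Println(decodeMask(65519))
}
```

Under that numbering, the mask allows every general-purpose register except SP, which matches the register list in the diff's comment.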
diff --git a/src/cmd/compile/internal/ssa/opGen.go b/src/cmd/compile/internal/ssa/opGen.go
index bf48bff8f1..e8d1b841c8 100644
--- a/src/cmd/compile/internal/ssa/opGen.go
+++ b/src/cmd/compile/internal/ssa/opGen.go
@@ -885,6 +885,7 @@ const (
OpAMD64LoweredGetCallerSP
OpAMD64LoweredNilCheck
OpAMD64LoweredWB
+ OpAMD64LoweredHasCPUFeature
OpAMD64LoweredPanicBoundsA
OpAMD64LoweredPanicBoundsB
OpAMD64LoweredPanicBoundsC
@@ -2596,6 +2597,7 @@ const (
OpMoveWB
OpZeroWB
OpWB
+ OpHasCPUFeature
OpPanicBounds
OpPanicExtend
OpClosureCall
@@ -11651,6 +11653,18 @@ var opcodeTable = [...]opInfo{
},
},
{
+ name: "LoweredHasCPUFeature",
+ auxType: auxSym,
+ argLen: 0,
+ rematerializeable: true,
+ symEffect: SymNone,
+ reg: regInfo{
+ outputs: []outputInfo{
+ {0, 65519}, // AX CX DX BX BP SI DI R8 R9 R10 R11 R12 R13 R14 R15
+ },
+ },
+ },
+ {
name: "LoweredPanicBoundsA",
auxType: auxInt64,
argLen: 3,
@@ -32980,6 +32994,13 @@ var opcodeTable = [...]opInfo{
generic: true,
},
{
+ name: "HasCPUFeature",
+ auxType: auxSym,
+ argLen: 0,
+ symEffect: SymNone,
+ generic: true,
+ },
+ {
name: "PanicBounds",
auxType: auxInt64,
argLen: 3,