Age | Commit message (Collapse) | Author |
|
Currently, deferreturn runs deferred functions by backing up its
return PC to the deferreturn call, and then effectively tail-calling
the deferred function (via jmpdefer). The effect of this is that the
deferred function appears to be called directly from the deferee, and
when it returns, the deferee calls deferreturn again so it can run the
next deferred function if necessary.
This unusual flow control leads to a large number of special cases and
complications all over the tool chain.
This used to be necessary because deferreturn copied the deferred
function's argument frame directly into its caller's frame and then
had to invoke that call as if it had been called from its caller's
frame so it could access it arguments. But now that we've simplified
defer processing so the runtime only deals with argument-less
closures, this approach is no longer necessary.
This CL simplifies all of this by making deferreturn simply call
deferred functions in a loop.
This eliminates the need for jmpdefer, so we can delete a bunch of
per-architecture assembly code.
This eliminates several special cases on Wasm, since it couldn't
support these calling shenanigans directly and thus had to simulate
the loop a different way. Now Wasm can largely work the way the other
platforms do.
This eliminates the per-architecture Ginsnopdefer operation. On PPC64,
this was necessary to reload the TOC pointer after the tail call
(since TOC pointers in general make tail calls impossible). The tail
call is gone, and in the case where we do force a jump to the
deferreturn call when recovering from an open-coded defer, we go
through gogo (via runtime.recovery), which handles the TOC. On other
platforms, we needed a NOP so traceback didn't get confused by seeing
the return to the CALL instruction, rather than the usual return to
the instruction following the CALL instruction. Now we don't inject a
return to the CALL instruction at all, so this NOP is also
unnecessary.
The one potential effect of this is that deferreturn could now appear
in stack traces from deferred functions. However, this could already
happen from open-coded defers, so we've long since marked deferreturn
as a "wrapper" so it gets elided not only from printed stack traces,
but from runtime.Callers*.
This is a retry of CL 337652 because we had to back out its parent.
There are no changes in this version.
Change-Id: I3f54b7fec1d7ccac71cc6cf6835c6a46b7e5fb6c
Reviewed-on: https://go-review.googlesource.com/c/go/+/339397
Trust: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
replace jmpdefer with a loop"
This reverts CL 227652.
I'm reverting CL 337651 and this builds on top of it.
Change-Id: I03ce363be44c2a3defff2e43e7b1aad83386820d
Reviewed-on: https://go-review.googlesource.com/c/go/+/338709
Trust: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
Currently, deferreturn runs deferred functions by backing up its
return PC to the deferreturn call, and then effectively tail-calling
the deferred function (via jmpdefer). The effect of this is that the
deferred function appears to be called directly from the deferee, and
when it returns, the deferee calls deferreturn again so it can run the
next deferred function if necessary.
This unusual flow control leads to a large number of special cases and
complications all over the tool chain.
This used to be necessary because deferreturn copied the deferred
function's argument frame directly into its caller's frame and then
had to invoke that call as if it had been called from its caller's
frame so it could access it arguments. But now that we've simplified
defer processing so the runtime only deals with argument-less
closures, this approach is no longer necessary.
This CL simplifies all of this by making deferreturn simply call
deferred functions in a loop.
This eliminates the need for jmpdefer, so we can delete a bunch of
per-architecture assembly code.
This eliminates several special cases on Wasm, since it couldn't
support these calling shenanigans directly and thus had to simulate
the loop a different way. Now Wasm can largely work the way the other
platforms do.
This eliminates the per-architecture Ginsnopdefer operation. On PPC64,
this was necessary to reload the TOC pointer after the tail call
(since TOC pointers in general make tail calls impossible). The tail
call is gone, and in the case where we do force a jump to the
deferreturn call when recovering from an open-coded defer, we go
through gogo (via runtime.recovery), which handles the TOC. On other
platforms, we needed a NOP so traceback didn't get confused by seeing
the return to the CALL instruction, rather than the usual return to
the instruction following the CALL instruction. Now we don't inject a
return to the CALL instruction at all, so this NOP is also
unnecessary.
The one potential effect of this is that deferreturn could now appear
in stack traces from deferred functions. However, this could already
happen from open-coded defers, so we've long since marked deferreturn
as a "wrapper" so it gets elided not only from printed stack traces,
but from runtime.Callers*.
Change-Id: Ie9f700cd3fb774f498c9edce363772a868407bf7
Reviewed-on: https://go-review.googlesource.com/c/go/+/337652
Trust: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
The go/build package needs access to this configuration,
so move it into a new package available to the standard library.
Change-Id: I868a94148b52350c76116451f4ad9191246adcff
Reviewed-on: https://go-review.googlesource.com/c/go/+/310731
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Jay Conrod <jayconrod@google.com>
|
|
When -clobberdeadreg flag is set, the compiler inserts code that
clobbers integer registers at call sites. This may be helpful for
debugging register ABI.
Only implemented on AMD64 for now.
Change-Id: Ia203d3f891c30fd95d0103489056fe01d63a2899
Reviewed-on: https://go-review.googlesource.com/c/go/+/302809
Trust: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
|
|
Add generic rule to rewrite the single-precision square root expression
with one single-precision instruction. The optimization will reduce two
times of precision converting between double-precision and single-precision.
On arm64 flatform.
previous:
FCVTSD F0, F0
FSQRTD F0, F0
FCVTDS F0, F0
optimized:
FSQRTS S0, S0
And this patch adds the test case to check the correctness.
This patch refers to CL 241877, contributed by Alice Xu
(dianhong.xu@arm.com)
Change-Id: I6de5d02281c693017ac4bd4c10963dd55989bd7e
Reviewed-on: https://go-review.googlesource.com/c/go/+/276873
Trust: fannie zhang <Fannie.Zhang@arm.com>
Run-TryBot: fannie zhang <Fannie.Zhang@arm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
These are the the most common uses, and they reduce line noise.
I don't love adding new deprecated APIs,
but since they're trivial wrappers,
it'll be very easy to update them along with the rest.
No functional changes; passes toolstash-check.
Change-Id: I691a8175cfef9081180e463c63f326376af3f3a6
Reviewed-on: https://go-review.googlesource.com/c/go/+/296009
Trust: Josh Bleecher Snyder <josharian@gmail.com>
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
No functional changes; passes toolstash-check.
No measureable performance changes.
Change-Id: I2629f73d4a3cc56d80f512f33cf57cf41d8f15d3
Reviewed-on: https://go-review.googlesource.com/c/go/+/296010
Trust: Josh Bleecher Snyder <josharian@gmail.com>
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
[git-generate]
cd src/cmd/compile/internal/gc
rf '
# maxOpenDefers is declared in ssa.go but used only by walk.
mv maxOpenDefers walk.go
# gc.Arch -> ssagen.Arch
# It is not as nice but will do for now.
mv Arch ArchInfo
mv thearch Arch
mv Arch ArchInfo arch.go
# Pull dwarf out of pgen.go.
mv debuginfo declPos createDwarfVars preInliningDcls \
createSimpleVars createSimpleVar \
createComplexVars createComplexVar \
dwarf.go
# Pull high-level compilation out of pgen.go,
# leaving only the SSA code.
mv compilequeue funccompile compile compilenow \
compileFunctions isInlinableButNotInlined \
initLSym \
compile.go
mv BoundsCheckFunc GCWriteBarrierReg ssa.go
mv largeStack largeStackFrames CheckLargeStacks pgen.go
# All that is left in dcl.go is the nowritebarrierrecCheck
mv dcl.go nowb.go
# Export API and unexport non-API.
mv initssaconfig InitConfig
mv isIntrinsicCall IsIntrinsicCall
mv ssaDumpInline DumpInline
mv initSSATables InitTables
mv initSSAEnv InitEnv
mv compileSSA Compile
mv stackOffset StackOffset
mv canSSAType TypeOK
mv SSAGenState State
mv FwdRefAux fwdRefAux
mv cgoSymABIs CgoSymABIs
mv readSymABIs ReadSymABIs
mv initLSym InitLSym
mv useABIWrapGen symabiDefs CgoSymABIs ReadSymABIs InitLSym selectLSym makeABIWrapper setupTextLSym abi.go
mv arch.go abi.go nowb.go phi.go pgen.go pgen_test.go ssa.go cmd/compile/internal/ssagen
'
rm go.go gsubr.go
Change-Id: I47fad6cbf1d1e583fd9139003a08401d7cd048a1
Reviewed-on: https://go-review.googlesource.com/c/go/+/279476
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
|
|
Object file writing routines are used not just at the end
of the compilation but also during static data layout in walk.
Split them into their own package.
[git-generate]
cd src/cmd/compile/internal/gc
rf '
# Move bit vector to new package bitvec
mv bvec.n bvec.N
mv bvec.b bvec.B
mv bvec BitVec
mv bvalloc New
mv bvbulkalloc NewBulk
mv bulkBvec.next bulkBvec.Next
mv bulkBvec Bulk
mv H0 h0
mv Hp hp
# Leave bvecSet and bitmap hashes behind - not needed as broadly.
mv bvecSet.extractUniqe bvecSet.extractUnique
mv h0 bvecSet bvecSet.grow bvecSet.add \
bvecSet.extractUnique hashbitmap bvset.go
mv bv.go cmd/compile/internal/bitvec
ex . ../arm ../arm64 ../mips ../mips64 ../ppc64 ../s390x ../riscv64 {
import "cmd/internal/obj"
var a *obj.Addr
var i int64
Addrconst(a, i) -> a.SetConst(i)
var p, to *obj.Prog
Patch(p, to) -> p.To.SetTarget(to)
}
rm Addrconst Patch
# Move object-writing API to new package objw
mv duint8 Objw_Uint8
mv duint16 Objw_Uint16
mv duint32 Objw_Uint32
mv duintptr Objw_Uintptr
mv duintxx Objw_UintN
mv dsymptr Objw_SymPtr
mv dsymptrOff Objw_SymPtrOff
mv dsymptrWeakOff Objw_SymPtrWeakOff
mv ggloblsym Objw_Global
mv dbvec Objw_BitVec
mv newProgs NewProgs
mv Progs.clearp Progs.Clear
mv Progs.settext Progs.SetText
mv Progs.next Progs.Next
mv Progs.pc Progs.PC
mv Progs.pos Progs.Pos
mv Progs.curfn Progs.CurFunc
mv Progs.progcache Progs.Cache
mv Progs.cacheidx Progs.CacheIndex
mv Progs.nextLive Progs.NextLive
mv Progs.prevLive Progs.PrevLive
mv Progs.Appendpp Progs.Append
mv LivenessIndex.stackMapIndex LivenessIndex.StackMapIndex
mv LivenessIndex.isUnsafePoint LivenessIndex.IsUnsafePoint
mv Objw_Uint8 Objw_Uint16 Objw_Uint32 Objw_Uintptr Objw_UintN \
Objw_SymPtr Objw_SymPtrOff Objw_SymPtrWeakOff Objw_Global \
Objw_BitVec \
objw.go
mv sharedProgArray NewProgs Progs \
LivenessIndex StackMapDontCare \
LivenessDontCare LivenessIndex.StackMapValid \
Progs.NewProg Progs.Flush Progs.Free Progs.Prog Progs.Clear Progs.Append Progs.SetText \
prog.go
mv prog.go objw.go cmd/compile/internal/objw
# Move ggloblnod to obj with the rest of the non-objw higher-level writing.
mv ggloblnod obj.go
'
cd ../objw
rf '
mv Objw_Uint8 Uint8
mv Objw_Uint16 Uint16
mv Objw_Uint32 Uint32
mv Objw_Uintptr Uintptr
mv Objw_UintN UintN
mv Objw_SymPtr SymPtr
mv Objw_SymPtrOff SymPtrOff
mv Objw_SymPtrWeakOff SymPtrWeakOff
mv Objw_Global Global
mv Objw_BitVec BitVec
'
Change-Id: I2b87085aa788564fb322e9c55bddd73347b4d5fd
Reviewed-on: https://go-review.googlesource.com/c/go/+/279310
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
|
|
[generated]
To break up package gc, we need to put these calculations somewhere
lower in the import graph, either an existing or new package. Package types
already needs this code and is using hacks to get it without an import cycle.
We can remove the hacks and set up for the new package gc by moving the
code into package types itself.
[git-generate]
cd src/cmd/compile/internal/gc
rf '
# Remove old import cycle hacks in gc.
rm TypecheckInit:/types.Widthptr =/-0,/types.Dowidth =/+0 \
../ssa/export_test.go:/types.Dowidth =/-+
ex {
import "cmd/compile/internal/types"
types.Widthptr -> Widthptr
types.Dowidth -> dowidth
}
# Disable CalcSize in tests instead of base.Fatalf
sub dowidth:/base.Fatalf\("dowidth without betypeinit"\)/ \
// Assume this is a test. \
return
# Move size calculation into cmd/compile/internal/types
mv Widthptr PtrSize
mv Widthreg RegSize
mv slicePtrOffset SlicePtrOffset
mv sliceLenOffset SliceLenOffset
mv sliceCapOffset SliceCapOffset
mv sizeofSlice SliceSize
mv sizeofString StringSize
mv skipDowidthForTracing SkipSizeForTracing
mv dowidth CalcSize
mv checkwidth CheckSize
mv widstruct calcStructOffset
mv sizeCalculationDisabled CalcSizeDisabled
mv defercheckwidth DeferCheckSize
mv resumecheckwidth ResumeCheckSize
mv typeptrdata PtrDataSize
mv \
PtrSize RegSize SlicePtrOffset SkipSizeForTracing typePos align.go PtrDataSize \
size.go
mv size.go cmd/compile/internal/types
'
: # Remove old import cycle hacks in types.
cd ../types
rf '
ex {
Widthptr -> PtrSize
Dowidth -> CalcSize
}
rm Widthptr Dowidth
'
Change-Id: Ib96cdc6bda2617235480c29392ea5cfb20f60cd8
Reviewed-on: https://go-review.googlesource.com/c/go/+/279234
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
|
|
There are a handful of pre-computed magic symbols known by
package gc, and we need a place to store them.
If we keep them together, the need for type *ir.Name means that
package ir is the lowest package in the import hierarchy that they
can go in. And package ir needs gopkg for methodSymSuffix
(in a later CL), so they can't go any higher either, at least not all together.
So package ir it is.
Rather than dump them all into the top-level package ir
namespace, however, we introduce global structs, Syms, Pkgs, and Names,
and make the known symbols, packages, and names fields of those.
[git-generate]
cd src/cmd/compile/internal/gc
rf '
add go.go:$ \
// Names holds known names. \
var Names struct{} \
\
// Syms holds known symbols. \
var Syms struct {} \
\
// Pkgs holds known packages. \
var Pkgs struct {} \
mv staticuint64s Names.Staticuint64s
mv zerobase Names.Zerobase
mv assertE2I Syms.AssertE2I
mv assertE2I2 Syms.AssertE2I2
mv assertI2I Syms.AssertI2I
mv assertI2I2 Syms.AssertI2I2
mv deferproc Syms.Deferproc
mv deferprocStack Syms.DeferprocStack
mv Deferreturn Syms.Deferreturn
mv Duffcopy Syms.Duffcopy
mv Duffzero Syms.Duffzero
mv gcWriteBarrier Syms.GCWriteBarrier
mv goschedguarded Syms.Goschedguarded
mv growslice Syms.Growslice
mv msanread Syms.Msanread
mv msanwrite Syms.Msanwrite
mv msanmove Syms.Msanmove
mv newobject Syms.Newobject
mv newproc Syms.Newproc
mv panicdivide Syms.Panicdivide
mv panicshift Syms.Panicshift
mv panicdottypeE Syms.PanicdottypeE
mv panicdottypeI Syms.PanicdottypeI
mv panicnildottype Syms.Panicnildottype
mv panicoverflow Syms.Panicoverflow
mv raceread Syms.Raceread
mv racereadrange Syms.Racereadrange
mv racewrite Syms.Racewrite
mv racewriterange Syms.Racewriterange
mv SigPanic Syms.SigPanic
mv typedmemclr Syms.Typedmemclr
mv typedmemmove Syms.Typedmemmove
mv Udiv Syms.Udiv
mv writeBarrier Syms.WriteBarrier
mv zerobaseSym Syms.Zerobase
mv arm64HasATOMICS Syms.ARM64HasATOMICS
mv armHasVFPv4 Syms.ARMHasVFPv4
mv x86HasFMA Syms.X86HasFMA
mv x86HasPOPCNT Syms.X86HasPOPCNT
mv x86HasSSE41 Syms.X86HasSSE41
mv WasmDiv Syms.WasmDiv
mv WasmMove Syms.WasmMove
mv WasmZero Syms.WasmZero
mv WasmTruncS Syms.WasmTruncS
mv WasmTruncU Syms.WasmTruncU
mv gopkg Pkgs.Go
mv itabpkg Pkgs.Itab
mv itablinkpkg Pkgs.Itablink
mv mappkg Pkgs.Map
mv msanpkg Pkgs.Msan
mv racepkg Pkgs.Race
mv Runtimepkg Pkgs.Runtime
mv trackpkg Pkgs.Track
mv unsafepkg Pkgs.Unsafe
mv Names Syms Pkgs symtab.go
mv symtab.go cmd/compile/internal/ir
'
Change-Id: Ic143862148569a3bcde8e70b26d75421aa2d00f3
Reviewed-on: https://go-review.googlesource.com/c/go/+/279235
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
|
|
Now that the only remaining ir.Node implementation that is stored
(directly) into ssa.Aux, we can rewrite all of the conversions between
ir.Node and ssa.Aux to use *ir.Name instead.
rf doesn't have a way to rewrite the type switch case clauses, so we
just use sed instead. There's only a handful, and they're the only
times that "case ir.Node" appears anyway.
The next CL will move the tag method declarations so that ir.Node no
longer implements ssa.Aux.
Passes buildall w/ toolstash -cmp.
Updates #42982.
[git-generate]
cd src/cmd/compile/internal
sed -i -e 's/case ir.Node/case *ir.Name/' gc/plive.go */ssa.go
cd ssa
rf '
ex . ../gc {
import "cmd/compile/internal/ir"
var v *Value
v.Aux.(ir.Node) -> v.Aux.(*ir.Name)
var n ir.Node
var asAux func(Aux)
strict n # only match ir.Node-typed expressions; not *ir.Name
implicit asAux # match implicit assignments to ssa.Aux
asAux(n) -> n.(*ir.Name)
}
'
Change-Id: I3206ef5f12a7cfa37c5fecc67a1ca02ea4d52b32
Reviewed-on: https://go-review.googlesource.com/c/go/+/275789
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
|
|
The plan is to introduce a Node interface that replaces the old *Node pointer-to-struct.
The previous CL defined an interface INode modeling a *Node.
This CL:
- Changes all references outside internal/ir to use INode,
along with many references inside internal/ir as well.
- Renames Node to node.
- Renames INode to Node
So now ir.Node is an interface implemented by *ir.node, which is otherwise inaccessible,
and the code outside package ir is now (clearly) using only the interface.
The usual rule is never to redefine an existing name with a new meaning,
so that old code that hasn't been updated gets a "unknown name" error
instead of more mysterious errors or silent misbehavior. That rule would
caution against replacing Node-the-struct with Node-the-interface,
as in this CL, because code that says *Node would now be using a pointer
to an interface. But this CL is being landed at the same time as another that
moves Node from gc to ir. So the net effect is to replace *gc.Node with ir.Node,
which does follow the rule: any lingering references to gc.Node will be told
it's gone, not silently start using pointers to interfaces. So the rule is followed
by the CL sequence, just not this specific CL.
Overall, the loss of inlining caused by using interfaces cuts the compiler speed
by about 6%, a not insignificant amount. However, as we convert the representation
to concrete structs that are not the giant Node over the next weeks, that speed
should come back as more of the compiler starts operating directly on concrete types
and the memory taken up by the graph of Nodes drops due to the more precise
structs. Honestly, I was expecting worse.
% benchstat bench.old bench.new
name old time/op new time/op delta
Template 168ms ± 4% 182ms ± 2% +8.34% (p=0.000 n=9+9)
Unicode 72.2ms ±10% 82.5ms ± 6% +14.38% (p=0.000 n=9+9)
GoTypes 563ms ± 8% 598ms ± 2% +6.14% (p=0.006 n=9+9)
Compiler 2.89s ± 4% 3.04s ± 2% +5.37% (p=0.000 n=10+9)
SSA 6.45s ± 4% 7.25s ± 5% +12.41% (p=0.000 n=9+10)
Flate 105ms ± 2% 115ms ± 1% +9.66% (p=0.000 n=10+8)
GoParser 144ms ±10% 152ms ± 2% +5.79% (p=0.011 n=9+8)
Reflect 345ms ± 9% 370ms ± 4% +7.28% (p=0.001 n=10+9)
Tar 149ms ± 9% 161ms ± 5% +8.05% (p=0.001 n=10+9)
XML 190ms ± 3% 209ms ± 2% +9.54% (p=0.000 n=9+8)
LinkCompiler 327ms ± 2% 325ms ± 2% ~ (p=0.382 n=8+8)
ExternalLinkCompiler 1.77s ± 4% 1.73s ± 6% ~ (p=0.113 n=9+10)
LinkWithoutDebugCompiler 214ms ± 4% 211ms ± 2% ~ (p=0.360 n=10+8)
StdCmd 14.8s ± 3% 15.9s ± 1% +6.98% (p=0.000 n=10+9)
[Geo mean] 480ms 510ms +6.31%
name old user-time/op new user-time/op delta
Template 223ms ± 3% 237ms ± 3% +6.16% (p=0.000 n=9+10)
Unicode 103ms ± 6% 113ms ± 3% +9.53% (p=0.000 n=9+9)
GoTypes 758ms ± 8% 800ms ± 2% +5.55% (p=0.003 n=10+9)
Compiler 3.95s ± 2% 4.12s ± 2% +4.34% (p=0.000 n=10+9)
SSA 9.43s ± 1% 9.74s ± 4% +3.25% (p=0.000 n=8+10)
Flate 132ms ± 2% 141ms ± 2% +6.89% (p=0.000 n=9+9)
GoParser 177ms ± 9% 183ms ± 4% ~ (p=0.050 n=9+9)
Reflect 467ms ±10% 495ms ± 7% +6.17% (p=0.029 n=10+10)
Tar 183ms ± 9% 197ms ± 5% +7.92% (p=0.001 n=10+10)
XML 249ms ± 5% 268ms ± 4% +7.82% (p=0.000 n=10+9)
LinkCompiler 544ms ± 5% 544ms ± 6% ~ (p=0.863 n=9+9)
ExternalLinkCompiler 1.79s ± 4% 1.75s ± 6% ~ (p=0.075 n=10+10)
LinkWithoutDebugCompiler 248ms ± 6% 246ms ± 2% ~ (p=0.965 n=10+8)
[Geo mean] 483ms 504ms +4.41%
[git-generate]
cd src/cmd/compile/internal/ir
: # We need to do the conversion in multiple steps, so we introduce
: # a temporary type alias that will start out meaning the pointer-to-struct
: # and then change to mean the interface.
rf '
mv Node OldNode
add node.go \
type Node = *OldNode
'
: # It should work to do this ex in ir, but it misses test files, due to a bug in rf.
: # Run the command in gc to handle gc's tests, and then again in ssa for ssa's tests.
cd ../gc
rf '
ex . ../arm ../riscv64 ../arm64 ../mips64 ../ppc64 ../mips ../wasm {
import "cmd/compile/internal/ir"
*ir.OldNode -> ir.Node
}
'
cd ../ssa
rf '
ex {
import "cmd/compile/internal/ir"
*ir.OldNode -> ir.Node
}
'
: # Back in ir, finish conversion clumsily with sed,
: # because type checking and circular aliases do not mix.
cd ../ir
sed -i '' '
/type Node = \*OldNode/d
s/\*OldNode/Node/g
s/^func (n Node)/func (n *OldNode)/
s/OldNode/node/g
s/type INode interface/type Node interface/
s/var _ INode = (Node)(nil)/var _ Node = (*node)(nil)/
' *.go
gofmt -w *.go
sed -i '' '
s/{Func{}, 136, 248}/{Func{}, 152, 280}/
s/{Name{}, 32, 56}/{Name{}, 44, 80}/
s/{Param{}, 24, 48}/{Param{}, 44, 88}/
s/{node{}, 76, 128}/{node{}, 88, 152}/
' sizeof_test.go
cd ../ssa
sed -i '' '
s/{LocalSlot{}, 28, 40}/{LocalSlot{}, 32, 48}/
' sizeof_test.go
cd ../gc
sed -i '' 's/\*ir.Node/ir.Node/' mkbuiltin.go
cd ../../../..
go install std cmd
cd cmd/compile
go test -u || go test -u
Change-Id: I196bbe3b648e4701662e4a2bada40bf155e2a553
Reviewed-on: https://go-review.googlesource.com/c/go/+/272935
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
|
|
If we want to break up package gc at all, we will need to move
the compiler IR it defines into a separate package that can be
imported by packages that gc itself imports. This CL does that.
It also removes the TINT8 etc aliases so that all code is clear
about which package things are coming from.
This CL is automatically generated by the script below.
See the comments in the script for details about the changes.
[git-generate]
cd src/cmd/compile/internal/gc
rf '
# These names were never fully qualified
# when the types package was added.
# Do it now, to avoid confusion about where they live.
inline -rm \
Txxx \
TINT8 \
TUINT8 \
TINT16 \
TUINT16 \
TINT32 \
TUINT32 \
TINT64 \
TUINT64 \
TINT \
TUINT \
TUINTPTR \
TCOMPLEX64 \
TCOMPLEX128 \
TFLOAT32 \
TFLOAT64 \
TBOOL \
TPTR \
TFUNC \
TSLICE \
TARRAY \
TSTRUCT \
TCHAN \
TMAP \
TINTER \
TFORW \
TANY \
TSTRING \
TUNSAFEPTR \
TIDEAL \
TNIL \
TBLANK \
TFUNCARGS \
TCHANARGS \
NTYPE \
BADWIDTH
# esc.go and escape.go do not need to be split.
# Append esc.go onto the end of escape.go.
mv esc.go escape.go
# Pull out the type format installation from func Main,
# so it can be carried into package ir.
mv Main:/Sconv.=/-0,/TypeLinkSym/-1 InstallTypeFormats
# Names that need to be exported for use by code left in gc.
mv Isconst IsConst
mv asNode AsNode
mv asNodes AsNodes
mv asTypesNode AsTypesNode
mv basicnames BasicTypeNames
mv builtinpkg BuiltinPkg
mv consttype ConstType
mv dumplist DumpList
mv fdumplist FDumpList
mv fmtMode FmtMode
mv goopnames OpNames
mv inspect Inspect
mv inspectList InspectList
mv localpkg LocalPkg
mv nblank BlankNode
mv numImport NumImport
mv opprec OpPrec
mv origSym OrigSym
mv stmtwithinit StmtWithInit
mv dump DumpAny
mv fdump FDumpAny
mv nod Nod
mv nodl NodAt
mv newname NewName
mv newnamel NewNameAt
mv assertRepresents AssertValidTypeForConst
mv represents ValidTypeForConst
mv nodlit NewLiteral
# Types and fields that need to be exported for use by gc.
mv nowritebarrierrecCallSym SymAndPos
mv SymAndPos.lineno SymAndPos.Pos
mv SymAndPos.target SymAndPos.Sym
mv Func.lsym Func.LSym
mv Func.setWBPos Func.SetWBPos
mv Func.numReturns Func.NumReturns
mv Func.numDefers Func.NumDefers
mv Func.nwbrCalls Func.NWBRCalls
# initLSym is an algorithm left behind in gc,
# not an operation on Func itself.
mv Func.initLSym initLSym
mv nodeQueue NodeQueue
mv NodeQueue.empty NodeQueue.Empty
mv NodeQueue.popLeft NodeQueue.PopLeft
mv NodeQueue.pushRight NodeQueue.PushRight
# Many methods on Node are actually algorithms that
# would apply to any node implementation.
# Those become plain functions.
mv Node.funcname FuncName
mv Node.isBlank IsBlank
mv Node.isGoConst isGoConst
mv Node.isNil IsNil
mv Node.isParamHeapCopy isParamHeapCopy
mv Node.isParamStackCopy isParamStackCopy
mv Node.isSimpleName isSimpleName
mv Node.mayBeShared MayBeShared
mv Node.pkgFuncName PkgFuncName
mv Node.backingArrayPtrLen backingArrayPtrLen
mv Node.isterminating isTermNode
mv Node.labeledControl labeledControl
mv Nodes.isterminating isTermNodes
mv Nodes.sigerr fmtSignature
mv Node.MethodName methodExprName
mv Node.MethodFunc methodExprFunc
mv Node.IsMethod IsMethod
# Every node will need to implement RawCopy;
# Copy and SepCopy algorithms will use it.
mv Node.rawcopy Node.RawCopy
mv Node.copy Copy
mv Node.sepcopy SepCopy
# Extract Node.Format method body into func FmtNode,
# but leave method wrapper behind.
mv Node.Format:0,$ FmtNode
# Formatting helpers that will apply to all node implementations.
mv Node.Line Line
mv Node.exprfmt exprFmt
mv Node.jconv jconvFmt
mv Node.modeString modeString
mv Node.nconv nconvFmt
mv Node.nodedump nodeDumpFmt
mv Node.nodefmt nodeFmt
mv Node.stmtfmt stmtFmt
# Constant support needed for code moving to ir.
mv okforconst OKForConst
mv vconv FmtConst
mv int64Val Int64Val
mv float64Val Float64Val
mv Node.ValueInterface ConstValue
# Organize code into files.
mv LocalPkg BuiltinPkg ir.go
mv NumImport InstallTypeFormats Line fmt.go
mv syntax.go Nod NodAt NewNameAt Class Pxxx PragmaFlag Nointerface SymAndPos \
AsNode AsTypesNode BlankNode OrigSym \
Node.SliceBounds Node.SetSliceBounds Op.IsSlice3 \
IsConst Node.Int64Val Node.CanInt64 Node.Uint64Val Node.BoolVal Node.StringVal \
Node.RawCopy SepCopy Copy \
IsNil IsBlank IsMethod \
Node.Typ Node.StorageClass node.go
mv ConstType ConstValue Int64Val Float64Val AssertValidTypeForConst ValidTypeForConst NewLiteral idealType OKForConst val.go
# Move files to new ir package.
mv bitset.go class_string.go dump.go fmt.go \
ir.go node.go op_string.go val.go \
sizeof_test.go cmd/compile/internal/ir
'
: # fix mkbuiltin.go to generate the changes made to builtin.go during rf
sed -i '' '
s/\[T/[types.T/g
s/\*Node/*ir.Node/g
/internal\/types/c \
fmt.Fprintln(&b, `import (`) \
fmt.Fprintln(&b, ` "cmd/compile/internal/ir"`) \
fmt.Fprintln(&b, ` "cmd/compile/internal/types"`) \
fmt.Fprintln(&b, `)`)
' mkbuiltin.go
gofmt -w mkbuiltin.go
: # update cmd/dist to add internal/ir
cd ../../../dist
sed -i '' '/compile.internal.gc/a\
"cmd/compile/internal/ir",
' buildtool.go
gofmt -w buildtool.go
: # update cmd/compile TestFormats
cd ../..
go install std cmd
cd cmd/compile
go test -u || go test # first one updates but fails; second passes
Change-Id: I5f7caf6b20629b51970279e81231a3574d5b51db
Reviewed-on: https://go-review.googlesource.com/c/go/+/273008
Trust: Russ Cox <rsc@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
|
|
Move Flag, Debug, Ctxt, Exit, and error messages to
new package cmd/compile/internal/base.
These are the core functionality that everything in gc uses
and which otherwise prevent splitting any other code
out of gc into different packages.
A minor milestone: the compiler source code
no longer contains the string "yy".
[git-generate]
cd src/cmd/compile/internal/gc
rf '
mv atExit AtExit
mv Ctxt atExitFuncs AtExit Exit base.go
mv lineno Pos
mv linestr FmtPos
mv flusherrors FlushErrors
mv yyerror Errorf
mv yyerrorl ErrorfAt
mv yyerrorv ErrorfVers
mv noder.yyerrorpos noder.errorAt
mv Warnl WarnfAt
mv errorexit ErrorExit
mv base.go debug.go flag.go print.go cmd/compile/internal/base
'
: # update comments
sed -i '' 's/yyerrorl/ErrorfAt/g; s/yyerror/Errorf/g' *.go
: # bootstrap.go is not built by default so invisible to rf
sed -i '' 's/Fatalf/base.Fatalf/' bootstrap.go
goimports -w bootstrap.go
: # update cmd/dist to add internal/base
cd ../../../dist
sed -i '' '/internal.amd64/a\
"cmd/compile/internal/base",
' buildtool.go
gofmt -w buildtool.go
Change-Id: I59903c7084222d6eaee38823fd222159ba24a31a
Reviewed-on: https://go-review.googlesource.com/c/go/+/272250
Trust: Russ Cox <rsc@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
|
|
The debug table is not as haphazard as flags, but there are still
a few mismatches between command-line names and variable names.
This CL moves them all into a consistent home (var Debug, like var Flag).
Code updated automatically using the rf command below.
A followup CL will make a few manual cleanups, leaving this CL
completely automated and easier to regenerate during merge
conflicts.
[git-generate]
cd src/cmd/compile/internal/gc
rf '
add main.go var Debug struct{}
mv Debug_append Debug.Append
mv Debug_checkptr Debug.Checkptr
mv Debug_closure Debug.Closure
mv Debug_compilelater Debug.CompileLater
mv disable_checknil Debug.DisableNil
mv debug_dclstack Debug.DclStack
mv Debug_gcprog Debug.GCProg
mv Debug_libfuzzer Debug.Libfuzzer
mv Debug_checknil Debug.Nil
mv Debug_panic Debug.Panic
mv Debug_slice Debug.Slice
mv Debug_typeassert Debug.TypeAssert
mv Debug_wb Debug.WB
mv Debug_export Debug.Export
mv Debug_pctab Debug.PCTab
mv Debug_locationlist Debug.LocationLists
mv Debug_typecheckinl Debug.TypecheckInl
mv Debug_gendwarfinl Debug.DwarfInl
mv Debug_softfloat Debug.SoftFloat
mv Debug_defer Debug.Defer
mv Debug_dumpptrs Debug.DumpPtrs
mv flag.go:/parse.-d/-1,/unknown.debug/+2 parseDebug
mv debugtab Debug parseDebug \
debugHelpHeader debugHelpFooter \
debug.go
# Remove //go:generate line copied from main.go
rm debug.go:/go:generate/-+
'
Change-Id: I625761ca5659be4052f7161a83baa00df75cca91
Reviewed-on: https://go-review.googlesource.com/c/go/+/272246
Trust: Russ Cox <rsc@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
|
|
Encode the flag results in an auxint field instead of having
one opcode per flag state. This helps us handle the new *noov
branches in a unified manner.
This is only for arm, arm64 is in a subsequent CL.
We could extend to other architectures as well, athough it would
only be cleanup, no behavioral change.
Update #39505
Change-Id: Ia46cea596faad540d1496c5915ab1274571543f0
Reviewed-on: https://go-review.googlesource.com/c/go/+/238077
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
Some ARM rewriting rules convert 'comparing to zero' conditions of if
statements to a simplified version utilizing CMN and CMP instructions to
branch over condition flags, in order to save one Add or Sub caculation.
Such optimizations lead to wrong branching in case an overflow/underflow
occurs when executing CMN or CMP.
Fix the issue by introducing new block opcodes that don't honor the
overflow/underflow flag:
Block-Op Meaning ARM condition codes
1. LTnoov less than MI
2. GEnoov greater than or equal PL
3. LEnoov less than or equal MI || EQ
4. GTnoov greater than NEQ & PL
The patch also adds a few test cases to cover scenarios that are specific
to ARM and fine-tunes the code generation tests for 'x-const'.
For more details please refer to the previous fix on 64-bit ARM:
https://go-review.googlesource.com/c/go/+/233097
Go1 perf, 'old' is the non-optimized version, that is removing all concerned
rewriting rules.
name old time/op new time/op delta
BinaryTree17-8 7.73s ± 0% 7.81s ± 0% +0.97% (p=0.000 n=7+8)
Fannkuch11-8 7.06s ± 0% 7.00s ± 0% -0.83% (p=0.000 n=8+8)
FmtFprintfEmpty-8 181ns ± 1% 183ns ± 1% +1.31% (p=0.001 n=8+8)
FmtFprintfString-8 319ns ± 1% 325ns ± 2% +1.71% (p=0.009 n=7+8)
FmtFprintfInt-8 358ns ± 1% 359ns ± 1% ~ (p=0.293 n=7+7)
FmtFprintfIntInt-8 459ns ± 3% 456ns ± 1% ~ (p=0.869 n=8+8)
FmtFprintfPrefixedInt-8 535ns ± 4% 538ns ± 4% ~ (p=0.572 n=8+8)
FmtFprintfFloat-8 1.01µs ± 2% 1.01µs ± 2% ~ (p=0.625 n=8+8)
FmtManyArgs-8 1.93µs ± 2% 1.93µs ± 1% ~ (p=0.979 n=8+7)
GobDecode-8 16.1ms ± 1% 16.5ms ± 1% +2.32% (p=0.000 n=8+8)
GobEncode-8 15.9ms ± 0% 15.8ms ± 1% -1.00% (p=0.000 n=8+7)
Gzip-8 690ms ± 1% 670ms ± 0% -2.90% (p=0.000 n=8+8)
Gunzip-8 109ms ± 1% 109ms ± 1% ~ (p=0.694 n=7+8)
HTTPClientServer-8 149µs ± 3% 146µs ± 2% -1.70% (p=0.028 n=8+8)
JSONEncode-8 50.5ms ± 1% 49.2ms ± 0% -2.60% (p=0.001 n=7+7)
JSONDecode-8 135ms ± 2% 137ms ± 1% ~ (p=0.054 n=8+7)
Mandelbrot200-8 951ms ± 0% 952ms ± 0% ~ (p=0.852 n=6+8)
GoParse-8 9.47ms ± 1% 9.66ms ± 1% +2.01% (p=0.000 n=8+8)
RegexpMatchEasy0_32-8 288ns ± 2% 277ns ± 2% -3.61% (p=0.000 n=8+8)
RegexpMatchEasy0_1K-8 1.66µs ± 1% 1.69µs ± 2% +2.21% (p=0.001 n=7+7)
RegexpMatchEasy1_32-8 334ns ± 1% 305ns ± 2% -8.86% (p=0.000 n=8+8)
RegexpMatchEasy1_1K-8 2.14µs ± 2% 2.15µs ± 0% ~ (p=0.099 n=8+8)
RegexpMatchMedium_32-8 13.3ns ± 1% 13.3ns ± 0% ~ (p=1.000 n=7+7)
RegexpMatchMedium_1K-8 81.1µs ± 3% 80.7µs ± 1% ~ (p=0.955 n=7+8)
RegexpMatchHard_32-8 4.26µs ± 0% 4.26µs ± 0% ~ (p=0.933 n=7+8)
RegexpMatchHard_1K-8 124µs ± 0% 124µs ± 0% +0.31% (p=0.000 n=8+8)
Revcomp-8 14.7ms ± 2% 14.5ms ± 1% -1.66% (p=0.003 n=8+8)
Template-8 197ms ± 2% 200ms ± 3% +1.62% (p=0.021 n=8+8)
TimeParse-8 1.33µs ± 1% 1.30µs ± 1% -1.86% (p=0.002 n=8+8)
TimeFormat-8 3.04µs ± 1% 3.02µs ± 0% -0.60% (p=0.000 n=8+8)
name old speed new speed delta
GobDecode-8 47.6MB/s ± 1% 46.5MB/s ± 1% -2.28% (p=0.000 n=8+8)
GobEncode-8 48.1MB/s ± 0% 48.6MB/s ± 1% +1.02% (p=0.000 n=8+7)
Gzip-8 28.1MB/s ± 1% 29.0MB/s ± 0% +2.97% (p=0.000 n=8+8)
Gunzip-8 178MB/s ± 1% 179MB/s ± 2% ~ (p=0.694 n=7+8)
JSONEncode-8 38.4MB/s ± 1% 39.4MB/s ± 0% +2.67% (p=0.001 n=7+7)
JSONDecode-8 14.3MB/s ± 2% 14.2MB/s ± 1% -0.81% (p=0.043 n=8+7)
GoParse-8 6.12MB/s ± 1% 5.99MB/s ± 1% -2.00% (p=0.000 n=8+8)
RegexpMatchEasy0_32-8 111MB/s ± 2% 115MB/s ± 2% +3.77% (p=0.000 n=8+8)
RegexpMatchEasy0_1K-8 618MB/s ± 1% 604MB/s ± 2% -2.16% (p=0.001 n=7+7)
RegexpMatchEasy1_32-8 95.7MB/s ± 1% 105.1MB/s ± 2% +9.76% (p=0.000 n=8+8)
RegexpMatchEasy1_1K-8 479MB/s ± 2% 477MB/s ± 0% ~ (p=0.105 n=8+8)
RegexpMatchMedium_32-8 75.2MB/s ± 1% 75.2MB/s ± 0% ~ (p=0.247 n=7+7)
RegexpMatchMedium_1K-8 12.6MB/s ± 3% 12.7MB/s ± 1% ~ (p=0.538 n=7+8)
RegexpMatchHard_32-8 7.52MB/s ± 0% 7.52MB/s ± 0% ~ (p=0.968 n=7+8)
RegexpMatchHard_1K-8 8.26MB/s ± 0% 8.24MB/s ± 0% -0.30% (p=0.001 n=8+8)
Revcomp-8 173MB/s ± 2% 176MB/s ± 1% +1.68% (p=0.003 n=8+8)
Template-8 9.85MB/s ± 2% 9.69MB/s ± 3% -1.59% (p=0.021 n=8+8)
Fixes #39303
Updates #38740
Change-Id: I0a5f87bfda679f66414c0041ace2ca2e28363f36
Reviewed-on: https://go-review.googlesource.com/c/go/+/236637
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Change-Id: If82ebd9cd6470863eb5de9e031e7905a66218857
Reviewed-on: https://go-review.googlesource.com/c/go/+/204159
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
ZeroAuto was used with the ambiguously live logic. The
ambiguously live logic is removed as we switched to stack
objects. It is now never called. Remove.
Change-Id: If4cdd7fed5297f8ab591cc392a76c80f57820856
Reviewed-on: https://go-review.googlesource.com/c/go/+/203538
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
This change introduces an arm intrinsic that generates the FMULAD
instruction for the fused-multiply-add operation on systems that
support it. System support is detected via cpu.ARM.HasVFPv4. A rewrite
rule translates the generic intrinsic to FMULAD.
Updates #25819.
Change-Id: I8459e5dd1cdbdca35f88a78dbeb7d387f1e20efa
Reviewed-on: https://go-review.googlesource.com/c/go/+/142117
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
This is part two if the nacl removal. Part 1 was CL 199499.
This CL removes amd64p32 support, which might be useful in the future
if we implement the x32 ABI. It also removes the nacl bits in the
toolchain, and some remaining nacl bits.
Updates #30439
Change-Id: I2475d5bb066d1b474e00e40d95b520e7c2e286e1
Reviewed-on: https://go-review.googlesource.com/c/go/+/200077
Reviewed-by: Ian Lance Taylor <iant@golang.org>
|
|
Control values are used to choose which successor of a block is
jumped to. Typically a control value takes the form of a 'flags'
value that represents the result of a comparison. Some
architectures however use a variable in a register as a control
value.
Up until now we have managed with a single control value per block.
However some architectures (e.g. s390x and riscv64) have combined
compare-and-branch instructions that take two variables in registers
as parameters. To generate these instructions we need to support 2
control values per block.
This CL allows up to 2 control values to be used in a block in
order to support the addition of compare-and-branch instructions.
I have implemented s390x compare-and-branch instructions in a
different CL.
Passes toolstash-check -all.
Results of compilebench:
name old time/op new time/op delta
Template 208ms ± 1% 209ms ± 1% ~ (p=0.289 n=20+20)
Unicode 83.7ms ± 1% 83.3ms ± 3% -0.49% (p=0.017 n=18+18)
GoTypes 748ms ± 1% 748ms ± 0% ~ (p=0.460 n=20+18)
Compiler 3.47s ± 1% 3.48s ± 1% ~ (p=0.070 n=19+18)
SSA 11.5s ± 1% 11.7s ± 1% +1.64% (p=0.000 n=19+18)
Flate 130ms ± 1% 130ms ± 1% ~ (p=0.588 n=19+20)
GoParser 160ms ± 1% 161ms ± 1% ~ (p=0.211 n=20+20)
Reflect 465ms ± 1% 467ms ± 1% +0.42% (p=0.007 n=20+20)
Tar 184ms ± 1% 185ms ± 2% ~ (p=0.087 n=18+20)
XML 253ms ± 1% 253ms ± 1% ~ (p=0.377 n=20+18)
LinkCompiler 769ms ± 2% 774ms ± 2% ~ (p=0.070 n=19+19)
ExternalLinkCompiler 3.59s ±11% 3.68s ± 6% ~ (p=0.072 n=20+20)
LinkWithoutDebugCompiler 446ms ± 5% 454ms ± 3% +1.79% (p=0.002 n=19+20)
StdCmd 26.0s ± 2% 26.0s ± 2% ~ (p=0.799 n=20+20)
name old user-time/op new user-time/op delta
Template 238ms ± 5% 240ms ± 5% ~ (p=0.142 n=20+20)
Unicode 105ms ±11% 106ms ±10% ~ (p=0.512 n=20+20)
GoTypes 876ms ± 2% 873ms ± 4% ~ (p=0.647 n=20+19)
Compiler 4.17s ± 2% 4.19s ± 1% ~ (p=0.093 n=20+18)
SSA 13.9s ± 1% 14.1s ± 1% +1.45% (p=0.000 n=18+18)
Flate 145ms ±13% 146ms ± 5% ~ (p=0.851 n=20+18)
GoParser 185ms ± 5% 188ms ± 7% ~ (p=0.174 n=20+20)
Reflect 534ms ± 3% 538ms ± 2% ~ (p=0.105 n=20+18)
Tar 215ms ± 4% 211ms ± 9% ~ (p=0.079 n=19+20)
XML 295ms ± 6% 295ms ± 5% ~ (p=0.968 n=20+20)
LinkCompiler 832ms ± 4% 837ms ± 7% ~ (p=0.707 n=17+20)
ExternalLinkCompiler 1.58s ± 8% 1.60s ± 4% ~ (p=0.296 n=20+19)
LinkWithoutDebugCompiler 478ms ±12% 489ms ±10% ~ (p=0.429 n=20+20)
name old object-bytes new object-bytes delta
Template 559kB ± 0% 559kB ± 0% ~ (all equal)
Unicode 216kB ± 0% 216kB ± 0% ~ (all equal)
GoTypes 2.03MB ± 0% 2.03MB ± 0% ~ (all equal)
Compiler 8.07MB ± 0% 8.07MB ± 0% -0.06% (p=0.000 n=20+20)
SSA 27.1MB ± 0% 27.3MB ± 0% +0.89% (p=0.000 n=20+20)
Flate 343kB ± 0% 343kB ± 0% ~ (all equal)
GoParser 441kB ± 0% 441kB ± 0% ~ (all equal)
Reflect 1.36MB ± 0% 1.36MB ± 0% ~ (all equal)
Tar 487kB ± 0% 487kB ± 0% ~ (all equal)
XML 632kB ± 0% 632kB ± 0% ~ (all equal)
name old export-bytes new export-bytes delta
Template 18.5kB ± 0% 18.5kB ± 0% ~ (all equal)
Unicode 7.92kB ± 0% 7.92kB ± 0% ~ (all equal)
GoTypes 35.0kB ± 0% 35.0kB ± 0% ~ (all equal)
Compiler 109kB ± 0% 110kB ± 0% +0.72% (p=0.000 n=20+20)
SSA 137kB ± 0% 138kB ± 0% +0.58% (p=0.000 n=20+20)
Flate 4.89kB ± 0% 4.89kB ± 0% ~ (all equal)
GoParser 8.49kB ± 0% 8.49kB ± 0% ~ (all equal)
Reflect 11.4kB ± 0% 11.4kB ± 0% ~ (all equal)
Tar 10.5kB ± 0% 10.5kB ± 0% ~ (all equal)
XML 16.7kB ± 0% 16.7kB ± 0% ~ (all equal)
name old text-bytes new text-bytes delta
HelloSize 761kB ± 0% 761kB ± 0% ~ (all equal)
CmdGoSize 10.8MB ± 0% 10.8MB ± 0% ~ (all equal)
name old data-bytes new data-bytes delta
HelloSize 10.7kB ± 0% 10.7kB ± 0% ~ (all equal)
CmdGoSize 312kB ± 0% 312kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 122kB ± 0% 122kB ± 0% ~ (all equal)
CmdGoSize 146kB ± 0% 146kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.13MB ± 0% 1.13MB ± 0% ~ (all equal)
CmdGoSize 15.1MB ± 0% 15.1MB ± 0% ~ (all equal)
Change-Id: I3cc2f9829a109543d9a68be4a21775d2d3e9801f
Reviewed-on: https://go-review.googlesource.com/c/go/+/196557
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
This CL optimizes math.bits.RotateLeft32 to inline
"MOVW Rx@>Ry, Rd" on ARM.
The benchmark results of math/bits show some improvements.
name old time/op new time/op delta
RotateLeft-4 9.42ns ± 0% 6.91ns ± 0% -26.66% (p=0.000 n=40+33)
RotateLeft8-4 8.79ns ± 0% 8.79ns ± 0% -0.04% (p=0.000 n=40+31)
RotateLeft16-4 8.79ns ± 0% 8.79ns ± 0% -0.04% (p=0.000 n=40+32)
RotateLeft32-4 8.16ns ± 0% 7.54ns ± 0% -7.68% (p=0.000 n=40+40)
RotateLeft64-4 15.7ns ± 0% 15.7ns ± 0% ~ (all equal)
updates #31265
Change-Id: I77bc1c2c702d5323fc7cad5264a8e2d5666bf712
Reviewed-on: https://go-review.googlesource.com/c/go/+/188697
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
This CL optimizes math.Abs to an inline ABSD instruction on ARM.
The benchmark results of src/math/ show big improvements.
name old time/op new time/op delta
Acos-4 181ns ± 0% 182ns ± 0% +0.30% (p=0.000 n=40+40)
Acosh-4 202ns ± 0% 202ns ± 0% ~ (all equal)
Asin-4 163ns ± 0% 163ns ± 0% ~ (all equal)
Asinh-4 242ns ± 0% 242ns ± 0% ~ (all equal)
Atan-4 120ns ± 0% 121ns ± 0% +0.83% (p=0.000 n=40+40)
Atanh-4 202ns ± 0% 202ns ± 0% ~ (all equal)
Atan2-4 173ns ± 0% 173ns ± 0% ~ (all equal)
Cbrt-4 1.06µs ± 0% 1.06µs ± 0% +0.09% (p=0.000 n=39+37)
Ceil-4 72.9ns ± 0% 72.8ns ± 0% ~ (p=0.237 n=40+40)
Copysign-4 13.2ns ± 0% 13.2ns ± 0% ~ (all equal)
Cos-4 193ns ± 0% 183ns ± 0% -5.18% (p=0.000 n=40+40)
Cosh-4 254ns ± 0% 239ns ± 0% -5.91% (p=0.000 n=40+40)
Erf-4 112ns ± 0% 112ns ± 0% ~ (all equal)
Erfc-4 117ns ± 0% 117ns ± 0% ~ (all equal)
Erfinv-4 127ns ± 0% 127ns ± 1% ~ (p=0.492 n=40+40)
Erfcinv-4 128ns ± 0% 128ns ± 0% ~ (all equal)
Exp-4 212ns ± 0% 206ns ± 0% -3.05% (p=0.000 n=40+40)
ExpGo-4 216ns ± 0% 209ns ± 0% -3.24% (p=0.000 n=40+40)
Expm1-4 142ns ± 0% 142ns ± 0% ~ (all equal)
Exp2-4 191ns ± 0% 184ns ± 0% -3.45% (p=0.000 n=40+40)
Exp2Go-4 194ns ± 0% 187ns ± 0% -3.61% (p=0.000 n=40+40)
Abs-4 14.4ns ± 0% 6.3ns ± 0% -56.39% (p=0.000 n=38+39)
Dim-4 12.6ns ± 0% 12.6ns ± 0% ~ (all equal)
Floor-4 49.6ns ± 0% 49.6ns ± 0% ~ (all equal)
Max-4 27.6ns ± 0% 27.6ns ± 0% ~ (all equal)
Min-4 27.0ns ± 0% 27.0ns ± 0% ~ (all equal)
Mod-4 349ns ± 0% 305ns ± 1% -12.55% (p=0.000 n=33+40)
Frexp-4 54.0ns ± 0% 47.1ns ± 0% -12.78% (p=0.000 n=38+38)
Gamma-4 242ns ± 0% 234ns ± 0% -3.16% (p=0.000 n=36+40)
Hypot-4 84.8ns ± 0% 67.8ns ± 0% -20.05% (p=0.000 n=31+35)
HypotGo-4 88.5ns ± 0% 71.6ns ± 0% -19.12% (p=0.000 n=40+38)
Ilogb-4 45.8ns ± 0% 38.9ns ± 0% -15.12% (p=0.000 n=40+32)
J0-4 821ns ± 0% 802ns ± 0% -2.33% (p=0.000 n=33+40)
J1-4 816ns ± 0% 807ns ± 0% -1.05% (p=0.000 n=40+29)
Jn-4 1.67µs ± 0% 1.65µs ± 0% -1.45% (p=0.000 n=40+39)
Ldexp-4 61.5ns ± 0% 54.6ns ± 0% -11.27% (p=0.000 n=40+32)
Lgamma-4 188ns ± 0% 188ns ± 0% ~ (all equal)
Log-4 154ns ± 0% 147ns ± 0% -4.78% (p=0.000 n=40+40)
Logb-4 50.9ns ± 0% 42.7ns ± 0% -16.11% (p=0.000 n=34+39)
Log1p-4 160ns ± 0% 159ns ± 0% ~ (p=0.828 n=40+40)
Log10-4 173ns ± 0% 166ns ± 0% -4.05% (p=0.000 n=40+40)
Log2-4 65.3ns ± 0% 58.4ns ± 0% -10.57% (p=0.000 n=37+37)
Modf-4 36.4ns ± 0% 36.4ns ± 0% ~ (all equal)
Nextafter32-4 36.4ns ± 0% 36.4ns ± 0% ~ (all equal)
Nextafter64-4 32.7ns ± 0% 32.6ns ± 0% ~ (p=0.375 n=40+40)
PowInt-4 300ns ± 0% 277ns ± 0% -7.78% (p=0.000 n=40+40)
PowFrac-4 676ns ± 0% 635ns ± 0% -6.00% (p=0.000 n=40+35)
Pow10Pos-4 17.6ns ± 0% 17.6ns ± 0% ~ (all equal)
Pow10Neg-4 22.0ns ± 0% 22.0ns ± 0% ~ (all equal)
Round-4 30.1ns ± 0% 30.1ns ± 0% ~ (all equal)
RoundToEven-4 38.9ns ± 0% 38.9ns ± 0% ~ (all equal)
Remainder-4 291ns ± 0% 263ns ± 0% -9.62% (p=0.000 n=40+40)
Signbit-4 11.3ns ± 0% 11.3ns ± 0% ~ (all equal)
Sin-4 185ns ± 0% 185ns ± 0% ~ (all equal)
Sincos-4 230ns ± 0% 230ns ± 0% ~ (all equal)
Sinh-4 253ns ± 0% 246ns ± 0% -2.77% (p=0.000 n=39+39)
SqrtIndirect-4 41.4ns ± 0% 41.4ns ± 0% ~ (all equal)
SqrtLatency-4 13.8ns ± 0% 13.8ns ± 0% ~ (all equal)
SqrtIndirectLatency-4 37.0ns ± 0% 37.0ns ± 0% ~ (p=0.632 n=40+40)
SqrtGoLatency-4 911ns ± 0% 911ns ± 0% +0.08% (p=0.000 n=40+40)
SqrtPrime-4 13.2µs ± 0% 13.2µs ± 0% +0.01% (p=0.038 n=38+40)
Tan-4 205ns ± 0% 205ns ± 0% ~ (all equal)
Tanh-4 264ns ± 0% 247ns ± 0% -6.44% (p=0.000 n=39+32)
Trunc-4 45.2ns ± 0% 45.2ns ± 0% ~ (all equal)
Y0-4 796ns ± 0% 792ns ± 0% -0.55% (p=0.000 n=35+40)
Y1-4 804ns ± 0% 797ns ± 0% -0.82% (p=0.000 n=24+40)
Yn-4 1.64µs ± 0% 1.62µs ± 0% -1.27% (p=0.000 n=40+39)
Float64bits-4 8.16ns ± 0% 8.16ns ± 0% +0.04% (p=0.000 n=35+40)
Float64frombits-4 10.7ns ± 0% 10.7ns ± 0% ~ (all equal)
Float32bits-4 7.53ns ± 0% 7.53ns ± 0% ~ (p=0.760 n=40+40)
Float32frombits-4 6.91ns ± 0% 6.91ns ± 0% -0.04% (p=0.002 n=32+38)
[Geo mean] 111ns 106ns -3.98%
Change-Id: I54f4fd7f5160db020b430b556bde59cc0fdb996d
Reviewed-on: https://go-review.googlesource.com/c/go/+/188678
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
Merge case statement for OpARMSLL, OpARMSRL and OpARMSRA into an
existing one using the same logic.
Change-Id: Ic4224668228902e5188fb0559b5f1949cfea1381
Reviewed-on: https://go-review.googlesource.com/c/go/+/171724
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
This opcode was only used to mark unreachable code for plive to use.
plive now uses the SSA representation, so it knows locations are
unreachable because they are ends of Exit blocks. It doesn't need
these opcodes any more.
These opcodes actually used space in the binary, 2 bytes per undef
on x86 and more for other archs.
Makes the amd64 go binary 0.2% smaller.
Change-Id: I64c84c35db7c7949617a3a5830f09c8e5fcd2620
Reviewed-on: https://go-review.googlesource.com/c/go/+/171058
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
A few examples (for accessing a slice of length 3):
s[-1] runtime error: index out of range [-1]
s[3] runtime error: index out of range [3] with length 3
s[-1:0] runtime error: slice bounds out of range [-1:]
s[3:0] runtime error: slice bounds out of range [3:0]
s[3:-1] runtime error: slice bounds out of range [:-1]
s[3:4] runtime error: slice bounds out of range [:4] with capacity 3
s[0:3:4] runtime error: slice bounds out of range [::4] with capacity 3
Note that in cases where there are multiple things wrong with the
indexes (e.g. s[3:-1]), we report one of those errors kind of
arbitrarily, currently the rightmost one.
An exhaustive set of examples is in issue30116[u].out in the CL.
The message text has the same prefix as the old message text. That
leads to slightly awkward phrasing but hopefully minimizes the chance
that code depending on the error text will break.
Increases the size of the go binary by 0.5% (amd64). The panic functions
take arguments in registers in order to keep the size of the compiled code
as small as possible.
Fixes #30116
Change-Id: Idb99a827b7888822ca34c240eca87b7e44a04fdd
Reviewed-on: https://go-review.googlesource.com/c/go/+/161477
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
|
|
This CL adds two rules to turn patterns like ((x<<8) | (x>>8)) (the type of
x is uint16, "|" can also be "+" or "^") to a REV16 instruction on arm v6+.
This optimization rule can be used for math/bits.ReverseBytes16.
Benchmarks on arm v6:
name old time/op new time/op delta
ReverseBytes-32 2.86ns ± 0% 2.86ns ± 0% ~ (all equal)
ReverseBytes16-32 2.86ns ± 0% 2.86ns ± 0% ~ (all equal)
ReverseBytes32-32 1.29ns ± 0% 1.29ns ± 0% ~ (all equal)
ReverseBytes64-32 1.43ns ± 0% 1.43ns ± 0% ~ (all equal)
Change-Id: I819e633c9a9d308f8e476fb0c82d73fb73dd019f
Reviewed-on: https://go-review.googlesource.com/c/go/+/159019
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
tracebacks
A recent change to fix stacktraces for inlined functions
introduced a regression on ppc64le when compiling position
independent code. That happened because ginsnop2 was called for
the purpose of inserting a NOP to identify the location of
the inlined function, when ginsnop should have been used.
ginsnop2 is intended to be used before deferreturn to ensure
r2 is properly restored when compiling position independent code.
In some cases the location where r2 is loaded from might not be
initialized. If that happens and r2 is used to generate an address,
the result is likely a SEGV.
This fixes that problem.
Fixes #30283
Change-Id: If70ef27fc65ef31969712422306ac3a57adbd5b6
Reviewed-on: https://go-review.googlesource.com/c/163337
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Andrew Bonventre <andybons@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Work involved in getting a stack trace is divided between
runtime.Callers and runtime.CallersFrames.
Before this CL, runtime.Callers returns a pc per runtime frame.
runtime.CallersFrames is responsible for expanding a runtime frame
into potentially multiple user frames.
After this CL, runtime.Callers returns a pc per user frame.
runtime.CallersFrames just maps those to user frame info.
Entries in the result of runtime.Callers are now pcs
of the calls (or of the inline marks), not of the instruction
just after the call.
Fixes #29007
Fixes #28640
Update #26320
Change-Id: I1c9567596ff73dc73271311005097a9188c3406f
Reviewed-on: https://go-review.googlesource.com/c/152537
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
|
|
BFC (Bit Field Clear) was introduced in ARMv7, which can simplify
ANDconst and BICconst. And this CL implements that optimization.
1. The total size of pkg/android_arm decreases about 3KB, excluding
cmd/compile/.
2. There is no regression in the go1 benchmark result, and some
cases (FmtFprintfEmpty-4 and RegexpMatchMedium_32-4) even get
slight improvement.
name old time/op new time/op delta
BinaryTree17-4 25.3s ± 1% 25.2s ± 1% ~ (p=0.072 n=30+29)
Fannkuch11-4 13.3s ± 0% 13.3s ± 0% +0.13% (p=0.000 n=30+26)
FmtFprintfEmpty-4 407ns ± 0% 394ns ± 0% -3.19% (p=0.000 n=26+28)
FmtFprintfString-4 664ns ± 0% 662ns ± 0% -0.22% (p=0.000 n=30+30)
FmtFprintfInt-4 712ns ± 0% 706ns ± 0% -0.79% (p=0.000 n=30+30)
FmtFprintfIntInt-4 1.06µs ± 0% 1.05µs ± 0% -0.38% (p=0.000 n=30+30)
FmtFprintfPrefixedInt-4 1.16µs ± 0% 1.16µs ± 0% -0.13% (p=0.000 n=30+29)
FmtFprintfFloat-4 2.24µs ± 0% 2.23µs ± 0% -0.51% (p=0.000 n=29+21)
FmtManyArgs-4 4.09µs ± 0% 4.06µs ± 0% -0.83% (p=0.000 n=28+30)
GobDecode-4 55.0ms ± 5% 55.4ms ± 5% ~ (p=0.307 n=30+30)
GobEncode-4 51.2ms ± 1% 51.9ms ± 1% +1.23% (p=0.000 n=29+30)
Gzip-4 2.64s ± 0% 2.60s ± 0% -1.35% (p=0.000 n=30+29)
Gunzip-4 309ms ± 0% 308ms ± 0% -0.27% (p=0.000 n=30+30)
HTTPClientServer-4 1.03ms ± 5% 1.02ms ± 4% ~ (p=0.117 n=30+29)
JSONEncode-4 101ms ± 2% 101ms ± 2% ~ (p=0.338 n=29+29)
JSONDecode-4 383ms ± 2% 382ms ± 2% ~ (p=0.751 n=26+30)
Mandelbrot200-4 18.4ms ± 0% 18.4ms ± 0% -0.10% (p=0.000 n=29+29)
GoParse-4 22.6ms ± 0% 22.5ms ± 0% -0.39% (p=0.000 n=30+30)
RegexpMatchEasy0_32-4 761ns ± 0% 750ns ± 0% -1.47% (p=0.000 n=26+29)
RegexpMatchEasy0_1K-4 4.33µs ± 0% 4.34µs ± 0% +0.27% (p=0.000 n=25+28)
RegexpMatchEasy1_32-4 809ns ± 0% 795ns ± 0% -1.74% (p=0.000 n=27+25)
RegexpMatchEasy1_1K-4 5.54µs ± 0% 5.53µs ± 0% -0.18% (p=0.000 n=29+29)
RegexpMatchMedium_32-4 1.11µs ± 0% 1.08µs ± 0% -2.78% (p=0.000 n=27+29)
RegexpMatchMedium_1K-4 255µs ± 0% 255µs ± 0% -0.02% (p=0.029 n=30+30)
RegexpMatchHard_32-4 14.7µs ± 0% 14.7µs ± 0% -0.28% (p=0.000 n=30+29)
RegexpMatchHard_1K-4 439µs ± 0% 439µs ± 0% ~ (p=0.907 n=23+27)
Revcomp-4 41.9ms ± 1% 41.9ms ± 1% ~ (p=0.230 n=28+30)
Template-4 522ms ± 1% 528ms ± 1% +1.25% (p=0.000 n=30+30)
TimeParse-4 3.34µs ± 0% 3.35µs ± 0% +0.23% (p=0.000 n=30+27)
TimeFormat-4 6.06µs ± 0% 6.13µs ± 0% +1.08% (p=0.000 n=29+29)
[Geo mean] 384µs 382µs -0.37%
name old speed new speed delta
GobDecode-4 14.0MB/s ± 5% 13.9MB/s ± 5% ~ (p=0.308 n=30+30)
GobEncode-4 15.0MB/s ± 1% 14.8MB/s ± 1% -1.22% (p=0.000 n=29+30)
Gzip-4 7.36MB/s ± 0% 7.46MB/s ± 0% +1.35% (p=0.000 n=30+30)
Gunzip-4 62.8MB/s ± 0% 63.0MB/s ± 0% +0.27% (p=0.000 n=30+30)
JSONEncode-4 19.2MB/s ± 2% 19.2MB/s ± 2% ~ (p=0.312 n=29+29)
JSONDecode-4 5.05MB/s ± 3% 5.08MB/s ± 2% ~ (p=0.356 n=29+30)
GoParse-4 2.56MB/s ± 0% 2.57MB/s ± 0% +0.39% (p=0.000 n=23+27)
RegexpMatchEasy0_32-4 42.0MB/s ± 0% 42.6MB/s ± 0% +1.50% (p=0.000 n=26+28)
RegexpMatchEasy0_1K-4 236MB/s ± 0% 236MB/s ± 0% -0.27% (p=0.000 n=25+28)
RegexpMatchEasy1_32-4 39.6MB/s ± 0% 40.2MB/s ± 0% +1.73% (p=0.000 n=27+27)
RegexpMatchEasy1_1K-4 185MB/s ± 0% 185MB/s ± 0% +0.18% (p=0.000 n=29+29)
RegexpMatchMedium_32-4 900kB/s ± 0% 920kB/s ± 0% +2.22% (p=0.000 n=29+29)
RegexpMatchMedium_1K-4 4.02MB/s ± 0% 4.02MB/s ± 0% +0.07% (p=0.004 n=30+27)
RegexpMatchHard_32-4 2.17MB/s ± 0% 2.18MB/s ± 0% +0.46% (p=0.000 n=30+26)
RegexpMatchHard_1K-4 2.33MB/s ± 0% 2.33MB/s ± 0% ~ (all equal)
Revcomp-4 60.6MB/s ± 1% 60.7MB/s ± 1% ~ (p=0.207 n=28+30)
Template-4 3.72MB/s ± 1% 3.67MB/s ± 1% -1.23% (p=0.000 n=30+30)
[Geo mean] 12.9MB/s 12.9MB/s +0.29%
Change-Id: I07f497f8bb476c950dc555491d00c9066fb64a4e
Reviewed-on: https://go-review.googlesource.com/134232
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
Add a compiler intrinsic for getcallerpc on following architectures:
arm
mips mipsle mips64 mips64le
ppc64 ppc64le
s390x
Change-Id: I758f3d4742fc214b206bcd07d90408622c17dbef
Reviewed-on: https://go-review.googlesource.com/110835
Run-TryBot: Wei Xiao <Wei.Xiao@arm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
Currently, each architecture lowers OpConvert to an arch-specific
OpXXXconvert. This is silly because OpConvert means the same thing on
all architectures and is logically a no-op that exists only to keep
track of conversions to and from unsafe.Pointer. Furthermore, lowering
it makes it harder to recognize in other analyses, particularly
liveness analysis.
This CL eliminates the lowering of OpConvert, leaving it as the
generic op until code generation time.
The main complexity here is that we still need to register-allocate
OpConvert operations. Currently, each arch's lowered OpConvert
specifies all GP registers in its register mask. Ideally, OpConvert
wouldn't affect value homing at all, and we could just copy the home
of OpConvert's source, but this can potentially home an OpConvert in a
LocalSlot, which neither regalloc nor stackalloc expect. Rather than
try to disentangle this assumption from regalloc and stackalloc, we
continue to register-allocate OpConvert, but teach regalloc that
OpConvert can be allocated to any allocatable GP register.
For #24543.
Change-Id: I795a6aee5fd94d4444a7bafac3838a400c9f7bb6
Reviewed-on: https://go-review.googlesource.com/108496
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
|
|
Use the new softfloat support in the compiler, originally added
for softfloat on MIPS. This support is portable, so we can just
use it for softfloat on ARM.
In the old softfloat support on ARM, the compiler generates
floating point instructions, then the assembler inserts calls
to _sfloat before FP instructions. _sfloat decodes the following
FP instructions and simulates them.
In the new scheme, the compiler generates runtime calls to do FP
operations at a higher level. It doesn't generate FP instructions,
and therefore the assembler won't insert _sfloat calls, i.e. the
old mechanism is automatically suppressed.
The old method may be still be triggered with assembly code
using FP instructions. In the standard library, the only
occurance is math/sqrt_arm.s, which is rewritten to call to the
Go implementation instead.
Some significant speedups for code using floating points:
name old time/op new time/op delta
BinaryTree17-4 37.1s ± 2% 37.3s ± 1% ~ (p=0.105 n=10+10)
Fannkuch11-4 13.0s ± 0% 13.1s ± 0% +0.46% (p=0.000 n=10+10)
FmtFprintfEmpty-4 700ns ± 4% 734ns ± 6% +4.84% (p=0.009 n=10+10)
FmtFprintfString-4 1.22µs ± 3% 1.22µs ± 4% ~ (p=0.897 n=10+10)
FmtFprintfInt-4 1.27µs ± 2% 1.30µs ± 1% +1.91% (p=0.001 n=10+9)
FmtFprintfIntInt-4 1.83µs ± 2% 1.81µs ± 3% ~ (p=0.149 n=10+10)
FmtFprintfPrefixedInt-4 1.80µs ± 3% 1.81µs ± 2% ~ (p=0.421 n=10+8)
FmtFprintfFloat-4 6.89µs ± 3% 3.59µs ± 2% -47.93% (p=0.000 n=10+10)
FmtManyArgs-4 6.39µs ± 1% 6.09µs ± 1% -4.61% (p=0.000 n=10+9)
GobDecode-4 109ms ± 2% 81ms ± 2% -25.99% (p=0.000 n=9+10)
GobEncode-4 109ms ± 2% 76ms ± 2% -29.88% (p=0.000 n=10+9)
Gzip-4 3.61s ± 1% 3.59s ± 1% ~ (p=0.247 n=10+10)
Gunzip-4 449ms ± 4% 450ms ± 1% ~ (p=0.230 n=10+7)
HTTPClientServer-4 1.55ms ± 3% 1.53ms ± 2% ~ (p=0.400 n=9+10)
JSONEncode-4 356ms ± 1% 183ms ± 1% -48.73% (p=0.000 n=10+10)
JSONDecode-4 1.12s ± 2% 0.87s ± 1% -21.88% (p=0.000 n=10+10)
Mandelbrot200-4 5.49s ± 1% 2.55s ± 1% -53.45% (p=0.000 n=9+10)
GoParse-4 49.6ms ± 2% 47.5ms ± 1% -4.08% (p=0.000 n=10+9)
RegexpMatchEasy0_32-4 1.13µs ± 4% 1.20µs ± 4% +6.42% (p=0.000 n=10+10)
RegexpMatchEasy0_1K-4 4.41µs ± 2% 4.44µs ± 2% ~ (p=0.128 n=10+10)
RegexpMatchEasy1_32-4 1.15µs ± 5% 1.20µs ± 5% +4.85% (p=0.002 n=10+10)
RegexpMatchEasy1_1K-4 6.21µs ± 2% 6.37µs ± 4% +2.62% (p=0.001 n=9+10)
RegexpMatchMedium_32-4 1.58µs ± 5% 1.65µs ± 3% +4.85% (p=0.000 n=10+10)
RegexpMatchMedium_1K-4 341µs ± 3% 351µs ± 7% ~ (p=0.573 n=8+10)
RegexpMatchHard_32-4 21.4µs ± 3% 21.5µs ± 5% ~ (p=0.931 n=9+9)
RegexpMatchHard_1K-4 626µs ± 2% 626µs ± 1% ~ (p=0.645 n=8+8)
Revcomp-4 46.4ms ± 2% 47.4ms ± 2% +2.07% (p=0.000 n=10+10)
Template-4 1.31s ± 3% 1.23s ± 4% -6.13% (p=0.000 n=10+10)
TimeParse-4 4.49µs ± 1% 4.41µs ± 2% -1.81% (p=0.000 n=10+9)
TimeFormat-4 9.31µs ± 1% 9.32µs ± 2% ~ (p=0.561 n=9+9)
Change-Id: Iaeeff6c9a09c1b2c064d06e09dd88101dc02bfa4
Reviewed-on: https://go-review.googlesource.com/106735
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
When a neither of a conditional block's successors follows,
the block must end with a conditional branch followed by a
an unconditional branch. If the (conditional) branch is
"unlikely", invert it and swap successors to make it
likely instead.
This doesn't matter to most benchmarks on amd64, but in one
instance on amd64 it caused a 30% improvement, and it is
otherwise harmless. The problematic loop is
for i := 0; i < w; i++ {
if pw[i] != 0 {
return true
}
}
compiled under GOEXPERIMENT=preemptibleloops
This the very worst-case benchmark for that experiment.
Also in this CL is a commoning up of heavily-repeated
boilerplate, which made it much easier to see that the
changes were applied correctly. In the future this should
allow un-exporting of SSAGenState.Branches once the
boilerplate-replacement is done everywhere.
Change-Id: I0e5ded6eeb3ab1e3e0138e12d54c7e056bd99335
Reviewed-on: https://go-review.googlesource.com/104977
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
Updates #22460.
Change-Id: I5581df7ad553237db7df3701b117ad99e0593b78
Reviewed-on: https://go-review.googlesource.com/92698
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
MOVBS/MOVBU/MOVHS/MOVHU can be optimized with a single instruction
on ARMv6 and ARMv7, instead of a pair of left/right shifts.
The benchmark tests show big improvement in special cases and a little
improvement in total.
1. A special case gets about 29% improvement.
name old time/op new time/op delta
TypePro-4 3.81ms ± 1% 2.71ms ± 1% -28.97% (p=0.000 n=26+25)
The source code of this case can be found at
https://github.com/benshi001/ugo1/blob/master/typepromotion_test.go
2. There is a little improvement in the go1 benchmark, excluding the noise.
name old time/op new time/op delta
BinaryTree17-4 42.1s ± 3% 42.1s ± 2% ~ (p=0.883 n=28+30)
Fannkuch11-4 24.3s ± 4% 24.7s ± 7% +1.64% (p=0.026 n=30+30)
FmtFprintfEmpty-4 833ns ± 2% 835ns ± 2% ~ (p=0.371 n=26+28)
FmtFprintfString-4 1.36µs ± 3% 1.35µs ± 1% ~ (p=0.202 n=26+23)
FmtFprintfInt-4 1.42µs ± 3% 1.43µs ± 1% +0.66% (p=0.000 n=26+27)
FmtFprintfIntInt-4 2.10µs ± 1% 2.10µs ± 2% ~ (p=0.104 n=25+26)
FmtFprintfPrefixedInt-4 2.37µs ± 2% 2.33µs ± 1% -1.75% (p=0.000 n=25+28)
FmtFprintfFloat-4 4.50µs ± 0% 4.37µs ± 1% -2.81% (p=0.000 n=23+25)
FmtManyArgs-4 8.08µs ± 0% 8.13µs ± 3% ~ (p=0.160 n=23+26)
GobDecode-4 102ms ± 4% 103ms ± 4% +1.08% (p=0.001 n=28+26)
GobEncode-4 96.0ms ± 2% 95.2ms ± 3% -0.81% (p=0.000 n=24+25)
Gzip-4 4.17s ± 3% 4.11s ± 2% -1.45% (p=0.000 n=25+25)
Gunzip-4 597ms ± 2% 594ms ± 2% -0.57% (p=0.000 n=24+26)
HTTPClientServer-4 708µs ± 4% 708µs ± 4% ~ (p=0.852 n=28+28)
JSONEncode-4 241ms ± 1% 245ms ± 3% +1.62% (p=0.000 n=27+28)
JSONDecode-4 906ms ± 3% 889ms ± 3% -1.85% (p=0.000 n=23+24)
Mandelbrot200-4 41.8ms ± 1% 41.8ms ± 1% ~ (p=0.929 n=25+24)
GoParse-4 47.1ms ± 2% 45.3ms ± 4% -3.80% (p=0.000 n=28+24)
RegexpMatchEasy0_32-4 1.27µs ± 2% 1.28µs ± 1% +0.77% (p=0.000 n=26+28)
RegexpMatchEasy0_1K-4 8.08µs ± 9% 7.83µs ±10% -3.10% (p=0.012 n=26+26)
RegexpMatchEasy1_32-4 1.29µs ± 5% 1.29µs ± 2% ~ (p=0.301 n=26+29)
RegexpMatchEasy1_1K-4 10.5µs ± 4% 10.3µs ± 5% -1.95% (p=0.003 n=26+26)
RegexpMatchMedium_32-4 1.94µs ± 1% 1.95µs ± 1% ~ (p=0.251 n=24+27)
RegexpMatchMedium_1K-4 502µs ± 2% 502µs ± 2% ~ (p=0.336 n=25+28)
RegexpMatchHard_32-4 26.7µs ± 1% 26.6µs ± 3% ~ (p=0.454 n=27+26)
RegexpMatchHard_1K-4 801µs ± 3% 799µs ± 2% ~ (p=0.097 n=24+26)
Revcomp-4 73.5ms ± 5% 73.2ms ± 3% ~ (p=0.240 n=26+26)
Template-4 1.07s ± 2% 1.05s ± 1% -2.39% (p=0.000 n=26+24)
TimeParse-4 6.87µs ± 1% 6.85µs ± 1% ~ (p=0.094 n=28+23)
TimeFormat-4 13.4µs ± 1% 13.4µs ± 1% ~ (p=0.664 n=25+29)
[Geo mean] 717µs 713µs -0.54%
name old speed new speed delta
GobDecode-4 7.52MB/s ± 4% 7.44MB/s ± 4% -1.10% (p=0.001 n=28+26)
GobEncode-4 7.99MB/s ± 2% 8.06MB/s ± 3% +0.81% (p=0.000 n=24+25)
Gzip-4 4.66MB/s ± 3% 4.72MB/s ± 2% +1.43% (p=0.000 n=25+25)
Gunzip-4 32.5MB/s ± 2% 32.7MB/s ± 2% +0.56% (p=0.001 n=24+26)
JSONEncode-4 8.04MB/s ± 1% 7.92MB/s ± 3% -1.59% (p=0.000 n=27+28)
JSONDecode-4 2.14MB/s ± 3% 2.18MB/s ± 3% +1.90% (p=0.000 n=23+24)
GoParse-4 1.23MB/s ± 3% 1.28MB/s ± 4% +4.23% (p=0.000 n=30+24)
RegexpMatchEasy0_32-4 25.2MB/s ± 2% 25.0MB/s ± 1% -0.76% (p=0.000 n=26+28)
RegexpMatchEasy0_1K-4 127MB/s ± 8% 131MB/s ± 9% +3.29% (p=0.012 n=26+26)
RegexpMatchEasy1_32-4 24.8MB/s ± 5% 24.8MB/s ± 2% ~ (p=0.339 n=26+29)
RegexpMatchEasy1_1K-4 97.9MB/s ± 4% 99.8MB/s ± 5% +1.98% (p=0.004 n=26+26)
RegexpMatchMedium_32-4 514kB/s ± 3% 515kB/s ± 3% ~ (p=0.391 n=28+28)
RegexpMatchMedium_1K-4 2.04MB/s ± 2% 2.04MB/s ± 2% ~ (p=0.517 n=25+28)
RegexpMatchHard_32-4 1.20MB/s ± 3% 1.20MB/s ± 3% ~ (p=0.203 n=28+28)
RegexpMatchHard_1K-4 1.28MB/s ± 3% 1.28MB/s ± 2% ~ (p=0.499 n=24+26)
Revcomp-4 34.6MB/s ± 4% 34.7MB/s ± 3% ~ (p=0.245 n=26+26)
Template-4 1.81MB/s ± 2% 1.85MB/s ± 3% +2.30% (p=0.000 n=26+25)
[Geo mean] 6.82MB/s 6.88MB/s +0.84%
fixes #20653
Change-Id: Ief0d6e726e517e51ae511325b21ee72598e759ff
Reviewed-on: https://go-review.googlesource.com/71992
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
CMN/TST/TEQ were supported since ARMv4, which can be used to
simplify comparisons.
This patch implements the optimization and here are the benchmark
results.
1. A special test case got 18.21% improvement.
name old time/op new time/op delta
TSTTEQ-4 806µs ± 1% 659µs ± 0% -18.21% (p=0.000 n=20+18)
(https://github.com/benshi001/ugo1/blob/master/tstteq_test.go)
2. There is no regression in the compilecmp benchmark.
name old time/op new time/op delta
Template 2.31s ± 1% 2.30s ± 1% ~ (p=0.661 n=10+9)
Unicode 1.32s ± 3% 1.32s ± 5% ~ (p=0.280 n=10+10)
GoTypes 7.69s ± 1% 7.65s ± 0% -0.52% (p=0.027 n=10+8)
Compiler 36.5s ± 1% 36.4s ± 1% ~ (p=0.546 n=9+9)
SSA 85.1s ± 2% 84.9s ± 1% ~ (p=0.529 n=10+10)
Flate 1.43s ± 2% 1.43s ± 2% ~ (p=0.661 n=10+9)
GoParser 1.81s ± 2% 1.81s ± 1% ~ (p=0.796 n=10+10)
Reflect 5.10s ± 2% 5.09s ± 1% ~ (p=0.853 n=10+10)
Tar 2.47s ± 1% 2.48s ± 1% ~ (p=0.123 n=10+10)
XML 2.59s ± 1% 2.58s ± 1% ~ (p=0.853 n=10+10)
[Geo mean] 4.78s 4.77s -0.17%
name old user-time/op new user-time/op delta
Template 2.72s ± 3% 2.73s ± 2% ~ (p=0.928 n=10+10)
Unicode 1.58s ± 4% 1.60s ± 1% ~ (p=0.087 n=10+9)
GoTypes 9.41s ± 2% 9.36s ± 1% ~ (p=0.060 n=10+10)
Compiler 44.4s ± 2% 44.2s ± 2% ~ (p=0.289 n=10+10)
SSA 110s ± 2% 110s ± 1% ~ (p=0.739 n=10+10)
Flate 1.67s ± 2% 1.63s ± 3% ~ (p=0.063 n=10+10)
GoParser 2.12s ± 1% 2.12s ± 2% ~ (p=0.840 n=10+10)
Reflect 5.94s ± 1% 5.98s ± 1% ~ (p=0.063 n=9+10)
Tar 3.01s ± 2% 3.02s ± 2% ~ (p=0.584 n=10+10)
XML 3.04s ± 3% 3.02s ± 2% ~ (p=0.696 n=10+10)
[Geo mean] 5.73s 5.72s -0.20%
name old text-bytes new text-bytes delta
HelloSize 579kB ± 0% 579kB ± 0% ~ (all equal)
name old data-bytes new data-bytes delta
HelloSize 5.46kB ± 0% 5.46kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 72.8kB ± 0% 72.8kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.03MB ± 0% 1.03MB ± 0% ~ (all equal)
3. There is little change in the go1 benchmark (excluding the noise).
name old time/op new time/op delta
BinaryTree17-4 40.3s ± 1% 40.6s ± 1% +0.80% (p=0.000 n=30+30)
Fannkuch11-4 24.2s ± 1% 24.1s ± 0% ~ (p=0.093 n=30+30)
FmtFprintfEmpty-4 834ns ± 0% 826ns ± 0% -0.93% (p=0.000 n=29+24)
FmtFprintfString-4 1.39µs ± 1% 1.36µs ± 0% -2.02% (p=0.000 n=30+30)
FmtFprintfInt-4 1.43µs ± 1% 1.44µs ± 1% ~ (p=0.155 n=30+29)
FmtFprintfIntInt-4 2.09µs ± 0% 2.11µs ± 0% +1.16% (p=0.000 n=28+30)
FmtFprintfPrefixedInt-4 2.33µs ± 1% 2.36µs ± 0% +1.25% (p=0.000 n=30+30)
FmtFprintfFloat-4 4.27µs ± 1% 4.32µs ± 1% +1.27% (p=0.000 n=30+30)
FmtManyArgs-4 8.18µs ± 0% 8.14µs ± 0% -0.46% (p=0.000 n=25+27)
GobDecode-4 101ms ± 1% 101ms ± 1% ~ (p=0.182 n=29+29)
GobEncode-4 89.6ms ± 1% 87.8ms ± 2% -2.02% (p=0.000 n=30+29)
Gzip-4 4.07s ± 1% 4.08s ± 1% ~ (p=0.173 n=30+27)
Gunzip-4 602ms ± 1% 600ms ± 1% -0.29% (p=0.000 n=29+28)
HTTPClientServer-4 679µs ± 4% 683µs ± 3% ~ (p=0.197 n=30+30)
JSONEncode-4 241ms ± 1% 239ms ± 1% -0.84% (p=0.000 n=30+30)
JSONDecode-4 903ms ± 1% 882ms ± 1% -2.33% (p=0.000 n=30+30)
Mandelbrot200-4 41.8ms ± 0% 41.8ms ± 0% ~ (p=0.719 n=30+30)
GoParse-4 45.5ms ± 1% 45.8ms ± 1% +0.52% (p=0.000 n=30+30)
RegexpMatchEasy0_32-4 1.27µs ± 1% 1.27µs ± 0% -0.60% (p=0.000 n=30+30)
RegexpMatchEasy0_1K-4 7.77µs ± 6% 7.69µs ± 4% -0.96% (p=0.040 n=30+30)
RegexpMatchEasy1_32-4 1.29µs ± 1% 1.28µs ± 1% -0.54% (p=0.000 n=30+30)
RegexpMatchEasy1_1K-4 10.3µs ± 6% 10.2µs ± 3% ~ (p=0.453 n=30+27)
RegexpMatchMedium_32-4 1.98µs ± 1% 2.00µs ± 1% +0.85% (p=0.000 n=30+29)
RegexpMatchMedium_1K-4 503µs ± 0% 503µs ± 1% ~ (p=0.752 n=30+30)
RegexpMatchHard_32-4 27.1µs ± 1% 26.5µs ± 0% -1.96% (p=0.000 n=30+24)
RegexpMatchHard_1K-4 809µs ± 1% 799µs ± 1% -1.29% (p=0.000 n=29+30)
Revcomp-4 67.3ms ± 2% 67.2ms ± 1% ~ (p=0.265 n=29+29)
Template-4 1.08s ± 1% 1.07s ± 0% -1.39% (p=0.000 n=30+22)
TimeParse-4 6.93µs ± 1% 6.96µs ± 1% +0.40% (p=0.005 n=30+30)
TimeFormat-4 13.3µs ± 0% 13.3µs ± 1% ~ (p=0.734 n=30+30)
[Geo mean] 709µs 707µs -0.32%
name old speed new speed delta
GobDecode-4 7.59MB/s ± 1% 7.57MB/s ± 1% ~ (p=0.145 n=29+29)
GobEncode-4 8.56MB/s ± 1% 8.74MB/s ± 1% +2.07% (p=0.000 n=30+29)
Gzip-4 4.76MB/s ± 1% 4.75MB/s ± 1% -0.25% (p=0.037 n=30+30)
Gunzip-4 32.2MB/s ± 1% 32.3MB/s ± 1% +0.29% (p=0.000 n=29+28)
JSONEncode-4 8.04MB/s ± 1% 8.11MB/s ± 1% +0.85% (p=0.000 n=30+30)
JSONDecode-4 2.15MB/s ± 1% 2.20MB/s ± 1% +2.29% (p=0.000 n=30+30)
GoParse-4 1.27MB/s ± 1% 1.26MB/s ± 1% -0.73% (p=0.000 n=30+30)
RegexpMatchEasy0_32-4 25.1MB/s ± 1% 25.3MB/s ± 0% +0.61% (p=0.000 n=30+30)
RegexpMatchEasy0_1K-4 131MB/s ± 6% 133MB/s ± 4% +1.35% (p=0.009 n=28+30)
RegexpMatchEasy1_32-4 24.9MB/s ± 1% 25.0MB/s ± 1% +0.54% (p=0.000 n=30+30)
RegexpMatchEasy1_1K-4 99.2MB/s ± 6% 100.2MB/s ± 3% ~ (p=0.448 n=30+27)
RegexpMatchMedium_32-4 503kB/s ± 1% 500kB/s ± 0% -0.66% (p=0.002 n=30+24)
RegexpMatchMedium_1K-4 2.04MB/s ± 0% 2.04MB/s ± 1% ~ (p=0.358 n=30+30)
RegexpMatchHard_32-4 1.18MB/s ± 1% 1.20MB/s ± 1% +1.75% (p=0.000 n=30+30)
RegexpMatchHard_1K-4 1.26MB/s ± 1% 1.28MB/s ± 1% +1.42% (p=0.000 n=30+30)
Revcomp-4 37.8MB/s ± 2% 37.8MB/s ± 1% ~ (p=0.266 n=29+29)
Template-4 1.80MB/s ± 1% 1.82MB/s ± 1% +1.46% (p=0.000 n=30+30)
[Geo mean] 6.91MB/s 6.96MB/s +0.70%
fixes #21583
Change-Id: I24065a80588ccae7de3ad732a3cfb0026cf7e214
Reviewed-on: https://go-review.googlesource.com/67490
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Add a compiler intrinsic for getcallersp. So we are able to get
rid of the argument (not done in this CL).
Change-Id: Ic38fda1c694f918328659ab44654198fb116668d
Reviewed-on: https://go-review.googlesource.com/69350
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: David Chase <drchase@google.com>
|
|
BFX&BFXU were introduced in ARMv6T2. A single BFX or BFXU is
more efficiently than a pair of left-shift/right-shift in bit
field extraction.
This patch implements this optimization. And the benchmark tests
show big improvement in special cases and little change in total.
1. There is big improvement in a special test case.
name old time/op new time/op delta
BFX-4 665µs ± 1% 595µs ± 0% -10.61% (p=0.000 n=20+20)
(The test case: https://github.com/benshi001/ugo1/blob/master/bfx_test.go)
2. The compilecmp benchmark shows no regression.
name old time/op new time/op delta
Template 2.33s ± 2% 2.34s ± 2% ~ (p=0.356 n=9+10)
Unicode 1.32s ± 2% 1.30s ± 2% ~ (p=0.139 n=9+8)
GoTypes 7.77s ± 1% 7.76s ± 1% ~ (p=0.780 n=10+9)
Compiler 37.3s ± 1% 37.1s ± 1% ~ (p=0.211 n=10+9)
SSA 84.3s ± 2% 84.3s ± 2% ~ (p=0.842 n=10+9)
Flate 1.45s ± 1% 1.45s ± 3% ~ (p=0.853 n=10+10)
GoParser 1.83s ± 2% 1.83s ± 2% ~ (p=0.739 n=10+10)
Reflect 5.08s ± 2% 5.09s ± 2% ~ (p=0.720 n=9+10)
Tar 2.44s ± 1% 2.44s ± 2% ~ (p=0.684 n=10+10)
XML 2.62s ± 2% 2.62s ± 2% ~ (p=0.529 n=10+10)
[Geo mean] 4.80s 4.79s -0.06%
name old user-time/op new user-time/op delta
Template 2.76s ± 2% 2.75s ± 3% ~ (p=0.893 n=10+10)
Unicode 1.63s ± 1% 1.60s ± 1% -2.07% (p=0.000 n=8+9)
GoTypes 9.54s ± 1% 9.52s ± 1% ~ (p=0.215 n=10+10)
Compiler 46.0s ± 1% 46.0s ± 1% ~ (p=0.853 n=10+10)
SSA 110s ± 1% 110s ± 1% ~ (p=0.838 n=10+10)
Flate 1.69s ± 3% 1.69s ± 5% ~ (p=0.957 n=10+10)
GoParser 2.15s ± 2% 2.15s ± 2% ~ (p=0.749 n=10+10)
Reflect 6.03s ± 1% 5.99s ± 2% ~ (p=0.060 n=9+10)
Tar 3.02s ± 2% 2.99s ± 2% ~ (p=0.214 n=10+10)
XML 3.10s ± 2% 3.08s ± 2% ~ (p=0.732 n=9+10)
[Geo mean] 5.82s 5.79s -0.41%
name old text-bytes new text-bytes delta
HelloSize 589kB ± 0% 589kB ± 0% ~ (all equal)
name old data-bytes new data-bytes delta
HelloSize 5.46kB ± 0% 5.46kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 76.9kB ± 0% 76.9kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.03MB ± 0% 1.03MB ± 0% ~ (all equal)
3. The go1 benchmark shows little change in total. (excluding noise)
name old time/op new time/op delta
BinaryTree17-4 41.5s ± 1% 41.6s ± 1% ~ (p=0.373 n=30+26)
Fannkuch11-4 23.6s ± 1% 23.6s ± 1% +0.28% (p=0.003 n=29+30)
FmtFprintfEmpty-4 826ns ± 1% 827ns ± 1% ~ (p=0.155 n=30+30)
FmtFprintfString-4 1.35µs ± 1% 1.35µs ± 1% ~ (p=0.499 n=30+30)
FmtFprintfInt-4 1.43µs ± 1% 1.41µs ± 1% -1.19% (p=0.000 n=30+30)
FmtFprintfIntInt-4 2.15µs ± 1% 2.11µs ± 1% -1.78% (p=0.000 n=30+30)
FmtFprintfPrefixedInt-4 2.21µs ± 1% 2.21µs ± 1% ~ (p=0.881 n=30+30)
FmtFprintfFloat-4 4.41µs ± 1% 4.44µs ± 0% +0.64% (p=0.000 n=30+30)
FmtManyArgs-4 8.06µs ± 1% 8.06µs ± 0% ~ (p=0.871 n=30+30)
GobDecode-4 103ms ± 1% 104ms ± 2% +0.54% (p=0.013 n=28+29)
GobEncode-4 92.4ms ± 1% 92.6ms ± 1% ~ (p=0.447 n=30+29)
Gzip-4 4.17s ± 1% 4.06s ± 1% -2.56% (p=0.000 n=29+30)
Gunzip-4 603ms ± 1% 602ms ± 1% ~ (p=0.423 n=30+30)
HTTPClientServer-4 688µs ± 2% 674µs ± 3% -2.09% (p=0.000 n=29+30)
JSONEncode-4 237ms ± 1% 237ms ± 1% ~ (p=0.061 n=29+30)
JSONDecode-4 907ms ± 1% 910ms ± 1% ~ (p=0.061 n=30+30)
Mandelbrot200-4 41.7ms ± 0% 41.7ms ± 0% +0.19% (p=0.000 n=24+20)
GoParse-4 45.7ms ± 2% 45.5ms ± 2% -0.29% (p=0.005 n=30+30)
RegexpMatchEasy0_32-4 1.27µs ± 0% 1.27µs ± 0% +0.12% (p=0.031 n=30+30)
RegexpMatchEasy0_1K-4 7.77µs ± 4% 7.73µs ± 3% ~ (p=0.169 n=30+30)
RegexpMatchEasy1_32-4 1.29µs ± 1% 1.29µs ± 1% ~ (p=0.126 n=30+30)
RegexpMatchEasy1_1K-4 10.4µs ± 3% 10.3µs ± 2% -1.32% (p=0.004 n=30+29)
RegexpMatchMedium_32-4 2.06µs ± 0% 2.06µs ± 0% ~ (p=0.071 n=30+30)
RegexpMatchMedium_1K-4 531µs ± 1% 530µs ± 0% ~ (p=0.121 n=30+23)
RegexpMatchHard_32-4 28.7µs ± 1% 28.6µs ± 1% -0.21% (p=0.001 n=30+27)
RegexpMatchHard_1K-4 860µs ± 1% 857µs ± 1% ~ (p=0.105 n=30+27)
Revcomp-4 67.3ms ± 2% 67.3ms ± 2% ~ (p=0.805 n=29+29)
Template-4 1.08s ± 1% 1.08s ± 1% ~ (p=0.260 n=30+30)
TimeParse-4 7.04µs ± 0% 7.04µs ± 0% ~ (p=0.315 n=30+30)
TimeFormat-4 13.2µs ± 1% 13.2µs ± 1% ~ (p=0.077 n=30+30)
[Geo mean] 715µs 713µs -0.30%
name old speed new speed delta
GobDecode-4 7.42MB/s ± 1% 7.38MB/s ± 2% -0.54% (p=0.011 n=28+29)
GobEncode-4 8.30MB/s ± 1% 8.29MB/s ± 1% ~ (p=0.484 n=30+29)
Gzip-4 4.65MB/s ± 2% 4.78MB/s ± 1% +2.73% (p=0.000 n=30+30)
Gunzip-4 32.2MB/s ± 1% 32.2MB/s ± 1% ~ (p=0.357 n=30+30)
JSONEncode-4 8.18MB/s ± 1% 8.19MB/s ± 1% ~ (p=0.052 n=29+30)
JSONDecode-4 2.14MB/s ± 1% 2.13MB/s ± 1% ~ (p=0.074 n=30+29)
GoParse-4 1.27MB/s ± 1% 1.27MB/s ± 2% ~ (p=0.618 n=24+30)
RegexpMatchEasy0_32-4 25.2MB/s ± 0% 25.2MB/s ± 0% -0.12% (p=0.031 n=30+30)
RegexpMatchEasy0_1K-4 132MB/s ± 5% 132MB/s ± 2% ~ (p=0.171 n=30+30)
RegexpMatchEasy1_32-4 24.8MB/s ± 1% 24.9MB/s ± 1% ~ (p=0.106 n=30+30)
RegexpMatchEasy1_1K-4 98.4MB/s ± 3% 99.6MB/s ± 4% +1.19% (p=0.011 n=30+30)
RegexpMatchMedium_32-4 483kB/s ± 1% 484kB/s ± 1% ~ (p=0.426 n=30+30)
RegexpMatchMedium_1K-4 1.93MB/s ± 1% 1.93MB/s ± 0% ~ (p=0.157 n=30+17)
RegexpMatchHard_32-4 1.12MB/s ± 1% 1.12MB/s ± 0% +0.33% (p=0.001 n=30+24)
RegexpMatchHard_1K-4 1.19MB/s ± 1% 1.19MB/s ± 1% ~ (p=0.290 n=30+30)
Revcomp-4 37.8MB/s ± 2% 37.8MB/s ± 1% ~ (p=0.815 n=29+29)
Template-4 1.80MB/s ± 1% 1.80MB/s ± 1% ~ (p=0.586 n=30+30)
[Geo mean] 6.80MB/s 6.81MB/s +0.25%
fixes #20966
Change-Id: Idb5567bbe988c875315b8c98c128957cd474ccc5
Reviewed-on: https://go-review.googlesource.com/64950
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
|
|
We used to have {Arg,Auto,Extern}Symbol structs with which we wrapped
a *gc.Node or *obj.LSym before storing them in the Aux field
of an ssa.Value. This let the SSA part of the compiler distinguish
between autos and args, for example. We no longer need the wrappers
as we can query the underlying objects directly.
There was also some sloppy usage, where VarDef had a *gc.Node
directly in its Aux field, whereas the use of that variable had
that *gc.Node wrapped in an AutoSymbol. Thus the Aux fields didn't
match (using ==) when they probably should.
This sloppy usage cleanup is the only thing in the CL that changes the
generated code - we can get rid of some more unused auto variables if
the matching happens reliably.
Removing this wrapper also lets us get rid of the varsyms cache
(which was used to prevent wrapping the same *gc.Node twice).
Change-Id: I0dedf8f82f84bfee413d310342b777316bd1d478
Reviewed-on: https://go-review.googlesource.com/64452
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
|
|
The go compiler can generate better ARM code with those more
efficient FP instructions. And there is little improvement
in total but big improvement in special cases.
1. The size of pkg/linux_arm/math.a shrinks by 2.4%.
2. there is neither improvement nor regression in compilecmp benchmark.
name old time/op new time/op delta
Template 2.32s ± 2% 2.32s ± 1% ~ (p=1.000 n=9+10)
Unicode 1.32s ± 4% 1.32s ± 4% ~ (p=0.912 n=10+10)
GoTypes 7.76s ± 1% 7.79s ± 1% ~ (p=0.447 n=9+10)
Compiler 37.4s ± 2% 37.2s ± 2% ~ (p=0.218 n=10+10)
SSA 84.8s ± 2% 85.0s ± 1% ~ (p=0.604 n=10+9)
Flate 1.45s ± 2% 1.44s ± 2% ~ (p=0.075 n=10+10)
GoParser 1.82s ± 1% 1.81s ± 1% ~ (p=0.190 n=10+10)
Reflect 5.06s ± 1% 5.05s ± 1% ~ (p=0.315 n=10+9)
Tar 2.37s ± 1% 2.37s ± 2% ~ (p=0.912 n=10+10)
XML 2.56s ± 1% 2.58s ± 2% ~ (p=0.089 n=10+10)
[Geo mean] 4.77s 4.77s -0.08%
name old user-time/op new user-time/op delta
Template 2.74s ± 2% 2.75s ± 2% ~ (p=0.856 n=9+10)
Unicode 1.61s ± 4% 1.62s ± 3% ~ (p=0.693 n=10+10)
GoTypes 9.55s ± 1% 9.49s ± 2% ~ (p=0.056 n=9+10)
Compiler 45.9s ± 1% 45.8s ± 1% ~ (p=0.345 n=9+10)
SSA 110s ± 1% 110s ± 1% ~ (p=0.763 n=9+10)
Flate 1.68s ± 2% 1.68s ± 3% ~ (p=0.616 n=10+10)
GoParser 2.14s ± 4% 2.14s ± 1% ~ (p=0.825 n=10+9)
Reflect 5.95s ± 1% 5.97s ± 3% ~ (p=0.951 n=9+10)
Tar 2.94s ± 3% 2.93s ± 2% ~ (p=0.359 n=10+10)
XML 3.03s ± 3% 3.07s ± 6% ~ (p=0.166 n=10+10)
[Geo mean] 5.76s 5.77s +0.12%
name old text-bytes new text-bytes delta
HelloSize 588kB ± 0% 588kB ± 0% ~ (all equal)
name old data-bytes new data-bytes delta
HelloSize 5.46kB ± 0% 5.46kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 72.9kB ± 0% 72.9kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.03MB ± 0% 1.03MB ± 0% ~ (all equal)
3. The performance of Mandelbrot200 improves 15%, though little
improvement in total.
name old time/op new time/op delta
BinaryTree17-4 41.7s ± 1% 41.7s ± 1% ~ (p=0.264 n=29+23)
Fannkuch11-4 24.2s ± 0% 24.1s ± 1% -0.13% (p=0.050 n=30+30)
FmtFprintfEmpty-4 826ns ± 1% 824ns ± 1% -0.24% (p=0.038 n=25+30)
FmtFprintfString-4 1.38µs ± 1% 1.38µs ± 0% -0.42% (p=0.000 n=27+25)
FmtFprintfInt-4 1.46µs ± 1% 1.46µs ± 0% ~ (p=0.060 n=30+23)
FmtFprintfIntInt-4 2.11µs ± 1% 2.08µs ± 0% -1.04% (p=0.000 n=30+30)
FmtFprintfPrefixedInt-4 2.23µs ± 1% 2.22µs ± 1% -0.51% (p=0.000 n=30+30)
FmtFprintfFloat-4 4.49µs ± 1% 4.48µs ± 1% -0.22% (p=0.004 n=26+30)
FmtManyArgs-4 8.06µs ± 1% 8.12µs ± 1% +0.68% (p=0.000 n=25+30)
GobDecode-4 104ms ± 1% 104ms ± 2% ~ (p=0.362 n=29+29)
GobEncode-4 92.9ms ± 1% 92.8ms ± 2% ~ (p=0.786 n=30+30)
Gzip-4 4.12s ± 1% 4.12s ± 1% ~ (p=0.314 n=30+30)
Gunzip-4 602ms ± 1% 603ms ± 1% ~ (p=0.164 n=30+30)
HTTPClientServer-4 659µs ± 1% 655µs ± 2% -0.64% (p=0.006 n=25+28)
JSONEncode-4 234ms ± 1% 235ms ± 1% +0.29% (p=0.050 n=30+30)
JSONDecode-4 912ms ± 0% 911ms ± 0% ~ (p=0.385 n=18+24)
Mandelbrot200-4 49.2ms ± 0% 41.7ms ± 0% -15.35% (p=0.000 n=25+27)
GoParse-4 46.3ms ± 1% 46.3ms ± 2% ~ (p=0.572 n=30+30)
RegexpMatchEasy0_32-4 1.29µs ± 1% 1.27µs ± 0% -1.59% (p=0.000 n=30+30)
RegexpMatchEasy0_1K-4 7.62µs ± 4% 7.71µs ± 3% ~ (p=0.074 n=30+30)
RegexpMatchEasy1_32-4 1.31µs ± 0% 1.30µs ± 1% -0.71% (p=0.000 n=23+30)
RegexpMatchEasy1_1K-4 10.3µs ± 3% 10.3µs ± 5% ~ (p=0.105 n=30+30)
RegexpMatchMedium_32-4 2.06µs ± 1% 2.06µs ± 1% ~ (p=0.100 n=30+30)
RegexpMatchMedium_1K-4 533µs ± 1% 534µs ± 1% ~ (p=0.254 n=29+30)
RegexpMatchHard_32-4 28.9µs ± 0% 28.9µs ± 0% ~ (p=0.154 n=30+30)
RegexpMatchHard_1K-4 868µs ± 1% 867µs ± 0% ~ (p=0.729 n=30+23)
Revcomp-4 66.9ms ± 1% 67.2ms ± 2% ~ (p=0.102 n=28+29)
Template-4 1.07s ± 1% 1.06s ± 1% -0.53% (p=0.000 n=30+30)
TimeParse-4 7.07µs ± 1% 7.01µs ± 0% -0.85% (p=0.000 n=30+25)
TimeFormat-4 13.1µs ± 0% 13.2µs ± 1% +0.77% (p=0.000 n=27+27)
[Geo mean] 721µs 716µs -0.70%
name old speed new speed delta
GobDecode-4 7.38MB/s ± 1% 7.37MB/s ± 2% ~ (p=0.399 n=29+29)
GobEncode-4 8.26MB/s ± 1% 8.27MB/s ± 2% ~ (p=0.790 n=30+30)
Gzip-4 4.71MB/s ± 1% 4.71MB/s ± 1% ~ (p=0.885 n=30+30)
Gunzip-4 32.2MB/s ± 1% 32.2MB/s ± 1% ~ (p=0.190 n=30+30)
JSONEncode-4 8.28MB/s ± 1% 8.25MB/s ± 1% ~ (p=0.053 n=30+30)
JSONDecode-4 2.13MB/s ± 0% 2.12MB/s ± 1% ~ (p=0.072 n=18+30)
GoParse-4 1.25MB/s ± 1% 1.25MB/s ± 2% ~ (p=0.863 n=30+30)
RegexpMatchEasy0_32-4 24.8MB/s ± 0% 25.2MB/s ± 1% +1.61% (p=0.000 n=30+30)
RegexpMatchEasy0_1K-4 134MB/s ± 4% 133MB/s ± 3% ~ (p=0.074 n=30+30)
RegexpMatchEasy1_32-4 24.5MB/s ± 0% 24.6MB/s ± 1% +0.72% (p=0.000 n=23+30)
RegexpMatchEasy1_1K-4 99.1MB/s ± 3% 99.8MB/s ± 5% ~ (p=0.105 n=30+30)
RegexpMatchMedium_32-4 483kB/s ± 1% 487kB/s ± 1% +0.83% (p=0.002 n=30+30)
RegexpMatchMedium_1K-4 1.92MB/s ± 1% 1.92MB/s ± 1% ~ (p=0.058 n=30+30)
RegexpMatchHard_32-4 1.10MB/s ± 0% 1.11MB/s ± 0% ~ (p=0.804 n=30+30)
RegexpMatchHard_1K-4 1.18MB/s ± 0% 1.18MB/s ± 0% ~ (all equal)
Revcomp-4 38.0MB/s ± 1% 37.8MB/s ± 2% ~ (p=0.098 n=28+29)
Template-4 1.82MB/s ± 1% 1.83MB/s ± 1% +0.55% (p=0.000 n=29+29)
[Geo mean] 6.79MB/s 6.79MB/s +0.09%
Change-Id: Ia91991c2c5c59c5df712de85a83b13a21c0a554b
Reviewed-on: https://go-review.googlesource.com/63770
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
NMULF and NMULD are efficient FP instructions, and the go compiler can
use them to generate better code.
The benchmark tests of my patch did not show general change, but big
improvement in special cases.
1.A special test case improved 12.6%.
https://github.com/benshi001/ugo1/blob/master/fpmul_test.go
name old time/op new time/op delta
FPMul-4 398µs ± 1% 348µs ± 1% -12.64% (p=0.000 n=40+40)
2. the compilecmp test showed little change.
name old time/op new time/op delta
Template 2.30s ± 1% 2.31s ± 1% ~ (p=0.754 n=17+19)
Unicode 1.31s ± 3% 1.32s ± 5% ~ (p=0.265 n=20+20)
GoTypes 7.73s ± 2% 7.73s ± 1% ~ (p=0.925 n=20+20)
Compiler 37.0s ± 1% 37.3s ± 2% +0.79% (p=0.002 n=19+20)
SSA 83.8s ± 4% 83.5s ± 2% ~ (p=0.964 n=20+17)
Flate 1.43s ± 2% 1.44s ± 1% ~ (p=0.602 n=20+20)
GoParser 1.82s ± 2% 1.81s ± 2% ~ (p=0.141 n=19+20)
Reflect 5.08s ± 2% 5.08s ± 3% ~ (p=0.835 n=20+19)
Tar 2.36s ± 1% 2.35s ± 1% ~ (p=0.195 n=18+17)
XML 2.57s ± 2% 2.56s ± 1% ~ (p=0.283 n=20+17)
[Geo mean] 4.74s 4.75s +0.05%
name old user-time/op new user-time/op delta
Template 2.75s ± 2% 2.75s ± 0% ~ (p=0.620 n=20+15)
Unicode 1.59s ± 4% 1.60s ± 4% ~ (p=0.479 n=20+19)
GoTypes 9.48s ± 1% 9.47s ± 1% ~ (p=0.743 n=20+20)
Compiler 45.7s ± 1% 45.7s ± 1% ~ (p=0.482 n=19+20)
SSA 109s ± 1% 109s ± 2% ~ (p=0.800 n=18+20)
Flate 1.67s ± 3% 1.67s ± 3% ~ (p=0.598 n=19+18)
GoParser 2.15s ± 4% 2.13s ± 3% ~ (p=0.153 n=20+20)
Reflect 5.95s ± 2% 5.95s ± 2% ~ (p=0.961 n=19+20)
Tar 2.93s ± 2% 2.92s ± 3% ~ (p=0.242 n=20+19)
XML 3.02s ± 3% 3.04s ± 3% ~ (p=0.233 n=19+18)
[Geo mean] 5.74s 5.74s -0.04%
name old text-bytes new text-bytes delta
HelloSize 588kB ± 0% 588kB ± 0% ~ (all equal)
name old data-bytes new data-bytes delta
HelloSize 5.46kB ± 0% 5.46kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 72.9kB ± 0% 72.9kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.03MB ± 0% 1.03MB ± 0% ~ (all equal)
3. The go1 benchmark showed little change in total.
name old time/op new time/op delta
BinaryTree17-4 41.8s ± 1% 41.8s ± 1% ~ (p=0.388 n=40+39)
Fannkuch11-4 24.1s ± 1% 24.1s ± 1% ~ (p=0.077 n=40+40)
FmtFprintfEmpty-4 834ns ± 1% 831ns ± 1% -0.31% (p=0.002 n=40+37)
FmtFprintfString-4 1.34µs ± 1% 1.34µs ± 0% ~ (p=0.387 n=40+40)
FmtFprintfInt-4 1.44µs ± 1% 1.44µs ± 1% ~ (p=0.421 n=40+40)
FmtFprintfIntInt-4 2.09µs ± 0% 2.09µs ± 1% ~ (p=0.589 n=40+39)
FmtFprintfPrefixedInt-4 2.32µs ± 1% 2.33µs ± 1% +0.15% (p=0.001 n=40+40)
FmtFprintfFloat-4 4.51µs ± 0% 4.44µs ± 1% -1.50% (p=0.000 n=40+40)
FmtManyArgs-4 7.94µs ± 0% 7.97µs ± 0% +0.36% (p=0.001 n=32+40)
GobDecode-4 104ms ± 1% 102ms ± 2% -1.27% (p=0.000 n=39+37)
GobEncode-4 90.5ms ± 1% 90.9ms ± 2% +0.40% (p=0.006 n=37+40)
Gzip-4 4.10s ± 2% 4.08s ± 1% -0.30% (p=0.004 n=40+40)
Gunzip-4 603ms ± 0% 602ms ± 1% ~ (p=0.303 n=37+40)
HTTPClientServer-4 672µs ± 3% 658µs ± 2% -2.08% (p=0.000 n=39+37)
JSONEncode-4 238ms ± 1% 239ms ± 0% +0.26% (p=0.001 n=40+25)
JSONDecode-4 884ms ± 1% 885ms ± 1% +0.16% (p=0.012 n=40+40)
Mandelbrot200-4 49.3ms ± 0% 49.3ms ± 0% ~ (p=0.588 n=40+38)
GoParse-4 46.3ms ± 1% 46.4ms ± 2% ~ (p=0.487 n=40+40)
RegexpMatchEasy0_32-4 1.28µs ± 1% 1.28µs ± 0% +0.12% (p=0.003 n=40+40)
RegexpMatchEasy0_1K-4 7.78µs ± 5% 7.78µs ± 4% ~ (p=0.825 n=40+40)
RegexpMatchEasy1_32-4 1.29µs ± 1% 1.29µs ± 0% ~ (p=0.659 n=40+40)
RegexpMatchEasy1_1K-4 10.3µs ± 3% 10.4µs ± 2% ~ (p=0.266 n=40+40)
RegexpMatchMedium_32-4 2.05µs ± 1% 2.05µs ± 0% -0.18% (p=0.002 n=40+28)
RegexpMatchMedium_1K-4 533µs ± 1% 534µs ± 1% ~ (p=0.397 n=37+40)
RegexpMatchHard_32-4 28.9µs ± 1% 28.9µs ± 1% -0.22% (p=0.002 n=40+40)
RegexpMatchHard_1K-4 868µs ± 1% 870µs ± 1% +0.21% (p=0.015 n=40+40)
Revcomp-4 67.3ms ± 1% 67.2ms ± 2% ~ (p=0.262 n=38+39)
Template-4 1.07s ± 1% 1.07s ± 1% ~ (p=0.276 n=40+40)
TimeParse-4 7.16µs ± 1% 7.16µs ± 1% ~ (p=0.610 n=39+40)
TimeFormat-4 13.3µs ± 1% 13.3µs ± 1% ~ (p=0.617 n=38+40)
[Geo mean] 720µs 719µs -0.13%
name old speed new speed delta
GobDecode-4 7.39MB/s ± 1% 7.49MB/s ± 2% +1.25% (p=0.000 n=39+38)
GobEncode-4 8.48MB/s ± 1% 8.45MB/s ± 2% -0.40% (p=0.005 n=37+40)
Gzip-4 4.74MB/s ± 2% 4.75MB/s ± 1% +0.30% (p=0.018 n=40+40)
Gunzip-4 32.2MB/s ± 0% 32.2MB/s ± 1% ~ (p=0.272 n=36+40)
JSONEncode-4 8.15MB/s ± 1% 8.13MB/s ± 0% -0.26% (p=0.003 n=40+25)
JSONDecode-4 2.19MB/s ± 1% 2.19MB/s ± 1% ~ (p=0.676 n=40+40)
GoParse-4 1.25MB/s ± 2% 1.25MB/s ± 2% ~ (p=0.823 n=40+40)
RegexpMatchEasy0_32-4 25.1MB/s ± 1% 25.1MB/s ± 0% -0.12% (p=0.006 n=40+40)
RegexpMatchEasy0_1K-4 132MB/s ± 5% 132MB/s ± 5% ~ (p=0.821 n=40+40)
RegexpMatchEasy1_32-4 24.7MB/s ± 1% 24.7MB/s ± 0% ~ (p=0.630 n=40+40)
RegexpMatchEasy1_1K-4 99.1MB/s ± 3% 98.8MB/s ± 2% ~ (p=0.268 n=40+40)
RegexpMatchMedium_32-4 487kB/s ± 2% 490kB/s ± 0% +0.51% (p=0.001 n=40+40)
RegexpMatchMedium_1K-4 1.92MB/s ± 1% 1.92MB/s ± 1% ~ (p=0.208 n=39+40)
RegexpMatchHard_32-4 1.11MB/s ± 1% 1.11MB/s ± 0% +0.36% (p=0.000 n=40+33)
RegexpMatchHard_1K-4 1.18MB/s ± 1% 1.18MB/s ± 1% ~ (p=0.207 n=40+37)
Revcomp-4 37.8MB/s ± 1% 37.8MB/s ± 2% ~ (p=0.276 n=38+39)
Template-4 1.82MB/s ± 1% 1.81MB/s ± 1% ~ (p=0.122 n=38+40)
[Geo mean] 6.81MB/s 6.81MB/s +0.06%
fixes #19843
Change-Id: Ief3a0c2b15f59d40c7b40f2784eeb71196685b59
Reviewed-on: https://go-review.googlesource.com/61150
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
MULS was introduced in ARMv7 and corresponding to MULA. This patch
duplicated all MULA related SSA rules with MULS.
Here was the contrast test result against the original go compiler.
There was no improvement in total, but big improvement in special cases.
1. A specific test case accelerated 18.62%.
(https://github.com/benshi001/ugo1/blob/master/mulsub_test.go)
name old time/op new time/op delta
MulSub-4 270µs ± 0% 219µs ± 0% -18.62% (p=0.000 n=35+40)
2. Total size of all .a files in pkg/ shrank by 0.002%.
3. The compilecmp benchmark showed no decline.
name old time/op new time/op delta
Template 2.37s ± 3% 2.36s ± 1% ~ (p=0.233 n=19+18)
Unicode 1.32s ± 2% 1.34s ± 5% +1.32% (p=0.011 n=20+18)
GoTypes 7.88s ± 1% 7.87s ± 1% ~ (p=0.758 n=20+20)
Compiler 37.5s ± 1% 37.6s ± 1% ~ (p=0.194 n=20+19)
SSA 83.7s ± 2% 83.5s ± 2% ~ (p=0.569 n=20+19)
Flate 1.46s ± 3% 1.45s ± 1% ~ (p=0.619 n=20+17)
GoParser 1.87s ± 2% 1.85s ± 1% -0.58% (p=0.048 n=20+18)
Reflect 5.10s ± 2% 5.11s ± 2% ~ (p=0.365 n=19+20)
Tar 1.78s ± 2% 1.78s ± 2% ~ (p=0.531 n=19+20)
XML 2.62s ± 1% 2.61s ± 2% ~ (p=0.057 n=17+19)
[Geo mean] 4.68s 4.67s -0.07%
name old user-time/op new user-time/op delta
Template 2.80s ± 1% 2.79s ± 2% ~ (p=0.686 n=17+20)
Unicode 1.61s ± 4% 1.63s ± 6% ~ (p=0.222 n=20+20)
GoTypes 9.59s ± 1% 9.60s ± 1% ~ (p=0.482 n=17+20)
Compiler 46.1s ± 1% 46.2s ± 1% ~ (p=0.373 n=20+18)
SSA 108s ± 1% 108s ± 2% ~ (p=0.784 n=20+20)
Flate 1.68s ± 3% 1.69s ± 3% ~ (p=0.335 n=20+19)
GoParser 2.20s ± 4% 2.19s ± 2% ~ (p=0.844 n=20+18)
Reflect 5.97s ± 3% 6.01s ± 2% ~ (p=0.184 n=20+20)
Tar 2.11s ± 2% 2.11s ± 4% ~ (p=0.961 n=19+20)
XML 3.07s ± 1% 3.07s ± 3% ~ (p=0.786 n=16+19)
[Geo mean] 5.61s 5.62s +0.19%
name old text-bytes new text-bytes delta
HelloSize 586kB ± 0% 586kB ± 0% ~ (all equal)
name old data-bytes new data-bytes delta
HelloSize 5.46kB ± 0% 5.46kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 72.9kB ± 0% 72.9kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.03MB ± 0% 1.03MB ± 0% ~ (all equal)
4. The go1 benchmark showed no decline in total.
name old time/op new time/op delta
BinaryTree17-4 41.7s ± 1% 41.7s ± 1% ~ (p=0.966 n=40+40)
Fannkuch11-4 23.6s ± 0% 23.6s ± 1% -0.23% (p=0.000 n=40+40)
FmtFprintfEmpty-4 844ns ± 1% 834ns ± 1% -1.23% (p=0.000 n=40+40)
FmtFprintfString-4 1.39µs ± 1% 1.40µs ± 1% +0.71% (p=0.000 n=40+40)
FmtFprintfInt-4 1.44µs ± 1% 1.45µs ± 1% +0.70% (p=0.000 n=40+40)
FmtFprintfIntInt-4 2.10µs ± 1% 2.10µs ± 1% +0.30% (p=0.000 n=40+40)
FmtFprintfPrefixedInt-4 2.49µs ± 0% 2.50µs ± 1% +0.66% (p=0.000 n=32+40)
FmtFprintfFloat-4 4.42µs ± 1% 4.46µs ± 2% +0.94% (p=0.000 n=40+40)
FmtManyArgs-4 8.31µs ± 1% 8.22µs ± 1% -1.09% (p=0.000 n=40+40)
GobDecode-4 105ms ± 1% 102ms ± 1% -2.30% (p=0.000 n=39+39)
GobEncode-4 90.2ms ± 1% 88.7ms ± 1% -1.66% (p=0.000 n=40+39)
Gzip-4 4.17s ± 1% 4.16s ± 1% ~ (p=0.785 n=40+40)
Gunzip-4 608ms ± 1% 608ms ± 1% ~ (p=0.481 n=40+40)
HTTPClientServer-4 697µs ± 2% 684µs ± 3% -1.89% (p=0.000 n=37+40)
JSONEncode-4 255ms ± 1% 256ms ± 1% +0.35% (p=0.000 n=40+40)
JSONDecode-4 920ms ± 1% 926ms ± 1% +0.64% (p=0.000 n=40+39)
Mandelbrot200-4 49.3ms ± 1% 49.3ms ± 0% +0.07% (p=0.005 n=40+40)
GoParse-4 46.8ms ± 2% 46.7ms ± 1% ~ (p=1.000 n=40+40)
RegexpMatchEasy0_32-4 1.27µs ± 0% 1.27µs ± 1% ~ (p=0.057 n=40+40)
RegexpMatchEasy0_1K-4 7.97µs ± 7% 7.92µs ± 5% ~ (p=0.094 n=40+40)
RegexpMatchEasy1_32-4 1.28µs ± 1% 1.28µs ± 1% ~ (p=0.406 n=40+40)
RegexpMatchEasy1_1K-4 10.5µs ± 4% 10.5µs ± 3% ~ (p=0.855 n=40+40)
RegexpMatchMedium_32-4 2.04µs ± 0% 2.04µs ± 1% -0.22% (p=0.000 n=39+40)
RegexpMatchMedium_1K-4 541µs ± 0% 540µs ± 1% -0.25% (p=0.000 n=40+38)
RegexpMatchHard_32-4 29.3µs ± 1% 29.3µs ± 0% ~ (p=0.149 n=40+40)
RegexpMatchHard_1K-4 878µs ± 1% 880µs ± 0% +0.14% (p=0.005 n=36+35)
Revcomp-4 81.8ms ± 2% 81.4ms ± 2% -0.43% (p=0.015 n=38+39)
Template-4 1.05s ± 1% 1.05s ± 1% ~ (p=0.302 n=40+35)
TimeParse-4 7.18µs ± 1% 7.26µs ± 1% +1.05% (p=0.000 n=40+36)
TimeFormat-4 13.1µs ± 1% 13.1µs ± 1% ~ (p=0.698 n=37+40)
[Geo mean] 733µs 732µs -0.16%
name old speed new speed delta
GobDecode-4 7.34MB/s ± 1% 7.51MB/s ± 1% +2.36% (p=0.000 n=39+39)
GobEncode-4 8.51MB/s ± 1% 8.65MB/s ± 1% +1.69% (p=0.000 n=40+39)
Gzip-4 4.66MB/s ± 1% 4.66MB/s ± 1% ~ (p=0.783 n=40+40)
Gunzip-4 31.9MB/s ± 1% 31.9MB/s ± 1% ~ (p=0.466 n=40+40)
JSONEncode-4 7.61MB/s ± 1% 7.58MB/s ± 1% -0.35% (p=0.001 n=40+40)
JSONDecode-4 2.11MB/s ± 1% 2.10MB/s ± 1% -0.52% (p=0.000 n=38+39)
GoParse-4 1.24MB/s ± 2% 1.24MB/s ± 1% ~ (p=0.556 n=40+39)
RegexpMatchEasy0_32-4 25.1MB/s ± 0% 25.1MB/s ± 1% ~ (p=0.064 n=40+40)
RegexpMatchEasy0_1K-4 129MB/s ± 8% 129MB/s ± 5% ~ (p=0.094 n=40+40)
RegexpMatchEasy1_32-4 25.0MB/s ± 1% 25.1MB/s ± 1% ~ (p=0.331 n=40+40)
RegexpMatchEasy1_1K-4 97.7MB/s ± 4% 97.8MB/s ± 3% ~ (p=0.851 n=40+40)
RegexpMatchMedium_32-4 490kB/s ± 0% 490kB/s ± 0% ~ (all equal)
RegexpMatchMedium_1K-4 1.89MB/s ± 0% 1.90MB/s ± 1% +0.12% (p=0.031 n=40+40)
RegexpMatchHard_32-4 1.09MB/s ± 1% 1.09MB/s ± 1% ~ (p=0.597 n=40+40)
RegexpMatchHard_1K-4 1.16MB/s ± 1% 1.16MB/s ± 1% ~ (p=0.565 n=40+35)
Revcomp-4 31.1MB/s ± 2% 31.2MB/s ± 2% +0.44% (p=0.018 n=38+39)
Template-4 1.85MB/s ± 1% 1.85MB/s ± 1% ~ (p=0.873 n=40+40)
[Geo mean] 6.66MB/s 6.67MB/s +0.26%
Change-Id: Icc972d8a78ea06c32c3aa15733ff0537c82c2dc7
Reviewed-on: https://go-review.googlesource.com/58950
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
|
|
Like the indexed MOVW (MOVWloadidx/MOVWstoreidx) used in current
ARM backend, the indexed MOVB/MOVBU/MOVH/MOVHU can also be used to
generate further optimized ARM code.
My patch implements this optimization. Here are some contrast test
results against the original go compiler.
1. The total size of all .a files in pkg/ shrinks by 0.03%.
2. The compilecmp benchmark shows a little decline.
name old time/op new time/op delta
Template 2.35s ± 1% 2.37s ± 3% +0.94% (p=0.006 n=19+19)
Unicode 1.33s ± 3% 1.33s ± 2% ~ (p=0.158 n=20+18)
GoTypes 7.86s ± 2% 7.84s ± 1% ~ (p=0.284 n=19+18)
Compiler 37.5s ± 1% 37.7s ± 2% ~ (p=0.101 n=20+19)
SSA 83.4s ± 2% 83.6s ± 2% ~ (p=0.231 n=20+20)
Flate 1.46s ± 2% 1.45s ± 1% ~ (p=0.097 n=20+17)
GoParser 1.86s ± 2% 1.86s ± 4% ~ (p=0.738 n=20+20)
Reflect 5.10s ± 1% 5.11s ± 1% ~ (p=0.290 n=20+18)
Tar 1.78s ± 2% 1.77s ± 2% ~ (p=0.166 n=19+20)
XML 2.61s ± 2% 2.61s ± 2% ~ (p=0.665 n=19+19)
[Geo mean] 4.67s 4.68s +0.16%
name old user-time/op new user-time/op delta
Template 2.79s ± 3% 2.80s ± 2% ~ (p=0.662 n=20+20)
Unicode 1.62s ± 3% 1.64s ± 4% ~ (p=0.252 n=20+20)
GoTypes 9.58s ± 2% 9.62s ± 2% ~ (p=0.250 n=20+20)
Compiler 46.2s ± 1% 46.2s ± 1% ~ (p=0.602 n=20+19)
SSA 108s ± 1% 108s ± 2% ~ (p=0.242 n=18+20)
Flate 1.69s ± 3% 1.69s ± 4% ~ (p=0.470 n=20+20)
GoParser 2.16s ± 3% 2.20s ± 4% +1.70% (p=0.005 n=19+20)
Reflect 6.02s ± 2% 6.02s ± 2% ~ (p=0.700 n=20+17)
Tar 2.11s ± 2% 2.11s ± 3% ~ (p=0.480 n=18+20)
XML 3.07s ± 2% 3.11s ± 4% +1.50% (p=0.043 n=20+20)
[Geo mean] 5.61s 5.64s +0.55%
name old text-bytes new text-bytes delta
HelloSize 586kB ± 0% 586kB ± 0% ~ (all equal)
name old data-bytes new data-bytes delta
HelloSize 5.46kB ± 0% 5.46kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 72.9kB ± 0% 72.9kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.03MB ± 0% 1.03MB ± 0% ~ (all equal)
3. The go1 benchmark shows improvement totally, and even more than 10%
improvement in the test case Revcomp.
name old time/op new time/op delta
BinaryTree17-4 42.0s ± 1% 41.5s ± 1% -1.32% (p=0.000 n=39+40)
Fannkuch11-4 24.1s ± 1% 23.6s ± 0% -2.38% (p=0.000 n=40+40)
FmtFprintfEmpty-4 843ns ± 0% 839ns ± 1% -0.46% (p=0.000 n=33+40)
FmtFprintfString-4 1.44µs ± 1% 1.37µs ± 1% -5.48% (p=0.000 n=40+35)
FmtFprintfInt-4 1.44µs ± 1% 1.41µs ± 2% -1.50% (p=0.000 n=40+40)
FmtFprintfIntInt-4 2.07µs ± 1% 2.06µs ± 0% -0.78% (p=0.000 n=40+40)
FmtFprintfPrefixedInt-4 2.50µs ± 1% 2.33µs ± 1% -6.85% (p=0.000 n=40+40)
FmtFprintfFloat-4 4.36µs ± 1% 4.34µs ± 0% -0.39% (p=0.017 n=40+40)
FmtManyArgs-4 8.11µs ± 0% 8.00µs ± 0% -1.37% (p=0.000 n=40+40)
GobDecode-4 105ms ± 2% 103ms ± 2% -2.17% (p=0.000 n=39+39)
GobEncode-4 90.1ms ± 2% 88.6ms ± 1% -1.67% (p=0.000 n=40+39)
Gzip-4 4.18s ± 1% 4.09s ± 1% -2.03% (p=0.000 n=40+40)
Gunzip-4 608ms ± 1% 603ms ± 1% -0.86% (p=0.000 n=40+34)
HTTPClientServer-4 674µs ± 3% 661µs ± 2% -1.82% (p=0.000 n=40+39)
JSONEncode-4 256ms ± 1% 243ms ± 0% -5.11% (p=0.000 n=39+31)
JSONDecode-4 915ms ± 1% 904ms ± 1% -1.18% (p=0.000 n=40+36)
Mandelbrot200-4 49.2ms ± 0% 49.3ms ± 0% ~ (p=0.254 n=34+40)
GoParse-4 46.9ms ± 2% 46.9ms ± 1% ~ (p=0.737 n=40+39)
RegexpMatchEasy0_32-4 1.28µs ± 1% 1.27µs ± 1% -0.71% (p=0.000 n=40+40)
RegexpMatchEasy0_1K-4 7.86µs ± 4% 7.67µs ± 4% -2.46% (p=0.000 n=38+40)
RegexpMatchEasy1_32-4 1.28µs ± 1% 1.28µs ± 1% -0.54% (p=0.000 n=40+40)
RegexpMatchEasy1_1K-4 10.4µs ± 2% 10.3µs ± 2% -0.88% (p=0.003 n=40+39)
RegexpMatchMedium_32-4 2.05µs ± 0% 2.04µs ± 0% -0.34% (p=0.000 n=40+33)
RegexpMatchMedium_1K-4 541µs ± 1% 535µs ± 1% -1.02% (p=0.000 n=40+38)
RegexpMatchHard_32-4 29.3µs ± 1% 29.1µs ± 1% -0.51% (p=0.000 n=40+40)
RegexpMatchHard_1K-4 881µs ± 1% 871µs ± 1% -1.15% (p=0.000 n=40+40)
Revcomp-4 81.7ms ± 2% 67.5ms ± 2% -17.37% (p=0.000 n=39+39)
Template-4 1.05s ± 1% 1.08s ± 2% +3.67% (p=0.000 n=40+40)
TimeParse-4 7.24µs ± 1% 7.09µs ± 1% -2.13% (p=0.000 n=40+40)
TimeFormat-4 13.2µs ± 1% 13.1µs ± 0% -0.31% (p=0.007 n=40+31)
[Geo mean] 733µs 718µs -2.03%
name old speed new speed delta
GobDecode-4 7.28MB/s ± 2% 7.44MB/s ± 2% +2.23% (p=0.000 n=39+39)
GobEncode-4 8.52MB/s ± 2% 8.67MB/s ± 1% +1.70% (p=0.000 n=40+39)
Gzip-4 4.65MB/s ± 1% 4.74MB/s ± 1% +1.94% (p=0.000 n=37+40)
Gunzip-4 31.9MB/s ± 1% 32.2MB/s ± 1% +0.90% (p=0.000 n=40+36)
JSONEncode-4 7.57MB/s ± 1% 7.98MB/s ± 0% +5.41% (p=0.000 n=40+31)
JSONDecode-4 2.12MB/s ± 1% 2.15MB/s ± 1% +1.23% (p=0.000 n=40+40)
GoParse-4 1.23MB/s ± 1% 1.23MB/s ± 1% ~ (p=0.769 n=39+40)
RegexpMatchEasy0_32-4 25.0MB/s ± 1% 25.2MB/s ± 1% +0.71% (p=0.000 n=40+40)
RegexpMatchEasy0_1K-4 130MB/s ± 5% 134MB/s ± 4% +2.53% (p=0.000 n=38+40)
RegexpMatchEasy1_32-4 24.9MB/s ± 1% 25.1MB/s ± 1% +0.55% (p=0.000 n=40+40)
RegexpMatchEasy1_1K-4 98.5MB/s ± 2% 99.4MB/s ± 2% +0.88% (p=0.003 n=40+39)
RegexpMatchMedium_32-4 490kB/s ± 0% 490kB/s ± 0% ~ (all equal)
RegexpMatchMedium_1K-4 1.89MB/s ± 1% 1.91MB/s ± 1% +1.02% (p=0.000 n=40+38)
RegexpMatchHard_32-4 1.10MB/s ± 1% 1.10MB/s ± 0% +0.41% (p=0.000 n=40+33)
RegexpMatchHard_1K-4 1.16MB/s ± 1% 1.17MB/s ± 1% +1.21% (p=0.000 n=40+40)
Revcomp-4 31.1MB/s ± 2% 37.6MB/s ± 2% +21.03% (p=0.000 n=39+39)
Template-4 1.86MB/s ± 1% 1.79MB/s ± 1% -3.51% (p=0.000 n=40+38)
[Geo mean] 6.66MB/s 6.80MB/s +2.13%
fixes #21492
Change-Id: Ia26e7ca393f0a5f31de240e8ff9a220453ca7e0d
Reviewed-on: https://go-review.googlesource.com/58450
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
For address of an auto or arg, on all non-x86 architectures
the assembler backend encodes the actual SP offset in the
instruction but leaves the offset in Prog unchanged. When the
assembly is printed in compile -S, it shows an offset
relative to pseudo FP/SP with an actual hardware SP base
register (e.g. R13 on ARM). This is confusing. Unset the
base register if it is indeed SP, so the assembly output is
consistent. If the base register isn't SP, it should be an
error and the error output contains the actual base register.
For address loading instructions, the base register isn't set
in the compiler on non-x86 architectures. Set it. Normally it
is SP and will be unset in the change mentioned above for
printing. If it is not, it will be an error and the error
output contains the actual base register.
No change in generated binary, only printed assembly. Passes
"go build -a -toolexec 'toolstash -cmp' std cmd" on all
architectures.
Fixes #21064.
Change-Id: Ifafe8d5f9b437efbe824b63b3cbc2f5f6cdc1fd5
Reviewed-on: https://go-review.googlesource.com/49432
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
|
|
When package ssa was created, Type was in package gc.
To avoid circular dependencies, we used an interface (ssa.Type)
to represent type information in SSA.
In the Go 1.9 cycle, gri extricated the Type type from package gc.
As a result, we can now use it in package ssa.
Now, instead of package types depending on package ssa,
it is the other way.
This is a more sensible dependency tree,
and helps compiler performance a bit.
Though this is a big CL, most of the changes are
mechanical and uninteresting.
Interesting bits:
* Add new singleton globals to package types for the special
SSA types Memory, Void, Invalid, Flags, and Int128.
* Add two new Types, TSSA for the special types,
and TTUPLE, for SSA tuple types.
ssa.MakeTuple is now types.NewTuple.
* Move type comparison result constants CMPlt, CMPeq, and CMPgt
to package types.
* We had picked the name "types" in our rules for the handy
list of types provided by ssa.Config. That conflicted with
the types package name, so change it to "typ".
* Update the type comparison routine to handle tuples and special
types inline.
* Teach gc/fmt.go how to print special types.
* We can now eliminate ElemTypes in favor of just Elem,
and probably also some other duplicated Type methods
designed to return ssa.Type instead of *types.Type.
* The ssa tests were using their own dummy types,
and they were not particularly careful about types in general.
Of necessity, this CL switches them to use *types.Type;
it does not make them more type-accurate.
Unfortunately, using types.Type means initializing a bit
of the types universe.
This is prime for refactoring and improvement.
This shrinks ssa.Value; it now fits in a smaller size class
on 64 bit systems. This doesn't have a giant impact,
though, since most Values are preallocated in a chunk.
name old alloc/op new alloc/op delta
Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8)
Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10)
GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10)
Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10)
GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9)
Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8)
Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10)
XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10)
[Geo mean] 40.5MB 40.3MB -0.68%
name old allocs/op new allocs/op delta
Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9)
Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10)
GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10)
Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10)
GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9)
Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8)
Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10)
XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10)
[Geo mean] 428k 428k -0.01%
Removing all the interface calls helps non-trivially with CPU, though.
name old time/op new time/op delta
Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96)
Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96)
GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96)
Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99)
GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97)
Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99)
Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94)
XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95)
[Geo mean] 178ms 173ms -2.65%
name old user-time/op new user-time/op delta
Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99)
Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95)
GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99)
Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96)
GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100)
Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92)
Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100)
XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97)
[Geo mean] 220ms 213ms -2.76%
Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1
Reviewed-on: https://go-review.googlesource.com/42145
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
This reverts commit 94d540a4b6bf68ec472bf4469037955e3133fcf7.
Reason for revert: prefer something along the lines of CL 42018.
Change-Id: I876fe32e98f37d8d725fe55e0fd0ea429c0198e0
Reviewed-on: https://go-review.googlesource.com/42022
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
|