Age | Commit message (Collapse) | Author |
|
Currently, deferreturn runs deferred functions by backing up its
return PC to the deferreturn call, and then effectively tail-calling
the deferred function (via jmpdefer). The effect of this is that the
deferred function appears to be called directly from the deferee, and
when it returns, the deferee calls deferreturn again so it can run the
next deferred function if necessary.
This unusual flow control leads to a large number of special cases and
complications all over the tool chain.
This used to be necessary because deferreturn copied the deferred
function's argument frame directly into its caller's frame and then
had to invoke that call as if it had been called from its caller's
frame so it could access it arguments. But now that we've simplified
defer processing so the runtime only deals with argument-less
closures, this approach is no longer necessary.
This CL simplifies all of this by making deferreturn simply call
deferred functions in a loop.
This eliminates the need for jmpdefer, so we can delete a bunch of
per-architecture assembly code.
This eliminates several special cases on Wasm, since it couldn't
support these calling shenanigans directly and thus had to simulate
the loop a different way. Now Wasm can largely work the way the other
platforms do.
This eliminates the per-architecture Ginsnopdefer operation. On PPC64,
this was necessary to reload the TOC pointer after the tail call
(since TOC pointers in general make tail calls impossible). The tail
call is gone, and in the case where we do force a jump to the
deferreturn call when recovering from an open-coded defer, we go
through gogo (via runtime.recovery), which handles the TOC. On other
platforms, we needed a NOP so traceback didn't get confused by seeing
the return to the CALL instruction, rather than the usual return to
the instruction following the CALL instruction. Now we don't inject a
return to the CALL instruction at all, so this NOP is also
unnecessary.
The one potential effect of this is that deferreturn could now appear
in stack traces from deferred functions. However, this could already
happen from open-coded defers, so we've long since marked deferreturn
as a "wrapper" so it gets elided not only from printed stack traces,
but from runtime.Callers*.
This is a retry of CL 337652 because we had to back out its parent.
There are no changes in this version.
Change-Id: I3f54b7fec1d7ccac71cc6cf6835c6a46b7e5fb6c
Reviewed-on: https://go-review.googlesource.com/c/go/+/339397
Trust: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
replace jmpdefer with a loop"
This reverts CL 227652.
I'm reverting CL 337651 and this builds on top of it.
Change-Id: I03ce363be44c2a3defff2e43e7b1aad83386820d
Reviewed-on: https://go-review.googlesource.com/c/go/+/338709
Trust: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
Currently, deferreturn runs deferred functions by backing up its
return PC to the deferreturn call, and then effectively tail-calling
the deferred function (via jmpdefer). The effect of this is that the
deferred function appears to be called directly from the deferee, and
when it returns, the deferee calls deferreturn again so it can run the
next deferred function if necessary.
This unusual flow control leads to a large number of special cases and
complications all over the tool chain.
This used to be necessary because deferreturn copied the deferred
function's argument frame directly into its caller's frame and then
had to invoke that call as if it had been called from its caller's
frame so it could access it arguments. But now that we've simplified
defer processing so the runtime only deals with argument-less
closures, this approach is no longer necessary.
This CL simplifies all of this by making deferreturn simply call
deferred functions in a loop.
This eliminates the need for jmpdefer, so we can delete a bunch of
per-architecture assembly code.
This eliminates several special cases on Wasm, since it couldn't
support these calling shenanigans directly and thus had to simulate
the loop a different way. Now Wasm can largely work the way the other
platforms do.
This eliminates the per-architecture Ginsnopdefer operation. On PPC64,
this was necessary to reload the TOC pointer after the tail call
(since TOC pointers in general make tail calls impossible). The tail
call is gone, and in the case where we do force a jump to the
deferreturn call when recovering from an open-coded defer, we go
through gogo (via runtime.recovery), which handles the TOC. On other
platforms, we needed a NOP so traceback didn't get confused by seeing
the return to the CALL instruction, rather than the usual return to
the instruction following the CALL instruction. Now we don't inject a
return to the CALL instruction at all, so this NOP is also
unnecessary.
The one potential effect of this is that deferreturn could now appear
in stack traces from deferred functions. However, this could already
happen from open-coded defers, so we've long since marked deferreturn
as a "wrapper" so it gets elided not only from printed stack traces,
but from runtime.Callers*.
Change-Id: Ie9f700cd3fb774f498c9edce363772a868407bf7
Reviewed-on: https://go-review.googlesource.com/c/go/+/337652
Trust: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
The go/build package needs access to this configuration,
so move it into a new package available to the standard library.
Change-Id: I868a94148b52350c76116451f4ad9191246adcff
Reviewed-on: https://go-review.googlesource.com/c/go/+/310731
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Jay Conrod <jayconrod@google.com>
|
|
When -clobberdeadreg flag is set, the compiler inserts code that
clobbers integer registers at call sites. This may be helpful for
debugging register ABI.
Only implemented on AMD64 for now.
Change-Id: Ia203d3f891c30fd95d0103489056fe01d63a2899
Reviewed-on: https://go-review.googlesource.com/c/go/+/302809
Trust: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
|
|
In the ppc64 ISA DS form loads and stores are restricted to offset
fields that are a multiple of 4. This is currently handled with checks
in the rules that generate MOVDload, MOVWload, MOVDstore and
MOVDstorezero to prevent invalid instructions from getting to the
assembler.
An unhandled case was discovered which led to the search for a better
solution to this problem. Now, instead of checking the offset in the
rules, this will be detected when processing these Ops in
ssaGenValues in ppc64/ssa.go. If the offset is not valid, the address
of the symbol to be loaded or stored will be computed using the base
register + offset, and that value used in the new base register.
With the full address in the base register, the offset field can be
zero in the instruction.
Updates #44739
Change-Id: I4f3c0c469ae70a63e3add295c9b55ea0e30ef9b3
Reviewed-on: https://go-review.googlesource.com/c/go/+/299789
Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
These instructions are actually 5 argument opcodes as specified
by the ISA. Prior to this patch, the MB and ME arguments were
merged into a single bitmask operand to workaround the limitations
of the ppc64 assembler backend.
This limitation no longer exists. Thus, we can pass operands for
these opcodes without having to merge the MB and ME arguments in
the assembler frontend or compiler backend.
Likewise, support for 4 operand variants is unchanged.
Change-Id: Ib086774f3581edeaadfd2190d652aaaa8a90daeb
Reviewed-on: https://go-review.googlesource.com/c/go/+/298750
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Carlos Eduardo Seo <carlos.seo@linaro.org>
Trust: Carlos Eduardo Seo <carlos.seo@linaro.org>
|
|
These are the the most common uses, and they reduce line noise.
I don't love adding new deprecated APIs,
but since they're trivial wrappers,
it'll be very easy to update them along with the rest.
No functional changes; passes toolstash-check.
Change-Id: I691a8175cfef9081180e463c63f326376af3f3a6
Reviewed-on: https://go-review.googlesource.com/c/go/+/296009
Trust: Josh Bleecher Snyder <josharian@gmail.com>
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
[git-generate]
cd src/cmd/compile/internal/gc
rf '
# maxOpenDefers is declared in ssa.go but used only by walk.
mv maxOpenDefers walk.go
# gc.Arch -> ssagen.Arch
# It is not as nice but will do for now.
mv Arch ArchInfo
mv thearch Arch
mv Arch ArchInfo arch.go
# Pull dwarf out of pgen.go.
mv debuginfo declPos createDwarfVars preInliningDcls \
createSimpleVars createSimpleVar \
createComplexVars createComplexVar \
dwarf.go
# Pull high-level compilation out of pgen.go,
# leaving only the SSA code.
mv compilequeue funccompile compile compilenow \
compileFunctions isInlinableButNotInlined \
initLSym \
compile.go
mv BoundsCheckFunc GCWriteBarrierReg ssa.go
mv largeStack largeStackFrames CheckLargeStacks pgen.go
# All that is left in dcl.go is the nowritebarrierrecCheck
mv dcl.go nowb.go
# Export API and unexport non-API.
mv initssaconfig InitConfig
mv isIntrinsicCall IsIntrinsicCall
mv ssaDumpInline DumpInline
mv initSSATables InitTables
mv initSSAEnv InitEnv
mv compileSSA Compile
mv stackOffset StackOffset
mv canSSAType TypeOK
mv SSAGenState State
mv FwdRefAux fwdRefAux
mv cgoSymABIs CgoSymABIs
mv readSymABIs ReadSymABIs
mv initLSym InitLSym
mv useABIWrapGen symabiDefs CgoSymABIs ReadSymABIs InitLSym selectLSym makeABIWrapper setupTextLSym abi.go
mv arch.go abi.go nowb.go phi.go pgen.go pgen_test.go ssa.go cmd/compile/internal/ssagen
'
rm go.go gsubr.go
Change-Id: I47fad6cbf1d1e583fd9139003a08401d7cd048a1
Reviewed-on: https://go-review.googlesource.com/c/go/+/279476
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
|
|
Object file writing routines are used not just at the end
of the compilation but also during static data layout in walk.
Split them into their own package.
[git-generate]
cd src/cmd/compile/internal/gc
rf '
# Move bit vector to new package bitvec
mv bvec.n bvec.N
mv bvec.b bvec.B
mv bvec BitVec
mv bvalloc New
mv bvbulkalloc NewBulk
mv bulkBvec.next bulkBvec.Next
mv bulkBvec Bulk
mv H0 h0
mv Hp hp
# Leave bvecSet and bitmap hashes behind - not needed as broadly.
mv bvecSet.extractUniqe bvecSet.extractUnique
mv h0 bvecSet bvecSet.grow bvecSet.add \
bvecSet.extractUnique hashbitmap bvset.go
mv bv.go cmd/compile/internal/bitvec
ex . ../arm ../arm64 ../mips ../mips64 ../ppc64 ../s390x ../riscv64 {
import "cmd/internal/obj"
var a *obj.Addr
var i int64
Addrconst(a, i) -> a.SetConst(i)
var p, to *obj.Prog
Patch(p, to) -> p.To.SetTarget(to)
}
rm Addrconst Patch
# Move object-writing API to new package objw
mv duint8 Objw_Uint8
mv duint16 Objw_Uint16
mv duint32 Objw_Uint32
mv duintptr Objw_Uintptr
mv duintxx Objw_UintN
mv dsymptr Objw_SymPtr
mv dsymptrOff Objw_SymPtrOff
mv dsymptrWeakOff Objw_SymPtrWeakOff
mv ggloblsym Objw_Global
mv dbvec Objw_BitVec
mv newProgs NewProgs
mv Progs.clearp Progs.Clear
mv Progs.settext Progs.SetText
mv Progs.next Progs.Next
mv Progs.pc Progs.PC
mv Progs.pos Progs.Pos
mv Progs.curfn Progs.CurFunc
mv Progs.progcache Progs.Cache
mv Progs.cacheidx Progs.CacheIndex
mv Progs.nextLive Progs.NextLive
mv Progs.prevLive Progs.PrevLive
mv Progs.Appendpp Progs.Append
mv LivenessIndex.stackMapIndex LivenessIndex.StackMapIndex
mv LivenessIndex.isUnsafePoint LivenessIndex.IsUnsafePoint
mv Objw_Uint8 Objw_Uint16 Objw_Uint32 Objw_Uintptr Objw_UintN \
Objw_SymPtr Objw_SymPtrOff Objw_SymPtrWeakOff Objw_Global \
Objw_BitVec \
objw.go
mv sharedProgArray NewProgs Progs \
LivenessIndex StackMapDontCare \
LivenessDontCare LivenessIndex.StackMapValid \
Progs.NewProg Progs.Flush Progs.Free Progs.Prog Progs.Clear Progs.Append Progs.SetText \
prog.go
mv prog.go objw.go cmd/compile/internal/objw
# Move ggloblnod to obj with the rest of the non-objw higher-level writing.
mv ggloblnod obj.go
'
cd ../objw
rf '
mv Objw_Uint8 Uint8
mv Objw_Uint16 Uint16
mv Objw_Uint32 Uint32
mv Objw_Uintptr Uintptr
mv Objw_UintN UintN
mv Objw_SymPtr SymPtr
mv Objw_SymPtrOff SymPtrOff
mv Objw_SymPtrWeakOff SymPtrWeakOff
mv Objw_Global Global
mv Objw_BitVec BitVec
'
Change-Id: I2b87085aa788564fb322e9c55bddd73347b4d5fd
Reviewed-on: https://go-review.googlesource.com/c/go/+/279310
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
|
|
[generated]
To break up package gc, we need to put these calculations somewhere
lower in the import graph, either an existing or new package. Package types
already needs this code and is using hacks to get it without an import cycle.
We can remove the hacks and set up for the new package gc by moving the
code into package types itself.
[git-generate]
cd src/cmd/compile/internal/gc
rf '
# Remove old import cycle hacks in gc.
rm TypecheckInit:/types.Widthptr =/-0,/types.Dowidth =/+0 \
../ssa/export_test.go:/types.Dowidth =/-+
ex {
import "cmd/compile/internal/types"
types.Widthptr -> Widthptr
types.Dowidth -> dowidth
}
# Disable CalcSize in tests instead of base.Fatalf
sub dowidth:/base.Fatalf\("dowidth without betypeinit"\)/ \
// Assume this is a test. \
return
# Move size calculation into cmd/compile/internal/types
mv Widthptr PtrSize
mv Widthreg RegSize
mv slicePtrOffset SlicePtrOffset
mv sliceLenOffset SliceLenOffset
mv sliceCapOffset SliceCapOffset
mv sizeofSlice SliceSize
mv sizeofString StringSize
mv skipDowidthForTracing SkipSizeForTracing
mv dowidth CalcSize
mv checkwidth CheckSize
mv widstruct calcStructOffset
mv sizeCalculationDisabled CalcSizeDisabled
mv defercheckwidth DeferCheckSize
mv resumecheckwidth ResumeCheckSize
mv typeptrdata PtrDataSize
mv \
PtrSize RegSize SlicePtrOffset SkipSizeForTracing typePos align.go PtrDataSize \
size.go
mv size.go cmd/compile/internal/types
'
: # Remove old import cycle hacks in types.
cd ../types
rf '
ex {
Widthptr -> PtrSize
Dowidth -> CalcSize
}
rm Widthptr Dowidth
'
Change-Id: Ib96cdc6bda2617235480c29392ea5cfb20f60cd8
Reviewed-on: https://go-review.googlesource.com/c/go/+/279234
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
|
|
There are a handful of pre-computed magic symbols known by
package gc, and we need a place to store them.
If we keep them together, the need for type *ir.Name means that
package ir is the lowest package in the import hierarchy that they
can go in. And package ir needs gopkg for methodSymSuffix
(in a later CL), so they can't go any higher either, at least not all together.
So package ir it is.
Rather than dump them all into the top-level package ir
namespace, however, we introduce global structs, Syms, Pkgs, and Names,
and make the known symbols, packages, and names fields of those.
[git-generate]
cd src/cmd/compile/internal/gc
rf '
add go.go:$ \
// Names holds known names. \
var Names struct{} \
\
// Syms holds known symbols. \
var Syms struct {} \
\
// Pkgs holds known packages. \
var Pkgs struct {} \
mv staticuint64s Names.Staticuint64s
mv zerobase Names.Zerobase
mv assertE2I Syms.AssertE2I
mv assertE2I2 Syms.AssertE2I2
mv assertI2I Syms.AssertI2I
mv assertI2I2 Syms.AssertI2I2
mv deferproc Syms.Deferproc
mv deferprocStack Syms.DeferprocStack
mv Deferreturn Syms.Deferreturn
mv Duffcopy Syms.Duffcopy
mv Duffzero Syms.Duffzero
mv gcWriteBarrier Syms.GCWriteBarrier
mv goschedguarded Syms.Goschedguarded
mv growslice Syms.Growslice
mv msanread Syms.Msanread
mv msanwrite Syms.Msanwrite
mv msanmove Syms.Msanmove
mv newobject Syms.Newobject
mv newproc Syms.Newproc
mv panicdivide Syms.Panicdivide
mv panicshift Syms.Panicshift
mv panicdottypeE Syms.PanicdottypeE
mv panicdottypeI Syms.PanicdottypeI
mv panicnildottype Syms.Panicnildottype
mv panicoverflow Syms.Panicoverflow
mv raceread Syms.Raceread
mv racereadrange Syms.Racereadrange
mv racewrite Syms.Racewrite
mv racewriterange Syms.Racewriterange
mv SigPanic Syms.SigPanic
mv typedmemclr Syms.Typedmemclr
mv typedmemmove Syms.Typedmemmove
mv Udiv Syms.Udiv
mv writeBarrier Syms.WriteBarrier
mv zerobaseSym Syms.Zerobase
mv arm64HasATOMICS Syms.ARM64HasATOMICS
mv armHasVFPv4 Syms.ARMHasVFPv4
mv x86HasFMA Syms.X86HasFMA
mv x86HasPOPCNT Syms.X86HasPOPCNT
mv x86HasSSE41 Syms.X86HasSSE41
mv WasmDiv Syms.WasmDiv
mv WasmMove Syms.WasmMove
mv WasmZero Syms.WasmZero
mv WasmTruncS Syms.WasmTruncS
mv WasmTruncU Syms.WasmTruncU
mv gopkg Pkgs.Go
mv itabpkg Pkgs.Itab
mv itablinkpkg Pkgs.Itablink
mv mappkg Pkgs.Map
mv msanpkg Pkgs.Msan
mv racepkg Pkgs.Race
mv Runtimepkg Pkgs.Runtime
mv trackpkg Pkgs.Track
mv unsafepkg Pkgs.Unsafe
mv Names Syms Pkgs symtab.go
mv symtab.go cmd/compile/internal/ir
'
Change-Id: Ic143862148569a3bcde8e70b26d75421aa2d00f3
Reviewed-on: https://go-review.googlesource.com/c/go/+/279235
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
|
|
The plan is to introduce a Node interface that replaces the old *Node pointer-to-struct.
The previous CL defined an interface INode modeling a *Node.
This CL:
- Changes all references outside internal/ir to use INode,
along with many references inside internal/ir as well.
- Renames Node to node.
- Renames INode to Node
So now ir.Node is an interface implemented by *ir.node, which is otherwise inaccessible,
and the code outside package ir is now (clearly) using only the interface.
The usual rule is never to redefine an existing name with a new meaning,
so that old code that hasn't been updated gets a "unknown name" error
instead of more mysterious errors or silent misbehavior. That rule would
caution against replacing Node-the-struct with Node-the-interface,
as in this CL, because code that says *Node would now be using a pointer
to an interface. But this CL is being landed at the same time as another that
moves Node from gc to ir. So the net effect is to replace *gc.Node with ir.Node,
which does follow the rule: any lingering references to gc.Node will be told
it's gone, not silently start using pointers to interfaces. So the rule is followed
by the CL sequence, just not this specific CL.
Overall, the loss of inlining caused by using interfaces cuts the compiler speed
by about 6%, a not insignificant amount. However, as we convert the representation
to concrete structs that are not the giant Node over the next weeks, that speed
should come back as more of the compiler starts operating directly on concrete types
and the memory taken up by the graph of Nodes drops due to the more precise
structs. Honestly, I was expecting worse.
% benchstat bench.old bench.new
name old time/op new time/op delta
Template 168ms ± 4% 182ms ± 2% +8.34% (p=0.000 n=9+9)
Unicode 72.2ms ±10% 82.5ms ± 6% +14.38% (p=0.000 n=9+9)
GoTypes 563ms ± 8% 598ms ± 2% +6.14% (p=0.006 n=9+9)
Compiler 2.89s ± 4% 3.04s ± 2% +5.37% (p=0.000 n=10+9)
SSA 6.45s ± 4% 7.25s ± 5% +12.41% (p=0.000 n=9+10)
Flate 105ms ± 2% 115ms ± 1% +9.66% (p=0.000 n=10+8)
GoParser 144ms ±10% 152ms ± 2% +5.79% (p=0.011 n=9+8)
Reflect 345ms ± 9% 370ms ± 4% +7.28% (p=0.001 n=10+9)
Tar 149ms ± 9% 161ms ± 5% +8.05% (p=0.001 n=10+9)
XML 190ms ± 3% 209ms ± 2% +9.54% (p=0.000 n=9+8)
LinkCompiler 327ms ± 2% 325ms ± 2% ~ (p=0.382 n=8+8)
ExternalLinkCompiler 1.77s ± 4% 1.73s ± 6% ~ (p=0.113 n=9+10)
LinkWithoutDebugCompiler 214ms ± 4% 211ms ± 2% ~ (p=0.360 n=10+8)
StdCmd 14.8s ± 3% 15.9s ± 1% +6.98% (p=0.000 n=10+9)
[Geo mean] 480ms 510ms +6.31%
name old user-time/op new user-time/op delta
Template 223ms ± 3% 237ms ± 3% +6.16% (p=0.000 n=9+10)
Unicode 103ms ± 6% 113ms ± 3% +9.53% (p=0.000 n=9+9)
GoTypes 758ms ± 8% 800ms ± 2% +5.55% (p=0.003 n=10+9)
Compiler 3.95s ± 2% 4.12s ± 2% +4.34% (p=0.000 n=10+9)
SSA 9.43s ± 1% 9.74s ± 4% +3.25% (p=0.000 n=8+10)
Flate 132ms ± 2% 141ms ± 2% +6.89% (p=0.000 n=9+9)
GoParser 177ms ± 9% 183ms ± 4% ~ (p=0.050 n=9+9)
Reflect 467ms ±10% 495ms ± 7% +6.17% (p=0.029 n=10+10)
Tar 183ms ± 9% 197ms ± 5% +7.92% (p=0.001 n=10+10)
XML 249ms ± 5% 268ms ± 4% +7.82% (p=0.000 n=10+9)
LinkCompiler 544ms ± 5% 544ms ± 6% ~ (p=0.863 n=9+9)
ExternalLinkCompiler 1.79s ± 4% 1.75s ± 6% ~ (p=0.075 n=10+10)
LinkWithoutDebugCompiler 248ms ± 6% 246ms ± 2% ~ (p=0.965 n=10+8)
[Geo mean] 483ms 504ms +4.41%
[git-generate]
cd src/cmd/compile/internal/ir
: # We need to do the conversion in multiple steps, so we introduce
: # a temporary type alias that will start out meaning the pointer-to-struct
: # and then change to mean the interface.
rf '
mv Node OldNode
add node.go \
type Node = *OldNode
'
: # It should work to do this ex in ir, but it misses test files, due to a bug in rf.
: # Run the command in gc to handle gc's tests, and then again in ssa for ssa's tests.
cd ../gc
rf '
ex . ../arm ../riscv64 ../arm64 ../mips64 ../ppc64 ../mips ../wasm {
import "cmd/compile/internal/ir"
*ir.OldNode -> ir.Node
}
'
cd ../ssa
rf '
ex {
import "cmd/compile/internal/ir"
*ir.OldNode -> ir.Node
}
'
: # Back in ir, finish conversion clumsily with sed,
: # because type checking and circular aliases do not mix.
cd ../ir
sed -i '' '
/type Node = \*OldNode/d
s/\*OldNode/Node/g
s/^func (n Node)/func (n *OldNode)/
s/OldNode/node/g
s/type INode interface/type Node interface/
s/var _ INode = (Node)(nil)/var _ Node = (*node)(nil)/
' *.go
gofmt -w *.go
sed -i '' '
s/{Func{}, 136, 248}/{Func{}, 152, 280}/
s/{Name{}, 32, 56}/{Name{}, 44, 80}/
s/{Param{}, 24, 48}/{Param{}, 44, 88}/
s/{node{}, 76, 128}/{node{}, 88, 152}/
' sizeof_test.go
cd ../ssa
sed -i '' '
s/{LocalSlot{}, 28, 40}/{LocalSlot{}, 32, 48}/
' sizeof_test.go
cd ../gc
sed -i '' 's/\*ir.Node/ir.Node/' mkbuiltin.go
cd ../../../..
go install std cmd
cd cmd/compile
go test -u || go test -u
Change-Id: I196bbe3b648e4701662e4a2bada40bf155e2a553
Reviewed-on: https://go-review.googlesource.com/c/go/+/272935
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
|
|
If we want to break up package gc at all, we will need to move
the compiler IR it defines into a separate package that can be
imported by packages that gc itself imports. This CL does that.
It also removes the TINT8 etc aliases so that all code is clear
about which package things are coming from.
This CL is automatically generated by the script below.
See the comments in the script for details about the changes.
[git-generate]
cd src/cmd/compile/internal/gc
rf '
# These names were never fully qualified
# when the types package was added.
# Do it now, to avoid confusion about where they live.
inline -rm \
Txxx \
TINT8 \
TUINT8 \
TINT16 \
TUINT16 \
TINT32 \
TUINT32 \
TINT64 \
TUINT64 \
TINT \
TUINT \
TUINTPTR \
TCOMPLEX64 \
TCOMPLEX128 \
TFLOAT32 \
TFLOAT64 \
TBOOL \
TPTR \
TFUNC \
TSLICE \
TARRAY \
TSTRUCT \
TCHAN \
TMAP \
TINTER \
TFORW \
TANY \
TSTRING \
TUNSAFEPTR \
TIDEAL \
TNIL \
TBLANK \
TFUNCARGS \
TCHANARGS \
NTYPE \
BADWIDTH
# esc.go and escape.go do not need to be split.
# Append esc.go onto the end of escape.go.
mv esc.go escape.go
# Pull out the type format installation from func Main,
# so it can be carried into package ir.
mv Main:/Sconv.=/-0,/TypeLinkSym/-1 InstallTypeFormats
# Names that need to be exported for use by code left in gc.
mv Isconst IsConst
mv asNode AsNode
mv asNodes AsNodes
mv asTypesNode AsTypesNode
mv basicnames BasicTypeNames
mv builtinpkg BuiltinPkg
mv consttype ConstType
mv dumplist DumpList
mv fdumplist FDumpList
mv fmtMode FmtMode
mv goopnames OpNames
mv inspect Inspect
mv inspectList InspectList
mv localpkg LocalPkg
mv nblank BlankNode
mv numImport NumImport
mv opprec OpPrec
mv origSym OrigSym
mv stmtwithinit StmtWithInit
mv dump DumpAny
mv fdump FDumpAny
mv nod Nod
mv nodl NodAt
mv newname NewName
mv newnamel NewNameAt
mv assertRepresents AssertValidTypeForConst
mv represents ValidTypeForConst
mv nodlit NewLiteral
# Types and fields that need to be exported for use by gc.
mv nowritebarrierrecCallSym SymAndPos
mv SymAndPos.lineno SymAndPos.Pos
mv SymAndPos.target SymAndPos.Sym
mv Func.lsym Func.LSym
mv Func.setWBPos Func.SetWBPos
mv Func.numReturns Func.NumReturns
mv Func.numDefers Func.NumDefers
mv Func.nwbrCalls Func.NWBRCalls
# initLSym is an algorithm left behind in gc,
# not an operation on Func itself.
mv Func.initLSym initLSym
mv nodeQueue NodeQueue
mv NodeQueue.empty NodeQueue.Empty
mv NodeQueue.popLeft NodeQueue.PopLeft
mv NodeQueue.pushRight NodeQueue.PushRight
# Many methods on Node are actually algorithms that
# would apply to any node implementation.
# Those become plain functions.
mv Node.funcname FuncName
mv Node.isBlank IsBlank
mv Node.isGoConst isGoConst
mv Node.isNil IsNil
mv Node.isParamHeapCopy isParamHeapCopy
mv Node.isParamStackCopy isParamStackCopy
mv Node.isSimpleName isSimpleName
mv Node.mayBeShared MayBeShared
mv Node.pkgFuncName PkgFuncName
mv Node.backingArrayPtrLen backingArrayPtrLen
mv Node.isterminating isTermNode
mv Node.labeledControl labeledControl
mv Nodes.isterminating isTermNodes
mv Nodes.sigerr fmtSignature
mv Node.MethodName methodExprName
mv Node.MethodFunc methodExprFunc
mv Node.IsMethod IsMethod
# Every node will need to implement RawCopy;
# Copy and SepCopy algorithms will use it.
mv Node.rawcopy Node.RawCopy
mv Node.copy Copy
mv Node.sepcopy SepCopy
# Extract Node.Format method body into func FmtNode,
# but leave method wrapper behind.
mv Node.Format:0,$ FmtNode
# Formatting helpers that will apply to all node implementations.
mv Node.Line Line
mv Node.exprfmt exprFmt
mv Node.jconv jconvFmt
mv Node.modeString modeString
mv Node.nconv nconvFmt
mv Node.nodedump nodeDumpFmt
mv Node.nodefmt nodeFmt
mv Node.stmtfmt stmtFmt
# Constant support needed for code moving to ir.
mv okforconst OKForConst
mv vconv FmtConst
mv int64Val Int64Val
mv float64Val Float64Val
mv Node.ValueInterface ConstValue
# Organize code into files.
mv LocalPkg BuiltinPkg ir.go
mv NumImport InstallTypeFormats Line fmt.go
mv syntax.go Nod NodAt NewNameAt Class Pxxx PragmaFlag Nointerface SymAndPos \
AsNode AsTypesNode BlankNode OrigSym \
Node.SliceBounds Node.SetSliceBounds Op.IsSlice3 \
IsConst Node.Int64Val Node.CanInt64 Node.Uint64Val Node.BoolVal Node.StringVal \
Node.RawCopy SepCopy Copy \
IsNil IsBlank IsMethod \
Node.Typ Node.StorageClass node.go
mv ConstType ConstValue Int64Val Float64Val AssertValidTypeForConst ValidTypeForConst NewLiteral idealType OKForConst val.go
# Move files to new ir package.
mv bitset.go class_string.go dump.go fmt.go \
ir.go node.go op_string.go val.go \
sizeof_test.go cmd/compile/internal/ir
'
: # fix mkbuiltin.go to generate the changes made to builtin.go during rf
sed -i '' '
s/\[T/[types.T/g
s/\*Node/*ir.Node/g
/internal\/types/c \
fmt.Fprintln(&b, `import (`) \
fmt.Fprintln(&b, ` "cmd/compile/internal/ir"`) \
fmt.Fprintln(&b, ` "cmd/compile/internal/types"`) \
fmt.Fprintln(&b, `)`)
' mkbuiltin.go
gofmt -w mkbuiltin.go
: # update cmd/dist to add internal/ir
cd ../../../dist
sed -i '' '/compile.internal.gc/a\
"cmd/compile/internal/ir",
' buildtool.go
gofmt -w buildtool.go
: # update cmd/compile TestFormats
cd ../..
go install std cmd
cd cmd/compile
go test -u || go test # first one updates but fails; second passes
Change-Id: I5f7caf6b20629b51970279e81231a3574d5b51db
Reviewed-on: https://go-review.googlesource.com/c/go/+/273008
Trust: Russ Cox <rsc@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
|
|
Move Flag, Debug, Ctxt, Exit, and error messages to
new package cmd/compile/internal/base.
These are the core functionality that everything in gc uses
and which otherwise prevent splitting any other code
out of gc into different packages.
A minor milestone: the compiler source code
no longer contains the string "yy".
[git-generate]
cd src/cmd/compile/internal/gc
rf '
mv atExit AtExit
mv Ctxt atExitFuncs AtExit Exit base.go
mv lineno Pos
mv linestr FmtPos
mv flusherrors FlushErrors
mv yyerror Errorf
mv yyerrorl ErrorfAt
mv yyerrorv ErrorfVers
mv noder.yyerrorpos noder.errorAt
mv Warnl WarnfAt
mv errorexit ErrorExit
mv base.go debug.go flag.go print.go cmd/compile/internal/base
'
: # update comments
sed -i '' 's/yyerrorl/ErrorfAt/g; s/yyerror/Errorf/g' *.go
: # bootstrap.go is not built by default so invisible to rf
sed -i '' 's/Fatalf/base.Fatalf/' bootstrap.go
goimports -w bootstrap.go
: # update cmd/dist to add internal/base
cd ../../../dist
sed -i '' '/internal.amd64/a\
"cmd/compile/internal/base",
' buildtool.go
gofmt -w buildtool.go
Change-Id: I59903c7084222d6eaee38823fd222159ba24a31a
Reviewed-on: https://go-review.googlesource.com/c/go/+/272250
Trust: Russ Cox <rsc@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
|
|
The debug table is not as haphazard as flags, but there are still
a few mismatches between command-line names and variable names.
This CL moves them all into a consistent home (var Debug, like var Flag).
Code updated automatically using the rf command below.
A followup CL will make a few manual cleanups, leaving this CL
completely automated and easier to regenerate during merge
conflicts.
[git-generate]
cd src/cmd/compile/internal/gc
rf '
add main.go var Debug struct{}
mv Debug_append Debug.Append
mv Debug_checkptr Debug.Checkptr
mv Debug_closure Debug.Closure
mv Debug_compilelater Debug.CompileLater
mv disable_checknil Debug.DisableNil
mv debug_dclstack Debug.DclStack
mv Debug_gcprog Debug.GCProg
mv Debug_libfuzzer Debug.Libfuzzer
mv Debug_checknil Debug.Nil
mv Debug_panic Debug.Panic
mv Debug_slice Debug.Slice
mv Debug_typeassert Debug.TypeAssert
mv Debug_wb Debug.WB
mv Debug_export Debug.Export
mv Debug_pctab Debug.PCTab
mv Debug_locationlist Debug.LocationLists
mv Debug_typecheckinl Debug.TypecheckInl
mv Debug_gendwarfinl Debug.DwarfInl
mv Debug_softfloat Debug.SoftFloat
mv Debug_defer Debug.Defer
mv Debug_dumpptrs Debug.DumpPtrs
mv flag.go:/parse.-d/-1,/unknown.debug/+2 parseDebug
mv debugtab Debug parseDebug \
debugHelpHeader debugHelpFooter \
debug.go
# Remove //go:generate line copied from main.go
rm debug.go:/go:generate/-+
'
Change-Id: I625761ca5659be4052f7161a83baa00df75cca91
Reviewed-on: https://go-review.googlesource.com/c/go/+/272246
Trust: Russ Cox <rsc@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
|
|
by inserting hint when using bclrl.
Using this instruction as subroutine call is not the expected
default behavior, and as a result confuses the branch predictor.
The default expected behavior is a conditional return from a
subroutine.
We can change this assumption by encoding a hint this is not a
subroutine return.
The regex benchmarks are a pretty good example of how much this
hint can help generic ppc64le code on a power9 machine:
name old time/op new time/op delta
Find 606ns ± 0% 447ns ± 0% -26.27%
FindAllNoMatches 309ns ± 0% 205ns ± 0% -33.72%
FindString 609ns ± 0% 451ns ± 0% -26.04%
FindSubmatch 734ns ± 0% 594ns ± 0% -19.07%
FindStringSubmatch 706ns ± 0% 574ns ± 0% -18.83%
Literal 177ns ± 0% 136ns ± 0% -22.89%
NotLiteral 4.69µs ± 0% 2.34µs ± 0% -50.14%
MatchClass 6.05µs ± 0% 3.26µs ± 0% -46.08%
MatchClass_InRange 5.93µs ± 0% 3.15µs ± 0% -46.86%
ReplaceAll 3.15µs ± 0% 2.18µs ± 0% -30.77%
AnchoredLiteralShortNonMatch 156ns ± 0% 109ns ± 0% -30.61%
AnchoredLiteralLongNonMatch 192ns ± 0% 136ns ± 0% -29.34%
AnchoredShortMatch 268ns ± 0% 209ns ± 0% -22.00%
AnchoredLongMatch 472ns ± 0% 357ns ± 0% -24.30%
OnePassShortA 1.16µs ± 0% 0.87µs ± 0% -25.03%
NotOnePassShortA 1.34µs ± 0% 1.20µs ± 0% -10.63%
OnePassShortB 940ns ± 0% 655ns ± 0% -30.29%
NotOnePassShortB 873ns ± 0% 703ns ± 0% -19.52%
OnePassLongPrefix 258ns ± 0% 155ns ± 0% -40.13%
OnePassLongNotPrefix 943ns ± 0% 529ns ± 0% -43.89%
MatchParallelShared 591ns ± 0% 436ns ± 0% -26.31%
MatchParallelCopied 596ns ± 0% 435ns ± 0% -27.10%
QuoteMetaAll 186ns ± 0% 186ns ± 0% -0.16%
QuoteMetaNone 55.9ns ± 0% 55.9ns ± 0% +0.02%
Compile/Onepass 9.64µs ± 0% 9.26µs ± 0% -3.97%
Compile/Medium 21.7µs ± 0% 20.6µs ± 0% -4.90%
Compile/Hard 174µs ± 0% 174µs ± 0% +0.07%
Match/Easy0/16 7.35ns ± 0% 7.34ns ± 0% -0.11%
Match/Easy0/32 116ns ± 0% 97ns ± 0% -16.27%
Match/Easy0/1K 592ns ± 0% 562ns ± 0% -5.04%
Match/Easy0/32K 12.6µs ± 0% 12.5µs ± 0% -0.64%
Match/Easy0/1M 556µs ± 0% 556µs ± 0% -0.00%
Match/Easy0/32M 17.7ms ± 0% 17.7ms ± 0% +0.05%
Match/Easy0i/16 7.34ns ± 0% 7.35ns ± 0% +0.10%
Match/Easy0i/32 2.82µs ± 0% 1.64µs ± 0% -41.71%
Match/Easy0i/1K 83.2µs ± 0% 48.2µs ± 0% -42.06%
Match/Easy0i/32K 2.13ms ± 0% 1.80ms ± 0% -15.34%
Match/Easy0i/1M 68.1ms ± 0% 57.6ms ± 0% -15.31%
Match/Easy0i/32M 2.18s ± 0% 1.80s ± 0% -17.52%
Match/Easy1/16 7.36ns ± 0% 7.34ns ± 0% -0.24%
Match/Easy1/32 118ns ± 0% 96ns ± 0% -18.72%
Match/Easy1/1K 2.46µs ± 0% 1.58µs ± 0% -35.65%
Match/Easy1/32K 80.2µs ± 0% 54.6µs ± 0% -31.92%
Match/Easy1/1M 2.75ms ± 0% 1.88ms ± 0% -31.66%
Match/Easy1/32M 87.5ms ± 0% 59.8ms ± 0% -31.62%
Match/Medium/16 7.34ns ± 0% 7.34ns ± 0% +0.01%
Match/Medium/32 2.60µs ± 0% 1.50µs ± 0% -42.61%
Match/Medium/1K 78.1µs ± 0% 43.7µs ± 0% -44.06%
Match/Medium/32K 2.08ms ± 0% 1.52ms ± 0% -27.11%
Match/Medium/1M 66.5ms ± 0% 48.6ms ± 0% -26.96%
Match/Medium/32M 2.14s ± 0% 1.60s ± 0% -25.18%
Match/Hard/16 7.35ns ± 0% 7.35ns ± 0% +0.03%
Match/Hard/32 3.58µs ± 0% 2.44µs ± 0% -31.82%
Match/Hard/1K 108µs ± 0% 75µs ± 0% -31.04%
Match/Hard/32K 2.79ms ± 0% 2.25ms ± 0% -19.30%
Match/Hard/1M 89.4ms ± 0% 72.2ms ± 0% -19.26%
Match/Hard/32M 2.91s ± 0% 2.37s ± 0% -18.60%
Match/Hard1/16 11.1µs ± 0% 8.3µs ± 0% -25.07%
Match/Hard1/32 21.4µs ± 0% 16.1µs ± 0% -24.85%
Match/Hard1/1K 658µs ± 0% 498µs ± 0% -24.27%
Match/Hard1/32K 12.2ms ± 0% 11.7ms ± 0% -4.60%
Match/Hard1/1M 391ms ± 0% 374ms ± 0% -4.40%
Match/Hard1/32M 12.6s ± 0% 12.0s ± 0% -4.68%
Match_onepass_regex/16 870ns ± 0% 611ns ± 0% -29.79%
Match_onepass_regex/32 1.58µs ± 0% 1.08µs ± 0% -31.48%
Match_onepass_regex/1K 45.7µs ± 0% 30.3µs ± 0% -33.58%
Match_onepass_regex/32K 1.45ms ± 0% 0.97ms ± 0% -33.20%
Match_onepass_regex/1M 46.2ms ± 0% 30.9ms ± 0% -33.01%
Match_onepass_regex/32M 1.46s ± 0% 0.99s ± 0% -32.02%
name old alloc/op new alloc/op delta
Find 0.00B 0.00B 0.00%
FindAllNoMatches 0.00B 0.00B 0.00%
FindString 0.00B 0.00B 0.00%
FindSubmatch 48.0B ± 0% 48.0B ± 0% 0.00%
FindStringSubmatch 32.0B ± 0% 32.0B ± 0% 0.00%
Compile/Onepass 4.02kB ± 0% 4.02kB ± 0% 0.00%
Compile/Medium 9.39kB ± 0% 9.39kB ± 0% 0.00%
Compile/Hard 84.7kB ± 0% 84.7kB ± 0% 0.00%
Match_onepass_regex/16 0.00B 0.00B 0.00%
Match_onepass_regex/32 0.00B 0.00B 0.00%
Match_onepass_regex/1K 0.00B 0.00B 0.00%
Match_onepass_regex/32K 0.00B 0.00B 0.00%
Match_onepass_regex/1M 5.00B ± 0% 3.00B ± 0% -40.00%
Match_onepass_regex/32M 136B ± 0% 68B ± 0% -50.00%
name old allocs/op new allocs/op delta
Find 0.00 0.00 0.00%
FindAllNoMatches 0.00 0.00 0.00%
FindString 0.00 0.00 0.00%
FindSubmatch 1.00 ± 0% 1.00 ± 0% 0.00%
FindStringSubmatch 1.00 ± 0% 1.00 ± 0% 0.00%
Compile/Onepass 52.0 ± 0% 52.0 ± 0% 0.00%
Compile/Medium 112 ± 0% 112 ± 0% 0.00%
Compile/Hard 424 ± 0% 424 ± 0% 0.00%
Match_onepass_regex/16 0.00 0.00 0.00%
Match_onepass_regex/32 0.00 0.00 0.00%
Match_onepass_regex/1K 0.00 0.00 0.00%
Match_onepass_regex/32K 0.00 0.00 0.00%
Match_onepass_regex/1M 0.00 0.00 0.00%
Match_onepass_regex/32M 2.00 ± 0% 1.00 ± 0% -50.00%
name old speed new speed delta
QuoteMetaAll 75.2MB/s ± 0% 75.3MB/s ± 0% +0.15%
QuoteMetaNone 465MB/s ± 0% 465MB/s ± 0% -0.02%
Match/Easy0/16 2.18GB/s ± 0% 2.18GB/s ± 0% +0.10%
Match/Easy0/32 276MB/s ± 0% 330MB/s ± 0% +19.46%
Match/Easy0/1K 1.73GB/s ± 0% 1.82GB/s ± 0% +5.29%
Match/Easy0/32K 2.60GB/s ± 0% 2.62GB/s ± 0% +0.64%
Match/Easy0/1M 1.89GB/s ± 0% 1.89GB/s ± 0% +0.00%
Match/Easy0/32M 1.89GB/s ± 0% 1.89GB/s ± 0% -0.05%
Match/Easy0i/16 2.18GB/s ± 0% 2.18GB/s ± 0% -0.10%
Match/Easy0i/32 11.4MB/s ± 0% 19.5MB/s ± 0% +71.48%
Match/Easy0i/1K 12.3MB/s ± 0% 21.2MB/s ± 0% +72.62%
Match/Easy0i/32K 15.4MB/s ± 0% 18.2MB/s ± 0% +18.12%
Match/Easy0i/1M 15.4MB/s ± 0% 18.2MB/s ± 0% +18.12%
Match/Easy0i/32M 15.4MB/s ± 0% 18.6MB/s ± 0% +21.21%
Match/Easy1/16 2.17GB/s ± 0% 2.18GB/s ± 0% +0.24%
Match/Easy1/32 271MB/s ± 0% 333MB/s ± 0% +23.07%
Match/Easy1/1K 417MB/s ± 0% 648MB/s ± 0% +55.38%
Match/Easy1/32K 409MB/s ± 0% 600MB/s ± 0% +46.88%
Match/Easy1/1M 381MB/s ± 0% 558MB/s ± 0% +46.33%
Match/Easy1/32M 383MB/s ± 0% 561MB/s ± 0% +46.25%
Match/Medium/16 2.18GB/s ± 0% 2.18GB/s ± 0% -0.01%
Match/Medium/32 12.3MB/s ± 0% 21.4MB/s ± 0% +74.13%
Match/Medium/1K 13.1MB/s ± 0% 23.4MB/s ± 0% +78.73%
Match/Medium/32K 15.7MB/s ± 0% 21.6MB/s ± 0% +37.23%
Match/Medium/1M 15.8MB/s ± 0% 21.6MB/s ± 0% +36.93%
Match/Medium/32M 15.7MB/s ± 0% 21.0MB/s ± 0% +33.67%
Match/Hard/16 2.18GB/s ± 0% 2.18GB/s ± 0% -0.03%
Match/Hard/32 8.93MB/s ± 0% 13.10MB/s ± 0% +46.70%
Match/Hard/1K 9.48MB/s ± 0% 13.74MB/s ± 0% +44.94%
Match/Hard/32K 11.7MB/s ± 0% 14.5MB/s ± 0% +23.87%
Match/Hard/1M 11.7MB/s ± 0% 14.5MB/s ± 0% +23.87%
Match/Hard/32M 11.6MB/s ± 0% 14.2MB/s ± 0% +22.86%
Match/Hard1/16 1.44MB/s ± 0% 1.93MB/s ± 0% +34.03%
Match/Hard1/32 1.49MB/s ± 0% 1.99MB/s ± 0% +33.56%
Match/Hard1/1K 1.56MB/s ± 0% 2.05MB/s ± 0% +31.41%
Match/Hard1/32K 2.68MB/s ± 0% 2.80MB/s ± 0% +4.48%
Match/Hard1/1M 2.68MB/s ± 0% 2.80MB/s ± 0% +4.48%
Match/Hard1/32M 2.66MB/s ± 0% 2.79MB/s ± 0% +4.89%
Match_onepass_regex/16 18.4MB/s ± 0% 26.2MB/s ± 0% +42.41%
Match_onepass_regex/32 20.2MB/s ± 0% 29.5MB/s ± 0% +45.92%
Match_onepass_regex/1K 22.4MB/s ± 0% 33.8MB/s ± 0% +50.54%
Match_onepass_regex/32K 22.6MB/s ± 0% 33.9MB/s ± 0% +49.67%
Match_onepass_regex/1M 22.7MB/s ± 0% 33.9MB/s ± 0% +49.27%
Match_onepass_regex/32M 23.0MB/s ± 0% 33.9MB/s ± 0% +47.14%
Fixes #42709
Change-Id: Ice07fec2de4c5b1302febf8c2978ae8c1e4fd3e5
Reviewed-on: https://go-review.googlesource.com/c/go/+/271337
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
Trust: Carlos Eduardo Seo <carlos.seo@linaro.org>
|
|
Combine (AND m (SRWconst x)) or (SRWconst (AND m x)) when mask m is
and the shift value produce constant which can be encoded into an
RLWINM instruction.
Combine (CLRLSLDI (SRWconst x)) if the combining of the underling rotate
masks produces a constant which can be encoded into RLWINM.
Likewise for (SLDconst (SRWconst x)) and (CLRLSDI (RLWINM x)).
Combine rotate word + and operations which can be encoded as a single
RLWINM/RLWNM instruction.
The most notable performance improvements arise from the crypto
benchmarks below (GOARCH=power8 on a ppc64le/linux):
pkg:golang.org/x/crypto/blowfish goos:linux goarch:ppc64le
ExpandKeyWithSalt 52.2µs ± 0% 47.5µs ± 0% -8.88%
ExpandKey 44.4µs ± 0% 40.3µs ± 0% -9.15%
pkg:golang.org/x/crypto/ssh/internal/bcrypt_pbkdf goos:linux goarch:ppc64le
Key 57.6ms ± 0% 52.3ms ± 0% -9.13%
pkg:golang.org/x/crypto/bcrypt goos:linux goarch:ppc64le
Equal 90.9ms ± 0% 82.6ms ± 0% -9.13%
DefaultCost 91.0ms ± 0% 82.7ms ± 0% -9.12%
Change-Id: I59a0ca29face38f4ab46e37124c32906f216c4ce
Reviewed-on: https://go-review.googlesource.com/c/go/+/260798
Run-TryBot: Carlos Eduardo Seo <carlos.seo@linaro.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Carlos Eduardo Seo <carlos.seo@linaro.com>
Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
|
|
This is a simple case of changing the operand size of the existing 8-bit
And/Or.
I've also updated a few operand descriptions that were out-of-sync with
the implementation.
Change-Id: I95ac4445d08f7958768aec9a233698a2d652a39a
Reviewed-on: https://go-review.googlesource.com/c/go/+/263150
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Michael Pratt <mpratt@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
This adds support to allow the use of mulli when one of the multiply
operands is a constant that fits in 16 bits.
This especially helps in the case where this instruction appears in
a loop since the load of the constant is not being moved out of the loop.
Some improvements seen in compress/flate on power9:
Decode/Digits/Huffman/1e4 259µs ± 0% 261µs ± 0% +0.57% (p=1.000 n=1+1)
Decode/Digits/Huffman/1e5 2.43ms ± 0% 2.45ms ± 0% +0.79% (p=1.000 n=1+1)
Decode/Digits/Huffman/1e6 23.9ms ± 0% 24.2ms ± 0% +0.86% (p=1.000 n=1+1)
Decode/Digits/Speed/1e4 278µs ± 0% 279µs ± 0% +0.34% (p=1.000 n=1+1)
Decode/Digits/Speed/1e5 2.80ms ± 0% 2.81ms ± 0% +0.29% (p=1.000 n=1+1)
Decode/Digits/Speed/1e6 28.0ms ± 0% 28.1ms ± 0% +0.28% (p=1.000 n=1+1)
Decode/Digits/Default/1e4 278µs ± 0% 278µs ± 0% +0.28% (p=1.000 n=1+1)
Decode/Digits/Default/1e5 2.68ms ± 0% 2.69ms ± 0% +0.19% (p=1.000 n=1+1)
Decode/Digits/Default/1e6 26.6ms ± 0% 26.6ms ± 0% +0.21% (p=1.000 n=1+1)
Decode/Digits/Compression/1e4 278µs ± 0% 278µs ± 0% +0.00% (p=1.000 n=1+1)
Decode/Digits/Compression/1e5 2.68ms ± 0% 2.69ms ± 0% +0.21% (p=1.000 n=1+1)
Decode/Digits/Compression/1e6 26.6ms ± 0% 26.6ms ± 0% +0.07% (p=1.000 n=1+1)
Decode/Newton/Huffman/1e4 322µs ± 0% 312µs ± 0% -2.84% (p=1.000 n=1+1)
Decode/Newton/Huffman/1e5 3.11ms ± 0% 2.91ms ± 0% -6.41% (p=1.000 n=1+1)
Decode/Newton/Huffman/1e6 31.4ms ± 0% 29.3ms ± 0% -6.85% (p=1.000 n=1+1)
Decode/Newton/Speed/1e4 282µs ± 0% 269µs ± 0% -4.69% (p=1.000 n=1+1)
Decode/Newton/Speed/1e5 2.29ms ± 0% 2.20ms ± 0% -4.13% (p=1.000 n=1+1)
Decode/Newton/Speed/1e6 22.7ms ± 0% 21.3ms ± 0% -6.06% (p=1.000 n=1+1)
Decode/Newton/Default/1e4 254µs ± 0% 237µs ± 0% -6.60% (p=1.000 n=1+1)
Decode/Newton/Default/1e5 1.86ms ± 0% 1.75ms ± 0% -5.99% (p=1.000 n=1+1)
Decode/Newton/Default/1e6 18.1ms ± 0% 17.4ms ± 0% -4.10% (p=1.000 n=1+1)
Decode/Newton/Compression/1e4 254µs ± 0% 244µs ± 0% -3.91% (p=1.000 n=1+1)
Decode/Newton/Compression/1e5 1.85ms ± 0% 1.79ms ± 0% -3.10% (p=1.000 n=1+1)
Decode/Newton/Compression/1e6 18.0ms ± 0% 17.3ms ± 0% -3.88% (p=1.000 n=1+1)
Change-Id: I840320fab1c4bf64c76b001c2651ab79f23df4eb
Reviewed-on: https://go-review.googlesource.com/c/go/+/259444
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Paul Murphy <murp@ibm.com>
Reviewed-by: Carlos Eduardo Seo <carlos.seo@gmail.com>
Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
|
|
A recent change to improve shifts was generating some
invalid cases when the rule was based on an AND. The
extended mnemonics CLRLSLDI and CLRLSLWI only allow
certain values for the operands and in the mask case
those values were not being checked properly. This
adds a check to those rules to verify that the
'b' and 'n' values used when an AND was part of the rule
have correct values.
There was a bug in some diag messages in asm9. The
message expected 3 values but only provided 2. Those are
corrected here also.
The test/codegen/shift.go was updated to add a few more
cases to check for the case mentioned here.
Some of the comments that mention the order of operands
in these extended mnemonics were wrong and those have been
corrected.
Fixes #41683.
Change-Id: If5bb860acaa5051b9e0cd80784b2868b85898c31
Reviewed-on: https://go-review.googlesource.com/c/go/+/258138
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Paul Murphy <murp@ibm.com>
Reviewed-by: Carlos Eduardo Seo <carlos.seo@gmail.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
|
|
This adds support for the extswsli instruction which combines
extsw followed by a shift.
New benchmark demonstrates the improvement:
name old time/op new time/op delta
ExtShift 1.34µs ± 0% 1.30µs ± 0% -3.15% (p=0.057 n=4+3)
Change-Id: I21b410676fdf15d20e0cbbaa75d7c6dcd3bbb7b0
Reviewed-on: https://go-review.googlesource.com/c/go/+/257017
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Carlos Eduardo Seo <carlos.seo@gmail.com>
Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
|
|
This change adds rules to find pairs of instructions that can
be combined into a single shifts. These instruction sequences
are common in array addressing within loops. Improvements can
be seen in many crypto packages and the hash packages.
These are based on the extended mnemonics found in the ISA
sections C.8.1 and C.8.2.
Some rules in PPC64.rules were moved because the ordering prevented
some matching.
The following results were generated on power9.
hash/crc32:
CRC32/poly=Koopman/size=40/align=0 195ns ± 0% 163ns ± 0% -16.41%
CRC32/poly=Koopman/size=40/align=1 200ns ± 0% 163ns ± 0% -18.50%
CRC32/poly=Koopman/size=512/align=0 1.98µs ± 0% 1.67µs ± 0% -15.46%
CRC32/poly=Koopman/size=512/align=1 1.98µs ± 0% 1.69µs ± 0% -14.80%
CRC32/poly=Koopman/size=1kB/align=0 3.90µs ± 0% 3.31µs ± 0% -15.27%
CRC32/poly=Koopman/size=1kB/align=1 3.85µs ± 0% 3.31µs ± 0% -14.15%
CRC32/poly=Koopman/size=4kB/align=0 15.3µs ± 0% 13.1µs ± 0% -14.22%
CRC32/poly=Koopman/size=4kB/align=1 15.4µs ± 0% 13.1µs ± 0% -14.79%
CRC32/poly=Koopman/size=32kB/align=0 137µs ± 0% 105µs ± 0% -23.56%
CRC32/poly=Koopman/size=32kB/align=1 137µs ± 0% 105µs ± 0% -23.53%
crypto/rc4:
RC4_128 733ns ± 0% 650ns ± 0% -11.32% (p=1.000 n=1+1)
RC4_1K 5.80µs ± 0% 5.17µs ± 0% -10.89% (p=1.000 n=1+1)
RC4_8K 45.7µs ± 0% 40.8µs ± 0% -10.73% (p=1.000 n=1+1)
crypto/sha1:
Hash8Bytes 635ns ± 0% 613ns ± 0% -3.46% (p=1.000 n=1+1)
Hash320Bytes 2.30µs ± 0% 2.18µs ± 0% -5.38% (p=1.000 n=1+1)
Hash1K 5.88µs ± 0% 5.38µs ± 0% -8.62% (p=1.000 n=1+1)
Hash8K 42.0µs ± 0% 37.9µs ± 0% -9.75% (p=1.000 n=1+1)
There are other improvements found in golang.org/x/crypto which are all in the
range of 5-15%.
Change-Id: I193471fbcf674151ffe2edab212799d9b08dfb8c
Reviewed-on: https://go-review.googlesource.com/c/go/+/252097
Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
|
|
This merges an lis + subf into subfic, and for 32b constants
lwa + subf into oris + ori + subf.
The carry bit is no longer used in code generation, therefore
I think we can clobber it as needed. Note, lowered borrow/carry
arithmetic is self-contained and thus is not affected.
A few extra rules are added to ensure early transformations to
SUBFCconst don't trip up earlier rules, fold constant operations,
or otherwise simplify lowering. Likewise, tests are added to
ensure all rules are hit. Generic constant folding catches
trivial cases, however some lowering rules insert arithmetic
which can introduce new opportunities (e.g BitLen or Slicemask).
I couldn't find a specific benchmark to demonstrate noteworthy
improvements, but this is generating subfic in many of the default
bent test binaries, so we are at least saving a little code space.
Change-Id: Iad7c6e5767eaa9dc24dc1c989bd1c8cfe1982012
Reviewed-on: https://go-review.googlesource.com/c/go/+/249461
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
|
|
The intermediate SSA opcodes* are no longer generated during the
lowering pass. The shifting rules have been improved using ISEL.
Therefore, we can remove them and the rules which expand them.
* The removed opcodes are:
LoweredAdd64Carry
ADDconstForCarry
MaskIfNotCarry
FlagCarryClear
FlagCarrySet
Change-Id: I1ebe2726ed988f29ed4800c8f57b428f7a214cd0
Reviewed-on: https://go-review.googlesource.com/c/go/+/249462
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
|
|
Add a new lowering rule to match and replace such instances
with the MADDLD instruction available on power9 where
possible.
Likewise, this plumbs in a new ppc64 ssa opcode to house
the newly generated MADDLD instructions.
When testing ed25519, this reduced binary size by 936B.
Similarly, MADDLD combination occcurs in a few other less
obvious cases such as division by constant.
Testing of golang.org/x/crypto/ed25519 shows non-trivial
speedup during keygeneration:
name old time/op new time/op delta
KeyGeneration 65.2µs ± 0% 63.1µs ± 0% -3.19%
Signing 64.3µs ± 0% 64.4µs ± 0% +0.16%
Verification 147µs ± 0% 147µs ± 0% +0.11%
Similarly, this test binary has shrunk by 66488B.
Change-Id: I077aeda7943119b41f07e4e62e44a648f16e4ad0
Reviewed-on: https://go-review.googlesource.com/c/go/+/248723
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
|
|
This updates the PPC64.rules file to use the MOD instructions
that are available in power9. Prior to power9 this is done
using a longer sequence with multiply and divide.
Included in this change is removal of the REM* opcode variations
that set the CC or OV bits since their settings are based
on the DIV and are not appropriate for the REM.
Change-Id: Iceed9ce33e128e1911c15592ee674276ce8ba3fa
Reviewed-on: https://go-review.googlesource.com/c/go/+/229761
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
Will help with strongly typed rewrite rules.
Change-Id: Ifbf316a49f4081322b3b8f13bc962713437d9aba
Reviewed-on: https://go-review.googlesource.com/c/go/+/227785
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
|
|
This change includes the following:
- Generate LXV/STXV sequences instead of LXVD2X/STXVD2X on power9.
These instructions do not require an index register, which
allows more loads and stores within a loop without initializing
multiple index registers. The LoweredQuadXXX generate LXV/STXV.
- Create LoweredMoveXXXShort and LoweredZeroXXXShort for short
moves that don't generate loops, and therefore don't clobber the
address registers or flags.
- Use registers other than R3 and R4 to avoid conflicting with
registers that have already been allocated to avoid unnecessary
register moves.
- Eliminate the use of R14 as scratch register and use R31
instead.
- Add PCALIGN when the LoweredMoveXXX or LoweredZeroXXX generates a
loop with more than 3 iterations.
This performance opportunity was noticed in github.com/golang/snappy
benchmarks. Results on power9:
WordsDecode1e1 54.1ns ± 0% 53.8ns ± 0% -0.51% (p=0.029 n=4+4)
WordsDecode1e2 287ns ± 0% 282ns ± 1% -1.83% (p=0.029 n=4+4)
WordsDecode1e3 3.98µs ± 0% 3.64µs ± 0% -8.52% (p=0.029 n=4+4)
WordsDecode1e4 66.9µs ± 0% 67.0µs ± 0% +0.20% (p=0.029 n=4+4)
WordsDecode1e5 723µs ± 0% 723µs ± 0% -0.01% (p=0.200 n=4+4)
WordsDecode1e6 7.21ms ± 0% 7.21ms ± 0% -0.02% (p=1.000 n=4+4)
WordsEncode1e1 29.9ns ± 0% 29.4ns ± 0% -1.51% (p=0.029 n=4+4)
WordsEncode1e2 2.12µs ± 0% 1.75µs ± 0% -17.70% (p=0.029 n=4+4)
WordsEncode1e3 11.7µs ± 0% 11.2µs ± 0% -4.61% (p=0.029 n=4+4)
WordsEncode1e4 119µs ± 0% 120µs ± 0% +0.36% (p=0.029 n=4+4)
WordsEncode1e5 1.21ms ± 0% 1.22ms ± 0% +0.41% (p=0.029 n=4+4)
WordsEncode1e6 12.0ms ± 0% 12.0ms ± 0% +0.57% (p=0.029 n=4+4)
RandomEncode 286µs ± 0% 203µs ± 0% -28.82% (p=0.029 n=4+4)
ExtendMatch 47.4µs ± 0% 47.0µs ± 0% -0.85% (p=0.029 n=4+4)
Change-Id: Iecad3a39ae55280286e42760a5c9d5c1168f5858
Reviewed-on: https://go-review.googlesource.com/c/go/+/226539
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
Because of the index, these ops can't guarantee faulting if arg0 is nil.
Clean up the PPC64 index ops - they can't take a sym or an offset.
Noticed while debugging #37881. I don't think it is the cause, but I guess
there is a chance.
Update #37881
Change-Id: Ic22925250bf7b1ba64e3cea1a65638bc4bab390c
Reviewed-on: https://go-review.googlesource.com/c/go/+/224457
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
Const64 gets lowered to MOVDconst.
Change rules using interior Const64 to use MOVDconst instead,
to be less dependent on rule application order.
As a result of doing this, some of the rules end up being
exact duplicates; remove those.
We had those exact duplicates because of the order dependency;
ppc64 had no way to optimize away shifts by a constant
if the initial lowering didn't catch it.
Add those optimizations as well.
The outcome is the same, but this makes the overall rules more robust.
Change-Id: Iadd97a9fe73d52358d571d022ace145e506d160b
Reviewed-on: https://go-review.googlesource.com/c/go/+/220877
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
|
|
Change-Id: If82ebd9cd6470863eb5de9e031e7905a66218857
Reviewed-on: https://go-review.googlesource.com/c/go/+/204159
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
On PPC64, indirect calls can be made through LR or CTR. Currently
both are used. This CL changes it to always use LR.
For async preemption, to return from the injected call, we need
an indirect jump back to the PC we preeempted. This jump can be
made through LR or CTR. So we'll have to clobber either LR or CTR.
Currently, LR is used more frequently. In particular, for a leaf
function, LR is live throughout the function. We don't want to
make leaf functions nonpreemptible. So we choose CTR for the call
injection. For code sequences that use CTR, if it is ok to use
another register, change it to.
Plus, it is a call so it will clobber LR anyway. It doesn't need
to also clobber CTR (even without preemption).
Change-Id: I07bd0e93b94a1a3aa2be2cd465801136165d8ab8
Reviewed-on: https://go-review.googlesource.com/c/go/+/203822
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
For #10958, #24543, but makes sense on its own.
Change-Id: I2a87dab66b82a1863e4b6512b1f8def51463ce2a
Reviewed-on: https://go-review.googlesource.com/c/go/+/203284
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
ZeroAuto was used with the ambiguously live logic. The
ambiguously live logic is removed as we switched to stack
objects. It is now never called. Remove.
Change-Id: If4cdd7fed5297f8ab591cc392a76c80f57820856
Reviewed-on: https://go-review.googlesource.com/c/go/+/203538
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
This improves the code generated for LoweredMove and LoweredZero by
using LXVD2X and STXVD2X to move 16 bytes at a time. These instructions
are now used if the size to be moved or zeroed is >= 64. These same
instructions have already been used in the asm implementations for
memmove and memclr.
Some examples where this shows an improvement on power8:
MakeSlice/Byte 27.3ns ± 1% 25.2ns ± 0% -7.69%
MakeSlice/Int16 40.2ns ± 0% 35.2ns ± 0% -12.39%
MakeSlice/Int 94.9ns ± 1% 77.9ns ± 0% -17.92%
MakeSlice/Ptr 129ns ± 1% 103ns ± 0% -20.16%
MakeSlice/Struct/24 176ns ± 1% 131ns ± 0% -25.67%
MakeSlice/Struct/32 200ns ± 1% 142ns ± 0% -29.09%
MakeSlice/Struct/40 220ns ± 2% 156ns ± 0% -28.82%
GrowSlice/Byte 81.4ns ± 0% 73.4ns ± 0% -9.88%
GrowSlice/Int16 118ns ± 1% 98ns ± 0% -17.03%
GrowSlice/Int 178ns ± 1% 134ns ± 1% -24.65%
GrowSlice/Ptr 249ns ± 4% 212ns ± 0% -14.94%
GrowSlice/Struct/24 294ns ± 5% 215ns ± 0% -27.08%
GrowSlice/Struct/32 315ns ± 1% 248ns ± 0% -21.49%
GrowSlice/Struct/40 382ns ± 4% 289ns ± 1% -24.38%
ExtendSlice/IntSlice 109ns ± 1% 90ns ± 1% -17.51%
ExtendSlice/PointerSlice 142ns ± 2% 118ns ± 0% -16.75%
ExtendSlice/NoGrow 6.02ns ± 0% 5.88ns ± 0% -2.33%
Append 27.2ns ± 0% 27.6ns ± 0% +1.38%
AppendGrowByte 4.20ms ± 3% 2.60ms ± 0% -38.18%
AppendGrowString 134ms ± 3% 102ms ± 2% -23.62%
AppendSlice/1Bytes 5.65ns ± 0% 5.67ns ± 0% +0.35%
AppendSlice/4Bytes 6.40ns ± 0% 6.55ns ± 0% +2.34%
AppendSlice/7Bytes 8.74ns ± 0% 8.84ns ± 0% +1.14%
AppendSlice/8Bytes 5.68ns ± 0% 5.70ns ± 0% +0.40%
AppendSlice/15Bytes 9.31ns ± 0% 9.39ns ± 0% +0.86%
AppendSlice/16Bytes 14.0ns ± 0% 5.8ns ± 0% -58.32%
AppendSlice/32Bytes 5.72ns ± 0% 5.68ns ± 0% -0.66%
AppendSliceLarge/1024Bytes 918ns ± 8% 615ns ± 1% -33.00%
AppendSliceLarge/4096Bytes 3.25µs ± 1% 1.92µs ± 1% -40.84%
AppendSliceLarge/16384Bytes 8.70µs ± 2% 4.69µs ± 0% -46.08%
AppendSliceLarge/65536Bytes 18.1µs ± 3% 7.9µs ± 0% -56.30%
AppendSliceLarge/262144Bytes 69.8µs ± 2% 25.9µs ± 0% -62.91%
AppendSliceLarge/1048576Bytes 258µs ± 1% 93µs ± 0% -63.96%
Change-Id: I21625dbe231a2029ddb9f7d73f5a6417b35c1e49
Reviewed-on: https://go-review.googlesource.com/c/go/+/199639
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
Control values are used to choose which successor of a block is
jumped to. Typically a control value takes the form of a 'flags'
value that represents the result of a comparison. Some
architectures however use a variable in a register as a control
value.
Up until now we have managed with a single control value per block.
However some architectures (e.g. s390x and riscv64) have combined
compare-and-branch instructions that take two variables in registers
as parameters. To generate these instructions we need to support 2
control values per block.
This CL allows up to 2 control values to be used in a block in
order to support the addition of compare-and-branch instructions.
I have implemented s390x compare-and-branch instructions in a
different CL.
Passes toolstash-check -all.
Results of compilebench:
name old time/op new time/op delta
Template 208ms ± 1% 209ms ± 1% ~ (p=0.289 n=20+20)
Unicode 83.7ms ± 1% 83.3ms ± 3% -0.49% (p=0.017 n=18+18)
GoTypes 748ms ± 1% 748ms ± 0% ~ (p=0.460 n=20+18)
Compiler 3.47s ± 1% 3.48s ± 1% ~ (p=0.070 n=19+18)
SSA 11.5s ± 1% 11.7s ± 1% +1.64% (p=0.000 n=19+18)
Flate 130ms ± 1% 130ms ± 1% ~ (p=0.588 n=19+20)
GoParser 160ms ± 1% 161ms ± 1% ~ (p=0.211 n=20+20)
Reflect 465ms ± 1% 467ms ± 1% +0.42% (p=0.007 n=20+20)
Tar 184ms ± 1% 185ms ± 2% ~ (p=0.087 n=18+20)
XML 253ms ± 1% 253ms ± 1% ~ (p=0.377 n=20+18)
LinkCompiler 769ms ± 2% 774ms ± 2% ~ (p=0.070 n=19+19)
ExternalLinkCompiler 3.59s ±11% 3.68s ± 6% ~ (p=0.072 n=20+20)
LinkWithoutDebugCompiler 446ms ± 5% 454ms ± 3% +1.79% (p=0.002 n=19+20)
StdCmd 26.0s ± 2% 26.0s ± 2% ~ (p=0.799 n=20+20)
name old user-time/op new user-time/op delta
Template 238ms ± 5% 240ms ± 5% ~ (p=0.142 n=20+20)
Unicode 105ms ±11% 106ms ±10% ~ (p=0.512 n=20+20)
GoTypes 876ms ± 2% 873ms ± 4% ~ (p=0.647 n=20+19)
Compiler 4.17s ± 2% 4.19s ± 1% ~ (p=0.093 n=20+18)
SSA 13.9s ± 1% 14.1s ± 1% +1.45% (p=0.000 n=18+18)
Flate 145ms ±13% 146ms ± 5% ~ (p=0.851 n=20+18)
GoParser 185ms ± 5% 188ms ± 7% ~ (p=0.174 n=20+20)
Reflect 534ms ± 3% 538ms ± 2% ~ (p=0.105 n=20+18)
Tar 215ms ± 4% 211ms ± 9% ~ (p=0.079 n=19+20)
XML 295ms ± 6% 295ms ± 5% ~ (p=0.968 n=20+20)
LinkCompiler 832ms ± 4% 837ms ± 7% ~ (p=0.707 n=17+20)
ExternalLinkCompiler 1.58s ± 8% 1.60s ± 4% ~ (p=0.296 n=20+19)
LinkWithoutDebugCompiler 478ms ±12% 489ms ±10% ~ (p=0.429 n=20+20)
name old object-bytes new object-bytes delta
Template 559kB ± 0% 559kB ± 0% ~ (all equal)
Unicode 216kB ± 0% 216kB ± 0% ~ (all equal)
GoTypes 2.03MB ± 0% 2.03MB ± 0% ~ (all equal)
Compiler 8.07MB ± 0% 8.07MB ± 0% -0.06% (p=0.000 n=20+20)
SSA 27.1MB ± 0% 27.3MB ± 0% +0.89% (p=0.000 n=20+20)
Flate 343kB ± 0% 343kB ± 0% ~ (all equal)
GoParser 441kB ± 0% 441kB ± 0% ~ (all equal)
Reflect 1.36MB ± 0% 1.36MB ± 0% ~ (all equal)
Tar 487kB ± 0% 487kB ± 0% ~ (all equal)
XML 632kB ± 0% 632kB ± 0% ~ (all equal)
name old export-bytes new export-bytes delta
Template 18.5kB ± 0% 18.5kB ± 0% ~ (all equal)
Unicode 7.92kB ± 0% 7.92kB ± 0% ~ (all equal)
GoTypes 35.0kB ± 0% 35.0kB ± 0% ~ (all equal)
Compiler 109kB ± 0% 110kB ± 0% +0.72% (p=0.000 n=20+20)
SSA 137kB ± 0% 138kB ± 0% +0.58% (p=0.000 n=20+20)
Flate 4.89kB ± 0% 4.89kB ± 0% ~ (all equal)
GoParser 8.49kB ± 0% 8.49kB ± 0% ~ (all equal)
Reflect 11.4kB ± 0% 11.4kB ± 0% ~ (all equal)
Tar 10.5kB ± 0% 10.5kB ± 0% ~ (all equal)
XML 16.7kB ± 0% 16.7kB ± 0% ~ (all equal)
name old text-bytes new text-bytes delta
HelloSize 761kB ± 0% 761kB ± 0% ~ (all equal)
CmdGoSize 10.8MB ± 0% 10.8MB ± 0% ~ (all equal)
name old data-bytes new data-bytes delta
HelloSize 10.7kB ± 0% 10.7kB ± 0% ~ (all equal)
CmdGoSize 312kB ± 0% 312kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 122kB ± 0% 122kB ± 0% ~ (all equal)
CmdGoSize 146kB ± 0% 146kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.13MB ± 0% 1.13MB ± 0% ~ (all equal)
CmdGoSize 15.1MB ± 0% 15.1MB ± 0% ~ (all equal)
Change-Id: I3cc2f9829a109543d9a68be4a21775d2d3e9801f
Reviewed-on: https://go-review.googlesource.com/c/go/+/196557
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
PPC64's ANDCC, ORCC, XORCC SSA ops produce a flags value, which
should not have register mask of an integer register.
Fixes #34468.
Change-Id: Ic762e423b20275fd9f8118dae7951c258d59738c
Reviewed-on: https://go-review.googlesource.com/c/go/+/196960
Reviewed-by: Keith Randall <khr@golang.org>
|
|
This cleans up the isel code generation in ssa for ppc64x.
Current there is no isel op and the isel code is only
generated from pseudo ops in ppc64/ssa.go, and only using
operands with values 0 or 1. When the isel is generated,
there is always a load of 1 into the temp register before it.
This change implements the isel op so it can be used in PPC64.rules,
and can recognize operand values other than 0 or 1. This also
eliminates the forced load of 1, so it will be loaded only if
needed.
This will make the isel code generation consistent with other ops,
and allow future rule changes that can take advantage of having
a more general purpose isel rule.
Change-Id: I363e1dbd3f7f5dfecb53187ad51cce409a8d1f8d
Reviewed-on: https://go-review.googlesource.com/c/go/+/195057
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
|
|
Most changes are removing redundant declaration of type when direct
instantiating value of map or slice, e.g. []T{T{}} become []T{{}}.
Small changes are removing the high order of subslice if its value
is the length of slice itself, e.g. T[:len(T)] become T[:].
The following file is excluded due to incompatibility with go1.4,
- src/cmd/compile/internal/gc/ssa.go
Change-Id: Id3abb09401795ce1e6da591a89749cba8502fb26
Reviewed-on: https://go-review.googlesource.com/c/go/+/166437
Run-TryBot: Dave Cheney <dave@cheney.net>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
|
|
Change-Id: Id52a5730cf9207ee7ccebac4ef12791dc5720e7c
Reviewed-on: https://go-review.googlesource.com/c/go/+/172283
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
|
|
This is a follow up from a review comment at the end of the last
Go release, to provide a more meaningful name for ginsnop2.
Updates #30475
Change-Id: Ice9efd763bf2204a9e8c55ae230d3e8a80210108
Reviewed-on: https://go-review.googlesource.com/c/go/+/174757
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
This change creates an intrinsic for Add64 for ppc64x and adds a
testcase for it.
name old time/op new time/op delta
Add64-160 1.90ns ±40% 2.29ns ± 0% ~ (p=0.119 n=5+5)
Add64multiple-160 6.69ns ± 2% 2.45ns ± 4% -63.47% (p=0.016 n=4+5)
Change-Id: I9abe6fb023fdf62eea3c9b46a1820f60bb0a7f97
Reviewed-on: https://go-review.googlesource.com/c/go/+/173758
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Run-TryBot: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
|
|
This opcode was only used to mark unreachable code for plive to use.
plive now uses the SSA representation, so it knows locations are
unreachable because they are ends of Exit blocks. It doesn't need
these opcodes any more.
These opcodes actually used space in the binary, 2 bytes per undef
on x86 and more for other archs.
Makes the amd64 go binary 0.2% smaller.
Change-Id: I64c84c35db7c7949617a3a5830f09c8e5fcd2620
Reviewed-on: https://go-review.googlesource.com/c/go/+/171058
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
zeros instructions on POWER9
This change adds new POWER9 instructions for counting trailing zeros (CNTTZW/CNTTZD)
to the assembler and generates them in SSA when GOPPC64=power9.
name old time/op new time/op delta
TrailingZeros-160 1.59ns ±20% 1.45ns ±10% -8.81% (p=0.000 n=14+13)
TrailingZeros8-160 1.55ns ±23% 1.62ns ±44% ~ (p=0.593 n=13+15)
TrailingZeros16-160 1.78ns ±23% 1.62ns ±38% -9.31% (p=0.003 n=14+14)
TrailingZeros32-160 1.64ns ±10% 1.49ns ± 9% -9.15% (p=0.000 n=13+14)
TrailingZeros64-160 1.53ns ± 6% 1.45ns ± 5% -5.38% (p=0.000 n=15+13)
Change-Id: I365e6ff79f3ce4d8ebe089a6a86b1771853eb596
Reviewed-on: https://go-review.googlesource.com/c/go/+/167517
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
|
|
A few examples (for accessing a slice of length 3):
s[-1] runtime error: index out of range [-1]
s[3] runtime error: index out of range [3] with length 3
s[-1:0] runtime error: slice bounds out of range [-1:]
s[3:0] runtime error: slice bounds out of range [3:0]
s[3:-1] runtime error: slice bounds out of range [:-1]
s[3:4] runtime error: slice bounds out of range [:4] with capacity 3
s[0:3:4] runtime error: slice bounds out of range [::4] with capacity 3
Note that in cases where there are multiple things wrong with the
indexes (e.g. s[3:-1]), we report one of those errors kind of
arbitrarily, currently the rightmost one.
An exhaustive set of examples is in issue30116[u].out in the CL.
The message text has the same prefix as the old message text. That
leads to slightly awkward phrasing but hopefully minimizes the chance
that code depending on the error text will break.
Increases the size of the go binary by 0.5% (amd64). The panic functions
take arguments in registers in order to keep the size of the compiled code
as small as possible.
Fixes #30116
Change-Id: Idb99a827b7888822ca34c240eca87b7e44a04fdd
Reviewed-on: https://go-review.googlesource.com/c/go/+/161477
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
|
|
tracebacks
A recent change to fix stacktraces for inlined functions
introduced a regression on ppc64le when compiling position
independent code. That happened because ginsnop2 was called for
the purpose of inserting a NOP to identify the location of
the inlined function, when ginsnop should have been used.
ginsnop2 is intended to be used before deferreturn to ensure
r2 is properly restored when compiling position independent code.
In some cases the location where r2 is loaded from might not be
initialized. If that happens and r2 is used to generate an address,
the result is likely a SEGV.
This fixes that problem.
Fixes #30283
Change-Id: If70ef27fc65ef31969712422306ac3a57adbd5b6
Reviewed-on: https://go-review.googlesource.com/c/163337
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Andrew Bonventre <andybons@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Work involved in getting a stack trace is divided between
runtime.Callers and runtime.CallersFrames.
Before this CL, runtime.Callers returns a pc per runtime frame.
runtime.CallersFrames is responsible for expanding a runtime frame
into potentially multiple user frames.
After this CL, runtime.Callers returns a pc per user frame.
runtime.CallersFrames just maps those to user frame info.
Entries in the result of runtime.Callers are now pcs
of the calls (or of the inline marks), not of the instruction
just after the call.
Fixes #29007
Fixes #28640
Update #26320
Change-Id: I1c9567596ff73dc73271311005097a9188c3406f
Reviewed-on: https://go-review.googlesource.com/c/152537
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
|
|
This commit adapts compile tool to create correct nilchecks for AIX.
AIX allows to load a nil pointer. Therefore, the default nilcheck
which issues a load must be replaced by a CMP instruction followed by a
store at 0x0 if the value is nil. The store will trigger a SIGSEGV as on
others OS.
The nilcheck algorithm must be adapted to do not remove nilcheck if it's
only a read. Stores are detected with v.Type.IsMemory().
Tests related to nilptr must be adapted to the previous changements.
nilptr.go cannot be used as it's because the AIX address space starts at
1<<32.
Change-Id: I9f5aaf0b7e185d736a9b119c0ed2fe4e5bd1e7af
Reviewed-on: https://go-review.googlesource.com/c/144538
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
This commit allows the runtime to handle 64bits addresses returned by
mmap syscall on AIX.
Mmap syscall returns addresses on 59bits on AIX. But the Arena
implementation only allows addresses with less than 48 bits.
This commit increases the arena size up to 1<<60 for aix/ppc64.
Update: #25893
Change-Id: Iea72e8a944d10d4f00be915785e33ae82dd6329e
Reviewed-on: https://go-review.googlesource.com/c/138736
Reviewed-by: Austin Clements <austin@google.com>
|