aboutsummaryrefslogtreecommitdiff
path: root/src/cmd/compile/internal/ppc64
AgeCommit message (Collapse)Author
2021-08-03[dev.typeparams] runtime,cmd/compile,cmd/link: replace jmpdefer with a loopAustin Clements
Currently, deferreturn runs deferred functions by backing up its return PC to the deferreturn call, and then effectively tail-calling the deferred function (via jmpdefer). The effect of this is that the deferred function appears to be called directly from the deferee, and when it returns, the deferee calls deferreturn again so it can run the next deferred function if necessary. This unusual flow control leads to a large number of special cases and complications all over the tool chain. This used to be necessary because deferreturn copied the deferred function's argument frame directly into its caller's frame and then had to invoke that call as if it had been called from its caller's frame so it could access it arguments. But now that we've simplified defer processing so the runtime only deals with argument-less closures, this approach is no longer necessary. This CL simplifies all of this by making deferreturn simply call deferred functions in a loop. This eliminates the need for jmpdefer, so we can delete a bunch of per-architecture assembly code. This eliminates several special cases on Wasm, since it couldn't support these calling shenanigans directly and thus had to simulate the loop a different way. Now Wasm can largely work the way the other platforms do. This eliminates the per-architecture Ginsnopdefer operation. On PPC64, this was necessary to reload the TOC pointer after the tail call (since TOC pointers in general make tail calls impossible). The tail call is gone, and in the case where we do force a jump to the deferreturn call when recovering from an open-coded defer, we go through gogo (via runtime.recovery), which handles the TOC. On other platforms, we needed a NOP so traceback didn't get confused by seeing the return to the CALL instruction, rather than the usual return to the instruction following the CALL instruction. Now we don't inject a return to the CALL instruction at all, so this NOP is also unnecessary. The one potential effect of this is that deferreturn could now appear in stack traces from deferred functions. However, this could already happen from open-coded defers, so we've long since marked deferreturn as a "wrapper" so it gets elided not only from printed stack traces, but from runtime.Callers*. This is a retry of CL 337652 because we had to back out its parent. There are no changes in this version. Change-Id: I3f54b7fec1d7ccac71cc6cf6835c6a46b7e5fb6c Reviewed-on: https://go-review.googlesource.com/c/go/+/339397 Trust: Austin Clements <austin@google.com> Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>
2021-07-30[dev.typeparams] Revert "[dev.typeparams] runtime,cmd/compile,cmd/link: ↵Austin Clements
replace jmpdefer with a loop" This reverts CL 227652. I'm reverting CL 337651 and this builds on top of it. Change-Id: I03ce363be44c2a3defff2e43e7b1aad83386820d Reviewed-on: https://go-review.googlesource.com/c/go/+/338709 Trust: Austin Clements <austin@google.com> Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>
2021-07-30[dev.typeparams] runtime,cmd/compile,cmd/link: replace jmpdefer with a loopAustin Clements
Currently, deferreturn runs deferred functions by backing up its return PC to the deferreturn call, and then effectively tail-calling the deferred function (via jmpdefer). The effect of this is that the deferred function appears to be called directly from the deferee, and when it returns, the deferee calls deferreturn again so it can run the next deferred function if necessary. This unusual flow control leads to a large number of special cases and complications all over the tool chain. This used to be necessary because deferreturn copied the deferred function's argument frame directly into its caller's frame and then had to invoke that call as if it had been called from its caller's frame so it could access it arguments. But now that we've simplified defer processing so the runtime only deals with argument-less closures, this approach is no longer necessary. This CL simplifies all of this by making deferreturn simply call deferred functions in a loop. This eliminates the need for jmpdefer, so we can delete a bunch of per-architecture assembly code. This eliminates several special cases on Wasm, since it couldn't support these calling shenanigans directly and thus had to simulate the loop a different way. Now Wasm can largely work the way the other platforms do. This eliminates the per-architecture Ginsnopdefer operation. On PPC64, this was necessary to reload the TOC pointer after the tail call (since TOC pointers in general make tail calls impossible). The tail call is gone, and in the case where we do force a jump to the deferreturn call when recovering from an open-coded defer, we go through gogo (via runtime.recovery), which handles the TOC. On other platforms, we needed a NOP so traceback didn't get confused by seeing the return to the CALL instruction, rather than the usual return to the instruction following the CALL instruction. Now we don't inject a return to the CALL instruction at all, so this NOP is also unnecessary. The one potential effect of this is that deferreturn could now appear in stack traces from deferred functions. However, this could already happen from open-coded defers, so we've long since marked deferreturn as a "wrapper" so it gets elided not only from printed stack traces, but from runtime.Callers*. Change-Id: Ie9f700cd3fb774f498c9edce363772a868407bf7 Reviewed-on: https://go-review.googlesource.com/c/go/+/337652 Trust: Austin Clements <austin@google.com> Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>
2021-04-16internal/buildcfg: move build configuration out of cmd/internal/objabiRuss Cox
The go/build package needs access to this configuration, so move it into a new package available to the standard library. Change-Id: I868a94148b52350c76116451f4ad9191246adcff Reviewed-on: https://go-review.googlesource.com/c/go/+/310731 Trust: Russ Cox <rsc@golang.org> Run-TryBot: Russ Cox <rsc@golang.org> Reviewed-by: Austin Clements <austin@google.com> Reviewed-by: Jay Conrod <jayconrod@google.com>
2021-03-19cmd/compile: add clobberdeadreg modeCherry Zhang
When -clobberdeadreg flag is set, the compiler inserts code that clobbers integer registers at call sites. This may be helpful for debugging register ABI. Only implemented on AMD64 for now. Change-Id: Ia203d3f891c30fd95d0103489056fe01d63a2899 Reviewed-on: https://go-review.googlesource.com/c/go/+/302809 Trust: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2021-03-10cmd/compile/internal: improve handling of DS form offsets on ppc64xLynn Boger
In the ppc64 ISA DS form loads and stores are restricted to offset fields that are a multiple of 4. This is currently handled with checks in the rules that generate MOVDload, MOVWload, MOVDstore and MOVDstorezero to prevent invalid instructions from getting to the assembler. An unhandled case was discovered which led to the search for a better solution to this problem. Now, instead of checking the offset in the rules, this will be detected when processing these Ops in ssaGenValues in ppc64/ssa.go. If the offset is not valid, the address of the symbol to be loaded or stored will be computed using the base register + offset, and that value used in the new base register. With the full address in the base register, the offset field can be zero in the instruction. Updates #44739 Change-Id: I4f3c0c469ae70a63e3add295c9b55ea0e30ef9b3 Reviewed-on: https://go-review.googlesource.com/c/go/+/299789 Trust: Lynn Boger <laboger@linux.vnet.ibm.com> Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2021-03-09cmd/asm,cmd/compile: support 5 operand RLWNM/RLWMI on ppc64Paul E. Murphy
These instructions are actually 5 argument opcodes as specified by the ISA. Prior to this patch, the MB and ME arguments were merged into a single bitmask operand to workaround the limitations of the ppc64 assembler backend. This limitation no longer exists. Thus, we can pass operands for these opcodes without having to merge the MB and ME arguments in the assembler frontend or compiler backend. Likewise, support for 4 operand variants is unchanged. Change-Id: Ib086774f3581edeaadfd2190d652aaaa8a90daeb Reviewed-on: https://go-review.googlesource.com/c/go/+/298750 Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Carlos Eduardo Seo <carlos.seo@linaro.org> Trust: Carlos Eduardo Seo <carlos.seo@linaro.org>
2021-02-25cmd/internal/obj: add Prog.SetFrom3{Reg,Const}Josh Bleecher Snyder
These are the the most common uses, and they reduce line noise. I don't love adding new deprecated APIs, but since they're trivial wrappers, it'll be very easy to update them along with the rest. No functional changes; passes toolstash-check. Change-Id: I691a8175cfef9081180e463c63f326376af3f3a6 Reviewed-on: https://go-review.googlesource.com/c/go/+/296009 Trust: Josh Bleecher Snyder <josharian@gmail.com> Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2020-12-23[dev.regabi] cmd/compile: split out package ssagen [generated]Russ Cox
[git-generate] cd src/cmd/compile/internal/gc rf ' # maxOpenDefers is declared in ssa.go but used only by walk. mv maxOpenDefers walk.go # gc.Arch -> ssagen.Arch # It is not as nice but will do for now. mv Arch ArchInfo mv thearch Arch mv Arch ArchInfo arch.go # Pull dwarf out of pgen.go. mv debuginfo declPos createDwarfVars preInliningDcls \ createSimpleVars createSimpleVar \ createComplexVars createComplexVar \ dwarf.go # Pull high-level compilation out of pgen.go, # leaving only the SSA code. mv compilequeue funccompile compile compilenow \ compileFunctions isInlinableButNotInlined \ initLSym \ compile.go mv BoundsCheckFunc GCWriteBarrierReg ssa.go mv largeStack largeStackFrames CheckLargeStacks pgen.go # All that is left in dcl.go is the nowritebarrierrecCheck mv dcl.go nowb.go # Export API and unexport non-API. mv initssaconfig InitConfig mv isIntrinsicCall IsIntrinsicCall mv ssaDumpInline DumpInline mv initSSATables InitTables mv initSSAEnv InitEnv mv compileSSA Compile mv stackOffset StackOffset mv canSSAType TypeOK mv SSAGenState State mv FwdRefAux fwdRefAux mv cgoSymABIs CgoSymABIs mv readSymABIs ReadSymABIs mv initLSym InitLSym mv useABIWrapGen symabiDefs CgoSymABIs ReadSymABIs InitLSym selectLSym makeABIWrapper setupTextLSym abi.go mv arch.go abi.go nowb.go phi.go pgen.go pgen_test.go ssa.go cmd/compile/internal/ssagen ' rm go.go gsubr.go Change-Id: I47fad6cbf1d1e583fd9139003a08401d7cd048a1 Reviewed-on: https://go-review.googlesource.com/c/go/+/279476 Trust: Russ Cox <rsc@golang.org> Run-TryBot: Russ Cox <rsc@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2020-12-23[dev.regabi] cmd/compile: split out package objw [generated]Russ Cox
Object file writing routines are used not just at the end of the compilation but also during static data layout in walk. Split them into their own package. [git-generate] cd src/cmd/compile/internal/gc rf ' # Move bit vector to new package bitvec mv bvec.n bvec.N mv bvec.b bvec.B mv bvec BitVec mv bvalloc New mv bvbulkalloc NewBulk mv bulkBvec.next bulkBvec.Next mv bulkBvec Bulk mv H0 h0 mv Hp hp # Leave bvecSet and bitmap hashes behind - not needed as broadly. mv bvecSet.extractUniqe bvecSet.extractUnique mv h0 bvecSet bvecSet.grow bvecSet.add \ bvecSet.extractUnique hashbitmap bvset.go mv bv.go cmd/compile/internal/bitvec ex . ../arm ../arm64 ../mips ../mips64 ../ppc64 ../s390x ../riscv64 { import "cmd/internal/obj" var a *obj.Addr var i int64 Addrconst(a, i) -> a.SetConst(i) var p, to *obj.Prog Patch(p, to) -> p.To.SetTarget(to) } rm Addrconst Patch # Move object-writing API to new package objw mv duint8 Objw_Uint8 mv duint16 Objw_Uint16 mv duint32 Objw_Uint32 mv duintptr Objw_Uintptr mv duintxx Objw_UintN mv dsymptr Objw_SymPtr mv dsymptrOff Objw_SymPtrOff mv dsymptrWeakOff Objw_SymPtrWeakOff mv ggloblsym Objw_Global mv dbvec Objw_BitVec mv newProgs NewProgs mv Progs.clearp Progs.Clear mv Progs.settext Progs.SetText mv Progs.next Progs.Next mv Progs.pc Progs.PC mv Progs.pos Progs.Pos mv Progs.curfn Progs.CurFunc mv Progs.progcache Progs.Cache mv Progs.cacheidx Progs.CacheIndex mv Progs.nextLive Progs.NextLive mv Progs.prevLive Progs.PrevLive mv Progs.Appendpp Progs.Append mv LivenessIndex.stackMapIndex LivenessIndex.StackMapIndex mv LivenessIndex.isUnsafePoint LivenessIndex.IsUnsafePoint mv Objw_Uint8 Objw_Uint16 Objw_Uint32 Objw_Uintptr Objw_UintN \ Objw_SymPtr Objw_SymPtrOff Objw_SymPtrWeakOff Objw_Global \ Objw_BitVec \ objw.go mv sharedProgArray NewProgs Progs \ LivenessIndex StackMapDontCare \ LivenessDontCare LivenessIndex.StackMapValid \ Progs.NewProg Progs.Flush Progs.Free Progs.Prog Progs.Clear Progs.Append Progs.SetText \ prog.go mv prog.go objw.go cmd/compile/internal/objw # Move ggloblnod to obj with the rest of the non-objw higher-level writing. mv ggloblnod obj.go ' cd ../objw rf ' mv Objw_Uint8 Uint8 mv Objw_Uint16 Uint16 mv Objw_Uint32 Uint32 mv Objw_Uintptr Uintptr mv Objw_UintN UintN mv Objw_SymPtr SymPtr mv Objw_SymPtrOff SymPtrOff mv Objw_SymPtrWeakOff SymPtrWeakOff mv Objw_Global Global mv Objw_BitVec BitVec ' Change-Id: I2b87085aa788564fb322e9c55bddd73347b4d5fd Reviewed-on: https://go-review.googlesource.com/c/go/+/279310 Trust: Russ Cox <rsc@golang.org> Run-TryBot: Russ Cox <rsc@golang.org> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2020-12-23[dev.regabi] cmd/compile: move type size calculations into package types ↵Russ Cox
[generated] To break up package gc, we need to put these calculations somewhere lower in the import graph, either an existing or new package. Package types already needs this code and is using hacks to get it without an import cycle. We can remove the hacks and set up for the new package gc by moving the code into package types itself. [git-generate] cd src/cmd/compile/internal/gc rf ' # Remove old import cycle hacks in gc. rm TypecheckInit:/types.Widthptr =/-0,/types.Dowidth =/+0 \ ../ssa/export_test.go:/types.Dowidth =/-+ ex { import "cmd/compile/internal/types" types.Widthptr -> Widthptr types.Dowidth -> dowidth } # Disable CalcSize in tests instead of base.Fatalf sub dowidth:/base.Fatalf\("dowidth without betypeinit"\)/ \ // Assume this is a test. \ return # Move size calculation into cmd/compile/internal/types mv Widthptr PtrSize mv Widthreg RegSize mv slicePtrOffset SlicePtrOffset mv sliceLenOffset SliceLenOffset mv sliceCapOffset SliceCapOffset mv sizeofSlice SliceSize mv sizeofString StringSize mv skipDowidthForTracing SkipSizeForTracing mv dowidth CalcSize mv checkwidth CheckSize mv widstruct calcStructOffset mv sizeCalculationDisabled CalcSizeDisabled mv defercheckwidth DeferCheckSize mv resumecheckwidth ResumeCheckSize mv typeptrdata PtrDataSize mv \ PtrSize RegSize SlicePtrOffset SkipSizeForTracing typePos align.go PtrDataSize \ size.go mv size.go cmd/compile/internal/types ' : # Remove old import cycle hacks in types. cd ../types rf ' ex { Widthptr -> PtrSize Dowidth -> CalcSize } rm Widthptr Dowidth ' Change-Id: Ib96cdc6bda2617235480c29392ea5cfb20f60cd8 Reviewed-on: https://go-review.googlesource.com/c/go/+/279234 Trust: Russ Cox <rsc@golang.org> Run-TryBot: Russ Cox <rsc@golang.org> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2020-12-23[dev.regabi] cmd/compile: group known symbols, packages, names [generated]Russ Cox
There are a handful of pre-computed magic symbols known by package gc, and we need a place to store them. If we keep them together, the need for type *ir.Name means that package ir is the lowest package in the import hierarchy that they can go in. And package ir needs gopkg for methodSymSuffix (in a later CL), so they can't go any higher either, at least not all together. So package ir it is. Rather than dump them all into the top-level package ir namespace, however, we introduce global structs, Syms, Pkgs, and Names, and make the known symbols, packages, and names fields of those. [git-generate] cd src/cmd/compile/internal/gc rf ' add go.go:$ \ // Names holds known names. \ var Names struct{} \ \ // Syms holds known symbols. \ var Syms struct {} \ \ // Pkgs holds known packages. \ var Pkgs struct {} \ mv staticuint64s Names.Staticuint64s mv zerobase Names.Zerobase mv assertE2I Syms.AssertE2I mv assertE2I2 Syms.AssertE2I2 mv assertI2I Syms.AssertI2I mv assertI2I2 Syms.AssertI2I2 mv deferproc Syms.Deferproc mv deferprocStack Syms.DeferprocStack mv Deferreturn Syms.Deferreturn mv Duffcopy Syms.Duffcopy mv Duffzero Syms.Duffzero mv gcWriteBarrier Syms.GCWriteBarrier mv goschedguarded Syms.Goschedguarded mv growslice Syms.Growslice mv msanread Syms.Msanread mv msanwrite Syms.Msanwrite mv msanmove Syms.Msanmove mv newobject Syms.Newobject mv newproc Syms.Newproc mv panicdivide Syms.Panicdivide mv panicshift Syms.Panicshift mv panicdottypeE Syms.PanicdottypeE mv panicdottypeI Syms.PanicdottypeI mv panicnildottype Syms.Panicnildottype mv panicoverflow Syms.Panicoverflow mv raceread Syms.Raceread mv racereadrange Syms.Racereadrange mv racewrite Syms.Racewrite mv racewriterange Syms.Racewriterange mv SigPanic Syms.SigPanic mv typedmemclr Syms.Typedmemclr mv typedmemmove Syms.Typedmemmove mv Udiv Syms.Udiv mv writeBarrier Syms.WriteBarrier mv zerobaseSym Syms.Zerobase mv arm64HasATOMICS Syms.ARM64HasATOMICS mv armHasVFPv4 Syms.ARMHasVFPv4 mv x86HasFMA Syms.X86HasFMA mv x86HasPOPCNT Syms.X86HasPOPCNT mv x86HasSSE41 Syms.X86HasSSE41 mv WasmDiv Syms.WasmDiv mv WasmMove Syms.WasmMove mv WasmZero Syms.WasmZero mv WasmTruncS Syms.WasmTruncS mv WasmTruncU Syms.WasmTruncU mv gopkg Pkgs.Go mv itabpkg Pkgs.Itab mv itablinkpkg Pkgs.Itablink mv mappkg Pkgs.Map mv msanpkg Pkgs.Msan mv racepkg Pkgs.Race mv Runtimepkg Pkgs.Runtime mv trackpkg Pkgs.Track mv unsafepkg Pkgs.Unsafe mv Names Syms Pkgs symtab.go mv symtab.go cmd/compile/internal/ir ' Change-Id: Ic143862148569a3bcde8e70b26d75421aa2d00f3 Reviewed-on: https://go-review.googlesource.com/c/go/+/279235 Trust: Russ Cox <rsc@golang.org> Run-TryBot: Russ Cox <rsc@golang.org> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2020-11-25[dev.regabi] cmd/compile: replace *Node type with an interface Node [generated]Russ Cox
The plan is to introduce a Node interface that replaces the old *Node pointer-to-struct. The previous CL defined an interface INode modeling a *Node. This CL: - Changes all references outside internal/ir to use INode, along with many references inside internal/ir as well. - Renames Node to node. - Renames INode to Node So now ir.Node is an interface implemented by *ir.node, which is otherwise inaccessible, and the code outside package ir is now (clearly) using only the interface. The usual rule is never to redefine an existing name with a new meaning, so that old code that hasn't been updated gets a "unknown name" error instead of more mysterious errors or silent misbehavior. That rule would caution against replacing Node-the-struct with Node-the-interface, as in this CL, because code that says *Node would now be using a pointer to an interface. But this CL is being landed at the same time as another that moves Node from gc to ir. So the net effect is to replace *gc.Node with ir.Node, which does follow the rule: any lingering references to gc.Node will be told it's gone, not silently start using pointers to interfaces. So the rule is followed by the CL sequence, just not this specific CL. Overall, the loss of inlining caused by using interfaces cuts the compiler speed by about 6%, a not insignificant amount. However, as we convert the representation to concrete structs that are not the giant Node over the next weeks, that speed should come back as more of the compiler starts operating directly on concrete types and the memory taken up by the graph of Nodes drops due to the more precise structs. Honestly, I was expecting worse. % benchstat bench.old bench.new name old time/op new time/op delta Template 168ms ± 4% 182ms ± 2% +8.34% (p=0.000 n=9+9) Unicode 72.2ms ±10% 82.5ms ± 6% +14.38% (p=0.000 n=9+9) GoTypes 563ms ± 8% 598ms ± 2% +6.14% (p=0.006 n=9+9) Compiler 2.89s ± 4% 3.04s ± 2% +5.37% (p=0.000 n=10+9) SSA 6.45s ± 4% 7.25s ± 5% +12.41% (p=0.000 n=9+10) Flate 105ms ± 2% 115ms ± 1% +9.66% (p=0.000 n=10+8) GoParser 144ms ±10% 152ms ± 2% +5.79% (p=0.011 n=9+8) Reflect 345ms ± 9% 370ms ± 4% +7.28% (p=0.001 n=10+9) Tar 149ms ± 9% 161ms ± 5% +8.05% (p=0.001 n=10+9) XML 190ms ± 3% 209ms ± 2% +9.54% (p=0.000 n=9+8) LinkCompiler 327ms ± 2% 325ms ± 2% ~ (p=0.382 n=8+8) ExternalLinkCompiler 1.77s ± 4% 1.73s ± 6% ~ (p=0.113 n=9+10) LinkWithoutDebugCompiler 214ms ± 4% 211ms ± 2% ~ (p=0.360 n=10+8) StdCmd 14.8s ± 3% 15.9s ± 1% +6.98% (p=0.000 n=10+9) [Geo mean] 480ms 510ms +6.31% name old user-time/op new user-time/op delta Template 223ms ± 3% 237ms ± 3% +6.16% (p=0.000 n=9+10) Unicode 103ms ± 6% 113ms ± 3% +9.53% (p=0.000 n=9+9) GoTypes 758ms ± 8% 800ms ± 2% +5.55% (p=0.003 n=10+9) Compiler 3.95s ± 2% 4.12s ± 2% +4.34% (p=0.000 n=10+9) SSA 9.43s ± 1% 9.74s ± 4% +3.25% (p=0.000 n=8+10) Flate 132ms ± 2% 141ms ± 2% +6.89% (p=0.000 n=9+9) GoParser 177ms ± 9% 183ms ± 4% ~ (p=0.050 n=9+9) Reflect 467ms ±10% 495ms ± 7% +6.17% (p=0.029 n=10+10) Tar 183ms ± 9% 197ms ± 5% +7.92% (p=0.001 n=10+10) XML 249ms ± 5% 268ms ± 4% +7.82% (p=0.000 n=10+9) LinkCompiler 544ms ± 5% 544ms ± 6% ~ (p=0.863 n=9+9) ExternalLinkCompiler 1.79s ± 4% 1.75s ± 6% ~ (p=0.075 n=10+10) LinkWithoutDebugCompiler 248ms ± 6% 246ms ± 2% ~ (p=0.965 n=10+8) [Geo mean] 483ms 504ms +4.41% [git-generate] cd src/cmd/compile/internal/ir : # We need to do the conversion in multiple steps, so we introduce : # a temporary type alias that will start out meaning the pointer-to-struct : # and then change to mean the interface. rf ' mv Node OldNode add node.go \ type Node = *OldNode ' : # It should work to do this ex in ir, but it misses test files, due to a bug in rf. : # Run the command in gc to handle gc's tests, and then again in ssa for ssa's tests. cd ../gc rf ' ex . ../arm ../riscv64 ../arm64 ../mips64 ../ppc64 ../mips ../wasm { import "cmd/compile/internal/ir" *ir.OldNode -> ir.Node } ' cd ../ssa rf ' ex { import "cmd/compile/internal/ir" *ir.OldNode -> ir.Node } ' : # Back in ir, finish conversion clumsily with sed, : # because type checking and circular aliases do not mix. cd ../ir sed -i '' ' /type Node = \*OldNode/d s/\*OldNode/Node/g s/^func (n Node)/func (n *OldNode)/ s/OldNode/node/g s/type INode interface/type Node interface/ s/var _ INode = (Node)(nil)/var _ Node = (*node)(nil)/ ' *.go gofmt -w *.go sed -i '' ' s/{Func{}, 136, 248}/{Func{}, 152, 280}/ s/{Name{}, 32, 56}/{Name{}, 44, 80}/ s/{Param{}, 24, 48}/{Param{}, 44, 88}/ s/{node{}, 76, 128}/{node{}, 88, 152}/ ' sizeof_test.go cd ../ssa sed -i '' ' s/{LocalSlot{}, 28, 40}/{LocalSlot{}, 32, 48}/ ' sizeof_test.go cd ../gc sed -i '' 's/\*ir.Node/ir.Node/' mkbuiltin.go cd ../../../.. go install std cmd cd cmd/compile go test -u || go test -u Change-Id: I196bbe3b648e4701662e4a2bada40bf155e2a553 Reviewed-on: https://go-review.googlesource.com/c/go/+/272935 Trust: Russ Cox <rsc@golang.org> Run-TryBot: Russ Cox <rsc@golang.org> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2020-11-25[dev.regabi] cmd/compile: introduce cmd/compile/internal/ir [generated]Russ Cox
If we want to break up package gc at all, we will need to move the compiler IR it defines into a separate package that can be imported by packages that gc itself imports. This CL does that. It also removes the TINT8 etc aliases so that all code is clear about which package things are coming from. This CL is automatically generated by the script below. See the comments in the script for details about the changes. [git-generate] cd src/cmd/compile/internal/gc rf ' # These names were never fully qualified # when the types package was added. # Do it now, to avoid confusion about where they live. inline -rm \ Txxx \ TINT8 \ TUINT8 \ TINT16 \ TUINT16 \ TINT32 \ TUINT32 \ TINT64 \ TUINT64 \ TINT \ TUINT \ TUINTPTR \ TCOMPLEX64 \ TCOMPLEX128 \ TFLOAT32 \ TFLOAT64 \ TBOOL \ TPTR \ TFUNC \ TSLICE \ TARRAY \ TSTRUCT \ TCHAN \ TMAP \ TINTER \ TFORW \ TANY \ TSTRING \ TUNSAFEPTR \ TIDEAL \ TNIL \ TBLANK \ TFUNCARGS \ TCHANARGS \ NTYPE \ BADWIDTH # esc.go and escape.go do not need to be split. # Append esc.go onto the end of escape.go. mv esc.go escape.go # Pull out the type format installation from func Main, # so it can be carried into package ir. mv Main:/Sconv.=/-0,/TypeLinkSym/-1 InstallTypeFormats # Names that need to be exported for use by code left in gc. mv Isconst IsConst mv asNode AsNode mv asNodes AsNodes mv asTypesNode AsTypesNode mv basicnames BasicTypeNames mv builtinpkg BuiltinPkg mv consttype ConstType mv dumplist DumpList mv fdumplist FDumpList mv fmtMode FmtMode mv goopnames OpNames mv inspect Inspect mv inspectList InspectList mv localpkg LocalPkg mv nblank BlankNode mv numImport NumImport mv opprec OpPrec mv origSym OrigSym mv stmtwithinit StmtWithInit mv dump DumpAny mv fdump FDumpAny mv nod Nod mv nodl NodAt mv newname NewName mv newnamel NewNameAt mv assertRepresents AssertValidTypeForConst mv represents ValidTypeForConst mv nodlit NewLiteral # Types and fields that need to be exported for use by gc. mv nowritebarrierrecCallSym SymAndPos mv SymAndPos.lineno SymAndPos.Pos mv SymAndPos.target SymAndPos.Sym mv Func.lsym Func.LSym mv Func.setWBPos Func.SetWBPos mv Func.numReturns Func.NumReturns mv Func.numDefers Func.NumDefers mv Func.nwbrCalls Func.NWBRCalls # initLSym is an algorithm left behind in gc, # not an operation on Func itself. mv Func.initLSym initLSym mv nodeQueue NodeQueue mv NodeQueue.empty NodeQueue.Empty mv NodeQueue.popLeft NodeQueue.PopLeft mv NodeQueue.pushRight NodeQueue.PushRight # Many methods on Node are actually algorithms that # would apply to any node implementation. # Those become plain functions. mv Node.funcname FuncName mv Node.isBlank IsBlank mv Node.isGoConst isGoConst mv Node.isNil IsNil mv Node.isParamHeapCopy isParamHeapCopy mv Node.isParamStackCopy isParamStackCopy mv Node.isSimpleName isSimpleName mv Node.mayBeShared MayBeShared mv Node.pkgFuncName PkgFuncName mv Node.backingArrayPtrLen backingArrayPtrLen mv Node.isterminating isTermNode mv Node.labeledControl labeledControl mv Nodes.isterminating isTermNodes mv Nodes.sigerr fmtSignature mv Node.MethodName methodExprName mv Node.MethodFunc methodExprFunc mv Node.IsMethod IsMethod # Every node will need to implement RawCopy; # Copy and SepCopy algorithms will use it. mv Node.rawcopy Node.RawCopy mv Node.copy Copy mv Node.sepcopy SepCopy # Extract Node.Format method body into func FmtNode, # but leave method wrapper behind. mv Node.Format:0,$ FmtNode # Formatting helpers that will apply to all node implementations. mv Node.Line Line mv Node.exprfmt exprFmt mv Node.jconv jconvFmt mv Node.modeString modeString mv Node.nconv nconvFmt mv Node.nodedump nodeDumpFmt mv Node.nodefmt nodeFmt mv Node.stmtfmt stmtFmt # Constant support needed for code moving to ir. mv okforconst OKForConst mv vconv FmtConst mv int64Val Int64Val mv float64Val Float64Val mv Node.ValueInterface ConstValue # Organize code into files. mv LocalPkg BuiltinPkg ir.go mv NumImport InstallTypeFormats Line fmt.go mv syntax.go Nod NodAt NewNameAt Class Pxxx PragmaFlag Nointerface SymAndPos \ AsNode AsTypesNode BlankNode OrigSym \ Node.SliceBounds Node.SetSliceBounds Op.IsSlice3 \ IsConst Node.Int64Val Node.CanInt64 Node.Uint64Val Node.BoolVal Node.StringVal \ Node.RawCopy SepCopy Copy \ IsNil IsBlank IsMethod \ Node.Typ Node.StorageClass node.go mv ConstType ConstValue Int64Val Float64Val AssertValidTypeForConst ValidTypeForConst NewLiteral idealType OKForConst val.go # Move files to new ir package. mv bitset.go class_string.go dump.go fmt.go \ ir.go node.go op_string.go val.go \ sizeof_test.go cmd/compile/internal/ir ' : # fix mkbuiltin.go to generate the changes made to builtin.go during rf sed -i '' ' s/\[T/[types.T/g s/\*Node/*ir.Node/g /internal\/types/c \ fmt.Fprintln(&b, `import (`) \ fmt.Fprintln(&b, ` "cmd/compile/internal/ir"`) \ fmt.Fprintln(&b, ` "cmd/compile/internal/types"`) \ fmt.Fprintln(&b, `)`) ' mkbuiltin.go gofmt -w mkbuiltin.go : # update cmd/dist to add internal/ir cd ../../../dist sed -i '' '/compile.internal.gc/a\ "cmd/compile/internal/ir", ' buildtool.go gofmt -w buildtool.go : # update cmd/compile TestFormats cd ../.. go install std cmd cd cmd/compile go test -u || go test # first one updates but fails; second passes Change-Id: I5f7caf6b20629b51970279e81231a3574d5b51db Reviewed-on: https://go-review.googlesource.com/c/go/+/273008 Trust: Russ Cox <rsc@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2020-11-25[dev.regabi] cmd/compile: introduce cmd/compile/internal/base [generated]Russ Cox
Move Flag, Debug, Ctxt, Exit, and error messages to new package cmd/compile/internal/base. These are the core functionality that everything in gc uses and which otherwise prevent splitting any other code out of gc into different packages. A minor milestone: the compiler source code no longer contains the string "yy". [git-generate] cd src/cmd/compile/internal/gc rf ' mv atExit AtExit mv Ctxt atExitFuncs AtExit Exit base.go mv lineno Pos mv linestr FmtPos mv flusherrors FlushErrors mv yyerror Errorf mv yyerrorl ErrorfAt mv yyerrorv ErrorfVers mv noder.yyerrorpos noder.errorAt mv Warnl WarnfAt mv errorexit ErrorExit mv base.go debug.go flag.go print.go cmd/compile/internal/base ' : # update comments sed -i '' 's/yyerrorl/ErrorfAt/g; s/yyerror/Errorf/g' *.go : # bootstrap.go is not built by default so invisible to rf sed -i '' 's/Fatalf/base.Fatalf/' bootstrap.go goimports -w bootstrap.go : # update cmd/dist to add internal/base cd ../../../dist sed -i '' '/internal.amd64/a\ "cmd/compile/internal/base", ' buildtool.go gofmt -w buildtool.go Change-Id: I59903c7084222d6eaee38823fd222159ba24a31a Reviewed-on: https://go-review.googlesource.com/c/go/+/272250 Trust: Russ Cox <rsc@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2020-11-25[dev.regabi] cmd/compile: clean up debug flag (-d) handling [generated]Russ Cox
The debug table is not as haphazard as flags, but there are still a few mismatches between command-line names and variable names. This CL moves them all into a consistent home (var Debug, like var Flag). Code updated automatically using the rf command below. A followup CL will make a few manual cleanups, leaving this CL completely automated and easier to regenerate during merge conflicts. [git-generate] cd src/cmd/compile/internal/gc rf ' add main.go var Debug struct{} mv Debug_append Debug.Append mv Debug_checkptr Debug.Checkptr mv Debug_closure Debug.Closure mv Debug_compilelater Debug.CompileLater mv disable_checknil Debug.DisableNil mv debug_dclstack Debug.DclStack mv Debug_gcprog Debug.GCProg mv Debug_libfuzzer Debug.Libfuzzer mv Debug_checknil Debug.Nil mv Debug_panic Debug.Panic mv Debug_slice Debug.Slice mv Debug_typeassert Debug.TypeAssert mv Debug_wb Debug.WB mv Debug_export Debug.Export mv Debug_pctab Debug.PCTab mv Debug_locationlist Debug.LocationLists mv Debug_typecheckinl Debug.TypecheckInl mv Debug_gendwarfinl Debug.DwarfInl mv Debug_softfloat Debug.SoftFloat mv Debug_defer Debug.Defer mv Debug_dumpptrs Debug.DumpPtrs mv flag.go:/parse.-d/-1,/unknown.debug/+2 parseDebug mv debugtab Debug parseDebug \ debugHelpHeader debugHelpFooter \ debug.go # Remove //go:generate line copied from main.go rm debug.go:/go:generate/-+ ' Change-Id: I625761ca5659be4052f7161a83baa00df75cca91 Reviewed-on: https://go-review.googlesource.com/c/go/+/272246 Trust: Russ Cox <rsc@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2020-11-19cmd/compile,cmd/asm: fix function pointer call perf regression on ppc64Paul E. Murphy
by inserting hint when using bclrl. Using this instruction as subroutine call is not the expected default behavior, and as a result confuses the branch predictor. The default expected behavior is a conditional return from a subroutine. We can change this assumption by encoding a hint this is not a subroutine return. The regex benchmarks are a pretty good example of how much this hint can help generic ppc64le code on a power9 machine: name old time/op new time/op delta Find 606ns ± 0% 447ns ± 0% -26.27% FindAllNoMatches 309ns ± 0% 205ns ± 0% -33.72% FindString 609ns ± 0% 451ns ± 0% -26.04% FindSubmatch 734ns ± 0% 594ns ± 0% -19.07% FindStringSubmatch 706ns ± 0% 574ns ± 0% -18.83% Literal 177ns ± 0% 136ns ± 0% -22.89% NotLiteral 4.69µs ± 0% 2.34µs ± 0% -50.14% MatchClass 6.05µs ± 0% 3.26µs ± 0% -46.08% MatchClass_InRange 5.93µs ± 0% 3.15µs ± 0% -46.86% ReplaceAll 3.15µs ± 0% 2.18µs ± 0% -30.77% AnchoredLiteralShortNonMatch 156ns ± 0% 109ns ± 0% -30.61% AnchoredLiteralLongNonMatch 192ns ± 0% 136ns ± 0% -29.34% AnchoredShortMatch 268ns ± 0% 209ns ± 0% -22.00% AnchoredLongMatch 472ns ± 0% 357ns ± 0% -24.30% OnePassShortA 1.16µs ± 0% 0.87µs ± 0% -25.03% NotOnePassShortA 1.34µs ± 0% 1.20µs ± 0% -10.63% OnePassShortB 940ns ± 0% 655ns ± 0% -30.29% NotOnePassShortB 873ns ± 0% 703ns ± 0% -19.52% OnePassLongPrefix 258ns ± 0% 155ns ± 0% -40.13% OnePassLongNotPrefix 943ns ± 0% 529ns ± 0% -43.89% MatchParallelShared 591ns ± 0% 436ns ± 0% -26.31% MatchParallelCopied 596ns ± 0% 435ns ± 0% -27.10% QuoteMetaAll 186ns ± 0% 186ns ± 0% -0.16% QuoteMetaNone 55.9ns ± 0% 55.9ns ± 0% +0.02% Compile/Onepass 9.64µs ± 0% 9.26µs ± 0% -3.97% Compile/Medium 21.7µs ± 0% 20.6µs ± 0% -4.90% Compile/Hard 174µs ± 0% 174µs ± 0% +0.07% Match/Easy0/16 7.35ns ± 0% 7.34ns ± 0% -0.11% Match/Easy0/32 116ns ± 0% 97ns ± 0% -16.27% Match/Easy0/1K 592ns ± 0% 562ns ± 0% -5.04% Match/Easy0/32K 12.6µs ± 0% 12.5µs ± 0% -0.64% Match/Easy0/1M 556µs ± 0% 556µs ± 0% -0.00% Match/Easy0/32M 17.7ms ± 0% 17.7ms ± 0% +0.05% Match/Easy0i/16 7.34ns ± 0% 7.35ns ± 0% +0.10% Match/Easy0i/32 2.82µs ± 0% 1.64µs ± 0% -41.71% Match/Easy0i/1K 83.2µs ± 0% 48.2µs ± 0% -42.06% Match/Easy0i/32K 2.13ms ± 0% 1.80ms ± 0% -15.34% Match/Easy0i/1M 68.1ms ± 0% 57.6ms ± 0% -15.31% Match/Easy0i/32M 2.18s ± 0% 1.80s ± 0% -17.52% Match/Easy1/16 7.36ns ± 0% 7.34ns ± 0% -0.24% Match/Easy1/32 118ns ± 0% 96ns ± 0% -18.72% Match/Easy1/1K 2.46µs ± 0% 1.58µs ± 0% -35.65% Match/Easy1/32K 80.2µs ± 0% 54.6µs ± 0% -31.92% Match/Easy1/1M 2.75ms ± 0% 1.88ms ± 0% -31.66% Match/Easy1/32M 87.5ms ± 0% 59.8ms ± 0% -31.62% Match/Medium/16 7.34ns ± 0% 7.34ns ± 0% +0.01% Match/Medium/32 2.60µs ± 0% 1.50µs ± 0% -42.61% Match/Medium/1K 78.1µs ± 0% 43.7µs ± 0% -44.06% Match/Medium/32K 2.08ms ± 0% 1.52ms ± 0% -27.11% Match/Medium/1M 66.5ms ± 0% 48.6ms ± 0% -26.96% Match/Medium/32M 2.14s ± 0% 1.60s ± 0% -25.18% Match/Hard/16 7.35ns ± 0% 7.35ns ± 0% +0.03% Match/Hard/32 3.58µs ± 0% 2.44µs ± 0% -31.82% Match/Hard/1K 108µs ± 0% 75µs ± 0% -31.04% Match/Hard/32K 2.79ms ± 0% 2.25ms ± 0% -19.30% Match/Hard/1M 89.4ms ± 0% 72.2ms ± 0% -19.26% Match/Hard/32M 2.91s ± 0% 2.37s ± 0% -18.60% Match/Hard1/16 11.1µs ± 0% 8.3µs ± 0% -25.07% Match/Hard1/32 21.4µs ± 0% 16.1µs ± 0% -24.85% Match/Hard1/1K 658µs ± 0% 498µs ± 0% -24.27% Match/Hard1/32K 12.2ms ± 0% 11.7ms ± 0% -4.60% Match/Hard1/1M 391ms ± 0% 374ms ± 0% -4.40% Match/Hard1/32M 12.6s ± 0% 12.0s ± 0% -4.68% Match_onepass_regex/16 870ns ± 0% 611ns ± 0% -29.79% Match_onepass_regex/32 1.58µs ± 0% 1.08µs ± 0% -31.48% Match_onepass_regex/1K 45.7µs ± 0% 30.3µs ± 0% -33.58% Match_onepass_regex/32K 1.45ms ± 0% 0.97ms ± 0% -33.20% Match_onepass_regex/1M 46.2ms ± 0% 30.9ms ± 0% -33.01% Match_onepass_regex/32M 1.46s ± 0% 0.99s ± 0% -32.02% name old alloc/op new alloc/op delta Find 0.00B 0.00B 0.00% FindAllNoMatches 0.00B 0.00B 0.00% FindString 0.00B 0.00B 0.00% FindSubmatch 48.0B ± 0% 48.0B ± 0% 0.00% FindStringSubmatch 32.0B ± 0% 32.0B ± 0% 0.00% Compile/Onepass 4.02kB ± 0% 4.02kB ± 0% 0.00% Compile/Medium 9.39kB ± 0% 9.39kB ± 0% 0.00% Compile/Hard 84.7kB ± 0% 84.7kB ± 0% 0.00% Match_onepass_regex/16 0.00B 0.00B 0.00% Match_onepass_regex/32 0.00B 0.00B 0.00% Match_onepass_regex/1K 0.00B 0.00B 0.00% Match_onepass_regex/32K 0.00B 0.00B 0.00% Match_onepass_regex/1M 5.00B ± 0% 3.00B ± 0% -40.00% Match_onepass_regex/32M 136B ± 0% 68B ± 0% -50.00% name old allocs/op new allocs/op delta Find 0.00 0.00 0.00% FindAllNoMatches 0.00 0.00 0.00% FindString 0.00 0.00 0.00% FindSubmatch 1.00 ± 0% 1.00 ± 0% 0.00% FindStringSubmatch 1.00 ± 0% 1.00 ± 0% 0.00% Compile/Onepass 52.0 ± 0% 52.0 ± 0% 0.00% Compile/Medium 112 ± 0% 112 ± 0% 0.00% Compile/Hard 424 ± 0% 424 ± 0% 0.00% Match_onepass_regex/16 0.00 0.00 0.00% Match_onepass_regex/32 0.00 0.00 0.00% Match_onepass_regex/1K 0.00 0.00 0.00% Match_onepass_regex/32K 0.00 0.00 0.00% Match_onepass_regex/1M 0.00 0.00 0.00% Match_onepass_regex/32M 2.00 ± 0% 1.00 ± 0% -50.00% name old speed new speed delta QuoteMetaAll 75.2MB/s ± 0% 75.3MB/s ± 0% +0.15% QuoteMetaNone 465MB/s ± 0% 465MB/s ± 0% -0.02% Match/Easy0/16 2.18GB/s ± 0% 2.18GB/s ± 0% +0.10% Match/Easy0/32 276MB/s ± 0% 330MB/s ± 0% +19.46% Match/Easy0/1K 1.73GB/s ± 0% 1.82GB/s ± 0% +5.29% Match/Easy0/32K 2.60GB/s ± 0% 2.62GB/s ± 0% +0.64% Match/Easy0/1M 1.89GB/s ± 0% 1.89GB/s ± 0% +0.00% Match/Easy0/32M 1.89GB/s ± 0% 1.89GB/s ± 0% -0.05% Match/Easy0i/16 2.18GB/s ± 0% 2.18GB/s ± 0% -0.10% Match/Easy0i/32 11.4MB/s ± 0% 19.5MB/s ± 0% +71.48% Match/Easy0i/1K 12.3MB/s ± 0% 21.2MB/s ± 0% +72.62% Match/Easy0i/32K 15.4MB/s ± 0% 18.2MB/s ± 0% +18.12% Match/Easy0i/1M 15.4MB/s ± 0% 18.2MB/s ± 0% +18.12% Match/Easy0i/32M 15.4MB/s ± 0% 18.6MB/s ± 0% +21.21% Match/Easy1/16 2.17GB/s ± 0% 2.18GB/s ± 0% +0.24% Match/Easy1/32 271MB/s ± 0% 333MB/s ± 0% +23.07% Match/Easy1/1K 417MB/s ± 0% 648MB/s ± 0% +55.38% Match/Easy1/32K 409MB/s ± 0% 600MB/s ± 0% +46.88% Match/Easy1/1M 381MB/s ± 0% 558MB/s ± 0% +46.33% Match/Easy1/32M 383MB/s ± 0% 561MB/s ± 0% +46.25% Match/Medium/16 2.18GB/s ± 0% 2.18GB/s ± 0% -0.01% Match/Medium/32 12.3MB/s ± 0% 21.4MB/s ± 0% +74.13% Match/Medium/1K 13.1MB/s ± 0% 23.4MB/s ± 0% +78.73% Match/Medium/32K 15.7MB/s ± 0% 21.6MB/s ± 0% +37.23% Match/Medium/1M 15.8MB/s ± 0% 21.6MB/s ± 0% +36.93% Match/Medium/32M 15.7MB/s ± 0% 21.0MB/s ± 0% +33.67% Match/Hard/16 2.18GB/s ± 0% 2.18GB/s ± 0% -0.03% Match/Hard/32 8.93MB/s ± 0% 13.10MB/s ± 0% +46.70% Match/Hard/1K 9.48MB/s ± 0% 13.74MB/s ± 0% +44.94% Match/Hard/32K 11.7MB/s ± 0% 14.5MB/s ± 0% +23.87% Match/Hard/1M 11.7MB/s ± 0% 14.5MB/s ± 0% +23.87% Match/Hard/32M 11.6MB/s ± 0% 14.2MB/s ± 0% +22.86% Match/Hard1/16 1.44MB/s ± 0% 1.93MB/s ± 0% +34.03% Match/Hard1/32 1.49MB/s ± 0% 1.99MB/s ± 0% +33.56% Match/Hard1/1K 1.56MB/s ± 0% 2.05MB/s ± 0% +31.41% Match/Hard1/32K 2.68MB/s ± 0% 2.80MB/s ± 0% +4.48% Match/Hard1/1M 2.68MB/s ± 0% 2.80MB/s ± 0% +4.48% Match/Hard1/32M 2.66MB/s ± 0% 2.79MB/s ± 0% +4.89% Match_onepass_regex/16 18.4MB/s ± 0% 26.2MB/s ± 0% +42.41% Match_onepass_regex/32 20.2MB/s ± 0% 29.5MB/s ± 0% +45.92% Match_onepass_regex/1K 22.4MB/s ± 0% 33.8MB/s ± 0% +50.54% Match_onepass_regex/32K 22.6MB/s ± 0% 33.9MB/s ± 0% +49.67% Match_onepass_regex/1M 22.7MB/s ± 0% 33.9MB/s ± 0% +49.27% Match_onepass_regex/32M 23.0MB/s ± 0% 33.9MB/s ± 0% +47.14% Fixes #42709 Change-Id: Ice07fec2de4c5b1302febf8c2978ae8c1e4fd3e5 Reviewed-on: https://go-review.googlesource.com/c/go/+/271337 Reviewed-by: Cherry Zhang <cherryyz@google.com> Trust: Lynn Boger <laboger@linux.vnet.ibm.com> Trust: Carlos Eduardo Seo <carlos.seo@linaro.org>
2020-10-27cmd/compile: combine more 32 bit shift and mask operations on ppc64Paul E. Murphy
Combine (AND m (SRWconst x)) or (SRWconst (AND m x)) when mask m is and the shift value produce constant which can be encoded into an RLWINM instruction. Combine (CLRLSLDI (SRWconst x)) if the combining of the underling rotate masks produces a constant which can be encoded into RLWINM. Likewise for (SLDconst (SRWconst x)) and (CLRLSDI (RLWINM x)). Combine rotate word + and operations which can be encoded as a single RLWINM/RLWNM instruction. The most notable performance improvements arise from the crypto benchmarks below (GOARCH=power8 on a ppc64le/linux): pkg:golang.org/x/crypto/blowfish goos:linux goarch:ppc64le ExpandKeyWithSalt 52.2µs ± 0% 47.5µs ± 0% -8.88% ExpandKey 44.4µs ± 0% 40.3µs ± 0% -9.15% pkg:golang.org/x/crypto/ssh/internal/bcrypt_pbkdf goos:linux goarch:ppc64le Key 57.6ms ± 0% 52.3ms ± 0% -9.13% pkg:golang.org/x/crypto/bcrypt goos:linux goarch:ppc64le Equal 90.9ms ± 0% 82.6ms ± 0% -9.13% DefaultCost 91.0ms ± 0% 82.7ms ± 0% -9.12% Change-Id: I59a0ca29face38f4ab46e37124c32906f216c4ce Reviewed-on: https://go-review.googlesource.com/c/go/+/260798 Run-TryBot: Carlos Eduardo Seo <carlos.seo@linaro.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Carlos Eduardo Seo <carlos.seo@linaro.com> Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
2020-10-23cmd/compile: intrinsify runtime/internal/atomic.{And,Or} on PPC64Michael Pratt
This is a simple case of changing the operand size of the existing 8-bit And/Or. I've also updated a few operand descriptions that were out-of-sync with the implementation. Change-Id: I95ac4445d08f7958768aec9a233698a2d652a39a Reviewed-on: https://go-review.googlesource.com/c/go/+/263150 Run-TryBot: Michael Pratt <mpratt@google.com> TryBot-Result: Go Bot <gobot@golang.org> Trust: Michael Pratt <mpratt@google.com> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-10-06cmd/compile,cmd/internal/obj/ppc64: use mulli where possibleLynn Boger
This adds support to allow the use of mulli when one of the multiply operands is a constant that fits in 16 bits. This especially helps in the case where this instruction appears in a loop since the load of the constant is not being moved out of the loop. Some improvements seen in compress/flate on power9: Decode/Digits/Huffman/1e4 259µs ± 0% 261µs ± 0% +0.57% (p=1.000 n=1+1) Decode/Digits/Huffman/1e5 2.43ms ± 0% 2.45ms ± 0% +0.79% (p=1.000 n=1+1) Decode/Digits/Huffman/1e6 23.9ms ± 0% 24.2ms ± 0% +0.86% (p=1.000 n=1+1) Decode/Digits/Speed/1e4 278µs ± 0% 279µs ± 0% +0.34% (p=1.000 n=1+1) Decode/Digits/Speed/1e5 2.80ms ± 0% 2.81ms ± 0% +0.29% (p=1.000 n=1+1) Decode/Digits/Speed/1e6 28.0ms ± 0% 28.1ms ± 0% +0.28% (p=1.000 n=1+1) Decode/Digits/Default/1e4 278µs ± 0% 278µs ± 0% +0.28% (p=1.000 n=1+1) Decode/Digits/Default/1e5 2.68ms ± 0% 2.69ms ± 0% +0.19% (p=1.000 n=1+1) Decode/Digits/Default/1e6 26.6ms ± 0% 26.6ms ± 0% +0.21% (p=1.000 n=1+1) Decode/Digits/Compression/1e4 278µs ± 0% 278µs ± 0% +0.00% (p=1.000 n=1+1) Decode/Digits/Compression/1e5 2.68ms ± 0% 2.69ms ± 0% +0.21% (p=1.000 n=1+1) Decode/Digits/Compression/1e6 26.6ms ± 0% 26.6ms ± 0% +0.07% (p=1.000 n=1+1) Decode/Newton/Huffman/1e4 322µs ± 0% 312µs ± 0% -2.84% (p=1.000 n=1+1) Decode/Newton/Huffman/1e5 3.11ms ± 0% 2.91ms ± 0% -6.41% (p=1.000 n=1+1) Decode/Newton/Huffman/1e6 31.4ms ± 0% 29.3ms ± 0% -6.85% (p=1.000 n=1+1) Decode/Newton/Speed/1e4 282µs ± 0% 269µs ± 0% -4.69% (p=1.000 n=1+1) Decode/Newton/Speed/1e5 2.29ms ± 0% 2.20ms ± 0% -4.13% (p=1.000 n=1+1) Decode/Newton/Speed/1e6 22.7ms ± 0% 21.3ms ± 0% -6.06% (p=1.000 n=1+1) Decode/Newton/Default/1e4 254µs ± 0% 237µs ± 0% -6.60% (p=1.000 n=1+1) Decode/Newton/Default/1e5 1.86ms ± 0% 1.75ms ± 0% -5.99% (p=1.000 n=1+1) Decode/Newton/Default/1e6 18.1ms ± 0% 17.4ms ± 0% -4.10% (p=1.000 n=1+1) Decode/Newton/Compression/1e4 254µs ± 0% 244µs ± 0% -3.91% (p=1.000 n=1+1) Decode/Newton/Compression/1e5 1.85ms ± 0% 1.79ms ± 0% -3.10% (p=1.000 n=1+1) Decode/Newton/Compression/1e6 18.0ms ± 0% 17.3ms ± 0% -3.88% (p=1.000 n=1+1) Change-Id: I840320fab1c4bf64c76b001c2651ab79f23df4eb Reviewed-on: https://go-review.googlesource.com/c/go/+/259444 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Paul Murphy <murp@ibm.com> Reviewed-by: Carlos Eduardo Seo <carlos.seo@gmail.com> Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
2020-10-01cmd/compile,cmd/internal/obj/ppc64: fix some shift rules due to a regressionLynn Boger
A recent change to improve shifts was generating some invalid cases when the rule was based on an AND. The extended mnemonics CLRLSLDI and CLRLSLWI only allow certain values for the operands and in the mask case those values were not being checked properly. This adds a check to those rules to verify that the 'b' and 'n' values used when an AND was part of the rule have correct values. There was a bug in some diag messages in asm9. The message expected 3 values but only provided 2. Those are corrected here also. The test/codegen/shift.go was updated to add a few more cases to check for the case mentioned here. Some of the comments that mention the order of operands in these extended mnemonics were wrong and those have been corrected. Fixes #41683. Change-Id: If5bb860acaa5051b9e0cd80784b2868b85898c31 Reviewed-on: https://go-review.googlesource.com/c/go/+/258138 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Paul Murphy <murp@ibm.com> Reviewed-by: Carlos Eduardo Seo <carlos.seo@gmail.com> TryBot-Result: Go Bot <gobot@golang.org> Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
2020-09-28cmd/asm,cmd/compile,cmd/internal/obj/ppc64: add extswsli support on power9Lynn Boger
This adds support for the extswsli instruction which combines extsw followed by a shift. New benchmark demonstrates the improvement: name old time/op new time/op delta ExtShift 1.34µs ± 0% 1.30µs ± 0% -3.15% (p=0.057 n=4+3) Change-Id: I21b410676fdf15d20e0cbbaa75d7c6dcd3bbb7b0 Reviewed-on: https://go-review.googlesource.com/c/go/+/257017 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Carlos Eduardo Seo <carlos.seo@gmail.com> Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
2020-09-17cmd/compile: use combined shifts to improve array addressing on ppc64xLynn Boger
This change adds rules to find pairs of instructions that can be combined into a single shifts. These instruction sequences are common in array addressing within loops. Improvements can be seen in many crypto packages and the hash packages. These are based on the extended mnemonics found in the ISA sections C.8.1 and C.8.2. Some rules in PPC64.rules were moved because the ordering prevented some matching. The following results were generated on power9. hash/crc32: CRC32/poly=Koopman/size=40/align=0 195ns ± 0% 163ns ± 0% -16.41% CRC32/poly=Koopman/size=40/align=1 200ns ± 0% 163ns ± 0% -18.50% CRC32/poly=Koopman/size=512/align=0 1.98µs ± 0% 1.67µs ± 0% -15.46% CRC32/poly=Koopman/size=512/align=1 1.98µs ± 0% 1.69µs ± 0% -14.80% CRC32/poly=Koopman/size=1kB/align=0 3.90µs ± 0% 3.31µs ± 0% -15.27% CRC32/poly=Koopman/size=1kB/align=1 3.85µs ± 0% 3.31µs ± 0% -14.15% CRC32/poly=Koopman/size=4kB/align=0 15.3µs ± 0% 13.1µs ± 0% -14.22% CRC32/poly=Koopman/size=4kB/align=1 15.4µs ± 0% 13.1µs ± 0% -14.79% CRC32/poly=Koopman/size=32kB/align=0 137µs ± 0% 105µs ± 0% -23.56% CRC32/poly=Koopman/size=32kB/align=1 137µs ± 0% 105µs ± 0% -23.53% crypto/rc4: RC4_128 733ns ± 0% 650ns ± 0% -11.32% (p=1.000 n=1+1) RC4_1K 5.80µs ± 0% 5.17µs ± 0% -10.89% (p=1.000 n=1+1) RC4_8K 45.7µs ± 0% 40.8µs ± 0% -10.73% (p=1.000 n=1+1) crypto/sha1: Hash8Bytes 635ns ± 0% 613ns ± 0% -3.46% (p=1.000 n=1+1) Hash320Bytes 2.30µs ± 0% 2.18µs ± 0% -5.38% (p=1.000 n=1+1) Hash1K 5.88µs ± 0% 5.38µs ± 0% -8.62% (p=1.000 n=1+1) Hash8K 42.0µs ± 0% 37.9µs ± 0% -9.75% (p=1.000 n=1+1) There are other improvements found in golang.org/x/crypto which are all in the range of 5-15%. Change-Id: I193471fbcf674151ffe2edab212799d9b08dfb8c Reviewed-on: https://go-review.googlesource.com/c/go/+/252097 Trust: Lynn Boger <laboger@linux.vnet.ibm.com> Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
2020-08-27cmd/compile: generate subfic on ppc64Paul E. Murphy
This merges an lis + subf into subfic, and for 32b constants lwa + subf into oris + ori + subf. The carry bit is no longer used in code generation, therefore I think we can clobber it as needed. Note, lowered borrow/carry arithmetic is self-contained and thus is not affected. A few extra rules are added to ensure early transformations to SUBFCconst don't trip up earlier rules, fold constant operations, or otherwise simplify lowering. Likewise, tests are added to ensure all rules are hit. Generic constant folding catches trivial cases, however some lowering rules insert arithmetic which can introduce new opportunities (e.g BitLen or Slicemask). I couldn't find a specific benchmark to demonstrate noteworthy improvements, but this is generating subfic in many of the default bent test binaries, so we are at least saving a little code space. Change-Id: Iad7c6e5767eaa9dc24dc1c989bd1c8cfe1982012 Reviewed-on: https://go-review.googlesource.com/c/go/+/249461 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
2020-08-27cmd/compile: remove unused carry related ssa ops in ppc64Paul E. Murphy
The intermediate SSA opcodes* are no longer generated during the lowering pass. The shifting rules have been improved using ISEL. Therefore, we can remove them and the rules which expand them. * The removed opcodes are: LoweredAdd64Carry ADDconstForCarry MaskIfNotCarry FlagCarryClear FlagCarrySet Change-Id: I1ebe2726ed988f29ed4800c8f57b428f7a214cd0 Reviewed-on: https://go-review.googlesource.com/c/go/+/249462 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
2020-08-18cmd/compile: combine multiply/add into maddld on ppc64le/power9Paul E. Murphy
Add a new lowering rule to match and replace such instances with the MADDLD instruction available on power9 where possible. Likewise, this plumbs in a new ppc64 ssa opcode to house the newly generated MADDLD instructions. When testing ed25519, this reduced binary size by 936B. Similarly, MADDLD combination occcurs in a few other less obvious cases such as division by constant. Testing of golang.org/x/crypto/ed25519 shows non-trivial speedup during keygeneration: name old time/op new time/op delta KeyGeneration 65.2µs ± 0% 63.1µs ± 0% -3.19% Signing 64.3µs ± 0% 64.4µs ± 0% +0.16% Verification 147µs ± 0% 147µs ± 0% +0.11% Similarly, this test binary has shrunk by 66488B. Change-Id: I077aeda7943119b41f07e4e62e44a648f16e4ad0 Reviewed-on: https://go-review.googlesource.com/c/go/+/248723 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
2020-04-29cmd/compile,cmd/internal/obj/ppc64: use mod instructions on power9Lynn Boger
This updates the PPC64.rules file to use the MOD instructions that are available in power9. Prior to power9 this is done using a longer sequence with multiply and divide. Included in this change is removal of the REM* opcode variations that set the CC or OV bits since their settings are based on the DIV and are not appropriate for the REM. Change-Id: Iceed9ce33e128e1911c15592ee674276ce8ba3fa Reviewed-on: https://go-review.googlesource.com/c/go/+/229761 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-04-10cmd/compile: use a Sym type instead of interface{} for symbolic offsetsKeith Randall
Will help with strongly typed rewrite rules. Change-Id: Ifbf316a49f4081322b3b8f13bc962713437d9aba Reviewed-on: https://go-review.googlesource.com/c/go/+/227785 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
2020-04-06cmd/compile: improve lowered moves and zeros for ppc64leLynn Boger
This change includes the following: - Generate LXV/STXV sequences instead of LXVD2X/STXVD2X on power9. These instructions do not require an index register, which allows more loads and stores within a loop without initializing multiple index registers. The LoweredQuadXXX generate LXV/STXV. - Create LoweredMoveXXXShort and LoweredZeroXXXShort for short moves that don't generate loops, and therefore don't clobber the address registers or flags. - Use registers other than R3 and R4 to avoid conflicting with registers that have already been allocated to avoid unnecessary register moves. - Eliminate the use of R14 as scratch register and use R31 instead. - Add PCALIGN when the LoweredMoveXXX or LoweredZeroXXX generates a loop with more than 3 iterations. This performance opportunity was noticed in github.com/golang/snappy benchmarks. Results on power9: WordsDecode1e1 54.1ns ± 0% 53.8ns ± 0% -0.51% (p=0.029 n=4+4) WordsDecode1e2 287ns ± 0% 282ns ± 1% -1.83% (p=0.029 n=4+4) WordsDecode1e3 3.98µs ± 0% 3.64µs ± 0% -8.52% (p=0.029 n=4+4) WordsDecode1e4 66.9µs ± 0% 67.0µs ± 0% +0.20% (p=0.029 n=4+4) WordsDecode1e5 723µs ± 0% 723µs ± 0% -0.01% (p=0.200 n=4+4) WordsDecode1e6 7.21ms ± 0% 7.21ms ± 0% -0.02% (p=1.000 n=4+4) WordsEncode1e1 29.9ns ± 0% 29.4ns ± 0% -1.51% (p=0.029 n=4+4) WordsEncode1e2 2.12µs ± 0% 1.75µs ± 0% -17.70% (p=0.029 n=4+4) WordsEncode1e3 11.7µs ± 0% 11.2µs ± 0% -4.61% (p=0.029 n=4+4) WordsEncode1e4 119µs ± 0% 120µs ± 0% +0.36% (p=0.029 n=4+4) WordsEncode1e5 1.21ms ± 0% 1.22ms ± 0% +0.41% (p=0.029 n=4+4) WordsEncode1e6 12.0ms ± 0% 12.0ms ± 0% +0.57% (p=0.029 n=4+4) RandomEncode 286µs ± 0% 203µs ± 0% -28.82% (p=0.029 n=4+4) ExtendMatch 47.4µs ± 0% 47.0µs ± 0% -0.85% (p=0.029 n=4+4) Change-Id: Iecad3a39ae55280286e42760a5c9d5c1168f5858 Reviewed-on: https://go-review.googlesource.com/c/go/+/226539 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-03-21cmd/compile: indexed loads/stores can't be faultOnNilArg0Keith Randall
Because of the index, these ops can't guarantee faulting if arg0 is nil. Clean up the PPC64 index ops - they can't take a sym or an offset. Noticed while debugging #37881. I don't think it is the cause, but I guess there is a chance. Update #37881 Change-Id: Ic22925250bf7b1ba64e3cea1a65638bc4bab390c Reviewed-on: https://go-review.googlesource.com/c/go/+/224457 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-03-02cmd/compile: remove duplicate ppc64 rulesJosh Bleecher Snyder
Const64 gets lowered to MOVDconst. Change rules using interior Const64 to use MOVDconst instead, to be less dependent on rule application order. As a result of doing this, some of the rules end up being exact duplicates; remove those. We had those exact duplicates because of the order dependency; ppc64 had no way to optimize away shifts by a constant if the initial lowering didn't catch it. Add those optimizations as well. The outcome is the same, but this makes the overall rules more robust. Change-Id: Iadd97a9fe73d52358d571d022ace145e506d160b Reviewed-on: https://go-review.googlesource.com/c/go/+/220877 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
2019-11-10cmd/compile: enable nil check logging for other architectures.David Chase
Change-Id: If82ebd9cd6470863eb5de9e031e7905a66218857 Reviewed-on: https://go-review.googlesource.com/c/go/+/204159 Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-11-07cmd/compile, cmd/internal/obj/ppc64: use LR for indirect callsCherry Zhang
On PPC64, indirect calls can be made through LR or CTR. Currently both are used. This CL changes it to always use LR. For async preemption, to return from the injected call, we need an indirect jump back to the PC we preeempted. This jump can be made through LR or CTR. So we'll have to clobber either LR or CTR. Currently, LR is used more frequently. In particular, for a leaf function, LR is live throughout the function. We don't want to make leaf functions nonpreemptible. So we choose CTR for the call injection. For code sequences that use CTR, if it is ok to use another register, change it to. Plus, it is a call so it will clobber LR anyway. It doesn't need to also clobber CTR (even without preemption). Change-Id: I07bd0e93b94a1a3aa2be2cd465801136165d8ab8 Reviewed-on: https://go-review.googlesource.com/c/go/+/203822 Run-TryBot: Cherry Zhang <cherryyz@google.com> Reviewed-by: Keith Randall <khr@golang.org>
2019-10-29cmd/compile: intrinsics for runtime/internal/atomic.Store8Austin Clements
For #10958, #24543, but makes sense on its own. Change-Id: I2a87dab66b82a1863e4b6512b1f8def51463ce2a Reviewed-on: https://go-review.googlesource.com/c/go/+/203284 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-10-28cmd/compile: delete ZeroAutoCherry Zhang
ZeroAuto was used with the ambiguously live logic. The ambiguously live logic is removed as we switched to stack objects. It is now never called. Remove. Change-Id: If4cdd7fed5297f8ab591cc392a76c80f57820856 Reviewed-on: https://go-review.googlesource.com/c/go/+/203538 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2019-10-08cmd/compile: use vsx loads and stores for LoweredMove, LoweredZero on ppc64xLynn Boger
This improves the code generated for LoweredMove and LoweredZero by using LXVD2X and STXVD2X to move 16 bytes at a time. These instructions are now used if the size to be moved or zeroed is >= 64. These same instructions have already been used in the asm implementations for memmove and memclr. Some examples where this shows an improvement on power8: MakeSlice/Byte 27.3ns ± 1% 25.2ns ± 0% -7.69% MakeSlice/Int16 40.2ns ± 0% 35.2ns ± 0% -12.39% MakeSlice/Int 94.9ns ± 1% 77.9ns ± 0% -17.92% MakeSlice/Ptr 129ns ± 1% 103ns ± 0% -20.16% MakeSlice/Struct/24 176ns ± 1% 131ns ± 0% -25.67% MakeSlice/Struct/32 200ns ± 1% 142ns ± 0% -29.09% MakeSlice/Struct/40 220ns ± 2% 156ns ± 0% -28.82% GrowSlice/Byte 81.4ns ± 0% 73.4ns ± 0% -9.88% GrowSlice/Int16 118ns ± 1% 98ns ± 0% -17.03% GrowSlice/Int 178ns ± 1% 134ns ± 1% -24.65% GrowSlice/Ptr 249ns ± 4% 212ns ± 0% -14.94% GrowSlice/Struct/24 294ns ± 5% 215ns ± 0% -27.08% GrowSlice/Struct/32 315ns ± 1% 248ns ± 0% -21.49% GrowSlice/Struct/40 382ns ± 4% 289ns ± 1% -24.38% ExtendSlice/IntSlice 109ns ± 1% 90ns ± 1% -17.51% ExtendSlice/PointerSlice 142ns ± 2% 118ns ± 0% -16.75% ExtendSlice/NoGrow 6.02ns ± 0% 5.88ns ± 0% -2.33% Append 27.2ns ± 0% 27.6ns ± 0% +1.38% AppendGrowByte 4.20ms ± 3% 2.60ms ± 0% -38.18% AppendGrowString 134ms ± 3% 102ms ± 2% -23.62% AppendSlice/1Bytes 5.65ns ± 0% 5.67ns ± 0% +0.35% AppendSlice/4Bytes 6.40ns ± 0% 6.55ns ± 0% +2.34% AppendSlice/7Bytes 8.74ns ± 0% 8.84ns ± 0% +1.14% AppendSlice/8Bytes 5.68ns ± 0% 5.70ns ± 0% +0.40% AppendSlice/15Bytes 9.31ns ± 0% 9.39ns ± 0% +0.86% AppendSlice/16Bytes 14.0ns ± 0% 5.8ns ± 0% -58.32% AppendSlice/32Bytes 5.72ns ± 0% 5.68ns ± 0% -0.66% AppendSliceLarge/1024Bytes 918ns ± 8% 615ns ± 1% -33.00% AppendSliceLarge/4096Bytes 3.25µs ± 1% 1.92µs ± 1% -40.84% AppendSliceLarge/16384Bytes 8.70µs ± 2% 4.69µs ± 0% -46.08% AppendSliceLarge/65536Bytes 18.1µs ± 3% 7.9µs ± 0% -56.30% AppendSliceLarge/262144Bytes 69.8µs ± 2% 25.9µs ± 0% -62.91% AppendSliceLarge/1048576Bytes 258µs ± 1% 93µs ± 0% -63.96% Change-Id: I21625dbe231a2029ddb9f7d73f5a6417b35c1e49 Reviewed-on: https://go-review.googlesource.com/c/go/+/199639 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-10-02cmd/compile: allow multiple SSA block control valuesMichael Munday
Control values are used to choose which successor of a block is jumped to. Typically a control value takes the form of a 'flags' value that represents the result of a comparison. Some architectures however use a variable in a register as a control value. Up until now we have managed with a single control value per block. However some architectures (e.g. s390x and riscv64) have combined compare-and-branch instructions that take two variables in registers as parameters. To generate these instructions we need to support 2 control values per block. This CL allows up to 2 control values to be used in a block in order to support the addition of compare-and-branch instructions. I have implemented s390x compare-and-branch instructions in a different CL. Passes toolstash-check -all. Results of compilebench: name old time/op new time/op delta Template 208ms ± 1% 209ms ± 1% ~ (p=0.289 n=20+20) Unicode 83.7ms ± 1% 83.3ms ± 3% -0.49% (p=0.017 n=18+18) GoTypes 748ms ± 1% 748ms ± 0% ~ (p=0.460 n=20+18) Compiler 3.47s ± 1% 3.48s ± 1% ~ (p=0.070 n=19+18) SSA 11.5s ± 1% 11.7s ± 1% +1.64% (p=0.000 n=19+18) Flate 130ms ± 1% 130ms ± 1% ~ (p=0.588 n=19+20) GoParser 160ms ± 1% 161ms ± 1% ~ (p=0.211 n=20+20) Reflect 465ms ± 1% 467ms ± 1% +0.42% (p=0.007 n=20+20) Tar 184ms ± 1% 185ms ± 2% ~ (p=0.087 n=18+20) XML 253ms ± 1% 253ms ± 1% ~ (p=0.377 n=20+18) LinkCompiler 769ms ± 2% 774ms ± 2% ~ (p=0.070 n=19+19) ExternalLinkCompiler 3.59s ±11% 3.68s ± 6% ~ (p=0.072 n=20+20) LinkWithoutDebugCompiler 446ms ± 5% 454ms ± 3% +1.79% (p=0.002 n=19+20) StdCmd 26.0s ± 2% 26.0s ± 2% ~ (p=0.799 n=20+20) name old user-time/op new user-time/op delta Template 238ms ± 5% 240ms ± 5% ~ (p=0.142 n=20+20) Unicode 105ms ±11% 106ms ±10% ~ (p=0.512 n=20+20) GoTypes 876ms ± 2% 873ms ± 4% ~ (p=0.647 n=20+19) Compiler 4.17s ± 2% 4.19s ± 1% ~ (p=0.093 n=20+18) SSA 13.9s ± 1% 14.1s ± 1% +1.45% (p=0.000 n=18+18) Flate 145ms ±13% 146ms ± 5% ~ (p=0.851 n=20+18) GoParser 185ms ± 5% 188ms ± 7% ~ (p=0.174 n=20+20) Reflect 534ms ± 3% 538ms ± 2% ~ (p=0.105 n=20+18) Tar 215ms ± 4% 211ms ± 9% ~ (p=0.079 n=19+20) XML 295ms ± 6% 295ms ± 5% ~ (p=0.968 n=20+20) LinkCompiler 832ms ± 4% 837ms ± 7% ~ (p=0.707 n=17+20) ExternalLinkCompiler 1.58s ± 8% 1.60s ± 4% ~ (p=0.296 n=20+19) LinkWithoutDebugCompiler 478ms ±12% 489ms ±10% ~ (p=0.429 n=20+20) name old object-bytes new object-bytes delta Template 559kB ± 0% 559kB ± 0% ~ (all equal) Unicode 216kB ± 0% 216kB ± 0% ~ (all equal) GoTypes 2.03MB ± 0% 2.03MB ± 0% ~ (all equal) Compiler 8.07MB ± 0% 8.07MB ± 0% -0.06% (p=0.000 n=20+20) SSA 27.1MB ± 0% 27.3MB ± 0% +0.89% (p=0.000 n=20+20) Flate 343kB ± 0% 343kB ± 0% ~ (all equal) GoParser 441kB ± 0% 441kB ± 0% ~ (all equal) Reflect 1.36MB ± 0% 1.36MB ± 0% ~ (all equal) Tar 487kB ± 0% 487kB ± 0% ~ (all equal) XML 632kB ± 0% 632kB ± 0% ~ (all equal) name old export-bytes new export-bytes delta Template 18.5kB ± 0% 18.5kB ± 0% ~ (all equal) Unicode 7.92kB ± 0% 7.92kB ± 0% ~ (all equal) GoTypes 35.0kB ± 0% 35.0kB ± 0% ~ (all equal) Compiler 109kB ± 0% 110kB ± 0% +0.72% (p=0.000 n=20+20) SSA 137kB ± 0% 138kB ± 0% +0.58% (p=0.000 n=20+20) Flate 4.89kB ± 0% 4.89kB ± 0% ~ (all equal) GoParser 8.49kB ± 0% 8.49kB ± 0% ~ (all equal) Reflect 11.4kB ± 0% 11.4kB ± 0% ~ (all equal) Tar 10.5kB ± 0% 10.5kB ± 0% ~ (all equal) XML 16.7kB ± 0% 16.7kB ± 0% ~ (all equal) name old text-bytes new text-bytes delta HelloSize 761kB ± 0% 761kB ± 0% ~ (all equal) CmdGoSize 10.8MB ± 0% 10.8MB ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 10.7kB ± 0% 10.7kB ± 0% ~ (all equal) CmdGoSize 312kB ± 0% 312kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 122kB ± 0% 122kB ± 0% ~ (all equal) CmdGoSize 146kB ± 0% 146kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.13MB ± 0% 1.13MB ± 0% ~ (all equal) CmdGoSize 15.1MB ± 0% 15.1MB ± 0% ~ (all equal) Change-Id: I3cc2f9829a109543d9a68be4a21775d2d3e9801f Reviewed-on: https://go-review.googlesource.com/c/go/+/196557 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Daniel Martí <mvdan@mvdan.cc> Reviewed-by: Keith Randall <khr@golang.org>
2019-09-23cmd/compile: fix register masks of ANDCC et al. on PPC64Cherry Zhang
PPC64's ANDCC, ORCC, XORCC SSA ops produce a flags value, which should not have register mask of an integer register. Fixes #34468. Change-Id: Ic762e423b20275fd9f8118dae7951c258d59738c Reviewed-on: https://go-review.googlesource.com/c/go/+/196960 Reviewed-by: Keith Randall <khr@golang.org>
2019-09-18cmd/asm,cmd/compile: clean up isel codegen on ppc64xLynn Boger
This cleans up the isel code generation in ssa for ppc64x. Current there is no isel op and the isel code is only generated from pseudo ops in ppc64/ssa.go, and only using operands with values 0 or 1. When the isel is generated, there is always a load of 1 into the temp register before it. This change implements the isel op so it can be used in PPC64.rules, and can recognize operand values other than 0 or 1. This also eliminates the forced load of 1, so it will be loaded only if needed. This will make the isel code generation consistent with other ops, and allow future rule changes that can take advantage of having a more general purpose isel rule. Change-Id: I363e1dbd3f7f5dfecb53187ad51cce409a8d1f8d Reviewed-on: https://go-review.googlesource.com/c/go/+/195057 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
2019-05-06all: simplify code using "gofmt -s -w"Shulhan
Most changes are removing redundant declaration of type when direct instantiating value of map or slice, e.g. []T{T{}} become []T{{}}. Small changes are removing the high order of subslice if its value is the length of slice itself, e.g. T[:len(T)] become T[:]. The following file is excluded due to incompatibility with go1.4, - src/cmd/compile/internal/gc/ssa.go Change-Id: Id3abb09401795ce1e6da591a89749cba8502fb26 Reviewed-on: https://go-review.googlesource.com/c/go/+/166437 Run-TryBot: Dave Cheney <dave@cheney.net> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
2019-05-03cmd/compile,runtime/internal/atomic: add Load8Austin Clements
Change-Id: Id52a5730cf9207ee7ccebac4ef12791dc5720e7c Reviewed-on: https://go-review.googlesource.com/c/go/+/172283 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2019-05-01cmd/compile/internal/ppc64: improve naming for ginsnop2Lynn Boger
This is a follow up from a review comment at the end of the last Go release, to provide a more meaningful name for ginsnop2. Updates #30475 Change-Id: Ice9efd763bf2204a9e8c55ae230d3e8a80210108 Reviewed-on: https://go-review.googlesource.com/c/go/+/174757 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-04-28cmd/compile: intrinsify math/bits.Add64 for ppc64xCarlos Eduardo Seo
This change creates an intrinsic for Add64 for ppc64x and adds a testcase for it. name old time/op new time/op delta Add64-160 1.90ns ±40% 2.29ns ± 0% ~ (p=0.119 n=5+5) Add64multiple-160 6.69ns ± 2% 2.45ns ± 4% -63.47% (p=0.016 n=4+5) Change-Id: I9abe6fb023fdf62eea3c9b46a1820f60bb0a7f97 Reviewed-on: https://go-review.googlesource.com/c/go/+/173758 Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> Run-TryBot: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
2019-04-07cmd/compile: remove AUNDEF opcodeKeith Randall
This opcode was only used to mark unreachable code for plive to use. plive now uses the SSA representation, so it knows locations are unreachable because they are ends of Exit blocks. It doesn't need these opcodes any more. These opcodes actually used space in the binary, 2 bytes per undef on x86 and more for other archs. Makes the amd64 go binary 0.2% smaller. Change-Id: I64c84c35db7c7949617a3a5830f09c8e5fcd2620 Reviewed-on: https://go-review.googlesource.com/c/go/+/171058 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2019-03-20cmd/compile/internal, cmd/internal/obj/ppc64: generate new count trailing ↵Carlos Eduardo Seo
zeros instructions on POWER9 This change adds new POWER9 instructions for counting trailing zeros (CNTTZW/CNTTZD) to the assembler and generates them in SSA when GOPPC64=power9. name old time/op new time/op delta TrailingZeros-160 1.59ns ±20% 1.45ns ±10% -8.81% (p=0.000 n=14+13) TrailingZeros8-160 1.55ns ±23% 1.62ns ±44% ~ (p=0.593 n=13+15) TrailingZeros16-160 1.78ns ±23% 1.62ns ±38% -9.31% (p=0.003 n=14+14) TrailingZeros32-160 1.64ns ±10% 1.49ns ± 9% -9.15% (p=0.000 n=13+14) TrailingZeros64-160 1.53ns ± 6% 1.45ns ± 5% -5.38% (p=0.000 n=15+13) Change-Id: I365e6ff79f3ce4d8ebe089a6a86b1771853eb596 Reviewed-on: https://go-review.googlesource.com/c/go/+/167517 Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
2019-03-18cmd/compile,runtime: provide index information on bounds check failureKeith Randall
A few examples (for accessing a slice of length 3): s[-1] runtime error: index out of range [-1] s[3] runtime error: index out of range [3] with length 3 s[-1:0] runtime error: slice bounds out of range [-1:] s[3:0] runtime error: slice bounds out of range [3:0] s[3:-1] runtime error: slice bounds out of range [:-1] s[3:4] runtime error: slice bounds out of range [:4] with capacity 3 s[0:3:4] runtime error: slice bounds out of range [::4] with capacity 3 Note that in cases where there are multiple things wrong with the indexes (e.g. s[3:-1]), we report one of those errors kind of arbitrarily, currently the rightmost one. An exhaustive set of examples is in issue30116[u].out in the CL. The message text has the same prefix as the old message text. That leads to slightly awkward phrasing but hopefully minimizes the chance that code depending on the error text will break. Increases the size of the go binary by 0.5% (amd64). The panic functions take arguments in registers in order to keep the size of the compiled code as small as possible. Fixes #30116 Change-Id: Idb99a827b7888822ca34c240eca87b7e44a04fdd Reviewed-on: https://go-review.googlesource.com/c/go/+/161477 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2019-02-25cmd/compile: call ginsnop, not ginsnop2 on ppc64le for mid-stack inlining ↵Lynn Boger
tracebacks A recent change to fix stacktraces for inlined functions introduced a regression on ppc64le when compiling position independent code. That happened because ginsnop2 was called for the purpose of inserting a NOP to identify the location of the inlined function, when ginsnop should have been used. ginsnop2 is intended to be used before deferreturn to ensure r2 is properly restored when compiling position independent code. In some cases the location where r2 is loaded from might not be initialized. If that happens and r2 is used to generate an address, the result is likely a SEGV. This fixes that problem. Fixes #30283 Change-Id: If70ef27fc65ef31969712422306ac3a57adbd5b6 Reviewed-on: https://go-review.googlesource.com/c/163337 Reviewed-by: Cherry Zhang <cherryyz@google.com> Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com> Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Andrew Bonventre <andybons@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-12-28cmd/compile,runtime: redo mid-stack inlining tracebacksKeith Randall
Work involved in getting a stack trace is divided between runtime.Callers and runtime.CallersFrames. Before this CL, runtime.Callers returns a pc per runtime frame. runtime.CallersFrames is responsible for expanding a runtime frame into potentially multiple user frames. After this CL, runtime.Callers returns a pc per user frame. runtime.CallersFrames just maps those to user frame info. Entries in the result of runtime.Callers are now pcs of the calls (or of the inline marks), not of the instruction just after the call. Fixes #29007 Fixes #28640 Update #26320 Change-Id: I1c9567596ff73dc73271311005097a9188c3406f Reviewed-on: https://go-review.googlesource.com/c/152537 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2018-11-26cmd/compile: fix nilcheck for AIXClément Chigot
This commit adapts compile tool to create correct nilchecks for AIX. AIX allows to load a nil pointer. Therefore, the default nilcheck which issues a load must be replaced by a CMP instruction followed by a store at 0x0 if the value is nil. The store will trigger a SIGSEGV as on others OS. The nilcheck algorithm must be adapted to do not remove nilcheck if it's only a read. Stores are detected with v.Type.IsMemory(). Tests related to nilptr must be adapted to the previous changements. nilptr.go cannot be used as it's because the AIX address space starts at 1<<32. Change-Id: I9f5aaf0b7e185d736a9b119c0ed2fe4e5bd1e7af Reviewed-on: https://go-review.googlesource.com/c/144538 Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2018-11-26runtime: handle 64bits addresses for AIXClément Chigot
This commit allows the runtime to handle 64bits addresses returned by mmap syscall on AIX. Mmap syscall returns addresses on 59bits on AIX. But the Arena implementation only allows addresses with less than 48 bits. This commit increases the arena size up to 1<<60 for aix/ppc64. Update: #25893 Change-Id: Iea72e8a944d10d4f00be915785e33ae82dd6329e Reviewed-on: https://go-review.googlesource.com/c/138736 Reviewed-by: Austin Clements <austin@google.com>