aboutsummaryrefslogtreecommitdiff
path: root/src/cmd/compile/internal/arm
AgeCommit message (Collapse)Author
2021-08-03[dev.typeparams] runtime,cmd/compile,cmd/link: replace jmpdefer with a loopAustin Clements
Currently, deferreturn runs deferred functions by backing up its return PC to the deferreturn call, and then effectively tail-calling the deferred function (via jmpdefer). The effect of this is that the deferred function appears to be called directly from the deferee, and when it returns, the deferee calls deferreturn again so it can run the next deferred function if necessary. This unusual flow control leads to a large number of special cases and complications all over the tool chain. This used to be necessary because deferreturn copied the deferred function's argument frame directly into its caller's frame and then had to invoke that call as if it had been called from its caller's frame so it could access it arguments. But now that we've simplified defer processing so the runtime only deals with argument-less closures, this approach is no longer necessary. This CL simplifies all of this by making deferreturn simply call deferred functions in a loop. This eliminates the need for jmpdefer, so we can delete a bunch of per-architecture assembly code. This eliminates several special cases on Wasm, since it couldn't support these calling shenanigans directly and thus had to simulate the loop a different way. Now Wasm can largely work the way the other platforms do. This eliminates the per-architecture Ginsnopdefer operation. On PPC64, this was necessary to reload the TOC pointer after the tail call (since TOC pointers in general make tail calls impossible). The tail call is gone, and in the case where we do force a jump to the deferreturn call when recovering from an open-coded defer, we go through gogo (via runtime.recovery), which handles the TOC. On other platforms, we needed a NOP so traceback didn't get confused by seeing the return to the CALL instruction, rather than the usual return to the instruction following the CALL instruction. Now we don't inject a return to the CALL instruction at all, so this NOP is also unnecessary. The one potential effect of this is that deferreturn could now appear in stack traces from deferred functions. However, this could already happen from open-coded defers, so we've long since marked deferreturn as a "wrapper" so it gets elided not only from printed stack traces, but from runtime.Callers*. This is a retry of CL 337652 because we had to back out its parent. There are no changes in this version. Change-Id: I3f54b7fec1d7ccac71cc6cf6835c6a46b7e5fb6c Reviewed-on: https://go-review.googlesource.com/c/go/+/339397 Trust: Austin Clements <austin@google.com> Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>
2021-07-30[dev.typeparams] Revert "[dev.typeparams] runtime,cmd/compile,cmd/link: ↵Austin Clements
replace jmpdefer with a loop" This reverts CL 227652. I'm reverting CL 337651 and this builds on top of it. Change-Id: I03ce363be44c2a3defff2e43e7b1aad83386820d Reviewed-on: https://go-review.googlesource.com/c/go/+/338709 Trust: Austin Clements <austin@google.com> Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>
2021-07-30[dev.typeparams] runtime,cmd/compile,cmd/link: replace jmpdefer with a loopAustin Clements
Currently, deferreturn runs deferred functions by backing up its return PC to the deferreturn call, and then effectively tail-calling the deferred function (via jmpdefer). The effect of this is that the deferred function appears to be called directly from the deferee, and when it returns, the deferee calls deferreturn again so it can run the next deferred function if necessary. This unusual flow control leads to a large number of special cases and complications all over the tool chain. This used to be necessary because deferreturn copied the deferred function's argument frame directly into its caller's frame and then had to invoke that call as if it had been called from its caller's frame so it could access it arguments. But now that we've simplified defer processing so the runtime only deals with argument-less closures, this approach is no longer necessary. This CL simplifies all of this by making deferreturn simply call deferred functions in a loop. This eliminates the need for jmpdefer, so we can delete a bunch of per-architecture assembly code. This eliminates several special cases on Wasm, since it couldn't support these calling shenanigans directly and thus had to simulate the loop a different way. Now Wasm can largely work the way the other platforms do. This eliminates the per-architecture Ginsnopdefer operation. On PPC64, this was necessary to reload the TOC pointer after the tail call (since TOC pointers in general make tail calls impossible). The tail call is gone, and in the case where we do force a jump to the deferreturn call when recovering from an open-coded defer, we go through gogo (via runtime.recovery), which handles the TOC. On other platforms, we needed a NOP so traceback didn't get confused by seeing the return to the CALL instruction, rather than the usual return to the instruction following the CALL instruction. Now we don't inject a return to the CALL instruction at all, so this NOP is also unnecessary. The one potential effect of this is that deferreturn could now appear in stack traces from deferred functions. However, this could already happen from open-coded defers, so we've long since marked deferreturn as a "wrapper" so it gets elided not only from printed stack traces, but from runtime.Callers*. Change-Id: Ie9f700cd3fb774f498c9edce363772a868407bf7 Reviewed-on: https://go-review.googlesource.com/c/go/+/337652 Trust: Austin Clements <austin@google.com> Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>
2021-04-16internal/buildcfg: move build configuration out of cmd/internal/objabiRuss Cox
The go/build package needs access to this configuration, so move it into a new package available to the standard library. Change-Id: I868a94148b52350c76116451f4ad9191246adcff Reviewed-on: https://go-review.googlesource.com/c/go/+/310731 Trust: Russ Cox <rsc@golang.org> Run-TryBot: Russ Cox <rsc@golang.org> Reviewed-by: Austin Clements <austin@google.com> Reviewed-by: Jay Conrod <jayconrod@google.com>
2021-03-19cmd/compile: add clobberdeadreg modeCherry Zhang
When -clobberdeadreg flag is set, the compiler inserts code that clobbers integer registers at call sites. This may be helpful for debugging register ABI. Only implemented on AMD64 for now. Change-Id: Ia203d3f891c30fd95d0103489056fe01d63a2899 Reviewed-on: https://go-review.googlesource.com/c/go/+/302809 Trust: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2021-03-02cmd/compile: optimize single-precision floating point square rootfanzha02
Add generic rule to rewrite the single-precision square root expression with one single-precision instruction. The optimization will reduce two times of precision converting between double-precision and single-precision. On arm64 flatform. previous: FCVTSD F0, F0 FSQRTD F0, F0 FCVTDS F0, F0 optimized: FSQRTS S0, S0 And this patch adds the test case to check the correctness. This patch refers to CL 241877, contributed by Alice Xu (dianhong.xu@arm.com) Change-Id: I6de5d02281c693017ac4bd4c10963dd55989bd7e Reviewed-on: https://go-review.googlesource.com/c/go/+/276873 Trust: fannie zhang <Fannie.Zhang@arm.com> Run-TryBot: fannie zhang <Fannie.Zhang@arm.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2021-02-25cmd/internal/obj: add Prog.SetFrom3{Reg,Const}Josh Bleecher Snyder
These are the the most common uses, and they reduce line noise. I don't love adding new deprecated APIs, but since they're trivial wrappers, it'll be very easy to update them along with the rest. No functional changes; passes toolstash-check. Change-Id: I691a8175cfef9081180e463c63f326376af3f3a6 Reviewed-on: https://go-review.googlesource.com/c/go/+/296009 Trust: Josh Bleecher Snyder <josharian@gmail.com> Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2021-02-25cmd/compile: automate resultInArg0 register checksJosh Bleecher Snyder
No functional changes; passes toolstash-check. No measureable performance changes. Change-Id: I2629f73d4a3cc56d80f512f33cf57cf41d8f15d3 Reviewed-on: https://go-review.googlesource.com/c/go/+/296010 Trust: Josh Bleecher Snyder <josharian@gmail.com> Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2020-12-23[dev.regabi] cmd/compile: split out package ssagen [generated]Russ Cox
[git-generate] cd src/cmd/compile/internal/gc rf ' # maxOpenDefers is declared in ssa.go but used only by walk. mv maxOpenDefers walk.go # gc.Arch -> ssagen.Arch # It is not as nice but will do for now. mv Arch ArchInfo mv thearch Arch mv Arch ArchInfo arch.go # Pull dwarf out of pgen.go. mv debuginfo declPos createDwarfVars preInliningDcls \ createSimpleVars createSimpleVar \ createComplexVars createComplexVar \ dwarf.go # Pull high-level compilation out of pgen.go, # leaving only the SSA code. mv compilequeue funccompile compile compilenow \ compileFunctions isInlinableButNotInlined \ initLSym \ compile.go mv BoundsCheckFunc GCWriteBarrierReg ssa.go mv largeStack largeStackFrames CheckLargeStacks pgen.go # All that is left in dcl.go is the nowritebarrierrecCheck mv dcl.go nowb.go # Export API and unexport non-API. mv initssaconfig InitConfig mv isIntrinsicCall IsIntrinsicCall mv ssaDumpInline DumpInline mv initSSATables InitTables mv initSSAEnv InitEnv mv compileSSA Compile mv stackOffset StackOffset mv canSSAType TypeOK mv SSAGenState State mv FwdRefAux fwdRefAux mv cgoSymABIs CgoSymABIs mv readSymABIs ReadSymABIs mv initLSym InitLSym mv useABIWrapGen symabiDefs CgoSymABIs ReadSymABIs InitLSym selectLSym makeABIWrapper setupTextLSym abi.go mv arch.go abi.go nowb.go phi.go pgen.go pgen_test.go ssa.go cmd/compile/internal/ssagen ' rm go.go gsubr.go Change-Id: I47fad6cbf1d1e583fd9139003a08401d7cd048a1 Reviewed-on: https://go-review.googlesource.com/c/go/+/279476 Trust: Russ Cox <rsc@golang.org> Run-TryBot: Russ Cox <rsc@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2020-12-23[dev.regabi] cmd/compile: split out package objw [generated]Russ Cox
Object file writing routines are used not just at the end of the compilation but also during static data layout in walk. Split them into their own package. [git-generate] cd src/cmd/compile/internal/gc rf ' # Move bit vector to new package bitvec mv bvec.n bvec.N mv bvec.b bvec.B mv bvec BitVec mv bvalloc New mv bvbulkalloc NewBulk mv bulkBvec.next bulkBvec.Next mv bulkBvec Bulk mv H0 h0 mv Hp hp # Leave bvecSet and bitmap hashes behind - not needed as broadly. mv bvecSet.extractUniqe bvecSet.extractUnique mv h0 bvecSet bvecSet.grow bvecSet.add \ bvecSet.extractUnique hashbitmap bvset.go mv bv.go cmd/compile/internal/bitvec ex . ../arm ../arm64 ../mips ../mips64 ../ppc64 ../s390x ../riscv64 { import "cmd/internal/obj" var a *obj.Addr var i int64 Addrconst(a, i) -> a.SetConst(i) var p, to *obj.Prog Patch(p, to) -> p.To.SetTarget(to) } rm Addrconst Patch # Move object-writing API to new package objw mv duint8 Objw_Uint8 mv duint16 Objw_Uint16 mv duint32 Objw_Uint32 mv duintptr Objw_Uintptr mv duintxx Objw_UintN mv dsymptr Objw_SymPtr mv dsymptrOff Objw_SymPtrOff mv dsymptrWeakOff Objw_SymPtrWeakOff mv ggloblsym Objw_Global mv dbvec Objw_BitVec mv newProgs NewProgs mv Progs.clearp Progs.Clear mv Progs.settext Progs.SetText mv Progs.next Progs.Next mv Progs.pc Progs.PC mv Progs.pos Progs.Pos mv Progs.curfn Progs.CurFunc mv Progs.progcache Progs.Cache mv Progs.cacheidx Progs.CacheIndex mv Progs.nextLive Progs.NextLive mv Progs.prevLive Progs.PrevLive mv Progs.Appendpp Progs.Append mv LivenessIndex.stackMapIndex LivenessIndex.StackMapIndex mv LivenessIndex.isUnsafePoint LivenessIndex.IsUnsafePoint mv Objw_Uint8 Objw_Uint16 Objw_Uint32 Objw_Uintptr Objw_UintN \ Objw_SymPtr Objw_SymPtrOff Objw_SymPtrWeakOff Objw_Global \ Objw_BitVec \ objw.go mv sharedProgArray NewProgs Progs \ LivenessIndex StackMapDontCare \ LivenessDontCare LivenessIndex.StackMapValid \ Progs.NewProg Progs.Flush Progs.Free Progs.Prog Progs.Clear Progs.Append Progs.SetText \ prog.go mv prog.go objw.go cmd/compile/internal/objw # Move ggloblnod to obj with the rest of the non-objw higher-level writing. mv ggloblnod obj.go ' cd ../objw rf ' mv Objw_Uint8 Uint8 mv Objw_Uint16 Uint16 mv Objw_Uint32 Uint32 mv Objw_Uintptr Uintptr mv Objw_UintN UintN mv Objw_SymPtr SymPtr mv Objw_SymPtrOff SymPtrOff mv Objw_SymPtrWeakOff SymPtrWeakOff mv Objw_Global Global mv Objw_BitVec BitVec ' Change-Id: I2b87085aa788564fb322e9c55bddd73347b4d5fd Reviewed-on: https://go-review.googlesource.com/c/go/+/279310 Trust: Russ Cox <rsc@golang.org> Run-TryBot: Russ Cox <rsc@golang.org> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2020-12-23[dev.regabi] cmd/compile: move type size calculations into package types ↵Russ Cox
[generated] To break up package gc, we need to put these calculations somewhere lower in the import graph, either an existing or new package. Package types already needs this code and is using hacks to get it without an import cycle. We can remove the hacks and set up for the new package gc by moving the code into package types itself. [git-generate] cd src/cmd/compile/internal/gc rf ' # Remove old import cycle hacks in gc. rm TypecheckInit:/types.Widthptr =/-0,/types.Dowidth =/+0 \ ../ssa/export_test.go:/types.Dowidth =/-+ ex { import "cmd/compile/internal/types" types.Widthptr -> Widthptr types.Dowidth -> dowidth } # Disable CalcSize in tests instead of base.Fatalf sub dowidth:/base.Fatalf\("dowidth without betypeinit"\)/ \ // Assume this is a test. \ return # Move size calculation into cmd/compile/internal/types mv Widthptr PtrSize mv Widthreg RegSize mv slicePtrOffset SlicePtrOffset mv sliceLenOffset SliceLenOffset mv sliceCapOffset SliceCapOffset mv sizeofSlice SliceSize mv sizeofString StringSize mv skipDowidthForTracing SkipSizeForTracing mv dowidth CalcSize mv checkwidth CheckSize mv widstruct calcStructOffset mv sizeCalculationDisabled CalcSizeDisabled mv defercheckwidth DeferCheckSize mv resumecheckwidth ResumeCheckSize mv typeptrdata PtrDataSize mv \ PtrSize RegSize SlicePtrOffset SkipSizeForTracing typePos align.go PtrDataSize \ size.go mv size.go cmd/compile/internal/types ' : # Remove old import cycle hacks in types. cd ../types rf ' ex { Widthptr -> PtrSize Dowidth -> CalcSize } rm Widthptr Dowidth ' Change-Id: Ib96cdc6bda2617235480c29392ea5cfb20f60cd8 Reviewed-on: https://go-review.googlesource.com/c/go/+/279234 Trust: Russ Cox <rsc@golang.org> Run-TryBot: Russ Cox <rsc@golang.org> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2020-12-23[dev.regabi] cmd/compile: group known symbols, packages, names [generated]Russ Cox
There are a handful of pre-computed magic symbols known by package gc, and we need a place to store them. If we keep them together, the need for type *ir.Name means that package ir is the lowest package in the import hierarchy that they can go in. And package ir needs gopkg for methodSymSuffix (in a later CL), so they can't go any higher either, at least not all together. So package ir it is. Rather than dump them all into the top-level package ir namespace, however, we introduce global structs, Syms, Pkgs, and Names, and make the known symbols, packages, and names fields of those. [git-generate] cd src/cmd/compile/internal/gc rf ' add go.go:$ \ // Names holds known names. \ var Names struct{} \ \ // Syms holds known symbols. \ var Syms struct {} \ \ // Pkgs holds known packages. \ var Pkgs struct {} \ mv staticuint64s Names.Staticuint64s mv zerobase Names.Zerobase mv assertE2I Syms.AssertE2I mv assertE2I2 Syms.AssertE2I2 mv assertI2I Syms.AssertI2I mv assertI2I2 Syms.AssertI2I2 mv deferproc Syms.Deferproc mv deferprocStack Syms.DeferprocStack mv Deferreturn Syms.Deferreturn mv Duffcopy Syms.Duffcopy mv Duffzero Syms.Duffzero mv gcWriteBarrier Syms.GCWriteBarrier mv goschedguarded Syms.Goschedguarded mv growslice Syms.Growslice mv msanread Syms.Msanread mv msanwrite Syms.Msanwrite mv msanmove Syms.Msanmove mv newobject Syms.Newobject mv newproc Syms.Newproc mv panicdivide Syms.Panicdivide mv panicshift Syms.Panicshift mv panicdottypeE Syms.PanicdottypeE mv panicdottypeI Syms.PanicdottypeI mv panicnildottype Syms.Panicnildottype mv panicoverflow Syms.Panicoverflow mv raceread Syms.Raceread mv racereadrange Syms.Racereadrange mv racewrite Syms.Racewrite mv racewriterange Syms.Racewriterange mv SigPanic Syms.SigPanic mv typedmemclr Syms.Typedmemclr mv typedmemmove Syms.Typedmemmove mv Udiv Syms.Udiv mv writeBarrier Syms.WriteBarrier mv zerobaseSym Syms.Zerobase mv arm64HasATOMICS Syms.ARM64HasATOMICS mv armHasVFPv4 Syms.ARMHasVFPv4 mv x86HasFMA Syms.X86HasFMA mv x86HasPOPCNT Syms.X86HasPOPCNT mv x86HasSSE41 Syms.X86HasSSE41 mv WasmDiv Syms.WasmDiv mv WasmMove Syms.WasmMove mv WasmZero Syms.WasmZero mv WasmTruncS Syms.WasmTruncS mv WasmTruncU Syms.WasmTruncU mv gopkg Pkgs.Go mv itabpkg Pkgs.Itab mv itablinkpkg Pkgs.Itablink mv mappkg Pkgs.Map mv msanpkg Pkgs.Msan mv racepkg Pkgs.Race mv Runtimepkg Pkgs.Runtime mv trackpkg Pkgs.Track mv unsafepkg Pkgs.Unsafe mv Names Syms Pkgs symtab.go mv symtab.go cmd/compile/internal/ir ' Change-Id: Ic143862148569a3bcde8e70b26d75421aa2d00f3 Reviewed-on: https://go-review.googlesource.com/c/go/+/279235 Trust: Russ Cox <rsc@golang.org> Run-TryBot: Russ Cox <rsc@golang.org> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2020-12-08[dev.regabi] cmd/compile: rewrite Aux uses of ir.Node to *ir.Name [generated]Matthew Dempsky
Now that the only remaining ir.Node implementation that is stored (directly) into ssa.Aux, we can rewrite all of the conversions between ir.Node and ssa.Aux to use *ir.Name instead. rf doesn't have a way to rewrite the type switch case clauses, so we just use sed instead. There's only a handful, and they're the only times that "case ir.Node" appears anyway. The next CL will move the tag method declarations so that ir.Node no longer implements ssa.Aux. Passes buildall w/ toolstash -cmp. Updates #42982. [git-generate] cd src/cmd/compile/internal sed -i -e 's/case ir.Node/case *ir.Name/' gc/plive.go */ssa.go cd ssa rf ' ex . ../gc { import "cmd/compile/internal/ir" var v *Value v.Aux.(ir.Node) -> v.Aux.(*ir.Name) var n ir.Node var asAux func(Aux) strict n # only match ir.Node-typed expressions; not *ir.Name implicit asAux # match implicit assignments to ssa.Aux asAux(n) -> n.(*ir.Name) } ' Change-Id: I3206ef5f12a7cfa37c5fecc67a1ca02ea4d52b32 Reviewed-on: https://go-review.googlesource.com/c/go/+/275789 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Go Bot <gobot@golang.org> Trust: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
2020-11-25[dev.regabi] cmd/compile: replace *Node type with an interface Node [generated]Russ Cox
The plan is to introduce a Node interface that replaces the old *Node pointer-to-struct. The previous CL defined an interface INode modeling a *Node. This CL: - Changes all references outside internal/ir to use INode, along with many references inside internal/ir as well. - Renames Node to node. - Renames INode to Node So now ir.Node is an interface implemented by *ir.node, which is otherwise inaccessible, and the code outside package ir is now (clearly) using only the interface. The usual rule is never to redefine an existing name with a new meaning, so that old code that hasn't been updated gets a "unknown name" error instead of more mysterious errors or silent misbehavior. That rule would caution against replacing Node-the-struct with Node-the-interface, as in this CL, because code that says *Node would now be using a pointer to an interface. But this CL is being landed at the same time as another that moves Node from gc to ir. So the net effect is to replace *gc.Node with ir.Node, which does follow the rule: any lingering references to gc.Node will be told it's gone, not silently start using pointers to interfaces. So the rule is followed by the CL sequence, just not this specific CL. Overall, the loss of inlining caused by using interfaces cuts the compiler speed by about 6%, a not insignificant amount. However, as we convert the representation to concrete structs that are not the giant Node over the next weeks, that speed should come back as more of the compiler starts operating directly on concrete types and the memory taken up by the graph of Nodes drops due to the more precise structs. Honestly, I was expecting worse. % benchstat bench.old bench.new name old time/op new time/op delta Template 168ms ± 4% 182ms ± 2% +8.34% (p=0.000 n=9+9) Unicode 72.2ms ±10% 82.5ms ± 6% +14.38% (p=0.000 n=9+9) GoTypes 563ms ± 8% 598ms ± 2% +6.14% (p=0.006 n=9+9) Compiler 2.89s ± 4% 3.04s ± 2% +5.37% (p=0.000 n=10+9) SSA 6.45s ± 4% 7.25s ± 5% +12.41% (p=0.000 n=9+10) Flate 105ms ± 2% 115ms ± 1% +9.66% (p=0.000 n=10+8) GoParser 144ms ±10% 152ms ± 2% +5.79% (p=0.011 n=9+8) Reflect 345ms ± 9% 370ms ± 4% +7.28% (p=0.001 n=10+9) Tar 149ms ± 9% 161ms ± 5% +8.05% (p=0.001 n=10+9) XML 190ms ± 3% 209ms ± 2% +9.54% (p=0.000 n=9+8) LinkCompiler 327ms ± 2% 325ms ± 2% ~ (p=0.382 n=8+8) ExternalLinkCompiler 1.77s ± 4% 1.73s ± 6% ~ (p=0.113 n=9+10) LinkWithoutDebugCompiler 214ms ± 4% 211ms ± 2% ~ (p=0.360 n=10+8) StdCmd 14.8s ± 3% 15.9s ± 1% +6.98% (p=0.000 n=10+9) [Geo mean] 480ms 510ms +6.31% name old user-time/op new user-time/op delta Template 223ms ± 3% 237ms ± 3% +6.16% (p=0.000 n=9+10) Unicode 103ms ± 6% 113ms ± 3% +9.53% (p=0.000 n=9+9) GoTypes 758ms ± 8% 800ms ± 2% +5.55% (p=0.003 n=10+9) Compiler 3.95s ± 2% 4.12s ± 2% +4.34% (p=0.000 n=10+9) SSA 9.43s ± 1% 9.74s ± 4% +3.25% (p=0.000 n=8+10) Flate 132ms ± 2% 141ms ± 2% +6.89% (p=0.000 n=9+9) GoParser 177ms ± 9% 183ms ± 4% ~ (p=0.050 n=9+9) Reflect 467ms ±10% 495ms ± 7% +6.17% (p=0.029 n=10+10) Tar 183ms ± 9% 197ms ± 5% +7.92% (p=0.001 n=10+10) XML 249ms ± 5% 268ms ± 4% +7.82% (p=0.000 n=10+9) LinkCompiler 544ms ± 5% 544ms ± 6% ~ (p=0.863 n=9+9) ExternalLinkCompiler 1.79s ± 4% 1.75s ± 6% ~ (p=0.075 n=10+10) LinkWithoutDebugCompiler 248ms ± 6% 246ms ± 2% ~ (p=0.965 n=10+8) [Geo mean] 483ms 504ms +4.41% [git-generate] cd src/cmd/compile/internal/ir : # We need to do the conversion in multiple steps, so we introduce : # a temporary type alias that will start out meaning the pointer-to-struct : # and then change to mean the interface. rf ' mv Node OldNode add node.go \ type Node = *OldNode ' : # It should work to do this ex in ir, but it misses test files, due to a bug in rf. : # Run the command in gc to handle gc's tests, and then again in ssa for ssa's tests. cd ../gc rf ' ex . ../arm ../riscv64 ../arm64 ../mips64 ../ppc64 ../mips ../wasm { import "cmd/compile/internal/ir" *ir.OldNode -> ir.Node } ' cd ../ssa rf ' ex { import "cmd/compile/internal/ir" *ir.OldNode -> ir.Node } ' : # Back in ir, finish conversion clumsily with sed, : # because type checking and circular aliases do not mix. cd ../ir sed -i '' ' /type Node = \*OldNode/d s/\*OldNode/Node/g s/^func (n Node)/func (n *OldNode)/ s/OldNode/node/g s/type INode interface/type Node interface/ s/var _ INode = (Node)(nil)/var _ Node = (*node)(nil)/ ' *.go gofmt -w *.go sed -i '' ' s/{Func{}, 136, 248}/{Func{}, 152, 280}/ s/{Name{}, 32, 56}/{Name{}, 44, 80}/ s/{Param{}, 24, 48}/{Param{}, 44, 88}/ s/{node{}, 76, 128}/{node{}, 88, 152}/ ' sizeof_test.go cd ../ssa sed -i '' ' s/{LocalSlot{}, 28, 40}/{LocalSlot{}, 32, 48}/ ' sizeof_test.go cd ../gc sed -i '' 's/\*ir.Node/ir.Node/' mkbuiltin.go cd ../../../.. go install std cmd cd cmd/compile go test -u || go test -u Change-Id: I196bbe3b648e4701662e4a2bada40bf155e2a553 Reviewed-on: https://go-review.googlesource.com/c/go/+/272935 Trust: Russ Cox <rsc@golang.org> Run-TryBot: Russ Cox <rsc@golang.org> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2020-11-25[dev.regabi] cmd/compile: introduce cmd/compile/internal/ir [generated]Russ Cox
If we want to break up package gc at all, we will need to move the compiler IR it defines into a separate package that can be imported by packages that gc itself imports. This CL does that. It also removes the TINT8 etc aliases so that all code is clear about which package things are coming from. This CL is automatically generated by the script below. See the comments in the script for details about the changes. [git-generate] cd src/cmd/compile/internal/gc rf ' # These names were never fully qualified # when the types package was added. # Do it now, to avoid confusion about where they live. inline -rm \ Txxx \ TINT8 \ TUINT8 \ TINT16 \ TUINT16 \ TINT32 \ TUINT32 \ TINT64 \ TUINT64 \ TINT \ TUINT \ TUINTPTR \ TCOMPLEX64 \ TCOMPLEX128 \ TFLOAT32 \ TFLOAT64 \ TBOOL \ TPTR \ TFUNC \ TSLICE \ TARRAY \ TSTRUCT \ TCHAN \ TMAP \ TINTER \ TFORW \ TANY \ TSTRING \ TUNSAFEPTR \ TIDEAL \ TNIL \ TBLANK \ TFUNCARGS \ TCHANARGS \ NTYPE \ BADWIDTH # esc.go and escape.go do not need to be split. # Append esc.go onto the end of escape.go. mv esc.go escape.go # Pull out the type format installation from func Main, # so it can be carried into package ir. mv Main:/Sconv.=/-0,/TypeLinkSym/-1 InstallTypeFormats # Names that need to be exported for use by code left in gc. mv Isconst IsConst mv asNode AsNode mv asNodes AsNodes mv asTypesNode AsTypesNode mv basicnames BasicTypeNames mv builtinpkg BuiltinPkg mv consttype ConstType mv dumplist DumpList mv fdumplist FDumpList mv fmtMode FmtMode mv goopnames OpNames mv inspect Inspect mv inspectList InspectList mv localpkg LocalPkg mv nblank BlankNode mv numImport NumImport mv opprec OpPrec mv origSym OrigSym mv stmtwithinit StmtWithInit mv dump DumpAny mv fdump FDumpAny mv nod Nod mv nodl NodAt mv newname NewName mv newnamel NewNameAt mv assertRepresents AssertValidTypeForConst mv represents ValidTypeForConst mv nodlit NewLiteral # Types and fields that need to be exported for use by gc. mv nowritebarrierrecCallSym SymAndPos mv SymAndPos.lineno SymAndPos.Pos mv SymAndPos.target SymAndPos.Sym mv Func.lsym Func.LSym mv Func.setWBPos Func.SetWBPos mv Func.numReturns Func.NumReturns mv Func.numDefers Func.NumDefers mv Func.nwbrCalls Func.NWBRCalls # initLSym is an algorithm left behind in gc, # not an operation on Func itself. mv Func.initLSym initLSym mv nodeQueue NodeQueue mv NodeQueue.empty NodeQueue.Empty mv NodeQueue.popLeft NodeQueue.PopLeft mv NodeQueue.pushRight NodeQueue.PushRight # Many methods on Node are actually algorithms that # would apply to any node implementation. # Those become plain functions. mv Node.funcname FuncName mv Node.isBlank IsBlank mv Node.isGoConst isGoConst mv Node.isNil IsNil mv Node.isParamHeapCopy isParamHeapCopy mv Node.isParamStackCopy isParamStackCopy mv Node.isSimpleName isSimpleName mv Node.mayBeShared MayBeShared mv Node.pkgFuncName PkgFuncName mv Node.backingArrayPtrLen backingArrayPtrLen mv Node.isterminating isTermNode mv Node.labeledControl labeledControl mv Nodes.isterminating isTermNodes mv Nodes.sigerr fmtSignature mv Node.MethodName methodExprName mv Node.MethodFunc methodExprFunc mv Node.IsMethod IsMethod # Every node will need to implement RawCopy; # Copy and SepCopy algorithms will use it. mv Node.rawcopy Node.RawCopy mv Node.copy Copy mv Node.sepcopy SepCopy # Extract Node.Format method body into func FmtNode, # but leave method wrapper behind. mv Node.Format:0,$ FmtNode # Formatting helpers that will apply to all node implementations. mv Node.Line Line mv Node.exprfmt exprFmt mv Node.jconv jconvFmt mv Node.modeString modeString mv Node.nconv nconvFmt mv Node.nodedump nodeDumpFmt mv Node.nodefmt nodeFmt mv Node.stmtfmt stmtFmt # Constant support needed for code moving to ir. mv okforconst OKForConst mv vconv FmtConst mv int64Val Int64Val mv float64Val Float64Val mv Node.ValueInterface ConstValue # Organize code into files. mv LocalPkg BuiltinPkg ir.go mv NumImport InstallTypeFormats Line fmt.go mv syntax.go Nod NodAt NewNameAt Class Pxxx PragmaFlag Nointerface SymAndPos \ AsNode AsTypesNode BlankNode OrigSym \ Node.SliceBounds Node.SetSliceBounds Op.IsSlice3 \ IsConst Node.Int64Val Node.CanInt64 Node.Uint64Val Node.BoolVal Node.StringVal \ Node.RawCopy SepCopy Copy \ IsNil IsBlank IsMethod \ Node.Typ Node.StorageClass node.go mv ConstType ConstValue Int64Val Float64Val AssertValidTypeForConst ValidTypeForConst NewLiteral idealType OKForConst val.go # Move files to new ir package. mv bitset.go class_string.go dump.go fmt.go \ ir.go node.go op_string.go val.go \ sizeof_test.go cmd/compile/internal/ir ' : # fix mkbuiltin.go to generate the changes made to builtin.go during rf sed -i '' ' s/\[T/[types.T/g s/\*Node/*ir.Node/g /internal\/types/c \ fmt.Fprintln(&b, `import (`) \ fmt.Fprintln(&b, ` "cmd/compile/internal/ir"`) \ fmt.Fprintln(&b, ` "cmd/compile/internal/types"`) \ fmt.Fprintln(&b, `)`) ' mkbuiltin.go gofmt -w mkbuiltin.go : # update cmd/dist to add internal/ir cd ../../../dist sed -i '' '/compile.internal.gc/a\ "cmd/compile/internal/ir", ' buildtool.go gofmt -w buildtool.go : # update cmd/compile TestFormats cd ../.. go install std cmd cd cmd/compile go test -u || go test # first one updates but fails; second passes Change-Id: I5f7caf6b20629b51970279e81231a3574d5b51db Reviewed-on: https://go-review.googlesource.com/c/go/+/273008 Trust: Russ Cox <rsc@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2020-11-25[dev.regabi] cmd/compile: introduce cmd/compile/internal/base [generated]Russ Cox
Move Flag, Debug, Ctxt, Exit, and error messages to new package cmd/compile/internal/base. These are the core functionality that everything in gc uses and which otherwise prevent splitting any other code out of gc into different packages. A minor milestone: the compiler source code no longer contains the string "yy". [git-generate] cd src/cmd/compile/internal/gc rf ' mv atExit AtExit mv Ctxt atExitFuncs AtExit Exit base.go mv lineno Pos mv linestr FmtPos mv flusherrors FlushErrors mv yyerror Errorf mv yyerrorl ErrorfAt mv yyerrorv ErrorfVers mv noder.yyerrorpos noder.errorAt mv Warnl WarnfAt mv errorexit ErrorExit mv base.go debug.go flag.go print.go cmd/compile/internal/base ' : # update comments sed -i '' 's/yyerrorl/ErrorfAt/g; s/yyerror/Errorf/g' *.go : # bootstrap.go is not built by default so invisible to rf sed -i '' 's/Fatalf/base.Fatalf/' bootstrap.go goimports -w bootstrap.go : # update cmd/dist to add internal/base cd ../../../dist sed -i '' '/internal.amd64/a\ "cmd/compile/internal/base", ' buildtool.go gofmt -w buildtool.go Change-Id: I59903c7084222d6eaee38823fd222159ba24a31a Reviewed-on: https://go-review.googlesource.com/c/go/+/272250 Trust: Russ Cox <rsc@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2020-11-25[dev.regabi] cmd/compile: clean up debug flag (-d) handling [generated]Russ Cox
The debug table is not as haphazard as flags, but there are still a few mismatches between command-line names and variable names. This CL moves them all into a consistent home (var Debug, like var Flag). Code updated automatically using the rf command below. A followup CL will make a few manual cleanups, leaving this CL completely automated and easier to regenerate during merge conflicts. [git-generate] cd src/cmd/compile/internal/gc rf ' add main.go var Debug struct{} mv Debug_append Debug.Append mv Debug_checkptr Debug.Checkptr mv Debug_closure Debug.Closure mv Debug_compilelater Debug.CompileLater mv disable_checknil Debug.DisableNil mv debug_dclstack Debug.DclStack mv Debug_gcprog Debug.GCProg mv Debug_libfuzzer Debug.Libfuzzer mv Debug_checknil Debug.Nil mv Debug_panic Debug.Panic mv Debug_slice Debug.Slice mv Debug_typeassert Debug.TypeAssert mv Debug_wb Debug.WB mv Debug_export Debug.Export mv Debug_pctab Debug.PCTab mv Debug_locationlist Debug.LocationLists mv Debug_typecheckinl Debug.TypecheckInl mv Debug_gendwarfinl Debug.DwarfInl mv Debug_softfloat Debug.SoftFloat mv Debug_defer Debug.Defer mv Debug_dumpptrs Debug.DumpPtrs mv flag.go:/parse.-d/-1,/unknown.debug/+2 parseDebug mv debugtab Debug parseDebug \ debugHelpHeader debugHelpFooter \ debug.go # Remove //go:generate line copied from main.go rm debug.go:/go:generate/-+ ' Change-Id: I625761ca5659be4052f7161a83baa00df75cca91 Reviewed-on: https://go-review.googlesource.com/c/go/+/272246 Trust: Russ Cox <rsc@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2020-06-18cmd/compile: redo flag constant ops for armKeith Randall
Encode the flag results in an auxint field instead of having one opcode per flag state. This helps us handle the new *noov branches in a unified manner. This is only for arm, arm64 is in a subsequent CL. We could extend to other architectures as well, athough it would only be cleanup, no behavioral change. Update #39505 Change-Id: Ia46cea596faad540d1496c5915ab1274571543f0 Reviewed-on: https://go-review.googlesource.com/c/go/+/238077 Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-06-09cmd/compile: ARM comparisons with 0 incorrect on overflowXiangdong Ji
Some ARM rewriting rules convert 'comparing to zero' conditions of if statements to a simplified version utilizing CMN and CMP instructions to branch over condition flags, in order to save one Add or Sub caculation. Such optimizations lead to wrong branching in case an overflow/underflow occurs when executing CMN or CMP. Fix the issue by introducing new block opcodes that don't honor the overflow/underflow flag: Block-Op Meaning ARM condition codes 1. LTnoov less than MI 2. GEnoov greater than or equal PL 3. LEnoov less than or equal MI || EQ 4. GTnoov greater than NEQ & PL The patch also adds a few test cases to cover scenarios that are specific to ARM and fine-tunes the code generation tests for 'x-const'. For more details please refer to the previous fix on 64-bit ARM: https://go-review.googlesource.com/c/go/+/233097 Go1 perf, 'old' is the non-optimized version, that is removing all concerned rewriting rules. name old time/op new time/op delta BinaryTree17-8 7.73s ± 0% 7.81s ± 0% +0.97% (p=0.000 n=7+8) Fannkuch11-8 7.06s ± 0% 7.00s ± 0% -0.83% (p=0.000 n=8+8) FmtFprintfEmpty-8 181ns ± 1% 183ns ± 1% +1.31% (p=0.001 n=8+8) FmtFprintfString-8 319ns ± 1% 325ns ± 2% +1.71% (p=0.009 n=7+8) FmtFprintfInt-8 358ns ± 1% 359ns ± 1% ~ (p=0.293 n=7+7) FmtFprintfIntInt-8 459ns ± 3% 456ns ± 1% ~ (p=0.869 n=8+8) FmtFprintfPrefixedInt-8 535ns ± 4% 538ns ± 4% ~ (p=0.572 n=8+8) FmtFprintfFloat-8 1.01µs ± 2% 1.01µs ± 2% ~ (p=0.625 n=8+8) FmtManyArgs-8 1.93µs ± 2% 1.93µs ± 1% ~ (p=0.979 n=8+7) GobDecode-8 16.1ms ± 1% 16.5ms ± 1% +2.32% (p=0.000 n=8+8) GobEncode-8 15.9ms ± 0% 15.8ms ± 1% -1.00% (p=0.000 n=8+7) Gzip-8 690ms ± 1% 670ms ± 0% -2.90% (p=0.000 n=8+8) Gunzip-8 109ms ± 1% 109ms ± 1% ~ (p=0.694 n=7+8) HTTPClientServer-8 149µs ± 3% 146µs ± 2% -1.70% (p=0.028 n=8+8) JSONEncode-8 50.5ms ± 1% 49.2ms ± 0% -2.60% (p=0.001 n=7+7) JSONDecode-8 135ms ± 2% 137ms ± 1% ~ (p=0.054 n=8+7) Mandelbrot200-8 951ms ± 0% 952ms ± 0% ~ (p=0.852 n=6+8) GoParse-8 9.47ms ± 1% 9.66ms ± 1% +2.01% (p=0.000 n=8+8) RegexpMatchEasy0_32-8 288ns ± 2% 277ns ± 2% -3.61% (p=0.000 n=8+8) RegexpMatchEasy0_1K-8 1.66µs ± 1% 1.69µs ± 2% +2.21% (p=0.001 n=7+7) RegexpMatchEasy1_32-8 334ns ± 1% 305ns ± 2% -8.86% (p=0.000 n=8+8) RegexpMatchEasy1_1K-8 2.14µs ± 2% 2.15µs ± 0% ~ (p=0.099 n=8+8) RegexpMatchMedium_32-8 13.3ns ± 1% 13.3ns ± 0% ~ (p=1.000 n=7+7) RegexpMatchMedium_1K-8 81.1µs ± 3% 80.7µs ± 1% ~ (p=0.955 n=7+8) RegexpMatchHard_32-8 4.26µs ± 0% 4.26µs ± 0% ~ (p=0.933 n=7+8) RegexpMatchHard_1K-8 124µs ± 0% 124µs ± 0% +0.31% (p=0.000 n=8+8) Revcomp-8 14.7ms ± 2% 14.5ms ± 1% -1.66% (p=0.003 n=8+8) Template-8 197ms ± 2% 200ms ± 3% +1.62% (p=0.021 n=8+8) TimeParse-8 1.33µs ± 1% 1.30µs ± 1% -1.86% (p=0.002 n=8+8) TimeFormat-8 3.04µs ± 1% 3.02µs ± 0% -0.60% (p=0.000 n=8+8) name old speed new speed delta GobDecode-8 47.6MB/s ± 1% 46.5MB/s ± 1% -2.28% (p=0.000 n=8+8) GobEncode-8 48.1MB/s ± 0% 48.6MB/s ± 1% +1.02% (p=0.000 n=8+7) Gzip-8 28.1MB/s ± 1% 29.0MB/s ± 0% +2.97% (p=0.000 n=8+8) Gunzip-8 178MB/s ± 1% 179MB/s ± 2% ~ (p=0.694 n=7+8) JSONEncode-8 38.4MB/s ± 1% 39.4MB/s ± 0% +2.67% (p=0.001 n=7+7) JSONDecode-8 14.3MB/s ± 2% 14.2MB/s ± 1% -0.81% (p=0.043 n=8+7) GoParse-8 6.12MB/s ± 1% 5.99MB/s ± 1% -2.00% (p=0.000 n=8+8) RegexpMatchEasy0_32-8 111MB/s ± 2% 115MB/s ± 2% +3.77% (p=0.000 n=8+8) RegexpMatchEasy0_1K-8 618MB/s ± 1% 604MB/s ± 2% -2.16% (p=0.001 n=7+7) RegexpMatchEasy1_32-8 95.7MB/s ± 1% 105.1MB/s ± 2% +9.76% (p=0.000 n=8+8) RegexpMatchEasy1_1K-8 479MB/s ± 2% 477MB/s ± 0% ~ (p=0.105 n=8+8) RegexpMatchMedium_32-8 75.2MB/s ± 1% 75.2MB/s ± 0% ~ (p=0.247 n=7+7) RegexpMatchMedium_1K-8 12.6MB/s ± 3% 12.7MB/s ± 1% ~ (p=0.538 n=7+8) RegexpMatchHard_32-8 7.52MB/s ± 0% 7.52MB/s ± 0% ~ (p=0.968 n=7+8) RegexpMatchHard_1K-8 8.26MB/s ± 0% 8.24MB/s ± 0% -0.30% (p=0.001 n=8+8) Revcomp-8 173MB/s ± 2% 176MB/s ± 1% +1.68% (p=0.003 n=8+8) Template-8 9.85MB/s ± 2% 9.69MB/s ± 3% -1.59% (p=0.021 n=8+8) Fixes #39303 Updates #38740 Change-Id: I0a5f87bfda679f66414c0041ace2ca2e28363f36 Reviewed-on: https://go-review.googlesource.com/c/go/+/236637 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-11-10cmd/compile: enable nil check logging for other architectures.David Chase
Change-Id: If82ebd9cd6470863eb5de9e031e7905a66218857 Reviewed-on: https://go-review.googlesource.com/c/go/+/204159 Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-10-28cmd/compile: delete ZeroAutoCherry Zhang
ZeroAuto was used with the ambiguously live logic. The ambiguously live logic is removed as we switched to stack objects. It is now never called. Remove. Change-Id: If4cdd7fed5297f8ab591cc392a76c80f57820856 Reviewed-on: https://go-review.googlesource.com/c/go/+/203538 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2019-10-21cmd/compile: add fma intrinsic for armsmasher164
This change introduces an arm intrinsic that generates the FMULAD instruction for the fused-multiply-add operation on systems that support it. System support is detected via cpu.ARM.HasVFPv4. A rewrite rule translates the generic intrinsic to FMULAD. Updates #25819. Change-Id: I8459e5dd1cdbdca35f88a78dbeb7d387f1e20efa Reviewed-on: https://go-review.googlesource.com/c/go/+/142117 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-10-09all: remove the nacl port (part 2, amd64p32 + toolchain)Brad Fitzpatrick
This is part two if the nacl removal. Part 1 was CL 199499. This CL removes amd64p32 support, which might be useful in the future if we implement the x32 ABI. It also removes the nacl bits in the toolchain, and some remaining nacl bits. Updates #30439 Change-Id: I2475d5bb066d1b474e00e40d95b520e7c2e286e1 Reviewed-on: https://go-review.googlesource.com/c/go/+/200077 Reviewed-by: Ian Lance Taylor <iant@golang.org>
2019-10-02cmd/compile: allow multiple SSA block control valuesMichael Munday
Control values are used to choose which successor of a block is jumped to. Typically a control value takes the form of a 'flags' value that represents the result of a comparison. Some architectures however use a variable in a register as a control value. Up until now we have managed with a single control value per block. However some architectures (e.g. s390x and riscv64) have combined compare-and-branch instructions that take two variables in registers as parameters. To generate these instructions we need to support 2 control values per block. This CL allows up to 2 control values to be used in a block in order to support the addition of compare-and-branch instructions. I have implemented s390x compare-and-branch instructions in a different CL. Passes toolstash-check -all. Results of compilebench: name old time/op new time/op delta Template 208ms ± 1% 209ms ± 1% ~ (p=0.289 n=20+20) Unicode 83.7ms ± 1% 83.3ms ± 3% -0.49% (p=0.017 n=18+18) GoTypes 748ms ± 1% 748ms ± 0% ~ (p=0.460 n=20+18) Compiler 3.47s ± 1% 3.48s ± 1% ~ (p=0.070 n=19+18) SSA 11.5s ± 1% 11.7s ± 1% +1.64% (p=0.000 n=19+18) Flate 130ms ± 1% 130ms ± 1% ~ (p=0.588 n=19+20) GoParser 160ms ± 1% 161ms ± 1% ~ (p=0.211 n=20+20) Reflect 465ms ± 1% 467ms ± 1% +0.42% (p=0.007 n=20+20) Tar 184ms ± 1% 185ms ± 2% ~ (p=0.087 n=18+20) XML 253ms ± 1% 253ms ± 1% ~ (p=0.377 n=20+18) LinkCompiler 769ms ± 2% 774ms ± 2% ~ (p=0.070 n=19+19) ExternalLinkCompiler 3.59s ±11% 3.68s ± 6% ~ (p=0.072 n=20+20) LinkWithoutDebugCompiler 446ms ± 5% 454ms ± 3% +1.79% (p=0.002 n=19+20) StdCmd 26.0s ± 2% 26.0s ± 2% ~ (p=0.799 n=20+20) name old user-time/op new user-time/op delta Template 238ms ± 5% 240ms ± 5% ~ (p=0.142 n=20+20) Unicode 105ms ±11% 106ms ±10% ~ (p=0.512 n=20+20) GoTypes 876ms ± 2% 873ms ± 4% ~ (p=0.647 n=20+19) Compiler 4.17s ± 2% 4.19s ± 1% ~ (p=0.093 n=20+18) SSA 13.9s ± 1% 14.1s ± 1% +1.45% (p=0.000 n=18+18) Flate 145ms ±13% 146ms ± 5% ~ (p=0.851 n=20+18) GoParser 185ms ± 5% 188ms ± 7% ~ (p=0.174 n=20+20) Reflect 534ms ± 3% 538ms ± 2% ~ (p=0.105 n=20+18) Tar 215ms ± 4% 211ms ± 9% ~ (p=0.079 n=19+20) XML 295ms ± 6% 295ms ± 5% ~ (p=0.968 n=20+20) LinkCompiler 832ms ± 4% 837ms ± 7% ~ (p=0.707 n=17+20) ExternalLinkCompiler 1.58s ± 8% 1.60s ± 4% ~ (p=0.296 n=20+19) LinkWithoutDebugCompiler 478ms ±12% 489ms ±10% ~ (p=0.429 n=20+20) name old object-bytes new object-bytes delta Template 559kB ± 0% 559kB ± 0% ~ (all equal) Unicode 216kB ± 0% 216kB ± 0% ~ (all equal) GoTypes 2.03MB ± 0% 2.03MB ± 0% ~ (all equal) Compiler 8.07MB ± 0% 8.07MB ± 0% -0.06% (p=0.000 n=20+20) SSA 27.1MB ± 0% 27.3MB ± 0% +0.89% (p=0.000 n=20+20) Flate 343kB ± 0% 343kB ± 0% ~ (all equal) GoParser 441kB ± 0% 441kB ± 0% ~ (all equal) Reflect 1.36MB ± 0% 1.36MB ± 0% ~ (all equal) Tar 487kB ± 0% 487kB ± 0% ~ (all equal) XML 632kB ± 0% 632kB ± 0% ~ (all equal) name old export-bytes new export-bytes delta Template 18.5kB ± 0% 18.5kB ± 0% ~ (all equal) Unicode 7.92kB ± 0% 7.92kB ± 0% ~ (all equal) GoTypes 35.0kB ± 0% 35.0kB ± 0% ~ (all equal) Compiler 109kB ± 0% 110kB ± 0% +0.72% (p=0.000 n=20+20) SSA 137kB ± 0% 138kB ± 0% +0.58% (p=0.000 n=20+20) Flate 4.89kB ± 0% 4.89kB ± 0% ~ (all equal) GoParser 8.49kB ± 0% 8.49kB ± 0% ~ (all equal) Reflect 11.4kB ± 0% 11.4kB ± 0% ~ (all equal) Tar 10.5kB ± 0% 10.5kB ± 0% ~ (all equal) XML 16.7kB ± 0% 16.7kB ± 0% ~ (all equal) name old text-bytes new text-bytes delta HelloSize 761kB ± 0% 761kB ± 0% ~ (all equal) CmdGoSize 10.8MB ± 0% 10.8MB ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 10.7kB ± 0% 10.7kB ± 0% ~ (all equal) CmdGoSize 312kB ± 0% 312kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 122kB ± 0% 122kB ± 0% ~ (all equal) CmdGoSize 146kB ± 0% 146kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.13MB ± 0% 1.13MB ± 0% ~ (all equal) CmdGoSize 15.1MB ± 0% 15.1MB ± 0% ~ (all equal) Change-Id: I3cc2f9829a109543d9a68be4a21775d2d3e9801f Reviewed-on: https://go-review.googlesource.com/c/go/+/196557 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Daniel Martí <mvdan@mvdan.cc> Reviewed-by: Keith Randall <khr@golang.org>
2019-08-28cmd/compile: optimize ARM's math.bits.RotateLeft32Ben Shi
This CL optimizes math.bits.RotateLeft32 to inline "MOVW Rx@>Ry, Rd" on ARM. The benchmark results of math/bits show some improvements. name old time/op new time/op delta RotateLeft-4 9.42ns ± 0% 6.91ns ± 0% -26.66% (p=0.000 n=40+33) RotateLeft8-4 8.79ns ± 0% 8.79ns ± 0% -0.04% (p=0.000 n=40+31) RotateLeft16-4 8.79ns ± 0% 8.79ns ± 0% -0.04% (p=0.000 n=40+32) RotateLeft32-4 8.16ns ± 0% 7.54ns ± 0% -7.68% (p=0.000 n=40+40) RotateLeft64-4 15.7ns ± 0% 15.7ns ± 0% ~ (all equal) updates #31265 Change-Id: I77bc1c2c702d5323fc7cad5264a8e2d5666bf712 Reviewed-on: https://go-review.googlesource.com/c/go/+/188697 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-08-28cmd/compile: optimize ARM's math.AbsBen Shi
This CL optimizes math.Abs to an inline ABSD instruction on ARM. The benchmark results of src/math/ show big improvements. name old time/op new time/op delta Acos-4 181ns ± 0% 182ns ± 0% +0.30% (p=0.000 n=40+40) Acosh-4 202ns ± 0% 202ns ± 0% ~ (all equal) Asin-4 163ns ± 0% 163ns ± 0% ~ (all equal) Asinh-4 242ns ± 0% 242ns ± 0% ~ (all equal) Atan-4 120ns ± 0% 121ns ± 0% +0.83% (p=0.000 n=40+40) Atanh-4 202ns ± 0% 202ns ± 0% ~ (all equal) Atan2-4 173ns ± 0% 173ns ± 0% ~ (all equal) Cbrt-4 1.06µs ± 0% 1.06µs ± 0% +0.09% (p=0.000 n=39+37) Ceil-4 72.9ns ± 0% 72.8ns ± 0% ~ (p=0.237 n=40+40) Copysign-4 13.2ns ± 0% 13.2ns ± 0% ~ (all equal) Cos-4 193ns ± 0% 183ns ± 0% -5.18% (p=0.000 n=40+40) Cosh-4 254ns ± 0% 239ns ± 0% -5.91% (p=0.000 n=40+40) Erf-4 112ns ± 0% 112ns ± 0% ~ (all equal) Erfc-4 117ns ± 0% 117ns ± 0% ~ (all equal) Erfinv-4 127ns ± 0% 127ns ± 1% ~ (p=0.492 n=40+40) Erfcinv-4 128ns ± 0% 128ns ± 0% ~ (all equal) Exp-4 212ns ± 0% 206ns ± 0% -3.05% (p=0.000 n=40+40) ExpGo-4 216ns ± 0% 209ns ± 0% -3.24% (p=0.000 n=40+40) Expm1-4 142ns ± 0% 142ns ± 0% ~ (all equal) Exp2-4 191ns ± 0% 184ns ± 0% -3.45% (p=0.000 n=40+40) Exp2Go-4 194ns ± 0% 187ns ± 0% -3.61% (p=0.000 n=40+40) Abs-4 14.4ns ± 0% 6.3ns ± 0% -56.39% (p=0.000 n=38+39) Dim-4 12.6ns ± 0% 12.6ns ± 0% ~ (all equal) Floor-4 49.6ns ± 0% 49.6ns ± 0% ~ (all equal) Max-4 27.6ns ± 0% 27.6ns ± 0% ~ (all equal) Min-4 27.0ns ± 0% 27.0ns ± 0% ~ (all equal) Mod-4 349ns ± 0% 305ns ± 1% -12.55% (p=0.000 n=33+40) Frexp-4 54.0ns ± 0% 47.1ns ± 0% -12.78% (p=0.000 n=38+38) Gamma-4 242ns ± 0% 234ns ± 0% -3.16% (p=0.000 n=36+40) Hypot-4 84.8ns ± 0% 67.8ns ± 0% -20.05% (p=0.000 n=31+35) HypotGo-4 88.5ns ± 0% 71.6ns ± 0% -19.12% (p=0.000 n=40+38) Ilogb-4 45.8ns ± 0% 38.9ns ± 0% -15.12% (p=0.000 n=40+32) J0-4 821ns ± 0% 802ns ± 0% -2.33% (p=0.000 n=33+40) J1-4 816ns ± 0% 807ns ± 0% -1.05% (p=0.000 n=40+29) Jn-4 1.67µs ± 0% 1.65µs ± 0% -1.45% (p=0.000 n=40+39) Ldexp-4 61.5ns ± 0% 54.6ns ± 0% -11.27% (p=0.000 n=40+32) Lgamma-4 188ns ± 0% 188ns ± 0% ~ (all equal) Log-4 154ns ± 0% 147ns ± 0% -4.78% (p=0.000 n=40+40) Logb-4 50.9ns ± 0% 42.7ns ± 0% -16.11% (p=0.000 n=34+39) Log1p-4 160ns ± 0% 159ns ± 0% ~ (p=0.828 n=40+40) Log10-4 173ns ± 0% 166ns ± 0% -4.05% (p=0.000 n=40+40) Log2-4 65.3ns ± 0% 58.4ns ± 0% -10.57% (p=0.000 n=37+37) Modf-4 36.4ns ± 0% 36.4ns ± 0% ~ (all equal) Nextafter32-4 36.4ns ± 0% 36.4ns ± 0% ~ (all equal) Nextafter64-4 32.7ns ± 0% 32.6ns ± 0% ~ (p=0.375 n=40+40) PowInt-4 300ns ± 0% 277ns ± 0% -7.78% (p=0.000 n=40+40) PowFrac-4 676ns ± 0% 635ns ± 0% -6.00% (p=0.000 n=40+35) Pow10Pos-4 17.6ns ± 0% 17.6ns ± 0% ~ (all equal) Pow10Neg-4 22.0ns ± 0% 22.0ns ± 0% ~ (all equal) Round-4 30.1ns ± 0% 30.1ns ± 0% ~ (all equal) RoundToEven-4 38.9ns ± 0% 38.9ns ± 0% ~ (all equal) Remainder-4 291ns ± 0% 263ns ± 0% -9.62% (p=0.000 n=40+40) Signbit-4 11.3ns ± 0% 11.3ns ± 0% ~ (all equal) Sin-4 185ns ± 0% 185ns ± 0% ~ (all equal) Sincos-4 230ns ± 0% 230ns ± 0% ~ (all equal) Sinh-4 253ns ± 0% 246ns ± 0% -2.77% (p=0.000 n=39+39) SqrtIndirect-4 41.4ns ± 0% 41.4ns ± 0% ~ (all equal) SqrtLatency-4 13.8ns ± 0% 13.8ns ± 0% ~ (all equal) SqrtIndirectLatency-4 37.0ns ± 0% 37.0ns ± 0% ~ (p=0.632 n=40+40) SqrtGoLatency-4 911ns ± 0% 911ns ± 0% +0.08% (p=0.000 n=40+40) SqrtPrime-4 13.2µs ± 0% 13.2µs ± 0% +0.01% (p=0.038 n=38+40) Tan-4 205ns ± 0% 205ns ± 0% ~ (all equal) Tanh-4 264ns ± 0% 247ns ± 0% -6.44% (p=0.000 n=39+32) Trunc-4 45.2ns ± 0% 45.2ns ± 0% ~ (all equal) Y0-4 796ns ± 0% 792ns ± 0% -0.55% (p=0.000 n=35+40) Y1-4 804ns ± 0% 797ns ± 0% -0.82% (p=0.000 n=24+40) Yn-4 1.64µs ± 0% 1.62µs ± 0% -1.27% (p=0.000 n=40+39) Float64bits-4 8.16ns ± 0% 8.16ns ± 0% +0.04% (p=0.000 n=35+40) Float64frombits-4 10.7ns ± 0% 10.7ns ± 0% ~ (all equal) Float32bits-4 7.53ns ± 0% 7.53ns ± 0% ~ (p=0.760 n=40+40) Float32frombits-4 6.91ns ± 0% 6.91ns ± 0% -0.04% (p=0.002 n=32+38) [Geo mean] 111ns 106ns -3.98% Change-Id: I54f4fd7f5160db020b430b556bde59cc0fdb996d Reviewed-on: https://go-review.googlesource.com/c/go/+/188678 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-04-13cmd/compile/internal/arm: merge cases in ssaGenValueTobias Klauser
Merge case statement for OpARMSLL, OpARMSRL and OpARMSRA into an existing one using the same logic. Change-Id: Ic4224668228902e5188fb0559b5f1949cfea1381 Reviewed-on: https://go-review.googlesource.com/c/go/+/171724 Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-04-07cmd/compile: remove AUNDEF opcodeKeith Randall
This opcode was only used to mark unreachable code for plive to use. plive now uses the SSA representation, so it knows locations are unreachable because they are ends of Exit blocks. It doesn't need these opcodes any more. These opcodes actually used space in the binary, 2 bytes per undef on x86 and more for other archs. Makes the amd64 go binary 0.2% smaller. Change-Id: I64c84c35db7c7949617a3a5830f09c8e5fcd2620 Reviewed-on: https://go-review.googlesource.com/c/go/+/171058 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2019-03-18cmd/compile,runtime: provide index information on bounds check failureKeith Randall
A few examples (for accessing a slice of length 3): s[-1] runtime error: index out of range [-1] s[3] runtime error: index out of range [3] with length 3 s[-1:0] runtime error: slice bounds out of range [-1:] s[3:0] runtime error: slice bounds out of range [3:0] s[3:-1] runtime error: slice bounds out of range [:-1] s[3:4] runtime error: slice bounds out of range [:4] with capacity 3 s[0:3:4] runtime error: slice bounds out of range [::4] with capacity 3 Note that in cases where there are multiple things wrong with the indexes (e.g. s[3:-1]), we report one of those errors kind of arbitrarily, currently the rightmost one. An exhaustive set of examples is in issue30116[u].out in the CL. The message text has the same prefix as the old message text. That leads to slightly awkward phrasing but hopefully minimizes the chance that code depending on the error text will break. Increases the size of the go binary by 0.5% (amd64). The panic functions take arguments in registers in order to keep the size of the compiled code as small as possible. Fixes #30116 Change-Id: Idb99a827b7888822ca34c240eca87b7e44a04fdd Reviewed-on: https://go-review.googlesource.com/c/go/+/161477 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2019-03-07cmd/compile: add an optimization rule for math/bits.ReverseBytes16 on armerifan01
This CL adds two rules to turn patterns like ((x<<8) | (x>>8)) (the type of x is uint16, "|" can also be "+" or "^") to a REV16 instruction on arm v6+. This optimization rule can be used for math/bits.ReverseBytes16. Benchmarks on arm v6: name old time/op new time/op delta ReverseBytes-32 2.86ns ± 0% 2.86ns ± 0% ~ (all equal) ReverseBytes16-32 2.86ns ± 0% 2.86ns ± 0% ~ (all equal) ReverseBytes32-32 1.29ns ± 0% 1.29ns ± 0% ~ (all equal) ReverseBytes64-32 1.43ns ± 0% 1.43ns ± 0% ~ (all equal) Change-Id: I819e633c9a9d308f8e476fb0c82d73fb73dd019f Reviewed-on: https://go-review.googlesource.com/c/go/+/159019 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-02-25cmd/compile: call ginsnop, not ginsnop2 on ppc64le for mid-stack inlining ↵Lynn Boger
tracebacks A recent change to fix stacktraces for inlined functions introduced a regression on ppc64le when compiling position independent code. That happened because ginsnop2 was called for the purpose of inserting a NOP to identify the location of the inlined function, when ginsnop should have been used. ginsnop2 is intended to be used before deferreturn to ensure r2 is properly restored when compiling position independent code. In some cases the location where r2 is loaded from might not be initialized. If that happens and r2 is used to generate an address, the result is likely a SEGV. This fixes that problem. Fixes #30283 Change-Id: If70ef27fc65ef31969712422306ac3a57adbd5b6 Reviewed-on: https://go-review.googlesource.com/c/163337 Reviewed-by: Cherry Zhang <cherryyz@google.com> Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com> Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Andrew Bonventre <andybons@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-12-28cmd/compile,runtime: redo mid-stack inlining tracebacksKeith Randall
Work involved in getting a stack trace is divided between runtime.Callers and runtime.CallersFrames. Before this CL, runtime.Callers returns a pc per runtime frame. runtime.CallersFrames is responsible for expanding a runtime frame into potentially multiple user frames. After this CL, runtime.Callers returns a pc per user frame. runtime.CallersFrames just maps those to user frame info. Entries in the result of runtime.Callers are now pcs of the calls (or of the inline marks), not of the instruction just after the call. Fixes #29007 Fixes #28640 Update #26320 Change-Id: I1c9567596ff73dc73271311005097a9188c3406f Reviewed-on: https://go-review.googlesource.com/c/152537 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2018-09-11cmd/compile: optimize arm's bit operationBen Shi
BFC (Bit Field Clear) was introduced in ARMv7, which can simplify ANDconst and BICconst. And this CL implements that optimization. 1. The total size of pkg/android_arm decreases about 3KB, excluding cmd/compile/. 2. There is no regression in the go1 benchmark result, and some cases (FmtFprintfEmpty-4 and RegexpMatchMedium_32-4) even get slight improvement. name old time/op new time/op delta BinaryTree17-4 25.3s ± 1% 25.2s ± 1% ~ (p=0.072 n=30+29) Fannkuch11-4 13.3s ± 0% 13.3s ± 0% +0.13% (p=0.000 n=30+26) FmtFprintfEmpty-4 407ns ± 0% 394ns ± 0% -3.19% (p=0.000 n=26+28) FmtFprintfString-4 664ns ± 0% 662ns ± 0% -0.22% (p=0.000 n=30+30) FmtFprintfInt-4 712ns ± 0% 706ns ± 0% -0.79% (p=0.000 n=30+30) FmtFprintfIntInt-4 1.06µs ± 0% 1.05µs ± 0% -0.38% (p=0.000 n=30+30) FmtFprintfPrefixedInt-4 1.16µs ± 0% 1.16µs ± 0% -0.13% (p=0.000 n=30+29) FmtFprintfFloat-4 2.24µs ± 0% 2.23µs ± 0% -0.51% (p=0.000 n=29+21) FmtManyArgs-4 4.09µs ± 0% 4.06µs ± 0% -0.83% (p=0.000 n=28+30) GobDecode-4 55.0ms ± 5% 55.4ms ± 5% ~ (p=0.307 n=30+30) GobEncode-4 51.2ms ± 1% 51.9ms ± 1% +1.23% (p=0.000 n=29+30) Gzip-4 2.64s ± 0% 2.60s ± 0% -1.35% (p=0.000 n=30+29) Gunzip-4 309ms ± 0% 308ms ± 0% -0.27% (p=0.000 n=30+30) HTTPClientServer-4 1.03ms ± 5% 1.02ms ± 4% ~ (p=0.117 n=30+29) JSONEncode-4 101ms ± 2% 101ms ± 2% ~ (p=0.338 n=29+29) JSONDecode-4 383ms ± 2% 382ms ± 2% ~ (p=0.751 n=26+30) Mandelbrot200-4 18.4ms ± 0% 18.4ms ± 0% -0.10% (p=0.000 n=29+29) GoParse-4 22.6ms ± 0% 22.5ms ± 0% -0.39% (p=0.000 n=30+30) RegexpMatchEasy0_32-4 761ns ± 0% 750ns ± 0% -1.47% (p=0.000 n=26+29) RegexpMatchEasy0_1K-4 4.33µs ± 0% 4.34µs ± 0% +0.27% (p=0.000 n=25+28) RegexpMatchEasy1_32-4 809ns ± 0% 795ns ± 0% -1.74% (p=0.000 n=27+25) RegexpMatchEasy1_1K-4 5.54µs ± 0% 5.53µs ± 0% -0.18% (p=0.000 n=29+29) RegexpMatchMedium_32-4 1.11µs ± 0% 1.08µs ± 0% -2.78% (p=0.000 n=27+29) RegexpMatchMedium_1K-4 255µs ± 0% 255µs ± 0% -0.02% (p=0.029 n=30+30) RegexpMatchHard_32-4 14.7µs ± 0% 14.7µs ± 0% -0.28% (p=0.000 n=30+29) RegexpMatchHard_1K-4 439µs ± 0% 439µs ± 0% ~ (p=0.907 n=23+27) Revcomp-4 41.9ms ± 1% 41.9ms ± 1% ~ (p=0.230 n=28+30) Template-4 522ms ± 1% 528ms ± 1% +1.25% (p=0.000 n=30+30) TimeParse-4 3.34µs ± 0% 3.35µs ± 0% +0.23% (p=0.000 n=30+27) TimeFormat-4 6.06µs ± 0% 6.13µs ± 0% +1.08% (p=0.000 n=29+29) [Geo mean] 384µs 382µs -0.37% name old speed new speed delta GobDecode-4 14.0MB/s ± 5% 13.9MB/s ± 5% ~ (p=0.308 n=30+30) GobEncode-4 15.0MB/s ± 1% 14.8MB/s ± 1% -1.22% (p=0.000 n=29+30) Gzip-4 7.36MB/s ± 0% 7.46MB/s ± 0% +1.35% (p=0.000 n=30+30) Gunzip-4 62.8MB/s ± 0% 63.0MB/s ± 0% +0.27% (p=0.000 n=30+30) JSONEncode-4 19.2MB/s ± 2% 19.2MB/s ± 2% ~ (p=0.312 n=29+29) JSONDecode-4 5.05MB/s ± 3% 5.08MB/s ± 2% ~ (p=0.356 n=29+30) GoParse-4 2.56MB/s ± 0% 2.57MB/s ± 0% +0.39% (p=0.000 n=23+27) RegexpMatchEasy0_32-4 42.0MB/s ± 0% 42.6MB/s ± 0% +1.50% (p=0.000 n=26+28) RegexpMatchEasy0_1K-4 236MB/s ± 0% 236MB/s ± 0% -0.27% (p=0.000 n=25+28) RegexpMatchEasy1_32-4 39.6MB/s ± 0% 40.2MB/s ± 0% +1.73% (p=0.000 n=27+27) RegexpMatchEasy1_1K-4 185MB/s ± 0% 185MB/s ± 0% +0.18% (p=0.000 n=29+29) RegexpMatchMedium_32-4 900kB/s ± 0% 920kB/s ± 0% +2.22% (p=0.000 n=29+29) RegexpMatchMedium_1K-4 4.02MB/s ± 0% 4.02MB/s ± 0% +0.07% (p=0.004 n=30+27) RegexpMatchHard_32-4 2.17MB/s ± 0% 2.18MB/s ± 0% +0.46% (p=0.000 n=30+26) RegexpMatchHard_1K-4 2.33MB/s ± 0% 2.33MB/s ± 0% ~ (all equal) Revcomp-4 60.6MB/s ± 1% 60.7MB/s ± 1% ~ (p=0.207 n=28+30) Template-4 3.72MB/s ± 1% 3.67MB/s ± 1% -1.23% (p=0.000 n=30+30) [Geo mean] 12.9MB/s 12.9MB/s +0.29% Change-Id: I07f497f8bb476c950dc555491d00c9066fb64a4e Reviewed-on: https://go-review.googlesource.com/134232 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-05-02cmd/compile: intrinsify runtime.getcallerpc on all link register architecturesWei Xiao
Add a compiler intrinsic for getcallerpc on following architectures: arm mips mipsle mips64 mips64le ppc64 ppc64le s390x Change-Id: I758f3d4742fc214b206bcd07d90408622c17dbef Reviewed-on: https://go-review.googlesource.com/110835 Run-TryBot: Wei Xiao <Wei.Xiao@arm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-04-20cmd/compile: don't lower OpConvertAustin Clements
Currently, each architecture lowers OpConvert to an arch-specific OpXXXconvert. This is silly because OpConvert means the same thing on all architectures and is logically a no-op that exists only to keep track of conversions to and from unsafe.Pointer. Furthermore, lowering it makes it harder to recognize in other analyses, particularly liveness analysis. This CL eliminates the lowering of OpConvert, leaving it as the generic op until code generation time. The main complexity here is that we still need to register-allocate OpConvert operations. Currently, each arch's lowered OpConvert specifies all GP registers in its register mask. Ideally, OpConvert wouldn't affect value homing at all, and we could just copy the home of OpConvert's source, but this can potentially home an OpConvert in a LocalSlot, which neither regalloc nor stackalloc expect. Rather than try to disentangle this assumption from regalloc and stackalloc, we continue to register-allocate OpConvert, but teach regalloc that OpConvert can be allocated to any allocatable GP register. For #24543. Change-Id: I795a6aee5fd94d4444a7bafac3838a400c9f7bb6 Reviewed-on: https://go-review.googlesource.com/108496 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2018-04-13all: use new softfloat on GOARM=5Cherry Zhang
Use the new softfloat support in the compiler, originally added for softfloat on MIPS. This support is portable, so we can just use it for softfloat on ARM. In the old softfloat support on ARM, the compiler generates floating point instructions, then the assembler inserts calls to _sfloat before FP instructions. _sfloat decodes the following FP instructions and simulates them. In the new scheme, the compiler generates runtime calls to do FP operations at a higher level. It doesn't generate FP instructions, and therefore the assembler won't insert _sfloat calls, i.e. the old mechanism is automatically suppressed. The old method may be still be triggered with assembly code using FP instructions. In the standard library, the only occurance is math/sqrt_arm.s, which is rewritten to call to the Go implementation instead. Some significant speedups for code using floating points: name old time/op new time/op delta BinaryTree17-4 37.1s ± 2% 37.3s ± 1% ~ (p=0.105 n=10+10) Fannkuch11-4 13.0s ± 0% 13.1s ± 0% +0.46% (p=0.000 n=10+10) FmtFprintfEmpty-4 700ns ± 4% 734ns ± 6% +4.84% (p=0.009 n=10+10) FmtFprintfString-4 1.22µs ± 3% 1.22µs ± 4% ~ (p=0.897 n=10+10) FmtFprintfInt-4 1.27µs ± 2% 1.30µs ± 1% +1.91% (p=0.001 n=10+9) FmtFprintfIntInt-4 1.83µs ± 2% 1.81µs ± 3% ~ (p=0.149 n=10+10) FmtFprintfPrefixedInt-4 1.80µs ± 3% 1.81µs ± 2% ~ (p=0.421 n=10+8) FmtFprintfFloat-4 6.89µs ± 3% 3.59µs ± 2% -47.93% (p=0.000 n=10+10) FmtManyArgs-4 6.39µs ± 1% 6.09µs ± 1% -4.61% (p=0.000 n=10+9) GobDecode-4 109ms ± 2% 81ms ± 2% -25.99% (p=0.000 n=9+10) GobEncode-4 109ms ± 2% 76ms ± 2% -29.88% (p=0.000 n=10+9) Gzip-4 3.61s ± 1% 3.59s ± 1% ~ (p=0.247 n=10+10) Gunzip-4 449ms ± 4% 450ms ± 1% ~ (p=0.230 n=10+7) HTTPClientServer-4 1.55ms ± 3% 1.53ms ± 2% ~ (p=0.400 n=9+10) JSONEncode-4 356ms ± 1% 183ms ± 1% -48.73% (p=0.000 n=10+10) JSONDecode-4 1.12s ± 2% 0.87s ± 1% -21.88% (p=0.000 n=10+10) Mandelbrot200-4 5.49s ± 1% 2.55s ± 1% -53.45% (p=0.000 n=9+10) GoParse-4 49.6ms ± 2% 47.5ms ± 1% -4.08% (p=0.000 n=10+9) RegexpMatchEasy0_32-4 1.13µs ± 4% 1.20µs ± 4% +6.42% (p=0.000 n=10+10) RegexpMatchEasy0_1K-4 4.41µs ± 2% 4.44µs ± 2% ~ (p=0.128 n=10+10) RegexpMatchEasy1_32-4 1.15µs ± 5% 1.20µs ± 5% +4.85% (p=0.002 n=10+10) RegexpMatchEasy1_1K-4 6.21µs ± 2% 6.37µs ± 4% +2.62% (p=0.001 n=9+10) RegexpMatchMedium_32-4 1.58µs ± 5% 1.65µs ± 3% +4.85% (p=0.000 n=10+10) RegexpMatchMedium_1K-4 341µs ± 3% 351µs ± 7% ~ (p=0.573 n=8+10) RegexpMatchHard_32-4 21.4µs ± 3% 21.5µs ± 5% ~ (p=0.931 n=9+9) RegexpMatchHard_1K-4 626µs ± 2% 626µs ± 1% ~ (p=0.645 n=8+8) Revcomp-4 46.4ms ± 2% 47.4ms ± 2% +2.07% (p=0.000 n=10+10) Template-4 1.31s ± 3% 1.23s ± 4% -6.13% (p=0.000 n=10+10) TimeParse-4 4.49µs ± 1% 4.41µs ± 2% -1.81% (p=0.000 n=10+9) TimeFormat-4 9.31µs ± 1% 9.32µs ± 2% ~ (p=0.561 n=9+9) Change-Id: Iaeeff6c9a09c1b2c064d06e09dd88101dc02bfa4 Reviewed-on: https://go-review.googlesource.com/106735 Reviewed-by: Austin Clements <austin@google.com> Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-04-11cmd/compile: use Block.Likely to put likely branch firstDavid Chase
When a neither of a conditional block's successors follows, the block must end with a conditional branch followed by a an unconditional branch. If the (conditional) branch is "unlikely", invert it and swap successors to make it likely instead. This doesn't matter to most benchmarks on amd64, but in one instance on amd64 it caused a 30% improvement, and it is otherwise harmless. The problematic loop is for i := 0; i < w; i++ { if pw[i] != 0 { return true } } compiled under GOEXPERIMENT=preemptibleloops This the very worst-case benchmark for that experiment. Also in this CL is a commoning up of heavily-repeated boilerplate, which made it much easier to see that the changes were applied correctly. In the future this should allow un-exporting of SSAGenState.Branches once the boilerplate-replacement is done everywhere. Change-Id: I0e5ded6eeb3ab1e3e0138e12d54c7e056bd99335 Reviewed-on: https://go-review.googlesource.com/104977 Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-02-13runtime: buffered write barrier for armAustin Clements
Updates #22460. Change-Id: I5581df7ad553237db7df3701b117ad99e0593b78 Reviewed-on: https://go-review.googlesource.com/92698 Run-TryBot: Austin Clements <austin@google.com> Reviewed-by: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-10-26cmd/compile: optimize MOVBS/MOVBU/MOVHS/MOVHU on ARMv6 and ARMv7Ben Shi
MOVBS/MOVBU/MOVHS/MOVHU can be optimized with a single instruction on ARMv6 and ARMv7, instead of a pair of left/right shifts. The benchmark tests show big improvement in special cases and a little improvement in total. 1. A special case gets about 29% improvement. name old time/op new time/op delta TypePro-4 3.81ms ± 1% 2.71ms ± 1% -28.97% (p=0.000 n=26+25) The source code of this case can be found at https://github.com/benshi001/ugo1/blob/master/typepromotion_test.go 2. There is a little improvement in the go1 benchmark, excluding the noise. name old time/op new time/op delta BinaryTree17-4 42.1s ± 3% 42.1s ± 2% ~ (p=0.883 n=28+30) Fannkuch11-4 24.3s ± 4% 24.7s ± 7% +1.64% (p=0.026 n=30+30) FmtFprintfEmpty-4 833ns ± 2% 835ns ± 2% ~ (p=0.371 n=26+28) FmtFprintfString-4 1.36µs ± 3% 1.35µs ± 1% ~ (p=0.202 n=26+23) FmtFprintfInt-4 1.42µs ± 3% 1.43µs ± 1% +0.66% (p=0.000 n=26+27) FmtFprintfIntInt-4 2.10µs ± 1% 2.10µs ± 2% ~ (p=0.104 n=25+26) FmtFprintfPrefixedInt-4 2.37µs ± 2% 2.33µs ± 1% -1.75% (p=0.000 n=25+28) FmtFprintfFloat-4 4.50µs ± 0% 4.37µs ± 1% -2.81% (p=0.000 n=23+25) FmtManyArgs-4 8.08µs ± 0% 8.13µs ± 3% ~ (p=0.160 n=23+26) GobDecode-4 102ms ± 4% 103ms ± 4% +1.08% (p=0.001 n=28+26) GobEncode-4 96.0ms ± 2% 95.2ms ± 3% -0.81% (p=0.000 n=24+25) Gzip-4 4.17s ± 3% 4.11s ± 2% -1.45% (p=0.000 n=25+25) Gunzip-4 597ms ± 2% 594ms ± 2% -0.57% (p=0.000 n=24+26) HTTPClientServer-4 708µs ± 4% 708µs ± 4% ~ (p=0.852 n=28+28) JSONEncode-4 241ms ± 1% 245ms ± 3% +1.62% (p=0.000 n=27+28) JSONDecode-4 906ms ± 3% 889ms ± 3% -1.85% (p=0.000 n=23+24) Mandelbrot200-4 41.8ms ± 1% 41.8ms ± 1% ~ (p=0.929 n=25+24) GoParse-4 47.1ms ± 2% 45.3ms ± 4% -3.80% (p=0.000 n=28+24) RegexpMatchEasy0_32-4 1.27µs ± 2% 1.28µs ± 1% +0.77% (p=0.000 n=26+28) RegexpMatchEasy0_1K-4 8.08µs ± 9% 7.83µs ±10% -3.10% (p=0.012 n=26+26) RegexpMatchEasy1_32-4 1.29µs ± 5% 1.29µs ± 2% ~ (p=0.301 n=26+29) RegexpMatchEasy1_1K-4 10.5µs ± 4% 10.3µs ± 5% -1.95% (p=0.003 n=26+26) RegexpMatchMedium_32-4 1.94µs ± 1% 1.95µs ± 1% ~ (p=0.251 n=24+27) RegexpMatchMedium_1K-4 502µs ± 2% 502µs ± 2% ~ (p=0.336 n=25+28) RegexpMatchHard_32-4 26.7µs ± 1% 26.6µs ± 3% ~ (p=0.454 n=27+26) RegexpMatchHard_1K-4 801µs ± 3% 799µs ± 2% ~ (p=0.097 n=24+26) Revcomp-4 73.5ms ± 5% 73.2ms ± 3% ~ (p=0.240 n=26+26) Template-4 1.07s ± 2% 1.05s ± 1% -2.39% (p=0.000 n=26+24) TimeParse-4 6.87µs ± 1% 6.85µs ± 1% ~ (p=0.094 n=28+23) TimeFormat-4 13.4µs ± 1% 13.4µs ± 1% ~ (p=0.664 n=25+29) [Geo mean] 717µs 713µs -0.54% name old speed new speed delta GobDecode-4 7.52MB/s ± 4% 7.44MB/s ± 4% -1.10% (p=0.001 n=28+26) GobEncode-4 7.99MB/s ± 2% 8.06MB/s ± 3% +0.81% (p=0.000 n=24+25) Gzip-4 4.66MB/s ± 3% 4.72MB/s ± 2% +1.43% (p=0.000 n=25+25) Gunzip-4 32.5MB/s ± 2% 32.7MB/s ± 2% +0.56% (p=0.001 n=24+26) JSONEncode-4 8.04MB/s ± 1% 7.92MB/s ± 3% -1.59% (p=0.000 n=27+28) JSONDecode-4 2.14MB/s ± 3% 2.18MB/s ± 3% +1.90% (p=0.000 n=23+24) GoParse-4 1.23MB/s ± 3% 1.28MB/s ± 4% +4.23% (p=0.000 n=30+24) RegexpMatchEasy0_32-4 25.2MB/s ± 2% 25.0MB/s ± 1% -0.76% (p=0.000 n=26+28) RegexpMatchEasy0_1K-4 127MB/s ± 8% 131MB/s ± 9% +3.29% (p=0.012 n=26+26) RegexpMatchEasy1_32-4 24.8MB/s ± 5% 24.8MB/s ± 2% ~ (p=0.339 n=26+29) RegexpMatchEasy1_1K-4 97.9MB/s ± 4% 99.8MB/s ± 5% +1.98% (p=0.004 n=26+26) RegexpMatchMedium_32-4 514kB/s ± 3% 515kB/s ± 3% ~ (p=0.391 n=28+28) RegexpMatchMedium_1K-4 2.04MB/s ± 2% 2.04MB/s ± 2% ~ (p=0.517 n=25+28) RegexpMatchHard_32-4 1.20MB/s ± 3% 1.20MB/s ± 3% ~ (p=0.203 n=28+28) RegexpMatchHard_1K-4 1.28MB/s ± 3% 1.28MB/s ± 2% ~ (p=0.499 n=24+26) Revcomp-4 34.6MB/s ± 4% 34.7MB/s ± 3% ~ (p=0.245 n=26+26) Template-4 1.81MB/s ± 2% 1.85MB/s ± 3% +2.30% (p=0.000 n=26+25) [Geo mean] 6.82MB/s 6.88MB/s +0.84% fixes #20653 Change-Id: Ief0d6e726e517e51ae511325b21ee72598e759ff Reviewed-on: https://go-review.googlesource.com/71992 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-10-11cmd/compile: optimize ARM code with CMN/TST/TEQBen Shi
CMN/TST/TEQ were supported since ARMv4, which can be used to simplify comparisons. This patch implements the optimization and here are the benchmark results. 1. A special test case got 18.21% improvement. name old time/op new time/op delta TSTTEQ-4 806µs ± 1% 659µs ± 0% -18.21% (p=0.000 n=20+18) (https://github.com/benshi001/ugo1/blob/master/tstteq_test.go) 2. There is no regression in the compilecmp benchmark. name old time/op new time/op delta Template 2.31s ± 1% 2.30s ± 1% ~ (p=0.661 n=10+9) Unicode 1.32s ± 3% 1.32s ± 5% ~ (p=0.280 n=10+10) GoTypes 7.69s ± 1% 7.65s ± 0% -0.52% (p=0.027 n=10+8) Compiler 36.5s ± 1% 36.4s ± 1% ~ (p=0.546 n=9+9) SSA 85.1s ± 2% 84.9s ± 1% ~ (p=0.529 n=10+10) Flate 1.43s ± 2% 1.43s ± 2% ~ (p=0.661 n=10+9) GoParser 1.81s ± 2% 1.81s ± 1% ~ (p=0.796 n=10+10) Reflect 5.10s ± 2% 5.09s ± 1% ~ (p=0.853 n=10+10) Tar 2.47s ± 1% 2.48s ± 1% ~ (p=0.123 n=10+10) XML 2.59s ± 1% 2.58s ± 1% ~ (p=0.853 n=10+10) [Geo mean] 4.78s 4.77s -0.17% name old user-time/op new user-time/op delta Template 2.72s ± 3% 2.73s ± 2% ~ (p=0.928 n=10+10) Unicode 1.58s ± 4% 1.60s ± 1% ~ (p=0.087 n=10+9) GoTypes 9.41s ± 2% 9.36s ± 1% ~ (p=0.060 n=10+10) Compiler 44.4s ± 2% 44.2s ± 2% ~ (p=0.289 n=10+10) SSA 110s ± 2% 110s ± 1% ~ (p=0.739 n=10+10) Flate 1.67s ± 2% 1.63s ± 3% ~ (p=0.063 n=10+10) GoParser 2.12s ± 1% 2.12s ± 2% ~ (p=0.840 n=10+10) Reflect 5.94s ± 1% 5.98s ± 1% ~ (p=0.063 n=9+10) Tar 3.01s ± 2% 3.02s ± 2% ~ (p=0.584 n=10+10) XML 3.04s ± 3% 3.02s ± 2% ~ (p=0.696 n=10+10) [Geo mean] 5.73s 5.72s -0.20% name old text-bytes new text-bytes delta HelloSize 579kB ± 0% 579kB ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 5.46kB ± 0% 5.46kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 72.8kB ± 0% 72.8kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.03MB ± 0% 1.03MB ± 0% ~ (all equal) 3. There is little change in the go1 benchmark (excluding the noise). name old time/op new time/op delta BinaryTree17-4 40.3s ± 1% 40.6s ± 1% +0.80% (p=0.000 n=30+30) Fannkuch11-4 24.2s ± 1% 24.1s ± 0% ~ (p=0.093 n=30+30) FmtFprintfEmpty-4 834ns ± 0% 826ns ± 0% -0.93% (p=0.000 n=29+24) FmtFprintfString-4 1.39µs ± 1% 1.36µs ± 0% -2.02% (p=0.000 n=30+30) FmtFprintfInt-4 1.43µs ± 1% 1.44µs ± 1% ~ (p=0.155 n=30+29) FmtFprintfIntInt-4 2.09µs ± 0% 2.11µs ± 0% +1.16% (p=0.000 n=28+30) FmtFprintfPrefixedInt-4 2.33µs ± 1% 2.36µs ± 0% +1.25% (p=0.000 n=30+30) FmtFprintfFloat-4 4.27µs ± 1% 4.32µs ± 1% +1.27% (p=0.000 n=30+30) FmtManyArgs-4 8.18µs ± 0% 8.14µs ± 0% -0.46% (p=0.000 n=25+27) GobDecode-4 101ms ± 1% 101ms ± 1% ~ (p=0.182 n=29+29) GobEncode-4 89.6ms ± 1% 87.8ms ± 2% -2.02% (p=0.000 n=30+29) Gzip-4 4.07s ± 1% 4.08s ± 1% ~ (p=0.173 n=30+27) Gunzip-4 602ms ± 1% 600ms ± 1% -0.29% (p=0.000 n=29+28) HTTPClientServer-4 679µs ± 4% 683µs ± 3% ~ (p=0.197 n=30+30) JSONEncode-4 241ms ± 1% 239ms ± 1% -0.84% (p=0.000 n=30+30) JSONDecode-4 903ms ± 1% 882ms ± 1% -2.33% (p=0.000 n=30+30) Mandelbrot200-4 41.8ms ± 0% 41.8ms ± 0% ~ (p=0.719 n=30+30) GoParse-4 45.5ms ± 1% 45.8ms ± 1% +0.52% (p=0.000 n=30+30) RegexpMatchEasy0_32-4 1.27µs ± 1% 1.27µs ± 0% -0.60% (p=0.000 n=30+30) RegexpMatchEasy0_1K-4 7.77µs ± 6% 7.69µs ± 4% -0.96% (p=0.040 n=30+30) RegexpMatchEasy1_32-4 1.29µs ± 1% 1.28µs ± 1% -0.54% (p=0.000 n=30+30) RegexpMatchEasy1_1K-4 10.3µs ± 6% 10.2µs ± 3% ~ (p=0.453 n=30+27) RegexpMatchMedium_32-4 1.98µs ± 1% 2.00µs ± 1% +0.85% (p=0.000 n=30+29) RegexpMatchMedium_1K-4 503µs ± 0% 503µs ± 1% ~ (p=0.752 n=30+30) RegexpMatchHard_32-4 27.1µs ± 1% 26.5µs ± 0% -1.96% (p=0.000 n=30+24) RegexpMatchHard_1K-4 809µs ± 1% 799µs ± 1% -1.29% (p=0.000 n=29+30) Revcomp-4 67.3ms ± 2% 67.2ms ± 1% ~ (p=0.265 n=29+29) Template-4 1.08s ± 1% 1.07s ± 0% -1.39% (p=0.000 n=30+22) TimeParse-4 6.93µs ± 1% 6.96µs ± 1% +0.40% (p=0.005 n=30+30) TimeFormat-4 13.3µs ± 0% 13.3µs ± 1% ~ (p=0.734 n=30+30) [Geo mean] 709µs 707µs -0.32% name old speed new speed delta GobDecode-4 7.59MB/s ± 1% 7.57MB/s ± 1% ~ (p=0.145 n=29+29) GobEncode-4 8.56MB/s ± 1% 8.74MB/s ± 1% +2.07% (p=0.000 n=30+29) Gzip-4 4.76MB/s ± 1% 4.75MB/s ± 1% -0.25% (p=0.037 n=30+30) Gunzip-4 32.2MB/s ± 1% 32.3MB/s ± 1% +0.29% (p=0.000 n=29+28) JSONEncode-4 8.04MB/s ± 1% 8.11MB/s ± 1% +0.85% (p=0.000 n=30+30) JSONDecode-4 2.15MB/s ± 1% 2.20MB/s ± 1% +2.29% (p=0.000 n=30+30) GoParse-4 1.27MB/s ± 1% 1.26MB/s ± 1% -0.73% (p=0.000 n=30+30) RegexpMatchEasy0_32-4 25.1MB/s ± 1% 25.3MB/s ± 0% +0.61% (p=0.000 n=30+30) RegexpMatchEasy0_1K-4 131MB/s ± 6% 133MB/s ± 4% +1.35% (p=0.009 n=28+30) RegexpMatchEasy1_32-4 24.9MB/s ± 1% 25.0MB/s ± 1% +0.54% (p=0.000 n=30+30) RegexpMatchEasy1_1K-4 99.2MB/s ± 6% 100.2MB/s ± 3% ~ (p=0.448 n=30+27) RegexpMatchMedium_32-4 503kB/s ± 1% 500kB/s ± 0% -0.66% (p=0.002 n=30+24) RegexpMatchMedium_1K-4 2.04MB/s ± 0% 2.04MB/s ± 1% ~ (p=0.358 n=30+30) RegexpMatchHard_32-4 1.18MB/s ± 1% 1.20MB/s ± 1% +1.75% (p=0.000 n=30+30) RegexpMatchHard_1K-4 1.26MB/s ± 1% 1.28MB/s ± 1% +1.42% (p=0.000 n=30+30) Revcomp-4 37.8MB/s ± 2% 37.8MB/s ± 1% ~ (p=0.266 n=29+29) Template-4 1.80MB/s ± 1% 1.82MB/s ± 1% +1.46% (p=0.000 n=30+30) [Geo mean] 6.91MB/s 6.96MB/s +0.70% fixes #21583 Change-Id: I24065a80588ccae7de3ad732a3cfb0026cf7e214 Reviewed-on: https://go-review.googlesource.com/67490 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-10-10cmd/compile: intrinsify runtime.getcallerspCherry Zhang
Add a compiler intrinsic for getcallersp. So we are able to get rid of the argument (not done in this CL). Change-Id: Ic38fda1c694f918328659ab44654198fb116668d Reviewed-on: https://go-review.googlesource.com/69350 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Austin Clements <austin@google.com> Reviewed-by: David Chase <drchase@google.com>
2017-09-21cmd/compile: optimized ARM code with BFX/BFXUBen Shi
BFX&BFXU were introduced in ARMv6T2. A single BFX or BFXU is more efficiently than a pair of left-shift/right-shift in bit field extraction. This patch implements this optimization. And the benchmark tests show big improvement in special cases and little change in total. 1. There is big improvement in a special test case. name old time/op new time/op delta BFX-4 665µs ± 1% 595µs ± 0% -10.61% (p=0.000 n=20+20) (The test case: https://github.com/benshi001/ugo1/blob/master/bfx_test.go) 2. The compilecmp benchmark shows no regression. name old time/op new time/op delta Template 2.33s ± 2% 2.34s ± 2% ~ (p=0.356 n=9+10) Unicode 1.32s ± 2% 1.30s ± 2% ~ (p=0.139 n=9+8) GoTypes 7.77s ± 1% 7.76s ± 1% ~ (p=0.780 n=10+9) Compiler 37.3s ± 1% 37.1s ± 1% ~ (p=0.211 n=10+9) SSA 84.3s ± 2% 84.3s ± 2% ~ (p=0.842 n=10+9) Flate 1.45s ± 1% 1.45s ± 3% ~ (p=0.853 n=10+10) GoParser 1.83s ± 2% 1.83s ± 2% ~ (p=0.739 n=10+10) Reflect 5.08s ± 2% 5.09s ± 2% ~ (p=0.720 n=9+10) Tar 2.44s ± 1% 2.44s ± 2% ~ (p=0.684 n=10+10) XML 2.62s ± 2% 2.62s ± 2% ~ (p=0.529 n=10+10) [Geo mean] 4.80s 4.79s -0.06% name old user-time/op new user-time/op delta Template 2.76s ± 2% 2.75s ± 3% ~ (p=0.893 n=10+10) Unicode 1.63s ± 1% 1.60s ± 1% -2.07% (p=0.000 n=8+9) GoTypes 9.54s ± 1% 9.52s ± 1% ~ (p=0.215 n=10+10) Compiler 46.0s ± 1% 46.0s ± 1% ~ (p=0.853 n=10+10) SSA 110s ± 1% 110s ± 1% ~ (p=0.838 n=10+10) Flate 1.69s ± 3% 1.69s ± 5% ~ (p=0.957 n=10+10) GoParser 2.15s ± 2% 2.15s ± 2% ~ (p=0.749 n=10+10) Reflect 6.03s ± 1% 5.99s ± 2% ~ (p=0.060 n=9+10) Tar 3.02s ± 2% 2.99s ± 2% ~ (p=0.214 n=10+10) XML 3.10s ± 2% 3.08s ± 2% ~ (p=0.732 n=9+10) [Geo mean] 5.82s 5.79s -0.41% name old text-bytes new text-bytes delta HelloSize 589kB ± 0% 589kB ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 5.46kB ± 0% 5.46kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 76.9kB ± 0% 76.9kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.03MB ± 0% 1.03MB ± 0% ~ (all equal) 3. The go1 benchmark shows little change in total. (excluding noise) name old time/op new time/op delta BinaryTree17-4 41.5s ± 1% 41.6s ± 1% ~ (p=0.373 n=30+26) Fannkuch11-4 23.6s ± 1% 23.6s ± 1% +0.28% (p=0.003 n=29+30) FmtFprintfEmpty-4 826ns ± 1% 827ns ± 1% ~ (p=0.155 n=30+30) FmtFprintfString-4 1.35µs ± 1% 1.35µs ± 1% ~ (p=0.499 n=30+30) FmtFprintfInt-4 1.43µs ± 1% 1.41µs ± 1% -1.19% (p=0.000 n=30+30) FmtFprintfIntInt-4 2.15µs ± 1% 2.11µs ± 1% -1.78% (p=0.000 n=30+30) FmtFprintfPrefixedInt-4 2.21µs ± 1% 2.21µs ± 1% ~ (p=0.881 n=30+30) FmtFprintfFloat-4 4.41µs ± 1% 4.44µs ± 0% +0.64% (p=0.000 n=30+30) FmtManyArgs-4 8.06µs ± 1% 8.06µs ± 0% ~ (p=0.871 n=30+30) GobDecode-4 103ms ± 1% 104ms ± 2% +0.54% (p=0.013 n=28+29) GobEncode-4 92.4ms ± 1% 92.6ms ± 1% ~ (p=0.447 n=30+29) Gzip-4 4.17s ± 1% 4.06s ± 1% -2.56% (p=0.000 n=29+30) Gunzip-4 603ms ± 1% 602ms ± 1% ~ (p=0.423 n=30+30) HTTPClientServer-4 688µs ± 2% 674µs ± 3% -2.09% (p=0.000 n=29+30) JSONEncode-4 237ms ± 1% 237ms ± 1% ~ (p=0.061 n=29+30) JSONDecode-4 907ms ± 1% 910ms ± 1% ~ (p=0.061 n=30+30) Mandelbrot200-4 41.7ms ± 0% 41.7ms ± 0% +0.19% (p=0.000 n=24+20) GoParse-4 45.7ms ± 2% 45.5ms ± 2% -0.29% (p=0.005 n=30+30) RegexpMatchEasy0_32-4 1.27µs ± 0% 1.27µs ± 0% +0.12% (p=0.031 n=30+30) RegexpMatchEasy0_1K-4 7.77µs ± 4% 7.73µs ± 3% ~ (p=0.169 n=30+30) RegexpMatchEasy1_32-4 1.29µs ± 1% 1.29µs ± 1% ~ (p=0.126 n=30+30) RegexpMatchEasy1_1K-4 10.4µs ± 3% 10.3µs ± 2% -1.32% (p=0.004 n=30+29) RegexpMatchMedium_32-4 2.06µs ± 0% 2.06µs ± 0% ~ (p=0.071 n=30+30) RegexpMatchMedium_1K-4 531µs ± 1% 530µs ± 0% ~ (p=0.121 n=30+23) RegexpMatchHard_32-4 28.7µs ± 1% 28.6µs ± 1% -0.21% (p=0.001 n=30+27) RegexpMatchHard_1K-4 860µs ± 1% 857µs ± 1% ~ (p=0.105 n=30+27) Revcomp-4 67.3ms ± 2% 67.3ms ± 2% ~ (p=0.805 n=29+29) Template-4 1.08s ± 1% 1.08s ± 1% ~ (p=0.260 n=30+30) TimeParse-4 7.04µs ± 0% 7.04µs ± 0% ~ (p=0.315 n=30+30) TimeFormat-4 13.2µs ± 1% 13.2µs ± 1% ~ (p=0.077 n=30+30) [Geo mean] 715µs 713µs -0.30% name old speed new speed delta GobDecode-4 7.42MB/s ± 1% 7.38MB/s ± 2% -0.54% (p=0.011 n=28+29) GobEncode-4 8.30MB/s ± 1% 8.29MB/s ± 1% ~ (p=0.484 n=30+29) Gzip-4 4.65MB/s ± 2% 4.78MB/s ± 1% +2.73% (p=0.000 n=30+30) Gunzip-4 32.2MB/s ± 1% 32.2MB/s ± 1% ~ (p=0.357 n=30+30) JSONEncode-4 8.18MB/s ± 1% 8.19MB/s ± 1% ~ (p=0.052 n=29+30) JSONDecode-4 2.14MB/s ± 1% 2.13MB/s ± 1% ~ (p=0.074 n=30+29) GoParse-4 1.27MB/s ± 1% 1.27MB/s ± 2% ~ (p=0.618 n=24+30) RegexpMatchEasy0_32-4 25.2MB/s ± 0% 25.2MB/s ± 0% -0.12% (p=0.031 n=30+30) RegexpMatchEasy0_1K-4 132MB/s ± 5% 132MB/s ± 2% ~ (p=0.171 n=30+30) RegexpMatchEasy1_32-4 24.8MB/s ± 1% 24.9MB/s ± 1% ~ (p=0.106 n=30+30) RegexpMatchEasy1_1K-4 98.4MB/s ± 3% 99.6MB/s ± 4% +1.19% (p=0.011 n=30+30) RegexpMatchMedium_32-4 483kB/s ± 1% 484kB/s ± 1% ~ (p=0.426 n=30+30) RegexpMatchMedium_1K-4 1.93MB/s ± 1% 1.93MB/s ± 0% ~ (p=0.157 n=30+17) RegexpMatchHard_32-4 1.12MB/s ± 1% 1.12MB/s ± 0% +0.33% (p=0.001 n=30+24) RegexpMatchHard_1K-4 1.19MB/s ± 1% 1.19MB/s ± 1% ~ (p=0.290 n=30+30) Revcomp-4 37.8MB/s ± 2% 37.8MB/s ± 1% ~ (p=0.815 n=29+29) Template-4 1.80MB/s ± 1% 1.80MB/s ± 1% ~ (p=0.586 n=30+30) [Geo mean] 6.80MB/s 6.81MB/s +0.25% fixes #20966 Change-Id: Idb5567bbe988c875315b8c98c128957cd474ccc5 Reviewed-on: https://go-review.googlesource.com/64950 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com>
2017-09-19cmd/compile: remove Symbol wrappers from Aux fieldsKeith Randall
We used to have {Arg,Auto,Extern}Symbol structs with which we wrapped a *gc.Node or *obj.LSym before storing them in the Aux field of an ssa.Value. This let the SSA part of the compiler distinguish between autos and args, for example. We no longer need the wrappers as we can query the underlying objects directly. There was also some sloppy usage, where VarDef had a *gc.Node directly in its Aux field, whereas the use of that variable had that *gc.Node wrapped in an AutoSymbol. Thus the Aux fields didn't match (using ==) when they probably should. This sloppy usage cleanup is the only thing in the CL that changes the generated code - we can get rid of some more unused auto variables if the matching happens reliably. Removing this wrapper also lets us get rid of the varsyms cache (which was used to prevent wrapping the same *gc.Node twice). Change-Id: I0dedf8f82f84bfee413d310342b777316bd1d478 Reviewed-on: https://go-review.googlesource.com/64452 TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-09-15cmd/compile: optimize ARM code with MULAF/MULSF/MULAD/MULSDBen Shi
The go compiler can generate better ARM code with those more efficient FP instructions. And there is little improvement in total but big improvement in special cases. 1. The size of pkg/linux_arm/math.a shrinks by 2.4%. 2. there is neither improvement nor regression in compilecmp benchmark. name old time/op new time/op delta Template 2.32s ± 2% 2.32s ± 1% ~ (p=1.000 n=9+10) Unicode 1.32s ± 4% 1.32s ± 4% ~ (p=0.912 n=10+10) GoTypes 7.76s ± 1% 7.79s ± 1% ~ (p=0.447 n=9+10) Compiler 37.4s ± 2% 37.2s ± 2% ~ (p=0.218 n=10+10) SSA 84.8s ± 2% 85.0s ± 1% ~ (p=0.604 n=10+9) Flate 1.45s ± 2% 1.44s ± 2% ~ (p=0.075 n=10+10) GoParser 1.82s ± 1% 1.81s ± 1% ~ (p=0.190 n=10+10) Reflect 5.06s ± 1% 5.05s ± 1% ~ (p=0.315 n=10+9) Tar 2.37s ± 1% 2.37s ± 2% ~ (p=0.912 n=10+10) XML 2.56s ± 1% 2.58s ± 2% ~ (p=0.089 n=10+10) [Geo mean] 4.77s 4.77s -0.08% name old user-time/op new user-time/op delta Template 2.74s ± 2% 2.75s ± 2% ~ (p=0.856 n=9+10) Unicode 1.61s ± 4% 1.62s ± 3% ~ (p=0.693 n=10+10) GoTypes 9.55s ± 1% 9.49s ± 2% ~ (p=0.056 n=9+10) Compiler 45.9s ± 1% 45.8s ± 1% ~ (p=0.345 n=9+10) SSA 110s ± 1% 110s ± 1% ~ (p=0.763 n=9+10) Flate 1.68s ± 2% 1.68s ± 3% ~ (p=0.616 n=10+10) GoParser 2.14s ± 4% 2.14s ± 1% ~ (p=0.825 n=10+9) Reflect 5.95s ± 1% 5.97s ± 3% ~ (p=0.951 n=9+10) Tar 2.94s ± 3% 2.93s ± 2% ~ (p=0.359 n=10+10) XML 3.03s ± 3% 3.07s ± 6% ~ (p=0.166 n=10+10) [Geo mean] 5.76s 5.77s +0.12% name old text-bytes new text-bytes delta HelloSize 588kB ± 0% 588kB ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 5.46kB ± 0% 5.46kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 72.9kB ± 0% 72.9kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.03MB ± 0% 1.03MB ± 0% ~ (all equal) 3. The performance of Mandelbrot200 improves 15%, though little improvement in total. name old time/op new time/op delta BinaryTree17-4 41.7s ± 1% 41.7s ± 1% ~ (p=0.264 n=29+23) Fannkuch11-4 24.2s ± 0% 24.1s ± 1% -0.13% (p=0.050 n=30+30) FmtFprintfEmpty-4 826ns ± 1% 824ns ± 1% -0.24% (p=0.038 n=25+30) FmtFprintfString-4 1.38µs ± 1% 1.38µs ± 0% -0.42% (p=0.000 n=27+25) FmtFprintfInt-4 1.46µs ± 1% 1.46µs ± 0% ~ (p=0.060 n=30+23) FmtFprintfIntInt-4 2.11µs ± 1% 2.08µs ± 0% -1.04% (p=0.000 n=30+30) FmtFprintfPrefixedInt-4 2.23µs ± 1% 2.22µs ± 1% -0.51% (p=0.000 n=30+30) FmtFprintfFloat-4 4.49µs ± 1% 4.48µs ± 1% -0.22% (p=0.004 n=26+30) FmtManyArgs-4 8.06µs ± 1% 8.12µs ± 1% +0.68% (p=0.000 n=25+30) GobDecode-4 104ms ± 1% 104ms ± 2% ~ (p=0.362 n=29+29) GobEncode-4 92.9ms ± 1% 92.8ms ± 2% ~ (p=0.786 n=30+30) Gzip-4 4.12s ± 1% 4.12s ± 1% ~ (p=0.314 n=30+30) Gunzip-4 602ms ± 1% 603ms ± 1% ~ (p=0.164 n=30+30) HTTPClientServer-4 659µs ± 1% 655µs ± 2% -0.64% (p=0.006 n=25+28) JSONEncode-4 234ms ± 1% 235ms ± 1% +0.29% (p=0.050 n=30+30) JSONDecode-4 912ms ± 0% 911ms ± 0% ~ (p=0.385 n=18+24) Mandelbrot200-4 49.2ms ± 0% 41.7ms ± 0% -15.35% (p=0.000 n=25+27) GoParse-4 46.3ms ± 1% 46.3ms ± 2% ~ (p=0.572 n=30+30) RegexpMatchEasy0_32-4 1.29µs ± 1% 1.27µs ± 0% -1.59% (p=0.000 n=30+30) RegexpMatchEasy0_1K-4 7.62µs ± 4% 7.71µs ± 3% ~ (p=0.074 n=30+30) RegexpMatchEasy1_32-4 1.31µs ± 0% 1.30µs ± 1% -0.71% (p=0.000 n=23+30) RegexpMatchEasy1_1K-4 10.3µs ± 3% 10.3µs ± 5% ~ (p=0.105 n=30+30) RegexpMatchMedium_32-4 2.06µs ± 1% 2.06µs ± 1% ~ (p=0.100 n=30+30) RegexpMatchMedium_1K-4 533µs ± 1% 534µs ± 1% ~ (p=0.254 n=29+30) RegexpMatchHard_32-4 28.9µs ± 0% 28.9µs ± 0% ~ (p=0.154 n=30+30) RegexpMatchHard_1K-4 868µs ± 1% 867µs ± 0% ~ (p=0.729 n=30+23) Revcomp-4 66.9ms ± 1% 67.2ms ± 2% ~ (p=0.102 n=28+29) Template-4 1.07s ± 1% 1.06s ± 1% -0.53% (p=0.000 n=30+30) TimeParse-4 7.07µs ± 1% 7.01µs ± 0% -0.85% (p=0.000 n=30+25) TimeFormat-4 13.1µs ± 0% 13.2µs ± 1% +0.77% (p=0.000 n=27+27) [Geo mean] 721µs 716µs -0.70% name old speed new speed delta GobDecode-4 7.38MB/s ± 1% 7.37MB/s ± 2% ~ (p=0.399 n=29+29) GobEncode-4 8.26MB/s ± 1% 8.27MB/s ± 2% ~ (p=0.790 n=30+30) Gzip-4 4.71MB/s ± 1% 4.71MB/s ± 1% ~ (p=0.885 n=30+30) Gunzip-4 32.2MB/s ± 1% 32.2MB/s ± 1% ~ (p=0.190 n=30+30) JSONEncode-4 8.28MB/s ± 1% 8.25MB/s ± 1% ~ (p=0.053 n=30+30) JSONDecode-4 2.13MB/s ± 0% 2.12MB/s ± 1% ~ (p=0.072 n=18+30) GoParse-4 1.25MB/s ± 1% 1.25MB/s ± 2% ~ (p=0.863 n=30+30) RegexpMatchEasy0_32-4 24.8MB/s ± 0% 25.2MB/s ± 1% +1.61% (p=0.000 n=30+30) RegexpMatchEasy0_1K-4 134MB/s ± 4% 133MB/s ± 3% ~ (p=0.074 n=30+30) RegexpMatchEasy1_32-4 24.5MB/s ± 0% 24.6MB/s ± 1% +0.72% (p=0.000 n=23+30) RegexpMatchEasy1_1K-4 99.1MB/s ± 3% 99.8MB/s ± 5% ~ (p=0.105 n=30+30) RegexpMatchMedium_32-4 483kB/s ± 1% 487kB/s ± 1% +0.83% (p=0.002 n=30+30) RegexpMatchMedium_1K-4 1.92MB/s ± 1% 1.92MB/s ± 1% ~ (p=0.058 n=30+30) RegexpMatchHard_32-4 1.10MB/s ± 0% 1.11MB/s ± 0% ~ (p=0.804 n=30+30) RegexpMatchHard_1K-4 1.18MB/s ± 0% 1.18MB/s ± 0% ~ (all equal) Revcomp-4 38.0MB/s ± 1% 37.8MB/s ± 2% ~ (p=0.098 n=28+29) Template-4 1.82MB/s ± 1% 1.83MB/s ± 1% +0.55% (p=0.000 n=29+29) [Geo mean] 6.79MB/s 6.79MB/s +0.09% Change-Id: Ia91991c2c5c59c5df712de85a83b13a21c0a554b Reviewed-on: https://go-review.googlesource.com/63770 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-09-11cmd/compile: optimize ARM code with NMULF/NMULDBen Shi
NMULF and NMULD are efficient FP instructions, and the go compiler can use them to generate better code. The benchmark tests of my patch did not show general change, but big improvement in special cases. 1.A special test case improved 12.6%. https://github.com/benshi001/ugo1/blob/master/fpmul_test.go name old time/op new time/op delta FPMul-4 398µs ± 1% 348µs ± 1% -12.64% (p=0.000 n=40+40) 2. the compilecmp test showed little change. name old time/op new time/op delta Template 2.30s ± 1% 2.31s ± 1% ~ (p=0.754 n=17+19) Unicode 1.31s ± 3% 1.32s ± 5% ~ (p=0.265 n=20+20) GoTypes 7.73s ± 2% 7.73s ± 1% ~ (p=0.925 n=20+20) Compiler 37.0s ± 1% 37.3s ± 2% +0.79% (p=0.002 n=19+20) SSA 83.8s ± 4% 83.5s ± 2% ~ (p=0.964 n=20+17) Flate 1.43s ± 2% 1.44s ± 1% ~ (p=0.602 n=20+20) GoParser 1.82s ± 2% 1.81s ± 2% ~ (p=0.141 n=19+20) Reflect 5.08s ± 2% 5.08s ± 3% ~ (p=0.835 n=20+19) Tar 2.36s ± 1% 2.35s ± 1% ~ (p=0.195 n=18+17) XML 2.57s ± 2% 2.56s ± 1% ~ (p=0.283 n=20+17) [Geo mean] 4.74s 4.75s +0.05% name old user-time/op new user-time/op delta Template 2.75s ± 2% 2.75s ± 0% ~ (p=0.620 n=20+15) Unicode 1.59s ± 4% 1.60s ± 4% ~ (p=0.479 n=20+19) GoTypes 9.48s ± 1% 9.47s ± 1% ~ (p=0.743 n=20+20) Compiler 45.7s ± 1% 45.7s ± 1% ~ (p=0.482 n=19+20) SSA 109s ± 1% 109s ± 2% ~ (p=0.800 n=18+20) Flate 1.67s ± 3% 1.67s ± 3% ~ (p=0.598 n=19+18) GoParser 2.15s ± 4% 2.13s ± 3% ~ (p=0.153 n=20+20) Reflect 5.95s ± 2% 5.95s ± 2% ~ (p=0.961 n=19+20) Tar 2.93s ± 2% 2.92s ± 3% ~ (p=0.242 n=20+19) XML 3.02s ± 3% 3.04s ± 3% ~ (p=0.233 n=19+18) [Geo mean] 5.74s 5.74s -0.04% name old text-bytes new text-bytes delta HelloSize 588kB ± 0% 588kB ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 5.46kB ± 0% 5.46kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 72.9kB ± 0% 72.9kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.03MB ± 0% 1.03MB ± 0% ~ (all equal) 3. The go1 benchmark showed little change in total. name old time/op new time/op delta BinaryTree17-4 41.8s ± 1% 41.8s ± 1% ~ (p=0.388 n=40+39) Fannkuch11-4 24.1s ± 1% 24.1s ± 1% ~ (p=0.077 n=40+40) FmtFprintfEmpty-4 834ns ± 1% 831ns ± 1% -0.31% (p=0.002 n=40+37) FmtFprintfString-4 1.34µs ± 1% 1.34µs ± 0% ~ (p=0.387 n=40+40) FmtFprintfInt-4 1.44µs ± 1% 1.44µs ± 1% ~ (p=0.421 n=40+40) FmtFprintfIntInt-4 2.09µs ± 0% 2.09µs ± 1% ~ (p=0.589 n=40+39) FmtFprintfPrefixedInt-4 2.32µs ± 1% 2.33µs ± 1% +0.15% (p=0.001 n=40+40) FmtFprintfFloat-4 4.51µs ± 0% 4.44µs ± 1% -1.50% (p=0.000 n=40+40) FmtManyArgs-4 7.94µs ± 0% 7.97µs ± 0% +0.36% (p=0.001 n=32+40) GobDecode-4 104ms ± 1% 102ms ± 2% -1.27% (p=0.000 n=39+37) GobEncode-4 90.5ms ± 1% 90.9ms ± 2% +0.40% (p=0.006 n=37+40) Gzip-4 4.10s ± 2% 4.08s ± 1% -0.30% (p=0.004 n=40+40) Gunzip-4 603ms ± 0% 602ms ± 1% ~ (p=0.303 n=37+40) HTTPClientServer-4 672µs ± 3% 658µs ± 2% -2.08% (p=0.000 n=39+37) JSONEncode-4 238ms ± 1% 239ms ± 0% +0.26% (p=0.001 n=40+25) JSONDecode-4 884ms ± 1% 885ms ± 1% +0.16% (p=0.012 n=40+40) Mandelbrot200-4 49.3ms ± 0% 49.3ms ± 0% ~ (p=0.588 n=40+38) GoParse-4 46.3ms ± 1% 46.4ms ± 2% ~ (p=0.487 n=40+40) RegexpMatchEasy0_32-4 1.28µs ± 1% 1.28µs ± 0% +0.12% (p=0.003 n=40+40) RegexpMatchEasy0_1K-4 7.78µs ± 5% 7.78µs ± 4% ~ (p=0.825 n=40+40) RegexpMatchEasy1_32-4 1.29µs ± 1% 1.29µs ± 0% ~ (p=0.659 n=40+40) RegexpMatchEasy1_1K-4 10.3µs ± 3% 10.4µs ± 2% ~ (p=0.266 n=40+40) RegexpMatchMedium_32-4 2.05µs ± 1% 2.05µs ± 0% -0.18% (p=0.002 n=40+28) RegexpMatchMedium_1K-4 533µs ± 1% 534µs ± 1% ~ (p=0.397 n=37+40) RegexpMatchHard_32-4 28.9µs ± 1% 28.9µs ± 1% -0.22% (p=0.002 n=40+40) RegexpMatchHard_1K-4 868µs ± 1% 870µs ± 1% +0.21% (p=0.015 n=40+40) Revcomp-4 67.3ms ± 1% 67.2ms ± 2% ~ (p=0.262 n=38+39) Template-4 1.07s ± 1% 1.07s ± 1% ~ (p=0.276 n=40+40) TimeParse-4 7.16µs ± 1% 7.16µs ± 1% ~ (p=0.610 n=39+40) TimeFormat-4 13.3µs ± 1% 13.3µs ± 1% ~ (p=0.617 n=38+40) [Geo mean] 720µs 719µs -0.13% name old speed new speed delta GobDecode-4 7.39MB/s ± 1% 7.49MB/s ± 2% +1.25% (p=0.000 n=39+38) GobEncode-4 8.48MB/s ± 1% 8.45MB/s ± 2% -0.40% (p=0.005 n=37+40) Gzip-4 4.74MB/s ± 2% 4.75MB/s ± 1% +0.30% (p=0.018 n=40+40) Gunzip-4 32.2MB/s ± 0% 32.2MB/s ± 1% ~ (p=0.272 n=36+40) JSONEncode-4 8.15MB/s ± 1% 8.13MB/s ± 0% -0.26% (p=0.003 n=40+25) JSONDecode-4 2.19MB/s ± 1% 2.19MB/s ± 1% ~ (p=0.676 n=40+40) GoParse-4 1.25MB/s ± 2% 1.25MB/s ± 2% ~ (p=0.823 n=40+40) RegexpMatchEasy0_32-4 25.1MB/s ± 1% 25.1MB/s ± 0% -0.12% (p=0.006 n=40+40) RegexpMatchEasy0_1K-4 132MB/s ± 5% 132MB/s ± 5% ~ (p=0.821 n=40+40) RegexpMatchEasy1_32-4 24.7MB/s ± 1% 24.7MB/s ± 0% ~ (p=0.630 n=40+40) RegexpMatchEasy1_1K-4 99.1MB/s ± 3% 98.8MB/s ± 2% ~ (p=0.268 n=40+40) RegexpMatchMedium_32-4 487kB/s ± 2% 490kB/s ± 0% +0.51% (p=0.001 n=40+40) RegexpMatchMedium_1K-4 1.92MB/s ± 1% 1.92MB/s ± 1% ~ (p=0.208 n=39+40) RegexpMatchHard_32-4 1.11MB/s ± 1% 1.11MB/s ± 0% +0.36% (p=0.000 n=40+33) RegexpMatchHard_1K-4 1.18MB/s ± 1% 1.18MB/s ± 1% ~ (p=0.207 n=40+37) Revcomp-4 37.8MB/s ± 1% 37.8MB/s ± 2% ~ (p=0.276 n=38+39) Template-4 1.82MB/s ± 1% 1.81MB/s ± 1% ~ (p=0.122 n=38+40) [Geo mean] 6.81MB/s 6.81MB/s +0.06% fixes #19843 Change-Id: Ief3a0c2b15f59d40c7b40f2784eeb71196685b59 Reviewed-on: https://go-review.googlesource.com/61150 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-08-30cmd/compile: optimize ARM with MULSBen Shi
MULS was introduced in ARMv7 and corresponding to MULA. This patch duplicated all MULA related SSA rules with MULS. Here was the contrast test result against the original go compiler. There was no improvement in total, but big improvement in special cases. 1. A specific test case accelerated 18.62%. (https://github.com/benshi001/ugo1/blob/master/mulsub_test.go) name old time/op new time/op delta MulSub-4 270µs ± 0% 219µs ± 0% -18.62% (p=0.000 n=35+40) 2. Total size of all .a files in pkg/ shrank by 0.002%. 3. The compilecmp benchmark showed no decline. name old time/op new time/op delta Template 2.37s ± 3% 2.36s ± 1% ~ (p=0.233 n=19+18) Unicode 1.32s ± 2% 1.34s ± 5% +1.32% (p=0.011 n=20+18) GoTypes 7.88s ± 1% 7.87s ± 1% ~ (p=0.758 n=20+20) Compiler 37.5s ± 1% 37.6s ± 1% ~ (p=0.194 n=20+19) SSA 83.7s ± 2% 83.5s ± 2% ~ (p=0.569 n=20+19) Flate 1.46s ± 3% 1.45s ± 1% ~ (p=0.619 n=20+17) GoParser 1.87s ± 2% 1.85s ± 1% -0.58% (p=0.048 n=20+18) Reflect 5.10s ± 2% 5.11s ± 2% ~ (p=0.365 n=19+20) Tar 1.78s ± 2% 1.78s ± 2% ~ (p=0.531 n=19+20) XML 2.62s ± 1% 2.61s ± 2% ~ (p=0.057 n=17+19) [Geo mean] 4.68s 4.67s -0.07% name old user-time/op new user-time/op delta Template 2.80s ± 1% 2.79s ± 2% ~ (p=0.686 n=17+20) Unicode 1.61s ± 4% 1.63s ± 6% ~ (p=0.222 n=20+20) GoTypes 9.59s ± 1% 9.60s ± 1% ~ (p=0.482 n=17+20) Compiler 46.1s ± 1% 46.2s ± 1% ~ (p=0.373 n=20+18) SSA 108s ± 1% 108s ± 2% ~ (p=0.784 n=20+20) Flate 1.68s ± 3% 1.69s ± 3% ~ (p=0.335 n=20+19) GoParser 2.20s ± 4% 2.19s ± 2% ~ (p=0.844 n=20+18) Reflect 5.97s ± 3% 6.01s ± 2% ~ (p=0.184 n=20+20) Tar 2.11s ± 2% 2.11s ± 4% ~ (p=0.961 n=19+20) XML 3.07s ± 1% 3.07s ± 3% ~ (p=0.786 n=16+19) [Geo mean] 5.61s 5.62s +0.19% name old text-bytes new text-bytes delta HelloSize 586kB ± 0% 586kB ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 5.46kB ± 0% 5.46kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 72.9kB ± 0% 72.9kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.03MB ± 0% 1.03MB ± 0% ~ (all equal) 4. The go1 benchmark showed no decline in total. name old time/op new time/op delta BinaryTree17-4 41.7s ± 1% 41.7s ± 1% ~ (p=0.966 n=40+40) Fannkuch11-4 23.6s ± 0% 23.6s ± 1% -0.23% (p=0.000 n=40+40) FmtFprintfEmpty-4 844ns ± 1% 834ns ± 1% -1.23% (p=0.000 n=40+40) FmtFprintfString-4 1.39µs ± 1% 1.40µs ± 1% +0.71% (p=0.000 n=40+40) FmtFprintfInt-4 1.44µs ± 1% 1.45µs ± 1% +0.70% (p=0.000 n=40+40) FmtFprintfIntInt-4 2.10µs ± 1% 2.10µs ± 1% +0.30% (p=0.000 n=40+40) FmtFprintfPrefixedInt-4 2.49µs ± 0% 2.50µs ± 1% +0.66% (p=0.000 n=32+40) FmtFprintfFloat-4 4.42µs ± 1% 4.46µs ± 2% +0.94% (p=0.000 n=40+40) FmtManyArgs-4 8.31µs ± 1% 8.22µs ± 1% -1.09% (p=0.000 n=40+40) GobDecode-4 105ms ± 1% 102ms ± 1% -2.30% (p=0.000 n=39+39) GobEncode-4 90.2ms ± 1% 88.7ms ± 1% -1.66% (p=0.000 n=40+39) Gzip-4 4.17s ± 1% 4.16s ± 1% ~ (p=0.785 n=40+40) Gunzip-4 608ms ± 1% 608ms ± 1% ~ (p=0.481 n=40+40) HTTPClientServer-4 697µs ± 2% 684µs ± 3% -1.89% (p=0.000 n=37+40) JSONEncode-4 255ms ± 1% 256ms ± 1% +0.35% (p=0.000 n=40+40) JSONDecode-4 920ms ± 1% 926ms ± 1% +0.64% (p=0.000 n=40+39) Mandelbrot200-4 49.3ms ± 1% 49.3ms ± 0% +0.07% (p=0.005 n=40+40) GoParse-4 46.8ms ± 2% 46.7ms ± 1% ~ (p=1.000 n=40+40) RegexpMatchEasy0_32-4 1.27µs ± 0% 1.27µs ± 1% ~ (p=0.057 n=40+40) RegexpMatchEasy0_1K-4 7.97µs ± 7% 7.92µs ± 5% ~ (p=0.094 n=40+40) RegexpMatchEasy1_32-4 1.28µs ± 1% 1.28µs ± 1% ~ (p=0.406 n=40+40) RegexpMatchEasy1_1K-4 10.5µs ± 4% 10.5µs ± 3% ~ (p=0.855 n=40+40) RegexpMatchMedium_32-4 2.04µs ± 0% 2.04µs ± 1% -0.22% (p=0.000 n=39+40) RegexpMatchMedium_1K-4 541µs ± 0% 540µs ± 1% -0.25% (p=0.000 n=40+38) RegexpMatchHard_32-4 29.3µs ± 1% 29.3µs ± 0% ~ (p=0.149 n=40+40) RegexpMatchHard_1K-4 878µs ± 1% 880µs ± 0% +0.14% (p=0.005 n=36+35) Revcomp-4 81.8ms ± 2% 81.4ms ± 2% -0.43% (p=0.015 n=38+39) Template-4 1.05s ± 1% 1.05s ± 1% ~ (p=0.302 n=40+35) TimeParse-4 7.18µs ± 1% 7.26µs ± 1% +1.05% (p=0.000 n=40+36) TimeFormat-4 13.1µs ± 1% 13.1µs ± 1% ~ (p=0.698 n=37+40) [Geo mean] 733µs 732µs -0.16% name old speed new speed delta GobDecode-4 7.34MB/s ± 1% 7.51MB/s ± 1% +2.36% (p=0.000 n=39+39) GobEncode-4 8.51MB/s ± 1% 8.65MB/s ± 1% +1.69% (p=0.000 n=40+39) Gzip-4 4.66MB/s ± 1% 4.66MB/s ± 1% ~ (p=0.783 n=40+40) Gunzip-4 31.9MB/s ± 1% 31.9MB/s ± 1% ~ (p=0.466 n=40+40) JSONEncode-4 7.61MB/s ± 1% 7.58MB/s ± 1% -0.35% (p=0.001 n=40+40) JSONDecode-4 2.11MB/s ± 1% 2.10MB/s ± 1% -0.52% (p=0.000 n=38+39) GoParse-4 1.24MB/s ± 2% 1.24MB/s ± 1% ~ (p=0.556 n=40+39) RegexpMatchEasy0_32-4 25.1MB/s ± 0% 25.1MB/s ± 1% ~ (p=0.064 n=40+40) RegexpMatchEasy0_1K-4 129MB/s ± 8% 129MB/s ± 5% ~ (p=0.094 n=40+40) RegexpMatchEasy1_32-4 25.0MB/s ± 1% 25.1MB/s ± 1% ~ (p=0.331 n=40+40) RegexpMatchEasy1_1K-4 97.7MB/s ± 4% 97.8MB/s ± 3% ~ (p=0.851 n=40+40) RegexpMatchMedium_32-4 490kB/s ± 0% 490kB/s ± 0% ~ (all equal) RegexpMatchMedium_1K-4 1.89MB/s ± 0% 1.90MB/s ± 1% +0.12% (p=0.031 n=40+40) RegexpMatchHard_32-4 1.09MB/s ± 1% 1.09MB/s ± 1% ~ (p=0.597 n=40+40) RegexpMatchHard_1K-4 1.16MB/s ± 1% 1.16MB/s ± 1% ~ (p=0.565 n=40+35) Revcomp-4 31.1MB/s ± 2% 31.2MB/s ± 2% +0.44% (p=0.018 n=38+39) Template-4 1.85MB/s ± 1% 1.85MB/s ± 1% ~ (p=0.873 n=40+40) [Geo mean] 6.66MB/s 6.67MB/s +0.26% Change-Id: Icc972d8a78ea06c32c3aa15733ff0537c82c2dc7 Reviewed-on: https://go-review.googlesource.com/58950 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com>
2017-08-28cmd/compile: optimize ARM with more efficient MOVB/MOVBU/MOVH/MOVHUBen Shi
Like the indexed MOVW (MOVWloadidx/MOVWstoreidx) used in current ARM backend, the indexed MOVB/MOVBU/MOVH/MOVHU can also be used to generate further optimized ARM code. My patch implements this optimization. Here are some contrast test results against the original go compiler. 1. The total size of all .a files in pkg/ shrinks by 0.03%. 2. The compilecmp benchmark shows a little decline. name old time/op new time/op delta Template 2.35s ± 1% 2.37s ± 3% +0.94% (p=0.006 n=19+19) Unicode 1.33s ± 3% 1.33s ± 2% ~ (p=0.158 n=20+18) GoTypes 7.86s ± 2% 7.84s ± 1% ~ (p=0.284 n=19+18) Compiler 37.5s ± 1% 37.7s ± 2% ~ (p=0.101 n=20+19) SSA 83.4s ± 2% 83.6s ± 2% ~ (p=0.231 n=20+20) Flate 1.46s ± 2% 1.45s ± 1% ~ (p=0.097 n=20+17) GoParser 1.86s ± 2% 1.86s ± 4% ~ (p=0.738 n=20+20) Reflect 5.10s ± 1% 5.11s ± 1% ~ (p=0.290 n=20+18) Tar 1.78s ± 2% 1.77s ± 2% ~ (p=0.166 n=19+20) XML 2.61s ± 2% 2.61s ± 2% ~ (p=0.665 n=19+19) [Geo mean] 4.67s 4.68s +0.16% name old user-time/op new user-time/op delta Template 2.79s ± 3% 2.80s ± 2% ~ (p=0.662 n=20+20) Unicode 1.62s ± 3% 1.64s ± 4% ~ (p=0.252 n=20+20) GoTypes 9.58s ± 2% 9.62s ± 2% ~ (p=0.250 n=20+20) Compiler 46.2s ± 1% 46.2s ± 1% ~ (p=0.602 n=20+19) SSA 108s ± 1% 108s ± 2% ~ (p=0.242 n=18+20) Flate 1.69s ± 3% 1.69s ± 4% ~ (p=0.470 n=20+20) GoParser 2.16s ± 3% 2.20s ± 4% +1.70% (p=0.005 n=19+20) Reflect 6.02s ± 2% 6.02s ± 2% ~ (p=0.700 n=20+17) Tar 2.11s ± 2% 2.11s ± 3% ~ (p=0.480 n=18+20) XML 3.07s ± 2% 3.11s ± 4% +1.50% (p=0.043 n=20+20) [Geo mean] 5.61s 5.64s +0.55% name old text-bytes new text-bytes delta HelloSize 586kB ± 0% 586kB ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 5.46kB ± 0% 5.46kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 72.9kB ± 0% 72.9kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.03MB ± 0% 1.03MB ± 0% ~ (all equal) 3. The go1 benchmark shows improvement totally, and even more than 10% improvement in the test case Revcomp. name old time/op new time/op delta BinaryTree17-4 42.0s ± 1% 41.5s ± 1% -1.32% (p=0.000 n=39+40) Fannkuch11-4 24.1s ± 1% 23.6s ± 0% -2.38% (p=0.000 n=40+40) FmtFprintfEmpty-4 843ns ± 0% 839ns ± 1% -0.46% (p=0.000 n=33+40) FmtFprintfString-4 1.44µs ± 1% 1.37µs ± 1% -5.48% (p=0.000 n=40+35) FmtFprintfInt-4 1.44µs ± 1% 1.41µs ± 2% -1.50% (p=0.000 n=40+40) FmtFprintfIntInt-4 2.07µs ± 1% 2.06µs ± 0% -0.78% (p=0.000 n=40+40) FmtFprintfPrefixedInt-4 2.50µs ± 1% 2.33µs ± 1% -6.85% (p=0.000 n=40+40) FmtFprintfFloat-4 4.36µs ± 1% 4.34µs ± 0% -0.39% (p=0.017 n=40+40) FmtManyArgs-4 8.11µs ± 0% 8.00µs ± 0% -1.37% (p=0.000 n=40+40) GobDecode-4 105ms ± 2% 103ms ± 2% -2.17% (p=0.000 n=39+39) GobEncode-4 90.1ms ± 2% 88.6ms ± 1% -1.67% (p=0.000 n=40+39) Gzip-4 4.18s ± 1% 4.09s ± 1% -2.03% (p=0.000 n=40+40) Gunzip-4 608ms ± 1% 603ms ± 1% -0.86% (p=0.000 n=40+34) HTTPClientServer-4 674µs ± 3% 661µs ± 2% -1.82% (p=0.000 n=40+39) JSONEncode-4 256ms ± 1% 243ms ± 0% -5.11% (p=0.000 n=39+31) JSONDecode-4 915ms ± 1% 904ms ± 1% -1.18% (p=0.000 n=40+36) Mandelbrot200-4 49.2ms ± 0% 49.3ms ± 0% ~ (p=0.254 n=34+40) GoParse-4 46.9ms ± 2% 46.9ms ± 1% ~ (p=0.737 n=40+39) RegexpMatchEasy0_32-4 1.28µs ± 1% 1.27µs ± 1% -0.71% (p=0.000 n=40+40) RegexpMatchEasy0_1K-4 7.86µs ± 4% 7.67µs ± 4% -2.46% (p=0.000 n=38+40) RegexpMatchEasy1_32-4 1.28µs ± 1% 1.28µs ± 1% -0.54% (p=0.000 n=40+40) RegexpMatchEasy1_1K-4 10.4µs ± 2% 10.3µs ± 2% -0.88% (p=0.003 n=40+39) RegexpMatchMedium_32-4 2.05µs ± 0% 2.04µs ± 0% -0.34% (p=0.000 n=40+33) RegexpMatchMedium_1K-4 541µs ± 1% 535µs ± 1% -1.02% (p=0.000 n=40+38) RegexpMatchHard_32-4 29.3µs ± 1% 29.1µs ± 1% -0.51% (p=0.000 n=40+40) RegexpMatchHard_1K-4 881µs ± 1% 871µs ± 1% -1.15% (p=0.000 n=40+40) Revcomp-4 81.7ms ± 2% 67.5ms ± 2% -17.37% (p=0.000 n=39+39) Template-4 1.05s ± 1% 1.08s ± 2% +3.67% (p=0.000 n=40+40) TimeParse-4 7.24µs ± 1% 7.09µs ± 1% -2.13% (p=0.000 n=40+40) TimeFormat-4 13.2µs ± 1% 13.1µs ± 0% -0.31% (p=0.007 n=40+31) [Geo mean] 733µs 718µs -2.03% name old speed new speed delta GobDecode-4 7.28MB/s ± 2% 7.44MB/s ± 2% +2.23% (p=0.000 n=39+39) GobEncode-4 8.52MB/s ± 2% 8.67MB/s ± 1% +1.70% (p=0.000 n=40+39) Gzip-4 4.65MB/s ± 1% 4.74MB/s ± 1% +1.94% (p=0.000 n=37+40) Gunzip-4 31.9MB/s ± 1% 32.2MB/s ± 1% +0.90% (p=0.000 n=40+36) JSONEncode-4 7.57MB/s ± 1% 7.98MB/s ± 0% +5.41% (p=0.000 n=40+31) JSONDecode-4 2.12MB/s ± 1% 2.15MB/s ± 1% +1.23% (p=0.000 n=40+40) GoParse-4 1.23MB/s ± 1% 1.23MB/s ± 1% ~ (p=0.769 n=39+40) RegexpMatchEasy0_32-4 25.0MB/s ± 1% 25.2MB/s ± 1% +0.71% (p=0.000 n=40+40) RegexpMatchEasy0_1K-4 130MB/s ± 5% 134MB/s ± 4% +2.53% (p=0.000 n=38+40) RegexpMatchEasy1_32-4 24.9MB/s ± 1% 25.1MB/s ± 1% +0.55% (p=0.000 n=40+40) RegexpMatchEasy1_1K-4 98.5MB/s ± 2% 99.4MB/s ± 2% +0.88% (p=0.003 n=40+39) RegexpMatchMedium_32-4 490kB/s ± 0% 490kB/s ± 0% ~ (all equal) RegexpMatchMedium_1K-4 1.89MB/s ± 1% 1.91MB/s ± 1% +1.02% (p=0.000 n=40+38) RegexpMatchHard_32-4 1.10MB/s ± 1% 1.10MB/s ± 0% +0.41% (p=0.000 n=40+33) RegexpMatchHard_1K-4 1.16MB/s ± 1% 1.17MB/s ± 1% +1.21% (p=0.000 n=40+40) Revcomp-4 31.1MB/s ± 2% 37.6MB/s ± 2% +21.03% (p=0.000 n=39+39) Template-4 1.86MB/s ± 1% 1.79MB/s ± 1% -3.51% (p=0.000 n=40+38) [Geo mean] 6.66MB/s 6.80MB/s +2.13% fixes #21492 Change-Id: Ia26e7ca393f0a5f31de240e8ff9a220453ca7e0d Reviewed-on: https://go-review.googlesource.com/58450 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-08-02cmd/compile: set/unset base register for better assembly printCherry Zhang
For address of an auto or arg, on all non-x86 architectures the assembler backend encodes the actual SP offset in the instruction but leaves the offset in Prog unchanged. When the assembly is printed in compile -S, it shows an offset relative to pseudo FP/SP with an actual hardware SP base register (e.g. R13 on ARM). This is confusing. Unset the base register if it is indeed SP, so the assembly output is consistent. If the base register isn't SP, it should be an error and the error output contains the actual base register. For address loading instructions, the base register isn't set in the compiler on non-x86 architectures. Set it. Normally it is SP and will be unset in the change mentioned above for printing. If it is not, it will be an error and the error output contains the actual base register. No change in generated binary, only printed assembly. Passes "go build -a -toolexec 'toolstash -cmp' std cmd" on all architectures. Fixes #21064. Change-Id: Ifafe8d5f9b437efbe824b63b3cbc2f5f6cdc1fd5 Reviewed-on: https://go-review.googlesource.com/49432 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-05-09cmd/compile: change ssa.Type into *types.TypeJosh Bleecher Snyder
When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA. In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way. This is a more sensible dependency tree, and helps compiler performance a bit. Though this is a big CL, most of the changes are mechanical and uninteresting. Interesting bits: * Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128. * Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple. * Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types. * We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ". * Update the type comparison routine to handle tuples and special types inline. * Teach gc/fmt.go how to print special types. * We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type. * The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement. This shrinks ssa.Value; it now fits in a smaller size class on 64 bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk. name old alloc/op new alloc/op delta Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8) Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10) GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10) Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10) GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9) Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8) Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10) XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10) [Geo mean] 40.5MB 40.3MB -0.68% name old allocs/op new allocs/op delta Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9) Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10) GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10) Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10) GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9) Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8) Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10) XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10) [Geo mean] 428k 428k -0.01% Removing all the interface calls helps non-trivially with CPU, though. name old time/op new time/op delta Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96) Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96) GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96) Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99) GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97) Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99) Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94) XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95) [Geo mean] 178ms 173ms -2.65% name old user-time/op new user-time/op delta Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99) Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95) GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99) Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96) GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100) Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92) Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100) XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97) [Geo mean] 220ms 213ms -2.76% Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1 Reviewed-on: https://go-review.googlesource.com/42145 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28Revert "cmd/compile: add Type.MustSize and Type.MustAlignment"Josh Bleecher Snyder
This reverts commit 94d540a4b6bf68ec472bf4469037955e3133fcf7. Reason for revert: prefer something along the lines of CL 42018. Change-Id: I876fe32e98f37d8d725fe55e0fd0ea429c0198e0 Reviewed-on: https://go-review.googlesource.com/42022 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>