Also tweak comment for the arm64 case.
Change-Id: I073405bd2acf901dcaaf33a034a84b6a09dd4a83
Reviewed-on: https://go-review.googlesource.com/c/go/+/334869
Trust: Joel Sing <joel@sing.id.au>
Reviewed-by: Meng Zhuo <mzh@golangcn.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
Fixes #46186
Change-Id: Idb0674079f9484593e07cca172dfbb19be0e594d
GitHub-Last-Rev: 615fc5365510ff7a39af7569f05a0013b724d0c9
GitHub-Pull-Request: golang/go#46185
Reviewed-on: https://go-review.googlesource.com/c/go/+/320111
Reviewed-by: Ben Shi <powerman1st@163.com>
Reviewed-by: Keith Randall <khr@golang.org>
Trust: David Chase <drchase@google.com>
|
|
Tweak the register allocator to maintain the invariant that
OpArg{Int,Float}Reg values are placed together at the start of the
entry block, before any other non-pseudo-op values. Without this
change, when the register allocator adds spills we can wind up with an
interleaving of OpArg*Reg and stores, which complicates debug location
analysis.
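The invariant can be stated as a small check over the entry block's ops. This is a hypothetical sketch, not the allocator's code. `Op`, the constants and `argRegsFirst` are illustrative names. Which ops count as pseudo-ops is an assumption, represented here by a single placeholder `OpPseudo`.

```go
package main

import "fmt"

// Op is a hypothetical, heavily simplified stand-in for an SSA opcode.
type Op int

const (
	OpArgIntReg Op = iota
	OpArgFloatReg
	OpPseudo // placeholder for whatever ops the real pass treats as pseudo-ops
	OpStore  // e.g. a spill store added by the register allocator
	OpAdd
)

// isPseudo reports whether op may appear anywhere in the entry block
// without breaking the invariant (an assumption of this sketch).
func isPseudo(op Op) bool { return op == OpPseudo }

// argRegsFirst reports whether every OpArg{Int,Float}Reg value in the
// entry block comes before any other non-pseudo-op value.
func argRegsFirst(entry []Op) bool {
	seenOther := false
	for _, op := range entry {
		switch {
		case op == OpArgIntReg || op == OpArgFloatReg:
			if seenOther {
				return false // an OpArg*Reg after e.g. a spill store
			}
		case !isPseudo(op):
			seenOther = true
		}
	}
	return true
}

func main() {
	fmt.Println(argRegsFirst([]Op{OpPseudo, OpArgIntReg, OpArgFloatReg, OpStore, OpAdd})) // true
	fmt.Println(argRegsFirst([]Op{OpArgIntReg, OpStore, OpArgIntReg}))                    // false
}
```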
Updates #40724.
Change-Id: Icf30dd814a9e25263ecbea2e48feb840a6e7f2bd
Reviewed-on: https://go-review.googlesource.com/c/go/+/322630
Trust: Than McIntosh <thanm@google.com>
Run-TryBot: Than McIntosh <thanm@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|
|
In the register allocator, if possible, we allocate a value to its
desired register (the ideal register for its next use). In some
cases the desired register does not satisfy the value's output
register mask. We should not use the register in this case.
In the following example, v33 is going to be returned as a
function result, so it is allocated to its desired register AX.
However, its Op cannot use AX as output, causing miscompilation.
v33 = CMOVQEQF <int> v24 v28 v29 : AX (~R0[int])
v35 = MakeResult <int,int,mem> v33 v26 v18
Ret v35
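The fix amounts to intersecting the desired register with the op's output mask before using it. Below is a minimal sketch of that check. It assumes registers are numbered bits in a `regMask`. `pickReg` and the numbering are hypothetical and are not the allocator's real API.

```go
package main

import (
	"fmt"
	"math/bits"
)

// regMask is a register set: bit i set means register i is in the set.
type regMask uint64

// pickReg chooses an output register for a value. It uses the desired
// register only if that register is free AND allowed by the op's output
// mask; otherwise it takes the lowest-numbered register that is both.
// It returns -1 when none qualifies (a real allocator would spill).
func pickReg(desired int, outMask, free regMask) int {
	ok := outMask & free
	if desired >= 0 && ok&(1<<uint(desired)) != 0 {
		return desired
	}
	if ok == 0 {
		return -1
	}
	return bits.TrailingZeros64(uint64(ok))
}

func main() {
	const AX, CX, DX = 0, 1, 2 // hypothetical register numbering
	free := regMask(1<<AX | 1<<CX | 1<<DX)
	outMask := regMask(1<<CX | 1<<DX)      // this op cannot write AX
	fmt.Println(pickReg(AX, outMask, free)) // 1 (CX), even though AX is desired
}
```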
Change-Id: Id0f4f27c4b233ee297e83077e3c8494fe193e664
Reviewed-on: https://go-review.googlesource.com/c/go/+/314630
Trust: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Than McIntosh <thanm@google.com>
|
|
Currently, if we have AX=a and BX=b, and we want to make a call
F(1, a, b), to move arguments into the desired registers it emits
MOVQ AX, CX
MOVL $1, AX // AX=1
MOVQ BX, DX
MOVQ CX, BX // BX=a
MOVQ DX, CX // CX=b
This has a few redundant moves.
This is because we process inputs in order. First, allocate 1 to
AX, which kicks out a (in AX) to CX (a free register at the
moment). Then, allocate a to BX, which kicks out b (in BX) to DX.
Finally, put b in CX.
Notice that if we instead allocate CX=b first, then BX=a, then AX=1,
we will not have redundant moves. This CL reduces redundant moves
by allocating them in a different order: first, keep inputs that are
already in place there; then allocate inputs whose desired registers
are free; then everything else.
before after
cmd/compile binary size 23703888 23609680
text size 8565899 8533291
(with regabiargs enabled.)
Change-Id: I69e1bdf745f2c90bb791f6d7c45b37384af1e874
Reviewed-on: https://go-review.googlesource.com/c/go/+/311371
Trust: Cherry Zhang <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
|
|
The go/build package needs access to this configuration,
so move it into a new package available to the standard library.
Change-Id: I868a94148b52350c76116451f4ad9191246adcff
Reviewed-on: https://go-review.googlesource.com/c/go/+/310731
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Jay Conrod <jayconrod@google.com>
|
|
When the -clobberdeadreg flag is set, the compiler inserts code that
clobbers integer registers at call sites. This may be helpful for
debugging the register ABI.
Only implemented on AMD64 for now.
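A rough model of the effect, under stated assumptions, is sketched below. The register file is a slice, and the registers to preserve (for example, those carrying arguments) form a bit mask. `clobberVal` is an arbitrary poison pattern and not necessarily what the compiler writes. `clobberDeadRegs` is a hypothetical name. The point is that a callee that wrongly reads a dead register sees an obviously bogus value rather than a stale one that happens to work.

```go
package main

import "fmt"

// clobberVal is an arbitrary poison pattern chosen for this sketch; it is
// not necessarily the value the compiler actually writes.
const clobberVal uint64 = 0xdeaddeaddeaddead

// clobberDeadRegs models clobbering at one call site: regs[i] holds integer
// register i, and bit i of live marks a register that must be preserved
// (e.g. one carrying an argument). Every other register gets clobberVal.
func clobberDeadRegs(regs []uint64, live uint64) {
	for i := range regs {
		if live&(1<<uint(i)) == 0 {
			regs[i] = clobberVal
		}
	}
}

func main() {
	regs := []uint64{1, 2, 3, 4}  // hypothetical AX, BX, CX, DX contents
	clobberDeadRegs(regs, 0b0011) // only AX and BX carry arguments
	fmt.Printf("%#x\n", regs)
}
```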
Change-Id: Ia203d3f891c30fd95d0103489056fe01d63a2899
Reviewed-on: https://go-review.googlesource.com/c/go/+/302809
Trust: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
|
|
This moves all remaining GOEXPERIMENT flags into the objabi.Experiment
struct, drops the "_enabled" suffix from their names, and makes them all bool
typed.
We also drop DebugFlags.Fieldtrack because the previous CL shifted the
one test that used it to use GOEXPERIMENT instead.
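Keeping every flag as a bool field in one struct lets a comma-separated GOEXPERIMENT value be applied generically. The sketch below uses reflection to do this. `Experiment` here is a hypothetical stand-in with illustrative field names. The real objabi.Experiment fields and parsing code may differ, and the "no" prefix for disabling a flag is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// Experiment is a hypothetical stand-in for objabi.Experiment: one bool
// field per GOEXPERIMENT flag. The field names are illustrative.
type Experiment struct {
	FieldTrack bool
	RegabiArgs bool
}

// parseExperiments applies a comma-separated GOEXPERIMENT-style value to
// exp, matching field names case-insensitively. A "no" prefix clears the
// flag (an assumption of this sketch).
func parseExperiments(s string, exp *Experiment) error {
	v := reflect.ValueOf(exp).Elem()
	for _, name := range strings.Split(s, ",") {
		name = strings.TrimSpace(name)
		if name == "" {
			continue
		}
		on := true
		if strings.HasPrefix(name, "no") {
			name, on = name[2:], false
		}
		found := false
		for i := 0; i < v.NumField(); i++ {
			if strings.EqualFold(v.Type().Field(i).Name, name) {
				v.Field(i).SetBool(on)
				found = true
			}
		}
		if !found {
			return fmt.Errorf("unknown GOEXPERIMENT %q", name)
		}
	}
	return nil
}

func main() {
	var exp Experiment
	if err := parseExperiments("fieldtrack,regabiargs", &exp); err != nil {
		fmt.Println(err)
	}
	fmt.Printf("%+v\n", exp)
}
```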
Change-Id: I3406fe62b1c300bb4caeaffa6ca5ce56a70497fe
Reviewed-on: https://go-review.googlesource.com/c/go/+/302389
Trust: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
|
|
The current layout algorithm tries to put consecutive blocks together,
so a successor block gets higher priority than a zero-indegree block.
This is beneficial for subsequent register allocation, but results in
more branch instructions.
The depth-first topological sorting algorithm is a well-known layout
algorithm, which has applications in many languages, and it helps to
reduce branch instructions. This CL applies it to the layout pass.
The test results show that it helps to reduce the code size.
This CL also includes the following changes:
1. Removed the primary predecessor mechanism. In some cases the new layout
algorithm is not very friendly to the register allocator, so a new primary
predecessor selection strategy is introduced to adapt to it.
2. Since the new layout implementation may place non-loop blocks between
loop blocks, some adaptive modifications have also been made to the looprotate
pass.
3. The layout also affects the results of codegen, so this CL also adjusts
several codegen tests accordingly.
It is inevitable that this CL will make the code size or performance of a
few functions worse, but the number of cases it improves is much larger
than the number of cases it regresses.
Statistical data from compilecmp on linux/amd64 is as follows:
name old time/op new time/op delta
Template 382ms ± 4% 382ms ± 4% ~ (p=0.497 n=49+50)
Unicode 170ms ± 9% 169ms ± 8% ~ (p=0.344 n=48+50)
GoTypes 2.01s ± 4% 2.01s ± 4% ~ (p=0.628 n=50+48)
Compiler 190ms ±10% 189ms ± 9% ~ (p=0.734 n=50+50)
SSA 11.8s ± 2% 11.8s ± 3% ~ (p=0.877 n=50+50)
Flate 241ms ± 9% 241ms ± 8% ~ (p=0.897 n=50+49)
GoParser 366ms ± 3% 361ms ± 4% -1.21% (p=0.004 n=47+50)
Reflect 835ms ± 3% 838ms ± 3% ~ (p=0.275 n=50+49)
Tar 336ms ± 4% 335ms ± 3% ~ (p=0.454 n=48+48)
XML 433ms ± 4% 431ms ± 3% ~ (p=0.071 n=49+48)
LinkCompiler 706ms ± 4% 705ms ± 4% ~ (p=0.608 n=50+49)
ExternalLinkCompiler 1.85s ± 3% 1.83s ± 2% -1.47% (p=0.000 n=49+48)
LinkWithoutDebugCompiler 437ms ± 5% 437ms ± 6% ~ (p=0.953 n=49+50)
[Geo mean] 615ms 613ms -0.37%
name old alloc/op new alloc/op delta
Template 38.7MB ± 1% 38.7MB ± 1% ~ (p=0.834 n=50+50)
Unicode 28.1MB ± 0% 28.1MB ± 0% -0.22% (p=0.000 n=49+50)
GoTypes 168MB ± 1% 168MB ± 1% ~ (p=0.054 n=47+47)
Compiler 23.0MB ± 1% 23.0MB ± 1% ~ (p=0.432 n=50+50)
SSA 1.54GB ± 0% 1.54GB ± 0% +0.21% (p=0.000 n=50+50)
Flate 23.6MB ± 1% 23.6MB ± 1% ~ (p=0.153 n=43+46)
GoParser 35.1MB ± 1% 35.1MB ± 2% ~ (p=0.202 n=50+50)
Reflect 84.7MB ± 1% 84.7MB ± 1% ~ (p=0.333 n=48+49)
Tar 34.5MB ± 1% 34.5MB ± 1% ~ (p=0.406 n=46+49)
XML 44.3MB ± 2% 44.2MB ± 3% ~ (p=0.981 n=50+50)
LinkCompiler 131MB ± 0% 128MB ± 0% -2.74% (p=0.000 n=50+50)
ExternalLinkCompiler 120MB ± 0% 120MB ± 0% +0.01% (p=0.007 n=50+50)
LinkWithoutDebugCompiler 77.3MB ± 0% 77.3MB ± 0% -0.02% (p=0.000 n=50+50)
[Geo mean] 69.3MB 69.1MB -0.22%
file before after Δ %
addr2line 4104220 4043684 -60536 -1.475%
api 5342502 5249678 -92824 -1.737%
asm 4973785 4858257 -115528 -2.323%
buildid 2667844 2625660 -42184 -1.581%
cgo 4686849 4616313 -70536 -1.505%
compile 23667431 23268406 -399025 -1.686%
cover 4959676 4874108 -85568 -1.725%
dist 3515934 3450422 -65512 -1.863%
doc 3995581 3925469 -70112 -1.755%
fix 3379202 3318522 -60680 -1.796%
link 6743249 6629913 -113336 -1.681%
nm 4047529 3991777 -55752 -1.377%
objdump 4456151 4388151 -68000 -1.526%
pack 2435040 2398072 -36968 -1.518%
pprof 13804080 13565808 -238272 -1.726%
test2json 2690043 2645987 -44056 -1.638%
trace 10418492 10232716 -185776 -1.783%
vet 7258259 7121259 -137000 -1.888%
total 113145867 111204202 -1941665 -1.716%
The situation on linux/arm64 is as follows:
name old time/op new time/op delta
Template 280ms ± 1% 282ms ± 1% +0.75% (p=0.000 n=46+48)
Unicode 124ms ± 2% 124ms ± 2% +0.37% (p=0.045 n=50+50)
GoTypes 1.69s ± 1% 1.70s ± 1% +0.56% (p=0.000 n=49+50)
Compiler 122ms ± 1% 123ms ± 1% +0.93% (p=0.000 n=50+50)
SSA 12.6s ± 1% 12.7s ± 0% +0.72% (p=0.000 n=50+50)
Flate 170ms ± 1% 172ms ± 1% +0.97% (p=0.000 n=49+49)
GoParser 262ms ± 1% 263ms ± 1% +0.39% (p=0.000 n=49+48)
Reflect 639ms ± 1% 650ms ± 1% +1.63% (p=0.000 n=49+49)
Tar 243ms ± 1% 245ms ± 1% +0.82% (p=0.000 n=50+50)
XML 324ms ± 1% 327ms ± 1% +0.72% (p=0.000 n=50+49)
LinkCompiler 597ms ± 1% 596ms ± 1% -0.27% (p=0.001 n=48+47)
ExternalLinkCompiler 1.90s ± 1% 1.88s ± 1% -1.00% (p=0.000 n=50+50)
LinkWithoutDebugCompiler 364ms ± 1% 363ms ± 1% ~ (p=0.220 n=49+50)
[Geo mean] 485ms 488ms +0.49%
name old alloc/op new alloc/op delta
Template 38.7MB ± 0% 38.8MB ± 1% ~ (p=0.093 n=43+49)
Unicode 28.4MB ± 0% 28.4MB ± 0% +0.03% (p=0.000 n=49+45)
GoTypes 169MB ± 1% 169MB ± 1% +0.23% (p=0.010 n=50+50)
Compiler 23.2MB ± 1% 23.2MB ± 1% +0.11% (p=0.000 n=40+44)
SSA 1.54GB ± 0% 1.55GB ± 0% +0.45% (p=0.000 n=47+49)
Flate 23.8MB ± 2% 23.8MB ± 1% ~ (p=0.543 n=50+50)
GoParser 35.3MB ± 1% 35.4MB ± 1% ~ (p=0.792 n=50+50)
Reflect 85.2MB ± 1% 85.2MB ± 0% ~ (p=0.055 n=50+47)
Tar 34.5MB ± 1% 34.5MB ± 1% +0.06% (p=0.015 n=50+50)
XML 43.8MB ± 2% 43.9MB ± 2% +0.19% (p=0.000 n=48+48)
LinkCompiler 137MB ± 0% 136MB ± 0% -0.92% (p=0.000 n=50+50)
ExternalLinkCompiler 127MB ± 0% 127MB ± 0% ~ (p=0.516 n=50+50)
LinkWithoutDebugCompiler 84.0MB ± 0% 84.0MB ± 0% ~ (p=0.057 n=50+50)
[Geo mean] 70.4MB 70.4MB +0.01%
file before after Δ %
addr2line 4021557 4002933 -18624 -0.463%
api 5127847 5028503 -99344 -1.937%
asm 5034716 4936836 -97880 -1.944%
buildid 2608118 2594094 -14024 -0.538%
cgo 4488592 4398320 -90272 -2.011%
compile 22501129 22213592 -287537 -1.278%
cover 4742301 4713573 -28728 -0.606%
dist 3388071 3365311 -22760 -0.672%
doc 3802250 3776082 -26168 -0.688%
fix 3306147 3216939 -89208 -2.698%
link 6404483 6363699 -40784 -0.637%
nm 3941026 3921930 -19096 -0.485%
objdump 4383330 4295122 -88208 -2.012%
pack 2404547 2389515 -15032 -0.625%
pprof 12996234 12856818 -139416 -1.073%
test2json 2668500 2586788 -81712 -3.062%
trace 9816276 9609580 -206696 -2.106%
vet 6900682 6787338 -113344 -1.643%
total 108535806 107056973 -1478833 -1.363%
Change-Id: Iaec1cdcaacca8025e9babb0fb8a532fddb70c87d
Reviewed-on: https://go-review.googlesource.com/c/go/+/255239
Reviewed-by: eric fang <eric.fang@arm.com>
Reviewed-by: Keith Randall <khr@golang.org>
Trust: eric fang <eric.fang@arm.com>
|
|
The register allocator has a special case that doesn't allocate
LR on ARMv5. This was necessary when softfloat expansion was done
by the assembler. Now softfloat calls are inserted by SSA, so it
works as normal. Remove this special case.
Change-Id: I5502f07597f4d4b675dc16b6b0d7cb47e1e8974b
Reviewed-on: https://go-review.googlesource.com/c/go/+/301792
Trust: Cherry Zhang <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
|
|
at least for ints and strings
includes simple test
For #40724.
Change-Id: Ib8484e5b957b08f961574a67cfd93d3d26551558
Reviewed-on: https://go-review.googlesource.com/c/go/+/295309
Trust: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
still needs morestack
still needs results
lots of corner cases also not dealt with.
For #40724.
Change-Id: I03abdf1e8363d75c52969560b427e488a48cd37a
Reviewed-on: https://go-review.googlesource.com/c/go/+/293889
Trust: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Jeremy Faller <jeremy@golang.org>
|
|
Also handles case where OpArg does not escape but has its address
taken.
May have exposed a lurking bug in 1.16 expandCalls,
e.g. when loading len(someArrayOfstructThing[0].secondStringField)
from a local. Maybe.
For #40724.
Change-Id: I0298c4ad5d652b5e3d7ed6a62095d59e2d8819c7
Reviewed-on: https://go-review.googlesource.com/c/go/+/293396
Trust: David Chase <drchase@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
in progress; doesn't fully work until they are also passed in
registers on the caller side.
For #40724.
Change-Id: I29a6680e60bdbe9d132782530214f2a2b51fb8f6
Reviewed-on: https://go-review.googlesource.com/c/go/+/293394
Trust: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
StaticLECall (multiple value in +mem, multiple value result +mem) ->
StaticCall (multiple register value in +mem,
multiple register-sized-value result +mem) ->
ARCH CallStatic (multiple register value in +mem,
multiple register-sized-value result +mem)
But the architecture-dependent stuff is indifferent to whether
it is mem->mem or (mem)->(mem) until Prog generation.
Deal with OpSelectN -> Prog in ssagen/ssa.go, others, as they
appear.
For #40724.
Change-Id: I1d0436f6371054f1881862641d8e7e418e4a6a16
Reviewed-on: https://go-review.googlesource.com/c/go/+/293391
Trust: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
Now that *ir.Name is the only remaining ir.Node implementation that is
stored (directly) into ssa.Aux, we can rewrite all of the conversions
between ir.Node and ssa.Aux to use *ir.Name instead.
rf doesn't have a way to rewrite the type switch case clauses, so we
just use sed instead. There's only a handful, and they're the only
times that "case ir.Node" appears anyway.
The next CL will move the tag method declarations so that ir.Node no
longer implements ssa.Aux.
Passes buildall w/ toolstash -cmp.
Updates #42982.
[git-generate]
cd src/cmd/compile/internal
sed -i -e 's/case ir.Node/case *ir.Name/' gc/plive.go */ssa.go
cd ssa
rf '
ex . ../gc {
import "cmd/compile/internal/ir"
var v *Value
v.Aux.(ir.Node) -> v.Aux.(*ir.Name)
var n ir.Node
var asAux func(Aux)
strict n # only match ir.Node-typed expressions; not *ir.Name
implicit asAux # match implicit assignments to ssa.Aux
asAux(n) -> n.(*ir.Name)
}
'
Change-Id: I3206ef5f12a7cfa37c5fecc67a1ca02ea4d52b32
Reviewed-on: https://go-review.googlesource.com/c/go/+/275789
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
|
|
Passes toolstash/buildall.
[git-generate]
cd src/cmd/compile/internal/ssa
rf '
ex . ../ir ../gc {
import "cmd/compile/internal/types"
var t *types.Type
t.Etype -> t.Kind()
t.Sym -> t.GetSym()
t.Orig -> t.Underlying()
}
'
cd ../types
rf '
mv EType Kind
mv IRNode Object
mv Type.Etype Type.kind
mv Type.Sym Type.sym
mv Type.Orig Type.underlying
mv Type.Cache Type.cache
mv Type.GetSym Type.Sym
mv Bytetype ByteType
mv Runetype RuneType
mv Errortype ErrorType
'
cd ../gc
sed -i 's/Bytetype/ByteType/; s/Runetype/RuneType/' mkbuiltin.go
git codereview gofmt
go install cmd/compile/internal/...
go test cmd/compile -u || go test cmd/compile
Change-Id: Ibecb2d7100d3318a49238eb4a78d70acb49eedca
Reviewed-on: https://go-review.googlesource.com/c/go/+/274437
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Russ Cox <rsc@golang.org>
Trust: Matthew Dempsky <mdempsky@google.com>
|
|
The plan is to introduce a Node interface that replaces the old *Node pointer-to-struct.
The previous CL defined an interface INode modeling a *Node.
This CL:
- Changes all references outside internal/ir to use INode,
along with many references inside internal/ir as well.
- Renames Node to node.
- Renames INode to Node.
So now ir.Node is an interface implemented by *ir.node, which is otherwise inaccessible,
and the code outside package ir is now (clearly) using only the interface.
The usual rule is never to redefine an existing name with a new meaning,
so that old code that hasn't been updated gets an "unknown name" error
instead of more mysterious errors or silent misbehavior. That rule would
caution against replacing Node-the-struct with Node-the-interface,
as in this CL, because code that says *Node would now be using a pointer
to an interface. But this CL is being landed at the same time as another that
moves Node from gc to ir. So the net effect is to replace *gc.Node with ir.Node,
which does follow the rule: any lingering references to gc.Node will be told
it's gone, not silently start using pointers to interfaces. So the rule is followed
by the CL sequence, just not this specific CL.
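The resulting shape, an exported interface backed by an unexported struct, can be sketched in a few lines. `Op`, `SetOp` and `NewNode` here are illustrative and are not the exact ir API. The last lines show the hazard the naming rule guards against: code still written as `*Node` compiles as a pointer to the interface.

```go
package main

import "fmt"

// Op is an illustrative opcode type.
type Op int

// Node is the exported interface; code outside the package uses only this.
type Node interface {
	Op() Op
	SetOp(Op)
}

// node is the unexported struct behind Node, so other packages cannot
// name it or depend on its fields.
type node struct{ op Op }

func (n *node) Op() Op      { return n.op }
func (n *node) SetOp(op Op) { n.op = op }

// Compile-time check that *node implements Node, the same form of
// assertion the git-generate script produces with sed.
var _ Node = (*node)(nil)

// NewNode is a hypothetical constructor: callers get a Node, never a *node.
func NewNode(op Op) Node { return &node{op: op} }

func main() {
	n := NewNode(1)
	n.SetOp(2)
	fmt.Println(n.Op()) // 2

	// Stale code written as *Node now means "pointer to interface": it
	// still compiles, which is the silent change the rule warns about.
	var p *Node = &n
	fmt.Println((*p).Op()) // 2
}
```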
Overall, the loss of inlining caused by using interfaces cuts the compiler speed
by about 6%, a not insignificant amount. However, as we convert the representation
to concrete structs that are not the giant Node over the next weeks, that speed
should come back as more of the compiler starts operating directly on concrete types
and the memory taken up by the graph of Nodes drops due to the more precise
structs. Honestly, I was expecting worse.
% benchstat bench.old bench.new
name old time/op new time/op delta
Template 168ms ± 4% 182ms ± 2% +8.34% (p=0.000 n=9+9)
Unicode 72.2ms ±10% 82.5ms ± 6% +14.38% (p=0.000 n=9+9)
GoTypes 563ms ± 8% 598ms ± 2% +6.14% (p=0.006 n=9+9)
Compiler 2.89s ± 4% 3.04s ± 2% +5.37% (p=0.000 n=10+9)
SSA 6.45s ± 4% 7.25s ± 5% +12.41% (p=0.000 n=9+10)
Flate 105ms ± 2% 115ms ± 1% +9.66% (p=0.000 n=10+8)
GoParser 144ms ±10% 152ms ± 2% +5.79% (p=0.011 n=9+8)
Reflect 345ms ± 9% 370ms ± 4% +7.28% (p=0.001 n=10+9)
Tar 149ms ± 9% 161ms ± 5% +8.05% (p=0.001 n=10+9)
XML 190ms ± 3% 209ms ± 2% +9.54% (p=0.000 n=9+8)
LinkCompiler 327ms ± 2% 325ms ± 2% ~ (p=0.382 n=8+8)
ExternalLinkCompiler 1.77s ± 4% 1.73s ± 6% ~ (p=0.113 n=9+10)
LinkWithoutDebugCompiler 214ms ± 4% 211ms ± 2% ~ (p=0.360 n=10+8)
StdCmd 14.8s ± 3% 15.9s ± 1% +6.98% (p=0.000 n=10+9)
[Geo mean] 480ms 510ms +6.31%
name old user-time/op new user-time/op delta
Template 223ms ± 3% 237ms ± 3% +6.16% (p=0.000 n=9+10)
Unicode 103ms ± 6% 113ms ± 3% +9.53% (p=0.000 n=9+9)
GoTypes 758ms ± 8% 800ms ± 2% +5.55% (p=0.003 n=10+9)
Compiler 3.95s ± 2% 4.12s ± 2% +4.34% (p=0.000 n=10+9)
SSA 9.43s ± 1% 9.74s ± 4% +3.25% (p=0.000 n=8+10)
Flate 132ms ± 2% 141ms ± 2% +6.89% (p=0.000 n=9+9)
GoParser 177ms ± 9% 183ms ± 4% ~ (p=0.050 n=9+9)
Reflect 467ms ±10% 495ms ± 7% +6.17% (p=0.029 n=10+10)
Tar 183ms ± 9% 197ms ± 5% +7.92% (p=0.001 n=10+10)
XML 249ms ± 5% 268ms ± 4% +7.82% (p=0.000 n=10+9)
LinkCompiler 544ms ± 5% 544ms ± 6% ~ (p=0.863 n=9+9)
ExternalLinkCompiler 1.79s ± 4% 1.75s ± 6% ~ (p=0.075 n=10+10)
LinkWithoutDebugCompiler 248ms ± 6% 246ms ± 2% ~ (p=0.965 n=10+8)
[Geo mean] 483ms 504ms +4.41%
[git-generate]
cd src/cmd/compile/internal/ir
: # We need to do the conversion in multiple steps, so we introduce
: # a temporary type alias that will start out meaning the pointer-to-struct
: # and then change to mean the interface.
rf '
mv Node OldNode
add node.go \
type Node = *OldNode
'
: # It should work to do this ex in ir, but it misses test files, due to a bug in rf.
: # Run the command in gc to handle gc's tests, and then again in ssa for ssa's tests.
cd ../gc
rf '
ex . ../arm ../riscv64 ../arm64 ../mips64 ../ppc64 ../mips ../wasm {
import "cmd/compile/internal/ir"
*ir.OldNode -> ir.Node
}
'
cd ../ssa
rf '
ex {
import "cmd/compile/internal/ir"
*ir.OldNode -> ir.Node
}
'
: # Back in ir, finish conversion clumsily with sed,
: # because type checking and circular aliases do not mix.
cd ../ir
sed -i '' '
/type Node = \*OldNode/d
s/\*OldNode/Node/g
s/^func (n Node)/func (n *OldNode)/
s/OldNode/node/g
s/type INode interface/type Node interface/
s/var _ INode = (Node)(nil)/var _ Node = (*node)(nil)/
' *.go
gofmt -w *.go
sed -i '' '
s/{Func{}, 136, 248}/{Func{}, 152, 280}/
s/{Name{}, 32, 56}/{Name{}, 44, 80}/
s/{Param{}, 24, 48}/{Param{}, 44, 88}/
s/{node{}, 76, 128}/{node{}, 88, 152}/
' sizeof_test.go
cd ../ssa
sed -i '' '
s/{LocalSlot{}, 28, 40}/{LocalSlot{}, 32, 48}/
' sizeof_test.go
cd ../gc
sed -i '' 's/\*ir.Node/ir.Node/' mkbuiltin.go
cd ../../../..
go install std cmd
cd cmd/compile
go test -u || go test -u
Change-Id: I196bbe3b648e4701662e4a2bada40bf155e2a553
Reviewed-on: https://go-review.googlesource.com/c/go/+/272935
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
|
|
The cycle hacks existed because gc needed to import ssa
which needed to know about gc.Node. But now that's ir.Node,
and there's no cycle anymore.
Don't know how much it matters but LocalSlot is now
one word shorter than before, because it holds a pointer
instead of an interface for the *Node. That won't last long.
Now that they're not necessary for interface satisfaction,
IsSynthetic and IsAutoTmp can move to top-level ir functions.
Change-Id: Ie511e93466cfa2b17d9a91afc4bd8d53fdb80453
Reviewed-on: https://go-review.googlesource.com/c/go/+/272931
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
|
|
Per https://developers.google.com/style/inclusive-documentation,
since we are editing some of this code anyway and it is easier
to put the cleanup in a separate CL.
Change-Id: Ib6b851f43f9cc0a57676564477d4ff22abb1cee5
Reviewed-on: https://go-review.googlesource.com/c/go/+/273106
Trust: Russ Cox <rsc@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
|
|
When allocating registers for live values, use the desired register if it
is available. This helps in some cases, such as (*entry).delete, where it
saves a few copies.
This patch also allows more debugging information to be printed.
Test results of compilecmp on linux/amd64:
name old time/op new time/op delta
Template 327ms +- 3% 329ms +- 4% +0.76% (p=0.038 n=50+49)
Unicode 158ms +- 6% 157ms +- 6% ~ (p=0.291 n=46+50)
GoTypes 1.07s +- 2% 1.06s +- 1% ~ (p=0.208 n=46+46)
Compiler 5.05s +- 2% 5.05s +- 3% ~ (p=0.908 n=50+49)
SSA 12.4s +- 2% 12.3s +- 2% -0.58% (p=0.008 n=45+50)
Flate 213ms +- 5% 213ms +- 5% ~ (p=0.685 n=50+47)
GoParser 267ms +- 4% 267ms +- 3% ~ (p=0.975 n=50+50)
Reflect 660ms +- 2% 658ms +- 2% ~ (p=0.069 n=49+48)
Tar 293ms +- 4% 292ms +- 2% ~ (p=0.321 n=50+44)
XML 386ms +- 4% 387ms +- 4% ~ (p=0.786 n=50+50)
LinkCompiler 548ms +- 5% 549ms +- 4% ~ (p=0.855 n=47+49)
ExternalLinkCompiler 1.81s +- 2% 1.81s +- 2% ~ (p=0.313 n=50+49)
LinkWithoutDebugCompiler 341ms +- 5% 340ms +- 6% ~ (p=0.735 n=49+50)
[Geo mean] 665ms 664ms -0.07%
name old user-time/op new user-time/op delta
Template 565ms +-16% 595ms +-16% +5.33% (p=0.001 n=50+50)
Unicode 378ms +-14% 374ms +-17% ~ (p=0.458 n=50+49)
GoTypes 2.05s +-12% 2.06s +- 7% ~ (p=0.381 n=41+37)
Compiler 9.91s +-20% 9.85s +-19% ~ (p=0.781 n=50+50)
SSA 25.0s +-17% 24.6s +-17% ~ (p=0.132 n=49+49)
Flate 314ms +-17% 315ms +-11% ~ (p=0.427 n=47+45)
GoParser 419ms +- 9% 417ms +-11% ~ (p=0.512 n=50+50)
Reflect 1.23s +-17% 1.19s +-13% -3.29% (p=0.030 n=49+41)
Tar 510ms +-10% 509ms +-14% ~ (p=0.890 n=48+50)
XML 704ms +-12% 694ms +-11% ~ (p=0.164 n=47+49)
LinkCompiler 993ms +- 6% 992ms +- 8% ~ (p=0.860 n=48+49)
ExternalLinkCompiler 2.19s +- 3% 2.19s +- 5% ~ (p=0.320 n=50+49)
LinkWithoutDebugCompiler 421ms +-10% 422ms +- 9% ~ (p=0.840 n=48+50)
[Geo mean] 1.15s 1.14s -0.27%
name old alloc/op new alloc/op delta
Template 36.3MB +- 0% 36.3MB +- 0% ~ (p=0.886 n=50+49)
Unicode 30.1MB +- 0% 30.1MB +- 0% ~ (p=0.792 n=50+50)
GoTypes 118MB +- 0% 118MB +- 0% ~ (p=1.000 n=47+48)
Compiler 562MB +- 0% 562MB +- 0% ~ (p=0.205 n=50+49)
SSA 1.42GB +- 0% 1.42GB +- 0% -0.12% (p=0.000 n=50+50)
Flate 22.8MB +- 0% 22.8MB +- 0% ~ (p=0.384 n=50+47)
GoParser 28.0MB +- 0% 28.0MB +- 0% -0.02% (p=0.013 n=50+50)
Reflect 78.0MB +- 0% 78.0MB +- 0% ~ (p=0.384 n=46+48)
Tar 34.1MB +- 0% 34.1MB +- 0% ~ (p=0.072 n=50+50)
XML 43.1MB +- 0% 43.1MB +- 0% -0.04% (p=0.000 n=49+50)
LinkCompiler 98.5MB +- 0% 98.5MB +- 0% +0.01% (p=0.012 n=50+43)
ExternalLinkCompiler 89.6MB +- 0% 89.6MB +- 0% ~ (p=0.762 n=50+50)
LinkWithoutDebugCompiler 56.9MB +- 0% 56.9MB +- 0% ~ (p=0.268 n=49+48)
[Geo mean] 77.7MB 77.7MB -0.01%
name old allocs/op new allocs/op delta
Template 367k +- 0% 367k +- 0% -0.01% (p=0.002 n=50+49)
Unicode 345k +- 0% 345k +- 0% ~ (p=0.981 n=50+50)
GoTypes 1.28M +- 0% 1.28M +- 0% -0.00% (p=0.002 n=49+50)
Compiler 5.39M +- 0% 5.39M +- 0% -0.00% (p=0.000 n=50+50)
SSA 13.9M +- 0% 13.9M +- 0% +0.01% (p=0.000 n=50+50)
Flate 230k +- 0% 230k +- 0% ~ (p=0.815 n=50+50)
GoParser 292k +- 0% 292k +- 0% -0.01% (p=0.000 n=50+50)
Reflect 977k +- 0% 977k +- 0% -0.00% (p=0.035 n=50+50)
Tar 343k +- 0% 343k +- 0% -0.01% (p=0.008 n=48+50)
XML 418k +- 0% 418k +- 0% -0.01% (p=0.000 n=50+50)
LinkCompiler 516k +- 0% 516k +- 0% +0.01% (p=0.002 n=50+48)
ExternalLinkCompiler 570k +- 0% 570k +- 0% ~ (p=0.430 n=46+50)
LinkWithoutDebugCompiler 169k +- 0% 169k +- 0% ~ (p=0.706 n=49+49)
[Geo mean] 672k 672k -0.00%
name old maxRSS/op new maxRSS/op delta
Template 34.3M +- 5% 34.7M +- 4% +1.24% (p=0.004 n=50+50)
Unicode 36.2M +- 5% 36.1M +- 8% ~ (p=0.785 n=50+50)
GoTypes 75.7M +- 7% 76.1M +- 6% ~ (p=0.544 n=50+50)
Compiler 304M +- 7% 304M +- 7% ~ (p=0.744 n=50+50)
SSA 721M +- 6% 723M +- 7% ~ (p=0.724 n=49+50)
Flate 26.1M +- 3% 26.1M +- 5% ~ (p=0.649 n=48+49)
GoParser 29.3M +- 5% 29.3M +- 4% ~ (p=0.809 n=50+50)
Reflect 56.0M +- 6% 56.3M +- 5% ~ (p=0.350 n=50+50)
Tar 34.1M +- 3% 33.9M +- 5% ~ (p=0.121 n=49+50)
XML 39.6M +- 5% 39.9M +- 4% ~ (p=0.109 n=50+50)
LinkCompiler 168M +- 1% 168M +- 1% ~ (p=0.578 n=49+48)
ExternalLinkCompiler 179M +- 1% 179M +- 2% ~ (p=0.522 n=46+46)
LinkWithoutDebugCompiler 137M +- 3% 137M +- 3% ~ (p=0.463 n=41+50)
[Geo mean] 79.3M 79.5M +0.20%
name old text-bytes new text-bytes delta
HelloSize 812kB +- 0% 811kB +- 0% -0.05% (p=0.000 n=50+50)
name old data-bytes new data-bytes delta
HelloSize 13.3kB +- 0% 13.3kB +- 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 206kB +- 0% 206kB +- 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.21MB +- 0% 1.21MB +- 0% +0.02% (p=0.000 n=50+50)
file before after Δ %
addr2line 4052949 4052453 -496 -0.012%
api 4948171 4947163 -1008 -0.020%
asm 4888889 4888049 -840 -0.017%
buildid 2617545 2617673 +128 +0.005%
cgo 4521681 4516801 -4880 -0.108%
compile 19139091 19137683 -1408 -0.007%
cover 4843191 4840359 -2832 -0.058%
dist 3473677 3474717 +1040 +0.030%
doc 3821592 3821552 -40 -0.001%
fix 3220587 3220059 -528 -0.016%
link 6587368 6582696 -4672 -0.071%
nm 3999858 3999186 -672 -0.017%
objdump 4409161 4408217 -944 -0.021%
pack 2394038 2393846 -192 -0.008%
pprof 13601271 13602487 +1216 +0.009%
test2json 2645148 2644604 -544 -0.021%
trace 10357878 10356862 -1016 -0.010%
vet 6779482 6778706 -776 -0.011%
total 106301577 106283113 -18464 -0.017%
Change-Id: I63ac6e224e1a4756ddc1bfc4aabbaeb92d7d4273
Reviewed-on: https://go-review.googlesource.com/c/go/+/263599
Run-TryBot: eric fang <eric.fang@arm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: eric fang <eric.fang@arm.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
When allocating registers for a phi value, only the primary predecessor is currently considered.
Taking into account the allocation status of the other predecessors can help reduce
unnecessary copy or spill operations. Many such cases can be found in the standard
library, such as runtime.wirep, moveByType, etc. The test results from benchstat
also show that this change helps reduce the file size.
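One plausible heuristic along these lines is sketched below: try the primary predecessor's register first, then the registers the other predecessors already use, then any free register. This is a hypothetical simplification and not the CL's exact logic. `pickPhiReg` and the integer register numbering are assumptions.

```go
package main

import "fmt"

// pickPhiReg chooses a register for a phi value. predRegs[i] is the
// register (or -1) holding the phi's argument at the end of predecessor i,
// primary predecessor first. Looking only at predRegs[0] models the old
// behavior; this sketch also tries the other predecessors' registers, so
// the corresponding edge needs no copy, before falling back to any free
// register.
func pickPhiReg(predRegs []int, free []bool) int {
	for _, r := range predRegs {
		if r >= 0 && free[r] {
			return r
		}
	}
	for r, ok := range free {
		if ok {
			return r
		}
	}
	return -1 // no register available: the phi would be spilled
}

func main() {
	const AX, BX, CX, DX = 0, 1, 2, 3 // hypothetical register numbering
	free := []bool{AX: false, BX: false, CX: true, DX: true}
	// The primary predecessor has the argument in AX (taken); the other has it in DX.
	fmt.Println(pickPhiReg([]int{AX, DX}, free)) // 3 (DX): no copy on the second edge
}
```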
name old time/op new time/op delta
Template 328ms ± 5% 326ms ± 4% ~ (p=0.254 n=50+47)
Unicode 156ms ± 7% 158ms ±10% ~ (p=0.412 n=49+49)
GoTypes 1.07s ± 3% 1.07s ± 2% ~ (p=0.664 n=48+49)
Compiler 4.43s ± 3% 4.44s ± 3% ~ (p=0.758 n=48+50)
SSA 10.3s ± 2% 10.4s ± 2% +0.43% (p=0.017 n=50+46)
Flate 208ms ± 9% 209ms ± 7% ~ (p=0.920 n=49+46)
GoParser 260ms ± 5% 262ms ± 4% ~ (p=0.063 n=50+48)
Reflect 687ms ± 3% 685ms ± 2% ~ (p=0.459 n=50+48)
Tar 293ms ± 4% 293ms ± 5% ~ (p=0.695 n=49+48)
XML 391ms ± 4% 389ms ± 3% ~ (p=0.109 n=49+46)
LinkCompiler 570ms ± 5% 563ms ± 5% -1.10% (p=0.006 n=46+47)
ExternalLinkCompiler 1.57s ± 3% 1.56s ± 3% ~ (p=0.118 n=47+46)
LinkWithoutDebugCompiler 349ms ± 6% 349ms ± 5% ~ (p=0.726 n=49+47)
[Geo mean] 645ms 645ms -0.05%
name old user-time/op new user-time/op delta
Template 507ms ±14% 513ms ±14% ~ (p=0.398 n=48+49)
Unicode 345ms ±29% 345ms ±38% ~ (p=0.521 n=47+49)
GoTypes 1.95s ±16% 1.94s ±19% ~ (p=0.324 n=50+50)
Compiler 8.26s ±16% 8.22s ±14% ~ (p=0.834 n=50+50)
SSA 19.6s ± 8% 19.2s ±15% ~ (p=0.056 n=50+50)
Flate 293ms ± 9% 299ms ±12% ~ (p=0.057 n=47+50)
GoParser 388ms ± 9% 387ms ±14% ~ (p=0.660 n=46+50)
Reflect 1.15s ±28% 1.12s ±18% ~ (p=0.648 n=49+48)
Tar 456ms ±10% 476ms ±15% +4.48% (p=0.001 n=46+48)
XML 648ms ±27% 634ms ±16% ~ (p=0.685 n=50+46)
LinkCompiler 1.00s ± 8% 1.00s ± 8% ~ (p=0.638 n=50+50)
ExternalLinkCompiler 1.96s ± 5% 1.96s ± 5% ~ (p=0.792 n=50+50)
LinkWithoutDebugCompiler 443ms ±10% 442ms ±11% ~ (p=0.813 n=50+50)
[Geo mean] 1.05s 1.05s -0.09%
name old alloc/op new alloc/op delta
Template 36.0MB ± 0% 36.0MB ± 0% ~ (p=0.599 n=49+50)
Unicode 29.8MB ± 0% 29.8MB ± 0% ~ (p=0.739 n=50+50)
GoTypes 118MB ± 0% 118MB ± 0% ~ (p=0.436 n=50+50)
Compiler 562MB ± 0% 562MB ± 0% ~ (p=0.693 n=50+50)
SSA 1.42GB ± 0% 1.42GB ± 0% -0.10% (p=0.000 n=50+49)
Flate 22.5MB ± 0% 22.5MB ± 0% ~ (p=0.429 n=48+49)
GoParser 27.7MB ± 0% 27.7MB ± 0% ~ (p=0.705 n=49+48)
Reflect 77.7MB ± 0% 77.7MB ± 0% -0.01% (p=0.043 n=50+50)
Tar 33.8MB ± 0% 33.8MB ± 0% ~ (p=0.241 n=49+50)
XML 42.8MB ± 0% 42.8MB ± 0% ~ (p=0.677 n=47+49)
LinkCompiler 98.3MB ± 0% 98.3MB ± 0% ~ (p=0.157 n=50+50)
ExternalLinkCompiler 89.4MB ± 0% 89.4MB ± 0% ~ (p=0.683 n=50+50)
LinkWithoutDebugCompiler 56.7MB ± 0% 56.7MB ± 0% ~ (p=0.155 n=49+49)
[Geo mean] 77.3MB 77.3MB -0.01%
name old allocs/op new allocs/op delta
Template 367k ± 0% 367k ± 0% ~ (p=0.863 n=50+50)
Unicode 345k ± 0% 345k ± 0% ~ (p=0.744 n=49+49)
GoTypes 1.28M ± 0% 1.28M ± 0% ~ (p=0.957 n=48+50)
Compiler 5.39M ± 0% 5.39M ± 0% +0.00% (p=0.012 n=50+49)
SSA 13.9M ± 0% 13.9M ± 0% +0.02% (p=0.000 n=47+49)
Flate 230k ± 0% 230k ± 0% -0.01% (p=0.007 n=47+49)
GoParser 292k ± 0% 292k ± 0% ~ (p=0.891 n=50+49)
Reflect 977k ± 0% 977k ± 0% ~ (p=0.274 n=50+50)
Tar 343k ± 0% 343k ± 0% ~ (p=0.942 n=50+50)
XML 418k ± 0% 418k ± 0% ~ (p=0.374 n=50+49)
LinkCompiler 516k ± 0% 516k ± 0% ~ (p=0.205 n=49+47)
ExternalLinkCompiler 570k ± 0% 570k ± 0% ~ (p=0.783 n=49+47)
LinkWithoutDebugCompiler 169k ± 0% 169k ± 0% ~ (p=0.233 n=50+46)
[Geo mean] 672k 672k +0.00%
name old maxRSS/op new maxRSS/op delta
Template 34.5M ± 3% 34.4M ± 3% ~ (p=0.566 n=49+48)
Unicode 36.0M ± 6% 35.9M ± 6% ~ (p=0.736 n=50+50)
GoTypes 75.7M ± 7% 75.4M ± 5% ~ (p=0.412 n=50+50)
Compiler 314M ±10% 313M ± 8% ~ (p=0.708 n=50+50)
SSA 730M ± 6% 735M ± 6% ~ (p=0.324 n=50+50)
Flate 25.8M ± 5% 25.6M ± 6% ~ (p=0.415 n=49+50)
GoParser 28.5M ± 3% 28.5M ± 4% ~ (p=0.977 n=46+50)
Reflect 57.4M ± 4% 57.2M ± 3% ~ (p=0.173 n=50+50)
Tar 33.3M ± 3% 33.2M ± 4% ~ (p=0.621 n=48+50)
XML 39.6M ± 5% 39.6M ± 4% ~ (p=0.997 n=50+50)
LinkCompiler 168M ± 2% 167M ± 1% ~ (p=0.072 n=49+45)
ExternalLinkCompiler 179M ± 1% 179M ± 1% ~ (p=0.147 n=48+50)
LinkWithoutDebugCompiler 136M ± 1% 136M ± 1% ~ (p=0.789 n=47+49)
[Geo mean] 79.2M 79.1M -0.12%
name old text-bytes new text-bytes delta
HelloSize 812kB ± 0% 811kB ± 0% -0.06% (p=0.000 n=50+50)
name old data-bytes new data-bytes delta
HelloSize 13.3kB ± 0% 13.3kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 206kB ± 0% 206kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.21MB ± 0% 1.21MB ± 0% -0.03% (p=0.000 n=50+50)
file before after Δ %
addr2line 4057421 4056237 -1184 -0.029%
api 4952451 4946715 -5736 -0.116%
asm 4888993 4888185 -808 -0.017%
buildid 2617705 2616441 -1264 -0.048%
cgo 4521849 4520681 -1168 -0.026%
compile 19143451 19141243 -2208 -0.012%
cover 4847391 4837151 -10240 -0.211%
dist 3473877 3472565 -1312 -0.038%
doc 3821496 3820432 -1064 -0.028%
fix 3220587 3220659 +72 +0.002%
link 6587504 6582576 -4928 -0.075%
nm 4000154 3998690 -1464 -0.037%
objdump 4409449 4407625 -1824 -0.041%
pack 2398086 2393110 -4976 -0.207%
pprof 13599060 13606111 +7051 +0.052%
test2json 2645148 2645692 +544 +0.021%
trace 10355281 10355862 +581 +0.006%
vet 6780026 6779666 -360 -0.005%
total 106319929 106289641 -30288 -0.028%
Change-Id: Ia5399286958c187c8664c769bbddf7bc4c1cae99
Reviewed-on: https://go-review.googlesource.com/c/go/+/263600
Run-TryBot: eric fang <eric.fang@arm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: eric fang <eric.fang@arm.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
My last 387 CL. So sad ... ... ... ... not!
Fixes #40255
Change-Id: I8d4ddb744b234b8adc735db2f7c3c7b6d8bbdfa4
Reviewed-on: https://go-review.googlesource.com/c/go/+/258957
Trust: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
I think they no longer have experimental status. Might as well promote
them to permanent.
Change-Id: Id1259601b3dd2061dd60df86ee48080bfb575d2f
Reviewed-on: https://go-review.googlesource.com/c/go/+/249857
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
|
|
When setting the edge state in register allocation, we should only
be setting each register once. It is not possible for a register
to hold multiple values at once.
This CL converts the runtime error seen in #38195 into an internal
compiler error (ICE). It is better for the compiler to fail than
generate an incorrect program.
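A minimal sketch of the set-once check, assuming a simplified edge state that maps register names to value IDs. The `edgeState` type and `setReg` method are hypothetical, not the compiler's code. A second assignment to an occupied register panics, mirroring the fail-loudly behavior described above:

```go
package main

import "fmt"

// edgeState is a hypothetical, simplified model of the register state
// computed when fixing up a control-flow edge: each register may hold
// at most one value.
type edgeState struct {
	regs map[string]string // register name -> value ID
}

func newEdgeState() *edgeState {
	return &edgeState{regs: make(map[string]string)}
}

// setReg records that value v lives in register r. Setting a register
// that already holds a value indicates a compiler bug, so it panics
// (an internal compiler error) instead of silently overwriting it.
func (e *edgeState) setReg(r, v string) {
	if old, ok := e.regs[r]; ok {
		panic(fmt.Sprintf("%s is already set (%s/%s)", r, old, v))
	}
	e.regs[r] = v
}

func main() {
	e := newEdgeState()
	e.setReg("R4", "v1074")
	e.setReg("R5", "v1074")
	defer func() {
		fmt.Println("recovered:", recover())
	}()
	e.setReg("R5", "v1241") // second assignment to R5 panics
}
```

Running it prints `recovered: R5 is already set (v1074/v1241)`.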
The bug reported in #38195 is now exposed as:
./parserc.go:459:11: internal compiler error: 'yaml_parser_parse_node': R5 is already set (v1074/v1241)
[stack trace]
Updates #38195.
Change-Id: Id95842fd850b95494cbd472b6fd5a55513ecacec
Reviewed-on: https://go-review.googlesource.com/c/go/+/228060
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
When deallocating the input register to a phi so that the phi
itself could be allocated to that register, the code was also
deallocating all copies of that phi input value. Those copies
of the value could still be live, and if they were, the register
allocator could reuse them incorrectly to hold speculative
copies of other phi inputs. This causes strange bugs.
No test, because this is a very obscure scenario that is hard
to replicate, but CL 228060 adds an assertion to the compiler
that does trigger when running the std tests on linux/s390x
without this CL applied. Hopefully that assertion will prevent
future regressions.
Fixes #38195.
Change-Id: Id975dadedd731c7bb21933b9ea6b17daaa5c9e1d
Reviewed-on: https://go-review.googlesource.com/c/go/+/228061
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Once defined, a stack slot holding an open-coded defer arg should always be marked
live, since it may be used at any time if there is a panic. These stack slots are
typically kept live naturally by the open-defer code inlined at each return/exit point.
However, we need to do extra work to make sure that they are kept live if a
function has an infinite loop or a panic exit.
For this fix, only in the case of a function that is using open-coded defers, we
compute the set of blocks (most often empty) that cannot reach a return or a
BlockExit (panic) because of an infinite loop. Then, for each block b which
cannot reach a return or BlockExit or is a BlockExit block, we mark each defer arg
slot as live, as long as the definition of the defer arg slot dominates block b.
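The "cannot reach a return or BlockExit" set can be computed by walking backwards from every exit block; whatever the walk never visits is stuck in an infinite loop. A minimal sketch, assuming a CFG given as hypothetical successor lists indexed by block ID rather than the compiler's own block types (the dominance check on the defer arg slot is omitted):

```go
package main

import "fmt"

// cannotReachExit returns, for a CFG given as successor lists, the set of
// blocks from which no exit block (return or panic) is reachable.
// succs[b] lists the successors of block b; isExit[b] reports whether b
// is a return or exit block. Hypothetical representation for illustration.
func cannotReachExit(succs [][]int, isExit []bool) []bool {
	n := len(succs)
	// Build predecessor lists so the CFG can be walked backwards.
	preds := make([][]int, n)
	for b, ss := range succs {
		for _, s := range ss {
			preds[s] = append(preds[s], b)
		}
	}
	// Backward DFS from all exit blocks marks every block that can reach one.
	canReach := make([]bool, n)
	var stack []int
	for b := 0; b < n; b++ {
		if isExit[b] {
			canReach[b] = true
			stack = append(stack, b)
		}
	}
	for len(stack) > 0 {
		b := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		for _, p := range preds[b] {
			if !canReach[p] {
				canReach[p] = true
				stack = append(stack, p)
			}
		}
	}
	// Every unmarked block can only loop forever.
	result := make([]bool, n)
	for b := range result {
		result[b] = !canReach[b]
	}
	return result
}

func main() {
	// b0 -> b1 or b3; b1 -> b2; b2 -> b1 (infinite loop); b3 returns.
	succs := [][]int{{1, 3}, {2}, {1}, {}}
	isExit := []bool{false, false, false, true}
	fmt.Println(cannotReachExit(succs, isExit)) // [false true true false]
}
```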
For this change, I had to export (*Func).sdom (-> Sdom) and SparseTree.isAncestorEq
(-> IsAncestorEq).
Updates #35277
Change-Id: I7b53c9bd38ba384a3794386dd0eb94e4cbde4eb1
Reviewed-on: https://go-review.googlesource.com/c/go/+/204802
Run-TryBot: Dan Scales <danscales@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
This is part two of the nacl removal. Part 1 was CL 199499.
This CL removes amd64p32 support, which might be useful in the future
if we implement the x32 ABI. It also removes the nacl bits in the
toolchain, and some remaining nacl bits.
Updates #30439
Change-Id: I2475d5bb066d1b474e00e40d95b520e7c2e286e1
Reviewed-on: https://go-review.googlesource.com/c/go/+/200077
Reviewed-by: Ian Lance Taylor <iant@golang.org>
|
|
For commuting ops, check whether the second argument is dead before
checking if the first argument is rematerializeable. Reusing the register
holding a dead value is always best.
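A sketch of the resulting preference order for a two-operand op whose result overwrites its first input, assuming a hypothetical `argInfo` summary of each input. The point is only that a dead input register is looked for on both sides before rematerializeability is considered:

```go
package main

import "fmt"

// argInfo is a hypothetical summary of one input of a two-operand
// instruction whose result overwrites its first input register.
type argInfo struct {
	deadAfter bool // value is not used after this instruction
	remat     bool // value can be cheaply recomputed (rematerialized)
}

// clobberChoice decides which input register the result may overwrite.
// It returns swap=true if the inputs of a commutative op should be
// swapped, and copyNeeded=true if neither input can be clobbered for
// free, so the first input must be copied to a fresh register first.
func clobberChoice(commutative bool, a0, a1 argInfo) (swap, copyNeeded bool) {
	switch {
	case a0.deadAfter:
		return false, false // reuse arg0's dead register
	case commutative && a1.deadAfter:
		return true, false // reuse arg1's dead register by swapping
	case a0.remat:
		return false, false // clobber arg0; it can be recomputed later
	case commutative && a1.remat:
		return true, false
	default:
		return false, true
	}
}

func main() {
	// arg0 is live but rematerializeable, arg1 is dead: prefer the dead one.
	swap, cp := clobberChoice(true, argInfo{remat: true}, argInfo{deadAfter: true})
	fmt.Println(swap, cp) // true false
}
```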
Fixes #33580
Change-Id: I7372cfc03d514e6774d2d9cc727a3e6bf6ce2657
Reviewed-on: https://go-review.googlesource.com/c/go/+/199559
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
|
|
Control values are used to choose which successor of a block is
jumped to. Typically a control value takes the form of a 'flags'
value that represents the result of a comparison. Some
architectures however use a variable in a register as a control
value.
Up until now we have managed with a single control value per block.
However, some architectures (e.g. s390x and riscv64) have combined
compare-and-branch instructions that take two variables in registers
as parameters. To generate these instructions we need to support 2
control values per block.
This CL allows up to 2 control values to be used in a block in
order to support the addition of compare-and-branch instructions.
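A rough sketch of a block holding up to two control values, assuming hypothetical, heavily simplified `Block` and `Value` types (not the compiler's actual definitions); the kind strings are placeholders:

```go
package main

import "fmt"

// Value and Block are hypothetical, simplified stand-ins for SSA values
// and basic blocks.
type Value struct{ Name string }

type Block struct {
	Kind     string
	controls [2]*Value // at most two control values
}

// NumControls reports how many control values the block has.
func (b *Block) NumControls() int {
	switch {
	case b.controls[0] == nil:
		return 0
	case b.controls[1] == nil:
		return 1
	default:
		return 2
	}
}

// AddControl appends a control value, panicking if both slots are full.
func (b *Block) AddControl(v *Value) {
	n := b.NumControls()
	if n == len(b.controls) {
		panic("block already has two control values")
	}
	b.controls[n] = v
}

func main() {
	// A flags-based branch uses one control value (the comparison result)...
	flagsBr := &Block{Kind: "branch-on-flags"}
	flagsBr.AddControl(&Value{Name: "cmp"})
	// ...while a compare-and-branch uses both compared operands directly.
	cmpBr := &Block{Kind: "compare-and-branch"}
	cmpBr.AddControl(&Value{Name: "x"})
	cmpBr.AddControl(&Value{Name: "y"})
	fmt.Println(flagsBr.NumControls(), cmpBr.NumControls()) // 1 2
}
```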
I have implemented s390x compare-and-branch instructions in a
different CL.
Passes toolstash-check -all.
Results of compilebench:
name old time/op new time/op delta
Template 208ms ± 1% 209ms ± 1% ~ (p=0.289 n=20+20)
Unicode 83.7ms ± 1% 83.3ms ± 3% -0.49% (p=0.017 n=18+18)
GoTypes 748ms ± 1% 748ms ± 0% ~ (p=0.460 n=20+18)
Compiler 3.47s ± 1% 3.48s ± 1% ~ (p=0.070 n=19+18)
SSA 11.5s ± 1% 11.7s ± 1% +1.64% (p=0.000 n=19+18)
Flate 130ms ± 1% 130ms ± 1% ~ (p=0.588 n=19+20)
GoParser 160ms ± 1% 161ms ± 1% ~ (p=0.211 n=20+20)
Reflect 465ms ± 1% 467ms ± 1% +0.42% (p=0.007 n=20+20)
Tar 184ms ± 1% 185ms ± 2% ~ (p=0.087 n=18+20)
XML 253ms ± 1% 253ms ± 1% ~ (p=0.377 n=20+18)
LinkCompiler 769ms ± 2% 774ms ± 2% ~ (p=0.070 n=19+19)
ExternalLinkCompiler 3.59s ±11% 3.68s ± 6% ~ (p=0.072 n=20+20)
LinkWithoutDebugCompiler 446ms ± 5% 454ms ± 3% +1.79% (p=0.002 n=19+20)
StdCmd 26.0s ± 2% 26.0s ± 2% ~ (p=0.799 n=20+20)
name old user-time/op new user-time/op delta
Template 238ms ± 5% 240ms ± 5% ~ (p=0.142 n=20+20)
Unicode 105ms ±11% 106ms ±10% ~ (p=0.512 n=20+20)
GoTypes 876ms ± 2% 873ms ± 4% ~ (p=0.647 n=20+19)
Compiler 4.17s ± 2% 4.19s ± 1% ~ (p=0.093 n=20+18)
SSA 13.9s ± 1% 14.1s ± 1% +1.45% (p=0.000 n=18+18)
Flate 145ms ±13% 146ms ± 5% ~ (p=0.851 n=20+18)
GoParser 185ms ± 5% 188ms ± 7% ~ (p=0.174 n=20+20)
Reflect 534ms ± 3% 538ms ± 2% ~ (p=0.105 n=20+18)
Tar 215ms ± 4% 211ms ± 9% ~ (p=0.079 n=19+20)
XML 295ms ± 6% 295ms ± 5% ~ (p=0.968 n=20+20)
LinkCompiler 832ms ± 4% 837ms ± 7% ~ (p=0.707 n=17+20)
ExternalLinkCompiler 1.58s ± 8% 1.60s ± 4% ~ (p=0.296 n=20+19)
LinkWithoutDebugCompiler 478ms ±12% 489ms ±10% ~ (p=0.429 n=20+20)
name old object-bytes new object-bytes delta
Template 559kB ± 0% 559kB ± 0% ~ (all equal)
Unicode 216kB ± 0% 216kB ± 0% ~ (all equal)
GoTypes 2.03MB ± 0% 2.03MB ± 0% ~ (all equal)
Compiler 8.07MB ± 0% 8.07MB ± 0% -0.06% (p=0.000 n=20+20)
SSA 27.1MB ± 0% 27.3MB ± 0% +0.89% (p=0.000 n=20+20)
Flate 343kB ± 0% 343kB ± 0% ~ (all equal)
GoParser 441kB ± 0% 441kB ± 0% ~ (all equal)
Reflect 1.36MB ± 0% 1.36MB ± 0% ~ (all equal)
Tar 487kB ± 0% 487kB ± 0% ~ (all equal)
XML 632kB ± 0% 632kB ± 0% ~ (all equal)
name old export-bytes new export-bytes delta
Template 18.5kB ± 0% 18.5kB ± 0% ~ (all equal)
Unicode 7.92kB ± 0% 7.92kB ± 0% ~ (all equal)
GoTypes 35.0kB ± 0% 35.0kB ± 0% ~ (all equal)
Compiler 109kB ± 0% 110kB ± 0% +0.72% (p=0.000 n=20+20)
SSA 137kB ± 0% 138kB ± 0% +0.58% (p=0.000 n=20+20)
Flate 4.89kB ± 0% 4.89kB ± 0% ~ (all equal)
GoParser 8.49kB ± 0% 8.49kB ± 0% ~ (all equal)
Reflect 11.4kB ± 0% 11.4kB ± 0% ~ (all equal)
Tar 10.5kB ± 0% 10.5kB ± 0% ~ (all equal)
XML 16.7kB ± 0% 16.7kB ± 0% ~ (all equal)
name old text-bytes new text-bytes delta
HelloSize 761kB ± 0% 761kB ± 0% ~ (all equal)
CmdGoSize 10.8MB ± 0% 10.8MB ± 0% ~ (all equal)
name old data-bytes new data-bytes delta
HelloSize 10.7kB ± 0% 10.7kB ± 0% ~ (all equal)
CmdGoSize 312kB ± 0% 312kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 122kB ± 0% 122kB ± 0% ~ (all equal)
CmdGoSize 146kB ± 0% 146kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.13MB ± 0% 1.13MB ± 0% ~ (all equal)
CmdGoSize 15.1MB ± 0% 15.1MB ± 0% ~ (all equal)
Change-Id: I3cc2f9829a109543d9a68be4a21775d2d3e9801f
Reviewed-on: https://go-review.googlesource.com/c/go/+/196557
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Before this change, wasm only used float variables with a size of 64 bits
and applied rounding to 32-bit precision where necessary. This change
adds proper 32-bit float variables.
Reduces the size of pkg/js_wasm by 254 bytes.
Change-Id: Ieabe846a8cb283d66def3cdf11e2523b3b31f345
Reviewed-on: https://go-review.googlesource.com/c/go/+/195117
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
Use the following (suboptimal) script to obtain a list of possible
typos:
#!/usr/bin/env sh
set -x
git ls-files |\
    grep -e '\.\(c\|cc\|go\)$' |\
    xargs -n 1\
    awk\
    '/\/\// { gsub(/.*\/\//, ""); print; } /\/\*/, /\*\// { gsub(/.*\/\*/, ""); gsub(/\*\/.*/, ""); }' |\
    hunspell -d en_US -l |\
    grep '^[[:upper:]]\{0,1\}[[:lower:]]\{1,\}$' |\
    grep -v -e '^.\{1,4\}$' -e '^.\{16,\}$' |\
    sort -f |\
    uniq -c |\
    awk '$1 == 1 { print $2; }'
Then, go through the results manually and fix the most obvious typos in
the non-vendored code.
Change-Id: I3cb5830a176850e1a0584b8a40b47bde7b260eae
Reviewed-on: https://go-review.googlesource.com/c/go/+/193848
Reviewed-by: Robert Griesemer <gri@golang.org>
|
|
We shouldn't mask to desired registers if we haven't masked out all the
forbidden registers yet. In this path we haven't masked out the nospill
registers yet. If the resulting mask contains only nospill registers, then
allocReg fails.
This can only happen on resultNotInArgs-marked instructions, which exist
only on the ARM64, MIPS, MIPS64, and PPC64 ports.
Maybe there's a better way to handle resultNotInArgs instructions.
But for 1.13, this is a low-risk fix.
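A sketch of why the masking order matters, assuming register masks are plain bit sets and using hypothetical helper names. Removing nospill registers before narrowing to the desired set keeps a usable register, whereas narrowing first can leave only a nospill register, which is then masked out:

```go
package main

import "fmt"

type regMask uint64

// chooseMask narrows an output register mask. Forbidden (nospill)
// registers are removed first; only then is the mask narrowed to the
// desired registers, and only if that leaves at least one candidate.
// Names are hypothetical; this is not the compiler's code.
func chooseMask(outputs, nospill, desired regMask) regMask {
	m := outputs &^ nospill
	if m&desired != 0 {
		m &= desired
	}
	return m
}

// buggyChooseMask applies the desired mask first, so it can end up with
// only nospill registers, which then leaves nothing to allocate.
func buggyChooseMask(outputs, nospill, desired regMask) regMask {
	m := outputs
	if m&desired != 0 {
		m &= desired
	}
	return m &^ nospill
}

func main() {
	outputs := regMask(0b1110) // R1, R2, R3 may hold the result
	nospill := regMask(0b0010) // R1 holds an input that must not be clobbered
	desired := regMask(0b0010) // but R1 is where the result is wanted
	fmt.Printf("%04b %04b\n",
		chooseMask(outputs, nospill, desired),
		buggyChooseMask(outputs, nospill, desired)) // 1100 0000
}
```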
Fixes #33355
Change-Id: I1082f78f798d1371bde65c58cc265540480e4fa4
Reviewed-on: https://go-review.googlesource.com/c/go/+/188178
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
|
|
Updates #27739: reduces package ssa's allocated space by 3.77%.
maxrss is harder to measure, but using the best of three runs
as reported by /usr/bin/time -l, I see a ~2% reduction in maxrss.
We still have a long way to go, though; the new maxrss is still 1.1GB.
name old alloc/op new alloc/op delta
Template 38.8MB ± 0% 37.7MB ± 0% -2.77% (p=0.008 n=5+5)
Unicode 28.2MB ± 0% 28.1MB ± 0% -0.20% (p=0.008 n=5+5)
GoTypes 131MB ± 0% 127MB ± 0% -2.94% (p=0.008 n=5+5)
Compiler 606MB ± 0% 587MB ± 0% -3.21% (p=0.008 n=5+5)
SSA 2.14GB ± 0% 2.06GB ± 0% -3.77% (p=0.008 n=5+5)
Flate 24.0MB ± 0% 23.3MB ± 0% -3.00% (p=0.008 n=5+5)
GoParser 28.8MB ± 0% 28.1MB ± 0% -2.61% (p=0.008 n=5+5)
Reflect 83.8MB ± 0% 81.5MB ± 0% -2.71% (p=0.008 n=5+5)
Tar 36.4MB ± 0% 35.4MB ± 0% -2.73% (p=0.008 n=5+5)
XML 47.9MB ± 0% 46.7MB ± 0% -2.49% (p=0.008 n=5+5)
[Geo mean] 84.6MB 82.4MB -2.65%
name old allocs/op new allocs/op delta
Template 379k ± 0% 379k ± 0% -0.05% (p=0.008 n=5+5)
Unicode 340k ± 0% 340k ± 0% ~ (p=0.151 n=5+5)
GoTypes 1.36M ± 0% 1.36M ± 0% -0.06% (p=0.008 n=5+5)
Compiler 5.49M ± 0% 5.48M ± 0% -0.03% (p=0.008 n=5+5)
SSA 17.5M ± 0% 17.5M ± 0% -0.03% (p=0.008 n=5+5)
Flate 235k ± 0% 235k ± 0% -0.04% (p=0.008 n=5+5)
GoParser 302k ± 0% 302k ± 0% -0.04% (p=0.008 n=5+5)
Reflect 976k ± 0% 975k ± 0% -0.10% (p=0.008 n=5+5)
Tar 352k ± 0% 352k ± 0% -0.06% (p=0.008 n=5+5)
XML 436k ± 0% 436k ± 0% -0.03% (p=0.008 n=5+5)
[Geo mean] 842k 841k -0.04%
Change-Id: I0ab6631b5a0bb6303c291dcb0367b586a4e584fb
Reviewed-on: https://go-review.googlesource.com/c/go/+/176221
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
Currently, runtime.KeepAlive applied on a stack object doesn't
actually keep the stack object alive, and the heap object
referenced from it could be collected. This is because the
address of the stack object is rematerializeable, and we just
ignored KeepAlive on rematerializeable values. This CL fixes it.
Fixes #30476.
Change-Id: Ic1f75ee54ed94ea79bd46a8ddcd9e81d01556d1d
Reviewed-on: https://go-review.googlesource.com/c/164537
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Document what the fields of regalloc mean.
Hopefully will help people understand how the register allocator works.
Change-Id: Ic322ed2019cc839b812740afe8cd2cf0b61da046
Reviewed-on: https://go-review.googlesource.com/137016
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
Makes code simpler and faster (at least on x86).
name old time/op new time/op delta
CountRegs-8 7.40ns ± 1% 0.59ns ± 0% -92.02% (p=0.000 n=9+9)
PickReg/(1<<0)-8 2.07ns ± 0% 0.37ns ± 0% -82.13% (p=0.000 n=9+10)
PickReg/(1<<16)-8 11.8ns ± 0% 0.4ns ± 0% -96.86% (p=0.002 n=8+10)
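The benchmark names (CountRegs, PickReg) suggest that bit-at-a-time loops over register sets were replaced with `math/bits` operations. The following sketch works under that assumption, using a hypothetical `regMask` type, and compares a loop against `bits.OnesCount64` and `bits.TrailingZeros64`:

```go
package main

import (
	"fmt"
	"math/bits"
)

// regMask is a bit set of registers; bit i set means register i is in the set.
type regMask uint64

// countRegsLoop counts registers one bit at a time.
func countRegsLoop(r regMask) int {
	n := 0
	for r != 0 {
		n += int(r & 1)
		r >>= 1
	}
	return n
}

// countRegs uses a population count, typically a single instruction.
func countRegs(r regMask) int {
	return bits.OnesCount64(uint64(r))
}

// pickReg returns the lowest-numbered register in a non-empty set.
func pickReg(r regMask) int {
	if r == 0 {
		panic("can't pick a register from an empty set")
	}
	return bits.TrailingZeros64(uint64(r))
}

func main() {
	m := regMask(1<<16 | 1<<20)
	fmt.Println(countRegsLoop(m), countRegs(m), pickReg(m)) // 2 2 16
}
```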
Change-Id: Ic780b615b75c25b6e7632a0de93b16a8e9ed0f8f
Reviewed-on: https://go-review.googlesource.com/120318
Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
On architectures where G is stored in a register, it is
possible for a variable to be allocated to it, and subsequently
that variable may be spilled and reloaded, for example
because of an intervening call. If such an allocation
reaches a join point and it is the primary predecessor,
it becomes the target of a reload, which is usually, but
not always, right.
Fix: guard all the LoadReg ops, and spill the value in the G
register (if any) before merges (in the same way that 387
FP registers are freed between blocks).
Includes test.
Fixes #25504.
Change-Id: I0482a53e20970c7315bf09c0e407ae5bba2fe05d
Reviewed-on: https://go-review.googlesource.com/114695
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Go's SSA instructions only operate on registers. For example, an add
instruction would read two registers, do the addition and then write
to a register. WebAssembly's instructions, on the other hand, operate
on the stack. The add instruction first pops two values from the stack,
does the addition, then pushes the result to the stack. To fulfill
Go's semantics, one needs to map Go's single add instruction to
4 WebAssembly instructions:
- Push the value of local variable A to the stack
- Push the value of local variable B to the stack
- Do addition
- Write value from stack to local variable C
Now consider that B was set to the constant 42 before the addition:
- Push constant 42 to the stack
- Write value from stack to local variable B
This works, but is inefficient. Instead, the stack is used directly
by inlining instructions if possible. With inlining it becomes:
- Push the value of local variable A to the stack (add)
- Push constant 42 to the stack (constant)
- Do addition (add)
- Write value from stack to local variable C (add)
Note that the two SSA instructions cannot be generated sequentially
anymore, because their WebAssembly instructions are interleaved.
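The two instruction sequences above can be checked with a toy stack machine. This is a sketch only: the `instr` type is hypothetical, and the opcode strings are modeled on WebAssembly text-format names rather than taken from the compiler:

```go
package main

import "fmt"

// instr is a hypothetical, minimal stack-machine instruction modeled on
// the WebAssembly operations described above.
type instr struct {
	op    string // "local.get", "local.set", "i64.const", "i64.add"
	local string // local variable name for get/set
	imm   int64  // immediate for const
}

// run executes prog against the given locals and returns how many
// instructions it contains.
func run(prog []instr, locals map[string]int64) int {
	var stack []int64
	pop := func() int64 {
		v := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		return v
	}
	for _, in := range prog {
		switch in.op {
		case "local.get":
			stack = append(stack, locals[in.local])
		case "local.set":
			locals[in.local] = pop()
		case "i64.const":
			stack = append(stack, in.imm)
		case "i64.add":
			b, a := pop(), pop()
			stack = append(stack, a+b)
		default:
			panic("unknown op " + in.op)
		}
	}
	return len(prog)
}

func main() {
	// Naive: B = 42 is written to a local, then C = A + B.
	naive := []instr{
		{op: "i64.const", imm: 42}, {op: "local.set", local: "B"},
		{op: "local.get", local: "A"}, {op: "local.get", local: "B"},
		{op: "i64.add"}, {op: "local.set", local: "C"},
	}
	// Inlined: the constant is pushed right where the add consumes it.
	inlined := []instr{
		{op: "local.get", local: "A"}, {op: "i64.const", imm: 42},
		{op: "i64.add"}, {op: "local.set", local: "C"},
	}
	l1 := map[string]int64{"A": 8}
	l2 := map[string]int64{"A": 8}
	n1, n2 := run(naive, l1), run(inlined, l2)
	fmt.Println(l1["C"], n1, l2["C"], n2) // 50 6 50 4
}
```

Both sequences compute the same result; the inlined one needs four instructions instead of six.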
Design doc: https://docs.google.com/document/d/131vjr4DH6JFnb-blm_uRdaC0_Nv3OUwjEY5qVCxCup4
Updates #18892
Change-Id: Ie35e1c0bebf4985fddda0d6330eb2066f9ad6dec
Reviewed-on: https://go-review.googlesource.com/103535
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
A new pass, run after SSA building (before any other
optimization), identifies the "first" ssa node for each
statement. Other "noise" nodes are tagged as being never
appropriate for a statement boundary (e.g., VarKill, VarDef,
Phi).
Rewrite, deadcode, cse, and nilcheck are modified to move
the statement boundaries forward whenever possible if a
boundary-tagged ssa value is removed; never-boundary nodes
are ignored in this search (some operations involving
constants are also tagged as never-boundary and also ignored
because they are likely to be moved or removed during
optimization).
Code generation treats all nodes except those explicitly
marked as statement boundaries as "not statement" nodes,
and floats statement boundaries to the beginning of each
same-line run of instructions found within a basic block.
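A minimal sketch of the floating step within a single block, assuming each instruction is reduced to a hypothetical (line, is_stmt) pair. Any statement mark inside a same-line run moves to the run's first instruction:

```go
package main

import "fmt"

// inst is a hypothetical instruction: a source line and whether it is
// marked as a statement boundary (is_stmt).
type inst struct {
	line   int
	isStmt bool
}

// floatStmts rewrites the marks within one basic block so that each
// maximal run of consecutive same-line instructions carries its mark (if
// any instruction in the run had one) on the run's first instruction.
func floatStmts(block []inst) {
	for start := 0; start < len(block); {
		end := start
		marked := false
		for end < len(block) && block[end].line == block[start].line {
			marked = marked || block[end].isStmt
			block[end].isStmt = false
			end++
		}
		block[start].isStmt = marked
		start = end
	}
}

func main() {
	b := []inst{{10, false}, {10, true}, {11, false}, {11, false}, {12, true}}
	floatStmts(b)
	fmt.Println(b) // [{10 true} {10 false} {11 false} {11 false} {12 true}]
}
```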
Line number html conversion was modified to make statement
boundary nodes a bit more obvious by prepending a "+".
The code in fuse.go that glued together the value slices
of two blocks produced a result that depended on the
former capacities (not lengths) of the two slices. This
causes differences in the 386 bootstrap, and also can
sometimes put values into an order that does a worse job
of preserving statement boundaries when values are removed.
Portions of two delve tests that had caught problems were
incorporated into ssa/debug_test.go. There are some
opportunities to do better with optimized code, but the
next-ing is not lying or overly jumpy.
Over 4 CLs, compilebench geomean measured binary size
increase of 3.5% and compile user time increase of 3.8%
(this is after optimization to reuse a sparse map instead
of creating multiple maps.)
This CL worsens the optimized-debugging experience with
Delve; we need to work with the delve team so that
they can use the is_stmt marks that we're emitting now.
The reference output changes from time to time depending
on other changes in the compiler, sometimes better,
sometimes worse.
This CL now includes a test ensuring that 99+% of the lines
in the Go command itself (a handy optimized binary) include
is_stmt markers.
Change-Id: I359c94e06843f1eb41f9da437bd614885aa9644a
Reviewed-on: https://go-review.googlesource.com/102435
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
|
|
CL 74410 added rules to combine consecutive byte loads and
stores when the byte order was little endian for ppc64le. This
is the corresponding change for bytes that are in big endian order.
These rules are all intended for a little endian target arch.
This adds new testcases in test/codegen/memcombine.go
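For reference, this is the kind of big-endian, byte-by-byte pattern such rules target. It is a sketch; whether a given function is actually combined into wider memory accesses depends on the compiler version and target architecture. Both helpers agree with `encoding/binary`:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// load32BE assembles a big-endian uint32 from four byte loads. On a
// little-endian target, rules like these can combine the pattern into
// one wider load with the bytes reordered; the Go source is unchanged.
func load32BE(b []byte) uint32 {
	_ = b[3] // bounds check hint: one check instead of four
	return uint32(b[3]) | uint32(b[2])<<8 | uint32(b[1])<<16 | uint32(b[0])<<24
}

// store32BE is the matching byte-by-byte store pattern.
func store32BE(b []byte, v uint32) {
	_ = b[3]
	b[0] = byte(v >> 24)
	b[1] = byte(v >> 16)
	b[2] = byte(v >> 8)
	b[3] = byte(v)
}

func main() {
	buf := make([]byte, 4)
	store32BE(buf, 0x01020304)
	fmt.Println(buf, load32BE(buf) == binary.BigEndian.Uint32(buf)) // [1 2 3 4] true
}
```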
Fixes #22496
Updates #24242
Benchmark improvement for encoding/binary:
name old time/op new time/op delta
ReadSlice1000Int32s-16 11.0µs ± 0% 9.0µs ± 0% -17.47% (p=0.029 n=4+4)
ReadStruct-16 2.47µs ± 1% 2.48µs ± 0% +0.67% (p=0.114 n=4+4)
ReadInts-16 642ns ± 1% 630ns ± 1% -2.02% (p=0.029 n=4+4)
WriteInts-16 654ns ± 0% 653ns ± 1% -0.08% (p=0.629 n=4+4)
WriteSlice1000Int32s-16 8.75µs ± 0% 8.20µs ± 0% -6.19% (p=0.029 n=4+4)
PutUint16-16 1.16ns ± 0% 0.93ns ± 0% -19.83% (p=0.029 n=4+4)
PutUint32-16 1.16ns ± 0% 0.93ns ± 0% -19.83% (p=0.029 n=4+4)
PutUint64-16 1.85ns ± 0% 0.93ns ± 0% -49.73% (p=0.029 n=4+4)
LittleEndianPutUint16-16 1.03ns ± 0% 0.93ns ± 0% -9.71% (p=0.029 n=4+4)
LittleEndianPutUint32-16 0.93ns ± 0% 0.93ns ± 0% ~ (all equal)
LittleEndianPutUint64-16 0.93ns ± 0% 0.93ns ± 0% ~ (all equal)
PutUvarint32-16 43.0ns ± 0% 43.1ns ± 0% +0.12% (p=0.429 n=4+4)
PutUvarint64-16 174ns ± 0% 175ns ± 0% +0.29% (p=0.429 n=4+4)
The functions in gcm.go were updated so that these rules match them. An existing
test case prevents these functions from being replaced by those in encoding/binary
due to import dependencies.
Change-Id: Idb3bd1e6e7b12d86cd828fb29cb095848a3e485a
Reviewed-on: https://go-review.googlesource.com/98136
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
R11 is only used as a temporary by a very small set of instructions
(DIV, MOD, MULH and extended MVC/XC instructions). By marking these
instructions as clobbering R11 we can allocate R11 in the general
case.
Change-Id: I0d4ffe80e57c164d42a5ea5ef6308756a5b0f742
Reviewed-on: https://go-review.googlesource.com/110255
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
Before:
live values at end of each block
b1: v3 v2 v7 avoid=0
b2: v3 v13 avoid=81
b3: v19[AX] v3 avoid=81
b6: avoid=0
b7: avoid=0
b5: avoid=0
b4: v3 v18 avoid=81
After:
live values at end of each block
b1: v3 v2 v7
b2: v3 v13 avoid=AX DI
b3: v19[AX] v3 avoid=AX DI
b6:
b7:
b5:
b4: v3 v18 avoid=AX DI
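The old `avoid=81` is consistent with the mask being printed in hexadecimal: 0x81 has bits 0 and 7 set. Assuming amd64 general-purpose register numbering that starts AX, CX, DX, BX, SP, BP, SI, DI (an assumption for illustration), that decodes to AX and DI, matching the new output. A sketch of such a `String` method on a hypothetical `regMask` type:

```go
package main

import (
	"fmt"
	"strings"
)

type regMask uint64

// regNames lists register names by bit position. This assumes the amd64
// general-purpose order AX, CX, DX, BX, SP, BP, SI, DI; it is illustrative
// and not taken from the compiler.
var regNames = []string{"AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"}

// String renders a mask as space-separated register names.
func (m regMask) String() string {
	var names []string
	for i, name := range regNames {
		if m&(1<<uint(i)) != 0 {
			names = append(names, name)
		}
	}
	return strings.Join(names, " ")
}

func main() {
	avoid := regMask(0x81)
	fmt.Printf("avoid=%x\n", uint64(avoid)) // old style: avoid=81
	fmt.Printf("avoid=%v\n", avoid)         // new style: avoid=AX DI
}
```

The `uint64` conversion in the old-style `Printf` matters: once `String` is defined, `%x` would otherwise hex-encode the string's bytes.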
Change-Id: Ibec5c76a16151832b8d49a21c640699fdc9a9d28
Reviewed-on: https://go-review.googlesource.com/109000
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
Currently, each architecture lowers OpConvert to an arch-specific
OpXXXconvert. This is silly because OpConvert means the same thing on
all architectures and is logically a no-op that exists only to keep
track of conversions to and from unsafe.Pointer. Furthermore, lowering
it makes it harder to recognize in other analyses, particularly
liveness analysis.
This CL eliminates the lowering of OpConvert, leaving it as the
generic op until code generation time.
The main complexity here is that we still need to register-allocate
OpConvert operations. Currently, each arch's lowered OpConvert
specifies all GP registers in its register mask. Ideally, OpConvert
wouldn't affect value homing at all, and we could just copy the home
of OpConvert's source, but this can potentially home an OpConvert in a
LocalSlot, which neither regalloc nor stackalloc expect. Rather than
try to disentangle this assumption from regalloc and stackalloc, we
continue to register-allocate OpConvert, but teach regalloc that
OpConvert can be allocated to any allocatable GP register.
For #24543.
Change-Id: I795a6aee5fd94d4444a7bafac3838a400c9f7bb6
Reviewed-on: https://go-review.googlesource.com/108496
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
|
|
Stores to auto tmp variables can be hoisted to places
where the line numbers make debugging look "jumpy".
Turning those instructions into ones with is_stmt = 0 in
the DWARF (accomplished by marking ssa nodes with NotStmt)
makes debugging look better while still attributing the
instructions with the correct line number.
The same is true for certain register allocator spills and
reloads.
Change-Id: I97a394eb522d4911cc40b4bf5bf76d3d7221f6c0
Reviewed-on: https://go-review.googlesource.com/98415
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
This resolves a long-standing regalloc TODO:
If you must evict a register, choose to evict a register
containing a rematerializable value, since that value
won't need to be spilled.
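A sketch of that eviction preference, assuming a hypothetical `regState` record per register. The farthest-next-use fallback is a common spill heuristic included for completeness, not a claim about this CL:

```go
package main

import "fmt"

// regState is a hypothetical description of one candidate register.
type regState struct {
	name    string
	free    bool // register currently holds no value
	remat   bool // held value can be recomputed instead of spilled
	nextUse int  // distance to the held value's next use
}

// pickVictim chooses a register to (re)use from a non-empty candidate
// list. It prefers a free register, then one holding a rematerializable
// value (no spill needed), and otherwise spills the value whose next use
// is farthest away.
func pickVictim(regs []regState) (idx int, needSpill bool) {
	for i, r := range regs {
		if r.free {
			return i, false
		}
	}
	for i, r := range regs {
		if r.remat {
			return i, false
		}
	}
	best := 0
	for i, r := range regs {
		if r.nextUse > regs[best].nextUse {
			best = i
		}
	}
	return best, true
}

func main() {
	regs := []regState{
		{name: "AX", nextUse: 3},
		{name: "BX", remat: true, nextUse: 1}, // e.g. holds a constant
		{name: "CX", nextUse: 9},
	}
	i, spill := pickVictim(regs)
	fmt.Println(regs[i].name, spill) // BX false
}
```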
Provides very minor performance and size improvements.
name old time/op new time/op delta
BinaryTree17-8 2.20s ± 3% 2.18s ± 2% -0.77% (p=0.000 n=45+49)
Fannkuch11-8 2.14s ± 2% 2.15s ± 2% +0.73% (p=0.000 n=43+44)
FmtFprintfEmpty-8 30.6ns ± 4% 30.2ns ± 3% -1.14% (p=0.000 n=50+48)
FmtFprintfString-8 54.5ns ± 6% 53.6ns ± 5% -1.64% (p=0.001 n=50+48)
FmtFprintfInt-8 58.0ns ± 7% 57.6ns ± 4% ~ (p=0.220 n=50+50)
FmtFprintfIntInt-8 85.3ns ± 2% 84.8ns ± 3% -0.62% (p=0.001 n=44+47)
FmtFprintfPrefixedInt-8 93.9ns ± 6% 93.6ns ± 5% ~ (p=0.706 n=50+48)
FmtFprintfFloat-8 178ns ± 4% 177ns ± 4% ~ (p=0.107 n=49+50)
FmtManyArgs-8 376ns ± 4% 374ns ± 3% -0.58% (p=0.013 n=45+50)
GobDecode-8 4.77ms ± 2% 4.76ms ± 3% ~ (p=0.059 n=47+46)
GobEncode-8 4.04ms ± 2% 3.99ms ± 3% -1.13% (p=0.000 n=49+49)
Gzip-8 177ms ± 2% 180ms ± 3% +1.43% (p=0.000 n=48+48)
Gunzip-8 28.5ms ± 6% 28.3ms ± 5% ~ (p=0.104 n=50+49)
HTTPClientServer-8 72.1µs ± 1% 72.0µs ± 1% -0.15% (p=0.042 n=48+42)
JSONEncode-8 9.81ms ± 5% 10.03ms ± 6% +2.29% (p=0.000 n=50+49)
JSONDecode-8 39.2ms ± 3% 39.3ms ± 2% ~ (p=0.095 n=49+49)
Mandelbrot200-8 3.48ms ± 2% 3.46ms ± 2% -0.80% (p=0.000 n=47+48)
GoParse-8 2.54ms ± 3% 2.51ms ± 3% -1.35% (p=0.000 n=49+49)
RegexpMatchEasy0_32-8 66.0ns ± 7% 65.7ns ± 8% ~ (p=0.331 n=50+50)
RegexpMatchEasy0_1K-8 155ns ± 4% 154ns ± 4% ~ (p=0.986 n=49+50)
RegexpMatchEasy1_32-8 62.6ns ± 8% 62.2ns ± 5% ~ (p=0.395 n=50+49)
RegexpMatchEasy1_1K-8 260ns ± 5% 255ns ± 3% -1.92% (p=0.000 n=49+49)
RegexpMatchMedium_32-8 92.9ns ± 2% 91.8ns ± 2% -1.25% (p=0.000 n=46+48)
RegexpMatchMedium_1K-8 27.7µs ± 3% 27.0µs ± 2% -2.59% (p=0.000 n=49+49)
RegexpMatchHard_32-8 1.23µs ± 4% 1.21µs ± 2% -2.16% (p=0.000 n=49+44)
RegexpMatchHard_1K-8 36.4µs ± 2% 35.7µs ± 2% -1.87% (p=0.000 n=48+49)
Revcomp-8 274ms ± 2% 276ms ± 3% +0.70% (p=0.034 n=45+48)
Template-8 45.1ms ± 8% 45.1ms ± 8% ~ (p=0.643 n=50+50)
TimeParse-8 223ns ± 2% 223ns ± 2% ~ (p=0.401 n=47+47)
TimeFormat-8 245ns ± 2% 246ns ± 3% ~ (p=0.758 n=49+50)
[Geo mean] 36.5µs 36.3µs -0.54%
name old object-bytes new object-bytes delta
Template 480kB ± 0% 480kB ± 0% ~ (all equal)
Unicode 214kB ± 0% 214kB ± 0% ~ (all equal)
GoTypes 1.54MB ± 0% 1.54MB ± 0% -0.03% (p=0.008 n=5+5)
Compiler 5.75MB ± 0% 5.75MB ± 0% ~ (all equal)
SSA 14.6MB ± 0% 14.6MB ± 0% -0.01% (p=0.008 n=5+5)
Flate 300kB ± 0% 300kB ± 0% -0.01% (p=0.008 n=5+5)
GoParser 366kB ± 0% 366kB ± 0% ~ (all equal)
Reflect 1.20MB ± 0% 1.20MB ± 0% ~ (all equal)
Tar 413kB ± 0% 413kB ± 0% ~ (all equal)
XML 529kB ± 0% 528kB ± 0% -0.13% (p=0.008 n=5+5)
[Geo mean] 909kB 909kB -0.02%
Change-Id: I46d37a55197683a98913f35801dc2b0d609653c8
Reviewed-on: https://go-review.googlesource.com/103240
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
Fixes #24132
name old time/op new time/op delta
BinaryTree17-8 2.18s ± 2% 2.15s ± 2% -1.28% (p=0.000 n=25+26)
Fannkuch11-8 2.16s ± 3% 2.13s ± 3% -1.54% (p=0.000 n=27+30)
FmtFprintfEmpty-8 29.9ns ± 3% 29.6ns ± 3% -1.08% (p=0.001 n=29+26)
FmtFprintfString-8 53.6ns ± 2% 54.0ns ± 4% ~ (p=0.193 n=28+29)
FmtFprintfInt-8 56.8ns ± 3% 57.0ns ± 3% ~ (p=0.330 n=29+29)
FmtFprintfIntInt-8 85.3ns ± 2% 85.8ns ± 3% +0.56% (p=0.042 n=30+29)
FmtFprintfPrefixedInt-8 94.1ns ± 5% 99.0ns ± 8% +5.20% (p=0.000 n=27+30)
FmtFprintfFloat-8 183ns ± 4% 182ns ± 3% ~ (p=0.619 n=30+26)
FmtManyArgs-8 369ns ± 2% 369ns ± 2% ~ (p=0.748 n=27+29)
GobDecode-8 4.78ms ± 2% 4.75ms ± 1% ~ (p=0.051 n=28+27)
GobEncode-8 4.06ms ± 3% 4.07ms ± 3% ~ (p=0.781 n=29+30)
Gzip-8 178ms ± 2% 177ms ± 2% ~ (p=0.171 n=29+30)
Gunzip-8 28.2ms ± 7% 28.0ms ± 4% ~ (p=0.155 n=30+30)
HTTPClientServer-8 71.5µs ± 3% 71.3µs ± 1% ~ (p=0.913 n=25+27)
JSONEncode-8 9.71ms ± 5% 9.86ms ± 4% +1.55% (p=0.015 n=28+30)
JSONDecode-8 38.8ms ± 2% 39.3ms ± 2% +1.41% (p=0.000 n=28+29)
Mandelbrot200-8 3.47ms ± 6% 3.44ms ± 3% ~ (p=0.183 n=28+28)
GoParse-8 2.55ms ± 2% 2.54ms ± 3% -0.58% (p=0.003 n=27+29)
RegexpMatchEasy0_32-8 66.0ns ± 5% 65.3ns ± 4% ~ (p=0.124 n=30+30)
RegexpMatchEasy0_1K-8 152ns ± 2% 152ns ± 3% ~ (p=0.881 n=30+30)
RegexpMatchEasy1_32-8 62.9ns ± 9% 62.7ns ± 7% ~ (p=0.717 n=30+30)
RegexpMatchEasy1_1K-8 263ns ± 3% 263ns ± 4% ~ (p=0.909 n=30+29)
RegexpMatchMedium_32-8 93.4ns ± 3% 89.3ns ± 2% -4.32% (p=0.000 n=29+29)
RegexpMatchMedium_1K-8 27.5µs ± 3% 27.1µs ± 2% -1.46% (p=0.000 n=30+27)
RegexpMatchHard_32-8 1.33µs ± 3% 1.31µs ± 3% -1.50% (p=0.000 n=27+28)
RegexpMatchHard_1K-8 39.4µs ± 2% 39.1µs ± 2% -0.54% (p=0.027 n=28+28)
Revcomp-8 274ms ± 4% 276ms ± 2% +0.67% (p=0.048 n=29+28)
Template-8 45.1ms ± 5% 44.6ms ± 7% -1.22% (p=0.029 n=30+29)
TimeParse-8 227ns ± 3% 224ns ± 3% -1.25% (p=0.000 n=28+27)
TimeFormat-8 248ns ± 3% 245ns ± 3% -1.33% (p=0.002 n=30+29)
[Geo mean] 36.6µs 36.5µs -0.32%
Change-Id: I24083f0013506b77e2d9da99c40ae2f67803285e
Reviewed-on: https://go-review.googlesource.com/101076
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
name old time/op new time/op delta
Template 281ms ± 2% 282ms ± 3% ~ (p=0.428 n=19+20)
Unicode 138ms ± 6% 138ms ± 7% ~ (p=0.813 n=19+20)
GoTypes 901ms ± 2% 895ms ± 2% ~ (p=0.050 n=19+20)
Compiler 4.25s ± 1% 4.23s ± 1% -0.31% (p=0.031 n=19+18)
SSA 9.77s ± 1% 9.78s ± 1% ~ (p=0.512 n=20+20)
Flate 187ms ± 3% 187ms ± 4% ~ (p=0.687 n=20+19)
GoParser 224ms ± 4% 222ms ± 3% ~ (p=0.301 n=20+20)
Reflect 576ms ± 2% 576ms ± 2% ~ (p=0.620 n=20+20)
Tar 262ms ± 3% 263ms ± 3% ~ (p=0.599 n=19+18)
XML 322ms ± 4% 322ms ± 2% ~ (p=0.512 n=20+20)
name old user-time/op new user-time/op delta
Template 403ms ± 3% 399ms ± 5% ~ (p=0.149 n=17+20)
Unicode 217ms ±12% 217ms ± 9% ~ (p=0.883 n=20+20)
GoTypes 1.24s ± 3% 1.24s ± 3% ~ (p=0.718 n=20+20)
Compiler 5.90s ± 3% 5.84s ± 5% ~ (p=0.217 n=18+20)
SSA 14.0s ± 6% 14.1s ± 5% ~ (p=0.235 n=19+20)
Flate 253ms ± 6% 254ms ± 5% ~ (p=0.749 n=20+19)
GoParser 309ms ± 7% 307ms ± 5% ~ (p=0.398 n=20+20)
Reflect 772ms ± 3% 771ms ± 3% ~ (p=0.901 n=20+19)
Tar 368ms ± 5% 369ms ± 8% ~ (p=0.429 n=20+20)
XML 435ms ± 5% 434ms ± 5% ~ (p=0.841 n=20+20)
name old alloc/op new alloc/op delta
Template 39.0MB ± 0% 38.9MB ± 0% -0.21% (p=0.000 n=20+19)
Unicode 29.0MB ± 0% 29.0MB ± 0% -0.03% (p=0.000 n=20+20)
GoTypes 116MB ± 0% 115MB ± 0% -0.33% (p=0.000 n=20+20)
Compiler 498MB ± 0% 496MB ± 0% -0.37% (p=0.000 n=19+20)
SSA 1.41GB ± 0% 1.40GB ± 0% -0.24% (p=0.000 n=20+20)
Flate 25.0MB ± 0% 25.0MB ± 0% -0.22% (p=0.000 n=20+19)
GoParser 31.0MB ± 0% 30.9MB ± 0% -0.23% (p=0.000 n=20+17)
Reflect 77.1MB ± 0% 77.0MB ± 0% -0.12% (p=0.000 n=20+20)
Tar 39.7MB ± 0% 39.6MB ± 0% -0.17% (p=0.000 n=20+20)
XML 44.9MB ± 0% 44.8MB ± 0% -0.29% (p=0.000 n=20+20)
name old allocs/op new allocs/op delta
Template 386k ± 0% 385k ± 0% -0.28% (p=0.000 n=20+20)
Unicode 337k ± 0% 336k ± 0% -0.07% (p=0.000 n=20+20)
GoTypes 1.20M ± 0% 1.20M ± 0% -0.41% (p=0.000 n=20+20)
Compiler 4.71M ± 0% 4.68M ± 0% -0.52% (p=0.000 n=20+20)
SSA 11.7M ± 0% 11.6M ± 0% -0.31% (p=0.000 n=20+19)
Flate 238k ± 0% 237k ± 0% -0.28% (p=0.000 n=18+20)
GoParser 320k ± 0% 319k ± 0% -0.34% (p=0.000 n=20+19)
Reflect 961k ± 0% 959k ± 0% -0.12% (p=0.000 n=20+20)
Tar 397k ± 0% 396k ± 0% -0.23% (p=0.000 n=20+20)
XML 419k ± 0% 417k ± 0% -0.39% (p=0.000 n=20+19)
Change-Id: Ic7ec3614808d9892c1cab3991b996b7a3b8eff21
Reviewed-on: https://go-review.googlesource.com/102676
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Compilebench:
name      old time/op       new time/op       delta
Template   283ms ± 3%        281ms ± 4%       ~       (p=0.242 n=20+20)
Unicode    137ms ± 6%        135ms ± 6%       ~       (p=0.194 n=20+19)
GoTypes    890ms ± 2%        883ms ± 1%       -0.74%  (p=0.001 n=19+19)
Compiler   4.21s ± 2%        4.20s ± 2%       -0.40%  (p=0.033 n=20+19)
SSA        9.86s ± 2%        9.68s ± 1%       -1.80%  (p=0.000 n=20+19)
Flate      185ms ± 5%        185ms ± 7%       ~       (p=0.429 n=20+20)
GoParser   222ms ± 3%        222ms ± 4%       ~       (p=0.588 n=19+20)
Reflect    572ms ± 2%        570ms ± 3%       ~       (p=0.113 n=19+20)
Tar        263ms ± 4%        259ms ± 2%       -1.41%  (p=0.013 n=20+20)
XML        321ms ± 2%        321ms ± 4%       ~       (p=0.835 n=20+19)
name      old user-time/op  new user-time/op  delta
Template   400ms ± 5%        405ms ± 5%       ~       (p=0.096 n=20+20)
Unicode    217ms ± 8%        213ms ± 8%       ~       (p=0.242 n=20+20)
GoTypes    1.23s ± 3%        1.22s ± 3%       ~       (p=0.923 n=19+20)
Compiler   5.76s ± 6%        5.81s ± 2%       ~       (p=0.687 n=20+19)
SSA        14.2s ± 4%        14.0s ± 4%       ~       (p=0.121 n=20+20)
Flate      248ms ± 7%        251ms ±10%       ~       (p=0.369 n=20+20)
GoParser   308ms ± 5%        305ms ± 6%       ~       (p=0.336 n=19+20)
Reflect    771ms ± 2%        766ms ± 2%       ~       (p=0.113 n=20+19)
Tar        370ms ± 5%        362ms ± 7%       -2.06%  (p=0.036 n=19+20)
XML        435ms ± 4%        432ms ± 5%       ~       (p=0.369 n=20+20)
name      old alloc/op      new alloc/op      delta
Template  39.5MB ± 0%       39.4MB ± 0%       -0.20%  (p=0.000 n=20+20)
Unicode   29.1MB ± 0%       29.1MB ± 0%       ~       (p=0.064 n=20+20)
GoTypes    117MB ± 0%        117MB ± 0%       -0.17%  (p=0.000 n=20+20)
Compiler   503MB ± 0%        502MB ± 0%       -0.15%  (p=0.000 n=19+19)
SSA       1.42GB ± 0%       1.42GB ± 0%       -0.16%  (p=0.000 n=20+20)
Flate     25.3MB ± 0%       25.3MB ± 0%       -0.19%  (p=0.000 n=20+20)
GoParser  31.4MB ± 0%       31.3MB ± 0%       -0.14%  (p=0.000 n=20+18)
Reflect   78.1MB ± 0%       77.9MB ± 0%       -0.34%  (p=0.000 n=20+19)
Tar       40.1MB ± 0%       40.0MB ± 0%       -0.17%  (p=0.000 n=20+20)
XML       45.3MB ± 0%       45.2MB ± 0%       -0.13%  (p=0.000 n=20+20)
name      old allocs/op     new allocs/op     delta
Template    393k ± 0%         392k ± 0%       -0.21%  (p=0.000 n=20+19)
Unicode     337k ± 0%         337k ± 0%       -0.02%  (p=0.000 n=20+20)
GoTypes    1.22M ± 0%        1.22M ± 0%       -0.21%  (p=0.000 n=20+20)
Compiler   4.77M ± 0%        4.76M ± 0%       -0.16%  (p=0.000 n=20+20)
SSA        11.8M ± 0%        11.8M ± 0%       -0.12%  (p=0.000 n=20+20)
Flate       242k ± 0%         241k ± 0%       -0.20%  (p=0.000 n=20+20)
GoParser    324k ± 0%         324k ± 0%       -0.14%  (p=0.000 n=20+20)
Reflect     985k ± 0%         981k ± 0%       -0.38%  (p=0.000 n=20+20)
Tar         403k ± 0%         402k ± 0%       -0.19%  (p=0.000 n=20+20)
XML         424k ± 0%         424k ± 0%       -0.16%  (p=0.000 n=19+20)
Change-Id: I131e382b64cd6db11a9263a477d45d80c180c499
Reviewed-on: https://go-review.googlesource.com/102421
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
Currently we don't lift a spill out of a loop if the loop contains a call.
However, we often have code like this:
    for .. {
        if hard_case {
            call()
        }
        // simple case, without call
    }
So instead of checking for any call, check for an unavoidable call,
i.e. one that lies on every path through the loop body.
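The unavoidable-call test can be pictured as a search for a call-free path through one iteration of the loop. This is a simplified model, not the compiler's code: block, loop, and containsUnavoidableCall are hypothetical, and the sketch assumes an iteration ends when control either returns to the loop header or leaves the loop. For the example loop above, the path that skips call() makes the call avoidable, so the spill may still be lifted.

```go
package main

import "fmt"

// block is a hypothetical CFG node for this sketch (not the compiler's
// ssa.Block); successors are indices into a []block slice.
type block struct {
	succs   []int
	hasCall bool // true if the block contains a call
}

// loop is a hypothetical natural loop: a header plus its member blocks.
type loop struct {
	header int
	member map[int]bool // includes the header
}

// iterationEnd reports whether reaching b finishes an iteration of l:
// control either returns to the header or leaves the loop.
func (l loop) iterationEnd(b int) bool {
	return b == l.header || !l.member[b]
}

// containsUnavoidableCall reports whether every path through one iteration
// of l goes through a block with a call. It does a depth-first search for
// a call-free path from the header to an iteration end.
func containsUnavoidableCall(blocks []block, l loop) bool {
	if blocks[l.header].hasCall {
		return true
	}
	visited := map[int]bool{}
	var stack []int
	for _, s := range blocks[l.header].succs {
		switch {
		case s == l.header:
			return false // call-free self loop
		case !l.member[s]:
			// Exiting straight from the header skips the body entirely,
			// so it does not count as a call-free path through it.
		default:
			stack = append(stack, s)
		}
	}
	for len(stack) > 0 {
		cur := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if visited[cur] || blocks[cur].hasCall {
			continue // already explored, or this path hits a call
		}
		visited[cur] = true
		for _, s := range blocks[cur].succs {
			if l.iterationEnd(s) {
				return false // reached the end of an iteration without a call
			}
			stack = append(stack, s)
		}
	}
	return true
}

func main() {
	// The loop from the text: 0 = header, 1 = "if hard_case",
	// 2 = call(), 3 = simple case with the back edge, 4 = loop exit.
	blocks := []block{
		{succs: []int{1, 4}},
		{succs: []int{2, 3}},
		{succs: []int{3}, hasCall: true},
		{succs: []int{0}},
		{},
	}
	l := loop{header: 0, member: map[int]bool{0: true, 1: true, 2: true, 3: true}}
	fmt.Println(containsUnavoidableCall(blocks, l))
}
```

Marking only loops whose every iteration makes a call keeps the old, conservative behavior for loops like "for { call() }" while letting the common hard_case/simple_case pattern benefit.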
For the #22698 cases I see:
mime/quotedprintable/Writer-6 10.9µs ± 4% 9.2µs ± 3% -15.02% (p=0.000 n=8+8)
And:
compress/flate/Encode/Twain/Huffman/1e4-6  99.4µs ± 6%  90.9µs ± 0%  -8.57%  (p=0.000 n=8+8)
compress/flate/Encode/Twain/Huffman/1e5-6   760µs ± 1%   725µs ± 1%  -4.56%  (p=0.000 n=8+8)
compress/flate/Encode/Twain/Huffman/1e6-6  7.55ms ± 0%  7.24ms ± 0%  -4.07%  (p=0.000 n=8+7)
There are no significant changes in the go1 benchmarks.
But for cases with runtime arch checks, where we call the generic version on old hardware,
there are respectable performance gains:
math/RoundToEven-6       1.43ns ± 0%  1.25ns ± 0%  -12.59%  (p=0.001 n=7+7)
math/bits/OnesCount64-6  1.60ns ± 1%  1.42ns ± 1%  -11.32%  (p=0.000 n=8+8)
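A loop like the following illustrates the runtime arch check case. On amd64, when the POPCNT instruction cannot be assumed, the gc compiler guards the bits.OnesCount64 intrinsic with a CPU feature check and calls the generic Go implementation only on the fallback path, so the loop contains a call that is not executed on every iteration. popcountSum is a hypothetical name used only for illustration, and other architectures may lower the call differently.

```go
package main

import (
	"fmt"
	"math/bits"
)

// popcountSum is a hypothetical loop whose only call sits on a rarely
// taken path: on amd64 the OnesCount64 intrinsic is guarded by a CPU
// feature check, with a call to the generic code as the fallback.
func popcountSum(xs []uint64) int {
	n := 0
	for _, x := range xs {
		n += bits.OnesCount64(x)
	}
	return n
}

func main() {
	fmt.Println(popcountSum([]uint64{0, 1, 3, 0xFF, ^uint64(0)}))
}
```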
Also, in some runtime benchmarks, loops have fewer loads and higher performance:
runtime/RuneIterate/range1/ASCII-6  15.6ns ± 1%  13.9ns ± 1%  -10.74%  (p=0.000 n=7+8)
runtime/ArrayEqual-6                3.22ns ± 0%  2.86ns ± 2%  -11.06%  (p=0.000 n=7+8)
Fixes #22698
Updates #22234
Change-Id: I0ae2f19787d07a9026f064366dedbe601bf7257a
Reviewed-on: https://go-review.googlesource.com/84055
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
|