|
There are still two places in src/runtime/string.go that use
staticbytes, so we cannot delete it just yet.
There is a new codegen test to verify that the index calculation
is constant-folded, at least on amd64. ppc64, mips[64] and s390x
cannot currently do that.
There is also a new runtime benchmark to check that this does not
hurt performance (tested against the parent commit):
name                      old time/op  new time/op  delta
ConvT2EByteSized/bool-4   1.07ns ± 1%  1.07ns ± 1%  ~  (p=0.060 n=14+15)
ConvT2EByteSized/uint8-4  1.06ns ± 1%  1.07ns ± 1%  ~  (p=0.095 n=14+15)
Updates #37612
Change-Id: I5ec30738edaa48cda78dfab4a78e24a32fa7fd6a
Reviewed-on: https://go-review.googlesource.com/c/go/+/221957
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
|
|
Prior to this change, we avoided allocation when
converting 0 to an interface.
This change extends that optimization to larger value types
whose values happen to be in the range 0 to 255.
This is marginally more expensive in the case of a 0 value,
in that the address is computed rather than fixed.
name old time/op new time/op delta
ConvT2ESmall-8 2.36ns ± 4% 2.65ns ± 4% +12.23% (p=0.000 n=87+91)
ConvT2EUintptr-8 2.36ns ± 4% 2.84ns ± 6% +20.05% (p=0.000 n=96+99)
ConvT2ELarge-8 23.8ns ± 2% 23.1ns ± 3% -2.94% (p=0.000 n=93+95)
ConvT2ISmall-8 2.67ns ± 5% 2.74ns ±27% ~ (p=0.214 n=99+100)
ConvT2IUintptr-8 2.65ns ± 5% 2.46ns ± 5% -7.19% (p=0.000 n=98+98)
ConvT2ILarge-8 24.2ns ± 2% 23.5ns ± 4% -3.16% (p=0.000 n=91+97)
ConvT2Ezero/zero/16-8 2.79ns ± 6% 2.99ns ± 4% +7.52% (p=0.000 n=94+88)
ConvT2Ezero/zero/32-8 2.34ns ± 3% 2.65ns ± 3% +13.06% (p=0.000 n=92+98)
ConvT2Ezero/zero/64-8 2.35ns ± 4% 2.65ns ± 6% +12.86% (p=0.000 n=99+94)
ConvT2Ezero/zero/str-8 2.55ns ± 4% 2.54ns ± 4% ~ (p=0.063 n=97+99)
ConvT2Ezero/zero/slice-8 2.82ns ± 4% 2.85ns ± 5% +1.00% (p=0.000 n=99+95)
ConvT2Ezero/zero/big-8 94.3ns ± 5% 93.4ns ± 4% -0.94% (p=0.000 n=88+90)
ConvT2Ezero/nonzero/str-8 29.6ns ± 3% 27.7ns ± 3% -6.69% (p=0.000 n=98+97)
ConvT2Ezero/nonzero/slice-8 36.6ns ± 2% 37.1ns ± 2% +1.31% (p=0.000 n=94+90)
ConvT2Ezero/nonzero/big-8 93.4ns ± 3% 92.7ns ± 3% -0.74% (p=0.000 n=88+84)
ConvT2Ezero/smallint/16-8 13.3ns ± 4% 2.7ns ± 6% -79.82% (p=0.000 n=100+97)
ConvT2Ezero/smallint/32-8 12.5ns ± 1% 2.9ns ± 5% -77.17% (p=0.000 n=85+96)
ConvT2Ezero/smallint/64-8 14.7ns ± 3% 2.6ns ± 3% -82.05% (p=0.000 n=94+94)
ConvT2Ezero/largeint/16-8 14.0ns ± 4% 13.2ns ± 7% -5.44% (p=0.000 n=95+99)
ConvT2Ezero/largeint/32-8 12.8ns ± 4% 12.9ns ± 3% ~ (p=0.096 n=99+87)
ConvT2Ezero/largeint/64-8 15.5ns ± 2% 15.0ns ± 2% -3.46% (p=0.000 n=95+96)
An example of a program for which this makes a perceptible difference
is running the compiler with the -S flag:
name old time/op new time/op delta
Template 349ms ± 2% 344ms ± 2% -1.48% (p=0.000 n=23+25)
Unicode 138ms ± 4% 136ms ± 3% -1.67% (p=0.003 n=25+25)
GoTypes 1.25s ± 2% 1.24s ± 2% -1.11% (p=0.001 n=24+25)
Compiler 5.73s ± 2% 5.67s ± 2% -1.09% (p=0.002 n=25+24)
SSA 20.2s ± 2% 19.9s ± 2% -1.45% (p=0.000 n=25+23)
Flate 216ms ± 4% 210ms ± 2% -2.77% (p=0.000 n=25+24)
GoParser 283ms ± 2% 278ms ± 3% -1.58% (p=0.000 n=23+23)
Reflect 757ms ± 2% 745ms ± 2% -1.58% (p=0.000 n=25+25)
Tar 303ms ± 4% 296ms ± 2% -2.20% (p=0.000 n=22+23)
XML 415ms ± 2% 411ms ± 3% -0.94% (p=0.002 n=25+22)
[Geo mean] 726ms 715ms -1.59%
name old user-time/op new user-time/op delta
Template 434ms ± 3% 427ms ± 2% -1.66% (p=0.000 n=23+24)
Unicode 204ms ±12% 198ms ±12% -2.83% (p=0.032 n=25+25)
GoTypes 1.59s ± 2% 1.56s ± 2% -1.64% (p=0.000 n=22+25)
Compiler 7.50s ± 1% 7.40s ± 2% -1.32% (p=0.000 n=25+25)
SSA 27.2s ± 2% 26.8s ± 2% -1.50% (p=0.000 n=24+23)
Flate 266ms ± 6% 254ms ± 3% -4.38% (p=0.000 n=25+25)
GoParser 357ms ± 2% 351ms ± 2% -1.90% (p=0.000 n=24+23)
Reflect 966ms ± 2% 947ms ± 2% -1.94% (p=0.000 n=24+25)
Tar 387ms ± 2% 380ms ± 3% -1.83% (p=0.000 n=22+24)
XML 538ms ± 1% 532ms ± 1% -1.15% (p=0.000 n=24+20)
[Geo mean] 942ms 923ms -2.02%
name old alloc/op new alloc/op delta
Template 54.1MB ± 0% 52.9MB ± 0% -2.26% (p=0.000 n=25+25)
Unicode 33.5MB ± 0% 33.1MB ± 0% -1.03% (p=0.000 n=25+24)
GoTypes 189MB ± 0% 185MB ± 0% -2.27% (p=0.000 n=25+25)
Compiler 875MB ± 0% 858MB ± 0% -1.99% (p=0.000 n=23+25)
SSA 3.19GB ± 0% 3.13GB ± 0% -1.95% (p=0.000 n=25+25)
Flate 32.9MB ± 0% 32.2MB ± 0% -2.26% (p=0.000 n=25+25)
GoParser 44.0MB ± 0% 42.9MB ± 0% -2.33% (p=0.000 n=25+25)
Reflect 117MB ± 0% 114MB ± 0% -2.60% (p=0.000 n=25+25)
Tar 48.6MB ± 0% 47.5MB ± 0% -2.18% (p=0.000 n=25+24)
XML 65.7MB ± 0% 64.4MB ± 0% -1.96% (p=0.000 n=23+25)
[Geo mean] 118MB 115MB -2.08%
name old allocs/op new allocs/op delta
Template 1.07M ± 0% 0.92M ± 0% -14.29% (p=0.000 n=25+25)
Unicode 539k ± 0% 494k ± 0% -8.27% (p=0.000 n=25+25)
GoTypes 3.97M ± 0% 3.43M ± 0% -13.71% (p=0.000 n=24+25)
Compiler 17.6M ± 0% 15.4M ± 0% -12.69% (p=0.000 n=25+24)
SSA 66.1M ± 0% 58.1M ± 0% -12.17% (p=0.000 n=25+25)
Flate 629k ± 0% 536k ± 0% -14.73% (p=0.000 n=24+24)
GoParser 929k ± 0% 799k ± 0% -13.96% (p=0.000 n=25+25)
Reflect 2.49M ± 0% 2.11M ± 0% -15.28% (p=0.000 n=25+25)
Tar 919k ± 0% 788k ± 0% -14.30% (p=0.000 n=25+25)
XML 1.28M ± 0% 1.11M ± 0% -12.85% (p=0.000 n=24+25)
[Geo mean] 2.32M 2.01M -13.24%
There is a slight increase in binary size from this change:
file       before    after     Δ      %
addr2line  4307728   4307760   +32    +0.001%
api        5972680   5972728   +48    +0.001%
asm        5114200   5114232   +32    +0.001%
buildid    2843720   2847848   +4128  +0.145%
cgo        4823736   4827864   +4128  +0.086%
compile    24912056  24912104  +48    +0.000%
cover      5259800   5259832   +32    +0.001%
dist       3665080   3665128   +48    +0.001%
doc        4672712   4672744   +32    +0.001%
fix        3376952   3376984   +32    +0.001%
link       6618008   6622152   +4144  +0.063%
nm         4253280   4257424   +4144  +0.097%
objdump    4655376   4659504   +4128  +0.089%
pack       2294280   2294328   +48    +0.002%
pprof      14747476  14751620  +4144  +0.028%
test2json  2819320   2823448   +4128  +0.146%
trace      11665068  11669212  +4144  +0.036%
vet        8342360   8342408   +48    +0.001%
Change-Id: I38ef70244e23069bfd14334061d43ae22a294519
Reviewed-on: https://go-review.googlesource.com/c/go/+/216401
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Prior to this CL, all runtime conversions
from a concrete value to an interface went
through one of two runtime calls: convT2E or convT2I.
However, in practice, basic types are very common.
Specializing convT2x for those basic types allows
for a more efficient implementation for those types.
For basic scalars and strings, allocation and copying
can use the same methods as normal code.
For pointer-free types, allocation can occur without
zeroing, and copying can take place without GC calls.
For slices, copying is cheaper and simpler.
This CL adds twelve runtime routines:
convT2E16, convT2I16
convT2E32, convT2I32
convT2E64, convT2I64
convT2Estring, convT2Istring
convT2Eslice, convT2Islice
convT2Enoptr, convT2Inoptr
While compiling make.bash, 93% of all convT2x calls
are now to one of these specialized convT2x calls.
Within specialized convT2x routines, it is cheap to check
for a zero value, in a way that it is not in general.
When we detect a zero value there, we return a pointer
to zeroVal, rather than allocating.
name old time/op new time/op delta
ConvT2Ezero/zero/16-8 17.9ns ± 2% 3.0ns ± 3% -83.20% (p=0.000 n=56+56)
ConvT2Ezero/zero/32-8 17.8ns ± 2% 3.0ns ± 3% -83.15% (p=0.000 n=59+60)
ConvT2Ezero/zero/64-8 20.1ns ± 1% 3.0ns ± 2% -84.98% (p=0.000 n=57+57)
ConvT2Ezero/zero/str-8 32.6ns ± 1% 3.0ns ± 4% -90.70% (p=0.000 n=59+60)
ConvT2Ezero/zero/slice-8 36.7ns ± 2% 3.0ns ± 2% -91.78% (p=0.000 n=59+59)
ConvT2Ezero/zero/big-8 91.9ns ± 2% 85.9ns ± 2% -6.52% (p=0.000 n=57+57)
ConvT2Ezero/nonzero/16-8 17.7ns ± 2% 12.7ns ± 3% -28.38% (p=0.000 n=55+60)
ConvT2Ezero/nonzero/32-8 17.8ns ± 1% 12.7ns ± 1% -28.44% (p=0.000 n=54+57)
ConvT2Ezero/nonzero/64-8 20.0ns ± 1% 15.0ns ± 1% -24.90% (p=0.000 n=56+58)
ConvT2Ezero/nonzero/str-8 32.6ns ± 1% 25.7ns ± 1% -21.17% (p=0.000 n=58+55)
ConvT2Ezero/nonzero/slice-8 36.8ns ± 2% 30.4ns ± 1% -17.32% (p=0.000 n=60+52)
ConvT2Ezero/nonzero/big-8 92.1ns ± 2% 85.9ns ± 2% -6.70% (p=0.000 n=57+59)
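The zero-value check can be pictured with a small user-level model. This is only a sketch: zeroString stands in for the runtime's zeroVal, and boxString is a hypothetical name, not one of the routines added by this CL.

```go
package main

import "fmt"

// zeroString is a shared zero value that is never modified.
// It plays the role of zeroVal in this model.
var zeroString string

// boxString models a specialized conversion routine for strings: the zero
// value reuses the shared storage, anything else gets its own copy.
func boxString(val string) *string {
	if val == "" {
		return &zeroString // no allocation needed for the zero value
	}
	p := new(string) // a fresh copy, which may be heap-allocated
	*p = val
	return p
}

func main() {
	a, b := boxString(""), boxString("")
	c := boxString("go")
	fmt.Println(a == b, a == c, *c) // prints: true false go
}
```

The check is a single comparison because the routine knows the concrete type; a general routine would have to inspect memory of arbitrary size to learn the same thing.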
Benchmarks on a real program (the compiler):
name old time/op new time/op delta
Template 227ms ± 5% 221ms ± 2% -2.48% (p=0.000 n=30+26)
Unicode 102ms ± 5% 100ms ± 3% -1.30% (p=0.009 n=30+26)
GoTypes 656ms ± 5% 659ms ± 4% ~ (p=0.208 n=30+30)
Compiler 2.82s ± 2% 2.82s ± 1% ~ (p=0.614 n=29+27)
Flate 128ms ± 2% 128ms ± 5% ~ (p=0.783 n=27+28)
GoParser 158ms ± 3% 158ms ± 3% ~ (p=0.261 n=28+30)
Reflect 408ms ± 7% 401ms ± 3% ~ (p=0.075 n=30+30)
Tar 123ms ± 6% 121ms ± 8% ~ (p=0.287 n=29+30)
XML 220ms ± 2% 220ms ± 4% ~ (p=0.805 n=29+29)
name old user-ns/op new user-ns/op delta
Template 281user-ms ± 4% 279user-ms ± 3% -0.87% (p=0.044 n=28+28)
Unicode 142user-ms ± 4% 141user-ms ± 3% -1.04% (p=0.015 n=30+27)
GoTypes 884user-ms ± 3% 886user-ms ± 2% ~ (p=0.532 n=30+30)
Compiler 3.94user-s ± 3% 3.92user-s ± 1% ~ (p=0.185 n=30+28)
Flate 165user-ms ± 2% 165user-ms ± 4% ~ (p=0.780 n=27+29)
GoParser 209user-ms ± 2% 208user-ms ± 3% ~ (p=0.453 n=28+30)
Reflect 533user-ms ± 6% 526user-ms ± 3% ~ (p=0.057 n=30+30)
Tar 156user-ms ± 6% 154user-ms ± 6% ~ (p=0.133 n=29+30)
XML 288user-ms ± 4% 288user-ms ± 4% ~ (p=0.633 n=30+30)
name old alloc/op new alloc/op delta
Template 41.0MB ± 0% 40.9MB ± 0% -0.11% (p=0.000 n=29+29)
Unicode 32.6MB ± 0% 32.6MB ± 0% ~ (p=0.572 n=29+30)
GoTypes 122MB ± 0% 122MB ± 0% -0.10% (p=0.000 n=30+30)
Compiler 482MB ± 0% 481MB ± 0% -0.07% (p=0.000 n=30+29)
Flate 26.6MB ± 0% 26.6MB ± 0% ~ (p=0.096 n=30+30)
GoParser 32.7MB ± 0% 32.6MB ± 0% -0.06% (p=0.011 n=28+28)
Reflect 84.2MB ± 0% 84.1MB ± 0% -0.17% (p=0.000 n=29+30)
Tar 27.7MB ± 0% 27.7MB ± 0% -0.05% (p=0.032 n=27+28)
XML 44.7MB ± 0% 44.7MB ± 0% ~ (p=0.131 n=28+30)
name old allocs/op new allocs/op delta
Template 373k ± 1% 370k ± 1% -0.76% (p=0.000 n=30+30)
Unicode 325k ± 1% 325k ± 1% ~ (p=0.383 n=29+30)
GoTypes 1.16M ± 0% 1.15M ± 0% -0.75% (p=0.000 n=29+30)
Compiler 4.15M ± 0% 4.13M ± 0% -0.59% (p=0.000 n=30+29)
Flate 238k ± 1% 237k ± 1% -0.62% (p=0.000 n=30+30)
GoParser 304k ± 1% 302k ± 1% -0.64% (p=0.000 n=30+28)
Reflect 1.00M ± 0% 0.99M ± 0% -1.10% (p=0.000 n=29+30)
Tar 245k ± 1% 244k ± 1% -0.59% (p=0.000 n=27+29)
XML 391k ± 1% 389k ± 1% -0.59% (p=0.000 n=29+30)
Change-Id: Id7f456d690567c2b0a96b0d6d64de8784b6e305f
Reviewed-on: https://go-review.googlesource.com/36476
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Extend escape analysis to convT2E and convT2I. If the interface value
does not escape, supply the runtime with a stack buffer for the object copy.
This is a straight port from .c to .go of Dmitry's patch.
Change-Id: Ic315dd50d144d94dd3324227099c116be5ca70b6
Reviewed-on: https://go-review.googlesource.com/8201
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
|
|
Some type assertions of the form _, ok := i.(T) allow efficient inlining.
Such type assertions commonly show up in type switches.
For example, with this optimization, using 6g, the length of
encoding/binary's intDataSize function shrinks from 2224 to 1728 bytes (-22%).
benchmark                 old ns/op  new ns/op  delta
BenchmarkAssertI2E2Blank  4.67       0.82       -82.44%
BenchmarkAssertE2T2Blank  4.38       0.83       -81.05%
BenchmarkAssertE2E2Blank  3.88       0.83       -78.61%
BenchmarkAssertE2E2       14.2       14.4       +1.41%
BenchmarkAssertE2T2       10.3       10.4       +0.97%
BenchmarkAssertI2E2       13.4       13.3       -0.75%
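For reference, the shape of code that benefits looks like the sketch below. Both functions only ask whether the dynamic type matches and never use the asserted value. The function names are hypothetical; sizeOf is loosely modeled on the kind of type switch mentioned above and is not the actual encoding/binary code.

```go
package main

import "fmt"

// isString discards the asserted value and keeps only the ok result,
// so for a concrete type the check reduces to a dynamic type comparison.
func isString(v interface{}) bool {
	_, ok := v.(string)
	return ok
}

// sizeOf uses a type switch that never binds the value; each case needs
// only to know whether the dynamic type matches.
func sizeOf(v interface{}) int {
	switch v.(type) {
	case int8, uint8:
		return 1
	case int16, uint16:
		return 2
	case int32, uint32:
		return 4
	case int64, uint64:
		return 8
	}
	return 0
}

func main() {
	fmt.Println(isString("x"), isString(1))
	fmt.Println(sizeOf(uint16(7)), sizeOf(int64(7)), sizeOf("x"))
}
```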
Change-Id: Ie9798c3e85432bb8e0f2c723afc376e233639df7
Reviewed-on: https://go-review.googlesource.com/7697
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Consider an interface value i of type I and concrete value c of type C.
Prior to this CL, i==c was evaluated as
I(c) == i
Evaluating I(c) can allocate.
This CL changes the evaluation of i==c to
x, ok := i.(C); ok && x == c
The new generated code is shorter and does not allocate directly.
If C is small, as it is in every instance in the stdlib,
the new code also uses less stack space
and makes one runtime call instead of two.
If C is very large, the original implementation is used.
The cutoff for "very large" is 1<<16,
following the stack vs heap cutoff used elsewhere.
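The two evaluation strategies give the same answer, which the sketch below checks by writing both out by hand. The type code plays the role of C; eqDirect and eqViaAssert are names made up for this example, and the compiler performs the rewrite internally rather than through functions like these.

```go
package main

import "fmt"

// code plays the role of the concrete type C.
type code int

// eqDirect is the comparison as written in source: i == c.
func eqDirect(i interface{}, c code) bool {
	return i == c
}

// eqViaAssert spells out the rewritten form: assert the dynamic type
// first, then compare the concrete values.
func eqViaAssert(i interface{}, c code) bool {
	x, ok := i.(code)
	return ok && x == c
}

func main() {
	cases := []interface{}{code(3), code(4), 3, "3", nil}
	for _, i := range cases {
		// Both columns agree for every case, including a value of a
		// different dynamic type and a nil interface.
		fmt.Println(eqDirect(i, 3), eqViaAssert(i, 3))
	}
}
```

Only the first case compares equal: int(3) has a different dynamic type from code(3), so the assertion fails and the comparison is false without looking at the value.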
This kind of comparison occurs in 38 places in the stdlib,
mostly in the net and os packages.
benchmark                 old ns/op  new ns/op  delta
BenchmarkEqEfaceConcrete  29.5       7.92       -73.15%
BenchmarkEqIfaceConcrete  32.1       7.90       -75.39%
BenchmarkNeEfaceConcrete  29.9       7.90       -73.58%
BenchmarkNeIfaceConcrete  35.9       7.90       -77.99%
Fixes #9370.
Change-Id: I7c4555950bcd6406ee5c613be1f2128da2c9a2b7
Reviewed-on: https://go-review.googlesource.com/2096
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
|
|
Preparation was in CL 134570043.
This CL contains only the effect of 'hg mv src/pkg/* src'.
For more about the move, see golang.org/s/go14nopkg.
|