author | Ben Shi <powerman1st@163.com> | 2017-08-24 10:51:34 +0000 |
---|---|---|
committer | Cherry Zhang <cherryyz@google.com> | 2017-08-28 16:10:27 +0000 |
commit | a2f22a680317aa00cd82ac5c946a18db268a1025 (patch) | |
tree | f0dc03712a6ab04e3d65b87df4e23056a10f1b0b /src/cmd/compile/internal/arm | |
parent | 1fe1512f50585d461dd9f41d8b373da5ed66c99b (diff) | |
cmd/compile: optimize ARM with more efficient MOVB/MOVBU/MOVH/MOVHU
Like the indexed MOVW (MOVWloadidx/MOVWstoreidx) already used in the
current ARM backend, indexed MOVB/MOVBU/MOVH/MOVHU can also be used to
generate better-optimized ARM code.
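As a rough illustration of the kind of source code this affects, the sketch below does byte-sized loads and stores at a base address plus a variable index, loosely modelled on a reverse-complement loop. The names `complement` and `reverseComplement` are hypothetical and are not taken from the go1 benchmark. Whether the compiler actually selects the indexed MOVB/MOVBU forms here is an assumption; that depends on rewrite rules that are not part of this diff.

```go
package main

import "fmt"

// complement maps a nucleotide byte to its complement.
// Hypothetical lookup table, used only to illustrate byte-indexed access.
var complement [256]byte

func init() {
	for i := range complement {
		complement[i] = byte(i) // bytes without a complement map to themselves
	}
	complement['A'], complement['T'] = 'T', 'A'
	complement['C'], complement['G'] = 'G', 'C'
}

// reverseComplement reverses seq in place and complements every base.
// Each seq[i], seq[j] and complement[...] access is a byte load or store
// at base+index, the addressing shape an indexed byte move can encode.
func reverseComplement(seq []byte) {
	for i, j := 0, len(seq)-1; i <= j; i, j = i+1, j-1 {
		seq[i], seq[j] = complement[seq[j]], complement[seq[i]]
	}
}

func main() {
	seq := []byte("AACG")
	reverseComplement(seq)
	fmt.Println(string(seq)) // prints CGTT
}
```

To inspect what the compiler really emits for such a loop on ARM, one option is to build with `GOARCH=arm go build -gcflags=-S` and look at the generated assembly.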
This patch implements that optimization. Below are comparative test
results against the original Go compiler.
1. The total size of all .a files in pkg/ shrinks by 0.03%.
2. The compilecmp benchmark shows a slight slowdown.
name        old time/op    new time/op    delta
Template    2.35s ± 1%     2.37s ± 3%     +0.94%  (p=0.006 n=19+19)
Unicode     1.33s ± 3%     1.33s ± 2%     ~       (p=0.158 n=20+18)
GoTypes     7.86s ± 2%     7.84s ± 1%     ~       (p=0.284 n=19+18)
Compiler    37.5s ± 1%     37.7s ± 2%     ~       (p=0.101 n=20+19)
SSA         83.4s ± 2%     83.6s ± 2%     ~       (p=0.231 n=20+20)
Flate       1.46s ± 2%     1.45s ± 1%     ~       (p=0.097 n=20+17)
GoParser    1.86s ± 2%     1.86s ± 4%     ~       (p=0.738 n=20+20)
Reflect     5.10s ± 1%     5.11s ± 1%     ~       (p=0.290 n=20+18)
Tar         1.78s ± 2%     1.77s ± 2%     ~       (p=0.166 n=19+20)
XML         2.61s ± 2%     2.61s ± 2%     ~       (p=0.665 n=19+19)
[Geo mean]  4.67s          4.68s          +0.16%
name        old user-time/op  new user-time/op  delta
Template    2.79s ± 3%        2.80s ± 2%        ~       (p=0.662 n=20+20)
Unicode     1.62s ± 3%        1.64s ± 4%        ~       (p=0.252 n=20+20)
GoTypes     9.58s ± 2%        9.62s ± 2%        ~       (p=0.250 n=20+20)
Compiler    46.2s ± 1%        46.2s ± 1%        ~       (p=0.602 n=20+19)
SSA         108s ± 1%         108s ± 2%         ~       (p=0.242 n=18+20)
Flate       1.69s ± 3%        1.69s ± 4%        ~       (p=0.470 n=20+20)
GoParser    2.16s ± 3%        2.20s ± 4%        +1.70%  (p=0.005 n=19+20)
Reflect     6.02s ± 2%        6.02s ± 2%        ~       (p=0.700 n=20+17)
Tar         2.11s ± 2%        2.11s ± 3%        ~       (p=0.480 n=18+20)
XML         3.07s ± 2%        3.11s ± 4%        +1.50%  (p=0.043 n=20+20)
[Geo mean]  5.61s             5.64s             +0.55%
name        old text-bytes    new text-bytes    delta
HelloSize   586kB ± 0%        586kB ± 0%        ~       (all equal)
name        old data-bytes    new data-bytes    delta
HelloSize   5.46kB ± 0%       5.46kB ± 0%       ~       (all equal)
name        old bss-bytes     new bss-bytes     delta
HelloSize   72.9kB ± 0%       72.9kB ± 0%       ~       (all equal)
name        old exe-bytes     new exe-bytes     delta
HelloSize   1.03MB ± 0%       1.03MB ± 0%       ~       (all equal)
3. The go1 benchmark improves overall, with more than a 10%
improvement in the Revcomp test case.
name                     old time/op    new time/op    delta
BinaryTree17-4           42.0s ± 1%     41.5s ± 1%     -1.32%   (p=0.000 n=39+40)
Fannkuch11-4             24.1s ± 1%     23.6s ± 0%     -2.38%   (p=0.000 n=40+40)
FmtFprintfEmpty-4        843ns ± 0%     839ns ± 1%     -0.46%   (p=0.000 n=33+40)
FmtFprintfString-4       1.44µs ± 1%    1.37µs ± 1%    -5.48%   (p=0.000 n=40+35)
FmtFprintfInt-4          1.44µs ± 1%    1.41µs ± 2%    -1.50%   (p=0.000 n=40+40)
FmtFprintfIntInt-4       2.07µs ± 1%    2.06µs ± 0%    -0.78%   (p=0.000 n=40+40)
FmtFprintfPrefixedInt-4  2.50µs ± 1%    2.33µs ± 1%    -6.85%   (p=0.000 n=40+40)
FmtFprintfFloat-4        4.36µs ± 1%    4.34µs ± 0%    -0.39%   (p=0.017 n=40+40)
FmtManyArgs-4            8.11µs ± 0%    8.00µs ± 0%    -1.37%   (p=0.000 n=40+40)
GobDecode-4              105ms ± 2%     103ms ± 2%     -2.17%   (p=0.000 n=39+39)
GobEncode-4              90.1ms ± 2%    88.6ms ± 1%    -1.67%   (p=0.000 n=40+39)
Gzip-4                   4.18s ± 1%     4.09s ± 1%     -2.03%   (p=0.000 n=40+40)
Gunzip-4                 608ms ± 1%     603ms ± 1%     -0.86%   (p=0.000 n=40+34)
HTTPClientServer-4       674µs ± 3%     661µs ± 2%     -1.82%   (p=0.000 n=40+39)
JSONEncode-4             256ms ± 1%     243ms ± 0%     -5.11%   (p=0.000 n=39+31)
JSONDecode-4             915ms ± 1%     904ms ± 1%     -1.18%   (p=0.000 n=40+36)
Mandelbrot200-4          49.2ms ± 0%    49.3ms ± 0%    ~        (p=0.254 n=34+40)
GoParse-4                46.9ms ± 2%    46.9ms ± 1%    ~        (p=0.737 n=40+39)
RegexpMatchEasy0_32-4    1.28µs ± 1%    1.27µs ± 1%    -0.71%   (p=0.000 n=40+40)
RegexpMatchEasy0_1K-4    7.86µs ± 4%    7.67µs ± 4%    -2.46%   (p=0.000 n=38+40)
RegexpMatchEasy1_32-4    1.28µs ± 1%    1.28µs ± 1%    -0.54%   (p=0.000 n=40+40)
RegexpMatchEasy1_1K-4    10.4µs ± 2%    10.3µs ± 2%    -0.88%   (p=0.003 n=40+39)
RegexpMatchMedium_32-4   2.05µs ± 0%    2.04µs ± 0%    -0.34%   (p=0.000 n=40+33)
RegexpMatchMedium_1K-4   541µs ± 1%     535µs ± 1%     -1.02%   (p=0.000 n=40+38)
RegexpMatchHard_32-4     29.3µs ± 1%    29.1µs ± 1%    -0.51%   (p=0.000 n=40+40)
RegexpMatchHard_1K-4     881µs ± 1%     871µs ± 1%     -1.15%   (p=0.000 n=40+40)
Revcomp-4                81.7ms ± 2%    67.5ms ± 2%    -17.37%  (p=0.000 n=39+39)
Template-4               1.05s ± 1%     1.08s ± 2%     +3.67%   (p=0.000 n=40+40)
TimeParse-4              7.24µs ± 1%    7.09µs ± 1%    -2.13%   (p=0.000 n=40+40)
TimeFormat-4             13.2µs ± 1%    13.1µs ± 0%    -0.31%   (p=0.007 n=40+31)
[Geo mean]               733µs          718µs          -2.03%
name                     old speed        new speed        delta
GobDecode-4              7.28MB/s ± 2%    7.44MB/s ± 2%    +2.23%   (p=0.000 n=39+39)
GobEncode-4              8.52MB/s ± 2%    8.67MB/s ± 1%    +1.70%   (p=0.000 n=40+39)
Gzip-4                   4.65MB/s ± 1%    4.74MB/s ± 1%    +1.94%   (p=0.000 n=37+40)
Gunzip-4                 31.9MB/s ± 1%    32.2MB/s ± 1%    +0.90%   (p=0.000 n=40+36)
JSONEncode-4             7.57MB/s ± 1%    7.98MB/s ± 0%    +5.41%   (p=0.000 n=40+31)
JSONDecode-4             2.12MB/s ± 1%    2.15MB/s ± 1%    +1.23%   (p=0.000 n=40+40)
GoParse-4                1.23MB/s ± 1%    1.23MB/s ± 1%    ~        (p=0.769 n=39+40)
RegexpMatchEasy0_32-4    25.0MB/s ± 1%    25.2MB/s ± 1%    +0.71%   (p=0.000 n=40+40)
RegexpMatchEasy0_1K-4    130MB/s ± 5%     134MB/s ± 4%     +2.53%   (p=0.000 n=38+40)
RegexpMatchEasy1_32-4    24.9MB/s ± 1%    25.1MB/s ± 1%    +0.55%   (p=0.000 n=40+40)
RegexpMatchEasy1_1K-4    98.5MB/s ± 2%    99.4MB/s ± 2%    +0.88%   (p=0.003 n=40+39)
RegexpMatchMedium_32-4   490kB/s ± 0%     490kB/s ± 0%     ~        (all equal)
RegexpMatchMedium_1K-4   1.89MB/s ± 1%    1.91MB/s ± 1%    +1.02%   (p=0.000 n=40+38)
RegexpMatchHard_32-4     1.10MB/s ± 1%    1.10MB/s ± 0%    +0.41%   (p=0.000 n=40+33)
RegexpMatchHard_1K-4     1.16MB/s ± 1%    1.17MB/s ± 1%    +1.21%   (p=0.000 n=40+40)
Revcomp-4                31.1MB/s ± 2%    37.6MB/s ± 2%    +21.03%  (p=0.000 n=39+39)
Template-4               1.86MB/s ± 1%    1.79MB/s ± 1%    -3.51%   (p=0.000 n=40+38)
[Geo mean]               6.66MB/s         6.80MB/s         +2.13%
fixes #21492
Change-Id: Ia26e7ca393f0a5f31de240e8ff9a220453ca7e0d
Reviewed-on: https://go-review.googlesource.com/58450
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Diffstat (limited to 'src/cmd/compile/internal/arm')
-rw-r--r-- | src/cmd/compile/internal/arm/ssa.go | 4 |
1 files changed, 2 insertions, 2 deletions
diff --git a/src/cmd/compile/internal/arm/ssa.go b/src/cmd/compile/internal/arm/ssa.go
index 93abee3da0..d0d864d25d 100644
--- a/src/cmd/compile/internal/arm/ssa.go
+++ b/src/cmd/compile/internal/arm/ssa.go
@@ -516,7 +516,7 @@ func ssaGenValue(s *gc.SSAGenState, v *ssa.Value) {
 		p.To.Type = obj.TYPE_MEM
 		p.To.Reg = v.Args[0].Reg()
 		gc.AddAux(&p.To, v)
-	case ssa.OpARMMOVWloadidx:
+	case ssa.OpARMMOVWloadidx, ssa.OpARMMOVBUloadidx, ssa.OpARMMOVBloadidx, ssa.OpARMMOVHUloadidx, ssa.OpARMMOVHloadidx:
 		// this is just shift 0 bits
 		fallthrough
 	case ssa.OpARMMOVWloadshiftLL:
@@ -528,7 +528,7 @@ func ssaGenValue(s *gc.SSAGenState, v *ssa.Value) {
 	case ssa.OpARMMOVWloadshiftRA:
 		p := genshift(s, v.Op.Asm(), 0, v.Args[1].Reg(), v.Reg(), arm.SHIFT_AR, v.AuxInt)
 		p.From.Reg = v.Args[0].Reg()
-	case ssa.OpARMMOVWstoreidx:
+	case ssa.OpARMMOVWstoreidx, ssa.OpARMMOVBstoreidx, ssa.OpARMMOVHstoreidx:
 		// this is just shift 0 bits
 		fallthrough
 	case ssa.OpARMMOVWstoreshiftLL:
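
The change above relies on two Go `switch` features: a case with several values, and `fallthrough` into the next case so that the plain indexed ops reuse the shifted-operand code path with a shift of 0 bits. The following is a minimal standalone sketch of that control-flow pattern only. The `op` type, its constants, and `shiftFor` are hypothetical stand-ins and are not the compiler's identifiers or its actual code generation logic.

```go
package main

import "fmt"

// op is a hypothetical stand-in for ssa.Op.
type op int

const (
	loadIdxW    op = iota // word load, base register + index register
	loadIdxB              // byte load, base register + index register
	loadIdxH              // halfword load, base register + index register
	loadShiftLL           // load, base register + (index register << n)
	other                 // anything that is not handled here
)

// shiftFor reports the shift amount that would be encoded for o.
// The indexed ops share the shifted path with a shift of 0 bits.
func shiftFor(o op, auxShift int) (shift int, ok bool) {
	switch o {
	case loadIdxW, loadIdxB, loadIdxH:
		// this is just shift 0 bits
		auxShift = 0
		fallthrough
	case loadShiftLL:
		return auxShift, true
	}
	return 0, false
}

func main() {
	fmt.Println(shiftFor(loadIdxB, 7))    // indexed byte load: shift forced to 0
	fmt.Println(shiftFor(loadShiftLL, 3)) // shifted load: keeps its shift amount
	fmt.Println(shiftFor(other, 3))       // unhandled op
}
```

Listing the new ops in the existing case keeps a single code path for every load width, which is why the real diff only touches two lines.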