diff options
author | Austin Clements <austin@google.com> | 2016-06-29 17:41:50 -0400 |
---|---|---|
committer | Austin Clements <austin@google.com> | 2016-06-30 19:35:44 +0000 |
commit | 9c8809f82aa59e0725e93cffb03de863e61cbbae (patch) | |
tree | d69c0238accae4a22948c608d936038bdfb7e3a9 | |
parent | 95483f262b619a53793baf86512aeabf44fc9d3a (diff) | |
download | go-9c8809f82aa59e0725e93cffb03de863e61cbbae.tar.gz go-9c8809f82aa59e0725e93cffb03de863e61cbbae.zip |
runtime/internal/sys: implement Ctz and Bswap in assembly for 386
Ctz is a hot-spot in the Go 1.7 memory manager. In SSA it's
implemented as an intrinsic that compiles to a few instructions, but
on the old backend (all architectures other than amd64), it's
implemented as a fairly complex Go function. As a result, switching to
bitmap-based allocation was a significant hit to allocation-heavy
workloads like BinaryTree17 on non-SSA platforms.
For unknown reasons, this hit 386 particularly hard. We can regain a
lot of the lost performance by implementing Ctz in assembly on the
386. This isn't as good as an intrinsic, since it still generates a
function call and prevents useful inlining, but it's much better than
the pure Go implementation:
name old time/op new time/op delta
BinaryTree17-12 3.59s ± 1% 3.06s ± 1% -14.74% (p=0.000 n=19+20)
Fannkuch11-12 3.72s ± 1% 3.64s ± 1% -2.09% (p=0.000 n=17+19)
FmtFprintfEmpty-12 52.3ns ± 3% 52.3ns ± 3% ~ (p=0.829 n=20+19)
FmtFprintfString-12 156ns ± 1% 148ns ± 3% -5.20% (p=0.000 n=18+19)
FmtFprintfInt-12 137ns ± 1% 136ns ± 1% -0.56% (p=0.000 n=19+13)
FmtFprintfIntInt-12 227ns ± 2% 225ns ± 2% -0.93% (p=0.000 n=19+17)
FmtFprintfPrefixedInt-12 210ns ± 1% 208ns ± 1% -0.91% (p=0.000 n=19+17)
FmtFprintfFloat-12 375ns ± 1% 371ns ± 1% -1.06% (p=0.000 n=19+18)
FmtManyArgs-12 995ns ± 2% 978ns ± 1% -1.63% (p=0.000 n=17+17)
GobDecode-12 9.33ms ± 1% 9.19ms ± 0% -1.59% (p=0.000 n=20+17)
GobEncode-12 7.73ms ± 1% 7.73ms ± 1% ~ (p=0.771 n=19+20)
Gzip-12 375ms ± 1% 374ms ± 1% ~ (p=0.141 n=20+18)
Gunzip-12 61.8ms ± 1% 61.8ms ± 1% ~ (p=0.602 n=20+20)
HTTPClientServer-12 87.7µs ± 2% 86.9µs ± 3% -0.87% (p=0.024 n=19+20)
JSONEncode-12 20.2ms ± 1% 20.4ms ± 0% +0.53% (p=0.000 n=18+19)
JSONDecode-12 65.3ms ± 0% 65.4ms ± 1% ~ (p=0.385 n=16+19)
Mandelbrot200-12 4.11ms ± 1% 4.12ms ± 0% +0.29% (p=0.020 n=19+19)
GoParse-12 3.75ms ± 1% 3.61ms ± 2% -3.90% (p=0.000 n=20+20)
RegexpMatchEasy0_32-12 104ns ± 0% 103ns ± 0% -0.96% (p=0.000 n=13+16)
RegexpMatchEasy0_1K-12 805ns ± 1% 803ns ± 1% ~ (p=0.189 n=18+18)
RegexpMatchEasy1_32-12 111ns ± 0% 111ns ± 3% ~ (p=1.000 n=14+19)
RegexpMatchEasy1_1K-12 1.00µs ± 1% 1.00µs ± 1% +0.50% (p=0.003 n=19+19)
RegexpMatchMedium_32-12 133ns ± 2% 133ns ± 2% ~ (p=0.218 n=20+20)
RegexpMatchMedium_1K-12 41.2µs ± 1% 42.2µs ± 1% +2.52% (p=0.000 n=18+16)
RegexpMatchHard_32-12 2.35µs ± 1% 2.38µs ± 1% +1.53% (p=0.000 n=18+18)
RegexpMatchHard_1K-12 70.9µs ± 2% 72.0µs ± 1% +1.42% (p=0.000 n=19+17)
Revcomp-12 1.06s ± 0% 1.05s ± 0% -1.36% (p=0.000 n=20+18)
Template-12 86.2ms ± 1% 84.6ms ± 0% -1.89% (p=0.000 n=20+18)
TimeParse-12 425ns ± 2% 428ns ± 1% +0.77% (p=0.000 n=18+19)
TimeFormat-12 517ns ± 1% 519ns ± 1% +0.43% (p=0.001 n=20+19)
[Geo mean] 74.3µs 73.5µs -1.05%
Prior to this commit, BinaryTree17-12 on 386 was 33% slower than at
the go1.6 tag. With this commit, it's 13% slower.
On arm and arm64, BinaryTree17-12 is only ~5% slower than it was at
go1.6. It may be worth implementing Ctz for them as well.
I consider this change low risk, since the functions it replaces are
simple, very well specified, and well tested.
For #16117.
Change-Id: Ic39d851d5aca91330134596effd2dab9689ba066
Reviewed-on: https://go-review.googlesource.com/24640
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
-rw-r--r-- | src/runtime/internal/sys/intrinsics.go | 2 | ||||
-rw-r--r-- | src/runtime/internal/sys/intrinsics_386.s | 68 | ||||
-rw-r--r-- | src/runtime/internal/sys/intrinsics_stubs.go | 14 |
3 files changed, 84 insertions, 0 deletions
diff --git a/src/runtime/internal/sys/intrinsics.go b/src/runtime/internal/sys/intrinsics.go index 1054c6948f..08a062f85a 100644 --- a/src/runtime/internal/sys/intrinsics.go +++ b/src/runtime/internal/sys/intrinsics.go @@ -2,6 +2,8 @@ // Use of this source code is governed by a BSD-style // license that can be found in the LICENSE file. +// +build !386 + package sys // Using techniques from http://supertech.csail.mit.edu/papers/debruijn.pdf diff --git a/src/runtime/internal/sys/intrinsics_386.s b/src/runtime/internal/sys/intrinsics_386.s new file mode 100644 index 0000000000..1f48e26492 --- /dev/null +++ b/src/runtime/internal/sys/intrinsics_386.s @@ -0,0 +1,68 @@ +// Copyright 2016 The Go Authors. All rights reserved. +// Use of this source code is governed by a BSD-style +// license that can be found in the LICENSE file. + +#include "textflag.h" + +TEXT runtime∕internal∕sys·Ctz64(SB), NOSPLIT, $0-16 + MOVL $0, ret_hi+12(FP) + + // Try low 32 bits. + MOVL x_lo+0(FP), AX + BSFL AX, AX + JZ tryhigh + MOVL AX, ret_lo+8(FP) + RET + +tryhigh: + // Try high 32 bits. + MOVL x_hi+4(FP), AX + BSFL AX, AX + JZ none + ADDL $32, AX + MOVL AX, ret_lo+8(FP) + RET + +none: + // No bits are set. + MOVL $64, ret_lo+8(FP) + RET + +TEXT runtime∕internal∕sys·Ctz32(SB), NOSPLIT, $0-8 + MOVL x+0(FP), AX + BSFL AX, AX + JNZ 2(PC) + MOVL $32, AX + MOVL AX, ret+4(FP) + RET + +TEXT runtime∕internal∕sys·Ctz16(SB), NOSPLIT, $0-6 + MOVW x+0(FP), AX + BSFW AX, AX + JNZ 2(PC) + MOVW $16, AX + MOVW AX, ret+4(FP) + RET + +TEXT runtime∕internal∕sys·Ctz8(SB), NOSPLIT, $0-5 + MOVBLZX x+0(FP), AX + BSFL AX, AX + JNZ 2(PC) + MOVB $8, AX + MOVB AX, ret+4(FP) + RET + +TEXT runtime∕internal∕sys·Bswap64(SB), NOSPLIT, $0-16 + MOVL x_lo+0(FP), AX + MOVL x_hi+4(FP), BX + BSWAPL AX + BSWAPL BX + MOVL BX, ret_lo+8(FP) + MOVL AX, ret_hi+12(FP) + RET + +TEXT runtime∕internal∕sys·Bswap32(SB), NOSPLIT, $0-8 + MOVL x+0(FP), AX + BSWAPL AX + MOVL AX, ret+4(FP) + RET diff --git a/src/runtime/internal/sys/intrinsics_stubs.go b/src/runtime/internal/sys/intrinsics_stubs.go new file mode 100644 index 0000000000..079844fda4 --- /dev/null +++ b/src/runtime/internal/sys/intrinsics_stubs.go @@ -0,0 +1,14 @@ +// Copyright 2016 The Go Authors. All rights reserved. +// Use of this source code is governed by a BSD-style +// license that can be found in the LICENSE file. + +// +build 386 + +package sys + +func Ctz64(x uint64) uint64 +func Ctz32(x uint32) uint32 +func Ctz16(x uint16) uint16 +func Ctz8(x uint8) uint8 +func Bswap64(x uint64) uint64 +func Bswap32(x uint32) uint32 |