aboutsummaryrefslogtreecommitdiff
path: root/src/runtime/asm_arm64.s
AgeCommit message (Collapse)Author
2021-08-02[release-branch.go1.15] cmd/compile: mark R16, R17 clobbered for ↵Cherry Zhang
non-standard calls on ARM64 On ARM64, (external) linker generated trampoline may clobber R16 and R17. In CL 183842 we change Duff's devices not to use those registers. However, this is not enough. The register allocator also needs to know that these registers may be clobbered in any calls that don't follow the standard Go calling convention. This include Duff's devices and the write barrier. Fixes #46927. Updates #32773. Change-Id: Ia52a891d9bbb8515c927617dd53aee5af5bd9aa4 Reviewed-on: https://go-review.googlesource.com/c/go/+/184437 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Meng Zhuo <mzh@golangcn.org> Reviewed-by: Keith Randall <khr@golang.org> Trust: Meng Zhuo <mzh@golangcn.org> (cherry picked from commit 11b4aee05bfe83513cf08f83091e5aef8b33e766) Reviewed-on: https://go-review.googlesource.com/c/go/+/331030 Trust: Cherry Mui <cherryyz@google.com> Run-TryBot: Cherry Mui <cherryyz@google.com>
2020-03-03runtime: use CBZ/CBNZ in linux/arm64 assembly codeXiangdong Ji
Replace compare and branch on zero/non-zero instructions in linux/arm64 assembly files with CBZ/CBNZ. Change-Id: I4dbf56678f85827e83b5863804368bc28a4603b5 Reviewed-on: https://go-review.googlesource.com/c/go/+/209617 Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Tobias Klauser <tobias.klauser@gmail.com>
2019-08-30runtime: use all 64 bits of hash seed on arm64Keith Randall
Fixes #33960 Change-Id: I4f8cf65dcf4140a97e7b368572b31c171c453316 Reviewed-on: https://go-review.googlesource.com/c/go/+/192498 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-08-29runtime: switch default order of hashing algorithmsKeith Randall
Currently the standard hasher is memhash, which checks whether aes instructions are available, and if so redirects to aeshash. With this CL, we call aeshash directly, which then redirects to the fallback hash if aes instructions are not available. This reduces the overhead for the hash function in the common case, as it requires just one call instead of two. On architectures which have no assembly hasher, it's a single jump slower. Thanks to Martin for this idea. name old time/op new time/op delta BigKeyMap-4 22.6ns ± 1% 21.1ns ± 2% -6.55% (p=0.000 n=9+10) Change-Id: Ib7ca77b63d28222eb0189bc3d7130531949d853c Reviewed-on: https://go-review.googlesource.com/c/go/+/190998 Reviewed-by: Martin Möhrmann <moehrmann@google.com>
2019-04-20runtime: move linux specific code into linux specific filesMaya Rashish
Allows us to stop whitelisting this error on many OS/arch combinations XXX I'm not sure I am running vet correctly, and testing all platforms right. Change-Id: I29f548bd5f4a63bd13c4d0667d4209c75c886fd9 GitHub-Last-Rev: 52f6ff4a6b986e86f8b26c3d19da7707d39f1664 GitHub-Pull-Request: golang/go#31583 Reviewed-on: https://go-review.googlesource.com/c/go/+/173157 Run-TryBot: Benny Siegert <bsiegert@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Benny Siegert <bsiegert@gmail.com>
2019-04-15cmd/link, runtime: mark goexit as the top of the call stackMichael Munday
This CL adds a new attribute, TOPFRAME, which can be used to mark functions that should be treated as being at the top of the call stack. The function `runtime.goexit` has been marked this way on architectures that use a link register. This will stop programs that use DWARF to unwind the call stack from unwinding past `runtime.goexit` on architectures that use a link register. For example, it eliminates "corrupt stack?" warnings when generating a backtrace that hits `runtime.goexit` in GDB on s390x. Similar code should be added for non-link-register architectures (i.e. amd64, 386). They mark the top of the call stack slightly differently to link register architectures so I haven't added that code (they need to mark "rip" as undefined). Fixes #24385. Change-Id: I15b4c69ac75b491daa0acf0d981cb80eb06488de Reviewed-on: https://go-review.googlesource.com/c/go/+/169726 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-03-18cmd/compile,runtime: provide index information on bounds check failureKeith Randall
A few examples (for accessing a slice of length 3): s[-1] runtime error: index out of range [-1] s[3] runtime error: index out of range [3] with length 3 s[-1:0] runtime error: slice bounds out of range [-1:] s[3:0] runtime error: slice bounds out of range [3:0] s[3:-1] runtime error: slice bounds out of range [:-1] s[3:4] runtime error: slice bounds out of range [:4] with capacity 3 s[0:3:4] runtime error: slice bounds out of range [::4] with capacity 3 Note that in cases where there are multiple things wrong with the indexes (e.g. s[3:-1]), we report one of those errors kind of arbitrarily, currently the rightmost one. An exhaustive set of examples is in issue30116[u].out in the CL. The message text has the same prefix as the old message text. That leads to slightly awkward phrasing but hopefully minimizes the chance that code depending on the error text will break. Increases the size of the go binary by 0.5% (amd64). The panic functions take arguments in registers in order to keep the size of the compiled code as small as possible. Fixes #30116 Change-Id: Idb99a827b7888822ca34c240eca87b7e44a04fdd Reviewed-on: https://go-review.googlesource.com/c/go/+/161477 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2018-11-17runtime: don't use thread local storage before it is set up on iOSElias Naur
CL 138675 added a call to runtime.save_g which uses thread local storage to store g. On iOS however, that storage was not initialized yet. Move the call to below _cgo_init where it is set up. Change-Id: I14538d3e7d56ff35a6fa02c47bca306d24c38010 Reviewed-on: https://go-review.googlesource.com/c/150157 Run-TryBot: Elias Naur <elias.naur@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-11-13cmd,runtime: enable race detector on arm64Fangming.Fang
Changes include: 1. enable compiler option -race for arm64 2. add runtime/race_arm64.s to manage the calls from Go to the compiler-rt runtime 3. change racewalk.go to call racefuncenterfp instead of racefuncenter on arm64 to allow the caller pc to be obtained in the asm code before calling the tsan version 4. race_linux_arm64.syso comes from compiler-rt which just supports 48bit VA, compiler-rt is fetched from master branch which latest commit is 3aa2b775d08f903f804246af10b Fixes #25682 Change-Id: I04364c580b8157fd117deecae74a4656ba16e005 Reviewed-on: https://go-review.googlesource.com/c/138675 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-11-12runtime, reflect: access runtime.reflectcall directlyAustin Clements
Currently, package runtime contains the definition of reflect.call, even though it's just a jump to runtime.reflectcall. This "push" symbol is confusing, since it's not clear where the definition of reflect.call comes from when you're in the reflect package. Replace this with a "pull" symbol: the runtime now defines only runtime.reflectcall and package reflect uses a go:linkname to access this symbol directly. This makes it clear where reflect.call is coming from without any spooky action at a distance and eliminates all of the definitions of reflect.call in the runtime. Change-Id: I3ec73cd394efe9df8d3061a57c73aece2e7048dd Reviewed-on: https://go-review.googlesource.com/c/148657 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2018-08-29build: support frame-pointer for arm64Zheng Xu
Supporting frame-pointer makes Linux's perf and other profilers much more useful because it lets them gather a stack trace efficiently on profiling events. Major changes include: 1. save FP on the word below where RSP is pointing to (proposed by Cherry and Austin) 2. adjust some specific offsets in runtime assembly and wrapper code 3. add support to FP in goroutine scheduler 4. adjust link stack overflow check to take the extra word into account 5. adjust nosplit test cases to enable frame sizes which are 16 bytes aligned Performance impacts on go1 benchmarks: Enable frame-pointer (by default) name old time/op new time/op delta BinaryTree17-46 5.94s ± 0% 6.00s ± 0% +1.03% (p=0.029 n=4+4) Fannkuch11-46 2.84s ± 1% 2.77s ± 0% -2.58% (p=0.008 n=5+5) FmtFprintfEmpty-46 55.0ns ± 1% 58.9ns ± 1% +7.06% (p=0.008 n=5+5) FmtFprintfString-46 102ns ± 0% 105ns ± 0% +2.94% (p=0.008 n=5+5) FmtFprintfInt-46 118ns ± 0% 117ns ± 1% -1.19% (p=0.000 n=4+5) FmtFprintfIntInt-46 181ns ± 0% 182ns ± 1% ~ (p=0.444 n=5+5) FmtFprintfPrefixedInt-46 215ns ± 1% 214ns ± 0% ~ (p=0.254 n=5+4) FmtFprintfFloat-46 292ns ± 0% 296ns ± 0% +1.46% (p=0.029 n=4+4) FmtManyArgs-46 720ns ± 0% 732ns ± 0% +1.72% (p=0.008 n=5+5) GobDecode-46 9.82ms ± 1% 10.03ms ± 2% +2.10% (p=0.008 n=5+5) GobEncode-46 8.14ms ± 0% 8.72ms ± 1% +7.14% (p=0.008 n=5+5) Gzip-46 420ms ± 0% 424ms ± 0% +0.92% (p=0.008 n=5+5) Gunzip-46 48.2ms ± 0% 48.4ms ± 0% +0.41% (p=0.008 n=5+5) HTTPClientServer-46 201µs ± 4% 201µs ± 0% ~ (p=0.730 n=5+4) JSONEncode-46 17.1ms ± 0% 17.7ms ± 1% +3.80% (p=0.008 n=5+5) JSONDecode-46 88.0ms ± 0% 90.1ms ± 0% +2.42% (p=0.008 n=5+5) Mandelbrot200-46 5.06ms ± 0% 5.07ms ± 0% ~ (p=0.310 n=5+5) GoParse-46 5.04ms ± 0% 5.12ms ± 0% +1.53% (p=0.008 n=5+5) RegexpMatchEasy0_32-46 117ns ± 0% 117ns ± 0% ~ (all equal) RegexpMatchEasy0_1K-46 332ns ± 0% 329ns ± 0% -0.78% (p=0.008 n=5+5) RegexpMatchEasy1_32-46 104ns ± 0% 113ns ± 0% +8.65% (p=0.029 n=4+4) RegexpMatchEasy1_1K-46 563ns ± 0% 569ns ± 0% +1.10% (p=0.008 n=5+5) RegexpMatchMedium_32-46 167ns ± 2% 177ns ± 1% +5.74% (p=0.008 n=5+5) RegexpMatchMedium_1K-46 49.5µs ± 0% 53.4µs ± 0% +7.81% (p=0.008 n=5+5) RegexpMatchHard_32-46 2.56µs ± 1% 2.72µs ± 0% +6.01% (p=0.008 n=5+5) RegexpMatchHard_1K-46 77.0µs ± 0% 81.8µs ± 0% +6.24% (p=0.016 n=5+4) Revcomp-46 631ms ± 1% 627ms ± 1% ~ (p=0.095 n=5+5) Template-46 81.8ms ± 0% 86.3ms ± 0% +5.55% (p=0.008 n=5+5) TimeParse-46 423ns ± 0% 432ns ± 0% +2.32% (p=0.008 n=5+5) TimeFormat-46 478ns ± 2% 497ns ± 1% +3.89% (p=0.008 n=5+5) [Geo mean] 71.6µs 73.3µs +2.45% name old speed new speed delta GobDecode-46 78.1MB/s ± 1% 76.6MB/s ± 2% -2.04% (p=0.008 n=5+5) GobEncode-46 94.3MB/s ± 0% 88.0MB/s ± 1% -6.67% (p=0.008 n=5+5) Gzip-46 46.2MB/s ± 0% 45.8MB/s ± 0% -0.91% (p=0.008 n=5+5) Gunzip-46 403MB/s ± 0% 401MB/s ± 0% -0.41% (p=0.008 n=5+5) JSONEncode-46 114MB/s ± 0% 109MB/s ± 1% -3.66% (p=0.008 n=5+5) JSONDecode-46 22.0MB/s ± 0% 21.5MB/s ± 0% -2.35% (p=0.008 n=5+5) GoParse-46 11.5MB/s ± 0% 11.3MB/s ± 0% -1.51% (p=0.008 n=5+5) RegexpMatchEasy0_32-46 272MB/s ± 0% 272MB/s ± 1% ~ (p=0.190 n=4+5) RegexpMatchEasy0_1K-46 3.08GB/s ± 0% 3.11GB/s ± 0% +0.77% (p=0.008 n=5+5) RegexpMatchEasy1_32-46 306MB/s ± 0% 283MB/s ± 0% -7.63% (p=0.029 n=4+4) RegexpMatchEasy1_1K-46 1.82GB/s ± 0% 1.80GB/s ± 0% -1.07% (p=0.008 n=5+5) RegexpMatchMedium_32-46 5.99MB/s ± 0% 5.64MB/s ± 1% -5.77% (p=0.016 n=4+5) RegexpMatchMedium_1K-46 20.7MB/s ± 0% 19.2MB/s ± 0% -7.25% (p=0.008 n=5+5) RegexpMatchHard_32-46 12.5MB/s ± 1% 11.8MB/s ± 0% -5.66% (p=0.008 n=5+5) RegexpMatchHard_1K-46 13.3MB/s ± 0% 12.5MB/s ± 1% -6.01% (p=0.008 n=5+5) Revcomp-46 402MB/s ± 1% 405MB/s ± 1% ~ (p=0.095 n=5+5) Template-46 23.7MB/s ± 0% 22.5MB/s ± 0% -5.25% (p=0.008 n=5+5) [Geo mean] 82.2MB/s 79.6MB/s -3.26% Disable frame-pointer (GOEXPERIMENT=noframepointer) name old time/op new time/op delta BinaryTree17-46 5.94s ± 0% 5.96s ± 0% +0.39% (p=0.029 n=4+4) Fannkuch11-46 2.84s ± 1% 2.79s ± 1% -1.68% (p=0.008 n=5+5) FmtFprintfEmpty-46 55.0ns ± 1% 55.2ns ± 3% ~ (p=0.794 n=5+5) FmtFprintfString-46 102ns ± 0% 103ns ± 0% +0.98% (p=0.016 n=5+4) FmtFprintfInt-46 118ns ± 0% 115ns ± 0% -2.54% (p=0.029 n=4+4) FmtFprintfIntInt-46 181ns ± 0% 179ns ± 0% -1.10% (p=0.000 n=5+4) FmtFprintfPrefixedInt-46 215ns ± 1% 213ns ± 0% ~ (p=0.143 n=5+4) FmtFprintfFloat-46 292ns ± 0% 300ns ± 0% +2.83% (p=0.029 n=4+4) FmtManyArgs-46 720ns ± 0% 739ns ± 0% +2.64% (p=0.008 n=5+5) GobDecode-46 9.82ms ± 1% 9.78ms ± 1% ~ (p=0.151 n=5+5) GobEncode-46 8.14ms ± 0% 8.12ms ± 1% ~ (p=0.690 n=5+5) Gzip-46 420ms ± 0% 420ms ± 0% ~ (p=0.548 n=5+5) Gunzip-46 48.2ms ± 0% 48.0ms ± 0% -0.33% (p=0.032 n=5+5) HTTPClientServer-46 201µs ± 4% 199µs ± 3% ~ (p=0.548 n=5+5) JSONEncode-46 17.1ms ± 0% 17.2ms ± 0% ~ (p=0.056 n=5+5) JSONDecode-46 88.0ms ± 0% 88.6ms ± 0% +0.64% (p=0.008 n=5+5) Mandelbrot200-46 5.06ms ± 0% 5.07ms ± 0% ~ (p=0.548 n=5+5) GoParse-46 5.04ms ± 0% 5.07ms ± 0% +0.65% (p=0.008 n=5+5) RegexpMatchEasy0_32-46 117ns ± 0% 112ns ± 4% -4.27% (p=0.016 n=4+5) RegexpMatchEasy0_1K-46 332ns ± 0% 330ns ± 1% ~ (p=0.095 n=5+5) RegexpMatchEasy1_32-46 104ns ± 0% 110ns ± 1% +5.29% (p=0.029 n=4+4) RegexpMatchEasy1_1K-46 563ns ± 0% 567ns ± 2% ~ (p=0.151 n=5+5) RegexpMatchMedium_32-46 167ns ± 2% 166ns ± 0% ~ (p=0.333 n=5+4) RegexpMatchMedium_1K-46 49.5µs ± 0% 49.6µs ± 0% ~ (p=0.841 n=5+5) RegexpMatchHard_32-46 2.56µs ± 1% 2.49µs ± 0% -2.81% (p=0.008 n=5+5) RegexpMatchHard_1K-46 77.0µs ± 0% 75.8µs ± 0% -1.55% (p=0.008 n=5+5) Revcomp-46 631ms ± 1% 628ms ± 0% ~ (p=0.095 n=5+5) Template-46 81.8ms ± 0% 84.3ms ± 1% +3.05% (p=0.008 n=5+5) TimeParse-46 423ns ± 0% 425ns ± 0% +0.52% (p=0.008 n=5+5) TimeFormat-46 478ns ± 2% 478ns ± 1% ~ (p=1.000 n=5+5) [Geo mean] 71.6µs 71.6µs -0.01% name old speed new speed delta GobDecode-46 78.1MB/s ± 1% 78.5MB/s ± 1% ~ (p=0.151 n=5+5) GobEncode-46 94.3MB/s ± 0% 94.5MB/s ± 1% ~ (p=0.690 n=5+5) Gzip-46 46.2MB/s ± 0% 46.2MB/s ± 0% ~ (p=0.571 n=5+5) Gunzip-46 403MB/s ± 0% 404MB/s ± 0% +0.33% (p=0.032 n=5+5) JSONEncode-46 114MB/s ± 0% 113MB/s ± 0% ~ (p=0.056 n=5+5) JSONDecode-46 22.0MB/s ± 0% 21.9MB/s ± 0% -0.64% (p=0.008 n=5+5) GoParse-46 11.5MB/s ± 0% 11.4MB/s ± 0% -0.64% (p=0.008 n=5+5) RegexpMatchEasy0_32-46 272MB/s ± 0% 285MB/s ± 4% +4.74% (p=0.016 n=4+5) RegexpMatchEasy0_1K-46 3.08GB/s ± 0% 3.10GB/s ± 1% ~ (p=0.151 n=5+5) RegexpMatchEasy1_32-46 306MB/s ± 0% 290MB/s ± 1% -5.21% (p=0.029 n=4+4) RegexpMatchEasy1_1K-46 1.82GB/s ± 0% 1.81GB/s ± 2% ~ (p=0.151 n=5+5) RegexpMatchMedium_32-46 5.99MB/s ± 0% 6.02MB/s ± 1% ~ (p=0.063 n=4+5) RegexpMatchMedium_1K-46 20.7MB/s ± 0% 20.7MB/s ± 0% ~ (p=0.659 n=5+5) RegexpMatchHard_32-46 12.5MB/s ± 1% 12.8MB/s ± 0% +2.88% (p=0.008 n=5+5) RegexpMatchHard_1K-46 13.3MB/s ± 0% 13.5MB/s ± 0% +1.58% (p=0.008 n=5+5) Revcomp-46 402MB/s ± 1% 405MB/s ± 0% ~ (p=0.095 n=5+5) Template-46 23.7MB/s ± 0% 23.0MB/s ± 1% -2.95% (p=0.008 n=5+5) [Geo mean] 82.2MB/s 82.3MB/s +0.04% Frame-pointer is enabled on Linux by default but can be disabled by setting: GOEXPERIMENT=noframepointer. Fixes #10110 Change-Id: I1bfaca6dba29a63009d7c6ab04ed7a1413d9479e Reviewed-on: https://go-review.googlesource.com/61511 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-06-12runtime: use libc for signal functions on iOSElias Naur
Also: - Add extra SystemStack space for darwin/arm64 just like for darwin/arm. - Removed redundant stack alignment; the arm64 hardware enforces the 16 byte alignment. - Save and restore the g registers at library initialization. - Zero g registers since libpreinit can call libc functions that in turn use asmcgocall. asmcgocall requires an initialized g. - Change asmcgocall to work even if no g is set. The change mimics amd64. Change-Id: I1b8c63b07cfec23b909c0d215b50dc229f8adbc8 Reviewed-on: https://go-review.googlesource.com/117176 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2018-05-21cmd/link,runtime: move syscalls to libc on iOSElias Naur
This CL is the darwin/arm and darwin/arm64 equivalent to CL 108679, 110215, 110437, 110438, 111258, 110655. Updates #17490 Change-Id: Ia95b27b38f9c3535012c566f17a44b4ed26b9db6 Reviewed-on: https://go-review.googlesource.com/111015 TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2018-04-30cmd/compile: intrinsify runtime.getcallerpc on arm64Wei Xiao
Add a compiler intrinsic for getcallerpc on arm64 for better code generation. Change-Id: I897e670a2b8ffa1a8c2fdc638f5b2c44bda26318 Reviewed-on: https://go-review.googlesource.com/109276 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-04-04runtime: implement aeshash for arm64 platformMeng Zhuo
Fix #10109 name old time/op new time/op delta Hash5 72.3ns ± 0% 51.5ns ± 0% -28.71% (p=0.000 n=4+5) Hash16 78.8ns ± 0% 48.7ns ± 0% ~ (p=0.079 n=4+5) Hash64 196ns ±25% 73ns ±16% -62.68% (p=0.008 n=5+5) Hash1024 1.57µs ± 0% 0.27µs ± 1% -82.90% (p=0.000 n=5+4) Hash65536 96.5µs ± 0% 14.3µs ± 0% -85.14% (p=0.016 n=5+4) HashStringSpeed 156ns ± 6% 129ns ± 3% -17.56% (p=0.008 n=5+5) HashBytesSpeed 227ns ± 1% 200ns ± 1% -11.98% (p=0.008 n=5+5) HashInt32Speed 116ns ± 2% 102ns ± 0% -11.92% (p=0.016 n=5+4) HashInt64Speed 120ns ± 3% 101ns ± 2% -15.55% (p=0.008 n=5+5) HashStringArraySpeed 342ns ± 0% 306ns ± 2% -10.58% (p=0.008 n=5+5) FastrandHashiter 217ns ± 1% 217ns ± 1% ~ (p=1.000 n=5+5) name old speed new speed delta Hash5 69.1MB/s ± 0% 97.0MB/s ± 0% +40.32% (p=0.008 n=5+5) Hash16 203MB/s ± 0% 329MB/s ± 0% +61.76% (p=0.016 n=4+5) Hash64 332MB/s ±21% 881MB/s ±14% +165.66% (p=0.008 n=5+5) Hash1024 651MB/s ± 0% 3652MB/s ±17% +460.73% (p=0.008 n=5+5) Hash65536 679MB/s ± 0% 4570MB/s ± 0% +572.85% (p=0.016 n=5+4) Change-Id: I573028979f84cf2e0e087951271d5de8865dbf04 Reviewed-on: https://go-review.googlesource.com/89755 Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2018-03-09runtime: fix abort handling on arm64Austin Clements
The implementation of runtime.abort on arm64 currently branches to address 0, which results in a signal from PC 0, rather than from runtime.abort, so the runtime fails to recognize it as an abort. Fix runtime.abort on arm64 to read from address 0 like what other architectures do and recognize this in the signal handler. Should fix the linux/arm64 build. Change-Id: I960ab630daaeadc9190287604d4d8337b1ea3853 Reviewed-on: https://go-review.googlesource.com/99895 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-03-08runtime: make throw safer to callAustin Clements
Currently, throw may grow the stack, which means whenever we call it from a context where it's not safe to grow the stack, we first have to switch to the system stack. This is pretty easy to get wrong. Fix this by making throw switch to the system stack so it doesn't grow the stack and is hence safe to call without a system stack switch at the call site. The only thing this complicates is badsystemstack itself, which would now go into an infinite loop before printing anything (previously it would also go into an infinite loop, but would at least print the error first). Fix this by making badsystemstack do a direct write and then crash hard. Change-Id: Ic5b4a610df265e47962dcfa341cabac03c31c049 Reviewed-on: https://go-review.googlesource.com/93659 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2018-03-04internal/bytealg: move compare functions to bytealgKeith Randall
Move bytes.Compare and runtime·cmpstring to bytealg. Update #19792 Change-Id: I139e6d7c59686bef7a3017e3dec99eba5fd10447 Reviewed-on: https://go-review.googlesource.com/98515 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-03-03internal/bytealg: move equal functions to bytealgKeith Randall
Move bytes.Equal, runtime.memequal, and runtime.memequal_varlen to the bytealg package. Update #19792 Change-Id: Ic4175e952936016ea0bda6c7c3dbb33afdc8e4ac Reviewed-on: https://go-review.googlesource.com/98355 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-03-02internal/bytealg: move IndexByte asssembly to the new bytealg packageKeith Randall
Move the IndexByte function from the runtime to a new bytealg package. The new package will eventually hold all the optimized assembly for groveling through byte slices and strings. It seems a better home for this code than randomly keeping it in runtime. Once this is in, the next step is to move the other functions (Compare, Equal, ...). Update #19792 This change seems complicated enough that we might just declare "not worth it" and abandon. Opinions welcome. The core assembly is all unchanged, except minor modifications where the code reads cpu feature bits. The wrapper functions have been cleaned up as they are now actually checked by vet. Change-Id: I9fa75bee5d85db3a65b3fd3b7997e60367523796 Reviewed-on: https://go-review.googlesource.com/98016 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-02-13runtime: buffered write barrier for arm64Austin Clements
Updates #22460. Change-Id: I5f8fbece9545840f5fc4c9834e2050b0920776f0 Reviewed-on: https://go-review.googlesource.com/92699 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-02-12runtime: use NOFRAME on arm64Austin Clements
This replaces frame size -8 with the NOFRAME flag in arm64 assembly. This was automated with: sed -i -e 's/\(^TEXT.*[A-Z]\),\( *\)\$-8/\1|NOFRAME,\2$0/' $(find -name '*_arm64.s') Plus a manual fix to mkduff.go. The go binary is identical before and after this change. Change-Id: I0310384d1a584118c41d1cd3a042bb8ea7227efa Reviewed-on: https://go-review.googlesource.com/92043 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-02-12runtime: fix silly frame sizes on arm and arm64Austin Clements
"-8" is not a sensible frame size on arm and we're about to start rejecting it. Replace it with -4. Likewise, "-4" is not a sensible frame size on arm64 and we're about to start rejecting it. Replace it with -8. Finally, clean up some places we're weirdly inconsistent about using 0 versus -8. Change-Id: If85e229993d5f7f1f0cfa9852b4e294d053bd784 Reviewed-on: https://go-review.googlesource.com/92038 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-11-15runtime: IndexByte and memclr perf improvements on arm64wei xiao
Update runtime asm_arm64.s and memclr_arm64.s to improve performance by using SIMD instructions to do more in parallel. It shows improvement on bytes, html and go1 benchmarks (particualrly regexp, which uses IndexByte frequently). Benchmark results of bytes: name old time/op new time/op delta IndexByte/10-8 28.5ns ± 0% 19.5ns ± 0% -31.58% (p=0.000 n=10+10) IndexByte/32-8 52.6ns ± 0% 19.0ns ± 0% -63.88% (p=0.000 n=10+10) IndexByte/4K-8 4.12µs ± 0% 0.49µs ± 0% -88.16% (p=0.000 n=10+10) IndexByte/4M-8 4.29ms ± 1% 0.70ms ±26% -83.65% (p=0.000 n=10+10) IndexByte/64M-8 69.7ms ± 0% 16.0ms ± 0% -76.97% (p=0.000 n=9+10) IndexBytePortable/10-8 34.0ns ± 0% 34.0ns ± 0% ~ (all equal) IndexBytePortable/32-8 66.1ns ± 0% 66.1ns ± 0% ~ (p=0.471 n=9+9) IndexBytePortable/4K-8 6.17µs ± 0% 6.17µs ± 0% ~ (all equal) IndexBytePortable/4M-8 6.33ms ± 0% 6.35ms ± 0% +0.21% (p=0.002 n=10+9) IndexBytePortable/64M-8 103ms ± 0% 103ms ± 0% +0.01% (p=0.017 n=9+10) name old speed new speed delta IndexByte/10-8 351MB/s ± 0% 512MB/s ± 0% +46.14% (p=0.000 n=9+10) IndexByte/32-8 609MB/s ± 0% 1683MB/s ± 0% +176.40% (p=0.000 n=10+10) IndexByte/4K-8 994MB/s ± 0% 8378MB/s ± 0% +742.75% (p=0.000 n=10+10) IndexByte/4M-8 977MB/s ± 1% 6149MB/s ±32% +529.29% (p=0.000 n=10+10) IndexByte/64M-8 963MB/s ± 0% 4182MB/s ± 0% +334.29% (p=0.000 n=9+10) IndexBytePortable/10-8 294MB/s ± 0% 294MB/s ± 0% +0.17% (p=0.000 n=8+8) IndexBytePortable/32-8 484MB/s ± 0% 484MB/s ± 0% ~ (p=0.877 n=9+9) IndexBytePortable/4K-8 664MB/s ± 0% 664MB/s ± 0% ~ (p=0.242 n=8+9) IndexBytePortable/4M-8 662MB/s ± 0% 661MB/s ± 0% -0.21% (p=0.002 n=10+9) IndexBytePortable/64M-8 652MB/s ± 0% 652MB/s ± 0% ~ (p=0.065 n=10+10) Benchmark results of html: name old time/op new time/op delta Escape-8 62.0µs ± 1% 61.0µs ± 1% -1.69% (p=0.000 n=9+10) EscapeNone-8 10.2µs ± 0% 10.2µs ± 0% -0.09% (p=0.022 n=9+10) Unescape-8 71.9µs ± 0% 68.7µs ± 0% -4.35% (p=0.000 n=10+10) UnescapeNone-8 4.03µs ± 0% 0.48µs ± 0% -88.08% (p=0.000 n=10+10) UnescapeSparse-8 10.7µs ± 2% 7.1µs ± 3% -33.91% (p=0.000 n=10+10) UnescapeDense-8 53.2µs ± 1% 53.5µs ± 1% ~ (p=0.143 n=10+10) Benchmark results of go1: name old time/op new time/op delta BinaryTree17-8 6.53s ± 0% 6.48s ± 2% ~ (p=0.190 n=4+5) Fannkuch11-8 6.35s ± 1% 6.35s ± 0% ~ (p=1.000 n=5+5) FmtFprintfEmpty-8 108ns ± 1% 101ns ± 2% -6.32% (p=0.008 n=5+5) FmtFprintfString-8 172ns ± 1% 182ns ± 2% +5.70% (p=0.008 n=5+5) FmtFprintfInt-8 207ns ± 0% 207ns ± 0% ~ (p=0.444 n=5+5) FmtFprintfIntInt-8 277ns ± 1% 276ns ± 1% ~ (p=0.873 n=5+5) FmtFprintfPrefixedInt-8 386ns ± 0% 382ns ± 1% -1.04% (p=0.024 n=5+5) FmtFprintfFloat-8 492ns ± 0% 492ns ± 1% ~ (p=0.571 n=4+5) FmtManyArgs-8 1.32µs ± 1% 1.33µs ± 0% ~ (p=0.087 n=5+5) GobDecode-8 16.8ms ± 2% 16.7ms ± 1% ~ (p=1.000 n=5+5) GobEncode-8 14.1ms ± 1% 14.0ms ± 1% ~ (p=0.056 n=5+5) Gzip-8 788ms ± 0% 802ms ± 0% +1.71% (p=0.008 n=5+5) Gunzip-8 83.6ms ± 0% 83.9ms ± 0% +0.40% (p=0.008 n=5+5) HTTPClientServer-8 120µs ± 0% 120µs ± 1% ~ (p=0.548 n=5+5) JSONEncode-8 33.2ms ± 0% 33.0ms ± 1% -0.71% (p=0.008 n=5+5) JSONDecode-8 152ms ± 1% 152ms ± 1% ~ (p=1.000 n=5+5) Mandelbrot200-8 10.0ms ± 0% 10.0ms ± 0% -0.05% (p=0.008 n=5+5) GoParse-8 7.97ms ± 0% 7.98ms ± 0% ~ (p=0.690 n=5+5) RegexpMatchEasy0_32-8 233ns ± 1% 206ns ± 0% -11.44% (p=0.016 n=5+4) RegexpMatchEasy0_1K-8 1.86µs ± 0% 0.77µs ± 1% -58.54% (p=0.008 n=5+5) RegexpMatchEasy1_32-8 250ns ± 0% 205ns ± 0% -18.07% (p=0.008 n=5+5) RegexpMatchEasy1_1K-8 2.28µs ± 0% 1.11µs ± 0% -51.09% (p=0.029 n=4+4) RegexpMatchMedium_32-8 332ns ± 1% 301ns ± 2% -9.45% (p=0.008 n=5+5) RegexpMatchMedium_1K-8 85.5µs ± 2% 78.8µs ± 0% -7.83% (p=0.008 n=5+5) RegexpMatchHard_32-8 4.34µs ± 1% 4.27µs ± 0% -1.49% (p=0.008 n=5+5) RegexpMatchHard_1K-8 130µs ± 1% 127µs ± 0% -2.53% (p=0.008 n=5+5) Revcomp-8 1.35s ± 1% 1.13s ± 1% -16.17% (p=0.008 n=5+5) Template-8 160ms ± 2% 162ms ± 2% ~ (p=0.222 n=5+5) TimeParse-8 795ns ± 2% 778ns ± 1% ~ (p=0.095 n=5+5) TimeFormat-8 782ns ± 0% 786ns ± 1% +0.59% (p=0.040 n=5+5) name old speed new speed delta GobDecode-8 45.8MB/s ± 2% 45.9MB/s ± 1% ~ (p=1.000 n=5+5) GobEncode-8 54.3MB/s ± 1% 55.0MB/s ± 1% ~ (p=0.056 n=5+5) Gzip-8 24.6MB/s ± 0% 24.2MB/s ± 0% -1.69% (p=0.008 n=5+5) Gunzip-8 232MB/s ± 0% 231MB/s ± 0% -0.40% (p=0.008 n=5+5) JSONEncode-8 58.4MB/s ± 0% 58.8MB/s ± 1% +0.71% (p=0.008 n=5+5) JSONDecode-8 12.8MB/s ± 1% 12.8MB/s ± 1% ~ (p=1.000 n=5+5) GoParse-8 7.27MB/s ± 0% 7.26MB/s ± 0% ~ (p=0.762 n=5+5) RegexpMatchEasy0_32-8 137MB/s ± 1% 155MB/s ± 0% +12.93% (p=0.008 n=5+5) RegexpMatchEasy0_1K-8 551MB/s ± 0% 1329MB/s ± 1% +141.11% (p=0.008 n=5+5) RegexpMatchEasy1_32-8 128MB/s ± 0% 156MB/s ± 0% +22.00% (p=0.008 n=5+5) RegexpMatchEasy1_1K-8 449MB/s ± 0% 920MB/s ± 0% +104.68% (p=0.016 n=4+5) RegexpMatchMedium_32-8 3.00MB/s ± 0% 3.32MB/s ± 2% +10.60% (p=0.016 n=4+5) RegexpMatchMedium_1K-8 12.0MB/s ± 2% 13.0MB/s ± 0% +8.48% (p=0.008 n=5+5) RegexpMatchHard_32-8 7.38MB/s ± 1% 7.49MB/s ± 0% +1.49% (p=0.008 n=5+5) RegexpMatchHard_1K-8 7.88MB/s ± 1% 8.08MB/s ± 0% +2.59% (p=0.008 n=5+5) Revcomp-8 188MB/s ± 1% 224MB/s ± 1% +19.29% (p=0.008 n=5+5) Template-8 12.2MB/s ± 2% 12.0MB/s ± 2% ~ (p=0.206 n=5+5) Change-Id: I94116620a287d173a6f60510684362e500f54887 Reviewed-on: https://go-review.googlesource.com/33597 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-10-30runtime: make systemstack tail call if already switchedAustin Clements
Currently systemstack always calls its argument, even if we're already on the system stack. Unfortunately, traceback with _TraceJump stops at the first systemstack it sees, which often cuts off runtime stacks early in profiles. Fix this by performing a tail call if we're already on the system stack. This eliminates it from the traceback entirely, so it won't stop prematurely (or all get mushed into a single node in the profile graph). Change-Id: Ibc69e8765e899f8d3806078517b8c7314da196f4 Reviewed-on: https://go-review.googlesource.com/74050 Reviewed-by: Cherry Zhang <cherryyz@google.com> Reviewed-by: Keith Randall <khr@golang.org>
2017-10-29runtime: remove write barriers from newstack, gogoAustin Clements
Currently, newstack and gogo have write barriers for maintaining the context register saved in g.sched.ctxt. This is troublesome, because newstack can be called from go:nowritebarrierrec places that can't allow write barriers. It happens to be benign because g.sched.ctxt will always be nil on entry to newstack *and* it so happens the incoming ctxt will also always be nil in these contexts (I think/hope), but this is playing with fire. It's also desirable to mark newstack go:nowritebarrierrec to prevent any other, non-benign write barriers from creeping in, but we can't do that right now because of this one write barrier. Fix all of this by observing that g.sched.ctxt is really just a saved live pointer register. Hence, we can shade it when we scan g's stack and otherwise move it back and forth between the actual context register and g.sched.ctxt without write barriers. This means we can save it in morestack along with all of the other g.sched, eliminate the save from newstack along with its troublesome write barrier, and eliminate the shenanigans in gogo to invoke the write barrier when restoring it. Once we've done all of this, we can mark newstack go:nowritebarrierrec. Fixes #22385. For #22460. Change-Id: I43c24958e3f6785b53c1350e1e83c2844e0d1522 Reviewed-on: https://go-review.googlesource.com/72553 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Rick Hudson <rlh@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-10-25bytes: add optimized Equal for arm64Wei Xiao
Use SIMD instructions when comparing chunks bigger than 16 bytes. Benchmark results of bytes: name old time/op new time/op delta Equal/0-8 6.52ns ± 1% 5.51ns ± 0% -15.43% (p=0.000 n=8+9) Equal/1-8 11.5ns ± 0% 10.5ns ± 0% -8.70% (p=0.000 n=10+10) Equal/6-8 19.0ns ± 0% 13.5ns ± 0% -28.95% (p=0.000 n=10+10) Equal/9-8 31.0ns ± 0% 13.5ns ± 0% -56.45% (p=0.000 n=10+10) Equal/15-8 40.0ns ± 0% 15.5ns ± 0% -61.25% (p=0.000 n=10+10) Equal/16-8 41.5ns ± 0% 14.5ns ± 0% -65.06% (p=0.000 n=10+10) Equal/20-8 47.5ns ± 0% 17.0ns ± 0% -64.21% (p=0.000 n=10+10) Equal/32-8 65.6ns ± 0% 17.0ns ± 0% -74.09% (p=0.000 n=10+10) Equal/4K-8 6.17µs ± 0% 0.57µs ± 1% -90.76% (p=0.000 n=10+10) Equal/4M-8 6.41ms ± 0% 1.11ms ±14% -82.71% (p=0.000 n=8+10) Equal/64M-8 104ms ± 0% 33ms ± 0% -68.64% (p=0.000 n=10+10) EqualPort/1-8 13.0ns ± 0% 13.0ns ± 0% ~ (all equal) EqualPort/6-8 22.0ns ± 0% 22.7ns ± 0% +3.06% (p=0.000 n=8+9) EqualPort/32-8 78.1ns ± 0% 78.1ns ± 0% ~ (all equal) EqualPort/4K-8 7.54µs ± 0% 7.61µs ± 0% +0.92% (p=0.000 n=10+8) EqualPort/4M-8 8.16ms ± 2% 8.05ms ± 1% -1.31% (p=0.023 n=10+10) EqualPort/64M-8 142ms ± 0% 142ms ± 0% +0.37% (p=0.000 n=10+10) CompareBytesEqual-8 39.0ns ± 0% 41.6ns ± 2% +6.67% (p=0.000 n=9+10) name old speed new speed delta Equal/1-8 86.9MB/s ± 0% 95.2MB/s ± 0% +9.53% (p=0.000 n=8+8) Equal/6-8 315MB/s ± 0% 444MB/s ± 0% +40.74% (p=0.000 n=9+10) Equal/9-8 290MB/s ± 0% 666MB/s ± 0% +129.63% (p=0.000 n=8+10) Equal/15-8 375MB/s ± 0% 967MB/s ± 0% +158.09% (p=0.000 n=10+10) Equal/16-8 385MB/s ± 0% 1103MB/s ± 0% +186.24% (p=0.000 n=10+9) Equal/20-8 421MB/s ± 0% 1175MB/s ± 0% +179.44% (p=0.000 n=9+10) Equal/32-8 488MB/s ± 0% 1881MB/s ± 0% +285.34% (p=0.000 n=10+8) Equal/4K-8 664MB/s ± 0% 7181MB/s ± 1% +981.32% (p=0.000 n=10+10) Equal/4M-8 654MB/s ± 0% 3822MB/s ±16% +484.15% (p=0.000 n=8+10) Equal/64M-8 645MB/s ± 0% 2056MB/s ± 0% +218.90% (p=0.000 n=10+10) EqualPort/1-8 76.8MB/s ± 0% 76.7MB/s ± 0% -0.09% (p=0.023 n=10+10) EqualPort/6-8 272MB/s ± 0% 264MB/s ± 0% -2.94% (p=0.000 n=8+10) EqualPort/32-8 410MB/s ± 0% 410MB/s ± 0% +0.01% (p=0.004 n=9+10) EqualPort/4K-8 543MB/s ± 0% 538MB/s ± 0% -0.91% (p=0.000 n=9+9) EqualPort/4M-8 514MB/s ± 2% 521MB/s ± 1% +1.31% (p=0.023 n=10+10) EqualPort/64M-8 473MB/s ± 0% 472MB/s ± 0% -0.37% (p=0.000 n=10+10) Benchmark results of go1: name old time/op new time/op delta BinaryTree17-8 6.53s ± 0% 6.52s ± 2% ~ (p=0.286 n=4+5) Fannkuch11-8 6.35s ± 1% 6.33s ± 0% ~ (p=0.690 n=5+5) FmtFprintfEmpty-8 108ns ± 1% 99ns ± 1% -8.31% (p=0.008 n=5+5) FmtFprintfString-8 172ns ± 1% 188ns ± 0% +9.43% (p=0.016 n=5+4) FmtFprintfInt-8 207ns ± 0% 202ns ± 0% -2.42% (p=0.008 n=5+5) FmtFprintfIntInt-8 277ns ± 1% 271ns ± 1% -2.02% (p=0.008 n=5+5) FmtFprintfPrefixedInt-8 386ns ± 0% 380ns ± 0% -1.55% (p=0.008 n=5+5) FmtFprintfFloat-8 492ns ± 0% 494ns ± 1% ~ (p=0.175 n=4+5) FmtManyArgs-8 1.32µs ± 1% 1.31µs ± 2% ~ (p=0.651 n=5+5) GobDecode-8 16.8ms ± 2% 16.9ms ± 1% ~ (p=0.310 n=5+5) GobEncode-8 14.1ms ± 1% 14.1ms ± 1% ~ (p=1.000 n=5+5) Gzip-8 788ms ± 0% 789ms ± 0% ~ (p=0.548 n=5+5) Gunzip-8 83.6ms ± 0% 83.6ms ± 0% ~ (p=0.548 n=5+5) HTTPClientServer-8 120µs ± 0% 120µs ± 1% ~ (p=0.690 n=5+5) JSONEncode-8 33.2ms ± 0% 33.6ms ± 0% +1.20% (p=0.008 n=5+5) JSONDecode-8 152ms ± 1% 146ms ± 1% -3.70% (p=0.008 n=5+5) Mandelbrot200-8 10.0ms ± 0% 10.0ms ± 0% ~ (p=0.151 n=5+5) GoParse-8 7.97ms ± 0% 8.06ms ± 0% +1.15% (p=0.008 n=5+5) RegexpMatchEasy0_32-8 233ns ± 1% 239ns ± 4% ~ (p=0.135 n=5+5) RegexpMatchEasy0_1K-8 1.86µs ± 0% 1.86µs ± 0% ~ (p=0.167 n=5+5) RegexpMatchEasy1_32-8 250ns ± 0% 263ns ± 1% +5.28% (p=0.008 n=5+5) RegexpMatchEasy1_1K-8 2.28µs ± 0% 2.13µs ± 0% -6.64% (p=0.000 n=4+5) RegexpMatchMedium_32-8 332ns ± 1% 319ns ± 0% -3.97% (p=0.008 n=5+5) RegexpMatchMedium_1K-8 85.5µs ± 2% 79.1µs ± 1% -7.42% (p=0.008 n=5+5) RegexpMatchHard_32-8 4.34µs ± 1% 4.42µs ± 7% ~ (p=0.881 n=5+5) RegexpMatchHard_1K-8 130µs ± 1% 127µs ± 0% -2.18% (p=0.008 n=5+5) Revcomp-8 1.35s ± 1% 1.34s ± 0% -0.58% (p=0.016 n=5+4) Template-8 160ms ± 2% 158ms ± 1% ~ (p=0.222 n=5+5) TimeParse-8 795ns ± 2% 772ns ± 2% -2.87% (p=0.024 n=5+5) TimeFormat-8 782ns ± 0% 784ns ± 0% ~ (p=0.198 n=5+5) name old speed new speed delta GobDecode-8 45.8MB/s ± 2% 45.5MB/s ± 1% ~ (p=0.310 n=5+5) GobEncode-8 54.3MB/s ± 1% 54.4MB/s ± 1% ~ (p=0.984 n=5+5) Gzip-8 24.6MB/s ± 0% 24.6MB/s ± 0% ~ (p=0.540 n=5+5) Gunzip-8 232MB/s ± 0% 232MB/s ± 0% ~ (p=0.548 n=5+5) JSONEncode-8 58.4MB/s ± 0% 57.7MB/s ± 0% -1.19% (p=0.008 n=5+5) JSONDecode-8 12.8MB/s ± 1% 13.3MB/s ± 1% +3.85% (p=0.008 n=5+5) GoParse-8 7.27MB/s ± 0% 7.18MB/s ± 0% -1.13% (p=0.008 n=5+5) RegexpMatchEasy0_32-8 137MB/s ± 1% 134MB/s ± 4% ~ (p=0.151 n=5+5) RegexpMatchEasy0_1K-8 551MB/s ± 0% 550MB/s ± 0% ~ (p=0.222 n=5+5) RegexpMatchEasy1_32-8 128MB/s ± 0% 121MB/s ± 1% -5.09% (p=0.008 n=5+5) RegexpMatchEasy1_1K-8 449MB/s ± 0% 481MB/s ± 0% +7.12% (p=0.016 n=4+5) RegexpMatchMedium_32-8 3.00MB/s ± 0% 3.13MB/s ± 0% +4.33% (p=0.016 n=4+5) RegexpMatchMedium_1K-8 12.0MB/s ± 2% 12.9MB/s ± 1% +7.98% (p=0.008 n=5+5) RegexpMatchHard_32-8 7.38MB/s ± 1% 7.25MB/s ± 7% ~ (p=0.952 n=5+5) RegexpMatchHard_1K-8 7.88MB/s ± 1% 8.05MB/s ± 0% +2.21% (p=0.008 n=5+5) Revcomp-8 188MB/s ± 1% 189MB/s ± 0% +0.58% (p=0.016 n=5+4) Template-8 12.2MB/s ± 2% 12.3MB/s ± 1% ~ (p=0.183 n=5+5) Change-Id: I65e79f3f8f8b2914678311c4f1b0a2d98459e220 Reviewed-on: https://go-review.googlesource.com/71110 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com>
2017-10-17reflect: optimize CALLFN wrapper for arm64Wei Xiao
Optimize arm64 CALLFN wrapper with LDP/STP instructions. This provides a significant speedup for big argument copy. Benchmark results for reflect: name old time/op new time/op delta Call-8 79.0ns ± 4% 73.6ns ± 4% -6.78% (p=0.000 n=10+10) CallArgCopy/size=128-8 80.5ns ± 0% 60.3ns ± 0% -25.06% (p=0.000 n=10+9) CallArgCopy/size=256-8 119ns ± 2% 67ns ± 1% -43.59% (p=0.000 n=8+10) CallArgCopy/size=1024-8 524ns ± 1% 99ns ± 1% -81.03% (p=0.000 n=10+10) CallArgCopy/size=4096-8 837ns ± 0% 231ns ± 1% -72.42% (p=0.000 n=9+9) CallArgCopy/size=65536-8 13.6µs ± 6% 3.1µs ± 1% -77.38% (p=0.000 n=10+10) PtrTo-8 12.9ns ± 0% 13.1ns ± 3% +1.86% (p=0.000 n=10+10) FieldByName1-8 28.7ns ± 2% 28.6ns ± 2% ~ (p=0.408 n=9+10) FieldByName2-8 928ns ± 4% 946ns ± 8% ~ (p=0.326 n=9+10) FieldByName3-8 5.35µs ± 5% 5.32µs ± 5% ~ (p=0.755 n=10+10) InterfaceBig-8 2.57ns ± 0% 2.57ns ± 0% ~ (all equal) InterfaceSmall-8 2.57ns ± 0% 2.57ns ± 0% ~ (all equal) New-8 9.09ns ± 1% 8.83ns ± 1% -2.81% (p=0.000 n=10+9) name old alloc/op new alloc/op delta Call-8 0.00B 0.00B ~ (all equal) name old allocs/op new allocs/op delta Call-8 0.00 0.00 ~ (all equal) name old speed new speed delta CallArgCopy/size=128-8 1.59GB/s ± 0% 2.12GB/s ± 1% +33.46% (p=0.000 n=10+9) CallArgCopy/size=256-8 2.14GB/s ± 2% 3.81GB/s ± 1% +78.02% (p=0.000 n=8+10) CallArgCopy/size=1024-8 1.95GB/s ± 1% 10.30GB/s ± 0% +427.99% (p=0.000 n=10+9) CallArgCopy/size=4096-8 4.89GB/s ± 0% 17.69GB/s ± 1% +261.87% (p=0.000 n=9+9) CallArgCopy/size=65536-8 4.84GB/s ± 6% 21.36GB/s ± 1% +341.67% (p=0.000 n=10+10) Change-Id: I775d88b30c43cb2eda1d0612ac15e6d283e70beb Reviewed-on: https://go-review.googlesource.com/70570 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-09-22runtime: remove getcallerpc argumentAustin Clements
Now that getcallerpc is a compiler intrinsic on x86 and non-x86 platforms don't need the argument, we can drop it. Sadly, this doesn't let us remove any dummy arguments since all of those cases also use getcallersp, which still takes the argument pointer, but this is at least an improvement. Change-Id: I9c34a41cf2c18cba57f59938390bf9491efb22d2 Reviewed-on: https://go-review.googlesource.com/65474 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-08-22cmd/compile: replace eqstring with memequalMartin Möhrmann
eqstring is only called for strings with equal lengths. Instead of pushing a pointer and length for each argument string on the stack we can omit pushing one of the lengths on the stack. Changing eqstrings signature to eqstring(*uint8, *uint8, int) bool to implement the above optimization would make it very similar to the existing memequal(*any, *any, uintptr) bool function. Since string lengths are positive we can avoid code redundancy and use memequal instead of using eqstring with an optimized signature. go command binary size reduced by 4128 bytes on amd64. name old time/op new time/op delta CompareStringEqual 6.03ns ± 1% 5.71ns ± 1% -5.23% (p=0.000 n=19+18) CompareStringIdentical 2.88ns ± 1% 3.22ns ± 7% +11.86% (p=0.000 n=20+20) CompareStringSameLength 4.31ns ± 1% 4.01ns ± 1% -7.17% (p=0.000 n=19+19) CompareStringDifferentLength 0.29ns ± 2% 0.29ns ± 2% ~ (p=1.000 n=20+20) CompareStringBigUnaligned 64.3µs ± 2% 64.1µs ± 3% ~ (p=0.164 n=20+19) CompareStringBig 61.9µs ± 1% 61.6µs ± 2% -0.46% (p=0.033 n=20+19) Change-Id: Ice15f3b937c981f0d3bc8479a9ea0d10658ac8df Reviewed-on: https://go-review.googlesource.com/53650 Run-TryBot: Martin Möhrmann <moehrmann@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-08-11runtime, cmd/compile: add intrinsic getclosureptrCholerae Hu
Intrinsic enabled on all architectures, runtime asm implementation removed on all architectures. Fixes #21258 Change-Id: I2cb86d460b497c2f287a5b3df5c37fdb231c23a7 Reviewed-on: https://go-review.googlesource.com/53411 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com> Reviewed-by: David Chase <drchase@google.com>
2017-08-08runtime: remove unused prefetch functionsMartin Möhrmann
The only non test user of the assembler prefetch functions is the heapBits.prefetch function which is itself unused. The runtime prefetch functions have no functionality on most platforms and are not inlineable since they are written in assembler. The function call overhead eliminates the performance gains that could be achieved with prefetching and would degrade performance for platforms where the functions are no-ops. If prefetch functions are needed back again later they can be improved by avoiding the function call overhead and implementing them as intrinsics. Change-Id: I52c553cf3607ffe09f0441c6e7a0a818cb21117d Reviewed-on: https://go-review.googlesource.com/44370 Run-TryBot: Martin Möhrmann <moehrmann@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Austin Clements <austin@google.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-03-07cmd/compile, runtime: simplify multiway select implementationMatthew Dempsky
This commit reworks multiway select statements to use normal control flow primitives instead of the previous setjmp/longjmp-like behavior. This simplifies liveness analysis and should prevent issues around "returns twice" function calls within SSA passes. test/live.go is updated because liveness analysis's CFG is more representative of actual control flow. The case bodies are the only real successors of the selectgo call, but previously the selectsend, selectrecv, etc. calls were included in the successors list too. Updates #19331. Change-Id: I7f879b103a4b85e62fc36a270d812f54c0aa3e83 Reviewed-on: https://go-review.googlesource.com/37661 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-02-14runtime: remove stack barriersAustin Clements
Now that we don't rescan stacks, stack barriers are unnecessary. This removes all of the code and structures supporting them as well as tests that were specifically for stack barriers. Updates #17503. Change-Id: Ia29221730e0f2bbe7beab4fa757f31a032d9690c Reviewed-on: https://go-review.googlesource.com/36620 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-02-10runtime: implement fastrand in goSokolov Yura
So it could be inlined. Using bit-tricks it could be implemented without condition (improved trick version by Minux Ma). Simple benchmark shows it is faster on i386 and x86_64, though I don't know will it be faster on other architectures? benchmark old ns/op new ns/op delta BenchmarkFastrand-3 2.79 1.48 -46.95% BenchmarkFastrandHashiter-3 25.9 24.9 -3.86% Change-Id: Ie2eb6d0f598c0bb5fac7f6ad0f8b5e3eddaa361b Reviewed-on: https://go-review.googlesource.com/34782 Reviewed-by: Minux Ma <minux@golang.org> Run-TryBot: Minux Ma <minux@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-10-28runtime: add deletion barriers on gobuf.ctxtAustin Clements
gobuf.ctxt is set to nil from many places in assembly code and these assignments require write barriers with the hybrid barrier. Conveniently, in most of these places ctxt should already be nil, in which case we don't need the barrier. This commit changes these places to assert that ctxt is already nil. gogo is more complicated, since ctxt may not already be nil. For gogo, we manually perform the write barrier if ctxt is not nil. Updates #17503. Change-Id: I9d75e27c75a1b7f8b715ad112fc5d45ffa856d30 Reviewed-on: https://go-review.googlesource.com/31764 Reviewed-by: Cherry Zhang <cherryyz@google.com>
2016-10-26runtime: simplify reflectcall write barriersAustin Clements
Currently reflectcall has a subtle dance with write barriers where the assembly code copies the result values from the stack to the in-heap argument frame without write barriers and then calls into the runtime after the fact to invoke the necessary write barriers. For the hybrid barrier (and for ROC), we need to switch to a *pre*-write write barrier, which is very difficult to do with the current setup. We could tie ourselves in knots of subtle reasoning about why it's okay in this particular case to have a post-write write barrier, but this commit instead takes a different approach. Rather than making things more complex, this simplifies reflection calls so that the argument copy is done in Go using normal bulk write barriers. The one difficulty with this approach is that calling into Go requires putting arguments on the stack, but the call* functions "donate" their entire stack frame to the called function. We can get away with this now because the copy avoids using the stack and has copied the results out before we clobber the stack frame to call into the write barrier. The solution in this CL is to call another function, passing arguments in registers instead of on the stack, and let that other function reserve more stack space and setup the arguments for the runtime. This approach seemed to work out the best. I also tried making the call* functions reserve 32 extra bytes of frame for the write barrier arguments and adjust SP up by 32 bytes around the call. However, even with the necessary changes to the assembler to correct the spdelta table, the runtime was still having trouble with the frame layout (and the changes to the assembler caused many other things that do strange things with the SP to fail to assemble). The approach I took doesn't require any funny business with the SP. Updates #17503. Change-Id: Ie2bb0084b24d6cff38b5afb218b9e0534ad2119e Reviewed-on: https://go-review.googlesource.com/31655 Run-TryBot: Austin Clements <austin@google.com> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2016-10-24runtime: make morestack less subtleAustin Clements
morestack writes the context pointer to gobuf.ctxt, but since morestack is written in assembly (and has to be very careful with state), it does *not* invoke the requisite write barrier for this write. Instead, we patch this up later, in newstack, where we invoke an explicit write barrier for ctxt. This already requires some subtle reasoning, and it's going to get a lot hairier with the hybrid barrier. Fix this by simplifying the whole mechanism. Instead of writing gobuf.ctxt in morestack, just pass the value of the context register to newstack and let it write it to gobuf.ctxt. This is a normal Go pointer write, so it gets the normal Go write barrier. No subtle reasoning required. Updates #17503. Change-Id: Ia6bf8459bfefc6828f53682ade32c02412e4db63 Reviewed-on: https://go-review.googlesource.com/31550 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2016-10-17runtime: print a message on bad morestackAustin Clements
If morestack runs on the g0 or gsignal stack, it currently performs some abort operation that typically produces a signal (e.g., it does an INT $3 on x86). This is useful if you're running in a debugger, but if you're not, the runtime tries to trap this signal, which is likely to send the program into a deeper spiral of collapse and lead to very confusing diagnostic output. Help out people trying to debug without a debugger by making morestack print an informative message before blowing up. Change-Id: I2814c64509b137bfe20a00091d8551d18c2c4749 Reviewed-on: https://go-review.googlesource.com/31133 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Rick Hudson <rlh@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-09-26runtime: implement getcallersp in GoAustin Clements
This makes it possible to inline getcallersp. getcallersp is on the hot path of defers, so this slightly speeds up defer: name old time/op new time/op delta Defer-4 78.3ns ± 2% 75.1ns ± 1% -4.00% (p=0.000 n=9+8) Updates #14939. Change-Id: Icc1cc4cd2f0a81fc4c8344432d0b2e783accacdd Reviewed-on: https://go-review.googlesource.com/29655 TryBot-Result: Gobot Gobot <gobot@golang.org> Run-TryBot: Austin Clements <austin@google.com> Reviewed-by: David Crawshaw <crawshaw@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-08-30runtime: rename fastrand1 to fastrandJosh Bleecher Snyder
Change-Id: I37706ff0a3486827c5b072c95ad890ea87ede847 Reviewed-on: https://go-review.googlesource.com/28210 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-08-30cmd/compile, runtime, etc: get rid of constant FP registersCherry Zhang
On ARM64, MIPS64, and PPC64, some floating point registers were reserved for constants 0, 1, 2, 0.5, etc. This CL removes them. On ARM64, they are never used. On MIPS64 and PPC64, the only use case is a multiplication-by-2 in the old backend of the compiler, which is replaced with an addition. Change-Id: I737cbf43283756e3408964fc88c567a938c57036 Reviewed-on: https://go-review.googlesource.com/28095 Run-TryBot: Cherry Zhang <cherryyz@google.com> Reviewed-by: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-08-25all: fix assembly vet issuesJosh Bleecher Snyder
Add missing function prototypes. Fix function prototypes. Use FP references instead of SP references. Fix variable names. Update comments. Clean up whitespace. (Not for vet.) All fairly minor fixes to make vet happy. Updates #11041 Change-Id: Ifab2cdf235ff61cdc226ab1d84b8467b5ac9446c Reviewed-on: https://go-review.googlesource.com/27713 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-04-29runtime: fix cgocallback_gofunc argument passing on arm64Ian Lance Taylor
Change-Id: I4b34bcd5cde71ecfbb352b39c4231de6168cc7f3 Reviewed-on: https://go-review.googlesource.com/22651 Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Michael Munday <munday@ca.ibm.com>
2016-04-29cmd/cgo, runtime, runtime/cgo: use cgo context functionIan Lance Taylor
Add support for the context function set by runtime.SetCgoTraceback. The context function was added in CL 17761, without support. This CL is the support. This CL has not been tested for real C code, as a working context function for C code requires unwind support that does not seem to exist. I wanted to get the CL out before the freeze. I apologize for the length of this CL. It's mostly plumbing, but unfortunately the plumbing is processor-specific. Change-Id: I8ce11a0de9b3dafcc29efd2649d776e93bff0e90 Reviewed-on: https://go-review.googlesource.com/22508 Reviewed-by: Austin Clements <austin@google.com> Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-02all: single space after period.Brad Fitzpatrick
The tree's pretty inconsistent about single space vs double space after a period in documentation. Make it consistently a single space, per earlier decisions. This means contributors won't be confused by misleading precedence. This CL doesn't use go/doc to parse. It only addresses // comments. It was generated with: $ perl -i -npe 's,^(\s*// .+[a-z]\.) +([A-Z]),$1 $2,' $(git grep -l -E '^\s*//(.+\.) +([A-Z])') $ go test go/doc -update Change-Id: Iccdb99c37c797ef1f804a94b22ba5ee4b500c4f7 Reviewed-on: https://go-review.googlesource.com/20022 Reviewed-by: Rob Pike <r@golang.org> Reviewed-by: Dave Day <djd@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-02-23runtime: unify memeq and memequalKeith Randall
They do the same thing, except memequal also has the short-circuit check if the two pointers are equal. A) We might as well always do the short-circuit check, it is only 2 instructions. B) The extra function call (memequal->memeq) is expensive. benchmark old ns/op new ns/op delta BenchmarkArrayEqual-8 8.56 5.31 -37.97% No noticeable affect on the former memeq user (maps). Fixes #14302 Change-Id: I85d1ada59ed11e64dd6c54667f79d32cc5f81948 Reviewed-on: https://go-review.googlesource.com/19843 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-01-11runtime: fix arm/arm64/ppc64/mips64 to dropm when necessaryIan Lance Taylor
Fixes #13881. Change-Id: Idff77db381640184ddd2b65022133bb226168800 Reviewed-on: https://go-review.googlesource.com/18449 Reviewed-by: David Crawshaw <crawshaw@golang.org> Run-TryBot: Ian Lance Taylor <iant@golang.org>
2015-11-25runtime: fix conflict resolution in golang.org/cl/14207Michael Hudson-Doyle
Fixes testshared on arm64 and ppc64le. Change-Id: Ie94bc0c85c7666fbb5ab6fc6d3dbb180407a9955 Reviewed-on: https://go-review.googlesource.com/17212 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-11-25runtime: check that masks and shifts are correct alignedShenghou Ma
We need a runtime check because the original issue is encountered when running cross compiled windows program from linux. It's better to give a meaningful crash message earlier than to segfault later. The added test should not impose any measurable overhead to Go programs. For #12415. Change-Id: Ib4a24ef560c09c0585b351d62eefd157b6b7f04c Reviewed-on: https://go-review.googlesource.com/14207 Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Minux Ma <minux@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>