runtime: zero upper bit of Y registers in asyncPreempt on darwin/amd64

Apparently, the signal handling code path in darwin kernel leaves the upper bits of Y registers in a dirty state, which causes many SSE operations (128-bit and narrower) become much slower. Clear the upper bits to get to a clean state. We do it at the entry of asyncPreempt, which is immediately following exiting from the kernel's signal handling code, if we actually injected a call. It does not cover other exits where we don't inject a call, e.g. failed preemption, profiling signal, or other async signals. But it does cover an important use case of async signals, preempting a tight numerical loop, which we introduced in this cycle. Running the benchmark in issue #37174: name old time/op new time/op delta Fast-8 90.0ns ± 1% 46.8ns ± 3% -47.97% (p=0.000 n=10+10) Slow-8 188ns ± 5% 49ns ± 1% -73.82% (p=0.000 n=10+9) There is no more slowdown due to preemption signals. For #37174. Change-Id: I8b83d083fade1cabbda09b4bc25ccbadafaf7605 Reviewed-on: https://go-review.googlesource.com/c/go/+/219131 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
author: Cherry Zhang <cherryyz@google.com> 2020-02-11 18:54:30 -0500
committer: Cherry Zhang <cherryyz@google.com> 2020-02-13 19:41:53 +0000
commit: 123f7dd3e1eb90825ece57b8dde39438ca34f150 (patch)
tree: 87c73b63b74568419b164d64b4f5509ffaad9aae /src/runtime/mkpreempt.go
parent: a0c9fb6bd359f111f19176ff176244b91a3e7eaa (diff)
download: go-123f7dd3e1eb90825ece57b8dde39438ca34f150.tar.gz
go-123f7dd3e1eb90825ece57b8dde39438ca34f150.zip
1 files changed, 9 insertions, 0 deletions
diff --git a/src/runtime/mkpreempt.go b/src/runtime/mkpreempt.go
index 64e220772e..31b6f5cbac 100644
--- a/src/runtime/mkpreempt.go
+++ b/src/runtime/mkpreempt.go
@@ -244,6 +244,15 @@ func genAMD64() {
 
 	// TODO: MXCSR register?
 
+	// Apparently, the signal handling code path in darwin kernel leaves
+	// the upper bits of Y registers in a dirty state, which causes
+	// many SSE operations (128-bit and narrower) become much slower.
+	// Clear the upper bits to get to a clean state. See issue #37174.
+	// It is safe here as Go code don't use the upper bits of Y registers.
+	p("#ifdef GOOS_darwin")
+	p("VZEROUPPER")
+	p("#endif")
+
 	p("PUSHQ BP")
 	p("MOVQ SP, BP")
 	p("// Save flags before clobbering them")
author	Cherry Zhang <cherryyz@google.com>	2020-02-11 18:54:30 -0500
committer	Cherry Zhang <cherryyz@google.com>	2020-02-13 19:41:53 +0000
commit	123f7dd3e1eb90825ece57b8dde39438ca34f150 (patch)
tree	87c73b63b74568419b164d64b4f5509ffaad9aae /src/runtime/mkpreempt.go
parent	a0c9fb6bd359f111f19176ff176244b91a3e7eaa (diff)
download	go-123f7dd3e1eb90825ece57b8dde39438ca34f150.tar.gz go-123f7dd3e1eb90825ece57b8dde39438ca34f150.zip