go - The Go programming language

Age	Commit message (Collapse)	Author
2021-04-22	cmd/compile, runtime: add metadata for argument printing in traceback	Cherry Zhang
	Currently, when the runtime printing a stack track (at panic, or when runtime.Stack is called), it prints the function arguments as words in memory. With a register-based calling convention, the layout of argument area of the memory changes, so the printing also needs to change. In particular, the memory order and the syntax order of the arguments may differ. To address that, this CL lets the compiler to emit some metadata about the memory layout of the arguments, and the runtime will use this information to print arguments in syntax order. Previously we print the memory contents of the results along with the arguments. The results are likely uninitialized when the traceback is taken, so that information is rarely useful. Also, with a register-based calling convention the results may not have corresponding locations in memory. This CL changes it to not print results. Previously the runtime simply prints the memory contents as pointer-sized words. With a register-based calling convention, as the layout changes, arguments that were packed in one word may no longer be in one word. Also, as the spill slots are not always initialized, it is possible that some part of a word contains useful informationwhile the rest contains garbage. Instead of letting the runtime recreating the ABI0 layout and print them as words, we now print each component separately. Aggregate-typed argument/component is surrounded by "{}". For example, for a function F(int, [3]byte, byte) int when called as F(1, [3]byte{2, 3, 4}, 5), it used to print F(0x1, 0x5040302, 0xXXXXXXXX) // assuming little endian, 0xXXXXXXXX is uninitilized result Now prints F(0x1, {0x2, 0x3, 0x4}, 0x5). Note: the liveness tracking of the spill splots has not been implemented in this CL. Currently the runtime just assumes all the slots are live and print them all. Increase binary sizes by ~1.5%. old new hello (println) 1171328 1187712 (+1.4%) hello (fmt) 1877024 1901600 (+1.3%) cmd/compile 22326928 22662800 (+1.5%) cmd/go 13505024 13726208 (+1.6%) Updates #40724. Change-Id: I351e0bf497f99bdbb3f91df2fb17e3c2c5c316dc Reviewed-on: https://go-review.googlesource.com/c/go/+/304470 Trust: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Michael Knyszek <mknyszek@google.com>
2021-04-14	runtime: update debug call protocol for register ABI	Michael Anthony Knyszek
	The debug call tests currently assume that the target Go function is ABI0; this is clearly no longer true when we switch to the new ABI, so make the tests set up argument register state in the debug call handler and copy back results returned in registers. A small snag in calling a Go function that follows the new ABI is that the debug call protocol depends on the AX register being set to a specific value as it bounces in and out of the handler, but this register is part of the new register ABI, so results end up being clobbered. Use R12 instead. Next, the new desugaring behavior for "go" statements means that newosproc1 must always call a function with no frame; if it takes any arguments, it closes over them and they're passed in the context register. Currently when debugCallWrap creates a new goroutine, it uses newosproc1 directly and passes a non-zero-sized frame, so that needs to be updated. To fix this, briefly use the g's param field which is otherwise only used for channels to pass an explicitly allocated object containing the "closed over" variables. While we could manually do the desugaring ourselves (we cannot do so automatically because the Go compiler prevents heap-allocated closures in the runtime), that bakes in more ABI details in a place that really doesn't need to care about them. Finally, there's an old bug here where the context register was set up in CX, so technically closure calls never worked. Oops. It was otherwise harmless for other types of calls before, but now CX is an argument register, so now that interferes with regular calls, too. For #40724. Change-Id: I652c25ed56a25741bb04c24cfb603063c099edde Reviewed-on: https://go-review.googlesource.com/c/go/+/309169 Trust: Michael Knyszek <mknyszek@google.com> Run-TryBot: Michael Knyszek <mknyszek@google.com> Reviewed-by: Austin Clements <austin@google.com> Reviewed-by: Alessandro Arzilli <alessandro.arzilli@gmail.com> TryBot-Result: Go Bot <gobot@golang.org>
2021-04-13	runtime, cgo/test: improve debugging output	David Chase
	tests that run commands should log their actions in a shell-pasteable way. Change-Id: Ifeee88397047ef5a76925c5f30c213e83e535038 Reviewed-on: https://go-review.googlesource.com/c/go/+/309770 Trust: David Chase <drchase@google.com> Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2021-04-13	runtime: eliminate externalthreadhandler	Austin Clements
	This function is no longer used. Eliminating this actually fixes several problems: - It made assumptions about what registers memclrNoHeapPointers would preserve. Besides being an abstraction violation and lurking maintenance issue, this actively became a problem for regabi because the call to memclrNoHeapPointers now happens through an ABI wrapper, which is generated by the compiler and hence we can't easily control what registers it clobbers. - The amd64 implementation (at least), does not interact with the host ABI correctly. Notably, it doesn't save many of the registers that are callee-save in the host ABI but caller-save in the Go ABI. - It interacts strangely with the NOSPLIT checker because it allocates an entire M and G on its stack. It worked around this on arm64, and happened to do things the NOSPLIT checker couldn't track on 386 and amd64, and happened to be 4 bytes below the limit on arm (so any addition to the m or g structs would cause a NOSPLIT failure). See CL 309031 for a more complete explanation. Fixes #45530. Updates #40724. Change-Id: Ic70d4d7e1c17f1d796575b3377b8529449e93576 Reviewed-on: https://go-review.googlesource.com/c/go/+/309634 Trust: Austin Clements <austin@google.com> Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com>
2021-04-12	runtime: non-strict InlTreeIndex lookup in expandFinalInlineFrame	Michael Pratt
	This is a follow-up to golang.org/cl/301369, which made the same change in Frames.Next. The same logic applies here: a profile stack may have been truncated at an invalid PC provided by cgoTraceback. expandFinalInlineFrame will then try to lookup the inline tree and crash. The same fix applies as well: upon encountering a bad PC, simply leave it as-is and move on. Fixes #44971 Fixes #45480 Change-Id: I2823c67a1f3425466b05384cc6d30f5fc8ee6ddc Reviewed-on: https://go-review.googlesource.com/c/go/+/309109 Reviewed-by: Michael Knyszek <mknyszek@google.com> Trust: Michael Pratt <mpratt@google.com>
2021-04-06	runtime: use funcID to identify abort in isAbortPC	Michael Anthony Knyszek
	This change eliminates the use of funcPC to determine if an PC is in abort. Using funcPC for this purpose is problematic when using plugins because symbols no longer have unique PCs. funcPC also grabs the wrapper for runtime.abort which isn't what we want for the new register ABI, so rather than mark runtime.abort as ABIInternal, use funcID. For #40724. Change-Id: I2730e99fe6f326d22d64a10384828b94f04d101a Reviewed-on: https://go-review.googlesource.com/c/go/+/307391 Trust: Michael Knyszek <mknyszek@google.com> Run-TryBot: Michael Knyszek <mknyszek@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> Reviewed-by: Austin Clements <austin@google.com>
2021-03-30	runtime: non-strict InlTreeIndex lookup in Frames.Next	Michael Pratt
	When using cgo, some of the frames can be provided by cgoTraceback, a cgo-provided function to generate C tracebacks. Unlike Go tracebacks, cgoTraceback has no particular guarantees that it produces valid tracebacks. If one of the (invalid) frames happens to put the PC in the alignment region at the end of a function (filled with int 3's on amd64), then Frames.Next will find a valid funcInfo for the PC, but pcdatavalue will panic because PCDATA doesn't cover this PC. Tolerate this case by doing a non-strict PCDATA lookup. We'll still show a bogus frame, but at least avoid throwing. Fixes #44971 Change-Id: I9eed728470d6f264179a7615bd19845c941db78c Reviewed-on: https://go-review.googlesource.com/c/go/+/301369 Trust: Michael Pratt <mpratt@google.com> Run-TryBot: Michael Pratt <mpratt@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2021-02-19	runtime: do not treat mcall as a topofstack	Russ Cox
	I added mcall to this list in 2013 without explaining why. (https://codereview.appspot.com/11085043/diff/61001/src/pkg/runtime/traceback_x86.c) I suspect I was stopping crashes during profiling where the unwind tried to walk up past mcall and got confused. mcall is not something you can unwind past, because it switches stacks, but it's also not something you should expect as a standard top-of-stack frame. So if you do see it during say a garbage collection stack walk, it would be important to crash instead of silently stopping the walk prematurely. This CL removes it from the topofstack list to avoid the silent stop. Now that mcall is detected as SPWRITE, that will stop the unwind (with a crash if encountered during GC, which we want). This CL is part of a stack adding windows/arm64 support (#36439), intended to land in the Go 1.17 cycle. This CL is, however, not windows/arm64-specific. It is cleanup meant to make the port (and future ports) easier. Change-Id: I666487ce24efd72292f2bc3eac7fe0477e16bddd Reviewed-on: https://go-review.googlesource.com/c/go/+/288803 Trust: Russ Cox <rsc@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2021-02-19	cmd/asm, cmd/link, runtime: introduce FuncInfo flag bits	Russ Cox
	The runtime traceback code has its own definition of which functions mark the top frame of a stack, separate from the TOPFRAME bits that exist in the assembly and are passed along in DWARF information. It's error-prone and redundant to have two different sources of truth. This CL provides the actual TOPFRAME bits to the runtime, so that the runtime can use those bits instead of reinventing its own category. This CL also adds a new bit, SPWRITE, which marks functions that write directly to SP (anything but adding and subtracting constants). Such functions must stop a traceback, because the traceback has no way to rederive the SP on entry. Again, the runtime has its own definition which is mostly correct, but also missing some functions. During ordinary goroutine context switches, such functions do not appear on the stack, so the incompleteness in the runtime usually doesn't matter. But profiling signals can arrive at any moment, and the runtime may crash during traceback if it attempts to unwind an SP-writing frame and gets out-of-sync with the actual stack. The runtime contains code to try to detect likely candidates but again it is incomplete. Deriving the SPWRITE bit automatically from the actual assembly code provides the complete truth, and passing it to the runtime lets the runtime use it. This CL is part of a stack adding windows/arm64 support (#36439), intended to land in the Go 1.17 cycle. This CL is, however, not windows/arm64-specific. It is cleanup meant to make the port (and future ports) easier. Change-Id: I227f53b23ac5b3dabfcc5e8ee3f00df4e113cf58 Reviewed-on: https://go-review.googlesource.com/c/go/+/288800 Trust: Russ Cox <rsc@golang.org> Trust: Jason A. Donenfeld <Jason@zx2c4.com> Reviewed-by: Cherry Zhang <cherryyz@google.com> Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-02-19	runtime: clean up funcID assignment	Russ Cox
	Large enum sets should be sorted by name when the values don't matter, as they don't here. Do that. Also replace the large switch with a map lookup. This CL is part of a stack adding windows/arm64 support (#36439), intended to land in the Go 1.17 cycle. This CL is, however, not windows/arm64-specific. It is cleanup meant to make the port (and future ports) easier. Change-Id: Ibe727b5d8866bf4c40c96020e1f4632bde7efd59 Reviewed-on: https://go-review.googlesource.com/c/go/+/288798 Trust: Russ Cox <rsc@golang.org> Trust: Jason A. Donenfeld <Jason@zx2c4.com> Reviewed-by: Cherry Zhang <cherryyz@google.com> Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com> Reviewed-by: Ian Lance Taylor <iant@golang.org>
2020-10-30	cmd/internal/objabi, runtime: compact FUNCDATA indices	Cherry Zhang
	As we deleted register maps, move FUNCDATA indices of stack objects, inline trees, and open-coded defers earlier. Change-Id: If73797b8c11fd207655c9498802fca9f6f9ac338 Reviewed-on: https://go-review.googlesource.com/c/go/+/265761 Trust: Cherry Zhang <cherryyz@google.com> Reviewed-by: Austin Clements <austin@google.com>
2020-10-30	runtime: remove go115ReduceLiveness and go115RestartSeq	Cherry Zhang
	Make them always true. Delete code that are only executed when they are false. Change-Id: I6194fa00de23486c2b0a0c9075fe3a09d9c52762 Reviewed-on: https://go-review.googlesource.com/c/go/+/264339 Trust: Cherry Zhang <cherryyz@google.com> Reviewed-by: Austin Clements <austin@google.com>
2020-10-26	runtime,cmd/cgo: simplify C -> Go call path	Austin Clements
	This redesigns the way calls work from C to exported Go functions. It removes several steps from the call path, makes cmd/cgo no longer sensitive to the Go calling convention, and eliminates the use of reflectcall from cgo. In order to avoid generating a large amount of FFI glue between the C and Go ABIs, the cgo tool has long depended on generating a C function that marshals the arguments into a struct, and then the actual ABI switch happens in functions with fixed signatures that simply take a pointer to this struct. In a way, this CL simply pushes this idea further. Currently, the cgo tool generates this argument struct in the exact layout of the Go stack frame and depends on reflectcall to unpack it into the appropriate Go call (even though it's actually reflectcall'ing a function generated by cgo). In this CL, we decouple this struct from the Go stack layout. Instead, cgo generates a Go function that takes the struct, unpacks it, and calls the exported function. Since this generated function has a generic signature (like the rest of the call path), we don't need reflectcall and can instead depend on the Go compiler itself to implement the call to the exported Go function. One complication is that syscall.NewCallback on Windows, which converts a Go function into a C function pointer, depends on cgocallback's current dynamic calling approach since the signatures of the callbacks aren't known statically. For this specific case, we continue to depend on reflectcall. Really, the current approach makes some overly simplistic assumptions about translating the C ABI to the Go ABI. Now we're at least in a much better position to do a proper ABI translation. For comparison, the current cgo call path looks like: GoF (generated C function) -> crosscall2 (in cgo/asm_.s) -> _cgoexp_GoF (generated Go function) -> cgocallback (in asm_.s) -> cgocallback_gofunc (in asm_.s) -> cgocallbackg (in cgocall.go) -> cgocallbackg1 (in cgocall.go) -> reflectcall (in asm_.s) -> _cgoexpwrap_GoF (generated Go function) -> p.GoF Now the call path looks like: GoF (generated C function) -> crosscall2 (in cgo/asm_.s) -> cgocallback (in asm_.s) -> cgocallbackg (in cgocall.go) -> cgocallbackg1 (in cgocall.go) -> _cgoexp_GoF (generated Go function) -> p.GoF Notably: 1. We combine _cgoexp_GoF and _cgoexpwrap_GoF and move the combined operation to the end of the sequence. This combined function also handles reflectcall's previous role. 2. We combined cgocallback and cgocallback_gofunc since the only purpose of having both was to convert a raw PC into a Go function value. We instead construct the Go function value in cgocallbackg1. 3. cgocallbackg1 no longer reaches backwards through the stack to get the arguments to cgocallback_gofunc. Instead, we just pass the arguments down. 4. Currently, we need an explicit msanwrite to mark the results struct as written because reflectcall doesn't do this. Now, the results are written by regular Go assignments, so the Go compiler generates the necessary MSAN annotations. This also means we no longer need to track the size of the arguments frame. Updates #40724, since now we don't need to teach cgo about the register ABI or change how it uses reflectcall. Change-Id: I7840489a2597962aeb670e0c1798a16a7359c94f Reviewed-on: https://go-review.googlesource.com/c/go/+/258938 Trust: Austin Clements <austin@google.com> Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-10-14	runtime: implement GODEBUG=inittrace=1 support	Martin Möhrmann
	Setting inittrace=1 causes the runtime to emit a single line to standard error for each package with init work, summarizing the execution time and memory allocation. The emitted debug information for init functions can be used to find bottlenecks or regressions in Go startup performance. Packages with no init function work (user defined or compiler generated) are omitted. Tracing plugin inits is not supported as they can execute concurrently. This would make the implementation of tracing more complex while adding support for a very rare use case. Plugin inits can be traced separately by testing a main package importing the plugins package imports explicitly. $ GODEBUG=inittrace=1 go test init internal/bytealg @0.008 ms, 0 ms clock, 0 bytes, 0 allocs init runtime @0.059 ms, 0.026 ms clock, 0 bytes, 0 allocs init math @0.19 ms, 0.001 ms clock, 0 bytes, 0 allocs init errors @0.22 ms, 0.004 ms clock, 0 bytes, 0 allocs init strconv @0.24 ms, 0.002 ms clock, 32 bytes, 2 allocs init sync @0.28 ms, 0.003 ms clock, 16 bytes, 1 allocs init unicode @0.44 ms, 0.11 ms clock, 23328 bytes, 24 allocs ... Inspired by stapelberg@google.com who instrumented doInit in a prototype to measure init times with GDB. Fixes #41378 Change-Id: Ic37c6a0cfc95488de9e737f5e346b8dbb39174e1 Reviewed-on: https://go-review.googlesource.com/c/go/+/254659 Trust: Martin Möhrmann <moehrmann@google.com> Run-TryBot: Martin Möhrmann <moehrmann@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2020-09-11	[dev.link] all: merge branch 'master' into dev.link	Cherry Zhang
	Clean merge. Change-Id: Ib773b0bc00fd99d494f9331c3613bcc8285e48e3
2020-09-08	runtime: make PCDATA_RegMapUnsafe more clear and remove magic number	chainhelen
	Change-Id: Ibf3ee755c3fbec03a9396840dc92ce148c49d9f7 GitHub-Last-Rev: 945d8aaa136003dc381c6aa48bff9ea7ca2c6991 GitHub-Pull-Request: golang/go#41262 Reviewed-on: https://go-review.googlesource.com/c/go/+/253377 Reviewed-by: Austin Clements <austin@google.com> Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2020-08-18	[dev.link] cmd/{compile,link}: remove pcdata tables from pclntab_old	Jeremy Faller
	Move the pctables out of pclntab_old. Creates a new generator symbol, runtime.pctab, which holds all the deduplicated pctables. Also, tightens up some of the types in runtime. Darwin, cmd/compile statistics: alloc/op Pclntab_GC 26.4MB ± 0% 13.8MB ± 0% allocs/op Pclntab_GC 89.9k ± 0% 86.4k ± 0% liveB Pclntab_GC 25.5M ± 0% 24.2M ± 0% No significant change in binary size. Change-Id: I1560fd4421f8a210f8d4b508fbc54e1780e338f9 Reviewed-on: https://go-review.googlesource.com/c/go/+/248332 Run-TryBot: Jeremy Faller <jeremy@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-08-12	[dev.link] cmd/link: stop renumbering files for pclntab generation	Jeremy Faller
	Creates two new symbols: runtime.cutab, and runtime.filetab, and strips the filenames out of runtime.pclntab_old. All stats are for cmd/compile. Time: Pclntab_GC 48.2ms ± 3% 45.5ms ± 9% -5.47% (p=0.004 n=9+9) Alloc/op: Pclntab_GC 30.0MB ± 0% 29.5MB ± 0% -1.88% (p=0.000 n=10+10) Allocs/op: Pclntab_GC 90.4k ± 0% 73.1k ± 0% -19.11% (p=0.000 n=10+10) live-B: Pclntab_GC 29.1M ± 0% 29.2M ± 0% +0.10% (p=0.000 n=10+10) binary sizes: NEW: 18565600 OLD: 18532768 The size differences in the binary are caused by the increased size of the Func objects, and (less likely) some extra alignment padding needed as a result. This is probably the maximum increase in size we'll size from the pclntab reworking. Change-Id: Idd95a9b159fea46f7701cfe6506813b88257fbea Reviewed-on: https://go-review.googlesource.com/c/go/+/246497 Run-TryBot: Jeremy Faller <jeremy@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Than McIntosh <thanm@google.com> Reviewed-by: Austin Clements <austin@google.com>
2020-07-31	[dev.link] create runtime.funcnametab	Jeremy Faller
	Move the function names out of runtime.pclntab_old, creating runtime.funcnametab. There is an unfortunate artifact in this change in that calculating the funcID still requires loading the name. Future work will likely pull this out and put it into the object file Funcs. ls -l cmd/compile (darwin): before: 18524016 after: 18519952 The difference in size can be attributed to alignment in pclntab_old. Change-Id: Ibcbb230d4632178f8fcd0667165f5335786381f8 Reviewed-on: https://go-review.googlesource.com/c/go/+/243223 Reviewed-by: Austin Clements <austin@google.com>
2020-07-30	[dev.link] cmd/link: add runtime.pcheader	Jeremy Faller
	As of July 2020, a fair amount of the new linker's live memory, and runtime is spent generating pclntab. In an effort to streamline that code, this change starts breaking up the generation of runtime.pclntab into smaller chunks that can run later in a link. These changes are described in an (as yet not widely distributed) document that lays out an improved format. Largely the work consists of breaking up runtime.pclntab into smaller pieces, stopping much of the data rewriting, and getting runtime.pclntab into a form where we can reason about its size and look to shrink it. This change is the first part of that work -- just pulling out the header, and demonstrating where a majority of that work will be. Change-Id: I65618d0d0c780f7e5977c9df4abdbd1696fedfcb Reviewed-on: https://go-review.googlesource.com/c/go/+/241598 Run-TryBot: Jeremy Faller <jeremy@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> Reviewed-by: Austin Clements <austin@google.com>
2020-06-10	runtime: fix typo in FuncForPC docgo1.15beta1	Rodolfo Carvalho
	Change-Id: I04037e13b131e79ebc5af84896bfeda49ddc0eaa GitHub-Last-Rev: b0d0de930862e4f163e158876cba70d81ed2d52e GitHub-Pull-Request: golang/go#39500 Reviewed-on: https://go-review.googlesource.com/c/go/+/237220 Reviewed-by: Keith Randall <khr@golang.org>
2020-05-06	cmd/internal/obj, runtime: preempt & restart some instruction sequences	Cherry Zhang
	On some architectures, for async preemption the injected call needs to clobber a register (usually REGTMP) in order to return to the preempted function. As a consequence, the PC ranges where REGTMP is live are not preemptible. The uses of REGTMP are usually generated by the assembler, where it needs to load or materialize a large constant or offset that doesn't fit into the instruction. In those cases, REGTMP is not live at the start of the instruction sequence. Instead of giving up preemption in those cases, we could preempt it and restart the sequence when resuming the execution. Basically, this is like reissuing an interrupted instruction, except that here the "instruction" is a Prog that consists of multiple machine instructions. For this to work, we need to generate PC data to mark the start of the Prog. Currently this is only done for ARM64. TODO: the split-stack function prologue is currently not async preemptible. We could use this mechanism, preempt it and restart at the function entry. Change-Id: I37cb282f8e606e7ab6f67b3edfdc6063097b4bd1 Reviewed-on: https://go-review.googlesource.com/c/go/+/208126 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Austin Clements <austin@google.com>
2020-04-29	cmd/compile,runtime: stack maps only at calls, remove register maps	Austin Clements
	Currently, we emit stack maps and register maps at almost every instruction. This was originally intended to support non-cooperative preemption, but was only ever used for debug call injection. Now debug call injection also uses conservative frame scanning. As a result, stack maps are only needed at call sites and register maps aren't needed at all except that we happen to also encode unsafe-point information in the register map PCDATA stream. This CL reduces stack maps to only appear at calls, and replace full register maps with just safe/unsafe-point information. This is all protected by the go115ReduceLiveness feature flag, which is defined in both runtime and cmd/compile. This CL significantly reduces binary sizes and also speeds up compiles and links: name old exe-bytes new exe-bytes delta BinGoSize 15.0MB ± 0% 14.1MB ± 0% -5.72% name old pcln-bytes new pcln-bytes delta BinGoSize 3.14MB ± 0% 2.48MB ± 0% -21.08% name old time/op new time/op delta Template 178ms ± 7% 172ms ±14% -3.59% (p=0.005 n=19+19) Unicode 71.0ms ±12% 69.8ms ±10% ~ (p=0.126 n=18+18) GoTypes 655ms ± 8% 615ms ± 8% -6.11% (p=0.000 n=19+19) Compiler 3.27s ± 6% 3.15s ± 7% -3.69% (p=0.001 n=20+20) SSA 7.10s ± 5% 6.85s ± 8% -3.53% (p=0.001 n=19+20) Flate 124ms ±15% 116ms ±22% -6.57% (p=0.024 n=18+19) GoParser 156ms ±26% 147ms ±34% ~ (p=0.070 n=19+19) Reflect 406ms ± 9% 387ms ±21% -4.69% (p=0.028 n=19+20) Tar 163ms ±15% 162ms ±27% ~ (p=0.370 n=19+19) XML 223ms ±13% 218ms ±14% ~ (p=0.157 n=20+20) LinkCompiler 503ms ±21% 484ms ±23% ~ (p=0.072 n=20+20) ExternalLinkCompiler 1.27s ± 7% 1.22s ± 8% -3.85% (p=0.005 n=20+19) LinkWithoutDebugCompiler 294ms ±17% 273ms ±11% -7.16% (p=0.001 n=19+18) (https://perf.golang.org/search?q=upload:20200428.8) The binary size improvement is even slightly better when you include the CLs leading up to this. Relative to the parent of "cmd/compile: mark PanicBounds/Extend as calls": name old exe-bytes new exe-bytes delta BinGoSize 15.0MB ± 0% 14.1MB ± 0% -6.18% name old pcln-bytes new pcln-bytes delta BinGoSize 3.22MB ± 0% 2.48MB ± 0% -22.92% (https://perf.golang.org/search?q=upload:20200428.9) For #36365. Change-Id: I69448e714f2a44430067ca97f6b78e08c0abed27 Reviewed-on: https://go-review.googlesource.com/c/go/+/230544 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-03-20	runtime: handle empty stack in expandFinalInlineFrame	Keith Randall
	Fixes #37967 Change-Id: I6fc22bdd65f0263d5672731b73d09249201ab0aa Reviewed-on: https://go-review.googlesource.com/c/go/+/224458 Reviewed-by: Michael Pratt <mpratt@google.com>
2020-03-05	runtime/pprof: expand final stack frame to avoid truncation	Michael Pratt
	When generating stacks, the runtime automatically expands inline functions to inline all inline frames in the stack. However, due to the stack size limit, the final frame may be truncated in the middle of several inline frames at the same location. As-is, we assume that the final frame is a normal function, and emit and cache a Location for it. If we later receive a complete stack frame, we will first use the cached Location for the inlined function and then generate a new Location for the "caller" frame, in violation of the pprof requirement to merge inlined functions into the same Location. As a result, we: 1. Nondeterministically may generate a profile with the different stacks combined or split, depending on which is encountered first. This is particularly problematic when performing a diff of profiles. 2. When split stacks are generated, we lose the inlining information. We avoid both of these problems by performing a second expansion of the last stack frame to recover additional inline frames that may have been lost. This expansion is a bit simpler than the one done by the runtime because we don't have to handle skipping, and we know that the last emitted frame is not an elided wrapper, since it by definition is already included in the stack. Fixes #37446 Change-Id: If3ca2af25b21d252cf457cc867dd932f107d4c61 Reviewed-on: https://go-review.googlesource.com/c/go/+/221577 Run-TryBot: Michael Pratt <mpratt@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Heschi Kreinick <heschi@google.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
2020-02-18	cmd/link, runtime: skip holes in func table	Cherry Zhang
	On PPC64 when external linking, for large binaries we split the text section to multiple sections, so the external linking may insert trampolines between sections. These trampolines are within the address range covered by the func table, but not known by Go. This causes runtime.findfunc to return a wrong function if the given PC is from such trampolines. In this CL, we generate a marker between text sections where there could potentially be a hole in the func table. At run time, we skip the hole if we see such a marker. Fixes #37216. Change-Id: I95ab3875a84b357dbaa65a4ed339a19282257ce0 Reviewed-on: https://go-review.googlesource.com/c/go/+/219717 Reviewed-by: David Chase <drchase@google.com>
2019-11-02	runtime: use signals to preempt Gs for suspendG	Austin Clements
	This adds support for pausing a running G by sending a signal to its M. The main complication is that we want to target a G, but can only send a signal to an M. Hence, the protocol we use is to simply mark the G for preemption (which we already do) and send the M a "wake up and look around" signal. The signal checks if it's running a G with a preemption request and stops it if so in the same way that stack check preemptions stop Gs. Since the preemption may fail (the G could be moved or the signal could arrive at an unsafe point), we keep a count of the number of received preemption signals. This lets stopG detect if its request failed and should be retried without an explicit channel back to suspendG. For #10958, #24543. Change-Id: I3e1538d5ea5200aeb434374abb5d5fdc56107e53 Reviewed-on: https://go-review.googlesource.com/c/go/+/201760 Run-TryBot: Austin Clements <austin@google.com> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-11-02	runtime: scan stacks conservatively at async safe points	Austin Clements
	This adds support for scanning the stack when a goroutine is stopped at an async safe point. This is not yet lit up because asyncPreempt is not yet injected, but prepares us for that. This works by conservatively scanning the registers dumped in the frame of asyncPreempt and its parent frame, which was stopped at an asynchronous safe point. Conservative scanning works by only marking words that are pointers to valid, allocated heap objects. One complication is pointers to stack objects. In this case, we can't determine if the stack object is still "allocated" or if it was freed by an earlier GC. Hence, we need to propagate the conservative-ness of scanning stack objects: if all pointers found to a stack object were found via conservative scanning, then the stack object itself needs to be scanned conservatively, since its pointers may point to dead objects. For #10958, #24543. Change-Id: I7ff84b058c37cde3de8a982da07002eaba126fd6 Reviewed-on: https://go-review.googlesource.com/c/go/+/201761 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-10-24	cmd/compile, cmd/link, runtime: make defers low-cost through inline code and ↵	Dan Scales
	extra funcdata Generate inline code at defer time to save the args of defer calls to unique (autotmp) stack slots, and generate inline code at exit time to check which defer calls were made and make the associated function/method/interface calls. We remember that a particular defer statement was reached by storing in the deferBits variable (always stored on the stack). At exit time, we check the bits of the deferBits variable to determine which defer function calls to make (in reverse order). These low-cost defers are only used for functions where no defers appear in loops. In addition, we don't do these low-cost defers if there are too many defer statements or too many exits in a function (to limit code increase). When a function uses open-coded defers, we produce extra FUNCDATA_OpenCodedDeferInfo information that specifies the number of defers, and for each defer, the stack slots where the closure and associated args have been stored. The funcdata also includes the location of the deferBits variable. Therefore, for panics, we can use this funcdata to determine exactly which defers are active, and call the appropriate functions/methods/closures with the correct arguments for each active defer. In order to unwind the stack correctly after a recover(), we need to add an extra code segment to functions with open-coded defers that simply calls deferreturn() and returns. This segment is not reachable by the normal function, but is returned to by the runtime during recovery. We set the liveness information of this deferreturn() to be the same as the liveness at the first function call during the last defer exit code (so all return values and all stack slots needed by the defer calls will be live). I needed to increase the stackguard constant from 880 to 896, because of a small amount of new code in deferreturn(). The -N flag disables open-coded defers. '-d defer' prints out the kind of defer being used at each defer statement (heap-allocated, stack-allocated, or open-coded). Cost of defer statement [ go test -run NONE -bench BenchmarkDefer$ runtime ] With normal (stack-allocated) defers only: 35.4 ns/op With open-coded defers: 5.6 ns/op Cost of function call alone (remove defer keyword): 4.4 ns/op Text size increase (including funcdata) for go binary without/with open-coded defers: 0.09% The average size increase (including funcdata) for only the functions that use open-coded defers is 1.1%. The cost of a panic followed by a recover got noticeably slower, since panic processing now requires a scan of the stack for open-coded defer frames. This scan is required, even if no frames are using open-coded defers: Cost of panic and recover [ go test -run NONE -bench BenchmarkPanicRecover runtime ] Without open-coded defers: 62.0 ns/op With open-coded defers: 255 ns/op A CGO Go-to-C-to-Go benchmark got noticeably faster because of open-coded defers: CGO Go-to-C-to-Go benchmark [cd misc/cgo/test; go test -run NONE -bench BenchmarkCGoCallback ] Without open-coded defers: 443 ns/op With open-coded defers: 347 ns/op Updates #14939 (defer performance) Updates #34481 (design doc) Change-Id: I63b1a60d1ebf28126f55ee9fd7ecffe9cb23d1ff Reviewed-on: https://go-review.googlesource.com/c/go/+/202340 Reviewed-by: Austin Clements <austin@google.com>
2019-10-16	Revert "cmd/compile, cmd/link, runtime: make defers low-cost through inline ↵	Bryan C. Mills
	code and extra funcdata" This reverts CL 190098. Reason for revert: broke several builders. Change-Id: I69161352f9ded02537d8815f259c4d391edd9220 Reviewed-on: https://go-review.googlesource.com/c/go/+/201519 Run-TryBot: Bryan C. Mills <bcmills@google.com> Reviewed-by: Austin Clements <austin@google.com> Reviewed-by: Dan Scales <danscales@google.com>
2019-10-16	cmd/compile, cmd/link, runtime: make defers low-cost through inline code and ↵	Dan Scales
	extra funcdata Generate inline code at defer time to save the args of defer calls to unique (autotmp) stack slots, and generate inline code at exit time to check which defer calls were made and make the associated function/method/interface calls. We remember that a particular defer statement was reached by storing in the deferBits variable (always stored on the stack). At exit time, we check the bits of the deferBits variable to determine which defer function calls to make (in reverse order). These low-cost defers are only used for functions where no defers appear in loops. In addition, we don't do these low-cost defers if there are too many defer statements or too many exits in a function (to limit code increase). When a function uses open-coded defers, we produce extra FUNCDATA_OpenCodedDeferInfo information that specifies the number of defers, and for each defer, the stack slots where the closure and associated args have been stored. The funcdata also includes the location of the deferBits variable. Therefore, for panics, we can use this funcdata to determine exactly which defers are active, and call the appropriate functions/methods/closures with the correct arguments for each active defer. In order to unwind the stack correctly after a recover(), we need to add an extra code segment to functions with open-coded defers that simply calls deferreturn() and returns. This segment is not reachable by the normal function, but is returned to by the runtime during recovery. We set the liveness information of this deferreturn() to be the same as the liveness at the first function call during the last defer exit code (so all return values and all stack slots needed by the defer calls will be live). I needed to increase the stackguard constant from 880 to 896, because of a small amount of new code in deferreturn(). The -N flag disables open-coded defers. '-d defer' prints out the kind of defer being used at each defer statement (heap-allocated, stack-allocated, or open-coded). Cost of defer statement [ go test -run NONE -bench BenchmarkDefer$ runtime ] With normal (stack-allocated) defers only: 35.4 ns/op With open-coded defers: 5.6 ns/op Cost of function call alone (remove defer keyword): 4.4 ns/op Text size increase (including funcdata) for go cmd without/with open-coded defers: 0.09% The average size increase (including funcdata) for only the functions that use open-coded defers is 1.1%. The cost of a panic followed by a recover got noticeably slower, since panic processing now requires a scan of the stack for open-coded defer frames. This scan is required, even if no frames are using open-coded defers: Cost of panic and recover [ go test -run NONE -bench BenchmarkPanicRecover runtime ] Without open-coded defers: 62.0 ns/op With open-coded defers: 255 ns/op A CGO Go-to-C-to-Go benchmark got noticeably faster because of open-coded defers: CGO Go-to-C-to-Go benchmark [cd misc/cgo/test; go test -run NONE -bench BenchmarkCGoCallback ] Without open-coded defers: 443 ns/op With open-coded defers: 347 ns/op Updates #14939 (defer performance) Updates #34481 (design doc) Change-Id: I51a389860b9676cfa1b84722f5fb84d3c4ee9e28 Reviewed-on: https://go-review.googlesource.com/c/go/+/190098 Reviewed-by: Austin Clements <austin@google.com>
2019-10-11	runtime: make goroutine for wasm async events short-lived	Richard Musiol
	An extra goroutine is necessary to handle asynchronous events on wasm. However, we do not want this goroutine to exist all the time. This change makes it short-lived, so it ends after the asynchronous event was handled. Fixes #34768 Change-Id: I24626ff0af9d803a01ebe33fbb584d04d2059a44 Reviewed-on: https://go-review.googlesource.com/c/go/+/200497 Run-TryBot: Richard Musiol <neelance@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-10-07	runtime: do not omit stack trace of goroutine that handles async events	Richard Musiol
	On wasm there is a special goroutine that handles asynchronous events. Blocking this goroutine often causes a deadlock. However, the stack trace of this goroutine was omitted when printing the deadlock error. This change adds an exception so the goroutine is not considered as an internal system goroutine and the stack trace gets printed, which helps with debugging the deadlock. Updates #32764 Change-Id: Icc8f5ba3ca5a485d557b7bdd76bf2f1ffb92eb3e Reviewed-on: https://go-review.googlesource.com/c/go/+/199537 Run-TryBot: Richard Musiol <neelance@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-09-23	runtime: allow the Go runtime to return multiple stack frames for a single PC	Keith Randall
	Upgrade the thread sanitizer to handle mid-stack inlining correctly. We can now return multiple stack frames for each pc that the thread sanitizer gives us to symbolize. To fix #33309, we still need to modify the tsan library with its portion of this fix, rebuild the .syso files on all supported archs, and check them into runtime/race. Update #33309 Change-Id: I340013631ffc8428043ab7efe3a41b6bf5638eaf Reviewed-on: https://go-review.googlesource.com/c/go/+/195781 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2019-05-20	runtime: In Frames.Next, delay file/line lookup until just before return	Keith Randall
	That way we will never have to look up the file/line for the frame that's next to be returned when the user stops calling Next. For the benchmark from #32093: name old time/op new time/op delta Helper-4 948ns ± 1% 836ns ± 3% -11.89% (p=0.000 n=9+9) (#32093 was fixed with a more specific, and better, fix, but this fix is much more general.) Change-Id: I89e796f80c9706706d8d8b30eb14be3a8a442846 Reviewed-on: https://go-review.googlesource.com/c/go/+/178077 Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-04-19	runtime, cmd/compile: re-order PCDATA and FUNCDATA indices	Josh Bleecher Snyder
	The pclntab encoding supports writing only some PCDATA and FUNCDATA values. However, the encoding is dense: The max index in use determines the space used. We should thus choose a numbering in which frequently used indices are smaller. This change re-orders the PCDATA and FUNCDATA indices using that principle, using a quick and dirty instrumentation to measure index frequency. It shrinks binaries by about 0.5%. Updates #6853 file before after Δ % go 14745044 14671316 -73728 -0.500% addr2line 4305128 4280552 -24576 -0.571% api 6095800 6058936 -36864 -0.605% asm 4930928 4906352 -24576 -0.498% buildid 2881520 2861040 -20480 -0.711% cgo 4896584 4867912 -28672 -0.586% compile 25868408 25770104 -98304 -0.380% cover 5319656 5286888 -32768 -0.616% dist 3654528 3634048 -20480 -0.560% doc 4719672 4691000 -28672 -0.607% fix 3418312 3393736 -24576 -0.719% link 6137952 6109280 -28672 -0.467% nm 4250536 4225960 -24576 -0.578% objdump 4665192 4636520 -28672 -0.615% pack 2297488 2285200 -12288 -0.535% pprof 14735332 14657508 -77824 -0.528% test2json 2834952 2818568 -16384 -0.578% trace 11679964 11618524 -61440 -0.526% vet 8452696 8403544 -49152 -0.581% Change-Id: I30665dce57ec7a52e7d3c6718560b3aa5b83dd0b Reviewed-on: https://go-review.googlesource.com/c/go/+/171760 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Austin Clements <austin@google.com>
2019-03-27	runtime: create library startup for aix/ppc64	Clément Chigot
	As .init_array section aren't available on AIX, the Go runtime initialization is made with gcc constructor attribute. However, as cgo tool is building a binary in order to get imported C symbols, Go symbols imported for this initilization must be ignored. -Wl,-berok is mandatory otherwize ld will fail to create this binary, _rt0_aix_ppc64_lib and runtime_rt0_go aren't defined in runtime/cgo. These two symbols must also be ignored when creating _cgo_import.go. Change-Id: Icf2e0282f5b50de5fa82007439a428e6147efef1 Reviewed-on: https://go-review.googlesource.com/c/go/+/169118 Run-TryBot: Ian Lance Taylor <iant@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
2019-02-26	all: fix typos as reported by 'misspell'	Leon Klingele
	Change-Id: I904b8655f21743189814bccf24073b6fbb9fc56d GitHub-Last-Rev: b032c14394c949f9ad7b18d019a3979d38d4e1fb GitHub-Pull-Request: golang/go#29997 Reviewed-on: https://go-review.googlesource.com/c/160421 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2019-01-14	runtime: keep FuncForPC from crashing for PCs between functions	Keith Randall
	Reuse the strict mechanism from FileLine for FuncForPC, so we don't crash when asking the pcln table about bad pcs. Fixes #29735 Change-Id: Iaffb32498b8586ecf4eae03823e8aecef841aa68 Reviewed-on: https://go-review.googlesource.com/c/157799 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com> Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-01-08	runtime: make FuncForPC return the innermost inlined frame	Keith Randall
	Returning the innermost frame instead of the outermost makes code that walks the results of runtime.Caller{,s} still work correctly in the presence of mid-stack inlining. Fixes #29582 Change-Id: I2392e3dd5636eb8c6f58620a61cef2194fe660a7 Reviewed-on: https://go-review.googlesource.com/c/156364 Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
2019-01-08	runtime: store incremented PC in result of runtime.Callers	Keith Randall
	In 1.11 we stored "return addresses" in the result of runtime.Callers. I changed that behavior in CL 152537 to store an address in the call instruction itself. This CL reverts that part of 152537. The change in 152537 was made because we now store pcs of inline marks in the result of runtime.Callers as well. This CL will now store the address of the inline mark + 1 in the results of runtime.Callers, so that the subsequent -1 done in CallersFrames will pick out the correct inline mark instruction. This CL means that the results of runtime.Callers can be passed to runtime.FuncForPC as they were before. There are a bunch of packages in the wild that take the results of runtime.Callers, subtract 1, and then call FuncForPC. This CL keeps that pattern working as it did in 1.11. The changes to runtime/pprof in this CL are exactly a revert of the changes to that package in 152537 (except the locForPC comment). Update #29582 Change-Id: I04d232000fb482f0f0ff6277f8d7b9c72e97eb48 Reviewed-on: https://go-review.googlesource.com/c/156657 Reviewed-by: Ian Lance Taylor <iant@golang.org> Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-12-28	cmd/compile,runtime: redo mid-stack inlining tracebacks	Keith Randall
	Work involved in getting a stack trace is divided between runtime.Callers and runtime.CallersFrames. Before this CL, runtime.Callers returns a pc per runtime frame. runtime.CallersFrames is responsible for expanding a runtime frame into potentially multiple user frames. After this CL, runtime.Callers returns a pc per user frame. runtime.CallersFrames just maps those to user frame info. Entries in the result of runtime.Callers are now pcs of the calls (or of the inline marks), not of the instruction just after the call. Fixes #29007 Fixes #28640 Update #26320 Change-Id: I1c9567596ff73dc73271311005097a9188c3406f Reviewed-on: https://go-review.googlesource.com/c/152537 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2018-11-09	runtime: reduce linear search through pcvalue cache	Josh Bleecher Snyder
	This change introduces two optimizations together, one for recursive and one for non-recursive stacks. For recursive stacks, we introduce the new entry at the beginning of the cache, so it can be found first. This adds an extra read and write. While we're here, switch from fastrandn, which does a multiply, to fastrand % n, which does a shift. For non-recursive stacks, split the cache from [16]pcvalueCacheEnt into [2][8]pcvalueCacheEnt, and add a very cheap associative lookup. name old time/op new time/op delta StackCopyPtr-8 118ms ± 1% 106ms ± 2% -9.56% (p=0.000 n=17+18) StackCopy-8 95.8ms ± 1% 87.0ms ± 3% -9.11% (p=0.000 n=19+20) StackCopyNoCache-8 135ms ± 2% 139ms ± 1% +3.06% (p=0.000 n=19+18) During make.bash, the association function used has this return distribution: percent count return value 53.23% 678797 1 46.74% 596094 0 It is definitely not perfect, but it is pretty good, and that's all we need. Change-Id: I2cabb1d26b99c5111bc28f427016a2a5e6c620fd Reviewed-on: https://go-review.googlesource.com/c/110564 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Austin Clements <austin@google.com>
2018-10-03	runtime: on a signal, set traceback address to a deferreturn call	Keith Randall
	When a function triggers a signal (like a segfault which translates to a nil pointer exception) during execution, a sigpanic handler is just below it on the stack. The function itself did not stop at a safepoint, so we have to figure out what safepoint we should use to scan its stack frame. Previously we used the site of the most recent defer to get the live variables at the signal site. That answer is not quite correct, as explained in #27518. Instead, use the site of a deferreturn call. It has all the right variables marked as live (no args, all the return values, except those that escape to the heap, in which case the corresponding PAUTOHEAP variables will be live instead). This CL requires stack objects, so that all the local variables and args referenced by the deferred closures keep the right variables alive. Fixes #27518 Change-Id: Id45d8a8666759986c203181090b962e2981e48ca Reviewed-on: https://go-review.googlesource.com/c/134637 Reviewed-by: Austin Clements <austin@google.com> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-10-03	cmd/compile,runtime: implement stack objects	Keith Randall
	Rework how the compiler+runtime handles stack-allocated variables whose address is taken. Direct references to such variables work as before. References through pointers, however, use a new mechanism. The new mechanism is more precise than the old "ambiguously live" mechanism. It computes liveness at runtime based on the actual references among objects on the stack. Each function records all of its address-taken objects in a FUNCDATA. These are called "stack objects". The runtime then uses that information while scanning a stack to find all of the stack objects on a stack. It then does a mark phase on the stack objects, using all the pointers found on the stack (and ancillary structures, like defer records) as the root set. Only stack objects which are found to be live during this mark phase will be scanned and thus retain any heap objects they point to. A subsequent CL will remove all the "ambiguously live" logic from the compiler, so that the stack object tracing will be required. For this CL, the stack tracing is all redundant with the current ambiguously live logic. Update #22350 Change-Id: Ide19f1f71a5b6ec8c4d54f8f66f0e9a98344772f Reviewed-on: https://go-review.googlesource.com/c/134155 Reviewed-by: Austin Clements <austin@google.com>
2018-05-22	runtime: support for debugger function calls	Austin Clements
	This adds a mechanism for debuggers to safely inject calls to Go functions on amd64. Debuggers must participate in a protocol with the runtime, and need to know how to lay out a call frame, but the runtime support takes care of the details of handling live pointers in registers, stack growth, and detecting the trickier conditions when it is unsafe to inject a user function call. Fixes #21678. Updates derekparker/delve#119. Change-Id: I56d8ca67700f1f77e19d89e7fc92ab337b228834 Reviewed-on: https://go-review.googlesource.com/109699 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2018-05-22	cmd/compile, cmd/internal/obj: record register maps in binary	Austin Clements
	This adds FUNCDATA and PCDATA that records the register maps much like the existing live arguments maps and live locals maps. The register map is indexed independently from the argument and locals maps since changes in register liveness tend not to correlate with changes to argument and local liveness. This is the final CL toward adding safe-points everywhere. The following CLs will optimize liveness analysis to bring down the cost. The effect of this CL is: name old time/op new time/op delta Template 195ms ± 2% 197ms ± 1% ~ (p=0.136 n=9+9) Unicode 98.4ms ± 2% 99.7ms ± 1% +1.39% (p=0.004 n=10+10) GoTypes 685ms ± 1% 700ms ± 1% +2.06% (p=0.000 n=9+9) Compiler 3.28s ± 1% 3.34s ± 0% +1.71% (p=0.000 n=9+8) SSA 7.79s ± 1% 7.91s ± 1% +1.55% (p=0.000 n=10+9) Flate 133ms ± 2% 133ms ± 2% ~ (p=0.190 n=10+10) GoParser 161ms ± 2% 164ms ± 3% +1.83% (p=0.015 n=10+10) Reflect 450ms ± 1% 457ms ± 1% +1.62% (p=0.000 n=10+10) Tar 183ms ± 2% 185ms ± 1% +0.91% (p=0.008 n=9+10) XML 234ms ± 1% 238ms ± 1% +1.60% (p=0.000 n=9+9) [Geo mean] 411ms 417ms +1.40% name old exe-bytes new exe-bytes delta HelloSize 1.47M ± 0% 1.51M ± 0% +2.79% (p=0.000 n=10+10) Compared to just before "cmd/internal/obj: consolidate emitting entry stack map", the cumulative effect of adding stack maps everywhere and register maps is: name old time/op new time/op delta Template 185ms ± 2% 197ms ± 1% +6.42% (p=0.000 n=10+9) Unicode 96.3ms ± 3% 99.7ms ± 1% +3.60% (p=0.000 n=10+10) GoTypes 658ms ± 0% 700ms ± 1% +6.37% (p=0.000 n=10+9) Compiler 3.14s ± 1% 3.34s ± 0% +6.53% (p=0.000 n=9+8) SSA 7.41s ± 2% 7.91s ± 1% +6.71% (p=0.000 n=9+9) Flate 126ms ± 1% 133ms ± 2% +6.15% (p=0.000 n=10+10) GoParser 153ms ± 1% 164ms ± 3% +6.89% (p=0.000 n=10+10) Reflect 437ms ± 1% 457ms ± 1% +4.59% (p=0.000 n=10+10) Tar 178ms ± 1% 185ms ± 1% +4.18% (p=0.000 n=10+10) XML 223ms ± 1% 238ms ± 1% +6.39% (p=0.000 n=10+9) [Geo mean] 394ms 417ms +5.78% name old alloc/op new alloc/op delta Template 34.5MB ± 0% 38.0MB ± 0% +10.19% (p=0.000 n=10+10) Unicode 29.3MB ± 0% 30.3MB ± 0% +3.56% (p=0.000 n=8+9) GoTypes 113MB ± 0% 125MB ± 0% +10.89% (p=0.000 n=10+10) Compiler 510MB ± 0% 575MB ± 0% +12.79% (p=0.000 n=10+10) SSA 1.46GB ± 0% 1.64GB ± 0% +12.40% (p=0.000 n=10+10) Flate 23.9MB ± 0% 25.9MB ± 0% +8.56% (p=0.000 n=10+10) GoParser 28.0MB ± 0% 30.8MB ± 0% +10.08% (p=0.000 n=10+10) Reflect 77.6MB ± 0% 84.3MB ± 0% +8.63% (p=0.000 n=10+10) Tar 34.1MB ± 0% 37.0MB ± 0% +8.44% (p=0.000 n=10+10) XML 42.7MB ± 0% 47.2MB ± 0% +10.75% (p=0.000 n=10+10) [Geo mean] 76.0MB 83.3MB +9.60% name old allocs/op new allocs/op delta Template 321k ± 0% 337k ± 0% +4.98% (p=0.000 n=10+10) Unicode 337k ± 0% 340k ± 0% +1.04% (p=0.000 n=10+9) GoTypes 1.13M ± 0% 1.18M ± 0% +4.85% (p=0.000 n=10+10) Compiler 4.67M ± 0% 4.96M ± 0% +6.25% (p=0.000 n=10+10) SSA 11.7M ± 0% 12.3M ± 0% +5.69% (p=0.000 n=10+10) Flate 216k ± 0% 226k ± 0% +4.52% (p=0.000 n=10+9) GoParser 271k ± 0% 283k ± 0% +4.52% (p=0.000 n=10+10) Reflect 927k ± 0% 972k ± 0% +4.78% (p=0.000 n=10+10) Tar 318k ± 0% 333k ± 0% +4.56% (p=0.000 n=10+10) XML 376k ± 0% 395k ± 0% +5.04% (p=0.000 n=10+10) [Geo mean] 730k 764k +4.61% name old exe-bytes new exe-bytes delta HelloSize 1.46M ± 0% 1.51M ± 0% +3.66% (p=0.000 n=10+10) For #24543. Change-Id: I91e003dc64151916b384274884bf02a2d6862547 Reviewed-on: https://go-review.googlesource.com/109353 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2018-05-07	runtime: replace system goroutine whitelist with symbol test	Austin Clements
	Currently isSystemGoroutine has a hard-coded list of known entry points into system goroutines. This list is annoying to maintain. For example, it's missing the ensureSigM goroutine. Replace it with a check that simply looks for any goroutine with runtime function as its entry point, with a few exceptions. This also matches the definition recently added to the trace viewer (CL 81315). Change-Id: Iaed723d4a6e8c2ffb7c0c48fbac1688b00b30f01 Reviewed-on: https://go-review.googlesource.com/81655 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2018-05-01	runtime: allow inlining of stackmapdata	Josh Bleecher Snyder
	Also do very minor code cleanup. name old time/op new time/op delta StackCopyPtr-8 84.8ms ± 6% 82.9ms ± 5% -2.19% (p=0.000 n=95+94) StackCopy-8 68.4ms ± 5% 65.3ms ± 4% -4.54% (p=0.000 n=99+99) StackCopyNoCache-8 107ms ± 2% 105ms ± 2% -2.13% (p=0.000 n=91+95) Change-Id: I2d85ede48bffada9584d437a08a82212c0da6d00 Reviewed-on: https://go-review.googlesource.com/109001 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Austin Clements <austin@google.com>
2018-03-15	runtime: identify special functions by flag instead of address	Keith Randall
	When there are plugins, there may not be a unique copy of runtime functions like goexit, mcall, etc. So identifying them by entry address is problematic. Instead, keep track of each special function using a field in the symbol table. That way, multiple copies of the same runtime function will be treated identically. Fixes #24351 Fixes #23133 Change-Id: Iea3232df8a6af68509769d9ca618f530cc0f84fd Reviewed-on: https://go-review.googlesource.com/100739 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>