aboutsummaryrefslogtreecommitdiff
path: root/src/cmd/compile/internal/ssa/cache.go
AgeCommit message (Collapse)Author
2019-05-10cmd/compile: re-use regalloc's []valStateJosh Bleecher Snyder
Updates #27739: reduces package ssa's allocated space by 3.77%. maxrss is harder to measure, but using best-of-three-runs as reported by /usr/bin/time -l, I see ~2% reduction in maxrss. We still have a long way to go, though; the new maxrss is still 1.1gb. name old alloc/op new alloc/op delta Template 38.8MB ± 0% 37.7MB ± 0% -2.77% (p=0.008 n=5+5) Unicode 28.2MB ± 0% 28.1MB ± 0% -0.20% (p=0.008 n=5+5) GoTypes 131MB ± 0% 127MB ± 0% -2.94% (p=0.008 n=5+5) Compiler 606MB ± 0% 587MB ± 0% -3.21% (p=0.008 n=5+5) SSA 2.14GB ± 0% 2.06GB ± 0% -3.77% (p=0.008 n=5+5) Flate 24.0MB ± 0% 23.3MB ± 0% -3.00% (p=0.008 n=5+5) GoParser 28.8MB ± 0% 28.1MB ± 0% -2.61% (p=0.008 n=5+5) Reflect 83.8MB ± 0% 81.5MB ± 0% -2.71% (p=0.008 n=5+5) Tar 36.4MB ± 0% 35.4MB ± 0% -2.73% (p=0.008 n=5+5) XML 47.9MB ± 0% 46.7MB ± 0% -2.49% (p=0.008 n=5+5) [Geo mean] 84.6MB 82.4MB -2.65% name old allocs/op new allocs/op delta Template 379k ± 0% 379k ± 0% -0.05% (p=0.008 n=5+5) Unicode 340k ± 0% 340k ± 0% ~ (p=0.151 n=5+5) GoTypes 1.36M ± 0% 1.36M ± 0% -0.06% (p=0.008 n=5+5) Compiler 5.49M ± 0% 5.48M ± 0% -0.03% (p=0.008 n=5+5) SSA 17.5M ± 0% 17.5M ± 0% -0.03% (p=0.008 n=5+5) Flate 235k ± 0% 235k ± 0% -0.04% (p=0.008 n=5+5) GoParser 302k ± 0% 302k ± 0% -0.04% (p=0.008 n=5+5) Reflect 976k ± 0% 975k ± 0% -0.10% (p=0.008 n=5+5) Tar 352k ± 0% 352k ± 0% -0.06% (p=0.008 n=5+5) XML 436k ± 0% 436k ± 0% -0.03% (p=0.008 n=5+5) [Geo mean] 842k 841k -0.04% Change-Id: I0ab6631b5a0bb6303c291dcb0367b586a4e584fb Reviewed-on: https://go-review.googlesource.com/c/go/+/176221 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2019-03-11cmd/compile: make deadcode pass cheaperJosh Bleecher Snyder
The deadcode pass runs a lot. I'd like it to run even more. This change adds dedicated storage for deadcode to ssa.Cache. In addition to being a nice win now, it makes deadcode easier to add other places in the future. name old time/op new time/op delta Template 210ms ± 3% 209ms ± 2% ~ (p=0.951 n=93+95) Unicode 92.2ms ± 3% 93.0ms ± 3% +0.87% (p=0.000 n=94+94) GoTypes 739ms ± 2% 733ms ± 2% -0.84% (p=0.000 n=92+94) Compiler 3.51s ± 2% 3.49s ± 2% -0.57% (p=0.000 n=94+91) SSA 9.80s ± 2% 9.75s ± 2% -0.57% (p=0.000 n=95+92) Flate 132ms ± 2% 132ms ± 3% ~ (p=0.165 n=94+98) GoParser 160ms ± 3% 159ms ± 3% -0.42% (p=0.005 n=96+94) Reflect 446ms ± 4% 442ms ± 4% -0.91% (p=0.000 n=95+98) Tar 186ms ± 3% 186ms ± 2% ~ (p=0.221 n=94+97) XML 252ms ± 2% 250ms ± 2% -0.55% (p=0.000 n=95+94) [Geo mean] 430ms 429ms -0.34% name old user-time/op new user-time/op delta Template 256ms ± 3% 257ms ± 3% ~ (p=0.521 n=94+98) Unicode 120ms ± 9% 121ms ± 9% ~ (p=0.074 n=99+100) GoTypes 935ms ± 3% 935ms ± 2% ~ (p=0.574 n=82+96) Compiler 4.56s ± 1% 4.55s ± 2% ~ (p=0.247 n=88+90) SSA 13.6s ± 2% 13.6s ± 1% ~ (p=0.277 n=94+95) Flate 155ms ± 3% 156ms ± 3% ~ (p=0.181 n=95+100) GoParser 193ms ± 8% 184ms ± 6% -4.39% (p=0.000 n=100+89) Reflect 549ms ± 3% 552ms ± 3% +0.45% (p=0.036 n=94+96) Tar 230ms ± 4% 230ms ± 4% ~ (p=0.670 n=97+99) XML 315ms ± 5% 309ms ±12% -2.05% (p=0.000 n=99+99) [Geo mean] 540ms 538ms -0.47% name old alloc/op new alloc/op delta Template 40.3MB ± 0% 38.9MB ± 0% -3.36% (p=0.008 n=5+5) Unicode 28.6MB ± 0% 28.4MB ± 0% -0.90% (p=0.008 n=5+5) GoTypes 137MB ± 0% 132MB ± 0% -3.65% (p=0.008 n=5+5) Compiler 637MB ± 0% 609MB ± 0% -4.40% (p=0.008 n=5+5) SSA 2.19GB ± 0% 2.07GB ± 0% -5.63% (p=0.008 n=5+5) Flate 25.0MB ± 0% 24.1MB ± 0% -3.80% (p=0.008 n=5+5) GoParser 30.0MB ± 0% 29.1MB ± 0% -3.17% (p=0.008 n=5+5) Reflect 87.1MB ± 0% 84.4MB ± 0% -3.05% (p=0.008 n=5+5) Tar 37.3MB ± 0% 36.0MB ± 0% -3.31% (p=0.008 n=5+5) XML 49.8MB ± 0% 48.0MB ± 0% -3.69% (p=0.008 n=5+5) [Geo mean] 87.6MB 84.6MB -3.50% name old allocs/op new allocs/op delta Template 387k ± 0% 380k ± 0% -1.76% (p=0.008 n=5+5) Unicode 342k ± 0% 341k ± 0% -0.31% (p=0.008 n=5+5) GoTypes 1.39M ± 0% 1.37M ± 0% -1.64% (p=0.008 n=5+5) Compiler 5.68M ± 0% 5.60M ± 0% -1.41% (p=0.008 n=5+5) SSA 17.1M ± 0% 16.8M ± 0% -1.49% (p=0.008 n=5+5) Flate 240k ± 0% 236k ± 0% -1.99% (p=0.008 n=5+5) GoParser 309k ± 0% 304k ± 0% -1.57% (p=0.008 n=5+5) Reflect 1.01M ± 0% 0.99M ± 0% -2.69% (p=0.008 n=5+5) Tar 360k ± 0% 353k ± 0% -1.91% (p=0.008 n=5+5) XML 447k ± 0% 441k ± 0% -1.26% (p=0.008 n=5+5) [Geo mean] 858k 844k -1.60% Fixes #15306 Change-Id: I9f558adb911efddead3865542fe2ca71f66fe1da Reviewed-on: https://go-review.googlesource.com/c/go/+/166718 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2018-05-22cmd/compile: reuse liveness structuresAustin Clements
Currently liveness analysis is a significant source of allocations in the compiler. This CL mitigates this by moving the main sources of allocation to the ssa.Cache, allowing them to be reused between different liveness runs. Passes toolstash -cmp. name old time/op new time/op delta Template 194ms ± 1% 193ms ± 1% ~ (p=0.156 n=10+9) Unicode 99.1ms ± 1% 99.3ms ± 2% ~ (p=0.853 n=10+10) GoTypes 689ms ± 0% 687ms ± 0% -0.27% (p=0.022 n=10+9) Compiler 3.29s ± 1% 3.30s ± 1% ~ (p=0.489 n=9+9) SSA 8.02s ± 2% 7.97s ± 1% -0.71% (p=0.011 n=10+10) Flate 131ms ± 1% 130ms ± 1% -0.59% (p=0.043 n=9+10) GoParser 162ms ± 1% 160ms ± 1% -1.53% (p=0.000 n=10+10) Reflect 454ms ± 0% 454ms ± 0% ~ (p=0.959 n=8+8) Tar 185ms ± 1% 185ms ± 2% ~ (p=0.905 n=9+10) XML 235ms ± 1% 232ms ± 1% -1.15% (p=0.001 n=9+10) [Geo mean] 414ms 412ms -0.39% name old alloc/op new alloc/op delta Template 35.6MB ± 0% 34.2MB ± 0% -3.75% (p=0.000 n=10+10) Unicode 29.5MB ± 0% 29.4MB ± 0% -0.26% (p=0.000 n=10+9) GoTypes 117MB ± 0% 112MB ± 0% -3.78% (p=0.000 n=9+10) Compiler 532MB ± 0% 512MB ± 0% -3.80% (p=0.000 n=10+10) SSA 1.55GB ± 0% 1.48GB ± 0% -4.82% (p=0.000 n=10+10) Flate 24.5MB ± 0% 23.6MB ± 0% -3.61% (p=0.000 n=10+9) GoParser 28.7MB ± 0% 27.7MB ± 0% -3.43% (p=0.000 n=10+10) Reflect 80.5MB ± 0% 78.1MB ± 0% -2.96% (p=0.000 n=10+10) Tar 35.1MB ± 0% 33.9MB ± 0% -3.49% (p=0.000 n=10+10) XML 43.7MB ± 0% 42.4MB ± 0% -3.05% (p=0.000 n=10+10) [Geo mean] 78.4MB 75.8MB -3.30% name old allocs/op new allocs/op delta Template 335k ± 0% 335k ± 0% -0.12% (p=0.000 n=10+10) Unicode 339k ± 0% 339k ± 0% -0.01% (p=0.001 n=10+10) GoTypes 1.18M ± 0% 1.17M ± 0% -0.12% (p=0.000 n=10+10) Compiler 4.94M ± 0% 4.94M ± 0% -0.06% (p=0.000 n=10+10) SSA 12.5M ± 0% 12.5M ± 0% -0.07% (p=0.000 n=10+10) Flate 223k ± 0% 223k ± 0% -0.11% (p=0.000 n=10+10) GoParser 281k ± 0% 281k ± 0% -0.08% (p=0.000 n=10+10) Reflect 963k ± 0% 960k ± 0% -0.23% (p=0.000 n=10+9) Tar 330k ± 0% 330k ± 0% -0.12% (p=0.000 n=10+10) XML 392k ± 0% 392k ± 0% -0.08% (p=0.000 n=10+10) [Geo mean] 761k 760k -0.10% Compared to just before "cmd/internal/obj: consolidate emitting entry stack map", the cumulative effect of adding stack maps everywhere and register maps, plus these optimizations, is: name old time/op new time/op delta Template 186ms ± 1% 194ms ± 1% +4.41% (p=0.000 n=9+10) Unicode 96.5ms ± 1% 99.1ms ± 1% +2.76% (p=0.000 n=9+10) GoTypes 659ms ± 1% 689ms ± 0% +4.56% (p=0.000 n=9+10) Compiler 3.14s ± 2% 3.29s ± 1% +4.95% (p=0.000 n=9+9) SSA 7.68s ± 3% 8.02s ± 2% +4.41% (p=0.000 n=10+10) Flate 126ms ± 0% 131ms ± 1% +4.14% (p=0.000 n=10+9) GoParser 153ms ± 1% 162ms ± 1% +5.90% (p=0.000 n=10+10) Reflect 436ms ± 1% 454ms ± 0% +4.14% (p=0.000 n=10+8) Tar 177ms ± 1% 185ms ± 1% +4.28% (p=0.000 n=8+9) XML 224ms ± 1% 235ms ± 1% +5.23% (p=0.000 n=10+9) [Geo mean] 396ms 414ms +4.47% name old alloc/op new alloc/op delta Template 34.5MB ± 0% 35.6MB ± 0% +3.24% (p=0.000 n=10+10) Unicode 29.3MB ± 0% 29.5MB ± 0% +0.51% (p=0.000 n=9+10) GoTypes 113MB ± 0% 117MB ± 0% +3.31% (p=0.000 n=8+9) Compiler 509MB ± 0% 532MB ± 0% +4.46% (p=0.000 n=10+10) SSA 1.49GB ± 0% 1.55GB ± 0% +4.10% (p=0.000 n=10+10) Flate 23.8MB ± 0% 24.5MB ± 0% +2.92% (p=0.000 n=10+10) GoParser 27.9MB ± 0% 28.7MB ± 0% +2.88% (p=0.000 n=10+10) Reflect 77.4MB ± 0% 80.5MB ± 0% +4.01% (p=0.000 n=10+10) Tar 34.1MB ± 0% 35.1MB ± 0% +3.12% (p=0.000 n=10+10) XML 42.6MB ± 0% 43.7MB ± 0% +2.65% (p=0.000 n=10+10) [Geo mean] 76.1MB 78.4MB +3.11% name old allocs/op new allocs/op delta Template 320k ± 0% 335k ± 0% +4.60% (p=0.000 n=10+10) Unicode 336k ± 0% 339k ± 0% +0.96% (p=0.000 n=9+10) GoTypes 1.12M ± 0% 1.18M ± 0% +4.55% (p=0.000 n=10+10) Compiler 4.66M ± 0% 4.94M ± 0% +6.18% (p=0.000 n=10+10) SSA 11.9M ± 0% 12.5M ± 0% +5.37% (p=0.000 n=10+10) Flate 214k ± 0% 223k ± 0% +4.15% (p=0.000 n=9+10) GoParser 270k ± 0% 281k ± 0% +4.15% (p=0.000 n=10+10) Reflect 921k ± 0% 963k ± 0% +4.49% (p=0.000 n=10+10) Tar 317k ± 0% 330k ± 0% +4.25% (p=0.000 n=10+10) XML 375k ± 0% 392k ± 0% +4.75% (p=0.000 n=10+10) [Geo mean] 729k 761k +4.34% Updates #24543. Change-Id: Ia951fdb3c17ae1c156e1d05fc42e69caba33c91a Reviewed-on: https://go-review.googlesource.com/110179 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2018-05-14cmd/compile: reduce allocations in prove by reusing posetsGiovanni Bajo
In prove, reuse posets between different functions by storing them in the per-worker cache. Allocation count regression caused by prove improvements is down from 5% to 3% after this CL. Updates #25179 Change-Id: I6d14003109833d9b3ef5165fdea00aa9c9e952e8 Reviewed-on: https://go-review.googlesource.com/110455 Run-TryBot: Giovanni Bajo <rasky@develer.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2018-03-15cmd/compile: cache sparse maps across ssa passesDaniel Martí
This is done for sparse sets already, but it was missing for sparse maps. Only affects deadstore and regalloc, as they're the only ones that use sparse maps. name old time/op new time/op delta DSEPass-4 247µs ± 0% 216µs ± 0% -12.75% (p=0.008 n=5+5) DSEPassBlock-4 3.05ms ± 1% 2.87ms ± 1% -6.02% (p=0.002 n=6+6) CSEPass-4 2.30ms ± 0% 2.32ms ± 0% +0.53% (p=0.026 n=6+6) CSEPassBlock-4 23.8ms ± 0% 23.8ms ± 0% ~ (p=0.931 n=6+5) DeadcodePass-4 51.7µs ± 1% 51.5µs ± 2% ~ (p=0.429 n=5+6) DeadcodePassBlock-4 734µs ± 1% 742µs ± 3% ~ (p=0.394 n=6+6) MultiPass-4 152µs ± 0% 149µs ± 2% ~ (p=0.082 n=5+6) MultiPassBlock-4 2.67ms ± 1% 2.41ms ± 2% -9.77% (p=0.008 n=5+5) name old alloc/op new alloc/op delta DSEPass-4 41.2kB ± 0% 0.1kB ± 0% -99.68% (p=0.002 n=6+6) DSEPassBlock-4 560kB ± 0% 4kB ± 0% -99.34% (p=0.026 n=5+6) CSEPass-4 189kB ± 0% 189kB ± 0% ~ (all equal) CSEPassBlock-4 3.10MB ± 0% 3.10MB ± 0% ~ (p=0.444 n=5+5) DeadcodePass-4 10.5kB ± 0% 10.5kB ± 0% ~ (all equal) DeadcodePassBlock-4 164kB ± 0% 164kB ± 0% ~ (all equal) MultiPass-4 240kB ± 0% 199kB ± 0% -17.06% (p=0.002 n=6+6) MultiPassBlock-4 3.60MB ± 0% 2.99MB ± 0% -17.06% (p=0.002 n=6+6) name old allocs/op new allocs/op delta DSEPass-4 8.00 ± 0% 4.00 ± 0% -50.00% (p=0.002 n=6+6) DSEPassBlock-4 240 ± 0% 120 ± 0% -50.00% (p=0.002 n=6+6) CSEPass-4 9.00 ± 0% 9.00 ± 0% ~ (all equal) CSEPassBlock-4 1.35k ± 0% 1.35k ± 0% ~ (all equal) DeadcodePass-4 3.00 ± 0% 3.00 ± 0% ~ (all equal) DeadcodePassBlock-4 9.00 ± 0% 9.00 ± 0% ~ (all equal) MultiPass-4 11.0 ± 0% 10.0 ± 0% -9.09% (p=0.002 n=6+6) MultiPassBlock-4 165 ± 0% 150 ± 0% -9.09% (p=0.002 n=6+6) Change-Id: I43860687c88f33605eb1415f36473c5cfe8fde4a Reviewed-on: https://go-review.googlesource.com/98449 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2018-03-07cmd/compile: go fmtKunpei Sakai
Change-Id: I2eae33928641c6ed74badfe44d079ae90e5cc8c8 Reviewed-on: https://go-review.googlesource.com/99195 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-02-21cmd/compile/internal: reuse more memoryHeschi Kreinick
Reuse even more memory, and keep track of it in a long-lived debugState object rather than piecemeal in the Cache. Change-Id: Ib6936b4e8594dc6dda1f59ece753c00fd1c136ba Reviewed-on: https://go-review.googlesource.com/92404 Reviewed-by: David Chase <drchase@google.com>
2018-02-14cmd/compile/internal: reuse memory for valueToProgAfterHeschi Kreinick
Not a big improvement, but does help edge cases like the SSA package. Change-Id: I40e531110b97efd5f45955be477fd0f4faa8d545 Reviewed-on: https://go-review.googlesource.com/92396 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2018-02-14cmd/compile/internal/ssa: reduce location list memory useHeschi Kreinick
Put everything that showed up in the allocation profile into the cache, and reuse it across functions. After this CL, the overhead of enabling location lists is getting pretty close to the desired 5%: compilecmp -all -beforeflags -dwarflocationlists=0 -afterflags -dwarflocationlists=1 -n 30 4ebad42292b6a4090faf37753dd768d2965e38c4 4ebad42292b6a4090faf37753dd768d2965e38c4 compilecmp -dwarflocationlists=0 4ebad42292b6a4090faf37753dd768d2965e38c4 -dwarflocationlists=1 4ebad42292b6a4090faf37753dd768d2965e38c4 benchstat -geomean /tmp/869550129 /tmp/143495132 completed 30 of 30, estimated time remaining 0s (eta 3:24PM) name old time/op new time/op delta Template 199ms ± 4% 209ms ± 6% +5.17% (p=0.000 n=29+30) Unicode 99.2ms ± 8% 100.5ms ± 6% ~ (p=0.112 n=30+30) GoTypes 642ms ± 3% 684ms ± 3% +6.54% (p=0.000 n=29+30) SSA 8.00s ± 1% 8.71s ± 1% +8.78% (p=0.000 n=29+29) Flate 129ms ± 7% 134ms ± 5% +3.77% (p=0.000 n=30+30) GoParser 157ms ± 4% 164ms ± 5% +4.35% (p=0.000 n=29+30) Reflect 428ms ± 3% 450ms ± 4% +5.09% (p=0.000 n=30+30) Tar 195ms ± 5% 204ms ± 8% +4.78% (p=0.000 n=30+30) XML 228ms ± 4% 241ms ± 4% +5.62% (p=0.000 n=30+29) StdCmd 15.4s ± 1% 16.7s ± 1% +8.29% (p=0.000 n=29+29) [Geo mean] 476ms 502ms +5.35% name old user-time/op new user-time/op delta Template 294ms ±18% 304ms ±15% ~ (p=0.242 n=29+29) Unicode 182ms ±27% 172ms ±28% ~ (p=0.104 n=30+30) GoTypes 957ms ±15% 1016ms ±12% +6.16% (p=0.000 n=30+30) SSA 13.3s ± 5% 14.3s ± 3% +7.32% (p=0.000 n=30+28) Flate 188ms ±17% 193ms ±17% ~ (p=0.288 n=28+29) GoParser 232ms ±16% 238ms ±13% ~ (p=0.065 n=30+29) Reflect 585ms ±13% 620ms ±10% +5.88% (p=0.000 n=30+30) Tar 298ms ±21% 332ms ±23% +11.32% (p=0.000 n=30+30) XML 329ms ±17% 343ms ±12% +4.18% (p=0.032 n=30+30) [Geo mean] 492ms 513ms +4.13% name old alloc/op new alloc/op delta Template 38.3MB ± 0% 40.3MB ± 0% +5.29% (p=0.000 n=30+30) Unicode 29.3MB ± 0% 29.6MB ± 0% +1.28% (p=0.000 n=30+29) GoTypes 110MB ± 0% 118MB ± 0% +6.97% (p=0.000 n=29+30) SSA 1.48GB ± 0% 1.61GB ± 0% +9.06% (p=0.000 n=30+30) Flate 24.8MB ± 0% 26.0MB ± 0% +4.99% (p=0.000 n=29+30) GoParser 30.9MB ± 0% 32.2MB ± 0% +4.20% (p=0.000 n=30+30) Reflect 76.8MB ± 0% 80.6MB ± 0% +4.97% (p=0.000 n=30+30) Tar 39.6MB ± 0% 41.7MB ± 0% +5.22% (p=0.000 n=29+30) XML 42.0MB ± 0% 45.4MB ± 0% +8.22% (p=0.000 n=29+30) [Geo mean] 63.9MB 67.5MB +5.56% name old allocs/op new allocs/op delta Template 383k ± 0% 405k ± 0% +5.69% (p=0.000 n=30+30) Unicode 343k ± 0% 346k ± 0% +0.98% (p=0.000 n=30+27) GoTypes 1.15M ± 0% 1.22M ± 0% +6.17% (p=0.000 n=29+29) SSA 12.2M ± 0% 13.2M ± 0% +8.15% (p=0.000 n=30+30) Flate 234k ± 0% 249k ± 0% +6.44% (p=0.000 n=30+30) GoParser 315k ± 0% 332k ± 0% +5.31% (p=0.000 n=30+28) Reflect 972k ± 0% 1010k ± 0% +3.89% (p=0.000 n=30+30) Tar 394k ± 0% 415k ± 0% +5.35% (p=0.000 n=28+30) XML 404k ± 0% 429k ± 0% +6.31% (p=0.000 n=29+29) [Geo mean] 651k 686k +5.35% Change-Id: Ia005a8d6b33ce9f8091322f004376a3d6e5c1a94 Reviewed-on: https://go-review.googlesource.com/89357 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2018-02-14cmd/compile: reimplement location list generationHeschi Kreinick
Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand. RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases. The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that. Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried. Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance -- it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD. TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values. Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307 Reviewed-on: https://go-review.googlesource.com/89356 Run-TryBot: Heschi Kreinick <heschi@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-07-27[dev.debug] cmd/compile: better DWARF with optimizations ondev.debugHeschi Kreinick
Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct. There are a number of problems in the current design. One of the easier to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example: type myint int var a int b := myint(int) and b := (*uint64)(unsafe.Pointer(a)) don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect. Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it. A number of changes are included: - Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered. - Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc. - Track variable decomposition in ssa.LocalSlots. - After the SSA backend is done, analyze the result and build location lists for each LocalSlot. - After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function. - In the linker, aggregate the location lists into a .debug_loc section. TODO: - currently disabled for non-X86/AMD64 because there are no data tables. go build -toolexec 'toolstash -cmp' -a std succeeds. With -dwarflocationlists false: before: f02812195637909ff675782c0b46836a8ff01976 after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec benchstat -geomean /tmp/220352263 /tmp/621364410 completed 15 of 15, estimated time remaining 0s (eta 3:52PM) name old time/op new time/op delta Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14) Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15) GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14) Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15) GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15) Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13) Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15) XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15) [Geo mean] 206ms 377ms +82.86% name old user-time/op new user-time/op delta Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15) Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14) GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15) Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15) GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15) Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13) Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15) XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15) [Geo mean] 317ms 583ms +83.72% name old alloc/op new alloc/op delta Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15) Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15) GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14) Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15) GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15) Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15) Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15) XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15) [Geo mean] 42.1MB 75.0MB +78.05% name old allocs/op new allocs/op delta Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15) Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14) GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14) Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15) GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15) Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15) Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15) XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15) [Geo mean] 439k 755k +72.01% name old text-bytes new text-bytes delta HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15) name old data-bytes new data-bytes delta HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal) Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8 Reviewed-on: https://go-review.googlesource.com/41770 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-03-17cmd/compile: rearrange fields between ssa.Func, ssa.Cache, and ssa.ConfigJosh Bleecher Snyder
This makes ssa.Func, ssa.Cache, and ssa.Config fulfill the roles laid out for them in CL 38160. The only non-trivial change in this CL is how cached values and blocks get IDs. Prior to this CL, their IDs were assigned as part of resetting the cache, and only modified IDs were reset. This required knowing how many values and blocks were modified, which required a tight coupling between ssa.Func and ssa.Config. To eliminate that coupling, we now zero values and blocks during reset, and assign their IDs when they are used. Since unused values and blocks have ID == 0, we can efficiently find the last used value/block, to avoid zeroing everything. Bulk zeroing is efficient, but not efficient enough to obviate the need to avoid zeroing everything every time. As a happy side-effect, ssa.Func.Free is no longer necessary. DebugHashMatch and friends now belong in func.go. They have been left in place for clarity and review. I will move them in a subsequent CL. Passes toolstash -cmp. No compiler performance impact. No change in 'go test cmd/compile/internal/ssa' execution time. Change-Id: I2eb7af58da067ef6a36e815a6f386cfe8634d098 Reviewed-on: https://go-review.googlesource.com/38167 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2017-03-15cmd/compile: define roles for ssa.Func, ssa.Config, and ssa.CacheJosh Bleecher Snyder
The line between ssa.Func and ssa.Config has blurred. Concurrent compilation in the backend will require more precision. This CL lays out an (aspirational) organization. The implementation will come in follow-up CLs, once the organization is settled. ssa.Config holds basic compiler configuration, mostly arch-specific information. It is configured once, early on, and is readonly, so it is safe for concurrent use. ssa.Func is a single-shot object used for compiling a single Func. It is not concurrency-safe and not re-usable. ssa.Cache is a multi-use object used to avoid expensive allocations during compilation. Each ssa.Func is given an ssa.Cache to use. ssa.Cache is not concurrency-safe. Change-Id: Id02809b6f3541541cac6c27bbb598834888ce1cc Reviewed-on: https://go-review.googlesource.com/38160 Reviewed-by: Keith Randall <khr@golang.org>