Age | Commit message (Collapse) | Author |
|
Replace the memmove implementation for moves of 17 bytes or larger
with an implementation from ARM optimized software. The moves of 16
bytes or fewer are unchanged, but the registers used are updated to
match the rest of the implementation.
This implementation makes use of new optimizations:
- software pipelined loop for large (>128 byte) moves
- medium size moves (17..128 bytes) have a new implementation
- address realignment when src or dst is unaligned
- preference for aligned src (loads) or dst (stores) depending on CPU
To support preference for aligned loads or aligned stores, a new CPU
flag is added. This flag indicates that the detected micro
architecture performs better with aligned loads. Some tested CPUs did
not exhibit a significant difference and are left with the default
behavior of realigning based on the destination address (stores).
Neoverse N1 (Tested on Graviton 2)
name old time/op new time/op delta
Memmove/0-4 1.88ns ± 1% 1.87ns ± 1% -0.58% (p=0.020 n=10+10)
Memmove/1-4 4.40ns ± 0% 4.40ns ± 0% ~ (all equal)
Memmove/8-4 3.88ns ± 3% 3.80ns ± 0% -1.97% (p=0.001 n=10+9)
Memmove/16-4 3.90ns ± 3% 3.80ns ± 0% -2.49% (p=0.000 n=10+9)
Memmove/32-4 4.80ns ± 0% 4.40ns ± 0% -8.33% (p=0.000 n=9+8)
Memmove/64-4 5.86ns ± 0% 5.00ns ± 0% -14.76% (p=0.000 n=8+8)
Memmove/128-4 8.46ns ± 0% 8.06ns ± 0% -4.62% (p=0.000 n=10+10)
Memmove/256-4 12.4ns ± 0% 12.2ns ± 0% -1.61% (p=0.000 n=10+10)
Memmove/512-4 19.5ns ± 0% 19.1ns ± 0% -2.05% (p=0.000 n=10+10)
Memmove/1024-4 33.7ns ± 0% 33.5ns ± 0% -0.59% (p=0.000 n=10+10)
Memmove/2048-4 62.1ns ± 0% 59.0ns ± 0% -4.99% (p=0.000 n=10+10)
Memmove/4096-4 117ns ± 1% 110ns ± 0% -5.66% (p=0.000 n=10+10)
MemmoveUnalignedDst/64-4 6.41ns ± 0% 5.62ns ± 0% -12.32% (p=0.000 n=10+7)
MemmoveUnalignedDst/128-4 9.40ns ± 0% 8.34ns ± 0% -11.24% (p=0.000 n=10+10)
MemmoveUnalignedDst/256-4 12.8ns ± 0% 12.8ns ± 0% ~ (all equal)
MemmoveUnalignedDst/512-4 20.4ns ± 0% 19.7ns ± 0% -3.43% (p=0.000 n=9+10)
MemmoveUnalignedDst/1024-4 34.1ns ± 0% 35.1ns ± 0% +2.93% (p=0.000 n=9+9)
MemmoveUnalignedDst/2048-4 61.5ns ± 0% 60.4ns ± 0% -1.77% (p=0.000 n=10+10)
MemmoveUnalignedDst/4096-4 122ns ± 0% 113ns ± 0% -7.38% (p=0.002 n=8+10)
MemmoveUnalignedSrc/64-4 7.25ns ± 1% 6.26ns ± 0% -13.64% (p=0.000 n=9+9)
MemmoveUnalignedSrc/128-4 10.5ns ± 0% 9.7ns ± 0% -7.52% (p=0.000 n=10+10)
MemmoveUnalignedSrc/256-4 17.1ns ± 0% 17.3ns ± 0% +1.17% (p=0.000 n=10+10)
MemmoveUnalignedSrc/512-4 27.0ns ± 0% 27.0ns ± 0% ~ (all equal)
MemmoveUnalignedSrc/1024-4 46.7ns ± 0% 35.7ns ± 0% -23.55% (p=0.000 n=10+9)
MemmoveUnalignedSrc/2048-4 85.2ns ± 0% 61.2ns ± 0% -28.17% (p=0.000 n=10+8)
MemmoveUnalignedSrc/4096-4 162ns ± 0% 113ns ± 0% -30.25% (p=0.000 n=10+10)
name old speed new speed delta
Memmove/4096-4 35.2GB/s ± 0% 37.1GB/s ± 0% +5.56% (p=0.000 n=10+9)
MemmoveUnalignedSrc/1024-4 21.9GB/s ± 0% 28.7GB/s ± 0% +30.90% (p=0.000 n=10+10)
MemmoveUnalignedSrc/2048-4 24.0GB/s ± 0% 33.5GB/s ± 0% +39.18% (p=0.000 n=10+9)
MemmoveUnalignedSrc/4096-4 25.3GB/s ± 0% 36.2GB/s ± 0% +43.50% (p=0.000 n=10+7)
Cortex-A72 (Graviton 1)
name old time/op new time/op delta
Memmove/0-4 3.06ns ± 3% 3.08ns ± 1% ~ (p=0.958 n=10+9)
Memmove/1-4 8.72ns ± 0% 7.85ns ± 0% -9.98% (p=0.002 n=8+10)
Memmove/8-4 8.29ns ± 0% 8.29ns ± 0% ~ (all equal)
Memmove/16-4 8.29ns ± 0% 8.29ns ± 0% ~ (all equal)
Memmove/32-4 8.19ns ± 2% 8.29ns ± 0% ~ (p=0.114 n=10+10)
Memmove/64-4 18.3ns ± 4% 10.0ns ± 0% -45.36% (p=0.000 n=10+10)
Memmove/128-4 14.8ns ± 0% 17.4ns ± 0% +17.77% (p=0.000 n=10+10)
Memmove/256-4 21.8ns ± 0% 23.1ns ± 0% +5.96% (p=0.000 n=10+10)
Memmove/512-4 35.8ns ± 0% 37.2ns ± 0% +3.91% (p=0.000 n=10+10)
Memmove/1024-4 63.7ns ± 0% 67.2ns ± 0% +5.49% (p=0.000 n=10+10)
Memmove/2048-4 126ns ± 0% 123ns ± 0% -2.38% (p=0.000 n=10+10)
Memmove/4096-4 238ns ± 1% 243ns ± 1% +1.93% (p=0.000 n=10+10)
MemmoveUnalignedDst/64-4 19.3ns ± 1% 12.0ns ± 1% -37.49% (p=0.000 n=10+10)
MemmoveUnalignedDst/128-4 17.2ns ± 0% 17.4ns ± 0% +1.16% (p=0.000 n=10+10)
MemmoveUnalignedDst/256-4 28.2ns ± 8% 29.2ns ± 0% ~ (p=0.352 n=10+10)
MemmoveUnalignedDst/512-4 49.8ns ± 3% 48.9ns ± 0% ~ (p=1.000 n=10+10)
MemmoveUnalignedDst/1024-4 89.5ns ± 0% 80.5ns ± 1% -10.02% (p=0.000 n=10+10)
MemmoveUnalignedDst/2048-4 180ns ± 0% 127ns ± 0% -29.44% (p=0.000 n=9+10)
MemmoveUnalignedDst/4096-4 347ns ± 0% 244ns ± 0% -29.59% (p=0.000 n=10+9)
MemmoveUnalignedSrc/128-4 16.1ns ± 0% 21.8ns ± 0% +35.40% (p=0.000 n=10+10)
MemmoveUnalignedSrc/256-4 24.9ns ± 8% 26.6ns ± 0% +6.70% (p=0.015 n=10+10)
MemmoveUnalignedSrc/512-4 39.4ns ± 6% 40.6ns ± 0% ~ (p=0.352 n=10+10)
MemmoveUnalignedSrc/1024-4 72.5ns ± 0% 83.0ns ± 1% +14.44% (p=0.000 n=9+10)
MemmoveUnalignedSrc/2048-4 129ns ± 1% 128ns ± 1% ~ (p=0.179 n=10+10)
MemmoveUnalignedSrc/4096-4 241ns ± 0% 253ns ± 1% +4.99% (p=0.000 n=9+9)
Cortex-A53 (Raspberry Pi 3)
name old time/op new time/op delta
Memmove/0-4 11.0ns ± 0% 11.0ns ± 1% ~ (p=0.294 n=8+10)
Memmove/1-4 29.6ns ± 0% 28.0ns ± 1% -5.41% (p=0.000 n=9+10)
Memmove/8-4 23.5ns ± 0% 22.1ns ± 0% -6.11% (p=0.000 n=8+8)
Memmove/16-4 23.7ns ± 1% 22.1ns ± 0% -6.59% (p=0.000 n=10+8)
Memmove/32-4 27.9ns ± 0% 27.1ns ± 0% -3.13% (p=0.000 n=8+8)
Memmove/64-4 33.8ns ± 0% 31.5ns ± 1% -6.99% (p=0.000 n=8+10)
Memmove/128-4 45.6ns ± 0% 44.2ns ± 1% -3.23% (p=0.000 n=9+10)
Memmove/256-4 69.3ns ± 0% 69.3ns ± 0% ~ (p=0.072 n=8+8)
Memmove/512-4 127ns ± 0% 110ns ± 0% -13.39% (p=0.000 n=8+8)
Memmove/1024-4 222ns ± 0% 205ns ± 1% -7.66% (p=0.000 n=7+10)
Memmove/2048-4 411ns ± 0% 366ns ± 0% -10.98% (p=0.000 n=8+9)
Memmove/4096-4 795ns ± 1% 695ns ± 1% -12.63% (p=0.000 n=10+10)
MemmoveUnalignedDst/64-4 44.0ns ± 0% 40.5ns ± 0% -7.93% (p=0.000 n=8+8)
MemmoveUnalignedDst/128-4 59.6ns ± 0% 54.9ns ± 0% -7.85% (p=0.000 n=9+9)
MemmoveUnalignedDst/256-4 98.2ns ±11% 90.0ns ± 1% ~ (p=0.130 n=10+10)
MemmoveUnalignedDst/512-4 161ns ± 2% 145ns ± 1% -9.96% (p=0.000 n=10+10)
MemmoveUnalignedDst/1024-4 281ns ± 0% 265ns ± 0% -5.65% (p=0.000 n=9+8)
MemmoveUnalignedDst/2048-4 528ns ± 0% 482ns ± 0% -8.73% (p=0.000 n=8+9)
MemmoveUnalignedDst/4096-4 1.02µs ± 1% 0.92µs ± 0% -10.00% (p=0.000 n=10+8)
MemmoveUnalignedSrc/64-4 42.4ns ± 1% 40.5ns ± 0% -4.39% (p=0.000 n=10+8)
MemmoveUnalignedSrc/128-4 57.4ns ± 0% 57.0ns ± 1% -0.75% (p=0.048 n=9+10)
MemmoveUnalignedSrc/256-4 88.1ns ± 1% 89.6ns ± 0% +1.70% (p=0.000 n=9+8)
MemmoveUnalignedSrc/512-4 160ns ± 2% 144ns ± 0% -9.89% (p=0.000 n=10+8)
MemmoveUnalignedSrc/1024-4 286ns ± 0% 266ns ± 1% -6.69% (p=0.000 n=8+10)
MemmoveUnalignedSrc/2048-4 525ns ± 0% 483ns ± 1% -7.96% (p=0.000 n=9+10)
MemmoveUnalignedSrc/4096-4 1.01µs ± 0% 0.92µs ± 1% -9.40% (p=0.000 n=8+10)
Change-Id: Ia1144e9d4dfafdece6e167c5e576bf80f254c8ab
Reviewed-on: https://go-review.googlesource.com/c/go/+/243357
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Martin Möhrmann <moehrmann@google.com>
Reviewed-by: eric fang <eric.fang@arm.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
Change-Id: I6389d7efe90836b6ece44d2e75053d1ad9f35d08
Reviewed-on: https://go-review.googlesource.com/c/go/+/253417
Trust: Emmanuel Odeke <emmanuel@orijtech.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
In the previous CL we ensures that memmove writes pointers
atomically, so the concurrent GC won't observe a partially
updated pointer. This CL adds a test.
Change-Id: Icd1124bf3a15ef25bac20c7fb8933f1a642d897c
Reviewed-on: https://go-review.googlesource.com/c/go/+/212627
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Speeds up the "go test runtime -cpu=1,2,4 -short -quick" phase of all.bash.
For #26473.
Change-Id: I090f5a5aa754462b3253a2156dc31fa67ce7af2a
Reviewed-on: https://go-review.googlesource.com/c/go/+/177399
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
|
|
name old time/op new time/op delta
CopyFat8 2.15ns ± 1% 2.19ns ± 6% ~ (p=0.171 n=8+9)
CopyFat12 2.15ns ± 0% 2.17ns ± 2% ~ (p=0.137 n=8+10)
CopyFat16 2.17ns ± 3% 2.15ns ± 0% ~ (p=0.211 n=10+10)
CopyFat24 2.16ns ± 1% 2.15ns ± 0% ~ (p=0.087 n=10+10)
CopyFat32 11.5ns ± 0% 12.8ns ± 2% +10.87% (p=0.000 n=8+10)
CopyFat64 20.2ns ± 2% 12.9ns ± 0% -36.11% (p=0.000 n=10+10)
CopyFat128 37.2ns ± 0% 21.5ns ± 0% -42.20% (p=0.000 n=10+10)
CopyFat256 71.6ns ± 0% 38.7ns ± 0% -45.95% (p=0.000 n=10+10)
CopyFat512 140ns ± 0% 73ns ± 0% -47.86% (p=0.000 n=10+9)
CopyFat520 142ns ± 0% 74ns ± 0% -47.54% (p=0.000 n=10+10)
CopyFat1024 277ns ± 0% 141ns ± 0% -49.10% (p=0.000 n=10+10)
Change-Id: If54bc571add5db674d5e081579c87e80153d0a5a
Reviewed-on: https://go-review.googlesource.com/97395
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
This cuts 23 seconds from all.bash on my MacBook Pro.
Change-Id: Ibc4d7c01660b9e9ebd088dd55ba993f0d7ec6aa3
Reviewed-on: https://go-review.googlesource.com/73991
Reviewed-by: Ian Lance Taylor <iant@golang.org>
|
|
memmove used to use 2 2-byte load/store pairs to move 4 bytes.
When the result is loaded with a single 4-byte load, it caused
a store to load fowarding stall. To avoid the stall,
special case memmove to use 4 byte ops for the 4 byte copy case.
We already have a special case for 8-byte copies.
386 already specializes 4-byte copies.
I'll do 2-byte copies also, but not for 1.8.
benchmark old ns/op new ns/op delta
BenchmarkIssue18740-8 7567 4799 -36.58%
3-byte copies get a bit slower. Other copies are unchanged.
name old time/op new time/op delta
Memmove/3-8 4.76ns ± 5% 5.26ns ± 3% +10.50% (p=0.000 n=10+10)
Fixes #18740
Change-Id: Iec82cbac0ecfee80fa3c8fc83828f9a1819c3c74
Reviewed-on: https://go-review.googlesource.com/35567
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
|
|
I used the slowtests.go tool as described in
https://golang.org/cl/32684 on packages that stood out.
go test -short std drops from ~56 to ~52 seconds.
This isn't a huge win, but it was mostly an exercise.
Updates #17751
Change-Id: I9f3402e36a038d71e662d06ce2c1d52f6c4b674d
Reviewed-on: https://go-review.googlesource.com/32751
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
|
|
Use AVX if available on 4th generation of Intel(TM) Core(TM) processors.
(collected on E5 2609v3 @1.9GHz)
name old speed new speed delta
Memmove/1-6 158MB/s ± 0% 172MB/s ± 0% +9.09% (p=0.000 n=16+16)
Memmove/2-6 316MB/s ± 0% 345MB/s ± 0% +9.09% (p=0.000 n=18+16)
Memmove/3-6 517MB/s ± 0% 517MB/s ± 0% ~ (p=0.445 n=16+16)
Memmove/4-6 687MB/s ± 1% 690MB/s ± 0% +0.35% (p=0.000 n=20+17)
Memmove/5-6 729MB/s ± 0% 729MB/s ± 0% +0.01% (p=0.000 n=16+18)
Memmove/6-6 875MB/s ± 0% 875MB/s ± 0% +0.01% (p=0.000 n=18+18)
Memmove/7-6 1.02GB/s ± 0% 1.02GB/s ± 1% ~ (p=0.139 n=19+20)
Memmove/8-6 1.26GB/s ± 0% 1.26GB/s ± 0% +0.00% (p=0.000 n=18+18)
Memmove/9-6 1.42GB/s ± 0% 1.42GB/s ± 0% +0.00% (p=0.000 n=17+18)
Memmove/10-6 1.58GB/s ± 0% 1.58GB/s ± 0% +0.00% (p=0.000 n=19+19)
Memmove/11-6 1.74GB/s ± 0% 1.74GB/s ± 0% +0.00% (p=0.001 n=18+17)
Memmove/12-6 1.90GB/s ± 0% 1.90GB/s ± 0% +0.00% (p=0.000 n=19+19)
Memmove/13-6 2.05GB/s ± 0% 2.05GB/s ± 0% +0.00% (p=0.000 n=18+19)
Memmove/14-6 2.21GB/s ± 0% 2.21GB/s ± 0% +0.00% (p=0.000 n=16+20)
Memmove/15-6 2.37GB/s ± 0% 2.37GB/s ± 0% +0.00% (p=0.004 n=19+20)
Memmove/16-6 2.53GB/s ± 0% 2.53GB/s ± 0% +0.00% (p=0.000 n=16+16)
Memmove/32-6 4.67GB/s ± 0% 4.67GB/s ± 0% +0.00% (p=0.000 n=17+17)
Memmove/64-6 8.67GB/s ± 0% 8.64GB/s ± 0% -0.33% (p=0.000 n=18+17)
Memmove/128-6 12.6GB/s ± 0% 11.6GB/s ± 0% -8.05% (p=0.000 n=16+19)
Memmove/256-6 16.3GB/s ± 0% 16.6GB/s ± 0% +1.66% (p=0.000 n=20+18)
Memmove/512-6 21.5GB/s ± 0% 24.4GB/s ± 0% +13.35% (p=0.000 n=18+17)
Memmove/1024-6 24.7GB/s ± 0% 33.7GB/s ± 0% +36.12% (p=0.000 n=18+18)
Memmove/2048-6 27.3GB/s ± 0% 43.3GB/s ± 0% +58.77% (p=0.000 n=19+17)
Memmove/4096-6 37.5GB/s ± 0% 50.5GB/s ± 0% +34.56% (p=0.000 n=19+19)
MemmoveUnalignedDst/1-6 135MB/s ± 0% 146MB/s ± 0% +7.69% (p=0.000 n=16+14)
MemmoveUnalignedDst/2-6 271MB/s ± 0% 292MB/s ± 0% +7.69% (p=0.000 n=18+18)
MemmoveUnalignedDst/3-6 438MB/s ± 0% 438MB/s ± 0% ~ (p=0.352 n=16+19)
MemmoveUnalignedDst/4-6 584MB/s ± 0% 584MB/s ± 0% ~ (p=0.876 n=17+17)
MemmoveUnalignedDst/5-6 631MB/s ± 1% 632MB/s ± 0% +0.25% (p=0.000 n=20+17)
MemmoveUnalignedDst/6-6 759MB/s ± 0% 759MB/s ± 0% +0.00% (p=0.000 n=19+16)
MemmoveUnalignedDst/7-6 885MB/s ± 0% 883MB/s ± 1% ~ (p=0.647 n=18+20)
MemmoveUnalignedDst/8-6 1.08GB/s ± 0% 1.08GB/s ± 0% +0.00% (p=0.035 n=19+18)
MemmoveUnalignedDst/9-6 1.22GB/s ± 0% 1.22GB/s ± 0% ~ (p=0.251 n=18+17)
MemmoveUnalignedDst/10-6 1.35GB/s ± 0% 1.35GB/s ± 0% ~ (p=0.327 n=17+18)
MemmoveUnalignedDst/11-6 1.49GB/s ± 0% 1.49GB/s ± 0% ~ (p=0.531 n=18+19)
MemmoveUnalignedDst/12-6 1.63GB/s ± 0% 1.63GB/s ± 0% ~ (p=0.886 n=19+18)
MemmoveUnalignedDst/13-6 1.76GB/s ± 0% 1.76GB/s ± 1% -0.24% (p=0.006 n=18+20)
MemmoveUnalignedDst/14-6 1.90GB/s ± 0% 1.90GB/s ± 0% ~ (p=0.818 n=20+19)
MemmoveUnalignedDst/15-6 2.03GB/s ± 0% 2.03GB/s ± 0% ~ (p=0.294 n=17+16)
MemmoveUnalignedDst/16-6 2.17GB/s ± 0% 2.17GB/s ± 0% ~ (p=0.602 n=16+18)
MemmoveUnalignedDst/32-6 4.05GB/s ± 0% 4.05GB/s ± 0% +0.00% (p=0.010 n=18+17)
MemmoveUnalignedDst/64-6 7.59GB/s ± 0% 7.59GB/s ± 0% +0.00% (p=0.022 n=18+16)
MemmoveUnalignedDst/128-6 11.1GB/s ± 0% 11.4GB/s ± 0% +2.79% (p=0.000 n=18+17)
MemmoveUnalignedDst/256-6 16.4GB/s ± 0% 16.7GB/s ± 0% +1.59% (p=0.000 n=20+17)
MemmoveUnalignedDst/512-6 15.7GB/s ± 0% 21.3GB/s ± 0% +35.87% (p=0.000 n=18+20)
MemmoveUnalignedDst/1024-6 16.0GB/s ±20% 31.5GB/s ± 0% +96.93% (p=0.000 n=20+14)
MemmoveUnalignedDst/2048-6 19.6GB/s ± 0% 42.1GB/s ± 0% +115.16% (p=0.000 n=17+18)
MemmoveUnalignedDst/4096-6 6.41GB/s ± 0% 33.18GB/s ± 0% +417.56% (p=0.000 n=17+18)
MemmoveUnalignedSrc/1-6 171MB/s ± 0% 166MB/s ± 0% -3.33% (p=0.000 n=19+16)
MemmoveUnalignedSrc/2-6 343MB/s ± 0% 342MB/s ± 1% -0.41% (p=0.000 n=17+20)
MemmoveUnalignedSrc/3-6 508MB/s ± 0% 493MB/s ± 1% -2.90% (p=0.000 n=17+17)
MemmoveUnalignedSrc/4-6 677MB/s ± 0% 660MB/s ± 2% -2.55% (p=0.000 n=17+20)
MemmoveUnalignedSrc/5-6 790MB/s ± 0% 790MB/s ± 0% ~ (p=0.139 n=17+17)
MemmoveUnalignedSrc/6-6 948MB/s ± 0% 946MB/s ± 1% ~ (p=0.330 n=17+19)
MemmoveUnalignedSrc/7-6 1.11GB/s ± 0% 1.11GB/s ± 0% -0.05% (p=0.026 n=17+17)
MemmoveUnalignedSrc/8-6 1.38GB/s ± 0% 1.38GB/s ± 0% ~ (p=0.091 n=18+16)
MemmoveUnalignedSrc/9-6 1.42GB/s ± 0% 1.40GB/s ± 1% -1.04% (p=0.000 n=19+20)
MemmoveUnalignedSrc/10-6 1.58GB/s ± 0% 1.56GB/s ± 1% -1.15% (p=0.000 n=18+19)
MemmoveUnalignedSrc/11-6 1.73GB/s ± 0% 1.71GB/s ± 1% -1.30% (p=0.000 n=20+20)
MemmoveUnalignedSrc/12-6 1.89GB/s ± 0% 1.87GB/s ± 1% -1.18% (p=0.000 n=17+20)
MemmoveUnalignedSrc/13-6 2.05GB/s ± 0% 2.02GB/s ± 1% -1.18% (p=0.000 n=17+20)
MemmoveUnalignedSrc/14-6 2.21GB/s ± 0% 2.18GB/s ± 1% -1.14% (p=0.000 n=17+20)
MemmoveUnalignedSrc/15-6 2.36GB/s ± 0% 2.34GB/s ± 1% -1.04% (p=0.000 n=17+20)
MemmoveUnalignedSrc/16-6 2.52GB/s ± 0% 2.49GB/s ± 1% -1.26% (p=0.000 n=19+20)
MemmoveUnalignedSrc/32-6 4.82GB/s ± 0% 4.61GB/s ± 0% -4.40% (p=0.000 n=19+20)
MemmoveUnalignedSrc/64-6 5.03GB/s ± 4% 7.97GB/s ± 0% +58.55% (p=0.000 n=20+16)
MemmoveUnalignedSrc/128-6 11.1GB/s ± 0% 11.2GB/s ± 0% +0.52% (p=0.000 n=17+18)
MemmoveUnalignedSrc/256-6 16.5GB/s ± 0% 16.4GB/s ± 0% -0.10% (p=0.000 n=20+18)
MemmoveUnalignedSrc/512-6 21.0GB/s ± 0% 22.1GB/s ± 0% +5.48% (p=0.000 n=14+17)
MemmoveUnalignedSrc/1024-6 24.9GB/s ± 0% 31.9GB/s ± 0% +28.20% (p=0.000 n=19+20)
MemmoveUnalignedSrc/2048-6 23.3GB/s ± 0% 33.8GB/s ± 0% +45.22% (p=0.000 n=17+19)
MemmoveUnalignedSrc/4096-6 37.3GB/s ± 0% 42.7GB/s ± 0% +14.30% (p=0.000 n=17+17)
Change-Id: Id66aa3e499ccfb117cb99d623ef326b50d057b64
Reviewed-on: https://go-review.googlesource.com/29590
Run-TryBot: Denis Nagorny <denis.nagorny@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
This reverts commit 3607c5f4f18ad4d423e40996ebf7f46b2f79ce02.
This was causing failures on amd64 machines without AVX.
Fixes #16939
Change-Id: I70080fbb4e7ae791857334f2bffd847d08cb25fa
Reviewed-on: https://go-review.googlesource.com/28274
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Use AVX if available on 4th generation of Intel(TM) Core(TM) processors.
(collected on E5 2609v3 @1.9GHz)
name old speed new speed delta
Memmove/1-6 158MB/s ± 0% 172MB/s ± 0% +9.09% (p=0.000 n=16+16)
Memmove/2-6 316MB/s ± 0% 345MB/s ± 0% +9.09% (p=0.000 n=18+16)
Memmove/3-6 517MB/s ± 0% 517MB/s ± 0% ~ (p=0.445 n=16+16)
Memmove/4-6 687MB/s ± 1% 690MB/s ± 0% +0.35% (p=0.000 n=20+17)
Memmove/5-6 729MB/s ± 0% 729MB/s ± 0% +0.01% (p=0.000 n=16+18)
Memmove/6-6 875MB/s ± 0% 875MB/s ± 0% +0.01% (p=0.000 n=18+18)
Memmove/7-6 1.02GB/s ± 0% 1.02GB/s ± 1% ~ (p=0.139 n=19+20)
Memmove/8-6 1.26GB/s ± 0% 1.26GB/s ± 0% +0.00% (p=0.000 n=18+18)
Memmove/9-6 1.42GB/s ± 0% 1.42GB/s ± 0% +0.00% (p=0.000 n=17+18)
Memmove/10-6 1.58GB/s ± 0% 1.58GB/s ± 0% +0.00% (p=0.000 n=19+19)
Memmove/11-6 1.74GB/s ± 0% 1.74GB/s ± 0% +0.00% (p=0.001 n=18+17)
Memmove/12-6 1.90GB/s ± 0% 1.90GB/s ± 0% +0.00% (p=0.000 n=19+19)
Memmove/13-6 2.05GB/s ± 0% 2.05GB/s ± 0% +0.00% (p=0.000 n=18+19)
Memmove/14-6 2.21GB/s ± 0% 2.21GB/s ± 0% +0.00% (p=0.000 n=16+20)
Memmove/15-6 2.37GB/s ± 0% 2.37GB/s ± 0% +0.00% (p=0.004 n=19+20)
Memmove/16-6 2.53GB/s ± 0% 2.53GB/s ± 0% +0.00% (p=0.000 n=16+16)
Memmove/32-6 4.67GB/s ± 0% 4.67GB/s ± 0% +0.00% (p=0.000 n=17+17)
Memmove/64-6 8.67GB/s ± 0% 8.64GB/s ± 0% -0.33% (p=0.000 n=18+17)
Memmove/128-6 12.6GB/s ± 0% 11.6GB/s ± 0% -8.05% (p=0.000 n=16+19)
Memmove/256-6 16.3GB/s ± 0% 16.6GB/s ± 0% +1.66% (p=0.000 n=20+18)
Memmove/512-6 21.5GB/s ± 0% 24.4GB/s ± 0% +13.35% (p=0.000 n=18+17)
Memmove/1024-6 24.7GB/s ± 0% 33.7GB/s ± 0% +36.12% (p=0.000 n=18+18)
Memmove/2048-6 27.3GB/s ± 0% 43.3GB/s ± 0% +58.77% (p=0.000 n=19+17)
Memmove/4096-6 37.5GB/s ± 0% 50.5GB/s ± 0% +34.56% (p=0.000 n=19+19)
MemmoveUnalignedDst/1-6 135MB/s ± 0% 146MB/s ± 0% +7.69% (p=0.000 n=16+14)
MemmoveUnalignedDst/2-6 271MB/s ± 0% 292MB/s ± 0% +7.69% (p=0.000 n=18+18)
MemmoveUnalignedDst/3-6 438MB/s ± 0% 438MB/s ± 0% ~ (p=0.352 n=16+19)
MemmoveUnalignedDst/4-6 584MB/s ± 0% 584MB/s ± 0% ~ (p=0.876 n=17+17)
MemmoveUnalignedDst/5-6 631MB/s ± 1% 632MB/s ± 0% +0.25% (p=0.000 n=20+17)
MemmoveUnalignedDst/6-6 759MB/s ± 0% 759MB/s ± 0% +0.00% (p=0.000 n=19+16)
MemmoveUnalignedDst/7-6 885MB/s ± 0% 883MB/s ± 1% ~ (p=0.647 n=18+20)
MemmoveUnalignedDst/8-6 1.08GB/s ± 0% 1.08GB/s ± 0% +0.00% (p=0.035 n=19+18)
MemmoveUnalignedDst/9-6 1.22GB/s ± 0% 1.22GB/s ± 0% ~ (p=0.251 n=18+17)
MemmoveUnalignedDst/10-6 1.35GB/s ± 0% 1.35GB/s ± 0% ~ (p=0.327 n=17+18)
MemmoveUnalignedDst/11-6 1.49GB/s ± 0% 1.49GB/s ± 0% ~ (p=0.531 n=18+19)
MemmoveUnalignedDst/12-6 1.63GB/s ± 0% 1.63GB/s ± 0% ~ (p=0.886 n=19+18)
MemmoveUnalignedDst/13-6 1.76GB/s ± 0% 1.76GB/s ± 1% -0.24% (p=0.006 n=18+20)
MemmoveUnalignedDst/14-6 1.90GB/s ± 0% 1.90GB/s ± 0% ~ (p=0.818 n=20+19)
MemmoveUnalignedDst/15-6 2.03GB/s ± 0% 2.03GB/s ± 0% ~ (p=0.294 n=17+16)
MemmoveUnalignedDst/16-6 2.17GB/s ± 0% 2.17GB/s ± 0% ~ (p=0.602 n=16+18)
MemmoveUnalignedDst/32-6 4.05GB/s ± 0% 4.05GB/s ± 0% +0.00% (p=0.010 n=18+17)
MemmoveUnalignedDst/64-6 7.59GB/s ± 0% 7.59GB/s ± 0% +0.00% (p=0.022 n=18+16)
MemmoveUnalignedDst/128-6 11.1GB/s ± 0% 11.4GB/s ± 0% +2.79% (p=0.000 n=18+17)
MemmoveUnalignedDst/256-6 16.4GB/s ± 0% 16.7GB/s ± 0% +1.59% (p=0.000 n=20+17)
MemmoveUnalignedDst/512-6 15.7GB/s ± 0% 21.3GB/s ± 0% +35.87% (p=0.000 n=18+20)
MemmoveUnalignedDst/1024-6 16.0GB/s ±20% 31.5GB/s ± 0% +96.93% (p=0.000 n=20+14)
MemmoveUnalignedDst/2048-6 19.6GB/s ± 0% 42.1GB/s ± 0% +115.16% (p=0.000 n=17+18)
MemmoveUnalignedDst/4096-6 6.41GB/s ± 0% 33.18GB/s ± 0% +417.56% (p=0.000 n=17+18)
MemmoveUnalignedSrc/1-6 171MB/s ± 0% 166MB/s ± 0% -3.33% (p=0.000 n=19+16)
MemmoveUnalignedSrc/2-6 343MB/s ± 0% 342MB/s ± 1% -0.41% (p=0.000 n=17+20)
MemmoveUnalignedSrc/3-6 508MB/s ± 0% 493MB/s ± 1% -2.90% (p=0.000 n=17+17)
MemmoveUnalignedSrc/4-6 677MB/s ± 0% 660MB/s ± 2% -2.55% (p=0.000 n=17+20)
MemmoveUnalignedSrc/5-6 790MB/s ± 0% 790MB/s ± 0% ~ (p=0.139 n=17+17)
MemmoveUnalignedSrc/6-6 948MB/s ± 0% 946MB/s ± 1% ~ (p=0.330 n=17+19)
MemmoveUnalignedSrc/7-6 1.11GB/s ± 0% 1.11GB/s ± 0% -0.05% (p=0.026 n=17+17)
MemmoveUnalignedSrc/8-6 1.38GB/s ± 0% 1.38GB/s ± 0% ~ (p=0.091 n=18+16)
MemmoveUnalignedSrc/9-6 1.42GB/s ± 0% 1.40GB/s ± 1% -1.04% (p=0.000 n=19+20)
MemmoveUnalignedSrc/10-6 1.58GB/s ± 0% 1.56GB/s ± 1% -1.15% (p=0.000 n=18+19)
MemmoveUnalignedSrc/11-6 1.73GB/s ± 0% 1.71GB/s ± 1% -1.30% (p=0.000 n=20+20)
MemmoveUnalignedSrc/12-6 1.89GB/s ± 0% 1.87GB/s ± 1% -1.18% (p=0.000 n=17+20)
MemmoveUnalignedSrc/13-6 2.05GB/s ± 0% 2.02GB/s ± 1% -1.18% (p=0.000 n=17+20)
MemmoveUnalignedSrc/14-6 2.21GB/s ± 0% 2.18GB/s ± 1% -1.14% (p=0.000 n=17+20)
MemmoveUnalignedSrc/15-6 2.36GB/s ± 0% 2.34GB/s ± 1% -1.04% (p=0.000 n=17+20)
MemmoveUnalignedSrc/16-6 2.52GB/s ± 0% 2.49GB/s ± 1% -1.26% (p=0.000 n=19+20)
MemmoveUnalignedSrc/32-6 4.82GB/s ± 0% 4.61GB/s ± 0% -4.40% (p=0.000 n=19+20)
MemmoveUnalignedSrc/64-6 5.03GB/s ± 4% 7.97GB/s ± 0% +58.55% (p=0.000 n=20+16)
MemmoveUnalignedSrc/128-6 11.1GB/s ± 0% 11.2GB/s ± 0% +0.52% (p=0.000 n=17+18)
MemmoveUnalignedSrc/256-6 16.5GB/s ± 0% 16.4GB/s ± 0% -0.10% (p=0.000 n=20+18)
MemmoveUnalignedSrc/512-6 21.0GB/s ± 0% 22.1GB/s ± 0% +5.48% (p=0.000 n=14+17)
MemmoveUnalignedSrc/1024-6 24.9GB/s ± 0% 31.9GB/s ± 0% +28.20% (p=0.000 n=19+20)
MemmoveUnalignedSrc/2048-6 23.3GB/s ± 0% 33.8GB/s ± 0% +45.22% (p=0.000 n=17+19)
MemmoveUnalignedSrc/4096-6 37.3GB/s ± 0% 42.7GB/s ± 0% +14.30% (p=0.000 n=17+17)
Change-Id: Iab488d93a293cdf573ab5cd89b95a818bbb5d531
Reviewed-on: https://go-review.googlesource.com/22515
Run-TryBot: Denis Nagorny <denis.nagorny@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Names of sub-benchmarks are preserved, short of the additional slash.
Change-Id: I9b3f82964f9a44b0d28724413320afd091ed3106
Reviewed-on: https://go-review.googlesource.com/23425
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Marcel van Lohuizen <mpvl@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
MOVSB is quite a bit faster for unaligned moves.
Possibly we should use MOVSB all of the time, but Intel folks
say it might be a bit faster to use MOVSQ on some processors
(but not any I have access to at the moment).
benchmark old ns/op new ns/op delta
BenchmarkMemmove4096-8 93.9 93.2 -0.75%
BenchmarkMemmoveUnalignedDst4096-8 256 151 -41.02%
BenchmarkMemmoveUnalignedSrc4096-8 175 90.5 -48.29%
Fixes #14630
Change-Id: I568e6d6590eb3615e6a699fb474020596be665ff
Reviewed-on: https://go-review.googlesource.com/20293
Reviewed-by: Ian Lance Taylor <iant@golang.org>
|
|
This is a subset of https://golang.org/cl/20022 with only the copyright
header lines, so the next CL will be smaller and more reviewable.
Go policy has been single space after periods in comments for some time.
The copyright header template at:
https://golang.org/doc/contribute.html#copyright
also uses a single space.
Make them all consistent.
Change-Id: Icc26c6b8495c3820da6b171ca96a74701b4a01b0
Reviewed-on: https://go-review.googlesource.com/20111
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Results are a bit noisy, but show good improvement (haswell)
name old time/op new time/op delta
Memclr5-48 6.06ns ± 8% 5.65ns ± 8% -6.81% (p=0.000 n=20+20)
Memclr16-48 5.75ns ± 6% 5.71ns ± 6% ~ (p=0.545 n=20+19)
Memclr64-48 6.54ns ± 5% 6.14ns ± 9% -6.12% (p=0.000 n=18+20)
Memclr256-48 10.1ns ±12% 9.9ns ±14% ~ (p=0.285 n=20+19)
Memclr4096-48 104ns ± 8% 57ns ±15% -44.98% (p=0.000 n=20+20)
Memclr65536-48 2.45µs ± 5% 2.43µs ± 8% ~ (p=0.665 n=16+20)
Memclr1M-48 58.7µs ±13% 56.4µs ±11% -3.92% (p=0.033 n=20+19)
Memclr4M-48 233µs ± 9% 234µs ± 9% ~ (p=0.728 n=20+19)
Memclr8M-48 469µs ±11% 472µs ±16% ~ (p=0.947 n=20+20)
Memclr16M-48 947µs ±10% 916µs ±10% ~ (p=0.050 n=20+19)
Memclr64M-48 10.9ms ±10% 4.5ms ± 9% -58.43% (p=0.000 n=20+20)
GoMemclr5-48 3.80ns ±13% 3.38ns ± 6% -11.02% (p=0.000 n=20+20)
GoMemclr16-48 3.34ns ±15% 3.40ns ± 9% ~ (p=0.351 n=20+20)
GoMemclr64-48 4.10ns ±15% 4.04ns ±10% ~ (p=1.000 n=20+19)
GoMemclr256-48 7.75ns ±20% 7.88ns ± 9% ~ (p=0.227 n=20+19)
name old speed new speed delta
Memclr5-48 826MB/s ± 7% 886MB/s ± 8% +7.32% (p=0.000 n=20+20)
Memclr16-48 2.78GB/s ± 5% 2.81GB/s ± 6% ~ (p=0.550 n=20+19)
Memclr64-48 9.79GB/s ± 5% 10.44GB/s ±10% +6.64% (p=0.000 n=18+20)
Memclr256-48 25.4GB/s ±14% 25.6GB/s ±12% ~ (p=0.647 n=20+19)
Memclr4096-48 39.4GB/s ± 8% 72.0GB/s ±13% +82.81% (p=0.000 n=20+20)
Memclr65536-48 26.6GB/s ± 6% 27.0GB/s ± 9% ~ (p=0.517 n=17+20)
Memclr1M-48 17.9GB/s ±12% 18.5GB/s ±11% ~ (p=0.068 n=20+20)
Memclr4M-48 18.0GB/s ± 9% 17.8GB/s ±14% ~ (p=0.547 n=20+20)
Memclr8M-48 17.9GB/s ±10% 17.8GB/s ±14% ~ (p=0.947 n=20+20)
Memclr16M-48 17.8GB/s ± 9% 18.4GB/s ± 9% ~ (p=0.050 n=20+19)
Memclr64M-48 6.19GB/s ±10% 14.87GB/s ± 9% +140.11% (p=0.000 n=20+20)
GoMemclr5-48 1.31GB/s ±10% 1.48GB/s ± 6% +13.06% (p=0.000 n=19+20)
GoMemclr16-48 4.81GB/s ±14% 4.71GB/s ± 8% ~ (p=0.341 n=20+20)
GoMemclr64-48 15.7GB/s ±13% 15.8GB/s ±11% ~ (p=0.967 n=20+19)
Change-Id: I393f3f20e2f31538d1b1dd70d6e5c201c106a095
Reviewed-on: https://go-review.googlesource.com/16773
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Klaus Post <klauspost@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
they can
Not only is this an obvious optimization:
benchmark old MB/s new MB/s speedup
BenchmarkMemmove1-4 35.35 29.65 0.84x
BenchmarkMemmove2-4 63.78 52.53 0.82x
BenchmarkMemmove3-4 89.72 73.96 0.82x
BenchmarkMemmove4-4 109.94 95.73 0.87x
BenchmarkMemmove5-4 127.60 112.80 0.88x
BenchmarkMemmove6-4 143.59 126.67 0.88x
BenchmarkMemmove7-4 157.90 138.92 0.88x
BenchmarkMemmove8-4 167.18 231.81 1.39x
BenchmarkMemmove9-4 175.23 252.07 1.44x
BenchmarkMemmove10-4 165.68 261.10 1.58x
BenchmarkMemmove11-4 174.43 263.31 1.51x
BenchmarkMemmove12-4 180.76 267.56 1.48x
BenchmarkMemmove13-4 189.06 284.93 1.51x
BenchmarkMemmove14-4 186.31 284.72 1.53x
BenchmarkMemmove15-4 195.75 281.62 1.44x
BenchmarkMemmove16-4 202.96 439.23 2.16x
BenchmarkMemmove32-4 264.77 775.77 2.93x
BenchmarkMemmove64-4 306.81 1209.64 3.94x
BenchmarkMemmove128-4 357.03 1515.41 4.24x
BenchmarkMemmove256-4 380.77 2066.01 5.43x
BenchmarkMemmove512-4 385.05 2556.45 6.64x
BenchmarkMemmove1024-4 381.23 2804.10 7.36x
BenchmarkMemmove2048-4 379.06 2814.83 7.43x
BenchmarkMemmove4096-4 387.43 3064.96 7.91x
BenchmarkMemmoveUnaligned1-4 28.91 25.40 0.88x
BenchmarkMemmoveUnaligned2-4 56.13 47.56 0.85x
BenchmarkMemmoveUnaligned3-4 74.32 69.31 0.93x
BenchmarkMemmoveUnaligned4-4 97.02 83.58 0.86x
BenchmarkMemmoveUnaligned5-4 110.17 103.62 0.94x
BenchmarkMemmoveUnaligned6-4 124.95 113.26 0.91x
BenchmarkMemmoveUnaligned7-4 142.37 130.82 0.92x
BenchmarkMemmoveUnaligned8-4 151.20 205.64 1.36x
BenchmarkMemmoveUnaligned9-4 166.97 215.42 1.29x
BenchmarkMemmoveUnaligned10-4 148.49 221.22 1.49x
BenchmarkMemmoveUnaligned11-4 159.47 239.57 1.50x
BenchmarkMemmoveUnaligned12-4 163.52 247.32 1.51x
BenchmarkMemmoveUnaligned13-4 167.55 256.54 1.53x
BenchmarkMemmoveUnaligned14-4 175.12 251.03 1.43x
BenchmarkMemmoveUnaligned15-4 192.10 267.13 1.39x
BenchmarkMemmoveUnaligned16-4 190.76 378.87 1.99x
BenchmarkMemmoveUnaligned32-4 259.02 562.98 2.17x
BenchmarkMemmoveUnaligned64-4 317.72 842.44 2.65x
BenchmarkMemmoveUnaligned128-4 355.43 1274.49 3.59x
BenchmarkMemmoveUnaligned256-4 378.17 1815.74 4.80x
BenchmarkMemmoveUnaligned512-4 362.15 2180.81 6.02x
BenchmarkMemmoveUnaligned1024-4 376.07 2453.58 6.52x
BenchmarkMemmoveUnaligned2048-4 381.66 2568.32 6.73x
BenchmarkMemmoveUnaligned4096-4 398.51 2669.36 6.70x
BenchmarkMemclr5-4 113.83 107.93 0.95x
BenchmarkMemclr16-4 223.84 389.63 1.74x
BenchmarkMemclr64-4 421.99 1209.58 2.87x
BenchmarkMemclr256-4 525.94 2411.58 4.59x
BenchmarkMemclr4096-4 581.66 4372.20 7.52x
BenchmarkMemclr65536-4 565.84 4747.48 8.39x
BenchmarkGoMemclr5-4 194.63 160.31 0.82x
BenchmarkGoMemclr16-4 295.30 630.07 2.13x
BenchmarkGoMemclr64-4 480.24 1884.03 3.92x
BenchmarkGoMemclr256-4 540.23 2926.49 5.42x
but it turns out that it's necessary to avoid the GC seeing partially written
pointers.
It's of course possible to be more sophisticated (using ldp/stp to move 16
bytes at a time in the core loop and unrolling the tail copying loops being
the obvious ideas) but I wanted something simple and (reasonably) obviously
correct.
Fixes #12552
Change-Id: Iaeaf8a812cd06f4747ba2f792de1ded738890735
Reviewed-on: https://go-review.googlesource.com/14813
Reviewed-by: Austin Clements <austin@google.com>
|
|
It is faster to execute
MOVQ AX,(DI)
MOVQ AX,8(DI)
MOVQ AX,16(DI)
MOVQ AX,24(DI)
ADDQ $32,DI
than
STOSQ
STOSQ
STOSQ
STOSQ
However, in order to be able to jump into
the middle of a block of MOVQs, the call
site needs to pre-adjust DI.
If we're clearing a small area, the cost
of that DI pre-adjustment isn't repaid.
This CL switches the DUFFZERO implementation
to use a hybrid strategy, in which small
clears use STOSQ as before, but large clears
use mostly MOVQ/ADDQ blocks.
benchmark old ns/op new ns/op delta
BenchmarkClearFat8 0.55 0.55 +0.00%
BenchmarkClearFat12 0.82 0.83 +1.22%
BenchmarkClearFat16 0.55 0.55 +0.00%
BenchmarkClearFat24 0.82 0.82 +0.00%
BenchmarkClearFat32 2.20 1.94 -11.82%
BenchmarkClearFat40 1.92 1.66 -13.54%
BenchmarkClearFat48 2.21 1.93 -12.67%
BenchmarkClearFat56 3.03 2.20 -27.39%
BenchmarkClearFat64 3.26 2.48 -23.93%
BenchmarkClearFat72 3.57 2.76 -22.69%
BenchmarkClearFat80 3.83 3.05 -20.37%
BenchmarkClearFat88 4.14 3.30 -20.29%
BenchmarkClearFat128 5.54 4.69 -15.34%
BenchmarkClearFat256 9.95 9.09 -8.64%
BenchmarkClearFat512 18.7 17.9 -4.28%
BenchmarkClearFat1024 36.2 35.4 -2.21%
Change-Id: Ic786406d9b3cab68d5a231688f9e66fcd1bd7103
Reviewed-on: https://go-review.googlesource.com/2585
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Recognize loops of the form
for i := range a {
a[i] = zero
}
in which the evaluation of a is free from side effects.
Replace these loops with calls to memclr.
This occurs in the stdlib in 18 places.
The motivating example is clearing a byte slice:
benchmark old ns/op new ns/op delta
BenchmarkGoMemclr5 3.31 3.26 -1.51%
BenchmarkGoMemclr16 13.7 3.28 -76.06%
BenchmarkGoMemclr64 50.8 4.14 -91.85%
BenchmarkGoMemclr256 157 6.02 -96.17%
Update #5373.
Change-Id: I99d3e6f5f268e8c6499b7e661df46403e5eb83e4
Reviewed-on: https://go-review.googlesource.com/2520
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Preparation was in CL 134570043.
This CL contains only the effect of 'hg mv src/pkg/* src'.
For more about the move, see golang.org/s/go14nopkg.
|