aboutsummaryrefslogtreecommitdiff
path: root/src/runtime/memmove_amd64.s
AgeCommit message (Collapse)Author
2021-05-13all: add //go:build lines to assembly filesTobias Klauser
Don't add them to files in vendor and cmd/vendor though. These will be pulled in by updating the respective dependencies. For #41184 Change-Id: Icc57458c9b3033c347124323f33084c85b224c70 Reviewed-on: https://go-review.googlesource.com/c/go/+/319389 Trust: Tobias Klauser <tobias.klauser@gmail.com> Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Russ Cox <rsc@golang.org>
2021-04-12runtime: port performance-critical functions to regabiAustin Clements
This CL ports a few performance-critical runtime assembly functions to use register arguments directly. While using the faster ABI is nice, the real win here is that we avoid ABI wrappers: since these are "builtin" functions in the compiler, it can generate calls to them without knowing that their native implementation is ABI0. Hence, it generates ABIInternal calls that go through ABI wrappers. By porting them to use ABIInternal natively, we avoid the overhead of the ABI wrapper. This significantly improves performance on several benchmarks, comparing regabiwrappers before and after this change: name old time/op new time/op delta BiogoIgor 15.7s ± 2% 15.7s ± 2% ~ (p=0.617 n=25+25) BiogoKrishna 18.5s ± 5% 17.7s ± 2% -4.61% (p=0.000 n=25+25) BleveIndexBatch100 5.91s ± 3% 5.82s ± 3% -1.60% (p=0.000 n=25+25) BleveQuery 6.76s ± 0% 6.60s ± 1% -2.31% (p=0.000 n=22+25) CompileTemplate 248ms ± 5% 245ms ± 1% ~ (p=0.643 n=25+20) CompileUnicode 94.4ms ± 3% 93.9ms ± 2% ~ (p=0.152 n=24+23) CompileGoTypes 1.60s ± 2% 1.59s ± 2% ~ (p=0.059 n=24+24) CompileCompiler 104ms ± 3% 103ms ± 1% ~ (p=0.056 n=25+22) CompileSSA 10.9s ± 1% 10.9s ± 1% ~ (p=0.052 n=25+25) CompileFlate 156ms ± 8% 152ms ± 1% -2.49% (p=0.008 n=25+21) CompileGoParser 248ms ± 1% 249ms ± 2% ~ (p=0.058 n=21+20) CompileReflect 595ms ± 3% 601ms ± 4% ~ (p=0.182 n=25+25) CompileTar 211ms ± 2% 211ms ± 1% ~ (p=0.663 n=23+23) CompileXML 282ms ± 2% 284ms ± 5% ~ (p=0.456 n=21+23) CompileStdCmd 13.6s ± 2% 13.5s ± 2% ~ (p=0.112 n=25+24) FoglemanFauxGLRenderRotateBoat 8.69s ± 2% 8.67s ± 0% ~ (p=0.094 n=22+25) FoglemanPathTraceRenderGopherIter1 20.2s ± 2% 20.7s ± 3% +2.53% (p=0.000 n=24+24) GopherLuaKNucleotide 31.4s ± 1% 31.0s ± 1% -1.28% (p=0.000 n=25+24) MarkdownRenderXHTML 246ms ± 1% 244ms ± 1% -0.79% (p=0.000 n=20+21) Tile38WithinCircle100kmRequest 843µs ± 4% 818µs ± 4% -2.93% (p=0.000 n=25+25) Tile38IntersectsCircle100kmRequest 1.06ms ± 5% 1.05ms ± 3% -1.19% (p=0.021 n=24+25) Tile38KNearestLimit100Request 1.01ms ± 1% 1.01ms ± 2% ~ (p=0.335 n=22+25) [Geo mean] 596ms 592ms -0.71% (https://perf.golang.org/search?q=upload:20210411.5) It also significantly reduces the performance penalty of enabling regabiwrappers, though it doesn't yet fully close the gap on all benchmarks: name old time/op new time/op delta BiogoIgor 15.7s ± 1% 15.7s ± 2% ~ (p=0.366 n=24+25) BiogoKrishna 17.7s ± 2% 17.7s ± 2% ~ (p=0.315 n=23+25) BleveIndexBatch100 5.86s ± 4% 5.82s ± 3% ~ (p=0.137 n=24+25) BleveQuery 6.55s ± 0% 6.60s ± 1% +0.83% (p=0.000 n=24+25) CompileTemplate 244ms ± 1% 245ms ± 1% ~ (p=0.208 n=21+20) CompileUnicode 94.0ms ± 4% 93.9ms ± 2% ~ (p=0.666 n=24+23) CompileGoTypes 1.60s ± 2% 1.59s ± 2% ~ (p=0.154 n=25+24) CompileCompiler 103ms ± 1% 103ms ± 1% ~ (p=0.905 n=24+22) CompileSSA 10.9s ± 2% 10.9s ± 1% ~ (p=0.803 n=25+25) CompileFlate 153ms ± 1% 152ms ± 1% ~ (p=0.182 n=23+21) CompileGoParser 250ms ± 2% 249ms ± 2% ~ (p=0.843 n=24+20) CompileReflect 595ms ± 4% 601ms ± 4% ~ (p=0.141 n=25+25) CompileTar 212ms ± 3% 211ms ± 1% ~ (p=0.499 n=23+23) CompileXML 282ms ± 1% 284ms ± 5% ~ (p=0.129 n=20+23) CompileStdCmd 13.5s ± 2% 13.5s ± 2% ~ (p=0.480 n=24+24) FoglemanFauxGLRenderRotateBoat 8.66s ± 1% 8.67s ± 0% ~ (p=0.325 n=25+25) FoglemanPathTraceRenderGopherIter1 20.6s ± 3% 20.7s ± 3% ~ (p=0.137 n=25+24) GopherLuaKNucleotide 30.5s ± 2% 31.0s ± 1% +1.68% (p=0.000 n=23+24) MarkdownRenderXHTML 243ms ± 1% 244ms ± 1% +0.51% (p=0.000 n=23+21) Tile38WithinCircle100kmRequest 801µs ± 2% 818µs ± 4% +2.11% (p=0.000 n=25+25) Tile38IntersectsCircle100kmRequest 1.01ms ± 2% 1.05ms ± 3% +4.34% (p=0.000 n=24+25) Tile38KNearestLimit100Request 1.00ms ± 1% 1.01ms ± 2% +0.81% (p=0.008 n=21+25) [Geo mean] 589ms 592ms +0.50% (https://perf.golang.org/search?q=upload:20210411.6) Change-Id: I8f77f010b0abc658064df569a27a9c7a7b1c7bf9 Reviewed-on: https://go-review.googlesource.com/c/go/+/308931 Trust: Austin Clements <austin@google.com> Run-TryBot: Austin Clements <austin@google.com> Reviewed-by: Cherry Zhang <cherryyz@google.com> TryBot-Result: Go Bot <gobot@golang.org>
2020-06-04all: fix dead links to inferno-os bitbucket repositoryTobias Klauser
Generated using: perl -i -npe 's#inferno-os/src/default#inferno-os/src/master#' $(git grep -l "inferno-os/src/default" | grep -v vendor) Change-Id: I4b6443bd09a8ea4c8aaeb40a1c73520d1f7ca648 Reviewed-on: https://go-review.googlesource.com/c/go/+/235821 Reviewed-by: Cherry Zhang <cherryyz@google.com> Reviewed-by: Austin Clements <austin@google.com>
2020-01-22runtime: document special memmove requirementsAustin Clements
Unlike C's memmove, Go's memmove must be careful to do indivisible writes of pointer values because it may be racing with the garbage collector reading the heap. We've had various bugs related to this over the years (#36101, #13160, #12552). Indeed, memmove is a great target for optimization and it's easy to forget the special requirements of Go's memmove. The CL documents these (currently unwritten!) requirements. We're also adding a test that should hopefully keep everyone honest going forward, though it's hard to be sure we're hitting all cases of memmove. Change-Id: I2f59f8d8d6fb42d2f10006b55d605b5efd8ddc24 Reviewed-on: https://go-review.googlesource.com/c/go/+/213418 Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-08-24all: align cpu feature variable offset namingMartin Möhrmann
Add an "offset_" prefix to all cpu feature variable offset constants to signify that they are not boolean cpu feature variables. Remove _ from offset constant names. Change-Id: I6e22a79ebcbe6e2ae54c4ac8764f9260bb3223ff Reviewed-on: https://go-review.googlesource.com/131215 Run-TryBot: Martin Möhrmann <moehrmann@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-08-24runtime: use internal/cpu variables in assembler codeMartin Möhrmann
Using internal/cpu variables has the benefit of avoiding false sharing (as those are padded) and allows memory and cache usage for these variables to be shared by multiple packages. Change-Id: I2bf68d03091bf52b466cf689230d5d25d5950037 Reviewed-on: https://go-review.googlesource.com/126599 Run-TryBot: Martin Möhrmann <moehrmann@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2018-06-11runtime: remove TODO notes suggesting jump tablesIskander Sharipov
For memmove/memclr using jump tables only reduces overall function performance for both amd64 and 386. Benchmarks for 32-bit memclr: name old time/op new time/op delta Memclr/5-8 8.01ns ± 0% 8.94ns ± 2% +11.59% (p=0.000 n=9+9) Memclr/16-8 9.05ns ± 0% 9.49ns ± 0% +4.81% (p=0.000 n=8+8) Memclr/64-8 9.15ns ± 0% 9.49ns ± 0% +3.76% (p=0.000 n=9+10) Memclr/256-8 16.6ns ± 0% 16.6ns ± 0% ~ (p=1.140 n=10+9) Memclr/4096-8 179ns ± 0% 166ns ± 0% -7.26% (p=0.000 n=9+8) Memclr/65536-8 3.36µs ± 1% 3.31µs ± 1% -1.48% (p=0.000 n=10+9) Memclr/1M-8 59.5µs ± 3% 60.5µs ± 2% +1.67% (p=0.009 n=10+10) Memclr/4M-8 239µs ± 3% 245µs ± 0% +2.49% (p=0.004 n=10+8) Memclr/8M-8 618µs ± 2% 614µs ± 1% ~ (p=0.315 n=10+8) Memclr/16M-8 1.49ms ± 2% 1.47ms ± 1% -1.11% (p=0.029 n=10+10) Memclr/64M-8 7.06ms ± 1% 7.05ms ± 0% ~ (p=0.573 n=10+8) [Geo mean] 3.36µs 3.39µs +1.14% For less predictable data, like loop iteration dependant sizes, branch table still shows 2-5% worse results. It also makes code slightly more complicated. This CL removes TODO note that directly suggest trying this optimization out. That encourages people to spend their time in a quite hopeless endeavour. The code used to implement branch table used a 32/64-entry table with pointers to TEXT blocks that implemented every associated label work. Most last entries point to "loop" code that is a fallthrough for all other sizes that do not map into specialized routines. The only inefficiency is extra MOVL/MOVQ required to fetch table pointer itself as MOVL $sym<>(SB)(AX*4) is not valid in Go asm (it works in other assemblers): TEXT ·memclrNew(SB), NOSPLIT, $0-8 MOVL ptr+0(FP), DI MOVL n+4(FP), BX // Handle 0 separately. TESTL BX, BX JEQ _0 LEAL -1(BX), CX // n-1 BSRL CX, CX // AX or X0 zeroed inside every text block. MOVL $memclrTable<>(SB), AX JMP (AX)(CX*4) _0: RET Change-Id: I4f706931b8127f85a8439b95834d5c2485a5d1bf Reviewed-on: https://go-review.googlesource.com/115678 Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2018-06-01all: update comment URLs from HTTP to HTTPS, where possibleTim Cooper
Each URL was manually verified to ensure it did not serve up incorrect content. Change-Id: I4dc846227af95a73ee9a3074d0c379ff0fa955df Reviewed-on: https://go-review.googlesource.com/115798 Reviewed-by: Ian Lance Taylor <iant@golang.org> Run-TryBot: Ian Lance Taylor <iant@golang.org>
2018-05-21runtime: use Go function signatures for memclr and memmove commentsMichael Munday
The function signatures in the comments used a C-like style. Using Go function signatures is cleaner. Change-Id: I1a093ed8fe5df59f3697c613cf3fce58bba4f5c1 Reviewed-on: https://go-review.googlesource.com/113876 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-02-15runtime: simplify amd64 memmove of 3/4 bytesJosh Bleecher Snyder
Change-Id: I132d3627ae301b68bf87eacb5bf41fd1ba2dcd91 Reviewed-on: https://go-review.googlesource.com/94025 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-02-15runtime: fix minor doc typos in amd64 memmoveJosh Bleecher Snyder
Change-Id: Ic1ce2f93d6a225699e9ce5307d62cdda8f97630d Reviewed-on: https://go-review.googlesource.com/94024 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-05-01runtime: refactor cpu feature detection for 386 & amd64Martin Möhrmann
Changes all cpu features to be detected and stored in bools in rt0_go. Updates: #15403 Change-Id: I5a9961cdec789b331d09c44d86beb53833d5dc3e Reviewed-on: https://go-review.googlesource.com/41950 Run-TryBot: Martin Möhrmann <moehrmann@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ilya Tocar <ilya.tocar@intel.com> Reviewed-by: Keith Randall <khr@golang.org>
2017-04-25runtime: simplify detection of preference to use AVX memmoveMartin Möhrmann
Reduces cmd/go by 4464 bytes on amd64. Removes the duplicate detection of AVX support and presence of Intel processors. Change-Id: I4670189951a63760fae217708f68d65e94a30dc5 Reviewed-on: https://go-review.googlesource.com/41570 Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-04-03all: fix minor misspellingsEric Lagergren
Change-Id: I1f1cfb161640eb8756fb1a283892d06b30b7a8fa Reviewed-on: https://go-review.googlesource.com/39356 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-01-23runtime: amd64, use 4-byte ops for memmove of 4 bytesKeith Randall
memmove used to use 2 2-byte load/store pairs to move 4 bytes. When the result is loaded with a single 4-byte load, it caused a store to load fowarding stall. To avoid the stall, special case memmove to use 4 byte ops for the 4 byte copy case. We already have a special case for 8-byte copies. 386 already specializes 4-byte copies. I'll do 2-byte copies also, but not for 1.8. benchmark old ns/op new ns/op delta BenchmarkIssue18740-8 7567 4799 -36.58% 3-byte copies get a bit slower. Other copies are unchanged. name old time/op new time/op delta Memmove/3-8 4.76ns ± 5% 5.26ns ± 3% +10.50% (p=0.000 n=10+10) Fixes #18740 Change-Id: Iec82cbac0ecfee80fa3c8fc83828f9a1819c3c74 Reviewed-on: https://go-review.googlesource.com/35567 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2016-10-06 runtime: improve memmove for amd64Denis Nagorny
Use AVX if available on 4th generation of Intel(TM) Core(TM) processors. (collected on E5 2609v3 @1.9GHz) name old speed new speed delta Memmove/1-6 158MB/s ± 0% 172MB/s ± 0% +9.09% (p=0.000 n=16+16) Memmove/2-6 316MB/s ± 0% 345MB/s ± 0% +9.09% (p=0.000 n=18+16) Memmove/3-6 517MB/s ± 0% 517MB/s ± 0% ~ (p=0.445 n=16+16) Memmove/4-6 687MB/s ± 1% 690MB/s ± 0% +0.35% (p=0.000 n=20+17) Memmove/5-6 729MB/s ± 0% 729MB/s ± 0% +0.01% (p=0.000 n=16+18) Memmove/6-6 875MB/s ± 0% 875MB/s ± 0% +0.01% (p=0.000 n=18+18) Memmove/7-6 1.02GB/s ± 0% 1.02GB/s ± 1% ~ (p=0.139 n=19+20) Memmove/8-6 1.26GB/s ± 0% 1.26GB/s ± 0% +0.00% (p=0.000 n=18+18) Memmove/9-6 1.42GB/s ± 0% 1.42GB/s ± 0% +0.00% (p=0.000 n=17+18) Memmove/10-6 1.58GB/s ± 0% 1.58GB/s ± 0% +0.00% (p=0.000 n=19+19) Memmove/11-6 1.74GB/s ± 0% 1.74GB/s ± 0% +0.00% (p=0.001 n=18+17) Memmove/12-6 1.90GB/s ± 0% 1.90GB/s ± 0% +0.00% (p=0.000 n=19+19) Memmove/13-6 2.05GB/s ± 0% 2.05GB/s ± 0% +0.00% (p=0.000 n=18+19) Memmove/14-6 2.21GB/s ± 0% 2.21GB/s ± 0% +0.00% (p=0.000 n=16+20) Memmove/15-6 2.37GB/s ± 0% 2.37GB/s ± 0% +0.00% (p=0.004 n=19+20) Memmove/16-6 2.53GB/s ± 0% 2.53GB/s ± 0% +0.00% (p=0.000 n=16+16) Memmove/32-6 4.67GB/s ± 0% 4.67GB/s ± 0% +0.00% (p=0.000 n=17+17) Memmove/64-6 8.67GB/s ± 0% 8.64GB/s ± 0% -0.33% (p=0.000 n=18+17) Memmove/128-6 12.6GB/s ± 0% 11.6GB/s ± 0% -8.05% (p=0.000 n=16+19) Memmove/256-6 16.3GB/s ± 0% 16.6GB/s ± 0% +1.66% (p=0.000 n=20+18) Memmove/512-6 21.5GB/s ± 0% 24.4GB/s ± 0% +13.35% (p=0.000 n=18+17) Memmove/1024-6 24.7GB/s ± 0% 33.7GB/s ± 0% +36.12% (p=0.000 n=18+18) Memmove/2048-6 27.3GB/s ± 0% 43.3GB/s ± 0% +58.77% (p=0.000 n=19+17) Memmove/4096-6 37.5GB/s ± 0% 50.5GB/s ± 0% +34.56% (p=0.000 n=19+19) MemmoveUnalignedDst/1-6 135MB/s ± 0% 146MB/s ± 0% +7.69% (p=0.000 n=16+14) MemmoveUnalignedDst/2-6 271MB/s ± 0% 292MB/s ± 0% +7.69% (p=0.000 n=18+18) MemmoveUnalignedDst/3-6 438MB/s ± 0% 438MB/s ± 0% ~ (p=0.352 n=16+19) MemmoveUnalignedDst/4-6 584MB/s ± 0% 584MB/s ± 0% ~ (p=0.876 n=17+17) MemmoveUnalignedDst/5-6 631MB/s ± 1% 632MB/s ± 0% +0.25% (p=0.000 n=20+17) MemmoveUnalignedDst/6-6 759MB/s ± 0% 759MB/s ± 0% +0.00% (p=0.000 n=19+16) MemmoveUnalignedDst/7-6 885MB/s ± 0% 883MB/s ± 1% ~ (p=0.647 n=18+20) MemmoveUnalignedDst/8-6 1.08GB/s ± 0% 1.08GB/s ± 0% +0.00% (p=0.035 n=19+18) MemmoveUnalignedDst/9-6 1.22GB/s ± 0% 1.22GB/s ± 0% ~ (p=0.251 n=18+17) MemmoveUnalignedDst/10-6 1.35GB/s ± 0% 1.35GB/s ± 0% ~ (p=0.327 n=17+18) MemmoveUnalignedDst/11-6 1.49GB/s ± 0% 1.49GB/s ± 0% ~ (p=0.531 n=18+19) MemmoveUnalignedDst/12-6 1.63GB/s ± 0% 1.63GB/s ± 0% ~ (p=0.886 n=19+18) MemmoveUnalignedDst/13-6 1.76GB/s ± 0% 1.76GB/s ± 1% -0.24% (p=0.006 n=18+20) MemmoveUnalignedDst/14-6 1.90GB/s ± 0% 1.90GB/s ± 0% ~ (p=0.818 n=20+19) MemmoveUnalignedDst/15-6 2.03GB/s ± 0% 2.03GB/s ± 0% ~ (p=0.294 n=17+16) MemmoveUnalignedDst/16-6 2.17GB/s ± 0% 2.17GB/s ± 0% ~ (p=0.602 n=16+18) MemmoveUnalignedDst/32-6 4.05GB/s ± 0% 4.05GB/s ± 0% +0.00% (p=0.010 n=18+17) MemmoveUnalignedDst/64-6 7.59GB/s ± 0% 7.59GB/s ± 0% +0.00% (p=0.022 n=18+16) MemmoveUnalignedDst/128-6 11.1GB/s ± 0% 11.4GB/s ± 0% +2.79% (p=0.000 n=18+17) MemmoveUnalignedDst/256-6 16.4GB/s ± 0% 16.7GB/s ± 0% +1.59% (p=0.000 n=20+17) MemmoveUnalignedDst/512-6 15.7GB/s ± 0% 21.3GB/s ± 0% +35.87% (p=0.000 n=18+20) MemmoveUnalignedDst/1024-6 16.0GB/s ±20% 31.5GB/s ± 0% +96.93% (p=0.000 n=20+14) MemmoveUnalignedDst/2048-6 19.6GB/s ± 0% 42.1GB/s ± 0% +115.16% (p=0.000 n=17+18) MemmoveUnalignedDst/4096-6 6.41GB/s ± 0% 33.18GB/s ± 0% +417.56% (p=0.000 n=17+18) MemmoveUnalignedSrc/1-6 171MB/s ± 0% 166MB/s ± 0% -3.33% (p=0.000 n=19+16) MemmoveUnalignedSrc/2-6 343MB/s ± 0% 342MB/s ± 1% -0.41% (p=0.000 n=17+20) MemmoveUnalignedSrc/3-6 508MB/s ± 0% 493MB/s ± 1% -2.90% (p=0.000 n=17+17) MemmoveUnalignedSrc/4-6 677MB/s ± 0% 660MB/s ± 2% -2.55% (p=0.000 n=17+20) MemmoveUnalignedSrc/5-6 790MB/s ± 0% 790MB/s ± 0% ~ (p=0.139 n=17+17) MemmoveUnalignedSrc/6-6 948MB/s ± 0% 946MB/s ± 1% ~ (p=0.330 n=17+19) MemmoveUnalignedSrc/7-6 1.11GB/s ± 0% 1.11GB/s ± 0% -0.05% (p=0.026 n=17+17) MemmoveUnalignedSrc/8-6 1.38GB/s ± 0% 1.38GB/s ± 0% ~ (p=0.091 n=18+16) MemmoveUnalignedSrc/9-6 1.42GB/s ± 0% 1.40GB/s ± 1% -1.04% (p=0.000 n=19+20) MemmoveUnalignedSrc/10-6 1.58GB/s ± 0% 1.56GB/s ± 1% -1.15% (p=0.000 n=18+19) MemmoveUnalignedSrc/11-6 1.73GB/s ± 0% 1.71GB/s ± 1% -1.30% (p=0.000 n=20+20) MemmoveUnalignedSrc/12-6 1.89GB/s ± 0% 1.87GB/s ± 1% -1.18% (p=0.000 n=17+20) MemmoveUnalignedSrc/13-6 2.05GB/s ± 0% 2.02GB/s ± 1% -1.18% (p=0.000 n=17+20) MemmoveUnalignedSrc/14-6 2.21GB/s ± 0% 2.18GB/s ± 1% -1.14% (p=0.000 n=17+20) MemmoveUnalignedSrc/15-6 2.36GB/s ± 0% 2.34GB/s ± 1% -1.04% (p=0.000 n=17+20) MemmoveUnalignedSrc/16-6 2.52GB/s ± 0% 2.49GB/s ± 1% -1.26% (p=0.000 n=19+20) MemmoveUnalignedSrc/32-6 4.82GB/s ± 0% 4.61GB/s ± 0% -4.40% (p=0.000 n=19+20) MemmoveUnalignedSrc/64-6 5.03GB/s ± 4% 7.97GB/s ± 0% +58.55% (p=0.000 n=20+16) MemmoveUnalignedSrc/128-6 11.1GB/s ± 0% 11.2GB/s ± 0% +0.52% (p=0.000 n=17+18) MemmoveUnalignedSrc/256-6 16.5GB/s ± 0% 16.4GB/s ± 0% -0.10% (p=0.000 n=20+18) MemmoveUnalignedSrc/512-6 21.0GB/s ± 0% 22.1GB/s ± 0% +5.48% (p=0.000 n=14+17) MemmoveUnalignedSrc/1024-6 24.9GB/s ± 0% 31.9GB/s ± 0% +28.20% (p=0.000 n=19+20) MemmoveUnalignedSrc/2048-6 23.3GB/s ± 0% 33.8GB/s ± 0% +45.22% (p=0.000 n=17+19) MemmoveUnalignedSrc/4096-6 37.3GB/s ± 0% 42.7GB/s ± 0% +14.30% (p=0.000 n=17+17) Change-Id: Id66aa3e499ccfb117cb99d623ef326b50d057b64 Reviewed-on: https://go-review.googlesource.com/29590 Run-TryBot: Denis Nagorny <denis.nagorny@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-08-31Revert "runtime: improve memmove for amd64"Joe Tsai
This reverts commit 3607c5f4f18ad4d423e40996ebf7f46b2f79ce02. This was causing failures on amd64 machines without AVX. Fixes #16939 Change-Id: I70080fbb4e7ae791857334f2bffd847d08cb25fa Reviewed-on: https://go-review.googlesource.com/28274 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-08-31runtime: fix typoKevin Burke
Change-Id: I47e3cfa8b49e3d0b55c91387df31488b37038a8f Reviewed-on: https://go-review.googlesource.com/28225 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-08-31runtime: improve memmove for amd64Denis Nagorny
Use AVX if available on 4th generation of Intel(TM) Core(TM) processors. (collected on E5 2609v3 @1.9GHz) name old speed new speed delta Memmove/1-6 158MB/s ± 0% 172MB/s ± 0% +9.09% (p=0.000 n=16+16) Memmove/2-6 316MB/s ± 0% 345MB/s ± 0% +9.09% (p=0.000 n=18+16) Memmove/3-6 517MB/s ± 0% 517MB/s ± 0% ~ (p=0.445 n=16+16) Memmove/4-6 687MB/s ± 1% 690MB/s ± 0% +0.35% (p=0.000 n=20+17) Memmove/5-6 729MB/s ± 0% 729MB/s ± 0% +0.01% (p=0.000 n=16+18) Memmove/6-6 875MB/s ± 0% 875MB/s ± 0% +0.01% (p=0.000 n=18+18) Memmove/7-6 1.02GB/s ± 0% 1.02GB/s ± 1% ~ (p=0.139 n=19+20) Memmove/8-6 1.26GB/s ± 0% 1.26GB/s ± 0% +0.00% (p=0.000 n=18+18) Memmove/9-6 1.42GB/s ± 0% 1.42GB/s ± 0% +0.00% (p=0.000 n=17+18) Memmove/10-6 1.58GB/s ± 0% 1.58GB/s ± 0% +0.00% (p=0.000 n=19+19) Memmove/11-6 1.74GB/s ± 0% 1.74GB/s ± 0% +0.00% (p=0.001 n=18+17) Memmove/12-6 1.90GB/s ± 0% 1.90GB/s ± 0% +0.00% (p=0.000 n=19+19) Memmove/13-6 2.05GB/s ± 0% 2.05GB/s ± 0% +0.00% (p=0.000 n=18+19) Memmove/14-6 2.21GB/s ± 0% 2.21GB/s ± 0% +0.00% (p=0.000 n=16+20) Memmove/15-6 2.37GB/s ± 0% 2.37GB/s ± 0% +0.00% (p=0.004 n=19+20) Memmove/16-6 2.53GB/s ± 0% 2.53GB/s ± 0% +0.00% (p=0.000 n=16+16) Memmove/32-6 4.67GB/s ± 0% 4.67GB/s ± 0% +0.00% (p=0.000 n=17+17) Memmove/64-6 8.67GB/s ± 0% 8.64GB/s ± 0% -0.33% (p=0.000 n=18+17) Memmove/128-6 12.6GB/s ± 0% 11.6GB/s ± 0% -8.05% (p=0.000 n=16+19) Memmove/256-6 16.3GB/s ± 0% 16.6GB/s ± 0% +1.66% (p=0.000 n=20+18) Memmove/512-6 21.5GB/s ± 0% 24.4GB/s ± 0% +13.35% (p=0.000 n=18+17) Memmove/1024-6 24.7GB/s ± 0% 33.7GB/s ± 0% +36.12% (p=0.000 n=18+18) Memmove/2048-6 27.3GB/s ± 0% 43.3GB/s ± 0% +58.77% (p=0.000 n=19+17) Memmove/4096-6 37.5GB/s ± 0% 50.5GB/s ± 0% +34.56% (p=0.000 n=19+19) MemmoveUnalignedDst/1-6 135MB/s ± 0% 146MB/s ± 0% +7.69% (p=0.000 n=16+14) MemmoveUnalignedDst/2-6 271MB/s ± 0% 292MB/s ± 0% +7.69% (p=0.000 n=18+18) MemmoveUnalignedDst/3-6 438MB/s ± 0% 438MB/s ± 0% ~ (p=0.352 n=16+19) MemmoveUnalignedDst/4-6 584MB/s ± 0% 584MB/s ± 0% ~ (p=0.876 n=17+17) MemmoveUnalignedDst/5-6 631MB/s ± 1% 632MB/s ± 0% +0.25% (p=0.000 n=20+17) MemmoveUnalignedDst/6-6 759MB/s ± 0% 759MB/s ± 0% +0.00% (p=0.000 n=19+16) MemmoveUnalignedDst/7-6 885MB/s ± 0% 883MB/s ± 1% ~ (p=0.647 n=18+20) MemmoveUnalignedDst/8-6 1.08GB/s ± 0% 1.08GB/s ± 0% +0.00% (p=0.035 n=19+18) MemmoveUnalignedDst/9-6 1.22GB/s ± 0% 1.22GB/s ± 0% ~ (p=0.251 n=18+17) MemmoveUnalignedDst/10-6 1.35GB/s ± 0% 1.35GB/s ± 0% ~ (p=0.327 n=17+18) MemmoveUnalignedDst/11-6 1.49GB/s ± 0% 1.49GB/s ± 0% ~ (p=0.531 n=18+19) MemmoveUnalignedDst/12-6 1.63GB/s ± 0% 1.63GB/s ± 0% ~ (p=0.886 n=19+18) MemmoveUnalignedDst/13-6 1.76GB/s ± 0% 1.76GB/s ± 1% -0.24% (p=0.006 n=18+20) MemmoveUnalignedDst/14-6 1.90GB/s ± 0% 1.90GB/s ± 0% ~ (p=0.818 n=20+19) MemmoveUnalignedDst/15-6 2.03GB/s ± 0% 2.03GB/s ± 0% ~ (p=0.294 n=17+16) MemmoveUnalignedDst/16-6 2.17GB/s ± 0% 2.17GB/s ± 0% ~ (p=0.602 n=16+18) MemmoveUnalignedDst/32-6 4.05GB/s ± 0% 4.05GB/s ± 0% +0.00% (p=0.010 n=18+17) MemmoveUnalignedDst/64-6 7.59GB/s ± 0% 7.59GB/s ± 0% +0.00% (p=0.022 n=18+16) MemmoveUnalignedDst/128-6 11.1GB/s ± 0% 11.4GB/s ± 0% +2.79% (p=0.000 n=18+17) MemmoveUnalignedDst/256-6 16.4GB/s ± 0% 16.7GB/s ± 0% +1.59% (p=0.000 n=20+17) MemmoveUnalignedDst/512-6 15.7GB/s ± 0% 21.3GB/s ± 0% +35.87% (p=0.000 n=18+20) MemmoveUnalignedDst/1024-6 16.0GB/s ±20% 31.5GB/s ± 0% +96.93% (p=0.000 n=20+14) MemmoveUnalignedDst/2048-6 19.6GB/s ± 0% 42.1GB/s ± 0% +115.16% (p=0.000 n=17+18) MemmoveUnalignedDst/4096-6 6.41GB/s ± 0% 33.18GB/s ± 0% +417.56% (p=0.000 n=17+18) MemmoveUnalignedSrc/1-6 171MB/s ± 0% 166MB/s ± 0% -3.33% (p=0.000 n=19+16) MemmoveUnalignedSrc/2-6 343MB/s ± 0% 342MB/s ± 1% -0.41% (p=0.000 n=17+20) MemmoveUnalignedSrc/3-6 508MB/s ± 0% 493MB/s ± 1% -2.90% (p=0.000 n=17+17) MemmoveUnalignedSrc/4-6 677MB/s ± 0% 660MB/s ± 2% -2.55% (p=0.000 n=17+20) MemmoveUnalignedSrc/5-6 790MB/s ± 0% 790MB/s ± 0% ~ (p=0.139 n=17+17) MemmoveUnalignedSrc/6-6 948MB/s ± 0% 946MB/s ± 1% ~ (p=0.330 n=17+19) MemmoveUnalignedSrc/7-6 1.11GB/s ± 0% 1.11GB/s ± 0% -0.05% (p=0.026 n=17+17) MemmoveUnalignedSrc/8-6 1.38GB/s ± 0% 1.38GB/s ± 0% ~ (p=0.091 n=18+16) MemmoveUnalignedSrc/9-6 1.42GB/s ± 0% 1.40GB/s ± 1% -1.04% (p=0.000 n=19+20) MemmoveUnalignedSrc/10-6 1.58GB/s ± 0% 1.56GB/s ± 1% -1.15% (p=0.000 n=18+19) MemmoveUnalignedSrc/11-6 1.73GB/s ± 0% 1.71GB/s ± 1% -1.30% (p=0.000 n=20+20) MemmoveUnalignedSrc/12-6 1.89GB/s ± 0% 1.87GB/s ± 1% -1.18% (p=0.000 n=17+20) MemmoveUnalignedSrc/13-6 2.05GB/s ± 0% 2.02GB/s ± 1% -1.18% (p=0.000 n=17+20) MemmoveUnalignedSrc/14-6 2.21GB/s ± 0% 2.18GB/s ± 1% -1.14% (p=0.000 n=17+20) MemmoveUnalignedSrc/15-6 2.36GB/s ± 0% 2.34GB/s ± 1% -1.04% (p=0.000 n=17+20) MemmoveUnalignedSrc/16-6 2.52GB/s ± 0% 2.49GB/s ± 1% -1.26% (p=0.000 n=19+20) MemmoveUnalignedSrc/32-6 4.82GB/s ± 0% 4.61GB/s ± 0% -4.40% (p=0.000 n=19+20) MemmoveUnalignedSrc/64-6 5.03GB/s ± 4% 7.97GB/s ± 0% +58.55% (p=0.000 n=20+16) MemmoveUnalignedSrc/128-6 11.1GB/s ± 0% 11.2GB/s ± 0% +0.52% (p=0.000 n=17+18) MemmoveUnalignedSrc/256-6 16.5GB/s ± 0% 16.4GB/s ± 0% -0.10% (p=0.000 n=20+18) MemmoveUnalignedSrc/512-6 21.0GB/s ± 0% 22.1GB/s ± 0% +5.48% (p=0.000 n=14+17) MemmoveUnalignedSrc/1024-6 24.9GB/s ± 0% 31.9GB/s ± 0% +28.20% (p=0.000 n=19+20) MemmoveUnalignedSrc/2048-6 23.3GB/s ± 0% 33.8GB/s ± 0% +45.22% (p=0.000 n=17+19) MemmoveUnalignedSrc/4096-6 37.3GB/s ± 0% 42.7GB/s ± 0% +14.30% (p=0.000 n=17+17) Change-Id: Iab488d93a293cdf573ab5cd89b95a818bbb5d531 Reviewed-on: https://go-review.googlesource.com/22515 Run-TryBot: Denis Nagorny <denis.nagorny@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2016-08-29all: fix obsolete inferno-os linksEmmanuel Odeke
Fixes #16911. Fix obsolete inferno-os links, since code.google.com shutdown. This CL points to the right files by replacing http://code.google.com/p/inferno-os/source/browse with https://bitbucket.org/inferno-os/inferno-os/src/default To implement the change I wrote and ran this script in the root: $ grep -Rn 'http://code.google.com/p/inferno-os/source/browse' * \ | cut -d":" -f1 | while read F;do perl -pi -e \ 's/http:\/\/code.google.com\/p\/inferno-os\/source\/browse/https:\/\/bitbucket.org\/inferno-os\/inferno-os\/src\/default/g' $F;done I excluded any cmd/vendor changes from the commit. Change-Id: Iaaf828ac8f6fc949019fd01832989d00b29b6749 Reviewed-on: https://go-review.googlesource.com/27994 Reviewed-by: Ian Lance Taylor <iant@golang.org> Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-31runtime: don't use REP;MOVSB if CPUID doesn't say it is fastKeith Randall
Only use REP;MOVSB if: 1) The CPUID flag says it is fast, and 2) The pointers are unaligned Otherwise, use REP;MOVSQ. Update #14630 Change-Id: I946b28b87880c08e5eed1ce2945016466c89db66 Reviewed-on: https://go-review.googlesource.com/21300 Reviewed-by: Nigel Tao <nigeltao@golang.org>
2016-03-21runtime: use MOVSB instead of MOVSQ for unaligned movesKeith Randall
MOVSB is quite a bit faster for unaligned moves. Possibly we should use MOVSB all of the time, but Intel folks say it might be a bit faster to use MOVSQ on some processors (but not any I have access to at the moment). benchmark old ns/op new ns/op delta BenchmarkMemmove4096-8 93.9 93.2 -0.75% BenchmarkMemmoveUnalignedDst4096-8 256 151 -41.02% BenchmarkMemmoveUnalignedSrc4096-8 175 90.5 -48.29% Fixes #14630 Change-Id: I568e6d6590eb3615e6a699fb474020596be665ff Reviewed-on: https://go-review.googlesource.com/20293 Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-03-02all: single space after period.Brad Fitzpatrick
The tree's pretty inconsistent about single space vs double space after a period in documentation. Make it consistently a single space, per earlier decisions. This means contributors won't be confused by misleading precedence. This CL doesn't use go/doc to parse. It only addresses // comments. It was generated with: $ perl -i -npe 's,^(\s*// .+[a-z]\.) +([A-Z]),$1 $2,' $(git grep -l -E '^\s*//(.+\.) +([A-Z])') $ go test go/doc -update Change-Id: Iccdb99c37c797ef1f804a94b22ba5ee4b500c4f7 Reviewed-on: https://go-review.googlesource.com/20022 Reviewed-by: Rob Pike <r@golang.org> Reviewed-by: Dave Day <djd@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-11-07runtime: memmove/memclr pointers atomicallyKeith Randall
Make sure that we're moving or zeroing pointers atomically. Anything that is a multiple of pointer size and at least pointer aligned might have pointers in it. All the code looks ok except for the 1-pointer-sized moves. Fixes #13160 Update #12552 Change-Id: Ib97d9b918fa9f4cc5c56c67ed90255b7fdfb7b45 Reviewed-on: https://go-review.googlesource.com/16668 Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2014-09-08build: move package sources from src/pkg to srcRuss Cox
Preparation was in CL 134570043. This CL contains only the effect of 'hg mv src/pkg/* src'. For more about the move, see golang.org/s/go14nopkg.