Age | Commit message (Collapse) | Author |
|
As part of #42026, these helpers from io/ioutil were moved to os.
(ioutil.TempFile and TempDir became os.CreateTemp and MkdirTemp.)
Update the Go tree to use the preferred names.
As usual, code compiled with the Go 1.4 bootstrap toolchain
and code vendored from other sources is excluded.
ReadDir changes are in a separate CL, because they are not a
simple search and replace.
For #42026.
Change-Id: If318df0216d57e95ea0c4093b89f65e5b0ababb3
Reviewed-on: https://go-review.googlesource.com/c/go/+/266365
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
|
|
The switch on tab is checking tab == castagnoliTable,
but castagnoliTable can change value during a concurrent
call to MakeTable.
Fixes #41911.
Change-Id: I6124dcdbf33e17fe302baa3e1aa03202dec61b4c
Reviewed-on: https://go-review.googlesource.com/c/go/+/261639
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
|
|
Add note about using per-use seeds.
Delete "collision-resistant but" in:
> The hash functions are collision-resistant but not cryptographically secure.
"Collision-resistant" has a precise cryptographic meaning that is
incompatible with "not cryptographically secure".
All that is really meant by it here here is "it's a good hash function",
which should be established already.
Also delete:
> The hash value of a given byte sequence is consistent within a
> single process, but will be different in different processes.
This was added for its final clause in response to #37040,
but "The hash value of a given byte sequence" is by design not a
concept in this package. Only "... of a given seed and byte sequence".
And seeds cannot be shared between processes, so again by design
you can't even set up the appropriate first half of the sentence
to say the second half.
Change-Id: I2c02bee0e804ef3b120cb4752bf89e60f3f5ff5d
Reviewed-on: https://go-review.googlesource.com/c/go/+/255968
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Filippo Valsorda <filippo@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
|
|
goos: linux
goarch: arm64
pkg: hash/maphash
BenchmarkHash8Bytes
BenchmarkHash8Bytes 22568919 46.0 ns/op 173.80 MB/s
BenchmarkHash320Bytes
BenchmarkHash320Bytes 5243858 230 ns/op 1393.30 MB/s
BenchmarkHash1K
BenchmarkHash1K 1755870 660 ns/op 1550.60 MB/s
BenchmarkHash8K
BenchmarkHash8K 225688 5313 ns/op 1541.90 MB/s
PASS
ok hash/maphash 6.465s
Change-Id: I5a909042a542135ebc47d639fea02dc46c900c1c
Reviewed-on: https://go-review.googlesource.com/c/go/+/249079
Run-TryBot: Meng Zhuo <mzh@golangcn.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Originally, we use an assembly function that returns a boolean result to
tell whether the machine has vector facility or not. It is now no longer
needed when we can directly use cpu.S390X.HasVX variable. This CL
also removes the last occurence of hasVectorFacility function on s390x.
Change-Id: Id20cb746c21eacac5e13344b362e2d87adfe4317
Reviewed-on: https://go-review.googlesource.com/c/go/+/230337
Reviewed-by: Michael Munday <mike.munday@ibm.com>
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Test all the paths by which a Hash picks its seed.
Make sure they all behave identically to a preset seed.
Change-Id: I2f7950857697f2f07226b96655574c36931b2aae
Reviewed-on: https://go-review.googlesource.com/c/go/+/220686
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Vladimir Evgrafov <evgrafov.vladimir@gmail.com>
Reviewed-by: Alan Donovan <adonovan@google.com>
|
|
Change-Id: I05c7ca644410822a527e94a7a8b883a0f8b0a4ad
Reviewed-on: https://go-review.googlesource.com/c/go/+/220420
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com>
|
|
Hash initializes seed on the first usage of seed or state with initSeed.
initSeed uses SetSeed which discards accumulated data.
This causes hash to return different sums for the same data in the first use
and after reset.
This CL fixes this issue by separating the seed set from data discard.
Fixes #37315
Change-Id: Ic7020702c2ce822eb700af462e37efab12f72054
GitHub-Last-Rev: 48b2f963e86c1b37d49b838a050cc4128bb01266
GitHub-Pull-Request: golang/go#37328
Reviewed-on: https://go-review.googlesource.com/c/go/+/220259
Run-TryBot: Emmanuel Odeke <emm.odeke@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Change-Id: I0d2ba52d79c34d77d475ec8d673286d0e56b826b
Reviewed-on: https://go-review.googlesource.com/c/go/+/219340
Reviewed-by: Alan Donovan <adonovan@google.com>
|
|
Updates #36878
Fixes #37040
Change-Id: Ib0bd21481e5d9c3b3966c116966ecfe071243a24
Reviewed-on: https://go-review.googlesource.com/c/go/+/218297
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com>
|
|
This allows maphash.Hash to be allocated on the stack for typical uses.
Fixes #35636
Change-Id: I8366507d26ea717f47a9fb46d3bd69ba799845ac
Reviewed-on: https://go-review.googlesource.com/c/go/+/207444
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
This CL makes these changes to the hash/maphash API to make it fit a bit
more into the standard library:
- Move some of the package doc onto type Hash, so that `go doc maphash.Hash` shows it.
- Instead of having identical AddBytes and Write methods,
standardize on Write, the usual name for this function.
Similarly, AddString -> WriteString, AddByte -> WriteByte.
- Instead of having identical Hash and Sum64 methods,
standardize on Sum64 (for hash.Hash64). Dropping the "Hash" method
also helps because Hash is usually reserved to mean the state of a
hash function (hash.Hash etc), not the hash value itself.
- Make an uninitialized hash.Hash auto-seed with a random seed.
It is critical that users not use the same seed for all hash functions
in their program, at least not accidentally. So the Hash implementation
must either panic if uninitialized or initialize itself.
Initializing itself is less work for users and can be done lazily.
- Now that the zero hash.Hash is useful, drop maphash.New in favor of
new(maphash.Hash) or simply declaring a maphash.Hash.
- Add a [0]func()-typed field to the Hash so that Hashes cannot be compared.
(I considered doing the same for Seed but comparing seeds seems OK.)
- Drop the integer argument from MakeSeed, to match the original design
in golang.org/issue/28322. There is no point to giving users control
over the specific seed bits, since we want the interpretation of those
bits to be different in every different process. The only thing users
need is to be able to create a new random seed at each call.
(Fixes a TODO in MakeSeed's public doc comment.)
This API is new in Go 1.14, so these changes do not violate the compatibility promise.
Fixes #35060.
Fixes #35348.
Change-Id: Ie6fecc441f3f5ef66388c6ead92e875c0871f805
Reviewed-on: https://go-review.googlesource.com/c/go/+/205069
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Alan Donovan <adonovan@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Fixes #34778
Change-Id: If8225a7c41cb2af3f67157fb9670eef86272e85e
Reviewed-on: https://go-review.googlesource.com/c/go/+/204997
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
Part 1: CL 199499 (GOOS nacl)
Part 2: CL 200077 (amd64p32 files, toolchain)
Part 3: stuff that arguably should've been part of Part 2, but I forgot
one of my grep patterns when splitting the original CL up into
two parts.
This one might also have interesting stuff to resurrect for any future
x32 ABI support.
Updates #30439
Change-Id: I2b4143374a253a003666f3c69e776b7e456bdb9c
Reviewed-on: https://go-review.googlesource.com/c/go/+/200318
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
|
|
This is part two if the nacl removal. Part 1 was CL 199499.
This CL removes amd64p32 support, which might be useful in the future
if we implement the x32 ABI. It also removes the nacl bits in the
toolchain, and some remaining nacl bits.
Updates #30439
Change-Id: I2475d5bb066d1b474e00e40d95b520e7c2e286e1
Reviewed-on: https://go-review.googlesource.com/c/go/+/200077
Reviewed-by: Ian Lance Taylor <iant@golang.org>
|
|
Replace the 128-bit multiplication in 4 parts with bits.Mul64
and two single-width multiplications. This simplifies the code
and increases throughput by ~50% on amd64.
name old time/op new time/op delta
Fnv128KB-4 9.64µs ± 0% 6.09µs ± 0% -36.89% (p=0.016 n=4+5)
Fnv128aKB-4 9.11µs ± 0% 6.17µs ± 5% -32.32% (p=0.008 n=5+5)
name old speed new speed delta
Fnv128KB-4 106MB/s ± 0% 168MB/s ± 0% +58.44% (p=0.016 n=4+5)
Fnv128aKB-4 112MB/s ± 0% 166MB/s ± 5% +47.85% (p=0.008 n=5+5)
Change-Id: Id752f2a20ea3de23a41e08db89eecf2bb60b7e6d
Reviewed-on: https://go-review.googlesource.com/c/133936
Run-TryBot: Matt Layher <mdlayher@gmail.com>
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matt Layher <mdlayher@gmail.com>
Reviewed-by: Robert Griesemer <gri@golang.org>
|
|
Use t.Fatalf instead of t.Errorf followed by t.FailNow.
Change-Id: Ie31f8006e7d9daca7f59bf6f0d5ae688222be486
Reviewed-on: https://go-review.googlesource.com/c/144111
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
Saves 36KB of memory in stdlib packages.
Updates #26775
Change-Id: I0f9d7b17d9768f6fb980d5fbba7c45920215a5fc
Reviewed-on: https://go-review.googlesource.com/127735
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
Avoid using package specific variables when there is a one to one
correspondance to cpu feature support exported by internal/cpu.
This makes it clearer which cpu feature is referenced.
Another advantage is that internal/cpu variables are padded to avoid
false sharing and memory and cache usage is shared by multiple packages.
Change-Id: If18fb448a95207cfa6a3376f3b2ddc4b230dd138
Reviewed-on: https://go-review.googlesource.com/126596
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
Each URL was manually verified to ensure it did not serve up incorrect
content.
Change-Id: I4dc846227af95a73ee9a3074d0c379ff0fa955df
Reviewed-on: https://go-review.googlesource.com/115798
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
|
|
Replace BYTE.. encodings with asm. This is possible due to asm
implementing more instructions and removal of
MOV $0, reg -> XOR reg, reg transformation from asm.
Change-Id: I011749ab6b3f64403ab6e746f3760c5841548b57
Reviewed-on: https://go-review.googlesource.com/97936
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
implementations
There are some basic tests in the packages implementing the hashes,
but this one is meant to be comprehensive for the standard library
as a whole.
Most importantly, it locks in the current representations and makes
sure that they do not change from release to release (and also, as a
result, that future releases can parse the representations generated
by older releases).
The crypto/* MarshalBinary implementations are being changed
in this CL to write only d.x[:d.nx] to the encoding, with zeros for
the remainder of the slice d.x[d.nx:]. The old encoding wrote the
whole d.x, but that exposed an internal detail: whether d.x is
cleared after a full buffer is accumulated, and also whether d.x was
used at all for previous blocks (consider 1-byte writes vs 1024-byte writes).
The new encoding writes only what the decoder needs to know,
nothing more.
In fact the old encodings were arguably also a security hole,
because they exposed data written even before the most recent
call to the Reset method, data that clearly has no impact on the
current hash and clearly should not be exposed. The leakage
is clearly visible in the old crypto/sha1 golden test tables also
being modified in this CL.
Change-Id: I4e9193a3ec5f91d27ce7d0aa24c19b3923741416
Reviewed-on: https://go-review.googlesource.com/82136
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
|
|
This reverts commit 08f19bbde1b01227fdc2fa2d326e4029bb74dd96.
Reason for revert:
The changed transformation takes effect on a larger set
of code snippets than expected.
For example, this:
func foo() {
// Comment
bar()
}
becomes:
func foo() {
// Comment
bar()
}
This is an unintended consequence.
Change-Id: Ifca88d6267dab8a8170791f7205124712bf8ace8
Reviewed-on: https://go-review.googlesource.com/81335
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Joe Tsai <joetsai@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Unless you go back and read the hash package documentation, it's
not clear that all the hash packages implement marshaling and
unmarshaling. Document the behaviour specifically in each package
that implements it as it this is hidden behaviour and easy to miss.
Change-Id: Id9d3508909362f1a3e53872d0319298359e50a94
Reviewed-on: https://go-review.googlesource.com/77251
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
|
|
change hash/crc32 package to use cpu package instead of using
runtime internal variables to check crc32 instruction
Change-Id: I8f88d2351bde8ed4e256f9adf822a08b9a00f532
Reviewed-on: https://go-review.googlesource.com/76490
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
|
|
The cryptographic checksums operate in blocks of 64 or 128 bytes,
which means that the last 128 bytes or so of the input may be encoded
in its original (plaintext) form as part of the state.
Document this so users do not falsely assume that the encoded state
carries no reversible information about the input.
Change-Id: I823dbb87867bf0a77aa20f6ed7a615dbedab3715
Reviewed-on: https://go-review.googlesource.com/77372
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
Example usage of functionality implemented in CL 66710.
Change-Id: I87d6e4d2fb7a60e4ba1e6ef02715480eb7e8f8bd
Reviewed-on: https://go-review.googlesource.com/76011
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
|
|
To improve readability when exported fields are removed,
forbid the printer from emitting an empty line before the first comment
in a const, var, or type block.
Also, when printing the "Has filtered or unexported fields." message,
add an empty line before it to separate the message from the struct
or interfact contents.
Before the change:
<<<
type NamedArg struct {
// Name is the name of the parameter placeholder.
//
// If empty, the ordinal position in the argument list will be
// used.
//
// Name must omit any symbol prefix.
Name string
// Value is the value of the parameter.
// It may be assigned the same value types as the query
// arguments.
Value interface{}
// contains filtered or unexported fields
}
>>>
After the change:
<<<
type NamedArg struct {
// Name is the name of the parameter placeholder.
//
// If empty, the ordinal position in the argument list will be
// used.
//
// Name must omit any symbol prefix.
Name string
// Value is the value of the parameter.
// It may be assigned the same value types as the query
// arguments.
Value interface{}
// contains filtered or unexported fields
}
>>>
Fixes #18264
Change-Id: I9fe17ca39cf92fcdfea55064bd2eaa784ce48c88
Reviewed-on: https://go-review.googlesource.com/71990
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
|
|
implementations
The marshal method allows the hash's internal state to be serialized and
unmarshaled at a later time, without having the re-write the entire stream
of data that was already written to the hash.
Fixes #20573
Change-Id: I40bbb84702ac4b7c5662f99bf943cdf4081203e5
Reviewed-on: https://go-review.googlesource.com/66710
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Change-Id: I2d0439a9f068e726173afafe2ef1f5d62b7feb4d
Reviewed-on: https://go-review.googlesource.com/46190
Run-TryBot: Mikio Hara <mikioh.mikioh@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
Implements detection of x86 cpu features that
are used in the go standard library.
Changes all standard library packages to use the new cpu package
instead of using runtime internal variables to check x86 cpu features.
Updates: #15403
Change-Id: I2999a10cb4d9ec4863ffbed72f4e021a1dbc4bb9
Reviewed-on: https://go-review.googlesource.com/41476
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
ARMv8 defines crc32 instruction.
Comparing to the original crc32 calculation, this patch makes use of
crc32 instructions to do crc32 calculation instead of the multiple
lookup table algorithms.
ARMv8 provides IEEE and Castagnoli polynomials for crc32 calculation
so that the perfomance of these two types of crc32 get significant
improved.
name old time/op new time/op delta
CRC32/poly=IEEE/size=15/align=0-32 117ns ± 0% 38ns ± 0% -67.44%
CRC32/poly=IEEE/size=15/align=1-32 117ns ± 0% 38ns ± 0% -67.52%
CRC32/poly=IEEE/size=40/align=0-32 129ns ± 0% 41ns ± 0% -68.37%
CRC32/poly=IEEE/size=40/align=1-32 129ns ± 0% 41ns ± 0% -68.29%
CRC32/poly=IEEE/size=512/align=0-32 828ns ± 0% 246ns ± 0% -70.29%
CRC32/poly=IEEE/size=512/align=1-32 828ns ± 0% 132ns ± 0% -84.06%
CRC32/poly=IEEE/size=1kB/align=0-32 1.58µs ± 0% 0.46µs ± 0% -70.98%
CRC32/poly=IEEE/size=1kB/align=1-32 1.58µs ± 0% 0.46µs ± 0% -70.92%
CRC32/poly=IEEE/size=4kB/align=0-32 6.06µs ± 0% 1.74µs ± 0% -71.27%
CRC32/poly=IEEE/size=4kB/align=1-32 6.10µs ± 0% 1.74µs ± 0% -71.44%
CRC32/poly=IEEE/size=32kB/align=0-32 48.3µs ± 0% 13.7µs ± 0% -71.61%
CRC32/poly=IEEE/size=32kB/align=1-32 48.3µs ± 0% 13.7µs ± 0% -71.60%
CRC32/poly=Castagnoli/size=15/align=0-32 116ns ± 0% 38ns ± 0% -67.07%
CRC32/poly=Castagnoli/size=15/align=1-32 116ns ± 0% 38ns ± 0% -66.90%
CRC32/poly=Castagnoli/size=40/align=0-32 127ns ± 0% 40ns ± 0% -68.11%
CRC32/poly=Castagnoli/size=40/align=1-32 127ns ± 0% 40ns ± 0% -68.11%
CRC32/poly=Castagnoli/size=512/align=0-32 828ns ± 0% 132ns ± 0% -84.06%
CRC32/poly=Castagnoli/size=512/align=1-32 827ns ± 0% 132ns ± 0% -84.04%
CRC32/poly=Castagnoli/size=1kB/align=0-32 1.59µs ± 0% 0.22µs ± 0% -85.89%
CRC32/poly=Castagnoli/size=1kB/align=1-32 1.58µs ± 0% 0.22µs ± 0% -85.79%
CRC32/poly=Castagnoli/size=4kB/align=0-32 6.14µs ± 0% 0.77µs ± 0% -87.40%
CRC32/poly=Castagnoli/size=4kB/align=1-32 6.06µs ± 0% 0.77µs ± 0% -87.25%
CRC32/poly=Castagnoli/size=32kB/align=0-32 48.3µs ± 0% 5.9µs ± 0% -87.71%
CRC32/poly=Castagnoli/size=32kB/align=1-32 48.4µs ± 0% 6.0µs ± 0% -87.69%
CRC32/poly=Koopman/size=15/align=0-32 104ns ± 0% 104ns ± 0% +0.00%
CRC32/poly=Koopman/size=15/align=1-32 104ns ± 0% 104ns ± 0% +0.00%
CRC32/poly=Koopman/size=40/align=0-32 235ns ± 0% 235ns ± 0% +0.00%
CRC32/poly=Koopman/size=40/align=1-32 235ns ± 0% 235ns ± 0% +0.00%
CRC32/poly=Koopman/size=512/align=0-32 2.71µs ± 0% 2.71µs ± 0% -0.07%
CRC32/poly=Koopman/size=512/align=1-32 2.71µs ± 0% 2.71µs ± 0% -0.04%
CRC32/poly=Koopman/size=1kB/align=0-32 5.40µs ± 0% 5.39µs ± 0% -0.06%
CRC32/poly=Koopman/size=1kB/align=1-32 5.40µs ± 0% 5.40µs ± 0% +0.02%
CRC32/poly=Koopman/size=4kB/align=0-32 21.5µs ± 0% 21.5µs ± 0% -0.16%
CRC32/poly=Koopman/size=4kB/align=1-32 21.5µs ± 0% 21.5µs ± 0% -0.05%
CRC32/poly=Koopman/size=32kB/align=0-32 172µs ± 0% 172µs ± 0% -0.07%
CRC32/poly=Koopman/size=32kB/align=1-32 172µs ± 0% 172µs ± 0% -0.01%
name old speed new speed delta
CRC32/poly=IEEE/size=15/align=0-32 128MB/s ± 0% 394MB/s ± 0% +207.95%
CRC32/poly=IEEE/size=15/align=1-32 128MB/s ± 0% 394MB/s ± 0% +208.09%
CRC32/poly=IEEE/size=40/align=0-32 310MB/s ± 0% 979MB/s ± 0% +216.07%
CRC32/poly=IEEE/size=40/align=1-32 310MB/s ± 0% 979MB/s ± 0% +216.16%
CRC32/poly=IEEE/size=512/align=0-32 618MB/s ± 0% 2074MB/s ± 0% +235.72%
CRC32/poly=IEEE/size=512/align=1-32 618MB/s ± 0% 3852MB/s ± 0% +523.55%
CRC32/poly=IEEE/size=1kB/align=0-32 646MB/s ± 0% 2225MB/s ± 0% +244.57%
CRC32/poly=IEEE/size=1kB/align=1-32 647MB/s ± 0% 2225MB/s ± 0% +243.87%
CRC32/poly=IEEE/size=4kB/align=0-32 676MB/s ± 0% 2352MB/s ± 0% +248.02%
CRC32/poly=IEEE/size=4kB/align=1-32 672MB/s ± 0% 2352MB/s ± 0% +250.15%
CRC32/poly=IEEE/size=32kB/align=0-32 678MB/s ± 0% 2387MB/s ± 0% +252.17%
CRC32/poly=IEEE/size=32kB/align=1-32 678MB/s ± 0% 2388MB/s ± 0% +252.11%
CRC32/poly=Castagnoli/size=15/align=0-32 129MB/s ± 0% 393MB/s ± 0% +205.51%
CRC32/poly=Castagnoli/size=15/align=1-32 129MB/s ± 0% 390MB/s ± 0% +203.41%
CRC32/poly=Castagnoli/size=40/align=0-32 314MB/s ± 0% 988MB/s ± 0% +215.04%
CRC32/poly=Castagnoli/size=40/align=1-32 314MB/s ± 0% 987MB/s ± 0% +214.68%
CRC32/poly=Castagnoli/size=512/align=0-32 618MB/s ± 0% 3860MB/s ± 0% +524.32%
CRC32/poly=Castagnoli/size=512/align=1-32 619MB/s ± 0% 3859MB/s ± 0% +523.66%
CRC32/poly=Castagnoli/size=1kB/align=0-32 645MB/s ± 0% 4568MB/s ± 0% +608.56%
CRC32/poly=Castagnoli/size=1kB/align=1-32 650MB/s ± 0% 4567MB/s ± 0% +602.94%
CRC32/poly=Castagnoli/size=4kB/align=0-32 667MB/s ± 0% 5297MB/s ± 0% +693.81%
CRC32/poly=Castagnoli/size=4kB/align=1-32 676MB/s ± 0% 5297MB/s ± 0% +684.00%
CRC32/poly=Castagnoli/size=32kB/align=0-32 678MB/s ± 0% 5519MB/s ± 0% +713.83%
CRC32/poly=Castagnoli/size=32kB/align=1-32 677MB/s ± 0% 5497MB/s ± 0% +712.04%
CRC32/poly=Koopman/size=15/align=0-32 143MB/s ± 0% 144MB/s ± 0% +0.27%
CRC32/poly=Koopman/size=15/align=1-32 143MB/s ± 0% 144MB/s ± 0% +0.33%
CRC32/poly=Koopman/size=40/align=0-32 169MB/s ± 0% 170MB/s ± 0% +0.12%
CRC32/poly=Koopman/size=40/align=1-32 170MB/s ± 0% 170MB/s ± 0% +0.08%
CRC32/poly=Koopman/size=512/align=0-32 189MB/s ± 0% 189MB/s ± 0% +0.07%
CRC32/poly=Koopman/size=512/align=1-32 189MB/s ± 0% 189MB/s ± 0% +0.04%
CRC32/poly=Koopman/size=1kB/align=0-32 190MB/s ± 0% 190MB/s ± 0% +0.05%
CRC32/poly=Koopman/size=1kB/align=1-32 190MB/s ± 0% 190MB/s ± 0% -0.01%
CRC32/poly=Koopman/size=4kB/align=0-32 190MB/s ± 0% 190MB/s ± 0% +0.15%
CRC32/poly=Koopman/size=4kB/align=1-32 190MB/s ± 0% 191MB/s ± 0% +0.05%
CRC32/poly=Koopman/size=32kB/align=0-32 191MB/s ± 0% 191MB/s ± 0% +0.06%
CRC32/poly=Koopman/size=32kB/align=1-32 191MB/s ± 0% 191MB/s ± 0% +0.02%
Also fix a bug of arm64 assembler
The optimization is mainly contributed by Fangming.Fang <fangming.fang@arm.com>
Change-Id: I900678c2e445d7e8ad9e2a9ab3305d649230905f
Reviewed-on: https://go-review.googlesource.com/40074
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
The 128bit FNV hash will be used e.g. in QUIC.
The algorithm is described at
https://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function
Change-Id: I13f3ec39b0e12b7a5008824a6619dff2e708ee81
Reviewed-on: https://go-review.googlesource.com/38356
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
Change-Id: I1f1cfb161640eb8756fb1a283892d06b30b7a8fa
Reviewed-on: https://go-review.googlesource.com/39356
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
This change improves the performance of crc32 for ppc64le by using
vpmsum and other vector instructions in the algorithm.
The testcase was updated to test more sizes.
Fixes #19570
BenchmarkCRC32/poly=IEEE/size=15/align=0-8 90.5 81.8 -9.61%
BenchmarkCRC32/poly=IEEE/size=15/align=1-8 89.7 81.7 -8.92%
BenchmarkCRC32/poly=IEEE/size=40/align=0-8 93.2 61.1 -34.44%
BenchmarkCRC32/poly=IEEE/size=40/align=1-8 92.8 60.9 -34.38%
BenchmarkCRC32/poly=IEEE/size=512/align=0-8 501 55.8 -88.86%
BenchmarkCRC32/poly=IEEE/size=512/align=1-8 502 132 -73.71%
BenchmarkCRC32/poly=IEEE/size=1kB/align=0-8 947 69.9 -92.62%
BenchmarkCRC32/poly=IEEE/size=1kB/align=1-8 946 144 -84.78%
BenchmarkCRC32/poly=IEEE/size=4kB/align=0-8 3602 186 -94.84%
BenchmarkCRC32/poly=IEEE/size=4kB/align=1-8 3603 263 -92.70%
BenchmarkCRC32/poly=IEEE/size=32kB/align=0-8 28404 1338 -95.29%
BenchmarkCRC32/poly=IEEE/size=32kB/align=1-8 28856 1405 -95.13%
BenchmarkCRC32/poly=Castagnoli/size=15/align=0-8 89.7 81.8 -8.81%
BenchmarkCRC32/poly=Castagnoli/size=15/align=1-8 89.8 81.9 -8.80%
BenchmarkCRC32/poly=Castagnoli/size=40/align=0-8 93.8 61.4 -34.54%
BenchmarkCRC32/poly=Castagnoli/size=40/align=1-8 94.3 61.3 -34.99%
BenchmarkCRC32/poly=Castagnoli/size=512/align=0-8 503 56.4 -88.79%
BenchmarkCRC32/poly=Castagnoli/size=512/align=1-8 502 132 -73.71%
BenchmarkCRC32/poly=Castagnoli/size=1kB/align=0-8 941 70.2 -92.54%
BenchmarkCRC32/poly=Castagnoli/size=1kB/align=1-8 943 145 -84.62%
BenchmarkCRC32/poly=Castagnoli/size=4kB/align=0-8 3588 186 -94.82%
BenchmarkCRC32/poly=Castagnoli/size=4kB/align=1-8 3595 264 -92.66%
BenchmarkCRC32/poly=Castagnoli/size=32kB/align=0-8 28266 1323 -95.32%
BenchmarkCRC32/poly=Castagnoli/size=32kB/align=1-8 28344 1404 -95.05%
Change-Id: Ic4d8274c66e0e87bfba5f609f508a3877aee6bb5
Reviewed-on: https://go-review.googlesource.com/38184
Reviewed-by: David Chase <drchase@google.com>
|
|
Change-Id: Iae68a097a6897f1616f94fdc3548837ef200e66f
Reviewed-on: https://go-review.googlesource.com/36541
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
|
|
Major reorganization of the crc32 code:
- The arch-specific files now implement a well-defined interface
(documented in crc32.go). They no longer have the responsibility of
initializing and falling back to a non-accelerated implementation;
instead, that happens in the higher level code.
- The non-accelerated algorithms are moved to a separate file with no
dependencies on other code.
- The "cutoff" optimization for slicing-by-8 is moved inside the
algorithm itself (as opposed to every callsite).
Tests are significantly improved:
- direct tests for the non-accelerated algorithms.
- "cross-check" tests for arch-specific implementations (all archs).
- tests for misaligned buffers for both IEEE and Castagnoli.
Fixes #16909.
Change-Id: I9b6dd83b7a57cd615eae901c0a6d61c6b8091c74
Reviewed-on: https://go-review.googlesource.com/27935
Reviewed-by: Keith Randall <khr@golang.org>
|
|
When SSE is available, we don't need the Table. However, it is
returned as a handle by MakeTable. Fix this to always generate
the table.
Further cleanup is discussed in #16909.
Change-Id: Ic05400d68c6b5d25073ebd962000451746137afc
Reviewed-on: https://go-review.googlesource.com/27934
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
The algorithm is explained in the comments. The improvement in
throughput is about 1.4x for buffers between 500b-4Kb and 2.5x-2.6x
for larger buffers.
Additionally, we no longer initialize the software tables if SSE4.2 is
available.
Adding a test for the SSE implementation (restricted to amd64 and
amd64p32).
Benchmarks on a Haswell i5-4670 @ 3.4 GHz:
name old time/op new time/op delta
CastagnoliCrc15B-4 21.9ns ± 1% 22.9ns ± 0% +4.45%
CastagnoliCrc15BMisaligned-4 22.6ns ± 0% 23.4ns ± 0% +3.43%
CastagnoliCrc40B-4 23.3ns ± 0% 23.9ns ± 0% +2.58%
CastagnoliCrc40BMisaligned-4 25.4ns ± 0% 26.1ns ± 0% +2.86%
CastagnoliCrc512-4 72.6ns ± 0% 52.8ns ± 0% -27.33%
CastagnoliCrc512Misaligned-4 76.3ns ± 1% 56.3ns ± 0% -26.18%
CastagnoliCrc1KB-4 128ns ± 1% 89ns ± 0% -30.04%
CastagnoliCrc1KBMisaligned-4 130ns ± 0% 88ns ± 0% -32.65%
CastagnoliCrc4KB-4 461ns ± 0% 187ns ± 0% -59.40%
CastagnoliCrc4KBMisaligned-4 463ns ± 0% 191ns ± 0% -58.77%
CastagnoliCrc32KB-4 3.58µs ± 0% 1.35µs ± 0% -62.22%
CastagnoliCrc32KBMisaligned-4 3.58µs ± 0% 1.36µs ± 0% -61.84%
name old speed new speed delta
CastagnoliCrc15B-4 684MB/s ± 1% 655MB/s ± 0% -4.32%
CastagnoliCrc15BMisaligned-4 663MB/s ± 0% 641MB/s ± 0% -3.32%
CastagnoliCrc40B-4 1.72GB/s ± 0% 1.67GB/s ± 0% -2.69%
CastagnoliCrc40BMisaligned-4 1.58GB/s ± 0% 1.53GB/s ± 0% -2.82%
CastagnoliCrc512-4 7.05GB/s ± 0% 9.70GB/s ± 0% +37.59%
CastagnoliCrc512Misaligned-4 6.71GB/s ± 1% 9.09GB/s ± 0% +35.43%
CastagnoliCrc1KB-4 7.98GB/s ± 1% 11.46GB/s ± 0% +43.55%
CastagnoliCrc1KBMisaligned-4 7.86GB/s ± 0% 11.70GB/s ± 0% +48.75%
CastagnoliCrc4KB-4 8.87GB/s ± 0% 21.80GB/s ± 0% +145.69%
CastagnoliCrc4KBMisaligned-4 8.83GB/s ± 0% 21.39GB/s ± 0% +142.25%
CastagnoliCrc32KB-4 9.15GB/s ± 0% 24.22GB/s ± 0% +164.62%
CastagnoliCrc32KBMisaligned-4 9.16GB/s ± 0% 24.00GB/s ± 0% +161.94%
Fixes #16107.
Change-Id: Ibe50ea76574674ce0571ef31c31015e0ed66b907
Reviewed-on: https://go-review.googlesource.com/27931
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
This reverts commit 54d7de7dd62bab764125c48fd159bb938da078e1.
It was breaking non-amd64 builds.
Change-Id: I22650e922498eeeba3d4fa08bb4ea40a210c8f97
Reviewed-on: https://go-review.googlesource.com/27925
Reviewed-by: Keith Randall <khr@golang.org>
|
|
The algorithm is explained in the comments. The improvement in
throughput is about 1.4x for buffers between 500b-4Kb and 2.5x-2.6x
for larger buffers.
Additionally, we no longer initialize the software tables if SSE4.2 is
available.
Benchmarks on a Haswell i5-4670 @ 3.4 GHz:
name old time/op new time/op delta
CastagnoliCrc15B-4 21.9ns ± 1% 22.9ns ± 0% +4.45%
CastagnoliCrc15BMisaligned-4 22.6ns ± 0% 23.4ns ± 0% +3.43%
CastagnoliCrc40B-4 23.3ns ± 0% 23.9ns ± 0% +2.58%
CastagnoliCrc40BMisaligned-4 25.4ns ± 0% 26.1ns ± 0% +2.86%
CastagnoliCrc512-4 72.6ns ± 0% 52.8ns ± 0% -27.33%
CastagnoliCrc512Misaligned-4 76.3ns ± 1% 56.3ns ± 0% -26.18%
CastagnoliCrc1KB-4 128ns ± 1% 89ns ± 0% -30.04%
CastagnoliCrc1KBMisaligned-4 130ns ± 0% 88ns ± 0% -32.65%
CastagnoliCrc4KB-4 461ns ± 0% 187ns ± 0% -59.40%
CastagnoliCrc4KBMisaligned-4 463ns ± 0% 191ns ± 0% -58.77%
CastagnoliCrc32KB-4 3.58µs ± 0% 1.35µs ± 0% -62.22%
CastagnoliCrc32KBMisaligned-4 3.58µs ± 0% 1.36µs ± 0% -61.84%
name old speed new speed delta
CastagnoliCrc15B-4 684MB/s ± 1% 655MB/s ± 0% -4.32%
CastagnoliCrc15BMisaligned-4 663MB/s ± 0% 641MB/s ± 0% -3.32%
CastagnoliCrc40B-4 1.72GB/s ± 0% 1.67GB/s ± 0% -2.69%
CastagnoliCrc40BMisaligned-4 1.58GB/s ± 0% 1.53GB/s ± 0% -2.82%
CastagnoliCrc512-4 7.05GB/s ± 0% 9.70GB/s ± 0% +37.59%
CastagnoliCrc512Misaligned-4 6.71GB/s ± 1% 9.09GB/s ± 0% +35.43%
CastagnoliCrc1KB-4 7.98GB/s ± 1% 11.46GB/s ± 0% +43.55%
CastagnoliCrc1KBMisaligned-4 7.86GB/s ± 0% 11.70GB/s ± 0% +48.75%
CastagnoliCrc4KB-4 8.87GB/s ± 0% 21.80GB/s ± 0% +145.69%
CastagnoliCrc4KBMisaligned-4 8.83GB/s ± 0% 21.39GB/s ± 0% +142.25%
CastagnoliCrc32KB-4 9.15GB/s ± 0% 24.22GB/s ± 0% +164.62%
CastagnoliCrc32KBMisaligned-4 9.16GB/s ± 0% 24.00GB/s ± 0% +161.94%
Fixes #16107.
Change-Id: I8fa827ec03f708ba27ee71c833f7544ad9dc5bc3
Reviewed-on: https://go-review.googlesource.com/24471
Reviewed-by: Keith Randall <khr@golang.org>
|
|
The code wasn't checking to see if the data was still >= 64 bytes
long after aligning it.
Aligning the data is an optimization and we don't actually need
to do it. In fact for smaller sizes it slows things down due to
the overhead of calling the generic function. Therefore for now
I have simply removed the alignment stage. I have also added a
check into the assembly to deliberately trigger a segmentation
fault if the data is too short.
Fixes #16779.
Change-Id: Ic01636d775efc5ec97689f050991cee04ce8fe73
Reviewed-on: https://go-review.googlesource.com/27409
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
AMD64
This commit improves the processing of the final few bytes in
castagnoliSSE42: instead of processing one byte at a time, we use all
versions of the CRC32 instruction to process 4 bytes, then 2, then 1.
The difference is only noticeable for small "odd" sized buffers.
We do the similar improvement for processing the first few bytes in
the case of unaligned buffer.
Fixing the test which was not actually verifying the results for
misaligned buffers (WriteString was creating an internal copy which
was aligned).
Adding benchmarks for length 15 (aligned and misaligned), results
below.
name old time/op new time/op delta
CastagnoliCrc15B-4 25.1ns ± 0% 22.1ns ± 1% -12.14%
CastagnoliCrc15BMisaligned-4 25.2ns ± 0% 22.9ns ± 1% -9.03%
CastagnoliCrc40B-4 23.1ns ± 0% 23.4ns ± 0% +1.08%
CastagnoliCrc1KB-4 127ns ± 0% 128ns ± 0% +1.18%
CastagnoliCrc4KB-4 462ns ± 0% 464ns ± 0% ~
CastagnoliCrc32KB-4 3.58µs ± 0% 3.60µs ± 0% +0.58%
name old speed new speed delta
CastagnoliCrc15B-4 597MB/s ± 0% 679MB/s ± 1% +13.77%
CastagnoliCrc15BMisaligned-4 596MB/s ± 0% 655MB/s ± 1% +9.94%
CastagnoliCrc40B-4 1.73GB/s ± 0% 1.71GB/s ± 0% -1.14%
CastagnoliCrc1KB-4 8.01GB/s ± 0% 7.93GB/s ± 1% -1.06%
CastagnoliCrc4KB-4 8.86GB/s ± 0% 8.83GB/s ± 0% ~
CastagnoliCrc32KB-4 9.14GB/s ± 0% 9.09GB/s ± 0% -0.58%
Change-Id: I499e37af2241d28e3e5d522bbab836c1a718430a
Reviewed-on: https://go-review.googlesource.com/24470
Reviewed-by: Keith Randall <khr@golang.org>
|
|
Similar to crc32 slicing by 8.
This also fixes a Crc64KB benchmark actually using 1024 bytes.
Crc64/ISO64KB-4 147µs ± 0% 37µs ± 0% -75.05% (p=0.000 n=18+18)
Crc64/ISO4KB-4 9.19µs ± 0% 2.33µs ± 0% -74.70% (p=0.000 n=19+20)
Crc64/ISO1KB-4 2.31µs ± 0% 0.60µs ± 0% -73.81% (p=0.000 n=19+15)
Crc64/ECMA64KB-4 147µs ± 0% 37µs ± 0% -75.05% (p=0.000 n=20+20)
Crc64/Random64KB-4 147µs ± 0% 41µs ± 0% -72.17% (p=0.000 n=20+18)
Crc64/Random16KB-4 36.7µs ± 0% 36.5µs ± 0% -0.54% (p=0.000 n=18+19)
name old speed new speed delta
Crc64/ISO64KB-4 446MB/s ± 0% 1788MB/s ± 0% +300.72% (p=0.000 n=18+18)
Crc64/ISO4KB-4 446MB/s ± 0% 1761MB/s ± 0% +295.20% (p=0.000 n=18+20)
Crc64/ISO1KB-4 444MB/s ± 0% 1694MB/s ± 0% +281.46% (p=0.000 n=19+20)
Crc64/ECMA64KB-4 446MB/s ± 0% 1788MB/s ± 0% +300.77% (p=0.000 n=20+20)
Crc64/Random64KB-4 446MB/s ± 0% 1603MB/s ± 0% +259.32% (p=0.000 n=20+18)
Crc64/Random16KB-4 446MB/s ± 0% 448MB/s ± 0% +0.54% (p=0.000 n=18+20)
Change-Id: I1c7621d836c486d6bfc41dbe1ec2ff9ab11aedfc
Reviewed-on: https://go-review.googlesource.com/22222
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
|
|
The input buffer is aligned to a doubleword boundary to
improve performance of the vector instructions. The pure
Go implementation is used to align the input data, and is
also used when the vector instructions are not available
or the data length is less than 64 bytes.
Change-Id: Ie259a5f2f1562bcc17961c99e5776c99091d6bed
Reviewed-on: https://go-review.googlesource.com/22201
Reviewed-by: Michael Munday <munday@ca.ibm.com>
Run-TryBot: Michael Munday <munday@ca.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Bill O'Farrell <billotosyr@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
name old time/op new time/op delta
Adler32KB-4 592ns ± 0% 447ns ± 0% -24.49% (p=0.000 n=19+20)
name old speed new speed delta
Adler32KB-4 1.73GB/s ± 0% 2.29GB/s ± 0% +32.41% (p=0.000 n=20+20)
Change-Id: I38990aa66ca4452a886200018a57c0bc3af30717
Reviewed-on: https://go-review.googlesource.com/21880
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
It seems cleaner and more consistent with other files to list the
architectures that have assembly implementations rather than to
list those that do not.
This means we don't have to add s390x and future platforms to this
list.
Change-Id: I2ad3f66b76eb1711333c910236ca7f5151b698e5
Reviewed-on: https://go-review.googlesource.com/21770
Reviewed-by: Bill O'Farrell <billotosyr@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
Currently we test crc64 only with ISO polynomial.
Change-Id: Ibc5e202db3b960369cbbb18e31eb0fea07b54dba
Reviewed-on: https://go-review.googlesource.com/21309
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
This adds "slicing by 8" optimization to Castagnoli tables which will
speed up CRC32 calculation on systems without asssembler,
which are all but AMD64.
In my tests, it is faster to use "slicing by 8" for sizes all down to
16 bytes, so the switchover point has been adjusted.
There are no benchmarks for small sizes, so I have added one for 40 bytes,
as well as one for bigger sizes (32KB).
Castagnoli, No assembler, 40 Byte payload: (before, after)
BenchmarkCastagnoli40B-4 10000000 161 ns/op 246.94 MB/s
BenchmarkCastagnoli40B-4 20000000 100 ns/op 398.01 MB/s
Castagnoli, No assembler, 32KB payload: (before, after)
BenchmarkCastagnoli32KB-4 10000 115426 ns/op 283.89 MB/s
BenchmarkCastagnoli32KB-4 30000 45171 ns/op 725.41 MB/s
IEEE, No assembler, 1KB payload: (before, after)
BenchmarkCrc1KB-4 500000 3604 ns/op 284.10 MB/s
BenchmarkCrc1KB-4 1000000 1463 ns/op 699.79 MB/s
Compared:
benchmark old ns/op new ns/op delta
BenchmarkCastagnoli40B-4 161 100 -37.89%
BenchmarkCastagnoli32KB-4 115426 45171 -60.87%
BenchmarkCrc1KB-4 3604 1463 -59.41%
benchmark old MB/s new MB/s speedup
BenchmarkCastagnoli40B-4 246.94 398.01 1.61x
BenchmarkCastagnoli32KB-4 283.89 725.41 2.56x
BenchmarkCrc1KB-4 284.10 699.79 2.46x
Change-Id: I303e4ec84e8d4dafd057d64c0e43deb2b498e968
Reviewed-on: https://go-review.googlesource.com/19335
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
Add several instructions that were used via BYTE and use them.
Instructions added: PEXTRB, PEXTRD, PEXTRQ, PINSRB, XGETBV, POPCNT.
Change-Id: I5a80cd390dc01f3555dbbe856a475f74b5e6df65
Reviewed-on: https://go-review.googlesource.com/18593
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
|