math/big: remove Direct Sqrt computation

The Float.Sqrt method switches (for performance reasons) between direct (uses Quo) and inverse (doesn't) computation, depending on the precision, with threshold 128. Unfortunately the implementation of recursive division in CL 172018 made Quo slightly slower exactly in the range around and below the threshold Sqrt is using, so this strategy is no longer profitable. The new division algorithm allocates more, and this has increased the amount of allocations performed by Sqrt when using the direct method; on low precisions the computation is fast, so additional allocations have an negative impact on performance. Interestingly, only using the inverse method doesn't just reverse the effects of the Quo algorithm change, but it seems to make performances better overall for small precisions: name old time/op new time/op delta FloatSqrt/64-4 643ns ± 1% 635ns ± 1% -1.24% (p=0.000 n=10+10) FloatSqrt/128-4 1.44µs ± 1% 1.02µs ± 1% -29.25% (p=0.000 n=10+10) FloatSqrt/256-4 1.49µs ± 1% 1.49µs ± 1% ~ (p=0.752 n=10+10) FloatSqrt/1000-4 3.71µs ± 1% 3.74µs ± 1% +0.87% (p=0.001 n=10+10) FloatSqrt/10000-4 35.3µs ± 1% 35.6µs ± 1% +0.82% (p=0.002 n=10+9) FloatSqrt/100000-4 844µs ± 1% 844µs ± 0% ~ (p=0.549 n=10+9) FloatSqrt/1000000-4 69.5ms ± 0% 69.6ms ± 0% ~ (p=0.222 n=9+9) name old alloc/op new alloc/op delta FloatSqrt/64-4 280B ± 0% 200B ± 0% -28.57% (p=0.000 n=10+10) FloatSqrt/128-4 504B ± 0% 248B ± 0% -50.79% (p=0.000 n=10+10) FloatSqrt/256-4 344B ± 0% 344B ± 0% ~ (all equal) FloatSqrt/1000-4 1.30kB ± 0% 1.30kB ± 0% ~ (all equal) FloatSqrt/10000-4 13.5kB ± 0% 13.5kB ± 0% ~ (p=0.237 n=10+10) FloatSqrt/100000-4 123kB ± 0% 123kB ± 0% ~ (p=0.247 n=10+10) FloatSqrt/1000000-4 1.83MB ± 1% 1.83MB ± 3% ~ (p=0.779 n=8+10) name old allocs/op new allocs/op delta FloatSqrt/64-4 8.00 ± 0% 5.00 ± 0% -37.50% (p=0.000 n=10+10) FloatSqrt/128-4 11.0 ± 0% 5.0 ± 0% -54.55% (p=0.000 n=10+10) FloatSqrt/256-4 5.00 ± 0% 5.00 ± 0% ~ (all equal) FloatSqrt/1000-4 6.00 ± 0% 6.00 ± 0% ~ (all equal) FloatSqrt/10000-4 6.00 ± 0% 6.00 ± 0% ~ (all equal) FloatSqrt/100000-4 6.00 ± 0% 6.00 ± 0% ~ (all equal) FloatSqrt/1000000-4 10.3 ±13% 10.3 ±13% ~ (p=1.000 n=10+10) For example, 1.02µs for FloatSqrt/128 is actually better than what I was getting on the same machine before the Quo changes. The .8% slowdown on /1000 and /10000 appears to be real and it is quite baffling (that codepath was not touched at all); it may be caused by code alignment changes. Change-Id: Ib03761cdc1055674bc7526d4f3a23d7a25094029 Reviewed-on: https://go-review.googlesource.com/c/go/+/228062 Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>
author: Alberto Donizetti <alb.donizetti@gmail.com> 2020-04-15 10:50:30 +0200
committer: Robert Griesemer <gri@golang.org> 2020-04-15 16:37:53 +0000
commit: 813f8eae2738c75151d036906a9008525c1ba0fe (patch)
tree: 3846225c465f826584e49eadfa4a901f32a93537 /src/math
parent: 435b9dd1a1bae81a32eafb59a9de7fb2873cd51e (diff)
download: go-813f8eae2738c75151d036906a9008525c1ba0fe.tar.gz
go-813f8eae2738c75151d036906a9008525c1ba0fe.zip
1 files changed, 3 insertions, 51 deletions
diff --git a/src/math/big/sqrt.go b/src/math/big/sqrt.go
index e11504ad07..0d50164557 100644
--- a/src/math/big/sqrt.go
+++ b/src/math/big/sqrt.go
@@ -73,61 +73,14 @@ func (z *Float) Sqrt(x *Float) *Float {
 	}
 	// 0.25 <= z < 2.0
 
-	// Solving x² - z = 0 directly requires a Quo call, but it's
-	// faster for small precisions.
-	//
-	// Solving 1/x² - z = 0 avoids the Quo call and is much faster for
-	// high precisions.
-	//
-	// 128bit precision is an empirically chosen threshold.
-	if z.prec <= 128 {
-		z.sqrtDirect(z)
-	} else {
-		z.sqrtInverse(z)
-	}
+	// Solving 1/x² - z = 0 avoids Quo calls and is faster, especially
+	// for high precisions.
+	z.sqrtInverse(z)
 
 	// re-attach halved exponent
 	return z.SetMantExp(z, b/2)
 }
 
-// Compute √x (up to prec 128) by solving
-//   t² - x = 0
-// for t, starting with a 53 bits precision guess from math.Sqrt and
-// then using at most two iterations of Newton's method.
-func (z *Float) sqrtDirect(x *Float) {
-	// let
-	//   f(t) = t² - x
-	// then
-	//   g(t) = f(t)/f'(t) = ½(t² - x)/t
-	// and the next guess is given by
-	//   t2 = t - g(t) = ½(t² + x)/t
-	u := new(Float)
-	ng := func(t *Float) *Float {
-		u.prec = t.prec
-		u.Mul(t, t)        // u = t²
-		u.Add(u, x)        //   = t² + x
-		u.exp--            //   = ½(t² + x)
-		return t.Quo(u, t) //   = ½(t² + x)/t
-	}
-
-	xf, _ := x.Float64()
-	sq := NewFloat(math.Sqrt(xf))
-
-	switch {
-	case z.prec > 128:
-		panic("sqrtDirect: only for z.prec <= 128")
-	case z.prec > 64:
-		sq.prec *= 2
-		sq = ng(sq)
-		fallthrough
-	default:
-		sq.prec *= 2
-		sq = ng(sq)
-	}
-
-	z.Set(sq)
-}
-
 // Compute √x (to z.prec precision) by solving
 //   1/t² - x = 0
 // for t (using Newton's method), and then inverting.
@@ -150,7 +103,6 @@ func (z *Float) sqrtInverse(x *Float) {
 		u.Mul(t, v)     // u = t(3 - xt²)
 		u.exp--         //   = ½t(3 - xt²)
 		return t.Set(u)
-
 	}
 
 	xf, _ := x.Float64()
author	Alberto Donizetti <alb.donizetti@gmail.com>	2020-04-15 10:50:30 +0200
committer	Robert Griesemer <gri@golang.org>	2020-04-15 16:37:53 +0000
commit	813f8eae2738c75151d036906a9008525c1ba0fe (patch)
tree	3846225c465f826584e49eadfa4a901f32a93537 /src/math
parent	435b9dd1a1bae81a32eafb59a9de7fb2873cd51e (diff)
download	go-813f8eae2738c75151d036906a9008525c1ba0fe.tar.gz go-813f8eae2738c75151d036906a9008525c1ba0fe.zip