This is the mail archive of the
mailing list for the GCC project.
Re: [PATCH] improved algorithm for gcc/expmed.c::choose_multiplier()
- From: Denis Vlasenko <vda dot linux at googlemail dot com>
- To: Jim Wilson <wilson at specifix dot com>
- Cc: gcc at gcc dot gnu dot org
- Date: Wed, 2 Aug 2006 22:12:38 +0200
- Subject: Re: [PATCH] improved algorithm for gcc/expmed.c::choose_multiplier()
- References: <firstname.lastname@example.org> <44CE856E.email@example.com>
On Tuesday 01 August 2006 00:34, Jim Wilson wrote:
> Denis Vlasenko wrote:
> > I still cannot figure out what precision is, so I restricted new code to
> > (n == HOST_BITS_PER_WIDE_INT && precision == HOST_BITS_PER_WIDE_INT) case.
> > Need help here.
> At the moment, there is probably no one who understands this code as
> well as you do, so you may not get much help from others.
> I looked at this a little, and I think precision is the number of
> significant bits in the divisor. Note that unsigned divides use N for
> precision, but signed divides use N-1. Also, in the pre_shift case,
> where we shift the divisors right if they have zeros in the low bits, we
> subtract the shift count from the precision.
> This probably has some effect on how many bits we can safely use for the
> resulting inverse multiplier.
> This code is based on a paper writte by Torbjorn Granlund and Peter
> Montgomery. This was published in the 1994 SIGPLAN PLDI conference
> proceedings. You should read this paper if you haven't already done so.
> There may be clues in there about how the gcc algorithm works. The
> gcc code was written by one of the co-authors, Torbjorn Granlund.
I read it.
I uploaded newer patch to http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28417.
My code basically differs in
1) I never do 33-bit math. I do 32-bit and if the best 32-bit value of K
for multiplier still doesn't work (there exist such 32-bit A
that A*K/L != A/B) then I return K*2-1 (which is a 33-bit value)
_without_ ever checking that it works (because it works always
because of math stuff (read the paper)). Old code calculates
33-bit K (it calls it "m", not K) and needs to jumt thru hoops
due to 33-bitness.
2) I do not do mhigh/mlow stuff. This is the core of my improvement.
I directly calculate the smallest bad_A for which bad_A*K/L != A/B.
For some values of B, it turns out that bad_A is > 2^32-1 even though
mhigh/mlow method says that K is maybe bad (i.e. my method
proves that this K is actually okay).
It also allows for finding better K values if we know that A is
bounded (from value range analysis), but current patch doesn't
use it yet.
My algorithm's comment (taken from patch):
+[below: 'div' is unsigned integer division]
+['/' is real division with infinite precision]
+[A,B,C... - integers, a,b,c... - reals]
+Theory: we want to calculate A div B, fast.
+B is const > 2 and is not a power of 2.
+In fp: A/B = A*(L/B)/L. (If L is a large power of 2,
+say 2^32, then it could be done really fast).
+Let k := L/B, K := L div B + 1.
+Then A/B = A*k/L.
+Then this is true:
+A div B <= A * K div L.
+For small enough A: A div B = A * K div L.
+Lets find first A for which it is not true.
+Lets compare k/L and K/L. K/L is larger by a small value d:
+d := K/L - k/L = (L div B + 1) / L - L/B/L =
+= (L div B * B + B) / (L*B) - L/(L*B) =
+= (L div B * B + B - L) / (L*B)
+A*K/L is larger than A*k/L by A*d.
+A*k/L is closest to 'overflowing into next integer'
+when A = N*B-1. The 'deficit' with such A is exactly 1/B.
+If A*d >= 1/B, then A*K/L will 'overflow'.
+Thus bad_A >= 1/B / d = (1/B) * (L*B)/(L div B * B + B - L) =
+= L/(L div B * B + B - L). Then you need to 'walk up' to next
+A representable as N*B-1: bad_A2 = (bad_A div B) * B + B-1
+This is the first A which will have wrong result
+(i.e. for which (A*K div L) = (A div B)+1, not (A div B).