This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: [PATCH] improved algorithm for gcc/expmed.c::choose_multiplier()

From: Denis Vlasenko <vda dot linux at googlemail dot com>
To: Jim Wilson <wilson at specifix dot com>
Cc: gcc at gcc dot gnu dot org
Date: Wed, 2 Aug 2006 22:12:38 +0200
Subject: Re: [PATCH] improved algorithm for gcc/expmed.c::choose_multiplier()
References: <200607301514.12612.vda.linux@googlemail.com> <44CE856E.3060305@specifix.com>

On Tuesday 01 August 2006 00:34, Jim Wilson wrote:
> Denis Vlasenko wrote:
> > I still cannot figure out what precision is, so I restricted new code to
> > (n == HOST_BITS_PER_WIDE_INT && precision == HOST_BITS_PER_WIDE_INT) case.
> > Need help here.
> 
> At the moment, there is probably no one who understands this code as 
> well as you do, so you may not get much help from others.
> 
> I looked at this a little, and I think precision is the number of 
> significant bits in the divisor.  Note that unsigned divides use N for 
> precision, but signed divides use N-1.  Also, in the pre_shift case, 
> where we shift the divisors right if they have zeros in the low bits, we 
> subtract the shift count from the precision.

Yes.

> This probably has some effect on how many bits we can safely use for the 
> resulting inverse multiplier.
> 
> This code is based on a paper writte by Torbjorn Granlund and Peter 
> Montgomery.  This was published in the 1994 SIGPLAN PLDI conference 
> proceedings.  You should read this paper if you haven't already done so. 
>   There may be clues in there about how the gcc algorithm works.  The 
> gcc code was written by one of the co-authors, Torbjorn Granlund.

I read it.

I uploaded newer patch to http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28417.
My code basically differs in

1) I never do 33-bit math. I do 32-bit and if the best 32-bit value of K
   for multiplier still doesn't work (there exist such 32-bit A
   that A*K/L != A/B) then I return K*2-1 (which is a 33-bit value)
   _without_ ever checking that it works (because it works always
   because of math stuff (read the paper)). Old code calculates
   33-bit K (it calls it "m", not K) and needs to jumt thru hoops
   due to 33-bitness.

2) I do not do mhigh/mlow stuff. This is the core of my improvement.
   I directly calculate the smallest bad_A for which bad_A*K/L != A/B.
   For some values of B, it turns out that bad_A is > 2^32-1 even though
   mhigh/mlow method says that K is maybe bad (i.e. my method
   proves that this K is actually okay).
   It also allows for finding better K values if we know that A is
   bounded (from value range analysis), but current patch doesn't
   use it yet.

My algorithm's comment (taken from patch):

+/*
+[below: 'div' is unsigned integer division]
+['/' is real division with infinite precision]
+[A,B,C... - integers, a,b,c... - reals]
+
+Theory: we want to calculate A div B, fast.
+B is const > 2 and is not a power of 2.
+
+In fp: A/B = A*(L/B)/L. (If L is a large power of 2,
+say 2^32, then it could be done really fast).
+Let k := L/B, K := L div B + 1.
+Then A/B = A*k/L.
+
+Then this is true:
+
+A div B <= A * K div L.
+
+For small enough A: A div B = A * K div L.
+Lets find first A for which it is not true.
+
+Lets compare k/L and K/L. K/L is larger by a small value d:
+
+d := K/L - k/L = (L div B + 1) / L - L/B/L =
+= (L div B * B + B) / (L*B) - L/(L*B) =
+= (L div B * B + B - L) / (L*B)
+
+A*K/L is larger than A*k/L by A*d.
+
+A*k/L is closest to 'overflowing into next integer'
+when A = N*B-1. The 'deficit' with such A is exactly 1/B.
+If A*d >= 1/B, then A*K/L will 'overflow'.
+
+Thus bad_A >= 1/B / d = (1/B) * (L*B)/(L div B * B + B - L) =
+= L/(L div B * B + B - L). Then you need to 'walk up' to next
+A representable as N*B-1: bad_A2 = (bad_A div B) * B + B-1
+This is the first A which will have wrong result
+(i.e. for which (A*K div L) = (A div B)+1, not (A div B).

--
vda

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]