This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH] improved algorithm for gcc/expmed.c::choose_multiplier()
- From: Denis Vlasenko <vda dot linux at googlemail dot com>
- To: Roger Sayle <roger at eyesopen dot com>, gcc at gcc dot gnu dot org, tege at swox dot com
- Date: Thu, 3 Aug 2006 19:11:37 +0200
- Subject: Re: [PATCH] improved algorithm for gcc/expmed.c::choose_multiplier()
- References: <Pine.LNX.4.44.0608022214001.21492-100000@www.eyesopen.com>
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28417
On Thursday 03 August 2006 06:55, Roger Sayle wrote:
> As mentioned by Jim, the current GCC algorithm is based upon the
> paper by Torbjorn Granlund and Peter Montgomery, "Division by
> Invariant Integers using Multiplication", PLDI-94.
> http://swox.se/~tg/divcnst-pldi94.pdf
>
> Another excellent reference describing division by integer constant
> algorithms is Henry S. Warren Jr.'s "Hacker's Delight", Addision-Wesley
> publishers, 2003.
>
> The strange thing is that all three of GCC, the paper and the book
> agree on the a preferred multiplier, which differs from the one
> selected by your code. Following through both sets of correctness
> proofs is tedious, but it looks like Granlund's work attempts to
> find multipliers whose residual error is less than 1/n, which is
> sufficient by not necessary. Your code relies on round to zero
> semantics of truncating division, so that provided the residual is
> less than 1, we'll always truncate to the correct result.
Yes, it works exactly that way.
> One issue is that Grunlund's derivation starts with signed division
> and is then extended/tweaked to handle unsigned division. Your
> work on the other hand, starts its derivation with unsigned division.
In my pdf downloaded from http://swox.se/~tg/divcnst-pldi94.pdf
Grunlund starts with unsigned division also.
Actually, my code is not very different. Method of reducing K
is different, and also I borrowed from that paper the knowledge
that 33bit good K always exists, so I don't canculate it
at all. I calculate best possible 32-bit K, if even that is
bad, then known-good 33bit one is K*2-1.
I can explain why it = K*2-1 if needed.
Oh, and notation is somewhat different, I used A/B before
I saw the paper, and paper uses d instead of B, etc...
> So currently I'm trying to find the catch. For which divisions
> are 33-bit operations required. Your proof looks reasonable and
> the code does the correct thing for the divisions in your examples,
> though the coding style issues prevent it being used in the compiler
> without being cleaned up.
Style is fixed now, new attachment is added to the bug page.
> There's also a return clause where it gives
> up after failing to find a suitable multiplier, where falling back
> to the exisiting code might be a better compromise. I've not played
> with your algorithm enough to determine how often it triggers.
I don't see where does it do this. The code is below my sig,
please reply and quote relevant part.
> To be honest all this discrete math makes my head hurt. Indeed,
> when its appreciated that its not the smallest multiplier that we'd
> prefer, but the cheapest (using shifts/adds etc...) and that
> choose_multiplier is potentially intertwined with synth_mult, the
> world starts spinning and I need to reach for the headache tablets.
If "goodness" of given multiplier K can be measured somehow,
(say, we have mul_cost()), you need to add it here in my code:
K++;
+ min_cost = mul_cost(K); best_K = K; best_lgdn = lgdn;
...
...
/* K is good, but maybe we can reduce it? */
while(1)
{
last_K = K;
if (--lgdn < 0)
break;
if (!high_A && bad_A <= max_A)
break;
+ cost = mul_cost(K);
+ if(min_cost >= cost) min_cost = cost, best_K = K, best_lgdn = lgdn;
}
/* Report previous parameters because we current ones are bad */
- *multiplier_ptr = last_K;
- *shift_ptr = lgdn+1;
+ *multiplier_ptr = best_K;
+ *shift_ptr = best_lgdn;
return 0; /* 32-bit or less */
Something like it.
BTW, the most improvement will be caused by propagating data
about possible range of dividend into choose_multiplier().
Code will be able to pick smaller Ks.
Right now, it assumes max_A = 0xffffffff or 0x80000000.
> Roger
> --
--
vda
int gcc_fastdiv_params(unsigned HOST_WIDE_INT B, unsigned HOST_WIDE_INT max_A,
unsigned HOST_WIDE_INT *multiplier_ptr, int *shift_ptr)
{
unsigned HOST_WIDE_INT K, last_K, d_LB, bad_A, udummy;
HOST_WIDE_INT high_A, dummy;
int lgdn;
/* L/B should fit into uint, so max L is floor_log2(B) << HOST_BITS_PER_WIDE_INT
** L is impicitly = 1 << (lgdn+HOST_BITS_PER_WIDE_INT) */
lgdn = floor_log2(B);
/* K = L/B + 1 */
div_and_round_double(TRUNC_DIV_EXPR, 1, 0,1<<lgdn, B,(HOST_WIDE_INT)0,
&K,&dummy, &udummy,&dummy);
K++;
/* d_LB = ((L/B) * B + B - L) */
d_LB = K * B; /* since (HOST_WIDE_INT)L == 0, subtracting it can be optimized out */
/* bad_A = L / d_LB */
div_and_round_double(TRUNC_DIV_EXPR, 1, 0,1<<lgdn, d_LB,(HOST_WIDE_INT)0,
&bad_A,&high_A, &udummy,&dummy);
bad_A = (bad_A / B) * B; /* ... + B-1 later */
if(high_A || bad_A > max_HOST_WIDE_INT - (B-1) )
high_A = 1; /* we aren't interested much in true value if high_A!=0 */
else
bad_A += B-1;
if (!high_A && bad_A <= max_A)
{
/* K doesn't work for all required A's.
** However, there always exist
** (potentially HOST_BITS_PER_WIDE_INT+1 bit) K which works
** for all A's and it equals K*2-1.
** We return lower HOST_BITS_PER_WIDE_INT bits in *multiplier_ptr. */
*multiplier_ptr = K*2-1;
*shift_ptr = lgdn+1;
/* if K > 0x800..000 then K*2-1 overflows into 32th bit,
** and we need to return 33-bit value */
return K > (1<<(HOST_BITS_PER_WIDE_INT-1));
}
/* There is not much point in multiplying by even number
** and then shifting right. Reduce K & L to avoid it: */
while (!(K & 1) && lgdn)
lgdn--, K >>= 1;
/* K is good, but maybe we can reduce it? */
while(1)
{
last_K = K;
if (--lgdn < 0)
break;
/* K = L/B + 1 */
div_and_round_double(TRUNC_DIV_EXPR, 1, 0,1<<lgdn, B,(HOST_WIDE_INT)0,
&K,&dummy, &udummy,&dummy);
K++;
/* d_LB = ((L/B) * B + B - L) */
d_LB = K * B; /* since (HOST_WIDE_INT)L == 0, subtracting it can be optimized out */
/* bad_A = L / d_LB */
div_and_round_double(TRUNC_DIV_EXPR, 1, 0,1<<lgdn, d_LB,(HOST_WIDE_INT)0,
&bad_A,&high_A, &udummy,&dummy);
bad_A = (bad_A / B) * B;
if(high_A || bad_A > max_HOST_WIDE_INT - (B-1) )
high_A = 1; /* we aren't interested much in true value if high_A!=0 */
else
bad_A += B-1;
if (!high_A && bad_A <= max_A)
break;
}
/* Report previous parameters because we current ones are bad */
*multiplier_ptr = last_K;
*shift_ptr = lgdn+1;
return 0; /* 32-bit or less */
}