This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] improved algorithm for gcc/expmed.c::choose_multiplier()


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28417

On Thursday 03 August 2006 06:55, Roger Sayle wrote:
> As mentioned by Jim, the current GCC algorithm is based upon the
> paper by Torbjorn Granlund and Peter Montgomery, "Division by
> Invariant Integers using Multiplication", PLDI-94.
> http://swox.se/~tg/divcnst-pldi94.pdf
> 
> Another excellent reference describing division by integer constant
> algorithms is Henry S. Warren Jr.'s "Hacker's Delight", Addision-Wesley
> publishers, 2003.
> 
> The strange thing is that all three of GCC, the paper and the book
> agree on the a preferred multiplier, which differs from the one
> selected by your code.  Following through both sets of correctness
> proofs is tedious, but it looks like Granlund's work attempts to
> find multipliers whose residual error is less than 1/n, which is
> sufficient by not necessary.  Your code relies on round to zero
> semantics of truncating division, so that provided the residual is
> less than 1, we'll always truncate to the correct result.

Yes, it works exactly that way.

> One issue is that Grunlund's derivation starts with signed division
> and is then extended/tweaked to handle unsigned division.  Your
> work on the other hand, starts its derivation with unsigned division.

In my pdf downloaded from http://swox.se/~tg/divcnst-pldi94.pdf
Grunlund starts with unsigned division also.

Actually, my code is not very different. Method of reducing K
is different, and also I borrowed from that paper the knowledge
that 33bit good K always exists, so I don't canculate it
at all. I calculate best possible 32-bit K, if even that is
bad, then known-good 33bit one is K*2-1.

I can explain why it = K*2-1 if needed.

Oh, and notation is somewhat different, I used A/B before
I saw the paper, and paper uses d instead of B, etc...

> So currently I'm trying to find the catch.  For which divisions
> are 33-bit operations required.  Your proof looks reasonable and
> the code does the correct thing for the divisions in your examples,
> though the coding style issues prevent it being used in the compiler
> without being cleaned up.

Style is fixed now, new attachment is added to the bug page.

> There's also a return clause where it gives 
> up after failing to find a suitable multiplier, where falling back
> to the exisiting code might be a better compromise.  I've not played
> with your algorithm enough to determine how often it triggers.

I don't see where does it do this. The code is below my sig,
please reply and quote relevant part.

> To be honest all this discrete math makes my head hurt.  Indeed,
> when its appreciated that its not the smallest multiplier that we'd
> prefer, but the cheapest (using shifts/adds etc...) and that
> choose_multiplier is potentially intertwined with synth_mult, the
> world starts spinning and I need to reach for the headache tablets.

If "goodness" of given multiplier K can be measured somehow,
(say, we have mul_cost()), you need to add it here in my code:

  K++;
+ min_cost = mul_cost(K); best_K = K; best_lgdn = lgdn;
  ...
  ...
  /* K is good, but maybe we can reduce it? */
  while(1)
    {
      last_K = K;
      if (--lgdn < 0)
        break;

      if (!high_A && bad_A <= max_A)
        break;
+     cost = mul_cost(K);
+     if(min_cost >= cost) min_cost = cost, best_K = K, best_lgdn = lgdn;
    }
  /* Report previous parameters because we current ones are bad */
- *multiplier_ptr = last_K;
- *shift_ptr = lgdn+1;
+ *multiplier_ptr = best_K;
+ *shift_ptr = best_lgdn;
  return 0; /* 32-bit or less */

Something like it.

BTW, the most improvement will be caused by propagating data
about possible range of dividend into choose_multiplier().
Code will be able to pick smaller Ks.
Right now, it assumes max_A = 0xffffffff or 0x80000000.

> Roger
> --
--
vda

int gcc_fastdiv_params(unsigned HOST_WIDE_INT B, unsigned HOST_WIDE_INT max_A,
		unsigned HOST_WIDE_INT *multiplier_ptr, int *shift_ptr)
{
  unsigned HOST_WIDE_INT K, last_K, d_LB, bad_A, udummy;
  HOST_WIDE_INT high_A, dummy;
  int lgdn;

  /* L/B should fit into uint, so max L is floor_log2(B) << HOST_BITS_PER_WIDE_INT
  ** L is impicitly = 1 << (lgdn+HOST_BITS_PER_WIDE_INT) */
  lgdn = floor_log2(B);

  /* K = L/B + 1 */
  div_and_round_double(TRUNC_DIV_EXPR, 1, 0,1<<lgdn, B,(HOST_WIDE_INT)0,
			&K,&dummy, &udummy,&dummy);
  K++;

  /* d_LB = ((L/B) * B + B - L) */
  d_LB = K * B; /* since (HOST_WIDE_INT)L == 0, subtracting it can be optimized out */
  /* bad_A = L / d_LB */
  div_and_round_double(TRUNC_DIV_EXPR, 1, 0,1<<lgdn, d_LB,(HOST_WIDE_INT)0,
			&bad_A,&high_A, &udummy,&dummy);

  bad_A = (bad_A / B) * B; /* ... + B-1 later */
  if(high_A || bad_A > max_HOST_WIDE_INT - (B-1) )
    high_A = 1; /* we aren't interested much in true value if high_A!=0 */
  else
    bad_A += B-1;

  if (!high_A && bad_A <= max_A)
    {
      /* K doesn't work for all required A's.
      ** However, there always exist
      ** (potentially HOST_BITS_PER_WIDE_INT+1 bit) K which works
      ** for all A's and it equals K*2-1.
      ** We return lower HOST_BITS_PER_WIDE_INT bits in *multiplier_ptr. */
      *multiplier_ptr = K*2-1;
      *shift_ptr = lgdn+1;
      /* if K > 0x800..000 then K*2-1 overflows into 32th bit,
      ** and we need to return 33-bit value */
      return K > (1<<(HOST_BITS_PER_WIDE_INT-1));
    }

  /* There is not much point in multiplying by even number
  ** and then shifting right. Reduce K & L to avoid it: */
  while (!(K & 1) && lgdn)
    lgdn--, K >>= 1;

  /* K is good, but maybe we can reduce it? */
  while(1)
    {
      last_K = K;
      if (--lgdn < 0)
        break;
      /* K = L/B + 1 */
      div_and_round_double(TRUNC_DIV_EXPR, 1, 0,1<<lgdn, B,(HOST_WIDE_INT)0,
			&K,&dummy, &udummy,&dummy);
      K++;
      /* d_LB = ((L/B) * B + B - L) */
      d_LB = K * B; /* since (HOST_WIDE_INT)L == 0, subtracting it can be optimized out */
      /* bad_A = L / d_LB */
      div_and_round_double(TRUNC_DIV_EXPR, 1, 0,1<<lgdn, d_LB,(HOST_WIDE_INT)0,
			&bad_A,&high_A, &udummy,&dummy);
      bad_A = (bad_A / B) * B;
      if(high_A || bad_A > max_HOST_WIDE_INT - (B-1) )
        high_A = 1; /* we aren't interested much in true value if high_A!=0 */
      else
        bad_A += B-1;
      if (!high_A && bad_A <= max_A)
        break;
    }
  /* Report previous parameters because we current ones are bad */
  *multiplier_ptr = last_K;
  *shift_ptr = lgdn+1;
  return 0; /* 32-bit or less */
}


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]