[patch] expmed.c: Improve multiplication by more constants.
Kazu Hirata
kazu@codesourcery.com
Mon Apr 13 17:20:00 GMT 2009
Hi,
Attached is a patch to improve synth_mult.
Specifically, this patch improves multiplication by constants of the
form
-m * (2^n - 1) for some positive integers m and n.
Without this patch, arm-none-eabi-gcc -mcpu=cortex-a8 produces a
multiplication by -28 like so:
<multminus28>:
mvn r3, #27 ; r3 = -28
mul r0, r3, r0 ; r0 = r3 * r0
bx lr
I've inserted comments after ';' for readability. With this patch,
you get:
<multminus28>:
sub r0, r0, r0, lsl #3 ; r0 = r0 - (r0 << 3)
lsl r0, r0, #2 ; r0 = r0 << 2
bx lr
When synth_mult is called with T == -28 and mode == SImode, it tries
a shift by 2 bits because T is divisible by 4.  Without this patch,
synth_mult always performs an unsigned (logical) shift on T, so the
recursive call receives T as:
0x3ffffff9
rather than
0xfffffff9
A multiplication by 0x3ffffff9 is hard to synthesize, but 0xfffffff9,
which is -7, is doable. Specifically, we can do:
sub r0, r0, r0, lsl #3
on ARM.
The patch fixes the problem by also trying the result of a signed
(arithmetic) shift on T.  For example, when T == -28 and mode ==
SImode are given, we pass:
0xfffffff9 (= 0xffffffe4 >> 2, shifted arithmetically)
to the recursively called synth_mult, which gives you:
A * -7 = A - A * 8
if costs permit.
Tested on x86_64-pc-linux-gnu and arm-none-eabi on top of the previous
synth_mult patch. OK to apply?
Kazu Hirata
2009-04-13 Kazu Hirata <kazu@codesourcery.com>
* expmed.c (synth_mult): When trying out a shift, pass the result
of a signed shift.
Index: gcc/expmed.c
===================================================================
--- gcc/expmed.c (revision 145996)
+++ gcc/expmed.c (working copy)
@@ -2542,6 +2542,38 @@ synth_mult (struct algorithm *alg_out, u
best_alg->log[best_alg->ops] = m;
best_alg->op[best_alg->ops] = alg_shift;
}
+
+ /* See if treating ORIG_T as a signed number yields a better
+ sequence. Try this sequence only for a negative ORIG_T
+ as it would be useless for a non-negative ORIG_T. */
+ if ((HOST_WIDE_INT) orig_t < 0)
+ {
+ /* Shift ORIG_T as follows because a right shift of a
+ negative-valued signed type is implementation
+ defined. */
+ q = ~(~orig_t >> m);
+ /* The function expand_shift will choose between a shift
+ and a sequence of additions, so the observed cost is
+ given as MIN (m * add_cost[speed][mode],
+ shift_cost[speed][mode][m]). */
+ op_cost = m * add_cost[speed][mode];
+ if (shift_cost[speed][mode][m] < op_cost)
+ op_cost = shift_cost[speed][mode][m];
+ new_limit.cost = best_cost.cost - op_cost;
+ new_limit.latency = best_cost.latency - op_cost;
+ synth_mult (alg_in, q, &new_limit, mode);
+
+ alg_in->cost.cost += op_cost;
+ alg_in->cost.latency += op_cost;
+ if (CHEAPER_MULT_COST (&alg_in->cost, &best_cost))
+ {
+ struct algorithm *x;
+ best_cost = alg_in->cost;
+ x = alg_in, alg_in = best_alg, best_alg = x;
+ best_alg->log[best_alg->ops] = m;
+ best_alg->op[best_alg->ops] = alg_shift;
+ }
+ }
}
if (cache_hit)
goto done;