[PATCH] PR middle-end/15239: Artificial limit in expand_mult (take 2)

Thu Jun 24 06:53:00 GMT 2004

The following patch is a resubmission of my patch to remove the limit
on maximum synthetic multiply cost from expand_mult, resolving PR 15239.

When I last tried proposing this patch, Jim Wilson insightfully stated
the "patch is clearly useful" on condition that (i) the TARGET_RTX_COSTS
documentaion was updated to mention its dependence upon optimize_size
and (ii) that I demonstrate that it actually does something useful.

Part (i) was relatively easy, but I'll admit that I was surprised myself
that at the time my testing of (ii) showed no difference at all on the
P4, where the cost of multiplication is 15 times that of an addition, so
synth_mult should theoretically be able to generate beneficial code
sequences that cost more than expand_mult's artificial 12 addition limit.

Further investigation revealed the problem was elsewhere in the i386
backend, where shift-add instructions were incorrectly parameterized
with costs over five times larger than they really had.  Now that this
has been fixed, I'm resubmitting this original patch, now that it can
be demonstrated to be of benefit.

The first co-efficient that demonstrates a difference in middle-end's
synth_mult is for multiplications by 173.  At present, mainline CVS
currently implements this using a single "imul" instruction, having an
rtx_cost of 15:

mul173:                               ;                      rtx_cost
        imull   $173, 4(%esp), %eax   ;                      15 cycles
        ret

With the patch below, we now generate the following sequence

mul173:                               ;                      rtx_cost
        movl    4(%esp), %edx         ; dx = x
        leal    (%edx,%edx,2), %ecx   ; cx = 2*dx+dx = 3x    3 cycles
        leal    0(,%ecx,8), %eax      ; ax = 8*cx    = 24x   3 cycles
        subl    %ecx, %eax            ; ax = ax-cx   = 21x   1 cycle
        leal    (%edx,%eax,2), %eax   ; ax = 2*ax+dx = 43x   3 cycles
        leal    (%edx,%eax,4), %eax   ; ax = 4*ax+dx = 173x  3 cycles
        ret

Total cycle time for this multiplication is 13 cycles which is above
the current 12 cycle limit, yet still faster than the 15 cycle "imul".

Interestingly, the Intel compiler when generating code for the Pentium4
also avoids using the architecture's "imul" instruction and generates:

mul173:                               ;                        rtx_cost
        mov       edx, DWORD PTR [esp+4]    ; dx = x
        lea       ecx, DWORD PTR [edx+edx]  ; cx = dx+dx = 2x  1 cycle
        add       ecx, ecx            ; cx = cx+cx   = 4x      1 cycle
        sub       ecx, edx            ; cx = cx-dx   = 3x      1 cycle
        add       ecx, ecx            ; cx = cx+cx   = 6x      1 cycle
        add       ecx, ecx            ; cx = cx+cx   = 12x     1 cycle
        sub       ecx, edx            ; cx = cx-dx   = 11x     1 cycle
        add       ecx, ecx            ; cx = cx+cx   = 22x     1 cycle
        add       ecx, ecx            ; cx = cx+cx   = 44x     1 cycle
        sub       ecx, edx            ; cx = cx-dx   = 43x     1 cycle
        lea       eax, DWORD PTR [edx+ecx*4] ;       = 173x    3 cycles
        ret

For a total of COSTS_N_INSNS(12), which although slightly better than
GCC's code with this patch, would still be rejected by the current 12
addition limit.

Hopefully, this is sufficient to demonstrate that on atleast one platform
the middle-end shouldn't artificially limit the cost of a synthetic
multiplication sequence, which is a task best left to the backend.

The following patch has been retested on i686-pc-linux-gnu with a full
"make bootstrap", all default languages, and regression tested with a
top-level "make -k check" with no new failures.

Ok for mainline?

2004-06-23  Roger Sayle  <roger@eyesopen.com>

	PR middle-end/15239
	* expmed.c (expand_mult): Remove artificial restriction on the
	maximum cost of a synthetic multiplication sequence.

Index: expmed.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/expmed.c,v
retrieving revision 1.168
diff -c -3 -p -r1.168 expmed.c
*** expmed.c	17 Jun 2004 17:03:11 -0000	1.168
--- expmed.c	20 Jun 2004 23:28:46 -0000
*************** expand_mult (enum machine_mode mode, rtx
*** 2664,2670 ****
        && (unsignedp || !flag_trapv))
      {
        int mult_cost = rtx_cost (gen_rtx_MULT (mode, op0, op1), SET);
-       mult_cost = MIN (12 * add_cost[mode], mult_cost);

        if (choose_mult_variant (mode, INTVAL (const_op1), &algorithm, &variant,
  			       mult_cost))
--- 2664,2669 ----

Roger
--