This is the mail archive of the
mailing list for the GCC project.
[PATCH] PR middle-end/15239: Artificial limit in expand_mult (take2)
- From: Roger Sayle <roger at eyesopen dot com>
- To: gcc-patches at gcc dot gnu dot org
- Cc: Jim Wilson <wilson at specifixinc dot com>
- Date: Wed, 23 Jun 2004 20:51:26 -0600 (MDT)
- Subject: [PATCH] PR middle-end/15239: Artificial limit in expand_mult (take2)
The following patch is a resubmission of my patch to remove the limit
on maximum synthetic multiply cost from expand_mult, resolving PR 15239.
When I last tried proposing this patch, Jim Wilson insightfully stated
the "patch is clearly useful" on condition that (i) the TARGET_RTX_COSTS
documentaion was updated to mention its dependence upon optimize_size
and (ii) that I demonstrate that it actually does something useful.
Part (i) was relatively easy, but I'll admit that I was surprised myself
that at the time my testing of (ii) showed no difference at all on the
P4, where the cost of multiplication is 15 times that of an addition, so
synth_mult should theoretically be able to generate beneficial code
sequences that cost more than expand_mult's artificial 12 addition limit.
Further investigation revealed the problem was elsewhere in the i386
backend, where shift-add instructions were incorrectly parameterized
with costs over five times larger than they really had. Now that this
has been fixed, I'm resubmitting this original patch, now that it can
be demonstrated to be of benefit.
The first co-efficient that demonstrates a difference in middle-end's
synth_mult is for multiplications by 173. At present, mainline CVS
currently implements this using a single "imul" instruction, having an
rtx_cost of 15:
mul173: ; rtx_cost
imull $173, 4(%esp), %eax ; 15 cycles
With the patch below, we now generate the following sequence
mul173: ; rtx_cost
movl 4(%esp), %edx ; dx = x
leal (%edx,%edx,2), %ecx ; cx = 2*dx+dx = 3x 3 cycles
leal 0(,%ecx,8), %eax ; ax = 8*cx = 24x 3 cycles
subl %ecx, %eax ; ax = ax-cx = 21x 1 cycle
leal (%edx,%eax,2), %eax ; ax = 2*ax+dx = 43x 3 cycles
leal (%edx,%eax,4), %eax ; ax = 4*ax+dx = 173x 3 cycles
Total cycle time for this multiplication is 13 cycles which is above
the current 12 cycle limit, yet still faster than the 15 cycle "imul".
Interestingly, the Intel compiler when generating code for the Pentium4
also avoids using the architecture's "imul" instruction and generates:
mul173: ; rtx_cost
mov edx, DWORD PTR [esp+4] ; dx = x
lea ecx, DWORD PTR [edx+edx] ; cx = dx+dx = 2x 1 cycle
add ecx, ecx ; cx = cx+cx = 4x 1 cycle
sub ecx, edx ; cx = cx-dx = 3x 1 cycle
add ecx, ecx ; cx = cx+cx = 6x 1 cycle
add ecx, ecx ; cx = cx+cx = 12x 1 cycle
sub ecx, edx ; cx = cx-dx = 11x 1 cycle
add ecx, ecx ; cx = cx+cx = 22x 1 cycle
add ecx, ecx ; cx = cx+cx = 44x 1 cycle
sub ecx, edx ; cx = cx-dx = 43x 1 cycle
lea eax, DWORD PTR [edx+ecx*4] ; = 173x 3 cycles
For a total of COSTS_N_INSNS(12), which although slightly better than
GCC's code with this patch, would still be rejected by the current 12
Hopefully, this is sufficient to demonstrate that on atleast one platform
the middle-end shouldn't artificially limit the cost of a synthetic
multiplication sequence, which is a task best left to the backend.
The following patch has been retested on i686-pc-linux-gnu with a full
"make bootstrap", all default languages, and regression tested with a
top-level "make -k check" with no new failures.
Ok for mainline?
2004-06-23 Roger Sayle <firstname.lastname@example.org>
* expmed.c (expand_mult): Remove artificial restriction on the
maximum cost of a synthetic multiplication sequence.
RCS file: /cvs/gcc/gcc/gcc/expmed.c,v
retrieving revision 1.168
diff -c -3 -p -r1.168 expmed.c
*** expmed.c 17 Jun 2004 17:03:11 -0000 1.168
--- expmed.c 20 Jun 2004 23:28:46 -0000
*************** expand_mult (enum machine_mode mode, rtx
*** 2664,2670 ****
&& (unsignedp || !flag_trapv))
int mult_cost = rtx_cost (gen_rtx_MULT (mode, op0, op1), SET);
- mult_cost = MIN (12 * add_cost[mode], mult_cost);
if (choose_mult_variant (mode, INTVAL (const_op1), &algorithm, &variant,
--- 2664,2669 ----