This is the mail archive of the
`gcc-patches@gcc.gnu.org`
mailing list for the GCC project.

Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|

Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |

Other format: | [Raw text] |

*From*: Roger Sayle <roger at eyesopen dot com>*To*: gcc-patches at gcc dot gnu dot org*Cc*: Jim Wilson <wilson at specifixinc dot com>*Date*: Wed, 23 Jun 2004 20:51:26 -0600 (MDT)*Subject*: [PATCH] PR middle-end/15239: Artificial limit in expand_mult (take2)

The following patch is a resubmission of my patch to remove the limit on maximum synthetic multiply cost from expand_mult, resolving PR 15239. When I last tried proposing this patch, Jim Wilson insightfully stated the "patch is clearly useful" on condition that (i) the TARGET_RTX_COSTS documentaion was updated to mention its dependence upon optimize_size and (ii) that I demonstrate that it actually does something useful. Part (i) was relatively easy, but I'll admit that I was surprised myself that at the time my testing of (ii) showed no difference at all on the P4, where the cost of multiplication is 15 times that of an addition, so synth_mult should theoretically be able to generate beneficial code sequences that cost more than expand_mult's artificial 12 addition limit. Further investigation revealed the problem was elsewhere in the i386 backend, where shift-add instructions were incorrectly parameterized with costs over five times larger than they really had. Now that this has been fixed, I'm resubmitting this original patch, now that it can be demonstrated to be of benefit. The first co-efficient that demonstrates a difference in middle-end's synth_mult is for multiplications by 173. At present, mainline CVS currently implements this using a single "imul" instruction, having an rtx_cost of 15: mul173: ; rtx_cost imull $173, 4(%esp), %eax ; 15 cycles ret With the patch below, we now generate the following sequence mul173: ; rtx_cost movl 4(%esp), %edx ; dx = x leal (%edx,%edx,2), %ecx ; cx = 2*dx+dx = 3x 3 cycles leal 0(,%ecx,8), %eax ; ax = 8*cx = 24x 3 cycles subl %ecx, %eax ; ax = ax-cx = 21x 1 cycle leal (%edx,%eax,2), %eax ; ax = 2*ax+dx = 43x 3 cycles leal (%edx,%eax,4), %eax ; ax = 4*ax+dx = 173x 3 cycles ret Total cycle time for this multiplication is 13 cycles which is above the current 12 cycle limit, yet still faster than the 15 cycle "imul". Interestingly, the Intel compiler when generating code for the Pentium4 also avoids using the architecture's "imul" instruction and generates: mul173: ; rtx_cost mov edx, DWORD PTR [esp+4] ; dx = x lea ecx, DWORD PTR [edx+edx] ; cx = dx+dx = 2x 1 cycle add ecx, ecx ; cx = cx+cx = 4x 1 cycle sub ecx, edx ; cx = cx-dx = 3x 1 cycle add ecx, ecx ; cx = cx+cx = 6x 1 cycle add ecx, ecx ; cx = cx+cx = 12x 1 cycle sub ecx, edx ; cx = cx-dx = 11x 1 cycle add ecx, ecx ; cx = cx+cx = 22x 1 cycle add ecx, ecx ; cx = cx+cx = 44x 1 cycle sub ecx, edx ; cx = cx-dx = 43x 1 cycle lea eax, DWORD PTR [edx+ecx*4] ; = 173x 3 cycles ret For a total of COSTS_N_INSNS(12), which although slightly better than GCC's code with this patch, would still be rejected by the current 12 addition limit. Hopefully, this is sufficient to demonstrate that on atleast one platform the middle-end shouldn't artificially limit the cost of a synthetic multiplication sequence, which is a task best left to the backend. The following patch has been retested on i686-pc-linux-gnu with a full "make bootstrap", all default languages, and regression tested with a top-level "make -k check" with no new failures. Ok for mainline? 2004-06-23 Roger Sayle <roger@eyesopen.com> PR middle-end/15239 * expmed.c (expand_mult): Remove artificial restriction on the maximum cost of a synthetic multiplication sequence. Index: expmed.c =================================================================== RCS file: /cvs/gcc/gcc/gcc/expmed.c,v retrieving revision 1.168 diff -c -3 -p -r1.168 expmed.c *** expmed.c 17 Jun 2004 17:03:11 -0000 1.168 --- expmed.c 20 Jun 2004 23:28:46 -0000 *************** expand_mult (enum machine_mode mode, rtx *** 2664,2670 **** && (unsignedp || !flag_trapv)) { int mult_cost = rtx_cost (gen_rtx_MULT (mode, op0, op1), SET); - mult_cost = MIN (12 * add_cost[mode], mult_cost); if (choose_mult_variant (mode, INTVAL (const_op1), &algorithm, &variant, mult_cost)) --- 2664,2669 ---- Roger --

**Follow-Ups**:**Re: [PATCH] PR middle-end/15239: Artificial limit in expand_mult (take 2)***From:*Richard Henderson

Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|

Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |