User account creation filtered due to spam.

Bug 34522

Summary: inefficient code for long long multiply when only low bits are needed
Product: gcc Reporter: Paolo Bonzini <bonzini>
Component: rtl-optimizationAssignee: Paolo Bonzini <bonzini>
Severity: normal CC: fang, gcc-bugs, jakub, ubizjak
Priority: P3 Keywords: missed-optimization
Version: 4.3.0   
Target Milestone: ---   
Host: Target:
Build: Known to work: 4.4.0
Known to fail: 4.0.0 4.3.0 Last reconfirmed: 2008-03-12 09:00:32

Description Paolo Bonzini 2007-12-18 16:49:44 UTC

int test(long long a, long long b)
        return a * b;

GCC generates a widening multiply, and cannot remove the DImode operations until after register allocation.  This causes unnecessary splits.

This could be fixed on the tree level by folding to (int)a * (int)b, or alternatively in expand.

expand_expr is called with

 <mult_expr 0x2aaaae9032c0
    type <integer_type 0x2aaaae937840 long long int DI>
    arg 0 <parm_decl 0x2aaaae92d2d0 b type <integer_type 0x2aaaae937840 long
long int>>
    arg 1 <parm_decl 0x2aaaae92d240 a type <integer_type 0x2aaaae937840 long
long int>>>

and tmode SImode, still enough info to choose a better multiply.  However, tmode is not passed on to expand_mult.
Comment 1 Paolo Bonzini 2007-12-18 16:58:58 UTC
Prototype untested patch.  Produces

        movl    12(%esp), %eax
        imull   4(%esp), %eax

on the testcase.

Index: expr.c
--- expr.c      (revision 130928)
+++ expr.c      (working copy)
@@ -8642,7 +8642,8 @@ expand_expr_real_1 (tree exp, rtx target
       expand_operands (TREE_OPERAND (exp, 0), TREE_OPERAND (exp, 1),
                       subtarget, &op0, &op1, 0);
-      return REDUCE_BIT_FIELD (expand_mult (mode, op0, op1, target, unsignedp));
+      return REDUCE_BIT_FIELD (expand_mult (tmode != VOIDmode ? tmode : mode,
+                                            op0, op1, target, unsignedp));
     case TRUNC_DIV_EXPR:
     case FLOOR_DIV_EXPR:
Comment 2 Jakub Jelinek 2007-12-19 09:08:34 UTC
Shouldn't tmode be only used if GET_MODE_CLASS (tmode) == MODE_INT
&& GET_MODE_CLASS (mode) == MODE_INT && GET_MODE_BITSIZE (tmode) < GET_MODE_BITSIZE (mode), to make sure we optimize only narrow, never widen, and
that float etc. multiplication is not affected?
Comment 3 Paolo Bonzini 2007-12-19 09:37:17 UTC
Makes a lot of sense.  I made the patch only to test that it would not crash, or something like that.
Comment 4 UroŇ° Bizjak 2008-02-21 15:41:38 UTC
(In reply to comment #1)
> Prototype untested patch.  Produces

Paolo, do you plan to test your patch and submit it to gcc-patches@ ?
Comment 5 Paolo Bonzini 2008-02-21 15:52:32 UTC
I want to clear 17236 first (and Jakub's tweaks are needed anyway).

Feel free to beat me to it.
Comment 6 Paolo Bonzini 2008-03-12 09:00:32 UTC
Comment 7 Paolo Bonzini 2008-03-12 12:50:58 UTC
The expand patch does not bootstrap, even with the tweaks Jakub suggested.

I'm also hesitant to fold it on the tree level because it's actually undefined code unless -fwrapv.  For example if a = b = 65536LL,

 (int) a * (int) b = undefined
 (int) (a * b) = (int) (1LL << 32) = 0
Comment 8 Paolo Bonzini 2008-03-12 12:56:55 UTC
Hmm maybe I can go through an unsigned type.
Comment 9 Paolo Bonzini 2008-03-12 15:34:15 UTC
fixed in 4.4