Bug 34522 - inefficient code for long long multiply when only low bits are needed
Summary: inefficient code for long long multiply when only low bits are needed
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: rtl-optimization (show other bugs)
Version: 4.3.0
: P3 normal
Target Milestone: ---
Assignee: Paolo Bonzini
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2007-12-18 16:49 UTC by Paolo Bonzini
Modified: 2019-12-19 15:51 UTC (History)
4 users (show)

See Also:
Host:
Target:
Build:
Known to work: 4.4.0
Known to fail: 4.0.0, 4.3.0
Last reconfirmed: 2008-03-12 09:00:32


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Paolo Bonzini 2007-12-18 16:49:44 UTC
For

int test(long long a, long long b)
{
        return a * b;
}

GCC generates a widening multiply, and cannot remove the DImode operations until after register allocation.  This causes unnecessary splits.

This could be fixed on the tree level by folding to (int)a * (int)b, or alternatively in expand.

expand_expr is called with

 <mult_expr 0x2aaaae9032c0
    type <integer_type 0x2aaaae937840 long long int DI>
    arg 0 <parm_decl 0x2aaaae92d2d0 b type <integer_type 0x2aaaae937840 long
long int>>
    arg 1 <parm_decl 0x2aaaae92d240 a type <integer_type 0x2aaaae937840 long
long int>>>

and tmode SImode, still enough info to choose a better multiply.  However, tmode is not passed on to expand_mult.
Comment 1 Paolo Bonzini 2007-12-18 16:58:58 UTC
Prototype untested patch.  Produces

        movl    12(%esp), %eax
        imull   4(%esp), %eax
        ret

on the testcase.

Index: expr.c
===================================================================
--- expr.c      (revision 130928)
+++ expr.c      (working copy)
@@ -8642,7 +8642,8 @@ expand_expr_real_1 (tree exp, rtx target
        }
       expand_operands (TREE_OPERAND (exp, 0), TREE_OPERAND (exp, 1),
                       subtarget, &op0, &op1, 0);
-      return REDUCE_BIT_FIELD (expand_mult (mode, op0, op1, target, unsignedp));
+      return REDUCE_BIT_FIELD (expand_mult (tmode != VOIDmode ? tmode : mode,
+                                            op0, op1, target, unsignedp));
 
     case TRUNC_DIV_EXPR:
     case FLOOR_DIV_EXPR:
Comment 2 Jakub Jelinek 2007-12-19 09:08:34 UTC
Shouldn't tmode be only used if GET_MODE_CLASS (tmode) == MODE_INT
&& GET_MODE_CLASS (mode) == MODE_INT && GET_MODE_BITSIZE (tmode) < GET_MODE_BITSIZE (mode), to make sure we optimize only narrow, never widen, and
that float etc. multiplication is not affected?
Comment 3 Paolo Bonzini 2007-12-19 09:37:17 UTC
Makes a lot of sense.  I made the patch only to test that it would not crash, or something like that.
Comment 4 Uroš Bizjak 2008-02-21 15:41:38 UTC
(In reply to comment #1)
> Prototype untested patch.  Produces

Paolo, do you plan to test your patch and submit it to gcc-patches@ ?
Comment 5 Paolo Bonzini 2008-02-21 15:52:32 UTC
I want to clear 17236 first (and Jakub's tweaks are needed anyway).

Feel free to beat me to it.
Comment 6 Paolo Bonzini 2008-03-12 09:00:32 UTC
testing
Comment 7 Paolo Bonzini 2008-03-12 12:50:58 UTC
The expand patch does not bootstrap, even with the tweaks Jakub suggested.

I'm also hesitant to fold it on the tree level because it's actually undefined code unless -fwrapv.  For example if a = b = 65536LL,

 (int) a * (int) b = undefined
 (int) (a * b) = (int) (1LL << 32) = 0
Comment 8 Paolo Bonzini 2008-03-12 12:56:55 UTC
Hmm maybe I can go through an unsigned type.
Comment 9 Paolo Bonzini 2008-03-12 15:34:15 UTC
fixed in 4.4
Comment 10 Jonathan Wakely 2019-12-19 15:51:25 UTC
Fixed by r133144 and r133163 (with wrong PR number in the changelogs)