[PATCH (3/7)] Widening multiply-and-accumulate pattern matching

Thu Jul 7 12:34:00 GMT 2011

On Thu, Jul 7, 2011 at 1:43 PM, Andrew Stubbs <andrew.stubbs@gmail.com> wrote:
> On 07/07/11 11:26, Andrew Stubbs wrote:
>>
>> On 07/07/11 10:58, Richard Guenther wrote:
>>>
>>> I think you should assume that series of widenings,
>>> (int)(short)char_variable
>>> are already combined.  Thus I believe you only need to consider a single
>>> conversion in valid_types_for_madd_p.
>>
>> Hmm, I'm not so sure. I'll look into it a bit further.
>
> OK, here's a test case that gives multiple conversions:
>
>  long long
>  foo (long long a, signed char b, signed char c)
>  {
>    int bc = b * c;
>    return a + (short)bc;
>  }
>
> The dump right before the widen_mult pass gives:
>
>  foo (long long int a, signed char b, signed char c)
>  {
>    int bc;
>    long long int D.2018;
>    short int D.2017;
>    long long int D.2016;
>    int D.2015;
>    int D.2014;
>
>  <bb 2>:
>    D.2014_2 = (int) b_1(D);
>    D.2015_4 = (int) c_3(D);
>    bc_5 = D.2014_2 * D.2015_4;
>    D.2017_6 = (short int) bc_5;

Ok, so you have a truncation that is a no-op value-wise.  I would
argue that this truncation should be removed independent on
whether we have a widening multiply instruction or not.

The technically most capable place to remove non-value-changing
truncations (and combine them with a successive conversion)
would be value-range propagation.  Which already knows:

Value ranges after VRP:

b_1(D): VARYING
D.2698_2: [-128, 127]
c_3(D): VARYING
D.2699_4: [-128, 127]
bc_5: [-16256, 16384]
D.2701_6: [-16256, 16384]
D.2702_7: [-16256, 16384]
a_8(D): VARYING
D.2700_9: VARYING

thus truncating bc_5 to short does not change the value.

The simplification could be made when looking at the
statement

>    D.2018_7 = (long long int) D.2017_6;

in vrp_fold_stmt, based on the fact that this conversion
converts from a value-preserving intermediate conversion.
Thus the transform would replace the D.2017_6 operand
with bc_5.

So yes, the case appears - but it shouldn't ;)

I'll cook up a quick patch for VRP.

Thanks,
Richard.

>    D.2016_9 = D.2018_7 + a_8(D);
>    return D.2016_9;
>
>  }
>
> Here we have a multiply and accumulate done the long way. The 8-bit inputs
> are widened to 32-bit, multiplied to give a 32-bit result (of which only the
> lower 16-bits contain meaningful data), then truncated to 16-bits, and
> sign-extended up to 64-bits ready for the 64-bit addition.
>
> This is slight contrived, perhaps, but not unlike the sort of thing that
> might occur when you have inline functions and macros, and most importantly
> - it is mathematically valid!
>
>
> So, here's the output from my patched widen_mult pass:
>
>  foo (long long int a, signed char b, signed char c)
>  {
>    int bc;
>    long long int D.2018;
>    short int D.2017;
>    long long int D.2016;
>    int D.2015;
>    int D.2014;
>
>  <bb 2>:
>    D.2014_2 = (int) b_1(D);
>    D.2015_4 = (int) c_3(D);
>    bc_5 = b_1(D) w* c_3(D);
>    D.2017_6 = (short int) bc_5;
>    D.2018_7 = (long long int) D.2017_6;
>    D.2016_9 = WIDEN_MULT_PLUS_EXPR <b_1(D), c_3(D), a_8(D)>;
>    return D.2016_9;
>
>  }
>
> As you can see, everything except the WIDEN_MULT_PLUS_EXPR statement is now
> redundant. (Ideally, this would be removed now, but in fact it doesn't get
> eliminated until the RTL into_cfglayout pass. This is not new behaviour.)
>
>
> My point is that it's possible to have at least two conversions to examine.
> Is it possible to have more? I don't know, but once I'm dealing with two I
> might as well deal with an arbitrary number.
>
> Andrew
>