Do we need to do a loop invariant motion after loop interchange ?

Fri Nov 22 05:51:00 GMT 2019

On 2019/11/21 8:10 PM, Richard Biener wrote:
> On Thu, Nov 21, 2019 at 10:22 AM Li Jia He <helijia@linux.ibm.com> wrote:
>>
>> Hi,
>>
>> I found for the follow code:
>>
>> #define N 256
>> int a[N][N][N], b[N][N][N];
>> int d[N][N], c[N][N];
>> void __attribute__((noinline))
>> double_reduc (int n)
>> {
>>     for (int k = 0; k < n; k++)
>>     {
>>       for (int l = 0; l < n; l++)
>>        {
>>          c[k][l] = 0;
>>           for (int m = 0; m < n; m++)
>>             c[k][l] += a[k][m][l] * d[k][m] + b[k][m][l] * d[k][m];
>>        }
>>     }
>> }
>>
>> I dumped the file after loop interchange and got the following information:
>>
>> <bb 3> [local count: 118111600]:
>>     # m_46 = PHI <0(7), m_45(11)>
>>     # ivtmp_44 = PHI <_42(7), ivtmp_43(11)>
>>     _39 = _49 + 1;
>>
>>     <bb 4> [local count: 955630224]:
>>     # l_48 = PHI <0(3), l_47(12)>
>>     # ivtmp_41 = PHI <_39(3), ivtmp_40(12)>
>>     c_I_I_lsm.5_18 = c[k_28][l_48];
>>     c_I_I_lsm.5_53 = m_46 != 0 ? c_I_I_lsm.5_18 : 0;
>>     _2 = a[k_28][m_46][l_48];
>>     _3 = d[k_28][m_46];
>>     _4 = _2 * _3;
>>     _5 = b[k_28][m_46][l_48];
>>     _6 = _3 * _5;
>>     _7 = _4 + _6;
>>     _8 = _7 + c_I_I_lsm.5_53;
>>     c[k_28][l_48] = _8;
>>     l_47 = l_48 + 1;
>>     ivtmp_40 = ivtmp_41 - 1;
>>     if (ivtmp_40 != 0)
>>       goto <bb 12>; [89.00%]
>>     else
>>       goto <bb 5>; [11.00%]
>>
>> we can see '_3 = d[k_28][m_46];'  is a loop invariant.
>> Do we need to add a loop invariant motion pass after the loop interchange?
> 
> There is one at the end of the loop pipeline.

Hi,

The one at the end of the loop pipeline may miss some optimization
opportunities.  If we vectorize the above code (a.c.158t.vect), we
can get information similar to the following:

bb 3:
  # m_46 = PHI <0(7), m_45(11)>  // loop m, outer loop
   if (_59 <= 2)
     goto bb 20;
   else
     goto bb 15;

bb 15:
   _89 = d[k_28][m_46];
   vect_cst__90 = {_89, _89, _89, _89};

bb 4:
    # l_48 = PHI <l_47(12), 0(15)> // loop l, inner loop
   vect__6.23_100 = vect_cst__99 * vect__5.22_98;
    if (ivtmp_110 < bnd.8_1)
     goto bb 12;
   else
     goto bb 17;

bb 20:
bb 18:
    _27 = d[k_28][m_46];
if (ivtmp_12 != 0)
     goto bb 19;
   else
     goto bb 21;

Vectorization will do some conversions in this case.  We can see
â€˜ _89 = d[k_28][m_46];â€™ and â€˜_27 = d[k_28][m_46];â€™ are loop invariant
relative to loop l.  We can move â€˜d[k_28][m_46]â€™ to the front of
â€˜if (_59 <= 2)â€™ to get rid of loading data from memory in both branches.

The one at at the end of the loop pipeline can't handle this situation.
If we move d[k_28][m_46] from loop l to loop m before doing
vectorization, we can get rid of this situation.

-- 
BR,
Lijia He

> 
>> BR,
>> Lijia He
>>