This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH GCC][v2]Simplify alias check code generation in vectorizer


On Wed, Sep 21, 2016 at 3:01 PM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Tue, Sep 20, 2016 at 5:25 PM, Bin Cheng <Bin.Cheng@arm.com> wrote:
>> Hi,
>> I originally posted a patch improving code generation for alias check in vectorizer at https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00929.html.  Here it's the 2nd version (better) patch.  It detects data reference pair in which the two references are only different to each other wrto index.  In this case the patch generates intersect range checks based on index of address of reference, rather than the address of reference.  Take example from patch's comment, given below two data references:
>>
>>                        DR_A                           DR_B
>>       data-ref         arr[i]                         arr[j]
>>       base_object      arr                            arr
>>       index            {i_0, +, 1}_loop               {j_0, +, 1}_loop
>>
>> The addresses and their index can be depicted as:
>>
>>         |<- ADDR_A    ->|          |<- ADDR_B    ->|
>>      ------------------------------------------------------->
>>         |   |   |   |   |          |   |   |   |   |
>>      ------------------------------------------------------->
>>         i_0 ...         i_0+4      j_0 ...         j_0+4
>>
>> We can create expression based on index rather than address:  (i_0 + 4 < j_0 || j_0 + 4 < i_0).
>>
>> Also take example from spec2k/200.sixtrack/DACOP, gcc needs to generate alias check for three pairs:
>> //pair 1
>> dr_a:
>> (Data Ref:
>>   bb: 8
>>   stmt: _92 = da.cc[_27];
>>   ref: da.cc[_27];
>> )
>> dr_b:
>> (Data Ref:
>>   bb: 8
>>   stmt: da.cc[_93] = _92;
>>   ref: da.cc[_93];
>> )
>> //pair 2
>> dr_a:
>> (Data Ref:
>>   bb: 8
>>   stmt: pretmp_29 = da.i2[_27];
>>   ref: da.i2[_27];
>> )
>> dr_b:
>> (Data Ref:
>>   bb: 8
>>   stmt: da.i2[_93] = pretmp_29;
>>   ref: da.i2[_93];
>> )
>> //pair 3
>> dr_a:
>> (Data Ref:
>>   bb: 8
>>   stmt: pretmp_28 = da.i1[_27];
>>   ref: da.i1[_27];
>> )
>> dr_b:
>> (Data Ref:
>>   bb: 8
>>   stmt: da.i1[_93] = pretmp_28;
>>   ref: da.i1[_93];
>> )
>>
>> With this patch, code can be improved to:
>>
>>   <bb 7>:
>>   _37 = (unsigned int) _6;
>>   _106 = (unsigned int) _52;
>>   _107 = _37 - _106;
>>   _108 = _107 > 7;
>>   _109 = (integer(kind=8)) _2;
>>   _110 = _109 + 3;
>>   _111 = (integer(kind=8)) _52;
>>   _112 = _111 + -1;
>>   _113 = _110 < _112;
>>   _114 = (integer(kind=8)) _52;
>>   _115 = _114 + 2;
>>   _116 = (integer(kind=8)) _2;
>>   _117 = _115 < _116;
>>   _118 = _113 | _117;
>>   _119 = _108 & _118;
>>   _120 = (integer(kind=8)) _2;
>>   _121 = _120 + 3;
>>   _122 = (integer(kind=8)) _52;
>>   _123 = _122 + -1;
>>   _124 = _121 < _123;
>>   _125 = (integer(kind=8)) _52;
>>   _126 = _125 + 2;
>>   _127 = (integer(kind=8)) _2;
>>   _128 = _126 < _127;
>>   _129 = _124 | _128;
>>   _130 = _119 & _129;
>>   _131 = (integer(kind=8)) _2;
>>   _132 = _131 + 3;
>>   _133 = (integer(kind=8)) _52;
>>   _134 = _133 + -1;
>>   _135 = _132 < _134;
>>   _136 = (integer(kind=8)) _52;
>>   _137 = _136 + 2;
>>   _138 = (integer(kind=8)) _2;
>>   _139 = _137 < _138;
>>   _140 = _135 | _139;
>>   _141 = _130 & _140;
>>   if (_141 != 0)
>>     goto <bb 8>;
>>   else
>>     goto <bb 20>;
>>
>> Note this patch doesn't do local CSE, and common sub-expressions will be optimized by later passes.  I think this is OK because the redundancy is far away from loops thus won't have bad effect (there is an opposite example/PR).
>> Bootstrap and test on x86_64 and AArch64, is it OK?
>
> Seeing
>
> +  /* Infer index length from segment length.  */
> +  unsigned HOST_WIDE_INT idx_len1 = (seg_len1 + abs_step - 1) / abs_step;
> +  unsigned HOST_WIDE_INT idx_len2 = (seg_len2 + abs_step - 1) / abs_step;
>
> Does this really work for DR_STEP that is bigger than 1 in index space?  The
I think it works.  Since both segment length of memory accessed and
index length evaluated are induction variable wrto loop iterations,
they are linear functions of the niters, with DR_STEP and IDX_STEP as
their coefficient, like
  segment_length = niters * DR_STEP
  index_length = niters * IDX_STEP
The affine relation between segment_length and index_length is
guaranteed by SCEV, otherwise, the loop won't be eligible for
vectorization.  So I think it's equal between check segment and index
here.
Follow your reviews, I did find a problem in the original patch
because I forgot to multiply niters with IDX_STEP in computing
index_length.  This is fixed now.  I also added explaining comment.

> segment length is generally vectorization-factor * DR_STEP but index space
> is -- if you are talking about array-ref indexes -- the segment length divided
> by the array element size.  Note that what the index space refers to for
> a given DR_ACCESS_FN depends on whether this is from an ARRAY_REF
> (element size) or the base pointer evolution (bytes).  I suppose looking at
> the access functions type might distinguish the pointer vs. array index case.
? Looking at access functions is the reason we can't distinguish
between pointer and array reference because their access function are
totally different.  On the contrary, we need to look into (innermost)
evolution behavior in this case I think?

>
> What I'm looking for is a big comment on why the above "translation" is
> correct.
Comments added.  Bootstrap and test, is it reasonable?

Thanks,
bin

Attachment: check-alias-on-index-20160921.txt
Description: Text document


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]