This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [RFC] Induction variable candidates not sufficiently general


On Tue, Jul 17, 2018 at 2:08 AM, Kelvin Nilsen <kdnilsen@linux.ibm.com> wrote:
> Thanks for looking at this for me.  In simplifying the test case for a bug report, I've narrowed the "problem" to integer overflow considerations.  My len variable is declared int, and the target has 64-bit pointers.  I'm gathering that the "manual transformation" I quoted below is not considered "equivalent" to the original source code due to different integer overflow behaviors.  If I redeclare len to be unsigned long long, then I automatically get the optimizations that I was originally expecting.
>
> I suppose this is really NOT a bug?
As your test case demonstrates, it is caused by wraparound of the unsigned 32-bit integer.
>
> Is there a compiler optimization flag that allows the optimizer to ignore array index integer overflow in considering legal optimizations?
I am not aware of one for unsigned integers, and I doubt one will be
introduced in the future either.
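
To make the wrapping point concrete, here is a minimal sketch (not taken from the bug report; the helper names step_index_u32, step_index_u64 and demo_wrap are hypothetical). It assumes the index is a 32-bit unsigned type on a target with 64-bit pointers. Because unsigned arithmetic is defined to wrap modulo 2^32, the offset used to form &pb[len] does not always advance by one, so replacing the index with a pointer that is simply incremented would not be equivalent in the wrapping case.

```c
#include <assert.h>
#include <stdint.h>

/* One loop step with a 32-bit unsigned index (hypothetical helper).
   Unsigned arithmetic wraps modulo 2^32, so len + 1 can become 0. */
static uint64_t step_index_u32(uint32_t len)
{
    len = len + 1u;            /* well-defined wraparound */
    return (uint64_t)len;      /* offset that would be used to form &pb[len] */
}

/* The same step with a 64-bit index: for any realistic array size
   the offset always grows by exactly one. */
static uint64_t step_index_u64(uint64_t len)
{
    return len + 1u;
}

/* Small self-check of the two behaviors. */
static void demo_wrap(void)
{
    assert(step_index_u32(5u) == 6u);
    assert(step_index_u64(5u) == 6u);
    /* At the 32-bit boundary the two index types diverge. */
    assert(step_index_u32(UINT32_MAX) == 0u);
    assert(step_index_u64(UINT32_MAX) == UINT64_C(4294967296));
}
```

With the 64-bit index the offset is a simple affine function of the iteration count, which is consistent with the observation above that redeclaring len as unsigned long long enables the expected optimization.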

Thanks,
bin
>
>
>
> On 7/13/18 9:14 PM, Bin.Cheng wrote:
>> On Fri, Jul 13, 2018 at 6:04 AM, Kelvin Nilsen <kdnilsen@linux.ibm.com> wrote:
>>> A somewhat old "issue report" pointed me to the code generated for a 4-fold manually unrolled version of the following loop:
>>>
>>>>                       while (++len != len_limit) /* this is loop */
>>>>                               if (pb[len] != cur[len])
>>>>                                       break;
>>>
>>> As unrolled, the loop appears as:
>>>
>>>>                 while (++len != len_limit) /* this is loop */ {
>>>>                   if (pb[len] != cur[len])
>>>>                     break;
>>>>                   if (++len == len_limit)  /* unrolled 2nd iteration */
>>>>                     break;
>>>>                   if (pb[len] != cur[len])
>>>>                     break;
>>>>                   if (++len == len_limit)  /* unrolled 3rd iteration */
>>>>                     break;
>>>>                   if (pb[len] != cur[len])
>>>>                     break;
>>>>                   if (++len == len_limit)  /* unrolled 4th iteration */
>>>>                     break;
>>>>                   if (pb[len] != cur[len])
>>>>                     break;
>>>>                 }
>>>
>>> In examining the behavior of tree-ssa-loop-ivopts.c, I've discovered the only induction variable candidates that are being considered are all forms of the len variable.  We are not considering any induction variables to represent the address expressions &pb[len] and &cur[len].
>>>
>>> I rewrote the source code for this loop to make the addressing expressions more explicit, as in the following:
>>>
>>>>       cur++;
>>>>       while (++pb != last_pb) /* this is loop */ {
>>>>         if (*pb != *cur)
>>>>           break;
>>>>         ++cur;
>>>>         if (++pb == last_pb)  /* unrolled 2nd iteration */
>>>>           break;
>>>>         if (*pb != *cur)
>>>>           break;
>>>>         ++cur;
>>>>         if (++pb == last_pb)  /* unrolled 3rd iteration */
>>>>           break;
>>>>         if (*pb != *cur)
>>>>           break;
>>>>         ++cur;
>>>>         if (++pb == last_pb)  /* unrolled 4th iteration */
>>>>           break;
>>>>         if (*pb != *cur)
>>>>           break;
>>>>         ++cur;
>>>>       }
>>>
>>> Now, gcc does a better job of identifying the "address expression induction variables".  This version of the loop runs about 10% faster than the original on my target architecture.
>>>
>>> This would seem to be a textbook pattern for the induction variable analysis.  Does anyone have any thoughts on the best way to add these candidates to the set of induction variables that are considered by tree-ssa-loop-ivopts.c?
>>>
>>> Thanks in advance for any suggestions.
>>>
>> Hi,
>> Could you please file a bug with your original slow test code
>> attached?  I tried to construct a meaningful test case from your code
>> snippet but was not successful.  There is a difference in the generated
>> assembly, but it's not that fundamental.  So a bug with a preprocessed
>> test would be highly appreciated.
>> I think there are two potential issues in the cost computation for such
>> cases: invariant expressions, and iv uses outside of the loop being
>> handled as inside uses.
>>
>> Thanks,
>> bin
>>
>>
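
For anyone wanting to experiment with the two loop forms quoted above, the following is a rough, non-unrolled sketch of both. It is an illustration only, not the test case from the bug report: the function names match_indexed and match_pointer are hypothetical, the element type is assumed to be unsigned char, the index is assumed to be size_t (pointer width, so the wrapping concern does not arise), and len is assumed to be less than len_limit on entry.

```c
#include <assert.h>
#include <stddef.h>

/* Indexed form, following the quoted loop, with a pointer-width index. */
static size_t match_indexed(const unsigned char *pb, const unsigned char *cur,
                            size_t len, size_t len_limit)
{
    while (++len != len_limit)      /* this is the loop */
        if (pb[len] != cur[len])
            break;
    return len;
}

/* Pointer form, following the manual rewrite: the addresses &pb[len] and
   &cur[len] become explicit induction variables. */
static size_t match_pointer(const unsigned char *pb, const unsigned char *cur,
                            size_t len, size_t len_limit)
{
    const unsigned char *base = pb;
    const unsigned char *last_pb = pb + len_limit;

    pb += len;                      /* start from the current match length */
    cur += len;
    cur++;
    while (++pb != last_pb) {
        if (*pb != *cur)
            break;
        ++cur;
    }
    return (size_t)(pb - base);     /* convert back to an index */
}
```

Both functions return the index of the first mismatch after the starting position, or len_limit if none is found, so they can be compared directly on the same inputs when inspecting the generated code.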

