This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Re: x86 gcc lacks simple optimization

From: Richard Biener <richard dot guenther at gmail dot com>
To: Konstantin Vladimirov <konstantin dot vladimirov at gmail dot com>
Cc: GCC Development <gcc at gcc dot gnu dot org>, GCC-help <gcc-help at gcc dot gnu dot org>
Date: Fri, 6 Dec 2013 15:17:14 +0100
Subject: Re: x86 gcc lacks simple optimization
Authentication-results: sourceware.org; auth=none
References: <CADn89gRZPDo1Z4gvime-PTC9aaeO6G5jgbN+0hOSZrnD8M1vtw at mail dot gmail dot com> <CAFiYyc0kZGi5XgXikvhqH5kHCXgBc=HtrB5OXHrqv2Z+HdNmhg at mail dot gmail dot com> <CADn89gSK0P4qXfGTNddW43XnP4KzThwNTQOBBNFBP9fO=raV3g at mail dot gmail dot com> <CAFiYyc30jVTphXoOeSnf9DbqpMg=C3069ZGWaBFVj1yAzbLMqg at mail dot gmail dot com> <CADn89gRQmp_yTmW+8aWpWMxZckV0GAVUFgZZ8W1-RSki4QqiBA at mail dot gmail dot com>

On Fri, Dec 6, 2013 at 2:52 PM, Konstantin Vladimirov
<konstantin.vladimirov@gmail.com> wrote:
> Hi,
>
> Richard, I tried adding an LSHIFT_EXPR case to tree-scalar-evolution.c,
> and now it yields code like this (x86 again):
>
> .L5:
>         movzbl  4(%esi,%eax,4), %edx
>         movb    %dl, 4(%ebx,%eax,4)
>         addl    $1, %eax
>         cmpl    %ecx, %eax
>         jne     .L5
>
> So the excess lea is gone. That is great, thank you so much. But I
> wonder what else I can do to move the add up, ahead of the memory
> accesses, so that they become simpler (I am guessing this is some
> kind of arithmetic reassociation, but I am still not sure where to
> look). This matters for the architecture I am working on. What would
> you advise?

You need to look at IVOPTs and how it arrives at the choice of
induction variables.

Richard.
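
To make the question concrete, here is a hand-written C sketch of the induction-variable choice being asked for. It is not GCC output and not taken from the IVOPTs sources; the names copy_original, copy_rebased and copies_agree are made up for the illustration. Substituting j = i + 1 turns the index (i << 2) + 4 into j * 4, so the constant offset disappears from the address, which is the shape of the desired loop. Whether a given GCC version then picks the scaled addressing mode is an assumption to verify in the IVOPTs dump, not something this sketch guarantees.

```c
#include <string.h>

/* Loop as posted in the thread (unsigned variant): index is (i << 2) + 4. */
static void copy_original(const unsigned char *t, unsigned char *v, unsigned w)
{
    for (unsigned i = 1; i != w; ++i) {
        unsigned x = i << 2;
        v[x + 4] = t[x + 4];
    }
}

/* Same accesses with j = i + 1 as the induction variable:
   (i << 2) + 4 == (i + 1) * 4 == j * 4 in unsigned (modular) arithmetic,
   so the +4 no longer appears in the address computation. */
static void copy_rebased(const unsigned char *t, unsigned char *v, unsigned w)
{
    for (unsigned j = 2; j != w + 1; ++j)
        v[j * 4u] = t[j * 4u];
}

/* Hypothetical self-check: run both loops on small buffers and compare.
   Requires 1 <= w <= 15 so that the largest index, 4 * w, stays below 64. */
static int copies_agree(unsigned w)
{
    unsigned char src[64], a[64], b[64];

    for (unsigned k = 0; k < sizeof src; ++k)
        src[k] = (unsigned char)(k * 7u + 1u);
    memset(a, 0, sizeof a);
    memset(b, 0, sizeof b);

    copy_original(src, a, w);
    copy_rebased(src, b, w);
    return memcmp(a, b, sizeof a) == 0;
}
```

The rewrite is only an identity because everything is unsigned; with signed int the two forms can differ once i << 2 overflows, which is the caveat raised earlier in the thread.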

> ---
> With best regards, Konstantin
>
> On Fri, Dec 6, 2013 at 2:25 PM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Fri, Dec 6, 2013 at 11:19 AM, Konstantin Vladimirov
>> <konstantin.vladimirov@gmail.com> wrote:
>>> Hi,
>>>
>>> Nothing changes if everything is unsigned, so that overflow is
>>> guaranteed not to raise UB:
>>>
>>> unsigned foo(unsigned char *t, unsigned char *v, unsigned w)
>>> {
>>>   unsigned i;
>>>
>>>   for (i = 1; i != w; ++i)
>>>     {
>>>       unsigned x = i << 2;
>>>       v[x + 4] = t[x + 4];
>>>     }
>>>
>>>   return 0;
>>> }
>>>
>>> yields:
>>>
>>> .L5:
>>>         leal    0(,%eax,4), %edx
>>>         addl    $1, %eax
>>>         movzbl  4(%edi,%edx), %ecx
>>>         cmpl    %ebx, %eax
>>>         movb    %cl, 4(%esi,%edx)
>>>         jne     .L5
>>>
>>> What is the SCEV infrastructure (scalar evolution, I am guessing?)
>>> and which files/passes should I look in?
>>
>> tree-scalar-evolution.c, look at where it handles MULT_EXPR but
>> lacks LSHIFT_EXPR support.
>>
>> Richard.
>>
>>> ---
>>> With best regards, Konstantin
>>>
>>> On Fri, Dec 6, 2013 at 2:10 PM, Richard Biener
>>> <richard.guenther@gmail.com> wrote:
>>>> On Fri, Dec 6, 2013 at 9:30 AM, Konstantin Vladimirov
>>>> <konstantin.vladimirov@gmail.com> wrote:
>>>>> Hi,
>>>>>
>>>>> Consider this code:
>>>>>
>>>>> int foo(char *t, char *v, int w)
>>>>> {
>>>>>   int i;
>>>>>
>>>>>   for (i = 1; i != w; ++i)
>>>>>     {
>>>>>       int x = i << 2;
>>>>>       v[x + 4] = t[x + 4];
>>>>>     }
>>>>>
>>>>>   return 0;
>>>>> }
>>>>>
>>>>> Compile it for x86 (I used both gcc 4.7.2 and gcc 4.8.1) with these
>>>>> options:
>>>>>
>>>>> gcc -O2 -m32 -S test.c
>>>>>
>>>>> You will see a loop like this:
>>>>>
>>>>> .L5:
>>>>>         leal    0(,%eax,4), %edx
>>>>>         addl    $1, %eax
>>>>>         movzbl  4(%edi,%edx), %ecx
>>>>>         cmpl    %ebx, %eax
>>>>>         movb    %cl, 4(%esi,%edx)
>>>>>         jne     .L5
>>>>>
>>>>> But it can be easily simplified to something like this:
>>>>>
>>>>> .L5:
>>>>>         addl    $1, %eax
>>>>>         movzbl  (%esi,%eax,4), %edx
>>>>>         cmpl    %ecx, %eax
>>>>>         movb    %dl, (%ebx,%eax,4)
>>>>>         jne     .L5
>>>>>
>>>>> (i.e. the left shift can be folded into the addressing mode).
>>>>>
>>>>> First, a question for the gcc-help mailing list: maybe there are
>>>>> some options that I've missed, and there IS a way to tell gcc that
>>>>> I want this?
>>>>>
>>>>> Second, a question for the gcc developers mailing list: I am working
>>>>> on a private backend and want to add this optimization to it. What
>>>>> do you advise me to do: a custom gimple pass, an rtl pass, modifying
>>>>> some existing pass, etc.?
>>>>
>>>> This looks like a deficiency in induction variable optimization.  Note
>>>> that i << 2 may overflow and this overflow does not invoke undefined
>>>> behavior but is in the implementation defined behavior category.
>>>>
>>>> The issue in this case is likely that the SCEV infrastructure does not handle
>>>> left-shifts.
>>>>
>>>> Richard.
>>>>
>>>>> ---
>>>>> With best regards, Konstantin

Follow-Ups:
- Re: x86 gcc lacks simple optimization
  - From: Jeff Law

References:
- x86 gcc lacks simple optimization
  - From: Konstantin Vladimirov
- Re: x86 gcc lacks simple optimization
  - From: Richard Biener
- Re: x86 gcc lacks simple optimization
  - From: Konstantin Vladimirov
- Re: x86 gcc lacks simple optimization
  - From: Richard Biener
- Re: x86 gcc lacks simple optimization
  - From: Konstantin Vladimirov
