x86 gcc lacks simple optimization

David Brown david@westcontrol.com
Fri Dec 6 09:28:00 GMT 2013


On 06/12/13 09:30, Konstantin Vladimirov wrote:
> Hi,
> 
> Consider this code:
> 
> int foo(char *t, char *v, int w)
> {
>     int i;
> 
>     for (i = 1; i != w; ++i)
>     {
>         int x = i << 2;
>         v[x + 4] = t[x + 4];
>     }
> 
>     return 0;
> }
> 
> Compile it to x86 (I used both gcc 4.7.2 and gcc 4.8.1) with options:
> 
> gcc -O2 -m32 -S test.c
> 
> You will see a loop like this:
> 
> .L5:
> leal 0(,%eax,4), %edx
> addl $1, %eax
> movzbl 4(%edi,%edx), %ecx
> cmpl %ebx, %eax
> movb %cl, 4(%esi,%edx)
> jne .L5
> 
> But it can be easily simplified to something like this:
> 
> .L5:
> addl $1, %eax
> movzbl (%esi,%eax,4), %edx
> cmpl %ecx, %eax
> movb %dl, (%ebx,%eax,4)
> jne .L5
> 
> (i.e. the left shift may be folded into the addressing mode).
> 
> First, a question for the gcc-help mailing list. Maybe there are some
> options that I've missed, and there IS a way to tell gcc that this is
> what I want?
> 
> And second, a question for the gcc developers mailing list. I am working
> on a private backend and want to add this optimization to it. What
> do you advise me to do -- a custom gimple pass, an rtl pass, modifying
> some existing pass, etc?
> 

Hi,

Usually the gcc developers are not keen on emails going to both the help
and development lists - they prefer to keep them separate.

My first thought when someone finds a "missed optimisation" issue,
especially with the x86 target, is: are you /sure/ gcc's code is slower?
x86 chips are immensely complex, and the interplay between different
instructions, pipelines, superscalar execution, etc., means that code
that appears faster can actually be slower.  So please check your
architecture flags (i.e., are you optimising for the "native" cpu, or
for some other specific cpu?  Optimised code can be different for
different x86 cpus).  Then /measure/ the speed of the code to see if
there is a real difference.
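
As a rough illustration of the "measure it" advice, here is a minimal
sketch of a timing harness.  Everything in it apart from the loop body
of foo is an assumption made for illustration: the names foo_shift,
foo_scaled, time_copy and same_result are hypothetical, foo_scaled is
just one possible source-level rewrite (there is no guarantee that gcc
emits a scaled-index address for it), and clock() only gives coarse CPU
time.  The noinline attribute is a GCC extension used here so that the
calls are not optimised away.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* The loop as posted (const and noinline added); assumes w >= 1. */
__attribute__((noinline))
int foo_shift(const char *t, char *v, int w)
{
    int i;

    for (i = 1; i != w; ++i) {
        int x = i << 2;
        v[x + 4] = t[x + 4];
    }
    return 0;
}

/* Hypothetical rewrite: j = i + 1, so 4 * j equals (i << 2) + 4. */
__attribute__((noinline))
int foo_scaled(const char *t, char *v, int w)
{
    int j;

    for (j = 2; j != w + 1; ++j)
        v[4 * j] = t[4 * j];
    return 0;
}

typedef int (*copy_fn)(const char *, char *, int);

/* CPU seconds used by `reps` calls of fn; -1.0 if allocation fails. */
double time_copy(copy_fn fn, int w, int reps)
{
    assert(w >= 1);
    size_t n = (size_t)4 * (size_t)w + 1;   /* highest index touched is 4 * w */
    char *t = malloc(n);
    char *v = malloc(n);
    if (!t || !v) {
        free(t);
        free(v);
        return -1.0;
    }
    memset(t, 1, n);
    memset(v, 0, n);

    clock_t start = clock();
    for (int r = 0; r < reps; ++r)
        fn(t, v, w);
    clock_t stop = clock();

    free(t);
    free(v);
    return (double)(stop - start) / CLOCKS_PER_SEC;
}

/* Returns 1 if both variants write identical destination buffers. */
int same_result(int w)
{
    assert(w >= 1);
    size_t n = (size_t)4 * (size_t)w + 1;
    char *t = malloc(n), *a = calloc(n, 1), *b = calloc(n, 1);
    int ok = 0;

    if (t && a && b) {
        for (size_t k = 0; k < n; ++k)
            t[k] = (char)(k * 7 + 3);
        foo_shift(t, a, w);
        foo_scaled(t, b, w);
        ok = memcmp(a, b, n) == 0;
    }
    free(t);
    free(a);
    free(b);
    return ok;
}
```

To use it, one would call same_result() once to confirm the two loops
agree, then call time_copy(foo_shift, w, reps) and
time_copy(foo_scaled, w, reps) from a small main() and compare the
numbers, building once with "gcc -O2 -m32" and once with
"-march=native" added.  Whether any difference shows up will depend on
the cpu and the gcc version.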


Regarding your "private backend" - is this a modification of the x86
backend, or a completely different target?  If it is x86, then I think
the answer is "don't do it - work with the mainline code".  If it is
something else, then an x86-specific optimisation is of little use anyway.

mvh.,

David




