[RFC] load/store widening question

Marat Zakirov m.zakirov@samsung.com
Thu Feb 19 11:47:00 GMT 2015


On 02/19/2015 12:25 PM, Ramana Radhakrishnan wrote:
> On Thu, Feb 19, 2015 at 9:17 AM, Marat Zakirov <m.zakirov@samsung.com> wrote:
>> Hi all!
>>
>> During my investigation I found that GCC does not perform load/store
>> widening (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65088). Could you
>> please confirm whether that is the case? Are there any plans to implement
>> it? I would also like to know whether it makes sense to do load/store
>> widening specifically in the ASan phase, just to reduce the number of
>> ASAN_CHECK calls.
>>
>> Example from the bug:
>>
>> $ cat t2.c
>>
>> int a[2];
>> int b[2];
>>
>> int main ()
>> {
>>    b[0] = a[0];
>>    b[1] = a[1];
>>    return 0;
>> }
>>
> The answer is: it depends. The SLP vectorizer can spot this in a generic
> way across ports, as in the AArch64 example below.
>
>
> AArch64 :
>
> main:
>      adrp    x0, a    // 5    *movdi_aarch64/11    [length = 4]
>      add    x0, x0, :lo12:a    // 6    add_losym_di    [length = 4]
>      adrp    x1, b    // 8    *movdi_aarch64/11    [length = 4]
>      add    x1, x1, :lo12:b    // 9    add_losym_di    [length = 4]
>      ldr    d0, [x0]    // 7    *aarch64_simd_movv2si/1    [length = 4]
>      mov    w0, 0    // 15    *movsi_aarch64/4    [length = 4]
>      str    d0, [x1]    // 10    *aarch64_simd_movv2si/2    [length = 4]
>      ret    // 40    simple_return    [length = 4]
>
>
> Or on AArch32 without NEON, where the standard ldm / ldrd peepholes spot this:
>
> main:
>      @ args = 0, pretend = 0, frame = 0
>      @ frame_needed = 0, uses_anonymous_args = 0
>      @ link register save eliminated.
>      movw    r2, #:lower16:a
>      movw    r3, #:lower16:b
>      movt    r2, #:upper16:a
>      movt    r3, #:upper16:b
>      ldmia    r2, {r1, r2}
>      mov    r0, #0
>      stmia    r3, {r1, r2}
>      bx    lr
>
>
> It will be interesting to see whether the number of checks can be reduced,
> but I suspect you'll hit quite a few phase-ordering issues, and there is
> enough variation between ports that making this work sensibly will be hard.
>
>
>
> regards
> Ramana
>
>
>> $ gcc t2.c -O3 -S
>>
>> $ cat t2.s
>>
>> ...
>>
>> main:
>> .LFB0:
>>          .cfi_startproc
>>          movl    a(%rip), %eax
>>          movl    %eax, b(%rip)
>>          movl    a+4(%rip), %eax
>>          movl    %eax, b+4(%rip)
>>          xorl    %eax, %eax
>>          ret
>>          .cfi_endproc
>>
>>
>>
>> I would greatly appreciate your answers and thoughts.
>>
>> --Marat
>>
Thank you very much Ramana.
I would also like the x86 maintainers to explain why GCC on x86 did not
handle the given example.
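
To make the question concrete, here is a minimal sketch of what I mean by
widening, written out by hand at the source level. The function names
copy_narrow and copy_wide are hypothetical and do not appear in the bug
report, and the sketch assumes that int is 32 bits so that two of them fit
in one uint64_t. It only illustrates the transformation; it is not a claim
about how GCC would or should implement it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

int a[2];
int b[2];

/* Narrow form: two separate 32-bit loads and stores, as in t2.c. */
void copy_narrow(void)
{
    b[0] = a[0];
    b[1] = a[1];
}

/* Widened form (hand-written illustration): the two adjacent int
   accesses are merged into a single 64-bit load and a single 64-bit
   store. memcpy is used so that no aliasing or alignment rules are
   broken. Assumption: sizeof(int) == 4. */
void copy_wide(void)
{
    uint64_t tmp;

    assert(sizeof a == sizeof tmp);
    memcpy(&tmp, a, sizeof tmp);   /* one 8-byte load  */
    memcpy(b, &tmp, sizeof tmp);   /* one 8-byte store */
}
```

Both functions leave b with the same contents. If the accesses were merged
like this before instrumentation, I would expect one check per 8-byte
access instead of one per 4-byte access, but that is the assumption behind
my question rather than something I have measured.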

--Marat


