[RFC] load/store widening question

Thu Feb 19 09:25:00 GMT 2015

On Thu, Feb 19, 2015 at 9:17 AM, Marat Zakirov <m.zakirov@samsung.com> wrote:
> Hi all!
>
> During my investigation I found that GCC does not perform load/store
> widening (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65088). Could you
> please confirm whether that is so? Are there any plans to implement it? I
> would also like to know whether there is any need to do load/store widening
> specifically in the ASan phase, just to reduce the number of ASAN_CHECKS.
>
> Example from the bug:
>
> $ cat t2.c
>
> int a[2];
> int b[2];
>
> int main ()
> {
>   b[0] = a[0];
>   b[1] = a[1];
>   return 0;
> }
>

The answer is: it depends. GCC can have SLP vectorization spot this in a
generic way across ports, as in the example below.

AArch64:

main:
    adrp    x0, a    // 5    *movdi_aarch64/11    [length = 4]
    add    x0, x0, :lo12:a    // 6    add_losym_di    [length = 4]
    adrp    x1, b    // 8    *movdi_aarch64/11    [length = 4]
    add    x1, x1, :lo12:b    // 9    add_losym_di    [length = 4]
    ldr    d0, [x0]    // 7    *aarch64_simd_movv2si/1    [length = 4]
    mov    w0, 0    // 15    *movsi_aarch64/4    [length = 4]
    str    d0, [x1]    // 10    *aarch64_simd_movv2si/2    [length = 4]
    ret    // 40    simple_return    [length = 4]

Or, on AArch32 without NEON, the standard ldm / ldrd peepholes spot this.

main:
    @ args = 0, pretend = 0, frame = 0
    @ frame_needed = 0, uses_anonymous_args = 0
    @ link register save eliminated.
    movw    r2, #:lower16:a
    movw    r3, #:lower16:b
    movt    r2, #:upper16:a
    movt    r3, #:upper16:b
    ldmia    r2, {r1, r2}
    mov    r0, #0
    stmia    r3, {r1, r2}
    bx    lr

It will be interesting to see whether the number of checks can be reduced,
but I suspect you'll hit quite a few phase-ordering issues, and you'll
have to deal with quite a few differences between ports to make this work
sensibly.
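
For illustration only, here is a rough source-level sketch of what widening
would mean for the t2.c example. The function names copy_narrow and copy_wide
are hypothetical, and the sketch assumes a 4-byte int. The narrow form has
four memory accesses (two loads, two stores) and the hand-widened form has
two, so if each access carried its own check the count would halve. Whether
the compiler really emits one wide load and one wide store for the memcpy
calls depends on the target and options.

```c
#include <stdint.h>
#include <string.h>

int a[2];
int b[2];

/* The sketch only makes sense if two ints fill exactly 64 bits. */
_Static_assert(sizeof(a) == sizeof(uint64_t), "assumes 4-byte int");

/* Narrow form, as written in t2.c: two 32-bit loads, two 32-bit stores. */
void copy_narrow(void)
{
    b[0] = a[0];
    b[1] = a[1];
}

/* Hand-widened form (hypothetical helper): one 64-bit load, one 64-bit
 * store. memcpy is used instead of a pointer cast so the code does not
 * violate strict aliasing or alignment rules. */
void copy_wide(void)
{
    uint64_t tmp;
    memcpy(&tmp, a, sizeof tmp);   /* read a[0] and a[1] together  */
    memcpy(b, &tmp, sizeof tmp);   /* write b[0] and b[1] together */
}
```

Both functions leave b with the same contents; only the number and width of
the memory accesses differ.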

regards
Ramana

> $ gcc t2.c -O3 -S
>
> $ cat t2.s
>
> ...
>
> main:
> .LFB0:
>         .cfi_startproc
>         movl    a(%rip), %eax
>         movl    %eax, b(%rip)
>         movl    a+4(%rip), %eax
>         movl    %eax, b+4(%rip)
>         xorl    %eax, %eax
>         ret
>         .cfi_endproc
>
>
>
> I would very much appreciate your answers and thoughts.
>
> --Marat
>