Bug 54398 - Incorrect ARM assembly when building with -fno-omit-frame-pointer -O2
Summary: Incorrect ARM assembly when building with -fno-omit-frame-pointer -O2
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: other (show other bugs)
Version: 4.6.4
: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-08-29 00:21 UTC by asharif
Modified: 2018-12-20 14:12 UTC (History)
3 users (show)

See Also:
Host:
Target: arm
Build:
Known to work:
Known to fail:
Last reconfirmed: 2012-09-04 00:00:00


Attachments
Reduced test case (677 bytes, application/x-bzip-compressed-tar)
2012-08-29 00:21 UTC, asharif
Details
Repro case (69.19 KB, application/octet-stream)
2012-09-04 22:47 UTC, asharif
Details

Note You need to log in before you can comment on or make changes to this bug.
Description asharif 2012-08-29 00:21:21 UTC
Created attachment 28096 [details]
Reduced test case

Please see the attached cpp and h files.

I used gcc-4.6.4 at 19788:190734 (as reported by svnversion -c).

Configured it as follows:

configure --disable-multilib --prefix=/usr --bindir=/usr/x86_64-pc-linux-gnu/armv7a-cros-linux-gnueabi/gcc-bin/4.6.4 --datadir=/usr/share/gcc-data/armv7a-cros-linux-gnueabi/4.6.4 --mandir=/usr/share/gcc-data/armv7a-cros-linux-gnueabi/4.6.4/man --infodir=/usr/share/gcc-data/armv7a-cros-linux-gnueabi/4.6.4/info --includedir=/usr/lib/gcc/armv7a-cros-linux-gnueabi/4.6.4/include --with-gxx-include-dir=/usr/lib/gcc/armv7a-cros-linux-gnueabi/4.6.4/include/g++-v4 --host=x86_64-pc-linux-gnu --target=armv7a-cros-linux-gnueabi --build=x86_64-pc-linux-gnu --enable-languages=c,c++ --with-float=hard --with-mode=thumb --with-sysroot=/usr/armv7a-cros-linux-gnueabi --disable-libmudflap --disable-libssp --enable-libgomp --enable-__cxa_atexit --enable-checking=release --disable-libquadmath --with-arch=armv7-a --disable-esp --enable-linker-build-id

Built my file as follows:

g++ -fno-exceptions -Wno-unused-parameter -Wno-missing-field-initializers -D_FILE_OFFSET_BITS=64 -fvisibility=hidden -pipe -fPIC -fno-strict-aliasing -g -mthumb -Wa,-mimplicit-it=thumb -march=armv7-a -mtune=cortex-a8 -mfloat-abi=hard -mfpu=neon -O2 -fno-ident -fdata-sections -ffunction-sections -fno-rtti -fno-threadsafe-statics -fvisibility-inlines-hidden -Wsign-compare -Wno-invalid-offsetof -Wno-multichar -Wno-sign-compare -Wno-abi -O2 -pipe -march=armv7-a -mtune=cortex-a15 -mfpu=neon -mfloat-abi=hard -g -fno-omit-frame-pointer -o reduced.out reduced.cpp

The final binary is called reduced.out, and produces different output at -O0 and -O2 with -fno-omit-frame-pointer.

At -O0 (as well as for x86 backend):

p1.x 0
p1.y 0
p2.x 5
p2.y 5
p3.x 10
p3.y 10
p1.x 4
p1.y 4
p2.x 7
p2.y 7
p3.x 10
p3.y 10
p1.x 0
p1.y 0
p2.x 2
p2.y 2
p3.x 4
p3.y 4

At -fno-omit-frame-pointer -O2:

p1.x 0
p1.y 0
p2.x 5
p2.y 5
p3.x 10
p3.y 10
p1.x 4
p1.y 4
p2.x 777508
p2.y 7
p3.x 10
p3.y 10
p1.x 0
p1.y 0
p2.x 777492
p2.y 79193
p3.x 4
p3.y 4


In the assembly, I see:

 7a6:   6139            str     r1, [r7, #16] # Note non-0-offset from r7
 7a8:   f8c7 9014       str.w   r9, [r7, #20] # Note non-0-offset from r7
 7ac:   e893 0003       ldmia.w r3, {r0, r1}
 7b0:   e884 0003       stmia.w r4, {r0, r1}
 7b4:   e897 0003       ldmia.w r7, {r0, r1} # Note 0-offset from r7
 7b8:   e88c 0003       stmia.w ip, {r0, r1}

The two offsets should match otherwise we are using garbage values.
Comment 1 Ramana Radhakrishnan 2012-08-30 23:32:42 UTC
On 29 Aug 2012, at 01:21, asharif at gcc dot gnu.org <gcc-bugzilla@gcc.gnu.org> wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54398
> 
>             Bug #: 54398
>           Summary: Incorrect ARM assembly when building with
>                    -fno-omit-frame-pointer -O2
>    Classification: Unclassified
>           Product: gcc
>           Version: 4.6.4
>            Status: UNCONFIRMED
>          Severity: normal
>          Priority: P3
>         Component: other
>        AssignedTo: unassigned@gcc.gnu.org
>        ReportedBy: asharif@gcc.gnu.org
> 
> 
> Created attachment 28096 [details]
>  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=28096
> Reduced test case
> 
> Please see the attached cpp and h files.
> 


Please report bugs as per 

http://gcc.gnu.org/bugs/

paying special attention to 

http://gcc.gnu.org/bugs/#dontwant


Thanks, 
Ramana
Comment 2 asharif 2012-09-04 22:47:15 UTC
Created attachment 28131 [details]
Repro case
Comment 3 asharif 2012-09-04 22:48:51 UTC
Sorry about that.

Attached is a preprocessed file that reproduces the bug.

gcc version: gcc-4.6.4 at 19788:190734 (as reported by svnversion -c)
system type: arm linux
configured as: configure --disable-multilib --prefix=/usr
--bindir=/usr/x86_64-pc-linux-gnu/armv7a-cros-linux-gnueabi/gcc-bin/4.6.4
--datadir=/usr/share/gcc-data/armv7a-cros-linux-gnueabi/4.6.4
--mandir=/usr/share/gcc-data/armv7a-cros-linux-gnueabi/4.6.4/man
--infodir=/usr/share/gcc-data/armv7a-cros-linux-gnueabi/4.6.4/info
--includedir=/usr/lib/gcc/armv7a-cros-linux-gnueabi/4.6.4/include
--with-gxx-include-dir=/usr/lib/gcc/armv7a-cros-linux-gnueabi/4.6.4/include/g++-v4
--host=x86_64-pc-linux-gnu --target=armv7a-cros-linux-gnueabi
--build=x86_64-pc-linux-gnu --enable-languages=c,c++ --with-float=hard
--with-mode=thumb --with-sysroot=/usr/armv7a-cros-linux-gnueabi
--disable-libmudflap --disable-libssp --enable-libgomp --enable-__cxa_atexit
--enable-checking=release --disable-libquadmath --with-arch=armv7-a
--disable-esp --enable-linker-build-id
command line: armv7a-cros-linux-gnueabi-g++ -fno-exceptions -Wno-unused-parameter -Wno-missing-field-initializers -D_FILE_OFFSET_BITS=64 -fvisibility=hidden -pipe -fPIC -fno-strict-aliasing -g -mthumb -Wa,-mimplicit-it=thumb -march=armv7-a -mtune=cortex-a8 -mfloat-abi=hard -mfpu=neon -O2 -fno-ident -fdata-sections -ffunction-sections -fno-rtti -fno-threadsafe-statics -fvisibility-inlines-hidden -Wsign-compare -Wno-invalid-offsetof -Wno-multichar -Wno-sign-compare -Wno-abi -O2 -pipe -march=armv7-a -mtune=cortex-a15 -mfpu=neon -mfloat-abi=hard -g -fno-omit-frame-pointer -o reduced.out reduced.ii
# There is no compiler output
# When I run this on an ARM box, I get this output from the binary:

p1.x 0
p1.y 0
p2.x 5
p2.y 5
p3.x 10
p3.y 10
p1.x 4
p1.y 4
p2.x 762508 # This is 7 with -fomit-frame-pointer
p2.y 7
p3.x 10
p3.y 10
p1.x 0
p1.y 0
p2.x 569140 # this is 2 with -fomit-frame-pointer
p2.y 0
p3.x 4
p3.y 4
Comment 4 Carrot 2012-09-07 01:19:31 UTC
The code before the position Ahmad pointed out is already wrong.

The fault instruction sequence is:

       asrs    r5, r5, #1
       asr     ip, ip, #1      ; A, tmp1.x
       asrs    r0, r0, #1     ; B, tmp1.y
       asrs    r6, r6, #1
       mov     r4, r1
       add     r8, ip, r6      ; C, tmp3.x
       add     r9, r0, r5      ; D, tmp3.y
       add     r7, sp, #0
       asr     r1, r8, #1
       add     ip, r4, #8      ; E,
       asr     r9, r9, #1
       str     r1, [r7, #16]
       str     r9, [r7, #20]
       ldmia   r3, {r0, r1}    ; F,
       stmia   r4, {r0, r1}

Instruction A computes the result of tmp1.x, instruction C use it to compute tmp3.x, instruction E overwrite the value of tmp1.x. But in the source code, tmp1.x is still needed to execute "dst1->p2 = tmp1;", so at last dest1->p2.x gets garbage.

Similarly instruction B computes tmp1.y, instruction D uses it to compute tmp3.y, instruction F overwrites it. After executing "dst1->p2 = tmp1;", dst1->p2.y gets another garbage value.


For comparison, following is the correct version

       asrs    r7, r7, #1    ; A, tmp1.x
       asrs    r0, r0, #1    ; B, tmp1.y
       asrs    r6, r6, #1
       asrs    r5, r5, #1
       sub     sp, sp, #28
       mov     r4, r1
       add     r8, r7, r6    ; C, tmp3.x
       add     ip, r0, r5    ; D, tmp3.y
       str     r7, [sp, #0]  ; X, save tmp1.x
       str     r0, [sp, #4]  ; Y, save tmp1.y
       asr     r1, ip, #1
       add     r7, r4, #8   ; E
       asr     r8, r8, #1
       str     r1, [sp, #20]
       str     r8, [sp, #16]
       ldmia   r3, {r0, r1}  ; F
       stmia   r4, {r0, r1}

The obvious difference is the extra instructions X and Y, they save the value of tmp1 to stack before reusing the register.

The simplified preprocessed source code is


struct A
{
              int x;
              int y;

              void f(const A &a, const A &b)
              {
                      x = (a.x + b.x)>>1;
                      y = (a.y + b.y)>>1;
              }
};

class C {
public:
                A p1;
                A p2;
                A p3;

                bool b;
                void g(C *, C *) const;
};

void C::g(C *dst1, C *dst2) const
{
             A tmp1, tmp2, tmp3;

             tmp1.f(p2,p1);
             tmp2.f(p2,p3);
             tmp3.f(tmp1, tmp2);

             dst1->p1 = p1;
             dst1->p2 = tmp1;
             dst1->p3 =
             dst2->p1 = tmp3;
             dst2->p2 = tmp2;
             dst2->p3 = p3;
}

The simplified command line is:

./cc1plus -fpreprocessed t.ii -quiet -dumpbase t.cpp -mthumb "-march=armv7-a" "-mtune=cortex-a15" -auxbase t -O2 -fno-omit-frame-pointer -o t.s

It looks like the dse2 pass did wrong transformation.

The gcc4.7 and trunk generate correct code.
Comment 5 Carrot 2012-09-11 00:10:45 UTC
It's the bug in local dse sub step in dse.c.

 66 (insn/f 70 69 71 2 (set (reg/f:SI 7 r7)
 67         (plus:SI (reg/f:SI 13 sp)
 68             (const_int 0 [0]))) t.ii:24 -1
 69      (nil))
This insn setup the hfp, r7

197 (insn 12 30 17 2 (set (mem/s/c:SI (reg/f:SI 7 r7) [4 tmp1.x+0 S4 A64])
198         (reg:SI 12 ip [orig:137 D.1799 ] [137])) t.ii:8 694 {*thumb2_movsi_insn}
199      (nil))
This is the store instruction, the memory base address is r7, the hfp register, dse think hfp is constant inside the function, so give it a store group

221 (insn 37 36 34 2 (set (reg/f:SI 8 r8 [170])
222         (reg/f:SI 7 r7)) t.ii:32 694 {*thumb2_movsi_insn}
223      (expr_list:REG_EQUIV (plus:SI (reg/f:SI 7 r7)
224             (const_int 0 [0]))
225         (nil)))
This insn move r7 to r8, it also equals to the value of sp

245 (insn 38 35 39 2 (parallel [
246             (set (reg:SI 0 r0)
247                 (mem/s/c:SI (reg/f:SI 8 r8 [170]) [3 tmp1+0 S4 A64]))
248             (set (reg:SI 1 r1)
249                 (mem/s/c:SI (plus:SI (reg/f:SI 8 r8 [170])
250                         (const_int 4 [0x4])) [3 tmp1+4 S4 A32]))
251         ]) t.ii:32 369 {*ldm2_ia}
252      (nil))
This is the load instruction, the memory base address is r8, const_or_frame_p returns false for r8, after using cselib_expand_value_rtx to r8, we get a base address sp, const_or_frame_p still return false for it. So the corresponding group id is -1 (no corresponding store group), then it can't match the store insn 12. So dse consider the memory stored in insn 12 is never used, thus a dead store, and can be eliminated.

The problem is the hfp based address is considered constant base address, sp and derived addresses are considered varied base address, they will not be matched when detecting interfering memory access. But in many cases sp and hfp can be same. Even worse, addresses copied from or derived from hfp could be recognized as derived from sp, like in this case, and causes memory access mismatch.
Comment 6 Jakub Jelinek 2012-09-11 06:09:13 UTC
cselib.c has for this the various spots that special case HARD_FRAME_POINTER_REGNUM (or STACK_POINTER_REGNUM or FRAME_POINTER_REGNUM).
Please see why that doesn't work in this case.
Comment 7 ramrad01 2012-09-11 10:44:30 UTC
On 09/11/12 07:09, jakub at gcc dot gnu.org wrote:
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54398
>
> Jakub Jelinek <jakub at gcc dot gnu.org> changed:
>
>             What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                   CC|                            |jakub at gcc dot gnu.org
>
> --- Comment #6 from Jakub Jelinek <jakub at gcc dot gnu.org> 2012-09-11 06:09:13 UTC ---
> cselib.c has for this the various spots that special case
> HARD_FRAME_POINTER_REGNUM (or STACK_POINTER_REGNUM or FRAME_POINTER_REGNUM).
> Please see why that doesn't work in this case.
>

This rings a bell.

Maybe the patch mentioned below needs backporting given Carrot is 
reporting this against the 4.6 branch. What's not clear if this is 
reproducible on anything later though.

http://old.nabble.com/-PATCH--Prevent-cselib-substitution-of-FP,-SP,-SFP-td33080657.html

Full disclaimer that I've not investigated whether the same problem 
occurs on trunk.

HTH,
Ramana
Comment 8 Carrot 2012-09-12 20:57:33 UTC
(In reply to comment #7)
> 
> This rings a bell.
> 
> Maybe the patch mentioned below needs backporting given Carrot is 
> reporting this against the 4.6 branch. What's not clear if this is 
> reproducible on anything later though.
> 
> http://old.nabble.com/-PATCH--Prevent-cselib-substitution-of-FP,-SP,-SFP-td33080657.html
> 
The patch can fix this bug.
Comment 9 Richard Earnshaw 2018-12-20 14:02:15 UTC
I don't know precisely when this was fixed, but the testcase produces correct output on (at least) gcc-7 and later.  So presume fixed.
Comment 10 Jakub Jelinek 2018-12-20 14:12:13 UTC
Guess it would be nice to add the testcase into the testsuite in that case.