Bug 78529 - gcc.c-torture/execute/builtins/strcat-chk.c failed with lto/O2
Summary: gcc.c-torture/execute/builtins/strcat-chk.c failed with lto/O2
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: testsuite (show other bugs)
Version: 7.0
: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: lto
Depends on:
Blocks:
 
Reported: 2016-11-25 16:10 UTC by amker
Modified: 2017-03-31 07:49 UTC (History)
8 users (show)

See Also:
Host:
Target: aarch64-elf
Build:
Known to work:
Known to fail:
Last reconfirmed: 2016-11-30 00:00:00


Attachments
revert part of r242745 for strcat, strcat_chk, strncat, strncat_chk (641 bytes, patch)
2016-11-28 20:33 UTC, prathamesh3492
Details | Diff
test case (234 bytes, text/x-csrc)
2017-01-06 16:38 UTC, Renlin Li
Details
memset.c (168 bytes, text/x-csrc)
2017-01-06 16:39 UTC, Renlin Li
Details
reduced objdump assembler file (1.58 KB, text/plain)
2017-01-06 16:42 UTC, Renlin Li
Details

Note You need to log in before you can comment on or make changes to this bug.
Description amker 2016-11-25 16:10:51 UTC
Hi,
With below commit:
commit c618308c1b2a474bd56ede831f681a49f4327d4c
Author: prathamesh3492 <prathamesh3492@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Wed Nov 23 10:52:25 2016 +0000

    2016-11-23  Richard Biener  <rguenther@suse.de>
            Prathamesh Kulkarni  <prathamesh.kulkarni@linaro.rog>
    
        PR tree-optimization/78154
        * tree-vrp.c (gimple_stmt_nonzero_warnv_p): Return true if function
        returns it's argument and the argument is nonnull.
        * builtin-attrs.def: Define ATTR_RETURNS_NONNULL,
        ATT_RETNONNULL_NOTHROW_LEAF.
        * builtins.def (BUILT_IN_MEMPCPY): Change attribute to
        ATTR_RETNONNULL_NOTHROW_LEAF.
        (BUILT_IN_STPCPY): Likewise.
        (BUILT_IN_STPNCPY): Likewise.
        (BUILT_IN_MEMPCPY_CHK): Likewise.
        (BUILT_IN_STPCPY_CHK): Likewise.
        (BUILT_IN_STPNCPY_CHK): Likewise.
        (BUILT_IN_STRCAT): Change attribute to ATTR_RET1_NOTHROW_NONNULL_LEAF.
        (BUILT_IN_STRNCAT): Likewise.
        (BUILT_IN_STRNCPY): Likewise.
        (BUILT_IN_MEMSET_CHK): Likewise.
        (BUILT_IN_STRCAT_CHK): Likewise.
        (BUILT_IN_STRCPY_CHK): Likewise.
        (BUILT_IN_STRNCAT_CHK): Likewise.
        (BUILT_IN_STRNCPY_CHK): Likewise.
    
    testsuite/
        * gcc.dg/tree-ssa/pr78154.c: New test.
    
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@242745 138bc75d-0d04-0410-961f-82ee72b054a4

We have new failure:
FAIL: gcc.c-torture/execute/builtins/strcat-chk.c execution, -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects 

With/without the revision, there is difference in assembly like:
*** 1368,1387 ****
    4023e4:	b9401261 	ldr	w1, [x19, #16]
    4023e8:	6b00003f 	cmp	w1, w0
    4023ec:	54ffff01 	b.ne	4023cc <test1+0x14c>  // b.any
!   4023f0:	91001664 	add	x4, x19, #0x5
!   4023f4:	d2800802 	mov	x2, #0x40                  	// #64
!   4023f8:	52800b01 	mov	w1, #0x58                  	// #88
    4023fc:	aa1303e0 	mov	x0, x19
    402400:	94000950 	bl	404940 <memset>
    402404:	b9400a96 	ldr	w22, [x20, #8]
    402408:	f9400297 	ldr	x23, [x20]
    40240c:	d2800762 	mov	x2, #0x3b                  	// #59
    402410:	10071c81 	adr	x1, 4107a0 <s1+0x90>
!   402414:	aa0403e0 	mov	x0, x4
    402418:	f90023b7 	str	x23, [x29, #64]
    40241c:	b9004bb6 	str	w22, [x29, #72]
    402420:	97fffc0c 	bl	401450 <__strcat_chk>
!   402424:	eb00001f 	cmp	x0, x0
    402428:	54fffd21 	b.ne	4023cc <test1+0x14c>  // b.any
    40242c:	10071820 	adr	x0, 410730 <s1+0x20>
    402430:	f94023a2 	ldr	x2, [x29, #64]
--- 1368,1387 ----
    4023e4:	b9401261 	ldr	w1, [x19, #16]
    4023e8:	6b00003f 	cmp	w1, w0
    4023ec:	54ffff01 	b.ne	4023cc <test1+0x14c>  // b.any
!   4023f0:	d2800802 	mov	x2, #0x40                  	// #64
!   4023f4:	52800b01 	mov	w1, #0x58                  	// #88
!   4023f8:	91001678 	add	x24, x19, #0x5
    4023fc:	aa1303e0 	mov	x0, x19
    402400:	94000950 	bl	404940 <memset>
    402404:	b9400a96 	ldr	w22, [x20, #8]
    402408:	f9400297 	ldr	x23, [x20]
    40240c:	d2800762 	mov	x2, #0x3b                  	// #59
    402410:	10071c81 	adr	x1, 4107a0 <s1+0x90>
!   402414:	aa1803e0 	mov	x0, x24
    402418:	f90023b7 	str	x23, [x29, #64]
    40241c:	b9004bb6 	str	w22, [x29, #72]
    402420:	97fffc0c 	bl	401450 <__strcat_chk>
!   402424:	eb00031f 	cmp	x24, x0
    402428:	54fffd21 	b.ne	4023cc <test1+0x14c>  // b.any
    40242c:	10071820 	adr	x0, 410730 <s1+0x20>
    402430:	f94023a2 	ldr	x2, [x29, #64]


Looks like the "cmp x0, x0" instruction is wrongly optimized?

GCC is configured as:

configure --target=aarch64-none-elf --prefix=... --with-gmp=... --with-mpfr=... --with-mpc=... --with-isl=... --with-pkgversion=unknown --disable-shared --disable-nls --disable-threads --disable-tls --enable-checking=yes --enable-languages=c,c++,fortran --with-newlib
Comment 1 Richard Biener 2016-11-28 08:57:27 UTC
Note this might be an artifact of how these tests are set up (implemeting their own builtins).
Comment 2 prathamesh3492 2016-11-28 11:04:03 UTC
Sorry for the breakage, I am looking into the issue.

Regards,
Prathamesh
Comment 3 prathamesh3492 2016-11-28 20:33:24 UTC
Created attachment 40183 [details]
revert part of r242745 for strcat, strcat_chk, strncat, strncat_chk

Hi,
Unfortunately I haven't been able to reproduce the issue on aarch64-none-elf with (almost) identical configure opts with make-check,
and neither did it get caught by our validation matrix.
I would be grateful for suggestions on how to build strcat-chk.c standalone.
It seems it depends on lib/chk.c and possibly other files ?
I tried a few approaches but none of them worked :/

Regarding the code-gen difference "cmp x0, x0", I have verified that is not caused by my patch.
The same can be reproduced with the following test-case before r242745.

char *f(char *dest, char *src)
{
  if (__builtin_strcpy (dest + 5, src) != (dest + 5))
    __builtin_abort ();
}

-O2 shows following assembly for aarch64-none-elf:
f:
        add     x2, x0, 5
        stp     x29, x30, [sp, -16]!
        mov     x0, x2
        add     x29, sp, 0
        bl      strcpy
        cmp     x0, x0
        bne     .L5
        ldp     x29, x30, [sp], 16
        ret

This seems to start after "pro_and_epilogue" pass, which probably realizes strcpy returns 1st arg.
The dump of the pass contains the following insn:
(insn 13 36 14 2 (set (reg:CC 66 cc)
        (compare:CC (reg/f:DI 0 x0 [orig:73 _1 ] [73])
            (reg:DI 0 x0))) "foo2.c":3 393 {cmpdi}
     (nil))

Full dump: http://pastebin.com/sUmg09SK

Apparently, this is also observed on x86_64-unknown-linux-gnu:
f:
.LFB0:
        .cfi_startproc
        leaq    5(%rdi), %rdx
        subq    $8, %rsp
        .cfi_def_cfa_offset 16
        movq    %rdx, %rdi
        call    strcpy
        cmpq    %rax, %rax
        jne     .L5
        addq    $8, %rsp
        .cfi_remember_state
        .cfi_def_cfa_offset 8
        ret

The reason it started showing up this for strcat/strcat_chk is because r242745 changed attribute of BUILT_IN_STRCAT_CHK
and BUILT_IN_STRCAT from ATTR_NOTHROW_NONNULL_LEAF to ATTR_RET1_NOTHROW_NONNULL_LEAF, which is same as for strcpy.
Could you check if reverting that change (attached patch), doesn't cause the issue ?

Thanks,
Prathamesh
Comment 4 Jakub Jelinek 2016-11-29 09:16:55 UTC
The cmp %rax, %rax is just a missed optimization, because we manage to optimize it only so late that nothing cleans it up afterwards.  We could optimize this away already in GIMPLE, e.g. for SCCVN use the same VN for return value of pass-through builtin as the corresponding argument (if we don't do that already).
That doesn't explain the failure.
Comment 5 prathamesh3492 2016-11-29 10:00:18 UTC
(In reply to Jakub Jelinek from comment #4)
> The cmp %rax, %rax is just a missed optimization, because we manage to
> optimize it only so late that nothing cleans it up afterwards.  We could
> optimize this away already in GIMPLE, e.g. for SCCVN use the same VN for
> return value of pass-through builtin as the corresponding argument (if we
> don't do that already).
> That doesn't explain the failure.
Hmm yes, but I don't understand why should this patch cause the test to fail.
The patch makes following two separate changes:

a) to make gimple_stmt_nonzero_warnv_p (stmt) return true, if stmt is function-call, and the function returns one of it's argument and that argument is non-null. However this change doesn't make any difference to the .evrp
(and .optimized) dumps for strcat-chk.c    
because the patch expects SSA_VAR_P (arg) to be true, and the argument is MEM_REF.

b) The other change was to modify attribute of string builtins like STRCAT, STRNCAT, etc. from ATTR_NOTHROW_NONNULL_LEAF
to ATTR_RET1_NOTHROW_NONNULL_LEAF, which resulted in above code-gen difference, but as you mention it's a missed optimization
and not incorrect.

I would be grateful for suggestions on how to proceed.

Thanks,
Prathamesh
Comment 6 Richard Biener 2016-11-29 11:26:57 UTC
(In reply to prathamesh3492 from comment #5)
> (In reply to Jakub Jelinek from comment #4)
> > The cmp %rax, %rax is just a missed optimization, because we manage to
> > optimize it only so late that nothing cleans it up afterwards.  We could
> > optimize this away already in GIMPLE, e.g. for SCCVN use the same VN for
> > return value of pass-through builtin as the corresponding argument (if we
> > don't do that already).
> > That doesn't explain the failure.
> Hmm yes, but I don't understand why should this patch cause the test to fail.
> The patch makes following two separate changes:
> 
> a) to make gimple_stmt_nonzero_warnv_p (stmt) return true, if stmt is
> function-call, and the function returns one of it's argument and that
> argument is non-null. However this change doesn't make any difference to the
> .evrp
> (and .optimized) dumps for strcat-chk.c    
> because the patch expects SSA_VAR_P (arg) to be true, and the argument is
> MEM_REF.
> 
> b) The other change was to modify attribute of string builtins like STRCAT,
> STRNCAT, etc. from ATTR_NOTHROW_NONNULL_LEAF
> to ATTR_RET1_NOTHROW_NONNULL_LEAF, which resulted in above code-gen
> difference, but as you mention it's a missed optimization
> and not incorrect.
> 
> I would be grateful for suggestions on how to proceed.
>

As I said, look at the additional sources involved, like execute/builtins/lib/chk.c, and the effect on those.

> Thanks,
> Prathamesh
Comment 7 Richard Biener 2016-11-29 11:29:42 UTC
(In reply to Richard Biener from comment #6)
> (In reply to prathamesh3492 from comment #5)
> > (In reply to Jakub Jelinek from comment #4)
> > > The cmp %rax, %rax is just a missed optimization, because we manage to
> > > optimize it only so late that nothing cleans it up afterwards.  We could
> > > optimize this away already in GIMPLE, e.g. for SCCVN use the same VN for
> > > return value of pass-through builtin as the corresponding argument (if we
> > > don't do that already).
> > > That doesn't explain the failure.
> > Hmm yes, but I don't understand why should this patch cause the test to fail.
> > The patch makes following two separate changes:
> > 
> > a) to make gimple_stmt_nonzero_warnv_p (stmt) return true, if stmt is
> > function-call, and the function returns one of it's argument and that
> > argument is non-null. However this change doesn't make any difference to the
> > .evrp
> > (and .optimized) dumps for strcat-chk.c    
> > because the patch expects SSA_VAR_P (arg) to be true, and the argument is
> > MEM_REF.
> > 
> > b) The other change was to modify attribute of string builtins like STRCAT,
> > STRNCAT, etc. from ATTR_NOTHROW_NONNULL_LEAF
> > to ATTR_RET1_NOTHROW_NONNULL_LEAF, which resulted in above code-gen
> > difference, but as you mention it's a missed optimization
> > and not incorrect.
> > 
> > I would be grateful for suggestions on how to proceed.
> >
> 
> As I said, look at the additional sources involved, like
> execute/builtins/lib/chk.c, and the effect on those.

Btw, I am not seeing the failure on x86_64 either, so if you can't reproduce on aarch64 then ask Bin how to reproduce.

> > Thanks,
> > Prathamesh
Comment 8 amker 2016-11-29 13:58:50 UTC
Hi Prathamesh, I will check on it and get back to you.  Thanks for looking at this.
Comment 9 prathamesh3492 2016-11-29 17:49:31 UTC
(In reply to Richard Biener from comment #6)
> (In reply to prathamesh3492 from comment #5)
> > (In reply to Jakub Jelinek from comment #4)
> > > The cmp %rax, %rax is just a missed optimization, because we manage to
> > > optimize it only so late that nothing cleans it up afterwards.  We could
> > > optimize this away already in GIMPLE, e.g. for SCCVN use the same VN for
> > > return value of pass-through builtin as the corresponding argument (if we
> > > don't do that already).
> > > That doesn't explain the failure.
> > Hmm yes, but I don't understand why should this patch cause the test to fail.
> > The patch makes following two separate changes:
> > 
> > a) to make gimple_stmt_nonzero_warnv_p (stmt) return true, if stmt is
> > function-call, and the function returns one of it's argument and that
> > argument is non-null. However this change doesn't make any difference to the
> > .evrp
> > (and .optimized) dumps for strcat-chk.c    
> > because the patch expects SSA_VAR_P (arg) to be true, and the argument is
> > MEM_REF.
> > 
> > b) The other change was to modify attribute of string builtins like STRCAT,
> > STRNCAT, etc. from ATTR_NOTHROW_NONNULL_LEAF
> > to ATTR_RET1_NOTHROW_NONNULL_LEAF, which resulted in above code-gen
> > difference, but as you mention it's a missed optimization
> > and not incorrect.
> > 
> > I would be grateful for suggestions on how to proceed.
> >
> 
> As I said, look at the additional sources involved, like
> execute/builtins/lib/chk.c, and the effect on those.
Hi Richard,
It seems strcat-chk.c depends on lib/chk.c and lib/main.c.
For lib/chk.c, I verified that there is no difference in .optimized dump
and slim .dfinish dump with and without patch on aarch64-none-elf.
lib/main.c just contains definitions of main() which calls main_test() and link_error(), so didn't check that.

Thanks,
Prathamesh
> 
> > Thanks,
> > Prathamesh
Comment 10 prathamesh3492 2016-11-29 19:05:27 UTC
On a related note, Jim told me he is seeing following failures
on aarch64-none-elf before and after updating the tree.

FAIL: gcc.c-torture/execute/builtins/memset-chk.c execution,  -O2
-flto -fuse-linker-plugin -fno-fat-lto-objects
FAIL: gcc.c-torture/execute/builtins/snprintf-chk.c execution,  -O2
-flto -fuse-linker-plugin -fno-fat-lto-objects
FAIL: gcc.c-torture/execute/builtins/vsnprintf-chk.c execution,  -O2
-flto -fuse-linker-plugin -fno-fat-lto-objects

However he isn't seeing the failure for strcat-chk.c before or after.
Before he updated the tree, it was older than 23rd Nov, which didn't
contain r242745. Perhaps the issue is sth else ?
Bin, are you able to see the above test-cases FAIL too ?

Thanks,
Prathamesh
Comment 11 Jim Wilson 2016-11-30 05:57:51 UTC
FYI I'm using the gdb simulator to run the testcases.
Comment 12 Andrew Pinski 2016-11-30 06:13:35 UTC
Fails only on -elf targets for me. That is it passes for aarch64-linux-gnu but not for aarch64-elf. My testsuite go to the mailing list if anyone wants to look into them.
Comment 14 Jim Wilson 2016-12-01 07:45:21 UTC
I debugged one of my gcc.c-torture/execute/builtins failures, and found a bug in the gdb aarch64 simulator with the FP stur instruction support.  I submitted a patch to gdb-patches.  With that fix, all of the tests in this dir are passing for me now.  Hence, I am still unable to reproduce this failure with my build tree.

https://sourceware.org/ml/gdb-patches/2016-12/msg00018.html
Comment 15 Andrew Pinski 2016-12-01 07:55:08 UTC
I should note I run these tests using the installed compiler rather than directly from the build directory
Comment 16 prathamesh3492 2016-12-01 10:09:55 UTC
(In reply to Andrew Pinski from comment #15)
> I should note I run these tests using the installed compiler rather than
> directly from the build directory
Hi Andrew,
Could you please check if you can reproduce strcat-chk.c failure
before r242745 ? Unfortunately I am not able to reproduce it locally.
Thanks!
Comment 17 Jim Wilson 2016-12-05 05:18:52 UTC
I still haven't been able to reproduce this, but I do see a problem.

In the original bug report, the only difference is that the code uses x4 in the first part of the diff, and x24 in the second part of the diff, which seems unimportant.  However, this value lives across a call to memcpy.  x24 is a safe register here because it is callee saved.  x4 is not safe though, as it is an argument passing/return value register, which may be clobbered by a call.  Whether it gets clobbered depends on the memcpy implementation that is linked with.  If people are linking with different memcpy implementations, that might affect whether the bug is reproducible.

Disassembling my testcase, I don't see the same code sequence though.  I see
  401530:       d2800802        mov     x2, #0x40                       // #64
  401534:       52800b01        mov     w1, #0x58                       // #88
  401538:       aa1303e0        mov     x0, x19
  40153c:       940000d1        bl      401880 <memset>
  401540:       9121c324        add     x4, x25, #0x870
  401544:       91001663        add     x3, x19, #0x5                           
which is OK, because the "add x3, x19, #0x5" instruction comes after the memset call.

Maybe there is something subtly different about how I'm configuring or building the toolchain that results in the different LTO optimized code.
Comment 18 amker 2016-12-12 17:51:28 UTC
(In reply to Jim Wilson from comment #17)
> I still haven't been able to reproduce this, but I do see a problem.
> 
> In the original bug report, the only difference is that the code uses x4 in
> the first part of the diff, and x24 in the second part of the diff, which
> seems unimportant.  However, this value lives across a call to memcpy.  x24
> is a safe register here because it is callee saved.  x4 is not safe though,
> as it is an argument passing/return value register, which may be clobbered
> by a call.  Whether it gets clobbered depends on the memcpy implementation
> that is linked with.  If people are linking with different memcpy
> implementations, that might affect whether the bug is reproducible.
> 
> Disassembling my testcase, I don't see the same code sequence though.  I see
>   401530:       d2800802        mov     x2, #0x40                       //
> #64
>   401534:       52800b01        mov     w1, #0x58                       //
> #88
>   401538:       aa1303e0        mov     x0, x19
>   40153c:       940000d1        bl      401880 <memset>
>   401540:       9121c324        add     x4, x25, #0x870
>   401544:       91001663        add     x3, x19, #0x5                       
> 
> which is OK, because the "add x3, x19, #0x5" instruction comes after the
> memset call.
> 
> Maybe there is something subtly different about how I'm configuring or
> building the toolchain that results in the different LTO optimized code.

Hi Jim,
I think that's the problem, Wilco also noticed that use of x4 is bogus.  It could be a RA bug triggered by this change though.  I will double check that later.  Thanks.
Comment 19 James Greenhalgh 2016-12-13 12:17:39 UTC
That this only happens for LTO builds is the key here.

gcc/testsuite/gcc.c-torture/execute/builtins/strcat-chk-lib.c includes gcc/testsuite/gcc.c-torture/execute/builtins/lib/chk.c which defines a memset.

In an LTO build, that definition of memset is visible, here is what it looks like after code generation:

memset:
	sub	x3, x2, #1
	cbz	x2, .L158
	and	w1, w1, 255
	.p2align 2
.L159:
	strb	w1, [x0, x3]
	sub	x3, x3, #1
	cmn	x3, #1
	bne	.L159
.L158:
	adr	x1, .LANCHOR0
	ldr	w1, [x1, 2096]
	ret

Clearly x4 is not going to be clobbered there, so ipa-ra can assume it isn't used and is safe to clobber.

At final link time, on aarch64-none-elf builds with specs files that are pulling in start-up and tear-down code which uses a library memset, the linker may pull in the library memset before it sees the memset from lib/chk.c .

That would be an error:

/tmp/ccpefK3l.ltrans0.ltrans.o: In function `memset':
<artificial>:(.text+0x4a0): multiple definition of `memset'
.../aarch64-none-elf/lib/libc.a(lib_a-memset.o): .../newlib/libc/machine/aarch64/memset.S:90: first defined here

Were it not for the flag added to resolve PR55994 -Wl,--allow-multiple-definition .

So, in my opinion, the testcase is broken and could always have failed in this way. The combination of register allocation, LTO and order the linker sees symbols explains why this is hard to reproduce.
Comment 20 Jakub Jelinek 2016-12-13 12:33:23 UTC
Unless you do something very nasty in the spec files (in which case you should just avoid those tests), the user specified objects should always appear before stuff coming from -lc unless -lc is specified first on the command line.
Comment 21 amker 2016-12-13 18:04:29 UTC
(In reply to Jakub Jelinek from comment #20)
> Unless you do something very nasty in the spec files (in which case you
> should just avoid those tests), the user specified objects should always
> appear before stuff coming from -lc unless -lc is specified first on the
> command line.

The link command is like:
/.../collect2 -plugin /.../liblto_plugin.so "-plugin-opt=/.../lto-wrapper" "-plugin-opt=-fresolution=strcat-chk.res" "-plugin-opt=-pass-through=-lgcc" "-plugin-opt=-pass-through=-lc" "-plugin-opt=-pass-through=-lrdimon" "-plugin-opt=-pass-through=-lgcc" -flto -Ttext-segment 0x80000000 -EL -X -maarch64elf -o a.exe /.../crti.o /.../crtbegin.o /.../../../../../aarch64-none-elf/lib/crt0.o -L/... -L/.../../lib/gcc -L/.../../../../../aarch64-none-elf/lib strcat-chk.o strcat-chk-lib.o main.o --allow-multiple-definition -lm -Map a.map -lgcc /.../../../../../aarch64-none-elf/lib/cpu-init/rdimon-aem-el3.o --start-group -lc -lrdimon --end-group -lgcc /.../crtend.o /.../crtn.o

libraries goes after object files in this case.  With link map file as:
Archive member included to satisfy reference by file (symbol)

/.../../../../../aarch64-none-elf/lib/libc.a(lib_a-atexit.o)
                              /.../../../../../aarch64-none-elf/lib/crt0.o (atexit)
/.../../../../../aarch64-none-elf/lib/libc.a(lib_a-exit.o)
                              /.../../../../../aarch64-none-elf/lib/crt0.o (exit)
/.../../../../../aarch64-none-elf/lib/libc.a(lib_a-fini.o)
                              /.../../../../../aarch64-none-elf/lib/crt0.o (__libc_fini_array)
/.../../../../../aarch64-none-elf/lib/libc.a(lib_a-impure.o)
                              /.../../../../../aarch64-none-elf/lib/libc.a(lib_a-exit.o) (_global_impure_ptr)
/.../../../../../aarch64-none-elf/lib/libc.a(lib_a-init.o)
                              /.../../../../../aarch64-none-elf/lib/crt0.o (__libc_init_array)
/.../../../../../aarch64-none-elf/lib/libc.a(lib_a-memset.o)
                              /.../../../../../aarch64-none-elf/lib/crt0.o (memset)

It looks like library version memset is pulled in because of crt0.o and zeroing out BSS section.

Shouldn't linker first parse all imports/exports of under provided compilation units, then search for undefined symbols in libraries?
Comment 22 Renlin Li 2017-01-06 16:37:35 UTC
(In reply to James Greenhalgh from comment #19)

> That would be an error:
> 
> /tmp/ccpefK3l.ltrans0.ltrans.o: In function `memset':
> <artificial>:(.text+0x4a0): multiple definition of `memset'
> .../aarch64-none-elf/lib/libc.a(lib_a-memset.o):
> .../newlib/libc/machine/aarch64/memset.S:90: first defined here
> 
> Were it not for the flag added to resolve PR55994
> -Wl,--allow-multiple-definition .
> 
> So, in my opinion, the testcase is broken and could always have failed in
> this way. The combination of register allocation, LTO and order the linker
> sees symbols explains why this is hard to reproduce.

I had exactly the same errors and issues today.
I reduced it to a minimum test case. Please check the new attachment

The build command line is:
aarch64-none-elf-gcc -O2 -specs=aem-ve.specs -Wl,--allow-multiple-definition -lm -flto main.c memset.c  -o new.exe

The expected output should be "A A A 2"

0000000080001038 <main>:
    80001038:	a9bf7bfd 	stp	x29, x30, [sp,#-16]!
    8000103c:	90000123 	adrp	x3, 80025000 <__global_locale+0x68>
    80001040:	52800044 	mov	w4, #0x2                   	// #2
    80001044:	91060060 	add	x0, x3, #0x180
    80001048:	910003fd 	mov	x29, sp
    8000104c:	b9018064 	str	w4, [x3,#384]
    80001050:	d2800402 	mov	x2, #0x20                  	// #32
    80001054:	52800821 	mov	w1, #0x41                  	// #65
    80001058:	91002000 	add	x0, x0, #0x8

# At this function entry, x4 is not saved. Because LTO thinks the local memset
# implementation will not clobber it. However, the libc version of memeset is
# linked in the final binary. The implementation there will clobber x4. This
# will cause run-time data corruption, which is shown here.

    8000105c:	94000a39 	bl	80003940 <memset>
    80001060:	a8c17bfd 	ldp	x29, x30, [sp],#16
    80001064:	52800823 	mov	w3, #0x41                  	// #65
    80001068:	90000080 	adrp	x0, 80011000 <__swbuf_r+0x70>
    8000106c:	2a0303e2 	mov	w2, w3
    80001070:	2a0303e1 	mov	w1, w3
    80001074:	91152000 	add	x0, x0, #0x548
    80001078:	140015c0 	b	80006778 <printf>
    8000107c:	00000000 	.inst	0x00000000 ; undefined



This is mentioned above. But allow me to ask again:
"aarch64-none-elf-gcc -O2  main.c memset.c  -o new.o -specs=aem-ve.specs -lm -flto"
will give the "multiple definition of `memset'" error
while
"aarch64-none-elf-gcc -O2  main.c memset.c  -o new.o -specs=aem-ve.specs -lm" won't.

Should them behavior the same? By adding "-Wl,--allow-multiple-definition" do fix this erro. But why it's the test case that is broken instead of the lto pass?
Comment 23 Renlin Li 2017-01-06 16:38:42 UTC
Created attachment 40472 [details]
test case
Comment 24 Renlin Li 2017-01-06 16:39:02 UTC
Created attachment 40473 [details]
memset.c
Comment 25 Renlin Li 2017-01-06 16:42:07 UTC
Created attachment 40474 [details]
reduced objdump assembler file
Comment 26 Jim Wilson 2017-01-07 03:23:41 UTC
I can reproduce the problem with this new reduced testcase.  I don't have knowledge of all of the details how the gcc implementation of LTO works, but my understanding goes something like this.

The testcase is defining memset.  Memset is a reserved identifier (ISO C section 7.1.3).  So the testcase is violating the ISO C standard.

The memset definition in the input source is LTO optimized, which means it gets turned into an LTO symbol that is not immediately visible to the linker.  The linker sees the startfiles first, and then the LTO optimized files.  The crt0.o file has a reference to memset.  The linker can't use the LTO optimized memset to satisfy this reference, because it is still a special LTO symbol not a normal symbol.  So it pulls in memset from the C library to satisfy the reference.  The linker then looks at the LTO optimized files, calls the compiler to convert them from LTO symbols into normal symbols, and then discovers that it has two memset functions, and because --allow-multiple-definitions is given, it arbitrarily discards one instead of giving an error.  Since the memset from libc.a was already linked in, I don't think it has much choice, and has to drop the new one from LTO.

The main function was LTO optimized expecting that it would call the LTO optimized memset function, which uses a restricted register set, thus allowing main to use normally call clobbered registers across the memset call.  However, if the linker drops the LTO optimized memset, then we get a call to the library memset, which clobbers all call clobbered regs, and the main function fails.

The problem is primarily with the testcase, and secondarily with how LTO works.

If the LTO link converted the LTO symbols into normal symbols before trying to resolve unsatisfied symbols from crt0.o, that would appear to solve the problem.  This would require changes to the linker, changes that may or may not be reasonable.

If crt0.o didn't call memset, that would solve the problem, but that isn't a very reasonable solution.  There may also be issues with other standard library functions and/or other start files.

If the newlib C library was built as a shared library that would appear to solve the problem, as the LTO symbol would override the shared library symbol, but that isn't a reasonable solution for an embedded target.

The easy solution is to stop running this test on embedded targets that don't have a shared C library.
Comment 27 James Greenhalgh 2017-01-09 11:06:17 UTC
(In reply to Jim Wilson from comment #26)
> I can reproduce the problem with this new reduced testcase.  I don't have
> knowledge of all of the details how the gcc implementation of LTO works, but
> my understanding goes something like this.

That fits with my understanding of this test failure when I looked at it in December.
Comment 28 Richard Biener 2017-01-13 11:43:26 UTC
From the description it might be worth to check the behavior when using gold vs. BFD.  But yes, symbols from the standard runtime have long been a problem with LTO and the way we do linking "twice".  OTOH the linker _does_ see the memset
symbol but it seems to ignore it?  And AFAIK the resolution we get from the linker isn't able to tell us that there's another provider of a symbol.
Comment 29 Jan Hubicka 2017-01-25 10:21:55 UTC
I think the official answer for LTOing symbols implementing runtime (i.e. those that can be introduced by middle-end) is "don't do that". We may support LTOing them with an explicit attribute "used" on them or, if we get really into LTOing runtime libraries, somehow auto generating used attributes by having some central way of saying if symbol is implementing runtime or not. We don't have this and we have many different runtime calls (libgcc, decnumber, standard library symbols like printf->putc translation and many more).
Comment 30 Richard Biener 2017-03-14 11:36:19 UTC
Kind of SUSPENDED I guess.  Or we need to fix this testsuite part (in the past we just added more and more flag workarounds).
Comment 31 Jakub Jelinek 2017-03-22 17:15:47 UTC
So, is there a way to request -fno-lto compilation of the gcc.c-torture/execute/builtins/lib/*.c TUs?  The rest probably can stay -flto when testing lto.
Comment 32 Jakub Jelinek 2017-03-22 17:22:47 UTC
It is actually the *-lib.c files that should be -fno-lto.
But it seems all the source files are compiled/linked using one driver invocation, so I have no idea how to do -fno-lto just for one file.
Comment 33 rguenther@suse.de 2017-03-23 08:27:12 UTC
On Wed, 22 Mar 2017, jakub at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78529
> 
> --- Comment #32 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
> It is actually the *-lib.c files that should be -fno-lto.
> But it seems all the source files are compiled/linked using one driver
> invocation, so I have no idea how to do -fno-lto just for one file.

Somehow change builtins.exp from

foreach src [lsort [find $srcdir/$subdir *.c]] {
    if {![string match *-lib.c $src] && [runtest_file_p $runtests $src]} {
        c-torture-execute [list $src \
                                [file root $src]-lib.c \
                                $srcdir/$subdir/lib/main.c] \
                                $additional_flags

to first compile [file root $src]-lib.c to [file root $src]-lib.o
and then do

        c-torture-execute [list $src \
                                [file root $src]-lib.o \
                                $srcdir/$subdir/lib/main.c] \
                                $additional_flags

?  Using gcc_target_compile in some way, like

	gcc_target_compile [file root $src]-lib.c [file root $src]-lib.o
Comment 34 Ramana Radhakrishnan 2017-03-28 09:28:25 UTC
This is really a testsuite issue.
Comment 35 Richard Biener 2017-03-31 07:49:51 UTC
And not really a regression thus.  Either INVALID or leave it open as testsuite enhancement.