Bug 48308 - [4.6 Regression] crosscompiling to arm fails with assembler: can't resolve '.LC4' {.rodata.str1.1 section} - '.LPIC4' {*UND* section}
Summary: [4.6 Regression] crosscompiling to arm fails with assembler: can't resolve '....
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 4.6.0
: P2 normal
Target Milestone: 4.6.4
Assignee: Ramana Radhakrishnan
URL:
Keywords: assemble-failure, patch
: 56561 (view as bug list)
Depends on:
Blocks:
 
Reported: 2011-03-28 02:30 UTC by Gerhard Heift
Modified: 2013-04-05 08:32 UTC (History)
14 users (show)

See Also:
Host:
Target: arm-unknown-linux-uclibcgnueabi
Build:
Known to work: 4.4.5, 4.5.2, 4.7.2, 4.8.0
Known to fail: 4.6.0
Last reconfirmed: 2013-03-07 12:46:35


Attachments
reduced testcase (1.46 KB, application/octet-stream)
2011-03-28 02:30 UTC, Gerhard Heift
Details
generated assembler (1001 bytes, application/octet-stream)
2011-03-28 12:34 UTC, Gerhard Heift
Details
reduced^2 test case (261 bytes, text/plain)
2011-03-28 22:57 UTC, Mikael Pettersson
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Gerhard Heift 2011-03-28 02:30:01 UTC
Created attachment 23786 [details]
reduced testcase

Hello,

I tried to crosscompile openssl to arm and it stopped with the following error message:

{standard input}: Assembler messages:
{standard input}:113: Error: can't resolve `.LC4' {.rodata.str1.1 section} - `.LPIC4' {*UND* section}

This happens only if I try to compile the file with "-Os -fPIC". 

/data/br-build/host/usr/bin/arm-unknown-linux-uclibcgnueabi-gcc -fPIC -Os -c pkeyparam.out.i

Output of arm-unknown-linux-uclibcgnueabi-gcc -v:

COLLECT_GCC=/data/br-build/host/usr/bin/arm-unknown-linux-uclibcgnueabi-gcc
COLLECT_LTO_WRAPPER=/data/br-build/host/usr/libexec/gcc/arm-unknown-linux-uclibcgnueabi/4.6.0/lto-wrapper
Target: arm-unknown-linux-uclibcgnueabi
Configured with: /data/br-build/toolchain/gcc-4.6.0/configure --prefix=/home/data/br-build/host/usr --build=x86_64-unknown-linux-gnu --host=x86_64-unknown-linux-gnu --target=arm-unknown-linux-uclibcgnueabi --enable-languages=c --with-sysroot=/data/br-build/host/usr/arm-unknown-linux-uclibcgnueabi/sysroot --with-build-time-tools=/data/br-build/host/usr/arm-unknown-linux-uclibcgnueabi/bin --disable-__cxa_atexit --enable-target-optspace --with-gnu-ld --disable-libssp --disable-multilib --disable-tls --enable-shared --with-gmp=/data/br-build/host/usr --with-mpfr=/data/br-build/host/usr --with-mpc=/data/br-build/host/usr --enable-threads --disable-decimal-float --with-float=soft --with-abi=aapcs-linux --with-pkgversion='Buildroot 2011.05-git-dirty' --with-bugurl=http://bugs.buildroot.net/
Thread model: posix
gcc version 4.6.0 (Buildroot 2011.05-git-dirty)

Output of arm-unknown-linux-uclibcgnueabi-as -version:

GNU assembler (GNU Binutils) 2.21
Copyright 2010 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License version 3 or later.
This program has absolutely no warranty.
This assembler was configured for a target of `arm-unknown-linux-uclibcgnueabi'

Regards,
  Gerhard
Comment 1 Mikael Pettersson 2011-03-28 07:47:41 UTC
Works for me on armv5tel-linux-gnueabi with gcc-4.6-20110325 (r171529) and binutils-2.20.1 (heavily updated).  Please attach the intermediate assembly file.
Comment 2 Gerhard Heift 2011-03-28 12:34:31 UTC
Created attachment 23791 [details]
generated assembler

On line 114, the generates assembler code refers to .LPIC4, which does not exists.
Comment 3 Mikael Pettersson 2011-03-28 13:09:15 UTC
(In reply to comment #2)
> Created attachment 23791 [details]
> generated assembler
> 
> On line 114, the generates assembler code refers to .LPIC4, which does not
> exists.

Indeed.  If I compile with -mcpu=arm9tdmi as your assembly file indicates then I lose several lines of code, including the .LPIC4 label and a strcmp() call, but the reference to .LPIC4 remained.  Normally I have -march=armv5te -mtune=xscale, and in that case the .LPIC4 label and surrounding code is not lost.

Works(*) with gcc-4.4.5 and 4.5.2, so it's a regression.

(*) Had to eliminate some apparent C1X-isms from the test case though.
Comment 4 Mikael Pettersson 2011-03-28 22:24:05 UTC
It's triggered by r163998:

Author: matz
Date: Wed Sep  8 12:34:52 2010
New Revision: 163998

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=163998
Log:
	PR tree-optimization/33244
	* tree-ssa-sink.c (statement_sink_location): Don't sink into
	empty loop latches.
Comment 5 Mikael Pettersson 2011-03-28 22:57:21 UTC
Created attachment 23793 [details]
reduced^2 test case
Comment 6 Ian Bolton 2011-03-29 15:23:12 UTC
Using trunk, r171422, I have compiled the reduced test case as follows:

arm-none-linux-gnueabi-gcc -fPIC -Os -c pr48308.c -mcpu=arm9tdmi -marm
pr48308.s: Assembler messages:
pr48308.s:97: Error: can't resolve `.LC4' {.rodata.str1.1 section} - `.LPIC4' {*UND* section}

With --save-temps, you can see the missing .LPIC4 in the .s.

This is therefore confirmed in trunk as well.

FYI: the problem doesn't happen with -mthumb.
Comment 7 Michael K. Edwards 2011-06-22 15:14:06 UTC
I hit this with Linaro GCC 4.6 (4.6.1-based) and the same pkeyparam.c from OpenSSL.  I am also compiling with -Os and -fPIC, and implicitly with -mthumb (the default in my toolchain); so it's not specific to ARM mode.

The situation appears to be that two pc-relative fetches (artifacts of -fPIC and string literals) get folded together, losing one of the labels (needed for calculation of the offset in the table of PIC indirections).

Reverting http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=163998 makes the problem go away, at least at compile time; I should be able to run a test suite soon.
Comment 8 Ramana Radhakrishnan 2011-06-24 09:15:08 UTC
Not sure if this is a target or tree-optimization issue yet.

Ramana
Comment 9 Andrew Pinski 2011-09-07 07:53:23 UTC
(In reply to comment #8)
> Not sure if this is a target or tree-optimization issue yet.

The tree level change just exposed the issue really.  I think you need to look at the RTL level to see where the label disappears.
Comment 10 Ramana Radhakrishnan 2011-09-13 06:31:39 UTC
*** Bug 50313 has been marked as a duplicate of this bug. ***
Comment 11 Jakub Jelinek 2011-10-26 17:14:01 UTC
GCC 4.6.2 is being released.
Comment 12 Madhu 2011-11-02 11:37:07 UTC
Please tell me this bug is fixed in gcc 4.6.2 release? If this is fixed in gcc4.6.2 release can I use the fix patch into gcc 4.6-linaro?

Thanks & Regards,
Madhu
Comment 13 Ramana Radhakrishnan 2011-11-02 11:40:44 UTC
(In reply to comment #12)
> Please tell me this bug is fixed in gcc 4.6.2 release? If this is fixed in
> gcc4.6.2 release can I use the fix patch into gcc 4.6-linaro?
> 
> Thanks & Regards,
> Madhu

This has not been fixed yet. That's evident from the status of the bug. 


Ramana
Comment 14 Jakub Jelinek 2011-12-22 16:05:52 UTC
Note, can't be reproduced on the trunk, the strcmp isn't DCEd there, but guess the problem is just latent there.

It looks like a target bug to me.  Before RTL loop opts we have:
(insn 91 90 92 13 (set (reg:SI 167)
        (unspec:SI [
                (const:SI (unspec:SI [
                            (symbol_ref/v/f:SI ("*.LC4") [flags 0x82]  <var_decl 0x7f5ebb0a5500 *.LC4>)
                            (const:SI (plus:SI (unspec:SI [
                                            (const_int 4 [0x4])
                                        ] 21)
                                    (const_int 8 [0x8])))
                        ] 27))
            ] 3)) pr48308.i:228 170 {pic_load_addr_32bit}
     (nil))

(insn 92 91 94 13 (set (reg:SI 167)
        (unspec:SI [
                (reg:SI 167)
                (const_int 8 [0x8])
                (const_int 4 [0x4])
            ] 4)) pr48308.i:228 173 {pic_add_dot_plus_eight}
     (expr_list:REG_EQUAL (symbol_ref/v/f:SI ("*.LC4") [flags 0x82]  <var_decl 0x7f5ebb0a5500 *.LC4>)
        (nil)))

and the pseudo 167 is then used to load one of the strcmp parameters.
Then (probably loop invariant motion) moves insn 91 before the loop, as it looks to be loop invariant, but insn 92 is kept in the loop.
Next during RA, the register pressure is high and thus pseudo 167 is spilled, so
before the loop there is a store.  Then during the *.csa pass DCE is performed and the strcmp is removed, which means insn 92 is removed as well, but the store before the loop of course is kept.  And there is no further DSE pass that would optimize that (now dead) store away.
So, IMHO arm_reorg needs to handle this case, find out what minipool entries don't have the corresponding UNSPEC_PIC_BASE insn and handle them somehow (either by emitting there a dummy 0 or similar, or trying to replace the insn with UNSPEC_PIC_SYM with something else, ...).
That said, perhaps it would be nice to help the loop optimizers somehow figure out that even the UNSPEC_PIC_BASE is loop invariant (wrap it into CONST?).
Comment 15 Michael Hope 2011-12-22 16:07:46 UTC
I'm away of holiday until the 4th of January.  For anything urgent,
please contact kiko@linaro.org.

-- Michael
Comment 16 Jakub Jelinek 2012-01-05 17:48:11 UTC
Ramana (or some other ARM maintainer), could you please try to implement this?  Thanks.
Comment 17 Richard Earnshaw 2012-01-06 16:02:49 UTC
I think it would be better to just generate a single insn early on and then to split it once the loop optimizers have had a chance to run.  Then the optimizers should be able to throw the whole expression away if it's not needed and hoist it as a whole if it's inside a loop.

Splitting it before sched1 would be ideal, but the split could be delayed until after reload if necessary.
Comment 18 Ramana Radhakrishnan 2012-01-11 18:15:05 UTC
(In reply to comment #14)
> Note, can't be reproduced on the trunk, the strcmp isn't DCEd there, but guess
> the problem is just latent there.
> 
> It looks like a target bug to me.  Before RTL loop opts we have:
> (insn 91 90 92 13 (set (reg:SI 167)
>         (unspec:SI [
>                 (const:SI (unspec:SI [
>                             (symbol_ref/v/f:SI ("*.LC4") [flags 0x82] 
> <var_decl 0x7f5ebb0a5500 *.LC4>)
>                             (const:SI (plus:SI (unspec:SI [
>                                             (const_int 4 [0x4])
>                                         ] 21)
>                                     (const_int 8 [0x8])))
>                         ] 27))
>             ] 3)) pr48308.i:228 170 {pic_load_addr_32bit}
>      (nil))
> 
> (insn 92 91 94 13 (set (reg:SI 167)
>         (unspec:SI [
>                 (reg:SI 167)
>                 (const_int 8 [0x8])
>                 (const_int 4 [0x4])
>             ] 4)) pr48308.i:228 173 {pic_add_dot_plus_eight}
>      (expr_list:REG_EQUAL (symbol_ref/v/f:SI ("*.LC4") [flags 0x82]  <var_decl
> 0x7f5ebb0a5500 *.LC4>)
>         (nil)))
> 
> and the pseudo 167 is then used to load one of the strcmp parameters.
> Then (probably loop invariant motion) moves insn 91 before the loop, as it
> looks to be loop invariant, but insn 92 is kept in the loop.
> Next during RA, the register pressure is high and thus pseudo 167 is spilled,
> so
> before the loop there is a store.  Then during the *.csa pass DCE is performed
> and the strcmp is removed, which means insn 92 is removed as well, but the
> store before the loop of course is kept.  And there is no further DSE pass that
> would optimize that (now dead) store away.
> So, IMHO arm_reorg needs to handle this case, find out what minipool entries
> don't have the corresponding UNSPEC_PIC_BASE insn and handle them somehow
> (either by emitting there a dummy 0 or similar, or trying to replace the insn
> with UNSPEC_PIC_SYM with something else, ...).
> That said, perhaps it would be nice to help the loop optimizers somehow figure
> out that even the UNSPEC_PIC_BASE is loop invariant (wrap it into CONST?).


Found it I think- the problem is way before any of this . A transformation 
in combine looks suspicious. 

The first clue to all this was that the compiler was removing a call of strcmp with .LC4 which is the constant string "-noout" . If you look at the testcase, there is a use of noout afterwards which is to call :

   if (!noout)   
     PEM_write_bio_Parameters(out,pkey);

However the condition 

 else if (strcmp(*args,"-text") == 0)    
       text=1;

has a value of text set to 1. 

However given text is not used later that is the only call 
to strcmp that can be removed safely. However the strcmp of "-text" is retained in the final code generated but the call to strcmp of "-noout" is the one that's getting eliminated. 

Here's the suspicious transformation after combine: 


(insn 97 96 188 13 (set (reg:SI 168)
        (reg:SI 0 r0)) ./t.i:41 624 {*arm_movsi_vfp}
     (expr_list:REG_DEAD (reg:SI 0 r0)
        (nil)))

(insn 188 97 189 13 (set (reg:CC 24 cc)
        (compare:CC (reg:SI 168)
            (const_int 0 [0]))) ./t.i:41 198 {*arm_cmpsi_insn}
     (nil))

(insn 189 188 191 13 (set (reg/v:SI 136 [ badarg ])
        (if_then_else:SI (eq (reg:CC 24 cc)
                (const_int 0 [0]))
            (reg:SI 168)
            (const_int 0 [0]))) ./t.i:41 220 {*movsicc_insn}
     (expr_list:REG_DEAD (reg:SI 168)
        (nil)))

(insn 191 189 153 13 (set (reg/v:SI 135 [ noout ])
        (if_then_else:SI (ne (reg:CC 24 cc)
                (const_int 0 [0]))
            (reg/v:SI 135 [ noout ])
            (const_int 1 [0x1]))) ./t.i:41 220 {*movsicc_insn}
     (expr_list:REG_DEAD (reg:CC 24 cc)
        (nil)))
 

is transformed to :

(note 97 96 188 13 NOTE_INSN_DELETED)

(note 188 97 189 13 NOTE_INSN_DELETED)

(insn 189 188 191 13 (set (reg/v:SI 136 [ badarg ])
        (const_int 0 [0])) ./t.i:41 624 {*arm_movsi_vfp}
     (nil))

(insn 191 189 153 13 (parallel [
            (set (reg/v:SI 135 [ noout ])
                (if_then_else:SI (ne (reg:SI 168)
                        (const_int 0 [0]))
                    (reg/v:SI 135 [ noout ])
                    (const_int 1 [0x1])))
            (clobber (reg:CC 24 cc))
        ]) ./t.i:41 273 {movcond}
     (expr_list:REG_UNUSED (reg:CC 24 cc)
        (expr_list:REG_DEAD (reg:SI 168)
            (nil))))


Look at the use of reg:SI 168  in insn 191 rather than reg:SI r0 . That's where it all goes pear shaped as we've now used reg:SI 168 which we've not defined before. The other bits of info from .combine is as below.  

Trying 97 -> 188:
Successfully matched this instruction:
(parallel [
        (set (reg:CC 24 cc)
            (compare:CC (reg:SI 0 r0)
                (const_int 0 [0])))
        (set (reg:SI 168)
            (reg:SI 0 r0))
    ])

and then further trying to combine insn 188 into insn 189 we get the following 


Trying 188 -> 189:
Failed to match this instruction:
(parallel [
        (set (reg/v:SI 136 [ badarg ])
            (const_int 0 [0]))
        (set (reg:CC 24 cc)
            (compare:CC (reg:SI 168)
                (const_int 0 [0])))
    ])
Failed to match this instruction:
(parallel [
        (set (reg/v:SI 136 [ badarg ])
            (const_int 0 [0]))
        (set (reg:CC 24 cc)
            (compare:CC (reg:SI 168)
                (const_int 0 [0])))
    ])
Successfully matched this instruction:
(set (reg:CC 24 cc)
    (compare:CC (reg:SI 168)
        (const_int 0 [0])))
Successfully matched this instruction:
(set (reg/v:SI 136 [ badarg ])
    (const_int 0 [0]))
deferring deletion of insn with uid = 188.
modifying insn i2   188 cc:CC=cmp(r168:SI,0)
      REG_DEAD: r168:SI
deferring rescan insn with uid = 188.
modifying insn i3   189 r136:SI=0
deferring rescan insn with uid = 189.
 
 I still need to debug through try_combine to figure out where everything goes pear shaped but that's my understanding on this bug as of today. 

Ofcourse, the simple workaround is to simply disable the transformation in this form in the ARM backend (*movsi_compare0) but I don't want to do that before I've finished stepping through combine. 

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 794b865..01c2a87 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -5380,7 +5380,7 @@
                    (const_int 0)))
    (set (match_operand:SI 0 "s_register_operand" "=r,r")
        (match_dup 1))]
-  "TARGET_32BIT"
+  "TARGET_32BIT && 0"
   "@
    cmp%?\\t%0, #0
    sub%.\\t%0, %1, #0"
Comment 19 Ramana Radhakrishnan 2012-01-13 09:07:48 UTC
(In reply to comment #14)
> Note, can't be reproduced on the trunk, the strcmp isn't DCEd there, but guess
> the problem is just latent there.

Latent still in trunk with the testcase - you can try it with -fdbg-cnt=cprop:0 with the first reduced testcase and you should be able to see the same behaviour with trunk. The other testcase which is in PR50313 also fails with the same error on trunk (with -Os -fPIC -mcpu=arm9tdmi -marm )

The problem with reduced^2 testcase or reduced testcase appears to get fixed if we add LOG_LINKS to keep track of the dummy i1 in the form of a patch of the following nature.  

combine.c: try_combine attempts to create a dummy i1 instruction in case i1 is NULL and i2 looks like 

	(parallel [(set (reg:CC X) (compare:CC OP (const_int 0)))
		   (set Y OP)])

which is exactly how the intermediate instruction looks 

(parallel [
        (set (reg:CC 24 cc)
            (compare:CC (reg:SI 0 r0)
                (const_int 0 [0])))
        (set (reg:SI 168)
            (reg:SI 0 r0))
    ])

i1 now becomes 

  (set (reg:SI 168)
            (reg:SI 0 r0))

and i2 becomes 

    (set (reg:CC 24 cc)
            (compare:CC (reg:SI 0 r168)
                (const_int 0 [0])))

but there are no LOG_LINKS to indicate that even when we've created the dummy i1 - i2 actually feeds into i1. After changing try_combine[1] to create a LOG_LINKS to indicate this relationship between i1 and i2 we no longer eliminate the wrong strcmp call and therefore this should IMHO be the handled in combine rather than faffing around in the backend.

However this doesn't fix the testcase from PR50313 (gmime.i) which prima-facie appeared to exhibit the same behaviour as this particular testcase and that needs reopening and further investigation to be sure that the IR is valid when cfgcleanup decides to eliminate the PIC insn in.


cheers
Ramana



diff --git a/gcc/combine.c b/gcc/combine.c
index 4178870..f6b8769 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -2865,6 +2865,8 @@ try_combine (rtx i3, rtx i2, rtx i1, rtx i0, int *new_direct_jump_p,
          SUBST (PATTERN (i2), XVECEXP (PATTERN (i2), 0, 0));
          SUBST (XEXP (SET_SRC (PATTERN (i2)), 0),
                 SET_DEST (PATTERN (i1)));
+         LOG_LINKS (i2) = alloc_insn_link (i1, LOG_LINKS (i2));
+
        }
     }
 #endif
Comment 20 Ramana Radhakrishnan 2012-01-25 08:52:43 UTC
Author: ramana
Date: Wed Jan 25 08:52:39 2012
New Revision: 183512

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=183512
Log:

2012-01-25  Ramana Radhakrishnan  <ramana.radhakrishnan@linaro.org>

	PR rtl-optimization/48308
	* combine.c (enum undo_kind): Add UNDO_LINKS.
	(struct undo): Add member l to other_contents and where.
	(do_SUBST_LINK): New.
	(SUBST_LINK): New.
	(try_combine): Handle LOG_LINKS for the dummy i1 case.
	(undo_all): Handle UNDO_LINKS.



Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/combine.c
Comment 21 Jakub Jelinek 2012-03-01 14:39:21 UTC
GCC 4.6.3 is being released.
Comment 22 Matt Heck 2012-05-30 23:59:46 UTC
A quick comment for folks trying to work around Buildroot problems related to this:

As of this date, this problem still occurs using current Buildroot materials and attempt to build for, for instance, the AT91SAM9M10-G45-EK.  The failure occurs in the SSL library (presumably a stress point due to how heavily optimized it is).

For the moment, the workaround is to set the GCC compiler version to 4.5.x, instead of 4.6.x.  This seems to work.

FWIW, I would recommend testing a fairly thorough (lots of packages, including SSL and Xorg) Buildroot test for an ARM9 target as a good way to check whether or not this bug has been resolved.  I'd be happy to try that myself when there's a fix.
Comment 23 Andrew Pinski 2012-12-31 09:55:54 UTC
(In reply to comment #20)
Did the patch which was applied at revision 183512 fix this bug?  If so please mark this bug as being fixed for 4.7.0 and 4.8.0 and not a regression.
Comment 24 Mike Hommey 2013-03-10 09:20:05 UTC
*** Bug 56561 has been marked as a duplicate of this bug. ***
Comment 25 Steven Bosscher 2013-03-13 20:55:12 UTC
http://gcc.gnu.org/ml/gcc-patches/2012-01/msg00666.html and follow-ups
claim to have fixed this for GCC 4.7 and GCC 4.8.

Ramana, you should take care of the bug reports assigned to you.  Also,
you should have added a test case.

http://gcc.gnu.org/PR56561#c4 has a patch for GCC 4.6.
Comment 26 Sebastian Huber 2013-03-22 08:28:44 UTC
In case this won't be fixed in 4.6, then it is probably good to set the status to WONTFIX so people know that they have to deal with this on their own.
Comment 27 Ramana Radhakrishnan 2013-03-22 09:59:47 UTC
(In reply to comment #26)
> In case this won't be fixed in 4.6, then it is probably good to set the status
> to WONTFIX so people know that they have to deal with this on their own.

It looks like I missed this backport when I did it earlier - I'll look into a backport over the weekend.


Thanks,
Ramana
Comment 28 Ramana Radhakrishnan 2013-04-05 08:32:19 UTC
Fixed for 4.6 by http://gcc.gnu.org/ml/gcc-cvs/2013-04/msg00218.html