Bug 34628 - [4.2/4.3 Regression] problems with inlining on ARM
Summary: [4.2/4.3 Regression] problems with inlining on ARM
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: rtl-optimization (show other bugs)
Version: 4.3.0
: P1 normal
Target Milestone: 4.2.3
Assignee: Eric Botcazou
URL: http://gcc.gnu.org/ml/gcc-patches/200...
Keywords: wrong-code
Depends on:
Blocks:
 
Reported: 2007-12-31 10:29 UTC by Martin Michlmayr
Modified: 2008-01-22 22:32 UTC (History)
11 users (show)

See Also:
Host:
Target: arm*-linux-gnu
Build:
Known to work:
Known to fail:
Last reconfirmed: 2008-01-09 16:34:46


Attachments
optimized tree dump (4.2) (336 bytes, text/plain)
2008-01-03 18:07 UTC, Martin Michlmayr
Details
Lightly tested fix. (572 bytes, patch)
2008-01-10 14:03 UTC, Eric Botcazou
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Martin Michlmayr 2007-12-31 10:29:07 UTC
[ Forwarded from http://bugs.debian.org/445566 ]

We found in Debian that samba doesn't work properly on ARM (old ABI and
EABI) when compiled with -O or higher.

Herbert Valerio Riedel has investigated the problem and came up with
the following observations:

I did some investigation, and the compiler bug seems to be related to
code inlining; I pinned it down to the following code:

/***********************************************************************/
typedef unsigned short u16;
typedef unsigned char u8;

static void
do_segfault(u8 in_buf[], const u8 out_buf[], const int len)
{
  int i;

  for (i = 0; i < len; i++) {
    //SSVAL(in_buf, 2*i, SVAL(out_buf,2*i)); // more or less just a
                                             // byte-wise memcpy

    asm("nop\n");

    in_buf[2*i] = (   out_buf[2*i] | out_buf[(2*i)+1]<<8  ) & 0xFF; //
in_buf[2*i] = out_buf[2*i];

    asm("nop\n");

    in_buf[(2*i)+1] =  ( out_buf[2*i] | out_buf[(2*i)+1]<<8 ) >> 8; //
in_buf[(2*i)+1] = out_buf[(2*i)+1];

    asm("nop\n");
  }
}

int main(int argc, char *argv[])
{
  u8 outbuf[32] = "buffer     ";
  u8 inbuf[32] = "\f";

  asm("nop\n");
  do_segfault(inbuf, outbuf, 12);
  asm("nop\n");

  return 0;
}
/***********************************************************************/

which when compiled with  4.2.3 20071123 (prerelease) (Debian 4.2.2-4),
causes a segfault when using -O2, but works when either removing the
'static' modifier and thus avoiding inlining of do_segfault, or by using
an optimization level which does avoid that...

the generated assembler code is quite broken for the optimized case:

(I've only pasted and commented the relevant section containing the 5
nops)

// r4 points to outbuf (= source buffer)
// sp points to inbuf  (= target buffer)
#APP
        nop
        mov     r2, sp
        add     r1, sp, #56 // upper loop-bound; should have been #12
.L2:
#APP
        nop
        ldrb    r3, [r4, #0]
        strb    r3, [r2, #0]
#APP
        nop
        ldrb    r3, [r4, #1]
        strb    r3, [r2, #1]
#APP
        nop
        cmp     r4, r1 // upper loop-bound check
        add     r2, r2, #2 // increment target buffer (NB: source buffer
                           // is not incremented!)
        bne     .L2 // repeat loop if upper loop-bound not reached yet
#APP
        nop
Comment 1 Martin Michlmayr 2007-12-31 10:30:14 UTC
Herbert Valerio Riedel just had some more comments:

>         mov     r2, sp
>         add     r1, sp, #56 // upper loop-bound; should have been #12

I actually wanted to say 'should have been #24' :-)

this overrun is what causes the segfault... and which goes away when not
inlining, but the source-ptr not updating...

>       cmp     r4, r1 // upper loop-bound check
>         add     r2, r2, #2 // increment target buffer (NB: source buffer is
not incremented!)
>         bne     .L2 // repeat loop if upper loop-bound not reached yet

...stays even when avoiding inlining; it's somehow the optimizer that
gets confused by
in_buf[2*i] = (   out_buf[2*i] | out_buf[(2*i)+1]<<8  ) & 0xFF;
in_buf[(2*i)+1] =  ( out_buf[2*i] | out_buf[(2*i)+1]<<8 ) >> 8;

which gcc correctly optimizes to

in_buf[2*i] = out_buf[2*i];
in_buf[(2*i)+1] = out_buf[(2*i)+1];

but then gets confused; if one takes away that confusion from gcc by
replacing the former two lines by their latter equivalent optimized
variants, code generation is fine for all optimization levels...
Comment 2 Richard Biener 2008-01-01 17:42:47 UTC
I see different IL for 4.2 compared to 4.3, is the bug present in 4.3?  Can you
attach the optimized tree dump?
Comment 3 Mark Mitchell 2008-01-02 23:14:55 UTC
Put into WAITING state until the information requested in Comment #2 is available.
Comment 4 Martin Michlmayr 2008-01-03 18:03:21 UTC
Bugzilla wraps the testcase in a way that some commented out is no longer
commented out and so you don't see the segfault.  Here's the testcase again
with proper wrapping:

typedef unsigned short u16;
typedef unsigned char u8;

static void
do_segfault(u8 in_buf[], const u8 out_buf[], const int len)
{
  int i;

  for (i = 0; i < len; i++) {
    //SSVAL(in_buf, 2*i, SVAL(out_buf,2*i)); // more or less just a
                                             // byte-wise memcpy

    asm("nop\n");

    in_buf[2*i] = (   out_buf[2*i] | out_buf[(2*i)+1]<<8  ) & 0xFF;
 // in_buf[2*i] = out_buf[2*i];

    asm("nop\n");

    in_buf[(2*i)+1] =  ( out_buf[2*i] | out_buf[(2*i)+1]<<8 ) >> 8;
 // in_buf[(2*i)+1] = out_buf[(2*i)+1];

    asm("nop\n");
  }
}

int main(int argc, char *argv[])
{
  u8 outbuf[32] = "buffer     ";
  u8 inbuf[32] = "\f";

  asm("nop\n");
  do_segfault(inbuf, outbuf, 12);
  asm("nop\n");

  return 0;
}
Comment 5 Martin Michlmayr 2008-01-03 18:07:05 UTC
Created attachment 14870 [details]
optimized tree dump (4.2)
Comment 6 Martin Michlmayr 2008-01-03 18:22:46 UTC
(In reply to comment #2)
> I see different IL for 4.2 compared to 4.3, is the bug present in 4.3?  Can you
> attach the optimized tree dump?

I also get a segfault with the testcase and 4.3.0 20070916.

The original program (samba) compiled with 4.2 shows bad code (smbclient returning 'string length'+1 of the first character of the share name rather
than the share name) whereas it segfaults when compiled with 4.3.
Comment 7 Richard Biener 2008-01-03 18:58:29 UTC
The final tree IL looks good, so I suspect the RTL loop optimizer gets this wrong.
Comment 8 Zdenek Dvorak 2008-01-03 21:23:35 UTC
(In reply to comment #7)
> The final tree IL looks good, so I suspect the RTL loop optimizer gets this
> wrong.
> 

>  add     r1, sp, #56 // upper loop-bound; should have been #12
>  I actually wanted to say 'should have been #24' :-)

This insn is actually correct, r1 = &outbuf + 24.  The missing increment of r4 seems to be the problem.  Post-increment is created for r4, but it disappears in the combine pass.  Does not seem to be loop optimizer related.
Comment 9 Richard Biener 2008-01-09 15:56:37 UTC
Trying someone else - Eric, any clue about post-increment?
Comment 10 Eric Botcazou 2008-01-09 16:34:46 UTC
Will take a look.
Comment 11 Eric Botcazou 2008-01-10 14:03:38 UTC
Created attachment 14907 [details]
Lightly tested fix.

I'll give it a whirl on IA-64 but it would be nice to test it on ARM too.
Comment 12 Martin Michlmayr 2008-01-11 22:01:46 UTC
(In reply to comment #11)
> Created an attachment (id=14907) [edit]
> Lightly tested fix.
> 
> I'll give it a whirl on IA-64 but it would be nice to test it on ARM too.

I'm away from my machine for another week.  Paul, do you think you could
bootstrap and regtest the patch on ARM?
Comment 13 Martin Michlmayr 2008-01-21 12:31:17 UTC
(In reply to comment #11)
> Created an attachment (id=14907) [edit]
> Lightly tested fix.
> 
> I'll give it a whirl on IA-64 but it would be nice to test it on ARM too.

Eric, was your testing successful?  I'll give it a go on ARM now.
Comment 14 Eric Botcazou 2008-01-21 14:25:01 UTC
> Eric, was your testing successful?

Yes, everything is clean on the IA-64.

> I'll give it a go on ARM now.

Thanks in advance.
Comment 15 Martin Michlmayr 2008-01-22 10:49:14 UTC
I've now successfully bootstrapped and regtested this change with trunk on
ARM.  I enabled c,c++,fortran,objc,obj-c++,treelang.

There are no regressions and the segfault is gone.

Eric, can you please submit the patch?

Thanks.
Comment 16 Eric Botcazou 2008-01-22 11:39:28 UTC
> I've now successfully bootstrapped and regtested this change with trunk on
> ARM.  I enabled c,c++,fortran,objc,obj-c++,treelang.
> 
> There are no regressions and the segfault is gone.

Great!

> Eric, can you please submit the patch?

I can even directly install it. :-)
Comment 17 Eric Botcazou 2008-01-22 22:28:31 UTC
Subject: Bug 34628

Author: ebotcazou
Date: Tue Jan 22 22:27:47 2008
New Revision: 131744

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=131744
Log:
	PR rtl-optimization/34628
	* combine.c (try_combine): Stop and undo after the first combination
	if an autoincrement side-effect on the first insn has effectively
	been lost.


Added:
    trunk/gcc/testsuite/gcc.c-torture/execute/20080122-1.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/combine.c
    trunk/gcc/testsuite/ChangeLog

Comment 18 Eric Botcazou 2008-01-22 22:29:48 UTC
Subject: Bug 34628

Author: ebotcazou
Date: Tue Jan 22 22:29:04 2008
New Revision: 131745

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=131745
Log:
	PR rtl-optimization/34628
	* combine.c (try_combine): Stop and undo after the first combination
	if an autoincrement side-effect on the first insn has effectively
	been lost.


Added:
    branches/gcc-4_2-branch/gcc/testsuite/gcc.c-torture/execute/20080122-1.c
Modified:
    branches/gcc-4_2-branch/gcc/ChangeLog
    branches/gcc-4_2-branch/gcc/combine.c
    branches/gcc-4_2-branch/gcc/testsuite/ChangeLog

Comment 19 Eric Botcazou 2008-01-22 22:32:13 UTC
.