[ Forwarded from http://bugs.debian.org/445566 ] We found in Debian that samba doesn't work properly on ARM (old ABI and EABI) when compiled with -O or higher. Herbert Valerio Riedel has investigated the problem and came up with the following observations: I did some investigation, and the compiler bug seems to be related to code inlining; I pinned it down to the following code: /***********************************************************************/ typedef unsigned short u16; typedef unsigned char u8; static void do_segfault(u8 in_buf[], const u8 out_buf[], const int len) { int i; for (i = 0; i < len; i++) { //SSVAL(in_buf, 2*i, SVAL(out_buf,2*i)); // more or less just a // byte-wise memcpy asm("nop\n"); in_buf[2*i] = ( out_buf[2*i] | out_buf[(2*i)+1]<<8 ) & 0xFF; // in_buf[2*i] = out_buf[2*i]; asm("nop\n"); in_buf[(2*i)+1] = ( out_buf[2*i] | out_buf[(2*i)+1]<<8 ) >> 8; // in_buf[(2*i)+1] = out_buf[(2*i)+1]; asm("nop\n"); } } int main(int argc, char *argv[]) { u8 outbuf[32] = "buffer "; u8 inbuf[32] = "\f"; asm("nop\n"); do_segfault(inbuf, outbuf, 12); asm("nop\n"); return 0; } /***********************************************************************/ which when compiled with 4.2.3 20071123 (prerelease) (Debian 4.2.2-4), causes a segfault when using -O2, but works when either removing the 'static' modifier and thus avoiding inlining of do_segfault, or by using an optimization level which does avoid that... the generated assembler code is quite broken for the optimized case: (I've only pasted and commented the relevant section containing the 5 nops) // r4 points to outbuf (= source buffer) // sp points to inbuf (= target buffer) #APP nop mov r2, sp add r1, sp, #56 // upper loop-bound; should have been #12 .L2: #APP nop ldrb r3, [r4, #0] strb r3, [r2, #0] #APP nop ldrb r3, [r4, #1] strb r3, [r2, #1] #APP nop cmp r4, r1 // upper loop-bound check add r2, r2, #2 // increment target buffer (NB: source buffer // is not incremented!) bne .L2 // repeat loop if upper loop-bound not reached yet #APP nop
Herbert Valerio Riedel just had some more comments: > mov r2, sp > add r1, sp, #56 // upper loop-bound; should have been #12 I actually wanted to say 'should have been #24' :-) this overrun is what causes the segfault... and which goes away when not inlining, but the source-ptr not updating... > cmp r4, r1 // upper loop-bound check > add r2, r2, #2 // increment target buffer (NB: source buffer is not incremented!) > bne .L2 // repeat loop if upper loop-bound not reached yet ...stays even when avoiding inlining; it's somehow the optimizer that gets confused by in_buf[2*i] = ( out_buf[2*i] | out_buf[(2*i)+1]<<8 ) & 0xFF; in_buf[(2*i)+1] = ( out_buf[2*i] | out_buf[(2*i)+1]<<8 ) >> 8; which gcc correctly optimizes to in_buf[2*i] = out_buf[2*i]; in_buf[(2*i)+1] = out_buf[(2*i)+1]; but then gets confused; if one takes away that confusion from gcc by replacing the former two lines by their latter equivalent optimized variants, code generation is fine for all optimization levels...
I see different IL for 4.2 compared to 4.3, is the bug present in 4.3? Can you attach the optimized tree dump?
Put into WAITING state until the information requested in Comment #2 is available.
Bugzilla wraps the testcase in a way that some commented out is no longer commented out and so you don't see the segfault. Here's the testcase again with proper wrapping: typedef unsigned short u16; typedef unsigned char u8; static void do_segfault(u8 in_buf[], const u8 out_buf[], const int len) { int i; for (i = 0; i < len; i++) { //SSVAL(in_buf, 2*i, SVAL(out_buf,2*i)); // more or less just a // byte-wise memcpy asm("nop\n"); in_buf[2*i] = ( out_buf[2*i] | out_buf[(2*i)+1]<<8 ) & 0xFF; // in_buf[2*i] = out_buf[2*i]; asm("nop\n"); in_buf[(2*i)+1] = ( out_buf[2*i] | out_buf[(2*i)+1]<<8 ) >> 8; // in_buf[(2*i)+1] = out_buf[(2*i)+1]; asm("nop\n"); } } int main(int argc, char *argv[]) { u8 outbuf[32] = "buffer "; u8 inbuf[32] = "\f"; asm("nop\n"); do_segfault(inbuf, outbuf, 12); asm("nop\n"); return 0; }
Created attachment 14870 [details] optimized tree dump (4.2)
(In reply to comment #2) > I see different IL for 4.2 compared to 4.3, is the bug present in 4.3? Can you > attach the optimized tree dump? I also get a segfault with the testcase and 4.3.0 20070916. The original program (samba) compiled with 4.2 shows bad code (smbclient returning 'string length'+1 of the first character of the share name rather than the share name) whereas it segfaults when compiled with 4.3.
The final tree IL looks good, so I suspect the RTL loop optimizer gets this wrong.
(In reply to comment #7) > The final tree IL looks good, so I suspect the RTL loop optimizer gets this > wrong. > > add r1, sp, #56 // upper loop-bound; should have been #12 > I actually wanted to say 'should have been #24' :-) This insn is actually correct, r1 = &outbuf + 24. The missing increment of r4 seems to be the problem. Post-increment is created for r4, but it disappears in the combine pass. Does not seem to be loop optimizer related.
Trying someone else - Eric, any clue about post-increment?
Will take a look.
Created attachment 14907 [details] Lightly tested fix. I'll give it a whirl on IA-64 but it would be nice to test it on ARM too.
(In reply to comment #11) > Created an attachment (id=14907) [edit] > Lightly tested fix. > > I'll give it a whirl on IA-64 but it would be nice to test it on ARM too. I'm away from my machine for another week. Paul, do you think you could bootstrap and regtest the patch on ARM?
(In reply to comment #11) > Created an attachment (id=14907) [edit] > Lightly tested fix. > > I'll give it a whirl on IA-64 but it would be nice to test it on ARM too. Eric, was your testing successful? I'll give it a go on ARM now.
> Eric, was your testing successful? Yes, everything is clean on the IA-64. > I'll give it a go on ARM now. Thanks in advance.
I've now successfully bootstrapped and regtested this change with trunk on ARM. I enabled c,c++,fortran,objc,obj-c++,treelang. There are no regressions and the segfault is gone. Eric, can you please submit the patch? Thanks.
> I've now successfully bootstrapped and regtested this change with trunk on > ARM. I enabled c,c++,fortran,objc,obj-c++,treelang. > > There are no regressions and the segfault is gone. Great! > Eric, can you please submit the patch? I can even directly install it. :-)
Subject: Bug 34628 Author: ebotcazou Date: Tue Jan 22 22:27:47 2008 New Revision: 131744 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=131744 Log: PR rtl-optimization/34628 * combine.c (try_combine): Stop and undo after the first combination if an autoincrement side-effect on the first insn has effectively been lost. Added: trunk/gcc/testsuite/gcc.c-torture/execute/20080122-1.c Modified: trunk/gcc/ChangeLog trunk/gcc/combine.c trunk/gcc/testsuite/ChangeLog
Subject: Bug 34628 Author: ebotcazou Date: Tue Jan 22 22:29:04 2008 New Revision: 131745 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=131745 Log: PR rtl-optimization/34628 * combine.c (try_combine): Stop and undo after the first combination if an autoincrement side-effect on the first insn has effectively been lost. Added: branches/gcc-4_2-branch/gcc/testsuite/gcc.c-torture/execute/20080122-1.c Modified: branches/gcc-4_2-branch/gcc/ChangeLog branches/gcc-4_2-branch/gcc/combine.c branches/gcc-4_2-branch/gcc/testsuite/ChangeLog
.