Wrong code is generated for some loops when -O3 -funroll-all-loops option is used. Attached (unroll-test_e.i) is the intermediate file generated by gcc -v -save-temps -O3 -funroll-all-loops -o unroll_test unroll-test_e.c The code performs a simple scaling of a floating-point vector. When the loop iteration is 5, 6, or 7, the only the first four iteration is done, leaving the last 1, 2, or 3 vector elements unmodified (see scale() in the source code). The output of above command is Reading specs from /usr/lib/gcc-lib/powerpc-unknown-linux-gnu/3.1/specs Configured with: ../gcc-3.1/configure --prefix=/usr --enable-shared --enable-threads=posix --with-slibdir=/lib Thread model: posix gcc version 3.1 /usr/lib/gcc-lib/powerpc-unknown-linux-gnu/3.1/cpp0 -lang-c -v -D__GNUC__=3 -D__GNUC_MINOR__=1 -D__GNUC_PATCHLEVEL__=0 -DPPC -D__ELF__ -Dpowerpc -D__PPC__ -D__ELF__ -D__powerpc__ -D__PPC -D__powerpc -Acpu=powerpc -Amachine=powerpc -D__OPTIMIZE__ -D__STDC_HOSTED__=1 -D_CALL_SYSV -D_BIG_ENDIAN -D__BIG_ENDIAN__ -Amachine=bigendian -D_ARCH_PPC -D__unix__ -D__gnu_linux__ -D__linux__ -Dunix -D__unix -Dlinux -D__linux -Asystem=unix -Asystem=posix unroll-test_e.c unroll-test_e.i GNU CPP version 3.1 (cpplib) (PowerPC GNU/Linux) ignoring nonexistent directory "/usr/powerpc-unknown-linux-gnu/include" #include "..." search starts here: #include <...> search starts here: /usr/local/include /usr/lib/gcc-lib/powerpc-unknown-linux-gnu/3.1/include /usr/include End of search list. /usr/lib/gcc-lib/powerpc-unknown-linux-gnu/3.1/cc1 -fpreprocessed unroll-test_e.i -quiet -dumpbase unroll-test_e.c -O3 -version -funroll-all-loops -o unroll-test_e.s GNU CPP version 3.1 (cpplib) (PowerPC GNU/Linux) GNU C version 3.1 (powerpc-unknown-linux-gnu) compiled by GNU C version 3.1. as -mppc -V -Qy -o unroll-test_e.o unroll-test_e.s GNU assembler version 2.12.1 (powerpc-unknown-linux-gnu) using BFD version 2.12.1 /usr/lib/gcc-lib/powerpc-unknown-linux-gnu/3.1/collect2 --eh-frame-hdr -V -Qy -m elf32ppclinux -dynamic-linker /lib/ld.so.1 -o unroll_test /usr/lib/gcc-lib/powerpc-unknown-linux-gnu/3.1/../../../crt1.o /usr/lib/gcc-lib/powerpc-unknown-linux-gnu/3.1/../../../crti.o /usr/lib/gcc-lib/powerpc-unknown-linux-gnu/3.1/crtbegin.o -L/usr/lib/gcc-lib/powerpc-unknown-linux-gnu/3.1 -L/usr/lib/gcc-lib/powerpc-unknown-linux-gnu/3.1/../../.. unroll-test_e.o -lgcc -lgcc_eh -lc -lgcc -lgcc_eh /usr/lib/gcc-lib/powerpc-unknown-linux-gnu/3.1/crtsavres.o /usr/lib/gcc-lib/powerpc-unknown-linux-gnu/3.1/crtend.o /usr/lib/gcc-lib/powerpc-unknown-linux-gnu/3.1/../../../crtn.o GNU ld version 2.12.1 Supported emulations: elf32ppclinux elf32ppc elf32ppcsim The problem was discovered while compiling atlas-3.4.1. -Yozo Release: gcc-3.1 Environment: powerpc-unknown-linux-gnu PowerPC 750 (G3), PowerBook G3 Series (Wallstreet), Linux-2.14.19-pre10, Glibc 2.2.5 Configured with: ../gcc-3.1/configure --prefix=/usr --enable-shared --enable-threads=posix --with-slibdir=/lib How-To-Repeat: Compile the source file with gcc -O3 -funroll-all-loops -o unroll_test unroll_test_e.c Run the program with ./unroll_test Notice the element x[9] is not changed even though it is supposed to be multiplied by two.
Fix: Don't use "-funroll-all-loops".
From: Andreas Jaeger <aj@suse.de> To: yozo@cs.berkeley.edu Cc: gcc-gnats@gcc.gnu.org, gcc-patches@gcc.gnu.org Subject: Re: optimization/7130: miscompiled code for gcc-3.1 in powerpc linux with -funroll-all-loops Date: Thu, 11 Jul 2002 14:06:55 +0200 Alan Modra <amodra@bigpond.net.au> writes: > (And this time, I waited for bootstrap to get through building the stage2 > compiler before firing off emails.) What about the testsuite? Any new regressions or passes? I guess this is on Powerpc-linux? Andreas -- Andreas Jaeger SuSE Labs aj@suse.de private aj@arthur.inka.de http://www.suse.de/~aj
From: Alan Modra <amodra@bigpond.net.au> To: yozo@cs.berkeley.edu Cc: gcc-gnats@gcc.gnu.org, gcc-patches@gcc.gnu.org Subject: Re: optimization/7130: miscompiled code for gcc-3.1 in powerpc linux with -funroll-all-loops Date: Thu, 11 Jul 2002 17:21:57 +0930 doloop.c:doloop_modify_runtime says: If the loop has been unrolled, then the loop body has been preconditioned to iterate a multiple of unroll_number times. If abs_inc is != 1, the full calculation is t1 = abs_inc * unroll_number; n = abs (final - initial) / t1; n += (abs (final - initial) % t1) > t1 - abs_inc; This is wrong. Taking the example in the PR, we have abs_inc = 1 unroll_number = 4 abs (final - initial) = 10 => t1 == 4 abs (final - initial) % t1 == 2 => n == 2 We want n == 3, to go around the loop fully twice, and once partially. A little thought shows the correct calculation is The amount we increment per (partially) unrolled loop t1 = abs_inc * unroll_number; The number of time we'll go fully round the loop. n = abs (final - initial) / t1; Plus any partial loops. n += (abs (final - initial) % t1) >= abs_inc; PR optimization/7130 * doloop.c (doloop_modify_runtime): Correct count for unrolled loops. This needs to go on the 3.1 branch too. OK, assuming my powerpc-linux bootstrap and regression test passes? -- Alan Modra IBM OzLabs - Linux Technology Centre Index: gcc/doloop.c =================================================================== RCS file: /cvs/gcc/gcc/gcc/doloop.c,v retrieving revision 1.20 diff -u -p -r1.20 doloop.c --- gcc/doloop.c 24 Jun 2002 02:16:42 -0000 1.20 +++ gcc/doloop.c 11 Jul 2002 07:49:43 -0000 @@ -552,6 +552,7 @@ doloop_modify_runtime (loop, iterations_ { const struct loop_info *loop_info = LOOP_INFO (loop); HOST_WIDE_INT abs_inc; + HOST_WIDE_INT abs_loop_inc; int neg_inc; rtx diff; rtx sequence; @@ -601,7 +602,7 @@ doloop_modify_runtime (loop, iterations_ t1 = abs_inc * unroll_number; n = abs (final - initial) / t1; - n += (abs (final - initial) % t1) > t1 - abs_inc; + n += (abs (final - initial) % t1) >= abs_inc; The division and modulo operations can be avoided by requiring that the increment is a power of 2 (precondition_loop_p enforces @@ -667,20 +668,21 @@ doloop_modify_runtime (loop, iterations_ } } - if (abs_inc * loop_info->unroll_number != 1) + abs_loop_inc = abs_inc * loop_info->unroll_number; + if (abs_loop_inc != 1) { int shift_count; - shift_count = exact_log2 (abs_inc * loop_info->unroll_number); + shift_count = exact_log2 (abs_loop_inc); if (shift_count < 0) abort (); - if (abs_inc != 1) - diff = expand_simple_binop (GET_MODE (diff), PLUS, - diff, GEN_INT (abs_inc - 1), - diff, 1, OPTAB_LIB_WIDEN); + diff = expand_simple_binop (GET_MODE (diff), PLUS, + diff, GEN_INT (abs_loop_inc - abs_inc), + diff, 1, OPTAB_LIB_WIDEN); - /* (abs (final - initial) + abs_inc - 1) / (abs_inc * unroll_number) */ + /* (abs (final - initial) + abs_inc * unroll_number - abs_inc) + / (abs_inc * unroll_number) */ diff = expand_simple_binop (GET_MODE (diff), LSHIFTRT, diff, GEN_INT (shift_count), diff, 1, OPTAB_LIB_WIDEN);
From: Alan Modra <amodra@bigpond.net.au> To: yozo@cs.berkeley.edu, gcc-gnats@gcc.gnu.org, gcc-patches@gcc.gnu.org Cc: Subject: Re: optimization/7130: miscompiled code for gcc-3.1 in powerpc linux with -funroll-all-loops Date: Thu, 11 Jul 2002 19:38:40 +0930 On Thu, Jul 11, 2002 at 05:21:57PM +0930, Alan Modra wrote: > doloop.c:doloop_modify_runtime says: > > If the loop has been unrolled, then the loop body has been > preconditioned to iterate a multiple of unroll_number times. If > abs_inc is != 1, the full calculation is > > t1 = abs_inc * unroll_number; > n = abs (final - initial) / t1; > n += (abs (final - initial) % t1) > t1 - abs_inc; > > This is wrong. Taking the example in the PR, we have > > abs_inc = 1 > unroll_number = 4 > abs (final - initial) = 10 > > => t1 == 4 > abs (final - initial) % t1 == 2 > => n == 2 > > We want n == 3, to go around the loop fully twice, and once partially. > > A little thought shows the correct calculation is > > The amount we increment per (partially) unrolled loop > t1 = abs_inc * unroll_number; > > The number of time we'll go fully round the loop. > n = abs (final - initial) / t1; > > Plus any partial loops. > n += (abs (final - initial) % t1) >= abs_inc; And a little more thought, prodded by bootstrap failing :), says Plus any partial loops. n += (abs (final - initial) % t1) != 0; Why? Well, GE loops don't care about final - initial being a multiple of the loop increment. Besides, for unroll_number == 1, the above calculations ought to agree with this extract from unroll.c The number of iterations (prior to any loop unrolling) is given by: n = (abs (final - initial) + abs_inc - 1) / abs_inc. However, it is possible for the summation to overflow, and a safer method is: n = abs (final - initial) / abs_inc; n += (abs (final - initial) % abs_inc) != 0; gcc/ChangeLog PR optimization/7130 * doloop.c (doloop_modify_runtime): Correct count for unrolled loops. gcc/testsuite/ChangeLog * gcc.dg/unroll.c: New. (And this time, I waited for bootstrap to get through building the stage2 compiler before firing off emails.) -- Alan Modra IBM OzLabs - Linux Technology Centre Index: gcc/doloop.c =================================================================== RCS file: /cvs/gcc/gcc/gcc/doloop.c,v retrieving revision 1.20 diff -u -p -r1.20 doloop.c --- gcc/doloop.c 24 Jun 2002 02:16:42 -0000 1.20 +++ gcc/doloop.c 11 Jul 2002 09:07:06 -0000 @@ -552,6 +552,7 @@ doloop_modify_runtime (loop, iterations_ { const struct loop_info *loop_info = LOOP_INFO (loop); HOST_WIDE_INT abs_inc; + HOST_WIDE_INT abs_loop_inc; int neg_inc; rtx diff; rtx sequence; @@ -601,7 +602,7 @@ doloop_modify_runtime (loop, iterations_ t1 = abs_inc * unroll_number; n = abs (final - initial) / t1; - n += (abs (final - initial) % t1) > t1 - abs_inc; + n += (abs (final - initial) % t1) != 0; The division and modulo operations can be avoided by requiring that the increment is a power of 2 (precondition_loop_p enforces @@ -667,20 +668,21 @@ doloop_modify_runtime (loop, iterations_ } } - if (abs_inc * loop_info->unroll_number != 1) + abs_loop_inc = abs_inc * loop_info->unroll_number; + if (abs_loop_inc != 1) { int shift_count; - shift_count = exact_log2 (abs_inc * loop_info->unroll_number); + shift_count = exact_log2 (abs_loop_inc); if (shift_count < 0) abort (); - if (abs_inc != 1) - diff = expand_simple_binop (GET_MODE (diff), PLUS, - diff, GEN_INT (abs_inc - 1), - diff, 1, OPTAB_LIB_WIDEN); + diff = expand_simple_binop (GET_MODE (diff), PLUS, + diff, GEN_INT (abs_loop_inc - 1), + diff, 1, OPTAB_LIB_WIDEN); - /* (abs (final - initial) + abs_inc - 1) / (abs_inc * unroll_number) */ + /* (abs (final - initial) + abs_inc * unroll_number - 1) + / (abs_inc * unroll_number) */ diff = expand_simple_binop (GET_MODE (diff), LSHIFTRT, diff, GEN_INT (shift_count), diff, 1, OPTAB_LIB_WIDEN); Index: gcc/testsuite/gcc.dg/unroll.c =================================================================== RCS file: gcc/testsuite/gcc.dg/unroll.c diff -N gcc/testsuite/gcc.dg/unroll.c --- /dev/null 1 Jan 1970 00:00:00 -0000 +++ gcc/testsuite/gcc.dg/unroll.c 11 Jul 2002 09:31:12 -0000 @@ -0,0 +1,39 @@ +/* PR opt/7130 */ +/* { dg-do run } */ +/* { dg-options "-O2 -funroll-all-loops" } */ + +#define TYPE long + +void +scale (TYPE *alpha, TYPE *x, int n) +{ + int i, ix; + + if (*alpha != 1) + for (i = 0, ix = 0; i < n; i++, ix += 2) + { + TYPE tmpr, tmpi; + tmpr = *alpha * x[ix]; + tmpi = *alpha * x[ix + 1]; + x[ix] = tmpr; + x[ix + 1] = tmpi; + } +} + +int +main (void) +{ + int i; + TYPE x[10]; + TYPE alpha = 2; + + for (i = 0; i < 10; i++) + x[i] = i; + + scale (&alpha, x, 5); + + if (x[9] != 18) + abort (); + + return 0; +}
From: Alan Modra <amodra@bigpond.net.au> To: Andreas Jaeger <aj@suse.de> Cc: yozo@cs.berkeley.edu, gcc-gnats@gcc.gnu.org, gcc-patches@gcc.gnu.org Subject: Re: optimization/7130: miscompiled code for gcc-3.1 in powerpc linux with -funroll-all-loops Date: Fri, 12 Jul 2002 09:12:20 +0930 On Thu, Jul 11, 2002 at 02:06:55PM +0200, Andreas Jaeger wrote: > Alan Modra <amodra@bigpond.net.au> writes: > > > (And this time, I waited for bootstrap to get through building the stage2 > > compiler before firing off emails.) > > > What about the testsuite? Any new regressions or passes? New regressions unfortunately. Patch withdrawn until I can figure out what is going wrong. > I guess this is on Powerpc-linux? Yes. -- Alan Modra IBM OzLabs - Linux Technology Centre
From: Alan Modra <amodra@bigpond.net.au> To: yozo@cs.berkeley.edu, gcc-gnats@gcc.gnu.org, gcc-patches@gcc.gnu.org Cc: Subject: Re: optimization/7130: miscompiled code for gcc-3.1 in powerpc linux with -funroll-all-loops Date: Fri, 12 Jul 2002 12:06:06 +0930 The testsuite failures with the previous patch were due to doloop_modify_runtime not taking into account loop preconditioning. No new regressions with this patch. Yay! gcc/ChangeLog PR optimization/7130 * loop.h (struct loop_info): Add "preconditioned". * unroll.c (unroll_loop): Set it. * doloop.c (doloop_modify_runtime): Correct count for unrolled loops. Hmm, I suppose I should be adding the testcase in gcc.c-torture/execute/ since these get compiled with -funroll-all-loops. Index: gcc/loop.h =================================================================== RCS file: /cvs/gcc/gcc/gcc/loop.h,v retrieving revision 1.61 diff -u -p -r1.61 loop.h --- gcc/loop.h 30 May 2002 20:55:11 -0000 1.61 +++ gcc/loop.h 12 Jul 2002 01:50:43 -0000 @@ -316,6 +316,9 @@ struct loop_info int has_multiple_exit_targets; /* Nonzero if there is an indirect jump in the current function. */ int has_indirect_jump; + /* Whether loop unrolling has emitted copies of the loop body so + that the main loop needs no exit tests. */ + int preconditioned; /* Register or constant initial loop value. */ rtx initial_value; /* Register or constant value used for comparison test. */ Index: gcc/unroll.c =================================================================== RCS file: /cvs/gcc/gcc/gcc/unroll.c,v retrieving revision 1.169 diff -u -p -r1.169 unroll.c --- gcc/unroll.c 30 Jun 2002 05:06:01 -0000 1.169 +++ gcc/unroll.c 12 Jul 2002 01:50:46 -0000 @@ -1135,6 +1135,9 @@ unroll_loop (loop, insn_count, strength_ /* Keep track of the unroll factor for the loop. */ loop_info->unroll_number = unroll_number; + /* And whether the loop has been preconditioned. */ + loop_info->preconditioned = loop_preconditioned; + /* For each biv and giv, determine whether it can be safely split into a different variable for each unrolled copy of the loop body. We precalculate and save this info here, since computing it is Index: gcc/doloop.c =================================================================== RCS file: /cvs/gcc/gcc/gcc/doloop.c,v retrieving revision 1.20 diff -u -p -r1.20 doloop.c --- gcc/doloop.c 24 Jun 2002 02:16:42 -0000 1.20 +++ gcc/doloop.c 12 Jul 2002 01:50:42 -0000 @@ -552,6 +552,7 @@ doloop_modify_runtime (loop, iterations_ { const struct loop_info *loop_info = LOOP_INFO (loop); HOST_WIDE_INT abs_inc; + HOST_WIDE_INT abs_loop_inc; int neg_inc; rtx diff; rtx sequence; @@ -595,13 +596,18 @@ doloop_modify_runtime (loop, iterations_ except in cases where the loop never terminates. So we don't need to use this more costly calculation. - If the loop has been unrolled, then the loop body has been - preconditioned to iterate a multiple of unroll_number times. If - abs_inc is != 1, the full calculation is - - t1 = abs_inc * unroll_number; - n = abs (final - initial) / t1; - n += (abs (final - initial) % t1) > t1 - abs_inc; + If the loop has been unrolled, the full calculation is + + t1 = abs_inc * unroll_number; increment per loop + n = abs (final - initial) / t1; full loops + n += (abs (final - initial) % t1) != 0; partial loop + + However, in certain cases the unrolled loop will be preconditioned + by emitting copies of the loop body with conditional branches, + so that the unrolled loop is always a full loop and thus needs + no exit tests. In this case we don't want to add the partial + loop count. As above, when t1 is a power of two we don't need to + worry about overflow. The division and modulo operations can be avoided by requiring that the increment is a power of 2 (precondition_loop_p enforces @@ -667,20 +673,22 @@ doloop_modify_runtime (loop, iterations_ } } - if (abs_inc * loop_info->unroll_number != 1) + abs_loop_inc = abs_inc * loop_info->unroll_number; + if (abs_loop_inc != 1) { int shift_count; - shift_count = exact_log2 (abs_inc * loop_info->unroll_number); + shift_count = exact_log2 (abs_loop_inc); if (shift_count < 0) abort (); - if (abs_inc != 1) + if (!loop_info->preconditioned) diff = expand_simple_binop (GET_MODE (diff), PLUS, - diff, GEN_INT (abs_inc - 1), + diff, GEN_INT (abs_loop_inc - 1), diff, 1, OPTAB_LIB_WIDEN); - /* (abs (final - initial) + abs_inc - 1) / (abs_inc * unroll_number) */ + /* (abs (final - initial) + abs_inc * unroll_number - 1) + / (abs_inc * unroll_number) */ diff = expand_simple_binop (GET_MODE (diff), LSHIFTRT, diff, GEN_INT (shift_count), diff, 1, OPTAB_LIB_WIDEN); -- Alan Modra IBM OzLabs - Linux Technology Centre
Responsible-Changed-From-To: unassigned->amodra Responsible-Changed-Why: Patch pending review.
From: amodra@gcc.gnu.org To: gcc-gnats@gcc.gnu.org Cc: Subject: optimization/7130 Date: 20 Jul 2002 00:31:15 -0000 CVSROOT: /cvs/gcc Module name: gcc Changes by: amodra@gcc.gnu.org 2002-07-19 17:31:15 Modified files: gcc : ChangeLog loop.h unroll.c doloop.c Log message: PR optimization/7130 * loop.h (struct loop_info): Add "preconditioned". * unroll.c (unroll_loop): Set it. * doloop.c (doloop_modify_runtime): Correct count for unrolled loops. Patches: http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=1.14891&r2=1.14892 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/loop.h.diff?cvsroot=gcc&r1=1.61&r2=1.62 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/unroll.c.diff?cvsroot=gcc&r1=1.169&r2=1.170 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/doloop.c.diff?cvsroot=gcc&r1=1.20&r2=1.21
From: amodra@gcc.gnu.org To: gcc-gnats@gcc.gnu.org Cc: Subject: optimization/7130 Date: 17 Sep 2002 03:25:07 -0000 CVSROOT: /cvs/gcc Module name: gcc Branch: gcc-3_2-branch Changes by: amodra@gcc.gnu.org 2002-09-16 20:25:07 Modified files: gcc : ChangeLog doloop.c loop.h unroll.c Log message: Merge from mainline. 2002-07-20 Alan Modra <amodra@bigpond.net.au> PR optimization/7130 * loop.h (struct loop_info): Add "preconditioned". * unroll.c (unroll_loop): Set it. * doloop.c (doloop_modify_runtime): Correct count for unrolled loops. 2002-06-24 Alan Modra <amodra@bigpond.net.au> PR optimization/6984 * doloop.c (doloop_valid_p): Correct comment. (doloop_modify_runtime <abs_inc != 1>): Simplify. (doloop_modify_runtime <do-while>): Don't emit code when NE. Patches: http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&only_with_tag=gcc-3_2-branch&r1=1.13152.2.657.2.48&r2=1.13152.2.657.2.49 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/doloop.c.diff?cvsroot=gcc&only_with_tag=gcc-3_2-branch&r1=1.16&r2=1.16.8.1 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/loop.h.diff?cvsroot=gcc&only_with_tag=gcc-3_2-branch&r1=1.58.6.1&r2=1.58.6.1.4.1 http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/unroll.c.diff?cvsroot=gcc&only_with_tag=gcc-3_2-branch&r1=1.160.2.4.2.1&r2=1.160.2.4.2.2
State-Changed-From-To: analyzed->closed State-Changed-Why: Fixed mainline and 3.2 branch