Bug 15927 - THUMB -O2: strength-reduced iteration variable ends up off by 1
THUMB -O2: strength-reduced iteration variable ends up off by 1
Status: RESOLVED FIXED
Product: gcc
Classification: Unclassified
Component: target
3.4.0
: P2 normal
: 3.4.2
Assigned To: Richard Earnshaw
: wrong-code
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2004-06-11 00:34 UTC by Dan Bornstein
Modified: 2004-10-30 21:09 UTC (History)
2 users (show)

See Also:
Host:
Target: arm-unknown-elf
Build:
Known to work: 4.0.0
Known to fail: 3.4.0
Last reconfirmed: 2004-06-22 14:41:39


Attachments
C source that demonstrates the problem (4.19 KB, text/plain)
2004-06-11 00:41 UTC, Dan Bornstein
Details
Annotated assembly (3.49 KB, text/plain)
2004-06-11 00:41 UTC, Dan Bornstein
Details
Better C Source (2.72 KB, text/plain)
2004-06-11 05:32 UTC, Dan Bornstein
Details
Another C File (m2.c) (888 bytes, text/plain)
2004-06-19 00:36 UTC, Dan Bornstein
Details
Annotated source for m2.c (2.68 KB, text/plain)
2004-06-19 00:37 UTC, Dan Bornstein
Details
annotated RTL dumps for m2.c (3.85 KB, text/plain)
2004-06-21 18:07 UTC, Dan Bornstein
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Dan Bornstein 2004-06-11 00:34:44 UTC
When compiling a particular function with -O2 and targetting THUMB, I found that the compiler 
generated incorrect code. In particular it looks like an iteration variable that was strength-reduced ends 
up off by one: In each iteration, the value used for the variable in question appears to be the value that 
ought to be used in the *subsequent* iteration.

The line in question that gets erroneously compiled is:

    ic = scan->ics[n];

I'll attach the raw source (pared down but still verified to compile) and the annotated assembly file as 
soon as I figure out how to do that with bugzilla.
Comment 1 Dan Bornstein 2004-06-11 00:38:11 UTC
Here's how the compiler was configured and compiled:

$ tar -xjvf gcc-core-3.4.0.tar.bz2
$ cd gcc-3.4.0
$ ./configure --prefix=/usr/local/armdev --target=arm-elf --with-newlib --enable-languages=c

The following lines were uncommented in gcc/config/arm/t-arm-elf:

MULTILIB_OPTIONS    += mno-thumb-interwork/mthumb-interwork
MULTILIB_DIRNAMES   += normal interwork
MULTILIB_EXCEPTIONS += *mapcs-26/*mthumb-interwork*

$ make
$ sudo make install

In case it matters, I'm currently using binutils-2.15.
Comment 2 Dan Bornstein 2004-06-11 00:41:03 UTC
Created attachment 6510 [details]
C source that demonstrates the problem

Sorry about how long the function is. If I do almost anything else to it the
bug stops manifesting itself.
Comment 3 Dan Bornstein 2004-06-11 00:41:47 UTC
Created attachment 6511 [details]
Annotated assembly
Comment 4 Dan Bornstein 2004-06-11 00:42:38 UTC
How the source code was compiled:

arm-elf-gcc -save-temps -W -Wall -Wno-unused-parameter -c jpeg_decode_4.c -mthumb -fno-
builtin -mthumb-interwork -fsigned-char -funsigned-bitfields -O2 -MD -MT jpeg_decode_4.o -MF 
jpeg_decode_4.d -o jpeg_decode_4.o
Comment 5 Dan Bornstein 2004-06-11 05:32:18 UTC
Created attachment 6514 [details]
Better C Source

I managed to pare the source down a bit more while still getting the bogus
output, though the register assignments are a little different. Here's the bad
assembly in this version:

	ldr	r3, [sp, #28]
	ldmia	r3!, {r2}
	ldr	r2, [sp, #40]
	ldr	r5, [r4, #20]
	str	r3, [sp, #28]
	str	r3, [sp, #44] // off by one!
Comment 6 Dan Bornstein 2004-06-15 23:18:51 UTC
Better compile line:

arm-elf-gcc -save-temps -W -Wall -c jpeg_decode_4.c -mthumb -O2

BTW, out of curiosity, what was wrong with my "Host" and "Build" triples? (Apologies: I was unable to 
find good documentation about what to fill in for them.)
Comment 7 Dan Bornstein 2004-06-15 23:42:05 UTC
I've now determined that changing the optimization options to "-O2 -fno-strength-reduce" makes the 
problem go away and that changing the optimization options to "-O1 -fstrength-reduce" makes the 
problem manifest itself.
Comment 8 Richard Earnshaw 2004-06-16 09:32:11 UTC
Subject: Re:  THUMB -O2: strength-reduced
	iteration variable ends up off by 1

On Wed, 2004-06-16 at 00:18, danfuzz at milk dot com wrote:
> BTW, out of curiosity, what was wrong with my "Host" and "Build" triples? (Apologies: I was unable to 
> find good documentation about what to fill in for them.)

Nothing in particular.  But this problem is independent of the host and
build so it makes life easier in this case to leave them unspecified.  
This way maintainers looking for host-specific problems won't match on
bugs that aren't interesting to them.
Comment 9 Dan Bornstein 2004-06-19 00:36:33 UTC
Created attachment 6573 [details]
Another C File (m2.c)

Here's another file (m2.c) that exhibits the same, or at least a similar,
problem even with -fno-strength-reduce. The code was compiled with the
following commandline:

    $ arm-elf-gcc -W -Wall -save-temps -c m2.c -mthumb -fno-builtin
-fsigned-char
    -funsigned-bitfields -Os -fno-strength-reduce -fno-strict-aliasing

Annotated asm code coming shortly.
Comment 10 Dan Bornstein 2004-06-19 00:37:08 UTC
Created attachment 6574 [details]
Annotated source for m2.c
Comment 11 Dan Bornstein 2004-06-19 00:49:45 UTC
With the new file (m2.c) I tried turning off every single fine-grained "-f" option documented as being 
enabled by "-Os" and still the problem persists. That is, the following compile line still causes the 
compiler to spit out bad code:

    $ arm-elf-gcc -W -Wall -save-temps -c m2.c -mthumb -fno-builtin -Os -fno-strength-reduce 
    -fno-strict-aliasing -fno-force-mem -fno-optimize-sibling-calls -fno-cse-follow-jumps
    -fno-cse-skip-blocks -fno-rerun-cse-after-loop -fno-rerun-loop-opt -fno-gcse  -fno-gcse-lm  
    -fno-gcse-sm  -fno-gcse-las -fno-delete-null-pointer-checks -fno-expensive-optimizations
    -fno-regmove -fno-schedule-insns  -fno-schedule-insns2 -fno-sched-interblock
    -fno-sched-spec -fno-caller-saves -fno-peephole2 -fno-reorder-blocks  -fno-reorder-functions 
    -fno-strict-aliasing -fno-unit-at-a-time -fno-crossjumping 

This leads me to believe the documentation (at <http://gcc.gnu.org/onlinedocs/gcc-3.4.0/gcc/
Optimize-Options.html#Optimize%20Options>) is out-of-date with respect to what options -O2 and 
-Os enable.
Comment 12 Dan Bornstein 2004-06-21 16:15:44 UTC
Ok, scratch that thing about options being out-of-date. I didn't realize (until rereading the docs a little 
more closely and then verifying in the sources) that "-Os" turns on optimizations that aren't available 
via any fine-grained "-f" option. (Maybe that should be changed? Why not have "-foptimize-size"?)
Comment 13 Dan Bornstein 2004-06-21 18:07:53 UTC
Created attachment 6595 [details]
annotated RTL dumps for m2.c

I believe I've managed to narrow down when the error gets introduced to after
local register allocation and perhaps during global register allocation.
Excerpts of -dl and -dg output with commentary are attached.
Comment 14 Dan Bornstein 2004-06-21 18:55:58 UTC
I've now determined that adding "-fnew-ra" to the compile options for m2.c makes the problem go 
away. However, I'm reticent to switch to using that option across-the-board since the documentation 
implies that it's not yet ready for prime time.
Comment 15 Dan Bornstein 2004-06-21 23:43:01 UTC
I just looked through bugzilla, and it looks like this bug may be the same as the ones reported as bugs 
#15289 and #16028.
Comment 16 Dan Bornstein 2004-06-22 00:09:29 UTC
Unfortunately, it looks like the patch in the comments of bug #16028 does not in fact fix this bug.
Comment 17 Dan Bornstein 2004-06-22 02:07:27 UTC
...though it *does* look like the problem in this case is introduced in the function reload(), based on 
adding an rtl dump immediately before and immediately after the call to reload() inside global_alloc() 
(line 606 of global.c in the 3.4.0 sources).
Comment 18 Richard Earnshaw 2004-06-22 14:41:38 UTC
Confrimed.  Also occurs on the trunk with similar compilation options.

I dont' think this is related to the two reports you cite.  In fact, I suspect
it's a problem with validating that a register hasn't already been allocated for
use in reloads in the current insn.  Looking earlier in the greg dump (m2.c) we see:

Reloads for insn # 44
Reload 0: LO_REGS, RELOAD_FOR_OPERAND_ADDRESS (opnum = 1), can't combine,
secondary_reload_p
	reload_reg_rtx: (reg:SI 2 r2)
Reload 1: reload_in (SI) = (reg/v/f:SI 102 [ xb ])
	reload_out (SI) = (reg/v/f:SI 102 [ xb ])
	BASE_REGS, RELOAD_OTHER (opnum = 1)
	reload_in_reg: (reg/v/f:SI 102 [ xb ])
	reload_out_reg: (reg/v/f:SI 102 [ xb ])
	reload_reg_rtx: (reg:SI 1 r1)
	secondary_out_reload = 0

Reload 2: reload_out (SI) = (reg/v:SI 106 [ y ])
	LO_REGS, RELOAD_FOR_OUTPUT (opnum = 0)
	reload_out_reg: (reg/v:SI 106 [ y ])
	reload_reg_rtx: (reg:SI 2 r2)

and from this we can see that r2 is allocated (reload_reg_rtx) for use by both
reload 0 and reload 2.  

As an aside, it's not entirely obvious to me why reload thinks that the first
reload is needed at all in this case: it could be in some circumstances, but in
this instance we can directly load/store the base address from the stack, and
directly store the result to the stack.  A third reload seems to be redundant.
Comment 19 Dan Bornstein 2004-06-23 00:45:07 UTC
I have no idea if this is useful information or not, but it looks like the problem is introduced during the 
call to reload_as_needed() on line 1061 of reload1.c (of the 3.4.0 sources).

Sorry for being thick, but it is unclear to me whether gcc's state is already messed up by the time the 
call gets made and the call does the "right" thing with bad data, or whether it is this call itself which 
actually causes the problem in the first place.
Comment 20 Richard Earnshaw 2004-06-23 10:28:43 UTC
It would appear that the back-end incorrectly requesting a secondary reload is
the cause of the later confusion.  So this is probably a target-only bug.
Comment 21 CVS Commits 2004-06-23 10:40:00 UTC
Subject: Bug 15927

CVSROOT:	/cvs/gcc
Module name:	gcc
Changes by:	rearnsha@gcc.gnu.org	2004-06-23 10:39:52

Modified files:
	gcc            : ChangeLog 
	gcc/config/arm : arm.h 

Log message:
	PR target/15927
	* arm.h (THUMB_SECONDARY_OUTPUT_RELOAD_CLASS): Don't need a secondary
	reload if CLASS is BASE_REGS.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.4093&r2=2.4094
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/config/arm/arm.h.diff?cvsroot=gcc&r1=1.238&r2=1.239

Comment 22 Richard Earnshaw 2004-06-23 11:26:10 UTC
Fixed on the trunk, but will have to wait until 3.4.1 is shipped before it can
be put on the branch.  (There's a small risk on this one that it will have
unintended consequences)
Comment 23 Dan Bornstein 2004-06-23 17:31:51 UTC
Confirmed that my original code runs correctly with the patch. Thanks!
Comment 24 Mark Mitchell 2004-08-23 19:24:23 UTC
Richard, is there any reason not to apply this patch before 3.4.2?
Comment 25 Richard Earnshaw 2004-08-24 09:23:03 UTC
Subject: Re:  THUMB -O2: strength-reduced iteration
	variable ends up off by 1

On Mon, 2004-08-23 at 20:24, mmitchel at gcc dot gnu dot org wrote:
> ------- Additional Comments From mmitchel at gcc dot gnu dot org  2004-08-23 19:24 -------
> Richard, is there any reason not to apply this patch before 3.4.2?

No.  I wanted to give it time to gestate on the mainline for a while,
but it's done that now and now thrown up any problems, so I'll check it
in.

R.
Comment 26 CVS Commits 2004-08-25 15:56:00 UTC
Subject: Bug 15927

CVSROOT:	/cvs/gcc
Module name:	gcc
Branch: 	gcc-3_4-branch
Changes by:	rearnsha@gcc.gnu.org	2004-08-25 15:55:55

Modified files:
	gcc            : ChangeLog 
	gcc/config/arm : arm.h 

Log message:
	PR target/15927
	* arm.h (THUMB_SECONDARY_OUTPUT_RELOAD_CLASS): Don't need a secondary
	reload if CLASS is BASE_REGS.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&only_with_tag=gcc-3_4-branch&r1=2.2326.2.595&r2=2.2326.2.596
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/config/arm/arm.h.diff?cvsroot=gcc&only_with_tag=gcc-3_4-branch&r1=1.214.2.4&r2=1.214.2.5

Comment 27 CVS Commits 2004-08-25 16:08:55 UTC
Subject: Bug 15927

CVSROOT:	/cvs/gcc
Module name:	gcc
Branch: 	csl-arm-branch
Changes by:	rearnsha@gcc.gnu.org	2004-08-25 16:08:49

Modified files:
	gcc            : ChangeLog.csl-arm 
	gcc/config/arm : arm.h 

Log message:
	PR target/15927
	* arm.h (THUMB_SECONDARY_OUTPUT_RELOAD_CLASS): Don't need a secondary
	reload if CLASS is BASE_REGS.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.csl-arm.diff?cvsroot=gcc&only_with_tag=csl-arm-branch&r1=1.1.2.16&r2=1.1.2.17
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/config/arm/arm.h.diff?cvsroot=gcc&only_with_tag=csl-arm-branch&r1=1.210.2.39&r2=1.210.2.40

Comment 28 Richard Earnshaw 2004-08-25 16:11:08 UTC
Patch applied to 3.4 and csl-arm branches