Bug 42722 - move_by_pieces() incorrectly pushes structures to stack
Summary: move_by_pieces() incorrectly pushes structures to stack
Status: UNCONFIRMED
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end (show other bugs)
Version: 4.4.2
: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks: argument, return
  Show dependency treegraph
 
Reported: 2010-01-13 09:43 UTC by Ivan Shcherbakov
Modified: 2021-08-16 22:29 UTC (History)
1 user (show)

See Also:
Host: all
Target: msp430-gcc
Build: all
Known to work:
Known to fail:
Last reconfirmed:


Attachments
Test case (requires msp430 back end) (287 bytes, text/plain)
2011-02-21 09:45 UTC, Peter A. Bigot
Details
Generated code before patch applied (386 bytes, application/octet-stream)
2011-02-21 09:45 UTC, Peter A. Bigot
Details
Generated code after patch applied (386 bytes, application/octet-stream)
2011-02-21 09:46 UTC, Peter A. Bigot
Details
Fixes problem (480 bytes, patch)
2011-02-21 09:48 UTC, Peter A. Bigot
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Ivan Shcherbakov 2010-01-13 09:43:47 UTC
The move_by_pieces() function incorrectly works on targets with enabled PUSH_ARGS that support post-increment loads. To reproduce the bug, the following code should be compiled:

struct test { int a, b, c, d, e, f; } //Enough fields to pass structure in stack
void func2(test copy);
void func1(test copy) {func2(copy);}

Every time a copy of our big structure is pushed to stack, its fields will go in reverse order. This happens due to a bug in move_by_pieces() function from expr.c that generates code for pushing a memory block in stack. 
This happens because if the target supports post-increment addressing, the USE_LOAD_POST_INCREMENT() condition is met and the memory block to be pushed is parsed from start to end using autoincrement addressing. The "reverse" flag specifying that the push statements should traverse the block from end to start is then ignored.
The proposed fix explicitly disables using post-increment mode if the "reverse" flag is set.

The following patch fixes the problem:
---------------------------------------------
--- expr.old	Thu Aug 20 00:52:11 2009
+++ expr.new	Tue Jan 12 23:32:05 2010
@@ -952,13 +952,13 @@
       if (USE_LOAD_PRE_DECREMENT (mode) && data.reverse && ! data.autinc_from)
 	{
 	  data.from_addr = copy_addr_to_reg (plus_constant (from_addr, len));
 	  data.autinc_from = 1;
 	  data.explicit_inc_from = -1;
 	}
-      if (USE_LOAD_POST_INCREMENT (mode) && ! data.autinc_from)
+      if (USE_LOAD_POST_INCREMENT (mode) && !data.reverse && ! data.autinc_from)
 	{
 	  data.from_addr = copy_addr_to_reg (from_addr);
 	  data.autinc_from = 1;
 	  data.explicit_inc_from = 1;
 	}
       if (!data.autinc_from && CONSTANT_P (from_addr))
Comment 1 Steven Bosscher 2010-01-30 15:33:53 UTC
What is msp430-gcc?
Comment 2 Ivan Shcherbakov 2010-01-30 15:37:30 UTC
(In reply to comment #1)
> What is msp430-gcc?
> 

msp430-gcc is a port of GCC on TI MSP430 platform, that revealed the bug. However, as shown in the patch, the discovered problem is general to GCC4 (will cause problems on all targets having USE_LOAD_PRE_DECREMENT) and is not specific to msp430-gcc.
Comment 3 Richard Biener 2010-01-30 16:01:17 UTC
Patches should be sent to gcc-patches@gcc.gu.org with a ChangeLog entry and
a note how you tested the patch.
Comment 4 Mikael Pettersson 2010-01-30 19:54:34 UTC
(In reply to comment #2)
> However, as shown in the patch, the discovered problem is general to GCC4 (will
> cause problems on all targets having USE_LOAD_PRE_DECREMENT)

The "test case" is syntactically invalid and doesn't do anything to inform the user if it's been miscompiled. Fixing those issues I still don't see anything wrong on ARM (which defines USE_LOAD_PRE_DECREMENT via HAVE_PRE_DECREMENT).
Comment 5 Ivan Shcherbakov 2010-01-30 20:08:12 UTC
Oops, missed a semicolon in the first line of "test case" (compile it with g++).
ARM uses ACCUMULATE_OUTGOING_ARGS instead of PUSH_ARGS. To reproduce the problem on ARM, you need to comment out "#define ACCUMULATE_OUTGOING_ARGS 1" and add "#define PUSH_ARGS 1" in "gcc/config/arm/arm.h", so that func1() will contain several "push" instructions rather than "mov xxx, [sp + yyy]".
To see that the code has been compiled incorrectly, simply examine the assembly output of "func1()", that will push the "copy" to stack starting with first word and ending with the last one (causing the layout in stack to be inverted).
Comment 6 Mikael Pettersson 2010-01-30 21:24:22 UTC
(In reply to comment #5)
> ARM uses ACCUMULATE_OUTGOING_ARGS instead of PUSH_ARGS. To reproduce the
> problem on ARM, you need to comment out "#define ACCUMULATE_OUTGOING_ARGS 1"
> and add "#define PUSH_ARGS 1" in "gcc/config/arm/arm.h"

Doesn't work, an ARM gcc built with that modification ICEs while trying to compile __muldi3() in libgcc2.c.
Comment 7 Ivan Shcherbakov 2010-01-30 21:30:09 UTC
(In reply to comment #6)
> Doesn't work, an ARM gcc built with that modification ICEs while trying to
> compile __muldi3() in libgcc2.c.
You don't need libgcc to reproduce the problem. If gcc build has gone to this step, the xgcc/cc1/cc1plus were successfully built and can be called as "<your-gcc-build-directory>/gcc/xgcc". Just invoke cc1plus from the build/gcc directory directly:
./cc1plus -O2 test.cpp -o test.S
The resulting .S file should be enough to see whether the problem is reproduced on ARM.
Comment 8 Mikael Pettersson 2010-01-30 21:57:49 UTC
(In reply to comment #7)
> (In reply to comment #6)
> > Doesn't work, an ARM gcc built with that modification ICEs while trying to
> > compile __muldi3() in libgcc2.c.
> You don't need libgcc to reproduce the problem. If gcc build has gone to this
> step, the xgcc/cc1/cc1plus were successfully built and can be called as
> "<your-gcc-build-directory>/gcc/xgcc". Just invoke cc1plus from the build/gcc
> directory directly:
> ./cc1plus -O2 test.cpp -o test.S
> The resulting .S file should be enough to see whether the problem is reproduced
> on ARM.

A known broken gcc cannot be used to prove or disprove a bug in gcc.

Comment 9 Ivan Shcherbakov 2010-01-30 22:48:32 UTC
In general 'black box' case - yes. However, here there are 2 factors:
1) It is clear from the sources that 2 similar situations (pre-decrement and post-increment) are handled in a different way, an additional check performed in pre-decrement is forgotten in post-increment mode.
2) The bug is clearly reproduced on the non-standard msp430-gcc target. However, the analysis of the expr.c code (see factor 1) indicates that this is a general GCC problem caused by rarely occurring factor combination, rather than bug in msp430 target implementation. To reproduce it outside this target, you need to combine 2 factors: supported USE_LOAD_POST_INCREMENT and supported PUSH_ARGS.
You can alternatively build the msp430-gcc compiler (see http://mspgcc4.sourceforge.net/, the site provides an automatic downloading & building script) to see a non-broken gcc compiler having the problem (you will need to undo the fix applied by MSPGCC4 build script).

The problem is that "a case not used by mainstream ports is handled incorrectly by platform-independent part of GCC" and the problem now is to prove that the bug is common to gcc and not caused by msp430 port. Running a modified ARM port in debugger (setting breakpoint on mve_by_pieces()) can clearly illustrate, how GCC makes wrong decision to push the structure in a reversed way. The code making this decision (and generating move insns) does not depend on any other global definitions and is not related to the problem in __muldi3().

Finally, the m32c port defines both PUSH_ARGS and HAVE_POST_INCREMENT, so it can be possible to reproduce the bug on this port.
Comment 10 Mikael Pettersson 2010-02-09 08:58:13 UTC
(In reply to comment #9)
> Finally, the m32c port defines both PUSH_ARGS and HAVE_POST_INCREMENT, so it
> can be possible to reproduce the bug on this port.

> cat pr42722.c
struct test {
    int a, b, c, d;
};
void func2(struct test copy);
void func1(struct test copy) {func2(copy);}
> m32c-unknown-elf-gcc -Os -S pr42722.c
> cat pr42722.s 
        .file   "pr42722.c"
.text
        .global _func1
        .type   _func1, @function
_func1:
        enter   #0
        push.w  11[fb]
        push.w  9[fb]
        push.w  7[fb]
        push.w  5[fb]
        jsr.a   _func2
        add.w   #8,sp
        exitd
        .size   _func1, .-_func1
        .ident  "GCC: (GNU) 4.4.3"

which looks correct to me (not knowing m32c asm at all). With a larger struct gcc generates a memcpy() call instead.
Comment 11 Ivan Shcherbakov 2010-02-10 12:44:42 UTC
Ok, m32c uses "memcpy" for all block moves (4 ints is a DImode, not BLKmode). Additionally, it supports pre-decrement addressing mode, that enables a mutually exclusive case to the one showing the bug.
To reproduce the bug on m32c, change the following:
1. in m32c.h replace "#define HAVE_PRE_DECREMENT 1" with 0
2. Comment out "(define_expand "movmemhi"" in blkmov.md
Then, compiling the example with structure containing 5 integers produces this:
	.file	"0.c"
.text
	.global	_func1
	.type	_func1, @function
_func1:
	enter	#0
	push.w	7[fb]
	push.w	5[fb]
	mova	9[fb],a0
	push.w	2[a0]
	push.w	[a0]
	push.w	4[a0]
	jsr.a	_func2
	add.w	#10,sp
	exitd
	.size	_func1, .-_func1
	.ident	"GCC: (GNU) 4.4.3"

Obviously, the push order (fp+7,fp+5,fp+11,fp+9,...) is incorrect.

Note that removing hardcoded "memcpy" call and commenting out pre-increment support was needed ONLY to reproduce the bug normally happening on MSP430 using a supported M32C platform.
Comment 12 Peter A. Bigot 2011-02-21 09:45:15 UTC
Created attachment 23418 [details]
Test case (requires msp430 back end)

Sorry, I don't speak dejagnu well enough yet to put target test conditionals into the code; this test lives in the gcc.target/msp430 directory in my environment.
Comment 13 Peter A. Bigot 2011-02-21 09:45:52 UTC
Created attachment 23419 [details]
Generated code before patch applied
Comment 14 Peter A. Bigot 2011-02-21 09:46:30 UTC
Created attachment 23420 [details]
Generated code after patch applied
Comment 15 Peter A. Bigot 2011-02-21 09:48:23 UTC
Created attachment 23421 [details]
Fixes problem
Comment 16 Peter A. Bigot 2011-02-21 09:50:00 UTC
I've now taken over the msp430 back end and created the attached patch and test case.  This is relative to the gcc trunk as of a couple weeks ago; still present in all releases up to that point.

Sorry it doesn't show up with any other back end, but perhaps an appeal to inspection could work while we get the msp430 target ready for integration:  This is the only conditional of four in the same block that does not check for data.reverse before doing its thing.

It's definitely wrong.