Bug 33699 - [12/13/14/15 regression] missing optimization on const addr area store
Summary: [12/13/14/15 regression] missing optimization on const addr area store
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end
Version: 4.3.0
Importance: P2 normal
Target Milestone: 12.5
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2007-10-08 16:54 UTC by fshvaige
Modified: 2024-07-19 12:54 UTC
CC List: 8 users

See Also:
Host:
Target: mips*-* powerpc*-*-* x86_64-*-*
Build:
Known to work: 3.4.0
Known to fail: 4.0.0, 4.1.3, 4.2.2, 4.3.0, 4.6.0, 6.3.0, 7.0
Last reconfirmed: 2018-02-01 00:00:00


Description fshvaige 2007-10-08 16:54:26 UTC
Same problem for
-Os/-O2
version 4.1.0
etc...


[Code]

typedef unsigned * ptr_t;
void f (void) {
    ptr_t p = (ptr_t)0xFED0;
    p[0] = 0xDEAD;
    p[2] = 0xDEAD;
    p[4] = 0xDEAD;
    p[6] = 0xDEAD;
}


[Assembly generated by version gcc-4.3-20071005]

00000000 <f>:
   0:	3404dead 	li	a0,0xdead
   4:	3402fee8 	li	v0,0xfee8
   8:	3403fed0 	li	v1,0xfed0
   c:	ac440000 	sw	a0,0(v0)
  10:	ac640000 	sw	a0,0(v1)
  14:	3402fed8 	li	v0,0xfed8
  18:	3403fee0 	li	v1,0xfee0
  1c:	ac440000 	sw	a0,0(v0)
  20:	03e00008 	jr	ra
  24:	ac640000 	sw	a0,0(v1)


[Assembly generated by version 3.4.5 (seems better)]

00000000 <f>:
   0:	3403fed0 	li	v1,0xfed0
   4:	3402dead 	li	v0,0xdead
   8:	ac620018 	sw	v0,24(v1)
   c:	ac620000 	sw	v0,0(v1)
  10:	ac620008 	sw	v0,8(v1)
  14:	03e00008 	jr	ra
  18:	ac620010 	sw	v0,16(v1)
  1c:	00000000 	nop


[Version]

Using built-in specs.
Target: mips-elf
Configured with: ../gcc-4.3-20071005/configure --enable-languages=c,c++ --prefix=/auto/mipaproj/fshvaige/apps/Linux/gcc-4.3-20071005 --target=mips-elf --program-suffix=.mips --without-headers --with-newlib
Thread model: single
gcc version 4.3.0 20071005 (experimental) (GCC) 


[Command line options]

gcc.mips -c -o main.o -v -save-temps -O3 -march=mips64 -mabi=eabi -mexplicit-relocs main.c
Comment 1 Andrew Pinski 2007-12-26 01:33:50 UTC
The issue here is that we treat constants as free in the first place and are then unable to decompose them later on, at the RTL level.
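
As an illustration of the decomposition being asked for (this is not GCC code): the four store addresses in the testcase differ only by small amounts, so only one of them needs to be materialized and the others can be written as base plus displacement. The sketch below assumes a signed 16-bit displacement field, as in the MIPS sw instruction; the function name decompose_from_first is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only.
   Tries to express every address in addrs[] as addrs[0] + offs[i],
   where offs[i] must fit a signed 16-bit displacement (assumed width).
   Returns false as soon as one address is out of range.  */
bool decompose_from_first(const uint32_t *addrs, size_t n, int32_t *offs)
{
    assert(addrs != NULL && offs != NULL);

    for (size_t i = 0; i < n; i++) {
        int64_t d = (int64_t)addrs[i] - (int64_t)addrs[0];
        if (d < INT16_MIN || d > INT16_MAX)
            return false;
        offs[i] = (int32_t)d;
    }
    return true;
}
```

For the addresses 0xFED0, 0xFED8, 0xFEE0 and 0xFEE8 this yields the displacements 0, 8, 16 and 24, which are the ones used in the 3.4.5 listing in the description.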
Comment 2 Steven Bosscher 2008-01-07 18:24:00 UTC
This is related to some work done in the past for auto-increment addressing modes (even though there are no auto-inc/dec modes in the reporter's assembly).  See one of Joern's old patches: http://gcc.gnu.org/ml/gcc-patches/2005-02/msg01612.html

Look at the comment before optimize_related_value() to understand what this patch is supposed to achieve.  Let's not talk about how it achieved this -- it suffices to say that the patch is not in the trunk -- but we really do need a pass over RTL to optimize this kind of thing.
Comment 3 Joseph S. Myers 2008-07-04 22:18:43 UTC
Closing 4.1 branch.
Comment 4 Jorn Wolfgang Rennecke 2009-02-08 12:49:33 UTC
(In reply to comment #2)
> This is related to some work done in the past for auto-increment addressing
> modes

Actually, the problem with constants that are loaded into registers -
and in the same basic block, at that - is much simpler.
If the target's rtx_cost works properly, then reload_cse_move2add should
fix up this code.

We need, however, some way to deal with the case where constants are expensive
addresses; this is completely broken at the moment.  Complete unrolling of
loops accessing static arrays can create oodles of constant addresses; I've
managed to split these up with LEGITIMIZE_ADDRESS, the movsi expander, and
a patch to memory_address; however, gcse just recombines the costly constants,
irrespective of what rtx_cost and address_cost say.
And the havoc that gcse can wreak transcends basic blocks, so any attempt to
clean up after it with lesser scope is bound to be inferior.
Comment 5 Joseph S. Myers 2009-03-31 20:12:29 UTC
Closing 4.2 branch.
Comment 6 Adam Nemet 2009-05-28 07:43:13 UTC
Subject: Bug 33699

Author: nemet
Date: Thu May 28 07:42:52 2009
New Revision: 147944

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=147944
Log:
	PR middle-end/33699
	* target.h (struct gcc_target): Fix indentation.  Add
	const_anchor.
	* target-def.h (TARGET_CONST_ANCHOR): New macro.
	(TARGET_INITIALIZER): Use it.
	* cse.c (CHEAPER): Move it up to the other macros.
	(insert): Rename this ...
	(insert_with_costs): ... to this.  Add cost parameters.  Update
	function comment.
	(insert): New function.  Call insert_with_costs.
	(compute_const_anchors, insert_const_anchor, insert_const_anchors,
	find_reg_offset_for_const, try_const_anchors): New functions.
	(cse_insn): Call try_const_anchors.  Adjust cost of src_related
	when using a const-anchor.  Call insert_const_anchors.
	* config/mips/mips.c (mips_set_mips16_mode): Set
	targetm.const_anchor.
	* doc/tm.texi (Misc): Document TARGET_CONST_ANCHOR.

testsuite/
	* gcc.target/mips/const-anchor-1.c: New test.
	* gcc.target/mips/const-anchor-2.c: New test.

Added:
    trunk/gcc/testsuite/gcc.target/mips/const-anchor-1.c
    trunk/gcc/testsuite/gcc.target/mips/const-anchor-2.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/mips/mips.c
    trunk/gcc/cse.c
    trunk/gcc/doc/tm.texi
    trunk/gcc/target-def.h
    trunk/gcc/target.h
    trunk/gcc/testsuite/ChangeLog

Comment 7 Adam Nemet 2009-05-28 07:49:14 UTC
Note that the above patch does not yet fix the testcase.  Besides this patch we need some more cost adjustments and also some changes in fwprop to propagate into the address expression.
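
For readers unfamiliar with the const-anchor idea behind the patch in comment 6, here is a minimal sketch of the arithmetic as I understand it: a constant is related to the nearest multiples of an anchor value below and above it, so that nearby constants can share a register holding the anchor. The anchor value 0x8000 used in the note below, the struct and the function name are assumptions for illustration; this is not the code in cse.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type: the two candidate anchors for a constant
   and the offset of the constant from each of them.  */
typedef struct {
    int64_t lower_base, lower_offs;
    int64_t upper_base, upper_offs;
} const_anchors;

/* Illustration only.  ANCHOR must be a power of two.  Returns false
   (leaving *out untouched) when N is itself a multiple of ANCHOR, in
   which case there is nothing to decompose.  */
bool compute_anchors_sketch(int64_t n, int64_t anchor, const_anchors *out)
{
    assert(anchor > 0 && (anchor & (anchor - 1)) == 0);

    int64_t lower = n & ~(anchor - 1);   /* round down to a multiple */
    if (lower == n)
        return false;

    out->lower_base = lower;
    out->lower_offs = n - lower;          /* positive offset */
    out->upper_base = lower + anchor;     /* next multiple up */
    out->upper_offs = n - out->upper_base; /* negative offset */
    return true;
}
```

With an assumed anchor of 0x8000, the constant 0xFED0 decomposes as 0x8000 + 0x7ED0 or as 0x10000 - 304, and the other three addresses of the testcase fall between the same two anchors.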
Comment 8 Richard Biener 2009-08-04 12:28:24 UTC
GCC 4.3.4 is being released, adjusting target milestone.
Comment 9 Andrew Pinski 2010-03-12 23:56:39 UTC
PowerPC has the same issue.  X86 does not because its move instruction can take a constant address.
Comment 10 Richard Biener 2010-05-22 18:11:42 UTC
GCC 4.3.5 is being released, adjusting target milestone.
Comment 11 Richard Biener 2011-03-04 12:12:27 UTC
Even on x86 it's smaller to not replicate 0xDEAD and to use
offset addressing, like

0000000000000000 <f>:
   0: b8 d0 fe 00 00            mov    $0xfed0,%eax
   5: bb ad de 00 00            mov    $0xdead,%ebx
   a: 89 18                     mov    %ebx,(%rax)
   c: 89 58 08                  mov    %ebx,0x8(%rax)
   f: 89 58 10                  mov    %ebx,0x10(%rax)
  12: 89 58 18                  mov    %ebx,0x18(%rax)
  15: c3                        retq   

instead of the generated

   0: c7 04 25 d0 fe 00 00      movl   $0xdead,0xfed0
   7: ad de 00 00 
   b: c7 04 25 d8 fe 00 00      movl   $0xdead,0xfed8
  12: ad de 00 00 
  16: c7 04 25 e0 fe 00 00      movl   $0xdead,0xfee0
  1d: ad de 00 00 
  21: c7 04 25 e8 fe 00 00      movl   $0xdead,0xfee8
  28: ad de 00 00 
  2c: c3                        retq
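
To put a number on "smaller", the instruction lengths can be read off the two listings above and summed. The sketch below does just that; the array contents are taken from the byte offsets in the listings, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Instruction lengths in bytes, read off the listings above.  */
static const unsigned char reg_form_lens[] = {5, 5, 2, 3, 3, 3, 1};
static const unsigned char imm_form_lens[] = {11, 11, 11, 11, 1};

/* Sums N instruction lengths.  */
static size_t total_bytes(const unsigned char *lens, size_t n)
{
    assert(lens != NULL);

    size_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += lens[i];
    return sum;
}

/* Size of the variant that keeps address and value in registers.  */
size_t reg_form_size(void)
{
    return total_bytes(reg_form_lens, sizeof reg_form_lens);
}

/* Size of the variant that encodes address and value in every store.  */
size_t imm_form_size(void)
{
    return total_bytes(imm_form_lens, sizeof imm_form_lens);
}
```

That gives 22 bytes for the register-based sequence against 45 bytes for the generated one, at the price of two registers.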
Comment 12 Michael Matz 2011-03-04 15:33:04 UTC
Smaller perhaps, but it uses two registers, where it originally used none.
For x86 that's the better tradeoff.
Comment 13 Richard Biener 2011-06-27 12:14:26 UTC
4.3 branch is being closed, moving to 4.4.7 target.
Comment 14 Jakub Jelinek 2012-03-13 12:47:52 UTC
4.4 branch is being closed, moving to 4.5.4 target.
Comment 15 Richard Biener 2012-07-02 11:48:47 UTC
The 4.5 branch is being closed, adjusting target milestone.
Comment 16 Andrew Pinski 2012-12-31 10:09:07 UTC
(In reply to comment #12)
> Smaller perhaps, but it uses two registers, where it originally used none.
> For x86 that's the better tradeoff.

Except for the obvious -Os.
Comment 17 Jakub Jelinek 2013-04-12 15:17:00 UTC
GCC 4.6.4 has been released and the branch has been closed.
Comment 18 Richard Biener 2014-06-12 13:46:53 UTC
The 4.7 branch is being closed, moving target milestone to 4.8.4.
Comment 19 Jakub Jelinek 2014-12-19 13:36:14 UTC
GCC 4.8.4 has been released.
Comment 20 Richard Biener 2015-06-23 08:19:02 UTC
The gcc-4_8-branch is being closed, re-targeting regressions to 4.9.3.
Comment 21 Jakub Jelinek 2015-06-26 19:59:26 UTC
GCC 4.9.3 has been released.
Comment 22 Richard Biener 2016-08-03 10:43:14 UTC
GCC 4.9 branch is being closed
Comment 23 Martin Sebor 2017-01-13 17:40:19 UTC
No progress in GCC 7.0 which emits the following code for powerpc64le at -O2 (-Os is slightly different but the same size):

0000000000000000 <f>:
   0:	00 00 20 39 	li      r9,0
   4:	00 00 c0 38 	li      r6,0
   8:	00 00 e0 38 	li      r7,0
   c:	00 00 00 39 	li      r8,0
  10:	00 00 40 39 	li      r10,0
  14:	ad de 29 61 	ori     r9,r9,57005
  18:	d0 fe c6 60 	ori     r6,r6,65232
  1c:	d8 fe e7 60 	ori     r7,r7,65240
  20:	e0 fe 08 61 	ori     r8,r8,65248
  24:	e8 fe 4a 61 	ori     r10,r10,65256
  28:	00 00 26 91 	stw     r9,0(r6)
  2c:	00 00 27 91 	stw     r9,0(r7)
  30:	00 00 28 91 	stw     r9,0(r8)
  34:	00 00 2a 91 	stw     r9,0(r10)
  38:	20 00 80 4e 	blr


Clang in contrast emits the following more compact code:

0000000000000000 <f>:
   0:	00 00 60 3c 	lis     r3,0
   4:	00 00 80 38 	li      r4,0
   8:	01 00 a0 3c 	lis     r5,1
   c:	ad de 63 60 	ori     r3,r3,57005
  10:	d0 fe 84 60 	ori     r4,r4,65232
  14:	d0 fe 65 90 	stw     r3,-304(r5)
  18:	08 00 64 90 	stw     r3,8(r4)
  1c:	10 00 64 90 	stw     r3,16(r4)
  20:	18 00 64 90 	stw     r3,24(r4)
  24:	20 00 80 4e 	blr
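
As a cross-check of the Clang listing, the sketch below recomputes the four store addresses from the register values and the signed 16-bit displacements shown there. The helper names are hypothetical and the instruction semantics are modelled only as far as this example needs (lis as a shifted sign-extended immediate, li 0 followed by ori as a plain 16-bit constant).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Value produced by "lis rX,imm": the immediate shifted left by 16.  */
static uint64_t lis(int16_t imm)
{
    return (uint64_t)((int64_t)imm * 65536);
}

/* Base register value plus sign-extended 16-bit displacement.  */
static uint64_t effective_address(uint64_t base, int16_t disp)
{
    return base + (uint64_t)(int64_t)disp;
}

/* Fills OUT with the addresses stored to by the Clang listing,
   in listing order.  */
void clang_listing_addresses(uint64_t out[4])
{
    assert(out != NULL);

    uint64_t r5 = lis(1);           /* lis r5,1 */
    uint64_t r4 = 0 | 65232u;       /* li r4,0 ; ori r4,r4,65232 */

    out[0] = effective_address(r5, -304);  /* stw r3,-304(r5) */
    out[1] = effective_address(r4, 8);     /* stw r3,8(r4)    */
    out[2] = effective_address(r4, 16);    /* stw r3,16(r4)   */
    out[3] = effective_address(r4, 24);    /* stw r3,24(r4)   */
}
```

The result is 0xFED0, 0xFED8, 0xFEE0 and 0xFEE8, i.e. the same four stores as the testcase, using two address registers and 10 instructions where the GCC listing uses four address registers and 15 instructions.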
Comment 24 Jakub Jelinek 2017-10-10 13:28:04 UTC
GCC 5 branch is being closed
Comment 25 Aldy Hernandez 2018-02-01 20:27:08 UTC
(In reply to Martin Sebor from comment #23)
> No progress in GCC 7.0 which emits the following code for powerpc64le at -O2
> (-Os is slightly different but the same size):

Same thing on mainline still.
Comment 26 Jakub Jelinek 2018-04-06 07:04:01 UTC
(In reply to Michael Matz from comment #12)
> Smaller perhaps, but it uses two registers, where it originally used none.
> For x86 that's the better tradeoff.

That can be handled by doing it in some very late post-RA pass, and only do it if we can find a usable register for that.
Comment 27 Jakub Jelinek 2018-10-26 10:13:21 UTC
GCC 6 branch is being closed
Comment 28 Richard Biener 2019-11-14 07:58:22 UTC
The GCC 7 branch is being closed, re-targeting to GCC 8.4.
Comment 29 Jakub Jelinek 2020-03-04 09:39:36 UTC
GCC 8.4.0 has been released, adjusting target milestone.
Comment 30 Jakub Jelinek 2021-05-14 09:45:52 UTC
GCC 8 branch is being closed.
Comment 31 Richard Biener 2021-06-01 08:04:34 UTC
GCC 9.4 is being released, retargeting bugs to GCC 9.5.
Comment 32 Michael Meissner 2021-07-08 17:16:16 UTC
I looked at adding the following powerpc patch that was proposed in March, 2021:
https://gcc.gnu.org/pipermail/gcc-patches/2021-March/566744.html

There are two parts to the patch, that are sort of unrelated.

The first part is to add minimum and maximum section anchor offset values and use -fsection-anchors.  I ran a SPEC 2017 benchmark on a pre-production power10 system, comparing my normal run times to run times with -fsection-anchors and setting the minimum/maximum section anchor offsets.

Two benchmarks improved and two benchmarks regressed:

    xalancbmk_r: 1.75% regression
    cactuBSSN_r: 4.24% improvement
    blender_r: 1.92% regression
    roms_r: 1.05% improvement

I then built SPEC 2017 with just the part of setting const_anchor, but not the section anchor minimum/maximum offsets.  Eight benchmarks did not build due to assertion failures in cse.c:

    gcc_r
    exchange2_r
    cactuBSSN_r
    wrf_r
    blender_r
    cam4_r
    fotonik3d_r
    roms_r

If I specify the section anchor minimum/maximum offsets, add -fsection-anchors, and set the const_anchor, all 23 INT+FP benchmarks build, but wrf_r does not run correctly.  So without more debugging, I don't recommend setting const_anchor.  It is probably useful to set the minimum/maximum section anchor offsets in case people use -fsection-anchors.

As an aside, if we wanted to accept using constant addresses in the PowerPC, we would need to recognize a constant address as being legitimate.  This may be useful in some embedded environments where you have devices at certain memory locations.  But somebody would need to add the support.
Comment 33 Richard Biener 2022-05-27 09:33:49 UTC
GCC 9 branch is being closed
Comment 34 Jakub Jelinek 2022-06-28 10:29:29 UTC
GCC 10.4 is being released, retargeting bugs to GCC 10.5.
Comment 35 Richard Biener 2023-07-07 10:28:55 UTC
GCC 10 branch is being closed.
Comment 36 Richard Biener 2024-07-19 12:54:21 UTC
GCC 11 branch is being closed.