Bug 37514 - [4.4 Regression] Wrong code generated for 20021120-1.c with -O3 -fomit-frame-pointer on sh4
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: rtl-optimization
Version: 4.4.0
Importance: P4 normal
Target Milestone: 4.4.0
Assignee: Not yet assigned to anyone
URL:
Keywords: ra, wrong-code
Depends on:
Blocks:
 
Reported: 2008-09-14 00:36 UTC by Kazumoto Kojima
Modified: 2008-12-02 08:36 UTC
CC List: 4 users

See Also:
Host:
Target: sh-elf
Build:
Known to work:
Known to fail:
Last reconfirmed:


Attachments
a test case (784 bytes, text/plain)
2008-09-14 00:38 UTC, Kazumoto Kojima

Description Kazumoto Kojima 2008-09-14 00:36:10 UTC
gcc.c-torture/execute/20021120-1.c execution fails with -m4 -ml -O3
-fomit-frame-pointer on sh-elf.  gf[0] is wrongly set to 1.0
by foo.  It started to fail after the IRA merge.
Comment 1 Kazumoto Kojima 2008-09-14 00:38:15 UTC
Created attachment 16315
a test case

I've attached a slightly reduced testcase.
Comment 2 Kazumoto Kojima 2008-09-14 00:38:56 UTC
Here is the problematic part of the generated code:

	mov.w	.L240,r7
	add	#4,r1
	mov.w	.L241,r6
	add	#4,r3
	add	r15,r7
	fmov.s	@r7,fr14
	add	r15,r6
	add	#4,r14
	mov.l	.L242,r7
	fmov.s	fr14,@r2
	add	#4,r13
	fmov.s	@r6,fr14
	add	#4,r12
	add	#4,r11
	mov.l	.L243,r6
	fmov.s	fr14,@r2
	...
.L240:
	.short	368
.L241:
	.short	364

where r2 is set to &gf[0].   Thus the float value in [r15+368]
is written to gf[0] and then the float value in [r15+364] is
stored to gf[0] again.  The corresponding rtl dump in .sched1
is:

(insn:HI 828 826 832 6 xxx.c:42 (parallel [
            (set (mem/s/v:SF (reg/f:SI 434) [3 gf+0 S4 A32])
                (reg/v:SF 289 [ f00 ]))
            (use (reg/v:PSI 151 ))
            (clobber (scratch:SI))
        ]) 205 {movsf_ie} (expr_list:REG_DEAD (reg/f:SI 434)
        (expr_list:REG_DEAD (reg/v:SF 289 [ f00 ])
            (nil))))

(insn:HI 832 828 836 6 xxx.c:42 (parallel [
            (set (mem/s/v:SF (reg/f:SI 551) [3 gf+4 S4 A32])
                (reg/v:SF 288 [ f10 ]))
            (use (reg/v:PSI 151 ))
            (clobber (scratch:SI))
        ]) 205 {movsf_ie} (expr_list:REG_DEAD (reg/f:SI 551)
        (expr_list:REG_DEAD (reg/v:SF 288 [ f10 ])
            (nil))))

which looks right.  The corresponding dump in .ira, on the other hand, is:

(insn 2024 826 2025 5 xxx.c:42 (set (reg:SI 7 r7)
        (const_int 368 [0x170])) 175 {movsi_ie} (nil))

(insn 2025 2024 2026 5 xxx.c:42 (set (reg:SI 7 r7)
        (plus:SI (reg:SI 7 r7)
            (reg/f:SI 15 r15))) 35 {*addsi3_compact} (expr_list:REG_EQUIV (plus:SI (reg/f:SI 15 r15)
            (const_int 368 [0x170]))
        (nil)))

(insn 2026 2025 828 5 xxx.c:42 (parallel [
            (set (reg:SF 78 fr14)
                (mem/c:SF (reg:SI 7 r7) [7 f00+0 S4 A32]))
            (use (reg/v:PSI 151 ))
            (clobber (scratch:SI))
        ]) 205 {movsf_ie} (nil))

(insn:HI 828 2026 2028 5 xxx.c:42 (parallel [
            (set (mem/s/v:SF (reg/f:SI 2 r2 [434]) [3 gf+0 S4 A32])
                (reg:SF 78 fr14))
            (use (reg/v:PSI 151 ))
            (clobber (scratch:SI))
        ]) 205 {movsf_ie} (nil))

(insn 2028 828 2029 5 xxx.c:42 (set (reg:SI 6 r6)
        (const_int 364 [0x16c])) 175 {movsi_ie} (nil))

(insn 2029 2028 2030 5 xxx.c:42 (set (reg:SI 6 r6)
        (plus:SI (reg:SI 6 r6)
            (reg/f:SI 15 r15))) 35 {*addsi3_compact} (expr_list:REG_EQUIV (plus:SI (reg/f:SI 15 r15)
            (const_int 364 [0x16c]))
        (nil)))

(insn 2030 2029 832 5 xxx.c:42 (parallel [
            (set (reg:SF 78 fr14)
                (mem/c:SF (reg:SI 6 r6) [8 f10+0 S4 A32]))
            (use (reg/v:PSI 151 ))
            (clobber (scratch:SI))
        ]) 205 {movsf_ie} (nil))

(insn:HI 832 2030 2033 5 xxx.c:42 (parallel [
            (set (mem/s/v:SF (reg/f:SI 2 r2 [434]) [3 gf+4 S4 A32])
                (reg:SF 78 fr14))
            (use (reg/v:PSI 151 ))
            (clobber (scratch:SI))
        ]) 205 {movsf_ie} (nil))

which shows that after IRA both stores, the one to gf+0 and the one to
gf+4, go through r2 (pseudo 434).
I've confirmed that the error goes away with -fno-ira-share-spill-slots.
Comment 3 Kazumoto Kojima 2008-09-14 00:51:20 UTC
I'd like to add Vlad to the CC list.
Comment 4 Kazumoto Kojima 2008-11-10 23:11:34 UTC
Subject: Bug 37514

Author: kkojima
Date: Mon Nov 10 23:10:10 2008
New Revision: 141752

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=141752
Log:
	PR rtl-optimization/37514
	* config/sh/sh.h (OPTIMIZATION_OPTIONS): Set
	flag_ira_share_spill_slots to 2 if it's already non-zero.
	(OVERRIDE_OPTIONS): Clear flag_ira_share_spill_slots if
	flag_ira_share_spill_slots is 2.
	* gcc.target/sh/pr37514.c: New test.


Added:
    trunk/gcc/testsuite/gcc.target/sh/pr37514.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/sh/sh.h
    trunk/gcc/testsuite/ChangeLog

Comment 5 Kazumoto Kojima 2008-11-10 23:25:14 UTC
Currently worked around with the patch in #4.
Comment 6 Vladimir Makarov 2008-11-11 15:22:47 UTC
  Sorry, Kaz.  I missed this PR.  I've just found it after Bernd's email.

  I don't think it is the right solution or a stable workaround.  In fact all pseudos (551, 289, 288) involved in the 2 wrong insns got different stack slots.  It might be that IRA triggered some latent bug in reload inheritance.  Reload decided that r2 contains the value of 551 (gf+4), but it contains gf+0 (pseudo 434, which died in the previous insn 828).

  I'll look at this problem, but taking reload complexity into account it will take a few days.
Comment 7 Vladimir Makarov 2008-11-24 22:26:12 UTC
This is a latent bug in reload inheritance which IRA triggered.  Here
is the important info.  r434 was assigned to hard register 2 and r551
was assigned to memory.  After insn 1164, r434 and r551 are
synchronized and as a consequence the value of r551 is in hard register 2
(see reload1.c::reg_last_reload_reg).  After insn 179, the
pseudo-registers should no longer be synchronized (because of the increment
of r551) but reload does not invalidate reg_last_reload_reg[551].  Normally
this invalidation is done in inc_for_reload by the following code:

  /* No hard register is equivalent to this register after
     inc/dec operation.  If REG_LAST_RELOAD_REG were nonzero,
     we could inc/dec that register as well (maybe even using it for
     the source), but I'm not sure it's worth worrying about.  */
  if (REG_P (incloc))
    reg_last_reload_reg[REGNO (incloc)] = 0;

Here it is not done because incloc is the memory assigned to pseudo-register
r551.  So after insn 179 reg_last_reload_reg[551] is still hard
register 2.  As a result hard register 2 is used for reload 1
in insn 831, which is wrong.

I'll send a patch solving the problem soon.

--------------------------------------------------------------

(insn:HI 1161 25 27 5 a.i:30 (set (reg/f:SI 434)
        (symbol_ref:SI ("gf") <var_decl 0x7f9a1cc6f500 gf>)) 175 {movsi_ie} (nil))

...


(insn:HI 1164 177 179 6 a.i:15 (set (reg/f:SI 551)
        (reg/f:SI 434)) 175 {movsi_ie} (nil))


(insn:HI 179 1164 183 6 a.i:15 (parallel [
            (set (reg/v:SF 289 [ f00 ])
                (mem/s/v:SF (post_inc:SI (reg/f:SI 551)) [2 gf+0 S4 A32]))
            (use (reg/v:PSI 151 ))
            (clobber (scratch:SI))
        ]) 205 {movsf_ie} (expr_list:REG_INC (reg/f:SI 551)
        (expr_list:REG_EQUAL (mem/s/v:SF (symbol_ref:SI ("gf") <var_decl 0x7f9a1cc6f500 gf>) [2 gf+0 S4 A32])
            (nil))))

...

(insn:HI 827 825 831 6 a.i:19 (parallel [
            (set (mem/s/v:SF (reg/f:SI 434) [2 gf+0 S4 A32])
                (reg/v:SF 289 [ f00 ]))
            (use (reg/v:PSI 151 ))
            (clobber (scratch:SI))
        ]) 205 {movsf_ie} (expr_list:REG_DEAD (reg/f:SI 434)
        (expr_list:REG_DEAD (reg/v:SF 289 [ f00 ])
            (nil))))

(insn:HI 831 827 835 6 a.i:19 (parallel [
            (set (mem/s/v:SF (reg/f:SI 551) [2 gf+4 S4 A32])
                (reg/v:SF 288 [ f10 ]))
            (use (reg/v:PSI 151 ))
            (clobber (scratch:SI))
        ]) 205 {movsf_ie} (expr_list:REG_DEAD (reg/f:SI 551)
        (expr_list:REG_DEAD (reg/v:SF 288 [ f10 ])
            (nil))))


Spilling for insn 827.
Using reg 7 for reload 0
Using reg 78 for reload 2
Spilling for insn 831.
Using reg 2 for reload 0
Using reg 2 for reload 1
Using reg 2 for reload 2
Using reg 78 for reload 4

Reloads for insn # 827
...
Reload 1: reload_out (SF) = (mem/s/v:SF (reg/f:SI 2 r2 [434]) [2 gf+0 S4 A32])
	NO_REGS, RELOAD_FOR_OUTPUT (opnum = 0), optional
	reload_out_reg: (mem/s/v:SF (reg/f:SI 2 r2 [434]) [2 gf+0 S4 A32])
...

Reloads for insn # 831
...
Reload 1: reload_in (SI) = (mem/c:SI (plus:SI (plus:SI (reg/f:SI 15 r15)
                                                            (const_int 764 [0x2fc]))
                                                        (const_int 52 [0x34])) [5 %sfp+-196 S4 A32])
	GENERAL_REGS, RELOAD_FOR_OPERAND_ADDRESS (opnum = 0), can't combine
	reload_in_reg: (reg/f:SI 551)
	reload_reg_rtx: (reg/f:SI 2 r2 [434])
...
Comment 8 Kazumoto Kojima 2008-11-26 00:03:04 UTC
Vlad, thanks for taking the time to look into this!  Your comments
in #7 and http://gcc.gnu.org/ml/gcc-patches/2008-11/msg01323.html
give a very clear picture of the problem.
Comment 9 hjl@gcc.gnu.org 2008-12-01 17:04:42 UTC
Subject: Bug 37514

Author: hjl
Date: Mon Dec  1 17:03:13 2008
New Revision: 142324

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=142324
Log:
2008-12-01  Vladimir Makarov  <vmakarov@redhat.com>

	PR rtl-optimization/37514
	* reload1.c (reload_as_needed): Invalidate reg_last_reload
	from previous insns.

Modified:
    branches/ira-merge/gcc/ChangeLog.ira
    branches/ira-merge/gcc/reload1.c

Comment 10 Vladimir Makarov 2008-12-01 19:33:03 UTC
Subject: Bug 37514

Author: vmakarov
Date: Mon Dec  1 19:31:41 2008
New Revision: 142328

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=142328
Log:
2008-12-01  Vladimir Makarov  <vmakarov@redhat.com>

	PR rtl-optimization/37514
	* reload1.c (reload_as_needed): Invalidate reg_last_reload
	from previous insns.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/reload1.c

Comment 11 Kazumoto Kojima 2008-12-02 08:36:48 UTC
Fixed.