This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: Revision 144098 (d.d. Wed Feb 11 08:56:41 2009 UTC (4 weeks ago)) is not a regression bug fix.
Toon Moene wrote:
> Paolo Bonzini wrote:
>
>>> Attached you'll find the (preprocessed) source of the routine that
>>> printed the Infinity's (of course, I cannot be completely certain that
>>> it actually resulted in the wrong code, but at least it might be studied
>>> to see if it helps to find the culprit).
>>
>> No, this function is sane (the peephole *is* called a lot by this
>> function, but all is in due order). I looked at the dumps and assembly
>> for -O2, -O3 and -O3 -fno-schedule-insns (*), and all is as expected.
>
> Yeah, it was probably too much to hope for.
No, you were right, and that's great. -ffast-math makes a difference,
because it enables more vectorization.
It goes like this:
(insn 494 493 495 44 statin.f:703 (set (reg:SF 371)
        (vec_select:SF (reg:V4SF 367)
            (parallel [
                    (const_int 0 [0x0])
                ]))) 1408 {*vec_extractv4sf_0} (expr_list:REG_DEAD
        (reg:V4SF 367)
        (nil)))
Registers 371 and 367 are coalesced into xmm0. Then the vec_select is
split to just
(set (reg:SF 21 [orig: 371]) (reg:SF 21 [orig: 367]))
and these are indeed != as pointers (they are two distinct rtx objects),
but they have the same hard register number, so the peephole should not
apply in this case. Here is a minimized testcase:
      subroutine statin(x,y,pstratr,pconvecr,zhxy,zhxhy,ztmp)
      integer :: x,y
      real pstratr(x,y),pconvecr(x,y),zhxy(x,y)
      real ztmp(4)
      do j = 1,y
         do i = 1,x-2
            zttotrainr = zttotrainr + (pstratr(i,j) + pconvecr(i,j))*zhxy(i,j)
            ztstratr = ztstratr + pstratr(i,j)
            ztconvecr = ztconvecr + pconvecr(i,j)
            ztsenf = ztsenf + zhxy(i,j)
            ztlatf = ztlatf + zhxy(i,j)
            ztcldtop = ztcldtop + zhxy(i,j)
         enddo
      enddo
      ztmp(1)=zttotrainr
      ztmp(2)=ztstratr
      ztmp(3)=ztconvecr
      ztmp(4)=ztsenf*ztlatf*ztcldtop
      end
The following patch should fix it; you're welcome to run it through
HIRLAM. I'm bootstrapping it in the meantime.
Index: gcc/config/i386/i386.md
===================================================================
--- gcc/config/i386/i386.md (revision 144464)
+++ gcc/config/i386/i386.md (working copy)
@@ -20795,7 +20795,7 @@
[(match_dup 0)
(match_operand:SI 2 "memory_operand" "")]))
(clobber (reg:CC FLAGS_REG))])]
- "operands[0] != operands[1]
+ "!rtx_equal_p (operands[0], operands[1])
&& GENERAL_REGNO_P (REGNO (operands[0]))
&& GENERAL_REGNO_P (REGNO (operands[1]))"
[(set (match_dup 0) (match_dup 4))
@@ -20811,7 +20811,7 @@
(match_operator 3 "commutative_operator"
[(match_dup 0)
(match_operand 2 "memory_operand" "")]))]
- "operands[0] != operands[1]
+ "!rtx_equal_p (operands[0], operands[1])
&& ((MMX_REG_P (operands[0]) && MMX_REG_P (operands[1]))
|| (SSE_REG_P (operands[0]) && SSE_REG_P (operands[1])))"
[(set (match_dup 0) (match_dup 2))
Paolo