Bug 60604 - [4.9 Regression] GCC incorrectly compiles s_csinh function on MIPS32 (32bit fp)
Summary: [4.9 Regression] GCC incorrectly compiles s_csinh function on MIPS32 (32bit fp)
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 4.9.0
: P3 normal
Target Milestone: 4.9.0
Assignee: Not yet assigned to anyone
URL:
Keywords: wrong-code
Depends on:
Blocks:
 
Reported: 2014-03-20 17:25 UTC by Steve Ellcey
Modified: 2014-04-01 10:45 UTC (History)
3 users (show)

See Also:
Host:
Target: mips*-*-* (o32/eabi32)
Build:
Known to work:
Known to fail:
Last reconfirmed: 2014-03-30 00:00:00


Attachments
Test case (610 bytes, text/x-csrc)
2014-03-20 17:25 UTC, Steve Ellcey
Details
New reduced test case (522 bytes, text/x-csrc)
2014-03-21 23:45 UTC, Steve Ellcey
Details
Tentative patch (1.25 KB, patch)
2014-03-30 21:40 UTC, Richard Sandiford
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Steve Ellcey 2014-03-20 17:25:39 UTC
Created attachment 32410 [details]
Test case

While looking at a GCC fortran test failure (gfortran.dg/complex_intrinsic_3.f90) I tracked the problem down to GCC miscompiling the routine.  I have attached a cutdown test case to this report.

When compiled for mips32r2, big-endian, hard-float, and using -O0 or -O1
the attached program prints:

x = 0.634964 1.298458
rcls = 4, icls = 4, t = 709
In __csinh(1), 0.634964 1.29846
In __csinh(2), 0.634964 1.29846
x = 0.000000 0.000000

When compiled with -O2 or -O3 it prints:

x = 0.634964 1.298458
rcls = 4, icls = 4, t = 709
In __csinh(1), 0.634964 1.29846
In __csinh(2), 1.00959e+116 1.29846
In if statement
x = 0.000000 0.000000

Somehow the optimized code is corrupting the input value of csinh.
Comment 1 Steve Ellcey 2014-03-21 23:45:54 UTC
Created attachment 32428 [details]
New reduced test case

Here is a new reduced test case that calls no libm functions.  I am pretty sure that the bug is in the handling of __builtin_fabs and specifically in expand_absneg_bit where we use the MIPS exp instruction to genreate a floating point absolute value.  I am not sure exactly what the problem is, but if I have this routine always return NULL_RTX and use a differentr method of computing a floating point absolute value then my program works.

I thought it might be a simulator issue since I do most of my testing with qemu but I ran the executable on hardware and got failures there.

The new reduced test case prints with -O0 or -O1 prints:

x = 0.634964 1.298458
In __csinh(1), 0.634964 1.29846
In __csinh(2), 0.634964 1.29846
x = 0.000000 0.000000

With -O2 or -O3 it prints:

x = 0.634964 1.298458
In __csinh(1), 0.634964 1.29846
In __csinh(2), 1.00959e+116 1.29846
In if statement
x = 0.000000 0.000000
Comment 2 Steve Ellcey 2014-03-21 23:54:28 UTC
Richard and Andrew, I hope you don't mind me adding you to the CC list for this bug but I am curious if you can reproduce this bug and if you have any ideas on what could be going wrong.
Comment 3 Andrew Pinski 2014-03-22 00:48:40 UTC
The code generation looks incorrect to me:

	mfc1	$3,$f12
	mfc1	$2,$f12
	move	$17,$3
	jal	myclassify
	ext	$16,$2,0,31


Notice how $2/$3 are moving from the same register, one of them should have been $f13.


The problem is due to paired floating point registers are swapped for big-endian.
/* Paired FPRs are always ordered little-endian.  */

So when the register allocator is figuring out which register to copy from for the high subreg of the DF mode:
(insn 23 22 24 (set (subreg:SI (reg:DF 200 [ D.2940 ]) 0)
        (and:SI (subreg:SI (reg:DF 194 [ D.2940 ]) 0)
            (const_int 2147483647 [0x7fffffff]))) t77.c:25 -1
     (nil))

It decides that is the same as the register which is incorrect for pair float.

This is what the register allocator produces:

(insn 110 8 23 2 (set (reg:SI 2 $2)
        (reg:SI 44 $f12)) t77.c:25 302 {*movsi_internal}
     (nil))
(insn 23 110 111 2 (set (reg:SI 16 $16 [ D.2940 ])
        (and:SI (reg:SI 2 $2)
            (const_int 2147483647 [0x7fffffff]))) t77.c:25 157 {*andsi3}
     (nil))

This is incorrect as we should be using $f13 or the full DI mode instead.
Comment 4 Steve Ellcey 2014-03-24 23:10:29 UTC
I see what you mean about the bad code and if I change it by hand (copying $2 to $f13 instead of $f12 then the code does work.  I am not sure how to fix the register allocator though.  I thought maybe REG_WORDS_BIG_ENDIAN would do what I wanted but if I set it to '0', then my build aborts.
Comment 5 Steve Ellcey 2014-03-25 22:12:18 UTC
I have tracked this problem down to this checkin:

commit 284f069678f0b28c57e62b5da9b6dfed77d4d700
Author: vmakarov <vmakarov@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Wed Oct 30 14:27:25 2013 +0000

    2013-10-30  Vladimir Makarov  <vmakarov@redhat.com>
   
        * regmove.c: Remove.
        * tree-pass.h (make_pass_regmove): Remove.
        * timevar.def (TV_REGMOVE): Remove.
        (etc.......)


Note that after this checkin GCC for mips does not build, it gets a 
seg fault while compiling the run-time libraries.  I did verify that
the cc1 that we have with this checkin generates the bad code though (moving $f12 into $2 and $3, instead of putting $f12 into $3 and $f13 into $2).

The complete build starts working again with 682e13bb33f851d8d89507918abf1da1a554c5fc and that version also gives us bad code.

Before this patch GCC did not use $f12 and $f13 at all so the issue of moving them incorrectly does not come up.

It looks like this is new for 4.9 so I am going to mark this as a 4.9 regression.
Comment 6 Steve Ellcey 2014-03-27 16:43:46 UTC
I think the underlying bug here is the code we generate for builtin_fabs.
In emit-rtl.c (validate_subreg) I see this comment:

  /* Subregs involving floating point modes are not allowed to
     change size.  Therefore (subreg:DI (reg:DF) 0) is fine, but
     (subreg:SI (reg:DF) 0) isn't.  */

But on mips, builtin_fabs calls expand_absneg_bit and that generates
this RTL:

;; _11 = ABS_EXPR <_4>;

(insn 26 25 27 (set (subreg:SI (reg:DF 199 [ D.3267 ]) 0)
        (and:SI (subreg:SI (reg:DF 194 [ D.3267 ]) 0)
            (const_int 2147483647 [0x7fffffff]))) x.c:25 -1
     (nil))

(insn 27 26 0 (set (subreg:SI (reg:DF 199 [ D.3267 ]) 4)
        (subreg:SI (reg:DF 194 [ D.3267 ]) 4)) x.c:25 -1
     (nil))

Which would seem to violate the restrictions in the comment.
Comment 7 Steve Ellcey 2014-03-27 18:42:19 UTC
I didn't notice that just before the emit-rtl.c (validate_subregs) comment
that says:

  /* Subregs involving floating point modes are not allowed to
     change size.  Therefore (subreg:DI (reg:DF) 0) is fine, but
     (subreg:SI (reg:DF) 0) isn't.  */


Is another comment (and code) that has:

  /* ??? This should not be here.  Temporarily continue to allow word_mode
     subregs of anything.  The most common offender is (subreg:SI (reg:DF)).
     Generally, backends are doing something sketchy but it'll take time to
     fix them all.  */
  if (omode == word_mode)
    ;

This is why we are creating the 'bad' subregs.
Comment 8 Richard Sandiford 2014-03-30 21:38:56 UTC
The subregs on pseudos seem fine.  I think the problem comes when
IRA substitutes $f12 into them.  I.e.:

(insn 26 8 27 2 (set (subreg:SI (reg:DF 199 [ D.3267 ]) 0)
        (and:SI (subreg:SI (reg/v:DF 205 [ x ]) 0)
            (const_int 2147483647 [0x7fffffff]))) /home/richards/Downloads/reduced.c:25 157 {*andsi3}
     (nil))
(insn 27 26 9 2 (set (subreg:SI (reg:DF 199 [ D.3267 ]) 4)
        (subreg:SI (reg/v:DF 205 [ x ]) 4)) /home/richards/Downloads/reduced.c:25 286 {*movsi_internal}
     (nil))

becomes:

(insn 26 8 27 2 (set (subreg:SI (reg:DF 199 [ D.3267 ]) 0)
        (and:SI (subreg:SI (reg:DF 44 $f12) 0)
            (const_int 2147483647 [0x7fffffff]))) /home/richards/Downloads/reduced.c:25 157 {*andsi3}
     (nil))
(insn 27 26 9 2 (set (subreg:SI (reg:DF 199 [ D.3267 ]) 4)
        (subreg:SI (reg:DF 44 $f12) 4)) /home/richards/Downloads/reduced.c:25 286 {*movsi_internal}
     (nil))

These operands would normally be invalid register_operands thanks to:

#ifdef CANNOT_CHANGE_MODE_CLASS
      if (REG_P (sub)
	  && REGNO (sub) < FIRST_PSEUDO_REGISTER
	  && REG_CANNOT_CHANGE_MODE_P (REGNO (sub), GET_MODE (sub), mode)
	  && GET_MODE_CLASS (GET_MODE (sub)) != MODE_COMPLEX_INT
	  && GET_MODE_CLASS (GET_MODE (sub)) != MODE_COMPLEX_FLOAT
	  /* LRA can generate some invalid SUBREGS just for matched
	     operand reload presentation.  LRA needs to treat them as
	     valid.  */
	  && ! LRA_SUBREG_P (op))
	return 0;
#endif

The problem is that general_operand and nonmemory_operand don't check
the same thing.

You could argue that reload should be able to cope correctly with the
"invalid" subregs, and maybe LRA would, but really we shouldn't create
insns that we know are going to need a reload.

The problem is, general_operand, nonmemory_operand and register_operand
all have slightly different ideas what a register operand is.  So although
the patch I'm about to attach seems to fix it, it might be too drastic
for this stage.
Comment 9 Richard Sandiford 2014-03-30 21:40:01 UTC
Created attachment 32491 [details]
Tentative patch
Comment 10 Richard Biener 2014-03-31 09:48:27 UTC
no 32bit mips target in the primary/secondary target list -> P4.
Comment 11 Andrew Pinski 2014-03-31 10:54:32 UTC
(In reply to Richard Biener from comment #10)
> no 32bit mips target in the primary/secondary target list -> P4.

But it fails with -mabi=o32 on mips64 so asking to reconsider the priority.
Comment 12 Richard Biener 2014-03-31 11:15:47 UTC
ABI changing option makes it a differen triplet IMHO (but it's always a hard
guess as to exactly what variants/multilibs are supposed to be in this list).
Comment 13 Richard Sandiford 2014-04-01 10:38:51 UTC
Author: rsandifo
Date: Tue Apr  1 10:38:19 2014
New Revision: 208984

URL: http://gcc.gnu.org/viewcvs?rev=208984&root=gcc&view=rev
Log:
gcc/
	PR rtl-optimization/60604
	* recog.c (general_operand): Incorporate REG_CANNOT_CHANGE_MODE_P
	check from register_operand.
	(register_operand): Redefine in terms of general_operand.
	(nonmemory_operand): Use register_operand for the non-constant cases.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/recog.c
Comment 14 Richard Sandiford 2014-04-01 10:45:14 UTC
Patch applied.