Bug 92905 - [10 Regression] Spills float-int union to memory
Summary: [10 Regression] Spills float-int union to memory
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 10.0
: P3 normal
Target Milestone: 10.0
Assignee: Vladimir Makarov
URL:
Keywords: missed-optimization, ra
Depends on:
Blocks:
 
Reported: 2019-12-11 08:57 UTC by Alexander Monakov
Modified: 2019-12-21 15:06 UTC (History)
3 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2019-12-11 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Alexander Monakov 2019-12-11 08:57:25 UTC
gcc-10 branch regressed for code that needs bitwise operations on floats:

float f(float x)
{
    union {float f; unsigned i;} u = {x};
    u.i |= 0x80000000;
    return u.f;
}

float my_copysign(float x, float y)
{
    union {float f; unsigned i;} ux = {x}, uy ={y};
    ux.i &= 0x7fffffff;
    ux.i |= 0x80000000 & uy.i;
    return ux.f;
}


For function 'f' gcc-10 -O2 -mtune=intel generates
f:
        movd    %xmm0, -4(%rsp)
        movl    $-2147483648, %eax
        orl     -4(%rsp), %eax
        movd    %eax, %xmm0
        ret

while gcc-9 and earlier generate code without stack use, even without -mtune=intel:
f:
        movd    %xmm0, %eax
        orl     $-2147483648, %eax
        movd    %eax, %xmm0
        ret

Likewise for the more realistic my_copysign, where ux is spilled, but uy is not.

Eventually it would be nicer to use SSE bitwise operations for this, for example LLVM already generates
f:
        orps    .LCPI0_0(%rip), %xmm0
Comment 1 Jakub Jelinek 2019-12-11 09:41:36 UTC
Started with r273554.
Comment 2 Jakub Jelinek 2019-12-11 09:58:25 UTC
In particular, reverting the:
--- gcc/config/i386/i386.md	2019-12-11 10:48:51.787820841 +0100
+++ gcc/config/i386/i386.md	2019-12-09 19:50:24.604955118 +0100
@@ -9223,10 +9223,10 @@
 })
 
 (define_insn "*<code><mode>_1"
-  [(set (match_operand:SWI248 0 "nonimmediate_operand" "=r,rm")
+  [(set (match_operand:SWI248 0 "nonimmediate_operand" "=rm,r")
 	(any_or:SWI248
 	 (match_operand:SWI248 1 "nonimmediate_operand" "%0,0")
-	 (match_operand:SWI248 2 "<general_operand>" "<g>,r<i>")))
+	 (match_operand:SWI248 2 "<general_operand>" "r<i>,m")))
    (clobber (reg:CC FLAGS_REG))]
   "ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
   "<logic>{<imodesuffix>}\t{%2, %0|%0, %2}"

change fixes it.  I admit I have no idea why.  It is true that the operands are commutative, so RA can choose whether to put the constant or what used to be in xmm0 into the destination register, previously it chose what results in better code by putting the constant into the instruction and xmm0 moved into a GPR, now it spills xmm0 instead, puts the constant into the destination register and uses a memory operand.  The above mentioned commit changed this for commutative operands in several spots.
Comment 3 Jakub Jelinek 2019-12-11 11:14:38 UTC
Note, it isn't about <g>, using r<i>m in the first alternative of the reverted define_insn works well too, as well as swapping the alternatives (that is in that case basically what the trunk has, except in the second alternative the second input is r<i>m instead of m.  If the second alternative of second input is <i>m,
it works the same as current trunk (unnecessary spill), if it is rm, it works well.
Now, no idea if this isn't a bug in LRA or if there are some rules that the seemingly redundant constraints actually aren't redundant.
Comment 4 Alexander Monakov 2019-12-11 13:09:35 UTC
Perhaps only xmm0 is problematic, as making xmm0 unused by adding a dummy argument brings back the old spill-free result:

float my_copysign(float dummy, float x, float y)
{
    union {float f; unsigned i;} ux = {x}, uy ={y};
    ux.i &= 0x7fffffff;
    ux.i |= 0x80000000 & uy.i;
    return ux.f;
}

float f(float dummy, float x)
{
    union {float f; unsigned i;} u = {x};
    u.i |= 0x80000000;
    return u.f;
}
Comment 5 Vladimir Makarov 2019-12-18 19:13:24 UTC
(In reply to Jakub Jelinek from comment #3)
> Note, it isn't about <g>, using r<i>m in the first alternative of the
> reverted define_insn works well too, as well as swapping the alternatives
> (that is in that case basically what the trunk has, except in the second
> alternative the second input is r<i>m instead of m.  If the second
> alternative of second input is <i>m,
> it works the same as current trunk (unnecessary spill), if it is rm, it
> works well.
> Now, no idea if this isn't a bug in LRA or if there are some rules that the
> seemingly redundant constraints actually aren't redundant.

The culprit is 'g' vs 'm'.  When we have 'g' matching pseudo assigned to hard register, LRA ignores memory in 'g' when considering preferred_reload_class.

Reload also treats such situation in analogous way.  And actually LRA adapted the reload code.

So I can try to solve this PR.  But it will take some time.  Mostly any patch in this sensitive area should be tested and benchmarked well.
Comment 6 Vladimir Makarov 2019-12-19 22:00:24 UTC
Author: vmakarov
Date: Thu Dec 19 21:59:47 2019
New Revision: 279596

URL: https://gcc.gnu.org/viewcvs?rev=279596&root=gcc&view=rev
Log:
2019-12-19  Vladimir Makarov  <vmakarov@redhat.com>

	PR target/92905
	* lra-constraints.c (process_alt_operands): Check offmemok when
	processing preferred_reload_class.

2019-12-19  Vladimir Makarov  <vmakarov@redhat.com>

	PR target/92905
	* gcc.target/i386/pr92905.c: New test.


Added:
    trunk/gcc/testsuite/gcc.target/i386/pr92905.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/lra-constraints.c
    trunk/gcc/testsuite/ChangeLog
Comment 7 Jakub Jelinek 2019-12-19 23:05:53 UTC
Fixed, thanks.
Comment 8 Alexander Monakov 2019-12-21 15:06:36 UTC
(In reply to Alexander Monakov from comment #0)
> Eventually it would be nicer to use SSE bitwise operations for this, for
> example LLVM already generates
> f:
>         orps    .LCPI0_0(%rip), %xmm0

This is now reported separately as PR 93039.