gcc-10 branch regressed for code that needs bitwise operations on floats: float f(float x) { union {float f; unsigned i;} u = {x}; u.i |= 0x80000000; return u.f; } float my_copysign(float x, float y) { union {float f; unsigned i;} ux = {x}, uy ={y}; ux.i &= 0x7fffffff; ux.i |= 0x80000000 & uy.i; return ux.f; } For function 'f' gcc-10 -O2 -mtune=intel generates f: movd %xmm0, -4(%rsp) movl $-2147483648, %eax orl -4(%rsp), %eax movd %eax, %xmm0 ret while gcc-9 and earlier generate code without stack use, even without -mtune=intel: f: movd %xmm0, %eax orl $-2147483648, %eax movd %eax, %xmm0 ret Likewise for the more realistic my_copysign, where ux is spilled, but uy is not. Eventually it would be nicer to use SSE bitwise operations for this, for example LLVM already generates f: orps .LCPI0_0(%rip), %xmm0
Started with r273554.
In particular, reverting the: --- gcc/config/i386/i386.md 2019-12-11 10:48:51.787820841 +0100 +++ gcc/config/i386/i386.md 2019-12-09 19:50:24.604955118 +0100 @@ -9223,10 +9223,10 @@ }) (define_insn "*<code><mode>_1" - [(set (match_operand:SWI248 0 "nonimmediate_operand" "=r,rm") + [(set (match_operand:SWI248 0 "nonimmediate_operand" "=rm,r") (any_or:SWI248 (match_operand:SWI248 1 "nonimmediate_operand" "%0,0") - (match_operand:SWI248 2 "<general_operand>" "<g>,r<i>"))) + (match_operand:SWI248 2 "<general_operand>" "r<i>,m"))) (clobber (reg:CC FLAGS_REG))] "ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)" "<logic>{<imodesuffix>}\t{%2, %0|%0, %2}" change fixes it. I admit I have no idea why. It is true that the operands are commutative, so RA can choose whether to put the constant or what used to be in xmm0 into the destination register, previously it chose what results in better code by putting the constant into the instruction and xmm0 moved into a GPR, now it spills xmm0 instead, puts the constant into the destination register and uses a memory operand. The above mentioned commit changed this for commutative operands in several spots.
Note, it isn't about <g>, using r<i>m in the first alternative of the reverted define_insn works well too, as well as swapping the alternatives (that is in that case basically what the trunk has, except in the second alternative the second input is r<i>m instead of m. If the second alternative of second input is <i>m, it works the same as current trunk (unnecessary spill), if it is rm, it works well. Now, no idea if this isn't a bug in LRA or if there are some rules that the seemingly redundant constraints actually aren't redundant.
Perhaps only xmm0 is problematic, as making xmm0 unused by adding a dummy argument brings back the old spill-free result: float my_copysign(float dummy, float x, float y) { union {float f; unsigned i;} ux = {x}, uy ={y}; ux.i &= 0x7fffffff; ux.i |= 0x80000000 & uy.i; return ux.f; } float f(float dummy, float x) { union {float f; unsigned i;} u = {x}; u.i |= 0x80000000; return u.f; }
(In reply to Jakub Jelinek from comment #3) > Note, it isn't about <g>, using r<i>m in the first alternative of the > reverted define_insn works well too, as well as swapping the alternatives > (that is in that case basically what the trunk has, except in the second > alternative the second input is r<i>m instead of m. If the second > alternative of second input is <i>m, > it works the same as current trunk (unnecessary spill), if it is rm, it > works well. > Now, no idea if this isn't a bug in LRA or if there are some rules that the > seemingly redundant constraints actually aren't redundant. The culprit is 'g' vs 'm'. When we have 'g' matching pseudo assigned to hard register, LRA ignores memory in 'g' when considering preferred_reload_class. Reload also treats such situation in analogous way. And actually LRA adapted the reload code. So I can try to solve this PR. But it will take some time. Mostly any patch in this sensitive area should be tested and benchmarked well.
Author: vmakarov Date: Thu Dec 19 21:59:47 2019 New Revision: 279596 URL: https://gcc.gnu.org/viewcvs?rev=279596&root=gcc&view=rev Log: 2019-12-19 Vladimir Makarov <vmakarov@redhat.com> PR target/92905 * lra-constraints.c (process_alt_operands): Check offmemok when processing preferred_reload_class. 2019-12-19 Vladimir Makarov <vmakarov@redhat.com> PR target/92905 * gcc.target/i386/pr92905.c: New test. Added: trunk/gcc/testsuite/gcc.target/i386/pr92905.c Modified: trunk/gcc/ChangeLog trunk/gcc/lra-constraints.c trunk/gcc/testsuite/ChangeLog
Fixed, thanks.
(In reply to Alexander Monakov from comment #0) > Eventually it would be nicer to use SSE bitwise operations for this, for > example LLVM already generates > f: > orps .LCPI0_0(%rip), %xmm0 This is now reported separately as PR 93039.