Bug 19250 - minss/maxss SSE insn not generated for -mfpmath=sse
Summary: minss/maxss SSE insn not generated for -mfpmath=sse
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 4.0.0
: P2 normal
Target Milestone: 4.0.0
Assignee: Richard Henderson
Keywords: missed-optimization, ssemmx
Depends on:
Reported: 2005-01-04 08:09 UTC by Uroš Bizjak
Modified: 2005-01-18 12:38 UTC (History)
2 users (show)

See Also:
Host: i686-pc-linux-gnu
Target: i686-pc-linux-gnu
Build: i686-pc-linux-gnu
Known to work:
Known to fail:
Last reconfirmed: 2005-01-12 02:48:34


Note You need to log in before you can comment on or make changes to this bug.
Description Uroš Bizjak 2005-01-04 08:09:46 UTC
This testcase should generate min{s,d}s, max{s,d}s SSE insn. It looks that
min?f, max?f pattern is converted to equivalent i387 insn sequence, because the
output is expected in FP reg. However, the result of min/max should be _moved_
from SSE to FP reg.


float minf(float a, float b) {
  return a <= b ? a : b;

when compiled with 'gcc -O2 -ffast-math -march=pentium4 -mfpmath=sse

        subl    $4, %esp
        flds    8(%esp)
        movss   12(%esp), %xmm0
        movss   %xmm0, (%esp)
        flds    (%esp)
        fcomi   %st(1), %st
        fcmovnb %st(1), %st
        fstp    %st(1)
        addl    $4, %esp

Equivalent code could be something like:
	subl	$4, %esp
	movss	8(%esp), %xmm0
	minss	%xmm0, 12(%esp)
	movss	%xmm0, (%esp)
	flds	(%esp)
	addl	$4, %esp

Comment 1 Andrew Pinski 2005-01-04 08:27:30 UTC
Confirmed but note this is a register allocator problem really.
in lreg:
(insn:HI 12 8 16 0 (parallel [
            (set (reg/v:SF 59 [ a ])
                (if_then_else:SF (lt (reg/v:SF 59 [ a ])
                        (reg/v:SF 60 [ b ]))
                    (reg/v:SF 59 [ a ])
                    (reg/v:SF 60 [ b ])))
            (clobber (reg:CC 17 flags))
        ]) 654 {*minsf_nonieee} (insn_list:REG_DEP_TRUE 6 (insn_list:REG_DEP_TRUE 7 (nil)))
    (expr_list:REG_UNUSED (reg:CC 17 flags)
        (expr_list:REG_DEAD (reg/v:SF 60 [ b ])
            (expr_list:REG_UNUSED (reg:CC 17 flags)

After global/postreload:
(insn:HI 12 32 16 0 (parallel [
            (set (reg/v:SF 8 st [orig:59 a ] [59])
                (if_then_else:SF (lt (reg/v:SF 8 st [orig:59 a ] [59])
                        (reg:SF 9 st(1)))
                    (reg/v:SF 8 st [orig:59 a ] [59])
                    (reg:SF 9 st(1))))
            (clobber (reg:CC 17 flags))
        ]) 654 {*minsf_nonieee} (insn_list:REG_DEP_TRUE 6 (insn_list:REG_DEP_TRUE 7 (nil)))

Note we chose the fp "registers".
Comment 2 Andreas Jaeger 2005-01-04 09:05:13 UTC
Btw. we do the right thing for 64-bit x86-64: 
$ gcc -O2 -ffast-math  -c t.c 
$ objdump -d t.o 
t.o:     file format elf64-x86-64 
Disassembly of section .text: 
0000000000000000 <minf>: 
   0:   f3 0f 5d c1             minss  %xmm1,%xmm0 
   4:   c3                      retq 
With -m32 I get the same ugly code. 
Comment 3 Uroš Bizjak 2005-01-07 11:15:28 UTC
The same trick as in http://gcc.gnu.org/ml/gcc-patches/2005-01/msg00394.html
with sqrtf applied to return value gets the code as expected.
        subl    $4, %esp
        movss   8(%esp), %xmm0
        movss   12(%esp), %xmm1
        minss   %xmm1, %xmm0
        sqrtss  %xmm0, %xmm0
        movss   %xmm0, (%esp)
        flds    (%esp)
        addl    $4, %esp
Comment 4 CVS Commits 2005-01-14 00:34:02 UTC
Subject: Bug 19250

CVSROOT:	/cvs/gcc
Module name:	gcc
Changes by:	rth@gcc.gnu.org	2005-01-14 00:33:51

Modified files:
	gcc            : ChangeLog 
	gcc/config/i386: i386-protos.h i386.c i386.md 

Log message:
	PR target/19099
	PR target/19250
	PR target/19252
	* config/i386/i386.md (cmpdf, cmpsf, bunordered, bordered, buneq,
	bunge, bungt, bunle, bunlt, bltgt): Enable for TARGET_SSE_MATH,
	not just TARGET_SSE.
	(cmpfp_i_387): Rename from cmpfp_i.  Move after sse patterns.
	(cmpfp_i_mixed): Rename from cmpfp_i_sse; use for TARGET_MIX_SSE_I387.
	(cmpfp_i_sse): Rename from cmpfp_i_sse_only; use for TARGET_SSE_MATH.
	(cmpfp_iu_mixed, cmpfp_iu_sse, cmpfp_iu_387): Similarly.
	(fp_jcc_1_mixed, fp_jcc_1_sse, fp_jcc_1_387): Similarly.
	(fp_jcc_2_mixed, fp_jcc_2_sse, fp_jcc_2_387): Similarly.
	(fp_jcc_3_387, fp_jcc_4_387, fp_jcc_5_387, fp_jcc_6_387,
	fp_jcc_7_387, fp_jcc_8_387): Rename from fp_jcc_N.
	(movdicc_c_rex64): Rename with '*'.
	(movsfcc, movdfcc): Add checks for 387 and sse math to condition.
	(movsfcc_1_sse_min, movsfcc_1_sse_max, movsfcc_1_sse): New.
	(movsfcc_1_387): Rename from movsfcc_1.
	(movdfcc_1_sse_min, movdfcc_1_sse_max, movdfcc_1_sse): New.
	(movdfcc_1, movdfcc_1_rex64): Add check for 387.
	(sminsf3, smaxsf3, smindf3, smaxdf3): New.
	(minsf3, minsf, minsf_nonieee, minsf_sse, mindf3, mindf,
	mindf_nonieee, mindf_sse, maxsf3, maxsf, maxsf_nonieee, maxsf_sse,
	maxdf3, maxdf, maxdf_nonieee, maxdf_sse, sse_movsfcc, sse_movsfcc_eq,
	sse_movdfcc, sse_movdfcc_eq, sse_movsfcc_const0_1,
	sse_movsfcc_const0_2, sse_movsfcc_const0_3, sse_movsfcc_const0_4,
	sse_movdfcc_const0_1, sse_movdfcc_const0_2, sse_movdfcc_const0_3,
	sse_movdfcc_const0_4): Remove.
	* config/i386/i386.c (ix86_expand_fp_movcc): For TARGET_SSE_MATH,
	recognize min/max early.  Update for changed sse cmove patterns.
	(ix86_split_sse_movcc): New.
	* config/i386/i386-protos.h: Update.


Comment 5 Richard Henderson 2005-01-14 01:03:26 UTC
I believe the problem you ascribe to this bug is fixed.  Note that we do not
generate minss for the given example because "<=" is not the operation of the
minss instruction; it performs "<".  Which is relevant for "+0.0 < -0.0".

We *ought* to emit minss with -ffast-math, but that should happen via
noce_try_minmax invoking the sminsf3 pattern, rather than special-casing
this in the back-end.  Open a separate PR for that if you like.

But on the bright side, we at least generate a conditional move sequence:

        cmpless %xmm1, %xmm0
        andps   %xmm0, %xmm2
        andnps  %xmm1, %xmm0
        orps    %xmm2, %xmm0
Comment 6 Uroš Bizjak 2005-01-14 06:37:05 UTC
This comment was found in fold-const.c:

  /* Try some transformations of A op B ? A : B.

     A == B? A : B    same as B
     A != B? A : B    same as A
     A >= B? A : B    same as max (A, B)
     A > B?  A : B    same as max (B, A)
     A <= B? A : B    same as min (A, B)
     A < B?  A : B    same as min (B, A)

     As above, these transformations don't work in the presence
     of signed zeros.  For example, if A and B are zeros of
     opposite sign, the first two transformations will change
     the sign of the result.  In the last four, the original
     expressions give different results for (A=+0, B=-0) and
     (A=-0, B=+0), but the transformed expressions do not.

     The first two transformations are correct if either A or B
     is a NaN.  In the first transformation, the condition will
     be false, and B will indeed be chosen.  In the case of the
     second transformation, the condition A != B will be true,
     and A will be chosen.

     The conversions to max() and min() are not correct if B is
     a number and A is not.  The conditions in the original
     expressions will be false, so all four give B.  The min()
     and max() versions would give a NaN instead.  */

The testcase is ineed a bit unfortunate. I'll open an enhancement request
regarding -ffast-math, as you suggested.
Comment 7 Uroš Bizjak 2005-01-18 12:38:14 UTC
With current mainline, testcase from description generates minss instruction as
expected, when -ffast-math is used.

gcc -O2 -msse2 -mfpmath=sse -ffast-math:

        pushl   %ebp
        movl    %esp, %ebp
        subl    $4, %esp
        movss   12(%ebp), %xmm0
        minss   8(%ebp), %xmm0
        movss   %xmm0, -4(%ebp)
        flds    -4(%ebp)

It also works for max and all other cases from comment #6, so there is in fact
no need for a new PR.