[PATCH, i386]: Optimize SSE float->double conversions

Uros Bizjak uros.bizjak@kss-loka.si
Fri Nov 4 08:03:00 GMT 2005


Hello!

In current mainline GCC, the following code produces an unnecessary move to an XMM register:

void test_fp (float *a, double *b)
{
  int i;

  for (i = 0; i < 4; i++)
    b[i] = (double) a[i];
}

gcc -O2 -msse2 -mfpmath=sse produces:

.L2:
        movss   -4(%ecx,%eax,4), %xmm0  <<< generated in expander
        cvtss2sd        %xmm0, %xmm0
        movsd   %xmm0, -8(%edx,%eax,8)  <<< generated in reload
        addl    $1, %eax
        cmpl    $5, %eax
        jne     .L2

The problem is in the extend?f?f expanders, which force operand[1] into a
register when both operands are memory operands. This is OK for the 387, as its
insn patterns can handle either op0 or op1 being a memory operand. However, the
SSE extend* patterns can only handle operand[1] as a memory operand, so forcing
operand[1] into a register only costs an extra load for the SSE insn patterns:
the result still has to be stored to memory afterwards.
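A small C model may make the argument above more concrete. It is only a sketch and not GCC code: the enum and all function names are hypothetical. "Matches" here means that the pattern's constraints are satisfied without any reload. The model encodes which operand may be memory for each pattern family, and what the old expander did to a memory-to-memory extension.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical operand locations; real GCC uses RTL predicates and
   constraints, not this enum.  */
enum loc { LOC_REG, LOC_MEM };

/* i387 extend patterns: either operand, but not both, may be memory
   (e.g. flds mem -> st, or fstpl st -> mem).  */
static bool i387_matches (enum loc op0, enum loc op1)
{
  return !(op0 == LOC_MEM && op1 == LOC_MEM);
}

/* SSE extend patterns: operand 0 must be an XMM register; only
   operand 1 may be memory (cvtss2sd mem, %xmm).  */
static bool sse_matches (enum loc op0, enum loc op1)
{
  (void) op1;
  return op0 == LOC_REG;
}

/* Old expander behaviour: if both operands are memory, copy
   operand 1 into a register first.  */
static enum loc old_expander_op1 (enum loc op0, enum loc op1)
{
  return (op0 == LOC_MEM && op1 == LOC_MEM) ? LOC_REG : op1;
}

/* Usage: the memory-to-memory case from test_fp.  */
static void demo_expander_model (void)
{
  enum loc op1 = old_expander_op1 (LOC_MEM, LOC_MEM);

  /* The rewritten (mem, reg) form still fits an i387 pattern...  */
  assert (i387_matches (LOC_MEM, op1));
  /* ...but not the SSE one, so reload must still add a store on top
     of the register copy the expander already emitted.  */
  assert (!sse_matches (LOC_MEM, op1));
  /* Keeping operand 1 in memory lets the SSE pattern use it directly
     once operand 0 is reloaded into an XMM register.  */
  assert (sse_matches (LOC_REG, LOC_MEM));
}
```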

To properly shadow the i387 extendsfdf2 pattern, the SSE extendsfdf2 pattern
needs to accept a nonimmediate operand 0, even though its constraint allows only
an XMM register. Reload is perfectly capable of fixing this up for a memory
operand 0 by generating the necessary output reload to memory.

The attached patch fixes this problem by removing the artificial "both operands
can't be in memory" limitation. Combine is then free to produce a (valid)
pattern with both operands in memory:

(insn 27 25 28 1 (set (mem:DF (plus:SI (plus:SI (mult:SI (reg:SI 69 [ ivtmp.37 ])
                        (const_int 8 [0x8]))
                    (reg/v/f:SI 71 [ b ]))
                (const_int -8 [0xfffffff8])) [2 S8 A64])
        (float_extend:DF (mem:SF (plus:SI (plus:SI (mult:SI (reg:SI 69 [ ivtmp.37 ])
                            (const_int 4 [0x4]))
                        (reg/v/f:SI 70 [ a ]))
                    (const_int -4 [0xfffffffc])) [3 S4 A32]))) 85 {*extendsfdf2_sse} (nil)
    (nil))

The register/memory constraints are then resolved in reload, according to the
different constraints of *extendsfdf2_sse and *extendsfdf2_i387. The following
sequences are produced:

-mfpmath=387 (input reload):

(insn 58 25 27 1 (set (reg:SF 8 st)
        (mem:SF (plus:SI (plus:SI (mult:SI (reg:SI 0 ax [orig:69 ivtmp.37 ] [69])
                        (const_int 4 [0x4]))
                    (reg/v/f:SI 2 cx [orig:70 a ] [70]))
                (const_int -4 [0xfffffffc])) [3 S4 A32])) 59 {*movsf_1} (nil)
    (nil))

(insn:HI 27 58 28 1 (set (mem:DF (plus:SI (plus:SI (mult:SI (reg:SI 0 ax [orig:69 ivtmp.37 ] [69])
                        (const_int 8 [0x8]))
                    (reg/v/f:SI 1 dx [orig:71 b ] [71]))
                (const_int -8 [0xfffffff8])) [2 S8 A64])
        (float_extend:DF (reg:SF 8 st))) 86 {*extendsfdf2_i387} (nil)
    (nil))

-mfpmath=sse (output reload):

(insn:HI 27 25 58 1 (set (reg:DF 21 xmm0)
        (float_extend:DF (mem:SF (plus:SI (plus:SI (mult:SI (reg:SI 0 ax [orig:69 ivtmp.37 ] [69])
                            (const_int 4 [0x4]))
                        (reg/v/f:SI 2 cx [orig:70 a ] [70]))
                    (const_int -4 [0xfffffffc])) [3 S4 A32]))) 85 {*extendsfdf2_sse} (nil)
    (nil))

(insn 58 27 28 1 (set (mem:DF (plus:SI (plus:SI (mult:SI (reg:SI 0 ax [orig:69 ivtmp.37 ] [69])
                        (const_int 8 [0x8]))
                    (reg/v/f:SI 1 dx [orig:71 b ] [71]))
                (const_int -8 [0xfffffff8])) [2 S8 A64])
        (reg:DF 21 xmm0)) 63 {*movdf_nointeger} (nil)
    (nil))

These insns result in optimal asm (-mfpmath=387 first, then -mfpmath=sse):

.L2:
        flds    -4(%ecx,%eax,4)
        fstpl   -8(%edx,%eax,8)
        addl    $1, %eax
        cmpl    $5, %eax
        jne     .L2

and

.L2:
        cvtss2sd        -4(%ecx,%eax,4), %xmm0
        movsd   %xmm0, -8(%edx,%eax,8)
        addl    $1, %eax
        cmpl    $5, %eax
        jne     .L2
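For comparison, the following hypothetical C sketch (not GCC's reload; the function names are invented) records how the memory-to-memory float_extend insn is completed under each -mfpmath setting, using the instruction strings from the loops above with operand spacing normalized.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical sketch, not GCC's reload: the instructions emitted for
   b[i] = (double) a[i] once both operands may be memory.  OUT must
   have room for at least 2 entries; returns the number written.  */
static int reload_mem_mem_extend (bool fpmath_sse, const char *out[])
{
  int n = 0;

  if (fpmath_sse)
    {
      /* *extendsfdf2_sse: operand 1 stays in memory, operand 0 must be
         an XMM register, so reload adds an output reload (the store).  */
      out[n++] = "cvtss2sd -4(%ecx,%eax,4), %xmm0";
      out[n++] = "movsd %xmm0, -8(%edx,%eax,8)";
    }
  else
    {
      /* *extendsfdf2_i387: operand 0 may be memory, so reload adds an
         input reload that loads operand 1 into st.  */
      out[n++] = "flds -4(%ecx,%eax,4)";
      out[n++] = "fstpl -8(%edx,%eax,8)";
    }
  return n;
}

/* Usage: both modes need two instructions per element.  */
static void demo_reload (void)
{
  const char *seq[2];

  assert (reload_mem_mem_extend (true, seq) == 2);
  assert (strcmp (seq[0], "cvtss2sd -4(%ecx,%eax,4), %xmm0") == 0);
  assert (reload_mem_mem_extend (false, seq) == 2);
  assert (strcmp (seq[1], "fstpl -8(%edx,%eax,8)") == 0);
}
```

With -mfpmath=sse this is one instruction per element fewer than the movss/cvtss2sd/movsd sequence produced before the patch.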

A similar problem was fixed for the truncdfsf2 case.
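The mail does not include a test case for the truncation direction. A hypothetical counterpart of test_fp (the name test_trunc is illustrative) that should reach the same memory-to-memory situation in the truncdfsf2 expander might look like this; the code GCC generates for it is not shown here.

```c
/* Hypothetical counterpart of test_fp for double -> float truncation:
   both the source and the destination of the conversion are memory.  */
void test_trunc (double *a, float *b)
{
  int i;

  for (i = 0; i < 4; i++)
    b[i] = (float) a[i];
}
```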

The patch was bootstrapped and regtested on i686-pc-linux-gnu for C and C++.
Additionally, povray-3.6.1 was built with "-march=pentium -mfpmath=387",
"-march=pentium4 -mfpmath=sse" and "-march=pentium4 -mfpmath=387".

This patch fixes the unnecessary move spotted in PR24659. PR19398 is a
different problem, caused by generic reload code.

OK for 4.2?

2005-11-05  Uros Bizjak  <uros@kss-loka.si>

	* config/i386/i386.md (extendsfdf2, extendsfxf2, extenddfxf2): Do not
	force operand1 to register if both operands are memory operands.
	(*extendsfdf2_mixed, *extendsfdf2_sse, *extendsfdf2_i387)
	(*extendsfxf2_i387, *extenddfxf2_i387): Do not disable pattern
	if both operands are memory operands.
	(truncdfsf2): Do not force operand1 to register if both operands
	are memory operands.

Uros.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cvt.diff
Type: application/octet-stream
Size: 3221 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20051104/446e1786/attachment.obj>

