This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [RFC patch, i386]: Use STV pass to load/store any TImode constant using SSE insns
- From: Ilya Enkovich <enkovich dot gnu at gmail dot com>
- To: Uros Bizjak <ubizjak at gmail dot com>
- Cc: "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>, "H.J. Lu" <hjl dot tools at gmail dot com>, Jakub Jelinek <jakub at redhat dot com>
- Date: Thu, 28 Apr 2016 13:36:30 +0300
- Subject: Re: [RFC patch, i386]: Use STV pass to load/store any TImode constant using SSE insns
- References: <CAFULd4afCkFamkaWA3F_30AaW9ZerBWKj90F2AR_z4wTjNGcNA at mail dot gmail dot com>
2016-04-27 22:58 GMT+03:00 Uros Bizjak <ubizjak@gmail.com>:
> Hello!
>
> This RFC patch illustrates the idea of using STV pass to load/store
> any TImode constant using SSE insns. The testcase:
>
> --cut here--
> __int128 x;
>
> void test_1 (void)
> {
>   x = (__int128) 0x00112233;
> }
>
> void test_2 (void)
> {
>   x = ((__int128) 0x0011223344556677 << 64);
> }
>
> void test_3 (void)
> {
>   x = ((__int128) 0x0011223344556677 << 64) + (__int128) 0x0011223344556677;
> }
> --cut here--
>
> currently compiles (-O2) on x86_64 to:
>
> test_1:
> movq $1122867, x(%rip)
> movq $0, x+8(%rip)
> ret
>
> test_2:
> xorl %eax, %eax
> movabsq $4822678189205111, %rdx
> movq %rax, x(%rip)
> movq %rdx, x+8(%rip)
> ret
>
> test_3:
> movabsq $4822678189205111, %rax
> movabsq $4822678189205111, %rdx
> movq %rax, x(%rip)
> movq %rdx, x+8(%rip)
> ret
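The immediates in this scalar output are simply the two 64-bit halves of each constant: 1122867 is 0x00112233 and 4822678189205111 is 0x0011223344556677. On little-endian x86_64 the low half is stored at x and the high half at x+8. A minimal C sketch that checks this, assuming GCC's unsigned __int128 extension; lo_half, hi_half and tc1..tc3 are hypothetical names, not part of the patch:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers (not GCC code): split a 128-bit value into the
   two 64-bit halves that the scalar sequence stores.  */
static inline uint64_t lo_half (unsigned __int128 v)
{
  return (uint64_t) v;          /* bits 0..63   -> x(%rip)   */
}

static inline uint64_t hi_half (unsigned __int128 v)
{
  return (uint64_t) (v >> 64);  /* bits 64..127 -> x+8(%rip) */
}

/* The constants stored by test_1, test_2 and test_3.  */
const unsigned __int128 tc1 = (unsigned __int128) 0x00112233;
const unsigned __int128 tc2 = (unsigned __int128) 0x0011223344556677 << 64;
const unsigned __int128 tc3 = ((unsigned __int128) 0x0011223344556677 << 64)
                              + 0x0011223344556677;
```

For example, lo_half (tc2) is 0 (the xorl %eax, %eax value) and hi_half (tc2) is 4822678189205111, the movabsq immediate in test_2.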
>
> However, using the attached patch, we compile all tests to:
>
> test:
> movdqa .LC0(%rip), %xmm0
> movaps %xmm0, x(%rip)
> ret
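The SSE form replaces the two immediates with a single 16-byte constant-pool entry (.LC0) plus an aligned load (movdqa) and aligned store (movaps). A small C sketch of what such an entry would contain, assuming a little-endian x86_64 host; the helper int128_pool_bytes is hypothetical and not part of GCC or the patch:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not GCC code): the 16 bytes a constant-pool entry
   such as .LC0 would hold for V, i.e. V's in-memory representation.
   On little-endian x86_64 byte 0 is the least significant byte.  */
static void int128_pool_bytes (unsigned __int128 v, unsigned char out[16])
{
  memcpy (out, &v, sizeof v);
}

/* movdqa and movaps require 16-byte aligned memory operands; the x86-64
   psABI gives __int128 a size and alignment of 16 bytes, so the store
   to a global __int128 like x is aligned.  */
static_assert (sizeof (__int128) == 16, "__int128 is 16 bytes");
static_assert (_Alignof (__int128) == 16, "__int128 is 16-byte aligned");
```

For test_3's constant, bytes 0..7 and bytes 8..15 are each 77 66 55 44 33 22 11 00, the same 64-bit value twice.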
>
> Ilya, HJ, do you think the new sequences are better, or, as suggested by
> Jakub, are they beneficial together with the STV pass, since we are now
> able to load any immediate value? A variant of this patch can also be used
> to load DImode values in the 32-bit STV pass.
>
> Uros.
Hi,
Why don't we have two movq instructions in all three cases now? Is it
because of a late split?
I wouldn't say an SSE load+store is always better than two movq
instructions, but it can obviously enable bigger chains for STV, which is
good. I think you should adjust the cost model to take immediates into
account so that it can make the proper decision.
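To make the trade-off concrete, here is a toy C model of the gain the draft below computes for "dst = immediate": the scalar cost (two integer moves, or two integer stores for a memory destination) minus the vector cost (an SSE store for a memory destination, plus materializing the constant, where 0 and -1 count as one instruction and anything else as a constant-pool load). COSTS_N_INSNS uses GCC's definition, ((N) * 4); the toy_costs struct, its field values and the function names are hypothetical stand-ins for the ix86_cost fields, not real tuning numbers:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same definition as in GCC's rtl.h.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical stand-ins for the ix86_cost fields used in the draft
   (int_store[2], sse_store[1], sse_load[1]); the values are made up.  */
struct toy_costs
{
  int int_store;  /* one scalar integer store  */
  int sse_store;  /* one SSE store  */
  int sse_load;   /* one SSE load from the constant pool  */
};

/* Mirrors the draft's vector_const_cost: 0 and -1 count as a single
   instruction, any other immediate as a constant-pool load.  */
static int toy_vector_const_cost (const struct toy_costs *c, int64_t v)
{
  if (v == 0 || v == -1)
    return COSTS_N_INSNS (1);
  return c->sse_load;
}

/* Mirrors the draft's CONST_INT branch of compute_convert_gain.
   A positive result means the vector form is estimated to be cheaper.  */
static int toy_const_set_gain (const struct toy_costs *c, bool dst_is_mem,
                               int64_t v)
{
  int gain = dst_is_mem ? 2 * c->int_store - c->sse_store  /* MEM_P (dst) */
                        : COSTS_N_INSNS (2);                /* REG_P (dst) */
  return gain - toy_vector_const_cost (c, v);
}
```

With the sample costs { 4, 3, 6 }, storing 0 or -1 to memory gives a gain of 1 while storing 0x00112233 gives -1, which illustrates why the decision should depend on the immediate rather than always preferring an SSE load+store.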
Here is what I have in my draft for DImode immediates:
@@ -3114,6 +3123,20 @@ scalar_chain::build (bitmap candidates, unsigned insn_uid)
   BITMAP_FREE (queue);
 }
+/* Return a cost of building a vector constant
+   instead of using a scalar one. */
+
+int
+scalar_chain::vector_const_cost (rtx exp)
+{
+  gcc_assert (CONST_INT_P (exp));
+
+  if (const0_operand (exp, GET_MODE (exp))
+      || constm1_operand (exp, GET_MODE (exp)))
+    return COSTS_N_INSNS (1);
+  return ix86_cost->sse_load[1];
+}
+
 /* Compute a gain for chain conversion. */
 int
@@ -3145,11 +3168,25 @@ scalar_chain::compute_convert_gain ()
                || GET_CODE (src) == IOR
                || GET_CODE (src) == XOR
                || GET_CODE (src) == AND)
-        gain += ix86_cost->add;
+        {
+          gain += ix86_cost->add;
+          if (CONST_INT_P (XEXP (src, 0)))
+            gain -= scalar_chain::vector_const_cost (XEXP (src, 0));
+          if (CONST_INT_P (XEXP (src, 1)))
+            gain -= scalar_chain::vector_const_cost (XEXP (src, 1));
+        }
       else if (GET_CODE (src) == COMPARE)
         {
           /* Assume comparison cost is the same. */
         }
+      else if (GET_CODE (src) == CONST_INT)
+        {
+          if (REG_P (dst))
+            gain += COSTS_N_INSNS (2);
+          else if (MEM_P (dst))
+            gain += 2 * ix86_cost->int_store[2] - ix86_cost->sse_store[1];
+          gain -= scalar_chain::vector_const_cost (src);
+        }
       else
         gcc_unreachable ();