[RFC patch, i386]: Use STV pass to load/store any TImode constant using SSE insns

Uros Bizjak ubizjak@gmail.com
Thu Apr 28 10:43:00 GMT 2016


On Thu, Apr 28, 2016 at 12:36 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
> 2016-04-27 22:58 GMT+03:00 Uros Bizjak <ubizjak@gmail.com>:
>> Hello!
>>
>> This RFC patch illustrates the idea of using the STV pass to load/store
>> any TImode constant using SSE insns. The testcase:
>>
>> --cut here--
>> __int128 x;
>>
>> void test_1 (void)
>> {
>>   x = (__int128) 0x00112233;
>> }
>>
>> void test_2 (void)
>> {
>>   x = ((__int128) 0x0011223344556677 << 64);
>> }
>>
>> void test_3 (void)
>> {
>>   x = ((__int128) 0x0011223344556677 << 64) + (__int128) 0x0011223344556677;
>> }
>> --cut here--
>>
>> currently compiles (-O2) on x86_64 to:
>>
>> test_1:
>>         movq    $1122867, x(%rip)
>>         movq    $0, x+8(%rip)
>>         ret
>>
>> test_2:
>>         xorl    %eax, %eax
>>         movabsq $4822678189205111, %rdx
>>         movq    %rax, x(%rip)
>>         movq    %rdx, x+8(%rip)
>>         ret
>>
>> test_3:
>>         movabsq $4822678189205111, %rax
>>         movabsq $4822678189205111, %rdx
>>         movq    %rax, x(%rip)
>>         movq    %rdx, x+8(%rip)
>>         ret
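In all three listings, x86_64 is little-endian, so the low 64 bits of the constant go to x(%rip) and the high 64 bits to x+8(%rip). The following C sketch computes both halves and checks them against the in-memory layout. It relies on the GCC/Clang `__int128` extension, and the helper names are hypothetical, made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper type: the two 64-bit words of a TImode value.  */
typedef struct
{
  uint64_t lo;  /* what the listings store to x(%rip)   */
  uint64_t hi;  /* what the listings store to x+8(%rip) */
} ti_halves;

/* Split a 128-bit value into its low and high 64-bit halves.  */
static ti_halves
split_ti (unsigned __int128 v)
{
  ti_halves h;

  h.lo = (uint64_t) v;
  h.hi = (uint64_t) (v >> 64);
  return h;
}

/* Return 1 if the first 8 bytes in memory hold the low half and the
   next 8 bytes hold the high half, as on little-endian x86_64.  */
static int
layout_is_lo_then_hi (unsigned __int128 v)
{
  uint64_t words[2];
  ti_halves h = split_ti (v);

  memcpy (words, &v, sizeof words);
  return words[0] == h.lo && words[1] == h.hi;
}
```

For test_2, `split_ti ((unsigned __int128) 0x0011223344556677 << 64)` yields `lo == 0` and `hi == 4822678189205111`. These are the values stored through %rax and %rdx above.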
>>
>> However, with the attached patch, all three tests compile to the same
>> sequence (only the constant-pool entry differs):
>>
>> test:
>>         movdqa  .LC0(%rip), %xmm0
>>         movaps  %xmm0, x(%rip)
>>         ret
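The new sequence loads the whole 16-byte constant from the constant pool (.LC0) into %xmm0 and writes it back with one aligned 16-byte store. The sketch below shows roughly the same operation with the SSE2 intrinsics `_mm_load_si128`/`_mm_store_si128` from `<emmintrin.h>`, falling back to `memcpy` when SSE2 is not available. The function name is hypothetical, and the compiler may choose movdqa or movaps for the store.

```c
#include <assert.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: copy a 16-byte constant from a pool entry to dst.
   Both pointers must be 16-byte aligned; unsigned __int128 objects are
   16-byte aligned on x86_64.  */
static void
store_ti_const_sse (unsigned __int128 *dst, const unsigned __int128 *pool)
{
#if defined(__SSE2__)
  /* One aligned 16-byte load into an XMM register (movdqa)...  */
  __m128i v = _mm_load_si128 ((const __m128i *) pool);
  /* ...and one aligned 16-byte store (movdqa or movaps).  */
  _mm_store_si128 ((__m128i *) dst, v);
#else
  /* Fallback without SSE2: a plain 16-byte copy.  */
  memcpy (dst, pool, sizeof *dst);
#endif
}
```

Both instructions fault on misaligned addresses, which is why the pool entry and the destination must be 16-byte aligned.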
>>
>> Ilya, HJ: do you think the new sequences are better on their own, or,
>> as suggested by Jakub, are they beneficial in combination with the STV
>> pass, since we are now able to load any immediate value? A variant of
>> this patch can also be used to load DImode values in the 32-bit STV pass.
>>
>> Uros.
>
> Hi,
>
> Why don't we have two movq instructions in all three cases now?  Is it
> because of late split?

movq can encode only a 32-bit sign-extended immediate. There is actually
room for improvement in test_2, where we could store 0 directly to
x(%rip) instead of zeroing %eax first.
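This explains why test_1 gets two immediate movq stores while test_2 and test_3 need movabsq. A 64-bit store of an immediate encodes only 32 bits, which the CPU sign-extends, so a half fits only if it lies in the signed 32-bit range. The sketch below counts how many halves of a TImode constant do not fit. The helper names are hypothetical, and it assumes the GCC/Clang `__int128` extension.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if v can be encoded as a 32-bit immediate that the CPU
   sign-extends to 64 bits (the form a 64-bit movq $imm accepts).  */
static bool
fits_simm32 (uint64_t v)
{
  int64_t s = (int64_t) v;  /* two's-complement reinterpretation (GCC/Clang) */

  return s >= INT32_MIN && s <= INT32_MAX;
}

/* Hypothetical helper: how many 64-bit halves of a TImode constant do
   not fit, i.e. would have to go through movabsq and a register.  */
static int
halves_needing_movabs (unsigned __int128 v)
{
  uint64_t lo = (uint64_t) v;
  uint64_t hi = (uint64_t) (v >> 64);

  return !fits_simm32 (lo) + !fits_simm32 (hi);
}
```

It returns 0, 1 and 2 for the constants in test_1, test_2 and test_3, matching the number of movabsq instructions in the current output. In test_2 the zero low half fits, so it could be stored as an immediate.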

Uros.

> I wouldn't say an SSE load+store is always better than two movq instructions,
> but it obviously can enable bigger chains for STV, which is good.  I think
> you should adjust the cost model to handle immediates so it makes the proper decision.
>
> That's what I have in my draft for DImode immediates:
>
> @@ -3114,6 +3123,20 @@ scalar_chain::build (bitmap candidates, unsigned insn_uid)
>    BITMAP_FREE (queue);
>  }
>
> +/* Return the cost of building a vector constant
> +   instead of using a scalar one.  */
> +
> +int
> +scalar_chain::vector_const_cost (rtx exp)
> +{
> +  gcc_assert (CONST_INT_P (exp));
> +
> +  if (const0_operand (exp, GET_MODE (exp))
> +      || constm1_operand (exp, GET_MODE (exp)))
> +    return COSTS_N_INSNS (1);
> +  return ix86_cost->sse_load[1];
> +}
> +
>  /* Compute a gain for chain conversion.  */
>
>  int
> @@ -3145,11 +3168,25 @@ scalar_chain::compute_convert_gain ()
>                || GET_CODE (src) == IOR
>                || GET_CODE (src) == XOR
>                || GET_CODE (src) == AND)
> -       gain += ix86_cost->add;
> +       {
> +         gain += ix86_cost->add;
> +         if (CONST_INT_P (XEXP (src, 0)))
> +           gain -= scalar_chain::vector_const_cost (XEXP (src, 0));
> +         if (CONST_INT_P (XEXP (src, 1)))
> +           gain -= scalar_chain::vector_const_cost (XEXP (src, 1));
> +       }
>        else if (GET_CODE (src) == COMPARE)
>         {
>           /* Assume comparison cost is the same.  */
>         }
> +      else if (GET_CODE (src) == CONST_INT)
> +       {
> +         if (REG_P (dst))
> +           gain += COSTS_N_INSNS (2);
> +         else if (MEM_P (dst))
> +           gain += 2 * ix86_cost->int_store[2] - ix86_cost->sse_store[1];
> +         gain -= scalar_chain::vector_const_cost (src);
> +       }
>        else
>         gcc_unreachable ();
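The arithmetic of the draft's CONST_INT handling can be tried out in isolation. The sketch below mirrors `vector_const_cost` and the new `compute_convert_gain` branch as plain C functions, with `COSTS_N_INSNS` defined as in GCC's rtl.h. The cost numbers are made up, since the real ones come from the active `ix86_cost` tuning table, and the `demo_` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Same definition as in GCC's rtl.h.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical stand-ins for the ix86_cost fields used by the draft.  */
struct demo_costs
{
  int sse_load_1;   /* ix86_cost->sse_load[1]  */
  int sse_store_1;  /* ix86_cost->sse_store[1] */
  int int_store_2;  /* ix86_cost->int_store[2] */
};

/* Mirrors the draft vector_const_cost: 0 and -1 are charged one insn,
   any other constant is charged an SSE load.  */
static int
demo_vector_const_cost (long long c, const struct demo_costs *k)
{
  if (c == 0 || c == -1)
    return COSTS_N_INSNS (1);
  return k->sse_load_1;
}

/* Mirrors the draft CONST_INT branch of compute_convert_gain for a
   register (dst_is_mem == false) or memory (dst_is_mem == true) dest.  */
static int
demo_const_move_gain (long long c, bool dst_is_mem,
                      const struct demo_costs *k)
{
  int gain = 0;

  if (!dst_is_mem)
    gain += COSTS_N_INSNS (2);                     /* draft's credit for a reg dest */
  else
    gain += 2 * k->int_store_2 - k->sse_store_1;   /* two int stores vs one SSE store */
  gain -= demo_vector_const_cost (c, k);           /* cost of building the constant */
  return gain;
}
```

With, for example, `sse_load_1 = 10`, `sse_store_1 = 4` and `int_store_2 = 5`, 0 and -1 give a positive gain for both destinations. A general constant, charged a full SSE load, gives a negative gain, so it would count against converting the chain.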