[RTL, i386] Use subreg instead of UNSPEC_CAST

Thu Mar 21 09:24:00 GMT 2013

On Wed, Mar 20, 2013 at 4:54 PM, Marc Glisse <marc.glisse@inria.fr> wrote:
> On Wed, 20 Mar 2013, Richard Biener wrote:
>
>> On Wed, Mar 20, 2013 at 4:29 PM, Marc Glisse <marc.glisse@inria.fr> wrote:
>>>
>>> On Wed, 20 Mar 2013, Richard Henderson wrote:
>>>
>>>> On 03/20/2013 08:00 AM, Marc Glisse wrote:
>>>>>
>>>>>
>>>>> Do you at least agree that vector-vector subregs make sense, or is that
>>>>> part
>>>>> wrong as well?
>>>>
>>>>
>>>>
>>>> You mean a V4SImode subreg of a V8SImode register, not just same-size
>>>> casting?
>>>
>>>
>>>
>>> I am mostly interested in the reverse, a paradoxical subreg, since
>>> vec_select can only model one direction (and only rvalues, but that's a
>>> different question).
>>
>>
>> vec_duplicate?
>
>
> There is already some of that in various places, and there may be even more
> vec_merge+vec_duplicate patterns soon, but you want to make sure you don't
> actually do the duplication.
>
>
>> Honestly, what semantics should  _mm256_castpd128_pd256 have if
>> it is supposed to cast a v2df to a v4df?
>
>
> NOP. We don't care what is in the high part of the vector.
>
>> Or what use?
>
>
> Many vector operations are defined as taking 2 vectors and merging them
> somehow. I didn't check if this case works, but for instance if you want to
> copy a V2DF to the bottom part of a V4DF using Intel's intrinsics, you will
> probably have to cast the V2DF to a V4DF and then use an intrinsic that
> takes 2 V4DF. (there are many issues with those intrinsics, but we don't
> control them)

Hmm, I see.  I still think that we should expose most of the intrinsics
and builtins implementation details earlier, at the GIMPLE level.  This one
would be an awkward one there, too.  You'd need sth like

  v4df_3 = CONSTRUCTOR { v2df_2, v2df_1(D) };

thus, make that "uninitialized" explicit by using a default def.  I
think we don't
support generating the above from C/C++ source with GNU extensions as
vector type casts are quite restricted at the moment, so there you'd have
to write sth like

  double uninit;
  v4df res = { v2dfv[0], v2dfv[1], uninit, uninit };

which would get you

  D.1723 = BIT_FIELD_REF <x, 64, 0>;
  D.1724 = BIT_FIELD_REF <x, 64, 64>;
  D.1725 = {D.1723, D.1724, uninit, uninit};

at the moment.  And of course awkward code in the end ;)

Which leaves the other option of folding the __builtin_ia32_ps256_ps
in the target (and most other builtins).

Just side-tracking from the RTL issue of course ...

Richard.

> --
> Marc Glisse