PATCH: Update classification of aggregates with __m256 ([AVX]: Update x86-64 psABI for aggregates with __m256)

H.J. Lu hjl.tools@gmail.com
Wed Feb 11 19:23:00 GMT 2009


On Wed, Feb 11, 2009 at 10:16 AM, Harsha Jagasia <harsha.jagasia@amd.com> wrote:
> Hi HJ,
>
> -\item If the size of an object is larger than two \eightbytes, or
> +\item If the size of an object is larger than four \eightbytes, or
> + it contains unaligned fields, it has class MEMORY.
>
> I think this may be confusing to the reader. It may be better to state:
>
> "If the size of an object is larger than four eightbytes, or it contains unaligned fields, it has class MEMORY.

I can't tell the difference between my wording and yours.

> The post-merger cleanup ensures that, for the processors that do not support the __m256 type, if the size of an object is larger than two eightbytes and the first eightbyte is not SSE or any other eightbyte is not SSEUP, it still has class MEMORY.
>
> This in turn ensures that for processors that do support the __m256 type, if the size of an object is four eightbytes and the first eightbyte is SSE and all other eightbytes are SSEUP, it can be passed in a register."

That is our goal.

> -  \item If SSEUP is not preceeded by SSE, it is converted to SSE.
> +  \item If the size of the aggregate exceeds two \eightbytes and the first
> +    \eightbyte isn't SSE or any other \eightbyte isn't SSEUP, the whole
> +    argument is passed in memory.
> +  \item If SSEUP is not preceded by SSE or SSEUP, it is converted to SSE.
>
> Again it may be better to say:
> "Otherwise, if SSEUP is not preceded by SSE or SSEUP and the size of the aggregate does not exceed four eightbytes, it is converted to SSE."

The post-merger cleanup rules are applied in strict order. You added "and the
size of the aggregate does not exceed four eightbytes". I don't think that is
necessary, since the case is already covered by the preceding rule: "If the
size of the aggregate exceeds two \eightbytes and the first \eightbyte isn't
SSE or any other \eightbyte isn't SSEUP, the whole argument is passed in
memory." You don't need to check the size of the aggregate again in the last
rule.
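
The ordering argument above can be sketched in C. This is only an
illustration of the two rules quoted in this thread, not GCC's code and not
the full psABI rule list: the enum, the function name and the return
convention are hypothetical, and the other cleanup rules are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical class names for this sketch; not GCC's identifiers. */
enum eb_class { EB_INTEGER, EB_SSE, EB_SSEUP };

/* Apply the two quoted cleanup rules, in order, to the classes of an
   aggregate's n eightbytes.  Returns 1 if the whole argument goes to
   memory, 0 otherwise (cls[] may then have SSEUP entries turned into SSE). */
static int post_merger_cleanup(enum eb_class *cls, size_t n)
{
    size_t i;

    assert(n >= 1 && n <= 4);   /* larger objects were already MEMORY */

    /* Rule A: more than two eightbytes must be SSE, SSEUP, SSEUP, ... */
    if (n > 2) {
        if (cls[0] != EB_SSE)
            return 1;
        for (i = 1; i < n; i++)
            if (cls[i] != EB_SSEUP)
                return 1;
    }

    /* Rule B: no size check needed.  Anything with n > 2 that reaches
       this point already has the form SSE, SSEUP, ..., so the loop only
       ever changes aggregates of at most two eightbytes. */
    for (i = 0; i < n; i++)
        if (cls[i] == EB_SSEUP
            && (i == 0 || (cls[i - 1] != EB_SSE && cls[i - 1] != EB_SSEUP)))
            cls[i] = EB_SSE;

    return 0;
}
```

With this ordering, { SSE, SSEUP, SSEUP, SSEUP } passes through unchanged,
{ INTEGER, SSEUP, SSEUP } is sent to memory by the first rule, and
{ INTEGER, SSEUP } becomes { INTEGER, SSE } under the last rule.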

>> -\item If the class is SSE, the next available SSE register is used, the
>> +\item If the class is SSE, the next available vector register is used, the
>>     registers are taken in the order from \reg{xmm0} to \reg{xmm7}.
>>
>> -\item If the class is SSE, the next available SSE register of the
>> +\item If the class is SSE, the next available vector register of the
>>     sequence \reg{xmm0}, \reg{xmm1} is used.
>>
>
> Maybe better to say \reg{xmm0} to \reg{xmm7} or \reg{ymm0} to \reg{ymm7}.

You can have mixed xmmN/ymmN. xmmN refers to vector register N,
which can be either 128-bit or 256-bit. Adding "\reg{ymm0} to \reg{ymm7}"
doesn't make it easier to understand.
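
A small C model of that point, under simplifying assumptions: only
SSE-class arguments are considered, each is either 128 or 256 bits wide,
and the struct and function names are made up for this sketch. There is a
single register counter; the width only decides whether register N is
spelled xmmN or ymmN.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type: prefix is 'x' or 'y', number is 0..7,
   or number is -1 when no vector register is left. */
struct vreg { char prefix; int number; };

/* Assign vector registers to n SSE-class arguments in order. */
static void assign_vector_regs(const int *width_bits, size_t n,
                               struct vreg *out)
{
    int next = 0;               /* one counter shared by xmm and ymm names */

    for (size_t i = 0; i < n; i++) {
        assert(width_bits[i] == 128 || width_bits[i] == 256);
        if (next > 7) {         /* registers 0..7 are used up */
            out[i].prefix = 0;
            out[i].number = -1;
            continue;
        }
        out[i].prefix = (width_bits[i] == 256) ? 'y' : 'x';
        out[i].number = next++;
    }
}
```

For widths { 128, 256, 128 } this yields xmm0, ymm1, xmm2: the 256-bit
argument takes vector register 1, so the next 128-bit one gets register 2.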

>
>> Here are a few comments on the patch.
>>
>> -------------
>> @@ -5331,14 +5352,22 @@ construct_container (enum machine_mode m
>>           break;
>>         case X86_64_SSE_CLASS:
>>           if (i < n - 1 && regclass[i + 1] == X86_64_SSEUP_CLASS)
>> -           tmpmode = TImode;
>> +           {
>> +             if (regclass[i + 2] == X86_64_SSEUP_CLASS
>> +                 || regclass[i + 3] == X86_64_SSEUP_CLASS)
>> +               tmpmode = OImode;
>> +             else
>> +               tmpmode = TImode;
>> +           }
>>
>
>
> I would think a check for n is needed here. If n is 2, regclass[i + 2] and regclass[i + 3] would not be valid, right?
>

You are right. Let me think about it.
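
For reference, one possible way to guard the lookahead is sketched below.
It is not necessarily the fix that will go into the patch: it keeps the
"||" from the hunk above and only adds bounds checks against n. The enums
and the function name are hypothetical stand-ins for the GCC ones, and
RC_OTHER stands for whatever the code outside the quoted hunk selects.

```c
#include <assert.h>

/* Hypothetical stand-ins for X86_64_*_CLASS and for TImode/OImode. */
enum rc_class { RC_INTEGER, RC_SSE, RC_SSEUP };
enum rc_mode { RC_OTHER, RC_TIMODE, RC_OIMODE };

/* Pick the mode for an SSE eightbyte at index i of n classes without
   reading regclass[] past index n - 1. */
static enum rc_mode sse_container_mode(const enum rc_class *regclass,
                                       int i, int n)
{
    assert(i >= 0 && i < n && regclass[i] == RC_SSE);

    if (i < n - 1 && regclass[i + 1] == RC_SSEUP) {
        /* Each lookahead is checked against n before the array access. */
        if ((i + 2 < n && regclass[i + 2] == RC_SSEUP)
            || (i + 3 < n && regclass[i + 3] == RC_SSEUP))
            return RC_OIMODE;
        return RC_TIMODE;
    }
    return RC_OTHER;            /* not covered by the quoted hunk */
}
```

With n == 2 and classes { SSE, SSEUP } this returns the TImode case and
never touches regclass[2] or regclass[3].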

Thanks.


-- 
H.J.


