This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: RFC: Extend x86-64 psABI for 256bit AVX register


On Fri, Jun 6, 2008 at 4:40 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Fri, Jun 6, 2008 at 7:31 AM, Richard Guenther
> <richard.guenther@gmail.com> wrote:
>> On Fri, Jun 6, 2008 at 4:28 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> On Fri, Jun 06, 2008 at 06:50:26AM -0700, H.J. Lu wrote:
>>>> On Fri, Jun 06, 2008 at 10:28:34AM +0200, Jan Hubicka wrote:
>>>> > >
>>>> > > ymm0 and xmm0 are the same register. xmm0 is the lower 128bit
>>>> > > of ymm0. I am not sure if we need separate XMM registers from
>>>> > > YMM registers.
>>>> >
>>>> >
>>>> > Yes, I know that xmm0 is the lower part of ymm0.  I still think we
>>>> > ought to be able to support varargs that save ymm registers only when
>>>> > ymm values are passed, the same way we touch SSE only when SSE values
>>>> > are passed, via the EAX hint.
>>>>
>>>> Which register do you propose for hint? The current psABI uses RAX
>>>> for XMM registers. We can't change it to AL and AH for YMM without
>>>> breaking backward compatibility.
>>>>
>>>> > This way we will be able to support e.g. printf that has YMM printing %
>>>> > construct but don't need YMM enabled hardware when those are not used.
>>>> >
>>>> > This is why I think extending EAX to contain information about amount of
>>>> > XMM values to save and in addition YMM values to save is sane.  Then old
>>>> > non-YMM aware varargs prologues will crash when YMM values are passed,
>>>> > but all other combinations will work.
>>>>
>>>> I don't think it is necessary since -mavx will enable AVX code
>>>> generation for all SSE code. Unless the function only uses integers,
>>>> it will crash on non-YMM aware hardware.  That is, if even one
>>>> SSE register is used, which is hinted in RAX, the varargs prologue will
>>>> use AVX instructions to save it. We don't need another hint for AVX
>>>> instructions.
>>>>
>>>> > >
>>>> > > >
>>>> > > > I personally don't have much preferences over 1. or 2.. 1. seems
>>>> > > > relatively easy to implement too, or is packaging two 128bit values to
>>>> > > > single 256bit difficult in va_arg expansion?
>>>> > > >
>>>> > >
>>>> > > Access to 256bit register as lower and upper 128bits needs 2
>>>> > > instructions. For store
>>>> > >
>>>> > > vmovaps   %xmm7, -143(%rax)
>>>> > > vextractf128 $1, %ymm7, -15(%rax)
>>>> > >
>>>> > > For load
>>>> > >
>>>> > > vmovaps  -143(%rax),%xmm7
>>>> > > vinsertf128 $1, -15(%rax),%ymm7,%ymm7
>>>> > >
>>>> > > If we go beyond 256bit, we need more instructions to access
>>>> > > the full register. For 512bit, it will be split into lower 128bit,
>>>> > > middle 128bit and upper 256bit. 1024bit will have 4 parts.
>>>> > >
>>>> > > For #2, only one instruction will be needed for 256bit and
>>>> > > beyond.
>>>> >
>>>> > Yes, but we will still save half of stack space.  Well, I don't have
>>>> > much preferences here.  If it seems saner to simply save whole thing
>>>> > saving lower part twice, I am fine with that.
>>>>
>>>> I was told that it wasn't very easy to get decent performance with
>>>> split access. I extended my proposal to include a 16bit bitmask to
>>>> indicate which YMM registers should be saved. If the bit is 0,
>>>> we should only save the lower 128bit in the original register
>>>> save area. Otherwise, we should save only the whole YMM register.
>>>>
>>>
>>> On second thought, how useful is such a bitmask? Do we really
>>> need it? Is it acceptable to save the lower 128bit twice?
>>
>> Why do we need to save the lower 128bit at all if a ymm reg is passed?
>> Can't we assume "type-correctness"?
>
> Say a double is passed in YMM0/XMM0, we should save it in XMM0 area.
> Do we also need to save the whole 256bit YMM0? If we save both XMM0 and
> YMM0, we are free to use any location to load the saved register content.
> Either one will be correct.

What is the benefit here?  (What would the contents of the upper 128bit
be - apart from "undefined")

I suppose you can load into xmm0 and then "extend" to ymm0?

Richard.
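
For readers following the thread, here is a rough C model of the "split"
save discussed above: the lower 128 bits go into the existing XMM slot and
the upper 128 bits into a separate slot. It is only a sketch, not the psABI
layout or GCC's va_arg expansion. The type and function names (ymm_model,
split_save_slot, save_split, and so on) are made up for illustration, and
registers are modelled as plain byte arrays so the code runs on any machine.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one 256-bit register: bytes[0..15] are the
 * XMM view, bytes[16..31] are the upper half. */
typedef struct { uint8_t bytes[32]; } ymm_model;

/* Assumed split save slot: the existing 128-bit slot plus an extra
 * slot for bits 128..255 (the layout is illustrative only). */
typedef struct {
    uint8_t xmm_slot[16];
    uint8_t upper_slot[16];
} split_save_slot;

/* Prologue side: two stores, like the vmovaps + vextractf128 pair. */
static void save_split(split_save_slot *s, const ymm_model *r)
{
    memcpy(s->xmm_slot, r->bytes, 16);
    memcpy(s->upper_slot, r->bytes + 16, 16);
}

/* va_arg side for a double: only the XMM slot is needed. */
static double load_double(const split_save_slot *s)
{
    double d;
    memcpy(&d, s->xmm_slot, sizeof d);
    return d;
}

/* va_arg side for a 256-bit value: reassemble both halves. */
static void load_ymm(ymm_model *r, const split_save_slot *s)
{
    memcpy(r->bytes, s->xmm_slot, 16);
    memcpy(r->bytes + 16, s->upper_slot, 16);
}

/* "Load into xmm0 and extend": take the low half and zero the rest,
 * which is what a VEX-encoded 128-bit load does to the upper bits. */
static void load_xmm_zero_extend(ymm_model *r, const split_save_slot *s)
{
    memcpy(r->bytes, s->xmm_slot, 16);
    memset(r->bytes + 16, 0, 16);
}
```

In this model a double passed in YMM0/XMM0 is recovered from the XMM slot
alone, whatever the upper half held, which is the situation the question
above about the upper 128 bits refers to.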

