This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: RFC: Extend x86-64 psABI for 256bit AVX register
- From: "Richard Guenther" <richard dot guenther at gmail dot com>
- To: "H.J. Lu" <hjl dot tools at gmail dot com>
- Cc: "Jan Hubicka" <jh at suse dot cz>, "Jan Hubicka" <hubicka at ucw dot cz>, discuss at x86-64 dot org, GCC <gcc at gcc dot gnu dot org>, "Girkar, Milind" <milind dot girkar at intel dot com>, "Dmitriev, Serguei N" <serguei dot n dot dmitriev at intel dot com>, "Kreitzer, David L" <david dot l dot kreitzer at intel dot com>
- Date: Fri, 6 Jun 2008 16:31:19 +0200
- Subject: Re: RFC: Extend x86-64 psABI for 256bit AVX register
- References: <6dc9ffc80806050731s77b49d63id048d142d76560c9@mail.gmail.com> <20080605151511.GB24241@atrey.karlin.mff.cuni.cz> <6dc9ffc80806050914t76383385o380c0bb8ebc4e972@mail.gmail.com> <20080606082834.GC31743@kam.mff.cuni.cz> <20080606135026.GA14877@lucon.org> <20080606142813.GA18621@lucon.org>
On Fri, Jun 6, 2008 at 4:28 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Fri, Jun 06, 2008 at 06:50:26AM -0700, H.J. Lu wrote:
>> On Fri, Jun 06, 2008 at 10:28:34AM +0200, Jan Hubicka wrote:
>> > >
>> > > ymm0 and xmm0 are the same register. xmm0 is the lower 128bit
>> > > of ymm0. I am not sure if we need separate XMM registers from
>> > > YMM registers.
>> >
>> >
>> > Yes, I know that xmm0 is the lower part of ymm0. I still think we ought
>> > to be able to support varargs that save ymm registers only when ymm
>> > values are passed, the same way as we touch SSE only when SSE values
>> > are passed via EAX hint.
>>
>> Which register do you propose for hint? The current psABI uses RAX
>> for XMM registers. We can't change it to AL and AH for YMM without
>> breaking backward compatibility.
>>
>> > This way we will be able to support e.g. a printf that has a % construct
>> > for printing YMM values but doesn't need YMM enabled hardware when that
>> > construct is not used.
>> >
>> > This is why I think extending EAX to contain information about amount of
>> > XMM values to save and in addition YMM values to save is sane. Then old
>> > non-YMM aware varargs prologues will crash when YMM values are passed,
>> > but all other combinations will work.
>>
>> I don't think it is necessary since -mavx will enable AVX code
>> generation for all SSE codes. Unless the function only uses integer,
>> it will crash on non-YMM aware hardware. That is, if even one
>> SSE register is used, which is hinted in RAX, the varargs prologue will
>> use AVX instructions to save it. We don't need another hint for AVX
>> instructions.
>>
>> > >
>> > > >
>> > > > I personally don't have much preferences over 1. or 2.. 1. seems
>> > > > relatively easy to implement too, or is packaging two 128bit values to
>> > > > single 256bit difficult in va_arg expansion?
>> > > >
>> > >
>> > > Access to 256bit register as lower and upper 128bits needs 2
>> > > instructions. For store
>> > >
>> > > vmovaps %xmm7, -143(%rax)
>> > > vextractf128 $1, %ymm7, -15(%rax)
>> > >
>> > > For load
>> > >
>> > > vmovaps -143(%rax),%xmm7
>> > > vinsertf128 $1, -15(%rax),%ymm7,%ymm7
>> > >
>> > > If we go beyond 256bit, we need more instructions to access
>> > > the full register. For 512bit, it will be split into lower 128bit,
>> > > middle 128bit and upper 256bit. 1024bit will have 4 parts.
>> > >
>> > > For #2, only one instruction will be needed for 256bit and
>> > > beyond.
>> >
>> > Yes, but we will still save half of stack space. Well, I don't have
>> > much preferences here. If it seems saner to simply save whole thing
>> > saving lower part twice, I am fine with that.
>>
>> I was told that it wasn't very easy to get decent performance with
>> split access. I extended my proposal to include a 16bit bitmask to
>> indicate which YMM registers should be saved. If the bit is 0,
>> we should only save the lower 128bit in the original register
>> save area. Otherwise, we should save only the whole YMM register.
>>
>
> On second thought, how useful is such a bitmask? Do we really
> need it? Is it acceptable to save the lower 128bit twice?
Why do we need to save the lower 128bit at all if a ymm reg is passed?
Can't we assume "type-correctness"?

Richard.
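
The bitmask scheme quoted above, and the question of whether the lower
128 bits need saving at all, can be modeled in plain C. This is only a
sketch under stated assumptions, not the psABI text or GCC's prologue:
the types ymm_model and save_area_model, the function save_vector_args,
and the layout of the two save areas are all hypothetical names invented
here. The model covers the eight vector argument registers (xmm0-xmm7),
so only the low 8 bits of the 16-bit mask are used.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { NUM_VEC_ARG_REGS = 8, XMM_BYTES = 16, YMM_BYTES = 32 };

/* Hypothetical model of one 256-bit register.
   Bytes 0..15 are the xmm view, bytes 16..31 the upper half. */
typedef struct {
    uint8_t bytes[YMM_BYTES];
} ymm_model;

/* Hypothetical register save areas (layout is an assumption):
   xmm_area models the existing 128-bit slots,
   ymm_area models an additional area with 256-bit slots. */
typedef struct {
    uint8_t xmm_area[NUM_VEC_ARG_REGS][XMM_BYTES];
    uint8_t ymm_area[NUM_VEC_ARG_REGS][YMM_BYTES];
} save_area_model;

/* Save the first `count` vector argument registers.
   Bit i of ymm_mask set:   store the whole 256-bit register, once,
                            in the 256-bit slot only.
   Bit i of ymm_mask clear: store only the lower 128 bits in the
                            original 128-bit slot.
   Returns the number of bytes written. */
unsigned save_vector_args(save_area_model *area, const ymm_model *regs,
                          unsigned count, uint16_t ymm_mask)
{
    unsigned written = 0;

    assert(count <= NUM_VEC_ARG_REGS);
    for (unsigned i = 0; i < count; i++) {
        if (ymm_mask & (1u << i)) {
            memcpy(area->ymm_area[i], regs[i].bytes, YMM_BYTES);
            written += YMM_BYTES;
        } else {
            memcpy(area->xmm_area[i], regs[i].bytes, XMM_BYTES);
            written += XMM_BYTES;
        }
    }
    return written;
}
```

In this model a register flagged in the mask never touches the 128-bit
slot, which is the "type-correct" behaviour asked about: a va_arg that
expects a 256-bit value would read only the 256-bit slot. Saving the
lower half twice instead would mean doing the 128-bit copy
unconditionally and dropping the mask.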