This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Re: RFC: Representing vector lane load/store operations


On Tue, Mar 22, 2011 at 8:43 PM, Richard Sandiford
<rdsandiford@googlemail.com> wrote:
> Richard Guenther <richard.guenther@gmail.com> writes:
>> Simple.  Just make them registers anyway (I did that in the past
>> when working on middle-end arrays).  You'd set DECL_GIMPLE_REG_P
>> on the decl.
>
> OK, thanks, I'll give that a go.  TBH, I'm still hopeful we can
> do without it, because we do seem to cope quite well as things stand.
> But I suppose that might not hold true as the examples get more complicated.
>
>>   4. a vector-of-vectors type
>>
>>      Cons
>>        * I don't think we want that ;)
>
> Yeah :-)
>
>>>    __builtin_load_lanes (REF : array N*M of X)
>>>      returns array N of vector M of X
>>>      maps to vldN on ARM
>>>      in practice, the result would be used in assignments of the form:
>>>        vectorY = ARRAY_REF <result, Y>
>>>
>>>    __builtin_store_lanes (VECTORS : array N of vector M of X)
>>>      returns array N*M of X
>>>      maps to vstN on ARM
>>>      in practice, the argument would be populated by assignments of the form:
>>>        ARRAY_REF <VECTORS, Y> = vectorY
>>>
>>>    __builtin_load_lane (REF : array N of X,
>>>                         VECTORS : array N of vector M of X,
>>>                         LANE : integer)
>>>      returns array N of vector M of X
>>>      maps to vldN_lane on ARM
>>>
>>>    __builtin_store_lane (VECTORS : array N of vector M of X,
>>>                          LANE : integer)
>>>      returns array N of X
>>>      maps to vstN_lane on ARM
>>>
>>>    __builtin_load_dup (REF : array N of X)
>>>      returns array N of vector M of X
>>>      maps to vldN_dup on ARM
>>>
>>> I've hacked up a prototype of this and it seems to produce good code.
>>> What do you think?
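
For reference, the lane and dup variants above already exist at the C level
as arm_neon.h intrinsics; the sketch below shows the mapping, assuming N=3,
M=2 and float elements (the __builtin_* names themselves are only proposed
here, not an existing API):

  #include <arm_neon.h>

  /* Sketch only: N = 3, M = 2, X = float.  vld3_lane/vst3_lane/vld3_dup
     are the existing arm_neon.h intrinsics that the proposed
     __builtin_load_lane/__builtin_store_lane/__builtin_load_dup
     would map to.  */

  /* Reload lane 1 of each of the three vectors from three consecutive
     floats at P, keeping the other lanes of V (cf. __builtin_load_lane).  */
  float32x2x3_t
  reload_lane1 (const float *p, float32x2x3_t v)
  {
    return vld3_lane_f32 (p, v, 1);
  }

  /* Store lane 1 of each vector back as three consecutive floats
     (cf. __builtin_store_lane).  */
  void
  store_lane1 (float *p, float32x2x3_t v)
  {
    vst3_lane_f32 (p, v, 1);
  }

  /* Load three floats and duplicate each across all lanes of its
     vector (cf. __builtin_load_dup).  */
  float32x2x3_t
  load_dup (const float *p)
  {
    return vld3_dup_f32 (p);
  }
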
>>
>> How do you expect these to be used?  That is, would you ever expect
>> components of those large vectors/arrays to be used in operations
>> like add, or does the HW provide vector-lane variants for those as well?
>
> The individual vectors would be used for add, etc.  That's what the
> ARRAY_REF stuff above is supposed to be getting at.  So...
>
>> Thus, will
>>
>>   for (i=0; i<N; ++i)
>>     X[i] = Y[i] + Z[i];
>>
>> result in a single add per vector lane load or a single vector lane load
>> for M "unrolled" instances of (small) vector adds?  If the latter then
>> we have to think about indexing the vector lanes as well as allowing
>> partial stores (or have a vector-lane construct operation).  Representing
>> vector lanes as automatic memory (with array of vector type) makes
>> things easy, but eventually not very efficient.
>
> ...Ira would know best, but I don't think it would be used for this
> kind of loop.  It would be more something like:
>
>   for (i=0; i<N; ++i)
>     X[i] = Y[i].red + Y[i].blue + Y[i].green;
>
> (not a realistic example).  You'd then have:
>
>    compoundY = __builtin_load_lanes (Y);
>    red = ARRAY_REF <compoundY, 0>
>    green = ARRAY_REF <compoundY, 1>
>    blue = ARRAY_REF <compoundY, 2>
>    D1 = red + green
>    D2 = D1 + blue
>    MEM_REF <X> = D2;
>
> My understanding is that we'd never do any operations besides ARRAY_REFs
> on the compound value, and that the individual vectors would be treated
> pretty much like any other.
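
This three-channel example maps directly onto what vld3 already does today
through arm_neon.h; a rough sketch of the vectorized loop body, assuming
interleaved red/green/blue floats and a trip count that is a multiple of 4:

  #include <arm_neon.h>

  /* The .val[0..2] accesses below play the role of the ARRAY_REFs on
     the compound value; vld3q_f32 is the vld3 that __builtin_load_lanes
     would map to.  Assumes interleaved r,g,b floats and n a multiple
     of 4.  */
  void
  sum_channels (float *restrict x, const float *restrict y, int n)
  {
    for (int i = 0; i < n; i += 4)
      {
        float32x4x3_t c = vld3q_f32 (y + 3 * i);  /* compoundY */
        float32x4_t red   = c.val[0];             /* ARRAY_REF <compoundY, 0> */
        float32x4_t green = c.val[1];             /* ARRAY_REF <compoundY, 1> */
        float32x4_t blue  = c.val[2];             /* ARRAY_REF <compoundY, 2> */
        float32x4_t d1 = vaddq_f32 (red, green);  /* D1 = red + green */
        float32x4_t d2 = vaddq_f32 (d1, blue);    /* D2 = D1 + blue */
        vst1q_f32 (x + i, d2);                    /* MEM_REF <X> = D2 */
      }
  }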

Ok, I thought it might be used to get a larger vectorization factor for
loads and stores, basically making further unrolling cheaper because you
don't have to duplicate the loads and stores.

>> I had new tree/stmt codes for array loads/stores for middle-end arrays.
>> Eventually the vector lane support can at least walk in the same direction
>> that middle-end arrays would ;)
>
> What's the status of the middle-end array stuff?  A quick search
> turned up your paper, but is it still WIP, or has it already gone in?
> (Showing my ignorance of tree-level stuff here. :-))  It does sound
> like it'd be a good fit for these ops.

Well, the work is basically suspended (though a lot of the middle-end
surgery that was required went in) - I was stuck on needing the Fortran
frontend to generate these expressions in order to test on real code
(rather than constructing examples with my lame C frontend + builtins
hack).  ISTR porting the patch to tuples; the current patch seems to have
two or three places that adjust the middle-end in order to allow
aggregate-typed SSA names.

But as you have partial defs of the vector lane array, the simplest
approach is probably not to make them registers.  Be prepared
for some surprises during RTL expansion though ;)
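
To make the partial-def point concrete: on the store side, the
array-of-vectors temporary is built up one component at a time before a
single vstN consumes it.  Sketched with the existing intrinsics (again
assuming N=3 and float elements):

  #include <arm_neon.h>

  /* Each assignment to c.val[i] is a partial def of the aggregate
     (the ARRAY_REF <VECTORS, Y> = vectorY assignments in the proposal);
     vst3q_f32 then performs the single interleaving store that
     __builtin_store_lanes would map to.  */
  void
  store_channels (float *out, float32x4_t red, float32x4_t green,
                  float32x4_t blue)
  {
    float32x4x3_t c;
    c.val[0] = red;
    c.val[1] = green;
    c.val[2] = blue;
    vst3q_f32 (out, c);  /* -> r,g,b,r,g,b,... interleaved in memory */
  }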

Richard.

> Richard
>

