This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: RFC: new rtl vec_set_unit/vec_get_unit


> 
> Example?

Imagine expanding a*b into a sequence of

 vec_extract a[1]
 vec_extract b[1]
 mult
 vec_set b[1]
 vec_extract a[2]
 vec_extract b[2]
 mult
 vec_set b[2]
 .....

To extract field 6 of a V16QImode vector you first need to rotate the
vector as V4SImode, then move the low part of the V4SImode into an SImode
register, and finally rotate the SImode register to bring the 6th field
to the front.  Storing the value back takes about the same amount of work.
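
To make the three steps concrete, here is a rough C model of them.  It is
only a sketch: the v4si struct, rotate_lanes and extract_byte are names
made up for illustration, and it assumes little-endian byte numbering
inside each 32-bit lane.

```c
#include <assert.h>
#include <stdint.h>

/* A 128-bit register viewed as four 32-bit lanes (V4SImode).
   Hypothetical model type, not a GCC or SSE type.  */
typedef struct { uint32_t lane[4]; } v4si;

/* Rotate the lanes so that lane n ends up in lane 0.  */
static v4si rotate_lanes(v4si v, unsigned n)
{
    v4si r;
    for (unsigned i = 0; i < 4; i++)
        r.lane[i] = v.lane[(i + n) % 4];
    return r;
}

/* Extract byte `index` (0..15), assuming little-endian byte order
   within each lane.  */
static uint8_t extract_byte(v4si v, unsigned index)
{
    assert(index < 16);
    v4si rotated = rotate_lanes(v, index / 4);    /* 1: rotate the V4SI   */
    uint32_t word = rotated.lane[0];              /* 2: low part -> SI    */
    return (uint8_t)(word >> (8 * (index % 4)));  /* 3: shift the SI reg  */
}
```

Note that the lane rotation is shared by the four bytes of a lane, which
is where the saving per 4 fields comes from.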

However, in this particular case the fastest way on Athlon is to move
everything into memory and do it there.  (This is not the case for
Pentium4, where the fastest way is to use the rotations, but one has to
optimize, since you need only one rotation per field and one
SSE->integer reg move per 4 fields.)

This is also not the case for V4SFmode.  To extract the 3rd field of V4SF
you still need two rotations (I believe) and two temporaries, while the
whole operation can be done without them like:

a[0]=a[0]*b[0]  (there is such instruction)
rotate a by 4
rotate b by 4
a[0]=a[0]*b[0]
rotate a by 4
rotate b by 4
a[0]=a[0]*b[0]
rotate a by 4
rotate b by 4
a[0]=a[0]*b[0]
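
The sequence above can be modelled in C roughly as follows.  This is a
sketch under assumptions: the v4sf struct and the helper names are
invented for illustration, "rotate by 4" is read as rotating by 4 bytes
(one SF element), and a fourth rotation is added at the end so the
result comes back in the original element order.

```c
/* Hypothetical model of a V4SF register.  */
typedef struct { float f[4]; } v4sf;

/* Multiply only element 0 and leave the other elements of a alone,
   like a scalar SSE multiply would.  */
static v4sf mul_low(v4sf a, v4sf b)
{
    a.f[0] *= b.f[0];
    return a;
}

/* Rotate by one element: element 1 becomes element 0.  */
static v4sf rotate1(v4sf v)
{
    v4sf r = {{ v.f[1], v.f[2], v.f[3], v.f[0] }};
    return r;
}

/* a*b computed without any extraction and without temporaries.  */
static v4sf mul_by_rotation(v4sf a, v4sf b)
{
    for (int i = 0; i < 4; i++) {
        a = mul_low(a, b);   /* a[0] = a[0] * b[0] */
        a = rotate1(a);      /* rotate a by one element */
        b = rotate1(b);      /* rotate b by one element */
    }
    return a;                /* four rotations restore the order */
}
```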

It would be nice to have expanders smart enough to do such
tricks (offloading to memory, doing the computations with rotated
sources and destinations, or extracting fields smartly by reusing rotated
temporaries).
I am not sure about a sane API to get this, but it is definitely important
for performance.

I was even thinking about something like what we do for call expansion - a
target hook that will have four functions:
 - initialize (receives the vector and a flag saying whether we are going
   to read, write or read/write the operand; it would return a target
   specific structure)
 - advance (switches to the next field in target specific order)
 - extract (gets the current field)
 - set (sets the current field)
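
As a rough illustration of the shape of such an interface, here is a C
sketch.  Everything in it is hypothetical (struct vec_iter, the
vec_iter_* names, the access enum); it is not an existing GCC hook, and
it models only the simplest strategy, where the vector already lives in
memory.

```c
#include <assert.h>
#include <stddef.h>

enum vec_access { VEC_READ, VEC_WRITE, VEC_READ_WRITE };

/* Hypothetical target specific state returned by initialize.  */
struct vec_iter {
    int *elts;               /* the vector, offloaded to memory */
    size_t nelts;
    size_t pos;              /* current field, in target-chosen order */
    enum vec_access access;
};

static struct vec_iter vec_iter_init(int *elts, size_t nelts,
                                     enum vec_access access)
{
    struct vec_iter it = { elts, nelts, 0, access };
    return it;
}

/* Move to the next field; returns 0 once all fields are visited.  */
static int vec_iter_advance(struct vec_iter *it)
{
    it->pos++;
    return it->pos < it->nelts;
}

static int vec_iter_extract(const struct vec_iter *it)
{
    assert(it->access != VEC_WRITE && it->pos < it->nelts);
    return it->elts[it->pos];
}

static void vec_iter_set(struct vec_iter *it, int value)
{
    assert(it->access != VEC_READ && it->pos < it->nelts);
    it->elts[it->pos] = value;
}

/* a = a * b, written only in terms of the four operations.  */
static void vec_mul(int *a, int *b, size_t n)
{
    struct vec_iter ia = vec_iter_init(a, n, VEC_READ_WRITE);
    struct vec_iter ib = vec_iter_init(b, n, VEC_READ);
    if (n == 0)
        return;
    do {
        vec_iter_set(&ia, vec_iter_extract(&ia) * vec_iter_extract(&ib));
        vec_iter_advance(&ib);
    } while (vec_iter_advance(&ia));
}
```

A target preferring rotations could keep the same four entry points and
rotate the register inside advance instead of bumping an index.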

This way we can write everything we need in the i386 backend; however, I
am not sure whether it is too overengineered, and it does not fit the GCC
design very well.  Since it is allowed to modify the input operands, it
must be initialized exactly once per register, making it uncomfortable
for the middle end.

Perhaps a plain flag to let the middle end choose one of the three methods -
extracting field by field, rotating and modifying the first field, or
offloading everything to memory - and provide expanders for that.
> 
> >Also vec_set_unit/vec_get_unit can be expanded into
> >vec_select/vec_duplicate operations so there is probably no need to
> >invent the RTL construct for that, we only need the named patterns.
> 
> Ok I see you're using vec_select in the x86 backend to get to a 
> particular element.  This is definitely better than my approach, but I 
> suggest we document it.
> 
> How does this look for extraction?:
> 
> (set (match_operand:SI 99)
>      (vec_select:SI (match_operand:V4SI 3))
> 
> However... how do you suggest we do the set operation, as in setting an 
> element of vector to a particular value. ??  You can't use vec_select 

You need something like

(define_insn "sse_loadss_1"
  [(set (match_operand:V4SF 0 "register_operand" "=x")
	(vec_merge:V4SF
	 (vec_duplicate:V4SF (match_operand:SF 1 "memory_operand" "m"))
	 (match_operand:V4SF 2 "const0_operand" "X")
	 (const_int 1)))]
  "TARGET_SSE"
  "movss\t{%1, %0|%0, %1}"
  [(set_attr "type" "ssemov")
   (set_attr "mode" "SF")])
> as a left hand value without major surgery, pretty much every place we 
> handle zero_extract, sign_extract, subreg, and strict_low_part.

I don't think it is sane to have these, and I would not like to add a
similar construct.  It causes a number of problems everywhere (SSA form,
dependency analysis and so on) - having a plain expression that merges both
values into the result is much more convenient to operate on.
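
To spell out what "plain expression" means here, this is a small C model
of the vec_merge/vec_duplicate combination, assuming the usual RTL
meaning that bit i of the mask selects element i from the first operand
and otherwise from the second.  The v4sf struct and set_element are
names made up for the sketch.

```c
#include <assert.h>

#define NELTS 4

/* Hypothetical model of a V4SF value.  */
typedef struct { float f[NELTS]; } v4sf;

/* vec_duplicate: broadcast a scalar into every element.  */
static v4sf vec_duplicate(float x)
{
    v4sf r;
    for (unsigned i = 0; i < NELTS; i++)
        r.f[i] = x;
    return r;
}

/* vec_merge: element i comes from a if bit i of mask is set,
   otherwise from b.  */
static v4sf vec_merge(v4sf a, v4sf b, unsigned mask)
{
    v4sf r;
    for (unsigned i = 0; i < NELTS; i++)
        r.f[i] = ((mask >> i) & 1u) ? a.f[i] : b.f[i];
    return r;
}

/* Setting one element is a pure function of the old vector and the
   new value; nothing is modified in place.  */
static v4sf set_element(v4sf v, unsigned idx, float x)
{
    assert(idx < NELTS);
    return vec_merge(vec_duplicate(x), v, 1u << idx);
}
```

With the second operand all zeros and a mask of 1, the same expression
gives the sse_loadss_1 pattern shown earlier.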

Honza
> 
> For the set, I was suggesting completely different rtl, ala:
> 
> >	(vec_set_unit:SI (reg:V2SI r9) 1 (reg:SI r5))
> >
> >Then, the expanders:
> >
> >(define_expand "vec_set_unitv2si"
> >	(set (match_operand:V2SI 0)
> >	     (vec_set_unit:V2SI (match_operand:V2SI 1)
> >				(match_operand 2 immediate)
> >				(match_operand:SI 3)))
> 
> Aldy

