[MIPS][LS2][2/5] Vector intrinsics

Fri Jun 6 12:31:00 GMT 2008

Richard Sandiford wrote:

...

> Maxim Kuvyrkov <maxim@codesourcery.com> writes:

...

>> Loongson behaves as generic MIPS III in the respect to moves to and from 
>> FP registers.  So to handle new modes I added them to MOVE64 and SPLITF 
>> mode_iterators and adjusted mips_split_doubleword_move() accordingly.
> 
> This part looks good as far as it goes, thanks, but your vector move
> pattern still has an "f<-f" (aka "fmove") alternative.  Like I say,
> that causes GCC to use a MOV.D instruction for something that is not a
> double-precision floating-point value.  My main concern was that using
> MOV.D in this way is usually incorrect; see:
> 
>   - section 5.7 in volume 1 of the MIPS32/64 ISA spec
>   - the documentation of MOV.FMT in volume 2 of the MIPS32/64 ISA spec
> 
> for more details.
> 
> Thus the standard ISA spec has two separate 64-bit move instructions:
> MOV.D and MOV.PS.  Both instructions move one 64-bit FPR to another,
> but they are used for two different kinds of 64-bit data.
> 
> There is no MOV.L instruction, so 64-bit integer moves must be done
> through a GPR.  (This is true for both for 32-bit and 64-bit targets.)
> Using MOV.D or MOV.PS for 64-bit integers is incorrect.
> 
> So when it comes to the new modes, I think there are two cases:
> 
>   (1) Loongson specifically exempts itself from this restriction.
>       You can use MOV.D for any kind of data, regardless of how
>       the source FPR has been used, or how the destination FPR
>       will be used.
> 
>   (2) We need to move through GPRs for the new modes too.
> 
> The current implementation falls between two stools.  It provides
> an "fmove" alternative in the move pattern (suggesting (1)),
> but mips_mode_ok_for_mov_fmt_p still returns false (suggesting (2)).

It is the (2).  I didn't spot "f" -> "f" alternative in
mov<mode>_internal when fixed the patch; this alternative should be
removed.  I checked with Loongson designers and they confirmed that
MOV.D should not be used in this case.

...

> Couldn't you replace the unsigned_p argument with the test:
> 
>   if (TREE_CODE (type) == INTEGER_TYPE && TYPE_UNSIGNED (type))
> 
> ?  That'd avoid accidentally mixing the unsigned_p and type arguments.

Fixed.

> 
>> +Also provided are helper functions for loading and storing values of the
>> +above 64-bit vector types to and from memory:
>> +
>> +@smallexample
>> +uint32x2_t vec_load_uw (uint32x2_t *src);
>> +uint16x4_t vec_load_uh (uint16x4_t *src);
>> +uint8x8_t vec_load_ub (uint8x8_t *src);
>> +int32x2_t vec_load_sw (int32x2_t *src);
>> +int16x4_t vec_load_sh (int16x4_t *src);
>> +int8x8_t vec_load_sb (int8x8_t *src);
>> +void vec_store_uw (uint32x2_t v, uint32x2_t *dest);
>> +void vec_store_uh (uint16x4_t v, uint16x4_t *dest);
>> +void vec_store_ub (uint8x8_t v, uint8x8_t *dest);
>> +void vec_store_sw (int32x2_t v, int32x2_t *dest);
>> +void vec_store_sh (int16x4_t v, int16x4_t *dest);
>> +void vec_store_sb (int8x8_t v, int8x8_t *dest);
>> +@end smallexample
> 
> I assume this is an existing cross-compiler API you're implementing.

This API was developed at CodeSourcery and was signed off by ST 
Microelectronics, the producers of Loongson CPUs.  I speculate that the 
design of the API is similar to arm_neon.h.

> Is it worth saying that plain C pointer dereferences would also work?
> Or do we explicitly want to steer the user away from that, because
> doing so doesn't conform to the API?  I think we should say something
> either way.
> 
> (For the record, I'm OK with keeping these functions, if you're
> implementing an existing API.)

How about "While it is possible to use plain C pointer dereferences, the 
following helper functions provide stable interface for loading and 
storing values of the above 64-bit vector types to and from memory" ?

> 
> I believe:
> 
>    foo_t vec_load_bar (const volatile foo_t *src);
> 
> would be more general; as it stands, I think using the functions
> on constant or volatile data would result in a warning.  Likewise:
> 
>    void vec_store_bar (foo_t v, volatile foo_t *dest);
> 
> But again, if you need to be precisely compatible with an API,
> I'm OK keeping it as-is.

"volatile" will probably kill CSE optimizations.  "const" seems like a 
good addition.

> 
>> +;; Handle legitimized moves between values of vector modes.
>> +(define_insn "mov<mode>_internal"
>> +  [(set (match_operand:VWHB 0 "nonimmediate_operand" "=m,f,f,d,f,d,m, f")
>> +	(match_operand:VWHB 1 "move_operand"          "f,m,f,f,d,d,YG,YG"))]
>> +  "HAVE_LOONGSON_VECTOR_MODES"
>> +  { return mips_output_move (operands[0], operands[1]); }
>> +  [(set_attr "type" "fpstore,fpload,fmove,mfc,mtc,move,fpstore,mtc")
>                                                           ^^^^^^^
> Just "store"; this is a GPR store.  The pattern is missing "m<-d",
> "d<-m" and "d<-YG", which you also need to handle.  I think:
> 
> (define_insn "mov<mode>_internal"
>   [(set (match_operand:VWHB 0 "nonimmediate_operand" "=m,f,d,f,d,m,d")
> 	(match_operand:VWHB 1 "move_operand"          "f,m,f,dYG,dYG,dYG,m"))]
>   "HAVE_LOONGSON_VECTOR_MODES"
>   { return mips_output_move (operands[0], operands[1]); }
>   [(set_attr "type" "fpstore,fpload,mfc,mtc,move,store,load")
> 
> would be correct for (2) above.  Add the fmove alternative back for (1).

Fixed, no fmove alternative now.  Thanks for pointing at this.

> 
>> --- config/mips/mips.h	(/local/gcc-trunk/gcc)	(revision 373)
>> +++ config/mips/mips.h	(/local/gcc-2/gcc)	(revision 373)
>> @@ -266,6 +266,11 @@ enum mips_code_readable_setting {
>>  				     || mips_tune == PROCESSOR_74KF3_2)
>>  #define TUNE_20KC		    (mips_tune == PROCESSOR_20KC)
>>  
>> +/* Whether vector modes and intrinsics for ST Microelectronics
>> +   Loongson-2E/2F processors should be enabled.  In o32 pairs of
>> +   floating-point registers provide 64-bit values.  */
>> +#define HAVE_LOONGSON_VECTOR_MODES TARGET_LOONGSON_2EF
>> +
>>  /* True if the pre-reload scheduler should try to create chains of
>>     multiply-add or multiply-subtract instructions.  For example,
>>     suppose we have:
> 
> I think this needs to depend on TARGET_HARD_FLOAT as well.

Fixed.

--
Maxim