This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [MIPS][LS2][2/5] Vector intrinsics


Maxim Kuvyrkov <maxim@codesourcery.com> writes:
> Richard Sandiford wrote:
>>> Loongson behaves as generic MIPS III in the respect to moves to and from 
>>> FP registers.  So to handle new modes I added them to MOVE64 and SPLITF 
>>> mode_iterators and adjusted mips_split_doubleword_move() accordingly.
>> 
>> This part looks good as far as it goes, thanks, but your vector move
>> pattern still has an "f<-f" (aka "fmove") alternative.  Like I say,
>> that causes GCC to use a MOV.D instruction for something that is not a
>> double-precision floating-point value.  My main concern was that using
>> MOV.D in this way is usually incorrect; see:
>> 
>>   - section 5.7 in volume 1 of the MIPS32/64 ISA spec
>>   - the documentation of MOV.FMT in volume 2 of the MIPS32/64 ISA spec
>> 
>> for more details.
>> 
>> Thus the standard ISA spec has two separate 64-bit move instructions:
>> MOV.D and MOV.PS.  Both instructions move one 64-bit FPR to another,
>> but they are used for two different kinds of 64-bit data.
>> 
>> There is no MOV.L instruction, so 64-bit integer moves must be done
>> through a GPR.  (This is true for both for 32-bit and 64-bit targets.)
>> Using MOV.D or MOV.PS for 64-bit integers is incorrect.
>> 
>> So when it comes to the new modes, I think there are two cases:
>> 
>>   (1) Loongson specifically exempts itself from this restriction.
>>       You can use MOV.D for any kind of data, regardless of how
>>       the source FPR has been used, or how the destination FPR
>>       will be used.
>> 
>>   (2) We need to move through GPRs for the new modes too.
>> 
>> The current implementation falls between two stools.  It provides
>> an "fmove" alternative in the move pattern (suggesting (1)),
>> but mips_mode_ok_for_mov_fmt_p still returns false (suggesting (2)).
>
> It is the (2).  I didn't spot "f" -> "f" alternative in
> mov<mode>_internal when fixed the patch; this alternative should be
> removed.  I checked with Loongson designers and they confirmed that
> MOV.D should not be used in this case.
>
> ...

OK, thanks.

>>> +Also provided are helper functions for loading and storing values of the
>>> +above 64-bit vector types to and from memory:
>>> +
>>> +@smallexample
>>> +uint32x2_t vec_load_uw (uint32x2_t *src);
>>> +uint16x4_t vec_load_uh (uint16x4_t *src);
>>> +uint8x8_t vec_load_ub (uint8x8_t *src);
>>> +int32x2_t vec_load_sw (int32x2_t *src);
>>> +int16x4_t vec_load_sh (int16x4_t *src);
>>> +int8x8_t vec_load_sb (int8x8_t *src);
>>> +void vec_store_uw (uint32x2_t v, uint32x2_t *dest);
>>> +void vec_store_uh (uint16x4_t v, uint16x4_t *dest);
>>> +void vec_store_ub (uint8x8_t v, uint8x8_t *dest);
>>> +void vec_store_sw (int32x2_t v, int32x2_t *dest);
>>> +void vec_store_sh (int16x4_t v, int16x4_t *dest);
>>> +void vec_store_sb (int8x8_t v, int8x8_t *dest);
>>> +@end smallexample
>> 
>> I assume this is an existing cross-compiler API you're implementing.
>
> This API was developed at CodeSourcery and was signed off by ST 
> Microelectronics, the producers of Loongson CPUs.  I speculate that the 
> design of the API is similar to arm_neon.h.
>
>> Is it worth saying that plain C pointer dereferences would also work?
>> Or do we explicitly want to steer the user away from that, because
>> doing so doesn't conform to the API?  I think we should say something
>> either way.
>> 
>> (For the record, I'm OK with keeping these functions, if you're
>> implementing an existing API.)
>
> How about "While it is possible to use plain C pointer dereferences, the 
> following helper functions provide stable interface for loading and 
> storing values of the above 64-bit vector types to and from memory" ?

Hmm.  If this is a newly-defined interface, I really have to question
the wisdom of these functions.  The wording above suggests that there's
something "unstable" about normal C pointer and array accesses.
There shouldn't be ;)  They ought to work as expected.

The patch rightly uses well-known insn names for well-known operations
like vector addition, vector maximum, and so on.  As well as allowing
autovectorisation, I believe this means you could write:

    uint8x8_t *a;

    a[0] = a[1] + a[2];

(It might be nice to have tests to make sure that this does indeed
work when using the new header file.  It could just be cut-&-paste
from the version that uses intrinsic functions.)

I just think that, given GCC's vector extensions, having these
functions as well is confusing.  I take what you say about it
being consistent with arm_neon.h, but AltiVec doesn't have these
sorts of function, and GCC's generic vector support was heavily
influenced by AltiVec.

Sorry for shooting the messenger here.  I realise this wasn't
your decision.

>> I believe:
>> 
>>    foo_t vec_load_bar (const volatile foo_t *src);
>> 
>> would be more general; as it stands, I think using the functions
>> on constant or volatile data would result in a warning.  Likewise:
>> 
>>    void vec_store_bar (foo_t v, volatile foo_t *dest);
>> 
>> But again, if you need to be precisely compatible with an API,
>> I'm OK keeping it as-is.
>
> "volatile" will probably kill CSE optimizations.  "const" seems like a 
> good addition.

Hmm, good point.  That's another mark against these functions IMO ;)

>>> +;; Handle legitimized moves between values of vector modes.
>>> +(define_insn "mov<mode>_internal"
>>> +  [(set (match_operand:VWHB 0 "nonimmediate_operand" "=m,f,f,d,f,d,m, f")
>>> +	(match_operand:VWHB 1 "move_operand"          "f,m,f,f,d,d,YG,YG"))]
>>> +  "HAVE_LOONGSON_VECTOR_MODES"
>>> +  { return mips_output_move (operands[0], operands[1]); }
>>> +  [(set_attr "type" "fpstore,fpload,fmove,mfc,mtc,move,fpstore,mtc")
>>                                                           ^^^^^^^
>> Just "store"; this is a GPR store.  The pattern is missing "m<-d",
>> "d<-m" and "d<-YG", which you also need to handle.  I think:
>> 
>> (define_insn "mov<mode>_internal"
>>   [(set (match_operand:VWHB 0 "nonimmediate_operand" "=m,f,d,f,d,m,d")
>> 	(match_operand:VWHB 1 "move_operand"          "f,m,f,dYG,dYG,dYG,m"))]
>>   "HAVE_LOONGSON_VECTOR_MODES"
>>   { return mips_output_move (operands[0], operands[1]); }
>>   [(set_attr "type" "fpstore,fpload,mfc,mtc,move,store,load")
>> 
>> would be correct for (2) above.  Add the fmove alternative back for (1).
>
> Fixed, no fmove alternative now.

Thanks.  It sounds from James Ruan's message that 2F could use an FPU
OR instruction here, but it's fine to handle that separately.

Richard


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]