[MIPS][LS2][2/5] Vector intrinsics

Thu Jun 12 08:45:00 GMT 2008

Richard Sandiford wrote:
> Maxim Kuvyrkov <maxim@codesourcery.com> writes:
>> Richard Sandiford wrote:
>>> Maxim Kuvyrkov <maxim@codesourcery.com> writes:
>>>> +;; Addition of doubleword integers stored in FP registers.
>>>> +;; Overflow is treated by wraparound.
>>>> +(define_insn "paddd"
>>>> +  [(set (match_operand:DI 0 "register_operand" "=f")
>>>> +        (plus:DI (match_operand:DI 1 "register_operand" "f")
>>>> +		 (match_operand:DI 2 "register_operand" "f")))]
>>>> +  "HAVE_LOONGSON_VECTOR_MODES"
>>>> +  "paddd\t%0,%1,%2")
>>> I don't think this pattern or psubd will ever be used for 64-bit ABIs;
>>> they'll be trumped by the normal addition and subtraction patterns.
>>> Thus paddd (...) and psub (...) will actually expand to "daddu" and
>>> "dsubu", moving to and from FPRs if necessary.
>> I think this is separate problem from what this patch tries to solve. 
> 
> Sorry, but I disagree.  As you say...
> 
>> The main objective of this patch is to add intrinsics that can be used 
>> to write hand-optimized code.
> 
> ...so users of paddd() would reasonably expect it to generate paddd
> instead of daddu, because presumably the code has been hand-optimised
> that way.  So they'd expect to see:
> 
>         paddd   $f0,$f1,$f2
> 
> rather than (on 64-bit targets)
> 
>         dmfc1   $5,$f1
>         dmfc1   $6,$f2
>         daddu   $4,$5,$6
>         dmtc1   $4,$f0

What I meant was the patch should provide proper support for all 
intrinsics it defines, so that, when paddd intrinsic is used, the 
compiler outputs paddd instruction.  This patch should not, however, go 
far beyond its primary objective and try to add optimizations that are 
not trivial to accomplish.  Sorry if I wasn't clear.

...

>>> Also, you _might_ end up
>>> using these patterns for 64-bit addition on 32-bit ABIs, even though the
>>> cost of moving to and from FPRs is higher than the usual add/shift
>>> sequence.
>> Right.  Although, my recollection from last time I looked at splitter is 
>> that if splitter sees something it can split, it splits it.
> 
> There is no splitter for 64-bit addition though.  We decompose it
> during expand.  The problem is that we then attach a REG_EQUAL
> note that contains the original 64-bit addition.  Passes which
> use that note could in principle recognise this instruction.
> 
>> For the time being, I propose to replace 'plus' in paddd and 'minus' in 
>> psubd with unspecs.  This way the intrinsics will still work, and 
>> there'll be no clashes with add<mode>3 and sub<mode>3.
> 
> Yeah, FWIW, I suspected this is the way we'd go.  I certainly can't
> think of anything better.
> 
> We lose the ability to constant-fold, but:
> 
>   (a) I imagine that's less likely to happen in hand-optimised code.
>   (b) We don't constant-fold most built-in functions.  There's no
>       intrinsic reason why this one should be any different.
>   (c) It would be simple to add a "simplify_unspec" target hook at some
>       point, if anyone was so inclined.

Unspecs it is.

> 
>> +;; We use 'unspec' instead of 'plus' here to avoid clash with
>> +;; mips.md::add<mode>3.  If 'plus' was used, then such instruction
>> +;; would be recognized as adddi3 and reload would make it use
>> +;; GPRs instead of FPRs.  Same is valid for loongson_paddd.
>                                                ^^^^^^^^^^^^^^
>>  (define_insn "loongson_paddd"
> 
> Not sure I understand the last sentence.  Did you mean loongson_psubd?
> Either way, I think it would be clearer with this sentence removed.
> Here...

OK.

> 
>>  ;; Subtraction of doubleword integers stored in FP registers.
>>  ;; Overflow is treated by wraparound.
>> +;; We use 'unspec' instead of 'minus' here to avoid clash with
>> +;; mips.md::sub<mode>3.  If 'minus' was used, then such instruction
>> +;; would be recognized as subdi3 and reload would make it use
>> +;; GPRs instead of FPRs.  Same is valid for loongson_paddd.
> 
> ...you can just say something like:
> 
> ;; See loongson_paddd for the reason we use 'unspec' rather than 'minus' here.

OK.

> 
>>> Finally, I adjusted the patch so that it applies on top of the
>>> built-in-table patch I sent yesterday in the [3/5] thread.
>>>
>>> I've not done anything about the paddd/psubd thing; I'll leave
>>> that to you ;)  Otherwise, does this look OK to you?
>> Yes, the patch looks great, thanks.  I'm testing it right now together 
>> with the simple fix for paddd/psubd issue.
> 
> Thanks.  I'll test the builtin patch separately on mipsisa64-elfoabi,
> just to be sure.  Once that's in, the combination of:
> 
>     http://gcc.gnu.org/ml/gcc-patches/2008-06/msg00554.html
> 
> and the paddd patch (adjusted as above) is OK to install,
> if testing succeeds.

I'll post the final patch once the tests are finished.  I'm also testing 
your built-in-table patch and will get back to you with the results.

...

> PS. Just as a reminder, I think it would be a good idea to have
>     scan-assembler tests too, as a separate submission.

This is on my list.

Thanks,

Maxim