This is the mail archive of the
mailing list for the GCC project.
Re: (R5900) Implementing Vector Support
- From: Richard Henderson <rth at redhat dot com>
- To: Woon yung Liu <ysai187 at yahoo dot com>, Gcc Mailing List <gcc at gcc dot gnu dot org>
- Date: Mon, 9 May 2016 07:53:52 -1000
- Subject: Re: (R5900) Implementing Vector Support
- References: <e7a9b3fd-4008-c321-3c6b-8532a4a5a8ec at redhat dot com> <1100982660 dot 199674 dot 1462606080844 dot JavaMail dot yahoo at mail dot yahoo dot com>
On 05/06/2016 09:28 PM, Woon yung Liu wrote:
> Regarding multiplication of vectors, is there a way to work with a multiplication operation that produces something like this (the result is spread across these 3 registers), without re-ordering any elements:
> RD: A6xB6, A4xB4, A2xB2, A0xB0
> LO: A7xB7, A6xB6, A3xB3, A2xB2
> HI: A5xB5, A4xB4, A1xB1, A0xB0
> A0-A7 and B0-B7 are the 8 elements of two V8HI vectors, which are multiplied together to produce a widened multiplication result.
> It looks like the vector hi/lo multiplication pattern would work with the values in HI and LO, but the order of the elements doesn't seem to be what GCC expects.
> Assuming that it is possible to put this pattern to use, does GCC allow the vec_widen_smult_hi and
> vec_widen_smult_lo patterns to be combined together, as with the divmod (division + modulus) patterns?
> The instruction described above (PMULTH) calculates both the hi and lo parts of the result in one instruction. Hence combining the two patterns would be more efficient.
You can use this if you reshuffle the results.
Since it appears that PMULTH naturally produces even results in RD, it would
seem to make the most sense to attempt to construct the odd results from LO+HI.
However, I don't see anything in the TX79 ISA that's particularly helpful there.
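    # t1, t2: presumably copies of the HI/LO accumulators (how they are loaded is not shown)
    pmulth  r0, x, y      # r0 = even products; HI and LO receive all eight products
    pcpyld  r1, t1, t2    # r1 = low 64 bits of t1 : low 64 bits of t2
    pcpyud  r2, t2, t1    # r2 = high 64 bits of t1 : high 64 bits of t2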
would appear to produce the results gcc expects for the hi/lo multiples.
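The data movement in that sequence can be checked with a small C model. This is only a sketch under stated assumptions, not TX79 code: the type and function names (v4si, v8hi, pmulth_model, pcpyld_model, pcpyud_model, check_sequence) are made up for illustration, lane 0 is taken to be the least significant element, and t1 is assumed to hold the accumulator with products 2, 3, 6, 7 while t2 holds the one with products 0, 1, 4, 5. The message does not show how t1 and t2 are loaded, so that mapping should be verified against the ISA manual.

```c
#include <assert.h>
#include <stdint.h>

/* A 128-bit register viewed as four 32-bit lanes; w[0] is least significant. */
typedef struct { int32_t w[4]; } v4si;
/* A 128-bit register viewed as eight 16-bit lanes; h[0] is least significant. */
typedef struct { int16_t h[8]; } v8hi;

/* Model of pmulth: rd receives the even products, and the two accumulators
   receive products {0,1,4,5} and {2,3,6,7}.  Which accumulator is HI and
   which is LO is deliberately left open here (assumption, see lead-in). */
void pmulth_model(v8hi x, v8hi y, v4si *rd, v4si *acc_0145, v4si *acc_2367)
{
    int32_t p[8];
    for (int i = 0; i < 8; i++)
        p[i] = (int32_t)x.h[i] * (int32_t)y.h[i];
    *rd       = (v4si){{ p[0], p[2], p[4], p[6] }};
    *acc_0145 = (v4si){{ p[0], p[1], p[4], p[5] }};
    *acc_2367 = (v4si){{ p[2], p[3], p[6], p[7] }};
}

/* pcpyld rd, rs, rt: low half of rd = low half of rt,
                      high half of rd = low half of rs. */
v4si pcpyld_model(v4si rs, v4si rt)
{
    return (v4si){{ rt.w[0], rt.w[1], rs.w[0], rs.w[1] }};
}

/* pcpyud rd, rs, rt: low half of rd = high half of rs,
                      high half of rd = high half of rt. */
v4si pcpyud_model(v4si rs, v4si rt)
{
    return (v4si){{ rs.w[2], rs.w[3], rt.w[2], rt.w[3] }};
}

/* Runs the three-instruction sequence and returns 1 if r1 holds products
   0..3 and r2 holds products 4..7, i.e. the lo and hi widened halves. */
int check_sequence(void)
{
    v8hi x = {{ 1, -2, 3, -4, 5, -6, 7, -8 }};
    v8hi y = {{ 10, 20, 30, 40, 50, 60, 70, 80 }};
    v4si r0, t1, t2;

    pmulth_model(x, y, &r0, &t2, &t1);  /* t2 = {0,1,4,5}, t1 = {2,3,6,7} */
    v4si r1 = pcpyld_model(t1, t2);     /* pcpyld r1, t1, t2 */
    v4si r2 = pcpyud_model(t2, t1);     /* pcpyud r2, t2, t1 */

    for (int i = 0; i < 4; i++) {
        if (r1.w[i] != (int32_t)x.h[i] * y.h[i])
            return 0;
        if (r2.w[i] != (int32_t)x.h[i + 4] * y.h[i + 4])
            return 0;
    }
    return 1;
}
```

Calling check_sequence() from a test harness of your own exercises the whole sequence. If the two accumulators are swapped relative to the assumption above, swapping t1 and t2 in the pcpyld/pcpyud operands gives the same two result vectors.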
Don't worry overmuch about initially generating two copies of the pmulth
instruction. We have a similar problem with the ia64 patterns. Rely on the
RTL CSE pass to remove the duplicate instructions.