[PATCH 2/2] aarch64: Provide FPR alternatives for some bit insertions [PR109632]

Richard Sandiford richard.sandiford@arm.com
Tue May 23 11:02:54 GMT 2023


Richard Biener <richard.guenther@gmail.com> writes:
> On Tue, May 23, 2023 at 12:38 PM Richard Sandiford via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>>
>> At -O2, and so with SLP vectorisation enabled:
>>
>>     struct complx_t { float re, im; };
>>     complx_t add(complx_t a, complx_t b) {
>>       return {a.re + b.re, a.im + b.im};
>>     }
>>
>> generates:
>>
>>         fmov    w3, s1
>>         fmov    x0, d0
>>         fmov    x1, d2
>>         fmov    w2, s3
>>         bfi     x0, x3, 32, 32
>>         fmov    d31, x0
>>         bfi     x1, x2, 32, 32
>>         fmov    d30, x1
>>         fadd    v31.2s, v31.2s, v30.2s
>>         fmov    x1, d31
>>         lsr     x0, x1, 32
>>         fmov    s1, w0
>>         lsr     w0, w1, 0
>>         fmov    s0, w0
>>         ret
>>
>> This is because complx_t is passed and returned in FPRs, but GCC gives
>> it DImode.
>
> Isn't that the choice of the target?  Of course "FPRs" might mean a
> single FPR here and arguably DFmode would be similarly bad?

Yeah, the problem is really the "single register" aspect, rather than
the exact choice of mode.  We're happy to store DImode values in FPRs
if it makes sense (and we will do for this example, after the patch).

V2SFmode or DFmode would be just as bad, like you say.
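
As an illustration only (not part of the patch), the single-register
layout behind the quoted assembly can be sketched in C++.  The helper
names below are hypothetical, and the sketch assumes the little-endian
layout, with "re" in bits 0-31 and "im" in bits 32-63 of one 64-bit
value:

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helpers for illustration; none of these names come from
// GCC or from the patch.

// Reinterpret a float's bits as a 32-bit integer (cf. "fmov w3, s1").
inline std::uint32_t float_bits(float f) {
  std::uint32_t u;
  std::memcpy(&u, &f, sizeof u);
  return u;
}

// Reinterpret a 32-bit integer's bits as a float (cf. "fmov s1, w0").
inline float bits_float(std::uint32_t u) {
  float f;
  std::memcpy(&f, &u, sizeof f);
  return f;
}

// Build the single 64-bit value: re in the low half, im inserted into
// the high half (cf. "bfi x0, x3, 32, 32").
inline std::uint64_t pack_complx(float re, float im) {
  return std::uint64_t(float_bits(re)) |
         (std::uint64_t(float_bits(im)) << 32);
}

// Split the 64-bit value back into its fields (cf. the "lsr" pair).
inline float unpack_re(std::uint64_t v) {
  return bits_float(std::uint32_t(v));
}
inline float unpack_im(std::uint64_t v) {
  return bits_float(std::uint32_t(v >> 32));
}
```

Each pack/unpack step corresponds to a GPR<->FPR round trip in the
generated code, whichever 64-bit mode the combined value is given.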

> That said, to the ppc folks who also recently tried to change how
> argument passing materializes I suggested to piggy-back on a
> SRA style analysis (you could probably simply build the access
> tree for all function parameters using its infrastructure) to drive
> RTL expansion heuristics (it's heuristics after all...) what exact
> (set of) pseudo / stack slot we want to form from the actual
> argument hard registers.

My long-term plan is to allow a DECL_RTL to be a PARALLEL of pseudos,
just like the DECL_INCOMING_RTL of a PARM_DECL can be a PARALLEL of
hard registers.  This also makes it possible to store things that are
currently BLKmode (but still passed and returned in registers).
E.g. it means that a 12-byte structure can be stored in registers
rather than being forced to the stack frame.
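
A hypothetical example of the 12-byte case (vec3_t and add3 are
made-up names, not taken from the patch or the PR) might look like
this.  The assumption is that a struct of three floats counts as a
homogeneous floating-point aggregate under the AAPCS64, so it is
passed and returned in FPRs, while having no 12-byte scalar mode to
live in:

```cpp
// Hypothetical type for illustration: three 4-byte floats, 12 bytes.
struct vec3_t { float x, y, z; };

// Assumes a 4-byte float with no padding between the fields.
static_assert(sizeof(vec3_t) == 12, "expected a 12-byte structure");

// Field-wise addition, analogous to the complx_t example above.
vec3_t add3(vec3_t a, vec3_t b) {
  return {a.x + b.x, a.y + b.y, a.z + b.z};
}
```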

The idea (at least at first) is to handle only those cases that
make sense from an ABI point of view.  We'd still be relying on
SRA to split up operations on individual fields.

I have a WIP patch that gives some promising improvements,
but it needs more time.

Thanks,
Richard

