This is the mail archive of the
mailing list for the GCC project.
Re: how to keep a hard register across multiple instructions?
- From: Jeff Law <law at redhat dot com>
- To: David Kang <dkang at isi dot edu>, gcc at gcc dot gnu dot org
- Date: Mon, 03 Nov 2014 09:21:58 -0700
- Subject: Re: how to keep a hard register across multiple instructions?
- Authentication-results: sourceware.org; auth=none
- References: <1727039389 dot 969391 dot 1414792865599 dot JavaMail dot root at zm dot isi dot edu>
Use a define_insn_and_split, but only split it after register allocation.

On 10/31/14 16:01, David Kang wrote:
I'm a newbie in gcc porting.
The architecture that I'm porting gcc to has a hardware FPU.
But the compiler has to generate code which builds an FPU instruction in an integer register
at run-time and writes the value to the FPU command register.
To make a single FPU instruction, three instructions are needed.
Two instructions build the FPU instruction in a 32-bit (cmd, operand, operand, operand) format.
Here the operands are FPU register numbers, which can be 0 ~ 32.
As an example, f3 = f1 + f2 can be encoded as (code of 'add', 2, 1, 3).
And the third instruction writes it to the FPU command register.
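
The encoding described above could be sketched in C as follows. The post does not give the field widths or the opcode values, so the four 8-bit fields, the name fpu_encode, and the value of FPU_OP_ADD are all assumptions made for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical opcode value; the real encoding is not given in the post. */
enum { FPU_OP_ADD = 0x01 };

/* Pack (cmd, operand, operand, operand) into one 32-bit word.
   Assumed layout: four 8-bit fields, with the opcode in the top byte. */
static uint32_t fpu_encode(uint8_t op, uint8_t src2, uint8_t src1, uint8_t dst)
{
    /* FPU register numbers are 0 ~ 32 according to the description. */
    assert(src2 <= 32 && src1 <= 32 && dst <= 32);
    return ((uint32_t)op << 24) | ((uint32_t)src2 << 16)
         | ((uint32_t)src1 << 8) | (uint32_t)dst;
}
```

Under these assumptions, f3 = f1 + f2 would be fpu_encode(FPU_OP_ADD, 2, 1, 3). The point of the sketch is that the word cannot be computed until the FPU register numbers are known.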
The architecture can issue up to 3 instructions at a time.
The difficulty lies in that we need to know the FPU register number
for those operands to generate the FPU instruction.
The easiest but lowest-performance implementation is to generate those three instructions
from a single "define_insn" as three consecutive instructions.
However, we then lose any chance of bundling those 3 instructions with other instructions for optimization.
So, I'm trying to find a better way.
I used "define_insn_and_split" and split a single FPU instruction into 3 instructions like this:
(Here I assume register r10 is used, but it can be any integer register.)
operands[0] = plus (operands[1], operands[2])
(1) r10 <- lower half of FPU instruction using (code of 'add', operands, operands, operands)
(2) r10 <- r10 | upper half of FPU instruction using (code of 'add', operands, operands, operands)
(3) (FPU cmd register) <- r10
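
Steps (1) to (3) can be modeled in C roughly as below. This is only a sketch: the 16-bit split between the "lower half" and "upper half", the function names, and the FPU command register modeled as a plain pointer are assumptions, not details given in the post.

```c
#include <assert.h>
#include <stdint.h>

/* Steps (1) and (2): build the command word in one integer register,
   assuming each "half" is 16 bits wide. */
static uint32_t build_in_register(uint32_t cmd_word)
{
    uint32_t r10 = cmd_word & 0x0000FFFFu;  /* (1) r10 <- lower half */
    r10 |= cmd_word & 0xFFFF0000u;          /* (2) r10 <- r10 | upper half */
    assert(r10 == cmd_word);                /* both halves together give the full word */
    return r10;
}

/* Step (3): write r10 to the FPU command register (modeled as a pointer). */
static void fpu_issue(volatile uint32_t *fpu_cmd, uint32_t cmd_word)
{
    *fpu_cmd = build_in_register(cmd_word);
}
```

All three steps read or write the same register, which is why they must agree on one hard register once allocation is done.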
The problem is that gcc notices that the operands are used before the 3rd instruction,
and allocates two different hard registers: one for instructions (1) and (2), and another for instruction (3).
So, when the code is generated, the first two instructions assume the wrong register.
This happens especially frequently when the '-unroll' option is used.
So, I wonder whether there is a way to tell gcc to use the same hard registers for
the operands across those three instructions.
Is it possible?
Or would there be any better way to generate efficient FPU code?
I would appreciate any advice or pointers to further information.