This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
How to implement efficiently builtins for dual-result instructions ?
- From: "Dmitry Cheresiz" <dmitry dot cheresiz at gmail dot com>
- To: gcc at gcc dot gnu dot org
- Date: Mon, 4 Feb 2008 11:14:29 +0100
- Subject: How to implement efficiently builtins for dual-result instructions ?
Hi,
I am implementing a GCC backend for a target architecture which
contains assembly instructions that write two result registers.
I am having difficulty implementing builtins for such instructions efficiently.
For example, the "super-load" instruction has the form super_ld32 rA
-> rX, rY.
This operation retrieves two consecutive 32-bit values from the
address given by register rA and writes them to the two 32-bit
registers rX and rY.
The registers rX and rY need not be consecutive.
To invoke this instruction from the source level, a compiler builtin
is provided.
Since C syntax doesn't provide functions with two results, this builtin
returns them via pointers: __super_ld32(int *x, int *y, int *a)
For example, let sampleC1, sampleC2, and currFrame be local variables. Then
__super_ld32(&sampleC1, &sampleC2, &currFrame[index_xy]);
means that the result of a super load from address &currFrame[index_xy] should
be assigned to the variables sampleC1 and sampleC2.
I expand this builtin as follows. First, I generate two new pseudo
regs and an RTL insn which assigns the results of the superload to
them. This insn is matched by the following definition:
(define_insn "customop_super_ld32"
  [(set (match_operand:SI 0 "register_operand" "=r")
        (unspec:SI [(match_operand:SI 2 "register_operand" "r")]
                   UNSPEC_customop_super_ld32))
   (set (match_operand:SI 1 "register_operand" "=r")
        (unspec:SI [(match_dup 2)]
                   UNSPEC_customop_super_ld32_2))]
  ""
  "super_ld32 %2 -> %0 %1"
)
To generate correct code, the builtin has to be expanded to a
semantically equivalent sequence of RTL insns. Therefore, I also
generate two store instructions, which write the generated pseudos to
the addresses given by the x and y parameters of the builtin.
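
For readers who want the intended semantics spelled out, the expansion
described above can be modelled in plain C. This is only a sketch: the
name super_ld32_ref is hypothetical, int32_t stands in for the 32-bit
values, and the two temporaries play the role of the two pseudo regs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference model of __super_ld32(x, y, a).
   Mirrors the expansion: one dual-result load into two temporaries
   (the pseudo regs), followed by two ordinary stores. */
static void super_ld32_ref(int32_t *x, int32_t *y, const int32_t *a)
{
    assert(x != NULL && y != NULL && a != NULL);

    /* "super_ld32 rA -> rX, rY": two consecutive 32-bit values. */
    int32_t first = a[0];
    int32_t second = a[1];

    /* The two stores emitted so that the expansion stays
       semantically equivalent to the pointer-based builtin. */
    *x = first;
    *y = second;
}
```

Both values are read before either store, so the model still behaves
sensibly if x or y happens to alias the source address.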
When compiling the code
__super_ld32(&sampleC1, &sampleC2, &currFrame[index_xy]);
I observed that when the variable sampleC1 is used relatively far
away from its definition (e.g. in a different basic block), GCC is
not able to determine that its value is still held in the
destination register of the super_ld32, although that
is the case. Instead, GCC reloads the variable from the stack. On the
other hand, when the use is close to the definition, GCC avoids the load.
In that case, the store generated during the builtin expansion is also
often eliminated by the dse pass, resulting in efficient code.
I would like to achieve such efficient code generation also in more
complex cases.
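
To make the two situations concrete, here is a small C sketch of the
kind of code I mean. Everything in it is illustrative: super_ld32_ref
is a hypothetical plain-C stand-in for the builtin, and blend_ref is
an invented function whose only purpose is to show one use in the same
basic block as the load and one use in a different basic block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical plain-C stand-in for __super_ld32. */
static void super_ld32_ref(int32_t *x, int32_t *y, const int32_t *a)
{
    int32_t first = a[0];
    int32_t second = a[1];
    *x = first;
    *y = second;
}

/* Invented example function, not taken from real code. */
static int32_t blend_ref(const int32_t *currFrame, int index_xy, int use_first)
{
    int32_t sampleC1, sampleC2;

    assert(currFrame != NULL);
    super_ld32_ref(&sampleC1, &sampleC2, &currFrame[index_xy]);

    /* Near use: same basic block as the load. */
    int32_t acc = sampleC2 * 2;

    /* Far use: sampleC1 is only read on the other side of a branch,
       i.e. in a different basic block from its definition. */
    if (use_first)
        acc += sampleC1;

    return acc;
}
```

With the real builtin in place of the stand-in, the use of sampleC1
behind the branch is the one that corresponds to the reload from the
stack described above.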
I would appreciate it if somebody could suggest a mechanism in GCC that
would be useful for this, or comment on the following approaches I am
currently considering.
Approach A.
Replace the builtin that returns two results via pointers with two
builtins that each return a single result:
sampleC1 = __super_ld32_part1(&currFrame[index_xy]);
sampleC2 = __super_ld32_part2(&currFrame[index_xy], sampleC1);
Each of these two builtins can be expanded to a single unspec RTL insn.
I make __super_ld32_part2 take sampleC1 as an argument in order to
create a dependency and to be able to identify that the two form a pair.
At a later stage, I would like to identify such insn pairs and
replace each pair with a single RTL insn which should eventually
produce the desired super_ld32 rA -> rX rY insn.
I wonder whether the combine pass would be able to do this, or whether
it is better to implement it manually, for example in pass_final?
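
At the source level, the intended behaviour of the two single-result
builtins can be sketched in plain C as follows. The _ref names are
hypothetical stand-ins, not the real builtins, and int32_t stands in
for the 32-bit values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of __super_ld32_part1: first 32-bit value. */
static int32_t super_ld32_part1_ref(const int32_t *a)
{
    assert(a != NULL);
    return a[0];
}

/* Hypothetical model of __super_ld32_part2: second 32-bit value.
   The 'first' argument does not affect the result; it exists only
   to make part2 depend on the result of part1, so that the two
   calls can later be recognised as a pair. */
static int32_t super_ld32_part2_ref(const int32_t *a, int32_t first)
{
    assert(a != NULL);
    (void)first;
    return a[1];
}
```

In this form neither result variable has its address taken, which is
the main difference from the pointer-based builtin.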
Approach B.
Attach REG_EQUIV notes to the destination registers of the RTL for
customop_super_ld32, hoping that this will help the optimization
passes realize that these regs contain the values of the variables
whose addresses are given to the builtin, and let these passes
optimize away the unnecessary ld/st insns.
I see, however, that such notes are supposed to be applied only to
insns which have a single destination register.