This is the mail archive of the mailing list for the GCC project.
Re: [RL78] Questions about code-generation
- From: Richard Hulme <peper03 at yahoo dot com>
- To: gcc at gcc dot gnu dot org
- Date: Mon, 24 Mar 2014 22:11:26 +0100
- Subject: Re: [RL78] Questions about code-generation
- References: <1394465260 dot 82407 dot YahooMailNeo at web125603 dot mail dot ne1 dot yahoo dot com> <201403102137 dot s2ALbDMw016198 at greed dot delorie dot com> <531E49CD dot 1040000 at yahoo dot com> <201403110017 dot s2B0HUUW020617 at greed dot delorie dot com> <1394497818 dot 19458 dot 84 dot camel at yam-132-YW-E178-FTW> <201403110040 dot s2B0ebdR021145 at greed dot delorie dot com> <532CBED2 dot 7010800 at yahoo dot com> <201403220035 dot s2M0Zs18032433 at greed dot delorie dot com> <532CDDBE dot 6070901 at redhat dot com> <532D7430 dot 4040208 at yahoo dot com> <532FAA23 dot 2020301 at redhat dot com>
On 24/03/14 04:44, Jeff Law wrote:
On 03/22/14 05:29, Richard Hulme wrote:
On 22/03/14 01:47, Jeff Law wrote:
On 03/21/14 18:35, DJ Delorie wrote:
I've found that "removing unneeded moves through registers" is
something gcc does poorly in the post-reload optimizers. I've written
my own on some occasions (for rl78 too). Perhaps this is a good
starting point to look at?
much needless copying, which strengthens my suspicion that it's
something in the RL78 backend that needs 'tweaking'.
Of course it is; I've said that before, I think. The RL78 uses a
virtual model until reload, then converts each virtual instruction
into multiple real instructions, then optimizes the result. This is
going to be worse than if the real model had been used throughout
(like arm or x86), but in this case, the real model *can't* be used
throughout, because gcc can't understand it well enough to get through
regalloc and reload. The RL78 is just too "weird" to be modelled
I keep hoping that gcc's own post-reload optimizers would do a better
job, though. Combine should be able to combine, for example, the "mov
r8,ax; cmp r8,#4" types of insns together.
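The "mov r8,ax; cmp r8,#4" case can be illustrated with a toy post-reload peephole. This is a hypothetical sketch, not how combine or any GCC pass is written. It uses a made-up three-field instruction record instead of RTL and invented function names (`fold_copy_into_cmp`, `render`), and it assumes nothing is live at the end of the block. The sketch folds a register copy into the following compare only when the copied-to register is dead afterwards. A real pass would have to prove the same liveness condition, and it would also have to check whether the backend accepts the resulting compare.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical toy IR, not GCC RTL: "op dst, src" where src is a register
   name or an immediate such as "#4". Register names are illustrative only. */
typedef struct {
    char op[8];
    char dst[8];
    char src[8];
    int deleted;
} insn;

/* "mov" reads only its source; "cmp" reads both operands. */
static int reads(const insn *i, const char *reg) {
    if (strcmp(i->src, reg) == 0)
        return 1;
    return strcmp(i->op, "cmp") == 0 && strcmp(i->dst, reg) == 0;
}

/* Only "mov" writes a register in this toy IR. */
static int writes(const insn *i, const char *reg) {
    return strcmp(i->op, "mov") == 0 && strcmp(i->dst, reg) == 0;
}

/* reg is dead after pos if it is overwritten, or never read, before the
   end of the block. Assumption: nothing is live out of the block. */
static int dead_after(const insn *v, int n, int pos, const char *reg) {
    for (int k = pos + 1; k < n; k++) {
        if (v[k].deleted)
            continue;
        if (reads(&v[k], reg))
            return 0;
        if (writes(&v[k], reg))
            return 1;
    }
    return 1;
}

/* Rewrite "mov T,R ; cmp T,#imm" as "cmp R,#imm" when T is dead afterwards. */
void fold_copy_into_cmp(insn *v, int n) {
    for (int k = 0; k + 1 < n; k++) {
        insn *mov = &v[k], *cmp = &v[k + 1];
        if (mov->deleted || strcmp(mov->op, "mov") != 0 || mov->src[0] == '#')
            continue;                       /* need a register-to-register copy */
        if (strcmp(cmp->op, "cmp") != 0 || strcmp(cmp->dst, mov->dst) != 0
            || cmp->src[0] != '#')
            continue;                       /* need "cmp T,#imm" right after it */
        if (!dead_after(v, n, k + 1, mov->dst))
            continue;                       /* T is read later: keep the copy */
        strcpy(cmp->dst, mov->src);         /* compare the original register */
        mov->deleted = 1;                   /* the copy is now redundant */
    }
}

/* Write the surviving instructions, one per line, into buf. */
void render(const insn *v, int n, char *buf, size_t size) {
    size_t used = 0;
    buf[0] = '\0';
    for (int k = 0; k < n; k++) {
        if (v[k].deleted)
            continue;
        int w = snprintf(buf + used, size - used, "%s %s,%s\n",
                         v[k].op, v[k].dst, v[k].src);
        assert(w >= 0 && (size_t)w < size - used); /* output must fit */
        used += (size_t)w;
    }
}
```

If the block is `mov r8,ax; cmp r8,#4; mov r8,bc`, it becomes `cmp ax,#4; mov r8,bc`. If r8 is read later, the copy is left alone.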
I can't recall the details, but when you described the situation
to me, the virtual register file was the only way I could see to make the
RL78 work in the IRA+reload world.
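The following is a rough picture of why the virtual-register approach leaves so much copying behind after reload. It is a deliberately simplified, hypothetical sketch, not the RL78 port's devirtualizer. The three-operand virtual form, the `devirtualize` function, and the assembler spelling are illustrative assumptions. The sketch ignores operand constraints, byte versus word registers, and which physical registers are free. Each virtual instruction is routed through the accumulator pair AX.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical virtual insn: dst = src1 OP src2, all in virtual registers.
   Illustration only; not the RL78 port's real patterns or syntax. */
typedef struct {
    const char *op;   /* e.g. "addw", "subw" */
    const char *dst;
    const char *src1;
    const char *src2;
} vinsn;

/* Expand each virtual insn into an accumulator sequence through AX:
     movw ax, src1 ; OP ax, src2 ; movw dst, ax
   Returns the number of real instructions written into buf. */
int devirtualize(const vinsn *v, int n, char *buf, size_t size) {
    size_t used = 0;
    int count = 0;
    buf[0] = '\0';
    for (int k = 0; k < n; k++) {
        int w = snprintf(buf + used, size - used,
                         "movw ax, %s\n%s ax, %s\nmovw %s, ax\n",
                         v[k].src1, v[k].op, v[k].src2, v[k].dst);
        assert(w >= 0 && (size_t)w < size - used); /* output must fit */
        used += (size_t)w;
        count += 3;
    }
    return count;
}
```

Take two dependent virtual instructions (`addw r8, r10, r12` then `subw r14, r8, r16`). The expansion contains `movw r8, ax` immediately followed by `movw ax, r8`, which stores a value and then reloads it. A later pass has to recognise and remove that pair.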
What would be quite interesting to try would be to continue to use the
virtualized register set, but instead use the IRA+LRA path. Presumably
that wouldn't be terribly hard, and there's a reasonable chance that
it'll improve the code in a noticeable way.
Looking at how that's done by other backends, as far as I can tell, I
just need to add something like:
#define TARGET_LRA_P rl78_enable_lra
to rl78.c? At least in theory, even if other work is needed elsewhere
to make things run smoothly.
Unfortunately, that function never seems to be called.
How does TARGET_LRA_P get used, anyway? I can't find anything that
tries to use it, only places where it gets set. Is there some funky
preprocessor stuff going on that's stopping me grepping for it?
That should be enough to switch to the LRA path. It's a target hook.
Grep for "targetm.lra_p"
OK, I figured out what was wrong eventually. I'd added the lines above
*after* the definition of the targetm variable, so TARGET_INITIALIZER had
already been expanded with the default hook.
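A miniature of GCC's target-hook table shows two things: why nothing seems to "use" TARGET_LRA_P, and why the order of the definitions mattered. This is a hypothetical, heavily reduced sketch. The real `struct gcc_target` and `TARGET_INITIALIZER` come from GCC's target headers and have many more fields. Here `default_lra_p` is only a stand-in for the generic default hook, and `too_early` and `use_lra` are invented names. The macro is read only at the point where `TARGET_INITIALIZER` is expanded, so a table defined before the `#undef`/`#define` pair keeps the default.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical miniature of GCC's hook table (the real one is far larger). */
struct gcc_target {
    bool (*lra_p) (void);
};

/* Stand-in for the generic default: keep using reload. */
static bool default_lra_p (void) { return false; }

/* The backend's override, named as in the mail above. */
static bool rl78_enable_lra (void) { return true; }

/* Defaults, as the generic headers would provide them. */
#define TARGET_LRA_P default_lra_p
#define TARGET_INITIALIZER { TARGET_LRA_P }

/* Pitfall: a table defined before the override captures the default,
   because TARGET_INITIALIZER is expanded right here. */
struct gcc_target too_early = TARGET_INITIALIZER;

/* The override must come before the real targetm definition. */
#undef  TARGET_LRA_P
#define TARGET_LRA_P rl78_enable_lra

struct gcc_target targetm = TARGET_INITIALIZER;

/* The allocator side only calls through the table. */
bool use_lra (void) { return targetm.lra_p (); }
```

The allocator only ever calls the function pointer. That is why grepping for `targetm.lra_p` finds the caller, while grepping for `TARGET_LRA_P` finds only definitions.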
Activating LRA alone is certainly not the answer. Whilst I can see that
*some* of the "to me, to you" register passing has been eliminated, LRA
seems to have an intense dislike of indirect memory addressing with an
offset. So instead of something like:
    mov a, [sp+4]
it's now producing:
    movw ax, sp
    addw ax, #4
    movw hl, ax
    mov a, [hl]
which takes 7 bytes (compared to 4). Overall I've got a code-size increase
of about 31%.
I don't know why it's avoiding the indirect with offset addressing mode.
It *does* generate code using it but seemingly as a last resort.
Something else to track down, I guess.
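The transformation LRA is missing here can be written as a toy peephole that folds the four-instruction sequence back into a single `[sp+N]` load. This is a hypothetical sketch over assembler text, not a GCC pass, and the function name is invented. A real fix belongs in LRA or in the backend's address constraints. The sketch only shows what the fold is and what has to hold for it to be valid. After the original sequence, X still holds part of SP+N and ADDW has set the flags, so the caller must vouch that HL, X and the flags are dead. The 0..255 offset limit is an assumed range for the `[sp+byte]` form, not a checked fact about the RL78 encoding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical matcher over assembler-like lines, not GCC RTL.
   Pattern seen under LRA:  movw ax, sp ; addw ax, #N ; movw hl, ax ; mov a, [hl]
   Replacement:             mov a, [sp+N]
   Assumptions: scratch_dead means HL, X and the flags are not read afterwards;
   0..255 is an assumed offset range for the [sp+byte] form. */
bool fold_sp_offset_load(const char *insns[4], bool scratch_dead,
                         char *out, size_t size) {
    int n;
    char tail;
    if (!scratch_dead)
        return false;
    if (strcmp(insns[0], "movw ax, sp") != 0)
        return false;
    /* Exactly one conversion: %c must fail, so nothing trails the number. */
    if (sscanf(insns[1], "addw ax, #%d%c", &n, &tail) != 1)
        return false;
    if (strcmp(insns[2], "movw hl, ax") != 0 || strcmp(insns[3], "mov a, [hl]") != 0)
        return false;
    if (n < 0 || n > 255)
        return false;
    int w = snprintf(out, size, "mov a, [sp+%d]", n);
    assert(w >= 0 && (size_t)w < size); /* output must fit */
    return true;
}
```

The example from the mail, with an offset of 4 and dead scratch registers, folds to `mov a, [sp+4]`. The fold is refused if the scratch registers might be live or if the offset falls outside the assumed range.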