This is the mail archive of the
mailing list for the GCC project.
Re: [RL78] Questions about code-generation
- From: Richard Hulme <peper03 at yahoo dot com>
- To: gcc at gcc dot gnu dot org
- Date: Tue, 11 Mar 2014 00:25:01 +0100
- Subject: Re: [RL78] Questions about code-generation
- References: <1394465260 dot 82407 dot YahooMailNeo at web125603 dot mail dot ne1 dot yahoo dot com> <201403102137 dot s2ALbDMw016198 at greed dot delorie dot com>
On 10/03/14 22:37, DJ Delorie wrote:
>> I've managed to build GCC myself so that I could experiment a bit
>> but as this is my first foray into compiler internals, I'm
>> struggling to work out how things fit together and what affects
> The key thing to know about the RL78 backend is that it has two
> "targets" it uses. For the first part of the compilation, up until
> after reload, the model uses 16 virtual registers (R8 through R15) and
> a virtual machine to give gcc an orthogonal model that it can generate
> code for. After reload, there's a "devirtualization" pass in the RL78
> backend that maps the virtual model to the real model (R0 through R7),
> which means copying values in and out of the real registers according
> to which addressing modes are needed. Then GCC continues optimizing,
> which gets rid of most of the unneeded instructions.
> The problem you're probably running into is that deciding which real
> registers to use for each virtual one is a very tricky task, and the
> post-reload optimizers aren't expecting the code to look like what it
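To make the two-stage scheme a little more concrete, here is a toy model in C. It is not the RL78 backend's code: the instruction struct, the register names (`a`, `r8` to `r12`) and the helpers `devirt_add` and `drop_reloads` are all hypothetical. It lowers a virtual three-operand add into a copy-in, operate, copy-out sequence through the accumulator, then runs a tiny cleanup pass that deletes a reload of a value A already holds, roughly the kind of redundancy the later optimizers are expected to remove.

```c
#include <string.h>

/* Toy instruction "op dst, src". Names are illustrative strings only;
   this is a hypothetical model, not GCC's RTL. */
typedef struct { const char *op, *dst, *src; } insn;

/* Lower a virtual 3-operand add (vdst = va + vb) into an accumulator
   sequence: copy va into A, add vb to A, copy A out to vdst.
   Appends to out[] starting at index n and returns the new count. */
static int devirt_add(insn *out, int n, const char *vdst,
                      const char *va, const char *vb) {
    out[n++] = (insn){"mov", "a", va};
    out[n++] = (insn){"add", "a", vb};
    out[n++] = (insn){"mov", vdst, "a"};
    return n;
}

/* Cleanup pass: delete "mov a, X" when it directly follows "mov X, a",
   because A still holds X's value. Compacts in place, returns new count. */
static int drop_reloads(insn *code, int n) {
    int w = 0;
    for (int i = 0; i < n; i++) {
        const insn *prev = (w > 0) ? &code[w - 1] : NULL;
        if (prev != NULL &&
            strcmp(code[i].op, "mov") == 0 && strcmp(code[i].dst, "a") == 0 &&
            strcmp(prev->op, "mov") == 0 && strcmp(prev->src, "a") == 0 &&
            strcmp(prev->dst, code[i].src) == 0)
            continue;                 /* redundant reload: skip it */
        code[w++] = code[i];
    }
    return w;
}
```

Lowering `r8 = r9 + r10` followed by `r11 = r8 + r12` gives six instructions; the cleanup pass removes the `mov a, r8` reload and leaves five. The genuinely hard part described above, choosing which real registers back each virtual one, is deliberately left out: this model always routes through A.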
>> What causes that code to be generated when using a variable instead
>> of a fixed memory address?
> The use of "volatile" disables many of GCC's optimizations. I
> consider this a bug in GCC, but at the moment it needs to be "fixed"
> in the backends on a case-by-case basis.
Ah, that certainly explains a lot. How exactly would the fixing be
done? Is there an example I could look at for one of the other processors?
It's certainly unfortunate, since an awful lot of bit-twiddling goes on
with the memory-mapped hardware registers (which obviously generally
need to be declared volatile).
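For reference, this is the kind of bit-twiddling on a volatile register being discussed. It is a minimal sketch: the register `fake_p0` and the helpers `reg_set_bits`, `reg_clear_bits` and `reg_test_bit` are hypothetical, and a plain volatile variable stands in for a fixed hardware address so the code runs anywhere. Because every access goes through a volatile-qualified pointer, the compiler must perform each read and write as written. Ideally each helper would still compile to a single bit instruction (the RL78 has `set1`/`clr1`), and the extra code comes from optimizations that are lost around such accesses.

```c
#include <stdint.h>

/* Hypothetical stand-in for a memory-mapped port register. On real
   hardware this would be a fixed address from the device header,
   accessed through a volatile-qualified pointer. */
static volatile uint8_t fake_p0;

/* Read-modify-write helpers. The volatile qualifier forces the compiler
   to perform every read and write; it may not cache the value or merge
   and drop accesses. */
static void reg_set_bits(volatile uint8_t *reg, uint8_t mask) {
    *reg |= mask;
}

static void reg_clear_bits(volatile uint8_t *reg, uint8_t mask) {
    *reg &= (uint8_t)~mask;
}

/* Returns 1 if the given bit (0..7) is set, else 0. */
static int reg_test_bit(const volatile uint8_t *reg, unsigned bit) {
    return (int)((*reg >> bit) & 1u);
}
```

This is also why simply dropping `volatile`, as tried below, is not safe: without it the compiler may keep a register's value in a CPU register, or merge and remove accesses that the hardware depends on.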
Just to get a feel for the potential gains, I've removed the volatile
keyword from all the declarations and rebuilt the project. That change
alone reduces the code size by 3.7%. I wouldn't want to risk running
that code but the gain is certainly significant.
I calculated a week or two ago that we could make a code saving of
around 8% by using near or relative branches and near calls instead of
always generating far calls. I changed rl78-real.md to use near
addressing and got about 5%. That's probably about right. I tried to
generate relative branches too, but I'm guessing that the 'length'
attribute needs to be set for all instructions to get that working properly.
Obviously near/far addressing would need to be controlled by an external
switch to allow for processors with more than 64KB of code flash.
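The dependency between branch form and instruction lengths can be sketched as follows. The function `choose_branch_form` and its enum are hypothetical, and the ranges (an 8-bit signed relative displacement, a 16-bit absolute "near" address) are assumptions to check against the RL78 instruction-set manual. The `allow_near` flag models the external switch suggested above for parts with more than 64KB of code flash.

```c
#include <stdbool.h>

/* Hypothetical classification of how a branch could be encoded.
   The ranges used below are assumptions, not taken from the backend. */
typedef enum { BR_SHORT_REL, BR_NEAR_ABS, BR_FAR_ABS } branch_form;

/* disp:       signed distance the assembler would encode for a relative
               branch (depends on the sizes of all intervening insns).
   target:     absolute code address of the destination.
   allow_near: models a command-line switch that is only safe when all
               code lives in the low 64KB. */
static branch_form choose_branch_form(long disp, unsigned long target,
                                      bool allow_near) {
    if (disp >= -128 && disp <= 127)
        return BR_SHORT_REL;          /* fits an 8-bit relative displacement */
    if (allow_near && target <= 0xFFFFul)
        return BR_NEAR_ABS;           /* fits a 16-bit absolute address */
    return BR_FAR_ABS;                /* fall back to the full-width form */
}
```

Because `disp` depends on the size of every instruction between the branch and its target, the short form cannot be chosen reliably until every instruction has an accurate `length`; that is the information GCC's branch shortening relies on.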
A few small gains can be had elsewhere (using 'clrb a' in
zero_extendqihi2_real, possibly optimizing addsi3_internal_real to avoid
addw ax,#0, etc.). These don't save much space in our project (perhaps
30-40 bytes), but the savings will obviously vary from project to project.
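The `addw ax,#0` case comes down to when one 16-bit half of a 32-bit constant add can be skipped. This is a hypothetical C model, not the backend's actual pattern: `add32_by_halves` performs the add as two 16-bit steps with a carry, as a 16-bit target must, and counts the steps that do real work. The message doesn't say which half the redundant `addw` belongs to, so the model covers both.

```c
#include <stdint.h>

/* Model of a 32-bit add done as two 16-bit adds with carry.
   Returns x + k (mod 2^32) and stores in *steps how many of the two
   16-bit add steps are not no-ops for this constant k. */
static uint32_t add32_by_halves(uint32_t x, uint32_t k, int *steps) {
    uint16_t xl = (uint16_t)x, xh = (uint16_t)(x >> 16);
    uint16_t kl = (uint16_t)k, kh = (uint16_t)(k >> 16);

    uint32_t low = (uint32_t)xl + kl;             /* low half, may carry out */
    unsigned carry = (unsigned)(low >> 16);
    uint16_t high = (uint16_t)(xh + kh + carry);  /* wraps mod 2^16 */

    *steps = 0;
    if (kl != 0)
        (*steps)++;   /* adding 0 to the low half changes nothing and cannot carry */
    if (kh != 0 || kl != 0)
        (*steps)++;   /* high half still has to absorb a carry when kl != 0 */

    return ((uint32_t)high << 16) | (uint16_t)low;
}
```

Adding `0x00010000` needs only the high-half step, since a zero low half neither changes the low word nor produces a carry. A zero high half is different: when the low half is nonzero, the high word may still have to absorb a carry.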