[PATCH, ARM] stop changing signedness in PROMOTE_MODE

Jim Wilson jim.wilson@linaro.org
Tue Jul 7 18:25:00 GMT 2015


On Thu, Jul 2, 2015 at 2:07 AM, Richard Earnshaw
<Richard.Earnshaw@foss.arm.com> wrote:
> Not quite, ARM state still has more flexible addressing modes for
> unsigned byte loads than for signed byte loads.  It's even worse with
> thumb1 where some signed loads have no single-register addressing mode
> (ie you have to copy zero into another register to use as an index
> before doing the load).

I wasn't aware of the load addressing problem.  I hadn't considered
it, and will have to look at it.  Load is just one instruction,
though.  For most other instructions, a zero-extend results in less
efficient code, because it then forces a sign-extend before a signed
operation.  Parameters and locals are also handled differently, which
requires conversions when copying between them and results in more
inefficient code.  And changing TARGET_PROMOTE_FUNCTION_MODE is an ABI
change, and hence would be unwise, so changing PROMOTE_MODE is the
safer option.

Consider this testcase:
extern signed short gs;

short
sub (void)
{
  signed short s = gs;
  int i;

  for (i = 0; i < 10; i++)
    {
      s += 1;
      if (s > 10) break;
    }

  return s;
}

The inner loop ends up as:
.L3:
    adds r3, r3, #1
    mov r0, r1
    uxth r3, r3
    sxth r2, r3
    cmp r2, #10
    bgt .L8
    cmp r2, r1
    bne .L3
    bx lr

We need the sign-extension for the compare.  We need the
zero-extension for the loop carried dependency.  We have two
extensions in every loop iteration, plus some extra register usage and
register movement.  We get better code for this example if we aren't
forcing signed shorts to be zero-extended via PROMOTE_MODE.
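
To make the cost concrete, here is a rough C model of the two register
conventions.  It is only a sketch, not compiler output: the helper
names (uxth, sxth, loop_zero_extended, loop_sign_extended,
check_same_result) are made up for illustration, and the second loop
is my assumption of what a sign-extended convention would need, not
what any particular gcc emits.

```c
#include <assert.h>
#include <stdint.h>

/* Model of uxth: keep the low 16 bits and clear bits 16..31. */
uint32_t uxth(uint32_t r)
{
    return r & 0xFFFFu;
}

/* Model of sxth: copy bit 15 into bits 16..31 (written portably). */
int32_t sxth(uint32_t r)
{
    int32_t low = (int32_t)(r & 0xFFFFu);
    return (low ^ 0x8000) - 0x8000;
}

/* The short lives zero-extended in a register, as in the loop above:
   one uxth for the loop-carried value, one sxth for the compare. */
int32_t loop_zero_extended(int16_t start, unsigned *extends)
{
    uint32_t r3 = uxth((uint32_t)start);
    *extends = 0;
    for (int i = 0; i < 10; i++) {
        r3 = uxth(r3 + 1);              /* adds + uxth */
        int32_t r2 = sxth(r3);          /* sxth for the signed compare */
        *extends += 2;
        if (r2 > 10)
            break;
    }
    return sxth(r3);
}

/* Hypothetical alternative: the short lives sign-extended, so the
   value kept across iterations can be compared directly. */
int32_t loop_sign_extended(int16_t start, unsigned *extends)
{
    int32_t r3 = start;
    *extends = 0;
    for (int i = 0; i < 10; i++) {
        r3 = sxth((uint32_t)r3 + 1);    /* adds + sxth */
        *extends += 1;
        if (r3 > 10)
            break;
    }
    return r3;
}

/* Both conventions compute the same value; the zero-extended one
   just performs twice as many extensions to get there. */
void check_same_result(int16_t start)
{
    unsigned a, b;
    assert(loop_zero_extended(start, &a) == loop_sign_extended(start, &b));
    assert(a == 2 * b);
}
```

Calling check_same_result on a few starting values (positive,
negative, and 32767 to hit the wraparound) shows the results agree
while the extension count doubles in the zero-extended model.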

The lack of a reg+immediate address mode for ldrs[bh] in thumb1 does
look like a problem though.  But this means the difference between
generating
    movs r2, #0
    ldrsh r3, [r3, r2]
with my patch, or
    ldrh r3, [r3]
    lsls r2, r3, #16
    asrs r2, r2, #16
without my patch.  It isn't clear which sequence is better.  The
sign-extends in the second sequence can sometimes be optimized away,
and sometimes they can't.  Similarly, in the first sequence, loading
zero into a reg can sometimes be optimized away, and sometimes it
can't.  There is also no guarantee that you get the first sequence
with the patch or the second sequence without it.  There is a
splitter for ldrsh, so you can sometimes get the second sequence even
with the patch.  Similarly, it might be possible to get the first
sequence without the patch in some cases, though I don't have an
example at the moment.
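
For reference, the two thumb1 sequences compute the same value; they
differ only in instruction count and register use.  The following C
model is a sketch of that equivalence.  The function names are
hypothetical, and the model says nothing about which sequence gcc
actually picks in a given case.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Model of: movs r2, #0 ; ldrsh r3, [r3, r2]
   A sign-extending load that needs a second register for the index. */
int32_t load_ldrsh(const int16_t *base)
{
    uint32_t index = 0;                                   /* movs r2, #0 */
    return *(const int16_t *)((const char *)base + index); /* ldrsh */
}

/* Model of: ldrh r3, [r3] ; lsls r2, r3, #16 ; asrs r2, r2, #16
   A zero-extending load followed by a shift pair. */
int32_t load_ldrh_shifts(const uint16_t *base)
{
    uint32_t r3 = *base;        /* ldrh: zero-extending load */
    uint32_t r2 = r3 << 16;     /* lsls: move bit 15 up to bit 31 */
    uint32_t hi = r2 >> 16;     /* logical shift back down */
    /* asrs would fill the top half with copies of bit 31; subtracting
       0x10000 when bit 31 is set gives the same value portably. */
    return (r2 & 0x80000000u) ? (int32_t)hi - 0x10000 : (int32_t)hi;
}

/* Check that both models agree for a given 16-bit pattern. */
void check_loads_agree(uint16_t bits)
{
    int16_t as_signed;
    memcpy(&as_signed, &bits, sizeof bits);
    assert(load_ldrsh(&as_signed) == load_ldrh_shifts(&bits));
}
```

So the choice between them is purely a question of which extra work
(the zero load or the two shifts) the optimizers manage to remove.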

Jim


