This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [RFC][ARM][PR67714] signed char is zero-extended instead of sign-extended


Hi all,

On 13/01/16 01:40, Jim Wilson wrote:
> On Tue, Jan 12, 2016 at 5:10 PM, Kugan
> <kugan.vivekanandarajah@linaro.org> wrote:
>> Yes, making PROMOTE_MODE to work the same way as in
>> promote_function_mode in arm will fix this. Can you please point me to
>> the test cases that are regressing so that I can also start looking at them.
>
> The info is in here
>      https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65932
> See the comments on gcc.target/arm/wmul-[123].c which no longer
> generate smulbb etc instructions, which are 16x16=32 expanding
> multiplies which are faster on some older parts that have them.  They
> are present in armv5e and higher architecture versions.
>
> Kyrylo looked at this in November, but the situation looks even worse
> now, as some of the redundant sign extends are gone even before the
> first rtl pass.  That may make it harder to get the smulbb
> instructions back.

I did some more investigation on this yesterday.
The situation is actually not so bad.
The sign-extends are being removed from the arms of the multiply
(turning it into an mla rather than an smlabb) in the RTL cse pass.
Something is going wrong in the costing logic in CSE: it generates
a simpler RTX (a multiply-add versus a multiply-add-extend), but one that
is more expensive. I think I've found the bug there and I hope to start
a separate thread on it soon.
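
To illustrate why dropping the extends is valid but not necessarily
cheaper, here is a minimal C sketch. It is only a model, not GCC code:
the names sext16, model_smlabb, model_mla and check_extend_redundancy
are hypothetical, and registers are treated as plain 32-bit patterns.
When both operands already hold sign-extended 16-bit values, the
multiply-add-extend and the plain multiply-add produce the same
register contents, so the choice between them is purely a cost question.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low 16 bits of a 32-bit register value. */
static uint32_t sext16(uint32_t x)
{
    return ((x & 0xffffu) ^ 0x8000u) - 0x8000u;
}

/* Model of smlabb: acc + sext(low half of a) * sext(low half of b). */
uint32_t model_smlabb(uint32_t a, uint32_t b, uint32_t acc)
{
    return acc + sext16(a) * sext16(b);
}

/* Model of mla: acc + a * b, keeping the low 32 bits. */
uint32_t model_mla(uint32_t a, uint32_t b, uint32_t acc)
{
    return acc + a * b;
}

/* Hypothetical self-check, not part of any GCC testcase. */
void check_extend_redundancy(void)
{
    uint32_t a = sext16(0xffffu);   /* -1, already sign-extended */
    uint32_t b = 2u;

    /* With sign-extended inputs the extends are redundant. */
    assert(model_smlabb(a, b, 10u) == model_mla(a, b, 10u));

    /* With a zero-extended input they are not. */
    assert(model_smlabb(0xffffu, b, 10u) != model_mla(0xffffu, b, 10u));
}
```

The second assertion shows the case where the extends do matter: if the
value had been loaded zero-extended, only the smlabb form would still
give the signed result.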

With the CSE fix and fixes for a couple of arm backend rtx costing issues I can get
wmul-1.c and wmul-2.c to pass even with the change to PROMOTE_MODE.
For wmul-3.c I get the sequence:
    ldrsh   r1, [ip, #2]!
    ldrsh   r4, [r0, #2]!
    cmp     r5, ip
    mls     r2, r1, r1, r2
    mls     lr, r1, r4, lr

instead of the previous:
    ldrh    ip, [lr, #2]!
    ldrh    r1, [r0, #2]!
    cmp     r6, lr
    smulbb  r5, ip, ip
    smulbb  r1, ip, r1
    sub     r2, r2, r5
    sub     r4, r4, r1

That is, the new sequence is two instructions shorter (no subs) but uses
the more expensive mls instruction rather than smulbb. Is the new
sequence preferable?
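
For reference, the two loop bodies can be modelled in C to confirm that
they compute the same thing. This is a sketch under stated assumptions,
not the wmul-3.c source: step_new, step_old, sext16 and check_sequences
are hypothetical names, registers are modelled as 32-bit patterns, and
the cmp and the address writeback are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low 16 bits of a 32-bit register value. */
static uint32_t sext16(uint32_t x)
{
    return ((x & 0xffffu) ^ 0x8000u) - 0x8000u;
}

/* New sequence: sign-extending loads followed by mls (Rd = Ra - Rn * Rm). */
void step_new(uint16_t l, uint16_t r, uint32_t *sqr, uint32_t *dot)
{
    uint32_t r1 = sext16(l);    /* ldrsh r1, [ip, #2]!  */
    uint32_t r4 = sext16(r);    /* ldrsh r4, [r0, #2]!  */
    *sqr -= r1 * r1;            /* mls   r2, r1, r1, r2 */
    *dot -= r1 * r4;            /* mls   lr, r1, r4, lr */
}

/* Previous sequence: zero-extending loads, smulbb, then a separate sub. */
void step_old(uint16_t l, uint16_t r, uint32_t *sqr, uint32_t *dot)
{
    uint32_t ip = l;                            /* ldrh   ip, [lr, #2]! */
    uint32_t r1 = r;                            /* ldrh   r1, [r0, #2]! */
    uint32_t r5 = sext16(ip) * sext16(ip);      /* smulbb r5, ip, ip    */
    r1 = sext16(ip) * sext16(r1);               /* smulbb r1, ip, r1    */
    *sqr -= r5;                                 /* sub    r2, r2, r5    */
    *dot -= r1;                                 /* sub    r4, r4, r1    */
}

/* Hypothetical self-check on one pair of halfword inputs (-3 and 4). */
void check_sequences(void)
{
    uint32_t s_new = 100u, d_new = 50u;
    uint32_t s_old = 100u, d_old = 50u;

    step_new(0xfffdu, 4u, &s_new, &d_new);
    step_old(0xfffdu, 4u, &s_old, &d_old);

    assert(s_new == s_old);
    assert(d_new == d_old);
}
```

The model only shows that the results agree; which sequence is faster
depends on the relative cost of mls and smulbb on the target core.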

I hope to post the backend cost bug fixes soon and an RFC for the cse issue.

Thanks,
Kyrill

> Jim


