This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH][expr.c] PR middle-end/71700: zero-extend sub-word value when widening constructor element



On 05/07/16 13:57, Kyrill Tkachov wrote:
Hi Bernd,

On 04/07/16 19:02, Bernd Schmidt wrote:
On 07/01/2016 11:18 AM, Kyrill Tkachov wrote:
In this arm wrong-code PR the struct assignment goes wrong when
expanding constructor elements to a register destination
when the constructor elements are signed bitfields less than a word wide.
In this testcase we're intialising a struct with a 16-bit signed
bitfield to -1 followed by a 1-bit bitfield to 0.
Before it starts storing the elements it zeroes out the register.
 The code in store_constructor extends the first field to word size
because it appears at the beginning of a word.
It sign-extends the -1 to word size. However, when it later tries to
store the 0 to bitposition 16 it has some logic
to avoid redundant zeroing since the destination was originally cleared,
so it doesn't emit the zero store.
But the previous sign-extended -1 took up the whole word, so the
position of the second bitfield contains a set bit.

This patch fixes the problem by zeroing out the bits of the widened
field that did not appear in the original value,
so that we can safely avoid storing the second zero in the constructor.
[...]

Bootstrapped and tested on arm, aarch64, x86_64 though the codepath is
gated on WORD_REGISTER_OPERATIONS I didn't
expect any effect on aarch64 and x86_64 anyway.

So - that code path starts with this comment:

            /* If this initializes a field that is smaller than a
               word, at the start of a word, try to widen it to a full
               word.  This special case allows us to output C++ member
               function initializations in a form that the optimizers
               can understand.  */

Doesn't your patch completely defeat the purpose of this? Would you get better/identical code by just deleting this block? It seems unfortunate to have two different code generation approaches like this.

It would be interesting to know the effects of your patch, and the effects of removing this code entirely, on generated code. Try to find the motivating C++ member function example perhaps? Maybe another possibility is to ensure this doesn't happen if the value would be interpreted as signed.


Doing some archaeology shows this code was added back in 1998 with no testcase (r22567) so I'd have to do more digging.

My interpretation of that comment was that for WORD_REGISTER_OPERATIONS targets it's more beneficial to have word-size
operations, so the expansion code tries to emit as many of the operations in word_mode as it safely can.
With my patch it still emits a word_mode operation, it's just that the immediate that is moved in word_mode has it's
top bits zeroed out instead of sign-extended.

I'll build SPEC2006 on arm (a WORD_REGISTER_OPERATIONS target) and inspect the assembly to see if I see any interesting
effects, but I suspect there won't be much change.


Ok, I found a bit of time to investigate.
On SPEC2006 I didn't see a difference in codegen with or without that whole block
(or with and without this patch).
The differences in codegen can be observed on the testcase in the patch.
Currently (without this patch) we generate the wrong-code:
main:
        strd    r3, lr, [sp, #-8]!
        movw    r3, #:lower16:d
        mov     r2, #-1
        movt    r3, #:upper16:d
        str     r2, [r3]
        bl      abort


With this patch that performs the zero extension on the loaded immediate but still
widens the operation to word_mode we generate (the correct):
main:
    strd    r3, lr, [sp, #-8]!
    movw    r3, #:lower16:d
    movw    r2, #65535
    movt    r3, #:upper16:d
    mov    r0, #0
    str    r2, [r3]
    cbnz    r0, .L5
    pop    {r3, pc}
.L5:
    bl    abort

If I remove that whole block of code we end up emitting a short operation with a bitfield
move operation, so we generate:
main:
    strd    r3, lr, [sp, #-8]!
    mov    r2, #0
    mov    r1, #-1
    movw    r3, #:lower16:d
    bfi    r2, r1, #0, #16
    movt    r3, #:upper16:d
    mov    r0, #0
    str    r2, [r3]
    cbnz    r0, .L5
    pop    {r3, pc}
.L5:
    bl    abort

which is correct, but suboptimal compared to the codegen with this patch.
Based on that, I think that code block is a useful optimisation, we just need
to take care with immediates.

What do you think?

Thanks,
Kyrill


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]