59461 – missed zero-extension elimination in the combiner

Bug 59461 - missed zero-extension elimination in the combiner

Summary: missed zero-extension elimination in the combiner

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	rtl-optimization (show other bugs)
Version:	4.9.0

Importance:	P3 enhancement
Target Milestone:	7.0
Assignee:	Eric Botcazou

URL:
Keywords:	missed-optimization

Depends on:
Blocks:

Reported:	2013-12-10 23:12 UTC by Eric Botcazou
Modified:	2019-09-10 14:05 UTC (History)
CC List:	0 users

See Also:	91720
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:	2013-12-10 00:00:00

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Eric Botcazou 2013-12-10 23:12:57 UTC

This is a spin-off of PR rtl-optimization/58295.  The zero-extension is not (and has never been) eliminated on the SPARC at -O2:

ee_isdigit2:
        sethi   %hi(zeb_test_array), %g1
        or      %g1, %lo(zeb_test_array), %g1
        ldub    [%g1+%o0], %g1
        mov     0, %o0
        add     %g1, -48, %g1
        and     %g1, 0xff, %g1
        cmp     %g1, 9
        jmp     %o7+8
         movleu %icc, 1, %o0
        .size   ee_isdigit2, .-ee_isdigit2

The instruction "and %g1, 0xff, %g1" is redundant like on the ARM and the combiner should eliminate it.  The difference between the ARM and the SPARC is that the former explicitly zero-extends the load from memory while the latter does it only implicitly via LOAD_EXTEND_OP.  This shouldn't matter in the end, but does here because of some weakness of the nonzero_bits machinery.

Comment 1 Eric Botcazou 2013-12-10 23:14:14 UTC

I'll look into this at some point.

Comment 2 Richard Biener 2013-12-11 09:23:53 UTC

I wonder if x86_64 is also affected as it has implicitely zero/sign-extending
loads as well.

Comment 3 Eric Botcazou 2013-12-11 09:51:16 UTC

> I wonder if x86_64 is also affected as it has implicitely zero/sign-extending
> loads as well.

Not for this testcase at least, where the code is (and has always been) optimal:

ee_isdigit2:
.LFB0:
        .cfi_startproc
        movl    %edi, %edi
        movzbl  zeb_test_array(%rdi), %eax
        subl    $48, %eax
        cmpb    $9, %al
        setbe   %al
        ret
        .cfi_endproc

because the x86-64 can perform the addition in QImode directly.

Comment 4 Eric Botcazou 2016-11-11 22:39:05 UTC

Author: ebotcazou
Date: Fri Nov 11 22:38:33 2016
New Revision: 242326

URL: https://gcc.gnu.org/viewcvs?rev=242326&root=gcc&view=rev
Log:
	PR rtl-optimization/59461
	* doc/rtl.texi (paradoxical subregs): Add missing word.
	* combine.c (reg_nonzero_bits_for_combine): Do not discard results
	in modes with precision larger than that of last_set_mode.
	* rtlanal.c (nonzero_bits1) <SUBREG>: If WORD_REGISTER_OPERATIONS is
	set and LOAD_EXTEND_OP is appropriate, propagate results from inner
	REGs to paradoxical SUBREGs.
	(num_sign_bit_copies1) <SUBREG>: Likewise.  Check that the mode is not
	larger than a word before invoking LOAD_EXTEND_OP on it.

Added:
    trunk/gcc/testsuite/gcc.target/sparc/20161111-1.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/combine.c
    trunk/gcc/doc/rtl.texi
    trunk/gcc/rtlanal.c
    trunk/gcc/testsuite/ChangeLog

Comment 5 Eric Botcazou 2016-11-11 22:41:08 UTC

Fixed in GCC 7.

Comment 6 Matthew Fortune 2017-01-12 12:49:38 UTC

(In reply to Eric Botcazou from comment #5)
> Fixed in GCC 7.

There is a reasonable chance that this patch broke mips64 n64 but I do not have confirmation yet. See PR target/78660.

I can vaguely see how this patch may affect MIPS64 in terms of how 32-bit values are handled on a MIPS64 architecture with the sign bit replicated to the upper-32 bits. This presumably is somehow not accounted for in the nonzero_bits logic.

I'm yet to get my head around what the issue is but if anyone has a pointer based on the potential impact on MIPS64 as described above then I'd be grateful.

Comment 7 Eric Botcazou 2017-01-12 16:11:49 UTC

> There is a reasonable chance that this patch broke mips64 n64 but I do not
> have confirmation yet. See PR target/78660.

The quoted hunk only reverted a recent pessimization (r205550), the current code is the correct approach as explained in the comment.

> I can vaguely see how this patch may affect MIPS64 in terms of how 32-bit
> values are handled on a MIPS64 architecture with the sign bit replicated to
> the upper-32 bits. This presumably is somehow not accounted for in the
> nonzero_bits logic.

Probably yes, maybe in conjunction with WORD_REGISTER_OPERATIONS.

> I'm yet to get my head around what the issue is but if anyone has a pointer
> based on the potential impact on MIPS64 as described above then I'd be
> grateful.

Is WORD_REGISTER_OPERATIONS correct for MIPS64, i.e. do all instructions operate on the full 64-bit itneger registers?

Comment 8 Matthew Fortune 2017-01-12 16:49:37 UTC

(In reply to Eric Botcazou from comment #7)
> > I'm yet to get my head around what the issue is but if anyone has a pointer
> > based on the potential impact on MIPS64 as described above then I'd be
> > grateful.
> 
> Is WORD_REGISTER_OPERATIONS correct for MIPS64, i.e. do all instructions
> operate on the full 64-bit integer registers?

This is a notoriously hard topic to address. All instructions affect the full 64-bit register including those that do 32-bit arithmetic i.e. they will set/clear the upper bits to replicate bit-31. However, according to the architecture they logically operate on 32-bits and require that all inputs are canonical (sign bit replicated) otherwise the operation is invalid. So it would not matter whether the register was 33 bits or 1000 bits wide as long as all bits from 32 upwards replicate bit-31. The upper bits only become relevant once a 32-bit value is cast to a 64-bit value where sign extension is free and zero extension is an operation. truncation from 64-bit to 32-bit is a sign extension from bit-31 regardless of whether it is truncating to signed or unsigned.

In terms of instruction definition we therefore have instructions that operate on DImode and instructions that operate on SImode. The SImode instructions just don't need to worry about what is happening with the upper bits.

I don't know if any of that subtlety affects this yet.

Comment 9 Eric Botcazou 2017-01-13 08:14:10 UTC

> This is a notoriously hard topic to address. All instructions affect the
> full 64-bit register including those that do 32-bit arithmetic i.e. they
> will set/clear the upper bits to replicate bit-31.

So there are different 32-bit and 64-bit 'add' instructions for example?  That might nevertheless be OK for WORD_REGISTER_OPERATIONS.

> In terms of instruction definition we therefore have instructions that
> operate on DImode and instructions that operate on SImode. The SImode
> instructions just don't need to worry about what is happening with the upper
> bits.

Likewise for SPARC, which is WORD_REGISTER_OPERATIONS too, but doesn't care about the upper bits in 32-bit mode and has a single 'add' instruction (but maintains 32-bit _and_ 64-bit condition codes for every instruction).

> I don't know if any of that subtlety affects this yet.

Bugs in the nonzero_bits machinery are not to be ruled out either.  I'm willing to help of course but I don't have access to MIPS64 hardware.

Comment 10 Matthew Fortune 2017-01-13 09:40:05 UTC

(In reply to Eric Botcazou from comment #9)
> > This is a notoriously hard topic to address. All instructions affect the
> > full 64-bit register including those that do 32-bit arithmetic i.e. they
> > will set/clear the upper bits to replicate bit-31.
> 
> So there are different 32-bit and 64-bit 'add' instructions for example? 
> That might nevertheless be OK for WORD_REGISTER_OPERATIONS.

Yes. But logical operations are the same machine instructions regardless of whether it is being used for a 64-bit logical op or 32-bit logical op.

> > In terms of instruction definition we therefore have instructions that
> > operate on DImode and instructions that operate on SImode. The SImode
> > instructions just don't need to worry about what is happening with the upper
> > bits.
> 
> Likewise for SPARC, which is WORD_REGISTER_OPERATIONS too, but doesn't care
> about the upper bits in 32-bit mode and has a single 'add' instruction (but
> maintains 32-bit _and_ 64-bit condition codes for every instruction).

So are the upper bits for SPARC completely undefined? That would then be the major distinction between MIPS and SPARC. The upper bits are defined for MIPS

> > I don't know if any of that subtlety affects this yet.
> 
> Bugs in the nonzero_bits machinery are not to be ruled out either.  I'm
> willing to help of course but I don't have access to MIPS64 hardware.

I'm yet to even narrow down to an object file affected owing to the bug only showing in stage2 build so far.

Thanks for the offer of help, I'll add more when I find it.

Comment 11 Eric Botcazou 2017-01-13 10:21:34 UTC

> So are the upper bits for SPARC completely undefined? That would then be the
> major distinction between MIPS and SPARC. The upper bits are defined for MIPS

On the contrary, they are always defined since instructions always operate on the full 64-bit registers, but they don't matter for 32-bit operations, i.e. 32-bit operations would work the same if they were garbage.