Bug 37262 - Two branches of the same condition being emitted
Summary: Two branches of the same condition being emitted
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: rtl-optimization (show other bugs)
Version: 4.4.0
: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2008-08-28 02:02 UTC by Andrew Pinski
Modified: 2017-11-19 20:01 UTC (History)
4 users (show)

See Also:
Host:
Target: powerpc64-linux-gnu
Build:
Known to work: 6.0
Known to fail: 4.9.3
Last reconfirmed: 2015-11-22 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Andrew Pinski 2008-08-28 02:02:00 UTC
Take:
unsigned ReverseBits (unsigned index, unsigned NumBits)
{
  unsigned i, rev;

  for (i = rev = 0; i < NumBits; i++)
  {
    rev = (rev << 1) | (index & 1);
    index >>= 1;
  }
  return rev;
}
---- CUT ---
Currently we get:
        mtctr 9
        beq- 7,.L8
        beq- 7,.L8
        .p2align 3,,7

Which is obviously broken as we should have only one beq as they use the same CR and go to the same block and there is no way to get to the second one without going through the first.

4.1.1 -fno-ivopts produces even worse code:
        cmplwi 7,4,1
        blt- 7,.L8
        cmpwi 7,4,0
        beq- 7,.L8

But we know that this a logicial compare so r4 < 1 is the same as r4 ==0 so 4.3 produces better code but still needs slight improvement with respect of getting rid of the extra branch (though we have regression between 4.1 and 4.3 which I will file seperately as it is unrelated to this bug).
Comment 1 Andrew Pinski 2008-08-30 01:41:45 UTC
I have seen this in other cases even for the non doloop case, though I don't know if it is because of -O1 or because it is not removing it.
Testcase:
int _bfd_xcoff_canonicalize_dynamic_reloc (unsigned long long l_symndx)
{
  if (l_symndx < 3)
    {
      switch (l_symndx)
      {
        case 0:
        case 1:
         break;
        case 2:
         return _bfd_abort ();
    }
  }
}
--- CUT ---
Compile at -O1 on powerpc-linux and you will see the double branches:
        bne 0,.L7
        bne 0,.L8
Comment 2 Andrew Pinski 2010-03-02 18:29:46 UTC
Still happens as of today on the trunk.
Comment 3 Segher Boessenkool 2015-11-22 18:41:12 UTC
Your first testcase works on both 4.9 and 6.0 now.  The second
still fails on 4.9, but works on 6.  I don't know about GCC 5.
Comment 4 Martin Sebor 2016-01-30 04:11:37 UTC
I confirm that there are no duplicated branch instructions in object code emitted by GCC 6.0, although the branches that are there don't look quite optimal.

I built the gcc-5-branch on powerpc64-linux and the duplicate instructions are still there with the second test case compiled at -O2 (see below).

$ /build/gcc-5.x/gcc/xgcc -B /build/gcc-5.x/gcc -O1 -S -Wall -Wextra -Wpedantic -o/dev/stdout -m32 t.c
	.file	"t.c"
	.machine power4
t.c: In function ‘_bfd_xcoff_canonicalize_dynamic_reloc’:
t.c:15:1: warning: control reaches end of non-void function [-Wreturn-type]
 }
 ^
	.section	".text"
	.align 2
	.globl _bfd_xcoff_canonicalize_dynamic_reloc
	.type	_bfd_xcoff_canonicalize_dynamic_reloc, @function
_bfd_xcoff_canonicalize_dynamic_reloc:
	cmpwi 0,3,0
	bne 0,.L2          <<< branch
	bne 0,.L6          <<< duplicate branch
	cmplwi 7,4,2
	bgt 7,.L2
.L6:
	cmpwi 7,3,0
	bne 7,.L2
	cmplwi 7,4,2
	bne 7,.L2
	stwu 1,-16(1)
	mflr 0
	stw 0,20(1)
	bl _bfd_abort
	b .L1
.L2:
	blr
.L1:
	lwz 0,20(1)
	mtlr 0
	addi 1,1,16
	blr
	.size	_bfd_xcoff_canonicalize_dynamic_reloc,.-_bfd_xcoff_canonicalize_dynamic_reloc
	.ident	"GCC: (GNU) 5.3.1 20160130"
	.section	.note.GNU-stack,"",@progbits
Comment 5 Martin Sebor 2016-01-30 04:19:56 UTC
To make clear what I meant by "not optimal" in comment #4, and focusing on powepc64le output with -O2 for the test case in comment #1, trunk (6.0) emits the code below.  The first branch (to .L2) is superfluous given the second (to .L9).  

_bfd_xcoff_canonicalize_dynamic_reloc:
.LCF0:
0:	addis 2,12,.TOC.-.LCF0@ha
	addi 2,2,.TOC.-.LCF0@l
	cmpldi 7,3,2
	bgt 7,.L2   <<< superfluous branch
	beq 7,.L9
.L2:
	blr
	.p2align 4,,15
.L9:
	mflr 0
	std 0,16(1)
	stdu 1,-96(1)
	bl _bfd_abort
	nop
	addi 1,1,96
	ld 0,16(1)
	mtlr 0
	blr
Comment 6 Segher Boessenkool 2016-01-30 15:28:45 UTC
(In reply to Martin Sebor from comment #4)
> I built the gcc-5-branch on powerpc64-linux and the duplicate instructions
> are still there with the second test case compiled at -O2 (see below).
> 
> $ /build/gcc-5.x/gcc/xgcc -B /build/gcc-5.x/gcc -O1 -S -Wall -Wextra

That's -O1.  Is it actually still there at -O2?

For the other case (both bgt and beq), at expand time there is a switch
which could just be a single conditional (only case 2 is not the same
as default, not empty in this case).  This doesn't seem to be powerpc-
specific; could you open a new PR for this?  Thanks.
Comment 7 Martin Sebor 2016-01-30 15:45:52 UTC
Yes, sorry, I meant to say "still there at -O1" because that's the optimization level mentioned in comment #1.  (I had to guess at the level based on that.  I also accidentally used -m32 when the target is powerpc64, probably because of all the other old 32-bit powerpc bugs I've been looking at.)

The -O2 code for -m32 doesn't have the duplicate branch, though it does nicely show the other unnecessary branches:

$ /build/gcc-5.x/gcc/xgcc -B /build/gcc-5.x/gcc -O2 -S -Wall -Wextra -Wpedantic -m32 -o/dev/stdout t.c
...
_bfd_xcoff_canonicalize_dynamic_reloc:
	cmpwi 7,3,0
	bnelr 7
	cmplwi 7,4,2
	bgtlr 7
	bnelr 7
	b _bfd_abort

...and for -m64:

.L._bfd_xcoff_canonicalize_dynamic_reloc:
	cmpldi 7,3,2
	bgt 7,.L2
	beq 7,.L7
.L2:
	blr
	.p2align 4,,15
.L7:
	mflr 0
	std 0,16(1)
	stdu 1,-112(1)
	bl _bfd_abort
	nop
	addi 1,1,112
	ld 0,16(1)
	mtlr 0
	blr
Comment 8 Segher Boessenkool 2017-11-19 15:37:20 UTC
This is fixed on trunk.  I'll bisect it to see if it is worth backporting.
Comment 9 Segher Boessenkool 2017-11-19 20:01:46 UTC
So not exactly surprising the second testcase was fixed by r251690.
This should probably not be backported.  The first testcase is fixed
everywhere.  Closing the bug now.