This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug rtl-optimization/56175] New: Issue with combine phase on x86.
- From: "ysrumyan at gmail dot com" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Fri, 01 Feb 2013 15:51:38 +0000
- Subject: [Bug rtl-optimization/56175] New: Issue with combine phase on x86.
- Auto-submitted: auto-generated
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56175
Bug #: 56175
Summary: Issue with combine phase on x86.
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
AssignedTo: unassigned@gcc.gnu.org
ReportedBy: ysrumyan@gmail.com
While analyzing the performance of an important benchmark on x86 Atom in 32-bit
mode, we found that the code produced for the attached testcase is not optimal:
the inner loop contains 18 instructions instead of 12.
The problem is that 'combine' does not perform the desired substitution for the
following statement:
t = (u8)((x & 1) ^ ((u8)y & 1));
It is not able to convert it to the more optimal form:
t = (u8)((x ^ (u8)y) & 1);
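The two forms above are equivalent because AND with a constant mask distributes over XOR. The following is a minimal sketch, not part of the original report, that checks this exhaustively for every 8-bit x and 16-bit y. The function names xor_of_masks, mask_of_xor and count_mismatches are hypothetical and chosen only for illustration; u8 is assumed to be unsigned char, as in the testcase.

```c
typedef unsigned char u8;   /* assumption: u8 is unsigned char */

/* Form as written in the source: mask each operand, then XOR. */
u8 xor_of_masks(u8 x, unsigned short y)
{
    return (u8)((x & 1) ^ ((u8)y & 1));
}

/* Form combine is expected to produce: XOR first, then a single mask. */
u8 mask_of_xor(u8 x, unsigned short y)
{
    return (u8)((x ^ (u8)y) & 1);
}

/* Compare both forms over all inputs; returns the number of mismatches. */
unsigned long count_mismatches(void)
{
    unsigned long mismatches = 0;
    for (unsigned x = 0; x <= 0xFF; x++)
        for (unsigned y = 0; y <= 0xFFFF; y++)
            if (xor_of_masks((u8)x, (unsigned short)y) !=
                mask_of_xor((u8)x, (unsigned short)y))
                mismatches++;
    return mismatches;
}
```

Calling count_mismatches() from a small driver should return 0, which confirms that the rewrite saves one AND without changing the result.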
The issue can be illustrated with the following testcase:
int foo(unsigned char x, unsigned short y)
{
  unsigned char z;
  if (x == 0 || y == 0)
    return 0;
  x >>= 1;
  y >>= 1;
  z = (unsigned char)((x & 1) ^ ((unsigned char)y & 1));
  if (z == 1)
    return 1;
  return 0;
}
For this case combine performs the needed transformation and we get optimal
assembly:
...
xorl %edx, %eax
andl $1, %eax
ret
Here combine tries the following substitution:
Trying 22, 20 -> 23:
Failed to match this instruction:
(parallel [
(set (reg:QI 83 [ D.1758 ])
(and:QI (xor:QI (reg:QI 79 [ x ])
(subreg:QI (reg:HI 81 [ y ]) 0))
(const_int 1 [0x1])))
(clobber (reg:CC 17 flags))
])
Failed to match this instruction:
(set (reg:QI 83 [ D.1758 ])
(and:QI (xor:QI (reg:QI 79 [ x ])
(subreg:QI (reg:HI 81 [ y ]) 0))
(const_int 1 [0x1])))
Successfully matched this instruction:
(set (reg:QI 82 [ D.1760 ])
(xor:QI (reg:QI 79 [ x ])
(subreg:QI (reg:HI 81 [ y ]) 0)))
Successfully matched this instruction:
(set (reg:QI 83 [ D.1758 ])
(and:QI (reg:QI 82 [ D.1760 ])
(const_int 1 [0x1])))
where
(insn 20 19 21 4 (parallel [
(set (reg:QI 80 [ D.1759 ])
(and:QI (reg:QI 79 [ x ])
(const_int 1 [0x1])))
(clobber (reg:CC 17 flags))
]) t.c:8 405 {*andqi_1}
(expr_list:REG_DEAD (reg:QI 79 [ x ])
(expr_list:REG_UNUSED (reg:CC 17 flags)
(nil))))
(insn 22 21 23 4 (parallel [
(set (reg:HI 82 [ D.1760 ])
(and:HI (reg:HI 81 [ y ])
(const_int 1 [0x1])))
(clobber (reg:CC 17 flags))
]) t.c:8 404 {*andhi_1}
(expr_list:REG_DEAD (reg:HI 81 [ y ])
(expr_list:REG_UNUSED (reg:CC 17 flags)
(nil))))
(insn 23 22 24 4 (parallel [
(set (reg:QI 83 [ D.1758 ])
(xor:QI (reg:QI 80 [ D.1759 ])
(subreg:QI (reg:HI 82 [ D.1760 ]) 0)))
(clobber (reg:CC 17 flags))
]) t.c:8 426 {*xorqi_1}
(expr_list:REG_DEAD (reg:HI 82 [ D.1760 ])
(expr_list:REG_DEAD (reg:QI 80 [ D.1759 ])
(expr_list:REG_UNUSED (reg:CC 17 flags)
(nil)))))
But for the more complicated attached test, combine tries the same
substitution with the operands in reverse order, and it fails:
Trying 14, 13 -> 15:
Failed to match this instruction:
(parallel [
(set (reg:QI 63 [ D.1770 ])
(xor:QI (and:QI (reg/v:QI 72 [ x ])
(const_int 1 [0x1]))
(and:QI (subreg:QI (reg/v:HI 74 [ y ]) 0)
(const_int 1 [0x1]))))
(clobber (reg:CC 17 flags))
])
Failed to match this instruction:
(set (reg:QI 63 [ D.1770 ])
(xor:QI (and:QI (reg/v:QI 72 [ x ])
(const_int 1 [0x1]))
(and:QI (subreg:QI (reg/v:HI 74 [ y ]) 0)
(const_int 1 [0x1]))))
Successfully matched this instruction:
(set (reg:QI 77 [ D.1771 ])
(and:QI (subreg:QI (reg/v:HI 74 [ y ]) 0)
(const_int 1 [0x1])))
Failed to match this instruction:
(set (reg:QI 63 [ D.1770 ])
(xor:QI (and:QI (reg/v:QI 72 [ x ])
(const_int 1 [0x1]))
(reg:QI 77 [ D.1771 ])))
where
(insn 13 12 14 3 (parallel [
(set (reg:HI 76 [ D.1772 ])
(and:HI (reg/v:HI 74 [ y ])
(const_int 1 [0x1])))
(clobber (reg:CC 17 flags))
]) t1.c:9 404 {*andhi_1}
(expr_list:REG_UNUSED (reg:CC 17 flags)
(nil)))
(insn 14 13 15 3 (parallel [
(set (reg:QI 77 [ D.1771 ])
(and:QI (reg/v:QI 72 [ x ])
(const_int 1 [0x1])))
(clobber (reg:CC 17 flags))
]) t1.c:9 405 {*andqi_1}
(expr_list:REG_UNUSED (reg:CC 17 flags)
(nil)))
(insn 15 14 16 3 (parallel [
(set (reg:QI 63 [ D.1770 ])
(xor:QI (reg:QI 77 [ D.1771 ])
(subreg:QI (reg:HI 76 [ D.1772 ]) 0)))
(clobber (reg:CC 17 flags))
]) t1.c:9 426 {*xorqi_1}
(expr_list:REG_DEAD (reg:QI 77 [ D.1771 ])
(expr_list:REG_DEAD (reg:HI 76 [ D.1772 ])
(expr_list:REG_UNUSED (reg:CC 17 flags)
(nil)))))
It seems that if combine tried 13, 14 -> 15 instead, it would succeed.
Note also that the order of the instructions is different after expand.