Compile attached bad.c with "-O2" option only we can see that redundant cmp with zero instruction is generated: subl %r9d, %eax cmpl $0, %eax je .L10 but for slightly changed good.c there is no such redundancy: subl %r9d, %eax je .L10 The problem phase is combine. For good case it does combining: Trying 37 -> 38: Successfully matched this instruction: (parallel [ (set (reg:CCZ 17 flags) (compare:CCZ (minus:SI (reg:SI 121 [ D.2002 ]) (reg/v:SI 115 [ med ])) (const_int 0 [0]))) (set (reg/v:SI 101 [ n ]) (minus:SI (reg:SI 121 [ D.2002 ]) (reg/v:SI 115 [ med ]))) ]) allowing combination of insns 37 and 38 original costs 4 + 4 = 8 replacement cost 0 but for bad case it is not performed: Trying 37 -> 38: Failed to match this instruction: (set (reg:CC 17 flags) (compare:CC (minus:SI (reg:SI 120 [ D.2006 ]) (reg/v:SI 114 [ med ])) (const_int 0 [0]))) Note that this test-case extracted from one of hot loop in bzip2 (mainQSort3).
Created attachment 35202 [details] test-case to reproduce Need to compile with -O2 flag only.
Created attachment 35203 [details] test-case to reproduce
Well, there is a significant difference between the two testcases, one uses the result of the comparison just in == 0 test, thus CCZmode is appropriate, the other uses it in two comparisons, one == 0 test and one < 0 test. For combine to match *sub<mode>_2 insn, it has to match ix86_match_ccmode (insn, CCGOCmode) where CCGOCmode stands for: Add CCGOC to indicate comparisons against zero that allows unspecified garbage in the Carry and Overflow flag. This mode is used to simulate comparisons of (a-b) and (a+b) against zero using sub/cmp/add operations. But the jle instruction tests ZF || SF <> OF and thus it isn't appropriate. So the question is if the CCGOC test isn't too restrictive, say if CCGCmode would be sufficient (but then we'd still need to arrange for the CCGCmode to be used, rather than CCmode), or if the optimization you are looking for is simply not possible.
Jakub, Thanks for your comments. We will try to fix this issue ourselves. Best regards. Yuri. P.S. Note that icc does not produce such redundant cmp with zero. 2015-04-01 16:10 GMT+03:00 jakub at gcc dot gnu.org <gcc-bugzilla@gcc.gnu.org>: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65651 > > Jakub Jelinek <jakub at gcc dot gnu.org> changed: > > What |Removed |Added > ---------------------------------------------------------------------------- > CC| |jakub at gcc dot gnu.org, > | |uros at gcc dot gnu.org > > --- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> --- > Well, there is a significant difference between the two testcases, one uses the > result of the comparison just in == 0 test, thus CCZmode is appropriate, the > other uses it in two comparisons, one == 0 test and one < 0 test. > For combine to match *sub<mode>_2 insn, it has to match > ix86_match_ccmode (insn, CCGOCmode) > where CCGOCmode stands for: > Add CCGOC to indicate comparisons against zero that allows > unspecified garbage in the Carry and Overflow flag. This > mode is used to simulate comparisons of (a-b) and (a+b) > against zero using sub/cmp/add operations. > But the jle instruction tests ZF || SF <> OF and thus it isn't appropriate. > So the question is if the CCGOC test isn't too restrictive, say if CCGCmode > would be sufficient (but then we'd still need to arrange for the CCGCmode to be > used, rather than CCmode), or if the optimization you are looking for is simply > not possible. > > -- > You are receiving this mail because: > You reported the bug.
Confirmed on the trunk still: subl %r9d, %eax testl %eax, %eax jne .L4