Bug 65651 - Redundant cmp with zero instruction in loop for x86 target.
Summary: Redundant cmp with zero instruction in loop for x86 target.
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 5.0
: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2015-04-01 12:33 UTC by Yuri Rumyantsev
Modified: 2022-01-10 00:08 UTC (History)
2 users (show)

See Also:
Host:
Target: x86_64-*-*, i?86-*-*
Build:
Known to work:
Known to fail:
Last reconfirmed: 2022-01-10 00:00:00


Attachments
test-case to reproduce (293 bytes, text/plain)
2015-04-01 12:35 UTC, Yuri Rumyantsev
Details
test-case to reproduce (291 bytes, text/plain)
2015-04-01 12:36 UTC, Yuri Rumyantsev
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Yuri Rumyantsev 2015-04-01 12:33:54 UTC
Compile attached bad.c with "-O2" option only we can see that redundant cmp with zero instruction is generated:
	subl	%r9d, %eax
	cmpl	$0, %eax
	je	.L10
 but for slightly changed good.c there is no such redundancy:
	subl	%r9d, %eax
	je	.L10

The problem phase is combine.
For good case it does combining:
Trying 37 -> 38:
Successfully matched this instruction:
(parallel [
        (set (reg:CCZ 17 flags)
            (compare:CCZ (minus:SI (reg:SI 121 [ D.2002 ])
                    (reg/v:SI 115 [ med ]))
                (const_int 0 [0])))
        (set (reg/v:SI 101 [ n ])
            (minus:SI (reg:SI 121 [ D.2002 ])
                (reg/v:SI 115 [ med ])))
    ])
allowing combination of insns 37 and 38
original costs 4 + 4 = 8
replacement cost 0
but for bad case it is not performed:
Trying 37 -> 38:
Failed to match this instruction:
(set (reg:CC 17 flags)
    (compare:CC (minus:SI (reg:SI 120 [ D.2006 ])
            (reg/v:SI 114 [ med ]))
        (const_int 0 [0])))

Note that this test-case extracted from one of hot loop in bzip2 (mainQSort3).
Comment 1 Yuri Rumyantsev 2015-04-01 12:35:26 UTC
Created attachment 35202 [details]
test-case to reproduce

Need to compile with -O2 flag only.
Comment 2 Yuri Rumyantsev 2015-04-01 12:36:04 UTC
Created attachment 35203 [details]
test-case to reproduce
Comment 3 Jakub Jelinek 2015-04-01 13:10:05 UTC
Well, there is a significant difference between the two testcases, one uses the result of the comparison just in == 0 test, thus CCZmode is appropriate, the other uses it in two comparisons, one == 0 test and one < 0 test.
For combine to match *sub<mode>_2 insn, it has to match
ix86_match_ccmode (insn, CCGOCmode)
where CCGOCmode stands for:
   Add CCGOC to indicate comparisons against zero that allows
   unspecified garbage in the Carry and Overflow flag. This
   mode is used to simulate comparisons of (a-b) and (a+b)
   against zero using sub/cmp/add operations.
But the jle instruction tests ZF || SF <> OF and thus it isn't appropriate.
So the question is if the CCGOC test isn't too restrictive, say if CCGCmode would be sufficient (but then we'd still need to arrange for the CCGCmode to be used, rather than CCmode), or if the optimization you are looking for is simply not possible.
Comment 4 Yuri Rumyantsev 2015-04-01 13:45:43 UTC
Jakub,

Thanks for your comments.

We will try to fix this issue ourselves.

Best regards.
Yuri.

P.S. Note that icc does not produce such redundant cmp with zero.

2015-04-01 16:10 GMT+03:00 jakub at gcc dot gnu.org <gcc-bugzilla@gcc.gnu.org>:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65651
>
> Jakub Jelinek <jakub at gcc dot gnu.org> changed:
>
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |jakub at gcc dot gnu.org,
>                    |                            |uros at gcc dot gnu.org
>
> --- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
> Well, there is a significant difference between the two testcases, one uses the
> result of the comparison just in == 0 test, thus CCZmode is appropriate, the
> other uses it in two comparisons, one == 0 test and one < 0 test.
> For combine to match *sub<mode>_2 insn, it has to match
> ix86_match_ccmode (insn, CCGOCmode)
> where CCGOCmode stands for:
>    Add CCGOC to indicate comparisons against zero that allows
>    unspecified garbage in the Carry and Overflow flag. This
>    mode is used to simulate comparisons of (a-b) and (a+b)
>    against zero using sub/cmp/add operations.
> But the jle instruction tests ZF || SF <> OF and thus it isn't appropriate.
> So the question is if the CCGOC test isn't too restrictive, say if CCGCmode
> would be sufficient (but then we'd still need to arrange for the CCGCmode to be
> used, rather than CCmode), or if the optimization you are looking for is simply
> not possible.
>
> --
> You are receiving this mail because:
> You reported the bug.
Comment 5 Andrew Pinski 2022-01-10 00:08:13 UTC
Confirmed on the trunk still:
        subl    %r9d, %eax
        testl   %eax, %eax
        jne     .L4