[Bug tree-optimization/99142] New: [11 Regression] __builtin_clz match.pd transformation too greedy

hp at gcc dot gnu.org gcc-bugzilla@gcc.gnu.org
Wed Feb 17 23:11:42 GMT 2021


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99142

            Bug ID: 99142
           Summary: [11 Regression] __builtin_clz match.pd transformation
                    too greedy
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: hp at gcc dot gnu.org
  Target Milestone: ---

Created attachment 50215
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50215&action=edit
test-case  gcc.dg/tree-ssa/prXXXXX.c

See the attached test-case, de-macroized from gcc.target/cris/pr93372-31.c,
which started regressing with d2eb616a0f7b
"match.pd: Add clz(X) == 0 -> (int)X < 0 etc. simplifications [PR94802]".

In the test-case, the result *is* used more than once (twice more besides the
transformed compare) and the match.pd matching expression *does* have the s
modifier: (op (clz:s @0) INTEGER_CST@1), but since the transformation doesn't
result in "an expression with more than one operator" (cf.
doc/match-and-simplify.texi), it's still performed.

The result is that the *input* is kept alive *after* the clz instruction.  This
generally causes additional register pressure and throws away any re-use of
incidentally computed condition codes.  Though the original observation was for
cris-elf, where the effect is more dramatic, the effect is visible even for
x86_64 and of the same kind: losing the re-use of non-zero condition codes from
the bsrl instruction, i.e. the transformation causes an additional instruction:

--- prXXXXX.s.64good    2021-02-17 02:26:57.646183108 +0100
+++ prXXXXX.s.64bad     2021-02-17 02:27:33.124979464 +0100
@@ -9,7 +9,8 @@ f:
        bsrl    %edi, %eax
        xorl    $31, %eax
        movl    %eax, (%rsi)
-       je      .L1
+       testl   %edi, %edi
+       js      .L1
        movl    %eax, (%rdx)
 .L1:
        ret
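
For readers without the attachment, the shape of the problem can be sketched
in C. This is a hypothetical reconstruction inferred only from the x86_64 diff
above, not the attached test-case: the names f, f_transformed and
check_equivalence are made up, and it assumes a 32-bit unsigned int and GCC's
modulo conversion to int. The first function is the source form, the second
spells out what the compare becomes after the match.pd rewrite.

```c
#include <assert.h>

/* Hypothetical source form: the clz result has two uses besides the compare. */
void f(unsigned int x, int *a, int *b)
{
  int c = __builtin_clz(x);   /* undefined for x == 0 */
  *a = c;                     /* first extra use */
  if (c != 0)                 /* the compare matched by match.pd */
    *b = c;                   /* second extra use */
}

/* The same function with the compare rewritten as a sign test on the input.
   Note that x must now stay live past the clz. */
void f_transformed(unsigned int x, int *a, int *b)
{
  int c = __builtin_clz(x);
  *a = c;
  if ((int) x >= 0)           /* equivalent to c != 0 for nonzero x */
    *b = c;
}

/* Both forms store the same values for a few nonzero inputs. */
void check_equivalence(void)
{
  unsigned int samples[] = { 1u, 2u, 0x7fffffffu, 0x80000000u, 0xffffffffu };
  for (unsigned int i = 0; i < sizeof samples / sizeof samples[0]; i++)
    {
      int a1 = -1, b1 = -1, a2 = -1, b2 = -1;
      f(samples[i], &a1, &b1);
      f_transformed(samples[i], &a2, &b2);
      assert(a1 == a2 && b1 == b2);
    }
}
```

The two forms are semantically identical; the difference is only in which
value the branch depends on, and thus what has to be kept in a register.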

My conclusion is that the matching condition should be gated by
single_use(clz result) *everywhere*.

Alternatively, the "s" modifier could be adjusted somehow, but I'm not sure
how, besides the obvious option of making it mean *exactly* single_use, and
that suggestion has been shot down before.

Maybe there should be an additional *reverse* version of the "simplification",
replacing "y = clz(x); if ((int)x < 0) ...stuff using y but not x" ->
"y = clz(x); if (y == 0) ...stuff using y but not x"!
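
A minimal sketch of what such a reverse rewrite would do at the source level,
relying on the identity that clz(x) == 0 exactly when (int)x < 0. The function
names and the "y * 2" use are hypothetical, and a 32-bit unsigned int is
assumed; no such pattern is claimed to exist in match.pd.

```c
#include <assert.h>

/* Before: the sign test keeps x live after the clz. */
int g_sign_test(unsigned int x, int fallback)
{
  int y = __builtin_clz(x);   /* undefined for x == 0 */
  if ((int) x < 0)
    return fallback;
  return y * 2;               /* stuff using y but not x */
}

/* After the hypothetical reverse rewrite: only y is needed past the clz. */
int g_clz_test(unsigned int x, int fallback)
{
  int y = __builtin_clz(x);
  if (y == 0)
    return fallback;
  return y * 2;
}

/* The two forms agree on a few nonzero inputs. */
void check_reverse(void)
{
  unsigned int samples[] = { 1u, 3u, 0x40000000u, 0x80000000u, 0xffffffffu };
  for (unsigned int i = 0; i < sizeof samples / sizeof samples[0]; i++)
    assert(g_sign_test(samples[i], -1) == g_clz_test(samples[i], -1));
}
```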

