[Bug middle-end/67438] [6 Regression] ~X op ~Y pattern relocation causes loop performance degradation on 32bit x86

rguenth at gcc dot gnu.org gcc-bugzilla@gcc.gnu.org
Wed Nov 25 08:37:00 GMT 2015


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67438

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2015-11-25
     Ever confirmed|0                           |1

--- Comment #12 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Yuri Rumyantsev from comment #11)
> In fact, the problem is quite different, although it is caused by the
> unprofitable pattern match ~X CMP ~Y -> Y CMP X. In general this
> pattern is helpful if we can delete the NOT operations, e.g.
>   x1 = ~x;
>   y1 = ~y;
>   if (x1 <cmp> y1) ...
> where there are no other uses of x1 and y1, i.e. x1 and y1 each have a
> single use. But if this is not the case we will increase register
> pressure, since we cannot use the same register for x,x1 and y,y1.
> 
> Richard proposed to use the same simplification for min/max operations, but
> in the original test-case nested min/max operations (min(x,min(y,z))) or
> multi-operand min/max (min(x,y,z)) are not recognized by gcc (note that icc
> does such a transformation)

Can you file an enhancement bug for this?  Best with a testcase.  AFAICS
a full solution will have pieces in phi-opt and reassoc at least.

> and so this won't help since we have the same register
> pressure issue:
>     c = ~r; 
>     m = ~g;
>     y = ~b;
>     k = min(c, m, y);
>     *out++ = c - k;
>     *out++ = m - k;
>     *out++ = y - k;
>     *out++ = k;
> and we can see that the value of 'c' is used both in the min computation and
> in the resulting store, so if we use the r <cmp> g comparison we will extend
> the live ranges of the r, g, b variables and additional registers will be
> required for them (until the comparison).
> Note also that there is another issue with path-splitting (aka tail
> duplication), which duplicates the loop back edge and in fact moves the tail
> block into the hammock. This transformation does not look useful (at least at
> the given stage of design) but this is another topic for discussion.
> 
> I'd like to propose introducing a new predicate for pattern matching which
> tells us how many uses the left-hand side of ~x has.

There are examples in match.pd that use single_use () in conditions; doing
the same for this pattern would fix the issue.
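
To make the use-count argument concrete, here is a minimal sketch in plain C
(not GCC internals).  The function names are made up for illustration, and
the pixel routine is only a reconstruction of the snippet quoted above,
assuming 8-bit channels.  The first two functions show the case where the
rewrite removes both NOTs; the third shows the case where c, m and y are
still needed afterwards, so a rewritten comparison would have to keep r, g
and b live in addition to them.

```c
#include <stdint.h>

/* ~x < ~y written literally: x1 and y1 each have exactly one use. */
int cmp_not(int x, int y)
{
    int x1 = ~x;
    int y1 = ~y;
    return x1 < y1;
}

/* The same predicate after the ~X CMP ~Y -> Y CMP X rewrite.
   Both NOTs are gone, so the rewrite is a clear win here. */
int cmp_swapped(int x, int y)
{
    return y < x;
}

/* Hypothetical helper: minimum of three 8-bit values. */
static uint8_t min3(uint8_t a, uint8_t b, uint8_t c)
{
    uint8_t m = a < b ? a : b;
    return m < c ? m : c;
}

/* Reconstruction of the quoted loop body for one pixel (assumed types).
   c, m and y feed both the min computation and the stores, so they are
   not single-use: comparing r, g, b instead would not remove any NOT. */
void cmyk_pixel(uint8_t r, uint8_t g, uint8_t b, uint8_t out[4])
{
    uint8_t c = (uint8_t)~r;
    uint8_t m = (uint8_t)~g;
    uint8_t y = (uint8_t)~b;
    uint8_t k = min3(c, m, y);

    out[0] = (uint8_t)(c - k);
    out[1] = (uint8_t)(m - k);
    out[2] = (uint8_t)(y - k);
    out[3] = k;
}
```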

Note that constraining patterns to "single-use" operands generally misses
the case where applying (the same) pattern(s) at multiple locations may
effectively make operands "single-use".  Currently match.pd patterns
are applied one-by-one (without fully cleaning up dead stmts) in
tree-ssa-forwprop.c, which will miss opportunities because of this.  Thus
there is no "global" analysis done to determine whether an operand becomes
dead after applying (multiple) pattern(s) (which is the point of single-use
checks).
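
A small illustration of that missed case, again as a plain C sketch with
made-up function names rather than anything taken from the testcase: x1
has two uses, so a single-use check rejects the rewrite at each comparison
when looked at in isolation, yet rewriting both comparisons leaves every
NOT dead.

```c
/* Before: x1 is used by both comparisons, so it is not single-use. */
int count_before(int x, int y, int z)
{
    int x1 = ~x;
    int y1 = ~y;
    int z1 = ~z;
    return (x1 < y1) + (x1 < z1);
}

/* After applying ~X CMP ~Y -> Y CMP X at both comparisons:
   x1, y1 and z1 have no remaining uses and can be deleted. */
int count_after(int x, int y, int z)
{
    return (y < x) + (z < x);
}
```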

