This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug middle-end/67438] [6 Regression] ~X op ~Y pattern relocation causes loop performance degradation on 32bit x86


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67438

--- Comment #5 from rguenther at suse dot de <rguenther at suse dot de> ---
On Thu, 3 Sep 2015, miyuki at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67438
> 
> Mikhail Maltsev <miyuki at gcc dot gnu.org> changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |miyuki at gcc dot gnu.org
> 
> --- Comment #4 from Mikhail Maltsev <miyuki at gcc dot gnu.org> ---
> I looked at the GIMPLE dumps. The only difference is this one. In the "good"
> revision, after forwprop1:
> 
>   <bb 3>:
>   _13 = *in_2;
>   a_14 = ~_13;
>   _17 = MEM[(char *)in_2 + 1B];
>   b_18 = ~_17;
>   in_20 = &MEM[(void *)in_2 + 3B];
>   _21 = MEM[(char *)in_2 + 2B];
>   c_22 = ~_21;
>   if (a_14 < b_18)
>     goto <bb 4>;
>   else
>     goto <bb 5>;
> 
> In the "bad" revision, the condition in this basic block is folded:
> 
>   <bb 3>:
>   _13 = *in_2;
>   a_14 = ~_13;
>   _17 = MEM[(char *)in_2 + 1B];
>   b_18 = ~_17;
>   in_20 = &MEM[(void *)in_2 + 3B];
>   _21 = MEM[(char *)in_2 + 2B];
>   c_22 = ~_21;
>   if (_13 > _17)
>     goto <bb 4>;
>   else
>     goto <bb 5>;
> 
> The next BBs are:
> 
>   <bb 4>: d_23 = MIN_EXPR <a_14, c_22>;
>   <bb 5>: d_24 = MIN_EXPR <b_18, c_22>;
>   <bb 6>: # d_4 = PHI <d_23(4), d_24(5)>
> 
> No later pass alters the condition of the "if" (the code is then
> if-converted and vectorized).
> 
> Another small difference: VRP adds assertions in bb 4 (a_12 lt_expr b_14, b_14
> gt_expr a_12) and bb 5 (a_12 ge_expr b_14, b_14 le_expr a_12). For some reason
> it does not do so in the "bad" revision.
> 
> As I understand it, the problem is this: if we do not fold the condition,
> _13 and _17 die once a_14 = ~_13 and b_18 = ~_17 have been computed. But if
> we do fold, they stay live (because the condition uses them), so register
> pressure increases.

Yes.  Note that because of :s implementation details "fixing"

/* Fold ~X op ~Y as Y op X.  */
(for cmp (simple_comparison)
 (simplify
  (cmp (bit_not @0) (bit_not @1))
  (cmp @1 @0)))

with :s on the bit_nots is not going to help (we still allow a
single-stmt result, since we are just replacing one stmt with
another).  So :s cannot be used to guard against a register pressure
increase, only against undoing CSE.

For the case in this bug, the user might as well have written the
testcase in the form we now transform it into, so what is desirable
is a pass that reduces register pressure by expressing values in a
different way.

For the case above, why is a_14 = ~_13 not sunk to the edge
3->4 and b_18 = ~_17 to the edge 3->5?  (Yes, this creates
additional BBs.)  That would reduce register pressure.  Maybe
this kind of scheduling can be considered when register pressure
is high.  Does -fsched-pressure -fschedule-insns help?

