[Bug tree-optimization/62173] [5.0 regression] 64bit Arch can't ivopt while 32bit Arch can

Tue Jan 27 09:11:00 GMT 2015

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62173

--- Comment #28 from rguenther at suse dot de <rguenther at suse dot de> ---
On Tue, 27 Jan 2015, amker at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62173
> 
> --- Comment #26 from amker at gcc dot gnu.org ---
> (In reply to Richard Biener from comment #17)
> > I really wonder why IVOPTs calls convert_affine_scev with
> > !use_overflow_semantics.
> I don't understand the code below in convert_affine_scev:
> 
>   enforce_overflow_semantics = (use_overflow_semantics
>                 && nowrap_type_p (type));
> According to comments, 
> 
>    "USE_OVERFLOW_SEMANTICS is true if this function should assume that
>    the rules for overflow of the given language apply (e.g., that signed
>    arithmetics in C does not overflow) -- i.e., to use them to avoid
>    unnecessary tests, but also to enforce that the result follows them."
> 
> Seems to me we need to enforce the overflow check for the result if we 
> take advantage of USE_OVERFLOW_SEMANTICS to prove there is no overflow 
> for src.  So shouldn't we set enforce_overflow_semantics according to 
> "nowrap_type_p (TREE_TYPE (*base))" rather than the result type?

Yes, I also wondered about this...
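
To make the concern concrete, here is a minimal sketch in plain C (not
GCC code; all function names are made up for illustration).  Rewriting a
narrow IV {base, +, step} as the wide IV {(wide) base, +, (wide) step} is
only valid when the narrow IV does not wrap within the iterations of
interest.  The sketch uses unsigned 32-bit arithmetic so that wrapping is
well defined; with a signed type the same overflow would be undefined
behaviour, which is what USE_OVERFLOW_SEMANTICS lets the analysis exploit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: evaluate the IV in the narrow type first
   (wrapping modulo 2^32), then widen the result to 64 bits.  */
static uint64_t narrow_then_widen(uint32_t base, uint32_t step, uint32_t i)
{
    return (uint64_t)(uint32_t)(base + step * i);
}

/* Hypothetical helper: widen base and step first, then evaluate the IV
   in the wide type.  This is the form the converted IV would have.  */
static uint64_t widen_then_step(uint32_t base, uint32_t step, uint32_t i)
{
    return (uint64_t)base + (uint64_t)step * (uint64_t)i;
}

/* The conversion is valid for the first NITER iterations only if both
   forms agree on every one of them, i.e. the narrow IV never wraps.  */
static bool conversion_is_valid(uint32_t base, uint32_t step, uint32_t niter)
{
    for (uint32_t i = 0; i < niter; i++)
        if (narrow_then_widen(base, step, i) != widen_then_step(base, step, i))
            return false;
    return true;
}
```

With base 0, step 1 and 100 iterations the two forms agree; with base
UINT32_MAX - 1 they diverge at the third iteration, where the narrow IV
wraps to 0 and the wide one continues to 2^32.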

> Also it is noted at the end of function, that we can't use the fact 
> "signed variables do not overflow" when we are checking for result.
>
> But the function is widely used in scev, so there shouldn't be anything
> that wrong.

Heh - I wouldn't count on that.

> > Note that for the original testcase 'i' may be negative or zero and thus 'd'
> > may be zero.  We do a bad analysis here because IVOPTs follows complete
> > peeling immediately...  but at least we have range information that looks
> > useful:
> 
> The case also holds for O2, at this level gcc won't completely unroll 
> the first loop.
> 
> An irrelevant question.  Isn't cunroll too aggressive in GCC?  For cases 
> like this one, the code size is bloated and may hurt Icache performance, 
> while only saving several increment instructions.

Yeah - it was Honza enabling this aggressive peeling.  It makes sense
for a limited amount of code growth (like peeling two iterations) but
indeed using the same limit as for unrolling (where we know intermediate
exits are not taken) doesn't make too much sense...  I wonder if
the size estimates are correctly handling that fact...
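
As a rough source-level illustration of the difference (a hand-written
sketch with made-up function names, not what cunroll actually emits):
peeled copies each keep their exit test because the iteration count is
not known, while complete unrolling with a known count needs no
intermediate exits at all.

```c
#include <stddef.h>

/* The original loop.  */
static int sum_rolled(const int *a, size_t n)
{
    int s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Two iterations peeled off the front.  n is unknown, so every peeled
   copy carries its own exit test and the loop remains afterwards.  */
static int sum_peeled2(const int *a, size_t n)
{
    int s = 0;
    size_t i = 0;
    if (i >= n)
        return s;
    s += a[i++];
    if (i >= n)
        return s;
    s += a[i++];
    for (; i < n; i++)
        s += a[i];
    return s;
}

/* Complete unrolling when the count is known to be exactly 4: the
   intermediate exits are known not to be taken and disappear.  */
static int sum_unrolled4(const int *a)
{
    return a[0] + a[1] + a[2] + a[3];
}
```

The peeled version grows by a body plus a branch per copy, whereas the
unrolled version grows by the body only, so a shared size limit treats
two rather different costs as equal.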