This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH GCC]Allow cfgcleanup to remove forwarder loop preheaders and latches


On Fri, Feb 28, 2014 at 2:09 AM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Fri, Feb 28, 2014 at 10:09 AM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Fri, Feb 28, 2014 at 1:52 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> On Mon, Feb 24, 2014 at 9:12 PM, bin.cheng <bin.cheng@arm.com> wrote:
>>>> Hi,
>>>> This patch is to fix regression reported in PR60280 by removing forward loop
>>>> headers/latches in cfg cleanup if possible.  Several tests are broken by
>>>> this change since cfg cleanup is shared by all optimizers.  Some tests has
>>>> already been fixed by recent patches, I went through and fixed the others.
>>>> One case needs to be clarified is "gcc.dg/tree-prof/update-loopch.c".  When
>>>> GCC removing a basic block, it checks profile information by calling
>>>> check_bb_profile after redirecting incoming edges of the bb.  This certainly
>>>> results in warnings about invalid profile information and causes the case to
>>>> fail.  I will send a patch to skip checking profile information for a
>>>> removing basic block in stage 1 if it sounds reasonable.  For now I just
>>>> twisted the case itself.
>>>>
>>>> Bootstrap and tested on x86_64 and arm_a15.
>>>>
>>>> Is it OK?
>>>>
>>>>
>>>> 2014-02-25  Bin Cheng  <bin.cheng@arm.com>
>>>>
>>>>         PR target/60280
>>>>         * tree-cfgcleanup.c (tree_forwarder_block_p): Protect loop
>>>>         preheaders and latches only if requested.  Fix latch if it
>>>>         is removed.
>>>>         * tree-ssa-dom.c (tree_ssa_dominator_optimize): Set
>>>>         LOOPS_HAVE_PREHEADERS.
>>>>
>>>
>>> This change:
>>>
>>>         if (dest->loop_father->header == dest)
>>> -  return false;
>>> +  {
>>> +    if (loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS)
>>> +        && bb->loop_father->header != dest)
>>> +      return false;
>>> +
>>> +    if (loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES)
>>> +        && bb->loop_father->header == dest)
>>> +      return false;
>>> +  }
>>>      }
>>>
>>> miscompiled 435.gromacs in SPEC CPU 2006 on x32 with
>>>
>>> -O3 -funroll-loops -ffast-math -fwhole-program -flto=jobserver
>>> -fuse-linker-plugin
>>>
>>> This patch changes loops without LOOPS_HAVE_PREHEADERS
>>> nor LOOPS_HAVE_SIMPLE_LATCHES from returning false to returning
>>> true.  I don't have a small testcase.  But this patch:
>>>
>>> diff --git a/gcc/tree-cfgcleanup.c b/gcc/tree-cfgcleanup.c
>>> index b5c384b..2ba673c 100644
>>> --- a/gcc/tree-cfgcleanup.c
>>> +++ b/gcc/tree-cfgcleanup.c
>>> @@ -323,6 +323,10 @@ tree_forwarder_block_p (basic_block bb, bool phi_wanted)
>>>      if (loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES)
>>>          && bb->loop_father->header == dest)
>>>        return false;
>>> +
>>> +    if (!loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS)
>>> +        && !loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES))
>>> +      return false;
>>>    }
>>>      }
>>>
>>> fixes the regression.  Does it make any senses?
>>
>> I think the preheader test isn't fully correct (bb may be in an inner loop
>> for example).  So a more conservative variant would be
>>
>> Index: gcc/tree-cfgcleanup.c
>> ===================================================================
>> --- gcc/tree-cfgcleanup.c       (revision 208169)
>> +++ gcc/tree-cfgcleanup.c       (working copy)
>> @@ -316,13 +316,13 @@ tree_forwarder_block_p (basic_block bb,
>>        /* Protect loop preheaders and latches if requested.  */
>>        if (dest->loop_father->header == dest)
>>         {
>> -         if (loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS)
>> -             && bb->loop_father->header != dest)
>> -           return false;
>> -
>> -         if (loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES)
>> -             && bb->loop_father->header == dest)
>> -           return false;
>> +         if (bb->loop_father == dest->loop_father)
>> +           return !loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES);
>> +         else if (bb->loop_father == loop_outer (dest->loop_father))
>> +           return !loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS);
>> +         /* Always preserve other edges into loop headers that are
>> +            not simple latches or preheaders.  */
>> +         return false;
>>         }
>>      }
>>
>> that makes sure we can properly update loop information.  It's also
>> a more conservative change at this point which should still successfully
>> remove simple latches and preheaders created by loop discovery.
>
> I think the patch makes sense anyway and thus I'll install it once it
> passed bootstrap / regtesting.
>
> Another fix that may make sense is to restrict it to
> !loops_state_satisfies_p (LOOPS_NEED_FIXUP), though cfgcleanup
> itself can end up setting that ... which we eventually should fix if it
> still happens.  That is, check if
>
> Index: gcc/tree-cfgcleanup.c
> ===================================================================
> --- gcc/tree-cfgcleanup.c       (revision 208169)
> +++ gcc/tree-cfgcleanup.c       (working copy)
>
> @@ -729,8 +729,9 @@ cleanup_tree_cfg_noloop (void)
>
>    timevar_pop (TV_TREE_CLEANUP_CFG);
>
> -  if (changed && current_loops)
> -    loops_state_set (LOOPS_NEED_FIXUP);
> +  if (changed && current_loops
> +      && !loops_state_satisfies_p (LOOPS_NEED_FIXUP))
> +    verify_loop_structure ();
>
>    return changed;
>  }
>
> trips anywhere (and apply fixes).  That's of course not appropriate at
> this stage.
>
>> Does it fix 435.gromacs?

I tried revision 208222 and it doesn't fix 435.gromacs.

> I can't see the failure on our testers (x86_64, i?86, with/without LTO).  How
> can I reproduce it?
>

It only happens with

-mx32 -O3 -funroll-loops -ffast-math -fwhole-program -flto=jobserver
-fuse-linker-plugin

The failure is

  Running 435.gromacs ref peak lto default

*** Miscompare of gromacs.out; for details see
    /export/project/git/gcc-regression/spec/2006/spec/benchspec/CPU2006/435.grom
acs/run/run_peak_ref_lto.0000/gromacs.out.mis

cat /export/project/git/gcc-regression/spec/2006/spec/benchspec/CPU2006/435.gromacs/run/run_peak_ref_lto.0000/gromacs.out.mis
0002:  3.07684e+02
       3.03594e+02

It is very sensitive to loop optimization.

-- 
H.J.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]