This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: Keep static VTA locs in cselib tables only


Alexandre Oliva <aoliva@redhat.com> writes:
> On Nov 25, 2011, Jakub Jelinek <jakub@redhat.com> wrote:
>> The numbers I got with your patch (RTL checking) are below, seems
>> the cumulative numbers other than 100% are all bigger with patched stage2,
>> which means unfortunately debug info quality degradation.
>
> Not really.  I found some actual degradation after finally getting back
> to it.  In some cases, I failed to reset NO_LOC_P, and this caused
> expressions that depended on it to resort to alternate values or end up
> unset.  In other cases, we created different cselib values for debug
> temps and implicit ptrs, and merging them at dataflow confluences no
> longer found a common value because the common value was in cselib's
> static equivalence table.  I've fixed (and added an assertion to catch)
> left-over NO_LOC_Ps, and arranged for values created for debug exprs,
> implicit ptrs, entry values and parameter refs to be preserved across
> basic blocks as constants within cselib.
>
> With that, the debug info we get is a strict improvement in terms of
> coverage, even though a bunch of .o files still display a decrease in
> 100% coverage.  In the handful of files I examined, the patched compiler
> was emitting a loc list without full coverage, while the original
> compiler was emitting a single loc expr that implicitly got full
> coverage even though AFAICT it should really cover a narrower range.
> Full coverage was a false positive, and less-than-100% coverage in these
> cases is not a degradation, but rather an improvement.
>
> Now, the reason why we emit additional expressions now is that the new
> algorithm is more prone to emitting different (and better) expressions
> when entering a basic block, because we don't try as hard as before to
> keep on with the same location expression.  Instead we recompute all the
> potentially-changed expressions, which will tend to select better
> expressions if available.
>
>> Otherwise the patch looks good to me.
>
> Thanks.  After the updated comparison data below, you can find the patch
> I'm checking in, followed by the small interdiff from the previous
> patch.
>
>
> Happy GNU Year! :-)
>
>
> The results below can be reproduced with r182723.
>
> stage1 sources are patched, stage2 and stage3 aren't, so
> stage2 is built with a patched compiler, stage3 isn't.
>
> $ wc -l obj-{x86_64,i686}-linux-gnu/stage[23]-gcc/cc1plus.ev
>   100784 obj-x86_64-linux-gnu/stage2-gcc/cc1plus.ev
>   102406 obj-x86_64-linux-gnu/stage3-gcc/cc1plus.ev
>    33275 obj-i686-linux-gnu/stage2-gcc/cc1plus.ev
>    33944 obj-i686-linux-gnu/stage3-gcc/cc1plus.ev
>
> $ wc -l obj-{x86_64,i686}-linux-gnu/stage[23]-gcc/cc1plus.csv
>    523647 obj-x86_64-linux-gnu/stage2-gcc/cc1plus.csv
>    523536 obj-x86_64-linux-gnu/stage3-gcc/cc1plus.csv
>    521276 obj-i686-linux-gnu/stage2-gcc/cc1plus.csv
>    521907 obj-i686-linux-gnu/stage3-gcc/cc1plus.csv
>
> $ diff -yW80 obj-x86_64-linux-gnu/stage[23]-gcc/cc1plus.ls
> cov%    samples cumul                   cov%    samples cumul
> 0.0     150949/30%      150949/30%    | 0.0     150980/30%      150980/30%
> 0..5    6234/1% 157183/31%            | 0..5    6254/1% 157234/31%
> 6..10   5630/1% 162813/32%            | 6..10   5641/1% 162875/32%
> 11..15  4675/0% 167488/33%            | 11..15  4703/0% 167578/33%
> 16..20  5041/1% 172529/34%            | 16..20  5044/1% 172622/34%
> 21..25  5435/1% 177964/35%            | 21..25  5466/1% 178088/35%
> 26..30  4249/0% 182213/36%            | 26..30  4269/0% 182357/36%
> 31..35  4666/0% 186879/37%            | 31..35  4674/0% 187031/37%
> 36..40  6939/1% 193818/38%            | 36..40  6982/1% 194013/38%
> 41..45  7824/1% 201642/40%            | 41..45  7859/1% 201872/40%
> 46..50  8538/1% 210180/42%            | 46..50  8536/1% 210408/42%
> 51..55  7585/1% 217765/43%            | 51..55  7611/1% 218019/43%
> 56..60  6088/1% 223853/44%            | 56..60  6108/1% 224127/44%
> 61..65  5545/1% 229398/45%            | 61..65  5574/1% 229701/46%
> 66..70  7151/1% 236549/47%            | 66..70  7195/1% 236896/47%
> 71..75  8068/1% 244617/49%            | 71..75  8104/1% 245000/49%
> 76..80  18852/3%        263469/52%    | 76..80  18879/3%        263879/52%
> 81..85  11958/2%        275427/55%    | 81..85  11954/2%        275833/55%
> 86..90  15201/3%        290628/58%    | 86..90  15145/3%        290978/58%
> 91..95  16814/3%        307442/61%    | 91..95  16727/3%        307705/61%
> 96..99  17121/3%        324563/65%    | 96..99  16991/3%        324696/65%
> 100     174515/34%      499078/100%   | 100     173994/34%      498690/100%
>
> $ diff -yW80 obj-i686-linux-gnu/stage[23]-gcc/cc1plus.ls
> cov%    samples cumul                   cov%    samples cumul
> 0.0     145453/27%      145453/27%    | 0.0     145480/27%      145480/27%
> 0..5    6594/1% 152047/29%            | 0..5    6603/1% 152083/29%
> 6..10   5664/1% 157711/30%            | 6..10   5671/1% 157754/30%
> 11..15  4982/0% 162693/31%            | 11..15  4997/0% 162751/31%
> 16..20  6155/1% 168848/32%            | 16..20  6169/1% 168920/32%
> 21..25  5038/0% 173886/33%            | 21..25  5057/0% 173977/33%
> 26..30  4925/0% 178811/34%            | 26..30  4918/0% 178895/34%
> 31..35  4359/0% 183170/35%            | 31..35  4372/0% 183267/35%
> 36..40  6977/1% 190147/36%            | 36..40  6972/1% 190239/36%
> 41..45  8138/1% 198285/38%            | 41..45  8148/1% 198387/38%
> 46..50  8538/1% 206823/39%            | 46..50  8538/1% 206925/39%
> 51..55  5607/1% 212430/40%            | 51..55  5610/1% 212535/40%
> 56..60  6629/1% 219059/41%            | 56..60  6629/1% 219164/42%
> 61..65  5232/1% 224291/42%            | 61..65  5242/1% 224406/43%
> 66..70  8827/1% 233118/44%            | 66..70  8824/1% 233230/44%
> 71..75  6240/1% 239358/45%            | 71..75  6245/1% 239475/45%
> 76..80  8573/1% 247931/47%            | 76..80  8577/1% 248052/47%
> 81..85  8235/1% 256166/49%            | 81..85  8236/1% 256288/49%
> 86..90  13385/2%        269551/51%    | 86..90  13365/2%        269653/51%
> 91..95  21427/4%        290978/55%    | 91..95  21397/4%        291050/55%
> 96..99  20791/3%        311769/59%    | 96..99  20739/3%        311789/59%
> 100     209906/40%      521675/100%   | 100     209781/40%      521570/100%
>
>
> for  gcc/ChangeLog
> from  Alexandre Oliva  <aoliva@redhat.com>
>
> 	* cselib.h (cselib_add_permanent_equiv): Declare.
> 	(canonical_cselib_val): New.
> 	* cselib.c (new_elt_loc_list): Rework to support value
> 	equivalences.  Adjust all callers.
> 	(preserve_only_constants): Retain value equivalences.
> 	(references_value_p): Retain preserved values.
> 	(rtx_equal_for_cselib_1): Handle value equivalences.
> 	(cselib_invalidate_regno): Use canonical value.
> 	(cselib_add_permanent_equiv): New.
> 	* alias.c (find_base_term): Reset locs lists while recursing.
> 	* var-tracking.c (val_bind): New.  Don't add equivalences
> 	present in cselib table, compared with code moved from...
> 	(val_store): ... here.
> 	(val_resolve): Use val_bind.
> 	(VAL_EXPR_HAS_REVERSE): Drop.
> 	(add_uses): Do not create MOps for addresses.  Do not mark
> 	non-REG non-MEM expressions as requiring resolution.
> 	(reverse_op): Record reverse as a cselib equivalence.
> 	(add_stores): Use it.  Do not create MOps for addresses.
> 	Do not require resolution for non-REG non-MEM expressions.
> 	Simplify support for reverse operations.
> 	(compute_bb_dataflow): Drop reverse support.
> 	(emit_notes_in_bb): Likewise.
> 	(create_entry_value): Rename to...
> 	(record_entry_value): ... this.  Use cselib equivalences.
> 	(vt_add_function_parameter): Adjust.

This has significantly increased the compile time of 20001226-1.c -O3 -g
on mips64-linux-gnu:

time for -O3:
    real    0m25.656s
    user    0m25.106s
    sys     0m0.544s

time for -O3 -g with this patch and the follow-ons reverted:
    real    0m28.212s
    user    0m27.586s
    sys     0m0.624s

time for -O3 -g with today's trunk:
    CTRL-Ced (sorry, I got bored) after:
    real    15m11.725s
    user    15m11.177s
    sys     0m0.496s
    
I realise that 20001226-1.c isn't exactly the most representative
testcase in the world, but it does look like a real problem.

The testcase has a lot of identical "load high, add low" sequences.
Even though there are only two values here (the high part and
the result), we end up with thousands of value rtxs and thousands
of distinct "reverse" locations attached to the "load high" value.

One source of inefficiency seems to be that if an as-yet-unseen value
is passed to cselib_add_permanent_equiv, we create a new value for it,
then make the previous value (ELT) equivalent to it.  AIUI, ELT is still
supposed to be the "canonical" representation of the value in this
situation, so most (but presumably not all, given the bug) instances
of the new value will be replaced by the old.

This led to the patch below.  However, it triggers an infinite loop
in alias.c:memrefs_conflict_p when compiling libgcov.c, because we end
up recording that

  value V1 = value V2 + OFFSET
  value V2 = value V1 - OFFSET

and the function cycles endlessly between V1 and V2.

And this is what makes me a bit nervous about the original patch.
AIUI, cselib values were always supposed to form a dag, whereas
the patch above is deliberately recording cycles.  It seems a bit
dangerous to change something like that at the end of stage 3.
I couldn't convince myself that, without this patch, all these
cycles were harmless, especially given the increased use of
canonical_cselib_val in the follow-on patches.

The follow-on patches also made me realise that I don't understand
when cselib is supposed to use "canonical" values and when it should
use (or can make do with) the non-canonical ones.

Thanks,
Richard


Index: gcc/cselib.c
===================================================================
--- gcc/cselib.c	2012-01-21 12:42:16.000000000 +0000
+++ gcc/cselib.c	2012-01-21 13:09:38.000000000 +0000
@@ -1895,16 +1895,26 @@ cselib_subst_to_values (rtx x, enum mach
   return copy;
 }
 
-/* Look up the rtl expression X in our tables and return the value it
-   has.  If CREATE is zero, we return NULL if we don't know the value.
-   Otherwise, we create a new one if possible, using mode MODE if X
-   doesn't have a mode (i.e. because it's a constant).  When X is part
-   of an address, MEMMODE should be the mode of the enclosing MEM if
-   we're tracking autoinc expressions.  */
+/* Look up the rtl expression X in our tables.  Use mode MODE if X
+   doesn't have a mode (i.e. because it's a constant).  If PRESERVED_EQUIV
+   is nonnull, it is a preserved value that is now known to be equal to X,
+   although that equivalence may not yet have been recorded.
+
+   Specifically:
+
+   - If there is already an entry, return the value it has.
+   - Otherwise, if CREATE is zero, return NULL.
+   - Otherwise, if PRESERVED_EQUIV is nonnull, add X to PRESERVED_EQUIV
+     and return that.
+   - Otherwise, create a new value if possible, add X to it, and return it.
+
+   When X is part of an address, MEMMODE should be the mode of the
+   enclosing MEM if we're tracking autoinc expressions.  */
 
 static cselib_val *
 cselib_lookup_1 (rtx x, enum machine_mode mode,
-		 int create, enum machine_mode memmode)
+		 int create, enum machine_mode memmode,
+		 cselib_val *preserved_equiv)
 {
   void **slot;
   cselib_val *e;
@@ -2012,7 +2022,13 @@ cselib_lookup_1 (rtx x, enum machine_mod
   if (e)
     return e;
 
-  e = new_cselib_val (hashval, mode, x);
+  if (preserved_equiv)
+    {
+      e = preserved_equiv;
+      gcc_assert (PRESERVED_VALUE_P (e->val_rtx));
+    }
+  else
+    e = new_cselib_val (hashval, mode, x);
 
   /* We have to fill the slot before calling cselib_subst_to_values:
      the hash table is inconsistent until we do so, and
@@ -2040,14 +2056,14 @@ cselib_lookup_from_insn (rtx x, enum mac
   return ret;
 }
 
-/* Wrapper for cselib_lookup_1, that logs the lookup result and
-   maintains invariants related with debug insns.  */
+/* Wrapper for cselib_lookup_1 that logs the lookup result.  */
 
-cselib_val *
-cselib_lookup (rtx x, enum machine_mode mode,
-	       int create, enum machine_mode memmode)
+static cselib_val *
+cselib_lookup_2 (rtx x, enum machine_mode mode,
+		 int create, enum machine_mode memmode,
+		 cselib_val *preserved_equiv)
 {
-  cselib_val *ret = cselib_lookup_1 (x, mode, create, memmode);
+  cselib_val *ret = cselib_lookup_1 (x, mode, create, memmode, preserved_equiv);
 
   /* ??? Should we return NULL if we're not to create an entry, the
      found loc is a debug loc and cselib_current_insn is not DEBUG?
@@ -2067,6 +2083,16 @@ cselib_lookup (rtx x, enum machine_mode
   return ret;
 }
 
+/* Wrapper around cselib_lookup_2 for the usual case in which there is
+   no known preserved value that is equal to X.  */
+
+cselib_val *
+cselib_lookup (rtx x, enum machine_mode mode,
+	       int create, enum machine_mode memmode)
+{
+  return cselib_lookup_2 (x, mode, create, memmode, 0);
+}
+
 /* Invalidate any entries in reg_values that overlap REGNO.  This is called
    if REGNO is changing.  MODE is the mode of the assignment to REGNO, which
    is used to determine how many hard registers are being changed.  If MODE
@@ -2363,7 +2389,7 @@ cselib_add_permanent_equiv (cselib_val *
 
   cselib_current_insn = insn;
 
-  nelt = cselib_lookup (x, GET_MODE (elt->val_rtx), 1, VOIDmode);
+  nelt = cselib_lookup_2 (x, GET_MODE (elt->val_rtx), 1, VOIDmode, elt);
 
   if (nelt != elt)
     {

