This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [RFC] ipa bitwise constant propagation


On Thu, 4 Aug 2016, Prathamesh Kulkarni wrote:

> On 4 August 2016 at 13:31, Richard Biener <rguenther@suse.de> wrote:
> > On Thu, 4 Aug 2016, Prathamesh Kulkarni wrote:
> >
> >> Hi,
> >> This is a prototype patch for propagating known/unknown bits inter-procedurally.
> >> for integral types which propagates info obtained from get_nonzero_bits ().
> >>
> >> Patch required making following changes:
> >> a) To make info from get_nonzero_bits() available to ipa, I had to remove
> >> guard !nonzero_p in ccp_finalize. However that triggered the following ICE
> >> in get_ptr_info() for default_none.f95 (and several other fortran tests)
> >> with options: -fopenacc -O2
> >> ICE: http://pastebin.com/KjD7HMQi
> >> I confirmed with Richard that this was a latent issue.
> >
> > Can you plase bootstrap/test the fix for this separately?  (doesn't
> > seem to be included in this patch btw)
> Well I don't have the fix available -;)

Oh, I thought it was obvious:

Index: gcc/tree-inline.c
===================================================================
--- gcc/tree-inline.c   (revision 239117)
+++ gcc/tree-inline.c   (working copy)
@@ -242,7 +242,8 @@ remap_ssa_name (tree name, copy_body_dat
       SSA_NAME_OCCURS_IN_ABNORMAL_PHI (new_tree)
        = SSA_NAME_OCCURS_IN_ABNORMAL_PHI (name);
       /* At least IPA points-to info can be directly transferred.  */
-      if (id->src_cfun->gimple_df
+      if (POINTER_TYPE_P (TREE_TYPE (name))
+         && id->src_cfun->gimple_df
          && id->src_cfun->gimple_df->ipa_pta
          && (pi = SSA_NAME_PTR_INFO (name))
          && !pi->pt.anything)
@@ -274,7 +275,8 @@ remap_ssa_name (tree name, copy_body_dat
       SSA_NAME_OCCURS_IN_ABNORMAL_PHI (new_tree)
        = SSA_NAME_OCCURS_IN_ABNORMAL_PHI (name);
       /* At least IPA points-to info can be directly transferred.  */
-      if (id->src_cfun->gimple_df
+      if (POINTER_TYPE_P (TREE_TYPE (name))
+         && id->src_cfun->gimple_df
          && id->src_cfun->gimple_df->ipa_pta
          && (pi = SSA_NAME_PTR_INFO (name))
          && !pi->pt.anything)

similarly range info could be transfered of course.

> >
> >> b) I chose widest_int for representing value, mask in ipcp_bits_lattice
> >> and correspondingly changed declarations for
> >> bit_value_unop_1/bit_value_binop_1 to take
> >> precision and sign instead of type (those are the only two fields that
> >> were used). Both these functions are exported by tree-ssa-ccp.h
> >> I hope that's ok ?
> >
> > That's ok, but please change the functions to overloads of
> > bit_value_binop / bit_value_unop to not export ugly _1 names.
> >
> > -  signop sgn = TYPE_SIGN (type);
> > -  int width = TYPE_PRECISION (type);
> > +  signop sgn = type_sgn;
> > +  int width = (int) type_precision;
> >
> > please adjust parameter names to get rid of those now unnecessary
> > locals (and make the precision parameter an 'int').
> >
> >> c) Changed streamer_read_wi/streamer_write_wi to non-static.
> >> Ah I see Kugan has submitted a patch for this, so I will drop this hunk.
> >
> > But he streams wide_int, not widest_int.  I followed up on his
> > patch.
> Oops, I got confused, sorry about that.
> >
> >> d) We have following in tree-ssa-ccp.c:get_default_value ():
> >>           if (flag_tree_bit_ccp)
> >>             {
> >>               wide_int nonzero_bits = get_nonzero_bits (var);
> >>               if (nonzero_bits != -1)
> >>                 {
> >>                   val.lattice_val = CONSTANT;
> >>                   val.value = build_zero_cst (TREE_TYPE (var));
> >>                   val.mask = extend_mask (nonzero_bits);
> >>                 }
> >>
> >> extend_mask() sets all upper bits to 1 in nonzero_bits, ie, varying
> >> in terms of bit-ccp.
> >> I suppose in tree-ccp we need to extend mask if var is parameter since we don't
> >> know in advance what values it will receive from different callers and mark all
> >> upper bits as 1 to be safe.
> >
> > Not sure, it seems to me that we can zero-extend for unsigned types
> > and sign-extend for signed types (if the "sign"-bit of nonzero_bits
> > is one it properly makes higher bits undefined).  Can you change
> > the code accordingly?  (simply give extend_mask a sign-op and use
> > that appropriately?)  Please split out this change so it can be
> > tested separately.
> >
> >> However I suppose with ipa, we can determine exactly which bits of
> >> parameter are constant and
> >> setting all upper bits to 1 will become unnecessary ?
> >>
> >> For example, consider following artificial test-case:
> >> int f(int x)
> >> {
> >>   if (x > 300)
> >>     return 1;
> >>   else
> >>     return 2;
> >> }
> >>
> >> int main(int argc, char **argv)
> >> {
> >>   return f(argc & 0xc) + f (argc & 0x3);
> >> }
> >>
> >> For x, the mask would be meet of:
> >> <0, 0xc> meet <0, 0x3> == (0x3 | 0xc) | (0 ^ 0) == 0xf
> >> and ipcp_update_bits() sets nonzero_bits for x to 0xf.
> >> However get_default_value then calls extend_mask (0xf), resulting in
> >> all upper bits
> >> being set to 1 and consequently the condition if (x > 300) doesn't get folded.
> >
> > But then why would the code trying to optimize the comparison look at
> > bits that are outside of the precision?  (where do we try to use this
> > info?  I see that VRP misses to use nonzero bits if no range info
> > is present - I suppose set_nonzero_bits misses to eventually adjust
> > the range.
> >
> > That said, where is the folding code and why does it care for those
> > "uninteresting" bits at all?
> Well there is following in bit_value_binop_1 for case LT_EXPR / LE_EXPR:
>         /* If the most significant bits are not known we know nothing.  */
>         if (wi::neg_p (o1mask) || wi::neg_p (o2mask))
>           break;
> 
> IIUC extend_mask extends all upper bits to 1, and we hit break and
> thus not perform folding.

Yeah, this should simply test _the_ most significant bit
(that is, bit number TYPE_PRECISION (r1type)).

But it should be fixed by properly extending the mask.

> ccp2 dump shows:
> Folding statement: if (x_2(D) > 300)
> which is likely CONSTANT
> Not folded
> 
> Instead if we extend based on signop, then the condition gets folded correctly:
> Folding statement: if (x_2(D) > 300)
> which is likely CONSTANT
> Folding predicate x_2(D) > 300 to 0
> gimple_simplified to if (0 != 0)
> Folded into: if (0 != 0)
> 
> I thought it was unsafe for ccp to extend based on sign-op,
> so I guarded that on DECL_SET_BY_IPA.
> I will try to change extend_mask to extend the mask based on signop
> and get rid of the flag.
> 
> I will address your other comments in follow-up patch.

Thanks,
Richard.

> Thanks,
> Prathamesh
> >
> >> To resolve this, I added a new flag "set_by_ipa" to decl_common,
> >> which is set to true if the mask of parameter is determined by ipa-cp,
> >> and the condition changes to:
> >>
> >> if (SSA_NAME_VAR (var)
> >>     && TREE_CODE (SSA_NAME_VAR (var)) == PARM_DECL
> >>     && DECL_SET_BY_IPA (SSA_NAME_VAR (var))
> >>   val.mask = widest_int::from (nonzero_bits,
> >>                           TYPE_SIGN (TREE_TYPE (SSA_NAME_VAR (var)));
> >> else
> >>   val.mask = extend_mask (nonzero_bits);
> >>
> >> I am not sure if adding a new flag to decl_common is a good idea. How
> >> do other ipa passes deal with this/similar issue ?
> >>
> >> I suppose we would want to gate this on some flag, say -fipa-bit-cp ?
> >> I haven't yet gated it on the flag, will do in next version of patch.
> >> I have added some very simple test-cases, I will try to add more
> >> meaningful ones.
> >
> > See above - we should avoid needing this.
> >
> >> Patch passes bootstrap+test on x86_64-unknown-linux-gnu
> >> and cross-tested on arm*-*-* and aarch64*-*-* with the exception
> >> of some fortran tests failing due to above ICE.
> >>
> >> As next steps, I am planning to extend it to handle alignment propagation,
> >> and do further testing (lto-bootstrap, chromium).
> >> I would be grateful for feedback on the current patch.
> >
> > I see you do
> >
> > @@ -895,7 +899,7 @@ do_dbg_cnt (void)
> >     Return TRUE when something was optimized.  */
> >
> >  static bool
> > -ccp_finalize (bool nonzero_p)
> > +ccp_finalize (bool nonzero_p ATTRIBUTE_UNUSED)
> >  {
> >    bool something_changed;
> >    unsigned i;
> > @@ -913,10 +917,7 @@ ccp_finalize (bool nonzero_p)
> >
> >        if (!name
> >           || (!POINTER_TYPE_P (TREE_TYPE (name))
> > -             && (!INTEGRAL_TYPE_P (TREE_TYPE (name))
> > -                 /* Don't record nonzero bits before IPA to avoid
> > -                    using too much memory.  */
> > -                 || !nonzero_p)))
> > +             && (!INTEGRAL_TYPE_P (TREE_TYPE (name)))))
> >         continue;
> >
> > can you instead adjust the caller to do sth like
> >
> >   if (ccp_finalize (nonzero_p || flag_ipa_cp))
> >     {
> >
> > ?  What we miss to optimize memory usage in the early CCP case
> > (it's run very early, before dead code elimination) is to
> > avoid setting alignment / nonzero bits for the case of
> > fully propagatable (and thus dead after substitute_and_fold)
> > SSA names.
> >
> > So in ccp_finalize do sth like
> >
> >       val = get_value (name);
> >       if (val->lattice_val != CONSTANT
> >           || TREE_CODE (val->value) != INTEGER_CST
> >           || val->mask == 0)
> >         continue;
> >
> > That should cut down early CCP memory use in case of nonzero
> > setting significantly.
> >
> > I didn't look at the propagation part but eventually the IPA-CP
> > lattice gets quite big.  Also the alignment lattice is very
> > similar to the bits lattice so why not merge those two?  But
> > in the end it's Martins/Honzas call here.  Note there is
> > trailing_wide_ints <> which could be used to improve memory usage
> > based on the underlying type.
> >
> > Thanks,
> > Richard.
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]