68930 – Aggregate replacements not applied to inline function bodies.

Bug 68930 - Aggregate replacements not applied to inline function bodies.

Summary: Aggregate replacements not applied to inline function bodies.

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	ipa (show other bugs)
Version:	unknown

Importance:	P3 normal
Target Milestone:	---
Assignee:	Not yet assigned to anyone

URL:
Keywords:	missed-optimization

Depends on:
Blocks:

Reported:	2015-12-16 05:48 UTC by Jan Hubicka
Modified:	2023-08-16 07:40 UTC (History)
CC List:	4 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:	2015-12-16 00:00:00

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Jan Hubicka 2015-12-16 05:48:04 UTC

The following testcase:
struct a
{
  int a, b;
};

static inline int
t (struct a a)
{
  if (!__builtin_constant_p (a.a))
    __builtin_abort ();
}

static int __attribute__ ((noinline)) q (struct a a)
{
  t (a);
}

int
main (int argc)
{
  struct a a;
  int i;
  for (i = 0; i < 10000; i++)
    {
      a.a = 1;
      a.b = 2;
      q (a);
    }
  return 0;
}

Fails when compiled with 
./xgcc -B ./ -O3 t.c -S -fdump-ipa-cp -fno-ipa-sra -fno-early-inlining
ipa-cp correctly identifies the inlining oppurtunity:
Estimating effects for main/2, base_time: 1588.                                 
   Estimating body: main/2                                                      
   Known to be false:                                                           
   size:10 time:1588                                                            
 - context independent values, size: 10, time_benefit: 1                        
     good_cloning_opportunity_p (time: 1, size: 10, freq_sum: 0, single_call) -> evaluation: 0, threshold: 500
   Not cloning for all contexts because !good_cloning_opportunity_p.            
                                                                                
Estimating effects for q/1, base_time: 13.                                      
   Estimating body: q/1                                                         
   Known to be false:                                                           
   size:5 time:13                                                               
 - context independent values, size: 5, time_benefit: 0                         
     Decided to specialize for all known contexts, code not going to grow.      
                                                                                
Estimating effects for t/0, base_time: 5.                                       
   Estimating body: t/0                                                         
   Known to be false: op0[offset: 0] changed                                    
   size:4 time:2                                                                
 - context independent values, size: 4, time_benefit: 3                         
     Decided to specialize for all known contexts, code not going to grow.      

however because the ipa-cp adjustments are applied to the body of q before inlining we end up with:
__attribute__((noinline))
q.constprop (struct a a)
{
  struct a a;
  int _2;
  int _3;

  <bb 2>:
  a = a;
  _2 = a.a;
  _3 = __builtin_constant_p (_2);
  if (_3 == 0)
    goto <bb 3>;
  else
    goto <bb 4>;

  <bb 3>:
  __builtin_abort ();

  <bb 4>:
  return;

}
i.e. we lose the track that a.a==1

Modification phase of node q.constprop/5
     Aggregate replacements: 0[0]=1, 0[32]=2
__attribute__((noinline))
q.constprop (struct a a)
{
  <bb 3>:

  <bb 2>:
  t (a);
  return;

}

here t is not inlined yet and the clone of t is not modified because it is inline clone but it should be because it is inline clone of ipa-cp clone.
I wonder if we should not apply the replacements only after inlining. Even if ipa-cp decided to not clone the callee, inliner might have inlined it and we ought to use the known constants when optimizing the function body.

Comment 1 Jan Hubicka 2015-12-16 06:00:11 UTC

Actually applying it just after inlining is probably going to miss the transform, too, because there will be extra copy of the whole aggregate.  What about making the info about known constant values of aggregates available to GVN? It should be able to do the desired cleanup.

Comment 2 Richard Biener 2015-12-16 08:50:21 UTC

I suppose the dump is misleading in that the inline clone has

Modification phase of node q.constprop/5
     Aggregate replacements: 0[0]=1, 0[32]=2

applied to?

__attribute__((noinline))
q.constprop (struct a a)
{
  <bb 3>:

  <bb 2>:
  t (a);
  return;

}

instead would look like wrong-code.

Ah, IPA-CP doesn't actually remove "const" parameters.

So this report is reporting that IPA-CP thinks it does sth but in the
end it's a no-op clone.

I wonder why we don't simply add

  a.a = 1;
  a.b = 2;

at the start of q.constprop.  Or if we only know a partially then initialize
those parts from the know constants?

Comment 3 rguenther@suse.de 2015-12-16 12:17:27 UTC

On Wed, 16 Dec 2015, hubicka at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68930
> 
> Jan Hubicka <hubicka at gcc dot gnu.org> changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |rguenther at suse dot de
> 
> --- Comment #1 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
> Actually applying it just after inlining is probably going to miss the
> transform, too, because there will be extra copy of the whole aggregate.  What
> about making the info about known constant values of aggregates available to
> GVN? It should be able to do the desired cleanup.

But how would you represent this info?  I don't see how it's going to
do better than the inliner inserting an extra initialization (of course
that might stay around if there are no cleanup opportunities and actually
make the code bigger & slower)

Comment 4 Martin Jambor 2015-12-16 16:32:50 UTC

I agree with the analysis in the summary, the problem is that the
IPA-CP transformation phase should be run on clone of t _before_ we
inline it to q.  From the ipa-cp dump it is apparent that does not
happen.

As far as the GVN idea is concerned, can GVN even work with
aggregates?  And even if yes, I assume this would mean adding some
kind of asserts into the gimple stream, which would also require that
we process the body of the clone before we inline it.  Trying to track
IPA-CP aggregate transformation info across inlining seems difficult
and unnecessary to me.

The modification of q is a no op because we only replace loads of
known values with the known values in the transformation phase.

And IPA-CP indeed does not remove _aggregate_ parameters, even when it
knows the value of it all.  I suppose it should.

Comment 5 Jan Hubicka 2015-12-16 18:24:54 UTC

> But how would you represent this info?  I don't see how it's going to
> do better than the inliner inserting an extra initialization (of course
> that might stay around if there are no cleanup opportunities and actually
> make the code bigger & slower)

Just keep the existing transformation data used by ipcp_modif_dom_walker
and feed it into GVN at entry block.

Consider code:

foo (struct a)
{
  struct b=a;
  return b.a;
}

this is a common case of post inlining code.  Now the way ipa-prop works if we
figure out that b.a=1 it won't update the copy b=a because it is not access to
the field it knows value of so no transformation happens.
GVN will figure out that the return is equivalent to a.a and if it knew that a.a==1
it would optimize the code.

It is quite important bug that we work hard to figure out all those constants and
then we drop them to the floor :)

Honza

Comment 6 Jan Hubicka 2015-12-16 18:28:35 UTC

For the Martin's answer.  I poked about this a bit.
One can add call to node->get_body to the end of cgraph_materialize_clone which
after removing the associated assert will call to the ipa-prop transform
method.  It will also call inliner's transform which will in turn remove the
cgraph and break.

I suppose I can fix this by making IPA transforms function speicfic (they should
be) and making inliner to only drop its transform method on functions non-inline
functions and make ipa-cp add transforms only to the clones.  It should improve
compile time, too.

Still we will get missed optimizations, so keeping this info for GVN seems like
resonable thing to do.

Comment 7 rguenther@suse.de 2015-12-16 19:31:49 UTC

On December 16, 2015 7:28:35 PM GMT+01:00, hubicka at ucw dot cz <gcc-bugzilla@gcc.gnu.org> wrote:
>https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68930
>
>--- Comment #6 from Jan Hubicka <hubicka at ucw dot cz> ---
>For the Martin's answer.  I poked about this a bit.
>One can add call to node->get_body to the end of
>cgraph_materialize_clone which
>after removing the associated assert will call to the ipa-prop
>transform
>method.  It will also call inliner's transform which will in turn
>remove the
>cgraph and break.
>
>I suppose I can fix this by making IPA transforms function speicfic
>(they
>should
>be) and making inliner to only drop its transform method on functions
>non-inline
>functions and make ipa-cp add transforms only to the clones.  It should
>improve
>compile time, too.
>
>Still we will get missed optimizations, so keeping this info for GVN
>seems like
>resonable thing to do.

If the information is there in some way then I can try using it from GVN. just tell me where it is.

Comment 8 Jan Hubicka 2016-01-11 17:49:17 UTC

The information is consumed by ipcp_transform_function, so you can take a look.
It is stored in ipcp_get_transformation_summary
I think it is just matter of not freeing it (assuming it is freed after cloning is done) and using it from GVN

Comment 9 Martin Jambor 2023-05-29 16:25:04 UTC

I have proposed a fix on the mailing list:

https://gcc.gnu.org/pipermail/gcc-patches/2023-May/619969.html

Comment 10 GCC Commits 2023-08-15 15:41:42 UTC

The master branch has been updated by Martin Jambor <jamborm@gcc.gnu.org>:

https://gcc.gnu.org/g:d073e2d75d9ed492de9a8dc6970e5b69fae20e5a

commit r14-3226-gd073e2d75d9ed492de9a8dc6970e5b69fae20e5a
Author: Martin Jambor <mjambor@suse.cz>
Date:   Tue Aug 15 17:26:13 2023 +0200

    Feed results of IPA-CP into tree value numbering
    
    PRs 68930 and 92497 show that when IPA-CP figures out constants in
    aggregate parameters or when passed by reference but the loads happen
    in an inlined function the information is lost.  This happens even
    when the inlined function itself was known to have - or even cloned to
    have - such constants in incoming parameters because the transform
    phase of IPA passes is not run on them.  See discussion in the bugs
    for reasons why.
    
    Honza suggested that we can plug the results of IPA-CP analysis into
    value numbering, so that FRE can figure out that some loads fetch
    known constants.  This is what this patch attempts to do.  The patch
    does not attempt to populate partial_defs with information from
    IPA-CP, this can be hopefully added as a follow-up.
    
    gcc/ChangeLog:
    
    2023-08-11  Martin Jambor  <mjambor@suse.cz>
    
            PR ipa/68930
            PR ipa/92497
            * ipa-prop.h (ipcp_get_aggregate_const): Declare.
            * ipa-prop.cc (ipcp_get_aggregate_const): New function.
            (ipcp_transform_function): Do not deallocate transformation info.
            * tree-ssa-sccvn.cc: Include alloc-pool.h, symbol-summary.h and
            ipa-prop.h.
            (vn_reference_lookup_2): When hitting default-def vuse, query
            IPA-CP transformation info for any known constants.
    
    gcc/testsuite/ChangeLog:
    
    2023-06-07  Martin Jambor  <mjambor@suse.cz>
    
            PR ipa/68930
            PR ipa/92497
            * gcc.dg/ipa/pr92497-1.c: New test.
            * gcc.dg/ipa/pr92497-2.c: Likewise.

Comment 11 Martin Jambor 2023-08-15 15:44:43 UTC

Finally fixed.