Re: [PATCH] Fix PR47059


Hi,

On Tue, Jan 18, 2011 at 05:07:59PM -0000, Rahul Kharche wrote:
> Hi Martin,
> 
> Thanks for the review.
> 
> > I am a bit afraid that if this gets applied, we'll get complaints
> > about unnecessary stack frames being set up and a lot of unnecessary
> > copying of stuff because of aggregates that were not completely
> > scalarized away just like in PR 42585.  I admit that I do not know how
> > to simply reconcile the techniques to avoid the two issues or which
> > one is worse.  For 4.7 I'd be content to just wait and see what will
> > come up, if anything.  For 4.6 or even 4.5, I am not so sure.
> 
> At the moment structure copying happens even in trivial cases, when a
> structure like the following is defined in the context of the attached
> test case. This may be related to padding and alignment?
> 
> struct struct1
> {
>   void *data;
>   unsigned short f1;
>   unsigned char f2;
>   int f3;
> };
> 
> This is true of GCC 4.5 and 4.6. Optimizing this out, if at all, is at
> the mercy of DSE anyway. I believe this fix does not make the situation
> any worse. We will still need both a fix to DSE that removes all
> redundant stack stores and a pass (in 4.7?) that merges all adjacent
> loads and stores.

Well... consider the following example (on i686):

struct S
{
  int i;
  unsigned short f1, f2, f3, f4, f5, f6, f7, f8;
};

int foo (struct S *p)
{
  struct S l;

  l = *p;
  l.i++;
  *p = l;
}

The current trunk compiles this to:

foo:
.LFB0:
	movl	4(%esp), %eax
	addl	$1, (%eax)
	ret


With (my version of) your patch, I get:

foo:
.LFB0:
	subl	$40, %esp
.LCFI0:
	movl	44(%esp), %eax
	movl	%ebx, 32(%esp)
.LCFI1:
	movl	32(%esp), %ebx
	movl	%esi, 36(%esp)
.LCFI2:
	movl	36(%esp), %esi
	movl	(%eax), %edx
	addl	$1, %edx
	movl	%edx, (%eax)
	addl	$40, %esp
.LCFI3:
	ret

That is what I am worried about.  On the other hand, yes, I have just
found out that padding can easily confuse SRA into not removing the
aggregate as it should, so I am not really sure to what extent this is
a big problem in practice - in the presence of small fields, that is.
But currently it is really a bug; with your patch it would be a design
decision.
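
(For reference, this is the layout I would expect for the struct1 you
quoted on a typical 32-bit target; the offsets below are an assumption
on my side, a quick check with offsetof would confirm them:

struct struct1
{
  void *data;         /* offset 0, size 4 */
  unsigned short f1;  /* offset 4, size 2 */
  unsigned char f2;   /* offset 6, size 1 */
                      /* offset 7: one byte of padding */
  int f3;             /* offset 8, size 4 */
};                    /* sizeof == 12 */

Presumably it is that one byte of padding that keeps the scalar
replacements from covering the whole aggregate and thus keeps the
copy around.)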

> 
> 
> >> +  unsigned can_merge : 1;
> >
> > I don't think we need this flag (see below).  Nevertheless, if we
> > eventually decide to keep it, please add a grp_ prefix to its name
> > because it describes the whole group and not an individual access.
> 
> Done. Removed completely, used grp_unscalarizable_region as suggested.
> 
> >> +      spill = 0;
> >> +      offset = child->offset;
> >> +      merge_size = 0;
> >
> > I cannot see any difference between spill and merge_size so I guess
> > the former can be replaced by the latter?
> 
> Done.
> 
> >> +      while (ac1 && ac1->size < max_bits
> >> +	     && (ac1->size + spill) <= max_bits
> >> +	     && offset == ac1->offset
> >> +	     && !(direct_access = ac1->grp_assignment_read
> >> +		   		  || ac1->grp_assignment_write))
> >> +	{
> >> +	  spill += ac1->size;
> >> +	  offset += child->size;
> >> +	  merge_size += ac1->size;
> >> +	  num_merged++;
> >> +	  ac1 = ac1->next_sibling;
> >> +	}
> >> +
> >> +      /* Merge at least two adjacent accesses which have no direct
> >> +         reads/writes or that will result in fewer reads. */
> >> +      if (num_merged > 1
> >> +	  && (merge_size > (max_bits - merge_size)
> >> +	      || !direct_access))
> >
> > if direct_access is true, the while loop above stops immediately
> > and the directly accessed access will not be part of the merge group
> > at all.  So I don't really understand this part of the condition; can
> > you please explain what it is supposed to do?
> 
> The while loop can terminate on several conditions. The if condition
> after the while loop tests whether it could be profitable to merge
> adjacent regions. In the trivial case where, say, only two sub-accesses
> can be merged and the size criterion merge_size > (max_bits - merge_size)
> fails, we would still like to merge the two accesses. This is also true
> for sub-accesses at the end of an aggregate.
> 
> In this case the direct_access flag will be zero. In other cases, where
> adjacent sub-accesses have direct reads/writes, the direct_access flag
> will be non-zero and we will use the size criterion to decide on
> the merge. This reduces the total number of accesses, compared to
> scalarizing the whole structure.
> 
> E.g. suppose the struct definition in the accompanying test case were
> defined to be:
> 
> struct struct1
> {
>   char f1;
>   char f2;
> };

I see.
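
(Working through the numbers for that case, assuming MOVE_MAX_PIECES is
4 on i686: max_bits is 32, merging f1 and f2 gives merge_size == 16 and
num_merged == 2, so the size criterion merge_size > (max_bits - merge_size)
is 16 > 16 and fails, but direct_access stays false because neither field
is read or written directly, and so the pair is still merged.)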

> 
> > In addition to the above, I think we don't really need an extra
> > can_merge flag just to say don't scalarize, we already have
> > grp_unscalarizable_region for it (though we might want to rename it to
> > grp_noscalarize_region or something like that if we also use it in this
> > way).
> 
> Done.
> 
> > Can you please have a look if that works for you, amend it as you find
> > necessary and explain the intent of direct_access?
> >
> The call to disallow_mergable_subaccesses in analyze_access_subtree needs
> to be made before the test for grp_unscalarizable_region. I have changed
> this.

No, it does not; we set the flags of the children of the root, not of the
root itself.  (I have actually tested my version yesterday ;-)

I'm happy with what the code looks like now; let's see what the
maintainers think about the issue I outlined at the top and the
associated tradeoffs.

Thanks,

Martin

> 
> See revised patch below:
> 
> 2011-01-17  Rahul Kharche  <rahul@icerasemi.com>
> 
> 	PR tree-optimization/47059
> 	* tree-sra.c (disallow_mergable_subaccesses): New.
> 	(analyze_access_subtree): Call above.
> 
> 
> 	* gcc.dg/tree-ssa/pr47059.c: New.
> 
> 
> Index: trunk/gcc/testsuite/gcc.dg/tree-ssa/pr47059.c
> ===================================================================
> --- trunk/gcc/testsuite/gcc.dg/tree-ssa/pr47059.c	(revision 0)
> +++ trunk/gcc/testsuite/gcc.dg/tree-ssa/pr47059.c	(revision 0)
> @@ -0,0 +1,40 @@
> +/* { dg-do compile } */
> +/* { dg-options "-Os -fdump-tree-esra" } */
> +
> +/* Test for SRA. */
> +
> +struct struct1
> +{
> +  void *data;
> +  unsigned short f1;
> +  unsigned short f2;
> +};
> +typedef struct struct1 S1;
> +
> +struct struct2
> +{
> +  int f3;
> +  S1 f4;
> +};
> +typedef struct struct2 S2;
> +
> +extern void foo (S1 *ptr);
> +extern S2 gstruct2_var;
> +extern S1 gstruct1_var;
> +
> +int
> +main ()
> +{
> +  S2 *ps_var;
> +  S1 *pls_var = &gstruct1_var;
> +  S1 ls_var = *pls_var;
> +
> +  ps_var = &gstruct2_var;
> +  ps_var->f4 = ls_var;
> +
> +  return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "ls_var\$f1" 0 "esra" } } */
> +/* { dg-final { scan-tree-dump-times "ls_var\$f2" 0 "esra" } } */
> +/* { dg-final { cleanup-tree-dump "esra" } } */
> +
> Index: trunk/gcc/tree-sra.c
> ===================================================================
> --- trunk/gcc/tree-sra.c	(revision 168655)
> +++ trunk/gcc/tree-sra.c	(working copy)
> @@ -1838,6 +1838,52 @@ expr_with_var_bounded_array_refs_p (tree
>    return false;
>  }
>  
> +/* Analyze the access subtree and detect neighbouring sub-accesses that
> +   could be merged, and set grp_unscalarizable_region for them.  */
> +
> +static void
> +disallow_mergable_subaccesses (struct access *access)
> +{
> +  struct access *child, *ac1;
> +  HOST_WIDE_INT offset, merge_size, num_merged;
> +  bool direct_access = false;
> +  const HOST_WIDE_INT max_bits = MOVE_MAX_PIECES * BITS_PER_UNIT;
> +
> +  child = access->first_child;
> +  while (child)
> +    {
> +      offset = child->offset;
> +      merge_size = 0;
> +      num_merged = 0;
> +
> +      /* Find adjacent accesses that are less than MOVE_MAX_PIECES
> +         bytes in size.  Ensure these accesses are not directly read
> +         or written.  */
> +      ac1 = child;
> +      while (ac1 && ac1->size < max_bits
> +	     && (ac1->size + merge_size) <= max_bits
> +	     && offset == ac1->offset
> +	     && !(direct_access = ac1->grp_assignment_read
> +		   		  || ac1->grp_assignment_write))
> +	{
> +	  offset += ac1->size;
> +	  merge_size += ac1->size;
> +	  num_merged++;
> +	  ac1 = ac1->next_sibling;
> +	}
> +
> +      /* Merge at least two adjacent accesses which have no direct
> +         reads/writes or that will result in fewer reads.  */
> +      if (num_merged > 1
> +	  && (merge_size > (max_bits - merge_size)
> +	      || !direct_access))
> +	  for (; child != ac1; child = child->next_sibling)
> +	    child->grp_unscalarizable_region = 1;
> +      else
> +	child = child->next_sibling;
> +    }
> +}
> +
>  enum mark_rw_status { SRA_MRRW_NOTHING, SRA_MRRW_DIRECT, SRA_MRRW_ASSIGN};
>  
>  /* Analyze the subtree of accesses rooted in ROOT, scheduling replacements when
> @@ -1915,6 +1961,9 @@ analyze_access_subtree (struct access *r
>    else if (root->grp_write)
>      mark_write = SRA_MRRW_DIRECT;
>  
> +  if (allow_replacements)
> +    disallow_mergable_subaccesses (root);
> +
>    if (root->grp_unscalarizable_region)
>      allow_replacements = false;
> 

