This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH] Fix PR47059
Hi,
On Tue, Jan 18, 2011 at 05:07:59PM -0000, Rahul Kharche wrote:
> Hi Martin,
>
> Thanks for the review.
>
> > I am a bit afraid that if this gets applied, we'll get complaints
> > about unnecessary stack frames being set up and a lot of unnecessary
> > copying of stuff because of aggregates that were not completely
> > scalarized away just like in PR 42585. I admit that I do not know how
> > to simply reconcile the techniques to avoid the two issues or which
> > one is worse. For 4.7 I'd be content to just wait and see what will
> > come up, if anything. For 4.6 or even 4.5, I am not so sure.
>
> At the moment structure copying happens even on trivial cases when a
> structure like the following is defined in context of the attached test
> case. This may be related padding and alignment?
>
> struct struct1
> {
> void *data;
> unsigned short f1;
> unsigned char f2;
> int f3;
> }
>
> This is true of GCC4.5 and 4.6. Optimizing this out if at all, is at
> the mercy of DSE anyway. I believe this fix does not make the condition
> any worse. We will still need both a fix to DSE that will remove all
> redundant stack stores and have a pass (in 4.7?) to merge all adjacent
> loads and stores.
Well... consider the following example (on i686):
struct S
{
int i;
unsigned short f1, f2, f3, f4, f5, f6, f7, f8;
};
int foo (struct S *p)
{
struct S l;
l = *p;
l.i++;
*p = l;
}
The current trunk compiles this to:
foo:
.LFB0:
movl 4(%esp), %eax
addl $1, (%eax)
ret
With (my version of) your patch, I get:
foo:
.LFB0:
subl $40, %esp
.LCFI0:
movl 44(%esp), %eax
movl %ebx, 32(%esp)
.LCFI1:
movl 32(%esp), %ebx
movl %esi, 36(%esp)
.LCFI2:
movl 36(%esp), %esi
movl (%eax), %edx
addl $1, %edx
movl %edx, (%eax)
addl $40, %esp
.LCFI3:
ret
That is what I am worried about. On the other hand, yes, I have just
found out that padding can easily confuse SRA to not remove the
aggregate as it should, so I am not really sure to what extent this is
a big problem in practice - in the presence of small fields, that is.
But currently it is really a bug, with your patch it would be a design
decision.
>
>
> >> + unsigned can_merge : 1;
> >
> > I don't think we need this flag (see below). Nevertheless, if we
> > eventually decide to keep it, please add a grp_ prefix to its name
> > because it describes the whole group and not an individual access.
>
> Done. Removed completely, used grp_unscalarizable_region as suggested.
>
> >> + spill = 0;
> >> + offset = child->offset;
> >> + merge_size = 0;
> >
> > I cannot see any difference in between spill and merge_size so I guess
> > the former can be replaced by the latter?
>
> Done.
>
> >> + while (ac1 && ac1->size < max_bits
> >> + && (ac1->size + spill) <= max_bits
> >> + && offset == ac1->offset
> >> + && !(direct_access = ac1->grp_assignment_read
> >> + || ac1->grp_assignment_write))
> >> + {
> >> + spill += ac1->size;
> >> + offset += child->size;
> >> + merge_size += ac1->size;
> >> + num_merged++;
> >> + ac1 = ac1->next_sibling;
> >> + }
> >> +
> >> + /* Merge at least two adjacent accesses which have no direct
> >> + read/writes or that will result in less reads. */
> >> + if (num_merged > 1
> >> + && (merge_size > (max_bits - merge_size)
> >> + || !direct_access))
> >
> > if direct access is true, the while loop above is stopped immediately
> > and the directly accessed access will not be part of the merge group
> > at all. So I don't really understand this part of the condition, can
> > you please explain what it is supposed to do?
>
> The while loop will terminate on multiple conditions. The if condition
> after the while loop tests if it could be profitable to merge adjacent
> regions. In trivial case where say only two subaccesses can be merged
> and the size criterion, merge_size > (max_bits - merge_size) fails,
> we would still like to merge the two accesses. This is also true for
> subaccesses at the end of an aggregate.
>
> In this case the direct_access flag will be zero. In other cases where
> adjacent subaccesses have direct reads/writes the direct access flag
> will be non-zero and we will use the size criterion to decide on
> the merge. This reduces the total number of accesses, compared to
> scalarizing the whole structure.
>
> E.g. The struct definition in the accompanying test case were defined
> to be:
>
> struct struct1
> {
> char f1;
> char f2;
> }
I see.
>
> > In addition to the above, I think we don't really need an extra
> > can_merge flag just to say don't scalarize, we already have
> > grp_unscalarizable_region for it (though we might want to rename it to
> > grp_noscalarize_region or something like that if we also use it in this
> > way).
>
> Done.
>
> > Can you please have a look if that works for you, amend it at you find
> > necessary and
> > explain the intent of direct_access?
> >
> The call to disallow_mergable_subaccesses in analyze_access_subtree needs
> To be made before the test for grp_unscalarizable_region. I have changed
> this.
No, it does not, we set flags of children of the root, not of the root
itself. (I have actually tested my version yesterday ;-)
I'm happy with what the code looks like now, let's see what maintainers
think about the issue I outlined at the top and associated tradeoffs.
Thanks,
Martin
>
> See revised patch below:
>
> 2011-01-17 Rahul Kharche <rahul@icerasemi.com>
>
> PR tree-optimization/47059
> * tree-sra.c (disallow_mergable_subaccesses): New.
> (analyze_access_subtree): Call above.
>
>
> * gcc.dg/tree-ssa/pr47059.c: New.
>
>
> Index: trunk/gcc/testsuite/gcc.dg/tree-ssa/pr47059.c
> ===================================================================
> --- trunk/gcc/testsuite/gcc.dg/tree-ssa/pr47059.c (revision 0)
> +++ trunk/gcc/testsuite/gcc.dg/tree-ssa/pr47059.c (revision 0)
> @@ -0,0 +1,40 @@
> +/* { dg-do compile } */
> +/* { dg-options "-Os -fdump-tree-esra" } */
> +
> +/* Test for SRA. */
> +
> +struct struct1
> +{
> + void *data;
> + unsigned short f1;
> + unsigned short f2;
> +};
> +typedef struct struct1 S1;
> +
> +struct struct2
> +{
> + int f3;
> + S1 f4;
> +};
> +typedef struct struct2 S2;
> +
> +extern void foo (S1 *ptr);
> +extern S2 gstruct2_var;
> +extern S1 gstruct1_var;
> +
> +int
> +main ()
> +{
> + S2 *ps_var;
> + S1 *pls_var = &gstruct1_var;
> + S1 ls_var = *pls_var;
> +
> + ps_var = &gstruct2_var;
> + ps_var->f4 = ls_var;
> +
> + return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "ls_var\$f1" 0 "esra" } } */
> +/* { dg-final { scan-tree-dump-times "ls_var\$f2" 0 "esra" } } */
> +/* { dg-final { cleanup-tree-dump "optimized" } } */
> +
> Index: trunk/gcc/tree-sra.c
> ===================================================================
> --- trunk/gcc/tree-sra.c (revision 168655)
> +++ trunk/gcc/tree-sra.c (working copy)
> @@ -1838,6 +1838,52 @@ expr_with_var_bounded_array_refs_p (tree
> return false;
> }
>
> +/* Analyze sccess subtree and detect neighbouring sub-accesses that could be
> + merged and set grp_unscalarizable_region for them. */
> +
> +static void
> +disallow_mergable_subaccesses (struct access *access)
> +{
> + struct access *child, *ac1;
> + HOST_WIDE_INT offset, merge_size, num_merged;
> + bool direct_access;
> + const HOST_WIDE_INT max_bits = MOVE_MAX_PIECES * BITS_PER_UNIT;
> +
> + child = access->first_child;
> + while (child)
> + {
> + offset = child->offset;
> + merge_size = 0;
> + num_merged = 0;
> +
> + /* Find adjacent accesses that are less than MOVE_MAX_PIECES
> + bytes in size. Ensure these accesses are not read or written
> + to directly. */
> + ac1 = child;
> + while (ac1 && ac1->size < max_bits
> + && (ac1->size + merge_size) <= max_bits
> + && offset == ac1->offset
> + && !(direct_access = ac1->grp_assignment_read
> + || ac1->grp_assignment_write))
> + {
> + offset += child->size;
> + merge_size += ac1->size;
> + num_merged++;
> + ac1 = ac1->next_sibling;
> + }
> +
> + /* Merge at least two adjacent accesses which have no direct
> + read/writes or that will result in less reads. */
> + if (num_merged > 1
> + && (merge_size > (max_bits - merge_size)
> + || !direct_access))
> + for (; child != ac1; child = child->next_sibling)
> + child->grp_unscalarizable_region = 1;
> + else
> + child = child->next_sibling;
> + }
> +}
> +
> enum mark_rw_status { SRA_MRRW_NOTHING, SRA_MRRW_DIRECT, SRA_MRRW_ASSIGN};
>
> /* Analyze the subtree of accesses rooted in ROOT, scheduling replacements when
> @@ -1915,6 +1961,9 @@ analyze_access_subtree (struct access *r
> else if (root->grp_write)
> mark_write = SRA_MRRW_DIRECT;
>
> + if (allow_replacements)
> + disallow_mergable_subaccesses (root);
> +
> if (root->grp_unscalarizable_region)
> allow_replacements = false;
>