This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: increase alignment of global structs in increase_alignment pass


On Wed, 4 May 2016, Prathamesh Kulkarni wrote:

> On 23 February 2016 at 21:49, Prathamesh Kulkarni
> <prathamesh.kulkarni@linaro.org> wrote:
> > On 23 February 2016 at 17:31, Richard Biener <rguenther@suse.de> wrote:
> >> On Tue, 23 Feb 2016, Prathamesh Kulkarni wrote:
> >>
> >>> On 22 February 2016 at 17:36, Richard Biener <rguenther@suse.de> wrote:
> >>> > On Mon, 22 Feb 2016, Prathamesh Kulkarni wrote:
> >>> >
> >>> >> Hi Richard,
> >>> >> As discussed in private mail, this version of patch attempts to
> >>> >> increase alignment
> >>> >> of global struct decl if it contains an an array field(s) and array's
> >>> >> offset is a multiple of the alignment of vector type corresponding to
> >>> >> it's scalar type and recursively checks for nested structs.
> >>> >> eg:
> >>> >> static struct
> >>> >> {
> >>> >>   int a, b, c, d;
> >>> >>   int k[4];
> >>> >>   float f[10];
> >>> >> };
> >>> >> k is a candidate array since it's offset is 16 and alignment of
> >>> >> "vector (4) int" is 8.
> >>> >> Similarly for f.
> >>> >>
> >>> >> I haven't been able to create a test-case where there are
> >>> >> multiple candidate arrays and vector alignment of arrays are different.
> >>> >> I suppose in this case we will have to increase alignment
> >>> >> of the struct by the max alignment ?
> >>> >> eg:
> >>> >> static struct
> >>> >> {
> >>> >>   <fields>
> >>> >>   T1 k[S1]
> >>> >>   <fields>
> >>> >>   T2 f[S2]
> >>> >>   <fields>
> >>> >> };
> >>> >>
> >>> >> if V1 is vector type corresponding to T1, and V2 corresponding vector
> >>> >> type to T2,
> >>> >> offset (k) % align(V1) == 0 and offset (f) % align(V2) == 0
> >>> >> and align (V1) > align(V2) then we will increase alignment of struct
> >>> >> by align(V1).
> >>> >>
> >>> >> Testing showed FAIL for g++.dg/torture/pr31863.C due to program timeout.
> >>> >> Initially it appeared to me, it went in infinite loop. However
> >>> >> on second thoughts I think it's probably not an infinite loop, rather
> >>> >> taking (extraordinarily) large amount of time
> >>> >> to compile the test-case with the patch.
> >>> >> The test-case  builds quickly for only 2 instantiations of ClassSpec
> >>> >> (ClassSpec<Key, A001, 1>,
> >>> >>  ClassSpec<Key, A002, 2>)
> >>> >> Building with 22 instantiations (upto ClassSpec<Key, A0023, 22>) takes up
> >>> >> to ~1m to compile.
> >>> >> with:
> >>> >> 23  instantiations: ~2m
> >>> >> 24 instantiations: ~5m
> >>> >> For 30 instantiations I terminated cc1plus after 13m (by SIGKILL).
> >>> >>
> >>> >> I guess it shouldn't go in an infinite loop because:
> >>> >> a) structs cannot have circular references.
> >>> >> b) works for lower number of instantiations
> >>> >> However I have no sound evidence that it cannot be in infinite loop.
> >>> >> I don't understand why a decl node is getting visited more than once
> >>> >> for that test-case.
> >>> >>
> >>> >> Using a hash_map to store alignments of decl's so that decl node gets visited
> >>> >> only once prevents the issue.
> >>> >
> >>> > Maybe aliases.  Try not walking vnode->alias == true vars.
> >>> Hi,
> >>> I have a hypothesis why decl node gets visited multiple times.
> >>>
> >>> Consider the test-case:
> >>> template <typename T, unsigned N>
> >>> struct X
> >>> {
> >>>   T a;
> >>>   virtual int foo() { return N; }
> >>> };
> >>>
> >>> typedef struct X<int, 1> x_1;
> >>> typedef struct X<int ,2> x_2;
> >>>
> >>> static x_1 obj1 __attribute__((used));
> >>> static x_2 obj2 __attribute__((used));
> >>>
> >>> Two additional structs are created by C++FE, c++filt shows:
> >>> _ZTI1XIiLj1EE -> typeinfo for X<int, 1u>
> >>> _ZTI1XIiLj2EE -> typeinfo for X<int, 2u>
> >>>
> >>> Both of these structs have only one field D.2991 and it appears it's
> >>> *shared* between them:
> >>>  struct  D.2991;
> >>>     const void * D.2980;
> >>>     const char * D.2981;
> >>>
> >>> Hence the decl node D.2991 and it's fields (D.2890, D.2981) get visited twice:
> >>> once when walking _ZTI1XIiLj1EE and 2nd time when walking _ZTI1XIiLj2EE
> >>>
> >>> Dump of walking over the global structs for above test-case:
> >>> http://pastebin.com/R5SABW0c
> >>>
> >>> So it appears to me to me a DAG (interior node == struct decl, leaf ==
> >>> non-struct field,
> >>> edge from node1 -> node2 if node2 is field of node1) is getting
> >>> created when struct decl
> >>> is a type-info object.
> >>>
> >>> I am not really clear on how we should proceed:
> >>> a) Keep using hash_map to store alignments so that every decl gets
> >>> visited only once.
> >>> b) Skip walking artificial record decls.
> >>> I am not sure if skipping all artificial struct decls would be a good
> >>> idea, but I don't
> >>> think it's possible to identify if a struct-decl is typeinfo struct at
> >>> middle-end ?
> >>
> >> You shouldn't end up walking those when walking the type of
> >> global decls.  That is, don't walk typeinfo decls - yes, practically
> >> that means just not walking DECL_ARTIFICIAL things.
> > Hi,
> > I have done the changes in the patch (attached) and cross-tested
> > on arm*-*-* and aarch64*-*-* without failures.
> > Is it OK for stage-1 ?
> Hi,
> Is the attached patch OK for trunk ?
> Bootstrapped and tested on aarch64-linux-gnu, ppc64le-linux-gnu.
> Cross-tested on arm*-*-* and aarch64*-*-*.

You can't simply use

+      offset = int_byte_position (field);

as it can be placed at variable offset which will make int_byte_position
ICE.  Note it also returns a truncated byte position (bit position
stripped) which may be undesirable here.  I think you want to use
bit_position and if and only if DECL_FIELD_OFFSET and
DECL_FIELD_BIT_OFFSET are INTEGER_CST.

Your observation about the expensiveness of the walk still stands I guess
and eventually you should at least cache the 
get_vec_alignment_for_record_decl cases.  Please make those workers
_type rather than _decl helpers.

You seem to simply get at the maximum vectorized field/array element 
alignment possible for all arrays - you could restrict that to
arrays with at least vector size (as followup).

+  /* Skip artificial decls like typeinfo decls or if
+     record is packed.  */
+  if (DECL_ARTIFICIAL (record_decl) || TYPE_PACKED (type))
+    return 0;

I think we should honor DECL_USER_ALIGN as well and not mess with those 
decls.

Given the patch now does quite some extra work it might make sense
to split the symtab part out of the vect_can_force_dr_alignment_p
predicate and call that early.

Richard.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]