This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH i386 5/8] [AVX-512] Extend vectorizer hooks.


On Tue, May 20, 2014 at 5:00 AM, Kirill Yukhin <kirill.yukhin@gmail.com> wrote:
> Hello,
> On 19 May 09:58, H.J. Lu wrote:
>> On Mon, May 19, 2014 at 9:45 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
>> > On Mon, May 19, 2014 at 6:42 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> >
>> >>>> Uros,
>> >>>> I am looking into libreoffice size and the data alignment seems to make huge
>> >>>> difference. Data section has grown from 5.8MB to 6.3MB in between GCC 4.8 and 4.9,
>> >>>> while clang produces 5.2MB.
>> >>>>
>> >>>> The two patches I posted to not align vtables and RTTI reduces it to 5.7MB, but
>> >>>> But perhaps we want to revisit the alignment rules.  The optimization manuals
>> >>>> usually care only about performance critical loops.  Perhaps we can make the
>> >>>> rules to align only bigger datastructures, or so at least for -O2.
>> >>>
>> >>> Based on the above quote, "Misaligned data access can incur
>> >>> significant performance penalties." and the fact that this particular
>> >>> alignment rule has some compatibility issues with previous versions of
>> >>> gcc (these were later fixed by Jakub), I'd rather leave this rule as
>> >>> is. However, if the access is from the cold section, we can perhaps
>> >>> avoid extra alignment, while avoiding those compatibility issues.
>> >>>
>> >>
>> >> It is excessive to align
>> >>
>> >> struct foo
>> >> {
>> >>   int x1;
>> >>   int x2;
>> >>   char x3;
>> >>   int x4;
>> >>   int x5;
>> >>   char x6;
>> >>   int x7;
>> >>   int x8;
>> >> };
>> >>
>> >> to 32 bytes and align
>> >>
>> >> struct foo
>> >> {
>> >>   int x1;
>> >>   int x2;
>> >>   char x3;
>> >>   int x4;
>> >>   int x5;
>> >>   char x6;
>> >>   int x7[9];
>> >>   int x8;
>> >> };
>> >>
>> >> to 64 bytes.  What performance gain does it provide?
>> >
>> > Avoids "significant performance penalties," perhaps?
>> >
>>
>> Kirill, do we have performance data for excessive alignment
>> vs ABI alignment?
> Nope, we have no actual data showing performance impact on such changes,
> sorry.
>
> We may try such a change on HSW machine (on Spec 2006), will it be useful?
>
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -26576,7 +26576,7 @@ ix86_data_alignment (tree type, int align, bool opt)
>       used to assume.  */
>
>    int max_align_compat
> -    = optimize_size ? BITS_PER_WORD : MIN (256, MAX_OFILE_ALIGNMENT);
> +    = optimize_size ? BITS_PER_WORD : MIN (128, MAX_OFILE_ALIGNMENT);
>
>    /* A data structure, equal or greater than the size of a cache line
>       (64 bytes in the Pentium 4 and other recent Intel processors, including
>
>

ABI alignment should be sufficient for correctness. Bigger alignments
are supposed to give better performance.  Can you try this patch on
HSW and SLM to see if it has any impact on performance?

-- 
H.J.
----
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index c0a46ed..4879110 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -26568,39 +26568,6 @@ ix86_constant_alignment (tree exp, int align)
 int
 ix86_data_alignment (tree type, int align, bool opt)
 {
-  /* GCC 4.8 and earlier used to incorrectly assume this alignment even
-     for symbols from other compilation units or symbols that don't need
-     to bind locally.  In order to preserve some ABI compatibility with
-     those compilers, ensure we don't decrease alignment from what we
-     used to assume.  */
-
-  int max_align_compat
-    = optimize_size ? BITS_PER_WORD : MIN (256, MAX_OFILE_ALIGNMENT);
-
-  /* A data structure, equal or greater than the size of a cache line
-     (64 bytes in the Pentium 4 and other recent Intel processors, including
-     processors based on Intel Core microarchitecture) should be aligned
-     so that its base address is a multiple of a cache line size.  */
-
-  int max_align
-    = MIN ((unsigned) ix86_tune_cost->prefetch_block * 8, MAX_OFILE_ALIGNMENT);
-
-  if (max_align < BITS_PER_WORD)
-    max_align = BITS_PER_WORD;
-
-  if (opt
-      && AGGREGATE_TYPE_P (type)
-      && TYPE_SIZE (type)
-      && TREE_CODE (TYPE_SIZE (type)) == INTEGER_CST)
-    {
-      if (wi::geu_p (TYPE_SIZE (type), max_align_compat)
-  && align < max_align_compat)
- align = max_align_compat;
-       if (wi::geu_p (TYPE_SIZE (type), max_align)
-   && align < max_align)
- align = max_align;
-    }
-
   /* x86-64 ABI requires arrays greater than 16 bytes to be aligned
      to 16byte boundary.  */
   if (TARGET_64BIT)
@@ -26616,6 +26583,9 @@ ix86_data_alignment (tree type, int align, bool opt)
   if (!opt)
     return align;

+  if (align < BITS_PER_WORD)
+    align = BITS_PER_WORD;
+
   if (TREE_CODE (type) == ARRAY_TYPE)
     {
       if (TYPE_MODE (TREE_TYPE (type)) == DFmode && align < 64)


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]