[PATCH] Extend -falign-FOO=N to N[,M]: the second number is max padding

Denys Vlasenko dvlasenk@redhat.com
Fri Aug 12 00:20:00 GMT 2016



On 08/11/2016 10:59 PM, Andrew Pinski wrote:
> On Thu, Aug 11, 2016 at 1:49 PM, Denys Vlasenko <dvlasenk@redhat.com> wrote:
>> falign-functions=N is too simplistic.
>>
>> Ingo Molnar ran some tests and it looks on latest x86 CPUs, 64-byte alignment
>> runs fastest (he tried many other possibilites).
>>
>> However, developers are less than thrilled by the idea of a slam-dunk 64-byte
>> aligning everything. Too much waste:
>>         On 05/20/2015 02:47 AM, Linus Torvalds wrote:
>>         > At the same time, I have to admit that I abhor a 64-byte function
>>         > alignment, when we have a fair number of functions that are (much)
>>         > smaller than that.
>>         >
>>         > Is there some way to get gcc to take the size of the function into
>>         > account? Because aligning a 16-byte or 32-byte function on a 64-byte
>>         > alignment is just criminally nasty and wasteful.
>>
>> This change makes it possible to align function to 64-byte boundaries *IF*
>> this does not introduce huge amount of padding.
>>
>> Patch drops forced alignment to 8 if requested alignment is higher than 8:
>> before the patch, -falign-functions=9 was generating
>>
>>         .p2align 4,,8
>>         .p2align 3
>>
>> which means: "align to 16 if the skip is 8 bytes or less; else align to 8".
>> After this change, ".p2align 3" is not emitted.
>>
>> It is dropped because I ultimately want to do something
>> like -falign-functions=64,8 - IOW, I want to align functions to 64 bytes,
>> but only if that generates padding of less than 8 bytes - otherwise I want
>> *no alignment at all*. The forced ".p2align 3" interferes with that intention.
>>
>> Testing:
>> tested that with -falign-functions=N (tried 8, 15, 16, 17...) the alignment
>> directives are the same before and after the patch.
>> Tested that -falign-functions=N,N (two equal paramenters) works exactly
>> like -falign-functions=N.
>
> Two things I noticed you missed:
> 1) ChangeLog entries (this is required as per the GNU coding style)

Is this form okay? -

...
Testing:
tested that with -falign-functions=N (tried 8, 15, 16, 17...) the alignment
directives are the same before and after the patch.
Tested that -falign-functions=N,N (two equal paramenters) works exactly
like -falign-functions=N.

2016-08-11  Denys Vlasenko  <dvlasenk@redhat.com>

     * common.opt: Change definitions so that -falign_FOO= accepts a string.
     * config/i386/freebsd.h: Remove "If N is large, do at least 8 byte
     alignment" code.
     * config/i386/gnu-user.h: Likewise.
     * config/i386/iamcu.h: Likewise.
     * config/i386/openbsdelf.h: Likewise.
     * config/i386/x86-64.h: Likewise.
     * flags.h (struct target_flag_state): Add x_align_functions_max_skip
     member.
     * toplev.c (parse_N_M): New function.
     (init_alignments): Set align_FOO_log, align_FOO, align_FOO_max_skip
     from specified --align_FOO=N{,M} option
     * varasm.c (assemble_start_function): Use align_functions_max_skip
     instead of align_functions - 1.

Index: gcc/common.opt
===================================================================
....


> 2) Documentation changes.
>
>>
>> Index: gcc/common.opt
>> ===================================================================
...

>>  #define ASM_OUTPUT_MAX_SKIP_ALIGN(FILE,LOG,MAX_SKIP)                   \
>>    do {                                                                 \
>>      if ((LOG) != 0) {                                                  \
>> -      if ((MAX_SKIP) == 0) fprintf ((FILE), "\t.p2align %d\n", (LOG)); \
>> -      else {                                                           \
>> +      if ((MAX_SKIP) == 0 || (MAX_SKIP) >= (1<<(LOG)))                 \
>> +        fprintf ((FILE), "\t.p2align %d\n", (LOG));                    \
>> +      else                                                             \
>>         fprintf ((FILE), "\t.p2align %d,,%d\n", (LOG), (MAX_SKIP));     \
>> -       /* Make sure that we have at least 8 byte alignment if > 8 byte \
>> -          alignment is preferred.  */                                  \
>> -       if ((LOG) > 3                                                   \
>> -           && (1 << (LOG)) > ((MAX_SKIP) + 1)                          \
>> -           && (MAX_SKIP) >= 7)                                         \
>> -         fputs ("\t.p2align 3\n", (FILE));                             \
>> -      }                                                                        \
>>      }                                                                  \
>>    } while (0)
>>  #undef  ASM_OUTPUT_MAX_SKIP_PAD
>
> These looks like a bug fix, it most likely should be sent separately;
> maybe even first.

ok

> Also can you make sure the generic ASM_OUTPUT_MAX_SKIP_ALIGN is done correctly?

If by generic you mean "non-arch-specific", the situation is as follows:
if arch does not define ASM_OUTPUT_MAX_SKIP_ALIGN(file, align, max_skip), then
ASM_OUTPUT_ALIGN_WITH_NOP(file, align) or
ASM_OUTPUT_ALIGN (file, align) macros are used.
As you se from their arguments, those macros don't take "max_skip" argument.
IOW: even in existing code, they ignore max_skip, and always pad to the
specified alignment.
Whether this is considered "correct" or not, is up to you. The patch
changes nothing for those arches.

I checked all arches which do have ASM_OUTPUT_MAX_SKIP_ALIGN.

visium has the same "align at least to 8" logic like many x86 flavors,
do you think it should be removed?

rs6000, aarch64, arm use ".p2align %d,,%d" logic with no special cases,
look good to me.

rx uses ".balign %d,3,%d" and special-cases TARGET_AS100_SYNTAX,
this arch is probably fine too.



More information about the Gcc-patches mailing list