With v7.2.1, compiling for recent x86_64, the command line option "-falign-loops=32" is accepted and has the desired effect. However I find that both: _Pragma("GCC optimize(\"align-loops=32\")") and: __attribute__((optimize("align-loops=32"))) are rejected with a "warning: bad option '-falign-loops=32'" -- reported for each and every affected function. I have tried optimize("align-loops=x") for x=0, 1, 8 and 16 -- all are rejected the same way. FWIW, brief checks with Compiler Explorer suggest: 1) this appears to be a new bug in v7, or at least since v6.3. 2) v6.3 accepts the _Pragma(), but does *not* do the requested alignment. 3) v5.4 accepts the _Pragma(), and does the requested alignment. Chris
We have falign-loops Common Report Var(align_loops,0) Optimization UInteger Align the start of loops. falign-loops= Common RejectNegative Joined UInteger Var(align_loops) so maybe behavior WRT non-Optimization marked "aliases" changed? Confirmed it works with GCC 6.
Started with r237174. Guess falign-loops= needs to be marked Optimization too. Will check for other similar cases.
Created attachment 43292 [details] gcc8-pr84100.patch Untested fix.
Author: jakub Date: Wed Jan 31 08:26:52 2018 New Revision: 257219 URL: https://gcc.gnu.org/viewcvs?rev=257219&root=gcc&view=rev Log: PR c/84100 * common.opt (falign-functions=, falign-jumps=, falign-labels=, falign-loops=): Add Optimization flag. * gcc.dg/pr84100.c: New test. Added: trunk/gcc/testsuite/gcc.dg/pr84100.c Modified: trunk/gcc/ChangeLog trunk/gcc/common.opt trunk/gcc/testsuite/ChangeLog
Fixed on the trunk so far.
Created attachment 43325 [details] Summary of results using Compiler Explorer: for v5.4.0, 6.3.0 and 7.2.0. Setting -O2 or -O3 on the command line appears to imply -falign-functions=16 and -falign-loops=n, where n is at least 10 but less than 14. Setting -falign-functions=32 on the command line does the requested alignment, plus the implied -falign-loops=n for -O2 and -O3 -- except for -Os, which does no alignment Setting both -falign-functions=32 and -falign-loops=32 on the command line does both requested alignments -- except for -Os which does neither and -O0 which does only the function alignment. So far, so good. One might hope that using _Pragma() would have the same effect as the command line -falign-functions=32 and -falign-loops=32. Sadly, this is not the case. As reported, v7 rejects the pragmas. But I find that v5.4.0 and v6.3.0, (a) don't always do the same thing and (b) don't always align loops as requested. In particular: for -O2 and -O3, setting both align-functions=32 and align-loops=32 by _Pragma() will: align functions as requested, BUT loops are aligned to 8 (which is less than useful) !! The last set of results looks at compiling with command line -O0 through to -O3 (including -Og and -Os), with _Pragma() for -O3, align-functions=32 and align-loops=32. The _Pragma() for -O3 seems to override the command line as far as this small test is concerned, except for -fomit-frame-pointer. Sadly, align-loops=32 is still broken.
And here's a funny thing... ... if I compile "-O3 -falign-functions -falign-loops=32" I get the alignment I ask for. ... if I compile "-O3 -falign-functions -falign-loops=32 -fno-tree-vectorize" I get the alignment I ask for (and I avoid Bug 84101). ... if I compile "-O3 -falign-functions -falign-loops=32" but with: _Pragma("GCC optimize(\"no-tree-vectorize\")") extern int func_x(int num) ; extern int func_y(int num) ; extern uint64_pair_t pair(int num) ; extern int func_z(int num) ; I no longer get the alignment I asked for, I get the default alignment for -O3 ! And, for further amusement: extern int func_x(int num) ; extern int func_y(int num) ; extern uint64_pair_t pair(int num) __attribute__((optimize("no-tree-vectorize"))) ; extern int func_z(int num) ; has exactly the same effect ! FWIW, I have tried _Pragma("GCC push/pop/reset_options") -- the effect of which is hard to determine; but I can say that pop/reset do not restore all optimization option settings.
I see one another problem with usage of the pragma: $ cat lbm.i void a (void) {} #pragma GCC push_options #pragma GCC optimize "align-functions=128" void b (void) {} #pragma GCC pop_options void c (void) {} $ ./xgcc -B. -O2 -S lbm.i -c -o/dev/stdout .file "lbm.i" .text .p2align 4,,15 .globl a .type a, @function a: .LFB0: .cfi_startproc ret .cfi_endproc .LFE0: .size a, .-a .p2align 4,,127 <-------- THIS is wrong as it should be '6,,127' .globl b .type b, @function b: .LFB1: .cfi_startproc ret .cfi_endproc .LFE1: .size b, .-b ... I've got patch for that, let me Jakub take this PR.
And I see very similar problem for -falign-loops, ...
Ok, so I've spent quite some time on it and it's not easy. Problem is that we have default values for max skips that live here: flags.h: struct target_flag_state { /* Values of the -falign-* flags: how much to align labels in code. 0 means `use default', 1 means `don't align'. For each variable, there is an _log variant which is the power of two not less than the variable, for .align output. */ int x_align_loops_log; int x_align_loops_max_skip; int x_align_jumps_log; int x_align_jumps_max_skip; int x_align_labels_log; int x_align_labels_max_skip; int x_align_functions_log; /* The excess precision currently in effect. */ enum excess_precision x_flag_excess_precision; }; which is a target structure. Currently ix86_default_align is called as ix86_override_options_after_change hook. But problem with the hook is that we are unable to distinguish between an option set by used (-falign-function=xyz), or whether the default value was set in ix86_default_align after options were processed. That said, the proper solution should be have both values as field of 'struct gcc_options'. That's probably done in https://gcc.gnu.org/ml/gcc-patches/2017-04/msg00720.html which was not merged yet. I'll return to it in next stage1.
FWIW: __attribute__((aligned(32))) works nicely for functions. Generally there is little to be gained from aligning all loops/jumps/labels in a given function or group of functions. Further, when code alignment issues bite, there is no guarantee that the solution is to align all loops/jumps/labels in the same way. So, it would be nice to be able to apply __attribute__((aligned(n))) to loops and/or labels... and perhaps to blocks. That way, precise alignment could be applied precisely where it matters. I have been doing some detailed timing of relatively small operations, which of course involves a loop to do the operation being timed many times. If the timing loop is not 32-byte aligned, the timings I get are not stable.
(In reply to Chris Hall from comment #11) > FWIW: __attribute__((aligned(32))) works nicely for functions. You are right, that works fine! It should be equal to use #pragma GCC optimize "align-functions=128" or the corresponding attribute. > > Generally there is little to be gained from aligning all loops/jumps/labels > in a given function or group of functions. > > Further, when code alignment issues bite, there is no guarantee that the > solution is to align all loops/jumps/labels in the same way. > > So, it would be nice to be able to apply __attribute__((aligned(n))) to > loops and/or labels... and perhaps to blocks. That way, precise alignment > could be applied precisely where it matters. > > I have been doing some detailed timing of relatively small operations, which > of course involves a loop to do the operation being timed many times. If > the timing loop is not 32-byte aligned, the timings I get are not stable. Such fine grained control would be nice, but it's very problematic to track such information from a front-end to back-end where the real alignment happens.
Author: marxin Date: Wed Jul 4 07:51:08 2018 New Revision: 262375 URL: https://gcc.gnu.org/viewcvs?rev=262375&root=gcc&view=rev Log: 2018-07-04 Denys Vlasenko <dvlasenk@redhat.com> Martin Liska <mliska@suse.cz> PR middle-end/66240 PR target/45996 PR c/84100 * common.opt: Rename align options with 'str_' prefix. * common/config/i386/i386-common.c (set_malign_value): New function. (ix86_handle_option): Use it to set -falign-* options/ * config/aarch64/aarch64-protos.h (struct tune_params): Change type from int to string. * config/aarch64/aarch64.c: Update default values from int to string. * config/alpha/alpha.c (alpha_override_options_after_change): Likewise. * config/arm/arm.c (arm_override_options_after_change_1): Likewise. * config/i386/dragonfly.h (ASM_OUTPUT_MAX_SKIP_ALIGN): Print max skip conditionally. * config/i386/freebsd.h (SUBALIGN_LOG): New. (ASM_OUTPUT_MAX_SKIP_ALIGN): Print max skip conditionally. * config/i386/gas.h (ASM_OUTPUT_MAX_SKIP_ALIGN): Print max skip conditionally. * config/i386/gnu-user.h (SUBALIGN_LOG): New. (ASM_OUTPUT_MAX_SKIP_ALIGN): Print max skip conditionally. * config/i386/i386.c (struct ptt): Change type from int to string. (ix86_default_align): Set default values. * config/i386/i386.h (ASM_OUTPUT_MAX_SKIP_PAD): Print max skip conditionally. * config/i386/iamcu.h (SUBALIGN_LOG): New. (ASM_OUTPUT_MAX_SKIP_ALIGN): * config/i386/lynx.h (ASM_OUTPUT_MAX_SKIP_ALIGN): * config/i386/netbsd-elf.h (ASM_OUTPUT_MAX_SKIP_ALIGN): Print max skip conditionally. * config/i386/openbsdelf.h (SUBALIGN_LOG): New. (ASM_OUTPUT_MAX_SKIP_ALIGN) Print max skip conditionally.: * config/i386/x86-64.h (SUBALIGN_LOG): New. (ASM_OUTPUT_MAX_SKIP_ALIGN): Print max skip conditionally. (ASM_OUTPUT_MAX_SKIP_PAD): Likewise. * config/ia64/ia64.c (ia64_option_override): Set default values for alignment options. * config/m68k/m68k.c: Handle new str_align_* options. * config/mips/mips.c (mips_set_compression_mode): Change type of constants. (mips_option_override): Set default values for options. * config/powerpcspe/powerpcspe.c (rs6000_option_override_internal): Likewise. * config/rs6000/rs6000.c (rs6000_option_override_internal): Likewise. * config/rx/rx.c (rx_option_override): Likewise. * config/rx/rx.h (JUMP_ALIGN): Use align_jumps_log. (LABEL_ALIGN): Use align_labels_log. (LOOP_ALIGN): Use align_loops_align. * config/s390/s390.c (s390_asm_output_function_label): Use new macros. * config/sh/sh.c (sh_override_options_after_change): Change type of constants. * config/spu/spu.c (spu_sched_init): Likewise. * config/sparc/sparc.c (sparc_option_override): Set default values for options. * config/visium/visium.c (visium_option_override): Likewise. * config/visium/visium.h (ASM_OUTPUT_MAX_SKIP_ALIGN): Do not emit p2align format with last argument if it's not needed. * doc/invoke.texi: Document extended format of -falign-*. * final.c: Use align_labels alignment. * flags.h (struct target_flag_state): Change type to use align_flags. (struct align_flags_tuple): New. (struct align_flags): Likewise. (align_loops_log): Redefine macro to use new types. (align_loops_max_skip): Redefine macro to use new types. (align_jumps_log): Redefine macro to use new types. (align_jumps_max_skip): Redefine macro to use new types. (align_labels_log): Redefine macro to use new types. (align_labels_max_skip): Redefine macro to use new types. (align_functions_log): Redefine macro to use new types. (align_loops): Redefine macro to use new types. (align_jumps): Redefine macro to use new types. (align_labels): Redefine macro to use new types. (align_functions): Redefine macro to use new types. (align_functions_max_skip): Redefine macro to use new types. (align_loops_value): New macro. (align_jumps_value): New macro. (align_labels_value): New macro. (align_functions_value): New macro. * function.c (invoke_set_current_function_hook): Propagate alignment values from flags to global variables default in topleev.h. * ipa-icf.c (sem_function::equals_wpa): Use cl_optimization_option_eq instead of memcmp. * lto-streamer.h (cl_optimization_stream_out): Support streaming of string types. (cl_optimization_stream_in): Likewise. * optc-save-gen.awk: Support strings in cl_optimization. * opth-gen.awk: Likewise. * opts.c (finish_options): Remove error checking of invalid value ranges. (MAX_CODE_ALIGN): Remove. (MAX_CODE_ALIGN_VALUE): Likewise. (parse_and_check_align_values): New function. (check_alignment_argument): Likewise. (common_handle_option): Use check_alignment_argument. * opts.h (parse_and_check_align_values): Declare. * toplev.c (init_alignments): Remove. (read_log_maxskip): New. (parse_N_M): Likewise. (parse_alignment_opts): Likewise. (backend_init_target): Remove usage of init_alignments. * toplev.h (parse_alignment_opts): Declare. * tree-streamer-in.c (streamer_read_tree_bitfields): Add new argument. * tree-streamer-out.c (streamer_write_tree_bitfields): Likewise. * tree.c (cl_option_hasher::equal): New. * varasm.c: Use new global macros. 2018-07-04 Martin Liska <mliska@suse.cz> PR middle-end/66240 PR target/45996 PR c/84100 * lto.c (compare_tree_sccs_1): Use cl_optimization_option_eq instead of memcmp. 2018-07-04 Martin Liska <mliska@suse.cz> PR middle-end/66240 PR target/45996 PR c/84100 * gcc.dg/pr84100.c (foo): * gcc.target/i386/falign-functions-2.c: New test. * gcc.target/i386/falign-functions.c: New test. Added: trunk/gcc/testsuite/gcc.target/i386/falign-functions-2.c trunk/gcc/testsuite/gcc.target/i386/falign-functions.c Modified: trunk/gcc/ChangeLog trunk/gcc/common.opt trunk/gcc/common/config/i386/i386-common.c trunk/gcc/config/aarch64/aarch64-protos.h trunk/gcc/config/aarch64/aarch64.c trunk/gcc/config/alpha/alpha.c trunk/gcc/config/arm/arm.c trunk/gcc/config/i386/i386.c trunk/gcc/config/ia64/ia64.c trunk/gcc/config/m68k/m68k.c trunk/gcc/config/mips/mips.c trunk/gcc/config/powerpcspe/powerpcspe.c trunk/gcc/config/rs6000/rs6000.c trunk/gcc/config/rx/rx.c trunk/gcc/config/rx/rx.h trunk/gcc/config/s390/s390.c trunk/gcc/config/sh/sh.c trunk/gcc/config/sparc/sparc.c trunk/gcc/config/spu/spu.c trunk/gcc/config/spu/spu.h trunk/gcc/config/visium/visium.c trunk/gcc/config/visium/visium.h trunk/gcc/doc/invoke.texi trunk/gcc/final.c trunk/gcc/flags.h trunk/gcc/function.c trunk/gcc/ipa-icf.c trunk/gcc/lto-streamer.h trunk/gcc/lto/ChangeLog trunk/gcc/lto/lto.c trunk/gcc/optc-save-gen.awk trunk/gcc/opth-gen.awk trunk/gcc/opts.c trunk/gcc/opts.h trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gcc.dg/pr84100.c trunk/gcc/toplev.c trunk/gcc/toplev.h trunk/gcc/tree-streamer-in.c trunk/gcc/tree-streamer-out.c trunk/gcc/tree.c trunk/gcc/varasm.c
Fixed on trunk, not planning to backport that.