Bug 84100 - [7 Regression] Function __attribute__((optimize(align-loops=32))) gives spurious warning and is ignored
Summary: [7 Regression] Function __attribute__((optimize(align-loops=32))) gives spuri...
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: c (show other bugs)
Version: 7.1.0
: P2 normal
Target Milestone: 9.0
Assignee: Martin Liška
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-01-29 12:47 UTC by Chris Hall
Modified: 2018-07-18 12:07 UTC (History)
1 user (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail: 6.4.0, 7.1.0, 8.0
Last reconfirmed: 2018-01-30 00:00:00


Attachments
gcc8-pr84100.patch (696 bytes, patch)
2018-01-30 14:06 UTC, Jakub Jelinek
Details | Diff
Summary of results using Compiler Explorer: for v5.4.0, 6.3.0 and 7.2.0. (110.49 KB, application/pdf)
2018-02-02 14:50 UTC, Chris Hall
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Chris Hall 2018-01-29 12:47:01 UTC
With v7.2.1, compiling for recent x86_64, the command line option "-falign-loops=32" is accepted and has the desired effect.

However I find that both:

    _Pragma("GCC optimize(\"align-loops=32\")")

and:

    __attribute__((optimize("align-loops=32")))

are rejected with a "warning: bad option '-falign-loops=32'" -- reported for each and every affected function.

I have tried optimize("align-loops=x") for x=0, 1, 8 and 16 -- all are rejected the same way.

FWIW, brief checks with Compiler Explorer suggest:

  1) this appears to be a new bug in v7, or at least
     since v6.3.

  2) v6.3 accepts the _Pragma(), but does *not* do the requested
     alignment.

  3) v5.4 accepts the _Pragma(), and does the requested alignment.

Chris
Comment 1 Richard Biener 2018-01-30 09:07:05 UTC
We have

falign-loops
Common Report Var(align_loops,0) Optimization UInteger
Align the start of loops.

falign-loops=
Common RejectNegative Joined UInteger Var(align_loops)

so maybe behavior WRT non-Optimization marked "aliases" changed?

Confirmed it works with GCC 6.
Comment 2 Jakub Jelinek 2018-01-30 13:45:30 UTC
Started with r237174.
Guess falign-loops= needs to be marked Optimization too.
Will check for other similar cases.
Comment 3 Jakub Jelinek 2018-01-30 14:06:04 UTC
Created attachment 43292 [details]
gcc8-pr84100.patch

Untested fix.
Comment 4 Jakub Jelinek 2018-01-31 08:27:23 UTC
Author: jakub
Date: Wed Jan 31 08:26:52 2018
New Revision: 257219

URL: https://gcc.gnu.org/viewcvs?rev=257219&root=gcc&view=rev
Log:
	PR c/84100
	* common.opt (falign-functions=, falign-jumps=, falign-labels=,
	falign-loops=): Add Optimization flag.

	* gcc.dg/pr84100.c: New test.

Added:
    trunk/gcc/testsuite/gcc.dg/pr84100.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/common.opt
    trunk/gcc/testsuite/ChangeLog
Comment 5 Jakub Jelinek 2018-01-31 08:28:29 UTC
Fixed on the trunk so far.
Comment 6 Chris Hall 2018-02-02 14:50:44 UTC
Created attachment 43325 [details]
Summary of results using Compiler Explorer: for v5.4.0, 6.3.0 and 7.2.0.

Setting -O2 or -O3 on the command line appears to imply -falign-functions=16 and -falign-loops=n, where n is at least 10 but less than 14.

Setting -falign-functions=32 on the command line does the requested alignment, plus the implied -falign-loops=n for -O2 and -O3 -- except for -Os, which does no alignment

Setting both -falign-functions=32 and -falign-loops=32 on the command line does both requested alignments -- except for -Os which does neither and -O0 which does only the function alignment.

So far, so good.

One might hope that using _Pragma() would have the same effect as the command line -falign-functions=32 and -falign-loops=32.  Sadly, this is not the case.  As reported, v7 rejects the pragmas.  But I find that v5.4.0 and v6.3.0, (a) don't always do the same thing and (b) don't always align loops as requested.

In particular: for -O2 and -O3, setting both align-functions=32 and align-loops=32 by _Pragma() will: align functions as requested, BUT loops are aligned to 8 (which is less than useful) !!  

The last set of results looks at compiling with command line -O0 through to -O3 (including -Og and -Os), with _Pragma() for -O3, align-functions=32 and align-loops=32.  The _Pragma() for -O3 seems to override the command line as far as this small test is concerned, except for -fomit-frame-pointer.  Sadly, align-loops=32 is still broken.
Comment 7 Chris Hall 2018-02-04 15:22:46 UTC
And here's a funny thing...

... if I compile "-O3 -falign-functions -falign-loops=32" I get the alignment I ask for.

... if I compile  "-O3 -falign-functions -falign-loops=32 -fno-tree-vectorize" I get the alignment I ask for (and I avoid Bug 84101).

... if I compile "-O3 -falign-functions -falign-loops=32" but with:

  _Pragma("GCC optimize(\"no-tree-vectorize\")")
  extern int func_x(int num) ;
  extern int func_y(int num) ;
  extern uint64_pair_t pair(int num) ;
  extern int func_z(int num) ;

I no longer get the alignment I asked for, I get the default alignment for -O3 !

And, for further amusement:

  extern int func_x(int num) ;
  extern int func_y(int num) ;
  extern uint64_pair_t pair(int num)  
                          __attribute__((optimize("no-tree-vectorize"))) ;
  extern int func_z(int num) ;

has exactly the same effect !

FWIW, I have tried _Pragma("GCC push/pop/reset_options") -- the effect of which is hard to determine; but I can say that pop/reset do not restore all optimization option settings.
Comment 8 Martin Liška 2018-02-06 13:51:21 UTC
I see one another problem with usage of the pragma:

$ cat lbm.i
void a (void) {}
#pragma GCC push_options
#pragma GCC optimize "align-functions=128"
void b (void) {}
#pragma GCC pop_options
void c (void) {}

$ ./xgcc -B. -O2 -S lbm.i -c -o/dev/stdout
	.file	"lbm.i"
	.text
	.p2align 4,,15
	.globl	a
	.type	a, @function
a:
.LFB0:
	.cfi_startproc
	ret
	.cfi_endproc
.LFE0:
	.size	a, .-a
	.p2align 4,,127 <-------- THIS is wrong as it should be '6,,127'
	.globl	b
	.type	b, @function
b:
.LFB1:
	.cfi_startproc
	ret
	.cfi_endproc
.LFE1:
	.size	b, .-b
...

I've got patch for that, let me Jakub take this PR.
Comment 9 Martin Liška 2018-02-06 15:43:44 UTC
And I see very similar problem for -falign-loops, ...
Comment 10 Martin Liška 2018-02-12 10:22:27 UTC
Ok, so I've spent quite some time on it and it's not easy. Problem is that we have default values for max skips that live here:

flags.h:
struct target_flag_state {
  /* Values of the -falign-* flags: how much to align labels in code.
     0 means `use default', 1 means `don't align'.
     For each variable, there is an _log variant which is the power
     of two not less than the variable, for .align output.  */
  int x_align_loops_log;
  int x_align_loops_max_skip;
  int x_align_jumps_log;
  int x_align_jumps_max_skip;
  int x_align_labels_log;
  int x_align_labels_max_skip;
  int x_align_functions_log;

  /* The excess precision currently in effect.  */
  enum excess_precision x_flag_excess_precision;
};

which is a target structure. Currently ix86_default_align is called as ix86_override_options_after_change hook. But problem with the hook is that we are unable to distinguish between an option set by used (-falign-function=xyz), or whether the default value was set in ix86_default_align after options were processed.

That said, the proper solution should be have both values as field of 'struct gcc_options'. That's probably done in https://gcc.gnu.org/ml/gcc-patches/2017-04/msg00720.html which was not merged yet.

I'll return to it in next stage1.
Comment 11 Chris Hall 2018-02-14 14:39:45 UTC
FWIW:  __attribute__((aligned(32))) works nicely for functions.

Generally there is little to be gained from aligning all loops/jumps/labels in a given function or group of functions.

Further, when code alignment issues bite, there is no guarantee that the solution is to align all loops/jumps/labels in the same way.

So, it would be nice to be able to apply __attribute__((aligned(n))) to loops and/or labels... and perhaps to blocks.  That way, precise alignment could be applied precisely where it matters.

I have been doing some detailed timing of relatively small operations, which of course involves a loop to do the operation being timed many times.  If the timing loop is not 32-byte aligned, the timings I get are not stable.
Comment 12 Martin Liška 2018-02-19 13:08:51 UTC
(In reply to Chris Hall from comment #11)
> FWIW:  __attribute__((aligned(32))) works nicely for functions.

You are right, that works fine! It should be equal to use

#pragma GCC optimize "align-functions=128"

or the corresponding attribute.


> 
> Generally there is little to be gained from aligning all loops/jumps/labels
> in a given function or group of functions.
> 
> Further, when code alignment issues bite, there is no guarantee that the
> solution is to align all loops/jumps/labels in the same way.
> 
> So, it would be nice to be able to apply __attribute__((aligned(n))) to
> loops and/or labels... and perhaps to blocks.  That way, precise alignment
> could be applied precisely where it matters.
> 
> I have been doing some detailed timing of relatively small operations, which
> of course involves a loop to do the operation being timed many times.  If
> the timing loop is not 32-byte aligned, the timings I get are not stable.

Such fine grained control would be nice, but it's very problematic to track such information from a front-end to back-end where the real alignment happens.
Comment 13 Martin Liška 2018-07-04 07:51:40 UTC
Author: marxin
Date: Wed Jul  4 07:51:08 2018
New Revision: 262375

URL: https://gcc.gnu.org/viewcvs?rev=262375&root=gcc&view=rev
Log:


2018-07-04  Denys Vlasenko  <dvlasenk@redhat.com>
	    Martin Liska  <mliska@suse.cz>

	PR middle-end/66240
	PR target/45996
	PR c/84100
	* common.opt: Rename align options with 'str_' prefix.
	* common/config/i386/i386-common.c (set_malign_value): New
	function.
	(ix86_handle_option): Use it to set -falign-* options/
	* config/aarch64/aarch64-protos.h (struct tune_params): Change
	type from int to string.
	* config/aarch64/aarch64.c: Update default values from int
	to string.
	* config/alpha/alpha.c (alpha_override_options_after_change):
	Likewise.
	* config/arm/arm.c (arm_override_options_after_change_1): Likewise.
	* config/i386/dragonfly.h (ASM_OUTPUT_MAX_SKIP_ALIGN): Print
	max skip conditionally.
	* config/i386/freebsd.h (SUBALIGN_LOG): New.
	(ASM_OUTPUT_MAX_SKIP_ALIGN): Print
	max skip conditionally.
	* config/i386/gas.h (ASM_OUTPUT_MAX_SKIP_ALIGN): Print
	max skip conditionally.
	* config/i386/gnu-user.h (SUBALIGN_LOG): New.
	(ASM_OUTPUT_MAX_SKIP_ALIGN): Print
	max skip conditionally.
	* config/i386/i386.c (struct ptt): Change type from int to
	string.
	(ix86_default_align): Set default values.
	* config/i386/i386.h (ASM_OUTPUT_MAX_SKIP_PAD): Print
	max skip conditionally.
	* config/i386/iamcu.h (SUBALIGN_LOG): New.
	(ASM_OUTPUT_MAX_SKIP_ALIGN):
	* config/i386/lynx.h (ASM_OUTPUT_MAX_SKIP_ALIGN):
	* config/i386/netbsd-elf.h (ASM_OUTPUT_MAX_SKIP_ALIGN): Print
	max skip conditionally.
	* config/i386/openbsdelf.h (SUBALIGN_LOG): New.
	(ASM_OUTPUT_MAX_SKIP_ALIGN) Print max skip conditionally.:
	* config/i386/x86-64.h (SUBALIGN_LOG): New.
	(ASM_OUTPUT_MAX_SKIP_ALIGN): Print
	max skip conditionally.
	(ASM_OUTPUT_MAX_SKIP_PAD): Likewise.
	* config/ia64/ia64.c (ia64_option_override): Set default values
        for alignment options.
	* config/m68k/m68k.c: Handle new str_align_* options.
	* config/mips/mips.c (mips_set_compression_mode): Change
	type of constants.
	(mips_option_override): Set default values for options.
	* config/powerpcspe/powerpcspe.c (rs6000_option_override_internal):
        Likewise.
	* config/rs6000/rs6000.c (rs6000_option_override_internal):
	Likewise.
	* config/rx/rx.c (rx_option_override): Likewise.
	* config/rx/rx.h (JUMP_ALIGN): Use align_jumps_log.
	(LABEL_ALIGN): Use align_labels_log.
	(LOOP_ALIGN): Use align_loops_align.
	* config/s390/s390.c (s390_asm_output_function_label): Use new
        macros.
	* config/sh/sh.c (sh_override_options_after_change):
	Change type of constants.
	* config/spu/spu.c (spu_sched_init): Likewise.
	* config/sparc/sparc.c (sparc_option_override): Set default
        values for options.
	* config/visium/visium.c (visium_option_override): Likewise.
	* config/visium/visium.h (ASM_OUTPUT_MAX_SKIP_ALIGN): Do not
        emit p2align format with last argument if it's not needed.
	* doc/invoke.texi: Document extended format of -falign-*.
	* final.c: Use align_labels alignment.
	* flags.h (struct target_flag_state): Change type to use
	align_flags.
	(struct align_flags_tuple): New.
	(struct align_flags): Likewise.
	(align_loops_log): Redefine macro to use new types.
	(align_loops_max_skip): Redefine macro to use new types.
	(align_jumps_log): Redefine macro to use new types.
	(align_jumps_max_skip): Redefine macro to use new types.
	(align_labels_log): Redefine macro to use new types.
	(align_labels_max_skip): Redefine macro to use new types.
	(align_functions_log): Redefine macro to use new types.
	(align_loops): Redefine macro to use new types.
	(align_jumps): Redefine macro to use new types.
	(align_labels): Redefine macro to use new types.
	(align_functions): Redefine macro to use new types.
	(align_functions_max_skip): Redefine macro to use new types.
	(align_loops_value): New macro.
	(align_jumps_value): New macro.
	(align_labels_value): New macro.
	(align_functions_value): New macro.
	* function.c (invoke_set_current_function_hook): Propagate
	alignment values from flags to global variables default in
	topleev.h.
	* ipa-icf.c (sem_function::equals_wpa): Use
	cl_optimization_option_eq instead of memcmp.
	* lto-streamer.h (cl_optimization_stream_out): Support streaming
	of string types.
	(cl_optimization_stream_in): Likewise.
	* optc-save-gen.awk: Support strings in cl_optimization.
	* opth-gen.awk: Likewise.
	* opts.c (finish_options): Remove error checking of invalid
	value ranges.
	(MAX_CODE_ALIGN): Remove.
	(MAX_CODE_ALIGN_VALUE): Likewise.
	(parse_and_check_align_values): New function.
	(check_alignment_argument): Likewise.
	(common_handle_option): Use check_alignment_argument.
	* opts.h (parse_and_check_align_values): Declare.
	* toplev.c (init_alignments): Remove.
	(read_log_maxskip): New.
	(parse_N_M): Likewise.
	(parse_alignment_opts): Likewise.
	(backend_init_target): Remove usage of init_alignments.
	* toplev.h (parse_alignment_opts): Declare.
	* tree-streamer-in.c (streamer_read_tree_bitfields): Add new
	argument.
	* tree-streamer-out.c (streamer_write_tree_bitfields): Likewise.
	* tree.c (cl_option_hasher::equal): New.
	* varasm.c: Use new global macros.
2018-07-04  Martin Liska  <mliska@suse.cz>

	PR middle-end/66240
	PR target/45996
	PR c/84100
	* lto.c (compare_tree_sccs_1): Use cl_optimization_option_eq
	instead of memcmp.
2018-07-04  Martin Liska  <mliska@suse.cz>

	PR middle-end/66240
	PR target/45996
	PR c/84100
	* gcc.dg/pr84100.c (foo):
	* gcc.target/i386/falign-functions-2.c: New test.
	* gcc.target/i386/falign-functions.c: New test.

Added:
    trunk/gcc/testsuite/gcc.target/i386/falign-functions-2.c
    trunk/gcc/testsuite/gcc.target/i386/falign-functions.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/common.opt
    trunk/gcc/common/config/i386/i386-common.c
    trunk/gcc/config/aarch64/aarch64-protos.h
    trunk/gcc/config/aarch64/aarch64.c
    trunk/gcc/config/alpha/alpha.c
    trunk/gcc/config/arm/arm.c
    trunk/gcc/config/i386/i386.c
    trunk/gcc/config/ia64/ia64.c
    trunk/gcc/config/m68k/m68k.c
    trunk/gcc/config/mips/mips.c
    trunk/gcc/config/powerpcspe/powerpcspe.c
    trunk/gcc/config/rs6000/rs6000.c
    trunk/gcc/config/rx/rx.c
    trunk/gcc/config/rx/rx.h
    trunk/gcc/config/s390/s390.c
    trunk/gcc/config/sh/sh.c
    trunk/gcc/config/sparc/sparc.c
    trunk/gcc/config/spu/spu.c
    trunk/gcc/config/spu/spu.h
    trunk/gcc/config/visium/visium.c
    trunk/gcc/config/visium/visium.h
    trunk/gcc/doc/invoke.texi
    trunk/gcc/final.c
    trunk/gcc/flags.h
    trunk/gcc/function.c
    trunk/gcc/ipa-icf.c
    trunk/gcc/lto-streamer.h
    trunk/gcc/lto/ChangeLog
    trunk/gcc/lto/lto.c
    trunk/gcc/optc-save-gen.awk
    trunk/gcc/opth-gen.awk
    trunk/gcc/opts.c
    trunk/gcc/opts.h
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/pr84100.c
    trunk/gcc/toplev.c
    trunk/gcc/toplev.h
    trunk/gcc/tree-streamer-in.c
    trunk/gcc/tree-streamer-out.c
    trunk/gcc/tree.c
    trunk/gcc/varasm.c
Comment 14 Martin Liška 2018-07-18 12:07:07 UTC
Fixed on trunk, not planning to backport that.