Bug 61925 - [4.9 Regression] internal error when using vectorization on CPU without SSE
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 4.9.1
Importance: P2 normal
Target Milestone: 5.0
Assignee: Jakub Jelinek
Depends on:
Reported: 2014-07-26 22:18 UTC by mikulas
Modified: 2016-08-03 11:10 UTC

See Also:
Host: x86_64-unknown-linux-gnu
Target: x86_64-unknown-linux-gnu
Build: x86_64-unknown-linux-gnu
Known to work: 5.0
Known to fail: 4.9.4
Last reconfirmed: 2014-07-27 00:00:00

a test case (11.55 KB, text/plain; charset=ISO-8859-2)
2014-07-26 22:18 UTC, mikulas
gcc5-pr61925.patch (1.76 KB, patch)
2015-01-28 14:47 UTC, Jakub Jelinek

Description mikulas 2014-07-26 22:18:36 UTC
Created attachment 33192
a test case

Compile the attached file with "-O3 -m32 -march=i386". You get an internal compiler error:

vector.c: In function 'f':
vector.c:4:1: warning: SSE vector return without SSE enabled changes the ABI [-Wpsabi]
vector.c:3:38: note: The ABI for passing parameters with 16-byte alignment has changed in GCC 4.6
 __attribute__((vector_size(16))) int f(__attribute__((vector_size(16))) int a, __attribute__((vector_size(16))) int b)
vector.c:3:38: warning: SSE vector argument without SSE enabled changes the ABI [-Wpsabi]
vector.c:5:2: internal compiler error: in convert_move, at expr.c:333
  return a + b;
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://gcc.gnu.org/bugs.html> for instructions.
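
The attachment itself is not reproduced in this report. Judging only from the diagnostics above, the function at vector.c:3-5 is a plain 16-byte integer vector add. A minimal sketch of what that code is expected to compute when it does compile (for example on a target with SSE enabled) might look like the following. The typedef name v4si and the helper check_f are assumptions made for illustration; the diagnostics show the attribute written inline.

```c
#include <assert.h>

/* Hypothetical typedef name; the report spells the attribute out inline. */
typedef int v4si __attribute__((vector_size(16)));

/* Element-wise add of two 4 x int vectors, as in "return a + b;" above. */
static v4si f(v4si a, v4si b)
{
    return a + b;
}

/* Small self-check: each lane is added independently of the others. */
static void check_f(void)
{
    v4si a = { 1, 2, 3, 4 };
    v4si b = { 10, 20, 30, 40 };
    v4si c = f(a, b);
    assert(c[0] == 11 && c[1] == 22 && c[2] == 33 && c[3] == 44);
}
```

The sketch relies only on GCC's generic vector extension, so it says nothing about the ICE itself; that needs the "-m32 -march=i386" options from the report.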
Comment 1 Marek Polacek 2014-07-27 09:49:32 UTC
Confirmed.  Happens since 4.6.
Comment 2 Marek Polacek 2014-07-27 13:58:52 UTC
Started with r162918.
Comment 3 Richard Biener 2014-11-24 13:01:56 UTC
I will have a look.
Comment 4 Richard Biener 2014-11-25 10:14:13 UTC
On trunk I get

vector.c:6:1: error: unrecognizable insn:
(insn 3 2 4 2 (set (reg/v:TI 101 [ a ])
        (mem/c:TI (plus:SI (reg/f:SI 81 virtual-incoming-args)
                (const_int 16 [0x10])) [1 a+0 S16 A128])) vector.c:4 -1
     (expr_list:REG_EQUIV (mem/c:TI (plus:SI (reg/f:SI 81 virtual-incoming-args)
                (const_int 16 [0x10])) [1 a+0 S16 A128])


Reduced testcase for the convert_move ICE on the branches, ICEs at -m32 -march=i386:

#pragma GCC push_options
#pragma GCC target("sse")
typedef float __m128 __attribute__ ((__vector_size__ (16), __may_alias__));
typedef long long __m128i __attribute__ ((__vector_size__ (16), __may_alias__));
__m128i _mm_castps_si128(__m128 __A) { }
#pragma GCC pop_options
__attribute__((vector_size(16))) int
f(__attribute__((vector_size(16))) int a,
  __attribute__((vector_size(16))) int b)
{
  return a + b;
}

This seems to be fixed on trunk.

Reduced testcase for the ICE on trunk, ICEs at -m32 -march=i386:

#pragma GCC push_options
#pragma GCC target("sse")
typedef float __m128 __attribute__ ((__vector_size__ (16), __may_alias__));
extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_undefined_ps (void) {
}
#pragma GCC target("sse2")
#pragma GCC pop_options
__attribute__((vector_size(16))) int f(__attribute__((vector_size(16))) int a, __attribute__((vector_size(16))) int b) {
   return a + b;
}

Thus it requires an sse2 target pragma inside the pushed region.

Would be interesting to know what fixed the convert_move ICE on trunk.
Comment 5 Jakub Jelinek 2014-12-19 13:28:57 UTC
GCC 4.8.4 has been released.
Comment 6 Jakub Jelinek 2015-01-28 08:28:01 UTC
The first testcase in #c4 got fixed with r217633.
Comment 7 Jakub Jelinek 2015-01-28 12:24:14 UTC
So, it seems this is a complete mess.
The reason why we ICE is that the target pragma support is broken.

The main issue I see is that ix86_reset_to_default_globals doesn't actually reset to the defaults (== target_option_default_node), but to the current target pragma (== target_option_current_node), and even that only if ix86_previous_fndecl was previously non-NULL and had a non-NULL target-specific option.  And then ix86_set_current_function, for some strange reason, special-cases the defaults (i.e. NULL or == target_option_default_node), both for the old and the new, rather than the current target pragma (== target_option_current_node).

So, the important question is: is there any reason why, in between functions, the target options (both in global_options and in the target globals) should be set to something other than the defaults (== target_option_default_node)?  I mean, it is hard to guess what state they are in anyway, because ix86_set_current_function, when going to NULL, keeps the latest state; so, say, an __attribute__((target ("avx2"))) function definition in a #pragma GCC target ("avx") region will leave things in the avx2 state afterwards.

Also, I wonder about the start of ix86_pragma_target_parse: shouldn't prev_tree be set to target_option_current_node rather than to whatever happens to be in global_options at that point?  Or is keeping global_options matching the currently active target pragma needed for, say, vector type modes?  If yes, then for the target_option_current_node != target_option_default_node case we should arrange that whenever we call ix86_set_current_function with NULL, we also restore global_options to that.
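
The stale-state problem described above can be illustrated with a toy model. This is only a sketch under loud assumptions: the enum, the struct and every function name below are hypothetical stand-ins, not GCC's data structures or the code in i386.c. It models a #pragma GCC target ("avx") region containing one function with a target ("avx2") attribute, and compares "keep the latest state when leaving the function" with "restore the current pragma's options when leaving the function".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ISA levels standing in for full option sets. */
enum isa { ISA_I386, ISA_SSE, ISA_SSE2, ISA_AVX, ISA_AVX2 };

/* Toy model of the three pieces of state discussed in the comment. */
struct option_model {
    enum isa default_node;   /* models target_option_default_node */
    enum isa current_node;   /* models target_option_current_node */
    enum isa global_options; /* models the live option state */
};

/* Entering a function: its own target attribute wins, otherwise
   the pragma currently in effect applies. */
static void enter_function(struct option_model *m, const enum isa *fn_isa)
{
    assert(m != NULL);
    m->global_options = fn_isa ? *fn_isa : m->current_node;
}

/* Modelled old behaviour: going back to "no function" restores nothing. */
static void leave_function_old(struct option_model *m)
{
    (void)m;
}

/* Modelled alternative: going back to "no function" restores the
   options of the active pragma. */
static void leave_function_restoring(struct option_model *m)
{
    m->global_options = m->current_node;
}

/* State seen at file scope after the avx2 function inside the avx region. */
static enum isa state_after_function(int restore)
{
    struct option_model m = { ISA_I386, ISA_AVX, ISA_AVX };
    const enum isa fn_isa = ISA_AVX2;

    enter_function(&m, &fn_isa);
    if (restore)
        leave_function_restoring(&m);
    else
        leave_function_old(&m);
    return m.global_options;
}
```

In this model state_after_function(0) leaves the avx2 state behind, while state_after_function(1) returns to the pragma's avx state.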

And there is another thing: I've noticed nested ix86_valid_target_attribute_tree calls.  When we are in a pragma target region, decl_attributes calls this function again, and that happens even when ix86_add_new_builtins is called from this function.  I wonder if we shouldn't temporarily clear current_target_pragma; I think the target builtin decls don't really need it, and if #pragma GCC target isn't used (but the target attribute instead), it isn't done anyway.
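
The "temporarily clear" idea is the usual save, clear, restore idiom. A minimal sketch, assuming a plain string pointer as a stand-in for current_target_pragma and a hypothetical create_builtin_decl helper (neither matches GCC's real types or functions):

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the pragma state; in this sketch NULL means "no pragma". */
static const char *current_target_pragma = NULL;

/* Hypothetical helper: reports which pragma a newly created decl would see. */
static const char *create_builtin_decl(void)
{
    return current_target_pragma;
}

/* Create a builtin decl with the pragma hidden, then put the pragma back. */
static const char *add_builtin_without_pragma(void)
{
    const char *saved = current_target_pragma;
    const char *seen;

    current_target_pragma = NULL;   /* hide the active pragma */
    seen = create_builtin_decl();   /* the decl sees no pragma */
    current_target_pragma = saved;  /* restore for the caller */

    assert(current_target_pragma == saved);
    return seen;
}
```

The point of the pattern is that the caller's pragma state is unchanged afterwards, while the builtin decl is created as if no pragma were active.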
Comment 8 Jakub Jelinek 2015-01-28 14:47:52 UTC
Created attachment 34608
gcc5-pr61925.patch

Untested fix that keeps the current #pragma GCC target options in global_options when outside of functions.  Passed make check-gcc RUNTESTFLAGS='--target_board=unix\{-m32,-m64\} i386.exp' so far.
Comment 9 Jakub Jelinek 2015-02-11 10:04:45 UTC
Author: jakub
Date: Wed Feb 11 10:04:14 2015
New Revision: 220609

URL: https://gcc.gnu.org/viewcvs?rev=220609&root=gcc&view=rev
	PR target/61925
	* config/i386/i386.c (ix86_reset_to_default_globals): Removed.
	(ix86_reset_previous_fndecl): Restore it here, unconditionally.
	(ix86_set_current_function): Rewritten.
	(ix86_add_new_builtins): Temporarily clear current_target_pragma
	when creating builtin fndecls.

	* gcc.target/i386/pr61925-1.c: New test.
	* gcc.target/i386/pr61925-2.c: New test.
	* gcc.target/i386/pr61925-3.c: New test.

Comment 10 Jakub Jelinek 2015-02-11 10:15:29 UTC
Fixed on the trunk so far.
Comment 11 Richard Biener 2015-06-23 08:18:14 UTC
The gcc-4_8-branch is being closed, re-targeting regressions to 4.9.3.
Comment 12 Jakub Jelinek 2015-06-26 19:54:23 UTC
GCC 4.9.3 has been released.
Comment 13 Richard Biener 2016-08-03 11:10:55 UTC