For this C++ code: long _mm_set_epi64x___q0; __attribute__((__vector_size__(2 * sizeof(long)))) long long _mm_set_epi64x() { return (__attribute__((__vector_size__(2 * sizeof(long)))) long long){ _mm_set_epi64x___q0}; } compiled by recent gcc and compiler flag -ftrivial-auto-var-init=zero, does this: $ /home/dcb/gcc/results/bin/gcc -ftrivial-auto-var-init=zero bug756.cc bug756.cc: In function ‘__vector(2) long long int _mm_set_epi64x()’: bug756.cc:2:62: error: invalid operand in return statement 2 | _attribute__((__vector_size__(2 * sizeof(long)))) long long _mm_set_epi64x() { | ^~~~~~~~~~~~~~ D.2371 return D.2371; bug756.cc:2:62: internal compiler error: ‘verify_gimple’ failed 0x105dc3a verify_gimple_in_seq(gimple*) ../../trunk.git/gcc/tree-cfg.c:5228 Taking the new flag off causes the code to compile ok: $ /home/dcb/gcc/results/bin/gcc -c bug756.cc $
Confirmed, works with the C frontend. The issue is that we do _mm_set_epi64x___q0.0_1 = _mm_set_epi64x___q0; D.2371 = {_mm_set_epi64x___q0.0_1}; __builtin_clear_padding (&D.2371, 0B, 1); return D.2371; so __builtin_clear_padding marks the object addressable which in turn makes it no longer a is_gimple_val and thus it cannot be returned directly. I wonder if we can have a more friendly __builtin_clear_padding as internal-function doing sth like val = .IFN_CLEAR_PADDING (val, 0B, 1); thus embedding a RMW cycle on an actual value?
__builtin_clear_padding when folded emits a series of memory stores rather than BIT_INSERT_EXPR etc., so that wouldn't work. But, IMNSHO, -ftrivial-auto-var-init* shouldn't be adding __builtin_clear_padding calls at all for objects of types that can't have any padding. Currently one can do e.g. what my r12-3455-g8122fbff770bcff183a9c3c72e8092c0ca32150b does for OpenMP atomics, + bool clear_padding = false; + if (BITS_PER_UNIT == 8 && CHAR_BIT == 8) + { + HOST_WIDE_INT sz = int_size_in_bytes (cmptype), i; + gcc_assert (sz > 0); + unsigned char *buf = XALLOCAVEC (unsigned char, sz); + memset (buf, ~0, sz); + clear_type_padding_in_mask (cmptype, buf); + for (i = 0; i < sz; i++) + if (buf[i] != (unsigned char) ~0) + { + clear_padding = true; + break; + } + } so that when nothing needs to be padded (the usual case for non-struct/union types unless they have extended long double), the builtin isn't added at all. I doubt we support vectors of long double, so it is mainly whether returning of long double or _Complex long double works fine.
On Mon, 13 Sep 2021, jakub at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102281 > > --- Comment #2 from Jakub Jelinek <jakub at gcc dot gnu.org> --- > __builtin_clear_padding when folded emits a series of memory stores rather than > BIT_INSERT_EXPR etc., so that wouldn't work. > But, IMNSHO, -ftrivial-auto-var-init* shouldn't be adding > __builtin_clear_padding calls at all for objects of types that can't have any > padding. > Currently one can do e.g. what my > r12-3455-g8122fbff770bcff183a9c3c72e8092c0ca32150b does for OpenMP atomics, > + bool clear_padding = false; > + if (BITS_PER_UNIT == 8 && CHAR_BIT == 8) > + { > + HOST_WIDE_INT sz = int_size_in_bytes (cmptype), i; > + gcc_assert (sz > 0); > + unsigned char *buf = XALLOCAVEC (unsigned char, sz); > + memset (buf, ~0, sz); > + clear_type_padding_in_mask (cmptype, buf); > + for (i = 0; i < sz; i++) > + if (buf[i] != (unsigned char) ~0) > + { > + clear_padding = true; > + break; > + } > + } > so that when nothing needs to be padded (the usual case for non-struct/union > types unless they have extended long double), the builtin isn't added at all. > I doubt we support vectors of long double, so it is mainly whether returning of > long double or _Complex long double works fine. for this specific issue it might be enough to not bother initializing padding of is_gimple_reg_type types, I doubt we have any of those that have padding (and if we'd have then the issue would re-surface)
(In reply to rguenther@suse.de from comment #3) > > I doubt we support vectors of long double, so it is mainly whether returning of > > long double or _Complex long double works fine. > > for this specific issue it might be enough to not bother initializing > padding of is_gimple_reg_type types, I doubt we have any of those > that have padding (and if we'd have then the issue would re-surface) long double and _Complex long double do have padding and clearing it is desirable (for -ftrivial-auto-var-init*) or a requirement (e.g. for atomics, whether OpenMP or std::atomic (unfortunately not implemented yet in libstdc++). The hw instruction really do store only 10 bytes and leave the rest uninitialized...
On Mon, 13 Sep 2021, jakub at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102281 > > --- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> --- > (In reply to rguenther@suse.de from comment #3) > > > I doubt we support vectors of long double, so it is mainly whether returning of > > > long double or _Complex long double works fine. > > > > for this specific issue it might be enough to not bother initializing > > padding of is_gimple_reg_type types, I doubt we have any of those > > that have padding (and if we'd have then the issue would re-surface) > > long double and _Complex long double do have padding and clearing it is > desirable (for -ftrivial-auto-var-init*) or a requirement (e.g. for atomics, > whether OpenMP or std::atomic (unfortunately not implemented yet in libstdc++). > The hw instruction really do store only 10 bytes and leave the rest > uninitialized... But then we still cannot do the GIMPLE this way. It's a case where it is problematic to mark something as address-taken when some gimplification already happened, that's usually a bad idea. So I fear that for those cases we have to use alternate GIMPLE to do the padding clearing - can we force-"fold" the clear-padding immediately somehow?
(In reply to rguenther@suse.de from comment #5) > But then we still cannot do the GIMPLE this way. It's a case where > it is problematic to mark something as address-taken when some > gimplification already happened, that's usually a bad idea. > > So I fear that for those cases we have to use alternate GIMPLE > to do the padding clearing - can we force-"fold" the clear-padding > immediately somehow? Sure, for long double/_Complex long double, the primary question is what the flag wants to achieve. Because for non-addressable vars of such types, the clearing of padding isn't all that useful, when those variables live in the FPU stack they are really 10-byte and so the padding doesn't come up together with that. Any copy of the long double values from one place to another will not copy the padding bits... So, perhaps for a security feature non-TREE_ADDRESSABLE decls of such types shouldn't have padding cleared during gimplification, but something should clear that padding when spilling those types to the stack (perhaps e.g. in the function prologue? And as long as the spill slots aren't shared with values of types that don't have similar padding).
(In reply to Jakub Jelinek from comment #2) > __builtin_clear_padding when folded emits a series of memory stores rather > than BIT_INSERT_EXPR etc., so that wouldn't work. > But, IMNSHO, -ftrivial-auto-var-init* shouldn't be adding > __builtin_clear_padding calls at all for objects of types that can't have > any padding. > Currently one can do e.g. what my > r12-3455-g8122fbff770bcff183a9c3c72e8092c0ca32150b does for OpenMP atomics, > + bool clear_padding = false; > > + if (BITS_PER_UNIT == 8 && CHAR_BIT == 8) > > + { > > + HOST_WIDE_INT sz = int_size_in_bytes (cmptype), i; > > + gcc_assert (sz > 0); > > + unsigned char *buf = XALLOCAVEC (unsigned char, sz); > > + memset (buf, ~0, sz); > > + clear_type_padding_in_mask (cmptype, buf); > > + for (i = 0; i < sz; i++) > > + if (buf[i] != (unsigned char) ~0) > > + { > > + clear_padding = true; > > + break; > > + } > > + } > > so that when nothing needs to be padded (the usual case for non-struct/union > types unless they have extended long double), the builtin isn't added at all. > I doubt we support vectors of long double, so it is mainly whether returning > of long double or _Complex long double works fine. I agree that the above additional check is necessary. one question here is, can I use the routine "bool clear_padding_type_may_have_padding_p (tree type)" in gimple-fold.c instead of the above new function for this purpose?
(In reply to qinzhao from comment #7) > I agree that the above additional check is necessary. > > one question here is, can I use the routine "bool > clear_padding_type_may_have_padding_p (tree type)" in gimple-fold.c instead > of the above new function for this purpose? Sure, you can, but just note that it is conservative and fast, it will return true even for types that don't have any padding. But for double, _Complex double, int etc. it will quickly return false. I guess I should use it in c-omp.c too. But also note that it will return true for x86 long double/_Complex long double, and if you have a variable that isn't addressable, you need to figure out what to do. I think it might be easiest not to clear anything, another option is to create a temporary, store the var value in there, clear padding and copy back, but in the end the padding bits will probably not stay cleared. After all, e.g. on x86-64 -m64 the return value will be either in %st or %st/%st(1) pair and the padding isn't present there (but e.g. for ia32 _Complex long double is returned through invisible reference and padding is there, but you'd need to perform clear padding like operation during expansion).
Created attachment 51587 [details] C++ source code Another test case derived from compiling fedora source code.
> On Oct 11, 2021, at 3:29 PM, jakub at gcc dot gnu.org <gcc-bugzilla@gcc.gnu.org> wrote: > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102281 > > --- Comment #8 from Jakub Jelinek <jakub at gcc dot gnu.org> --- > (In reply to qinzhao from comment #7) >> I agree that the above additional check is necessary. >> >> one question here is, can I use the routine "bool >> clear_padding_type_may_have_padding_p (tree type)" in gimple-fold.c instead >> of the above new function for this purpose? > > Sure, you can, but just note that it is conservative and fast, it will return > true even for types that don't have any padding. > But for double, _Complex double, int etc. it will quickly return false. > I guess I should use it in c-omp.c too. Yes, the routine “clear_padding_type_may_have_padding_p” is quicker when returning FALSE. When it return TRUE, might not be very accurate, at this time, we can further call the new routine to make it accurate. Another question here, what’s the purpose for the routine “clear_type_padding_in_mask”? > But also note that it will return true for x86 long double/_Complex long > double, > and if you have a variable that isn't addressable, you need to figure out what > to do. Yes, we should resolve this issue too. > I think it might be easiest not to clear anything, Do you mean not to call clear padding for any variable here? Or only for variables with type long double/_Complex long double? > another option is to > create a temporary, store the var value in there, clear padding and copy back, > but in the end the padding bits will probably not stay cleared. Why not? > After all, e.g. on x86-64 -m64 the return value will be either in %st or > %st/%st(1) pair and the padding isn't present there You mean for variables with type long double/_Complex long doubles, if they are in register (Not-addressable), they don’t have Padding, so, don’t need to clear padding at all? (But this is only true for x86-64) > (but e.g. for ia32 _Complex > long double is returned through invisible reference and padding is there, If they are returned through invisible reference, does this mean they are addressable? > but > you'd need to perform clear padding like operation during expansion). > > -- > You are receiving this mail because: > You are on the CC list for the bug.
(In reply to Jakub Jelinek from comment #8) > Sure, you can, but just note that it is conservative and fast, it will > return true even for types that don't have any padding. > But for double, _Complex double, int etc. it will quickly return false. > I guess I should use it in c-omp.c too. I saw you changed c-omp.c today, I am wondering whether it's better to also add the following as a common utility routine to dimple-fold.[ch], then I can use it when adding padding initializations for auto var: /* Return true if TYPE contains any padding bits. */ bool clear_padding_type_has_padding_p (tree type) { bool clear_padding = false; if (BITS_PER_UNIT == 8 && CHAR_BIT == 8 && clear_padding_type_may_have_padding_p (type)) { HOST_WIDE_INT sz = int_size_in_bytes (cmptype), i; gcc_assert (sz > 0); unsigned char *buf = XALLOCAVEC (unsigned char, sz); memset (buf, ~0, sz); clear_type_padding_in_mask (cmptype, buf); for (i = 0; i < sz; i++) if (buf[i] != (unsigned char) ~0) { clear_padding = true; break; } } return clear_padding; }
(In reply to Jakub Jelinek from comment #8) > and if you have a variable that isn't addressable, you need to figure out > what to do. I think it might be easiest not to clear anything, another > option is to create a temporary, store the var value in there, clear padding > and copy back, but in the end the padding bits will probably not stay > cleared. currently, we add __builtin_clear_padding call _AFTER_ every explicit initialization of an auto variable: var_decl = {init_constructor}; __builtin_clear_padding (&var_decl, 0B, 1); the reason I added the call to EVERY auto variable that has explicit initialization is, the folding of __builtin_clear_padding will automatically turn this call to a NOP when there is no padding in the variable. So, I don't need to check whether the variable has padding explicitly. However, looks like that it's still better to check the type has padding or not before adding the call to save some compilation time for unnecessary folding of the call. In addition to this, if the auto variable is not addressable and need padding (for example, long double or complex long double), then must it be in a register? if so, we can do the following for them: var_decl = ZERO; var_decl = {init_constructor}; i.e, use zero to initialize the whole variable BEFORE explicit fields initialization. for such solution, I need to check whether an auto variable is not addressable in the beginning of the routine "gimplify_init_constructor", how to check for this? let me know your opinion on this solution.
Created attachment 51593 [details] C++ source code Third test case.
All the 3 testing cases can be resolved by adding the following patch (check whether the type has padding before adding call to __builtin_clear_padding): I believe that this is a safe partial fix for this bug. We might need to put this partial fix in first. diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c index 7fcfef41f72..e100f5bd1f5 100644 --- a/gcc/gimple-fold.c +++ b/gcc/gimple-fold.c @@ -4651,6 +4651,31 @@ clear_padding_type_may_have_padding_p (tree type) } } +/* Return true if TYPE contains any padding bits. */ + +bool +clear_padding_type_has_padding_p (tree type) +{ + bool has_padding = false; + if (BITS_PER_UNIT == 8 + && CHAR_BIT == 8 + && clear_padding_type_may_have_padding_p (type)) + { + HOST_WIDE_INT sz = int_size_in_bytes (type), i; + gcc_assert (sz > 0); + unsigned char *buf = XALLOCAVEC (unsigned char, sz); + memset (buf, ~0, sz); + clear_type_padding_in_mask (type, buf); + for (i = 0; i < sz; i++) + if (buf[i] != (unsigned char) ~0) + { + has_padding = true; + break; + } + } + return has_padding; +} + /* Emit a runtime loop: for (; buf.base != end; buf.base += sz) __builtin_clear_padding (buf.base); */ diff --git a/gcc/gimple-fold.h b/gcc/gimple-fold.h index 397f4aeb7cf..eb750a68eca 100644 --- a/gcc/gimple-fold.h +++ b/gcc/gimple-fold.h @@ -37,6 +37,7 @@ extern tree maybe_fold_and_comparisons (tree, enum tree_code, tree, tree, extern tree maybe_fold_or_comparisons (tree, enum tree_code, tree, tree, enum tree_code, tree, tree); extern bool clear_padding_type_may_have_padding_p (tree); +extern bool clear_padding_type_has_padding_p (tree); extern void clear_type_padding_in_mask (tree, unsigned char *); extern bool optimize_atomic_compare_exchange_p (gimple *); extern void fold_builtin_atomic_compare_exchange (gimple_stmt_iterator *); diff --git a/gcc/gimplify.c b/gcc/gimplify.c index d8e4b139349..62684dd6338 100644 --- a/gcc/gimplify.c +++ b/gcc/gimplify.c @@ -1955,7 +1955,8 @@ gimplify_decl_expr (tree *stmt_p, gimple_seq *seq_p) In order to make the paddings as zeroes for pattern init, We should add a call to __builtin_clear_padding to clear the paddings to zero in compatiple with CLANG. */ - if (flag_auto_var_init == AUTO_INIT_PATTERN) + if (flag_auto_var_init == AUTO_INIT_PATTERN + && clear_padding_type_has_padding_p (TREE_TYPE (decl))) gimple_add_padding_init_for_auto_var (decl, is_vla, seq_p); } } @@ -5390,6 +5391,7 @@ gimplify_init_constructor (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p, variable has be completely cleared already or it's initialized with an empty constructor. */ if (is_init_expr + && clear_padding_type_has_padding_p (type) && ((AGGREGATE_TYPE_P (type) && !cleared && !is_empty_ctor) || !AGGREGATE_TYPE_P (type)) && is_var_need_auto_init (object))
A summary of this bug based on discussion and more study: multiple issues in the current implemenation: A. should check is_gimple_reg before adding the call to __builtin_clear_padding; (correctness) B. should check whether a type has padding before adding this call; (more efficient) C. For long double/Complex long double variables, if they are explicitly initialzied, should clear their padding during RTL phase when the variable is spilled into stack. in the fix to this bug, A is a must, B is better to add in. C is not needed, can be fixed in another bug, I have created a new PR 102781 to record this issue and will fix it later is needed.
Created attachment 51662 [details] proposed patch to gcc12
The master branch has been updated by Qing Zhao <qinzhao@gcc.gnu.org>: https://gcc.gnu.org/g:429e3b7d8bf6609ddf7c7b1e49244997e9ac76b8 commit r12-4829-g429e3b7d8bf6609ddf7c7b1e49244997e9ac76b8 Author: Oracle Public Cloud User <opc@qinzhao-aarch64-ol8.allregionaliads.osdevelopmeniad.oraclevcn.com> Date: Mon Nov 1 15:14:26 2021 +0000 PR 102281 (-ftrivial-auto-var-init=zero causes ice) Do not add call to __builtin_clear_padding when a variable is a gimple register or it might not have padding. gcc/ChangeLog: 2021-11-01 qing zhao <qing.zhao@oracle.com> * gimplify.c (gimplify_decl_expr): Do not add call to __builtin_clear_padding when a variable is a gimple register or it might not have padding. (gimplify_init_constructor): Likewise. gcc/testsuite/ChangeLog: 2021-11-01 qing zhao <qing.zhao@oracle.com> * c-c++-common/pr102281.c: New test. * gcc.target/i386/auto-init-2.c: Adjust testing case. * gcc.target/i386/auto-init-4.c: Likewise. * gcc.target/i386/auto-init-6.c: Likewise. * gcc.target/aarch64/auto-init-6.c: Likewise.
fixed
fixed.