For the nonsense code (reduced from real code):

--------------------------------------------------
struct s
{
  unsigned int foo : 11;
  unsigned int flag1 : 1;
  unsigned int bar : 11;
  unsigned int flag2 : 1;
};

void
f (int n, int *x, struct s *ptr, struct s flags)
{
  for (int i = 0; i < n; ++i)
    if (x[i] == 1)
      flags.flag1 = 1;
    else if (x[i] == 2)
      flags.flag2 = 1;
    else if (x[i] == 3)
      {
        if (flags.flag1)
          *ptr++ = flags;
      }
    else if (x[i] == 4)
      {
        if (flags.flag2)
          *ptr++ = flags;
      }
    else
      *ptr++ = flags;
}
--------------------------------------------------

SRA significantly pessimises the output.  At the machine level, each update to flags is usually a simple register OR, bit-test, or move, but SRA instead decides to split flags up into 4 pieces and reassemble it for "*ptr++ = flags" (which in the original code is the hot statement).
Total scalarization fun I guess.
SRA only decides to create replacements for the single-bit flags; we don't do total scalarization when there are bit-fields.

Part of the problem is that SRA sees some accesses to flags directly:

  flags.flag1 = 1;

but others it does not really see because they are quite indirect:

  _13 = BIT_FIELD_REF <flags, 8, 8>;
  _14 = _13 & 8;
  if (_14 != 0)

...and it still decides to scalarize (more on that below), which means that it has to store the flag back to flags before the BIT_FIELD_REF, which leads to the impression of the access being split into four parts.

Having said that, scalarizing bit-fields, and even more so 1-bit flags, when there are not multiple reads from the bit-field itself, is probably not useful (in the case of non-bit-fields it can lead to the disappearance of the aggregate).

So, would the following help the real code that you derived the testcase from?

diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index d177f1ba11c..d3c3a4584a0 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -2519,7 +2519,8 @@ analyze_access_subtree (struct access *root, struct access *parent,
       && (totally || !root->grp_total_scalarization)
       && (totally
 	  || root->grp_hint
-	  || ((root->grp_scalar_read || root->grp_assignment_read)
+	  || (root->size > 1
+	      && (root->grp_scalar_read || root->grp_assignment_read)
 	      && (root->grp_scalar_write || root->grp_assignment_write))))
     {
       /* Always create access replacements that cover the whole access.
@@ -2562,11 +2563,10 @@ analyze_access_subtree (struct access *root, struct access *parent,
 	   && scalar && !root->first_child
 	   && !root->grp_total_scalarization
 	   && (root->grp_scalar_write || root->grp_assignment_write)
+	   && (!root->grp_scalar_read && !root->grp_assignment_read)
 	   && !bitmap_bit_p (cannot_scalarize_away_bitmap,
 			     DECL_UID (root->base)))
     {
-      gcc_checking_assert (!root->grp_scalar_read
-			   && !root->grp_assignment_read);
       sth_created = true;
       if (MAY_HAVE_DEBUG_BIND_STMTS)
 	{