Bug 97588 - Overzealous SRA of boolean bitfields
Summary: Overzealous SRA of boolean bitfields
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 10.2.1
: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks: bitfield
  Show dependency treegraph
 
Reported: 2020-10-27 08:43 UTC by Richard Sandiford
Modified: 2023-07-19 04:07 UTC (History)
2 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2021-01-22 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Richard Sandiford 2020-10-27 08:43:55 UTC
For the nonsense code (reduced from real code):

--------------------------------------------------
struct s
{
  unsigned int foo : 11;
  unsigned int flag1 : 1;
  unsigned int bar : 11;
  unsigned int flag2 : 1;
};

void
f (int n, int *x, struct s *ptr, struct s flags)
{
  for (int i = 0; i < n; ++i)
    if (x[i] == 1)
      flags.flag1 = 1;
    else if (x[i] == 2)
      flags.flag2 = 1;
    else if (x[i] == 3)
      {
	if (flags.flag1)
	  *ptr++ = flags;
      }
    else if (x[i] == 4)
      {
	if (flags.flag2)
	  *ptr++ = flags;
      }
    else
      *ptr++ = flags;
}
--------------------------------------------------

SRA significantly pessimises the output.  At the machine level,
each update to flags is usually a simple register OR, bit-test,
or move, but SRA instead decides to split flags up into 4
pieces and reassemble it for "*ptr++ = flags" (which in the
original code is the hot statement).
Comment 1 Richard Biener 2020-10-27 09:05:55 UTC
Total scalarization fun I guess.
Comment 2 Martin Jambor 2021-01-22 18:28:40 UTC
SRA only decides to create replacements for the single-bit flags, we
don't do total scalarization when there are bit-fields.  Part of the
problem, is that SRA sees some accesses to the flags directly:

  flags.flag1 = 1;

but others it does not really see because they are quite indirect:

  _13 = BIT_FIELD_REF <flags, 8, 8>;
  _14 = _13 & 8;
  if (_14 != 0)

...and it still decides to scalarize (more on that below), which means
that it has to store the flag back to flags before the BIT_FIELD_REF,
which leads to impression of the access being split into four parts.

Having said that, scalarizing bit-fields, and even more so 1-bit
flags, when there are not multiple reads from the bit-field itself, is
probably not useful (in the case of non-bit fields it can lead to
disappearance of the aggregate).

So, would the following help the real code that you derived the
testcase from?

diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index d177f1ba11c..d3c3a4584a0 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -2519,7 +2519,8 @@ analyze_access_subtree (struct access *root, struct access *parent,
       && (totally || !root->grp_total_scalarization)
       && (totally
          || root->grp_hint
-         || ((root->grp_scalar_read || root->grp_assignment_read)
+         || (root->size > 1
+             && (root->grp_scalar_read || root->grp_assignment_read)
              && (root->grp_scalar_write || root->grp_assignment_write))))
     {
       /* Always create access replacements that cover the whole access.
@@ -2562,11 +2563,10 @@ analyze_access_subtree (struct access *root, struct access *parent,
          && scalar && !root->first_child
          && !root->grp_total_scalarization
          && (root->grp_scalar_write || root->grp_assignment_write)
+         && (!root->grp_scalar_read && !root->grp_assignment_read)
          && !bitmap_bit_p (cannot_scalarize_away_bitmap,
                            DECL_UID (root->base)))
        {
-         gcc_checking_assert (!root->grp_scalar_read
-                              && !root->grp_assignment_read);
          sth_created = true;
          if (MAY_HAVE_DEBUG_BIND_STMTS)
            {