Bug 54013 - Loop with control flow not vectorized
Summary: Loop with control flow not vectorized
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 4.8.0
: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks: vectorizer early-break
  Show dependency treegraph
 
Reported: 2012-07-18 11:09 UTC by Richard Biener
Modified: 2024-06-05 17:37 UTC (History)
3 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2021-12-26 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Richard Biener 2012-07-18 11:09:04 UTC
ICC manages to vectorize the following loop which happens in Polyhedron mp_prop_design.

int foo (float x, float *tab)
{
  int i;
  for (i = 2; i < 45; ++i)
    if (x < tab[i])
      break;
  return i - 1;
}
Comment 1 Alan Lawrence 2015-06-15 16:42:45 UTC
Indeed it does (confirmed).

So there are a few tricks here, but they are not Intel-specific, and don't even look to require new tree codes. The loop body can be vectorized by computing the (x < tab[i]) predicate across the vector, and then using a reduction opcode (a bitwise-or reduction would be most natural but others work) to convert to a scalar which then jumps out of the loop, i.e. if *any* of the lanes in the vector would exit:

int foo (float x, float *tab)
{
  for (i = 2; i < 45; i+= 4)
    {
      v4sf v_tab = ...load from tab...
      unsigned v4si v_exit_cond = vec_cond_expr({x,x,x,x} < v_tab, -1, 0);
      if (reduc_max_expr (v_exit_cond)) break;
    }
  ...
}

The epilogue must then work out the value of i at exit (possibly a separate epilogue for the "break" vs the other exit). I see two schemes:

(1) use vec_pack_trunc_expr, or similar, to narrow v_exit_cond down to a scalar, where we can find the first set bit, and use this as an index to add to the value still in i.

(2) compute a vector of the value i would have had if each element had been the one that exitted:

v4si v_i_on_exit = vec_cond_expr (v_exit_cond,
    {i, i+1, i+2, i+3}, /* Maybe available as induction variable?  */
    {MAX_INT, MAX_INT, MAX_INT, MAX_INT})

and then take a reduc_min_expr to look for the *first* value of i that exits.

(There is one more issue, i.e. that we need to speculate the read of tab[i+1...i+3], as the vector load will probably read all the lanes before we know whether earlier iterations should have exited. So we'd need to have some kind of check against that, or e.g. if tab[] were a global with known bounds. Similar/complicated conditions apply to any/everything else in the loop, too!)
Comment 2 Andrew Pinski 2016-01-04 23:52:54 UTC
Note the loop needs an alignment potion otherwise there could be undefined behavior if 45 is passed the tab array bounds and the one of the elements are greater than x.
Comment 3 Andrew Pinski 2024-06-05 17:35:33 UTC
I think for SVE(2?) this could be vectorized using the fault first case.
Comment 4 Tamar Christina 2024-06-05 17:37:52 UTC
Since there's only one source here, alignment peeling should be enough to vectorize it.

our pending patches should support it.  Will add it to verify list.