100846 – Different vector handling for strided IVs and modulo conditions

Status:	UNCONFIRMED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	tree-optimization
Version:	12.0

Importance:	P3 enhancement
Target Milestone:	---
Assignee:	Not yet assigned to anyone

URL:
Keywords:	missed-optimization

Depends on:
Blocks:

Reported:	2021-06-01 07:22 UTC by Richard Sandiford
Modified:	2021-07-03 05:16 UTC
CC List:	1 user

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:

Description Richard Sandiford 2021-06-01 07:22:02 UTC

The following loops are equivalent, but we only vectorise the second:

void f1 (int *x)
{
  for (int i = 0; i < 100; ++i)
    if ((i & 1) == 0)
      x[i] += 1;
}

void f2 (int *x)
{
  for (int i = 0; i < 100; i += 2)
    x[i] += 1;
}

I guess this isn't vectorisation-specific.  Perhaps this is something that
ivcanon could handle?
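
For reference, the equivalence claimed above can be checked with a small
standalone harness.  This is only a sketch: N, the initial array contents
and the helper f1_f2_mismatches are assumptions made for illustration and
are not part of the report.

```c
#define N 100

/* Conditional form from the report: unit stride, parity test in the body.  */
void
f1 (int *x)
{
  for (int i = 0; i < N; ++i)
    if ((i & 1) == 0)
      x[i] += 1;
}

/* Strided form from the report: stride 2, no condition.  */
void
f2 (int *x)
{
  for (int i = 0; i < N; i += 2)
    x[i] += 1;
}

/* Hypothetical helper: run both forms on identical inputs and return the
   number of elements on which the results differ.  */
int
f1_f2_mismatches (void)
{
  int a[N], b[N];
  int bad = 0;

  for (int i = 0; i < N; ++i)
    a[i] = b[i] = 3 * i;

  f1 (a);
  f2 (b);

  for (int i = 0; i < N; ++i)
    bad += (a[i] != b[i]);
  return bad;
}
```

Both functions touch exactly the even indices 0, 2, ..., 98, so the helper
should report no mismatches.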

Comment 1 Richard Biener 2021-06-01 08:12:55 UTC

I think that this is iteration space splitting, turning the loop into

  for (int i = 0; i < 100; i += 2)
    if (1)
      x[i] += 1;
  for (int i = 1; i < 100; i += 2)
    if (0)
      x[i] += 1;

Alternatively, it is unrolling driven by jump threading.  That said,
we'd likely want to handle

  for (int i = 0; i < 100; ++i)
    if ((i & 1) == 0)
      x[i] += 1;
    else
      y[i] += 1;

as well.  Loop distribution would eventually create two of the original
loops out of that.
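
A hand-written sketch of what that distribution might look like, assuming
x and y do not overlap.  The names g, g_distributed and
distribution_mismatches are hypothetical and chosen for illustration;
this is not output from the pass.

```c
#define N 100

/* The if/else loop under discussion.  */
void
g (int *x, int *y)
{
  for (int i = 0; i < N; ++i)
    if ((i & 1) == 0)
      x[i] += 1;
    else
      y[i] += 1;
}

/* Hand-distributed form: one loop per array, each keeping its own parity
   test.  Each loop now has the same shape as f1 in the description.  */
void
g_distributed (int *x, int *y)
{
  for (int i = 0; i < N; ++i)
    if ((i & 1) == 0)
      x[i] += 1;
  for (int i = 0; i < N; ++i)
    if ((i & 1) != 0)
      y[i] += 1;
}

/* Hypothetical helper: count the elements on which the two forms differ.  */
int
distribution_mismatches (void)
{
  int x1[N], y1[N], x2[N], y2[N];
  int bad = 0;

  for (int i = 0; i < N; ++i)
    {
      x1[i] = x2[i] = i;
      y1[i] = y2[i] = -i;
    }

  g (x1, y1);
  g_distributed (x2, y2);

  for (int i = 0; i < N; ++i)
    bad += (x1[i] != x2[i]) + (y1[i] != y2[i]);
  return bad;
}
```

Each distributed loop still carries a modulo condition, so the original
problem has to be solved for both of them.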

I think the most promising approach is unrolling, keyed on the ability to
optimize away the branches.  So yes, it's ivcanon: not the pass of that
name, but the implementation file.
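
To illustrate the unrolling idea on the if/else loop: unrolling by two
makes the parity of each copy known, so both conditions fold away.  The
code below is a hand-written sketch, not compiler output.  The names
g_unrolled and unroll_mismatches are hypothetical, and the sketch relies
on the trip count (100) being even, so no epilogue iteration is needed.

```c
#define N 100

/* The if/else loop under discussion.  */
void
g (int *x, int *y)
{
  for (int i = 0; i < N; ++i)
    if ((i & 1) == 0)
      x[i] += 1;
    else
      y[i] += 1;
}

/* Unrolled by two.  i is always even here, so the first copy's condition
   folds to true and the second copy's (for i + 1) folds to false.  */
void
g_unrolled (int *x, int *y)
{
  for (int i = 0; i < N; i += 2)
    {
      x[i] += 1;
      y[i + 1] += 1;
    }
}

/* Hypothetical helper: count the elements on which the two forms differ.  */
int
unroll_mismatches (void)
{
  int x1[N], y1[N], x2[N], y2[N];
  int bad = 0;

  for (int i = 0; i < N; ++i)
    {
      x1[i] = x2[i] = 2 * i;
      y1[i] = y2[i] = 5 - i;
    }

  g (x1, y1);
  g_unrolled (x2, y2);

  for (int i = 0; i < N; ++i)
    bad += (x1[i] != x2[i]) + (y1[i] != y2[i]);
  return bad;
}
```

The unrolled body consists of two branch-free strided accesses, the same
form as f2 in the description.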