Bug 108863 - Unrolling could use range information
Summary: Unrolling could use range information
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: rtl-optimization
Version: unknown
Importance: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2023-02-20 20:26 UTC by Thomas Koenig
Modified: 2023-03-08 16:36 UTC

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2023-02-20 00:00:00


Attachments
Assembly code generated by test case (756 bytes, text/plain)
2023-02-20 20:26 UTC, Thomas Koenig

Description Thomas Koenig 2023-02-20 20:26:51 UTC
Created attachment 54497
Assembly code generated by test case

Looking a bit more at the code generated for the test code of PR108839.

For the test
$ cat u2.c
void foo(double *const restrict dx, double *dy, double da, long int n)
{
      long int m = n % 4;
      for (unsigned long i = 0; i < m; i++ )
        dy[i] = dy[i] + da * dx[i];
}

a fairly recent trunk, invoked as

$ gcc -S -O3  -funroll-all-loops -fno-tree-vectorize u2.c

generates far too much unrolling for a loop which can only be executed, at
most, four times (see attachment).

The range information about m does not appear to be propagated to
the unroll passes.
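To make the range argument concrete, here is a small C sketch (the helper name `trip_count` is hypothetical) that computes how many times the loop in u2.c executes for a given n. Two C rules matter here: since C99, `%` truncates toward zero, so m = n % 4 lies in [-3, 3] and takes the sign of n; and `i < m` compares an unsigned long with a long, so m is converted to unsigned long. Any range the unroller exploits therefore has to account for the sign of n.

```c
/* Hypothetical helper: number of iterations of
       for (unsigned long i = 0; i < m; i++)
   with m = n % 4, as in u2.c.  In i < m, m is converted to
   unsigned long, so the loop runs (unsigned long)m times. */
unsigned long trip_count(long int n)
{
    long int m = n % 4;        /* truncates toward zero: m in [-3, 3] */
    return (unsigned long)m;   /* 0..3 for n >= 0, very large if m < 0 */
}
```

For example, trip_count(7) is 3, while trip_count(-1) is ULONG_MAX.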
Comment 1 Andrew Pinski 2023-02-20 20:33:10 UTC
(In reply to Thomas Koenig from comment #0)
> The range information about m does not appear to be propagated to
> the unroll passes.

Most likely because range information is not propagated to the RTL level at all.
In this case, even just the non-zero bits information might be enough.
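The non-zero bits idea can be illustrated with a small sketch. A nonzero-bits mask records which bits of a value may be set; OR-ing together every value a variable can take yields such a mask, and the mask itself, read as unsigned, is an upper bound on the value. The helper `nonzero_mask` below is hypothetical and only illustrates the concept; it is not how GCC computes nonzero bits.

```c
#include <stddef.h>

/* Hypothetical illustration: OR together every value a variable may
   take.  A bit that is clear in the result is known to be zero, and the
   mask, read as unsigned, bounds the value from above. */
unsigned long nonzero_mask(const long int *vals, size_t count)
{
    unsigned long mask = 0;
    for (size_t k = 0; k < count; k++)
        mask |= (unsigned long)vals[k];
    return mask;
}
```

For the values {0, 1, 2, 3} the mask is 3, bounding the trip count by 3. For {-3, ..., 3} it has all bits set (since -1 is all ones), so the mask alone gives no useful bound unless n is known to be non-negative (or is unsigned, in which case n % 4 equals n & 3).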
Comment 2 Andrew Pinski 2023-02-20 21:09:11 UTC
Confirmed. Though maybe the tree-level unroller could improve this situation so that it just unrolls this loop 4 times.
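Once the trip count is known to be small, the loop can be replaced by straight-line code. A hand-written sketch of that end result, assuming n >= 0 (so the body runs at most three times), might look as follows. The name `foo_peeled` is hypothetical, and the sketch shows the intended shape of the code, not what any GCC pass currently emits.

```c
#include <assert.h>

/* Hypothetical hand-peeled form of foo from u2.c, valid only under the
   assumption n >= 0, so that m = n % 4 lies in [0, 3] and the loop runs
   at most three times.  No loop or back edge remains. */
void foo_peeled(double *const restrict dx, double *dy, double da, long int n)
{
    long int m = n % 4;
    assert(m >= 0 && m <= 3);   /* precondition of this sketch */
    if (m > 0) {
        dy[0] = dy[0] + da * dx[0];
        if (m > 1) {
            dy[1] = dy[1] + da * dx[1];
            if (m > 2)
                dy[2] = dy[2] + da * dx[2];
        }
    }
}
```

Each nested test corresponds to one iteration of the original loop, executed in the same order.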