Bug 44612 - -flto -fwhole-program: Never read variable not optimized away
Status: NEW
Product: gcc
Classification: Unclassified
Component: middle-end
Version: 4.6.0
Importance: P3 normal
Target Milestone: ---
Assigned To: Not yet assigned to anyone
Keywords: missed-optimization
Depends on:
Blocks:
Reported: 2010-06-21 15:16 UTC by Tobias Burnus
Modified: 2010-06-22 14:12 UTC (History)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2010-06-22 12:26:35


Attachments
Test case (1.07 KB, text/plain)
2010-06-21 15:17 UTC, Tobias Burnus

Description Tobias Burnus 2010-06-21 15:16:50 UTC
Follow up to PR 41137.

Using the Intel compiler, the following program takes 0s for the loops (real time: 0.005s); however, with
 gfortran -fdump-tree-original -fwhole-program -flto -ffast-math -march=native -O3 cont.f90
GCC needs 1.142s.

Expected:
* GCC also optimizes the loops away if the variable "a" is never read (but only set)


Removing the !! comments prevents ifort from optimizing the loops away; even then, at 0.650s (real time), ifort is still roughly twice as fast as GCC.
Comment 1 Tobias Burnus 2010-06-21 15:17:11 UTC
Created attachment 20966
Test case
Comment 2 Richard Biener 2010-06-22 11:19:00 UTC
Your testcase doesn't build:

contiguous.f90:54.5:

REAL,contiguous :: a(:,:,:,:)
     1
Error: Invalid character in name at (1)
contiguous.f90:55:

a(:,:,:,:)=0.0
1
Error: Unclassifiable statement at (1)
contiguous.f90:15.8:

CALL SC(a)
        1
Error: Rank mismatch in argument 'a' at (1) (0 and 4)
Comment 3 Jakub Jelinek 2010-06-22 11:37:11 UTC
It does, you need latest trunk though (r161079 or later).
Comment 4 Tobias Burnus 2010-06-22 12:26:34 UTC
(In reply to comment #2)
> Your testcase doesn't build:
> REAL,contiguous :: a(:,:,:,:)
>      1
> Error: Invalid character in name at (1)

If you want to test it with a compiler other than the latest trunk, you can simply remove "contiguous"; as Jakub mentioned, though, it should build with the latest GCC trunk builds.

 * * *


C test case:

$ gcc -flto -fwhole-program -O3 -std=c99 test.c && time ./a.out
Start
Done

real    0m0.699s
user    0m0.676s
sys     0m0.004s


#include <stdio.h>
#define SIZE 40000

void s4 (float *restrict a)
{
  (void) __builtin_memset ((void *) a, 0, sizeof(float)*SIZE);
}


int main ()
{
  static float a[SIZE];
  int i;
  printf("Start\n");
  for (i = 0; i < SIZE; i++)
    s4 (a);
  printf("Done\n");
  return 0;
}
Comment 5 Tobias Burnus 2010-06-22 13:14:43 UTC
Similarly for:
  static void s4 (float *restrict a) {
    for (int j=1; j < SIZE; j++) a[j] = 0;
  }

Except that this uses
   MEM[symbol: a, index: ivtmp.25_27, offset: 16] = { 0.0, 0.0, 0.0, 0.0 };
in the optimized dump rather than "memset". But also here: the loop is not optimized away.
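Illustration (not part of the original comment): a minimal sketch placing the two store patterns discussed so far side by side. The function names and the explicit size parameter are assumptions made for the example; the test cases in this report use a fixed SIZE instead. Both functions only write to a[] and never read it, which is why the stores are candidates for removal when the caller never reads the array either.

```c
#include <stddef.h>
#include <string.h>

/* memset form, as in the first C test case (size passed in; an assumption). */
void zero_by_memset (float *restrict a, size_t n)
{
  memset (a, 0, sizeof (float) * n);
}

/* Loop form, as in this comment; note that it starts at index 1. */
void zero_by_loop (float *restrict a, size_t n)
{
  for (size_t j = 1; j < n; j++)
    a[j] = 0;
}

/* Hypothetical helper: returns 1 if a[from] .. a[n-1] are all 0.0f. */
int all_zero_from (const float *a, size_t from, size_t n)
{
  for (size_t j = from; j < n; j++)
    if (a[j] != 0.0f)
      return 0;
  return 1;
}
```

Calling either function on an array and then checking it with all_zero_from() shows the same end state from index 1 onwards; only the memset form also clears a[0].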
Comment 6 Richard Biener 2010-06-22 13:46:02 UTC
Confirmed.

DSE doesn't remove memset or memcpy calls.

We also do not have a flag to mark functions that only clobber local or
incoming memory (and thus effectively have no side effects if the memory
reachable from the arguments is unused after the call).
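Illustration (not part of the original comment): a minimal sketch of the property described above, using hypothetical function names. fill_zero() writes only to memory reachable from its argument. In caller_dead() that memory is never read afterwards, so the call has no observable effect; in caller_live() the memory is read, so the call has to stay. Whether a given GCC version actually drops the call in the first case is not claimed here.

```c
#include <stddef.h>

/* Writes only to memory reachable from its argument; it reads nothing
   and touches no globals. (Hypothetical example function.) */
void fill_zero (float *a, size_t n)
{
  for (size_t i = 0; i < n; i++)
    a[i] = 0.0f;
}

/* The buffer is never read after the call, so the call has no
   observable effect and could in principle be removed. */
int caller_dead (void)
{
  float buf[16];
  fill_zero (buf, 16);
  return 42;
}

/* The buffer is read after the call, so the call must be kept. */
float caller_live (void)
{
  float buf[16];
  fill_zero (buf, 16);
  return buf[3];
}
```

The test cases in this report correspond to the caller_dead() situation: the array is written repeatedly but never read.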
Comment 7 Jakub Jelinek 2010-06-22 14:12:30 UTC
Well, RTL DSE to some extent knows about memset.
And handling it in tree DSE would be much easier, and desirable.
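Illustration (not part of the original comment): a minimal sketch of a memset that is a dead store in the classic sense, which is the kind of case a memset-aware DSE pass would target. The function name is hypothetical, and no claim is made about which GCC pass or version removes the first call.

```c
#include <stddef.h>
#include <string.h>

/* The first memset is a dead store: every byte it writes is overwritten
   by the second memset before anything reads it. */
void overwrite_twice (unsigned char *buf, size_t n)
{
  memset (buf, 0xff, n);  /* dead: fully overwritten below */
  memset (buf, 0, n);     /* live: this is what the caller observes */
}
```

After overwrite_twice() returns, the caller sees only zero bytes, exactly as if the first memset had never been executed.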