Bug 44612 - -flto -fwhole-program: Never read variable not optimized away
Summary: -flto -fwhole-program: Never read variable not optimized away
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end
Version: 4.6.0
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
Keywords: missed-optimization
Depends on:
Reported: 2010-06-21 15:16 UTC by Tobias Burnus
Modified: 2010-06-22 14:12 UTC
CC List: 1 user

See Also:
Known to work:
Known to fail:
Last reconfirmed: 2010-06-22 12:26:35

Test case (291 bytes, text/plain)
2010-06-21 15:17 UTC, Tobias Burnus

Description Tobias Burnus 2010-06-21 15:16:50 UTC
Follow up to PR 41137.

Using the Intel compiler, the following program takes 0s for the loops (real time: 0.005s); however, with
 gfortran -fdump-tree-original -fwhole-program -flto -ffast-math -march=native -O3 cont.f90
GCC needs 1.142s.

Expected:
* GCC should also optimize the loops away if the variable "a" is only set but never read.

Removing the !! comments prevents ifort from optimizing the loops away; even so, at 0.650s (real time) the ifort binary is still about twice as fast as the GCC one.
Comment 1 Tobias Burnus 2010-06-21 15:17:11 UTC
Created attachment 20966
Test case
Comment 2 Richard Biener 2010-06-22 11:19:00 UTC
Your testcase doesn't build:


REAL,contiguous :: a(:,:,:,:)
     1
Error: Invalid character in name at (1)

Error: Unclassifiable statement at (1)

Error: Rank mismatch in argument 'a' at (1) (0 and 4)
Comment 3 Jakub Jelinek 2010-06-22 11:37:11 UTC
It does, you need latest trunk though (r161079 or later).
Comment 4 Tobias Burnus 2010-06-22 12:26:34 UTC
(In reply to comment #2)
> Your testcase doesn't build:
> REAL,contiguous :: a(:,:,:,:)
>      1
> Error: Invalid character in name at (1)

If you want to test it with compilers other than the latest trunk, you can simply take out the "contiguous"; though, as Jakub mentioned, it should build with the latest GCC trunk builds.

 * * *

C test case:

$ gcc -flto -fwhole-program -O3 -std=c99 test.c && time ./a.out

real    0m0.699s
user    0m0.676s
sys     0m0.004s

#include <stdio.h>
#define SIZE 40000

void s4 (float *restrict a)
{
  (void) __builtin_memset ((void *) a, 0, sizeof(float)*SIZE);
}

int main ()
{
  static float a[SIZE];
  int i;
  for (i = 0; i < SIZE; i++)
    s4 (a);
  return 0;
}
Comment 5 Tobias Burnus 2010-06-22 13:14:43 UTC
Similarly for:
  static void s4 (float *restrict a) {
    for (int j=1; j < SIZE; j++) a[j] = 0;
  }

Except that this uses
   MEM[symbol: a, index: ivtmp.25_27, offset: 16] = { 0.0, 0.0, 0.0, 0.0 };
in the optimized dump rather than "memset". Here too, the loop is not optimized away.
Comment 6 Richard Biener 2010-06-22 13:46:02 UTC

DSE doesn't remove memset or memcpy calls.

We also do not have a flag to mark functions that only clobber local or
incoming memory (and thus effectively have no side effects if the memory
reachable from the arguments is unused after the call).
Comment 7 Jakub Jelinek 2010-06-22 14:12:30 UTC
Well, RTL DSE to some extent knows about memset.
And for tree DSE, handling it would be much easier and desirable.