This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Store motion is not moving loads & stores to the same location out of loops- as it should be.

From: Mostafa Hagog <MUSTAFA at il dot ibm dot com>
To: akdver at atrey dot karlin dot mff dot cuni dot cz
Cc: gcc at gcc dot gnu dot org
Date: Mon, 27 Oct 2003 11:25:47 +0200
Subject: Store motion is not moving loads & stores to the same location out of loops- as it should be.




Hi Zdenek,

I tried to understand why the store motion pass in gcc doesn't move the
load and store out of the loop in the examples below as it should do
according to the documentation in gcse.c.

The first example (I) is a code sample taken from a hot loop in 179.art.
Store motion in gcse should have moved the load from and the store to
Y[tj].y out of the inner loop, but when compiled with a recent version of
gcc 3.4 both remained in the inner loop.

When I simplified the code, as in the second example (II), the load from
Y.y was moved out of the inner loop, but the store to Y.y still remained
inside it.

Do you have an idea what is going wrong there?

Example I
----------
int numf1s, numf2s;
double **bus;

typedef struct {
  double *I;
  double W;
  double X;
  double V;
  double U;
  double P;
  double Q;
  double R;
} f1_neuron;

f1_neuron *f1_layer;

typedef struct {
  double y;
  int reset;
} xyz;

xyz *Y;

void match ()
{
  int ti,tj;

  /* Compute F2 - y values */
  for (tj=0;tj<numf2s;tj++)
  {
    Y[tj].y = 0;
    if ( !Y[tj].reset )
    for (ti=0;ti<numf1s;ti++)
      Y[tj].y += f1_layer[ti].P * bus[ti][tj];
  }
}

The inner loop assembly
(on a powerpc-apple-darwin6.4 machine,
 flags: -S -O3 --param max-gcse-passes=3 -mdynamic-no-pic):

L14:
        slwi r0,r8,2
        lfd f2,0(r11)
        lwzx r9,r5,r0
        addi r11,r11,60
        lfdx f0,r10,r7
        addi r8,r8,1
        lfdx f3,r6,r9
        fmadd f1,f2,f3,f0
        stfdx f1,r10,r7
        bdnz L14
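
For reference, the result I would expect from store motion on Example I
can be sketched by hand in C as follows. This is only an illustration, not
GCC output: the name match_expected and the temporary acc are hypothetical,
and the rewrite is valid only under the assumption that Y[tj].y does not
alias f1_layer[ti].P or bus[ti][tj].

```c
/* Declarations repeated from Example I so the sketch compiles on its own. */
typedef struct { double *I; double W, X, V, U, P, Q, R; } f1_neuron;
typedef struct { double y; int reset; } xyz;

int numf1s, numf2s;
double **bus;
f1_neuron *f1_layer;
xyz *Y;

/* Hypothetical hand-transformed version of match(): the running sum is
   kept in a local variable, so the inner loop has no load from or store
   to Y[tj].y.  Assumes Y[tj].y does not alias the other operands. */
void match_expected(void)
{
  int ti, tj;

  for (tj = 0; tj < numf2s; tj++)
  {
    double acc = 0;                 /* replaces Y[tj].y = 0 */
    if (!Y[tj].reset)
      for (ti = 0; ti < numf1s; ti++)
        acc += f1_layer[ti].P * bus[ti][tj];
    Y[tj].y = acc;                  /* single store, outside the inner loop */
  }
}
```

In this form the inner loop would need only the two loads for
f1_layer[ti].P and bus[ti][tj] plus the fmadd.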
-----------------------------------

Example II
-----------

int numf1s, numf2s;
double **bus;

typedef struct {
  double *I;
  double W;
  double X;
  double V;
  double U;
  double P;
  double Q;
  double R;
} f1_neuron;

f1_neuron *f1_layer;

typedef struct {
  double y;
  int reset;
} xyz;

xyz Y;

void match ()
{
  int ti,tj;

  for (tj=0;tj<numf2s;tj++)
  {
    Y.y = 0;
    if ( !Y.reset )
    for (ti=0;ti<numf1s;ti++)
      Y.y += f1_layer[ti].P * bus[ti][tj];
  }
}

The generated assembly of the inner loop
(on a powerpc-apple-darwin6.4 machine,
 flags: -S -O3 --param max-gcse-passes=3 -mdynamic-no-pic):

L8:
        slwi r0,r10,2
        lfd f2,0(r11)
        lwzx r4,r7,r0
        addi r11,r11,60
        addi r10,r10,1
        lfdx f0,r8,r4
        fmadd f1,f2,f0,f12
        fmr f12,f1
        stfd f1,0(r6)
        bdnz L8



Thanks,
Mostafa
