This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



GCC3 to GCC4 performance regression. Bug?


I have been looking at a significant performance regression in the hmmer
application between GCC 3.4 and GCC 4.0.  I have a small cut-down test
case (attached) that demonstrates the problem: it runs more than 10%
slower on IA64 (HP-UX or Linux) when compiled with GCC 4.0 than when
compiled with GCC 3.4.  At first I thought this was just due to 'better'
alias analysis in the P7Viterbi routine and that it was the right thing
to do even if it was slower.  It looks like GCC 3.4 does not believe
that hmm->tsc could alias mmx, but GCC 4.0 thinks they could, and thus
GCC 4.0 does more loads inside the inner loop of P7Viterbi.  But then I
noticed something weird: if I remove the field M (which is unused in my
example) from the plan7_s structure, GCC 4.0 runs as fast as GCC 3.4.  I
don't understand why this would affect things.
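
To make the suspected extra loads concrete, here is a minimal sketch, not
part of the original test case.  viterbi_reload repeats the loop from
P7Viterbi; viterbi_hoisted is a hypothetical variant that reads the row
pointers into locals once, which is only equivalent if the stores through
mmx never overwrite those pointers.  The names viterbi_reload,
viterbi_hoisted, make_rows and hoisting_matches are made up for
illustration, and whether GCC 4.0 actually generates code like the first
form is an assumption that would need to be checked against the assembly.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct plan7_s {
  int M;
  int **tsc;
};

/* Same loop as P7Viterbi in the test case: hmm->tsc[0], mmx[i] and
   mmx[i-1] are written so that they may be re-read on each iteration. */
static void viterbi_reload(int L, int M, struct plan7_s *hmm, int **mmx)
{
  int i, k;
  for (i = 1; i <= L; i++)
    for (k = 1; k <= M; k++)
      mmx[i][k] = mmx[i-1][k-1] + hmm->tsc[0][k-1];
}

/* Hypothetical variant: the row pointers are loaded into locals once.
   Assumes the stores to cur[k] never modify hmm->tsc, tsc[0] or mmx[]. */
static void viterbi_hoisted(int L, int M, struct plan7_s *hmm, int **mmx)
{
  const int *tsc0 = hmm->tsc[0];
  int i, k;
  for (i = 1; i <= L; i++) {
    const int *prev = mmx[i-1];
    int *cur = mmx[i];
    for (k = 1; k <= M; k++)
      cur[k] = prev[k-1] + tsc0[k-1];
  }
}

/* Allocate rows x cols ints with deterministic contents. */
static int **make_rows(int rows, int cols)
{
  int **m = malloc(rows * sizeof(int *));
  int i, k;
  assert(m != NULL);
  for (i = 0; i < rows; i++) {
    m[i] = malloc(cols * sizeof(int));
    assert(m[i] != NULL);
    for (k = 0; k < cols; k++)
      m[i][k] = 10 * i + k;
  }
  return m;
}

/* Returns 1 if both variants produce identical matrices on
   non-overlapping data.  Memory is not freed, for brevity. */
int hoisting_matches(void)
{
  enum { L = 4, M = 5 };
  struct plan7_s hmm;
  int **a, **b;
  int i, k, same = 1;

  hmm.M = M;
  hmm.tsc = malloc(7 * sizeof(int *));
  assert(hmm.tsc != NULL);
  hmm.tsc[0] = malloc(M * sizeof(int));
  assert(hmm.tsc[0] != NULL);
  for (k = 0; k < M; k++)
    hmm.tsc[0][k] = k + 1;

  a = make_rows(L + 1, M + 1);
  b = make_rows(L + 1, M + 1);
  viterbi_reload(L, M, &hmm, a);
  viterbi_hoisted(L, M, &hmm, b);
  for (i = 0; i <= L; i++)
    if (memcmp(a[i], b[i], (M + 1) * sizeof(int)) != 0)
      same = 0;
  return same;
}
```

When the rows do not overlap, as here, both variants compute the same
result; a compiler can only perform the second form's hoisting itself if
it can prove that the int stores do not alias the pointer loads.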

Would any optimization experts care to take a look at this test case and
help me understand what is going on, and whether this change from 3.4 to
4.0 is intentional?

Steve Ellcey
sje@cup.hp.com


------------------------ Test Case -----------------------

#define L_CONST 500

#include <stdlib.h>            /* malloc */

struct plan7_s {
  int M;
  int **tsc;                   /* transition scores     [0.6][1.M-1]        */
};

struct dpmatrix_s {
  int **mmx;
};
struct dpmatrix_s *mx;



void
AllocPlan7Body(struct plan7_s *hmm, int M) 
{
  int i;

  hmm->tsc    = malloc (7 * sizeof(int *));
  hmm->tsc[0] = malloc ((M+16) * sizeof(int));
  mx->mmx = (int **) malloc(sizeof(int *) * (L_CONST+1));
  for (i = 0; i <= L_CONST; i++) {
    mx->mmx[i] = malloc ((M+2+16) * sizeof(int));  /* size in ints, not bytes */
  }
  return;
}  

void
P7Viterbi(int L, int M, struct plan7_s *hmm, int **mmx)
{
  int   i,k;
  
  for (i = 1; i <= L; i++) {
    for (k = 1; k <= M; k++) {
      mmx[i][k] = mmx[i-1][k-1] + hmm->tsc[0][k-1];
    }
  }
}

int
main (void)
{
  struct plan7_s *hmm;
  int i;

  hmm = (struct plan7_s *) malloc (sizeof (struct plan7_s));
  mx = (struct dpmatrix_s *) malloc (sizeof (struct dpmatrix_s));
  AllocPlan7Body(hmm, 10);
  /* Repeat the kernel so the inner-loop cost dominates the run time. */
  for (i = 0; i < 600000; i++) {
    P7Viterbi(500, 10, hmm, mx->mmx);
  }
  return 0;
}

