This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Predictive commoning leads to register to register moves through memory.


Hello all,

I've been looking at a code generation issue in GCC 5.2 where, with -O3 -funroll-loops, register-to-register moves go through memory. For reference, the C code is at the end of this mail. The generated code for MIPS is (cut down for clarity; ldc1 and sdc1 are the double-word floating-point load and store, respectively):

        div.d   $f8,$f6,$f4
        mul.d   $f2,$f8,$f8
        sdc1    $f2,8($7)
$L38:
        ldc1    $f0,8($7)		<- load instead of move
        li      $11,1                   # 0x1
	
	<snip>
$L49:
	....
        div.d   $f8,$f6,$f4
        addiu   $11,$10,3
        mul.d   $f2,$f8,$f8
        sdc1    $f2,8($7)
        ldc1    $f0,8($7)		<- load instead of move
$L48:
        mul.d   $f2,$f2,$f0

	<snip>


$L45:
        mul.d   $f2,$f4,$f4
        mov.d   $f8,$f4
        j       $L38
        sdc1    $f2,8($7)

For basic block $L38, every block that branches or falls through to it stores to 8($7), and that value is then loaded straight back into another floating-point register.

Disabling predictive commoning generates:

        div.d   $f4,$f18,$f2
        mul.d   $f0,$f4,$f4
$L37:
        mul.d   $f6,$f0,$f0
        li      $10,1                   # 0x1
        mul.d   $f8,$f0,$f6
        mul.d   $f10,$f0,$f8
        mul.d   $f12,$f0,$f10
        mul.d   $f14,$f0,$f12
        mul.d   $f16,$f0,$f14
        beq     $4,$10,$L38
        mul.d   $f20,$f0,$f16

This is the same basic block as $L38 in the first listing.
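To make it easier to match that listing against the source, here is a hand-written C sketch of the fully unrolled inner loop. It is only an illustration: the function names are hypothetical, and the mapping of p1 to $f0 is my reading of the assembly, not something GCC reports.

```c
#include <assert.h>

/* Hypothetical helper (not part of the original test case): the
   seven-iteration inner loop written out as a straight-line chain.
   p1 plays the role of poly[1], which appears to stay in a single
   register ($f0) in the listing. out[0] is left untouched, as in
   the source. */
void unrolled_chain(double p1, double out[9])
{
    out[1] = p1;
    out[2] = p1 * p1;       /* i1 = 2 */
    out[3] = p1 * out[2];   /* i1 = 3 */
    out[4] = p1 * out[3];   /* i1 = 4 */
    out[5] = p1 * out[4];   /* i1 = 5 */
    out[6] = p1 * out[5];   /* i1 = 6 */
    out[7] = p1 * out[6];   /* i1 = 7 */
    out[8] = p1 * out[7];   /* i1 = 8 */
}

/* Sanity check with a value whose powers are exactly representable. */
void check_unrolled_chain(void)
{
    double out[9] = {0};
    unrolled_chain(4.0, out);
    assert(out[2] == 16.0);
    assert(out[8] == 65536.0);
}
```

The seven multiplies in the sketch line up with the seven mul.d instructions in the listing, each feeding the next without a trip through memory.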

Following Jeff's advice[1] to extract more information from GCC, I've narrowed the cause down to the predictive commoning pass inserting the load in a loop-header-style basic block. However, the next pass in GCC, tree-cunroll, promptly removes the loop and joins the loop header to the body of the (non-)loop.

More oddly, disabling the conditional store elimination pass, or the dominator optimizations pass, or jump threading (with --param max-jump-thread-duplication-stmts=0) also yields the assembly code above. Any ideas on an approach for this issue?

[1] https://gcc.gnu.org/ml/gcc-help/2015-08/msg00162.html

Thanks,
Simon

double N;
int i1;
double T;
double poly[9];


void 
g (int iterations)
{ 
  int count = 0;
  for (count = 0; count < iterations; count++)
    {
      if (N > 1)
        {
          T = 1 / N;
        }
      else
        {
          T = N;
        }

      poly[1] = T * T;
      for (i1 = 2; i1 <= 8; i1++)
        {
          poly[i1] = poly[i1 - 1] * poly[1];
        }
    }

  return;
}
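For anyone not familiar with the pass, the following is a rough, hand-applied sketch of what predictive commoning aims for on the inner loop: the value stored in one iteration is carried in a scalar instead of being reloaded in the next. This is a simplification under my own assumptions, not the pass's actual output. The function names are hypothetical, the globals are replaced by parameters, and hoisting poly[1] into p1 is added only to keep the example short.

```c
#include <assert.h>

/* Inner loop as written in g(): each iteration reloads the element
   stored by the previous iteration. */
void fill_through_memory(double t, double out[9])
{
    out[1] = t * t;
    for (int i = 2; i <= 8; i++)
        out[i] = out[i - 1] * out[1];
}

/* Hand-applied approximation of predictive commoning: the previous
   element is kept in `prev` across iterations. */
void fill_carried(double t, double out[9])
{
    out[1] = t * t;
    double p1 = out[1];     /* loop-invariant operand (assumed hoisted) */
    double prev = out[1];   /* initializing load placed ahead of the loop */
    for (int i = 2; i <= 8; i++) {
        prev = prev * p1;   /* reuse last iteration's value */
        out[i] = prev;      /* the store remains; the reload is gone */
    }
}

/* Both forms should agree; use inputs whose powers are exact. */
void check_forms_agree(double t)
{
    double a[9] = {0}, b[9] = {0};
    fill_through_memory(t, a);
    fill_carried(t, b);
    for (int i = 1; i <= 8; i++)
        assert(a[i] == b[i]);
}
```

In this sketch the initializing load of `prev` reads back the value that was stored to out[1] on the line just before it, which looks like the same shape as the sdc1/ldc1 pair on 8($7) in the first listing.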

