Scheduling an early complete loop unrolling pass?

Richard Guenther rguenther@suse.de
Mon Feb 5 15:25:00 GMT 2007


Hi,

currently with -ftree-vectorize we generate for

  for (i=0; i<3; ++i)
  # SFT.4346_507 = VDEF <SFT.4346_504(D)>
  # SFT.4347_508 = VDEF <SFT.4347_505(D)>
  # SFT.4348_509 = VDEF <SFT.4348_506(D)>
    d[i] = 0.0;

  for (j=0; j<n; ++j)
    x[j] = d;

(that is, zero a small vector and use that to initialize an array
of vectors)

<L266>:;
  vect_cst_.4501_723 = { 0.0, 0.0 };
  vect_p.4506_724 = (vector double *) &D.76822;
  vect_p.4502_725 = vect_p.4506_724;

  # ivtmp.4508_728 = PHI <0(6), ivtmp.4508_729(11)>
  # ivtmp.4507_726 = PHI <vect_p.4502_725(6), ivtmp.4507_727(11)>
  # ivtmp.4461_601 = PHI <3(6), ivtmp.4461_485(11)>
  # SFT.4348_612 = PHI <SFT.4348_506(D)(6), SFT.4348_509(11)>
  # SFT.4347_611 = PHI <SFT.4347_505(D)(6), SFT.4347_508(11)>
  # SFT.4346_610 = PHI <SFT.4346_504(D)(6), SFT.4346_507(11)>
  # i_582 = PHI <0(6), i_118(11)>
<L131>:;
  # SFT.4346_507 = VDEF <SFT.4346_610>
  # SFT.4347_508 = VDEF <SFT.4347_611>
  # SFT.4348_509 = VDEF <SFT.4348_612>
  *ivtmp.4507_726 = vect_cst_.4501_723;
  i_118 = i_582 + 1;
  ivtmp.4461_485 = ivtmp.4461_601 - 1;
  ivtmp.4507_727 = ivtmp.4507_726 + 16B;
  ivtmp.4508_729 = ivtmp.4508_728 + 1;
  if (ivtmp.4508_729 < 1) goto <L171>; else goto <L263>;

  # i_722 = PHI <i_118(7)>
  # ivtmp.4461_717 = PHI <ivtmp.4461_485(7)>
<L263>:;

  # ivtmp.4461_706 = PHI <ivtmp.4461_715(10), 1(8)>
  # SFT.4348_707 = PHI <SFT.4348_713(10), SFT.4348_509(8)>
  # SFT.4347_708 = PHI <SFT.4347_712(10), SFT.4347_508(8)>
  # SFT.4346_709 = PHI <SFT.4346_711(10), SFT.4346_507(8)>
  # i_710 = PHI <i_714(10), 2(8)>
<L260>:;
  # SFT.4346_711 = VDEF <SFT.4346_709>
  # SFT.4347_712 = VDEF <SFT.4347_708>
  # SFT.4348_713 = VDEF <SFT.4348_707>
  D.76822.D.44378.values[i_710] = 0.0;
  i_714 = i_710 + 1;
  ivtmp.4461_715 = ivtmp.4461_706 - 1;
  if (ivtmp.4461_715 != 0) goto <L259>; else goto <L264>;

...

and we are later not able to propagate the constant into the
second loop, which we could do if we first completely unrolled
such small loops.
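
To make the intended effect concrete, here is a source-level sketch of
what completely unrolling the first loop amounts to. The type name vec3
and the function names are made up for illustration; the dump only
shows that the object has a "values" member. In the dump above the
vectorizer instead emits one two-element vector store plus a scalar
epilogue loop for the remaining element, which is what seems to hide
the constant from later passes.

```c
#include <stddef.h>

/* Hypothetical 3-element vector type (name assumed for illustration). */
typedef struct { double values[3]; } vec3;

/* Shape of the original code: a loop with a small constant trip count
   zeroes d, then d is copied into every element of x. */
void fill_rolled(vec3 *x, size_t n)
{
    vec3 d;
    for (int i = 0; i < 3; ++i)
        d.values[i] = 0.0;
    for (size_t j = 0; j < n; ++j)
        x[j] = d;
}

/* Hand-unrolled equivalent: the first loop becomes three straight-line
   stores of a constant, so d is visibly all zeros before the second
   loop and the constant can be propagated into it. */
void fill_unrolled(vec3 *x, size_t n)
{
    vec3 d;
    d.values[0] = 0.0;
    d.values[1] = 0.0;
    d.values[2] = 0.0;
    for (size_t j = 0; j < n; ++j)
        x[j] = d;
}
```

Both functions compute the same result; the second is only meant to
show the form an early complete unrolling pass would hand to the
later passes.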

As we also only vectorize innermost loops, I believe doing a
complete unrolling pass early will help in general (I pushed
for this some time ago).

Thoughts?

Thanks,
Richard.

-- 
Richard Guenther <rguenther@suse.de>
Novell / SUSE Labs


