This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Scheduling an early complete loop unrolling pass?
- From: Richard Guenther <rguenther at suse dot de>
- To: gcc at gcc dot gnu dot org
- Cc: dorit at il dot ibm dot com
- Date: Mon, 5 Feb 2007 16:27:03 +0100 (CET)
- Subject: Scheduling an early complete loop unrolling pass?
Hi,
currently, with -ftree-vectorize, we generate for
  for (i=0; i<3; ++i)
    # SFT.4346_507 = VDEF <SFT.4346_504(D)>
    # SFT.4347_508 = VDEF <SFT.4347_505(D)>
    # SFT.4348_509 = VDEF <SFT.4348_506(D)>
    d[i] = 0.0;
  for (j=0; j<n; ++j)
    x[j] = d;
(that is, zero a small vector and use it to initialize an array
of vectors) the following:
<L266>:;
  vect_cst_.4501_723 = { 0.0, 0.0 };
  vect_p.4506_724 = (vector double *) &D.76822;
  vect_p.4502_725 = vect_p.4506_724;

  # ivtmp.4508_728 = PHI <0(6), ivtmp.4508_729(11)>
  # ivtmp.4507_726 = PHI <vect_p.4502_725(6), ivtmp.4507_727(11)>
  # ivtmp.4461_601 = PHI <3(6), ivtmp.4461_485(11)>
  # SFT.4348_612 = PHI <SFT.4348_506(D)(6), SFT.4348_509(11)>
  # SFT.4347_611 = PHI <SFT.4347_505(D)(6), SFT.4347_508(11)>
  # SFT.4346_610 = PHI <SFT.4346_504(D)(6), SFT.4346_507(11)>
  # i_582 = PHI <0(6), i_118(11)>
<L131>:;
  # SFT.4346_507 = VDEF <SFT.4346_610>
  # SFT.4347_508 = VDEF <SFT.4347_611>
  # SFT.4348_509 = VDEF <SFT.4348_612>
  *ivtmp.4507_726 = vect_cst_.4501_723;
  i_118 = i_582 + 1;
  ivtmp.4461_485 = ivtmp.4461_601 - 1;
  ivtmp.4507_727 = ivtmp.4507_726 + 16B;
  ivtmp.4508_729 = ivtmp.4508_728 + 1;
  if (ivtmp.4508_729 < 1) goto <L171>; else goto <L263>;

  # i_722 = PHI <i_118(7)>
  # ivtmp.4461_717 = PHI <ivtmp.4461_485(7)>
<L263>:;

  # ivtmp.4461_706 = PHI <ivtmp.4461_715(10), 1(8)>
  # SFT.4348_707 = PHI <SFT.4348_713(10), SFT.4348_509(8)>
  # SFT.4347_708 = PHI <SFT.4347_712(10), SFT.4347_508(8)>
  # SFT.4346_709 = PHI <SFT.4346_711(10), SFT.4346_507(8)>
  # i_710 = PHI <i_714(10), 2(8)>
<L260>:;
  # SFT.4346_711 = VDEF <SFT.4346_709>
  # SFT.4347_712 = VDEF <SFT.4347_708>
  # SFT.4348_713 = VDEF <SFT.4348_707>
  D.76822.D.44378.values[i_710] = 0.0;
  i_714 = i_710 + 1;
  ivtmp.4461_715 = ivtmp.4461_706 - 1;
  if (ivtmp.4461_715 != 0) goto <L259>; else goto <L264>;
...
and later we are not able to constant-propagate into the second
loop, which we can do if we first completely unroll such small loops.
Since we also only vectorize innermost loops, I believe running a
complete unrolling pass early will help in general (I pushed
for this some time ago).
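To make the difference concrete, here is a hand-written, source-level C sketch (not GCC output). It assumes d is a plain struct of three doubles; the vec3 type and the function names are hypothetical stand-ins for D.76822 and the loops above. zero_vectorized models what the dump does: the vector loop runs once (ivtmp.4508 goes from 0 to 1) and does a single 16-byte store covering values[0] and values[1], and the scalar epilogue starts at i = 2 and stores values[2]. memcpy stands in for the `vector double *` store. zero_unrolled is what complete unrolling of the 3-iteration loop would leave behind.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the 3-element object D.76822 in the dump. */
typedef struct { double values[3]; } vec3;

/* Model of the vectorized code: a one-iteration vector loop doing a single
   16-byte store of {0.0, 0.0} over values[0..1] (memcpy stands in for the
   store through a vector double pointer), then a scalar epilogue for the
   remaining element values[2]. */
static void zero_vectorized(vec3 *d)
{
  static const double vect_cst[2] = { 0.0, 0.0 };
  for (int v = 0; v < 1; ++v)          /* vector loop: exactly one iteration */
    memcpy(&d->values[2 * v], vect_cst, sizeof vect_cst);
  for (int i = 2; i < 3; ++i)          /* scalar epilogue: one iteration */
    d->values[i] = 0.0;
}

/* What complete unrolling of "for (i=0; i<3; ++i) d[i] = 0.0;" yields:
   three straight-line scalar stores of a compile-time constant. */
static void zero_unrolled(vec3 *d)
{
  d->values[0] = 0.0;
  d->values[1] = 0.0;
  d->values[2] = 0.0;
}

/* The second loop from the mail: copy d into every element of x. */
static void fill(vec3 *x, int n, const vec3 *d)
{
  for (int j = 0; j < n; ++j)
    x[j] = *d;
}
```

Both functions leave the same values in d (for example, `assert(memcmp(&a, &b, sizeof a) == 0)` holds after calling each on a separate vec3). The difference is only in the form the optimizers see: after unrolling, each element of d is set by an ordinary scalar store of 0.0, which a later propagation pass can presumably forward into fill, whereas the vectorized form writes two of the three elements through one wide store via a cast pointer.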
Thoughts?
Thanks,
Richard.
--
Richard Guenther <rguenther@suse.de>
Novell / SUSE Labs