This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: Vectorization: Loop peeling with misaligned support.


On Sat, Nov 16, 2013 at 11:37:36AM +0100, Richard Biener wrote:
> "Ondřej Bílka" <neleai@seznam.cz> wrote:
> >On Fri, Nov 15, 2013 at 09:17:14AM -0800, Hendrik Greving wrote:
> 
> IIRC what can still be seen are store-buffer-related slowdowns when you have a big unaligned store load in your loop.  Thus aligning stores still paid off the last time I measured this.

Then send your benchmark. What I did is a loop that stores 512 bytes. Unaligned stores there are faster than aligned ones, so tell me when aligning stores pays for itself. Note that when counting what fills the store buffer you must also take into account the extra stores needed to make the loop aligned.
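To illustrate the point about extra stores, here is a minimal counting sketch. It assumes 16-byte vector stores and 4-byte elements (both are illustrative assumptions, not values from the benchmark above), and the helper names `stores_unaligned` and `stores_peeled` are hypothetical. It only counts store instructions; it says nothing about how long they take.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed sizes for illustration only: SSE-like 16-byte vectors, 4-byte ints. */
enum { VEC_BYTES = 16, ELEM_BYTES = 4 };

/* Stores issued when the region is written with unaligned vector stores,
   plus scalar stores for any tail shorter than one vector. */
static size_t stores_unaligned(size_t total_bytes)
{
    assert(total_bytes % ELEM_BYTES == 0);
    return total_bytes / VEC_BYTES + (total_bytes % VEC_BYTES) / ELEM_BYTES;
}

/* Stores issued when the loop is peeled so that vector stores are aligned:
   scalar prologue up to the next 16-byte boundary, aligned vector stores,
   then a scalar epilogue for what is left. */
static size_t stores_peeled(size_t total_bytes, size_t misalign_bytes)
{
    assert(total_bytes % ELEM_BYTES == 0);
    assert(misalign_bytes < VEC_BYTES && misalign_bytes % ELEM_BYTES == 0);

    size_t prologue_bytes = (VEC_BYTES - misalign_bytes) % VEC_BYTES;
    if (prologue_bytes > total_bytes)
        prologue_bytes = total_bytes;

    size_t rest = total_bytes - prologue_bytes;
    return prologue_bytes / ELEM_BYTES      /* scalar prologue stores */
         + rest / VEC_BYTES                 /* aligned vector stores  */
         + (rest % VEC_BYTES) / ELEM_BYTES; /* scalar epilogue stores */
}
```

Under these assumptions a 512-byte region takes 32 stores when written unaligned, while a buffer that starts 4 bytes past a 16-byte boundary takes 35 stores once peeled (3 scalar, 31 vector, 1 scalar).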

Also, what do you do with loops that contain no store? If I modify the test to

int set(int *p, int *q){
  int i;
  int sum = 0;
  for (i = 0; i < 128; i++)
    sum += 42 * p[i];
  return sum;
}

then gcc still peels the loop to align the loads.
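For readers unfamiliar with the transformation, the following is a hand-written sketch of what peeling for alignment amounts to on the load-only loop above. It is not gcc's actual output; the 16-byte vector width and the names `sum_plain` and `sum_peeled` are assumptions made for illustration, and the main loop is written as scalar code standing in for one aligned vector load per block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VEC_BYTES 16 /* assumed vector width, for illustration only */

/* Reference: the plain loop from the test above, with the length as a parameter. */
static int sum_plain(const int *p, size_t n)
{
    int sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += 42 * p[i];
    return sum;
}

/* Sketch of the same loop after peeling for alignment. */
static int sum_peeled(const int *p, size_t n)
{
    /* The prologue can only reach a 16-byte boundary if p is int-aligned. */
    assert((uintptr_t)p % _Alignof(int) == 0);

    const size_t lanes = VEC_BYTES / sizeof(int);
    int sum = 0;
    size_t i = 0;

    /* Prologue: scalar iterations until p + i sits on a 16-byte boundary. */
    while (i < n && (uintptr_t)(p + i) % VEC_BYTES != 0) {
        sum += 42 * p[i];
        i++;
    }

    /* Main loop: every block starts at an aligned address, so a vectorizer
       could use one aligned vector load per block. */
    for (; i + lanes <= n; i += lanes)
        for (size_t k = 0; k < lanes; k++)
            sum += 42 * p[i + k];

    /* Epilogue: leftover elements that do not fill a whole vector. */
    for (; i < n; i++)
        sum += 42 * p[i];

    return sum;
}
```

The prologue and epilogue are extra code and extra branches that exist only to make the main loop's accesses aligned, which is the cost being questioned here when unaligned loads are already cheap.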

There may be a threshold after which aligning the buffer makes sense; then you
need to show that loops spend most of their time on sizes above that threshold.

Also, do you have data on how common store-buffer slowdowns are? Without
knowing that, you risk making a few loops faster at the expense of the
majority, which could well slow the whole application down. It would not
surprise me, as these loops can mostly run on L1 cache data (which is
about as strong an assumption as assuming that the increased code size fits into the instruction cache.)


Actually, these questions could be answered by a test: first compile
SPEC2006 with vanilla gcc -O3 and then with a gcc that contains a patch to
use unaligned loads. The results will then tell whether peeling is also good in
practice or not.

