restrict leaving byte copies unoptimized
Dan Dickerman
dan09@trueindiemedia.com
Thu Mar 18 05:11:00 GMT 2010
> Good questions, but I don't know all the answers. The nop
> instructions are there because the gcc instruction scheduler is
> creating groups which are intended to be optimal for the instruction
> dispatcher. You should make sure that you are using a -mtune option
> that corresponds to the processor you are using, to make sure that gcc
> is doing something that is appropriate there.
>
Indeed, changing to -mtune=G4 (or G3) gets rid of the nops that are
present in the -mtune=G5 and default versions. The output still has
the stalling load/store pairs, though, rather than the interleaved
load/load/.../store/store sequence seen with the word-op code:
...
lbz r0,1(r2)
stb r0,1(r9)
lbz r11,2(r2)
stb r11,2(r9)
lbz r0,3(r2)
stb r0,3(r9)
lbz r11,4(r2)
stb r11,4(r9)
lbz r0,5(r2)
stb r0,5(r9)
...
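For reference, here is a minimal C sketch of the kind of byte copy being
discussed. The actual test function is not shown in this message, so the
names copy8_bytes and copy8_loads_first and the fixed 8-byte length are
assumptions for illustration only. The second variant spells out the
load/load/.../store/store order by hand; whether gcc keeps that order
depends on the version and the -mtune setting.

```c
#include <stddef.h>

/* Hypothetical byte copy: each store directly follows its load.
 * The restrict qualifiers promise the compiler that dst and src
 * do not overlap, so it is free to reorder loads ahead of stores. */
void copy8_bytes(unsigned char *restrict dst,
                 const unsigned char *restrict src)
{
    size_t i;
    for (i = 0; i < 8; i++)
        dst[i] = src[i];
}

/* Hypothetical hand-interleaved variant: all loads first, then all
 * stores. Given non-overlapping buffers this yields the same result
 * as copy8_bytes. */
void copy8_loads_first(unsigned char *restrict dst,
                       const unsigned char *restrict src)
{
    unsigned char b0 = src[0], b1 = src[1], b2 = src[2], b3 = src[3];
    unsigned char b4 = src[4], b5 = src[5], b6 = src[6], b7 = src[7];

    dst[0] = b0; dst[1] = b1; dst[2] = b2; dst[3] = b3;
    dst[4] = b4; dst[5] = b5; dst[6] = b6; dst[7] = b7;
}
```

Compiling both with something like "gcc -std=c99 -O2 -S -mtune=G4" and
comparing the listings is one way to see whether the scheduler
interleaves the byte loads on its own. The -std=c99 is there because
older gcc versions only accept the plain restrict keyword in C99 mode.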
> gcc has gotten steadily better support for the restrict qualifier, but
> it still doesn't work as well as it should. In gcc 4.2 it did very
> little.
>
It does make me wonder whether the restrict qualifier optimizations were
simply done for word operations and not for byte operations, whether
there are good reasons for this, or whether something about the G5
architecture prefers nops in the pipeline to stalled load/store sequences.
Interesting stuff; thanks for the pointers,
--
Dan