[PATCH 3/3] x86: Update memcpy/memset inline strategies for -mtune=generic
Hongyu Wang
wwwhhhyyy333@gmail.com
Tue Mar 23 02:41:36 GMT 2021
> Hongyu, please collect code size differences on SPEC CPU 2017 and
> eembc.
Here are the code size differences for this patch:
SPEC CPU 2017
benchmark           difference   w/ patch  w/o patch
500.perlbench_r         0.051%    1622637    1621805
502.gcc_r               0.039%    6930877    6928141
505.mcf_r               0.098%      16413      16397
520.omnetpp_r           0.083%    1327757    1326653
523.xalancbmk_r         0.001%    3575709    3575677
525.x264_r             -0.067%     769095     769607
531.deepsjeng_r         0.071%      67629      67581
541.leela_r            -3.062%     127629     131661
548.exchange2_r        -0.338%      66141      66365
557.xz_r                0.946%     128061     126861
503.bwaves_r            0.534%      33117      32941
507.cactuBSSN_r         0.004%    2993645    2993517
508.namd_r              0.006%     851677     851629
510.parest_r            0.488%    6741277    6708557
511.povray_r           -0.021%     849290     849466
521.wrf_r               0.022%   29682154   29675530
526.blender_r           0.054%    7544057    7540009
527.cam4_r              0.043%    6102234    6099594
538.imagick_r          -0.015%    1625770    1626010
544.nab_r               0.155%     155453     155213
549.fotonik3d_r         0.000%     351757     351757
554.roms_r              0.041%     735837     735533
eembc
benchmark           difference   w/ patch  w/o patch
aifftr01                0.762%      14813      14701
aiifft01                0.556%      14477      14397
idctrn01                0.101%      15853      15837
cjpeg-rose7-preset      0.114%      56125      56061
nnet_test              -0.848%      35549      35853
aes                     0.125%      38493      38445
cjpegv2data             0.108%      59213      59149
djpegv2data             0.025%      63821      63805
huffde                 -0.104%      30621      30653
mp2decoddata           -0.047%      68285      68317
mp2enf32data1           0.018%      86925      86909
mp2enf32data2           0.018%      89357      89341
mp2enf32data3           0.018%      88253      88237
mp3playerfixeddata      0.103%      46877      46829
ip_pktcheckb1m          0.191%      25213      25165
nat                     0.527%      45757      45517
ospfv2                  0.196%      24573      24525
routelookup             0.189%      25389      25341
tcpbulk                 0.155%      30925      30877
textv2data              0.055%      29101      29085
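For reference, the difference column is the relative change of the patched size against the unpatched size, so a positive value means the patched build is larger. A minimal sketch of that arithmetic (the helper name size_diff_percent is only illustrative, not part of any tool used here):

```c
#include <stdio.h>

/* Relative code size change in percent.
   Positive: the patched build is larger; negative: it is smaller.
   size_diff_percent is a hypothetical helper name for this sketch. */
double size_diff_percent(long with_patch, long without_patch)
{
    return 100.0 * (double)(with_patch - without_patch)
                 / (double)without_patch;
}

/* Print one table row in the same "name difference w/ w/o" layout. */
void print_row(const char *name, long with_patch, long without_patch)
{
    printf("%-20s%9.3f%%  %9ld  %9ld\n", name,
           size_diff_percent(with_patch, without_patch),
           with_patch, without_patch);
}
```

Calling print_row("541.leela_r", 127629, 131661) reproduces the -3.062% entry, and print_row("500.perlbench_r", 1622637, 1621805) reproduces the 0.051% entry.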
H.J. Lu via Gcc-patches <gcc-patches@gcc.gnu.org> wrote on Mon, Mar 22, 2021 at 9:39 PM:
>
> On Mon, Mar 22, 2021 at 6:29 AM Richard Biener
> <richard.guenther@gmail.com> wrote:
> >
> > On Mon, Mar 22, 2021 at 2:19 PM H.J. Lu via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> > >
> > > > Simplify memcpy and memset inline strategies to avoid branches for
> > > -mtune=generic:
> > >
> > > 1. With MOVE_RATIO and CLEAR_RATIO == 17, GCC will use integer/vector
> > > load and store for up to 16 * 16 (256) bytes when the data size is
> > > fixed and known.
> > > 2. Inline only if data size is known to be <= 256.
> > > a. Use "rep movsb/stosb" with simple code sequence if the data size
> > > is a constant.
> > > b. Use loop if data size is not a constant.
> > > > 3. Use memcpy/memset library function if data size is unknown or > 256.
> > >
> > > With -mtune=generic -O2,
> >
> > Is there any visible code-size effect of increasing CLEAR_RATIO on
>
> Hongyu, please collect code size differences on SPEC CPU 2017 and
> eembc.
>
> > SPEC/eembc? Did you play with other values of MOVE/CLEAR_RATIO?
> > 17 memory-to-memory/memory-clear insns looks quite a lot.
> >
>
> Yes, we did. 256 bytes is the threshold above which memcpy/memset in libc
> win. Below 256 bytes, 16 by_pieces move/store is faster.
>
> --
> H.J.
--
Regards,
Hongyu, Wang