[PATCH 3/3] x86: Update memcpy/memset inline strategies for -mtune=generic

Tue Mar 23 02:41:36 GMT 2021

> Hongyue, please collect code size differences on SPEC CPU 2017 and
> eembc.

Here are the code size differences for this patch:

SPEC CPU 2017
                    difference     w patch   w/o patch
500.perlbench_r         0.051%     1622637     1621805
502.gcc_r               0.039%     6930877     6928141
505.mcf_r               0.098%       16413       16397
520.omnetpp_r           0.083%     1327757     1326653
523.xalancbmk_r         0.001%     3575709     3575677
525.x264_r             -0.067%      769095      769607
531.deepsjeng_r         0.071%       67629       67581
541.leela_r            -3.062%      127629      131661
548.exchange2_r        -0.338%       66141       66365
557.xz_r                0.946%      128061      126861

503.bwaves_r            0.534%       33117       32941
507.cactuBSSN_r         0.004%     2993645     2993517
508.namd_r              0.006%      851677      851629
510.parest_r            0.488%     6741277     6708557
511.povray_r           -0.021%      849290      849466
521.wrf_r               0.022%    29682154    29675530
526.blender_r           0.054%     7544057     7540009
527.cam4_r              0.043%     6102234     6099594
538.imagick_r          -0.015%     1625770     1626010
544.nab_r               0.155%      155453      155213
549.fotonik3d_r         0.000%      351757      351757
554.roms_r              0.041%      735837      735533

EEMBC
                    difference     w patch   w/o patch
aifftr01                0.762%       14813       14701
aiifft01                0.556%       14477       14397
idctrn01                0.101%       15853       15837
cjpeg-rose7-preset      0.114%       56125       56061
nnet_test              -0.848%       35549       35853
aes                     0.125%       38493       38445
cjpegv2data             0.108%       59213       59149
djpegv2data             0.025%       63821       63805
huffde                 -0.104%       30621       30653
mp2decoddata           -0.047%       68285       68317
mp2enf32data1           0.018%       86925       86909
mp2enf32data2           0.018%       89357       89341
mp2enf32data3           0.018%       88253       88237
mp3playerfixeddata      0.103%       46877       46829
ip_pktcheckb1m          0.191%       25213       25165
nat                     0.527%       45757       45517
ospfv2                  0.196%       24573       24525
routelookup             0.189%       25389       25341
tcpbulk                 0.155%       30925       30877
textv2data              0.055%       29101       29085
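
For anyone reading the tables: the "difference" values are consistent with
the size change taken relative to the build without the patch. A minimal
sketch of that calculation follows; the helper name is made up here for
illustration and is not part of the patch.

```c
/* Hypothetical helper: percentage change of the patched size relative to
   the unpatched size.  A negative result means the patch shrinks the code. */
double size_diff_percent(long with_patch, long without_patch)
{
    return 100.0 * (double)(with_patch - without_patch)
           / (double)without_patch;
}
```

For example, 500.perlbench_r grows by 832 out of 1621805, which rounds to
the 0.051% shown in its row.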

H.J. Lu via Gcc-patches <gcc-patches@gcc.gnu.org> wrote on Mon, Mar 22, 2021 at 9:39 PM:
>
> On Mon, Mar 22, 2021 at 6:29 AM Richard Biener
> <richard.guenther@gmail.com> wrote:
> >
> > On Mon, Mar 22, 2021 at 2:19 PM H.J. Lu via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> > >
> > > > Simplify memcpy and memset inline strategies to avoid branches for
> > > -mtune=generic:
> > >
> > > 1. With MOVE_RATIO and CLEAR_RATIO == 17, GCC will use integer/vector
> > >    load and store for up to 16 * 16 (256) bytes when the data size is
> > >    fixed and known.
> > > 2. Inline only if data size is known to be <= 256.
> > >    a. Use "rep movsb/stosb" with simple code sequence if the data size
> > >       is a constant.
> > >    b. Use loop if data size is not a constant.
> > > > 3. Use memcpy/memset library function if data size is unknown or > 256.
> > >
> > > With -mtune=generic -O2,
> >
> > Is there any visible code-size effect of increasing CLEAR_RATIO on
>
> Hongyue, please collect code size differences on SPEC CPU 2017 and
> eembc.
>
> > SPEC/eembc?  Did you play with other values of MOVE/CLEAR_RATIO?
> > 17 memory-to-memory/memory-clear insns looks quite a lot.
> >
>
> Yes, we did.  256 bytes is the threshold above which memcpy/memset in libc
> wins.  Below 256 bytes, 16 by_pieces moves/stores are faster.
>
> --
> H.J.
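
For reference, the three cases in the quoted description can be exercised
with a small test file like the one below. This is only a sketch: the
function and struct names are hypothetical, the 128-byte size is arbitrary,
and using an unsigned char length as the way to make the size "known to be
<= 256" is my assumption, not something taken from the patch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical test functions, not taken from the patch or its testsuite. */

struct blob { char bytes[128]; };   /* arbitrary constant size <= 256 */

/* Constant size <= 256: candidate for the inline sequences in items 1/2a. */
void copy_constant(struct blob *dst, const struct blob *src)
{
    memcpy(dst, src, sizeof *dst);
}

/* Non-constant size bounded to 0..255 by its type: candidate for item 2b. */
void clear_bounded(char *dst, unsigned char n)
{
    memset(dst, 0, n);
}

/* Unbounded size: item 3, expected to stay a library call. */
void copy_unknown(char *dst, const char *src, size_t n)
{
    memcpy(dst, src, n);
}
```

Compiling this with "gcc -O2 -mtune=generic -S", with and without the patch,
and comparing the assembly should show which strategy each function gets.
The exact output will depend on the compiler revision.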

-- 
Regards,

Hongyu, Wang