This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: New parameters to control stringop expansion libcall strategy
- From: Michael Zolotukhin <michael dot v dot zolotukhin at gmail dot com>
- To: Xinliang David Li <davidxl at google dot com>
- Cc: GCC Patches <gcc-patches at gcc dot gnu dot org>, Jan Hubicka <hubicka at ucw dot cz>, Teresa Johnson <tejohnson at google dot com>
- Date: Wed, 7 Aug 2013 11:14:14 +0400
- Subject: Re: New parameters to control stringop expansion libcall strategy
- References: <CAAkRFZ+muGUjANkKqbp8r4HvddywmRgz+xPbVAUbuU9rE7pC7Q at mail dot gmail dot com> <CAAkRFZ+eO3wq9vJhMG5FBc6Awb04+8PQKz44A0PJEpOo7TByfQ at mail dot gmail dot com> <20130805105658 dot GA5144 at msticlxl57 dot ims dot intel dot com> <CAAkRFZJNrW0WRJQwOUUz1NhHLXKiu44pwEkNSyJj70RNgCOY9w at mail dot gmail dot com> <CANtU07_Oi2xTDX_K71MEpgkGBFXAVXtLGmdTqPNX3Yf=C7tHqA at mail dot gmail dot com> <CAAkRFZKUHeSA1LqKmbb-AtkD81Y8XmC4wsL=gpUj6dQKNXAxPg at mail dot gmail dot com>
> The option is designed for purposes like this.
That's great, thanks!
Michael
> David
On 6 August 2013 20:42, Xinliang David Li <davidxl@google.com> wrote:
> Corrected two small problems reported by the style checker (the
> warnings about the EnumValue for options in stringop.opt are not
> valid).
>
> On Tue, Aug 6, 2013 at 1:46 AM, Michael Zolotukhin
> <michael.v.zolotukhin@gmail.com> wrote:
>> There are still some formatting issues (like 8 spaces instead of a
>> tab, wrong indentation of do-loop and some other places) - to reveal
>> some of them you could use contrib/check_GNU_style.sh script.
>> But that was nitpicking again :) Actually, I wanted to ask whether
>> you're going to use this option for some performance experiments
>> involving memmov/memset - if so, probably you could tune existing
>> cost-models as well? Is it possible?
>
> The option is designed for purposes like this.
>
> thanks,
>
> David
>
>>
>> Michael
>>
>> On 5 August 2013 20:44, Xinliang David Li <davidxl@google.com> wrote:
>>> thanks. Updated patch attached.
>>>
>>> David
>>>
>>> On Mon, Aug 5, 2013 at 3:57 AM, Michael V. Zolotukhin
>>> <michael.v.zolotukhin@gmail.com> wrote:
>>>> Hi,
>>>> This is a really convenient option, thanks for working on it.
>>>> I can't approve it as I'm not a maintainer, but it looks ok to me,
>>>> except for a small nitpick: AFAIR, comments should end with
>>>> dot-space-space.
>>>>
>>>> Michael
>>>>
>>>> On 04 Aug 20:01, Xinliang David Li wrote:
>>>>> The attached is a new patch implementing the stringop inline strategy
>>>>> control using two new -m options:
>>>>>
>>>>> -mmemcpy-strategy=
>>>>> -mmemset-strategy=
>>>>>
>>>>> See changes in doc/invoke.texi for a description of the new options. Example:
>>>>> -mmemcpy-strategy=rep_8byte:64:noalign,unrolled_loop:2048:noalign,libcall:-1:noalign
>>>>>
>>>>> tells the compiler to inline memcpy using rep_8byte when the size is no
>>>>> larger than 64 bytes, using unrolled_loop when the size is no larger than
>>>>> 2048 bytes, and to use a library call for sizes > 2048. In all cases,
>>>>> destination alignment adjustment is not done.
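As a reading aid, the size-to-algorithm mapping described above can be sketched in plain C. This is only an illustration: the names `size_range`, `pick_range` and the `ALG_*` enumerators are made up here and are not GCC internals; only the three thresholds come from the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for illustration; not GCC's internal types.  */
enum alg { ALG_REP_8BYTE, ALG_UNROLLED_LOOP, ALG_LIBCALL };

struct size_range
{
  long max;      /* Inclusive upper bound in bytes; -1 means "no limit".  */
  enum alg alg;  /* Algorithm used for sizes up to MAX.  */
  int noalign;   /* Nonzero: skip destination alignment adjustment.  */
};

/* The three triplets of the example: rep_8byte up to 64 bytes,
   unrolled_loop up to 2048 bytes, libcall for everything larger,
   none of them with alignment adjustment.  */
static const struct size_range example_ranges[] = {
  { 64, ALG_REP_8BYTE, 1 },
  { 2048, ALG_UNROLLED_LOOP, 1 },
  { -1, ALG_LIBCALL, 1 },
};

/* Return the first range whose upper bound covers SIZE.  The list must
   be in increasing order and end with a -1 bound.  */
static const struct size_range *
pick_range (const struct size_range *ranges, size_t n, long size)
{
  size_t i;

  assert (n > 0 && ranges[n - 1].max == -1);
  for (i = 0; i < n; i++)
    if (ranges[i].max == -1 || size <= ranges[i].max)
      return &ranges[i];
  return NULL;  /* Not reached when the last bound is -1.  */
}
```

With this table, `pick_range (example_ranges, 3, 65)` selects the unrolled_loop entry, since 65 is the first size past the 64-byte bound of the preceding triplet.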
>>>>>
>>>>> Tested on x86-64/linux. Ok for trunk?
>>>>>
>>>>> thanks,
>>>>>
>>>>> David
>>>>>
>>>>> 2013-08-02 Xinliang David Li <davidxl@google.com>
>>>>>
>>>>> * config/i386/stringop.def: New file.
>>>>> * config/i386/stringop.opt: New file.
>>>>> * config/i386/i386-opts.h: Include stringop.def.
>>>>> * config/i386/i386.opt: Include stringop.opt.
>>>>> * config/i386/i386.c (ix86_option_override_internal):
>>>>> Override default size based stringop inline strategies
>>>>> with options.
>>>>> * config/i386/i386.c (ix86_parse_stringop_strategy_string):
>>>>> New function.
>>>>>
>>>>> 2013-08-04 Xinliang David Li <davidxl@google.com>
>>>>>
>>>>> * testsuite/gcc.target/i386/memcpy-strategy-1.c: New test.
>>>>> * testsuite/gcc.target/i386/memcpy-strategy-2.c: Ditto.
>>>>> * testsuite/gcc.target/i386/memset-strategy-1.c: Ditto.
>>>>> * testsuite/gcc.target/i386/memcpy-strategy-3.c: Ditto.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Aug 2, 2013 at 9:21 PM, Xinliang David Li <davidxl@google.com> wrote:
>>>>> > On x86_64, when the expected size of memcpy/memset is known (e.g., with
>>>>> > FDO), the libcall strategy is used when the size is > 8192. This value is
>>>>> > hard-coded, which makes performance tuning difficult. This patch
>>>>> > adds two new parameters to control it. Potential usage includes
>>>>> > per-application libcall strategy min-size tuning based on summary data
>>>>> > with FDO (e.g., instruction working set size).
>>>>> >
>>>>> > Bootstrapped and tested on x86_64/linux. Ok for trunk?
>>>>> >
>>>>> > thanks,
>>>>> >
>>>>> > David
>>>>> >
>>>>> >
>>>>> > 2013-08-02 Xinliang David Li <davidxl@google.com>
>>>>> >
>>>>> > * params.def: New parameters.
>>>>> > * config/i386/i386.c (ix86_option_override_internal):
>>>>> > Override default libcall size limit with parameters.
>>>>
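The patch that follows keeps the list of algorithms in a single file (stringop.def) and expands it several times with different definitions of DEF_ALG, once for the enum and once for the table of names. A minimal standalone sketch of that pattern is below. It assumes a shortened list and uses an in-file macro, `ALG_LIST`, as a stand-in for the `#include`; `ALG_LIST`, `alg_names` and `lookup_alg` are names invented for this sketch.

```c
#include <assert.h>
#include <string.h>

/* Stand-in for an included .def file: one entry per algorithm, giving
   the enumerator and its user-visible name.  */
#define ALG_LIST \
  DEF_ALG (no_stringop, no_stringop) \
  DEF_ALG (libcall, libcall) \
  DEF_ALG (rep_prefix_8_byte, rep_8byte) \
  DEF_ALG (vector_loop, vector_loop)

/* First expansion: the enumerators, followed by an end marker.  */
#define DEF_ALG(alg, name) alg,
enum stringop_alg { ALG_LIST last_alg };
#undef DEF_ALG

/* Second expansion: the name table, in the same order as the enum.  */
#define DEF_ALG(alg, name) #name,
static const char *const alg_names[] = { ALG_LIST };
#undef DEF_ALG

/* Map a user-visible name to its enumerator; return last_alg when the
   name is unknown.  */
static enum stringop_alg
lookup_alg (const char *name)
{
  int i;

  for (i = 0; i < last_alg; i++)
    if (!strcmp (name, alg_names[i]))
      return (enum stringop_alg) i;
  return last_alg;
}
```

Because both expansions come from the same list, the enum and the name table cannot drift out of order, which is what lets the lookup cast the table index straight to an enumerator.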
>>>>> Index: config/i386/stringop.def
>>>>> ===================================================================
>>>>> --- config/i386/stringop.def (revision 0)
>>>>> +++ config/i386/stringop.def (revision 0)
>>>>> @@ -0,0 +1,42 @@
>>>>> +/* Definitions for option handling for IA-32.
>>>>> + Copyright (C) 2013 Free Software Foundation, Inc.
>>>>> +
>>>>> +This file is part of GCC.
>>>>> +
>>>>> +GCC is free software; you can redistribute it and/or modify
>>>>> +it under the terms of the GNU General Public License as published by
>>>>> +the Free Software Foundation; either version 3, or (at your option)
>>>>> +any later version.
>>>>> +
>>>>> +GCC is distributed in the hope that it will be useful,
>>>>> +but WITHOUT ANY WARRANTY; without even the implied warranty of
>>>>> +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>>>>> +GNU General Public License for more details.
>>>>> +
>>>>> +Under Section 7 of GPL version 3, you are granted additional
>>>>> +permissions described in the GCC Runtime Library Exception, version
>>>>> +3.1, as published by the Free Software Foundation.
>>>>> +
>>>>> +You should have received a copy of the GNU General Public License and
>>>>> +a copy of the GCC Runtime Library Exception along with this program;
>>>>> +see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
>>>>> +<http://www.gnu.org/licenses/>. */
>>>>> +
>>>>> +DEF_ENUM
>>>>> +DEF_ALG (no_stringop, no_stringop)
>>>>> +DEF_ENUM
>>>>> +DEF_ALG (libcall, libcall)
>>>>> +DEF_ENUM
>>>>> +DEF_ALG (rep_prefix_1_byte, rep_byte)
>>>>> +DEF_ENUM
>>>>> +DEF_ALG (rep_prefix_4_byte, rep_4byte)
>>>>> +DEF_ENUM
>>>>> +DEF_ALG (rep_prefix_8_byte, rep_8byte)
>>>>> +DEF_ENUM
>>>>> +DEF_ALG (loop_1_byte, byte_loop)
>>>>> +DEF_ENUM
>>>>> +DEF_ALG (loop, loop)
>>>>> +DEF_ENUM
>>>>> +DEF_ALG (unrolled_loop, unrolled_loop)
>>>>> +DEF_ENUM
>>>>> +DEF_ALG (vector_loop, vector_loop)
>>>>> Index: config/i386/i386.opt
>>>>> ===================================================================
>>>>> --- config/i386/i386.opt (revision 201458)
>>>>> +++ config/i386/i386.opt (working copy)
>>>>> @@ -316,6 +316,14 @@ mstack-arg-probe
>>>>> Target Report Mask(STACK_PROBE) Save
>>>>> Enable stack probing
>>>>>
>>>>> +mmemcpy-strategy=
>>>>> +Target RejectNegative Joined Var(ix86_tune_memcpy_strategy)
>>>>> +Specify memcpy expansion strategy when expected size is known
>>>>> +
>>>>> +mmemset-strategy=
>>>>> +Target RejectNegative Joined Var(ix86_tune_memset_strategy)
>>>>> +Specify memset expansion strategy when expected size is known
>>>>> +
>>>>> mstringop-strategy=
>>>>> Target RejectNegative Joined Enum(stringop_alg) Var(ix86_stringop_alg) Init(no_stringop)
>>>>> Chose strategy to generate stringop using
>>>>> Index: config/i386/stringop.opt
>>>>> ===================================================================
>>>>> --- config/i386/stringop.opt (revision 0)
>>>>> +++ config/i386/stringop.opt (revision 0)
>>>>> @@ -0,0 +1,36 @@
>>>>> +/* Definitions for option handling for IA-32.
>>>>> + Copyright (C) 2013 Free Software Foundation, Inc.
>>>>> +
>>>>> +This file is part of GCC.
>>>>> +
>>>>> +GCC is free software; you can redistribute it and/or modify
>>>>> +it under the terms of the GNU General Public License as published by
>>>>> +the Free Software Foundation; either version 3, or (at your option)
>>>>> +any later version.
>>>>> +
>>>>> +GCC is distributed in the hope that it will be useful,
>>>>> +but WITHOUT ANY WARRANTY; without even the implied warranty of
>>>>> +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>>>>> +GNU General Public License for more details.
>>>>> +
>>>>> +Under Section 7 of GPL version 3, you are granted additional
>>>>> +permissions described in the GCC Runtime Library Exception, version
>>>>> +3.1, as published by the Free Software Foundation.
>>>>> +
>>>>> +You should have received a copy of the GNU General Public License and
>>>>> +a copy of the GCC Runtime Library Exception along with this program;
>>>>> +see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
>>>>> +<http://www.gnu.org/licenses/>. */
>>>>> +
>>>>> +Enum(stringop_alg) String(rep_byte) Value(rep_prefix_1_byte)
>>>>> +
>>>>> +#undef DEF_ENUM
>>>>> +#define DEF_ENUM EnumValue
>>>>> +
>>>>> +#undef DEF_ALG
>>>>> +#define DEF_ALG(alg, name) Enum(stringop_alg) String(name) Value(alg)
>>>>> +
>>>>> +#include "stringop.def"
>>>>> +
>>>>> +#undef DEF_ENUM
>>>>> +#undef DEF_ALG
>>>>> Index: config/i386/i386.c
>>>>> ===================================================================
>>>>> --- config/i386/i386.c (revision 201458)
>>>>> +++ config/i386/i386.c (working copy)
>>>>> @@ -156,7 +156,7 @@ struct processor_costs ix86_size_cost =
>>>>> };
>>>>>
>>>>> /* Processor costs (relative to an add) */
>>>>> -static const
>>>>> +static
>>>>> struct processor_costs i386_cost = { /* 386 specific costs */
>>>>> COSTS_N_INSNS (1), /* cost of an add instruction */
>>>>> COSTS_N_INSNS (1), /* cost of a lea instruction */
>>>>> @@ -226,7 +226,7 @@ struct processor_costs i386_cost = { /*
>>>>> 1, /* cond_not_taken_branch_cost. */
>>>>> };
>>>>>
>>>>> -static const
>>>>> +static
>>>>> struct processor_costs i486_cost = { /* 486 specific costs */
>>>>> COSTS_N_INSNS (1), /* cost of an add instruction */
>>>>> COSTS_N_INSNS (1), /* cost of a lea instruction */
>>>>> @@ -298,7 +298,7 @@ struct processor_costs i486_cost = { /*
>>>>> 1, /* cond_not_taken_branch_cost. */
>>>>> };
>>>>>
>>>>> -static const
>>>>> +static
>>>>> struct processor_costs pentium_cost = {
>>>>> COSTS_N_INSNS (1), /* cost of an add instruction */
>>>>> COSTS_N_INSNS (1), /* cost of a lea instruction */
>>>>> @@ -368,7 +368,7 @@ struct processor_costs pentium_cost = {
>>>>> 1, /* cond_not_taken_branch_cost. */
>>>>> };
>>>>>
>>>>> -static const
>>>>> +static
>>>>> struct processor_costs pentiumpro_cost = {
>>>>> COSTS_N_INSNS (1), /* cost of an add instruction */
>>>>> COSTS_N_INSNS (1), /* cost of a lea instruction */
>>>>> @@ -447,7 +447,7 @@ struct processor_costs pentiumpro_cost =
>>>>> 1, /* cond_not_taken_branch_cost. */
>>>>> };
>>>>>
>>>>> -static const
>>>>> +static
>>>>> struct processor_costs geode_cost = {
>>>>> COSTS_N_INSNS (1), /* cost of an add instruction */
>>>>> COSTS_N_INSNS (1), /* cost of a lea instruction */
>>>>> @@ -518,7 +518,7 @@ struct processor_costs geode_cost = {
>>>>> 1, /* cond_not_taken_branch_cost. */
>>>>> };
>>>>>
>>>>> -static const
>>>>> +static
>>>>> struct processor_costs k6_cost = {
>>>>> COSTS_N_INSNS (1), /* cost of an add instruction */
>>>>> COSTS_N_INSNS (2), /* cost of a lea instruction */
>>>>> @@ -591,7 +591,7 @@ struct processor_costs k6_cost = {
>>>>> 1, /* cond_not_taken_branch_cost. */
>>>>> };
>>>>>
>>>>> -static const
>>>>> +static
>>>>> struct processor_costs athlon_cost = {
>>>>> COSTS_N_INSNS (1), /* cost of an add instruction */
>>>>> COSTS_N_INSNS (2), /* cost of a lea instruction */
>>>>> @@ -664,7 +664,7 @@ struct processor_costs athlon_cost = {
>>>>> 1, /* cond_not_taken_branch_cost. */
>>>>> };
>>>>>
>>>>> -static const
>>>>> +static
>>>>> struct processor_costs k8_cost = {
>>>>> COSTS_N_INSNS (1), /* cost of an add instruction */
>>>>> COSTS_N_INSNS (2), /* cost of a lea instruction */
>>>>> @@ -1265,7 +1265,7 @@ struct processor_costs btver2_cost = {
>>>>> 1, /* cond_not_taken_branch_cost. */
>>>>> };
>>>>>
>>>>> -static const
>>>>> +static
>>>>> struct processor_costs pentium4_cost = {
>>>>> COSTS_N_INSNS (1), /* cost of an add instruction */
>>>>> COSTS_N_INSNS (3), /* cost of a lea instruction */
>>>>> @@ -1336,7 +1336,7 @@ struct processor_costs pentium4_cost = {
>>>>> 1, /* cond_not_taken_branch_cost. */
>>>>> };
>>>>>
>>>>> -static const
>>>>> +static
>>>>> struct processor_costs nocona_cost = {
>>>>> COSTS_N_INSNS (1), /* cost of an add instruction */
>>>>> COSTS_N_INSNS (1), /* cost of a lea instruction */
>>>>> @@ -1409,7 +1409,7 @@ struct processor_costs nocona_cost = {
>>>>> 1, /* cond_not_taken_branch_cost. */
>>>>> };
>>>>>
>>>>> -static const
>>>>> +static
>>>>> struct processor_costs atom_cost = {
>>>>> COSTS_N_INSNS (1), /* cost of an add instruction */
>>>>> COSTS_N_INSNS (1) + 1, /* cost of a lea instruction */
>>>>> @@ -1556,7 +1556,7 @@ struct processor_costs slm_cost = {
>>>>> };
>>>>>
>>>>> /* Generic64 should produce code tuned for Nocona and K8. */
>>>>> -static const
>>>>> +static
>>>>> struct processor_costs generic64_cost = {
>>>>> COSTS_N_INSNS (1), /* cost of an add instruction */
>>>>> /* On all chips taken into consideration lea is 2 cycles and more. With
>>>>> @@ -1635,7 +1635,7 @@ struct processor_costs generic64_cost =
>>>>> };
>>>>>
>>>>> /* core_cost should produce code tuned for Core familly of CPUs. */
>>>>> -static const
>>>>> +static
>>>>> struct processor_costs core_cost = {
>>>>> COSTS_N_INSNS (1), /* cost of an add instruction */
>>>>> /* On all chips taken into consideration lea is 2 cycles and more. With
>>>>> @@ -1717,7 +1717,7 @@ struct processor_costs core_cost = {
>>>>>
>>>>> /* Generic32 should produce code tuned for PPro, Pentium4, Nocona,
>>>>> Athlon and K8. */
>>>>> -static const
>>>>> +static
>>>>> struct processor_costs generic32_cost = {
>>>>> COSTS_N_INSNS (1), /* cost of an add instruction */
>>>>> COSTS_N_INSNS (1) + 1, /* cost of a lea instruction */
>>>>> @@ -2900,6 +2900,150 @@ ix86_debug_options (void)
>>>>>
>>>>> return;
>>>>> }
>>>>> +
>>>>> +static const char *stringop_alg_names[] = {
>>>>> +#define DEF_ENUM
>>>>> +#define DEF_ALG(alg, name) #name,
>>>>> +#include "stringop.def"
>>>>> +#undef DEF_ENUM
>>>>> +#undef DEF_ALG
>>>>> +};
>>>>> +
>>>>> +/* Parse parameter string passed to -mmemcpy-strategy= or -mmemset-strategy=.
>>>>> + The string is of the following form (or comma separated list of it):
>>>>> +
>>>>> + strategy_alg:max_size:[align|noalign]
>>>>> +
>>>>> + where the full size range for the strategy is either [0, max_size] or
>>>>> + [min_size, max_size], in which min_size is the max_size + 1 of the
>>>>> + preceding range. The last size range must have max_size == -1.
>>>>> +
>>>>> + Examples:
>>>>> +
>>>>> + 1.
>>>>> + -mmemcpy-strategy=libcall:-1:noalign
>>>>> +
>>>>> + this is equivalent to (for known size memcpy) -mstringop-strategy=libcall
>>>>> +
>>>>> +
>>>>> + 2.
>>>>> + -mmemset-strategy=rep_8byte:16:noalign,vector_loop:2048:align,libcall:-1:noalign
>>>>> +
>>>>> + This tells the compiler to use the following strategy for memset:
>>>>> + 1) when the expected size is in [0, 16], use the rep_8byte strategy;
>>>>> + 2) when the size is in [17, 2048], use vector_loop;
>>>>> + 3) when the size is > 2048, use libcall.
>>>>> +
>>>>> +*/
>>>>> +
>>>>> +struct stringop_size_range
>>>>> +{
>>>>> + int min;
>>>>> + int max;
>>>>> + stringop_alg alg;
>>>>> + bool noalign;
>>>>> +};
>>>>> +
>>>>> +static void
>>>>> +ix86_parse_stringop_strategy_string (char *strategy_str, bool is_memset)
>>>>> +{
>>>>> + const struct stringop_algs *default_algs;
>>>>> + stringop_size_range input_ranges[MAX_STRINGOP_ALGS];
>>>>> + char *curr_range_str, *next_range_str;
>>>>> + int i = 0, n = 0;
>>>>> +
>>>>> + if (is_memset)
>>>>> + default_algs = &ix86_cost->memset[TARGET_64BIT != 0];
>>>>> + else
>>>>> + default_algs = &ix86_cost->memcpy[TARGET_64BIT != 0];
>>>>> +
>>>>> + curr_range_str = strategy_str;
>>>>> +
>>>>> + do {
>>>>> +
>>>>> + int mins = 0, maxs;
>>>>> + stringop_alg alg;
>>>>> + char alg_name[128];
>>>>> + char align[16];
>>>>> +
>>>>> + next_range_str = strchr (curr_range_str, ',');
>>>>> + if (next_range_str)
>>>>> + *next_range_str++ = '\0';
>>>>> +
>>>>> + if (3 != sscanf (curr_range_str, "%127[^:]:%d:%15s", alg_name, &maxs, align))
>>>>> + {
>>>>> + warning (0, "Wrong arg %s to option %s", curr_range_str,
>>>>> + is_memset ? "-mmemset-strategy=" : "-mmemcpy-strategy=");
>>>>> + return;
>>>>> + }
>>>>> +
>>>>> + if (n > 0 && (maxs < (mins = input_ranges[n - 1].max + 1) && maxs != -1))
>>>>> + {
>>>>> + warning (0, "Size ranges of option %s should be increasing",
>>>>> + is_memset ? "-mmemset-strategy=" : "-mmemcpy-strategy=");
>>>>> + return;
>>>>> + }
>>>>> +
>>>>> + for (i = 0; i < last_alg; i++)
>>>>> + {
>>>>> + if (!strcmp (alg_name, stringop_alg_names[i]))
>>>>> + {
>>>>> + alg = (stringop_alg) i;
>>>>> + break;
>>>>> + }
>>>>> + }
>>>>> +
>>>>> + if (i == last_alg)
>>>>> + {
>>>>> + warning (0, "Wrong stringop strategy name %s specified for option %s",
>>>>> + alg_name,
>>>>> + is_memset ? "-mmemset-strategy=" : "-mmemcpy-strategy=");
>>>>> + return;
>>>>> + }
>>>>> +
>>>>> + input_ranges[n].min = mins;
>>>>> + input_ranges[n].max = maxs;
>>>>> + input_ranges[n].alg = alg;
>>>>> + if (!strcmp (align, "align"))
>>>>> + input_ranges[n].noalign = false;
>>>>> + else if (!strcmp (align, "noalign"))
>>>>> + input_ranges[n].noalign = true;
>>>>> + else
>>>>> + {
>>>>> + warning (0, "Unknown alignment %s specified for option %s",
>>>>> + align, is_memset ? "-mmemset-strategy=" : "-mmemcpy-strategy=");
>>>>> + return;
>>>>> + }
>>>>> + n++;
>>>>> + curr_range_str = next_range_str;
>>>>> + } while (curr_range_str);
>>>>> +
>>>>> + if (input_ranges[n - 1].max != -1)
>>>>> + {
>>>>> + warning (0, "The max value for the last size range should be -1"
>>>>> + " for option %s",
>>>>> + is_memset ? "-mmemset-strategy=" : "-mmemcpy-strategy=");
>>>>> + return;
>>>>> + }
>>>>> +
>>>>> + if (n > MAX_STRINGOP_ALGS)
>>>>> + {
>>>>> + warning (0, "Too many size ranges specified in option %s",
>>>>> + is_memset ? "-mmemset-strategy=" : "-mmemcpy-strategy=");
>>>>> + return;
>>>>> + }
>>>>> +
>>>>> + /* Now override the default algs array. */
>>>>> + for (i = 0; i < n; i++)
>>>>> + {
>>>>> + *const_cast<int *>(&default_algs->size[i].max) = input_ranges[i].max;
>>>>> + *const_cast<stringop_alg *>(&default_algs->size[i].alg)
>>>>> + = input_ranges[i].alg;
>>>>> + *const_cast<int *>(&default_algs->size[i].noalign)
>>>>> + = input_ranges[i].noalign;
>>>>> + }
>>>>> +}
>>>>> +
>>>>>
>>>>> /* Override various settings based on options. If MAIN_ARGS_P, the
>>>>> options are from the command line, otherwise they are from
>>>>> @@ -4021,6 +4165,21 @@ ix86_option_override_internal (bool main
>>>>> /* Handle stack protector */
>>>>> if (!global_options_set.x_ix86_stack_protector_guard)
>>>>> ix86_stack_protector_guard = TARGET_HAS_BIONIC ? SSP_GLOBAL : SSP_TLS;
>>>>> +
>>>>> + /* Handle -mmemcpy-strategy= and -mmemset-strategy=. */
>>>>> + if (ix86_tune_memcpy_strategy)
>>>>> + {
>>>>> + char *str = xstrdup (ix86_tune_memcpy_strategy);
>>>>> + ix86_parse_stringop_strategy_string (str, false);
>>>>> + free (str);
>>>>> + }
>>>>> +
>>>>> + if (ix86_tune_memset_strategy)
>>>>> + {
>>>>> + char *str = xstrdup (ix86_tune_memset_strategy);
>>>>> + ix86_parse_stringop_strategy_string (str, true);
>>>>> + free (str);
>>>>> + }
>>>>> }
>>>>>
>>>>> /* Implement the TARGET_OPTION_OVERRIDE hook. */
>>>>> @@ -22903,6 +23062,7 @@ ix86_expand_movmem (rtx dst, rtx src, rt
>>>>> {
>>>>> case libcall:
>>>>> case no_stringop:
>>>>> + case last_alg:
>>>>> gcc_unreachable ();
>>>>> case loop_1_byte:
>>>>> need_zero_guard = true;
>>>>> @@ -23093,6 +23253,7 @@ ix86_expand_movmem (rtx dst, rtx src, rt
>>>>> {
>>>>> case libcall:
>>>>> case no_stringop:
>>>>> + case last_alg:
>>>>> gcc_unreachable ();
>>>>> case loop_1_byte:
>>>>> case loop:
>>>>> @@ -23304,6 +23465,7 @@ ix86_expand_setmem (rtx dst, rtx count_e
>>>>> {
>>>>> case libcall:
>>>>> case no_stringop:
>>>>> + case last_alg:
>>>>> gcc_unreachable ();
>>>>> case loop:
>>>>> need_zero_guard = true;
>>>>> @@ -23481,6 +23643,7 @@ ix86_expand_setmem (rtx dst, rtx count_e
>>>>> {
>>>>> case libcall:
>>>>> case no_stringop:
>>>>> + case last_alg:
>>>>> gcc_unreachable ();
>>>>> case loop_1_byte:
>>>>> case loop:
>>>>> Index: config/i386/i386-opts.h
>>>>> ===================================================================
>>>>> --- config/i386/i386-opts.h (revision 201458)
>>>>> +++ config/i386/i386-opts.h (working copy)
>>>>> @@ -28,15 +28,17 @@ see the files COPYING3 and COPYING.RUNTI
>>>>> /* Algorithm to expand string function with. */
>>>>> enum stringop_alg
>>>>> {
>>>>> - no_stringop,
>>>>> - libcall,
>>>>> - rep_prefix_1_byte,
>>>>> - rep_prefix_4_byte,
>>>>> - rep_prefix_8_byte,
>>>>> - loop_1_byte,
>>>>> - loop,
>>>>> - unrolled_loop,
>>>>> - vector_loop
>>>>> +#undef DEF_ENUM
>>>>> +#define DEF_ENUM
>>>>> +
>>>>> +#undef DEF_ALG
>>>>> +#define DEF_ALG(alg, name) alg,
>>>>> +
>>>>> +#include "stringop.def"
>>>>> +last_alg
>>>>> +
>>>>> +#undef DEF_ENUM
>>>>> +#undef DEF_ALG
>>>>> };
>>>>>
>>>>> /* Available call abi. */
>>>>> Index: doc/invoke.texi
>>>>> ===================================================================
>>>>> --- doc/invoke.texi (revision 201458)
>>>>> +++ doc/invoke.texi (working copy)
>>>>> @@ -649,6 +649,7 @@ Objective-C and Objective-C++ Dialects}.
>>>>> -mbmi2 -mrtm -mlwp -mthreads @gol
>>>>> -mno-align-stringops -minline-all-stringops @gol
>>>>> -minline-stringops-dynamically -mstringop-strategy=@var{alg} @gol
>>>>> +-mmemcpy-strategy=@var{strategy} -mmemset-strategy=@var{strategy} @gol
>>>>> -mpush-args -maccumulate-outgoing-args -m128bit-long-double @gol
>>>>> -m96bit-long-double -mlong-double-64 -mlong-double-80 @gol
>>>>> -mregparm=@var{num} -msseregparm @gol
>>>>> @@ -14598,6 +14599,24 @@ Expand into an inline loop.
>>>>> Always use a library call.
>>>>> @end table
>>>>>
>>>>> +@item -mmemcpy-strategy=@var{strategy}
>>>>> +@opindex mmemcpy-strategy=@var{strategy}
>>>>> +Override the internal decision heuristic to decide if @code{__builtin_memcpy}
>>>>> +should be inlined and what inline algorithm to use when the expected size
>>>>> +of the copy operation is known. @var{strategy}
>>>>> +is a comma-separated list of @var{alg}:@var{max_size}:@var{dest_align} triplets.
>>>>> +@var{alg} is one of the names accepted by @option{-mstringop-strategy};
>>>>> +@var{max_size} is the largest size in bytes for which @var{alg} is used;
>>>>> +@var{dest_align} is @code{align} or @code{noalign}. For the last triplet,
>>>>> +@var{max_size} must be @code{-1}. The @var{max_size} values must be given in
>>>>> +increasing order. The minimum size for a triplet is @code{0} for the first
>>>>> +triplet and @code{@var{max_size} + 1} of the preceding triplet otherwise.
>>>>> +
>>>>> +@item -mmemset-strategy=@var{strategy}
>>>>> +@opindex mmemset-strategy=@var{strategy}
>>>>> +The option is similar to @option{-mmemcpy-strategy=} except that it is to control
>>>>> +@code{__builtin_memset} expansion.
>>>>> +
>>>>> @item -momit-leaf-frame-pointer
>>>>> @opindex momit-leaf-frame-pointer
>>>>> Don't keep the frame pointer in a register for leaf functions. This
>>>>> Index: testsuite/gcc.target/i386/memcpy-strategy-1.c
>>>>> ===================================================================
>>>>> --- testsuite/gcc.target/i386/memcpy-strategy-1.c (revision 0)
>>>>> +++ testsuite/gcc.target/i386/memcpy-strategy-1.c (revision 0)
>>>>> @@ -0,0 +1,12 @@
>>>>> +/* { dg-do compile } */
>>>>> +/* { dg-options "-O2 -march=atom -mmemcpy-strategy=vector_loop:-1:align" } */
>>>>> +/* { dg-final { scan-assembler-times "movdqa" 8 { target { ! { ia32 } } } } } */
>>>>> +/* { dg-final { scan-assembler-times "movdqa" 4 { target { ia32 } } } } */
>>>>> +
>>>>> +char a[2048];
>>>>> +char b[2048];
>>>>> +void t (void)
>>>>> +{
>>>>> + __builtin_memcpy (a, b, 2048);
>>>>> +}
>>>>> +
>>>>> Index: testsuite/gcc.target/i386/memcpy-strategy-2.c
>>>>> ===================================================================
>>>>> --- testsuite/gcc.target/i386/memcpy-strategy-2.c (revision 0)
>>>>> +++ testsuite/gcc.target/i386/memcpy-strategy-2.c (revision 0)
>>>>> @@ -0,0 +1,12 @@
>>>>> +/* { dg-do compile } */
>>>>> +/* { dg-options "-O2 -march=atom -mmemcpy-strategy=vector_loop:3000:align,libcall:-1:align" } */
>>>>> +/* { dg-final { scan-assembler-times "movdqa" 8 { target { ! { ia32 } } } } } */
>>>>> +/* { dg-final { scan-assembler-times "movdqa" 4 { target { ia32 } } } } */
>>>>> +
>>>>> +char a[2048];
>>>>> +char b[2048];
>>>>> +void t (void)
>>>>> +{
>>>>> + __builtin_memcpy (a, b, 2048);
>>>>> +}
>>>>> +
>>>>> Index: testsuite/gcc.target/i386/memset-strategy-1.c
>>>>> ===================================================================
>>>>> --- testsuite/gcc.target/i386/memset-strategy-1.c (revision 0)
>>>>> +++ testsuite/gcc.target/i386/memset-strategy-1.c (revision 0)
>>>>> @@ -0,0 +1,10 @@
>>>>> +/* { dg-do compile } */
>>>>> +/* { dg-options "-O2 -march=atom -mmemset-strategy=libcall:-1:align" } */
>>>>> +/* { dg-final { scan-assembler-times "memset" 2 } } */
>>>>> +
>>>>> +char a[2048];
>>>>> +void t (void)
>>>>> +{
>>>>> + __builtin_memset (a, 1, 2048);
>>>>> +}
>>>>> +
>>>>> Index: testsuite/gcc.target/i386/memcpy-strategy-3.c
>>>>> ===================================================================
>>>>> --- testsuite/gcc.target/i386/memcpy-strategy-3.c (revision 0)
>>>>> +++ testsuite/gcc.target/i386/memcpy-strategy-3.c (revision 0)
>>>>> @@ -0,0 +1,11 @@
>>>>> +/* { dg-do compile } */
>>>>> +/* { dg-options "-O2 -march=atom -mmemcpy-strategy=vector_loop:2000:align,libcall:-1:align" } */
>>>>> +/* { dg-final { scan-assembler-times "memcpy" 2 } } */
>>>>> +
>>>>> +char a[2048];
>>>>> +char b[2048];
>>>>> +void t (void)
>>>>> +{
>>>>> + __builtin_memcpy (a, b, 2048);
>>>>> +}
>>>>> +
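For experimenting with the option syntax outside the compiler, the format handled by the parser in the patch above (comma-separated alg:max_size:align|noalign triplets, last max_size equal to -1) can be reproduced in a small standalone C function. This is a sketch under stated assumptions, not the code from the patch: `parsed_range`, `parse_strategy` and the limit `MAX_RANGES` are invented here, algorithm names are stored without being validated, and bounds are required to be strictly increasing.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_RANGES 4  /* Assumed limit for this sketch.  */

struct parsed_range
{
  char alg[32];  /* Algorithm name as written, e.g. "vector_loop".  */
  int max;       /* Inclusive upper bound; -1 on the last entry.  */
  int noalign;   /* 1 for "noalign", 0 for "align".  */
};

/* Parse STR (modified in place) into OUT.  Return the number of ranges
   stored, or -1 on any error.  */
static int
parse_strategy (char *str, struct parsed_range out[MAX_RANGES])
{
  int n = 0;
  char *cur = str;

  while (cur)
    {
      char align[16];
      char *next = strchr (cur, ',');

      if (next)
        *next++ = '\0';
      if (n == MAX_RANGES)
        return -1;  /* Checked before anything is written to OUT[n].  */
      /* Field widths keep sscanf within the two character buffers.  */
      if (sscanf (cur, "%31[^:]:%d:%15s", out[n].alg, &out[n].max, align) != 3)
        return -1;
      if (!strcmp (align, "align"))
        out[n].noalign = 0;
      else if (!strcmp (align, "noalign"))
        out[n].noalign = 1;
      else
        return -1;
      if (out[n].max < -1)
        return -1;
      /* Bounds must increase; -1 is only valid on the last triplet.  */
      if (n > 0 && out[n].max != -1 && out[n].max <= out[n - 1].max)
        return -1;
      if (out[n].max == -1 && next)
        return -1;
      n++;
      cur = next;
    }
  if (n == 0 || out[n - 1].max != -1)
    return -1;
  return n;
}
```

The sketch checks the range count before storing each entry and gives sscanf explicit field widths, so an overlong list or an overlong name is rejected rather than written past the end of a buffer.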
>>>>
>>
>>
>>
>> --
>> ---
>> Best regards,
>> Michael V. Zolotukhin,
>> Software Engineer
>> Intel Corporation.
--
---
Best regards,
Michael V. Zolotukhin,
Software Engineer
Intel Corporation.