This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)


On Thu, Aug 18, 2011 at 1:03 AM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Wed, Aug 17, 2011 at 7:54 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>> On Wed, Aug 17, 2011 at 12:37 AM, Richard Guenther
>> <richard.guenther@gmail.com> wrote:
>>> On Tue, Aug 16, 2011 at 10:50 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>>> Support for getting CPU type and feature information at run-time.
>>>>
>>>> The following patch provides support for finding the platform type at run-time, like cpu type and features supported. The multi-versioning framework will use the builtins added to dispatch the right function version. Please refer to http://gcc.gnu.org/ml/gcc/2011-08/msg00298.html for details on function multi-versioning usability.
>>>
>>> Please provide an overview why you need the new builtins,
>>
>> For multi-versioning, the compiler can call the appropriate builtin
>> to dispatch the right version. The builtin call will later get folded.
>>
>> For example,
>>
>> int __attribute__ version ("sse4_1")
>> compute ()
>> {
>>   // Do sse4_1 specific implementation.
>> }
>>
>> int
>> compute ()
>> {
>>  // Generic implementation
>> }
>>
>> The compiler will check if the target supports the attribute and then
>> convert a call to compute () into this:
>>
>> if (__builtin_target_supports_sse4_1 ())
>>  compute_sse4_1 (); // Call to the SSE4_1 implementation
>> else
>>  compute_generic (); // Call to the generic implementation
>>
>> Further, having it as builtin function allows it to be overridden by
>> the programmer. For instance, the programmer can override it to
>> identify newer CPU types not yet supported. Having these builtins
>> makes it convenient to identify platform type and features in general.
>>
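As a rough illustration of the dispatch described above, here is a minimal sketch in plain C. It assumes a hypothetical global, dispatch_features, standing in for the flags libgcc would set, and a hypothetical helper, target_supports_sse4_1_sketch (), standing in for the proposed builtin. Neither name is part of the patch, and the return values 41 and 0 are only markers that show which version ran.

```c
/* Hypothetical stand-in for the feature flags that libgcc would fill in
   from CPUID; not the structure used by the patch.  */
struct dispatch_features_sketch
{
  unsigned int sse4_1 : 1;
};

struct dispatch_features_sketch dispatch_features;

/* Hypothetical stand-in for __builtin_target_supports_sse4_1 ().  After
   folding, the builtin would become a read of a flag like this one.  */
int
target_supports_sse4_1_sketch (void)
{
  return dispatch_features.sse4_1;
}

/* The two versions of compute ().  The return values are arbitrary
   markers chosen for this sketch.  */
int
compute_sse4_1 (void)
{
  return 41;
}

int
compute_generic (void)
{
  return 0;
}

/* What a call to compute () would be rewritten into.  */
int
compute_dispatch (void)
{
  if (target_supports_sse4_1_sketch ())
    return compute_sse4_1 ();
  return compute_generic ();
}
```

Calling compute_dispatch () with dispatch_features.sse4_1 cleared takes the generic path; setting the flag switches it to the SSE4_1 path without touching the caller.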
>>> why you need
>>> a separate pass to fold them (instead of just expanding them) and why
>>
>> I can move it into builtins.c along with where other builtins are
>> folded and remove the separate pass. My intention originally was to
>> fold them as early as possible, in this case after multi-versioning
>> but I guess this is not a requirement.
>
> Yes, they should be folded by targetm.fold_builtin instead.  The Frontend
> should simply fold the tests at the time it creates them, that's as early
> as possible (gimplification will also re-fold all builtin function calls).
>
>>> you are creating
>>> vars behind the back of GCC:
>>
>> The flow I had in mind was to have functions in libgcc which will use
>> CPUID to get target features and set global vars corresponding to the
>> features. So, the builtin should be folded into the appropriate
>> variable in libgcc.
>
> Hm, but then the variable should reside in libgcc and you'd only need
> an extern variant in the varpool.  I'm not sure separate constructors
> (possibly in each module ...) would be better than a single one in
> libgcc that would get run unconditionally.
>
>>>
>>> +  /* Set finalized to 1, otherwise it asserts in function "write_symbol" in
>>> +     lto-streamer-out.c. */
>>> +  vnode->finalized = 1;
>>>
>>> where I think you miss a varpool_finalize_node call somewhere.  Why
>>> isn't this all done at target init time
>>
>> I wanted to do this on demand. If none of the new builtins are called
>> in the program, I do not need to do this at all. In summary, libgcc
>> has a function called __cpu_indicator_init which does the work of
>> determining target features and setting the appropriate globals. If
>> the new builtins are called, gcc will call __cpu_indicator_init in a
>> constructor so that it is called exactly once. Then, gcc will fold the
>> builtin to the appropriate global variable.
>
> I see, but this sounds like premature optimization to me, no?  Considering
> you'd do this in each module and our inability to merge those constructors
> at link time.  If we put __cpu_indicator, the constructor and the assorted
> support into a separate module inside libgcc.a could we arrange it in a way
> that if __cpu_indicator is not referenced from the program that piece isn't
> linked in?  (not sure if that is possible with constructors)

Ok, so two things. I create the constructor as a comdat. So, it is
created by gcc in every module but at link time only one copy will be
kept. So, it is going to be called only once and that is not a
problem. The other thing is that I can eliminate all of this code gen
in gcc and mark this as a constructor in libgcc, which means it
will always be linked in and always be called once at run-time. There
is no easy way right now to garbage collect unreferenced ctors at
run-time. I do not have a strong opinion on this and I can do the
latter.
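
To make the second option concrete, here is a minimal sketch of a constructor that fills in feature flags once at start-up. It is not the code in the patch. The names ctor_features, ctor_init_calls and cpu_indicator_init_sketch are hypothetical, the counter exists only to show that the function runs once, and the CPUID part is compiled only for x86 targets with a GCC-compatible compiler that provides cpuid.h. The comdat variant is not shown: there each module would carry a copy of such a constructor and the linker would keep one.

```c
#include <string.h>

#if defined (__GNUC__) && (defined (__i386__) || defined (__x86_64__))
#include <cpuid.h>
#define SKETCH_HAVE_CPUID 1
#else
#define SKETCH_HAVE_CPUID 0
#endif

/* Hypothetical stand-in for the feature flags kept in libgcc.  */
struct ctor_features_sketch
{
  unsigned int sse2 : 1;
  unsigned int sse4_1 : 1;
};

struct ctor_features_sketch ctor_features;

/* Counts how often the constructor ran; for illustration only.  */
int ctor_init_calls;

/* Runs once before main because of the constructor attribute.  On
   targets without CPUID support the flags simply stay zero.  */
__attribute__ ((constructor)) void
cpu_indicator_init_sketch (void)
{
  ctor_init_calls++;
  memset (&ctor_features, 0, sizeof ctor_features);

#if SKETCH_HAVE_CPUID
  {
    unsigned int eax, ebx, ecx, edx;

    /* CPUID level 1 reports the feature bits in ECX and EDX.  */
    if (!__get_cpuid (1, &eax, &ebx, &ecx, &edx))
      return;

    ctor_features.sse2 = (edx & bit_SSE2) ? 1 : 0;
    ctor_features.sse4_1 = (ecx & bit_SSE4_1) ? 1 : 0;
  }
#endif
}
```

With this layout, code that tests a feature only reads ctor_features; the cost is that the constructor is linked in and run even when nothing reads the flags.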

>
> Richard.
>
>>
>>> If you don't mark the
>>> variable as to be preserved
>>> like you do cgraph will optimize it all away if it isn't needed.
>>
>>>
>>> Richard.
>>>
>>>>         * tree-pass.h (pass_tree_fold_builtin_target): New pass.
>>>>         * builtins.def (BUILT_IN_TARGET_SUPPORTS_CMOV): New builtin.
>>>>         (BUILT_IN_TARGET_SUPPORTS_MMX): New builtin.
>>>>         (BUILT_IN_TARGET_SUPPORTS_POPCOUNT): New builtin.
>>>>         (BUILT_IN_TARGET_SUPPORTS_SSE): New builtin.
>>>>         (BUILT_IN_TARGET_SUPPORTS_SSE2): New builtin.
>>>>         (BUILT_IN_TARGET_SUPPORTS_SSE3): New builtin.
>>>>         (BUILT_IN_TARGET_SUPPORTS_SSSE3): New builtin.
>>>>         (BUILT_IN_TARGET_SUPPORTS_SSE4_1): New builtin.
>>>>         (BUILT_IN_TARGET_SUPPORTS_SSE4_2): New builtin.
>>>>         (BUILT_IN_TARGET_IS_AMD): New builtin.
>>>>         (BUILT_IN_TARGET_IS_INTEL): New builtin.
>>>>         (BUILT_IN_TARGET_IS_COREI7_NEHALEM): New builtin.
>>>>         (BUILT_IN_TARGET_IS_COREI7_WESTMERE): New builtin.
>>>>         (BUILT_IN_TARGET_IS_COREI7_SANDYBRIDGE): New builtin.
>>>>         (BUILT_IN_TARGET_IS_AMDFAM10_BARCELONA): New builtin.
>>>>         (BUILT_IN_TARGET_IS_AMDFAM10_SHANGHAI): New builtin.
>>>>         (BUILT_IN_TARGET_IS_AMDFAM10_ISTANBUL): New builtin.
>>>>         * mversn-dispatch.c (do_fold_builtin_target): New function.
>>>>         (gate_fold_builtin_target): New function.
>>>>         (pass_tree_fold_builtin_target): New pass.
>>>>         * timevar.def (TV_FOLD_BUILTIN_TARGET): New var.
>>>>         * passes.c (init_optimization_passes): Add new pass to pass list.
>>>>         * config/i386/i386.c (build_struct_with_one_bit_fields): New function.
>>>>         (make_var_decl): New function.
>>>>         (get_field_from_struct): New function.
>>>>         (make_constructor_to_get_target_type): New function.
>>>>         (fold_builtin_target): New function.
>>>>         (ix86_fold_builtin): New function.
>>>>         (TARGET_FOLD_BUILTIN): New macro.
>>>>
>>>>         * gcc.dg/builtin_target.c: New test.
>>>>
>>>>         * config/i386/i386-cpuinfo.c: New file.
>>>>         * config/i386/t-cpuinfo: New file.
>>>>         * config.host: Add t-cpuinfo to link i386-cpuinfo.o with libgcc
>>>>
>>>> Index: libgcc/config.host
>>>> ===================================================================
>>>> --- libgcc/config.host  (revision 177767)
>>>> +++ libgcc/config.host  (working copy)
>>>> @@ -609,7 +609,7 @@ case ${host} in
>>>>  i[34567]86-*-linux* | x86_64-*-linux* | \
>>>>   i[34567]86-*-kfreebsd*-gnu | i[34567]86-*-knetbsd*-gnu | \
>>>>   i[34567]86-*-gnu*)
>>>> -       tmake_file="${tmake_file} t-tls"
>>>> +       tmake_file="${tmake_file} t-tls i386/t-cpuinfo"
>>>>         if test "$libgcc_cv_cfi" = "yes"; then
>>>>                 tmake_file="${tmake_file} t-stack i386/t-stack-i386"
>>>>         fi
>>>> Index: libgcc/config/i386/t-cpuinfo
>>>> ===================================================================
>>>> --- libgcc/config/i386/t-cpuinfo ? ? ? ?(revision 0)
>>>> +++ libgcc/config/i386/t-cpuinfo ? ? ? ?(revision 0)
>>>> @@ -0,0 +1,2 @@
>>>> +# This is an endfile
>>>> +LIB2ADD += $(srcdir)/config/i386/i386-cpuinfo.c
>>>> Index: libgcc/config/i386/i386-cpuinfo.c
>>>> ===================================================================
>>>> --- libgcc/config/i386/i386-cpuinfo.c ? (revision 0)
>>>> +++ libgcc/config/i386/i386-cpuinfo.c ? (revision 0)
>>>> @@ -0,0 +1,275 @@
>>>> +/* Copyright (C) 2011 Free Software Foundation, Inc.
>>>> + * Contributed by Sriraman Tallam <tmsriram@google.com>.
>>>> + *
>>>> + * This file is free software; you can redistribute it and/or modify it
>>>> + * under the terms of the GNU General Public License as published by the
>>>> + * Free Software Foundation; either version 3, or (at your option) any
>>>> + * later version.
>>>> + *
>>>> + * This file is distributed in the hope that it will be useful, but
>>>> + * WITHOUT ANY WARRANTY; without even the implied warranty of
>>>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>>>> + * General Public License for more details.
>>>> + *
>>>> + * Under Section 7 of GPL version 3, you are granted additional
>>>> + * permissions described in the GCC Runtime Library Exception, version
>>>> + * 3.1, as published by the Free Software Foundation.
>>>> + *
>>>> + * You should have received a copy of the GNU General Public License and
>>>> + * a copy of the GCC Runtime Library Exception along with this program;
>>>> + * see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
>>>> + * <http://www.gnu.org/licenses/>.
>>>> + *
>>>> + *
>>>> + * This code is adapted from gcc/config/i386/driver-i386.c. The CPUID
>>>> + * instruction is used to figure out the cpu type and supported features.
>>>> + * GCC runs __cpu_indicator_init from a constructor which sets the members
>>>> + * of __cpu_model and __cpu_features.
>>>> + */
>>>> +
>>>> +#include <string.h>
>>>> +
>>>> +#ifdef __GNUC__
>>>> +#include "cpuid.h"
>>>> +
>>>> +enum processor_type
>>>> +{
>>>> + ?PROCESSOR_PENTIUM = 0,
>>>> + ?PROCESSOR_CORE2,
>>>> + ?PROCESSOR_COREI7_NEHALEM,
>>>> + ?PROCESSOR_COREI7_WESTMERE,
>>>> + ?PROCESSOR_COREI7_SANDYBRIDGE,
>>>> + ?PROCESSOR_INTEL_GENERIC,
>>>> + ?PROCESSOR_AMDFAM10_BARCELONA,
>>>> + ?PROCESSOR_AMDFAM10_SHANGHAI,
>>>> + ?PROCESSOR_AMDFAM10_ISTANBUL,
>>>> + ?PROCESSOR_AMDFAM10_GENERIC,
>>>> + ?PROCESSOR_AMD_GENERIC,
>>>> + ?PROCESSOR_GENERIC,
>>>> + ?PROCESSOR_max
>>>> +};
>>>> +
>>>> +enum vendor_signatures
>>>> +{
>>>> + ?SIG_INTEL = ?0x756e6547 /* Genu */,
>>>> + ?SIG_AMD = ? ?0x68747541 /* Auth */
>>>> +};
>>>> +
>>>> +
>>>> +/* Features supported. */
>>>> +
>>>> +struct __processor_features
>>>> +{
>>>> + ?unsigned int __cpu_cmov : 1;
>>>> + ?unsigned int __cpu_mmx : 1;
>>>> + ?unsigned int __cpu_popcnt : 1;
>>>> + ?unsigned int __cpu_sse : 1;
>>>> + ?unsigned int __cpu_sse2 : 1;
>>>> + ?unsigned int __cpu_sse3 : 1;
>>>> + ?unsigned int __cpu_ssse3 : 1;
>>>> + ?unsigned int __cpu_sse4_1 : 1;
>>>> + ?unsigned int __cpu_sse4_2 : 1;
>>>> +};
>>>> +
>>>> +/* Flags exported. */
>>>> +
>>>> +struct __processor_model
>>>> +{
>>>> + ?unsigned int __cpu_is_amd : 1;
>>>> + ?unsigned int __cpu_is_intel : 1;
>>>> + ?unsigned int __cpu_is_corei7_nehalem : 1;
>>>> + ?unsigned int __cpu_is_corei7_westmere : 1;
>>>> + ?unsigned int __cpu_is_corei7_sandybridge : 1;
>>>> + ?unsigned int __cpu_is_amdfam10_barcelona : 1;
>>>> + ?unsigned int __cpu_is_amdfam10_shanghai : 1;
>>>> + ?unsigned int __cpu_is_amdfam10_istanbul : 1;
>>>> +};
>>>> +
>>>> +enum processor_type __cpu_type = PROCESSOR_GENERIC;
>>>> +struct __processor_features __cpu_features;
>>>> +struct __processor_model __cpu_model;
>>>> +
>>>> +static void
>>>> +get_amd_cpu (unsigned int family, unsigned int model)
>>>> +{
>>>> + ?switch (family)
>>>> + ? ?{
>>>> + ? ?case 0x10:
>>>> + ? ? ?switch (model)
>>>> + ? ? ? {
>>>> + ? ? ? case 0x2:
>>>> + ? ? ? ? __cpu_type = PROCESSOR_AMDFAM10_BARCELONA;
>>>> + ? ? ? ? __cpu_model.__cpu_is_amdfam10_barcelona = 1;
>>>> + ? ? ? ? break;
>>>> + ? ? ? case 0x4:
>>>> + ? ? ? ? __cpu_type = PROCESSOR_AMDFAM10_SHANGHAI;
>>>> + ? ? ? ? __cpu_model.__cpu_is_amdfam10_shanghai = 1;
>>>> + ? ? ? ? break;
>>>> + ? ? ? case 0x8:
>>>> + ? ? ? ? __cpu_type = PROCESSOR_AMDFAM10_ISTANBUL;
>>>> + ? ? ? ? __cpu_model.__cpu_is_amdfam10_istanbul = 1;
>>>> + ? ? ? ? break;
>>>> + ? ? ? default:
>>>> + ? ? ? ? __cpu_type = PROCESSOR_AMDFAM10_GENERIC;
>>>> + ? ? ? ? break;
>>>> + ? ? ? }
>>>> + ? ? ?break;
>>>> + ? ?default:
>>>> + ? ? ?__cpu_type = PROCESSOR_AMD_GENERIC;
>>>> + ? ?}
>>>> +}
>>>> +
>>>> +static void
>>>> +get_intel_cpu (unsigned int family, unsigned int model, unsigned int brand_id)
>>>> +{
>>>> + ?/* Parse family and model only if brand ID is 0. */
>>>> + ?if (brand_id == 0)
>>>> + ? ?{
>>>> + ? ? ?switch (family)
>>>> + ? ? ? {
>>>> + ? ? ? case 0x5:
>>>> + ? ? ? ? __cpu_type = PROCESSOR_PENTIUM;
>>>> + ? ? ? ? break;
>>>> + ? ? ? case 0x6:
>>>> + ? ? ? ? switch (model)
>>>> + ? ? ? ? ? {
>>>> + ? ? ? ? ? case 0x1a:
>>>> + ? ? ? ? ? case 0x1e:
>>>> + ? ? ? ? ? case 0x1f:
>>>> + ? ? ? ? ? case 0x2e:
>>>> + ? ? ? ? ? ? /* Nehalem. ?*/
>>>> + ? ? ? ? ? ? __cpu_type = PROCESSOR_COREI7_NEHALEM;
>>>> + ? ? ? ? ? ? __cpu_model.__cpu_is_corei7_nehalem = 1;
>>>> + ? ? ? ? ? ? break;
>>>> + ? ? ? ? ? case 0x25:
>>>> + ? ? ? ? ? case 0x2c:
>>>> + ? ? ? ? ? case 0x2f:
>>>> + ? ? ? ? ? ? /* Westmere. ?*/
>>>> + ? ? ? ? ? ? __cpu_type = PROCESSOR_COREI7_WESTMERE;
>>>> + ? ? ? ? ? ? __cpu_model.__cpu_is_corei7_westmere = 1;
>>>> + ? ? ? ? ? ? break;
>>>> + ? ? ? ? ? case 0x2a:
>>>> + ? ? ? ? ? ? /* Sandy Bridge. ?*/
>>>> + ? ? ? ? ? ? __cpu_type = PROCESSOR_COREI7_SANDYBRIDGE;
>>>> + ? ? ? ? ? ? __cpu_model.__cpu_is_corei7_sandybridge = 1;
>>>> + ? ? ? ? ? ? break;
>>>> + ? ? ? ? ? case 0x17:
>>>> + ? ? ? ? ? case 0x1d:
>>>> + ? ? ? ? ? ? /* Penryn. ?*/
>>>> + ? ? ? ? ? case 0x0f:
>>>> + ? ? ? ? ? ? /* Merom. ?*/
>>>> + ? ? ? ? ? ? __cpu_type = PROCESSOR_CORE2;
>>>> + ? ? ? ? ? ? break;
>>>> + ? ? ? ? ? default:
>>>> + ? ? ? ? ? ? __cpu_type = PROCESSOR_INTEL_GENERIC;
>>>> + ? ? ? ? ? ? break;
>>>> + ? ? ? ? ? }
>>>> + ? ? ? ? break;
>>>> + ? ? ? default:
>>>> + ? ? ? ? /* We have no idea. ?*/
>>>> + ? ? ? ? __cpu_type = PROCESSOR_INTEL_GENERIC;
>>>> + ? ? ? ? break;
>>>> + ? ? ? }
>>>> + ? ?}
>>>> +}
>>>> +
>>>> +static void
>>>> +get_available_features (unsigned int ecx, unsigned int edx)
>>>> +{
>>>> + ?__cpu_features.__cpu_cmov = (edx & bit_CMOV) ? 1 : 0;
>>>> + ?__cpu_features.__cpu_mmx = (edx & bit_MMX) ? 1 : 0;
>>>> + ?__cpu_features.__cpu_sse = (edx & bit_SSE) ? 1 : 0;
>>>> + ?__cpu_features.__cpu_sse2 = (edx & bit_SSE2) ? 1 : 0;
>>>> + ?__cpu_features.__cpu_popcnt = (ecx & bit_POPCNT) ? 1 : 0;
>>>> + ?__cpu_features.__cpu_sse3 = (ecx & bit_SSE3) ? 1 : 0;
>>>> + ?__cpu_features.__cpu_ssse3 = (ecx & bit_SSSE3) ? 1 : 0;
>>>> + ?__cpu_features.__cpu_sse4_1 = (ecx & bit_SSE4_1) ? 1 : 0;
>>>> + ?__cpu_features.__cpu_sse4_2 = (ecx & bit_SSE4_2) ? 1 : 0;
>>>> +}
>>>> +
>>>> +/* A noinline function calling __get_cpuid. Having many calls to
>>>> +   cpuid in one function in 32-bit mode causes GCC to complain:
>>>> +   "can't find a register in class 'CLOBBERED_REGS'".  This is
>>>> +   related to PR rtl-optimization 44174. */
>>>> +
>>>> +static int __attribute__ ((noinline))
>>>> +__get_cpuid_output (unsigned int __level,
>>>> + ? ? ? ? ? ? ? ? ? unsigned int *__eax, unsigned int *__ebx,
>>>> + ? ? ? ? ? ? ? ? ? unsigned int *__ecx, unsigned int *__edx)
>>>> +{
>>>> + ?return __get_cpuid (__level, __eax, __ebx, __ecx, __edx);
>>>> +}
>>>> +
>>>> +/* This function will be linked in to binaries that need to look up
>>>> + ? CPU information. ?*/
>>>> +
>>>> +void
>>>> +__cpu_indicator_init(void)
>>>> +{
>>>> + ?unsigned int eax, ebx, ecx, edx;
>>>> +
>>>> + ?int max_level = 5;
>>>> + ?unsigned int vendor;
>>>> + ?unsigned int model, family, brand_id;
>>>> +
>>>> + ?memset (&__cpu_features, 0, sizeof (struct __processor_features));
>>>> + ?memset (&__cpu_model, 0, sizeof (struct __processor_model));
>>>> +
>>>> + ?/* Assume cpuid insn present. Run in level 0 to get vendor id. */
>>>> + ?if (!__get_cpuid_output (0, &eax, &ebx, &ecx, &edx))
>>>> + ? ?return;
>>>> +
>>>> + ?vendor = ebx;
>>>> + ?max_level = eax;
>>>> +
>>>> + ?if (max_level < 1)
>>>> + ? ?return;
>>>> +
>>>> + ?if (!__get_cpuid_output (1, &eax, &ebx, &ecx, &edx))
>>>> + ? ?return;
>>>> +
>>>> + ?model = (eax >> 4) & 0x0f;
>>>> + ?family = (eax >> 8) & 0x0f;
>>>> + ?brand_id = ebx & 0xff;
>>>> +
>>>> +  /* Adjust model and family; AMD 0x10 is base 0x0f + extended 0x01. */
>>>> +  if (vendor == SIG_INTEL || vendor == SIG_AMD)
>>>> + ? ?{
>>>> + ? ? ?unsigned int extended_model, extended_family;
>>>> +
>>>> + ? ? ?extended_model = (eax >> 12) & 0xf0;
>>>> + ? ? ?extended_family = (eax >> 20) & 0xff;
>>>> + ? ? ?if (family == 0x0f)
>>>> + ? ? ? {
>>>> + ? ? ? ? family += extended_family;
>>>> + ? ? ? ? model += extended_model;
>>>> + ? ? ? }
>>>> + ? ? ?else if (family == 0x06)
>>>> + ? ? ? model += extended_model;
>>>> + ? ?}
>>>> +
>>>> + ?/* Find CPU model. */
>>>> +
>>>> + ?if (vendor == SIG_AMD)
>>>> + ? ?{
>>>> + ? ? ?__cpu_model.__cpu_is_amd = 1;
>>>> + ? ? ?get_amd_cpu (family, model);
>>>> + ? ?}
>>>> + ?else if (vendor == SIG_INTEL)
>>>> + ? ?{
>>>> + ? ? ?__cpu_model.__cpu_is_intel = 1;
>>>> + ? ? ?get_intel_cpu (family, model, brand_id);
>>>> + ? ?}
>>>> +
>>>> + ?/* Find available features. */
>>>> + ?get_available_features (ecx, edx);
>>>> +}
>>>> +
>>>> +#else
>>>> +
>>>> +void
>>>> +__cpu_indicator_init(void)
>>>> +{
>>>> +}
>>>> +
>>>> +#endif /* __GNUC__ */
>>>> Index: gcc/tree-pass.h
>>>> ===================================================================
>>>> --- gcc/tree-pass.h ? ? (revision 177767)
>>>> +++ gcc/tree-pass.h ? ? (working copy)
>>>> @@ -449,6 +449,7 @@ extern struct gimple_opt_pass pass_split_functions
>>>> ?extern struct gimple_opt_pass pass_feedback_split_functions;
>>>> ?extern struct gimple_opt_pass pass_threadsafe_analyze;
>>>> ?extern struct gimple_opt_pass pass_tree_convert_builtin_dispatch;
>>>> +extern struct gimple_opt_pass pass_tree_fold_builtin_target;
>>>>
>>>> ?/* IPA Passes */
>>>> ?extern struct simple_ipa_opt_pass pass_ipa_lower_emutls;
>>>> Index: gcc/testsuite/gcc.dg/builtin_target.c
>>>> ===================================================================
>>>> --- gcc/testsuite/gcc.dg/builtin_target.c ? ? ? (revision 0)
>>>> +++ gcc/testsuite/gcc.dg/builtin_target.c ? ? ? (revision 0)
>>>> @@ -0,0 +1,49 @@
>>>> +/* This test checks if the __builtin_target_* calls are recognized. */
>>>> +
>>>> +/* { dg-do run } */
>>>> +
>>>> +int
>>>> +fn1 ()
>>>> +{
>>>> + ?if (__builtin_target_supports_cmov () < 0)
>>>> + ? ?return -1;
>>>> + ?if (__builtin_target_supports_mmx () < 0)
>>>> + ? ?return -1;
>>>> + ?if (__builtin_target_supports_popcount () < 0)
>>>> + ? ?return -1;
>>>> + ?if (__builtin_target_supports_sse () < 0)
>>>> + ? ?return -1;
>>>> + ?if (__builtin_target_supports_sse2 () < 0)
>>>> + ? ?return -1;
>>>> + ?if (__builtin_target_supports_sse3 () < 0)
>>>> + ? ?return -1;
>>>> + ?if (__builtin_target_supports_ssse3 () < 0)
>>>> + ? ?return -1;
>>>> + ?if (__builtin_target_supports_sse4_1 () < 0)
>>>> + ? ?return -1;
>>>> + ?if (__builtin_target_supports_sse4_2 () < 0)
>>>> + ? ?return -1;
>>>> + ?if (__builtin_target_is_amd () < 0)
>>>> + ? ?return -1;
>>>> + ?if (__builtin_target_is_intel () < 0)
>>>> + ? ?return -1;
>>>> + ?if (__builtin_target_is_corei7_nehalem () < 0)
>>>> + ? ?return -1;
>>>> + ?if (__builtin_target_is_corei7_westmere () < 0)
>>>> + ? ?return -1;
>>>> + ?if (__builtin_target_is_corei7_sandybridge () < 0)
>>>> + ? ?return -1;
>>>> + ?if (__builtin_target_is_amdfam10_barcelona () < 0)
>>>> + ? ?return -1;
>>>> + ?if (__builtin_target_is_amdfam10_shanghai () < 0)
>>>> + ? ?return -1;
>>>> + ?if (__builtin_target_is_amdfam10_istanbul () < 0)
>>>> + ? ?return -1;
>>>> +
>>>> + ?return 0;
>>>> +}
>>>> +
>>>> +int main ()
>>>> +{
>>>> + ?return fn1 ();
>>>> +}
>>>> Index: gcc/builtins.def
>>>> ===================================================================
>>>> --- gcc/builtins.def ? ?(revision 177767)
>>>> +++ gcc/builtins.def ? ?(working copy)
>>>> @@ -763,6 +763,25 @@ DEF_BUILTIN (BUILT_IN_EMUTLS_REGISTER_COMMON,
>>>> ?/* Multiversioning builtin dispatch hook. */
>>>> ?DEF_GCC_BUILTIN (BUILT_IN_DISPATCH, "dispatch", BT_FN_INT_PTR_FN_INT_PTR_PTR_VAR, ATTR_NULL)
>>>>
>>>> +/* Builtins to determine target type and features at run-time. */
>>>> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_SUPPORTS_CMOV, "target_supports_cmov", BT_FN_INT, ATTR_NULL)
>>>> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_SUPPORTS_MMX, "target_supports_mmx", BT_FN_INT, ATTR_NULL)
>>>> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_SUPPORTS_POPCOUNT, "target_supports_popcount", BT_FN_INT, ATTR_NULL)
>>>> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_SUPPORTS_SSE, "target_supports_sse", BT_FN_INT, ATTR_NULL)
>>>> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_SUPPORTS_SSE2, "target_supports_sse2", BT_FN_INT, ATTR_NULL)
>>>> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_SUPPORTS_SSE3, "target_supports_sse3", BT_FN_INT, ATTR_NULL)
>>>> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_SUPPORTS_SSSE3, "target_supports_ssse3", BT_FN_INT, ATTR_NULL)
>>>> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_SUPPORTS_SSE4_1, "target_supports_sse4_1", BT_FN_INT, ATTR_NULL)
>>>> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_SUPPORTS_SSE4_2, "target_supports_sse4_2", BT_FN_INT, ATTR_NULL)
>>>> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_IS_AMD, "target_is_amd", BT_FN_INT, ATTR_NULL)
>>>> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_IS_INTEL, "target_is_intel", BT_FN_INT, ATTR_NULL)
>>>> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_IS_COREI7_NEHALEM, "target_is_corei7_nehalem", BT_FN_INT, ATTR_NULL)
>>>> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_IS_COREI7_WESTMERE, "target_is_corei7_westmere", BT_FN_INT, ATTR_NULL)
>>>> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_IS_COREI7_SANDYBRIDGE, "target_is_corei7_sandybridge", BT_FN_INT, ATTR_NULL)
>>>> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_IS_AMDFAM10_BARCELONA, "target_is_amdfam10_barcelona", BT_FN_INT, ATTR_NULL)
>>>> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_IS_AMDFAM10_SHANGHAI, "target_is_amdfam10_shanghai", BT_FN_INT, ATTR_NULL)
>>>> +DEF_GCC_BUILTIN (BUILT_IN_TARGET_IS_AMDFAM10_ISTANBUL, "target_is_amdfam10_istanbul", BT_FN_INT, ATTR_NULL)
>>>> +
>>>> ?/* Exception support. ?*/
>>>> ?DEF_BUILTIN_STUB (BUILT_IN_UNWIND_RESUME, "__builtin_unwind_resume")
>>>> ?DEF_BUILTIN_STUB (BUILT_IN_CXA_END_CLEANUP, "__builtin_cxa_end_cleanup")
>>>> Index: gcc/mversn-dispatch.c
>>>> ===================================================================
>>>> --- gcc/mversn-dispatch.c ? ? ? (revision 177767)
>>>> +++ gcc/mversn-dispatch.c ? ? ? (working copy)
>>>> @@ -135,6 +135,7 @@ along with GCC; see the file COPYING3. ?If not see
>>>> ?#include "output.h"
>>>> ?#include "vecprim.h"
>>>> ?#include "gimple-pretty-print.h"
>>>> +#include "target.h"
>>>>
>>>> ?typedef struct cgraph_node* NODEPTR;
>>>> ?DEF_VEC_P (NODEPTR);
>>>> @@ -1764,3 +1765,103 @@ struct gimple_opt_pass pass_tree_convert_builtin_d
>>>> ? TODO_update_ssa | TODO_verify_ssa
>>>> ?}
>>>> ?};
>>>> +
>>>> +/* Fold calls to __builtin_target_* */
>>>> +
>>>> +static unsigned int
>>>> +do_fold_builtin_target (void)
>>>> +{
>>>> + ?basic_block bb;
>>>> + ?gimple_stmt_iterator gsi;
>>>> +
>>>> + ?/* Go through each stmt looking for __builtin_target_* calls */
>>>> + ?FOR_EACH_BB_FN (bb, DECL_STRUCT_FUNCTION (current_function_decl))
>>>> + ? ?{
>>>> + ? ? ?for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
>>>> + ? ? ? ?{
>>>> + ? ? ? ? gimple stmt = gsi_stmt (gsi);
>>>> + ? ? ? ? gimple assign_stmt;
>>>> + ? ? ? ? ?tree call_decl;
>>>> + ? ? ? ? tree lhs_retval;
>>>> + ? ? ? ? tree folded_val;
>>>> +
>>>> + ? ? ? ? tree ssa_var, tmp_var;
>>>> + ? ? ? ? gimple init_stmt;
>>>> +
>>>> + ? ? ? ? ?if (!is_gimple_call (stmt))
>>>> + ? ? ? ? ? ?continue;
>>>> +
>>>> + ? ? ? ? ?call_decl = gimple_call_fndecl (stmt);
>>>> +
>>>> + ? ? ? ? /* Check if it is a __builtin_target_* call. */
>>>> +
>>>> + ? ? ? ? if (call_decl == NULL
>>>> + ? ? ? ? ? ? || DECL_NAME (call_decl) == NULL_TREE
>>>> + ? ? ? ? ? ? || DECL_BUILT_IN_CLASS (call_decl) != BUILT_IN_NORMAL
>>>> + ? ? ? ? ? ? || strstr (IDENTIFIER_POINTER (DECL_NAME (call_decl)),
>>>> + ? ? ? ? ? ? ? ? ? ? ? ? "__builtin_target") == NULL)
>>>> + ? ? ? ? ? ?continue;
>>>> +
>>>> + ? ? ? ? /* If the lhs is NULL there is no need to fold the call. */
>>>> + ? ? ? ? lhs_retval = gimple_call_lhs(stmt);
>>>> + ? ? ? ? if (lhs_retval == NULL)
>>>> + ? ? ? ? ? continue;
>>>> +
>>>> + ? ? ? ? /* Call the target hook to fold the builtin */
>>>> + ? ? ? ? ?folded_val = targetm.fold_builtin(call_decl, 0, NULL, false);
>>>> +
>>>> + ? ? ? ? /* If the target does not support the builtin then fold it to zero. */
>>>> + ? ? ? ? if (folded_val == NULL_TREE)
>>>> + ? ? ? ? ? folded_val = build_zero_cst (unsigned_type_node);
>>>> +
>>>> + ? ? ? ? /* Type cast unsigned value to integer */
>>>> + ? ? ? ? tmp_var = create_tmp_var (unsigned_type_node, NULL);
>>>> + ? ? ? ? init_stmt = gimple_build_assign (tmp_var, folded_val);
>>>> + ? ? ? ? ssa_var = make_ssa_name (tmp_var, init_stmt);
>>>> + ? ? ? ? gimple_assign_set_lhs (init_stmt, ssa_var);
>>>> + ? ? ? ? mark_symbols_for_renaming (init_stmt);
>>>> +
>>>> + ? ? ? ? assign_stmt = gimple_build_assign_with_ops (NOP_EXPR, lhs_retval, ssa_var, 0);
>>>> + ? ? ? ? mark_symbols_for_renaming(assign_stmt);
>>>> +
>>>> + ? ? ? ? gsi_insert_after_without_update (&gsi, assign_stmt, GSI_SAME_STMT);
>>>> + ? ? ? ? gsi_insert_after_without_update (&gsi, init_stmt, GSI_SAME_STMT);
>>>> + ? ? ? ? /* Delete the original call. */
>>>> + ? ? ? ? gsi_remove(&gsi, true);
>>>> + ? ? ? }
>>>> + ? ?}
>>>> +
>>>> + ?return 0;
>>>> +}
>>>> +
>>>> +static bool
>>>> +gate_fold_builtin_target (void)
>>>> +{
>>>> + ?return true;
>>>> +}
>>>> +
>>>> +/* Pass to fold __builtin_target_* functions */
>>>> +
>>>> +struct gimple_opt_pass pass_tree_fold_builtin_target =
>>>> +{
>>>> + {
>>>> + ?GIMPLE_PASS,
>>>> + ?"fold_builtin_target", ? ? ? ? ? ? ? /* name */
>>>> + ?gate_fold_builtin_target, ? ? ? ? ? ?/* gate */
>>>> + ?do_fold_builtin_target, ? ? ? ? ? ? ?/* execute */
>>>> + ?NULL, ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?/* sub */
>>>> + ?NULL, ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?/* next */
>>>> + ?0, ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? /* static_pass_number */
>>>> + ?TV_FOLD_BUILTIN_TARGET, ? ? ? ? ? ? ?/* tv_id */
>>>> + ?PROP_cfg, ? ? ? ? ? ? ? ? ? ? ? ? ? ?/* properties_required */
>>>> + ?PROP_cfg, ? ? ? ? ? ? ? ? ? ? ? ? ? ?/* properties_provided */
>>>> + ?0, ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? /* properties_destroyed */
>>>> + ?0, ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? /* todo_flags_start */
>>>> + ?TODO_dump_func | ? ? ? ? ? ? ? ? ? ? /* todo_flags_finish */
>>>> + ?TODO_cleanup_cfg |
>>>> + ?TODO_update_ssa |
>>>> + ?TODO_verify_ssa
>>>> + }
>>>> +};
>>>> +
>>>> +
>>>> Index: gcc/timevar.def
>>>> ===================================================================
>>>> --- gcc/timevar.def ? ? (revision 177767)
>>>> +++ gcc/timevar.def ? ? (working copy)
>>>> @@ -124,6 +124,7 @@ DEFTIMEVAR (TV_PARSE_INMETH ? ? ? ? ?, "parser inl
>>>> ?DEFTIMEVAR (TV_TEMPLATE_INST ? ? ? ? , "template instantiation")
>>>> ?DEFTIMEVAR (TV_INLINE_HEURISTICS ? ? , "inline heuristics")
>>>> ?DEFTIMEVAR (TV_MVERSN_DISPATCH ? ? ? , "multiversion dispatch")
>>>> +DEFTIMEVAR (TV_FOLD_BUILTIN_TARGET ? , "fold __builtin_target calls")
>>>> ?DEFTIMEVAR (TV_INTEGRATION ? ? ? ? ? , "integration")
>>>> ?DEFTIMEVAR (TV_TREE_GIMPLIFY ? ? ? ?, "tree gimplify")
>>>> ?DEFTIMEVAR (TV_TREE_EH ? ? ? ? ? ? ?, "tree eh")
>>>> Index: gcc/passes.c
>>>> ===================================================================
>>>> --- gcc/passes.c ? ? ? ?(revision 177767)
>>>> +++ gcc/passes.c ? ? ? ?(working copy)
>>>> @@ -1249,6 +1249,8 @@ init_optimization_passes (void)
>>>> ? ? {
>>>> ? ? ? struct opt_pass **p = &pass_ipa_multiversion_dispatch.pass.sub;
>>>> ? ? ? NEXT_PASS (pass_tree_convert_builtin_dispatch);
>>>> + ? ? ?/* Fold calls to __builtin_target_*. */
>>>> + ? ? ?NEXT_PASS (pass_tree_fold_builtin_target);
>>>> ? ? ? /* Rebuilding cgraph edges is necessary as the above passes change
>>>> ? ? ? ? ?the call graph. ?Otherwise, future optimizations use the old
>>>> ? ? ? ? call graph and make wrong decisions sometimes.*/
>>>> Index: gcc/config/i386/i386.c
>>>> ===================================================================
>>>> --- gcc/config/i386/i386.c ? ? ?(revision 177767)
>>>> +++ gcc/config/i386/i386.c ? ? ?(working copy)
>>>> @@ -58,6 +58,8 @@ along with GCC; see the file COPYING3. ?If not see
>>>> ?#include "sched-int.h"
>>>> ?#include "sbitmap.h"
>>>> ?#include "fibheap.h"
>>>> +#include "tree-flow.h"
>>>> +#include "tree-pass.h"
>>>>
>>>> ?enum upper_128bits_state
>>>> ?{
>>>> @@ -7867,6 +7869,338 @@ ix86_build_builtin_va_list (void)
>>>> ? return ret;
>>>> ?}
>>>>
>>>> +/* Returns a struct type with name NAME and number of fields equal to
>>>> +   NUM_FIELDS.  Each field is an unsigned int bit field of length 1 bit. */
>>>> +
>>>> +static tree
>>>> +build_struct_with_one_bit_fields (int num_fields, const char *name)
>>>> +{
>>>> + ?int i;
>>>> + ?char field_name [10];
>>>> + ?tree field = NULL_TREE, field_chain = NULL_TREE;
>>>> + ?tree type = make_node (RECORD_TYPE);
>>>> +
>>>> + ?strcpy (field_name, "k_field");
>>>> +
>>>> + ?for (i = 0; i < num_fields; i++)
>>>> + ? ?{
>>>> + ? ? ?/* Name the fields, 0_field, 1_field, ... */
>>>> + ? ? ?field_name [0] = '0' + i;
>>>> + ? ? ?field = build_decl (UNKNOWN_LOCATION, FIELD_DECL,
>>>> + ? ? ? ? ? ? ? ? ? ? ? ? get_identifier (field_name), unsigned_type_node);
>>>> + ? ? ?DECL_BIT_FIELD (field) = 1;
>>>> + ? ? ?DECL_SIZE (field) = bitsize_one_node;
>>>> + ? ? ?if (field_chain != NULL_TREE)
>>>> + ? ? ? DECL_CHAIN (field) = field_chain;
>>>> + ? ? ?field_chain = field;
>>>> + ? ?}
>>>> + ?finish_builtin_struct (type, name, field_chain, NULL_TREE);
>>>> + ?return type;
>>>> +}
>>>> +
>>>> +/* Returns a VAR_DECL of type TYPE and name NAME. */
>>>> +
>>>> +static tree
>>>> +make_var_decl (tree type, const char *name)
>>>> +{
>>>> + ?tree new_decl;
>>>> + ?struct varpool_node *vnode;
>>>> +
>>>> + ?new_decl = build_decl (UNKNOWN_LOCATION,
>>>> + ? ? ? ? ? ? ? ? ? ? ? ?VAR_DECL,
>>>> + ? ? ? ? ? ? ? ? ? ? ? ?get_identifier(name),
>>>> + ? ? ? ? ? ? ? ? ? ? ? ?type);
>>>> +
>>>> + ?DECL_EXTERNAL (new_decl) = 1;
>>>> + ?TREE_STATIC (new_decl) = 1;
>>>> + ?TREE_PUBLIC (new_decl) = 1;
>>>> + ?DECL_INITIAL (new_decl) = 0;
>>>> + ?DECL_ARTIFICIAL (new_decl) = 0;
>>>> + ?DECL_PRESERVE_P (new_decl) = 1;
>>>> +
>>>> + ?make_decl_one_only (new_decl, DECL_ASSEMBLER_NAME (new_decl));
>>>> + ?assemble_variable (new_decl, 0, 0, 0);
>>>> +
>>>> + ?vnode = varpool_node (new_decl);
>>>> + ?gcc_assert (vnode != NULL);
>>>> +  /* Set finalized to 1, otherwise it asserts in function "write_symbol" in
>>>> +     lto-streamer-out.c. */
>>>> +  vnode->finalized = 1;
>>>> +
>>>> + ?return new_decl;
>>>> +}
>>>> +
>>>> +/* Traverses the chain of fields in STRUCT_TYPE and returns the FIELD_NUM
>>>> + ? numbered field. */
>>>> +
>>>> +static tree
>>>> +get_field_from_struct (tree struct_type, int field_num)
>>>> +{
>>>> + ?int i;
>>>> + ?tree field = TYPE_FIELDS (struct_type);
>>>> +
>>>> + ?for (i = 0; i < field_num; i++, field = DECL_CHAIN(field))
>>>> + ? ?{
>>>> + ? ? ?gcc_assert (field != NULL_TREE);
>>>> + ? ?}
>>>> +
>>>> + ?return field;
>>>> +}
>>>> +
>>>> +/* Create a new static constructor that calls __cpu_indicator_init ()
>>>> +   function defined in libgcc/config/i386/i386-cpuinfo.c which runs cpuid
>>>> +   to figure out the type of the target. */
>>>> +
>>>> +static tree
>>>> +make_constructor_to_get_target_type (const char *name)
>>>> +{
>>>> +  tree decl, type, t;
>>>> +  gimple_seq seq;
>>>> +  basic_block new_bb;
>>>> +  tree old_current_function_decl;
>>>> +
>>>> +  tree __cpu_indicator_int_decl;
>>>> +  gimple constructor_body;
>>>> +
>>>> +  type = build_function_type_list (void_type_node, NULL_TREE);
>>>> +
>>>> +  /* Make a call stmt to __cpu_indicator_init.  */
>>>> +  __cpu_indicator_int_decl = build_fn_decl ("__cpu_indicator_init", type);
>>>> +  constructor_body = gimple_build_call (__cpu_indicator_int_decl, 0);
>>>> +  DECL_EXTERNAL (__cpu_indicator_int_decl) = 1;
>>>> +
>>>> +  decl = build_fn_decl (name, type);
>>>> +
>>>> +  DECL_NAME (decl) = get_identifier (name);
>>>> +  SET_DECL_ASSEMBLER_NAME (decl, DECL_NAME (decl));
>>>> +  gcc_assert (cgraph_node (decl) != NULL);
>>>> +
>>>> +  TREE_USED (decl) = 1;
>>>> +  DECL_ARTIFICIAL (decl) = 1;
>>>> +  DECL_IGNORED_P (decl) = 0;
>>>> +  TREE_PUBLIC (decl) = 0;
>>>> +  DECL_UNINLINABLE (decl) = 1;
>>>> +  DECL_EXTERNAL (decl) = 0;
>>>> +  DECL_CONTEXT (decl) = NULL_TREE;
>>>> +  DECL_INITIAL (decl) = make_node (BLOCK);
>>>> +  DECL_STATIC_CONSTRUCTOR (decl) = 1;
>>>> +  TREE_READONLY (decl) = 0;
>>>> +  DECL_PURE_P (decl) = 0;
>>>> +
>>>> +  /* This is a comdat.  */
>>>> +  make_decl_one_only (decl, DECL_ASSEMBLER_NAME (decl));
>>>> +
>>>> +  /* Build result decl and add to function_decl.  */
>>>> +  t = build_decl (UNKNOWN_LOCATION, RESULT_DECL, NULL_TREE, void_type_node);
>>>> +  DECL_ARTIFICIAL (t) = 1;
>>>> +  DECL_IGNORED_P (t) = 1;
>>>> +  DECL_RESULT (decl) = t;
>>>> +
>>>> +  gimplify_function_tree (decl);
>>>> +
>>>> +  /* Build CFG for this function.  */
>>>> +
>>>> +  old_current_function_decl = current_function_decl;
>>>> +  push_cfun (DECL_STRUCT_FUNCTION (decl));
>>>> +  current_function_decl = decl;
>>>> +  init_empty_tree_cfg_for_function (DECL_STRUCT_FUNCTION (decl));
>>>> +  cfun->curr_properties |=
>>>> +    (PROP_gimple_lcf | PROP_gimple_leh | PROP_cfg | PROP_referenced_vars |
>>>> +     PROP_ssa);
>>>> +  new_bb = create_empty_bb (ENTRY_BLOCK_PTR);
>>>> +  make_edge (ENTRY_BLOCK_PTR, new_bb, EDGE_FALLTHRU);
>>>> +
>>>> +  /* XXX: It is unclear whether the edge commented out below is necessary.
>>>> +     Adding it makes gimple_verify_flow_info in tree-cfg.c fail on the check
>>>> +     "if (e->flags & EDGE_FALLTHRU)" during -fprofile-generate.  Omitting
>>>> +     the edge does not break anything, so it is left commented out to make
>>>> +     clear that the omission is intentional.  */
>>>> +  /* make_edge (new_bb, EXIT_BLOCK_PTR, EDGE_FALLTHRU); */
>>>> +
>>>> +  seq = gimple_seq_alloc_with_stmt (constructor_body);
>>>> +
>>>> +  set_bb_seq (new_bb, seq);
>>>> +  gimple_set_bb (constructor_body, new_bb);
>>>> +
>>>> +  /* Set the lexical block of the constructor body; the inliner fails
>>>> +     otherwise.  */
>>>> +  gimple_set_block (constructor_body, DECL_INITIAL (decl));
>>>> +
>>>> +  /* This call is very important if this pass runs when the IR is in
>>>> +     SSA form.  It breaks things in strange ways otherwise.  */
>>>> +  init_tree_ssa (DECL_STRUCT_FUNCTION (decl));
>>>> +  /* add_referenced_var (version_selector_var); */
>>>> +
>>>> +  cgraph_add_new_function (decl, true);
>>>> +  cgraph_call_function_insertion_hooks (cgraph_node (decl));
>>>> +  cgraph_mark_needed_node (cgraph_node (decl));
>>>> +
>>>> +  pop_cfun ();
>>>> +  current_function_decl = old_current_function_decl;
>>>> +  return decl;
>>>> +}
>>>> +
>>>> +/* FNDECL is a __builtin_target_* call that is folded into an integer defined
>>>> +   in libgcc/config/i386/i386-cpuinfo.c.  */
>>>> +
>>>> +static tree
>>>> +fold_builtin_target (tree fndecl)
>>>> +{
>>>> +  /* This is the order of bit-fields in __processor_features in
>>>> +     i386-cpuinfo.c.  */
>>>> +  enum processor_features
>>>> +  {
>>>> +    F_CMOV = 0,
>>>> +    F_MMX,
>>>> +    F_POPCNT,
>>>> +    F_SSE,
>>>> +    F_SSE2,
>>>> +    F_SSE3,
>>>> +    F_SSSE3,
>>>> +    F_SSE4_1,
>>>> +    F_SSE4_2,
>>>> +    F_MAX
>>>> +  };
>>>> +
>>>> +  /* This is the order of bit-fields in __processor_model in
>>>> +     i386-cpuinfo.c.  */
>>>> +  enum processor_model
>>>> +  {
>>>> +    M_AMD = 0,
>>>> +    M_INTEL,
>>>> +    M_COREI7_NEHALEM,
>>>> +    M_COREI7_WESTMERE,
>>>> +    M_COREI7_SANDYBRIDGE,
>>>> +    M_AMDFAM10_BARCELONA,
>>>> +    M_AMDFAM10_SHANGHAI,
>>>> +    M_AMDFAM10_ISTANBUL,
>>>> +    M_MAX
>>>> +  };
>>>> +
>>>> +  static tree __processor_features_type = NULL_TREE;
>>>> +  static tree __cpu_features_var = NULL_TREE;
>>>> +  static tree __processor_model_type = NULL_TREE;
>>>> +  static tree __cpu_model_var = NULL_TREE;
>>>> +  static tree ctor_decl = NULL_TREE;
>>>> +  static tree field;
>>>> +  static tree which_struct;
>>>> +
>>>> +  /* Make a call to __cpu_indicator_init in a constructor.
>>>> +     Function __cpu_indicator_init is defined in i386-cpuinfo.c.  */
>>>> +  if (ctor_decl == NULL_TREE)
>>>> +    ctor_decl
>>>> +      = make_constructor_to_get_target_type ("__cpu_indicator_init_ctor");
>>>> +
>>>> +  if (__processor_features_type == NULL_TREE)
>>>> +    __processor_features_type
>>>> +      = build_struct_with_one_bit_fields (F_MAX, "__processor_features");
>>>> +
>>>> +  if (__processor_model_type == NULL_TREE)
>>>> +    __processor_model_type
>>>> +      = build_struct_with_one_bit_fields (M_MAX, "__processor_model");
>>>> +
>>>> +  if (__cpu_features_var == NULL_TREE)
>>>> +    __cpu_features_var = make_var_decl (__processor_features_type,
>>>> +                                        "__cpu_features");
>>>> +
>>>> +  if (__cpu_model_var == NULL_TREE)
>>>> +    __cpu_model_var = make_var_decl (__processor_model_type,
>>>> +                                     "__cpu_model");
>>>> +
>>>> +  /* Look at fndecl code to identify the field requested.  */
>>>> +  switch (DECL_FUNCTION_CODE (fndecl))
>>>> +    {
>>>> +    case BUILT_IN_TARGET_SUPPORTS_CMOV:
>>>> +      field = get_field_from_struct (__processor_features_type, F_CMOV);
>>>> +      which_struct = __cpu_features_var;
>>>> +      break;
>>>> +    case BUILT_IN_TARGET_SUPPORTS_MMX:
>>>> +      field = get_field_from_struct (__processor_features_type, F_MMX);
>>>> +      which_struct = __cpu_features_var;
>>>> +      break;
>>>> +    case BUILT_IN_TARGET_SUPPORTS_POPCOUNT:
>>>> +      field = get_field_from_struct (__processor_features_type, F_POPCNT);
>>>> +      which_struct = __cpu_features_var;
>>>> +      break;
>>>> +    case BUILT_IN_TARGET_SUPPORTS_SSE:
>>>> +      field = get_field_from_struct (__processor_features_type, F_SSE);
>>>> +      which_struct = __cpu_features_var;
>>>> +      break;
>>>> +    case BUILT_IN_TARGET_SUPPORTS_SSE2:
>>>> +      field = get_field_from_struct (__processor_features_type, F_SSE2);
>>>> +      which_struct = __cpu_features_var;
>>>> +      break;
>>>> +    case BUILT_IN_TARGET_SUPPORTS_SSE3:
>>>> +      field = get_field_from_struct (__processor_features_type, F_SSE3);
>>>> +      which_struct = __cpu_features_var;
>>>> +      break;
>>>> +    case BUILT_IN_TARGET_SUPPORTS_SSSE3:
>>>> +      field = get_field_from_struct (__processor_features_type, F_SSSE3);
>>>> +      which_struct = __cpu_features_var;
>>>> +      break;
>>>> +    case BUILT_IN_TARGET_SUPPORTS_SSE4_1:
>>>> +      field = get_field_from_struct (__processor_features_type, F_SSE4_1);
>>>> +      which_struct = __cpu_features_var;
>>>> +      break;
>>>> +    case BUILT_IN_TARGET_SUPPORTS_SSE4_2:
>>>> +      field = get_field_from_struct (__processor_features_type, F_SSE4_2);
>>>> +      which_struct = __cpu_features_var;
>>>> +      break;
>>>> +    case BUILT_IN_TARGET_IS_AMD:
>>>> +      field = get_field_from_struct (__processor_model_type, M_AMD);
>>>> +      which_struct = __cpu_model_var;
>>>> +      break;
>>>> +    case BUILT_IN_TARGET_IS_INTEL:
>>>> +      field = get_field_from_struct (__processor_model_type, M_INTEL);
>>>> +      which_struct = __cpu_model_var;
>>>> +      break;
>>>> +    case BUILT_IN_TARGET_IS_COREI7_NEHALEM:
>>>> +      field = get_field_from_struct (__processor_model_type, M_COREI7_NEHALEM);
>>>> +      which_struct = __cpu_model_var;
>>>> +      break;
>>>> +    case BUILT_IN_TARGET_IS_COREI7_WESTMERE:
>>>> +      field = get_field_from_struct (__processor_model_type, M_COREI7_WESTMERE);
>>>> +      which_struct = __cpu_model_var;
>>>> +      break;
>>>> +    case BUILT_IN_TARGET_IS_COREI7_SANDYBRIDGE:
>>>> +      field = get_field_from_struct (__processor_model_type, M_COREI7_SANDYBRIDGE);
>>>> +      which_struct = __cpu_model_var;
>>>> +      break;
>>>> +    case BUILT_IN_TARGET_IS_AMDFAM10_BARCELONA:
>>>> +      field = get_field_from_struct (__processor_model_type, M_AMDFAM10_BARCELONA);
>>>> +      which_struct = __cpu_model_var;
>>>> +      break;
>>>> +    case BUILT_IN_TARGET_IS_AMDFAM10_SHANGHAI:
>>>> +      field = get_field_from_struct (__processor_model_type, M_AMDFAM10_SHANGHAI);
>>>> +      which_struct = __cpu_model_var;
>>>> +      break;
>>>> +    case BUILT_IN_TARGET_IS_AMDFAM10_ISTANBUL:
>>>> +      field = get_field_from_struct (__processor_model_type, M_AMDFAM10_ISTANBUL);
>>>> +      which_struct = __cpu_model_var;
>>>> +      break;
>>>> +    default:
>>>> +      return NULL_TREE;
>>>> +    }
>>>> +
>>>> +  return build3 (COMPONENT_REF, TREE_TYPE (field), which_struct, field, NULL_TREE);
>>>> +}
>>>> +
>>>> +/* Folds __builtin_target_* builtins. */
>>>> +
>>>> +static tree
>>>> +ix86_fold_builtin (tree fndecl, int n_args ATTRIBUTE_UNUSED,
>>>> +                   tree *args ATTRIBUTE_UNUSED, bool ignore ATTRIBUTE_UNUSED)
>>>> +{
>>>> +  const char *decl_name = IDENTIFIER_POINTER (DECL_NAME (fndecl));
>>>> +  if (DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL
>>>> +      && strstr (decl_name, "__builtin_target") != NULL)
>>>> +    return fold_builtin_target (fndecl);
>>>> +
>>>> +  return NULL_TREE;
>>>> +}
>>>> +
>>>>  /* Worker function for TARGET_SETUP_INCOMING_VARARGS.  */
>>>>
>>>>  static void
>>>> @@ -35097,6 +35431,9 @@ ix86_autovectorize_vector_sizes (void)
>>>>  #undef TARGET_BUILD_BUILTIN_VA_LIST
>>>>  #define TARGET_BUILD_BUILTIN_VA_LIST ix86_build_builtin_va_list
>>>>
>>>> +#undef TARGET_FOLD_BUILTIN
>>>> +#define TARGET_FOLD_BUILTIN ix86_fold_builtin
>>>> +
>>>>  #undef TARGET_ENUM_VA_LIST_P
>>>>  #define TARGET_ENUM_VA_LIST_P ix86_enum_va_list
>>>>
>>>>
>>>> --
>>>> This patch is available for review at http://codereview.appspot.com/4893046
>>>>
>>>
>>
>
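
To make the folding in fold_builtin_target concrete, here is a rough
user-level sketch of what a folded builtin amounts to: a read of one
bit-field in a global struct that an init routine fills in. All demo_*
names are made up for illustration and are not part of the patch, and the
init routine is a stub with fixed values rather than a real cpuid query,
so the sketch behaves the same on any host.

```c
/* Hypothetical mirror of the one-bit-field layout that the patch builds
   for __processor_features.  Field order follows enum processor_features
   in the quoted patch; the names are illustrative only.  */
struct demo_processor_features
{
  unsigned int cmov : 1;
  unsigned int mmx : 1;
  unsigned int popcnt : 1;
  unsigned int sse : 1;
  unsigned int sse2 : 1;
  unsigned int sse3 : 1;
  unsigned int ssse3 : 1;
  unsigned int sse4_1 : 1;
  unsigned int sse4_2 : 1;
};

/* Stand-in for the __cpu_features variable; zero-initialized.  */
struct demo_processor_features demo_cpu_features;

/* Stand-in for __cpu_indicator_init.  The real routine runs cpuid; this
   stub (an assumption of the sketch) just sets fixed values.  */
void
demo_cpu_indicator_init (void)
{
  demo_cpu_features.sse2 = 1;
  demo_cpu_features.sse4_1 = 1;
}

/* What a folded __builtin_target_supports_sse4_1 () reduces to: a plain
   read of one bit-field.  */
int
demo_target_supports_sse4_1 (void)
{
  return demo_cpu_features.sse4_1;
}

/* Two hypothetical versions of the same function.  */
int compute_sse4_1 (void) { return 41; }
int compute_generic (void) { return 0; }

/* The dispatch the compiler would generate for a call to compute ().  */
int
compute (void)
{
  if (demo_target_supports_sse4_1 ())
    return compute_sse4_1 ();
  return compute_generic ();
}
```

Until demo_cpu_indicator_init has run, every bit reads as zero and
compute () takes the generic path, which is why the patch arranges for
the init call to happen in a static constructor.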

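One note on get_field_from_struct in the quoted patch: the loop asserts
on the current field before each step, so the field that is finally
returned is not itself checked. Below is a small stand-alone sketch of
the same chain walk that returns NULL when the chain is too short. The
demo_field type is a made-up stand-in for a FIELD_DECL linked through
DECL_CHAIN, not GCC's tree type.

```c
#include <stddef.h>

/* Hypothetical stand-in for a FIELD_DECL chained through DECL_CHAIN.  */
struct demo_field
{
  const char *name;
  struct demo_field *next;
};

/* Return the FIELD_NUM-th field of the chain starting at FIRST, counting
   from zero, or NULL if the chain has fewer than FIELD_NUM + 1 fields.  */
struct demo_field *
demo_get_field (struct demo_field *first, int field_num)
{
  struct demo_field *field = first;
  int i;

  /* Stop early when the chain runs out, so a too-large FIELD_NUM yields
     NULL instead of stepping past the end.  */
  for (i = 0; i < field_num && field != NULL; i++)
    field = field->next;

  return field;
}
```

A caller can then assert on the result once, which covers the returned
field as well as the intermediate ones.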
