This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PATCH 5/5] [AARCH64] Add variant support to -m="native"and add thunderxt88p1.
- From: "Jones, Joel" <Joel dot Jones at cavium dot com>
- To: James Greenhalgh <james dot greenhalgh at arm dot com>
- Cc: Andrew Pinski <pinskia at gmail dot com>, "Pinski, Andrew" <Andrew dot Pinski at cavium dot com>, GCC Patches <gcc-patches at gcc dot gnu dot org>, "nd at arm dot com" <nd at arm dot com>
- Date: Wed, 2 Nov 2016 12:11:50 +0000
- Subject: Re: [PATCH 5/5] [AARCH64] Add variant support to -m="native"and add thunderxt88p1.
- References: <CA+=Sn1n52gDzREzkDLDYC-JafXE9jce4Sv95vDMM3Sgc-UY0kA@mail.gmail.com>,<20161102105437.GA18140@arm.com>
What is currently submitted for LLVM review was submitted before we determined this naming scheme. I will mark the current submittal as abandoned, as the scheduling model needs to be split out and revised.
Joel Jones
Sent from my AArch64 powered iPhone
> On Nov 2, 2016, at 3:55 AM, James Greenhalgh <james.greenhalgh@arm.com> wrote:
>
>> On Tue, Nov 01, 2016 at 11:08:53AM -0700, Andrew Pinski wrote:
>>> On Tue, Nov 17, 2015 at 2:10 PM, Andrew Pinski <apinski@cavium.com> wrote:
>>> Since ThunderX T88 pass 1 (variant 0) is an ARMv8 part while pass 2 (variant 1)
>>> is an ARMv8.1 part, I needed to add detection of the variant to handle this
>>> difference. I also simplified the code a little and combined the single-core and
>>> arch-detection cases so that it is easier to add the variant.
>>
>> Actually it is a bit more complex than what I said here; see below for
>> the full table of options and what is enabled/disabled now.
>>
>>> OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions.
>>> Tested -mcpu=native on both T88 pass 1 and T88 pass 2 to make sure it is
>>> detecting the two separately.
>>
>>
>> Here is the final patch in this series updated; I changed the cpu name
>> slightly and made sure I updated invoke.texi too.
>>
>> The names are going to match the names in LLVM (worked with our LLVM
>> engineer here at Cavium about the names).
>> Here are the names and what each one enables:
>> -mcpu=thunderx:
>> * Matches part num 0xA0 (reserved for ThunderX 8x series)
>> * T88 Pass 2 scheduling
>> * Hardware prefetching (software prefetching disabled)
>> * LSE enabled
>> * no v8.1
>
> This doesn't match the current LLVM proposal
> ( https://reviews.llvm.org/D24540 ) which enables full ARMv8.1-A support
> for -mcpu=thunderx.
>
>> -mcpu=thunderxt88:
>> * Matches part num 0xA1
>> * T88 Pass 2 scheduling
>> * software prefetching enabled
>> * LSE enabled
>> * no v8.1
>>
>> -mcpu=thunderxt88p1 (only for GCC):
>> * Matches part num 0xA1, variant 0
>> * T88 Pass 1 scheduling
>> * software prefetching enabled
>> * LSE disabled
>> * no v8.1
>>
>> -mcpu=thunderxt81 and -mcpu=thunderxt83:
>> * Matches part num 0xA2/0xA3
>> * T88 Pass 2 scheduling
>> * Hardware prefetching (software prefetching disabled)
>> * LSE enabled
>> * v8.1
>
> This looks like what has been added to LLVM as -mcpu=thunderx.
>
>> I have not hooked up software vs hardware prefetching and the
>> scheduler parts (the next patch will do part of that); both ARMv8.1-a
>> and LSE parts are hooked up as those parts are only in
>> aarch64-cores.def.
>>
>> OK? Bootstrapped and tested on ThunderX T88 and ThunderX T81
>> (aarch64-linux-gnu).
>>
>> Index: common/config/aarch64/aarch64-common.c
>> ===================================================================
>> --- common/config/aarch64/aarch64-common.c (revision 241727)
>> +++ common/config/aarch64/aarch64-common.c (working copy)
>> @@ -145,7 +145,7 @@ struct arch_to_arch_name
>> the default set of architectural feature flags they support. */
>> static const struct processor_name_to_arch all_cores[] =
>> {
>> -#define AARCH64_CORE(NAME, X, IDENT, ARCH_IDENT, FLAGS, COSTS, IMP, PART) \
>> +#define AARCH64_CORE(NAME, X, IDENT, ARCH_IDENT, FLAGS, COSTS, IMP, PART, VARIANT) \
>> {NAME, AARCH64_ARCH_##ARCH_IDENT, FLAGS},
>> #include "config/aarch64/aarch64-cores.def"
>> {"generic", AARCH64_ARCH_8A, AARCH64_FL_FOR_ARCH8},
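
Because aarch64-cores.def is consumed X-macro style, every place that defines AARCH64_CORE before including the file has to accept the new ninth argument, even when it ignores it, as the hunk above does. The following is a minimal, self-contained sketch of that pattern, not the GCC sources: the DEMO_* names, the two flag bits and the in-file core list (used here instead of a separate .def file) are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical feature bits, standing in for the AARCH64_FL_* flags.  */
#define DEMO_FL_BASE 0x1ul
#define DEMO_FL_LSE  0x2ul

/* Hypothetical stand-in for aarch64-cores.def.  Every entry has nine
   fields; the last is VARIANT, with -1 meaning "any variant".  */
#define DEMO_CORE_LIST \
  DEMO_CORE ("thunderxt88p1", thunderxt88p1, thunderx, 8A, DEMO_FL_BASE, thunderx, 0x43, 0x0a1, 0) \
  DEMO_CORE ("thunderxt88", thunderxt88, thunderx, 8A, DEMO_FL_BASE | DEMO_FL_LSE, thunderx, 0x43, 0x0a1, -1)

/* Consumer 1: name -> flags.  It must accept VARIANT but does not use it.  */
struct demo_name_flags { const char *name; unsigned long flags; };

static const struct demo_name_flags demo_name_flags[] = {
#define DEMO_CORE(NAME, IDENT, SCHED, ARCH, FLAGS, COSTS, IMP, PART, VARIANT) \
  { NAME, FLAGS },
  DEMO_CORE_LIST
#undef DEMO_CORE
};

/* Consumer 2: identification data, which does use VARIANT.  */
struct demo_cpu_id { const char *name; unsigned imp; unsigned part; int variant; };

static const struct demo_cpu_id demo_cpu_ids[] = {
#define DEMO_CORE(NAME, IDENT, SCHED, ARCH, FLAGS, COSTS, IMP, PART, VARIANT) \
  { NAME, IMP, PART, VARIANT },
  DEMO_CORE_LIST
#undef DEMO_CORE
};

/* Look up the flags for NAME; returns 0 if the name is unknown.  */
unsigned long
demo_flags_for (const char *name)
{
  size_t n = sizeof demo_name_flags / sizeof demo_name_flags[0];
  for (size_t i = 0; i < n; i++)
    if (strcmp (demo_name_flags[i].name, name) == 0)
      return demo_name_flags[i].flags;
  return 0;
}

/* Usage example: both tables are generated from the same list.  */
void
demo_xmacro_self_check (void)
{
  assert (demo_flags_for ("thunderxt88") == (DEMO_FL_BASE | DEMO_FL_LSE));
  assert (demo_flags_for ("thunderxt88p1") == DEMO_FL_BASE);
  assert (demo_cpu_ids[0].variant == 0);
  assert (demo_cpu_ids[1].variant == -1);
}
```

If one consumer kept the old eight-parameter definition, the preprocessor would reject every entry in the list, which is why the patch touches each definition of the macro.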
>> Index: config/aarch64/aarch64-cores.def
>> ===================================================================
>> --- config/aarch64/aarch64-cores.def (revision 241727)
>> +++ config/aarch64/aarch64-cores.def (working copy)
>> @@ -21,7 +21,7 @@
>>
>> Before using #include to read this file, define a macro:
>>
>> - AARCH64_CORE(CORE_NAME, CORE_IDENT, SCHEDULER_IDENT, ARCH_IDENT, FLAGS, COSTS, IMP, PART)
>> + AARCH64_CORE(CORE_NAME, CORE_IDENT, SCHEDULER_IDENT, ARCH_IDENT, FLAGS, COSTS, IMP, PART, VARIANT)
>>
>> The CORE_NAME is the name of the core, represented as a string constant.
>> The CORE_IDENT is the name of the core, represented as an identifier.
>> @@ -39,39 +39,45 @@
>> PART is the part number of the CPU. On a GNU/Linux system it can be
>> found in /proc/cpuinfo. For big.LITTLE systems this should use the
>> macro AARCH64_BIG_LITTLE where the big part number comes as the first
>> - argument to the macro and little is the second. */
>> + argument to the macro and little is the second.
>> + VARIANT is the variant of the CPU. On a GNU/Linux system it can be found
>> + in /proc/cpuinfo. If this is -1, this means it can match any variant. */
>>
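
For readers unfamiliar with where these values come from: on arm64 GNU/Linux, /proc/cpuinfo reports lines such as "CPU implementer", "CPU variant" and "CPU part" with hexadecimal values. The sketch below shows one way such a line could be parsed. It is an illustration only and not the code GCC's driver uses; the function name and the sample lines (chosen to resemble the 0x0a1 part discussed in this thread) are assumptions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return 1 and store the value in *OUT if LINE starts with KEY and has a
   hexadecimal value after the colon; return 0 otherwise.  */
int
demo_cpuinfo_field (const char *line, const char *key, unsigned long *out)
{
  size_t keylen = strlen (key);
  const char *colon;
  char *end;
  unsigned long value;

  if (strncmp (line, key, keylen) != 0)
    return 0;

  colon = strchr (line + keylen, ':');
  if (colon == NULL)
    return 0;

  /* Base 16 accepts an optional "0x" prefix and skips leading blanks.  */
  value = strtoul (colon + 1, &end, 16);
  if (end == colon + 1)
    return 0;   /* No digits were consumed.  */

  *out = value;
  return 1;
}

/* Usage example on hypothetical /proc/cpuinfo lines.  */
void
demo_cpuinfo_self_check (void)
{
  unsigned long v = 0;

  assert (demo_cpuinfo_field ("CPU variant\t: 0x1", "CPU variant", &v));
  assert (v == 1);
  assert (demo_cpuinfo_field ("CPU part\t: 0x0a1", "CPU part", &v));
  assert (v == 0xa1);
  assert (!demo_cpuinfo_field ("CPU revision\t: 1", "CPU variant", &v));
}
```

A detected variant would then be compared against the VARIANT field of each entry, with -1 in the table accepting any detected value.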
>> /* V8 Architecture Processors. */
>>
>> /* ARM ('A') cores. */
>> -AARCH64_CORE("cortex-a35", cortexa35, cortexa53, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa35, 0x41, 0xd04)
>> -AARCH64_CORE("cortex-a53", cortexa53, cortexa53, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa53, 0x41, 0xd03)
>> -AARCH64_CORE("cortex-a57", cortexa57, cortexa57, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa57, 0x41, 0xd07)
>> -AARCH64_CORE("cortex-a72", cortexa72, cortexa57, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa72, 0x41, 0xd08)
>> -AARCH64_CORE("cortex-a73", cortexa73, cortexa57, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa73, 0x41, 0xd09)
>> +AARCH64_CORE("cortex-a35", cortexa35, cortexa53, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa35, 0x41, 0xd04, -1)
>> +AARCH64_CORE("cortex-a53", cortexa53, cortexa53, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa53, 0x41, 0xd03, -1)
>> +AARCH64_CORE("cortex-a57", cortexa57, cortexa57, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa57, 0x41, 0xd07, -1)
>> +AARCH64_CORE("cortex-a72", cortexa72, cortexa57, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa72, 0x41, 0xd08, -1)
>> +AARCH64_CORE("cortex-a73", cortexa73, cortexa57, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa73, 0x41, 0xd09, -1)
>>
>> /* Samsung ('S') cores. */
>> -AARCH64_CORE("exynos-m1", exynosm1, exynosm1, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, exynosm1, 0x53, 0x001)
>> +AARCH64_CORE("exynos-m1", exynosm1, exynosm1, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, exynosm1, 0x53, 0x001, -1)
>>
>> /* Qualcomm ('Q') cores. */
>> -AARCH64_CORE("qdf24xx", qdf24xx, cortexa57, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, qdf24xx, 0x51, 0x800)
>> +AARCH64_CORE("qdf24xx", qdf24xx, cortexa57, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, qdf24xx, 0x51, 0x800, -1)
>>
>> /* Cavium ('C') cores. */
>> -AARCH64_CORE("thunderx", thunderx, thunderx, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx, 0x43, 0x0a1)
>> +AARCH64_CORE("thunderx", thunderx, thunderx, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO | AARCH64_FL_LSE, thunderx, 0x43, 0x0a0, -1)
>> +AARCH64_CORE("thunderxt88p1", thunderxt88p1, thunderx, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx, 0x43, 0x0a1, 0)
>> +AARCH64_CORE("thunderxt88", thunderxt88, thunderx, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO | AARCH64_FL_LSE, thunderx, 0x43, 0x0a1, -1)
>
> You probably want a comment somewhere here making it clear that the ordering
> of thunderxt88p1 and thunderxt88 must remain as is, or detection will fail
> (-1 will match before 0). Otherwise someone will come along and helpfully
> put these in alphabetical order and cause you trouble...
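
The ordering hazard described above can be reproduced with a small sketch. It assumes a simple first-match scan over the table, which is what the comment implies but is not copied from the patch; the DEMO_* names and both tables are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_ANY_VARIANT (-1)

struct demo_core { const char *name; unsigned imp; unsigned part; int variant; };

/* Order used in the patch: the specific variant comes first.  */
static const struct demo_core demo_patch_order[] = {
  { "thunderxt88p1", 0x43, 0x0a1, 0 },
  { "thunderxt88",   0x43, 0x0a1, DEMO_ANY_VARIANT },
};

/* Same entries after a well-meaning alphabetical sort.  */
static const struct demo_core demo_sorted_order[] = {
  { "thunderxt88",   0x43, 0x0a1, DEMO_ANY_VARIANT },
  { "thunderxt88p1", 0x43, 0x0a1, 0 },
};

/* Return the name of the first entry matching the detected values, or
   NULL if none matches.  A table variant of -1 matches any variant.  */
const char *
demo_first_match (const struct demo_core *table, size_t n,
                  unsigned imp, unsigned part, int variant)
{
  for (size_t i = 0; i < n; i++)
    if (table[i].imp == imp
        && table[i].part == part
        && (table[i].variant == DEMO_ANY_VARIANT
            || table[i].variant == variant))
      return table[i].name;
  return NULL;
}

/* Usage example: pass 1 (variant 0) is only found with the patch order.  */
void
demo_order_self_check (void)
{
  assert (strcmp (demo_first_match (demo_patch_order, 2, 0x43, 0x0a1, 0),
                  "thunderxt88p1") == 0);
  assert (strcmp (demo_first_match (demo_patch_order, 2, 0x43, 0x0a1, 1),
                  "thunderxt88") == 0);
  /* With the sorted table the wildcard entry shadows the pass 1 entry.  */
  assert (strcmp (demo_first_match (demo_sorted_order, 2, 0x43, 0x0a1, 0),
                  "thunderxt88") == 0);
}
```

Under that assumption, a comment next to the two entries, or a scan that prefers exact variant matches over the wildcard, would both guard against the reordering.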
>
>> +AARCH64_CORE("thunderxt81", thunderxt81, thunderx, 8_1A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO | AARCH64_FL_LSE, thunderx, 0x43, 0x0a2, -1)
>> +AARCH64_CORE("thunderxt83", thunderxt83, thunderx, 8_1A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO | AARCH64_FL_LSE, thunderx, 0x43, 0x0a3, -1)
>>
>> /* APM ('P') cores. */
>> -AARCH64_CORE("xgene1", xgene1, xgene1, 8A, AARCH64_FL_FOR_ARCH8, xgene1, 0x50, 0x000)
>> +AARCH64_CORE("xgene1", xgene1, xgene1, 8A, AARCH64_FL_FOR_ARCH8, xgene1, 0x50, 0x000, -1)
>>
>> /* V8.1 Architecture Processors. */
>>
>> /* Broadcom ('B') cores. */
>> -AARCH64_CORE("vulcan", vulcan, cortexa57, 8_1A, AARCH64_FL_FOR_ARCH8_1 | AARCH64_FL_CRYPTO, vulcan, 0x42, 0x516)
>> +AARCH64_CORE("vulcan", vulcan, cortexa57, 8_1A, AARCH64_FL_FOR_ARCH8_1 | AARCH64_FL_CRYPTO, vulcan, 0x42, 0x516, -1)
>>
>> /* V8 big.LITTLE implementations. */
>>
>> -AARCH64_CORE("cortex-a57.cortex-a53", cortexa57cortexa53, cortexa53, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa57, 0x41, AARCH64_BIG_LITTLE (0xd07, 0xd03))
>> -AARCH64_CORE("cortex-a72.cortex-a53", cortexa72cortexa53, cortexa53, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa72, 0x41, AARCH64_BIG_LITTLE (0xd08, 0xd03))
>> -AARCH64_CORE("cortex-a73.cortex-a35", cortexa73cortexa35, cortexa53, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa73, 0x41, AARCH64_BIG_LITTLE (0xd09, 0xd04))
>> -AARCH64_CORE("cortex-a73.cortex-a53", cortexa73cortexa53, cortexa53, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa73, 0x41, AARCH64_BIG_LITTLE (0xd09, 0xd03))
>> +AARCH64_CORE("cortex-a57.cortex-a53", cortexa57cortexa53, cortexa53, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa57, 0x41, AARCH64_BIG_LITTLE (0xd07, 0xd03), -1)
>> +AARCH64_CORE("cortex-a72.cortex-a53", cortexa72cortexa53, cortexa53, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa72, 0x41, AARCH64_BIG_LITTLE (0xd08, 0xd03), -1)
>> +AARCH64_CORE("cortex-a73.cortex-a35", cortexa73cortexa35, cortexa53, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa73, 0x41, AARCH64_BIG_LITTLE (0xd09, 0xd04), -1)
>> +AARCH64_CORE("cortex-a73.cortex-a53", cortexa73cortexa53, cortexa53, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa73, 0x41, AARCH64_BIG_LITTLE (0xd09, 0xd03), -1)
>
> Why do variants for big.LITTLE get a single variant number, but you track
> two variant numbers in the code below?
>
> Thanks,
> James