This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: RFC: Patch to implement Aarch64 SIMD ABI


On Thu, Jul 19, 2018 at 8:31 AM, Richard Sandiford
<richard.sandiford@arm.com> wrote:
> Hi,
>
> Thanks for doing this.
>
> Steve Ellcey <sellcey@cavium.com> writes:
>> This is a patch to support the Aarch64 SIMD ABI [1] in GCC.  I intend
>> to eventually follow this up with two more patches; one to define the
>> TARGET_SIMD_CLONE* macros and one to improve the GCC register
>> allocation/usage when calling SIMD functions.
>>
>> The significant difference between the standard ARM ABI and the SIMD ABI
>> is that in the normal ABI a callee saves only the lower 64 bits of registers
>> V8-V15, in the SIMD ABI the callee must save all 128 bits of registers
>> V8-V23.
>>
>> This patch checks for SIMD functions and saves the extra registers when
>> needed.  It does not change the caller behavour, so with just this patch
>> there may be values saved by both the caller and callee.  This is not
>> efficient, but it is correct code.
>>
>> This patch bootstraps and passes the GCC testsuite but that only verifies
>> I haven't broken anything, it doesn't validate the handling of SIMD functions.
>> I tried to write some tests, but I could never get GCC to generate code
>> that would save the FP callee-save registers in the prologue.  Complex code
>> might generate spills and fills but it never triggered the prologue/epilogue
>> code to save V8-V23.  If anyone has ideas on how to write a test that would
>> cause GCC to generate this code I would appreciate some ideas.  Just doing
>> lots of calculations with lots of intermediate values doesn't seem to be enough.
>
> Probably easiest to use asm clobbers, e.g.:
>
> void __attribute__ ((aarch64_vector_pcs))
> f (void)
> {
>   asm volatile ("" ::: "s8", "s13");
> }
>
> This also lets you control exactly which registers are saved.

For just checking the save and restore the technique Richard suggests
is probably sufficient.

One of the techniques I've used in the past in general is to force
everything to be tested with a command line option added for testing
-. In this case after all the C library dependence of the testsuite
isn't huge and there wouldn't be any vector PCS interfaces to the
libraries needed to run the testsuite that things would work ?

You could cross-check coverage by using lcov. Instructions thanks to
marxin are :

$> ../configure --disable-bootstrap --enable-coverage=opt
--enable-languages=c,c++,fortran,go,jit,lto --enable-host-shared
$> make
$> make check
$>find gcc/testsuite/ -name '*.gcda' -exec rm -rf {} \;
$> lcov -d . --capture --output-file gcc.info
$> lcov --remove gcc.info "/usr/*" "/opt/*" "*/gcc/gt-*"
"*/gcc/gtype-*" --output-file gcc.info
$> genhtml gcc.info --ignore-errors=source --output-directory html
--html-epilog epilog.txt

You will see a warning

genhtml: WARNING: cannot read $builddir/gcc/cfns.gperf

and you can ignore that.

Then just look at the html output that it produces, pretty neat and I
see about 80% coverage on an aarch64-none-linux-gnu test run with
c,c++,fortran,go,lto IIRC.


regards
Ramana

1.
>
>> @@ -4105,7 +4128,8 @@ aarch64_layout_frame (void)
>>        {
>>       /* If there is an alignment gap between integer and fp callee-saves,
>>          allocate the last fp register to it if possible.  */
>> -     if (regno == last_fp_reg && has_align_gap && (offset & 8) == 0)
>> +     if (regno == last_fp_reg && has_align_gap
>> +         && !simd_function && (offset & 8) == 0)
>>         {
>>           cfun->machine->frame.reg_offset[regno] = max_int_offset;
>>           break;
>> @@ -4117,7 +4141,7 @@ aarch64_layout_frame (void)
>>       else if (cfun->machine->frame.wb_candidate2 == INVALID_REGNUM
>>                && cfun->machine->frame.wb_candidate1 >= V0_REGNUM)
>>         cfun->machine->frame.wb_candidate2 = regno;
>> -     offset += UNITS_PER_WORD;
>> +     offset += simd_function ? UNITS_PER_VREG : UNITS_PER_WORD;
>>        }
>>
>>    offset = ROUND_UP (offset, STACK_BOUNDARY / BITS_PER_UNIT);
>> @@ -4706,8 +4730,11 @@ aarch64_process_components (sbitmap components, bool prologue_p)
>>    while (regno != last_regno)
>>      {
>>        /* AAPCS64 section 5.1.2 requires only the bottom 64 bits to be saved
>> -      so DFmode for the vector registers is enough.  */
>> -      machine_mode mode = GP_REGNUM_P (regno) ? E_DImode : E_DFmode;
>> +      so DFmode for the vector registers is enough.  For simd functions
>> +         we want to save the entire register.  */
>> +      machine_mode mode = GP_REGNUM_P (regno) ? E_DImode
>> +     : (aarch64_simd_function_p (cfun->decl) ? E_TFmode : E_DFmode);
>
> This condition also occurs in aarch64_push_regs and aarch64_pop_regs.
> It'd probably be worth splitting it out into a subfunction.
>
> I think you also need to handle the writeback cases, which should work
> for Q registers too.  This will mean extra loadwb_pair and storewb_pair
> patterns.
>
> LGTM otherwise FWIW.
>
>> diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
>> index f284e74..d11474e 100644
>> --- a/gcc/config/aarch64/aarch64.h
>> +++ b/gcc/config/aarch64/aarch64.h
>> @@ -500,6 +500,8 @@ extern unsigned aarch64_architecture_version;
>>  #define PR_LO_REGNUM_P(REGNO)\
>>    (((unsigned) (REGNO - P0_REGNUM)) <= (P7_REGNUM - P0_REGNUM))
>>
>> +#define FP_SIMD_SAVED_REGNUM_P(REGNO)                        \
>> +  (((unsigned) (REGNO - V8_REGNUM)) <= (V23_REGNUM - V8_REGNUM))
>
> (We should probably rewrite these to use IN_RANGE at some point,
> but I agree it's better to be consistent until then.)
>
> Thanks,
> Richard


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]