[patch, arm] align saved FP regs on stack

Ramana Radhakrishnan ramana.gcc@googlemail.com
Wed May 6 09:46:00 GMT 2015


On Sat, Nov 15, 2014 at 12:46 AM, Sandra Loosemore
<sandra@codesourcery.com> wrote:
> On ARM targets, the stack is aligned to an 8-byte boundary, but when
> saving/restoring the VFP coprocessor registers in the function
> prologue/epilogue, it is possible for the 8-byte values to end up at
> locations that are 4-byte aligned but not 8-byte aligned.  This can result
> in a performance penalty on micro-architectures that are optimized for
> well-aligned data, especially when such a misalignment may result in cache
> line splits within a single access.  This patch detects when at least one
> coprocessor register value needs to be saved and adds some additional
> padding to the stack at that point if necessary to align it to an 8-byte
> boundary.  I've re-used the existing logic to try pushing a 4-byte scratch
> register and only fall back to an explicit stack adjustment if that fails.
>
> NVIDIA found that an earlier version of this patch (benchmarked with
> SPECint2k and SPECfp2k on an older version of GCC) gave measurable
> improvements on their Tegra K1 64-bit processor, aka "Denver".  We aren't
> sure what other ARM processors might benefit from the extra alignment, so
> we've given it its own command-line option instead of tying it to -mtune.
>
> I did some hand-testing of this patch on small test cases to verify that the
> expected alignment was happening, but it seemed to me that the expected
> assembly-language patterns were likely too fragile to be hard-wired into a
> test case.  I also ran regression tests both with and without the switch set
> so it doesn't break other things.  OK to commit?
>
> -Sandra
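
The padding decision described in the quoted message can be sketched roughly as below. This is only an illustration under stated assumptions, not code from the patch: the helper name `vfp_save_padding` and its parameters are hypothetical, and the sketch assumes the incoming stack pointer is 8-byte aligned and that the core registers pushed before the VFP save area occupy 4 bytes each.

```c
#include <stdio.h>

/* Hypothetical helper, not taken from the patch.
   core_save_bytes: bytes of 4-byte core registers pushed before the
                    VFP save area, measured from an 8-byte-aligned
                    incoming stack pointer.
   num_vfp_regs:    number of 8-byte VFP registers that must be saved.
   Returns the number of padding bytes needed so that the VFP save
   area starts on an 8-byte boundary.  */
static unsigned
vfp_save_padding (unsigned core_save_bytes, unsigned num_vfp_regs)
{
  /* No coprocessor registers to save: no extra alignment needed.  */
  if (num_vfp_regs == 0)
    return 0;

  /* Round the current offset up to the next multiple of 8.  */
  return (8u - (core_save_bytes % 8u)) % 8u;
}
```

For example, with three core registers pushed (12 bytes) and at least one VFP register to save, the helper returns 4. That is the case where the quoted message first tries pushing one extra 4-byte scratch register and falls back to an explicit stack adjustment only if that fails.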


Coming back to an old patch, now that we are in stage1 again. In the
ARM backend we are moving away from such bespoke command-line options
that control CPU tuning but do not get widely tested.

The way of doing this these days would be to add this to the CPU
tuning tables as a tuning option, along with the command-line option.
Currently this would be set to false, but it then gives folks who want
to try this out on other cores a chance to turn this on by default. I
would prefer that a -mcpu=denver option be added to the compiler. It
would inherit its features from a tuning table that is close enough
to Denver, and it would allow testing with --with-cpu=denver for those
folks who'd like to auto-test the compiler with such a feature.
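
To make the suggestion concrete, here is a minimal sketch of a per-CPU tuning flag combined with a command-line override. Everything in it is a simplified assumption for illustration: `struct tune_sketch`, `tune_table`, `want_vfp_alignment` and the `option_state` encoding are hypothetical names and do not reflect the actual layout of the ARM backend's tuning structures. The "denver" entry is likewise hypothetical and is set to true only to show how a core could enable the feature by default.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a per-CPU tuning record.  */
struct tune_sketch
{
  const char *cpu_name;
  bool align_vfp_saves;   /* Align saved FP registers by default?  */
};

/* Hypothetical table: existing cores default to false; a core that
   benefits from the alignment can set the flag to true.  */
static const struct tune_sketch tune_table[] =
{
  { "generic", false },
  { "denver",  true  },
};

/* option_state: -1 = option not given on the command line,
                  0 = explicitly disabled, 1 = explicitly enabled.
   An explicit option wins; otherwise the tuning table decides.  */
static bool
want_vfp_alignment (const char *cpu, int option_state)
{
  if (option_state >= 0)
    return option_state != 0;

  for (size_t i = 0; i < sizeof tune_table / sizeof tune_table[0]; i++)
    if (strcmp (tune_table[i].cpu_name, cpu) == 0)
      return tune_table[i].align_vfp_saves;

  /* Unknown CPU: keep the conservative default.  */
  return false;
}
```

With this shape the default lives in the table, so a compiler configured for a given core picks it up and gets it tested automatically, while the explicit option stays available for experiments on other cores.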


regards
Ramana

