This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Reduce stack usage in sha512 (PR target/77308)



On 30/09/16 14:34, Bernd Edlinger wrote:
On 09/30/16 12:14, Bernd Edlinger wrote:
Eric Botcazou wrote:
A comment before the SETs and a testcase would be nice.  IIRC
we do have stack size testcases via using -fstack-usage.
Or -Wstack-usage, which might be more appropriate here.
Yes.  good idea.  I was not aware that we already have that kind of tests.

When trying to write this test. I noticed, that I did not try -Os so far.
But for -Os the stack is still the unchanged 3500 bytes.

However for embedded targets I am often inclined to use -Os, and
would certainly not expect the stack to explode...

I see in arm.md there are places like

        /* If we're optimizing for size, we prefer the libgcc calls.  */
        if (optimize_function_for_size_p (cfun))
          FAIL;

Oh, yeah.  The comment is completely misleading.

If this pattern fails, expmed.c simply expands some
less efficient rtl, which also results in two shifts
and one or-op.  No libgcc calls at all.

So in simple cases without spilling the resulting
assembler is the same, regardless if this pattern
fails or not.  But the half-defined out registers
make a big difference when it has to be spilled.

        /* Expand operation using core-registers.
           'FAIL' would achieve the same thing, but this is a bit smarter.  */
        scratch1 = gen_reg_rtx (SImode);
        scratch2 = gen_reg_rtx (SImode);
        arm_emit_coreregs_64bit_shift (LSHIFTRT, operands[0], operands[1],
                                       operands[2], scratch1, scratch2);


.. that explains why this happens.  I think it would be better to
use the emit_coreregs for shift count >= 32, because these are
effectively 32-bit shifts.

Will try if that can be improved, and come back with the
results.

The test case with -Os has 3520 bytes stack usage.
When only shift count >= 32 are handled we
have still 3000 bytes stack usage.
And when arm_emit_coreregs_64bit_shift is always
allowed to run, we have 2360 bytes stack usage.

Also for the code size it is better not to fail this
pattern.  So I propose to remove this exception in all
three expansions.

Here is an improved patch with the test case from the PR.
And a comment on the redundant SET why it is better to clear
the out register first.


Bootstrap and reg-testing on arm-linux-gnueabihf.
Is it OK for trunk?

This looks ok to me.
Thanks,
Kyrill


Thanks
Bernd.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]