[Bug target/77308] surprisingly large stack usage for sha512 on arm


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308

--- Comment #37 from Richard Earnshaw <rearnsha at gcc dot gnu.org> ---
(In reply to Bernd Edlinger from comment #34)
> (In reply to Richard Earnshaw from comment #33)

> > The logic is certainly strange.  Some cores run LDRD less quickly than they
> > can do LDM, or even two independent loads.  I suspect the logic is meant to
> > be: use LDRD if available and not (optimizing for speed on a slow
> > LDRD-device).
> 
> Ok, so instead of removing this completely I should change it to:
>    TARGET_LDRD
>    && (current_tune->prefer_ldrd_strd
>        || optimize_function_for_size_p (cfun))

That sounds about right.  Note that the original patch, back in 2013, said:

"* does not attempt to generate LDRD/STRD when optimizing for size and non of
the LDM/STM patterns match (but it would be easy to add),"
(https://gcc.gnu.org/ml/gcc-patches/2013-02/msg00604.html)

So it appears that this case was not attempted at the time.

I think when LDRD is not preferred we should still try LDM first if the address
offsets allow it, even when optimizing for size, and otherwise fall back to
LDRD when it supports the operation.
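
Roughly, the ordering I have in mind (a sketch only, not actual GCC code:
valid_ldm_offsets_p, emit_ldm and emit_ldrd are stand-in helpers for whatever
the real patch would use; the tuning tests are the identifiers discussed
above):

/* Sketch of the proposed selection order for a paired load.  */
static bool
gen_paired_load (rtx *operands)
{
  /* Cores that run LDRD quickly: use it whenever it can handle the
     access.  */
  if (TARGET_LDRD && current_tune->prefer_ldrd_strd)
    return emit_ldrd (operands);

  /* Otherwise prefer LDM when the address offsets allow it, even when
     optimizing for size.  */
  if (valid_ldm_offsets_p (operands))
    return emit_ldm (operands);

  /* No LDM pattern matched: fall back to LDRD at -Os, where it still
     saves an instruction over two separate loads.  */
  if (TARGET_LDRD && optimize_function_for_size_p (cfun))
    return emit_ldrd (operands);

  return false;  /* Caller emits two independent loads.  */
}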
