[Bug target/77308] surprisingly large stack usage for sha512 on arm

Tue Nov 1 11:43:00 GMT 2016

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308

--- Comment #36 from wilco at gcc dot gnu.org ---
(In reply to Bernd Edlinger from comment #34)
> (In reply to Richard Earnshaw from comment #33)
> > (In reply to Wilco from comment #32)
> > > (In reply to Bernd Edlinger from comment #31)
> > > > Furthermore, if I want to do -Os, the third condition is FALSE too.
> > > > But one LDRD must be shorter than two LDRs?
> > > > 
> > > > That seems wrong...
> > > 
> > > Indeed, on a target that supports LDRD you want to use LDRD if legal. LDM
> > > should only be tried on Thumb-1. Emitting LDRD from a peephole when the
> > > offset is in range will never increase code size so should always be enabled.
> > 
> > The logic is certainly strange.  Some cores run LDRD less quickly than they
> > can do LDM, or even two independent loads.  I suspect the logic is meant to
> > be: use LDRD if available and not (optimizing for speed on a slow
> > LDRD-device).
> 
> OK, so instead of removing this completely, I should change it to:
>    TARGET_LDRD
>    && (current_tune->prefer_ldrd_strd
>        || optimize_function_for_size_p (cfun))

That's better, but it still won't emit LDRD, since most cores seem to have
prefer_ldrd_strd disabled. Given that we currently always emit LDRD/STRD for
DImode accesses, this should just check TARGET_LDRD.