This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.


[Bug target/77308] surprisingly large stack usage for sha512 on arm


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308

--- Comment #56 from Bernd Edlinger <bernd.edlinger at hotmail dot de> ---
(In reply to wilco from comment #55)
> (In reply to Bernd Edlinger from comment #39)
> > Created attachment 39940 [details]
> > proposed patch, v2
> > 
> > last upload was accidentally truncated.
> > uploaded the right patch.
> 
> Right so looking at your patch, I think we should make the LDRD peephole
> change in a separate patch. I tried your foo example on all combinations of
> ARM, Thumb-2, VFP, NEON on various CPUs with both settings of
> prefer_ldrd_strd.
> 
> In all cases the current GCC generates LDRD/STRD, even for zero offsets.
> CPUs where prefer_ldrd_strd=false emit LDR/STR for the shifts with
> -msoft-float or -mfpu=vfp (but not -mfpu=neon). This is clearly incorrect
> given that LDRD/STRD is used in all other cases, and prefer_ldrd_strd seems
> to imply whether to prefer using LDRD/STRD in prolog/epilog and inlined
> memcpy.
> 
> So that means we should remove the odd checks for codesize and
> current_tune->prefer_ldrd_strd from all the peepholes.

Agreed, I can split the patch.

From what I understand, we should never emit ldrd/strd out of
the memmovdi2 pattern when optimizing for speed, and we should
disable the peephole the way I proposed in the patch.