This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug target/77308] surprisingly large stack usage for sha512 on arm
- From: "bernd.edlinger at hotmail dot de" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Thu, 03 Nov 2016 14:50:43 +0000
- Subject: [Bug target/77308] surprisingly large stack usage for sha512 on arm
- Auto-submitted: auto-generated
- References: <bug-77308-4@http.gcc.gnu.org/bugzilla/>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
--- Comment #56 from Bernd Edlinger <bernd.edlinger at hotmail dot de> ---
(In reply to wilco from comment #55)
> (In reply to Bernd Edlinger from comment #39)
> > Created attachment 39940 [details]
> > proposed patch, v2
> >
> > The last upload was accidentally truncated.
> > I uploaded the right patch.
>
> Right, so looking at your patch, I think we should make the LDRD peephole
> change in a separate patch. I tried your foo example on all combinations of
> ARM, Thumb-2, VFP and NEON on various CPUs with both settings of
> prefer_ldrd_strd.
>
> In all cases current GCC generates LDRD/STRD, even for zero offsets.
> CPUs where prefer_ldrd_strd=false emit LDR/STR for the shifts with
> -msoft-float or -mfpu=vfp (but not -mfpu=neon). This is clearly incorrect
> given that LDRD/STRD is used in all other cases, and prefer_ldrd_strd seems
> to indicate whether to prefer LDRD/STRD in the prolog/epilog and in
> inlined memcpy.
>
> So that means we should remove the odd checks for codesize and
> current_tune->prefer_ldrd_strd from all the peepholes.
Agreed, I can split the patch.
From what I understand, we should never emit ldrd/strd out of
the memmovdi2 pattern when optimizing for speed and disable
the peephole the way I proposed in the patch.
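To repeat the kind of comparison described in the quoted comment, one can compile a small function that loads, shifts and stores 64-bit values (as sha512 does on its uint64_t state) and inspect the assembly for LDRD/STRD versus pairs of LDR/STR. The sketch below is a hypothetical test case, not the "foo" example referenced in the thread (that example is not reproduced here); the function name shift_store, the arm-none-eabi-gcc toolchain name and the option combinations listed in the comment are illustrative assumptions. Which instructions actually appear depends on the GCC version and the CPU tuning.

```c
#include <stdint.h>

/*
 * Hypothetical test case: a 64-bit load, shift and store at offsets 0 and 8.
 * On 32-bit ARM each uint64_t access may be emitted either as one LDRD/STRD
 * or as two LDR/STR instructions.
 *
 * Illustrative ways to inspect the output (assumed cross toolchain name):
 *   arm-none-eabi-gcc -O2 -marm   -mfloat-abi=soft               -S t.c
 *   arm-none-eabi-gcc -O2 -mthumb -mfpu=vfp  -mfloat-abi=softfp  -S t.c
 *   arm-none-eabi-gcc -O2 -mthumb -mfpu=neon -mfloat-abi=softfp  -S t.c
 * Vary -mcpu=... to try tunings with different prefer_ldrd_strd settings,
 * and -Os versus -O2 to see whether the size-optimization checks matter.
 */
void shift_store(uint64_t *dst, const uint64_t *src, unsigned n)
{
    /* n must be less than 64, otherwise the shifts are undefined. */
    dst[0] = src[0] << n;   /* zero offset */
    dst[1] = src[1] >> n;   /* offset 8 */
}
```

The function is deliberately tiny so that the loads, shifts and stores dominate the generated code, which makes it easy to spot whether each 64-bit access became a single LDRD/STRD or a split pair.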