This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
RE: Honnor ix86_accumulate_outgoing_args again
- From: "Zamyatin, Igor" <igor dot zamyatin at intel dot com>
- To: Vladimir Makarov <vmakarov at redhat dot com>, Jan Hubicka <hubicka at ucw dot cz>, "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>
- Date: Sun, 3 Nov 2013 09:58:57 +0000
- Subject: RE: Honnor ix86_accumulate_outgoing_args again
So, Jan, what do you think would be the best solution for stage 1?
Thanks,
Igor
> -----Original Message-----
> From: gcc-patches-owner@gcc.gnu.org [mailto:gcc-patches-owner@gcc.gnu.org]
> On Behalf Of Vladimir Makarov
> Sent: Monday, October 21, 2013 6:52 AM
> To: Jan Hubicka; Zamyatin, Igor; gcc-patches@gcc.gnu.org
> Subject: Re: Honnor ix86_accumulate_outgoing_args again
>
> On 13-10-19 4:30 PM, Jan Hubicka wrote:
> >> Jan,
> >>
> >> Does this seem reasonable to you?
> > Oops, sorry, I missed your email. (I was travelling and I am finishing
> > a paper now).
> >> Thanks,
> >> Igor
> >>
> >>> -----Original Message-----
> >>> From: Zamyatin, Igor
> >>> Sent: Tuesday, October 15, 2013 3:48 PM
> >>> To: Jan Hubicka
> >>> Subject: RE: Honnor ix86_accumulate_outgoing_args again
> >>>
> >>> Jan,
> >>>
> >>> Now we have the following prologue in, say, the phi0 routine in equake:
> >>>
> >>> 0x804aa90 1 push %ebp
> >>> 0x804aa91 2 mov %esp,%ebp
> >>> 0x804aa93 3 sub $0x18,%esp
> >>> 0x804aa96 4 vmovsd 0x80ef7a8,%xmm0
> >>> 0x804aa9e 5 vmovsd 0x8(%ebp),%xmm1
> >>> 0x804aaa3 6 vcomisd %xmm1,%xmm0   <-- we see a big stall somewhere here
> >>>                                         or 1-2 instructions above
> >>>
> >>> whereas earlier it was:
> >>>
> >>> 0x804abd0 1 sub $0x2c,%esp
> >>> 0x804abd3 2 vmovsd 0x30(%esp),%xmm1
> >>> 0x804abd9 3 vmovsd 0x80efcc8,%xmm0
> >>> 0x804abe1 4 vcomisd %xmm1,%xmm0
> > Thanks for the analysis! It is a different benchmark than the one on
> > Bulldozer, but apparently the same case. Again, we used to eliminate the
> > frame pointer here, but IRA now doesn't. Do you see the same regression
> > with -fno-omit-frame-pointer -maccumulate-outgoing-args?
> >
> > I suppose this is a conflict between the push instruction, which is
> > handled by the stack engine, and the initialization of EBP, which isn't.
> > That would explain why Bulldozer doesn't seem to care about this
> > particular benchmark (its stack engine seems to have a quite different
> > design).
> >
> > This is a bit of a sad situation: accumulate-outgoing-args is expensive
> > code-size-wise, and it seems we don't really need esp with
> > -mno-accumulate-outgoing-args.
> > The non-accumulation code path was mistakenly disabled for too long ;(
> >
> > Vladimir, how much effort do you think it will be to fix the frame
> > pointer elimination here?
> My guess is a week. The problem is that I am busy and having some problems
> with two small projects right now, which I'd like to include in gcc-4.9.
>
> But I think this can still be fixed in stage 2, as it is a PR.
>
> > I can imagine it is quite a tricky case. If so, I would suggest adding
> > m_CORE_ALL to X86_TUNE_ACCUMULATE_OUTGOING_ARGS with a comment
> > explaining the problem and mentioning the regressions on equake on Core
> > and mgrid on Bulldozer, and opening an enhancement request for this...
> >
> > I also wonder: if direct ESP use and push/pop instructions are causing
> > such noticeable issues, can't we "shrink-wrap" this into the red zone in
> > 64-bit compilation? It seems that even with -maccumulate-outgoing-args,
> > pushing the frame allocation as late as possible in the function would
> > be a good idea, so that it is not close to the push/pop/call/ret.
> >
> >