This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: Honnor ix86_accumulate_outgoing_args again


On 13-10-19 4:30 PM, Jan Hubicka wrote:
Jan,

Does this seem reasonable to you?
Oops, sorry, I missed your email. (I was travelling and I am finishing a paper
now).
Thanks,
Igor

-----Original Message-----
From: Zamyatin, Igor
Sent: Tuesday, October 15, 2013 3:48 PM
To: Jan Hubicka
Subject: RE: Honnor ix86_accumulate_outgoing_args again

Jan,

Now we have the following prologue in, say, the phi0 routine in equake:

0x804aa90 1  push   %ebp
0x804aa91 2  mov    %esp,%ebp
0x804aa93 3  sub    $0x18,%esp
0x804aa96 4  vmovsd 0x80ef7a8,%xmm0
0x804aa9e 5  vmovsd 0x8(%ebp),%xmm1
0x804aaa3 6  vcomisd %xmm1,%xmm0   <-- we see big stall somewhere here
or 1-2 instructions above

While earlier it was

0x804abd0 1 sub    $0x2c,%esp
0x804abd3 2 vmovsd 0x30(%esp),%xmm1
0x804abd9 3 vmovsd 0x80efcc8,%xmm0
0x804abe1 4 vcomisd %xmm1,%xmm0
Thanks for the analysis! It is a different benchmark than for Bulldozer, but
apparently the same case.  Again, we used to eliminate the frame pointer here
but IRA now doesn't.  Do you see the same regression with
-fno-omit-frame-pointer -maccumulate-outgoing-args?
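
For anyone who wants to compare prologues without building equake, a minimal
sketch of a test case follows. It is not the real phi0: the names phi_like,
helper and threshold are made up for illustration, and the function only
mimics the shape discussed above (a double argument read from the stack,
compared against a constant, plus one call that passes an argument on the
stack in 32-bit code). Whether it shows the same stall is untested; it is only
meant for inspecting the generated prologue.

```c
/* Hypothetical stand-in for a small routine like phi0; not equake code.
   Compare the prologues produced by, for example:
     gcc -m32 -O2 -mavx -S -mno-accumulate-outgoing-args repro.c
     gcc -m32 -O2 -mavx -S -maccumulate-outgoing-args repro.c
     gcc -m32 -O2 -mavx -S -fno-omit-frame-pointer \
         -maccumulate-outgoing-args repro.c  */

/* Assumed constant, loaded from memory like the first vmovsd above. */
static const double threshold = 0.6;

/* Out-of-line callee so the caller has an outgoing stack argument. */
static __attribute__((noinline)) double helper(double x)
{
    return x * x;
}

/* One double argument, one compare, one call on the taken path. */
__attribute__((noinline)) double phi_like(double t)
{
    if (t <= threshold)
        return 0.5 * (t - helper(t));
    return 1.0;
}
```

The third command line corresponds to the combination asked about above; the
first two give the non-accumulating and accumulating variants to diff against.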

I suppose this is a conflict between the push instruction handled by the stack
engine and the initialization of EBP that isn't.  That would explain why
Bulldozer doesn't seem to care about this particular benchmark (its stack
engine seems to have quite a different design).

This is a bit of a sad situation: accumulate-outgoing-args is expensive
code-size-wise, and it seems we don't really need esp with
-mno-accumulate-outgoing-args.  The non-accumulation code path was mistakenly
disabled for too long ;(

Vladimir, how much effort do you think it will be to fix the frame pointer
elimination here?
My guess is a week.  The problem is that I am busy and having some problems
with two small projects right now which I'd like to include in gcc-4.9.

But I think, this still can be fixed on stage2 as it is a PR.

I can imagine it is quite a tricky case. If so, I would suggest adding
m_CORE_ALL to X86_TUNE_ACCUMULATE_OUTGOING_ARGS with a comment explaining the
problem and mentioning the regression on equake on Core and mgrid on Bulldozer,
and opening an enhancement request for this...
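
To make the suggestion concrete, here is a rough, self-contained model of how
a per-tuning processor mask works. It is not the actual GCC source: the M_*
bits, the "before" mask and tune_enabled are stand-ins invented for this
sketch, and the real set of processors in the mask may well differ.

```c
/* Hypothetical processor bits; NOT GCC's real m_* definitions. */
enum {
    M_CORE2       = 1u << 0,
    M_NEHALEM     = 1u << 1,
    M_SANDYBRIDGE = 1u << 2,
    M_BDVER       = 1u << 3,
    M_GENERIC     = 1u << 4,

    /* Stand-in for a "all Core variants" group mask. */
    M_CORE_ALL = M_CORE2 | M_NEHALEM | M_SANDYBRIDGE,

    /* Assumed mask before the change (illustrative only). */
    ACCUMULATE_BEFORE = M_BDVER | M_GENERIC,

    /* Mask after adding the Core group, as suggested above.
       Regressions motivating it: equake on Core, mgrid on Bulldozer. */
    ACCUMULATE_AFTER = ACCUMULATE_BEFORE | M_CORE_ALL
};

/* A tuning applies when the selected CPU's bit is set in its mask. */
static int tune_enabled(unsigned mask, unsigned cpu)
{
    return (mask & cpu) != 0;
}
```

With these stand-in values, tune_enabled(ACCUMULATE_BEFORE, M_NEHALEM) is 0 and
tune_enabled(ACCUMULATE_AFTER, M_NEHALEM) is 1, which is the whole effect of
the proposed change.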

Also, if direct ESP use and push/pop instructions are causing such noticeable
issues, I wonder if we can't "shrink wrap" this into the red zone in 64-bit
compilation.  It seems that even with -maccumulate-outgoing-args, pushing the
frame allocation as late as possible in the function would be a good idea, so
that it is not close to the push/pop/call/ret.



