This is the mail archive of the
gcc-help@gcc.gnu.org
mailing list for the GCC project.
Re: GCC asm block optimizations on x86_64
On Wed, Aug 29, 2007 at 12:11:22PM +0100, Darryl Miles wrote:
> Rask Ingemann Lambertsen wrote:
> >On Tue, Aug 28, 2007 at 11:02:49PM +0100, Darryl Miles wrote:
> > Peephole definitions check for cases like this and won't do the
> >optimization clobbering the flags register if the flags register is live at
> >that point.
>
> So I take it that peephole works by knowing the instructions emitted
> with annotations about the lifetimes of registers / flags / other useful
> stuff to help it. I was thinking it was a bit more blind to things than
> that.
The peephole2 pass tries to match one or more instructions in GCC's RTL
format against a template, and if successful, replaces those instructions by
one or more new ones. You can include a condition for the replacement as
well as requiring a scratch register to be available. In this particular
case, the condition is that the flags register isn't live.
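To make the idea concrete, here is a toy model in C of a conditional peephole replacement. It is only a sketch under stated assumptions: the opcode enum, `struct insn` and `peephole_zero()` are hypothetical names made up for illustration, not GCC's RTL or its actual peephole2 implementation, and the flags are assumed dead at the end of the block. It rewrites "mov $0, reg" into the shorter "xor reg, reg" only where the flags register is not live afterwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy instruction set (hypothetical, not GCC's RTL). */
enum opcode {
    OP_MOV_ZERO,  /* mov $0, reg  : leaves the flags untouched      */
    OP_XOR_ZERO,  /* xor reg, reg : shorter, but clobbers the flags */
    OP_CMP,       /* writes the flags                               */
    OP_JCC,       /* reads the flags                                */
    OP_OTHER      /* neither reads nor writes the flags             */
};

struct insn {
    enum opcode op;
    int reg;      /* register operand, unused by the flag logic */
};

static bool reads_flags(enum opcode op)
{
    return op == OP_JCC;
}

static bool writes_flags(enum opcode op)
{
    return op == OP_CMP || op == OP_XOR_ZERO;
}

/*
 * Walk the block backwards, tracking whether the flags are live after
 * each instruction. Replace "mov $0" by "xor" only where the flags are
 * dead at that point. Returns the number of replacements made.
 */
size_t peephole_zero(struct insn *code, size_t n)
{
    size_t replaced = 0;
    bool flags_live = false;  /* assumption: flags are dead at block end */

    assert(code != NULL || n == 0);

    for (size_t i = n; i-- > 0; ) {
        /* Here flags_live means "live after instruction i". */
        if (code[i].op == OP_MOV_ZERO && !flags_live) {
            code[i].op = OP_XOR_ZERO;
            replaced++;
        }
        /* Update to "live before instruction i". */
        if (reads_flags(code[i].op))
            flags_live = true;
        else if (writes_flags(code[i].op))
            flags_live = false;
    }
    return replaced;
}
```

For the sequence cmp; mov $0; jcc; mov $0 only the last mov should be rewritten, because the first one sits between the compare and the conditional jump that consumes its flags.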
> I did not understand the relevance to knowing if it is (*movsi_xxx) or
> (*movdi_xxx). From my point of view knowing that would not alter the 2
> original points I was making [1] and [3]. Maybe there is some
> pipelining (or other complex) issue I don't know about which makes the
> emitted code better than what I'm suggesting.
No, it's just that when debugging problems with poor code, it's best to
know exactly what the compiler thinks it's generating.
> Recapping on the original issues:
>
> [1] failure to treat setting a register to the value of zero as a
> special case (since there may be many ways to achieve this on a given
> CPU, different methods have different trades, insn length, unwanted side
> effects) which may allow this operation a lot of freedom for moving /
> scheduling.
> [3] usage of %ebx when %r8d would have been a better choice, at the time
> %ebx is needed to be allocated the lifetime of the temporary use of %r8d
> was over. i.e. allocating of registers which form outputs but not
> inputs should take place last thing (at the moment of #APP) maybe by
> doing this %r8d would have been a candidate ? which would negate the
> need for the push/pop's.
GCC's register allocator isn't as good as we'd like it to be. I think
it's causing both problems.
> Thanks for your thoughts. Maybe I am just expecting too much.
I don't think so.
--
Rask Ingemann Lambertsen