This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: Revisit Core tuning flags


> On Sat, Sep 21, 2013 at 3:51 PM, Xinliang David Li <davidxl@google.com> wrote:
> > On Sat, Sep 21, 2013 at 12:54 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
> >> Hi,
> >> this is an updated version of the patch discussed at
> >> http://gcc.gnu.org/ml/gcc-patches/2012-12/msg00841.html
> >>
> >> It makes CORE tuning follow the optimization guidelines more closely.
> >> In particular it removes some tuning flags for features I implemented years
> >> back specifically for K7/K8 chips that ended up in Core tuning because
> >> it was based on generic. Incrementally I plan to drop some of these from
> >> generic, too.
> >>
> >> Compared to the previous version of the patch I left out the INC_DEC change,
> >> even though Core i7+ should resolve dependencies on partial flags correctly.
> >> The optimization manual still seems to advise against using these:
> >>
> >> Assembly/Compiler Coding Rule 33. (M impact, H generality)
> >> INC and DEC instructions should be replaced with ADD or SUB instructions,
> >> because ADD and SUB overwrite all flags, whereas INC and DEC do not, therefore
> >> creating false dependencies on earlier instructions that set the flags.
> >>
> >> The other change dropped is use_vector_fp_converts, which seems to improve
> >> Core performance.
> >
> > I did not see this in your patch, but Wei has this tuning in this patch:
> >
> 
> Sorry, I meant to ask: why drop this part?

Because I wanted to go with the obvious changes first.
> 
> David
> 
> > http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00884.html

This patch seems reasonable (in fact, I have pretty much the same in my tree).
use_vector_fp_converts is actually trying to solve the same problem on AMD
hardware: you need to type the whole register when converting.
So it may work well for AMD chips too, or maybe the difference is that
Intel chips somehow handle "cvtpd2ps        %xmm0, %xmm0" well even though
the upper half of xmm0 is ill-defined, while AMD chips don't.
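
To make the conversion issue above concrete, here is a minimal C sketch. It is
not taken from either patch. The function names narrow_scalar and
narrow_packed are made up for illustration; the intrinsics are the standard
SSE2 ones from <emmintrin.h>. Zeroing the upper half with _mm_set_sd is only
one way to give the whole register a defined value, and is an assumption here,
not necessarily what the tuning flag makes GCC emit. On targets without SSE2
the packed variant falls back to a plain cast.

```c
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Plain scalar narrowing.  On x86-64 a compiler will typically emit
   cvtsd2ss here; that instruction writes only the low 32 bits of the
   destination register and leaves the rest untouched.  */
float
narrow_scalar (double d)
{
  return (float) d;
}

/* Packed narrowing (hypothetical helper).  The whole 128-bit source is
   defined first (low half = d, upper half = 0.0), then converted with
   the packed instruction cvtpd2ps.  */
float
narrow_packed (double d)
{
#if defined(__SSE2__)
  __m128d wide = _mm_set_sd (d);        /* { d, 0.0 } */
  __m128 narrow = _mm_cvtpd_ps (wide);  /* cvtpd2ps */
  return _mm_cvtss_f32 (narrow);        /* extract the low float */
#else
  return (float) d;                     /* portable fallback */
#endif
}
```

Both functions should return the same value for any input, since the packed
and scalar conversions round the same way; only the treatment of the rest of
the register differs. Comparing the output of "gcc -O2 -S" for the two is an
easy way to see which instructions a given tuning actually selects.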

The patch seems OK. I do not see the reason for the
  && peep2_reg_dead_p (0, operands[0])
test.  The reg has to be dead since it is the full destination of the operation.

Let's wait a few days before committing so we know the effect of the
individual changes.  I will test it on AMD hardware and we can decide on
generic tuning then.

Honza

