This is the mail archive of the
mailing list for the GCC project.
Re: Revisit Core tunning flags
- From: Jan Hubicka <hubicka at ucw dot cz>
- To: Xinliang David Li <davidxl at google dot com>
- Cc: Jan Hubicka <hubicka at ucw dot cz>, GCC Patches <gcc-patches at gcc dot gnu dot org>, "H.J. Lu" <hjl dot tools at gmail dot com>, Wei Mi <wmi at google dot com>
- Date: Sun, 22 Sep 2013 10:26:24 +0200
- Subject: Re: Revisit Core tunning flags
- Authentication-results: sourceware.org; auth=none
- References: <20130921195426 dot GA20274 at kam dot mff dot cuni dot cz> <CAAkRFZKtN7V2EOUrufLEWpexZc+MZiaz+u7DGAff9NMyc078DA at mail dot gmail dot com> <CAAkRFZKkY+BZjGLsp6EYs+BnShEHUGxDUtSk=TRG1=0YRbOy_g at mail dot gmail dot com>
> On Sat, Sep 21, 2013 at 3:51 PM, Xinliang David Li <firstname.lastname@example.org> wrote:
> > On Sat, Sep 21, 2013 at 12:54 PM, Jan Hubicka <email@example.com> wrote:
> >> Hi,
> >> this is an updated version of the patch discussed at
> >> http://gcc.gnu.org/ml/gcc-patches/2012-12/msg00841.html
> >> It makes Core tuning follow the optimization guidelines more closely.
> >> In particular it removes some tuning flags for features I implemented years
> >> back specifically for K7/K8 chips that ended up in Core tuning because
> >> it was based on generic. Incrementally I plan to drop some of these from
> >> generic, too.
> >> Compared to the previous version of the patch, I left out the INC_DEC change,
> >> even though Core i7+ should resolve dependencies on partial flags correctly.
> >> The optimization manual still seems to advise against using these instructions:
> >> Assembly/Compiler Coding Rule 33. (M impact, H generality)
> >> INC and DEC instructions should be replaced with ADD or SUB instructions,
> >> because ADD and SUB overwrite all flags, whereas INC and DEC do not, therefore
> >> creating false dependencies on earlier instructions that set the flags.
> >> The other change dropped is use_vector_fp_converts, which seems to improve
> >> Core performance.
> > I did not see this in your patch, but Wei has this tuning in this patch:
> Sorry, I meant to ask: why drop this part?
Because I wanted to go with the obvious changes first.
> > http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00884.html
This patch seems reasonable. (In fact, I have pretty much the same in my tree.)
use_vector_fp_converts is actually trying to solve the same problem in AMD
hardware - you need to type the whole register when converting.
So it may work well for AMD chips too, or maybe the difference is that
Intel chips somehow handle "cvtpd2ps %xmm0, %xmm0" well even though
the upper half of xmm0 is ill defined, while AMD chips don't.
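
For readers following along, here is a minimal C sketch of the kind of source
this tuning affects. The function names are made up for illustration and are
not taken from either patch. A scalar double -> float narrowing is normally
emitted as cvtsd2ss, which writes only the low 32 bits of the destination xmm
register and so depends on that register's previous contents. The packed form,
cvtpd2ps, writes the whole register. Which form GCC picks depends on the
compiler version and the -mtune setting, so treat this as something to inspect
with "gcc -O2 -S", not as a guaranteed result.

```c
/* Hypothetical example: names are illustrative, not from the patch. */

/* Scalar double -> float conversion.  On x86-64 with SSE2 this is the
   conversion that may be emitted as cvtsd2ss (partial register write)
   or, with vector FP converts enabled, as a packed cvtpd2ps.  */
float narrow(double x)
{
    return (float) x;
}

/* A loop that performs the conversion repeatedly, so any false
   dependency on the destination register sits on the hot path.  */
float sum_narrowed(const double *v, int n)
{
    float acc = 0.0f;
    for (int i = 0; i < n; i++)
        acc += narrow(v[i]);
    return acc;
}
```

Comparing the assembly for different -mtune values (for example
-mtune=core2 against the default) should show whether the scalar or the
packed conversion was chosen.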
The patch seems OK. I do not see a reason for the
&& peep2_reg_dead_p (0, operands)
test. The register has to be dead since it is the full destination of the operation.
Let's wait a few days before committing so we know the effect of the
individual changes. I will test it on AMD hardware and we can decide on
generic tuning then.