This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



[PATCH, i386]: Fix PR62011, False data dependency in popcnt instruction


Hello!

The attached patch fixes the false data dependency on the output
register of the popcnt, lzcnt and tzcnt insns on Sandy Bridge and
Haswell targets.
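
As a rough illustration of the kind of code that is affected, here is a
minimal sketch of a popcount loop. It is not the testcase from the PR;
count_bits is a hypothetical helper, and __builtin_popcountll is the
GCC builtin that expands to popcnt when the target supports it.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum the population counts of a buffer of 64-bit words.
   Hypothetical helper, not the testcase from the PR.  */
uint64_t count_bits(const uint64_t *buf, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        /* The count of buf[i] does not logically depend on earlier
           iterations, but on affected CPUs popcnt also waits for the
           previous value of its destination register, which can
           serialize otherwise independent counts.  */
        total += (uint64_t)__builtin_popcountll(buf[i]);
    }
    return total;
}
```

Whether a given build hits the slow path depends on which registers the
compiler happens to allocate, which is why the effect is easy to miss.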

The new insn pattern shadows the existing one, and after reload, the
clearing insn is split out of the insn. This way the clearing insn can
be scheduled by the postreload scheduler. The new pattern takes care to
avoid live registers, so the compiler is always able to clear the
output reg.
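
The resulting instruction pair can be sketched by hand with inline asm.
This is only an illustration under assumptions: the compiler emits the
sequence itself, popcnt_cleared is a hypothetical name, xor is assumed
as the clearing insn, and the x86 path needs a CPU with popcnt. The
early-clobber constraint mirrors the requirement that the cleared
register must not hold a live input.

```c
#include <stdint.h>

/* Hand-written model of "clear the output, then count".  */
static inline uint64_t popcnt_cleared(uint64_t x)
{
#if defined(__x86_64__) && defined(__GNUC__)
    uint64_t result;
    /* "=&r" (early clobber) forces the output into a different
       register from the input, so zeroing it cannot destroy x.  */
    __asm__ ("xorl %k0, %k0\n\t"   /* break the dependency on the old value */
             "popcntq %1, %0"      /* AT&T order: source, destination */
             : "=&r" (result)
             : "r" (x)
             : "cc");
    return result;
#else
    /* Portable fallback for other targets.  */
    return (uint64_t)__builtin_popcountll(x);
#endif
}
```

Because the xor is a separate instruction, a scheduler is free to move
it away from the popcnt, which is the point of splitting it out.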

The testcase from the PR, compiled with -O3 -march=corei7, improves on
Ivy Bridge from:

unsigned        209717360000    3.21002 sec     16.3329 GB/s
uint64_t        209717360000    4.06517 sec     12.8971 GB/s

to (-O3 -march=corei7 -mtune-ctrl=avoid_false_dep_for_bmi):

unsigned        209717360000    3.14541 sec     16.6683 GB/s
uint64_t        209717360000    2.3663 sec      22.1564 GB/s

Due to the high impact, the new tune flag is enabled by default for
Intel tunes and generic:

m_SANDYBRIDGE | m_HASWELL | m_INTEL | m_GENERIC
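
For readers unfamiliar with tune masks, the following is a simplified
model of how such a mask selects processors. The enum, PROC_BIT and
tune_flag_enabled are hypothetical and are not GCC's actual definitions;
only the four m_* names in the mask come from this mail.

```c
/* Hypothetical, simplified processor list; the real one is larger.  */
enum processor {
    PROC_GENERIC,
    PROC_INTEL,
    PROC_SANDYBRIDGE,
    PROC_HASWELL,
    PROC_OTHER           /* stands in for any tuning not in the mask */
};

#define PROC_BIT(p)    (1u << (p))
#define m_GENERIC      PROC_BIT(PROC_GENERIC)
#define m_INTEL        PROC_BIT(PROC_INTEL)
#define m_SANDYBRIDGE  PROC_BIT(PROC_SANDYBRIDGE)
#define m_HASWELL      PROC_BIT(PROC_HASWELL)

/* Tunings for which the flag is on by default, as listed above.  */
static const unsigned avoid_false_dep_for_bmi_mask =
    m_SANDYBRIDGE | m_HASWELL | m_INTEL | m_GENERIC;

/* Returns nonzero when the mask covers the selected tuning.  */
static int tune_flag_enabled(unsigned mask, enum processor tune)
{
    return (mask & PROC_BIT(tune)) != 0;
}
```

With this model, tune_flag_enabled(avoid_false_dep_for_bmi_mask,
PROC_HASWELL) is nonzero, while the same call with PROC_OTHER is zero.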

2014-08-16  Uros Bizjak  <ubizjak@gmail.com>

    PR target/62011
    * config/i386/x86-tune.def (X86_TUNE_AVOID_FALSE_DEP_FOR_BMI):
    New tune flag.
    * config/i386/i386.h (TARGET_AVOID_FALSE_DEP_FOR_BMI): New define.
    * config/i386/i386.md (unspec) <UNSPEC_INSN_FALSE_DEP>: New unspec.
    (ffs<mode>2): Do not expand with tzcnt for
    TARGET_AVOID_FALSE_DEP_FOR_BMI.
    (ffssi2_no_cmove): Ditto.
    (*tzcnt<mode>_1): Disable for TARGET_AVOID_FALSE_DEP_FOR_BMI.
    (ctz<mode>2): New expander.
    (*ctz<mode>2_falsedep_1): New insn_and_split pattern.
    (*ctz<mode>2_falsedep): New insn.
    (*ctz<mode>2): Rename from ctz<mode>2.
    (clz<mode>2_lzcnt): New expander.
    (*clz<mode>2_lzcnt_falsedep_1): New insn_and_split pattern.
    (*clz<mode>2_lzcnt_falsedep): New insn.
    (*clz<mode>2): Rename from clz<mode>2_lzcnt.
    (popcount<mode>2): New expander.
    (*popcount<mode>2_falsedep_1): New insn_and_split pattern.
    (*popcount<mode>2_falsedep): New insn.
    (*popcount<mode>2): Rename from popcount<mode>2.
    (*popcount<mode>2_cmp): Remove.
    (*popcountsi2_cmp_zext): Ditto.

The patch was bootstrapped and regression tested on
x86_64-pc-linux-gnu {,-m32} and will be committed to mainline SVN
in a couple of days. The patch will also be backported to the 4.9
branch.

Uros.

Attachment: p.diff.txt
Description: Text document

