[patch] tuning gcc for AMDFAM10 processor (patch 2)

Jagasia, Harsha harsha.jagasia@amd.com
Tue Jan 30 16:54:00 GMT 2007


Hi Uros,
Thanks for your suggestions; here is the corrected ChangeLog. It should
also address the issues Roger mentioned.

I used m_ATHLON_K8_AMDFAM10 for brevity, since those CPUs share most of
their tuning and diverge in only a few places. I think it makes the
surrounding code easier to maintain and makes it easier to see which
tuning parameters these CPUs have in common.
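
To illustrate the idea, here is a minimal sketch of how a combined mask
keeps a shared tuning flag on one short line. The enum subset, its
ordering, the two example_* flags and the tuning_enabled_p helper are
made up for this illustration and are not taken from the patch; only the
m_* macro style and the m_ATHLON_K8_AMDFAM10 definition mirror i386.c.

```c
#include <stdbool.h>

/* Illustrative subset of processor_type; the real enum in
   config/i386/i386.h has more entries and a different order.  */
enum processor_type
{
  PROCESSOR_ATHLON,
  PROCESSOR_K8,
  PROCESSOR_AMDFAM10,
  PROCESSOR_GENERIC64
};

/* One bit per processor, in the style of the existing m_* macros.  */
#define m_ATHLON     (1 << PROCESSOR_ATHLON)
#define m_K8         (1 << PROCESSOR_K8)
#define m_AMDFAM10   (1 << PROCESSOR_AMDFAM10)
#define m_GENERIC64  (1 << PROCESSOR_GENERIC64)

/* Combined mask for the CPUs that mostly share tuning.  */
#define m_ATHLON_K8_AMDFAM10  (m_K8 | m_ATHLON | m_AMDFAM10)

/* Hypothetical tuning flags; names and values are invented here.  */
static const int example_shared_tuning = m_ATHLON_K8_AMDFAM10 | m_GENERIC64;
static const int example_divergent_tuning = m_ATHLON | m_K8;

/* Hypothetical helper: is FLAG enabled when tuning for TUNE?  */
static bool
tuning_enabled_p (int flag, enum processor_type tune)
{
  return (flag & (1 << tune)) != 0;
}
```

With these definitions, a flag that uses the combined mask is enabled
for amdfam10, while a flag that lists only m_ATHLON | m_K8 makes the
divergence visible at a glance.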

-----
2007-01-30  Harsha Jagasia  <harsha.jagasia@amd.com>

        * config/i386/i386.h (TARGET_AMDFAM10): New macro.
        (TARGET_CPU_CPP_BUILTINS): Add code for amdfam10.
        Define TARGET_CPU_DEFAULT_amdfam10.
        (TARGET_CPU_DEFAULT_NAMES): Add amdfam10.
        (processor_type): Add PROCESSOR_AMDFAM10.

        * config/i386/i386.md: Add amdfam10 as a new cpu
        attribute to match processor_type in config/i386/i386.h.
        Enable imul peepholes for TARGET_AMDFAM10.

        * config.gcc: Add support for --with-cpu option for
        amdfam10.

        * config/i386/i386.c (amdfam10_cost): New variable.
        (m_AMDFAM10): New macro.
        (m_ATHLON_K8_AMDFAM10): New macro.
        (x86_use_leave, x86_push_memory, x86_movx, x86_unroll_strlen,
        x86_cmove, x86_3dnow_a, x86_deep_branch, x86_use_simode_fiop,
        x86_promote_QImode, x86_integer_DFmode_moves,
        x86_partial_reg_dependency, x86_memory_mismatch_stall,
        x86_accumulate_outgoing_args, x86_arch_always_fancy_math_387,
        x86_sse_partial_reg_dependency, x86_sse_typeless_stores,
        x86_use_ffreep, x86_use_incdec, x86_four_jump_limit,
        x86_schedule, x86_use_bt, x86_cmpxchg16b, x86_pad_returns):
        Enabled/disabled for amdfam10.
        (override_options): Add amdfam10_cost to processor_target_table.
        Set up PROCESSOR_AMDFAM10 for amdfam10 entry in
        processor_alias_table.
        (ix86_issue_rate): Add PROCESSOR_AMDFAM10.
        (ix86_adjust_cost): Add code for amdfam10.

>
>> This is the 2nd of 11 patches to tune gcc for AMD's AMDFAM10 processor
>> (based on mainline rev 121295). This patch defines mtune=amdfam10 and
>> enables/disables some existing tuning choices for amdfam10 such as
>> aligning loop tops to 32 bytes and using push/pops instead of moves
>> for prologue/epilogue.
>
>>  #define m_GENERIC64 (1<<PROCESSOR_GENERIC64)
>>  #define m_GENERIC (m_GENERIC32 | m_GENERIC64)
>> +#define m_ATHLON_K8_AMDFAM10  (m_K8 | m_ATHLON | m_AMDFAM10)
>
>This part isn't described in ChangeLog.
>
>>         (x86_use_leave, x86_push_memory, int x86_movx,
>>         x86_unroll_strlen,
>>         x86_cmove, x86_fisttp, x86_3dnow_a, x86_deep_branch,
>>         x86_use_simode_fiop, x86_promote_QImode,
>
>x86_fisttp is gone.
>
>BTW: Is there really a reason to use combined defines, such as
>m_ATHLON_K8_AMDFAM10? These are used only in i386.c, in the lines
>below, and IMO don't bring us anything.
>
>Uros.
>

Thanks,
Harsha



