This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [patch 1/2][ARM]: New CPU support for Marvell Whitney


Hi Xingxing,
Some comments inline on the tuning struct and pipeline description.

On 25/02/15 13:32, Xingxing Pan wrote:
> Hi,
>
> This patch merges pipeline description for marvell-whitney to latest code base.
> Is it OK for trunk?
>
> -- Regards, Xingxing
> whitney_p1_20150225.patch
A couple of questions:

+const struct tune_params arm_marvell_whitney_tune =
+{
+  arm_9e_rtx_costs,
+  &cortexa9_extra_costs,
+  cortex_a9_sched_adjust_cost,

Are you sure the cortexa9_extra_costs are best for this? Did you try other cost
tables? Do you think it would be good to write your own?
The pipeline description you posted seems to be very intricate.
If such complexity is needed to get the most out of the core, then perhaps
it could benefit from a custom cost table?

Also, are you sure cortex_a9_sched_adjust_cost is appropriate?
It's a very Cortex-A9 specific hook and we don't use it for any other cores.
Did it give meaningful improvements in performance?

I don't know much about this core (any public documentation you can point
us at, perhaps?) so I'll trust you that these are optimal...


+  1,                        /* Constant limit.  */
+  5,                        /* Max cond insns.  */
+  ARM_PREFETCH_BENEFICIAL(4,32,32),
+  false,                    /* Prefer constant pool.  */
+  arm_default_branch_cost,
+  false,                    /* Prefer LDRD/STRD.  */
+  {true, true},                    /* Prefer non short circuit.  */
+  &arm_default_vec_cost,                        /* Vectorizer costs.  */
+ false, /* Prefer Neon for 64-bits bitops. */ + false, false, /* Prefer 32-bit encodings. */
+  false,                    /* Prefer Neon for stringops.  */
+  8                        /* Maximum insns to inline memset.  */
+};
+


As Maxim said, you need to take into account the two new fields that were
recently added, the fusion and prefetcher modelling ones.

>
> diff --git a/gcc/config/arm/marvell-whitney.md b/gcc/config/arm/marvell-whitney.md
> new file mode 100644
> index 0000000..fffbd29
> --- /dev/null
> +++ b/gcc/config/arm/marvell-whitney.md
> @@ -0,0 +1,678 @@
> +;; Marvell ARM Processor Pipeline Description
> +;; Copyright (C) 2011-2014 Free Software Foundation, Inc.

Update the year to 2015 .

>
> +;;
> +;; Contributed by Marvell.
> +
> +;; This file is part of GCC.
> +;;
> +;; GCC is free software; you can redistribute it and/or modify it
> +;; under the terms of the GNU General Public License as published
> +;; by the Free Software Foundation; either version 3, or (at your
> +;; option) any later version.
> +;;
> +;; GCC is distributed in the hope that it will be useful, but WITHOUT
> +;; ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
> +;; or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
> +;; License for more details.
> +;;
> +;; You should have received a copy of the GNU General Public License
> +;; along with GCC; see the file COPYING3.  If not see
> +;; <http://www.gnu.org/licenses/>.
> +
> +(define_attr "whitney_type"
> + "wTP1,wTP2,wTP3,wTP4,wTP5,wTP6,wTP7,wTP8,wTP9,wTP10,wTP11,wTP12,wTP13,\
> + wTP14,wTP15,wTP16,wTP17,wTP18,wTP19,wTP20,wTP21,wTP22,wTP23,wTP24,wTP25,\ > + wTP26,wTP27,wTP28,wTP29,wTP30,wTP31,wTP32,wTP33,wTP34,wTP35,wTP36,wTP37,\ > + wTP38,wTP39,wTP40,wTP41,wTP42,wTP43,wTP44,wTP45,wTP46,wTP47,wTP48,wTP49,\
> + wTP50,wTP51,wTP52,wTP53,wTP54,wTP55,wTP56,wTP57,wTP58,wTP59,\

These types are very hard to remember and understand, and hence maintain.
I'm assuming the wTP* notation stands for "Whitney TyPe"?
Look at the other pipeline descriptions in config/arm/ and try to give them more
descriptive names, like: whitney_alu_simple, whitney_alu_shift_reg etc.
It will make it much easier to understand.

I've tried this patch out and added:
(automata_option "v")
(automata_option "time")
(automata_option "stats")
(automata_option "progress")

to the .md file to get some compiler build-time stats on the automata.
I see that the whitney_pipe automaton is the largest automaton in terms
of NDFA states, and second largest in minimal DFA states:

    44170 NDFA states,          188695 NDFA arcs
    44170 DFA states,           188695 DFA arcs
     7070 minimal DFA states,   66279 minimal DFA arcs
      923 all insns         30 insn equivalence classes

Could the complexity of it be reduced without sacrificing code quality?
In some cores we break up the automaton into two automata, roughly separated into an integer/memory unit and another one for FP/NEON units to reduce the state space. Look at cortex-a15.md and cortex-a15-neon.md or some other description of that kind.

>
> +  (cond [
> +          (eq_attr "type" "alu_imm,alu_sreg,alus_imm,alus_sreg,adc_reg,\
> + adc_imm,bfm,branch,call,clz,extend,logic_imm,\
> + logic_reg,logics_imm,logics_reg,logics_shift_reg,\
> + mov_imm,mov_reg,mvn_imm,mvn_reg,shift_imm,\
> +                           shift_reg,rev")
> +                          (const_string "wTP1")
> +          (eq_attr "type" "alu_shift_imm,alu_shift_reg")
> +                          (if_then_else (match_test
> +                          "marvell_whitney_inner_shift (insn)")
> +                                      (const_string "wTP2")
> +                                      (const_string "wTP1"))
> +          (eq_attr "type" "alus_shift_imm,alus_shift_reg,mov_shift_reg,\
> + mov_shift,mvn_shift_reg,mvn_shift,logic_shift_imm,\
> +                           logic_shift_reg,multiple")
> +                          (const_string "wTP2")
> + (eq_attr "type" "mla,mlas,mul,muls,smlal,smlalxy,smlaxy,smull,\
> +                           smulxy,umlal,umull")
> +                          (const_string "wTP3")
> +          (eq_attr "type" "sdiv,udiv")
> +                          (const_string "wTP4")
> +          (eq_attr "type" "load1,load2,f_loads,f_loadd,load_byte")
> +                          (const_string "wTP5")
> +          (eq_attr "type" "neon_load2_one_lane,neon_load2_one_lane_q,\
> + neon_load2_all_lanes,neon_load2_all_lanes_q,\
> +                           neon_load2_2reg_q")
> +                          (const_string "wTP7")
> +          (eq_attr "type" "neon_load3_one_lane,neon_load3_one_lane_q,\
> +                           neon_load3_3reg,neon_load3_3reg_q,\
> + neon_load3_all_lanes,neon_load3_all_lanes_q")
> +                          (const_string "wTP8")
> +          (eq_attr "type" "neon_load4_one_lane,neon_load4_one_lane_q,\
> +                           neon_load4_4reg,neon_load4_4reg_q,\
> + neon_load4_all_lanes,neon_load4_all_lanes_q")
> +                          (const_string "wTP9")
> + (eq_attr "type" "load3,load4,neon_load1_3reg,neon_load1_3reg_q,\
> + neon_load1_4reg,neon_load1_4reg_q,neon_load1_4reg")
> +                          (const_string "wTP10")
> +          (eq_attr "type" "load2,neon_load1_1reg,neon_load1_1reg_q,\
> + neon_load1_2reg,neon_load1_2reg_q,")
> +                          (const_string "wTP11")
> +          (eq_attr "type" "neon_load1_all_lanes,neon_load1_all_lanes_q,\
> + neon_load1_one_lane,neon_load1_one_lane_q,")
> +                           (const_string "wTP6")
> +          (eq_attr "type" "f_stored,f_stores,store1,store2")
> +                           (const_string "wTP12")
> + (eq_attr "type" "f_stored,neon_store1_3reg,neon_store1_3reg_q,\
> + neon_store1_4reg,neon_store1_4reg_q,\
> + neon_store2_4reg,neon_store2_4reg_q,\
> + neon_store3_3reg,neon_store3_3reg_q,\
> + neon_store3_one_lane,neon_store3_one_lane_q,\
> + neon_store4_4reg,neon_store4_4reg_q,\
> + neon_store4_one_lane,neon_store4_one_lane_q,\
> +                           store3,store4")
> +                           (const_string "wTP13")
> +          (eq_attr "type" "neon_store1_1reg,neon_store1_1reg_q,\
> + neon_store1_2reg,neon_store1_2reg_q,\
> + neon_store1_one_lane,neon_store1_one_lane_q,\
> + neon_store2_2reg,neon_store2_2reg_q,\
> + neon_store2_one_lane,neon_store2_one_lane_q")
> +                           (const_string "wTP14")
> + (eq_attr "type" "fmuld,fmuls,neon_fp_mul_s,neon_fp_mul_s_scalar")
> +                           (const_string "wTP15")
> +          (eq_attr "type" "neon_fp_mul_s_q,neon_fp_mul_s_scalar_q")
> +                           (const_string "wTP16")
> +          (eq_attr "type" "ffmad,ffmas,neon_fp_mla_s,neon_fp_round_s,\
> +                           neon_fp_round_d")
> +                           (const_string "wTP17")
> +          (eq_attr "type" "neon_fp_mla_s_q,neon_fp_round_s_q,\
> +                           neon_fp_round_d_q")
> +                           (const_string "wTP18")
> + (eq_attr "type" "fmacd,fmacs,neon_fp_mla_s,neon_fp_mla_s_scalar,\
> +                           neon_fp_recps_s,neon_fp_rsqrts_s")
> +                           (const_string "wTP19")
> +          (eq_attr "type" "neon_fp_mla_s_q,neon_fp_mla_s_scalar_q,\
> + neon_fp_recps_s_q,neon_fp_rsqrts_s_q")
> +                           (const_string "wTP20")
> +          (eq_attr "type" "fconsts,fconstd,ffariths,ffarithd,fmov,\
> +                           neon_fp_abs_s,neon_fp_neg_s")
> +                           (const_string "wTP21")
> +          (eq_attr "type" "neon_fp_abs_s_q,neon_fp_neg_s_q")
> +                           (const_string "wTP22")
> + (eq_attr "type" "fcmpd,fcmps,neon_fp_minmax_s,neon_fp_compare_s,\
> + neon_fp_reduc_minmax_s,neon_fp_reduc_minmax_d")
> +                           (const_string "wTP23")
> +          (eq_attr "type" "neon_fp_minmax_s_q,neon_fp_compare_s_q,\
> + neon_fp_reduc_minmax_s_q,neon_fp_reduc_minmax_d_q")
> +                           (const_string "wTP24")
> +          (eq_attr "type" "f_cvt,f_cvtf2i,f_cvti2f,neon_fp_to_int_s,\
> + neon_fp_to_int_d,neon_int_to_fp_s,neon_int_to_fp_d")
> +                           (const_string "wTP25")
> +          (eq_attr "type" "neon_fp_to_int_s_q,neon_fp_to_int_d_q,\
> + neon_int_to_fp_s_q,neon_int_to_fp_d_q")
> +                           (const_string "wTP26")
> +          (eq_attr "type" "faddd,fadds,neon_fp_abd_s,neon_fp_addsub_s,\
> +                           neon_fp_reduc_add_s")
> +                           (const_string "wTP27")
> +          (eq_attr "type" "neon_fp_abd_s_q,neon_fp_addsub_s_q,\
> +                           neon_fp_reduc_add_s_q")
> +                           (const_string "wTP28")
> +          (eq_attr "type" "fdivs")
> +                           (const_string "wTP29")
> +          (eq_attr "type" "fsqrts")
> +                           (const_string "wTP30")
> +          (eq_attr "type" "fdivd")
> +                           (const_string "wTP31")
> +          (eq_attr "type" "fsqrtd")
> +                           (const_string "wTP32")
> +          (eq_attr "type" "neon_reduc_add_acc")
> +                          (if_then_else (match_test
> +                          "marvell_whitney_vector_mode_qi (insn)")
> +                                        (const_string "wTP33")
> +                                        (const_string "wTP35"))
> +          (eq_attr "type" "neon_reduc_add_acc_q")
> +                          (if_then_else (match_test
> +                          "marvell_whitney_vector_mode_qi (insn)")
> +                                        (const_string "wTP34")
> +                                        (const_string "wTP36"))
> + (eq_attr "type" "neon_add,neon_qadd,neon_qsub,neon_reduc_add_long,\
> + neon_reduc_add,neon_sub,neon_sub_halve")
> +                          (if_then_else (match_test
> +                          "marvell_whitney_vector_mode_qi (insn)")
> +                                        (const_string "wTP37")
> +                                        (const_string "wTP39"))
> +          (eq_attr "type" "neon_add_q,neon_qadd_q,neon_qsub_q,\
> + neon_reduc_add_q,neon_sub_q,neon_sub_halve_q")
> +                          (if_then_else (match_test
> +                          "marvell_whitney_vector_mode_qi (insn)")
> +                                        (const_string "wTP38")
> +                                        (const_string "wTP40"))
> +          (eq_attr "type" "neon_add_halve,neon_add_long,neon_add_widen,\
> + neon_compare,neon_compare_zero,neon_sub_long,\
> +                           neon_sub_widen,neon_tst")
> +                                        (const_string "wTP37")
> +          (eq_attr "type" "neon_add_halve_q,neon_compare_q,\
> +                           neon_compare_zero_q,neon_tst_q")
> +                           (const_string "wTP38")
> + (eq_attr "type" "neon_add_halve_narrow_q,neon_sub_halve_narrow_q")
> +                           (const_string "wTP40")
> +          (eq_attr "type" "neon_permute,neon_tbl1,neon_tbl2,neon_tbl3,\
> +                           neon_tbl4,neon_zip")
> +                           (const_string "wTP41")
> +          (eq_attr "type" "neon_permute_q,neon_zip_q")
> +                           (const_string "wTP42")
> +          (eq_attr "type" "neon_bsl,neon_logic")
> +                           (const_string "wTP43")
> +          (eq_attr "type" "neon_arith_acc,neon_shift_acc")
> +                          (if_then_else (match_test
> +                          "marvell_whitney_vector_mode_qi (insn)")
> +                                        (const_string "wTP43")
> +                                        (const_string "wTP45"))
> +          (eq_attr "type" "neon_arith_acc_q,neon_shift_acc_q")
> +                          (if_then_else (match_test
> +                          "marvell_whitney_vector_mode_qi (insn)")
> +                                        (const_string "wTP44")
> +                                        (const_string "wTP46"))
> +          (eq_attr "type" "neon_logic_q")
> +                          (const_string "wTP44")
> +          (eq_attr "type" "neon_cls,neon_cnt,neon_dup,neon_ext,\
> + neon_fp_recpe_s,neon_fp_rsqrte_s,neon_minmax,\
> + neon_move,neon_reduc_minmax,neon_rev")
> +                          (const_string "wTP47")
> +          (eq_attr "type" "neon_cls_q,neon_cnt_q,neon_dup_q,neon_ext_q,\
> + neon_fp_recpe_s_q,neon_fp_rsqrte_s_q,neon_minmax_q,\
> + neon_move_q,neon_move_narrow_q,neon_reduc_minmax_q,\
> + neon_rev_q,neon_sat_shift_imm_narrow_q,\
> +                           neon_shift_imm_narrow_q")
> +                           (const_string "wTP48")
> +          (eq_attr "type" "neon_abd_long,neon_abs,neon_neg,neon_qabs,\
> + neon_qneg,neon_sat_shift_imm,neon_shift_imm_long,\
> +                           neon_shift_imm,neon_shift_reg")
> +                          (if_then_else (match_test
> +                          "marvell_whitney_vector_mode_qi (insn)")
> +                                        (const_string "wTP47")
> +                                        (const_string "wTP49"))
> + (eq_attr "type" "neon_abs_q,neon_neg_q,neon_qabs_q,neon_qneg_q,\
> + neon_sat_shift_imm_q,neon_shift_imm_q,\
> +                           neon_shift_reg_q")
> +                          (if_then_else (match_test
> +                          "marvell_whitney_vector_mode_qi (insn)")
> +                                        (const_string "wTP48")
> +                                        (const_string "wTP50"))
> + (eq_attr "type" "neon_mul_b_long,neon_mul_h_long,neon_mul_s_long,\
> + neon_sat_mul_b,neon_sat_mul_h,neon_sat_mul_s")
> +                           (const_string "wTP51")
> + (eq_attr "type" "neon_sat_mul_b_q,neon_sat_mul_h_q,neon_sat_mul_s_q,\
> + neon_mul_h_scalar_long,neon_mul_s_scalar_long,\
> + neon_sat_mul_b_long,neon_sat_mul_h_long,\
> + neon_sat_mul_s_long,neon_sat_mul_h_scalar_long,\
> + neon_sat_mul_s_scalar_long,neon_sat_mul_h_scalar_q,\
> +                           neon_sat_mul_s_scalar_q")
> +                           (const_string "wTP52")
> + (eq_attr "type" "neon_mla_b,neon_mla_h,neon_mla_s,neon_mla_h_scalar,\
> +                           neon_mla_s_scalar")
> +                           (const_string "wTP53")
> +          (eq_attr "type" "neon_mla_b_q,neon_mla_h_q,neon_mla_s_q,\
> + neon_mla_h_scalar_q,neon_mla_s_scalar_q,\
> + neon_mla_b_long,neon_mla_h_long,neon_mla_s_long,\
> + neon_mla_h_scalar_long,neon_mla_s_scalar_long,\
> + neon_sat_mla_b_long,neon_sat_mla_h_long,\
> + neon_sat_mla_s_long,neon_sat_mla_h_scalar_long,\
> +                           neon_sat_mla_s_scalar_long")
> +                           (const_string "wTP54")
> +          (eq_attr "type" "f_flag,f_mrc,f_mrrc")
> +                           (const_string "wTP57")
> +          (eq_attr "type" "neon_to_gp,neon_to_gp_q")
> +                           (const_string "wTP55")
> +          (eq_attr "type" "f_mcr,f_mcrr")
> +                           (const_string "wTP58")
> +          (eq_attr "type" "neon_from_gp_q")
> +                           (const_string "wTP59")
> +          (eq_attr "type" "neon_from_gp")
> +                           (const_string "wTP56")]
> +          (const_string "unknown")))
> +
<snip>
>
> +
> +(define_bypass 8 "wIR15" "wIR29, wIR31, wIR30, wIR32")
> +(define_bypass 8 "wIR16" "wIR29, wIR31, wIR30, wIR32")
> +(define_bypass 8 "wIR17" "wIR29, wIR31, wIR30, wIR32")
> +(define_bypass 8 "wIR18" "wIR29, wIR31, wIR30, wIR32")
> +(define_bypass 12 "wIR19" "wIR29, wIR31, wIR30, wIR32")
> +(define_bypass 12 "wIR20" "wIR29, wIR31, wIR30, wIR32")
> +(define_bypass 4 "wIR21" "wIR29, wIR31, wIR30, wIR32")
> +(define_bypass 5 "wIR22" "wIR41")
> +(define_bypass 6 "wIR22" "wIR42")
> +(define_bypass 6 "wIR23" "wIR41, wIR42")
> +(define_bypass 6 "wIR24" "wIR41")
> +(define_bypass 7 "wIR24" "wIR42")
> +(define_bypass 6 "wIR25" "wIR30, wIR32, wIR29, wIR31")
> +(define_bypass 7 "wIR26" "wIR41")
> +(define_bypass 8 "wIR26" "wIR42")
> +(define_bypass 6 "wIR27" "wIR41, wIR42")
> +(define_bypass 6 "wIR28" "wIR41, wIR42")
> +(define_bypass 18 "wIR29" "wIR29, wIR30, wIR31, wIR32")
> +(define_bypass 18 "wIR30" "wIR29, wIR30, wIR31, wIR32")
> +(define_bypass 28 "wIR31" "wIR29, wIR30, wIR31, wIR32")
> +(define_bypass 28 "wIR32" "wIR29, wIR30, wIR31, wIR32")
> +(define_bypass 6 "wIR33" "wIR41, wIR42")
> +(define_bypass 6 "wIR34" "wIR41")
> +(define_bypass 7 "wIR34" "wIR42")
> +(define_bypass 6 "wIR35" "wIR41, wIR42")
> +(define_bypass 6 "wIR36" "wIR41")
> +(define_bypass 7 "wIR36" "wIR42")
> +(define_bypass 6 "wIR37" "wIR41, wIR42")
> +(define_bypass 6 "wIR38" "wIR41")
> +(define_bypass 7 "wIR38" "wIR42")
> +(define_bypass 6 "wIR39" "wIR41, wIR42")
> +(define_bypass 6 "wIR40" "wIR41")
> +(define_bypass 7 "wIR40" "wIR42")
> +(define_bypass 6 "wIR41" "wIR41, wIR42")
> +(define_bypass 6 "wIR42" "wIR41")
> +(define_bypass 7 "wIR42" "wIR42")
> +(define_bypass 6 "wIR43" "wIR41, wIR42")
> +(define_bypass 6 "wIR44" "wIR41")
> +(define_bypass 7 "wIR44" "wIR42")
> +(define_bypass 6 "wIR45" "wIR41, wIR42")
> +(define_bypass 6 "wIR46" "wIR41")
> +(define_bypass 7 "wIR46" "wIR42")
> +(define_bypass 6 "wIR47" "wIR41, wIR42")
> +(define_bypass 6 "wIR48" "wIR41")
> +(define_bypass 7 "wIR48" "wIR42")
> +(define_bypass 6 "wIR49" "wIR41, wIR42")
> +(define_bypass 6 "wIR50" "wIR41")
> +(define_bypass 7 "wIR50" "wIR42")
> +(define_bypass 8 "wIR51" "wIR41, wIR42")
> +(define_bypass 8 "wIR52" "wIR41")
> +(define_bypass 9 "wIR52" "wIR42")
> +(define_bypass 8 "wIR53" "wIR41, wIR42")
> +(define_bypass 8 "wIR54" "wIR41")
> +(define_bypass 9 "wIR54" "wIR42")
> +(define_bypass 2 "wIR56" "wIR41, wIR42")
> +(define_bypass 4 "wIR58" "wIR30, wIR32, wIR29, wIR31")
> +(define_bypass 5 "wIR58" "wIR41, wIR42")
> +(define_bypass 5 "wIR59" "wIR41")
> +(define_bypass 6 "wIR59" "wIR42")
These are very hard to understand, which brings us back to my comment above.
The instruction reservations should have more descriptive names.

Cheers,
Kyrill


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]