This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [patch 2/2,avr] PR49847: Add hook to place read-only lookup-tables in named address-space


On 27.07.2017 14:29, Georg-Johann Lay wrote:
For some targets, the best place to put read-only lookup tables as
generated by -ftree-switch-conversion is not the generic address space
but some target specific address space.

This is the case for AVR, where for most devices .rodata must be
located in RAM.

Part #1 adds a new, optional target hook that queries the back-end
about the desired address space.  There is currently only one user:
tree-switch-conversion.c::build_one_array() which re-builds value_type
and array_type if the address space returned by the backend is not
the generic one.

Part #2 is the AVR part that implements the new hook and adds some
sugar around it.

This is the AVR part.

It implements the new hook which returns a convenient flash address
space for devices where .rodata is located in RAM:  The 16-bit __flash
for devices with <= 64 KiB flash and 24-bit __memx for > 64 KiB flash.

It adds a new option -madd-space-for-lookup= which allows to pick a
specific address space.

Some new insns and combine-split suport best code generation by the
knowledge that the 24-bit addresses will never point to RAM so that
the expensive decision-at-runtime whether LD or LPM has to be used
can be avoided.

Passed without new regressions on atmega128.

Ok for trunk provided the gcc part 1/2 is approved?

Johann


	Implement TARGET_ADDR_SPACE_FOR_ARTIFICIAL_RODATA.

	PR target/49857
	* config/avr/avr-opts.h: New file.
	* config/avr/avr.opt: Include it.
	(-maddr-space-for-lookup=): New option and...
	(avr_opt_addr_space_for_lookup): ...associated Var.
	(avr_aspace_for_lookup): New option enums used by above.
	* config/avr/avr-protos.h (avr_out_load_flashx): New proto.
	* config/avr/avr.c (avr_out_load_flashx): New function.
	* avr_adjust_insn_length [ADJUST_LEN_LOAD_FLASHX]: Handle it.
	* avr_rtx_costs_1 [ZERO_EXTEND, SIGN_EXTEND]: Handle
	shift-and-extend-by-1 of HI -> PSI.
	[ASHIFT,PSImode]: Describe cost of extend-and-shift-by-1.
	(TARGET_ADDR_SPACE_FOR_ARTIFICIAL_RODATA): Define to...
	(avr_addr_space_for_artificial_rodata): ...this new static function.
	* config/avr/avr.md (unspec): Add UNSPEC_LOAD_FLASHX.
	(adjust_len): Add load_flashx.
	(*ashiftpsi.1_sign_extend.hi, *ashiftpsi.1_zero_extend.hi)
	(*extendpsi.ashift.1.uqi, *load<mode>-flashx): New insns.
	(*split_xload<mode>-cswtch): New insn-and-split.
	* doc/invoke.texi (AVR Options) <-maddr-space-for-lookup=>: Document.

Index: config/avr/avr-opts.h
===================================================================
--- config/avr/avr-opts.h	(nonexistent)
+++ config/avr/avr-opts.h	(working copy)
@@ -0,0 +1,40 @@
+/* Definitions for option handling for AVR.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+#ifndef AVR_OPTS_H
+#define AVR_OPTS_H
+
+enum avr_opt_addr_space
+  {
+    AVR_OPT_ADDR_SPACE_flash,
+    AVR_OPT_ADDR_SPACE_flash1,
+    AVR_OPT_ADDR_SPACE_flash2,
+    AVR_OPT_ADDR_SPACE_flash3,
+    AVR_OPT_ADDR_SPACE_flash4,
+    AVR_OPT_ADDR_SPACE_flash5,
+    AVR_OPT_ADDR_SPACE_memx,
+    AVR_OPT_ADDR_SPACE_generic
+  };
+
+#endif /* AVR_OPTS_H */
Index: config/avr/avr-protos.h
===================================================================
--- config/avr/avr-protos.h	(revision 250302)
+++ config/avr/avr-protos.h	(working copy)
@@ -94,6 +94,7 @@ extern const char* avr_out_plus (rtx, rt
 extern const char* avr_out_round (rtx_insn *, rtx*, int* =NULL);
 extern const char* avr_out_addto_sp (rtx*, int*);
 extern const char* avr_out_xload (rtx_insn *, rtx*, int*);
+extern const char* avr_out_load_flashx (rtx_insn*, rtx*, int*);
 extern const char* avr_out_movmem (rtx_insn *, rtx*, int*);
 extern const char* avr_out_insert_bits (rtx*, int*);
 extern bool avr_popcount_each_byte (rtx, int, int);
Index: config/avr/avr.c
===================================================================
--- config/avr/avr.c	(revision 250302)
+++ config/avr/avr.c	(working copy)
@@ -3914,6 +3914,31 @@ avr_out_xload (rtx_insn *insn ATTRIBUTE_
 }
 
 
+/* Load register XOP[0] with flash content from PSI address XOP[1].
+   We have ELPMX, MOVW, load up to 4 bytes and may clobber Z.  */
+
+const char*
+avr_out_load_flashx (rtx_insn*, rtx *xop, int *plen)
+{
+  unsigned n_bytes = GET_MODE_SIZE (GET_MODE (xop[0]));
+  gcc_assert (n_bytes <= 4);
+
+  avr_asm_len ("out __RAMPZ__,%C1", xop, plen, -1);
+  avr_asm_len ("movw ZL,%1", xop, plen, 1);
+
+  avr_asm_len ("elpm %A0,Z+", xop, plen, 1);
+  if (n_bytes >= 2)  avr_asm_len ("elpm %B0,Z+", xop, plen, 1);
+  if (n_bytes >= 3)  avr_asm_len ("elpm %C0,Z+", xop, plen, 1);
+  if (n_bytes >= 4)  avr_asm_len ("elpm %D0,Z+", xop, plen, 1);
+
+  // EBI devices: reset RAMPZ to 0 so that we don't read garbage from RAM.
+  if (AVR_HAVE_RAMPD)
+    avr_asm_len ("out __RAMPZ__,__zero_reg__", xop, plen, 1);
+  
+  return "";
+}
+
+
 const char*
 output_movqi (rtx_insn *insn, rtx operands[], int *plen)
 {
@@ -9438,6 +9463,7 @@ avr_adjust_insn_length (rtx_insn *insn,
     case ADJUST_LEN_MOV32: output_movsisf (insn, op, &len); break;
     case ADJUST_LEN_MOVMEM: avr_out_movmem (insn, op, &len); break;
     case ADJUST_LEN_XLOAD: avr_out_xload (insn, op, &len); break;
+    case ADJUST_LEN_LOAD_FLASHX: avr_out_load_flashx (insn, op, &len); break;
     case ADJUST_LEN_SEXT: avr_out_sign_extend (insn, op, &len); break;
 
     case ADJUST_LEN_SFRACT: avr_out_fract (insn, op, true, &len); break;
@@ -10837,6 +10863,12 @@ avr_rtx_costs_1 (rtx x, machine_mode mod
       return true;
 
     case ZERO_EXTEND:
+      if (PSImode == mode
+          && ASHIFT == GET_CODE (XEXP (x, 0)))
+        {
+          *total = COSTS_N_INSNS (GET_MODE_SIZE (mode));
+          return true;
+        }
       *total = COSTS_N_INSNS (GET_MODE_SIZE (mode)
 			      - GET_MODE_SIZE (GET_MODE (XEXP (x, 0))));
       *total += avr_operand_rtx_cost (XEXP (x, 0), GET_MODE (XEXP (x, 0)),
@@ -10844,6 +10876,12 @@ avr_rtx_costs_1 (rtx x, machine_mode mod
       return true;
 
     case SIGN_EXTEND:
+      if (PSImode == mode
+          && ASHIFT == GET_CODE (XEXP (x, 0)))
+        {
+          *total = COSTS_N_INSNS (GET_MODE_SIZE (mode));
+          return true;
+        }
       *total = COSTS_N_INSNS (GET_MODE_SIZE (mode) + 2
 			      - GET_MODE_SIZE (GET_MODE (XEXP (x, 0))));
       *total += avr_operand_rtx_cost (XEXP (x, 0), GET_MODE (XEXP (x, 0)),
@@ -11227,6 +11265,18 @@ avr_rtx_costs_1 (rtx x, machine_mode mod
 	  break;
 
         case PSImode:
+          if (SIGN_EXTEND == GET_CODE (XEXP (x, 0))
+              && const1_rtx == XEXP (x, 1))
+            {
+              *total = COSTS_N_INSNS (3);
+              return true;
+            }
+          if (ZERO_EXTEND == GET_CODE (XEXP (x, 0))
+              && const1_rtx == XEXP (x, 1))
+            {
+              *total = COSTS_N_INSNS (4);
+              return true;
+            }
           if (!CONST_INT_P (XEXP (x, 1)))
             {
               *total = COSTS_N_INSNS (!speed ? 6 : 73);
@@ -13341,6 +13391,66 @@ avr_emit3_fix_outputs (rtx (*gen)(rtx,rt
 }
 
 
+/* Implement `TARGET_ADDR_SPACE_FOR_ARTIFICIAL_RODATA'.  */
+
+static addr_space_t
+avr_addr_space_for_artificial_rodata (tree node)
+{
+  if (TREE_CODE (node) == ARRAY_TYPE
+      && INTEGRAL_TYPE_P (TREE_TYPE (node))
+      && int_cst_value (TYPE_SIZE_UNIT (TREE_TYPE (node))) > 4)
+    {
+      // Due to reduced addressing mode and DImode being lowered to QI,
+      // avoid a total code bloat by leaving DImode in .rodata.
+      return (addr_space_t) ADDR_SPACE_GENERIC;
+    }
+
+  // Map option to address space proper... due to cumbersome include
+  // order we cannot use address spaces as option enum *sigh*.
+
+  int as = ADDR_SPACE_GENERIC;
+  switch (avr_opt_addr_space_for_lookup)
+    {
+    case AVR_OPT_ADDR_SPACE_memx:   as = ADDR_SPACE_MEMX; break;
+    case AVR_OPT_ADDR_SPACE_flash:  as = ADDR_SPACE_FLASH; break;
+    case AVR_OPT_ADDR_SPACE_flash1: as = ADDR_SPACE_FLASH1; break;
+    case AVR_OPT_ADDR_SPACE_flash2: as = ADDR_SPACE_FLASH2; break;
+    case AVR_OPT_ADDR_SPACE_flash3: as = ADDR_SPACE_FLASH3; break;
+    case AVR_OPT_ADDR_SPACE_flash4: as = ADDR_SPACE_FLASH4; break;
+    case AVR_OPT_ADDR_SPACE_flash5: as = ADDR_SPACE_FLASH5; break;
+    default:
+      as = ADDR_SPACE_GENERIC;
+      break;
+    }
+
+  if (as == ADDR_SPACE_GENERIC
+#if defined HAVE_LD_AVR_AVRXMEGA3_RODATA_IN_FLASH
+      // If the device has a linear address space and configure found out
+      // that .rodata resides in flash, use .rodata without extra penalty.
+      || avr_arch->flash_pm_offset != 0
+#endif
+      // Reduced Tiny has no address spaces, .rodata is just fine.
+      || AVR_TINY)
+    {
+      return (addr_space_t) ADDR_SPACE_GENERIC;
+    }
+
+  if (avr_n_flash > 1
+      && (AVR_HAVE_ELPM || AVR_HAVE_ELPMX))
+    {
+      if (as == ADDR_SPACE_MEMX
+          || (as == ADDR_SPACE_FLASH1 && avr_n_flash > 1)
+          || (as == ADDR_SPACE_FLASH2 && avr_n_flash > 2)
+          || (as == ADDR_SPACE_FLASH3 && avr_n_flash > 3)
+          || (as == ADDR_SPACE_FLASH4 && avr_n_flash > 4)
+          || (as == ADDR_SPACE_FLASH5 && avr_n_flash > 5))
+        return (addr_space_t) as;
+    }
+
+  return (addr_space_t) ADDR_SPACE_FLASH;
+}
+
+
 /* Worker function for movmemhi expander.
    XOP[0]  Destination as MEM:BLK
    XOP[1]  Source      "     "
@@ -14762,6 +14872,10 @@ avr_fold_builtin (tree fndecl, int n_arg
 #undef  TARGET_ADDR_SPACE_DIAGNOSE_USAGE
 #define TARGET_ADDR_SPACE_DIAGNOSE_USAGE avr_addr_space_diagnose_usage
 
+#undef  TARGET_ADDR_SPACE_FOR_ARTIFICIAL_RODATA
+#define TARGET_ADDR_SPACE_FOR_ARTIFICIAL_RODATA  \
+  avr_addr_space_for_artificial_rodata
+
 #undef  TARGET_MODE_DEPENDENT_ADDRESS_P
 #define TARGET_MODE_DEPENDENT_ADDRESS_P avr_mode_dependent_address_p
 
Index: config/avr/avr.md
===================================================================
--- config/avr/avr.md	(revision 250302)
+++ config/avr/avr.md	(working copy)
@@ -78,6 +78,7 @@ (define_c_enum "unspec"
    UNSPEC_COPYSIGN
    UNSPEC_IDENTITY
    UNSPEC_INSERT_BITS
+   UNSPEC_LOAD_FLASHX
    UNSPEC_ROUND
    ])
 
@@ -158,7 +159,7 @@ (define_attr "adjust_len"
    tsthi, tstpsi, tstsi, compare, compare64, call,
    mov8, mov16, mov24, mov32, reload_in16, reload_in24, reload_in32,
    ufract, sfract, round,
-   xload, movmem,
+   xload, load_flashx, movmem,
    ashlqi, ashrqi, lshrqi,
    ashlhi, ashrhi, lshrhi,
    ashlsi, ashrsi, lshrsi,
@@ -1405,6 +1406,99 @@ (define_insn "*addsi3_zero_extend.hi"
   [(set_attr "length" "4")
    (set_attr "cc" "set_n")])
 
+(define_insn "*ashiftpsi.1_sign_extend.hi"
+  [(set (match_operand:PSI 0 "register_operand"                            "=r")
+        (ashift:PSI (sign_extend:PSI (match_operand:HI 1 "register_operand" "0"))
+                    (const_int 1)))]
+  ""
+  "lsl %A0\;rol %B0\;sbc %C0,%C0"
+  [(set_attr "length" "3")
+   (set_attr "cc" "clobber")])
+
+(define_insn "*ashiftpsi.1_zero_extend.hi"
+  [(set (match_operand:PSI 0 "register_operand"                            "=r")
+        (ashift:PSI (zero_extend:PSI (match_operand:HI 1 "register_operand" "0"))
+                    (const_int 1)))]
+  ""
+  "lsl %A0\;rol %B0\;clr %C0\;rol %C0"
+  [(set_attr "length" "4")
+   (set_attr "cc" "clobber")])
+
+(define_split ; non-matching
+  [(set (match_operand:PSI 0 "register_operand")
+        (plus:PSI (sign_extend:PSI (ashift:HI (match_operand:HI 1 "register_operand")
+                                              (const_int 1)))
+                  (match_operand:PSI 2 "symbol_ref_operand")))]
+  "!reload_completed"
+  [;; "*ashiftpsi.1_sign_extend.hi"
+   (set (match_dup 0)
+        (ashift:PSI (sign_extend:PSI (match_dup 1))
+                    (const_int 1)))
+   ;; "addpsi3"
+   (set (match_dup 0)
+        (plus:PSI (match_dup 0)
+                  (match_dup 2)))]
+  {
+     if (strncmp (XSTR (operands[2], 0), "CSWTCH.", strlen ("CSWTCH.")))
+       FAIL;
+     // Ok, this is from tree-switch-conversion.  If, in the original
+     // pattern, bit 15 or 14 was set, then the code cooked up by
+     // tree-swich-conversion won't work anyways and we can just as well
+     // swap the shift with the extension.
+  })
+        
+
+(define_insn "*extendpsi.ashift.1.uqi"
+  [(set (match_operand:PSI 0 "register_operand"                                          "=r")
+        (any_extend:PSI (ashift:HI (zero_extend:HI (match_operand:QI 1 "register_operand" "0"))
+                                   (const_int 1))))]
+  ""
+  "lsl %A0\;clr %B0\;rol %B0\;clr %C0"
+  [(set_attr "length" "4")
+   (set_attr "cc" "clobber")])
+
+
+;; We abuse the addition to get grep for a split that transforms an
+;; expensive memx library call to an inline sequence of ELPM Z+.
+;; This is legitimate for lookups created by tree-switch-conversion.
+(define_insn_and_split "*split_xload<mode>-cswtch"
+  [(set (match_operand:MOVMODE 0 "register_operand"                      "=r")
+        (mem:MOVMODE (plus:PSI (match_operand:PSI 1 "register_operand"    "r")
+                               (match_operand:PSI 2 "symbol_ref_operand"  "i"))))
+   (clobber (reg:MOVMODE 22))
+   (clobber (reg:QI 21))
+   (clobber (reg:HI REG_Z))]
+  "!reload_completed
+   && AVR_HAVE_ELPMX
+   && SYMBOL_REF_P (operands[2])
+   && 0 == strncmp (XSTR (operands[2], 0), \"CSWTCH.\", strlen (\"CSWTCH.\"))"
+  { gcc_unreachable(); }
+  "&& 1"
+  [;; "*load<mode>-flashx"
+   (parallel [(set (match_dup 0)
+                   (unspec:MOVMODE [(match_dup 3)
+                                    ] UNSPEC_LOAD_FLASHX))
+              (clobber (reg:HI REG_Z))])]
+  {
+    rtx set = single_set (curr_insn);  
+    rtx mem = SET_SRC (set);
+    // "addpsi3"
+    operands[3] = force_reg (PSImode, XEXP (mem, 0));
+  })
+
+
+(define_insn "*load<mode>-flashx"
+  [(set (match_operand:MOVMODE 0 "register_operand"             "=r")
+        (unspec:MOVMODE [(match_operand:PSI 1 "register_operand" "r")
+                         ] UNSPEC_LOAD_FLASHX))
+   (clobber (reg:HI REG_Z))]
+  "AVR_HAVE_ELPMX"
+  {
+    return avr_out_load_flashx (insn, operands, NULL);
+  }
+  [(set_attr "adjust_len" "load_flashx")])
+
+
 (define_insn "addpsi3"
   [(set (match_operand:PSI 0 "register_operand"         "=??r,d ,d,r")
         (plus:PSI (match_operand:PSI 1 "register_operand" "%0,0 ,0,0")
Index: config/avr/avr.opt
===================================================================
--- config/avr/avr.opt	(revision 250302)
+++ config/avr/avr.opt	(working copy)
@@ -18,6 +18,9 @@
 ; along with GCC; see the file COPYING3.  If not see
 ; <http://www.gnu.org/licenses/>.
 
+HeaderInclude
+config/avr/avr-opts.h
+
 mcall-prologues
 Target Report Mask(CALL_PROLOGUES)
 Use subroutines for function prologues and epilogues.
@@ -82,6 +85,38 @@ mpmem-wrap-around
 Target Report
 Make the linker relaxation machine assume that a program counter wrap-around occurs.
 
+maddr-space-for-lookup=
+Target RejectNegative Joined Var(avr_opt_addr_space_for_lookup) Enum(avr_aspace_for_lookup) Init(AVR_OPT_ADDR_SPACE_memx)
+Select the address space that might be used for accessing artificial lookup tables.
+
+Enum
+Name(avr_aspace_for_lookup) Type(enum avr_opt_addr_space)
+Known address space choices (for use with the -maddr-space-for-lookup= option):
+
+EnumValue
+Enum(avr_aspace_for_lookup) String(flash) Value(AVR_OPT_ADDR_SPACE_flash)
+
+EnumValue
+Enum(avr_aspace_for_lookup) String(flash1) Value(AVR_OPT_ADDR_SPACE_flash1)
+
+EnumValue
+Enum(avr_aspace_for_lookup) String(flash2) Value(AVR_OPT_ADDR_SPACE_flash2)
+
+EnumValue
+Enum(avr_aspace_for_lookup) String(flash3) Value(AVR_OPT_ADDR_SPACE_flash3)
+
+EnumValue
+Enum(avr_aspace_for_lookup) String(flash4) Value(AVR_OPT_ADDR_SPACE_flash4)
+
+EnumValue
+Enum(avr_aspace_for_lookup) String(flash5) Value(AVR_OPT_ADDR_SPACE_flash5)
+
+EnumValue
+Enum(avr_aspace_for_lookup) String(memx) Value(AVR_OPT_ADDR_SPACE_memx)
+
+EnumValue
+Enum(avr_aspace_for_lookup) String(generic) Value(AVR_OPT_ADDR_SPACE_generic)
+
 maccumulate-args
 Target Report Mask(ACCUMULATE_OUTGOING_ARGS)
 Accumulate outgoing function arguments and acquire/release the needed stack space for outgoing function arguments in function prologue/epilogue.  Without this option, outgoing arguments are pushed before calling a function and popped afterwards.  This option can lead to reduced code size for functions that call many functions that get their arguments on the stack like, for example printf.
Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi	(revision 250302)
+++ doc/invoke.texi	(working copy)
@@ -662,7 +662,7 @@ -imacros @var{file}  -imultilib @var{dir
 @gccoptlist{-mmcu=@var{mcu}  -mabsdata  -maccumulate-args @gol
 -mbranch-cost=@var{cost} @gol
 -mcall-prologues  -mgas-isr-prologues  -mint8 @gol
--mn_flash=@var{size}  -mno-interrupts @gol
+-maddr-space-for-lookup=@var{as} -mn_flash=@var{size}  -mno-interrupts @gol
 -mrelax  -mrmw  -mstrict-X  -mtiny-stack  -mfract-convert-truncate @gol
 -mshort-calls  -nodevicelib @gol
 -Waddr-space-convert  -Wmisspelled-isr}
@@ -15977,6 +15977,40 @@ This option can lead to reduced code siz
 several calls to functions that get their arguments on the stack like
 calls to printf-like functions.
 
+@item -maddr-space-for-lookup=@var{addr-space}
+@opindex maddr-space-for-lookup
+When a simple initialization in a switch statement is converted to an
+initialization from a scalar array, the compiler puts the lookup array
+into @code{.rodata}, which is the generic address space.
+This address space is always used for devices where flash memory is seen
+in the the RAM address range like for @code{avrtiny} and @code{avrxmega3}.
+
+On devices where @code{.rodata} resides in RAM, this option allows to chose
+a different @ref{AVR Named Address Spaces, address space} @var{addr-space}
+which may be one of:
+@table @code
+@item memx
+The default.
+On devices with up to 64@tie{}KiB of program memory, use the 16-bit address
+space @code{__flash}.   On devices with more than 64@tie{}KiB of program
+memory, use a 24-bit flash address space.
+@item flash
+@itemx flash1
+@itemx flash2
+@itemx flash3
+@itemx flash4
+@itemx flash5
+Use the 16-bit address space @code{__flash} @dots{} @code{__flash5},
+respectively. If the respective address space is not supported for the device
+because it is beyond its flash size, @code{__flash} is used as a fallback.
+@item generic
+Use the generic address space, i.e. @code{.rodata}.
+@end table
+Other noticeable options in this context are:
+@option{-f[no-]tree-switch-conversion}, @option{-f[no-]jump-tables}
+and @option{--param case-values-threshold=@var{value}},
+@pxref{Optimize Options}. 
+
 @item -mbranch-cost=@var{cost}
 @opindex mbranch-cost
 Set the branch costs for conditional branch instructions to

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]