[PATCH PING 4/5] Named address spaces: SPU back-end support


Hello,

This patch finally implements SPU back-end support for the __ea
named address space.  The back-end defines the various hooks
introduced by the previous patches to implement the following
semantics:

- There is one named address space identified by the __ea keyword;
  it refers to the effective (PPE) address space associated with
  the current SPU task.
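
  For illustration only, a minimal sketch of the keyword in use (the
  variable names are made up):

    extern __ea int ppe_counter;   /* object residing in PPE memory */
    __ea int *p = &ppe_counter;    /* pointer into the __ea space */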

- The width of __ea pointers is either 32 or 64 bits and can be
  selected by the (ABI-changing) options -mea32 or -mea64.  SPU
  code compiled with -mea32 only works with 32-bit PPU code, but
  SPU code compiled with -mea64 works with either 32-bit or
  64-bit PPU code.
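
  For illustration, code can key off the __EA32__ / __EA64__ macros
  that this series predefines, mirroring what cachemgr.c does below:

    #ifdef __EA64__
    typedef unsigned long long ea_addr_t;  /* 64-bit __ea pointers (-mea64) */
    #else
    typedef unsigned long ea_addr_t;       /* 32-bit __ea pointers (-mea32) */
    #endif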

- The __ea address space is a superset of the generic (local store)
  address space.  Generic pointers can thus be implicitly converted
  to __ea pointers; the result is a PPU pointer into the mapping of
  this SPU's local store in the PPU address space.  __ea pointers can
  be explicitly cast back to local store pointers; this results in
  undefined behaviour if the __ea pointer does not correspond to any
  local store address (or the null pointer); see the sketch below.
  To implement pointer conversion, generated code refers to the
  __ea_local_store variable, which must be initialized by start-up
  code to hold the PPU start address of the SPU local store mapping.

  In environments where SPU local store is *not* mapped into the PPU
  address space, this can be disabled with -mno-address-space-conversion;
  the __ea and generic address spaces are then treated as disjoint
  address spaces according to TR 18037 rules.  When using
  this flag, generated code will never refer to __ea_local_store.
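
  A minimal sketch of both conversion directions (the function name is
  invented; with -mno-address-space-conversion, both initializations
  below are rejected at compile time):

    void convert_example (void)
    {
      int x;                 /* object in local store */
      __ea int *ep = &x;     /* implicit LS -> __ea; uses __ea_local_store */
      int *lp = (int *) ep;  /* explicit cast back; undefined behaviour
                                unless ep refers to this SPU's local store */
      (void) lp;
    }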

- References to PPU variables are implemented via the @ppu assembler
  syntax, which causes R_SPU_PPU32 or R_SPU_PPU64 relocations to be
  emitted.  Variables defined with the __ea qualifier will be placed
  into the special ._ea section, which is marked as non-allocatable
  and will not be loaded into local store.
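
  For illustration, a hypothetical definition of such a variable:

    __ea int shared_flag;   /* placed in the ._ea section; SPU-side
                               references are emitted as shared_flag@ppu */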

- Accesses to the __ea address space are implemented via a software-
  managed cache performing DMA operations.  This is always a 4-way
  set-associative cache with a line size of 128 bytes.  The total
  cache size can be specified at link time via the -mcache-size=
  option.  Fast-path cache lookup code is inlined into generated code
  (unless compiling with -Os or without optimization).  The cache
  usually performs write-back of a cache line via an atomic read-
  modify-write loop; this can be changed to use unconditional DMA
  writes with the -mno-atomic-updates option.
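
  As an illustration of how these options combine (assuming the usual
  spu-gcc driver name):

    spu-gcc -O2 -mea64 -mcache-size=128 -mno-atomic-updates prog.c -o prog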

- The cache manager provides a number of user-callable routines that
  give code some control over caching operations; these are declared
  in a new target header, spu_cache.h.
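
  A minimal usage sketch of the new header (the function and its
  argument are invented for the example):

    #include <spu_cache.h>

    void bump (__ea int *counter)
    {
      /* Map the PPU word into local store, marking 4 bytes dirty.  */
      int *lp = cache_fetch_dirty (counter, sizeof (int));
      ++*lp;
      /* Write all dirty cache lines back to PPU memory.  */
      cache_flush ();
    }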

The whole series was tested on spu-elf and s390x-ibm-linux.
OK for mainline?

Bye,
Ulrich


2009-09-14  Ben Elliston  <bje@au.ibm.com>
	    Michael Meissner  <meissner@linux.vnet.ibm.com>
	    Ulrich Weigand  <uweigand@de.ibm.com>

	* config.gcc (spu-*-elf*): Add spu_cache.h to extra_headers.
	* config/spu/spu_cache.h: New file.

	* config/spu/cachemgr.c: New file.
	* config/spu/cache.S: New file.

	* config/spu/spu.h (ASM_OUTPUT_SYMBOL_REF): Define.
	(ADDR_SPACE_GENERIC, ADDR_SPACE_EA): Define.
	(TARGET_ADDR_SPACE_KEYWORDS): Define.
	* config/spu/spu.c (EAmode): New macro.
	(TARGET_ADDR_SPACE_POINTER_MODE): Define.
	(TARGET_ADDR_SPACE_ADDRESS_MODE): Likewise.
	(TARGET_ADDR_SPACE_LEGITIMATE_ADDRESS_P): Likewise.
	(TARGET_ADDR_SPACE_LEGITIMIZE_ADDRESS): Likewise.
	(TARGET_ADDR_SPACE_SUBSET_P): Likewise.
	(TARGET_ADDR_SPACE_CONVERT): Likewise.
	(TARGET_ASM_SELECT_SECTION): Likewise.
	(TARGET_ASM_UNIQUE_SECTION): Likewise.
	(TARGET_ASM_UNALIGNED_SI_OP): Likewise.
	(TARGET_ASM_ALIGNED_DI_OP): Likewise.
	(ea_symbol_ref): New function.
	(spu_legitimate_constant_p): Handle __ea qualified addresses.
	(spu_addr_space_legitimate_address_p): New function.
	(spu_addr_space_legitimize_address): Likewise.
	(cache_fetch): New global.
	(cache_fetch_dirty): Likewise.
	(ea_alias_set): Likewise.
	(ea_load_store): New function.
	(ea_load_store_inline): Likewise.
	(expand_ea_mem): Likewise.
	(spu_expand_mov): Handle __ea qualified memory references.
	(spu_addr_space_pointer_mode): New function.
	(spu_addr_space_address_mode): Likewise.
	(spu_addr_space_subset_p): Likewise.
	(spu_addr_space_convert): Likewise.
	(spu_section_type_flags): Handle "._ea" section.
	(spu_select_section): New function.
	(spu_unique_section): Likewise.
	* config/spu/spu-c.c (spu_cpu_cpp_builtins): Support __EA32__
	and __EA64__ predefined macros.
	* config/spu/spu-elf.h (LIB_SPEC): Handle -mcache-size= and
	-matomic-updates switches.

	* config/spu/t-spu-elf (MULTILIB_OPTIONS): Define.
	(EXTRA_MULTILIB_PARTS): Add libgcc_cachemgr.a,
	libgcc_cachemgr_nonatomic.a, libgcc_cache8k.a, libgcc_cache16k.a,
	libgcc_cache32k.a, libgcc_cache64k.a, libgcc_cache128k.a.
	($(T)cachemgr.o, $(T)cachemgr_nonatomic.o): New targets.
	($(T)cache8k.o, $(T)cache16k.o, $(T)cache32k.o, $(T)cache64k.o,
	$(T)cache128k.o): Likewise.
	($(T)libgcc_%.a): Likewise.

	* config/spu/spu.opt (-mea32/-mea64): Add switches.
	(-maddress-space-conversion/-mno-address-space-conversion): Likewise.
	(-mcache-size=): Likewise.
	(-matomic-updates/-mno-atomic-updates): Likewise.
	* doc/invoke.texi (-mea32/-mea64): Document.
	(-maddress-space-conversion/-mno-address-space-conversion): Likewise.
	(-mcache-size=): Likewise.
	(-matomic-updates/-mno-atomic-updates): Likewise.


Index: gcc-head/gcc/config.gcc
===================================================================
--- gcc-head.orig/gcc/config.gcc
+++ gcc-head/gcc/config.gcc
@@ -2442,7 +2442,7 @@ sparc64-*-netbsd*)
 spu-*-elf*)
 	tm_file="dbxelf.h elfos.h spu/spu-elf.h spu/spu.h newlib-stdint.h"
 	tmake_file="spu/t-spu-elf"
-	extra_headers="spu_intrinsics.h spu_internals.h vmx2spu.h spu_mfcio.h vec_types.h"
+	extra_headers="spu_intrinsics.h spu_internals.h vmx2spu.h spu_mfcio.h vec_types.h spu_cache.h"
 	extra_modes=spu/spu-modes.def
 	c_target_objs="${c_target_objs} spu-c.o"
 	cxx_target_objs="${cxx_target_objs} spu-c.o"
Index: gcc-head/gcc/config/spu/cache.S
===================================================================
--- /dev/null
+++ gcc-head/gcc/config/spu/cache.S
@@ -0,0 +1,43 @@
+/* Copyright (C) 2008, 2009 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+	.data
+	.p2align 7
+	.global __cache
+__cache:
+	.rept __CACHE_SIZE__ * 8
+	.fill 128
+	.endr
+
+	.p2align 7
+	.global __cache_tag_array
+__cache_tag_array:
+	.rept __CACHE_SIZE__ * 2
+	.long 1, 1, 1, 1
+	.fill 128-16
+	.endr
+__end_cache_tag_array:
+
+	.globl __cache_tag_array_size
+	.set __cache_tag_array_size, __end_cache_tag_array-__cache_tag_array
+
Index: gcc-head/gcc/config/spu/cachemgr.c
===================================================================
--- /dev/null
+++ gcc-head/gcc/config/spu/cachemgr.c
@@ -0,0 +1,438 @@
+/* Copyright (C) 2008, 2009 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+#include <spu_mfcio.h>
+#include <spu_internals.h>
+#include <spu_intrinsics.h>
+#include <spu_cache.h>
+
+extern unsigned long long __ea_local_store;
+extern char __cache_tag_array_size;
+
+#define LINE_SIZE 128
+#define TAG_MASK (LINE_SIZE - 1)
+
+#define WAYS 4
+#define SET_MASK ((int) &__cache_tag_array_size - LINE_SIZE)
+
+#define CACHE_LINES ((int) &__cache_tag_array_size /		\
+		     sizeof (struct __cache_tag_array) * WAYS)
+
+struct __cache_tag_array
+{
+  unsigned int tag_lo[WAYS];
+  unsigned int tag_hi[WAYS];
+  void *base[WAYS];
+  int reserved[WAYS];
+  vector unsigned short dirty_bits[WAYS];
+};
+
+extern struct __cache_tag_array __cache_tag_array[];
+extern char __cache[];
+
+/* To keep the code a little cleaner, and to avoid 64/32-bit #ifdefs
+   all over the place, we use macros.  */
+
+#ifdef __EA64__
+typedef unsigned long long addr;
+
+#define CHECK_TAG(_entry, _way, _tag)			\
+  ((_entry)->tag_lo[(_way)] == ((_tag) & 0xFFFFFFFF)	\
+   && (_entry)->tag_hi[(_way)] == ((_tag) >> 32))
+
+#define GET_TAG(_entry, _way) \
+  ((unsigned long long)(_entry)->tag_hi[(_way)] << 32	\
+   | (unsigned long long)(_entry)->tag_lo[(_way)])
+
+#define SET_TAG(_entry, _way, _tag)			\
+  (_entry)->tag_lo[(_way)] = (_tag) & 0xFFFFFFFF;	\
+  (_entry)->tag_hi[(_way)] = (_tag) >> 32
+
+#else /*__EA32__*/
+typedef unsigned long addr;
+
+#define CHECK_TAG(_entry, _way, _tag)			\
+  ((_entry)->tag_lo[(_way)] == (_tag))
+
+#define GET_TAG(_entry, _way)				\
+  ((_entry)->tag_lo[(_way)])
+
+#define SET_TAG(_entry, _way, _tag)			\
+  (_entry)->tag_lo[(_way)] = (_tag)
+
+#endif
+
+/* In GET_ENTRY, we cast away the high 32 bits,
+   as the tag is only in the low 32.  */
+
+#define GET_ENTRY(_addr)						   \
+  ((struct __cache_tag_array *)						   \
+   si_to_uint (si_a (si_and (si_from_uint ((unsigned int) (addr) (_addr)), \
+			     si_from_uint (SET_MASK)),			   \
+	       si_from_uint ((unsigned int) __cache_tag_array))))
+
+#define GET_CACHE_LINE(_addr, _way) \
+  ((void *) (__cache + ((_addr) & SET_MASK) * WAYS) + ((_way) * LINE_SIZE))
+
+#define CHECK_DIRTY(_vec) (si_to_uint (si_orx ((qword) (_vec))))
+#define SET_EMPTY(_entry, _way) ((_entry)->tag_lo[(_way)] = 1)
+#define CHECK_EMPTY(_entry, _way) ((_entry)->tag_lo[(_way)] == 1)
+
+#define LS_FLAG 0x80000000
+#define SET_IS_LS(_entry, _way) ((_entry)->reserved[(_way)] |= LS_FLAG)
+#define CHECK_IS_LS(_entry, _way) ((_entry)->reserved[(_way)] & LS_FLAG)
+#define GET_LRU(_entry, _way) ((_entry)->reserved[(_way)] & ~LS_FLAG)
+
+static int dma_tag = 32;
+
+static void
+__cache_evict_entry (struct __cache_tag_array *entry, int way)
+{
+  addr tag = GET_TAG (entry, way);
+
+  if (CHECK_DIRTY (entry->dirty_bits[way]) && !CHECK_IS_LS (entry, way))
+    {
+#ifdef NONATOMIC
+      /* Non-atomic writes.  */
+      unsigned int oldmask, mach_stat;
+      char *line = ((void *) 0);
+
+      /* Enter critical section.  */
+      mach_stat = spu_readch (SPU_RdMachStat);
+      spu_idisable ();
+
+      /* Issue DMA request.  */
+      line = GET_CACHE_LINE (entry->tag_lo[way], way);
+      mfc_put (line, tag, LINE_SIZE, dma_tag, 0, 0);
+
+      /* Wait for DMA completion.  */
+      oldmask = mfc_read_tag_mask ();
+      mfc_write_tag_mask (1 << dma_tag);
+      mfc_read_tag_status_all ();
+      mfc_write_tag_mask (oldmask);
+
+      /* Leave critical section.  */
+      if (__builtin_expect (mach_stat & 1, 0))
+	spu_ienable ();
+#else
+      /* Allocate a buffer large enough that we know it contains 128
+         bytes that are 128-byte aligned (for DMA).  */
+
+      char buffer[LINE_SIZE + 127];
+      qword *buf_ptr = (qword *) (((unsigned int) (buffer) + 127) & ~127);
+      qword *line = GET_CACHE_LINE (entry->tag_lo[way], way);
+      qword bits;
+      unsigned int mach_stat;
+
+      /* Enter critical section.  */
+      mach_stat = spu_readch (SPU_RdMachStat);
+      spu_idisable ();
+
+      do
+	{
+	  /* We atomically read the current memory into a buffer,
+	     modify the dirty bytes in the buffer, and write it
+	     back.  If the writeback fails, loop and try again.  */
+
+	  mfc_getllar (buf_ptr, tag, 0, 0);
+	  mfc_read_atomic_status ();
+
+	  /* The method we're using to write 16 dirty bytes into
+	     the buffer at a time uses fsmb which in turn uses
+	     the least significant 16 bits of word 0, so we
+	     load the bits and rotate so that the first bit of
+	     the bitmap is in the first bit that fsmb will use.  */
+
+	  bits = (qword) entry->dirty_bits[way];
+	  bits = si_rotqbyi (bits, -2);
+
+	  /* si_fsmb creates the mask of dirty bytes.
+	     Use selb to nab the appropriate bits.  */
+	  buf_ptr[0] = si_selb (buf_ptr[0], line[0], si_fsmb (bits));
+
+	  /* Rotate to next 16 byte section of cache.  */
+	  bits = si_rotqbyi (bits, 2);
+
+	  buf_ptr[1] = si_selb (buf_ptr[1], line[1], si_fsmb (bits));
+	  bits = si_rotqbyi (bits, 2);
+	  buf_ptr[2] = si_selb (buf_ptr[2], line[2], si_fsmb (bits));
+	  bits = si_rotqbyi (bits, 2);
+	  buf_ptr[3] = si_selb (buf_ptr[3], line[3], si_fsmb (bits));
+	  bits = si_rotqbyi (bits, 2);
+	  buf_ptr[4] = si_selb (buf_ptr[4], line[4], si_fsmb (bits));
+	  bits = si_rotqbyi (bits, 2);
+	  buf_ptr[5] = si_selb (buf_ptr[5], line[5], si_fsmb (bits));
+	  bits = si_rotqbyi (bits, 2);
+	  buf_ptr[6] = si_selb (buf_ptr[6], line[6], si_fsmb (bits));
+	  bits = si_rotqbyi (bits, 2);
+	  buf_ptr[7] = si_selb (buf_ptr[7], line[7], si_fsmb (bits));
+	  bits = si_rotqbyi (bits, 2);
+
+	  mfc_putllc (buf_ptr, tag, 0, 0);
+	}
+      while (mfc_read_atomic_status ());
+
+      /* Leave critical section.  */
+      if (__builtin_expect (mach_stat & 1, 0))
+	spu_ienable ();
+#endif
+    }
+
+  /* In any case, mark the lo tag with 1, which denotes empty.  */
+  SET_EMPTY (entry, way);
+  entry->dirty_bits[way] = (vector unsigned short) si_from_uint (0);
+}
+
+void
+__cache_evict (__ea void *ea)
+{
+  addr tag = (addr) ea & ~TAG_MASK;
+  struct __cache_tag_array *entry = GET_ENTRY (ea);
+  int i = 0;
+
+  /* Cycle through all the ways the address could be cached in,
+     and evict the way if the address is found.  */
+
+  for (i = 0; i < WAYS; i++)
+    if (CHECK_TAG (entry, i, tag))
+      __cache_evict_entry (entry, i);
+}
+
+static void *
+__cache_fill (int way, addr tag)
+{
+  unsigned int oldmask, mach_stat;
+  char *line = ((void *) 0);
+
+  /* Reserve our DMA tag.  */
+  if (dma_tag == 32)
+    dma_tag = mfc_tag_reserve ();
+
+  /* Enter critical section.  */
+  mach_stat = spu_readch (SPU_RdMachStat);
+  spu_idisable ();
+
+  /* Issue DMA request.  */
+  line = GET_CACHE_LINE (tag, way);
+  mfc_get (line, tag, LINE_SIZE, dma_tag, 0, 0);
+
+  /* Wait for DMA completion.  */
+  oldmask = mfc_read_tag_mask ();
+  mfc_write_tag_mask (1 << dma_tag);
+  mfc_read_tag_status_all ();
+  mfc_write_tag_mask (oldmask);
+
+  /* Leave critical section.  */
+  if (__builtin_expect (mach_stat & 1, 0))
+    spu_ienable ();
+
+  return (void *) line;
+}
+
+static void
+__cache_miss (__ea void *ea, struct __cache_tag_array *entry, int way)
+{
+
+  addr tag = (addr) ea & ~TAG_MASK;
+  unsigned int lru = 0;
+  int i = 0;
+  int idx = 0;
+
+  /* If way >= WAYS, then there are no empty slots, so we must evict
+     the least recently used entry.  */
+  if (way >= 4)
+    {
+      for (i = 0; i < WAYS; i++)
+	{
+	  if (GET_LRU (entry, i) > lru)
+	    {
+	      lru = GET_LRU (entry, i);
+	      idx = i;
+	    }
+	}
+      __cache_evict_entry (entry, idx);
+      way = idx;
+    }
+
+  /* Set the empty entry's tag and fill its cache line.  */
+
+  SET_TAG (entry, way, tag);
+  entry->reserved[way] = 0;
+
+  /* Check if the address is just an effective address within the
+     SPU's local store. */
+
+  /* Because the LS is not 256k aligned, we can't simply mask the
+     address to compare; we must check the whole range.  */
+
+  if ((addr) ea >= (addr) __ea_local_store
+      && (addr) ea < (addr) (__ea_local_store + 0x40000))
+    {
+      SET_IS_LS (entry, way);
+      entry->base[way] =
+	(void *) ((unsigned int) ((addr) ea -
+				  (addr) __ea_local_store) & ~0x7f);
+    }
+  else
+    {
+      entry->base[way] = __cache_fill (way, tag);
+    }
+}
+
+void *
+__cache_fetch_dirty (__ea void *ea, int n_bytes_dirty)
+{
+#ifdef __EA64__
+  unsigned int tag_hi;
+  qword etag_hi;
+#endif
+  unsigned int tag_lo;
+  struct __cache_tag_array *entry;
+
+  qword etag_lo;
+  qword equal;
+  qword bit_mask;
+  qword way;
+
+  /* In this first chunk, we merely compute the entry pointer and tag.  */
+
+  entry = GET_ENTRY (ea);
+
+#ifndef __EA64__
+  tag_lo =
+    si_to_uint (si_andc
+		(si_shufb
+		 (si_from_uint ((addr) ea), si_from_uint (0),
+		  si_from_uint (0x00010203)), si_from_uint (TAG_MASK)));
+#else
+  tag_lo =
+    si_to_uint (si_andc
+		(si_shufb
+		 (si_from_ullong ((addr) ea), si_from_uint (0),
+		  si_from_uint (0x04050607)), si_from_uint (TAG_MASK)));
+
+  tag_hi =
+    si_to_uint (si_shufb
+		(si_from_ullong ((addr) ea), si_from_uint (0),
+		 si_from_uint (0x00010203)));
+#endif
+
+  /* Increment LRU in reserved bytes.  */
+  si_stqd (si_ai (si_lqd (si_from_ptr (entry), 48), 1),
+	   si_from_ptr (entry), 48);
+
+missreturn:
+  /* Check if the entry's lo_tag is equal to the address' lo_tag.  */
+  etag_lo = si_lqd (si_from_ptr (entry), 0);
+  equal = si_ceq (etag_lo, si_from_uint (tag_lo));
+#ifdef __EA64__
+  /* And the high tag too.  */
+  etag_hi = si_lqd (si_from_ptr (entry), 16);
+  equal = si_and (equal, (si_ceq (etag_hi, si_from_uint (tag_hi))));
+#endif
+
+  if ((si_to_uint (si_orx (equal)) == 0))
+    goto misshandler;
+
+  if (n_bytes_dirty)
+    {
+      /* way = 0x40,0x50,0x60,0x70 for each way, which is also the
+         offset of the appropriate dirty bits.  */
+      way = si_shli (si_clz (si_gbb (equal)), 2);
+
+      /* To create the bit_mask, we set it to all 1s (uint -1), then we
+         shift it left by (128 - n_bytes_dirty) bits.  */
+
+      bit_mask = si_from_uint (-1);
+
+      bit_mask =
+	si_shlqby (bit_mask, si_from_uint ((LINE_SIZE - n_bytes_dirty) / 8));
+
+      bit_mask =
+	si_shlqbi (bit_mask, si_from_uint ((LINE_SIZE - n_bytes_dirty) % 8));
+
+      /* Rotate it around to the correct offset.  */
+      bit_mask =
+	si_rotqby (bit_mask,
+		   si_from_uint (-1 * ((addr) ea & TAG_MASK) / 8));
+
+      bit_mask =
+	si_rotqbi (bit_mask,
+		   si_from_uint (-1 * ((addr) ea & TAG_MASK) % 8));
+
+      /* Update the dirty bits.  */
+      si_stqx (si_or (si_lqx (si_from_ptr (entry), way), bit_mask),
+	       si_from_ptr (entry), way);
+    }
+
+  /* We've definitely found the right entry, set LRU (reserved) to 0
+     maintaining the LS flag (MSB).  */
+
+  si_stqd (si_andc
+	   (si_lqd (si_from_ptr (entry), 48),
+	    si_and (equal, si_from_uint (~(LS_FLAG)))),
+	   si_from_ptr (entry), 48);
+
+  return (void *)
+    si_to_uint (si_a
+		(si_orx
+		 (si_and (si_lqd (si_from_ptr (entry), 32), equal)),
+		 si_from_uint (((unsigned int) (addr) ea) & TAG_MASK)));
+
+misshandler:
+  equal = si_ceqi (etag_lo, 1);
+  __cache_miss (ea, entry, (si_to_uint (si_clz (si_gbb (equal))) - 16) >> 2);
+  goto missreturn;
+}
+
+void *
+__cache_fetch (__ea void *ea)
+{
+  return __cache_fetch_dirty (ea, 0);
+}
+
+void
+__cache_touch (__ea void *ea __attribute__ ((unused)))
+{
+  /* NO-OP for now.  */
+}
+
+void __cache_flush (void) __attribute__ ((destructor));
+void
+__cache_flush (void)
+{
+  struct __cache_tag_array *entry = __cache_tag_array;
+  unsigned int i;
+  int j;
+
+  /* Cycle through each cache entry and evict all used ways.  */
+
+  for (i = 0; i < CACHE_LINES / WAYS; i++)
+    {
+      for (j = 0; j < WAYS; j++)
+	if (!CHECK_EMPTY (entry, j))
+	  __cache_evict_entry (entry, j);
+
+      entry++;
+    }
+}
Index: gcc-head/gcc/config/spu/spu-c.c
===================================================================
--- gcc-head.orig/gcc/config/spu/spu-c.c
+++ gcc-head/gcc/config/spu/spu-c.c
@@ -201,6 +201,17 @@ spu_cpu_cpp_builtins (struct cpp_reader 
   if (spu_arch == PROCESSOR_CELLEDP)
     builtin_define_std ("__SPU_EDP__");
   builtin_define_std ("__vector=__attribute__((__spu_vector__))");
+  switch (spu_ea_model)
+    {
+    case 32:
+      builtin_define_std ("__EA32__");
+      break;
+    case 64:
+      builtin_define_std ("__EA64__");
+      break;
+    default:
+      gcc_unreachable ();
+    }
 
   if (!flag_iso)
     {
Index: gcc-head/gcc/config/spu/spu-elf.h
===================================================================
--- gcc-head.orig/gcc/config/spu/spu-elf.h
+++ gcc-head/gcc/config/spu/spu-elf.h
@@ -68,8 +68,14 @@
 
 #define LINK_SPEC "%{mlarge-mem: --defsym __stack=0xfffffff0 }"
 
-#define LIB_SPEC \
-	"-( %{!shared:%{g*:-lg}} -lc -lgloss -)"
+#define LIB_SPEC "-( %{!shared:%{g*:-lg}} -lc -lgloss -) \
+    %{mno-atomic-updates:-lgcc_cachemgr_nonatomic; :-lgcc_cachemgr} \
+    %{mcache-size=128:-lgcc_cache128k; \
+      mcache-size=64 :-lgcc_cache64k; \
+      mcache-size=32 :-lgcc_cache32k; \
+      mcache-size=16 :-lgcc_cache16k; \
+      mcache-size=8  :-lgcc_cache8k; \
+                     :-lgcc_cache64k}"
 
 /* Turn off warnings in the assembler too. */
 #undef ASM_SPEC
Index: gcc-head/gcc/config/spu/spu.c
===================================================================
--- gcc-head.orig/gcc/config/spu/spu.c
+++ gcc-head/gcc/config/spu/spu.c
@@ -153,6 +153,8 @@ static void spu_init_builtins (void);
 static unsigned char spu_scalar_mode_supported_p (enum machine_mode mode);
 static unsigned char spu_vector_mode_supported_p (enum machine_mode mode);
 static bool spu_legitimate_address_p (enum machine_mode, rtx, bool);
+static bool spu_addr_space_legitimate_address_p (enum machine_mode, rtx,
+						 bool, addr_space_t);
 static rtx adjust_operand (rtx op, HOST_WIDE_INT * start);
 static rtx get_pic_reg (void);
 static int need_to_save_reg (int regno, int saving);
@@ -202,15 +204,23 @@ static bool spu_return_in_memory (const_
 static void fix_range (const char *);
 static void spu_encode_section_info (tree, rtx, int);
 static rtx spu_legitimize_address (rtx, rtx, enum machine_mode);
+static rtx spu_addr_space_legitimize_address (rtx, rtx, enum machine_mode,
+					      addr_space_t);
 static tree spu_builtin_mul_widen_even (tree);
 static tree spu_builtin_mul_widen_odd (tree);
 static tree spu_builtin_mask_for_load (void);
 static int spu_builtin_vectorization_cost (bool);
 static bool spu_vector_alignment_reachable (const_tree, bool);
 static tree spu_builtin_vec_perm (tree, tree *);
+static enum machine_mode spu_addr_space_pointer_mode (addr_space_t);
+static enum machine_mode spu_addr_space_address_mode (addr_space_t);
+static bool spu_addr_space_subset_p (addr_space_t, addr_space_t);
+static rtx spu_addr_space_convert (rtx, tree, tree);
 static int spu_sms_res_mii (struct ddg *g);
 static void asm_file_start (void);
 static unsigned int spu_section_type_flags (tree, const char *, int);
+static section *spu_select_section (tree, int, unsigned HOST_WIDE_INT);
+static void spu_unique_section (tree, int);
 static rtx spu_expand_load (rtx, rtx, rtx, int);
 
 extern const char *reg_names[];
@@ -268,6 +278,10 @@ spu_libgcc_cmp_return_mode (void);
 
 static enum machine_mode
 spu_libgcc_shift_count_mode (void);
+
+/* Pointer mode for __ea references.  */
+#define EAmode (spu_ea_model != 32 ? DImode : SImode)
+
 
 /*  Table of machine attributes.  */
 static const struct attribute_spec spu_attribute_table[] =
@@ -280,6 +294,25 @@ static const struct attribute_spec spu_a
 
 /*  TARGET overrides.  */
 
+#undef TARGET_ADDR_SPACE_POINTER_MODE
+#define TARGET_ADDR_SPACE_POINTER_MODE spu_addr_space_pointer_mode
+
+#undef TARGET_ADDR_SPACE_ADDRESS_MODE
+#define TARGET_ADDR_SPACE_ADDRESS_MODE spu_addr_space_address_mode
+
+#undef TARGET_ADDR_SPACE_LEGITIMATE_ADDRESS_P
+#define TARGET_ADDR_SPACE_LEGITIMATE_ADDRESS_P \
+  spu_addr_space_legitimate_address_p
+
+#undef TARGET_ADDR_SPACE_LEGITIMIZE_ADDRESS
+#define TARGET_ADDR_SPACE_LEGITIMIZE_ADDRESS spu_addr_space_legitimize_address
+
+#undef TARGET_ADDR_SPACE_SUBSET_P
+#define TARGET_ADDR_SPACE_SUBSET_P spu_addr_space_subset_p
+
+#undef TARGET_ADDR_SPACE_CONVERT
+#define TARGET_ADDR_SPACE_CONVERT spu_addr_space_convert
+
 #undef TARGET_INIT_BUILTINS
 #define TARGET_INIT_BUILTINS spu_init_builtins
 
@@ -292,6 +325,15 @@ static const struct attribute_spec spu_a
 #undef TARGET_LEGITIMIZE_ADDRESS
 #define TARGET_LEGITIMIZE_ADDRESS spu_legitimize_address
 
+/* The current assembler doesn't like .4byte foo@ppu, so use the normal .long
+   and .quad for the debugger.  When it is known that the assembler is fixed,
+   these can be removed.  */
+#undef TARGET_ASM_UNALIGNED_SI_OP
+#define TARGET_ASM_UNALIGNED_SI_OP	"\t.long\t"
+
+#undef TARGET_ASM_ALIGNED_DI_OP
+#define TARGET_ASM_ALIGNED_DI_OP	"\t.quad\t"
+
 /* The .8byte directive doesn't seem to work well for a 32 bit
    architecture. */
 #undef TARGET_ASM_UNALIGNED_DI_OP
@@ -408,6 +450,12 @@ static const struct attribute_spec spu_a
 #undef TARGET_SECTION_TYPE_FLAGS
 #define TARGET_SECTION_TYPE_FLAGS spu_section_type_flags
 
+#undef TARGET_ASM_SELECT_SECTION
+#define TARGET_ASM_SELECT_SECTION  spu_select_section
+
+#undef TARGET_ASM_UNIQUE_SECTION
+#define TARGET_ASM_UNIQUE_SECTION  spu_unique_section
+
 #undef TARGET_LEGITIMATE_ADDRESS_P
 #define TARGET_LEGITIMATE_ADDRESS_P spu_legitimate_address_p
 
@@ -3602,6 +3650,29 @@ exp2_immediate_p (rtx op, enum machine_m
   return FALSE;
 }
 
+/* Return true if X is a SYMBOL_REF to an __ea qualified variable.  */
+
+static int
+ea_symbol_ref (rtx *px, void *data ATTRIBUTE_UNUSED)
+{
+  rtx x = *px;
+  tree decl;
+
+  if (GET_CODE (x) == CONST && GET_CODE (XEXP (x, 0)) == PLUS)
+    {
+      rtx plus = XEXP (x, 0);
+      rtx op0 = XEXP (plus, 0);
+      rtx op1 = XEXP (plus, 1);
+      if (GET_CODE (op1) == CONST_INT)
+	x = op0;
+    }
+
+  return (GET_CODE (x) == SYMBOL_REF
+ 	  && (decl = SYMBOL_REF_DECL (x)) != 0
+ 	  && TREE_CODE (decl) == VAR_DECL
+ 	  && TYPE_ADDR_SPACE (TREE_TYPE (decl)));
+}
+
 /* We accept:
    - any 32-bit constant (SImode, SFmode)
    - any constant that can be generated with fsmbi (any mode)
@@ -3613,6 +3684,12 @@ spu_legitimate_constant_p (rtx x)
 {
   if (GET_CODE (x) == HIGH)
     x = XEXP (x, 0);
+
+  /* Reject any __ea qualified reference.  These can't appear in
+     instructions but must be forced to the constant pool.  */
+  if (for_each_rtx (&x, ea_symbol_ref, 0))
+    return 0;
+
   /* V4SI with all identical symbols is valid. */
   if (!flag_pic
       && GET_MODE (x) == V4SImode
@@ -3651,8 +3728,14 @@ spu_legitimate_address_p (enum machine_m
   switch (GET_CODE (x))
     {
     case LABEL_REF:
+      return !TARGET_LARGE_MEM;
+
     case SYMBOL_REF:
     case CONST:
+      /* Keep __ea references until reload so that spu_expand_mov can see them
+	 in MEMs.  */
+      if (ea_symbol_ref (&x, 0))
+	return !reload_in_progress && !reload_completed;
       return !TARGET_LARGE_MEM;
 
     case CONST_INT:
@@ -3696,6 +3779,20 @@ spu_legitimate_address_p (enum machine_m
   return FALSE;
 }
 
+/* Like spu_legitimate_address_p, except with named addresses.  */
+static bool
+spu_addr_space_legitimate_address_p (enum machine_mode mode, rtx x,
+				     bool reg_ok_strict, addr_space_t as)
+{
+  if (as == ADDR_SPACE_EA)
+    return (REG_P (x) && (GET_MODE (x) == EAmode));
+
+  else if (as != ADDR_SPACE_GENERIC)
+    gcc_unreachable ();
+
+  return spu_legitimate_address_p (mode, x, reg_ok_strict);
+}
+
 /* When the address is reg + const_int, force the const_int into a
    register.  */
 rtx
@@ -3727,6 +3824,17 @@ spu_legitimize_address (rtx x, rtx oldx 
   return x;
 }
 
+/* Like spu_legitimize_address, except with named address support.  */
+static rtx
+spu_addr_space_legitimize_address (rtx x, rtx oldx, enum machine_mode mode,
+				   addr_space_t as)
+{
+  if (as != ADDR_SPACE_GENERIC)
+    return x;
+
+  return spu_legitimize_address (x, oldx, mode);
+}
+
 /* Handle an attribute requiring a FUNCTION_DECL; arguments as in
    struct attribute_spec.handler.  */
 static tree
@@ -4230,6 +4338,233 @@ address_needs_split (rtx mem)
   return 0;
 }
 
+static GTY(()) rtx cache_fetch;		  /* __cache_fetch function */
+static GTY(()) rtx cache_fetch_dirty;	  /* __cache_fetch_dirty function */
+static alias_set_type ea_alias_set = -1;  /* alias set for __ea memory */
+
+/* MEM is known to be an __ea qualified memory access.  Emit a call to
+   fetch the PPU memory into local store, and return its address in local
+   store.  */
+
+static void
+ea_load_store (rtx mem, bool is_store, rtx ea_addr, rtx data_addr)
+{
+  if (is_store)
+    {
+      rtx ndirty = GEN_INT (GET_MODE_SIZE (GET_MODE (mem)));
+      if (!cache_fetch_dirty)
+	cache_fetch_dirty = init_one_libfunc ("__cache_fetch_dirty");
+      emit_library_call_value (cache_fetch_dirty, data_addr, LCT_NORMAL, Pmode,
+			       2, ea_addr, EAmode, ndirty, SImode);
+    }
+  else
+    {
+      if (!cache_fetch)
+	cache_fetch = init_one_libfunc ("__cache_fetch");
+      emit_library_call_value (cache_fetch, data_addr, LCT_NORMAL, Pmode,
+			       1, ea_addr, EAmode);
+    }
+}
+
+/* Like ea_load_store, but do the cache tag comparison and, for stores,
+   dirty bit marking, inline.
+
+   The cache control data structure is an array of
+
+   struct __cache_tag_array
+     {
+        unsigned int tag_lo[4];
+        unsigned int tag_hi[4];
+        void *data_pointer[4];
+        int reserved[4];
+        vector unsigned short dirty_bits[4];
+     }  */
+
+static void
+ea_load_store_inline (rtx mem, bool is_store, rtx ea_addr, rtx data_addr)
+{
+  rtx ea_addr_si;
+  HOST_WIDE_INT v;
+  rtx tag_size_sym = gen_rtx_SYMBOL_REF (Pmode, "__cache_tag_array_size");
+  rtx tag_arr_sym = gen_rtx_SYMBOL_REF (Pmode, "__cache_tag_array");
+  rtx index_mask = gen_reg_rtx (SImode);
+  rtx tag_arr = gen_reg_rtx (Pmode);
+  rtx splat_mask = gen_reg_rtx (TImode);
+  rtx splat = gen_reg_rtx (V4SImode);
+  rtx splat_hi = NULL_RTX;
+  rtx tag_index = gen_reg_rtx (Pmode);
+  rtx block_off = gen_reg_rtx (SImode);
+  rtx tag_addr = gen_reg_rtx (Pmode);
+  rtx tag = gen_reg_rtx (V4SImode);
+  rtx cache_tag = gen_reg_rtx (V4SImode);
+  rtx cache_tag_hi = NULL_RTX;
+  rtx cache_ptrs = gen_reg_rtx (TImode);
+  rtx cache_ptrs_si = gen_reg_rtx (SImode);
+  rtx tag_equal = gen_reg_rtx (V4SImode);
+  rtx tag_equal_hi = NULL_RTX;
+  rtx tag_eq_pack = gen_reg_rtx (V4SImode);
+  rtx tag_eq_pack_si = gen_reg_rtx (SImode);
+  rtx eq_index = gen_reg_rtx (SImode);
+  rtx bcomp, hit_label, hit_ref, cont_label, insn;
+
+  if (spu_ea_model != 32)
+    {
+      splat_hi = gen_reg_rtx (V4SImode);
+      cache_tag_hi = gen_reg_rtx (V4SImode);
+      tag_equal_hi = gen_reg_rtx (V4SImode);
+    }
+
+  emit_move_insn (index_mask, plus_constant (tag_size_sym, -128));
+  emit_move_insn (tag_arr, tag_arr_sym);
+  v = 0x0001020300010203LL;
+  emit_move_insn (splat_mask, immed_double_const (v, v, TImode));
+  ea_addr_si = ea_addr;
+  if (spu_ea_model != 32)
+    ea_addr_si = convert_to_mode (SImode, ea_addr, 1);
+
+  /* tag_index = ea_addr & (tag_array_size - 128)  */
+  emit_insn (gen_andsi3 (tag_index, ea_addr_si, index_mask));
+
+  /* splat ea_addr to all 4 slots.  */
+  emit_insn (gen_shufb (splat, ea_addr_si, ea_addr_si, splat_mask));
+  /* Similarly for high 32 bits of ea_addr.  */
+  if (spu_ea_model != 32)
+    emit_insn (gen_shufb (splat_hi, ea_addr, ea_addr, splat_mask));
+
+  /* block_off = ea_addr & 127  */
+  emit_insn (gen_andsi3 (block_off, ea_addr_si, spu_const (SImode, 127)));
+
+  /* tag_addr = tag_arr + tag_index  */
+  emit_insn (gen_addsi3 (tag_addr, tag_arr, tag_index));
+
+  /* Read cache tags.  */
+  emit_move_insn (cache_tag, gen_rtx_MEM (V4SImode, tag_addr));
+  if (spu_ea_model != 32)
+    emit_move_insn (cache_tag_hi, gen_rtx_MEM (V4SImode,
+					       plus_constant (tag_addr, 16)));
+
+  /* tag = ea_addr & -128  */
+  emit_insn (gen_andv4si3 (tag, splat, spu_const (V4SImode, -128)));
+
+  /* Read all four cache data pointers.  */
+  emit_move_insn (cache_ptrs, gen_rtx_MEM (TImode,
+					   plus_constant (tag_addr, 32)));
+
+  /* Compare tags.  */
+  emit_insn (gen_ceq_v4si (tag_equal, tag, cache_tag));
+  if (spu_ea_model != 32)
+    {
+      emit_insn (gen_ceq_v4si (tag_equal_hi, splat_hi, cache_tag_hi));
+      emit_insn (gen_andv4si3 (tag_equal, tag_equal, tag_equal_hi));
+    }
+
+  /* At most one of the tags compare equal, so tag_equal has one
+     32-bit slot set to all 1's, with the other slots all zero.
+     gbb picks off low bit from each byte in the 128-bit registers,
+     so tag_eq_pack is one of 0xf000, 0x0f00, 0x00f0, 0x000f, assuming
+     we have a hit.  */
+  emit_insn (gen_spu_gbb (tag_eq_pack, spu_gen_subreg (V16QImode, tag_equal)));
+  emit_insn (gen_spu_convert (tag_eq_pack_si, tag_eq_pack));
+
+  /* So counting leading zeros will set eq_index to 16, 20, 24 or 28.  */
+  emit_insn (gen_clzsi2 (eq_index, tag_eq_pack_si));
+
+  /* This allows us to rotate the corresponding cache data pointer to
+     slot 0 (rotating by eq_index mod 16 bytes).  */
+  emit_insn (gen_rotqby_ti (cache_ptrs, cache_ptrs, eq_index));
+  emit_insn (gen_spu_convert (cache_ptrs_si, cache_ptrs));
+
+  /* Add block offset to form final data address.  */
+  emit_insn (gen_addsi3 (data_addr, cache_ptrs_si, block_off));
+
+  /* Check that we did hit.  */
+  hit_label = gen_label_rtx ();
+  hit_ref = gen_rtx_LABEL_REF (VOIDmode, hit_label);
+  bcomp = gen_rtx_NE (SImode, tag_eq_pack_si, const0_rtx);
+  insn = emit_jump_insn (gen_rtx_SET (VOIDmode, pc_rtx,
+				      gen_rtx_IF_THEN_ELSE (VOIDmode, bcomp,
+							    hit_ref, pc_rtx)));
+  /* Say that this branch is very likely to happen.  */
+  v = REG_BR_PROB_BASE - REG_BR_PROB_BASE / 100 - 1;
+  REG_NOTES (insn)
+    = gen_rtx_EXPR_LIST (REG_BR_PROB, GEN_INT (v), REG_NOTES (insn));
+
+  ea_load_store (mem, is_store, ea_addr, data_addr);
+  cont_label = gen_label_rtx ();
+  emit_jump_insn (gen_jump (cont_label));
+  emit_barrier ();
+
+  emit_label (hit_label);
+
+  if (is_store)
+    {
+      HOST_WIDE_INT v_hi;
+      rtx dirty_bits = gen_reg_rtx (TImode);
+      rtx dirty_off = gen_reg_rtx (SImode);
+      rtx dirty_128 = gen_reg_rtx (TImode);
+      rtx neg_block_off = gen_reg_rtx (SImode);
+
+      /* Set up mask with one dirty bit per byte of the mem we are
+	 writing, starting from top bit.  */
+      v_hi = v = -1;
+      v <<= (128 - GET_MODE_SIZE (GET_MODE (mem))) & 63;
+      if ((128 - GET_MODE_SIZE (GET_MODE (mem))) >= 64)
+	{
+	  v_hi = v;
+	  v = 0;
+	}
+      emit_move_insn (dirty_bits, immed_double_const (v, v_hi, TImode));
+
+      /* Form index into cache dirty_bits.  eq_index is one of
+	 0x10, 0x14, 0x18 or 0x1c.  Multiplying by 4 gives us
+	 0x40, 0x50, 0x60 or 0x70 which just happens to be the
+	 offset to each of the four dirty_bits elements.  */
+      emit_insn (gen_ashlsi3 (dirty_off, eq_index, spu_const (SImode, 2)));
+
+      emit_insn (gen_spu_lqx (dirty_128, tag_addr, dirty_off));
+
+      /* Rotate bit mask to proper bit.  */
+      emit_insn (gen_negsi2 (neg_block_off, block_off));
+      emit_insn (gen_rotqbybi_ti (dirty_bits, dirty_bits, neg_block_off));
+      emit_insn (gen_rotqbi_ti (dirty_bits, dirty_bits, neg_block_off));
+
+      /* Or in the new dirty bits.  */
+      emit_insn (gen_iorti3 (dirty_128, dirty_bits, dirty_128));
+
+      /* Store.  */
+      emit_insn (gen_spu_stqx (dirty_128, tag_addr, dirty_off));
+    }
+
+  emit_label (cont_label);
+}
+
+static rtx
+expand_ea_mem (rtx mem, bool is_store)
+{
+  rtx ea_addr;
+  rtx data_addr = gen_reg_rtx (Pmode);
+  rtx new_mem;
+
+  ea_addr = force_reg (EAmode, XEXP (mem, 0));
+  if (optimize_size || optimize == 0)
+    ea_load_store (mem, is_store, ea_addr, data_addr);
+  else
+    ea_load_store_inline (mem, is_store, ea_addr, data_addr);
+
+  if (ea_alias_set == -1)
+    ea_alias_set = new_alias_set ();
+
+  /* We generate a new MEM RTX to refer to the copy of the data
+     in the cache.  We do not copy memory attributes (except the
+     alignment) from the original MEM, as they may no longer apply
+     to the cache copy.  */
+  new_mem = gen_rtx_MEM (GET_MODE (mem), data_addr);
+  set_mem_alias_set (new_mem, ea_alias_set);
+  set_mem_align (new_mem, MIN (MEM_ALIGN (mem), 128 * 8));
+
+  return new_mem;
+}
+
 int
 spu_expand_mov (rtx * ops, enum machine_mode mode)
 {
@@ -4287,9 +4622,17 @@ spu_expand_mov (rtx * ops, enum machine_
 	}
     }
   if (MEM_P (ops[0]))
-    return spu_split_store (ops);
+    {
+      if (MEM_ADDR_SPACE (ops[0]))
+	ops[0] = expand_ea_mem (ops[0], true);
+      return spu_split_store (ops);
+    }
   if (MEM_P (ops[1]))
-    return spu_split_load (ops);
+    {
+      if (MEM_ADDR_SPACE (ops[1]))
+	ops[1] = expand_ea_mem (ops[1], false);
+      return spu_split_load (ops);
+    }
 
   return 0;
 }
@@ -6419,6 +6762,113 @@ spu_builtin_vec_perm (tree type, tree *m
   return d->fndecl;
 }
 
+/* Return the appropriate mode for a named address pointer.  */
+static enum machine_mode
+spu_addr_space_pointer_mode (addr_space_t addrspace)
+{
+  switch (addrspace)
+    {
+    case ADDR_SPACE_GENERIC:
+      return ptr_mode;
+    case ADDR_SPACE_EA:
+      return EAmode;
+    default:
+      gcc_unreachable ();
+    }
+}
+
+/* Return the appropriate mode for a named address space address.  */
+static enum machine_mode
+spu_addr_space_address_mode (addr_space_t addrspace)
+{
+  switch (addrspace)
+    {
+    case ADDR_SPACE_GENERIC:
+      return Pmode;
+    case ADDR_SPACE_EA:
+      return EAmode;
+    default:
+      gcc_unreachable ();
+    }
+}
+
+/* Determine if one named address space is a subset of another.  */
+
+static bool
+spu_addr_space_subset_p (addr_space_t subset, addr_space_t superset)
+{
+  gcc_assert (subset == ADDR_SPACE_GENERIC || subset == ADDR_SPACE_EA);
+  gcc_assert (superset == ADDR_SPACE_GENERIC || superset == ADDR_SPACE_EA);
+
+  if (subset == superset)
+    return true;
+
+  /* If we have -mno-address-space-conversion, treat __ea and generic as not
+     being subsets but instead as disjoint address spaces.  */
+  else if (TARGET_NO_ADDRESS_SPACE_CONVERSION)
+    return false;
+
+  else
+    return (subset == ADDR_SPACE_GENERIC && superset == ADDR_SPACE_EA);
+}
+
+/* Convert from one address space to another.  */
+static rtx
+spu_addr_space_convert (rtx op, tree from_type, tree to_type)
+{
+  addr_space_t from_as = TYPE_ADDR_SPACE (TREE_TYPE (from_type));
+  addr_space_t to_as = TYPE_ADDR_SPACE (TREE_TYPE (to_type));
+
+  gcc_assert (from_as == ADDR_SPACE_GENERIC || from_as == ADDR_SPACE_EA);
+  gcc_assert (to_as == ADDR_SPACE_GENERIC || to_as == ADDR_SPACE_EA);
+
+  if (to_as == ADDR_SPACE_GENERIC && from_as == ADDR_SPACE_EA)
+    {
+      rtx result, ls;
+
+      ls = gen_const_mem (DImode,
+			  gen_rtx_SYMBOL_REF (Pmode, "__ea_local_store"));
+      set_mem_align (ls, 128);
+
+      result = gen_reg_rtx (Pmode);
+      ls = force_reg (Pmode, convert_modes (Pmode, DImode, ls, 1));
+      op = force_reg (Pmode, convert_modes (Pmode, EAmode, op, 1));
+      ls = emit_conditional_move (ls, NE, op, const0_rtx, Pmode,
+					  ls, const0_rtx, Pmode, 1);
+
+      emit_insn (gen_subsi3 (result, op, ls));
+
+      return result;
+    }
+
+  else if (to_as == ADDR_SPACE_EA && from_as == ADDR_SPACE_GENERIC)
+    {
+      rtx result, ls;
+
+      ls = gen_const_mem (DImode,
+			  gen_rtx_SYMBOL_REF (Pmode, "__ea_local_store"));
+      set_mem_align (ls, 128);
+
+      result = gen_reg_rtx (EAmode);
+      ls = force_reg (EAmode, convert_modes (EAmode, DImode, ls, 1));
+      op = force_reg (Pmode, op);
+      ls = emit_conditional_move (ls, NE, op, const0_rtx, Pmode,
+					  ls, const0_rtx, EAmode, 1);
+      op = force_reg (EAmode, convert_modes (EAmode, Pmode, op, 1));
+
+      if (EAmode == SImode)
+	emit_insn (gen_addsi3 (result, op, ls));
+      else
+	emit_insn (gen_adddi3 (result, op, ls));
+
+      return result;
+    }
+
+  else
+    gcc_unreachable ();
+}
+
+
 /* Count the total number of instructions in each pipe and return the
    maximum, which is used as the Minimum Iteration Interval (MII)
    in the modulo scheduler.  get_pipe() will return -2, -1, 0, or 1.
@@ -6511,9 +6961,46 @@ spu_section_type_flags (tree decl, const
   /* .toe needs to have type @nobits.  */
   if (strcmp (name, ".toe") == 0)
     return SECTION_BSS;
+  /* Don't load the ._ea section into the local store.  */
+  if (strcmp (name, "._ea") == 0)
+    return SECTION_WRITE | SECTION_DEBUG;
   return default_section_type_flags (decl, name, reloc);
 }
 
+/* Implement targetm.select_section.  */
+static section *
+spu_select_section (tree decl, int reloc, unsigned HOST_WIDE_INT align)
+{
+  /* Variables and constants defined in the __ea address space
+     go into a special section named "._ea".  */
+  if (TREE_TYPE (decl) != error_mark_node
+      && TYPE_ADDR_SPACE (TREE_TYPE (decl)) == ADDR_SPACE_EA)
+    {
+      /* We might get called with string constants, but get_named_section
+	 doesn't like them as they are not DECLs.  Also, we need to set
+	 flags in that case.  */
+      if (!DECL_P (decl))
+	return get_section ("._ea", SECTION_WRITE | SECTION_DEBUG, NULL);
+
+      return get_named_section (decl, "._ea", reloc);
+    }
+
+  return default_elf_select_section (decl, reloc, align);
+}
+
+/* Implement targetm.unique_section.  */
+static void
+spu_unique_section (tree decl, int reloc)
+{
+  /* We don't support unique section names in the __ea address
+     space for now.  */
+  if (TREE_TYPE (decl) != error_mark_node
+      && TYPE_ADDR_SPACE (TREE_TYPE (decl)) != 0)
+    return;
+
+  default_unique_section (decl, reloc);
+}
+
 /* Generate a constant or register which contains 2^SCALE.  We assume
    the result is valid for MODE.  Currently, MODE must be V4SFmode and
    SCALE must be SImode. */
Index: gcc-head/gcc/config/spu/spu.h
===================================================================
--- gcc-head.orig/gcc/config/spu/spu.h
+++ gcc-head/gcc/config/spu/spu.h
@@ -473,6 +473,17 @@ targetm.resolve_overloaded_builtin = spu
 #define ASM_OUTPUT_LABELREF(FILE, NAME) \
   asm_fprintf (FILE, "%U%s", default_strip_name_encoding (NAME))
 
+#define ASM_OUTPUT_SYMBOL_REF(FILE, X) \
+  do							\
+    {							\
+      tree decl;					\
+      assemble_name (FILE, XSTR ((X), 0));		\
+      if ((decl = SYMBOL_REF_DECL ((X))) != 0		\
+	  && TREE_CODE (decl) == VAR_DECL		\
+	  && TYPE_ADDR_SPACE (TREE_TYPE (decl)))	\
+	fputs ("@ppu", FILE);				\
+    } while (0)
+
 
 /* Instruction Output */
 #define REGISTER_NAMES \
@@ -594,6 +605,14 @@ targetm.resolve_overloaded_builtin = spu
   } while (0)
 
 
+/* Address spaces.  */
+#define ADDR_SPACE_GENERIC	0
+#define ADDR_SPACE_EA		1
+
+/* Named address space keywords.  */
+#define TARGET_ADDR_SPACE_KEYWORDS ADDR_SPACE_KEYWORD ("__ea", ADDR_SPACE_EA)
+
+
 /* Builtins.  */
 
 enum spu_builtin_type
Index: gcc-head/gcc/config/spu/spu.opt
===================================================================
--- gcc-head.orig/gcc/config/spu/spu.opt
+++ gcc-head/gcc/config/spu/spu.opt
@@ -82,3 +82,32 @@ Generate code for given CPU
 mtune=
 Target RejectNegative Joined Var(spu_tune_string)
 Schedule code for given CPU
+
+mea32
+Target Report RejectNegative Var(spu_ea_model,32) Init(32)
+Access variables in 32-bit PPU objects (default)
+
+mea64
+Target Report RejectNegative Var(spu_ea_model,64) VarExists
+Access variables in 64-bit PPU objects
+
+maddress-space-conversion
+Target Report RejectNegative InverseMask(NO_ADDRESS_SPACE_CONVERSION)
+Allow conversions between __ea and generic pointers (default)
+
+mno-address-space-conversion
+Target Report RejectNegative Mask(NO_ADDRESS_SPACE_CONVERSION)
+Do not allow conversions between __ea and generic pointers
+
+mcache-size=
+Target Report RejectNegative Joined UInteger
+Size (in KB) of software data cache
+
+matomic-updates
+Target Report
+Atomically write back software data cache lines (default)
+
+mno-atomic-updates
+Target Report
+Do not atomically write back software data cache lines
+
Index: gcc-head/gcc/config/spu/spu_cache.h
===================================================================
--- /dev/null
+++ gcc-head/gcc/config/spu/spu_cache.h
@@ -0,0 +1,39 @@
+/* Copyright (C) 2008, 2009 Free Software Foundation, Inc.
+
+   This file is free software; you can redistribute it and/or modify it under
+   the terms of the GNU General Public License as published by the Free
+   Software Foundation; either version 3 of the License, or (at your option)
+   any later version.
+
+   This file is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+   FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+   for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _SPU_CACHE_H
+#define _SPU_CACHE_H
+
+void *__cache_fetch_dirty (__ea void *ea, int n_bytes_dirty);
+void *__cache_fetch (__ea void *ea);
+void __cache_evict (__ea void *ea);
+void __cache_flush (void);
+void __cache_touch (__ea void *ea);
+
+#define cache_fetch_dirty(_ea, _n_bytes_dirty) \
+     __cache_fetch_dirty(_ea, _n_bytes_dirty)
+
+#define cache_fetch(_ea) __cache_fetch(_ea)
+#define cache_touch(_ea) __cache_touch(_ea)
+#define cache_evict(_ea) __cache_evict(_ea)
+#define cache_flush() __cache_flush()
+
+#endif
Index: gcc-head/gcc/config/spu/t-spu-elf
===================================================================
--- gcc-head.orig/gcc/config/spu/t-spu-elf
+++ gcc-head/gcc/config/spu/t-spu-elf
@@ -66,14 +66,39 @@ fp-bit.c: $(srcdir)/config/fp-bit.c $(sr
 # Don't let CTOR_LIST end up in sdata section.
 CRTSTUFF_T_CFLAGS =
 
-#MULTILIB_OPTIONS=mlarge-mem/mtest-abi
-#MULTILIB_DIRNAMES=large-mem test-abi
-#MULTILIB_MATCHES=
+# Multi-lib support.
+MULTILIB_OPTIONS=mea64
 
 # Neither gcc or newlib seem to have a standard way to generate multiple
 # crt*.o files.  So we don't use the standard crt0.o name anymore.
 
-EXTRA_MULTILIB_PARTS = crtbegin.o crtend.o
+EXTRA_MULTILIB_PARTS = crtbegin.o crtend.o libgcc_cachemgr.a libgcc_cachemgr_nonatomic.a \
+	libgcc_cache8k.a libgcc_cache16k.a libgcc_cache32k.a libgcc_cache64k.a libgcc_cache128k.a
+
+$(T)cachemgr.o: $(srcdir)/config/spu/cachemgr.c
+	$(GCC_FOR_TARGET) $(LIBGCC2_CFLAGS) $(MULTILIB_CFLAGS) -c $< -o $@
+
+# Specialised rule to add a -D flag.
+$(T)cachemgr_nonatomic.o: $(srcdir)/config/spu/cachemgr.c
+	$(GCC_FOR_TARGET) $(LIBGCC2_CFLAGS) $(MULTILIB_CFLAGS) -DNONATOMIC -c $< -o $@
+
+$(T)libgcc_%.a: $(T)%.o
+	$(AR_FOR_TARGET) -rcs $@ $<
+
+$(T)cache8k.o: $(srcdir)/config/spu/cache.S
+	$(GCC_FOR_TARGET) $(MULTILIB_CFLAGS) -D__CACHE_SIZE__=8 -o $@ -c $<
+
+$(T)cache16k.o: $(srcdir)/config/spu/cache.S
+	$(GCC_FOR_TARGET) $(MULTILIB_CFLAGS) -D__CACHE_SIZE__=16 -o $@ -c $<
+
+$(T)cache32k.o: $(srcdir)/config/spu/cache.S
+	$(GCC_FOR_TARGET) $(MULTILIB_CFLAGS) -D__CACHE_SIZE__=32 -o $@ -c $<
+
+$(T)cache64k.o: $(srcdir)/config/spu/cache.S
+	$(GCC_FOR_TARGET) $(MULTILIB_CFLAGS) -D__CACHE_SIZE__=64 -o $@ -c $<
+
+$(T)cache128k.o: $(srcdir)/config/spu/cache.S
+	$(GCC_FOR_TARGET) $(MULTILIB_CFLAGS) -D__CACHE_SIZE__=128 -o $@ -c $<
 
 LIBGCC = stmp-multilib
 INSTALL_LIBGCC = install-multilib
Index: gcc-head/gcc/doc/invoke.texi
===================================================================
--- gcc-head.orig/gcc/doc/invoke.texi
+++ gcc-head/gcc/doc/invoke.texi
@@ -832,7 +832,11 @@ See RS/6000 and PowerPC Options.
 -msafe-dma -munsafe-dma @gol
 -mbranch-hints @gol
 -msmall-mem -mlarge-mem -mstdmain @gol
--mfixed-range=@var{register-range}}
+-mfixed-range=@var{register-range} @gol
+-mea32 -mea64 @gol
+-maddress-space-conversion -mno-address-space-conversion @gol
+-mcache-size=@var{cache-size} @gol
+-matomic-updates -mno-atomic-updates}
 
 @emph{System V Options}
 @gccoptlist{-Qy  -Qn  -YP,@var{paths}  -Ym,@var{dir}}
@@ -15919,6 +15923,46 @@ useful when compiling kernel code.  A re
 two registers separated by a dash.  Multiple register ranges can be
 specified separated by a comma.
 
+@item -mea32
+@itemx -mea64
+@opindex mea32
+@opindex mea64
+Compile code assuming that pointers to the PPU address space accessed
+via the @code{__ea} named address space qualifier are either 32 or 64
+bits wide.  The default is 32 bits.  As this is an ABI changing option,
+all object code in an executable must be compiled with the same setting.
+
+@item -maddress-space-conversion
+@itemx -mno-address-space-conversion
+@opindex maddress-space-conversion
+@opindex mno-address-space-conversion
+Allow/disallow treating the @code{__ea} address space as superset
+of the generic address space.  This enables explicit type casts
+between @code{__ea} and generic pointer as well as implicit
+conversions of generic pointers to @code{__ea} pointers.  The
+default is to allow address space pointer conversions.
+
+@item -mcache-size=@var{cache-size}
+@opindex mcache-size
+This option controls the version of libgcc that the compiler links to an
+executable and selects a software-managed cache for accessing variables
+in the @code{__ea} address space with a particular cache size.  Possible
+options for @var{cache-size} are @samp{8}, @samp{16}, @samp{32}, @samp{64}
+and @samp{128}.  The default cache size is 64KB.
+
+@item -matomic-updates
+@itemx -mno-atomic-updates
+@opindex matomic-updates
+@opindex mno-atomic-updates
+This option controls the version of libgcc that the compiler links to an
+executable and selects whether atomic updates to the software-managed
+cache of PPU-side variables are used.  If you use atomic updates, changes
+to a PPU variable from SPU code using the @code{__ea} named address space
+qualifier will not interfere with changes to other PPU variables residing
+in the same cache line from PPU code.  If you do not use atomic updates,
+such interference may occur; however, writing back cache lines will be
+more efficient.  The default behavior is to use atomic updates.
+
 @item -mdual-nops
 @itemx -mdual-nops=@var{n}
 @opindex mdual-nops
-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  Ulrich.Weigand@de.ibm.com

