RFA: DSE replace_read revisited

Sun Oct 28 11:54:00 GMT 2007

A few weeks ago, I posted a patch to fix some problems in the
new RTL DSE replace_read code:

    http://gcc.gnu.org/ml/gcc-patches/2007-09/msg01316.html

However, I found that the testcase added there still fails for some
MIPS targets, and investigating that led me to some wrong-code and
pessimisation issues too.  There seem to be several problems:

  (1) As I complained about in the message above, find_shift_sequence
      does the shift in the mode class of the stored value, rather than
      MODE_INT.  So if we have a V4HI store and a V2HI read (say), we'd
      use a vector shift rather than an integer shift.  We'd therefore
      shift the individual HI elements right, rather than shift the
      whole bit pattern.

  (2) When reading the low end of a stored value, the original code used
      gen_lowpart to do the mode punning:

        emit_move_insn (read_reg, gen_lowpart (read_mode, store_info->rhs));

      This ICEd when a DFmode store was followed by an SFmode move, as in:

        union u { double d; float f[2]; };

        float
        foo (union u *u, double d)
        {
          u->d = d;
          return u->f[1];
        }

      for big-endian targets, because SFmode subregs of DFmode values
      are not allowed.  My attempt to fix the integer cases for MIPS
      meant that we'd now convert the DFmode value into SFmode instead,
      which is of course also wrong.

      (Although this particular example isn't very realistic,
      the equivalent TFmode/DFmode one matters for targets with
      IEEE long doubles.)

  (3) The code triggers for cases where it is a pessimisation.
      E.g. on hard-float MIPS targets, if a DFmode value is stored in
      an FPR (as we hope most DFmode values are), we cannot directly
      treat the low part as an SFmode value.  The sequence generated
      for a lowpart would involve a move into and then out of integer
      registers, and it would have been better to keep the original load.

      However, on soft-float MIPS targets, replacing an SFmode load
      of a DFmode store is the right thing to do.

  (4) The code requires the store and read to have the same mode class:

        if (GET_MODE_CLASS (read_mode) != GET_MODE_CLASS (store_mode))
          return false;

      This seems unnecessarily restrictive.  I guess it was meant to
      be a simplifying assumption, but as per (1) and (2) above,
      it isn't really.

  (5) The shift case can't deal with floats, even though it would be
      profitable to do so on some soft-float targets.

  (6) The code only deals with REGs:

        if (store_info->is_set 
            /* No place to keep the value after ra.  */
            && !reload_completed
            /* The careful reviewer may wish to comment my checking that the
               rhs of a store is always a reg.  */
            && REG_P (SET_SRC (body))
            /* Sometimes the store and reload is used for truncation and
              rounding.  */
            && !(FLOAT_MODE_P (GET_MODE (mem)) && (flag_float_store)))

      This does indeed seem unnecessarily restrictive, and is the cause of
      the dse-1.c failures that led me here originally.  The code could
      easily handle subregs and constants as well.

Addressing (1)-(5) together, I think part of the problem is that the
code has too many special exceptions.  From a high-level perspective,
I think we need two things:

  (a) a way of telling whether a mode change is profitable

  (b) a way of extracting the low bits of a value of one arbitrary mode
      into a value of another arbitrary mode

(a) we already have in the form of MODES_TIEABLE_P.  So rather than
check whether the two modes are the same class, I think we should check
whether each individual mode change we make is profitable accordingly to
MODES_TIEABLE_P.

(b) we don't have (AFAIK), so I added a new routine called extract_low_bits.
Quoting the function's comment:

   Try to read the low bits of SRC as an rvalue of mode MODE, preserving
   the bit pattern.  SRC_MODE is the mode of SRC; if this is smaller than
   MODE, fill the upper bits with zeros.  Fail if the layout of either
   mode is unknown (as for CC modes) or if the extraction would involve
   unprofitable mode punning.  Return the value on success, otherwise
   return null.

   This is different from gen_lowpart* in these respects:

     - the returned value must always be considered an rvalue

     - when MODE is wider than SRC_MODE, the extraction involves
       a zero extension

     - when MODE is smaller than SRC_MODE, the extraction involves
       a truncation (and is thus subject to TRULY_NOOP_TRUNCATION).

   In other words, this routine performs a computation, whereas the
   gen_lowpart* routines are conceptually lvalue or rvalue subreg
   operations.

We don't need the extension case for DSE -- MODE is never bigger than
SRC_MODE -- but it falls out of the wash.  We might one day need to do
something like this even if the extraction _would_ involve unprofitable
mode punning; not in DSE, but elsewhere.  If that happens, we can always
add a parameter to control the check.

The patch below therefore:

  - allows the two modes to have different classes, fixing (4)

  - allows the stored value to be a SUBREG or a constant, fixing (6)

  - always does the shift in MODE_INT, fixing (1)

  - makes find_shift_sequence iterate on the shift mode rather than
    access_size, which is more efficient and IMO a little clearer

  - uses MODES_TIEABLE_P to decide whether it is profitable to convert
    the stored value into the shift mode

  - uses gen_simplify_subreg to convert the stored value into the shift
    mode; this is a paradoxical subreg operation, and is allowed to
    fail if the mode change is not valid

  - uses extract_low_bits to convert the shifted value into the read value

  - uses extract_low_bits to convert the store value into the read value
    when no shift is needed, fixing (2) and (3)

  - removes:

      if (access_size > UNITS_PER_WORD || FLOAT_MODE_P (store_mode))
        return false;

    the first check is redundant with find_shift_sequence and removing
    the second fixes (4)

  - computes the nonshift sequence as well as the shift sequence
    before trying the replacement, since the nonshift sequence
    can now fail as well.

FWIW, I tested the patch on CSiBE for mipsisa64-elf at -Os.
The results were:

jikespg:src/spacetab                             21540    21532 :   99.96%
jikespg:src/produce                              12664    12640 :   99.81%
jikespg:src/mkfirst                              17024    17012 :   99.93%
jikespg:src/remsp                                 9036     9024 :   99.87%
libmspack:test/md5                                4812     4808 :   99.92%
linux:kernel/sched                                5992     5988 :   99.93%
lwip:src/core/tcp_output                          3284     3272 :   99.63%
OpenTCP:tcp                                       7230     7242 :  100.17%
teem:src/limn/cam                                 3780     3772 :   99.79%
teem:src/dye/methodsDye                           2844     2820 :   99.16%
teem:src/gage/sclanswer                           4692     4684 :   99.83%
unrarlib:unrarlib/unrarlib                       15672    15676 :  100.03%
zlib:deflate                                      8460     8464 :  100.05%
----------------------------------------------------------------
Total                                          3605989  3605893 :  100.00%

The three increases are due to cases in which we convert a sign-extending
SImode load into a register extension; the former is a single instruction
whereas the latter takes two.  This is something that DSE could do for
REG arguments even before the patch, and it might suggest that we need
to check whether the new instruction is more expensive than the old.
However, the MIPS port gives the two instructions the same cost anyway,
even for -Os, so it wouldn't make any difference as things stand.

I'm afraid that giving a load or store a cost of 1 instruction will lead
to some silly optimisation decisions.  It might be worthwhile if we've
being very aggressive about optimising for size (as is usually the case
for MIPS16) but it seems dangerous otherwise.  I'm therefore not too
worried about the three extra instructions, given that the net saving
is much higher.

The TCP increase is 3 instructions rather than 1 because of a missed
SIGN_EXTEND optimisation in simplify-rtx.c.  I have a separate patch
for that, but I'm not convinced it's stage 3 material.

Bootstrapped & regression-tested on x86_64-linux-gnu.  Also
regression-tested on mipsisa64-elf and mipsisa32-elf, fixing
dse-1.c for the former.  I've added a case that shows why
handling constants is worthwhile.  OK to install?

Richard


gcc/
	* Makefile.in (dse.o): Depend on $(TM_P_H).
	* optabs.h (extract_low_bits): Declare.
	* optabs.c (extract_low_bits): New function.
	* rtlhooks.c (gen_lowpart_general): Generalize SUBREG handling.
	* dse.c: Include tm_p.h.
	(find_shift_sequence): Remove the read_reg argument and return the
	read value.  Emit the instructions instead of returning them.
	Iterate on new_mode rather than calculating it each time.
	Check MODES_TIEABLE_P.  Use simplify_gen_subreg to convert the
	source to NEW_MODE and extract_low_bits to convert the shifted
	value to READ_MODE.
	(replace_read): Allow the load and store to have different mode
	classes.  Use extract_low_bits when SHIFT == 0.  Create the shift
	or extraction instructions before trying the replacement.  Update
	dump-file code accordingly, avoiding use of REGNO (store_info->rhs).

gcc/testsuite/
	* gcc.target/mips/dse-1.c: Add checks for zeros.

Index: gcc/Makefile.in
===================================================================

--- gcc/Makefile.in	2007-10-27 09:25:22.000000000 +0100
+++ gcc/Makefile.in	2007-10-27 09:34:07.000000000 +0100
@@ -2537,8 +2537,8 @@ dce.o : dce.c $(CONFIG_H) $(SYSTEM_H) co
    $(TREE_H) $(REGS_H) hard-reg-set.h $(FLAGS_H) $(DF_H) cselib.h \
    $(DBGCNT_H) dce.h timevar.h tree-pass.h $(DBGCNT_H)
 dse.o : dse.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
-   $(TREE_H) $(REGS_H) hard-reg-set.h $(FLAGS_H) insn-config.h $(RECOG_H) \
-   $(EXPR_H) $(DF_H) cselib.h $(DBGCNT_H) timevar.h tree-pass.h \
+   $(TREE_H) $(TM_P_H) $(REGS_H) hard-reg-set.h $(FLAGS_H) insn-config.h \
+   $(RECOG_H) $(EXPR_H) $(DF_H) cselib.h $(DBGCNT_H) timevar.h tree-pass.h \
    alloc-pool.h $(ALIAS_H) dse.h
 fwprop.o : fwprop.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
    toplev.h insn-config.h $(RECOG_H) $(FLAGS_H) $(OBSTACK_H) $(BASIC_BLOCK_H) \
Index: gcc/optabs.h
===================================================================
--- gcc/optabs.h	2007-10-27 09:25:22.000000000 +0100
+++ gcc/optabs.h	2007-10-27 09:34:07.000000000 +0100
@@ -770,6 +770,8 @@ extern rtx expand_vec_cond_expr (tree, r
 /* Generate code for VEC_LSHIFT_EXPR and VEC_RSHIFT_EXPR.  */
 extern rtx expand_vec_shift_expr (tree, rtx);
 
+extern rtx extract_low_bits (enum machine_mode, enum machine_mode, rtx);
+
 #define optab_handler(optab,mode) (&(optab)->handlers[(int) (mode)])
 #define convert_optab_handler(optab,mode,mode2) \
 	(&(optab)->handlers[(int) (mode)][(int) (mode2)])
Index: gcc/optabs.c
===================================================================
--- gcc/optabs.c	2007-10-27 09:25:22.000000000 +0100
+++ gcc/optabs.c	2007-10-28 09:26:43.000000000 +0000
@@ -4984,6 +4984,58 @@ gen_move_insn (rtx x, rtx y)
   end_sequence ();
   return seq;
 }
+
+/* Try to read the low bits of SRC as an rvalue of mode MODE, preserving
+   the bit pattern.  SRC_MODE is the mode of SRC; if this is smaller than
+   MODE, fill the upper bits with zeros.  Fail if the layout of either
+   mode is unknown (as for CC modes) or if the extraction would involve
+   unprofitable mode punning.  Return the value on success, otherwise
+   return null.
+
+   This is different from gen_lowpart* in these respects:
+
+     - the returned value must always be considered an rvalue
+
+     - when MODE is wider than SRC_MODE, the extraction involves
+       a zero extension
+
+     - when MODE is smaller than SRC_MODE, the extraction involves
+       a truncation (and is thus subject to TRULY_NOOP_TRUNCATION).
+
+   In other words, this routine performs a computation, whereas the
+   gen_lowpart* routines are conceptually lvalue or rvalue subreg
+   operations.  */
+
+rtx
+extract_low_bits (enum machine_mode mode, enum machine_mode src_mode, rtx src)
+{
+  enum machine_mode int_mode, src_int_mode;
+
+  if (mode == src_mode)
+    return src;
+
+  if (CONSTANT_P (src))
+    return simplify_gen_subreg (mode, src, src_mode,
+				subreg_lowpart_offset (mode, src_mode));
+
+  if (GET_MODE_CLASS (mode) == MODE_CC || GET_MODE_CLASS (src_mode) == MODE_CC)
+    return NULL_RTX;
+
+  src_int_mode = int_mode_for_mode (src_mode);
+  int_mode = int_mode_for_mode (mode);
+  if (src_int_mode == BLKmode || int_mode == BLKmode)
+    return NULL_RTX;
+
+  if (!MODES_TIEABLE_P (src_int_mode, src_mode))
+    return NULL_RTX;
+  if (!MODES_TIEABLE_P (int_mode, mode))
+    return NULL_RTX;
+
+  src = gen_lowpart (src_int_mode, src);
+  src = convert_modes (int_mode, src_int_mode, src, true);
+  src = gen_lowpart (mode, src);
+  return src;
+}
 
 /* Return the insn code used to extend FROM_MODE to TO_MODE.
    UNSIGNEDP specifies zero-extension instead of sign-extension.  If
Index: gcc/rtlhooks.c
===================================================================
--- gcc/rtlhooks.c	2007-10-27 09:25:22.000000000 +0100
+++ gcc/rtlhooks.c	2007-10-27 09:34:07.000000000 +0100
@@ -43,11 +43,9 @@ gen_lowpart_general (enum machine_mode m
 
   if (result)
     return result;
-  /* If it's a REG, it must be a hard reg that's not valid in MODE.  */
-  else if (REG_P (x)
-	   /* Or we could have a subreg of a floating point value.  */
-	   || (GET_CODE (x) == SUBREG
-	       && FLOAT_MODE_P (GET_MODE (SUBREG_REG (x)))))
+  /* Handle SUBREGs and hard REGs that were rejected by
+     simplify_gen_subreg.  */
+  else if (REG_P (x) || GET_CODE (x) == SUBREG)
     {
       result = gen_lowpart_common (mode, copy_to_reg (x));
       gcc_assert (result != 0);
Index: gcc/dse.c
===================================================================
--- gcc/dse.c	2007-10-27 09:25:22.000000000 +0100
+++ gcc/dse.c	2007-10-27 09:34:07.000000000 +0100
@@ -29,6 +29,7 @@ Software Foundation; either version 3, o
 #include "tm.h"
 #include "rtl.h"
 #include "tree.h"
+#include "tm_p.h"
 #include "regs.h"
 #include "hard-reg-set.h"
 #include "flags.h"
@@ -1381,9 +1382,9 @@ record_store (rtx body, bb_info_t bb_inf
   if (store_info->is_set 
       /* No place to keep the value after ra.  */
       && !reload_completed
-      /* The careful reviewer may wish to comment my checking that the
-	 rhs of a store is always a reg.  */
-      && REG_P (SET_SRC (body))
+      && (REG_P (SET_SRC (body))
+	  || GET_CODE (SET_SRC (body)) == SUBREG
+	  || CONSTANT_P (SET_SRC (body)))
       /* Sometimes the store and reload is used for truncation and
 	 rounding.  */
       && !(FLOAT_MODE_P (GET_MODE (mem)) && (flag_float_store)))
@@ -1416,15 +1417,15 @@ dump_insn_info (const char * start, insn
    shift.  */
 
 static rtx
-find_shift_sequence (rtx read_reg,
-		     int access_size,
+find_shift_sequence (int access_size,
 		     store_info_t store_info,
 		     read_info_t read_info,
 		     int shift)
 {
   enum machine_mode store_mode = GET_MODE (store_info->mem);
   enum machine_mode read_mode = GET_MODE (read_info->mem);
-  rtx chosen_seq = NULL;
+  enum machine_mode new_mode;
+  rtx read_reg = NULL;
 
   /* Some machines like the x86 have shift insns for each size of
      operand.  Other machines like the ppc or the ia-64 may only have
@@ -1433,21 +1434,31 @@ find_shift_sequence (rtx read_reg,
      justify the value we want to read but is available in one insn on
      the machine.  */
 
-  for (; access_size <= UNITS_PER_WORD; access_size *= 2)
+  for (new_mode = smallest_mode_for_size (access_size * BITS_PER_UNIT,
+					  MODE_INT);
+       GET_MODE_BITSIZE (new_mode) <= BITS_PER_WORD;
+       new_mode = GET_MODE_WIDER_MODE (new_mode))
     {
-      rtx target, new_reg, shift_seq, insn;
-      enum machine_mode new_mode;
+      rtx target, new_reg, shift_seq, insn, new_lhs;
       int cost;
 
-      /* Try a wider mode if truncating the store mode to ACCESS_SIZE
-	 bytes requires a real instruction.  */
-      if (access_size < GET_MODE_SIZE (store_mode)
-	  && !TRULY_NOOP_TRUNCATION (access_size * BITS_PER_UNIT,
+      /* Try a wider mode if truncating the store mode to NEW_MODE
+	 requires a real instruction.  */
+      if (GET_MODE_BITSIZE (new_mode) < GET_MODE_BITSIZE (store_mode)
+	  && !TRULY_NOOP_TRUNCATION (GET_MODE_BITSIZE (new_mode),
 				     GET_MODE_BITSIZE (store_mode)))
 	continue;
 
-      new_mode = smallest_mode_for_size (access_size * BITS_PER_UNIT,
-					 GET_MODE_CLASS (read_mode));
+      /* Also try a wider mode if the necessary punning is either not
+	 desirable or not possible.  */
+      if (!CONSTANT_P (store_info->rhs)
+	  && !MODES_TIEABLE_P (new_mode, store_mode))
+	continue;
+      new_lhs = simplify_gen_subreg (new_mode, copy_rtx (store_info->rhs),
+				     store_mode, 0);
+      if (new_lhs == NULL_RTX)
+	continue;
+
       new_reg = gen_reg_rtx (new_mode);
 
       start_sequence ();
@@ -1484,31 +1495,13 @@ find_shift_sequence (rtx read_reg,
 	 take the value from the store and put it into the
 	 shift pseudo, then shift it, then generate another
 	 move to put in into the target of the read.  */
-      start_sequence ();
-      emit_move_insn (new_reg, gen_lowpart (new_mode, store_info->rhs));
+      emit_move_insn (new_reg, new_lhs);
       emit_insn (shift_seq);
-      convert_move (read_reg, new_reg, 1);
-		  
-      if (dump_file)
-	{
-	  fprintf (dump_file, " -- adding extract insn r%d:%s = r%d:%s\n",
-		   REGNO (new_reg), GET_MODE_NAME (new_mode),
-		   REGNO (store_info->rhs), GET_MODE_NAME (store_mode));
-		      
-	  fprintf (dump_file, " -- with shift of r%d by %d\n",
-		   REGNO(new_reg), shift);
-	  fprintf (dump_file, " -- and second extract insn r%d:%s = r%d:%s\n",
-		   REGNO (read_reg), GET_MODE_NAME (read_mode),
-		   REGNO (new_reg), GET_MODE_NAME (new_mode));
-	}
-		  
-      /* Get the three insn sequence and return it.  */
-      chosen_seq = get_insns ();
-      end_sequence ();
+      read_reg = extract_low_bits (read_mode, new_mode, new_reg);
       break;
     }
 
-  return chosen_seq;
+  return read_reg;
 }
 
 
@@ -1551,15 +1544,11 @@ replace_read (store_info_t store_info, i
   enum machine_mode read_mode = GET_MODE (read_info->mem);
   int shift;
   int access_size; /* In bytes.  */
-  rtx read_reg = gen_reg_rtx (read_mode);
-  rtx shift_seq = NULL;
+  rtx insns, read_reg;
 
   if (!dbg_cnt (dse))
     return false;
 
-  if (GET_MODE_CLASS (read_mode) != GET_MODE_CLASS (store_mode))
-    return false;
-
   /* To get here the read is within the boundaries of the write so
      shift will never be negative.  Start out with the shift being in
      bytes.  */
@@ -1573,62 +1562,43 @@ replace_read (store_info_t store_info, i
   /* From now on it is bits.  */
   shift *= BITS_PER_UNIT;
 
-  /* We need to keep this in perspective.  We are replacing a read
+  /* Create a sequence of instructions to set up the read register.
+     This sequence goes immediately before the store and its result
+     is read by the load.
+
+     We need to keep this in perspective.  We are replacing a read
      with a sequence of insns, but the read will almost certainly be
      in cache, so it is not going to be an expensive one.  Thus, we
      are not willing to do a multi insn shift or worse a subroutine
      call to get rid of the read.  */
+  if (dump_file)
+    fprintf (dump_file, "trying to replace %smode load in insn %d"
+	     " from %smode store in insn %d\n",
+	     GET_MODE_NAME (read_mode), INSN_UID (read_insn->insn),
+	     GET_MODE_NAME (store_mode), INSN_UID (store_insn->insn));
+  start_sequence ();
   if (shift)
+    read_reg = find_shift_sequence (access_size, store_info, read_info, shift);
+  else
+    read_reg = extract_low_bits (read_mode, store_mode,
+				 copy_rtx (store_info->rhs));
+  if (read_reg == NULL_RTX)
     {
-      if (access_size > UNITS_PER_WORD || FLOAT_MODE_P (store_mode))
-	return false;
-
-      shift_seq = find_shift_sequence (read_reg, access_size, store_info,
-				       read_info, shift);
-      if (!shift_seq)
-	return false;
+      end_sequence ();
+      if (dump_file)
+	fprintf (dump_file, " -- could not extract bits of stored value\n");
+      return false;
     }
-
-  if (dump_file)
-    fprintf (dump_file, "replacing load at %d from store at %d\n",
-	     INSN_UID (read_insn->insn), INSN_UID (store_insn->insn)); 
+  /* Force the value into a new register so that it won't be clobbered
+     between the store and the load.  */
+  read_reg = copy_to_mode_reg (read_mode, read_reg);
+  insns = get_insns ();
+  end_sequence ();
 
   if (validate_change (read_insn->insn, loc, read_reg, 0))
     {
-      rtx insns;
       deferred_change_t deferred_change = pool_alloc (deferred_change_pool);
       
-      if (read_mode == store_mode)
-	{
-	  start_sequence ();
-	  
-	  /* The modes are the same and everything lines up.  Just
-	     generate a simple move.  */
-	  emit_move_insn (read_reg, store_info->rhs);
-	  if (dump_file)
-	    fprintf (dump_file, " -- adding move insn r%d = r%d\n",
-		     REGNO (read_reg), REGNO (store_info->rhs));
-	  insns = get_insns ();
-	  end_sequence ();
-	}
-      else if (shift)
-	insns = shift_seq;
-      else
-	{
-	  /* The modes are different but the lsb are in the same
-	     place, we need to extract the value in the right from the
-	     rhs of the store.  */
-	  start_sequence ();
-	  convert_move (read_reg, store_info->rhs, 1);
-	  
-	  if (dump_file)
-	    fprintf (dump_file, " -- adding extract insn r%d:%s = r%d:%s\n",
-		     REGNO (read_reg), GET_MODE_NAME (read_mode),
-		     REGNO (store_info->rhs), GET_MODE_NAME (store_mode));
-	  insns = get_insns ();
-	  end_sequence ();
-	}
-
       /* Insert this right before the store insn where it will be safe
 	 from later insns that might change it before the read.  */
       emit_insn_before (insns, store_insn->insn);
@@ -1666,12 +1636,22 @@ replace_read (store_info_t store_info, i
 	 rest of dse, play like this read never happened.  */
       read_insn->read_rec = read_info->next;
       pool_free (read_info_pool, read_info);
+      if (dump_file)
+	{
+	  fprintf (dump_file, " -- replaced the loaded MEM with ");
+	  print_simple_rtl (dump_file, read_reg);
+	  fprintf (dump_file, "\n");
+	}
       return true;
     }
   else 
     {
       if (dump_file)
-	fprintf (dump_file, " -- validation failure\n"); 
+	{
+	  fprintf (dump_file, " -- replacing the loaded MEM with ");
+	  print_simple_rtl (dump_file, read_reg);
+	  fprintf (dump_file, " led to an invalid instruction\n");
+	}
       return false;
     }
 }
Index: gcc/testsuite/gcc.target/mips/dse-1.c
===================================================================
--- gcc/testsuite/gcc.target/mips/dse-1.c	2007-10-27 09:25:22.000000000 +0100
+++ gcc/testsuite/gcc.target/mips/dse-1.c	2007-10-27 09:34:07.000000000 +0100
@@ -14,6 +14,13 @@ #define TEST(ID, TYPE1, TYPE2)					\
     u->m2 = x;							\
     return (u->m1[0]						\
 	    + u->m1[sizeof (TYPE2) / sizeof (TYPE1) - 1]);	\
+  }								\
+								\
+  TYPE1 __attribute__((nomips16))				\
+  g##ID (union u##ID *u)					\
+  {								\
+    u->m2 = 0;							\
+    return (u->m1[0] | u->m1[1]);				\
   }
 
 TEST (1, unsigned int, unsigned long long);