Bug 24230 - [4.1 Regression] ICE in extract_insn with altivec
Summary: [4.1 Regression] ICE in extract_insn with altivec
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 4.1.0
Importance: P1 normal
Target Milestone: 4.1.0
Assignee: Paolo Bonzini
URL:
Keywords: ice-on-valid-code
Depends on:
Blocks:
 
Reported: 2005-10-06 09:44 UTC by Richard Biener
Modified: 2005-11-07 10:41 UTC
CC List: 6 users

See Also:
Host:
Target: powerpc*-*-*
Build:
Known to work:
Known to fail: 4.1.0
Last reconfirmed: 2005-11-01 21:23:04


Attachments
testcase (8.50 KB, text/plain)
2005-10-06 09:45 UTC, Richard Biener
reduced testcase (547 bytes, text/plain)
2005-10-11 15:41 UTC, Paolo Bonzini
original source file from xvid (xvidcore-1.1.0-beta2) (2.70 KB, text/plain)
2005-10-31 20:22 UTC, Richard Biener

Description Richard Biener 2005-10-06 09:44:48 UTC
/usr/lib/gcc/powerpc64-suse-linux/4.1.0/cc1 -fpreprocessed qpel_altivec.i -quiet -dumpbase qpel_altivec.c -maltivec -mabi=altivec -auxbase-strip =build/image/ppc_asm/qpel_altivec.o -O2 -Wall -version -fPIC -fmessage-length=0 -o qpel_altivec.s
../../src/image/ppc_asm/qpel_altivec.c:414: error: unrecognizable insn:
(insn 711 329 710 4 (set (reg:V16QI 90 13)
        (mem/u/c/i:V16QI (symbol_ref/u:SI ("*.LC254") [flags 0x2]) [0 S16 A128])) -1 (nil)
    (nil))
../../src/image/ppc_asm/qpel_altivec.c:414: internal compiler error: in extract_insn, at recog.c:2084
Please submit a full bug report,
with preprocessed source if appropriate.
See <URL:http://www.suse.de/feedback> for instructions.

The testcase is loads of preprocessor-generated altivec intrinsics.  I don't
know whether this is a regression.
Comment 1 Richard Biener 2005-10-06 09:45:23 UTC
Created attachment 9905 [details]
testcase

Preprocessed testcase.
Comment 2 Richard Biener 2005-10-06 11:11:23 UTC
Reducing.
Comment 3 Richard Biener 2005-10-06 11:26:05 UTC
Reduced testcase:

  typedef int int32_t;
    typedef unsigned char uint8_t;
    static const __attribute__((altivec(vector__))) signed char FIR_Tab_16[17] = {  
  };
    void H_Pass_16_Altivec_C(uint8_t *Dst, const uint8_t *Src, int32_t H, int32_t BpS, int32_t Rnd) {
  register __attribute__((altivec(vector__))) signed short sums1,sums2;
  register __attribute__((altivec(vector__))) unsigned char ox00;
  register __attribute__((altivec(vector__))) signed char firs;
  __attribute__((altivec(vector__))) unsigned char vec_src;
  __attribute__((altivec(vector__))) unsigned char tmp;
  while(H-- > 0) {
 sums1 = __builtin_vec_mladd( (__attribute__((altivec(vector__))) signed short)__builtin_vec_mergeh(ox00, tmp), __builtin_vec_unpackh(firs), sums1 );
 sums2 = __builtin_vec_mladd( (__attribute__((altivec(vector__))) signed short)__builtin_vec_mergel(ox00, tmp), __builtin_vec_unpackl(firs), sums2 );
 firs = FIR_Tab_16[2];
 tmp = __builtin_vec_splat(vec_src,(2));
 sums1 = __builtin_vec_mladd( (__attribute__((altivec(vector__))) signed short)__builtin_vec_mergeh(ox00, tmp), __builtin_vec_unpackh(firs), sums1 );
 sums2 = __builtin_vec_mladd( (__attribute__((altivec(vector__))) signed short)__builtin_vec_mergel(ox00, tmp), __builtin_vec_unpackl(firs), sums2 );
 firs = FIR_Tab_16[3];
 tmp = __builtin_vec_splat(vec_src,(3));
 sums1 = __builtin_vec_mladd( (__attribute__((altivec(vector__))) signed short)__builtin_vec_mergeh(ox00, tmp), __builtin_vec_unpackh(firs), sums1 );
 sums2 = __builtin_vec_mladd( (__attribute__((altivec(vector__))) signed short)__builtin_vec_mergel(ox00, tmp), __builtin_vec_unpackl(firs), sums2 );
 firs = FIR_Tab_16[4];
 tmp = __builtin_vec_splat(vec_src,(4));
 sums1 = __builtin_vec_mladd( (__attribute__((altivec(vector__))) signed short)__builtin_vec_mergeh(ox00, tmp), __builtin_vec_unpackh(firs), sums1 );
 sums2 = __builtin_vec_mladd( (__attribute__((altivec(vector__))) signed short)__builtin_vec_mergel(ox00, tmp), __builtin_vec_unpackl(firs), sums2 );
 firs = FIR_Tab_16[5];
 tmp = __builtin_vec_splat(vec_src,(5));
 sums1 = __builtin_vec_mladd( (__attribute__((altivec(vector__))) signed short)__builtin_vec_mergeh(ox00, tmp), __builtin_vec_unpackh(firs), sums1 );
 sums2 = __builtin_vec_mladd( (__attribute__((altivec(vector__))) signed short)__builtin_vec_mergel(ox00, tmp), __builtin_vec_unpackl(firs), sums2 );
 firs = FIR_Tab_16[6];
 tmp = __builtin_vec_splat(vec_src,(6));
 *((uint8_t*)&tmp) = Src[16*1];
 sums1 = __builtin_vec_mladd( (__attribute__((altivec(vector__))) signed short)__builtin_vec_mergeh(ox00,tmp),__builtin_vec_unpackh(firs),sums1 );
 sums2 = __builtin_vec_mladd( (__attribute__((altivec(vector__))) signed short)__builtin_vec_mergel(ox00,tmp),__builtin_vec_unpackl(firs),sums2 );
 tmp = (__attribute__((altivec(vector__))) unsigned char)((__attribute__((altivec(vector__))) unsigned short) __builtin_altivec_vspltish (((5))));
 sums1 = __builtin_vec_sra(sums1,(__attribute__((altivec(vector__))) unsigned short)tmp);
 sums2 = __builtin_vec_sra(sums2,(__attribute__((altivec(vector__))) unsigned short)tmp);
 tmp = __builtin_vec_packsu(sums1,sums2);
 }
 }
Comment 4 Andrew Pinski 2005-10-06 16:23:05 UTC
Works on PPC-darwin.
Comment 5 Andrew Pinski 2005-10-06 16:24:13 UTC
(In reply to comment #4)
> Works on PPC-darwin.
That was with the reduced testcase.  With the full testcase I can reproduce the ICE there.

Reducing a testcase for ppc-darwin.
Comment 6 Andrew Pinski 2005-10-06 16:44:00 UTC
Reduced testcase:
typedef int int32_t;
typedef unsigned char uint8_t;
typedef __attribute__((altivec(vector__))) signed short vss;
typedef __attribute__((altivec(vector__))) unsigned short vus;
typedef __attribute__((altivec(vector__))) signed char vsc;
typedef __attribute__((altivec(vector__))) unsigned char vuc;
uint8_t *Src;
vsc FIR_Tab_16[17];
void H_Pass_16_Altivec_C(vuc vec_src, vsc firs, vss sums1, vss sums2)
{
  vss t;
  vuc tmp;
  int H = 10;
  while(H-- > 0)
  {
    tmp = __builtin_vec_splat(vec_src,(3));
    t = (vss)__builtin_vec_mergeh(tmp, tmp);
    sums1 = __builtin_vec_mladd( (vss)__builtin_vec_mergeh(tmp, tmp), __builtin_vec_unpackh(firs), sums1 );
    sums2 = __builtin_vec_mladd( (vss)__builtin_vec_mergel(tmp, tmp), __builtin_vec_unpackl(firs), sums2 );
    firs = FIR_Tab_16[4];
    tmp = __builtin_vec_splat(vec_src,(4));
    sums1 = __builtin_vec_mladd( (vss)__builtin_vec_mergeh(tmp, tmp), __builtin_vec_unpackh(firs), sums1 );
    sums2 = __builtin_vec_mladd( (vss)__builtin_vec_mergel(tmp, tmp), __builtin_vec_unpackl(firs), sums2 );
    firs = FIR_Tab_16[5];
    tmp = __builtin_vec_splat(vec_src,(5));
    sums1 = __builtin_vec_mladd( (vss)__builtin_vec_mergeh(tmp, tmp), __builtin_vec_unpackh(firs), sums1 );
    sums2 = __builtin_vec_mladd( (vss)__builtin_vec_mergel(tmp, tmp), __builtin_vec_unpackl(firs), sums2 );
    firs = FIR_Tab_16[6];
    tmp = __builtin_vec_splat(vec_src,(6));
    sums1 = __builtin_vec_mladd( (vss)__builtin_vec_mergeh(tmp, tmp), __builtin_vec_unpackh(firs), sums1 );
    sums2 = __builtin_vec_mladd( (vss)__builtin_vec_mergel(tmp, tmp), __builtin_vec_unpackl(firs), sums2 );
    firs = FIR_Tab_16[7];
    tmp = __builtin_vec_splat(vec_src,(7));
    sums1 = __builtin_vec_mladd( (vss)__builtin_vec_mergeh(tmp, tmp), __builtin_vec_unpackh(firs), sums1 );
    sums2 = __builtin_vec_mladd( (vss)__builtin_vec_mergel(tmp, tmp), __builtin_vec_unpackl(firs), sums2 );
    firs = FIR_Tab_16[8];
    tmp = __builtin_vec_splat(vec_src,(8));
    sums1 = __builtin_vec_mladd( (vss)__builtin_vec_mergeh(tmp, tmp), __builtin_vec_unpackh(firs), sums1 );
    firs = FIR_Tab_16[9];
    tmp = __builtin_vec_splat(vec_src,(9));
    *((char*)&tmp) = Src[16*1];
    sums1 = __builtin_vec_mladd( (vss)__builtin_vec_mergeh(tmp,tmp),__builtin_vec_unpackh(firs),sums1 );
    sums2 = __builtin_vec_mladd( (vss)__builtin_vec_mergel(tmp,tmp),__builtin_vec_unpackl(firs),sums2 );
    tmp = (vuc)((vus) __builtin_altivec_vspltish (((5))));
    sums1 = __builtin_vec_sra(sums1,(vus)tmp);
    tmp = __builtin_vec_packsu(sums1,sums2);
  }
}
Comment 7 Andrew Pinski 2005-10-06 16:49:49 UTC
I am going to say this is a 4.1 regression.  We are not legitimizing the memory address for some reason.
Comment 8 janis187 2005-10-06 20:17:05 UTC
A regression hunt on powerpc-linux using the testcase from comment #6
identified this patch from rth:

  http://gcc.gnu.org/ml/gcc-cvs/2005-08/msg01004.html
Comment 9 Paolo Bonzini 2005-10-07 07:19:40 UTC
I'm looking at it.
Comment 10 Paolo Bonzini 2005-10-11 11:38:35 UTC
This is as small as I could make it.  Any other attempt to hoist something causes it not to fail anymore (at -maltivec -O2).  Interesting, given that GCSE *does* the hoisting...

It's a reload problem.



typedef __attribute__((vector_size (16))) unsigned char vec;
void H_Pass_16_Altivec_C(vec vec_src, vec firs, vec sums1, vec sums2,
                         vec *FIR_Tab_16, unsigned char *Src)
{
  vec tmp, spltb3, spltb4, spltb5, spltb6, mrghb3, mrglb3, mrghb4, mrglb4,
    mrghb5, mrglb5, mrghb6, mrglb6, firs0, firs1, firs2, firs3, upkhb0, upklb0,
    upkhb1, upklb1, upkhb2, upklb2, upkhb3, upklb3, upkhb4, spltb7, mrghb7,
    mrglb7, firs4, spltb8, mrghb8, mrglb8, upklb4;

  spltb3 = __builtin_altivec_vspltb (vec_src, 3);
  spltb4 = __builtin_altivec_vspltb (vec_src, 4);
  spltb5 = __builtin_altivec_vspltb (vec_src, 5);
  spltb6 = __builtin_altivec_vspltb (vec_src, 6);
  mrghb3 = __builtin_altivec_vmrghb (spltb3, spltb3);
  mrglb3 = __builtin_altivec_vmrglb (spltb3, spltb3);
  mrghb4 = __builtin_altivec_vmrghb (spltb4, spltb4);
  mrglb4 = __builtin_altivec_vmrglb (spltb4, spltb4);
  mrghb5 = __builtin_altivec_vmrghb (spltb5, spltb5);
  mrglb5 = __builtin_altivec_vmrglb (spltb5, spltb5);
  mrghb6 = __builtin_altivec_vmrghb (spltb6, spltb6);
  mrglb6 = __builtin_altivec_vmrglb (spltb6, spltb6);
  firs0 = FIR_Tab_16[0];
  firs1 = FIR_Tab_16[1];
  firs2 = FIR_Tab_16[2];
  firs3 = FIR_Tab_16[3];
  upkhb0 = __builtin_altivec_vupkhsb (firs0);
  upklb0 = __builtin_altivec_vupklsb (firs0);
  upkhb1 = __builtin_altivec_vupkhsb (firs1);
  upklb1 = __builtin_altivec_vupklsb (firs1);
  upkhb2 = __builtin_altivec_vupkhsb (firs2);
  upklb2 = __builtin_altivec_vupklsb (firs2);
  upkhb3 = __builtin_altivec_vupkhsb (firs3);
  upklb3 = __builtin_altivec_vupklsb (firs3);
  upkhb4 = __builtin_altivec_vupkhsb (firs);
  *(char *) &tmp = (char) *(Src + 16);
L0:
  sums1 = __builtin_altivec_vmladduhm (mrghb3, upkhb4, sums1);
  sums2 = __builtin_altivec_vmladduhm (mrglb3, upkhb4, sums2);
  sums1 = __builtin_altivec_vmladduhm (mrghb4, upkhb0, sums1);
  sums2 = __builtin_altivec_vmladduhm (mrglb4, upklb0, sums2);
  sums1 = __builtin_altivec_vmladduhm (mrghb5, upkhb1, sums1);
  sums2 = __builtin_altivec_vmladduhm (mrglb5, upklb1, sums2);
  sums1 = __builtin_altivec_vmladduhm (mrghb6, upkhb2, sums1);
  sums2 = __builtin_altivec_vmladduhm (mrglb6, upklb2, sums2);
  spltb7 = __builtin_altivec_vspltb (vec_src, 7);
  mrghb7 = __builtin_altivec_vmrghb (spltb7, spltb7);
  sums1 = __builtin_altivec_vmladduhm (mrghb7, upkhb3, sums1);
  mrglb7 = __builtin_altivec_vmrglb (spltb7, spltb7);
  sums2 = __builtin_altivec_vmladduhm (mrglb7, upklb3, sums2);
  firs4 = FIR_Tab_16[4];
  spltb8 = __builtin_altivec_vspltb (vec_src, 8);
  mrghb8 = __builtin_altivec_vmrghb (spltb8, spltb8);
  upkhb4 = __builtin_altivec_vupkhsb (firs4);
  sums1 = __builtin_altivec_vmladduhm (mrghb8, upkhb4, sums1);
  mrglb8 = __builtin_altivec_vmrglb ((vec)tmp, spltb8);
  upklb4 = __builtin_altivec_vupklsb (firs4);
  sums2 = __builtin_altivec_vmladduhm (mrglb8, upklb4, sums2);
  tmp = __builtin_altivec_vspltish (5);
  sums1 = __builtin_altivec_vsrah (sums1, tmp);
  __builtin_altivec_vpkshus (sums1, sums2);
  goto L0;
}
Comment 11 Paolo Bonzini 2005-10-11 15:41:25 UTC
Created attachment 9967 [details]
reduced testcase

reduced testcase, but with uninitialized variables.  top of tree:

2005-09-29  Paolo Bonzini  <bonzini@gnu.org>

        Revert this patch:

        2005-09-15  Paolo Bonzini  <bonzini@gnu.org>

        * optabs.c (expand_binop): Use swap_commutative_operands_with_target
        to order operands.
        (swap_commutative_operands_with_target): New.
Comment 12 Richard Biener 2005-10-26 12:54:55 UTC
reload -> Micha, can you try to track this down?  It makes xvid ICE on beta-ppc.
Comment 13 Steven Bosscher 2005-10-28 15:58:23 UTC
Smaller test case:

// Compile with -O2 -maltivec
//
// Works with GCC 3.3.5 and GCC 4.0.2
// ICEs with GCC 4.1 from today's CVS
#include <altivec.h>
#define REGLIST                                                              \
         "77",  "78",  "79",  "80",  "81",  "82",  "83",  "84",  "85",  "86",\
         "87",  "88",  "89",  "90",  "91",  "92",  "93",  "94",  "95",  "96",\
         "97",  "98",  "99", "100", "101", "102", "103", "104", "105", "106",\
        "107", "108"
 
 
void
foo (int H)
{
  volatile __attribute__ ((altivec (vector__))) unsigned char tmp;
  while (H-- > 0)
    {
      asm ("" : : : REGLIST);
 
      tmp =
        ( __attribute__ ((altivec (vector__))) unsigned
         char) (( __attribute__ ((altivec (vector__))) unsigned short)
                vec_splat_s16 (((5))));
    }
}

Note that this is really a register allocation problem that we fail on because our register allocator doesn't know about liveness inside blocks, only at the start and end of a block.  But the situation is easily reproducible as long as you pump the register pressure up far enough.

The problem seems to be in reload const-to-mem.  We start with this:

(insn:HI 26 22 56 2 (set (mem/v/c/i:V16QI (plus:SI (reg/f:SI 113 sfp)
                (const_int 16 [0x10])) [0 tmp+0 S16 A128])
        (subreg:V16QI (reg:V8HI 128) 0)) 467 {altivec_stvx_v16qi} (insn_list:REG_DEP_TRUE 22 (nil))
    (expr_list:REG_EQUAL (const_vector:V16QI [
                (const_int 0 [0x0])
                (const_int 5 [0x5])
                (const_int 0 [0x0])
                (const_int 5 [0x5])
                (const_int 0 [0x0])
                (const_int 5 [0x5])
                (const_int 0 [0x0])
                (const_int 5 [0x5])
                (const_int 0 [0x0])
                (const_int 5 [0x5])
                (const_int 0 [0x0])
                (const_int 5 [0x5])
                (const_int 0 [0x0])
                (const_int 5 [0x5])
                (const_int 0 [0x0])
                (const_int 5 [0x5])
            ])
        (nil)))


and we end with this (where we ICE on insn 65):

(insn 65 22 64 2 (set (reg:V16QI 77 0)
        (mem/u/c/i:V16QI (symbol_ref/u:SI ("*.LC0") [flags 0x2]) [0 S16 A128])) -1 (nil)
    (nil))

(insn 64 65 26 2 (set (reg:SI 9 9)
        (plus:SI (reg/f:SI 1 1)
            (const_int 16 [0x10]))) 31 {*addsi3_internal1} (nil)
    (nil))

(insn:HI 26 64 56 2 (set (mem/v/c/i:V16QI (reg:SI 9 9) [0 tmp+0 S16 A128])
        (reg:V16QI 77 0)) 467 {altivec_stvx_v16qi} (insn_list:REG_DEP_TRUE 22 (nil))
    (expr_list:REG_EQUAL (const_vector:V16QI [
                (const_int 0 [0x0])
                (const_int 5 [0x5])
                (const_int 0 [0x0])
                (const_int 5 [0x5])
                (const_int 0 [0x0])
                (const_int 5 [0x5])
                (const_int 0 [0x0])
                (const_int 5 [0x5])
                (const_int 0 [0x0])
                (const_int 5 [0x5])
                (const_int 0 [0x0])
                (const_int 5 [0x5])
                (const_int 0 [0x0])
                (const_int 5 [0x5])
                (const_int 0 [0x0])
                (const_int 5 [0x5])
            ])
        (nil)))
Comment 14 Steven Bosscher 2005-10-28 16:22:25 UTC
More background:
Starting program: /abuild/stevenb/build/gcc/cc1 -O2 -maltivec t.c -da
 foo
Analyzing compilation unitPerforming intraprocedural optimizations
Assembling functions:
 foo
Breakpoint 8, find_reloads (insn=0x401069c0, replace=0, ind_levels=0, live_known=1,
    reload_reg_p=0x10a65334) at reload.c:2541
2541      int no_input_reloads = 0, no_output_reloads = 0;
(gdb) disab 8
(gdb) enab 10
(gdb) cont
Continuing.

Breakpoint 10, emit_insn (x=0x40110680) at emit-rtl.c:4430
4430      rtx last = last_insn;
(gdb) cont
Continuing.

Breakpoint 10, emit_insn (x=0x401106c0) at emit-rtl.c:4430
4430      rtx last = last_insn;
(gdb) p debug_rtx(x)
(set (reg:V16QI 77 0)
    (mem/u/c/i:V16QI (symbol_ref/u:SI ("*.LC0") [flags 0x2]) [0 S16 A128]))
$52 = void
(gdb) up
#1  0x1069fe58 in rs6000_emit_move (dest=0x4010e7f8, source=0x40110670, mode=V16QImode)
    at rs6000.c:4058
4058      emit_insn (gen_rtx_SET (VOIDmode, operands[0], operands[1]));
(gdb) bt
#0  emit_insn (x=0x401106c0) at emit-rtl.c:4430
#1  0x1069fe58 in rs6000_emit_move (dest=0x4010e7f8, source=0x40110670, mode=V16QImode)
    at rs6000.c:4058
#2  0x10487fd8 in gen_movv16qi (operand0=0x4010e7f8, operand1=0x40110670) at altivec.md:171
#3  0x1033360c in emit_move_insn_1 (x=0x4010e7f8, y=0x40110670) at expr.c:3107
#4  0x10510fa8 in gen_move_insn (x=0x4010e7f8, y=0x40110670) at optabs.c:4214
#5  0x10594ac4 in gen_reload (out=0x4010e7f8, in=0x40110670, opnum=1, type=RELOAD_FOR_INPUT)
    at reload1.c:7606
#6  0x1058fef8 in emit_input_reload_insns (chain=0x10aa3da0, rl=0x10a5f99c, old=0x40110670, j=3)
    at reload1.c:6635
#7  0x10590c30 in do_input_reload (chain=0x10aa3da0, rl=0x10a5f99c, j=3) at reload1.c:6880
#8  0x10591c00 in emit_reload_insns (chain=0x10aa3da0) at reload1.c:7053
#9  0x10585898 in reload_as_needed (live_known=1) at reload1.c:3902
#10 0x1057bdec in reload (first=0x400351b8, global=1) at reload1.c:1067
#11 0x107b452c in global_alloc (file=0x10aacc40) at global.c:628
Comment 15 Steven Bosscher 2005-10-28 16:59:02 UTC
The trouble appears to come from this:

    case V16QImode:
    case V8HImode:
    case V4SFmode:
    case V4SImode:
    case V4HImode:
    case V2SFmode:
    case V2SImode:
    case V1DImode:
      if (CONSTANT_P (operands[1])
          && !easy_vector_constant (operands[1], mode))
        operands[1] = force_const_mem (mode, operands[1]);
      break;

We get here with:
Breakpoint 14, rs6000_emit_move (dest=0x4010e7f8, source=0x40110670, mode=V16QImode)
    at rs6000.c:3867
3867          if (CONSTANT_P (operands[1])
(gdb) p debug_rtx(dest)
(reg:V16QI 77 0)
$3 = void
(gdb) p debug_rtx(source)
(const_vector:V16QI [
        (const_int 0 [0x0])
        (const_int 5 [0x5])
        (const_int 0 [0x0])
        (const_int 5 [0x5])
        (const_int 0 [0x0])
        (const_int 5 [0x5])
        (const_int 0 [0x0])
        (const_int 5 [0x5])
        (const_int 0 [0x0])
        (const_int 5 [0x5])
        (const_int 0 [0x0])
        (const_int 5 [0x5])
        (const_int 0 [0x0])
        (const_int 5 [0x5])
        (const_int 0 [0x0])
        (const_int 5 [0x5])
    ])
$4 = void
(gdb)   


And we go to emit_set with:
(gdb) p debug_rtx (operands[0])
(reg:V16QI 77 0)
$5 = void
(gdb) p debug_rtx (operands[1])
(mem/u/c/i:V16QI (symbol_ref/u:SI ("*.LC0") [flags 0x2]) [0 S16 A128])
$6 = void
(gdb)   
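
To make the mode dependence in the dumps above concrete, here is a minimal standalone sketch.  It is not GCC code; is_uniform_splat and splat_h5 are names made up for this illustration.  It treats the constant as 16 raw bytes and asks whether it is one element repeated at a given element width.  The bytes 0,5,0,5,... are uniform when read as halfwords (the V8HI view) but not when read as single bytes (the V16QI view).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustration only: return 1 if the 16-byte register image is one
   `width`-byte element repeated across the whole vector, else 0.
   `width` must be 1, 2 or 4 (byte, halfword or word elements).  */
int is_uniform_splat(const unsigned char image[16], size_t width)
{
    size_t i;

    assert(width == 1 || width == 2 || width == 4);
    for (i = width; i < 16; i += width)
        if (memcmp(image, image + i, width) != 0)
            return 0;   /* some element differs from the first one */
    return 1;
}

/* The constant from the RTL dumps: V16QI {0,5,0,5,...}, which is the
   big-endian byte image of the halfword value 5 repeated eight times.  */
const unsigned char splat_h5[16] = {
    0, 5, 0, 5, 0, 5, 0, 5, 0, 5, 0, 5, 0, 5, 0, 5
};
```

With this helper, is_uniform_splat(splat_h5, 2) is true while is_uniform_splat(splat_h5, 1) is false, so a check that only looks at elements of the current mode's width will reject the constant once it is viewed as V16QI.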

Comment 16 Steven Bosscher 2005-10-28 17:01:03 UTC
On IRC it was suggested that we just need to get a version of easy_vector_constant which does the right thing in any mode.
Comment 17 paolo.bonzini@lu.unisi.ch 2005-10-28 19:16:55 UTC
Subject: Re: [4.1 Regression] ICE in extract_insn with altivec


>On IRC it was suggested that we just need to get a version of
>easy_vector_constant which does the right thing in any mode.
>
Yes, it looks like the bug is that the constant is considered "easy" as long
as it is in V8HI mode, but not when the reload is done in V16QI mode.

It may make sense to assert !reload_in_progress && !reload_completed 
before force_const_mem is called.

Paolo
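
A rough sketch of the two ideas in this comment, written as a standalone model rather than as rs6000.c code.  Everything here (easy_splat_width, classify_move, the enum, the sample arrays) is hypothetical and only for illustration.  It assumes a big-endian byte image and the signed 5-bit immediate range (-16..15) of the vspltisb/vspltish/vspltisw instructions.  The check looks only at the bytes, so it gives the same answer whichever vector mode the constant is labelled with.  The classifier then reports, instead of silently spilling to memory, when a constant-pool load would be needed while reload is running.

```c
#include <assert.h>
#include <string.h>

/* Decode one big-endian two's-complement element of `width` bytes.  */
static long element_value(const unsigned char *p, int width)
{
    long v = (p[0] & 0x80) ? -1L : 0L;   /* sign-extend from the top byte */
    int i;

    assert(width == 1 || width == 2 || width == 4);
    for (i = 0; i < width; i++)
        v = v * 256 + p[i];
    return v;
}

/* Return the element width (1, 2 or 4) at which the image is a splat of
   a value in [-16, 15], or 0 if no width qualifies.  */
int easy_splat_width(const unsigned char image[16])
{
    static const int widths[] = { 1, 2, 4 };
    int w, i;

    for (w = 0; w < 3; w++) {
        int width = widths[w];
        int uniform = 1;

        for (i = width; i < 16 && uniform; i += width)
            uniform = memcmp(image, image + i, width) == 0;
        if (uniform) {
            long v = element_value(image, width);
            if (v >= -16 && v <= 15)
                return width;
        }
    }
    return 0;
}

enum move_kind {
    MOVE_SPLAT_IMMEDIATE,     /* one splat-immediate instruction suffices */
    MOVE_CONSTANT_POOL,       /* load from memory; fine before reload */
    MOVE_REJECTED_IN_RELOAD   /* would need a new memory reference too late */
};

/* Model of the suggested guard: a hard constant is only sent to the
   constant pool when reload is not active.  */
enum move_kind classify_move(const unsigned char image[16], int reload_active)
{
    if (easy_splat_width(image) != 0)
        return MOVE_SPLAT_IMMEDIATE;
    return reload_active ? MOVE_REJECTED_IN_RELOAD : MOVE_CONSTANT_POOL;
}

/* Sample images: the constant from this bug, an all-ones byte splat,
   and a vector that is not a splat at any width.  */
const unsigned char halfword_five[16] = {
    0, 5, 0, 5, 0, 5, 0, 5, 0, 5, 0, 5, 0, 5, 0, 5
};
const unsigned char byte_minus_one[16] = {
    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff
};
const unsigned char not_a_splat[16] = {
    0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
};
```

In this model the constant from the testcase is classified as a splat immediate even during reload, while a vector that really is hard gets flagged rather than turned into an unrecognizable memory load.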
Comment 18 Andrew Pinski 2005-10-28 21:47:51 UTC
Aldy,
  Can you look into this bug?
Comment 19 Mark Mitchell 2005-10-31 06:04:27 UTC
Altivec is very popular; this is a showstopper.
Comment 20 Aldy Hernandez 2005-10-31 12:48:55 UTC
I'll take this.
Comment 21 Aldy Hernandez 2005-10-31 20:08:37 UTC
Does anyone have the un-preprocessed source for this bug?  I'm seeing some assignments that should have casts, and I want to rule out bogus input.
Comment 22 Andrew Pinski 2005-10-31 20:10:22 UTC
(In reply to comment #21)
> Does anyone have the un-preprocessed source for this bug?  I'm seeing some
> assignments that should have casts, and I want to rule out bogus input.

Comment #13 has un-preprocessed source for a simplified version.
Comment 23 Richard Biener 2005-10-31 20:22:35 UTC
Created attachment 10087 [details]
original source file from xvid (xvidcore-1.1.0-beta2)

"Source" looks like:

MAKE_PASS_16(V_Pass_Avrg_Up_16_Add_Altivec_C, AVRG_UP_ADD_16_V, VARS_V, LOAD_V_16, STORE_V_16, BpS, 1)

attached.
Comment 24 Paolo Bonzini 2005-11-01 21:05:22 UTC
Aldy, I have a patch for this that only needs more testing.  If you want, and if you do not have any better idea than what I said in comment #17, I can take this.
Comment 25 Aldy Hernandez 2005-11-01 21:16:00 UTC
Bonzini:

Perhaps both approaches would be even better.  We definitely should handle the transformed vector, because theoretically it's still easy to generate.  And adding the extra check you mention would be icing on the cake :).
Comment 26 Paolo Bonzini 2005-11-01 21:23:04 UTC
Okay, taking this.  If you ever want to make SPE constants more optimized, be careful about this bug though! ;-)
Comment 27 Paolo Bonzini 2005-11-07 10:39:44 UTC
Subject: Bug 24230

Author: bonzini
Date: Mon Nov  7 10:39:36 2005
New Revision: 106588

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=106588
Log:
2005-11-07  Paolo Bonzini  <bonzini@gnu.org>

	PR target/24230

	* config/rs6000/rs6000.c (easy_vector_splat_const, easy_vector_same,
	gen_easy_vector_constant_add_self): Delete.
	(vspltis_constant, easy_altivec_constant, gen_easy_altivec_constant):
	New.
	(output_vec_const_move): Use gen_easy_altivec_constant.
	(rs6000_expand_vector_init): Do not emit a set of a VEC_DUPLICATE.
	* config/rs6000/predicates.md (easy_vector_constant): Reorganize tests.
	(easy_vector_constant_add_self): Rewritten.
	* config/rs6000/rs6000-protos.h (easy_vector_splat_const,
	easy_vector_same, gen_easy_vector_constant_add_self): Remove prototype.
	(easy_altivec_constant, gen_easy_altivec_constant): Add prototype.

testsuite:
2005-11-07  Paolo Bonzini  <bonzini@gnu.org>

	PR target/24230

        * gcc.target/powerpc/altivec-consts.c,
        gcc.target/powerpc/altivec-splat.c: New testcase.


Added:
    trunk/gcc/testsuite/gcc.target/powerpc/altivec-consts.c
    trunk/gcc/testsuite/gcc.target/powerpc/altivec-splat.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/rs6000/altivec.md
    trunk/gcc/config/rs6000/predicates.md
    trunk/gcc/config/rs6000/rs6000-protos.h
    trunk/gcc/config/rs6000/rs6000.c
    trunk/gcc/config/rs6000/rs6000.h
    trunk/gcc/testsuite/ChangeLog

Comment 28 Paolo Bonzini 2005-11-07 10:41:35 UTC
patch committed