/usr/lib/gcc/powerpc64-suse-linux/4.1.0/cc1 -fpreprocessed qpel_altivec.i -quiet -dumpbase qpel_altivec.c -maltivec -mabi=altivec -auxbase-strip =build/image/ppc_asm/qpel_altivec.o -O2 -Wall -version -fPIC -fmessage-length=0 -o qpel_altivec.s

../../src/image/ppc_asm/qpel_altivec.c:414: error: unrecognizable insn:
(insn 711 329 710 4 (set (reg:V16QI 90 13)
        (mem/u/c/i:V16QI (symbol_ref/u:SI ("*.LC254") [flags 0x2]) [0 S16 A128])) -1 (nil)
    (nil))
../../src/image/ppc_asm/qpel_altivec.c:414: internal compiler error: in extract_insn, at recog.c:2084
Please submit a full bug report, with preprocessed source if appropriate.
See <URL:http://www.suse.de/feedback> for instructions.

The source consists of loads of preprocessor-generated AltiVec intrinsics. I don't know whether this is a regression.
Created attachment 9905 [details] testcase Preprocessed testcase.
Reducing.
Reduced testcase: typedef int int32_t; typedef unsigned char uint8_t; static const __attribute__((altivec(vector__))) signed char FIR_Tab_16[17] = { }; void H_Pass_16_Altivec_C(uint8_t *Dst, const uint8_t *Src, int32_t H, int32_t BpS, int32_t Rnd) { register __attribute__((altivec(vector__))) signed short sums1,sums2; register __attribute__((altivec(vector__))) unsigned char ox00; register __attribute__((altivec(vector__))) signed char firs; __attribute__((altivec(vector__))) unsigned char vec_src; __attribute__((altivec(vector__))) unsigned char tmp; while(H-- > 0) { sums1 = __builtin_vec_mladd( (__attribute__((altivec(vector__))) signed short)__builtin_vec_mergeh(ox00, tmp), __builtin_vec_unpackh(firs), sums1 ); sums2 = __builtin_vec_mladd( (__attribute__((altivec(vector__))) signed short)__builtin_vec_mergel(ox00, tmp), __builtin_vec_unpackl(firs), sums2 ); firs = FIR_Tab_16[2]; tmp = __builtin_vec_splat(vec_src,(2)); sums1 = __builtin_vec_mladd( (__attribute__((altivec(vector__))) signed short)__builtin_vec_mergeh(ox00, tmp), __builtin_vec_unpackh(firs), sums1 ); sums2 = __builtin_vec_mladd( (__attribute__((altivec(vector__))) signed short)__builtin_vec_mergel(ox00, tmp), __builtin_vec_unpackl(firs), sums2 ); firs = FIR_Tab_16[3]; tmp = __builtin_vec_splat(vec_src,(3)); sums1 = __builtin_vec_mladd( (__attribute__((altivec(vector__))) signed short)__builtin_vec_mergeh(ox00, tmp), __builtin_vec_unpackh(firs), sums1 ); sums2 = __builtin_vec_mladd( (__attribute__((altivec(vector__))) signed short)__builtin_vec_mergel(ox00, tmp), __builtin_vec_unpackl(firs), sums2 ); firs = FIR_Tab_16[4]; tmp = __builtin_vec_splat(vec_src,(4)); sums1 = __builtin_vec_mladd( (__attribute__((altivec(vector__))) signed short)__builtin_vec_mergeh(ox00, tmp), __builtin_vec_unpackh(firs), sums1 ); sums2 = __builtin_vec_mladd( (__attribute__((altivec(vector__))) signed short)__builtin_vec_mergel(ox00, tmp), __builtin_vec_unpackl(firs), sums2 ); firs = FIR_Tab_16[5]; tmp = 
__builtin_vec_splat(vec_src,(5)); sums1 = __builtin_vec_mladd( (__attribute__((altivec(vector__))) signed short)__builtin_vec_mergeh(ox00, tmp), __builtin_vec_unpackh(firs), sums1 ); sums2 = __builtin_vec_mladd( (__attribute__((altivec(vector__))) signed short)__builtin_vec_mergel(ox00, tmp), __builtin_vec_unpackl(firs), sums2 ); firs = FIR_Tab_16[6]; tmp = __builtin_vec_splat(vec_src,(6)); *((uint8_t*)&tmp) = Src[16*1]; sums1 = __builtin_vec_mladd( (__attribute__((altivec(vector__))) signed short)__builtin_vec_mergeh(ox00,tmp),__builtin_vec_unpackh(firs),sums1 ); sums2 = __builtin_vec_mladd( (__attribute__((altivec(vector__))) signed short)__builtin_vec_mergel(ox00,tmp),__builtin_vec_unpackl(firs),sums2 ); tmp = (__attribute__((altivec(vector__))) unsigned char)((__attribute__((altivec(vector__))) unsigned short) __builtin_altivec_vspltish (((5)))); sums1 = __builtin_vec_sra(sums1,(__attribute__((altivec(vector__))) unsigned short)tmp); sums2 = __builtin_vec_sra(sums2,(__attribute__((altivec(vector__))) unsigned short)tmp); tmp = __builtin_vec_packsu(sums1,sums2); } }
Works on PPC-darwin.
(In reply to comment #4)
> Works on PPC-darwin.

That was the reduced testcase. With the full testcase I can reproduce the failure there. Reducing a testcase for ppc-darwin.
Reduced testcase: typedef int int32_t; typedef unsigned char uint8_t; typedef __attribute__((altivec(vector__))) signed short vss; typedef __attribute__((altivec(vector__))) unsigned short vus; typedef __attribute__((altivec(vector__))) signed char vsc; typedef __attribute__((altivec(vector__))) unsigned char vuc; uint8_t *Src; vsc FIR_Tab_16[17]; void H_Pass_16_Altivec_C(vuc vec_src, vsc firs, vss sums1, vss sums2) { vss t; vuc tmp; int H = 10; while(H-- > 0) { tmp = __builtin_vec_splat(vec_src,(3)); t = (vss)__builtin_vec_mergeh(tmp, tmp); sums1 = __builtin_vec_mladd( (vss)__builtin_vec_mergeh(tmp, tmp), __builtin_vec_unpackh(firs), sums1 ); sums2 = __builtin_vec_mladd( (vss)__builtin_vec_mergel(tmp, tmp), __builtin_vec_unpackl(firs), sums2 ); firs = FIR_Tab_16[4]; tmp = __builtin_vec_splat(vec_src,(4)); sums1 = __builtin_vec_mladd( (vss)__builtin_vec_mergeh(tmp, tmp), __builtin_vec_unpackh(firs), sums1 ); sums2 = __builtin_vec_mladd( (vss)__builtin_vec_mergel(tmp, tmp), __builtin_vec_unpackl(firs), sums2 ); firs = FIR_Tab_16[5]; tmp = __builtin_vec_splat(vec_src,(5)); sums1 = __builtin_vec_mladd( (vss)__builtin_vec_mergeh(tmp, tmp), __builtin_vec_unpackh(firs), sums1 ); sums2 = __builtin_vec_mladd( (vss)__builtin_vec_mergel(tmp, tmp), __builtin_vec_unpackl(firs), sums2 ); firs = FIR_Tab_16[6]; tmp = __builtin_vec_splat(vec_src,(6)); sums1 = __builtin_vec_mladd( (vss)__builtin_vec_mergeh(tmp, tmp), __builtin_vec_unpackh(firs), sums1 ); sums2 = __builtin_vec_mladd( (vss)__builtin_vec_mergel(tmp, tmp), __builtin_vec_unpackl(firs), sums2 ); firs = FIR_Tab_16[7]; tmp = __builtin_vec_splat(vec_src,(7)); sums1 = __builtin_vec_mladd( (vss)__builtin_vec_mergeh(tmp, tmp), __builtin_vec_unpackh(firs), sums1 ); sums2 = __builtin_vec_mladd( (vss)__builtin_vec_mergel(tmp, tmp), __builtin_vec_unpackl(firs), sums2 ); firs = FIR_Tab_16[8]; tmp = __builtin_vec_splat(vec_src,(8)); sums1 = __builtin_vec_mladd( (vss)__builtin_vec_mergeh(tmp, tmp), __builtin_vec_unpackh(firs), sums1 
); firs = FIR_Tab_16[9]; tmp = __builtin_vec_splat(vec_src,(9)); *((char*)&tmp) = Src[16*1]; sums1 = __builtin_vec_mladd( (vss)__builtin_vec_mergeh(tmp,tmp),__builtin_vec_unpackh(firs),sums1 ); sums2 = __builtin_vec_mladd( (vss)__builtin_vec_mergel(tmp,tmp),__builtin_vec_unpackl(firs),sums2 ); tmp = (vuc)((vus) __builtin_altivec_vspltish (((5)))); sums1 = __builtin_vec_sra(sums1,(vus)tmp); tmp = __builtin_vec_packsu(sums1,sums2); } }
I am going to say this is a 4.1 regression. We are not legitimizing the memory address for some reason.
A regression hunt on powerpc-linux using the testcase from comment #6 identified this patch from rth: http://gcc.gnu.org/ml/gcc-cvs/2005-08/msg01004.html
I'm looking at it.
This is as small as I could make it. Any other attempt to hoist something causes it not to fail anymore (at -maltivec -O2). Interesting, given that GCSE *does* the hoisting... It's a reload problem. typedef __attribute__((vector_size (16))) unsigned char vec; void H_Pass_16_Altivec_C(vec vec_src, vec firs, vec sums1, vec sums2, vec *FIR_Tab_16, unsigned char *Src) { vec tmp, spltb3, spltb4, spltb5, spltb6, mrghb3, mrglb3, mrghb4, mrglb4, mrghb5, mrglb5, mrghb6, mrglb6, firs0, firs1, firs2, firs3, upkhb0, upklb0, upkhb1, upklb1, upkhb2, upklb2, upkhb3, upklb3, upkhb4, spltb7, mrghb7, mrglb7, firs4, spltb8, mrghb8, mrglb8, upklb4; spltb3 = __builtin_altivec_vspltb (vec_src, 3); spltb4 = __builtin_altivec_vspltb (vec_src, 4); spltb5 = __builtin_altivec_vspltb (vec_src, 5); spltb6 = __builtin_altivec_vspltb (vec_src, 6); mrghb3 = __builtin_altivec_vmrghb (spltb3, spltb3); mrglb3 = __builtin_altivec_vmrglb (spltb3, spltb3); mrghb4 = __builtin_altivec_vmrghb (spltb4, spltb4); mrglb4 = __builtin_altivec_vmrglb (spltb4, spltb4); mrghb5 = __builtin_altivec_vmrghb (spltb5, spltb5); mrglb5 = __builtin_altivec_vmrglb (spltb5, spltb5); mrghb6 = __builtin_altivec_vmrghb (spltb6, spltb6); mrglb6 = __builtin_altivec_vmrglb (spltb6, spltb6); firs0 = FIR_Tab_16[0]; firs1 = FIR_Tab_16[1]; firs2 = FIR_Tab_16[2]; firs3 = FIR_Tab_16[3]; upkhb0 = __builtin_altivec_vupkhsb (firs0); upklb0 = __builtin_altivec_vupklsb (firs0); upkhb1 = __builtin_altivec_vupkhsb (firs1); upklb1 = __builtin_altivec_vupklsb (firs1); upkhb2 = __builtin_altivec_vupkhsb (firs2); upklb2 = __builtin_altivec_vupklsb (firs2); upkhb3 = __builtin_altivec_vupkhsb (firs3); upklb3 = __builtin_altivec_vupklsb (firs3); upkhb4 = __builtin_altivec_vupkhsb (firs); *(char *) &tmp = (char) *(Src + 16); L0: sums1 = __builtin_altivec_vmladduhm (mrghb3, upkhb4, sums1); sums2 = __builtin_altivec_vmladduhm (mrglb3, upkhb4, sums2); sums1 = __builtin_altivec_vmladduhm (mrghb4, upkhb0, sums1); sums2 = __builtin_altivec_vmladduhm 
(mrglb4, upklb0, sums2); sums1 = __builtin_altivec_vmladduhm (mrghb5, upkhb1, sums1); sums2 = __builtin_altivec_vmladduhm (mrglb5, upklb1, sums2); sums1 = __builtin_altivec_vmladduhm (mrghb6, upkhb2, sums1); sums2 = __builtin_altivec_vmladduhm (mrglb6, upklb2, sums2); spltb7 = __builtin_altivec_vspltb (vec_src, 7); mrghb7 = __builtin_altivec_vmrghb (spltb7, spltb7); sums1 = __builtin_altivec_vmladduhm (mrghb7, upkhb3, sums1); mrglb7 = __builtin_altivec_vmrglb (spltb7, spltb7); sums2 = __builtin_altivec_vmladduhm (mrglb7, upklb3, sums2); firs4 = FIR_Tab_16[4]; spltb8 = __builtin_altivec_vspltb (vec_src, 8); mrghb8 = __builtin_altivec_vmrghb (spltb8, spltb8); upkhb4 = __builtin_altivec_vupkhsb (firs4); sums1 = __builtin_altivec_vmladduhm (mrghb8, upkhb4, sums1); mrglb8 = __builtin_altivec_vmrglb ((vec)tmp, spltb8); upklb4 = __builtin_altivec_vupklsb (firs4); sums2 = __builtin_altivec_vmladduhm (mrglb8, upklb4, sums2); tmp = __builtin_altivec_vspltish (5); sums1 = __builtin_altivec_vsrah (sums1, tmp); __builtin_altivec_vpkshus (sums1, sums2); goto L0; }
Created attachment 9967 [details]
reduced testcase

Reduced testcase, but with uninitialized variables.

top of tree:

2005-09-29  Paolo Bonzini  <bonzini@gnu.org>

        Revert this patch:

        2005-09-15  Paolo Bonzini  <bonzini@gnu.org>

        * optabs.c (expand_binop): Use swap_commutative_operands_with_target
        to order operands.
        (swap_commutative_operands_with_target): New.
reload -> Micha, can you try to track this down? It makes xvid ICE on beta-ppc.
Smaller test case:

// Compile with -O2 -maltivec
//
// Works with GCC 3.3.5 and GCC 4.0.2
// ICEs with GCC 4.1 from today's CVS

#include <altivec.h>

#define REGLIST \
  "77", "78", "79", "80", "81", "82", "83", "84", "85", "86",\
  "87", "88", "89", "90", "91", "92", "93", "94", "95", "96",\
  "97", "98", "99", "100", "101", "102", "103", "104", "105", "106",\
  "107", "108"

void foo (int H)
{
  volatile __attribute__ ((altivec (vector__))) unsigned char tmp;
  while (H-- > 0)
    {
      asm ("" : : : REGLIST);
      tmp = ( __attribute__ ((altivec (vector__))) unsigned char)
            (( __attribute__ ((altivec (vector__))) unsigned short) vec_splat_s16 (((5))));
    }
}

Note that this is really a register allocation problem that we fail on because our register allocator doesn't know about liveness inside blocks, only at the start and end of a block. But the situation is easily reproducible as long as you pump the register pressure up far enough.

The problem seems to be in reload const-to-mem. We start with this:

(insn:HI 26 22 56 2 (set (mem/v/c/i:V16QI (plus:SI (reg/f:SI 113 sfp)
                (const_int 16 [0x10])) [0 tmp+0 S16 A128])
        (subreg:V16QI (reg:V8HI 128) 0)) 467 {altivec_stvx_v16qi} (insn_list:REG_DEP_TRUE 22 (nil))
    (expr_list:REG_EQUAL (const_vector:V16QI [
                (const_int 0 [0x0])
                (const_int 5 [0x5])
                (const_int 0 [0x0])
                (const_int 5 [0x5])
                (const_int 0 [0x0])
                (const_int 5 [0x5])
                (const_int 0 [0x0])
                (const_int 5 [0x5])
                (const_int 0 [0x0])
                (const_int 5 [0x5])
                (const_int 0 [0x0])
                (const_int 5 [0x5])
                (const_int 0 [0x0])
                (const_int 5 [0x5])
                (const_int 0 [0x0])
                (const_int 5 [0x5])
            ])
        (nil)))

and we end with this (where we ICE on insn 65):

(insn 65 22 64 2 (set (reg:V16QI 77 0)
        (mem/u/c/i:V16QI (symbol_ref/u:SI ("*.LC0") [flags 0x2]) [0 S16 A128])) -1 (nil)
    (nil))

(insn 64 65 26 2 (set (reg:SI 9 9)
        (plus:SI (reg/f:SI 1 1)
            (const_int 16 [0x10]))) 31 {*addsi3_internal1} (nil)
    (nil))

(insn:HI 26 64 56 2 (set (mem/v/c/i:V16QI (reg:SI 9 9) [0 tmp+0 S16 A128])
        (reg:V16QI 77 0)) 467 {altivec_stvx_v16qi} (insn_list:REG_DEP_TRUE 22 (nil))
    (expr_list:REG_EQUAL (const_vector:V16QI [
                (const_int 0 [0x0])
                (const_int 5 [0x5])
                (const_int 0 [0x0])
                (const_int 5 [0x5])
                (const_int 0 [0x0])
                (const_int 5 [0x5])
                (const_int 0 [0x0])
                (const_int 5 [0x5])
                (const_int 0 [0x0])
                (const_int 5 [0x5])
                (const_int 0 [0x0])
                (const_int 5 [0x5])
                (const_int 0 [0x0])
                (const_int 5 [0x5])
                (const_int 0 [0x0])
                (const_int 5 [0x5])
            ])
        (nil)))
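For readers wondering where the 0, 5, 0, 5, ... pattern in the REG_EQUAL note comes from: it is what a halfword splat of 5 looks like when the same 16 bytes are viewed as V16QI in big-endian order. A minimal sketch in plain C (no AltiVec required; the helper name halfword_splat_bytes is made up for illustration and is not part of GCC or altivec.h):

```c
#include <stdint.h>

/* Hypothetical helper: write eight copies of a 16-bit value into a
   16-byte buffer in big-endian byte order, i.e. the byte image of a
   V8HI splat when it is reinterpreted as V16QI on PowerPC.  */
static void halfword_splat_bytes(int16_t value, uint8_t out[16])
{
    uint16_t u = (uint16_t)value;
    for (int i = 0; i < 8; i++) {
        out[2 * i]     = (uint8_t)(u >> 8);    /* high byte first */
        out[2 * i + 1] = (uint8_t)(u & 0xff);  /* then the low byte */
    }
}
```

Calling halfword_splat_bytes(5, buf) fills buf with 0, 5, 0, 5, ..., matching the const_vector:V16QI in the dump. Viewed as bytes the value is no longer a splat of a single element, which is presumably why it stops looking like a cheap constant in V16QI mode.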
More background:

Starting program: /abuild/stevenb/build/gcc/cc1 -O2 -maltivec t.c -da
 foo
Analyzing compilation unit
Performing intraprocedural optimizations
Assembling functions:
 foo

Breakpoint 8, find_reloads (insn=0x401069c0, replace=0, ind_levels=0, live_known=1, reload_reg_p=0x10a65334) at reload.c:2541
2541      int no_input_reloads = 0, no_output_reloads = 0;
(gdb) disab 8
(gdb) enab 10
(gdb) cont
Continuing.

Breakpoint 10, emit_insn (x=0x40110680) at emit-rtl.c:4430
4430      rtx last = last_insn;
(gdb) cont
Continuing.

Breakpoint 10, emit_insn (x=0x401106c0) at emit-rtl.c:4430
4430      rtx last = last_insn;
(gdb) p debug_rtx(x)
(set (reg:V16QI 77 0)
    (mem/u/c/i:V16QI (symbol_ref/u:SI ("*.LC0") [flags 0x2]) [0 S16 A128]))
$52 = void
(gdb) up
#1  0x1069fe58 in rs6000_emit_move (dest=0x4010e7f8, source=0x40110670, mode=V16QImode) at rs6000.c:4058
4058          emit_insn (gen_rtx_SET (VOIDmode, operands[0], operands[1]));
(gdb) bt
#0  emit_insn (x=0x401106c0) at emit-rtl.c:4430
#1  0x1069fe58 in rs6000_emit_move (dest=0x4010e7f8, source=0x40110670, mode=V16QImode) at rs6000.c:4058
#2  0x10487fd8 in gen_movv16qi (operand0=0x4010e7f8, operand1=0x40110670) at altivec.md:171
#3  0x1033360c in emit_move_insn_1 (x=0x4010e7f8, y=0x40110670) at expr.c:3107
#4  0x10510fa8 in gen_move_insn (x=0x4010e7f8, y=0x40110670) at optabs.c:4214
#5  0x10594ac4 in gen_reload (out=0x4010e7f8, in=0x40110670, opnum=1, type=RELOAD_FOR_INPUT) at reload1.c:7606
#6  0x1058fef8 in emit_input_reload_insns (chain=0x10aa3da0, rl=0x10a5f99c, old=0x40110670, j=3) at reload1.c:6635
#7  0x10590c30 in do_input_reload (chain=0x10aa3da0, rl=0x10a5f99c, j=3) at reload1.c:6880
#8  0x10591c00 in emit_reload_insns (chain=0x10aa3da0) at reload1.c:7053
#9  0x10585898 in reload_as_needed (live_known=1) at reload1.c:3902
#10 0x1057bdec in reload (first=0x400351b8, global=1) at reload1.c:1067
#11 0x107b452c in global_alloc (file=0x10aacc40) at global.c:628
The trouble appears to come from this:

    case V16QImode:
    case V8HImode:
    case V4SFmode:
    case V4SImode:
    case V4HImode:
    case V2SFmode:
    case V2SImode:
    case V1DImode:
      if (CONSTANT_P (operands[1])
          && !easy_vector_constant (operands[1], mode))
        operands[1] = force_const_mem (mode, operands[1]);
      break;

We get here with:

Breakpoint 14, rs6000_emit_move (dest=0x4010e7f8, source=0x40110670, mode=V16QImode) at rs6000.c:3867
3867          if (CONSTANT_P (operands[1])
(gdb) p debug_rtx(dest)
(reg:V16QI 77 0)
$3 = void
(gdb) p debug_rtx(source)
(const_vector:V16QI [
        (const_int 0 [0x0])
        (const_int 5 [0x5])
        (const_int 0 [0x0])
        (const_int 5 [0x5])
        (const_int 0 [0x0])
        (const_int 5 [0x5])
        (const_int 0 [0x0])
        (const_int 5 [0x5])
        (const_int 0 [0x0])
        (const_int 5 [0x5])
        (const_int 0 [0x0])
        (const_int 5 [0x5])
        (const_int 0 [0x0])
        (const_int 5 [0x5])
        (const_int 0 [0x0])
        (const_int 5 [0x5])
    ])
$4 = void
(gdb)

And we go to emit_set with:

(gdb) p debug_rtx (operands[0])
(reg:V16QI 77 0)
$5 = void
(gdb) p debug_rtx (operands[1])
(mem/u/c/i:V16QI (symbol_ref/u:SI ("*.LC0") [flags 0x2]) [0 S16 A128])
$6 = void
(gdb)
On IRC it was suggested that we just need to get a version of easy_vector_constant which does the right thing in any mode.
Subject: Re: [4.1 Regression] ICE in extract_insn with altivec

> On IRC it was suggested that we just need to get a version of
> easy_vector_constant which does the right thing in any mode.

Yes, it looks like the bug is that the constant is declared "easy" as long as it is in V8HI mode, but not when the reload is done in V16QI mode. It may make sense to assert !reload_in_progress && !reload_completed before force_const_mem is called.

Paolo
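To illustrate what "does the right thing in any mode" could mean: the check can look at the 16-byte image of the constant and ask whether it is a splat of a 1-, 2- or 4-byte element whose value fits the 5-bit signed immediate (-16..15) of vspltisb, vspltish or vspltisw, independent of the mode the constant is labelled with. The following is only a sketch under that assumption, written as standalone C; splat_fits and easy_splat_width are hypothetical names and this is not code taken from rs6000.c or from any patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch, not GCC code.  Return true if the big-endian
   byte image is a splat of a `step`-byte element whose sign-extended
   value fits the 5-bit immediate range -16..15.  */
static bool splat_fits(const uint8_t bytes[16], int step)
{
    /* Every element must repeat the first one.  */
    for (int i = step; i < 16; i++)
        if (bytes[i] != bytes[i % step])
            return false;

    /* Assemble the first element, most significant byte first,
       sign-extending from its top byte.  */
    int32_t v = (int8_t)bytes[0];
    for (int i = 1; i < step; i++)
        v = v * 256 + bytes[i];

    return v >= -16 && v <= 15;
}

/* Return the element width in bytes (1, 2 or 4) of a splat-immediate
   that could build the constant, or 0 if none can.  */
static int easy_splat_width(const uint8_t bytes[16])
{
    static const int widths[] = { 1, 2, 4 };
    for (int i = 0; i < 3; i++)
        if (splat_fits(bytes, widths[i]))
            return widths[i];
    return 0;
}
```

With the bytes 0, 5, 0, 5, ... from the dumps in this report, the byte-wide test fails but the halfword-wide test succeeds, so a check of this shape would still classify the constant as cheap when it shows up as V16QI during reload.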
Aldy, Can you look into this bug?
Altivec is very popular; this is a showstopper.
I'll take this.
Does anyone have the un-preprocessed source for this bug? I'm seeing some assignments that should have casts, and I want to rule out bogus input.
(In reply to comment #21)
> Does anyone have the un-preprocessed source for this bug? I'm seeing some
> assignments that should have casts, and I want to rule out bogus input.

Comment #13 has an un-preprocessed source for a simplified version.
Created attachment 10087 [details]
original source file from xvid (xvidcore-1.1.0-beta2)

"Source" looks like:

MAKE_PASS_16(V_Pass_Avrg_Up_16_Add_Altivec_C, AVRG_UP_ADD_16_V, VARS_V, LOAD_V_16, STORE_V_16, BpS, 1)

attached.
Aldy, I have a patch for this that only needs more testing. If you want, and if you do not have any better idea than what I said in comment #17, I can take this.
Bonzini: Perhaps both approaches would be even better. We definitely should handle the transformed vector, because theoretically it's still easy to generate. And adding the extra check you mention would be icing on the cake :).
Okay, taking this. If you ever want to make SPE constants more optimized, be careful about this bug though! ;-)
Subject: Bug 24230

Author: bonzini
Date: Mon Nov 7 10:39:36 2005
New Revision: 106588

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=106588
Log:
2005-11-07  Paolo Bonzini  <bonzini@gnu.org>

        PR target/24230

        * config/rs6000/rs6000.c (easy_vector_splat_const,
        easy_vector_same, gen_easy_vector_constant_add_self): Delete.
        (vspltis_constant, easy_altivec_constant,
        gen_easy_altivec_constant): New.
        (output_vec_const_move): Use gen_easy_altivec_constant.
        (rs6000_expand_vector_init): Do not emit a set of a VEC_DUPLICATE.
        * config/rs6000/predicates.md (easy_vector_constant): Reorganize tests.
        (easy_vector_constant_add_self): Rewritten.
        * config/rs6000/rs6000-protos.h (easy_vector_splat_const,
        easy_vector_same, gen_easy_vector_constant_add_self): Remove prototype.
        (easy_altivec_constant, gen_easy_altivec_constant): Add prototype.

testsuite:
2005-11-07  Paolo Bonzini  <bonzini@gnu.org>

        PR target/24230
        * gcc.target/powerpc/altivec-consts.c,
        gcc.target/powerpc/altivec-splat.c: New testcase.

Added:
    trunk/gcc/testsuite/gcc.target/powerpc/altivec-consts.c
    trunk/gcc/testsuite/gcc.target/powerpc/altivec-splat.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/rs6000/altivec.md
    trunk/gcc/config/rs6000/predicates.md
    trunk/gcc/config/rs6000/rs6000-protos.h
    trunk/gcc/config/rs6000/rs6000.c
    trunk/gcc/config/rs6000/rs6000.h
    trunk/gcc/testsuite/ChangeLog
patch committed