[Bug libgomp/69568] New: Invalid HSAIL opcode when using builtin vector
christophe_choquet at hotmail dot com
gcc-bugzilla@gcc.gnu.org
Sat Jan 30 16:57:00 GMT 2016
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69568
Bug ID: 69568
Summary: Invalid HSAIL opcode when using builtin vector
Product: gcc
Version: hsa
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: libgomp
Assignee: unassigned at gcc dot gnu.org
Reporter: christophe_choquet at hotmail dot com
CC: jakub at gcc dot gnu.org
Target Milestone: ---
The test case is the following:
typedef float float2 __attribute__ ((vector_size (8)));
float2 *output;
....
#pragma omp target
#pragma omp teams thread_limit(256) // thread_limit is optional
#pragma omp distribute parallel for firstprivate(n) private(i)
for (i=0; i < n; i++) {
float2 a;
a[0] = i;
a[1] = 1+i;
output[i] = a;
}
At execution on Kaveri APU, HSA runtime says:
Error in hsa_code section, at offset 336:
Instruction has invalid type (f32x2), expected one of: b128, f16, f32, f64,
roimg, rwimg, s16, s32, s64, s8, samp, sig32, sig64, u16, u32, u64, u8, woimg
libgomp: HSA fatal error: Could not add a module to the HSA program
In fact, the HSAIL dump gives:
14: ld_private_align(8)_f32x2 $d1, [%a] /* BRIG offset: 380,
op0: 240, op1: 248 */
15: st_align(8)_f32x2 $d1, [$d0] /* BRIG offset: 400, op0:
240, op1: 268 */
In fact cloc.sh for a similar code sequence gives:
st_v2_global_align(8)_f32 ($s0, $s1), [$d1];
I understand the design was not to use vector instructions, but this gives poor
performance when writing complex vectors for example (bad memory pattern).
Since the current HSAIL generation strategy is to pack F32, I tried this fix:
hsa-gen.c:
if (TREE_CODE (type) == VECTOR_TYPE)
{
HOST_WIDE_INT tsize = tree_to_uhwi (TYPE_SIZE (type));
.....
case 64:
// Not working! res |= BRIG_TYPE_PACK_64;
res = BRIG_TYPE_B64;
break;
With this change, HSAIL code generated is:
14: ld_private_align(8)_u64 $d1, [%a] /* BRIG offset: 380,
op0: 240, op1: 248 */
15: st_align(8)_u64 $d1, [$d0] /* BRIG offset: 400, op0: 240,
op1: 268 */
which is correct. The program loads and gives the correct result.
More information about the Gcc-bugs
mailing list