This is the mail archive of the gcc-help@gcc.gnu.org mailing list for the GCC project.



Defect in GCC 4.4.0 bfin port: wrong code generated when accessing multiple arrays with different loop index variables within a loop (-Os option)


Consider the following test case:

Test Case Reference:
--------------------

int siVect[40] ;
int siCoeff[40] ;
int siSumofDotProduct  ;
int siIndex1, siIndex2 ;

vTestMultipleArrayAccessWithDifferentLoopIndex()
{
        for (siIndex1=0 ; siIndex1<40 ; siIndex1++)
        {
            siSumofDotProduct += siVect[siIndex1] * siCoeff[siIndex2]; 
            siIndex2++;
        }
}
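
To make a miscompile visible at run time, the test case can be wrapped in a small self-checking driver. The following is only a sketch: the input values and the helper name check_dot_product are my own assumptions and are not part of the original test, and the function has been given an explicit void return type.

```c
#include <stdio.h>

int siVect[40];
int siCoeff[40];
int siSumofDotProduct;
int siIndex1, siIndex2;

/* Same loop as in the test case, with an explicit return type. */
void vTestMultipleArrayAccessWithDifferentLoopIndex(void)
{
    for (siIndex1 = 0; siIndex1 < 40; siIndex1++)
    {
        siSumofDotProduct += siVect[siIndex1] * siCoeff[siIndex2];
        siIndex2++;
    }
}

/* Hypothetical helper: fills the arrays with arbitrary values, runs the
   function under test and compares the result against a dot product
   computed with a single index.  Returns 0 when everything matches. */
int check_dot_product(void)
{
    int expected = 0;
    int i;

    for (i = 0; i < 40; i++)
    {
        siVect[i] = i + 1;
        siCoeff[i] = 2 * i + 1;
        expected += siVect[i] * siCoeff[i];
    }

    siSumofDotProduct = 0;
    siIndex1 = 0;
    siIndex2 = 0;
    vTestMultipleArrayAccessWithDifferentLoopIndex();

    if (siSumofDotProduct != expected || siIndex1 != 40 || siIndex2 != 40)
    {
        printf("mismatch: sum=%d expected=%d index1=%d index2=%d\n",
               siSumofDotProduct, expected, siIndex1, siIndex2);
        return 1;
    }
    return 0;
}
```

Calling check_dot_product() from main and returning its value as the exit status gives a pass/fail result that can be compared between -O0 and -Os builds.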

Assembly generated by bfin port (GCC 4.4.0 --target=bfin-elf) (-Os):
--------------------------------------------------------------------
        P1 = 41 (X);
        LSETUP (.L3, .L6) LC1 = P1;  <---- (1)
        jump.s .L2;                  <---- (2)
.L3:
        R2 = [P0++];
        R3 = [P5++];
        R2 *= R3;
        R0 += 1;
        R1 = R1 + R2;
.L2:
        R2 = P2;
.L6:
        P2 += 1;


Here, the instruction marked (2), "jump.s", is wrongly generated together
with the hardware loop set up by the instruction marked (1).

My hypothesis:
----------------

Consider the following snippet of code, which is executed during the
target-dependent reorganization of loops:

Reference:
gcc4.4.0/gcc/config/bfin/bfin.c
Function: bfin_optimize_loop
---snip---
    :
    bb = loop->tail;
    last_insn = PREV_INSN (loop->loop_end);

    while (1)
    {
        int bbno = bb->index;

        for (; last_insn != PREV_INSN (BB_HEAD (bb));
             last_insn = PREV_INSN (last_insn))
            if (INSN_P (last_insn))
                break;

        if (last_insn != PREV_INSN (BB_HEAD (bb)))
            break;

        if (single_pred_p (bb)
            && single_pred (bb) != ENTRY_BLOCK_PTR)
        {
            bb = single_pred (bb);
            last_insn = BB_END (bb);
            continue;
        }
        else
        {
            last_insn = NULL_RTX;
            break;
        }
    }

    if (!last_insn)
    {
        if (dump_file)
            fprintf (dump_file, ";; loop %d has no last instruction\n",
                     loop->loop_no);

        goto bad_loop;
    }
---snip---

The 'while' loop above starts at the loop's tail block (loop->tail) and
walks its instructions backwards, from bottom to top. As soon as it finds
an instruction for which INSN_P is true, it breaks out, i.e. it assumes
that the loop is valid for replacement.

Now consider the RTL dump (produced with the -fdump-rtl-all option; the
file with the .alignment extension) at -Os:

---snip---
(jump_insn 112 86 113 2 /home/meena/Desktop/test.c:8 (set (pc)
        (label_ref 57)) -1 (nil))

(barrier 113 112 62)

(code_label 62 113 49 3 3 "" [1 uses])

(note 49 62 50 3 [bb 3] NOTE_INSN_BASIC_BLOCK)

(insn 50 49 51 3 /home/meena/Desktop/test.c:10 (set (reg:SI 2 R2 [100])
        (mem/s:SI (post_inc:SI (reg:SI 8 P0 [orig:88 ivtmp.35 ] [88]))
[2 siCoeff S4 A32])) 14 {*movsi_insn} (expr_list:REG_INC (reg:SI 8 P0
[orig:88 ivtmp.35 ] [88])
        (nil)))

(insn 51 50 52 3 /home/meena/Desktop/test.c:10 (set (reg:SI 3 R3 [102])
        (mem/s:SI (post_inc:SI (reg:SI 13 P5 [orig:89 ivtmp.31 ] [89]))
[2 siVect S4 A32])) 14 {*movsi_insn} (expr_list:REG_INC (reg:SI 13 P5
[orig:89 ivtmp.31 ] [89])
        (nil)))

(insn 52 51 53 3 /home/meena/Desktop/test.c:10 (set (reg:SI 2 R2 [100])
        (mult:SI (reg:SI 2 R2 [100])
            (reg:SI 3 R3 [102]))) 75 {mulsi3} (expr_list:REG_DEAD
(reg:SI 3 R3 [102])
        (nil)))

(insn 53 52 54 3 /home/meena/Desktop/test.c:10 (set (reg:SI 1 R1
[orig:94 siSumofDotProduct_lsm.17 ] [94])
        (plus:SI (reg:SI 1 R1 [orig:94 siSumofDotProduct_lsm.17 ] [94])
            (reg:SI 2 R2 [100]))) 45 {addsi3} (expr_list:REG_DEAD
(reg:SI 2 R2 [100])
        (nil)))

(insn 54 53 57 3 /home/meena/Desktop/test.c:10 (set (reg:SI 0 R0
[orig:90 ivtmp.25 ] [90])
        (plus:SI (reg:SI 0 R0 [orig:90 ivtmp.25 ] [90])
            (const_int 1 [0x1]))) 45 {addsi3} (nil))

(code_label 57 54 58 4 2 "" [1 uses])

(note 58 57 60 4 [bb 4] NOTE_INSN_BASIC_BLOCK)

(insn 60 58 61 4 /home/meena/Desktop/test.c:10 (set (reg:SI 2 R2
[orig:87 siIndex2_lsm.37 ] [87])
        (reg:SI 10 P2 [orig:91 ivtmp.22 ] [91])) 14 {*movsi_insn} (nil))

(insn 61 60 85 4 /home/meena/Desktop/test.c:10 (set (reg:SI 10 P2
[orig:91 ivtmp.22 ] [91])
        (plus:SI (reg:SI 10 P2 [orig:91 ivtmp.22 ] [91])
            (const_int 1 [0x1]))) 45 {addsi3} (nil))

(jump_insn 85 61 65 4 /home/meena/Desktop/test.c:8 (parallel [
            (set (pc)
                (if_then_else (ne (reg:SI 9 P1 [106])
                        (const_int 1 [0x1]))
                    (label_ref 62)
                    (pc)))
            (set (reg:SI 9 P1 [106])
                (plus:SI (reg:SI 9 P1 [106])
                    (const_int -1 [0xffffffff])))
            (unspec [
                    (const_int 0 [0x0])
                ] 10)
            (clobber (scratch:SI))
        ]) 89 {loop_end} (expr_list:REG_BR_PROB (const_int 9100
[0x238c])
        (nil)))
---snip---

Please note that the loop body is in block 3, marked by

(note 49 62 50 3 [bb 3] NOTE_INSN_BASIC_BLOCK)

Whereas the loop end instruction (jump_insn 85) is part of a separate
block, block 4, marked by

(note 58 57 60 4 [bb 4] NOTE_INSN_BASIC_BLOCK)

This block has two predecessors, "bb 2" and "bb 3". As I understand it,
the above snippet works correctly only when the loop body and the loop
end instruction are part of the same basic block and that block has only
one predecessor. These conditions are not met with the -Os option, so the
loop should not be replaced by a hardware loop in this case.

But because of the following instructions (for which INSN_P is true) in
basic block 4,

(insn 60 58 61 4 /home/meena/Desktop/test.c:10 (set (reg:SI 2 R2
[orig:87 siIndex2_lsm.37 ] [87])
        (reg:SI 10 P2 [orig:91 ivtmp.22 ] [91])) 14 {*movsi_insn} (nil))

(insn 61 60 85 4 /home/meena/Desktop/test.c:10 (set (reg:SI 10 P2
[orig:91 ivtmp.22 ] [91])
        (plus:SI (reg:SI 10 P2 [orig:91 ivtmp.22 ] [91])
            (const_int 1 [0x1]))) 45 {addsi3} (nil))

the 'while' loop breaks out early, and the code for a hardware loop is
wrongly generated.
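
To illustrate what I mean, here is a toy model of the backward scan. It is only a sketch under my own assumptions: toy_kind, toy_block and both functions are hypothetical stand-ins and not GCC's real RTL or basic_block types, and the ENTRY_BLOCK_PTR test is approximated by a block with no predecessors.

```c
#include <stddef.h>

/* Hypothetical stand-ins for RTL chain entries. */
enum toy_kind { TOY_NOTE, TOY_LABEL, TOY_INSN };

/* Hypothetical stand-in for a basic block. */
struct toy_block
{
    const enum toy_kind *items;   /* block contents, first to last */
    int n_items;
    struct toy_block *preds[2];   /* predecessor blocks */
    int n_preds;
};

/* Scan backwards from items[start] for a real instruction.  If the
   block holds none, continue in its single predecessor; give up
   (return NULL) when the block does not have exactly one predecessor.
   Note that the predecessor count is consulted only when the current
   block contains no real instruction. */
struct toy_block *toy_find_last_insn(struct toy_block *bb, int start, int *pos)
{
    int i = start;

    for (;;)
    {
        for (; i >= 0; i--)
            if (bb->items[i] == TOY_INSN)
            {
                *pos = i;
                return bb;
            }

        if (bb->n_preds != 1)
            return NULL;

        bb = bb->preds[0];
        i = bb->n_items - 1;
    }
}

/* Models bb 4 from the dump: label 57, note 58, insn 60, insn 61, with
   predecessors bb 2 and bb 3.  The scan starts just before the loop_end
   jump.  Returns 1 when the scan stops inside bb 4 even though that
   block has two predecessors. */
int toy_scan_stops_in_two_pred_block(void)
{
    static const enum toy_kind bb4_items[] =
        { TOY_LABEL, TOY_NOTE, TOY_INSN, TOY_INSN };
    static struct toy_block bb2 = { NULL, 0, { NULL, NULL }, 0 };
    static struct toy_block bb3 = { NULL, 0, { NULL, NULL }, 0 };
    static struct toy_block bb4 = { bb4_items, 4, { &bb2, &bb3 }, 2 };
    int pos = -1;
    struct toy_block *found;

    found = toy_find_last_insn(&bb4, bb4.n_items - 1, &pos);
    return found == &bb4 && pos == 3 && bb4.n_preds == 2;
}
```

In this model the scan accepts insn 61 as the last instruction without ever looking at the predecessors of bb 4, which is the behaviour I suspect in bfin_optimize_loop.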


Please verify my understanding.


Thanks and Regards,
Meena










