RFA: Add lock_lenth attribute to support the ARC port (Was: Re: Ping: RFA: add lock_length attribute to break branch-shortening cycles)

Joern Rennecke joern.rennecke@embecosm.com
Wed Oct 24 13:43:00 GMT 2012


Quoting Richard Biener <richard.guenther@gmail.com>:

> Just to add some extra information, can you quote your ports
> ADJUST_INSN_LENGTH and one example instruction with length/lock_length
> attribute the above applies to?

This is a bit besides the point, since lock_length does mostly the traditional
branch shortening (with alignment effects rolled in for branches - I'm not
really happy about this, but that was required for convergence), length
does alignment effects on top, and ADJUST_INSN_LENGTH does random
context-sensitive adjustments related to pipeline quirks and conditional
execution.

I'm afraid this gcc port's logic in this area is both more complicated than
what you would expect in an average gcc port, and much more simplistic than
what a competent assembler programmer uses to optimize code for these
microarchitectures.

from arc.h:

#define ADJUST_INSN_LENGTH(X, LENGTH)                           \
   (LENGTH) += arc_adjust_insn_length ((X), (LENGTH))

from arc.c:

/* Return length adjustment for INSN.  */
int
arc_adjust_insn_length (rtx insn, int len)
{
   int adj = 0;

   if (!INSN_P (insn))
     return 0;
   if (GET_CODE (PATTERN (insn)) == SEQUENCE)
     {
       int adj0, adj1, len1;
       rtx pat = PATTERN (insn);
       rtx i0 = XVECEXP (pat, 0, 0);
       rtx i1 = XVECEXP (pat, 0, 1);

       len1 = get_attr_lock_length (i1);
       gcc_assert (!len || len >= 4 || (len == 2 && get_attr_iscompact (i1)));
       if (!len1)
         len1 = get_attr_iscompact (i1) != ISCOMPACT_FALSE ? 2 : 4;
       adj0 = arc_adjust_insn_length (i0, len - len1);
       adj1 = arc_adjust_insn_length (i1, len1);
       return adj0 + adj1;
     }
   if (recog_memoized (insn) == CODE_FOR_doloop_end_i)
     {
       rtx prev = prev_nonnote_insn (insn);

       return ((LABEL_P (prev)
                || (TARGET_ARC600
                    && (JUMP_P (prev) || GET_CODE (PATTERN (prev)) ==  
SEQUENCE)))
               ? 4 : 0);
     }

   /* Check for return with but one preceding insn since function
      start / call.  */
   if (TARGET_PAD_RETURN
       && JUMP_P (insn)
       && GET_CODE (PATTERN (insn)) != ADDR_VEC
       && GET_CODE (PATTERN (insn)) != ADDR_DIFF_VEC
       && get_attr_type (insn) == TYPE_RETURN)
     {
       rtx prev = prev_active_insn (insn);

       if (!prev || !(prev = prev_active_insn (prev))
           || ((NONJUMP_INSN_P (prev)
                && GET_CODE (PATTERN (prev)) == SEQUENCE)
               ? CALL_ATTR (XVECEXP (PATTERN (prev), 0, 0), NON_SIBCALL)
               : CALL_ATTR (prev, NON_SIBCALL)))
         return 4;
     }
   /* Rtl changes too much before arc_reorg to keep ccfsm state.
      But we are not required to give exact answers then.  */
   if (cfun->machine->arc_reorg_started
       && (JUMP_P (insn) || (len & 2)))
     {
       struct arc_ccfsm *statep = &cfun->machine->ccfsm_current;

       arc_ccfsm_advance_to (insn);
       switch (statep->state)
         {
         case 0:
           break;
         case 1: case 2:
           /* Deleted branch.  */
           return -len;
         case 3: case 4: case 5:
           /* Conditionalized insn.  */
           if ((!JUMP_P (insn)
                || (get_attr_type (insn) != TYPE_BRANCH
                    && get_attr_type (insn) != TYPE_UNCOND_BRANCH
                    && (get_attr_type (insn) != TYPE_RETURN
                        || (statep->cc != ARC_CC_EQ && statep->cc != ARC_CC_NE)
                        || NEXT_INSN (PREV_INSN (insn)) != insn)))
               && (len & 2))
             adj = 2;
           break;
         default:
           gcc_unreachable ();
         }
     }
   if (TARGET_ARC600)
     {
       rtx succ = next_real_insn (insn);

       if (succ && INSN_P (succ))
         adj += arc600_corereg_hazard (insn, succ);
     }

   /* Restore extracted operands - otherwise splitters like the  
addsi3_mixed one
      can go awry.  */
   extract_constrain_insn_cached (insn);

   return adj;
}

 From arc.md:

; Since the demise of REG_N_SETS, it is no longer possible to find out
; in the prologue / epilogue expanders how many times blink is set.
; Using df_regs_ever_live_p to decide if blink needs saving means that
; any explicit use of blink will cause it to be saved; hence we cannot
; represent the blink use in return / sibcall instructions themselves, and
; instead have to show it in EPILOGUE_USES and must explicitly
; forbid instructions that change blink in the return / sibcall delay slot.
(define_insn "return_i"
   [(return)]
   "reload_completed"
{
   rtx reg
     = gen_rtx_REG (Pmode,
                    arc_return_address_regs[arc_compute_function_type (cfun)]);

   if (TARGET_PAD_RETURN)
     arc_pad_return ();
   output_asm_insn (\"j%!%* [%0]%&\", &reg);
   return \"\";
}
   [(set_attr "type" "return")
    (set_attr "cond" "canuse")
    (set (attr "iscompact")
         (cond [(eq (symbol_ref "arc_compute_function_type (cfun)")
                    (symbol_ref "ARC_FUNCTION_NORMAL"))
                (const_string "maybe")]
               (const_string "false")))
    (set (attr "length")
         (cond [(eq (symbol_ref "arc_compute_function_type (cfun)")
                    (symbol_ref "ARC_FUNCTION_NORMAL"))
                (const_int 4)
                (and (match_test "0") (eq (match_dup 0) (pc)))
                (const_int 4)
                (eq_attr "verify_short" "no")
                (const_int 4)]
               (const_int 2)))])

Earlier in arc.md:

; When estimating sizes during arc_reorg, when optimizing for speed, there
; are three reasons why we need to consider branches to be length 6:
; - annull-false delay slot insns are implemented using conditional execution,
;   thus preventing short insn formation where used.
; - for ARC600: annull-true delay slot isnns are implemented where possile
;   using conditional execution, preventing short insn formation where used.
; - for ARC700: likely or somewhat likely taken branches are made long and
;   unaligned if possible to avoid branch penalty.
(define_insn "*branch_insn"
   [(set (pc)
         (if_then_else (match_operator 1 "proper_comparison_operator"
                                       [(reg CC_REG) (const_int 0)])
                       (label_ref (match_operand 0 "" ""))
                       (pc)))]
   ""
   "*
{
   if (arc_ccfsm_branch_deleted_p ())
     {
       arc_ccfsm_record_branch_deleted ();
       return \"; branch deleted, next insns conditionalized\";
     }
   else
     {
       arc_ccfsm_record_condition (operands[1], 0, insn, 0);
       if (get_attr_length (insn) == 2)
          return \"b%d1%? %^%l0%&\";
       else
          return \"b%d1%# %^%l0\";
     }
}"
   [(set_attr "type" "branch")
    (set
      (attr "lock_length")
      (cond [
        ; In arc_reorg we just guesstimate; might be more or less.
        (match_test "arc_branch_size_unknown_p ()")
        (const_int 4)

        (eq_attr "delay_slot_filled" "yes")
        (const_int 4)

        (ne
          (if_then_else
            (match_operand 1 "equality_comparison_operator" "")
            (ior (lt (minus (match_dup 0) (pc)) (const_int -512))
                 (gt (minus (match_dup 0) (pc))
                     (minus (const_int 506)
                            (symbol_ref "get_attr_delay_slot_length (insn)"))))
            (ior (match_test "!arc_short_comparison_p (operands[1], -64)")
                 (lt (minus (match_dup 0) (pc)) (const_int -64))
                 (gt (minus (match_dup 0) (pc))
                     (minus (const_int 58)
                            (symbol_ref "get_attr_delay_slot_length  
(insn)")))))
          (const_int 0))
        (const_int 4)

        (eq_attr "verify_short" "yes")
        (const_int 2)]
       (const_int 4)))
    (set (attr "iscompact")
         (cond [(eq_attr "lock_length" "2") (const_string "true")]
               (const_string "false")))])

> Which sounds straight-forward.  The docs of ADJUST_INSN_LENGTH
> are not entirely clear, but it sounds like it may only increase   
> length, correct?

No, it may decrease the length, but not below zero.

in fact, the arc port does decrease the length to zero in some cases
where a branch is deleted.

> I see that ADJUST_INSN_LENGTH is really not needed as everything
> should be expressable in the length attribute of an instruction?

In theory, yes, in a similar vein as you can use a turing machine to
implement a text editor.
Each different result length needs a different cond or if clause, so if
you have a C function that calculates the length, you need to evaluate
it multiple times.

>> If you define the @code{lock_length} attribute, the @code{lock_length}
>> attribute will be evaluated, and then the maximum with of @code{lock_length}
>
> with of?  I read it as 'of the'

I meant 'with the'.  I.e. maximum of current lock_length value and the  
(maximum)
lock_length value from the previous iteration.

>> value from the previous iteration will be formed and saved.
>
> So lock_length will only increase during iteration.

Yes.

>> Then the maximum of that value with the @code{length} attribute will
>> be formed, and @code{ADJUST_INSN_LENGTH} will be applied.
>
> ADJUST_INSN_LENGTH will be applied to the maximum?  What will
> be the 'instruction length' equivalent to the simple non-lock_length case?
> Is it the above, max (ADJUST_INSN_LENGTH (lock-length-max, length))?

Huh?  Your parentheses are muddled, so I'm not sure what exactly you
meant to ask.  And ADJUST_INSN_LENGTH is not really a function.  Well,
if we pretend that it was, the expression would be:

ADJUST_INSN_LENGTH (max (lock_length_max, length))

>> Thus, you can allow the length to vary downwards as well as upwards
>> across iterations with suitable definitions of the @code{length} attribute
>> and/or @code{ADJUST_INSN_LENGTH}.  Care has to be taken that this does not
>> lead to infinite loops.
>
> I don't see that you can shrink length with just suitable lock_length and
> length attributes.

You can; have lock_length evaluate to something small, e.g. 0, and then
length dominates the value that is calculated each iteration.

>  What seems to be the cruical difference is that you
> apply ADJUST_INSN_LENGTH after combining lock-length-max and length.
> But then you _decrease_ length with ADJUST_INSN_LENGHT ...
>
> Maybe what you really want is ADJUST_INSN_LENGTH_AFTER which is
> applied afterwards?  Thus,

Maybe that would work, shifting the alignment computations into a
late-applied ADJUST_INSN_LENGTH; the ARC port has relations between  
length, lock_length, iscompact, and verify_short that form dependency  
circles.
When I try to unify length and lock_length,
genattrtab goes into an infinite loop, so there is no way of testing the
port till genattrtab knows when to quit.

> Note that
>
>> Care has to be taken that this does not
>> lead to infinite loops.
>
> doesn't convince me that is properly designed ;)

There are circles that emerge naturally from a description of the  
requirements.
Some local decisions need to be compromised to reach a globally stable
solution.  I don't see how the infrastructure can make a meaningful  
choice where this compromise should occur.
One thing that would be nice to have would be to have a tally of previous
values, so we could decide better when to compromise.

Well, there would be one way to offload the heavy lifting to the
infrastructure: have another attribute that assigns looping thresholds,
for the number of iterations where things change without a lock_length change.
The when an instruction breaks the limit, the new length, if larger  
than lock_length, would be stored in lock_length.
So in the simple length case, that threshold is zero.
What would you call such an attribute?
length_lock_threshold ?  length_iter_threshold ? length_iteration_limit ?



More information about the Gcc-patches mailing list