This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]
Re: [patch] lno branch merge - vectorizer patch #5

From: Dorit Naishlos <DORIT at il dot ibm dot com>
To: Olga Golovanevsky <OLGA at il dot ibm dot com>
Cc: gcc-patches at gcc dot gnu dot org, Mark Mitchell <mark at codesourcery dot com>, Ayal Zaks <ZAKS at il dot ibm dot com>
Date: Wed, 22 Sep 2004 12:17:16 +0200
Subject: Re: [patch] lno branch merge - vectorizer patch #5
The functionality of this patch is ok with respect to the vectorizer. I
have a few minor comments, mostly with respect to style/documentation:

General comments:

- There are places where the code overflows 80 columns
- There are functions without documentation
- I would appreciate if you could follow the style conventions within
tree-vectorizer.c (namely - the documentation before each function ("/*
Function .... */) and the spaces between functions).

Specific comments:
(see [] for typos and such within your text)

> + /* This page contains code for general loop transfomations to
> +    be applied to some loops before vectorizing it
> +    to make them suitable for vectorization.  */

I think the new utilities are better described as: "Utilities to support
loop peeling for vectorization purposes". (I think the current comment is a
bit misleading. I don't see these new utilities as "general loop
transformations to make loops suitable for vectorization"; to me, this
description implies that these transformations take a loop that is
inherently not vectorizable - because of dependences or what not - and
convert it to a vectorizable loop. This is not the case here. The utilities
you added are used after it has already been established that the loop is
vectorizable, as mechanisms that help the actual vectorization
transformation or that optimize the loop - e.g, force alignment of
accesses.

> + /* This function copies phis from LOOP header to
> +    NEW_LOOP header. AFTER is as
> +    from update_phis_for_duplicate_loop function.  */

"from" --> "in".


> + /* Update PHI nodes for a guard of the LOOP.
> +
> +    LOOP is supposed to have [ a ] preheader bb at which [ a ] guard
condition is located.
> +    The true edge of this condition skips the LOOP and ends
> +    at [ the ] destination of the (unique) LOOP exit. [ The ] Loop exit
bb is suppose[ d ] to be
> +    [ an ] empty [ bb ] (created by this transformation) with one
successor.
> +
> +    This function creates phi nodes at LOOP exit bb. These phis need to
be
> +    created as a result of adding true edge coming from guard.
--> [ created as a result of the new edge coming in from the new guard code
that was added. ]

> +    After the phis creation, [ the ] function updates [ the ] values of
phi nodes from[<-- at?]
> +    the LOOP exit successor bb.  */

Please consider adding something like the following to the documentation:

Original loop:

bb0: loop preheader
     goto bb1

bb1: loop header
     if (exit_cond) goto bb3 else goto bb2

bb2: loop latch
     goto bb1

bb3:

After guard creation:

bb0: loop preheader
     if (guard_condition) goto bb4 else goto bb1

bb1: loop header
     if (exit_cond) goto bb4 else goto bb2

bb2: loop latch
     goto bb1

bb4: loop exit
     (new empty bb)
     goto bb3

bb3:

The function update_phi_nodes_for_guard updates the phi nodes in bb4 and in
bb3, to account for the new edge from bb0 to bb4 (is this a correct
description?).


> /* Make the LOOP iterate NITERS times. It does so[<--This is done ] by
adding a new IV
> +    that starts at zero[ , ] increas[ es ] by one and its limit is
niters[<--NITERS].  */


> /* Given LOOP this function generates a new copy of it and put[ s ] it on
either
> +    the entry if E is the entry edge or at its exit if E is the exit
edge.  */
--> [ and puts it on E, which is either the entry or exit of LOOP ]


> + /* Given the condition statement COND, locate[<-- put/place] it as [
the ] last statement
> +    of GUARD_BB; EXIT_BB is the basic block to skip the loop;


> /* This function verify that LOOP corresponds to
> +    transformation conditions.  */
--> [ This function verifies that certain restrictions apply to LOOP.]


> + /* Given LOOP this function duplicates it to the edge E.
--> [ Given LOOP this function duplicates it "to the edge E" (i.e, places
the loop copy at the entry/exit edge E).]


> +
> +    This function serves as a basic transformation to be done on LOOP
prior
> +    to vectorizing it.
--> [ This transformation takes place before the loop is vectorized.]


> +    For now, there are two main cases when it's used
> +    by [ the ] vectorizer: to support loops with unknown loop bounds
[ (or loop bounds that don't divide by the vectorization factor), ]


> +    and to deal with
> +    alignment unknown at compile time.
--> [and to force the alignment of data references in the loop. ]


> +
> +    FIRST_NITERS (SSA_NAME) parameter specifies how many time[ s ] to
iterate
> +    the first loop. If UPDATE_FIRST_LOOP_COUNT parameter is false, the
first
> +    loop will be iterated FIRST_NITERS times by introducing additional
> +    induction variable and replacing loop exit condition. If
> +    UPDATE_FIRST_LOOP_COUNT is true no change to [ the ] first loop is
made and
> +    the caller to tree_duplicate_loop_to_edge is responsible for
updating
> +    the first loop count.
> +
> +    NITERS (also SSA_NAME) parameter defines the number of iteration the
> +    original loop has been iterated.
--> [original loop iterated.]


> +    For now this function supports only types of loops that can be
> +    vectorizered after the transformation. Such types are the following:
--> [For now this function supports only loop forms that are candidate for
vectorization:]


> +    (5) loops without defs that are used after the loop
> +
...
> +    (5) is checked as part of the function
vect_mark_stmts_to_be_vectorized,
> +    when excluding induction/reduction support.

Note that what is checked in vect_mark_stmts_to_be_vectorized is not
entirely equivalent to the restriction that (5) stands for.


> +   LOOP_DO_PEELING_FOR_ALIGNMENT (res) = 0;

false instead of 0?


> ! }
> !
> ! /* This function generate[ s ] the following statements:
> !
> !  ni_name = number of iterations loop executes
> !  ratio = ni_name / vf
> !  ratio_mult_vf_name = ratio * vf
> !
> !  and locate them at the loop preheader edge.  */
[ and places them at the loop preheader edge. ]

(or better:
/* Function vect_generate_tmps_on_preheader

   Generate the following sequence at the loop preheader edge:
   ....
*/)


> !   /* Call duplicate_loop_body_on_entry_exit to have the vectorized
> !      loop executes RATION_MULT_VF times and the rest in a new
duplication
> !      after that - so we duplicate on exit edge.  */

This comment is not so clear to me


> ! static tree
> ! vect_gen_niters_for_prolog_loop (loop_vec_info loop_vinfo)
>   {

The function name gives the impression that this is a general utility to
set "niters_for_prolog_loop" whereas it is tailored specifically for
forcing the alignment of a data reference in the loop. Maybe consider
changing the function name (and document the function).


> +   if (LOOP_DO_PEELING_FOR_ALIGNMENT[ ](loop_vinfo)
(space missing)


> +     {
> +       vect_do_peeling_for_alignment (loop_vinfo, loops);
> +     }
redundant '{' '}'.


> !   fprintf (dump_file,
> !          "not vectorized: number of iterations cannot be specified.");

"specified" --> "computed".


> --- 3752,3781 ----
>         struct data_reference *dr = VARRAY_GENERIC_PTR
(loop_write_datarefs, i);
>         if (!aligned_access_p (dr))
>     {
> !     /* Decide here whether we need peeling for alignment.  */
> !     /*    if (unknown_alignment_for_access_p (dr))
> !           { */
> !         decide_peeling_count++;
> !         if (decide_peeling_count > MAX_NUMBER_OF_UNALIGNED_DATA_REFS)
> !         {
> !           if (vect_debug_stats (loop) || vect_debug_details (loop))
> !             fprintf (dump_file,
> !                    "not vectorized: too many stores with unknonwn
alignment.\n");
> !           return false;
> !         }
> !         else
> !         {
> !           LOOP_UNALIGNED_DR (loop_vinfo, decide_peeling_count - 1) =
dr;
> !           LOOP_DO_PEELING_FOR_ALIGNMENT (loop_vinfo) = true;
> !         }
> !         /*}
> !     else
> !       {
> !         if (vect_debug_stats (loop) || vect_debug_details (loop))
> !         fprintf (dump_file,
> !                "not vectorized: store has known alignment different
from 0.\n");
> !         return false;
> !         } */
>     }
>       }

Can you remove the code fragments that are commented out?

Also, I think we should explain here that we currently use a heuristic that
peels for a single store access in the loop (because the vectorizer doesn't
support misaligned stores yet, but it will soon support misaligned loads),
but that we plan to develop a better cost model to guide the decision on
which data-access to peel for.


> +    These restrictions will be rel[a]xed in the future.  */


> ! else
> !   {
> !     /* We need only one loop entry for unknown loop bound support.  */
> !     if (loop->num_entries != 1 || !loop->pre_header)
> !       {
> !         if (vect_debug_stats (loop) || vect_debug_details (loop))
> !         fprintf (dump_file,
> !                "not vectorized: more than one loop entry.");
> !         return NULL;
> !       }
> !   }

don't we want to check for this always? I think we rely on there being a
single entry even when the loop bound is simple.

Also applies to:

> !   if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> !       && LOOP_VINFO_INT_NITERS (loop_vinfo) % vectorization_factor !=
0)
>       {
> !       /* In this case we have to generate epilog loop, that
> !    can be done only for loops with one entry edge.  */
> !       if (LOOP_VINFO_LOOP (loop_vinfo)->num_entries != 1
> !     || !(LOOP_VINFO_LOOP (loop_vinfo)->pre_header))
> !   {
> !     if (vect_debug_stats (loop) || vect_debug_details (loop))
> !       fprintf (dump_file, "not vectorized: More than one loop
preheader.");
> !     return false;
> !   }
>      }


> !   if (LOOP_VINFO_INT_NITERS (loop_vinfo) == 0)
>       {
>         if (vect_debug_stats (loop) || vect_debug_details (loop))
>           fprintf (dump_file, "not vectorized: number of iterations =
0.");
>         return NULL;
>       }

could use integer_zerop (LOOP_VINFO_NITERS (loop))


thanks,

dorit




                                                                                                                                   
                      Olga Golovanevsky                                                                                            
                                               To:       gcc-patches@gcc.gnu.org                                                   
                      19/09/2004 23:54         cc:       Mark Mitchell <mark@codesourcery.com>, Dorit Naishlos/Haifa/IBM@IBMIL,    
                                                Ayal Zaks/Haifa/IBM@IBMIL                                                          
                                               From:     Olga Golovanevsky/Haifa/IBM@IBMIL                                         
                                               Subject:  [patch] lno branch merge - vectorizer patch #5                            
                                                                                                                                   



Hi,

This patch enhances vectorizer with support of loops
with unaligned stores. Both cases are handled:
a store which alignment is unknown in compile time
and a store with known unalignment.

The patch is base on loop transformation introduced
in vectorizer patch #2
(http://gcc.gnu.org/ml/gcc-patches/2004-09/msg01104.html,
hopefully to be reviewed!)

Tested on powerpc-apple-darwin7.0.0. All vectorizer test
pass. Testcases 26 and 28 are changed from
xfail to pass; additional two loops are vectorized in
testcase 31.

The diff includes also vectorizer patch #2, while
ChangeLog is only for this patch.

thanks,
olga

ChengeLog:


        * tree-vectorizer.c (vect_get_loop_niters):
        (vect_gen_niters_for_prolog_loop):
        (vect_update_niters_after_peeling):
        (vect_update_inits_of_dr):
        (vect_update_inits_of_drs):
        (vect_do_peeling_for_alignment): New functions.
        (vect_transform_loop): Added unalignment support.
        (vect_analyze_data_refs_alignment): Allowed one unaligned
store.
        * tree-vectorizer.h (MAX_NUMBER_OF_UNALIGNED_DATA_REFS): New
define.
        (do_peeling_for_alignment):
        (unaligned_drs): New members of _loop_vec_info.
        (LOOP_DO_PEELING_FOR_ALIGNMENT): New macro.



#### peeling_main1.txt has been removed from this note on September 22,
2004 by Dorit Naishlos
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]