Patch: Add #pragma ivdep support to the ME and C FE

Wed Oct 16 14:58:00 GMT 2013

Frederic Riss wrote:
> Just one question. You describe the pragma in the doco patch as:
>
> +This pragma tells the compiler that the immediately following @code{for}
> +loop can be executed in any loop index order without affecting the result.
> +The pragma aids optimization and in particular vectorization as the
> +compiler can then assume a vectorization safelen of infinity.
>
> I'm not a specialist, but I was always told that the 'original'
> meaning of ivdep (which I believe was introduced by Cray), was that
> the compiler could assume that there are only forward dependencies in
> the loop, but not that it can be executed in any order.

The nice thing about #pragma ivdep is that there is no real standard, and
the explanations given by the different vendors are not completely clear either.

An overview is given on pages 13-14 of the following file, covering
Cray Research PVP, MIPSPRO & Open64, Intel ICC, and Multiflow:
http://sysrun.haifa.il.ibm.com/hrl/greps2007/papers/GREPS2007-Benoit.pdf

That's summarized as:
- vector: ignore lexical upward dependencies (Cray PVP, Intel ICC)
- parallel: ignore loop-carried dependencies (MIPSPRO, Open64)
- liberal: ignore loop-variant dependencies (Multiflow)

The quotes for Cray and Intel are below.

Cray: http://docs.cray.com/books/004-2179-001/html-004-2179-001/brtlrwh.html#EKZ5MRWH
"The ivdep directive tells the compiler to ignore vector dependencies for
 the loop immediately following the directive. Conditions other than vector
 dependencies can inhibit vectorization. If these conditions are satisfactory,
 the loop vectorizes. This directive is useful for some loops that contain
 pointers and indirect addressing. The format of this directive is as follows:
 #pragma _CRI ivdep"

Intel: http://software.intel.com/sites/products/documentation/doclib/iss/2013/compiler/cpp-lin/GUID-B25ABCC2-BE6F-4599-AEDF-2434F4676E1B.htm
"The ivdep pragma instructs the compiler to ignore assumed vector dependencies.
 To ensure correct code, the compiler treats an assumed dependence as a proven
 dependence, which prevents vectorization. This pragma overrides that decision.
 Use this pragma only when you know that the assumed loop dependencies are safe
 to ignore."

> The Intel docs give this example:
...
> Given your description, this loop wouldn't be a candidate for ivdep,
> as reversing the loop index order changes the semantics. I believe
> that the way you interpret it (ie. setting vectorization safe length
> to INT_MAX) is correct with respect to this other definition, though.

Do you have a suggestion for a better wording? My idea was to interpret
this part similarly to OpenMP's simd with safelen=infinity. (Actually, I
believe loop->safelen was added for OpenMPv4's and/or Cilk Plus's "simd".)
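
To make the difference between "any loop index order" and "safelen of
infinity" concrete, here is a small made-up loop (it is not the Intel
example elided above, and all function names are hypothetical). Iteration
i reads a[i + 1] before iteration i + 1 overwrites it. shift_chunked only
emulates a SIMD chunk in scalar code by doing all loads of a chunk before
its stores; it is a sketch of the idea, not what any compiler emits.

```c
#include <assert.h>
#include <stddef.h>

/* Original order: iteration i reads a[i + 1] before iteration i + 1
   overwrites it (a loop-carried anti-dependence). */
void shift_forward(int *a, size_t n)
{
    for (size_t i = 0; i + 1 < n; i++)
        a[i] = a[i + 1] + 1;
}

/* The same iterations, run from the highest index down to 0. */
void shift_reverse(int *a, size_t n)
{
    for (size_t i = n; i >= 2; i--)
        a[i - 2] = a[i - 1] + 1;
}

/* Scalar emulation of SIMD execution with vl lanes (vl is a hypothetical
   vector length): all loads of a chunk happen before any of its stores. */
void shift_chunked(int *a, size_t n, size_t vl)
{
    int tmp[64];
    assert(vl >= 1 && vl <= 64);

    size_t iters = n > 0 ? n - 1 : 0;
    for (size_t i = 0; i < iters; i += vl) {
        size_t lanes = (iters - i < vl) ? iters - i : vl;
        for (size_t k = 0; k < lanes; k++)
            tmp[k] = a[i + k + 1] + 1;   /* load phase */
        for (size_t k = 0; k < lanes; k++)
            a[i + k] = tmp[k];           /* store phase */
    }
}
```

For the input {10, 20, 30, 40, 50}, shift_forward and shift_chunked (with
any vl from 1 to 64) both give {21, 31, 41, 51, 50}, while shift_reverse
gives {54, 53, 52, 51, 50}. So a loop like this is fine under the
safelen=infinity reading, but not under the "any loop index order" wording.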

OpenMPv4.0, http://www.openmp.org/mp-documents/OpenMP4.0.0.pdf , says
the following about it (excerpt from page 70):
"A SIMD loop has logical iterations numbered 0, 1,...,N-1 where N is the
number of loop iterations, and the logical numbering denotes the sequence
in which the iterations would be executed if the associated loop(s) were
executed with no SIMD instructions. If the safelen clause is used then no
two iterations executed concurrently with SIMD instructions can have a
greater distance in the logical iteration space than its value. The
parameter of the safelen clause must be a constant positive integer
expression. The number of iterations that are executed concurrently at
any given time is implementation defined. Each concurrent iteration will
be executed by a different SIMD lane. Each set of concurrent iterations
is a SIMD chunk."
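
As a rough sketch of what a finite safelen buys, consider a made-up
recurrence with a dependence distance of 8 (DIST and all function names
are hypothetical). safelen(4) is deliberately chosen well below that
distance, so the example does not hinge on how the boundary case is read.
recur_chunked again only emulates SIMD chunks in scalar code; it is not
what a compiler generates.

```c
#include <assert.h>
#include <stddef.h>

enum { DIST = 8 };  /* hypothetical dependence distance */

/* Sequential reference: iteration i reads what iteration i - DIST wrote. */
void recur_seq(int *a, size_t n)
{
    for (size_t i = DIST; i < n; i++)
        a[i] = a[i - DIST] + 1;
}

/* The same loop, annotated for OpenMP; the pragma is only seen when the
   compiler has OpenMP enabled, otherwise this is the plain scalar loop. */
void recur_simd(int *a, size_t n)
{
#ifdef _OPENMP
    #pragma omp simd safelen(4)
#endif
    for (size_t i = DIST; i < n; i++)
        a[i] = a[i - DIST] + 1;
}

/* Scalar emulation of SIMD chunks of vl lanes: loads before stores. */
void recur_chunked(int *a, size_t n, size_t vl)
{
    int tmp[64];
    assert(vl >= 1 && vl <= 64);

    for (size_t i = DIST; i < n; i += vl) {
        size_t lanes = (n - i < vl) ? n - i : vl;
        for (size_t k = 0; k < lanes; k++)
            tmp[k] = a[i + k - DIST] + 1;  /* load phase */
        for (size_t k = 0; k < lanes; k++)
            a[i + k] = tmp[k];             /* store phase */
    }
}
```

On a zero-initialized array of 32 elements, recur_seq produces 1, 2 and 3
in a[8..15], a[16..23] and a[24..31]. recur_chunked with vl = 4 matches
that, whereas vl = 16 leaves a[16] at 1, because the chunk loads a[8]
before it has been updated. That is the situation a safelen bound is meant
to rule out, and the one an ivdep with safelen=infinity asserts cannot occur.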

> Oh, and are there any plans to maintain this information in some way
> till the back-end? Software pipelining could be another huge winner
> for that kind of dependency analysis simplification.

I don't know how long loop->safelen is kept. As it is still around late
in the middle-end, passing this information on to the back-end should be
simple.

Tobias