This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

[Bug fortran/51638] gfortran optimization breaks a single variable used as both input and output for subroutine call

From: "burnus at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
To: gcc-bugs at gcc dot gnu dot org
Date: Thu, 05 Jan 2012 14:26:25 +0000
Subject: [Bug fortran/51638] gfortran optimization breaks a single variable used as both input and output for subroutine call
Auto-submitted: auto-generated
References: <bug-51638-4@http.gcc.gnu.org/bugzilla/>

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51638

--- Comment #7 from Tobias Burnus <burnus at gcc dot gnu.org> 2012-01-05 14:26:25 UTC ---
(In reply to comment #6)
> > Fortran code is invalid.
> This code is like it is since its first commit on 11-Jun-93

This does not mean that was and/or is valid. Newer compilers tend to optimize
more aggressively than older.


> I still do think there is a problem with gfortran, because 1) it is the only
> one which breaks this simple code regarding our 25 years experience, and 2)
> because it randomly breaks it depending on the architecture and options.

Well, as long as there is no valid program which breaks, I believe that
gfortran does not have a bug (with regards to the this problem report).

In particular, as Steve quoted in comment 2 the 12.4.1.7 guarantees that
nontarget/nonpointer arguments either do not alias or are not modified. With
this in mind, let's look at the generated assembler (compile with "-S" to
generate the assembler file).

If one looks at the assembler for

subroutine iei4ei(inpu,oupu)
  integer(kind=1) :: inpu(4)
  integer(kind=1) :: oupu(4)
  integer(kind=1) :: inte(4)
  inte(:) = inpu(:)
  oupu(4) = inte(1)
  oupu(3) = inte(2)
  oupu(2) = inte(3)
  oupu(1) = inte(4)
end subroutine iei4ei

one gets with -O1 the following (i386-Linux):

.LFB0:
    subl    $8, %esp ! Make space on the stack for 8 byte
.LCFI0:
    movl    %ebx, (%esp)   ! Save %ebx register on the stack
    movl    %esi, 4(%esp)  ! Save %esi register on the stack
.LCFI1:
    movl    12(%esp), %esi ! %esi = addr of inpu
    movl    16(%esp), %ebx ! %ebx = addr of outpu
    movzbl    1(%esi), %ecx  ! %ecx = inpu(2)  ! (zero-extended load)
    movzbl    2(%esi), %edx  ! %edx = inpu(3)  ! (zero-extended load)
    movzbl    3(%esi), %eax  ! %ecx = inpu(4)  ! (zero-extended load)
    movb    %al, (%ebx)    ! outpu(1) = inpu(4) ! al == lowest byte of eax
    movzbl    (%esi), %eax   ! %eax = inpu(1)
    movb    %al, 3(%ebx)   ! outpu(4) = inpu(1)
    movb    %cl, 2(%ebx)   ! outpu(3) = inpu(2)
    movb    %dl, 1(%ebx)   ! outpu(2) = inpu(3)
    movl    (%esp), %ebx  ! Restore ebx register
    movl    4(%esp), %esi ! Restore esi register
    addl    $8, %esp      ! Free the allocated stack
.LCFI2:
    ret                   ! Return


With x86-64-Linux, the assembler is as follows (again with -O1):

.LFB0:
        movzbl  1(%rdi), %ecx  ! ecx = inpu(2)
        movzbl  2(%rdi), %edx  ! edx = inpu(3)
        movzbl  3(%rdi), %eax  ! eax = inpu(4)
        movzbl  (%rdi), %edi   ! edi = inpu(1)
        movb    %dil, 3(%rsi)  ! outpu(4) = inpu(3)
        movb    %cl, 2(%rsi)   ! outpu(3) = inpu(2)
        movb    %dl, 1(%rsi)   ! outpu(2) = inpu(3)
        movb    %al, (%rsi)    ! outpu(1) = inpu(4)
        ret

Thus, it is not surprising that it works with x86-64-Linux and that it fails
with i386-Linux. However, both assembler outputs look perfectly valid - only
that the second version also works if the arguments alias while the first one
does not.


For completeness, ifort 11.1 (ia32) generates with -O2 the code:

        pushl     %edi             ! Save %edi on the stack
        pushl     %esi             ! Save %eds on the stack
        pushl     %ebx             ! Save %ebx on the stack
        movl      16(%esp), %eax          ! eax = inpu
        movl      20(%esp), %esi          ! esi = outpu
        movzbl    (%eax), %edx            ! edx = inpu(1)
        movzbl    1(%eax), %ecx           ! ecx = inpu(2)
        movzbl    2(%eax), %ebx           ! ebx = inpu(3)
        movzbl    3(%eax), %eax           ! eax = inpu(4)
        movb      %dl, iei4ei_$INTE.0.1   ! inte(1) = dl(1)
        movb      %al, (%esi)             ! outpu(1) = inpu(4)
        movb      %cl, 1+iei4ei_$INTE.0.1 ! inte(2) = dl(2)
        movb      %dl, 3(%esi)            ! outpu(4) = inpu(1)
        movb      %cl, 2(%esi)            ! outpu(3) = inpu(2)
        movb      %bl, 2+iei4ei_$INTE.0.1 ! inte(3) = dl(3)
        movb      %al, 3+iei4ei_$INTE.0.1 ! inte(4) = dl(4)
        movb      %bl, 1(%esi)     ! outpu(2) = inpu(3)
        popl      %ebx             ! Restore %ebx
        popl      %esi             ! Restore %esi
        popl      %edi             ! Restore %edi
        ret                        ! Return


Thus, in principle ICC does the same, except that it the generated assembler is
slightly differed such that it works without running into the alias issues.
(I wonder why "ifort" does not optimize the "inte = inpu" away as it is never
used in the assembler.)


> We have never observed this behavior anywhere else for the past 25 years. I
> won't be so confident that the other compilers perform the same optimization
> (at least at level O1), exactly because of all the Fortran 66/77 legacy code
> the Fortran community uses...

I think any well-optimizing compiler takes into account that any pair of
nontarget, nonpointer arguments do not alias (if one of them gets modified).
The C equivalent is that the pointer arguments have the "restrict" attribute.
Thus, unless the compiler is old or not well optimizing, a similar optimization
should be done by other compilers.

Whether that causes alias issues or not (for a given program) depends on the
generated assembler instructions. As the three versions above show, it happens
that only GCC's i386 assembler exposes the alias issues while Intel's version
and GCC's x86-64 version do not. With a slightly different register allocation,
the situation could be different.


> > or make "inte" VOLATILE.
> This seems an interesting path. I am just wondering about computing efficiency
> then, but since this avoids making copies out or in the subroutine, this might
> be the solution.

Well, I think it should not be too bad, though more memory reads are required.


I think the better solution is to mark the two arguments as possibly aliasing
using the TARGET attribute. However, that only works if you do not have the
caller and the callee in the same file - otherwise some compilers will warn or
print an error that an explicit interface is required.

For instance, gfortran or NAG will print an error - while "ifort -warn all"
will warn. (One could argue that gfortran and NAG should only warn, however, it
is always difficult to decide what to allow as vendor extension. The standard
simply states that an explicit interface is required with TARGET. Additionally,
if the caller does not know certain attributes, it might generate wrong code.
Especially in obvious cases like VALUE, if a compiler only warns, users might
miss/ignore the warning and get wrong results [e.g. ifort only warns and not
even with default options].)


> > (You still have the issue of passing an
> > INTEGER(4) to an INTEGER(1)
> 
> This subroutine is part of our low-level set of subroutines (here: IEEE <->
> EEEI byte swapping), which have to be as simple and efficient as possible
> (they can be used on large amount of data).

I do not think that the different integer kinds cause a problem in practice.
Still it's invalid and might cause some trouble in particular cases.

For EQUIVALENCE, the situation is similar - albeit the chance of getting into
trouble is higher. As the elements of equivalence are of different type, the
compiler might do some optimizations which break assumptions that setting one
equivalenced variable will modify the bitpattern of the other.

That's the same with C's union. With -fstrict-aliasing (enabled with -O2), the
compiler does type-based analysis [see -fstrict-aliasing in the GCC manual].

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]