Bug 112604 - [ia64] Output register not preserved after a branch is not taken
Summary: [ia64] Output register not preserved after a branch is not taken
Status: UNCONFIRMED
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 13.2.0
: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-11-18 12:48 UTC by Jakub Jermar
Modified: 2023-11-20 09:24 UTC (History)
1 user (show)

See Also:
Host:
Target: ia64
Build:
Known to work:
Known to fail:
Last reconfirmed:


Attachments
Requested object and preprocessed source files (207.08 KB, application/gzip)
2023-11-18 13:26 UTC, Jakub Jermar
Details
Updated requested good object file for vfs_file.c (222.34 KB, application/gzip)
2023-11-18 16:13 UTC, Jakub Jermar
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Jakub Jermar 2023-11-18 12:48:43 UTC
After an upgrade to GCC 13.2 the HelenOS VFS server started to crash on IA-64 (see http://www.helenos.org/ticket/864). I looked into the issue and the problem seems to be that in the following code snippet:

fibril_mutex_lock(&vfs_data->lock);
if (!vfs_data->files) {
        vfs_data->files = malloc(VFS_MAX_OPEN_FILES * sizeof(vfs_file_t *));
        if (!vfs_data->files) {
                fibril_mutex_unlock(&vfs_data->lock);
                return false;
        }
        memset(vfs_data->files, 0, VFS_MAX_OPEN_FILES * sizeof(vfs_file_t *));
}
fibril_mutex_unlock(&vfs_data->lock);

The output argument prepared for the possible call to malloc (1024 in this case) destroys the argument for fibril_mutex_unlock() if the branch is not taken.

In assembly it looks like this:

4000000000001a00 <_vfs_fd_alloc>:
4000000000001a00:       08 48 39 18 80 05       [MMI]       alloc r41=ar.pfs,14,12,0
4000000000001a06:       c0 02 80 00 42 00                   mov r44=r32
4000000000001a0c:       05 00 c4 00                         mov r40=b0
4000000000001a10:       09 38 01 41 00 21       [MMI]       adds r39=64,r32
4000000000001a16:       a0 02 04 00 42 40                   mov r42=r1
4000000000001a1c:       04 10 41 00                         zxt1 r34=r34;;
4000000000001a20:       11 28 fd 01 00 24       [MIB]       mov r37=127
4000000000001a26:       b0 02 04 65 00 00                   mov.i r43=ar.lc
4000000000001a2c:       e8 b4 01 50                         br.call.sptk.many b0=400000000001cf00 <fibril_mutex_lock>;;
4000000000001a30:       08 60 01 40 00 21       [MMI]       mov r44=r32
4000000000001a36:       e0 00 9c 30 20 20                   ld8 r14=[r39]
4000000000001a3c:       00 50 01 84                         mov r1=r42
4000000000001a40:       0a 68 05 00 00 24       [MMI]       mov r45=1;;
4000000000001a46:       c0 02 00 10 48 e0                   mov r44=1024
4000000000001a4c:       00 70 18 e4                         cmp.eq p7,p6=0,r14
4000000000001a50:       16 00 00 00 00 c8       [BBB]       nop.b 0x0
4000000000001a56:       01 f0 01 80 21 00             (p07) br.cond.dpnt.few 4000000000001e30 <_vfs_fd_alloc+0x430>
4000000000001a5c:       10 00 00 40                         br.few 4000000000001a60 <_vfs_fd_alloc+0x60>
4000000000001a60:       11 00 00 00 01 00       [MIB]       nop.m 0x0
4000000000001a66:       00 00 00 02 00 00                   nop.i 0x0
4000000000001a6c:       28 ba 01 50                         br.call.sptk.many b0=400000000001d480 <fibril_mutex_unlock>;;
4000000000001a70:       08 60 01 40 00 21       [MMI]       mov r44=r32

The out0 is r44 in this context. Note how it is first correctly restored to the mutex address at address 1a30 after the fibril_mutex_lock call. But then this value is not used and gets rewritten to 1024 at address 1a46 in preparation for a possible branch and a consequent call to malloc. If the branch is taken, the register is restored properly (not shown here), but if the branch is not taken at address 1a56, the call to fibril_mutex_unlock at address 1a6c is made with a wrong value of r44.

We used the following command line to compile the above snippet:
usr/local/cross/bin/ia64-helenos-gcc -Iuspace/srv_vfs.p -Iuspace -I../../../uspace -fdiagnostics-color=always -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -Wextra -std=gnu11 -imacros /home/jermar/software/HelenOS/helenos/build_all/ia64/ski/config.h -O3 -fexec-charset=UTF-8 -finput-charset=UTF-8 -D_HELENOS_SOURCE -Wa,--fatal-warnings -Wall -Wextra -Wwrite-strings -Wunknown-pragmas -Wno-unused-parameter -pipe -ffunction-sections -fdata-sections -fno-common -fdebug-prefix-map=/home/jermar/software/HelenOS/helenos/= -fdebug-prefix-map=../../= -Wsystem-headers -Werror -Wmissing-prototypes -Werror-implicit-function-declaration -Wno-missing-braces -Wno-missing-field-initializers -Wno-unused-parameter -Wno-clobbered -Wno-nonnull-compare -fno-builtin-strftime -isystem../../../common/include -isystem../../../abi/include -isystem../../../abi/arch/ia64/include -isystem../../../uspace/lib/c/arch/ia64/include -isystem../../../uspace/lib/c/include -D__LE__ -fno-unwind-tables -MD -MQ uspace/srv_vfs.p/srv_vfs_vfs_file.c.o -MF uspace/srv_vfs.p/srv_vfs_vfs_file.c.o.d -o uspace/srv_vfs.p/srv_vfs_vfs_file.c.o -c ../../../uspace/srv/vfs/vfs_file.c

$ /usr/local/cross/bin/ia64-helenos-gcc -v                                                                                                                  1209ms  Sat 18 Nov 2023 12:42:07 PM UTC
Using built-in specs.
COLLECT_GCC=/usr/local/cross/bin/ia64-helenos-gcc
COLLECT_LTO_WRAPPER=/home/jermar/toolchain/cross/bin/../libexec/gcc/ia64-helenos/13.2.0/lto-wrapper
Target: ia64-helenos
Configured with: /home/jermar/software/HelenOS/helenos/tools/downloads/gcc-13.2/configure --target=ia64-helenos --prefix=/usr/local/cross --program-prefix=ia64-helenos- --with-gnu-as --with-gnu-ld --disable-nls --enable-languages=c,c++,go --enable-lto --disable-shared --disable-werror --without-headers
Thread model: single
Supported LTO compression algorithms: zlib zstd
gcc version 13.2.0 (GCC) 

Note that even though I am reporting this for 13.2.0, I saw the same issue already with 13.1.1.
Comment 1 Sam James 2023-11-18 12:51:44 UTC
Could you attach bad/good object files and preprocessed source for vfs_file.c?
Comment 2 Jakub Jermar 2023-11-18 13:26:37 UTC
Created attachment 56632 [details]
Requested object and preprocessed source files

Here are the bad and good object and preprocessed sources fies for vfs_file.c. Note that the difference between the good and bad version is -O3 (bad) v.s -O2 (good), but it was the same compiler. I am not sure this is going to be very helpful as with -O2 the code is quite different and the offending function is not inlined. If desired, I may try to go back to GCC 8.2 (the last good version known to me) and try to provide a good file generated with the same compiler flags. Let me know if this would be more useful.

I also attached the entire binary of the VFS server, both good and bad versions. Note these are HelenOS binaries and need to be run in the environment of the HelenOS operating system, which might not be practical for you.
Comment 3 Jakub Jermar 2023-11-18 16:13:33 UTC
Created attachment 56633 [details]
Updated requested good object file for vfs_file.c

Seems like -O3 -fno-unswitch-loops makes the bug go away. See the new attachment for a better example of the good binary and the object file. With this optimization disabled the argument for the malloc/calloc is set only after the branch is taken so it does not destroy the argument for fibril_mutex_unlock():

4000000000001a00 <_vfs_fd_alloc>:
4000000000001a00:       08 48 3d 1a 80 05       [MMI]       alloc r41=ar.pfs,15,13,0
4000000000001a06:       d0 02 80 00 42 60                   mov r45=r32
4000000000001a0c:       05 00 cc 00                         mov r43=pr
4000000000001a10:       09 38 01 41 00 21       [MMI]       adds r39=64,r32
4000000000001a16:       a0 02 04 00 42 40                   mov r42=r1
4000000000001a1c:       04 10 41 00                         zxt1 r34=r34;;
4000000000001a20:       11 80 00 44 91 39       [MIB]       cmp4.eq p16,p17=0,r34
4000000000001a26:       80 02 00 62 00 00                   mov r40=b0
4000000000001a2c:       68 b2 01 50                         br.call.sptk.many b0=400000000001cc80 <fibril_mutex_lock>;;
4000000000001a30:       08 68 01 40 00 21       [MMI]       mov r45=r32
4000000000001a36:       44 02 00 00 42 20             (p16) mov r36=r0
4000000000001a3c:       00 50 01 84                         mov r1=r42
4000000000001a40:       09 70 00 4e 18 10       [MMI]       ld8 r14=[r39]
4000000000001a46:       54 02 00 00 42 c0             (p16) mov r37=r0
4000000000001a4c:       15 00 00 90                         mov r46=1;;
4000000000001a50:       38 22 e1 01 07 64       [MMB] (p17) mov r36=1016
4000000000001a56:       54 fa 03 00 48 00             (p17) mov r37=127
4000000000001a5c:       00 00 00 20                         nop.b 0x0
4000000000001a60:       11 38 00 1c 06 39       [MIB]       cmp.eq p7,p6=0,r14
4000000000001a66:       c0 02 04 65 80 03                   mov.i r44=ar.lc
4000000000001a6c:       f0 03 00 43                   (p07) br.cond.dpnt.few 4000000000001e50 <_vfs_fd_alloc+0x450>;;
4000000000001a70:       10 00 00 00 01 00       [MIB]       nop.m 0x0
4000000000001a76:       00 00 00 02 00 00                   nop.i 0x0
4000000000001a7c:       10 00 00 40                         br.few 4000000000001a80 <_vfs_fd_alloc+0x80>
4000000000001a80:       11 00 00 00 01 00       [MIB]       nop.m 0x0
4000000000001a86:       00 00 00 02 00 00                   nop.i 0x0
4000000000001a8c:       88 b7 01 50                         br.call.sptk.many b0=400000000001d200 <fibril_mutex_unlock>;;
4000000000001a90:       11 68 01 40 00 21       [MIB]       mov r45=r32
....
4000000000001e50:       11 68 01 00 08 24       [MIB]       mov r45=1024
4000000000001e56:       00 00 00 02 00 00                   nop.i 0x0
4000000000001e5c:       f8 bd 00 50                         br.call.sptk.many b0=400000000000dc40 <calloc>;;
....