Created attachment 54522
Reduced test case

Steps to reproduce:

1. Save the attachment in a file named av1_fwd_txfm2d.c.

2. On a Debian x86_64 GNU/Linux system, install the gcc-aarch64-linux-gnu package:

$ sudo apt install gcc-aarch64-linux-gnu

3. Check the compiler version:

$ /usr/bin/aarch64-linux-gnu-gcc --version
aarch64-linux-gnu-gcc (Debian 12.2.0-3) 12.2.0
Copyright (C) 2022 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

4. Compile the av1_fwd_txfm2d.c file with the following command line:

$ /usr/bin/aarch64-linux-gnu-gcc -march=armv8-a -O3 -DNDEBUG -std=c99 -Wall -Wdisabled-optimization -Wextra -Wfloat-conversion -Wformat=2 -Wimplicit-function-declaration -Wlogical-op -Wpointer-arith -Wsign-compare -Wtype-limits -Wuninitialized -Wunused -Wvla -Wstack-usage=100000 -Wshadow -Wundef -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=0 -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -fPIC -o av1_fwd_txfm2d.o -c av1_fwd_txfm2d.c

The compiler emits the following warnings:

In function ‘set_fwd_txfm_non_scale_range’,
    inlined from ‘av1_get_fwd_txfm_cfg’ at av1_fwd_txfm2d.c:86:3:
av1_fwd_txfm2d.c:64:31: warning: writing 1 byte into a region of size 0 [-Wstringop-overflow=]
   64 |       cfg->stage_range_col[i] = (range_mult2_col[i] + 1) >> 1;
      |       ~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
av1_fwd_txfm2d.c: In function ‘av1_get_fwd_txfm_cfg’:
av1_fwd_txfm2d.c:23:10: note: at offset 12 into destination object ‘stage_range_col’ of size 12
   23 |   int8_t stage_range_col[12];
      |          ^~~~~~~~~~~~~~~
In function ‘set_fwd_txfm_non_scale_range’,
    inlined from ‘av1_get_fwd_txfm_cfg’ at av1_fwd_txfm2d.c:86:3:
av1_fwd_txfm2d.c:64:31: warning: writing 1 byte into a region of size 0 [-Wstringop-overflow=]
   64 |       cfg->stage_range_col[i] = (range_mult2_col[i] + 1) >> 1;
      |       ~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
av1_fwd_txfm2d.c: In function ‘av1_get_fwd_txfm_cfg’:
av1_fwd_txfm2d.c:23:10: note: at offset 13 into destination object ‘stage_range_col’ of size 12
   23 |   int8_t stage_range_col[12];
      |          ^~~~~~~~~~~~~~~
In function ‘set_fwd_txfm_non_scale_range’,
    inlined from ‘av1_get_fwd_txfm_cfg’ at av1_fwd_txfm2d.c:86:3:
av1_fwd_txfm2d.c:72:31: warning: writing 1 byte into a region of size 0 [-Wstringop-overflow=]
   72 |       cfg->stage_range_row[i] =
      |       ~~~~~~~~~~~~~~~~~~~~~~~~^
   73 |           (range_mult2_col[cfg->stage_num_col - 1] + range_mult2_row[i] + 1) >>
      |           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   74 |           1;
      |           ~
av1_fwd_txfm2d.c: In function ‘av1_get_fwd_txfm_cfg’:
av1_fwd_txfm2d.c:24:10: note: at offset 12 into destination object ‘stage_range_row’ of size 12
   24 |   int8_t stage_range_row[12];
      |          ^~~~~~~~~~~~~~~
In function ‘set_fwd_txfm_non_scale_range’,
    inlined from ‘av1_get_fwd_txfm_cfg’ at av1_fwd_txfm2d.c:86:3:
av1_fwd_txfm2d.c:72:31: warning: writing 1 byte into a region of size 0 [-Wstringop-overflow=]
   72 |       cfg->stage_range_row[i] =
      |       ~~~~~~~~~~~~~~~~~~~~~~~~^
   73 |           (range_mult2_col[cfg->stage_num_col - 1] + range_mult2_row[i] + 1) >>
      |           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   74 |           1;
      |           ~
av1_fwd_txfm2d.c: In function ‘av1_get_fwd_txfm_cfg’:
av1_fwd_txfm2d.c:24:10: note: at offset 13 into destination object ‘stage_range_row’ of size 12
   24 |   int8_t stage_range_row[12];
      |          ^~~~~~~~~~~~~~~
If I increase the size of the `stage_range_col` and `stage_range_row` arrays in the `TXFM_2D_FLIP_CFG` struct from 12 to 13, 14, and then 15, the warning messages change and eventually disappear once the array size reaches 15. This seems to imply that the compiler somehow thinks cfg->stage_num_col and cfg->stage_num_row can be equal to 15, but I can't figure out how the compiler arrives at the number 15.

Also, I don't get these warnings if I compile with /usr/bin/cc on my Debian x86_64 GNU/Linux system, which has the following version:

$ /usr/bin/cc --version
cc (Debian 12.2.0-10) 12.2.0
Copyright (C) 2022 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
So the vectorizer is over-vectorizing this code slightly. The easiest fix is to add:

  if (stage_num_col > 12) __builtin_unreachable();

and

  if (stage_num_row > 12) __builtin_unreachable();

I thought there was another bug related to this warning and the vectorizer over-vectorizing, but I can't find it right now.
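To show where such a hint could go, here is a minimal sketch. It is not the code from the attachment: `cfg_sketch`, `set_col_range`, and `MAX_STAGE_NUM` are hypothetical names, and only the column array is modelled. The idea is to state the bound before the loop so that the optimizer can assume the trip count never exceeds the array size.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_STAGE_NUM 12 /* assumed name for the array bound from the report */

/* Hypothetical, cut-down stand-in for TXFM_2D_FLIP_CFG (column half only). */
typedef struct {
  int stage_num_col;
  int8_t stage_range_col[MAX_STAGE_NUM];
} cfg_sketch;

static void set_col_range(cfg_sketch *cfg, const int8_t *range_mult2_col) {
  const int stage_num_col = cfg->stage_num_col;

  /* Debug builds: catch a violated bound before relying on it. */
  assert(stage_num_col <= MAX_STAGE_NUM);
  /* All builds: promise the compiler that the bound holds. If it does not,
     the behavior is undefined. */
  if (stage_num_col > MAX_STAGE_NUM) __builtin_unreachable();

  for (int i = 0; i < stage_num_col; ++i)
    cfg->stage_range_col[i] = (int8_t)((range_mult2_col[i] + 1) >> 1);
}
```

Because the command line in the report passes -DNDEBUG, the `assert` compiles to nothing there, and only the `__builtin_unreachable()` hint remains to convey the bound. The row loop would presumably need the same treatment with `stage_num_row`.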
(In reply to Wan-Teh Chang from comment #1)
> Also, I don't get these warnings if I compile with /usr/bin/cc on my Debian
> x86_64 GNU/Linux system, which has the following version:

That is because the autovectorizer handles this loop slightly differently on x86_64 than on aarch64.
Andrew: Thank you very much for the quick reply. I am also curious about the 15 number. Do you know why the compiler seems to think cfg->stage_num_col and cfg->stage_num_row can be equal to 15? I.e., why do the warnings go away if I increase the array size to 15?
(In reply to Wan-Teh Chang from comment #4)
> Andrew: Thank you very much for the quick reply.
>
> I am also curious about the 15 number. Do you know why the compiler seems to
> think cfg->stage_num_col and cfg->stage_num_row can be equal to 15? I.e.,
> why do the warnings go away if I increase the array size to 15?

It is just over-vectorization/unrolling of the loop. I am not 100% sure, but I think it comes from the epilogue of the vectorized loop, which handles at most 15 elements. That is, the vector version of the loop can process 16 chars at a time, so the epilogue is left with at most 15 elements.
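The arithmetic behind that guess can be sketched with a hand-written model of a vectorized loop. This is an illustration only, not what GCC actually emits: `halve_round_up` and `VF` are hypothetical names, and the factor of 16 assumes 16 int8_t lanes in a 128-bit vector.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { VF = 16 }; /* assumed vectorization factor: 16 int8_t lanes */

/* Model of a vectorized loop: a main loop that consumes VF elements per
   step, then a scalar epilogue for the leftovers. Returns the number of
   elements the epilogue handled. */
static size_t halve_round_up(int8_t *dst, const int8_t *src, size_t n) {
  size_t i = 0;

  /* Main loop: each outer step stands in for one 16-lane vector operation. */
  for (; i + VF <= n; i += VF)
    for (size_t lane = 0; lane < VF; ++lane)
      dst[i + lane] = (int8_t)((src[i + lane] + 1) >> 1);

  /* Epilogue: n % VF elements remain, so at most VF - 1 = 15. */
  const size_t leftover = n - i;
  assert(leftover < VF);
  for (; i < n; ++i)
    dst[i] = (int8_t)((src[i] + 1) >> 1);

  return leftover;
}
```

If the compiler fully unrolls such an epilogue without knowing that the trip count is at most 12, it may emit stores for every index up to 14. That would be consistent with the warnings disappearing once the arrays hold 15 elements, though the thread does not confirm this.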