[forwarded from https://bugs.debian.org/949192] When gcc-10 compiles with -fsanitize=address, it substitutes any calls to regexec with a version that does not support REG_STARTEND. This makes code that is compiled fail unexpectedly or even produce spurious sanitization errors, since with that option the buffer need not be NUL-terminated. While REG_STARTEND is not in POSIX, it is found on the BSDs and Linux and users may reasonably rely on the fact that it is present on those systems. This issue has caused a bug in the Git testsuite as seen at https://lore.kernel.org/git/20200117174931.GA8958@coredump.intra.peff.net/T/#t. I've attached a testcase. Without -fsanitize=address, it succeeds silently. With -fsanitize=address, it fails and prints an error. Please either fix the regexec implementation such that it is fully functional compared to the version in glibc or disable the sanitization of regexec until it has feature parity. $ cat test.c #include <stdio.h> #include <sys/types.h> #include <regex.h> int main(void) { regex_t r; const char s[] = "ban\0ana"; regmatch_t pmatch[10]; pmatch[0].rm_so = 0; pmatch[0].rm_eo = sizeof(s); if (regcomp(&r, "ana", 0)) return 2; if (regexec(&r, s, sizeof(pmatch)/sizeof(pmatch[0]), pmatch, REG_STARTEND)) { fprintf(stderr, "failed to match\n"); regfree(&r); return 3; } regfree(&r); return 0; } $ gcc-9 -fsanitize=address test.c && ./a.out $ gcc-10 -fsanitize=address test.c && ./a.out failed to match $ gcc-11 -fsanitize=address test.c && ./a.out failed to match
Let me fix that.
I've just made an upstream patch submission.
I'm not sure if your patch is correct. glibc has the system of earliest symbol versions, and so on certain architectures GLIBC_2.3.4 symver will not appear at all. Given: ./sysdeps/mach/hurd/shlib-versions:DEFAULT GLIBC_2.2.6 ./sysdeps/unix/sysv/linux/csky/shlib-versions:DEFAULT GLIBC_2.29 ./sysdeps/unix/sysv/linux/microblaze/shlib-versions:DEFAULT GLIBC_2.18 ./sysdeps/unix/sysv/linux/arc/shlib-versions:DEFAULT GLIBC_2.32 ./sysdeps/unix/sysv/linux/m68k/coldfire/shlib-versions:DEFAULT GLIBC_2.4 ./sysdeps/unix/sysv/linux/arm/shlib-versions:DEFAULT GLIBC_2.4 ./sysdeps/unix/sysv/linux/s390/s390-64/shlib-versions:DEFAULT GLIBC_2.2 ./sysdeps/unix/sysv/linux/riscv/shlib-versions:DEFAULT GLIBC_2.27 ./sysdeps/unix/sysv/linux/riscv/shlib-versions:DEFAULT GLIBC_2.27 ./sysdeps/unix/sysv/linux/riscv/shlib-versions:DEFAULT GLIBC_2.33 ./sysdeps/unix/sysv/linux/riscv/shlib-versions:DEFAULT GLIBC_2.33 ./sysdeps/unix/sysv/linux/x86_64/x32/shlib-versions:# DEFAULT Earliest symbol set ./sysdeps/unix/sysv/linux/x86_64/x32/shlib-versions:DEFAULT GLIBC_2.16 ./sysdeps/unix/sysv/linux/x86_64/64/shlib-versions:# DEFAULT Earliest symbol set ./sysdeps/unix/sysv/linux/x86_64/64/shlib-versions:DEFAULT GLIBC_2.2.5 ./sysdeps/unix/sysv/linux/powerpc/powerpc64/shlib-versions:DEFAULT GLIBC_2.17 ./sysdeps/unix/sysv/linux/powerpc/powerpc64/shlib-versions:DEFAULT GLIBC_2.3 ./sysdeps/unix/sysv/linux/nios2/shlib-versions:DEFAULT GLIBC_2.21 ./sysdeps/unix/sysv/linux/aarch64/shlib-versions:DEFAULT GLIBC_2.17 and the limited list of arches supported by libsanitizer, I'd say at least riscv*, powerpc64le and aarch64 (and maybe x86-64 -mx32 if supported) are affected.
And maybe it would be worth comparing the list of glibc symbols with multiple symbol versions with the list of interposed libsanitizer symbols, perhaps several others need similar treatment.
(In reply to Jakub Jelinek from comment #3) > I'm not sure if your patch is correct. > glibc has the system of earliest symbol versions, and so on certain > architectures > GLIBC_2.3.4 symver will not appear at all. > Given: > ./sysdeps/mach/hurd/shlib-versions:DEFAULT GLIBC_2.2.6 > ./sysdeps/unix/sysv/linux/csky/shlib-versions:DEFAULT GLIBC_2.29 > ./sysdeps/unix/sysv/linux/microblaze/shlib-versions:DEFAULT GLIBC_2.18 > ./sysdeps/unix/sysv/linux/arc/shlib-versions:DEFAULT > GLIBC_2.32 > ./sysdeps/unix/sysv/linux/m68k/coldfire/shlib-versions:DEFAULT > GLIBC_2.4 > ./sysdeps/unix/sysv/linux/arm/shlib-versions:DEFAULT GLIBC_2.4 > ./sysdeps/unix/sysv/linux/s390/s390-64/shlib-versions:DEFAULT GLIBC_2.2 > ./sysdeps/unix/sysv/linux/riscv/shlib-versions:DEFAULT GLIBC_2.27 > ./sysdeps/unix/sysv/linux/riscv/shlib-versions:DEFAULT GLIBC_2.27 > ./sysdeps/unix/sysv/linux/riscv/shlib-versions:DEFAULT GLIBC_2.33 > ./sysdeps/unix/sysv/linux/riscv/shlib-versions:DEFAULT GLIBC_2.33 > ./sysdeps/unix/sysv/linux/x86_64/x32/shlib-versions:# DEFAULT Earliest > symbol set > ./sysdeps/unix/sysv/linux/x86_64/x32/shlib-versions:DEFAULT GLIBC_2.16 > ./sysdeps/unix/sysv/linux/x86_64/64/shlib-versions:# DEFAULT Earliest > symbol set > ./sysdeps/unix/sysv/linux/x86_64/64/shlib-versions:DEFAULT GLIBC_2.2.5 > ./sysdeps/unix/sysv/linux/powerpc/powerpc64/shlib-versions:DEFAULT > GLIBC_2.17 > ./sysdeps/unix/sysv/linux/powerpc/powerpc64/shlib-versions:DEFAULT > GLIBC_2.3 > ./sysdeps/unix/sysv/linux/nios2/shlib-versions:DEFAULT > GLIBC_2.21 > ./sysdeps/unix/sysv/linux/aarch64/shlib-versions:DEFAULT GLIBC_2.17 > and the limited list of arches supported by libsanitizer, I'd say at least > riscv*, powerpc64le and aarch64 (and maybe x86-64 -mx32 if supported) are > affected. Thank you for this. You are right, my patch is not correct. So for the archs we care about we should do: x86_64 - require GLIBC_2.3.4 ppc64 - require GLIBC_2.3.4 aarch64, ppc64le, x32 and riscv* are newer than 2.3.4, so a default non-versioned symbol should be fine. Am I right?
Well, it is not about what arches you care about, but what arches we support in libsanitizer/configure.tgt (from *-linux*). So, riscv64, aarch64, mips, arm, s390*, sparc*, powerpc*, x86. So it is desirable to get it right for all these. Thus, I think you want to use GLIBC_2.3.4 for Linux except on arm, riscv*, x86_64 -mx32, powerpc64le and aarch64. So you need to look up SANITIZE* macros for all of these...
I think libsanitizer falls back to a version-less lookup if the version cannot be found. Therefore, if the glibc baseline is after 2.3.4, the version-less lookup will find the unversioned symbol, which has the right behavior. I don't see any architecture that has two regexec symbols, but does not use GLIBC_2.3.4 for the most recent symbol, based on this command in the glibc source tree: git grep -c ' regexec F' | grep :2$ | cut -d: -f1 | xargs grep ' regexec F' A comment in the interceptor might make sense, though.
Even if it does, exporting regexec@@GLIBC_2.3.4 from libsanitizer when glibc doesn't support that symbol looks wrong.
(In reply to Jakub Jelinek from comment #8) > Even if it does, exporting regexec@@GLIBC_2.3.4 from libsanitizer when glibc > doesn't support that symbol looks wrong. I think all the interceptors use unversioned symbols, so this doesn't matter. Yes, it's quite broken, but when fixing this, you might as well go with the flow …
Ugh, that is quite misdesigned then...
(In reply to Florian Weimer from comment #7) > I think libsanitizer falls back to a version-less lookup if the version > cannot be found. Therefore, if the glibc baseline is after 2.3.4, the > version-less lookup will find the unversioned symbol, which has the right > behavior. Are you sure Florian about it. I've just tested the patch posted here: https://reviews.llvm.org/D95864 on aarch64 and I get the following crash: gcc -fsanitize=address test.c -g && ./a.out AddressSanitizer:DEADLYSIGNAL ================================================================= ==23512==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x000000000000 bp 0xffffe9a34200 sp 0xffffe9a34200 T0) ==23512==Hint: pc points to the zero page. ==23512==The signal is caused by a READ memory access. ==23512==Hint: address points to the zero page. #0 0x0 (<unknown module>) #1 0xffff7d11269c (linux-vdso.so.1+0x69c) That said, there's no fallback.
Created attachment 50149 [details] Patch v2 There's a second version of the patch that selectively define SANITIZER_REGEXEC_VERSION based on the target. @Jakub: It follows your list of target expect ARM which should be fine: head -n1 ./sysdeps/unix/sysv/linux/arm/shlib-versions DEFAULT GLIBC_2.4 Thoughts?
LGTM. But as I said, it could be useful to look for other symbols too. Looking at i686 glibc, I see for j in /lib/libc.so.6 /lib/libpthread.so.0 /lib/libm.so.6; do for i in `readelf -Ws $j | grep -v ' UND ' | awk '{print $8}' | grep -v GLIBC_PRIVATE | grep '[^@]@[^@]' | sed 's/@.*$//' | sort -u`; do readelf -Ws $j | grep -v ' UND ' | awk '{print $8}' | grep -v GLIBC_PRIVATE | grep -q ^${i}@@ && echo $i; done; done alphasort64 chown clock_getcpuclockid clock_getres clock_gettime clock_nanosleep clock_settime fclose fcntl fdopen fgetpos fgetpos64 fmemopen fnmatch fopen fopencookie fsetpos fsetpos64 __fxstat64 getaliasbyname_r getaliasent_r getgrent_r getgrgid_r getgrnam_r gethostbyaddr_r gethostbyname2_r gethostbyname_r gethostent_r getnetbyaddr_r getnetbyname_r getnetent_r getprotobyname_r getprotobynumber_r getprotoent_r getpwent_r getpwnam_r getpwuid_r getrlimit getrlimit64 getrpcbyname_r getrpcbynumber_r getrpcent_r getservbyname_r getservbyport_r getservent_r getspent_r getspnam_r glob glob64 _IO_do_write _IO_fclose _IO_fdopen _IO_fgetpos _IO_fgetpos64 _IO_file_attach _IO_file_close_it _IO_file_finish _IO_file_fopen _IO_file_init _IO_file_overflow _IO_file_seekoff _IO_file_setbuf _IO_file_sync _IO_file_underflow _IO_file_write _IO_file_xsputn _IO_fopen _IO_fsetpos _IO_fsetpos64 _IO_popen _IO_proc_close _IO_proc_open localeconv __lxstat64 msgctl nftw nftw64 pclose popen posix_fadvise64 posix_fallocate64 posix_spawn posix_spawnp pthread_attr_init pthread_attr_setaffinity_np pthread_cond_broadcast pthread_cond_destroy pthread_cond_init pthread_cond_signal pthread_cond_timedwait pthread_cond_wait pthread_getaffinity_np pthread_getattr_np pthread_sigmask quick_exit readdir64 readdir64_r realpath regexec scandir64 sched_getaffinity sched_setaffinity semctl setrlimit shmctl tmpfile versionsort64 vm86 __xstat64 pthread_attr_getaffinity_np pthread_cond_broadcast pthread_cond_signal pthread_cond_timedwait pthread_cond_wait pthread_create pthread_setaffinity_np sem_destroy sem_getvalue sem_init sem_post sem_trywait sem_wait exp exp10f exp2 exp2f expf feclearexcept fegetenv fegetexceptflag feraiseexcept fesetenv fesetexceptflag feupdateenv lgamma lgammaf lgammal log log2 log2f logf pow powf totalorder totalorderf totalorderf128 totalorderf32 totalorderf32x totalorderf64 totalorderf64x totalorderl totalordermag totalordermagf totalordermagf128 totalordermagf32 totalordermagf32x totalordermagf64 totalordermagf64x totalordermagl On x86_64: for j in /lib64/libc.so.6 /lib64/libpthread.so.0 /lib64/libm.so.6; do for i in `readelf -Ws $j | grep -v ' UND ' | awk '{print $8}' | grep -v GLIBC_PRIVATE | grep '[^@]@[^@]' | sed 's/@.*$//' | sort -u`; do readelf -Ws $j | grep -v ' UND ' | awk '{print $8}' | grep -v GLIBC_PRIVATE | grep -q ^${i}@@ && echo $i; done; done clock_getcpuclockid clock_getres clock_gettime clock_nanosleep clock_settime fmemopen glob glob64 memcpy nftw nftw64 posix_spawn posix_spawnp pthread_attr_setaffinity_np pthread_cond_broadcast pthread_cond_destroy pthread_cond_init pthread_cond_signal pthread_cond_timedwait pthread_cond_wait pthread_getaffinity_np pthread_getattr_np pthread_sigmask quick_exit realpath regexec sched_getaffinity sched_setaffinity pthread_attr_getaffinity_np pthread_cond_broadcast pthread_cond_signal pthread_cond_timedwait pthread_cond_wait pthread_setaffinity_np exp exp10f exp2 exp2f expf lgamma lgammaf lgammal log log2 log2f logf pow powf totalorder totalorderf totalorderf128 totalorderf32 totalorderf32x totalorderf64 totalorderf64x totalorderl totalordermag totalordermagf totalordermagf128 totalordermagf32 totalordermagf32x totalordermagf64 totalordermagf64x totalordermagl So, which of these are interposed by sanitizers?
On x86_64 these are intercepted from the list you provided: sched_getaffinity clock_getcpuclockid lgammal clock_settime clock_gettime memcpy fmemopen realpath lgamma lgammaf glob glob64 pthread_attr_getaffinity_np pthread_sigmask regexec clock_getres
The master branch has been updated by Martin Liska <marxin@gcc.gnu.org>: https://gcc.gnu.org/g:81fee438512460f1be50d91ee5de452f8fe5cc18 commit r11-7555-g81fee438512460f1be50d91ee5de452f8fe5cc18 Author: Martin Liska <mliska@suse.cz> Date: Mon Mar 8 15:52:03 2021 +0100 libsanitizer: cherry-pick ad294e572bc5c16f9dc420cc994322de6ca3fbfb libsanitizer/ChangeLog: PR sanitizer/98920 * asan/asan_interceptors.cpp (COMMON_INTERCEPT_FUNCTION_VER): Cherry pick. (COMMON_INTERCEPT_FUNCTION_VER_UNVERSIONED_FALLBACK): Likewise. * asan/asan_interceptors.h (ASAN_INTERCEPT_FUNC_VER_UNVERSIONED_FALLBACK): Likewise. * sanitizer_common/sanitizer_common_interceptors.inc (COMMON_INTERCEPT_FUNCTION_GLIBC_VER_MIN): Likewise. (INIT_REGEX): Likewise. * tsan/tsan_interceptors_posix.cpp (COMMON_INTERCEPT_FUNCTION_VER_UNVERSIONED_FALLBACK): Likewise. gcc/testsuite/ChangeLog: PR sanitizer/98920 * c-c++-common/asan/pr98920.c: New test.
Fixed on master, not planning to backport.
As I'd reported upstream, the new test FAILs on Solaris: +FAIL: c-c++-common/asan/pr98920.c -O0 (test for excess errors) +UNRESOLVED: c-c++-common/asan/pr98920.c -O0 compilation failed to produce executable +FAIL: c-c++-common/asan/pr98920.c -O1 (test for excess errors) +UNRESOLVED: c-c++-common/asan/pr98920.c -O1 compilation failed to produce executable +FAIL: c-c++-common/asan/pr98920.c -O2 (test for excess errors) +UNRESOLVED: c-c++-common/asan/pr98920.c -O2 compilation failed to produce executable +FAIL: c-c++-common/asan/pr98920.c -O2 -flto (test for excess errors) +UNRESOLVED: c-c++-common/asan/pr98920.c -O2 -flto compilation failed to produce executable +FAIL: c-c++-common/asan/pr98920.c -O2 -flto -flto-partition=none (test for excess errors) +UNRESOLVED: c-c++-common/asan/pr98920.c -O2 -flto -flto-partition=none compilation failed to produce executable +FAIL: c-c++-common/asan/pr98920.c -O3 -g (test for excess errors) +UNRESOLVED: c-c++-common/asan/pr98920.c -O3 -g compilation failed to produce executable +FAIL: c-c++-common/asan/pr98920.c -Os (test for excess errors) +UNRESOLVED: c-c++-common/asan/pr98920.c -Os compilation failed to produce executable Excess errors: /vol/gcc/src/hg/master/local/gcc/testsuite/c-c++-common/asan/pr98920.c:17:66: error: 'REG_STARTEND' was not declared in this scope; did you mean 'REG_EPAREN'? It should either be restricted to glibc targets if that's what it requires, or to those that do support REG_STARTEND.
Indeed. But perhaps instead of adding new effective target tests, in this case it could be resolved by: --- gcc/testsuite/c-c++-common/asan/pr98920.c.jj 2021-03-08 23:40:33.935447429 +0100 +++ gcc/testsuite/c-c++-common/asan/pr98920.c 2021-03-09 12:15:28.904809967 +0100 @@ -3,10 +3,13 @@ #include <stdio.h> #include <sys/types.h> +#if __has_include(<regex.h>) #include <regex.h> +#endif int main(void) { +#ifdef REG_STARTEND regex_t r; const char s[] = "ban\0ana"; regmatch_t pmatch[10]; @@ -20,5 +23,6 @@ int main(void) return 3; } regfree(&r); +#endif return 0; }
> --- Comment #18 from Jakub Jelinek <jakub at gcc dot gnu.org> --- > Indeed. > But perhaps instead of adding new effective target tests, in this case it could > be resolved by: [... wrapping the bulk of main in #ifdef REG_STARTEND ...] Right. In fact that's currently the preferred solution when I reported the bug upstream https://reviews.llvm.org/D96348, which has the same issue.
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>: https://gcc.gnu.org/g:ea7fff4c43a24dee1db4153250455e6040a1afce commit r11-7576-gea7fff4c43a24dee1db4153250455e6040a1afce Author: Jakub Jelinek <jakub@redhat.com> Date: Tue Mar 9 14:14:09 2021 +0100 testsuite: Fix up pr98920.c on non-glibc or old glibc targets [PR98920] Not all OSes have regex.h and not all OSes that do have REG_STARTEND macro support. Conditionalize the test on that. 2021-03-09 Jakub Jelinek <jakub@redhat.com> PR sanitizer/98920 * c-c++-common/asan/pr98920.c: Only include regex.h if the header exists. If REG_STARTEND macro isn't defined, just return 0 from main instead of the actual test.