Bug 98920

Summary: [10/11 Regression] uses regexec without support for REG_STARTEND with -fsanitize=address
Product: gcc Reporter: Matthias Klose <doko>
Component: sanitizerAssignee: Martin Liška <marxin>
Status: RESOLVED FIXED    
Severity: normal CC: dimhen, dodji, dvyukov, fw, fweimer, jakub, kcc, marxin, ro
Priority: P2 Keywords: wrong-code
Version: 10.2.1   
Target Milestone: 10.3   
URL: https://reviews.llvm.org/D96348
Host: Target:
Build: Known to work:
Known to fail: Last reconfirmed: 2021-02-02 00:00:00
Attachments: Patch v2

Description Matthias Klose 2021-02-01 16:42:10 UTC
[forwarded from https://bugs.debian.org/949192]

When gcc-10 compiles with -fsanitize=address, it substitutes any calls
to regexec with a version that does not support REG_STARTEND.  This
makes code that is compiled fail unexpectedly or even produce spurious
sanitization errors, since with that option the buffer need not be
NUL-terminated.

While REG_STARTEND is not in POSIX, it is found on the BSDs and Linux
and users may reasonably rely on the fact that it is present on those
systems.

This issue has caused a bug in the Git testsuite as seen at
https://lore.kernel.org/git/20200117174931.GA8958@coredump.intra.peff.net/T/#t.

I've attached a testcase.  Without -fsanitize=address, it succeeds
silently.  With -fsanitize=address, it fails and prints an error.

Please either fix the regexec implementation such that it is fully
functional compared to the version in glibc or disable the sanitization
of regexec until it has feature parity.

$ cat test.c 
#include <stdio.h>
#include <sys/types.h>
#include <regex.h>

int main(void)
{
    regex_t r;
    const char s[] = "ban\0ana";
    regmatch_t pmatch[10];
    pmatch[0].rm_so = 0;
    pmatch[0].rm_eo = sizeof(s);
    if (regcomp(&r, "ana", 0))
        return 2;
    if (regexec(&r, s, sizeof(pmatch)/sizeof(pmatch[0]), pmatch, REG_STARTEND)) {
        fprintf(stderr, "failed to match\n");
        regfree(&r);
        return 3;
    }
    regfree(&r);
    return 0;
}


$ gcc-9 -fsanitize=address test.c && ./a.out 

$ gcc-10 -fsanitize=address test.c && ./a.out 
failed to match

$ gcc-11 -fsanitize=address test.c && ./a.out 
failed to match
Comment 1 Martin Liška 2021-02-02 10:04:29 UTC
Let me fix that.
Comment 2 Martin Liška 2021-02-02 12:52:05 UTC
I've just made an upstream patch submission.
Comment 3 Jakub Jelinek 2021-02-04 11:08:22 UTC
I'm not sure if your patch is correct.
glibc has the system of earliest symbol versions, and so on certain architectures
GLIBC_2.3.4 symver will not appear at all.
Given:
./sysdeps/mach/hurd/shlib-versions:DEFAULT			GLIBC_2.2.6
./sysdeps/unix/sysv/linux/csky/shlib-versions:DEFAULT			GLIBC_2.29
./sysdeps/unix/sysv/linux/microblaze/shlib-versions:DEFAULT			GLIBC_2.18
./sysdeps/unix/sysv/linux/arc/shlib-versions:DEFAULT                 GLIBC_2.32
./sysdeps/unix/sysv/linux/m68k/coldfire/shlib-versions:DEFAULT                 GLIBC_2.4
./sysdeps/unix/sysv/linux/arm/shlib-versions:DEFAULT			GLIBC_2.4
./sysdeps/unix/sysv/linux/s390/s390-64/shlib-versions:DEFAULT			GLIBC_2.2
./sysdeps/unix/sysv/linux/riscv/shlib-versions:DEFAULT		GLIBC_2.27
./sysdeps/unix/sysv/linux/riscv/shlib-versions:DEFAULT		GLIBC_2.27
./sysdeps/unix/sysv/linux/riscv/shlib-versions:DEFAULT		GLIBC_2.33
./sysdeps/unix/sysv/linux/riscv/shlib-versions:DEFAULT		GLIBC_2.33
./sysdeps/unix/sysv/linux/x86_64/x32/shlib-versions:# DEFAULT			Earliest symbol set
./sysdeps/unix/sysv/linux/x86_64/x32/shlib-versions:DEFAULT			GLIBC_2.16
./sysdeps/unix/sysv/linux/x86_64/64/shlib-versions:# DEFAULT			Earliest symbol set
./sysdeps/unix/sysv/linux/x86_64/64/shlib-versions:DEFAULT			GLIBC_2.2.5
./sysdeps/unix/sysv/linux/powerpc/powerpc64/shlib-versions:DEFAULT			GLIBC_2.17
./sysdeps/unix/sysv/linux/powerpc/powerpc64/shlib-versions:DEFAULT			GLIBC_2.3
./sysdeps/unix/sysv/linux/nios2/shlib-versions:DEFAULT                 GLIBC_2.21
./sysdeps/unix/sysv/linux/aarch64/shlib-versions:DEFAULT			GLIBC_2.17
and the limited list of arches supported by libsanitizer, I'd say at least riscv*, powerpc64le and aarch64 (and maybe x86-64 -mx32 if supported) are affected.
Comment 4 Jakub Jelinek 2021-02-04 11:19:16 UTC
And maybe it would be worth comparing the list of glibc symbols with multiple symbol versions with the list of interposed libsanitizer symbols, perhaps several others need similar treatment.
Comment 5 Martin Liška 2021-02-05 12:39:21 UTC
(In reply to Jakub Jelinek from comment #3)
> I'm not sure if your patch is correct.
> glibc has the system of earliest symbol versions, and so on certain
> architectures
> GLIBC_2.3.4 symver will not appear at all.
> Given:
> ./sysdeps/mach/hurd/shlib-versions:DEFAULT			GLIBC_2.2.6
> ./sysdeps/unix/sysv/linux/csky/shlib-versions:DEFAULT			GLIBC_2.29
> ./sysdeps/unix/sysv/linux/microblaze/shlib-versions:DEFAULT			GLIBC_2.18
> ./sysdeps/unix/sysv/linux/arc/shlib-versions:DEFAULT                
> GLIBC_2.32
> ./sysdeps/unix/sysv/linux/m68k/coldfire/shlib-versions:DEFAULT              
> GLIBC_2.4
> ./sysdeps/unix/sysv/linux/arm/shlib-versions:DEFAULT			GLIBC_2.4
> ./sysdeps/unix/sysv/linux/s390/s390-64/shlib-versions:DEFAULT			GLIBC_2.2
> ./sysdeps/unix/sysv/linux/riscv/shlib-versions:DEFAULT		GLIBC_2.27
> ./sysdeps/unix/sysv/linux/riscv/shlib-versions:DEFAULT		GLIBC_2.27
> ./sysdeps/unix/sysv/linux/riscv/shlib-versions:DEFAULT		GLIBC_2.33
> ./sysdeps/unix/sysv/linux/riscv/shlib-versions:DEFAULT		GLIBC_2.33
> ./sysdeps/unix/sysv/linux/x86_64/x32/shlib-versions:# DEFAULT			Earliest
> symbol set
> ./sysdeps/unix/sysv/linux/x86_64/x32/shlib-versions:DEFAULT			GLIBC_2.16
> ./sysdeps/unix/sysv/linux/x86_64/64/shlib-versions:# DEFAULT			Earliest
> symbol set
> ./sysdeps/unix/sysv/linux/x86_64/64/shlib-versions:DEFAULT			GLIBC_2.2.5
> ./sysdeps/unix/sysv/linux/powerpc/powerpc64/shlib-versions:DEFAULT		
> GLIBC_2.17
> ./sysdeps/unix/sysv/linux/powerpc/powerpc64/shlib-versions:DEFAULT		
> GLIBC_2.3
> ./sysdeps/unix/sysv/linux/nios2/shlib-versions:DEFAULT                
> GLIBC_2.21
> ./sysdeps/unix/sysv/linux/aarch64/shlib-versions:DEFAULT			GLIBC_2.17
> and the limited list of arches supported by libsanitizer, I'd say at least
> riscv*, powerpc64le and aarch64 (and maybe x86-64 -mx32 if supported) are
> affected.

Thank you for this. You are right, my patch is not correct. So for the archs we care about we should do:

x86_64 - require GLIBC_2.3.4
ppc64 - require GLIBC_2.3.4

aarch64, ppc64le, x32 and riscv* are newer than 2.3.4, so a default non-versioned symbol should be fine.

Am I right?
Comment 6 Jakub Jelinek 2021-02-05 13:07:01 UTC
Well, it is not about what arches you care about, but what arches we support in libsanitizer/configure.tgt (from *-linux*).
So, riscv64, aarch64, mips, arm, s390*, sparc*, powerpc*, x86.
So it is desirable to get it right for all these.
Thus, I think you want to use GLIBC_2.3.4 for Linux except on arm, riscv*, x86_64 -mx32, powerpc64le and aarch64.
So you need to look up SANITIZE* macros for all of these...
Comment 7 Florian Weimer 2021-02-05 13:16:11 UTC
I think libsanitizer falls back to a version-less lookup if the version cannot be found. Therefore, if the glibc baseline is after 2.3.4, the version-less lookup will find the unversioned symbol, which has the right behavior.

I don't see any architecture that has two regexec symbols, but does not use GLIBC_2.3.4 for the most recent symbol, based on this command in the glibc source tree:

git grep -c ' regexec F' | grep :2$ | cut -d: -f1 | xargs grep ' regexec F'

A comment in the interceptor might make sense, though.
Comment 8 Jakub Jelinek 2021-02-05 13:34:00 UTC
Even if it does, exporting regexec@@GLIBC_2.3.4 from libsanitizer when glibc doesn't support that symbol looks wrong.
Comment 9 Florian Weimer 2021-02-05 14:29:39 UTC
(In reply to Jakub Jelinek from comment #8)
> Even if it does, exporting regexec@@GLIBC_2.3.4 from libsanitizer when glibc
> doesn't support that symbol looks wrong.

I think all the interceptors use unversioned symbols, so this doesn't matter.

Yes, it's quite broken, but when fixing this, you might as well go with the flow …
Comment 10 Jakub Jelinek 2021-02-05 14:41:19 UTC
Ugh, that is quite misdesigned then...
Comment 11 Martin Liška 2021-02-09 09:33:07 UTC
(In reply to Florian Weimer from comment #7)
> I think libsanitizer falls back to a version-less lookup if the version
> cannot be found. Therefore, if the glibc baseline is after 2.3.4, the
> version-less lookup will find the unversioned symbol, which has the right
> behavior.

Are you sure Florian about it. I've just tested the patch posted here:
https://reviews.llvm.org/D95864 on aarch64 and I get the following crash:

gcc -fsanitize=address test.c -g && ./a.out 
AddressSanitizer:DEADLYSIGNAL
=================================================================
==23512==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x000000000000 bp 0xffffe9a34200 sp 0xffffe9a34200 T0)
==23512==Hint: pc points to the zero page.
==23512==The signal is caused by a READ memory access.
==23512==Hint: address points to the zero page.
    #0 0x0  (<unknown module>)
    #1 0xffff7d11269c  (linux-vdso.so.1+0x69c)

That said, there's no fallback.
Comment 12 Martin Liška 2021-02-09 09:35:25 UTC
Created attachment 50149 [details]
Patch v2

There's a second version of the patch that selectively define SANITIZER_REGEXEC_VERSION based on the target.

@Jakub: It follows your list of target expect ARM which should be fine:
head -n1 ./sysdeps/unix/sysv/linux/arm/shlib-versions
DEFAULT			GLIBC_2.4

Thoughts?
Comment 13 Jakub Jelinek 2021-02-09 09:54:08 UTC
LGTM.  But as I said, it could be useful to look for other symbols too.
Looking at i686 glibc, I see
for j in /lib/libc.so.6 /lib/libpthread.so.0 /lib/libm.so.6; do for i in `readelf -Ws $j | grep -v ' UND ' | awk '{print $8}' | grep -v GLIBC_PRIVATE | grep '[^@]@[^@]' | sed 's/@.*$//' | sort -u`; do readelf -Ws $j | grep -v ' UND ' | awk '{print $8}' | grep -v GLIBC_PRIVATE | grep -q ^${i}@@ && echo $i; done; done
alphasort64
chown
clock_getcpuclockid
clock_getres
clock_gettime
clock_nanosleep
clock_settime
fclose
fcntl
fdopen
fgetpos
fgetpos64
fmemopen
fnmatch
fopen
fopencookie
fsetpos
fsetpos64
__fxstat64
getaliasbyname_r
getaliasent_r
getgrent_r
getgrgid_r
getgrnam_r
gethostbyaddr_r
gethostbyname2_r
gethostbyname_r
gethostent_r
getnetbyaddr_r
getnetbyname_r
getnetent_r
getprotobyname_r
getprotobynumber_r
getprotoent_r
getpwent_r
getpwnam_r
getpwuid_r
getrlimit
getrlimit64
getrpcbyname_r
getrpcbynumber_r
getrpcent_r
getservbyname_r
getservbyport_r
getservent_r
getspent_r
getspnam_r
glob
glob64
_IO_do_write
_IO_fclose
_IO_fdopen
_IO_fgetpos
_IO_fgetpos64
_IO_file_attach
_IO_file_close_it
_IO_file_finish
_IO_file_fopen
_IO_file_init
_IO_file_overflow
_IO_file_seekoff
_IO_file_setbuf
_IO_file_sync
_IO_file_underflow
_IO_file_write
_IO_file_xsputn
_IO_fopen
_IO_fsetpos
_IO_fsetpos64
_IO_popen
_IO_proc_close
_IO_proc_open
localeconv
__lxstat64
msgctl
nftw
nftw64
pclose
popen
posix_fadvise64
posix_fallocate64
posix_spawn
posix_spawnp
pthread_attr_init
pthread_attr_setaffinity_np
pthread_cond_broadcast
pthread_cond_destroy
pthread_cond_init
pthread_cond_signal
pthread_cond_timedwait
pthread_cond_wait
pthread_getaffinity_np
pthread_getattr_np
pthread_sigmask
quick_exit
readdir64
readdir64_r
realpath
regexec
scandir64
sched_getaffinity
sched_setaffinity
semctl
setrlimit
shmctl
tmpfile
versionsort64
vm86
__xstat64
pthread_attr_getaffinity_np
pthread_cond_broadcast
pthread_cond_signal
pthread_cond_timedwait
pthread_cond_wait
pthread_create
pthread_setaffinity_np
sem_destroy
sem_getvalue
sem_init
sem_post
sem_trywait
sem_wait
exp
exp10f
exp2
exp2f
expf
feclearexcept
fegetenv
fegetexceptflag
feraiseexcept
fesetenv
fesetexceptflag
feupdateenv
lgamma
lgammaf
lgammal
log
log2
log2f
logf
pow
powf
totalorder
totalorderf
totalorderf128
totalorderf32
totalorderf32x
totalorderf64
totalorderf64x
totalorderl
totalordermag
totalordermagf
totalordermagf128
totalordermagf32
totalordermagf32x
totalordermagf64
totalordermagf64x
totalordermagl

On x86_64:
for j in /lib64/libc.so.6 /lib64/libpthread.so.0 /lib64/libm.so.6; do for i in `readelf -Ws $j | grep -v ' UND ' | awk '{print $8}' | grep -v GLIBC_PRIVATE | grep '[^@]@[^@]' | sed 's/@.*$//' | sort -u`; do readelf -Ws $j | grep -v ' UND ' | awk '{print $8}' | grep -v GLIBC_PRIVATE | grep -q ^${i}@@ && echo $i; done; done
clock_getcpuclockid
clock_getres
clock_gettime
clock_nanosleep
clock_settime
fmemopen
glob
glob64
memcpy
nftw
nftw64
posix_spawn
posix_spawnp
pthread_attr_setaffinity_np
pthread_cond_broadcast
pthread_cond_destroy
pthread_cond_init
pthread_cond_signal
pthread_cond_timedwait
pthread_cond_wait
pthread_getaffinity_np
pthread_getattr_np
pthread_sigmask
quick_exit
realpath
regexec
sched_getaffinity
sched_setaffinity
pthread_attr_getaffinity_np
pthread_cond_broadcast
pthread_cond_signal
pthread_cond_timedwait
pthread_cond_wait
pthread_setaffinity_np
exp
exp10f
exp2
exp2f
expf
lgamma
lgammaf
lgammal
log
log2
log2f
logf
pow
powf
totalorder
totalorderf
totalorderf128
totalorderf32
totalorderf32x
totalorderf64
totalorderf64x
totalorderl
totalordermag
totalordermagf
totalordermagf128
totalordermagf32
totalordermagf32x
totalordermagf64
totalordermagf64x
totalordermagl

So, which of these are interposed by sanitizers?
Comment 14 Martin Liška 2021-02-09 11:50:25 UTC
On x86_64 these are intercepted from the list you provided:

sched_getaffinity
clock_getcpuclockid
lgammal
clock_settime
clock_gettime
memcpy
fmemopen
realpath
lgamma
lgammaf
glob
glob64
pthread_attr_getaffinity_np
pthread_sigmask
regexec
clock_getres
Comment 15 CVS Commits 2021-03-08 14:55:40 UTC
The master branch has been updated by Martin Liska <marxin@gcc.gnu.org>:

https://gcc.gnu.org/g:81fee438512460f1be50d91ee5de452f8fe5cc18

commit r11-7555-g81fee438512460f1be50d91ee5de452f8fe5cc18
Author: Martin Liska <mliska@suse.cz>
Date:   Mon Mar 8 15:52:03 2021 +0100

    libsanitizer: cherry-pick ad294e572bc5c16f9dc420cc994322de6ca3fbfb
    
    libsanitizer/ChangeLog:
    
            PR sanitizer/98920
            * asan/asan_interceptors.cpp (COMMON_INTERCEPT_FUNCTION_VER):
            Cherry pick.
            (COMMON_INTERCEPT_FUNCTION_VER_UNVERSIONED_FALLBACK): Likewise.
            * asan/asan_interceptors.h (ASAN_INTERCEPT_FUNC_VER_UNVERSIONED_FALLBACK): Likewise.
            * sanitizer_common/sanitizer_common_interceptors.inc
            (COMMON_INTERCEPT_FUNCTION_GLIBC_VER_MIN): Likewise.
            (INIT_REGEX): Likewise.
            * tsan/tsan_interceptors_posix.cpp (COMMON_INTERCEPT_FUNCTION_VER_UNVERSIONED_FALLBACK):
            Likewise.
    
    gcc/testsuite/ChangeLog:
    
            PR sanitizer/98920
            * c-c++-common/asan/pr98920.c: New test.
Comment 16 Martin Liška 2021-03-08 14:56:06 UTC
Fixed on master, not planning to backport.
Comment 17 Rainer Orth 2021-03-09 10:47:34 UTC
As I'd reported upstream, the new test FAILs on Solaris:

+FAIL: c-c++-common/asan/pr98920.c   -O0  (test for excess errors)
+UNRESOLVED: c-c++-common/asan/pr98920.c   -O0  compilation failed to produce executable
+FAIL: c-c++-common/asan/pr98920.c   -O1  (test for excess errors)
+UNRESOLVED: c-c++-common/asan/pr98920.c   -O1  compilation failed to produce executable
+FAIL: c-c++-common/asan/pr98920.c   -O2  (test for excess errors)
+UNRESOLVED: c-c++-common/asan/pr98920.c   -O2  compilation failed to produce executable
+FAIL: c-c++-common/asan/pr98920.c   -O2 -flto  (test for excess errors)
+UNRESOLVED: c-c++-common/asan/pr98920.c   -O2 -flto  compilation failed to produce executable
+FAIL: c-c++-common/asan/pr98920.c   -O2 -flto -flto-partition=none  (test for excess errors)
+UNRESOLVED: c-c++-common/asan/pr98920.c   -O2 -flto -flto-partition=none  compilation failed to produce executable
+FAIL: c-c++-common/asan/pr98920.c   -O3 -g  (test for excess errors)
+UNRESOLVED: c-c++-common/asan/pr98920.c   -O3 -g  compilation failed to produce executable
+FAIL: c-c++-common/asan/pr98920.c   -Os  (test for excess errors)
+UNRESOLVED: c-c++-common/asan/pr98920.c   -Os  compilation failed to produce executable

Excess errors:
/vol/gcc/src/hg/master/local/gcc/testsuite/c-c++-common/asan/pr98920.c:17:66: error: 'REG_STARTEND' was not declared in this scope; did you mean 'REG_EPAREN'?

It should either be restricted to glibc targets if that's what it requires,
or to those that do support REG_STARTEND.
Comment 18 Jakub Jelinek 2021-03-09 11:16:09 UTC
Indeed.
But perhaps instead of adding new effective target tests, in this case it could be resolved by:
--- gcc/testsuite/c-c++-common/asan/pr98920.c.jj	2021-03-08 23:40:33.935447429 +0100
+++ gcc/testsuite/c-c++-common/asan/pr98920.c	2021-03-09 12:15:28.904809967 +0100
@@ -3,10 +3,13 @@
 
 #include <stdio.h>
 #include <sys/types.h>
+#if __has_include(<regex.h>)
 #include <regex.h>
+#endif
 
 int main(void)
 {
+#ifdef REG_STARTEND
     regex_t r;
     const char s[] = "ban\0ana";
     regmatch_t pmatch[10];
@@ -20,5 +23,6 @@ int main(void)
         return 3;
     }
     regfree(&r);
+#endif
     return 0;
 }
Comment 19 ro@CeBiTec.Uni-Bielefeld.DE 2021-03-09 12:06:57 UTC
> --- Comment #18 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
> Indeed.
> But perhaps instead of adding new effective target tests, in this case it could
> be resolved by:
[... wrapping the bulk of main in #ifdef REG_STARTEND ...]

Right.  In fact that's currently the preferred solution when I reported
the bug upstream https://reviews.llvm.org/D96348, which has the same issue.
Comment 20 CVS Commits 2021-03-09 13:15:36 UTC
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:ea7fff4c43a24dee1db4153250455e6040a1afce

commit r11-7576-gea7fff4c43a24dee1db4153250455e6040a1afce
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Tue Mar 9 14:14:09 2021 +0100

    testsuite: Fix up pr98920.c on non-glibc or old glibc targets [PR98920]
    
    Not all OSes have regex.h and not all OSes that do have REG_STARTEND macro support.
    Conditionalize the test on that.
    
    2021-03-09  Jakub Jelinek  <jakub@redhat.com>
    
            PR sanitizer/98920
            * c-c++-common/asan/pr98920.c: Only include regex.h if the header
            exists.  If REG_STARTEND macro isn't defined, just return 0 from main
            instead of the actual test.