As discussed yesterday on IRC, the current GCC build has various issues that keep it from being fully parallelizable on machines with a high number of CPUs. I hacked make to record a timestamp when a target is triggered and when it finishes: https://github.com/marxin/make/tree/timestamp Then I built GCC with -j1 and used the following parser to generate reports: https://github.com/marxin/script-misc/blob/master/parse-make-log.py I prepared various reports that I'm going to add as attachments.
Created attachment 43420 [details] make all-host -j8 on 8 core Haswell machine
Created attachment 43421 [details] make all-host -j128 on 128 core EPYC machine
Created attachment 43422 [details] make (for configure --disable-bootstrap) -j128 on 128 core EPYC machine
Created attachment 43423 [details] wall time report: make (for configure --disable-bootstrap) on Haswell machine (system compiler -O2 -g)
Created attachment 43424 [details] wall time report: bootstrap stage1 on Haswell machine
Created attachment 43425 [details] wall time report: bootstrap stage2 on Haswell machine
Created attachment 43426 [details] wall time report: bootstrap stage3 on Haswell machine
I forgot to note that the minimum time threshold for the wall time reports is 0.5s: targets that took less than that are not listed.
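For readers who want to reproduce this kind of report, here is a minimal sketch of the thresholding step. The log format (`START <target> <seconds>` / `END <target> <seconds>`) and the sample targets and times are assumptions made up for illustration; the patched make and parse-make-log.py linked above may use a different format.

```python
from typing import Dict, Iterable, List, Tuple


def target_durations(lines: Iterable[str]) -> Dict[str, float]:
    """Pair START/END records and return the wall time per target.

    The three-field record format is hypothetical, not the real log format.
    """
    started: Dict[str, float] = {}
    durations: Dict[str, float] = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip ordinary make output
        kind, target, stamp = parts
        if kind == "START":
            started[target] = float(stamp)
        elif kind == "END" and target in started:
            durations[target] = float(stamp) - started.pop(target)
    return durations


def report(durations: Dict[str, float], threshold: float = 0.5) -> List[Tuple[str, float]]:
    """Keep targets that took at least `threshold` seconds, slowest first."""
    rows = [(target, secs) for target, secs in durations.items() if secs >= threshold]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Made-up sample data, only to exercise the functions.
sample_log = [
    "START gimple-match.o 100.0",
    "END gimple-match.o 135.5",
    "START line-map.o 136.0",
    "END line-map.o 136.3",
    "START insn-emit.o 137.0",
    "END insn-emit.o 149.0",
]

rows = report(target_durations(sample_log))
for target, secs in rows:
    print(f"{secs:8.2f}s  {target}")
```

With a 0.5s threshold the short `line-map.o` entry in the sample is dropped, which is why quick compilations do not show up in such a report.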
Created attachment 43428 [details] Parallel build of make all-host on 8 core Haswell machine
Created attachment 43432 [details] Parallel build of make all-host on 8 core Haswell machine
(In reply to Martin Liška from comment #10)
> Created attachment 43432 [details]
> Parallel build of make all-host on 8 core Haswell machine

This was generated with a slightly modified make (able to run fully in parallel): https://github.com/marxin/make/tree/timestamp-v2
The output is then parsed and a 'stacked' graph is generated: https://github.com/marxin/script-misc/blob/master/parse-make-log-parallel.py
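The core of a 'stacked' view is counting how many targets run at each moment. A minimal sketch of that computation follows, assuming each target has already been reduced to a `(start, end)` pair in seconds; the sample intervals are invented and this is not the code of parse-make-log-parallel.py.

```python
from typing import List, Tuple

Interval = Tuple[float, float]


def concurrency_profile(intervals: List[Interval]) -> List[Tuple[float, int]]:
    """Return (time, running_jobs) steps using a sweep over start/end events."""
    events = []
    for start, end in intervals:
        events.append((start, 1))   # a job starts
        events.append((end, -1))    # a job finishes
    events.sort()  # at equal times, finishes (-1) sort before starts (+1)

    profile: List[Tuple[float, int]] = []
    running = 0
    for time, delta in events:
        running += delta
        if profile and profile[-1][0] == time:
            profile[-1] = (time, running)  # merge events at the same instant
        else:
            profile.append((time, running))
    return profile


def average_parallelism(intervals: List[Interval]) -> float:
    """Total busy time divided by the wall time of the whole build."""
    busy = sum(end - start for start, end in intervals)
    wall = max(end for _, end in intervals) - min(start for start, _ in intervals)
    return busy / wall


# Invented intervals: one long job and three short ones.
jobs = [(0, 10), (0, 4), (4, 6), (0, 2)]
profile = concurrency_profile(jobs)
print(profile)
print(average_parallelism(jobs))
```

An average far below the `-j` value means most cores sit idle while a few long targets finish.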
Created attachment 43439 [details] Parallel build of make all-host on 8 core Haswell machine
Created attachment 43440 [details] Parallel build of make all-host on 128 core EPYC machine
Created attachment 43478 [details] -ftime-report for most time consuming files on Haswell machine
This is a -O0 build? That's what that time report shows afaics.
Created attachment 43482 [details] -ftime-report for most time consuming files on Haswell machine
Properly generated with -O2, which was missing in the previous version.
The results in comment #13 seem to be missing some compilations -- I would have expected to see more files from libcpp in there. As it is I only see directives.o and line-map.o.
Created attachment 43492 [details] Parallel build of make all-host on 128 core EPYC machine (log file)
(In reply to Tom Tromey from comment #17)
> The results in comment #13 seem to be missing some compilations --
> I would have expected to see more files from libcpp in there.
> As it is I only see directives.o and line-map.o.

There was a minimum threshold of 0.5s; please take a look at the log file in: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402#c18
For the libsanitizer/*/*_interceptors I made a quick patch: https://github.com/marxin/gcc/commit/5ce658230db567474997fa411f23ac78366487ce which basically splits asan_interceptors.cc and sanitizer_common_interceptors.inc and moves the implementation of the string functions to a separate compilation unit. This shrinks the time from 38s to 34s for asan_interceptors.cc when built with a stage1 compiler with checking enabled. I believe splitting the interceptors into a couple of logical sub-files will make it much faster. I can imagine splitting them into components like string, stdio, time, process, thread, math, ... List of interceptors grepped from sanitizer_common_interceptors.inc: INTERCEPTOR(SIZE_T, strlen, const char *s) { INTERCEPTOR(SIZE_T, strnlen, const char *s, SIZE_T maxlen) { INTERCEPTOR(char*, strndup, const char *s, uptr size) { INTERCEPTOR(char*, __strndup, const char *s, uptr size) { INTERCEPTOR(char*, textdomain, const char *domainname) { INTERCEPTOR(int, strcmp, const char *s1, const char *s2) { INTERCEPTOR(int, strncmp, const char *s1, const char *s2, uptr size) { INTERCEPTOR(int, strcasecmp, const char *s1, const char *s2) { INTERCEPTOR(int, strncasecmp, const char *s1, const char *s2, SIZE_T size) { INTERCEPTOR(char*, strstr, const char *s1, const char *s2) { INTERCEPTOR(char*, strcasestr, const char *s1, const char *s2) { INTERCEPTOR(char*, strtok, char *str, const char *delimiters) { INTERCEPTOR(void*, memmem, const void *s1, SIZE_T len1, const void *s2, INTERCEPTOR(char*, strchr, const char *s, int c) { INTERCEPTOR(char*, strchrnul, const char *s, int c) { INTERCEPTOR(char*, strrchr, const char *s, int c) { INTERCEPTOR(SIZE_T, strspn, const char *s1, const char *s2) { INTERCEPTOR(SIZE_T, strcspn, const char *s1, const char *s2) { INTERCEPTOR(char *, strpbrk, const char *s1, const char *s2) { INTERCEPTOR(void *, memset, void *dst, int v, uptr size) { INTERCEPTOR(void *, memmove, void *dst, const void *src, uptr size) { INTERCEPTOR(void *, memcpy, void *dst, const void *src, uptr
size) { INTERCEPTOR(int, memcmp, const void *a1, const void *a2, uptr size) { INTERCEPTOR(void*, memchr, const void *s, int c, SIZE_T n) { INTERCEPTOR(void*, memrchr, const void *s, int c, SIZE_T n) { INTERCEPTOR(double, frexp, double x, int *exp) { INTERCEPTOR(float, frexpf, float x, int *exp) { INTERCEPTOR(long double, frexpl, long double x, int *exp) { INTERCEPTOR(SSIZE_T, read, int fd, void *ptr, SIZE_T count) { INTERCEPTOR(SIZE_T, fread, void *ptr, SIZE_T size, SIZE_T nmemb, void *file) { INTERCEPTOR(SSIZE_T, pread, int fd, void *ptr, SIZE_T count, OFF_T offset) { INTERCEPTOR(SSIZE_T, pread64, int fd, void *ptr, SIZE_T count, OFF64_T offset) { INTERCEPTOR_WITH_SUFFIX(SSIZE_T, readv, int fd, __sanitizer_iovec *iov, INTERCEPTOR(SSIZE_T, preadv, int fd, __sanitizer_iovec *iov, int iovcnt, INTERCEPTOR(SSIZE_T, preadv64, int fd, __sanitizer_iovec *iov, int iovcnt, INTERCEPTOR(SSIZE_T, write, int fd, void *ptr, SIZE_T count) { INTERCEPTOR(SIZE_T, fwrite, const void *p, uptr size, uptr nmemb, void *file) { INTERCEPTOR(SSIZE_T, pwrite, int fd, void *ptr, SIZE_T count, OFF_T offset) { INTERCEPTOR(SSIZE_T, pwrite64, int fd, void *ptr, OFF64_T count, INTERCEPTOR_WITH_SUFFIX(SSIZE_T, writev, int fd, __sanitizer_iovec *iov, INTERCEPTOR(SSIZE_T, pwritev, int fd, __sanitizer_iovec *iov, int iovcnt, INTERCEPTOR(SSIZE_T, pwritev64, int fd, __sanitizer_iovec *iov, int iovcnt, INTERCEPTOR(int, prctl, int option, unsigned long arg2, INTERCEPTOR(unsigned long, time, unsigned long *t) { INTERCEPTOR(__sanitizer_tm *, localtime, unsigned long *timep) { INTERCEPTOR(__sanitizer_tm *, localtime_r, unsigned long *timep, void *result) { INTERCEPTOR(__sanitizer_tm *, gmtime, unsigned long *timep) { INTERCEPTOR(__sanitizer_tm *, gmtime_r, unsigned long *timep, void *result) { INTERCEPTOR(char *, ctime, unsigned long *timep) { INTERCEPTOR(char *, ctime_r, unsigned long *timep, char *result) { INTERCEPTOR(char *, asctime, __sanitizer_tm *tm) { INTERCEPTOR(char *, asctime_r, __sanitizer_tm 
*tm, char *result) { INTERCEPTOR(long, mktime, __sanitizer_tm *tm) { INTERCEPTOR(char *, strptime, char *s, char *format, __sanitizer_tm *tm) { INTERCEPTOR(int, vscanf, const char *format, va_list ap) INTERCEPTOR(int, vsscanf, const char *str, const char *format, va_list ap) INTERCEPTOR(int, vfscanf, void *stream, const char *format, va_list ap) INTERCEPTOR(int, __isoc99_vscanf, const char *format, va_list ap) INTERCEPTOR(int, __isoc99_vsscanf, const char *str, const char *format, INTERCEPTOR(int, __isoc99_vfscanf, void *stream, const char *format, va_list ap) INTERCEPTOR(int, scanf, const char *format, ...) INTERCEPTOR(int, fscanf, void *stream, const char *format, ...) INTERCEPTOR(int, sscanf, const char *str, const char *format, ...) INTERCEPTOR(int, __isoc99_scanf, const char *format, ...) INTERCEPTOR(int, __isoc99_fscanf, void *stream, const char *format, ...) INTERCEPTOR(int, __isoc99_sscanf, const char *str, const char *format, ...) INTERCEPTOR(int, vprintf, const char *format, va_list ap) INTERCEPTOR(int, vfprintf, __sanitizer_FILE *stream, const char *format, INTERCEPTOR(int, vsnprintf, char *str, SIZE_T size, const char *format, INTERCEPTOR(int, vsnprintf_l, char *str, SIZE_T size, void *loc, INTERCEPTOR(int, snprintf_l, char *str, SIZE_T size, void *loc, INTERCEPTOR(int, vsprintf, char *str, const char *format, va_list ap) INTERCEPTOR(int, vasprintf, char **strp, const char *format, va_list ap) INTERCEPTOR(int, __isoc99_vprintf, const char *format, va_list ap) INTERCEPTOR(int, __isoc99_vfprintf, __sanitizer_FILE *stream, INTERCEPTOR(int, __isoc99_vsnprintf, char *str, SIZE_T size, const char *format, INTERCEPTOR(int, __isoc99_vsprintf, char *str, const char *format, INTERCEPTOR(int, printf, const char *format, ...) INTERCEPTOR(int, fprintf, __sanitizer_FILE *stream, const char *format, ...) INTERCEPTOR(int, sprintf, char *str, const char *format, ...) // NOLINT INTERCEPTOR(int, snprintf, char *str, SIZE_T size, const char *format, ...) 
INTERCEPTOR(int, asprintf, char **strp, const char *format, ...) INTERCEPTOR(int, __isoc99_printf, const char *format, ...) INTERCEPTOR(int, __isoc99_fprintf, __sanitizer_FILE *stream, const char *format, INTERCEPTOR(int, __isoc99_sprintf, char *str, const char *format, ...) INTERCEPTOR(int, __isoc99_snprintf, char *str, SIZE_T size, INTERCEPTOR(int, ioctl, int d, unsigned long request, ...) { INTERCEPTOR(__sanitizer_passwd *, getpwnam, const char *name) { INTERCEPTOR(__sanitizer_passwd *, getpwuid, u32 uid) { INTERCEPTOR(__sanitizer_group *, getgrnam, const char *name) { INTERCEPTOR(__sanitizer_group *, getgrgid, u32 gid) { INTERCEPTOR(int, getpwnam_r, const char *name, __sanitizer_passwd *pwd, INTERCEPTOR(int, getpwuid_r, u32 uid, __sanitizer_passwd *pwd, char *buf, INTERCEPTOR(int, getgrnam_r, const char *name, __sanitizer_group *grp, INTERCEPTOR(int, getgrgid_r, u32 gid, __sanitizer_group *grp, char *buf, INTERCEPTOR(__sanitizer_passwd *, getpwent, int dummy) { INTERCEPTOR(__sanitizer_group *, getgrent, int dummy) { INTERCEPTOR(__sanitizer_passwd *, fgetpwent, void *fp) { INTERCEPTOR(__sanitizer_group *, fgetgrent, void *fp) { INTERCEPTOR(int, getpwent_r, __sanitizer_passwd *pwbuf, char *buf, INTERCEPTOR(int, fgetpwent_r, void *fp, __sanitizer_passwd *pwbuf, char *buf, INTERCEPTOR(int, getgrent_r, __sanitizer_group *pwbuf, char *buf, SIZE_T buflen, INTERCEPTOR(int, fgetgrent_r, void *fp, __sanitizer_group *pwbuf, char *buf, INTERCEPTOR(void, setpwent, int dummy) { INTERCEPTOR(void, endpwent, int dummy) { INTERCEPTOR(void, setgrent, int dummy) { INTERCEPTOR(void, endgrent, int dummy) { INTERCEPTOR(int, clock_getres, u32 clk_id, void *tp) { INTERCEPTOR(int, clock_gettime, u32 clk_id, void *tp) { INTERCEPTOR(int, clock_settime, u32 clk_id, const void *tp) { INTERCEPTOR(int, getitimer, int which, void *curr_value) { INTERCEPTOR(int, setitimer, int which, const void *new_value, void *old_value) { INTERCEPTOR(int, glob, const char *pattern, int flags, 
INTERCEPTOR(int, glob64, const char *pattern, int flags, INTERCEPTOR_WITH_SUFFIX(int, wait, int *status) { INTERCEPTOR_WITH_SUFFIX(int, waitid, int idtype, long long id, void *infop, INTERCEPTOR_WITH_SUFFIX(int, waitid, int idtype, int id, void *infop, INTERCEPTOR_WITH_SUFFIX(int, waitpid, int pid, int *status, int options) { INTERCEPTOR(int, wait3, int *status, int options, void *rusage) { INTERCEPTOR(int, __wait4, int pid, int *status, int options, void *rusage) { INTERCEPTOR(int, wait4, int pid, int *status, int options, void *rusage) { INTERCEPTOR(char *, inet_ntop, int af, const void *src, char *dst, u32 size) { INTERCEPTOR(int, inet_pton, int af, const char *src, void *dst) { INTERCEPTOR(int, inet_aton, const char *cp, void *dst) { INTERCEPTOR(int, pthread_getschedparam, uptr thread, int *policy, int *param) { INTERCEPTOR(int, getaddrinfo, char *node, char *service, INTERCEPTOR(int, getnameinfo, void *sockaddr, unsigned salen, char *host, INTERCEPTOR(int, getsockname, int sock_fd, void *addr, int *addrlen) { INTERCEPTOR(struct __sanitizer_hostent *, gethostbyname, char *name) { INTERCEPTOR(struct __sanitizer_hostent *, gethostbyaddr, void *addr, int len, INTERCEPTOR(struct __sanitizer_hostent *, gethostent, int fake) { INTERCEPTOR(struct __sanitizer_hostent *, gethostbyname2, char *name, int af) { INTERCEPTOR(int, gethostbyname_r, char *name, struct __sanitizer_hostent *ret, INTERCEPTOR(int, gethostent_r, struct __sanitizer_hostent *ret, char *buf, INTERCEPTOR(int, gethostbyaddr_r, void *addr, int len, int type, INTERCEPTOR(int, gethostbyname2_r, char *name, int af, INTERCEPTOR(int, getsockopt, int sockfd, int level, int optname, void *optval, INTERCEPTOR(int, accept, int fd, void *addr, unsigned *addrlen) { INTERCEPTOR(int, accept4, int fd, void *addr, unsigned *addrlen, int f) { INTERCEPTOR(double, modf, double x, double *iptr) { INTERCEPTOR(float, modff, float x, float *iptr) { INTERCEPTOR(long double, modfl, long double x, long double *iptr) { 
INTERCEPTOR(SSIZE_T, recvmsg, int fd, struct __sanitizer_msghdr *msg, INTERCEPTOR(SSIZE_T, sendmsg, int fd, struct __sanitizer_msghdr *msg, INTERCEPTOR(int, getpeername, int sockfd, void *addr, unsigned *addrlen) { INTERCEPTOR(int, sysinfo, void *info) { INTERCEPTOR(__sanitizer_dirent *, opendir, const char *path) { INTERCEPTOR(__sanitizer_dirent *, readdir, void *dirp) { INTERCEPTOR(int, readdir_r, void *dirp, __sanitizer_dirent *entry, INTERCEPTOR(__sanitizer_dirent64 *, readdir64, void *dirp) { INTERCEPTOR(int, readdir64_r, void *dirp, __sanitizer_dirent64 *entry, INTERCEPTOR(uptr, ptrace, int request, int pid, void *addr, void *data) { INTERCEPTOR(char *, setlocale, int category, char *locale) { INTERCEPTOR(char *, getcwd, char *buf, SIZE_T size) { INTERCEPTOR(char *, get_current_dir_name, int fake) { INTERCEPTOR(INTMAX_T, strtoimax, const char *nptr, char **endptr, int base) { INTERCEPTOR(INTMAX_T, strtoumax, const char *nptr, char **endptr, int base) { INTERCEPTOR(SIZE_T, mbstowcs, wchar_t *dest, const char *src, SIZE_T len) { INTERCEPTOR(SIZE_T, mbsrtowcs, wchar_t *dest, const char **src, SIZE_T len, INTERCEPTOR(SIZE_T, mbsnrtowcs, wchar_t *dest, const char **src, SIZE_T nms, INTERCEPTOR(SIZE_T, wcstombs, char *dest, const wchar_t *src, SIZE_T len) { INTERCEPTOR(SIZE_T, wcsrtombs, char *dest, const wchar_t **src, SIZE_T len, INTERCEPTOR(SIZE_T, wcsnrtombs, char *dest, const wchar_t **src, SIZE_T nms, INTERCEPTOR(SIZE_T, wcrtomb, char *dest, wchar_t src, void *ps) { INTERCEPTOR(int, tcgetattr, int fd, void *termios_p) { INTERCEPTOR(char *, realpath, const char *path, char *resolved_path) { INTERCEPTOR(char *, canonicalize_file_name, const char *path) { INTERCEPTOR(SIZE_T, confstr, int name, char *buf, SIZE_T len) { INTERCEPTOR(int, sched_getaffinity, int pid, SIZE_T cpusetsize, void *mask) { INTERCEPTOR(int, sched_getparam, int pid, void *param) { INTERCEPTOR(char *, strerror, int errnum) { INTERCEPTOR(int, strerror_r, int errnum, char *buf, SIZE_T buflen) { 
INTERCEPTOR(char *, strerror_r, int errnum, char *buf, SIZE_T buflen) { INTERCEPTOR(int, __xpg_strerror_r, int errnum, char *buf, SIZE_T buflen) { INTERCEPTOR(int, scandir, char *dirp, __sanitizer_dirent ***namelist, INTERCEPTOR(int, scandir64, char *dirp, __sanitizer_dirent64 ***namelist, INTERCEPTOR(int, getgroups, int size, u32 *lst) { INTERCEPTOR(int, poll, __sanitizer_pollfd *fds, __sanitizer_nfds_t nfds, INTERCEPTOR(int, ppoll, __sanitizer_pollfd *fds, __sanitizer_nfds_t nfds, INTERCEPTOR(int, wordexp, char *s, __sanitizer_wordexp_t *p, int flags) { INTERCEPTOR(int, sigwait, __sanitizer_sigset_t *set, int *sig) { INTERCEPTOR(int, sigwaitinfo, __sanitizer_sigset_t *set, void *info) { INTERCEPTOR(int, sigtimedwait, __sanitizer_sigset_t *set, void *info, INTERCEPTOR(int, sigemptyset, __sanitizer_sigset_t *set) { INTERCEPTOR(int, sigfillset, __sanitizer_sigset_t *set) { INTERCEPTOR(int, sigpending, __sanitizer_sigset_t *set) { INTERCEPTOR(int, sigprocmask, int how, __sanitizer_sigset_t *set, INTERCEPTOR(int, backtrace, void **buffer, int size) { INTERCEPTOR(char **, backtrace_symbols, void **buffer, int size) { INTERCEPTOR(void, _exit, int status) { INTERCEPTOR(int, pthread_mutex_lock, void *m) { INTERCEPTOR(int, pthread_mutex_unlock, void *m) { INTERCEPTOR(__sanitizer_mntent *, getmntent, void *fp) { INTERCEPTOR(__sanitizer_mntent *, getmntent_r, void *fp, INTERCEPTOR(int, statfs, char *path, void *buf) { INTERCEPTOR(int, fstatfs, int fd, void *buf) { INTERCEPTOR(int, statfs64, char *path, void *buf) { INTERCEPTOR(int, fstatfs64, int fd, void *buf) { INTERCEPTOR(int, statvfs, char *path, void *buf) { INTERCEPTOR(int, fstatvfs, int fd, void *buf) { INTERCEPTOR(int, statvfs64, char *path, void *buf) { INTERCEPTOR(int, fstatvfs64, int fd, void *buf) { INTERCEPTOR(int, initgroups, char *user, u32 group) { INTERCEPTOR(char *, ether_ntoa, __sanitizer_ether_addr *addr) { INTERCEPTOR(__sanitizer_ether_addr *, ether_aton, char *buf) { INTERCEPTOR(int, ether_ntohost, char 
*hostname, __sanitizer_ether_addr *addr) { INTERCEPTOR(int, ether_hostton, char *hostname, __sanitizer_ether_addr *addr) { INTERCEPTOR(int, ether_line, char *line, __sanitizer_ether_addr *addr, INTERCEPTOR(char *, ether_ntoa_r, __sanitizer_ether_addr *addr, char *buf) { INTERCEPTOR(__sanitizer_ether_addr *, ether_aton_r, char *buf, INTERCEPTOR(int, shmctl, int shmid, int cmd, void *buf) { INTERCEPTOR(int, random_r, void *buf, u32 *result) { INTERCEPTOR_PTHREAD_ATTR_GET(detachstate, sizeof(int)) INTERCEPTOR_PTHREAD_ATTR_GET(guardsize, sizeof(SIZE_T)) INTERCEPTOR_PTHREAD_ATTR_GET(schedparam, struct_sched_param_sz) INTERCEPTOR_PTHREAD_ATTR_GET(schedpolicy, sizeof(int)) INTERCEPTOR_PTHREAD_ATTR_GET(scope, sizeof(int)) INTERCEPTOR_PTHREAD_ATTR_GET(stacksize, sizeof(SIZE_T)) INTERCEPTOR(int, pthread_attr_getstack, void *attr, void **addr, SIZE_T *size) { INTERCEPTOR_PTHREAD_ATTR_GET(inheritsched, sizeof(int)) INTERCEPTOR(int, pthread_attr_getaffinity_np, void *attr, SIZE_T cpusetsize, INTERCEPTOR_PTHREAD_MUTEXATTR_GET(pshared, sizeof(int)) INTERCEPTOR_PTHREAD_MUTEXATTR_GET(type, sizeof(int)) INTERCEPTOR_PTHREAD_MUTEXATTR_GET(protocol, sizeof(int)) INTERCEPTOR_PTHREAD_MUTEXATTR_GET(prioceiling, sizeof(int)) INTERCEPTOR_PTHREAD_MUTEXATTR_GET(robust, sizeof(int)) INTERCEPTOR_PTHREAD_MUTEXATTR_GET(robust_np, sizeof(int)) INTERCEPTOR_PTHREAD_RWLOCKATTR_GET(pshared, sizeof(int)) INTERCEPTOR_PTHREAD_RWLOCKATTR_GET(kind_np, sizeof(int)) INTERCEPTOR_PTHREAD_CONDATTR_GET(pshared, sizeof(int)) INTERCEPTOR_PTHREAD_CONDATTR_GET(clock, sizeof(int)) INTERCEPTOR_PTHREAD_BARRIERATTR_GET(pshared, sizeof(int)) // !mac !android INTERCEPTOR(char *, tmpnam, char *s) { INTERCEPTOR(char *, tmpnam_r, char *s) { INTERCEPTOR(int, ttyname_r, int fd, char *name, SIZE_T namesize) { INTERCEPTOR(char *, tempnam, char *dir, char *pfx) { INTERCEPTOR(int, pthread_setname_np, uptr thread, const char *name) { INTERCEPTOR(void, sincos, double x, double *sin, double *cos) { INTERCEPTOR(void, sincosf, float x, 
float *sin, float *cos) { INTERCEPTOR(void, sincosl, long double x, long double *sin, long double *cos) { INTERCEPTOR(double, remquo, double x, double y, int *quo) { INTERCEPTOR(float, remquof, float x, float y, int *quo) { INTERCEPTOR(long double, remquol, long double x, long double y, int *quo) { INTERCEPTOR(double, lgamma, double x) { INTERCEPTOR(float, lgammaf, float x) { INTERCEPTOR(long double, lgammal, long double x) { INTERCEPTOR(double, lgamma_r, double x, int *signp) { INTERCEPTOR(float, lgammaf_r, float x, int *signp) { INTERCEPTOR(long double, lgammal_r, long double x, int *signp) { INTERCEPTOR(int, drand48_r, void *buffer, double *result) { INTERCEPTOR(int, lrand48_r, void *buffer, long *result) { INTERCEPTOR(int, rand_r, unsigned *seedp) { INTERCEPTOR(SSIZE_T, getline, char **lineptr, SIZE_T *n, void *stream) { INTERCEPTOR(SSIZE_T, __getdelim, char **lineptr, SIZE_T *n, int delim, INTERCEPTOR(SSIZE_T, getdelim, char **lineptr, SIZE_T *n, int delim, INTERCEPTOR(SIZE_T, iconv, void *cd, char **inbuf, SIZE_T *inbytesleft, INTERCEPTOR(__sanitizer_clock_t, times, void *tms) { INTERCEPTOR(void *, __tls_get_addr, void *arg) { INTERCEPTOR(uptr, __tls_get_addr_internal, void *arg) { INTERCEPTOR(SSIZE_T, listxattr, const char *path, char *list, SIZE_T size) { INTERCEPTOR(SSIZE_T, llistxattr, const char *path, char *list, SIZE_T size) { INTERCEPTOR(SSIZE_T, flistxattr, int fd, char *list, SIZE_T size) { INTERCEPTOR(SSIZE_T, getxattr, const char *path, const char *name, char *value, INTERCEPTOR(SSIZE_T, lgetxattr, const char *path, const char *name, char *value, INTERCEPTOR(SSIZE_T, fgetxattr, int fd, const char *name, char *value, INTERCEPTOR(int, getresuid, void *ruid, void *euid, void *suid) { INTERCEPTOR(int, getresgid, void *rgid, void *egid, void *sgid) { INTERCEPTOR(int, getifaddrs, __sanitizer_ifaddrs **ifap) { INTERCEPTOR(char *, if_indextoname, unsigned int ifindex, char* ifname) { INTERCEPTOR(unsigned int, if_nametoindex, const char* ifname) { 
INTERCEPTOR(int, capget, void *hdrp, void *datap) { INTERCEPTOR(int, capset, void *hdrp, const void *datap) { INTERCEPTOR(void *, __aeabi_memmove, void *to, const void *from, uptr size) { INTERCEPTOR(void *, __aeabi_memmove4, void *to, const void *from, uptr size) { INTERCEPTOR(void *, __aeabi_memmove8, void *to, const void *from, uptr size) { INTERCEPTOR(void *, __aeabi_memcpy, void *to, const void *from, uptr size) { INTERCEPTOR(void *, __aeabi_memcpy4, void *to, const void *from, uptr size) { INTERCEPTOR(void *, __aeabi_memcpy8, void *to, const void *from, uptr size) { INTERCEPTOR(void *, __aeabi_memset, void *block, uptr size, int c) { INTERCEPTOR(void *, __aeabi_memset4, void *block, uptr size, int c) { INTERCEPTOR(void *, __aeabi_memset8, void *block, uptr size, int c) { INTERCEPTOR(void *, __aeabi_memclr, void *block, uptr size) { INTERCEPTOR(void *, __aeabi_memclr4, void *block, uptr size) { INTERCEPTOR(void *, __aeabi_memclr8, void *block, uptr size) { INTERCEPTOR(void *, __bzero, void *block, uptr size) { INTERCEPTOR(int, ftime, __sanitizer_timeb *tp) { INTERCEPTOR(void, xdrmem_create, __sanitizer_XDR *xdrs, uptr addr, INTERCEPTOR(void, xdrstdio_create, __sanitizer_XDR *xdrs, void *file, int op) { INTERCEPTOR(int, xdr_bytes, __sanitizer_XDR *xdrs, char **p, unsigned *sizep, INTERCEPTOR(int, xdr_string, __sanitizer_XDR *xdrs, char **p, INTERCEPTOR(void *, tsearch, void *key, void **rootp, INTERCEPTOR(int, __uflow, __sanitizer_FILE *fp) { INTERCEPTOR(int, __underflow, __sanitizer_FILE *fp) { INTERCEPTOR(int, __overflow, __sanitizer_FILE *fp, int ch) { INTERCEPTOR(int, __wuflow, __sanitizer_FILE *fp) { INTERCEPTOR(int, __wunderflow, __sanitizer_FILE *fp) { INTERCEPTOR(int, __woverflow, __sanitizer_FILE *fp, int ch) { INTERCEPTOR(__sanitizer_FILE *, fopen, const char *path, const char *mode) { INTERCEPTOR(__sanitizer_FILE *, fdopen, int fd, const char *mode) { INTERCEPTOR(__sanitizer_FILE *, freopen, const char *path, const char *mode, 
INTERCEPTOR(__sanitizer_FILE *, fopen64, const char *path, const char *mode) { INTERCEPTOR(__sanitizer_FILE *, freopen64, const char *path, const char *mode, INTERCEPTOR(__sanitizer_FILE *, open_memstream, char **ptr, SIZE_T *sizeloc) { INTERCEPTOR(__sanitizer_FILE *, open_wmemstream, wchar_t **ptr, INTERCEPTOR(__sanitizer_FILE *, fmemopen, void *buf, SIZE_T size, INTERCEPTOR(int, _obstack_begin_1, __sanitizer_obstack *obstack, int sz, INTERCEPTOR(int, _obstack_begin, __sanitizer_obstack *obstack, int sz, INTERCEPTOR(void, _obstack_newchunk, __sanitizer_obstack *obstack, int length) { INTERCEPTOR(int, fflush, __sanitizer_FILE *fp) { INTERCEPTOR(int, fclose, __sanitizer_FILE *fp) { INTERCEPTOR(void*, dlopen, const char *filename, int flag) { INTERCEPTOR(int, dlclose, void *handle) { INTERCEPTOR(char *, getpass, const char *prompt) { INTERCEPTOR(int, timerfd_settime, int fd, int flags, void *new_value, INTERCEPTOR(int, timerfd_gettime, int fd, void *curr_value) { INTERCEPTOR(int, mlock, const void *addr, uptr len) { INTERCEPTOR(int, munlock, const void *addr, uptr len) { INTERCEPTOR(int, mlockall, int flags) { INTERCEPTOR(int, munlockall, void) { INTERCEPTOR(__sanitizer_FILE *, fopencookie, void *cookie, const char *mode, INTERCEPTOR(int, sem_init, __sanitizer_sem_t *s, int pshared, unsigned value) { INTERCEPTOR(int, sem_destroy, __sanitizer_sem_t *s) { INTERCEPTOR(int, sem_wait, __sanitizer_sem_t *s) { INTERCEPTOR(int, sem_trywait, __sanitizer_sem_t *s) { INTERCEPTOR(int, sem_timedwait, __sanitizer_sem_t *s, void *abstime) { INTERCEPTOR(int, sem_post, __sanitizer_sem_t *s) { INTERCEPTOR(int, sem_getvalue, __sanitizer_sem_t *s, int *sval) { INTERCEPTOR(int, pthread_setcancelstate, int state, int *oldstate) { INTERCEPTOR(int, pthread_setcanceltype, int type, int *oldtype) { INTERCEPTOR(int, mincore, void *addr, uptr length, unsigned char *vec) { INTERCEPTOR(SSIZE_T, process_vm_readv, int pid, __sanitizer_iovec *local_iov, INTERCEPTOR(SSIZE_T, process_vm_writev, int 
pid, __sanitizer_iovec *local_iov, INTERCEPTOR(char *, ctermid, char *s) { INTERCEPTOR(char *, ctermid_r, char *s) { INTERCEPTOR(SSIZE_T, recv, int fd, void *buf, SIZE_T len, int flags) { INTERCEPTOR(SSIZE_T, recvfrom, int fd, void *buf, SIZE_T len, int flags, INTERCEPTOR(SSIZE_T, send, int fd, void *buf, SIZE_T len, int flags) { INTERCEPTOR(SSIZE_T, sendto, int fd, void *buf, SIZE_T len, int flags, INTERCEPTOR(int, eventfd_read, int fd, u64 *value) { INTERCEPTOR(int, eventfd_write, int fd, u64 value) { INTERCEPTOR(int, stat, const char *path, void *buf) { INTERCEPTOR(int, __xstat, int version, const char *path, void *buf) { INTERCEPTOR(int, __xstat64, int version, const char *path, void *buf) { INTERCEPTOR(int, __lxstat, int version, const char *path, void *buf) { INTERCEPTOR(int, __lxstat64, int version, const char *path, void *buf) { INTERCEPTOR(void *, getutent, int dummy) { INTERCEPTOR(void *, getutid, void *ut) { INTERCEPTOR(void *, getutline, void *ut) { INTERCEPTOR(void *, getutxent, int dummy) { INTERCEPTOR(void *, getutxid, void *ut) { INTERCEPTOR(void *, getutxline, void *ut) { INTERCEPTOR(int, getloadavg, double *loadavg, int nelem) { INTERCEPTOR(int, mcheck, void (*abortfunc)(int mstatus)) { INTERCEPTOR(int, mcheck_pedantic, void (*abortfunc)(int mstatus)) { INTERCEPTOR(int, mprobe, void *ptr) { INTERCEPTOR(SIZE_T, wcslen, const wchar_t *s) { INTERCEPTOR(SIZE_T, wcsnlen, const wchar_t *s, SIZE_T n) { INTERCEPTOR(wchar_t *, wcscat, wchar_t *dst, const wchar_t *src) { INTERCEPTOR(wchar_t *, wcsncat, wchar_t *dst, const wchar_t *src, SIZE_T n) {
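To get a feeling for how the list above would fall into components, a grep-style classification can be sketched as follows. The regular expressions and the component names are illustrative assumptions, not the split used in the linked commit; lines using the INTERCEPTOR_PTHREAD_*_GET macros are deliberately not matched by this sketch.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Matches INTERCEPTOR(type, name, ... and INTERCEPTOR_WITH_SUFFIX(type, name, ...
INTERCEPTOR_RE = re.compile(r"INTERCEPTOR(?:_WITH_SUFFIX)?\(\s*[^,]+,\s*(\w+)")

# Hypothetical rules; the first match wins, so order matters.
RULES = [
    ("string", re.compile(r"^(str|mem|wcs)")),
    ("stdio", re.compile(r"(printf|scanf)$|^(fopen|fclose|fflush|fread|fwrite)")),
    ("time", re.compile(r"time|^clock_")),
    ("thread", re.compile(r"^(pthread_|sem_)")),
    ("math", re.compile(r"^(frexp|modf|sincos|remquo|lgamma)")),
]


def classify(name: str) -> str:
    """Return the first component whose pattern matches, else 'other'."""
    for component, pattern in RULES:
        if pattern.search(name):
            return component
    return "other"


def group_interceptors(source: str) -> Dict[str, List[str]]:
    """Group every intercepted function name found in `source` by component."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in INTERCEPTOR_RE.findall(source):
        groups[classify(name)].append(name)
    return dict(groups)


# A few lines taken from the list above.
sample = """
INTERCEPTOR(SIZE_T, strlen, const char *s) {
INTERCEPTOR(long double, frexpl, long double x, int *exp) {
INTERCEPTOR(int, vsscanf, const char *str, const char *format, va_list ap)
INTERCEPTOR(__sanitizer_tm *, localtime, unsigned long *timep) {
INTERCEPTOR_WITH_SUFFIX(int, wait, int *status) {
INTERCEPTOR(int, pthread_mutex_lock, void *m) {
"""

groups = group_interceptors(sample)
for component, names in sorted(groups.items()):
    print(component, names)
```

The size of each group gives a rough idea of how evenly the compile time might be spread, although the real cost depends on the function bodies and not only on their number.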
On Wed, 4 Apr 2018, marxin at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402
>
> --- Comment #20 from Martin Liška <marxin at gcc dot gnu.org> ---
> For the libsanitizer/*/*_interceptors I make a quick patch:
> https://github.com/marxin/gcc/commit/5ce658230db567474997fa411f23ac78366487ce
> which basically splits asan_interceptors.cc and
> sanitizer_common_interceptors.inc and moves implementation of string functions
> to a separate compile unit.
> This shrinks time from 38->34s for asan_interceptors.cc being built with
> enabled checking stage1 compiler.
>
> I believe splitting the interceptors to couple of logical sub-files will make
> it very fast. List of interceptors grepped from
> sanitizer_common_interceptors.inc:
> I can imagine splitting that to components like string, stdio, time, process,
> thread, math,..

The question is of course _why_ it is this slow. It's not that this
is 10000s of functions or very large ones...
(In reply to rguenther@suse.de from comment #21)
> On Wed, 4 Apr 2018, marxin at gcc dot gnu.org wrote:
>
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402
> >
> > --- Comment #20 from Martin Liška <marxin at gcc dot gnu.org> ---
> > For the libsanitizer/*/*_interceptors I make a quick patch:
> > https://github.com/marxin/gcc/commit/5ce658230db567474997fa411f23ac78366487ce
> > which basically splits asan_interceptors.cc and
> > sanitizer_common_interceptors.inc and moves implementation of string functions
> > to a separate compile unit.
> > This shrinks time from 38->34s for asan_interceptors.cc being built with
> > enabled checking stage1 compiler.
> >
> > I believe splitting the interceptors to couple of logical sub-files will make
> > it very fast. List of interceptors grepped from
> > sanitizer_common_interceptors.inc:
> > I can imagine splitting that to components like string, stdio, time, process,
> > thread, math,..
>
> The question is of course _why_ it is this slow. It's not that this
> is 10000s of functions or very large ones...

It's analyzed here: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78288
I can easily split insn-emit.c. Once we know which way the split should be done, I can prepare a patch for that.
(In reply to Martin Liška from comment #23)
> I can easily split insn-emit.c. Once we know which was a split should be
> done, I can prepare patch for that.

Confirmed, please do this!
Let me assign it.
Created attachment 45630 [details]
make -j 64 all-gcc, with --disable-bootstrap, on 64 cores. Blue means a dependency of gimple-match.

Since gimple-match.c takes so long to compile, I was wondering if it might be possible to reorder the build so that its compilation starts earlier in the dependency graph. I did the following steps:

1) 'configure --disable-bootstrap'
2) 'make -j 64 all-gcc'
3) 'make clean'
4) 'make gimple-match.o' using a wrapper[1] that I created to log all files required by gimple-match

Then I plotted the attached graph. Here, blue means a dependency and the largest bar is 'gimple-match.c' itself. I used a 64-core AMD Opteron 6376 in the process. Any ideas?

[1] https://github.com/giulianobelinassi/gcc-timer-analysis
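The effect of starting a long compilation earlier can be illustrated with a toy list scheduler. This is only a sketch under strong assumptions: the job durations are invented, dependencies are ignored entirely, and each job goes to whichever worker becomes free first, which is a simplification of what make actually does.

```python
import heapq
from typing import List


def makespan(durations: List[float], workers: int) -> float:
    """Simulate greedy list scheduling and return the total wall time.

    Jobs are started in the given order; each one goes to the worker
    that becomes idle first. Dependencies between jobs are ignored.
    """
    finish_times = [0.0] * workers
    heapq.heapify(finish_times)
    for duration in durations:
        earliest = heapq.heappop(finish_times)
        heapq.heappush(finish_times, earliest + duration)
    return max(finish_times)


# Hypothetical build: one long job (10s) and six short ones (2s each).
long_job_last = [2, 2, 2, 2, 2, 2, 10]
long_job_first = [10, 2, 2, 2, 2, 2, 2]

print(makespan(long_job_last, workers=2))   # long job starts late
print(makespan(long_job_first, workers=2))  # long job starts immediately
```

In this toy case the wall time drops from 16 to 12 when the long job is started first, but it can never go below the duration of the longest single job, however many workers are available.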
> Since gimple-match.c takes so long to compile, I was wondering if it might
> be possible to reorder the compilation so we can push its compilation early
> in the dependency graph.

No, the proper fix would be to split the generated files and compile them in parallel. The same applies to all the generated insn-*.c files.

Anyway, I like the graph you made :)
But what version of GCC is this graph, with what exact configuration?
> No, the proper fix would be to split the generated files and compile them in parallel. Similarly for all the insn-*.c generated files. That would the proper fix.

Indeed. However, I am working on parallelizing the compilation with threads. This may lead to a solution, but may not be the best for this scenario.

> Anyway, I like the graph you made :)

Thank you.

> But what version of GCC is this graph, with what exact configuration?

* This is the gcc that I used to build: *

Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/8/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 8.2.0-14' --with-bugurl=file:///usr/share/doc/gcc-8/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --prefix=/usr --with-gcc-major-version-only --program-suffix=-8 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --enable-default-pie --with-system-zlib --with-target-system-zlib --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 8.2.0 (Debian 8.2.0-14)

* The gcc that I built: *

Using built-in specs.
COLLECT_GCC=./xgcc
Target: x86_64-pc-linux-gnu
Configured with: /home/giulianob/gcc_svn/trunk//configure --disable-checking --disable-bootstrap
Thread model: posix
gcc version 9.0.1 20190205 (experimental) (GCC)
A possible solution could be to use '-flinker-output=nolto-rel -r' for huge files.
I think this came up at Cauldron, but I forget what exactly people said about it...
(In reply to Eric Gallager from comment #31)
> I think this came up at Cauldron, but I forget what exactly people said
> about it...

Actually this PR comes before Cauldron 2019.

One way to fix this issue is to make the match.pd parser output several smaller gimple-match.c files, and add these to the Makefile. The same procedure can then be repeated for other big files.

Another solution is to parallelize GCC internals and make GCC communicate with Make somehow, so that when a CPU is idle, it starts compiling some files in parallel.
GCC 10.1 has been released.
(In reply to Giuliano Belinassi from comment #32) > (In reply to Eric Gallager from comment #31) > > I think this came up at Cauldron, but I forget what exactly people said > > about it... > > Actually this PR comes before Cauldron 2019. By "came up" I meant simply that it was mentioned, not that that was where it originated...
(In reply to Martin Liška from comment #30)
> A possible solution can be usage of '-flinker-output=nolto-rel -r' for huge
> files.

Is it useful for splitting huge files?
(In reply to jojo from comment #35)
> (In reply to Martin Liška from comment #30)
> > A possible solution can be usage of '-flinker-output=nolto-rel -r' for huge
> > files.
>
> it's useful for splitting huge files ?

Here's an experiment I did:

$ time g++ -O2 /tmp/gimple-match.ii -c

real	0m35.790s
user	0m35.490s
sys	0m0.268s

$ time g++ -O2 /tmp/gimple-match.ii -c -flto

real	0m8.138s
user	0m7.915s
sys	0m0.202s

$ time gcc -flto=auto -flinker-output=nolto-rel gimple-match.o -r -o gimple-match2.o

real	0m9.087s
user	1m56.028s
sys	0m3.292s

$ time gcc -flto=auto -flinker-output=nolto-rel gimple-match.o -r -o gimple-match2.o --param lto-partitions=8

real	0m7.350s
user	0m48.548s
sys	0m0.976s

$ time gcc -flto=auto -flinker-output=nolto-rel gimple-match.o -r -o gimple-match2.o --param lto-partitions=4

real	0m9.847s
user	0m30.462s
sys	0m0.392s

So for N==4 we get to 8+10s = 18s of wall time (compared to the original 36s). And the total user time is 30+8s, which is comparable to the original 36s.
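The arithmetic behind that summary can be checked with a small script. This is only a sketch: the timings are copied from the runs above, while the helper names (`to_seconds`, `summarize`) and the "default" label for the run without `--param lto-partitions` are made up for illustration.

```python
def to_seconds(stamp):
    """Convert a `time`-style stamp such as '1m56.028s' to seconds."""
    minutes, seconds = stamp.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)

# Plain -O2 compile of gimple-match.ii (real, user).
BASELINE = ("0m35.790s", "0m35.490s")
# Compile with -flto: this step only produces the LTO object (real, user).
LTO_COMPILE = ("0m8.138s", "0m7.915s")
# Partial-link step (-flinker-output=nolto-rel -r) per lto-partitions setting.
PARTIAL_LINK = {
    "default": ("0m9.087s", "1m56.028s"),
    "8": ("0m7.350s", "0m48.548s"),
    "4": ("0m9.847s", "0m30.462s"),
}

def summarize():
    """Return {setting: (total wall s, total user s, wall-time speedup)}."""
    base_real = to_seconds(BASELINE[0])
    result = {}
    for setting, (real, user) in PARTIAL_LINK.items():
        wall = to_seconds(LTO_COMPILE[0]) + to_seconds(real)
        cpu = to_seconds(LTO_COMPILE[1]) + to_seconds(user)
        result[setting] = (wall, cpu, base_real / wall)
    return result

if __name__ == "__main__":
    for setting, (wall, cpu, speedup) in summarize().items():
        print(f"lto-partitions={setting}: wall {wall:.1f}s, "
              f"user {cpu:.1f}s, speedup {speedup:.2f}x")
```

The point of the comparison is that fewer partitions cost less total CPU time, while the wall time stays well below that of the single plain compile.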
(In reply to Martin Liška from comment #36)
> [...]
> so for N==4 we get to 8+10s = 18s (compared to the original 36s). And total
> user time is 30+8, which is comparable
> to the original 36s.

The GSoC parallelism project this year is supposed to replicate this in a cheaper way and also develop some magic to automatically trigger it when it seems profitable.
(In reply to Martin Liška from comment #36)
> [...]
> so for N==4 we get to 8+10s = 18s (compared to the original 36s). And total
> user time is 30+8, which is comparable
> to the original 36s.

It looks like this only reduces the cost a little for a huge file such as insn-emit.c... I want to use a shell tool like 'csplit' to split it and compile the pieces in parallel.
GCC 10.2 is released, adjusting target milestone.
GCC 10.3 is being released, retargeting bugs to GCC 10.4.
Latest discussion of this can also be found at: https://gcc.gnu.org/pipermail/gcc-patches/2021-June/571555.html
Is this just about parallelism bottlenecks for the main build target (e.g. just `make` or `make all`), or does it apply to other Makefile targets, too? (e.g. the testsuite via `make check`, or docs via `make pdf` or something)
(In reply to Eric Gallager from comment #42)
> Is this just about parallelism bottlenecks for the main build target (e.g.
> just `make` or `make all`), or does it apply to other Makefile targets, too?
> (e.g. the testsuite via `make check`, or docs via `make pdf` or something)

Well, it was intended to cover only the main build, which pdf can be seen as part of. On the other hand, `make check` should belong to a different PR if you have trouble with it.
(In reply to Martin Liška from comment #43) > (In reply to Eric Gallager from comment #42) > > Is this just about parallelism bottlenecks for the main build target (e.g. > > just `make` or `make all`), or does it apply to other Makefile targets, too? > > (e.g. the testsuite via `make check`, or docs via `make pdf` or something) > > Well, it was intended to cover only the main build, which pdf can be seen as > part of. I usually have to run `make pdf` as a separate build target, though, as it doesn't get run as part of the main build for me... and the bottleneck there, for the pdf target, is in libstdc++ for me...
(In reply to Martin Liška from comment #0) > [...] > Then I built GCC with -j1 and used following parser to generate reports: > https://github.com/marxin/script-misc/blob/master/parse-make-log.py The new URL for that script is now this, btw: https://github.com/marxin/script-misc/blob/master/legacy/parse-make-log.py
Even partially making the build less recursive would likely help a fair bit. The classic text on this is https://accu.org/journals/overload/14/71/miller_2004/. This doesn't mean that splitting up files is futile, but when watching a build, much of the time, make doesn't even get to traverse into each of the directories, because it doesn't know if it's able to. It can safely be done in stages. Using includes would let you get a lot of the current state wrt split directories. Could even just have a certain number of toplevel directories but non-recursive within them.
(In reply to Sam James from comment #46)
> Even partially making the build less recursive would likely help a fair bit.

It will help a bit, sure, but not nearly as much as you perhaps hope for. There are quite a few "synchronisation" points where nothing after them can be done until everything before them has been done. Partly this is just because we have a three-stage bootstrap, but there are also some generator programs whose output everything else depends on, and those are real chokepoints.

Also, recursive make is a scourge of humanity, for sure, but fixing this has to be done in autoxxxx first and foremost.
Created attachment 53989 [details] CPU utilization of make all-host on recent AMD server The situation with a recent AMD server is really bad! Having 192 cores, the average CPU utilization of `make all-host` is 6% !
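For readers who want to reproduce such a number from a timestamped make log, the following is a minimal sketch of how an average utilization figure can be computed. It assumes each job is single-threaded; the job intervals are hypothetical and are not taken from the attachment.

```python
def average_utilization(jobs, cores):
    """Return busy CPU time divided by available CPU time.

    `jobs` is a list of (start, end) times in seconds, one entry per
    single-threaded job; `cores` is the number of hardware threads.
    """
    if not jobs:
        return 0.0
    wall = max(end for _, end in jobs) - min(start for start, _ in jobs)
    if wall <= 0:
        return 0.0
    busy = sum(end - start for start, end in jobs)
    return busy / (cores * wall)

# Hypothetical trace: eight short jobs run side by side, then one long job
# (think of a huge generated file) keeps a single core busy on its own.
HYPOTHETICAL_JOBS = [(0.0, 10.0)] * 8 + [(10.0, 70.0)]

if __name__ == "__main__":
    print(f"{average_utilization(HYPOTHETICAL_JOBS, cores=192):.1%}")
    # Number of cores kept busy on average at 6% utilization of 192 cores.
    print(0.06 * 192)
```

A single long-running job at the end of the build is enough to drag the average down sharply, which matches the shape of the graphs attached to this PR.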
One more observation I made: apparently we're trying to sort OBJS (in Makefile.in) with the biggest objects at the very beginning:

# Language-independent object files.
# We put the *-match.o and insn-*.o files first so that a parallel make
# will build them sooner, because they are large and otherwise tend to be
# the last objects to finish building.
OBJS = \
    gimple-match.o \
    generic-match.o \
    insn-attrtab.o \
    insn-automata.o \

That's fine. In addition, we make all objects depend on generated_files:

# In order for parallel make to really start compiling the expensive
# objects from $(OBJS) as early as possible, build all their
# prerequisites strictly before all objects.
$(ALL_HOST_OBJS) : | $(generated_files)

Using that, we should see gimple-match.o being spawned very soon, but that's not the case. Imagine you have already built all-host and let's see what happens:

$ rm -f gimple-match.o ; rm -f tree*.o && make -j4 --debug=b libbackend.a 2>&1 | less
...
File 'gimple-match.o' does not exist.
Prerequisite 'cs-bconfig.h' is newer than target 'bconfig.h'.
Must remake target 'bconfig.h'.
Prerequisite 'cstamp-h' is newer than target 'auto-host.h'.
Must remake target 'auto-host.h'.
Prerequisite 's-options' is newer than target 'optionlist'.
Must remake target 'optionlist'.
Prerequisite 's-gtyp-input' is newer than target 'gtyp-input.list'.
Must remake target 'gtyp-input.list'.
Prerequisite 's-bversion' is newer than target 'bversion.h'.
Must remake target 'bversion.h'.
Prerequisite 'cs-config.h' is newer than target 'config.h'.
Must remake target 'config.h'.
...
File 'tree-vrp.o' does not exist.
File 'tree.o' does not exist.
Prerequisite 's-i386-bt' is newer than target 'i386-builtin-types.inc'.
Must remake target 'i386-builtin-types.inc'.
File 'gimple-match.o' does not exist.
Prerequisite 's-modes-h' is newer than target 'insn-modes.h'.
Must remake target 'insn-modes.h'.
Prerequisite 's-modes-inline-h' is newer than target 'insn-modes-inline.h'.
Must remake target 'insn-modes-inline.h'.
Prerequisite 's-version' is newer than target 'version.h'.
Must remake target 'version.h'.
Prerequisite 's-options-h' is newer than target 'options.h'.
Must remake target 'options.h'.
Prerequisite 's-genrtl-h' is newer than target 'genrtl.h'.
Must remake target 'genrtl.h'.
Prerequisite 's-modes-m' is newer than target 'min-insn-modes.cc'.
Must remake target 'min-insn-modes.cc'.
...
File 'gimple-match.o' does not exist.
Prerequisite 's-gtype' is newer than target 'gtype-desc.h'.
Must remake target 'gtype-desc.h'.
Prerequisite 's-constants' is newer than target 'insn-constants.h'.
Must remake target 'insn-constants.h'.
...
Must remake target 'tree-affine.o'.
g++ -fno-PIE -c -g -DIN_GCC -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -fno-common -DHAVE_CONFIG_H -I. -I. -I/home/marxin/Programming/gcc/gcc -I/home/marxin/Programming/gcc/gcc/. -I/home/marxin/Programming/gcc/gcc/../include -I/home/marxin/Programming/gcc/gcc/../libcpp/include -I/home/marxin/Programming/gcc/gcc/../libcody -I/home/marxin/Programming/gcc/gcc/../libdecnumber -I/home/marxin/Programming/gcc/gcc/../libdecnumber/bid -I../libdecnumber -I/home/marxin/Programming/gcc/gcc/../libbacktrace -o tree-affine.o -MT tree-affine.o -MMD -MP -MF ./.deps/tree-affine.TPo /home/marxin/Programming/gcc/gcc/tree-affine.cc
File 'tree-call-cdce.o' does not exist.
Must remake target 'tree-call-cdce.o'.
g++ -fno-PIE -c -g -DIN_GCC -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -fno-common -DHAVE_CONFIG_H -I. -I. -I/home/marxin/Programming/gcc/gcc -I/home/marxin/Programming/gcc/gcc/. -I/home/marxin/Programming/gcc/gcc/../include -I/home/marxin/Programming/gcc/gcc/../libcpp/include -I/home/marxin/Programming/gcc/gcc/../libcody -I/home/marxin/Programming/gcc/gcc/../libdecnumber -I/home/marxin/Programming/gcc/gcc/../libdecnumber/bid -I../libdecnumber -I/home/marxin/Programming/gcc/gcc/../libbacktrace -o tree-call-cdce.o -MT tree-call-cdce.o -MMD -MP -MF ./.deps/tree-call-cdce.TPo /home/marxin/Programming/gcc/gcc/tree-call-cdce.cc
File 'tree-cfg.o' does not exist.
Must remake target 'tree-cfg.o'.
g++ -fno-PIE -c -g -DIN_GCC -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -fno-common -DHAVE_CONFIG_H -I. -I. -I/home/marxin/Programming/gcc/gcc -I/home/marxin/Programming/gcc/gcc/. -I/home/marxin/Programming/gcc/gcc/../include -I/home/marxin/Programming/gcc/gcc/../libcpp/include -I/home/marxin/Programming/gcc/gcc/../libcody -I/home/marxin/Programming/gcc/gcc/../libdecnumber -I/home/marxin/Programming/gcc/gcc/../libdecnumber/bid -I../libdecnumber -I/home/marxin/Programming/gcc/gcc/../libbacktrace -o tree-cfg.o -MT tree-cfg.o -MMD -MP -MF ./.deps/tree-cfg.TPo /home/marxin/Programming/gcc/gcc/tree-cfg.cc
File 'tree-cfgcleanup.o' does not exist.
Must remake target 'tree-cfgcleanup.o'.
So gimple-match.o has complex dependencies that make has to investigate first, and that's why it doesn't start early :/ It's likely related to the various '$(STAMP) $name' rules we use. Consider this one:

gtyp-input.list: s-gtyp-input ; @true
s-gtyp-input: Makefile
    @: $(call write_entries_to_file,$(GTFILES),tmp-gi.list)
    $(SHELL) $(srcdir)/../move-if-change tmp-gi.list gtyp-input.list
    $(STAMP) s-gtyp-input

Here we touch 's-gtyp-input' after gtyp-input.list is created, and thus gtyp-input.list always needs to be remade because its dependency s-gtyp-input is newer.

Similarly for many other rules:

gimple-match.cc: s-match gimple-match-head.cc ; @true
s-match: build/genmatch$(build_exeext) $(srcdir)/match.pd cfn-operators.pd
    $(RUN_GEN) build/genmatch$(build_exeext) --gimple $(srcdir)/match.pd \
        > tmp-gimple-match.cc
    $(RUN_GEN) build/genmatch$(build_exeext) --generic $(srcdir)/match.pd \
        > tmp-generic-match.cc
    $(SHELL) $(srcdir)/../move-if-change tmp-gimple-match.cc \
        gimple-match.cc
    $(SHELL) $(srcdir)/../move-if-change tmp-generic-match.cc \
        generic-match.cc
    $(STAMP) s-match

Here it's even more complicated. I think s-match should only be updated if generic-match.cc is touched; otherwise we again end up with an s-match that is younger than gimple-match.cc.

Could any GNU make expert please judge here? Starting e.g. gimple-match.cc early would really help to speed up the build process.
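The timestamp effect described above can be simulated outside of make. The sketch below is a simplification: it models make's decision as a bare mtime comparison, the `move_if_change` function only mimics what the script of that name is understood to do (replace the destination only when the contents differ), and the timestamps 1000/1001/2001 are arbitrary.

```python
import os
import tempfile

def move_if_change(tmp_path, dest_path):
    """Replace dest with tmp only if the contents differ; return True if replaced."""
    if os.path.exists(dest_path):
        with open(tmp_path, "rb") as new, open(dest_path, "rb") as old:
            if new.read() == old.read():
                os.remove(tmp_path)
                return False
    os.replace(tmp_path, dest_path)
    return True

def touch(path, mtime):
    """Create the file if needed and set its modification time explicitly."""
    with open(path, "a"):
        pass
    os.utime(path, (mtime, mtime))

def prerequisite_is_newer(target, prerequisite):
    """Simplified model of make's check: remake when the prerequisite is newer."""
    return os.path.getmtime(prerequisite) > os.path.getmtime(target)

def simulate():
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "gtyp-input.list")
        stamp = os.path.join(d, "s-gtyp-input")
        tmp = os.path.join(d, "tmp-gi.list")

        # First run of the s-gtyp-input recipe: the list is written, then the
        # stamp is touched afterwards, so the stamp ends up newer.
        with open(tmp, "w") as f:
            f.write("file-a.h\nfile-b.h\n")
        move_if_change(tmp, target)
        os.utime(target, (1000, 1000))
        touch(stamp, 1001)
        stale_after_first_run = prerequisite_is_newer(target, stamp)

        # Second run with identical content: the list keeps its old mtime,
        # but the stamp is touched again.
        with open(tmp, "w") as f:
            f.write("file-a.h\nfile-b.h\n")
        replaced = move_if_change(tmp, target)
        touch(stamp, 2001)
        stale_after_rerun = prerequisite_is_newer(target, stamp)

        return replaced, stale_after_first_run, stale_after_rerun

if __name__ == "__main__":
    print(simulate())
```

In this model the generated file always looks older than its stamp, which is consistent with the "Prerequisite 's-gtyp-input' is newer than target 'gtyp-input.list'" lines in the debug output.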
(In reply to Martin Liška from comment #48) > Created attachment 53989 [details] > CPU utilization of make all-host on recent AMD server > > The situation with a recent AMD server is really bad! Having 192 cores, the > average CPU utilization of `make all-host` is 6% ! Just do more builds in parallel! There's just 903 .o files in gcc/ and libbackend.a just has 490 of them. It's not surprising the few larger files stretch out the compile-time here. Try LTOing libbackend.a?
(In reply to Martin Liška from comment #49) [...] > Can please any GNU make expect judge here? Starting e.g. gimple-match.cc > early would really help > to speed up the build process. this has come up in the past and there's no reliable way to order things (just use make -j on such machines and overcommit?)
(In reply to Richard Biener from comment #51) > (In reply to Martin Liška from comment #49) > > [...] > > > Can please any GNU make expect judge here? Starting e.g. gimple-match.cc > > early would really help > > to speed up the build process. > > this has come up in the past and there's no reliable way to order things > (just use make -j on such machines and overcommit?) Doesn't make a difference to overall time so early starting isn't the issue it seems.
(In reply to Richard Biener from comment #50)
> (In reply to Martin Liška from comment #48)
> > Created attachment 53989 [details]
> > CPU utilization of make all-host on recent AMD server
> >
> > The situation with a recent AMD server is really bad! Having 192 cores, the
> > average CPU utilization of `make all-host` is 6% !
>
> Just do more builds in parallel!

No! I'm speaking about faster edit-build-debug cycles and also about faster builds of gcc packages.

> There's just 903 .o files in gcc/ and
> libbackend.a just has 490 of them. It's not surprising the few larger
> files stretch out the compile-time here.

Well, gimple-match.o takes ~66s on my new AMD Ryzen 9 5950X CPU :/

> Try LTOing libbackend.a?

Yep, that's our parallel for free approach and I would welcome that, however:

during IPA pass: inline
In member function ‘quick_push’,
    inlined from ‘make_forwarders_with_degenerate_phis’ at /home/marxin/Programming/gcc/gcc/tree-ssa-dce.cc:1848:6:
/home/marxin/Programming/gcc/gcc/vec.h:1958:28: internal compiler error: Segmentation fault
 1958 |   return m_vec->quick_push (obj);
      |                            ^
0x102f987 internal_error(char const*, ...)
    ???:0
0x117935b cgraph_node::get_untransformed_body()
    ???:0
0x123f6e9 optimize_inline_calls(tree_node*)
    ???:0
0x123e4d2 inline_transform(cgraph_node*)
    ???:0
0x123da5f execute_all_ipa_transforms(bool)
    ???:0
0x15ebe1b cgraph_node::expand()
    ???:0
0x15e2f6d symbol_table::compile()
    ???:0
0x15d0368 lto_main()
    ???:0

I'll isolate that and hope we can add a configure option for LTOed libbackend.a.
> Try LTOing libbackend.a?

So this option is not feasible either: we're paying too high a price for the parallel WPA phase of LTO, and the resulting build on 32 cores is even slower :/
Created attachment 53995 [details] make all-host on Ryzen 9
Created attachment 53996 [details] make all-host on Ryzen 9 with LTO partial linking

Using partial linking for the following 4 objects (gimple-match.o generic-match.o insn-recog.o insn-emit.o), I can speed up the build of all-host by almost 30s (from 145 to 115 seconds).
Created attachment 53997 [details] Partial linking path
Since November 2021, there's been a significant regression in the compile time for gimple-match.cc during a bootstrap build (+100% in Stage 2, +73% in Stage 3), with this regression accounting for over 20% of the current total bootstrap time on some aarch64 machines.

Most of the change in compile time is due to the following 6 commits (of which one is a performance improvement, and one only regressed the Stage 2 build):

7df89377a7ae3906255e38a79be8e5d962c3a0df 24th November 2021
Enhance optimize_atomic_bit_test_and to handle truncation. (Hongtao Liu)
Stage 2: +27%
Stage 3: +33%

9a53101caadae1b5c8d791d247b05268ee4f7f92 16th May 2022
Add MIN/MAX folding from fold_cond_expr_with_comparison to match.pd (Richard Biener)
Stage 2: +15%
Stage 3: +15%

409978d58dafa689c5b3f85013e2786526160f2c 9th August 2022
tree-optimization/106514 - add --param max-jump-thread-paths (Richard Biener)
Stage 2: -7%
Stage 3: -10%

011d0a033ab370ea38b06b813ac62be8dde0801b 18th August 2022
Make path_range_query standalone and add reset_path. (Aldy Hernandez)
Stage 2: +5%
Stage 3: +0%

4d9db4bdd458a4b526f59e4bc5bbd549d3861cea 12th December 2022
middle-end: simplify complex if expressions where comparisons are inverse of one another. (Tamar Christina)
Stage 2: +10%
Stage 3: +9%

733a1b777f16cd397b43a242d9c31761f66d3da8 13th January 2023
sched-deps: do not schedule pseudos across calls [PR108117] (Alexander Monakov)
Stage 2: +14%
Stage 3: +9%
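As a rough cross-check of "most of the change", the per-commit percentages can be combined. This sketch assumes, and the comment does not say so, that each percentage is relative to the compile time just before that commit, so the changes compound multiplicatively.

```python
from math import prod

# Per-commit changes in percent, copied from the list above, in date order.
STAGE2_CHANGES = [27, 15, -7, 5, 10, 14]
STAGE3_CHANGES = [33, 15, -10, 0, 9, 9]

def compounded_factor(changes_percent):
    """Multiply the individual slowdown factors (assumption: they compound)."""
    return prod(1 + change / 100 for change in changes_percent)

if __name__ == "__main__":
    for name, changes in (("Stage 2", STAGE2_CHANGES),
                          ("Stage 3", STAGE3_CHANGES)):
        factor = compounded_factor(changes)
        print(f"{name}: {100 * (factor - 1):+.0f}% from the six listed commits")
```

Under that assumption the six commits account for a large share of the reported +100% and +73%, but not all of it.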
(In reply to Andrew Carlotti from comment #58)
> Since November 2021, there's been a significant regression in the compile
> time for gimple-match.cc during a bootstrap build (+100% in Stage 2, +73% in
> Stage 3), with this regression accounting for over 20% of the current total
> bootstrap time on some aarch64 machines.

Thanks for the interesting numbers! Yeah, it's very unfortunate :/

> Most of the change in compile time is due to the following 6 commits (of
> which one is a performance improvement, and one only regressed the Stage 2
> build):
>
> 7df89377a7ae3906255e38a79be8e5d962c3a0df 24th November 2021
> Enhance optimize_atomic_bit_test_and to handle truncation. (Hongtao Liu)
> Stage 2: +27%
> Stage 3: +33%

This one is btw. a known issue PR108129.
(In reply to Martin Liška from comment #59)
> (In reply to Andrew Carlotti from comment #58)
> > Since November 2021, there's been a significant regression in the compile
> > time for gimple-match.cc during a bootstrap build (+100% in Stage 2, +73% in
> > Stage 3), with this regression accounting for over 20% of the current total
> > bootstrap time on some aarch64 machines.
>
> Thank for the interesting numbers! Yeah, it's very unfortunate :/
>
> > Most of the change in compile time is due to the following 6 commits (of
> > which one is a performance improvement, and one only regressed the Stage 2
> > build):
> >
> > 7df89377a7ae3906255e38a79be8e5d962c3a0df 24th November 2021
> > Enhance optimize_atomic_bit_test_and to handle truncation. (Hongtao Liu)
> > Stage 2: +27%
> > Stage 3: +33%
>
> This one is btw. a known issue PR108129.

But the revision only slightly changes the patterns, so I'm very curious how it arrived at a 30% slowdown.

Note that these (match ..) patterns that are not used from inside match.pd itself (and do not use other (match ..)) would be perfect candidates to emit to separate files, either by explicit syntax or magically, where the former would be easier to cater for in the Makefile.

The "trivial" improvement of course would be to special-case iterator uses also for (match ...) like we do for (simplify ...), where we can delay substitution.
(In reply to Richard Biener from comment #60) > > This one is btw. a known issue PR108129. > > But the revision only sligthly changes the patterns so I'm very curious > how it arrived at 30% slowdown. It adds an extra 'convert2?' to 'nop_atomic_bit_test_and_p' matchers, and since match.pd expansion works by emitting match subtrees twice for each '?' component, that gives an extra 2x factor to the already bad combinatorial explosion going on in those patterns. We really need to rework match-and-simplify emission in a smarter way. I've looked at that in January once, but there's a few things I'd need help understanding, such as... > The "trivial" improvement of course would be to special-case > iterator uses als for (match ...) like we do for (simplify ...) where > we can delay substitution. ... this. Is there a short explanation what's 'delayed substitution' in this context?
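The doubling effect can be illustrated with a toy expander. This is not how genmatch is implemented; it only treats a pattern as a flat list of operator names where a trailing '?' marks an optional component, and `bit_and` is just a placeholder operator name.

```python
from itertools import product

def expand_optionals(operators):
    """Return every variant of the pattern with each optional op present or absent."""
    choices = []
    for op in operators:
        if op.endswith("?"):
            # Optional component: one variant with it, one without it.
            choices.append((op[:-1], None))
        else:
            choices.append((op,))
    return [tuple(op for op in combo if op is not None)
            for combo in product(*choices)]

if __name__ == "__main__":
    before = expand_optionals(["convert?", "bit_and"])
    after = expand_optionals(["convert?", "convert2?", "bit_and"])
    print(len(before), len(after))
    for variant in after:
        print(variant)
```

Each additional '?' doubles the number of variants, so a pattern that already has several optional components grows quickly.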
Looking at gimple-match.cc, the case CFN_BUILT_IN_ATOMIC_FETCH_OR_{1,2,4,8,16}: etc. blocks are identical there, except for the numbers in next_after_fail* label numbers. So, could we perhaps expand everything the way we do and just when emitting a switch hash the subtree of the cases to be emitted and if the hashes are equal also compare and if the subtrees are the same (== would result in the same text being emitted into the output except for the label numbers) emit multiple cases with the same block? Admittedly I haven't looked yet at the data structures genmatch.cc uses before emitting the source, so don't know whether it is feasible.
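A sketch of that idea, working on already generated text rather than on genmatch.cc's internal data structures (which this example makes no claims about): case bodies are normalized by masking the label numbers, and cases whose normalized bodies are equal are grouped under one block. The case names and bodies below are made up.

```python
import re

# Labels such as next_after_fail123 differ only in their number.
LABEL = re.compile(r"next_after_fail\d+")

def merge_identical_cases(cases):
    """Group (case_label, body) pairs whose bodies match modulo label numbers.

    Returns a list of (labels, body) in first-seen order; the body kept for
    a group is the one from its first case.
    """
    groups = {}
    for label, body in cases:
        key = LABEL.sub("next_after_fail", body)
        # Dict lookup hashes the key and then compares it for equality.
        groups.setdefault(key, ([], body))[0].append(label)
    return list(groups.values())

def emit(groups):
    """Render the merged groups as switch cases sharing one block."""
    lines = []
    for labels, body in groups:
        lines.extend(f"case {label}:" for label in labels)
        lines.append("  { " + body + " }")
    return "\n".join(lines)

if __name__ == "__main__":
    cases = [
        ("CFN_A_1", "if (!ok) goto next_after_fail10; next_after_fail10:;"),
        ("CFN_A_2", "if (!ok) goto next_after_fail11; next_after_fail11:;"),
        ("CFN_B", "return false;"),
    ]
    print(emit(merge_identical_cases(cases)))
```

Whether the same grouping can be done cheaply before the text is emitted depends on the decision-tree representation, as the comment above notes.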
On Tue, 28 Mar 2023, amonakov at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402
>
> --- Comment #61 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
> (In reply to Richard Biener from comment #60)
> > > This one is btw. a known issue PR108129.
> >
> > But the revision only sligthly changes the patterns so I'm very curious
> > how it arrived at 30% slowdown.
>
> It adds an extra 'convert2?' to 'nop_atomic_bit_test_and_p' matchers, and since
> match.pd expansion works by emitting match subtrees twice for each '?'
> component, that gives an extra 2x factor to the already bad combinatorial
> explosion going on in those patterns.
>
> We really need to rework match-and-simplify emission in a smarter way. I've
> looked at that in January once, but there's a few things I'd need help
> understanding, such as...
>
> > The "trivial" improvement of course would be to special-case
> > iterator uses als for (match ...) like we do for (simplify ...) where
> > we can delay substitution.
>
> ... this. Is there a short explanation what's 'delayed substitution' in this
> context?

'delayed substitution' works for (simplify (...)) by not expanding the substitution for each (for ..) iterator but instead passing it as a variable to a split-out common function. For (match (...)) the "substitution" part is trivial, so there's no point doing that. But instead we can look to apply something similar to the "matching" part. When we have

 (for X (A B ...)
  (simplify
   (op (X (op2 ...) ...))
   ...

we get for the matching of 'X' (if it's not at the toplevel)

  switch (...)
    {
    case A:
      {
        .. match the rest ..
      }
    case B:
      {
        .. match the rest ..
      }
    ...

but we can instead emit (maybe only in a subset of cases?)

  switch (...)
    {
    case A:
    case B:
    case ...:
      {
        .. match the rest ..
      }

In theory we support things like

 (for X (plus IFN_POW)
  (...

as both operators are binary, so those are cases we cannot handle this way. Basically we'd keep the user-defined operator in the AST and adjust code generation to deal with that.

I will try to do that.
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:75cda3be0232f745cda4e177d514f6900390af0b

commit r13-6902-g75cda3be0232f745cda4e177d514f6900390af0b
Author: Richard Biener <rguenther@suse.de>
Date: Tue Mar 28 12:42:14 2023 +0200

    bootstrap/84402 - improve (match ...) code generation

    The following avoids duplicating matching code for (match ...) in match.pd when possible. That's more easily possible for (match ...) than simplify because we do not need to handle common matches (those would be diagnosed only during compiling) nor is the result able to inspect the active operator.

    Specifically this reduces the size of the generated matches for the atomic ops as noted in PR108129. gimple-match.cc shrinks from 245k lines to 209k lines with this patch.

            PR bootstrap/84402
            PR tree-optimization/108129
            * genmatch.cc (lower_for): For (match ...) delay substituting into the match operator if possible.
            (dt_operand::gen_gimple_expr): For user_id look at the first substitute for determining how to access operands.
            (dt_operand::gen_generic_expr): Likewise.
            (dt_node::gen_kids): Properly sort user_ids according to their substitutes.
            (dt_node::gen_kids_1): Code-generate user_id matching.
The master branch has been updated by Tamar Christina <tnfchris@gcc.gnu.org>:

https://gcc.gnu.org/g:580cda3c2799b1f8323af770e52f1eb0fa204718

commit r14-496-g580cda3c2799b1f8323af770e52f1eb0fa204718
Author: Tamar Christina <tamar.christina@arm.com>
Date: Fri May 5 13:35:17 2023 +0100

    match.pd: don't emit label if not needed

    This is a small QoL codegen improvement for match.pd to not emit labels when they are not needed. The codegen is nice and there is a small (but consistent) improvement in compile time.

    gcc/ChangeLog:

            PR bootstrap/84402
            * genmatch.cc (dt_simplify::gen_1): Only emit labels if used.
The master branch has been updated by Tamar Christina <tnfchris@gcc.gnu.org>:

https://gcc.gnu.org/g:e487fcc0f7466ea663a0fea52076337bebd42b8b

commit r14-497-ge487fcc0f7466ea663a0fea52076337bebd42b8b
Author: Tamar Christina <tamar.christina@arm.com>
Date: Fri May 5 13:36:01 2023 +0100

    match.pd: Remove commented out line pragmas unless -vv is used.

    genmatch currently outputs commented out line directives that have no effect but the compiler still has to parse only to discard. They are however handy when debugging genmatch output. As such this moves them behind the -vv flag.

    gcc/ChangeLog:

            PR bootstrap/84402
            * genmatch.cc (output_line_directive): Only emit commented directive when -vv.
The master branch has been updated by Tamar Christina <tnfchris@gcc.gnu.org>:

https://gcc.gnu.org/g:c0ce29bc1ce329001b6c02bb3d34bcbb086e1b72

commit r14-498-gc0ce29bc1ce329001b6c02bb3d34bcbb086e1b72
Author: Tamar Christina <tamar.christina@arm.com>
Date: Fri May 5 13:36:43 2023 +0100

    match.pd: CSE the dump output check.

    This is a small improvement in QoL codegen for match.pd to save time not re-evaluating the condition for printing debug information in every function. There is a small but consistent runtime and compile time win here. The runtime win comes from not having to do the condition over again, and on Arm platforms we now use the new test-and-branch support for booleans to only have a single instruction here.

    gcc/ChangeLog:

            PR bootstrap/84402
            * genmatch.cc (decision_tree::gen, write_predicate): Generate new debug_dump var.
            (dt_simplify::gen_1): Use it.
The master branch has been updated by Tamar Christina <tnfchris@gcc.gnu.org>:

https://gcc.gnu.org/g:27fcf994c5515e1bbf2ff03d28fd2fa927c7e7b5

commit r14-499-g27fcf994c5515e1bbf2ff03d28fd2fa927c7e7b5
Author: Tamar Christina <tamar.christina@arm.com>
Date: Fri May 5 13:37:49 2023 +0100

    genmatch: split shared code to gimple-match-exports.cc

    In preparation for automatically splitting match.pd files I split off the non-static helper functions that are shared between the match.pd functions off to another file. This file can be compiled in parallel and also allows us to later avoid duplicate symbols errors.

    gcc/ChangeLog:

            PR bootstrap/84402
            * Makefile.in (OBJS): Add gimple-match-exports.o.
            * genmatch.cc (decision_tree::gen): Export gimple_gimplify helpers.
            * gimple-match-head.cc (gimple_simplify, gimple_resimplify1, gimple_resimplify2, gimple_resimplify3, gimple_resimplify4, gimple_resimplify5, constant_for_folding, convert_conditional_op, maybe_resimplify_conditional_op, gimple_match_op::resimplify, maybe_build_generic_op, build_call_internal, maybe_push_res_to_seq, do_valueize, try_conditional_simplification, gimple_extract, gimple_extract_op, canonicalize_code, commutative_binary_op_p, commutative_ternary_op_p, first_commutative_argument, associative_binary_op_p, directly_supported_p, get_conditional_internal_fn): Moved to gimple-match-exports.cc
            * gimple-match-exports.cc: New file.
The master branch has been updated by Tamar Christina <tnfchris@gcc.gnu.org>: https://gcc.gnu.org/g:703417a030b3d80f55ba1402adc3f1692d3631e5 commit r14-500-g703417a030b3d80f55ba1402adc3f1692d3631e5 Author: Tamar Christina <tamar.christina@arm.com> Date: Fri May 5 13:38:50 2023 +0100 match.pd: automatically partition *-match.cc files. Following on from Richi's RFC[1] this is another attempt to split up match.pd into multiple gimple-match and generic-match files. This version is fully automated and requires no human intervention. First things first, some perf numbers. The following shows the effect of the patch on my desktop doing parallel compilation of gimple-match: +--------+------------------+--------+------------------+ | splits | rel. improvement | splits | rel. improvement | +--------+------------------+--------+------------------+ | 1 | 0.00% | 33 | 91.03% | | 2 | 71.77% | 34 | 84.02% | | 3 | 100.71% | 35 | 83.42% | | 4 | 143.08% | 36 | 78.80% | | 5 | 176.18% | 37 | 74.06% | | 6 | 174.40% | 38 | 55.76% | | 7 | 176.62% | 39 | 66.90% | | 8 | 168.35% | 40 | 18.25% | | 9 | 189.80% | 41 | 16.55% | | 10 | 171.77% | 42 | 47.02% | | 11 | 152.82% | 43 | 15.29% | | 12 | 112.20% | 44 | 21.63% | | 13 | 158.57% | 45 | 41.53% | | 14 | 158.57% | 46 | 21.98% | | 15 | 152.07% | 47 | -42.74% | | 16 | 151.70% | 48 | -32.62% | | 17 | 131.52% | 49 | 11.81% | | 18 | 133.11% | 50 | 34.07% | | 19 | 137.33% | 51 | 2.71% | | 20 | 103.83% | 52 | -22.23% | | 21 | 132.47% | 53 | 32.30% | | 22 | 116.52% | 54 | 21.45% | | 23 | 112.73% | 55 | 40.02% | | 24 | 111.94% | 56 | 42.83% | | 25 | 112.73% | 57 | -9.98% | | 26 | 104.07% | 58 | 18.01% | | 27 | 113.27% | 59 | -4.91% | | 28 | 96.77% | 60 | 22.94% | | 29 | 93.42% | 61 | -3.73% | | 30 | 87.67% | 62 | -27.43% | | 31 | 89.54% | 63 | -1.05% | | 32 | 84.42% | 64 | -5.44% | +--------+------------------+--------+------------------+ As can be seen there seems to be a point of diminishing returns in doing splits. 
This comes from the fact that these match files consume a sizeable amount of headers. At a certain point the parsing overhead of the headers dominates and the gains start to shrink. As such I've made the default 10 splits per file to allow for some room for growth in the future without needing changes to the split amount. Since 5-10 show roughly the same gains it means we can afford to double the file sizes before we need to up the split amount. This can be controlled by the configure parameter --with-matchpd-partitions=. At 10 splits the sizes of the files are:

1.2M gimple-match-1.cc
490K gimple-match-2.cc
459K gimple-match-3.cc
462K gimple-match-4.cc
466K gimple-match-5.cc
690K gimple-match-6.cc
517K gimple-match-7.cc
693K gimple-match-8.cc
1011K gimple-match-9.cc
490K gimple-match-10.cc
210K gimple-match-auto.h

The reason gimple-match-1.cc is so large is that it got allocated a very large function: gimple_simplify_NE_EXPR. Because of these sporadically large functions the allocation to a split happens based on the amount of data already written to a split instead of just a simple round robin allocation (though the patch supports that too). This means that once gimple_simplify_NE_EXPR is allocated to gimple-match-1.cc nothing uses that file again until the rest of the files catch up. To support this split a new header file *-match-auto.h is generated to allow the individual files to compile separately. Lastly for the auto generated files I use pragmas to silence the unused predicate warnings instead of the previous Makefile way because I couldn't find a way to set them without knowing the number of split files beforehand. Finally with this change, bootstrap time has dropped 8 minutes on AArch64. [1] https://gcc.gnu.org/legacy-ml/gcc-patches/2018-04/msg01125.html gcc/ChangeLog: PR bootstrap/84402 * genmatch.cc (emit_func, SIZED_BASED_CHUNKS, get_out_file): New.
(decision_tree::gen): Accept list of files instead of single and update to write function definition to header and main file. (write_predicate): Likewise. (write_header): Emit pragmas and new includes. (main): Create file buffers and cleanup. (showUsage, write_header_includes): New.
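For readers following along, the size-based allocation described in the commit message above can be sketched roughly as follows. This is only an illustration in Python, not the genmatch.cc code: the helper name assign_to_splits, the function names f1..f6 and all sizes are made up, and the only thing taken from the commit message is the rule "write the next function to the split that has the least data so far" (with round robin as the alternative).

```python
def assign_to_splits(functions, num_splits, size_based=True):
    """Assign (name, size) pairs to output splits.

    size_based=True: each function goes to the split with the least data
    written so far (ties go to the lowest-numbered split).
    size_based=False: plain round robin.
    Returns (assignment dict, list of bytes written per split).
    """
    written = [0] * num_splits
    assignment = {}
    for i, (name, size) in enumerate(functions):
        if size_based:
            target = min(range(num_splits), key=lambda s: written[s])
        else:
            target = i % num_splits
        written[target] += size
        assignment[name] = target
    return assignment, written


# Hypothetical sizes; only gimple_simplify_NE_EXPR is a real name.
funcs = [("gimple_simplify_NE_EXPR", 900)] + [(f"f{i}", 100) for i in range(1, 7)]

assignment, written = assign_to_splits(funcs, 3)
_, written_rr = assign_to_splits(funcs, 3, size_based=False)
print("size based: ", written)
print("round robin:", written_rr)
```

In this toy input the large function lands in split 0 and that split receives nothing else, while round robin keeps adding to it. That is the "nothing uses it again until the rest of the files catch up" behaviour from the commit message.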
The master branch has been updated by Tamar Christina <tnfchris@gcc.gnu.org>: https://gcc.gnu.org/g:0a85544e1aaeca41133ecfc438cda913dbc0f122 commit r14-501-g0a85544e1aaeca41133ecfc438cda913dbc0f122 Author: Tamar Christina <tamar.christina@arm.com> Date: Fri May 5 13:42:17 2023 +0100 match.pd: Use splits in makefile and make configurable. This updates the build system to split up match.pd files into chunks of 10. This also introduces a new flag --with-matchpd-partitions which can be used to change the number of partitions. For the analysis of why 10 please look at the previous patch in the series. gcc/ChangeLog: PR bootstrap/84402 * Makefile.in (NUM_MATCH_SPLITS, MATCH_SPLITS_SEQ, GIMPLE_MATCH_PD_SEQ_SRC, GIMPLE_MATCH_PD_SEQ_O, GENERIC_MATCH_PD_SEQ_SRC, GENERIC_MATCH_PD_SEQ_O): New. (OBJS, MOSTLYCLEANFILES, .PRECIOUS): Use them. (s-match): Split into s-generic-match and s-gimple-match. * configure.ac (with-matchpd-partitions, DEFAULT_MATCHPD_PARTITIONS): New. * configure: Regenerate.
The master branch has been updated by Robin Dapp <rdapp@gcc.gnu.org>: https://gcc.gnu.org/g:184378027e92f51e02d3649e0ca523f487fd2810 commit r14-5034-g184378027e92f51e02d3649e0ca523f487fd2810 Author: Robin Dapp <rdapp@ventanamicro.com> Date: Thu Oct 12 11:23:26 2023 +0200 genemit: Split insn-emit.cc into several partitions. On riscv insn-emit.cc has grown to over 1.2 million lines of code and compiling it takes considerable time. Therefore, this patch adjusts genemit to create several partitions (insn-emit-1.cc to insn-emit-n.cc). The available patterns are written to the given files in a sequential fashion. Similar to match.pd a configure option --with-insnemit-partitions=num is introduced that makes the number of partitions configurable. gcc/ChangeLog: PR bootstrap/84402 PR target/111600 * Makefile.in: Handle split insn-emit.cc. * configure: Regenerate. * configure.ac: Add --with-insnemit-partitions. * genemit.cc (output_peephole2_scratches): Print to file instead of stdout. (print_code): Ditto. (gen_rtx_scratch): Ditto. (gen_exp): Ditto. (gen_emit_seq): Ditto. (emit_c_code): Ditto. (gen_insn): Ditto. (gen_expand): Ditto. (gen_split): Ditto. (output_add_clobbers): Ditto. (output_added_clobbers_hard_reg_p): Ditto. (print_overload_arguments): Ditto. (print_overload_test): Ditto. (handle_overloaded_code_for): Ditto. (handle_overloaded_gen): Ditto. (print_header): New function. (handle_arg): New function. (main): Split output into 10 files. * gensupport.cc (count_patterns): New function. * gensupport.h (count_patterns): Define. * read-md.cc (md_reader::print_md_ptr_loc): Add file argument. * read-md.h (class md_reader): Change definition.
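As a rough illustration of what writing patterns "in a sequential fashion" could look like, here is a small Python sketch. It is not the genemit.cc implementation: the helper partition_sequentially and the pattern names are hypothetical, and the chunk size rule (ceiling division of the pattern count by the number of partitions) is an assumption, since the commit message does not spell it out.

```python
def partition_sequentially(patterns, num_partitions):
    """Split patterns into num_partitions consecutive chunks, keeping order.

    Assumption: every file gets ceil(len(patterns) / num_partitions)
    patterns until the input runs out, so trailing files may end up
    short or empty.
    """
    per_file = -(-len(patterns) // num_partitions)  # ceiling division
    return [patterns[i * per_file:(i + 1) * per_file]
            for i in range(num_partitions)]


# Hypothetical pattern names, standing in for the real machine description.
patterns = [f"pattern_{i}" for i in range(25)]
chunks = partition_sequentially(patterns, 10)
for n, chunk in enumerate(chunks, start=1):
    print(f"insn-emit-{n}.cc: {len(chunk)} patterns")
```

Unlike the size-based scheme used for match.pd, this keeps the original pattern order and does not look at how large each pattern's generated code is, so the resulting files can differ in size.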