This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PING^3] Re: [PATCH 1/4] Add gcc-auto-profile script
- From: Andi Kleen <andi at firstfloor dot org>
- To: gcc-patches at gcc dot gnu dot org
- Date: Wed, 27 Apr 2016 07:29:33 -0700
- Subject: Re: [PING^3] Re: [PATCH 1/4] Add gcc-auto-profile script
- Authentication-results: sourceware.org; auth=none
- References: <1459140266-6902-1-git-send-email-andi at firstfloor dot org> <878u0dwp9v dot fsf at tassilo dot jf dot intel dot com> <8737qc713x dot fsf_-_ at tassilo dot jf dot intel dot com>
Andi Kleen <andi@firstfloor.org> writes:
Ping^3 for the patch series!
> Andi Kleen <andi@firstfloor.org> writes:
>
> Ping^2 for the patch series!
>
>> Andi Kleen <andi@firstfloor.org> writes:
>>
>> Ping for the patch series!
>>
>>> From: Andi Kleen <ak@linux.intel.com>
>>>
>>> Using autofdo is currently something difficult. It requires using the
>>> model specific branches taken event, which differs on different CPUs.
>>> The example shown in the manual requires a special patched version of
>>> perf that is non standard, and also will likely not work everywhere.
>>>
>>> This patch adds a new gcc-auto-profile script that figures out the
>>> correct event and runs perf. The script is installed with on Linux systems.
>>>
>>> Since maintaining the script would be somewhat tedious (needs changes
>>> every time a new CPU comes out) I auto generated it from the online
>>> Intel event database. The script to do that is in contrib and can be
>>> rerun.
>>>
>>> Right now there is no test if perf works in configure. This
>>> would vary depending on the build and target system, and since
>>> it currently doesn't work in virtualization and needs uptodate
>>> kernel it may often fail in common distribution build setups.
>>>
>>> So Linux just hardcodes installing the script, but it may fail at runtime.
>>>
>>> This is needed to actually make use of autofdo in a generic way
>>> in the build system and in the test suite.
>>>
>>> So far the script is not installed.
>>>
>>> gcc/:
>>> 2016-03-27 Andi Kleen <ak@linux.intel.com>
>>>
>>> * doc/invoke.texi: Document gcc-auto-profile
>>> * gcc-auto-profile: Create.
>>>
>>> contrib/:
>>>
>>> 2016-03-27 Andi Kleen <ak@linux.intel.com>
>>>
>>> * gen_autofdo_event.py: New file to regenerate
>>> gcc-auto-profile.
>>> ---
>>> contrib/gen_autofdo_event.py | 155 +++++++++++++++++++++++++++++++++++++++++++
>>> gcc/doc/invoke.texi | 31 +++++++--
>>> gcc/gcc-auto-profile | 70 +++++++++++++++++++
>>> 3 files changed, 251 insertions(+), 5 deletions(-)
>>> create mode 100755 contrib/gen_autofdo_event.py
>>> create mode 100755 gcc/gcc-auto-profile
>>>
>>> diff --git a/contrib/gen_autofdo_event.py b/contrib/gen_autofdo_event.py
>>> new file mode 100755
>>> index 0000000..db4db33
>>> --- /dev/null
>>> +++ b/contrib/gen_autofdo_event.py
>>> @@ -0,0 +1,155 @@
>>> +#!/usr/bin/python
>>> +# generate Intel taken branches Linux perf event script for autofdo profiling
>>> +
>>> +# Copyright (C) 2016 Free Software Foundation, Inc.
>>> +#
>>> +# GCC is free software; you can redistribute it and/or modify it under
>>> +# the terms of the GNU General Public License as published by the Free
>>> +# Software Foundation; either version 3, or (at your option) any later
>>> +# version.
>>> +#
>>> +# GCC is distributed in the hope that it will be useful, but WITHOUT ANY
>>> +# WARRANTY; without even the implied warranty of MERCHANTABILITY or
>>> +# FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
>>> +# for more details.
>>> +#
>>> +# You should have received a copy of the GNU General Public License
>>> +# along with GCC; see the file COPYING3. If not see
>>> +# <http://www.gnu.org/licenses/>. */
>>> +
>>> +# run it with perf record -b -e EVENT program ...
>>> +# The Linux Kernel needs to support the PMU of the current CPU, and
>>> +# it will likely not work in VMs.
>>> +# add --all to print for all cpus, otherwise for current cpu
>>> +# add --script to generate shell script to run correct event
>>> +#
>>> +# requires internet (https) access. this may require setting up a proxy
>>> +# with export https_proxy=...
>>> +#
>>> +import urllib2
>>> +import sys
>>> +import json
>>> +import argparse
>>> +import collections
>>> +
>>> +baseurl = "https://download.01.org/perfmon"
>>> +
>>> +target_events = (u'BR_INST_RETIRED.NEAR_TAKEN',
>>> + u'BR_INST_EXEC.TAKEN',
>>> + u'BR_INST_RETIRED.TAKEN_JCC',
>>> + u'BR_INST_TYPE_RETIRED.COND_TAKEN')
>>> +
>>> +ap = argparse.ArgumentParser()
>>> +ap.add_argument('--all', '-a', help='Print for all CPUs', action='store_true')
>>> +ap.add_argument('--script', help='Generate shell script', action='store_true')
>>> +args = ap.parse_args()
>>> +
>>> +eventmap = collections.defaultdict(list)
>>> +
>>> +def get_cpu_str():
>>> + with open('/proc/cpuinfo', 'r') as c:
>>> + vendor, fam, model = None, None, None
>>> + for j in c:
>>> + n = j.split()
>>> + if n[0] == 'vendor_id':
>>> + vendor = n[2]
>>> + elif n[0] == 'model' and n[1] == ':':
>>> + model = int(n[2])
>>> + elif n[0] == 'cpu' and n[1] == 'family':
>>> + fam = int(n[3])
>>> + if vendor and fam and model:
>>> + return "%s-%d-%X" % (vendor, fam, model), model
>>> + return None, None
>>> +
>>> +def find_event(eventurl, model):
>>> + print >>sys.stderr, "Downloading", eventurl
>>> + u = urllib2.urlopen(eventurl)
>>> + events = json.loads(u.read())
>>> + u.close()
>>> +
>>> + found = 0
>>> + for j in events:
>>> + if j[u'EventName'] in target_events:
>>> + event = "cpu/event=%s,umask=%s/" % (j[u'EventCode'], j[u'UMask'])
>>> + if u'PEBS' in j and j[u'PEBS'] > 0:
>>> + event += "p"
>>> + if args.script:
>>> + eventmap[event].append(model)
>>> + else:
>>> + print j[u'EventName'], "event for model", model, "is", event
>>> + found += 1
>>> + return found
>>> +
>>> +if not args.all:
>>> + cpu, model = get_cpu_str()
>>> + if not cpu:
>>> + sys.exit("Unknown CPU type")
>>> +
>>> +url = baseurl + "/mapfile.csv"
>>> +print >>sys.stderr, "Downloading", url
>>> +u = urllib2.urlopen(url)
>>> +found = 0
>>> +cpufound = 0
>>> +for j in u:
>>> + n = j.rstrip().split(',')
>>> + if len(n) >= 4 and (args.all or n[0] == cpu) and n[3] == "core":
>>> + if args.all:
>>> + vendor, fam, model = n[0].split("-")
>>> + model = int(model, 16)
>>> + cpufound += 1
>>> + found += find_event(baseurl + n[2], model)
>>> +u.close()
>>> +
>>> +if args.script:
>>> + print '''#!/bin/sh
>>> +# profile workload for gcc profile feedback (autofdo) using Linux perf
>>> +# auto generated. to regenerate for new CPUs run
>>> +# contrib/gen_autofdo_event.py --shell --all in gcc source
>>> +
>>> +# usages:
>>> +# gcc-auto-profile program (profile program and children)
>>> +# gcc-auto-profile -a sleep X (profile all for X secs, may need root)
>>> +# gcc-auto-profile -p PID sleep X (profile PID)
>>> +# gcc-auto-profile --kernel -a sleep X (profile kernel)
>>> +# gcc-auto-profile --all -a sleep X (profile kernel and user space)
>>> +
>>> +# identify branches taken event for CPU
>>> +#
>>> +
>>> +FLAGS=u
>>> +
>>> +if [ "$1" = "--kernel" ] ; then
>>> + FLAGS=k
>>> + shift
>>> +fi
>>> +if [ "$1" == "--all" ] ; then
>>> + FLAGS=uk
>>> + shift
>>> +fi
>>> +
>>> +if ! grep -q Intel /proc/cpuinfo ] ; then
>>> + echo >&2 "Only Intel CPUs supported"
>>> + exit 1
>>> +fi
>>> +
>>> +if grep -q hypervisor /proc/cpuinfo ; then
>>> + echo >&2 "Warning: branch profiling may not be functional in VMs"
>>> +fi
>>> +
>>> +case `egrep -q "^cpu family\s*: 6" /proc/cpuinfo &&
>>> + egrep "^model\s*:" /proc/cpuinfo | head -1` in'''
>>> + for event, mod in eventmap.iteritems():
>>> + for m in mod[:-1]:
>>> + print "model*:\ %s|\\" % m
>>> + print 'model*:\ %s) E="%s$FLAGS" ;;' % (mod[-1], event)
>>> + print '''*)
>>> +echo >&2 "Unknown CPU. Run contrib/gen_autofdo_event.py --all --script to update script."
>>> + exit 1 ;;'''
>>> + print "esac"
>>> + print 'exec perf record -e $E -b "$@"'
>>> +
>>> +if cpufound == 0 and not args.all:
>>> + sys.exit('CPU %s not found' % cpu)
>>> +
>>> +if found == 0:
>>> + sys.exit('Branch event not found')
>>> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
>>> index 9e54bb7..427d89a 100644
>>> --- a/gcc/doc/invoke.texi
>>> +++ b/gcc/doc/invoke.texi
>>> @@ -8249,13 +8249,34 @@ which are generally profitable only with profile feedback available:
>>> If omitted, it defaults to @file{fbdata.afdo} in the current directory.
>>>
>>> Producing an AutoFDO profile data file requires running your program
>>> -with the @command{perf} utility on a supported GNU/Linux target system.
>>> +with the @command{gcc-auto-profile} utility on a supported GNU/Linux target system. @command{gcc-auto-profile} calls the @command{perf} utility.
>>> +It also requires Last-Branch-Record support, which typically requires
>>> +a new enough kernel not running virtualized.
>>> +@command{gcc-auto-profile} accepts the same arguments as @command{perf record}.
>>> For more information, see @uref{https://perf.wiki.kernel.org/}.
>>>
>>> -E.g.
>>> @smallexample
>>> -perf record -e br_inst_retired:near_taken -b -o perf.data \
>>> - -- your_program
>>> +gcc-auto-profile your_program
>>> +@end smallexample
>>> +
>>> +On larger programs the resulting perf.data file may be very large.
>>> +In this case it can be better to reduce the sampling rate.
>>> +Collect samples every million taken branches:
>>> +
>>> +@smallexample
>>> +gcc-auto-profile -c 1000000 program
>>> +@end smallexample
>>> +
>>> +Or only profile representative run intervals of the program:
>>> +
>>> +@smallexample
>>> +gcc-auto-profile -p PID-OF-PROGRAM sleep 5
>>> +@end smallexample
>>> +
>>> +Profile complete system for 10 seconds (may require root)
>>> +
>>> +@smallexample
>>> +gcc-auto-profile -a sleep 10
>>> @end smallexample
>>>
>>> Then use the @command{create_gcov} tool to convert the raw profile data
>>> @@ -8266,7 +8287,7 @@ See @uref{https://github.com/google/autofdo}.
>>> E.g.
>>> @smallexample
>>> create_gcov --binary=your_program.unstripped --profile=perf.data \
>>> - --gcov=profile.afdo
>>> + --gcov=profile.afdo -gcov_version 1
>>> @end smallexample
>>> @end table
>>>
>>> diff --git a/gcc/gcc-auto-profile b/gcc/gcc-auto-profile
>>> new file mode 100755
>>> index 0000000..c6712b2
>>> --- /dev/null
>>> +++ b/gcc/gcc-auto-profile
>>> @@ -0,0 +1,70 @@
>>> +#!/bin/sh
>>> +# profile workload for gcc profile feedback (autofdo) using Linux perf
>>> +# auto generated. to regenerate for new CPUs run
>>> +# contrib/gen_autofdo_event.py --shell --all in gcc source
>>> +
>>> +# usages:
>>> +# gcc-auto-profile program (profile program and children)
>>> +# gcc-auto-profile -a sleep X (profile all for X secs, may need root)
>>> +# gcc-auto-profile -p PID sleep X (profile PID)
>>> +# gcc-auto-profile --kernel -a sleep X (profile kernel)
>>> +# gcc-auto-profile --all -a sleep X (profile kernel and user space)
>>> +
>>> +# identify branches taken event for CPU
>>> +#
>>> +
>>> +FLAGS=u
>>> +
>>> +if [ "$1" = "--kernel" ] ; then
>>> + FLAGS=k
>>> + shift
>>> +fi
>>> +if [ "$1" == "--all" ] ; then
>>> + FLAGS=uk
>>> + shift
>>> +fi
>>> +
>>> +if ! grep -q Intel /proc/cpuinfo ] ; then
>>> + echo >&2 "Only Intel CPUs supported"
>>> + exit 1
>>> +fi
>>> +
>>> +if grep -q hypervisor /proc/cpuinfo ; then
>>> + echo >&2 "Warning: branch profiling may not be functional in VMs"
>>> +fi
>>> +
>>> +case `egrep -q "^cpu family\s*: 6" /proc/cpuinfo &&
>>> + egrep "^model\s*:" /proc/cpuinfo | head -1` in
>>> +model*:\ 55|\
>>> +model*:\ 77|\
>>> +model*:\ 76) E="cpu/event=0xC4,umask=0xFE/p$FLAGS" ;;
>>> +model*:\ 42|\
>>> +model*:\ 45|\
>>> +model*:\ 58|\
>>> +model*:\ 62|\
>>> +model*:\ 60|\
>>> +model*:\ 69|\
>>> +model*:\ 70|\
>>> +model*:\ 63|\
>>> +model*:\ 61|\
>>> +model*:\ 71|\
>>> +model*:\ 86|\
>>> +model*:\ 78|\
>>> +model*:\ 94) E="cpu/event=0xC4,umask=0x20/p$FLAGS" ;;
>>> +model*:\ 46|\
>>> +model*:\ 30|\
>>> +model*:\ 31|\
>>> +model*:\ 26|\
>>> +model*:\ 47|\
>>> +model*:\ 37|\
>>> +model*:\ 44) E="cpu/event=0x88,umask=0x40/p$FLAGS" ;;
>>> +model*:\ 28|\
>>> +model*:\ 38|\
>>> +model*:\ 39|\
>>> +model*:\ 54|\
>>> +model*:\ 53) E="cpu/event=0x88,umask=0x41/p$FLAGS" ;;
>>> +*)
>>> +echo >&2 "Unknown CPU. Run contrib/gen_autofdo_event.py --all --script to update script."
>>> + exit 1 ;;
>>> +esac
>>> +exec perf record -e $E -b "$@"
--
ak@linux.intel.com -- Speaking for myself only