This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PING^3] Re: [PATCH 1/4] Add gcc-auto-profile script


Andi Kleen <andi@firstfloor.org> writes:

Ping^3 for the patch series!

> Andi Kleen <andi@firstfloor.org> writes:
>
> Ping^2 for the patch series!
>
>> Andi Kleen <andi@firstfloor.org> writes:
>>
>> Ping for the patch series!
>>
>>> From: Andi Kleen <ak@linux.intel.com>
>>>
>>> Using autofdo is currently something difficult. It requires using the
>>> model specific branches taken event, which differs on different CPUs.
>>> The example shown in the manual requires a special patched version of
>>> perf that is non standard, and also will likely not work everywhere.
>>>
>>> This patch adds a new gcc-auto-profile script that figures out the
>>> correct event and runs perf. The script is installed with on Linux systems.
>>>
>>> Since maintaining the script would be somewhat tedious (needs changes
>>> every time a new CPU comes out) I auto generated it from the online
>>> Intel event database. The script to do that is in contrib and can be
>>> rerun.
>>>
>>> Right now there is no test if perf works in configure. This
>>> would vary depending on the build and target system, and since
>>> it currently doesn't work in virtualization and needs uptodate
>>> kernel it may often fail in common distribution build setups.
>>>
>>> So Linux just hardcodes installing the script, but it may fail at runtime.
>>>
>>> This is needed to actually make use of autofdo in a generic way
>>> in the build system and in the test suite.
>>>
>>> So far the script is not installed.
>>>
>>> gcc/:
>>> 2016-03-27  Andi Kleen  <ak@linux.intel.com>
>>>
>>> 	* doc/invoke.texi: Document gcc-auto-profile
>>> 	* gcc-auto-profile: Create.
>>>
>>> contrib/:
>>>
>>> 2016-03-27  Andi Kleen  <ak@linux.intel.com>
>>>
>>> 	* gen_autofdo_event.py: New file to regenerate
>>> 	gcc-auto-profile.
>>> ---
>>>  contrib/gen_autofdo_event.py | 155 +++++++++++++++++++++++++++++++++++++++++++
>>>  gcc/doc/invoke.texi          |  31 +++++++--
>>>  gcc/gcc-auto-profile         |  70 +++++++++++++++++++
>>>  3 files changed, 251 insertions(+), 5 deletions(-)
>>>  create mode 100755 contrib/gen_autofdo_event.py
>>>  create mode 100755 gcc/gcc-auto-profile
>>>
>>> diff --git a/contrib/gen_autofdo_event.py b/contrib/gen_autofdo_event.py
>>> new file mode 100755
>>> index 0000000..db4db33
>>> --- /dev/null
>>> +++ b/contrib/gen_autofdo_event.py
>>> @@ -0,0 +1,155 @@
>>> +#!/usr/bin/python
>>> +# generate Intel taken branches Linux perf event script for autofdo profiling
>>> +
>>> +# Copyright (C) 2016 Free Software Foundation, Inc.
>>> +#
>>> +# GCC is free software; you can redistribute it and/or modify it under
>>> +# the terms of the GNU General Public License as published by the Free
>>> +# Software Foundation; either version 3, or (at your option) any later
>>> +# version.
>>> +#
>>> +# GCC is distributed in the hope that it will be useful, but WITHOUT ANY
>>> +# WARRANTY; without even the implied warranty of MERCHANTABILITY or
>>> +# FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
>>> +# for more details.
>>> +#
>>> +# You should have received a copy of the GNU General Public License
>>> +# along with GCC; see the file COPYING3.  If not see
>>> +# <http://www.gnu.org/licenses/>.  */
>>> +
>>> +# run it with perf record -b -e EVENT program ...
>>> +# The Linux Kernel needs to support the PMU of the current CPU, and
>>> +# it will likely not work in VMs.
>>> +# add --all to print for all cpus, otherwise for current cpu
>>> +# add --script to generate shell script to run correct event
>>> +#
>>> +# requires internet (https) access. this may require setting up a proxy
>>> +# with export https_proxy=...
>>> +#
>>> +import urllib2
>>> +import sys
>>> +import json
>>> +import argparse
>>> +import collections
>>> +
>>> +baseurl = "https://download.01.org/perfmon";
>>> +
>>> +target_events = (u'BR_INST_RETIRED.NEAR_TAKEN',
>>> +                 u'BR_INST_EXEC.TAKEN',
>>> +                 u'BR_INST_RETIRED.TAKEN_JCC',
>>> +                 u'BR_INST_TYPE_RETIRED.COND_TAKEN')
>>> +
>>> +ap = argparse.ArgumentParser()
>>> +ap.add_argument('--all', '-a', help='Print for all CPUs', action='store_true')
>>> +ap.add_argument('--script', help='Generate shell script', action='store_true')
>>> +args = ap.parse_args()
>>> +
>>> +eventmap = collections.defaultdict(list)
>>> +
>>> +def get_cpu_str():
>>> +    with open('/proc/cpuinfo', 'r') as c:
>>> +        vendor, fam, model = None, None, None
>>> +        for j in c:
>>> +            n = j.split()
>>> +            if n[0] == 'vendor_id':
>>> +                vendor = n[2]
>>> +            elif n[0] == 'model' and n[1] == ':':
>>> +                model = int(n[2])
>>> +            elif n[0] == 'cpu' and n[1] == 'family':
>>> +                fam = int(n[3])
>>> +            if vendor and fam and model:
>>> +                return "%s-%d-%X" % (vendor, fam, model), model
>>> +    return None, None
>>> +
>>> +def find_event(eventurl, model):
>>> +    print >>sys.stderr, "Downloading", eventurl
>>> +    u = urllib2.urlopen(eventurl)
>>> +    events = json.loads(u.read())
>>> +    u.close()
>>> +
>>> +    found = 0
>>> +    for j in events:
>>> +        if j[u'EventName'] in target_events:
>>> +            event = "cpu/event=%s,umask=%s/" % (j[u'EventCode'], j[u'UMask'])
>>> +            if u'PEBS' in j and j[u'PEBS'] > 0:
>>> +                event += "p"
>>> +            if args.script:
>>> +                eventmap[event].append(model)
>>> +            else:
>>> +                print j[u'EventName'], "event for model", model, "is", event
>>> +            found += 1
>>> +    return found
>>> +
>>> +if not args.all:
>>> +    cpu, model = get_cpu_str()
>>> +    if not cpu:
>>> +        sys.exit("Unknown CPU type")
>>> +
>>> +url = baseurl + "/mapfile.csv"
>>> +print >>sys.stderr, "Downloading", url
>>> +u = urllib2.urlopen(url)
>>> +found = 0
>>> +cpufound = 0
>>> +for j in u:
>>> +    n = j.rstrip().split(',')
>>> +    if len(n) >= 4 and (args.all or n[0] == cpu) and n[3] == "core":
>>> +        if args.all:
>>> +            vendor, fam, model = n[0].split("-")
>>> +            model = int(model, 16)
>>> +        cpufound += 1
>>> +        found += find_event(baseurl + n[2], model)
>>> +u.close()
>>> +
>>> +if args.script:
>>> +    print '''#!/bin/sh
>>> +# profile workload for gcc profile feedback (autofdo) using Linux perf
>>> +# auto generated. to regenerate for new CPUs run
>>> +# contrib/gen_autofdo_event.py --shell --all in gcc source
>>> +
>>> +# usages:
>>> +# gcc-auto-profile program             (profile program and children)
>>> +# gcc-auto-profile -a sleep X          (profile all for X secs, may need root)
>>> +# gcc-auto-profile -p PID sleep X      (profile PID)
>>> +# gcc-auto-profile --kernel -a sleep X (profile kernel)
>>> +# gcc-auto-profile --all -a sleep X    (profile kernel and user space)
>>> +
>>> +# identify branches taken event for CPU
>>> +#
>>> +
>>> +FLAGS=u
>>> +
>>> +if [ "$1" = "--kernel" ] ; then
>>> +  FLAGS=k
>>> +  shift
>>> +fi
>>> +if [ "$1" == "--all" ] ; then
>>> +  FLAGS=uk
>>> +  shift
>>> +fi
>>> +
>>> +if ! grep -q Intel /proc/cpuinfo ] ; then
>>> +  echo >&2 "Only Intel CPUs supported"
>>> +  exit 1
>>> +fi
>>> +
>>> +if grep -q hypervisor /proc/cpuinfo ; then
>>> +  echo >&2 "Warning: branch profiling may not be functional in VMs"
>>> +fi
>>> +
>>> +case `egrep -q "^cpu family\s*: 6" /proc/cpuinfo &&
>>> +  egrep "^model\s*:" /proc/cpuinfo | head -1` in'''
>>> +    for event, mod in eventmap.iteritems():
>>> +        for m in mod[:-1]:
>>> +            print "model*:\ %s|\\" % m
>>> +        print 'model*:\ %s) E="%s$FLAGS" ;;' % (mod[-1], event)
>>> +    print '''*)
>>> +echo >&2 "Unknown CPU. Run contrib/gen_autofdo_event.py --all --script to update script."
>>> +	exit 1 ;;'''
>>> +    print "esac"
>>> +    print 'exec perf record -e $E -b "$@"'
>>> +
>>> +if cpufound == 0 and not args.all:
>>> +    sys.exit('CPU %s not found' % cpu)
>>> +
>>> +if found == 0:
>>> +    sys.exit('Branch event not found')
>>> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
>>> index 9e54bb7..427d89a 100644
>>> --- a/gcc/doc/invoke.texi
>>> +++ b/gcc/doc/invoke.texi
>>> @@ -8249,13 +8249,34 @@ which are generally profitable only with profile feedback available:
>>>  If omitted, it defaults to @file{fbdata.afdo} in the current directory.
>>>  
>>>  Producing an AutoFDO profile data file requires running your program
>>> -with the @command{perf} utility on a supported GNU/Linux target system.
>>> +with the @command{gcc-auto-profile} utility on a supported GNU/Linux target system. @command{gcc-auto-profile} calls the @command{perf} utility.
>>> +It also requires Last-Branch-Record support, which typically requires
>>> +a new enough kernel not running virtualized.
>>> +@command{gcc-auto-profile} accepts the same arguments as @command{perf record}.
>>>  For more information, see @uref{https://perf.wiki.kernel.org/}.
>>>  
>>> -E.g.
>>>  @smallexample
>>> -perf record -e br_inst_retired:near_taken -b -o perf.data \
>>> -    -- your_program
>>> +gcc-auto-profile your_program
>>> +@end smallexample
>>> +
>>> +On larger programs the resulting perf.data file may be very large.
>>> +In this case it can be better to reduce the sampling rate.
>>> +Collect samples every million taken branches:
>>> +
>>> +@smallexample
>>> +gcc-auto-profile -c 1000000 program
>>> +@end smallexample
>>> +
>>> +Or only profile representative run intervals of the program:
>>> +
>>> +@smallexample
>>> +gcc-auto-profile -p PID-OF-PROGRAM sleep 5
>>> +@end smallexample
>>> +
>>> +Profile complete system for 10 seconds (may require root)
>>> +
>>> +@smallexample
>>> +gcc-auto-profile -a sleep 10
>>>  @end smallexample
>>>  
>>>  Then use the @command{create_gcov} tool to convert the raw profile data
>>> @@ -8266,7 +8287,7 @@ See @uref{https://github.com/google/autofdo}.
>>>  E.g.
>>>  @smallexample
>>>  create_gcov --binary=your_program.unstripped --profile=perf.data \
>>> -    --gcov=profile.afdo
>>> +    --gcov=profile.afdo -gcov_version 1
>>>  @end smallexample
>>>  @end table
>>>  
>>> diff --git a/gcc/gcc-auto-profile b/gcc/gcc-auto-profile
>>> new file mode 100755
>>> index 0000000..c6712b2
>>> --- /dev/null
>>> +++ b/gcc/gcc-auto-profile
>>> @@ -0,0 +1,70 @@
>>> +#!/bin/sh
>>> +# profile workload for gcc profile feedback (autofdo) using Linux perf
>>> +# auto generated. to regenerate for new CPUs run
>>> +# contrib/gen_autofdo_event.py --shell --all in gcc source
>>> +
>>> +# usages:
>>> +# gcc-auto-profile program             (profile program and children)
>>> +# gcc-auto-profile -a sleep X          (profile all for X secs, may need root)
>>> +# gcc-auto-profile -p PID sleep X      (profile PID)
>>> +# gcc-auto-profile --kernel -a sleep X (profile kernel)
>>> +# gcc-auto-profile --all -a sleep X    (profile kernel and user space)
>>> +
>>> +# identify branches taken event for CPU
>>> +#
>>> +
>>> +FLAGS=u
>>> +
>>> +if [ "$1" = "--kernel" ] ; then
>>> +  FLAGS=k
>>> +  shift
>>> +fi
>>> +if [ "$1" == "--all" ] ; then
>>> +  FLAGS=uk
>>> +  shift
>>> +fi
>>> +
>>> +if ! grep -q Intel /proc/cpuinfo ] ; then
>>> +  echo >&2 "Only Intel CPUs supported"
>>> +  exit 1
>>> +fi
>>> +
>>> +if grep -q hypervisor /proc/cpuinfo ; then
>>> +  echo >&2 "Warning: branch profiling may not be functional in VMs"
>>> +fi
>>> +
>>> +case `egrep -q "^cpu family\s*: 6" /proc/cpuinfo &&
>>> +  egrep "^model\s*:" /proc/cpuinfo | head -1` in
>>> +model*:\ 55|\
>>> +model*:\ 77|\
>>> +model*:\ 76) E="cpu/event=0xC4,umask=0xFE/p$FLAGS" ;;
>>> +model*:\ 42|\
>>> +model*:\ 45|\
>>> +model*:\ 58|\
>>> +model*:\ 62|\
>>> +model*:\ 60|\
>>> +model*:\ 69|\
>>> +model*:\ 70|\
>>> +model*:\ 63|\
>>> +model*:\ 61|\
>>> +model*:\ 71|\
>>> +model*:\ 86|\
>>> +model*:\ 78|\
>>> +model*:\ 94) E="cpu/event=0xC4,umask=0x20/p$FLAGS" ;;
>>> +model*:\ 46|\
>>> +model*:\ 30|\
>>> +model*:\ 31|\
>>> +model*:\ 26|\
>>> +model*:\ 47|\
>>> +model*:\ 37|\
>>> +model*:\ 44) E="cpu/event=0x88,umask=0x40/p$FLAGS" ;;
>>> +model*:\ 28|\
>>> +model*:\ 38|\
>>> +model*:\ 39|\
>>> +model*:\ 54|\
>>> +model*:\ 53) E="cpu/event=0x88,umask=0x41/p$FLAGS" ;;
>>> +*)
>>> +echo >&2 "Unknown CPU. Run contrib/gen_autofdo_event.py --all --script to update script."
>>> +	exit 1 ;;
>>> +esac
>>> +exec perf record -e $E -b "$@"

-- 
ak@linux.intel.com -- Speaking for myself only


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]