This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PR91598] Improve autoprefetcher heuristic in haifa-sched.c
- From: Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>
- To: Maxim Kuvyrkov <maxim dot kuvyrkov at linaro dot org>, Richard Guenther <richard dot guenther at gmail dot com>
- Cc: "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>, Alexander Monakov <amonakov at ispras dot ru>, nd <nd at arm dot com>
- Date: Thu, 29 Aug 2019 17:34:03 +0000
- Subject: Re: [PR91598] Improve autoprefetcher heuristic in haifa-sched.c
- References: <D46C8D08-685F-41A7-8695-23BB65B74A87@linaro.org> <09F25146-8361-4FB0-AE6B-E13BF8CF332F@gmail.com>,<F3D1DE53-D56C-4293-87C5-AA71EEE67680@linaro.org>,<VI1PR0801MB2127C0534510021E6A4BBE0583A20@VI1PR0801MB2127.eurprd08.prod.outlook.com>
Hi Maxim,
> It appears that cores with autoprefetcher hardware prefer loads and stores bundled together, not interspersed with
> other instructions to occupy the rest of CPU units.
I don't believe it is as simple as that - modern cores have multiple prefetchers but
won't prefer bundling loads and stores in large blocks. That would result in terrible
performance due to dispatch and issue stalls. Also the increased register pressure
could cause extra spilling. If we group loads and stores, we'd definitely need to
limit them to, say, 4 or so at most, and then interleave ALU operations.
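
To make the idea concrete, here is a rough sketch of what "cap the group, then
interleave" could look like. This is not code from haifa-sched.c: the names
(insn_kind, MAX_MEM_GROUP, interleave_mem_alu) are made up for illustration,
the cap of 4 is just the ballpark figure mentioned above, and the sketch
ignores data dependencies and latencies, which a real scheduler obviously
cannot do.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical instruction classes; not GCC's insn representation. */
enum insn_kind { INSN_MEM, INSN_ALU };

/* Assumed cap on back-to-back loads/stores ("4 or so" above). */
#define MAX_MEM_GROUP 4

/* Fill OUT (N entries) with the instruction kinds found in IN, reordered so
   that at most MAX_MEM_GROUP memory ops are emitted in a row for as long as
   an ALU op is still available to break up the run.  Returns the longest run
   of consecutive memory ops in OUT.  Dependencies are ignored on purpose.  */
size_t
interleave_mem_alu (const enum insn_kind *in, size_t n, enum insn_kind *out)
{
  size_t mem_left = 0, alu_left = 0;
  size_t run = 0, longest = 0, pos = 0;

  assert (in != NULL && out != NULL);

  /* Count how many ops of each class are waiting to be issued.  */
  for (size_t i = 0; i < n; i++)
    {
      if (in[i] == INSN_MEM)
        mem_left++;
      else
        alu_left++;
    }

  while (mem_left + alu_left > 0)
    {
      /* Prefer a memory op until the group is full; once it is, only keep
         issuing memory ops if there is no ALU op left to interleave.  */
      int take_mem = mem_left > 0 && (run < MAX_MEM_GROUP || alu_left == 0);

      if (take_mem)
        {
          out[pos++] = INSN_MEM;
          mem_left--;
          run++;
          if (run > longest)
            longest = run;
        }
      else
        {
          out[pos++] = INSN_ALU;
          alu_left--;
          run = 0;
        }
    }

  return longest;
}
```

With 8 memory ops and 2 ALU ops as input, this yields two groups of 4 memory
ops, each followed by one ALU op. Once the ALU ops run out the cap can no
longer be honoured, which is one reason a fixed grouping rule is too simple
on its own.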
> Autoprefetching heuristic is enabled only for cores that support it, and isn't active by default.
It's enabled on most cores, including the default (generic). So we do have to be
careful that this doesn't regress any other benchmarks or do worse on modern
cores.
Cheers,
Wilco