Exploiting fine-grain thread level parallelism on the MIT Multi-ALU Processor

Cited by: 24

Authors:

Keckler, SW [1]

Dally, WJ [1]

Maskit, D [1]

Carter, NP [1]

Chang, A [1]

Lee, WS [1]

Affiliations:

[1] Stanford Univ, Comp Syst Lab, Stanford, CA 94305 USA

Source:

25TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE, PROCEEDINGS | 1998

Keywords:

DOI:

10.1109/ISCA.1998.694790

Chinese Library Classification number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

Much of the improvement in computer performance over the last twenty years has come from faster transistors and architectural advances that increase parallelism. Historically, parallelism has been exploited either at the instruction level with a grain-size of a single instruction or by partitioning applications into coarse threads with grain-sizes of thousands of instructions. Fine-grain threads fill the parallelism gap between these extremes by enabling tasks with run lengths as small as 20 cycles. As this fine-grain parallelism is orthogonal to ILP and coarse threads, it complements both methods and provides an opportunity for greater speedup. This paper describes the efficient communication and synchronization mechanisms implemented in the Multi-ALU Processor (MAP) chip, including a thread creation instruction, register communication, and a hardware barrier. These register-based mechanisms provide 10 times faster communication and 60 times faster synchronization than mechanisms that operate via a shared on-chip cache. With a three-processor implementation of the MAP, fine-grain speedups of 1.2-2.1 are demonstrated on a suite of applications.

Pages: 306-317

Number of pages: 12