Scheduled Dataflow: Execution paradigm, architecture, and performance evaluation

Cited by: 50
Authors
Kavi, KM [1 ]
Giorgi, R
Arul, J
Affiliations
[1] Univ Alabama, Dept Elect & Comp Engn, Huntsville, AL 35899 USA
[2] Univ Siena, Dept Ing Informaz, I-53100 Siena, Italy
Keywords
multithreaded architectures; dataflow architectures; superscalar; decoupled architectures; thread level parallelism
DOI
10.1109/12.947003
Chinese Library Classification
TP3 [Computing Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
In this paper, the Scheduled Dataflow (SDF) architecture - a decoupled memory/execution, multithreaded architecture using nonblocking threads - is presented in detail and evaluated against a superscalar architecture. Recent work on new processor architectures focuses mainly on VLIW (e.g., IA-64), superscalar, and superspeculative designs. This trend allows for better performance, but at the expense of increased hardware complexity and, possibly, higher power consumption resulting from dynamic instruction scheduling. Our research deviates from this trend by exploring a simpler, yet powerful, execution paradigm based on dataflow and multithreading. A program is partitioned into nonblocking execution threads, and all memory accesses are decoupled from a thread's execution: data are preloaded into the thread's context (registers), and all results are poststored after the thread completes. While multithreading and decoupling are possible with control-flow architectures, SDF makes it easier to coordinate the memory accesses and execution of a thread and to eliminate unnecessary dependencies among instructions. We have compared the execution cycles required by programs on SDF with those required on SimpleScalar (a superscalar simulator), modeling only the essential aspects of each architecture so that the comparison is fair. The results show that the SDF architecture can outperform the superscalar one: SDF performance scales better with the number of functional units and allows good exploitation of Thread Level Parallelism (TLP) and of the available chip area.
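The preload / execute / poststore decoupling described in the abstract can be illustrated with a short sketch. This is a minimal, hypothetical model written for illustration only, not the authors' SDF hardware or simulator: the names (ThreadContext, preload, execute, poststore, sync_count) and the tiny producer/consumer example are assumptions, and real SDF threads operate on hardware frames and register contexts rather than Python objects.

from collections import deque

# Shared memory that threads preload from and poststore to (illustrative only).
memory = {"a": 3, "b": 4, "result": None}

class ThreadContext:
    def __init__(self, name, inputs, outputs, body, sync_count):
        self.name = name
        self.inputs = inputs          # memory locations to preload
        self.outputs = outputs        # memory locations to poststore
        self.body = body              # register-only computation (nonblocking)
        self.sync_count = sync_count  # operands still outstanding before the thread is ready
        self.registers = {}           # the thread's register context

def preload(ctx):
    # Memory/synchronization side: copy operands from memory into registers.
    for loc in ctx.inputs:
        ctx.registers[loc] = memory[loc]

def execute(ctx):
    # Execution side: runs to completion on registers only, never blocking on memory.
    ctx.registers.update(ctx.body(ctx.registers))

def poststore(ctx, ready, waiting):
    # Write results back to memory and enable consumers whose operands are now available.
    for loc in ctx.outputs:
        memory[loc] = ctx.registers[loc]
    still_waiting = []
    for w in waiting:
        w.sync_count -= len(set(ctx.outputs) & set(w.inputs))
        if w.sync_count == 0:
            ready.append(w)
        else:
            still_waiting.append(w)
    waiting[:] = still_waiting

# A producer thread computes a*b; a consumer thread doubles the stored result.
producer = ThreadContext("mul", ["a", "b"], ["result"],
                         lambda r: {"result": r["a"] * r["b"]}, sync_count=0)
consumer = ThreadContext("dbl", ["result"], ["result"],
                         lambda r: {"result": 2 * r["result"]}, sync_count=1)

ready, waiting = deque([producer]), [consumer]
while ready:
    ctx = ready.popleft()
    preload(ctx)
    execute(ctx)
    poststore(ctx, ready, waiting)

print(memory["result"])  # 24

The only point of the sketch is the phase ordering: a thread reaches the execution phase with all of its operands already in registers, so it never stalls on memory, which is the property the paper's decoupled memory/execution design exploits.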
Pages: 834 - 846
Page count: 13