Runtime mechanisms for efficient dynamic multithreading

Cited by: 8
Authors
Karamcheti, V
Plevyak, J
Chien, AA
Affiliations
[1] Concurrent Syst. Architecture Group, Department of Computer Science, Univ. of Illinois at U., Urbana, IL 61801-2987
Funding
National Aeronautics and Space Administration (NASA); National Science Foundation (NSF)
DOI
10.1006/jpdc.1996.0105
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
High performance on distributed memory machines for programming models with dynamic thread creation and multithreading requires efficient thread management and communication. Traditional multithreading runtimes, consisting of few general-purpose, bundled mechanisms that assume minimal compiler and hardware support, are suitable for computations involving coarse-grained threads but provide low efficiency in the presence of small granularity threads and irregular communication behavior. We describe two mechanisms of the Illinois Concert runtime system which address this shortcoming. The first, hybrid stack-heap execution, exploits close coupling with the compiler to dynamically form coarse-grained execution units; threads are lazily created as required by runtime situations. The second, pull messaging, exploits hardware support to implement a distributed message queue with receiver-initiated data transfer, delivering robust performance across a wide range of dynamic communication characteristics. We measure their performance impact based on a Cray T3D implementation of the Concert system. Individually, the mechanisms increase absolute execution efficiency by up to 50%. Together, they increase the feasible space of efficient computations, enabling compute granularities an order of magnitude smaller. Performance results for two large irregular applications demonstrate that expressing programs using dynamic multithreading need not compromise on performance. (C) Academic Press, Inc.
Pages: 21-40
Page count: 20