FASTEST: A practical low-complexity algorithm for compile-time assignment of parallel programs to multiprocessors

被引:20
作者
Kwok, YK
Ahmad, I
机构
[1] Univ Hong Kong, Dept Elect & Elect Engn, Hong Kong, Peoples R China
[2] Hong Kong Univ Sci & Technol, Dept Comp Sci, Hong Kong, Peoples R China
关键词
automatic parallelization; compile-time scheduling; task graphs; multiprocessors; parallel processing; parallel programming tool; parallel algorithm; random neighborhood search;
D O I
10.1109/71.752781
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
In the area of parallelizing compilers, considerable research has been carried out on data dependency analysis, parallelism extraction, as well as program and data partitioning. However, designing a practical, low complexity scheduling algorithm without sacrificing performance remains a challenging problem. A variety of heuristics have been proposed to generate efficient solutions but they take prohibitively long execution times for moderate size or large problems. In this paper, we propose an algorithm called FASTEST (Fast Assignment and Scheduling of Tasks using an Efficient Search Technique) that has O(e) time complexity, where e is the number of edges in the task graph. The algorithm first generates an initial solution in a short time and then refines it by using a simple but robust random neighborhood search. We have also parallelized the search to further lower the time complexity. We are using the algorithm in a prototype automatic parallelization and scheduling tool which compiles sequential code and generates parallel code optimized with judicious scheduling. The proposed algorithm is evaluated with several application programs and outperforms a number of previous algorithms by generating parallelized code with shorter execution times, while taking dramatically shorter scheduling times. The FASTEST algorithm generates optimal solutions for a majority of the test cases and close-to-optimal solutions for the rest.
引用
收藏
页码:147 / 159
页数:13
相关论文
共 26 条
[1]   Automatic parallelization and scheduling of programs on multiprocessors using CASCH [J].
Ahmad, I ;
Kwok, YK ;
Wu, MY ;
Shu, W .
PROCEEDINGS OF THE 1997 INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING, 1997, :288-291
[2]   Optimal and near-optimal allocation of precedence-constrained tasks to parallel processors: Defying the high complexity using effective search techniques [J].
Ahmad, I ;
Kwok, YK .
1998 INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING - PROCEEDINGS, 1998, :424-431
[3]  
[Anonymous], 1979, Computers and Intractablity: A Guide to the Theoryof NP-Completeness
[4]   On effective execution of nonuniform DOACROSS loops [J].
Chen, DK ;
Yew, PC .
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 1996, 7 (05) :463-476
[5]  
Cosnard M., 1995, Parallel Processing Letters, V5, P527, DOI 10.1142/S0129626495000473
[6]   TASK-SCHEDULING IN MULTIPROCESSING SYSTEMS [J].
ELREWINI, H ;
ALI, HH ;
LEWIS, T .
COMPUTER, 1995, 28 (12) :27-&
[7]   Compile-time estimation of communication costs for data parallel programs [J].
Fahringer, T .
JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 1996, 39 (01) :46-65
[8]   AUTOMATIC EXTRACTION OF FUNCTIONAL PARALLELISM FROM ORDINARY PROGRAMS [J].
GIRKAR, M ;
POLYCHRONOPOULOS, CD .
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 1992, 3 (02) :166-178
[9]  
GUPTA M, 1992, P 6 INT PAR PROC S M
[10]   SCHEDULING PRECEDENCE GRAPHS IN SYSTEMS WITH INTERPROCESSOR COMMUNICATION TIMES [J].
HWANG, JJ ;
CHOW, YC ;
ANGER, FD ;
LEE, CY .
SIAM JOURNAL ON COMPUTING, 1989, 18 (02) :244-257