A multithreaded PowerPC processor for commercial servers

Cited by: 39

Authors:

Borkenhagen, JM

Eickemeyer, RJ

Kalla, RN

Kunkel, SR

Affiliations:

[1] IBM Corp, Server Grp, Rochester, MN 55901 USA

[2] IBM Corp, Server Grp, Austin, TX 78758 USA

Source:

IBM JOURNAL OF RESEARCH AND DEVELOPMENT | 2000 / Vol. 44 / No. 06

Keywords:

Buffer storage - Data storage equipment - Microprocessor chips - Optimization - Personal computers;

DOI:

10.1147/rd.446.0885

Chinese Library Classification number:

TP3 [Computing technology, computer technology];

Subject classification code:

0812 ;

Abstract:

This paper describes the microarchitecture of the RS64 IV, a multithreaded PowerPC(R) processor, and its memory system. Because this processor is used only in IBM iSeries(TM) and pSeries(TM) commercial servers, it is optimized solely for commercial server workloads. Increasing miss rates because of trends in commercial server applications and increasing latency of cache misses because of rapidly increasing clock frequency are having a compounding effect on the portion of execution time that is wasted on cache misses. As a result, several optimizations are included in the processor design to address this problem. The most significant of these is the use of coarse-grained multithreading to enable the processor to perform useful instructions during cache misses. This provides a significant throughput increase while adding less than 5% to the chip area and having very little impact on cycle time. When compared with other performance-improvement techniques, multithreading yields an excellent ratio of performance gain to implementation cost. Second, the miss rate of the L2 cache is reduced by making it four-way associative. Third, the latency of cache-to-cache movement of data is minimized. Fourth, the size of the L1 caches is relatively large. In addition to addressing cache misses, pipeline "holes" caused by branches are minimized with large instruction buffers, large L1 I-cache fetch bandwidth, and optimized resolution of the branch direction. In part, the branches are resolved quickly because of the short but efficient pipeline. To minimize pipeline holes due to data dependencies, the L1 D-cache access is optimized to yield a one-cycle load-to-use penalty.
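The idea behind coarse-grained multithreading, switching to another thread when the running one stalls on a cache miss, can be illustrated with a toy model. The sketch below is not taken from the paper: the function name `utilization`, the deterministic "run for a fixed number of cycles, then miss" behavior, and every numeric parameter are assumptions chosen only to show the effect qualitatively.

```python
def utilization(num_threads, run_cycles, miss_latency, switch_penalty, total_cycles):
    """Toy model: fraction of cycles spent executing instructions.

    Assumptions (illustrative only, not RS64 IV figures): every thread
    executes for `run_cycles` cycles, then takes a cache miss that lasts
    `miss_latency` cycles; switching threads costs `switch_penalty` cycles.
    """
    ready_at = [0] * num_threads  # cycle at which each thread may run again
    current = 0
    cycle = 0
    busy = 0
    while cycle < total_cycles:
        # Run the current thread until its next cache miss.
        busy += min(run_cycles, total_cycles - cycle)
        cycle += run_cycles
        ready_at[current] = cycle + miss_latency
        # Pick the thread whose outstanding miss resolves first.
        nxt = min(range(num_threads), key=lambda t: ready_at[t])
        if nxt != current:
            # Pay the switch penalty; wait longer if that thread is still stalled.
            cycle = max(cycle + switch_penalty, ready_at[nxt])
            current = nxt
        else:
            # No better thread available: the processor idles during the miss.
            cycle = ready_at[nxt]
    return busy / total_cycles


for n in (1, 2):
    print(n, "thread(s):", utilization(n, 10, 40, 3, 5000))
```

In this model a second thread fills part of the time the first thread spends waiting on memory, so the fraction of busy cycles rises. Real workloads have irregular miss intervals and latencies, so the numbers here say nothing about the throughput gain reported for the actual processor.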

References

Pages: 885-898

Page count: 14

13 entries in total

[1] SPARCLE - AN EVOLUTIONARY PROCESSOR DESIGN FOR LARGE-SCALE MULTIPROCESSORS [J].

AGARWAL, A ;

KUBIATOWICZ, J ;

KRANZ, D ;

LIM, BH ;

YEUNG, D ;

DSOUZA, G ;

PARKIN, M .

IEEE MICRO, 1993, 13 (03) :48-61

[2]

[Anonymous], 1994, POWERPC ARCHITECTURE

[3] Memory system characterization of commercial workloads [J].

Barroso, LA ;

Gharachorloo, K ;

Bugnion, E .

25TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE, PROCEEDINGS, 1998, :3-14

[4]

BORKENHAGEN JM, 1994, PR IEEE COMP DESIGN, P192, DOI 10.1109/ICCD.1994.331886

[5]

EICKEMEYER R, 1996, P 23 INT S COMP ARCH, P203

[6]

EICKEMEYER R, 1997, LECT NOTES COMPUT SC, V1336, P75

[7]

Hristea C., 1997, Proceedings of the 1997 ACM/IEEE SC97 Conference

[8] Performance characterization of a quad Pentium Pro SMP using OLTP workloads [J].

Keeton, K ;

Patterson, DA ;

He, YQ ;

Raphael, RC ;

Baker, WE .

25TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE, PROCEEDINGS, 1998, :15-26

[9] An analysis of database workload performance on simultaneous multithreaded processors [J].

Lo, JL ;

Barroso, LA ;

Eggers, SJ ;

Gharachorloo, K ;

Levy, HM ;

Parekh, SS .

25TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE, PROCEEDINGS, 1998, :39-50

[10]

MAYNARD AMG, 1994, P 6 INT C ARCH SUPP, P145
