8 references in total
- [1] Benchmarking GPUs to tune dense linear algebra. Volkov V, Demmel J W. SC '08: Proceedings of the 2008 ACM/IEEE Conference on Supercomputing. 2008
- [2] Intel performance libraries: multi-core-ready software for numeric-intensive computation. Burylov I, Chuvelev M. Intel Technology Journal. 2007
- [3] Anatomy of high-performance matrix multiplication. Goto K, van de Geijn R A. ACM Transactions on Mathematical Software. 2008
- [4] Optimization principles and application performance evaluation of a multithreaded GPU using CUDA. Ryoo S, Rodrigues C I, Stone S S, et al. Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. 2008
- [5] NVIDIA CUDA programming guide, version 2.1. NVIDIA Corporation. http://developer.nvidia.com/cuda. 2009
- [6] Matrix multiplication via arithmetic progressions. Coppersmith D, Winograd S. Journal of Symbolic Computation. 1990
- [7] Computer architecture: a quantitative approach. Hennessy J L, Patterson D A. 2007
- [8] Compute Unified Device Architecture application suitability. Hwu W M, Rodrigues C, Ryoo S, Stratton J. Computing in Science and Engineering. 2009