Architecture and Key Technologies of Deep Learning Platforms

Cited by: 5
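Authors
束柬 (Shu Jian) [1,2,3]
陈剑波 (Chen Jianbo) [1,2]
Affiliations
[1] AI Research Institute, iFLYTEK Co., Ltd.
[2] State Key Laboratory of Cognitive Intelligence
[3] School of Computer Science and Technology, University of Science and Technology of China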
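Funding
Key Program of the National Natural Science Foundation of China
Keywords
artificial intelligence; model training; deep learning; system architecture; containerization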
DOI
10.19734/j.issn.1001-3695.2023.03.0111
CLC number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
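For AI model production and training, the traditional script-based approach on standalone physical machines or clusters suffers from a split between training and inference, under-utilized resources, computing environments that are hard to migrate, and lengthy training workflows. This paper proposes a platform architecture for deep learning model training, organized into four layers (a data platform layer, a computing platform layer, a training suite layer, and a management platform layer), and analyzes each layer in turn. Among the key technologies, it first proposes a unified training-inference framework that uses an abstract computation graph to hide differences between network structures and then applies graph optimization. Second, it proposes a GPU-state-aware adaptive resource matching mechanism that uses a ring-based reduction algorithm to solve the problem of communication cost growing linearly with the number of devices. Third, it proposes a label-matching scheduling algorithm based on a heuristic to improve resource utilization. Finally, it ensures the security and reliability of the platform through tenant management and a disaster recovery mechanism. A simulation platform is built to verify the architecture's usability, security and reliability, and extensibility. Such a deep learning platform can help enterprises train customized models and use personalized services more simply and quickly, accelerating the deployment of AI in production and fostering the growth of AI technology and its ecosystem.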
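The abstract only states that a ring-based algorithm removes the linear growth of communication cost. Assuming this refers to the standard ring all-reduce used in data-parallel training (an assumption; the paper's own implementation is not shown here), the pure-Python simulation below illustrates why: each of the n workers sends 2(n-1)/n of its buffer, so per-worker traffic stays below twice the buffer size however many GPUs join, whereas a single aggregation node would have to receive n full buffers. The helper name `ring_allreduce` is hypothetical.

```python
from typing import List, Tuple


def ring_allreduce(buffers: List[List[float]]) -> Tuple[List[List[float]], int]:
    """Simulate a sum all-reduce over n workers arranged in a ring.

    Hypothetical sketch: each buffer is split into n equal chunks.
    Returns the reduced buffers and the number of elements sent by each worker.
    """
    n = len(buffers)
    length = len(buffers[0])
    assert all(len(b) == length for b in buffers), "buffers must have equal length"
    assert length % n == 0, "this sketch assumes equal-size chunks"
    size = length // n
    data = [list(b) for b in buffers]  # copy so the inputs are not mutated
    sent = 0  # every worker sends the same amount, so one counter suffices

    def chunk(c: int) -> slice:
        return slice(c * size, (c + 1) * size)

    # Phase 1, reduce-scatter: in step s, worker i sends chunk (i - s) mod n
    # to worker i + 1, which adds it to its own copy of that chunk.
    for s in range(n - 1):
        # Build all messages first so every worker sends pre-step values.
        msgs = [(i, (i - s) % n, data[i][chunk((i - s) % n)]) for i in range(n)]
        for i, c, payload in msgs:
            dst = (i + 1) % n
            sl = chunk(c)
            data[dst][sl] = [a + b for a, b in zip(data[dst][sl], payload)]
        sent += size

    # Now worker i holds the fully reduced chunk (i + 1) mod n.
    # Phase 2, all-gather: in step s, worker i forwards chunk (i + 1 - s) mod n
    # to worker i + 1, which overwrites its copy with the reduced values.
    for s in range(n - 1):
        msgs = [(i, (i + 1 - s) % n, data[i][chunk((i + 1 - s) % n)]) for i in range(n)]
        for i, c, payload in msgs:
            data[(i + 1) % n][chunk(c)] = list(payload)
        sent += size

    return data, sent


if __name__ == "__main__":
    bufs = [[float(w * 10 + k) for k in range(8)] for w in range(4)]
    reduced, per_worker = ring_allreduce(bufs)
    print(reduced[0], per_worker)
```

For a 1 GB gradient buffer, this scheme has each worker send 1.5 GB with 4 workers and about 1.97 GB with 64 workers, so per-device traffic is nearly constant as the cluster grows.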
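The abstract does not specify the heuristic behind its label-matching scheduler. As one illustrative possibility, not the authors' algorithm, the sketch below keeps only nodes whose labels cover a job's required labels (for example a GPU model), then places jobs largest-first onto the compatible node with the fewest free GPUs that still suffice (best fit), a common heuristic for reducing GPU fragmentation. `Node`, `Job`, `schedule`, and the label values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class Node:
    name: str
    labels: Set[str]  # hypothetical labels, e.g. {"v100"}
    free_gpus: int


@dataclass
class Job:
    name: str
    required_labels: Set[str]
    gpus: int


def schedule(jobs: List[Job], nodes: List[Node]) -> Dict[str, Optional[str]]:
    """Greedy label-matching scheduler (hypothetical sketch).

    Jobs are placed largest-first; each goes to the label-compatible node with
    the smallest sufficient free GPU count (best fit). Mutates node.free_gpus.
    A job that fits nowhere is mapped to None (left pending).
    """
    placement: Dict[str, Optional[str]] = {}
    for job in sorted(jobs, key=lambda j: j.gpus, reverse=True):
        # Label matching: the node must carry every label the job requires.
        candidates = [
            n for n in nodes
            if job.required_labels <= n.labels and n.free_gpus >= job.gpus
        ]
        if not candidates:
            placement[job.name] = None
            continue
        best = min(candidates, key=lambda n: n.free_gpus)  # best-fit heuristic
        best.free_gpus -= job.gpus
        placement[job.name] = best.name
    return placement


if __name__ == "__main__":
    nodes = [Node("n1", {"v100"}, 8), Node("n2", {"v100"}, 4), Node("n3", {"a100"}, 8)]
    jobs = [Job("j1", {"v100"}, 4), Job("j2", {"a100"}, 8),
            Job("j3", {"v100"}, 6), Job("j4", {"a100"}, 1)]
    print(schedule(jobs, nodes))
```

Best fit leaves large blocks of free GPUs intact on other nodes for later multi-GPU jobs; a production scheduler would also need priorities, preemption, and multi-node placement.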
Pages: 3353–3357
Page count: 5