Journal of Chongqing University of Technology (Natural Science) ›› 2023, Vol. 37 ›› Issue (9): 208-216.

• Information and computer science •

Research on the construction and optimization of heterogeneous distributed deep learning platform

  

  • Online:2023-10-17 Published:2023-10-17

Abstract: Combining deep learning with big data technology is a general trend, but many problems in resource management and task scheduling remain to be solved and optimized. To address three problems (weak management of heterogeneous resources, poor flexibility of native scheduling algorithms, and the lack of a unified interface across multiple frameworks), a distributed deep learning framework integration platform for heterogeneous resources is proposed, and the optimization of task scheduling algorithms is studied. Built on the Spark framework, the platform extends downward to manage heterogeneous resources, integrates the two frameworks SparkOnAngel and TensorFlowOnSpark upward, and uses physical labeling to tag machines with different computing resources. The dual representation of the model is used to optimize the scheduling algorithm. The results show that, compared with a traditional Spark cluster, the platform's execution time is reduced by 13.4% in the mixed task scenario of 5 minist_spark and 5 WordCount tasks; can be reduced to 32.31%. The platform extends management to GPU resources, makes the scheduling algorithm more flexible and efficient, and provides a unified calling interface for multiple frameworks.

CLC Number: TP393