Journal of Chongqing University of Technology(Natural Science) ›› 2023, Vol. 37 ›› Issue (1): 56-65.
• "Intelligent Vehicle Perception and Control in Complex Environments" special column •
Abstract:
Continuous progress in artificial intelligence has pushed automobiles into an intelligent era. The prevailing autonomous driving schemes adopt a hierarchical perception-decision-control architecture, which suffers from several difficulties: (1) rule-based strategies require extensive manual design, making the development process both complex and costly; (2) such systems struggle to adapt to densely populated, complex urban traffic environments; (3) the lower modules are tightly coupled to the upper modules, making system maintenance cumbersome. To address these problems, this paper uses the Carla urban driving simulator to conduct simulation experiments on the lane-keeping task of intelligent driving with the deep deterministic policy gradient (DDPG) algorithm, aiming to remove the heavy dependence on the traditional upper and lower modules through end-to-end control. Furthermore, because DDPG combines reinforcement learning with deep learning, it requires extensive trial and error during training, which is prohibitively costly for vehicle driving. Therefore, in view of this trial-and-error characteristic of DDPG, a real-time monitor for dangerous vehicle behaviors is designed between the environment and the agent: the monitor constrains and corrects the agent's dangerous behaviors, reducing trial-and-error behavior and improving training efficiency.
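The monitor described above sits between the agent and the environment and filters each proposed action before it is executed. The paper does not publish its rules, so the following is only a minimal sketch of the idea under assumed danger checks (the class name `SafetyMonitor`, the thresholds, and the `obstacle_dist` observation key are all hypothetical):

```python
import numpy as np

class SafetyMonitor:
    """Hypothetical real-time monitor between a DDPG agent and the environment:
    it inspects each proposed (steer, throttle) action and overrides the parts
    that simple danger rules flag, before the action reaches the simulator."""

    def __init__(self, max_steer=0.6, min_obstacle_dist=5.0):
        self.max_steer = max_steer                  # steering magnitude limit
        self.min_obstacle_dist = min_obstacle_dist  # metres

    def filter(self, action, obs):
        steer, throttle = float(action[0]), float(action[1])
        # Rule 1: clamp extreme steering that would throw the car out of lane.
        steer = float(np.clip(steer, -self.max_steer, self.max_steer))
        # Rule 2: cut the throttle when an obstacle is dangerously close.
        if obs.get("obstacle_dist", np.inf) < self.min_obstacle_dist:
            throttle = 0.0
        return np.array([steer, throttle])

# An over-aggressive action near an obstacle gets corrected:
monitor = SafetyMonitor()
safe = monitor.filter(np.array([0.9, 0.8]), {"obstacle_dist": 3.0})
# steering is clipped to 0.6 and the throttle is cut to 0.0
```

During training, the corrected action (rather than the raw one) would be sent to the environment, so the agent never executes the flagged behavior while it is still exploring.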
The DDPG algorithm and the supervised DDPG algorithm were each trained for 70 000 episodes in the Carla simulation environment. The simulation results show that the two algorithms ultimately achieve the same training effect: both can effectively avoid obstacles and drive normally without violating traffic rules, but the latter converges faster. Next, taking the map, the number of dynamic actors, and the weather as control variables, the two models were evaluated on the lane-keeping task under a unified evaluation scheme on the experimental platform. The supervised DDPG algorithm achieves average task completion rates of 98% and 89% in the environments without and with dynamic actors respectively, while the original DDPG achieves 97% and 88% in the same two environments.
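The abstract reports results as an "average task completion" percentage without defining the metric. A common reading, assumed here, is the fraction of the route completed per episode, averaged over all evaluation episodes; a sketch under that assumption (function name and inputs are illustrative):

```python
def average_completion(distances_covered, route_lengths):
    """Assumed metric: per-episode completion is distance covered divided by
    route length (capped at 1.0); the reported figure is the mean over all
    evaluation episodes, expressed as a percentage."""
    ratios = [min(d / l, 1.0) for d, l in zip(distances_covered, route_lengths)]
    return 100.0 * sum(ratios) / len(ratios)

# Three evaluation episodes on 100 m routes:
score = average_completion([95.0, 100.0, 80.0], [100.0, 100.0, 100.0])
# mean of 0.95, 1.0, 0.8 -> about 91.7%
```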
Using the DDPG algorithm to control the autonomous vehicle end to end not only effectively mitigates the heavy dependence on upper and lower modules in the traditional scheme, but also shortens the development cycle. Although the final control effect of supervised reinforcement learning matches that of the original algorithm, it significantly improves the convergence speed and effectively reduces the agent's early trial-and-error frequency. Therefore, combining supervised learning with reinforcement learning offers a new solution for reducing the trial-and-error risk of reinforcement learning, and provides a useful reference for carrying end-to-end intelligent driving with deep reinforcement learning from simulation to practical application.
URL: http://clgzk.qks.cqut.edu.cn/EN/
http://clgzk.qks.cqut.edu.cn/EN/Y2023/V37/I1/56