Journal of Chongqing University of Technology (Natural Science) ›› 2023, Vol. 37 ›› Issue (1): 56-65.

• Special Column: "Perception and Control of Intelligent Vehicles in Complex Environments" •

An End-to-End Autonomous Driving Strategy Based on the Deep Deterministic Policy Gradient Algorithm

LAI Chenguang, YANG Xiaoqing, HU Bo

  1. Key Laboratory of Automobile Parts Manufacturing and Testing Technology, Ministry of Education, Chongqing University of Technology, Chongqing 400054, China; 2. School of Vehicle Engineering, Chongqing University of Technology, Chongqing 400054, China
  • Online: 2023-02-16  Published: 2023-02-16
  • About the authors: LAI Chenguang, male, PhD, professor, engaged in research on automobile and high-speed train aerodynamics, Email: chenguanglai@cqut.edu.cn; YANG Xiaoqing, female, master's candidate, engaged in research on end-to-end autonomous driving control, Email: yxq@2019.cqut.edu.cn; corresponding author: HU Bo, male, PhD, associate professor, engaged in research on powertrain modeling and intelligent control of new energy vehicles, Email: b.hu@cqut.edu.cn.


Abstract: Based on the theory of the deep deterministic policy gradient algorithm, an end-to-end autonomous driving control strategy is proposed: in the Carla autonomous driving simulator, the vehicle's front-view image and a small number of measurements are taken as input, and steering, throttle or brake control actions are output directly. In addition, given the large amount of trial-and-error behavior inherent in reinforcement learning, a supervisor is designed to constrain and correct dangerous trial-and-error actions, so as to reduce dangerous actions and improve training efficiency. Training and test results in Carla show that the deep deterministic policy gradient algorithm enables the vehicle to learn an effective autonomous driving strategy, and that adding the supervisor markedly reduces trial-and-error behavior and improves training efficiency.
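As a concrete illustration of the input-output mapping described above, the following is a minimal sketch of an end-to-end actor network that maps a front-view image plus a few scalar measurements to steering and throttle/brake commands. It is not the paper's network: PyTorch is assumed, and all layer sizes and the measurement count are illustrative choices.

import torch
import torch.nn as nn

class Actor(nn.Module):
    """End-to-end policy: front-view image + scalar measurements -> 2 actions."""

    def __init__(self, n_measurements: int = 3):
        super().__init__()
        # Small convolutional encoder for the front-view camera image.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        # Fuse image features with the measurement vector and emit
        # [steer, throttle/brake], both squashed into [-1, 1] by tanh.
        self.head = nn.Sequential(
            nn.Linear(32 + n_measurements, 64), nn.ReLU(),
            nn.Linear(64, 2), nn.Tanh())

    def forward(self, image: torch.Tensor, measurements: torch.Tensor) -> torch.Tensor:
        features = self.encoder(image)
        return self.head(torch.cat([features, measurements], dim=1))

In the DDPG setting, a critic taking the same inputs plus the action would score this actor's outputs; only the control mapping is sketched here.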

Key words: autonomous driving, reinforcement learning, deep deterministic policy gradient, supervised deep reinforcement learning

Extended Abstract:

The continuous progress of artificial intelligence has pushed automobiles into an intelligent era. The prevailing autonomous driving schemes adopt a hierarchical perception-decision-control architecture, which suffers from several difficulties: (1) rule-based strategies require extensive manual design, which is both procedurally complex and costly; (2) they adapt poorly to densely populated, complex urban traffic environments; (3) the lower modules are tightly coupled to the upper modules, making system maintenance cumbersome. To address these problems, this paper uses the Carla urban driving simulator to run simulated lane-keeping experiments for intelligent driving with a deep deterministic policy gradient (DDPG) algorithm, aiming to eliminate the over-dependence on the traditional upper and lower modules through end-to-end control. Furthermore, because the algorithm combines reinforcement learning with deep learning, its training inherently requires extensive trial and error, which is prohibitively costly for real vehicle driving. Therefore, to contain the trial-and-error behavior that the DDPG algorithm requires, a real-time supervisor of dangerous vehicle behaviors is designed between the environment and the agent; it constrains and corrects the agent's dangerous actions so as to reduce trial-and-error behavior and improve training efficiency.
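The supervisor described above can be pictured as a small rule-based filter between the actor and Carla. The sketch below is an assumption-laden illustration: the class, rule names and measurement fields (Supervisor, is_dangerous, obstacle_dist) are invented for this example and are not taken from the paper.

import numpy as np

class Supervisor:
    """Rule-based safety monitor sitting between the DDPG actor and the environment."""

    def __init__(self, max_steer_rate: float = 0.2, min_obstacle_dist: float = 3.0):
        self.max_steer_rate = max_steer_rate        # largest allowed per-step steering change
        self.min_obstacle_dist = min_obstacle_dist  # metres to the nearest obstacle
        self.prev_steer = 0.0

    def is_dangerous(self, action, measurements) -> bool:
        steer, throttle_brake = action
        # Example rules: abrupt steering, or accelerating toward a close obstacle.
        abrupt = abs(steer - self.prev_steer) > self.max_steer_rate
        closing = (measurements["obstacle_dist"] < self.min_obstacle_dist
                   and throttle_brake > 0.0)
        return abrupt or closing

    def correct(self, action, measurements):
        steer, throttle_brake = action
        # Limit the steering rate; brake hard if an obstacle is too close.
        steer = float(np.clip(steer,
                              self.prev_steer - self.max_steer_rate,
                              self.prev_steer + self.max_steer_rate))
        if measurements["obstacle_dist"] < self.min_obstacle_dist:
            throttle_brake = -1.0
        return np.array([steer, throttle_brake])

    def filter(self, action, measurements):
        # Pass safe actions through; constrain and correct dangerous ones.
        if self.is_dangerous(action, measurements):
            action = self.correct(action, measurements)
        self.prev_steer = float(action[0])
        return action

One common design choice (again an assumption, not a claim about the paper's implementation) is to execute and store the corrected action, rather than the raw one, in the replay transition, letting the monitor cut down dangerous exploration without changing the DDPG update itself.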

The deep deterministic policy gradient algorithm and the supervised deep deterministic policy gradient algorithm were each trained for 70 000 episodes in the Carla simulation environment. The simulation results show that the two algorithms ultimately reach the same training effect: both can effectively avoid obstacles and drive normally without violating traffic rules, but the latter converges faster. Then, taking the map, the number of dynamic factors and the weather as control variables, the two models were evaluated on the lane-keeping task under the experimental platform's unified evaluation scheme. The supervised deep deterministic policy gradient algorithm achieves 98% and 89% average task completion in the environments without and with dynamic factors respectively, while the plain DDPG achieves 97% and 88% in the same two environments.
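For reference, the "average task completion" figures above can be computed as in the following self-contained sketch, where the episode runner is a stand-in for a Carla rollout and its interface is an assumption made for illustration:

from typing import Callable, Tuple

def average_task_completion(run_episode: Callable[[], Tuple[float, float]],
                            n_episodes: int = 100) -> float:
    """run_episode() -> (distance_driven_m, route_length_m) for one rollout."""
    completions = []
    for _ in range(n_episodes):
        driven, route_length = run_episode()
        completions.append(min(driven / route_length, 1.0))
    return sum(completions) / len(completions)

# Example with a dummy episode runner standing in for a Carla rollout:
if __name__ == "__main__":
    import random
    dummy = lambda: (random.uniform(80.0, 100.0), 100.0)
    print(f"average task completion: {average_task_completion(dummy):.2%}")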

Using the deep deterministic policy gradient algorithm to control the autonomous vehicle end to end not only mitigates the heavy dependence on upper and lower modules in the traditional scheme, but also shortens the development cycle. Although the final control effect of the supervised reinforcement learning matches that of the original algorithm, it significantly improves the convergence speed and effectively reduces the agent's early trial-and-error frequency. Therefore, combining supervised learning with reinforcement learning offers a new way to reduce the risk of trial and error in reinforcement learning, and provides a useful reference for carrying end-to-end intelligent driving with deep reinforcement learning from simulation to practical application.

CLC number:

  • U463.6