Abstract

Inverse reinforcement learning (IRL) has been successfully applied in many robotics and autonomous driving studies without the need to hand-tune a reward function. However, it suffers from safety issues. Compared with reinforcement learning algorithms, IRL is even more vulnerable to unsafe situations because it can only infer the importance of safety from expert demonstrations. In this paper, we propose a safety-aware adversarial inverse reinforcement learning (S-AIRL) algorithm. First, a control barrier function (CBF) is used to guide the training of a safety critic, which leverages knowledge of the system dynamics in the sampling process without training an additional guiding policy. The trained safety critic is then integrated into the discriminator to help distinguish generated data from expert demonstrations from the standpoint of safety. Finally, to further enforce the importance of safety, a regulator is introduced into the discriminator's training loss to prevent the recovered reward function from assigning high rewards to risky behaviors. We tested S-AIRL in a highway autonomous driving scenario. Compared with the original AIRL algorithm, at the same level of imitation learning performance, the proposed S-AIRL reduces the collision rate by 32.6%.
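The abstract does not give the exact formulation, so the following display is only a hedged sketch of how the described pieces could fit together. The first expression is the standard AIRL discriminator of Fu et al. (2018); the second adds an illustrative safety penalty to the usual binary cross-entropy discriminator loss, where $Q_{\mathrm{risk}}(s,a)$ denotes a learned safety critic and $\lambda$ a weighting coefficient, both placeholder symbols assumed here for illustration rather than taken from the paper:

\[
D_\theta(s,a) = \frac{\exp f_\theta(s,a)}{\exp f_\theta(s,a) + \pi(a\mid s)}, \qquad
\mathcal{L}_{\mathrm{disc}}(\theta) =
-\,\mathbb{E}_{\mathrm{expert}}\!\bigl[\log D_\theta(s,a)\bigr]
-\,\mathbb{E}_{\pi}\!\bigl[\log\bigl(1-D_\theta(s,a)\bigr)\bigr]
+\lambda\,\mathbb{E}_{\pi}\!\bigl[Q_{\mathrm{risk}}(s,a)\,f_\theta(s,a)\bigr].
\]

In this sketch, the penalty term discourages the learned reward $f_\theta$ from taking large values on state-action pairs that the safety critic flags as risky, which matches the stated intent of the regulator; the precise form used in the paper may differ.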
