Abstract

To address load uncertainty and unmodeled dynamics in multicylinder hydraulic systems, this paper proposes a balance control algorithm based on safe reinforcement learning that removes the fixed-gain restriction of classical model-based control methods. The hydraulic press is controlled by a trained agent that maps system states directly to control commands in an end-to-end manner. An action modifier introduced into the algorithm keeps the system states within safety constraints from the beginning of training, making safe exploration possible. Furthermore, a normalized exponential reward function is proposed; compared with a quadratic reward function, it greatly improves precision for the same number of training steps. Experiments show that the algorithm achieves fast, high-precision balancing of multicylinder hydraulic presses while remaining highly robust. To the best of our knowledge, this is the first attempt to apply a reinforcement learning algorithm to the multiple actuators of a hydraulic system.
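The exact reward expression and action modifier are developed in the body of the paper; as a rough illustration of the two ideas the abstract names, the sketch below assumes a Gaussian-shaped normalized exponential reward exp(-(e/σ)²) on the leveling error and a simple box-constraint action projection. All function names, the scale σ, and the limits are hypothetical, not the paper's actual definitions.

```python
import numpy as np

def quadratic_reward(error, weight=1.0):
    # Conventional quadratic reward: unbounded penalty on squared error.
    return -weight * error**2

def normalized_exponential_reward(error, scale=1e-3):
    # Assumed Gaussian-shaped form: bounded in (0, 1] and steepest near
    # zero error, so small residual errors that a quadratic term barely
    # distinguishes still produce a useful learning signal.
    # `scale` (sigma) sets the error magnitude treated as "large".
    return float(np.exp(-(error / scale) ** 2))

def action_modifier(raw_action, action_low, action_high):
    # Simplest possible "action modifier": project the policy output into
    # a box known to keep the states safe, so exploration never leaves the
    # constraint set. The paper's modifier is state-dependent; this shows
    # only the projection idea.
    return np.clip(raw_action, action_low, action_high)

# Hypothetical leveling error of 0.5 mm between cylinders:
e = 0.5e-3
print(quadratic_reward(e))               # -2.5e-07: nearly flat near zero
print(normalized_exponential_reward(e))  # ~0.78: still a strong gradient
```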
