Abstract

To improve the reliability and economy of decentralized trade economy dynamic scheduling on e-Commerce platforms, and to shorten its running time, a dynamic scheduling method based on reinforcement learning is proposed. This paper analyzes the basic theory of reinforcement learning, studies the Q-learning algorithm, builds a neural network to fit the value function, and initializes the reinforcement learning algorithm. With the Markov decision process as the framework model, the optimal state-action value function is updated using Q-learning, a model-free discounted-reward reinforcement learning algorithm, as the value iteration method. A Gibbs distribution is used to construct an exploratory stochastic policy that selects actions probabilistically. Combining reinforcement learning with a three-layer feedforward neural network as the approximator of the state-action value function, this paper addresses the value-function generalization problem faced by dynamic scheduling and thereby realizes decentralized trade economy dynamic scheduling on e-Commerce platforms. The experimental results show that the proposed method effectively improves the reliability and economy of such scheduling.
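The core loop described in the abstract — Q-learning value iteration with Gibbs-distribution (Boltzmann) action selection and a three-layer feedforward network approximating the state-action value function — can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the state dimension, action count, network sizes, temperature, and the random toy transition are all assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setting: STATE_DIM-dimensional state features and
# N_ACTIONS discrete scheduling actions (both illustrative choices).
STATE_DIM, N_ACTIONS, HIDDEN = 4, 3, 16
GAMMA, ALPHA, TAU = 0.95, 0.01, 0.5  # discount, learning rate, Gibbs temperature

# Three-layer feedforward network (input -> hidden -> output)
# approximating the state-action value function Q(s, .).
W1 = rng.normal(0.0, 0.1, (STATE_DIM, HIDDEN))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(0.0, 0.1, (HIDDEN, N_ACTIONS))
b2 = np.zeros(N_ACTIONS)

def q_values(s):
    """Forward pass: returns hidden activations and Q(s, a) for all actions."""
    h = np.tanh(s @ W1 + b1)
    return h, h @ W2 + b2

def gibbs_policy(q, tau=TAU):
    """Gibbs (Boltzmann) exploration: P(a|s) proportional to exp(Q(s,a)/tau)."""
    z = np.exp((q - q.max()) / tau)  # subtract max for numerical stability
    return z / z.sum()

def td_update(s, a, r, s_next):
    """One Q-learning step: target r + gamma * max_a' Q(s', a')."""
    global W1, b1, W2, b2
    h, q = q_values(s)
    _, q_next = q_values(s_next)
    err = (r + GAMMA * q_next.max()) - q[a]  # temporal-difference error
    # Gradient step on the squared TD error w.r.t. the chosen action's output.
    W2[:, a] += ALPHA * err * h
    b2[a] += ALPHA * err
    dh = err * W2[:, a] * (1.0 - h**2)  # backprop through tanh
    W1 += ALPHA * np.outer(s, dh)
    b1 += ALPHA * dh
    return err

# One illustrative interaction step with a random toy transition.
s = rng.normal(size=STATE_DIM)
_, q = q_values(s)
p = gibbs_policy(q)
a = rng.choice(N_ACTIONS, p=p)
err = td_update(s, a, r=1.0, s_next=rng.normal(size=STATE_DIM))
```

As the temperature `tau` is lowered, the Gibbs policy concentrates on the highest-valued action, trading exploration for exploitation; the network replaces a tabular Q-function so that values generalize across unseen scheduling states.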
