Abstract
Human-robot collaboration (HRC) has become an integral element of many manufacturing and service industries. A fundamental requirement for safe HRC is understanding and predicting human trajectories and intentions, especially when humans and robots operate in close proximity. Although existing research emphasizes predicting either human motions or intentions, a key challenge is predicting both simultaneously. This paper addresses this gap by developing a multi-task learning (MTL) framework built on a Bi-LSTM-based encoder-decoder architecture that takes motion data from both human and robot trajectories as input and performs two tasks simultaneously: human trajectory prediction and human intention prediction. The first task predicts future human trajectories by reconstructing the motion sequences, while the second task evaluates two approaches to intention prediction: a supervised method, a Support Vector Machine (SVM) that classifies human intention from the latent representation, and an unsupervised method, a Hidden Markov Model (HMM) that decodes the latent features. Four encoder designs are evaluated for feature extraction: interaction-attention, interaction-pooling, interaction-Seq2Seq, and Seq2Seq. The framework is validated through a case study of a desktop disassembly task with robots operating at different speeds. The results include an evaluation of the different encoder designs, an analysis of the impact of incorporating robot motion into the encoder, and detailed visualizations. The findings show that the proposed framework can accurately predict human trajectories and intentions.
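To make the described architecture concrete, below is a minimal PyTorch sketch of the shared encoder and the two task outputs. All names, layer sizes, and dimensions (e.g., `BiLSTMEncoderDecoder`, `hidden_dim`, `horizon`) are illustrative assumptions, not the paper's implementation: a Bi-LSTM encodes the concatenated human and robot motion, a decoder regresses future human positions (task 1), and the latent representation is returned for a downstream intention model (task 2).

```python
import torch
import torch.nn as nn

class BiLSTMEncoderDecoder(nn.Module):
    """Sketch of an MTL encoder-decoder: a Bi-LSTM encoder over
    concatenated human and robot motion features, a trajectory
    decoder head, and a latent vector exposed for intention models."""

    def __init__(self, human_dim=3, robot_dim=3, hidden_dim=64, horizon=10):
        super().__init__()
        self.horizon = horizon
        # Encoder consumes human and robot motion jointly.
        self.encoder = nn.LSTM(human_dim + robot_dim, hidden_dim,
                               batch_first=True, bidirectional=True)
        self.decoder = nn.LSTM(2 * hidden_dim, hidden_dim, batch_first=True)
        self.traj_head = nn.Linear(hidden_dim, human_dim)

    def forward(self, human_seq, robot_seq):
        # human_seq, robot_seq: (batch, T, dim) observed motion windows
        x = torch.cat([human_seq, robot_seq], dim=-1)
        enc_out, _ = self.encoder(x)            # (batch, T, 2*hidden_dim)
        latent = enc_out[:, -1, :]              # last-step latent feature
        # Repeat the latent as decoder input for each future step (task 1).
        dec_in = latent.unsqueeze(1).repeat(1, self.horizon, 1)
        dec_out, _ = self.decoder(dec_in)
        traj_pred = self.traj_head(dec_out)     # (batch, horizon, human_dim)
        return traj_pred, latent                # latent feeds SVM/HMM (task 2)
```

In use, the latent vectors collected over a training set would be passed to a separate intention classifier, e.g., `sklearn.svm.SVC` for the supervised branch or `hmmlearn.hmm.GaussianHMM` for the unsupervised branch; both library choices are assumptions here, since the abstract does not name an implementation.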