Abstract
Machine learning and other data-driven methods have advanced rapidly in industrial applications with the advent of industrial big data. However, industrial datasets may not be well suited to supervised learning approaches, which require extensive domain knowledge to label datasets completely and accurately. To address this challenge, a semi-supervised learning approach is proposed that makes use of partially labeled subsets. The methodology is applied to high-dimensional in-process measurement data, using a convolutional autoencoder (CAE) for unsupervised feature extraction. A multiclass extension for semi-supervised anomaly diagnosis is proposed that uses principal component analysis (PCA) as the basis for anomaly scoring and intersects the results of targeted one-against-all phases on partially labeled sets to classify faults. Experiments in a case study on semiconductor manufacturing measurement data explore the relationship between the extracted latent features and anomaly detection performance. The proposed algorithm achieves a true positive rate of over 90% with a false positive rate under 9% for both local and global anomaly types, while reducing the original input data dimensions by over 99%. In addition, the approach identifies positive samples that had previously gone undetected by human experts. These results are promising for the application of the proposed semi-supervised methodology in real industrial settings.
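To make the idea of PCA-based anomaly scoring concrete, the following is a minimal sketch and not the implementation used in this work. It assumes the score is the squared residual left after projecting a feature vector onto the leading principal component, which is one common choice; the abstract does not specify the exact scoring rule. The function names (`fit_leading_component`, `anomaly_score`, `detection_rates`), the toy two-dimensional data, and the threshold of 0.1 are all hypothetical. In the proposed approach the inputs would be CAE latent features rather than hand-written points.

```python
import math

def fit_leading_component(rows, iters=100):
    """Return (mean, v): the feature mean and the leading principal direction.

    The direction is found by power iteration on the sample covariance matrix.
    """
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0 / math.sqrt(d)] * d  # fixed start vector keeps the run deterministic
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v

def anomaly_score(x, mean, v):
    """Squared residual of x after projection onto the leading component (assumed rule)."""
    c = [xi - mi for xi, mi in zip(x, mean)]
    t = sum(ci * vi for ci, vi in zip(c, v))
    return sum((ci - t * vi) ** 2 for ci, vi in zip(c, v))

def detection_rates(scores, labels, threshold):
    """Return (true positive rate, false positive rate); label 1 marks an anomaly."""
    flagged = [s > threshold for s in scores]
    tp = sum(1 for f, y in zip(flagged, labels) if f and y == 1)
    fn = sum(1 for f, y in zip(flagged, labels) if not f and y == 1)
    fp = sum(1 for f, y in zip(flagged, labels) if f and y == 0)
    tn = sum(1 for f, y in zip(flagged, labels) if not f and y == 0)
    return tp / (tp + fn), fp / (fp + tn)

# Toy "normal" samples lying close to a single direction (hypothetical data).
normal = [[0.0, 0.1], [1.0, 0.9], [2.0, 2.1], [3.0, 2.9], [4.0, 4.0]]
mean, v = fit_leading_component(normal)

# Two points near the normal direction (label 0) and two far from it (label 1).
test_points = [[1.5, 1.6], [2.0, -2.0], [3.5, 3.4], [0.0, 3.0]]
test_labels = [0, 1, 0, 1]
scores = [anomaly_score(p, mean, v) for p in test_points]
tpr, fpr = detection_rates(scores, test_labels, threshold=0.1)
```

On this toy data the two off-axis points receive scores far above those of the points near the normal direction, so a single threshold separates them. In practice the threshold would be chosen from the partially labeled subsets, and more than one principal component would typically be retained.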