Abstract

Structural nonlinearities are often spatially localized, such as joints and interfaces, localized damage, or isolated connections, in an otherwise linearly behaving system. Quinn and Brink (2021, “Global System Reduction Order Modeling for Localized Feature Inclusion,” ASME J. Vib. Acoust., 143(4), p. 041006.) modeled this localized nonlinearity as a deviatoric force component. In other previous work (Najera-Flores, D. A., Quinn, D. D., Garland, A., Vlachas, K., Chatzi, E., and Todd, M. D., 2023, “A Structure-Preserving Machine Learning Framework for Accurate Prediction of Structural Dynamics for Systems With Isolated Nonlinearities,”), the authors proposed a physics-informed machine learning framework to determine the deviatoric force from measurements obtained only at the boundary of the nonlinear region, assuming a noise-free environment. However, in real experimental applications, the data are expected to contain noise from a variety of sources. In this work, we explore the sensitivity of the trained network by comparing the network responses when trained on deterministic (“noise-free”) model data and on model data with additive noise (“noisy”). Because the neural network does not yield a closed-form transformation from the input distribution to the response distribution, we leverage conformal sets to characterize this sensitivity. Through the conformal-set assumption of exchangeability, we build distribution-free prediction intervals for the responses of the networks trained on both the clean and noisy data sets. This work explores the application of conformal sets for uncertainty quantification of a deterministic structure-preserving neural network and its deployment in a structural health monitoring framework to detect deviations from a baseline state based on noisy measurements.

1 Introduction

Structural health monitoring (SHM) systems are key enabling capabilities that inform reliable and robust operation of modern structures, which are typically exposed to environments that may cause damage or other changes to the system. SHM systems seek to provide actionable information about the current health state of the system as well as to predict future limit states, but they require access to a digital twin that can provide response predictions in near real-time. Digital twins are computational models of engineering systems or components that are faithful representations of the deployed system in the field [1–4]. They require continuous updates informed by data obtained from sensor systems embedded in the physical systems [5,6]. Since they are meant to operate in real-time, aided by continuous data streams, they offer an enhanced ability to monitor the evolving health of a system and to predict its future response to both nominal operating conditions and unexpected excursions in loading environments [7–9].

To be truly effective and useful in SHM systems, digital twins need to be able to account for nonlinearities arising as a result of damage or changes to the structure. A wide range of nonlinearities present within structural systems are confined to localized regions of the structure [10]. Examples include nonlinearities introduced by joints and interfaces, localized damage, or isolated regions with nonlinear connections. As such, the majority of the system may be characterized as “linear” or otherwise known (in the sense that other nonlinearities are known); however, the overall system response is nonetheless significantly influenced by the presence of the isolated nonlinearities, and accurate models must resolve these regions of influence. In such cases, the linear modes of the system still provide an adequate description of the majority of the structure external to the nonlinear regions, but the coupling between these modes arising from the nonlinearities may no longer be neglected. For systems with isolated nonlinearities, Quinn and Brink [11] formulate the equations of motion in such a way as to isolate the effect of the nonlinearities on the underlying linear structure. In that formulation, the effect of the nonlinearities is introduced through a deviatoric force acting on the ideal linear system, representing only the contribution from the nonlinearities and localized to the boundary of the nonlinear region. As a result, the global modes of the linear structure continue to serve as a framework for the development of reduced order models, while only the isolated nonlinear regions need to be resolved in an associated nonlinear system whose domain spans only the nonlinear region [11]. Furthermore, the authors previously developed a structure-preserving approach to model these deviatoric forces with machine learning in a model-agnostic way.
The proposed approach replaced the nonlinear deviatoric component with a trained neural network embedded in a structure-preserving architecture [12].

To provide actionable information, SHM systems need to account for the uncertainty of the predictions provided by the digital twin in the presence of noise in the measurements that serve as inputs to the model. The previous work by the authors [12] considered only noise-free data. However, in real experimental applications, the data are expected to contain noise, which motivates this study of its effect on the ML system predictions. The authors of Ref. [13] summarized a variety of ways to perform uncertainty quantification in ML systems, including Monte Carlo dropout [14], Bayesian neural networks [15], neural network ensembles [16], and spectral-normalized neural Gaussian processes [17], among others. However, all of these methods require special treatment of the neural network architecture through the inclusion of specific layer types or modified training schemes. For cases where a deterministic neural network is available, an approach to characterizing the observational uncertainty (e.g., from measurement noise in the inputs) in an ML system is through conformal predictions [18].

While traditional estimators such as the jackknife capture measurement uncertainty, conformal prediction methods are able to additionally capture model bias [19]. Here, the jackknife+ estimator is used to construct measures of uncertainty for the damage parameter at each snapshot. In this work, we illustrate a general structure with an isolated nonlinearity formulation, present a structure-preserving machine learning methodology for estimating the deviatoric force at the boundary, and apply conformal methods to quantify the uncertainties about the estimated deviatoric forces. We demonstrate how these uncertainty measures can be leveraged to identify changes in the structure in the presence of noise through hypothesis testing. The paper is organized as follows. Section 2 presents a summary of previous work and the enhancements to existing methods made in this paper. Section 3 illustrates the application of the proposed approach on a numerical example, while Sec. 4 provides conclusions and a discussion of future work opportunities.

2 Background and Methodology

This paper leverages a structure-preserving machine learning method developed by the authors in Ref. [12]. We briefly describe the methodology here. Sections 2.1 and 2.2 summarize previous work that forms the foundation for this paper while Sec. 2.3 describes the statistical inference methods used to quantify uncertainty in the model.

2.1 Isolated Nonlinearity Formulation.

This work considers a general structure that contains an isolated region where nonlinearities can occur. The structure is decomposed into two adjacent regions, C1 and C2, as illustrated in Fig. 1. Note that region C1 is described by the internal variables x1 ∈ ℝ^N1, while x2 ∈ ℝ^N2 are the corresponding internal variables for C2. The equations of motion within C1 are assumed to be linear and known since the nonlinearities are localized in C2. As a result, the equations of motion for this structure can be written as
(1)
Fig. 1
General structure; the nonlinearities are localized within the region C2
The isolated nonlinearities within C2 are represented by N ∈ ℝ^N2. Examples of isolated nonlinearities include joints, nonlinear attachments, or localized damage. Based on the work in Refs. [11] and [20], the equations of motion can be partitioned into two systems of equations as
(2)
where w is a mixed displacement vector defined as
(3)

where y ∈ ℝ^(N1+N2) represents the response of the ideal system (i.e., in the absence of a deviatoric force) and z ∈ ℝ^N2 is the deviatoric response defined as z = x2 − y. Furthermore, the subscripts represent the following regions:

c: the DOFs in C1 that are not coupled to the interface DOFs,
α: the DOFs in C1 that are coupled to the interface DOFs,
β: the DOFs in C2 that are coupled to the interface DOFs, and
n: the DOFs in C2 that are not coupled to the interface DOFs.

The advantage of this formulation is that the deviatoric force Q is expressed as a function of zβ, which is available at the boundary of the isolated region. This formulation avoids the need for access to the interior of the isolated region C2, which may not be physically accessible (e.g., a joint). This work assumes that xβ is obtained by measuring the response at the boundary (e.g., with sensors) and that these measurements may be noisy.

2.2 Structure-Preserving Machine Learning.

The structure-preserving ML formulation is based on the work presented in Refs. [12] and [21]. We start by defining the kinetic energy T, the potential energy V, and the Rayleigh dissipation term R of the known system as
(4)
(5)
(6)
From these, the Lagrangian is obtained as
(7)
from which the generalized momentum can be obtained as
(8)
We then use these quantities to compute the Hamiltonian as
(9)
Next, the momentum rate (i.e., the time derivative of the generalized momentum) is required to relate the derived quantities back to quantities that can be measured (i.e., acceleration). The momentum rate is computed as
(10)
where Q_k^nc is the kth term of the generalized nonconservative force vector, which includes both the external forces and the deviatoric term
(11)
Following the work presented in Ref. [12], the deviatoric term is modeled with a multilayer perceptron (MLP). In this work, Swish activation functions are used in all hidden layers, and a linear activation is used for the output layer. The MLP is augmented with a dictionary of polynomial terms, which includes linear and higher order terms. Polynomial terms are chosen because they are a common way of modeling nonlinear forces in structural systems [22]. The MLP takes in the boundary terms as input
(12)

where zβ corresponds to the difference between the measured and ideal response at the boundary of the isolated region. The coefficients λ and γ are learned during training, K is the polynomial degree chosen, and the function G(·) represents the MLP. More details are provided in the original articles [12,21]. A diagram illustrating the neural network architecture is shown in Fig. 2.

Fig. 2
Neural network architecture used to model the deviatoric force term. The inputs to the network and to the polynomial terms are zβ and z˙β. The outputs from the network and the polynomial terms are added together to obtain the deviatoric force.
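As an illustration, the augmented model of Eq. (12) can be sketched as an MLP with Swish hidden layers and a linear output, added to a polynomial dictionary in the boundary coordinates. The names below (`deviatoric_force`, `weights`, `lam`) and the single set of polynomial coefficients are illustrative placeholders, not the authors' Jax/Flax implementation:

```python
import numpy as np

def swish(x):
    # Swish activation used in the hidden layers: x * sigmoid(x)
    return x / (1.0 + np.exp(-x))

def deviatoric_force(z, zdot, weights, biases, lam, K=3):
    """Sketch of Eq. (12): an MLP acting on (z_beta, zdot_beta) plus a
    polynomial dictionary with linear and higher-order terms.  The
    parameters `weights`, `biases`, and `lam` stand in for trained values."""
    h = np.concatenate([z, zdot])           # network input
    for W, b in zip(weights[:-1], biases[:-1]):
        h = swish(W @ h + b)                # Swish hidden layers
    g = weights[-1] @ h + biases[-1]        # linear output layer
    # polynomial dictionary evaluated on the boundary displacement
    poly = sum(lam[k] * z ** (k + 1) for k in range(K))
    return g + poly                         # combined deviatoric force
```

In the paper's architecture the network and polynomial outputs are simply summed, which this sketch mirrors.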
Once the vector of generalized forces, including the contribution from the deviatoric force term, is defined, the accelerations can be predicted as
(13)
where M is the known mass matrix of the idealized system, and ṗ is the momentum rate (obtained from Eq. (10)). As w is a vector of known quantities, the learning can be supervised by minimizing the squared error between the predicted and known accelerations
(14)
Once trained, the deviatoric force can be computed from the discrepancy term, zβ = xβ − yβ, as
(15)

where N is a short-hand form for Eq. (12), for any set of initial conditions or forcing function. It should be noted that the deviatoric force is not directly used for training. Instead, the residual of the equation of motion of the system is used to train the network. This formulation enforces preservation of the underlying geometric structure of the dynamic system.
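The residual-based supervision can be sketched as follows, assuming the accelerations are recovered from the momentum rate through the known mass matrix as in Eq. (13) and the loss of Eq. (14) compares them with the measured accelerations. Function names are illustrative, not the authors' implementation:

```python
import numpy as np

def predicted_acceleration(M, p_dot):
    # Eq. (13): accelerations recovered from the momentum rate and the
    # known mass matrix of the idealized system
    return np.linalg.solve(M, p_dot)

def training_loss(M, p_dot, w_ddot_measured):
    """Squared-error loss in the spirit of Eq. (14): the network is
    supervised through the residual of the equations of motion rather
    than through the deviatoric force itself."""
    resid = predicted_acceleration(M, p_dot) - w_ddot_measured
    return float(np.mean(resid ** 2))
```

When the predicted and measured accelerations agree, the residual, and hence the loss, vanishes.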

2.3 Statistical Inference on Deviatoric Forces.

In practice, the responses z are sampled from some nonbifurcated (unimodal) distribution that induces a distribution on Q, denoted as D(Q). Characterizing D(Q) allows for building prediction intervals on functions of samples from D(Q). If the distribution on z is known, it is possible to obtain the distribution of Q through the transformation defined in Eq. (15). Commonly, the distribution on z is unknown, leading to a nonparametric characterization of D(Q).

A common method for characterizing uncertainty under nonparametric assumptions is through the jackknife estimator. The jackknife estimator, fjack, of a parameter, θ, is an average of functions of systematic subsamples of a collection of data of size n. Explicitly, for an estimator, fn, the jackknife estimator is defined [23] as
(16)
where f(i) is the same computation as fn but for the subsample with the ith data point left out. The construction in Eq. (16) reduces in-sample bias [23]. While fn has a bias
(17)
the jackknife estimator, fjack, has a reduced bias [23] of
(18)

for some constants a and b. The bias reduction in Eq. (18) relative to Eq. (17) makes Eq. (16) a desirable estimator.

The variance of fjack can be obtained through a jackknife estimate of its own by writing
(19)
As fjack is the average of Eq. (19), the variance of fn can be computed as
(20)

Under stable conditions on f, Eq. (20) is a consistent estimator for the variance of fjack [23].
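Equations (16), (19), and (20) can be computed directly for a generic estimator; a minimal sketch, with the pseudo-value form of Eq. (19):

```python
import numpy as np

def jackknife(data, f):
    """Jackknife point estimate (Eq. (16)) and variance (Eq. (20)) of an
    estimator f, where f maps a sample to a scalar."""
    n = len(data)
    f_n = f(data)
    # leave-one-out estimates f_(i), each omitting the ith data point
    f_loo = np.array([f(np.delete(data, i)) for i in range(n)])
    # jackknife pseudo-values (Eq. (19))
    pseudo = n * f_n - (n - 1) * f_loo
    f_jack = pseudo.mean()                      # Eq. (16)
    var_jack = np.sum((pseudo - f_jack) ** 2) / (n * (n - 1))  # Eq. (20)
    return f_jack, var_jack
```

For the sample mean, the pseudo-values reduce to the data points themselves, so the jackknife estimate equals the sample mean and the variance equals the usual variance of the mean.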

Defining fn = Q in Eq. (16), the estimation process operates on the responses of the neural network in Eq. (15). In this setting, Eq. (16) assumes that the true response of N(zβ, żβ) is the parameter θ. Across different training sets, or cross-validation data subsets, the value of θ changes from sample to sample. This induces a machine-learning model bias into the estimation process. Therefore, a conformal prediction method is applied to capture and correct for this bias, just as the jackknife estimator captures the bias in the traditional nonparametric setting [19].

The jackknife+ estimator is a conformal prediction method that extends the jackknife estimator to account for the machine learning model bias by minimizing the absolute deviation of each leave-one-out estimator, f(i). As the variance of the jackknife estimator is computed using a sample of the jackknife pseudo-values in Eq. (19), an estimation methodology such as local absolute deviations may be used to compute prediction sets. Given the absolute deviations, di = |fi − f(i)|, an estimator of fn+1 for a new observation, xn+1, can be computed as
(21)
To account for the model bias in Eq. (21), local absolute deviations are calculated as
(22)
forming a collection of upper and lower bounds for each of the i models. A conformal prediction set is then given by
(23)
where qn,α is the α-percentile of a sample of n observations [24]. The model bias is accounted for in Eq. (23) through the local absolute deviations given in Eq. (22). A jackknife+ prediction interval is then
(24)

Recalling that fn = Q, Eq. (24) yields a 100(1 − α)% conformal prediction interval for the deviatoric force. This general approach will be used to obtain estimates of the prediction intervals in Sec. 3.
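A minimal sketch of the jackknife+ interval of Eqs. (21)–(24), with an illustrative `fit` callback standing in for retraining the model on each leave-one-out subset (all names here are assumptions, not the authors' implementation):

```python
import numpy as np

def jackknife_plus_interval(X, y, fit, x_new, alpha=0.05):
    """Jackknife+ prediction interval (Eq. (24)) for a new input x_new.
    `fit(X, y)` must return a callable model trained on (X, y)."""
    n = len(y)
    lo, hi = [], []
    for i in range(n):
        mask = np.arange(n) != i
        model = fit(X[mask], y[mask])       # leave-one-out model
        r_i = abs(y[i] - model(X[i]))       # leave-one-out residual
        mu = model(x_new)
        lo.append(mu - r_i)                 # lower bound from model i (Eq. (22))
        hi.append(mu + r_i)                 # upper bound from model i (Eq. (22))
    # quantiles of the lower and upper collections (Eq. (23))
    return np.quantile(lo, alpha), np.quantile(hi, 1 - alpha)
```

Because each bound is built from a different leave-one-out model, the interval widens to reflect both the residual spread and the model-to-model variability, which is how the model bias enters Eq. (23).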

3 Results

This section describes an example problem and demonstrates the proposed approach for defining prediction intervals and detecting domain shifts through hypothesis testing.

3.1 Example Description.

To demonstrate the proposed methodology, a numerical example representing a one-dimensional rod with isolated nonlinearities is used. This example is described in detail in Ref. [12], but a general description is provided here. The rod is discretized with 64 masses that are connected through springs, with mass m = 1 and linear stiffness k = 1. Proportional damping was defined as C = ξK. The nonlinear region C2 is located in the interval (s1, s2) = (0.25, 0.35), so that the isolated region exists between elements 16 and 22. The deviatoric force across the isolated region is identified as
(25)
and the displacement and velocity across the interval are defined as
(26)
The initial conditions of the system are specified in terms of the modal displacements and velocities (q(0),q˙(0)), where
(27)

where Φ is the modal transformation matrix.
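Under the stated parameters, the idealized linear matrices of the chain can be assembled as follows. The fixed-free boundary condition (a grounding spring at one end only) is an assumption made for this sketch, since the boundary conditions are not restated here:

```python
import numpy as np

# 64-mass chain of Sec. 3.1: unit masses joined by unit-stiffness
# springs, with proportional damping C = xi * K.
n, m, k, xi = 64, 1.0, 1.0, 1.0e-4

M = m * np.eye(n)
K = np.zeros((n, n))
for i in range(n - 1):                  # spring between masses i and i+1
    K[i, i] += k
    K[i + 1, i + 1] += k
    K[i, i + 1] -= k
    K[i + 1, i] -= k
K[0, 0] += k                            # assumed grounding spring at the left end
C = xi * K                              # proportional damping

# modal transformation matrix Phi; with M = I the generalized
# eigenproblem reduces to an ordinary symmetric eigenproblem
eigvals, Phi = np.linalg.eigh(K)
```

The columns of `Phi` then play the role of the modal transformation matrix Φ used to map between physical and modal coordinates.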

The nonlinearities in the isolated region are assumed to contain linear mistuning and cubic nonlinearities in terms of both stiffness and damping. Finally, hysteretic nonlinear damping is also present between the elements within C2, so that the nonlinear force is represented as
(28)
where the hysteretic force is defined as
(29)

following the regularized formulation presented in Ref. [12].

Unless noted otherwise the parameters of the nonlinear region are chosen as
(30)

with ξ = 10⁻⁴, and ρ represents the level of hysteresis included. For the data generated for training the network, ρ = 0.0.

3.2 Noise Model Description.

In previous work, the measurements were assumed to be noise-free. In contrast, this paper assumes that the measurements taken at the boundary, xβ, are noisy. Accordingly, random Gaussian noise is added to the simulated data as
(31)
where x̄β(t) is the mean process and δ is the random noise, assumed to be independent and identically distributed, modeled as
(32)

where ν is the noise factor, which can also be interpreted as the reciprocal of the signal-to-noise ratio since it represents the ratio between the noise variance and the signal variance.

As a result, the deviatoric response inherits the same additive noise as
(33)

and similarly for the deviatoric velocities, z˙β(t).
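The noise model of Eqs. (31) and (32) amounts to adding zero-mean Gaussian noise whose variance is the fraction ν of the variance of the mean process; a minimal sketch (the function name is illustrative):

```python
import numpy as np

def add_measurement_noise(x_bar, nu, rng):
    """Additive Gaussian noise per Eqs. (31) and (32): the noise variance
    is the noise factor `nu` times the variance of the mean process."""
    sigma2 = nu * np.var(x_bar)                      # noise variance
    delta = rng.normal(0.0, np.sqrt(sigma2), size=x_bar.shape)
    return x_bar + delta                             # noisy measurement
```

Applying this to the boundary response x̄β(t) with, e.g., ν = 0.05 reproduces the 5% noise level used in the following studies.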

3.3 Effect of Noise on Model Performance.

The noise observed in the inputs is propagated through the trained neural network when evaluating the deviatoric force term. As a result, the outputs are noisy as well. As the neural network does not yield a closed-form transformation from the input distribution to the response distribution, we leverage conformal sets to characterize this sensitivity, which enables the definition of predictive confidence intervals. We start by assessing the network's performance when trained with noisy and clean (i.e., noise-free) data. To this end, the network was trained with one simulated realization using the rod model. Random noise was added to the response as described in Eq. (31) with ν = 0.05.

For the following results, the trained networks were evaluated with a different realization (i.e., with different initial conditions than those used for training). The deviatoric force was modeled as described in Eq. (28), where only linear terms were included in the polynomial dictionary (based on prior knowledge of how the interface displacements relate to the forces), and the MLP consisted of five hidden layers with Swish activation functions and the following numbers of units: {8, 12, 12, 10, 10}, determined through a hyperparameter grid search. The output layer had a linear activation function. The model was trained using the Adam optimizer [25] with a learning rate of 1 × 10⁻⁴ for 65,000 epochs, which took around two hours on a single graphics processing unit. The neural network was implemented using the Jax [26] and Flax [27] packages. Figure 3 illustrates the predicted deviatoric force across the interface when given noisy inputs. Both the network trained with noisy data and the network trained with clean data predict very similar outputs when confronted with noisy data during inference. This result is illustrated in Figs. 4 and 5. As shown, the distributions predicted by both networks are indistinguishable from each other, which indicates that the network is robust to the presence of noise (at least additive white noise). This result may be due to the structure-preserving constraints in the network, which have a regularization effect. It should be noted that the network trained with noisy data required twice as many training epochs to reach the same level of accuracy as the network trained with clean data.

Fig. 3
Predictions produced with neural network trained with noisy and clean data for input data with noise levels of 5% and 20%. Network predictions: black, actual: red. (Color version online.)
Fig. 4
Close-up of predictions for noisy input data (5%) and corresponding histograms at three points in time. Clean: green, noisy: magenta, actual: red. (Color version online.)
Fig. 5
Close-up of predictions for noisy input data (20%) and corresponding histograms at three points in time. Clean: green, noisy: magenta, actual: red. (Color version online.)

To further evaluate the robustness of the network to noise, the network trained with clean data was used inside a time integration loop and random noise (at 1% level) was added at every integration step. Figure 6 illustrates the effect of the noise on the integrated response. As shown, the mean response can still be recovered even when noise is added during integration. While the effect of error accumulation is evident in these plots, the response did not diverge for the time range that was considered.

Fig. 6
Response obtained from time integration with trained neural network in the loop. Network predictions: black, actual: red. (Color version online.)

3.4 Prediction Intervals and Damage Detection.

The next step is to define prediction intervals for the network using the conformal regression approach outlined in Sec. 2.3. To this end, the Python package MAPIE [28] was used to define 95% confidence intervals for the trained models using the jackknife+ method. The intervals are obtained by fitting a conformal regression model to a “calibration” set, but the results presented here are for a different realization (i.e., different initial conditions). Figures 7–9 illustrate the confidence intervals obtained for noise levels of 5%, 10%, and 20%, respectively. As illustrated, the predicted confidence intervals provide reasonable coverage of the data (i.e., roughly 95% of the data is covered by the interval).

Fig. 7
Deviatoric force predictions from noisy inputs (at 5% noise level) at three points in time alongside predictive confidence intervals. Actual response shown in red, 95% predictive confidence intervals shown in yellow. Green line indicates time point at which the histograms are computed. (Color version online.)
Fig. 8
Deviatoric force predictions from noisy inputs (at 10% noise level) at three points in time alongside predictive confidence intervals. Actual response shown in red, 95% predictive confidence intervals shown in yellow. Green line indicates time point at which the histograms are computed. (Color version online.)
Fig. 9
Deviatoric force predictions from noisy inputs (at 20% noise level) at three points in time alongside predictive confidence intervals. Actual response shown in red, 95% predictive confidence intervals shown in yellow. Green line indicates time point at which the histograms are computed. (Color version online.)

Now we test whether these prediction intervals can be informative in determining whether damage is present in the structure based on response measurements at the boundary of the isolated region. To simulate damage, the parameter ρ that controls the level of hysteresis in the structure is set to 0, 1, 10, or 20. These levels define the baseline, low damage, medium damage, and high damage cases, respectively. These cases were simulated and compared to the predicted response from the neural network that had been trained with noisy data from the baseline model. This process was repeated for all three levels of damage and the three noise levels previously considered. For example, Fig. 10 corresponds to low-level damage and 5% noise. In this case, it is evident that there has been a shift in the system response, as the response of the damaged system is now offset from the mean of the distribution. However, when more noise is added, as illustrated in Fig. 11, this shift becomes less apparent. In contrast, a high level of damage is apparent even in the presence of 20% noise, as shown in Fig. 12. These results illustrate the challenge of distinguishing meaningful domain shifts in the presence of noise.

Fig. 10
Deviatoric force predictions from noisy inputs (at 5% noise level) at three points in time alongside predictive confidence intervals. Low level damaged response shown in red, 95% predictive confidence intervals shown in yellow. Green line indicates time point at which the histograms are computed. (Color version online.)
Fig. 11
Deviatoric force predictions from noisy inputs (at 20% noise level) at three points in time alongside predictive confidence intervals. Low level damaged response shown in red, 95% predictive confidence intervals shown in yellow. Green line indicates time point at which the histograms are computed. (Color version online.)
Fig. 12
Deviatoric force predictions from noisy inputs (at 20% noise level) at three points in time alongside predictive confidence intervals. High level damaged response shown in red, 95% predictive confidence intervals shown in yellow. Green line indicates time point at which the histograms are computed. (Color version online.)

The next step is to propagate these intervals through the dynamic process via time integration to obtain the response across the isolated region. For the following examples, a noise factor of 1% was used in order to avoid numerical convergence problems during time integration. For conciseness, only the velocity across the isolated region is shown. Figure 13 shows the time histories of the velocity and their corresponding power spectral density (PSD). The three levels of damage (blue) are plotted alongside the baseline response (black). As shown, it is hard to distinguish between the cases from visual inspection which motivates a more quantitative assessment. Another way to inspect the data is to look at the histograms of the samples at different time points or frequency points. Figure 14 illustrates the sample distributions at different points in time from the time histories, and Fig. 15 shows the histograms at different frequencies from the PSDs. As shown, the distributions exhibit heteroscedastic behavior and, as a result, the trends observed in the histograms are not constant through time or frequency.

Fig. 13
Velocity time history (left) and PSD across isolated region boundary. The baseline response is shown in black and the response for the damaged cases is shown in blue. (Color version online.)
Fig. 14
Histograms of velocity time history samples corresponding to different damage levels
Fig. 15
Histograms of velocity PSD samples corresponding to different damage levels

To further understand how the distributions vary through time and frequency, a Kolmogorov–Smirnov (KS) two-sample test was performed to compare the damaged cases to the baseline using the implementation in the SciPy package [29]. In this test, the null hypothesis (H0) is that the two samples (i.e., baseline and damaged) are from the same continuous distribution, while the alternative hypothesis (H1) is that they are from two different continuous distributions. The physical interpretation of failing to reject the null hypothesis is that there is no detectable damage (i.e., the two datasets are consistent with a “healthy” baseline state). Here, we use a significance level of 5%, so that p-values larger than 0.05 indicate that the response is consistent with the baseline distribution; in other words, p-values smaller than 0.05 indicate the presence of damage. Figure 16 illustrates the p-values obtained from the KS test as a function of time (left) and frequency (right). As shown, the statistical significance of the test depends on the damage level, which is not surprising. As shown in the histograms in Figs. 14 and 15, the shift in the empirical distributions becomes more evident as the damage level increases. Moreover, the peaks in the p-values from the time history appear to correspond to zero-crossings, where the differences in response may not be evident. On the frequency side, the test appears to be more effective at detecting the damaged cases at higher frequencies. This result implies that the type of damage modeled (i.e., hysteresis) may have a more pronounced effect on the higher frequency content. The time history p-values are more consistent through time because the high-frequency response is distributed across time. Finally, to combine the two concepts, the short-time Fourier transform (STFT) was used to compute the change in frequency content as a function of time, and the KS test was performed at each frequency and time cell.
These results are plotted in Fig. 17, where the upper limit of the color bar was set to log(0.05) so that yellow cells indicate that the null hypothesis was not rejected (i.e., no different from the baseline). These results are consistent with the PSD and time history results: they show that the level 1 (low) damage is hard to detect with the KS test and that damage is not evident in the lower frequency response. A similar analysis was performed with the Cramér–von Mises (CVM) test [30,31] to verify that similar trends were observed. The p-value as a function of time and frequency is plotted in Fig. 18. The CVM test exhibits lower sensitivity to the domain shift in the STFT, as illustrated by the large yellow regions in Fig. 18, which indicate that the data were sampled from the same distribution. However, the number of cells for which the CVM test rejects the null hypothesis (i.e., blue regions) increases as the damage level increases.
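The per-cell time-frequency test can be sketched as follows; the signal model, ensemble size, and noise level are illustrative assumptions, not the study's data:

```python
import numpy as np
from scipy.signal import stft
from scipy.stats import ks_2samp, cramervonmises_2samp

# Hypothetical ensembles of velocity time histories (n_realizations x n_samples);
# the "damaged" set has a slightly reduced response amplitude
rng = np.random.default_rng(1)
fs = 1024.0
t = np.arange(0, 2.0, 1.0 / fs)
baseline = np.sin(2 * np.pi * 40.0 * t) + 0.1 * rng.standard_normal((20, t.size))
damaged = 0.9 * np.sin(2 * np.pi * 40.0 * t) + 0.1 * rng.standard_normal((20, t.size))

# STFT magnitudes per realization: shape (n_realizations, n_freq, n_time)
f, tau, Zb = stft(baseline, fs=fs, nperseg=256)
_, _, Zd = stft(damaged, fs=fs, nperseg=256)
Ab, Ad = np.abs(Zb), np.abs(Zd)

# Two-sample KS test in each time-frequency cell, across the realization axis
p_ks = np.empty(Ab.shape[1:])
for i in range(Ab.shape[1]):
    for j in range(Ab.shape[2]):
        p_ks[i, j] = ks_2samp(Ab[:, i, j], Ad[:, i, j]).pvalue

# The CVM variant follows the same pattern (shown for a single cell here)
p_cvm = cramervonmises_2samp(Ab[:, 0, 0], Ad[:, 0, 0]).pvalue

# Clip before taking log10 so cells with p ~ 0 remain plottable,
# matching the log(0.05) color bar limit used in Figs. 17 and 18
log_p = np.log10(np.clip(p_ks, 1e-16, None))
```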

Fig. 16
Velocity time histories (top left) and PSDs (top right) and p-values as a function of time (left) and frequency (right) with p = 0.05 shown with a dashed black line. (Color version online.)
Fig. 17
P-values as a function of time and frequency computed from STFT of data computed with the KS test. Upper color bar limit is set to log(0.05) so that anything that is yellow exceeds p = 0.05. (Color version online.)
Fig. 18
P-values as a function of time and frequency computed from STFT of data computed with the CVM test. Upper color bar limit is set to log(0.05) so that anything that is yellow exceeds p = 0.05. (Color version online.)

4 Conclusions

This paper presented a framework for uncertainty quantification of neural network response predictions through the definition of predictive intervals using conformal regression. The proposed approach was developed in the context of a structure-preserving neural network that has an embedded isolated nonlinearity formulation to provide physical constraints to the problem. As this model is intended to be used in an online structural health monitoring system, the performance of the models in the presence of measurement noise was evaluated. It was shown that the trained models were not significantly affected by noise in the inputs. This noise was propagated through the machine learning model and through the time integration of the dynamic system. Conformal regression was used to define a predictive interval of the integrated response that was used to assess the presence of damage (in the form of hysteresis) through hypothesis testing. Future work will explore other sources of uncertainty (such as epistemic uncertainty) and will apply the proposed approach to experimental cases.
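For reference, the predictive-interval construction summarized above can be sketched with a minimal split-conformal calculation on synthetic calibration data; the residual model and sample size are illustrative assumptions, and this is one common construction rather than necessarily the exact variant used in this work:

```python
import numpy as np

# Hypothetical calibration set: held-out true responses and model predictions
rng = np.random.default_rng(2)
y_true = rng.normal(0.0, 1.0, 200)
y_pred = y_true + 0.2 * rng.standard_normal(200)

# Split-conformal score: absolute residual on the calibration set
alpha = 0.05
scores = np.abs(y_true - y_pred)

# Finite-sample-corrected quantile level, ceil((1 - alpha)(n + 1)) / n
n = scores.size
q_level = min(np.ceil((1.0 - alpha) * (n + 1)) / n, 1.0)
q_hat = np.quantile(scores, q_level, method="higher")

# A new prediction y_new receives the distribution-free interval
# [y_new - q_hat, y_new + q_hat]; measurements falling outside it
# flag a deviation from the baseline state
y_new = 0.1
interval = (y_new - q_hat, y_new + q_hat)
```

Under the exchangeability assumption, this interval covers a new observation with probability at least 1 − alpha.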

Acknowledgment

Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy's National Nuclear Security Administration under Contract No. DE-NA-0003525.

Funding Data

  • Sandia National Laboratories (Contract No. DE-NA-0003525; Funder ID: 10.13039/100006234).

Data Availability Statement

The datasets generated and supporting the findings of this article are available from the corresponding author upon reasonable request.

References

1. Chinesta, F., Cueto, E., Abisset-Chavanne, E., Duval, J. L., and Khaldi, F. E., 2020, "Virtual, Digital and Hybrid Twins: A New Paradigm in Data-Based Engineering and Engineered Data," Arch. Comput. Methods Eng., 27(1), pp. 105–134. 10.1007/s11831-018-9301-4
2. Gardner, P., Borgo, M. D., Ruffini, V., Hughes, A. J., Zhu, Y., and Wagg, D. J., 2020, "Towards the Development of an Operational Digital Twin," Vibration, 3(3), pp. 235–265. 10.3390/vibration3030018
3. Wagg, D. J., Worden, K., Barthorpe, R. J., and Gardner, P., 2020, "Digital Twins: State-of-the-Art and Future Directions for Modeling and Simulation in Engineering Dynamics Applications," ASCE-ASME J. Risk Uncertainty Eng. Syst., Part B: Mech. Eng., 6(3), p. 030901. 10.1115/1.4046739
4. Thelen, A., Zhang, X., Fink, O., Lu, Y., Ghosh, S., Youn, B. D., Todd, M. D., Mahadevan, S., Hu, C., and Hu, Z., 2022, "A Comprehensive Review of Digital Twin—Part 1: Modeling and Twinning Enabling Technologies," Struct. Multidiscip. Optim., 65(12), p. 354. https://arxiv.org/pdf/2208.14197.pdf
5. Wright, L., and Davidson, S., 2020, "How to Tell the Difference Between a Model and a Digital Twin," Adv. Model. Simul. Eng. Sci., 7(1), p. 12. 10.1186/s40323-020-00147-4
6. McClellan, A., Lorenzetti, J., Pavone, M., and Farhat, C., 2022, "A Physics-Based Digital Twin for Model Predictive Control of Autonomous Unmanned Aerial Vehicle Landing," Philos. Trans. R. Soc., A, 380(2229), p. 8. 10.1098/rsta.2021.0204
7. Tsialiamanis, G., Wagg, D. J., Dervilis, N., and Worden, K., 2021, "On Generative Models as the Basis for Digital Twins," Data-Centric Eng., 2, p. e11. 10.1017/dce.2021.13
8. Bonney, M. S., and Wagg, D., 2022, "Historical Perspective of the Development of Digital Twins," Special Topics in Structural Dynamics and Experimental Techniques (Conference Proceedings of the Society for Experimental Mechanics Series, Vol. 5), Springer, Cham, Switzerland, pp. 15–20. 10.1007/978-3-030-75914-8_2
9. Thelen, A., Zhang, X., Fink, O., Lu, Y., Ghosh, S., Youn, B. D., Todd, M. D., Mahadevan, S., Hu, C., and Hu, Z., 2023, "A Comprehensive Review of Digital Twin—Part 2: Roles of Uncertainty Quantification and Optimization, a Battery Digital Twin, and Perspectives," Struct. Multidiscip. Optim., 66(1), p. 1. 10.1007/s00158-022-03410-x
10. Friswell, M. I., Penny, J. E. T., and Garvey, S. D., 1995, "Using Linear Model Reduction to Investigate the Dynamics of Structures With Local Non-Linearities," Mech. Syst. Signal Process., 9(3), pp. 317–328. 10.1006/mssp.1995.0026
11. Quinn, D. D., and Brink, A. R., 2021, "Global System Reduction Order Modeling for Localized Feature Inclusion," ASME J. Vib. Acoust., 143(4), p. 041006. 10.1115/1.4048890
12. Najera-Flores, D. A., Quinn, D. D., Garland, A., Vlachas, K., Chatzi, E., and Todd, M. D., 2023, "A Structure-Preserving Machine Learning Framework for Accurate Prediction of Structural Dynamics for Systems With Isolated Nonlinearities," 10.2139/ssrn.4573380
13. Nemani, V., Biggio, L., Huan, X., Hu, Z., Fink, O., Tran, A., Wang, Y., Du, X., Zhang, X., and Hu, C., 2023, "Uncertainty Quantification in Machine Learning for Engineering Design and Health Prognostics: A Tutorial," Mech. Syst. Signal Process., 205, p. 110796. 10.1016/j.ymssp.2023.110796
14. Gal, Y., and Ghahramani, Z., 2016, "Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning," Proceedings of the 33rd International Conference on Machine Learning, New York, June 20–22, pp. 1050–1059. https://proceedings.mlr.press/v48/gal16.html
15. MacKay, D. J. C., 1992, "A Practical Bayesian Framework for Backpropagation Networks," Neural Comput., 4(3), pp. 448–472. 10.1162/neco.1992.4.3.448
16. Opitz, D., and Maclin, R., 1999, "Popular Ensemble Methods: An Empirical Study," J. Artif. Intell. Res., 11(1), pp. 169–198. 10.1613/jair.614
17. Liu, J. Z., Lin, Z., Padhy, S., Tran, D., Bedrax-Weiss, T., and Lakshminarayanan, B., 2020, "Simple and Principled Uncertainty Estimation With Deterministic Deep Learning Via Distance Awareness," e-print arXiv:2006.10108.
18. Shafer, G., and Vovk, V., 2007, "A Tutorial on Conformal Prediction," J. Mach. Learn. Res., 9(3), pp. 371–421. 10.48550/arXiv.0706.3188
19. Barber, R. F., Candes, E. J., Ramdas, A., and Tibshirani, R. J., 2021, "Predictive Inference With the Jackknife+," Ann. Stat., 49(1), pp. 486–507. 10.1214/20-AOS1965
20. Vlachas, K., Garland, A., Quinn, D. D., and Chatzi, E., 2023, "Parametric Reduced Order Modelling for Component-Oriented Treatment and Localized Nonlinear Feature Inclusion," Nonlinear Dyn., 112, pp. 3399–3420. 10.1007/s11071-023-09213-z
21. Najera-Flores, D. A., and Todd, M. D., 2023, "A Structure-Preserving Neural Differential Operator With Embedded Hamiltonian Constraints for Modeling Structural Dynamics," Comput. Mech., 72(2), pp. 241–252. 10.1007/s00466-023-02288-w
22. Kerschen, G., Worden, K., Vakakis, A. F., and Golinval, J.-C., 2006, "Past, Present and Future of Nonlinear System Identification in Structural Dynamics," Mech. Syst. Signal Process., 20(3), pp. 505–592. 10.1016/j.ymssp.2005.04.008
23. Wasserman, L., 2006, All of Nonparametric Statistics (Springer Texts in Statistics), Springer-Verlag, Berlin, Germany. 10.1007/0-387-30623-4
24. Kim, B., Xu, C., and Barber, R. F., 2020, "Predictive Inference Is Free With the Jackknife+-After-Bootstrap," e-print arXiv:2002.09025.
25. Kingma, D. P., and Ba, J., 2015, "Adam: A Method for Stochastic Optimization," e-print arXiv:1412.6980.
26. Bradbury, J., Frostig, R., Hawkins, P., Johnson, M. J., Leary, C., Maclaurin, D., Necula, G., Paszke, A., VanderPlas, J., Wanderman-Milne, S., and Zhang, Q., 2018, "JAX: Composable Transformations of Python+NumPy Programs," http://github.com/google/jax
27. Heek, J., Levskaya, A., Oliver, A., Ritter, M., Rondepierre, B., Steiner, A., and van Zee, M., 2023, "Flax: A Neural Network Library and Ecosystem for JAX," http://github.com/google/flax
28. MAPIE Development Team, 2023, "MAPIE—Model Agnostic Prediction Interval Estimator," https://github.com/scikit-learn-contrib/MAPIE
29. Virtanen, P., Gommers, R., Oliphant, T. E., Haberland, M., Reddy, T., Cournapeau, D., Burovski, E., et al., 2020, "SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python," Nat. Methods, 17(3), pp. 261–272. 10.1038/s41592-019-0686-2
30. Cramér, H., 1928, "On the Composition of Elementary Errors," Scand. Actuarial J., 1928(1), pp. 13–74. 10.1080/03461238.1928.10416862
31. Csörgő, S., and Faraway, J. J., 1996, "The Exact and Asymptotic Distributions of Cramér-Von Mises Statistics," J. R. Stat. Soc.: Ser. B (Methodol.), 58(1), pp. 221–234. 10.1111/j.2517-6161.1996.tb02077.x