Abstract

The United States Navy (USN) intends to increase the number of uncrewed aircraft in a carrier air wing. To support this increase, carrier-based uncrewed aircraft will be required to have some level of autonomy, as there will be situations where a human cannot be in/on the loop. However, there is no existing and approved method to certify autonomy within Naval Aviation. In support of generating certification evidence for autonomy, the United States Naval Academy (USNA) has created a training and evaluation system (TES) to provide quantifiable metrics for feedback performance in autonomous systems. The preliminary use case for this work focuses on autonomous aerial refueling. Prior demonstrations of autonomous aerial refueling have leveraged a deep neural network (DNN) for processing visual feedback to approximate the relative position of an aerial refueling drogue. The training and evaluation system proposed in this work simulates the relative motion between the aerial refueling drogue and feedback camera system using industrial robotics. Ground-truth measurements of the pose between the camera and drogue are measured using a commercial motion capture system. Preliminary results demonstrate calibration methods providing ground-truth measurements with millimeter precision. Leveraging this calibration, the proposed system is capable of providing large-scale datasets for DNN training and evaluation against a precise ground truth.

1 Introduction

The United States Navy (USN) has publicly stated that the air wing of the future will be 40% uncrewed [1]. As part of this effort, the USN is preparing to field the Boeing MQ-25 Stingray, an uncrewed refueling aircraft [2]. The MQ-25 will be the first large, uncrewed aircraft to operate regularly from the flight deck of a United States aircraft carrier. The current planned operations for the MQ-25 can be considered automation, as there will be a human in/on the loop acting as the air vehicle operator. As the USN expands uncrewed aerial systems beyond automation, reliance on autonomy will increase. However, an approved method to certify autonomy within naval aviation does not currently exist. In response to this need, the United States Naval Academy (USNA) has created a training and evaluation system (TES) to quantify feedback performance for autonomous systems.

In coordination with the Office of Naval Research, the Naval Air Systems Command, and the National Airworthiness Council Artificial Intelligence Working Group (NAWCAIWG), this work focuses on an unclassified use case for certification using feedback derived from deep neural network (DNN) processing of visual imagery. In this case, the autonomous aircraft is acting as the receiver in an aerial refueling operation (i.e., approaching and coupling with a refueling drogue), and feedback must reliably approximate the relative position of the drogue from visual imagery. This will enable an uncrewed aircraft to complete an autonomous task, a first for naval aviation. Details of the use case can be found in [3]. Prior work has shown that an uncrewed aircraft in live testing and simulation can perform autonomous refueling under ideal conditions with a human closely monitoring all aspects [4,5]. However, a fleet-wide flight clearance without a human in/on the loop requires extensive performance quantification to assess risk. Further, standards or methods of compliance do not exist to certify this level of autonomy within naval aviation. The described use case offers several unique challenges, most notably a lack of a universal standard for measuring the accuracy and performance of DNN feedback. The proposed TES provides tools for data acquisition, ground-truth labeling for DNN training, and performance quantification.

The use of supervised learning to train DNNs requires an established “ground truth” defining a known input/output correspondence. In practice, creating ground-truth correspondence using nonsimulated imagery represents the most labor-intensive part of training. This effort is justified as valid ground-truth labeling of large datasets is required for accurate and predictable DNN performance [6].

While this task is relatively unskilled, lapses in labeling accuracy can introduce training error that reduces network accuracy. Prior research shows that a trained DNN can identify an aerial refueling drogue from a relatively small dataset [7]. However, generating a dataset of accurately labeled images is resource-intensive. As such, the TES is designed to serve both as an autolabeling tool to generate large sets of ground-truth correspondences and as an evaluation tool. To accomplish this, the TES incorporates industrial robots to provide automated articulation between the designated sensor (for this application, a machine vision camera) and the target (a KC-130 refueling drogue). Early results showed the ability to track the relative pose across the combined manipulation workspace and the ability to project bounding boxes defining salient target features into image space without human interaction.

This paper will highlight the process we used in establishing the TES and is structured as follows. Section 2 highlights related work, the use case, and the motivation for the TES. Section 3 details the TES design, calibration, and automation of data acquisition. Section 4 describes the preliminary evaluation of the TES and presents the results generated. Finally, Sec. 5 summarizes conclusions and discusses future work.

2 Background

The prevalence and accuracy of DNNs, specifically deep convolutional neural networks, have expanded notably since the adoption of graphics processing unit computing [8]. Government and commercial entities continue to adopt this technology with applications ranging from license plate identification [9,10] to identifying humans in a distorted image [11]. Regardless of application, these algorithms require large datasets for training before they can be effective.

In 2021, Zhang et al. published a survey of machine-learning techniques for autolabeling across a breadth of data formats (video, audio, and text), examining numerous papers describing methods for generating large datasets used for training machine learning algorithms [6]. However, none of the papers surveyed offered a method for automatically labeling data with quantified labeling accuracy. Given the certification and safety considerations associated with this work, these autolabeling techniques, while qualitatively effective, lack the critical information needed to support certification and safety approvals for flight clearance. Additional work on developing autolabeling methods is ongoing. Some methods include having a human manually label images while an algorithm learns from their actions; the algorithm is then able to continue the task with minimal supervision [12–15].

This paper focuses on developing a method for generating large ground-truth correspondences between images and known features for use in DNN training and evaluation. The specific application for this preliminary work is the tracking of an aerial refueling drogue in simulated flight configurations. Following Refs. [4,5], visual feedback from a camera fixed to the simulated receiving aircraft will be processed to define its distance and relative position from the drogue using salient geometric features on the drogue (e.g., the relative size of drogue features in pixels). Assuming a precise training set, acceptable DNN selection, and acceptable DNN parameter tuning, prior work [7] suggests that the DNN will accurately track relative position during the final approach to contact with the drogue. For the precise training set, this paper introduces the idea of using ground-truth data to automatically label the bounding boxes for DNN training vice another technique. Existing autolabeling and hand-labeling techniques do not offer a quantitative accuracy metric when generating ground-truth correspondence for the training and evaluation of the DNN. The method presented in this paper differs from existing autolabeling and hand-labeling techniques by providing quantifiable tracking metrics for the labeling of training and evaluation data. For this use case, the quantified tracking accuracy generated in the training and evaluation of the DNN can provide safe operating bounds for control systems leveraging DNN feedback. As mentioned above, these measures are also critical in assessing the safety and eventual certification of flight autonomy leveraging DNN feedback.

With the advent of increased computing power (notably graphics processing unit computing), machine learning techniques such as DNNs have been widely adopted as a cost-effective method of processing large datasets across a range of sensor modalities. An extensive body of work explores the application of machine learning algorithms to identify objects in images, with applications spanning various fields. Machine learning methods have been applied to anomaly detection in medical imaging, with extensive work related to lung cancer detection [16–20], to fire detection in camera footage [21–23], and as a feedback modality for self-driving cars [24–26].

The use of computer vision as a method for feedback in autonomous aerial refueling has been studied by multiple organizations. The Air Force Institute of Technology has an active program for identifying the receiver aircraft and estimating its relative pose to a synthetic refueling drogue [27–31]. The work at the Air Force Institute of Technology relies on a tanker-based vision system vice one that is hosted by the receiver aircraft. The approach simulated in this paper focuses on a receiver-based computer vision system. Various iterations of this approach and its utility for aerial refueling have appeared in the literature [32–36]. While all of these approaches show utility for demonstrating that an uncrewed vehicle can complete the task, none have been vetted within naval aviation as part of a safety of flight clearance.

Fielding large uncrewed platforms within naval aviation is the next logical step. However, most of the functionality within these platforms is only certified safe for flight when a human is in/on the loop. As we begin to field these platforms within a carrier air wing, a need has been identified to allow them to complete their missions without human oversight. The platforms will need to exhibit some level of autonomous functionality. Before certifying this autonomous functionality, the USN needs to formally establish specifications and methods of compliance that certification officials can use to make informed risk decisions.

To enable an early dialog between industry, academia, and United States military certification officials, the NAWCAIWG sponsored two working groups to study the issue and make recommendations [37,38]. These working groups elected to use an unclassified use case: an autonomous aircraft acting as the receiver during aerial refueling through the use of a DNN. This use case was presented at Xpotential 2022 [39], detailed in the International Test and Evaluation Association Journal [3], and in the Systems Engineering Journal. This paper is another step along this research line.

This paper details the development, calibration, and application of the TES to generate ground-truth correspondences for DNN training and evaluation, specifically associated with an autonomous aerial refueling task. The overarching goal of this work is to develop a framework for providing highly accurate and quantifiable ground-truth measurements applicable to the training and evaluation of autonomy, with the eventual goal of providing information for certification of autonomy.

3 Test and Evaluation System Overview

This section provides an overview of the proposed TES, associated calibration, and automated data acquisition. Section 3.1 details the TES design including hardware specification and software development. Section 3.2 overviews the required calibration of the TES to provide values for static, unknown transformations. Section 3.3 describes the current approach to automated data acquisition both for calibration and autolabeling in the context of this work.

3.1 Design.

The TES proposed in this work was developed within the Department of Weapons, Robotics, & Control Engineering's Vision Integration in Polymanual Robotics (VIPER) lab at the USNA. The VIPER lab was established as a research facility to explore computer vision and sensor uncertainty leveraging multi-arm industrial robotic manipulation and commercial motion capture. The VIPER lab currently houses three multi-degree-of-freedom (DoF) industrial manipulators: (1) Yaskawa SIA20F—a 7-DoF, 1090-mm horizontal reach, 20 kg payload manipulator; (2) Universal Robot UR10—a 6-DoF, 1300-mm horizontal reach, 10 kg payload manipulator; and (3) Universal Robot UR5—a 6-DoF, 850-mm horizontal reach, 5 kg payload manipulator. Within the VIPER lab, the Yaskawa SIA20F is anchored to the lab floor, and the UR5 and UR10 are mounted on individual mobile bases. The VIPER lab integrates a 12-camera motion capture (MoCap) constellation (OptiTrack PrimeX 41, Natural Point Inc.), providing an advertised ±0.10-mm 3D tracking accuracy.

The TES design goals are as follows: (1) simulate, at a minimum, the final 4.5 m approach for an aerial refueling task given a 1.8 m distance between the camera and the center of the drogue at an approach distance of 0 m (simulating the approximate position of the camera when the receiver makes contact as discussed in [7]); (2) simulate off-axis misalignment of at least ±1.5 m during approach; (3) integrate a KC-130 aerial refueling drogue following the North Atlantic Treaty Organization (NATO) “probe-and-drogue” standard [40] as a representative target for training and evaluation; (4) integrate an interchangeable system for mounting machine vision cameras; and (5) provide precise measurements for defining the position and orientation of drogue features relative to the camera.

Given the 14 kg weight of the KC-130 aerial refueling drogue assembly, mounting options to address design goal (3) are limited to (a) static mounting or (b) mounting to the SIA20F. Given design goal (2), the SIA20F was selected enabling ±850 mm off-axis articulation of the drogue (after accounting for pedestal collision). Mounting for the interchangeable machine vision cameras to address design goal (4) is accomplished using a rigid 80/20 interface between the UR10 end effector and a standard 1/4 in camera ball head. The UR10 is selected for the camera mount to address the remaining off-axis misalignment requirement of design goal (2) enabling a combined misalignment capability exceeding ±1.8 m. Design goal (1) is addressed by the mobility of the UR10 base relative to the rigidly fixed SIA20F with restrictions imposed by available floor space limiting the maximum approach distance to 6 m. To address design goal (5), reflective markers are rigidly fixed to components of the system to define frames that can be tracked using the MoCap. Figure 1 highlights the components of the TES.

Fig. 1: Image of the TES with labeled components

To keep the KC-130 drogue “inflated” during imaging, the drogue is lightly modified to incorporate tension cables and turnbuckles at six locations evenly spaced about the drogue's center axis. This results in an inflated, but notably hexagonal drogue configuration. Figure 2 highlights the difference between an inflated KC-130 drogue in-flight, and an image captured by the TES. This discrepancy between the current hexagonal shape and the desired circular shape is discussed in Sec. 5 and will be addressed in future work.

Fig. 2: Comparison between an image captured of an in-flight drogue and an image captured by the TES with the drogue "inflated" using tension cables. The KC-130 drogue inflated in-flight is shown on the left, and the TES drogue is shown on the right. Color differences between the drogue parachute material are due to fuel/exhaust deposits from use.

The TES is interfaced using ROS and MATLAB wrappers. The SIA20F is interfaced using the ROS-Industrial “Motoman” package [41], the UR10 is interfaced using the ROS-Industrial “UR Modern Driver” package [42], and the MoCap is interfaced using the ROS “VRPN Client” package [43]. The MATLAB wrappers [4446] utilize the MATLAB ROS Toolbox and allow users to command the robots, query robot state, and query feedback from the MoCap. Cameras are interfaced directly in MATLAB using the Image Acquisition Toolbox.
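As an illustration of this interface, the sketch below shows how MoCap and robot state might be queried and an image captured from MATLAB. It is a minimal, hypothetical example: the ROS master address, topic names, and camera adaptor are placeholders and do not reflect the actual TES configuration.

```matlab
% Minimal sketch: query MoCap and robot state over ROS and capture an image.
% The master address, topic names, and camera adaptor are placeholders.
rosinit('192.168.1.10');                                   % connect to the ROS master

mocapSub = rossubscriber('/vrpn_client_node/camera_rb/pose', ...
                         'geometry_msgs/PoseStamped');     % MoCap rigid body (e.g., frame tu)
jointSub = rossubscriber('/joint_states', 'sensor_msgs/JointState');

poseMsg  = receive(mocapSub, 5);                           % block up to 5 s for a sample
jointMsg = receive(jointSub, 5);

% Convert the PoseStamped message into a 4x4 homogeneous transform (mm)
p = poseMsg.Pose.Position;  q = poseMsg.Pose.Orientation;
H_tu_w = [quat2rotm([q.W, q.X, q.Y, q.Z]), 1000*[p.X; p.Y; p.Z]; 0 0 0 1];

% Capture an image using the Image Acquisition Toolbox (adaptor is a placeholder)
vid = videoinput('gentl', 1);
img = getsnapshot(vid);

rosshutdown
```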

Table 1 describes the coordinate frames defined as part of the TES, and Table 2 describes the transformations required for the proposed effort.

Table 1: TES frame definitions, descriptions, and information sources. All frames are 3D with units of mm except m (2D with units of pixels).

Frame | Frame description | Information source
eu | Fixed relative to UR10 end-effector | UR10 controller
ou | Fixed relative to UR10 base | UR10 controller
ey | Fixed relative to SIA20F end-effector | SIA20F controller
oy | Fixed relative to SIA20F base | SIA20F controller
tu | Fixed relative to UR10 end-effector | MoCap
bu | Fixed relative to UR10 base | MoCap
ty | Fixed relative to SIA20F end-effector | MoCap
by | Fixed relative to SIA20F base | MoCap
w | Fixed relative to MoCap "world" | MoCap
c | Fixed to camera focal point | Camera
d | Drogue salient feature frame | User/CAD
m | Fixed to upper left of digital image | Camera
Table 2: Transformations natively measured by the TES and unknown fixed transformations

Transformation | Transform description | Group | Linear units | Information source
H_{eu}^{ou} | Pose of eu relative to ou | SE(3) | mm | UR10 controller
H_{ey}^{oy} | Pose of ey relative to oy | SE(3) | mm | SIA20F controller
H_{tu}^{w} | Pose of tu relative to w | SE(3) | mm | MoCap
H_{bu}^{w} | Pose of bu relative to w | SE(3) | mm | MoCap
H_{ty}^{w} | Pose of ty relative to w | SE(3) | mm | MoCap
H_{by}^{w} | Pose of by relative to w | SE(3) | mm | MoCap
H_{eu}^{tu} | Pose of eu relative to tu | SE(3) | mm | Static, unknown
H_{ou}^{bu} | Pose of ou relative to bu | SE(3) | mm | Static, unknown
H_{ey}^{ty} | Pose of ey relative to ty | SE(3) | mm | Static, unknown
H_{oy}^{by} | Pose of oy relative to by | SE(3) | mm | Static, unknown
H_{eu}^{c} | Pose of eu relative to c | SE(3) | mm | Static, unknown
H_{tu}^{c} | Pose of tu relative to c | SE(3) | mm | Static, unknown
H_{ty}^{d} | Pose of ty relative to d | SE(3) | mm | Static, unknown
A_{c}^{m} | Undistorted projection of c to m | Intrinsic | Pixels | Static, unknown
Assuming the static, unknown transformations given in Table 2 can be recovered accurately, the TES provides redundant information both for describing the 3D position of drogue features relative to the camera frame and for projecting 3D drogue features into digital images. Given a set of salient features described as a set of points P^d, Eqs. (1) and (2) provide two unique methods for defining features relative to Frame c:

(1)   H_{d}^{c} = H_{tu}^{c} (H_{tu}^{w})^{-1} H_{ty}^{w} (H_{ty}^{d})^{-1}
(2)   H_{d}^{c} = H_{eu}^{c} (H_{eu}^{ou})^{-1} (H_{ou}^{bu})^{-1} (H_{bu}^{w})^{-1} H_{by}^{w} H_{oy}^{by} H_{ey}^{oy} (H_{ey}^{ty})^{-1} (H_{ty}^{d})^{-1}
Noting the block matrix form of H_{d}^{c} (Eq. (3)), projection of features into an undistorted "pinhole" camera image is defined in Eq. (4):

(3)   H_{d}^{c} = [ R_{d}^{c}, X̄_{d}^{c} ; 0_{1×3}, 1 ]

(4)   [ x^m ; y^m ; 1_{1×N} ] = ( A_{c}^{m} ( R_{d}^{c} P^d + X̄_{d}^{c} 1_{1×N} ) ) ⊘ [ z^c ; z^c ; z^c ]

Here, R_{d}^{c} ∈ SO(3) defines the orientation of frame d relative to frame c, X̄_{d}^{c} ∈ ℝ^{3×1} defines the position of frame d relative to frame c, 1_{1×N} denotes a 1×N row of ones, z^c ∈ ℝ^{1×N} defines the z-distance of the N salient features (P) relative to the camera frame (typically referred to as "scale" with variable s), x^m and y^m ∈ ℝ^{1×N} represent the pixel coordinates of the salient features P within the undistorted digital image (relative to Frame m), and ⊘ denotes element-wise (i.e., Hadamard) division. For distorted images, points projected using the pinhole model (Eq. (4)) must be distorted using the applicable lens/camera model (e.g., Brown-Conrady or Fisheye).
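As a concrete illustration of Eqs. (3) and (4), the following MATLAB sketch projects a set of drogue feature points into an undistorted image. The transform and intrinsic values shown are illustrative placeholders, not calibrated TES values.

```matlab
% Project salient drogue features P^d (3xN, mm) into the undistorted image
% frame m using H_d^c and the intrinsic matrix A_c^m (illustrative values).
H_d_c = [eye(3), [0; 0; 2500]; 0 0 0 1];          % drogue ~2.5 m in front of the camera
A_c_m = [1400, 0, 640; 0, 1400, 480; 0, 0, 1];    % example pinhole intrinsics (pixels)

theta = linspace(0, 2*pi, 37);  theta(end) = [];
P_d   = [(609.9/2)*cos(theta); (609.9/2)*sin(theta); zeros(1, numel(theta))];  % canopy circle

P_c  = H_d_c * [P_d; ones(1, size(P_d, 2))];      % Eq. (1)/(2): features relative to frame c
z_c  = P_c(3, :);                                 % "scale" (z-distance) of each feature
p_m  = (A_c_m * P_c(1:3, :)) ./ z_c;              % Eq. (4): element-wise division by scale
x_m  = p_m(1, :);  y_m = p_m(2, :);               % pixel coordinates relative to frame m
```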

The two methods presented for calculating H_{d}^{c} (Eqs. (1) and (2)) rely on information gathered from different components of the TES. Manufacturer specifications prescribe a MoCap accuracy of ±0.1 mm, SIA20F "repeatability" of ±0.1 mm, and UR10 "repeatability" of ±0.1 mm. Repeatability in this context describes the position error bounds associated with repeated movement to a fixed waypoint anywhere within the robot's workspace. Assuming a one-to-one relationship between repeatability and the position information reported from the SIA20F and UR10 controllers suggests an accuracy of ±0.1 mm for both the SIA20F and UR10. Ignoring the error introduced in the recovery of the static, unknown transformations described in Table 2, a best-case error associated with Eqs. (1) and (2) can be approximated by convolving the individual error distributions. For Eq. (1), relying on two measured transformations with ±0.1 mm accuracy, the approximate best-case error should be within ±0.3 mm. For Eq. (2), relying on five measured transformations with ±0.1 mm accuracy, the approximate best-case error should be within ±0.9 mm. This approximation follows intuition and suggests that Eq. (1) will yield a more precise result.

3.2 Calibration.

The goal of TES calibration is to accurately establish values for the static, unknown transformations described in Table 2. To do so, two calibration fiducials are introduced to provide extrinsic information describing the fiducial pose relative to the camera frame (frame c). For convenience, these are defined as a 2D checkerboard (frame f) fixed to a unique MoCap rigid body (frame g); and a 235 mm AprilTag (frame a) rigidly fixed to the SIA20F base frame. These fiducials were selected to provide compatibility with the “Camera Calibration” and “Read AprilTag” tools available in the MATLAB Computer Vision Toolbox. Figure 3 provides images of the fiducials highlighting the MoCap rigid body markers, and Table 3 provides a summary of the transformations introduced by the fiducials.

Fig. 3: Checkerboard (left) and AprilTag fiducials with MoCap rigid body markers highlighted
Table 3: Transformations introduced for TES calibration

Transformation | Transform description | Group | Linear units | Information source
H_{f}^{c} | Pose of f relative to c | SE(3) | mm | Camera (camera calibration)
H_{a}^{c} | Pose of a relative to c | SE(3) | mm | Camera (read AprilTag)
H_{g}^{f} | Pose of g relative to f | SE(3) | mm | Static, unknown
H_{by}^{a} | Pose of by relative to a | SE(3) | mm | Static, unknown
Noting that all unknown transformations are static, the solution to “AX = XB” for the Special Euclidean Group (SE(3)) can be applied [47]. The “AX = XB” formulation using notation from this work is provided in the following equation:
(5)   H_{u(i)}^{u(j)} H_{v(i)}^{u(i)} = H_{v(j)}^{u(j)} H_{v(i)}^{v(j)}

Frames u and v are introduced generically to be replaced by the applicable frame labels in this work. Assuming n calibration samples, i and j denote discrete samples taken at unique instances of time (i, j ∈ {1, 2, …, n} and i ≠ j). Thus, H_{u(i)}^{u(j)} represents the pose of frame u in sample i relative to frame u in sample j, and H_{v(i)}^{v(j)} represents the pose of frame v in sample i relative to frame v in sample j. Given that the unknown transformation is static, H_{v(j)}^{u(j)} = H_{v(i)}^{u(i)} = H_{v}^{u}. A summary of the transformations used to recover the static, unknown transformations is provided in Table 4. Error approximations for the recovered transformations can be defined as ±0.7 mm for A and B values derived from MoCap and/or robot feedback. Values leveraging fiducial extrinsics (H_{f}^{c} and H_{a}^{c}) are dependent on camera calibration accuracy.

Table 4: "AX = XB" transformation definitions used to solve for the unknown, static transformations. Note that H_{eu}^{c} and H_{tu}^{c} can use transformations recovered using both the checkerboard fiducial (frame f) and the AprilTag fiducial (frame a).

X ∈ SE(3) (unknown) | A ∈ SE(3) | B ∈ SE(3)
H_{eu}^{tu} = H_{eu(j)}^{tu(j)} = H_{eu(i)}^{tu(i)} | H_{eu(i)}^{eu(j)} = (H_{eu(j)}^{ou})^{-1} H_{eu(i)}^{ou} | H_{tu(i)}^{tu(j)} = (H_{tu(j)}^{w})^{-1} H_{tu(i)}^{w}
H_{ou}^{bu} = H_{ou(j)}^{bu(j)} = H_{ou(i)}^{bu(i)} | H_{ou(i)}^{ou(j)} = H_{eu}^{ou(j)} (H_{eu}^{ou(i)})^{-1} | H_{bu(i)}^{bu(j)} = (H_{bu(j)}^{w})^{-1} H_{bu(i)}^{w}
H_{ey}^{ty} = H_{ey(j)}^{ty(j)} = H_{ey(i)}^{ty(i)} | H_{ey(i)}^{ey(j)} = (H_{ey(j)}^{oy})^{-1} H_{ey(i)}^{oy} | H_{ty(i)}^{ty(j)} = (H_{ty(j)}^{w})^{-1} H_{ty(i)}^{w}
H_{oy}^{by} = H_{oy(j)}^{by(j)} = H_{oy(i)}^{by(i)} | H_{oy(i)}^{oy(j)} = H_{ey}^{oy(j)} (H_{ey}^{oy(i)})^{-1} | H_{by(i)}^{by(j)} = (H_{by(j)}^{w})^{-1} H_{by(i)}^{w}
H_{eu}^{c} = H_{eu(j)}^{c(j)} = H_{eu(i)}^{c(i)} | H_{eu(i)}^{eu(j)} = (H_{eu(j)}^{ou})^{-1} H_{eu(i)}^{ou} | H_{c(i)}^{c(j)} = H_{f}^{c(j)} (H_{f}^{c(i)})^{-1}
  | H_{eu(i)}^{eu(j)} = (H_{eu(j)}^{ou})^{-1} H_{eu(i)}^{ou} | H_{c(i)}^{c(j)} = H_{a}^{c(j)} (H_{a}^{c(i)})^{-1}
H_{tu}^{c} = H_{tu(j)}^{c(j)} = H_{tu(i)}^{c(i)} | H_{tu(i)}^{tu(j)} = (H_{tu(j)}^{w})^{-1} H_{tu(i)}^{w} | H_{c(i)}^{c(j)} = H_{f}^{c(j)} (H_{f}^{c(i)})^{-1}
  | H_{tu(i)}^{tu(j)} = (H_{tu(j)}^{w})^{-1} H_{tu(i)}^{w} | H_{c(i)}^{c(j)} = H_{a}^{c(j)} (H_{a}^{c(i)})^{-1}
H_{g}^{f} = H_{g(j)}^{f(j)} = H_{g(i)}^{f(i)} | H_{g(i)}^{g(j)} = (H_{g(j)}^{w})^{-1} H_{g(i)}^{w} | H_{f(i)}^{f(j)} = (H_{f(j)}^{c})^{-1} H_{f(i)}^{c}
H_{by}^{a} = H_{by(j)}^{a(j)} = H_{by(i)}^{a(i)} | H_{by(i)}^{by(j)} = (H_{by(j)}^{w})^{-1} H_{by(i)}^{w} | H_{a(i)}^{a(j)} = (H_{a(j)}^{c})^{-1} H_{a(i)}^{c}
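For reference, the following MATLAB sketch shows one way the "AX = XB" solution of Ref. [47] could be implemented for relative-motion pairs assembled per Table 4. It is an illustrative implementation under our reading of that method, not the TES calibration code; the function and helper names are placeholders.

```matlab
function X = solveAXXB(A, B)
%SOLVEAXXB  Least-squares recovery of the static X satisfying A{k}*X = X*B{k},
%   where A and B are cell arrays of 4x4 relative-motion pairs (see Table 4),
%   following the rotation-then-translation approach of Park and Martin [47].
n = numel(A);
M = zeros(3);
for k = 1:n
    alpha = vee(real(logm(A{k}(1:3,1:3))));   % so(3) vector of the A rotation
    beta  = vee(real(logm(B{k}(1:3,1:3))));   % so(3) vector of the B rotation
    M = M + beta*alpha.';
end
Rx = sqrtm(M.'*M) \ M.';                      % rotation part: Rx = (M'M)^(-1/2) M'

C = zeros(3*n, 3);  d = zeros(3*n, 1);
for k = 1:n
    C(3*k-2:3*k, :) = A{k}(1:3,1:3) - eye(3); % (R_A - I) t_X = Rx*t_B - t_A
    d(3*k-2:3*k)    = Rx*B{k}(1:3,4) - A{k}(1:3,4);
end
tx = C \ d;                                   % least-squares translation part
X  = [Rx, tx; 0 0 0 1];
end

function v = vee(S)
% Map a 3x3 skew-symmetric matrix to its 3x1 vector representation.
v = [S(3,2); S(1,3); S(2,1)];
end
```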

Though assumed static, environmental factors such as temperature will impact the transformations recovered by the TES calibration. As a result, the TES calibration must be evaluated prior to, and potentially intermittently during, data acquisition using the methods described in Sec. 4.2. In practice, the TES calibration accuracy is evaluated using new images of the checkerboard and AprilTag fiducials, comparing fiducial extrinsics calculated using the recovered static transformations and MoCap measurements against extrinsics recovered by the camera directly. If and when the mean extrinsic errors derived from this evaluation fall outside of the desired accuracy, the TES must be recalibrated before further use.
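A minimal MATLAB sketch of this check, under our reading of the evaluation, is shown below. The transform variables are assumed to be available as 4x4 homogeneous matrices in mm, and the chain used to predict the checkerboard extrinsic from MoCap feedback reflects our interpretation of the frames in Tables 2 and 3.

```matlab
% Compare checkerboard corner positions in the camera frame computed from
% (a) camera-recovered extrinsics and (b) MoCap feedback + static transforms.
% All H_* variables are assumed 4x4 homogeneous transforms in mm.
H_f_c_cam  = H_f_c;                               % extrinsic from camera calibration
H_f_c_pred = H_tu_c / H_tu_w * H_g_w / H_g_f;     % H_tu^c (H_tu^w)^-1 H_g^w (H_g^f)^-1

corners = generateCheckerboardPoints(boardSize, squareSizeMM);   % Nx2 corners, frame f (mm)
P_f = [corners.'; zeros(1, size(corners,1)); ones(1, size(corners,1))];

P_cam  = H_f_c_cam  * P_f;
P_pred = H_f_c_pred * P_f;
meanExtrinsicError = mean(vecnorm(P_cam(1:3,:) - P_pred(1:3,:)));  % mm
```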

3.3 Automated Data Acquisition.

Automated data acquisition with the TES is accomplished using the hardware interfaces described in Sec. 3.1 and with a simulation environment developed as a "digital twin" for the physical TES. The simulation environment utilizes the MATLAB Robotics System Toolbox to model and visualize the UR10 and SIA20F, and to define applicable collision geometries. MoCap feedback and static transformations from calibration (Sec. 3.2) provide relative pose information for the placement of the UR10, SIA20F, fiducial frames, and drogue. A camera simulation based on camera parameters recovered during calibration and fiducial visualizations defined by known design properties are incorporated into the TES simulation using tools from [48–51].

The resultant TES simulation environment provides:

  • Collision detection for the UR10, SIA20F, pedestal, and mounting geometry, and components mounted to end-effectors (i.e., camera and drogue).

  • Forward and inverse kinematics for the UR10 and SIA20F including tool transformation offsets.

  • Information regarding visibility of desired features within the camera's field of view (e.g., checkerboard fiducial, AprilTag fiducial, and drogue features).

  • Simulated camera images for validation and debugging.

Currently, the TES simulation is used to determine if manipulator waypoints are (1) reachable within the workspace of the TES, (2) collision-free, and (3) have a specified feature in the camera field of view. In this preliminary application, the SIA20F is set to a fixed location with the drogue facing the UR10. With the SIA20F fixed, each waypoint is uniquely defined as a joint configuration for the UR10 corresponding to a camera pose in task space. To reduce the search space, candidate waypoints are derived using end-effector positions selected from a circular region of the UR10 workspace. The normal to the plane containing this circular region is defined in Eq. (6), where ẑ^{ou} represents the unit vector z-direction of frame ou and f^{ou} is the center point of the salient feature of interest referenced to frame ou. Using this definition, the circular region is uniquely defined using the region radius, the region center height specified along ẑ^{ou}, and the distance of the region from the UR10 base frame defined along n̂^{ou}.
(6)   n̂^{ou} = (f^{ou} − (f^{ou} · ẑ^{ou}) ẑ^{ou}) / ‖f^{ou} − (f^{ou} · ẑ^{ou}) ẑ^{ou}‖

Given a desired number of samples, features of interest, and a prescribed sampling region, the initial “automated waypoint generation for data acquisition” (daqWaypoints) algorithm is defined in Algorithm 1. The functions isIkin, isCollision, and isVisible used in Algorithm 1 are built into the TES simulation environment providing tools to check if an inverse kinematic solution exists, if the system is in a collision state, and if the features of interest are within the camera field of view.

Once a set of waypoints q is defined, a collision-free path between waypoints can be defined using established methods (e.g., [52]). For this preliminary effort, the defined set of waypoints (defined in joint space) is sorted by distance from a starting configuration, each adjacent pair of sorted waypoints is connected via linear interpolation, and the TES simulation environment evaluates the interpolated points. If no collisions are found, the interpolated points are added to the path. If a collision is found, the second element of the adjacent pair is replaced by the interpolated point prior to the collision. This produces a safe, reachable, collision-free path. However, waypoints replaced in this method do not guarantee that any/all desired features are visible in the camera field of view. The result is a dataset that may be smaller than the value prescribed by the user.
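The sketch below illustrates this interpolate-and-check strategy in MATLAB. It assumes the waypoints q (one column per joint configuration), a starting configuration q0, and the isCollision helper from the TES simulation environment; the interpolation resolution is an arbitrary placeholder.

```matlab
% Connect sorted waypoints with joint-space linear interpolation, keeping
% only interpolated configurations that the TES simulation reports as safe.
[~, order] = sort(vecnorm(q - q0, 2, 1));       % sort waypoints by distance from q0
q = q(:, order);

path = q0;
for k = 1:size(q, 2)
    qA  = path(:, end);
    qB  = q(:, k);
    seg = qA + (qB - qA).*linspace(0, 1, 50);   % 50 interpolated configurations
    for m = 2:size(seg, 2)
        if isCollision(seg(:, m))
            break                               % stop at the point prior to collision
        end
        path(:, end+1) = seg(:, m);             %#ok<AGROW>
    end
end
```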

Algorithm 1. Automated waypoint generation for data acquisition

1: procedure daqWaypoints(n∗, F, C)   ▷ n∗ defines desired samples, F defines features, C defines search region
2:   i ← 0
3:   while i < n∗ do
4:     H_{eu}^{ou} ← randSE(C)
5:     if isIkin(H_{eu}^{ou}) = 0 then continue
6:     end if
7:     q_i ← ikin(H_{eu}^{ou})
8:     if isCollision(q_i) = 1 or isVisible(q_i, F) = 0 then continue
9:     end if
10:    i ← i + 1
11:    q[i] ← q_i
12:   end while
13:   return q   ▷ q defines waypoints in joint space
14: end procedure
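A MATLAB sketch of Algorithm 1 is shown below for illustration. The helpers isIkin, ikin, isCollision, and isVisible are the TES simulation-environment functions named above, while the randSE helper sketched here is an assumed implementation that samples a camera pose from the circular search region of Eq. (6), with the search region C represented as a struct of placeholder fields.

```matlab
function q = daqWaypoints(nStar, F, C)
%DAQWAYPOINTS  Sketch of Algorithm 1: collect reachable, collision-free,
%   feature-visible waypoints. isIkin/ikin/isCollision/isVisible are the TES
%   simulation helpers referenced in the text; randSE is sketched below.
q = [];
i = 0;
while i < nStar
    H_eu_ou = randSE(C);                          % candidate end-effector pose
    if ~isIkin(H_eu_ou), continue, end            % no inverse kinematic solution
    qi = ikin(H_eu_ou);
    if isCollision(qi) || ~isVisible(qi, F), continue, end
    i = i + 1;
    q(:, i) = qi;                                 %#ok<AGROW> joint-space waypoint
end
end

function H = randSE(C)
% Assumed helper: sample a pose with position in the circular region described
% by Eq. (6) and with the camera z-axis pointed at the feature center C.f_ou.
% C is a struct with fields f_ou, radius, height, and distance (placeholders).
n_hat = C.f_ou - dot(C.f_ou, [0;0;1])*[0;0;1];
n_hat = n_hat/norm(n_hat);                        % Eq. (6)
u     = cross([0;0;1], n_hat);                    % in-plane horizontal basis vector
r     = C.radius*sqrt(rand);  phi = 2*pi*rand;    % uniform sample within the circle
p     = C.distance*n_hat + C.height*[0;0;1] + r*(cos(phi)*u + sin(phi)*[0;0;1]);
z_ax  = (C.f_ou - p)/norm(C.f_ou - p);            % point the camera at the feature
x_ax  = cross([0;0;1], z_ax);  x_ax = x_ax/norm(x_ax);
y_ax  = cross(z_ax, x_ax);
H     = [x_ax, y_ax, z_ax, p; 0 0 0 1];
end
```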

With a collision-free path defined, the calibrated TES is capable of providing several options for ground-truth correspondence. Some examples include:

  1. Project drogue features into image space following Eqs. (1) and (4)

  2. Project drogue features into image space following Eqs. (1) and (4), and define bounding boxes for specific features

  3. Define a full or reduced parametrization describing the pose of the drogue relative to the camera frame following Eq. (1) (drogue position and yaw/pitch/roll relative to the camera frame, drogue yaw/pitch relative to the camera frame, drogue position only relative to the camera frame, etc.)

  4. Define a full or reduced parametrization describing the pose of the drogue relative to a user-defined frame fixed to the camera frame using an extension of Eq. (1) (e.g., the refueling probe frame)

Beyond the options described above, combinations including image projections and full/reduced parametrization of the relative pose are possible. The advantage of the TES over existing methods is the breadth of information available. This provides flexibility in labeling modality.
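As an example of option (2), an axis-aligned bounding box label can be derived directly from the projected pixel coordinates; a minimal sketch, assuming the x_m and y_m values from the projection sketch in Sec. 3.1, is:

```matlab
% Derive an axis-aligned bounding box label for a projected feature set from
% its pixel coordinates x_m, y_m (see the projection sketch in Sec. 3.1).
xMin = min(x_m);  xMax = max(x_m);
yMin = min(y_m);  yMax = max(y_m);
bbox = [xMin, yMin, xMax - xMin, yMax - yMin];   % [x, y, width, height] in pixels
```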

4 Results

This section reviews the calibration and validation of the TES. Section 4.1 describes results from the automated data acquisition methods presented in Sec. 3.3; Sec. 4.2 describes calibration results and noise considerations; and Sec. 4.3 presents the preliminary autolabeling results for the TES.

4.1 Automated Data Acquisition.

Following a coarse initial calibration of the TES, the automated data acquisition approach described in Sec. 3.3 was used to capture 2500 waypoints with the features of interest defined as the corners of the calibration checkerboard, and 2500 waypoints with the features of interest defined as the corners of the AprilTag fixed to the SIA20F base frame. The total of 2500 waypoints for each dataset was selected to limit the data acquisition time to approximately 4 h per dataset. The 4-h approximation assumes a conservative 6-s average acquisition rate per image, limited by a conservative move-to-waypoint, stop, capture-image, and capture-pose data acquisition strategy. The 4-h per dataset limit was chosen to restrict the test duration to a total of 8 h of supervised operation.

A collision-free path was generated for each set of waypoints using the method described in Sec. 3.3. The waypoints and collision-free paths generated for both the checkerboard and AprilTag fiducials are shown in Fig. 4 (left), and the joint space paths and waypoints missed when generating the collision-free path are shown in Fig. 4 (right). Of the 2500 waypoints generated, 2367 checkerboard waypoints were reached using the collision-free path, and 1442 AprilTag waypoints were reached using the collision-free path with unreachable waypoints replaced by the closest reachable configuration. The data acquisition time using the TES to capture checkerboard images was approximately 3 h and 40 min for the 2500 waypoints associated with the checkerboard (averaging approximately 5.3 s per image). The data acquisition time using the TES to capture AprilTag images was approximately 3 h and 30 min for the 2500 waypoints associated with the AprilTag (averaging approximately 5.0 s per image).

Fig. 4: Automated data acquisition results highlighting task-space waypoints and collision-free paths overlaid on the TES simulation with the SIA20F visualization suppressed (left); and the joint-space collision-free paths with highlighted missed waypoints (right). Checkerboard paths are highlighted in blue, and AprilTag paths are highlighted in red.

The resultant dataset yielded 2326 viable checkerboard image/pose correspondences and 1613 AprilTag image/pose correspondences. Note that the decrease in viable image/pose correspondences for the checkerboard dataset indicates that 41 images contained a partial or obstructed checkerboard view, and the increase in viable image/pose correspondences for the AprilTag dataset indicates that 171 “missed-waypoints” yielded an in-view AprilTag.

4.2 Calibration Results.

The image/pose correspondences described in Sec. 4.1 were separated into calibration and validation subsets. Camera calibration was performed using 1163 (odd index values) of the 2326 checkerboard images collected, and evaluated against the remaining 1163 images not used for calibration (even index values). Both the calibration and evaluation datasets yielded a mean reprojection error of 0.10 pixels. Using tools from the MATLAB Camera Calibration toolbox, the calibration processing time was approximately 2 h and 20 min for each of the 1163 image datasets. These results were achieved using a Windows 10 operating system running on a PC with a 3.80 GHz processor (Intel Xeon Processor E3-1270 v6 8M Cache), 32 GB RAM, and an NVIDIA Quadro P1000 graphics card. The MATLAB version used was R2021b. The camera calibration results are shown in Fig. 5.

Fig. 5: Reprojection error associated with fiducial extrinsics recovered using camera calibration. Calibration reprojection error is shown in green with the mean error highlighted in yellow. The evaluation reprojection error is shown in blue, with the mean shown in magenta. Images with "near-redundant" extrinsics are highlighted in gray.

Using the checkerboard and AprilTag fiducial extrinsics recovered with the camera calibration parameters and the AprilTag library, the "AX = XB" solution was applied to recover H_{g}^{f}, H_{by}^{a}, and H_{tu}^{c}. Using MATLAB R2021b on the same Windows 10 PC used for camera calibration, the processing time for AprilTag pose recovery and the "AX = XB" solution was approximately 40 min for the 807-image calibration dataset, and the AprilTag pose recovery for the 806-image evaluation dataset took approximately 10 min.

Figure 6 shows the "mean extrinsic error" associated with the static transformations H_{g}^{f} and H_{tu}^{c}, and Fig. 7 shows the "mean extrinsic error" associated with the static transformations H_{by}^{a} and H_{tu}^{c}. The term "mean extrinsic error" in this context is defined as the mean error between the 3D positions of the checkerboard or AprilTag corners defined using extrinsics recovered with camera calibration or the AprilTag library, and the positions of the corners defined using MoCap and the recovered static transformations. For both the checkerboard and AprilTag static transformation recovery, the use of odd index value images for calibration yields substantial mean extrinsic error for both calibration and evaluation (Figs. 6 and 7, top). To account both for the redundant images highlighted in Fig. 5 and for possible non-Gaussian noise associated with the VRPN MoCap feedback, calibration using 6000 individual subsets of 100 randomly selected images was explored. The lowest-error calibration results from the checkerboard and AprilTag random subsets are shown in Figs. 6 and 7, bottom.
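The random-subset search can be summarized by the MATLAB sketch below. The solveAXXB function follows the earlier sketch, while buildAB and meanExtrinsicError are assumed helpers that form the relative-motion pairs of Table 4 and evaluate the mean extrinsic error metric of Sec. 3.2 for a given image subset.

```matlab
% Search random 100-image subsets for the lowest-error "AX = XB" calibration.
% buildAB and meanExtrinsicError are assumed helpers (Table 4 pairs and the
% Sec. 3.2 evaluation metric); solveAXXB follows the earlier sketch.
bestErr = inf;
for s = 1:6000
    idx    = randperm(nImages, 100);        % 100 randomly selected images
    [A, B] = buildAB(idx);
    X      = solveAXXB(A, B);
    err    = meanExtrinsicError(X, idx);
    if err < bestErr
        bestErr = err;  bestX = X;  bestIdx = idx;   % keep the lowest-error subset
    end
end
```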

Fig. 6: Mean extrinsic error associated with the recovery of the checkerboard static transformations (H_{g}^{f} and H_{tu}^{c}) using the "AX = XB" solution with odd index value images for calibration (top), and the lowest-error 100-image subset calibration (bottom)

Fig. 7: Mean extrinsic error associated with the recovery of the AprilTag static transformations (H_{by}^{a} and H_{tu}^{c}) using the "AX = XB" solution with odd index value images for calibration (top), and the lowest-error 100-image subset calibration (bottom)

4.3 Auto-Labeling Results.

Autolabeling of the drogue is performed using the recovered value of H_{tu}^{c} (Sec. 4.2), MoCap feedback defining H_{tu}^{w} and H_{ty}^{w}, camera intrinsics A_{c}^{m}, a user-defined approximation of H_{ty}^{d}, and a user-defined set of salient drogue features P^d. For this work, the drogue features consist of a 609.9 mm (24 in) diameter circle offset 596.9 mm (23.5 in) from a coaxial 101.6 mm (4 in) diameter circle. These circles define the drogue's inflated cloth drag component and coupler. To show orientation discrepancies associated with this approach, circles are connected using 36 equally spaced segments approximating the 3D "spokes" connecting the cloth drag component to the coupler.
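The salient feature set P^d described above can be generated with a few lines of MATLAB; the sketch below builds the two circles and the connecting "spokes" from the stated dimensions (the sign of the axial offset is an assumption).

```matlab
% Build the salient drogue features P^d (mm, frame d): a 609.9 mm diameter
% canopy circle offset 596.9 mm along the axis from a coaxial 101.6 mm
% diameter coupler circle, with 36 "spoke" segments connecting the circles.
theta   = linspace(0, 2*pi, 37);  theta(end) = [];          % 36 evenly spaced angles
canopy  = [(609.9/2)*cos(theta); (609.9/2)*sin(theta);  zeros(1, 36)];
coupler = [(101.6/2)*cos(theta); (101.6/2)*sin(theta); -596.9*ones(1, 36)];

spokes = cell(1, 36);
for k = 1:36
    spokes{k} = [canopy(:, k), coupler(:, k)];              % 3D segment endpoints
end
P_d = [canopy, coupler];                                    % feature points in frame d
```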

The salient features are then projected onto corresponding images using Eqs. (1), (3), and (4). Figure 8 shows projected salient features overlaid onto seven captured images of the drogue.

Fig. 8: User-defined salient features projected onto captured images of the drogue using MoCap feedback, calibrated values for H_{tu}^{c} and A_{c}^{m}, and a user-defined H_{ty}^{d}

5 Conclusion and Future Work

The automated data acquisition results presented in Sec. 4.1 show that the methods presented in Sec. 3.3 provide a viable approach to safely acquiring large datasets using the TES and the TES simulation environment. Of the 2500 desired waypoints for each dataset, >93% of checkerboard waypoints and >64% of AprilTag waypoints yielded viable images. This discrepancy between the desired number of waypoints and the number of waypoints yielding viable images suggests a need to improve the efficiency of the data acquisition approach. As described in Sec. 3.3, future work will implement collision-free motion planning approaches from the literature, and alternatives to the random sampling approach will be explored.

Data acquisition times for the checkerboard and AprilTag datasets were approximately 5.3 and 5.0 s per image, respectively. The variation in acquisition times results largely from the waypoint spacing in the random sampling defined by Algorithm 1 and the replacement of waypoints during the collision-free motion planning process. The data acquisition time for the TES can be further reduced by refining the waypoint order to minimize the total distance traveled and by removing waypoints where the fiducial of interest is not in the field of view. These improvements will be addressed in future work.

Calibration processing is time-consuming and does not scale well as the image sets become very large. For camera calibration, processing times were approximately 2 h and 20 min for each 1163-image dataset. For AprilTag calibration, this time reduces to approximately 40 min for 807 images, and for AprilTag evaluation this time drops to approximately 10 min. While this processing time is extensive, calibration and evaluation datasets can be processed "overnight" without the operator supervision required during data acquisition to ensure hardware safety. This processing time can be reduced in the near term by migrating the calibration from a MATLAB environment to a compiled language (e.g., C++).

The camera calibration results presented in Sec. 4.2 suggest that the data acquisition capabilities of the TES provide excellent camera calibration: 0.10 pixel reprojection error for both the calibration and evaluation data for 1280 × 960 pixel images. Further analysis of this calibration and evaluation data shows that >21% of the images collected using the automated data acquisition technique are effectively redundant (i.e., providing a fiducial pose relative to the camera frame that nearly matches the pose of an existing waypoint). This further emphasizes the need to improve the automated data acquisition efficiency to ensure a wide variety of unique samples within the workspace of the TES.

Results from static transformation recovery described in Sec. 4.2 highlight an oversight in both the TES design and the automated data acquisition approach described in this work. Specifically, the use of odd index value images for both the checkerboard and AprilTag fiducials (Figs. 6 and 7, top) yields unacceptably high mean calibration and evaluation extrinsic errors, with values exceeding 305 mm (12 in), while the use of 100-image subsets for calibration drastically reduces both calibration and evaluation extrinsic errors (Figs. 6 and 7, bottom).

Rerunning portions of the automated data acquisition on the TES shows intermittent, discontinuous "jumps" in the poses reported by the VRPN MoCap interface, primarily in H_{g}^{w}. Viewing the same rerun of the automated data acquisition using the native manufacturer software interface (Motive) shows that markers associated with the MoCap camera rigid body (frame g) are intermittently lost and the rigid body is either not tracked or assigned an incorrect pose. The presence of this extraneous MoCap data within the data collected for both the checkerboard and AprilTag fiducials explains the improvement seen with the 100-image subset approach to calibration. Further, identifying 214 (<10%) of the checkerboard images and 171 (<12%) of the AprilTag images as outliers reduces the evaluation extrinsic error to <3.4 mm (<0.13 in) for the checkerboard fiducial and <4.1 mm (<0.16 in) for the AprilTag fiducial. While these final errors are >10 times the ±0.3 mm estimated in the TES design (Sec. 3.1), they are well within the ground-truth tracking accuracy required for the application proposed in this work.

To improve static transformation recovery during calibration, three near-term methods are proposed for future work: (1) improve the camera MoCap rigid body design by increasing marker spacing, decreasing symmetry in marker placement, and defining a marker placement that can be tracked reliably in all orientations; (2) leverage MoCap manufacturer software tools (e.g., NatNet SDK) to capture individual marker visibility information during automated data acquisition; and (3) record native MoCap sessions during automated data acquisition for outlier detection in postprocessing. These improvements should improve TES data quality by providing the information necessary to automatically remove erroneous tracking information. Beyond these near-term improvements, future work will explore extensions to the automated waypoint generation for the data acquisition algorithm (Algorithm 1) to enable waypoint generation throughout the TES workspace. Instead of searching arbitrarily defined subregions to reduce computation overhead, these future methods will explore alternative constraints such as uniform endpoint spacing in task space, movement of feature points throughout the camera's field of view, etc. This proposed improvement to automated waypoint generation will provide a more complete exploration of the viable data acquisition space and may provide improvements when used to augment the TES calibration.

The autolabeling results shown in Fig. 8 qualitatively demonstrate the performance of the TES in the context of aerial refueling. Both the inflated cloth drag component and coupler of the drogue appear to align well with the user-defined model despite the poor lighting of the coupler in the collected images. The added overlay of the drogue “spokes” shows marginal misalignment along the drogue's center axis. Future work will replace the manual definition of Htyd and Pd with values defined using a MoCap digitizing wand to trace drogue features. This will eliminate the manual definition of values by the user and should improve autolabeling accuracy.

The TES presented in this work is a viable tool for generating large, autolabeled datasets of an aerial refueling drogue. Results demonstrate calibration methods providing ground-truth measurements with a mean precision of ±4.1 mm, and autolabeling of drogue images appears to be qualitatively accurate. Unlike traditional methods for establishing ground-truth labeling of images used to train a DNN, use of the TES (1) requires no time-consuming manual labeling or manual evaluation of ground truth and (2) provides a quantitative performance metric describing the precision of labeling in linear units. While this preliminary work was conducted under operator supervision to ensure hardware safety, supervision requirements can be relaxed or removed as the TES is further refined. Future work is proposed to address the identified limitations of and improvements to the system.

References

1. Tucker, P., 2021, "Drones Could One Day Make Up 40 Percent of a Carrier Air Wing, Navy Says," accessed Dec. 27, 2023, https://www.defenseone.com/technology/2021/03/drones-could-one-day-make-40-carrier-air-wing-navy-says/172799/
2. Shelbourne, M., 2021, "MQ-25A Unmanned Prototype Now on Carrier George H.W. Bush for at-Sea Testing," accessed Dec. 27, 2023, https://news.usni.org/2021/12/02/mq-25a-unmanned-prototype-now-on-carrier-george-h-w-bush-for-at-sea-testing
3. Costello, D., and Xu, H., 2023, "Using a Run Time Assurance Approach for Certifying Autonomy Within Naval Aviation," Syst. Eng., 26(3), pp. 271–278. 10.1002/sys.21654
4. Schweikhard, K., 2006, "Results of NASA/DARPA Automatic Probe and Drogue Refueling Flight Test," SAE Guidance and Control Subcommittee Meeting, SAE, Williamsburg, VA.
5. Northrop Grumman, 2015, "X-47B Unmanned Aircraft Demonstrates the First Autonomous Aerial Refueling," accessed Dec. 27, 2023, https://www.northropgrumman.com/what-we-do/air/x-47b-ucas
6. Zhang, S., Jafari, O., and Nagarkar, P., 2021, "A Survey on Machine Learning Techniques for Auto Labeling of Video, Audio, and Text Data," arXiv:2109.03784. 10.48550/arXiv.2109.03784
7. Ross, B., Mauldin, C., Parry, J., and Costello, D., 2022, "First Steps Toward Certifying an UAS to Receive Fuel Airborne," Proceedings of the International Conference on Unmanned Aircraft Systems, IEEE, Dubrovnik, Croatia, June 21–24. 10.1109/ICUAS54217.2022.9836174
8. Nabavinejad, S. M., Reda, S., and Ebrahimi, M., 2022, "Coordinated Batching and DVFS for DNN Inference on GPU Accelerators," IEEE Trans. Parallel Distrib. Syst., 33(10), pp. 2496–2508. 10.1109/TPDS.2022.3144614
9. Onim, M. S. H., Nyeem, H., Roy, K., Hasan, M., Ishmam, A., Akif, M. A. H., and Ovi, T. B., 2022, "BLPnet: A New DNN Model and Bengali OCR Engine for Automatic Licence Plate Recognition," Array (New York), 15, p. 100244. 10.1016/j.array.2022.100244
10. Deng, X., Qin, W., Zhang, R., and Qi, Y., 2019, "Automatic Segmentation Algorithm of License Plate Image Based on PCNN and DNN," 2019 International Conference on Image and Video Processing, and Artificial Intelligence, SPIE, Taipei, Taiwan, Sept. 22–29, p. 1132102. 10.1117/12.2537280
11. Fuertes, D., del Blanco, C. R., Carballeira, P., Jaureguizar, F., and García, N., 2022, "People Detection With Omnidirectional Cameras Using a Spatial Grid of Deep Learning Foveatic Classifiers," Digital Signal Processing, 126, p. 103473. 10.1016/j.dsp.2022.103473
12. Kim, M., and Lee, I., 2022, "Human-Guided Auto-Labeling for Network Traffic Data: The GELM Approach," Neural Networks, 152, pp. 510–526. 10.1016/j.neunet.2022.05.007
13. Elezi, I., Yu, Z., Anandkumar, A., Leal-Taixe, L., and Alvarez, J. M., 2021, "Not All Labels Are Equal: Rationalizing the Labeling Costs for Training Object Detection," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, June 19–25, pp. 14492–14501.
14. Ganeshan, A., Vallet, A., Kudo, Y., Maeda, S.-I., Kerola, T., Ambrus, R., Park, D., and Gaidon, A., 2021, "Warp-Refine Propagation: Semi-Supervised Auto-Labeling Via Cycle-Consistency," 2021 IEEE/CVF International Conference on Computer Vision (ICCV), IEEE, Oct. 10–17, pp. 15479–15489.
15. Elezi, I., Yu, Z., Anandkumar, A., Leal-Taixe, L., and Alverez, J., 2021, "Towards Reducing Labeling Cost in Deep Object Detection," accessed Dec. 27, 2023, https://authors.library.caltech.edu/110652/1/2106.11921.pdf
16. Alsammed, S. M. Z. A., 2021, "Implementation of Lung Cancer Diagnosis Based on DNN in Healthcare System," Webology, 18(Special Issue 04), pp. 798–812. 10.14704/WEB/V18SI04/WEB18166
17. Surendar, P., 2021, "Diagnosis of Lung Cancer Using Hybrid Deep Neural Network With Adaptive Sine Cosine Crow Search Algorithm," J. Comput. Sci., 53, p. 101374. 10.1016/j.jocs.2021.101374
18. Tan, H., Bates, J. H. T., and Matthew Kinsey, C., 2022, "Discriminating TB Lung Nodules From Early Lung Cancers Using Deep Learning," BMC Med. Inf. Decis. Making, 22(1), pp. 1–161. 10.1186/s12911-022-01904-8
19. Arumuga Maria Devi, T., and Mebin Jose, V. I., 2021, "Three Stream Network Model for Lung Cancer Classification in the CT Images," Open Comput. Sci., 11(1), pp. 251–261. 10.1515/comp-2020-0145
20. Song, Q., Zhao, L., Luo, X., and Dou, X., 2017, "Using Deep Learning for Classification of Lung Nodules on Computed Tomography Images," J. Healthcare Eng., 2017, pp. 1–7. 10.1155/2017/8314740
21. Kim, B., and Lee, J., 2021, "A Bayesian Network-Based Information Fusion Combined With DNNs for Robust Video Fire Detection," Appl. Sci., 11(16), p. 7624. 10.3390/app11167624
22. Park, M., and Ko, B. C., 2020, "Two-Step Real-Time Night-Time Fire Detection in an Urban Environment Using Static ELASTIC-YOLOv3 and Temporal Fire-Tube," Sensors (Basel, Switzerland), 20(8), p. 2202. 10.3390/s20082202
23. Khan, A., Hassan, B., Khan, S., Ahmed, R., and Abuassba, A., 2022, "DeepFire: A Novel Dataset and Deep Transfer Learning Benchmark for Forest Fire Detection," Mobile Inf. Syst., 2022, pp. 1–14. 10.1155/2022/5358359
24. Kocić, J., Jovičić, N., and Drndarević, V., 2019, "An End-to-End Deep Neural Network for Autonomous Driving Designed for Embedded Automotive Platforms," Sensors (Basel, Switzerland), 19(9), p. 2064. 10.3390/s19092064
25. Gupta, A., Anpalagan, A., Guan, L., and Khwaja, A. S., 2021, "Deep Learning for Object Detection and Scene Perception in Self-Driving Cars: Survey, Challenges, and Open Issues," Array (New York), 10, p. 100057. 10.1016/j.array.2021.100057
26. Tian, Y., Pei, K., Jana, S., and Ray, B., 2018, "DeepTest: Automated Testing of Deep-Neural-Network-Driven Autonomous Cars," 2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE '18), ACM, Gothenburg, Sweden, May 27–June 3, pp. 303–314. 10.1145/3180155.3180220
27. Lee, A., Dallmann, W., Nykl, S., Taylor, C., and Borghetti, B., 2020, "Long-Range Pose Estimation for Aerial Refueling Approaches Using Deep Neural Networks," J. Aerosp. Inf. Syst., 17(11), pp. 634–646. 10.2514/1.I010842
28. Johnson, D. T., Nykl, S. L., and Raquet, J. F., 2017, "Combining Stereo Vision and Inertial Navigation for Automated Aerial Refueling," J. Guid., Control, Dyn., 40(9), pp. 2250–2259. 10.2514/1.G002648
29. Paulson, Z., Nykl, S., Pecarina, J., and Woolley, B., 2019, "Mitigating the Effects of Boom Occlusion on Automated Aerial Refueling Through Shadow Volumes," J. Def. Model. Simul., 16(2), pp. 175–189. 10.1177/1548512918808408
30. Piekenbrock, M., Robinson, J., Burchett, L., Nykl, S., Woolley, B., and Terzuoli, A., 2016, "Automated Aerial Refueling: Parallelized 3D Iterative Closest Point: Subject Area: Guidance and Control," 2016 IEEE National Aerospace and Electronics Conference (NAECON) and Ohio Innovation Summit (OIS), IEEE, Dayton, OH, July 26–29, pp. 188–192. 10.1109/NAECON.2016.7856797
31. Parsons, C., Paulson, Z., Nykl, S., Dallman, W., Woolley, B. G., and Pecarina, J., 2019, "Analysis of Simulated Imagery for Real-Time Vision-Based Automated Aerial Refueling," J. Aerosp. Inf. Syst., 16(3), pp. 77–93. 10.2514/1.I010658
32. Valasek, J., Gunnam, K., Kimmett, J., Junkins, J. L., Hughes, D., and Tandale, M. D., 2005, "Vision-Based Sensor and Navigation System for Autonomous Air Refueling," J. Guid., Control, Dyn., 28(5), pp. 979–989. 10.2514/1.11934
33. Tandale, M. D., Bowers, R., and Valasek, J., 2006, "Trajectory Tracking Controller for Vision-Based Probe and Drogue Autonomous Aerial Refueling," J. Guid., Control, Dyn., 29(4), pp. 846–857. 10.2514/1.19694
34. Ren, J., Dai, X., Quan, Q., Zi-Bo, W., and Kai-Yuan, C., 2019, "Reliable Docking Control Scheme for Probe-Drogue Refueling," J. Guid., Control, Dyn., 42(11), pp. 2511–2520. 10.2514/1.G003708
35. Valasek, J., Famularo, D., and Marwaha, M., 2017, "Fault-Tolerant Adaptive Model Inversion Control for Vision-Based Autonomous Air Refueling," J. Guid., Control, Dyn., 40(6), pp. 1336–1347. 10.2514/1.G001888
36. Wang, X., Kong, X., Zhi, J., Chen, Y., and Dong, X., 2015, "Real-Time Drogue Recognition and 3D Locating for UAV Autonomous Aerial Refueling Based on Monocular Machine Vision," Chin. J. Aeronaut., 28(6), pp. 1667–1675. 10.1016/j.cja.2015.10.006
37. Costello, D., and Adams, R., 2023, "A Framework for Airworthiness Certification of Autonomous Systems Within United States Naval Aviation," J. Aviat., 7(1), pp. 7–16. 10.30518/jav.1161725
38. Parry, J., Costello, D., Rupert, J., and Taylor, G., 2023, "The National Airworthiness Council Artificial Intelligence Working Group (NACAIWG) Summit Proceedings 2022," Syst. Eng., 26(6), pp. 925–930. 10.1002/sys.21703
39. Costello, D., and Xu, H., 2022, "Run Time Assurance Approach to Certifying Autonomy Within Naval Aviation: Possible Method to Certify w/o a Human in or on the Loop," XPotential 2022, Orlando, FL, Apr. 22–25.
40. ATP-3.3.4.2(D), U.S. Standards Related Document (SRD), 2022, "NATO Joint Airpower Competence Centre," accessed July 20, 2023, https://coi.japcc.org/app/uploads/US-National-SRD.pdf
41. Edwards, S., 2022, "Motoman," accessed July 20, 2023, http://wiki.ros.org/motoman
42. Anderson, T., and Rasmussen, S., 2022, "ur_modern_driver," accessed July 20, 2023, http://wiki.ros.org/ur_modern_driver
43. Bovbel, P., 2022, "vrpn_client_ros," accessed July 20, 2023, http://wiki.ros.org/vrpn_client_ros
44. Helmich, H., 2022, "RosYaskawaToolbox," accessed July 20, 2023, https://github.com/hfhelmich/RosYaskawaToolbox
45. Helmich, H., 2022, "RosURToolbox," accessed July 20, 2023, https://github.com/hfhelmich/RosURToolbox
46. Helmich, H., 2022, "RosVRPNToolbox," accessed July 20, 2023, https://github.com/hfhelmich/RosVRPNToolbox
47. Park, F., and Martin, B., 1994, "Robot Sensor Calibration: Solving AX=XB on the Euclidean Group," IEEE Trans. Rob. Autom., 10(5), pp. 717–721. 10.1109/70.326576
48. Kutzer, M., 2022, "Transformation Toolbox for MATLAB," accessed July 20, 2023, https://github.com/kutzer/TransformationToolbox
49. Kutzer, M., 2022, "Patch Toolbox for MATLAB," accessed July 20, 2023, https://github.com/kutzer/PatchToolbox
50. Kutzer, M., 2022, "Geometry Toolbox for MATLAB," accessed July 20, 2023, https://github.com/kutzer/GeometryToolbox
51. Kutzer, M., 2022, "Plotting Toolbox for MATLAB," accessed July 20, 2023, https://github.com/kutzer/PlottingToolbox
52. Kimmel, A., Shome, R., Littlefield, Z., and Bekris, K., 2019, "Fast, Anytime Motion Planning for Prehensile Manipulation in Clutter," arXiv:1806.07465. 10.48550/arXiv.1806.07465