Graphical Abstract Figure

Abstract

The evolution of multimodal large language models (LLMs) capable of processing diverse input modalities (e.g., text and images) opens new prospects for their application in engineering design, such as the generation of 3D computer-aided design (CAD) models. However, little is known about the ability of multimodal LLMs to generate 3D design objects, and quantitative assessments are lacking. In this study, we develop an approach that enables LLMs to generate 3D CAD models (i.e., LLM4CAD) and conduct experiments to evaluate its efficacy, using GPT-4 and GPT-4V as examples. To address the scarcity of data for multimodal LLM studies, we create a data synthesis pipeline that generates CAD models, sketches, and image data of typical mechanical components (e.g., gears and springs) and collect their natural language descriptions, including dimensional information, via Amazon Mechanical Turk. We position the CAD program (a programming script for CAD design) as a bridge that converts the LLMs' textual output into tangible CAD design objects. We focus on two critical capabilities: the generation of syntactically correct CAD programs (Cap1) and the accuracy of the parsed 3D shapes (Cap2), quantified by intersection over union (IoU). The results show that both GPT-4 and GPT-4V demonstrate great potential for 3D CAD generation by leveraging only their zero-shot learning ability. Specifically, on average, GPT-4V performs better with text-only input than with multimodal inputs, such as text with images, for both Cap1 and Cap2. However, when examining the category-specific results for individual mechanical components, the advantage of multimodal inputs becomes increasingly evident for more complex geometries (e.g., springs and gears) in both Cap1 and Cap2. The potential of multimodal LLMs to improve 3D CAD generation is clear, but their application must be carefully calibrated to the complexity of the target CAD models to be generated.
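To make the two capabilities concrete, the following minimal Python sketch illustrates one way such an evaluation could be instrumented; it is not the paper's actual pipeline. Cap1 is approximated here by whether a generated CAD script executes without error, and Cap2 by the IoU of two boolean voxel grids that are assumed to be pre-aligned on a common grid. The function names and the bare exec-based harness are hypothetical stand-ins.

import numpy as np

def runs_without_error(cad_script: str) -> bool:
    # Cap1 proxy (illustrative): does a generated CAD program execute cleanly?
    # A real harness would sandbox execution and provide the target CAD library;
    # an empty namespace stands in for that environment here.
    try:
        exec(cad_script, {})  # hypothetical setup; assumes a self-contained script
        return True
    except Exception:
        return False

def voxel_iou(pred: np.ndarray, true: np.ndarray) -> float:
    # Cap2 proxy (illustrative): IoU of two boolean voxel grids of equal shape,
    # assuming both shapes were voxelized on the same, pre-aligned grid.
    intersection = np.logical_and(pred, true).sum()
    union = np.logical_or(pred, true).sum()
    return float(intersection) / float(union) if union > 0 else 0.0

Under this reading, an IoU of 1.0 means the parsed shape matches the ground-truth geometry exactly on that grid, while 0.0 means no overlap.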
