Prompt Engineering: A Bibliometric Analysis
DOI: https://doi.org/10.5281/zenodo.18213819
Keywords: In-context learning, Natural language processing, Publication trends, Research collaboration, Thematic analysis
Abstract
Prompt engineering addresses the design and optimization of natural language instructions for controlling large language model behavior, representing a critical domain within artificial intelligence and human-computer interaction. Despite explosive research growth following the widespread adoption of large language models, systematic analyses of the field's intellectual structure, publication patterns, and collaborative networks remain limited. This study conducted a comprehensive bibliometric analysis of prompt engineering research from 2020 to 2025 using the Web of Science Core Collection as the data source. A systematic search strategy retrieved 4,890 publications from 1,538 sources, authored by 16,052 researchers across computer science and artificial intelligence domains. The analysis employed the Bibliometrix package (version 4.3.5) in R (version 4.5.1) to examine publication trends, author productivity, institutional contributions, geographic distribution, thematic structure, collaboration patterns, and citation impact through performance analysis, keyword co-occurrence networks, and science mapping techniques. The field demonstrated rapid expansion, with a 125.1% annual growth rate across three developmental phases: an emergence phase (2020-2021), an acceleration phase (2022-2023), and an explosion phase (2024-2025), during which output reached 2,093 publications annually. The Chinese Academy of Sciences led institutional productivity with 345 publications, while China dominated national output with 1,584 documents, representing 32.67% of the corpus. Geographic analysis revealed quality-quantity trade-offs: Singapore achieved the highest average citation impact (37.37 citations per document) despite modest volume. Author analysis identified Zhang Y as the most productive author (43 publications), while collaboration metrics indicated 4.9 co-authors per document and a 26.44% international co-authorship rate.
Keyword analysis revealed "large language models" (946 occurrences) and "prompt engineering" (733 occurrences) as the dominant themes, organized into three distinct thematic clusters: core prompting methodologies, machine learning foundations, and application domains. Network visualization confirmed the integration of few-shot learning, chain-of-thought prompting, and in-context learning techniques into large language model applications. IEEE Access dominated publication venues with 176 articles, while natural language processing and machine learning conferences (ACL, EMNLP, NeurIPS) emerged as primary dissemination channels. Citation analysis identified foundational contributions in instruction following and chain-of-thought reasoning alongside contemporary methodological innovations. The findings show that prompt engineering has rapidly crystallized into a distinct research domain emphasizing practical techniques over theoretical foundations, while concentration around specific models points to a fragmentation risk and a need for unified frameworks that transcend particular implementations.
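The keyword co-occurrence networks described above rest on a simple counting step: every document contributes one edge for each pair of keywords it lists, and edge weights accumulate across the corpus. A minimal Python sketch of that step is below; the three keyword lists are hypothetical toy data, not the study's Web of Science corpus, and the function name is illustrative.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_edges(docs):
    """Count how often each pair of keywords appears in the same document."""
    edges = Counter()
    for keywords in docs:
        # Deduplicate and sort so ("a", "b") and ("b", "a") map to one edge.
        for pair in combinations(sorted(set(keywords)), 2):
            edges[pair] += 1
    return edges

# Hypothetical keyword lists for three documents.
docs = [
    ["large language models", "prompt engineering", "few-shot learning"],
    ["large language models", "chain-of-thought", "prompt engineering"],
    ["in-context learning", "large language models"],
]

edges = cooccurrence_edges(docs)
# The heaviest edge links the corpus's two most frequent keywords.
print(edges.most_common(1))
```

In a full analysis this edge list would feed a network layout and clustering step (the role Bibliometrix's science-mapping routines play in the study); the counting logic itself is all that is shown here.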
References
Aria, M., & Cuccurullo, C. (2017). bibliometrix: An R-tool for comprehensive science mapping analysis. Journal of Informetrics, 11(4), 959–975. https://doi.org/10.1016/j.joi.2017.08.007
Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D. M., Wu, J., Winter, C., . . . Amodei, D. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33, 1877–1901. https://proceedings.neurips.cc/paper/2020/hash/1457c0d6bfcb4967418bfb8ac142f64a-Abstract.html
Cheong, M. (2025). ChatGPT's performance evaluation in spreadsheets modelling to inform assessments redesign. Journal of Computer Assisted Learning, 41(3), e70035. https://doi.org/10.1111/jcal.70035
Dahl, D. B., Scott, D., Roosen, C., Magnusson, A., & Swinton, J. (2019). xtable: Export Tables to LaTeX or HTML (Version 1.8-4) [R package]. https://CRAN.R-project.org/package=xtable
Hsieh, P., & Lee, W. (2025). A particle swarm optimization-based approach coupled with large language models for prompt optimization. Expert Systems, 42(6), e70049. https://doi.org/10.1111/exsy.70049
Jovanovic, M., & Voss, P. (2025). Towards incremental learning in large language models: A critical review. Expert Systems, 42(10), e70127. https://doi.org/10.1111/exsy.70127
Lizarraga, A., Honig, E., & Wu, Y. N. (2025). From stochastic parrots to digital intelligence: The evolution of language models and their cognitive capabilities. WIREs Computational Statistics, 17(3), e70035. https://doi.org/10.1002/wics.70035
Nguyen-Duc, A., Cabrero-Daniel, B., Przybylek, A., Arora, C., Khanna, D., Herda, T., Rafiq, U., Melegati, J., Guerra, E., Kemell, K.-K., Saari, M., Nguyen, C., Yousuf, O., Matthies, C., Rafiq, A., Becker, S., Muñoz, R. F., Chanin, R., Sales, A., . . . Abrahamsson, P. (2025). Generative artificial intelligence for software engineering—A research agenda. Software: Practice and Experience, 55(11), 1806–1843. https://doi.org/10.1002/spe.70005
Pranckutė, R. (2021). Web of Science (WoS) and Scopus: The titans of bibliographic information in today’s academic world. Publications, 9(1), 12. https://doi.org/10.3390/publications9010012
R Core Team. (2025). R: A language and environment for statistical computing [Computer software]. R Foundation for Statistical Computing. https://www.R-project.org/
Sasson Lazovsky, G., Raz, T., & Kenett, Y. N. (2024). The art of creative inquiry—From question asking to prompt engineering. The Journal of Creative Behavior, 59(1), e671. https://doi.org/10.1002/jocb.671
Vaira, L. A., Lechien, J. R., Abbate, V., Allevi, F., Audino, G., Beltramini, G. A., Bergonzani, M., Bernini, L., Boccaletti, R., Bonvin, N., Bossi, P., Bruschini, L., Calvanese, L., Canzi, P., Carobbio, A. L. C., Cazzador, D., Cocuzza, S., Colombo, G., D'Aguanno, V., . . . De Riu, G. (2023). Accuracy of ChatGPT-generated information on head and neck and oromaxillofacial surgery: A multicenter collaborative analysis. Otolaryngology–Head and Neck Surgery, 170(6), 1492–1503. https://doi.org/10.1002/ohn.489
van Diessen, E., Amerongen, R. A., Zijlmans, M., & Otte, W. M. (2024). Potential merits and flaws of large language models in epilepsy care: A critical review. Epilepsia, 65(4), 873–886. https://doi.org/10.1111/epi.17907
Wang, Y., Li, L., & Chen, L. (2025). Recent advances in finetuning multimodal large language models. AI Magazine, 46(3), e70025. https://doi.org/10.1002/aaai.70025
Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer-Verlag.
Wickham, H., François, R., Henry, L., Müller, K., & Vaughan, D. (2023). dplyr: A grammar of data manipulation (Version 1.1.4) [R package]. https://CRAN.R-project.org/package=dplyr
Xue, M., Liu, Y., Xiao, X., & Wilson, M. (2025). Automatic prompt engineering for automatic scoring. Journal of Educational Measurement. Advance online publication. https://doi.org/10.1111/jedm.70002
License
Copyright (c) 2025 Black Sea Journal of Artificial Intelligence

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.