Prompt Engineering: A Bibliometric Analysis

Authors

KOÇYİĞİT, A., & ŞENYER, N.

DOI:

https://doi.org/10.5281/zenodo.18213819

Keywords:

In-context learning, Natural language processing, Publication trends, Research collaboration, Thematic analysis

Abstract

Prompt engineering addresses the design and optimization of natural language instructions for controlling large language model behavior, representing a critical domain within artificial intelligence and human-computer interaction. Despite explosive research growth following the widespread adoption of large language models, systematic analyses of the field's intellectual structure, publication patterns, and collaborative networks remain limited. This study conducted a comprehensive bibliometric analysis of prompt engineering research from 2020 to 2025 using the Web of Science Core Collection as the data source. A systematic search strategy retrieved 4,890 publications from 1,538 sources authored by 16,052 researchers across computer science and artificial intelligence domains. The analysis employed the Bibliometrix package (version 4.3.5) in R (version 4.5.1) to examine publication trends, author productivity, institutional contributions, geographic distribution, thematic structure, collaboration patterns, and citation impact through performance analysis, keyword co-occurrence networks, and science mapping techniques. The field demonstrated explosive expansion at a 125.1% annual growth rate across three developmental phases: emergence (2020-2021), acceleration (2022-2023), and explosion (2024-2025), when annual output reached 2,093 publications. The Chinese Academy of Sciences led institutional productivity with 345 publications, while China dominated national output with 1,584 documents, representing 32.67% of the corpus. Geographic analysis revealed quality-quantity trade-offs, with Singapore achieving the highest average citation impact (37.37 citations per document) despite modest volume. Author analysis identified Zhang Y as the most productive author (43 publications), while collaboration metrics indicated 4.9 co-authors per document and a 26.44% international co-authorship rate.
Keyword analysis revealed "large language models" (946 occurrences) and "prompt engineering" (733 occurrences) as the dominant themes, organized into three distinct thematic clusters: core prompting methodologies, machine learning foundations, and application domains. Network visualization confirmed the integration of few-shot learning, chain-of-thought prompting, and in-context learning techniques into large language model applications. IEEE Access dominated publication venues with 176 articles, while major natural language processing and machine learning conferences (ACL, EMNLP, NeurIPS) emerged as primary dissemination channels. Citation analysis identified foundational contributions in instruction following and chain-of-thought reasoning alongside contemporary methodological innovations. The findings reveal prompt engineering's rapid crystallization as a distinct research domain that emphasizes practical techniques over theoretical foundations, while concentration around specific models indicates fragmentation risks that call for unified frameworks transcending particular implementations.
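The keyword co-occurrence networks mentioned in the abstract are built from pairwise counts of keywords appearing on the same record. A minimal, illustrative Python sketch of that counting step is shown below; the toy records and function name are hypothetical stand-ins, not the study's data or the Bibliometrix implementation:

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(keyword_lists):
    """Count how often each pair of keywords appears together on one record."""
    pairs = Counter()
    for kws in keyword_lists:
        # Normalize case and deduplicate keywords within a single record,
        # then count each unordered pair once per record.
        unique = sorted(set(k.strip().lower() for k in kws))
        for a, b in combinations(unique, 2):
            pairs[(a, b)] += 1
    return pairs

# Toy records standing in for Web of Science keyword fields (illustrative only).
records = [
    ["Large Language Models", "Prompt Engineering", "In-Context Learning"],
    ["Large Language Models", "Chain-of-Thought", "Prompt Engineering"],
    ["Few-Shot Learning", "Large Language Models"],
]

counts = cooccurrence_counts(records)
print(counts[("large language models", "prompt engineering")])  # 2
```

The resulting pair counts form the weighted edges of a co-occurrence network; clustering such a network is what yields thematic groupings of the kind the study reports.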

References

Aria, M., & Cuccurullo, C. (2017). bibliometrix: An R-tool for comprehensive science mapping analysis. Journal of Informetrics, 11(4), 959–975. https://doi.org/10.1016/j.joi.2017.08.007

Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D. M., Wu, J., Winter, C., . . . Amodei, D. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33, 1877–1901. https://proceedings.neurips.cc/paper/2020/hash/1457c0d6bfcb4967418bfb8ac142f64a-Abstract.html

Cheong, M. (2025). ChatGPT's performance evaluation in spreadsheets modelling to inform assessments redesign. Journal of Computer Assisted Learning, 41(3), e70035. https://doi.org/10.1111/jcal.70035

Dahl, D. B., Scott, D., Roosen, C., Magnusson, A., & Swinton, J. (2019). xtable: Export Tables to LaTeX or HTML (Version 1.8-4) [R package]. https://CRAN.R-project.org/package=xtable

Hsieh, P., & Lee, W. (2025). A particle swarm optimization-based approach coupled with large language models for prompt optimization. Expert Systems, 42(6), e70049. https://doi.org/10.1111/exsy.70049

Jovanovic, M., & Voss, P. (2025). Towards incremental learning in large language models: A critical review. Expert Systems, 42(10), e70127. https://doi.org/10.1111/exsy.70127

Lizarraga, A., Honig, E., & Wu, Y. N. (2025). From stochastic parrots to digital intelligence: The evolution of language models and their cognitive capabilities. WIREs Computational Statistics, 17(3), e70035. https://doi.org/10.1002/wics.70035

Nguyen-Duc, A., Cabrero-Daniel, B., Przybylek, A., Arora, C., Khanna, D., Herda, T., Rafiq, U., Melegati, J., Guerra, E., Kemell, K.-K., Saari, M., Nguyen, C., Yousuf, O., Matthies, C., Rafiq, A., Becker, S., Muñoz, R. F., Chanin, R., Sales, A., . . . Abrahamsson, P. (2025). Generative artificial intelligence for software engineering—A research agenda. Software: Practice and Experience, 55(11), 1806–1843. https://doi.org/10.1002/spe.70005

Pranckutė, R. (2021). Web of Science (WoS) and Scopus: The titans of bibliographic information in today’s academic world. Publications, 9(1), 12. https://doi.org/10.3390/publications9010012

R Core Team. (2025). R: A language and environment for statistical computing [Computer software]. R Foundation for Statistical Computing. https://www.R-project.org/

Sasson Lazovsky, G., Raz, T., & Kenett, Y. N. (2024). The art of creative inquiry—From question asking to prompt engineering. The Journal of Creative Behavior, 59(1), e671. https://doi.org/10.1002/jocb.671

Vaira, L. A., Lechien, J. R., Abbate, V., Allevi, F., Audino, G., Beltramini, G. A., Bergonzani, M., Bernini, L., Boccaletti, R., Bonvin, N., Bossi, P., Bruschini, L., Calvanese, L., Canzi, P., Carobbio, A. L. C., Cazzador, D., Cocuzza, S., Colombo, G., D'Aguanno, V., . . . De Riu, G. (2023). Accuracy of ChatGPT-generated information on head and neck and oromaxillofacial surgery: A multicenter collaborative analysis. Otolaryngology–Head and Neck Surgery, 170(6), 1492–1503. https://doi.org/10.1002/ohn.489

van Diessen, E., Amerongen, R. A., Zijlmans, M., & Otte, W. M. (2024). Potential merits and flaws of large language models in epilepsy care: A critical review. Epilepsia, 65(4), 873–886. https://doi.org/10.1111/epi.17907

Wang, Y., Li, L., & Chen, L. (2025). Recent advances in finetuning multimodal large language models. AI Magazine, 46(3), e70025. https://doi.org/10.1002/aaai.70025

Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer-Verlag.

Wickham, H., François, R., Henry, L., Müller, K., & Vaughan, D. (2023). dplyr: A grammar of data manipulation (Version 1.1.4) [R package]. https://CRAN.R-project.org/package=dplyr

Xue, M., Liu, Y., Xiao, X., & Wilson, M. (2025). Automatic prompt engineering for automatic scoring. Journal of Educational Measurement. Advance online publication. https://doi.org/10.1111/jedm.70002

Published

2025-12-15

How to Cite

KOÇYİĞİT, A., & ŞENYER, N. (2025). Prompt Engineering: A Bibliometric Analysis. Black Sea Journal of Artificial Intelligence, 1(2), 40–52. https://doi.org/10.5281/zenodo.18213819

Section

Original Research Article