Detection of Security Vulnerabilities in Javascript Functions Using Machine Learning Methods

Authors

H. H. ALAV, Özgür TONKAL, Z. CÖMERT

DOI:

https://doi.org/10.5281/zenodo.18213689

Keywords:

Cyber security, Machine learning, JavaScript security, Vulnerability detection, Code security

Abstract

JavaScript has become an essential programming language in modern web applications; however, its dynamic and loosely typed nature makes it prone to security vulnerabilities. This paper presents JSVULNDETECT, a machine learning-based framework that automatically detects vulnerabilities in JavaScript code segments. Unlike manual code review, which is often error-prone and inefficient, our system offers a scalable, automated approach to static code analysis. We evaluate and compare twelve machine learning algorithms, including Random Forest, XGBoost, LightGBM, and Support Vector Machines, on a carefully preprocessed dataset. To address class imbalance, we apply the SMOTE-Tomek method, and we tune each model with Bayesian hyperparameter search. We also integrate two ensemble learning strategies (Voting and Stacking); the Stacking Classifier achieves the highest accuracy, 97.51%. Our work combines multiple learning paradigms to improve detection performance and provides a web-based interface for real-time analysis. The proposed framework surpasses the results reported in similar studies and serves as a practical tool for developers and security analysts who need to identify potential threats in JavaScript functions.
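
The SMOTE-Tomek step mentioned above can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' implementation: the function names (`smote`, `remove_tomek_links`, `smote_tomek`), the toy two-feature data, and the choice to drop both members of each Tomek link are assumptions made for illustration, and the sketch handles only the binary case with at least two minority samples.

```python
import math
import random


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(idx, points, candidates, k=1):
    """Indices of the k candidates closest to points[idx], excluding idx itself."""
    others = [j for j in candidates if j != idx]
    others.sort(key=lambda j: euclidean(points[idx], points[j]))
    return others[:k]


def smote(X, y, minority, k=3, rng=None):
    """Add synthetic minority points until both classes have equal size.

    Each synthetic point lies on the segment between a random minority
    sample and one of its k nearest minority-class neighbours.
    """
    rng = rng or random.Random(0)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    deficit = (len(y) - len(minority_idx)) - len(minority_idx)
    X_new, y_new = list(X), list(y)
    for _ in range(max(deficit, 0)):
        i = rng.choice(minority_idx)
        j = rng.choice(nearest(i, X, minority_idx, k))
        gap = rng.random()  # interpolation factor in [0, 1)
        X_new.append([a + gap * (b - a) for a, b in zip(X[i], X[j])])
        y_new.append(minority)
    return X_new, y_new


def remove_tomek_links(X, y):
    """Drop both members of every Tomek link.

    A Tomek link is a pair of mutual nearest neighbours with different labels.
    """
    everyone = range(len(X))
    nn = [nearest(i, X, everyone)[0] for i in everyone]
    drop = {i for i in everyone if nn[nn[i]] == i and y[i] != y[nn[i]]}
    keep = [i for i in everyone if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]


def smote_tomek(X, y, minority, k=3, seed=0):
    """Oversample with SMOTE, then clean the class boundary with Tomek links."""
    X_over, y_over = smote(X, y, minority, k, random.Random(seed))
    return remove_tomek_links(X_over, y_over)


# Hypothetical toy data: six "safe" (0) and two "vulnerable" (1) samples.
X = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2], [0.2, 0.4], [0.4, 0.1],
     [2.0, 2.0], [2.2, 2.1]]
y = [0, 0, 0, 0, 0, 0, 1, 1]
X_res, y_res = smote_tomek(X, y, minority=1)
print(y_res.count(0), y_res.count(1))
```

On this toy data the two classes are well separated, so the oversampling step balances them and the Tomek step finds nothing to remove. With overlapping classes, the second step would delete the mutual-nearest-neighbour pairs that straddle the boundary. In practice, resampling of this kind is normally applied to the training split only, so that synthetic points do not leak into the evaluation data.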

Published

2025-06-15

How to Cite

ALAV, H. H., TONKAL, Özgür, & CÖMERT, Z. (2025). Detection of Security Vulnerabilities in Javascript Functions Using Machine Learning Methods. Black Sea Journal of Artificial Intelligence, 1(1), 15–24. https://doi.org/10.5281/zenodo.18213689

Section

Original Research Article