FREE CASH FLOW AND OVERINVESTMENT: MACHINE LEARNING PERSPECTIVES IN VIETNAM
Phong Nguyen Anh, Tam Phan Huy, Thanh Ngo Phu
University of Economics and Law and Vietnam National University, Ho Chi Minh City, Vietnam
Abstract: This study develops and evaluates a machine learning framework for classifying overinvestment among Vietnamese listed firms using firm-level financial and governance variables. Drawing on agency theory and prior empirical research, a firm-year observation is classified as overinvesting when its free cash flow is above the sample median and its Tobin’s Q is below the sample median. The dataset comprises 6,561 firm-year observations from 2000 to 2024, covering companies listed on the Hanoi and Ho Chi Minh Stock Exchanges. Seven classification algorithms were compared: Logistic Regression, Random Forest, XGBoost, LightGBM, Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and Artificial Neural Networks. Performance was assessed with 10-fold cross-validation on accuracy, precision, recall, F1 score, AUC-ROC, Matthews correlation coefficient (MCC), and Brier score, together with computational efficiency. Ensemble methods, particularly Random Forest and XGBoost, achieved the highest predictive performance, while simpler classifiers such as SVM and KNN consistently underperformed. The findings confirm that advanced machine learning techniques can effectively model the nonlinear and heterogeneous determinants of overinvestment. The study contributes to the literature by demonstrating the applicability of modern predictive analytics in an emerging market context and by providing evidence consistent with agency theory perspectives on investment inefficiency.
Keywords: Overinvestment, Machine Learning, Ensemble Methods, Investment Efficiency, Vietnam.
JEL codes: G32, C45, M41
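The labeling rule stated in the abstract (free cash flow above the sample median and Tobin’s Q below the sample median) can be illustrated with a minimal sketch. The function name `label_overinvestment`, the toy values, the use of strict inequalities, and the choice to compute both medians over the pooled sample are assumptions made for illustration only; the abstract does not specify these details.

```python
from statistics import median


def label_overinvestment(fcf, tobin_q):
    """Return 1 for observations with FCF above the sample median and
    Tobin's Q below the sample median, else 0.

    Assumption: medians are taken over the pooled sample passed in, and
    both comparisons are strict. The paper may use a different convention
    (for example, year-by-year or industry-by-industry medians).
    """
    if len(fcf) != len(tobin_q):
        raise ValueError("fcf and tobin_q must have the same length")
    fcf_median = median(fcf)
    q_median = median(tobin_q)
    return [
        int(f > fcf_median and q < q_median)
        for f, q in zip(fcf, tobin_q)
    ]


# Hypothetical toy data: five firm-year observations.
fcf = [0.10, -0.02, 0.07, 0.01, 0.15]   # free cash flow (scaled)
tobin_q = [0.8, 1.4, 1.6, 0.9, 0.7]     # Tobin's Q

labels = label_overinvestment(fcf, tobin_q)
print(labels)
```

In this toy sample the medians are 0.07 for free cash flow and 0.9 for Tobin’s Q, so only the first and last observations satisfy both conditions. Because both thresholds are medians, the positive class cannot exceed half of the sample, which is one reason metrics such as MCC and AUC-ROC are informative alongside accuracy.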