COMPUTATIONAL CHALLENGES IN ANCIENT LANGUAGE MODELING: THE CASE OF THE RIGVEDA
1Rohit D.S., 2Vijay Kumar K., 3Vikas S., 4Vivek Goutam, 5Purnima S.M.
1,2,3,4 Scholars, 5Assistant Professor,
Department of Computer Science and Engineering, R.N.S. Institute of Technology, Bengaluru, India
Abstract—The Rigveda, composed over three millennia ago, is among the earliest known literary texts and an invaluable source of Indo-European linguistic, cultural, and philosophical heritage. Its 1,028 hymns are written in Vedic Sanskrit, an archaic form of Sanskrit that is not yet fully understood, and their structure poses considerable challenges to linguists and has so far received limited computational analysis. As Natural Language Processing (NLP) evolves, it is becoming an essential tool for interpreting such ancient texts and offers new ways of examining them. This survey explores state-of-the-art NLP approaches applied to Vedic literature, focusing on the Rigveda. We review efforts in corpus creation [3], [4], [10], [21], [26], [27], morphological analysis [1], [6], [12], syntactic and semantic parsing [3], [5], [11], [13], [23], and the use of deep learning models [14], [15]. We also examine the specific linguistic challenges in processing Vedic Sanskrit, such as compound disambiguation and poetic syntax [7], [9]. Through critical analysis, we highlight gaps in current research and suggest future directions, including semantic web integration [19], cultural heritage preservation, and advanced question-answering systems [18]. This paper aims to provide a roadmap for computational philologists and NLP researchers venturing into Vedic studies and to renew interest in the study of the Rigveda.
Index Terms—Rigveda, Natural Language Processing, Vedic Sanskrit, Computational Linguistics, Ancient Text Mining.