Abstract: This paper proposes a deep neural network framework for robust and accurate recognition of Arabic Sign Language (ArSL) gestures. The system employs a multi-layer convolutional neural network (CNN) architecture optimized for extracting spatial features from complex hand shapes. A novel preprocessing pipeline is introduced that uses a hand-tracking algorithm to isolate hand regions and generate skeletal representations, reducing background noise and improving recognition consistency under variable lighting conditions. The model is trained and evaluated on the ArASL2018 dataset, which comprises over 54,000 labeled gesture images spanning 39 alphabetic and word classes. To improve generalization, the training data were augmented with rotation, zoom, and shifting transformations. The proposed framework achieves an overall accuracy of 99.8% and maintains strong tolerance to Gaussian noise, indicating that its performance holds up in degraded visual environments.


Keywords: Arabic Sign Language, convolutional neural network, ArSL, hand gesture.
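
The Gaussian-noise tolerance evaluation mentioned in the abstract can be sketched as follows. This is an illustrative sketch only, not the paper's actual evaluation code: the function names (`add_gaussian_noise`, `noise_tolerance_curve`) and the `predict_fn` callable are assumptions, and the trained CNN is abstracted behind any classifier callable.

```python
import numpy as np

def add_gaussian_noise(images, sigma, rng=None):
    """Add zero-mean Gaussian noise (std = sigma) to uint8 images.

    Pixel values are clipped back to the valid [0, 255] range so the
    perturbed images remain valid inputs for the classifier.
    """
    rng = np.random.default_rng() if rng is None else rng
    noisy = images.astype(np.float64) + rng.normal(0.0, sigma, images.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

def noise_tolerance_curve(predict_fn, images, labels, sigmas, rng=None):
    """Classification accuracy of `predict_fn` at each noise level in `sigmas`.

    `predict_fn` is any callable mapping a batch of images to predicted
    class labels; in the paper's setting it would wrap the trained CNN's
    argmax output.
    """
    accuracies = []
    for sigma in sigmas:
        noisy = add_gaussian_noise(images, sigma, rng)
        predictions = predict_fn(noisy)
        accuracies.append(float(np.mean(predictions == labels)))
    return accuracies
```

Plotting the resulting accuracies against the noise standard deviations gives a degradation curve; a model with the robustness claimed above would show only a gradual accuracy drop as sigma grows.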