Abstract: Deepfakes, synthetic media generated with deep learning techniques such as Generative Adversarial Networks (GANs), pose a growing threat to digital authenticity, trust, and security. While these technologies have legitimate applications in entertainment and content creation, they have also enabled the spread of misinformation, fraud, and identity manipulation. This paper presents a comprehensive survey of current deepfake detection techniques, categorizing them into traditional, machine learning-based, and deep learning-based approaches. It reviews the key datasets and performance metrics used to benchmark detection models and discusses technical, ethical, and robustness concerns. It also proposes a conceptual hybrid CNN-LSTM architecture that combines spatial and temporal feature analysis to improve detection accuracy for deepfake videos. The modular design is intended to support future enhancements, including the incorporation of Vision Transformers and attention mechanisms. The study provides a foundational understanding of deepfake detection and identifies promising avenues for further research and real-world application.

Keywords: Generative Adversarial Networks (GANs), Convolutional Neural Networks (CNNs), Long Short-Term Memory (LSTM), Vision Transformers (ViT), Spatiotemporal Modeling, Dataset Generalization.