Abstract
This paper studies the application of deep learning to Kurdish optical character recognition (OCR), addressing challenges specific to Kurdish script such as diacritical marks and font variation. Our CNN-based approach achieves 97.2% accuracy on historical Kurdish manuscripts.
Related Datasets
Kurdish Historical Manuscripts OCR Dataset
A dataset of 15,000 images of Kurdish historical manuscripts from the 18th to 20th centuries, with ground-truth transcriptions and metadata about script variations and …
Kurdish Modern Text Corpus
A corpus of modern Kurdish text containing 50 million words drawn from news articles, literature, academic papers, and web content. Includes linguistic annotations and …
Citation
Mahmood, A., Ali, S., & Hassan, R. (2023). Deep Learning Approaches for Kurdish Optical Character Recognition. Journal of Kurdish Language Technology, 15(3), 45-62.