Deep Learning Approaches for Kurdish Optical Character Recognition

Abstract

This paper presents a comprehensive study on applying deep learning techniques to Kurdish OCR, addressing challenges specific to Kurdish script recognition, including diacritical marks and font variations. Our CNN-based approach achieves 97.2% accuracy on historical Kurdish manuscripts.

Keywords

OCR, Deep Learning, Kurdish, CNN, Historical Manuscripts

Related Datasets

Kurdish Historical Manuscripts OCR Dataset

February 10, 2023 | 2.3 GB | PNG, TXT, JSON

A comprehensive dataset containing 15,000 images of Kurdish historical manuscripts from the 18th-20th centuries, including ground truth transcriptions and metadata about script variations and …

Kurdish Modern Text Corpus

January 15, 2023 | 1.8 GB | TXT, XML, JSON

A large-scale corpus of modern Kurdish texts containing 50 million words from diverse sources including news articles, literature, academic papers, and web content. Includes linguistic annotations and …

Citation

Mahmood, A., Ali, S., & Hassan, R. (2023). Deep Learning Approaches for Kurdish Optical Character Recognition. Journal of Kurdish Language Technology, 15(3), 45-62.

Publication Details

Authors: 3
Datasets: 2