Project Overview
Our OCR project develops state-of-the-art optical character recognition systems designed specifically for Kurdish texts. We address challenges particular to the script, including script variations, diacritical marks, and the degraded condition of historical manuscripts, in support of their preservation. Our deep learning approaches achieve over 95% accuracy on degraded historical texts.
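The overview does not say how the accuracy figure is measured. A common convention for OCR is character-level accuracy, defined as one minus the character error rate (edit distance between the recognized text and the ground truth, divided by the ground-truth length). The sketch below assumes that convention; the function names `edit_distance` and `character_accuracy` are illustrative and do not come from the project.

```python
import unicodedata


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_accuracy(ref: str, hyp: str) -> float:
    """Return 1 - CER, clamped at 0 (assumed metric, not confirmed by the project)."""
    # Normalize so that precomposed and combining forms of a diacritic compare equal.
    ref = unicodedata.normalize("NFC", ref)
    hyp = unicodedata.normalize("NFC", hyp)
    if not ref:
        raise ValueError("reference transcription must not be empty")
    cer = edit_distance(ref, hyp) / len(ref)
    return max(0.0, 1.0 - cer)


if __name__ == "__main__":
    reference = "کوردستان"               # ground-truth transcription
    recognized = reference[:-1] + "م"     # hypothetical OCR output with one wrong letter
    print(f"character accuracy: {character_accuracy(reference, recognized):.3f}")
```

Under this definition, "over 95% accuracy" would mean fewer than one character error per twenty ground-truth characters. Word-level accuracy is a stricter alternative, and the two should not be compared directly.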
Technologies & Methods
Applications
- Historical Document Preservation
- Digital Archives
- Text Digitization
Related Publications
Deep Learning Approaches for Kurdish Optical Character Recognition
Dr. Ahmad Kurdish, Dr. Sara Ahmed, Dr. Fatima Hassan (2023)
This paper presents a comprehensive study on applying deep learning techniques to Kurdish OCR, addressing unique challenges in Kurdish script recognition including diacritical marks and font …
Related Datasets
Kurdish Historical Manuscripts OCR Dataset
Published: February 10, 2023 | Size: 2.3 GB
A comprehensive dataset containing 15,000 images of Kurdish historical manuscripts from the 18th-20th centuries, including ground truth transcriptions and metadata about script variations and …
Kurdish Modern Text Corpus
Published: January 15, 2023 | Size: 1.8 GB
A large-scale corpus of modern Kurdish texts containing 50 million words from diverse sources including news articles, literature, academic papers, and web content. Includes linguistic annotations and …
Project Statistics
Research Team
Funding
Kurdistan Regional Government Research Grant