Optical Character Recognition (OCR)

Advanced text recognition systems for Kurdish historical and modern documents

Status: Active
Started: January 2022
3 Team Members

Project Overview

Our OCR project develops optical character recognition systems designed specifically for Kurdish texts. We address challenges particular to Kurdish, including script variations and diacritical marks, with a focus on preserving historical manuscripts. Our deep learning approaches achieve over 95% accuracy on degraded historical texts.
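The overview does not state which metric the 95% figure refers to. One common way to score OCR output is character accuracy, defined as 1 minus the character error rate (CER). CER is the Levenshtein edit distance between the recognized text and the ground-truth transcription, divided by the length of the ground truth. The Python sketch below illustrates that metric under these assumptions. The function names `levenshtein` and `character_accuracy` are hypothetical, and this is not presented as the project's own evaluation code.

```python
import unicodedata


def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn `ref` into `hyp` (two-row dynamic programming)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty prefix of ref
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r
                            curr[j - 1] + 1,      # insert h
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def character_accuracy(ref: str, hyp: str) -> float:
    """Character accuracy = 1 - CER, clamped at 0 because CER can exceed 1
    when the OCR output is much longer than the ground truth."""
    # Normalize so a base letter plus combining mark compares equal to its
    # precomposed form, where Unicode defines one.
    ref = unicodedata.normalize("NFC", ref)
    hyp = unicodedata.normalize("NFC", hyp)
    if not ref:
        return 1.0 if not hyp else 0.0
    cer = levenshtein(ref, hyp) / len(ref)
    return max(0.0, 1.0 - cer)


if __name__ == "__main__":
    # One wrong character out of four gives 75% character accuracy.
    print(character_accuracy("abcd", "abce"))  # 0.75
```

Normalizing both strings before comparison matters for scripts with diacritics. Without it, the same visible letter encoded two different ways would be counted as an error even when the recognizer read it correctly.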

Technologies & Methods

  • Deep Learning
  • Convolutional Neural Networks (CNN)
  • Computer Vision
  • Image Processing

Applications

  • Historical Document Preservation
  • Digital Archives
  • Text Digitization

Related Publications

Deep Learning Approaches for Kurdish Optical Character Recognition

Dr. Ahmad Kurdish, Dr. Sara Ahmed, Dr. Fatima Hassan (2023)

This paper presents a comprehensive study on applying deep learning techniques to Kurdish OCR, addressing unique challenges in Kurdish script recognition including diacritical marks and font …

Related Datasets

Kurdish Historical Manuscripts OCR Dataset

Published: February 10, 2023 | Size: 2.3 GB

A comprehensive dataset containing 15,000 images of Kurdish historical manuscripts from the 18th to 20th centuries, including ground-truth transcriptions and metadata about script variations and …

Kurdish Modern Text Corpus

Published: January 15, 2023 | Size: 1.8 GB

A large-scale corpus of modern Kurdish texts containing 50 million words from diverse sources including news articles, literature, academic papers, and web content. Includes linguistic annotations and …

Project Statistics

Publications: 1
Datasets: 2
Team Size: 3

Research Team

Funding

Kurdistan Regional Government Research Grant