Dataset Description
This dataset contains 15,000 images of Kurdish historical manuscripts from the 18th to 20th centuries, with ground truth transcriptions and metadata on script variations and preservation conditions.
Data Structure
Images (PNG, 300 DPI); ground truth text files (UTF-8); metadata (JSON, with manuscript info, date, region, and script type).
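As a rough illustration of how the three components might be read together, here is a minimal Python sketch. The directory layout (`images/`, `transcriptions/`, `metadata/`), the shared file stem per manuscript, and the names `ManuscriptSample` and `load_sample` are all assumptions made for this example. The dataset description does not specify them, so adjust the paths to match the actual download.

```python
import json
from dataclasses import dataclass
from pathlib import Path


@dataclass
class ManuscriptSample:
    image_path: Path     # path to the PNG scan (the image is not decoded here)
    transcription: str   # ground truth text, read as UTF-8
    metadata: dict       # parsed JSON metadata


def load_sample(root, sample_id):
    """Load one sample, assuming a hypothetical layout:

    <root>/images/<sample_id>.png
    <root>/transcriptions/<sample_id>.txt
    <root>/metadata/<sample_id>.json
    """
    root = Path(root)
    image_path = root / "images" / f"{sample_id}.png"
    text_path = root / "transcriptions" / f"{sample_id}.txt"
    meta_path = root / "metadata" / f"{sample_id}.json"

    # Fail early if the scan is missing, so text and image stay paired.
    if not image_path.is_file():
        raise FileNotFoundError(image_path)

    # Read explicitly as UTF-8 so Kurdish text does not depend on the OS default encoding.
    transcription = text_path.read_text(encoding="utf-8")
    metadata = json.loads(meta_path.read_text(encoding="utf-8"))
    return ManuscriptSample(image_path, transcription, metadata)
```

The image is returned as a path rather than decoded, which keeps the sketch free of third-party dependencies. Any imaging library can open the PNG from `image_path` afterwards.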
Primary Publication
Deep Learning Approaches for Kurdish Optical Character Recognition
Dr. Ahmad Kurdish, Dr. Sara Ahmed, Dr. Fatima Hassan (2023)
This paper presents a comprehensive study on applying deep learning techniques to Kurdish OCR, addressing unique challenges in Kurdish script recognition including diacritical marks and font …
Related Publications
How to Cite
Mahmood, A., Ali, S., & Hassan, R. (2023). Kurdish Historical Manuscripts OCR Dataset. KaiLab Research Data Repository. https://doi.org/10.5281/kurd-hist-ocr.v1