Kurdish Historical Manuscripts OCR Dataset

Published: February 10, 2023
Size: 2.3 GB
License: CC BY-NC-SA 4.0
Domain: Historical Documents
Languages: Kurdish (Sorani), Kurdish (Kurmanji)
Formats: PNG, TXT, JSON

Dataset Description

This dataset contains 15,000 images of Kurdish historical manuscripts from the 18th to the 20th century, along with ground truth transcriptions and metadata about script variations and preservation conditions.

Data Structure

Images: PNG, 300 DPI
Ground truth text files: UTF-8
Metadata: JSON, with manuscript info, date, region, and script type
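
The three file types listed above could be paired into per-manuscript samples along the following lines. This is only a minimal sketch: the page does not document the directory layout, so the flat folder with matching file stems (`<id>.png`, `<id>.txt`, `<id>.json`), the `ManuscriptSample` class, and the `load_samples` function are all assumptions made for illustration.

```python
import json
from dataclasses import dataclass
from pathlib import Path


@dataclass
class ManuscriptSample:
    image_path: Path     # path to the PNG scan
    transcription: str   # ground truth text, decoded as UTF-8
    metadata: dict       # parsed JSON metadata, keys left as found


def load_samples(root):
    """Pair each PNG with the TXT and JSON files that share its stem.

    Assumed (hypothetical) layout: root/<id>.png, root/<id>.txt, root/<id>.json
    """
    root = Path(root)
    samples = []
    for image_path in sorted(root.glob("*.png")):
        text_path = image_path.with_suffix(".txt")
        meta_path = image_path.with_suffix(".json")
        if not (text_path.exists() and meta_path.exists()):
            continue  # skip incomplete triplets instead of failing
        transcription = text_path.read_text(encoding="utf-8")
        metadata = json.loads(meta_path.read_text(encoding="utf-8"))
        samples.append(ManuscriptSample(image_path, transcription, metadata))
    return samples
```

The sketch leaves the metadata as a plain dictionary because the exact JSON key names are not given here. Inspect one metadata file from the download before relying on specific keys.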

Primary Publication

Deep Learning Approaches for Kurdish Optical Character Recognition

Dr. Ahmad Kurdish, Dr. Sara Ahmed, Dr. Fatima Hassan (2023)

This paper presents a comprehensive study on applying deep learning techniques to Kurdish OCR, addressing unique challenges in Kurdish script recognition including diacritical marks and font …

How to Cite

Mahmood, A., Ali, S., & Hassan, R. (2023). Kurdish Historical Manuscripts OCR Dataset. KaiLab Research Data Repository. https://doi.org/10.5281/kurd-hist-ocr.v1

Dataset Information

Total Size: 2.3 GB
Languages: 2
Formats: 3
Related Papers: 1

Data Access

Licensed under CC BY-NC-SA 4.0