Kurdish Historical Manuscripts OCR Dataset
A comprehensive dataset containing 15,000 images of Kurdish historical manuscripts from the 18th-20th centuries, including ground truth transcriptions and metadata about script variations and …
A large-scale corpus of modern Kurdish texts containing 50 million words from diverse sources including news articles, literature, academic papers, and web content. Includes linguistic annotations and …
Comprehensive morphological analysis dataset containing 100,000 Kurdish words with detailed morphological breakdowns, POS tags, and inflectional information for both Sorani and Kurmanji dialects.
Comprehensive pronunciation dictionary for Kurdish containing IPA phonetic transcriptions for 75,000 words, including stress patterns and dialectal variations.
Large-scale audio corpus containing 1,000 hours of Kurdish speech from 500+ speakers across different dialects, ages, and regions. Includes high-quality transcriptions and speaker metadata.
High-quality parallel corpus containing 500,000 sentence pairs for Kurdish-English translation, covering multiple domains with balanced representation of both Sorani and Kurmanji dialects.