Datasets

Kurdish Historical Manuscripts OCR Dataset

Feb 2023, 2.3 GB, CC BY-NC-SA 4.0

PNG, TXT, JSON

A comprehensive dataset containing 15,000 images of Kurdish historical manuscripts from the 18th to 20th centuries, with ground-truth transcriptions and metadata about script variations and …

Kurdish (Sorani), Kurdish (Kurmanji)

Historical Documents

1 related paper

Kurdish Modern Text Corpus

Jan 2023, 1.8 GB, CC BY 4.0

TXT, XML, JSON

A large-scale corpus of modern Kurdish texts containing 50 million words from diverse sources including news articles, literature, academic papers, and web content. Includes linguistic annotations and …

Kurdish (Sorani), Kurdish (Kurmanji), Kurdish (Pehlewani)

General Text

2 related papers

Kurdish Morphological Analysis Dataset

Apr 2023, 85 MB, CC BY 4.0

CSV, XML, JSON

A morphological analysis dataset containing 100,000 Kurdish words with detailed morphological breakdowns, POS tags, and inflectional information for both the Sorani and Kurmanji dialects.

Kurdish (Sorani), Kurdish (Kurmanji)

Morphological Analysis

2 related papers

Kurdish Phonetic Pronunciation Dictionary

Mar 2023, 120 MB, CC BY 4.0

JSON, CSV, TXT

A pronunciation dictionary for Kurdish containing IPA phonetic transcriptions for 75,000 words, including stress patterns and dialectal variations.

Kurdish (Sorani), Kurdish (Kurmanji)

Pronunciation

2 related papers

Kurdish Speech Recognition Audio Corpus

Aug 2023, 12.5 GB, CC BY-NC 4.0

WAV, TXT, JSON, TextGrid

A large-scale audio corpus containing 1,000 hours of Kurdish speech from 500+ speakers across different dialects, ages, and regions. Includes high-quality transcriptions and speaker metadata.

Kurdish (Sorani), Kurdish (Kurmanji), Kurdish (Pehlewani)

Speech Recognition

1 related paper

Kurdish-English Parallel Translation Corpus

May 2023, 450 MB, CC BY-SA 4.0

TSV, JSON

A high-quality parallel corpus containing 500,000 sentence pairs for Kurdish-English translation, covering multiple domains with balanced representation of the Sorani and Kurmanji dialects.

Kurdish (Sorani), Kurdish (Kurmanji), English

Translation

1 related paper

Statistics

Total Datasets: 6

Research Papers: 9

Research Projects: 9
