Dataset Description
Large-scale audio corpus containing 1,000 hours of Kurdish speech from 500+ speakers across different dialects, ages, and regions. Includes high-quality transcriptions and speaker metadata.
Data Structure
WAV audio files (16 kHz, mono); transcript files with timestamps; speaker metadata (demographics, dialect); phonetic alignments
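Because the audio is specified as 16 kHz mono WAV, it can be worth confirming that every file actually matches before training. The sketch below uses Python's standard `wave` module. It assumes the files are uncompressed PCM WAV, which this page does not state (`wave` raises `wave.Error` for most other encodings). The helper names and the idea of scanning a directory tree of `*.wav` files are illustrative assumptions, not part of the dataset's documented layout.

```python
import wave
from pathlib import Path

# Format stated on this page: 16 kHz, mono.
EXPECTED_RATE_HZ = 16_000
EXPECTED_CHANNELS = 1


def check_wav_format(path: str) -> bool:
    """Return True if the WAV file at `path` is 16 kHz mono (assumes PCM WAV)."""
    with wave.open(path, "rb") as wf:
        return (wf.getframerate() == EXPECTED_RATE_HZ
                and wf.getnchannels() == EXPECTED_CHANNELS)


def duration_seconds(path: str) -> float:
    """Duration of a WAV file in seconds: frame count divided by sample rate."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def find_nonconforming(root: str) -> list[str]:
    """Hypothetical helper: list every *.wav under `root` that is not 16 kHz mono."""
    return [str(p) for p in sorted(Path(root).rglob("*.wav"))
            if not check_wav_format(str(p))]
```

Summing `duration_seconds` over all files also gives a quick check of the total audio duration against the figure quoted in the description.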
Primary Publication
Automatic Speech Recognition for Kurdish Dialects
Dr. Mohammad Ali, Dr. John Doe, Dr. Sara Ahmed (2023)
This research develops the first comprehensive ASR system for Kurdish, supporting both the Sorani and Kurmanji dialects through domain adaptation techniques. The system achieved 89.3% word accuracy on conversational …
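The publication summary does not define "word accuracy". A common convention is word accuracy = 1 − WER, where WER = (S + D + I) / N counts substitutions, deletions, and insertions in a word-level alignment against N reference words. The sketch below computes that quantity at corpus level, under the assumptions that this is the metric meant and that transcripts are tokenized on whitespace; real Kurdish evaluation would likely need script-specific text normalization (Sorani and Kurmanji are commonly written in different scripts), which is not handled here.

```python
def word_errors(reference: str, hypothesis: str) -> int:
    """Minimum substitutions + deletions + insertions (word-level Levenshtein)."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] = edit distance between the first i-1 reference words and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1]


def word_accuracy(pairs: list[tuple[str, str]]) -> float:
    """Corpus-level 1 - WER over (reference, hypothesis) transcript pairs."""
    errors = sum(word_errors(ref, hyp) for ref, hyp in pairs)
    n_ref = sum(len(ref.split()) for ref, _ in pairs)
    if n_ref == 0:
        raise ValueError("reference transcripts contain no words")
    return 1.0 - errors / n_ref
```

Errors are pooled over the whole test set before dividing, rather than averaging per-utterance scores, so long utterances carry proportionally more weight. Note that with many insertions this quantity can drop below zero.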
Related Publications
How to Cite
Kareem, H., Ahmed, R., & Jamal, S. (2023). Kurdish Speech Recognition Audio Corpus. KaiLab Research Data Repository. https://doi.org/10.5281/kurd-speech-corpus.v1
Dataset Information
Total Size: 12.5 GB
Languages: 3
Formats: 4
Related Papers: 1