Kurdish-English Parallel Translation Corpus

Published: May 8, 2023
Size: 450 MB
License: CC BY-SA 4.0
Domain: Translation
Languages:
Kurdish (Sorani), Kurdish (Kurmanji), English
Formats:
TSV, JSON
Contributing Organizations:

Dataset Description

A high-quality parallel corpus of 500,000 Kurdish-English sentence pairs for machine translation, covering multiple domains with balanced representation of the Sorani and Kurmanji dialects.
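The page does not say how dialect balance is measured. The sketch below shows one way a user might check it on their own copy of the data. It assumes, hypothetically, that each sentence pair carries a dialect label such as "sorani" or "kurmanji". The `dialect_shares` and `is_balanced` helpers and the `tolerance` parameter are illustrative names and are not part of the corpus.

```python
from collections import Counter
from typing import Dict, Iterable


def dialect_shares(dialects: Iterable[str]) -> Dict[str, float]:
    """Return each dialect label's share of all sentence pairs."""
    counts = Counter(dialects)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}


def is_balanced(shares: Dict[str, float], tolerance: float = 0.05) -> bool:
    """Hypothetical check: every dialect's share is within `tolerance` of an even split."""
    if not shares:
        return False
    even = 1 / len(shares)
    return all(abs(s - even) <= tolerance for s in shares.values())


# Toy labels for illustration only; not taken from the corpus.
shares = dialect_shares(["sorani", "kurmanji", "sorani", "kurmanji", "sorani"])
print(shares)               # {'sorani': 0.6, 'kurmanji': 0.4}
print(is_balanced(shares))  # False
```

The 5% tolerance is an arbitrary illustrative default. A stricter or looser threshold may suit a given experiment better.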

Data Structure

Tab-separated files with aligned sentence pairs, metadata including domain tags and quality scores, and source attribution files.
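The exact column layout of the TSV files is not documented on this page. The sketch below is a minimal loader that keeps pairs at or above a quality threshold. It assumes a hypothetical header row with the columns `kurdish`, `english`, `dialect`, `domain` and `quality_score`, with metadata stored per row rather than in separate files. The `SentencePair` class, `read_pairs` function and `min_quality` parameter are illustrative names only.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterator, TextIO


@dataclass
class SentencePair:
    kurdish: str
    english: str
    dialect: str         # hypothetical label, e.g. "sorani" or "kurmanji"
    domain: str          # domain tag (assumed to be a per-row column)
    quality_score: float


def read_pairs(stream: TextIO, min_quality: float = 0.0) -> Iterator[SentencePair]:
    """Yield aligned pairs whose quality score is at least min_quality.

    Assumes (hypothetically) a header row with the columns
    kurdish, english, dialect, domain, quality_score.
    """
    # QUOTE_NONE: treat quote characters inside sentences as literal text.
    reader = csv.DictReader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        score = float(row["quality_score"])
        if score >= min_quality:
            yield SentencePair(
                kurdish=row["kurdish"],
                english=row["english"],
                dialect=row["dialect"],
                domain=row["domain"],
                quality_score=score,
            )


# Toy rows for illustration only; not taken from the corpus.
sample = (
    "kurdish\tenglish\tdialect\tdomain\tquality_score\n"
    "سڵاو\tHello\tsorani\tgeneral\t0.95\n"
    "Silav\tHello\tkurmanji\tgeneral\t0.40\n"
)
pairs = list(read_pairs(io.StringIO(sample), min_quality=0.5))
print(len(pairs), pairs[0].dialect)  # 1 sorani
```

For a real file, open it with `open(path, encoding="utf-8", newline="")` and pass the file object to `read_pairs`. Sorani text is written in Arabic script, so an explicit UTF-8 encoding avoids platform-dependent decoding.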

Primary Publication

Neural Machine Translation for Kurdish-English Language Pairs

Dr. Karim Mohammad, Dr. Zainab Hussein (2023)

We present a transformer-based neural machine translation system specifically designed for Kurdish-English translation, incorporating morphological awareness and handling dialectal variations across …

Related Publications

How to Cite

Salim, N., & Rashid, L. (2023). Kurdish-English Parallel Translation Corpus. KaiLab Research Data Repository. https://doi.org/10.5281/kurd-en-parallel.v1

Dataset Information

Total Size: 450 MB
Languages: 3
Formats: 2
Related Papers: 1

Data Access


Licensed under CC BY-SA 4.0