Project Overview
This project systematically creates and analyzes comprehensive Kurdish text corpora covering multiple domains and dialects. Our 50-million-word corpus includes automated quality assessment and linguistic annotation, and it serves as the foundation for numerous Kurdish NLP applications.
Technologies & Methods
Applications
- Language Research
- NLP Model Training
- Linguistic Analysis
Related Publications
Large-Scale Kurdish Text Corpus Creation and Analysis
Dr. Zainab Hussein, Dr. Mohammad Ali, Dr. Sara Ahmed (2023)
Comprehensive methodology for creating and analyzing a 50-million-word Kurdish corpus covering multiple domains and dialects, with automated quality assessment and linguistic annotation.
Related Datasets
Kurdish Modern Text Corpus
Published: January 15, 2023 | Size: 1.8 GB
A large-scale corpus of modern Kurdish texts containing 50 million words from diverse sources including news articles, literature, academic papers, and web content. Includes linguistic annotations and …
Project Statistics
Research Team
Funding
Kurdistan Academy of Sciences Research Grant