Large-Scale Kurdish Text Corpus Creation and Analysis

Abstract

This paper presents a comprehensive methodology for creating and analyzing a 50-million-word Kurdish corpus that covers multiple domains and dialects, with automated quality assessment and linguistic annotation.

Keywords

Corpus Linguistics, Kurdish, Text Mining, Language Resources

Related Datasets

Kurdish Modern Text Corpus

January 15, 2023 | 1.8 GB | TXT, XML, JSON

A large-scale corpus of modern Kurdish texts containing 50 million words from diverse sources including news articles, literature, academic papers, and web content. Includes linguistic annotations and …

Citation

Omar, Z., Hassan, K., & Ali, A. (2023). Large-Scale Kurdish Text Corpus Creation and Analysis. Language Resources and Evaluation, 57(3), 891-920.

Publication Details

Authors: 3
Datasets: 3