KDC-4007 Dataset Collection

Resource Description

Arazo M. Mustafa (arazo.2007 '@' yahoo.com),
School of Computer Science, University of Sulaimania, Kurdistan, Iraq

Data Set Information:

The main strengths of this dataset are that it is simple to use and well documented, which makes it suitable for a wide range of text-analysis studies on Kurdish Sorani news and articles.
The documents fall into eight categories: Sport, Religion, Art, Economic, Education, Social, Style, and Health. Each category contains 500 text documents, and the corpus totals 4,007 text files.
The dataset and its documents are freely accessible so that experimental results can be reproduced.
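As a quick sanity check on the category structure described above, the following minimal sketch counts the text files per category. The on-disk layout is an assumption, not something the description states: it supposes one subdirectory per category, named after the category, holding `.txt` files. The function name `count_documents` is hypothetical.

```python
from collections import Counter
from pathlib import Path

# Category names as listed in the dataset description.
CATEGORIES = ["Sport", "Religion", "Art", "Economic",
              "Education", "Social", "Style", "Health"]

def count_documents(root):
    """Count .txt files in each category subdirectory under root.

    Assumed layout (hypothetical): <root>/<Category>/*.txt
    Categories without a matching directory are reported as 0.
    """
    counts = Counter()
    for category in CATEGORIES:
        folder = Path(root) / category
        if folder.is_dir():
            counts[category] = sum(1 for _ in folder.glob("*.txt"))
    return counts
```

Calling `count_documents("path/to/corpus")` and summing the values gives the corpus size under that assumed layout, which can be compared against the documented total.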

Attribute Information:

There are four collections:

- ST-Ds dataset: only stop-word elimination is performed, using the Kurdish preprocessing-step approach.
- Pre-Ds dataset: the full Kurdish preprocessing-step approach is applied.
- Pre+TW-Ds dataset: TF×IDF term weighting is applied to the Pre-Ds dataset.
- Orig-Ds dataset: the original dataset, with no processing applied.
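To illustrate the kind of term weighting mentioned for the Pre+TW-Ds collection, here is a minimal sketch of a common TF-IDF formulation (relative term frequency multiplied by the natural log of inverse document frequency). The exact variant used to build the dataset is not stated in this description, so the formula, the function name `tf_idf`, and the toy English tokens are all assumptions for illustration only.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Weight each term in each document by tf * idf.

    docs: list of token lists (already preprocessed).
    tf  = term count / document length
    idf = ln(number of documents / number of documents containing the term)
    This is one common variant, assumed here for illustration.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    weighted = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        weighted.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weighted

# Toy placeholder documents, not taken from the corpus.
docs = [
    ["sport", "match", "team"],
    ["health", "team", "doctor"],
    ["sport", "sport", "goal"],
]
weights = tf_idf(docs)
```

With this variant, a term that occurs in every document receives weight 0, while terms concentrated in few documents receive higher weights.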

Relevant Papers:

[1] Arazo M. Mustafa and Tarik A. Rashid, "Kurdish Stemmer Pre-processing Steps for Improving Information Retrieval", Journal of Information Science, first published January 1, 2017, doi: 10.1177/0165551516683617.
[2] Tarik A. Rashid, Arazo M. Mustafa and Ari M. Saeed, "A Robust Categorization System for Kurdish Sorani Text Documents", Information Technology Journal, 16: 27-34, 2017.
[3] Tarik A. Rashid, Arazo M. Mustafa and Ari M. Saeed, "Automatic Kurdish Text Classification Using KDC 4007 Dataset", accepted in Advances in Internetworking, Data & Web Technologies, Lecture Notes on Data Engineering and Communications Technologies, Springer, 2017. The books of this series are submitted to ISI Proceedings, EI, Scopus, MetaPress, and Springerlink.

