Tunisian Arabish Corpus (TArC) (free)

Khan 55 2021-08-24 NLP


Resource description

The dataset was extracted from social media and contains 43,313 tokens. The classification task is to categorize the text at the token level into three classes: arabizi, foreign, and emotag.

Language: Tunisian
Iterations: 4,790
File type: TSV
Tasks: Classification, Part-of-Speech (POS)
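Since the corpus is distributed as TSV with one class per token, a quick check of the class distribution might look like the sketch below. The column layout (token in the first column, label in the second) is an assumption and may not match the actual files; the sample rows are placeholders, not real corpus data, and `count_labels` is a hypothetical helper name.

```python
import csv
import io
from collections import Counter

# Placeholder rows in an ASSUMED two-column layout: token<TAB>label.
# The real TArC files may use a different column order or extra columns.
SAMPLE_TSV = "tok_a\tarabizi\ntok_b\tforeign\ntok_c\temotag\ntok_d\tarabizi\n"


def count_labels(tsv_file, token_col=0, label_col=1):
    """Count how many tokens fall into each class in a token-level TSV."""
    reader = csv.reader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    counts = Counter()
    for row in reader:
        # Skip blank or malformed lines that lack the expected columns.
        if len(row) <= max(token_col, label_col):
            continue
        counts[row[label_col]] += 1
    return counts


label_counts = count_labels(io.StringIO(SAMPLE_TSV))
```

To run this on a downloaded file, pass an open file handle instead of the in-memory sample, and adjust `token_col` and `label_col` to the real layout.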


Tags

Medium-scale text classification
