Tunisian Arabish Corpus (TArC) (free)

Khan 26 2021-08-24 NLP

Resource description

The dataset was extracted from social media and contains 43,313 tokens. The classification task is to categorize the text at the token level into three classes: arabizi, foreign, and emotag.

Language: Tunisian
Iterations: 4,790
File type: TSV
Tasks: Classification, Part-of-Speech (POS)
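As a rough illustration of what token-level classification data in TSV form might look like in practice, the sketch below parses tab-separated text and counts tokens per class. The column layout (token in the first column, class label in the second), the function names read_token_labels and label_counts, and the sample rows are all assumptions made for illustration; they are not taken from the corpus, so check the actual file layout before relying on this.

```python
import csv
import io
from collections import Counter

# The three classes named in the description above.
LABELS = {"arabizi", "foreign", "emotag"}


def read_token_labels(tsv_text):
    """Parse TSV text into (token, label) pairs.

    Assumes (hypothetically) one token per row, with the token in
    column 0 and its class label in column 1.
    """
    reader = csv.reader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    pairs = []
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed rows
        token, label = row[0], row[1]
        if label in LABELS:
            pairs.append((token, label))
    return pairs


def label_counts(pairs):
    """Count how many tokens fall into each class."""
    return Counter(label for _, label in pairs)


# Made-up sample rows, not taken from the corpus.
sample = "3lech\tarabizi\nmerci\tforeign\n:)\temotag\n"
pairs = read_token_labels(sample)
print(label_counts(pairs))
```

To use this on a real file, one would read its contents with open(path, encoding="utf-8").read() and pass the text to read_token_labels, adjusting the column indices if the file has a header row or additional columns such as POS tags.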
