The lack of publicly accessible text corpora is a major obstacle for progress in natural language processing. For me ...
The dataset was extracted from social media and amounts to 43,313 tokens. The classification task consists in cat ...
Condescending language use is caustic; it can bring dialogues to an end and bifurcate communities. Thus, systems for ...
Automatic summarization methods have been studied on a variety of domains, including news and scientific articles. Y ...
Composing knowledge from multiple pieces of text is a key challenge in multi-hop question answering. We present a m ...
Automated fact-checking based on machine learning is a promising approach to identify false information distributed ...
The dataset contains 60,000 e-books, lang: multilingual, items: 60 0, file_type: text, task: text corpora
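The dataset was recorded in a professional studio in South Levantine Arabic (Damascian accent). Synthesized speech produced as output using this corpus is of high quality --, file_type ...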
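This paper introduces the first version of the NUBes corpus (Negation and Uncertainty annotations in Biomedical texts in Spanish). The corpus is part of ongoing research and currently includes ...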
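The dataset is a collection of linguistically analyzed documents from the proceedings of the Polish Parliament, Sejm, and Senate. It is based on the Polish Sejm Corpus, lang: Polish, items: 3,000+, ...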