The dataset contains 6,892 disease mentions, which are mapped to 790 unique disease concepts. Of these, 88% link to a Me ...
The dataset was extracted from social media and totals 43,313 tokens. The classification task consists in cat ...
Automated systems that negotiate with humans have broad applications in pedagogy and conversational AI. To advance t ...
Condescending language use is caustic; it can bring dialogues to an end and bifurcate communities. Thus, systems for ...
Automatic summarization methods have been studied on a variety of domains, including news and scientific articles. Y ...
Composing knowledge from multiple pieces of texts is a key challenge in multi-hop question answering. We present a m ...
Automated fact-checking based on machine learning is a promising approach to identify false information distributed ...
The lack of large-scale datasets has been a major hindrance to the development of NLP tasks such as spelling correct ...
This paper introduces RyanSpeech, a new speech corpus for research on automated text-to-speech (TTS) systems. Public ...
The dataset contains 60,000 e-books, lang: multilingual, iterations: 60 0, file_type: text, task: text corpora
The dataset contains more than 39 million research papers published in computer science, neuroscience, and biomedicine.
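With the growing popularity of online social networks, it has become increasingly difficult to monitor all user-generated content. Automating the moderation of inappropriate content exchanged on the Internet has therefore become a ...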
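The dataset was recorded in South Levantine Arabic (Damascene accent) using a professional studio. Synthesized speech produced as output from this corpus was of high quality --, file_type ...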
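This article presents the first version of the NUBes corpus (Negation and Uncertainty annotations in Biomedical texts in Spanish). The corpus is part of ongoing research and currently consists of ...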
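The dataset collects linguistically analyzed documents from the proceedings of the Polish Parliament, Sejm, and Senate. It is based on the Polish Sejm Corpus, lang: Polish, iterations: 3,000+, ...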
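Common grounding is the process of creating, repairing, and updating mutual understanding, which is a critical aspect of sophisticated human communication. However, traditional dialogue systems have limited ability to establish common ground, ...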
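The dataset contains negative feedback from customers in which they state why they are dissatisfied with a given company. The dataset is available in English and Italian, lang: Italian, English ...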
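Scientific literature is growing faster than ever. With the ever-increasing number of publications and the growing diversity of fields of expertise, finding an expert in a particular scientific domain has never been as ...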
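To obtain high-quality sentence embeddings from pretrained language models, they must either be augmented with additional pretraining objectives or fine-tuned on a large set of labeled text pairs. While the latter ...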
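Given a small set of seed entities (e.g., "USA", "Russia"), corpus-based set expansion aims to induce an extensive set of entities that share the same semantic class (in this example ...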