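This dataset provides labeled humor detection data from product question answering systems. The dataset contains 3 csv files: Humorous.csv containing the humorous product questions, Non-humorous-unbiased.csv ...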
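A set of sentences extracted from customer reviews, labeled with their helpfulness scores.
This dataset provides additional annotations on top of the publicly released Topical-Chat dataset (https://github.com/alexa/Topical-Chat), which will help reproduce our ...
The DROP dataset contains 96k question-answer (QA) pairs over 6.7K paragraphs, split between train (77k QAs), development (9.5k QAs), and a hidden test partition (9.5k Q ...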
This bucket contains the checkpoints used to reproduce the baseline results reported in the DialoGLUE benchmark host ...
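Sentence classification data with ASR errors.
Original Stack Exchange answers and their voice-friendly reformulations.
Amazon product questions and their answers, along with public product information.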
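The REaltime DAta Synthesis and Analysis (REDASA) COVID-19 snapshot contains the output of the curation protocol produced by our curator community. A detailed description can be found in our paper ...
MIMIC-III ("Medical Information Mart for Intensive Care") is a large, single-center database comprising information relating to patients admitted to critical care units at a large tertiary care hospital. Data ...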
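Japanese tokenizer dictionaries for use with MeCab.
Japanese dictionaries and word embeddings for natural language processing. SudachiDict is the dictionary for the Japanese tokenizer (morphological analyzer) Sudachi. chiVe is a set of Japanese pretrained word embeddings (word vec ...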
A corpus of web crawl data composed of over 50 billion web pages.
Dataset contains news articles and their summaries. lang: Polish, iterations: 723, file_type: TSV, tasks: Summarization
Dataset contains 200 abstracts including a representative sample of all PubMed citations relevant to DNA methylation ...
There is an increasing demand for sentiment analysis of text from social media, which is mostly code-mixed. Systems ...
This paper presents a new challenging information extraction task in the domain of materials science. We develop an ...
The lack of publicly accessible text corpora is a major obstacle for progress in natural language processing. For me ...
This paper presents two colloquial Sinhala language corpora from the language efforts of the Data, Analysis and Poli ...
As AI systems become an increasing part of people's everyday lives, it becomes ever more important that they underst ...