Large-Scale - Page 5 - Data Mart

NLP

49

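DROP: Discrete Reasoning Over the Content of Paragraphs

The DROP dataset contains 96k question-answer (QA) pairs over 6.7K paragraphs, split into training (77k QAs), development (9.5k QAs), and hidden test partitions (9.5k Q ...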

Khan

2021-08-24

NLP

46

Amazon-PQA

Amazon product questions and their answers, along with public product information.

Khan

2021-08-24

NLP

287

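MIMIC-III ("Medical Information Mart for Intensive Care")

MIMIC-III ("Medical Information Mart for Intensive Care") is a large, single-center database containing information about patients admitted to the critical care units of a large tertiary care hospital. Data ...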

Khan

2021-08-24

NLP

124

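Sudachi Language Resources

Japanese dictionaries and word embeddings for natural language processing. SudachiDict is the dictionary for Sudachi, a Japanese tokenizer (morphological analyzer). chiVe is a set of Japanese pretrained word embeddings (word ...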

Khan

2021-08-24

NLP

165

The SOFC-Exp Corpus and Neural Approaches to Information Extraction in the Materials Science Domain

This paper presents a new challenging information extraction task in the domain of materials science. We develop an ...

Khan

2021-08-24

NLP

103

Sinhala Language Corpora and Stopwords from a Decade of Sri Lankan Facebook

This paper presents two colloquial Sinhala language corpora from the language efforts of the Data, Analysis and Poli ...

Khan

2021-08-24

NLP

127

Scruples: A Corpus of Community Ethical Judgments on 32,000 Real-Life Anecdotes

As AI systems become an increasing part of people's everyday lives, it becomes ever more important that they underst ...

Khan

2021-08-24

NLP

131

CaSiNo: A Corpus of Campsite Negotiation Dialogues for Automatic Negotiation Systems

Automated systems that negotiate with humans have broad applications in pedagogy and conversational AI. To advance t ...

Khan

2021-08-24

NLP

103

A Large-Scale Multilingual Dataset of Misspellings and Grammatical Errors

The lack of large-scale datasets has been a major hindrance to the development of NLP tasks such as spelling correct ...

Khan

2021-08-24

NLP

53

Open Research Corpus

The dataset contains over 39 million published research papers in computer science, neuroscience, and biomedicine.

Khan

2021-08-24