NLP - Page 2 - Data Mart

NLP

46

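CSI Corpus

Dutch. The corpus contains two types of student texts: essays and reviews. It includes metadata on the authors (gender, age, sexual orientation, region of origin, personality profile) and the documents (time, ...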

Khan

2021-08-24

NLP

144

ZEST: ZEroShot learning from Task descriptions

ZEST is a benchmark for zero-shot generalization to unseen NLP tasks, with 25K labeled instances across 1,251 differ ...

Khan

2021-08-24

NLP

82

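ParaCrawl: Web-Scale Parallel Corpora for Official European Languages

ParaCrawl is a set of large parallel corpora pairing English with all official EU languages, built through a broad web-crawling effort. Starting from identifying websites with translated text ...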

Khan

2021-08-24

NLP

57

NLP - fast.ai datasets

Some of the most important datasets for NLP, with a focus on classification, including IMDb, AG-News, Amazon Reviews ...

Khan

2021-08-24

Other software

121

Google Books Ngrams

N-grams are fixed size tuples of items. In this case the items are words extracted from the Google Books corpus. The ...

Khan

2021-08-24
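
To illustrate the "fixed size tuples of items" definition in the Google Books Ngrams entry above, here is a minimal sketch that extracts word n-grams from a token list and counts them. The `ngrams` helper, the sample sentence, and the whitespace tokenization are assumptions made for illustration; they do not reflect how the Google Books Ngrams data was actually produced.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every contiguous run of n tokens as a tuple (hypothetical helper)."""
    if n <= 0:
        raise ValueError("n must be positive")
    # A sequence of k tokens yields k - n + 1 n-grams (none if k < n).
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Assumption: naive whitespace tokenization of a made-up sentence.
words = "the quick brown fox jumps over the quick brown dog".split()

# Count how often each 2-gram (bigram) occurs.
bigram_counts = Counter(ngrams(words, 2))

for gram, count in bigram_counts.most_common(3):
    print(" ".join(gram), count)
```

The same helper works for any `n`; with `n=1` it returns single-word tuples, and a sequence shorter than `n` yields an empty list.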

NLP

81

Aristo Tuple KB

294,000 science-relevant tuples

Khan

2021-08-24

NLP

224

NIH NCBI PMC Article Dataset

The PMC Open Access (OA) Subset, containing the articles in PMC that have machine-readable ...

Khan

2021-08-24

NLP

127

Japanese Tokenizer Dictionaries

Japanese tokenizer dictionaries for use with MeCab.

Khan

2021-08-24

NLP

136

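Sudachi Language Resources

Japanese dictionaries and word embeddings for natural language processing. SudachiDict is the dictionary for Sudachi, a Japanese tokenizer (morphological analyzer). chiVe is a set of Japanese pretrained word embeddings (word ...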

Khan

2021-08-24

NLP

130

Common Crawl

A corpus of web crawl data composed of over 50 billion web pages.

Khan

2021-08-24