Extended dataset with text from 50,000 articles (free)

jsaifc 17 2021-09-06 Text corpus



This dataset is identical to the original Gun Violence dataset, except that I have added the text of 50k randomly selected articles, nearly 1/5 of all the articles it references. Many of these texts are corrupt or the articles are no longer accessible, but about 25k are usable for different types of analysis.

