Seinfeld Text Corpus

Resource description


Context

Seinfeld is my favorite TV show. I wrote a script that scrapes the scripts of all Seinfeld episodes from [seinology.com](http://www.seinology.com) and merges them into a single text corpus so that I could train a language model. The source code is available [here](https://github.com/luonglearnstocode/Seinfeld-text-corpus). I hope you find it useful; any feedback would be appreciated.

Content

corpus.txt: the corpus, 717576 words across 64919 lines of Seinfeld scripts, ready for training language models.
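As a rough illustration of how the corpus might be used, here is a minimal Python sketch that loads the file, reports its size, and trains a toy bigram model. It assumes corpus.txt sits in the working directory and is UTF-8 text; the function names (`corpus_stats`, `train_bigram_model`, `most_likely_next`) are made up for this example and are not part of the linked repository. Words are split on whitespace only, so the counts it prints may differ from the figures quoted above.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from pathlib import Path


def corpus_stats(text: str) -> tuple[int, int]:
    """Return (word_count, line_count) using plain whitespace splitting."""
    return len(text.split()), len(text.splitlines())


def train_bigram_model(text: str) -> dict[str, Counter]:
    """Count, for every word, how often each other word follows it."""
    words = text.lower().split()
    model: dict[str, Counter] = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        model[current][following] += 1
    return model


def most_likely_next(model: dict[str, Counter], word: str) -> str | None:
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


if __name__ == "__main__":
    # Assumed location of the corpus file; adjust the path as needed.
    path = Path("corpus.txt")
    text = path.read_text(encoding="utf-8", errors="replace")

    words, lines = corpus_stats(text)
    print(f"{words} words, {lines} lines")

    model = train_bigram_model(text)
    print("Most likely word after 'jerry':", most_likely_next(model, "jerry"))
```

A bigram counter is about the simplest language model there is. It is used here only to show the loading and tokenising steps; a neural model would replace `train_bigram_model` while reading the file the same way.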
