News Data (Free)

jsaifc 17 2021-09-02 Text Corpus

Resource Description

Image 1: News Data (http://ds.jsai.org.cn/), Text Corpus

Nowadays it is a demanding task to generate a title for a real-time article such as a news story or a comment. The need for real-time data keeps growing if models are to reach their full potential, and the goal here is to summarize any piece of content in 10-15 words.

The dataset is very simple and consists of two columns, title and content:
- title: the title of the news item, around 10-15 words long.
- content: the body of the news item, around 55-65 words long.

This dataset was prepared in Jan 2019, so the data is very recent and contains many new technology terms. Please handle all of these new terms carefully during preprocessing.

Recently I was working on a text summarization + classification problem, and I decided to use real-time data rather than a toy dataset. I wrote a couple of scrapers to collect the data, and after a week I was able to build a couple of good models with it. I am sharing the data so that the public can benefit from it as well. All the very best!
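The page does not name the file format or provide a loader. The following is a minimal sketch that assumes the data is a CSV file with a header row naming the two columns `title` and `content`. It reads the pairs, flags rows whose word counts fall outside the approximate 10-15 and 55-65 word ranges given in the description, and lists tokens missing from a reference vocabulary. That last step is one simple way to find the new terms that need care during preprocessing. All function names and the sample text are hypothetical illustrations, not part of the dataset.

```python
import csv
import io
import re
from typing import Iterable, List, Set, Tuple

# Assumption: a CSV with a header row "title,content" (not stated on the page).
def load_pairs(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Read (title, content) pairs from CSV text, skipping rows with an empty field."""
    pairs = []
    for row in csv.DictReader(lines):
        title = (row.get("title") or "").strip()
        content = (row.get("content") or "").strip()
        if title and content:
            pairs.append((title, content))
    return pairs

def word_count(text: str) -> int:
    """Count whitespace-separated words."""
    return len(text.split())

def outside_range(pairs, title_range=(10, 15), content_range=(55, 65)):
    """Return pairs whose word counts fall outside the approximate ranges
    stated in the description (flagged for review, not dropped)."""
    t_lo, t_hi = title_range
    c_lo, c_hi = content_range
    return [
        (t, c) for t, c in pairs
        if not (t_lo <= word_count(t) <= t_hi and c_lo <= word_count(c) <= c_hi)
    ]

# Keeps hyphenated or dotted terms such as "Wi-Fi" together as one token.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+(?:[-'.][A-Za-z0-9]+)*")

def tokenize(text: str) -> List[str]:
    """Lowercase alphanumeric tokens, so terms like '5G' become '5g'."""
    return [t.lower() for t in TOKEN_RE.findall(text)]

def unknown_terms(pairs, vocab: Set[str]) -> List[str]:
    """Sorted tokens that appear in the data but not in a reference vocabulary."""
    seen = set()
    for title, content in pairs:
        seen.update(tokenize(title))
        seen.update(tokenize(content))
    return sorted(seen - set(vocab))

if __name__ == "__main__":
    # Made-up sample for illustration only; the second row has no title and is skipped.
    sample = (
        "title,content\n"
        "Example headline about 5G phones,Example body text mentioning 5G and Wi-Fi.\n"
        ",missing title row\n"
    )
    pairs = load_pairs(io.StringIO(sample))
    print(len(pairs), "usable pair(s)")
    print("outside expected lengths:", len(outside_range(pairs)))
    vocab = {"example", "headline", "about", "phones", "body", "text", "mentioning", "and"}
    print("unknown terms:", unknown_terms(pairs, vocab))
```

Length checks only flag rows because the description says the lengths are "around" these values. How to treat unknown terms (add them to the vocabulary, map them to a placeholder token, or keep them as subwords) depends on the model you train.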
