COVID-19 Tweets: a free dataset of more than 600,000 tweets about the novel coronavirus

jsaifc 16 2021-08-29 Medical Images

Resource Description

This dataset contains 653,996 tweets about the coronavirus, identified by hashtags such as #COVID-19, #COVID19, #COVID, #Coronavirus, #NCoV and #Corona. The tweets were collected over four weeks, from 27 February to 25 March 2020.

The tweets were posted by 390,458 users from 133 different countries and were written in 61 languages. English is the most used language, with almost 400k tweets, followed by Spanish with around 80k tweets.

The data is stored as a CSV file in which each line represents one tweet. The file provides the following fields:

  • Author: the user who posted the tweet
  • Recipient: the name of the user being replied to when the tweet is a reply; otherwise the same value as the Author field
  • Tweet: the full content of the tweet
  • Hashtags: the list of hashtags present in the tweet
  • Language: the language of the tweet
  • Relationship: gives information on the type of the tweet, whether it is a retweet, a reply, a tweet with a mention, etc.
  • Location: the country of the author of the tweet, which is unfortunately not always available
  • Date: the publication date of the tweet
  • Source: the device or platform used to send the tweet
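
As a rough illustration of how these fields might be read, the sketch below counts tweets per language using Python's standard csv module. It assumes the CSV header uses exactly the field names listed above, which the description does not confirm. The three sample rows, the language codes "en" and "es", and the relationship value "Tweet" are made up for the example and stand in for the real file.

```python
import csv
import io
from collections import Counter

# Assumed header names, copied from the field list above; the real file may differ.
FIELDS = ["Author", "Recipient", "Tweet", "Hashtags", "Language",
          "Relationship", "Location", "Date", "Source"]


def count_languages(csv_file):
    """Count tweets per language from an open CSV file object."""
    reader = csv.DictReader(csv_file)
    missing = set(FIELDS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing expected columns: {sorted(missing)}")
    counts = Counter()
    for row in reader:
        counts[row["Language"]] += 1
    return counts


# Tiny made-up sample standing in for the real file (values are hypothetical).
SAMPLE = """Author,Recipient,Tweet,Hashtags,Language,Relationship,Location,Date,Source
alice,alice,Stay safe #COVID19,COVID19,en,Tweet,,2020-03-01,Twitter Web App
bob,alice,Gracias #Coronavirus,Coronavirus,es,Replies to,Spain,2020-03-02,Twitter for Android
carol,carol,Wash your hands #COVID,COVID,en,Tweet,Canada,2020-03-03,Twitter for iPhone
"""

language_counts = count_languages(io.StringIO(SAMPLE))
print(language_counts.most_common())
```

To run this against the downloaded file, replace the io.StringIO(SAMPLE) argument with a file opened via open(path, newline="", encoding="utf-8"), where path is wherever the CSV was saved.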

The dataset can also be used to construct a social graph, since it includes the relations "Replies to", "Retweet", "MentionsInRetweet" and "Mentions".
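
One possible way to derive such a graph is sketched below: each row becomes a directed, weighted edge from Author to Recipient, labelled with its relation. This is only a sketch. It assumes the Recipient field also holds the retweeted or mentioned user for the "Retweet", "Mentions" and "MentionsInRetweet" relations, which the description states explicitly only for replies. The sample rows and the relationship value "Tweet" are hypothetical.

```python
from collections import defaultdict

# Relation names quoted in the dataset description.
GRAPH_RELATIONS = {"Replies to", "Retweet", "MentionsInRetweet", "Mentions"}


def build_edges(rows):
    """Return {(author, recipient, relation): weight} for graph relations."""
    edges = defaultdict(int)
    for row in rows:
        relation = row["Relationship"]
        # Skip plain tweets, where Recipient simply repeats Author.
        if relation in GRAPH_RELATIONS and row["Author"] != row["Recipient"]:
            edges[(row["Author"], row["Recipient"], relation)] += 1
    return dict(edges)


# Hypothetical rows, shaped like the output of csv.DictReader on the dataset.
rows = [
    {"Author": "bob", "Recipient": "alice", "Relationship": "Replies to"},
    {"Author": "bob", "Recipient": "alice", "Relationship": "Replies to"},
    {"Author": "carol", "Recipient": "alice", "Relationship": "Retweet"},
    {"Author": "dave", "Recipient": "dave", "Relationship": "Tweet"},
]

edges = build_edges(rows)
for (source, target, relation), weight in sorted(edges.items()):
    print(source, "->", target, relation, weight)
```

The resulting edge dictionary can be passed to a graph library of choice; keeping the relation in the edge key makes it possible to analyse replies, retweets and mentions separately.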
