11-11.mn Dataset (Free)

jsaifc 30 2021-09-06 Text corpus

Resource description

[Image: 11-11.mn dataset (http://ds.jsai.org.cn/), text corpus]

Context

A year ago, I did a fairly simple exploratory data analysis of the 1111.mn dataset. Since then, I have been the only person with access to it. I kept wondering: what if I released the dataset to the public so that researchers and data scientists could benefit from it? Today I'm thrilled to announce the release of the first, and largest, open dataset in Mongolian.

1111.mn is a website that connects government agencies with the public. Anyone can submit a ticket containing a complaint, a criticism, or simply a request, and the ticket is then forwarded to the relevant government agency.

Content

The dataset has 80036 records covering 2012-10-13 to 2018-11-12 (about 6 years). It has 6 fields:
- agency: Name of the government agency to which a 1111.mn agent forwarded the ticket.
- content: Text of the ticket.
- created_at: Date and time the ticket was created. I had to preprocess this field because the original values were in natural-language form ('1 day ago', etc.). I used maya, an excellent datetime library, for the conversion.
- source_text: Source of the ticket, i.e. whether it was submitted by phone, in person, or through the
- status_text: Status of the ticket, i.e. whether the issue has been resolved.
- type_text: Type of ticket, such as complaint or request.

Acknowledgements

The data was scraped from 11-11.mn.

Disclaimer

I don't own the dataset, so its license is unclear. I am releasing it for research purposes only.
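The created_at preprocessing described above turns relative phrases such as '1 day ago' into absolute timestamps. The author did this with maya. The sketch below is a standard-library stand-in, not maya's API and not the author's code. It rests on several assumptions: raw scraped records are dicts keyed by the six field names, each relative phrase is resolved against the (hypothetical) time the page was scraped, and months and years are approximated as 30 and 365 days. The names `parse_relative`, `Ticket`, and `ticket_from_raw` are hypothetical.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta

# Seconds per unit for the relative phrases this sketch understands.
# Assumption: months and years are approximated as 30 and 365 days.
_UNIT_SECONDS = {
    "second": 1,
    "minute": 60,
    "hour": 3600,
    "day": 86400,
    "week": 7 * 86400,
    "month": 30 * 86400,
    "year": 365 * 86400,
}

# Matches phrases like "1 day ago", "3 hours ago", "an hour ago".
_RELATIVE = re.compile(
    r"^\s*(\d+|an?)\s+(second|minute|hour|day|week|month|year)s?\s+ago\s*$",
    re.IGNORECASE,
)


def parse_relative(text: str, scraped_at: datetime) -> datetime:
    """Convert a relative phrase into an absolute datetime.

    Relative phrases only make sense with respect to the moment the page
    was scraped, so that moment must be supplied as `scraped_at`.
    """
    match = _RELATIVE.match(text)
    if match is None:
        raise ValueError(f"unrecognized relative date: {text!r}")
    amount_text, unit = match.groups()
    # "a day ago" / "an hour ago" mean an amount of 1.
    amount = 1 if amount_text.lower() in ("a", "an") else int(amount_text)
    return scraped_at - timedelta(seconds=amount * _UNIT_SECONDS[unit.lower()])


@dataclass
class Ticket:
    agency: str          # government agency the ticket was forwarded to
    content: str         # text of the ticket
    created_at: datetime # absolute creation time after preprocessing
    source_text: str     # submission channel (phone, in person, ...)
    status_text: str     # whether the issue has been resolved
    type_text: str       # complaint, request, ...


def ticket_from_raw(raw: dict, scraped_at: datetime) -> Ticket:
    """Build a Ticket from a raw record whose created_at is still relative."""
    return Ticket(
        agency=raw["agency"],
        content=raw["content"],
        created_at=parse_relative(raw["created_at"], scraped_at),
        source_text=raw["source_text"],
        status_text=raw["status_text"],
        type_text=raw["type_text"],
    )
```

For example, with a scrape time of 2018-11-12 12:00, `parse_relative("1 day ago", scraped_at)` yields 2018-11-11 12:00. Relative phrases are coarse ('1 month ago' could mean any day within that month), so converted timestamps are only as precise as the original phrase.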
