### Content

The dataset contains 10,000 records retrieved by searching for 'matrix factorization' through the arXiv API. Each record contains, among other fields, the title, author information, summary and article URL. See [this notebook][1] for more detailed information.

### Inspiration

I wanted to try out text mining methods from [gensim][2].

### Acknowledgements

I obtained the records using the [arxiv.py][3] library.

[1]: https://www.kaggle.com/lambdaofgod/arxiv-api-text-mining
[2]: https://radimrehurek.com/gensim/
[3]: https://github.com/lukasschwab/arxiv.py
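
For readers curious how records like these can be pulled from the arXiv API, here is a rough sketch using only the Python standard library. It is not the code used to build this dataset (that was done with arxiv.py). The `all:` search field, the helper names `build_query_url` and `parse_feed`, and the `SAMPLE_FEED` entry are assumptions made up for illustration; the sample entry is a placeholder, not a real paper.

```python
import urllib.parse
import xml.etree.ElementTree as ET

ARXIV_API = "http://export.arxiv.org/api/query"
ATOM = {"atom": "http://www.w3.org/2005/Atom"}  # the API returns Atom feeds


def build_query_url(phrase, start=0, max_results=100):
    """Build an arXiv API URL for a phrase search (the 'all:' field is an assumption)."""
    params = {
        "search_query": f'all:"{phrase}"',
        "start": start,
        "max_results": max_results,
    }
    return ARXIV_API + "?" + urllib.parse.urlencode(params)


def _text(node, tag):
    """Return the text of a child element with runs of whitespace collapsed."""
    return " ".join(node.findtext(tag, default="", namespaces=ATOM).split())


def parse_feed(xml_text):
    """Extract title, authors, summary and URL from each entry of an Atom feed."""
    root = ET.fromstring(xml_text)
    records = []
    for entry in root.findall("atom:entry", ATOM):
        records.append({
            "title": _text(entry, "atom:title"),
            "authors": [_text(a, "atom:name")
                        for a in entry.findall("atom:author", ATOM)],
            "summary": _text(entry, "atom:summary"),
            "url": _text(entry, "atom:id"),
        })
    return records


# Hypothetical placeholder feed, only here to show the expected shape.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>http://arxiv.org/abs/0000.00000</id>
    <title>Placeholder
      title</title>
    <summary>Placeholder summary text.</summary>
    <author><name>A. Author</name></author>
    <author><name>B. Author</name></author>
  </entry>
</feed>"""

if __name__ == "__main__":
    print(build_query_url("matrix factorization", max_results=5))
    print(parse_feed(SAMPLE_FEED))
```

To fetch live results, the URL from `build_query_url` can be opened with `urllib.request.urlopen` and the response body passed to `parse_feed`; paging through `start` in steps of `max_results` is one way to collect a larger set of records.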