Valerio Perrone
v.perrone '@' warwick.ac.uk
Department of Statistics
University of Warwick
Coventry (UK)
Data Set Information:
The dataset is an 11463 x 5812 matrix of word counts covering 11463 words and 5811 NIPS conference papers (the first column contains the list of words). Each remaining column corresponds to one document and gives the number of times each word appears in that document. The column names identify each document and its timestamp in the following format: Xyear_paperID.
The matrix of word counts was obtained using the R package 'tm' to process the raw .txt files of the full text of the NIPS conference papers published between 1987 and 2015. The document-term matrix was constructed after tokenization, removal of stopwords, and truncation of the vocabulary by only keeping words occurring more than 50 times.
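The preprocessing steps described above (tokenization, stopword removal, vocabulary truncation) can be illustrated with a minimal Python sketch. This is not the R pipeline used to build the dataset: the tokenizer, the tiny stopword set, and the function names are hypothetical stand-ins. The sketch also assumes that "occurring more than 50 times" refers to the total count across all documents.

```python
import re
from collections import Counter

# Illustrative stand-in only; NOT the stopword list used to build the dataset.
STOPWORDS = {"the", "a", "of", "and", "is"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (assumed rule)."""
    return re.findall(r"[a-z]+", text.lower())


def build_word_count_matrix(documents, min_total_count=50):
    """Return (vocabulary, matrix) with one row per word, one column per document.

    A word is kept only if its total count over all documents is strictly
    greater than min_total_count (assumed reading of the truncation rule).
    """
    per_doc = [
        Counter(tok for tok in tokenize(doc) if tok not in STOPWORDS)
        for doc in documents
    ]
    totals = Counter()
    for counts in per_doc:
        totals.update(counts)
    vocabulary = sorted(w for w, n in totals.items() if n > min_total_count)
    matrix = [[counts[w] for counts in per_doc] for w in vocabulary]
    return vocabulary, matrix


# Toy corpus with a low threshold so the example stays small.
docs = ["the model of the network", "a network and a kernel network"]
vocab, matrix = build_word_count_matrix(docs, min_total_count=1)
print(vocab)
print(matrix)
```

Rows are words and columns are documents, matching the orientation of the dataset's matrix.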
Attribute Information:
Column 1: 'X' (list of words)
Columns 2-5812: 'Xyear_ID' (timestamp and paper ID)
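Because each document column encodes its timestamp and paper ID in its name, the labels can be split programmatically. The following Python sketch assumes the labels consist of a literal 'X', a four-digit year, an underscore, and the paper ID; the example labels and function names are hypothetical and are not taken from the data file.

```python
import re

# Assumed label layout: 'X' + four-digit year + '_' + paper ID.
COLUMN_PATTERN = re.compile(r"^X(\d{4})_(\w+)$")


def parse_column_name(name):
    """Split a label such as 'X1987_1' into (year, paper_id)."""
    match = COLUMN_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unexpected column label: {name!r}")
    return int(match.group(1)), match.group(2)


def papers_per_year(column_names):
    """Count how many document columns fall in each year."""
    counts = {}
    for name in column_names:
        year, _paper_id = parse_column_name(name)
        counts[year] = counts.get(year, 0) + 1
    return counts


# Hypothetical labels for illustration only.
example_columns = ["X1987_1", "X1987_2", "X2015_5811"]
print(papers_per_year(example_columns))
```

The first column ('X', the word list) does not match the pattern and should be excluded before parsing.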
Relevant Papers:
Perrone V., Jenkins P. A., Spano D., Teh Y. W. (2016). Poisson Random Fields for Dynamic Feature Models.
Citation Request:
If you use this data, please cite 'Poisson Random Fields for Dynamic Feature Models'. Perrone V., Jenkins P. A., Spano D., Teh Y. W. (2016).