Lung Cancer Dataset - Data Mart

Resource Description

Creators:

BUPA Medical Research Ltd.

Donor:

Richard S. Forsyth
8 Grosvenor Avenue
Mapperley Park
Nottingham NG3 5DX
0602-621676

Data Set Information:

The first 5 variables are all blood tests which are thought to be sensitive to liver disorders that might arise from excessive alcohol consumption. Each line in the dataset constitutes the record of a single male individual.

Important note: The 7th field (selector) has been widely misinterpreted in the past as a dependent variable representing presence or absence of a liver disorder. This is incorrect [1]. The 7th field was created by BUPA researchers as a train/test selector. It is not suitable as a dependent variable for classification. The dataset does not contain any variable representing presence or absence of a liver disorder. Researchers who wish to use this dataset as a classification benchmark should follow the method used in experiments by the donor (Forsyth & Rada, 1986, Machine learning: applications in expert systems and information retrieval) and others (e.g. Turney, 1995, Cost-sensitive classification: Empirical evaluation of a hybrid genetic decision tree induction algorithm), who used the 6th field (drinks), after dichotomising, as a dependent variable for classification. Because of widespread misinterpretation in the past, researchers should take care to state their method clearly.

Attribute Information:

1. mcv mean corpuscular volume
2. alkphos alkaline phosphatase
3. sgpt alanine aminotransferase
4. sgot aspartate aminotransferase
5. gammagt gamma-glutamyl transpeptidase
6. drinks number of half-pint equivalents of alcoholic beverages drunk per day
7. selector field created by the BUPA researchers to split the data into train/test sets
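
The note above recommends using the 6th field (drinks), after dichotomising, as the dependent variable, and treating the 7th field (selector) only as a train/test split. A minimal sketch of that preparation step follows. It rests on several assumptions that are not stated in this description: the file is comma-separated with the seven fields in the order listed, the selector takes the values 1 and 2, and the cut-off `DRINKS_THRESHOLD` is a hypothetical value chosen purely for illustration. The helper names and the sample rows are invented for this example and are not taken from the dataset.

```python
import csv
import io

# Column names follow the attribute list above.
COLUMNS = ["mcv", "alkphos", "sgpt", "sgot", "gammagt", "drinks", "selector"]

# ASSUMPTION: hypothetical cut-off for dichotomising `drinks`.
# The description does not state one; report whichever value you use.
DRINKS_THRESHOLD = 3.0


def load_records(text):
    """Parse comma-separated rows (assumed format) into dicts keyed by COLUMNS."""
    records = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        values = [float(v) for v in row]
        records.append(dict(zip(COLUMNS, values)))
    return records


def make_xy(records, threshold=DRINKS_THRESHOLD):
    """Features are the five blood tests; the label is dichotomised `drinks`."""
    X = [[r[c] for c in COLUMNS[:5]] for r in records]
    y = [1 if r["drinks"] > threshold else 0 for r in records]
    return X, y


def split_by_selector(records, train_value=1):
    """Use `selector` only to split the data, never as a label.

    ASSUMPTION: selector == 1 marks training rows and any other
    value marks test rows.
    """
    train = [r for r in records if int(r["selector"]) == train_value]
    test = [r for r in records if int(r["selector"]) != train_value]
    return train, test


# Invented rows for illustration only, NOT taken from the dataset.
SAMPLE = """\
80,60,30,25,20,0.0,1
91,75,40,30,55,6.0,2
87,65,22,21,15,0.5,1
93,80,28,26,35,4.0,2
"""

records = load_records(SAMPLE)
train, test = split_by_selector(records)
X_train, y_train = make_xy(train)
X_test, y_test = make_xy(test)
print(len(train), len(test), y_train, y_test)
```

Keeping the threshold as an explicit parameter makes it easy to state the chosen method clearly, which the note above asks of anyone using this dataset as a classification benchmark.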

Relevant Papers:

McDermott & Forsyth 2016, Diagnosing a disorder in a classification benchmark, Pattern Recognition Letters, Volume 73.
