Context
The proteomics dataset was derived from SWISS-PROT database release 42 (2003–2004) by extracting all animal, fungal and plant protein sequences.

Content
The dataset contains 5959 proteins, each annotated with one of 11 subcellular locations: chloroplast, cytoplasm, endoplasmic reticulum, extracellular space, Golgi apparatus, lysosome, mitochondrion, nucleus, peroxisome, plasma membrane and vacuole. Plant and fungal proteins can be located in vacuoles, whereas animal proteins share the other locations with them but have lysosomes instead; chloroplasts occur only in plant cells. The only variable we intend to consider is the protein sequence.
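Because the protein sequence is the only variable considered, any model has to start by turning each sequence into numbers. The sketch below shows one common and simple way to do that, amino acid composition. It is an illustration only, not the method used to build this dataset. The names `LOCATIONS`, `AMINO_ACIDS` and `amino_acid_composition` and the example sequence are hypothetical, and the description does not state the file layout, so no file loading is shown.

```python
from collections import Counter

# The 11 subcellular location labels named in the dataset description.
LOCATIONS = [
    "chloroplast", "cytoplasm", "endoplasmic reticulum", "extracellular space",
    "Golgi apparatus", "lysosome", "mitochondrion", "nucleus",
    "peroxisome", "plasma membrane", "vacuole",
]

# One-letter codes of the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> dict[str, float]:
    """Return the fraction of each standard amino acid in a sequence.

    Non-standard or ambiguous symbols (for example X or B) are ignored.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        # Empty or fully non-standard sequence: all fractions are zero.
        return {aa: 0.0 for aa in AMINO_ACIDS}
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


if __name__ == "__main__":
    # Hypothetical short sequence, used only to show the call.
    composition = amino_acid_composition("MKKLAVLGA")
    print({aa: round(f, 3) for aa, f in composition.items() if f > 0})
```

The resulting 20-value vector has the same length for every protein regardless of sequence length, which makes it a convenient baseline input for a classifier over the 11 locations.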