We built our own dataset, IITM-HeTra, from cameras monitoring road traffic in Chennai, India. To keep the data temporally uncorrelated, we sampled one frame every two seconds from multiple video streams. We extracted 2400 frames in total and manually labeled all of them by vehicle category. After careful scrutiny and removal of unclear images, 1417 frames remained.

We initially defined eight vehicle classes commonly seen in Indian traffic. A few of these classes were visually similar, and two had few labeled instances; these were merged into similar-looking classes. For example, the separate categories for small car, SUV, and sedan were merged under the light motor vehicle (LMV) category.

The collected dataset contains 6319 labeled vehicles: 3294 two-wheelers, 279 heavy motor vehicles (HMV), 2148 cars, and 598 auto-rickshaws. A second dataset was created by merging cars and auto-rickshaws into a single light motor vehicle (LMV) class. Approximately 25.2\% of vehicles were occluded.

We thank the Interdisciplinary Lab for Data Sciences funded by IIT Madras and Robert Bosch Centre for Data Science and AI (RBC-DSAI) for supporting this research.

If you use this dataset in your work, please cite the following paper:

@inproceedings{mittal2018training,
  title={Training a deep learning architecture for vehicle detection using limited heterogeneous traffic data},
  author={Mittal, Deepak and Reddy, Avinash and Ramadurai, Gitakrishnan and Mitra, Kaushik and Ravindran, Balaraman},
  booktitle={2018 10th International Conference on Communication Systems \& Networks (COMSNETS)},
  pages={589--294},
  year={2018},
  organization={IEEE}
}
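
The class counts reported in the description can be checked, and the merged LMV variant reproduced, with a minimal sketch like the one below. The counts are taken from the text; the label strings and the names `CLASS_COUNTS`, `MERGE_TO_LMV`, `merge_classes`, and `class_shares` are illustrative assumptions and do not reflect the dataset's actual annotation format.

```python
# Per-class vehicle counts as reported in the dataset description.
# The label strings are illustrative; the annotation files may name them differently.
CLASS_COUNTS = {
    "two-wheeler": 3294,
    "HMV": 279,
    "car": 2148,
    "auto-rickshaw": 598,
}

# Assumed mapping for the second dataset variant: cars and
# auto-rickshaws are folded into a single LMV class.
MERGE_TO_LMV = {"car": "LMV", "auto-rickshaw": "LMV"}


def merge_classes(counts, mapping):
    """Return new counts with labels renamed according to `mapping`."""
    merged = {}
    for label, n in counts.items():
        target = mapping.get(label, label)  # unmapped labels keep their name
        merged[target] = merged.get(target, 0) + n
    return merged


def class_shares(counts):
    """Return each class's share of all labeled vehicles, in percent."""
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()}


total_vehicles = sum(CLASS_COUNTS.values())
merged_counts = merge_classes(CLASS_COUNTS, MERGE_TO_LMV)
shares = class_shares(CLASS_COUNTS)
```

Here `total_vehicles` matches the 6319 labeled vehicles stated in the description, and `shares` makes the class imbalance explicit: two-wheelers account for roughly 52% of all labeled vehicles, while HMVs account for only about 4%.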