Resource Overview

Here, we disseminate a new handwritten digits dataset for the Kannada script, termed Kannada-MNIST, that can potentially serve as a direct drop-in replacement for the original MNIST dataset.

Data Collection

This dataset is based on the efforts of 65 volunteers from Bangalore, India, who are native speakers and users of the Kannada language and script. It was curated to serve as a direct one-to-one drop-in replacement for the original MNIST dataset (akin to the Fashion-MNIST and K-MNIST datasets).

The 65 volunteers were recruited in Bangalore, India; all were native speakers of the language as well as day-to-day users of the numeral script. Each volunteer filled out an A3 sheet containing a 32 × 40 grid, so each completed sheet held 128 instances of each digit, which we posit is large enough to capture most of the natural intra-volunteer variation in glyph shapes. All of the sheets were scanned at 600 dots-per-inch resolution using a Konica Accurio-Press-C6085 scanner, yielding 65 PNG images of 4963 × 3509 pixels.
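The sheet arithmetic above can be checked with a short sketch. It assumes ten digit classes (0 to 9) spread evenly over the grid and every cell filled in; the variable names are illustrative and not taken from the dataset's tooling.

```python
# Figures stated in the text.
VOLUNTEERS = 65
GRID_ROWS, GRID_COLS = 32, 40

# Assumption: ten digit classes (0-9), equally represented on each sheet.
DIGIT_CLASSES = 10

# Cells on one A3 sheet.
cells_per_sheet = GRID_ROWS * GRID_COLS

# Instances of each digit written by one volunteer.
per_digit_per_sheet = cells_per_sheet // DIGIT_CLASSES

# Upper bound on raw glyphs collected, assuming every cell was filled.
total_raw_glyphs = VOLUNTEERS * cells_per_sheet

print(cells_per_sheet, per_digit_per_sheet, total_raw_glyphs)
```

Under these assumptions a sheet holds 1280 cells, which matches the 128 instances per digit quoted above, and the 65 sheets together hold at most 83,200 raw glyphs.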

Data Format

The main Kannada-MNIST dataset consists of a training set of 60,000 gray-scale sample images, each 28 × 28 pixels.
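To make the stated dimensions concrete, here is a minimal sketch of splitting a flat pixel buffer into 28 × 28 images. The on-disk format is not described here, so the raw buffer of 8-bit pixels and the helper names `split_images` and `training_set_bytes` are assumptions made for illustration only.

```python
IMAGE_SIDE = 28                              # stated image width and height
PIXELS_PER_IMAGE = IMAGE_SIDE * IMAGE_SIDE   # 784 pixels per image
TRAIN_IMAGES = 60000                         # stated training-set size


def split_images(buffer):
    """Split a flat buffer of 8-bit pixels into a list of 28 x 28 nested lists.

    Assumption: pixels are stored row by row, one byte per pixel.
    """
    if len(buffer) % PIXELS_PER_IMAGE != 0:
        raise ValueError("buffer length is not a multiple of 784")
    images = []
    for start in range(0, len(buffer), PIXELS_PER_IMAGE):
        flat = buffer[start:start + PIXELS_PER_IMAGE]
        rows = [
            list(flat[r * IMAGE_SIDE:(r + 1) * IMAGE_SIDE])
            for r in range(IMAGE_SIDE)
        ]
        images.append(rows)
    return images


def training_set_bytes():
    """Uncompressed size of the training images at one byte per pixel."""
    return TRAIN_IMAGES * PIXELS_PER_IMAGE


# Usage with a placeholder buffer of two all-black images (not real data).
demo = split_images(bytes(2 * PIXELS_PER_IMAGE))
print(len(demo), len(demo[0]), len(demo[0][0]))
```

At one byte per pixel, the 60,000 training images occupy 47,040,000 bytes uncompressed, the same footprint as the original MNIST training images.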

Citation

Please use the following citation when referencing the dataset:

@article{prabhu2019kannada,
  title={Kannada-MNIST: A new handwritten digits dataset for the Kannada language},
  author={Prabhu, Vinay Uday},
  journal={arXiv preprint arXiv:1908.01242},
  year={2019}
}
