MIT-IBM Watson AI Lab has released Moments in Time [1], a new million-scale video understanding dataset. Although it is smaller than the earlier YouTube-8M, it is arguably the most diverse and heterogeneous dataset of its kind to date. The task is still video classification, but the focus is on classifying "actions" in a broad sense, meaning actions or dynamics in general: the actor need not be a person and can just as well be an object or an animal, which is probably the biggest distinction between this dataset and existing ones.
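To make the task concrete, below is a minimal PyTorch sketch of a clip-classification dataset for this kind of data. The class name `MomentsClips`, the CSV layout (`relative_path,label,...`), and the uniform frame-sampling strategy are illustrative assumptions, not the official loader; only the general setup (short clips, one action label per clip) comes from the dataset description.

```python
# Minimal sketch of a clip-classification Dataset (assumptions noted above):
# an annotation CSV lists "<relative_path>,<label>,..." rows and videos sit
# under `root`. Frame sampling and tensor layout are our own choices.
import csv
import torch
from torch.utils.data import Dataset
from torchvision.io import read_video

class MomentsClips(Dataset):  # hypothetical name, for illustration
    def __init__(self, root, csv_path, num_frames=8):
        self.root = root
        self.num_frames = num_frames
        with open(csv_path) as f:
            rows = list(csv.reader(f))
        # Tolerate extra CSV columns; keep only (path, label).
        self.samples = [(path, label) for path, label, *_ in rows]
        self.classes = sorted({label for _, label in self.samples})
        self.class_to_idx = {c: i for i, c in enumerate(self.classes)}

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        path, label = self.samples[idx]
        # Decode the short clip and sample `num_frames` frames uniformly.
        video, _, _ = read_video(f"{self.root}/{path}", pts_unit="sec")  # (T, H, W, C)
        t = torch.linspace(0, video.shape[0] - 1, self.num_frames).long()
        clip = video[t].permute(3, 0, 1, 2).float() / 255.0  # (C, T, H, W)
        return clip, self.class_to_idx[label]
```

A loader built this way plugs directly into a standard `DataLoader` and any 3D-CNN or frame-level classifier; uniform sampling is just the simplest choice for clips of a few seconds.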
Demo video (class activation map visualization): moments.csail.mit.edu/img/CAM_video_no_probs.mp4
[1] Monfort M, Zhou B, Bargal S A, et al. Moments in Time Dataset: one million videos for event understanding[J]. arXiv preprint arXiv:1801.03150, 2018.
[2] Salamon J, Jacoby C, Bello J P. A dataset and taxonomy for urban sound research[C]//Proceedings of the 22nd ACM international conference on Multimedia. ACM, 2014: 1041-1044.
[3] Sigurdsson G A, Russakovsky O, Gupta A. What Actions are Needed for Understanding Human Actions in Videos?[J]. arXiv preprint arXiv:1708.02696, 2017.