A common problem in computer vision is transferring algorithms developed on meticulously controlled datasets to real-world problems, such as unscripted, uncontrolled videos with natural lighting, viewpoints, and environments. With advances in feature descriptors and generative methods for action recognition, a need has emerged for comprehensive datasets that reflect the variability of real-world recognition scenarios.
This dataset comprises 10 actions related to breakfast preparation, performed by 52 different individuals in 18 different kitchens. It is to date one of the largest fully annotated datasets available. One of the main motivations for the proposed recording setup “in the wild”, as opposed to a single controlled lab environment, is for the dataset to more closely reflect real-world conditions in the monitoring and analysis of daily activities.
The number of cameras used varied from location to location (n = 3−5). The cameras were uncalibrated, and their positions changed with the location. Overall, we recorded ∼77 hours of video (>4 million frames). The cameras used were webcams, standard industry cameras (Prosilica GE680C), and a stereo camera (BumbleBee, Point Grey, Inc.). To balance out viewpoints, we also mirrored the videos recorded from laterally positioned cameras. To reduce the overall amount of data, all videos were down-sampled to a resolution of 320×240 pixels at a frame rate of 15 fps.
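As an illustration of the kind of preprocessing described above, the following minimal sketch spatially down-samples a recording to 320×240 at roughly 15 fps and optionally mirrors it horizontally. It uses OpenCV, and the file names, function name, and codec choice are assumptions for illustration only; this is not the pipeline actually used to produce the dataset.

```python
import cv2

def preprocess(src_path, dst_path, mirror=False, out_fps=15.0, size=(320, 240)):
    """Spatially and temporally down-sample a video, optionally mirroring it.

    Illustrative sketch only; not the authors' actual preprocessing pipeline.
    """
    cap = cv2.VideoCapture(src_path)
    src_fps = cap.get(cv2.CAP_PROP_FPS) or out_fps
    step = max(1, round(src_fps / out_fps))          # keep every `step`-th frame
    writer = cv2.VideoWriter(dst_path, cv2.VideoWriter_fourcc(*"mp4v"),
                             out_fps, size)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            frame = cv2.resize(frame, size)          # down-sample to 320x240
            if mirror:
                frame = cv2.flip(frame, 1)           # horizontal mirror for lateral views
            writer.write(frame)
        idx += 1
    cap.release()
    writer.release()

# Example: mirror a laterally positioned camera recording (hypothetical file names)
preprocess("cam03_raw.avi", "cam03_320x240_15fps_mirrored.mp4", mirror=True)
```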
Cooking activities included the preparation of coffee, orange juice, chocolate milk, tea, a bowl of cereals, fried eggs, pancakes, fruit salad, a sandwich, and scrambled eggs.
The benchmark and database are described in the following articles. We request that authors cite these papers in publications describing work carried out with this system and/or the video database.
H. Kuehne, A. B. Arslan, and T. Serre. The Language of Actions: Recovering the Syntax and Semantics of Goal-Directed Human Activities. CVPR, 2014.
H. Kuehne, J. Gall, and T. Serre. An end-to-end generative framework for video segmentation and recognition. WACV, 2016.