
Estimation Cooking Action Based on Positional Information Independent of Design of Kitchen

Nowadays, many cooking fans publish their original recipes on websites. Such recipes are often presented only as text with a few pictures, which can make the cooking procedure hard to follow. A recipe with movies is easier to understand. Our goal is to construct a system that helps users create recipes with movies. We call a movie that contains one part of the cooking process a "video segment". To observe cooking, the user is required to set up cameras in the kitchen. From the observation, the system extracts video segments automatically, and the user can then easily select important video segments and attach them to a recipe. To realize such an application, we propose a method to estimate five major cooking actions, "cut", "peel", "wash", "pour", and "stir", from the observation. Some cooking actions are always performed at the same equipment. For example, "cut" is performed at the counter and "wash" at the sink. Others are not: "peel", for example, can be performed at both the counter and the sink. For this reason, the five actions cannot be estimated from the equipment in use alone. We therefore estimate the actions with a hidden Markov model (HMM), which estimates the actions stochastically from the history of the user's migration between pieces of equipment. To extract this migration from the observation, the following two problems must be solved. The first problem is the possible variation in camera settings. Users set cameras at different positions and angles. When the camera angle is low, the area in which pixel values change because of a cooking action does not coincide with the location of the equipment. Hereafter, we call the area in which pixel values change the "working area". To extract the migration automatically, we need to extract working areas from the observation. The second problem is the difference in kitchen designs. Because of this difference, we need to match working areas to equipment automatically for each kitchen.
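The idea of estimating actions from the history of migration between equipment can be illustrated with a small sketch. It assumes a single HMM whose hidden states are the five actions and whose observations are the equipment at which the user is working in each frame, decoded with the Viterbi algorithm. This structure, the names `ACTIONS`, `EQUIPMENT`, `EMIT`, `STAY`, and `viterbi`, and every probability value are illustrative assumptions, not the models or parameters used in the thesis.

```python
import math

ACTIONS = ["cut", "peel", "wash", "pour", "stir"]

# Illustrative emission probabilities P(equipment | action); assumed values,
# not learned from any real observation.
EMIT = {
    "cut":  {"counter": 0.90, "sink": 0.05, "stove": 0.05},
    "peel": {"counter": 0.50, "sink": 0.45, "stove": 0.05},
    "wash": {"counter": 0.05, "sink": 0.90, "stove": 0.05},
    "pour": {"counter": 0.30, "sink": 0.30, "stove": 0.40},
    "stir": {"counter": 0.05, "sink": 0.05, "stove": 0.90},
}

STAY = 0.8  # assumed probability of continuing the same action in the next frame


def log_trans(prev, nxt):
    """Log transition probability: stay with STAY, else switch uniformly."""
    if prev == nxt:
        return math.log(STAY)
    return math.log((1.0 - STAY) / (len(ACTIONS) - 1))


def viterbi(observations):
    """Return the most likely action sequence for a sequence of equipment names."""
    # Uniform initial distribution over actions (assumption).
    score = {a: math.log(1.0 / len(ACTIONS)) + math.log(EMIT[a][observations[0]])
             for a in ACTIONS}
    backpointers = []
    for obs in observations[1:]:
        new_score, pointers = {}, {}
        for nxt in ACTIONS:
            best_prev = max(ACTIONS, key=lambda p: score[p] + log_trans(p, nxt))
            new_score[nxt] = (score[best_prev] + log_trans(best_prev, nxt)
                              + math.log(EMIT[nxt][obs]))
            pointers[nxt] = best_prev
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state.
    path = [max(ACTIONS, key=lambda a: score[a])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


if __name__ == "__main__":
    print(viterbi(["counter"] * 5 + ["stove"] * 5))
```

Because the transition term rewards staying in the same action, the decoded sequence depends on the whole migration history rather than on the equipment seen in a single frame, which is the property needed for actions such as "peel" that are not tied to one piece of equipment.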
To solve the first problem, we divide the captured scene into regions by clustering the working locations of each frame with X-means. Since at least some cooking actions are performed at fixed equipment, the working locations are expected to form a distribution with several concentrations. We regard each concentration as a working area. To solve the second problem, we use the bias in the period during which each piece of equipment is used. For example, a user tends to be at the counter in the early stage of cooking in order to "cut". In contrast, the stove is used last. The proposed method matches working areas to equipment for each kitchen as follows. First, we prepare a "teacher kitchen", in which working areas and equipment are matched manually. Second, we count how many frames the user stayed in each working area. This produces histograms that represent the bias in when each piece of equipment is used. Finally, we match the histograms obtained from the teacher kitchen against those obtained from other kitchens. In experiments, we applied our method to two kitchens with different designs and with different numbers and locations of cameras. First, we checked whether the extracted working areas correspond to the equipment. The extracted areas corresponded well to the equipment when the cameras were set above the kitchen. Second, we matched the areas to those extracted from the observation of the teacher kitchen. We found that our method could accurately match the working areas representing the "stove". Finally, we evaluated the accuracy of estimating the five major cooking actions by cross-validation under the following conditions: five of the six videos are observed in the teacher kitchen, the major cooking actions are modeled by HMMs from these observations, and the actions in the remaining observations are estimated with those models. The evaluation shows that "cut", "wash", and "stir" can be estimated with high accuracy.
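The histogram-based matching of working areas to equipment can be sketched as follows. The abstract does not state how histograms are binned or compared, so this sketch assumes that the cooking session is divided into equal time slices, that similarity is measured by histogram intersection, and that a one-to-one assignment is found by exhaustive search. The function names `stay_histogram`, `intersection`, and `match_areas` and the sample frame labels are hypothetical.

```python
from itertools import permutations


def stay_histogram(area_per_frame, area, bins=3):
    """Fraction of the frames spent in `area` that fall in each time slice."""
    counts = [0] * bins
    n = len(area_per_frame)
    for i, current in enumerate(area_per_frame):
        if current == area:
            counts[i * bins // n] += 1  # index of the time slice for frame i
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def intersection(h1, h2):
    """Histogram intersection: 1.0 for identical normalised histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def match_areas(teacher_hists, target_hists):
    """Assign each target working area to one teacher equipment label.

    Assumes there are no more working areas than equipment labels;
    exhaustive search is only practical for a handful of areas.
    """
    equipment = list(teacher_hists)
    areas = list(target_hists)
    best = max(
        permutations(equipment, len(areas)),
        key=lambda perm: sum(
            intersection(target_hists[a], teacher_hists[e])
            for a, e in zip(areas, perm)
        ),
    )
    return dict(zip(areas, best))


if __name__ == "__main__":
    # Made-up per-frame labels: manually matched in the teacher kitchen,
    # anonymous cluster ids in the other kitchen.
    teacher = ["counter"] * 4 + ["sink"] * 4 + ["stove"] * 4
    other = ["B"] * 5 + ["A"] * 3 + ["C"] * 4
    teacher_hists = {e: stay_histogram(teacher, e) for e in ("counter", "sink", "stove")}
    other_hists = {a: stay_histogram(other, a) for a in ("A", "B", "C")}
    print(match_areas(teacher_hists, other_hists))
```

Normalising each histogram makes the comparison depend on when an area is used rather than on how long the session lasted, which matches the stated idea of exploiting the bias in the period of use.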
As future work, we would like to improve the matching accuracy for the working areas representing the counter and the sink, as well as the estimation accuracy for the "peel" and "pour" actions. Additional evaluations are also important. In this paper, the method was applied only to observations of cooking chikuzen-ni. It is necessary to evaluate our method with other dishes. Evaluation with a larger number of kitchens is also important future work.